We created Studio to make building an AI model easy. As you have probably seen in Studio, it is a simple four-step process:
Upload Data ---> Define Aspects ---> Annotate Data ---> Start training
After automating the training process, we left only one task for the user: to label, or in other words to annotate, data.
Nevertheless, manual annotation is still a step that demands:
- some background knowledge
- some of your time to complete it
In this article, you will learn everything you need to know to annotate data quickly, efficiently, and correctly.
Text Annotation Guidelines
Annotators help AI models associate text segments with tags (topics and sentiments). To do that more efficiently, they should:
- Involve domain experts in the process
- Define topics to be comprehensive and non-overlapping in coverage
- Write definitions for each tag and add examples
- Annotate based on explicit mentions of topics with sentiments (= opinion)
- Ignore implicit and subjective topics and factual statements
- Focus on producing high-quality data, and take breaks to refresh your focus
- Iterate between annotating and training to speed up the process
A Google Sheet is an efficient place to define your aspects. First, create larger groups to categorize the aspects, and then define each aspect by giving it a description and an example.
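The sheet layout described above can also be checked with a few lines of code before annotation starts. This is only a minimal sketch: the column names (group, aspect, description, example), the `find_definition_problems` helper, and the sample rows are all invented for illustration and are not part of Studio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AspectDefinition:
    group: str        # larger category the aspect belongs to
    aspect: str       # tag name that annotators will apply
    description: str  # what the aspect covers
    example: str      # one sentence that should receive this tag


def find_definition_problems(rows):
    """Return readable problems found in a list of AspectDefinition rows."""
    problems = []
    seen = set()
    for row in rows:
        key = row.aspect.strip().lower()
        if key in seen:
            # Two rows with the same name suggest overlapping aspects.
            problems.append(f"duplicate aspect: {row.aspect}")
        seen.add(key)
        if not row.description.strip():
            problems.append(f"missing description: {row.aspect}")
        if not row.example.strip():
            problems.append(f"missing example: {row.aspect}")
    return problems


# Hypothetical rows, made up for this sketch only.
rows = [
    AspectDefinition("Product", "Ease of use",
                     "How easy the product is to open or operate.",
                     "The cap is impossible to open."),
    AspectDefinition("Product", "Design",
                     "Look and feel of the product.",
                     "The bottle looks great."),
    AspectDefinition("Service", "Delivery",
                     "",
                     "The parcel arrived two days late."),
]

for problem in find_definition_problems(rows):
    print(problem)  # prints "missing description: Delivery"
```

A check like this only catches missing fields and repeated names. Whether two aspects overlap in meaning still needs a domain expert to judge.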
To shed more light on the process, here are two good practices, followed by example cases.
Be careful to annotate only explicitly mentioned topics that carry a sentiment. AI models can only process and learn from information that is present in the text.
Try to ignore implicit, subjective, or factual segments. Ignoring subjectivity and factual statements improves the quality of the opinion mining model.
Example 1: Facts have no sentiment, so they shouldn’t be annotated.
Example 2: No opinion was explicitly expressed about Delivery.
Example 3: Not being able to open a bottle means it is not easy to use. It might have a nice design.
Example 4: The customer received a damaged product, which is a quality issue. Packaging is not mentioned.
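The two practices and the examples above can be sketched as a small check that runs over annotations before training. This is a rough illustration under stated assumptions, not how Studio works internally: the `Annotation` record, the `check_annotation` helper, the two-value sentiment set, and the sample sentence are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed label set for this sketch; your project may use other sentiments.
SENTIMENTS = {"positive", "negative"}


@dataclass
class Annotation:
    text: str                 # the full review or comment
    segment: str              # the exact words the annotator highlighted
    topic: str                # the tag (aspect) assigned to the segment
    sentiment: Optional[str]  # None means no opinion was expressed


def check_annotation(a):
    """Return a list of guideline violations for one annotation."""
    issues = []
    # Explicit mentions only: the highlighted words must appear in the text.
    if not a.segment or a.segment not in a.text:
        issues.append("segment is not an explicit part of the text")
    # Facts and opinion-free mentions should stay unannotated.
    if a.sentiment is None:
        issues.append("no sentiment expressed, so nothing to annotate")
    elif a.sentiment not in SENTIMENTS:
        issues.append(f"unknown sentiment: {a.sentiment}")
    return issues


# Hypothetical sentence, invented for this sketch.
text = "The bottle arrived on Monday and I could not open it."

candidates = [
    Annotation(text, "I could not open it", "Ease of use", "negative"),
    Annotation(text, "arrived on Monday", "Delivery", None),      # a fact
    Annotation(text, "poor packaging", "Packaging", "negative"),  # not in text
]

for c in candidates:
    print(c.topic, "->", check_annotation(c) or "ok")
```

Only the first candidate passes. The second is a factual statement with no opinion, and the third tags words that never appear in the text, which mirrors the cases in the examples.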