How can I annotate my data efficiently?

Summary

We created Studio to make building an AI model easy. As you have probably already seen in Studio, it is a simple four-step process:

Upload Data ---> Define Aspects ---> Annotate Data ---> Start training

After automating the training process, we left only one task for the user: labeling, or in other words annotating, the data.

Nevertheless, manual annotation is still a step that demands:

  1. some background knowledge
  2. some of your time to complete

In this article, you will learn everything you need to know for annotating data quickly, efficiently & correctly.

Text Annotation Guidelines

Annotators help AI models associate text segments with tags (topics & sentiments). To do that efficiently, they should:

  • Involve domain experts in the process
  • Define topics to be comprehensive and non-overlapping in coverage
  • Write definitions for each tag and add examples
  • Annotate based on explicit mentions of topics with sentiments (= opinion)
  • Ignore implicit and subjective topics and factual statements
  • Focus on producing high-quality data, and take breaks to refresh focus
  • Iterate between annotating and training to speed up the process

Below you can see how to define aspects efficiently in a Google Sheet. First, create larger groups to categorize the aspects, and then define each aspect by providing a description and an example.

Screenshot from 2021-05-25 10-12-37.png
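
The same sheet layout (group, aspect, description, example) can also be kept as structured data and checked automatically before annotation starts. The following is only a minimal sketch: the Aspect class, the find_problems helper, and the sample rows are hypothetical illustrations and are not part of Studio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Aspect:
    """One row of the aspect sheet (hypothetical structure)."""
    group: str        # larger category the aspect belongs to
    name: str         # the aspect (topic) itself
    description: str  # written definition of the tag
    example: str      # sample text that should receive this tag


def find_problems(aspects):
    """Return a list of issues: duplicate names, missing definitions or examples."""
    problems = []
    seen = set()
    for aspect in aspects:
        key = aspect.name.strip().lower()
        if key in seen:
            problems.append(f"duplicate aspect name: {aspect.name}")
        seen.add(key)
        if not aspect.description.strip():
            problems.append(f"missing description: {aspect.name}")
        if not aspect.example.strip():
            problems.append(f"missing example: {aspect.name}")
    return problems


# Illustrative rows only; real aspects depend on your domain.
ASPECTS = [
    Aspect("Product", "Quality", "Condition of the item on arrival",
           "The bottle arrived cracked."),
    Aspect("Product", "Ease of use", "How easy the item is to handle",
           "I could not open the bottle."),
    Aspect("Service", "Delivery", "Opinions about shipping speed or handling",
           "Delivery was really fast."),
]

print(find_problems(ASPECTS))
```

A check like this only catches duplicate names and empty cells. Whether two aspects overlap in meaning still needs a domain expert's review.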

To shed more light on the process, here are two good practices, followed by example cases.

  1. Take care to annotate only explicitly mentioned topics that carry a sentiment. AI models can only process and learn from the information that is present in the text.

  2. Ignore implicit, subjective, or factual segments. Ignoring subjectivity and factual statements improves the quality of the opinion mining model.
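
The two practices above can be read as a simple rule: a segment is annotated only when it contains both an explicitly mentioned topic and an explicit sentiment. Here is a rough sketch of that rule, assuming a hypothetical Candidate record and made-up sample sentences. Studio's actual data format is not shown in this article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """A text segment an annotator is considering (hypothetical structure)."""
    segment: str
    topic: Optional[str]      # None if no topic is explicitly mentioned
    sentiment: Optional[str]  # None if no opinion is expressed (e.g. a fact)


def should_annotate(candidate):
    """Annotate only when both a topic and a sentiment are explicit."""
    return candidate.topic is not None and candidate.sentiment is not None


# Made-up sentences for illustration only.
candidates = [
    Candidate("I could not open the bottle.", "Ease of use", "negative"),
    Candidate("The bottle holds 500 ml.", "Size", None),      # fact, no opinion
    Candidate("I ordered it last week.", None, None),         # no topic, no opinion
]

to_annotate = [c.segment for c in candidates if should_annotate(c)]
print(to_annotate)
```

Deciding whether a topic or sentiment is explicit remains a human judgment. The sketch only shows how that decision translates into keeping or skipping a segment.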

Example 1: Facts have no sentiment, so they shouldn’t be annotated.

Screenshot from 2021-05-25 10-17-44.png

Example 2: No opinion about Delivery was explicitly expressed.

Screenshot from 2021-05-25 10-19-22.png

Example 3: Not being able to open the bottle means it is not easy to use. It might still have a nice design.

Screenshot from 2021-05-25 10-29-04.png

Example 4: The customer received a damaged product, which is a quality issue. Packaging was not mentioned.

Screenshot from 2021-05-25 10-25-35.png
