Defining the model’s aspects well is essential for annotating data correctly and eventually building a state-of-the-art model. We can create the aspects for you, you may already have created them, or we can build them together. In any case, here are seven tips for defining the most pertinent aspects for your custom models!
Tip 1: Prominence
Try to capture topics that “stick out”, and be as specific as the data allows. For example, in the software industry, we could use different aspects such as “app”, “desktop app” or “mobile app”, depending on how detailed the data is. On the other hand, if the data generally only mentions an app, we can’t be more specific, and we should call the aspect something like “app general”.
Tip 2: Overlapping
Be aware of overlapping meanings so that you avoid creating several aspects for similar content. For example, the aspects “support” and “customer center” would overlap for customer data, as they would be applied to the same text segments during labeling.
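If you have already run a small pilot annotation round, one way to spot overlapping aspects is to check how often two aspects land on the same text segments. The sketch below is only an illustration: the function name `overlapping_pairs`, the 0.8 threshold, and the data layout (one list of aspect labels per segment) are assumptions, not part of any particular labeling tool.

```python
from itertools import combinations


def overlapping_pairs(annotations, threshold=0.8):
    """Return aspect pairs that are applied to (almost) the same segments.

    `annotations` is assumed to be a list with one entry per text segment,
    each entry being the list of aspects an annotator assigned to it.
    Overlap is measured with the Jaccard index of the two segment sets.
    """
    segments_by_aspect = {}
    for segment_id, labels in enumerate(annotations):
        for label in labels:
            segments_by_aspect.setdefault(label, set()).add(segment_id)

    pairs = []
    for a, b in combinations(sorted(segments_by_aspect), 2):
        seg_a, seg_b = segments_by_aspect[a], segments_by_aspect[b]
        jaccard = len(seg_a & seg_b) / len(seg_a | seg_b)
        if jaccard >= threshold:
            pairs.append((a, b, jaccard))
    return pairs


# Hypothetical pilot annotations for four customer-feedback segments.
pilot = [
    ["support", "customer center"],
    ["support", "customer center"],
    ["pricing"],
    ["support", "customer center", "pricing"],
]

print(overlapping_pairs(pilot))
```

A pair that scores close to 1.0 is a candidate for merging into a single aspect; a human should still review the flagged pairs, since a high score on a small pilot can be a coincidence.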
Tip 3: Removing an aspect
Before you add an aspect to your aspects list, make sure that it is relevant to your data and that you will actually use it during labeling. If, after spending some time on data annotation, you realize that you never use an aspect, you can permanently remove it from your aspects list. For example, if I have customer data and define the aspect “mobile app”, then notice that this aspect is never used during labeling, I should remove it.
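Checking for unused aspects is easy to automate once some data has been labeled. The following is a minimal sketch, assuming the annotations are available as one list of aspect labels per text segment; the helper name `unused_aspects` and the sample data are hypothetical.

```python
from collections import Counter


def unused_aspects(aspects, annotations):
    """Return the aspects from `aspects` that never occur in `annotations`."""
    usage = Counter(label for labels in annotations for label in labels)
    return [aspect for aspect in aspects if usage[aspect] == 0]


# Hypothetical aspects list and labeled customer data.
aspects = ["support", "pricing", "mobile app"]
annotations = [
    ["support"],
    ["pricing", "support"],
    ["pricing"],
]

print(unused_aspects(aspects, annotations))  # ['mobile app']
```

Here “mobile app” is never used, so it is a candidate for removal, provided enough data has been annotated for the count to be meaningful.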
Tip 4: Adding an aspect
While annotating data, you might find that an aspect you would like to use is missing from the aspects list. In this case, you can always add it. Just make sure that you really need this aspect and that it appears in more than one example.
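A rough way to check the “more than one example” condition before adding an aspect is to search the unlabeled texts for a few keywords that signal it. This is only a sketch under simple assumptions: the function `has_enough_examples`, the keyword list, and the sample texts are made up for illustration, and plain substring matching will miss paraphrases.

```python
def has_enough_examples(candidate_keywords, texts, min_examples=2):
    """Check whether a candidate aspect shows up in at least `min_examples` texts.

    A text counts as a match if it contains any of the keywords
    (case-insensitive substring match).
    """
    keywords = [keyword.lower() for keyword in candidate_keywords]
    matches = [
        text for text in texts
        if any(keyword in text.lower() for keyword in keywords)
    ]
    return len(matches) >= min_examples, matches


# Hypothetical customer feedback.
texts = [
    "The desktop app crashes on startup.",
    "Support was quick to answer.",
    "I love the new Desktop App layout.",
]

enough, examples = has_enough_examples(["desktop app"], texts)
print(enough, len(examples))  # True 2
```

If the check fails, the candidate is probably a one-off mention rather than an aspect worth adding.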
Tip 5: Difficulty
Try to name an aspect with a simple word, and avoid complicated language or abstract concepts. For example, if you have to define aspects for a financial model, it is better to use “expenses” than “deduction”.
Tip 6: Subjective opinion space
Keep in mind that we try to cover as much of the personal opinion space as we can. People talk in a specific way, use particular words, and are interested in specific aspects of a product depending on the product’s industry. For example, when reviewing cars, people usually talk about the “engine”, the “wheels/tires”, etc. Our goal is to capture these aspects, which are essential for a product, so that we can process as many texts/reviews as possible. By capturing these aspects, we understand how the market “talks” about a product.
Tip 7: Annotators' cognitive effort
Lastly, since the data will be labeled manually by annotators, we should consider their cognitive effort. Creating many aspects that are difficult to understand will significantly increase the annotators’ workload and lead to more labeling errors. For this reason, before we finalize our aspects list, we should ask ourselves whether we would be happy to work with it as annotators.