What Is the Data Structure of Batch Analysis Results for Opinion Mining?

Question

Accepted Answer

What is Aspect-based Sentiment Analysis

Aspect-based sentiment analysis is the task of identifying fine-grained opinions about specific topics (aspects) in text. The text can take a variety of formats, such as emails, product reviews, customer requests, social media posts, and other user-generated content. This approach is widely regarded as the industry standard for analyzing customer reviews, as seen in the image below.

https://malcolm-en-gb.s3.eu-west-1.amazonaws.com/instances/oiJS2eQZyE/resources/tIC9tp5U72/Amazon_review_absa.png (Amazon_review_absa.png)

How A Customer Review is Structured

A customer review consists of two main parts: the review text and its description or metadata, such as name, date, and rating. This structure applies to most online reviews, including Amazon product reviews, Google Maps business reviews, tweets, and more.

https://malcolm-en-gb.s3.eu-west-1.amazonaws.com/instances/oiJS2eQZyE/resources/l4s3zEcKGK/Amazon_review_detailed.png (Amazon_review_detailed.png)

Reviews tend to be dense with mentions of topics, emotions, and sentiments, because customers express their experience in a short message, and many reviews contain multiple sentences. We therefore unpack each review into its core elements, which we call segments (e.g., sentences). Analyzing each segment separately helps keep accuracy high and gives you all the essential information about a review.

https://malcolm-en-gb.s3.eu-west-1.amazonaws.com/instances/oiJS2eQZyE/resources/mZxanVYcir/Review_text_segments.png (Review_text_segments.png)
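To make the idea of segments concrete, here is a minimal Python sketch. It assumes a segment is simply a sentence ending in `.`, `!` or `?`; the segmentation DeepOpinion actually uses is not described here and is likely more sophisticated. The `Segment` class and `split_into_segments` function are hypothetical names. The sketch also records character offsets, assuming 0-based, end-exclusive spans (Python slicing convention), which may differ from how the exported Span start and Span end columns are defined.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    span_start: int  # offset of the first character in the original review (assumed 0-based)
    span_end: int    # offset just past the last character (assumed end-exclusive)


def split_into_segments(review: str) -> list[Segment]:
    """Naive sentence splitter (an assumption, not the real segmenter):
    a segment is a run of text ending in . ! or ? (or the end of the review).
    Leading and trailing whitespace is excluded from each segment."""
    segments = []
    for match in re.finditer(r"[^.!?]+[.!?]*", review):
        raw = match.group()
        stripped = raw.strip()
        if not stripped:
            continue  # skip whitespace-only runs
        # Shift the span start past any leading whitespace in the match.
        start = match.start() + (len(raw) - len(raw.lstrip()))
        end = start + len(stripped)
        segments.append(Segment(stripped, start, end))
    return segments


review = "The battery lasts all day. The screen is too dim!"
for seg in split_into_segments(review):
    print(seg.span_start, seg.span_end, repr(seg.text))
# 0 26 'The battery lasts all day.'
# 27 49 'The screen is too dim!'
```

Slicing the review with each segment's offsets, `review[seg.span_start:seg.span_end]`, returns the segment text, which is the property this sketch's spans are built to satisfy.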

To enable fine-grained analysis, each review is broken down into 3 levels: Level 1 is the whole review message, Level 2 is the segments of the review text, and Level 3 is the topics and sentiments associated with each segment, of which there can be none, one, or more. The diagram below illustrates this hierarchy.

https://malcolm-en-gb.s3.eu-west-1.amazonaws.com/instances/oiJS2eQZyE/resources/z4o5iQ1knZ/Absa_structure.png (Absa_structure.png)

Your analysis results also include a unique ID at each level, which gives the results a consistent structure and makes them easy to work with in your application of choice.

https://malcolm-en-gb.s3.eu-west-1.amazonaws.com/instances/oiJS2eQZyE/resources/O1A0Q5wNuX/Review_absa_structure.png (Review_absa_structure.png)
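The three levels and their IDs can be modeled as nested records. The following is a minimal Python sketch that only illustrates the hierarchy, not the actual export format. The class and field names, the use of random UUIDs as IDs, and the aspect and sentiment labels are all hypothetical.

```python
from dataclasses import dataclass, field
from uuid import uuid4


def new_id() -> str:
    # Hypothetical ID scheme: random UUIDs. Real result IDs may look different.
    return str(uuid4())


@dataclass
class Tag:
    """Level 3: one topic (aspect) and its sentiment within a segment."""
    aspect: str
    sentiment: str
    tag_id: str = field(default_factory=new_id)


@dataclass
class Segment:
    """Level 2: one segment (e.g. a sentence) of the review text."""
    text: str
    span_start: int
    span_end: int
    tags: list[Tag] = field(default_factory=list)  # none, one, or more topics
    segment_id: str = field(default_factory=new_id)


@dataclass
class Review:
    """Level 1: the whole review plus any metadata columns."""
    text: str
    segments: list[Segment] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)
    review_id: str = field(default_factory=new_id)


review = Review(
    text="The battery lasts all day. The screen is too dim!",
    metadata={"rating": "3"},
    segments=[
        Segment("The battery lasts all day.", 0, 26,
                tags=[Tag("Battery", "Positive")]),
        Segment("The screen is too dim!", 27, 49,
                tags=[Tag("Display", "Negative")]),
    ],
)
```

Each object gets its own ID, so any tag can be traced back to its segment and any segment to its review.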

How Our Analysis Results are Structured

For example, it is common to upload a file with 100 rows (1 review per row) and download results with 300 rows. This is because each review can contain multiple sentences, and each sentence can mention one or more topics. To keep the results easy to work with, we add a new row for each topic identified in each segment.

For example, if a review consists of 2 sentences and each sentence mentions 2 topics, you will end up with 4 rows for this review. The review text will appear 4 times, each segment will appear twice, and each topic is treated as unique to its segment. This gives you layered (nested) results that are ready for your downstream tasks.
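The one-row-per-topic rule can be sketched as a function that flattens a nested review. The nested dict layout and the `flatten` name are hypothetical. The output keys mirror the column names in the Results Data Dictionary, and metadata (the "Others" columns) is copied onto every row. This sketch emits no row for a segment without topics, but how the actual export handles such segments is not specified here.

```python
def flatten(review: dict) -> list[dict]:
    """Emit one output row per topic (tag) in each segment of a review.

    `review` uses a hypothetical nested layout:
    {id, text, metadata, segments: [{id, text, span_start, span_end,
                                     tags: [{id, aspect, sentiment}]}]}
    """
    rows = []
    for seg in review["segments"]:
        for tag in seg["tags"]:  # segments without tags produce no row here (assumption)
            rows.append({
                "id": review["id"],
                "Text": review["text"],
                "Segment id": seg["id"],
                "Segment text": seg["text"],
                "Span start": seg["span_start"],
                "Span end": seg["span_end"],
                "Tag id": tag["id"],
                "Aspect": tag["aspect"],
                "Sentiment": tag["sentiment"],
                **review.get("metadata", {}),  # "Others" columns, repeated on each row
            })
    return rows


# The 2-sentence, 2-topics-per-sentence example (labels are made up).
review = {
    "id": "r1",
    "text": "Battery and screen are great. Price and weight are not.",
    "metadata": {"rating": "4"},
    "segments": [
        {"id": "s1", "text": "Battery and screen are great.", "span_start": 0, "span_end": 29,
         "tags": [{"id": "t1", "aspect": "Battery", "sentiment": "Positive"},
                  {"id": "t2", "aspect": "Screen", "sentiment": "Positive"}]},
        {"id": "s2", "text": "Price and weight are not.", "span_start": 30, "span_end": 55,
         "tags": [{"id": "t3", "aspect": "Price", "sentiment": "Negative"},
                  {"id": "t4", "aspect": "Weight", "sentiment": "Negative"}]},
    ],
}
rows = flatten(review)
print(len(rows))  # 4
```

As described above, the review text appears in all 4 rows, each segment in 2 rows, and each `Tag id` exactly once.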

Example of an input file with 50 rows (50 reviews)

https://malcolm-en-gb.s3.eu-west-1.amazonaws.com/instances/oiJS2eQZyE/resources/8FBcj8CvX2/Screen%20Shot%202021-10-30%20at%203.07.06%20PM.png (Screen Shot 2021-10-30 at 3.07.06 PM.png)

Example of an output file with 280 rows (50 reviews)

Having more rows in the results than in the input reflects the richness of the data: every additional row is another topic identified in a review segment. You will also notice that some column headers are highlighted in blue; these columns are generated by DeepOpinion Studio.

https://malcolm-en-gb.s3.eu-west-1.amazonaws.com/instances/oiJS2eQZyE/resources/3d2iVLcpzl/Screen%20Shot%202021-10-30%20at%2012.27.08%20PM.png (Screen Shot 2021-10-30 at 12.27.08 PM.png)
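Because every output row carries the Level 1 `id`, you can count how many rows each review produced and recover the number of input reviews. The following is a small sketch on made-up rows. For the example files above, 280 output rows from 50 reviews would give an average of 280 / 50 = 5.6 rows per review.

```python
from collections import Counter


def rows_per_review(output_rows: list[dict]) -> Counter:
    """Count output rows per original review, keyed by the Level 1 `id` column."""
    return Counter(row["id"] for row in output_rows)


# Hypothetical output: review r1 yields 3 rows, r2 yields 1 row.
output_rows = [
    {"id": "r1", "Aspect": "Battery"},
    {"id": "r1", "Aspect": "Screen"},
    {"id": "r1", "Aspect": "Price"},
    {"id": "r2", "Aspect": "Delivery"},
]
counts = rows_per_review(output_rows)
print(len(counts))                     # 2 distinct reviews (input rows)
print(len(output_rows) / len(counts))  # 2.0 output rows per review on average
```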

Results Data Dictionary

Here is a description of the columns you can expect in your analysis results.

id [Level 1]: A system-generated unique ID for each original row (document or review)

Text [Level 1]: The original text uploaded for automated analysis (document or review)

Segment id [Level 2]: A system-generated unique ID for each segment of the original row (child of id)

Segment text [Level 2]: A segment of the original text uploaded for automated analysis (child of Text)

Span start [Level 2]: The character position in the original text where the segment starts (child of Text)

Span end [Level 2]: The character position in the original text where the segment ends (child of Text)

Tag id [Level 3]: A system-generated unique ID for each topic identified in a text segment (child of Segment id)

Aspect [Level 3]: A topic identified in a text segment (child of Segment text)

Sentiment [Level 3]: The sentiment category identified for the aspect in the same row (child of Segment text)

Others [Level 1]: Any other columns included in the original data, such as dates, names, ratings, and source (metadata)
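Going the other way, the ID columns let you rebuild the nested hierarchy from a downloaded CSV. Below is a minimal sketch using Python's standard `csv` module. It assumes that the CSV headers match the column names above exactly and that Span start and Span end are 0-based, end-exclusive character offsets; check both assumptions against a real export before relying on them. The sample data is made up, and the `regroup` name is hypothetical.

```python
import csv
import io


def regroup(csv_text: str) -> dict:
    """Rebuild the Level 1 -> Level 2 -> Level 3 hierarchy from flat result rows.

    Assumes headers: id, Text, Segment id, Segment text, Span start, Span end,
    Tag id, Aspect, Sentiment (exact spelling is an assumption).
    """
    reviews: dict[str, dict] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Level 1: one entry per review id.
        review = reviews.setdefault(row["id"], {"text": row["Text"], "segments": {}})
        # Level 2: one entry per segment id within the review.
        segment = review["segments"].setdefault(row["Segment id"], {
            "text": row["Segment text"],
            "span": (int(row["Span start"]), int(row["Span end"])),
            "tags": [],
        })
        # Level 3: each row contributes exactly one tag.
        segment["tags"].append((row["Tag id"], row["Aspect"], row["Sentiment"]))
    return reviews


sample = """id,Text,Segment id,Segment text,Span start,Span end,Tag id,Aspect,Sentiment
r1,Good battery. Bad screen.,s1,Good battery.,0,13,t1,Battery,Positive
r1,Good battery. Bad screen.,s2,Bad screen.,14,25,t2,Screen,Negative
"""
nested = regroup(sample)
for review in nested.values():
    for seg in review["segments"].values():
        start, end = seg["span"]
        # Under the assumed offset convention, slicing recovers the segment text.
        assert review["text"][start:end] == seg["text"]
```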

Why This Structure

This data structure is designed to make the results convenient to work with in downstream tasks such as process automation and data visualization. For example, you might want to count how often your customers complain about battery life and automatically create a support ticket each time it is mentioned.

https://malcolm-en-gb.s3.eu-west-1.amazonaws.com/instances/oiJS2eQZyE/resources/IGmg4MO3NV/Ticket_automation_freshdesk_animated.gif (Ticket_automation_freshdesk_animated.gif)
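The battery-life scenario can be sketched as a filter over result rows, followed by one ticket per matching `Tag id`. The aspect label "Battery life", the sentiment value "Negative", the function names, and the ticket fields are all hypothetical. Sending the payload to an actual helpdesk tool is out of scope for this sketch.

```python
def find_mentions(rows: list[dict],
                  aspect: str = "Battery life",
                  sentiment: str = "Negative") -> list[dict]:
    """Return result rows where `aspect` is mentioned with `sentiment`.

    The default labels are hypothetical; use the aspect and sentiment
    names your own model actually produces.
    """
    return [r for r in rows if r["Aspect"] == aspect and r["Sentiment"] == sentiment]


def ticket_payload(row: dict) -> dict:
    """Build a hypothetical, helpdesk-agnostic ticket for one mention.

    Keyed by Tag id, so each mention maps to exactly one ticket.
    """
    return {
        "external_ref": row["Tag id"],
        "subject": f"{row['Aspect']} complaint in review {row['id']}",
        "description": row["Segment text"],
    }


rows = [
    {"id": "r1", "Tag id": "t1", "Aspect": "Battery life", "Sentiment": "Negative",
     "Segment text": "The battery dies by noon."},
    {"id": "r2", "Tag id": "t2", "Aspect": "Battery life", "Sentiment": "Positive",
     "Segment text": "Battery lasts two days."},
    {"id": "r3", "Tag id": "t3", "Aspect": "Screen", "Sentiment": "Negative",
     "Segment text": "The screen scratches easily."},
]
complaints = find_mentions(rows)
tickets = [ticket_payload(r) for r in complaints]
print(len(complaints))  # 1
```

Because each row holds exactly one aspect and one sentiment, counting complaints is just counting matching rows. Keying tickets on `Tag id` avoids creating duplicates if the same results are processed twice.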

Contact us to learn more about moving your customer reviews from static insights to dynamic, automated actions.
