DefinedCrowd
Training Sentiment Analysis Models

Turn unstructured text into meaningful insights.

The Challenge

Visionary companies are using sentiment analysis models to look past a surface-level understanding of what people are saying and examine the nuances of how they say it. However, sentiment is difficult to parse. One person’s “negative” doesn’t always match their neighbor’s, and even short phrases can carry layers of nuance.

Those complications only compound with long-form writing such as feature stories and product reviews. Ideally, a sophisticated sentiment model would deliver a broad, composite score for a long-form document while also sifting through its individual paragraphs, sentences, and words to extract granular insights.
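
One simple way to relate a composite, document-level score to granular segment scores is a length-weighted average, so that a long negative paragraph outweighs a short positive sentence. This is only an illustrative sketch: the `Segment` type, the -1 to 1 score scale, and weighting by character length are assumptions, not the client's model or DefinedCrowd's method.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    score: float  # assumed scale: -1.0 (negative) to 1.0 (positive)


def composite_score(segments: list[Segment]) -> float:
    """Length-weighted mean of segment scores; longer segments count more."""
    total_chars = sum(len(s.text) for s in segments)
    if total_chars == 0:
        raise ValueError("document has no text")
    return sum(s.score * len(s.text) for s in segments) / total_chars


doc = [
    Segment("The battery life is outstanding.", 0.9),
    Segment("Setup was confusing and the manual did not help at all.", -0.6),
]
print(round(composite_score(doc), 3))
```

Here the longer, negative second segment pulls the composite slightly below zero even though the first segment is strongly positive.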

When a Fortune 500 tech company wanted to turn that ideal into reality, it partnered with DefinedCrowd.

The Solution

Step 1. Document Segmentation

This is exactly the kind of use case our dedicated team of NLP experts loves to sink its teeth into. The client provided more than 100,000 documents, ranging from short paragraphs left on its site to full-length, 1,500-word articles published online.

First, we analyzed those documents and developed an optimal segmentation methodology. On average, we cut each document into 4 distinct segments, though the number varied widely: the longest document had 84.
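
The case study does not describe its segmentation methodology, so the sketch below uses a deliberately simple stand-in rule, splitting on blank lines (paragraph breaks), to show how per-document segment counts and the corpus average and maximum could be computed. `segment_document` and `segmentation_stats` are hypothetical names.

```python
import re
from statistics import mean


def segment_document(text: str) -> list[str]:
    """Hypothetical rule: split on blank lines (paragraph breaks), drop empty pieces."""
    pieces = re.split(r"\n\s*\n", text)
    return [p.strip() for p in pieces if p.strip()]


def segmentation_stats(documents: list[str]) -> dict[str, float]:
    """Document count plus mean and max segments per document."""
    counts = [len(segment_document(d)) for d in documents]
    return {
        "documents": len(counts),
        "mean_segments": mean(counts),
        "max_segments": max(counts),
    }


docs = [
    "Great phone.",
    "Intro paragraph.\n\nSecond paragraph.\n\n\nThird paragraph.",
]
print(segmentation_stats(docs))
```

A production methodology would likely also consider sentence boundaries and topic shifts; paragraph splitting is only the simplest baseline.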

100,000+ documents provided
4 segments per document identified (average)

Step 2. Sentiment Annotation

Our Neevo contributors tagged the sentiment of each individual segment and also provided a high-level sentiment score for each document as a whole. Throughout that process, we ran a wide range of automated gatekeeping procedures to monitor the quality of their work in real time. In the end, we sourced half a million annotations across the original 100,000 documents.
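
Real-time gatekeeping of this kind can be sketched as a running precision check per contributor. This assumes RTA tasks are items with a known reference tag (the case study does not expand the acronym), and `MIN_PRECISION` and `MIN_GOLD_TASKS` are hypothetical thresholds; the source says only that users with a low RTA% were prevented from working.

```python
from collections import defaultdict
from typing import Optional

MIN_PRECISION = 0.80  # hypothetical cutoff; the source only says "low RTA%"
MIN_GOLD_TASKS = 5    # hypothetical: judge a user only after this many reference tasks


class Gatekeeper:
    """Tracks each contributor's tag precision on reference (RTA) tasks as results stream in."""

    def __init__(self) -> None:
        self.correct = defaultdict(int)
        self.seen = defaultdict(int)

    def record(self, user: str, tag: str, expected: str) -> None:
        self.seen[user] += 1
        if tag == expected:
            self.correct[user] += 1

    def precision(self, user: str) -> Optional[float]:
        seen = self.seen.get(user, 0)
        if seen == 0:
            return None
        return self.correct.get(user, 0) / seen

    def may_work(self, user: str) -> bool:
        if self.seen.get(user, 0) < MIN_GOLD_TASKS:
            return True  # not enough evidence yet to block this user
        return self.precision(user) >= MIN_PRECISION


gate = Gatekeeper()
for tag, expected in [("pos", "pos"), ("neg", "pos"), ("neg", "neg"), ("neu", "pos"), ("neg", "pos")]:
    gate.record("user_a", tag, expected)
print(gate.precision("user_a"), gate.may_work("user_a"))
```

Waiting for a minimum number of reference tasks avoids blocking a new contributor over a single early mistake.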

500,000 annotations collected

The Results

By partnering with DefinedCrowd, this Fortune 500 tech company benefited from extensive data expertise, customizable workflows, and full-service data solutions that guaranteed quality results, even for a data collection this complex. Rigorous qualification tests, analysis of text-to-speed and segment-to-speed ratios, and inter-annotator agreement calculations kept the error rate below 3%. High-precision training data makes for high-performance models. Companies know this all too well, and they choose their data partners accordingly.
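
The case study does not say which inter-annotator agreement statistic was used. Fleiss' kappa is one standard choice when every item is labeled by the same number of annotators (the quality metrics compare each user against 2 other annotators, i.e. three per item); it corrects raw agreement for the agreement expected by chance. The sketch below is a plain implementation of that formula, not DefinedCrowd's pipeline.

```python
from collections import Counter


def fleiss_kappa(ratings: list[list[str]]) -> float:
    """Fleiss' kappa; ratings[i] holds the tags that each of n annotators gave item i."""
    if not ratings:
        raise ValueError("no items to score")
    n = len(ratings[0])
    if n < 2 or any(len(r) != n for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of annotators")
    num_items = len(ratings)
    category_totals = Counter()
    observed = 0.0
    for labels in ratings:
        counts = Counter(labels)
        category_totals.update(counts)
        # Fraction of annotator pairs that agree on this item.
        observed += (sum(c * c for c in counts.values()) - n) / (n * (n - 1))
    observed /= num_items
    # Agreement expected by chance, from the overall tag proportions.
    expected = sum((t / (num_items * n)) ** 2 for t in category_totals.values())
    if expected == 1.0:
        raise ValueError("kappa is undefined when every tag is identical")
    return (observed - expected) / (1 - expected)


ratings = [["pos", "pos", "pos"], ["neg", "neg", "pos"], ["neu", "neg", "neg"]]
print(round(fleiss_kappa(ratings), 3))
```

In this toy example, raw pairwise agreement is about 0.56, but after the chance correction the kappa is 0.25, which is why agreement statistics rather than raw match rates are usually reported.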

RTA % Tag Precision: percentage of correct tags on RTA tasks. Users with a low RTA% were prevented from working.
Average Text-to-Speed Ratio: length of the input document / task time. Outliers were spot-checked internally.
Average Segment-to-Speed Ratio: number of segments / task time. Outliers were spot-checked internally.
Inter-Annotator Agreement: percentage of a user’s unique assessments compared with 2 other annotators. Users above 20% were spot-checked internally.
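
These checks can be combined into a single spot-check pass. A minimal sketch, with several assumptions: document length is counted in characters, a "unique assessment" is read as a tag that matches neither of the 2 other annotators, and speed outliers are found with a median/MAD modified z-score (a swapped-in technique; the source does not say how outliers were identified). `Task`, `Z_LIMIT`, and all function names are hypothetical; only the 20% limit comes from the metrics above.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median
from typing import Callable

UNIQUE_RATE_LIMIT = 0.20  # from the metrics: users above 20% are spot-checked
Z_LIMIT = 3.5             # hypothetical outlier cutoff (modified z-score)


@dataclass
class Task:
    user: str
    doc_length: int                # assumed unit: characters in the input document
    num_segments: int
    seconds: float                 # task time
    label: str                     # this user's sentiment tag
    other_labels: tuple[str, str]  # tags from the 2 other annotators


def text_to_speed(t: Task) -> float:
    return t.doc_length / t.seconds


def segment_to_speed(t: Task) -> float:
    return t.num_segments / t.seconds


def per_user_average(tasks: list[Task], metric: Callable[[Task], float]) -> dict[str, float]:
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t in tasks:
        sums[t.user] += metric(t)
        counts[t.user] += 1
    return {u: sums[u] / counts[u] for u in sums}


def robust_outliers(values: dict[str, float]) -> set[str]:
    """Flag users whose modified z-score (median/MAD based) exceeds Z_LIMIT."""
    med = median(values.values())
    mad = median(abs(v - med) for v in values.values())
    if mad == 0:
        return set()  # no spread to judge against
    return {u for u, v in values.items() if 0.6745 * abs(v - med) / mad > Z_LIMIT}


def unique_rate(tasks: list[Task]) -> dict[str, float]:
    """Share of a user's tags that match neither of the 2 other annotators (assumed reading)."""
    unique = defaultdict(int)
    total = defaultdict(int)
    for t in tasks:
        total[t.user] += 1
        if t.label not in t.other_labels:
            unique[t.user] += 1
    return {u: unique[u] / total[u] for u in total}


def spot_check_list(tasks: list[Task]) -> set[str]:
    flagged = set()
    for metric in (text_to_speed, segment_to_speed):
        flagged |= robust_outliers(per_user_average(tasks, metric))
    flagged |= {u for u, r in unique_rate(tasks).items() if r > UNIQUE_RATE_LIMIT}
    return flagged


tasks = [
    Task("u1", 1000, 4, 200.0, "pos", ("pos", "neu")),
    Task("u2", 1000, 4, 210.0, "pos", ("pos", "neu")),
    Task("u3", 1000, 4, 190.0, "neg", ("pos", "pos")),
    Task("u4", 1000, 4, 205.0, "pos", ("pos", "neu")),
    Task("u5", 1000, 4, 20.0, "pos", ("pos", "neu")),
]
print(sorted(spot_check_list(tasks)))
```

With this sample data, u5 is flagged for finishing far faster than everyone else, and u3 for disagreeing with both other annotators. A median/MAD rule is used because it stays stable with only a handful of users, where quartile-based fences can miss an obvious speeder.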