Visionary companies are leveraging sentiment analysis models to dig beyond a surface-level understanding of what people are saying and examine the nuances of how they're saying it. However, sentiment in language is a difficult thing to parse. One person's "negative" doesn't always match their neighbor's, and even short phrases can contain layers of nuance.
Those complications are only compounded when it comes to long-form writing like feature stories and product reviews. Ideally, the most sophisticated sentiment models could deliver broad-level, composite scores for long-form content, while simultaneously sifting through individual paragraphs, sentences, and words to extract granular-level insights.
When a Fortune 500 Tech company wanted to turn that ideal into a reality, they partnered with DefinedCrowd.
This is exactly the kind of use case our dedicated team of NLP experts loves to sink its teeth into. The client provided more than 100,000 documents, ranging from short paragraphs left on their site to full-length 1,500-word articles published online.
First, we analyzed those documents and developed an optimal segmentation methodology. On average, we cut each document into 4 distinct pieces, though segment counts varied widely; the longest document contained 84 unique segments.
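The segmentation methodology itself isn't published in this case study, but the core idea of cutting long documents into annotatable pieces can be sketched minimally in Python. The function name and word budget below are illustrative assumptions, not the actual approach:

```python
import re

def segment_document(text: str, max_words: int = 350) -> list[str]:
    """Split a document on blank lines into paragraphs, then merge
    adjacent paragraphs into segments up to a word budget.
    Illustrative sketch only; the production methodology is not public."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    segments: list[str] = []
    buffer = ""
    for para in paragraphs:
        candidate = f"{buffer} {para}".strip()
        if len(candidate.split()) <= max_words:
            buffer = candidate  # paragraph still fits in the current segment
        else:
            if buffer:
                segments.append(buffer)
            buffer = para  # start a new segment with this paragraph
    if buffer:
        segments.append(buffer)
    return segments
```

With a small budget, two paragraphs that don't fit together become two segments; with a generous budget, they merge into one — which is how a short review can yield a single segment while a long article yields dozens.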
Our Neevo contributors tagged the sentiment of each individual segment, while also providing high-level sentiment scores for each document as a whole. During that process, we ran a wide range of automated gatekeeping procedures to monitor the quality of their work in real time. In the end, we sourced half a million annotations on the original 100,000 documents.
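One simple form such automated gatekeeping can take is flagging work submitted faster than a plausible reading speed. The function and threshold below are hypothetical illustrations, not DefinedCrowd's actual procedure:

```python
def flag_rushed_annotations(
    events: list[tuple[str, int, float]],
    min_seconds_per_100_words: float = 5.0,
) -> list[str]:
    """Return contributors whose annotation time implies an implausible
    reading speed. Each event is (contributor, word_count, seconds).
    Threshold is an assumed value for illustration."""
    flagged = []
    for contributor, word_count, seconds in events:
        # Minimum believable time to have actually read the segment
        floor = min_seconds_per_100_words * word_count / 100
        if seconds < floor:
            flagged.append(contributor)
    return flagged
```

Checks like this run per submission, which is what makes the monitoring "real time" rather than a post-hoc audit.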
In partnering with DefinedCrowd, this Fortune 500 Tech company benefited from extensive data expertise, customizable workflows, and full-service data solutions that guaranteed quality results, even for a data collection this complex. Our rigorous qualification tests, analysis of text-to-speed and segment-to-speed ratios, and inter-annotator agreement calculations kept the error rate under 3%. High-precision training data makes for high-performance models. Companies know this all too well, and they choose their data partners accordingly.
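Inter-annotator agreement is commonly quantified with chance-corrected measures such as Cohen's kappa, which compares observed agreement between two annotators against the agreement their label frequencies would produce by chance. A small sketch, not the exact calculation used in this project:

```python
from collections import Counter

def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items
    (e.g. a sentiment tag per segment). 1.0 = perfect agreement,
    0.0 = agreement no better than chance."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items with matching labels
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / (n * n)
    if expected == 1:
        return 1.0
    return (observed - expected) / (1 - expected)
```

For example, two annotators who agree on 3 of 4 segments, with typical label skew, score well below their raw 75% agreement once chance is factored out.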