Smart companies see the pile of unstructured text floating through the digital realm as a strategic goldmine, opening the door to sophisticated consumer insights in real time. However, quantifying something as subjective as sentiment is no easy task. One person’s “positive” doesn’t always match their neighbor’s, and even short phrases sometimes require interpretation (“I love this stupid little office,” for example).
So when a Fortune 500 tech company came to DefinedCrowd seeking Korean-language sentiment annotation data, it needed a data partner capable of collecting and annotating 10,000 input phrases while accounting for the subjectivity inherent in sentiment judgments.
Our workflows, global Neevo community, and automated systems have been expressly designed to handle text sentiment annotation quickly and objectively. First, we collected more than 10,000 domain-specific text selections.
Next, we passed every phrase to our Korean-language contributors and had each one annotated by three separate crowd members, for roughly 30,000 annotations in total.
We delivered the entire set to the client, along with inter-annotator agreement calculations. The client found the data valuable enough to request a second round of collection and annotation. We ran 3,000 more phrases through the exact same process and delivered the results just a few weeks later.
Sentiment analysis truly is a marketer’s goldmine. Models can produce both macro-level analysis (sifting through social media posts and online reviews) and micro-level insights (keeping tabs on individual customer interactions). Companies can then use these insights to shape PR and marketing campaign strategies, and to target those campaigns at specific regions and individual customers. However, the smartest companies understand that if their sentiment analysis models are built on anything but the highest-quality training data, in every language they serve, they’ll deliver nothing more than fool’s gold. By partnering with DefinedCrowd, our client received training data with: