We’re Trailblazing Crowdsourced Language Data for Europe’s AI Revolution
For speech and NLP-based systems to function effectively, they need to be trained on high-quality language data, relevant to the geographic location in which the AI system operates. For example, a chatbot in Germany, Portugal, or France would be far more helpful to users if it could understand German, European Portuguese, or French respectively instead of American English.
However, the language datasets required to train these algorithms are currently unavailable. Businesses or government institutions aiming to launch AI initiatives into the market would need to collect, annotate and validate customized datasets, an expensive and time-consuming undertaking. This shortage of off-the-shelf datasets in a variety of European languages has severely hampered the ability of these institutions and businesses to adapt to and compete in an increasingly AI-driven world.
This is all about to change.
At WebSummit 2020, DefinedCrowd announced the 2021 release of a series of European off-the-shelf language datasets, annotated and validated by a global crowd of over 420,000 contributors.
Available through DefinedData, DefinedCrowd’s online catalog of off-the-shelf datasets, this expansion gives companies developing speech and NLP-based systems in European markets the confidence to move their products to market quickly without compromising quality. The expansion will begin with the launch of a European Portuguese dataset and will complement the 70 datasets already available for download.
Martin Andreas Stein, VP/GM of DefinedData, believes this new release is set to be a game-changer. “We are constantly expanding our high-quality datasets to enable companies and ML products with European-focused audiences to reduce their time to market,” said Martin. “With the acceleration of the digitalization we’re currently witnessing, speed and quality are key for success, and we believe these new datasets will empower European markets to truly compete in a fast-paced industry.”
Daniela Braga, CEO and founder of DefinedCrowd, agrees. “In these remarkably uncertain times, one constant is that technology is helping us tackle the issues of tomorrow,” she said. “We’ve seen the digitization of services, powered by AI and machine learning, create an outsized, positive impact in healthcare as we confront the technological hardships of responding to COVID-19. Our high-quality language data can help countless businesses and governmental organizations adapt to our AI-enabled world.”
Keen to check out our online catalog of off-the-shelf datasets? Browse through our 70 AI training datasets here and keep checking as we add new releases!