Off-The-Shelf Data Available for Immediate Use
Speed your AI products to market with high-quality, off-the-shelf datasets from DefinedCrowd. These pre-collected datasets, annotated and validated by a global crowd, can be used to train baseline models or evaluate and benchmark current models. Browse our robust and dynamic catalog for datasets that suit your specific needs.
Watch the video to see how easy it is to find the speech data you need, or skip ahead and browse the catalog now.
Quality is at Our Core
Your AI models require high-quality datasets, which is why quality underpins everything we do. Our primary quality metric for speech datasets is Word Error Rate (WER): less than 5% for scripted recordings and less than 10% for spontaneous recordings.
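For readers who want to see what these thresholds mean in practice, here is a minimal sketch of a WER calculation. It assumes the standard definition (word-level edit distance between a reference transcript and a hypothesis transcript, divided by the number of reference words) and simple whitespace tokenization. The function names `word_error_rate` and `meets_threshold` are hypothetical and for illustration only; they do not describe DefinedCrowd's internal tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def meets_threshold(wer: float, scripted: bool) -> bool:
    """Assumed limits taken from the text: < 5% scripted, < 10% spontaneous."""
    limit = 0.05 if scripted else 0.10
    return wer < limit


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"WER: {wer:.3f}, passes scripted limit: {meets_threshold(wer, scripted=True)}")
```

In this example one of six reference words is missing, so a single short utterance like this would fall outside either limit. In practice WER is usually aggregated over many utterances rather than judged per sentence.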
For speech collections, we ensure quality by measuring accuracy levels in:
\\ Gender distribution
\\ Age distribution
\\ Noisy vs silent
\\ Nativeness (accuracy of native speakers)
\\ Domain (accuracy in staying on topic)
\\ Segmentation (spontaneous collections)