Browse DefinedCrowd's AI Datasets Catalog
Search our dataset catalog, choose the datasets that meet your requirements, and get your data samples today!
Training a baseline model, testing and evaluating existing ML models, and benchmarking third-party applications.
Word Error Rate (WER) less than 5% on most datasets.
Pre-collected, off-the-shelf datasets available in a wide range of languages, accents and domains.
Subscription or one-time purchase
- Monologue speech training data
- Dialogue speech gold sets
- 10 different languages
- 5 different industry domains
- Balanced and wide range of demographics represented
- Specialized grammars
Advantages of DefinedCrowd's AI Datasets
Our AI datasets are not only available for immediate use, but also built to the same level of quality for which DefinedCrowd has become known. They are versatile enough for a variety of training and testing applications, and are available in the specific languages and domains you need.
Time to Market
Quickly build and improve ML models, or adapt live models for faster expansion.
Purchase pre-collected, pre-annotated and validated datasets for model training and testing.
Choose from a one-time download, or our discounted subscription options – whichever fits your needs.
Speech datasets available in multiple languages, domains and recording options.
Dataset Quality Guaranteed
Our multi-faceted approach to data quality ensures you’re only reducing time to market, not quality. Here are several of the metrics we use for quality control.
Word Error Rate (WER)
Our primary quality metric. Most datasets have a WER below 5%.
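For readers unfamiliar with the metric, WER is conventionally the number of word-level substitutions, deletions and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The sketch below illustrates that standard definition only; it is not DefinedCrowd's scoring pipeline, and the function name `word_error_rate` and the simple whitespace tokenization are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance divided by reference length.

    Illustrative only. Tokenization is a plain whitespace split (an
    assumption); real pipelines usually normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("turn the lights on", "turn the light on"))
```

Under this definition, a WER below 5% corresponds to fewer than one word-level error per twenty reference words.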
Accuracy of audio to a native speaker
Context is specific to a domain
Gender and Age Distribution
Minimizes bias in the dataset