Defined Crowd

DefinedData

High-Quality AI Datasets

Accelerate your AI roadmap with instant access to ethically sourced speech and text data.


Improve Your Automated Speech Recognition (ASR) Performance

Scripted Monologue Speech Training Data

Single-speaker speech following a set of prompts, recorded at 16 kHz in a single-channel audio file.

See all Scripted Monologue Data

Spontaneous Dialogue Speech Training Data

Multi-speaker spontaneous conversations following a given scenario, recorded at 8 kHz in a dual-channel audio file and transcribed with marked speaker turns.

See all Spontaneous Dialogue Data

Spontaneous IVR Speech Training Data

Single-speaker spontaneous conversations with a scripted interactive voice response (IVR) system following a set of scenarios, recorded at 8 kHz in a single-channel audio file and transcribed.

All Spontaneous IVR Datasets
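
The recording formats described above (16 kHz single channel, 8 kHz dual channel, 8 kHz single channel) can be checked on delivered files before training. The following is a minimal sketch using Python's standard `wave` module. It assumes the audio is delivered as uncompressed WAV, which this page does not state, and the function name `matches_format` is hypothetical.

```python
import wave


def matches_format(path: str, expected_rate_hz: int, expected_channels: int) -> bool:
    """Return True if the WAV file at `path` has the expected sample rate and channel count.

    Assumption: files are uncompressed PCM WAV. The source page does not
    specify the container format, so treat this as illustrative only.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == expected_rate_hz
            and wav.getnchannels() == expected_channels
        )
```

For example, a scripted monologue file would be checked with `matches_format(path, 16000, 1)` and a spontaneous dialogue file with `matches_format(path, 8000, 2)`.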

Data Quality Guaranteed

Our multifaceted approach ensures our AI datasets are both accurate and diverse.

Word Error Rate (WER)
WER is our primary quality metric: it measures the accuracy of speech data by comparing the spoken words with their corresponding transcriptions.
Language Testing
For each language, contributor testing ensures speech is representative of the target population.
Domain Specificity
Scripts gathered from industry-specific sources enhance coverage of unique vocabulary.
Gender and Age Distribution
Proactively managed throughout collection to minimize and combat bias in the dataset.
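
For readers unfamiliar with the WER metric mentioned above, it is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn a hypothesis into the reference, divided by the number of words in the reference. The sketch below illustrates that standard definition with a word-level edit distance. It is not a description of how DefinedData computes WER internally, and the lowercase-and-split normalization is a simplifying assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Compute WER = (substitutions + deletions + insertions) / reference word count.

    Assumption: text is normalized only by lowercasing and whitespace
    splitting; real pipelines usually also handle punctuation and numerals.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

With the reference "the cat sat on the mat" and the hypothesis "the cat sit on mat", there is one substitution and one deletion against six reference words, so the function returns 2/6.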

Advantages of DefinedData's Prebuilt Datasets

Simplify access to ethically sourced, high-quality datasets and AI solutions to accelerate go-to-market timelines.

Fast to Market
Accelerate AI model training, tuning, and testing with datasets available for immediate use.
Flexibility
Choose from numerous datasets curated for model training, benchmarking, or domain customization.
Variety
Browse an expansive library of fresh, high-quality data available in multiple languages, domains, and recording environments.
Ethically Sourced
Datasets collected with the explicit consent of contributors ensure compliance with global data privacy regulations.