Collect, Label and Validate Text-Based Training Data
People express ideas and intent in many different ways, which makes understanding them a complex job for your natural language engines. Our text-based AI training data provides high-quality datasets in multiple languages and domains to improve your natural language understanding (NLU), natural language generation (NLG), or text-to-speech (TTS) engines.
Conversational AI Text Collection
Text datasets consisting of conversations between two participants
Text Variant Collection
Datasets of text variants that express a specific concept in different ways
Text Validation
Validates the quality of any text-based dataset against specified criteria
Named Entity Tagging for NER
Annotate and classify single entities in a sentence into pre-defined categories
Multiple Named-Entity Tagging
Annotate and classify multiple entities in a sentence into pre-defined categories
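Both entity-tagging services produce the same kind of record: a sentence plus a list of labeled spans, where single-entity tagging is simply the case of one span. The sketch below shows one way such a record could be stored and checked. It assumes character offsets with an exclusive end, flat (non-overlapping) entities, and a hypothetical label set; `EntitySpan`, `LABELS`, and `validate_spans` are illustrative names, not part of any delivered format.

```python
from dataclasses import dataclass

# Hypothetical pre-defined category set; a real project defines its own.
LABELS = {"PERSON", "ORG", "LOCATION", "DATE"}


@dataclass(frozen=True)
class EntitySpan:
    start: int  # character offset where the entity begins (inclusive)
    end: int    # character offset where the entity ends (exclusive)
    label: str  # one of the pre-defined categories


def validate_spans(text: str, spans: list[EntitySpan]) -> list[str]:
    """Return a list of problems found in one sentence's entity annotations."""
    problems = []
    ordered = sorted(spans, key=lambda s: (s.start, s.end))
    for s in ordered:
        if s.label not in LABELS:
            problems.append(f"unknown label {s.label!r} at {s.start}-{s.end}")
        if not (0 <= s.start < s.end <= len(text)):
            problems.append(f"span {s.start}-{s.end} is outside the text")
    # Assumption: flat NER, so spans must not overlap. After sorting by start,
    # checking neighbours is enough to detect any overlap (though not to list
    # every overlapping pair).
    for a, b in zip(ordered, ordered[1:]):
        if b.start < a.end:
            problems.append(f"spans {a.start}-{a.end} and {b.start}-{b.end} overlap")
    return problems


text = "Ada Lovelace visited London in 1842."
spans = [
    EntitySpan(0, 12, "PERSON"),
    EntitySpan(21, 27, "LOCATION"),
    EntitySpan(31, 35, "DATE"),
]

for s in spans:
    print(text[s.start:s.end], s.label)
print(validate_spans(text, spans))
```

Storing offsets rather than copied substrings keeps each label tied to an exact position, so a sentence that mentions the same name twice can still be annotated unambiguously.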
Sentiment Tagging
Annotate sentences for sentiment, e.g., good, neutral, or bad
Semantic Annotation
Annotate and classify sentences or phrases by domain and intent
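Sentiment and semantic annotation both attach labels to a whole sentence rather than to spans inside it. A minimal sketch, assuming a project collects both tasks in one record: the sentiment values come from the example above, while the domain-to-intent taxonomy and the names `SentenceLabel`, `TAXONOMY`, and `is_valid` are hypothetical. The check also enforces that each intent belongs to its domain.

```python
from dataclasses import dataclass

# Sentiment categories taken from the example above.
SENTIMENTS = {"good", "neutral", "bad"}

# Hypothetical domain -> intent taxonomy, for illustration only.
TAXONOMY = {
    "banking": {"check_balance", "transfer_funds"},
    "travel": {"book_flight", "cancel_booking"},
}


@dataclass(frozen=True)
class SentenceLabel:
    text: str
    sentiment: str
    domain: str
    intent: str


def is_valid(label: SentenceLabel) -> bool:
    """True if every tag is a pre-defined category and the intent belongs to the domain."""
    return (
        label.sentiment in SENTIMENTS
        and label.domain in TAXONOMY
        # Only evaluated when the domain is known, so no KeyError is possible.
        and label.intent in TAXONOMY[label.domain]
    )


ok = SentenceLabel("Please move $50 to my savings account.", "neutral", "banking", "transfer_funds")
mismatched = SentenceLabel("Please move $50 to my savings account.", "neutral", "travel", "transfer_funds")
print(is_valid(ok), is_valid(mismatched))
```

Nesting intents under domains means an intent label cannot drift into a domain where it has no meaning, which is one simple way to keep semantic labels consistent across annotators.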
Text Quality Guarantee
Natural language systems depend on training data that meets multiple quality metrics. We combine Word Error Rate (WER) measurements with our ML algorithms and human-in-the-loop validation to ensure your models operate at maximum accuracy.
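Word Error Rate is conventionally defined as the word-level edit distance between a reference text and a hypothesis, divided by the number of reference words: WER = (S + D + I) / N, counting substitutions, deletions, and insertions. The sketch below computes it with standard dynamic programming. It assumes plain whitespace tokenization with no case or punctuation normalization; how this service normalizes text before scoring is not stated, and `word_error_rate` is an illustrative name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words processed so far and hyp[:j].
    # With no reference words processed, reaching hyp[:j] takes j insertions.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # hyp[:0] needs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(round(wer, 3))  # one substitution + one deletion over 6 words
print(wer < 0.05)     # compare against the < 5% guarantee
```

Because insertions are counted against the reference length, WER is not capped at 1.0: a one-word reference with three hypothesis words scores 2.0.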
| Metric | Standard |
|---|---|
| Spelling and Grammar | Checks for proper syntax in each language |
| Inter-Annotator Agreement | F1 score > 0.8 for all annotations, with dynamic judgment used to break ties |
| Word Error Rate | Guaranteed < 5% |
| Nativeness | Transcriptions are produced by native speakers |
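Inter-annotator agreement can be scored as F1 by treating one annotator's labels as the reference and the other's as predictions. With exact matching on `(start, end, label)` triples, the result is symmetric: F1 = 2|A ∩ B| / (|A| + |B|). The page does not say whether matching is exact or partial, or whether scores are per item or corpus-wide, so the sketch below assumes exact, per-item matching. `pairwise_f1` and `needs_adjudication` are hypothetical names, and the adjudication step only flags items; it does not model the "dynamic judgment" tiebreak.

```python
THRESHOLD = 0.8  # agreement floor listed above (F1 must exceed 0.8)

Span = tuple[int, int, str]  # (start, end, label)


def pairwise_f1(a: set[Span], b: set[Span]) -> float:
    """Exact-match F1 between two annotators' entity sets.

    Either annotator can serve as the reference; the value is the same:
    F1 = 2 * |A & B| / (|A| + |B|).
    """
    if not a and not b:
        return 1.0  # both annotators agree the sentence has no entities
    return 2 * len(a & b) / (len(a) + len(b))


def needs_adjudication(a: set[Span], b: set[Span], threshold: float = THRESHOLD) -> bool:
    """Flag an item whose agreement does not exceed the threshold."""
    return pairwise_f1(a, b) <= threshold


annotator_1 = {(0, 12, "PERSON"), (21, 27, "LOCATION"), (31, 35, "DATE")}
annotator_2 = {(0, 12, "PERSON"), (21, 27, "ORG"), (31, 35, "DATE")}

print(round(pairwise_f1(annotator_1, annotator_2), 3))
print(needs_adjudication(annotator_1, annotator_2))
```

Here the annotators agree on two of three spans, giving F1 = 4/6 ≈ 0.667, so the item would be routed for a tiebreak.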