Evaluate and Fine-Tune Your AI Models
Our suite of subjective evaluation tests can further tune your AI models with accurate human-in-the-loop workflows. If you care about the Quality of Experience for your virtual assistants, TTS models, machine translation engines, or chatbots, partner with us today.
MOS Testing
Obtain a Mean Opinion Score for a single speech, text or image stimulus
MUSHRA Testing
Evaluate and score multiple speech, text or image stimuli
ABX Testing
Compare two choices from a random sample against two reference outputs to identify detectable differences
Pronunciation Validation
A more objective evaluation designed to evaluate text-to-speech
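As a rough illustration of what a Mean Opinion Score is, the sketch below averages individual ratings for one stimulus. It assumes a 1 to 5 rating scale, which is common for MOS tests but is not stated on this page, and the function name `mean_opinion_score` is hypothetical rather than part of any DefinedCrowd API.

```python
from statistics import mean


def mean_opinion_score(ratings, low=1, high=5):
    """Return the arithmetic mean of the ratings for a single stimulus.

    The 1-5 scale bounds are an assumption made for this sketch.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        if not low <= r <= high:
            raise ValueError(f"rating {r} is outside the {low}-{high} scale")
    return mean(ratings)


if __name__ == "__main__":
    # Four evaluators rate the same synthesized utterance.
    print(mean_opinion_score([4, 5, 3, 4]))
```

In practice a MOS is usually reported with the number of ratings behind it, since a mean over a handful of opinions is much less stable than one over many.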
Evaluation Quality Guarantee
Given the subjective nature of opinion-based work, the same level of DefinedCrowd quality cannot be guaranteed. However, multiple quality metrics are used to ensure that evaluations, including those of the naturalness, trustworthiness, and likeability of your AI model, are performed and validated.
Spammer control
IP location-based blocking of contributors
Inter-evaluator checks
Agreement score calculations using Pearson Correlation Coefficient
Standard deviation
Answer variety thresholds based on standard deviation and evaluations per contributor
Real-time audits
Answer pattern detection for suspicious behavior
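The inter-evaluator and standard deviation checks listed above can be sketched roughly as follows. This is a minimal illustration, not DefinedCrowd's actual implementation: the function names, the `min_std` threshold of 0.5, and the choice to treat constant ratings as zero agreement are all assumptions made for the example.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation coefficient between two evaluators' ratings
    of the same items, in the same order."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length rating lists with 2+ items")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for constant ratings; this sketch
        # treats that case as no agreement (an assumption).
        return 0.0
    return cov / sqrt(var_x * var_y)


def low_answer_variety(ratings, min_std=0.5):
    """Flag a contributor whose answers vary less than a threshold.

    min_std is a hypothetical value chosen for illustration.
    """
    return pstdev(ratings) < min_std


if __name__ == "__main__":
    evaluator_a = [1, 2, 3, 4, 5]
    evaluator_b = [2, 2, 3, 5, 5]
    print(pearson(evaluator_a, evaluator_b))
    print(low_answer_variety([3, 3, 3, 3]))
```

A contributor who gives the same score to every item would be flagged by the variety check, while a low or negative correlation with other evaluators on shared items would suggest their answers merit review.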