Siri, Alexa, Google Assistant, and other speech recognition systems can have trouble understanding the accents and speech patterns of people from minority groups.
As speech recognition technology has gained traction in the last few decades, one prevalent problem has emerged: racial bias. A recent study published in PNAS demonstrates “large racial disparities in the performance of five popular commercial ASR systems”, meaning that speech recognition technologies are far less useful to some people than they are to others.
A New York Times article reported “as much as a 16% difference in the number of misidentified words from speech recognition systems when used by Black and white speakers”.
As AI adoption grows across industries, questions arise about fairness and how to ensure it in these systems. Understanding how to detect and avoid bias in AI models is a crucial research topic, and it becomes more important as AI expands into new sectors.
The Accent Gap
Black speakers are not the only demographic speech recognition technologies have trouble understanding. The US is home to a large population of nonnative English speakers, the majority of whom (around 60%) speak Spanish. And voice assistants often are unable to understand these bilingual users.
A recent “accent gap” study published by the Washington Post found that Chinese- and Spanish-accented English are the most difficult for Alexa and Google Home to understand, even though Chinese and Spanish have more native speakers than any other languages in the world.
The fact is, speech recognition technologies are not nearly as accurate at understanding nonnative accents as they are at understanding white, non-immigrant, upper-middle-class Americans. This is not surprising: it was this demographic that had access to the technology from the beginning, and whose speech trained it.
Unfortunately, it has resulted in models that exclude a large portion of the population.
For companies deploying speech technologies in the U.S., this is clearly bad for business. “For companies with AI solutions to compete in the large nonnative English-speaking market in the U.S., speech models need to be able to understand a wide range of different Spanish accents, originating from all the Americas,” said Christopher Shulby, Director of Machine Learning Engineering at DefinedCrowd.
So, with so many accent variations, how do speech and voice technologies keep up? The answer is diverse training data.
The secret to successful speech technologies is inclusiveness, achieved by training a model on accented speech data that is representative of diverse groups of people. By using speech corpora that are broad both in the words used and in how they are said, systems can understand a far wider range of accents and ways of speaking.
And the more people your model can understand, the more likely you are to acquire and retain customers. However, many companies do not have the resources to train or test their systems with different accents, meaning that speech recognition systems are likely to provide an unresponsive, inaccurate, and even isolating experience to nonnative English speakers.
Closing the Accent Gap
To enable AI developers to test for the accent gap in their technologies, DefinedCrowd is giving away nine hours of Spanish-accented English speech data from the Americas, worth $1,350.
It’s safe to say that there will never be one “right” way to speak English. But we can agree there’s only one right way to build AI: ethically and inclusively for all demographics, rather than a select few.
Get the free dataset now!