The News: Several major technology companies are training their voice assistants to better serve people with speech disabilities and speech patterns that today’s voice-processing technologies struggle to interpret accurately. Among the companies working on improving voice assistants’ processing of atypical speech are Apple (Siri), Amazon (Alexa), and Google (Google Assistant). This new generation of speech recognition layers doesn’t stop at accessibility, however. It also signals a wave of improvements that will have a wide-ranging impact on voice tech in the workplace, and we may begin to see some of these improvements turn up in everyday productivity applications by the end of this year. Read more at the Wall Street Journal.
What Voice Assistants Trained to Understand Atypical Speech Mean for the Future of Work
Analyst Take: The gist of the Wall Street Journal article is that, according to the National Institute on Deafness and Other Communication Disorders, roughly 7.5 million people in the U.S. have trouble using their voices and are therefore at risk of being left behind by voice-recognition technology. Individuals with dysarthria, cerebral palsy, Parkinson’s disease, and brain tumors, who may have less access than most to nonverbal options like standard keyboards or even gesture recognition interfaces, stand to benefit the most from these enhancements. Companies like Google, Amazon, and Apple are working to address that market’s needs by making their voice assistants better at both identifying and processing atypical speech. On its face, this is a wonderful project that will help millions of people with speech and motor challenges leverage voice technology to improve their quality of life, gain more autonomy through everyday human-machine partnerships, participate more in the economy, and collaborate better with coworkers, even in remote work environments.
The additional opportunity that I see emerging from voice assistants trained to understand atypical speech, and potentially its biggest positive side effect, is that it will considerably improve voice assistant and voice recognition technology in general, for everyone else as well. This is especially true in four key areas, which also happen to be current pain points for many organizations. Here’s a look at those four areas:
- Eliminating “Um,” “Uh,” and Other Unwanted Verbal Tics From Live Meeting Transcriptions
For instance, training voice assistants to understand atypical speech, and specifically to look for and identify what may be considered flaws in typical speech patterns, is the same process by which they can be trained to identify words like “uh” and “um.” If you have ever tried to rely on voice-activated transcriptions during meetings and interviews, you will no doubt be aware of how painful the inability to identify and remove those verbal tics can be. As small as this one capability may be, it is far from insignificant, particularly when the conversation being transcribed is a deposition, financial disclosure, mission-critical briefing, or political speech. Clarity is key, and accurate transcription would be well served by the ability to automatically remove constant interruptions like verbal tics, vocalized hesitations, and grunted punctuations from the substance of the communication being transcribed.
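To make the idea concrete, here is a minimal sketch of what filler removal could look like as a text clean-up step applied after transcription. It is an illustration only, not how Apple, Amazon, or Google handle it: the `FILLERS` list and the `strip_fillers` function are hypothetical names, and a real system would more likely detect hesitations in the audio itself or with a trained model rather than with a fixed word list.

```python
import re

# Hypothetical filler list, chosen for illustration only.
FILLERS = ("um", "uh", "er", "erm", "hmm")

# Match a filler as a whole word, together with the commas and spaces
# that typically surround it in a transcript.
_FILLER_RE = re.compile(
    r"(?:,\s*)?\b(?:" + "|".join(FILLERS) + r")\b,?\s*",
    flags=re.IGNORECASE,
)


def strip_fillers(transcript: str) -> str:
    """Return the transcript with standalone filler words removed."""
    # Replace each filler (and its surrounding commas) with one space.
    cleaned = _FILLER_RE.sub(" ", transcript)
    # Collapse repeated whitespace left behind by the substitution.
    cleaned = re.sub(r"\s{2,}", " ", cleaned)
    # Remove any space that now sits directly before punctuation.
    cleaned = re.sub(r"\s+([,.;:?!])", r"\1", cleaned)
    return cleaned.strip()


if __name__ == "__main__":
    print(strip_fillers("Um, the board will, uh, meet on Tuesday."))
```

Because the pattern only matches whole words, ordinary words that happen to contain a filler (such as "umbrella") are left alone. The sketch does not restore capitalization at the start of a sentence, and it cannot tell a hesitation from a meaningful use of the same sound, which is part of why the problem is better solved in the recognition layer itself.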
- Eliminating Embarrassing and Potentially Harmful Speech-to-Text Transcription Errors
Along similar lines, training voice assistants to better identify atypical speech means they can also be trained to simply listen better. Going back to the current unreliability of voice-activated transcriptions, improved performance in this area could significantly limit the number of errors that currently plague such systems. These errors range from unimportant (e.g., the voice assistant mistaking “bored” for “board”) to potentially dire (e.g., the voice assistant mistaking “non-priority” for “priority”). When relying on automated transcription to create a record of a conversation, briefing, interview, report, or speech, it is imperative that the voice-recognition technology be capable of differentiating between contrary words or combinations of words that may sound similar, even if the speaker didn’t enunciate them clearly. Currently, these mistakes are frequent. In the near future, as these technologies become better at recognizing and processing atypical speech, these lapses in performance should begin to fade away.
- Understanding and Transcribing All Regional Accents Equally Well
Voice recognition technologies have also been fairly unreliable when it comes to understanding certain types of accents. While accents aren’t atypical speech per se, the same type of improvements that will enable voice assistants to better understand users with speech challenges can also be applied to teaching voice assistants to understand accents better, and therefore transcribe their words with more accuracy.
As remote work continues to become the new bedrock of digital collaboration, voice-processing technologies are increasingly embedding themselves in the very platforms that organizations depend on for those types of communications. Zoom, Microsoft Teams, Webex Teams, Slack, and others will all require accurate transcription of meetings as a matter of basic best practices, and voice processing technology must evolve accordingly, regardless of regional accents, speech challenges, or even poor diction.
- Improving the Accuracy of Simultaneous Translations
Improving the accuracy of voice-processing technologies by applying lessons learned from adapting to atypical speech will also have a significant impact on automatic speech-to-text translation, which still finds itself hindered by the very limitations discussed above. Better hearing means better transcription, which in turn means more accurate translations. As heartened as I am by automated translations growing from a library of about a dozen languages to about a hundred languages in the coming year (more on that in a separate article), it is equally important that technology companies focus on deepening rather than just broadening the scope of their voice-processing technologies. In sum, quality and quantity both matter.
Promising Projects from Google, Apple, and Amazon Aiming to Improve Voice Processing Accuracy in 2021
The evolution of collaboration and voice-processing technologies is well underway and exciting to watch. Promising projects from Google, Apple, Amazon, and others are in development, and some are already taking shape. In December, Amazon announced an Alexa integration with Voiceitt, an Israeli startup that lets people with speech impairments train an algorithm to recognize their own unique vocal patterns. Apple is looking for ways to deliver its Hold to Talk feature, which keeps Siri from interrupting users when they pause, without requiring them to touch the device. Siri should soon be able to automatically detect when someone has a stutter or isn’t quite done speaking. In a similar vein, Google’s Project Euphonia is training Google Assistant and smart Google Home products to learn from their individual users’ unique speech patterns.
As silicon, software, and AI improve together and are put to use actively learning from hundreds of millions of voice assistant and video collaboration software users worldwide, we should reasonably expect to see noticeable improvements in the quality of real-time voice processing by the end of 2021. It may take a few more years to really iron out the wrinkles that still keep speech processing from being as accurate as we would all like it to be, but we’re getting there.
Futurum Research provides industry research and analysis. These columns are for educational purposes only and should not be considered in any way investment advice.