Researchers find high error rates in commercial speech recognition systems

22/10/2020

Some automatic speech recognition (ASR) systems might be less accurate than previously assumed. That’s the top-level finding of a recent study by researchers at Johns Hopkins University, the Poznań University of Technology in Poland, the Wrocław University of Science and Technology, and communications company Avaya, which benchmarked commercial speech recognition models on an internally created dataset. The coauthors claim that the word error rates (WER), a common speech recognition performance metric, were significantly higher than the best reported results and that this could indicate a wider-ranging problem in the field of natural language processing (NLP).

ASR has become ubiquitous; it transcribes meetings and e-mails, helps to manage smart appliances, and more. A comprehensive benchmark of ASR models cites WER as low as 2% to 3% on standard corpora, but the coauthors of this latest report dispute that those figures hold for natural conversation. Most interactions with ASR systems are “chatbot-like interactions,” they claim, in which people know they’re conversing with a machine and so simplify their commands to short, well-structured phrases, free of the disfluencies that mark natural conversation.

The coauthors evaluated several ASR systems on a dataset of 50 call center conversations from 1,595 agents and 1,261 customers, spanning 8.5 hours of audio, 2.2 hours of which was speech. Depending on the dataset, the ASR systems’ previously published error rates didn’t exceed 15% and dropped as low as 2%. The study’s findings stand in contrast: tested across recorded phone conversations about finance, insurance, telecom, and booking, the systems showed WER as high as 23.31%. The highest rates were on the booking and telecom calls, perhaps because those conversations referred to specific dates and times, money, places, and product and company names. But WER was above 13.73% in every domain.
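
For readers unfamiliar with the metric, WER is conventionally defined as (S + D + I) / N: the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The sketch below is one minimal way to compute it with a word-level edit distance. It is not the study's evaluation code; the function name, the lowercase-and-split normalization, and the sample sentences are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn the first i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical booking-style utterance: one dropped word ("a") and one
# misrecognized place name ("boston" heard as "austin") out of 8 words.
reference = "book a flight to boston on may third"
hypothesis = "book flight to austin on may third"
print(f"WER: {word_error_rate(reference, hypothesis):.2%}")  # WER: 25.00%
```

Because insertions count as errors while the denominator is the reference length, WER can exceed 100%. A single misheard date or company name in a short utterance also moves the figure substantially, which is consistent with the coauthors' suggestion that entity-heavy calls are harder.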

Figure: Automatic speech recognition WER

The researchers attribute the disparity to frequently used benchmarks like Librispeech (1,000 hours of English audiobook recordings), WSJ (dictations and conversations from journalists), and Switchboard (phone exchanges), which they say might be too simple to truly challenge ASRs. Even more holistic benchmarks suffer from the “domain adaptation problem”: while they attempt to mimic real, spontaneous conversations, they’re inherently artificial because they involve pairs of voice actors conversing on subjects drawn from agreed-upon topics. Other benchmark datasets come from scripted or semi-scripted speech like TED Talks. Moreover, the datasets tend to be homogeneous with respect to voice actor demographics. Non-native speakers are virtually absent from benchmark datasets, and factors like pronunciation, linguistics, and gender often aren’t accounted for.

“Benchmark datasets do not represent the true diversity of real-world conversations, both at input signal characteristics and conversation semantics levels,” the coauthors wrote. “The domain of application imposes strict constraints on the vocabulary and the form of the conversations … There are consequential differences between scripted and spontaneous conversations and they affect the results of the ASR evaluation.”

As a remedy, the researchers suggest the ASR and NLP communities collect and annotate audio datasets better aligned with contemporary applications of ASR systems. They also call for work on extended and more inclusive acoustic models representing a broader spectrum of dialects, as well as models that account for technological advances that influence physical properties of processed audio signals.

“These problems are not insurmountable. A thoughtful collaboration between academia and industry partners can lead to the creation of high-quality training and testing datasets,” the researchers continued. “We believe that the overly optimistic perception of ASR accuracy is detrimental to the development of conversational natural language processing downstream applications.”

