viralamo

Google’s MixIT AI isolates speakers in audio recordings

26/06/2020

In a paper published on the preprint server arXiv.org, researchers at Google and the University of Illinois propose mixture invariant training (MixIT), an unsupervised approach to separating, isolating, and enhancing the voices of multiple speakers in an audio recording. The approach requires only single-channel (i.e., monaural) acoustic features, and the researchers claim it “significantly” improves speech separation performance by incorporating reverberant mixtures and a large amount of in-the-wild training data.

As the paper’s coauthors point out, audio perception suffers from a fundamental problem: sounds are mixed together in a way that’s impossible to disentangle without knowledge of the sources’ characteristics. Attempts have been made to design algorithms capable of estimating each sound source from single-channel recordings, but most to date are supervised, meaning they train on audio mixtures created by adding isolated sounds together, with or without simulations of the environment. As a result, they fare poorly in the presence of acoustic reverberation or when there’s a mismatch in the distribution of sound types. This is due to several factors. First, it’s tough to match the characteristics of a real corpus, and the room characteristics are sometimes unknown. Second, data for every source type in isolation might not be readily available. Third, accurately simulating realistic acoustics is difficult.

The researchers claim MixIT addresses these challenges by training on acoustic mixtures that have no reference (ground-truth) sources. Training examples are constructed by mixing together existing audio mixtures; the system separates each combined example into a number of sources, which are then remixed to approximate the original mixtures.
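
The remixing objective can be illustrated with a short Python sketch. It is not the researchers' implementation: it assumes each separated source is assigned to exactly one of the input mixtures, searches all assignments by brute force, and uses mean squared error as a stand-in for whatever signal-level loss the paper uses. The names `remix` and `mixit_loss` are hypothetical, and the "separated" sources are hand-written rather than produced by a model.

```python
import itertools


def remix(sources, assignment, n_mixtures):
    """Sum each source into the mixture it is assigned to."""
    length = len(sources[0])
    remixed = [[0.0] * length for _ in range(n_mixtures)]
    for source, mix_idx in zip(sources, assignment):
        for t, sample in enumerate(source):
            remixed[mix_idx][t] += sample
    return remixed


def mixit_loss(mixtures, est_sources):
    """Return (loss, assignment) for the best remix of est_sources.

    Assumption: squared error stands in for the real training loss,
    and every source goes to exactly one mixture.
    """
    n_mix = len(mixtures)
    n_samples = n_mix * len(mixtures[0])
    best_loss, best_assignment = float("inf"), None
    # Try every way of assigning each source to one of the mixtures.
    for assignment in itertools.product(range(n_mix), repeat=len(est_sources)):
        remixed = remix(est_sources, assignment, n_mix)
        loss = sum(
            (x - y) ** 2
            for mix, rem in zip(mixtures, remixed)
            for x, y in zip(mix, rem)
        ) / n_samples
        if loss < best_loss:
            best_loss, best_assignment = loss, assignment
    return best_loss, best_assignment


# Toy example: four "separated" sources and the two mixtures they came from.
sources = [
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 0.0],
    [0.5, 0.5, 0.5],
    [-1.0, 1.0, 0.0],
]
mix_a = [a + b for a, b in zip(sources[0], sources[2])]
mix_b = [a + b for a, b in zip(sources[1], sources[3])]
# In training, the model would see only this sum of the two mixtures.
mixture_of_mixtures = [a + b for a, b in zip(mix_a, mix_b)]

loss, assignment = mixit_loss([mix_a, mix_b], sources)
print(loss, assignment)
```

In this toy case the sources are perfect, so the search finds an assignment that reconstructs both mixtures exactly. During training the sources would come from the separation model given `mixture_of_mixtures`, and the loss under the best assignment would drive the model's updates. No isolated reference sources are needed at any point, which is what makes the approach unsupervised.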

In experiments, MixIT was trained using four Google Cloud tensor processing units (TPUs) to tackle three tasks: speech separation, speech enhancement, and universal sound separation. For speech separation, the researchers drew on the open source WSJ0-2mix and Libri2Mix data sets to extract over 390 hours of recordings of male and female speakers. They added a reverberation effect before feeding a mixture of the two sets (three-second clips from WSJ0-2mix and 10-second clips from Libri2Mix) to the model.

For the speech enhancement task, they collected non-speech sounds from FreeSound.org to test whether MixIT could be trained to remove noise from a mixture containing LibriSpeech voices. And for the universal sound separation task, they used the recently released Free Universal Sound Separation data set to train MixIT to separate arbitrary sounds from an acoustic mixture.

The researchers report that in universal sound separation and speech enhancement, unsupervised training wasn’t as helpful relative to existing approaches, presumably because the test sets were “well-matched” to the supervised training domain. Even so, for universal sound separation, unsupervised training appeared to help slightly with generalization to the test set relative to supervised-only training. While it didn’t reach supervised levels, the coauthors claim MixIT’s no-supervision performance was “unprecedented.”

Here’s a recording fed into the model:


https://venturebeat.com/wp-content/uploads/2020/06/Example_2_mix-1.wav

Here are the separate audio sources:

https://venturebeat.com/wp-content/uploads/2020/06/Example_2_Unsup_FUSS_sep1-1.wav
https://venturebeat.com/wp-content/uploads/2020/06/Example_2_Unsup_FUSS_sep0-1.wav

Here’s another recording fed into the model:

https://venturebeat.com/wp-content/uploads/2020/06/Example_1_mix-2.wav

And here’s what the model isolated:

https://venturebeat.com/wp-content/uploads/2020/06/Example_1_Matched_unsupervised_2-source_mixtures_sep1.wav
https://venturebeat.com/wp-content/uploads/2020/06/Example_1_Matched_unsupervised_2-source_mixtures_sep0.wav

“MixIT opens new lines of research where massive amounts of previously untapped in-the-wild data can be leveraged to train sound separation systems,” the researchers wrote. “An ultimate goal is to evaluate separation on real mixture data; however, this remains challenging because of the lack of ground truth. As a proxy, future experiments may use recognition or human listening as a measure of separation, depending on the application.”
