viralamo

Researchers develop AI that reads lips from video footage

05/12/2019

AI and machine learning algorithms that read lips from video are no longer unusual. Back in 2016, researchers from Google and the University of Oxford detailed a system that could annotate video footage with 46.8% accuracy, outperforming a professional human lip-reader’s 12.4% accuracy. But even state-of-the-art systems struggle to resolve ambiguities in lip movements, which keeps their performance below that of audio-based speech recognition.

In pursuit of a more performant system, researchers at Alibaba, Zhejiang University, and the Stevens Institute of Technology devised a method dubbed Lip by Speech (LIBS), which uses features extracted from speech recognizers as complementary clues. They say it achieves industry-leading accuracy on two benchmarks, besting the baseline by margins of 7.66% and 2.75% in character error rate, respectively.
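For readers unfamiliar with the metric, character error rate is commonly computed as the character-level edit distance between a predicted transcript and the reference, divided by the reference length. The sketch below follows that common definition; the function names are illustrative, and the researchers' exact scoring procedure is not described here, so treat this as an assumption and not as their evaluation code.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: fewest insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# One wrong character out of four gives a CER of 0.25.
print(character_error_rate("abcd", "abxd"))
```

Lower is better under this metric, so a system that "bests the baseline" on character error rate is one whose transcripts need fewer character edits to match the reference.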

LIBS and other solutions like it could help people who are hard of hearing follow videos that lack subtitles. An estimated 466 million people, or about 5% of the world’s population, have disabling hearing loss. By 2050, the number could rise to over 900 million, according to the World Health Organization.

LIBS distills useful audio information from videos of human speakers at multiple scales: the sequence level, the context level, and the frame level. Because the audio and video are sampled at different rates, and blanks sometimes appear at the beginning or end, the two sequences have inconsistent lengths. LIBS therefore aligns the audio data with the video data by identifying the correspondence between them, and it applies a filtering technique to refine the distilled features.
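One generic way to relate two feature sequences of different lengths is soft attention: each video frame takes a weighted average of all audio frames, with weights based on similarity. The sketch below illustrates that idea together with a simple frame-level matching loss. It is an assumption-laden illustration, not the researchers' implementation: the function names, the dot-product similarity, the squared-error loss, and the toy feature values are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def align_audio_to_video(video_feats, audio_feats):
    """Return one audio-derived vector per video frame via dot-product attention."""
    dim = len(audio_feats[0])
    aligned = []
    for v in video_feats:
        weights = softmax([dot(v, a) for a in audio_feats])
        aligned.append([
            sum(w * a[k] for w, a in zip(weights, audio_feats))
            for k in range(dim)
        ])
    return aligned


def frame_matching_loss(video_feats, aligned_audio):
    """Mean squared distance between video frames and their aligned audio vectors."""
    total = 0.0
    for v, a in zip(video_feats, aligned_audio):
        total += sum((x - y) ** 2 for x, y in zip(v, a))
    return total / len(video_feats)


# Toy example: 2 video frames versus 4 audio frames (audio is sampled more densely).
video = [[1.0, 0.0], [0.0, 1.0]]
audio = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
aligned = align_audio_to_video(video, audio)
loss = frame_matching_loss(video, aligned)
```

The output has exactly one vector per video frame regardless of how many audio frames there are, which is what makes a frame-by-frame comparison possible despite the length mismatch.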

Both the speech recognizer and lip reader components of LIBS are based on an attention-based sequence-to-sequence architecture, a method from machine translation that maps an input sequence (here, audio or video) to an output with a tag and attention value. The researchers trained them on LRS2, which contains more than 45,000 spoken sentences from the BBC, and on CMLR, the largest available Chinese Mandarin lip-reading corpus, with over 100,000 natural sentences from the China Network Television website (including over 3,000 Chinese characters and 20,000 phrases).

The team notes that the model struggled to achieve “reasonable” results on the LRS2 data set, owing to the shortness of some sentences. (The decoder struggles to extract relevant information from sentences with fewer than 14 characters.) However, once it was pre-trained on sentences with a maximum length of 16 words, the decoder improved the quality of the end parts of sentences in the LRS2 data set by leveraging context-level knowledge. “[LIBS reduces] the focus on unrelated frames,” wrote the researchers in a paper describing their work. “[T]he frame-level knowledge distillation further improves the discriminability of the video frame features, making the attention more focused.”
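The pre-training step mentioned above amounts to selecting a subset of the corpus by sentence length before training on everything. A minimal sketch of that selection follows, assuming transcripts are plain strings and words are separated by whitespace; the helper name and the example sentences are hypothetical, and the researchers' actual data pipeline is not described here.

```python
def split_for_pretraining(sentences, max_words=16):
    """Return (short_subset, full_set), where short_subset has at most max_words words."""
    short_subset = [s for s in sentences if len(s.split()) <= max_words]
    return short_subset, list(sentences)


# Hypothetical transcripts, used only to illustrate the split.
corpus = [
    "good evening",
    "here is the news at ten",
    " ".join(["word"] * 20),  # a 20-word sentence, excluded from pre-training
]
pretrain_set, full_set = split_for_pretraining(corpus)
```

A model would first be trained on `pretrain_set` and then on `full_set`.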

