Google Brain’s AI achieves state-of-the-art text summarization performance

23/12/2019

Summarizing text is a task at which machine learning algorithms are improving, as evidenced by a recent paper published by Microsoft. That’s good news — automatic summarization systems promise to cut down on the amount of message-reading enterprise workers do, which one survey estimates amounts to 2.6 hours each day.

Not to be outdone, a Google Brain and Imperial College London team built a system, Pre-training with Extracted Gap-sentences for Abstractive Summarization Sequence-to-sequence (Pegasus), that leverages Google's Transformer architecture combined with pretraining objectives tailored for abstractive text generation. They say it achieves state-of-the-art results on 12 summarization tasks spanning news, science, stories, instructions, emails, patents, and legislative bills, and that it shows "surprising" performance on low-resource summarization, surpassing previous top results on six data sets with only 1,000 examples.

As the researchers point out, text summarization aims to generate accurate and concise summaries from input documents. In contrast to extractive techniques, which merely copy fragments from the input, abstractive summarization may produce novel words and phrases while covering the principal information, keeping the output linguistically fluent.
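To make the contrast concrete, here is a rough sketch of the extractive baseline that abstractive models go beyond: score each sentence by summed word frequency and copy the top sentence verbatim. The scoring scheme is a deliberately naive illustration, not any method from the paper.

```python
import re
from collections import Counter

def extractive_summary(document, n=1):
    """Naive extractive baseline: score each sentence by summed word
    frequency and copy the top-n sentences verbatim. An abstractive
    model instead generates new phrasing for the same content."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    # Corpus-wide word frequencies serve as a crude importance signal.
    freqs = Counter(w for s in sentences for w in s.lower().split())
    ranked = sorted(
        sentences,
        key=lambda s: sum(freqs[w] for w in s.lower().split()),
        reverse=True,
    )
    return " ".join(ranked[:n])

doc = "The cat sat. The cat ran fast. Dogs bark."
print(extractive_summary(doc))  # copies the highest-scoring sentence verbatim
```

Note that the output is always a verbatim substring of the input, which is exactly the limitation abstractive systems like Pegasus are designed to overcome.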

Transformers are a type of neural architecture introduced in a paper by researchers at Google Brain, Google’s AI research division. As do all deep neural networks, they contain functions (neurons) arranged in interconnected layers that transmit signals from input data and slowly adjust the synaptic strength (weights) of each connection — that’s how all AI models extract features and learn to make predictions. But Transformers uniquely have attention. Every output element is connected to every input element, and the weightings between them are calculated dynamically.
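The dynamically calculated weightings described above can be sketched as scaled dot-product attention, the core operation of the Transformer. This is a minimal NumPy illustration with toy random matrices, not the production implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: each output row is a dynamically
    weighted mixture of the value rows, with weights computed on the fly
    from query-key affinities."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)    # affinity of every query to every key
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

# Toy example: 3 output positions attending over 4 input positions.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, w = attention(Q, K, V)
```

Each row of `w` connects one output element to every input element, which is the "every output element is connected to every input element" property the paragraph describes.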

The team devised a training task in which whole, and putatively important, sentences within documents were masked. The AI had to fill in the gaps by drawing on web and news articles, including those contained within a new corpus (HugeNews) the researchers compiled.
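The gap-sentence training task can be sketched as follows: score each sentence's importance against the rest of the document, replace the top scorers with a mask token, and ask the model to generate them. This uses a simple unigram-overlap F1 as a stand-in for the ROUGE-based selection in the paper; the `<mask_1>` token and helper names are illustrative assumptions, not the paper's exact implementation.

```python
import re
from collections import Counter

MASK = "<mask_1>"  # illustrative sentinel standing in for a masked sentence

def rouge1_f1(candidate, reference):
    """Unigram-overlap F1 between two token lists (a crude stand-in
    for the ROUGE-1 scoring used to pick 'principal' sentences)."""
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    p = overlap / len(candidate)
    r = overlap / len(reference)
    return 2 * p * r / (p + r)

def make_gap_sentence_example(document, n_mask=1):
    """Mask the sentences that best summarize the rest of the document,
    returning a (masked_input, target) pretraining pair."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    tokens = [s.lower().split() for s in sentences]
    scores = []
    for i, toks in enumerate(tokens):
        # Score sentence i against all the other sentences combined.
        rest = [t for j, ts in enumerate(tokens) if j != i for t in ts]
        scores.append((rouge1_f1(toks, rest), i))
    masked = {i for _, i in sorted(scores, reverse=True)[:n_mask]}
    inp = " ".join(MASK if i in masked else s for i, s in enumerate(sentences))
    tgt = " ".join(s for i, s in enumerate(sentences) if i in masked)
    return inp, tgt
```

The model then learns to reconstruct `tgt` from `inp`, which forces it to produce summary-like sentences rather than just fill in single words.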

In experiments, the team selected their best-performing Pegasus model — one with 568 million parameters, or variables learned from historical data — trained on either 750GB of text extracted from 350 million web pages (Common Crawl) or on HugeNews, which spans 1.5 billion articles totaling 3.8TB collected from news and news-like websites. (The researchers say that in the case of HugeNews, a whitelist of domains ranging from high-quality news publishers to lower-quality sites was used to seed a web-crawling tool.)

Pegasus achieved high linguistic quality in terms of fluency and coherence, according to the researchers, and it didn’t require countermeasures to mitigate disfluencies. Moreover, in a low-resource setting with just 100 example articles, it generated summaries at a quality comparable to a model that had been trained on a full data set ranging from 20,000 to 200,000 articles.
