viralamo

A brief overview of IBM’s new 7 nm Telum mainframe CPU

02/09/2021

Each Telum package consists of two 7nm, eight-core / sixteen-thread processors running at a base clock speed above 5GHz. A typical system will have sixteen of these packages in total, arranged in four-socket “drawers.”

From the perspective of a traditional x86 computing enthusiast—or professional—mainframes are strange, archaic beasts. They’re physically enormous, power-hungry, and expensive by comparison to more traditional data-center gear, generally offering less compute per rack at a higher cost.

This raises the question, “Why keep using mainframes, then?” Once you hand-wave the cynical answers that boil down to “because that’s how we’ve always done it,” the practical answers largely come down to reliability and consistency. As AnandTech’s Ian Cutress points out in a speculative piece focused on the Telum’s redesigned cache, “downtime of these [IBM Z] systems is measured in milliseconds per year.” (If true, that’s at least seven nines.)
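
To check that parenthetical: a year has 31,536,000 seconds, so seven nines (99.99999% availability) allows roughly 3.15 seconds of downtime per year, and anything measured in milliseconds clears that bar. The short Python sketch below does the arithmetic; the sample downtime values are hypothetical, not IBM measurements.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s, ignoring leap years


def availability(downtime_s: float) -> float:
    """Fraction of the year the system is up, given yearly downtime in seconds."""
    return 1.0 - downtime_s / SECONDS_PER_YEAR


def nines(downtime_s: float) -> int:
    """Count the leading nines in the availability figure (must be > 0 s downtime)."""
    unavailable = downtime_s / SECONDS_PER_YEAR
    return math.floor(-math.log10(unavailable))


# Hypothetical yearly downtimes in milliseconds, not IBM figures.
for ms in (3200, 999, 100, 10):
    s = ms / 1000
    print(f"{ms:>5} ms/year -> {availability(s):.9%} up, {nines(s)} nines")
```

Anything under a second a year lands at seven nines or better; a full 3.2 seconds drops to six.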

IBM’s own announcement of the Telum hints at just how different mainframe and commodity computing’s priorities are. It casually describes Telum’s memory interface as “capable of tolerating complete channel or DIMM failures, and designed to transparently recover data without impact to response time.”

When you pull a DIMM from a live, running x86 server, that server does not “transparently recover data”—it simply crashes.

IBM Z-series architecture

Telum is designed to be something of a one-chip-to-rule-them-all for mainframes, replacing a much more heterogeneous setup in earlier IBM mainframes.

The 14 nm IBM z15 CPU which Telum is replacing features five total processors—two pairs of 12-core Compute Processors and one System Controller. Each Compute Processor hosts 256MiB of L3 cache shared between its 12 cores, while the System Controller hosts a whopping 960MiB of L4 cache shared between the four Compute Processors.


Together, these five chips (four Compute Processors and one System Controller) constitute a “drawer,” and a single z15-powered mainframe holds up to five drawers.

Although the concept of multiple processors to a drawer and multiple drawers to a system remains, the architecture inside Telum itself is radically different—and considerably simplified.

Telum architecture

At first glance, Telum is somewhat simpler than z15 was: it’s an eight-core processor built on Samsung’s 7nm process, with two processors combined on each package (similar to AMD’s chiplet approach for Ryzen). There is no separate System Controller processor; all of Telum’s processors are identical.

From here, four Telum CPU packages combine to make one four-socket “drawer,” and four of those drawers go into a single mainframe system. This provides 256 total cores on 32 processor chips. Each core runs at a base clock rate above 5 GHz, which provides more predictable and consistent latency for real-time transactions than a lower base clock with a higher turbo rate would.
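
The drawer math is easy to sanity-check. This Python sketch just multiplies out the counts given above (the thread count follows from the caption’s eight-core / sixteen-thread figure); the TelumSystem class and its field names are illustrative, not anything from IBM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TelumSystem:
    """Illustrative tally of a fully populated system, using the article's counts."""
    drawers: int = 4              # drawers per mainframe
    packages_per_drawer: int = 4  # four-socket drawers
    chips_per_package: int = 2    # two processor dies per package
    cores_per_chip: int = 8
    threads_per_core: int = 2     # eight-core / sixteen-thread chips

    @property
    def packages(self) -> int:
        return self.drawers * self.packages_per_drawer

    @property
    def chips(self) -> int:
        return self.packages * self.chips_per_package

    @property
    def cores(self) -> int:
        return self.chips * self.cores_per_chip

    @property
    def threads(self) -> int:
        return self.cores * self.threads_per_core


system = TelumSystem()
print(system.packages, system.chips, system.cores, system.threads)  # 16 32 256 512
```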

Pockets full of cache

Doing away with the separate System Controller meant redesigning Telum’s cache as well: the enormous 960MiB L4 cache is gone, as is the per-die shared L3 cache. In Telum, each individual core has a private 32MiB L2 cache, and that’s it. There is no hardware L3 or L4 cache at all.

This is where things get deeply weird: while each Telum core’s 32MiB L2 cache is technically private, that privacy is only virtual. When a line is evicted from one core’s L2 cache, the processor looks for empty space in the other cores’ L2 caches. If it finds some, the evicted L2 cache line from core x is tagged as an L3 cache line and stored in core y’s L2.
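
The eviction trick is easier to follow as code. What follows is a deliberately tiny Python model of a single chip, assuming an LRU replacement policy, a handful of lines per L2, and a simple rule that an evicted line goes to the first other core with free space. IBM’s real placement heuristics are not described here, so treat the Chip class and its policy as hypothetical.

```python
from collections import OrderedDict
from typing import Optional, Tuple


class Chip:
    """Hypothetical single-chip model: several cores, each with a tiny private L2."""

    def __init__(self, cores: int = 8, lines_per_l2: int = 2) -> None:
        self.capacity = lines_per_l2
        # One ordered dict per core: address -> (owner core, tag). Oldest entry first.
        self.l2 = [OrderedDict() for _ in range(cores)]

    def load(self, core: int, addr: int) -> None:
        """Bring `addr` into `core`'s L2 as an ordinary private L2 line."""
        cache = self.l2[core]
        if addr in cache:
            cache.move_to_end(addr)  # refresh LRU position on a hit
            return
        if len(cache) >= self.capacity:
            victim, (owner, tag) = cache.popitem(last=False)  # evict the LRU line
            if tag == "L2":
                self._spill_as_l3(victim, owner, exclude=core)
            # Guest lines already tagged L3 are simply dropped in this sketch.
        cache[addr] = (core, "L2")

    def _spill_as_l3(self, addr: int, owner: int, exclude: int) -> bool:
        """Re-tag an evicted L2 line as L3 and park it in another core's spare space."""
        for other, cache in enumerate(self.l2):
            if other != exclude and len(cache) < self.capacity:
                cache[addr] = (owner, "L3")
                return True
        return False  # no room anywhere on the chip: the line leaves the chip

    def where(self, addr: int) -> Optional[Tuple[int, int, str]]:
        """Return (core holding the line, owner core, tag), or None if absent."""
        for core, cache in enumerate(self.l2):
            if addr in cache:
                return (core, *cache[addr])
        return None


chip = Chip(cores=2, lines_per_l2=2)
for addr in (0xA, 0xB, 0xC):  # core 0 touches three lines but has room for two
    chip.load(0, addr)
print(chip.where(0xA))  # (1, 0, 'L3')
```

Core 0’s oldest line is re-tagged L3 and parked in core 1’s otherwise idle L2. A real chip would also need a fast way to find that line on a later miss; this sketch’s `where` method simply scans every core.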


OK, so we have a virtual, shared up-to-256MiB L3 cache on each Telum processor, composed of the 32MiB “private” L2 cache on each of its eight cores. From here, things go one step further—that 256MiB of shared “virtual L3” on each processor can, in turn, be used as shared “virtual L4” among all processors in a system.
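
Multiplying out those figures gives the upper bounds on both virtual caches. These are not additional SRAM: the virtual L3 and virtual L4 are different views of the same physical L2 arrays, and they only ever hold lines that fit into space the owning cores are not using.

```python
# Upper bounds implied by the article's figures, in MiB.
L2_PER_CORE_MIB = 32
CORES_PER_CHIP = 8
CHIPS_PER_SYSTEM = 32  # 4 drawers x 4 packages x 2 chips

virtual_l3_per_chip = L2_PER_CORE_MIB * CORES_PER_CHIP          # 256 MiB
virtual_l4_per_system = virtual_l3_per_chip * CHIPS_PER_SYSTEM  # 8192 MiB

print(virtual_l3_per_chip, virtual_l4_per_system, virtual_l4_per_system / 1024)  # 256 8192 8.0
```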

Telum’s “virtual L4” works largely the same way its “virtual L3” did in the first place—evicted L3 cache lines from one processor look for a home on a different processor. If another processor in the same Telum system has spare room, the evicted L3 cache line gets retagged as L4 and lives in the virtual L3 on the other processor (which is made up of the “private” L2s of its eight cores) instead.
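
The same idea, one level up, can be sketched the same way. In this hypothetical Python model each chip’s eight L2s are lumped into a single pool of line slots; pool sizes, LRU eviction, and first-fit placement are all assumptions made for illustration.

```python
from collections import OrderedDict
from typing import Optional


class System:
    """Hypothetical multi-chip model; each chip is one pool of line slots."""

    def __init__(self, chips: int = 32, lines_per_chip: int = 4) -> None:
        self.capacity = lines_per_chip
        # One ordered dict per chip: address -> (home chip, tag). Oldest entry first.
        self.pools = [OrderedDict() for _ in range(chips)]

    def insert_l3(self, chip: int, addr: int) -> None:
        """Install `addr` in `chip`'s virtual L3, evicting its LRU line if full."""
        pool = self.pools[chip]
        if len(pool) >= self.capacity:
            victim, (_home, tag) = pool.popitem(last=False)
            if tag == "L3":
                self.evict_l3(chip, victim)
            # Lines already demoted to L4 are dropped here (a real system
            # would fall back to main memory).
        pool[addr] = (chip, "L3")

    def evict_l3(self, chip: int, addr: int) -> Optional[int]:
        """Re-tag an evicted L3 line as L4 and park it on another chip with room."""
        for other, pool in enumerate(self.pools):
            if other != chip and len(pool) < self.capacity:
                pool[addr] = (chip, "L4")
                return other
        return None  # no spare room anywhere: the line is dropped in this sketch


system = System(chips=2, lines_per_chip=1)
system.insert_l3(0, 0x1)
system.insert_l3(0, 0x2)  # chip 0 is full, so 0x1 moves to chip 1 as an L4 line
print(dict(system.pools[1]))  # {1: (0, 'L4')}
```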

AnandTech’s Ian Cutress goes into more detail on Telum’s cache mechanisms. He eventually sums them up by answering “How is this possible?” with a simple “magic.”

AI inference acceleration

IBM’s Christian Jacobi briefly outlines Telum’s AI acceleration in this two-minute clip.

Telum also introduces a 6TFLOPS on-die inference accelerator. It’s intended to be used for—among other things—real-time fraud detection during financial transactions (as opposed to shortly after the transaction).

In the quest for maximum performance and minimal latency, IBM threads several needles. The new inference accelerator is placed on-die, which allows for lower-latency interconnects between the accelerator and CPU cores, but it’s not built into the cores themselves, à la Intel’s AVX-512 instruction set.

The problem with in-core inference acceleration like Intel’s is that it typically limits the AI processing power available to any single core. A Xeon core running an AVX-512 instruction only has the hardware inside its own core available to it, meaning larger inference jobs must be split among multiple Xeon cores to extract the full performance available.

Telum’s accelerator is on-die but off-core. This allows a single core to run inference workloads with the might of the entire on-die accelerator, not just the portion built into itself.
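
To see why pooling matters, here is an idealised back-of-the-envelope comparison in Python. It assumes, purely hypothetically, that the same 6 TFLOPS were instead split evenly into eight per-core slices, and it ignores offload overhead, contention between cores, and memory bandwidth, so it shows the shape of the argument rather than real performance.

```python
TOTAL_TFLOPS = 6.0  # on-die accelerator throughput cited above
CORES = 8           # cores per Telum chip


def seconds_for_job(job_tflop: float, cores_used: int, pooled: bool) -> float:
    """Idealised time to run `job_tflop` teraflops of inference work."""
    if pooled:
        # Off-core accelerator: a single core can drive all of it,
        # so `cores_used` does not matter here.
        usable = TOTAL_TFLOPS
    else:
        # Hypothetical in-core design: each core only brings its own slice.
        usable = TOTAL_TFLOPS / CORES * cores_used
    return job_tflop / usable


job = 0.06  # hypothetical job size in TFLOP
print(seconds_for_job(job, 1, pooled=True))   # one core, whole accelerator
print(seconds_for_job(job, 1, pooled=False))  # one core, its own slice only
print(seconds_for_job(job, 8, pooled=False))  # job split across all eight cores
```

Under these assumptions, one core driving the pooled accelerator finishes as fast as all eight in-core slices working together, without the job having to be split up first.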

Listing image by IBM
