The Next Generation of Arm Server CPU Cores


Just under four years ago, Arm announced their Neoverse family of infrastructure CPU designs. Deciding to double-down on the server and edge computing markets by designing Arm CPU cores specifically for those markets – and not just recycling the consumer-focused Cortex-A designs – Arm set about tackling the infrastructure market in a far more aggressive manner. Those efforts, in turn, have increasingly paid off handsomely for Arm and its partners, whom thanks to the likes of products like Amazon’s Graviton and Ampere Altra CPUs have at long last been able take a meaningful piece of the server CPU market.

But as Arm CPUs finally achieve the market penetration that eluded them in the previous decade, Arm needs to make sure it isn’t resting on its laurels. Of the company’s three lines of Neoverse core designs –the efficient E, flexible N, and high-performance V – the company is already on its second generation of N cores, aptly dubbed the N2. Now, the company is preparing to update the rest of the Neoverse lineup with the next generation of V and E cores as well, announcing today the Neoverse V2 and Neoverse E2 cores. Both of these designs are slated to bring the Armv9 architecture to HPC and other server customers, as well as significant performance improvements.

Arm Neoverse V2: Armv9 Graces High-Performance Computing

Leading the charge for Arm’s new CPU core IP is the company’s second-generation V-series design, the Neoverse V2. The complete V2 platform, codenamed Demeter, marks Arm’s first iteration on their high-performance V-series cores, as well as the transition of this core lineup from the Armv8.4 ISA to Armv9. And while this is only Arm’s second go at a dedicated high-performance core for servers, make no mistake: Arm aims to be ambitious. The company is claiming that Neoverse V2 CPUs will offer the highest single-threaded integer performance available in the market, eclipsing next-generation designs from both AMD and Intel.

While this week’s announcement from Arm is not a full-on deep-dive of the new architecture – and, more annoyingly, the company is not talking about specific PPA metrics – Arm is offering a high-level look at some of the changes and features that will be coming with the V2 platform. To be sure, the V2 IP is already finished and shipping to customers today (most notably NVIDIA), but Arm is playing coy to some degree with what they’re saying about V2 before the first chips based on the IP ship in 2023.

First and foremost, the bump to Armv9 brings with it the full suite of features that come with the latest Arm architecture. That includes the security improvements that are a cornerstone feature of the architecture (and especially handy for cloud shared environments) along with Arm’s newer SVE2 vector extensions.

On the latter, Arm is making an interesting change here by reconfiguring the width of their vector engines; whereas V1 implemented SVE(1) using a 2 pipeline 256-bit SIMD, V2 moves to 4 pipes of 128-bit SIMDs. The net result is that the cumulative SIMD width of the V2 is not any wider than V1, but the execution flow has changed to process a larger number of smaller vectors in parallel. This change makes the SIMD pipeline width identical to Arm’s Cortex parts (which are all 128-bit, the minimum size for SVE2), but it does mean that Arm is no longer taking full advantage of the scalable part of SVE by using larger SIMDs. I expect we’ll find out why Arm is taking this route once they do a full V2 deep dive, as I’m curious whether this is purely an efficiency play or something more akin to homogenizing designs across the Arm ecosystem.

Past that, it’s likely worth noting that while Arm’s presentation slides put bfloat16 and int8 matmul down as features, these are not new features. Still, Arm is promising that V2’s SIMD processing will provide microarchitecture efficiency improvements over the V1.

More broadly, V2 will also be introducing larger L2 cache sizes. The V2 design supports up to 2MB of private L2 cache per core, double the maximum size of V1. V2 will also be introducing further improvements to Arm’s integer processing performance, though the company isn’t going into further detail at this point. From an architectural standpoint, the V1 borrowed a fair bit from the Cortex-X1 CPU design, and it wouldn’t be too surprising if that was once again the case for the V2, borrowing from the X2. In which case consumer chips like the Snapdragon 8 Gen1 and Dimensity 9000 should provide a loose reference on what to expect.

For the Demeter platform Arm will be reusing their CMN-700 mesh fabric, which was first introduced for the V1 generation. CMN-700 is still a modern mesh design with support for up to 144 nodes in a 12×12 configuration, and is suitable for interfacing with DDR5 memory as well as PCIe 5/CXL 2 for I/O. As a result, strictly speaking the V2 isn’t bringing anything new at the fabric level – even the 512MB of SLC could be done with a V1 + CMN-700 setup – but this does mean that the CMN-700 mesh and its features is now a baseline moving forward with V2.

The Neoverse V2 core, in turn, is going to be the cornerstone of the upcoming generation of high-performance Arm server CPUs. The de facto flagship here will be NVIDIA’s Grace CPU, which will be one of the first (if not the first) V2 design to ship in 2023. NVIDIA had previously announced that Grace would be based on a Neoverse design, so this week’s announcement from Arm finally confirms the long-held suspicion that Grace would be based on the next-generation Neoverse V core.

NVIDIA, for its part, has their fall GTC event scheduled to take place in just a few days. So it’s likely we’ll hear a bit more about Grace and its Neoverse V2 underpinnings as NVIDIA seeks to promote the chip ahead of its release next year.

Neoverse E2: Cortex-A510 For Use With N2

Alongside the Neoverse V2 announcement, Arm is also using this week’s briefing to announce the Neoverse E2 platform. Unlike the V2 reveal, this is a much smaller scale announcement, and Arm is only offering a handful of technical details. Ultimately, E2’s day in the sun will be coming a bit later on.

That said, the E2 platform is being delivered to partners with an eye towards interoperability with the existing N2 platform. For this, Arm has paired the Cortex-A510 CPU, Arm’s little/high-efficiency Cortex CPU core, and paired that with the CMN-700 mesh. This is intended to give server operators/vendors further flexibility by providing an alternative CPU core to the N2, while still offering the modern I/O and memory features of Arm’s mesh. Underscoring this, the E2 system backplane is even compatible with the N2 backplane.

Neoverse Next: Poseidon, N-Next, and E-Next

Finally, Arm’s announcement this week provides a glimpse at the company’s future roadmap for all three Neoverse platforms, where, unsurprisingly, Arm is working on updated versions of each of the platforms.

Notably, all three platforms call for adding PCIe 6 support as well as CXL 3.0 support. This would come from the next iteration of Arm’s CMN mesh network, which as Arm already does today, is shared between all three platforms.

Meanwhile, it’s interesting to see the Poseidon name once again pop up in Arm’s roadmaps. Going back to Arm’s very first Neoverse roadmap, Poseidon was the name attached to Arm’s 5mn/2021 platform, a spot since taken by N2 and V1/V2 in various forms. With V2 not landing in hardware until 2023, Poseidon/V3 is still years off, but there’s likely some significance to Arm keeping the codename (such as new microarchitecture).

But first out of the gate will be the N-Next platform – the presumable Neoverse N3. With the Neoverse N platform a generation ahead of the rest (N2 was first announced in 2020), it’ll be the next platform due for a refresh. N3 is due to be available to partners in 2023, with Arm broadly touting generational performance and efficiency improvements.



Source link

Leave a Reply

Your email address will not be published.

Please answer the following anti-spam test

589 + 318 = ?

  1.    903
  2.    894
  3.    993
  4.    976
  5.    907
Just select the correct answers among the proposed