ARM to boost processor performance by 50x with new AI instructions

March 21, 2017 // By Nick Flaherty
ARM has revealed the first steps in its next-generation processor technology. Changes to the ARMv8 instruction set will boost the performance of artificial intelligence and machine learning workloads by up to 50 times, using a more flexible cluster of processor cores.

The DynamIQ cluster technology will allow up to eight completely different cores to be combined in a big.LITTLE style. The move is aimed at a wide range of applications, including driverless cars and automotive driver assistance systems, as well as enterprise servers.

“By 2020 we expect to see a lot of artificial intelligence deployed from autonomous driving platforms to mixed reality,” said Nandan Nayampally, General Manager of ARM’s Compute Products Group. “Even with 5G you cannot purely rely on the cloud for machine learning or AI so as performance continues to grow it needs to fit into ever smaller power envelopes.”

Cluster technology is at the heart of the ARM strategy for future devices, he said.

“We started clustering with the ARM11 4-core cluster ten years ago, and then big.LITTLE was six years ago, and we used the CoreLink SoC [fabric] to scale these into larger systems,” said Nayampally. “DynamIQ is the next stage, complementary to the existing technology, with up to 8 cores in a single cluster to bring a larger level of performance. Every core in this cluster can be a different implementation and a different core, and that brings substantially higher levels of performance and flexibility. Along with this we have an optimised memory subsystem with faster access and power-saving features,” he said.

This would allow several small cores and several large cores to operate independently and migrate code between the different cores depending on the processing requirements. “For example, 1+3 or 1+7 DynamIQ big.LITTLE configurations with substantially more granular and optimal control are now possible. This boosts innovation in SoCs designed with right-sized compute with heterogeneous processing that deliver meaningful AI performance at the device itself,” he said.