Running larger deep learning models opens new scientific possibilities, but conventional systems and architectures limit the problems that can be addressed because models take too long to train, the company notes. Cray therefore worked with Microsoft and CSCS, drawing on decades of high performance computing expertise, to scale the Microsoft Cognitive Toolkit (formerly CNTK) on “Piz Daint,” a Cray XC50 supercomputer at CSCS.
With faster training, data scientists can obtain results within hours or even minutes instead of waiting weeks or months.
With the introduction of supercomputing architectures and technologies to deep learning frameworks, customers now have the ability to solve a whole new class of problems, such as moving from image recognition to video recognition, and from simple speech recognition to natural language processing with context.
Deep learning problems share algorithmic similarities with applications traditionally run on a massively parallel supercomputer. By optimizing inter-node communication using the Cray XC Aries network and a high performance MPI library, each training job can leverage significantly more compute resources – reducing the time required to train an individual model.
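The article does not describe how the training work is divided across nodes, so the following is only a minimal sketch of one common pattern, synchronous data-parallel training, under stated assumptions. Each simulated worker computes a gradient on its own shard of the data, an allreduce-style step averages those gradients, and every worker applies the same update. The function names, the toy model y = w * x, and the data are all hypothetical, and the allreduce is an ordinary Python function standing in for the MPI collective that a real job would run over the network.

```python
# Simulated data-parallel SGD step: each "worker" stands in for one GPU or node.
# In a real job the allreduce would be an MPI collective over the interconnect;
# here it is a plain Python function so the sketch runs anywhere.

def local_gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)

def allreduce_mean(values):
    """Stand-in for an allreduce: every worker receives the mean of all values."""
    mean = sum(values) / len(values)
    return [mean] * len(values)

def data_parallel_step(w, shards, lr):
    """One synchronous step: local gradients, allreduce, identical update."""
    grads = [local_gradient(w, shard) for shard in shards]
    averaged = allreduce_mean(grads)
    # Every worker holds the same averaged gradient, so all replicas stay in sync.
    return w - lr * averaged[0]

# Hypothetical toy data: y = 3x, split evenly across 4 simulated workers.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
num_workers = 4
shards = [data[i::num_workers] for i in range(num_workers)]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, lr=0.01)
```

With equal-sized shards, the averaged gradient matches the gradient computed over the full dataset, so adding workers shortens each step without changing the update. The cost that remains is the allreduce itself, which is why the speed of inter-node communication matters at scale.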
A team of experts from Cray, Microsoft, and CSCS has scaled the Microsoft Cognitive Toolkit to more than 1,000 NVIDIA Tesla P100 GPU accelerators on Piz Daint. The result of this deep learning collaboration opens the door for researchers to run larger, more complex, and multi-layered deep learning workloads at scale.