DeepBench is available online along with first results from Intel and Nvidia processors running it. The benchmark tests low-level operations such as matrix multiplication, convolutions and the handling of recurrent layers, as well as the time it takes for data to be shared with all processors in a cluster.
Machine learning has emerged as a critical workload for Web giants such as Baidu, Google and Facebook. The workloads come in many flavors, serving applications such as speech, object and video recognition and automatic language translation.
Today the job of training machine learning models “is limited by compute, if we had faster processors we’d run bigger models…in practice we train on a reasonable subset of data that can finish in a matter of months,” said Greg Diamos, a senior researcher at Baidu’s Silicon Valley AI Lab.
The lab has found, for example, that it can reduce errors in automatic language translation by 40% for every order-of-magnitude improvement in computing performance. “We could use improvements of several orders of magnitude--100x or greater,” said Diamos.
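To make that arithmetic concrete, here is a minimal sketch. It assumes the 40% reduction compounds with each additional 10x of compute, which is an illustrative reading of the figure rather than a model Baidu has published; the function name `relative_error` is hypothetical.

```python
import math


def relative_error(speedup: float, reduction_per_decade: float = 0.40) -> float:
    """Fraction of the original errors remaining after a compute speedup.

    Assumption (not from Baidu): every 10x of compute removes the same
    fraction of whatever errors remain, so the effect compounds.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    decades = math.log10(speedup)  # number of order-of-magnitude steps
    return (1.0 - reduction_per_decade) ** decades


if __name__ == "__main__":
    for s in (10, 100, 1000):
        remaining = relative_error(s)
        print(f"{s:>5}x compute: {remaining:.1%} of original errors remain")
```

Under that assumption, the 100x improvement Diamos mentions would leave roughly 36% of the original translation errors.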
No striking results emerged from initial tests on Nvidia Titan X, Titan X Pascal and M40 GPUs and Intel Xeon Phi processors, according to the Baidu researchers.
“Even for the same type of operations like matrix multiplies, depending on the sizes of the models and the ways they are used, performance varies even on the same processor,” said Diamos. “We aren’t as concerned about minor differences between these processors--we want both to be 10 times faster,” he said.
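The point about matrix sizes can be explored with a small timing harness. The sketch below is not DeepBench code: it uses a naive pure-Python multiply, and the function names and the three shapes are hypothetical, chosen only because they involve the same number of arithmetic operations. Its absolute timings say nothing about GPU or Xeon Phi performance; it only shows how one might measure the same operation at different shapes.

```python
import random
import time


def matmul(a, b):
    """Naive multiply of an (m x k) by a (k x n) matrix stored as lists of lists."""
    b_cols = list(zip(*b))  # transpose b so its columns can be iterated directly
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def time_gemm(m, n, k, repeats=3):
    """Return (best_seconds, flops) for an (m x k) by (k x n) multiply."""
    a = [[random.random() for _ in range(k)] for _ in range(m)]
    b = [[random.random() for _ in range(n)] for _ in range(k)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        matmul(a, b)
        best = min(best, time.perf_counter() - start)
    # Count one multiply and one add per inner-product term.
    return best, 2 * m * n * k


if __name__ == "__main__":
    # Hypothetical shapes: same operation count, different aspect ratios.
    for m, n, k in [(64, 64, 64), (256, 16, 64), (16, 16, 1024)]:
        secs, flops = time_gemm(m, n, k)
        print(f"m={m:<4} n={n:<4} k={k:<5} {secs * 1e3:8.2f} ms  {flops} flops")
```

Reporting time per shape rather than a single aggregate score is what lets differences like the ones Diamos describes show up at all.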
Nevertheless, one executive took the occasion to claim bragging rights.
“Baidu’s DeepBench results clearly highlight Pascal as the performance leader across all deep learning workloads,” said Ian Buck, vice president of accelerated computing at Nvidia. “When full applications are benchmarked, such as Caffe AlexNet or Caffe VGG-19, our internal testing shows Pascal is up to 6x faster than [Intel's] Knights Landing,” he added.
The Baidu researchers also hope other processor vendors and data center operators will help expand the benchmark and run more chips on it.