Nvidia describes 10 teraflops processor
November 18, 2010 // Rick Merritt
Nvidia's chief scientist gave attendees at Supercomputing 2010 a sneak peek of a future graphics chip that will power an exascale computer. Nvidia is competing with three other teams to build such a system by 2018 in a program funded by the U.S. Department of Defense.
Nvidia's so-called Echelon system is just a paper design backed up by simulations, so it could change radically before it gets built. Elements of its chip designs ultimately are expected to show up across the company's portfolio of handheld to supercomputer graphics products.
"If you can do a really good job computing at one scale you can do it at another," said Bill Dally, Nvidia's chief scientist who is heading up the Echelon project. "Our focus at Nvidia is on performance per watt [across all products], and we are starting to reuse designs across the spectrum from Tegra to Tesla chips," he said.
In his talk, Dally described a graphics core that can process a floating point operation using just 10 picojoules of energy, down from 200 picojoules on Nvidia's current Fermi chips. Eight of the cores would be packaged on a single streaming multiprocessor (SM) and 128 of the SMs would be packed into one chip.
The result would be a thousand-core graphics chip with each core capable of handling four double precision floating-point operations per clock cycle, the equivalent of 10 teraflops on a chip. A chip with just eight of the cores would someday power a handset, Dally said.
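The figures quoted from the talk can be checked with some back-of-the-envelope arithmetic. The sketch below is not from Nvidia; it assumes the 10 teraflops figure is peak double precision throughput, and it derives a clock rate that the article does not state. The power estimate covers arithmetic energy only, at the quoted picojoules per operation, and ignores memory and data movement. All names in the code are illustrative.

```python
# Back-of-the-envelope check of the Echelon figures quoted in the article.
# Assumptions (not stated in the source): 10 teraflops is peak double
# precision throughput, and every core issues its four operations each clock.

CORES_PER_SM = 8                    # cores per streaming multiprocessor
SMS_PER_CHIP = 128                  # SMs per chip
FLOPS_PER_CORE_PER_CLOCK = 4        # double precision operations per cycle
TARGET_FLOPS = 10e12                # 10 teraflops
ECHELON_ENERGY_PER_FLOP_J = 10e-12  # 10 picojoules per operation
FERMI_ENERGY_PER_FLOP_J = 200e-12   # 200 picojoules per operation


def total_cores() -> int:
    """Number of cores on one chip."""
    return CORES_PER_SM * SMS_PER_CHIP


def implied_clock_hz() -> float:
    """Clock rate needed to reach TARGET_FLOPS (derived, not from the talk)."""
    return TARGET_FLOPS / (total_cores() * FLOPS_PER_CORE_PER_CLOCK)


def arithmetic_power_watts(energy_per_flop_j: float) -> float:
    """Power spent on arithmetic alone at TARGET_FLOPS."""
    return TARGET_FLOPS * energy_per_flop_j


if __name__ == "__main__":
    print(f"cores per chip: {total_cores()}")
    print(f"implied clock: {implied_clock_hz() / 1e9:.2f} GHz")
    print(f"arithmetic power at 10 pJ: "
          f"{arithmetic_power_watts(ECHELON_ENERGY_PER_FLOP_J):.0f} W")
    print(f"arithmetic power at 200 pJ: "
          f"{arithmetic_power_watts(FERMI_ENERGY_PER_FLOP_J):.0f} W")
```

Under these assumptions the chip has 1,024 cores, would need a clock of roughly 2.44 GHz, and would spend about 100 W on arithmetic at 10 picojoules per operation, against about 2,000 W for the same throughput at Fermi's 200 picojoules.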
The Echelon chip packs just twice as many cores as today's high-end Nvidia GPUs. However, today's cores handle just one double precision floating-point operation per cycle, compared to four for the Echelon chip.
Many of the advances in the chip come from its use of memory. The Echelon chip will use 256 Mbytes of SRAM that can be dynamically configured to meet the needs of an application.
For example, the SRAM could be broken up into as many as six levels of cache, each of a variable size. At the lowest level each core would have its own private cache.
The goal is to get data as close to processing elements as possible, reducing the need to move data around the chip, which wastes energy. Thus SMs would have a hierarchy of processor registers that could be matched to locations in cache levels. In addition, the chip would have broadcast mechanisms so that the results of one task could be shared with any nodes that needed that data.