Cray Inc. is mapping out its supercomputer architectures for the remainder of the decade, planning systems that are more efficient and easier to link together. The company expects the systems to push performance beyond the petaflop range by the end of the decade.
[Figure] As processor speeds increase, the gap between processor performance and the ability to get data from memory grows at around 50% per year. Source: Innovative Computing Laboratory, University of Tennessee.
The Seattle-based company recently inked an agreement with the U.S. government to continue developing a next-generation supercomputer code-named Black Widow. Cray and the government will each invest about $17 million through 2007.
Black Widow is expected to reach a peak performance of several hundred teraflops (trillions of floating-point operations per second) initially, exceeding a petaflop (a thousand trillion calculations per second) over its product lifetime.
Looking further out, Cray is working closely with DARPA on advanced systems. These efforts, designed to yield products by 2010, will build on work done under the existing cooperative agreement.
The later phase will merge Cray’s proprietary vector systems with its scalar technologies. “We want to integrate them so we can get systems with different sorts of computation capability,” says Steve Scott, CTO at Cray.
He notes that DARPA and three partners, including Cray, are attempting to deliver dramatic performance increases while addressing issues that plague designers today. “High-end computing systems don’t scale well when they’re put in clusters, and they tend to be fragile, with a lot of reliability issues,” Scott says. They are also hard to program, he adds.
The performance target is a petaflop and beyond, with advances coming in many areas that share a single focus. “The common theme is bandwidth. Bandwidth is not only the most important aspect, it’s the most expensive,” Scott says.
Flops are becoming cheaper, tracking Moore’s Law doubling every 18 months, but bandwidth increases far more slowly, he explains. One answer is to use bandwidth wisely. “We want to reduce the use of bandwidth; with fast processors we want to pull data in and operate on it, using it several times in that processor,” Scott says.
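The payoff of reusing data inside the processor can be seen in a rough back-of-envelope model (an illustration of the general technique of cache blocking, not Cray’s actual design): if each word pulled from memory is reused b times on chip, memory traffic drops by roughly a factor of b.

```python
# Illustrative traffic model for an n x n matrix multiply, naive vs. blocked.
# All names and cache assumptions here are hypothetical, for illustration only.

def naive_traffic(n):
    # Computing each of the n^2 results streams a full row and a full
    # column from memory: about 2*n words each, so ~2*n^3 loads total
    # (result writes, ~n^2 words, are ignored for simplicity).
    return 2 * n**3

def blocked_traffic(n, b):
    # Work proceeds in (n/b)^3 tile-multiply steps; each step loads two
    # b x b tiles (2*b^2 words) and then reuses every loaded word b times
    # from fast on-chip storage instead of refetching it from memory.
    tiles = (n // b) ** 3
    return tiles * 2 * b * b

n, b = 1024, 32
print(naive_traffic(n) // blocked_traffic(n, b))  # -> 32: traffic cut by ~b
```

The arithmetic is the same either way; only the number of trips to memory changes, which is exactly the scarce resource Scott describes.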
Bandwidth has a major impact on memory access. “Memory capacity is getting cheaper and cheaper at Moore’s Law rates, but the bandwidth to get to memory is not getting faster at that rate,” Scott says.
Another approach is to distribute processors. “Rather than send data to processors, we’ll sprinkle processors out in the memory subsystem, performing operations on data where it sits to reduce the amount of data that gets sent across networks,” Scott says. That provides the same end effect as increasing bandwidth, he adds.
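A minimal sketch of that processor-in-memory idea, with hypothetical names (MemoryNode and local_sum are not Cray terminology): each memory node reduces its own slice of data locally and ships only the small partial result, rather than sending every raw word across the network.

```python
# Hypothetical processing-in-memory sketch: reduce data "where it sits"
# and ship only partial results, instead of moving all raw data.

class MemoryNode:
    def __init__(self, data):
        self.data = data

    def local_sum(self):
        # The operation runs next to the memory holding the data.
        return sum(self.data)

# Four nodes, each holding a 1,000-word slice of a 4,000-word dataset.
nodes = [MemoryNode(list(range(i, i + 1000))) for i in range(0, 4000, 1000)]

# Centralized approach: every word crosses the network to one processor.
words_shipped_central = sum(len(n.data) for n in nodes)   # 4000 words

# In-memory approach: each node sends a single partial sum.
partials = [n.local_sum() for n in nodes]
words_shipped_pim = len(partials)                         # 4 words
total = sum(partials)                                     # same final answer
```

The final answer is identical; what changes is the network traffic, which is the "same end effect as increasing bandwidth" that Scott describes.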
Cray plans to use commodity memory chips in its new designs, improving performance with a technique called memory concurrency. “A processor typically asks for a number of words from memory, then waits for them. In the processors we design, there can be thousands of outstanding memory requests, keeping the pipeline between the processor and memory filled,” Scott says.
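Why thousands of outstanding requests help can be sketched with Little’s law (concurrency ≈ latency × bandwidth); the numbers below are assumed for illustration, not taken from Cray’s designs.

```python
# Back-of-envelope model of memory concurrency: with a memory latency of
# `latency` cycles and `outstanding` requests kept in flight, steady-state
# throughput approaches one word per cycle once outstanding >= latency.

def words_per_cycle(latency, outstanding):
    # Little's law: sustained throughput = concurrency / latency,
    # capped at the pipeline's peak of one word per cycle.
    return min(outstanding / latency, 1.0)

# A blocking processor that waits on each request leaves the pipe idle;
# a latency-tolerant one with 1,000 requests in flight keeps it full.
print(words_per_cycle(400, 1))     # -> 0.0025
print(words_per_cycle(400, 1000))  # -> 1.0
```

In this model the memory parts themselves are unchanged, commodity chips; the gain comes entirely from how many requests the processor can keep in flight.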
The company will also address programming, aiming to make it simpler for users to move from idea to operation.