The events of August ended efforts by NCAR's Scientific Computing Division to bring one of the most powerful computers in the world to NCAR and the modeling community that it serves. Staff Notes Monthly editor Bob Henson interviewed SCD director Bill Buzbee in early September to get Bill's take on the recent past and the evolving future.
SN: Obviously this must be a disappointment.
BB: Yes, it is. The SX-4 is the fastest machine we have ever evaluated--15 to 25 gigaflops [billions of floating-point calculations per second] sustained. Had we been able to bring the SX-4 to NCAR, it would have enabled U.S. atmospheric scientists to address problems that are currently intractable and will remain so until comparable computing power is available.
Our best option for matching the performance and cost-to-performance ratio of the SX-4 is highly parallel microprocessor systems, such as the Hewlett-Packard/Convex SPP2000 that we acquired this spring. The machine has been undergoing a series of upgrades, and we expect to make it available to all users within the next month or so. We're doing a number of experiments to evaluate its overall capability, and so far it looks promising. We can routinely get 2 gigaflops out of it, and we have surpassed 10 on one occasion. It offers very good performance per unit of cost.
SN: How would you characterize our relationship with Cray at this point?
BB: I think it's professional. It has been throughout, for the most part, and as we move on now to this era of highly parallel nonvector computing, they are a potential supplier. We'll give them the same objective consideration that we have in the past.
SN: Let's talk about the highly parallel nonvector era. What does that imply as far as challenges and potential benefits?
BB: The most significant thing it implies is the potential to have as much computing capability as we would have had with the SX-4. It'll be a year or so later, but it nevertheless gives us the potential to stay in league with our peer organizations around the world, which, as I've noted on various occasions, will have systems by the end of this year that sustain 20 to 80 gigaflops. It's very important that we have comparable capability, and this technology is our best hope for achieving it.
SN: Can you explain in a nutshell the difference between vector and nonvector machines?
BB: The vector machines operate on strings of numbers, and as a consequence the CPU, memory, and various other components can be coordinated to achieve very high performance.
The microprocessors [i.e., nonvector processors] today have, in theory, peak performances comparable to the vector processors, but they use another strategy--caching--to enhance their performance.
SN: How has SCD's planning unfolded during the drawn-out procurement process?
BB: As soon as the antidumping investigation was launched, we realized that the SX-4 might never be available, so we put in place a number of interim steps. We brought the C-90 into the Climate System Laboratory, and that made it possible for the climate system model (CSM) project to make a lot of progress. We brought in a new J-9 computer from Cray Research to replace the old Y-MP, which was beginning to have reliability problems due to its age.
We've had highly parallel systems on the floor for experimentation and first-hand evaluation throughout the nineties, but we realized that if the SX-4 was not going to be available, our best option would be highly parallel nonvector technology. Last fall we set in motion a process to acquire the latest technology in this area, and that culminated in the installation of the HP SPP this spring.
[Photo: This Hewlett-Packard/Convex SPP2000/64 arrived at NCAR this spring.]
We have only a 64-processor system here, so we also have access to bigger systems at other sites.
SN: How do you think climate modeling at NCAR will adapt to the transition toward nonvector technology? Could we be using supercomputers in other nations, as we did in Japan in a collaboration with NEC earlier this year?
BB: Warren Washington and coworkers have a parallel coupled model (PCM) that runs on highly parallel systems, including the HP SPP. This is one of the models we will use to evaluate the SPP. On the other hand, the NCAR CSM will need some significant modifications to use the SPP and similar systems.
Part of the NCAR strategic plan includes broadening our national and international collaborations. When scientifically appropriate, such collaborations can include access to supercomputers off site, including in other countries.
Meanwhile, we have the Cray C-90 downstairs. It's still a very solid 5-gigaflop machine, and it's running the CSM quite well. It'll be here for at least another year, and probably two more. So I don't think there's any particular crisis with the model as it presently exists.
SN: It does sound like SCD will survive this ordeal and continue to be a community resource.
BB: The data-handling capabilities in SCD are almost unmatched, and we have a very respectable computing capability today with the C-90. We anticipate bringing in another supercomputer for the community. I think we're as good as most U.S. supercomputing centers, and if we're successful with the highly parallel nonvector technology, we will be able to match the computing capability of our international peer organizations. It'll take about a year to a year and a half to get there. Five years from now, people may look back on all this as actually a fortuitous development.
SN: Clearly the past year has been a tough period for SCD staff.
BB: It's created a certain amount of apprehension. [I'm hearing] similar apprehensions among some of the scientists. They're very concerned, as was evident at the director's retreat back in June, that we maintain a good computing capability.
SN: However, you do have a sense of where SCD is headed that you haven't really had for a year.
BB: We're out of limbo now, and we know what we have to do next. The SX-4 is behind us.
SN: It sounds like we'll be going in a different direction from almost any other major atmospheric science computing center.
BB: Not really. The U.K. and German weather services are both using highly parallel nonvector systems. The European Centre [for Medium-Range Weather Forecasts] and Meteo-France are using highly parallel systems with vector processors. The only way to get the kind of performance that this community needs is through parallelism. No matter what we do, we're going to go parallel.
This spring SCD acquired a Hewlett-Packard/Convex SPP2000/64, an Exemplar X-Class system that features 64 microprocessors in a highly parallel nonvector environment. More on the HP SPP and its features can be found in "New Architectures to Meet New Challenges."
SCDzine, the division's on-line newsmagazine, has a new edition this month with more on the procurement saga and on several recent and upcoming additions to the computer room.