UOP's Unidata program is undergoing major changes that make data sharing, more than ever, a community effort.
The Unidata Program Center offers software and services that enable universities to acquire and use atmospheric and related data on their own computers, often in real time. Operated under an NSF grant, the center serves more than 130 universities. For most of its ten-year existence, Unidata facilitated university access to data that were transmitted by satellite and originated from a few basic sources. Now, the number of sources and the types of data available are, in theory, limitless.
At the center of this change is a shift from transmitting data by satellite to distributing them over computer networks, through what Unidata calls its Internet Data Distribution (IDD) system. IDD is made possible by the LDM (Local Data Manager) software, developed by Unidata. Large-scale deployment of the IDD system began on 1 November 1994 with a new release of the LDM. In March, more than 75 Unidata sites acquired data through IDD, many of them receiving twice the data volume they had received via satellite.
The essence of IDD is that it is a distributed system with data sources, relays, and receivers at numerous locations. Most of these are operated by the universities, so that those who once were relatively passive recipients of data can now be data sources as well. The system has no data center; Unidata's role is to help organize and guide the system.
"The single most important benefit," says Unidata director David Fulker, "is decentralized decision making about introduction of new data in the system. It sets the stage for any member to become a data provider. That's pretty fundamental. I anticipate profound changes in availability of data over the next few years as a result."
Data of interest to only a few users can be transmitted directly among them using IDD technologies, he explains. "We don't get involved as long as the interest in a particular data set is low. If two people want to exchange data, that doesn't affect anyone else. However, as more users are interested in a data source and as more are affected, Unidata can intervene to acquire data most efficiently and choose proper routing for data in a way that makes effective use of the networks." Data, like so many other things, can be acquired more cheaply "in bulk." That's a role Unidata will continue to play: when a particular data stream is of interest to a number of universities, Unidata can obtain the low rates possible only through bulk purchases or special agreements with agencies and other providers, so that the universities get the data free or at a discount. But even in these cases, Fulker adds, Unidata does not become the data center; frequently the data can flow straight from provider to user.
So the system is self-regulating. One of the fundamental implications of such a decentralized system, Fulker continues, "is that you can make incremental changes. When it all went through a central broadcast facility, the decision to add a new data source was significant. Moving all the data to an uplink and buying more bandwidth on the satellite were expensive. These two costs meant you wouldn't introduce new data to the system unless it looked important. But how do you know what data will be important? The history of science demonstrates that you don't know how important data will be until you use it. That's one reason we ran Unidata, through most of its existence, with roughly the same major data sources. This new way of operating means a different paradigm. We don't have to make any assessment of the importance of data. All we've done is provide the tools."
If Unidata's role is less, that of the universities is greater. Shared responsibility, Fulker stresses, is a major feature of the new system. Participation in IDD requires a substantial commitment from the universities. Participants are expected to acquire and install components of the system adequate for handling and relaying all the data they will be requesting. Those who receive data via IDD are expected to relay them, following Unidata guidelines, to other Unidata users where practical and needed. For data of limited interest to the community, users negotiate their own routing arrangements.
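The relay expectation described above can be pictured with a small sketch. It is an illustration only, not the LDM software or its protocol: the `Relay` class, the feed labels, and the fixed-size buffer are all assumptions made for the example. Each site keeps what it receives, holds it in a queue, and passes on to each downstream neighbor only the feeds that neighbor has asked for.

```python
from collections import deque


class Relay:
    """Hypothetical IDD-style site: receives products and relays them on."""

    def __init__(self, name, capacity=1000):
        self.name = name
        # Assumed bounded buffer; the oldest entry is dropped when it is full.
        self.queue = deque(maxlen=capacity)
        self.downstream = []   # list of (site, set of wanted feed labels)
        self.delivered = []    # everything this site has received

    def connect(self, site, wanted):
        """Register a downstream site and the feeds it has requested."""
        self.downstream.append((site, set(wanted)))

    def receive(self, feed, product):
        """Record an arriving product and hold it for relaying."""
        self.delivered.append((feed, product))
        self.queue.append((feed, product))

    def flush(self):
        """Forward queued products to interested neighbors, then let them relay."""
        while self.queue:
            feed, product = self.queue.popleft()
            for site, wanted in self.downstream:
                if feed in wanted:
                    site.receive(feed, product)
        for site, _ in self.downstream:
            site.flush()


# Hypothetical three-site chain: a source, a relay, and a leaf receiver.
source = Relay("source")
campus_a = Relay("campus_a")
campus_b = Relay("campus_b")
source.connect(campus_a, {"TEXT", "RADAR"})
campus_a.connect(campus_b, {"TEXT"})

source.receive("TEXT", "obs1")
source.receive("RADAR", "scan1")
source.flush()
```

In this toy chain the second campus never sees the radar product, because it asked only for text; traffic goes only where it was requested, and no site acts as a data center.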
Data currently available include Family of Services data from the National Weather Service, GOES images, radar data from the NEXRAD Information Dissemination Service, data from the Geomet Corporation's National Lightning Detection Network, outputs of high-resolution computer models and other experimental data from NOAA's Forecast Systems Laboratory and National Meteorological Center (Suitland, Maryland), and text and graphical descriptions of current weather and environmental phenomena from the University of Michigan's Weather Underground and Blue Skies projects.
The system is open to participation by U.S. universities that will use the data for education and research. LDM software is available free from Unidata and supported for use on a number of platforms.
"IDD is a primary example of how Unidata functions as a community effort," says Fulker. "We've built a system in which data flows throughout the community. It moves through technical means, through technical relays, but there's a human side. Good will among the community is what makes it work. You have to be willing to see your networking capacity utilized in that way. Basically, it depends on neighborliness."
For more information, contact firstname.lastname@example.org or visit http://www.unidata.ucar.edu/
Multiple sources. Data can be injected from any point on the Internet where an LDM system has been installed.
Data subscription. Users of IDD can select from the available data only those they need.
Local processing. Users can update their data files or conduct their own processing of selected data.
Table-driven control. User-defined tables contain the criteria for selecting data and for controlling which data are routed to which files and how they are processed.
Reliable data delivery. Reliable transport protocols ensure the accuracy of data arriving at a site. A queuing system buffers data flows at relays, preventing losses from network congestion and short outages. Eventually IDD will automatically switch to a backup system during severe outages.
Load distribution. IDD is designed to avoid excessive concentrations of network traffic on any node or link in the system. Generally, data flow only where they are needed.
Community effort. Responsibilities are shared among Unidata and the community it serves.
Evolution. The IDD design is compatible with, and should be simplified by, the probable evolution of the Internet.
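The data subscription and table-driven control features listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the LDM's actual table syntax: the `Product` fields, the feed labels, the `FILE` and `PIPE` action names, and the destinations are all hypothetical. Each row of the table pairs a selection criterion with what should be done to a matching product.

```python
import re
from dataclasses import dataclass


@dataclass
class Product:
    feed: str     # hypothetical feed label, e.g. "TEXT"
    ident: str    # hypothetical product identifier
    data: bytes


# Hypothetical user-defined table: (feed, identifier pattern, action, destination).
TABLE = [
    ("TEXT", r"^SA", "FILE", "surface/obs.txt"),
    ("LIGHTNING", r".*", "FILE", "lightning/strikes.dat"),
    ("TEXT", r"^FO", "PIPE", "decode_forecast"),
]


def dispatch(product, table):
    """Return the (action, destination) pairs of every row the product matches."""
    return [
        (action, destination)
        for feed, pattern, action, destination in table
        if feed == product.feed and re.search(pattern, product.ident)
    ]


surface_report = Product("TEXT", "SAUS80 KWBC", b"...")
unwanted_image = Product("IMAGE", "GOES-IR", b"...")

matched = dispatch(surface_report, TABLE)
ignored = dispatch(unwanted_image, TABLE)
```

A product that matches no row is simply ignored, which is how a site ends up handling only the data it selected; changing what a site keeps or processes means editing the table, not the program.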