UCAR > Communications > Staff Notes Monthly > October 2001

October 2001

UOP's newest program will help unite digital libraries

Ben Domenico and Dave Fulker. (Photo by Carlye Calvin.)

Countless science-oriented Web sites have blossomed on the open frontier of cyberspace. Many of those serving as outposts of science-education reform are being federated over the next few years as part of an unusually grand experiment. They will still operate as independent units but, like a United Sites of the Web, they will be linked in a fashion that multiplies their power and educational impact, creating the National Science Digital Library, or NSDL.

Likely to be the largest and most heterogeneous science library ever attempted, NSDL—scheduled to debut in the fall of 2002—will offer selected materials to students, teachers, and professionals at all levels in science, mathematics, engineering, and technology education (the areas denoted by "science" in NSDL).

NSDL's core integration effort will be headquartered at UCAR as a new part of the UCAR Office of Programs. It will be overseen by Dave Fulker, long-time director of Unidata. Dave will continue to lead Unidata on a half-time basis, with program manager Ben Domenico (see sidebar) taking on a larger role in Unidata's day-to-day operations. NSDL expects to hire at least six new staffers in its office on the third floor of FL4. The office's deputy director will be Kaye Howe, a consultant in higher education and former vice chancellor for academic services at CU-Boulder. "I am really looking forward to this position: great people and a wonderful project," says Howe.

The core integration effort involves two key institutional partners—Cornell and Columbia Universities—as well as two other parts of UCAR: the Digital Library for Earth System Education (DLESE), whose Program Center is led by Mary Marlino, and the Education and Outreach Program, led by Roberta Johnson.

According to Dave, the library's goal is to establish information flows and an organizational architecture that will take it beyond what one can now do with a Web-based search engine. "We think of NSDL as an education layer over the Web." The Cornell team estimates that by 2006 there may be a million users choosing materials from ten million resources at many thousands of independent sites. Effective characterization of these resources through metadata (data about the data) will thus be essential.

The project will encourage users to customize the library for their own needs. For example, Dave notes, "a teacher might combine certain design elements, tools, and collections into a portal appropriate for his or her eighth-grade astronomy unit." Using a "one library, many portals" philosophy, independent portals (sites oriented toward providing access to other data-rich sites) will be built and supported.

The NSDL core team is also looking closely at copyright and financial aspects of the library. Some parts of the collection will likely be open to all, while others will be restricted to paying users. The project expects to rely largely on institutional licenses, which will enable students and educators to use most or all of NSDL at no additional cost.

"In the long term, any science library worth its salt has to have data," says Dave, "but data sets in a modern library are useless without tools." With its expert-certified materials, powerful indexing, and multiple interfaces, NSDL stands to provide far more than the sum of the vast data holdings it will soon encompass.

• Bob Henson

THREDDS: Helping users weave through data

Tucked in amid the flow charts and plans on Ben Domenico's blackboard is a quote from the jazz great Thelonious Monk: "Simple ain't easy." Although it may not be easy, Ben's goal is a simple one: he wants to "provide the capabilities for scientific data on the Web that we now have for multimedia documents."

Unidata's program manager is heading up a newly funded project to do just that. It's called Thematic Real-time Environmental Data Distributed Services, or THREDDS. The project has come to life with roughly $900,000 in funding over the next two years from NSDL (see sidebar). THREDDS is part of NSDL's collections effort, which itself is separate from the core integration tasks that will be based in UOP, but it's a key piece of the puzzle.

The focus of THREDDS is to improve how scientists, educators, and students publish, find, and use data. The default practice for many people in the atmospheric and related sciences is to contact their colleagues about where to get data sets, then download and process them using software on local computers or have them delivered automatically in real time using the Unidata Internet Data Distribution system. "The Unidata community uses powerful analysis tools, [but] the data must reside on users' local machines," says Ben. With THREDDS, researchers will still be able to use the analysis tools on their own machines, but they'll have the option of accessing data from a set of distributed servers.

THREDDS is a highly collaborative project with more than 20 participant institutions. For example, the Distributed Oceanographic Data System (DODS) is a key component that allows users to specify a data set in terms of a URL on a remote server as if it were a file on a local computer. "DODS makes access convenient once you know the URLs for the data sets of interest," says Ben, "but finding the data is not always easy." The metadata at the heart of THREDDS will make complex data sets much easier to find.
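To make the idea concrete, here is a minimal Python sketch of how descriptive metadata can lead a user to the URL of a remote data set. It is an illustration only and does not show the actual THREDDS or DODS design: the `DatasetRecord` type, the `find_datasets` function, the keywords, and the `data.example.edu` addresses are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical metadata record describing one remote data set."""
    title: str
    keywords: frozenset  # lowercase descriptive terms
    url: str             # placeholder address, not a real server


def find_datasets(catalog, *terms):
    """Return the records whose keywords include every search term."""
    wanted = {term.lower() for term in terms}
    return [record for record in catalog if wanted <= record.keywords]


# A toy catalog; titles, keywords, and URLs are invented for illustration.
CATALOG = [
    DatasetRecord("Surface station reports",
                  frozenset({"weather", "surface", "real-time"}),
                  "http://data.example.edu/dods/surface"),
    DatasetRecord("Satellite imagery",
                  frozenset({"weather", "satellite"}),
                  "http://data.example.edu/dods/satellite"),
    DatasetRecord("Seismic waveforms",
                  frozenset({"seismic"}),
                  "http://data.example.edu/dods/seismic"),
]

if __name__ == "__main__":
    # Search by description first, then hand the URL to an analysis tool.
    for record in find_datasets(CATALOG, "weather", "satellite"):
        print(record.title, record.url)
```

In this sketch the search step answers "where are the data?" and the returned URL is what a URL-based access tool would then open as if it were a local file.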

These data sets will range across and beyond the breadth of UCAR science, says Ben—"anything from a single report at a weather observation station, to a complete satellite picture, to seismic data."

Since it's centered at Unidata, THREDDS will be able to call upon that program's long history of innovation in providing universities with data. The 12 data providers committed to providing THREDDS services include

  • NOAA's National Climatic Data Center, for climate data;

  • the Incorporated Research Institutions for Seismology, for seismic data;

  • the Navy's Fleet Numerical Meteorology and Oceanography Center, for oceanographic data; and

  • NOAA's National Geophysical Data Center, for geophysical data.

Testbed server implementations will be done on the SCD/Unidata Community Data Portal, which captures nearly a gigabyte of data each hour from the Internet Data Distribution system, and on a satellite data server at the University of Wisconsin's Space Science and Engineering Center.

THREDDS will incorporate a set of client applications for analysis and display that will allow speedy and intuitive access to the data. In addition to Unidata's MetApps team, groups within ATD and SCD, as well as several organizations outside UCAR, are working on a diverse array of potential client software (software linking desktop computers to remote servers). Some of these clients will be as simple as an existing browser that would allow data analysis to be carried out on data-hosting servers. Other, "thicker" clients would allow users to find, analyze, and display data from the remote servers on their local machines. In either case, the client would be capable of

  • visualizing complex, multidimensional data;

  • integrating and overlaying data from multiple sources; and

  • gracefully handling spatial coordinate systems, measurable quantities, units of measure, and sampling variations.
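As a small illustration of why units of measure matter when overlaying data from multiple sources, the Python sketch below converts temperature readings to a common unit before merging them. It is an assumption-laden example, not part of any THREDDS client: the `TO_CELSIUS` table, the `to_celsius` and `merge_observations` functions, the unit labels, and the sample readings are all hypothetical.

```python
# Hypothetical conversion table: each unit label maps to a function
# that converts a value in that unit to degrees Celsius.
TO_CELSIUS = {
    "degC": lambda v: v,
    "degF": lambda v: (v - 32.0) * 5.0 / 9.0,
    "K": lambda v: v - 273.15,
}


def to_celsius(value, unit):
    """Convert one temperature reading to degrees Celsius."""
    try:
        return TO_CELSIUS[unit](value)
    except KeyError:
        raise ValueError(f"unsupported temperature unit: {unit!r}") from None


def merge_observations(*sources):
    """Combine (station, value, unit) readings from several sources into
    one list of (station, celsius) pairs, sorted by station name."""
    merged = [
        (station, round(to_celsius(value, unit), 2))
        for source in sources
        for station, value, unit in source
    ]
    return sorted(merged)


if __name__ == "__main__":
    # Invented sample readings from two sources that use different units.
    source_a = [("BOU", 50.0, "degF")]
    source_b = [("ABQ", 293.15, "K")]
    print(merge_observations(source_a, source_b))
```

A real client would also need to reconcile coordinate systems and sampling differences; this sketch covers only the unit-conversion step.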

As THREDDS evolves, users will find themselves freed from the arcane world of file formats and naming conventions. Instead they'll be navigating through data almost as easily as a newbie surfs the Web. It's an ambitious goal, says Ben, "but we think we can make significant strides."

• Bob Henson

Edited by David Hosansky, hosansky@ucar.edu
Prepared for the Web by Jacque Marshall
Last revised: Thu Oct 25 11:18:36 MDT 2001