Fall 2001

Building a more perfect digital library, bit by bit

by Bob Henson

David Fulker. (Photo by Carlye Calvin.)

Countless science-oriented Web sites have blossomed on the open frontier of cyberspace. Many of those serving as outposts of science-education reform are being federated over the next few years as part of an unusually grand experiment. They will still operate as independent units but, like a United Sites of the Web, they will be linked in a fashion that multiplies their power and educational impact, creating the National Science Digital Library, or NSDL. (The "Science" in NSDL stands for Science, Mathematics, Engineering, and Technology Education.)

Likely to be the largest and most heterogeneous science library ever attempted, NSDL—scheduled to debut in the fall of 2002—will offer high-quality materials to students, teachers, and professionals at all levels across the full range of scientific and related disciplines. A varied group of public, private, and nonprofit sponsors is already at work on the NSDL. NSF's Education and Human Resources Division is providing roughly $100 million in primary support over five years.

Nearly a decade of research, much of it sponsored by NSF and the U.S. Department of Education, has shown the potential power of on-line science libraries. In the past year, seed money from NSF has launched 40 NSDL-related projects at universities and other institutions around the country, with more to come. Now the program enters a critical phase as a digital-age analog of the Continental Congress finds common ground for cataloging and otherwise integrating the diverse resources of NSDL.

NSDL's core integration effort will be headquartered at UCAR and overseen by David Fulker, long-time director of the Unidata program (which he'll continue to lead on a half-time basis). The goal, according to Fulker, is to establish information flows and an organizational architecture that take NSDL beyond what one can now do with a Web-based search engine. "We think of NSDL as an education layer over the Web."

The core integration effort involves three primary organizations and several additional collaborators.

Many of the research teams working on NSDL are multidisciplinary. Cornell's team includes three faculty members in computer science, two librarians, several database management experts, and a digital media specialist.

An ambitious goal, set by NSF, is to have an operational NSDL by the end of 2002. To meet that deadline, the core integration team will capitalize on prior and ongoing work. For example, the assembly of catalog records from federated sources will employ protocols from the Open Archives Initiative. Content-based discovery will build on advanced search methods from UM-Amherst. Governance for NSDL is envisioned to resemble that for DLESE and Unidata, emphasizing the democratic ideal of community ownership.
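As a rough illustration of how catalog records might be gathered from federated sources, the sketch below builds a harvesting request and reads titles out of a response. It is an assumption-laden example, not the NSDL team's implementation: it follows the conventions of the later OAI-PMH 2.0 specification (the `ListRecords` verb and the `oai_dc` Dublin Core format), and the base URL, the `harvest_url` and `parse_records` helpers, and the sample record are all hypothetical.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def harvest_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for a (hypothetical) repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# A hand-written sample response, standing in for what a federated site
# might return. No network access is needed to run this sketch.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Phases of the Moon</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Extract identifier and title from each harvested record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter("{%s}record" % OAI_NS):
        identifier = rec.findtext("{%s}header/{%s}identifier" % (OAI_NS, OAI_NS))
        title = rec.findtext(".//{%s}title" % DC_NS)
        records.append({"id": identifier, "title": title})
    return records


url = harvest_url("http://example.org/oai")
catalog = parse_records(SAMPLE_RESPONSE)
```

In this arrangement each site keeps its own collection and exposes only metadata, which a central service can merge into a single catalog.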

The Cornell team estimates that by 2006 there may be a million users choosing materials from ten million resources at many thousands of independent sites. Effective characterization of these resources, i.e., metadata, will thus be essential. In addition, Fulker and colleagues will encourage users to customize the library for their own needs. For example, notes Fulker, "a teacher might combine certain design elements, tools, and collections into a portal appropriate for her eighth-grade astronomy unit."

To exploit the immense flexibility of libraries that are digital, the NSDL will adopt a "one library, many portals" philosophy, with the independent portals built and supported through a repository of metadata. Principal investigator William Arms, from Cornell's computer science department, draws an analogy to on-line newspapers, in which the same article from a wire service may have an entirely different look and feel from one site to the next. Likewise, high-school students, engineers, or mathematicians may find themselves using the same resources on NSDL after getting to them through widely different portals, each one customized to its audience. "We're not going to change the collections," says Arms. "Our basic approach is to centralize the discovery process."
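The "one library, many portals" idea can be sketched in a few lines of Python. This is a minimal illustration rather than NSDL's actual design: the `Record`, `Repository`, and `Portal` names, their fields, and the sample entries are all hypothetical. The point is only that discovery runs against one shared metadata repository while each portal presents its own audience-specific view.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """Metadata describing a resource that lives at an independent site."""
    title: str
    url: str
    audiences: frozenset
    subject: str


@dataclass
class Repository:
    """The single, shared store of metadata (hypothetical)."""
    records: list = field(default_factory=list)

    def add(self, record):
        self.records.append(record)

    def search(self, audience=None, subject=None):
        # Centralized discovery: every portal calls this same method.
        return [
            r for r in self.records
            if (audience is None or audience in r.audiences)
            and (subject is None or r.subject == subject)
        ]


@dataclass
class Portal:
    """An audience-specific view onto the shared repository."""
    name: str
    repository: Repository
    audience: str

    def discover(self, subject=None):
        return [r.title for r in self.repository.search(self.audience, subject)]


repo = Repository()
repo.add(Record("Orbital Mechanics Primer", "http://example.org/orbits",
                frozenset({"high-school", "engineer"}), "astronomy"))
repo.add(Record("Fourier Analysis Notes", "http://example.org/fourier",
                frozenset({"engineer", "mathematician"}), "mathematics"))

student_portal = Portal("Student portal", repo, "high-school")
engineer_portal = Portal("Engineer portal", repo, "engineer")
```

Here the same "Orbital Mechanics Primer" record is reachable from both portals, while the collections themselves stay where they are.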

The NSDL core team is also looking closely at copyright and financial aspects of the library. Some parts of the collection will likely be open to all, while others will be restricted to paying users. The project expects to rely largely on institutional licenses, which will enable students and educators to use most or all of NSDL at no incremental cost. A round table held at Columbia last winter, led by co–principal investigator Jane Ginsburg (Columbia School of Law), provided insight into community needs and priorities. The bottom line, says Fulker, is to "ensure that users are assured of the authenticity of material in the collection and that author and producer are able to receive both credit and payment for use, as appropriate."

With its expert-certified materials, powerful indexing, and multiple interfaces, NSDL stands to provide far more than the sum of the vast data holdings it will soon encompass. "In the long term, any science library worth its salt has to have data," says Fulker, "but data sets in a modern library are useless without tools."

