UCAR > Communications > UCAR Quarterly > Summer 2001

Summer 2001

Web100: How to have it all

Basil Irwin and Marla Meehl. (Photo by Carlye Calvin.)

by Juli Rew
NCAR Scientific Computing Division

Today's Internet has the potential to offer gigabit-per-second bandwidth. But few users get even a small fraction of that, unless they have networking wizards hand-tuning the network at both ends of the application. To help ordinary users exploit 100% of the available network bandwidth, researchers in NCAR's Scientific Computing Division (SCD), along with their partners at the Pittsburgh Supercomputing Center (PSC) and the National Center for Supercomputing Applications (NCSA) at the University of Illinois, have embarked on an NSF-funded project called Web100.

The main way data are transferred from one networked computer to another is via the venerable Transmission Control Protocol/Internet Protocol (TCP/IP). IP is implemented in all of the end systems and routers and acts as a relay to move packets of data from one host, through one or more routers, to another host. TCP keeps track of the packets of data to assure that all are delivered reliably and in the right order to the appropriate hosts. Unfortunately, the default TCP configurations used by most end systems may not be appropriate for the available network bandwidth. Furthermore, a message may traverse multiple heterogeneous local- and wide-area networks to reach its destination, making it difficult for a user to manually optimize TCP.
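The guarantee TCP provides can be seen in a few lines of code. This is a minimal loopback sketch (not Web100 code): the sender writes one byte stream, TCP splits it into segments however it likes, and the receiver still sees exactly the same bytes in the same order.

```python
# A sketch of TCP's reliable, in-order byte-stream service over loopback.
import socket
import threading

def run_server(listener, received):
    conn, _ = listener.accept()
    with conn:
        while chunk := conn.recv(4096):   # read until the sender closes
            received.append(chunk)

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))           # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

received = []
t = threading.Thread(target=run_server, args=(listener, received))
t.start()

payload = b"x" * 100_000                  # far larger than any single packet
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(payload)               # TCP segments and retransmits as needed
t.join()
listener.close()

assert b"".join(received) == payload      # delivered reliably and in order
```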

Tuning up TCP

SCD co–principal investigators Basil Irwin and Marla Meehl, along with investigators at PSC and NCSA, say that a primary goal of Web100 is to develop software that interacts with the operating system and user applications to automatically optimize performance for all TCP transfers.

"Most TCP applications blindly use the default TCP buffer size," Irwin explains. "If the TCP buffer is too small to hold enough packets to fill a high-bandwidth large-latency network pipe, then transmission of packets is forced to prematurely halt until some of the packets in the filled buffer are acknowledged by the receiver as having correctly arrived." When that happens, the pipe's full bandwidth goes unused. "While a network administrator can alter the TCP buffer size manually, it's not a trivial task."
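The mismatch Irwin describes can be quantified with the bandwidth-delay product: the number of bytes that must be "in flight" to keep a pipe full. The figures below are hypothetical, chosen only to illustrate the arithmetic.

```python
# Bandwidth-delay product: bytes in flight needed to keep a pipe full.
# Hypothetical path: 100 Mbit/s with a 70 ms round-trip time.
bandwidth_bits_per_s = 100e6
rtt_s = 0.070

bdp_bytes = bandwidth_bits_per_s / 8 * rtt_s   # 875,000 bytes
default_buffer = 64 * 1024                     # a common default buffer size

# A 64 KB buffer lets TCP keep only a fraction of this pipe full:
utilization = default_buffer / bdp_bytes       # ~7.5%
print(f"BDP = {bdp_bytes:,.0f} bytes; default buffer fills {utilization:.1%}")
```

With the buffer full, the sender must stop and wait a full round trip for acknowledgments, so throughput is capped at buffer/RTT rather than at the link rate.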

The Web100 project will seek to develop a mechanism to allow the operating system to change the TCP buffer size dynamically, transparently, and automatically for all TCP sessions. The first step is to endow TCP implementations with better instrumentation so that they can detect undersized buffer conditions and better see where there are bottlenecks in the path or other TCP bugs.
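The knob Web100 aims to turn automatically is already exposed to programs through the sockets API. A sketch of the manual version, assuming a Linux-like kernel: `setsockopt` requests a buffer size and `getsockopt` reads back what the kernel actually granted.

```python
# Manually adjusting a TCP receive buffer via the sockets API.
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

requested = 1 << 20                                  # ask for a 1 MB buffer
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)

granted = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
# Linux typically reports roughly double the request (to account for
# bookkeeping overhead) and silently clamps to the system-wide maximum,
# so the granted value may differ from the request on either side.
print(f"requested {requested}, kernel granted {granted}")
s.close()
```

Doing this per-application, per-path is exactly the non-trivial manual task the project wants to eliminate.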

Once the operating system TCP implementation has been beefed up with real-time TCP metrics, it will also be possible to gather many more statistics about individual TCP sessions. The investigators then envision developing an "autotuner" that will monitor session performance and respond with needed adjustments. Autotuning, the ability to automatically tune TCP to simultaneously achieve maximum throughput across all connections for all applications within the resource limits of the host, has already been successfully demonstrated by PSC in the FreeBSD operating system.

The tuning process can be quite complicated, so autotuning will be partitioned: the operating system kernel will simply accept certain basic tuning adjustments, while the complex tuning algorithms that determine the adjustments will run in user mode, where network researchers can easily extend, replace, or disable them.
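The shape of that partition can be sketched in a few lines. All names here are hypothetical, not actual Web100 interfaces: the point is only that the replaceable policy lives in user mode while the "kernel" side just accepts and applies a value.

```python
# A toy sketch of the kernel/user-mode split described above.

def user_mode_policy(bandwidth_bits_per_s, rtt_s, max_buffer):
    """Complex tuning logic lives here, replaceable without kernel changes."""
    bdp = int(bandwidth_bits_per_s / 8 * rtt_s)   # size buffer to the path's BDP
    return min(bdp, max_buffer)                   # respect host resource limits

def kernel_apply(connection, buffer_bytes):
    """The kernel's role is deliberately simple: accept and apply."""
    connection["sndbuf"] = buffer_bytes

conn = {"sndbuf": 64 * 1024}
kernel_apply(conn, user_mode_policy(100e6, 0.070, max_buffer=4 << 20))
print(conn["sndbuf"])   # 875000: sized to the hypothetical path's BDP
```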

Will fixing TCP buffer sizes end most data-transfer bottlenecks on the Internet? That depends on the cause: besides undersized buffers, bottlenecks can stem from TCP packets dropped by the network or from the applications themselves. Conversely, will ubiquitous well-tuned TCP crush the Net? If so, then maybe it simply means the network capacity needs to be increased as users become able for the first time to effectively and conveniently use large amounts of bandwidth.

Web tools for the desktop

A second Web100 goal is to develop diagnostic and performance monitoring tools for tracking the status of data transfers. Ideally, at least some of the tools for end users should pass the "newbie test"—that is, be simple enough for anyone with little network experience to use and understand. For instance, a simple display that shows bytes/second and lost packets/second should be understandable to most people.
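A display that passes the newbie test amounts to reducing a session's raw counters to two rates. This sketch uses made-up counter names and values, not actual Web100 instruments:

```python
# Reduce raw session counters to the two numbers anyone can read:
# throughput (bytes/s) and loss rate (lost packets/s) over one interval.

def simple_display(bytes_now, bytes_then, lost_now, lost_then, interval_s):
    throughput = (bytes_now - bytes_then) / interval_s
    loss_rate = (lost_now - lost_then) / interval_s
    return f"{throughput:,.0f} bytes/s, {loss_rate:.1f} lost packets/s"

# One five-second polling interval of a hypothetical session:
print(simple_display(12_500_000, 2_500_000, 7, 2, interval_s=5.0))
# → 2,000,000 bytes/s, 1.0 lost packets/s
```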

The initial Web100 products will be based on kernel modifications to the Linux operating system to allow TCP autotuning. Linux is widely used at research universities, and its source code is freely available. Since Linux is popular on Intel systems, the Web100 group will be testing its new tools primarily on Intel-based computers. Pre-alpha Web100 code, released in November, is being tested by the Stanford Linear Accelerator Center and Oak Ridge National Laboratory. A new alpha0 release in late March included many new instruments, some simple diagnostic tools, some library functions, and a simple autotuning daemon. Additional testers for alpha0 include the Argonne and Lawrence Berkeley National Laboratories, Globus, and Internet2.

The future: Spreading the word

Irwin cautions that the goal of universal tuned access may not be reachable soon since the Internet is such an amorphous distributed network. Individuals and institutions choose and install their own computers, with most being non-Linux systems. To achieve universal 100% throughput would require commercial operating system vendors to adopt autotuning. Thus, although the Web100 group will be making its Linux-based codes freely available, experience suggests that convincing other vendors to incorporate the code in their own systems will take a while.

Meehl says, "We are excited that Web100 will finally help deliver the speed that these networks are capable of." Aside from that potential benefit, she adds, the project will "help us to diagnose and correct fundamental network problems such as packet loss and routing problems."

Coming up

Web100 is looking for active co-developers and plans to set up a Web portal that will allow members to contribute to the project and its products. The first face-to-face meeting of Web100 evaluators from over a dozen institutions took place at NCAR on 24–25 July. It included status reports, project plans, and updates on related work. The Web100 code will eventually be downloadable from the Web site below.

On the Web:



Edited by Bob Henson, bhenson@ucar.edu
Prepared for the Web by Jacque Marshall
Last revised: Wed Aug 8 17:05:07 MDT 2001