NCAR Host Monitoring Policy - 4/1/08

The Computer Production Group (CPG), in collaboration with the Network Engineering and Telecommunications Section (NETS), has traditionally monitored the status of network-attached hosts throughout NCAR/UCAR. Because this is a courtesy service, monitoring is limited to ten hosts per division / program. UCAR General Purpose hosts (e.g. web servers, auth servers, VPN servers) are not subject to this limit. CPG will contact the appropriate system administrators when a problem is detected by our monitoring systems.
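The ten-host limit can be illustrated with a minimal sketch. The record layout (hostname, division, general-purpose flag) and the function name are assumptions made for illustration only; they do not describe CPG's actual monitoring tools.

```python
from collections import Counter

# Limit stated in the policy: ten monitored hosts per division / program.
HOST_LIMIT_PER_DIVISION = 10


def divisions_over_limit(hosts):
    """Return {division: host_count} for divisions exceeding the limit.

    `hosts` is an iterable of (hostname, division, is_general_purpose)
    tuples. This layout is hypothetical. UCAR General Purpose hosts are
    skipped because the policy does not limit them.
    """
    counts = Counter(
        division
        for _hostname, division, is_general_purpose in hosts
        if not is_general_purpose
    )
    return {
        division: n
        for division, n in counts.items()
        if n > HOST_LIMIT_PER_DIVISION
    }
```

A division with eleven ordinary hosts would be reported, while any number of General Purpose hosts would not count toward the limit.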

In order to manage this process, the following definitions of responsibilities and severities will constitute the host monitoring policy.

Responsibilities:

CPG

Upon detecting a problem with a divisional host, CPG will make a reasonable effort to notify the appropriate staff. This reasonable effort will consist of:

  1. E-mail the first contact person or telephone their designated contact number (using the urgent option when leaving voice mail).
    • Division / program representatives may contact CPG at 303-497-1200.

  2. If the first contact person has not responded within 15 minutes, the second person on the contact list will be contacted.

  3. If the second contact person has not responded within 15 minutes, the third and final person on the list will be contacted.

  4. If the third person does not respond, CPG will make no further attempts.
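The escalation steps above can be sketched as a short loop. This is an illustrative outline only: the `Contact` type and the `notify` and `wait_for_response` callbacks are hypothetical, and the sketch assumes the third contact is given the same 15-minute window as the first two, which the policy does not state.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

RESPONSE_WINDOW_MINUTES = 15  # wait time stated in steps 2 and 3
MAX_CONTACTS = 3              # the contact list holds three people


@dataclass
class Contact:
    name: str
    phone: str


def escalate(
    contacts: Sequence[Contact],
    notify: Callable[[Contact], None],
    wait_for_response: Callable[[Contact, int], bool],
) -> Optional[Contact]:
    """Work through the prioritized contact list in order.

    Returns the first contact who responds, or None if nobody does,
    at which point no further attempts are made (step 4).
    """
    for contact in contacts[:MAX_CONTACTS]:
        notify(contact)  # e-mail or telephone (hypothetical callback)
        # Assumption: the same window applies to every contact.
        if wait_for_response(contact, RESPONSE_WINDOW_MINUTES):
            return contact
    return None
```

Passing the notification and waiting behavior in as callbacks keeps the ordering logic separate from how staff are actually reached.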

Division / Program Representative

Each division / program that has hosts monitored by CPG will have a designated division / program representative. That representative will have the following responsibilities:

  1. Notify CPG of any planned upgrades or outages to monitored hosts. Delegated division / program representatives may also schedule upgrades and outages.

  2. Notify CPG of any hosts that should be added or replaced by submitting a work request: https://cislcustomersupport.ucar.edu/evj/ExtraView/evSignon

  3. Provide CPG with the prioritized list of three divisional contacts with a work request: https://cislcustomersupport.ucar.edu/evj/ExtraView/evSignon

  4. Notify CPG of any changes to the contact list with a work request: https://cislcustomersupport.ucar.edu/evj/ExtraView/evSignon

Severities:

CPG will make every attempt to balance and distribute our monitoring services fairly. Systems such as the Mass Storage System, production supercomputers, and the FRGP are designated Severity 1, the highest severity.

In the event a Severity 1 system requires attention at the same time as a divisional host, the Severity 1 system will be addressed first. The responsible staff will be contacted regarding the divisional host when time permits.
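The ordering described above can be sketched as a small priority queue. The class name, the numeric value used for divisional hosts, and the host names in the usage note are assumptions for illustration; the policy names only Severity 1.

```python
import heapq
from itertools import count

SEVERITY_1 = 1          # highest severity, per the policy
SEVERITY_DIVISIONAL = 2  # hypothetical value for divisional hosts


class AlertQueue:
    """Serve Severity 1 alerts before divisional ones.

    Alerts of equal severity are served in arrival order.
    """

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker that preserves arrival order

    def push(self, host, severity):
        heapq.heappush(self._heap, (severity, next(self._arrival), host))

    def pop(self):
        _severity, _order, host = heapq.heappop(self._heap)
        return host

    def __len__(self):
        return len(self._heap)
```

If a divisional host alert is pushed first and a Severity 1 alert arrives afterward, `pop()` still returns the Severity 1 host first; the divisional host is handled once the queue reaches it.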

This policy will be reviewed annually.