Object Service Architecture

Network Monitor Service

Project Summary

Gil Hansen
Object Services and Consulting, Inc.
September 15, 1998

Executive Summary

Performance is one of a collection of overall systemic properties that pervade all parts of a system's design. In large distributed systems, many decisions are made at design time based on the need for improved performance, but the tools for instrumenting a system for performance measurement at run time and for using this information to adaptively configure systems are often missing or idiosyncratic. The OSA/Network Monitor Service project provides one of the missing ingredients to make large systems performance-aware and bandwidth-adaptive, namely, a generic wrapper that can be used with WAN or LAN Web accesses to record network performance data and trends. This data can later be used in other OSA/* projects (e.g., OSA/Query Augmentor and OSA/Annotations) to provide those projects with the performance metadata they will need to become bandwidth-adaptive.

Problem Statement

Certain decisions made by applications and middleware services require knowing about the current and historical performance conditions of the underlying communication network. For example, in selecting the best site to process a query, a query service needs to know if the source node is active and the expected traffic on links at specific times of the day. To date, most applications above the network level are not bandwidth-adaptive, that is, they are insensitive to or unaware of the quality of service (QoS) of the network they operate on. A prerequisite to making applications bandwidth-adaptive is to provide them with network topology, historical trend, and current condition performance metadata, but this data is not generally available.

Objectives

The near-term objective of this project was to build a Personal Network Monitor Service (pNMS) for the Web that captures network performance metadata based on URL accesses. A browser issues a GET operation on a URL as usual, but a side effect, if the service is enabled, is to collect network access data on the route of the access through the network and time delays at each hop. We wanted to determine:

how to install an interceptor (access wrapper) to seamlessly capture the access event
how to install a web-based behavioral extension service for capturing the performance data
what performance metadata to collect and what kind of repository to put it in

The long term objective was to provide application and middleware services with network historical trend and current condition performance metadata so they may be aware of the quality of service (QoS) of the network they operate on and thereby enable them to become bandwidth-adaptive. Ultimately, the objective was to provide a generic architectural framework that supports Web-based activities including the optimization of a distributed application; the monitoring of resources, performance, and QoS parameters; resource management; and the management of QoS.

Approach

Ideally, the Internet would be subdivided into monitored domains or realms, and a Network Monitor Service (NMS) would supply current and historical performance conditions. But this information is not generally available. In this project, as an alternative, we explored a local personal NMS (pNMS), which can act as a poor man's NMS. pNMS would unobtrusively monitor user network accesses and record performance information in a local metadata repository (MDR). As a user accesses information using URLs in a browser, an interceptor captures the URL and information about the path taken through the Internet (e.g., nodes visited, transit times between nodes). Over time, the nodes and links of the portion of the Internet of interest to the user are mapped and a picture emerges of node and link availability and responsiveness from the viewpoint of the user. To enlarge the mapped area, a monitor extension could autonomously and periodically ping Internet nodes and record performance data in the MDR. A poor man's NMS for a domain (collection of users, e.g., in an organization or with similar interests, for instance, collaborators) would pool performance metadata gathered by local NMSs and provide a broader coverage of current and historical conditions. One means of a domain NMS forming a collective view is to systematically visit each local site in the domain and analyze its pNMS MDR. Upon return, the gathered information would be used to update the domain NMS's MDR. Alternatively, each local pNMS could periodically transmit its MDR to the domain NMS.
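The mapping of nodes and links described above can be illustrated with a minimal sketch. It is not the project's PSE-based repository: the class names Hop, LinkStats, and LocalMdr, and the "from -> to" link key, are hypothetical and chosen only for illustration. Each recorded access contributes one transit-time sample per link on its path, so repeated accesses gradually build up per-link averages.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** One node observed on the path of a URL access (hypothetical shape). */
final class Hop {
    final String node;     // host name or address of the node visited
    final long transitMs;  // transit time from the previous node, in milliseconds

    Hop(String node, long transitMs) {
        this.node = node;
        this.transitMs = transitMs;
    }
}

/** Running statistics for one link between two adjacent nodes. */
final class LinkStats {
    int samples;
    long totalMs;

    void add(long ms) {
        samples++;
        totalMs += ms;
    }

    double meanMs() {
        return samples == 0 ? 0.0 : (double) totalMs / samples;
    }
}

/** In-memory stand-in for a local metadata repository (an assumption, not PSE). */
final class LocalMdr {
    private final Map<String, LinkStats> links = new LinkedHashMap<>();

    /** Records one access; origin is the user's machine, path lists the hops in order. */
    void recordAccess(String origin, List<Hop> path) {
        String previous = origin;
        for (Hop hop : path) {
            String key = previous + " -> " + hop.node;
            links.computeIfAbsent(key, k -> new LinkStats()).add(hop.transitMs);
            previous = hop.node;
        }
    }

    /** Returns the statistics for a link, or null if it has never been observed. */
    LinkStats link(String from, String to) {
        return links.get(from + " -> " + to);
    }

    int linkCount() {
        return links.size();
    }
}
```

Keying statistics by link rather than by URL means two accesses that share part of a route both contribute to the shared links, which is how the user's view of the network fills in over time.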

In this project, monitoring the user and pinging the Internet were handled by a network monitor service, which handles service requests by clients for network conditions. It is transparent to the user whether an NMS probes the pNMS, a domain NMS, or a federation of NMSs. Two interfaces will be provided. One is the pNMS service API that programs can use to ask about network weather conditions. The other is a pNMS GUI with which people can view historical weather conditions for the mapped portion of the Internet.
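The project did not define the programmatic interface, so the following is only a sketch of what such an API might look like. The names WeatherService, Observation, and PersonalWeatherService, and the two query methods, are assumptions made for illustration. The point of the interface is the transparency mentioned above: a client codes against WeatherService and need not know whether a personal, domain, or federated NMS answers.

```java
import java.util.ArrayList;
import java.util.List;

/** One measurement for a host (hypothetical shape). */
final class Observation {
    final String host;
    final long timestampMs;  // when the measurement was taken
    final long roundTripMs;  // measured round-trip time

    Observation(String host, long timestampMs, long roundTripMs) {
        this.host = host;
        this.timestampMs = timestampMs;
        this.roundTripMs = roundTripMs;
    }
}

/** Hypothetical client-facing interface for asking about network weather. */
interface WeatherService {
    /** Most recent observation for the host, or null if none has been recorded. */
    Observation current(String host);

    /** Observations for the host with fromMs <= timestamp < toMs, in recorded order. */
    List<Observation> history(String host, long fromMs, long toMs);
}

/** A personal NMS backed by an in-memory log; a domain NMS could implement the same interface. */
final class PersonalWeatherService implements WeatherService {
    private final List<Observation> log = new ArrayList<>();

    /** Appends an observation; this sketch assumes calls arrive in time order. */
    void record(Observation o) {
        log.add(o);
    }

    @Override
    public Observation current(String host) {
        // Scan backwards so the latest matching entry is found first.
        for (int i = log.size() - 1; i >= 0; i--) {
            if (log.get(i).host.equals(host)) {
                return log.get(i);
            }
        }
        return null;
    }

    @Override
    public List<Observation> history(String host, long fromMs, long toMs) {
        List<Observation> result = new ArrayList<>();
        for (Observation o : log) {
            if (o.host.equals(host) && o.timestampMs >= fromMs && o.timestampMs < toMs) {
                result.add(o);
            }
        }
        return result;
    }
}
```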

pNMS is a scalable service. As the Internet expands, pNMSs can be added to new nodes in a domain. A spider for a domain NMS would then have more local sites to visit, or there could be multiple spiders concurrently gathering information from disjoint portions of the domain, with their results combined upon return to form a collective view. There could even be one Internet NMS that gathers weather information from the domain NMSs. Thus, NMSs are naturally hierarchical and scale as links and nodes are added. Domain NMSs also decentralize the gathering and usage of weather information. Weather information need not be routed to a central NMS for processing. Instead, spiders can roam a domain visiting local pNMSs to form a collective view of the domain, or roam domains visiting domain NMSs to form a collective view of the Internet.
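Combining the results that spiders bring back can be sketched as a simple merge of per-host summaries. This is one possible approach under stated assumptions, not the project's design: HostSummary and DomainView are hypothetical names, and each local view is assumed to have already been reduced to a sample count and a total round-trip time per host. Because the merge only adds counts and totals, the same routine could combine pNMS views into a domain view or domain views into an Internet-wide view.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Immutable per-host summary of round-trip measurements (hypothetical shape). */
final class HostSummary {
    final int samples;
    final long totalRttMs;

    HostSummary(int samples, long totalRttMs) {
        this.samples = samples;
        this.totalRttMs = totalRttMs;
    }

    /** Combines two summaries for the same host by adding counts and totals. */
    HostSummary plus(HostSummary other) {
        return new HostSummary(samples + other.samples, totalRttMs + other.totalRttMs);
    }

    double meanRttMs() {
        return (double) totalRttMs / samples;
    }
}

final class DomainView {
    /** Merges the views gathered from several local repositories into one collective view. */
    static Map<String, HostSummary> combine(List<Map<String, HostSummary>> localViews) {
        Map<String, HostSummary> merged = new HashMap<>();
        for (Map<String, HostSummary> local : localViews) {
            for (Map.Entry<String, HostSummary> e : local.entrySet()) {
                // Insert the summary, or add it to the one already present for that host.
                merged.merge(e.getKey(), e.getValue(), HostSummary::plus);
            }
        }
        return merged;
    }
}
```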

Limitations of Related Work

There already exist a variety of network monitoring tools and network management schemes and Internet and TCP/IP measurement tools. These include:

Windows and Unix provide Ping and TraceRT, which are low-level diagnostic utilities for probing connectivity to a remote host. Ping verifies the connection to one or more remote computers. TraceRT determines the route taken to a destination by sending ICMP echo packets with varying Time-To-Live (TTL) values to the destination. Both are primitive services that provide basic network performance data, but neither collects, analyzes, or records the data.
Net.Medic monitors IP packets from the web user's point of view, providing displays of router hops and bandwidth limits between the browser and every Web server the user accesses.
Some work has been done on Internet Weather Services at UCSD. The weather information is not generally accessible to network applications and the approach requires daemons installed on the machines being monitored.
There is a growing literature on Quality of Service, though mostly from the communication layer. The DARPA Quorum program is concentrating here. One interesting effort is the JTF/ATD Communications Server, implemented by BBN. It monitors network conditions and advises clients of the available QoS (available bandwidth and latency), negotiates access, monitors network usage, and enforces negotiated network usage. The server measures the current state of a WAN: it periodically probes remote WAN locations, and computes and stores the short and medium term average quality of service available to each remote location. The data is published to the distributed management object Corbus server and can be used to advise a client application of currently available network resources. The server is not widely available and the monitored information is limited to communication QoS.
Emerging networks, such as ATM, can provide QoS guarantees on bandwidth and delay for the transfer of continuous media data. Recently the Internet Engineering Task Force (IETF) has begun to address QoS issues for ATM. RFC 1946, Native ATM Support for ST2+, addresses deterministic delivery services. RFC 1932, IP over ATM: A Framework Document, addresses how real-time applications' QoS requirements can be expressed and effectively accomplished using ATM or IP capabilities. These efforts are at the protocol level and do not provide support at the application level.
Most network measurement tools do not track performance characteristics of Internet nodes and links at the application level, nor do they allow for a quantitative assessment of the implications for Internet users and applications. Internet Service Monitors and Network Planning Systems are emerging classes of tools that address this problem. They also do not provide management services across some kinds of domain boundaries, like firewalls.
Traditional telecommunication network and distributed Network Management systems are based on the concept of a managed object. Managed objects are used to describe different components of a network/system and are used by management software. In particular, they can be used to retain dynamic values such as monitor information. While such management systems typically have measurement components, measurement data is usually at the network element level and not assimilated for export as a service to the application layer. One effort in this direction is OMG's System Management Common Facilities, a set of utility interfaces such as control, monitoring, security management, configuration, and policy that are needed to perform Systems Management operations. Generally, the micromanagement interaction of the management station (client) and devices (servers) leads to the generation of high traffic and computation overload for the management station. The management station has to communicate with a large number of devices, and to store and process an ever increasing amount of data. Both IETF and ISO have taken some steps to alleviate the situation by introducing some primitive forms of decentralization in their approaches, e.g., event notification, proxies that act as both a client and a server, Remote MONitoring (RMON) that uses network monitoring devices to determine the status of the network by direct inspection of the packets flowing in it, and the delegation of tasks to proxies by the management station to offload computation and introduce a higher degree of parallelism. There is also the IETF AgentX Working Group, which addresses technology that allows communication between a master agent and multiple subagents that contain management instrumentation. The Agent Extensibility (AgentX) Protocol Specification defines an administrative framework for extensible SNMP agents.

Most QoS work has focused on providing end-to-end network-level QoS (e.g., IETF RSVP) and not top-to-bottom application to comm-layer QoS. For instance, currently Internet search engines do not take network performance data into account. While performance data will not improve the quality of the query result, it can affect the quality of the query service.

Results

Preliminary investigative work for this project was completed, namely, an Internet Tools Survey section on Quality of Service. We identified enabling technology (see below) for use in the project.

Fundamental infrastructure and functionality has been implemented. The weather service was integrated with the Intermediary Architecture that uses W3C's Jigsaw proxy server. The URLs of unique remote sites accessed are captured and passed to a concurrent performance monitor application, which generates round-trip times to each site using Ping and stores the information in the pNMS metadata repository (MDR), a PSE object database. Also, the effective bandwidth to each site (total size of download/duration of download) is calculated. [NOTE: The duration includes the time for the browser to determine whether to use a cached version of the resource (assuming one exists). It must query the server to determine the date the resource was last modified, and if the server is busy this will add to the duration for those uncached portions.]
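As a worked illustration of the bandwidth calculation, the sketch below takes effective bandwidth to mean bytes transferred per unit of elapsed time. The class and method names are hypothetical, and the start and end timestamps are assumed to come from whatever hooks bracket the download (for example, a before filter and an after filter); the actual filter code is not reproduced here.

```java
/** Hypothetical helper for the effective-bandwidth calculation. */
final class EffectiveBandwidth {
    /**
     * Returns bytes per second for a download of the given size that was
     * observed to start at startMs and finish at endMs (both in milliseconds).
     */
    static double bytesPerSecond(long bytes, long startMs, long endMs) {
        long elapsedMs = endMs - startMs;
        if (bytes < 0 || elapsedMs <= 0) {
            throw new IllegalArgumentException("need a non-negative size and a positive duration");
        }
        // Multiply by 1000.0 to convert milliseconds to seconds in floating point.
        return bytes * 1000.0 / elapsedMs;
    }
}
```

For example, a 50,000-byte resource that takes 2 seconds to arrive yields 25,000 bytes per second. Because the measured duration also covers any cache-validation round trip, the figure is a lower bound on what the link itself could deliver.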

A Historical View GUI allows performance metadata for a particular URL to be filtered and viewed. Data can be presented in traditional tabular form, in which sorting on any column reorders the rows, or the raw data for each entry can be dumped in a prescribed format.

The weather service is not ready for use by others because 1) an API for accessing historical data for a remote host hasn't been defined; 2) there are no mechanisms for providing performance trend data; 3) network performance tools yielding relevant metrics have not been identified or incorporated; 4) performance metadata is not collected from multiple MDRs and used to form a composite view. Basically, only the underlying infrastructure has been implemented.

The MDR may be queried using the OSA Natural Language Interface. This was demonstrated by transforming the PSE database into an Access database and constructing a description of the legal queries.

Enabling Technology

Development was mainly in Java, with C++ used where necessary (e.g., the implementation of native methods). The Java Development Kit (JDK) was used to compile and debug the Java code. Microsoft's Visual C++ was used for C++ development.
Jigsaw before/after filters, methods defined within the Intermediary Architecture weather class, capture URLs being accessed by the Netscape browser and measure the effective bandwidth of accessed resources.
The OSA/Weather project investigated using existing network monitoring tools, including the DOS and UNIX utilities Ping and TraceRT, and Net.Medic. The only performance tool used in the prototype was Ping, which is built into Windows 95/NT. It is accessed using a Java native interface.
The QoS Metadata Repository (MDR) was implemented using Object Design's ObjectStore PSE for Java.
pNMS GUIs were implemented using Sun's Swing Set (part of the JFC to be released with JDK 1.2).

Lessons Learned

Jigsaw, the proxy server used by the IA, doesn't provide an API that mirrors browser events. In particular, one cannot determine

when a download starts and stops,
whether a download was interrupted (via hyperlinking, going back, or reloading), or
whether accesses come from multiple active browsers.

In addition, timestamps in the HTTP header are unreliable, and the size of a resource is not always available from the HTTP header.
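One way to cope with a missing size in the HTTP header is to count the bytes of the body as they pass through. The sketch below is an illustration under assumptions, not what the prototype did: CountingInputStream and ResourceSize are hypothetical names, and a declared length of -1 is taken to mean "unknown", mirroring the convention of java.net.URLConnection.getContentLengthLong().

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Wraps a stream and counts the bytes actually read from it (skipped bytes are not counted). */
final class CountingInputStream extends FilterInputStream {
    private long count;

    CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            count++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            count += n;
        }
        return n;
    }

    long bytesRead() {
        return count;
    }
}

final class ResourceSize {
    /**
     * Returns the declared length if the header supplied one (>= 0);
     * otherwise drains the body and returns the number of bytes seen.
     */
    static long of(long declaredLength, InputStream body) {
        if (declaredLength >= 0) {
            return declaredLength;
        }
        CountingInputStream counter = new CountingInputStream(body);
        byte[] buf = new byte[4096];
        try {
            while (counter.read(buf) != -1) {
                // Discard the data; only the count matters in this sketch.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counter.bytesRead();
    }
}
```

In a proxy, the counting wrapper would sit on the stream being relayed to the browser rather than draining it, so the size becomes known only once the transfer completes.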

A major problem with using public network performance tools is not having access to the raw data, since raw data is what is stored in the MDR. Many tools, such as Net.Medic, only provide graphical displays. The alternative is to become a TCP/IP expert and build our own network performance tools, which does not contribute directly to the goals of the project. Also, one tool of interest, namely pathchar by the author of traceroute, is immature and, like other tools, implemented for UNIX (Solaris) platforms.
A major problem with adding new performance metrics to objects in an OO database like PSE is that the database must be rewritten using the new objects. What is needed is a database, like Lore (Lightweight Object Repository), a Stanford semi-structured self-describing database, to store performance/QoS metrics in XML. Unfortunately, Lore is written in C++ and only runs under Solaris.
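The schema problem can be illustrated with a record whose metrics are named at run time instead of being fixed as fields, so that adding a metric does not change the stored type. This is only a sketch of the idea: MetricRecord is a hypothetical class, the element and attribute names in the XML fragment are invented for illustration and do not follow XML-Data or Lore, and host and metric names are assumed to contain no characters that need XML escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** A measurement whose metrics are looked up by name rather than declared as fields. */
final class MetricRecord {
    private final String host;
    private final Map<String, Double> metrics = new LinkedHashMap<>();

    MetricRecord(String host) {
        this.host = host;
    }

    /** Adds or replaces a metric; a new metric name needs no change to this class. */
    MetricRecord with(String name, double value) {
        metrics.put(name, value);
        return this;
    }

    /** Returns the metric value, or null if the record does not carry that metric. */
    Double get(String name) {
        return metrics.get(name);
    }

    /** Renders the record as a small self-describing XML fragment (no escaping performed). */
    String toXml() {
        StringBuilder sb = new StringBuilder();
        sb.append("<record host=\"").append(host).append("\">");
        for (Map.Entry<String, Double> e : metrics.entrySet()) {
            sb.append("<metric name=\"").append(e.getKey()).append("\">")
              .append(e.getValue())
              .append("</metric>");
        }
        return sb.append("</record>").toString();
    }
}
```

Older records simply lack the newer metric names, so readers must tolerate a null from get; that is the price of not rewriting the database when a metric is added.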

XML-Data

Next Steps

These are some of the potential next steps for this project. More work is needed in many areas.

Identify key network performance and QoS metrics, and the available tools to measure them (see network monitoring tools and network management schemes and Internet and TCP/IP measurement tools). Write a technical note on Network Management and Network Monitoring Tools that will record their capabilities and limitations.
Invoke additional network monitoring tool(s) in parallel with capturing sites accessed (within the IA).
Make MDR objects extensible (so new performance metadata can easily be added). It will be possible to use XML when the working draft of XML-Data becomes available. In the interim, investigate replacing PSE with JavaSpaces, an object repository that enables applications and hardware to share work and results over a distributed environment.
Use MIBs (Management Information Base) to passively get network stats from network devices using SNMP (when available to 3rd parties).
Identify mechanisms/tools to predict network weather trends.
Specify an API interface that programs can use to obtain network weather conditions.
Determine how to aggregate QoS metadata from multiple sources, possibly via shared or federated repositories.
Determine how to make a higher level service, such as the OBJS annotation and query services, bandwidth adaptive.
Determine how to automate the augmentation process, i.e., incorporate new tools and store new metrics in the MDR.
Investigate scaling to the Internet by subdividing it into monitored domains/realms with a weather service per domain and a weather service for the entire Internet.
Replace the custom MDR with a generic metadata repository.
Display a map of Web sites visited for viewing historical weather and providing explanations about accessed URLs, and providing trend data. For example, nodes would represent Internet servers and arcs links between servers. Clicking on a link or map node would pop up performance information about the link/node. The GUI could also provide data on the URLs visited sorted by date, domain, etc. along with data on the paths taken.
When accessing a URL, optionally display weather conditions.
Determine the monitoring role the weather service might play in OMG's System Management Common Facilities, a set of utility interfaces that are needed to perform Systems Management.
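As an illustration of the trend-prediction step listed above, one simple candidate mechanism is an exponentially weighted moving average of round-trip times. The project did not select or implement any trend mechanism, so this is purely a sketch; the class name RttTrend and the smoothing factor alpha are assumptions. Each new sample pulls the estimate toward itself by a fraction alpha, so recent conditions weigh more than old ones.

```java
/** Exponentially weighted moving average of round-trip times (illustrative only). */
final class RttTrend {
    private final double alpha;  // weight of the newest sample, in (0, 1]
    private double smoothed;
    private boolean seeded;

    RttTrend(double alpha) {
        if (!(alpha > 0.0 && alpha <= 1.0)) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
    }

    /** Folds in one sample and returns the updated estimate. */
    double update(double rttMs) {
        if (seeded) {
            smoothed = alpha * rttMs + (1.0 - alpha) * smoothed;
        } else {
            smoothed = rttMs;  // the first sample seeds the estimate
            seeded = true;
        }
        return smoothed;
    }
}
```

With alpha set to 0.5, samples of 100, 200, and 100 milliseconds produce estimates of 100, 150, and 125 milliseconds in turn.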

Impact

We cannot make object services architecture performance-aware if we do not have a means of measuring OSA performance. pNMS by itself is a relatively simple service. Some interesting aspects of pNMS are:

it is installed using the OSA/Intermediary Architecture, which provides a URL interceptor and wrapper
it can provide the performance metadata needed for bandwidth-adaptive applications
it provides end-users with an understandable view of network performance metadata
it could be coupled with other services

These results are aligned with our ultimate goal of providing a generic architectural framework that supports Web-based activities including the optimization of a distributed application; the monitoring of resources, performance, and QoS parameters; resource management; and the management of QoS.

This research is sponsored by the Defense Advanced Research Projects Agency and managed by the U.S. Army Research Laboratory under contract DAAL01-95-C-0112. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency, U.S. Army Research Laboratory, or the United States Government.

© Copyright 1997, 1998 Object Services and Consulting, Inc. Permission is granted to copy this document provided this copyright statement is retained in all copies. Disclaimer: OBJS does not warrant the accuracy or completeness of the information in this survey.

Last revised: September 15, 1998. Send comments to Gil Hansen.