Object Service Architecture
Network Monitor Service
Project Summary
Gil Hansen
Object Services and Consulting,
Inc.
September 15, 1998
Executive Summary
Performance is one of a collection of overall systemic properties that
pervade all parts of a system's design. In large distributed systems,
many decisions are made at design time based on the need for improved performance,
but the tools for instrumenting a system for performance measurement at
run time and for using this information to adaptively configure systems
are often missing or idiosyncratic. The OSA/Network Monitor Service
project provides one of the missing ingredients to make large systems performance-aware
and bandwidth-adaptive, namely, a generic wrapper that can be used with
WAN or LAN Web accesses to record network performance data and trends.
This data can later be used in other OSA/* projects (e.g., OSA/Query Augmentor
and OSA/Annotations) to provide those projects with the performance metadata
they will need to become bandwidth-adaptive.
Problem Statement
Certain decisions made by applications and middleware services require
knowing about the current and historical performance conditions of the
underlying communication network. For example, in selecting the best
site to process a query, a query service needs to know if the source node
is active and the expected traffic on links at specific times of the day.
To date most applications above the network level are not bandwidth-adaptive,
that is, they are insensitive to or unaware of the quality of service (QoS)
of the network they operate on. A prerequisite to making applications bandwidth-adaptive
is to provide them network topology, historical trend, and current condition
performance metadata, but this data is not generally available.
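A minimal sketch of what a bandwidth-adaptive choice could look like once such metadata exists. The class names, fields, and host names are hypothetical and are not part of the project's code. The selection rule (lowest expected round-trip time among active sites) is only one assumed policy among many.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-site performance record; field names are illustrative only. */
class SiteCondition {
    final String host;
    final boolean active;            // did the site answer the most recent probe?
    final double expectedRttMillis;  // expected round-trip time for the current time of day

    SiteCondition(String host, boolean active, double expectedRttMillis) {
        this.host = host;
        this.active = active;
        this.expectedRttMillis = expectedRttMillis;
    }
}

class SiteSelector {
    /** Returns the active site with the lowest expected round-trip time, or null if none is active. */
    static SiteCondition best(List<SiteCondition> candidates) {
        SiteCondition best = null;
        for (SiteCondition c : candidates) {
            if (!c.active) {
                continue; // an inactive source node cannot process the query
            }
            if (best == null || c.expectedRttMillis < best.expectedRttMillis) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<SiteCondition> sites = new ArrayList<>();
        sites.add(new SiteCondition("a.example.org", true, 180.0));
        sites.add(new SiteCondition("b.example.org", false, 20.0));
        sites.add(new SiteCondition("c.example.org", true, 95.0));
        System.out.println(best(sites).host); // the fastest site that is also active
    }
}
```

Without the performance metadata, the query service has no basis for skipping the inactive site or preferring the faster one.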
Objectives
The near-term objective of this project was to build a Personal Network
Monitor Service (pNMS) for the Web that captures network performance metadata
based on URL accesses. A browser issues a GET operation on a URL
as usual, but a side effect, if the service is enabled, is to collect network
access data on the route of the access through the network and time delays
at each hop. We wanted to determine:
-
how to install an interceptor (access wrapper) to seamlessly capture the
access event
-
how to install a web-based behavioral extension service for capturing the
performance data
-
what performance metadata to collect and what kind of repository to put
it in
The long term objective was to provide application and middleware services
with network historical trend and current condition performance metadata
so they may be aware of the quality of service (QoS) of the network they
operate on and thereby enable them to become bandwidth-adaptive. Ultimately,
the objective was to provide a generic architectural framework that supports
Web-based activities including the optimization of a distributed application;
the monitoring of resources, performance, and QoS parameters; resource
management; and the management of QoS.
Approach
Ideally, the Internet would be subdivided into monitored domains or realms,
and a Network Monitor Service (NMS) would supply current and historical
performance conditions. But this information is not generally available.
In this project, as an alternative, we explored a local personal NMS (pNMS),
which can act as a poor man's NMS. pNMS would unobtrusively monitor user
network accesses and record performance information in a local metadata
repository (MDR). As a user accesses information using URLs in a browser,
an interceptor captures the URL and information about the path taken through
the Internet (e.g., nodes visited, transit times between nodes). Over time,
the nodes and links of the portion of the Internet of interest to the user
are mapped and a picture emerges of node and link availability and responsiveness
from the viewpoint of the user. To enlarge the mapped area, a monitor extension
could autonomously and periodically ping Internet nodes and record performance
data in the MDR. A poor man's NMS for a domain (collection of users, e.g.,
in an organization or with similar interests, for instance, collaborators)
would pool performance metadata gathered by local NMSs and provide a broader
coverage of current and historical conditions. One means of a domain NMS
forming a collective view is to systematically visit each local site in
the domain and analyze its pNMS MDR. Upon return, the gathered information
would be used to update the domain NMS's MDR. Alternatively, each local
pNMS could periodically transmit its MDR to the domain NMS.
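The pooling step can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: an MDR is modelled as a plain map from host name to round-trip samples in milliseconds, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of a domain-level view built by pooling samples from several local repositories. */
class DomainView {
    private final Map<String, List<Double>> samplesByHost = new HashMap<>();

    /** Folds one local repository (host -> round-trip samples in ms) into the domain view. */
    void merge(Map<String, List<Double>> localMdr) {
        for (Map.Entry<String, List<Double>> e : localMdr.entrySet()) {
            samplesByHost.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                         .addAll(e.getValue());
        }
    }

    /** Mean round-trip time over all pooled samples for a host, or NaN if the host is unknown. */
    double meanRtt(String host) {
        List<Double> samples = samplesByHost.get(host);
        if (samples == null || samples.isEmpty()) {
            return Double.NaN;
        }
        double sum = 0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.size();
    }
}
```

The same `merge` call would serve both collection styles described above: a visiting agent could call it once per site it analyzes, or the domain NMS could call it whenever a local pNMS transmits its repository.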
In this project, monitoring the user and pinging the Internet were handled
by a network monitor service, which also handles service requests from
clients for network conditions. It is transparent to the user whether an
NMS probes the pNMS, a domain NMS, or a federation of NMSs. Two interfaces
will be provided. One is the pNMS service API that programs
can use to ask about network weather conditions. The other is a
pNMS GUI that lets people view historical weather conditions for the mapped portion
of the Internet.
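To make the programmatic interface more concrete, here is one possible shape for it. Everything in this sketch is an assumption: the interface name, its two methods, and the in-memory implementation are hypothetical and only illustrate how a caller could stay unaware of which kind of NMS answers.

```java
import java.util.ArrayList;
import java.util.List;

/** One timestamped observation; a hypothetical shape for a weather record. */
class WeatherSample {
    final String host;
    final long timestampMillis;
    final double rttMillis;

    WeatherSample(String host, long timestampMillis, double rttMillis) {
        this.host = host;
        this.timestampMillis = timestampMillis;
        this.rttMillis = rttMillis;
    }
}

/** Hypothetical client-facing interface; a pNMS, domain NMS, or federation could implement it. */
interface NetworkWeatherService {
    /** Most recent sample for the host, or null when nothing has been recorded. */
    WeatherSample current(String host);

    /** Samples for the host with fromMillis <= timestamp < toMillis, oldest first. */
    List<WeatherSample> history(String host, long fromMillis, long toMillis);
}

/** Simplest possible backing store, standing in for a real metadata repository. */
class InMemoryWeatherService implements NetworkWeatherService {
    private final List<WeatherSample> samples = new ArrayList<>();

    void record(WeatherSample s) {
        samples.add(s);
    }

    public WeatherSample current(String host) {
        WeatherSample latest = null;
        for (WeatherSample s : samples) {
            if (s.host.equals(host)
                    && (latest == null || s.timestampMillis >= latest.timestampMillis)) {
                latest = s;
            }
        }
        return latest;
    }

    public List<WeatherSample> history(String host, long fromMillis, long toMillis) {
        List<WeatherSample> out = new ArrayList<>();
        for (WeatherSample s : samples) {
            if (s.host.equals(host)
                    && s.timestampMillis >= fromMillis && s.timestampMillis < toMillis) {
                out.add(s);
            }
        }
        out.sort((a, b) -> Long.compare(a.timestampMillis, b.timestampMillis));
        return out;
    }
}
```

A client written against `NetworkWeatherService` would not need to change if the implementation behind it were swapped from a local store to a domain-level one.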
pNMS is a scalable service. As the Internet expands, pNMSs can be added
to new nodes in a domain. A spider for a domain NMS would now have more
local sites to visit, or there could be multiple spiders concurrently gathering
information from disjoint portions of the domain and their results combined
upon return to form a collective view. There could even be one Internet
NMS that gathers weather information from the domain NMSs. Thus, NMSs are
naturally hierarchical and scale as links and nodes are added. Domain NMSs
also decentralize the gathering and usage of weather information. Weather
information need not be routed to a central NMS for processing. Instead
spiders can roam a domain visiting local pNMS to form a collective view
of the domain, or roam domains visiting domain NMSs to form a collective
view of the Internet.
Limitations of Related Work
There already exist a variety of network
monitoring tools and network management schemes and Internet
and TCP/IP measurement tools. These include:
-
Windows and Unix provide Ping and TraceRT (traceroute on Unix), low-level diagnostic
utilities for probing the connection to a remote host. Ping verifies the connection
to one or more remote computers. TraceRT determines the route taken to a destination
by sending ICMP echo packets with increasing time-to-live (TTL) values toward the
destination. Both are primitive services that provide basic network performance data,
but neither collects, analyzes, or records the data.
-
Net.Medic monitors IP packets
from the web user's point of view, displaying router hops and
bandwidth limits between the browser and every Web server the user accesses.
-
Some work has been done on Internet Weather Services at UCSD.
The weather information is not generally accessible to network applications
and the approach requires daemons installed on the machines being monitored.
-
There is a growing literature on Quality
of Service though mostly from the communication layer. The
DARPA Quorum program is concentrating here. One interesting effort
is the JTF/ATD Communications
Server, implemented by BBN. It monitors network conditions and advises
clients of the available QoS (available bandwidth and latency), negotiates
access, monitors network usage and enforces negotiated network usage. QoS
measures the current state of a WAN: it periodically probes remote WAN
locations, and computes and stores the short and medium term average quality
of service available to each remote location. The data is published to
the distributed management object Corbus server and can advise a client
application of currently available network resources. The server is not
widely available and the monitored information is limited to communication
QoS.
-
Emerging networks, such as ATM, can provide QoS guarantees on bandwidth
and delay for the transfer of continuous media data. Recently the Internet
Engineering Task Force (IETF) has begun to address QoS issues for ATM.
RFC 1946,
Native ATM Support for ST2+, addresses deterministic delivery services.
RFC 1932,
IP over ATM: A Framework Document, addresses how a real-time application's
QoS requirements can be expressed and effectively accomplished using ATM
or IP capabilities. These efforts are at the protocol level and do not
provide support at the application level.
-
Most network measurement tools do not track performance characteristics
of Internet nodes and links at the application level, nor do they allow for a quantitative
assessment of implications on Internet users and applications. Internet
Service Monitors and Network
Planning Systems are emerging classes of tools that address this problem.
They also do not provide management services across some kinds of domain
boundaries, like firewalls.
-
Traditional telecommunication network and distributed Network Management
systems are based on the concept of a managed object. Managed objects are
used to describe different components of a network/system and are used
by management software. In particular, they can be used to retain dynamic
values such as monitor information. While such management systems typically
have measurement components, measurement data is usually at the network
element level and not assimilated for export as a service to the application
layer. One effort in this direction is OMG's System Management Common Facilities
(pdf, PostScript),
a set of utility interfaces such as control, monitoring, security management,
configuration, and policy that are needed to perform Systems Management
operations. Generally, the micro management interaction of the management
station (client) and device (servers) leads to the generation of high traffic
and computation overload for the management station. The management station
has to communicate with a large number of devices, and to store and process
an ever increasing amount of data. Both IETF and ISO have taken some steps
to alleviate the situation by introducing some primitive forms of decentralization
in their approaches, e.g., event notification, proxies that act as both
a client and a server, Remote MONitoring (RMON) that uses network monitoring
devices to determine the status of the network by direct inspection of
the packets flowing in it, and the delegation of tasks to proxies by the
management station to offload computation and introduce a higher degree
of parallelism. There is also the IETF
AgentX Working Group, which addresses technology that allows communication
between a master agent and multiple subagents that contain management instrumentation.
The Agent
Extensibility (AgentX) Protocol Specification defines an administrative
framework for extensible SNMP agents.
Most QoS work has focused on providing end-to-end network-level QoS (e.g.,
IETF RSVP) and not top-to-bottom application to comm-layer QoS. For instance,
currently Internet search engines do not take network performance data
into account. While performance data will not improve the quality of the
query result, it can affect the quality of the query service.
Results
Preliminary investigative work for this project was completed, namely,
an Internet Tools Survey
section on Quality of Service.
We identified enabling technology (see below)
for use in the project.
Fundamental infrastructure and functionality has been implemented. The
weather service was integrated with the Intermediary Architecture that
uses W3C's Jigsaw proxy server.
The URLs of unique remote sites accessed are captured and passed to a concurrent
performance monitor application, which generates round-trip times to each
site using Ping and stores the information in the pNMS metadata repository
(MDR), a PSE object database. Also, the effective bandwidth to each site
(total size of download/duration of download) is calculated. [NOTE: The
duration includes the time for the browser to determine whether to use
a cached version of the resource (assuming one exists). It must query the
server to determine the date the resource was last modified, and if the
server is busy this will add to the duration for those uncached portions.]
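The effective-bandwidth calculation can be sketched as below: the resource is read to its end to obtain its size, the read is timed with the local clock, and the size is divided by the duration. This is an illustration only. The class and method names are hypothetical, and an in-memory stream stands in for a real download.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class EffectiveBandwidth {
    /** Reads the stream to its end and returns the number of bytes seen. */
    static long countBytes(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    /** Bytes per second; NaN when the elapsed time is not positive. */
    static double bytesPerSecond(long bytes, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            return Double.NaN;
        }
        return bytes * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a downloaded resource; a real measurement would wrap the network stream.
        InputStream in = new ByteArrayInputStream(new byte[50000]);
        long start = System.nanoTime();
        long bytes = countBytes(in);
        long elapsedMillis = (System.nanoTime() - start) / 1000000;
        // An in-memory read may finish in under a millisecond, in which case NaN is printed.
        System.out.println(bytes + " bytes, " + bytesPerSecond(bytes, elapsedMillis) + " B/s");
    }
}
```

For example, 50,000 bytes read in 2,000 ms gives `bytesPerSecond(50000, 2000)` = 25,000 bytes per second.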
A Historical View GUI allows performance metadata for a particular URL
to be filtered and viewed. Data can be presented in traditional tabular
form, in which sorting on any column reorders the rows,
or the raw data for each entry can be dumped in a prescribed format.
The weather service is not ready for use by others because 1) an API
for accessing historical data for a remote host hasn't been defined; 2)
there are no mechanisms for providing performance trend data; 3) network
performance tools yielding relevant metrics have not been identified or
incorporated; 4) performance metadata is not collected from multiple MDRs
and used to form a composite view. Basically, only the underlying infrastructure
has been implemented.
The MDR may be queried using the OSA Natural Language Interface. This
was demonstrated by transforming the PSE database into an Access database
and constructing a description of the legal queries.
Enabling Technology
-
Development was mainly in Java, with C++ where necessary (e.g., the implementation
of native methods). The Java Development Kit (JDK)
was used to compile and debug the Java code. Microsoft's Visual C++
was used for C++ development.
-
Jigsaw before/after filters,
methods defined within the Intermediary Architecture weather class, capture
URLs being accessed by the Netscape browser and measure the effective bandwidth
of accessed resources.
-
The OSA/Weather project investigated using existing network
monitoring tools including DOS and UNIX utilities Ping and TraceRT
and Net.Medic. The
only performance tool used in the prototype was Ping, which is built into
Windows 95/NT. It is accessed using a Java native interface.
-
The QoS Metadata Repository (MDR) was implemented using Object Design's
ObjectStore PSE
for Java.
-
pNMS GUIs were implemented using Sun's Swing
Set (part of the JFC
to be released with JDK 1.2).
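The prototype reached Ping through native code. As an alternative illustration (not the project's method), round-trip times can also be recovered by parsing the text that the ping command prints. The sketch below assumes the output has already been captured as a string; the class name is hypothetical, and the sample lines in `main` are made up in the common Windows reply format rather than taken from a real run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PingOutputParser {
    // Matches "time=45ms", "time<10ms", and the Unix style "time=45.3 ms".
    private static final Pattern TIME =
            Pattern.compile("time[=<]([0-9]+(?:\\.[0-9]+)?)\\s*ms");

    /**
     * Extracts one round-trip time in milliseconds per reply line.
     * A "time<N" reply is recorded as N, which is an upper bound rather than an exact value.
     * Lines without a time field (for example timeouts) contribute nothing.
     */
    static List<Double> roundTripMillis(String pingOutput) {
        List<Double> times = new ArrayList<>();
        Matcher m = TIME.matcher(pingOutput);
        while (m.find()) {
            times.add(Double.parseDouble(m.group(1)));
        }
        return times;
    }

    public static void main(String[] args) {
        // Made-up sample text for illustration, using a documentation-only address.
        String sample = "Reply from 192.0.2.1: bytes=32 time=45ms TTL=52\n"
                + "Request timed out.\n"
                + "Reply from 192.0.2.1: bytes=32 time<10ms TTL=52\n";
        System.out.println(roundTripMillis(sample));
    }
}
```

Parsing text output avoids native code but depends on the output format of the installed ping, which differs between platforms and locales.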
Lessons Learned
-
Jigsaw, the proxy server used by the IA, doesn't provide an API that mirrors
browser events. In particular, one cannot determine
-
the start and stop of a download,
-
whether a download was interrupted (via hyperlinking, going back, or reloading),
or
-
whether accesses are from multiple active browsers
and in addition,
-
timestamps in the HTTP header are unreliable, and
-
the size of a resource is not always available from the HTTP header.
As a consequence the events have to be deduced, timestamps are made using
a local clock, and the size calculated by reading the resource. Identified
URLs are limited to a sequence of downloads from the same site; offsite
downloads for embedded URLs (for example, gifs and ads) are treated as
downloads from a different site. That is, it is not possible to measure
the total download of a webpage if it involves offsite references;
only the individual pieces can be measured.
-
A major problem with using public network performance tools is not having
access to the raw data since this is what is stored in the MDR. Many tools,
such as Net.Medic, only provide graphical displays. The alternative is
to become a TCP/IP expert and build our own network performance tools.
This does not contribute directly to the goals of the project. Also, one
tool of interest, namely pathchar
by the author of traceroute, is immature and, like other tools, implemented
for UNIX (Solaris) platforms.
-
A major problem with adding new performance metrics to objects in an OO
database like PSE is that the database must be rewritten using the new
objects. What is needed is a database, like Lore
(Lightweight Object Repository), a Stanford semi-structured self-describing
database, to store performance/QoS metrics in XML. Unfortunately, Lore
is written in C++ and only runs under Solaris.
XML can provide an alternative approach to representing performance
metadata. The notion is to describe the metadata (attribute-value pairs)
in a DTD file; database entries are XML documents. A DTD can be mapped to
database structures or object hierarchies. Entries can be made extensible
by extending the DTD file. Metadata from multiple MDRs can be exported
using a structured XML document as an intermediary. Using the DTD file
and MDR, applications can process the metadata however they deem fit; information
is extracted from XML entries via an XML parser. Different metadata formats
can be combined through a common DTD file. Custom DTD files created on
the fly can act as filters to extract specific metadata. A problem
with XML as currently defined is that it does not support strong data types.
However, the XML-Data
document submitted to W3C addresses this problem.
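A small sketch of the extraction step, assuming a hypothetical entry format in which each metric is a `metric` element with a `name` attribute. The element names, the class name, and the sample values are illustrative only, and DTD validation is omitted for brevity.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class XmlMetadataReader {
    /** Returns the name -> value pairs of every metric element found under the root element. */
    static Map<String, String> read(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList metrics = doc.getDocumentElement().getElementsByTagName("metric");
        Map<String, String> out = new LinkedHashMap<>(); // keeps document order
        for (int i = 0; i < metrics.getLength(); i++) {
            Element m = (Element) metrics.item(i);
            out.put(m.getAttribute("name"), m.getTextContent().trim());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical MDR entry; a new metric is just one more element.
        String entry = "<entry host=\"www.example.org\">"
                + "<metric name=\"rttMillis\">45</metric>"
                + "<metric name=\"bytesPerSecond\">25000</metric>"
                + "</entry>";
        System.out.println(read(entry));
    }
}
```

Two points from the discussion above show up here: a new metric can be added without changing the reader, and every value comes back as a string, so any typing has to be applied by the application.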
Next Steps
These are some of the potential next steps for this project. More
work is needed in many areas.
-
Identify key network performance and QoS metrics, and the available tools
to measure them (see network
monitoring tools and network management schemes and Internet
and TCP/IP measurement tools). Write a technical note on Network
Management and Network Monitoring Tools that will record their capabilities
and limitations.
-
Invoke additional network monitoring tool(s) in parallel with capturing
sites accessed (within the IA).
-
Make MDR objects extensible (so new performance metadata can easily be
added). It will be possible to use XML when the working draft of XML-Data
becomes available. In the interim, investigate replacing PSE with JavaSpaces,
an object repository that enables applications and hardware to share work
and results over a distributed environment.
-
Use MIBs (Management Information Base) to passively
get network stats from network devices using SNMP (when available
to 3rd parties).
-
Identify mechanisms/tools to predict network weather trends.
-
Specify an API that programs can use to obtain network weather
conditions.
-
Determine how to aggregate QoS metadata from multiple sources, possibly
via shared or federated repositories.
-
Determine how to make a higher level service, such as the OBJS annotation
and query services, bandwidth adaptive.
-
Determine how to automate the augmentation process, i.e., incorporate new
tools and store new metrics in the MDR.
-
Investigate scaling to the Internet by subdividing it into monitored domains/realms
with a weather service per domain and a weather service for the entire
Internet.
-
Replace the custom MDR with a generic metadata repository.
-
Display a map of Web sites visited for viewing historical weather, providing
explanations about accessed URLs, and providing trend data. For example,
nodes would represent Internet servers and arcs would represent links between servers.
Clicking on a link or map node would pop up performance information about
the link/node. The GUI could also provide data on the URLs visited sorted
by date, domain, etc. along with data on the paths taken.
-
When accessing a URL, optionally display weather conditions.
-
Determine the monitoring role the weather service might play in OMG's System
Management Common Facilities (pdf,
PostScript),
a set of utility interfaces that are needed to perform Systems Management.
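As an illustration of the trend-prediction step listed above, one simple candidate mechanism is an exponentially weighted moving average over round-trip samples. This technique is an assumption introduced here for illustration; the project did not select or evaluate a prediction mechanism, and the class name and smoothing weight are hypothetical.

```java
/** Exponentially weighted moving average of round-trip times, in milliseconds. */
class RttTrend {
    private final double alpha;  // weight given to the newest sample, 0 < alpha <= 1
    private double estimate;
    private boolean seeded;

    RttTrend(double alpha) {
        if (!(alpha > 0 && alpha <= 1)) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
    }

    /** Folds in a new sample and returns the updated estimate. */
    double update(double rttMillis) {
        if (!seeded) {
            estimate = rttMillis;  // the first sample seeds the estimate
            seeded = true;
        } else {
            estimate = alpha * rttMillis + (1 - alpha) * estimate;
        }
        return estimate;
    }

    public static void main(String[] args) {
        RttTrend trend = new RttTrend(0.5);
        trend.update(100);
        System.out.println(trend.update(200)); // 0.5 * 200 + 0.5 * 100 = 150.0
    }
}
```

A larger `alpha` follows current conditions more closely, while a smaller one favors the historical trend; choosing it would be part of the open work.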
Impact
We cannot make an object services architecture performance-aware if
we do not have a means of measuring OSA performance. pNMS by itself
is a relatively simple service. Some interesting aspects of pNMS
are:
-
it is installed using the OSA/Intermediary Architecture, which provides
a URL interceptor and wrapper
-
it can provide the performance metadata needed for bandwidth-adaptive applications
-
it provides end-users with an understandable view of network performance
metadata
-
it could be coupled with other services
These results are aligned with our ultimate goal of providing a generic
architectural framework that supports Web-based activities, including
the optimization of a distributed application; the monitoring of resources,
performance, and QoS parameters; resource management; and the management
of QoS.
This research
is sponsored by the Defense Advanced Research Projects Agency and managed
by the U.S. Army Research Laboratory under contract DAAL01-95-C-0112. The
views and conclusions contained in this document are those of the authors
and should not be interpreted as necessarily representing the official
policies, either expressed or implied, of the Defense Advanced Research
Projects Agency, U.S. Army Research Laboratory, or the United States Government.
© Copyright 1997, 1998 Object Services and Consulting,
Inc. Permission is granted to copy this document provided this copyright
statement is retained in all copies. Disclaimer: OBJS does not warrant
the accuracy or completeness of the information in this survey.
Last revised: September 15, 1998. Send comments
to Gil Hansen.