Component Architectures for High Performance, Distributed Meta-Computing.
Dennis Gannon, Dept. of Computer Science, Indiana University (gannon@cs.indiana.edu)
Introduction
The design of the current generation of desktop software technology differs
from past generations in one fundamental way. The new paradigm states that
applications should be built by composing "off the shelf" components, much
the same way that hardware designers build systems from integrated circuits.
Furthermore, these components may be distributed across a wide area network
of compute and data servers. Components are defined by the public interfaces
that specify their function, as well as by the protocols they use
to communicate with other components. An application program in this model becomes
a dynamic network of communicating objects. This basic distributed object
design philosophy is having a profound impact on all aspects of information
processing technology. We are already seeing a shift in the software industry
toward investment in software components and away from hand-crafted, stand-alone
applications. And, within the industry, a technology war is being waged
over the design of the component composition architecture.
High performance computing will not be immune from this paradigm shift.
More specifically, as our current and future Internet continues to scale
in both size and bandwidth, it is not unrealistic to think about applications
that might incorporate 10,000 active components that are distributed over
that many compute hosts. Furthermore, pressure from the desktop software
industry will compel us to integrate the applications that run on supercomputer
systems into distributed problem solving environments that use object technology.
Metacomputing systems consisting of MPP servers, advanced, networked instruments,
database servers and gigabit networks will require a robust and scalable
object model that supports high performance application design.
In this position paper, we describe the important characteristics of
high performance, "Metacomputing" applications that arise in large scale
science and engineering computation. We will then describe the limitations
of some of the current component architectures based on CORBA, ActiveX,
and Java Beans/Studio when applied to this class of problem. We conclude
with a series of suggestions for building a more robust component model
that will scale to the applications described here.
Three Application Scenarios for High Performance Computing.
Example 1. Distributed Algorithm Design
Component architectures have been in use in the scientific programming
community for about seven years. In most cases, these systems have been
used for specialized tasks such as composing scientific tools from algorithmic
components. Two of the best examples are AVS and NAG Explorer. Designed
to simplify the process of building scientific visualization applications,
AVS and Explorer use a component model that allows users to compose
visualization systems from components that include image analysis tools,
filters and geometric transformations, and rendering modules. These early
systems allowed a limited form of distributed computation, but were held back
primarily by their closed architectures and their lack of an object model
flexible enough to include user defined data types as part of the message
stream. A more modern approach to image analysis and visualization
component architectures can be seen in the Utah SciRun system and the SoftLSI
prototype from Fujitsu.
The Linear System Analyzer (LSA) designed by Bramley et al. is one
such experimental component architecture. LSA was built to simplify the
process of solving large sparse systems of linear equations. While many
may consider the task of solving matrix equations to be a "solved problem",
nothing could be further from the truth. This job remains one of the most
difficult problems in most large scale scientific simulations. Bramley observed
that the problem can be decomposed into the following steps:
- Read the matrix (or extract it from another part of a larger problem).
- Analyze the matrix for obvious properties that help guide the solution
process. For example, is it symmetric, banded, or strongly diagonally dominant?
- Apply a reordering or scaling transformation, for example Markowitz pivoting,
blocking, etc.
- Select and apply a preconditioner, for example MG, ILU, MILU, RILU, ILUT,
SSOR, etc.
- Select a solver from the many available: Direct, AMG, BiCG, CGS, BiCGSTAB,
GMRES, GCR, OrthoMin, etc.
- Extract the solution.
LSA provides a library of components that implement these steps in the
solution process. By connecting a matrix analysis component to a preconditioner
which is connected to an iterative solver and a solution extractor, the
user can build a custom solver for the problem at hand. The components
can be linked together to form a single library which can be added to a
larger application. However, it may be the case that the best solver is
located on a specific remote parallel machine, or that the problem is so
large that it can only be solved on a remote machine with a very large memory.
Consequently, LSA allows components to be placed on remote machines by
assigning a host IP address to that component. The underlying component
container architecture works with the grid scheduler to make sure that
the component is initialized and running at that location.
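To make the composition model concrete, here is a minimal, self-contained
sketch of this style of port-based pipeline composition. The names here
(Component, connectTo, placeOn) are invented for illustration and are not the
real LSA API; a real container would run each stage remotely through the grid
scheduler rather than in-process.

```java
// A hypothetical sketch of LSA-style composition; none of these names are
// the real LSA interfaces.
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class Component {
    final String name;
    String host = "localhost";                 // where the component runs
    final Function<double[], double[]> stage;  // the computational step
    final List<Component> downstream = new ArrayList<>();

    Component(String name, Function<double[], double[]> stage) {
        this.name = name;
        this.stage = stage;
    }
    void connectTo(Component next) { downstream.add(next); }  // wire ports
    void placeOn(String host) { this.host = host; }           // remote placement

    // Push data through the pipeline. A real container would do this
    // asynchronously, over the network, via the grid scheduler.
    void accept(double[] data) {
        double[] out = stage.apply(data);
        for (Component c : downstream) c.accept(out);
    }
}

public class LsaSketch {
    public static void main(String[] args) {
        Component reader = new Component("MatrixReader", x -> x);
        Component scaler = new Component("Reordering", x -> x);   // stand-in stage
        Component solver = new Component("GMRESSolver", x -> x);  // stand-in stage
        solver.placeOn("bigiron.example.edu");  // hypothetical remote host

        reader.connectTo(scaler);
        scaler.connectTo(solver);
        reader.accept(new double[] {1.0, 2.0}); // drive the pipeline
    }
}
```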
Requirements imposed on the component system.
The LSA and other distributed algorithm systems impose special requirements
that are not part of the conventional desktop software component model.
- Bandwidth and Performance Characteristics. Large scale problems have
large scale bandwidth demands. Moving a large sparse matrix over a network
link should not be allowed to take longer than the combined execution time
of the sending and receiving components; otherwise, it may not make sense
to distribute the computation. (There are important exceptions to this
rule. For example, if the capabilities of the host system are special, or
the component objects are proprietary, it may still be essential to distribute
the computation.) None of the commercial architectures (ActiveX, CORBA,
Java RMI) has a standard model for associating performance characteristics
or requirements with the communication infrastructure.
- Scheduling the execution of a large distributed computation can
be very complex. For an application like the LSA, it may be the case that
some of the components execute interactively, while others must wait
in batch queues. Consequently, the synchronization between
components must be flexible enough to allow the network of components to
work asynchronously with very long latencies.
- Scripting Language Interfaces. Often a network of components will
be executed many times with many different inputs and parameter configurations.
Consequently, it is important to have a scripting interface to supplement
the graphical composition model. A scripting language like Python or Perl
will allow iterative control of the execution as well as the composition
of very large graphs of components.
- Mixed Language Components are essential for linking scientific applications
like linear algebra solvers with Java based graphical interfaces and component
architectures. In the LSA, approximately 40% of the system is Fortran plus
MPI, 30% is Java, and 30% is HPC++, which encapsulates the parallel Fortran
and communicates with the Java front-end, as sketched below.
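A sketch of such a mixed-language binding, using standard JNI from the Java
side. The class, method, and library names are invented; with a native library
in place (wrapping, say, a Fortran+MPI solver behind a C/HPC++ shim), the
declaration below is all the Java front-end needs.

```java
// Hypothetical JNI binding for a native (Fortran/HPC++) solver. The library
// name "lsasolvers" is illustrative; without it, loadLibrary will fail.
import java.util.Arrays;

public class NativeSolver {
    static {
        System.loadLibrary("lsasolvers"); // loads liblsasolvers.so / .dll
    }

    // Implemented in native code, e.g. an HPC++ wrapper around Fortran+MPI.
    public native double[] solve(double[] values, int[] rowIndex,
                                 int[] colIndex, double[] rhs);

    public static void main(String[] args) {
        // With the native library present, this invokes the wrapped solver.
        double[] x = new NativeSolver().solve(
                new double[] {4.0, 1.0, 1.0, 3.0},
                new int[] {0, 0, 1, 1}, new int[] {0, 1, 0, 1},
                new double[] {1.0, 2.0});
        System.out.println(Arrays.toString(x));
    }
}
```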
Example 2. Tele-Immersive Collaborative Design
Consider the following problem. A car company uses collaborative design
to reduce costs and time in new product design. For each new car, there
is a master design database at the main factory and each subcontractor
maintains a separate design database with details about the components
they supply. Some of the information in the subcontractor database is proprietary
and does not appear in the master design database. However, it is possible
for the master design database to extract any required performance information
from the subcontractor by means of simple RPC transactions. These performance
responses can be used to drive a simulation of the car that runs on a remote
supercomputer at the company headquarters. The simulation results can be
transmitted to Tele-Immersive (T-I) systems such as a CAVE or Immersadesk
at the main facility and at the subcontractors over a high bandwidth network.
What is displayed in the T-I environment is the car responding to a virtual
environment under the control of the T-I users.
Suppose the designers want to see the effect of changing the engine
design on the car's handling performance. The main designers ask the engine builder
to update their database with a different model of the engine. A virtual
mountain road scenario is loaded into the T-I environments and is used
to drive the simulation. The designers interactively experiment with the
handling characteristics of the simulated vehicle. The components of the
system are:
- The T-I environment. This can be viewed as one large component, but it
is more likely that it is many:
  - The display object and associated visual database. The data input stream
  consists of updates to the database, which are then rendered by the display
  object.
  - The user interface control components: pointers, head trackers, and other
  haptic devices. As with any graphical user interface system, user events
  are detected by the control components, which output the information as a
  stream that is fed to other components associated with the application.
  - The application components, which receive information from the control
  device components. Based on this information, the application components
  can query and update the visual database that is displayed.
- The design databases. This is a description of the car that is used to
drive both the simulation and the manufacturing process. It can also be
viewed as a large component or as a collection of smaller ones. Outputs
from this object include the polygon model used by the rendering component
and the finite-element model used by the simulation.
- The simulation object. The required inputs are the finite-element model
of the car and the road, as well as a sequence of control inputs that "drive"
the car during the simulation.
Requirements imposed on the component system.
The most important aspect of this application is the dependence upon managing
bandwidth so that real-time performance is possible. This is made more
complex by the need to manage many different types of data streams between
components. Currently, real-time CORBA implementations are being investigated
in the research community [Schmidt], but the standard implementations
of CORBA, DCOM (ActiveX), and the Java communication mechanisms would be insufficient
for this application. The object architecture must provide a mechanism
where performance constraints and QoS mechanisms can be associated with
the logical data paths between component ports.
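No commercial component model of the time offers such a mechanism, so the
following is a purely hypothetical sketch of what attaching a QoS contract to
a logical port-to-port connection might look like; every name here is
invented for illustration.

```java
// Hypothetical QoS attachment; QoSContract and Connection are invented types.
class QoSContract {
    final double minBandwidthMbps;  // sustained bandwidth floor
    final double maxLatencyMillis;  // end-to-end latency ceiling
    QoSContract(double bw, double lat) {
        minBandwidthMbps = bw;
        maxLatencyMillis = lat;
    }
}

class Connection {
    QoSContract contract;
    // A QoS-aware runtime would reserve network resources, or refuse the
    // binding, when a contract is attached to the logical data path.
    void require(QoSContract c) { contract = c; }
}

public class QosSketch {
    public static void main(String[] args) {
        Connection simToCave = new Connection();
        // e.g. the simulation-to-CAVE stream needs real-time delivery
        simToCave.require(new QoSContract(600.0, 30.0));
    }
}
```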
In addition, support in the object model for multi-cast communication
is very important. While it is likely we will see extensions of Java RMI
to multi-cast, it is not part of the CORBA or ActiveX model.
Example 3. The Digital Sky Survey
The Digital Sky Survey project illustrates a different set of distributed
object examples. The survey will consist of multiple databases containing
many billions of meta-data objects, each describing some visible object
such as a star or galaxy. Each of these meta-data objects is linked to
a digital image in the archival system, and the collection of meta-data
objects is organized as a relational database. In addition to a reference
to the image object in the archive, each of the meta-data objects contains
basic reference information and a list of data extraction methods that
can be applied to the image object.
A scientist at some remote location may decide to search for all galaxies
that exhibit some important set of properties. Some of the properties may
relate to information stored as part of the meta-data but some of it may
require an analysis of stored images. Formulated as a database query, the
request is sent to the database. The set of objects that satisfy the conditions
associated with the meta-data can be extracted. Then for each of these
objects, a request can be sent to the data archive to apply the remaining
tests to the images. This is a data parallel operation that results in
a reference to the subset of galaxies that satisfy all conditions. It
may also be the case that the result of this query must be used as part
of a second query submitted to another remote sky survey repository. This
may involve the transmission of a large stream of data from the first repository
host to the second.
The components of the solution process are the relational databases
and the repositories. To set up a complex analysis of the data two or more
components may need to be connected by a high bandwidth link. The information
that is communicated between components consists of image objects, object
references, meta-data information, and relational database queries.
Requirements imposed on the component system.
While many of the problems associated with this application can be found
in the previous two, there are also some unique features to this problem.
The first of these involves the extensive use of database technology.
While there is a Java database interface standard, it may not scale to
the problems described here. In particular, the interaction between the
object-relational database and the image archive requires the implementation
of the data-parallel remote method invocation described above.
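As a rough sketch of that data-parallel invocation pattern, the fan-out below
uses a local thread pool and a stubbed-in image test; in the real setting each
call would be a remote invocation on the image archive, and all names here are
illustrative.

```java
// A sketch of data-parallel remote method invocation: the image test is
// fanned out over all candidate objects concurrently. imageTestPasses is a
// local stand-in for a remote method on the image archive.
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

public class DataParallelQuery {
    // Stand-in for a remote method on the image archive.
    static boolean imageTestPasses(long objectId) { return objectId % 2 == 0; }

    public static void main(String[] args) throws Exception {
        List<Long> candidates = List.of(11L, 12L, 13L, 14L); // from the metadata query
        ExecutorService pool = Executors.newFixedThreadPool(4);

        List<Future<Long>> futures = candidates.stream()
            .map(id -> pool.submit(() -> imageTestPasses(id) ? id : null))
            .collect(Collectors.toList());

        // Gather the references that satisfied all conditions.
        for (Future<Long> f : futures) {
            Long id = f.get();
            if (id != null) System.out.println("galaxy " + id + " matches");
        }
        pool.shutdown();
    }
}
```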
The second problem is related to the communication that must take place between
components with parallel implementations. The existing commercial technologies
would require that a single logical channel be implemented as a single
network stream connection. However, if both components have parallel implementations,
then it may be possible to implement the communication as a set of parallel
communication streams. Pardis, a parallel implementation and extension
of CORBA, is one example system that supports this feature. Pardis demonstrates
that it is possible to significantly improve the utilization of network
bandwidth by providing parallel streams to implement remote method calls.
Component Systems and Object Technology
Object oriented software design principles are only the first step in building
the next generation of metacomputing applications. As the desktop software
industry has learned, it is also necessary to define what it means for
an instance of an object class to become a component in a distributed
system. A precise definition of component depends upon the environment
in which the component is used. This environment defines the component
architecture, which prescribes the required features that components must
support so that they may be composed into functioning applications.
The way in which a component presents a visual interface (if it has
one), responds to events, and communicates with other components is defined
by the component architecture. The three most important commercial component
architectures are Microsoft ActiveX, OMG's CORBA/OpenDoc and Java Beans
and Java Studio. However, because our interest is metacomputing systems
and not graphical user interfaces, we will focus here on the aspects of
component systems that describe the composition and communication behavior
of most component architectures.
There are two common models of component integration:
- Client/Server Communication. In this model a client is an
application that can be viewed as a container of components or their proxies.
The application makes requests of objects by invoking the public member
functions defined by the component objects' interfaces. The individual components
are servers which respond to the client, as illustrated in Figure 3. The
control flow is based on function calls from, and returns to, the client. Microsoft
ActiveX follows this model. CORBA was also designed with this model in
mind, but as a distributed object system it is flexible enough to support
other models.
Figure 3. Client/Server component models consist of a client container
application which holds object components that are often proxies for remote
objects. Such an architecture may support multiple protocols between the
proxies and the remote components.
- Software ICs. An electronic integrated circuit is a component that
has input and output ports. A design engineer can connect any output
port of the right signal type to an input port of another IC. Software
IC systems have the same nature. A software module has input ports and
output ports, and a graphical or script based composition tool can be used
to create instances of the objects and define the connections between the
components. The type of an input port is an interface that describes the
messages that port can receive. These ports can be connected to output ports
whose interface descriptions describe the types of messages that are sent. As
with electronic ICs, messages from an output port can be multi-cast
to matching input ports on multiple other components, as shown in Figure
4. The control flow of messages is based on macro-dataflow techniques (see
the sketch following Figure 4).
Figure 4. A Software IC architecture breaks the client/server hierarchy.
Each component has three standard modes of communication: data streams
that connect component ports (solid lines), control messages from the component
container (dashed lines) and events (star bursts) which are broadcast to
all "listening" objects
In addition to this data stream style communication that takes place between
object ports, there are two other standard forms of communication that
component systems use:
- Control signals. Every component implements a standard control message
interface that is used by the component control "container" framework
to query the component about its properties and state.
- Events and exceptions. Events are messages generated by a component
that are broadcast to all other components that are "listening" for events
of that type. Most user input, as well as other GUI
management tasks, is handled through events.
The Java Beans and Java Studio systems from Sun follow this model very closely.
Other commercial systems that are based on this architecture include AVS
and its descendant NAG Explorer, which are used to build visualization tools
from components. Unfortunately, Explorer has a very limited and inflexible
type system, which limits its extensibility to larger distributed applications.
The CORBA based OpenDoc system uses a similar object model.
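The event side of this model can be illustrated with the standard java.beans
classes; the TemperatureBean component itself is a made-up example.

```java
// The Java Beans event idiom: a component broadcasts events to all
// registered "listeners" via java.beans.PropertyChangeSupport.
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

public class TemperatureBean {
    private final PropertyChangeSupport events = new PropertyChangeSupport(this);
    private double temperature;

    public void addPropertyChangeListener(PropertyChangeListener l) {
        events.addPropertyChangeListener(l);
    }

    public void setTemperature(double t) {
        double old = this.temperature;
        this.temperature = t;
        events.firePropertyChange("temperature", old, t); // broadcast to listeners
    }

    public static void main(String[] args) {
        TemperatureBean bean = new TemperatureBean();
        bean.addPropertyChangeListener(
            e -> System.out.println(e.getPropertyName() + " -> " + e.getNewValue()));
        bean.setTemperature(98.6);
    }
}
```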
The final piece of a component system architecture that distinguishes
it from other types of software infrastructures is the concept of a component
container framework. The container is the application that runs on the
user's workstation and is used to select components, connect them together,
and respond to many of the event messages. The container uses the control
interface of each component to discover its properties and initialize it.
Microsoft Internet Explorer is an example of a component container for
ActiveX. Java Studio provides a graphical user interface for composing
and connecting components that is very similar to the layout system used
by Explorer and other component breadboards. We shall return to more of
the technical requirements for high performance components and container
frameworks later in this chapter.
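The discovery step a container performs can be illustrated with the standard
java.beans.Introspector, which uses reflection to enumerate a component's
properties; java.awt.Button is used here only as a convenient stand-in for a
component.

```java
// How a container can use a component's control interface to discover its
// properties: the standard Java Beans Introspector does this by reflection.
import java.beans.BeanInfo;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;

public class ContainerSketch {
    public static void main(String[] args) throws Exception {
        BeanInfo info = Introspector.getBeanInfo(java.awt.Button.class);
        for (PropertyDescriptor p : info.getPropertyDescriptors()) {
            System.out.println(p.getName() + " : " + p.getPropertyType());
        }
    }
}
```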
Component Framework Requirements for Grid Applications
To accomplish the task of building component based, high performance applications,
we must solve certain additional problems. First, the objects in the framework
need to know about each other in order to be able to transmit the data
and member function messages as indicated. For example, the CAD database
may be located in one city and the flow simulation may be running on a
parallel processing system in another location. Furthermore, the visualization
system may be an immersive environment like a CAVE in another facility.
Also, some objects may be persistent, such as the design database, while
other objects, such as the grid generation filter, may exist only for the
duration of our computation.
One solution would be to use a visual programming system that allows
a user to draw the application component graph. NAG Explorer uses this
technique. The LSA example described above and Java Studio also use this
graphical model. Unfortunately, NAG's type system is not very rich, and it
is not clear that graphical composition tools will scale to networks
of more than a few dozen objects. We would also like to describe networks
that are dynamic and are able to incorporate new component resources "on
the fly" as they are discovered.
Systems like Explorer and the current SciRun use a fixed type system.
However, most distributed object systems allow arbitrary user defined types
to be transmitted over the channels between components. A system must know
how to transmit application specific objects over the network. This is
called the serialization problem, and a solution to it requires a
protocol for packing and unpacking the components of data structures so
that they may be reliably transmitted between different computer architectures
in a heterogeneous environment. The traditional solution is to use an Interface
Definition Language (IDL) to describe the types of the objects being transmitted.
IDL is a simple, C++-like language for describing structures and interfaces.
It was first used in the DCE infrastructure. The DCE IDL was adopted and
extended for use in Microsoft DCOM and CORBA. The CORBA extension is the
most complete and it is used as the foundation of the specification of
the entire CORBA system. Java RMI, on the other hand, is a strictly Java-to-Java
communication model, so Java serves as its own IDL. However, there is now
a Java-to-HPC++ link that uses a combination of IDL and Java RMI, and JavaSoft
has agreed to re-implement RMI so that it runs over the CORBA communication
protocol known as IIOP.
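Because RMI is Java-to-Java, a remote interface plus a Serializable class is
the entire interface definition; no separate IDL file is compiled. The
SparseMatrix and SolverService types below are illustrative, not taken from
any of the systems discussed.

```java
// Java as its own IDL: illustrative declarations only (no server shown).
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;

// A user-defined type the RMI runtime can serialize automatically.
class SparseMatrix implements Serializable {
    double[] values;
    int[] rowIndex, colIndex;
}

// The remote interface doubles as the interface definition.
interface SolverService extends Remote {
    double[] solve(SparseMatrix a, double[] rhs) throws RemoteException;
}
```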
One of the most persistent problems with the existing commercial technologies
is the poor performance of serialization and communication. The Java Remote
Method Invocation (RMI) provides the most sophisticated serialization model,
but the performance is several orders of magnitude below the requirements
of the Grid applications described above.
The Agile Objects project at the University of Illinois is exploring
techniques for the high performance implementation of standard component object
interfaces and protocols, focusing on lowering the cost of crossing component
boundaries (lower invocation overhead) and reducing the latency of remote
procedure calls (lower invocation latency). In particular, these efforts
focus on the DCOM and Java RMI invocation mechanisms and build on technologies
from the Illinois Concert runtime, which executes RPC and message calls
in 10 to 20 microseconds within a cluster of workstations. In addition,
the WUStL real-time CORBA work and the Indiana Java RMI-Nexus projects are
addressing the same problem in the case of heterogeneous environments.
Most good object oriented systems must also include mechanisms
that address the following additional problems:
- Persistence and storage management. It is often the case that an
object needs to be "frozen" so that its state is preserved on some storage
device and then "thawed" later when it is needed again. A system with
the ability to do this to objects is said to support persistence, which is
closely related to the serialization problem described above.
- Object Sharing. The problem of object sharing is also important.
If each object instance belonged to only one application, the mechanisms
above would be sufficient. However, for objects that are used in multiple
applications concurrently, there is an additional problem. For example, the
design database may be in use by several applications. To solve this problem,
one may associate a session identifier with each circuit of objects, as
sketched below. When a message is received by any object, the session
identifier that accompanies that message can be used to identify the objects
that should receive any outgoing messages associated with that transaction.
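A toy sketch of that session-identifier scheme, with invented types: every
message carries the identifier of the circuit it belongs to, and a shared
component routes its replies accordingly.

```java
// Hypothetical session routing; Message and SharedComponent are invented.
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

class Message {
    final String sessionId;  // identifies the circuit of objects
    final Object payload;
    Message(String sessionId, Object payload) {
        this.sessionId = sessionId;
        this.payload = payload;
    }
}

class SharedComponent {
    private final Map<String, Consumer<Message>> routes = new HashMap<>();
    void register(String sessionId, Consumer<Message> downstream) {
        routes.put(sessionId, downstream);
    }
    void receive(Message m) {
        // Outgoing messages go only to the components of the same session.
        routes.get(m.sessionId).accept(new Message(m.sessionId, "reply:" + m.payload));
    }
}

public class SessionSketch {
    public static void main(String[] args) {
        SharedComponent designDb = new SharedComponent();
        designDb.register("designA", m -> System.out.println("A got " + m.payload));
        designDb.register("designB", m -> System.out.println("B got " + m.payload));
        designDb.receive(new Message("designA", "query")); // only session A replies
    }
}
```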
- Process and thread management. Most instances of distributed objects
are encapsulated within their own process, but we may wish for more than
one object to belong to the same process. We may also want an object to be
able to respond to concurrent requests for the same method. To do this, the
object system must be integrated with a thread system (a minimal thread-pool
sketch appears below), and there are many reasons this can be a challenging
problem. An important associated problem is that the thread model used
to implement the communication and events for the component must also be
consistent with the thread model that might be used in the computation
kernel. For example, an application that uses Fortran OpenMP may generate
threads with one runtime system, but the component architecture may use
another. These thread systems often have difficulties co-existing in the
same process.
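One common arrangement, sketched below with invented names: the component
runtime hands each incoming invocation to a thread pool so that the same
method can be serviced concurrently. The clash described above arises when a
native runtime (for example, OpenMP inside the computational kernel) manages
its own threads in the same process.

```java
// A sketch of thread-pool dispatch for concurrent method invocations;
// ThreadedServant is an invented name, not any particular system's API.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadedServant {
    private final ExecutorService dispatch = Executors.newFixedThreadPool(8);

    // Each incoming request runs on its own pool thread.
    void invokeAsync(Runnable request) { dispatch.execute(request); }

    public static void main(String[] args) {
        ThreadedServant servant = new ThreadedServant();
        for (int i = 0; i < 4; i++) {
            final int id = i;
            servant.invokeAsync(() ->
                System.out.println("request " + id + " on " +
                                   Thread.currentThread().getName()));
        }
        servant.dispatch.shutdown();
    }
}
```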
- Object distribution and object migration. An object implementation
may itself be distributed. This is important in the case of parallel programming,
but it can also happen when part of a particular interface needs to
be implemented on one system and another part on another. In addition,
it is often important for an object to be able to migrate from one host
to another. For example, when the first host's compute resources become
a limiting factor, it is advantageous to be able to move the object to a
second, more powerful host.
- Network Adaptability. As the examples described here illustrate,
it is essential for the object middleware layer to be able to adapt to
dynamic network loads and the availability of alternative pathways.
- Dynamic Invocation. As described so far, the interfaces to distributed
objects must be known at compile time: the IDL description is used to generate
the proxies/stubs and interface skeletons for the remote objects. However,
it is often a requirement for a component system to provide a mechanism
that allows an application to discover the interfaces to an object
at runtime. This lets the application take advantage of special
properties of the component without having to recompile the application.
- Reflection. Both object migration and network adaptability as described
above are examples of object behavior that is highly dependent upon the
way an object is implemented and its runtime system. Reflection refers
to the ability to obtain information about an object at runtime, such as
its class or the interfaces it implements. It also refers to the capability
of an object to infer properties about its own implementation and the state
of the environment in which it is executing. For example, reflection can
be used to implement dynamic invocation, as the sketch below illustrates.
While reflection can be implemented
in any system, Java is the only conventional language that supports reflection
directly. A closely related concept is that of a metaobject, which
can be thought of as a runtime object that is bound to each application level
object. In some systems, metaobjects are used to implement method invocations.
Hence the choice of network protocol to use in executing a particular method
invocation can be controlled by the associated metaobject. This allows
the object making the method call to be written without concern for the way
the call is implemented, because that is the job of the metaobject. This
allows for a great variety of ways to implement some of the features listed
in this section. For example, one way to accomplish the same result as
object migration is to endow a system with a capability called pseudo-migration,
which works as follows. The metaobject associated with an object catches
each request for a member function call on that object. If the metaobject
detects that the current compute host is too busy, it can
create another instance of the controlled object on another host and forward
the call to the new instance.
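The Java facilities referred to here are standard. The short example below
discovers an object's interfaces at runtime and invokes a method that was not
named at compile time, which is the essence of reflection-based dynamic
invocation; a String stands in for a component.

```java
// Runtime discovery and invocation via the standard java.lang.reflect API.
import java.lang.reflect.Method;

public class ReflectionSketch {
    public static void main(String[] args) throws Exception {
        Object component = "a component discovered at runtime";

        // Discover the interfaces the object implements.
        for (Class<?> iface : component.getClass().getInterfaces()) {
            System.out.println("implements " + iface.getName());
        }

        // Invoke a method by name, without compile-time knowledge of it.
        Method m = component.getClass().getMethod("length");
        System.out.println("length() -> " + m.invoke(component));
    }
}
```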
- Event Logging. Debugging distributed systems is very hard. It is
essential to have a mechanism that allows the events associated with
a set of distributed interactions to be logged in a way that helps
identify what happened and when.
- Fault tolerance. An exception handling mechanism is the first step
toward building reliable systems, but it falls far short of providing a
mechanism by which failure can be tolerated in a reliable manner. The system
must be able to automatically restart applications and roll back transactions
to a previous known state.
- Authentication and Security. Authentication allows us to identify
which applications and users are allowed to access system components. Security
means that these interactions can be accomplished with safety for the data
as well as the implementations. These issues go far beyond the
domain of the object system, but the object system must provide a way to
give the user access to the metacomputing authentication and security
tools that are available.
- Beyond Client/Server. For high performance computation it is essential
that future distributed object systems support a greater variety of
models than simple client/server schemes. As illustrated by the examples
in section 2, there are paradigms that include peer-to-peer object networks.
In the future we can imagine massive networks of components and software
agents that work without centralized control and dynamically respond to
changing loads and requirements.
- Support for Parallelism. Beyond multi-threaded applications are those
that involve the concurrent activity of many components. An object system
must allow both asynchronous and synchronous method calls (one form of
asynchronous call is sketched below). In addition, multi-cast communication
and collective synchronization are essential for supporting parallel
operation on very large numbers of concurrently executing objects.
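A small sketch of an asynchronous call using a standard Java future: the
caller obtains a handle immediately and synchronizes later, which is the
behavior a component system needs to keep many objects busy concurrently.
The expensiveRemoteCall method is a local stand-in for a remote invocation.

```java
// Asynchronous invocation with a standard CompletableFuture.
import java.util.concurrent.CompletableFuture;

public class AsyncCallSketch {
    static double expensiveRemoteCall() { return 42.0; } // stand-in

    public static void main(String[] args) {
        CompletableFuture<Double> f =
            CompletableFuture.supplyAsync(AsyncCallSketch::expensiveRemoteCall);
        // ... the caller is free to do other work here ...
        System.out.println("result = " + f.join()); // synchronize later
    }
}
```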
In the preceding pages we have attempted to outline many of the technical
problems that are associated with extending contemporary component and
distributed object technology to the applications that will run on the
emerging high performance metacomputing grid. While the challenges are
great, they are not insurmountable. With a coordinated effort, it should
be possible to build an application-level framework for a distributed component
architecture that will be suitable for these applications and also interoperate
with the emerging standards for desktop software. However, without true
interoperability, there is little hope that a substantial software industry
will emerge from the high end of scientific and engineering problem solving.