Component Architectures for High Performance, Distributed Meta-Computing.
Dennis Gannon, Dept. of Computer Science, Indiana University (gannon@cs.indiana.edu)
Introduction
The design of the current generation of desktop software technology differs
from past generations in one fundamental way. The new paradigm states that
applications should be built by composing "off the shelf" components, much
the same way that hardware designers build systems from integrated circuits.
Furthermore, these components may be distributed across a wide area network
of compute and data servers. Components are defined by the public interfaces
that specify their function, as well as by the protocols they use
to communicate with other components. An application program in this model becomes
a dynamic network of communicating objects. This basic distributed object
design philosophy is having a profound impact on all aspects of information
processing technology. We are already seeing a shift in the software industry
toward investment in software components and away from hand-crafted, stand-alone
applications. And, within the industry, a technology war is being waged
over the design of the component composition architecture.
High performance computing will not be immune from this paradigm shift.
More specifically, as our current and future Internet continues to scale
in both size and bandwidth, it is not unrealistic to think about applications
that might incorporate 10,000 active components that are distributed over
that many compute hosts. Furthermore, pressure from the desktop software
industry will compel us to integrate the applications that run on supercomputer
systems into distributed problem solving environments that use object technology.
Metacomputing systems consisting of MPP servers, advanced, networked instruments,
database servers and gigabit networks will require a robust and scalable
object model that supports high performance application design.
In this position paper, we describe the important characteristics of
high performance, "Metacomputing" applications that arise in large scale
science and engineering computation. We will then describe the limitations
of some of the current component architectures based on CORBA, ActiveX,
and Java Beans/Studio when applied to this class of problem. We conclude
with a series of suggestions for building a more robust component model
that will scale to the applications described here.
Three Application Scenarios for High Performance Computing.
Example 1. Distributed Algorithm Design
Component architectures have been in use in the scientific programming
community for about seven years. In most cases, these systems have been
used for specialized tasks such as composing scientific tools from algorithmic
components. Two of the best examples are AVS and NAG Explorer. Designed
to simplify the process of building scientific visualization applications,
AVS and Explorer use a component model that allows users to compose
visualization systems from components that include image analysis tools,
filters and geometric transformations, and rendering modules. These early
systems allowed a limited form of distributed computation, but were held back
primarily by their closed architectures and their lack of an object model
flexible enough to include user defined data types as part of the message
stream. A more modern approach to image analysis and visualization
component architectures can be seen in the Utah SciRun system and the SoftLSI
prototype from Fujitsu.
The Linear System Analyzer (LSA) designed by Bramley et al. is one
such experimental component architecture. LSA was built to simplify the
process of solving large sparse systems of linear equations. While many
may consider the task of solving matrix equations to be a "solved problem",
nothing could be further from the truth. This job remains one of the most
difficult problems in most large scale scientific simulations. Bramley observed
that the problem can be decomposed into the following steps:
- Read the matrix (or extract it from another part of a larger problem).
- Analyze the matrix for obvious properties that help guide the solution
process. For example, is it symmetric, banded, or strongly diagonally dominant?
- Apply a reordering or scaling transformation, for example Markowitz pivoting,
blocking, etc.
- Select and apply a preconditioner, for example MG, ILU, MILU, RILU, ILUT,
SSOR, etc.
- Select a solver from the many available: Direct, AMG, BiCG, CGS, BiCGSTAB,
GMRES, GCR, OrthoMin, etc.
- Extract the solution.
LSA provides a library of components that implement these steps in the
solution process. By connecting a matrix analysis component to a preconditioner
which is connected to an iterative solver and a solution extractor, the
user can build a custom solver for the problem at hand. The components
can be linked together to form a single library which can be added to a
larger application. However, it may be the case that the best solver is
located on a specific remote parallel machine, or that the problem is so
large that it can only be solved on a remote machine with a very large memory.
Consequently, LSA allows components to be placed on remote machines by
assigning a host IP address to that component. The underlying component
container architecture works with the grid scheduler to make sure that
the component is initialized and running at that location.
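To make the composition model concrete, here is a minimal, self-contained
sketch of this style of port-based pipeline composition. The names here
(Component, connectTo, placeOn) are invented for illustration and are not the
real LSA API; a real container would run each stage remotely through the grid
scheduler rather than in-process.

```java
// A hypothetical sketch of LSA-style composition; none of these names are
// the real LSA interfaces.
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class Component {
    final String name;
    String host = "localhost";                 // where the component runs
    final Function<double[], double[]> stage;  // the computational step
    final List<Component> downstream = new ArrayList<>();

    Component(String name, Function<double[], double[]> stage) {
        this.name = name;
        this.stage = stage;
    }
    void connectTo(Component next) { downstream.add(next); }  // wire ports
    void placeOn(String host) { this.host = host; }           // remote placement

    // Push data through the pipeline. A real container would do this
    // asynchronously, over the network, via the grid scheduler.
    void accept(double[] data) {
        double[] out = stage.apply(data);
        for (Component c : downstream) c.accept(out);
    }
}

public class LsaSketch {
    public static void main(String[] args) {
        Component reader = new Component("MatrixReader", x -> x);
        Component scaler = new Component("Reordering", x -> x);   // stand-in stage
        Component solver = new Component("GMRESSolver", x -> x);  // stand-in stage
        solver.placeOn("bigiron.example.edu");  // hypothetical remote host

        reader.connectTo(scaler);
        scaler.connectTo(solver);
        reader.accept(new double[] {1.0, 2.0}); // drive the pipeline
    }
}
```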
Requirements imposed on the component system.
The LSA and other distributed algorithm systems impose special requirements
that are not part of the conventional desktop software component model.
- Bandwidth and Performance Characteristics. Large scale problems have
large scale bandwidth demands. Moving a large sparse matrix over a network
link should not be allowed to take longer than the combined execution time
of the sending and receiving components; otherwise, it may not make sense
to distribute the computation. (There are important exceptions to this
rule. For example, if the capabilities of the host system are special, or
the component objects are proprietary, it may still be essential to distribute
the computation.) None of the commercial architectures (ActiveX, CORBA,
Java RMI) has a standard model for associating performance characteristics
or requirements with the communication infrastructure.
- Scheduling the execution of a large distributed computation can
be very complex. For an application like the LSA, it may be the case that
some of the components execute interactively, while others must wait
in batch queues. Consequently, the synchronization between
components must be flexible enough to allow the network of components to
work asynchronously with very long latencies.
- Scripting Language Interfaces. Often a network of components will
be executed many times with many different inputs and parameter configurations.
Consequently, it is important to have a scripting interface to supplement
the graphical composition model. A scripting language like Python or Perl
will allow iterative control of the execution as well as the composition
of very large graphs of components.
- Mixed Language Components are essential for linking scientific applications
like linear algebra solvers with Java based graphical interfaces and component
architectures. In the LSA, approximately 40% of the system is Fortran plus
MPI, 30% is Java, and 30% is HPC++, which encapsulates the parallel Fortran
and communicates with the Java front-end, as sketched below.
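A sketch of such a mixed-language binding, using standard JNI from the Java
side. The class, method, and library names are invented; with a native library
in place (wrapping, say, a Fortran+MPI solver behind a C/HPC++ shim), the
declaration below is all the Java front-end needs.

```java
// Hypothetical JNI binding for a native (Fortran/HPC++) solver. The library
// name "lsasolvers" is illustrative; without it, loadLibrary will fail.
import java.util.Arrays;

public class NativeSolver {
    static {
        System.loadLibrary("lsasolvers"); // loads liblsasolvers.so / .dll
    }

    // Implemented in native code, e.g. an HPC++ wrapper around Fortran+MPI.
    public native double[] solve(double[] values, int[] rowIndex,
                                 int[] colIndex, double[] rhs);

    public static void main(String[] args) {
        // With the native library present, this invokes the wrapped solver.
        double[] x = new NativeSolver().solve(
                new double[] {4.0, 1.0, 1.0, 3.0},
                new int[] {0, 0, 1, 1}, new int[] {0, 1, 0, 1},
                new double[] {1.0, 2.0});
        System.out.println(Arrays.toString(x));
    }
}
```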
Example 2. Tele-Immersive Collaborative Design
Consider the following problem. A car company uses collaborative design
to reduce costs and time in new product design. For each new car, there
is a master design database at the main factory and each subcontractor
maintains a separate design database with details about the components
they supply. Some of the information in the subcontractor database is proprietary
and does not appear in the master design database. However, it is possible
for the master design database to extract any required performance information
from the subcontractor by means of simple RPC transactions. These performance
responses can be used to drive a simulation of the car that runs on a remote
supercomputer at the company headquarters. The simulation results can be
transmitted to Tele-Immersive (T-I) systems such as a CAVE or Immersadesk
at the main facility and at the subcontractors over a high bandwidth network.
What is displayed in the T-I environment is the car responding to a virtual
environment under the control of the T-I users.
Suppose the designers want to see the effect of changing the engine
design on the car's handling performance. The main designers ask the engine builder
to update their database with a different model of the engine. A virtual
mountain road scenario is loaded into the T-I environments and is used
to drive the simulation. The designers interactively experiment with the
handling characteristics of the simulated vehicle. The components of the
system are:
- The T-I environment. This can be viewed as one large component, but it
is more likely that it is many:
  - The display object and associated visual database. The data input stream
  consists of updates to the database, which are then rendered by the display
  object.
  - The user interface control components: pointers, head trackers, and other
  haptic devices. As with any graphical user interface system, user events
  are detected by the control components, which output the information as a
  stream that is fed to other components associated with the application.
  - The application components, which receive information from the control
  device components. Based on this information, the application components
  can query and update the visual database that is displayed.
- The design databases. This is a description of the car that is used to
drive both the simulation and the manufacturing process. It can also be
viewed as a large component or as a collection of smaller ones. Outputs
from this object include the polygon model used by the rendering component
and the finite-element model used by the simulation.
- The simulation object. The required inputs are the finite-element model
of the car and the road, as well as a sequence of control inputs that "drive"
the car during the simulation.
Requirements imposed on the component system.
The most important aspect of this application is the dependence upon managing
bandwidth so that real-time performance is possible. This is made more
complex by the need to manage many different types of data streams between
components. Currently, real-time CORBA implementations are being investigated
in the research community [Schmidt], but the standard implementations
of CORBA, DCOM (ActiveX), and the Java communication mechanisms would be insufficient
for this application. The object architecture must provide a mechanism
where performance constraints and QoS mechanisms can be associated with
the logical data paths between component ports.
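No commercial component model of the time offers such a mechanism, so the
following is a purely hypothetical sketch of what attaching a QoS contract to
a logical port-to-port connection might look like; every name here is
invented for illustration.

```java
// Hypothetical QoS attachment; QoSContract and Connection are invented types.
class QoSContract {
    final double minBandwidthMbps;  // sustained bandwidth floor
    final double maxLatencyMillis;  // end-to-end latency ceiling
    QoSContract(double bw, double lat) {
        minBandwidthMbps = bw;
        maxLatencyMillis = lat;
    }
}

class Connection {
    QoSContract contract;
    // A QoS-aware runtime would reserve network resources, or refuse the
    // binding, when a contract is attached to the logical data path.
    void require(QoSContract c) { contract = c; }
}

public class QosSketch {
    public static void main(String[] args) {
        Connection simToCave = new Connection();
        // e.g. the simulation-to-CAVE stream needs real-time delivery
        simToCave.require(new QoSContract(600.0, 30.0));
    }
}
```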
In addition, support in the object model for multi-cast communication
is very important. While it is likely we will see extensions of Java RMI
to multi-cast, it is not part of the CORBA or ActiveX model.
Example 3. The Digital Sky Survey
The Digital Sky Survey project illustrates a different set of distributed
object examples. The survey will consist of multiple databases containing
many billions of meta-data objects, each describing some visible object
such as a star or galaxy. Each of these meta-data objects is linked to
a digital image in the archival system, and the collection of meta-data
objects is organized as a relational database. In addition to a reference
to the image object in the archive, each of the meta-data objects contains
basic reference information and a list of data extraction methods that
can be applied to the image object.
A scientist at some remote location may decide to search for all galaxies
that exhibit some important set of properties. Some of the properties may
relate to information stored as part of the meta-data but some of it may
require an analysis of stored images. Formulated as a database query, the
request is sent to the database. The set of objects that satisfy the conditions
associated with the meta-data can be extracted. Then for each of these
objects, a request can be sent to the data archive to apply the remaining
tests to the images. This is a data parallel operation that results in
a reference to the subset of galaxies that satisfy all conditions. It
may also be the case that the result of this query must be used as part
of a second query submitted to another remote sky survey repository. This
may involve the transmission of a large stream of data from the first repository
host to the second.
The components of the solution process are the relational databases
and the repositories. To set up a complex analysis of the data two or more
components may need to be connected by a high bandwidth link. The information
that is communicated between components consists of image objects, object
references, meta-data information, and relational database queries.
Requirements imposed on the component system.
While many of the problems associated with this application can be found
in the previous two, there are also some unique features to this problem.
The first of these involves the extensive use of database technology.
While there is a Java database interface standard, it may not scale to
the problems described here. In particular, the interaction between the
object-relational database and the image archive requires the implementation
of the data-parallel remote method invocation described above.
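As a rough sketch of that data-parallel invocation pattern, the fan-out below
uses a local thread pool and a stubbed-in image test; in the real setting each
call would be a remote invocation on the image archive, and all names here are
illustrative.

```java
// A sketch of data-parallel remote method invocation: the image test is
// fanned out over all candidate objects concurrently. imageTestPasses is a
// local stand-in for a remote method on the image archive.
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

public class DataParallelQuery {
    // Stand-in for a remote method on the image archive.
    static boolean imageTestPasses(long objectId) { return objectId % 2 == 0; }

    public static void main(String[] args) throws Exception {
        List<Long> candidates = List.of(11L, 12L, 13L, 14L); // from the metadata query
        ExecutorService pool = Executors.newFixedThreadPool(4);

        List<Future<Long>> futures = candidates.stream()
            .map(id -> pool.submit(() -> imageTestPasses(id) ? id : null))
            .collect(Collectors.toList());

        // Gather the references that satisfied all conditions.
        for (Future<Long> f : futures) {
            Long id = f.get();
            if (id != null) System.out.println("galaxy " + id + " matches");
        }
        pool.shutdown();
    }
}
```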
The second problem is related to the communication that must take place between
components with parallel implementations. The existing commercial technologies
would require that a single logical channel be implemented as a single
network stream connection. However, if both components have parallel implementations,
then it may be possible to implement the communication as a set of parallel
communication streams. Pardis, a parallel implementation and extension
of CORBA, is one example system that supports this feature. Pardis demonstrates
that it is possible to significantly improve the utilization of network
bandwidth by providing parallel streams to implement remote method calls.
Component Systems and Object Technology
Object oriented software design principles are only the first step in building
the next generation of metacomputing applications. As the desktop software
industry has learned, it is also necessary to define what it means for
an instance of an object class to become a component in a distributed
system. A precise definition of component depends upon the environment
in which the component is used. This environment defines the component
architecture, which prescribes the required features that components must
support so that they may be composed into functioning applications.
The way in which a component presents a visual interface (if it has
one), responds to events, and communicates with other components is defined
by the component architecture. The three most important commercial component
architectures are Microsoft ActiveX, OMG's CORBA/OpenDoc and Java Beans
and Java Studio. However, because our interest is metacomputing systems
and not graphical user interfaces, we will focus here on the aspects of
component systems that describe the composition and communication behavior
of most component architectures.
There are two common models of component integration:
- Client/Server Communication. In this model a client is an
application that can be viewed as a container of components or their proxies.
The application makes requests of objects by invoking the public member
functions defined by the component objects' interfaces. The individual components
are servers which respond to the client, as illustrated in Figure 3. The
control flow is based on function calls from, and returns to, the client. Microsoft
ActiveX follows this model. CORBA was also designed with this model in
mind, but as a distributed object system it is flexible enough to support
other models.
Figure 3. Client/Server component models consist of a client container
application which holds object components that are often proxies for remote
objects. Such an architecture may support multiple protocols between the
proxies and the remote components.
- Software ICs. An electronic integrated circuit is a component that
has input and output ports. A design engineer can connect any output
port of the right signal type to an input port of another IC. Software
IC systems have the same nature. A software module has input ports and
output ports, and a graphical or script based composition tool can be used
to create instances of the objects and define the connections between the
components. The type of an input port is an interface that describes the
messages that port can receive. These ports can be connected to output ports
whose interface descriptions describe the types of messages that are sent. As
with electronic ICs, messages from an output port can be multi-cast
to matching input ports on multiple other components, as shown in Figure
4. The control flow of messages is based on macro-dataflow techniques (see
the sketch following Figure 4).
Figure 4. A Software IC architecture breaks the client/server hierarchy.
Each component has three standard modes of communication: data streams
that connect component ports (solid lines), control messages from the component
container (dashed lines) and events (star bursts) which are broadcast to
all "listening" objects
In addition to this data stream style communication that takes place between
object ports, there are two other standard forms of communication that
component systems use:
- Control signals. Every component implements a standard control message
interface that is used by the component control "container" framework
to query the component about its properties and state.
- Events and exceptions. Events are messages generated by a component
that are broadcast to all other components that are "listening" for events
of that type. Most user input, as well as other GUI
management tasks, is handled through events.
The Java Beans and Java Studio systems from Sun follow this model very closely.
Other commercial systems that are based on this architecture include AVS
and its descendant NAG Explorer, which are used to build visualization tools
from components. Unfortunately, Explorer has a very limited and inflexible
type system, which limits its extensibility to larger distributed applications.
The CORBA based OpenDoc system uses a similar object model.
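The event side of this model can be illustrated with the standard java.beans
classes; the TemperatureBean component itself is a made-up example.

```java
// The Java Beans event idiom: a component broadcasts events to all
// registered "listeners" via java.beans.PropertyChangeSupport.
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

public class TemperatureBean {
    private final PropertyChangeSupport events = new PropertyChangeSupport(this);
    private double temperature;

    public void addPropertyChangeListener(PropertyChangeListener l) {
        events.addPropertyChangeListener(l);
    }

    public void setTemperature(double t) {
        double old = this.temperature;
        this.temperature = t;
        events.firePropertyChange("temperature", old, t); // broadcast to listeners
    }

    public static void main(String[] args) {
        TemperatureBean bean = new TemperatureBean();
        bean.addPropertyChangeListener(
            e -> System.out.println(e.getPropertyName() + " -> " + e.getNewValue()));
        bean.setTemperature(98.6);
    }
}
```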
The final piece of a component system architecture that distinguishes
it from other types of software infrastructures is the concept of a component
container framework. The container is the application that runs on the
user's workstation and is used to select components, connect them together,
and respond to many of the event messages. The container uses the control
interface of each component to discover its properties and initialize it.
Microsoft Internet Explorer is an example of a component container for
ActiveX. Java Studio provides a graphical user interface for composing
and connecting components that is very similar to the layout system used
by Explorer and other component breadboards. We shall return to more of
the technical requirements for high performance components and container
frameworks later in this chapter.
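The discovery step a container performs can be illustrated with the standard
java.beans.Introspector, which uses reflection to enumerate a component's
properties; java.awt.Button is used here only as a convenient stand-in for a
component.

```java
// How a container can use a component's control interface to discover its
// properties: the standard Java Beans Introspector does this by reflection.
import java.beans.BeanInfo;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;

public class ContainerSketch {
    public static void main(String[] args) throws Exception {
        BeanInfo info = Introspector.getBeanInfo(java.awt.Button.class);
        for (PropertyDescriptor p : info.getPropertyDescriptors()) {
            System.out.println(p.getName() + " : " + p.getPropertyType());
        }
    }
}
```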
Component Framework Requirements for Grid Applications
To accomplish the task of building component based, high performance applications,
we must solve certain additional problems. First, the objects in the framework
need to know about each other in order to be able to transmit the data
and member function messages as indicated. For example, the CAD database
may be located in one city and the flow simulation may be running on a
parallel processing system in another location. Furthermore, the visualization
system may be an immersive environment like a CAVE in another facility.
Also, some objects may be persistent, such as the design database, while
other objects, such as the grid generation filter, may exist only for the
duration of our computation.
One solution would be to use a visual programming system that allows
a user to draw the application component graph. NAG Explorer uses this
technique. The LSA example described above and Java Studio also use this
graphical model. Unfortunately, NAG's type system is not very rich, and it
is not clear that graphical composition tools will scale to networks
of more than a few dozen objects. We would also like to describe networks
that are dynamic and are able to incorporate new component resources "on
the fly" as they are discovered.
Systems like Explorer and the current SciRun use a fixed type system.
However, most distributed object systems allow arbitrary user defined types
to be transmitted over the channels between components. A system must know
how to transmit application specific objects over the network. This is
called the serialization problem, and a solution to it requires a
protocol for packing and unpacking the components of data structures so
that they may be reliably transmitted between different computer architectures
in a heterogeneous environment. The traditional solution is to use an Interface
Definition Language (IDL) to describe the types of the objects being transmitted.
IDL is a simple, C++-like language for describing structures and interfaces.
It was first used in the DCE infrastructure. The DCE IDL was adopted and
extended for use in Microsoft DCOM and CORBA. The CORBA extension is the
most complete and it is used as the foundation of the specification of
the entire CORBA system. Java RMI, on the other hand, is a strictly Java-to-Java
communication model, so Java serves as its own IDL. However, there is now
a Java-to-HPC++ link that uses a combination of IDL and Java RMI, and JavaSoft
has agreed to re-implement RMI so that it runs over the CORBA communication
protocol known as IIOP.
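Because RMI is Java-to-Java, a remote interface plus a Serializable class is
the entire interface definition; no separate IDL file is compiled. The
SparseMatrix and SolverService types below are illustrative, not taken from
any of the systems discussed.

```java
// Java as its own IDL: illustrative declarations only (no server shown).
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;

// A user-defined type the RMI runtime can serialize automatically.
class SparseMatrix implements Serializable {
    double[] values;
    int[] rowIndex, colIndex;
}

// The remote interface doubles as the interface definition.
interface SolverService extends Remote {
    double[] solve(SparseMatrix a, double[] rhs) throws RemoteException;
}
```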
One of the most persistent problems with the existing commercial technologies
is the poor performance of serialization and communication. The Java Remote
Method Invocation (RMI) provides the most sophisticated serialization model,
but the performance is several orders of magnitude below the requirements
of the Grid applications described above.
The Agile Objects project at the University of Illinois is exploring
techniques for the high performance implementation of standard component object
interfaces and protocols, focusing on lowering the cost of crossing component
boundaries (lower invocation overhead) and reducing the latency of remote
procedure calls (lower invocation latency). In particular, these efforts
focus on the DCOM and Java RMI invocation mechanisms and build on technologies
from the Illinois Concert runtime, which executes RPC and message calls
in 10 to 20 microseconds within a cluster of workstations. In addition,
the WUStL real-time CORBA work and the Indiana Java RMI-Nexus projects are
addressing the same problem in the case of heterogeneous environments.
Most good object oriented systems must also include mechanisms
that address the following additional problems:
- Persistence and storage management. It is often the case that an
object needs to be "frozen" so that its state is preserved on some storage
device and then "thawed" later when it is needed again. A system with
the ability to do this to objects is said to support persistence, which is
closely related to the serialization problem described above.
- Object Sharing. The problem of object sharing is also important.
If each object instance belonged to only one application, the mechanisms
above would be sufficient. However, for objects that are used in multiple
applications concurrently, there is an additional problem. For example, the
design database may be in use by several applications. To solve this problem,
one may associate a session identifier with each circuit of objects, as
sketched below. When a message is received by any object, the session
identifier that accompanies that message can be used to identify the objects
that should receive any outgoing messages associated with that transaction.
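A toy sketch of that session-identifier scheme, with invented types: every
message carries the identifier of the circuit it belongs to, and a shared
component routes its replies accordingly.

```java
// Hypothetical session routing; Message and SharedComponent are invented.
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

class Message {
    final String sessionId;  // identifies the circuit of objects
    final Object payload;
    Message(String sessionId, Object payload) {
        this.sessionId = sessionId;
        this.payload = payload;
    }
}

class SharedComponent {
    private final Map<String, Consumer<Message>> routes = new HashMap<>();
    void register(String sessionId, Consumer<Message> downstream) {
        routes.put(sessionId, downstream);
    }
    void receive(Message m) {
        // Outgoing messages go only to the components of the same session.
        routes.get(m.sessionId).accept(new Message(m.sessionId, "reply:" + m.payload));
    }
}

public class SessionSketch {
    public static void main(String[] args) {
        SharedComponent designDb = new SharedComponent();
        designDb.register("designA", m -> System.out.println("A got " + m.payload));
        designDb.register("designB", m -> System.out.println("B got " + m.payload));
        designDb.receive(new Message("designA", "query")); // only session A replies
    }
}
```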
- Process and thread management. Most instances of distributed objects
are encapsulated within their own process, but we may wish for more than
one object to belong to the same process. We may also want an object to be
able to respond to concurrent requests for the same method. To do this, the
object system must be integrated with a thread system (a minimal thread-pool
sketch appears below), and there are many reasons this can be a challenging
problem. An important associated problem is that the thread model used
to implement the communication and events for the component must also be
consistent with the thread model that might be used in the computation
kernel. For example, an application that uses Fortran OpenMP may generate
threads with one runtime system, but the component architecture may use
another. These thread systems often have difficulties co-existing in the
same process.
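One common arrangement, sketched below with invented names: the component
runtime hands each incoming invocation to a thread pool so that the same
method can be serviced concurrently. The clash described above arises when a
native runtime (for example, OpenMP inside the computational kernel) manages
its own threads in the same process.

```java
// A sketch of thread-pool dispatch for concurrent method invocations;
// ThreadedServant is an invented name, not any particular system's API.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadedServant {
    private final ExecutorService dispatch = Executors.newFixedThreadPool(8);

    // Each incoming request runs on its own pool thread.
    void invokeAsync(Runnable request) { dispatch.execute(request); }

    public static void main(String[] args) {
        ThreadedServant servant = new ThreadedServant();
        for (int i = 0; i < 4; i++) {
            final int id = i;
            servant.invokeAsync(() ->
                System.out.println("request " + id + " on " +
                                   Thread.currentThread().getName()));
        }
        servant.dispatch.shutdown();
    }
}
```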
- Object distribution and object migration. An object implementation
may itself be distributed. This is important in the case of parallel programming,
but it can also happen when part of a particular interface needs to
be implemented on one system and another part on another. In addition,
it is often important for an object to be able to migrate from one host
to another. For example, when the first host's compute resources become
a limiting factor, it is advantageous to be able to move the object to a
second, more powerful host.
- Network Adaptability. As the examples described here illustrate,
it is essential for the object middleware layer to be able to adapt to
dynamic network loads and the availability of alternative pathways.
- Dynamic Invocation. As described so far, the interfaces to distributed
objects must be known at compile time: the IDL description is used to generate
the proxies/stubs and interface skeletons for the remote objects. However,
it is often a requirement for a component system to provide a mechanism
that allows an application to discover the interfaces to an object
at runtime. This lets the application take advantage of special
properties of the component without having to recompile the application.
- Reflection. Both object migration and network adaptability as described
above are examples of object behavior that is highly dependent upon the
way an object is implemented and its runtime system. Reflection refers
to the ability to obtain information about an object at runtime, such as
its class or the interfaces it implements. It also refers to the capability
of an object to infer properties about its own implementation and the state
of the environment in which it is executing. For example, reflection can
be used to implement dynamic invocation, as the sketch below illustrates.
While reflection can be implemented
in any system, Java is the only conventional language that supports reflection
directly. A closely related concept is that of a metaobject, which
can be thought of as a runtime object that is bound to each application level
object. In some systems, metaobjects are used to implement method invocations.
Hence the choice of network protocol to use in executing a particular method
invocation can be controlled by the associated metaobject. This allows
the object making the method call to be written without concern for the way
the call is implemented, because that is the job of the metaobject. This
allows for a great variety of ways to implement some of the features listed
in this section. For example, one way to accomplish the same result as
object migration is to endow a system with a capability called pseudo-migration,
which works as follows. The metaobject associated with an object catches
each request for a member function call on that object. If the metaobject
detects that the current compute host is too busy, it can
create another instance of the controlled object on another host and forward
the call to the new instance.
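The Java facilities referred to here are standard. The short example below
discovers an object's interfaces at runtime and invokes a method that was not
named at compile time, which is the essence of reflection-based dynamic
invocation; a String stands in for a component.

```java
// Runtime discovery and invocation via the standard java.lang.reflect API.
import java.lang.reflect.Method;

public class ReflectionSketch {
    public static void main(String[] args) throws Exception {
        Object component = "a component discovered at runtime";

        // Discover the interfaces the object implements.
        for (Class<?> iface : component.getClass().getInterfaces()) {
            System.out.println("implements " + iface.getName());
        }

        // Invoke a method by name, without compile-time knowledge of it.
        Method m = component.getClass().getMethod("length");
        System.out.println("length() -> " + m.invoke(component));
    }
}
```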
- Event Logging. Debugging distributed systems is very hard. It is
essential to have a mechanism that allows the events associated with
a set of distributed interactions to be logged in a way that helps
identify what happened and when.
- Fault tolerance. An exception handling mechanism is the first step
toward building reliable systems, but it falls far short of providing a
mechanism by which failure can be tolerated in a reliable manner. The system
must be able to automatically restart applications and roll back transactions
to a previous known state.
- Authentication and Security. Authentication allows us to identify
which applications and users are allowed to access system components. Security
means that these interactions can be accomplished with safety for the data
as well as the implementations. These issues go far beyond the
domain of the object system, but the object system must provide a way to
give the user access to the metacomputing authentication and security
tools that are available.
- Beyond Client/Server. For high performance computation it is essential
that future distributed object systems support a greater variety of
models than simple client/server schemes. As illustrated by the examples
in section 2, there are paradigms that include peer-to-peer object networks.
In the future we can imagine massive networks of components and software
agents that work without centralized control and dynamically respond to
changing loads and requirements.
- Support for Parallelism. Beyond multi-threaded applications are those
that involve the concurrent activity of many components. An object system
must allow both asynchronous and synchronous method calls (one form of
asynchronous call is sketched below). In addition, multi-cast communication
and collective synchronization are essential for supporting parallel
operation on very large numbers of concurrently executing objects.
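A small sketch of an asynchronous call using a standard Java future: the
caller obtains a handle immediately and synchronizes later, which is the
behavior a component system needs to keep many objects busy concurrently.
The expensiveRemoteCall method is a local stand-in for a remote invocation.

```java
// Asynchronous invocation with a standard CompletableFuture.
import java.util.concurrent.CompletableFuture;

public class AsyncCallSketch {
    static double expensiveRemoteCall() { return 42.0; } // stand-in

    public static void main(String[] args) {
        CompletableFuture<Double> f =
            CompletableFuture.supplyAsync(AsyncCallSketch::expensiveRemoteCall);
        // ... the caller is free to do other work here ...
        System.out.println("result = " + f.join()); // synchronize later
    }
}
```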
In the preceding pages we have attempted to outline many of the technical
problems that are associated with extending contemporary component and
distributed object technology to the applications that will run on the
emerging high performance metacomputing grid. While the challenges are
great, they are not insurmountable. With a coordinated effort, it should
be possible to build an application-level framework for a distributed component
architecture that will be suitable for these applications and also interoperate
with the emerging standards for desktop software. However, without true
interoperability, there is little hope that a substantial software industry
will emerge from the high end of scientific and engineering problem solving.