Object Services and Consulting, Inc.
fmanola@objs.com
March 30, 1999
Fundamentally, a grid is an integrating mechanism or concept. This concept can be applied at different technical levels of computer systems, e.g.:
The definitions of "grid" found in dictionaries generally imply some concept of a "network" or "mesh". This is certainly the generic idea of a grid. Many things can be referred to as "grids" in this simple sense, including, in the context of computer systems, the Internet, the Web, or the objects in a CORBA-based distributed object system (which form an interconnected network by virtue of the references the objects have to each other). However, the grid concept used in computational grids, computing fabrics, and the ABIS and CoABS grids implies additional requirements, a stronger cohesiveness. Typically, such "true grids" are formed by starting with networks of distributed resources, and adding capabilities or services that help further integrate the interconnected resources. The integration found in such "true grids" involves such things as:
Grids at these individual levels are useful by themselves, but the maximum advantage comes when these different levels of grid capabilities are combined. There is a need for additional work to develop a unifying technical grid architecture which incorporates these separate grid levels, and identifies mappings between them. The requirements of true grids at the higher levels (e.g., agent) probably require grid-like functionality at the lower levels (e.g., data and computation) anyway; hence these mappings are needed to guide the implementation of the higher levels in terms of the lower ones. These functional mappings are similar to the types of end-to-end mappings being investigated in providing Quality of Service guarantees in distributed systems [Man99b]. Building such a combined grid, for example, would appear to involve all the computational grid issues of system management (and associated metadata), distributed computation and load balancing, mobile code, security, etc., as well as the data-, object-, and agent-level versions of those issues (issues which, e.g., reflect the semantics of the entities on whose behalf the agents are functioning, or the resources which the agents wrap). This requires a way of describing resources and capabilities, and resource requirements and tasks, and a way to map between them, at all these levels. The combined grid also involves the need for a way to define higher-level goals, i.e., a way to define the goals of the grid itself, that are optimized by the load balancing and similar activity that is going on. These goals are presumably at a higher level than those of individual agents (although these might also be characterized as the goals of higher-level agents, or agents of higher authority, rather than goals of the grid per se).
In such an integrated architecture, in addition to the technical levels already mentioned, there is also a need to define additional forms of organization on the available resources. These include such things as the use of multiple functional tiers, the use of Common Schema concepts or enterprise-level ontologies (enterprise-wide agreements on common semantics), and specialized schemas/ontologies for use by specialized user communities (together with mappings to the common definitions where possible). Semantics-based mappings between the different technical levels in such an architecture are also required. Such additional levels of organization provide the basis for more interoperability among the various technical levels of resources contained in the system, and hence help enhance the ability of these resources to operate as a "true grid".
All this work requires additional analysis not only of technical issues, but also of application requirements. All sorts of technologies already exist that are potentially useful in integrating distributed resources. However, further detailed understanding of application requirements is needed to drive the detailed selection of the combination of technical features needed to form various types of grids. Work on the CoABS grid is an example of ongoing work of this type.
The grid concept is being applied to computer systems at several different "levels" (e.g., to both systems of computers and systems of agents). As a result, this study attempts to identify some general characteristics which seem to apply to all sorts of grids, in order to provide a "big picture" in terms of which grid concepts can be better understood, rather than presenting the details of technical issues associated with specific grid concepts (although this is also important). In Section 2, we give examples of several "grid-like" concepts, in order to provide a background for understanding the grid concept. In Section 3, we identify some general grid characteristics, based on common characteristics of these examples. We also look at some important types of computer systems, such as database and distributed object systems, examine the extent to which they resemble grids, and identify some types of facilities which, when added to those systems, would cause them to be considered more "grid-like", based on the characteristics we have identified. We also discuss the need to combine these various types of grids together into unified architectures, and describe the start of a general approach to doing this.
The paper identifies five major application classes for computational grids:
The paper also notes that "computational infrastructure, like other infrastructures, is fractal, or self-similar at different scales. We have networks between countries, organizations, clusters, and computers; between components of a computer, and even within a single computer." The paper describes systems at the scales of end system, cluster, intranet, and internet, the basic idea being that these constitute different scales at which similar computational services should be provided (mimicking those provided at the smallest scale, in the individual computer). Of course, it is then necessary to look at how those similar services must be provided as the scale changes, since different technologies must typically be employed.
[GFLH98] and [FK99b] describe several projects developing technology for computational grids. A simple example is PVM (Parallel Virtual Machine) <http://www.epm.ornl.gov/pvm/pvm_home.html>. PVM is a software package that permits a heterogeneous collection of Unix computers hooked together by a network to be used as a single large parallel computer. PVM allows users to exploit existing computer hardware to solve large computational problems at minimal additional cost. PVM is very portable, and the source code has been compiled on a wide variety of machines. PVM is very widely used, and is a de facto standard for distributed computing world-wide. A wide range of PVM-related links is available at the PVM home page cited above. Related facilities are provided by MPI (Message Passing Interface) [GLS94], a community-generated standard for message passing used to interconnect multiple machines. UCLA's Project Appleseed <http://exodus.physics.ucla.edu/appleseed/appleseed.html> is an example of how MPI can be used to link together a cluster of computers (Macintoshes in this case) to provide "a plug and play parallel computer" in support of numerically-intensive processing. The Appleseed Web site also contains pointers to further information on MPI.
Legion <http://www.cs.virginia.edu/~legion/> [GG99] provides an environment in which a collection of workstations, vector supercomputers, and parallel supercomputers connected by LANs and larger-scale networks appears to the user as a single very powerful computer. Legion uses object-oriented design techniques to "simplify the definition, deployment, application, and long-term evolution of grid components". The Legion architecture defines a complete object model that includes object abstractions for compute resources (called host objects), storage systems (called data vault objects), as well as other object classes. Users can use inheritance to specialize the behavior of these objects to support specific requirements, as well as to develop new objects. Legion supports PVM's libraries via emulation libraries. Legion aims to provide a single, coherent, virtual machine addressing scalability, programming ease, heterogeneity, fault tolerance, security for users and resource providers, site autonomy, multilanguage support, and interoperability. The use of reflection (the representation of parts of the underlying system as objects that can be directly operated on to access and change system behavior) is particularly important in Legion. For example, host objects represent Legion processors. One or more host objects run on each computing resource included in Legion. These objects create and manage processes for application-level Legion objects. Object classes invoke the operations of host objects to activate their instances on the computing resources that the host objects represent. Representing computing resources as Legion objects abstracts the heterogeneity of different host computing platforms, and allows resource owners to manage and control their resources within the context of the system.
(Reflection is also an important technology in providing systemic properties, sometimes called "ilities", such as reliability, survivability, and security, as well as quality of service characteristics, in large-scale computer systems [Man99b].)
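The host-object arrangement described for Legion above can be illustrated with a minimal sketch. The names used here (`HostObject`, `QuotaHost`, `start_instance`, `activate`) are hypothetical and are not Legion's actual interfaces; the sketch only assumes that a computing resource is represented by an object, that a resource owner can specialize that object through inheritance, and that callers see one uniform interface regardless of the underlying platform.

```python
class HostObject:
    """Hypothetical stand-in for a Legion-style host object (not Legion's API)."""

    def __init__(self, name):
        self.name = name
        self.processes = []  # application objects activated on this host

    def start_instance(self, object_name):
        """Activate an application-level object on the resource this host represents."""
        self.processes.append(object_name)
        return f"{object_name}@{self.name}"


class QuotaHost(HostObject):
    """A resource owner's specialization: refuse work beyond a local limit."""

    def __init__(self, name, max_processes):
        super().__init__(name)
        self.max_processes = max_processes

    def start_instance(self, object_name):
        if len(self.processes) >= self.max_processes:
            raise RuntimeError(f"{self.name} is at its process limit")
        return super().start_instance(object_name)


def activate(hosts, object_name):
    """Try each host in turn; the caller never sees platform differences."""
    for host in hosts:
        try:
            return host.start_instance(object_name)
        except RuntimeError:
            continue  # this owner declined; try the next host
    raise RuntimeError("no host accepted the object")
```

In this sketch the owner of a workstation keeps control of its resource by overriding `start_instance`, while code that calls `activate` treats every host identically.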
Globus <http://www-fp.globus.org/> [FK99c] is developing basic software infrastructure for computations that integrate geographically distributed computational and information resources. Globus is based on the assumptions that:
[FF97a,b,c] discuss the concept of "High-Performance Commodity Computing", the idea that computational grids should be based on emerging commodity network computing technologies such as CORBA, DCOM, and JavaBeans, together with the Web and conventional networking approaches. The papers discuss a three-tier architecture which integrates these technologies. This approach is in contrast with the more specialized grid architectures proposed in Legion and Globus (although these could be integrated to support lower-tier services). The authors particularly emphasize the importance of the emerging "Object Web", integrating the Web, distributed objects, and databases, in the development of computational grid technology.
The focus of much of this work appears to be on large-scale computing problems, although the technology is clearly not limited to those applications. Other grid concepts discussed below extrapolate ideas in distributed supercomputing to more complex applications. For example, in distributed supercomputing, the paradigmatic application is often that of a single large computing "job". The program is run, and a result is produced. A grid is required simply because the job is too large for a single machine. In other grid concepts, the application is of a more continuous nature. This means it must be possible for participants to enter and leave the grid, load distribution is even more dynamic (because the load and its requirements change more dynamically), etc. The next section describes a new twist on more familiar applications supported by computational grid concepts.
The articles give ubiquitous network computing as an example of an application made possible by the Fabric. The first aspect of the application is network computing: each user can access their individual "desktop" (configuration, including all applications, data, etc.) from anywhere on the network. To this is added ubiquitous computing, in which processors, displays, and input devices are everywhere. Users are tracked by sensors, and their location information is used to direct their applications and data to the appropriate devices that are located where the user is located. This changes as the person moves. There is no need for users to explicitly log in to access their computing spaces; they are just "there". The Fabric helps avoid the need for the universal presence of sufficient computing power, displays, and input devices necessary to run whatever applications the user wishes to run locally. In this scenario, processors are located all over, e.g., throughout buildings ("as populous as wall sockets, perhaps more so"), and are interconnected by low latency, high bandwidth connections. When the user is stationary, the user's tasks run on a local cell, consisting of processors in the general vicinity, which work together as a single system. If the tasks require it (and they can be paid for), additional processors can be added (thousands of them, if necessary); the computing resources are configured as required to run the software the user wants to run. As the user moves, their cell moves with them. Processing nodes leave the user's cell when their distance pushes their communication latency above some threshold level, and are replaced by nodes that enter the cell as the user gets near them. A new generation of wearable processor, display, and input devices rounds out the picture.
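The way a cell follows a moving user can be sketched in a few lines. This is only an illustration under stated assumptions: latency is modeled as proportional to straight-line distance (the `ms_per_meter` factor is invented for the example), and the node names, positions, and threshold are hypothetical. The articles do not specify how cell membership would actually be computed.

```python
import math


def latency_ms(user_pos, node_pos, ms_per_meter=0.01):
    """Assumed latency model: proportional to straight-line distance."""
    return math.dist(user_pos, node_pos) * ms_per_meter


def current_cell(user_pos, nodes, threshold_ms):
    """Names of the nodes whose latency to the user is within the threshold."""
    return {name for name, pos in nodes.items()
            if latency_ms(user_pos, pos) <= threshold_ms}


# Hypothetical processing nodes along a corridor (positions in meters).
nodes = {"n1": (0.0, 0.0), "n2": (50.0, 0.0), "n3": (120.0, 0.0)}

before = current_cell((10.0, 0.0), nodes, threshold_ms=0.45)
after = current_cell((80.0, 0.0), nodes, threshold_ms=0.45)

left = before - after    # nodes that dropped out as the user walked away
joined = after - before  # nodes that entered as the user approached
```

As the user walks from one end of the corridor toward the other, the node behind them falls out of the cell and the node ahead joins it, while the node in the middle remains a member throughout.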
Technically the concept of Computing Fabrics involves ideas that are somewhat similar to those of the computational grid, but the application focus is somewhat different. Technologies relevant to the creation of the Computing Fabric concept include:
Overall, the ABIS grid is a federated, heterogeneous system-of-systems. Participants in the grid may include civil, commercial, and foreign organizations. In the grid, ownership and management of information and services will be structured according to the needs and prerogatives of the participants. Grid functionality will extend to all types of users in joint and combined operations. As a result, the grid must cope with the heterogeneity of the commercial world, and of allies and potential coalition partners.
[J695] describes a related concept, called C4I For The Warrior (C4IFTW). C4IFTW sets forth a 21st century vision of a global information infrastructure referred to as the global grid that will provide virtual connectivity from anywhere to anywhere instantaneously on warrior demand. This grid connects commanders, sensors, weapons systems, etc., and is made up of a web of computer controlled telecommunications grids that transcends industry, media, government, military, and other nongovernment entities. The C4IFTW global grid essentially corresponds to the ABIS information grid. In addition, C4IFTW identifies a sensor/surveillance grid, layered on top of the global grid, and roughly corresponding to the ABIS battlespace awareness capability.
The concept of Network-Centric Warfare [CG98, DC498] develops these ideas somewhat further. As described in the references, Network-Centric Warfare is a derivative of network-centric computing. Just as network-centric computing is being exploited to provide competitive advantage in the commercial business sector, the emerging concepts of Network-Centric Warfare exploit information superiority to provide a competitive edge in warfare. Grid concepts are key elements in Network-Centric Warfare. In addition to making use of the information grid and sensor grid concepts of ABIS and C4IFTW, Network-Centric Warfare introduces a third element (effectively present in ABIS and C4IFTW, but not explicitly called out as a grid in these descriptions), called the engagement grid. Specifically, Network-Centric Warfare includes the following grid concepts:
The Information Superiority Chapter of the 1998 Joint Warfighting Science and Technology Plan [DDRE98] describes the composition of the information, sensor, and engagement grids to form a C4ISR grid that supports DoD's Information Superiority concept (the "degree of dominance in the information domain that permits the conduct of operations without effective opposition"). The plan also identifies high-level functional capabilities required for Information Superiority, which of them are supported by the C4ISR grid, and key technologies the grid must support, including:
These bullets suggest that the CoABS grid knows not only about agents, but about their computational requirements (e.g., how they can be broken up into processes, so they can be distributed across multiple computers), and about available computational (and other) resources. Hence, the CoABS grid concept appears to incorporate both the concepts of "grid" as used in Section 2.1, and Computing Fabric as used in Section 2.2, in the sense of providing a unified, heterogeneous distributed computing environment in which computing resources are seamlessly linked. In addition, the CoABS grid extends the idea upward to the agents that are the "applications" of this distributed computing environment. Agents become both applications whose computations can be distributed within this computing environment, and also resources that can be used by this environment. At the same time, there appears to be an interface between these two layers, so that at least some agents, e.g., those that do load balancing, can operate on the computing level grid. Furthermore, since the CoABS grid is defined as encompassing other resources (e.g., forces), CoABS grid ideas also appear to be consistent with aspects of the DoD grid concepts of Section 2.3 (although this relationship has, at least so far, not been particularly emphasized). For example, agents are explicitly mentioned as components of these DoD grid concepts, and agents could well be the implementations of choice for many of the applications incorporated in these grids. Agents could also serve as wrappers of resources (and mediators between them) in these architectures.
Building the grid suggested by the above bullets would appear to involve all the computational grid issues of system management (and associated metadata), distributed computation and load balancing, mobile code, security, etc., as well as the "agent-level" versions of those issues (issues which, e.g., reflect the semantics of the entities on whose behalf the agents are functioning, or the resources which the agents wrap). This requires a way of describing resources and capabilities, and resource requirements and tasks, and a way to map between them, at both agent and computational levels. This grid also apparently involves the need for a way of defining higher-level goals, i.e., a way to define the goals of the grid itself, that are optimized by the load balancing and similar activity that is going on. These goals are presumably at a higher level than those of individual agents (although these might also be characterized as the goals of higher-level agents, or agents of higher authority, rather than goals of the grid per se).
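The need to describe resources and tasks, and to map between them, can be made concrete with a small sketch. Everything in it is an assumption made for illustration: capabilities are modeled as plain sets of labels, capacity as an abstract integer, and the mapping as a simple greedy rule (send each task to the qualifying resource with the most spare capacity). It is not the CoABS mechanism, which the description above leaves open.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    capabilities: frozenset  # labels such as "compute" or "translate"
    capacity: int            # abstract units of spare capacity


@dataclass
class Task:
    name: str
    requires: frozenset      # capability labels the task needs
    load: int                # capacity units the task consumes


def assign(tasks, resources):
    """Greedy mapping of tasks onto resources; returns {task name: resource name or None}."""
    plan = {}
    for task in tasks:
        candidates = [r for r in resources
                      if task.requires <= r.capabilities and r.capacity >= task.load]
        if not candidates:
            plan[task.name] = None  # nothing can satisfy this task right now
            continue
        best = max(candidates, key=lambda r: r.capacity)
        best.capacity -= task.load  # record the load placed on the resource
        plan[task.name] = best.name
    return plan
```

The same descriptions serve both levels in this sketch: a `Resource` could stand for a host offering computation or for an agent offering a service, which is the sense in which a mapping is needed at both the agent and the computational level.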
All this suggests that one view of the CoABS grid could be that of a combination of a computational grid and an Agent System architecture (or at least a form of one, aimed at federating conventional agent system architectures). This would mean that it would need to incorporate typical agent system architecture services. A typical list of these services is given below (see, e.g., [Pis98a; KT98; Paz98a,b; Tho98a (slide 13)]).
The problem with ascription is that it allows practically anything to be described as an agent, making communication about agent concepts difficult among people who do not share the same point of view. A useful "filter" for using "agent" to describe a piece of software is that it should be useful to do so; that is, calling something an agent should in some useful sense distinguish it from concepts we already understand. For example, [Bra97b] quotes [Sho93] as observing:
"It is perfectly coherent to treat a light switch as a (very cooperative) agent with the capability of transmitting current at will, who invariably transmits current when it believes that we want it transmitted and not otherwise; flicking the switch is simply our way of communicating our desires. However, while this is a coherent view, it does not buy us anything, since we essentially understand the mechanism sufficiently to have a simpler, mechanistic description of its behavior."
A descriptive definition of an agent, on the other hand, typically involves a set of attributes, which a given agent might have to a greater or lesser extent, one such set being:
A similar situation exists in attempting to precisely define "grid". We can get a general idea of what "gridness" is from the "family resemblance" of the grid examples presented in Section 2. Further examples of grid-like ideas are presented in Characterizing the Agent Grid. In addition, that report contains a set of general grid properties which could be used in a descriptive definition of a grid. [FK99a] contains other sets of grid attributes. In addition, considering an agent grid as a generalization of an agent system architecture, the list of services in Section 2.4 could be used as descriptive attributes of agent grids, together with sets of attributes given in [HS98].
The definitions of "grid" found in dictionaries generally imply some concept of a "network" or "mesh". This is certainly the generic idea of a grid. Many things can be referred to as "grids" in this simple sense, including, in the context of computer systems, the Internet, the Web, or the objects in a CORBA-based distributed object system (which form an interconnected network by virtue of the references the objects have to each other). However, the grid concepts described in Section 2 imply additional requirements, a stronger cohesiveness. If we are going to use the term "grid" in a computer context, the example of "mis-ascription" cited above becomes relevant: in the same sense that it buys us nothing to refer to a light switch as an "agent", it buys us nothing to refer to the Web as a "grid", even if it might be technically accurate to do so. In other words, if we are going to use a new term such as "grid" to describe particular computer-based systems, it would be helpful to explicitly identify the properties we want to associate with those systems that distinguish them from computer-based systems we are already familiar with (such as the Internet, the Web, distributed object systems, etc.), and for which we already have other names.
In addition, a problem with current descriptions is that the grid concept is relatively new. As a result, the focus of descriptions is on individual grid concepts and applications, and little attempt has been made to provide a "big picture" that might help unify the various concepts and related technologies. For example, what is the relationship between a computational grid or computing fabric, the Web (as a form of "information grid"), distributed object systems, and agent grids? In addition, it is clear that multiple kinds of grids will in some cases be integrated to form more extensive "grids". This is illustrated by integration of information, sensor, and engagement grids in the DoD architectures described in Section 2.3. Similarly, while the CoABS grid does not (at least not yet) consider integration of data, distributed object systems, or the Web to any great extent, it seems clear that it will have to in some sense integrate these technologies in order to support its intended applications.
In the sections that follow, we describe some basic ideas for use in characterizing computer-related grids. In Section 3.2, we discuss some general attributes that seem to apply to computer-related grids. In Section 3.3, we look at some important types of computer systems, examine their "gridness", and identify some types of facilities which, when added to those systems, would cause them to be considered more "grid-like". In Section 3.4, we discuss the need to combine the various levels of "grids" together into unified architectures, and the start of a general approach to doing this. We present some concluding remarks in Section 3.5.
"Gridness" can be thought of as a continuum. At one end, there is the simple interconnection or network of resources, as in the dictionary definitions of "grid". We can think of such a network as a "loose grid", if we must use the term "grid" for these networks at all. At the other end, there are the systems that allow the interconnected resources to function as well-integrated units, as in the grid concepts described in Section 2 (particularly the DoD concepts described in Section 2.3, and the CoABS grid as described in [CoA98]). We can refer to these systems, which exhibit the characteristics described above (and possibly other defining characteristics not yet identified) in the strongest sense, as "true grids". This "true grid" endpoint of the "gridness" continuum is, of course, an arbitrary designation. Systems will exist at various points along the continuum, becoming "stronger grids" as they exhibit these "gridness" characteristics to a greater extent.
A key aspect of grids is composition of resources in a sense that goes beyond simply interconnecting them (although interconnection is clearly required). The compositional facilities provided by grids can apply at all levels, including hardware/computational power, data and software (software including both individual components and services, and composition including such things as interoperability and formation of aggregates), and agents and people (e.g., formation of communities and teams). These compositions of resources are applied to "composed tasks" (i.e., tasks that go beyond separately accessing or invoking the individual resources): in the transportation grid, the composed task is generically "provide access to resources"; in the power grid, it's "provide power"; in computers, it's presumably "provide computation" (or, more abstractly, "perform service/task"). At the agent/human level, the tasks are suitably abstract (e.g., as "translate document" is enabled by the CoABS Grid knowing that there's a connected person who understands Arabic). Ideally, we want these compositions to exhibit a fractal property or, looking at them the other way around, we want composition to exhibit a closure property. This means that the resource compositions should have characteristics that are similar to those of individual resources at the same level of abstraction, so that we can treat the compositions as resources themselves. For example, the computational grid seamlessly forms a large virtual computer from individual computers in a network, forming something that looks like yet another computer (which itself could be further aggregated). Similarly, relational database theory emphasizes the idea that operations on data such as joins should exhibit a closure property, permitting newly formed aggregates of data to be operated on in the same way as the pieces from which they were formed. A similar idea can apply to agents. 
It should be possible to form teams or communities of agents that are interacted with as if they were single agents, with the group transparently dividing up any resulting work that has to be done. Grids also tend to emphasize the dynamic aspects of composition, i.e., that it should be possible to easily form compositions of resources, then break them up when the resources are no longer needed, for recomposition elsewhere. In addition, grids tend to involve some level of unified (but not necessarily centralized) management, since grids tend to be thought of as "units". However, care is needed to match the level of abstraction of the management with the level of abstraction of the grid. For example, the Internet has a certain amount of load management at the network level, but this does not make it a computational grid, even though it does connect numerous computers. A level of management at the computational level would be required for that.
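The closure property discussed above, in which a composition of resources can be treated as a resource itself, corresponds to what programmers know as the composite pattern. The sketch below is a minimal illustration with hypothetical names (`Agent`, `Team`, `perform`); it assumes work arrives as a list of items and that a team splits the items round-robin among its members, which is only one of many possible division strategies.

```python
class Agent:
    """A single resource that can perform work items."""

    def __init__(self, name):
        self.name = name

    def perform(self, items):
        # Report which agent handled each item.
        return {item: self.name for item in items}


class Team:
    """A composition of agents (or of other teams) with the same interface as one agent."""

    def __init__(self, name, members):
        self.name = name
        self.members = list(members)

    def perform(self, items):
        # Divide the work round-robin; callers do not see how it is split.
        count = len(self.members)
        result = {}
        for index, member in enumerate(self.members):
            result.update(member.perform(items[index::count]))
        return result


# A team can itself be a member of a larger team (closure under composition).
inner = Team("inner", [Agent("a2"), Agent("a3")])
outer = Team("outer", [Agent("a1"), inner])
handled = outer.perform(["w1", "w2", "w3", "w4"])
```

Because `Team` exposes the same `perform` operation as `Agent`, code written against a single agent works unchanged against a team of any depth, which is the sense in which the composition "looks like yet another" resource.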
Whether the composition of resources involves movement of the resources depends on the kind of grid and its applications (there is invariably movement of some sort, but not necessarily of the resources). For example, the composition of resources in a transportation grid necessarily involves moving those resources from where they are to where they are needed. In a computational grid, the resources are generically "computational capacity". In conventional computer networks, the capacity itself doesn't move; instead, the load is moved. However, specific groupings of capacity ("virtual capacity") can seem to move as sharing arrangements and interconnections are set up and torn down (as in the case of the Computing Fabric of Section 2.2). Data is moved in a computer network in the same way that resources are moved on a transportation grid. In the case of distributed object systems, there can be either movement of load alone (e.g., in CORBA systems, where objects are static, messages representing load are sent to them, and messages representing results are returned), movement of resources (in the case of Java objects), or both (e.g., even in a Java-based network, some services, or special purpose devices such as sensors, may not be able to move). Similar considerations apply to agent systems.
Grids involve the participants providing to the grid as well as taking from it. There is a great deal of asymmetry in some grid-related technologies that sometimes must be dealt with in order to build "true grids" from these technologies. For example, it is straightforward to think of connecting personal computers to the Internet in order to access information. It is less straightforward to think of these personal computers as being part of the Internet in the sense of having their file systems and computational facilities fully integrated with the Internet in order to form a computational grid in the sense of Section 2.1. To do this, additional technical (and security) issues must be addressed. From another point of view, it is generally more straightforward to integrate data than it is computational capabilities. Typically this is because (a) the interfaces (for others to gain access to attached computing resources) are not as well developed as they are for data, and (b) the mechanisms for effectively using the added computation are not as well developed either (e.g., in a local network it may be possible to run an application located on someone else's machine, but it is not as easy to distribute a computation over several machines).
The relationship between a grid in a "loose" sense and a grid in the stronger sense of Section 2 is generally that the "loose" grid is or can be used as part of an organization that constitutes a "true grid". Finding the actual grid may sometimes require considering a wider context, or adding additional technology. For example, the transport grid (or a subset, like "the railroad grid") may be viewed as just the network of transport connections and the points connected. However, this grid was created in the context of higher-level desires by people to move/share resources (food and other goods). It is the unification of the transport links, together with the higher-level control mechanisms (and to some extent the economic system that provides the "tasking") that creates a grid in the stronger sense. The Internet is another example. At one level, the Internet may be thought of as a loose form of grid, because it provides network connectivity among multiple computers. However, considering it a grid in a stronger sense requires additional technology. For example, [ABIS96] notes that while the Internet might be a model for the ABIS information grid, it lacks attributes such as security, and resource allocation based on (mission) priority, needed to support their idea of a grid. Considering a wider context can also identify relationships between the Internet and a stronger grid concept. For example, via Internet email, it is possible for people to organize collaborative efforts, integrating the activities of widely-scattered people. This does not mean that the Internet, or Internet email, by itself, constitutes a grid. However, considering the connected people as part of the "system" enables that system to be thought of more realistically as a grid, with the Internet as a part, and with higher-level organizational strategy and goals being provided by the people involved. 
Similarly, distributed computer networks are at the heart of the computational grids described in Section 2.1, but additional mechanisms must be added to those networks in order to form grids in the stronger sense. Expanding the context can help us see both the grid that was intended, and also what additional components and mechanisms would be necessary to form a "true grid". This suggests that we might want to look at technologies, such as the Web and distributed object systems, that clearly exhibit certain characteristics that we associate with grids. However, we want to look at them not as grids in the fullest sense, but as "proto-grids", and then look carefully for the additional technologies that could be added to them to create grids in the stronger sense, as a way of pinning down what a "true grid" really is.
Finally, as stated in the final bullet above, "gridness" seems to imply that the system is "aware of itself" to a certain extent, and has the ability to carry out its tasks "itself", without a great deal of manual intervention. For example, any interconnected group of distributed computers could be used as a much larger "virtual computer" by employing programmers to cope with all of the distributed programming and other problems necessary to use these resources in specific applications. That does not mean that this set of distributed computers by itself is a grid. What differentiates a computational grid is the fact that the grid itself provides services over and above the computers and network to help support the "virtual computer" illusion (possibly to a greater or lesser extent), and alleviate at least some of the detailed programming that would otherwise be necessary. Similar comments apply to grids at other levels.
General technical issues associated with each of these types of systems are fairly well known (or, in the case of the agent systems, becoming so). However, we would like to understand something of the extent to which these types of systems can be said to form "true grids" in the sense discussed above, and the sorts of technologies that might be required to form "true grids" from these systems if they can't be considered grids already. In this section, we briefly look at these important types of computer systems, generally evaluate their "gridness", and identify some types of facilities which, when added to those systems, would cause them to be considered more "grid-like".
We need not say much about grids at the level of computation, since the computational grid is our original, paradigmatic computer-related grid. Computational grids combine an interconnected network of computers with the control and other technologies necessary to form a "true grid" from these computing resources, and the grid exists to form compositions that are bigger, virtual computers. The technologies that need to be added to the interconnected computers to form the grid have been introduced in Sections 2.1 and 2.2, and are thoroughly discussed in the cited references.
The dividing line between the technologies needed at this level and at other levels is necessarily fuzzy. For example, some of the technologies involved at this level are those that provide composition of "computation", not just of "computers", e.g., parallel and distributed programming technologies, such as those provided by PVM. The need for composition at the level of "computation", not just "computer" (but nevertheless at a fairly low level) is further illustrated by Jini's inclusion of a distributed transaction facility as an integral part of what is essentially a rather basic set of facilities. Transactions essentially define compositions of computations that are to be considered, from the outside, as single atomic units, and hence help simplify the programming of distributed concurrent computations.
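The idea that a transaction makes a composition of computations appear, from the outside, as a single atomic unit can be illustrated with a minimal sketch. The sketch below is a single-process illustration only; it is not Jini's transaction facility or any other system's API, and the names Transaction and transfer are hypothetical, introduced purely for illustration.

```python
import copy


class Transaction:
    """Groups several updates to a shared store into one atomic unit.

    Hypothetical illustration: a snapshot is taken on entry and restored
    if any step inside the 'with' block raises an exception.
    """

    def __init__(self, store):
        self.store = store        # dict shared by the composed computations
        self._snapshot = None     # copy of the store taken at the start

    def __enter__(self):
        self._snapshot = copy.deepcopy(self.store)
        return self.store

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # Roll back: restore the state seen before the transaction began.
            self.store.clear()
            self.store.update(self._snapshot)
        return False              # let any exception propagate to the caller


def transfer(store, src, dst, amount):
    """Two steps that must either both take effect or not happen at all."""
    with Transaction(store) as s:
        s[src] -= amount
        if s[src] < 0:
            raise ValueError("insufficient funds")
        s[dst] += amount
```

A caller sees either both updates or neither; a real distributed transaction facility would additionally have to coordinate the participants across machines (e.g., with a commit protocol), which this sketch does not attempt.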
Grid-like systems also exist at the level of data. By analogy with general grid principles, data-level grids would interconnect pieces of data, and enable the interconnected collection of data to be treated as a unit for various purposes. An obvious candidate for "gridness" at this level is a database. A database constitutes a data grid in the loose sense, since it forms an interconnected collection of related pieces of data. However, a database system can also be thought of as more of a "true grid" by considering the compositional and other technologies typically associated with modern database systems. For example:
At the same time, conventional DBMSs are limited in their support to just data, and data of relatively limited types at that (for example, object DBMSs are considered below). We might expect true "data grids" to provide support for many more data types than current DBMSs. In addition, DBMSs would more closely resemble "true grids" by incorporating additional self-management and organizing facilities. For example, an active DBMS that monitored its own content, and could automatically incorporate attached new data sources, would exhibit more "true grid" characteristics than current "static" DBMSs. Ideally, such capabilities would also be extended to allow the connection of heterogeneous databases to form federations, based on common metadata, ontology, and conceptual schema concepts, much more readily than is now the case. DBMS functionality could also be distributed into the network so that "the network is the DBMS". This trend is related to the information mediator architectures of the DARPA I*3 and BADD programs, as well as to information agents [Tho98a (slide 14)].
As noted above, we might expect true "data grids" to provide support for many more data types than current DBMSs do. The World Wide Web is an example of the variety of data that we would expect to be included in a "data grid". The Web includes a wide variety of data types, including not only HTML pages, but also files of many types (including various document formats, spreadsheets, etc.). The Web is in many respects a primitive form of distributed database (using its own particular data representations), similar in many respects to early network databases. Once a page is posted to a Web server, it potentially (assuming it points to other pages, and other pages point to it) becomes part of an interconnected collection of data whose component pages can be readily and uniformly accessed. However, the mechanisms needed for unifying this collection into a more coherent whole are at a relatively early stage. Examples of the additional technology needed to make the Web more of a "true grid" include:
The Web can increasingly be thought of as a form of object grid as well [Man98a,b; Man99a], due to:
However, it is not enough for such services to be defined; they must also be implemented, and integrated in a seamless way in a given system, in order for that system to begin to have grid properties. This is a general issue with distributed object systems today: while the systems provide the basic distributed object interconnection facilities, the additional services which would allow the objects to be combined and used in flexible ways are generally either not very well developed, or not integrated in a very transparent way either with the objects themselves or with each other. Ideally, what is desired is a seamless "sea of objects" which eliminates or minimizes distinctions between local, persistent, or distributed objects, and in which services are transparently available. For example, an object DBMS attempts to both minimize the distinction between transient and persistent objects (including the largely-automatic movement of objects off and onto persistent storage) and seamlessly integrate services that can be used with such objects. A great deal of additional work must be done when using any of today's distributed object systems (including CORBA) to achieve even this level of seamlessness and integration, let alone transparently supporting such capabilities as load balancing or object replication (although there is an OMG Replication Service RFP to which responses are currently being submitted). Some of these issues are described further in [Man99b].
Both the sets of higher-level services available with current distributed object systems (CORBA, DCOM, Java, and their developments) and the maturity of these services differ greatly. Some facilities are rapidly being developed for Java which are becoming more slowly available in CORBA (due in some respects to the need in CORBA to deal with platform and language heterogeneity). Also, for various technical reasons, some of the techniques used in these distributed object systems do not yet scale well to systems containing many millions of objects (although, in spite of this, such systems can be, and have been, implemented using CORBA-based technologies).
There is also a great deal of work needed on better object composition mechanisms, including improved techniques for forming basic objects from separate pieces of data (state) and code (software), and improved techniques for forming higher level components (or "business objects") or other object aggregations, complete with object interfaces, from collections of individual objects. Better facilities are also needed in many other areas, including:
At the agent level, a considerable amount of additional work also needs to be done, as illustrated by the existence of the CoABS program itself, and work on the CoABS grid described in Section 2.4. An agent grid exhibits all the general requirements (and associated services and issues) of the other grid levels, but "translated" into the agent level. For example, load balancing at the agent level involves balancing the loads of agents (and thus requires a way to describe the "load" of an agent, and how to tell if an agent is "overloaded"), and composition must address the requirements of agent composition (e.g., into teams), and agent-level division of labor. The references cited in Section 2.4 describe some of the many issues connected with the development of the CoABS grid, a particular agent-level grid concept.
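As a minimal sketch of what describing the "load" of an agent might involve, the following assumes one very simple definition: load is the fraction of an agent's task capacity currently in use, and an agent is "overloaded" when that fraction reaches a threshold. The names Agent, capacity, and assign, and the definition of load itself, are assumptions made for illustration; they are not taken from the CoABS grid or the cited references.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    capacity: int                          # assumed: max tasks handled at once
    tasks: list = field(default_factory=list)

    @property
    def load(self):
        """Fraction of capacity in use (one possible definition of 'load')."""
        return len(self.tasks) / self.capacity

    def is_overloaded(self, threshold=1.0):
        return self.load >= threshold


def assign(task, agents):
    """Give the task to the least-loaded agent that is not overloaded."""
    candidates = [a for a in agents if not a.is_overloaded()]
    if not candidates:
        raise RuntimeError("all agents are overloaded")
    chosen = min(candidates, key=lambda a: a.load)
    chosen.tasks.append(task)
    return chosen
```

A realistic agent-level measure would presumably also reflect the semantics of the tasks (their priority, the resources the agent wraps, and so on), which is exactly what makes load balancing harder at this level than at the level of computation.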
At the same time, the technical demands of grid concepts at all levels require increasing amounts of "intelligence", collaborative ability, adaptability, component mobility, etc.; in other words, characteristics frequently associated with agents. For example, [Bra97b] discusses the use of agent technology in simplifying and enhancing distributed computing capabilities, and in particular enhancing intelligent interoperability in such systems. One such use is the incorporation of agents as resource managers. He notes: "A higher level of interoperability would require knowledge of the capabilities of each system, so that secure task planning, resource allocation, execution, monitoring, and possibly, intervention between the systems could take place. To accomplish this, an intelligent agent could function as a global resource manager." Further distributing these functions among multiple agents, "A further step toward intelligent interoperability is to embed one or more peer agents within each cooperating system. Applications request services through these agents at a higher level corresponding more to user intentions than to specific implementations, thus providing a level of encapsulation at the planning level, analogous to the encapsulation provided at the lower level of basic communications protocols." Agents can also assist in providing better user interfaces for such distributed systems. As [Bra97b] observes, "In the future, assistant agents at the user interface and resource-managing agents behind the scenes will increasingly pair up to provide an unprecedented level of functionality to people."
[Gen97] also describes the role of agents in enabling interoperability in distributed systems. In his approach, agents and facilitators are organized into a federated system, in which agents surrender autonomy in exchange for the facilitator's services. Facilitators coordinate the activities of agents and provide other services such as locating other agents by name (white pages) or by capability (yellow pages), direct communication, content-based routing, message translation, problem decomposition, and monitoring. On startup, an agent initiates an ACL connection to the local facilitator and provides a description of its capabilities. It then sends the facilitator requests when it cannot supply its own needs, and is expected to act to the best of its ability to satisfy the facilitator's requests.
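The registration and lookup pattern just described can be sketched as follows. This is a hedged illustration, not the design in [Gen97]: ordinary method calls stand in for ACL messages, and the names Facilitator, EchoAgent, register, and request are hypothetical.

```python
class Facilitator:
    """Keeps 'white pages' (by name) and 'yellow pages' (by capability)."""

    def __init__(self):
        self._by_name = {}          # white pages: name -> agent
        self._by_capability = {}    # yellow pages: capability -> set of names

    def register(self, agent, capabilities):
        self._by_name[agent.name] = agent
        for cap in capabilities:
            self._by_capability.setdefault(cap, set()).add(agent.name)

    def find_by_name(self, name):
        return self._by_name.get(name)

    def find_by_capability(self, capability):
        names = sorted(self._by_capability.get(capability, ()))
        return [self._by_name[n] for n in names]

    def request(self, capability, payload):
        """Route a request to the first agent advertising the capability."""
        providers = self.find_by_capability(capability)
        if not providers:
            raise LookupError(f"no agent offers {capability!r}")
        return providers[0].handle(capability, payload)


class EchoAgent:
    """Toy agent that describes its capabilities to the facilitator on startup."""

    def __init__(self, name, facilitator, capabilities):
        self.name = name
        self.facilitator = facilitator
        facilitator.register(self, capabilities)

    def handle(self, capability, payload):
        return f"{self.name} handled {capability}: {payload}"
```

An agent that cannot satisfy a need itself would call request on its local facilitator, and the facilitator would in turn call handle on whichever registered agent advertises the capability. Content-based routing, message translation, and problem decomposition are omitted here.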
The integration of agents with other levels requires the use of object/component technology, together with reflective (self-referencing) capabilities combined with extensive metadata. For example, [Bra97b] observes: "A key enabler is the packaging of data and software into components that can provide comprehensive information about themselves at a fine-grain level to the agents that act upon them. Over time, large undifferentiated data sets will be restructured into smaller elements that are well-described by rich metadata, and complex monolithic applications will be transformed into a dynamic collection of simpler parts with self-describing programming interfaces. Ultimately, all data will reside in a "knowledge soup", where agents assemble and present small bits of information from a variety of data sources on the fly as appropriate to a given context. In such an environment, individuals and groups would no longer be forced to manage a passive collection of disparate documents to get something done. Instead, they would interact with active knowledge media that integrate needed resources and actively collaborate with them on their tasks." The Web, in its role as the beginnings of a data/object grid, can be said to be moving in this direction now. This is particularly true when technologies for addressing finer-grained portions of Web documents (e.g., XML, and related technologies) and for attaching behavior to Web data are considered [Man98a,b; Man99a]. [Bra97b] also identifies the need for such agent systems to be able to interact with both object systems and more conventional software: "Ideally, each software component would be "agent-enabled", however, for practical reasons components may at times still rely on traditional interapplication communication mechanisms rather than agent-to-agent protocols."
Objects provide a generic modeling or abstraction mechanism for looking at the wide range of resources that need to be included at all levels in such a combined system. An object in this sense is simply an encapsulated unit that has identity, an interface (possibly more than one), and communicates via messages with other objects and the "outside". This use of objects mirrors the use of objects as a general modeling mechanism in the ISO Reference Model of Open Distributed Processing (RM-ODP) [ISO95]. RM-ODP is intended to describe any distributed processing system (including, in some cases, the roles of humans that may be involved in the system), and its use of objects as a modeling abstraction is not meant to imply that the system is actually implemented using object-oriented programming techniques. However, while object abstractions need not necessarily be implemented using object-oriented programming, the use of these abstractions makes the application of object technologies such as CORBA, Jini, etc. relatively straightforward.
Representing the computational and communication components of a computational grid as objects, as illustrated in the Legion system's reflective capabilities, allows these components to be both uniformly represented within the architecture, and managed in a straightforward way by higher level components. The approach of representing computer or network components as objects for management purposes is well-known in both network and computer system management technologies. Data can be represented as objects in a straightforward fashion, by defining object interfaces containing get (read) and set (write) operations. The World Wide Web Consortium Document Object Model [Woo98] is an example of a set of such interfaces designed to provide object-oriented interfaces to Web data. Such interfaces provide programs and agents with more uniform access to information represented both as data (e.g., in databases, on file systems, or in the Web) and in distributed object systems, and also support the integration of more "intelligence", in the form of behavior, with such data. Finally, object interfaces can encapsulate "smart things", e.g., more or less smart agents, and human beings. For example, agents can be modeled as objects (independently of whether they are implemented as objects), in the sense that they are encapsulated things with independent identity, present interfaces to the rest of the world, and communicate to anything outside them via messages sent to interfaces. Similarly, people can be modeled as objects: "fmanola@objs.com" is the identifier of an interface to which messages can be sent.
In some cases the messaging protocols between these various kinds of objects will be relatively simple (e.g., conventional object RPC between distributed software objects, or commands sent to hardware), while in other cases they will be more complicated (agent communication language (ACL) sent between agents, or the email flow between people); however, similar abstraction principles can apply to objects at all levels.
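The representation of data as objects through get (read) and set (write) operations, mentioned above, can be sketched in a few lines. The sketch is illustrative only: the interface and class names are hypothetical, it is not the W3C Document Object Model, and a dictionary and a list of strings stand in for a database row and a file.

```python
from abc import ABC, abstractmethod


class DataObject(ABC):
    """Uniform object interface over a piece of data."""

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def set(self, key, value): ...


class DictBacked(DataObject):
    """Wraps an in-memory record (stands in for a database row)."""

    def __init__(self, record):
        self._record = record

    def get(self, key):
        return self._record[key]

    def set(self, key, value):
        self._record[key] = value


class TextFileBacked(DataObject):
    """Wraps 'key=value' lines in a list of strings (stands in for a file)."""

    def __init__(self, lines):
        self._lines = lines

    def get(self, key):
        for line in self._lines:
            k, _, v = line.partition("=")
            if k == key:
                return v
        raise KeyError(key)

    def set(self, key, value):
        for i, line in enumerate(self._lines):
            if line.partition("=")[0] == key:
                self._lines[i] = f"{key}={value}"
                return
        self._lines.append(f"{key}={value}")


def rename(obj, new_name):
    """Client code that works with any DataObject, whatever its storage."""
    old = obj.get("name")
    obj.set("name", new_name)
    return old
```

The point of the sketch is that rename neither knows nor cares how the data is stored; behavior can then be attached to the same interface, which is the step from "data" toward "object".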
In such an integrated architecture, there is also a need to define additional forms of organization on the available resources in addition to the technical levels already discussed, together with associated metadata. For example, large scale distributed object systems increasingly are being designed with 3- (or sometimes multi-) tier architectures [MGHH+98]. These architectures involve the division of the system's components (and object definitions) into functional tiers based on the different functional concerns they address. For example, a typical 3-tier architecture has a tier for objects representing user interface elements, a tier for business or application objects, and a tier for database servers. The business object tier separates out the common definitions of enterprise operations and semantics from the more specialized concerns addressed in the other tiers. Other examples of such organization include the use of Common Schema concepts [Man98c] or enterprise-level ontologies (enterprise-wide agreements on common semantics), and specialized schemas/ontologies for use by specialized user communities (together with mappings to the common definitions where possible).
Semantics-based mappings between the different technical levels in such an architecture are also required. For example, the ATAIS architecture document [BFHH+98] describes a series of interoperability levels: isolated, co-habitable, syntactic, semantic, seamless, and adaptive. The computational grid idea can be characterized as emphasizing high levels of interoperability on this spectrum, but at a low level of abstraction (i.e., in terms of computing resources). The agent grid often involves a much higher level of abstraction. Other levels (e.g., data, objects) are, in a sense, in between these extremes. Raising the level of abstraction complicates providing "gridness" (deep integration) because the requirements on one side, and the available resources/services on the other, are more semantically heterogeneous (unlike, e.g., "memory" and "CPU bandwidth"), and thus both characterizing them, and matching requirements with resources, becomes harder. An example of this is the complexity of addressing quality-of-service (QoS) issues, which involves defining mappings between "quality" measures at higher levels, and resource allocations at lower levels [Man99b].
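To make the idea of a spectrum of interoperability levels concrete, the levels named above can be written as an ordered enumeration. Two things in this sketch are assumptions rather than statements from [BFHH+98]: that the levels are strictly ordered in the sequence listed, and that two systems can interoperate only up to the level of the weaker one. The names Interop, achievable, and satisfies are hypothetical.

```python
from enum import IntEnum


class Interop(IntEnum):
    """Interoperability levels, assumed ordered as listed in the text."""
    ISOLATED = 0
    CO_HABITABLE = 1
    SYNTACTIC = 2
    SEMANTIC = 3
    SEAMLESS = 4
    ADAPTIVE = 5


def achievable(level_a, level_b):
    """Assumed rule: a pair interoperates only up to the weaker level."""
    return min(level_a, level_b)


def satisfies(required, level_a, level_b):
    """True if the pair can meet the required interoperability level."""
    return achievable(level_a, level_b) >= required
```

For example, under these assumptions a system supporting semantic interoperability paired with one supporting only syntactic interoperability can exchange well-formed messages but cannot rely on shared meaning.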
Such additional levels of organization provide the basis for more interoperability among the various technical levels of resources contained in the system, and hence help enhance the ability of these resources to operate as a "true grid".
The grid concept can be usefully applied at a number of individual technical levels (computation, data, object, agent). The development of grid concepts at these various technical levels reflects the fact that simple interconnection technologies at these levels are becoming relatively mature (even though there is still much work to do on these technologies). The emphasis now is on techniques for combining the interconnected resources to solve increasingly complex problems. In particular, there is emphasis on:
Grids at these individual levels are useful by themselves, but the maximum advantage comes when these different levels of grid capabilities are combined. There is a need for additional work to develop a unifying technical grid architecture which incorporates these separate grid levels, and identifies mappings between them. The requirements of true grids at the higher levels (e.g., agent) probably require grid-like functionality at the lower levels (e.g., data and computation) anyway, hence these mappings are needed to guide the implementation of the higher levels in terms of the lower ones. These functional mappings are similar to the types of end-to-end mappings being investigated in providing Quality of Service guarantees in distributed systems [Man99b]. Building such a combined grid, for example, would appear to involve all the computational grid issues of system management (and associated metadata), distributed computation and load balancing, mobile code, security, etc., as well as the data-, object-, and agent-level versions of those issues (issues which, e.g., reflect the semantics of the entities on whose behalf the agents are functioning, or the resources which the agents wrap). This requires a way of describing resources and capabilities, and resource requirements and tasks, and a way to map between them, at all these levels. The combined grid also involves the need for a way to define higher-level goals, i.e., a way to define the goals of the grid itself, that are optimized by the load-balancing, etc. that is going on. These goals are presumably at a higher-level than those of individual agents (although these might also be characterized as the goals of higher-level agents, or agents of higher authority, rather than goals of the grid per se).
In such an integrated architecture, in addition to the technical levels already mentioned, there is also a need to define additional forms of organization on the available resources. These include such things as the use of multiple functional tiers, the use of Common Schema concepts or enterprise-level ontologies (enterprise-wide agreements on common semantics), and specialized schemas/ontologies for use by specialized user communities (together with mappings to the common definitions where possible). Semantics-based mappings between the different technical levels in such an architecture are also required. Such additional levels of organization provide the basis for more interoperability among the various technical levels of resources contained in the system, and hence help enhance the ability of these resources to operate as a "true grid".
All this work requires additional analysis not only of technical issues, but also of application requirements. Many kinds of technology already exist that are potentially useful in integrating distributed resources. However, further detailed understanding of application requirements is needed to drive the detailed selection of the combination of technical features needed to form various types of grids. Work on the CoABS grid is an example of ongoing work of this type.
The discussion in this report does not replace the need to address the detailed technical issues associated with various grid concepts in the cited references. However, it does provide a way of thinking about the general ideas which these grid concepts have in common, which hopefully can be helpful in attempts to understand and unify them.
[BFHH+98] E. Brady, B. Fabian, M. Harrell, F. Hayes-Roth, S. Luce, E. Powell, G. Tarbox, "The Advanced Technology Architecture for Information Superiority", draft 10/16/98.
[Bra97a] J. M. Bradshaw (ed.), Software Agents, American Assn. for Artificial Intelligence/MIT Press, 1997.
[Bra97b] J. M. Bradshaw, "An Introduction to Software Agents", in [Bra97a].
[CG98] A. Cebrowski and J. Garstka, Network-Centric Warfare: Its Origin and Future, U. S. Naval Institute Proceedings, Vol. 124/11,139, January 1998, 28-35 <http://www.usni.org/Proceedings/Articles98/PROcebrowski.htm>.
[CoA98] DARPA CoABS Read Ahead Package and CoABS Kickoff Meeting, Pittsburgh, July 22-23, 1998.
[DC498] Directorate for Command, Control, Communications, and Computer Systems, Observations on the Emergence of Network-Centric Warfare, Information Paper, 1998 <http://www.dtic.mil/jcs/j6/education/warfare.html>.
[DDRE98] Director, Defense Research and Engineering, Joint Warfighting Science and Technology Plan, 1998 <http://www.dtic.mil/dstp/98_docs/jwstp/jwstp.htm>.
[FF97a] G. Fox and W. Furmanski, "Petaops and Exaops: Supercomputing on the Web", IEEE Internet Computing 1(2), March-April 1997.
[FF97b] G. Fox and W. Furmanski, "HPcc as High Performance Commodity Computing", Technical Report, December 1997, http://www.npac.syr.edu/users/gcf/hpdcbook/HPcc.html.
[FF97c] G. Fox and W. Furmanski, "High-Performance Commodity Computing", in [FK99a].
[FK99a] I. Foster and C. Kesselman (eds.), The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, 1999. ISBN 1-55860-475-8.
[FK99b] I. Foster and C. Kesselman, "Computational Grids", in [FK99a].
[FK99c] I. Foster and C. Kesselman, "The Globus Toolkit", in [FK99a].
[Gen97] M. R. Genesereth, "An Agent-Based Framework for Interoperability", in [Bra97a].
[GLS94] W. Gropp, E. Lusk, and A. Skjellum, Using MPI: Portable Parallel Programming with the Message Passing Interface, MIT Press, Cambridge, 1994.
[GFLH98] A. Grimshaw, A. Ferrari, G. Lindahl, and K. Holcomb, "Metasystems", Comm. ACM 41(11), November 1998.
[GG99] D. Gannon and A. Grimshaw, "Object-Based Approaches", in [FK99a].
[HS98] M. Huhns and M. Singh (eds.), Readings in Agents, Morgan Kaufmann, 1998.
[ISO95] ISO/IEC JTC1/SC21/WG7 (1995), Reference Model of Open Distributed Processing <http://www.iso.ch:8000/RM-ODP/> (see also <http://www-cs.open.ac.uk/~m_newton/odissey/RMODP.html> and <http://www.dstc.edu.au/AU/research_news/odp/ref_model/ref_model.html>).
[J695] Joint Staff (J6), Joint Pub 6.0: Doctrine for C4 Systems Support to Joint Operations, 30 May 1995 <http://www.dtic.mil/doctrine/jel/new_pubs/jp6_0.pdf>.
[Ket98] B. Kettler, DARPA CoABS Program: Use Cases for a Prototype Grid, draft 3.1, ISX Corporation, 12/15/98 <http://coabs.globalinfotek.com/grid.htm> (password protected).
[KT98] N. Karnik and A. Tripathi, "Design Issues in Mobile-Agent Programming Systems", IEEE Concurrency 5(3), July-September 1998.
[Man98a] F. Manola, Towards a Web Object Model, Technical Report, Object Services and Consulting, Inc., <http://www.objs.com/OSA/wom.htm>, 1998.
[Man98b] F. Manola, Some Web Object Model Construction Technologies, Technical Report, Object Services and Consulting, Inc., <http://www.objs.com/OSA/wom-II.htm>, 1998.
[Man98c] F. Manola, Flexible Common Schema Study, Technical Report, Object Services and Consulting, Inc., December, 1998 <http://www.objs.com/aits/9811-common-schema-report.htm>.
[Man99a] F. Manola, "Technologies for a Web Object Model", IEEE Internet Computing, 3(1), January/February, 1999.
[Man99b] F. Manola, Providing Systemic Properties (Ilities) and Quality of Service in Component-Based Systems, Technical Report, Object Services and Consulting, Inc., January 1999 <http://www.objs.com/aits/9901-iquos.html>.
[MGHH+98] F. Manola, et al., "Supporting Cooperation in Enterprise-Scale Distributed Object Systems", in M. P. Papazoglou and G. Schlageter (eds.), Cooperative Information Systems: Trends and Directions, Academic Press, 1998.
[Paz98a] P. Pazandak, Best of Class Agent System Features, <http://www.objs.com/agility/tech-reports/9809-best-of-class-capabilities.htm>, 1998.
[Paz98b] P. Pazandak, Next Generation Agent Systems & the CoABS Grid, draft Technical Report, <http://www.objs.com/agility/tech-reports/9810-NGAS.htm>, 1998.
[Pis98a] A. Piszcz, "Background on Agents for DARPA's NGII Architecture", Mitre Technical Report MTR 98W0000085, August 1998.
[Pis98b] A. Piszcz, Grid Metaservice Considerations for Control of Agent Based Systems, draft, 3 September, 1998.
[Sho93] Y. Shoham, "Agent-Oriented Programming", Artificial Intelligence 60(1), 51-92, 1993.
[Tho98a] C. Thompson, Strawman Agent Reference Architecture, slide presentation, <http://www.objs.com/agility/tech-reports/9808-agent-ref-arch-draft3.ppt>, 1998.
[Tho98b] C. Thompson, Characterizing the Agent Grid, Technical Report, Object Services and Consulting, Inc., 1998 <http://www.objs.com/agility/tech-reports/9812-grid.html>.
[VV95] W. Van de Velde, "Cognitive Architectures--From Knowledge Level to Structured Coupling", in L. Steels (ed.), The Biology and Technology of Intelligent Autonomous Agents, Springer Verlag, Berlin, 1995.
[Woo98] L. Wood, et al., Document Object Model (DOM) Level 1 Specification, W3C Recommendation, World Wide Web Consortium, <http://www.w3.org/TR/REC-DOM-Level-1/>, 1998.
[WWWK94] J. Waldo, G. Wyant, A. Wollrath, and S. Kendall, A Note on Distributed Computing, SMLI TR-94-29, Sun Microsystems Laboratories, Inc., November 1994 <http://www.smli.com/techrep/1994/abstract-29.html>.
© Copyright 1998, 1999 Object Services and Consulting, Inc. (OBJS)
© Copyright 1998, 1999 Institute for Defense Analyses (IDA)
Permission is granted to copy this document provided this copyright statement is retained in all copies.
Disclaimer: Neither OBJS nor IDA warrants the accuracy or completeness of the information in this report.