Characterizing the Agent Grid

Frank Manola and Craig Thompson
Object Services and Consulting, Inc.
June 1999

Abstract
1. Introduction
2. Examples of Grids and Grid-Like Systems

2.1 Computational Grids
2.2 Computing Fabrics
2.3 DoD C4ISR Grid Concepts
2.4 Other Grid and Grid-like Concepts

3. The CoABS Agent Grid

3.1 Application Level Requirements
3.2 Functional Requirements
3.3 Initial Progress

4. Issues in Defining the Agent Grid

4.1 The Problem of Defining "Agent" and "Grid"
4.2 General Characteristics of Grids
4.3 Viewing the Agent Grid from Different Perspectives

4.3.1 Viewing the Agent Grid as a Collection of Agent-related Mechanisms and Protocols
4.3.2 Viewing the Agent Grid as a Composition or Federation of Agent Systems
4.3.3 Viewing Agent Grids as Organizational Units

4.4 The Need for Unified Grid Architectures
4.5 Open Issues about the Agent Grid

5. Conclusions and Future Work
Acknowledgements
References

Abstract

The agent grid is a proposed construct that is intended to support the rapid and dynamic configuration and creation of new functionality from existing software components and systems. Grids in general, and the agent grid in particular, are at an early stage of development, so this paper focuses on characterizing grids by examining their purposes and properties. The paper describes work on existing grid architectures and some of the properties that an agent grid might provide, then explores various architectural interpretations of the agent grid. The paper concludes that more work is needed to consolidate the agent grid concept, and to allow agents to better interoperate with other technologies in forming such grids.

1. Introduction

The term grid is increasingly appearing in computer literature, generally referring to some form of system framework into which hardware, software, or information resources can be plugged, and which permits easy configuration and creation of new functionality from existing resources. The "killer applications" for these grid concepts include computational challenge problems (e.g., codebreaking) requiring supercomputing capabilities, universal availability of customized computing services (e.g., access to one's individual desktop and application suite anywhere in the world), and global integration of information, computing, and other resources for various purposes. Several DoD and industry programs use some form of grid concept. However, such computer-related "grids" are a relatively new architectural idea, and not very well understood. Sometimes the term grid is used loosely in describing systems connecting some collection of distributed resources, while in other cases it is clear that some more advanced set of capabilities is involved. The grid concept has begun to be applied to computer systems involving agents, with agents playing both the roles of enablers and customers of grid capabilities. However, in this newer context there is even more that is not understood about the characteristics that such grids might have. The purpose of this paper (which derives from [Man99c, Tho98b]) is to better characterize the architectural concept of an agent grid, and describe some issues associated with defining such grids. The paper attempts to identify some general characteristics which seem to apply to all sorts of grids, in order to provide a "big picture" in terms of which agent grid concepts can be better understood, and then focuses on issues specifically associated with agent grids.

In Section 2, we give examples of several computer-related grid concepts as a basis for understanding the grid concept. In Section 3, we describe, as an example of an agent grid, the grid currently being developed within DARPA's Control of Agent-Based Systems (CoABS) program. In Section 4, we identify some general grid characteristics, based on common characteristics of these various types of grids, and then describe various issues that arise in attempting to characterize agent grids in particular. Specifically, we look at various design decisions that need to be made in defining an agent grid, and identify related technologies that should be integrated in building agent grids. Section 5 summarizes our conclusions and areas where more work will be needed.

2. Examples of Grids and Grid-Like Systems

2.1 Computational Grids

The basic concept of a computational grid is defined in [FK99b]. The term grid is used to indicate an analogy with the electrical power grid. Just as a power grid links sources of electrical power together, and provides for widespread access to and distribution of that power (with associated load-balancing and other services), a computational grid is "a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities". The concept of a grid as an infrastructure is important because "…a computational grid is concerned, above all, with large-scale pooling of resources, whether computer cycles, data, sensors, or people. Such pooling requires significant hardware infrastructure to achieve the necessary interconnections and software infrastructure to monitor and control the resulting ensemble". The intent of a [computational] grid is to provide [computational] services that are dependable, consistent, pervasive, and inexpensive. The term "grid" is only slowly becoming associated with the concept of creating a giant computational environment out of a distributed collection of files, databases, computers, and external devices. The term "metacomputing" is also frequently used.

[FK99b] identifies five major application classes for computational grids:

Distributed supercomputing, in which the grid is used to aggregate computational resources to address very large problems that cannot be handled on a single system. Examples include distributed interactive simulation (e.g., military simulations), and simulation of physical processes (e.g., stellar dynamics and climate modeling).
High-throughput computing, in which the grid is used to schedule large numbers of loosely-coupled or independent tasks, with the goal of putting unused processor cycles to work. Examples include the use of multiple distributed workstations to solve hard cryptographic or complex design problems.
On-demand computing, in which the grid is used to meet short-term requirements for resources that cannot be cost-effectively or conveniently located locally (i.e., providing the ability to share scarce resources). The resources may be computation, but may also include software, data repositories, specialized sensors and other devices, etc. Unlike the distributed supercomputing applications, these applications are often driven by cost-performance concerns rather than absolute performance. Particular challenges in these applications have to do with the dynamic resource requirements, and the potentially large population of users and resources involved. The challenges include resource location, scheduling, code management, configuration, fault-tolerance, security, and payment mechanisms.
Data-intensive computing, in which the grid is used to synthesize new information from data maintained in geographically-distributed repositories, digital libraries, and databases. The synthesis process is often computationally and communication intensive as well. Challenges in this class of applications are the scheduling and configuration of complex, high-volume data flows through the network and multiple levels of processing.
Collaborative computing, concerned primarily with enabling and enhancing human-to-human interactions. Examples include collaborative design activities, and "virtual worlds". In many cases, these applications involve providing the distributed participants with shared access to data and computational resources, in which case these applications share characteristics with the other application classes.
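
The high-throughput class above can be illustrated with a minimal sketch, which is not taken from [FK99b]: independent tasks with known cost estimates are placed greedily on whichever worker currently has the least assigned work. The function name `schedule_tasks`, the task names and costs, and the worker names are all hypothetical; a real grid scheduler would also have to deal with resource location, failures, security, and accounting.

```python
import heapq


def schedule_tasks(task_costs, workers):
    """Greedily assign independent tasks to the least-loaded worker.

    task_costs: dict mapping task name -> estimated cost (hypothetical units).
    workers: list of worker names (hypothetical).
    Returns (assignment, loads): task -> worker, and worker -> total cost.
    """
    # Heap of (current load, worker name); the least-loaded worker pops first.
    heap = [(0, w) for w in workers]
    heapq.heapify(heap)
    assignment = {}
    # Place the largest tasks first, which tends to balance the loads better.
    for task, cost in sorted(task_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, worker = heapq.heappop(heap)
        assignment[task] = worker
        heapq.heappush(heap, (load + cost, worker))
    loads = {worker: load for load, worker in heap}
    return assignment, loads


# Hypothetical workload: four independent key-search ranges, two workstations.
tasks = {"key-range-0": 5, "key-range-1": 3, "key-range-2": 3, "key-range-3": 1}
assignment, loads = schedule_tasks(tasks, ["ws-a", "ws-b"])
```

Because the tasks are independent, no communication between workers is modeled; that is what distinguishes this class from distributed supercomputing, where the pieces of a single job are tightly coupled.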

For application development, a grid must provide both appropriate programming models and a range of services. The paper notes that there is currently no consensus on what programming model is the most appropriate for a grid environment. Models that have been proposed include:

low-level techniques such as datagram/stream communication (UDP, TCP, Multicast)
shared memory/multithreading (e.g., distributed shared memory techniques)
data parallelism
message passing (MPI, PVM)
remote procedure call (DCE)
object-oriented (CORBA)
agents.

Services that must be provided include:

security: authentication, authorization, and protection
process concepts
data concepts: memory, files, databases
shared address space
communication mechanisms
control
signals/events
resource management: acquisition, allocation, scheduling
accounting and payment

Relevant technologies come from areas such as distributed file systems and databases, distributed operating systems (particularly, such areas as load balancing and process and data migration), parallel and distributed programming, and network management.

The paper also notes that "computational infrastructure, like other infrastructures, is fractal, or self-similar at different scales. We have networks between countries, organizations, clusters, and computers; between components of a computer, and even within a single computer." The paper describes systems at the scales of end system, cluster, intranet, and internet, the basic idea being that these constitute different scales at which similar computational services should be provided (mimicking those provided at the smallest scale, in the individual computer). Of course, it is then necessary to look at how those similar services must be provided as the scale changes, since different technologies must typically be employed.
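
One way to picture this self-similarity is a structure in which every scale answers the same query through the same interface. The sketch below is only an illustration under our own assumptions: the `Resource` class, its fields, and the CPU counts are hypothetical and do not come from [FK99b].

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    """A node at any scale: an end system, cluster, intranet, or internet."""
    name: str
    cpus: int = 0  # CPUs owned directly (non-zero only for end systems here)
    members: List["Resource"] = field(default_factory=list)

    def total_cpus(self) -> int:
        # The same query works at every scale by recursing into members.
        return self.cpus + sum(m.total_cpus() for m in self.members)


# Hypothetical configuration: two end systems in a cluster, inside an intranet.
pc1 = Resource("pc1", cpus=2)
pc2 = Resource("pc2", cpus=4)
cluster = Resource("cluster", members=[pc1, pc2])
intranet = Resource("intranet", members=[cluster, Resource("server", cpus=8)])
```

The interface is identical at each level, but the sketch hides exactly the point made above: in practice the technology behind the query differs as the scale changes.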

[GFLH98] and [FK99b] describe several projects developing technology for computational grids. A simple example is PVM (Parallel Virtual Machine, <http://www.epm.ornl.gov/pvm/pvm_home.html>), a software package that permits a heterogeneous collection of Unix computers hooked together by a network to be used as a single large parallel computer. PVM is very portable, and the source code has been compiled on a wide variety of machines. Related facilities are provided by MPI (Message Passing Interface) [GLS94], a community-generated standard for message passing used to interconnect multiple machines. UCLA's Project Appleseed <http://exodus.physics.ucla.edu/appleseed/appleseed.html> is an example of how MPI can be used to link together a cluster of computers (Macintoshes in this case) to provide "a plug and play parallel computer" in support of numerically-intensive processing.

Legion <http://www.cs.virginia.edu/~legion/> [GG99] provides an environment in which a collection of workstations, vector supercomputers, and parallel supercomputers connected by LANs and larger-scale networks appears to the user as a single very powerful computer. Legion uses object-oriented design techniques to "simplify the definition, deployment, application, and long-term evolution of grid components". The Legion architecture defines a complete object model that includes object abstractions for computer resources, storage systems, and other object classes. Inheritance can be used to specialize the behavior of these objects to support specific requirements. The use of reflection (the representation of parts of the underlying system as objects that can be directly operated on to access and change system behavior) is particularly important in Legion. For example, host objects represent Legion processors. One or more host objects run on each computing resource included in Legion. These objects create and manage processes for application-level Legion objects. Object classes invoke the operations of host objects to activate their instances on the computing resources that the host objects represent. Representing computing resources as Legion objects abstracts the heterogeneity of different host computing platforms, and allows resource owners to manage and control their resources within the context of the system. Reflection is also an important technology in providing systemic properties (sometimes called ilities) such as reliability, survivability, and security, and quality of service characteristics, in large-scale computer systems [Man99b].
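
The relationship between class objects and host objects can be sketched roughly as follows. This is not Legion's actual interface: the class names `HostObject` and `ClassObject`, their methods, and the process-limit policy are hypothetical stand-ins chosen only to show a class object activating its instances through a host object that the resource owner controls.

```python
class HostObject:
    """Hypothetical stand-in for a host object: it represents one computing
    resource and creates processes for application-level objects."""

    def __init__(self, name, max_processes):
        self.name = name
        self.max_processes = max_processes  # policy set by the resource owner
        self.processes = []

    def start_process(self, instance_id):
        # The resource owner keeps control: the host may refuse the request.
        if len(self.processes) >= self.max_processes:
            raise RuntimeError(f"{self.name} refuses: process limit reached")
        self.processes.append(instance_id)
        return f"{self.name}/{instance_id}"


class ClassObject:
    """Hypothetical class object that activates its instances on hosts."""

    def __init__(self, class_name):
        self.class_name = class_name
        self.count = 0

    def activate(self, host):
        # The class object invokes the host object's operation; it never
        # touches the underlying platform directly.
        self.count += 1
        return host.start_process(f"{self.class_name}-{self.count}")


workstation = HostObject("workstation-1", max_processes=1)
solver = ClassObject("Solver")
handle = solver.activate(workstation)
```

A second call to `solver.activate(workstation)` would raise `RuntimeError` here, which is the sketch's simplified version of an owner-imposed policy on a shared resource.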

Globus <http://www-fp.globus.org/> [FK99c] is developing basic software infrastructure for computations that integrate geographically distributed computational and information resources. Globus is based on the assumptions that:

Grid architectures should provide basic services, but not prescribe particular programming models or higher-level architectures.
Grid applications require services beyond those provided by today's commodity technologies.

Globus thus focuses on defining a toolkit of low-level services for security, communication, resource location, resource allocation, process management, and data access. These services are then used to implement higher-level services, tools, and programming models. According to [GFLH98], "Globus has withstood many tests, including a recent one involving battlefield simulations distributed across more than 30 machines and representing the independent activity of more than 100,000 tanks, trucks, and other units."

[FF97a,b,c] discuss the concept of "High-Performance Commodity Computing", the idea that computational grids should be based on emerging commodity network computing technologies such as CORBA, DCOM, and JavaBeans, together with the Web and conventional networking approaches. The papers discuss a three-tier architecture which integrates these technologies. This approach is in contrast with the more specialized grid architectures proposed in Legion and Globus (although these could be integrated to support lower-tier services). The authors particularly emphasize the importance of the emerging "Object Web", integrating the Web, distributed objects, and databases, in the development of computational grid technology.

The focus of much of this work appears to be on large-scale computing problems, although the technology is clearly not limited to those applications. Other grid concepts discussed below extrapolate ideas in distributed supercomputing to more complex applications. For example, in distributed supercomputing, the paradigmatic application is often that of a single large computing "job". The program is run, and a result is produced. A grid is required simply because the job is too large for a single machine. In other grid concepts, the application is of a more continuous nature. This means it must be possible for participants to enter and leave the grid, load distribution is even more dynamic (because the load and its requirements change more dynamically), etc. The next subsection describes a new twist on more familiar applications supported by computational grid concepts.

2.2 Computing Fabrics

Another grid-related "vision" is presented in a series of articles describing what is referred to as Computing Fabrics <http://www.infomaniacs.com/>. As described in these articles, the Computing Fabric consists of nodes, which are packages of processors, memory, and peripherals, linked together by an interconnection facility. Within the Fabric are regions of nodes and interconnections that are so tightly coupled that they appear to be a single node. These are called cells. This tight coupling is obtained using hardware, software, or both. Cells in the Fabric are then loosely coupled with each other. The coupling between cells appears differently from the coupling between the components of a node. The Fabric as a whole, or each cell in it, can grow or shrink in a modular fashion, by adding or removing nodes and links. Nodes from the Fabric surrounding a cell can join that cell, and nodes within a cell may leave that cell and join the surrounding Fabric. In addition, cells can divide and merge. Each cell presents the image of a single system, even though it can consist of many nodes.

The articles give ubiquitous network computing as an example of an application made possible by the Fabric. The first aspect of the application is network computing: each user can access their individual "desktop" (configuration, including all applications, data, etc.) from anywhere on the network. To this is added ubiquitous computing, in which processors, displays, and input devices are everywhere. Users are tracked by sensors, and their location information is used to direct their applications and data to the appropriate devices located where the user is. This changes as the person moves. There is no need for users to explicitly log in to access their computing spaces; they are just "there". The Fabric helps avoid the need for the universal presence of sufficient computing power, displays, and input devices necessary to run whatever applications the user wishes to run locally. In this scenario, processors are located all over, e.g., throughout buildings ("as populous as wall sockets, perhaps more so"), and are interconnected by low latency, high bandwidth connections. When the user is stationary, the user's tasks run on a local cell, consisting of processors in the general vicinity, which work together as a single system. If the tasks require it (and they can be paid for), additional processors can be added (thousands of them, if necessary); the computing resources are configured as required to run the software the user wants to run. As the user moves, their cell moves with them. Processing nodes leave the user's cell when their distance pushes their communication latency above some threshold level, and are replaced by nodes that enter the cell as the user gets near them. A new generation of wearable processor, display, and input devices rounds out the picture.
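
The way a cell follows a moving user can be sketched as a simple threshold test on latency. The sketch below rests on assumptions of our own that are not in the Computing Fabrics articles: latency is modeled as proportional to distance through a made-up constant `ms_per_meter`, and the function name `cell_members`, the node names, and the coordinates are hypothetical.

```python
import math


def cell_members(user_pos, nodes, max_latency_ms, ms_per_meter=0.01):
    """Return the names of nodes whose estimated latency to the user is
    within the threshold.

    Assumption: latency grows linearly with distance (ms_per_meter is a
    made-up constant); a real Fabric would measure latency instead.
    """
    members = set()
    for name, pos in nodes.items():
        latency = math.dist(user_pos, pos) * ms_per_meter
        if latency <= max_latency_ms:
            members.add(name)
    return members


# Hypothetical nodes along a corridor; the user walks from x=10 to x=40.
nodes = {"n1": (0, 0), "n2": (30, 0), "n3": (60, 0), "n4": (90, 0)}
before = cell_members((10, 0), nodes, max_latency_ms=0.3)
after = cell_members((40, 0), nodes, max_latency_ms=0.3)
left, joined = before - after, after - before
```

In this run the cell changes from n1 and n2 to n2 and n3: n1 drops out, n3 joins, and n2 stays in the cell throughout, so the user's tasks always have somewhere nearby to run.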

Technically the concept of Computing Fabrics involves ideas that are somewhat similar to those of the computational grid, but the application focus is somewhat different. Technologies relevant to the creation of the Computing Fabric concept include:

Distributed shared memory architectures
Modularly scalable multiprocessor interconnect facilities (in addition to networking technologies at scales from LANs to the Internet; the idea here is that "buses converge with networks")
Distributed operating systems (e.g., SGI's Cellular Irix distributed Unix)
Distributed object systems supporting mobile objects
Integrations of the above two technologies, as in the later stages of Microsoft's Millennium <http://www.research.microsoft.com/sn/Millennium/>. The increasing integration of Java with CORBA, plus additional infrastructure at lower levels (e.g., to support load management) would provide similar capabilities
Jini, and distributed shared object spaces (JavaSpaces, IBM's T Spaces).

The authors note that full exploitation of the Computing Fabric concept requires the integration of distributed object technologies and database technologies. For example, technologies such as Microsoft's Millennium and Sun's Jini support code developed using object technology being automatically distributed using a distributed object infrastructure running atop massively distributed clusters. Large-scale DBMSs already exploit parallelism and multi-system clustering. DBMSs need to be further exploded into interoperable components that can more fully utilize Fabrics. Logical-level models and views become increasingly important as data and processing are distributed over the Fabric, and as data is organized on increasingly large scales. The Web (and XML) will also need to be included, as representing a large scale distributed data store.

2.3 DoD C4ISR Grid Concepts

The U.S. Department of Defense (DoD) has developed a number of advanced information system concepts employing the idea of grids to support advanced Command, Control, Communications, Computers, Intelligence, Surveillance, and Reconnaissance (C4ISR) capabilities. The grid concepts used in these systems are very ambitious and powerful, embodying the idea of being able to integrate not only global computing and communications resources, but also sensors, weapons, etc., in extremely flexible, custom-tailored combinations to achieve mission objectives. For example, the Advanced Battlespace Information System (ABIS) [ABIS96] concept describes a set of information services, technologies, and tools to support C4ISR. The ABIS concept was produced by a task force composed of operational and technical personnel from all Services, the JCS, and major DoD agencies involved with C4ISR systems. The foundation of the framework is an information grid, described in the "Grid Capabilities Working Group Results" section of [ABIS96] <http://www.dtic.mil/dstp/96_docs/abis/volume5/abis501.htm>. The C4I For The Warrior (C4IFTW) [J695] concept sets forth a 21st century vision of a global information infrastructure referred to as the global grid that will provide virtual connectivity from anywhere to anywhere instantaneously on warrior demand. This grid connects commanders, sensors, weapons systems, etc., and is made up of a web of computer controlled telecommunications grids that transcends industry, media, government, military, and other nongovernment entities. The Network-Centric Warfare (NCW) concept [CG98, DC498] develops these ideas somewhat further. Network-Centric Warfare is described as a derivative of network-centric computing. Just as network-centric computing is being exploited to provide competitive advantage in the commercial business sector, the emerging concepts of Network-Centric Warfare exploit information superiority to provide a competitive edge in warfare. Grid concepts are key elements in Network-Centric Warfare.

These information system concepts have in common the idea of organizing the system into separate functional layers, each employing a specialized grid. The separate grids defined in the Network-Centric Warfare concept are representative of this organization:

The information grid is an information environment including communications, processing, information repositories, and value-added services that provide users with an ability to find information, obtain processing services, and exchange information. It is a federated, heterogeneous system-of-systems that provides "dial tone", "web tone", and "data tone", and provides the infrastructure for network-centric computing and communications. Voice, data, and video can be transmitted via point-to-point or direct broadcast. Embedded capabilities for Information Assurance will prevent intrusive attack and assure commanders that their information will be valid. Warfighters will be able to connect to this grid anywhere and at any time, and will be able to craft their own information environment by selecting the types of services, information, and interfaces that are appropriate to their missions and styles of operations. The grid will provide connectivity and information that will adapt to changing situations and be responsive to the warfighter's need for knowledge. It will adapt to the constraints imposed by connectivity at the tactical levels and will be able to organize resources within the global infrastructure to service the needs of the warfighters. Participants in the grid may include civil, commercial, and foreign organizations, and Grid functionality will extend to all types of users in joint and combined operations. As a result, the grid must cope with the heterogeneity of the commercial world, and of allies and potential coalition partners. The ABIS description of the information grid explicitly mentions that intelligent agents will be included to assist the users in finding and retrieving information, so that they are not overwhelmed with the massive amount of information and sources available in the grid.
Sensor grids can be viewed as sets of sensor peripherals and sensor applications that are installed on the information grid. The sensor peripherals consist of space-, air-, ground-, sea-, and cyberspace-based sensors. These sensors can be based on dedicated sensor platforms, weapons platforms, or deployed by individual soldiers. The sensor peripherals also include, e.g., embedded sensors that track levels of consumables (e.g., fuel or munitions). The sensor applications consist of the software applications associated with specific sensor peripherals, as well as the software applications that enable multi-mode sensor tasking and data fusion. Individual, custom-tailored sensor grids can be created from the sensor resources available on the grid. Dynamic sensor tasking, data fusion, and effective distribution of information over the information grid provide increased battlespace awareness, and an increased ability to synchronize this battlespace awareness with military operations.
Engagement grids can be viewed as sets of shooter peripherals and shooter applications that operate on the information grid. The shooter peripherals consist of geographically-distributed air-, land-, sea-, and cyberspace-based "shooters" (weapons systems). The shooter applications consist of software for command and control and weapon employment. Some of these shooter applications implement high-speed automated weapon-target pairing algorithms. These algorithms can rapidly determine near optimal weapon-target pairings subject to time-varying constraints, such as number and value of remaining targets, number of remaining shooter rounds, and the probability of kill of remaining rounds. The concept is similar to what occurs in automated securities trading, where the expertise of the trader is embedded in high-speed automated trading software. As with sensor grids, custom-tailored (e.g., mission-specific) engagement grids can be created from the resources available on the grid. The engagement grids exploit the battlespace awareness provided by the sensor grids to enable new operational capabilities for force employment.

An example of an existing operational architecture that employs network-centric operations to increase combat power is the U.S. Navy's Cooperative Engagement Capability (CEC). CEC networks the sensors, command and control, and shooters of the Carrier Battle Group's platforms to develop a sensor grid and an engagement grid. The mission-specific sensor grid generates a high level of battlespace awareness by fusing data from multiple sensors, enabling quantum improvements in track accuracy, continuity, and target identification over standalone sensors. The CEC engagement grid exploits this awareness by extending the battlespace, and engaging incoming targets in depth with multiple shooters.

The Information Superiority Chapter of the 1998 Joint Warfighting Science and Technology Plan [DDRE98] describes the composition of the information, sensor, and engagement grids as forming a C4ISR grid that supports DoD's concept of Information Superiority: the "degree of dominance in the information domain that permits the conduct of operations without effective opposition". The plan also identifies high-level functional capabilities required for Information Superiority, which of them are supported by the C4ISR grid, and key technologies the grid must support, including:

dynamic allocation of computing resources
fault avoidance and recovery mechanisms
tailored search and retrieval of information
multimode, multilingual interface services
automated mediators and database management system tools
massive data storage and manipulation
robust, adaptive, automated, context-based information distribution infrastructure
tools for projecting and visualizing C4ISR grid capabilities in terms of projected operational needs

Some DARPA activities also refer to grid concepts. For example, the Advanced Technology Architecture for Information Superiority (ATAIS) architecture [BFHH+98] uses "grid" in a number of places and senses (i.e., information grid, communication grid (in the sense of the Internet), and a full-scale computational grid).

2.4 Other Grid and Grid-like Concepts

[Tho98b] mentions other examples of grid-like concepts. Many of these concepts are somewhat high level as compared with those described in Sections 2.1-2.3.

The electrical power grid. The electrical power grid has already been mentioned as the analogy on which computational grids have been based. If viewed globally, it uses wires connected in some physical topology to carry electricity across the world delivering power to enable a rich variety of applications that need and will pay for the power received. At a slightly lower level of abstraction it is connected physically by wires and also by a collection of standards to provide a uniform quality of service, though standards vary in different locales. The power grid is an example of a physical grid.
The Transportation Grid. This is another physical grid. Transportation grids carry vehicles or materials. Examples are the national highway system, railroads, ships, planes, and pipelines. Each of these can be viewed as a subgrid of the Transportation Grid distinguished by its own infrastructure. The different grids have different properties (capacities, speeds, connection topologies), but they interoperate (or federate) at connection or bridge points to, e.g., allow transfers between planes and trucks.
Data Dissemination Grids - The DARPA Intelligent Integration of Information (I**3), Battlefield Data Dissemination (BADD) and Agile Information Control Environment (AICE) programs describe information architectures for connecting thousands of data sources to thousands of queriers across global networks. New infrastructure technologies needed include wrappers, caching, push-pull, and channels.
Geographic Grids - Paper and digital maps as well as GPS provide the infrastructure for a geo grid, in the sense that objects of interest are integrated by locating them with respect to common coordinate systems. The geographic information systems (GIS) community, including the National Imagery and Mapping Agency (NIMA), views such grids as an underpinning for command and control. They are also important in other domains, such as agriculture and real estate.
Supply Chains, Virtual Enterprises, and Simulation Architectures - While supply chains and virtual enterprises are not generally referred to as grids, they could be viewed as federations of a collection of organizations from the point of view of integrating the production and delivery of goods and services, the result being that the community of organizations acts like one large virtual organization or logistics grid. A concrete example is the DARPA Advanced Logistics Program (ALP), which is developing a software architecture consisting of a federation of clusters (agents) that wrap logistics organizations to rapidly define and support detailed logistics plans in heterogeneous environments. The Defense Modeling and Simulation Office (DMSO) High Level Architecture (HLA) similarly provides a federation architecture for connecting together simulations - the simulations share a common clock and send messages to each other using a common bus and content format; players can enter or leave the simulation dynamically.
Social and Cultural Networks - The following "grids" exist to hold our society together: physical laws, families, tribes, religion, rules, laws, unions, armies, government, common language, financial system, writing system, printing systems, home addresses, libraries, mail system, media, fast food, personal computers, VCR standards, etc. These grids exist simultaneously and interact (there is no overarching grid), each involves some notion of something being shared, and a support infrastructure. Some grids are decentralized (money) and depend on shared (cultural) assumptions as well as indexes of supply and demand. These things aren't very often referred to as "grids", but are certainly sometimes referred to as "networks" (e.g., you "network" among your acquaintances), which emphasizes the relationship between the concepts of "grid" and "network".
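
Of the examples above, the simulation federation is concrete enough to sketch: members share a clock, exchange messages in a common format over a common bus, and may join or leave at any time. The code below is only an illustration of those three properties. It is not the HLA interface; the `Federation` class, its methods, the member names, and the message contents are all hypothetical.

```python
class Federation:
    """Hypothetical federation: a shared clock plus a common message bus."""

    def __init__(self):
        self.time = 0      # common clock shared by all members
        self.members = {}  # member name -> callback(time, sender, content)

    def join(self, name, on_message):
        self.members[name] = on_message

    def leave(self, name):
        self.members.pop(name, None)

    def advance(self, dt=1):
        self.time += dt

    def publish(self, sender, content):
        # Deliver to every current member except the sender.
        for name, callback in list(self.members.items()):
            if name != sender:
                callback(self.time, sender, content)


log = []
fed = Federation()
fed.join("radar-sim", lambda t, s, c: log.append(("radar-sim", t, s, c)))
fed.join("traffic-sim", lambda t, s, c: log.append(("traffic-sim", t, s, c)))
fed.advance()
fed.publish("traffic-sim", {"type": "position", "x": 3})
fed.leave("radar-sim")  # a player leaves dynamically
fed.publish("traffic-sim", {"type": "position", "x": 4})
```

After this run, `log` holds a single entry, received by `radar-sim` at time 1; the second message reaches no one because the only other member has already left.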

3. The CoABS Agent Grid

DARPA ISO's Control of Agent-Based Systems (CoABS) program is exploring a new kind of grid called the agent grid. Here, the grid concept is applied to agents, since a key "vision" of the program is the concept of a grid as a means of making agent-based systems more interoperable and pervasive. The agent grid can be described by requirements at two levels: application and functional.

3.1 Application Level Requirements

Application level requirements describe benefits that applications receive by using a grid. At the application level, the agent grid is defined as an enabling technology needed for command and control as a main ingredient in supporting DoD's Information Superiority concept. The notion of the agent grid is important in DoD as one of the architectural constructs that might make a variety of command and control systems easier to build, maintain, scale, evolve, adapt, and survive. These systems are characterized by:

multi-year lifetimes and evolving and changing requirements
more components than any group of designers can design or even understand
design by groups that do not know about each other
adaptable and scalable to large or small sizes
systems management without explicitly monitoring all components all the time (there are too many components to do this)

From the DoD perspective, agent technology is expected to help:

reduce the 60% of time in command and control systems spent manipulating stovepipe systems, and incrementally replace stovepipes with more reliable, scalable, survivable, evolvable, adaptable systems
make it much easier to snap together future systems to meet flexible needs in uncertain environments
connect the $40B worth of DoD equipment that currently only interoperates with one or two other components, permitting better situation assessment, resource sharing, and logistics support
reduce system complexity
help solve data blizzard and information starvation problems

Proposed characteristics of the agent grid are described in [CoA98], including:

It is a distributed electronic environment.
When agents enter the grid, they receive status information, and their activities are modified and integrated with other activities in the grid. E.g., "When your personal assistant connects to the grid, it tells the grid where you are, what you are doing, how your resources are configured, which supplies you need, and so on."
It encompasses both computer and other resources, and allows them to be used by other agents. "Your forces might be dynamically reassigned to a new plan; your computer equipment, currently underutilized, might briefly be recruited to run a meteorological simulation by a load-balancing agent; due to your personal expertise in Arabic, you might receive documents to translate, or perhaps not if the grid realizes that your time is already claimed by other responsibilities."
"All resources - mental and material, human and non-human, permanent and ephemeral - are balanced by the grid. Goals are reconciled by agents in the grid and priorities are established."
"Whatever kind of agent you are [including both humans and pieces of equipment] when you enter the grid, you immediately become part of a larger, coherent system. And when you leave the grid, to travel, sleep, or shut out the hubbub for a while, the grid prepares for your return by generating status reports, reading and summarizing your mail, planning how to use your resources, and so on."
It provides the ability to easily connect heterogeneous components together to form coherent aggregates.

Further discussions of CoABS grid ideas are provided in [Ket98,99; Tho98b; Pis98b]. Some additional characteristics expected from an agent grid are:

humans and agents can connect to the grid anytime from anywhere and get the information and capability they need
it scales to millions of agents, so agents are pervasive and information and computation are not restricted to machine or organization boundaries
it provides/supports agents that act for users and interact with them, wrap data sources, filter information, plan and execute tasks, and coordinate with other agents
it enables teams led by humans and staffed by agents
it supports dynamic configuration, reorganization, and adaptability of associated resources (software components, applications, data, agents, etc.) to solve a variety of C4ISR problems

3.2 Functional Requirements

From a functional point of view, the CoABS grid application-level characteristics suggest that the agent grid knows not only about agents, but about their computational requirements (e.g., how they can be broken up into processes, so they can be distributed across multiple computers), and about available computational (and other) resources. Hence, the agent grid appears to incorporate both the concepts of Computational Grid and the Computing Fabric, in the sense of providing a unified, heterogeneous distributed computing environment in which computing resources are seamlessly linked. In addition, the agent grid extends the idea "upward" to agents. These agents play the roles of applications whose computations can be distributed within this distributed computing environment, resources that can be used within this environment, and infrastructure components of this environment. At the same time, there appears to be an interface between the computational and agent layers, so that at least some agents, e.g., those that do load balancing, can operate on the computing level grid. Furthermore, since the agent grid is defined as encompassing other resources (e.g., forces), agent grid ideas are also consistent with aspects of the DoD grid concepts. For example, agents are explicitly mentioned as components in parts of these DoD grid concepts, and agents could well be the implementations of choice for many of the applications incorporated in these grids. Agents could also serve as wrappers of resources (and mediators between them) in these architectures. The assumption is that "agent technology" (viewed broadly) provides mechanisms for late binding, reconfiguration, load balancing optimizations, achieving and maintaining systemic properties like survivability and scalability, and coordinating teams and organizations.

Building the grid suggested by these requirements would appear to involve all the computational grid issues of system management (and associated metadata), distributed computation and load balancing, mobile code, security, etc., as well as the "agent-level" versions of those issues (issues which, e.g., reflect the semantics of the entities on whose behalf the agents are functioning, or the resources which the agents wrap). This requires a way of describing resources and capabilities, and resource requirements and tasks, and a way to map between them, at both agent and computational levels. This grid also apparently involves the need for a way of defining higher-level goals, i.e., a way to define the goals of the grid itself, that are optimized by the load balancing, etc. that is going on. These goals are presumably at a higher level than those of individual agents (although these might also be characterized as the goals of higher-level agents, or agents of higher authority, rather than goals of the grid per se).

All this suggests that one view of the CoABS grid could be that of a framework which combines the capabilities of a computational grid and those of an Agent System architecture. This would mean that it would need to incorporate services provided by agent system architectures, such as communications, lifecycle services, ACL and knowledge representation, transactions, metering and charging, matchmaking/facilitating/negotiation, security, persistence facilities, system management, and mobility (see, e.g., [Pis98a; KT98; Paz98a,b; Tho98a]). The result would resemble a form of "agentized" Object Services Architecture (OSA), that is, an agent bus architecture similar to the OMG Object Management Architecture. This in turn raises a number of issues, such as:

Is an OSA a good GSA (Grid Services Architecture)? a good start toward one?
What are the characteristics of a good grid service?
Do object services in an OSA have these characteristics "out of the box"?
Is a GSA component necessarily expected to be "smarter" compared to an OSA component? why, and in what ways?

At the same time, other views of an agent grid are also possible. For example, we might note that there are many individual agent systems (and many notions of agent) today, each consisting of a collection of agents and an agent biosphere (shared support services and resources). Hence, we might argue that the grid should be something that helps us connect these together so they will interoperate, possibly allowing agents to leave one system/biosphere and go to another, or leave one grid and go to another. For example, an agent grid might be a federation of agent systems which allows some sharing and interoperation (but how much?), or a meta agent system that generalizes agent systems.

In addition, the agent grid is likely to depend not only on computational grids, but also on other "lower-level" grids or similar unifying technology such as Web technology, distributed object systems, etc. The CoABS program assumes that these other kinds of technology (distributed objects, simulation, network management, ...) are useful and will be needed, in addition to agent technology. Hence, whatever we mean by agent technology, it must co-exist with other useful and already pervasive technology. This suggests the view that the agent grid might be considered as, in some sense, a technology layer that enables other grids. That this might be the case can be seen in the grids described earlier - in each case we can imagine agent technology adding value to these grids. If that is so, then there are several challenges including:

understanding what constitutes the agent technology grid (as an abstraction layer)
understanding how the agent technology grid complements related technology
demonstrating that higher level grids are enabled by the agent technology grid.

3.3 Initial Progress

At this point, work on the DARPA CoABS project focused specifically on developing grid system ideas and implementations includes:

[Paz98d] examines Sun's Jini as a potential grid infrastructure component. Potential advantages are Java run-anywhere portability, possible Jini pervasiveness, source code availability (with license), and availability of a starter kit of services. Initial examples show Jini running in embedded systems like hotel rooms, TVs, printers, etc. so there is a presumption that Jini frameworks can be used to model complex systems.
[Pis98a] focuses on the namespace and trader aspects of a global grid. It mainly views the grid as a namespace and registry that must scale.
[Ket98] takes a use case view of the grid and generates a grid abstract machine that takes the view of the grid as similar to an agent system and consisting of a backplane registry of events that describe agent, service, and resource operations and status. The paper is insightful in that it captures the system view of a grid (sense B) while trying to make minimal assumptions. It does not commit to any control tradeoff between agents and the grid and so is underspecified (by design). By itself the paper does not identify or answer many of the grid issues raised later in this paper, but it provides a framework for doing so more precisely.
[Ket99] considers the entire grid vision, DoD motivations, the architecture, and some architectural issues. "The Grid is not meant to provide a uniform agent architecture that all components must adhere to, but rather a bridge between agent (and other component) architectures, allowing interoperability across these architectures but not replacing all of the services provided by these infrastructures." The paper also contains a description of the grid as a particular collection of incrementally evolving implemented services including:

Access Framework - grid access mechanism for message handling and ACL translation
Directory - white + yellow pages manager
Logging - message log manager
Visualization - grid activity and status manager
Brokerage - recruitment and mediation manager
Translation - ACL translation to/from KQML and FIPA ACL (TBD)

[TBPV99] does not describe a grid as a single system but rather as a collection of separately useful component subsystems which can interoperate. One of the subsystems is a WebTrader that uses Web search services to index XML service advertisements contained in Web documents. Using this approach the WebTrader inherits industrial strength, scalability and pervasiveness to support trading, an assumed grid service. Another subsystem is Tao, an agent system that uses email to transport FIPA ACL messages encoded in XML between agents. Like WebTrader, Tao inherits advantages of an existing infrastructure (e.g., support for disconnected operation, firewalls, and security) to provide a scalable and potentially pervasive agent communication bus. This work takes a componentized view of grids - rather than assuming a grid is a fixed collection or minimal set of infrastructure capabilities, this view assumes that gridness is the goal, and individual components can be viewed as providing improved grid infrastructure capabilities.
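As a rough illustration of the Tao idea of carrying XML-encoded ACL messages over email, the following sketch builds a small XML message and wraps it in a mail message using only the Python standard library. It is not the Tao implementation: the element and attribute names (acl-message, performative, sender, receiver, content), the function names, and the example.org addresses are all assumptions made for this example, and no actual FIPA XML encoding is reproduced here.

```python
import xml.etree.ElementTree as ET
from email.message import EmailMessage


def encode_acl(performative, sender, receiver, content):
    """Encode an ACL-style message as XML (hypothetical element names)."""
    root = ET.Element("acl-message", {"performative": performative})
    ET.SubElement(root, "sender").text = sender
    ET.SubElement(root, "receiver").text = receiver
    ET.SubElement(root, "content").text = content
    return ET.tostring(root, encoding="unicode")


def wrap_in_email(xml_text, from_addr, to_addr):
    """Place the XML text in the body of an email message (text/xml)."""
    msg = EmailMessage()
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg["Subject"] = "ACL message"
    msg.set_content(xml_text, subtype="xml")
    return msg


def decode_acl(msg):
    """Recover the ACL fields from an email built by wrap_in_email."""
    root = ET.fromstring(msg.get_content())
    return {
        "performative": root.get("performative"),
        "sender": root.findtext("sender"),
        "receiver": root.findtext("receiver"),
        "content": root.findtext("content"),
    }
```

Because the message is an ordinary email, it could in principle be queued, relayed through firewalls, and delivered to a disconnected recipient later, which is the kind of inherited advantage the paragraph above describes.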

The following section describes various ways of viewing what an agent grid is, and describes the associated issues that must be addressed in determining the characteristics of an agent grid.

4. Issues in Defining the Agent Grid

From the examples of grids in earlier sections, we can begin to identify aspects, properties, and characteristics of grids to see what would be desirable in an agent grid (since we get to define that term, and in possibly multiple ways). These observations might begin to be the basis for architectural design principles and patterns for agent-based grids. At the very least they add to our vocabulary about grids, creating terms for the "grid ontology" which can be mined for potential requirements and use cases to consider in defining the grid architecture. In this section, we first describe some problems with defining the concept of a "grid". We then attempt to synthesize aspects of the grid concepts already described, in order to identify common characteristics of grids. We then focus on the specific characteristics of agent grids.

4.1 The Problem of Defining "Agent" and "Grid"

Attempts to come up with a precise definition of "grid" run into difficulties similar to those found in trying to come up with a precise definition of "agent" (similar definitional difficulties have surrounded the word "object", although these have been to some extent reduced by its operational definition in various object systems). Since we are trying to characterize the agent grid, we need to consider these difficulties in defining both terms.

[Bra97b] observes that attempts to define the term "agent" have taken two approaches: ascription and description. Definition by ascription recognizes the fact that, while there is often little commonality among the details of various "agent" concepts, they all have a "family resemblance". This leads to the idea that "agent-ness is in the eye of the beholder". In other words, definition by ascription says that agent-ness "cannot ultimately be characterized by listing a collection of attributes, but rather consists fundamentally as an attribution on the part of some person" [VV95]. As [Bra97b] notes, "This insight helps us understand why coming up with a once-and-for-all definition of agenthood is so difficult: one person's 'intelligent agent' is another person's 'smart object'; and today's 'smart object' is tomorrow's 'dumb program'."

The problem with ascription is that it allows practically anything to be described as an agent, making communication about agent concepts difficult among people who do not share the same point of view. A useful "filter" for using "agent" to describe a piece of software is that it should be useful to do so; that is, calling something an agent should in some useful sense distinguish it from concepts we already understand. For example, [Bra97b] quotes [Sho93] as observing:

"It is perfectly coherent to treat a light switch as a (very cooperative) agent with the capability of transmitting current at will, who invariably transmits current when it believes that we want it transmitted and not otherwise; flicking the switch is simply our way of communicating our desires. However, while this is a coherent view, it does not buy us anything, since we essentially understand the mechanism sufficiently to have a simpler, mechanistic description of its behavior."

A descriptive definition of an agent, on the other hand, typically involves a set of attributes, which a given agent might have to a greater or lesser extent. Numerous sets of such attributes exist [Bra97b], and there is much discussion about which attributes best characterize agents. In the course of the next few years we must tease these (possibly orthogonal) attributes apart and understand what each technology is adding to the picture, especially if we want a large body of industry and DoD to adopt this next generation technology.

For the purposes of this paper, we can use as a working assumption that agents are (some of):

autonomous - agents are proactive, goal directed and act on their own performing tasks on your behalf
adaptive - agents dynamically adapt to and learn about their environment. They are adaptive to uncertainty and change.
mobile - agents move to where they are needed
cooperative - agents coordinate and negotiate to achieve common goals. They are self-organizing and can delegate.
interactive - agents interoperate with humans, other agents, legacy systems, and information sources
social - they work together in communities, may have personality.

Moreover, we assume that agents are objects or components, in the sense that agents have identity (you can tell one agent from another), they have their own state and behavior (distinct from that of other agents), and they have interfaces by/through which they communicate with each other and with "other things". For example, [Bra97b] refers to agents as "objects with an attitude" in this sense. Here, we are using objects as a generic modeling or abstraction mechanism, independently of whether agents are implemented as objects (using object-oriented programming techniques). Object interfaces can encapsulate "smart things", e.g., more or less smart agents, and human beings; for example, "fmanola@objs.com" is the identifier of an interface to which messages can be sent. Object interfaces can also encapsulate "dumb things", e.g., conventional software objects, or dogs ("on the Internet, no one knows you're a dog"). In some cases the messaging protocols between these various kinds of objects will be relatively simple (e.g., conventional object RPC between distributed software objects, or commands sent to hardware), while in other cases they will be more complicated (agent communication language (ACL) sent between agents, or the email flow between people); however, similar abstraction principles can apply to objects at all levels.

One way to unify the ascriptive and descriptive view of agents is to view the maximal agent as potentially having all behaviors found in the agent attributes list, and that degenerate forms of agents are those containing fewer than all properties. In this view, objects are agents without these extra agent attributes. This helps explain how agents might literally be "objects with an attitude."
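The unified view above can be made concrete with a small sketch in which every entity is an object with identity, and "agent-ness" is simply the set of working attributes it holds. The names Trait, Entity, ALL_TRAITS, and the methods below are hypothetical and introduced only for illustration; the attribute list is the working list given earlier in this section.

```python
from enum import Flag, auto


class Trait(Flag):
    """The working agent attributes listed in this section."""
    NONE = 0
    AUTONOMOUS = auto()
    ADAPTIVE = auto()
    MOBILE = auto()
    COOPERATIVE = auto()
    INTERACTIVE = auto()
    SOCIAL = auto()


# Explicit tuple of the individual attributes (excludes NONE).
ALL_TRAITS = (Trait.AUTONOMOUS, Trait.ADAPTIVE, Trait.MOBILE,
              Trait.COOPERATIVE, Trait.INTERACTIVE, Trait.SOCIAL)

# The "maximal agent" holds every attribute.
MAXIMAL = Trait.NONE
for _trait in ALL_TRAITS:
    MAXIMAL |= _trait


class Entity:
    """An object with identity and a (possibly empty) set of agent traits."""

    def __init__(self, identifier, traits=Trait.NONE):
        self.identifier = identifier
        self.traits = traits

    def trait_count(self):
        # Number of attributes held: one rough measure of "agent-ness".
        return sum(1 for trait in ALL_TRAITS if trait in self.traits)

    def is_plain_object(self):
        # A plain object is the degenerate agent with no extra attributes.
        return self.trait_count() == 0

    def is_maximal_agent(self):
        return self.trait_count() == len(ALL_TRAITS)
```

In this sketch a conventional software object and a maximal agent are the two ends of one scale, and most practical agents fall somewhere in between.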

A similar situation regarding ascriptive and descriptive definitions exists in attempting to precisely define "grid" and, in particular, "agent grid". We can get a general idea of what "gridness" is from the "family resemblance" of the grid examples presented earlier, and try to apply that general idea to the concept of an "agent grid". Sets of grid attributes which could be used in a descriptive definition of a grid are presented in [FK99a]. In addition, if we were to consider an agent grid as a generalization of an agent system architecture, the list of grid services could be used as descriptive attributes of agent grids, together with sets of attributes given in [HS98]. However, whether an agent grid is a form of agent system architecture is precisely one of the definitional issues that must be considered.

The definitions of "grid" found in dictionaries generally imply some concept of a "network" or "mesh". This is certainly the generic idea of a grid. Many things can be referred to as "grids" in this simple sense, including, in the context of computer systems, the Internet, the Web, or the objects in a CORBA-based distributed object system (which form an interconnected network by virtue of the references the objects have to each other). However, the grid concepts described earlier imply additional requirements, a stronger cohesiveness. If we are going to use the term "grid" in a computer context, the example of "mis-ascription" cited above becomes relevant: in the same sense that it buys us nothing to refer to a light switch as an "agent", it buys us nothing to refer to the Web as a "grid", even if it might be technically accurate to do so. In other words, if we are going to use a new term such as "grid" to describe particular computer-based systems, it will be helpful to explicitly identify the properties we want to associate with those systems that distinguish them from computer-based systems we are already familiar with (such as the Internet, the Web, distributed object systems, etc.), and for which we already have other names. Things get even more complex when agents are included, since we understand less about agents at this point than about some of these other technologies.

In the sections that follow, we describe some basic ideas for use in characterizing the concept of an agent grid. We first discuss some general attributes that seem to apply to computer-related grids in general. We then discuss agent grids in particular.

4.2 General Characteristics of Grids

By looking at the "family resemblance" of the grid concepts, we can say that a grid is fundamentally a mechanism or concept for integrating or sharing physical or logical things which can be considered as a single unit. In considering an integrating mechanism, it is useful to focus on:

what things are being integrated
what integrating those things means

We can think of the things (resources) to be integrated in a computer-related grid as (in a rough order of increasing "semantic complexity"):

computation, as in the computational grids of Section 2.1
data and information, as in the "information grid" of Section 2.3
software
agents ("smarter" software), as in the CoABS grid
people (e.g., as users of these systems)

Integrating these things involves:

the ability to link resources with/into the grid via some kind of interconnection mechanism (and where the interconnection mechanism is itself considered as a collection of resources); this is the basic "network" characteristic of any grid
the ability for any grid participant to use any of these resources ("local" or "non-local") to perform some task
the ability to compose grid resources to form new combined resources that can be used in the same way as the individual resources (including treating the entire grid as a resource if necessary)

These, in turn, involve a number of more detailed, but still general, capabilities, including:

the ability for the grid to maintain state information about itself - this involves static metadata (e.g., database schemas) for describing available resources, dynamic metadata for describing the current status of resources (for managing resource usage), as well as any state information that defines the resources themselves (e.g., program code, database data); moreover, as a general rule, the more "grid-like" a system is, the richer this metadata and other descriptive information must be.
the ability for the grid to discover the existence of new resources as they connect to the grid (and add their resource descriptions to its state)--this involves more than simply white/yellow pages services for recording the existence of resources, but also includes well-defined mechanisms, such as those in Jini <http://java.sun.com/products/jini/>, for resources to announce their arrival as grid members, and for grid services to take notice.
the ability for the grid to match resources with requests (e.g., trader/broker services), and use the resources to satisfy the requests.
the ability for the grid to form resource aggregates (groupings of resources) that match "aggregate" requests--this is more straightforward if requests and resources are always defined using the same "units" (e.g., computer cycle or memory requirements) than at higher levels, where requirements involve very different types of resources, or the need to perform human-level tasks.
the ability for the grid to monitor (and hopefully optimize) the use of resources being provided--this includes such things as load balancing, query optimization, or maintenance of quality of service, and typically requires the use of information at several semantic levels and mappings between them (e.g., to manage quality of service of a video presentation in terms of adjusting network bandwidth [Man99b]).
the ability for the grid to deal with resources entering or leaving the grid (or temporarily becoming unavailable).
the ability for the grid to do these things autonomously (and automatically) to a significant extent, i.e., without a great deal of manual intervention. Also, the ability of the grid to provide a system management interface so humans or agents can conveniently monitor it and possibly manually intervene to control it.
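Several of the capabilities listed above (maintaining static and dynamic metadata, noticing arrivals and departures, matching requests to resources, and forming aggregates) can be sketched in a few lines. This is a toy model under strong assumptions, not a proposed grid design: the Resource and Grid classes and their method names are hypothetical, all resources of a kind are measured in the same units, and aggregation is a simple greedy grouping.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    kind: str          # static metadata, e.g. "cpu" or "translator"
    capacity: int      # static metadata, in units appropriate to `kind`
    in_use: int = 0    # dynamic metadata: current load

    def free(self):
        return self.capacity - self.in_use


class Grid:
    def __init__(self):
        self.resources = {}   # grid state: name -> Resource
        self.listeners = []   # callbacks told about arrivals and departures

    def subscribe(self, callback):
        self.listeners.append(callback)

    def join(self, resource):
        """A resource announces its arrival; the grid records it and notifies."""
        self.resources[resource.name] = resource
        for callback in self.listeners:
            callback("joined", resource.name)

    def leave(self, name):
        """Remove a departing resource and notify listeners if it was known."""
        if self.resources.pop(name, None) is not None:
            for callback in self.listeners:
                callback("left", name)

    def match(self, kind, amount):
        """Return the single resource of `kind` with the most free capacity
        that can satisfy the request alone, or None."""
        candidates = [r for r in self.resources.values()
                      if r.kind == kind and r.free() >= amount]
        return max(candidates, key=Resource.free, default=None)

    def aggregate(self, kind, amount):
        """Greedily group resources of one kind until their combined free
        capacity covers `amount`; return [] if that is not possible."""
        chosen, total = [], 0
        pool = sorted((r for r in self.resources.values() if r.kind == kind),
                      key=Resource.free, reverse=True)
        for resource in pool:
            if total >= amount:
                break
            if resource.free() > 0:
                chosen.append(resource)
                total += resource.free()
        return chosen if total >= amount else []
```

The sketch deliberately covers only the easy case noted above, in which requests and resources share the same units; matching at higher semantic levels (e.g., finding a person who can translate a document) would need much richer descriptions than a kind and a number.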

A number of observations can be made in connection with the points made so far:

"Gridness" can be thought of as a continuum. At one end, there is the simple interconnection or network of resources, as in the dictionary definitions of "grid". We can think of such a network as a "loose grid", if we must use the term "grid" for these networks at all. At the other end, there are the systems that allow the interconnected resources to function as well-integrated units, as in the grid concepts described in Section 2 (particularly the DoD concepts described in Section 2.3, and the CoABS grid as described in [CoA98]). We can refer to these systems, which exhibit the characteristics described above (and possibly other defining characteristics not yet identified) in the strongest sense, as "true grids". This "true grid" endpoint of the "gridness" continuum is, of course, an arbitrary designation. Systems will exist at various points along the continuum, becoming "stronger grids" as they exhibit these "gridness" characteristics to a greater extent.

A key aspect of grids is composition of resources in a sense that goes beyond simply interconnecting them (although interconnection is clearly required). The compositional facilities provided by grids can apply at all levels, including hardware/computational power, data and software (software including both individual components and services, and composition including such things as interoperability and formation of aggregates), and agents and people (e.g., formation of communities and teams). These compositions of resources are applied to "composed tasks" (i.e., tasks that go beyond separately accessing or invoking the individual resources): in the transportation grid, the composed task is generically "provide access to resources"; in the power grid, it's "provide power"; in computers, it's presumably "provide computation" (or, more abstractly, "perform service/task"). At the agent/human level, the tasks are suitably abstract (e.g., as "translate document" is enabled by the CoABS Grid knowing that there's a connected person who understands Arabic). Ideally, we want these compositions to exhibit a fractal property or, looking at them the other way around, we want composition to exhibit a closure property. This means that the resource compositions should have characteristics that are similar to those of individual resources at the same level of abstraction, so that we can treat the compositions as resources themselves. For example, the computational grid seamlessly forms a large virtual computer from individual computers in a network, forming something that looks like yet another computer (which itself could be further aggregated). Similarly, relational database theory emphasizes the idea that operations on data such as joins should exhibit a closure property, permitting newly formed aggregates of data to be operated on in the same way as the pieces from which they were formed. A similar idea can apply to agents. 
It should be possible to form teams or communities of agents that are interacted with as if they were single agents, with the group transparently dividing up any resulting work that has to be done. Grids also tend to emphasize the dynamic aspects of composition, i.e., that it should be possible to easily form compositions of resources, then break them up when the resources are no longer needed, for recomposition elsewhere.
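The closure property described in this observation can be illustrated with a minimal sketch in which a team of agents exposes the same interface as a single agent, so that teams can themselves be members of larger teams. The Agent and Team classes, the perform method, and the round-robin division of work are assumptions chosen for brevity; a real grid would presumably divide work according to capabilities and load.

```python
class Agent:
    """A single agent that handles whatever tasks it is given."""

    def __init__(self, name):
        self.name = name

    def perform(self, tasks):
        # Return a mapping of task -> name of the agent that handled it.
        return {task: self.name for task in tasks}


class Team:
    """A composition of agents (or teams) with the same interface as Agent."""

    def __init__(self, name, members):
        self.name = name
        self.members = list(members)
        if not self.members:
            raise ValueError("a team needs at least one member")

    def perform(self, tasks):
        # Divide the work round-robin among members; callers never see the
        # split, so the team can be used wherever a single agent could be.
        tasks = list(tasks)
        results = {}
        for index, member in enumerate(self.members):
            share = tasks[index::len(self.members)]
            results.update(member.perform(share))
        return results
```

Because Team and Agent share one interface, a caller can hand work to a nested team exactly as it would to an individual agent, and a team can be discarded and its members regrouped elsewhere without changing the callers.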

Since grids tend to be thought of as "units", grids tend to involve some level of unified (but not necessarily centralized) management or government. The government consists of rules, policies, and mechanisms. Grid government might come in many forms (central authority, de-centralized, democratic, etc.). An interesting grid necessarily implies a certain amount of autonomous behavior of units of the grid, in some cases to the extent that the grid's properties can be emergent, "emergent" being defined as "stable macroscopic patterns arising from the local interaction of agents." Part of this government is the grid's economy, both because the grid's activities and use of the grid generally cost something, and because of the need to allocate resources. In considering grid government, care is needed to match the level of abstraction of the management with the level of abstraction of the grid. For example, the Internet has a certain amount of load management at the network level, but this does not make it a computational grid, even though it does connect numerous computers. A level of management at the computational level would be required for that.

Partly as a corollary of a grid being both a unit, and having compositional properties, there may be more than one grid. For example, there can be local and global grids, or a hierarchy of grids (not just global and local), and larger grids can be constructed by composing smaller (perhaps local) grids. It may be possible for local grids to operate independently from larger grids of which they may temporarily be a part. Local grids can provide heterogeneous enclaves where different standards, policies, or systemic properties (adaptability, scalability, security, reliability, etc.) hold. This also implies that a grid has a boundary.

Whether the composition of resources involves movement of the resources depends on the kind of grid and its applications (there is invariably movement of some sort, but not necessarily of the resources). For example, the composition of resources in a transportation grid necessarily involves moving those resources from where they are to where they are needed. In a computational grid, the resources are generically "computational capacity". In conventional computer networks, the capacity itself doesn't move; instead, the load is moved. However, specific groupings of capacity ("virtual capacity") can seem to move as sharing arrangements and interconnections are set up and torn down (as in the case of the Computing Fabric of Section 2.2). Data is moved in a computer network in the same way that resources are moved on a transportation grid. In the case of distributed object systems, there can be either movement of load alone (e.g., in CORBA systems, where objects are static, messages representing load are sent to them, and messages representing results are returned), movement of resources (in the case of Java objects), or both (e.g., even in a Java-based network, some services, or special purpose devices such as sensors, may not be able to move). Similar considerations apply to agent systems.

Grids involve participants that provide to the grid as well as those that are benefited or enabled by the grid (grid users). There is a great deal of asymmetry in some grid-related technologies that sometimes must be dealt with in order to build "true grids" from these technologies. For example, it is straightforward to think of connecting personal computers to the Internet in order to access information. It is less straightforward to think of these personal computers as being part of the Internet in the sense of having their file systems and computational facilities fully integrated with the Internet in order to form a computational grid in the sense of Section 2.1. To do this, additional technical issues must be addressed (e.g., security). From another point of view, it is generally more straightforward to integrate data than it is computational capabilities. Typically this is because (a) the interfaces (for others to gain access to attached computing resources) are not as well developed as they are for data, and (b) the mechanisms for effectively using the added computation are not as well developed either (e.g., in a local network it may be possible to run an application located on someone else's machine, but it is not as easy to distribute a computation over several machines).

The relationship between a grid in a "loose" sense and a grid in the stronger sense of Section 2 is generally that the "loose" grid is or can be used as part of an organization that constitutes a "true grid". Finding the actual grid may sometimes require considering a wider context, or adding additional technology. For example, the transport grid (or a subset, like "the railroad grid") may be viewed as just the network of transport connections and the points connected. However, this grid was created in the context of higher-level desires by people to move/share resources (food and other goods). It is the unification of the transport links, together with the higher-level control mechanisms (and to some extent the economic system that provides the "tasking") that creates a grid in the stronger sense. The Internet is another example. At one level, the Internet may be thought of as a loose form of grid, because it provides network connectivity among multiple computers. However, considering it a grid in a stronger sense requires additional technology. For example, [ABIS96] notes that while the Internet might be a model for the ABIS information grid, it lacks attributes such as security, and resource allocation based on (mission) priority, needed to support their idea of a grid. Considering a wider context can also identify relationships between the Internet and a stronger grid concept. For example, via Internet email, it is possible for people to organize collaborative efforts, integrating the activities of widely-scattered people. This does not mean that the Internet, or Internet email, by itself, constitutes a grid. However, considering the connected people as part of the "system" enables that system to be thought of more realistically as a grid, with the Internet as a part, and with higher-level organizational strategy and goals being provided by the people involved. 
Similarly, distributed computer networks are at the heart of the computational grids described in Section 2.1, but additional mechanisms must be added to those networks in order to form grids in the stronger sense. Expanding the context can help us see both the grid that was intended, and also what additional components and mechanisms would be necessary to form a "true grid". This suggests that we might want to look at technologies, such as the Web and distributed object systems, that clearly exhibit certain characteristics that we associate with grids. However, we want to look at them not as grids in the fullest sense, but as "proto-grids", and then look carefully for the additional technologies that could be added to them to create grids in the stronger sense, as a way of pinning down what a "true grid" really is. In addition, we want to look at how a grid at one level may be enabled (implemented) on top of a grid at another level (as, e.g., a supply chain grid might be implemented using Web connectivity and/or by email, fax, snail mail, ...), and at the mappings and bridges/adapters required between them.

The example of viewing email plus people and organizational goals as a grid raises an additional openness issue. Here the grid does not have closed system boundaries, and its enabling mechanisms (email) might be used for many purposes, not just to further organizational goals. So (open) grid enabling technology may be as important to identify as closed "true grid" technology. A corollary to grid openness is that a "true grid" can coexist with loose grids. For instance, a computer can run conventional programs and, at the same time, participate in a true grid that shares some of its resources among a community.

Finally, as stated in the final bullet above, "gridness" seems to imply that the system is "aware of itself" to a certain extent, and has the ability to carry out its tasks "itself", without a great deal of manual intervention. For example, any interconnected group of distributed computers could be used as a much larger "virtual computer" by employing programmers to cope with all of the distributed programming and other problems necessary to use these resources in specific applications. That does not mean that this set of distributed computers by itself is a grid. What differentiates a computational grid is the fact that the grid itself provides services over and above the computers and network to help support the "virtual computer" abstraction (possibly to a greater or lesser extent), and alleviate at least some of the detailed programming that would otherwise be necessary. Similar comments apply to grids at other levels.

4.3 Viewing the Agent Grid from Different Perspectives

We now turn to the agent grid as a special case of the general idea of a grid. In attempting to characterize the "agent grid" more precisely, we need to recognize that the term can be used to refer to different things, and convey different perspectives (although these perspectives are related). Each of these perspectives involves resolving different issues about the definition in order to pin it down more precisely.

4.3.1 Viewing the Agent Grid as a Collection of Agent-related Mechanisms and Protocols

One view of the agent grid (sense A) is that it refers to a collection of agent-related mechanisms, constructs (interfaces, protocols, ...), and universal assumptions that augment object and other technology to increase the ability to create dynamically composable systems. Many of these mechanisms can be independently studied and could be standardized so that communities that used these standard mechanisms could more easily develop interoperable "agent-based" capabilities. For example, a thesis of the DARPA CoABS program is that agent technology will provide some missing ingredients in an evolving list of architectural mechanisms that will make software composition much easier in the future. Examples of these agent-related mechanisms would (somewhat redundantly) include: trading, matchmaking, facilitating, advertising, negotiating, brokering, yellow pages, constraints, rules, inference, planning, schedulers, control algorithms, ontologies, agent communication languages, agent system frameworks, conversational protocols, mobility frameworks, models of teams, models of markets, models of trust, user models, learning algorithms, and models of exceptions.

However, while these mechanisms may be "agent-related", many of them have been used in or derive from non-agent systems (e.g., such systems have employed trading and brokering; the limited forms of some "ontologies" are roughly equivalent to other metadata mechanisms such as database schemas and Web metadata mechanisms). Moreover, there is a great deal of "not-strictly-agent-related" technology that is also useful in building composable software systems or information access frameworks, such as component and object distribution and mobility frameworks, messaging services, event monitoring, catalog and directory services, publish and subscribe mechanisms, security mechanisms, transactions, persistence, query facilities, load balancing, etc.

Moreover, within this general view there are several subviews. On the one hand, the agent grid may simply be a name for an (unstructured) agent technology layer consisting of mechanisms and services (sense A1). Another subview is that the agent grid is a specific construct or mechanism within that layer for making services and resources available (sense A2) (in which case it belongs in the set of agent technologies above). Within this view, the grid may have the narrow purpose of just being a registry mechanism (sense A2a) for tracking which agents, services, and resources are assigned to an agent system and monitoring key events associated with these. Alternatively, the grid may be a backplane or organizing framework itself into which all the agent and non-agent services and mechanisms plug (sense A2b, which is closely related to sense B below). In this view, the agent grid refers not only to a collection of technologies, but also to a particular framework or organization of the technologies, similar to the way OMG's OMA organizes object technologies.
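The narrow registry reading (sense A2a) can be illustrated with a minimal sketch. It assumes a registry that only tracks named members by kind and notifies subscribers of registration events. The class name GridRegistry, the event names, and the member names are hypothetical and are not taken from any CoABS design.

```python
from collections import defaultdict


class GridRegistry:
    """Hypothetical sense-A2a grid: a registry plus event monitoring, nothing more."""

    def __init__(self):
        self._entries = {}                    # member name -> kind
        self._listeners = defaultdict(list)   # event type -> callbacks

    def subscribe(self, event_type, callback):
        """Ask to be called back as callback(event_type, name)."""
        self._listeners[event_type].append(callback)

    def _emit(self, event_type, name):
        for callback in self._listeners[event_type]:
            callback(event_type, name)

    def register(self, name, kind):
        """Record that an agent, service, or resource has joined."""
        self._entries[name] = kind
        self._emit("registered", name)

    def deregister(self, name):
        """Remove a member; emit an event only if it was actually present."""
        if self._entries.pop(name, None) is not None:
            self._emit("deregistered", name)

    def members(self, kind):
        """Return the sorted names of all members of the given kind."""
        return sorted(n for n, k in self._entries.items() if k == kind)


# Usage: a monitor records the key events as members come and go.
event_log = []
registry = GridRegistry()
registry.subscribe("registered", lambda event, name: event_log.append((event, name)))
registry.subscribe("deregistered", lambda event, name: event_log.append((event, name)))

registry.register("route-planner", "agent")
registry.register("map-service", "service")
registry.deregister("route-planner")
```

Note that this registry makes no decisions about its members. Moving toward sense A2b would mean adding the services and allocation logic that members plug into.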

Issues related to these views of the grid are:

what new technologies and mechanisms can we identify and add to the agent technology list? Can we more precisely define the ones we have already? Can we get industry standards groups to adopt these specifications? Can we influence COTS vendors to supply reference implementations?
how do we avoid reinventing non-agent technologies and mechanisms? How do we ensure that the agent mechanisms will gracefully interoperate with these useful non-agent mechanisms (since agent systems will not do everything)? Can we stand on the shoulders and not the toes of potentially competing technologies? For instance:

the meta computing grid and DARPA programs like Quorum define non-agent mechanisms for distributing computation and organizing resources.
the distributed object community provides ways of modeling and distributing applications and provides collections of middleware services.
the distributed simulation community provides ways to federate collections of simulations with a shared transport, data model, and notion of time.

how do we ensure that we do not simply supply toolboxes with mechanisms (glue) in them, but also standard construction techniques for putting the parts together in creating real systems? What sort of (open) frameworks can be used to connect all the parts (agent and non-agent) as well as future technologies?

4.3.2 Viewing the Agent Grid as a Composition or Federation of Agent Systems

Another view of the agent grid (sense B) is that it is a kind of super or meta agent system for connecting agents and agent systems. This system could be built using the standard mechanisms (of sense A) as well as existing non-agent technology (e.g., middleware). The benefit of this view is to show how the agent mechanisms discussed above make it easier to construct and structure agent systems-of-systems, using grid-aware protocols, resources, and services as well as foreign wrapped agent systems and also legacy non-agent systems. In this role, the grid is responsible for providing services and allocating resources among its members. This is equivalent to sense A2b above, that the grid becomes a form of framework.

Among the issues relevant to this view of the grid is grid government. There is a tradeoff between individual agent autonomy and (logically) centralized control (or at least control external to an agent). The grid is a balancing act in providing services and resources. How much does the grid have to understand about the tasks of the agents? To what extent does the grid make these decisions, or is it just a mechanism? One could define an empowered active grid as the locus of control making the key decisions of whether agents can have resources and which agents have priority - an almost socialist scheme in which the grid has the power to provide resources, rewards and benefits and is empowered to assign or remove tasks from agents. Agents only have to report status to the grid and receive tasks and that is all. At another extreme, the passive grid just supplies mechanisms for registration but the agents provide the control. The mechanism might now be market driven with some notion of cost and market conditions as the arbiter of decision-making with agents making the decisions. In between, both the grid and the agents in it share control and either divide control (e.g., so the grid handles low level issues like mobility for load balancing, replication for fault tolerance, but not semantic control issues), or agents and the grid must negotiate for control with each other. In this latter variant, by a symmetry argument, the grid must effectively be an agent insofar as it itself negotiates with agents. If the grid only understands some kind of currency (electricity, gridbucks) then it may not have to understand much about the agents registered with the grid. If it must be able to reason about different agents' tasks (can it see their state, or must it ask?), then the grid must have authority and control over agents to make some decisions or task them, again acting like an agent.
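The two extremes of grid government can be contrasted in a short sketch. It assumes a single divisible resource and requests that carry both a priority (visible only to an active grid that understands tasks) and a bid in some grid currency (all a market-driven grid needs to see). The names Request, active_grid_allocate, and market_allocate, and the sample agents, are hypothetical illustrations rather than a proposed design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """A hypothetical resource request made by an agent."""
    agent: str
    units: int      # amount of resource requested
    priority: int   # task priority as judged by an active grid (higher wins)
    bid: float      # price per unit offered in grid currency


def _grant_in_order(ordered_requests, capacity):
    """Grant requests greedily in the given order until capacity runs out."""
    granted = {}
    for request in ordered_requests:
        share = min(request.units, capacity)
        if share > 0:
            granted[request.agent] = share
            capacity -= share
    return granted


def active_grid_allocate(requests, capacity):
    """Empowered active grid: the grid itself ranks tasks by priority."""
    ordered = sorted(requests, key=lambda r: r.priority, reverse=True)
    return _grant_in_order(ordered, capacity)


def market_allocate(requests, capacity):
    """Market-driven scheme: the grid understands only currency, not tasks."""
    ordered = sorted(requests, key=lambda r: r.bid, reverse=True)
    return _grant_in_order(ordered, capacity)


requests = [
    Request("planner", units=6, priority=1, bid=5.0),
    Request("sensor-fusion", units=6, priority=3, bid=1.0),
]
by_priority = active_grid_allocate(requests, capacity=8)
by_market = market_allocate(requests, capacity=8)
```

With the same requests and capacity, the two policies reach opposite outcomes: the active grid favors the high-priority task, while the market favors the higher bidder. The intermediate, negotiated forms of control described above are not modeled here.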

Another dimension of the grid as a system view involves the architecture of the grid. One architecture would define a flat (global) grid (a backplane) where the grid supplies services, resources and optimization to a very large collection of agents and/or agent systems. In a variant of this architecture, agents that are currently not registered with any agent system may or must be registered with the grid so they can be found. This makes the grid a kind of agent system itself but one that manages homeless agents. In another variant grid architecture, the flat (logical) grid is really a federated composition of a large number of smaller grids or agent systems. This permits some local control and local differences within each such grid domain. Perhaps different local grids offer different services or make policy decisions differently. Two fairly different variations on this federation-of-grids architecture are worth describing:

The grid federation is not logically a flat grid but rather a hierarchy (like the Internet router hierarchy), graph, or other kind of federation. Agents register with agent systems and agent systems (not agents) register with the grid, which is effectively an agent system itself. So there are hierarchies of agent systems.
There is no central meta grid at all but rather each agent system that wants to participate in the grid chooses to implement a collection of protocols that permit it to interoperate with other agent systems also implementing the grid protocols. Now, no central grid implementation has primacy but rather it is the universal adoption of shared protocols that constitutes the grid. In this view, a grid implementation that provides a collection of services would really just be yet another agent system, one that might be available to interoperate with existing agent systems. Many variations of this architecture can exist where various levels of interoperation are supported among participating agent systems.
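The two federation variations can be sketched side by side. This is a minimal illustration, assuming that the only shared operation is locating the system that hosts a named agent. AgentSystem, Grid, locate_via_peers, and the example system names are hypothetical.

```python
from collections import deque


class AgentSystem:
    """A hypothetical agent system that knows only its own agents."""

    def __init__(self, name):
        self.name = name
        self.agents = set()
        self.peers = []   # used only in the protocol-only variation

    def register_agent(self, agent_name):
        self.agents.add(agent_name)

    def locate(self, agent_name):
        """Return this system's name if it hosts the agent, else None."""
        return self.name if agent_name in self.agents else None


class Grid(AgentSystem):
    """Hierarchical variation: agent systems (or other grids) register here."""

    def __init__(self, name):
        super().__init__(name)
        self.members = []

    def register_system(self, system):
        self.members.append(system)

    def locate(self, agent_name):
        """Check local agents first, then descend the hierarchy."""
        local = super().locate(agent_name)
        if local is not None:
            return local
        for member in self.members:
            found = member.locate(agent_name)
            if found is not None:
                return found
        return None


def locate_via_peers(start, agent_name):
    """Protocol-only variation: breadth-first search over peer links."""
    seen = {start.name}
    queue = deque([start])
    while queue:
        system = queue.popleft()
        if agent_name in system.agents:
            return system.name
        for peer in system.peers:
            if peer.name not in seen:
                seen.add(peer.name)
                queue.append(peer)
    return None


logistics = AgentSystem("logistics")
logistics.register_agent("supply-agent")
intel = AgentSystem("intel")
intel.register_agent("map-agent")

# Variation 1: a hierarchy of grids with a root.
theater = Grid("theater-grid")
theater.register_system(logistics)
global_grid = Grid("global-grid")
global_grid.register_system(theater)
global_grid.register_system(intel)
hierarchical_hit = global_grid.locate("supply-agent")

# Variation 2: no central grid, only systems that agree on a protocol.
logistics.peers.append(intel)
intel.peers.append(logistics)
peer_hit = locate_via_peers(logistics, "map-agent")
```

In the first variation the root grid is a single point of entry. In the second, any participating system can start the search, and the "grid" exists only as the shared convention that peers answer it.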

Another family of issues concerns whether services supplied by the grid are themselves agents. This immediately raises the question of what constitutes an agent. Is it that the entity communicates via Agent Communication Language or that it contains its own thread of control (other definitions exist)? Many of the services that agent systems might need may also be supplied by middleware object services. So agents must be able to interoperate with these services, either directly, via agent wrappers, or by reimplementing the services in agent systems.

4.3.3 Viewing Agent Grids as Organizational Units

Another possible view of grids (sense C) is as coherent organizational units. In this view, grids are not just part of the underlying mechanics of an agent system (mechanism or framework) but are first-class modeling entities meant to represent aspects of the problem being modeled, helping to make the modeled world of agents one-to-one with reality. Examples of such a view of grids might be:

enterprises including virtual enterprises that have articles of confederation including IP rules and some profit or non-profit objectives
org charts containing a hierarchy of responsibilities including management, direct and overhead activities. Military force structures and company org charts are historically predominantly hierarchical. They reduce information flows from bottom to top and so handle information overload and communication bandwidth issues.
teams that come together for a short time to do some task
herds, flocks, ... with emergent behavior

In this sense of "grid," members have defined roles, resources, and responsibilities related to the purpose of that grid. Over time, member roles can change so that a child becomes an adult, a worker, a parent, and a grandparent. Similarly, these organizational grids themselves enter, evolve, and leave modeled systems. Resources are allocated to specific parts of organizational grids (not to some giant grid in the sky). Time, money, energy, knowledge, respect, and legal contracts are the glue that binds organizational grids and how the agent spends its efforts. For example, a battalion grid registers into a mission grid which includes registering its control, resources, and services. It seems likely that the mission grid will continue to treat the battalion as an organizational unit and not as a flattened collection of resources that can be given away to others - resources are assigned to force units for a reason - so this kind of grid is a unit of organization and encapsulation. My doing my mission versus you doing yours versus us cooperating seems within the province of agent reasoning -- no omnipotent backplane grid oversees each organizational grid's behavior and resources to ensure global optimization (at the expense of each organizational grid's autonomy); instead an organizational grid acts with its own autonomy, and a collection of such grids interoperates, competes, or cooperates emergently depending on goals and plans of the individuals and organizations. Responsibilities are related to and help define roles. An agent might be a member of many organizational grids simultaneously, all of which account for some of the agent's objectives.

A technical issue raised by this view is that if a grid is going to serve as an organizational entity and be referred to as a unit, then it must have identity, have an interface, etc. In other words, it must have many, if not all, of the characteristics of an agent. This may involve defining grids as actual (composite) agents (or ensembles), or defining a single agent in the grid as a representative or connection point for the grid. This is related to the issues discussed in Section 4.3.2, because the desire to use grids to model organizational entities may drive the sets of structures you want to be able to construct with grid technology. Another observation related to this view is that real organizations and systems such as logistics supply chains have a lot of built-in structure. For example, one logistics supply center generally knows its main suppliers and specific ways to find others - that is, they have worked out the way they do the work they spend most of their time doing. This indicates that agent grid technology should not concentrate only on dynamic aspects (e.g., matchmaking) and overlook facilities for modeling existing, more static relationships.
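The idea of a grid that is itself a (composite) agent can be sketched with the familiar composite pattern. The sketch assumes that an agent's interface reduces to "can you handle this task?" and "handle it". The class names Agent and OrganizationalGrid, and the battalion example members, are hypothetical.

```python
class Agent:
    """A hypothetical agent with an identity and a minimal task interface."""

    def __init__(self, identity, skills):
        self.identity = identity
        self.skills = set(skills)

    def can_handle(self, task):
        return task in self.skills

    def handle(self, task):
        if not self.can_handle(task):
            raise ValueError(f"{self.identity} cannot handle {task}")
        return f"{self.identity} performed {task}"


class OrganizationalGrid(Agent):
    """A composite agent: same identity and interface, work done by members.

    Outsiders address the grid as a unit and never see its members directly,
    so the grid remains a unit of organization and encapsulation.
    """

    def __init__(self, identity, members):
        super().__init__(identity, skills=())
        self.members = list(members)   # relatively static, known relationships

    def can_handle(self, task):
        return any(member.can_handle(task) for member in self.members)

    def handle(self, task):
        for member in self.members:
            if member.can_handle(task):
                return f"{self.identity} -> {member.handle(task)}"
        raise ValueError(f"{self.identity} cannot handle {task}")


battalion = OrganizationalGrid(
    "battalion",
    [Agent("medic", ["triage"]), Agent("engineer", ["bridge-repair"])],
)
# The battalion registers into the mission as a unit, not as loose resources.
mission = OrganizationalGrid("mission", [battalion])
result = mission.handle("bridge-repair")
```

Because OrganizationalGrid is itself an Agent, grids nest without special cases. The alternative mentioned above, a single representative agent acting as the connection point for a grid, would expose the same interface but keep the grid itself outside the agent type.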

We may or may not choose to call these organizational units "grids" - we might rather call them teams, ensembles, or even ensemble agents, and reserve the word "grid" for large-scale or even global-scale infrastructure. But many of the attributes of a grid are shared with organizational units - especially organizational goals and their interaction with sharing of resources controlled by the organization. Even if not considered as grids, it is clear that organizational units must be modeled in many agent application scenarios. Their existence accounts for many use cases of why we, our co-workers, or our competitors do what we do. Agents will have to model and understand command hierarchies. So it is most likely that some of the most important agent contributions will happen while modeling organizational units.

4.4 The Need for Unified Grid Architectures

The agent grid (at least as described in [CoA98]), in wishing to control non-agent resources (such as computing resources) as well as agents, raises an important additional requirement concerning agent grids, namely the need for grid-like capabilities at multiple technical levels, e.g.:

computational grids and computing fabrics (in the sense of Sections 2.1 and 2.2)
the Internet and the Web
databases
distributed object systems (CORBA, DCOM)
agent systems (and agent grids in the CoABS sense)

The same requirement is illustrated by the DoD grid architectures described in Section 2.3. We can think of this as involving both the need for these individual levels to become more "grid-like", and the need for these different levels of grid-like capabilities to be combined.

In considering gridness at these individual levels, we need not say much about grids at the level of computation, since the computational grid is our original, paradigmatic computer-related grid. Grid-like systems also exist at the level of data. By analogy with general grid principles, data-level grids would interconnect pieces of data, and enable the interconnected collection of data to be used in different combinations for various purposes. An obvious candidate for "gridness" at this level is a database. A database constitutes a data grid in a loose sense, since it forms an interconnected collection of related pieces of data. Moreover, a database system also provides query processing, transaction processing, and metadata (schema and data dictionary) facilities that enable the database to be treated as a unified whole, which is a key characteristic of a grid. At the same time, conventional DBMSs are limited in their support to data of relatively limited types. We might expect true "data grids" to provide support for many more data types than current DBMSs. In addition, DBMSs would more closely resemble "true grids" by incorporating additional self-management and organizing facilities. For example, an active DBMS that monitored its own content, and could automatically incorporate attached new data sources, would exhibit more "true grid" characteristics than current "static" DBMSs. Ideally, such capabilities would also be extended to allow the connection of heterogeneous databases to form federations, based on common metadata, ontology, and conceptual schema concepts, much more readily than is now the case. DBMS functionality could also be distributed into the network so that "the network is the DBMS". This trend is related to the information mediator architectures of the DARPA I*3, BADD, and AICE programs, as well as to information agents [Tho98a].

The World Wide Web is another example of the variety of data that we would expect to be included in a "data grid". The Web includes a wide variety of data types, including not only HTML pages, but also files of many types (including various document formats, spreadsheets, etc.). The Web is in many respects a primitive form of distributed database (using its own particular data representations), similar in many respects to early network databases. Once a page is posted to a Web server, it potentially (assuming it points to other pages, and other pages point to it) becomes part of an interconnected collection of data whose component pages can be readily and uniformly accessed. However, the mechanisms needed for unifying this collection into a more coherent whole are at a relatively early stage. Examples of additional technologies needed to make the Web more of a "true grid" include better data representation technologies such as XML, query and transaction support, additional metadata representation technology and standardized vocabularies, WebDAV (an HTTP extension for supporting Web Distributed Authoring and Versioning) and other technologies for making the creation and modification of Web content as straightforward as retrieval of content, and mechanisms such as URIs for separating the identity of data from its location (which, in turn, forms the basis of load balancing mechanisms that can direct requests to alternative sites), and related mechanisms for dealing with information that is removed or is temporarily unavailable. These and other Web technologies are described in [Man98a,b; Man99a,c].
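The point about separating the identity of data from its location can be illustrated with a small resolver sketch. It assumes a table from stable identifiers to replica locations, rotates among replicas as a crude form of load balancing, and skips locations marked unavailable. The Resolver class, the identifier, and the example.org addresses are all hypothetical; this is not how any particular Web mechanism is specified.

```python
class Resolver:
    """Hypothetical mapping from stable identifiers to replica locations."""

    def __init__(self):
        self._replicas = {}   # identifier -> list of locations
        self._down = set()    # locations currently unavailable
        self._turn = {}       # identifier -> index to try first next time

    def publish(self, identifier, location):
        self._replicas.setdefault(identifier, []).append(location)

    def mark_unavailable(self, location):
        self._down.add(location)

    def resolve(self, identifier):
        """Return the next available location in rotation, or None."""
        locations = self._replicas.get(identifier, [])
        start = self._turn.get(identifier, 0)
        for offset in range(len(locations)):
            index = (start + offset) % len(locations)
            if locations[index] not in self._down:
                self._turn[identifier] = index + 1
                return locations[index]
        return None


resolver = Resolver()
resolver.publish("urn:example:report-42", "http://a.example.org/report-42")
resolver.publish("urn:example:report-42", "http://b.example.org/report-42")

first = resolver.resolve("urn:example:report-42")    # rotation starts at site a
second = resolver.resolve("urn:example:report-42")   # then moves to site b
resolver.mark_unavailable("http://b.example.org/report-42")
third = resolver.resolve("urn:example:report-42")    # site b is skipped
```

Clients hold only the identifier, so replicas can be added, removed, or taken offline without invalidating any reference to the data.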

The addition of behavior (code, software) to data moves us into the realm of systems based on objects, e.g., distributed object systems such as CORBA-based systems, or object DBMSs. In such an object system, the basic "grid" is formed of interconnected objects (interconnected by virtue of the references objects contain to other objects). These objects are pre-packaged units of data and associated software. If we consider the relationships between the data that forms an object's state and the code which defines its methods as an additional part of the interconnection that forms the "grid", we can also think of an object grid as an interconnected data and software grid. The Web can increasingly be thought of as a form of object grid as well [Man98a,b; Man99a], due to such things as the increasing use of scripted Web pages and Java in the Web for integrating behavior with Web content, and the development of additional technology to more thoroughly integrate Web and object technologies, such as the Document Object Model [Woo98], and Web-based remote procedure call mechanisms.

As with the systems discussed in connection with "data grids", if we add specific additional capabilities or "services" to the basic network (of objects in this case), the systems become more like "true grids". For example, an object DBMS provides an integrated collection of query, transaction, and other facilities that enable the collection of objects in an object database to behave in a much more cohesive, "grid-like" fashion. Similarly, the addition of CORBAservices such as transaction, query, trading (yellow pages), etc. services to the basic CORBA-enabled network of distributed objects moves a CORBA-based system in the direction of becoming more grid-like. However, there is much work to be done to raise object systems to the level of "true grids", containing well-integrated services, that provide a virtual, distributed, shared object space, and which transparently handle the load balancing, reliability, and other issues associated with "true grids". At the same time, of course, these systems are attempting to address very difficult problems, about which there is still much debate. For example, there is a considerable amount of debate in programming and architectural circles as to the extent to which it is practically possible to achieve transparency when dealing with both local and distributed objects (see, e.g., [WWWK94]).

In addition to the need for the individual technical levels described above to become more grid-like, the resulting grids themselves need to be unified. An agent-level grid supporting this requirement should provide both grid capabilities at the computation and data/object levels in support of agents, as well as grid capabilities at these other levels enabled by agents. Both these types of support are important in making the maximum use of agent-level capabilities. For example, agent-level grids (and also object-level grids) can take advantage of the capabilities of underlying computational grids in supporting their load balancing and quality-of-service requirements (particularly where the higher-level grids can interact directly with the lower levels to exert control). Operational agent grids will also need to interact with data and object systems (which hopefully will become grids at these levels), since much information and software functionality that will need to be accessible to agent grids will continue to exist in these systems.

At the same time, the technical demands of grid concepts at all levels require increasing amounts of "intelligence", collaborative ability, adaptability, component mobility, etc.; in other words, characteristics frequently associated with agents. For example, [Bra97b] discusses the use of agent technology in simplifying and enhancing distributed computing capabilities, and in particular enhancing intelligent interoperability in such systems. One such use is the incorporation of agents as resource managers. He notes: "A higher level of interoperability would require knowledge of the capabilities of each system, so that secure task planning, resource allocation, execution, monitoring, and possibly, intervention between the systems could take place. To accomplish this, an intelligent agent could function as a global resource manager." Further distributing these functions among multiple agents, "A further step toward intelligent interoperability is to embed one or more peer agents within each cooperating system. Applications request services through these agents at a higher level corresponding more to user intentions than to specific implementations, thus providing a level of encapsulation at the planning level, analogous to the encapsulation provided at the lower level of basic communications protocols." Agents can also assist in providing better user interfaces for such distributed systems. As [Bra97b] observes, "In the future, assistant agents at the user interface and resource-managing agents behind the scenes will increasingly pair up to provide an unprecedented level of functionality to people."

[Gen97] also describes the role of agents in enabling interoperability in distributed systems. In his approach, agents and facilitators are organized into a federated system, in which agents surrender autonomy in exchange for the facilitator's services. Facilitators coordinate the activities of agents and provide other services such as locating other agents by name (white pages) or by capability (yellow pages), direct communication, content-based routing, message translation, problem decomposition, and monitoring. On startup, an agent initiates an ACL connection to the local facilitator and provides a description of its capabilities. It then sends the facilitator requests when it cannot supply its own needs, and is expected to act to the best of its ability to satisfy the facilitator's requests.
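The startup and request flow just described can be sketched as follows. Plain method calls stand in for ACL messages, which is a deliberate simplification, and only white pages, yellow pages, and routing by capability are modeled. The names Facilitator, FacilitatedAgent, and the example capabilities are hypothetical and are not drawn from [Gen97].

```python
class Facilitator:
    """Hypothetical facilitator offering white pages, yellow pages, and routing."""

    def __init__(self):
        self._white_pages = {}    # agent name -> agent
        self._yellow_pages = {}   # capability -> list of agent names

    def register(self, agent):
        """Record an agent's name and its advertised capabilities."""
        self._white_pages[agent.name] = agent
        for capability in agent.capabilities:
            self._yellow_pages.setdefault(capability, []).append(agent.name)

    def find_by_name(self, name):
        return self._white_pages.get(name)

    def find_by_capability(self, capability):
        return list(self._yellow_pages.get(capability, []))

    def request(self, capability, payload):
        """Route the request to the first agent advertising the capability."""
        for name in self._yellow_pages.get(capability, []):
            return self._white_pages[name].perform(capability, payload)
        return None


class FacilitatedAgent:
    """An agent that gives up some autonomy in exchange for facilitator services."""

    def __init__(self, name, handlers, facilitator):
        self.name = name
        self._handlers = dict(handlers)           # capability -> function
        self.capabilities = list(self._handlers)
        self.facilitator = facilitator
        facilitator.register(self)                # on startup, advertise capabilities

    def perform(self, capability, payload):
        """Serve a request, including one forwarded by the facilitator."""
        return self._handlers[capability](payload)

    def need(self, capability, payload):
        """Use a local capability if present; otherwise ask the facilitator."""
        if capability in self._handlers:
            return self.perform(capability, payload)
        return self.facilitator.request(capability, payload)


facilitator = Facilitator()
shouter = FacilitatedAgent("shouter", {"uppercase": str.upper}, facilitator)
planner = FacilitatedAgent("planner", {}, facilitator)
answer = planner.need("uppercase", "hello")
```

The requesting agent never learns which agent served it. That indirection is what allows the facilitator to add translation, decomposition, or monitoring without changing the agents.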

The integration of agents with other levels requires the use of object/component technology, together with reflective (self-referencing) capabilities combined with extensive metadata. For example, [Bra97b] observes: "A key enabler is the packaging of data and software into components that can provide comprehensive information about themselves at a fine-grain level to the agents that act upon them. Over time, large undifferentiated data sets will be restructured into smaller elements that are well-described by rich metadata, and complex monolithic applications will be transformed into a dynamic collection of simpler parts with self-describing programming interfaces. Ultimately, all data will reside in a 'knowledge soup', where agents assemble and present small bits of information from a variety of data sources on the fly as appropriate to a given context. In such an environment, individuals and groups would no longer be forced to manage a passive collection of disparate documents to get something done. Instead, they would interact with active knowledge media that integrate needed resources and actively collaborate with them on their tasks." The Web, in its role as the beginnings of a data/object grid, can be said to be moving in this direction now. This is particularly true when technologies for addressing finer-grained portions of Web documents (e.g., XML, and related technologies) and for attaching behavior to Web data are considered [Man98a,b; Man99a]. [Bra97b] also identifies the need for such agent systems to be able to interact with both object systems and more conventional software: "Ideally, each software component would be 'agent-enabled', however, for practical reasons components may at times still rely on traditional interapplication communication mechanisms rather than agent-to-agent protocols."

Objects provide a generic modeling or abstraction mechanism for looking at the wide range of resources that need to be included at all levels in such a combined system. An object in this sense is simply an encapsulated unit that has identity, an interface (possibly more than one), and communicates via messages with other objects and the "outside". This use of objects mirrors the use of objects as a general modeling mechanism in the ISO Reference Model of Open Distributed Processing (RM-ODP) [ISO95]. RM-ODP is intended to describe any distributed processing system (including, in some cases, the roles of humans that may be involved in the system), and its use of objects as a modeling abstraction is not meant to imply that the system is actually implemented using object-oriented programming techniques. However, while object abstractions need not necessarily be implemented using object-oriented programming, the use of these abstractions makes the application of object technologies such as CORBA, Jini, etc. relatively straightforward.

Representing the computational and communication components of a computational grid as objects, as illustrated in the Legion system's reflective capabilities, allows these components to be both uniformly represented within the architecture, and managed in a straightforward way by higher level components. The approach of representing computer or network components as objects for management purposes is well-known in both network and computer system management technologies. Data can be represented as objects in a straightforward fashion, by defining object interfaces containing get (read) and set (write) operations. The World Wide Web Consortium Document Object Model [Woo98] is an example of a set of such interfaces designed to provide object-oriented interfaces to Web data. Such interfaces provide programs and agents with more uniform access to information represented both as data (e.g., in databases, on file systems, or in the Web) and in distributed object systems, and also support the integration of more "intelligence", in the form of behavior, with such data. Finally, as noted in Section 4, object interfaces can encapsulate "smart things", e.g., agents and human beings. In some cases the messaging protocols between these various kinds of objects will be relatively simple (e.g., conventional object RPC between distributed software objects, or commands sent to hardware), while in other cases they will be more complicated (agent communication language (ACL) sent between agents, or the email flow between people); however, similar abstraction principles can apply to objects at all levels.
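
A small sketch may help make the get/set idea concrete. It is only an illustration under stated assumptions: the names `DataObject`, `RecordObject`, `AuditedObject`, and `rename` are hypothetical and are not taken from the DOM or from Legion, and a dictionary stands in for a database row or document fragment.

```python
from typing import Any, Dict, List, Protocol, Tuple


class DataObject(Protocol):
    """Hypothetical uniform interface: get (read) and set (write)."""

    def get(self, name: str) -> Any: ...

    def set(self, name: str, value: Any) -> None: ...


class RecordObject:
    """Wraps plain data (a dict standing in for a database row)."""

    def __init__(self, fields: Dict[str, Any]) -> None:
        self._fields = dict(fields)

    def get(self, name: str) -> Any:
        return self._fields[name]

    def set(self, name: str, value: Any) -> None:
        self._fields[name] = value


class AuditedObject:
    """Attaches behavior (a change log) to any wrapped DataObject."""

    def __init__(self, inner: DataObject) -> None:
        self._inner = inner
        self.log: List[Tuple[str, Any, Any]] = []

    def get(self, name: str) -> Any:
        return self._inner.get(name)

    def set(self, name: str, value: Any) -> None:
        # Record (field, old value, new value) before delegating the write.
        self.log.append((name, self._inner.get(name), value))
        self._inner.set(name, value)


def rename(obj: DataObject, new_title: str) -> None:
    """A client (program or agent) that relies only on the interface."""
    obj.set("title", new_title)
```

Because `rename` depends only on the interface, it works the same way on the passive `RecordObject` and on the `AuditedObject` that adds behavior, which is the kind of uniform access the paragraph above describes.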

In such an integrated architecture, there is also a need to define additional forms of organization on the available resources in addition to the various grid technical levels (computational, data, etc.). For example, it may be desirable to build functional layers of grids, such as the information, sensor, and engagement grids identified in the DoD grid architectures described in Section 2.3. Building grids at each of these functional layers would require use of technologies from more than one, and possibly all, of the technical grid levels. In addition, large scale distributed object systems increasingly are being designed with 3- (or sometimes multi-) tier architectures [MGHH+98]. These architectures involve the division of the system's components (and object definitions) into functional tiers based on the different functional concerns they address. For example, a typical 3-tier architecture has a tier for objects representing user interface elements, a tier for business or application objects, and a tier for database servers. The business object tier separates out the common definitions of enterprise operations and semantics from the more specialized concerns addressed in the other tiers. Other examples of such organization include the use of Common Schema concepts [Man98c] or enterprise-level ontologies (enterprise-wide agreements on common semantics), and specialized schemas/ontologies for use by specialized user communities (together with mappings to the common definitions where possible).
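
The 3-tier division mentioned above can be sketched in a few lines. This is an illustrative toy, not an architecture from [MGHH+98]: the order-handling domain and the names `Order`, `OrderStore`, `OrderService`, and `render_total` are assumptions chosen only to show how the common business definitions sit between presentation and storage.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Order:
    """Business object whose definition is shared across tiers."""
    order_id: int
    quantity: int
    unit_price: float


class OrderStore:
    """Database tier: stores and retrieves, with no business rules."""

    def __init__(self) -> None:
        self._rows: Dict[int, Order] = {}

    def save(self, order: Order) -> None:
        self._rows[order.order_id] = order

    def load(self, order_id: int) -> Order:
        return self._rows[order_id]


class OrderService:
    """Business tier: common enterprise operations and semantics."""

    def __init__(self, store: OrderStore) -> None:
        self._store = store

    def place(self, order_id: int, quantity: int, unit_price: float) -> Order:
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        order = Order(order_id, quantity, unit_price)
        self._store.save(order)
        return order

    def total(self, order_id: int) -> float:
        order = self._store.load(order_id)
        return order.quantity * order.unit_price


def render_total(service: OrderService, order_id: int) -> str:
    """User interface tier: presentation only."""
    return f"Order {order_id}: {service.total(order_id):.2f}"
```

The validation rule and the definition of a total live only in the business tier, so a different user interface or a different store could be substituted without redefining them.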

Semantics-based mappings between the different technical levels in such an architecture are also required. For example, the ATAIS architecture document [BFHH+98] describes a series of interoperability levels: isolated, co-habitable, syntactic, semantic, seamless, and adaptive. The computational grid idea can be characterized as emphasizing high levels of interoperability on this spectrum, but at a low level of abstraction (i.e., in terms of computing resources). The agent grid often involves a much higher level of abstraction. Other levels (e.g., data, objects) are, in a sense, in between these extremes. Raising the level of abstraction complicates providing "gridness" (deep integration) because the requirements on one side, and the available resources/services on the other, are more semantically heterogeneous (unlike, e.g., "memory" and "CPU bandwidth"), and thus both characterizing them, and matching requirements with resources, becomes harder. An example of this is the complexity of addressing quality-of-service (QoS) issues, which involves defining mappings between "quality" measures at higher levels, and resource allocations at lower levels [Man99b].
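
As a rough illustration of the interoperability spectrum, the levels listed above can be treated as an ordered scale. The sketch below makes two assumptions that do not come from [BFHH+98]: that the levels are strictly ordered in the sequence given, and that an integration between two resources is bounded by the weaker of the two. The names `Interop` and `can_integrate` are hypothetical.

```python
from enum import IntEnum


class Interop(IntEnum):
    """Interoperability levels, ordered as listed in the text (assumed)."""
    ISOLATED = 0
    CO_HABITABLE = 1
    SYNTACTIC = 2
    SEMANTIC = 3
    SEAMLESS = 4
    ADAPTIVE = 5


def can_integrate(a: Interop, b: Interop, required: Interop) -> bool:
    """Assumed rule: the weaker participant limits the integration."""
    return min(a, b) >= required
```

Under these assumptions, a requirement stated at the semantic level cannot be met by pairing an adaptive resource with one that is only syntactically interoperable, which is one simple way to picture why matching requirements with resources gets harder as the level of abstraction rises.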

Such additional levels of organization provide the basis for more interoperability among the various technical levels of resources contained in the system, and hence help enhance the ability of these resources to operate as a "true grid".

At the same time, it is worth pointing out that there is not likely to be one ultimate grid or all-purpose unified true grid framework as technology evolves. Instead, migration paths will likely connect grids and grid levels incrementally where it is most useful to do so. But recognizing that this is what is happening will likely make grid composition easier as architectural patterns for doing so become better understood.

4.5 Open Issues about the Agent Grid

Based on the preceding discussion, the following is a partial list of agent grid issues that need to be resolved in defining an agent grid:

Application View

What will the benefits be to applications that make use of an agent grid?
If commercial-off-the-shelf component software technology is not easy to compose today and assuming this is a goal of the agent grid, how will agent technology help?

Architectural Concepts

Is the agent grid a kind of (super) agent or agent system itself?

Does it negotiate with resident agents for resources?
Do grids and agent systems both provide services within their biosphere (boundary)?

If computational grids support sharing computations, then what sharing do agent grids support? Possible answers: information, knowledge, decisions, requests and responses, plans, and agent capabilities of all kinds, including the resources that can be assigned to agents (processor usage, disk space, etc.).
Are there definitional properties of agent grids?

Is there any minimal or maximal set of properties that we can agree on for something to be an agent grid? A minimal grid may be lightweight, providing the fewest possible services. Is having no services the minimum, or must a grid at least provide an agent registry? Can just any registry do?
Is the criterion for agent gridness based on a certain set of technical mechanisms (e.g., makes use of an ACL), or does any system with (some of) these properties qualify: autonomous, adaptive, cooperative, mobile, interoperable?

Agent-grid relationships

How much do we need to know about agents to define the agent grid? Does the same grid support agents that are mobile, intelligent, complex, autonomous, reactive, etc.?
Are agents fully autonomous including being independent of the/any grid? Or can they be dependent on specific grid services being available (and fail if the service becomes unavailable)?
Are there non-grid agents, that is, agents outside any grid? If so, how do they interoperate with grid agents?
How are agents and grid services related? For instance, do agents implement the long list of services that the grid provides or is that underlying component software? Is there a difference between a service being an agent and being controlled by an agent?
Does each agent contain a planner or is a planning service global to a collection of agents?

Is a grid an abstraction layer defined by emergent properties (implicit in the way agents interact with each other) or an explicit construct?
How can we avoid the grid as "yet-another-architecture"? What is the appropriate relationship between required grid capabilities and capabilities in existing multi-agent architectures, distributed object systems, DBMSs, simulation systems, network management systems, workflow, and meta computing systems? How do we make the best use of these existing capabilities in building agent grids?

Control Issues

Does the grid actively control services, resources and optimization? Can the grid unilaterally take resources away from agents that have them?
Where are the control points where different control algorithms might be substituted into the grid architecture?
Are agents actively part of the grid or are they end users of the grid?
How much does the grid have to understand about the tasks of the agents?
Are some grid governments/economies inherently better than others (e.g., socialist vs. market)?
How are resource allocations made in the grid (e.g., is competition for resources based on a marketplace concept)?

Scalability, Federation, and System-of-system Issues

Will one grid scale to support millions of agents, or are there many such grids?
How do agent grids federate or interoperate?
How are grids federated (e.g., global grid, flat grid, hierarchy of grids)?
Do agents seek out/lookup different kinds of grids depending upon their needs? Do grids seek out agents/systems to join their grid?
What happens if agents interoperate that come from differently configured grids?
Can agents belong to multiple grids simultaneously?
How should quality of service (QoS) and system-wide properties like those covered in [Man99b] be architected into the agent grid? These include: reliability, security, manageability, administrability, evolvability, flexibility, affordability, usability, understandability, availability, scalability, performance, deployability, configurability, adaptability, mobility, responsiveness, interoperability, maintainability, degradability, durability, accessibility, accountability, accuracy, demonstrability, footprint, simplicity, stability, fault tolerance, timeliness, schedulability, survivability, openness, seamlessness, safety, and trust.

Do some grids have these properties and others not?
If different grids contain different policy choices or different services, how does that affect agents communicating across grid boundaries?
Can we add new services and -ilities to a grid once it is deployed (grid evolution)? How transparent is the addition or subtraction of services and -ilities?
How can we accommodate heterogeneity in local grids and still guarantee systemic properties across grid federations?
If the system has an -ility, is the grid tasked with monitoring or full enforcement? Is -ility maintenance local or global?

Pervasiveness and Grid Economy

How do we foster an economy of componentized agent software? What are the roadblocks and what is missing?

security issues
micro-licensing of component software and leasing of resources across the network

Like many grid services, licensing’s degenerate form is no licensing.
Agents and component software cannot succeed without an economic model that lets broad communities get value from them. One way to do this is via licensing space on your machine, capabilities and services, data sources, …
A model of licensing might be critical for CoABS to succeed in the large.

How do we populate the grid with millions of agents and/or advertisements for services?
Do planning techniques scale for Internet and programming language communities?

5. Conclusions and Future Work

The concept of a "grid" is a generally useful idea, but only if it means something more than an ordinary collection of distributed resources. Ideally, it implies some higher level integration of distributed resources beyond simply connecting them. Additional work is needed to identify the details of the added functionality required to go beyond simple distributed collections of resources to the formation of "true grids". The grid concept can be applied at a number of individual technical levels (computation, data, object, agent). Grids have been defined and demonstrated at the computation level already. At other levels, more advanced integrating mechanisms have been defined (e.g., federated DBMSs at the data level), but these do not yet approach the level of grids.

The notion of an agent grid in particular is, on the surface, not too controversial. It is clear that

there are several kinds or levels of grids and there is a body of agent infrastructure technology and requirements that could be useful in developing grids (e.g., ACLs, matchmakers, agent wrappers, the need for agent system interoperability)
the grid might be related to agent systems either as providing a master agent registry mechanism or by itself being a meta agent system.
agent-like organizational units (ensembles) act like grids in that they control resources.

At the same time, there are several ways to characterize the agent grid and many open issues. For example, few would agree to one global construct called the agent grid or one such implemented system that makes all optimization tradeoffs and provides all system-wide control. As a result, there is more work to be done in areas including:

issue identification and resolution
identification and further development of technology underpinnings (e.g., Web and distributed object technologies)
identification of grid service requirements
how to obtain system-wide properties (-ilities and quality of service issues)
interoperability and loose coupling
functionality and pervasiveness demonstrations

Looking more broadly, to advance the use of the grid concept, the general idea of a grid needs to be applied at multiple technical levels (computation, data, object, agent) in order to identify in detail the specific technologies which would enable the creation of grids from collections of distributed resources at each of these levels. Grids at these individual levels are useful by themselves, but the maximum advantage comes when these different levels of grid capabilities are combined. There is a need for additional work to develop a unifying technical grid architecture which incorporates these separate grid levels, and identifies mappings between them. The development of grid concepts at these various technical levels will require more work in areas such as:

the identification and composability of resources in these systems (reducing the need for explicit programmer or other human intervention)
the dynamic aspects of systems (e.g., resources entering and leaving, load balancing)
greater generality, ubiquity, and mobility of computing resources and applications
loose coupling between layers so different layers can be at different stages of development.
the use of Common Schema concepts or enterprise-level ontologies (enterprise-wide agreements on common semantics), and specialized schemas/ontologies for use by specialized user communities. Semantics-based mappings between the different technical levels in such an architecture are also required.
application requirements to understand how future applications can be written to take advantage of grid architectures.

The discussion in this paper does not replace the need to address these detailed technical issues. However, it does provide a way of thinking about the general ideas which the various grid concepts have in common, and the issues they raise, which hopefully can be helpful in attempts to address them.

Acknowledgements

This research was sponsored by the Defense Advanced Research Projects Agency and managed by the U.S. Air Force Research Laboratory under contract F30602-98-C-0159. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency, U.S. Air Force Research Laboratory, Department of Defense, or the United States Government. We acknowledge helpful discussions and input from Paul Pazandak (OBJS), Venu Vasudevan (OBJS), Richard Ivanetich (IDA), Al Piszcz (MITRE), Brian Kettler (ISX), Jeff Bradshaw (Boeing), and Doyle Weishar (Global Infotek).

References

[ABIS96] ABIS Task Force, 1996 Advanced Battlespace Information System (ABIS) Task Force Report, 1996. <http://www.dtic.mil/dstp/96_docs/abis/abis.htm>, (protected access).

[BFHH+98] E. Brady, B. Fabian, M. Harrell, F. Hayes-Roth, S. Luce, E. Powell, G. Tarbox, "The Advanced Technology Architecture for Information Superiority", DARPA ISO, draft 10/16/98.

[Bra97a] J. Bradshaw (ed.), Software Agents, American Association for Artificial Intelligence/MIT Press, 1997.

[Bra97b] J. Bradshaw, "An Introduction to Software Agents", in [Bra97a].

[CG98] A. Cebrowski and J. Garstka, Network-Centric Warfare: Its Origin and Future, U. S. Naval Institute Proceedings, Vol. 124/11,139, January 1998, 28-35 <http://www.usni.org/Proceedings/Articles98/PROcebrowski.htm>.

[CoA98] DARPA CoABS Read Ahead Package and CoABS Kickoff Meeting, Pittsburgh, July 22-23, 1998. <http://ballston.prc.com/coabs/workshops.htm>

[DC498] Directorate for Command, Control, Communications, and Computer Systems, Observations on the Emergence of Network-Centric Warfare, Information Paper, 1998 <http://www.dtic.mil/jcs/j6/education/warfare.html>.

[DDRE98] Director, Defense Research and Engineering, Joint Warfighting Science and Technology Plan, 1998 <http://www.dtic.mil/dstp/98_docs/jwstp/jwstp.htm>.

[FF97a] G. Fox and W. Furmanski, "Petaops and Exaops: Supercomputing on the Web", IEEE Internet Computing 1(2), March-April 1997.

[FF97b] G. Fox and W. Furmanski, "HPcc as High Performance Commodity Computing", Technical Report, December 1997, http://www.npac.syr.edu/users/gcf/hpdcbook/HPcc.html.

[FF97c] G. Fox and W. Furmanski, "High-Performance Commodity Computing", in [FK99a].

[FK99a] I. Foster and C. Kesselman (eds.). The Grid : Blueprint for a New Computing Infrastructure, Morgan Kaufmann, 1999.

[FK99b] I. Foster and C. Kesselman, "Computational Grids", in [FK99a].

[FK99c] I. Foster and C. Kesselman, "The Globus Toolkit", in [FK99a].

[KEFM97] J. Knight, M. Elder, J. Flinn, P. Marx. Summaries of Three Critical Infrastructure Applications. Technical Report, U Virginia, November 14, 1997.

[Gen97] M. R. Genesereth, "An Agent-Based Framework for Interoperability", in [Bra97a].

[GLS94] W. Gropp, E. Lusk, and A. Skjellum, Using MPI: Portable Parallel Programming with the Message Passing Interface, MIT Press, Cambridge, 1994.

[GFLH98] A. Grimshaw, A. Ferrari, G. Lindahl, and K. Holcomb, "Metasystems", Comm. ACM 41(11), November 1998.

[GG99] D. Gannon and A. Grimshaw, "Object-Based Approaches", in [FK99a].

[HS98] M. Huhns and M. Singh (eds.), Readings in Agents, Morgan Kaufmann, 1998.

[ISO95] ISO/IEC JTC1/SC21/WG7 (1995), Reference Model of Open Distributed Processing <http://www.iso.ch:8000/RM-ODP/> (see also <http://www-cs.open.ac.uk/~m_newton/odissey/RMODP.html> and <http://www.dstc.edu.au/AU/research_news/odp/ref_model/ref_model.html>).

[J695] Joint Staff (J6), Joint Pub 6.0: Doctrine for C4 Systems Support to Joint Operations, 30 May 1995 <http://www.dtic.mil/doctrine/jel/new_pubs/jp6_0.pdf>.

[Ket98] B. Kettler, "DARPA CoABS Program: Use Cases for a Prototype Grid," draft 3.1, 12/15/98, ISX Corporation <http://coabs.globalinfotek.com/coabs_downloads/coabs_word/COABS_GRID_USE_CASES_V3.DOC> (password protected).

[Ket99] B. Kettler, "The CoABS Grid: Technical Vision," draft version 1.0, 4/7/99, ISX Corporation <http://coabs.globalinfotek.com/coabs_downloads/coabs_word/990407GridVisionDocDraftV1.doc> (password protected).

[KT98] N. Karnik and A. Tripathi, "Design Issues in Mobile-Agent Programming Systems", IEEE Concurrency 5(3), July-September 1998.

[Man98a] F. Manola, Towards a Web Object Model, Technical Report, Object Services and Consulting, Inc., <http://www.objs.com/OSA/wom.htm>, 1998.

[Man98b] F. Manola, Some Web Object Model Construction Technologies, Technical Report, Object Services and Consulting, Inc., <http://www.objs.com/OSA/wom-II.htm>, 1998.

[Man98c] F. Manola, Flexible Common Schema Study, Technical Report, Object Services and Consulting, Inc., December, 1998 <http://www.objs.com/aits/9811-common-schema-report.htm>.

[Man99a] F. Manola, "Technologies for a Web Object Model", IEEE Internet Computing, 3(1), January/February, 1999.

[Man99b] F. Manola, Providing Systemic Properties (Ilities) and Quality of Service in Component-Based Systems, Technical Report, Object Services and Consulting, Inc., January 1999 <http://www.objs.com/aits/9901-iquos.html>.

[Man99c] F. Manola, "Characterizing Computer-Related Grid Concepts," Technical Report, Object Services and Consulting, Inc., March 1999, <http://www.objs.com/agility/tech-reports/9903-grid-report-fm.html>.

[MGHH+98] F. Manola, et al., "Supporting Cooperation in Enterprise-Scale Distributed Object Systems", in M. P. Papazoglou and G. Schlageter (eds.), Cooperative Information Systems: Trends and Directions, Academic Press, 1998.

[Paz98a] P. Pazandak, Best of Class Agent System Features, <http://www.objs.com/agility/tech-reports/9809-best-of-class-capabilities.htm>, 1998.

[Paz98b] P. Pazandak, Next Generation Agent Systems & the CoABS Grid, draft Technical Report, <http://www.objs.com/agility/tech-reports/9810-NGAS.htm>, 1998.

[Paz98c] P. Pazandak, Agent System Comparison, draft, 10/19/98, <http://www.objs.com/agility/tech-reports/9810-agent-comparison.html>.

[Paz98d] P. Pazandak, A Rough Guide to Sun's Jini, August 1998, <http://www.objs.com/agility/tech-reports/9808-Jini-summary.htm>.

[Pis98a] A. Piszcz, "Background on Agents for DARPA's NGII Architecture", MITRE Technical Report MTR 98W0000085, August 1998.

[Pis98b] A. Piszcz, Grid Metaservice Considerations for Control of Agent Based Systems, draft, 3 September, 1998, <http://www.objs.com/agility/other-documents/9810-gmsspec012.pdf>.

[Sho93] Y. Shoham, "Agent-Oriented Programming", Artificial Intelligence 60(1), 51-92, 1993.

[Tho98a] C. Thompson, Strawman Agent Reference Architecture, Presentation to DARPA ISO Architecture Working Group, August 13, 1998 and OMG Agent Working Group, September 14-15, 1998. <http://www.objs.com/agility/tech-reports/9808-agent-ref-arch-draft2.ppt>.

[Tho98b] C. Thompson, Characterizing the Agent Grid, Technical Report, Object Services and Consulting, Inc., 1998 <http://www.objs.com/agility/tech-reports/9812-grid.html>.

[TBPV99] C. Thompson, T. Bannon, P. Pazandak, and V. Vasudevan, Agents for the Masses, Workshop on Agent-based High Performance Computing: Problem Solving Applications and Practical Deployment, Agents '99, Seattle, WA, May 1, 1999, <http://www.objs.com/agility/tech-reports/9902-agents-for-the-masses.doc>.

[VV95] W. Van de Velde, "Cognitive Architectures--From Knowledge Level to Structured Coupling", in L. Steels (ed.), The Biology and Technology of Intelligent Autonomous Agents, Springer Verlag, Berlin, 1995.

[Woo98] L. Wood, et al., Document Object Model (DOM) Level 1 Specification, W3C Recommendation, World Wide Web Consortium, <http://www.w3.org/TR/REC-DOM-Level-1/>, 1998.

[WWWK94] J. Waldo, G. Wyant, A. Wollrath, and S. Kendall, A Note on Distributed Computing, SMLI TR-94-29, Sun Microsystems Laboratories, Inc., November 1994 <http://www.smli.com/techrep/1994/abstract-29.html>.

© Copyright 1998, 1999 Object Services and Consulting, Inc. (OBJS)
Permission is granted to copy this document provided this copyright statement is retained in all copies.
Disclaimer: OBJS does not warrant the accuracy or completeness of the information in this report.