WebTrader: Agent Discovery
Project Summary
Tom Bannon and Craig Thompson
Object Services and Consulting, Inc.
17 June 2002
Executive Summary
WebTrader is a lightweight trader/matchmaker/yellow pages that is scalable
to WAN environments and can be used to locate Internet resources (e.g.,
services, agents, data sources, search engines, other traders, anything)
whose advertisements can appear on any web page, represented in XML.
WebTrader depends on existing search engines to index the advertisements
and effectively provides a meta search front end that can locate Web pages
containing advertisements that match its query.
We demonstrated evolving versions of WebTrader at CoABS Workshops at
Las Vegas (January 1999), Northampton (June 1999), Science Fair in Arlington
(October 1999), Atlanta (February 2000), and Boston (August 2000).
Thereafter, we halted the project, per Jim Hendler's guidance, to provide
added resources for eGents and MIATA, CoAX and JBI TIEs.
Objective
The objective of the WebTrader project is to develop
WebTrader
Query Tool (aka WebTrader), a trader/matchmaker/yellow pages that can
scale to WANs and permit anyone on the Web to advertise a resource (e.g.,
an agent, service, data source, or search engine) that anyone else can discover.
The approach to scaling is to represent advertisements in XML, store them
on Web pages, and let existing, already pervasive, industrial-strength search
engines index these pages; WebTrader's engine then accesses one or more
search engines, locates pages with advertisements, matches the ads against
the request, and returns the matching advertisements. One interesting
application of WebTrader Query Tool is DeepSearch, a recursive/federated
implementation of WebTrader that acts like a normal search engine but locates
search engines at local sites and recursively searches these. Many
web search engines only index top-level pages, leaving the other half of
the web unindexed except by these local search engines.
Specific scientific and engineering subgoals were:
-
represent Internet resource advertisements for services,
agents, data sources, search engines, traders. (This opens the door
to the ontology problem that is the focus of the DARPA DAML program.)
-
locate these advertisements via WebTrader which itself
piggybacks on one or more Internet search engines that have indexed the
resource advertisements
-
demonstrate trader federation and rebinding
-
re-engineer client-server WebTrader so the client
is an applet that can download quickly to most browsers
-
explore how to locate and mine local domain search
engines as a way to broaden and deepen Web searching
-
demonstrate WebTrader as a CoABS grid agent
-
demonstrate WebTrader in scenarios of interest to
DoD
WebTrader alone is not an agent system, but it is
a generalized component capability of most agent systems, providing matchmaking.
The interesting aspect of WebTrader is its scalability to the Web.
Seen in this light, WebTrader can be viewed as a standalone agent grid
enabling component, but one that operates in open WAN environments, is
compatible not only with agent but also with object and ontology technologies,
and has no downloading barrier to widespread deployment.
Technical
Accomplishments
Architecture
The basic architecture of WebTrader is fairly simple.
Essentially, WebTrader is a new kind of specialized meta search engine
that wraps other search engines to return typed advertisements from the
open Web. As shown in the figure below, any page on the Web (1) can
be annotated by anyone with an XML WebTrader advertisement (2) - see the
example Trading Advertisement DTD. These pages are indexed (3) by ordinary
Web search engines. When a Query (client advertisement) (4) is presented
(5) to WebTrader's engine (6), it accesses known Web search engines (7),
they find matching webpages in the usual way (8) and return them (9).
A WebTrader matching algorithm (10-12) finds candidate advertisements,
scores them, and returns a series of responses in rated order (13-14).
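The numbered flow above can be sketched in outline. This is a minimal illustrative sketch, not the actual WebTrader code (which was implemented in Java); the ad markup, page contents, and scoring rule here are assumptions.

```python
import re

# Hypothetical sketch of the WebTrader meta-search flow. The <tradingAd>
# markup, the canned pages, and the overlap-count score are assumptions.
ADS_PATTERN = re.compile(r"<tradingAd>.*?</tradingAd>", re.DOTALL)

def search_engine(query_terms):
    """Stand-in for steps 7-9: a wrapped search engine returning page bodies."""
    pages = {
        "http://example.org/a": "<html><tradingAd>blue button widget</tradingAd></html>",
        "http://example.org/b": "<html>no advertisement here</html>",
        "http://example.org/c": "<html><tradingAd>red button</tradingAd></html>",
    }
    return [body for body in pages.values() if any(t in body for t in query_terms)]

def score(ad_text, query_terms):
    """Steps 10-12: rate a candidate ad by query-term overlap."""
    return sum(1 for t in query_terms if t in ad_text)

def web_trader(query_terms):
    """Steps 4-14: query engines, extract ads, score, return in rated order."""
    candidates = []
    for page in search_engine(query_terms):
        for ad in ADS_PATTERN.findall(page):   # step 10: find candidate ads
            candidates.append((score(ad, query_terms), ad))
    candidates.sort(reverse=True)              # steps 13-14: rated order
    return [ad for _, ad in candidates]

results = web_trader(["blue", "button"])
```

The essential design point is that the trader itself stores nothing; it piggybacks on whatever the wrapped search engines have already indexed.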
Internet Resource Advertisements, Trader Federation and Rebinding
Our Salt Lake City demonstration of the WebTrader
(January 1999) illustrated:
-
XML-based resource advertisements
-
service binding and rebinding
-
trader federation
A significant evolution in the design of the WebTrader
ads occurred along the way, as we realized we could rework advertisements
so that resource description in ads could be independently defined, allowing
anyone to create an ad for anything they cared to define or use an existing
definition of, from CORBA IDL components to bicycles. This change
also affected ad matching and returned results. The revised TradingAd DTD
now defines <!ELEMENT resource ANY>, allowing resources to be
any XML document, from reconnaissance reports to parts inventories, from
Java RMI components to Oracle databases. This successfully decoupled
resource definition, basically an ontology question, from the WebTrader
design, a major step forward.
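A small sketch shows what this decoupling enables: a client can enumerate embedded resource types without the trader understanding them. Only `<resource>` with ANY content reflects the actual DTD; the surrounding element names and attributes are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Illustrative TradingAd instances. Only <resource> with ANY content comes
# from the actual TradingAd DTD; the other names here are assumptions.
ads = """
<ads>
  <tradingAd>
    <resource><javaRMIObject url="rmi://host/Service"/></resource>
  </tradingAd>
  <tradingAd>
    <resource><bicycle color="blue" cost="0"/></resource>
  </tradingAd>
</ads>
"""

# Because resource is ANY, the trader can list embedded resource types
# without interpreting them -- the ontology question stays decoupled.
resource_types = [
    list(ad.find("resource"))[0].tag
    for ad in ET.fromstring(ads).findall("tradingAd")
]
```

Here a Java RMI component and a bicycle sit in the same ad framework, which is the point of the change.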
We also made progress on service binding and rebinding
and trader federation. In the Salt Lake City demo, trading ads exist
for a variety of components (e.g., service agents, clients, and WebTraders).
Metadata including color and cost is included in the ads. A Blue
client, for example, asks the USA WebTrader to locate a Blue agent implementing
a particular interface, with zero cost if possible. One is found
and bound, and the client makes use of the agent. When the agent
unexpectedly dies, the WebTrader is consulted again by the client, in case
the “state of the world” has changed. It gets back a new list of
agents, sorts them by cost, and goes down the list trying each agent until
it finds one that works. If the WebTrader fails to respond, the client
can fall back on its cached list of previously found agents. When the Yellow
client asks the USA WebTrader for Yellow agents, the WebTrader’s initial
search turns up none, as it only consults a domain that indexes ads of
Red, White, or Blue agents. However, it does find an ad for a WebTrader
that knows about Yellow agents, and so passes on the original client query
to the Euro WebTrader, which finds a Yellow agent, passes it back to the
USA WebTrader, which passes it back to the client, which then uses it to
connect to the agent. At the right in the figure is shown a service
advertisement in XML. For more detail, see WebTrader
Demo Script.
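The rebinding behavior described above can be sketched as follows. The trader interface and agent records are hypothetical stand-ins for the demo's Java components, not the actual implementation.

```python
# Hypothetical sketch of the demo's rebinding logic: re-consult the trader
# in case the "state of the world" has changed, sort candidates by cost,
# try each until one works, and fall back to a cached list when the
# trader itself fails to respond.

def rebind(trader_query, cache):
    """Return the first working agent, consulting the trader then the cache."""
    try:
        agents = trader_query()          # re-consult the WebTrader
        cache[:] = agents                # refresh the cached list
    except ConnectionError:
        agents = cache                   # trader down: use prior results
    for agent in sorted(agents, key=lambda a: a["cost"]):
        if agent["alive"]:               # go down the list until one works
            return agent
    return None

def live_trader():
    return [{"name": "blue-1", "cost": 3, "alive": False},
            {"name": "blue-2", "cost": 0, "alive": True}]

def dead_trader():
    raise ConnectionError("WebTrader unavailable")

fresh = rebind(live_trader, [])
fallback = rebind(dead_trader, [{"name": "old-blue", "cost": 5, "alive": True}])
```

The cached list is what lets the client keep working through a trader outage, at the cost of possibly stale bindings.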
WebTrader Query
Tool
Our status in Salt Lake City was that we had a better
understanding of both webtrading and the deep search problem, but in separate
implementations. The next and main phase of the WebTrader project
was to develop WebTrader Query Tool, a re-designed and re-implemented next
generation WebTrader/DeepSearch capability that combined and generalized
all functionality in a single prototype. This next generation WebTrader
was engineered so that the front-end works as an applet with good performance
on most browsers (tested on Internet Explorer and Netscape) and the backend
is a servlet.
In a demonstration from the CoABS Boston Workshop
(August 2000), WebTrader Query Tool was coupled with a modified Java App
Builder (a visual programming tool), extended with WebTrader drag-and-drop
capability. In the demo, the user creates a want ad called
ColorButtonsSpec.xml, drags it to WebTrader Query Tool, runs a query which
returns pages containing ads for colored buttons, then selected buttons
are dragged to the WebTrader App Builder and connected together to build
a simple application.
WebTrader Query Tool is a generalization, redesign,
and reimplementation of previous work in numerous ways -- the more significant
changes are listed below. Almost all code was rewritten from the
Salt Lake City implementation.
-
advertisements - We redesigned the XML advertisements.
The new design included <!ELEMENT resource ANY> as described
above. We redesigned the WebTrader's JavaRMI DTD (now called JavaRMIObject),
modified its CGI DTD, and modified all the example ad web pages to reflect
these changes. We also created a new ad type (via a DTD) called SiteSearch
to advertise local search engines. We located several local site
search engines that indexed content having to do with agents and manually
created ads for them. Later, we developed heuristics for discovering
unadvertised local search engines on webpages. If one is found, we include
it as a sub-result of that page and add it to the ad repository - this
kind of discovery is a general approach to populating ad repositories.
-
applet-servlet architecture - We separated
searching from displaying the results, and implemented an applet-servlet
architecture. We reimplemented the client-side
GUI as an applet, around 30KB in Java 1.02 for widest portability, about
half of which is the Tree GUI component. The tree component is
used to record query histories and multiple live searches, and supports canceling
and deleting queries and adding increments to a search (e.g., the top 20, 50, or 100
search results). The applet also supports a polling capability
to handle the deep results as well as the top-level ones, plus scrollbars,
and was designed to reduce the number of messages
between the servlet and applet, so that web-sized scaling of the search
service can realistically be achieved.
-
inputs - WebTrader Query Tool can take either
search terms or want ads as input. We made WebTrader Java Web Start-able,
i.e., one can download and run DeepQ as a desktop application with one click.
When run on JVM 1.2 or later, such as via Java Web Start (JWS) or the Java
Plugin, then dragging the WebTrader results over to the Web Trader App
Builder (WTAB) is automatically enabled.
-
target search engines - We developed a modular search engine results
parser and specialized it to Webinator and Google, and later to Open Directory.
This kind of parsing is an ongoing effort, and not
just with Google, as search engines all seem to tweak their output syntax from time
to time. We redesigned and implemented the match information returned
to clients. Originally we expected to return W3C DOM parse trees
of the XML ad resource to clients as part of this information and build
convenience routines for clients to pick out desired information from the
resource (such as the URL of a Java RMI object to connect to in order to get the
service advertised in the ad). However, the DOM objects generated
by Sun's XML parser are not (directly) serializable and thus not able to
be sent through the RMI link the WebTrader uses between its client and
server parts. So, instead we return the raw XML of the resource and
use SAX-based XML parsing tools on the client side to accomplish the same
thing, a better design since we are sending less information and SAX is
lighter weight than DOM. Later, we developed an algorithm
to accurately parse the results of foreign search engines encountered on
the fly (or fail safely).
-
matchmaking algorithm - We modified the WebTrader
metadata matching algorithm to more successfully match metadata dynamically
created from user-entered search keywords. Also, we developed a
generic scoring model.
-
federation - We developed a new WebTrader
federation design. One insight from the new design of the WebTrader
federation scheme is that a WebTrader does not have to wait to determine
that it cannot find any or enough matches to satisfy a client; at any point
it comes across an ad for another WebTrader that looks promising it can
propagate its query to it, and have all the results dynamically merged
(sorted) on the client end. In fact, there is probably no other
practical way - no WebTrader has time to wait for secondary WebTraders (and so
on) to build up a complete list of results - all results must be streamed dynamically
to be timely.
-
performance - We implemented a new result
polling capability. Now all search engines are equally controllable
from the applet, instead of the special consideration given to the top-level
engines in the old design. The GUI no longer keeps up a constant
connection to the servlet, but instead polls for results.
-
security - We modified the applet GUI to better
work as a JWS app for the situation where there is no web page that the
applet is embedded in.
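The streaming-merge insight in the federation bullet above can be sketched with lazy generators. The trader contents and scores here are illustrative assumptions; the real system used Java RMI streams, not Python generators.

```python
import heapq

# Sketch of the federated-merge idea: each WebTrader streams (score, ad)
# results as it finds them, and the client merges the streams dynamically
# instead of waiting for any secondary trader to finish. The traders and
# their contents are illustrative assumptions.

def usa_trader():
    yield (0.9, "blue agent @ usa")
    yield (0.4, "white agent @ usa")

def euro_trader():
    yield (0.7, "yellow agent @ euro")

def merged_results(*traders):
    """Merge already-sorted (descending) result streams as they arrive."""
    # heapq.merge is lazy: it pulls from each stream incrementally, so
    # results reach the client without waiting for slow traders to finish.
    return heapq.merge(*traders, key=lambda r: r[0], reverse=True)

ranked = [ad for _, ad in merged_results(usa_trader(), euro_trader())]
```

The key property is that the merge never blocks on a complete result list from any one trader; sorting happens incrementally on the client end.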
Gridifying WebTrader
The CoABS grid is a JINI-based implementation
of an agent interoperability platform developed by GITI, the DARPA CoABS
program integration contractor. It is an important, on-going experiment
in agent system interoperability. As described elsewhere, we contributed
architectural ideas to the grid. But in addition, we developed three
standalone agent components (eGents, WebTrader, and AgentGram) that can
play a role as (stand alone or connected) grid components or services.
As part of the Agility WebTrader project, we developed the grid-relevant
capabilities described below. At the same time, we note that WebTrader
is itself a potentially pervasive stand-alone grid capability that can
supply matchmaking for agent, object or ontology systems.
For the Science Fair (October 1999), we gridified the WebTrader Agent
(WTA). Previously, in NEO TIE #2 (see description below), the WebTraderAgent
(WTA) was used as a WAN matchmaker to provide SRI OAA and USC/ISI
Ariadne the ability to dynamically extend its information sources based
on user queries it received. For the Science Fair, we modified the
WTA to query the Grid for any WebTraders registered there. If none
is found (or the lookup times out), the WTA automatically falls back to its local
embedded WebTrader. This essentially extends the CoABS grid JINI lookup
service to be a WAN lookup service. The implementation involved creating
the WebTrader Grid Service (WTGS), a Java program that registers a WebTrader
on the CoABS Grid and logs its actions, along with a new version of WebTrader
Agent (WTA) that lists WebTraders found on the Grid. We worked with
Hank Seebeck and Adam Wenchel (both GITI) to test WTA on the grid.
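The WTA's grid-lookup-with-fallback behavior can be sketched as below. The lookup callables are hypothetical stand-ins, not the CoABS Grid (JINI) API.

```python
# Sketch of the WTA's fallback logic: ask the Grid for registered
# WebTraders, and on timeout or an empty result fall back to the local
# embedded WebTrader. The lookup functions are illustrative assumptions,
# not the actual CoABS Grid API.

def choose_trader(grid_lookup, local_trader):
    try:
        traders = grid_lookup()          # query the Grid for WebTraders
    except TimeoutError:
        traders = []                     # treat timeout like "none found"
    return traders[0] if traders else local_trader

def grid_down():
    raise TimeoutError

grid_choice = choose_trader(lambda: ["grid-webtrader"], "local-webtrader")
fallback_choice = choose_trader(grid_down, "local-webtrader")
```

Because the fallback is a Web-wide trader rather than a LAN lookup, this is what extends the Grid's JINI lookup into a WAN lookup service.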
For the Boston CoABS Workshop (August 2000), we upgraded WebTrader Query
Tool to use the grid in the same way as shown in the diagram below.
We also created a launch page for a CoABS 7x24 grid accessible version
of DeepQ. WTA was maintained on the CoABS 7x24 grid for approximately
a year.
Technology Transition
In 1999, we applied WebTrader in the DARPA CoABS Non-combatant Evacuation
Order (NEO) Technology Integration Experiments (TIE). The NEO
TIE involved an urban rescue effort and served to organize many CoABS program
activities in the first year of the CoABS program. The thrust was
on agent interoperability and rapid assembly of heterogeneous agent systems
to solve problems. Lessons learned and components from this exercise
were later incorporated into the CoABS grid. TIE
#2 involved the scenario Find
Civilians, Get Them to Embassy, and the interaction scenario below
shows not only what information is required but also the role of various
CoABS subsystems: OBJS WebTrader, OBJS AgentGram/MBNLI, ISI Ariadne
(which extracts information from Web pages), and SRI Open Agent Architecture
(OAA, which enables inter-agent communication). Steve Minton (ISI)
coordinated the TIE. For the NEO TIE, we developed WebTraderAgent,
an OAA wrapper for a WebTrader. This effectively added a Web-wide
trader to OAA. In the demo, when a new service or data source is
needed, the WebTraderAgent is consulted to locate a WebTrader which can
then be consulted to locate advertisements for matching services.
In addition to building the OAA wrapper, this involved porting WebTraderAgent
to JDK 1.2 (Java 2), creating a stripped-down security policy for WebTraderAgent,
and using an executable jar file to house the agent.
In March 2000, we presented work on Agent
Discovery Matchmaking to OMG, essentially, a sketch of a reference
model for traders that subsumes WebTrader as well as conventional agent
matchmakers and LAN traders like OMG's Trader.
Next Steps
The main puzzle preventing widespread use of WebTrader is the problem of
populating the Web with advertisements. Since advertisements can
be about anything, this is essentially the ontology problem. The
entire DARPA DAML program as well as W3C RDF and semantic web communities
are working this problem. Universal Description, Discovery, and Integration
(UDDI) has recently been working to register businesses and their services.
Finally, various Want Ad repositories have developed interesting proprietary
ontologies. Since WebTrader's architecture is relatively independent
of resource descriptions, the architecture can use any or several of these
ontology representations. Both DAML and UDDI are examples of manually
creating advertisements. An alternative is to automate this process
(insofar as possible). A potential source of ads is the Web itself,
namely, mining Web pages for ads of interest, especially in tractable domains.
Alta Vista has done this with various easy-to-recognize data types like
images and MPEGs. It might not be hard to find certain other kinds
of elements, for instance, banner ads. Similarly, our DeepSearch
project has developed ad hoc recognizers for locating local search engines
on Web pages. Of course, both manual and automated means of populating
the ad space would be useful.
It is worth noting that the basic design of WebTrader appears to be
that of an open world repository, that is, ads can come from any
pages that search engines index. Interestingly, it is not a large
change to convert WebTrader into a closed world search engine only
containing ads that are inserted into its database (instead of SE index).
This might make more sense for some kinds of services, e.g. resume services
or a company's web services, where the service provider might want control
over who can advertise.
Although robust in several ways, the WebTrader implementation is still
a prototype. More work is needed in several areas:
-
We would like to explore type specific matchmaking
so different matchmakers could be plugged into WebTrader on-the-fly depending
on the type of advertisement. We ran into this requirement in the
NEO TIE, where it would have been useful to match the predicate, the arguments,
and the variables in OAA solvables so only exact matches would be found.
-
We need more work on the algorithm for parsing results of search engines
to extend the meta search capability.
-
For many purposes, it makes more sense to index advertisements
during the crawling phase rather than when pages containing ads are returned
from search engines.
-
We would like to remove the dependence on Thunderstone's
Webinator which requires a live Internet connection (so it can check the
online license at Thunderstone), does not always rank the results of OR
queries correctly, does not support some of the query logic we need, and
does not index XML pages. Open Directory ISearch is a potential alternative.
-
We would like to use toolkits like WebL and XWRAP
Elite for web page content extraction as a way to locate search engines
on web pages faster.
This research is sponsored by the Defense Advanced Research
Projects Agency and managed by the U.S. Air Force Research Laboratory under
contract F30602-98-C-0159. The views and conclusions contained in this
document are those of the authors and should not be interpreted as representing
the official policies, either expressed or implied, of the Defense Advanced
Research Projects Agency, U.S. Air Force Research Laboratory, or the United
States Government.
© Copyright 1998, 1999, 2000, 2001 Object Services
and Consulting, Inc. All rights reserved. Permission is granted
to copy this document provided this copyright statement is retained in
all copies. Disclaimer: OBJS does not warrant the accuracy or completeness
of the information in this survey.
Last revised: October 2001. Send comments
to
Craig Thompson.
Acknowledgements: Paul Pazandak did most of
the design and implementation with Craig Thompson providing brainstorming
and review. Steve Ford installed and demoed MBNLI at CoABS Workshops
that Pazandak did not attend. Thompson completed the CoAX TIE demo.