The technical objective of this project was to improve the foundation for Web and object model integration. Several candidate technologies from different communities exist -- for instance, HTML and MIME from the Internet community; Webserver from the DARPA JTF/ATD command and control community; ORBs, IDL, Tagged Data from OMG; Java and ActiveX from the component community; Harvest SOIF and Netscape RDM from the search engine community; ODMG ODL and Tsimmis OEM from the database community; Dublin Core, Warwick Framework, and related work from the Internet metadata and digital libraries communities; two different specifications for document object models from Netscape and Microsoft; and RDF and XML from W3C.
Our approach has been to identify the main contender approaches, identify any deficiencies in each approach, identify a convergence approach centered around the use of XML as a basic representation, fill in (some of) the gaps, and transfer our results and lessons learned directly and incrementally to DoD command and control projects like DARPA Advanced Information Technology Services (AITS) Architecture and industry standards organizations, primarily W3C and OMG.
The results of our work have been the identification of a key technical framework for integrating Web and object technologies, a number of Technical Reports and external publications describing this approach, a prototype illustrating one of the possible Web object construction mechanisms, and the injection of ideas from this work into the activities of OMG and W3C.
There is much ongoing work within both the Web and database communities on data structure enhancements to address these issues. Work on similar issues is ongoing within the Object Management Group (OMG) as well. This work has contributed valuable ideas, and the various proposals illustrate similar basic concepts, generally a movement toward some form of simple object model. However, these similarities are often obscured by detailed representational differences, and the work is fragmented and lacks a unifying framework. As a result, individual proposals often lack key capabilities that are in some cases contained in other proposals. Moreover, in most cases these proposals are not well integrated with key areas of emerging industry consensus on Web data structuring technologies.
If the Internet is to develop to support advanced application requirements, there is a need for both richer individual data structuring mechanisms, and a unifying overall framework which supports heterogeneous representations and extensibility and provides metalevel concepts for describing and integrating them.
We also identified a potential formal basis for applying database operations, such as query and view operators, to the resulting structures, based on object logics such as F-logic. These logics provide limited second-order capabilities for dealing with the metalevel concepts, while using first-order semantics, which provides for computational efficiency and tractability.
In addition, to promote the integration of the technologies we have identified, we made our results available to DARPA AITS architecture projects, published technical reports and papers, and made our results available to both W3C and OMG, via published results and presentations and via participation in the technical activities of these groups.
This work, which integrates data, metadata, and object capabilities from both database and key emerging Web technologies, will be crucial in integrating object service, Web, and database technologies in a deep and efficient manner to support increasingly demanding enterprise-scale applications.
The World Wide Web Consortium (W3C) Resource Description Framework (RDF) effort <http://www.w3.org/Metadata/RDF/> extends the PICS technology for labeling Internet content to support more general metadata requirements. Related work includes Netscape's Meta Content Framework (MCF) <http://www.w3.org/TR/NOTE-MCF-XML/> and Microsoft's XML-Data <http://www.w3.org/TR/1998/NOTE-XML-data>. These efforts define what are effectively metadata type systems, based on collections of attribute/value pairs. They provide a core of good ideas for supporting metadata, such as explicit links from pages representing resources to metadata describing them. However, there are important differences among the various approaches, and the approaches are not integrated with other parallel work, such as the Document Object Model (see below).
The RDF and related work define mappings to the Extensible Markup Language (XML) <http://www.w3.org/XML/>, a W3C Recommendation (adopted specification). XML, which is a subset of SGML, allows creation of customized markup languages incorporating user-defined tags and a standardized way of describing those languages (DTDs) that can be understood by generalized clients. XML thus provides direct support for using tagged data items (attribute/value pairs) in Web resources, as opposed to the current need to use ad hoc encodings of data items in terms of HTML tags. XML DTDs are similar in some ways to database schemas, and thus provide a natural target for database information. The linking of resources with their DTDs is similar to the association of a database record with its schema type, and to the association of an object with its type or class definition. The hypertext linking capabilities of XML are greater than those of HTML, including bidirectional and multiway links, and links to spans of text. Work is also underway on tying XML to Java. XML has considerable industry support (e.g., both Netscape and Microsoft). However, XML provides only basic tagged value support. Additional concepts must be added to apply it to extended data and metadata structuring requirements (as illustrated by RDF and related efforts).
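The idea of tagged data items can be illustrated with a minimal sketch. It assumes a hypothetical document whose element names (unit, name, status, strength) are invented for illustration and do not come from any published DTD, and it uses the XML parser in the Python standard library rather than any tool from this project.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with user-defined tags (names are illustrative only).
DOC = """<unit>
  <name>Alpha</name>
  <status>ready</status>
  <strength>120</strength>
</unit>"""


def tagged_values(xml_text):
    """Return the root's child elements as (tag, text) pairs."""
    root = ET.fromstring(xml_text)
    return [(child.tag, child.text) for child in root]


pairs = tagged_values(DOC)
for tag, value in pairs:
    print(f"{tag} = {value}")
```

Because each value carries its own tag, a generic client can recover the attribute/value pairs without knowing any page-specific HTML layout conventions.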
W3C's Document Object Model (DOM) effort <http://www.w3.org/DOM/>, based on Dynamic HTML facilities defined by Microsoft and Netscape, extends HTML with an object model allowing scripts or programs to change styles and attributes of page elements (or objects) or even to replace existing elements (or objects) with new ones. This provides a basic way to integrate a page's data with code in the page and provides an explicit metalevel and API. Current W3C specifications provide a DOM for XML as well as for HTML. However, as currently defined, these capabilities are not sufficiently tailorable or general. For example, current specifications lack support for integrating code not co-located on the page (e.g., code that already exists on the client) or for defining application-specific objects based on data on the page, and the work is currently not integrated with metadata work such as RDF.
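As a rough illustration of the kind of manipulation a DOM API allows, the following sketch changes an attribute of an element and replaces its content. It assumes a hypothetical page fragment (the page and para element names and the style values are invented) and uses the DOM implementation in the Python standard library, not a browser's scripting environment.

```python
from xml.dom import minidom

# Hypothetical page fragment; element and attribute names are illustrative.
PAGE = '<page><para id="p1" style="plain">old text</para></page>'


def restyle_and_replace(xml_text):
    """Change one attribute and swap the text node of the first <para>."""
    doc = minidom.parseString(xml_text)
    para = doc.getElementsByTagName("para")[0]
    # Change a style attribute of an existing element.
    para.setAttribute("style", "highlight")
    # Replace the element's existing text node with a new one.
    new_text = doc.createTextNode("new text")
    para.replaceChild(new_text, para.firstChild)
    return doc.documentElement.toxml()


result = restyle_and_replace(PAGE)
print(result)
```

Here the code lives outside the page and operates on it only through the DOM interfaces, which is one way to picture the separation of data and behavior discussed above.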
Stanford's Tsimmis Object Exchange Model (OEM) and related work by others (e.g., U. Penn.) have also based metadata models on collections of attribute/value pairs, together with extensions such as reifying individual attributes by assigning identifiers to them. This work provides a valuable core of ideas for applying database concepts to this type of data. However, the metadata capabilities of these structures are somewhat limited. They do not explicitly consider capturing type and schema information where it exists, or linking that type information to the structures it describes. The work is also not well integrated with emerging Web technologies such as XML, DOM, and RDF that are likely to change the basic nature of the Web's representation. Finally, an assumption behind these database approaches so far, which in part explains their limited technical success, has been that the problem they address is querying text bases with largely syntactic structure, of the kind supported by HTML. XML-based approaches provide a higher-level, more semantic representational structure, and can start from the assumption that information authors themselves have the support they need to provide more semantic structure information.
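A structure of nested attribute/value pairs, each reified with its own identifier, might be sketched as follows. This is only an approximation in the spirit of such models, not the actual OEM definition; the Item class, the find helper, and the sample labels are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Union

_ids = count(1)  # hypothetical source of fresh identifiers


@dataclass
class Item:
    """One attribute/value pair, reified by giving it its own identifier."""
    label: str
    value: Union[str, int, List["Item"]]
    oid: int = field(default_factory=lambda: next(_ids))


def find(item, path):
    """Follow a list of labels down through nested items; None if no match."""
    if not path:
        return item
    if not isinstance(item.value, list):
        return None
    for child in item.value:
        if child.label == path[0]:
            found = find(child, path[1:])
            if found is not None:
                return found
    return None


name = Item("name", "sample")
size = Item("size", 3)
record = Item("record", [name, size])
print(find(record, ["size"]))
```

Note that nothing in this structure records a type or schema for the record, which is the limitation described above.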
Finally, the OMG has identified a number of requirements similar to those found in the context of the Web. An example is a recent Tagged Data RFP. These requirements involve the use of tagged data items to support semantics-based information exchange between applications, and also support for nesting and the ability to locate objects via tags through layers of nesting. Such high-level communication is considered important in OMG's attempts to define Business Object capabilities. OMG's Property Service provides similar capabilities. These are of interest in showing the recognized need for data organizations, similar to those described above, within OMG's object-oriented distributed architecture. However, these are not yet fully coordinated with emerging Web or database representations.
We completed a Technical Report, Towards a Web Object Model <http://www.objs.com/OSA/wom.htm>. This report:
This Technical Report has been widely read on the Web. As a result, we were asked to write the following invited papers:
In conjunction with the OSA/Intermediary Architecture subproject, we also developed a prototype of an extended XML parser which can generate application-specific objects from XML documents, in order to experiment with one form of Web object construction mechanism. This prototype uses XML-defined metadata added to XML documents to define associations between object classes and the XML elements in the document. A White Paper <http://www.objs.com/OSA/XML-to-Java-Mapping.html> describing this work was also completed.
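The general idea of constructing application-specific objects from metadata carried in the document can be sketched as below. This is not the prototype's actual format or code: the bindings and bind elements, the element and class attributes, the Point and Label classes, and the class registry are all assumptions made for illustration, and the sketch is written in Python rather than Java.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Point:  # hypothetical application class
    x: str
    y: str


@dataclass
class Label:  # hypothetical application class
    text: str


# Explicit registry of classes the parser is allowed to instantiate.
CLASS_REGISTRY = {"Point": Point, "Label": Label}

# Hypothetical document: the <bindings> section is the embedded metadata.
DOC = """<drawing>
  <bindings>
    <bind element="point" class="Point"/>
    <bind element="label" class="Label"/>
  </bindings>
  <point x="1" y="2"/>
  <label text="origin"/>
  <note>unbound elements are skipped</note>
</drawing>"""


def build_objects(xml_text, registry):
    """Create one object per top-level element that has a class binding."""
    root = ET.fromstring(xml_text)
    bindings = {b.get("element"): registry[b.get("class")]
                for b in root.iter("bind")}
    objects = []
    for elem in root:
        cls = bindings.get(elem.tag)
        if cls is not None:
            # XML attributes become constructor keyword arguments.
            objects.append(cls(**elem.attrib))
    return objects


objects = build_objects(DOC, CLASS_REGISTRY)
print(objects)
```

Keeping the element-to-class association in metadata rather than in the parser means the same document could be bound to different classes by changing only the bindings.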
We also helped form a Web/OMA Integration Working Group of the OMG Internet SIG <http://www.objs.com/isig/home.htm>, with the general goals of:
Our participation in W3C activities has been only moderate (although OBJS is a W3C member), but we have submitted input on coordinating the various W3C metadata-related activities, and participated in technical interchanges on W3C-related email lists.
There is a need to better understand how standards for defining representations, such as XML, and standards for defining interfaces, such as CORBA, can be used together in providing enhanced interoperability. Distributed object architectures such as CORBA have tended to emphasize interface standards, while the Internet has tended to emphasize representation standards. However, the two approaches are clearly complementary, examples being the role of IIOP in providing CORBA interoperability, and the role of the DOM (essentially a set of interfaces) in providing a means to add behavior to Web pages. Moreover, the two forms of standards will increasingly be used together as, for example, CORBA-based systems increasingly deal with data in domain-specific standard representations.
The concept of "objects" in the context of the Web should not necessarily be identical to that of "objects" in a programming language or conventional distributed object system. The Web generally supports a philosophy of "loose-coupling" (e.g., of data and processing), which makes it highly flexible. This essential flexibility should be preserved in the Web's further technical development, given the diversity and heterogeneity both of the applications the Web must support, and the data and processing resources the Web makes available for possible integration. This means, among other things, that technology integration must be modular, and it must be possible to easily alter connections between data and processing resources to adapt to new requirements. The general approach we have identified attempts to take these requirements into consideration.
The Web's standards process is in many respects still maturing. The W3C has made tremendous progress, and done some outstanding technical work, but the incorporation of this work into widely available commercial products is somewhat spotty. This is to some extent because the demand for standards compliance is still weak compared with the demand for new features. The increasing use of the Web for larger-scale and enterprise-critical applications will create much of the required pressure for standards compliance.
Another obvious next step is the development of database-like capabilities based on Web technologies such as XML, RDF, and DOM. We had originally intended to work on such capabilities (in particular, query facilities) in this project. However, we did not pursue this activity due to a decision to concentrate on Web/object integration, as providing the basic foundation for this and other work. The database community has defined extended query facilities (e.g., Lorel, UnQL) to support their semistructured data representations. The database community has also developed query facilities, together with formal underpinnings, for SGML structures (e.g., OQL-doc). Developments of this type of technology have begun to address Web requirements, e.g., the recent XML-QL submission to W3C <http://www.w3.org/TR/NOTE-xml-ql>, but further work is required in this area. Query-like capabilities also play important roles in both formatting specifications (a limited query notation for identifying parts of SGML structures called SDQL exists within the ISO DSSSL standard for formatting SGML documents, and a similar notation exists within XSL) and more advanced Web addressing mechanisms (e.g., the XML linking capabilities). The possible integration of these query capabilities is worth investigating.
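To give a flavor of a query-like capability over XML structures, the sketch below selects elements by an attribute value and returns a nested field. It does not use XML-QL, Lorel, UnQL, or SDQL; it relies on the limited path-expression support in the Python standard library, and the catalog document and its element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; element and attribute names are illustrative.
CATALOG = """<catalog>
  <report year="1997"><title>First report</title></report>
  <report year="1998"><title>Second report</title></report>
  <paper year="1998"><title>Invited paper</title></paper>
</catalog>"""


def titles_for_year(xml_text, year):
    """Return titles of all entries, of any element type, with the given year."""
    root = ET.fromstring(xml_text)
    # Path expression: any descendant with a matching year attribute,
    # then its <title> child.
    matches = root.findall(f".//*[@year='{year}']/title")
    return [t.text for t in matches]


print(titles_for_year(CATALOG, "1998"))
```

The same path notation serves both as a selection condition and as an addressing mechanism, which hints at why integrating query, formatting, and linking notations may be worthwhile.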
This project has identified the foundational basis for supporting more complex data structures and services in the Internet without requiring major departures from current emerging Web technology. The work also provides guidance toward rationalizing further developments within the Web and OMG communities for better integrating Web and object technologies.
A program immediately benefiting from this project would be the DARPA AITS Architecture project, especially its Webserver component, since the approach we have identified is expected to provide the benefits of the current idiosyncratic Webserver architecture but in a form compatible with emerging industry standards. More broadly, our approach provides a sound direction for combining Web and object technologies into a richer knowledge-based representation, which should benefit both a knowledge-based Web and enterprise computing.
© Copyright 1997, 1998 Object Services and Consulting,
Inc. Permission is granted to copy this document provided this copyright
statement is retained in all copies. Disclaimer: OBJS does not warrant
the accuracy or completeness of the information in this document.
Last revised: September 15, 1998. Send comments
to Frank Manola.