Notes on ontologies (and their relevance to service
trading in an internet service market)
Venu Vasudevan
Note about the title: At this point it is more
the former than the latter, but we'll get there
Why the interest in ontologies
To paraphrase [Chandra], "the
current interest in ontologies is the latest version of AI's alternation
of focus between mechanism theories and content theories. The ontology
realization amounts to the fact that no matter what magic you have in the
problem solver (fuzzy logic, neural nets, frame language etc.), it cannot
do much good without a content theory of the domain. Further, given a good
content theory, different mechanisms may be used to build effective systems".
Similar realizations have occurred independently in the software engineering
community, the database community, the workflow community and others.
Domain modelling has become an important research topic in software engineering
because tools with different domain models can't talk to each other: they
say the same thing in different (incompatible) ways. For the database
community, it is hard to integrate data from databases that aren't consistent
about how they model the world. Multiple workflow systems need to agree
on concepts such as processes, actors, and resource "usage" by activities,
so as to exchange workflow and workflow execution information
across workflow agents (thus PIF
defines a workflow ontology). All these are versions of the ontology
problem amongst human/programmed agents.
Ontologies: Various perspectives
There is an ontology problem with ontologies. While
everybody agrees ontologies are important, there is debate about
where the dividing line is between ontologies and a number of other approaches
(e.g. object models) to representing concepts and conceptualizations. The
most frequently quoted definition of an ontology is Gruber's: an
ontology is a specification of a conceptualization. The quibbling arises
when you dive deeper and ask: "how formal or rich does this specification
need to be before one can call it an ontology?" The AI community views
ontologies as formal logical theories whereby you are not only defining
terms and relationships, but formally defining the context in which a
term (or relationship) applies, and the implied facts and relationships.
These ontological theories are formal enough to be testable for soundness
and completeness by theorem provers. An AI ontology approach can
fully define an "attack aircraft" as: "an attack aircraft is a fixed-wing
aircraft that has been assigned to a combat mission that is one of
those in the group SEAD". In contrast, databases and other communities
view ontologies more as object models, taxonomies and schemas, and do not
explicitly express constraints such as the context in which a fixed-wing
aircraft turns into an attack aircraft. Linguistic ontologies (e.g.
WordNet) and thesauri express various relationships between concepts (e.g.
synonymy, antonymy, is_a, contains_a), but do not explicitly and
formally describe what a concept means.
In addition to the formality and rigor dimensions,
ontologies can be classified along dimensions of coverage, guiding
principle, and point of view. Upper ontologies such as
CYC aim to cover concepts that are common across domains (e.g. common-sense
reasoning in the case of CYC), while domain ontologies focus on a single
domain (evacuation operations, medicine etc.). The point of view
of an ontology is the kinds of concepts for which the ontology forms a
theory, and it need not be a domain (e.g. mission
planning). For instance, problem solving ontologies describe the
strategies taken by problem solvers to attack domain problems, and theory
ontologies might describe concepts of space, time, causality, plans
etc. The guiding principle of an ontology is the principle by which the
concepts and relationships in the ontology were chosen. WordNet and
other ontologies use linguistics as a guide for identifying the concepts
that should be in the ontology (particularly the upper-level concepts).
Linguistics is not the only guidance that could be used; it might be possible,
for example, to use conceptual clustering as a guide for ontology structure.
How are ontologies used
One of the first motivators for ontologies was
sharing domain knowledge across problem solvers. As Patil points
out, if two problem solvers use different conceptualizations of the same
world, there can be no sharing of the knowledge bases, and mapping concepts
from one KB to the other can be tricky business. So the idea was to use
logics (e.g. KIF) to define an ontological theory, which could then be translated
to the particular problem solver's formalism (KEE, CLIPS, OPS-5 etc.).
For this kind of knowledge reuse, the ontology needs to be semantically
rich.
Even if a single kind of problem solver (e.g. KEE)
is being used, ontologies are deemed a useful knowledge structuring
mechanism. A whole bunch of KEE users could design their knowledge bases
so that each consists of an ontology (a shared KB with only the sharable assertions)
plus non-sharable sets of facts and assertions that apply only to a particular
situation that not everybody is interested in.
Closer to the world of tool integration, enterprise
integration people realized that tools used internal models of the enterprise
that were intersecting and incompatible. An enterprise might use multiple
workflow systems, each one of which used the concepts "resource", "completion
time", "owner of a step" etc. in different ways, or didn't have a common
terminology. The solution was to a) define an enterprise ontology, and b) exchange
results between workflow systems A and B by translating A's state to the
enterprise ontology, which was then read in by B. The ontology became a
common format (an interlingua), with diverse models providing readers
of and writers to this ontology.
A more basic version of the interlingua idea is
the database view of ontologies. Here information sources, their capabilities,
and their pedigrees are mapped to a common ontology (e.g. UML). This then
allows these repositories to be federated by a federation system.
A slew of mediator-based multi-database architectures thus use ontologies
in one form or another.
Ontologies: Principles, Methods and Applications
(Mike Uschold, Michael Gruninger)
People and programs use different models of the world
in solving the same problem. This causes problems in both human and programmatic
inter-communication. Ontologies specify a domain/world model, often in
concepts/relationships/processes/... lingo, which, if expressed (and pointed
to) explicitly, will allow programs and people to avoid misunderstandings.
Ontology is an overloaded term, and everything from object models to database
schemas to logical theories (not just objects and relationships, but assertions
and inferences) serves an ontological purpose for a given domain and given
set of applications. For a single domain and a narrow set of problems,
a relational db schema is an ontology, in that applications that share the
schema's "view of the world" can interoperate. As you try to build a knowledge
model that cuts across domains and unifies radically different world views
(say a logistics application and a military course-of-action application),
concepts, relationships and contexts get hairy, and you might have to "say
a lot to say what you mean". In this multi-domain situation, one might
need to formally define not only concepts but contexts as well.
Ontologies have several different uses. To those
who build AI problem solvers, they allow problem solvers to reuse each
other's knowledge bases, or at least minimize the pain of translation (Patil
makes the same point). If two knowledge bases agree that they mean the
same thing when they talk about "strut", the problem of translating from
a CLIPS KB to a KEE KB is simplified to a data translation problem (as
opposed to a "gee, are we even talking about the same thing" problem). To
repository people, an ontology provides a common representation into
which they can translate their repositories, thus avoiding the O(n^2)
translator problem: n repositories need n(n-1) pairwise translators, but
only 2n readers/writers against a shared ontology.
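To make the interlingua idea concrete, here is a minimal sketch (all names are hypothetical illustrations, not from any of the papers): each tool registers one reader and one writer against the shared ontology, and any-to-any translation composes the two.

```python
# Sketch of the interlingua pattern: every tool registers a reader (tool
# format -> common ontology) and a writer (common ontology -> tool format).
# Translating between any two tools composes reader + writer, so n tools
# need 2n adapters rather than n*(n-1) pairwise translators.
# All names here are hypothetical, not an API from the papers.

readers = {}  # tool name -> function(tool record) -> common-ontology dict
writers = {}  # tool name -> function(common-ontology dict) -> tool record

def register(tool, reader, writer):
    readers[tool], writers[tool] = reader, writer

def translate(record, source, target):
    """Translate a record from one tool's model to another via the ontology."""
    common = readers[source](record)   # source model -> interlingua
    return writers[target](common)     # interlingua -> target model

# Two workflow tools that disagree on terminology for the same concepts.
register("wf_a",
         lambda r: {"actor": r["owner"], "deadline": r["completion_time"]},
         lambda c: {"owner": c["actor"], "completion_time": c["deadline"]})
register("wf_b",
         lambda r: {"actor": r["assignee"], "deadline": r["due"]},
         lambda c: {"assignee": c["actor"], "due": c["deadline"]})

print(translate({"owner": "alice", "completion_time": "Fri"}, "wf_a", "wf_b"))
# -> {'assignee': 'alice', 'due': 'Fri'}
```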
Clearly, if the ontology is the glue for all your
tools, building a bad ontology is a bad idea. The rest of this paper gives
guidelines for building good ontologies, which is not of immediate
interest to me.
Scalable Knowledge Composition (Gio's work - see
http://www-db.stanford.edu/LIC/SK.html)
Gio has a series of papers on the topic of resolving
semantic heterogeneity in information systems. These include:
- "An Algebra for Ontology Composition"
- "Interoperation, Mediation and Ontologies"
- "Composing Diverse Ontologies" (IFIP paper)
- "Large Scale Information Systems: Current Research and Future Opportunities" (presentation at an I**3 workshop)
The collective summary of these papers is below.
The semantics of information sources are captured
by their ontologies (i.e. the terms and relationships they use = their domain
of discourse). To support coherent querying of multiple overlapping
information sources, we need to use ontologies to understand and compensate
for the overlap in "world views", or actual data, between the information
sources. Integrating information from different sources without an understanding
of ontologies will lead to duplicate data that just looks different, missing
data that is actually there in the information source, multiple inconsistent
views of the same information (e.g. same information, different fidelities),
information that belongs at different points in the timeline etc. So
"unintelligent I**2" will give you more data, less knowledge, and
lead you to make wrong conclusions.
Gio assumes that ontologies are pre-developed
for non-collaborating repositories. These ontologies define not only the
concepts that model the repository content (e.g. shoes, their manufacturers
etc.) but also the pedigree (is the information authoritative, how recent
is it), wrapper smarts etc. Given that an I**3 application has to be built
on multiple non-collaborating repositories (each with its own ontology,
the ontology being relevant in a context), we have an ontology
composition problem, which is what Gio is addressing. So, continuing,
contexts provide guarantees about the exported knowledge, and the inferences
feasible over the knowledge. Context would then include stuff like: schema
of the source, supported queries, pedigree of the data (and/or authoritativeness
of the data provider), latency and accuracy of data etc. The application
now has to operate across a third ontology (with its own context) which
is some subset of the combined ontologies of the information sources it
is operating on. Gio proposes an ontology algebra (with set operations)
by which an application ontology can be defined over multiple resource
ontologies using set operators. The papers deal with implementing such
an algebra using lower-level rules that relate concepts in one ontology
with concepts in the other (e.g. factory shoe color number #43423 is shoe
store color "pink").
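To make the algebra concrete, a toy sketch (the set representation and rule format are my guesses at the flavor, not Gio's actual formalism):

```python
# Toy ontology algebra: treat an ontology as a set of concept names, with
# articulation rules mapping concepts of one ontology onto the other's
# terms. Set operators over the aligned ontologies then define derived
# ontologies. My own illustration, not the formalism in the papers.

factory = {"shoe", "color_43423", "production_lot"}
store   = {"shoe", "pink", "sku"}

articulation = {"color_43423": "pink"}  # factory concept -> store concept

def align(ontology, rules):
    """Rewrite concepts into the partner's terms wherever a rule applies."""
    return {rules.get(concept, concept) for concept in ontology}

intersection = align(factory, articulation) & store  # concepts both share
union        = align(factory, articulation) | store  # everything expressible

print(intersection)  # {'shoe', 'pink'}
print(union)         # {'shoe', 'pink', 'sku', 'production_lot'}
```

An application ontology would then be some such algebraic combination of the resource ontologies.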
Anyway, the execution of these operators (which
are a bunch of rules underneath) allows the application to:
- reason about data in a common encoding
- reason about the same data using the same label
- explicitly prioritize one information source as more authoritative about a particular fact than another
- reason about information conflicts between data sources, as the pedigree of these sources is explicitly encoded.
The Ontology of Tasks and Methods
(B. Chandrasekaran et al.)
The current interest in ontologies is the latest
version of AI's alternation of focus between mechanism theories and content
theories. The ontology realization amounts to the fact that no matter
what magic you have in the problem solver (fuzzy logic, neural nets, frame
language etc.), it cannot do much good without a content theory of the
domain. Further, given a good content theory, different mechanisms may
be used to build effective systems. E.g. if you model students and employees
as "is_a" humans, you will draw wrong conclusions, as opposed to modelling
them as "roles_of" humans. A bad ontology can make the reasoner draw wrong
conclusions. A good ontology can be reused across problem solvers.
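A tiny sketch of why the modeling choice matters (my illustration, not the paper's code):

```python
# "is_a" modeling: class membership is fixed at creation. Pat the Student
# cannot simultaneously be an Employee, and can never stop being a Student.
class Human: pass
class Student(Human): pass
class Employee(Human): pass

pat = Student()
print(isinstance(pat, Employee))   # False, and unchangeable at runtime

# "roles_of" modeling: a human *plays* roles, which can coexist and change.
class Person:
    def __init__(self, name):
        self.name, self.roles = name, set()

pat = Person("Pat")
pat.roles.update({"student", "employee"})  # fine: roles coexist
pat.roles.discard("student")               # fine: roles end
print(pat.roles)                           # {'employee'}
```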
The rest of the paper makes a distinction between
"domain ontology" (what you know about) and "problem solving ontology"
(the strategies you use to solve a problem using the domain ontology), and expands
on the elements of a problem solving ontology.
Ontologies: Where are the killer apps (ECAI-98 Workshop
on Applications of Ontologies and Problem-Solving Methods)
While ontology technology has been "ready" for a
while, practical applications of ontologies are hard to pin down. Part
of the reason is that ontologies come in many flavors with many uses, and
nobody has categorized the (ontology application) design space so that
people can index their ontology applications into this space. Once such
a design space is defined, and applications are slotted into this space,
we can get a perspective on the common uses of ontologies, and why they
are not being used in more ambitious ways. Uschold proposes the following
design space (each category with its variations):
- Purpose - knowledge reuse, interoperability between heterogeneous software applications, reduced s/w maintenance costs
- Formality - is the ontology a taxonomy (object model) or a highly formal specification of the meaning of terms?
- Breadth (subject matter) - narrow domain ontology or broad upper ontology?
- Scale - ontology size: 100, 1000, a million concepts?
- Conceptual Architecture - is the ontology used for repository federation, as an interchange language for multiple KBs, ...?
- Mechanisms - what operators are used on the ontologies and why (inferences, articulation rules, tracing mappings etc.)? This clarifies how the ontology adds value.
Uschold's view is that AI applications of ontologies
are few, and most fielded applications are in databases, CORBA and workflows.
There are growing applications of ontologies in query term expansion (closest
to trader) and cluster purification. Although data warehouses are not viewed
as ontology applications, they could be. Based on his experience, Uschold
explains the lack of fielded ontologies by the fact that applications have
to be very large before an ontology can be justified from a cost viewpoint.
In the case of data translation, until you have at least 4 different complex
object models, it is more cost-effective to write translators than to translate
all these models into a common ontology.
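Since query term expansion is the use closest to the trader interest, here is a minimal sketch (the taxonomy, synonym sets and function names are my own toy illustration):

```python
# Sketch: expand a query term with its hyponyms (is_a descendants) and
# synonyms before matching descriptions. Toy data; my own names.

is_a = {  # child -> parent
    "laptop": "computer",
    "desktop": "computer",
    "computer": "device",
}
synonyms = {"laptop": {"notebook"}, "computer": {"workstation"}}

def hyponyms(term):
    """The term plus everything below it in the is_a taxonomy."""
    below, changed = {term}, True
    while changed:
        changed = False
        for child, parent in is_a.items():
            if parent in below and child not in below:
                below.add(child)
                changed = True
    return below

def expand(term):
    terms = hyponyms(term)
    for t in set(terms):
        terms |= synonyms.get(t, set())
    return terms

print(expand("computer"))
# {'computer', 'laptop', 'desktop', 'notebook', 'workstation'}
```

A trader could apply the same expansion to service-offer vocabularies before matchmaking.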
Toward Distributed Use of Large-Scale
Ontologies (Swartout, Patil et al., USC-ISI)
Currently, people who build knowledge bases not only
use different problem solvers, but model the same domain differently. See,
for example, two models of the same concept "strut"
used by different knowledge bases. Such diversity makes it hard to share
(or merge) knowledge bases, as there is a mismatch in the intermediate
concepts. Knowledge becomes more shareable if knowledge bases addressing
the same problem share a common skeletal domain model structure, i.e. an
ontology. This paper deals with how to build a large ontology. It proposes
some guiding principles and a methodology.
The guiding principles are:
- domain models are hard to build; model only that (and only that) which is required to solve your problem(s)
- an ontology is not one big thing; it should be extensible
- there are many good ways to model the same concepts (e.g. should a relation be a reified object or not?); the problem solving methods that will use the ontology should drive which approach you take
- use an organizing principle (e.g. linguistics, concept clustering) to choose a coherent set of concepts.
How to build an ontology, say for a family of air
campaign planning operations:
- build an "upper ontology", i.e. one that is domain-independent
  - each concept here maps to a "word sense" - e.g. strut is a swagger or a mechanical brace
  - extract such concepts semi-automatically by querying electronic information sources (WordNet, English dictionaries etc.)
  - no single source is perfect (not even WordNet), so there is an extract-and-merge methodology
  - note: this upper ontology acts as a) a skeleton for domain ontologies, and b) a "hinge" across multiple domain ontologies (e.g. transportation and campaign planning)
- build a merged ontology = the upper ontology plus a domain ontology (e.g. a military air campaign ontology)
  - a simple glomming of domain terms will lead to a huge ontology
  - so, use the following "slice and expand" approach (sketched in code after this list):
    - domain experts first select some number of "seed" terms in the upper ontology that are important for the domain
    - all paths from the seeds to the upper ontology root are included
    - extra military concepts may be added as intermediate nodes
    - some nodes may be expanded to include their entire subtrees
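A minimal sketch of slice-and-expand over a toy taxonomy (the tree, names and code are my assumptions, not ISI's implementation):

```python
# "Slice and expand" sketch: keep the seed terms, every node on a seed's
# path to the root (the slice), and the full subtrees of any node
# explicitly marked for expansion. Toy taxonomy; not ISI's actual data.

parent = {  # child -> parent in a toy upper ontology
    "aircraft": "vehicle", "vehicle": "thing",
    "fixed_wing": "aircraft", "rotary_wing": "aircraft",
    "fighter": "fixed_wing", "bomber": "fixed_wing",
}

def path_to_root(node):
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def subtree(node):
    children = {c for c, p in parent.items() if p == node}
    return {node} | {d for c in children for d in subtree(c)}

def slice_and_expand(seeds, expand_fully=()):
    keep = set()
    for seed in seeds:
        keep |= set(path_to_root(seed))   # the "slice"
    for node in expand_fully:
        keep |= subtree(node)             # the "expand"
    return keep

print(slice_and_expand(seeds={"fighter"}, expand_fully={"fixed_wing"}))
# {'fighter', 'fixed_wing', 'bomber', 'aircraft', 'vehicle', 'thing'}
```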
SHOE papers (TB summarized)
Notes
- Claim here that RDF has fewer inferential capabilities than SHOE, and is limited to binary relations. (Don't believe the latter is true any more.)
- SHOE allows ontology inheritance.
- Jeff Heflin, Jim Hendler, and Sean Luke. "Reading Between the Lines: Using SHOE to Discover Implicit Knowledge from the Web." To be presented at the AAAI-98 Workshop on AI and Information Integration.
- Sean Luke, Lee Spector, David Rager, and Jim Hendler. "Ontology-based Web Agents." In Proceedings of the First International Conference on Autonomous Agents (AA-97), 1997.
The WordNet papers (TB summarized)
- WordNet vs. a dictionary: WN divides the lexicon into nouns, adjectives, verbs, adverbs. More like a thesaurus in that its taxonomy is by word meaning rather than word form.
Wordnet: (TBD)
A semantic network organized around:
- synsets (= sets of synonymous words)
- basic semantic relations between these synsets (synonym and antonym, hypernym/hyponym or is_a, meronym or has_a)
- morphological relations (stemming - "trees" is the plural of "tree")
- Each wordnet
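A minimal sketch of the synset-plus-relations structure (relation names follow WordNet's vocabulary, but the data and representation are my own toy illustration):

```python
# Toy WordNet-style structure: synsets are sets of synonymous words, and
# semantic relations hold between synsets, not between individual words.

synsets = {
    "tree.n.1":  {"tree"},
    "plant.n.1": {"plant", "flora"},
    "trunk.n.1": {"trunk", "bole"},
}

relations = [
    ("tree.n.1", "hypernym", "plant.n.1"),  # a tree is_a plant
    ("tree.n.1", "meronym",  "trunk.n.1"),  # a tree has_a trunk
]

def related(synset, relation):
    return [dst for src, rel, dst in relations
            if src == synset and rel == relation]

# "What is a tree a kind of?" -> the words of its hypernym synsets
for hyper in related("tree.n.1", "hypernym"):
    print(synsets[hyper])   # {'plant', 'flora'}
```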
Ontology.org notes
The Role of Shared Ontology in XML-Based Trading Architectures
The main barrier to electronic commerce lies in the need for applications
to share information, not in the reliability or security of the Internet.
Because of the wide range of enterprise and electronic commerce systems
deployed by businesses and the way these systems are configured, the problem
is particularly acute among large electronic trading groups, yet this is
precisely where the greatest return on investment (RoI) can be achieved.
While companies are beginning to organise, standardise and stabilise their
digital services in order to create and maintain sustainable network
relationships with their trading partners, they are doing this only in
conjunction with their immediate trading partners. This severely limits
the RoI opportunities.
RosettaNet
The lack of electronic business interfaces in the IT supply chain puts
a huge burden on manufacturers, distributors, resellers, and end-users,
creating tremendous inefficiencies and ultimately inhibiting our ability
to leverage the Internet as a business-to-business commerce tool. Here
are a few examples:
- Manufacturers have no standardized way to make inventory queries to gauge a partner's inventory levels; this impacts production planning, channel allocation, and the cost of returns
- Distributors, who provide pre- and post-sale technical support to their resellers on tens of thousands of SKUs, must grapple with disparate forms of product information collected from hundreds of manufacturers
- Resellers must learn and maintain different ordering/return procedures
- End-users have no mechanism enabling effective procurement through uniform templates
What is missing in order to scale eBusiness are the
"dictionaries," the "framework," the "Partner Interface Processes - PIPs"
and the "eBusiness processes."
Note: RosettaNet has standard properties specifications
for laptops, memory, s/w etc.
Ontology Problems
- Consider a domain in which there are people, some of whom are students, some professors, some other types of employees, some females and some males. For quite some time, a simple ontology was used in which the classes of students, employees, professors, males and females were represented as "types of" humans. Soon this caused problems, because it was noted that students could also be employees at times and can also stop being students. Databases built using the simple ontology could not make simple inferences that one would expect to be able to make given the knowledge base. Further ontological analysis showed that "students," "employees," etc. are not "types-of" humans; rather, they are "roles" that humans can play, unlike terms such as "females," which are in fact a "type-of" humans.
- From http://www.cs.umbc.edu/agents/humor/ontology.html
- Attributed by Washington Technology (a beltway industry paper) to James Schlesinger (a senior DoD executive) from a recent Washington DC luncheon keynote address (remarks are paraphrased to some degree): "In managing the DoD there are many unexpected communications problems: For instance, when the Marines are ordered to 'secure a building', they form a landing party and assault it. On the other hand, the same instructions will lead the Army to occupy the building with a troop of infantry, and the Navy will characteristically respond by sending a yeoman to assure that the building lights are turned out. When the Air Force acts on these instructions, what results is a 'three year lease with option to purchase'."
- ISI (Patil) example of how two KBs conceptualize the same thing differently without ontologies
Attic
- ontology vs. KB: an ontology contains a description (or "theory") of a domain, but no problem solving knowledge. A KB will contain some of the latter as well. (from Gruber)
- Should ontologies be dependent on the tasks they are meant to facilitate?
  - Chandra: an ontology of the domain of fruits would focus on some aspects of reality if it is being written for selecting pesticides, and on different aspects if it is being written to help chefs select fruits for cooking
- What do they do for us?
  - provide a context for people and web agents to interpret terms ("take" in a medicine ontology means to consume medicine, while it means "take a class" in a university ontology)
  - provide a concept taxonomy, used to generalize or specialize a query (is_a)
  - allow inferences to be defined. Take an RDF assertion (= horn clause): if the rhs is all true, then the lhs can be claimed (see the sketch after this list).
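A minimal forward-chaining sketch of that rule reading (my illustration; the triple and rule formats are made up, not any actual RDF rule syntax):

```python
# Horn-clause-style inference over triples: a rule's lhs (head) is asserted
# once every pattern in its rhs (body) matches known facts under one
# consistent variable binding. Single pass; toy data and format, mine.

facts = {
    ("aspirin", "is_a", "medicine"),
    ("patient", "takes", "aspirin"),
}

rules = [  # (lhs/head, rhs/body); "?x" terms are variables
    (("?p", "consumes", "?x"),
     [("?p", "takes", "?x"), ("?x", "is_a", "medicine")]),
]

def match(pattern, fact, bindings):
    b = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if b.setdefault(p, f) != f:
                return None       # variable already bound to something else
        elif p != f:
            return None           # constant mismatch
    return b

def infer(facts, rules):
    derived = set(facts)
    for head, body in rules:
        envs = [{}]
        for pat in body:          # join the body patterns over known facts
            envs = [b for env in envs for f in derived
                    if (b := match(pat, f, env)) is not None]
        for env in envs:
            derived.add(tuple(env.get(t, t) for t in head))
    return derived

print(infer(facts, rules) - facts)  # {('patient', 'consumes', 'aspirin')}
```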
Chandra:
- A representation vocabulary, typically specialized to some domain or subject matter. More precisely, it is not the vocabulary as such that qualifies as an ontology, but the conceptualizations that the terms in the vocabulary are intended to capture. For example, in engineering design, one might talk about the ontology of the domain of electronic devices. Such an ontology might have elements such as "transistors," "operational amplifiers," "voltages," and so on, and relations between these elements, such as one class of devices being a subtype or a part of another, or certain terms being properties of certain devices. Identifying such terms -- and the underlying conceptualizations -- generally requires careful analysis of the kinds of objects and relations that can exist in the domain. In fact, in what has come to be called "Upper Ontologies" -- i.e., ontologies that describe generic knowledge that holds across many fields -- the analysis required to establish the ontologies is a major research challenge.
- http://www.w3.org/Conferences/WWW4/Panels/krp/macgregor.html
Formal descriptions permit one to draw arbitrarily fine distinctions between pairs of information items, and they permit automatic categorization, both of which will be needed to manage very large taxonomies. They also provide the representational framework needed to generate "virtual nodes" used to reduce fan-out. Information retrieval techniques that introduce attribute-value pairs partially meet the same goals as our descriptions.