Notes on ontologies (and their relevance to service
trading in an internet service market)
Venu Vasudevan
Note about the title: At this point it is more
the former than the latter, but we'll get there
Why the interest in ontologies
To paraphrase [Chandra], "the
current interest in ontologies is the latest version of AI's alternation
of focus between mechanism theories and content theories. The ontology
realization amounts to the fact that no matter what magic you have in the
problem solver (fuzzy logic, neural nets, frame language etc.), it cannot
do much good without a content theory of the domain. Further, given a good
content theory, different mechanisms may be used to build effective systems".
Similar realizations have occurred independently in the software engineering
community, the database community, the workflow community and others.
Domain modelling has become an important research topic in software engineering
because tools with different domain models can't talk to each other: they
say the same thing in different (incompatible) ways. For the database
community, it is hard to integrate data from databases that aren't consistent
about how they model the world. Multiple workflow systems need to agree
on concepts such as processes, actors, and resource "usage" by activities,
so as to exchange workflow and workflow execution information
across workflow agents (thus PIF
defines a workflow ontology). All these are versions of the ontology
problem amongst human/programmed agents.
Ontologies: Various perspectives
There is an ontology problem with ontologies. While
everybody agrees ontologies are important, there is debate about
where the dividing line is between ontologies and a number of other approaches
(e.g. object models) to representing concepts and conceptualizations. The
most frequently quoted definition of an ontology is Gruber's: an
ontology is a specification of a conceptualization. The quibbling arises
when you dive deeper and ask: "how formal or rich does this specification
need to be before one can call it an ontology?" The AI community views
ontologies as formal logical theories whereby you are not only defining
terms and relationships, but formally defining the context in which a
term (or relationship) applies, and the implied facts and relationships.
These ontological theories are formal enough to be testable for soundness
and completeness by theorem provers. An AI ontology approach can
fully define an "attack aircraft" as: "an attack aircraft is a fixed-wing
aircraft that has been assigned to a combat mission that is one of
those in the group SEAD". In contrast, databases and other communities
view ontologies more as object models, taxonomies and schemas, and do not
explicitly express constraints such as the context in which a fixed-wing
aircraft turns into an attack aircraft. Linguistic ontologies (e.g.
WordNet) and thesauri express various relationships between concepts (e.g.
synonymy, antonymy, is_a, contains_a), but do not explicitly and
formally describe what a concept means.
In addition to the formality and rigor dimensions,
ontologies can be classified along dimensions of coverage, guiding
principle, and point of view. Upper ontologies such as
CYC aim to cover concepts that are common across domains (e.g. common-sense
reasoning in the case of CYC), while domain ontologies focus on a single
domain (evacuation operations, medicine etc.). The point of view
of an ontology is the kinds of concepts for which the ontology forms a
theory, and it need not be a domain (e.g. mission
planning). For instance, problem solving ontologies describe the
strategies taken by problem solvers to attack domain problems, and theory
ontologies might describe concepts of space, time, causality, plans
etc. The guiding principle of an ontology is the principle by which the
concepts and relationships in the ontology were chosen. WordNet and
other ontologies use linguistics as a guide for identifying the concepts
that should be in the ontology (particularly the upper-level concepts).
Linguistics is not the only guidance that could be used; it might be possible,
for example, to use conceptual clustering as a guide for ontology structure.
How are ontologies used
One of the first motivators for ontologies was
sharing domain knowledge across problem solvers. As Patil points
out, if two problem solvers use different conceptualizations of the same
world, there can be no sharing of the knowledge bases, and mapping concepts
from one KB to the other can be tricky business. So the idea was to use
logics (e.g. KIF) to define an ontological theory, which could then be translated
to the particular problem solver's formalism (KEE, CLIPS, OPS-5 etc.).
For this kind of knowledge reuse, the ontology needs to be semantically
rich.
Even if a single kind of problem solver (e.g. KEE)
is being used, ontologies are deemed a useful knowledge structuring
mechanism. A whole bunch of KEE users could design their knowledge bases
so that each consists of an ontology (a shared KB with only the sharable assertions)
plus non-sharable sets of facts and assertions that apply only to a particular
situation that not everybody is interested in.
Closer to the world of tool integration, enterprise
integration people realized that tools used internal models of the enterprise
that were intersecting and incompatible. An enterprise might use multiple
workflow systems, each one of which used the concepts "resource", "completion
time", "owner of a step" etc. in different ways, or didn't have a common
terminology. The solution was to a) define an enterprise ontology, and b) exchange
results between workflow systems A and B by translating A's state to the
enterprise ontology, which was then read in by B. The ontology became a
common format (an interlingua), with diverse models providing readers
of and writers to this ontology.
A more basic version of the interlingua idea is
the database view of ontologies. Here information sources, their capabilities,
and their pedigrees are mapped to a common ontology (e.g. UML). This then
allows these repositories to be federated by a federation system.
A slew of mediator-based multi-database architectures thus use ontologies
in one form or another.
Ontologies: Principles, Methods and Applications
(Mike Uschold, Michael Gruninger)
People and programs use different models of the world
in solving the same problem. This causes problems in both human and programmatic
inter-communication. Ontologies specify a domain/world model, often in
concepts/relationships/processes/... lingo, which, if expressed (and pointed
to) explicitly, will allow programs and people to avoid misunderstandings.
Ontology is an overloaded term, and everything from object models to database
schemas to logical theories (not just objects and relationships, but assertions
and inferences) serves an ontological purpose for a given domain and given
set of applications. For a single domain and a narrow set of problems,
a relational db schema is an ontology, in that applications that share the
schema's "view of the world" can interoperate. As you try to build a knowledge
model that cuts across domains and unifies radically different world views
(say a logistics application and a military course-of-action application),
concepts, relationships and contexts get hairy, and you might have to "say
a lot to say what you mean". In this multi-domain situation, one might
need to formally define not only concepts but contexts as well.
Ontologies have several different uses. To those
who build AI problem solvers, they allow problem solvers to reuse each
other's knowledge bases, or at least minimize the pain of translation (Patil
makes the same point). If two knowledge bases agree that they mean the
same thing when they talk about "strut", the problem of translating from
a CLIPS KB to a KEE KB is simplified to a data translation problem (as
opposed to a "gee, are we even talking about the same thing" problem). To
repository people, an ontology provides a common representation into
which they can translate their repositories, thus avoiding the O(n^2)
translator problem: n repositories need n(n-1) pairwise translators, but
only 2n readers/writers against a shared ontology.
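To make the interlingua idea concrete, here is a minimal sketch (all names are hypothetical illustrations, not from any of the papers): each tool registers one reader and one writer against the shared ontology, and any-to-any translation composes the two.

```python
# Sketch of the interlingua pattern: every tool registers a reader (tool
# format -> common ontology) and a writer (common ontology -> tool format).
# Translating between any two tools composes reader + writer, so n tools
# need 2n adapters rather than n*(n-1) pairwise translators.
# All names here are hypothetical, not an API from the papers.

readers = {}  # tool name -> function(tool record) -> common-ontology dict
writers = {}  # tool name -> function(common-ontology dict) -> tool record

def register(tool, reader, writer):
    readers[tool], writers[tool] = reader, writer

def translate(record, source, target):
    """Translate a record from one tool's model to another via the ontology."""
    common = readers[source](record)   # source model -> interlingua
    return writers[target](common)     # interlingua -> target model

# Two workflow tools that disagree on terminology for the same concepts.
register("wf_a",
         lambda r: {"actor": r["owner"], "deadline": r["completion_time"]},
         lambda c: {"owner": c["actor"], "completion_time": c["deadline"]})
register("wf_b",
         lambda r: {"actor": r["assignee"], "deadline": r["due"]},
         lambda c: {"assignee": c["actor"], "due": c["deadline"]})

print(translate({"owner": "alice", "completion_time": "Fri"}, "wf_a", "wf_b"))
# -> {'assignee': 'alice', 'due': 'Fri'}
```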
Clearly, if the ontology is the glue for all your
tools, building a bad ontology is a bad idea. The rest of this paper gives
guidelines for building good ontologies, which is not of immediate
interest to me.
Scalable Knowledge Composition (Gio's work - see
http://www-db.stanford.edu/LIC/SK.html)
Gio has a series of papers on the topic of resolving
semantic heterogeneity in information systems. These include:
- "An Algebra for Ontology Composition"
- "Interoperation, Mediation and Ontologies"
- "Composing Diverse Ontologies" (IFIP paper)
- "Large Scale Information Systems: Current Research and Future Opportunities" (presentation at an I**3 workshop)
The collective summary of these papers is below.
The semantics of information sources are captured
by their ontologies (i.e. the terms and relationships they use = their domain
of discourse). To support coherent querying of multiple overlapping
information sources, we need to use ontologies to understand and compensate
for the overlap in "world views", or actual data, between the information
sources. Integrating information from different sources without an understanding
of ontologies will lead to duplicate data that just looks different, missing
data that is actually there in the information source, multiple inconsistent
views of the same information (e.g. same information, different fidelities),
information that belongs at different points in the timeline etc. So
"unintelligent I**2" will give you more data, less knowledge, and
lead you to make wrong conclusions.
Gio assumes that ontologies are pre-developed
for non-collaborating repositories. These ontologies define not only the
concepts that model the repository content (e.g. shoes, their manufacturers
etc.) but also the pedigree (is the information authoritative, how recent
is it), wrapper smarts etc. Given that an I**3 application has to be built
on multiple non-collaborating repositories (each with its own ontology,
the ontology being relevant in a context), we have an ontology
composition problem, which is what Gio is addressing. So, continuing,
contexts provide guarantees about the exported knowledge, and the inferences
feasible over the knowledge. Context would then include stuff like: schema
of the source, supported queries, pedigree of the data (and/or authoritativeness
of the data provider), latency and accuracy of data etc. The application
now has to operate across a third ontology (with its own context) which
is some subset of the combined ontologies of the information sources it
is operating on. Gio proposes an ontology algebra (with set operations)
by which an application ontology can be defined over multiple resource
ontologies using set operators. The papers deal with implementing such
an algebra using lower-level rules that relate concepts in one ontology
with concepts in the other (e.g. factory shoe color number #43423 is shoe
store color "pink").
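To make the algebra concrete, a toy sketch (the set representation and rule format are my guesses at the flavor, not Gio's actual formalism):

```python
# Toy ontology algebra: treat an ontology as a set of concept names, with
# articulation rules mapping concepts of one ontology onto the other's
# terms. Set operators over the aligned ontologies then define derived
# ontologies. My own illustration, not the formalism in the papers.

factory = {"shoe", "color_43423", "production_lot"}
store   = {"shoe", "pink", "sku"}

articulation = {"color_43423": "pink"}  # factory concept -> store concept

def align(ontology, rules):
    """Rewrite concepts into the partner's terms wherever a rule applies."""
    return {rules.get(concept, concept) for concept in ontology}

intersection = align(factory, articulation) & store  # concepts both share
union        = align(factory, articulation) | store  # everything expressible

print(intersection)  # {'shoe', 'pink'}
print(union)         # {'shoe', 'pink', 'sku', 'production_lot'}
```

An application ontology would then be some such algebraic combination of the resource ontologies.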
Anyway, the execution of these operators (which
are a bunch of rules underneath) allows the application to:
- reason about data in a common encoding
- reason about the same data using the same label
- explicitly prioritize one information source as more authoritative about a particular fact than another
- reason about information conflicts between data sources, as the pedigree of these sources is explicitly encoded.
The Ontology of Tasks and Methods
(B. Chandrasekaran et al.)
The current interest in ontologies is the latest
version of AI's alternation of focus between mechanism theories and content
theories. The ontology realization amounts to the fact that no matter
what magic you have in the problem solver (fuzzy logic, neural nets, frame
language etc.), it cannot do much good without a content theory of the
domain. Further, given a good content theory, different mechanisms may
be used to build effective systems. E.g. if you model students and employees
as "is_a" humans, you will draw wrong conclusions, as opposed to modelling
them as "roles_of" humans. A bad ontology can make the reasoner draw wrong
conclusions. A good ontology can be reused across problem solvers.
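A tiny sketch of why the modeling choice matters (my illustration, not the paper's code):

```python
# "is_a" modeling: class membership is fixed at creation. Pat the Student
# cannot simultaneously be an Employee, and can never stop being a Student.
class Human: pass
class Student(Human): pass
class Employee(Human): pass

pat = Student()
print(isinstance(pat, Employee))   # False, and unchangeable at runtime

# "roles_of" modeling: a human *plays* roles, which can coexist and change.
class Person:
    def __init__(self, name):
        self.name, self.roles = name, set()

pat = Person("Pat")
pat.roles.update({"student", "employee"})  # fine: roles coexist
pat.roles.discard("student")               # fine: roles end
print(pat.roles)                           # {'employee'}
```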
The rest of the paper makes a distinction between
"domain ontology" (what you know about) and "problem solving ontology"
(the strategies you use to solve a problem using the domain ontology), and expands
on the elements of a problem solving ontology.
Ontologies: Where are the killer apps (ECAI-98 Workshop
on Applications of Ontologies and Problem-Solving Methods)
While ontology technology has been "ready" for a
while, practical applications of ontologies are hard to pin down. Part
of the reason is that ontologies come in many flavors with many uses, and
nobody has categorized the (ontology application) design space so that
people can index their ontology applications into this space. Once such
a design space is defined, and applications are slotted into this space,
we can get a perspective on the common uses of ontologies, and why they
are not being used in more ambitious ways. Uschold proposes the following
design space (each category with its variations):
- Purpose - knowledge reuse, interoperability between heterogeneous software applications, reduced s/w maintenance costs
- Formality - is the ontology a taxonomy (object model) or a highly formal specification of the meaning of terms?
- Breadth (subject matter) - narrow domain ontology or broad upper ontology?
- Scale - ontology size: 100, 1000, a million concepts?
- Conceptual Architecture - is the ontology used for repository federation, as an interchange language for multiple KBs, ...?
- Mechanisms - what operators are used on the ontologies and why (inferences, articulation rules, tracing mappings etc.)? This clarifies how the ontology adds value.
Uschold's view is that AI applications of ontologies
are few, and most fielded applications are in databases, CORBA and workflows.
There are growing applications of ontologies in query term expansion (closest
to trader) and cluster purification. Although data warehouses are not viewed
as ontology applications, they could be. Based on his experience, Uschold
explains the lack of fielded ontologies by the fact that applications have
to be very large before an ontology can be justified from a cost viewpoint.
In the case of data translation, until you have at least 4 different complex
object models, it is more cost-effective to write translators than to translate
all these models into a common ontology.
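Since query term expansion is the use closest to the trader interest, here is a minimal sketch (the taxonomy, synonym sets and function names are my own toy illustration):

```python
# Sketch: expand a query term with its hyponyms (is_a descendants) and
# synonyms before matching descriptions. Toy data; my own names.

is_a = {  # child -> parent
    "laptop": "computer",
    "desktop": "computer",
    "computer": "device",
}
synonyms = {"laptop": {"notebook"}, "computer": {"workstation"}}

def hyponyms(term):
    """The term plus everything below it in the is_a taxonomy."""
    below, changed = {term}, True
    while changed:
        changed = False
        for child, parent in is_a.items():
            if parent in below and child not in below:
                below.add(child)
                changed = True
    return below

def expand(term):
    terms = hyponyms(term)
    for t in set(terms):
        terms |= synonyms.get(t, set())
    return terms

print(expand("computer"))
# {'computer', 'laptop', 'desktop', 'notebook', 'workstation'}
```

A trader could apply the same expansion to service-offer vocabularies before matchmaking.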
Toward Distributed Use of Large-Scale
Ontologies (Swartout, Patil et al., USC-ISI)
Currently, people who build knowledge bases not only
use different problem solvers, but model the same domain differently. See,
for example, two models of the same concept "strut"
used by different knowledge bases. Such diversity makes it hard to share
(or merge) knowledge bases, as there is a mismatch in the intermediate
concepts. Knowledge becomes more shareable if knowledge bases addressing
the same problem share a common skeletal domain model structure, i.e. an
ontology. This paper deals with how to build a large ontology. It proposes
some guiding principles and a methodology.
The guiding principles are:
- domain models are hard to build; model only that (and only that) which is required to solve your problem(s)
- an ontology is not one big thing; it should be extensible
- there are many good ways to model the same concepts (e.g. should a relation be a reified object or not?); the problem solving methods that will use the ontology should drive which approach you take
- use an organizing principle (e.g. linguistics, concept clustering) to choose a coherent set of concepts.
How to build an ontology, say for a family of air
campaign planning operations:
- build an "upper ontology", i.e. one that is domain-independent
  - each concept here maps to a "word sense" - e.g. strut is a swagger or a mechanical brace
  - extract such concepts semi-automatically by querying electronic information sources (WordNet, English dictionaries etc.)
  - no single source is perfect (not even WordNet), so there is an extract-and-merge methodology
  - note: this upper ontology acts as a) a skeleton for domain ontologies, and b) a "hinge" across multiple domain ontologies (e.g. transportation and campaign planning)
- build a merged ontology = the upper ontology plus a domain ontology (e.g. a military air campaign ontology)
  - a simple glomming of domain terms will lead to a huge ontology
  - so, use the following "slice and expand" approach (sketched in code after this list):
    - domain experts first select some number of "seed" terms in the upper ontology that are important for the domain
    - all paths from the seeds to the upper ontology root are included
    - extra military concepts may be added as intermediate nodes
    - some nodes may be expanded to include their entire subtrees
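A minimal sketch of slice-and-expand over a toy taxonomy (the tree, names and code are my assumptions, not ISI's implementation):

```python
# "Slice and expand" sketch: keep the seed terms, every node on a seed's
# path to the root (the slice), and the full subtrees of any node
# explicitly marked for expansion. Toy taxonomy; not ISI's actual data.

parent = {  # child -> parent in a toy upper ontology
    "aircraft": "vehicle", "vehicle": "thing",
    "fixed_wing": "aircraft", "rotary_wing": "aircraft",
    "fighter": "fixed_wing", "bomber": "fixed_wing",
}

def path_to_root(node):
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def subtree(node):
    children = {c for c, p in parent.items() if p == node}
    return {node} | {d for c in children for d in subtree(c)}

def slice_and_expand(seeds, expand_fully=()):
    keep = set()
    for seed in seeds:
        keep |= set(path_to_root(seed))   # the "slice"
    for node in expand_fully:
        keep |= subtree(node)             # the "expand"
    return keep

print(slice_and_expand(seeds={"fighter"}, expand_fully={"fixed_wing"}))
# {'fighter', 'fixed_wing', 'bomber', 'aircraft', 'vehicle', 'thing'}
```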
SHOE papers (TB summarized)
Notes
- Claim here that RDF has fewer inferential capabilities than SHOE, and is limited to binary relations. (Don't believe the latter is true any more.)
- SHOE allows ontology inheritance.
- Jeff Heflin, Jim Hendler, and Sean Luke. "Reading Between the Lines: Using SHOE to Discover Implicit Knowledge from the Web." To be presented at the AAAI-98 Workshop on AI and Information Integration.
- Sean Luke, Lee Spector, David Rager, and Jim Hendler. "Ontology-based Web Agents." In Proceedings of the First International Conference on Autonomous Agents (AA-97), 1997.
The WordNet papers (TB summarized)
- WordNet vs. a dictionary: WN divides the lexicon into nouns, adjectives, verbs, adverbs. More like a thesaurus in that its taxonomy is by word meaning rather than word form.
Wordnet: (TBD)
A semantic network organized around:
- synsets (= sets of synonymous words)
- basic semantic relations between these synsets (synonym and antonym, hypernym/hyponym or is_a, meronym or has_a)
- morphological relations (stemming - "trees" is the plural of "tree")
- Each wordnet
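A minimal sketch of the synset-plus-relations structure (relation names follow WordNet's vocabulary, but the data and representation are my own toy illustration):

```python
# Toy WordNet-style structure: synsets are sets of synonymous words, and
# semantic relations hold between synsets, not between individual words.

synsets = {
    "tree.n.1":  {"tree"},
    "plant.n.1": {"plant", "flora"},
    "trunk.n.1": {"trunk", "bole"},
}

relations = [
    ("tree.n.1", "hypernym", "plant.n.1"),  # a tree is_a plant
    ("tree.n.1", "meronym",  "trunk.n.1"),  # a tree has_a trunk
]

def related(synset, relation):
    return [dst for src, rel, dst in relations
            if src == synset and rel == relation]

# "What is a tree a kind of?" -> the words of its hypernym synsets
for hyper in related("tree.n.1", "hypernym"):
    print(synsets[hyper])   # {'plant', 'flora'}
```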
Ontology.org notes
The Role of Shared Ontology in XML-Based Trading Architectures
The main barrier to electronic commerce lies in the need for applications
to share information, not in the reliability or security of the Internet.
Because of the wide range of enterprise and electronic commerce systems
deployed by businesses and the way these systems are configured, the problem
is particularly acute among large electronic trading groups, yet this is
precisely where the greatest return on investment (RoI) can be achieved.
While companies are beginning to organise, standardise and stabilise their
digital services in order to create and maintain sustainable network
relationships with their trading partners, they are doing this only in
conjunction with their immediate trading partners. This severely limits
the RoI opportunities.
RosettaNet
The lack of electronic business interfaces in the IT supply chain puts
a huge burden on manufacturers, distributors, resellers, and end-users,
creating tremendous inefficiencies and ultimately inhibiting our ability
to leverage the Internet as a business-to-business commerce tool. Here
are a few examples:
- Manufacturers have no standardized way to make inventory queries to gauge a partner's inventory levels; this impacts production planning, channel allocation, and the cost of returns
- Distributors, who provide pre- and post-sale technical support to their resellers on tens of thousands of SKUs, must grapple with disparate forms of product information collected from hundreds of manufacturers
- Resellers must learn and maintain different ordering/return procedures
- End-users have no mechanism enabling effective procurement through uniform templates
What is missing in order to scale eBusiness are the
"dictionaries," the "framework," the "Partner Interface Processes - PIPs"
and the "eBusiness processes."
Note: RosettaNet has standard properties specifications
for laptops, memory, s/w etc.
Ontology Problems
- Consider a domain in which there are people, some of whom are students, some professors, some other types of employees, some females and some males. For quite some time, a simple ontology was used in which the classes of students, employees, professors, males and females were represented as "types of" humans. Soon this caused problems, because it was noted that students could also be employees at times and can also stop being students. Databases built using the simple ontology could not make simple inferences that one would expect to be able to make given the knowledge base. Further ontological analysis showed that "students," "employees," etc. are not "types-of" humans; rather, they are "roles" that humans can play, unlike terms such as "females," which are in fact a "type-of" humans.
- From http://www.cs.umbc.edu/agents/humor/ontology.html
- Attributed by Washington Technology (a beltway industry paper) to James Schlesinger (a senior DoD executive) from a recent Washington DC luncheon keynote address (remarks are paraphrased to some degree): "In managing the DoD there are many unexpected communications problems: For instance, when the Marines are ordered to 'secure a building', they form a landing party and assault it. On the other hand, the same instructions will lead the Army to occupy the building with a troop of infantry, and the Navy will characteristically respond by sending a yeoman to assure that the building lights are turned out. When the Air Force acts on these instructions, what results is a 'three year lease with option to purchase'."
- ISI (Patil) example of how two KBs conceptualize the same thing differently without ontologies
Attic
- ontology vs. KB: an ontology contains a description (or "theory") of a domain, but no problem solving knowledge. A KB will contain some of the latter as well. (from Gruber)
- Should ontologies be dependent on the tasks they are meant to facilitate?
  - Chandra: an ontology of the domain of fruits would focus on some aspects of reality if it is being written for selecting pesticides, and on different aspects if it is being written to help chefs select fruits for cooking
- What do they do for us?
  - provide a context for people and web agents to interpret terms ("take" in a medicine ontology means to consume medicine, while it means "take a class" in a university ontology)
  - provide a concept taxonomy, used to generalize or specialize a query (is_a)
  - allow inferences to be defined. Take an RDF assertion (= horn clause): if the rhs is all true, then the lhs can be claimed (see the sketch after this list).
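A minimal forward-chaining sketch of that rule reading (my illustration; the triple and rule formats are made up, not any actual RDF rule syntax):

```python
# Horn-clause-style inference over triples: a rule's lhs (head) is asserted
# once every pattern in its rhs (body) matches known facts under one
# consistent variable binding. Single pass; toy data and format, mine.

facts = {
    ("aspirin", "is_a", "medicine"),
    ("patient", "takes", "aspirin"),
}

rules = [  # (lhs/head, rhs/body); "?x" terms are variables
    (("?p", "consumes", "?x"),
     [("?p", "takes", "?x"), ("?x", "is_a", "medicine")]),
]

def match(pattern, fact, bindings):
    b = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if b.setdefault(p, f) != f:
                return None       # variable already bound to something else
        elif p != f:
            return None           # constant mismatch
    return b

def infer(facts, rules):
    derived = set(facts)
    for head, body in rules:
        envs = [{}]
        for pat in body:          # join the body patterns over known facts
            envs = [b for env in envs for f in derived
                    if (b := match(pat, f, env)) is not None]
        for env in envs:
            derived.add(tuple(env.get(t, t) for t in head))
    return derived

print(infer(facts, rules) - facts)  # {('patient', 'consumes', 'aspirin')}
```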
Chandra:
- A representation vocabulary, typically specialized to some domain or subject matter. More precisely, it is not the vocabulary as such that qualifies as an ontology, but the conceptualizations that the terms in the vocabulary are intended to capture. For example, in engineering design, one might talk about the ontology of the domain of electronic devices. Such an ontology might have elements such as "transistors," "operational amplifiers," "voltages," and so on, and relations between these elements, such as one class of devices being a subtype or a part of another, or certain terms being properties of certain devices. Identifying such terms -- and the underlying conceptualizations -- generally requires careful analysis of the kinds of objects and relations that can exist in the domain. In fact, in what has come to be called "Upper Ontologies" -- i.e., ontologies that describe generic knowledge that holds across many fields -- the analysis required to establish the ontologies is a major research challenge.
- http://www.w3.org/Conferences/WWW4/Panels/krp/macgregor.html
Formal descriptions permit one to draw arbitrarily fine distinctions between pairs of information items, and they permit automatic categorization, both of which will be needed to manage very large taxonomies. They also provide the representational framework needed to generate "virtual nodes" used to reduce fan-out. Information retrieval techniques that introduce attribute-value pairs partially meet the same goals as our descriptions.