"We dissect nature along lines laid down by
our native language .
Language is not simply a reporting
device for experience but a defining framework for it." --
Benjamin Whorf, Thinking in Primitive Communities
in Hoyer (ed.) New Directions in the Study of Language,
1964.
Introduction
In recent years we have seen the emergence of standards
for creating component and distributed software: standards like
COM, DCOM, CORBA and the various World Wide Web architectures.
I think of these as architectural structures within which programmers
create software systems of components or distributed objects.
These emerging frameworks for building components and distributed
applications provide an opportunity to look at the advantages
of creating systems out of components built with different or
multiple computer languages. I call these heterogeneous development
environments, by analogy with heterogeneous computing systems
built from mixed hardware configurations. They allow programmers to
combine different tools, frameworks and languages to build the
best solution to their problem. Heterogeneous systems like this
let us use each language's unique capabilities for reflection and
for syntactic and semantic abstraction: to separate the underlying
distributed and component frameworks more cleanly from the program
implementation, and to investigate different tradeoffs in addressing
the problems of component reuse.
My point of view is that of both a tools provider
and a consumer of these frameworks. Harlequin implements advanced
development tools, including the programming languages Dylan, ML,
Lisp, and Prolog, and KnowledgeWorks (an expert system shell). We provide
solutions to the printing and publishing industry, including RIP
management and workflow products. We build applications for film
and video post production, criminal investigation, and publishing
to the World Wide Web. One of our recent products is a CORBA-based,
distributed, adaptive workflow system for the printing industry.
So we bring to the table a sense of the challenges for both sides
in this struggle toward distributed components.
Problems with Existing Models
Before specifically addressing the language issues,
I'd like to review some of the problems with existing models of
component software systems. These are largely taken from our own
experience building systems or integrating language implementations
with COM, OLE, the World Wide Web and CORBA. For all the apparent
complexity of the layers of these models, they are all based on
fairly obvious architectural and method invocation models (see
Chung, Liang, Wang et al., DCOM and CORBA Side by Side, Step
by Step, and Layer by Layer, http://www.bell-labs.com/~emerald/dcom_corba/Paper.html
). Unfortunately these models fall short in several ways.
The most obvious problems are the well-known failures
of black box abstraction in component and distributed systems.
These include the loss of rationale, invisibility to the client
of critical implementation efficiency issues, and the inability
of the client to specialize any behaviors that cross-cut the natural
component breakdown of the system. (Gregor Kiczales, Beyond
the Black Box: Open Implementation, IEEE Software,
January 1996).
In practice, the situation is much worse than this.
Partly because of the languages and tools commonly used to create
and interoperate with these architectures, actual implementations
manage to give you all the disadvantages of abstraction with few
of the advantages. Code is riddled with specific assumptions and
implementation details of the underlying architecture. Programs
are difficult to move to a different middleware product - even
when the two are supposed to implement the same abstract interfaces.
Wizards generate reams of code that you can't understand or safely
modify because the documentation only describes what button to
push in the wizard dialog, not how the underlying system works.
A significant amount of development time is spent
tracking down memory leaks caused by undocumented, or incorrectly
documented, memory management strategies - again forcing the client
to depend explicitly on the internal implementation details of
the component.
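To make the ownership problem concrete, here is a minimal C++ sketch - not from the original text; the interface IWidget and its GetName method are hypothetical - though the caller-frees rule it illustrates is the real COM convention for [out] string parameters:

    #include <windows.h>
    #include <cstdio>

    // Hypothetical COM interface. By COM convention the callee allocates
    // an [out] string with the task allocator, and the caller must free it.
    struct IWidget : public IUnknown {
        virtual HRESULT STDMETHODCALLTYPE GetName(LPOLESTR *name) = 0;
    };

    void PrintWidgetName(IWidget *widget) {
        LPOLESTR name = nullptr;
        if (SUCCEEDED(widget->GetName(&name))) {
            std::wprintf(L"%s\n", name);
            CoTaskMemFree(name);  // omit this and the client leaks - yet
        }                         // nothing in the signature says so
    }

If the component documents the ownership rule incorrectly, or quietly uses a different allocator, the resulting leak or crash is invisible at the interface level - precisely the failure described above.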
Additionally, components are fragile. Users are often
unable to customize component behavior without access to source
code, and when they can, the performance of their applications is
often compromised. Clients are forced to decide whether to sacrifice
performance by using dynamic interfaces, or to suffer version fragility
that defeats much of the purpose of distributing software as individual
components.
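A hedged C++ sketch of the two ends of this tradeoff follows; ICalc and its Add method are hypothetical, while the IDispatch machinery is the standard COM late-binding interface:

    #include <windows.h>

    // Early binding: a direct vtable call. Fast, but the client is compiled
    // against this exact interface layout and breaks if the layout changes.
    struct ICalc : public IDispatch {
        virtual HRESULT STDMETHODCALLTYPE Add(long a, long b, long *sum) = 0;
    };

    long AddEarly(ICalc *calc) {
        long sum = 0;
        calc->Add(2, 3, &sum);  // one virtual call
        return sum;
    }

    // Late binding: look the method up by name at run time. This survives
    // many interface revisions but pays for lookup and VARIANT boxing on
    // every single call.
    long AddLate(IDispatch *disp) {
        OLECHAR *name = const_cast<OLECHAR *>(L"Add");
        DISPID id = 0;
        disp->GetIDsOfNames(IID_NULL, &name, 1, LOCALE_USER_DEFAULT, &id);

        VARIANT args[2];  // note: DISPPARAMS takes arguments in reverse order
        VariantInit(&args[0]); args[0].vt = VT_I4; args[0].lVal = 3;
        VariantInit(&args[1]); args[1].vt = VT_I4; args[1].lVal = 2;
        DISPPARAMS params = { args, nullptr, 2, 0 };

        VARIANT result;
        VariantInit(&result);
        disp->Invoke(id, IID_NULL, LOCALE_USER_DEFAULT, DISPATCH_METHOD,
                     &params, &result, nullptr, nullptr);
        return result.lVal;
    }

Every call through AddLate repeats the name lookup and boxing that AddEarly never pays for, which is exactly why clients so often choose the fast path and accept the version fragility that comes with it.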
Advantages of Existing Models
The existing models have given us enough common vocabulary
and practical experience to start to explore the next generation
of heterogeneous development environments. From my language-centric
point of view, they provide a framework for experimenting with
the way multiple languages are used to solve complex problems.
This is mostly an accident of the fact that the languages commonly
in use today are so ill-suited for component and distributed programming
that each of the architectural standards has to reinvent its own
mechanism for dispatch, versioning, and object identity. In doing
so, they create language-neutral architectures that advanced-language
developers can use as a way to integrate their languages with
legacy systems. In fact, the constantly changing interfaces and
assumptions in these rapidly evolving frameworks provide a great
opportunity to show off the more sophisticated syntactic abstraction
and type management facilities of languages like Dylan or Lisp.
Future Architectures and Addressing the "abilities"
The goals of maintainability, composability, scalability,
reliability, debuggability, and most importantly reusability,
will only be achieved by a combination of advances in software
technology. These include better evolutionary architectural specification,
run-time safety and verification, and support for abstractions
that span the component segmentation and allow more explicit control
over the tradeoffs of component reuse. It is this last area that
interests me most: how to provide the benefits of abstraction
in building dynamic applications - applications that must evolve
over their lifetime to support new substrates and functionality -
while still allowing the abstraction layers to be selectively
flattened, giving clients control over performance and over the
meta-interfaces of a component.
This is a sort of fish net abstraction, as
opposed to black box abstraction. Information about the component
or distributed object can be exposed to clients, and clients can
pick and choose the level of dependency on the available internals
they are willing to accept. We have been studying this in the
context of the library usage model in Dylan: how to allow the
user to get type inferencing, inlining, and other optimizations
to flow across library boundaries while still providing a coherent
and robust component distribution and versioning model.
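Dylan's library model is where we are actually exploring this, but a rough C++ analogue - with hypothetical names, and only a loose correspondence - shows the shape of the choice a fish net boundary offers:

    // widget.h - the component boundary, exposing two levels of the net.

    // Opaque level: clients depend only on the signature. The body lives
    // inside the component and can change without clients recompiling, but
    // no optimization flows across the boundary.
    int area(int w, int h);

    // Flattened level: the definition is visible, so the compiler can
    // inline and constant-fold it across the boundary. In exchange, any
    // change to the body obliges dependent code to be rebuilt.
    inline int area_inline(int w, int h) { return w * h; }

Each client chooses, use by use, how deep into the net to reach; the distribution and versioning model then records which level of dependency each client accepted.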
Toward a Unified Language Model
As we build the next generation of tools and frameworks
to support distributed systems, we face a fundamental choice.
Should we create a tower of architectural tools on top of the
current frameworks, or should we revisit those frameworks and
realize that we can support integration of more sophisticated
computational models, while still integrating legacy systems?
We don't have to take the least-common-denominator approach.
Microsoft's recent description of COM+, and the OMG's
recent RFPs, like the one for the CORBA component model, seem
to be moving in the right direction. Microsoft is incorporating
the generation of language-specific interfaces into their tools,
just as we at Harlequin automatically generate Dylan interface
bindings to components by crawling over the COM type library (a
sketch of that crawl appears below).
CORBA's IDL provides a language-neutral interface, and the OMG's
recent RFPs aim for interaction with "emergent component technologies".
However, both technologies continue to restrict the domain of
the component interfaces to those obviously supported by the commonly
used languages of the past and present. It is interesting to note
the ramifications for these architectures of the surprising popularity
of just one new language: Java.
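As a hedged sketch of what the front end of such a binding generator looks like in C++ - the type-library API calls are real, while the error handling is elided and what one emits per type is of course language-specific:

    #include <windows.h>
    #include <oleauto.h>
    #include <cstdio>

    // Walk a COM type library and visit each type it declares - the raw
    // material from which language-specific interface bindings are emitted.
    void CrawlTypeLib(const wchar_t *path) {
        ITypeLib *tlb = nullptr;
        if (FAILED(LoadTypeLibEx(path, REGKIND_NONE, &tlb)))
            return;
        UINT count = tlb->GetTypeInfoCount();
        for (UINT i = 0; i < count; ++i) {
            BSTR name = nullptr;
            if (SUCCEEDED(tlb->GetDocumentation((INT)i, &name, nullptr,
                                                nullptr, nullptr))) {
                std::wprintf(L"type: %s\n", name);  // emit a binding here
                SysFreeString(name);
            }
        }
        tlb->Release();
    }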
As we continue to struggle with problems of black
box abstraction, we will be forced to look at the potential of
aspect-oriented programming, allowing programs to explicitly manipulate
emergent properties that cut across component boundaries. (Gregor
Kiczales et al., Aspect-Oriented Programming, http://www.xerox.com/aop).
This will lead us to consider what aspects are made apparent by
different languages and computational paradigms. Languages that
better support reflection, meta-object protocols, and syntactic
abstraction are useful in attacking different reuse issues
(Mira Mezini, Maintaining the Consistency of Class Libraries
During Their Evolution, OOPSLA '97). Languages that make it
easy to use and tune certain computational paradigms, such as
Prolog, can expose those properties to clients. The same applies
to languages that focus on correctness, such as ML, or that support
parallel computational models.
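A purely illustrative C++ fragment shows the kind of aspect at stake: tracing cuts across every operation of a component, and without aspect support it must be tangled into each method body by hand:

    #include <cstdio>

    class Account {
    public:
        void deposit(double amount) {
            std::printf("trace: deposit %.2f\n", amount);   // cross-cutting
            balance += amount;                              // component logic
        }
        void withdraw(double amount) {
            std::printf("trace: withdraw %.2f\n", amount);  // repeated again
            balance -= amount;
        }
    private:
        double balance = 0.0;
    };

    // An aspect-oriented language lets the tracing concern be stated once,
    // applying to every method, instead of being scattered through the class.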
I am not proposing that the OMG or Microsoft should
scrap current models and design aspect languages that address
interesting characteristics of each and every language. In fact,
I think some of the use of aspect languages is motivated by the
expressive weakness of the classical object-oriented paradigm
as implemented in C++ and Java. I do suggest that we build component
software and distributed frameworks that not only integrate our
legacy systems and widely used languages, but let us build systems
that effectively adapt to the changing underlying technologies
and computational paradigms of the future. For example, they might
compatibly support more complex method invocation interfaces,
or evolutionary changes to the underlying class structure and
inheritance hierarchy.
In more optimistic moments I dream of the emergence
of a renaissance programmer: a programmer who, rather than dogmatically
sticking to one language, would choose the right language for
the job and, having created a solution to his or her problem in the
common distributed framework, could expose interesting aspects
of the program to potential clients. The choice of language would
be similar to the choice of algorithm, but would be based largely
on expressiveness and what meta-properties of the component the
choice allows.
To conclude, the component and distributed-systems
community should build systems that allow the use of multiple
languages in a coherent framework and the exploitation of the
interesting properties of those languages.
Acknowledgments
Many of the ideas expressed above were developed
in conversations with members of the Harlequin Advanced Development
Tools staff including Jason Trenouth, Andy Sizer, Greg Sullivan,
and P. T. Withington.