Toward a Unified Language Model

John Hotchkiss,

Group Manager, Harlequin Inc.

jh@harlequin.com

"We dissect nature along lines laid down by our native language. … Language is not simply a reporting device for experience but a defining framework for it." -- Benjamin Whorf, Thinking in Primitive Communities in Hoyer (ed.) New Directions in the Study of Language, 1964.

Introduction

In recent years we have seen the emergence of standards for creating component and distributed software: standards like COM, DCOM, CORBA and the various World Wide Web architectures. I think of these as architectural structures within which programmers create software systems of components or distributed objects. These emerging frameworks provide an opportunity to look at the advantages of creating systems out of components built with different or multiple computer languages. I call these heterogeneous development environments, by analogy with heterogeneous computing systems that mix hardware configurations. They allow programmers to combine different tools, frameworks and languages to build the best solution to their problem. Heterogeneous systems like this let us use each language's unique capabilities for reflection and for syntactic and semantic abstraction to separate the underlying distributed and component frameworks more cleanly from the program implementation, and to investigate different tradeoffs in addressing the problems of component reuse.

My point of view is one of both a tools provider and a consumer of these frameworks. Harlequin implements advanced development tools including the programming languages Dylan, ML, Lisp, Prolog and KnowledgeWorks (an expert system shell). We provide solutions to the printing and publishing industry, including RIP management and workflow products. We build applications for film and video post production, criminal investigation, and publishing to the World Wide Web. One of our recent products is a CORBA-based, distributed, adaptive workflow system for the printing industry. So we bring to the table a sense of the challenges for both sides in this struggle toward distributed components.

Problems with Existing Models

Before specifically addressing the language issues, I'd like to review some of the problems with existing models of component software systems. These are largely taken from our own experience building systems or integrating language implementations with COM, OLE, the World Wide Web and CORBA. For all the apparent complexity of their layers, these models are all based on fairly obvious architectural and method invocation models (see Chung, Liang, Wang et al., DCOM and CORBA Side by Side, Step by Step, and Layer by Layer, http://www.bell-labs.com/~emerald/dcom_corba/Paper.html ). Unfortunately these models fall short in several ways.

The most obvious problems are the well known failures of black box abstraction in component and distributed systems. These include the loss of rationale, invisibility to the client of critical implementation efficiency issues, and the inability of the client to specialize any behaviors that cross-cut the natural component breakdown of the system (Gregor Kiczales, Beyond the Black Box: Open Implementation, IEEE Software, January 1996).

In practice, the situation is much worse than this. Partly because of the languages and tools used to create and commonly interoperate with these architectures, actual implementations manage to give you all the disadvantages of abstraction with few of the advantages. Code is riddled with specific assumptions and implementation detail of the underlying architecture. Programs are difficult to convert to different middleware products - even when they are supposed to be implementing the same abstract interfaces. Wizards generate reams of code that you can't understand or safely modify because the documentation only describes what button to push in the wizard dialog, not how the underlying system works.

A significant amount of development time is spent trying to find the memory leaks caused by undocumented, or incorrectly documented, memory management strategies - again forcing the client to explicitly reference the internal implementation details of the component.

Additionally, components are fragile. Users are often unable to customize component behavior without access to source code, and when they can, the performance of their applications is often compromised. Clients are forced to decide whether to sacrifice performance by using dynamic interfaces, or suffer version fragility that defeats much of the purpose of distributing software as individual components.
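The tradeoff between dynamic interfaces and version fragility can be illustrated with a minimal sketch. It assumes a hypothetical component that publishes an ordered method table plus a name-to-slot map; the names ComponentV1, ComponentV2, call_static and call_dynamic are invented for illustration and do not correspond to any COM or CORBA API.

```python
# Hypothetical component versions. Each publishes an ordered method table
# (similar in spirit to a vtable) and a name-to-slot map for dynamic lookup.
class ComponentV1:
    vtable = [lambda: "open", lambda: "print"]
    slots = {"open": 0, "print": 1}


class ComponentV2:
    # A new method was inserted at slot 1, which shifts "print" to slot 2.
    vtable = [lambda: "open", lambda: "preview", lambda: "print"]
    slots = {"open": 0, "preview": 1, "print": 2}


# Early binding: the client resolved the slot once, against version 1.
PRINT_SLOT = ComponentV1.slots["print"]


def call_static(component):
    """Fast path: index straight into the table with the cached slot."""
    return component.vtable[PRINT_SLOT]()


def call_dynamic(component, name):
    """Slow path: look the slot up by name on every call."""
    return component.vtable[component.slots[name]]()


if __name__ == "__main__":
    print(call_static(ComponentV1))            # intended method
    print(call_static(ComponentV2))            # wrong method, no error raised
    print(call_dynamic(ComponentV2, "print"))  # survives the version change
```

The statically bound client avoids a lookup on every call but silently invokes the wrong method once the table layout changes. The dynamically bound client pays for the lookup and keeps working.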

Advantages of Existing Models

The existing models have given us enough common vocabulary and practical experience to start to explore the next generation of heterogeneous development environments. From my language-centric point of view, they provide a framework for experimenting with the way multiple languages are used to solve complex problems. This is mostly an accident of the fact that the languages commonly in use today are so ill-suited for component and distributed programming that each of the architectural standards has to reinvent its own mechanism for dispatch, versioning, and object identity. In doing so, they create language-neutral architectures that advanced-language developers can use as a way to integrate their languages with legacy systems. In fact, the constantly changing interfaces and assumptions in these rapidly evolving frameworks provide a great opportunity to show off the more sophisticated syntactic abstraction and type management facilities of languages like Dylan or Lisp.

Future Architectures and Addressing the "abilities"

The goals of maintainability, composability, scalability, reliability, debuggability, and most importantly reusability, will only be achieved by a combination of advances in software technology. These include better evolutionary architectural specification, run time safety and verification, and developing support for abstractions that span the component segmentation and allow more explicit control over the tradeoffs of component reuse. It is this last area that I am most interested in: how to provide the benefits of abstraction in building dynamic applications, applications that will have to evolve over their lifetime to support new substrates and functionality, while still allowing the abstraction layers to be selectively flattened to improve performance and to give clients control over performance and the meta-interfaces of a component.

This is a sort of fishnet abstraction, as opposed to black box abstraction. Information about the component or distributed object can be exposed to clients, and clients can pick and choose the level of dependency on the available internals they are willing to accept. We have been studying this in the context of the library usage model in Dylan: how to let type inference, inlining, and other optimizations flow across library boundaries while still providing a coherent and robust component distribution and versioning model.
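One way to picture a client choosing its own level of dependency is a component that offers a base interface plus an optional meta-interface, in the spirit of the open implementation work cited earlier. The sketch below is only an illustration under that assumption. The Table class, its "hash" and "sorted" strategies, and the method names are hypothetical and are not taken from Dylan or from any Harlequin product.

```python
import bisect


class _HashStore:
    """Default strategy: fast lookup, keys kept in insertion order."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value

    def get(self, key):
        return self._items[key]

    def keys(self):
        return list(self._items)


class _SortedStore:
    """Alternative strategy: slower inserts, keys always kept in sorted order."""

    def __init__(self):
        self._keys, self._values = [], []

    def _find(self, key):
        i = bisect.bisect_left(self._keys, key)
        return i, i < len(self._keys) and self._keys[i] == key

    def put(self, key, value):
        i, found = self._find(key)
        if found:
            self._values[i] = value
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def get(self, key):
        i, found = self._find(key)
        if not found:
            raise KeyError(key)
        return self._values[i]

    def keys(self):
        return list(self._keys)


class Table:
    """Hypothetical component with a base interface and a meta-interface."""

    _STRATEGIES = {"hash": _HashStore, "sorted": _SortedStore}

    def __init__(self, strategy="hash"):
        # Meta-interface: the client may choose the implementation strategy,
        # or ignore the argument and treat the component as a black box.
        self._store = self._STRATEGIES[strategy]()

    # Base interface: identical whichever strategy was chosen.
    def put(self, key, value):
        self._store.put(key, value)

    def get(self, key):
        return self._store.get(key)

    # Meta-interface: expose selected internals to clients that want them.
    def strategy_name(self):
        return type(self._store).__name__

    def keys(self):
        return self._store.keys()
```

A client that calls only put and get depends on nothing below the base interface. A client that passes strategy="sorted" accepts a deeper dependency in exchange for control over the performance characteristics.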

Toward a Unified Language Model

As we build the next generation of tools and frameworks to support distributed systems, we face a fundamental choice. Should we create a tower of architectural tools on top of the current frameworks, or should we revisit those frameworks and realize that we can support integration of more sophisticated computational models, while still integrating legacy systems? We don't have to take the least common denominator approach.

Microsoft's recent description of COM+, and the OMG's recent RFPs, like the one for the CORBA component model, seem to be moving in the right direction. Microsoft is incorporating the generation of language-specific interfaces into their tools, just as we at Harlequin automatically generate Dylan interface bindings to components by crawling over the COM type library. CORBA's IDL provides a language-neutral interface, and the OMG's recent RFPs aim for interacting with "emergent component technologies". However, both technologies continue to restrict the domain of the component interfaces to those obviously supported by the commonly used languages of the past and present. It is interesting to note the ramifications for these architectures of the surprising popularity of just one new language: Java.

As we continue to struggle with problems of black box abstraction, we will be forced to look at the potential of aspect-oriented programming, allowing programs to explicitly manipulate emergent properties that cut across component boundaries (Gregor Kiczales et al., Aspect-oriented Programming, http://www.xerox.com/aop). This will lead us to consider what aspects are made apparent by different languages and computational paradigms. Languages that better support reflectivity, meta-object protocols, and syntactic abstraction provide utility in attacking different reuse issues (Mira Mezini, Maintaining the consistency of class libraries during their evolution, OOPSLA 97). Languages that make it easy to use and tune certain computational paradigms, such as Prolog, can expose their properties to clients. The same applies to languages that focus on correctness, such as ML, or parallel computational models.
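A property that cuts across component boundaries can be approximated, very roughly, with reflection alone. The sketch below uses a Python class decorator to apply one tracing concern to two otherwise unrelated components without editing their method bodies. It is an illustration under stated assumptions and not the aspect languages described in the cited work. The names traced, call_log, Spooler and Renderer are hypothetical.

```python
import functools

# Shared record of calls; this is the cross-cutting concern's own state.
call_log = []


def _wrap(class_name, method_name, fn):
    """Return fn wrapped so that each call is recorded before it runs."""

    @functools.wraps(fn)
    def wrapper(self, *args, **kwargs):
        call_log.append(f"{class_name}.{method_name}")
        return fn(self, *args, **kwargs)

    return wrapper


def traced(cls):
    """Class decorator: apply the tracing concern to every public method."""
    for name, attr in list(vars(cls).items()):
        if callable(attr) and not name.startswith("_"):
            setattr(cls, name, _wrap(cls.__name__, name, attr))
    return cls


# Two unrelated hypothetical components. Neither mentions tracing itself.
@traced
class Spooler:
    def submit(self, job):
        return f"queued {job}"


@traced
class Renderer:
    def render(self, job):
        return f"rendered {job}"


if __name__ == "__main__":
    Spooler().submit("page-1")
    Renderer().render("page-1")
    print(call_log)
```

The tracing policy lives in one place and can be changed or removed without touching either component, which is the kind of separation the paragraph above points toward.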

I am not proposing that the OMG or Microsoft should scrap current models and design aspect languages that address interesting characteristics of each and every language. In fact I think some of the use of aspect languages is motivated by the expressive weakness of the classical object-oriented paradigm as implemented in C++ and Java. I do suggest that we build component software and distributed frameworks that not only integrate our legacy systems and widely used languages, but let us build systems that effectively adapt to the changing underlying technologies and computational paradigms of the future. For example, they might compatibly support more complex method invocation interfaces, or evolutionary changes to the underlying class structure and inheritance hierarchy.
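The text does not name a specific mechanism for "more complex method invocation interfaces". One possible reading, offered here purely as an assumption, is dispatch on the types of several arguments, as in the generic functions of Dylan and Lisp. The sketch below is a simplified illustration with a naive left-to-right specificity rule. The Generic class and the document and device types are hypothetical.

```python
import itertools


class Generic:
    """A minimal generic function that dispatches on all argument types."""

    def __init__(self, name):
        self.name = name
        self._methods = {}

    def method(self, *types):
        """Register an implementation for one tuple of argument types."""

        def register(fn):
            self._methods[types] = fn
            return fn

        return register

    def __call__(self, *args):
        # Try type signatures from most to least specific, walking each
        # argument's method resolution order (later arguments vary fastest).
        mros = [type(arg).__mro__ for arg in args]
        for signature in itertools.product(*mros):
            fn = self._methods.get(signature)
            if fn is not None:
                return fn(*args)
        raise TypeError(f"no applicable method for {self.name}")


# Hypothetical types for illustration.
class Document: pass
class PostScript(Document): pass
class Device: pass
class Printer(Device): pass


render = Generic("render")


@render.method(Document, Device)
def _(doc, dev):
    return "generic render"


@render.method(PostScript, Printer)
def _(doc, dev):
    return "rip to printer"


if __name__ == "__main__":
    print(render(PostScript(), Printer()))
    print(render(PostScript(), Device()))
```

An interface definition language that models only single-receiver method calls has no direct way to describe an interface like render, which is one example of the restriction on component interfaces discussed earlier.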

In more optimistic moments I dream of the emergence of a renaissance programmer: a programmer who, rather than dogmatically sticking to one language, would choose the right language for the job and, on creating a solution to his or her problem in the common distributed framework, could expose interesting aspects of the program to potential clients. The choice of language would be similar to the choice of algorithm, but would be based largely on expressiveness and on what meta-properties of the component the choice allows.

To conclude, the component and distributed-systems community should build systems that allow the use of multiple languages in a coherent framework and the exploitation of the interesting properties of those languages.

Acknowledgments

Many of the ideas expressed above were developed in conversations with members of the Harlequin Advanced Development Tools staff including Jason Trenouth, Andy Sizer, Greg Sullivan, and P. T. Withington.