Composing Active Proxies to Extend the Web
Rohit Khare, rohit@uci.edu, University of California at Irvine
Adam Rifkin, adam@cs.caltech.edu, California Institute of Technology
Introduction
In considering the future of "compositional software architectures" as
rendered through distributed object systems and on the World Wide Web,
it is useful to set aside the hype of new technologies and consider what
is already being accomplished with existing infrastructures. On the Web,
users and developers have already adopted two powerful ways to compose
active processing with information distribution: active pages ("cgi-bin")
and active proxies. In this position paper, we focus on the latter
as a tool for parties beyond the original developer to externalize extensions
to a software or information architecture.
Our Position
Independent extensibility is a critical affordance of compositional software
architectures. To realize the full potential of concurrent evolution of
systems by all the system's stakeholders, architects should be encouraged
to support externalized, component-oriented hooks. In particular, active
proxies on the Web demonstrate the power of independent evolution and the
serendipitous synergy of orthogonal services. Soon, HTTP in conjunction
with the Protocol Extension Protocol (PEP) will systematize this power and bring it to clients and servers
as well.
Examples of Active Proxies
When fetching a resource through the Hypertext Transfer Protocol (HTTP), clients
can contact the origin server or an intermediate server that will
fetch it on their behalf. One of the most familiar uses of proxied HTTP
is caching: many users behind a single caching proxy can benefit from a
local copy of their most frequently accessed resources. The caching proxy
operates on behalf of users to maintain up-to-date copies from the origin
server. Sometimes, it also acts on behalf of the publisher to collect and
report usage statistics for its cached resources; the entire cache may
also be filled on behalf of the publisher (a "mirror" proxy).
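The caching behavior described above can be illustrated with a minimal sketch. It is not taken from any real proxy: the class name `CachingProxy`, the `max_age` freshness rule, and the injected `fetch_from_origin` callable are all assumptions made for illustration, and real HTTP caching follows far richer validation rules.

```python
from typing import Callable, Dict, Tuple

# Hypothetical type: an origin fetcher maps a URL to a response body.
Fetch = Callable[[str], bytes]


class CachingProxy:
    """Toy caching proxy: serves repeat requests from a local copy."""

    def __init__(self, fetch_from_origin: Fetch, max_age: float = 60.0):
        self._fetch = fetch_from_origin
        self._max_age = max_age  # assumed freshness window, in seconds
        self._cache: Dict[str, Tuple[float, bytes]] = {}
        # Per-URL request counts that a proxy could report to the publisher.
        self.hits: Dict[str, int] = {}

    def get(self, url: str, now: float) -> bytes:
        """Return the resource, contacting the origin only when stale."""
        self.hits[url] = self.hits.get(url, 0) + 1
        entry = self._cache.get(url)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]  # fresh local copy: no origin traffic
        body = self._fetch(url)  # missing or stale: refetch on the user's behalf
        self._cache[url] = (now, body)
        return body
```

Passing the clock in as `now` keeps the sketch deterministic; many users sharing one `CachingProxy` instance would share the same local copies.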
There are many other species of proxies, though. Caches merely relay
the original resource; active proxies have free rein to extend and emend
these resources. Consider these applications:
-
Crit-Link Mediator
-
Any user can fetch any page on the Web through the crit.org
proxy; it's returned with a Crit-Link banner across the top and the collected
comments of previous users about that page at the bottom. It's a public
annotation service that moves the Web closer to its original vision of
a multiparty conversation.
This service was developed by Ka-Ping Yee, a student at Waterloo, based
on design work by Eric Drexler at the Foresight Institute -- the ideas
can be traced back to Ted Nelson's Xanadu. Alexa
is a similar community annotation tool that collects feedback from users
about document quality (and also acts as a cache -- in this case, from
the multiterabyte Internet Archive
of extinct pages).
-
Lucent Personalized Web Assistant
-
Lucent's tool systematically replaces one's real identity with a
pseudonymous one on the Net. Using HTTP security features, users log onto
the proxy, which then allows users to 'register' at sites using escape
codes in fill-in forms (\u for username, \p for a site-specific password,
etc).
This service was developed by several Bell Labs cryptography and Web
protocol researchers. Two related services are the Anonymizer,
which intercepts cookies, Java, and JavaScript, and strips out Referer:
and User-Agent: request headers; and NoShit,
which strips out graphics characteristic of Web advertising. Conversely,
some advertisers weave ads onto public pages and track end-users with the
same technology.
-
Format Converters
-
Numerous services exist to transform multimedia formats. Low-bandwidth
devices or low-resolution devices like wireless Web palmtops will access
proxies that reduce color graphics to black-and-white thumbnails on the
server side. Another service of Yee's, a Medium-Independent Notation for
Structured Expressions (MINSE), allows authors to embed mathematical expressions
and the like which are automatically compiled to ASCII layouts or embedded
graphics for different clients.
-
Content Translators
-
Natural-language understanding has progressed to the point of assisting
professional translators by producing a first pass. Several companies demonstrate
their technology by offering to translate pages as a proxy service.
-
Remote Processing
-
The best defense against rogue mobile code is isolation. At least one vendor
offers to quarantine Java applets by running them on the proxy and only
sending the display output inward to the end-user.
-
Content Filtering
-
Many proponents of content selection strategies -- whether for child-protection,
political censorship, or enforcing organizational security directives --
posit centralized filtering by proxies. Content labels, digital signatures,
and other assertions are inspected, migrating policy enforcement upstream.
-
Protocol Gateways
-
HTTP proxies can ease migration between editions of HTTP. In the future,
developers expect to multiplex multiple HTTP streams through HTTP-NG ("next
generation") adaptors. Security and authentication protocols for Web sites
are also found centralized in "Web firewalls". HTTP proxies also stand
in for other URL schemes: FTP, Gopher, WAIS, and Ph protocols are commonly
available.
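Two of the transformations sampled above, stripping identifying request headers and wrapping a page with a banner and collected comments, can be sketched as small pure functions. This is only an illustration of the idea: the function names, the banner markup, and the naive string concatenation (no HTML parsing) are our assumptions, not the behavior of the Anonymizer or the Crit-Link mediator.

```python
from html import escape
from typing import Dict, List


def strip_identifying_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Drop Referer and User-Agent request headers (names compared case-insensitively)."""
    blocked = {"referer", "user-agent"}
    return {name: value for name, value in headers.items()
            if name.lower() not in blocked}


def annotate(page_html: str, comments: List[str]) -> str:
    """Return the page with a banner on top and the comments at the bottom.

    Hypothetical markup: a real mediator would rewrite the document
    structure rather than concatenate strings.
    """
    banner = "<div class='banner'>Annotated copy</div>"
    items = "".join(f"<li>{escape(comment)}</li>" for comment in comments)
    return f"{banner}{page_html}<ul>{items}</ul>"
```

An active proxy would apply the first function to each outgoing request and the second to each returned page before relaying it to the user.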
This sampler focuses primarily on extending the Web as an information space.
The same approaches apply to the Web as an active software system, though.
Imagine a proxy that extracts the user interface from a dozen package-tracking
services and presents a single meta-interface for any shipper. WebMethods'
Web Interface Definition Language points
the way to composing Web transactions. Imagine a data-logging proxy that
filters the event stream with pluggable strategies. UC Irvine's Expectation-Driven
Event Monitoring dynamically composes filters fetched from the Web.
Imagine extending a flight-reservation system's command interface to interpose
a graphical map. Apple Computer's WebObjects
demonstrates an airline UI prototype.
The challenge is that today only first-party stakeholders can extend their
software architectures in these ways. The information-oriented extensions
listed above were different: they were developed by outsiders and deployed by
outsiders in service of outsiders. Active proxies hold the promise that
soon, outsiders will write new health-plan comparators and mutual-fund
trading interfaces and ...
Composing Active Proxies
Want to annotate a Japanese page without advertisements from an HTTP-NG
server? Want to book a plane ticket and a hotel room in a single transaction?
Active proxies can be neatly reused as black-box components when chained
together via HTTP. However, we can envision neater, more efficient ways
to enable reuse. The HTTP Protocol Extension Protocol (PEP) transcends
the welter of competing APIs to offer a single syntax for naming, specializing,
and applying active proxies with finer-grained control. PEP also affords
reasoning about compatible extensions and composite extensions.
We are already familiar with many analogues to active proxies as reusable
filters. The difference is in the affordances of the interchange format.
UNIX filters operate on ASCII streams; SQL queries operate on relational
tables; active proxies and pages operate on Web hypermedia (HTML/XML +
HTTP).
The affordances for composition are also similar: manual, sequential
composition only. The specification clearly allows HTTP proxy chains to
apply several transformations along the way, but in practice none of the
services sampled above allows for onward chaining (they just fetch the
actual content from the origin server, rather than branching to yet another
proxy server specified by the end-user). The downside of packaging these
extensions as a proxy is the assumption that all users and all destinations
are treated the same -- that is, in applying the same function to all inputs
and outputs.
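Onward chaining of the kind the specification allows can be sketched by treating each proxy as a function that wraps an upstream fetcher, which may be the origin server or another proxy. The names `Fetch`, `Proxy`, `chain`, and the two toy proxies are hypothetical; the sketch shows only sequential composition and the fact that order matters.

```python
from typing import Callable

# Hypothetical types: a fetcher maps a URL to page text, and a proxy
# wraps an upstream fetcher to produce a new fetcher.
Fetch = Callable[[str], str]
Proxy = Callable[[Fetch], Fetch]


def uppercase_proxy(upstream: Fetch) -> Fetch:
    """Toy stand-in for a content translator."""
    def fetch(url: str) -> str:
        return upstream(url).upper()
    return fetch


def banner_proxy(upstream: Fetch) -> Fetch:
    """Toy stand-in for an annotation service."""
    def fetch(url: str) -> str:
        return "[banner] " + upstream(url)
    return fetch


def chain(origin: Fetch, *proxies: Proxy) -> Fetch:
    """Wrap the origin with each proxy in turn; the first listed sits nearest the origin."""
    fetch = origin
    for proxy in proxies:
        fetch = proxy(fetch)
    return fetch
```

With an origin that returns `"page at " + url`, `chain(origin, uppercase_proxy, banner_proxy)("x")` yields `"[banner] PAGE AT X"`, while reversing the two proxies yields `"[BANNER] PAGE AT X"`. Every request still passes through every stage, which is exactly the uniformity limitation noted above.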
This is the universe PEP was designed for -- each PEP module has the
same executable power as a proxy, but can be selectively applied to portions
of Web space, on behalf of certain users, with known urgency (required
or optional), in concert with other extensions (or exclusively of conflicting
ones), in sequence or in parallel, on selected hops of the HTTP proxy chain.
Most significantly, since PEP modules are identified by the URI of the
protocol they implement, PEP-aware Web tools can negotiate common sets
of compatible modules and settings.
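The selective application described above can be modeled in a few lines. This is emphatically not PEP's wire syntax or any real implementation: the `Extension` record, its `scope` prefix rule, the `negotiate` function, and the example.org URIs in the usage note are hypothetical, and the sketch covers only URI-named modules, scoping to a portion of Web space, and required versus optional strength.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Extension:
    uri: str        # names the protocol the module implements
    scope: str      # URL prefix the extension applies to (assumed rule)
    required: bool  # True: request must fail if unsupported; False: optional


def applicable(extensions: List[Extension], url: str) -> List[Extension]:
    """Select the extensions bound to this portion of Web space."""
    return [ext for ext in extensions if url.startswith(ext.scope)]


def negotiate(extensions: List[Extension], url: str,
              supported: Set[str]) -> List[str]:
    """Return the URIs of the modules to apply for this URL.

    Optional extensions the peer does not support are skipped;
    an unsupported required extension raises LookupError.
    """
    chosen = []
    for ext in applicable(extensions, url):
        if ext.uri in supported:
            chosen.append(ext.uri)
        elif ext.required:
            raise LookupError(f"required extension not supported: {ext.uri}")
    return chosen
```

For example, a required encryption extension scoped to `http://bank.example/` would be applied only to requests under that prefix, and a client that does not list its URI among its supported modules would be refused there but unaffected elsewhere.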
PEP enshrines a philosophy of decentralization. Anyone can publish an
extension by maintaining a Web page describing it. Any such module has
as much expressive power to rewrite its input as an active page or active
proxy. Any resource can be bound to require an extension ("those .quicken
files require an http://pep.w3.org/SEA/Encryption/-compatible
filter"). Any extension can express its own policy (hop-by-hop or end-to-end;
requisite and incompatible co-extensions).
Its designers developed applications for content-filtering, electronic-payment
selection, and a modular security architecture -- all of which could be
composed to, say, purchase encrypted PICS labels. Many of these applications
are actually more powerful than active proxies, since PEP allows their
functions to be moved into the origin client and server; security decisions
can be made at the desktop rather than the firewall.
These benefits are not free, though. Selectively applying active proxy
extensions requires more logic in the server to select compatible PEP modules
and enforce users' and publishers' policies. In truth, it has been easier
for extenders to deploy active pages and active proxies than to design
for the future. PEP is still on the IETF standards track two years after
its debut. Decentralized extensibility is a tough sell, but we believe
it is essential.
Internal extensibility exists and is well supported *within* the black
box -- OOA&D and software architectures research gives us reason to
hope. Outside the box, though, the rise of open information systems on
the geodesic network of the Web heralds a political shift in the
constituency for extensibility: there are many, many more actors with an
interest in the extensibility of your architectures!
References
For more information about the ideas and systems we have discussed in this
document...
-
PEP Working Draft: HTTP
Protocol Extension Protocol. Henrik Frystyk Nielsen, Rohit Khare, and
Dan Connolly.
-
Crit-Link Design Paper: Crit-Link
mediator, a proxy for annotating pages -- PUBLIC annotation of pages.
-
LPWA Design: The Lucent Personalized Web
Assistant
-
Paper: Application-Specific Proxy Servers as HTTP Stream Transducers. World
Wide Web Journal, Winter 1996 (v1n1) (proceedings of WWW4). Charles
Brooks, Murray S. Mazer, Scott Meeks, and Jim Miller.
-
Paper: Ubiquitous Advertising on the WWW: Merging Advertising at the Browser.
World Wide Web Journal, Summer 1996 (v1n3) (proceedings of Workshop
on Web Demographics). Youji Kohda and Susumu Endo.
-
World Wide Web Proxies. Luotonen,
A., and Altis, K.
-
Paper: Weaving a Web of Trust. World Wide Web Journal, Summer 1997
(v2n3). Rohit Khare and Adam Rifkin.
-
Paper: Capturing the State of Distributed Systems with XML. World Wide
Web Journal, Fall 1997 (v2n4). Rohit Khare and Adam Rifkin.
-
Article: Product Evaluation of WebObjects, Byte July 1997. Rohit
Khare.
-
Discussion Archive: FoRK
Mailing List.
Thanks to Jim Whitehead for reviewing a draft of this position paper.