Current Web Architecture
Table of Contents
Introduction
Basic Web Architecture
Web Architecture Extensibility
Other Transfer Protocols
Other Open Standards
Introduction
This section of the Internet Tool Survey describes
the current architecture of the World Wide Web (WWW). The NCSA
Glossary is a useful starting point for Web terms. Another is the ILC
glossary of Internet Terms.
The following sections describe
-
the basic two-tier architecture of the web in which static web pages
(documents) are transferred from information servers to browser clients
world-wide,
-
extensions that permit three-tiered architectures where content pages
can be constructed dynamically and where programs as well as data can be
transferred,
-
other information transfer protocols, and
-
related standards.
Basic Web Architecture
The basic web architecture is two-tiered and characterized by a web client
that displays information content and a web server that transfers information
to the client. This architecture depends on three key standards: HTML for
encoding document content, URLs for naming remote information objects in
a global namespace, and HTTP for staging the transfer.
-
HyperText Markup
Language (HTML) - the common representation language for hypertext
documents on the Web. HTML was first publicly released as HTML 0.0 in 1990,
became Internet draft HTML 1.0 in 1993, and HTML 2.0 in 1994. The September
22, 1995 draft of the HTML
2.0 specification has been approved as a standard by the IETF
Application Area HTML Working Group. HTML
3.0 and Netscape
HTML are competing successors to HTML 2.0. Proposed features
in HTML 3.0 include forms, style sheets, mathematical markup, and text
flow around figures. For more detailed information, see the HTML
Reference Manual.
HTML is an application of the Standard
Generalized Markup Language (SGML ISO-8879), an international standard
approved in 1986, which specifies a formal meta-language for defining document
markup systems (more here
and here). An SGML
Document Type Definition (DTD) specifies valid tag names and element attributes.
HTML consists of content delimited by hierarchically nested, case-insensitive
start and end tags; a start tag may carry element attributes.
These attributes may be required, optional, or empty.
In addition, documents can be linked to other documents or to points within
the same document by establishing source and target anchor points. Many HTML
documents are produced by manual authoring or by word-processor-to-HTML
converters, but several WYSIWYG editors now support HTML styles; see the listing
at W3C and the Internet Tools Survey section
on Authoring HTML.
HTML files are viewed using a WWW client browser (software), the
primary user interface to the Web. HTML allows for embedding of images,
sounds, video streams, form fields and simple text formatting. References,
called hyperlinks, to other objects are embedded using URLs (see
below). When an object is selected by a hyperlink, the browser takes
an action based on the URL's type, e.g., retrieve a file, connect to another
Web site and display an HTML file stored there, or launch an application
such as an E-mail or newsgroup reader.
-
Universal
Resource Identifier (URI) - an IETF addressing protocol for objects
in the WWW ("if it's out there, we can point at it"). There are two types
of URIs: Uniform Resource Names (URNs) and Uniform Resource Locators
(URLs). The current IETF URI spec is here
and the URL spec is here.
URLs are location dependent and contain four distinct
parts: the protocol type, the machine name, the directory path and the
file name. There are several kinds of URLs: file URLs, FTP URLs, Gopher
URLs, News URLs, and HTTP URLs. A URL may be relative to a directory or
may point to an offset within a document. Arguments to CGI programs (see below)
may be embedded in URLs after the ? character.
-
HyperText Transfer
Protocol (HTTP) - an application-level network protocol for the
WWW. Tim Berners-Lee, father of the Web, describes it as a "generic stateless
object-oriented protocol." Stateless means that neither the client nor
the server stores information about the state of the other side between
requests. Statelessness helps scalability but is not necessarily
efficient: HTTP sets up a new connection for each request, which is
undesirable for situations requiring sessions or transactions.
-
In HTTP, commands (request methods) can be associated with particular
types of network objects (files, documents, network services). An HTTP
interaction consists of
-
establishing a TCP/IP connection to a WWW server,
-
sending a request to the server (containing a method to be applied
to a specific network object identified by the object's identifier, and
the HTTP protocol version, followed by information encoded in a header
style),
-
returning a response from the server to the client (consisting
of three parts: a status line, a response header, and response data), and
-
closing the connection.
-
HTTP supports dynamic data representation through client-server negotiation.
The requesting client specifies it can accept certain MIME
content types (more on this below) and the server responds with
one of these. All WWW clients can handle text/plain and text/html.
-
HTTP/1.0 Internet Draft 05 (the seventh release of HTTP/1.0) is targeted
as an Internet Informational RFC. The next immediate version of HTTP is
HTTP/1.1 Internet Draft 01.
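As a rough illustration of the URL structure and the HTTP request/response exchange described above, the following Python sketch splits a URL into the four parts named earlier, builds an HTTP/1.0 GET request that advertises the MIME types the client accepts, and splits a response into its status line, header, and data. The function names (split_url, build_request, parse_response) and the host www.example.com are assumptions made for illustration only; they do not come from any specification, and no network connection is opened.

```python
import posixpath
from urllib.parse import urlsplit


def split_url(url):
    """Split a URL into protocol, machine, directory path, and file name."""
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)
    return {
        "protocol": parts.scheme,
        "machine": parts.hostname,
        "directory": directory,
        "file": filename,
        "query": parts.query,  # CGI arguments that follow the '?' character
    }


def build_request(url, accept=("text/html", "text/plain")):
    """Build the text of an HTTP/1.0 GET request for the given URL."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        f"GET {target} HTTP/1.0",        # method, object identifier, version
        "Accept: " + ", ".join(accept),  # MIME types the client can handle
        "",                              # blank line ends the header
        "",
    ]
    return "\r\n".join(lines)


def parse_response(raw):
    """Split a response into status code, reason, header fields, and data.

    Assumes a well-formed status line of the form 'HTTP/1.0 200 OK'.
    """
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


if __name__ == "__main__":
    url = "http://www.example.com/docs/guide/index.html?topic=web"
    print(split_url(url))
    print(build_request(url))
    reply = "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<HTML>hi</HTML>"
    print(parse_response(reply))
```

A real client would send the request text over a TCP connection to the machine named in the URL and close the connection after reading the response; the sketch leaves that step out so that it can be run without a server.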
Web Architecture Extensibility
This basic web architecture is evolving rapidly to serve a wider variety of
needs beyond static document access and browsing. The Common Gateway Interface
(CGI) extends the architecture to three tiers by adding a back-end server
that provides services to the Web server on behalf of the Web client, permitting
dynamic composition of web pages. Helpers/plug-ins and Java/JavaScript
provide other interesting Web architecture extensions.
-
Common Gateway Interface (CGI)
- CGI is a standard for interfacing external programs with Web servers
(see Figure 1). The server hands client requests encoded in URLs
to the appropriate registered CGI program, which executes and returns results
encoded as MIME messages back to the server. CGI's openness avoids the
need to extend HTTP. The most common CGI applications handle HTML <FORM>
and <ISINDEX> commands.
-
CGI programs are executable programs that run on the Web server. They
can be written in any scripting language (interpreted) or programming language
(compiled) that can be executed on the Web server, including
C, C++, Fortran, Perl, Tcl, Unix shells, Visual Basic, AppleScript, and
others. Security precautions typically require that CGI programs be run
from a specified directory (e.g., /cgi-bin) under control of the webmaster
(Web system administrator); that is, they must be registered with the system.
-
Arguments to CGI programs are encoded in the URL by the client; the server
passes them to the program through environment variables (or, for form data
sent with the POST method, through standard input). The CGI program typically
returns HTML pages that it constructs on the fly.
-
Some problems with CGI are:
-
the CGI interface requires the server to execute a separate program for
each request, and
-
the CGI interface does not provide a way to share data and communications
resources across requests, so a program that must access an external resource
must open and close that resource each time it runs. This makes it difficult
to construct transactional interactions using CGI.
-
The current version is CGI/1.1.
W3C and others
are experimenting with next generation object-oriented APIs based on OMG
IDL; Netscape provides Netscape
Server API (NSAPI) and Progress Software and Microsoft provide Internet
Server API (ISAPI).
-
Helpers/Plug-ins - When a client browser retrieves a file, it
launches an installed helper application or plug-in to process the file
based on the file's MIME-type (see below). For example,
it may launch a Postscript or Acrobat reader, or MPEG or QuickTime player.
A helper application runs external to the browser while a plug-in
runs within the browser. For information on how to create new Netscape
Navigator plug-ins, see The
Plug-in Developer's Guide.
-
Common Client Interface (CCI) - this interface allows a third-party
application to remotely control the Web browser client. Netscape
Client APIs 2.0 (NCAPIs) depend on platform-specific native methods
of interprocess communication (IPC). Netscape plans to support DDE and OLE2
for Windows clients, X properties for UNIX clients, and Apple Events for
Macintosh clients.
-
Extensions to HTTP. W3C
and IETF Application Area
HTTP Working Group are working together on current and future versions
of HTTP. The HTTP-NG project is assessing two implementation approaches
to HTTP "replacements":
-
Spero's
approach - allows many requests per connection; the requests can be
asynchronous and the server can respond in any order, allowing several
transfers in parallel. A "session layer" divides the connection into numerous
channels. Control messages (GET requests, meta information) are returned
in a control channel; each object is returned in its own channel.
-
W3C
approach - Jim Gettys at W3C is using Xerox
ILU (a CORBA variant) to implement an ILU transport similar to Spero's
session protocol. The advantages of this approach are openness with respect
to pluggable transport protocols, support for multiple language environments,
and a step towards viewing the "web of objects." Related to this approach,
Netscape recently announced future support for OMG Internet
Inter-ORB Protocol (IIOP) standard on both client and server. This
will provide a uniform and language neutral object interchange format making
it easier to construct distributed object applications.
-
Java/JavaScript
- Java is a cross-platform WWW programming language from Sun Microsystems,
modeled after C++. Java programs embedded in HTML documents are called applets
and are specified using <APPLET> tags. The HTML for an applet contains
a code attribute that specifies the URL of the compiled applet file.
Applets are compiled to a platform-independent bytecode which can be safely
downloaded and executed by the Java interpreter embedded into the Web browser.
Browsers that support Java are said to be Java-enabled. If performance
is critical, a Java applet can be compiled to native machine language on
the fly. Such a compiler is known as a Just-In-Time (JIT) compiler.
JavaScript is a scripting language designed for creating dynamic, interactive
Web applications that link together objects and resources on both clients
and servers. A client-side JavaScript script can recognize and respond to user
events such as mouse clicks, form input, and page navigation, and query the state
or alter the performance of an applet or plug-in. A server-side JavaScript script
can exhibit behavior similar to Common Gateway Interface (CGI) programs.
JavaScript scripts are embedded in HTML documents using <SCRIPT> tags.
Similar to Java applets, JavaScript scripts are directly interpreted within
the client's browser and are therefore platform-independent. For a comparison
of Java and JavaScript, see here.
The Java Language Specification can be found here,
a Java tutorial here,
the Java Virtual Machine (interpreter) here,
the Java Developer's Kit (JDK) here,
and Java FAQs here. A comprehensive
Java page of resources can be found at JPL.
The JavaScript Language Specification can be found here,
a JavaScript tutorial here,
and the JavaScript FAQs here.
-
The IETF
Security Area Web Transaction Security (WTS) Working Group is working
on security services for the WWW. As chartered, it has produced Internet-Drafts
of Requirements
for Web Transaction Security, a Secure
HyperText Transfer Protocol specification, and Security
Extensions For HTML.
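To make the CGI mechanism described earlier in this section more concrete, here is a minimal sketch of a CGI program written in Python. It assumes the Web server follows the CGI/1.1 convention of placing the URL's arguments in the QUERY_STRING environment variable, and it returns a MIME-typed HTML page constructed on the fly. The function name handle_request and the name parameter are hypothetical and chosen only for this example.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def handle_request(environ):
    """Build a CGI response from the environment supplied by the Web server."""
    # The server copies everything after '?' in the URL into QUERY_STRING.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    # 'name' is a hypothetical argument; fall back to a default if it is absent.
    name = query.get("name", ["world"])[0]
    body = f"<HTML><BODY><H1>Hello, {escape(name)}!</H1></BODY></HTML>"
    # A CGI response starts with a MIME header, then a blank line, then the data.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # When run by a Web server, the response is written to standard output.
    sys.stdout.write(handle_request(os.environ))
```

Because the server starts the program anew for each request, any external resource the script needs (a database connection, for instance) would have to be opened and closed inside handle_request, which illustrates the resource-sharing problem noted above.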
Other Transfer Protocols
The Web also uses other HTTP-related
protocols for transferring and representing information, including:
-
Transmission Control Protocol/Internet Protocol (TCP/IP) - the fundamental
protocol suite that provides for the reliable delivery of streams of data from
one host to another. An introduction to TCP/IP is here.
-
File Transfer Protocol (FTP)
- a common method of moving files between two Internet sites. It is based
on TCP/IP.
-
Gopher
- a distributed document search and retrieval protocol (IETF RFC
1436) for obtaining files or information from hierarchical menus in
the Gopher information-retrieval system.
-
Internet Inter-ORB Protocol (IIOP) - an inter-ORB protocol for communication
between objects and applications. It is based on the Common Object Request
Broker Architecture (CORBA) specification.
-
Multipurpose Internet Mail Extensions (MIME)
- the standard for multimedia email and a building block of HTTP. The
Content-Type header at the start of a server's response identifies the type of file
the server has sent, e.g., binary, audio, video, movie, formatted word-processor
documents, graphics, spreadsheets, etc. The extensions to the mail format carried
by SMTP allow it to carry multiple types of data. When multimedia files are sent by
mail using the MIME standard, they are encoded as text that is not human-readable.
The Web browser maintains a list of pairs of MIME types and helper applications
for handling each type.
-
Network News Transfer Protocol (NNTP)
- the protocol used to connect to Usenet discussion groups.
-
Secure Socket Layer (SSL)
- a security protocol developed by Netscape for sending and receiving encrypted
information. It is based on encryption technology developed by RSA, Inc.
-
Simple Mail Transfer Protocol (SMTP)
- a protocol for transferring electronic mail from one host to another.
-
Simple Network Management Protocol (SNMP) - a protocol that allows
a network administrator to monitor network devices over the network.
-
Z39.50 - a protocol
that governs the formats and procedures by which two computers interact
when searching databases. It is used to search several databases of the same type,
and is session-oriented and stateful.
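The MIME entry above mentions two mechanisms that a short sketch may help clarify: the browser's table pairing MIME types with helper applications, and the encoding of binary data as text for mail transport. The Python sketch below is illustrative only. The helper names in HELPERS, the set of types handled inline, and the "save-to-disk" fallback are all assumptions, and base64 is used as one common MIME transfer encoding.

```python
import base64

# MIME types this hypothetical browser displays itself.
INLINE_TYPES = {"text/html", "text/plain", "image/gif", "image/jpeg"}

# Pairs of MIME type and helper application (names are made up).
HELPERS = {
    "application/postscript": "postscript-viewer",
    "application/pdf": "acrobat-reader",
    "video/mpeg": "mpeg-player",
    "video/quicktime": "quicktime-player",
}


def choose_handler(content_type):
    """Pick what should process a response, based on its Content-Type."""
    # Drop parameters such as '; charset=...' and normalize case.
    mime_type = content_type.split(";", 1)[0].strip().lower()
    if mime_type in INLINE_TYPES:
        return "browser"
    return HELPERS.get(mime_type, "save-to-disk")


def encode_for_mail(data):
    """Encode binary data as ASCII text using base64."""
    return base64.b64encode(data).decode("ascii")


if __name__ == "__main__":
    print(choose_handler("text/html; charset=iso-8859-1"))
    print(choose_handler("video/quicktime"))
    print(encode_for_mail(b"GIF89a"))
```

Keeping the table as plain data mirrors how browsers of the period let users edit their helper list without changing the browser itself.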
Other Open Standards
The Web also builds on additional open standards:
-
GIF, JPEG, and XBM image formats.
-
Virtual Reality Modeling Language (VRML)
- a proposed standard language for describing multi-participant interactive
simulations within the WWW.
-
HyperMedia Management Protocol (HMMP)
- a protocol to access and manipulate components of the Hypermedia Management
Schema (HMMS), a
data representation formalism (schema) for representing managed objects.
HMMS and HMMP are major components of the Web-Based
Enterprise Management standards effort to integrate existing standards,
such as SNMP/UDP, HTML/HTTP and DMI/RPC into a browser-managed
architecture.
-
Real Time Streaming Protocol (RTSP) - a recently proposed communication
protocol for control and delivery of video and audio in real-time.
-
Proxy and SOCKS firewall protocols.
-
S-HTTP security protocol.
A more complete list of standards can be found at Netscape
and the World
Wide Web Consortium. A complete list of Internet Engineering Task Force
(IETF) standard RFCs can be found here.
This research is sponsored by the Defense Advanced Research
Projects Agency and managed by the U.S. Army Research Laboratory under
contract DAAL01-95-C-0112. The views and conclusions contained in this
document are those of the authors and should not be interpreted as necessarily
representing the official policies, either expressed or implied, of the
Defense Advanced Research Projects Agency, U.S. Army Research Laboratory,
or the United States Government.
© Copyright 1996 Object Services and Consulting, Inc.
Permission is granted to copy this document provided this copyright statement
is retained in all copies. Disclaimer: OBJS does not warrant the accuracy
or completeness of the information on this page.
This page was written by Craig Thompson and Gil Hansen. Send
questions and comments about this page to thompson@objs.com
or gil@objs.com.
Last updated: 1/3/97 sjf