October 11, 2004
In my last editorial on Memory I described the database of memory models from Denali Software. These models are downloadable as XML-based SOMA (Specification Of Memory Architecture) files. This prompted me to write this week's editorial on XML (eXtensible Markup Language). XML has been promoted as a mechanism for data exchange across a multitude of industries.
XML is a subset of SGML (Standard Generalized Markup Language), while HTML is an application of SGML. The evolution of XML, HTML and the web itself is in the hands of the World Wide Web Consortium (W3C). Before we look at XML, let us review some background.
World Wide Web Consortium
In October 1994, Tim Berners-Lee, inventor of the Web, founded the World Wide Web Consortium (W3C) at the MIT Laboratory for Computer Science [MIT/LCS] in collaboration with CERN, where the Web originated, with support from DARPA and the European Commission. By promoting interoperability and encouraging an open forum for discussion, W3C is committed to leading the technical evolution of the Web. W3C has developed more than fifty technical specifications for the Web's infrastructure.
W3C's long-term goals for the Web are:
W3C concentrates its efforts on three principal tasks:
The Web is an application built on top of the Internet and, as such, has inherited its fundamental design principles.
W3C activities are generally organized into groups: Working Groups (for technical developments), Interest Groups (for more general work), and Coordination Groups (for communication among related groups). These groups, made up of representatives from Member organizations, the Team, and invited experts, produce the bulk of W3C's results: technical reports, open source software, and services (e.g. validation services).
Standard Generalized Markup Language (SGML)
Standard Generalized Markup Language (SGML) is an international standard (ISO 8879:1986) for the definition of device-independent, system-independent methods of representing texts in electronic form.
There are three characteristics of SGML which distinguish it from other markup languages: its emphasis on descriptive rather than procedural markup; its document type concept; and its independence of any one system for representing the script in which a text is written.
A descriptive markup system uses markup codes which simply provide names to categorize parts of a document. Markup codes such as <P> or </UL> simply identify a portion of a document and assert of it that "the following item is a paragraph," or "this is the end of the most recently begun list," etc. By contrast, a procedural markup system defines what processing is to be carried out at particular points in a document: "call a function or procedure with parameters thus and so".
With descriptive instead of procedural markup the same document can readily be processed by many different pieces of software, each of which can apply different processing instructions to those parts of it which are considered relevant.
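To make the point concrete, here is a minimal sketch in Python. It uses XML syntax and Python's standard xml.etree.ElementTree parser as a stand-in for an SGML system, and the element names (doc, p, ul, li) and both functions are assumptions chosen for illustration. The same descriptively marked-up document is handed to two unrelated consumers, each of which applies its own processing to the parts it cares about.

```python
import xml.etree.ElementTree as ET

# A tiny document with purely descriptive markup (hypothetical element names).
DOC = (
    "<doc>"
    "<p>First paragraph.</p>"
    "<ul><li>one</li><li>two</li></ul>"
    "<p>Second paragraph.</p>"
    "</doc>"
)

def to_plain_text(root):
    """Consumer 1: render paragraphs as lines and list items as bullets."""
    lines = []
    for node in root:
        if node.tag == "p":
            lines.append(node.text)
        elif node.tag == "ul":
            lines.extend("* " + item.text for item in node.findall("li"))
    return "\n".join(lines)

def count_paragraphs(root):
    """Consumer 2: ignores lists entirely and only counts paragraphs."""
    return len(root.findall("p"))

root = ET.fromstring(DOC)
plain = to_plain_text(root)
n_paragraphs = count_paragraphs(root)
```

Nothing in the document says how a paragraph should be printed or counted; that decision lives entirely in the software reading it.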
Secondly, SGML introduces the notion of a document type, and hence a document type definition (DTD). The type of a document is formally defined by its constituent parts and their structure. The definition of a report, for example, might be that it consisted of a title and possibly an author, followed by an abstract and a sequence of one or more paragraphs. Anything lacking a title, according to this formal definition, would not formally be a report, and neither would a sequence of paragraphs followed by an abstract, whatever other report-like characteristics these might have for the human reader.
If documents are of known types, a special purpose program (called a parser) can be used to process a document claiming to be of a particular type and check that all the elements required for that document type are indeed present and correctly ordered. More significantly, different documents of the same type can be processed in a uniform way. Programs can be written which take advantage of the knowledge encapsulated in the document structure information, and which can thus behave in a more intelligent fashion.
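The report example above can be sketched as a small check in Python. This is not a real DTD validator: it is a hand-written approximation, assuming XML syntax, hypothetical element names (report, title, author, abstract, p), and a regular expression standing in for the content model "title, optional author, abstract, one or more paragraphs".

```python
import re
import xml.etree.ElementTree as ET

# Stand-in for a DTD content model: title, author?, abstract, p+
# Each child tag name is followed by a comma so tokens cannot run together.
CONTENT_MODEL = re.compile(r"title,(author,)?abstract,(p,)+")

def is_report(xml_text):
    """Return True if the document has the structure of a 'report'."""
    root = ET.fromstring(xml_text)
    if root.tag != "report":
        return False
    sequence = "".join(child.tag + "," for child in root)
    return CONTENT_MODEL.fullmatch(sequence) is not None

good = "<report><title>T</title><abstract>A</abstract><p>x</p><p>y</p></report>"
no_title = "<report><abstract>A</abstract><p>x</p></report>"
wrong_order = "<report><title>T</title><p>x</p><abstract>A</abstract></report>"
```

With these inputs, is_report(good) is true, while the document lacking a title and the one whose paragraph precedes the abstract are both rejected, exactly as the formal definition demands.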
A basic design goal of SGML was to ensure that documents encoded according to its provisions should be transportable from one hardware and software environment to another without loss of information.
SGML provides a general purpose mechanism for string substitution, that is, a simple machine-independent way of stating that a particular string of characters in the document should be replaced by some other string when the document is processed.
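XML inherited this substitution mechanism from SGML in the form of entity declarations. The following is a minimal sketch using Python's standard xml.etree.ElementTree parser; the entity name w3c and the note element are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The internal DTD subset declares an entity; the parser replaces every
# reference to it (&w3c;) with the declared string while reading the document.
DOC = """<!DOCTYPE note [
  <!ENTITY w3c "World Wide Web Consortium">
]>
<note>Standards are maintained by the &w3c;.</note>"""

root = ET.fromstring(DOC)
expanded = root.text
print(expanded)
```

The program that consumes the document never sees the reference, only the replacement text.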
Hypertext Markup Language (HTML)
The history of hypertext begins in July of 1945. President Roosevelt's science advisor during World War II, Dr. Vannevar Bush, proposed Memex in an article titled "As We May Think" published in The Atlantic Monthly. In the article, Bush outlined the ideas for a machine that would have the capacity to store textual and graphical information in such a way that any piece of information could be arbitrarily linked to any other piece.
Hypertext Markup Language or HTML was originally proposed by Tim Berners-Lee in 1989 while at CERN, an international high energy physics research center near Geneva. HTML was popularized by the Mosaic browser developed at the National Center for Supercomputing Applications (NCSA). HTML is now the lingua franca for publishing hypertext on the World Wide Web. It is a non-proprietary format based upon SGML, and can be created and processed by a wide range of tools, from simple text editors to WYSIWYG authoring tools such as Microsoft FrontPage. Microsoft Word can also save a document as a web page through its Save As command. There have been several versions of HTML. HTML 4, the latest version, issued in 1999, extends HTML with mechanisms for style sheets, scripting, frames, embedding objects, improved support for right to left and mixed direction text, richer tables, and enhancements to forms, offering improved accessibility for people with disabilities.
During 1999, HTML 4 was re-cast in XML and the resulting eXtensible Hypertext Markup Language (XHTML) 1.0 became a W3C Recommendation in January 2000. This Recommendation simply re-casts HTML 4 as an XML application, while the next Recommendation is concerned with the modularization of XHTML. That second Recommendation describes how XHTML can be organized as a number of modules used to mark up headings, paragraphs, lists, hypertext links, images and other document idioms. Modules provide a means for subsetting and extending XHTML, a feature desired for extending XHTML's reach onto emerging platforms.
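One practical consequence of re-casting HTML as an XML application is that XHTML documents must be well-formed XML. The sketch below, which assumes Python's standard xml.etree.ElementTree parser and a helper name (is_well_formed) invented for this illustration, shows a list written in the looser HTML 4 style, where the closing list-item tags may be omitted, being rejected by an XML parser while the XHTML-style equivalent is accepted.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# HTML 4 lets authors omit </li>; an XML parser treats that as an error.
html_style = "<ul><li>one<li>two</ul>"
# XHTML requires every element to be explicitly closed.
xhtml_style = "<ul><li>one</li><li>two</li></ul>"

html_ok = is_well_formed(html_style)
xhtml_ok = is_well_formed(xhtml_style)
```

Here html_ok is false and xhtml_ok is true, which is the kind of strictness that lets generic XML tools process XHTML pages.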
Today's search engines are cleverer. The body section consists of a combination
Text enclosed between <B><I> and </I></B> will be displayed as both bold and italicized.
-- Jack Horgan, EDACafe.com Contributing Editor.