Sustainability of Digital Formats: Planning for Library of Congress Collections


HyperText Markup Language (HTML), versions prior to 2.0


Identification and description

Full name HyperText Markup Language (HTML), versions prior to 2.0
Description

Hypertext Markup Language (HTML) is the standard markup language for creating web pages and web applications. This format description applies to early HTML files, created before the adoption of the first formal specification for HTML, published in November 1995 as IETF RFC 1866: HTML 2.0. See HTML_family for a description that provides context for the entire family of HTML formats and HTML_2_0 for a description of HTML 2.0.

HTML was originally developed at CERN (Conseil Européen pour la Recherche Nucléaire; European Organization for Nuclear Research) by Tim Berners-Lee around 1990. In 1989, Berners-Lee wrote a proposal to his boss. It was "an attempt to persuade CERN management that a global hypertext system was in CERN's interests." He was given the go-ahead and started to build what became the World Wide Web. HTML, as a syntax for marking up online documents, was a key component of the World Wide Web. A CERN statement about HTML directions (around 1992) noted, "The HTML language has been in use in the field since 1990." Between 1992 and 1995, Berners-Lee and Dan Connolly published several drafts of documentation for HTML. Initial specifications were distributed on the www-talk mailing list and on the incipient World Wide Web at CERN (see World Wide Web: The Project from December 1992). In mid-1993, a specification for the "Hypertext Markup Language (HTML)" was published as an IETF (Internet Engineering Task Force) Internet Draft. In 1994, the IETF formed an HTML Working Group, which issued several updates to the HTML Internet Draft. See Status pages for HTML Working Group from the IETF.

The original HTML was based on SGML (ISO 8879:1986, Standard Generalized Markup Language), an international standard for a system-independent approach for marking up text into structural units such as paragraphs, headings, and list items. See Notes below on SGML as the basis for HTML. One innovative aspect of HTML was support for hypertext, by using the <A> (anchor) element with the HREF attribute to link to other resources.
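
The linking mechanism described above can be illustrated with a short sketch. It uses Python's standard html.parser module to collect the HREF values of <A> elements from a small, made-up sample in the style of early HTML. The class name AnchorCollector, the function extract_links, and the sample markup are hypothetical and are not drawn from any CERN or IETF document.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect HREF values from <A> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...>
        # and <a href=...> are handled the same way.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(markup):
    """Return the HREF targets found in a string of HTML markup."""
    parser = AnchorCollector()
    parser.feed(markup)
    parser.close()
    return parser.links


# Made-up sample in the style of early HTML (uppercase tags, no <HEAD>).
sample = (
    "<TITLE>Sample page</TITLE>\n"
    "<H1>Sample</H1>\n"
    'See the <A HREF="http://example.org/other.html">other page</A> and\n'
    "<A NAME=top>this named anchor</A>, which is not a link.\n"
)
print(extract_links(sample))
```

Anchors that carry only a NAME attribute are skipped, since they mark a destination rather than a link to another resource.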

HTML documents are plain text documents. Early HTML files typically began with an <HTML> tag and were encoded as 7-bit ASCII or its 8-bit extension ISO/IEC 8859-1, also known as ISO Latin-1. A list of HTML Tags from 1992 shows some tags (i.e., elements) that are still used today and others that have been dropped. For example, the <NEXTID> tag was specific to the NeXT computer and was already deprecated in the 1995 HTML 2.0 specification. The <ISINDEX> tag was deprecated in HTML 4.01, and is no longer supported by browsers. The <HEAD> section was apparently not used in 1992 to wrap the elements <TITLE>, <LINK>, etc., but was included in the first Internet Draft for an HTML specification (June 1993). The 1993 draft also introduced the use of a DOCTYPE declaration, with the aim of making an HTML document a valid SGML document. For a more specific proposal for the DOCTYPE declaration, see Towards Closure on HTML (1994).
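
As a minimal sketch of what the encoding statement above implies for reading such files today, the following Python function tries 7-bit ASCII first and falls back to ISO/IEC 8859-1. The function name decode_early_html is hypothetical, and the approach assumes the file really does use one of the two encodings named above.

```python
def decode_early_html(data):
    """Return (text, encoding_name) for bytes assumed to be ASCII or Latin-1."""
    try:
        # Pure 7-bit content decodes cleanly as ASCII.
        return data.decode("ascii"), "ascii"
    except UnicodeDecodeError:
        # Every byte value maps to a character in ISO 8859-1, so this
        # decode cannot fail. It also cannot confirm that Latin-1 was
        # the encoding the author intended.
        return data.decode("iso-8859-1"), "iso-8859-1"


text, encoding = decode_early_html(b"<HTML><TITLE>Caf\xe9</TITLE></HTML>")
print(encoding)
print(text)
```

Because the fallback accepts any byte sequence, a result of "iso-8859-1" should be read as "not pure ASCII" rather than as positive identification of the encoding.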

Production phase The primary use of HTML is as a final-state format for web pages made available on the Internet. Early HTML files were often created directly in a text editor.
Relationship to other formats
    Subtype of HTML_family, HTML File Format Family
    Has later version HTML_2_0, HyperText Markup Language (HTML) 2.0
    Defined via SGML, Standard Generalized Markup Language (SGML). ISO 8879:1986. Early versions of HTML were defined by an SGML DTD (Document Type Definition).

Local use

LC experience or existing holdings See HTML_family.
LC preference See HTML_family.

Sustainability factors

Disclosure

HTML has been an openly documented specification since its origin in the early 1990s. Initially it was developed and documented at CERN and via the www-talk mailing list.

In 1994, the Internet Engineering Task Force formed an HTML working group, which published several Internet Drafts of a specification for HTML. In November 1995, the first formal standard for HTML was published by the IETF as "RFC 1866: Hypertext Markup Language - 2.0." See HTML_2_0 for a description of HTML 2.0.

    Documentation

Dan Connolly distributed an SGML DTD for HTML in a July 15, 1992 message on the www-talk mailing list. This linked to an informal description of HTML tags (i.e., elements) used at that time, part of more extensive early documentation on HTML from CERN. From 1993 through 1995, Berners-Lee and Connolly wrote a sequence of Internet Drafts. See IETF HTML WG Status Pages and Format Specifications below.

Adoption See HTML_family.
    Licensing and patents

Between 1990 and 1992, the copyright in documentation at CERN for HTML and other aspects of the World Wide Web was claimed by CERN, but made available on the following terms that encouraged use: "The information (of all forms) in these directories is the intellectual property of the European Particle Physics Laboratory (known as CERN). It is freely available for non-commercial use in collaborating non-military academic institutes. Commercial organisations wishing to use this code should apply to CERN for conditions. Any modifications, improvements or extensions made to this code, or ports to other systems, must be made available under the same terms."

The Birth of the Web, from CERN, states, "On 30 April 1993 CERN put the World Wide Web software in the public domain. CERN made the next release available with an open licence, as a more sure way to maximise its dissemination."

Transparency All HTML files can be opened and viewed in text editors. Since early HTML files do not include style markup (e.g., in CSS) or scripts (e.g., JavaScript), their contents are easily read and understood using a plain text editor.
Self-documentation The earliest versions of HTML documentation did not include a mechanism for embedding metadata in a web page. The <META> element was included in a 1994 Document Type Definition for the HyperText Markup Language Plus from Dave Raggett. This allowed the inclusion of name/value pairs in the <HEAD> section of an HTML document. A primary purpose was to support indexing of documents on a server. By mid-1995, a draft specification for HTML 2.0 described two main functions for "meta-information": to provide a means to discover that the data set exists and how it might be obtained or accessed; and to document the content, quality, and features of a data set, indicating its fitness for use.
External dependencies None.
Technical protection considerations See HTML_family.

Quality and functionality factors

Text
Normal rendering Early HTML documents were encoded as plain text, either in 7-bit ASCII or its 8-bit extension ISO/IEC 8859-1. All other characters could be incorporated by using SGML-style entities. For other textual characteristics of HTML documents, see HTML_family.
Integrity of document structure Semantic tags for paragraphs, up to six heading levels, and list structures of several types have been part of HTML since the earliest use of the markup language. See HTML Tags (from 1992).
Integrity of layout and display Preserving particular aspects of layout was not an intent of the original HTML. The focus was on making the textual content and semantic structure of a resource conveniently readable on different devices.
Support for mathematics, formulae, etc. Dave Raggett's 1994 Proposal for Mathematical Equations in HTML+ stated, "Currently, the best way of including equations in HTML documents is to first write the document in LaTeX and then use the latex2html filter to create the corresponding HTML document, together with the equations as a number of bitmap files." His proposal for mathematical markup was not incorporated into HTML 2.0. See also HTML_family.
Functionality beyond normal rendering HTML was developed specifically to support linking among online resources.
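
The SGML-style entities mentioned under Normal rendering can be illustrated with a small Python sketch. It replaces each non-ASCII character with a named entity where Python's html.entities table has one, and with a numeric character reference otherwise, then reverses the process with html.unescape. The function name to_ascii_with_entities is hypothetical. Python's entity table reflects later HTML versions, so it covers more names than early HTML processors would have recognized; treat the sketch as an illustration of the mechanism, not of the early entity set.

```python
import html
from html.entities import codepoint2name


def to_ascii_with_entities(text):
    """Rewrite non-ASCII characters as entity references (illustrative only)."""
    pieces = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            # ASCII characters, including markup characters, pass through.
            pieces.append(ch)
        elif code in codepoint2name:
            # Named entity, e.g. &eacute; for U+00E9.
            pieces.append("&" + codepoint2name[code] + ";")
        else:
            # Fall back to a numeric character reference.
            pieces.append("&#" + str(code) + ";")
    return "".join(pieces)


encoded = to_ascii_with_entities("Conseil Européen")
print(encoded)                 # Conseil Europ&eacute;en
print(html.unescape(encoded))  # Conseil Européen
```

The encoded form contains only 7-bit ASCII characters, which is what allowed such text to be carried in files limited to that repertoire.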

File type signifiers and format identifiers

Tag Value Note
Filename extension See related format.  See HTML_family.
Internet Media Type See related format.  See HTML_family.
Pronom PUID See note.  PRONOM does not have an entry and signature specifically for the early HTML versions, but does have fmt/96 for "generic" HTML, identified by an opening <HTML> tag and closing </BODY> and </HTML> tags. This matches HTML documents which pre-date the first formal specification (HTML 2.0), or which otherwise do not fully conform to any formal specifications. See http://www.nationalarchives.gov.uk/PRONOM/fmt/96. PUID: fmt/96
Wikidata Title ID See note.  Wikidata does not include a Title ID specifically for the HTML file format prior to HTML 2.0. The Wikidata Title ID Q881 is used for the HTML File Format Family.
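
The identification rule described in the PRONOM note above (an opening <HTML> tag together with closing </BODY> and </HTML> tags) can be approximated with a simple heuristic. The Python sketch below is not the actual PRONOM signature for fmt/96; the function name looks_like_generic_html, the regular expressions, and the 1024-byte search window are all assumptions made for illustration.

```python
import re

# Opening <HTML> tag, with or without attributes, in any letter case.
OPEN_HTML = re.compile(rb"<html[\s>]", re.IGNORECASE)
# Closing </BODY> then </HTML>, allowing whitespace, at the end of the data.
CLOSE_BODY_HTML = re.compile(rb"</body\s*>\s*</html\s*>\s*$", re.IGNORECASE)


def looks_like_generic_html(data, window=1024):
    """Heuristic check loosely modelled on the description of fmt/96."""
    head = data[:window]   # look for the opening tag near the start
    tail = data[-window:]  # look for the closing tags near the end
    return bool(OPEN_HTML.search(head)) and bool(CLOSE_BODY_HTML.search(tail))


print(looks_like_generic_html(b"<HTML>\n<BODY>\n<P>Hi\n</BODY>\n</HTML>\n"))
print(looks_like_generic_html(b"<TITLE>x</TITLE>\n<H1>y</H1>\n"))
```

Files that omit the <HTML> and <BODY> wrappers would not be matched by this heuristic.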

Notes

General

SGML as Basis for HTML: In his 1998 "A History of HTML," Dave Raggett wrote, "The HTML that Tim invented was strongly based on SGML (Standard Generalized Mark-up Language), an internationally agreed upon method for marking up text into structural units such as paragraphs, headings, list items and so on. SGML could be implemented on any machine. The idea was that the language was independent of the formatter (the browser or other viewing software) which actually displayed the text on the screen. The use of pairs of tags such as <TITLE> and </TITLE> to enclose an element's content is taken directly from SGML, which does exactly the same. The SGML elements used in Tim's HTML included P (paragraph); H1 through H6 (heading level 1 through heading level 6); OL (ordered lists); UL (unordered lists); LI (list items) and various others. What SGML does not include, of course, are hypertext links: the idea of using the anchor element with the HREF attribute was purely Tim's invention, as was the now-famous `www.name.name' format for addressing machines on the Web.
Basing HTML on SGML was a brilliant idea: other people would have invented their own language from scratch but this might have been much less reliable, as well as less acceptable to the rest of the Internet community. Certainly the simplicity of HTML, and the use of the anchor element A for creating hypertext links, was what made Tim's invention so useful."

History

HTML was developed as a primary technological component of a distributed information environment, first proposed to CERN management by Tim Berners-Lee in 1989. According to a CERN page about the proposal, his boss wrote the words "Vague, but exciting" on it, allowing Berners-Lee to continue developing what became the World Wide Web.

Tim Berners-Lee began work on his ideas for the World Wide Web around 1990 and started the www-talk mailing list in September 1991. An early public description of HTML tags (i.e., elements) was published on the Web in 1992.

Following a period of informal development, the first formal standard for the HTML format was version 2.0, published as RFC 1866 in November 1995 by the IETF (Internet Engineering Task Force). The authors were Tim Berners-Lee and Dan Connolly. The RFC indicates, "This specification roughly corresponds to the capabilities of HTML in common use prior to June 1994." See HTML_2_0.


Format specifications


Useful references

URLs


Last Updated: 03/29/2018