Sustainability of Digital Formats: Planning for Library of Congress Collections


HyperText Markup Language (HTML) 4.01

Format Description Properties

Identification and description

Full name HyperText Markup Language (HTML) 4.01
Description

HyperText Markup Language (HTML) is the standard markup language for creating web pages and web applications. This format description is for HTML version 4.01, standardized under the auspices of the World Wide Web Consortium (W3C) and published as a W3C Recommendation in December 1999. The abstract of the specification reads, "This specification defines the HyperText Markup Language (HTML), the publishing language of the World Wide Web. This specification defines HTML 4.01, which is a subversion of HTML 4. In addition to the text, multimedia, and hyperlink features of the previous versions of HTML (HTML 3.2 and HTML 2.0), HTML 4 supports more multimedia options, scripting languages, style sheets, better printing facilities, and documents that are more accessible to users with disabilities. HTML 4 also takes great strides towards the internationalization of documents, with the goal of making the Web truly World Wide." As indicated in the abstract, HTML 4 introduced major changes from HTML 3.2.

The changes between HTML 4.0 and HTML 4.01 were mainly minor corrections and clarifications. However, since corrections were needed to the underlying SGML DTD (Document Type Definition) declared at the start of an HTML 4 document, the compilers of this resource have chosen to treat the two formats as distinct. This corresponds to the practice used in the PRONOM database of the UK National Archives. See the PRONOM PUID under File type signifiers below. Since HTML 4.01 was the current recommendation over a longer period and used much more widely, the description for HTML 4.0 is minimal. In this description the use of "HTML 4" indicates applicability to both HTML 4 versions.

Major changes introduced in HTML 4 included:

  • Extending the document character set from ISO-8859-1 (aka ISO Latin-1) to ISO 10646 (equivalent to Unicode). See Notes below on internationalization and character encodings. Also to support internationalization, HTML 4 improved support for right-to-left and mixed-direction text (aka Bi-Di) and permitted inline text spans to be given a lang attribute.
  • Increased emphasis on use of stylesheets and the <STYLE> element to separate representation of the semantic structure of a document from its presentation. Style information could be specified within an HTML document or in external style sheets. W3C expressed an intent to eventually phase out many of HTML's presentation elements and attributes and the specification marked many as "deprecated."
  • Increased emphasis on the use of scripts and the <SCRIPT> element to allow authors to create dynamic Web pages (e.g., "smart forms" that react as users fill them out) and use HTML as a means to build networked applications.
  • The model for tables was extended to serve several objectives, including responding to practical experience, supporting accessibility, and simplifying import of tables conforming to the widely used CALS table model.
  • Support both for documents constructed from frames (known as framesets) and for inline frames (known as iframes). See Notes on Frames below.
  • The <OBJECT> element was introduced as a generalized approach for inserting multimedia. This element was intended to take over from <IMG>, <ISMAP>, <APPLET>, etc.
  • HTML 4 introduced three Document Type Definitions (DTDs) and associated DOCTYPE declarations: Transitional, Strict, and Frameset. The Frameset DTD was required for an HTML document that comprised a collection of rectangular areas (frames). See Notes on Frames below. The Strict DTD excluded presentation attributes and elements that W3C expected to phase out as support for style sheets matured. The compilers of this resource have not found evidence for widespread use of the Strict DTD on public websites; it may have been used as part of internal workflows. The Transitional DTD was very widely used for many years. According to W3Techs (Web Technology Surveys), in early March 2018, of websites based on HTML (as distinct from XHTML), roughly 13% still used HTML 4 Transitional and less than 1% used HTML 4 Strict. The remainder had almost all adopted HTML 5. Comments welcome.

For more detail on changes between HTML 3.2 and HTML 4, see HTML 4 Explained by Ross Shannon and Changes between HTML 3.2 and HTML 4.0 (18 December 1997) from W3C.

Production phase The primary use of HTML is as a final-state format for web pages made available on the Internet. Early HTML files were often created directly in a text editor, but by the time HTML 4 was the accepted standard for web pages, most pages were created in visual editors like Adobe's Dreamweaver or in enterprise-level content management systems.
Relationship to other formats
    Subtype of HTML_family, HTML File Format Family
    Has earlier version HTML_4_0, HyperText Markup Language (HTML), HTML 4.0
    Equivalent to XHTML_1_0, Extensible HyperText Markup Language (XHTML) 1.0. XHTML 1.0 is described as a reformulation of HTML 4 as XML.
    Has later version HTML_5, HyperText Markup Language (HTML) 5
    Defined via SGML, Standard Generalized Markup Language (SGML). ISO 8879:1986

Local use

LC experience or existing holdings The Library of Congress home page archived on January 30, 2001 used HTML 4.01 Transitional. For the new design introduced in April 2005, XHTML 1.0 Transitional was used. See also HTML_family.
LC preference See HTML_family.

Sustainability factors

Disclosure

HTML 4.01, produced under the auspices of the W3C, is a non-proprietary format, openly developed and published, and freely implementable.

    Documentation

The specification for version 4.01 of the HTML markup language was published as a W3C Recommendation in December 1999.

Adoption

HTML 4 was widely used in the late 1990s and 2000s and many sites using this version of HTML are still to be found on the Web. According to W3Techs (Web Technology Surveys), in early February 2018, roughly 20% of websites were based on XHTML. Roughly 16% still used HTML 4, while the remainder had almost all adopted HTML 5. Some features permitted in HTML 4 are considered obsolete as of early 2018 and support may be dropped in browsers. For example, the <ISINDEX> element is largely unsupported. The special <DIR> and <MENU> list elements have been declared obsolete, with guidance to use the <UL> unordered list with appropriate styling instead. Obsolete tags can be identified at CanIUse or in the latest HTML 5.x specification.

See also HTML_family.

    Licensing and patents

No concerns. See W3C IPR Software Notice that was applicable from 1995 to 1998 and the revised W3C Software Notice and License introduced in August 1998. See also HTML_family.

Transparency HTML 4 files can be opened and viewed in text editors. The increased use of CSS and JavaScript files in HTML 4 sometimes resulted in less transparency than in earlier HTML versions.

See also HTML_family.

Self-documentation

The HTML 4 specification provided extensive guidance and examples for the <META> element, including mention of the Dublin Core set of properties as intended to promote interoperability. The specification mentions the RDF Resource Description Language as a development effort at W3C aimed at supporting richer metadata. See also HTML_family.

External dependencies See HTML_family.
Technical protection considerations See HTML_family.

Quality and functionality factors

Text
Normal rendering See HTML_family.

File type signifiers and format identifiers

Tag | Value | Note
Filename extension | See related format. | See HTML_family.
Internet Media Type | See related format. | See HTML_family.
Magic numbers | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 | The specification for HTML 4.01 requires that a conforming document have a document type declaration before the opening <HTML> tag. For HTML 4.01, the declaration must begin with '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01'. Note that lower case characters may also be used. This string covers Strict, Transitional, and Frameset DTDs.
Pronom PUID | fmt/100 | See http://www.nationalarchives.gov.uk/PRONOM/fmt/100.
Wikidata Title ID | See note. | There is no Wikidata Title ID for HTML 4.01. See https://www.wikidata.org/wiki/Q3782232 for HTML 4.0.
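
As an illustration of how the magic-number string might be applied in practice, the following Python sketch checks the start of a file for the HTML 4.01 document type declaration and reports which of the three DTDs it names. The function name classify_html401 is hypothetical, and the tolerance for variable whitespace between keywords is an assumption rather than something stated in this description. It is a sketch, not a substitute for an identification tool driven by PRONOM signatures.

```python
import re

# Signature prefix from the Magic numbers entry, matched case-insensitively
# because lower case characters may also be used. Allowing flexible
# whitespace between the keywords is an assumption of this sketch.
_HTML401_DOCTYPE = re.compile(
    r'<!DOCTYPE\s+HTML\s+PUBLIC\s+"-//W3C//DTD HTML 4\.01( Transitional| Frameset)?//',
    re.IGNORECASE,
)


def classify_html401(head: str):
    """Return 'Strict', 'Transitional', 'Frameset', or None (hypothetical helper)."""
    match = _HTML401_DOCTYPE.search(head)
    if match is None:
        return None  # not an HTML 4.01 declaration
    flavor = match.group(1)
    # The Strict public identifier carries no flavor word after "4.01".
    return flavor.strip().capitalize() if flavor else "Strict"


if __name__ == "__main__":
    sample = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">'
    print(classify_html401(sample))
```

Only the first few hundred bytes of a file normally need to be examined, since the declaration must precede the opening <HTML> tag.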

Notes

General

Internationalization and character encodings: HTML 4 integrated the recommendations of RFC 2070: Internationalization of the Hypertext Markup Language published in January 1997, with a few minor modifications. The restriction to the ISO-8859-1 coded character set was removed. The document character set was extended to be the Universal Character Set (UCS) of ISO 10646:1993 (equivalent to Unicode). The HTML 4 specification stated, "Commonly used character encodings on the Web include ISO-8859-1 (also referred to as "Latin-1"; usable for most Western European languages), ISO-8859-5 (which supports Cyrillic), SHIFT_JIS (a Japanese encoding), EUC-JP (another Japanese encoding), and UTF-8 (an encoding of ISO 10646 using a different number of bytes for different characters)." Use of registered character encodings from the IANA registry for Character Sets was recommended. For HTML 4, there was no default character encoding. The specification recommended that the charset used in a web page be identified in a <META> element as early as possible in the <HEAD> element. An example of the syntax is <META http-equiv="Content-Type" content="text/html; charset=EUC-JP">. See Useful References, below.
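
As a minimal sketch of how a tool might read the <META> declaration described above, the Python below uses the standard library's html.parser module to return the first charset named in a <META http-equiv="Content-Type"> element. The class and function names are hypothetical. The sketch ignores any charset sent in an HTTP Content-Type header, which a fuller implementation would also need to consult.

```python
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Records the first charset declared via <META http-equiv="Content-Type">."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "meta" or self.charset is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "content-type":
            return
        # The content value looks like "text/html; charset=EUC-JP".
        for part in attr.get("content", "").split(";"):
            key, _, value = part.strip().partition("=")
            if key.strip().lower() == "charset" and value.strip():
                self.charset = value.strip()


def declared_charset(markup: str):
    """Return the declared charset, or None if no declaration is found."""
    finder = MetaCharsetFinder()
    finder.feed(markup)
    finder.close()
    return finder.charset


if __name__ == "__main__":
    page = '<HEAD><META http-equiv="Content-Type" content="text/html; charset=EUC-JP"></HEAD>'
    print(declared_charset(page))
```

Because HTML 4 had no default encoding, a result of None means the encoding must be determined by other means.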

Frames and framesets: The concept of a frameset as the structure for a web page was introduced in HTML 4.0. A frameset defined the positioning of rectangular frames in a browser window. The content for each frame was an independent HTML document referred to by a target URL. The Frameset DTD was used to specify the arrangement of frames in a browser window with a target URL for each frame. See Basic Frames by Ross Shannon for details. One of the most popular uses of frames was to present a coherent body of content, such as user documentation or help for a software application. A table of contents would be presented in one frame. When a user selected a topic, the document on that topic was shown in another frame. However, because of disadvantages such as those listed at Advantages and disadvantages of frames or Frames: Good or Bad? and challenges presented by mobile devices with small screens, support for the frameset structure and the associated elements was dropped from XHTML 1.1 in 2001.

The <IFRAME> element was also introduced in HTML 4. Inline or “floating” frames can be positioned on a page, much like an image or a table. This extension to HTML was introduced by Microsoft in Internet Explorer and adopted by other browsers. Iframes do not have the same disadvantages as framesets and their use is encouraged as a substitute for framesets. See Inline Frames by Ross Shannon.
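
Because each frame points to an independent HTML document, a tool handling a framed page needs the target URLs as well as the page itself. The following Python sketch, using the standard library's html.parser module, lists the targets named by <FRAME> and <IFRAME> elements. The class name and the sample file names are hypothetical and serve only as an illustration.

```python
from html.parser import HTMLParser


class FrameSourceCollector(HTMLParser):
    """Collects src targets of <FRAME> and <IFRAME> elements (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.frames = []   # targets of frames inside a frameset
        self.iframes = []  # targets of inline frames

    def handle_starttag(self, tag, attrs):
        src = dict(attrs).get("src")
        if src is None:
            return
        if tag == "frame":
            self.frames.append(src)
        elif tag == "iframe":
            self.iframes.append(src)


if __name__ == "__main__":
    # Hypothetical two-frame help page: table of contents plus topic pane.
    page = (
        '<FRAMESET cols="30%,70%">'
        '<FRAME src="toc.html" name="contents">'
        '<FRAME src="topic1.html" name="main">'
        '</FRAMESET>'
    )
    collector = FrameSourceCollector()
    collector.feed(page)
    collector.close()
    print(collector.frames)
```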

Changes to elements and attributes: HTML 4 made many changes to the set of elements and attributes supported. Not all these changes persisted in the later migration to HTML 5.

  • Elements introduced in HTML 4 included: <ABBR>, <ACRONYM>, <BDO>, <BUTTON>, <COL>, <COLGROUP>, <DEL>, <FIELDSET>, <FRAME>, <FRAMESET>, <IFRAME>, <INS>, <LABEL>, <LEGEND>, <NOFRAMES>, <NOSCRIPT>, <OBJECT>, <OPTGROUP>, <PARAM>, <SPAN>, <TBODY>, <TFOOT>, <THEAD>, and <Q>.
  • The new id attribute allowed any element to be the destination anchor of a link. This attribute was significant for application of CSS style sheets.
  • Elements deprecated included: <APPLET>, <BASEFONT>, <CENTER>, <DIR>, <FONT>, <ISINDEX>, <MENU>, <S>, <STRIKE>, and <U>.
  • Elements declared obsolete included: <LISTING>, <PLAINTEXT>, and <XMP>. Use of the <PRE> element was recommended instead of these elements.
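
The deprecated and obsolete element lists above lend themselves to a simple audit of existing pages. The Python sketch below, using the standard library's html.parser module, reports which of those elements appear in a document. The names LegacyElementAudit and audit are hypothetical, the element sets are copied from the two lists above rather than from the full specification, and deprecated attributes are not checked.

```python
from html.parser import HTMLParser

# Element names taken from the lists above, lowercased because
# HTMLParser reports tag names in lower case.
DEPRECATED = {"applet", "basefont", "center", "dir", "font",
              "isindex", "menu", "s", "strike", "u"}
OBSOLETE = {"listing", "plaintext", "xmp"}


class LegacyElementAudit(HTMLParser):
    """Notes which deprecated or obsolete HTML 4 elements occur in a page."""

    def __init__(self):
        super().__init__()
        self.deprecated = set()
        self.obsolete = set()

    def handle_starttag(self, tag, attrs):
        if tag in DEPRECATED:
            self.deprecated.add(tag)
        elif tag in OBSOLETE:
            self.obsolete.add(tag)


def audit(markup: str):
    """Return (deprecated, obsolete) element names found, each sorted."""
    parser = LegacyElementAudit()
    parser.feed(markup)
    parser.close()
    return sorted(parser.deprecated), sorted(parser.obsolete)


if __name__ == "__main__":
    print(audit('<CENTER><FONT size="2">Welcome</FONT></CENTER>'))
```
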
History

HTML 4.0, a major update to the HTML specification, was published as a W3C Recommendation in December 1997 and revised in April 1998. It was superseded by a minor update, HTML 4.01, in December 1999. Following the publication of the HTML 4.01 specification, the focus of the W3C working group turned to XHTML 1.0, which described itself as "a reformulation of HTML 4 as XML" when it was published in January 2000.

The first working draft of HTML 5 was published in January 2008. Adoption of HTML 5 was gradual between 2008 and 2012, and then steady. HTML 5 was finally issued as a W3C Recommendation in 2014.

For a more complete discussion and chronology of versions for the HTML format, see HTML_family.


Format specifications


Useful references

URLs


Last Updated: 03/29/2018