Sustainability of Digital Formats: Planning for Library of Congress Collections


Extensible HyperText Markup Language (XHTML), 1.0

Format Description Properties

Identification and description

Full name Extensible HyperText Markup Language (XHTML) 1.0
Description

XHTML is an openly documented, freely implementable format for marking up structured documents for use as pages and applications on the World Wide Web. The standard was developed and has been maintained by the World Wide Web Consortium (W3C). The specification for XHTML 1.0, first published in January 2000, describes itself as a reformulation of HTML 4 as XML. Like HTML, XHTML is a language for marking up the structure of a document intended for distribution on the Web. The motivations for basing HTML on XML (which had been published as a W3C Recommendation in February 1998) were several:

  • XHTML documents must conform to XML. Thus, they can be viewed, edited, and validated with standard XML tools.
  • HTML browsers (and other user agents) had tolerated technical errors, but the extra code required to display documents with errors, particularly on mobile devices, generated performance problems. XHTML introduced stricter error handling.
  • HTML 4 was ostensibly an application of Standard Generalized Markup Language (SGML); however, the specification for SGML was complex, and neither web browsers nor the W3C Recommendation for HTML 4 were fully conformant to it. XML provided a simpler underlying data format, more conducive to transformation by servers for distribution to devices such as mobile phones.
  • XML supports extensibility through its namespace mechanism. XHTML documents could easily include fragments in other XML-based languages such as Scalable Vector Graphics (SVG) and MathML.
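Because an XHTML 1.0 document must be well-formed XML, a general-purpose XML parser can read it without HTML-specific tooling, and namespaces keep embedded vocabularies apart. The sketch below, which assumes only Python's standard xml.etree.ElementTree module and a made-up sample page, reads an XHTML page that contains an inline SVG fragment and shows that tag-soup markup is rejected. The SVG fragment is well-formed, but it would not validate against the XHTML 1.0 DTDs on their own; validation is a separate step that this sketch does not perform.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"
NS = {"h": XHTML_NS, "svg": SVG_NS}

# Hypothetical sample page: XHTML 1.0 Strict with an embedded SVG fragment.
SAMPLE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head><title>Example</title></head>
  <body>
    <p>A circle drawn with inline SVG:</p>
    <svg xmlns="http://www.w3.org/2000/svg" width="20" height="20">
      <circle cx="10" cy="10" r="8"/>
    </svg>
  </body>
</html>"""


def parse_xhtml(text):
    """Parse with a generic XML parser; raises ET.ParseError if not well-formed."""
    return ET.fromstring(text)


root = parse_xhtml(SAMPLE)
# Elements from each vocabulary are addressed through their own namespace.
title = root.find("h:head/h:title", NS).text
circles = root.findall(".//svg:circle", NS)
print(title, len(circles))  # Example 1

# Overlapping tags that an HTML browser would tolerate are rejected outright.
try:
    parse_xhtml("<p>Unclosed <b>bold</p>")
except ET.ParseError as err:
    print("not well-formed:", err)
```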

The W3C XML Working Group also argued that the move to XHTML would provide an opportunity to divide HTML into reusable components (XHTML Modularization) and clean up untidy parts of the language.

Like HTML 4.01, XHTML 1.0 had three DTDs: Strict, Transitional, and Frameset. The Transitional profile was designed with backwards compatibility in mind. The Strict profile omitted a number of deprecated elements, particularly presentational elements such as center and font, for which use of a style sheet language such as CSS was judged to be the best practice. The Frameset profile allowed a page to be structured as a set of rectangular areas, with the content for each frame being an independent XHTML document. See Notes below for more detail about framesets. In February 2018, W3Techs indicated that, of websites using XHTML, 75% used the Transitional profile.
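A document declares its profile through the public identifier in its DOCTYPE declaration. As a rough sketch, assuming each file uses one of the three public identifiers given in the XHTML 1.0 specification, a survey of holdings could classify files by profile as follows. The function name xhtml_profile is hypothetical, and the check is a regular-expression match only; no DTD validation is attempted.

```python
import re

# Public identifiers of the three XHTML 1.0 DTDs, mapped to profile names.
PROFILES = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "Strict",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "Transitional",
    "-//W3C//DTD XHTML 1.0 Frameset//EN": "Frameset",
}

# Captures the public identifier from a DOCTYPE declaration.
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]+)"', re.IGNORECASE)


def xhtml_profile(text):
    """Return "Strict", "Transitional", "Frameset", or None if not recognized."""
    match = DOCTYPE_RE.search(text)
    if match is None:
        return None
    return PROFILES.get(match.group(1))


page = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n'
        '  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
        '<html xmlns="http://www.w3.org/1999/xhtml"></html>')
print(xhtml_profile(page))  # Transitional
```

An HTML 4.01 or HTML5 DOCTYPE returns None, so such files can be set aside for separate review.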

Production phase The primary use of XHTML is as a final-state format for web pages made available on the Internet.
Relationship to other formats
    Equivalent to HTML_4_01, HyperText Markup Language (HTML) 4.01
    Has later version XHTML_1_1, Extensible HyperText Markup Language (XHTML) 1.1. Initially published in May 2001.
    Defined via XML_DTD, XML Document Type Definition (DTD)

Local use

LC experience or existing holdings In April 2005, the Library adopted XHTML 1.0 Transitional as the document type for its website. In January 2011, a redesigned home page used HTML 5. The website has been redesigned several times since then, and its web pages are now created dynamically. The Library's digital collections hold many XHTML files (over 41 GB as of 2025) across numerous collections.
LC preference The Library of Congress Recommended Formats Statement (RFS) includes XHTML as an acceptable format for textual works in digital form, for electronic serials (see section iii. Textual Works - Electronic Serials), and for musical scores (digital), when the file is accompanied by a DOCTYPE declaration and a presentation stylesheet. The RFS does not distinguish between XHTML versions.
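The two RFS conditions can be screened for automatically. The following is a minimal sketch under stated assumptions: the function rfs_screen is hypothetical, and it treats any linked (rel="stylesheet") or embedded (style) stylesheet as meeting the presentation-stylesheet condition. That is a simplification of the RFS wording, not a description of the Library's own procedure.

```python
import re
import xml.etree.ElementTree as ET

H = "{http://www.w3.org/1999/xhtml}"


def rfs_screen(text):
    """Hypothetical check: a DOCTYPE is declared and a stylesheet is attached.

    Counts <link rel="stylesheet"> and <style> elements as stylesheets; it does
    not examine what the stylesheet contains.
    """
    has_doctype = re.search(r"<!DOCTYPE\s+html", text, re.IGNORECASE) is not None
    root = ET.fromstring(text)
    linked = any(
        "stylesheet" in link.get("rel", "").lower().split()
        for link in root.iter(H + "link")
    )
    embedded = root.find(f".//{H}style") is not None
    return has_doctype and (linked or embedded)


# Hypothetical sample pages, identical except for the stylesheet link.
WITH_CSS = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>T</title>
    <link rel="stylesheet" type="text/css" href="site.css" /></head>
  <body><p>Text</p></body>
</html>"""
NO_CSS = WITH_CSS.replace('<link rel="stylesheet" type="text/css" href="site.css" />', "")
print(rfs_screen(WITH_CSS), rfs_screen(NO_CSS))  # True False
```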

Sustainability factors

Disclosure

XHTML 1.0 is a fully documented, non-proprietary format developed and maintained by W3C.

    Documentation

Two editions of XHTML 1.0 were published by W3C as Recommendations, both by the HTML Working Group:

Adoption

Conversion from HTML 4 to XHTML 1.0 was straightforward if the HTML source was already well-formed and avoided markup shortcuts that had been permissible in HTML. Hence, many websites were migrated from HTML 4 to XHTML, often with the help of HTML editors such as Dreamweaver. See Notes below for more detail on the changes needed to turn an HTML 4 document into a corresponding XML version.

However, despite the touted advantages of XHTML, many website creators agreed with IronSpider on his Learn HTML or XHTML? page and continued to use HTML 4 because it was more forgiving and allowed various shortcuts. For most website creators, what matters is whether browsers handle the pages.

According to W3Techs (Web Technology Surveys), in early February 2018, roughly 20% of websites were based on XHTML, roughly 16% still used HTML 4.x, and nearly all of the remainder used HTML 5. The same organization publishes a chart comparing use of XHTML with HTML over the years. At its peak, on January 1, 2012, XHTML accounted for about 65% of the websites in the survey. Its share has fallen steadily since then; by January 2014, HTML (HTML 4 and HTML 5 combined) and XHTML each accounted for around 50%.

XHTML documents can be created and edited using a variety of tools, including XML-aware editors and many HTML editors, such as the visual editor Adobe Dreamweaver.

    Licensing and patents According to the HTML Working Group IPR disclosures page, "As of August 2002, the HTML Working Group participants and the W3C are not aware of any patents that are essential to implement the deliverables of the HTML Working Group."
Transparency XHTML files can be opened and viewed in text editors. The XHTML markup is human-readable with human-comprehensible element tags and also designed for straightforward automatic parsing.
Self-documentation

XHTML 1.0, following HTML 4, defines a META element within the HEAD section of a document. This element, which may have NAME and CONTENT attributes to hold name/value pairs, is widely used to record descriptive or administrative metadata for documents or web pages. Web browsers do not typically display this data.
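Because XHTML is XML, this metadata can be harvested with a standard XML parser. A minimal sketch, assuming a made-up sample page and a hypothetical function meta_pairs; note that in XHTML markup the names are written in lowercase (meta, name, content), and that meta elements carrying http-equiv instead of name are skipped here.

```python
import xml.etree.ElementTree as ET

NS = {"h": "http://www.w3.org/1999/xhtml"}

# Hypothetical page; in XHTML markup the names are lowercase.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Sample</title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <meta name="description" content="A sample page" />
    <meta name="keywords" content="xhtml, metadata" />
  </head>
  <body><p>Body text is not metadata.</p></body>
</html>"""


def meta_pairs(text):
    """Collect name/content pairs from meta elements in the head section."""
    root = ET.fromstring(text)
    pairs = {}
    for meta in root.findall("h:head/h:meta", NS):
        name, content = meta.get("name"), meta.get("content")
        if name is not None and content is not None:
            pairs[name] = content  # a repeated name keeps its last value
    return pairs


print(meta_pairs(PAGE))
# {'description': 'A sample page', 'keywords': 'xhtml, metadata'}
```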

In addition, since XHTML is XML, there is a mechanism to use other namespaces. Hence XML-based metadata specifications, such as RDF, could be used in the HEAD section of an XHTML document. The Semantic Web Deployment Working Group at W3C, active from 2006 until 2009, worked on specifications for embedding RDF in XHTML, resulting in the 2008 W3C Recommendation RDFa in XHTML: Syntax and Processing. The compilers of this resource have not investigated the degree to which this mechanism has been used for describing XHTML web pages. Comments welcome.
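To illustrate the kind of statement RDFa adds to an XHTML page, the sketch below pulls simple property values out of a made-up page. It is not an RDFa processor: the function rdfa_literals is hypothetical and ignores most of the processing rules in RDFa in XHTML: Syntax and Processing, including CURIE expansion, resource-valued objects, and datatypes.

```python
import xml.etree.ElementTree as ET

# Hypothetical page using RDFa attributes (about, property, content).
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
  <head><title>Report</title></head>
  <body about="http://example.org/report">
    <h1 property="dc:title">Annual Report</h1>
    <p>Issued <span property="dc:date" content="2009-01-15">15 January 2009</span>.</p>
  </body>
</html>"""


def rdfa_literals(elem, subject=""):
    """Yield (subject, property, literal) tuples.

    Deliberately simplified: the subject is inherited from the nearest
    ancestor's about attribute, and prefixes such as dc: are not expanded.
    """
    subject = elem.get("about", subject)
    prop = elem.get("property")
    if prop is not None:
        # A content attribute overrides the element's visible text.
        value = elem.get("content")
        if value is None:
            value = "".join(elem.itertext())
        yield (subject, prop, value)
    for child in elem:
        yield from rdfa_literals(child, subject)


for triple in rdfa_literals(ET.fromstring(PAGE)):
    print(triple)
```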

Accessibility Features

Depending on implementation, XHTML can offer strong accessibility support because of its defined XML structure. See also XML for details. In addition, HTML and XHTML Techniques for WCAG 2.0 describes methods "to provide both text and iconic representations of links without making the web page more confusing or difficult for keyboard users or assistive technology users. Since different users finding text and icons more usable, providing both can improve the accessibility of the link." Along with such specific techniques, the document gives general guidance: use only features that are defined in the specification, use features in the manner prescribed by the specification, and make sure the content can be parsed.
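Two of those points, that the content can be parsed and that a link should not depend on an icon alone, lend themselves to automated checks. A minimal sketch, assuming a simplified, hypothetical rule (a link passes if it has text or contains an image with non-empty alt text) and a made-up sample page; this is an illustration, not a WCAG conformance test.

```python
import xml.etree.ElementTree as ET

H = "{http://www.w3.org/1999/xhtml}"

PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Links</title></head>
  <body><p>
    <a href="about.html">About us</a>
    <a href="home.html"><img src="home.png" alt="Home" /></a>
    <a href="help.html"><img src="help.png" alt="" /></a>
  </p></body>
</html>"""


def links_without_text(text):
    """Return hrefs of links with neither link text nor an img with alt text."""
    root = ET.fromstring(text)  # raises ET.ParseError if the content cannot be parsed
    missing = []
    for link in root.iter(H + "a"):
        href = link.get("href")
        if href is None:
            continue  # anchors without href are not links
        has_text = "".join(link.itertext()).strip() != ""
        has_alt = any(img.get("alt", "").strip() for img in link.iter(H + "img"))
        if not (has_text or has_alt):
            missing.append(href)
    return missing


print(links_without_text(PAGE))  # ['help.html']
```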

External dependencies None.
Technical protection considerations XHTML provides no internal capabilities for encryption or other technical protection.

Quality and functionality factors

Text
Normal rendering Since XHTML 1.0 was a reformulation of HTML 4, its support for desirable quality and functionality characteristics for textual content is equivalent to that of HTML 4.

File type signifiers and format identifiers

  • Filename extension: xhtml, xht, html, htm
    Note: See IETF RFC 3236.
  • Internet Media Type: text/html, application/xhtml+xml
    Note: For registrations, see RFC 2854 and RFC 3236. XHTML Media Types provides guidance on when to use which media type.
  • Magic numbers: <!DOCTYPE HTML PUBLIC "-//W3C//DTD XHTML 1.0
    Note: This magic number covers the three XHTML profiles: Strict, Transitional, and Frameset. The string should appear near the beginning of the file but not necessarily at the very beginning.
  • Other: NF00185
    Note: See https://www.archives.gov/files/lod/dpframework/id/NF00185.ttl for eXtensible Hypertext Markup Language 1.0.
  • Other: NF00186
    Note: See https://www.archives.gov/files/lod/dpframework/id/NF00186.ttl for eXtensible Hypertext Markup Language 1.1.
  • Pronom PUID: fmt/102
    Note: See http://www.nationalarchives.gov.uk/PRONOM/fmt/102.
  • Wikidata Title ID: Q29017286
    Note: See https://www.wikidata.org/wiki/Q29017286.
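The magic number can be used to sniff files during ingest. A minimal sketch, assuming the file is in an ASCII-compatible encoding such as UTF-8 and taking an arbitrary 1,024-byte window as "near the beginning"; the constant SNIFF_BYTES and the function name looks_like_xhtml_1_0 are hypothetical.

```python
import re

# The magic-number string, matched case-insensitively because it is listed
# with "HTML" while the XHTML 1.0 specification's own DOCTYPE examples use
# lowercase "html"; whitespace between tokens is allowed to vary.
MAGIC_RE = re.compile(rb'<!DOCTYPE\s+html\s+PUBLIC\s+"-//W3C//DTD XHTML 1\.0',
                      re.IGNORECASE)
SNIFF_BYTES = 1024  # assumption: how far into the file "near the beginning" reaches


def looks_like_xhtml_1_0(data: bytes) -> bool:
    """True if the XHTML 1.0 DOCTYPE string appears near the start of the data."""
    return MAGIC_RE.search(data[:SNIFF_BYTES]) is not None


sample = (b'<?xml version="1.0" encoding="UTF-8"?>\n'
          b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
          b'  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n')
print(looks_like_xhtml_1_0(sample))  # True
```

Files encoded as UTF-16 would need to be decoded before matching.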

Notes

General

Differences between HTML 4 and XHTML 1.0: XHTML 1.0 was a reformulation of HTML 4. However, the fact that XHTML documents must be valid XML often required some changes to HTML documents. One clause of the XHTML 1.0 specification, Differences with HTML 4, described practices that were perfectly legal in SGML-based HTML 4 but needed to be changed.

  • XHTML documents must be well-formed. Elements must be closed and must nest without overlapping.
  • Names for elements and attributes must be in lower case.
  • For non-empty elements, end tags are required.
  • For empty elements, either an end tag must be present or the start tag must end with a slash, as in <br/>.
  • Attribute values must always be in quotes.
  • XML does not support attribute minimization, so an attribute must be given an explicit value, e.g., compact="compact" rather than compact alone.
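The effect of these rules can be seen by giving an HTML 4 fragment and its XHTML equivalent to an XML parser. A minimal sketch using Python's standard library; both fragments and the function name is_well_formed are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Legal in SGML-based HTML 4: uppercase names, an unquoted attribute value,
# a minimized attribute, an unclosed P, and a BR without a slash.
HTML4 = "<FORM><P>Subscribe <INPUT type=checkbox checked><BR></FORM>"

# The same content following the rules listed above.
XHTML = ('<form><p>Subscribe <input type="checkbox" checked="checked" />'
         "<br /></p></form>")


def is_well_formed(fragment):
    """True if an XML parser accepts the fragment."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(HTML4), is_well_formed(XHTML))  # False True
```

The parser rejects the HTML 4 version at its first error, the unquoted attribute value. Lowercase names are a validity requirement of the XHTML DTDs rather than an XML well-formedness rule, so a bare XML parser would accept uppercase names in an otherwise well-formed fragment; catching them requires validation against the DTD.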

The HTML Compatibility Guidelines in Appendix C of the XHTML 1.0 specification summarized design guidelines for authors who wanted their XHTML documents to render in existing HTML browsers.
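Two of the Appendix C guidelines concern how empty elements are written: include a space before the trailing "/>" of elements declared EMPTY, as in <br />, and do not use the minimized form for an empty instance of an element whose content model is not EMPTY, such as an empty paragraph. A minimal sketch of a serializer that follows both habits, assuming a tree without namespaces (tags such as "p", not "{http://www.w3.org/1999/xhtml}p"); the function serialize is hypothetical and ignores the other guidelines.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Elements declared EMPTY across the three XHTML 1.0 DTDs.
VOID = {"area", "base", "basefont", "br", "col", "frame", "hr", "img",
        "input", "isindex", "link", "meta", "param"}


def serialize(elem):
    """Serialize an un-namespaced tree: "<br />" for EMPTY elements, and
    explicit start and end tags for empty instances of other elements."""
    attrs = "".join(f" {name}={quoteattr(value)}" for name, value in elem.attrib.items())
    tail = escape(elem.tail or "")
    if elem.tag in VOID:
        return f"<{elem.tag}{attrs} />{tail}"
    inner = escape(elem.text or "") + "".join(serialize(child) for child in elem)
    return f"<{elem.tag}{attrs}>{inner}</{elem.tag}>{tail}"


tree = ET.fromstring("<div><p/>Line one<br/>line two</div>")
print(serialize(tree))  # <div><p></p>Line one<br />line two</div>
```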

Frames and framesets: The concept of a frameset as the structure for a web page was introduced in HTML 4.0 and incorporated into XHTML 1.0. A frameset defined the positioning of rectangular frames in a browser window, and the content for each frame was an independent HTML document referred to by URL. One of the most popular uses of frames was to present a coherent body of content, such as user documentation or help for a software application: a table of contents was presented in one frame, and when a user selected a topic, the document on that topic was shown in another frame. However, because of disadvantages such as those listed at Advantages and disadvantages of frames, and the challenges presented by mobile devices with small screens, the frameset structure and its associated elements were dropped from XHTML 1.1.

History

A second version of XHTML (XHTML 1.1) was published as a W3C Recommendation in May 2001. XHTML 1.1 was close to XHTML 1.0 Strict, but with the specification modularized; features deprecated in HTML 4 or XHTML 1.0 were dropped. A proposed XHTML 2.0 never reached Recommendation status; it was abandoned as a separate specification and published as a Working Group Note in December 2010. Instead, when HTML5 was published as a W3C Recommendation in October 2014, it had the title "HTML5: A vocabulary and associated APIs for HTML and XHTML." The term "XHTML5" refers to the XML serialization of HTML5.


Format specifications


Useful references

URLs


Last Updated: 03/31/2025