Sustainability of Digital Formats: Planning for Library of Congress Collections


EA-PDF: Archival Email Format Based on PDF/A


Identification and description Explanation of format description terms

Full name EA-PDF: Archival Email Format Based on PDF/A
Description

EA-PDF, an acronym for "Email Archiving in PDF", is a profile of PDF/A specifically designed to meet the complex needs of archiving email messages and mailboxes. Its goal is to define a mechanism for offline preservation of email source data together with a reliable static rendition of the email data. The EA-PDF specification was developed under the auspices of the PDF Association's EA-PDF LWG (Email Archiving in PDF Liaison Working Group). All EA-PDF files must conform to PDF/A-3a, PDF/A-3u (described collectively in this resource on the PDF/A-3 page), PDF/A-4, PDF/A-4f or PDF/A-4e. In addition, a valid EA-PDF file is always a valid PDF according to ISO 32000.

A note on terminology: The term 'EA-PDF' is used as an all-encompassing term and includes the file format, software, use cases/scenarios, etc., while the term 'PDF/mail' represents just the file format. See File type signifiers and format identifiers for more information about this labeling and Notes for an overview of the history of the specification.

The main advantage of EA-PDF is its ability to leverage the structure and widespread adoption of the PDF format across many domains and sectors. Some of the key features of EA-PDF include:

  • Email source data can be included with each email associated with PDF renderings, allowing for easy visualization, programmatic search, and other research activities using existing PDF technology
  • A static rendering in a PDF file allows for downstream distribution, markup, or modification without compromising or sharing source data and without having to support numerous proprietary email formats
  • The ability to package and preserve hierarchical groups of messages, an entire account, or even multiple accounts into a single PDF via PDF’s unique Collections feature (also known as 'portfolios', 'binders' and 'packages')
  • Support for unsent and incomplete emails such as drafts
  • Control over which URL links in emails should be active to address security and tracking concerns, among others.

EA-PDF defines a set of profiles "to meet the various requirements discussed in the LWG that reflected specific email archival scenarios and use-cases" (as described in section 3.2).

  • PDF/mail-1s: “Single” in which a single email message is preserved as a single EA-PDF file without a folder, where pages in the PDF are a visual representation of only that email’s content or context. The original raw source email asset(s) are always embedded and other files, such as email attachments, are also embedded (if present).
  • PDF/mail-1si: “Single, isolated” in which a single email message is preserved as a single EA-PDF file, where pages in the PDF are a visual representation of the email’s content or context, but the original raw source email asset(s) are not embedded (i.e., are not faithfully preserved). Other files may be embedded (e.g. email attachments).
  • PDF/mail-1m: “Multiple” in which multiple emails are preserved as a single EA-PDF file without folder structure, where pages in the PDF are a visual representation of the content or context of the emails. The original raw source email assets are always embedded (faithfully preserved).
  • PDF/mail-1mi: “Multiple, isolated” in which multiple emails are preserved as a single EA-PDF file without folder structure, where pages in the PDF are a visual representation of the content or context of the emails. The original raw source email assets are not embedded, however other files may be embedded (e.g. email attachments).
  • PDF/mail-1c: “Container” in which an EA-PDF “structured container” (also known as a PDF Portfolio) holds one or more embedded EA-PDF files, each of which may be any other PDF/mail-1 profile. PDF/mail-1c files can replicate complex folder hierarchies typically found in modern email clients, email formats, or file systems. The container that is the PDF/mail-1c file does not contain pages representing content of preserved emails – all PDF representations of email content are in the embedded EA-PDF files stored within the container PDF collection. Pages in the container PDF represent the context of the collection.
  • PDF/mail-1ci: “Container, isolated” in which an EA-PDF “structured container” holds one or more embedded EA-PDF files, each of which may be any other PDF/mail-1 profile. PDF/mail-1ci files can replicate complex folder hierarchies typically found in modern email clients, email formats, or file systems. The container that is the PDF/mail-1ci file does not contain pages representing content of preserved emails – all email content is in the embedded EA-PDF files stored within the container PDF collection. Pages in the container PDF represent the context of the collection.
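
The six profile names above follow a regular pattern: a kind letter (s, m, c) optionally followed by "i" for "isolated". As an illustration only, the following Python sketch decodes a profile name into those two properties. The class, function, and property names are hypothetical and are not defined by the EA-PDF specification. For the two container profiles the sketch reports the source-embedding status as unknown, because the descriptions above do not state it.

```python
from dataclasses import dataclass
from typing import Optional

# Suffix -> (kind, isolated), taken from the six profiles listed above.
_SUFFIXES = {
    "s": ("single", False),
    "si": ("single", True),
    "m": ("multiple", False),
    "mi": ("multiple", True),
    "c": ("container", False),
    "ci": ("container", True),
}
_PREFIX = "PDF/mail-1"


@dataclass(frozen=True)
class Profile:
    """Hypothetical record describing one PDF/mail-1 profile."""
    name: str
    kind: str        # "single", "multiple", or "container"
    isolated: bool   # True when the profile name carries the "i" suffix

    @property
    def source_always_embedded(self) -> Optional[bool]:
        # For single/multiple profiles the list above says raw source is
        # always embedded unless the profile is "isolated". For containers
        # the list does not say, so return None rather than guess.
        if self.kind == "container":
            return None
        return not self.isolated


def parse_profile(name: str) -> Profile:
    """Decode a profile name such as 'PDF/mail-1si'."""
    if not name.startswith(_PREFIX):
        raise ValueError(f"not a PDF/mail-1 profile name: {name!r}")
    suffix = name[len(_PREFIX):]
    if suffix not in _SUFFIXES:
        raise ValueError(f"unknown PDF/mail-1 profile suffix: {suffix!r}")
    kind, isolated = _SUFFIXES[suffix]
    return Profile(name, kind, isolated)
```

For example, parse_profile("PDF/mail-1mi") yields kind "multiple" with isolated set to True.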

Structurally, EA-PDF (like PDF) requires that the logical structure tree root structure element is always a single Document structure element, representing the entire EA-PDF file: "For PDF/mail-1{s, si, m, mi} files, every email is represented by a nested Document structure element, directly nested below the top-level Document structure element that represents the PDF/mail-1 file itself. Again, each email may use the Mail_Message custom EA-PDF tag which is always role mapped to Document. In PDF/mail-1{c, ci} container files this top-level Document structure element represents the container PDF and its related Content Sets (not emails, as these are in the embedded files in the collection). Thus the Mail_Message custom EA-PDF tag will never occur in PDF/mail-1{c, ci} container files." (Specification p. 28).
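
The structural rules quoted above can be illustrated with a toy model. The sketch below is not a PDF parser: it assumes the structure tree has already been read into nested Python dictionaries with "tag" and "children" keys, and that the role map is a plain dictionary. Both representations, and the function names, are hypothetical. It checks only the three points stated in the passage: the root is a Document element, Mail_Message is role mapped to Document, and Mail_Message never occurs in container files.

```python
def iter_tags(node):
    """Yield the tag of a structure element and of all its descendants."""
    yield node["tag"]
    for child in node.get("children", []):
        yield from iter_tags(child)


def check_structure(root, role_map, is_container):
    """Return a list of problems found in a toy structure tree.

    root         -- nested dict: {"tag": str, "children": [dict, ...]}
    role_map     -- dict mapping custom tag names to standard tag names
    is_container -- True for PDF/mail-1c and PDF/mail-1ci files
    """
    problems = []
    if root["tag"] != "Document":
        problems.append("root structure element is not Document")
    if "Mail_Message" in iter_tags(root):
        if is_container:
            problems.append("Mail_Message used in a container file")
        if role_map.get("Mail_Message") != "Document":
            problems.append("Mail_Message is not role mapped to Document")
    return problems
```

A PDF/mail-1m file with two emails would be modeled as a Document root with two Mail_Message children; the same tree flagged as a container file produces one reported problem.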

EA-PDF also defines a set of common email metadata header fields and related attributes of each email, labeled as Core Fields in Table 4. These include To, From, Sent, Subject, CC, BCC, and more. Core Field names usually correspond to the matching email header field name; however, EA-PDF Creation Software may add additional email header fields prefixed with “Raw-” to indicate a raw value from the email that would otherwise be an error when using a more rigid or structured XMP data type.
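
The “Raw-” fallback can be sketched with Python's standard email module. This is a minimal illustration, not the behavior of any actual EA-PDF Creation Software. It assumes that the Sent field is derived from the email's Date header and that an unparseable date is kept under a field named "Raw-Sent"; both the mapping and that field name are assumptions made for this example and should be checked against Table 4 of the specification.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def extract_fields(raw_email: str) -> dict:
    """Collect a few header values, with a Raw- fallback for bad dates."""
    msg = message_from_string(raw_email)
    fields = {}

    # Plain text fields are copied as-is when present.
    for name in ("From", "To", "Subject"):
        if msg[name] is not None:
            fields[name] = msg[name]

    date_value = msg["Date"]
    if date_value is not None:
        try:
            # A structured date type needs a value that actually parses.
            fields["Sent"] = parsedate_to_datetime(date_value).isoformat()
        except (TypeError, ValueError):
            # Hypothetical field name: keep the original string rather
            # than lose it or write an invalid structured value.
            fields["Raw-Sent"] = date_value
    return fields
```

A message whose Date header reads "sometime last week" would therefore carry its date only in the raw field, while a well-formed RFC 5322 date is stored as an ISO 8601 string.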

Production phase A final-state format for delivery to end users and long-term preservation.
Relationship to other formats
    Subtype of PDF/A-3, PDF/A-3, PDF for Long-term Preservation, Use of ISO 32000-1, With Embedded Files. As defined in the EA-PDF specification, all EA-PDF files must conform to either PDF/A-3a, PDF/A-3u, PDF/A-4, PDF/A-4f, or PDF/A-4e.
    Subtype of PDF/A-4, PDF/A-4, PDF for Long-term Preservation, Use of ISO 32000-2. As defined in the EA-PDF specification, all EA-PDF files must conform to either PDF/A-3a, PDF/A-3u, PDF/A-4, PDF/A-4f, or PDF/A-4e.
    Subtype of PDF/A-4e, PDF/A for Engineering, Use of ISO 32000-2 (PDF/A-4): ISO 19005-4, Annex B. As defined in the EA-PDF specification, all EA-PDF files must conform to either PDF/A-3a, PDF/A-3u, PDF/A-4, PDF/A-4f, or PDF/A-4e.
    Subtype of PDF/A-4f, PDF/A for Embedded Files, Use of ISO 32000-2 (PDF/A-4): ISO 19005-4, Annex A. As defined in the EA-PDF specification, all EA-PDF files must conform to either PDF/A-3a, PDF/A-3u, PDF/A-4, PDF/A-4f, or PDF/A-4e.
    Has subtype PDF_Portfolio, PDF Portfolio. EA-PDF can be supported in a “structured container” (also known as a 'PDF Portfolio') for one or more embedded EA-PDF files, each of which may be any other PDF/mail-1 profile. PDF/mail-1c files can replicate complex folder hierarchies typically found in modern email clients, email formats, or file systems. The container that is the PDF/mail-1c file does not contain pages representing content of preserved emails – all PDF representations of email content are in the embedded EA-PDF files stored within the container PDF collection. Pages in the container PDF represent the context of the collection.

Local use

LC experience or existing holdings The Library of Congress is represented on the working group for PDF/A (ISO/TC 171/SC 2/WG 5) as well as the PDF Association's EA-PDF (Email Archiving in PDF) LWG. As of December 2025, the Library does not have EA-PDF files in its collections.
LC preference See the Library of Congress Recommended Formats Statement (RFS) for format preferences related to email. The metadata section is influenced by compliance with EA-PDF metadata requirements.

Sustainability factors

Disclosure

Fully documented. Specification version 1.0 was published by the PDF Association in February 2025. The specification was developed under the auspices of the PDF Association's EA-PDF (Email Archiving in PDF) LWG, which was part of a 24-month-long project led by the University of Illinois and funded by a National Leadership Grant from the Institute of Museum and Library Services. Participation in the EA-PDF LWG is open to members of the PDF Association as well as to experts who wish to participate as either Invited or Liaison experts, subject to the PDF Association’s IPR Policy.

    Documentation

EA-PDF: An archival email format based on PDF/A, version 1.0 February 2025.

Adoption Adoption is not widespread as of December 2025, in part because tools are still developing. One example is Big Faceless Organization's (BFO) support in BFO Publisher. Another project, still in draft, is University of Illinois Urbana-Champaign's (UIUC) Email2Pdf to transform "EML or MBOX into archival PDF files that conform to the EA-PDF specification as output."
    Licensing and patents The EA-PDF specification is under a Creative Commons Attribution 4.0 International License and also notes that "vendors own their respective copyrights and trademarks wherever they are mentioned. Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. The PDF Association shall not be held responsible for identifying any or all such patent rights. Users of this document are responsible for independently verifying any and all information."
Transparency See PDF/A_family. For PDF/A-3 and PDF/A-4, transparency and characterization of non-PDF/A-4 embedded files are primary concerns for long-term preservation.
Self-documentation

EA-PDF defines a set of Core Fields drawn from email headers (see Table 4 of the specification).

Accessibility Features

EA-PDF extends PDF’s Logical Structure and Tagged PDF features with an optional custom EA-PDF tag set that is role mapped back to the PDF 1.7 standard structure tag set. The aim is enhanced semantics, improved accessibility, and easier reuse of extracted content. These custom tags (structure element types) semantically mark up PDF page content streams to identify Core Field information using a custom PDF 2.0 namespace.

External dependencies See PDF/A_family.
Technical protection considerations See PDF/A_family.

Quality and functionality factors

Text
Normal rendering See PDF/A_family.
Integrity of document structure See PDF/A_family.
Integrity of layout and display See PDF/A_family.
Support for mathematics, formulae, etc. See PDF/A_family.
Functionality beyond normal rendering See PDF/A_family.

File type signifiers and format identifiers

Tag Value Note
Filename extension pdf
The specification does not indicate that a different extension should be used to identify EA-PDF.
Internet Media Type See related format.  See PDF/A_family.
Magic numbers ASCII: %PDF-1.7
From the EA-PDF specification, section 9.1. "The PDF header SHALL be either “%PDF-1.7” or “%PDF-2.0” and the declared PDF version number of an EA-PDF file SHALL be either PDF 1.7 or PDF 2.0...PDF version requirement accounts for the Document catalog Version key (if present) and file header."
Magic numbers ASCII: %PDF-2.0
From the EA-PDF specification, section 9.1. "The PDF header SHALL be either “%PDF-1.7” or “%PDF-2.0” and the declared PDF version number of an EA-PDF file SHALL be either PDF 1.7 or PDF 2.0...PDF version requirement accounts for the Document catalog Version key (if present) and file header."
Other PDF/mail
PDF/mail-1
The specification notes that "the term “EA-PDF” is used as an all-encompassing term and includes the file format, software, use cases/ scenarios, etc. The moniker “PDF/mail” represents just the file format defined by this industry specification and any later editions. The term “PDF/mail-1” refers to this first edition of PDF/mail - future versions may use other versions (such as PDF/mail-2, PDF/mail-3, etc.) in a similar manner to the way that ISO subsets are versioned. If this industry specification is ever standardized by ISO, “PDF/M” would be the most likely equivalent ISO moniker following the current principles of PDF naming conventions (cf. “PDF/raster” and “PDF/R”)."
Indicator for profile, level, version, etc. See note.  The standard specifies that the PDF/A version and conformance level of a file shall be specified using the PDF/A Identification extension schema defined in the standard. This schema has two mandatory elements: pdfaid:part (integer) and pdfaid:rev (four-digit integer giving the year of publication or revision). A PDF/A-4 file should have the integer value 4 for pdfaid:part. Claim to conformance with one of the profiles defined in Annexes A and B is made in the optional pdfaid:conformance by the following single characters: E for PDF/A-4e and F for PDF/A-4f. E and F are the only valid values for pdfaid:conformance in a PDF/A-4 file. Note that pdfaid:conformance is not mandatory for PDF/A-4 as it is for previous versions of PDF/A.
Other See related format.  See PDF/A-3 for NARA File Format Preservation Plan ID values for PDF/A-3a and PDF/A-3u. There is no separate entry for EA-PDF.
Other See related format.  See PDF/A-4 for NARA File Format Preservation Plan ID values for PDF/A-4. There is no separate entry for EA-PDF.
Pronom PUID See related format.  See PDF/A-4. There is no separate entry for EA-PDF.
Pronom PUID See related format.  See PDF/A-4e. There is no separate entry for EA-PDF.
Pronom PUID See related format.  See PDF/A-4f. There is no separate entry for EA-PDF.
Wikidata Title ID See related format.  See PDF/A-4. There is no separate entry for EA-PDF.
Wikidata Title ID See related format.  See PDF/A-4e. There is no separate entry for EA-PDF.
Wikidata Title ID See related format.  See PDF/A-4f. There is no separate entry for EA-PDF.
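
The header and identification rules in the table above can be checked with a short Python sketch. It is illustrative only and is not a PDF/A or EA-PDF validator. It assumes the caller has already read the first bytes of the file and extracted the XMP packet as a string (locating the packet inside a PDF is out of scope here). The function names are hypothetical, and the namespace URI is the one used by the PDF/A Identification schema. The set of accepted PDF/A flavors is taken from the conformance list given earlier in this description.

```python
import xml.etree.ElementTree as ET

ALLOWED_HEADERS = (b"%PDF-1.7", b"%PDF-2.0")
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"

# (pdfaid:part, pdfaid:conformance) pairs for PDF/A-3a, PDF/A-3u,
# PDF/A-4 (no conformance value), PDF/A-4f and PDF/A-4e.
ALLOWED_FLAVORS = {("3", "A"), ("3", "U"), ("4", None), ("4", "F"), ("4", "E")}


def has_allowed_header(data: bytes) -> bool:
    """True when the file starts with %PDF-1.7 or %PDF-2.0."""
    return data.startswith(ALLOWED_HEADERS)


def read_pdfaid(xmp: str):
    """Return (part, conformance) from an XMP packet; None when absent.

    XMP allows simple properties to be written either as attributes of
    rdf:Description or as child elements, so both forms are checked.
    """
    root = ET.fromstring(xmp)
    values = {}
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        for key in ("part", "conformance"):
            qname = f"{{{PDFAID_NS}}}{key}"
            if qname in desc.attrib:
                values[key] = desc.attrib[qname].strip()
            else:
                child = desc.find(qname)
                if child is not None and child.text:
                    values[key] = child.text.strip()
    return values.get("part"), values.get("conformance")


def is_allowed_flavor(part, conformance) -> bool:
    """True when the pair names a PDF/A flavor that EA-PDF permits."""
    level = conformance.upper() if conformance else None
    return (part, level) in ALLOWED_FLAVORS
```

Passing these checks shows only that a file declares a suitable PDF version and PDF/A flavor; it says nothing about whether the file actually conforms to PDF/A or to an EA-PDF profile.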

Notes

General  
History

As summarized in Email Archiving in PDF (EA-PDF): From Initial Specification to Community of Practice, "This 24-month-long project builds upon the recently released recommendation from a planning project that was supported by the Andrew W. Mellon Foundation: A Specification for Using PDF to Package and Represent Email (EA-PDF Working Group, 2021). That report provides summary recommendations from a multi-institution working group and was refined through community feedback. The project now proposed will fulfill those recommendations by providing tangible outcomes that can standardize preservation-oriented email archiving in the mainstream of archival practice."

The concept of an email-specific profile of PDF first surfaced in the Mellon Foundation and Digital Preservation Coalition sponsored project, Task Force on Technical Approaches for Email Archives (link via Internet Archive). The project's final report, published by CLIR in 2018 as The Future of Email Archives: A Report from the Task Force on Technical Approaches for Email Archives (pp. 82-83), lists "Improve Options for PDF in Email-Archiving Workflows" among its High-Impact/Long-Term Activities: "Options to output email messages to PDF are well integrated into many common email clients. However, important header fields and other key technical metadata are often lost or concealed in the format migration. In addition, message threading and connections to attachments are terminated. Improving the technical capability of PDF software, especially software embedded in email clients, to address issues relevant to email archiving would simplify workflows at a large scale. Activity: Work with the PDF Association, the international vendor-neutral organization focused on PDF software and tools, to identify software requirements for email-archiving features for the PDF format. Planned action: Task force members will contact the PDF Association to start the project in the fall of 2018."

See also PDF/A_family, PDF/A-3, PDF/A-4e and PDF/A-4f.


Format specifications


Useful references

URLs


Last Updated: 03/20/2026