Sustainability of Digital Formats: Planning for Library of Congress Collections |


| Full name | EA-PDF: Archival Email Format Based on PDF/A |
|---|---|
| Description |
EA-PDF, an acronym for "Email Archiving in PDF", is a profile of PDF/A specifically designed to meet the complex needs of archiving email messages and mailboxes. Its goal is to define a mechanism for offline preservation of email source data together with a reliable static rendition of the email data. The EA-PDF specification was developed under the auspices of the PDF Association's EA-PDF LWG (Email Archiving in PDF Liaison Working Group). All EA-PDF files must conform to PDF/A-3a, PDF/A-3u (described collectively in this resource on the PDF/A-3 page), PDF/A-4, PDF/A-4f or PDF/A-4e. In addition, a valid EA-PDF file is always a valid PDF according to ISO 32000. A note on terminology: the term 'EA-PDF' is used as an all-encompassing term that includes the file format, software, use cases/scenarios, etc., while the term 'PDF/mail' represents just the file format. See File type Signatures for more information about this labeling and Notes for an overview of the history of the specification. The main advantage of EA-PDF is its ability to leverage the structure and widespread adoption of the PDF format across many domains and sectors. Some of the key features of EA-PDF include:
EA-PDF defines a set of profiles "to meet the various requirements discussed in the LWG that reflected specific email archival scenarios and use-cases" (as described in section 3.2).
Structurally, EA-PDF (like PDF) requires that the logical structure tree root structure element is always a single Document structure element, representing the entire EA-PDF file: "For PDF/mail-1{s, si, m, mi} files, every email is represented by a nested Document structure element, directly nested below the top-level Document structure element that represents the PDF/mail-1 file itself. Again, each email may use the Mail_Message custom EA-PDF tag which is always role mapped to Document. In PDF/mail-1{c, ci} container files this top-level Document structure element represents the container PDF and its related Content Sets (not emails, as these are in the embedded files in the collection). Thus the Mail_Message custom EA-PDF tag will never occur in PDF/mail-1{c, ci} container files." (Specification p. 28). EA-PDF also defines a set of common email metadata header fields and related attributes of each email, labeled as Core Fields in Table 4. These include To, From, Sent, Subject, CC, BCC, and more. Core Field names usually correspond to the matching email header field name; however, EA-PDF Creation Software may add email header fields prefixed with “Raw-” to indicate a raw value from the email that would otherwise be an error when using a more rigid or structured XMP data type. |
| Production phase | A final-state format for delivery to end users and long-term preservation. |
| Relationship to other formats | |
| Subtype of | PDF/A-3, PDF/A-3, PDF for Long-term Preservation, Use of ISO 32000-1, With Embedded Files. As defined in the EA-PDF specification, all EA-PDF files must conform to either PDF/A-3a, PDF/A-3u, PDF/A-4, PDF/A-4f, or PDF/A-4e. |
| Subtype of | PDF/A-4, PDF/A-4, PDF for Long-term Preservation, Use of ISO 32000-2. As defined in the EA-PDF specification, all EA-PDF files must conform to either PDF/A-3a, PDF/A-3u, PDF/A-4, PDF/A-4f, or PDF/A-4e. |
| Subtype of | PDF/A-4e, PDF/A for Engineering, Use of ISO 32000-2 (PDF/A-4): ISO 19005-4, Annex B. As defined in the EA-PDF specification, all EA-PDF files must conform to either PDF/A-3a, PDF/A-3u, PDF/A-4, PDF/A-4f, or PDF/A-4e. |
| Subtype of | PDF/A-4f, PDF/A for Embedded Files, Use of ISO 32000-2 (PDF/A-4): ISO 19005-4, Annex A. As defined in the EA-PDF specification, all EA-PDF files must conform to either PDF/A-3a, PDF/A-3u, PDF/A-4, PDF/A-4f, or PDF/A-4e. |
| Has subtype | PDF_Portfolio, PDF Portfolio. EA-PDF can be supported in a “structured container” (also known as a 'PDF Portfolio') for one or more embedded EA-PDF files, each of which may be any other PDF/mail-1 profile. PDF/mail-1c files can replicate complex folder hierarchies typically found in modern email clients, email formats, or file systems. The container that is the PDF/mail-1c file does not contain pages representing content of preserved emails – all PDF representations of email content are in the embedded EA-PDF files stored within the container PDF collection. Pages in the container PDF represent the context of the collection. |
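
The structural rule quoted in the Description above (a single top-level Document element, each email as a nested Document directly below it, and no Mail_Message tag in container files) can be illustrated with a small check. This is a minimal sketch, not part of the specification: the `StructElem` class, the `check_structure` function, and the short profile codes are hypothetical names introduced here, and the tree is assumed to have already been extracted from the PDF by some other tool.

```python
from dataclasses import dataclass, field
from typing import List

# Profile suffixes taken from the quoted text: PDF/mail-1{s, si, m, mi} and {c, ci}.
MESSAGE_PROFILES = {"s", "si", "m", "mi"}
CONTAINER_PROFILES = {"c", "ci"}


@dataclass
class StructElem:
    """Hypothetical in-memory stand-in for one structure element."""
    tag: str
    children: List["StructElem"] = field(default_factory=list)


def walk(elem, depth=0):
    """Yield every element in the tree together with its depth (root = 0)."""
    yield elem, depth
    for child in elem.children:
        yield from walk(child, depth + 1)


def check_structure(root: StructElem, profile: str) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    if profile not in MESSAGE_PROFILES | CONTAINER_PROFILES:
        raise ValueError(f"unknown profile suffix: {profile!r}")
    problems = []
    if root.tag != "Document":
        problems.append("root structure element is not Document")
    for elem, depth in walk(root):
        if elem.tag != "Mail_Message":
            continue
        if profile in CONTAINER_PROFILES:
            # Container files hold emails only as embedded files.
            problems.append("Mail_Message tag found in a container profile")
        elif depth != 1:
            # Assumption: "directly nested below" is read as depth 1.
            problems.append("Mail_Message is not directly below the top-level Document")
    return problems
```

The check only covers the tagging rule described in the quotation; it does not validate role maps, namespaces, or any other requirement of the specification.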

| LC experience or existing holdings | The Library of Congress is represented on the working group for PDF/A (ISO/TC 171/SC 2/WG 5) as well as the PDF Association's EA-PDF (Email Archiving in PDF) LWG. As of December 2025, the Library does not have EA-PDF files in its collections. |
|---|---|
| LC preference | See the Library of Congress Recommended Formats Statement (RFS) for format preferences related to email. The metadata section is influenced by compliance with EA-PDF metadata requirements. |

| Disclosure |
Fully documented. Specification version 1.0 was published by the PDF Association in February 2025. The specification was developed under the auspices of the PDF Association's EA-PDF (Email Archiving in PDF) LWG, which was part of a 24-month-long project led by the University of Illinois and funded by a National Leadership Grant from the Institute of Museum and Library Services. Participation in the EA-PDF LWG is open to members of the PDF Association as well as to experts who wish to participate as either Invited or Liaison experts, subject to the PDF Association’s IPR Policy. |
|---|---|
| Documentation |
EA-PDF: An archival email format based on PDF/A, version 1.0 February 2025. |
| Adoption | Adoption is not widespread as of December 2025, in part because tools are still being developed. One example of tool support is BFO Publisher from Big Faceless Organization (BFO). Another project, still in draft, is Email2Pdf from the University of Illinois Urbana-Champaign (UIUC), which transforms "EML or MBOX into archival PDF files that conform to the EA-PDF specification as output." |
| Licensing and patents | The EA-PDF specification is licensed under a Creative Commons Attribution 4.0 International License. It also notes that "vendors own their respective copyrights and trademarks wherever they are mentioned. Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. The PDF Association shall not be held responsible for identifying any or all such patent rights. Users of this document are responsible for independently verifying any and all information." |
| Transparency | See PDF/A_family. For PDF/A-3 and PDF/A-4, transparency and characterization of non-PDF/A-4 embedded files are primary concerns for long-term preservation. |
| Self-documentation |
EA-PDF defines a set of core fields in email headers (see Table 4 of the specification). Accessibility features: EA-PDF extends PDF’s Logical Structure and Tagged PDF features with an optional custom EA-PDF tag set, role mapped back to the PDF 1.7 standard structure tag set, to provide enhanced semantics, improved accessibility, and easier reuse of extracted content. These custom tags (structure element types) semantically mark up PDF page content streams to identify Core Field information using a custom PDF 2.0 namespace. |
| External dependencies | See PDF/A_family. |
| Technical protection considerations | See PDF/A_family. |

| Text | |
|---|---|
| Normal rendering | See PDF/A_family. |
| Integrity of document structure | See PDF/A_family. |
| Integrity of layout and display | See PDF/A_family. |
| Support for mathematics, formulae, etc. | See PDF/A_family. |
| Functionality beyond normal rendering | See PDF/A_family. |

| Tag | Value | Note |
|---|---|---|
| Filename extension | pdf |
The specification does not indicate that a different extension should be used to identify EA-PDF. |
| Internet Media Type | See related format. | See PDF/A_family. |
| Magic numbers | ASCII: %PDF-1.7 |
From the EA-PDF specification, section 9.1. "The PDF header SHALL be either “%PDF-1.7” or “%PDF-2.0” and the declared PDF version number of an EA-PDF file SHALL be either PDF 1.7 or PDF 2.0...PDF version requirement accounts for the Document catalog Version key (if present) and file header." |
| Magic numbers | ASCII: %PDF-2.0 |
From the EA-PDF specification, section 9.1. "The PDF header SHALL be either “%PDF-1.7” or “%PDF-2.0” and the declared PDF version number of an EA-PDF file SHALL be either PDF 1.7 or PDF 2.0...PDF version requirement accounts for the Document catalog Version key (if present) and file header." |
| Other | PDF/mail PDF/mail-1 |
The specification notes that "the term “EA-PDF” is used as an all-encompassing term and includes the file format, software, use cases/ scenarios, etc. The moniker “PDF/mail” represents just the file format defined by this industry specification and any later editions. The term “PDF/mail-1” refers to this first edition of PDF/mail - future versions may use other versions (such as PDF/mail-2, PDF/mail-3, etc.) in a similar manner to the way that ISO subsets are versioned. If this industry specification is ever standardized by ISO, “PDF/M” would be the most likely equivalent ISO moniker following the current principles of PDF naming conventions (cf. “PDF/raster” and “PDF/R”)." |
| Indicator for profile, level, version, etc. | See note. | The standard specifies that the PDF/A version and conformance level of a file shall be specified using the PDF/A Identification extension schema defined in the standard. This schema has two mandatory elements: pdfaid:part (integer) and pdfaid:rev (the four-digit year of publication or revision). A PDF/A-4 file should have the integer value 4 for pdfaid:part. A claim of conformance with one of the profiles defined in Annexes A and B is made in the optional pdfaid:conformance element using one of the following single characters: E for PDF/A-4e and F for PDF/A-4f. E and F are the only valid values for pdfaid:conformance in a PDF/A-4 file. Note that pdfaid:conformance is not mandatory for PDF/A-4 as it is for previous versions of PDF/A. |
| Other | See related format. | See PDF/A-3 for NARA File Format Preservation Plan ID values for PDF/A-3a and PDF/A-3u. There is no separate entry for EA-PDF. |
| Other | See related format. | See PDF/A-4 for NARA File Format Preservation Plan ID values for PDF/A-4. There is no separate entry for EA-PDF. |
| Pronom PUID | See related format. | See PDF/A-4. There is no separate entry for EA-PDF. |
| Pronom PUID | See related format. | See PDF/A-4e. There is no separate entry for EA-PDF. |
| Pronom PUID | See related format. | See PDF/A-4f. There is no separate entry for EA-PDF. |
| Wikidata Title ID | See related format. | See PDF/A-4. There is no separate entry for EA-PDF. |
| Wikidata Title ID | See related format. | See PDF/A-4e. There is no separate entry for EA-PDF. |
| Wikidata Title ID | See related format. | See PDF/A-4f. There is no separate entry for EA-PDF. |
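
The identification rules in the table above (an allowed `%PDF-` header plus the `pdfaid:part` and `pdfaid:conformance` values) can be sketched as a rough pre-check. This is a minimal illustration under stated assumptions, not a validator: the function names are hypothetical, the XMP packet is assumed to have been extracted from the PDF already and passed in as a string, the header is assumed to start at byte 0, and the conformance letters A and U for PDF/A-3 are inferred from the flavor names PDF/A-3a and PDF/A-3u.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Headers permitted by the quoted section 9.1 of the EA-PDF specification.
ALLOWED_HEADERS = (b"%PDF-1.7", b"%PDF-2.0")

# Namespace URIs of RDF and of the PDF/A Identification schema.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"

# (part, conformance) pairs for PDF/A-3a, PDF/A-3u, PDF/A-4, PDF/A-4f, PDF/A-4e.
ALLOWED_FLAVORS = {(3, "A"), (3, "U"), (4, None), (4, "F"), (4, "E")}


def has_allowed_header(data: bytes) -> bool:
    """True if the file bytes begin with %PDF-1.7 or %PDF-2.0."""
    return data.startswith(ALLOWED_HEADERS)


def read_pdfaid(xmp: str) -> Tuple[Optional[int], Optional[str]]:
    """Return (pdfaid:part, pdfaid:conformance) from an XMP packet string.

    Both property forms allowed by RDF/XML are handled: attributes on
    rdf:Description and child elements of rdf:Description.
    """
    root = ET.fromstring(xmp)
    found = {"part": None, "conformance": None}
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        for name in found:
            key = f"{{{PDFAID_NS}}}{name}"
            value = desc.get(key)
            if value is None:
                child = desc.find(key)
                value = child.text if child is not None else None
            if value is not None:
                found[name] = value.strip()
    part = int(found["part"]) if found["part"] is not None else None
    return part, found["conformance"]


def is_allowed_flavor(part: Optional[int], conformance: Optional[str]) -> bool:
    """True if the claimed PDF/A flavor is one that EA-PDF permits."""
    return (part, conformance) in ALLOWED_FLAVORS
```

Passing these checks shows only that a file claims a permitted PDF/A flavor; actual conformance to PDF/A and to the EA-PDF specification needs a full validator.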

| General | |
|---|---|
| History |
As summarized in Email Archiving in PDF (EA-PDF): From Initial Specification to Community of Practice, "This 24-month-long project builds upon the recently released recommendation from a planning project that was supported by the Andrew W. Mellon Foundation: A Specification for Using PDF to Package and Represent Email (EA-PDF Working Group, 2021). That report provides summary recommendations from a multi-institution working group and was refined through community feedback. The project now proposed will fulfill those recommendations by providing tangible outcomes that can standardize preservation-oriented email archiving in the mainstream of archival practice." The concept of an email-specific profile of PDF first surfaced in the Task Force on Technical Approaches for Email Archives (link via Internet Archive), a project sponsored by the Mellon Foundation and the Digital Preservation Coalition. The project's final report, The Future of Email Archives: A Report from the Task Force on Technical Approaches for Email Archives (pp. 82-83), published by CLIR in 2018, lists the goal to "Improve Options for PDF in Email-Archiving Workflows" among its High-Impact/Long-Term Activities: "Options to output email messages to PDF are well integrated into many common email clients. However, important header fields and other key technical metadata are often lost or concealed in the format migration. In addition, message threading and connections to attachments are terminated. Improving the technical capability of PDF software, especially software embedded in email clients, to address issues relevant to email archiving would simplify workflows at a large scale. Activity: Work with the PDF Association, the international vendor-neutral organization focused on PDF software and tools, to identify software requirements for email-archiving features for the PDF format. Planned action: Task force members will contact the PDF Association to start the project in the fall of 2018."
See also PDF/A_family, PDF/A-3, PDF/A-4e and PDF/A-4f. |
