Sustainability of Digital Formats: Planning for Library of Congress Collections


PDF Portfolio

Format Description Properties Explanation of format description terms

Identification and description

Full name PDF Portfolio
Description

A PDF Portfolio is a collection of PDF (or Portable Document Format) files or other files as attachments within a PDF wrapper. The unique feature of the PDF Portfolio (which uses the .pdf extension) is that the attached files remain in their original file format (such as text documents, e-mail messages, spreadsheets, CAD drawings, PowerPoint presentations and more) within the PDF container, and each component file can be opened, read, edited, and formatted independently of the other component files in the PDF Portfolio. The PDF specification does not provide a clear list or indication of the types of files that can be included as part of a PDF Portfolio, but Adobe's Overview of PDF Portfolios lists "text documents, e-mail messages, spreadsheets, CAD drawings, and PowerPoint presentations." Presumably, any format that can be supported by PDF can be attached within a PDF Portfolio. Comments welcome.

The article PDF portfolios and how to use them from the PDF Association adds that a PDF Portfolio "can easily incorporate original images, photographs and videos into a single file without needing to worry about compression artifacts affecting the perception of their work, since unlike a combined PDF where all files are converted to PDF, files contained within the PDF portfolio remain untouched and easily viewed with a supported application."

Note that Adobe suggests that the PDF Portfolio "functions more as an aggregate file type". In this resource, the term "aggregate" refers to a specific set of files which "are used to collect multiple data files together into a single file for easier portability and storage, with the option for data compression to save storage space. In addition to files and metadata, archive files may include features for fixity checking or checksums, process history data, encryption and other technical protection mechanisms, directory structures and error detection and correction information." See Aggregate - Quality and Functionality Factors for more information and Format Descriptions for Aggregate Formats for the list of formats, which includes ZIP and RAR. While PDF Portfolio acts as a bundling or container format, it falls outside the definition of an aggregate format in this resource because, due to the wide variety of formats that can be contained within, assumptions about file encryption, compression, or the presence of DRM should not be made. It's important to note that PDF Portfolios do not employ compression. The advantage of using a PDF Portfolio, rather than combining all files into a single PDF, is that it preserves the original files without making any transformations. Comments welcome.

Operating systems and file identification software typically identify this file as a PDF because its magic numbers match those of PDF Version 1.7 and PDF 2.0.
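As a minimal sketch of why identification tools report a plain PDF, the following Python function checks only the leading magic number (Hex: 25 50 44 46, ASCII: %PDF). The name `has_pdf_magic` is hypothetical and used here for illustration only; real identification tools apply richer signatures than this.

```python
from pathlib import Path

# Magic number shared by the PDF family: hex 25 50 44 46, ASCII "%PDF".
PDF_MAGIC = bytes.fromhex("25504446")


def has_pdf_magic(path: str) -> bool:
    """Return True if the file starts with the PDF magic number.

    Hypothetical helper for illustration; it does not validate the
    rest of the file or distinguish between PDF versions.
    """
    with Path(path).open("rb") as handle:
        return handle.read(len(PDF_MAGIC)) == PDF_MAGIC
```

A PDF Portfolio and an ordinary PDF both pass this check, so the signature alone cannot tell them apart.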

Users can manage the contents of this file type as a PDF Portfolio. If you delete a folder, all files within it are also removed from the PDF Portfolio. Additionally, you have the option to extract one or more components from the PDF Portfolio and save them individually.

The term "portfolio" is not mentioned within the PDF specification. This format first appeared in PDF Version 1.7 and was further expanded in PDF Version 2.0. ISO 32000-1:2008, Document management -- Portable document format -- Part 1: PDF 1.7 (section 12.3.5, Collections) introduces the concept of a "portable collection": "Beginning with PDF 1.7, PDF documents may specify how a conforming reader’s user interface presents collections of file attachments, where the attachments are related in structure or content. Such a presentation is called a portable collection." PDF 2.0 (ISO 32000-2, available for download free of charge from the PDF Association and its sponsors) added the following features to this format: an interactive layout called “navigator”; a color dictionary for the collection layout; and “folders,” an indirect reference to the root of the collection's folder structure. Adobe Reader Version 8 referred to the concept of this format as “PDF Packages”. The National Archives and Records Administration (NARA) refers to this format as a “portable collection”.

Many File Types, A Polished PDF Portfolio from AdobePress explains the difference between a PDF Portfolio and the simpler combined files function: When multiple files are combined into a single PDF (using the File, Create, Combine Files into a Single PDF feature), the "native file formats (such as Word) are converted to PDF and all of the pages are combined into one PDF. A portfolio is an entirely different animal... [in which] the series of PDFs or native files in a PDF “wrapper” (a single PDF file)...maintains the format of the original files embedded in it."

To identify a PDF Portfolio, look for the inclusion of a "collection dictionary" within the PDF document. If this dictionary is present, a conforming PDF viewer should present the document as a portable collection. This is indicated by the "/Collection" (Hex: 2f 43 6f 6c 6c 65 63 74 69 6f 6e) entry in ASCII, and also the use of "/CI" to identify a Collection Item within the specification.

Identification can also be made with the presence of the "hasCollection" tag.
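The search for the "/Collection" entry described above can be sketched in Python as follows. This is an illustrative approximation, not a conforming PDF parser: the function names are hypothetical, and the scan assumes the document catalog is stored uncompressed. If the catalog sits inside a compressed object stream, a raw byte search will miss the entry, and a full PDF parser would be needed.

```python
import re

# "/Collection" as ASCII bytes (hex 2f 43 6f 6c 6c 65 63 74 69 6f 6e).
# The lookahead requires PDF whitespace, a delimiter, or end of data
# next, so that longer names such as /CollectionItem are not counted.
COLLECTION_KEY = re.compile(rb"/Collection(?=[\x00\s()<>\[\]{}/%]|\Z)")


def has_collection_entry(data: bytes) -> bool:
    """Return True if the raw bytes contain a /Collection key."""
    return COLLECTION_KEY.search(data) is not None


def looks_like_pdf_portfolio(path: str) -> bool:
    """Heuristic check: PDF magic number plus a /Collection key.

    Hypothetical helper; reads the whole file into memory and does
    not decode compressed streams.
    """
    with open(path, "rb") as handle:
        data = handle.read()
    return data.startswith(b"%PDF") and has_collection_entry(data)
```

A positive result suggests a portable collection; a negative result is inconclusive for files that use compressed object streams.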

Most fields used by Collections are marked as optional in the specification, except for the “Collection Subtype Entry” and “Collection Subtype Entry (textual) Name” fields. This rule applies to both the collection field dictionary and collection sort dictionary.

When using Adobe Reader, you have the option to view the component files in two different ways: “Layout (or Preview) mode and Details (or Files mode)”.

Production phase In general, a final-state format for delivery to end users.
Relationship to other formats
    Subtype of PDF_1_7, PDF, Version 1.7 (ISO 32000-1:2008)
    Subtype of PDF_2_0, PDF 2.0, ISO 32000-2 (2017, 2020)

Local use

LC experience or existing holdings The Library of Congress has many PDF Portfolios in its varied collections.
LC preference See the Library of Congress Recommended Formats Statement. PDF Portfolios are not specified for any specific content category.

Sustainability factors

Disclosure Fully documented. While the standardization and documentation of PDF v1.7 started with Adobe Systems Incorporated, standards work for PDF is now coordinated through ISO/TC 171/SC 2/WG 8. See PDF, Version 1.7 (ISO 32000-1:2008) and PDF 2.0, ISO 32000-2 (2017, 2020) for more details. See also PDF Association for background documents related to PDF.
    Documentation

ISO 32000-1:2008. Document management -- Portable document format -- Part 1: PDF 1.7. Confirmed in 2018. Adobe makes available an ISO-approved copy of the standard at https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf.

ISO 32000-2:2017 Document management -- Portable document format -- Part 2: PDF 2.0.

ISO 32000-2:2020 Document management -- Portable document format -- Part 2: PDF 2.0. [dated revision]

The PDF Association announced in an April 2023 press release that it provides no-cost downloads of the ISO 32000-2 (PDF 2.0) bundle. Included are ISO 32000-2:2020; ISO 32000-2:2020/Amd 1; and ISO/TS 32002:2022. See https://www.pdfa-inc.org/product/iso-32000-2-pdf-2-0-bundle-sponsored-access/ for direct links to the downloads.

See also PDF (Portable Document Format) Family, PDF, Version 1.7 (ISO 32000-1:2008) and PDF 2.0, ISO 32000-2 (2017, 2020).

Adoption

PDF Portfolio files can be created using Adobe Acrobat versions 9 and later and can be read by Adobe Reader versions 8 and later. Other options include, but are not limited to, iText Core (starting with version 7), Nitro PDF Pro and Foxit PDF Editor.

As explained by Johan van der Knijff in PDF processing and analysis with open-source tools, the Apache Tika toolkit can view and extract metadata from embedded documents, including those in PDF Portfolios: "invoking Tika with the -J (“output metadata and content from all embedded files”) option results in JSON-formatted output that contains metadata (and also extracted text) for all files that are embedded in this document."
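The Tika invocation described above can be wrapped in a short Python sketch. Several details here are assumptions rather than facts from this description: the path to the Tika application jar is supplied by the caller, the function names are hypothetical, the first JSON record is assumed to describe the container PDF, and the key names "resourceName" and "X-TIKA:embedded_resource_path" may differ between Tika versions.

```python
import json
import subprocess


def build_tika_command(tika_jar: str, pdf_path: str) -> list[str]:
    """Build the command line for Tika's -J (recursive JSON) option."""
    return ["java", "-jar", tika_jar, "-J", pdf_path]


def embedded_resource_names(tika_json: str) -> list[str]:
    """List names of embedded files from Tika's -J output.

    Assumes the output is a JSON array whose first record describes
    the container document; the key names are also assumptions.
    """
    records = json.loads(tika_json)
    names = []
    for record in records[1:]:
        name = record.get("resourceName") or record.get(
            "X-TIKA:embedded_resource_path"
        )
        if isinstance(name, list):  # some versions emit multi-valued fields
            name = name[0] if name else None
        names.append(str(name) if name else "<unnamed>")
    return names


def list_portfolio_members(tika_jar: str, pdf_path: str) -> list[str]:
    """Run Tika (requires Java and the jar) and list embedded files."""
    result = subprocess.run(
        build_tika_command(tika_jar, pdf_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return embedded_resource_names(result.stdout)
```

Keeping the JSON parsing separate from the subprocess call makes it possible to inspect saved Tika output without rerunning the tool.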

Both the PDF Version 1.7 specification (section 12.3.5) and the PRONOM entry discuss utilizing PDF Portfolio for email preservation. The specification for PDF Version 2.0 also discusses email preservation (section 12.3.5.1).

The PDF Association details use cases for business project sharing, artist portfolios, and email preservation.

See: PDF (Portable Document Format) Family

    Licensing and patents See: PDF (Portable Document Format) Family
Transparency See: PDF (Portable Document Format) Family
Self-documentation

See: PDF, Version 1.7 (ISO 32000-1:2008), PDF 2.0, ISO 32000-2 (2017, 2020)

External dependencies

Adobe Acrobat 9 or later is required to create this format.

Adobe Reader versions 8 and later can read this format. The concept of this format was referred to in Adobe Acrobat 8 as “PDF Packages”. While these serve the same purpose, versions of Adobe Acrobat will render their layout differently.

At one point, Adobe Flash Player 10.1 or later was also required. Earlier versions of Flash Player cannot play a published PDF Portfolio.

See PDF (Portable Document Format) Family for more.

Technical protection considerations

PDF Portfolios are subject to the same technical protection considerations as other PDF files. Any original files contained within a PDF Portfolio are subject to their own technical protection considerations.

See PDF (Portable Document Format) Family for more.


Quality and functionality factors

Still Image
Normal rendering For quality and functionality factors associated with still images, see PDF Family.
Text
Normal rendering For most quality and functionality factors associated with text, see PDF Family.
Other
Additional considerations for attached files. Depends on the attached files because PDF Portfolio can contain a range of file types which are supported in native applications.

File type signifiers and format identifiers

Tag Value Note
Filename extension pdf
See PDF (Portable Document Format) Family.
Internet Media Type application/pdf
See PDF (Portable Document Format) Family.
Magic numbers Hex: 25 50 44 46
ASCII: %PDF

PDF Portfolio magic numbers are not different from other members of the PDF Family. See PDF (Portable Document Format) Family.

Note: The only way to determine whether a PDF is a PDF Portfolio or a standard PDF is to seek within the file for a "/Collection" string (as detailed in Description).

Other NF00792
NARA File Format Preservation Plan ID. See https://www.archives.gov/files/lod/dpframework/id/NF00792.ttl.
Pronom PUID fmt/1451
See https://www.nationalarchives.gov.uk/PRONOM/fmt/1451.
Wikidata Title ID Q109971781
See https://www.wikidata.org/wiki/Q109971781.

Notes

General

Relationship with Flash: Version 8 and older of this format required Adobe Flash to be viewed. In this era, this format could be saved as a Flash-based website. Shockwave Flash Movie (SWF) files were not supported in PDF Portfolios.

Relationship with Digital Signatures: If a digital signature is within any of the included files of a PDF Portfolio, edits to any portion of the PDF Portfolio will break all signatures within the file. A digital signature can be used on the PDF Portfolio as a whole, which reduces the need to create signatures for the individual included files.

History

In 2006, the PDF specification introduced support for document collections, referred to as “PDF Packages” and “PDF Portfolios” and collated in Adobe Acrobat 8. A PDF Package could include file attachments of various formats such as PDFs, text documents, email messages, spreadsheets, CAD drawings, and Microsoft PowerPoint presentations. Each component file in a PDF Package could be opened, edited, and formatted independently.

With the release of Adobe Acrobat 9 and ISO 32000-1:2008, PDF Portfolios extended the collection dictionary with customizable ActionScript user interfaces for navigating the collection's files. This introduced a more dynamic approach to managing collections. PDF Portfolios include an ActionScript program, optional resources, icons, and XML files to define the interface and organization of the portfolio layout. Users could create PDF Portfolios with Adobe Acrobat Pro 9, including custom layouts or sample layouts provided by Adobe Acrobat. Adobe Reader 9 and Acrobat Standard could display PDF Portfolio Layouts but could not create PDF Portfolios.


Format specifications


Useful references

URLs


Last Updated: 12/08/2023