Sustainability of Digital Formats: Planning for Library of Congress Collections

PyTorch Serialized File Format

Table of Contents
Format Description Properties Explanation of format description terms

Identification and description

Full name PyTorch Serialized File Format
Description

The PyTorch Serialized File Format is an uncompressed ZIP64 archive (as of PyTorch version 1.6.0) that is used to save the weights and biases of a trained PyTorch model to disk, along with other data. By convention, though not by requirement, PyTorch models are saved with either .pt or .pth as the file extension. The pickled (serialized) object inside the archive has the file extension .pkl; the tensor storages it refers to are kept as separate files in the archive's data directory.
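
Because the extension is only a convention, identification has to rest on the file's content rather than its name. The following is a minimal sketch using Python's standard zipfile module, assuming a file written by PyTorch 1.6.0 or later; files from earlier versions may not be ZIP archives at all. The function name describe_container and the keys of the returned dictionary are hypothetical and are not part of PyTorch.

```python
import zipfile


def describe_container(path):
    """Report whether `path` is a ZIP archive and how its members are stored."""
    if not zipfile.is_zipfile(path):
        return {"is_zip": False, "all_stored": False, "pickle_members": []}
    with zipfile.ZipFile(path) as archive:
        infos = archive.infolist()
        return {
            "is_zip": True,
            # ZIP_STORED means a member was written without compression.
            "all_stored": all(
                info.compress_type == zipfile.ZIP_STORED for info in infos
            ),
            # Members ending in .pkl hold the pickled object.
            "pickle_members": [
                info.filename for info in infos if info.filename.endswith(".pkl")
            ],
        }
```

A call such as describe_container("model.pth") inspects only the ZIP directory, so it does not unpickle anything and does not require PyTorch to be installed.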

PyTorch is an open-source deep learning library for Python, built on the Torch library. According to Datacamp's Deep Learning with PyTorch Cheat Sheet, PyTorch models are used for neural network development, natural language processing, computer vision, and reinforcement learning.

The PyTorch Serialized File Format is documented within the PyTorch documentation. A PyTorch model is saved to disk using the torch.save function. This function uses Python's pickle utility to serialize the object that is passed to it. When saving a PyTorch model for inference, PyTorch recommends only saving the trained model's learned parameters (i.e., its weights and biases), which are found in its state_dict. According to PyTorch, a state_dict is "a Python dictionary object that maps each layer to its parameter tensor. Note that only layers with learnable parameters (convolutional layers, linear layers, etc.) and registered buffers (batchnorm's running_mean) have entries in the model's state_dict."

Inside the archive, as described in PyTorch's serialization semantics documentation, is:

  • A pickled/serialized version of the object that was passed to torch.save, such as a state_dict (excluding its storage objects, which are stored in the data directory)
  • A string with the sys.byteorder ("little" or "big")
  • A data directory containing a file for each storage in the object
  • A version number recorded at save time
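
The four components listed above can be located with a generic ZIP reader. The sketch below sorts archive members into those groups, assuming the members are named data.pkl, byteorder, and version, that the storages sit under a data/ directory, and that all members may share one wrapping top-level folder. Those names reflect the layout described above but should be checked against a real file; the function name group_members and the group labels are hypothetical.

```python
import zipfile
from pathlib import PurePosixPath


def group_members(path):
    """Sort the members of a PyTorch ZIP archive into the components it holds."""
    with zipfile.ZipFile(path) as archive:
        # Skip explicit directory entries, which end with a slash.
        names = [n for n in archive.namelist() if not n.endswith("/")]
    split = [PurePosixPath(n).parts for n in names]

    # Assumption: if every member shares one leading folder, it is a wrapper.
    tops = {parts[0] for parts in split}
    wrapped = len(tops) == 1 and all(len(parts) > 1 for parts in split)

    groups = {"pickle": [], "byteorder": [], "version": [], "storages": [], "other": []}
    for name, parts in zip(names, split):
        rel = parts[1:] if wrapped else parts
        if rel[0] == "data" and len(rel) > 1:
            groups["storages"].append(name)   # one file per storage
        elif rel[-1].endswith(".pkl"):
            groups["pickle"].append(name)     # the pickled object
        elif rel == ("byteorder",):
            groups["byteorder"].append(name)  # "little" or "big"
        elif rel == ("version",):
            groups["version"].append(name)    # version number at save time
        else:
            groups["other"].append(name)
    return groups
```

Members that fall into the "other" group are simply reported rather than interpreted, since the description above does not account for them.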

Comments welcome.

Production phase May be used at any lifecycle phase for saving PyTorch models.
Relationship to other formats
    Contains Serialized Python object, not described separately at this time.

Local use

LC experience or existing holdings As of this writing in April 2025, the Library of Congress does not have PyTorch serialized files in its collections.
LC preference The Library of Congress has not yet expressed any format preference for preserving machine learning models.

Sustainability factors

Disclosure PyTorch is an open-source library with documentation available on the PyTorch website and code available on the PyTorch GitHub repository.
    Documentation PyTorch documents its serialized file format for torch.save on the serialization semantics page.
Adoption The PyTorch blog described PyTorch in 2022 as "one of the primary platforms for AI research, as well as commercial production use." Also in 2022, The Register called PyTorch "one of the major deep learning frameworks at the moment, the other being TensorFlow, developed by Google." A 2017 article from O'Reilly Media credits the adoption of PyTorch "to native Python-style imperative programming already familiar to researchers, data scientists, and developers of popular Python libraries such as NumPy and SciPy." Comments welcome.
    Licensing and patents From the PyTorch GitHub repository: "PyTorch has a BSD-style license, as found in the LICENSE file."
Transparency According to FileInfo.com, the PyTorch Serialized File Format is not intended to be opened manually but to be imported into a PyTorch module. See ZIP_PK for more general information. Comments welcome.
Self-documentation

PyTorch ZIP archives contain some metadata: a string with the sys.byteorder ("little" or "big") and a version number at save time that can be used at load time. See ZIP_PK for more general information. Comments welcome.
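
Both metadata values are short text members and can be read without PyTorch. The following is a minimal sketch, assuming the members are named byteorder and version (possibly inside a top-level folder) and that their content is plain ASCII text. The function name read_save_metadata is hypothetical, and a value of None means the member was not found, which may be the case for files written by older PyTorch versions.

```python
import zipfile


def read_save_metadata(path):
    """Return the byte order and version strings stored in the archive, if any."""
    found = {"byteorder": None, "version": None}
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            # Compare only the last path component, ignoring any wrapper folder.
            leaf = name.rsplit("/", 1)[-1]
            if leaf in found and found[leaf] is None:
                raw = archive.read(name)
                found[leaf] = raw.decode("ascii", errors="replace").strip()
    return found
```

The sketch returns the values as strings and leaves their interpretation to the caller, because the meaning of a given version number is defined by PyTorch rather than by the archive itself.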

Accessibility Features

The PyTorch Serialized File Format has no specific attributes to support accessibility. Comments welcome.

External dependencies

PyTorch can be installed on Windows, Mac, or Linux OS. Prerequisites for installation are outlined on the PyTorch Get Started page.

The PyTorch documentation on saving and loading models recommends saving only the model's learnable parameters, which are stored in the state_dict object of the model's module, rather than the complete model architecture and parameters. Saving the entire model is not recommended because it creates dependencies on the class definitions and directory structure used at save time.
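
The dependency on class definitions follows from how Python's pickle module records objects: it stores the importable name of a class, not the class's code. The sketch below illustrates this with the standard library only. An OrderedDict stands in for a saved object, the helper name referenced_globals is hypothetical, and protocol 2 is chosen here only because it writes each class reference as a single GLOBAL opcode that is easy to list.

```python
import pickle
import pickletools
from collections import OrderedDict


def referenced_globals(pickle_bytes):
    """List the "module name" references in a pickle stream without unpickling it.

    Only GLOBAL opcodes are reported; pickle protocol 4 and later encode
    references differently (STACK_GLOBAL) and are not covered by this sketch.
    """
    return [
        arg
        for opcode, arg, _position in pickletools.genops(pickle_bytes)
        if opcode.name == "GLOBAL"
    ]


# Stand-in data: a dictionary-like object with made-up parameter values.
stand_in = OrderedDict(weight=[0.1, 0.2], bias=[0.0])
payload = pickle.dumps(stand_in, protocol=2)

# Each entry names a module and a class that must be importable at load time.
for reference in referenced_globals(payload):
    print(reference)
```

A pickled complete model would name the model's own class in the same way, so that class has to be importable from the same module path when the file is loaded.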

Comments welcome.

Technical protection considerations TBD

Quality and functionality factors

Aggregate
Compression See ZIP_PK.
Support for Error Detection See ZIP_PK.
Beyond normal functionality See ZIP_PK.

File type signifiers and format identifiers

Tag Value Note
Filename extension pt, pth  From PyTorch's Saving and Loading Models: Use of these extensions is common convention when saving PyTorch models.
Other See note.  NARA File Format Preservation Plan ID has no corresponding entry as of April 2025.
Pronom PUID See note.  PRONOM has no corresponding entry as of April 2025.
Wikidata Title ID Q47509047  See https://www.wikidata.org/wiki/Q47509047 for PyTorch library.

Notes

General

The two defining features of PyTorch, according to the PyTorch README on GitHub, are "tensor computation with strong GPU acceleration" and "deep neural networks built on a tape-based autograd system."

From the PyTorch tutorial on tensors: "Tensors are a specialized data structure that are very similar to arrays and matrices. In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model's parameters."

Tensors are passed through a neural network, which the PyTorch tutorial on building models describes as a module that contains other modules nested within it. These layers of modules perform operations on the tensor data. Modules can have parameters, "weights and biases that are optimized during training," which are also encoded as tensors.

History PyTorch was originally developed by the Facebook AI Research (FAIR) team at Meta and first launched in 2017 according to the FAIR blog. Since September 2022, it has been governed, developed, and maintained by the PyTorch Foundation, part of the Linux Foundation, per an announcement on the PyTorch blog. As of this writing in April 2025, PyTorch's latest stable release is version 2.6.

Format specifications


Useful references

URLs


Last Updated: 04/02/2025