Sustainability of Digital Formats: Planning for Library of Congress Collections |
Full name | bzip2 |
---|---|
Description |
bzip2 is a freely available, patent-free data compression format and program created by Julian Seward. The name refers both to the file format and to the program used to create it. The program is designed for compressing single files only. It was created as a successor to bzip to avoid potential patent issues. See History for more information. Different versions of bzip2 maintain file format compatibility: newer versions can read files created by older versions, which provides a level of stability. However, the format's creator acknowledges limitations in the compressed file format and provides source code for decompressing older files created by bzip-0.21. A bz2 stream consists of a 4-byte header followed by zero or more compressed blocks and an end-of-stream marker, which contains a 32-bit CRC computed over the entire uncompressed stream. The compressed blocks are bit-aligned, with no padding between them. |
Production phase | May be used at any lifecycle phase for bundling/packaging files together for exchange, storage, or distribution. |
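The stream layout given in the Description can be inspected with Python's standard `bz2` module. The sketch below is illustrative only: the two 6-byte marker values and the reading of the fourth header byte as a compression-level digit come from unofficial format documentation, not from this page, and `describe_stream_start` is a hypothetical helper name.

```python
import bz2

# Assumption: marker values as given in unofficial bzip2 format documentation.
BLOCK_MAGIC = bytes.fromhex("314159265359")          # starts a compressed block
END_OF_STREAM_MAGIC = bytes.fromhex("177245385090")  # starts the end-of-stream marker


def describe_stream_start(stream: bytes) -> dict:
    """Summarize the 4-byte header and the 6-byte marker that follows it."""
    if len(stream) < 10 or stream[:3] != b"BZh":
        raise ValueError("not a bzip2 stream")
    marker = stream[4:10]  # the header is exactly 4 bytes, so this is byte-aligned
    if marker == BLOCK_MAGIC:
        kind = "compressed block"
    elif marker == END_OF_STREAM_MAGIC:
        kind = "end of stream"
    else:
        kind = "unknown"
    return {
        "signature": stream[:3],   # b"BZh"
        "level": chr(stream[3]),   # assumed to be a digit from "1" to "9"
        "first_marker": kind,
    }


# An empty input yields a stream with zero compressed blocks.
empty = bz2.compress(b"")
nonempty = bz2.compress(b"hello, bzip2")

print(describe_stream_start(empty))
print(describe_stream_start(nonempty))

# With zero blocks, the four bytes after the end-of-stream marker hold the
# 32-bit CRC for the whole (empty) stream.
print(empty[10:14].hex())
```

Only the first marker can be located this simply. Because later blocks are bit-aligned, finding them requires reading the stream bit by bit.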
LC experience or existing holdings | The Library of Congress has a small number of bzip2 files across its varied collections. |
---|---|
LC preference | bzip2 is not included in the Library of Congress Recommended Formats Statement. |
Disclosure | No formal specification for the bzip2 file format exists. Comments welcome. |
---|---|
Documentation |
Two unofficial documentation resources are commonly cited.
|
Adoption |
Widely adopted. The bzip2 file format “ships standard on many Unix/Linux systems.” Often compared to gzip and ZIP File Format (PKWARE). |
Licensing and patents |
The bzip2 homepage states that bzip2 is licensed under the GNU’s Not Unix (GNU) General Public License (GPL), but it is unclear which version of the GPL would apply. Other sources conflict with this, describing the license as a Berkeley Software Distribution (BSD)-style license. Comments welcome. |
Transparency | Depends upon algorithms and tools to read. Would require sophistication to build tools from scratch. |
Self-documentation |
Identifies self as a bzip2-compressed file with magic numbers (see magic numbers section). There is no specific language for the inclusion of other metadata. However, documentation is sparse. Comments welcome. |
Accessibility Features | No specific features in the file format. Features to support accessibility would be found in the bundled and compressed files (such as embedded captions and subtitles in audiovisual content, tagged and structured text in textual documents, and alt text for images). Aggregate files can also contain separate files for transcripts, timed text or captions as part of the bundled package. See Relationships to other formats for details. |
External dependencies | None, beyond the availability of software to extract and decompress the files contained in a bzip2 file. |
Technical protection considerations | Does not support encryption. |
Aggregate | |
---|---|
Compression | According to the bzip2 software official manual, bzip2 files are compressed using the Burrows-Wheeler block-sorting text compression algorithm, and Huffman coding. |
Support for Error Detection | Unknown. Comments welcome. |
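As an illustration of the block-sorting step named under Compression, the following is a deliberately naive sketch of the Burrows-Wheeler transform and its inverse. It is not bzip2's implementation: it works on short strings rather than large byte blocks, it assumes a sentinel character that does not occur in the input, and it leaves out the Huffman coding stage. The function names are hypothetical.

```python
def bwt(text: str, sentinel: str = "\x00") -> str:
    """Return the last column of the sorted rotations of text + sentinel."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    # Build every rotation of s and sort them lexicographically.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last character of each sorted rotation.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "\x00") -> str:
    """Rebuild the original text from the transform (slow, for clarity)."""
    n = len(last_column)
    table = [""] * n
    # Repeatedly prepend the last column and re-sort; after n rounds the
    # table holds all sorted rotations of the original string.
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The rotation that ends with the sentinel is the original text.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


transformed = bwt("banana", sentinel="$")
print(transformed)
print(inverse_bwt(transformed, sentinel="$"))
```

The transform does not shrink the data by itself. It tends to group identical characters together, which makes the output easier for a later entropy coder such as Huffman coding to compress.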
Tag | Value | Note |
---|---|---|
Filename extension | bz2 |
Used for bzip2. See Wikidata: https://www.wikidata.org/wiki/Q27866052 |
Internet Media Type | application/x-bzip2 |
See the Mozilla list of common MIME types. Not listed in IANA. |
Magic numbers | Hex: 42 5a 68 ASCII: BZh |
For more details see:
Note that this header, converted from hexadecimal to ASCII, reads “BZh”. “BZ” stands for “bzip”, and “h” stands for “Huffman coding,” one of the compression methods used by bzip2. Some sources, such as Wikipedia, cite the magic numbers as “BZh” rather than in hexadecimal. |
Pronom PUID | x-fmt/268 |
See https://www.nationalarchives.gov.uk/PRONOM/x-fmt/268 |
Wikidata Title ID | Q27866052 |
See https://www.wikidata.org/wiki/Q27866052 |
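The magic numbers listed above allow a quick signature check that does not rely on the filename extension. The following minimal sketch assumes only the three signature bytes from the table. `has_bzip2_signature` is a hypothetical helper, and a match suggests, but does not prove, that the file is a valid bzip2 stream.

```python
import bz2
import os
import tempfile

BZIP2_SIGNATURE = bytes.fromhex("42 5a 68")  # ASCII "BZh", from the table above


def has_bzip2_signature(path: str) -> bool:
    """Return True if the file at path starts with the bzip2 magic numbers."""
    with open(path, "rb") as handle:
        return handle.read(len(BZIP2_SIGNATURE)) == BZIP2_SIGNATURE


# Demonstration with two temporary files; the names are arbitrary.
results = {}
with tempfile.TemporaryDirectory() as workdir:
    compressed_path = os.path.join(workdir, "sample.bz2")
    plain_path = os.path.join(workdir, "sample.txt")

    with open(compressed_path, "wb") as handle:
        handle.write(bz2.compress(b"example payload"))
    with open(plain_path, "wb") as handle:
        handle.write(b"example payload")

    results["compressed"] = has_bzip2_signature(compressed_path)
    results["plain"] = has_bzip2_signature(plain_path)

print(results)
```

A fuller identification tool would also try to decompress the data, since any file can begin with these three bytes by coincidence.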
General |
The bzip2 program, and by extension the bzip2 file format, is based on its predecessor, bzip. Despite the similarity in name and appearance, bzip2 was rewritten and re-engineered to address potential patent issues with bzip. The format created by bzip2 is not compatible with bzip; compatibility was deliberately not pursued, since it would have undermined the goal of patent avoidance. Seward expressed a commitment to backward compatibility in future changes. The predecessor program bzip is no longer available. |
---|---|
History |
Julian Seward released bzip2, version 0.15, in July 1996. The compressor’s popularity grew over the next several years due to its stability. Julian Seward released version 1.0 in late 2000. In June 2019 Federico Mena became the new maintainer of bzip2. In 2019, Mark Wielaard began maintaining a bzip2 stable repository at Sourceware. In June 2021 Micah Snyder became the new maintainer of the Sourceware repository. |
|