Sustainability of Digital Formats: Planning for Library of Congress Collections


ChemDraw Exchange


Identification and description

Full name ChemDraw Exchange
Description

CDX is the native file format for the program ChemDraw which stores molecular data in a tagged binary format. FileInfo.com describes ChemDraw as a molecular editing program suite used for storing accurate chemical drawings. According to CDX File Format Documentation, "ChemDraw stores a document as a set of objects and properties. Objects are such things as atoms, bonds, fragments, arrows, and text. Properties are things like position, color, arrow type, and bond order." Richard L. Apodaca in the article, A Brief Introduction to the ChemDraw CDX File Format, describes "ChemDraw as the industry standard tool for generating publication-quality chemical structure graphics."

ChemDraw Editor Software:

As described on the ChemDraw Wikipedia page, ChemDraw is a molecule editor first developed in 1985 by David A. Evans and Stewart Rubenstein. Cambridge Scientific Computing was launched in 1986 and eventually became CambridgeSoft Corporation. ChemDraw is now part of Revvity Signals Software's ChemOffice Suite of programs.

According to Revvity Signals Software's blog, Back to School with ChemDraw, September 2022, "ChemDraw software is the most efficient way to draw and represent complex chemical structures and reaction schemes."

CDX File Format:

CDX Documentation states that "the CDX file format is a tagged file format, meaning that it consists of a series of objects, each of which is preceded by a tag that identifies what the object represents."

The general structure of a CDX file, as described by CDX Documentation, contains a header, objects, and properties, and uses little-endian byte order.

  • CDX header consists of:
    • 8 bytes with the value "VjCD0100" (56 6A 43 44 30 31 30 30).
    • 4 bytes reserved (04 03 02 01).
    • 16 bytes reserved, set to zero (00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00).
    • The header is then followed by an object tree of tagged items.
  • Properties, also called attributes, are self-contained. A property applies to the object which logically contains it. Properties have 3 parts:
    • Tag identifier - defines what the property represents.
    • Length - The 2-byte length item specifies the number of bytes that comprise the data in the property.
    • Data - may be an integer, a floating-point number, or some other type determined by the property tag.
  • Objects are self-contained and can contain properties and other objects. The enclosing object is called a "container." Objects have 4 parts:
    • Tag identifier - An object's tag is a 2-byte value, which will always have the most significant bit set.
    • Object identifier - a 4-byte object ID.
    • Object contents - An object may contain any number of properties or other objects.
    • End object - Every object ends with a pair of zero bytes (00 00).
  • The end of a CDX file is marked by two bytes of 0 (00 00).
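
The layout above can be sketched as a small Python writer. This is a minimal illustration under stated assumptions, not ChemDraw's implementation: the three tag values are illustrative placeholders that do not come from the text above (consult the CDX Documentation for actual tag values), the rule that property tags have the most significant bit clear is inferred from the statement that object tags always have it set, and the sketch assumes the closing 00 00 of the top-level object doubles as the end-of-file marker.

```python
import struct

CDX_SIGNATURE = b"VjCD0100"                      # 8-byte signature
CDX_RESERVED = bytes([0x04, 0x03, 0x02, 0x01])   # 4 reserved bytes
CDX_HEADER = CDX_SIGNATURE + CDX_RESERVED + bytes(16)  # plus 16 zero bytes = 28 bytes


def encode_property(tag: int, data: bytes) -> bytes:
    """Serialize a property: 2-byte tag, 2-byte length, then the data (little-endian)."""
    if tag == 0 or tag & 0x8000:
        raise ValueError("assumed: property tags are non-zero with the top bit clear")
    if len(data) >= 0xFFFF:
        raise ValueError("this sketch keeps data within the plain 2-byte length field")
    return struct.pack("<HH", tag, len(data)) + data


def encode_object(tag: int, object_id: int, contents: bytes = b"") -> bytes:
    """Serialize an object: 2-byte tag, 4-byte ID, contents, then the 00 00 terminator."""
    if not tag & 0x8000:
        raise ValueError("object tags have the most significant bit set")
    return struct.pack("<HI", tag, object_id) + contents + b"\x00\x00"


# Placeholder tag values chosen for illustration only.
DOCUMENT_TAG = 0x8000
BOND_TAG = 0x8005
BOND_ORDER_TAG = 0x0600

bond_order = encode_property(BOND_ORDER_TAG, struct.pack("<H", 2))
bond = encode_object(BOND_TAG, 7, bond_order)
cdx_bytes = CDX_HEADER + encode_object(DOCUMENT_TAG, 0, bond)
```

Because every object carries its own terminator and every property carries its own length, nested content is built simply by concatenating encoded children into the `contents` argument of the enclosing object.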

Text-Based CDXML File Format:

As stated in the CDX Documentation, "CDXML is an XML encoding of CDX -- a variant of CDX that complies with the XML specification. It differs from CDX only in the details of its formatting, and it doesn't even differ by that much...This is a very important point: a document can be converted from binary CDX to text-based CDXML and back again with absolutely no loss of information."

As stated by Richard L. Apodaca, in the article, A Brief Introduction to the ChemDraw CDX File Format, 2010, "Interconversion between the two formats is lossless; everything that can be represented as a binary CDX file can also be represented as an XML CDX file."

According to CDX Format Wikipedia, CDXML is the XML version and the preferred version of CDX.

CDX Documentation describes the CDXML format as a file containing a header, followed by a series of tagged items and the end of file document object.

CDXML files contain properties and objects, the same as CDX files.

  • Properties all have names.
  • Objects have names that identify the type of object. Names are case-sensitive.
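
As a rough illustration of named objects and properties, the Python sketch below walks a small CDXML-like fragment with the standard library XML parser. The sample document is hand-written for this example and is an assumption: the element names (`CDXML`, `page`, `fragment`, `n`, `b`) and attribute names (`id`, `p`, `B`, `E`, `Order`) should be checked against the CDX Documentation before being relied on.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two nodes joined by one bond.
SAMPLE_CDXML = """<CDXML>
  <page>
    <fragment>
      <n id="1" p="100 100"/>
      <n id="2" p="130 100"/>
      <b id="3" B="1" E="2" Order="2"/>
    </fragment>
  </page>
</CDXML>"""


def outline(xml_text: str):
    """Return (depth, object name, properties) for every element, in document order."""
    rows = []

    def visit(element, depth):
        # The element name identifies the object type; its attributes are the properties.
        rows.append((depth, element.tag, dict(element.attrib)))
        for child in element:
            visit(child, depth + 1)

    visit(ET.fromstring(xml_text), 0)
    return rows


root = ET.fromstring(SAMPLE_CDXML)
lowercase_matches = list(root.iter("b"))   # matches the bond element
uppercase_matches = list(root.iter("B"))   # matches nothing: names are case-sensitive

for depth, name, properties in outline(SAMPLE_CDXML):
    print("  " * depth + name, properties)
```

The two `iter` calls show why case matters: XML element names are compared exactly, so a reader looking for `B` would miss an element named `b`.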

CDX Documentation includes 'A Simple Example,' which presents a graphical drawing, the CDX binary version, the CDXML version, and a side-by-side comparison of the two file formats.

Uses of CDX Files:

According to An Nguyen in the Journal of Cheminformatics, December 2019, "The file formats CDX and CDXML are often used for the capture of chemical information...In addition, the CDX format allows the embedding of chemical structures into the Word files DOC or DOCX while maintaining the consistency and the synchronization of the ChemDraw information...The content can be used to process and retain most of the important information that was generated via the ChemDraw editor. Both file formats contain chemical objects (e.g., atoms, bonds, reactions) and properties (e.g. charge, valence, atom number, bond order) as structure content."

Production phase Middle to final state. CDX files are mainly used for storage; the CDXML format is used for delivery.
Relationship to other formats
    Defined via XML, Extensible Markup Language (XML). CDX Documentation states, "A CDXML is a CDX file specially formatted so that it conforms to the XML specification."

Local use

LC experience or existing holdings None
LC preference The Library of Congress has not yet expressed any format preference for scientific data.

Sustainability factors

Disclosure

Standard format partially documented. According to the CDX Documentation, because of its ability to incorporate custom information, and because it is in the public domain, CDX has been adopted by the U.S. Patent Office as its standard chemical format.

In the Depth-First.com article, An Introduction to the ChemDraw CDXML Format, Richard L. Apodaca states, "The authoritative specification from Perkin Elmer (PE) offers a starting point for understanding CDXML...Although this documentation is mostly complete, several items are missing." Note: The authoritative specification from Perkin Elmer (PE) is the same CDX Documentation referenced throughout this document.

    Documentation

CDX File Format Documentation. See http://www.cambridgesoft.com/services/documentation/sdk/chemdraw/cdx/

Adoption

According to CDX Wikipedia, the CDX file format is used across Windows, Mac, and Linux distributions.

ChemDraw allows the use of the system clipboard, allowing users to copy and paste CDX files from ChemDraw to either Mac or Windows clipboards. Richard L. Apodaca, in the article, A Brief Introduction to the ChemDraw CDX File Format, 2010, describes CDX files, stating, "Chemists rarely save CDX files to disk themselves. Instead, ChemDraw content is copied from a drawing tool and pasted into Microsoft Office documents (particularly Word). This embedded CDX then gets saved along with the rest of the document into a single file. Extracting this embedded CDX content requires an Office file API."
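
As a hedged illustration of why embedded CDX is harder to reach than a file on disk, the Python sketch below only detects likely CDX content inside a DOCX package; it does not extract it. It relies on two assumptions not stated above: that embedded objects are stored under `word/embeddings/` inside the DOCX ZIP package, and that the 8-byte CDX signature appears verbatim within the embedded object. Embedded objects are typically OLE compound files whose streams may not be stored contiguously, so a real extractor would still need an Office or OLE file API, as Apodaca notes. The function name is hypothetical.

```python
import zipfile

CDX_SIGNATURE = b"VjCD0100"


def find_cdx_candidates(docx_file):
    """Return (member name, byte offset) for each CDX signature found in a DOCX package.

    `docx_file` may be a path or a binary file-like object. This is a heuristic
    detector only; the offsets are positions inside the embedded object, not
    guaranteed starting points of contiguous CDX data.
    """
    hits = []
    with zipfile.ZipFile(docx_file) as package:
        for name in package.namelist():
            if not name.startswith("word/embeddings/"):
                continue  # assumed location of embedded objects
            data = package.read(name)
            offset = data.find(CDX_SIGNATURE)
            while offset != -1:
                hits.append((name, offset))
                offset = data.find(CDX_SIGNATURE, offset + 1)
    return hits
```

A call such as `find_cdx_candidates("report.docx")` (with a hypothetical file name) returns an empty list when no embedded object contains the signature. Legacy DOC files are not ZIP packages and are outside the scope of this sketch.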

CDX Office File API Example:

CDX Readers and Writers:

C++ header file - human-readable enumerations of all CDX object/property values, for writing third-party CDX readers.

CDX is also supported by Wolfram Research's Mathematica application.

Comments welcome.

    Licensing and patents

According to FairSharing.org, "Because of its ability to incorporate custom information, and because it is in the public domain, CDX has been adopted by the U.S. Patent Office as its standard chemical format."

Comments welcome.

Transparency

Depends on the format and available software.

According to iChemLabs.com's news article, Read and Write ChemDraw Files, January 2010, "ChemDraw has two formats that need to be considered, the ChemDraw Exchange format (CDX) and it's xml sister (CDXML). The CDX format is a pure binary format (users won't be able to make sense of the objects inside when users open it in a text editor) while the CDXML format is text based and can be coherently read. Both formats are structurally identical and completely describe any group of ChemDraw objects. CambridgeSoft has been urging users to switch to the CDXML format due to its ease of use, but there are some drawbacks to the XML version due to its inherently larger size."

Comments welcome.

Self-documentation

Supports the inclusion of metadata. As stated on CDX Format Wikidata, "file format for two-dimensional atomic coordinates, chemical bond information and metadata."

As explained in the CDX Documentation, CDX binary files consist of a fixed header, followed by tagged items/objects which can have properties (attributes) applied to them. "Properties, also called attributes, are self-contained. A property applies to the object which logically contains it. It may also describe other objects contained within the object which logically contains the property."

Comments welcome.

External dependencies

None beyond availability of supporting software.

Comments welcome.

Technical protection considerations

None found.

Comments welcome.


Quality and functionality factors

Text
Normal rendering

Some support. CDX Documentation states, "The CDX file format (binary) is a tagged file format, meaning that it consists of a series of objects, each of which is preceded by a tag that identifies what the object represents. Tagged file formats in general are very flexible. Readers of a tagged file can efficiently skip over parts they aren't interested in or do not recognize...This flexibility means that a tagged file format can be expanded without invalidating any existing files...In the simplest view, a CDX file consists of a document header followed by a stream of tagged items followed by the end of the Document...Nesting can be difficult to see in a raw binary file."
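
The skipping behavior and the nesting described in this passage can be sketched in Python as follows. This is an illustrative reader, not a complete CDX parser: it assumes the 28-byte header described earlier in this document, treats any non-zero tag without the most significant bit set as a property, and takes the 2-byte length at face value (any extended-length convention in the full specification is not handled). The function name and event tuples are hypothetical.

```python
import struct

HEADER_SIZE = 28  # 8-byte signature + 4 reserved bytes + 16 reserved zero bytes


def walk_cdx(data: bytes):
    """Yield (depth, kind, tag, value) events without interpreting any tag's meaning."""
    if data[:8] != b"VjCD0100":
        raise ValueError("missing CDX signature")
    pos = HEADER_SIZE
    depth = 0
    while pos + 2 <= len(data):
        (tag,) = struct.unpack_from("<H", data, pos)
        pos += 2
        if tag == 0:
            # 00 00 closes the innermost open object.
            depth -= 1
            yield (depth, "end", 0, None)
            if depth <= 0:
                return  # the top-level object has closed
        elif tag & 0x8000:
            # Object: a 4-byte ID follows; its contents come next in the stream.
            (object_id,) = struct.unpack_from("<I", data, pos)
            pos += 4
            yield (depth, "object", tag, object_id)
            depth += 1
        else:
            # Property: the length says how far to advance, so an unrecognized
            # property can be skipped without knowing what it represents.
            (length,) = struct.unpack_from("<H", data, pos)
            pos += 2
            yield (depth, "property", tag, data[pos:pos + length])
            pos += length
```

Printing each event indented by its `depth` gives an outline view of the object tree, which makes visible the nesting that is hard to see in a raw hex dump.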

CDX Documentation states, "A CDXML (XML) is a CDX file specially formatted so that it conforms to the XML specification. We expect that anyone who manipulates a CDXML file will be familiar with the general XML specifications." See XML for more information.

Comments welcome.

Integrity of document structure

Good support. CDX files have a defined general structure; see the Description above.

Comments welcome.

Integrity of layout and display

Little to no information about CDX layout and display. In An Introduction to the ChemDraw CDXML Format, 2021, Richard L. Apodaca states, "CDX/ML is an odd cheminformatics file format in that is mixes a molecular graph encoding system with visual elements and styling. For example, a given CDX/ML file may contain a chemical structure together with a TLC plate. Each individual bond can be colored, and the text on atom labels can bear custom colors, fonts, and layout instructions. Sometimes visual elements can carry chemical meaning. For example, an arrow may be part of a reaction scheme. Likewise, a bracket may surround the repeating unit of a polymer. This broad scope, in which chemically meaningful elements are mixed with visual layout and arbitrary vector graphics, makes CDX/ML one the most complicated file formats in cheminformatics. Coupled with the useful, but incomplete PE specification, CDX/ML is not an easy format to understand or use."

CDX saves ChemDraw drawings without loss of data.

Comments welcome.

Support for mathematics, formulae, etc.

Little to no information on support of mathematics, chemical formulae, diagrams, etc.

Comments welcome.

Functionality beyond normal rendering

According to Richard L. Apodaca in the article, An Introduction to the ChemDraw CDXML Format, 2021, it is "more common to find them (CDX/ML files) embedded in Microsoft Office or ChemOffice documents. Often, CDX/ML makes its way from these embedded environments to the outside world via the system clipboard."

As stated in the CDX Documentation, "When an object is copied, ChemDraw puts a CDX binary file directly on the clipboard. The data placed on the clipboard is exactly the same as would be written to a file, so once users retrieve it from the clipboard in the first place, users can process it exactly as the user would process a disk-based file."

Comments welcome.


File type signifiers and format identifiers

Tag | Value | Note
Filename extension | cdx | CDX Documentation, "CDX is the native file format of ChemDraw."
Filename extension | cdxml | CDX Documentation, "CDXML is a variant of CDX that complies with the XML specification...Everything that can be stored in a CDX file can also be stored in a CDXML and vice versa."
Internet Media Type | chemical/x-cdx | The National Archives, Chemical Draw Exchange Format. See https://www.nationalarchives.gov.uk/pronom/fmt/378.
Pronom PUID | fmt/378 | The National Archives, Chemical Draw Exchange Format. See https://www.nationalarchives.gov.uk/pronom/fmt/378.
Wikidata Title ID | Q5010021 | CDX Format, file format for two-dimensional atomic coordinates, chemical bond information and metadata, ChemDraw Exchange, CDX. See https://www.wikidata.org/wiki/Q5010021
Wikidata Title ID | Q5010020 | CDXML, CDX file specially formatted so that it conforms to the XML specification, ChemDraw Exchange XML format, CDXML. See https://www.wikidata.org/wiki/Q5010020
Wikidata Title ID | Q898716 | ChemDraw, software for chemical structure drawing. See https://www.wikidata.org/wiki/Q898716
Wikidata Title ID | Q105850644 | ChemDraw Template, file format. See https://www.wikidata.org/wiki/Q105850644
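
The signifiers above could be combined into a simple identification check, sketched here in Python. The signature bytes and filename extensions come from this document; the test for a `<CDXML` root element near the start of the file is an assumption about how CDXML documents begin, and the function name and return shape are hypothetical.

```python
from pathlib import Path

CDX_SIGNATURE = b"VjCD0100"
KNOWN_EXTENSIONS = ("cdx", "cdxml")


def identify(filename: str, head: bytes) -> dict:
    """Report what the leading bytes and the filename extension each suggest.

    `head` is the first chunk of the file (a few hundred bytes is enough here).
    Returning both answers lets a caller notice when they disagree.
    """
    if head.startswith(CDX_SIGNATURE):
        by_content = "cdx"
    elif b"<CDXML" in head:  # assumed root element name for CDXML
        by_content = "cdxml"
    else:
        by_content = None

    extension = Path(filename).suffix.lower().lstrip(".")
    by_extension = extension if extension in KNOWN_EXTENSIONS else None
    return {"by_content": by_content, "by_extension": by_extension}
```

Content-based evidence is generally the stronger signal, since an extension can be changed freely while the binary signature is written by the producing software.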

Notes

General  
History

Bethany Halford's article, Reflections on ChemDraw, describes how ChemDraw was developed through a collaboration between Stewart Rubenstein and David and Sally Evans. Cambridge Scientific Computing was launched in 1986 and eventually became CambridgeSoft Corporation. CambridgeSoft Corporation was acquired by PerkinElmer, Inc. in 2011 and became PerkinElmer Informatics. As of May 9, 2023, PerkinElmer Informatics is Revvity Signals Software.

According to ChemDraw Wizard Pierre Morieux, Ph.D., in the article, Back to School with ChemDraw, September 2022, "ChemDraw has been the software application chemists use to draw chemical structures since 1985. It has long since become the industry standard and is packed with features that make it easy to create publication-ready drawings."


Format specifications


Useful references

URLs


Last Updated: 05/14/2024