Sustainability of Digital Formats: Planning for Library of Congress Collections


ESRI Shapefile

Format Description Properties

Identification and description

Full name ESRI Shapefile
Description

The ESRI Shapefile format was developed by Esri (formerly the Environmental Systems Research Institute, Inc., abbreviated as "ESRI"), and its technical description was published in 1998. Although the format is proprietary, the intention behind publishing it was to encourage its use for interoperability among geographic information system (GIS) applications. The Shapefile format stores nontopological geometry and attribute information for spatial features in a data set. A Shapefile consists minimally of a main file, an index file, and a dBASE table.

In the main file, the geometry for a feature is stored as a shape comprising a set of vector coordinates. This main file is a direct access, variable-record-length file in which each record describes a shape with a list of its vertices. In the index file, each record contains the offset of the corresponding main file record from the beginning of the main file. Attributes are held in a dBASE format file. The dBASE table contains feature attributes with one record per feature. Attribute records in the dBASE file must be in the same order as records in the main file. Each attribute record has a one-to-one relationship with the associated shape record.
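The role of the index file can be illustrated with a minimal Python sketch. It assumes the layout given in the 1998 white paper: a 100-byte header followed by 8-byte records, each holding an offset and a content length as big-endian 32-bit integers counted in 16-bit words. The function name `parse_shx_records` is hypothetical and not part of any library.

```python
import struct

SHX_HEADER_BYTES = 100  # fixed-size header that precedes the index records
SHX_RECORD_BYTES = 8    # each record: 4-byte offset + 4-byte content length


def parse_shx_records(data: bytes) -> list[tuple[int, int]]:
    """Return (byte_offset, byte_length) pairs, one per main-file record.

    Assumption: both fields are big-endian 32-bit integers measured in
    16-bit words, so each value is doubled to convert it to bytes.
    """
    if len(data) < SHX_HEADER_BYTES or (len(data) - SHX_HEADER_BYTES) % SHX_RECORD_BYTES:
        raise ValueError("not a well-formed .shx byte string")
    body = data[SHX_HEADER_BYTES:]
    records = []
    for offset_words, length_words in struct.iter_unpack(">ii", body):
        records.append((offset_words * 2, length_words * 2))
    return records
```

Under these assumptions, the first record of any main file would be reported at byte offset 100, immediately after the main file's own 100-byte header. A reader can then seek directly to the n-th shape without scanning the variable-length records that precede it.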

The shapefile format can support point, line, and area features. Area features are represented as closed loop, double-digitized polygons.

Instances of the Shapefile format have often been used as a data exchange format from Esri formats to non-Esri applications. The format is most useful for writing simple features and attributes quickly as there are limitations inherent in the Shapefile format related to both geometry and attributes. As outlined elsewhere in this description, these limitations may cause loss of data when using shapefiles to contain or exchange complex geometry or attributes. The Shapefile format may be used as an intermediary between data creation applications and more functionally capable GIS formats and applications, albeit with the limitations noted in the Dataset/Normal Dataset section.

The cluster of files is typically stored in the same file directory or project workspace, with all component files having the same filename (prefix) and identified by individual file extension (suffixes). Three components are mandatory: a main file that contains the feature geometry (.shp), an index file that stores the index of the feature geometry (.shx), and a dBASE table (.dbf) that stores the attribute information of features. A comprehensive list of component files follows:

  • shp -- Main file (mandatory); a direct access, variable-record-length file in which each record describes a shape with a list of its vertices.
  • shx -- Index file (mandatory). In the index file, each record contains the offset of the corresponding main file record from the beginning of the main file. The index file (.shx) contains a 100-byte header followed by 8-byte, fixed-length records.
  • dbf -- dBASE Table file (mandatory); a constrained form of DBF that contains feature attributes with one record per feature. The one-to-one relationship between geometry and attributes is based on record number. Attribute records in the dBASE file must be in the same order as records in the main file.
  • sbn -- Part 1 of spatial index for read-write instances of the Shapefile format. If present, essential for correct processing.
  • sbx -- Part 2 of spatial index for read-write instances of the Shapefile format. If present, essential for correct processing.
  • atx -- Created by ArcView 3.x for each instance of the Shapefile format or dBASE attribute index created in ArcCatalog. ArcView GIS 3.x attribute indexes for shapefiles and dBASE files are not used by later versions of ArcGIS as a new attribute indexing model has been developed for shapefiles and dBASE files.
  • fbn -- One of the files that store the spatial index of the features for instances of the Shapefile format that are read-only.
  • fbx -- The other file (besides .fbn) that stores the spatial index of the features for instances of the Shapefile format that are read-only.
  • ain -- One of the files that stores the attribute index of the active fields in a table or a theme's attribute table.
  • aih -- The other file (besides .ain) that stores the attribute index of the active fields in a table or a theme's attribute table.
  • ixs -- Geocoding index for read-write shapefiles. If present, essential for correct processing.
  • mxs -- Geocoding index for read-write shapefiles (ODB format).
  • prj -- Projections Definition file; stores coordinate system information.
  • xml -- Contains metadata, as used by ArcGIS.
  • cpg -- An optional file that can be used to specify the codepage for identifying the character set to be used.

See Notes for more information about filenames and contents.
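As a rough illustration of the shared-prefix rule, the sketch below checks whether the three mandatory components sit beside a given .shp path. The helper name `missing_components` is hypothetical, and the check assumes lower-case extensions, as the Notes describe for case-sensitive file systems.

```python
from pathlib import Path

# The three components the description lists as mandatory.
MANDATORY_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_components(shp_path: str) -> list[str]:
    """List the mandatory extensions that have no file next to shp_path.

    Every component is expected to share the base filename (prefix) of
    the main file and differ only in its extension.
    """
    base = Path(shp_path)
    return [
        ext for ext in MANDATORY_EXTENSIONS
        if not base.with_suffix(ext).is_file()
    ]
```

An empty result means the minimal cluster is present. The check says nothing about whether the files are internally consistent with one another.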

Production phase The Shapefile format is open and popular for data transfer. It serves as an initial-state format for map and shape digitization output, is employed as a middle-state format by many programs and publishers, and is used for data transfer between GIS applications. Shapefiles can be created by exporting any data source to a shapefile, digitizing shapes directly, using programming software, or writing a program that follows the shapefile specification.
Relationship to other formats
    Must have component Main file (.shp) and index file (.shx), not described separately on this website.
    Must have component Shape_DBF, dBASE Table for ESRI Shapefile (DBF)
    May have component Optional component files include files with the following extensions: .sbn; .sbx; .atx; .fbn; .fbx; .ain; .aih; .ixs; .mxs; .prj; .xml; .cpg. None of the formats for these files are described separately on this website.

Local use

LC experience or existing holdings The Library of Congress has acquired geospatial data in Shapefile format for its collections and to produce maps to support service to the U.S. Congress and to illustrate the scope of collections. Where the Library once acquired ongoing map sets on paper, many are now acquired digitally. For example, a map acquisition may include both a GeoTIFF created by scanning a paper map and vectorizations of the original as ESRI_shape or GeoDB_file format. From various sources, including archived web pages, over 130,000 files with .shp extension were found in its digital collection storage in May 2020.
LC preference For works acquired for its collections, the Library of Congress Recommended Format Specifications for Geographic Information System (GIS) - Vector Data, indicates that the Shapefile format, ESRI_Shape, is a preferred format for GIS vector data. Since ESRI_Shape is supported by widely adopted geospatial information systems and by well supported open source software libraries it is also a preferred format for geospatial datasets. See the Library of Congress Recommended Formats Statement for Datasets. It is also a preferred format for 2D and 3D Computer Aided Design vector images.

Sustainability factors

Disclosure Fully documented. Developed and regulated by Esri (formerly the Environmental Systems Research Institute, Inc., abbreviated as "ESRI") as an open specification for data interoperability among Esri and other software products.
    Documentation ESRI Shapefile Technical Description: An ESRI White Paper—July 1998
Adoption

During the 1990s, Esri introduced the Shapefile format and it soon became a de facto standard. The format is still widely deployed today, although the limitations outlined in the Quality and functionality factors and Notes within this description have led many users to move to geospatial database formats. See GeoDB and GeoPackage.

According to Shapefiles in the online course "CASA0005 Geographic Information Systems and Science" from University College London, "Perhaps the most commonly used GIS data format is the shapefile. Shapefiles were developed by ESRI, one of the first and now certainly the largest commercial GIS company in the world. Despite being developed by a commercial company, they are mostly an open format and can be used (read and written) by a host of GIS Software applications."

Essentially all GIS applications can view, use, or manipulate data in the Shapefile format. In early 2020, examples of mainstream geospatial software applications supporting the Shapefile format, with links to lists of supported formats, include: Esri ArcGIS and other Esri products, such as CityEngine; Global Mapper; MapInfo Professional (now from Pitney Bowes); LuciadFusion (from Hexagon Geospatial). The open-source GIS application QGIS can import and export Shapefiles. Data streams, such as those from global positioning system (GPS) receivers, can also be stored in the Shapefile format. Shapefiles can be imported into Google Earth Pro, OpenStreetMap, and AutoCAD Map 3D. There are also several software libraries, in a variety of programming languages, that support its use. In particular, the Open Source Geospatial Foundation's Geospatial Data Abstraction Library (GDAL) supports the Shapefile format, and Safe Software's FME Desktop for integrating and transforming spatial data supports read and write of the Shapefile format on Windows, Linux, and macOS operating systems. See FME Technical Specifications for FME Desktop and FME Server.

A number of U.S. government agencies have distributed data in Shapefile format, including the U.S. Geological Survey (USGS), the U.S. Census Bureau, the National Oceanic and Atmospheric Administration, the Environmental Protection Agency, and the interagency National Atlas of the United States project led by USGS. See Notes for more detail on the Shapefile format data available from these agencies.

Among the archival institutions that list the Shapefile format as a preferred or acceptable format are: the U.S. National Archives; the UK Data Service; and the Data Archiving and Networked Services for the Netherlands.

Selected examples of research datasets made available for re-use in the Shapefile format are: United States District Court Boundary Shapefiles (1900-2000) from Open ICPSR; GIS files from the National Historical Geographic Information System; and ParkServe data from The Trust for Public Land.

    Licensing and patents The original specification, ESRI Shapefile Technical Description: An ESRI White Paper—July 1998, states, "This document also provides all the technical information necessary for writing a computer program to create shapefiles without the use of ESRI® software for organizations that want to write their own data translators." Esri clearly encouraged others to write software to use the format and not only to use Esri applications. Esri | Master Agreements; Products and Services Terms of Use detail the terms of use for Esri GIS software.
Transparency Computer programs can be created to read or write the Shapefile format using the technical specification in ESRI Shapefile Technical Description: An ESRI White Paper—July 1998.
Self-documentation GIS metadata documenting important characteristics of the resource found in the Shapefile format such as bounding coordinates, datum, etc. may be included as a .xml file within the file group.
External dependencies No concerns
Technical protection considerations No concerns

Quality and functionality factors

Dataset
Normal functionality

The ESRI Shapefile format is a special-purpose dataset for storing nontopological geometry and attribute information for the spatial features in a data set. Its component Shape_DBF file uses a constrained form of the dBASE File Format (DBF) to store feature attributes using a limited set of data types.

Some relevant considerations were outlined in the Esri Help explanation "Geoprocessing Considerations for Shapefile Output" (as of 2009 and ArcGIS 9.3), including the idea that the relative simplicity of the Shapefile format's structure means that data may be lost if the format is used to transfer complex geometry and attributes. The document also notes that the format's attributes cannot contain null values, and that numeric values are stored as characters rather than in binary, which leads to rounding errors for numbers containing decimal places, i.e., real numbers. The format also lacks good support for Unicode character strings, thus limiting the use of non-English languages, and does not allow field names longer than ten characters. The format cannot store both a date and a time in the same date field, and cannot support spatial domains or subtypes. In terms of geometry limitations, instances of the Shapefile format have a 2 GB size limit for any of the component files, and a given instance may take three to five times as much space as file geodatabases. The Shapefile format does not contain an XY tolerance (the minimum distance between coordinates before they are considered equal), thus impacting the precision with which comparisons between features can be calculated. Since circular arc curves are not supported in the format, existing circular arc curves will be transformed to simple line features with closely spaced vertices rather than stored as true arcs.
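Two of the attribute limitations above, character-based numeric storage and the ten-character field name limit, can be illustrated with a small sketch. The field width of 8 and the 3 decimal places are hypothetical values chosen for the example, and the helper names are not part of any library; real writers may pad, round, or rename fields differently.

```python
def encode_numeric_field(value: float, width: int, decimals: int) -> str:
    """Format a number as right-aligned text, as a character-based field would hold it."""
    text = f"{value:.{decimals}f}"
    if len(text) > width:
        raise ValueError(f"{text!r} does not fit in {width} characters")
    return text.rjust(width)


def truncate_field_name(name: str, limit: int = 10) -> str:
    """Cut a field name down to the ten-character limit."""
    return name[:limit]


# Round trip: digits beyond the declared decimal count are lost.
stored = encode_numeric_field(3.14159265358979, width=8, decimals=3)
restored = float(stored)
```

Here `stored` holds the text `   3.142` and `restored` is 3.142, so the remaining digits of the original value cannot be recovered. Likewise, `truncate_field_name("population_2020")` yields `population`, which would collide with any other field sharing the same first ten characters.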

As of 2020, with increasing size of datasets and more GIS use and analysis by beginners and non-specialists, the shortcomings of the Shapefile format are increasingly significant. Alternatives include the openly specified OGC GeoPackage and the proprietary ESRI File Geodatabase. Examples of advocacy for stopping use of Shapefile, with lists of problems, include Switch from Shapefile and Why you should use GeoPackage instead of Shapefile.

Support for software interfaces (APIs, etc.) There are many non-Esri applications that can view, use and output instances of the Shapefile format, although the instances that are output can easily be corrupted, and may not be properly formatted. Information about how to create data in the Shapefile format can be found within the ESRI Shapefile Technical Description: An ESRI White Paper—July 1998. Links to a free C library for reading and/or writing the Shapefile format, and an Open Source (MIT License) Python library for reading/writing in the format can be found in the Useful References section.
Data documentation (quality, provenance, etc.) The minimal requirements for an instance of the Shapefile format do not specify a place for the documentation of data quality or provenance.
GIS images and datasets
Normal functionality

The minimal structure for the Shapefile format (i.e., the required .shp, .shx, .dbf files in the cluster) facilitates georeferencing to the extent that "auxiliary" files are also clustered in the same directory structure, including a .prj file for projection information, and a .txt or .xml file for metadata. If the metadata record for a given Shapefile format instance includes coordinates, datum, and scale, the location for the features represented by the instance of the format can be accurately and precisely determined.

The Shapefile format handles single features that overlap or that are noncontiguous. The format can support point, line, and area features. Area features are represented as closed loop, double-digitized polygons.
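The "closed loop" property of area features can be expressed as a simple check. This sketch assumes a ring is given as a list of (x, y) tuples and, following the white paper's definition as recalled here, that a ring needs at least four points with the first and last coinciding. The name `is_closed_ring` is hypothetical.

```python
Point = tuple[float, float]


def is_closed_ring(ring: list[Point]) -> bool:
    """True if the ring has at least four points and returns to its start."""
    return len(ring) >= 4 and ring[0] == ring[-1]
```

A triangle therefore needs four stored vertices: its three corners plus a repeat of the first one.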

Because the format does not have the processing overhead of a topological data structure, it typically requires less disk space and is easier to read and write. It has advantages over some more complex geospatial data formats such as faster drawing speed and editability. However, Shapefiles do not have a spatial domain, which defines the geographic extent that all coordinates must fall within. This spatial extent is useful when editing geometry since it prevents you from entering coordinates outside the extent. See Normal Dataset and Notes for further limitations.

Support for GIS metadata When .txt or .xml files are included within the Shapefile cluster's directory, they are usually intended as metadata for the data contained within the other files that comprise the shapefile. No assumptions are made about the completeness or accuracy of the metadata, nor is any particular content standard presumed.
Support for grids Instances of the Shapefile format are ready for grid analysis by virtue of the component Shape_DBF file, essentially a relational database table. This table contains the characteristics describing geographic features that are available for viewing and/or for simple grid analysis. The extent to which mathematical and statistical calculations can be performed against the data in the table is dependent upon the data structure built into the dataset layers comprising the shapefile. Often, the tabular data found in an instance of a Shapefile format are joined or related to other tabular data to support more complex analysis.

File type signifiers and format identifiers

Tag Value Note
Filename extension shp
Known as the main file. One of three mandatory files in a Shapefile format cluster, stored in the same project workspace, typically a file folder. The .shp file stores the feature geometry and shares a base filename (prefix) with the index and the Shape_DBF file. See Notes for more information on filenaming.
Magic numbers Hex: 00 00 27 0A 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ASCII: '
From the FileExtension Source
Filename extension shx
The index file stores the index of the feature geometry. One of the three mandatory files included in a Shapefile format cluster. In the index file, each record contains the offset of the corresponding main file record from the beginning of the main file (with extension .shp). The index file (.shx) contains a 100-byte header followed by 8-byte, fixed-length records. The shx file must have the same base filename as all other files included in the Shapefile format cluster.
Filename extension dbf
For dBASE Table file. One of the three mandatory files included in a Shapefile format cluster. The dBASE table contains feature attributes with one record per feature. The one-to-one relationship between geometry and attributes is based on record number. Attribute records in the dBASE file must be in the same order as records in the main file. The dBASE file must have the same base filename (prefix) as all other files included in the Shapefile format cluster.
Filename extension sbn
One of two files that stores the spatial index of the features, i.e., Part 1 of the spatial index for read-write instances of the Shapefile format.
  • If present, essential for correct processing.
  • Must have the same base filename as all other files included in the Shapefile format cluster.
Filename extension sbx
One of two files that stores the spatial index of the features, i.e., Part 2 of the spatial index for read-write instances of the Shapefile format.
  • If present, essential for correct processing.
  • Must have the same base filename as all other files included in the Shapefile format cluster.
Filename extension atx
Created by ArcView 3.x for each shapefile or dBASE attribute index created in ArcCatalog. Not used by later versions of ArcGIS.
  • Associated file; if present, essential for correct processing.
  • Must have the same base filename as all other files included in the Shapefile format cluster.
Filename extension fbn
One of two files that stores the spatial index of the features for shapefiles that are read-only along with .fbx files.
  • Must have the same base filename as all other files included in the Shapefile format cluster.
Filename extension fbx
One of two files that stores the spatial index of the features for instances of the Shapefile format that are read-only along with .fbn files.
  • Must have the same base filename as all other files included in the Shapefile format cluster.
Filename extension ain
One of two files that stores the attribute index of the active fields in a table or a theme's attribute table along with .aih files.
  • Must have the same base filename as all other files included in the Shapefile format cluster.
Filename extension aih
One of two files that stores the attribute index of the active fields in a table or a theme's attribute table along with .ain files.
  • Must have the same base filename as all other files included in the Shapefile format cluster.
Filename extension ixs
Geocoding index for read-write shapefiles.
  • Associated file; if present, essential for correct processing.
  • Must have the same base filename (prefix) as all other files included in the Shapefile format cluster.
Filename extension mxs
Geocoding index for read-write shapefiles (ODB format).
  • Must have the same base filename (prefix) as all other files included in the Shapefile format cluster.
Filename extension prj
For Projections Definitions file.
  • Must have the same base filename (prefix) as all other files included in the Shapefile format cluster.
Filename extension xml
Stores information (metadata) about the shapefile. In ArcGIS, the metadata file is often called metadata.xml and must be stored in the same file directory or project workspace as the rest of the component files in the Shapefile format cluster in order to be used by ArcGIS applications.
Filename extension cpg
Specifies the codepage for identifying the character set to be used.
  • Must have the same base filename (prefix) as all other files included in the Shapefile format cluster.
Pronom PUID x-fmt/235
See http://www.nationalarchives.gov.uk/PRONOM/x-fmt/235.
Wikidata Title ID Q278934
See https://www.wikidata.org/wiki/Q278934.
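The magic number listed in the table above for the .shp file can be tested with a few lines of Python. This is a sketch only: it compares the first 24 bytes against the listed signature, whose first four bytes are the big-endian integer 9994. The function name `looks_like_shapefile` is hypothetical, and a match does not show that the rest of the file is valid.

```python
# Signature as listed above: 00 00 27 0A followed by twenty zero bytes.
SHAPEFILE_MAGIC = bytes.fromhex("0000270A") + bytes(20)


def looks_like_shapefile(header: bytes) -> bool:
    """True if the leading 24 bytes match the listed magic number."""
    return header[:24] == SHAPEFILE_MAGIC


def file_code(header: bytes) -> int:
    """Read the first four bytes as a big-endian integer."""
    return int.from_bytes(header[:4], "big")
```

A caller would typically read the first 100 bytes of a candidate file and pass them to `looks_like_shapefile` before attempting to parse any records.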

Notes

General

All file names in a Shapefile format cluster adhere to the 8.3 naming convention. The main file, the index file, and the dBASE file have the same base filename (prefix), which must start with an alphanumeric character (a–Z, 0–9), followed by zero or up to seven characters (a–Z, 0–9, _, -). All letters in a file name are in lower case on operating systems with case sensitive file names.
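The base filename rule above maps naturally onto a regular expression. The sketch below is one reading of that rule: it accepts upper- and lower-case letters, although lower case is expected on case-sensitive file systems. The names `BASE_NAME_PATTERN` and `is_valid_base_name` are hypothetical.

```python
import re

# First character alphanumeric, then zero to seven of: letters, digits, "_" or "-".
BASE_NAME_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,7}")


def is_valid_base_name(name: str) -> bool:
    """True if name fits the eight-character prefix rule of the 8.3 convention."""
    return BASE_NAME_PATTERN.fullmatch(name) is not None
```

For example, `roads` and `tl-2020x` pass, while `_roads` (leading underscore) and `roads_2020` (ten characters) do not. Many current tools accept longer names, so a failure here signals a departure from the published convention rather than an unreadable file.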

The Shapefile format stores integer and double-precision numbers. The ESRI Shapefile Technical Description refers to the following types:

  • Integer: Signed 32-bit integer (4 bytes)
  • Double: Signed 64-bit IEEE double-precision floating point number (8 bytes)
  • Floating point numbers must be numeric values.

Positive infinity, negative infinity, and Not-a-Number (NaN) values are not allowed in the format. Nevertheless, the format supports the concept of "no data" values, although they are currently used only for measures. Any floating point number smaller than –10^38 is considered by a Shapefile reader to represent a "no data" value.
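The numeric rules above can be summarized in a short sketch. It assumes the "no data" threshold is −10^38, as stated in the 1998 white paper, and the function names are hypothetical.

```python
import math

NO_DATA_THRESHOLD = -1e38  # assumed threshold: anything smaller means "no data"


def is_no_data(value: float) -> bool:
    """True if a reader following the rule above would treat value as 'no data'."""
    return value < NO_DATA_THRESHOLD


def is_storable(value: float) -> bool:
    """True if value is finite, i.e. neither NaN nor an infinity."""
    return math.isfinite(value)
```

A writer could call `is_storable` before emitting a coordinate, and a reader could call `is_no_data` on each measure value before using it in a calculation.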

The functionality associated with the Shapefile format is constrained by the rules associated with the building and display of points, polylines, and polygons. Limitations are also imposed by the use of the dBASE component file, with its field types and character width restrictions and its support for only ANSI characters in field names and values. The number of fields within an attribute table is limited to 255, and there is little support for SQL functions other than that provided by use of WHERE clauses. Feature class subtyping, assignment of attribute domains, geometric networks, topologies, and annotations are not supported by shapefiles, thus limiting shapefiles more or less to normal GIS functionality.

The Shapefile format can be useful as a middle state when exporting data for use in a non-Esri software application, or for exporting data to use in ArcView 3 or ArcInfo Workstation. The Shapefile format can be used to write simple features and attributes quickly, such as for ArcGIS Server geoprocessing services. But as is outlined in the Esri Help explanation Geoprocessing Considerations for Shapefile Output (from 2009, ArcGIS 9.3), the format does not handle the full life cycle of data creation, editing, versioning, and archiving, thus inhibiting its use in modern life-cycle, active database management.

Listed here are examples of the adoption of the Shapefile format by U.S. government agencies. Since government websites are reorganized and redesigned and use of formats changes over time, many of the links are via the Internet Archive:

Shapefiles are often distributed as compressed packages that combine the related files and reduce download time. The USGS Digital Data Viewer dlgv32 Pro, a limited-feature version of the commercial software Global Mapper, can directly load Shapefiles distributed as compressed .tar.gz files.

History

Esri introduced the Shapefile format as a part of ArcView GIS version 2 during the 1990s. The format was welcome because interest in simple geometric structures had grown during the 1990s as disk storage and hardware costs decreased and computational speed increased. At the same time, existing geographic information system (GIS) datasets were more readily available, and the work of GIS users was evolving from primarily data compilation activities to include data use, analysis, and data sharing. Shapefiles could be easily created from many GIS systems and, over time, shapefiles were widely adopted as a de facto standard.


Format specifications


Useful references

URLs


Last Updated: 07/08/2021