EAD Application Guidelines for Version 1.0


Chapter 4: Authoring EAD Documents

4.1. The Authoring Process
4.1.1. Select Authoring Software
4.1.2. Obtain the EAD DTD
4.1.3. Encode the Finding Aid
4.1.4. Validate the EAD Document
4.2. Options for Authoring Software
4.2.1. Text Editors and Word Processors
4.2.2. Native SGML/XML Editors
4.2.2.1. Linear Editors
4.2.2.2. Tree-structured Editors
4.2.3. Text Converters
4.2.4. Databases
4.3. Technical Issues in Authoring
4.3.1. Structure of the EAD DTD
4.3.2. SGML versus XML
4.3.2.1. Changes in the DTD Files
4.3.2.2. Case Sensitivity
4.3.2.3. The XML Declaration
4.3.2.4. Empty Elements
4.3.3. Parsing
4.3.4. Sharing Data Between MARC Records and EAD Finding Aids
4.3.5. Effects of Various Features on Output
4.3.5.1. Whitespace
4.3.5.2. Punctuation
4.3.5.3. Headings
4.3.5.4. Tabular Display
4.4. Document Control
4.4.1. Encoding Consistency
4.4.2. DTD Conformance
4.4.3. File Names and Locations
4.4.4. Version Control
4.4.5. Security


4.1. The Authoring Process

The process of applying SGML markup to documents, known as "authoring," involves several steps. These steps are summarized below and are described in greater detail in later sections of chapter 4 or elsewhere in these Guidelines.

4.1.1. Select Authoring Software

A variety of software choices are available for authoring EAD-encoded finding aids, which means that you must first select a particular application to use. This selection process involves careful evaluation of features, cost, appropriateness for your institution's technical environment, and suitability of the tool for creating new finding aids or for converting those already existing in machine readable form. Available software options are described in section 4.2.

4.1.2. Obtain the EAD DTD

Whichever software application you use, a copy of the EAD DTD will be required at some point in the authoring process. Some application vendors include a copy of the DTD with their software, or it may be obtained from the Encoded Archival Description Official Web Site at the Library of Congress.(72) Some SGML/XML authoring tools convert the DTD into an internal proprietary structure for more efficient manipulation within the particular product; examples include the rules files for Author/Editor and logic files for WordPerfect. Each software manufacturer supplies the software needed to convert the DTD into its own binary format; the necessary files for these applications are also available at the EAD Help Pages Web site(73) and may be downloaded in lieu of the DTD.

4.1.3. Encode the Finding Aid

A variety of methods may be employed to mark up a finding aid, depending on existing institutional workflows and software, staffing, cost considerations, and technical sophistication. The status of each particular document also may affect the techniques used: for example, whether the document being encoded is a newly-created finding aid, an existing finding aid that is not yet in machine readable form, or an existing electronic finding aid. Relevant administrative issues are reviewed in section 2.5, and detailed encoding instructions are in chapter 3.

4.1.4. Validate the EAD Document

To ensure proper processing and data interchange, the encoded document must be checked against the specifications of the DTD to confirm that the markup adheres to the EAD standard. This process, known as parsing or validation, may be accomplished either by a parser that is built into the authoring software or by a separate program. Parsing is discussed in greater detail in section 4.3.3 and section 6.2.4.

4.2. Options for Authoring Software

A variety of software applications may be used to create EAD documents. This section describes four broad categories of authoring tools by type of application: text editors and word processors, native SGML/XML editors, text converters, and databases.(74)

The overview of each type explains how that type of software generally functions, cites representative products available at the time these Guidelines were written, and characterizes the general advantages and disadvantages of each method. This marketplace currently is very dynamic, with the emergence of XML and XML-compliant software as a potentially significant force in a variety of environments, including office productivity tools, relational database managers, and electronic commerce.(75)

Additional steps will be required to produce an attractive printed version of the finding aid for public use, since the document will include all of the EAD tagging but will lack any formatting or other presentational directions. Several options are available, including stylesheets written in formatting languages such as XSL and DSSSL, which can specify the layout of print copies generated from EAD finding aids. See section 5.3.3 for information on stylesheets and section 5.3.4 for a discussion of print output.

4.2.1. Text Editors and Word Processors

Because an SGML document exists as a simple text file, it is possible to key an EAD finding aid using any software that can output a document in ASCII format. This includes text editors that come with your operating system, such as the DOS Editor, Windows Notepad or the Macintosh SimpleText programs. Word processors also may be employed, but the files must be exported in ASCII format rather than in the word processor's proprietary native format.

Advantages: Low cost, ready availability, and user familiarity are the chief virtues of these products.

Disadvantages: Text editors and word processor applications have no built-in knowledge of the rules of the EAD DTD and hence no method of verifying conformance to it. You must rely, therefore, on the encoder's knowledge of EAD to ensure that data elements and attributes are correctly applied. You will have to employ a separate application to validate the document. Several are available currently as freeware, including NSGMLS and XML-specific parsers from Microsoft, IBM, and others.

When keying an EAD document using a text processor, you must be particularly careful about the use of certain characters and symbols. For example, the characters &, <, >, ", and ' have special meaning for SGML processors, as they may be either part of the text or part of the markup, and it is necessary to differentiate between the two. It is possible to include these and other "nonstandard" characters, such as letters that carry diacritics in non-English languages or other symbols not found on standard keyboards, in EAD documents by use of entity references (see section 6.5.2.1 for a discussion of character entity references). Extensive manual keying of entity strings may, however, significantly slow the authoring process. When using a word processor to create an EAD instance, particular care must be taken to ensure that an entity reference to the appropriate ISO character is inserted for such non-Latin letters and symbols instead of the proprietary escape codes that word processors typically use to display such characters. You will also have to generate the "prolog" section of the EAD instance manually (see section 6.2.3 for information about the EAD prolog).
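As an illustration of the entity references described above, the following minimal Python sketch replaces the five markup-significant characters with entity references. It assumes the entity names amp, lt, gt, quot, and apos, which are predefined in XML; an SGML system would rely on the character entity sets declared for the DTD. The function name escape_text is hypothetical.

```python
from xml.sax.saxutils import escape

# escape() handles &, <, and > on its own; the two quote characters
# are supplied as an extra mapping (entity names assumed, see above).
QUOTES = {'"': "&quot;", "'": "&apos;"}


def escape_text(text):
    """Return text with the five markup-significant characters escaped."""
    return escape(text, QUOTES)


print(escape_text('Smith & Sons <"Ledger">'))
```

A routine of this kind would be applied only to the content of the finding aid, never to text that already contains markup, since it would escape the tags themselves.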

4.2.2. Native SGML/XML Editors

Many software packages are available that are designed specifically for authoring SGML/XML documents such as EAD instances. These products may be differentiated by the operating systems for which they are available, their capacity to generate documents in SGML or XML (or both), or by their "look and feel."

With regard to look and feel, some software displays the text of the EAD instance in a linear fashion, with both the markup and the content of the finding aid appearing as a continuous block of text. In other software, markup appears in one window in the form of a tree structure that displays the hierarchical and nesting relationships of the elements, and the text of the finding aid appears in a parallel window. Software applications representative of both categories are described below, emphasizing those currently being used by archivists to create EAD documents.

4.2.2.1. Linear Editors
Author/Editor, initially developed by SoftQuad but now distributed by Interleaf, is a widely employed editor typical of this product category. It is available in both Windows and Macintosh versions, performs continuous DTD validation, includes standard cut-and-paste editing features, a spell-checker, and a thesaurus, and has a built-in macro language. It has limited capabilities, using its internal styles features, to produce print copies from an EAD document, though it can work with the QuarkXPress desktop publisher to generate finely formatted output. Author/Editor requires a DTD to be compiled into its internal "rules" file format (ead.rls, available via the EAD Help Pages (76)).

Corel's WordPerfect word processing software began including SGML authoring capabilities with version 6.0. It too requires that the DTD be converted into its internal structure as a "logic" file (ead.lgc, available via the EAD Help Pages) and also incorporates the standard text manipulation features one would expect in a word processor. Printing the finding aid in a format suitable for public use (without tags and with appropriate physical layout on the page) requires application of the "styles" feature in WordPerfect.

4.2.2.2. Tree-structured Editors
Interface Electronics offers Internet Archivist, an authoring package designed specifically for EAD. (77) The structure of the document being encoded displays in one window in a tree or hierarchical view, while the content of each element and its associated attributes appear in text boxes within the principal frame. The package features built-in conversion to HTML and simple printing capabilities.

The ADEPT Editor software from ArborText similarly displays the element structure as a tree in a secondary window and the text of the EAD document in the main window. Like other products in this category, it offers a full range of text processing features and has probably the most complete set of specialized functions for SGML authoring of any available tool. The format of both the screen view and the print output of the SGML instance is governed by two separate stylesheets written according to the FOSI specification (see section 5.3.1 and section 5.3.3 for a discussion of stylesheet languages). ADEPT Editor is available for most operating systems, including Windows, Macintosh, UNIX, and OS/2. It was among the first commercial authoring packages to incorporate XML functionality.

Similar products in this group include Adobe's FrameMaker + SGML and Vervet's XML Pro. As XML enters the commercial marketplace, software companies such as SoftQuad (with XMetaL) and Macromedia (with Dreamweaver 2) are adding XML editing capabilities to their HTML editors.

Advantages: Both types of native editors (linear and tree structured) have many useful features that make them an attractive option. The software "knows" SGML in general and the DTD being used in particular. By directly incorporating the DTD, the software can provide continuous validation of a document during the authoring process. These particular applications include many of the features that you would expect in a full-featured word processor, including a spelling checker, thesaurus, macros, internal styles governing the display of text, and templates. They also will manage entities and generate the document's prolog.

Although native editors assume that users have a general knowledge of the structure and application of a particular DTD, the software's prompts and pull-down menus aid the user in the selection of elements and the assigning of attributes. They also help encoders to insert and manage character and file entities. In a sense, the effort is done "up front" during the initial data entry phase. Once the document is finished, no further work is required, with the possible exception of printing a user-friendly version of the inventory.

Disadvantages: Some knowledge of the DTD is required of the encoder, though mastering the software itself generally is no more complex than for any typical office computer application. You probably will have to learn on your own, however, since local training centers are not likely to feature courses in such specialized tools. In addition, the cost of software may be a factor with some products. All of these applications are priced as specialized rather than commodity products, with prices beginning around $450, though Corel does offer significant educational discounts for WordPerfect.

Generating finely formatted print copies often involves additional steps and skills and sometimes additional software. Native editors are best suited to the keying of new or existing inventories not already in electronic form rather than for conversion of existing electronic files. This is because using such editors to encode existing machine-readable texts requires much cutting and pasting of the file after it has been opened in the editor and therefore may actually prove more time consuming (and therefore more costly) than simply rekeying the text.

4.2.3. Text Converters

Text conversion software transforms existing machine-readable text from its original format into an encoded document that conforms to a particular DTD. This category includes tools specific to source documents in a particular file format, such as Microsoft Word Rich Text Format (RTF), or a particular DTD structure; generalized and special purpose SGML-aware programming languages; and word processing macro languages.

Conversion always proceeds from the premise that there is information available in the source document that permits the conversion software to map equivalencies between text or codes in the original document and comparable EAD elements. Such information may include physical formatting data such as punctuation, capitalization, tabbing and indention; word processing styles; or other markup codes such as elements from another SGML DTD or from MARC tags. In general, the greater the consistency in the application of these clues in the source document, the more reliable and complete the conversion. Do not presume that these techniques can be applied successfully to any and all existing electronic texts, absent such consistent conversion "hooks."

Microsoft's SGML Author for Word is a generalized tool for converting documents created in the Windows version of Word into SGML documents. It accomplishes the conversion by using Word styles and templates features. You create styles, one corresponding to each EAD element, in a Word template for your documents, and then you formally create a link in the SGML Author software between each style and the equivalent EAD tag. This map is stored in an association file. As the finding aid is keyed into a Word document, appropriate formatting styles are applied to the text. During the conversion process, the software reads the association file and encodes the text of a particular document with the appropriate SGML tagging. TagPerfect software from the Finnish firm Delta Computers offers comparable functionality for converting Word documents into SGML.

As an alternative to these off-the-shelf solutions, you may choose to create your own program to accomplish the conversion. There are three general categories of tools for doing so, and they are distinguished by the complexity of the effort and power of the languages involved. The simplest languages to learn and apply are the internal macro languages of Word and WordPerfect, and some archival repositories have successfully used them for conversion. The macro programming language in version 8.0 of WordPerfect includes special features that address SGML-specific issues. Beginning with Word97, Microsoft has changed its macro language from WordBasic to Visual Basic for Applications. The macros that can be written using these tools can range in complexity from very simple to highly sophisticated.

A number of special purpose programs have been developed expressly for the task of converting structured text into SGML documents. These include DynaTag from INSO Corporation and Balise from AIS Software. They employ a complex programming syntax and are geared to experienced programmers. Balise is described, for example, as closely resembling the C++ language.

Another conversion option that falls somewhere between these two poles is Perl. A widely employed and well-documented programming language, Perl was designed expressly for the type of text manipulation that is required for conversion of existing finding aids to EAD. It has been used at several repositories by staff with an affinity for such technical undertakings. While one can purchase an introductory Perl manual in order to begin learning this programming language, be forewarned that it will take time to master Perl.
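The pattern-based conversion described in this section can be sketched as follows. This is only an illustration, written in Python rather than in Perl or the macro languages named above. It assumes a hypothetical source convention in which every container list line reads "Box n, Folder n: title, date"; the function name convert_line and the choice of the <c02> component level are likewise assumptions.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical source convention: "Box <n>, Folder <n>: <title>, <date>"
LINE = re.compile(r"^Box (\d+), Folder (\d+): (.+), (\d{4}(?:-\d{4})?)$")


def convert_line(line):
    """Return an EAD component for one list line, or None if it does not fit."""
    match = LINE.match(line.strip())
    if match is None:
        return None  # inconsistent line: set aside for manual review
    box, folder, title, date = match.groups()
    return (
        "<c02><did>"
        f'<container type="box">{box}</container>'
        f'<container type="folder">{folder}</container>'
        f"<unittitle>{escape(title)}</unittitle>"
        f"<unitdate>{escape(date)}</unitdate>"
        "</did></c02>"
    )


print(convert_line("Box 1, Folder 3: Correspondence, 1901-1905"))
print(convert_line("Misc. papers"))
```

Lines that do not follow the expected pattern are returned as None rather than guessed at, which reflects the point made above: the conversion is only as reliable as the consistency of the "hooks" in the source document.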

Advantages: Converters permit you to leverage existing machine-readable files and familiar software, provided that existing files are structured in a manner that will enable such transformation. By using the same software for creating inventories as you do for other office documents, you avoid the cost of a new suite of software. You also eliminate or substantially reduce the time required to learn a new application, thereby improving the likelihood of staff acceptance. There is an implicit assumption in such an ex post facto conversion scenario that the authors of documents will need only minimal, if any, knowledge of the underlying DTD structure. Also, since the original document was probably produced using a word processor, you avoid the need for additional steps to generate print copies for public use. Text conversion may be an effective approach for the encoding of legacy data already in electronic form and may also be suitable as part of your workflow for the production of new inventories.

Disadvantages: While staff costs (in terms of the overhead associated with knowing specialized software and the EAD DTD) are assumed up front during the authoring process when you use a native editor, similar overhead costs occur both before and after the fact with converters. First, source documents must be carefully formatted in advance to facilitate subsequent conversions. Development of the conversion routines themselves may involve an extended iterative design process. The conversion itself may prove more or less automatic, but manual intervention or post-conversion manipulation might be required. Some converters may be sensitive to variations occurring in source documents, either because the organizational structure of archival collections itself varies or because of changes in finding aid formats over time. Programming adjustments may be required. Careful quality control review is necessary to ensure that automated processes actually generate the desired output.

4.2.4. Databases

Some archives create and store descriptive information using off-the-shelf relational database management (DBMS) software. This approach may be particularly attractive for repositories that wish to link collections management data stored in a DBMS with the associated descriptive metadata typically found in finding aids. In this scenario, EAD encoding may be applied to text stored in the database during the process of generating output from the database system. This assumes both that the field structure of the database corresponds to EAD elements and that the software has the functional capability to generate output in the format of an EAD-compliant document. One feasible scenario may be to use a DBMS to generate container lists and perhaps series descriptions, while keying lengthier narratives such as biographical or scope and content notes in a word processor.

Eloquent Software's Gencat program is a proprietary DBMS that offers output of files in multiple formats, including EAD. Some archives have already written their own applications to export data from a database as EAD documents, but fairly advanced programming skills are required to do so. A potentially complex, yet extremely critical, issue in the design of such a database is the development of an architecture that supports the multilevel hierarchical structure of the components of an archival collection that is at the heart of EAD. Part of the challenge may lie in the fact that many archival database systems are very "flat," allowing only one or two levels of hierarchy to be expressed.
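One way to carry a multilevel hierarchy in a relational table is to give each row a pointer to its parent. The Python sketch below turns such rows into nested EAD components. The row layout, the sample data, the function name build_dsc, and the type value on <dsc> are all assumptions made for illustration, not a description of any product named above.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows: (id, parent_id, title); parent_id None marks a top level.
ROWS = [
    (1, None, "Correspondence"),
    (2, 1, "Incoming letters"),
    (3, 1, "Outgoing letters"),
    (4, None, "Financial records"),
]


def build_dsc(rows):
    """Build a <dsc> whose <c01>, <c02>, ... nesting follows parent_id."""
    dsc = ET.Element("dsc", {"type": "combined"})  # type value is an assumption
    nodes = {None: (dsc, 0)}  # id -> (element, depth)
    for row_id, parent_id, title in rows:  # parents must precede children
        parent, depth = nodes[parent_id]
        comp = ET.SubElement(parent, f"c{depth + 1:02d}")
        did = ET.SubElement(comp, "did")
        ET.SubElement(did, "unittitle").text = title
        nodes[row_id] = (comp, depth + 1)
    return dsc


print(ET.tostring(build_dsc(ROWS), encoding="unicode"))
```

A "flat" table with no parent pointer would yield only a single level of <c01> components under this approach.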

The use of databases may become more widespread and simpler to implement in the future when producers of relational databases such as Oracle, Sybase, and others build XML functionality into their products, as they have promised.

Advantages: Use of a DBMS may be advantageous for institutions with substantial investments in such applications. It may also be valuable for those needing to interchange descriptive metadata of the type found in EAD finding aids with other applications, such as collection management or records management systems.

Disadvantages: The programming required to implement such a database and export its data to EAD may require highly specialized training or skills. In addition, conversion of a "flat" database structure into EAD will fail to exploit some of EAD's power to express archival hierarchies.

4.3. Technical Issues in Authoring

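This section discusses the following technical issues as they relate to the authoring of EAD documents: the structure of the EAD DTD, SGML versus XML, parsing, sharing data between MARC records and EAD finding aids, and the effects of various features on output.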

4.3.1. Structure of the EAD DTD

The EAD Document Type Definition (DTD) is an essential component of the authoring process. As a document, the DTD is constructed according to a strict syntax specified by the SGML standard. For file management purposes, components of the EAD DTD have been divided in a modular fashion into the ead.dtd file and four other associated files that function together as a unit. Two of these (see below) are not required if the finding aid is encoded using EAD in XML mode. All five files are simple text documents in ASCII format that can be viewed and edited in text or word processing software. The five files are:

ead.dtd: This is the core EAD DTD file. It is brief, containing a version history of the DTD plus entity references that invoke the other files in the EAD suite. It also contains three conditional sections that enable or disable the following features: XML compatibility, XLink functionality, and the specialized features of EAD's array of tabular elements. The use of these features is described in section 4.3.2.1 (XML compatibility), section 4.3.5.4 (tabular layout), and section 7.2.4 (XLink functionality).

eadbase.ent: This is the largest file of the group and contains the SGML rules for EAD.

eadnotat.ent: This file contains references to the various types of notational (nontext) files that might be used within an EAD document. These include common image file formats such as GIF, JPEG, TIFF, and MPEG (see section 6.5.2.4.2 for more information on notational files).

eadchars.ent: This file contains references to the various character sets that might be used in an EAD document. All character sets are referenced by their standard ISO identifiers. This file is not required if the document is created in XML, which uses the Unicode character set (or some subset thereof) by default (see section 6.5.2.1 for more information on character sets).

eadsgml.dcl: This is the SGML declaration file, which specifies various features of the DTD that a processing application may need to know. While many DTDs utilize a standard SGML reference declaration, EAD employs its own version. Some software applications incorporate the text of the declaration at the beginning of each SGML instance. All XML documents employ a default declaration and so do not require the use of this file.

4.3.2. SGML versus XML

EAD is written so that it can be made to conform to the specifications of either SGML or XML. The form of the DTD and its associated files that is available from the EAD home page at the Library of Congress is SGML compliant. While XML may, in general, be thought of as a subset of SGML, there are four differences in XML that must be accommodated to make an EAD document XML-compliant. You must be particularly aware of these differences when converting existing SGML versions of documents into XML.
4.3.2.1. Changes in the DTD Files
If the DTD is to be used with XML applications such as validating processors, one change must first be made to the "ead.dtd" file. There is a section towards the end of the file headed "SGML EADNOTAT AND EADCHARS INCLUSION/EXCLUSION." At the end of this section, there is an entity declaration that reads "<!ENTITY % sgml 'INCLUDE' >". To "switch off" SGML compatibility and "switch on" XML compatibility, change 'INCLUDE' to 'IGNORE'.
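The change is easily made by hand in a text editor. For repositories that prefer to script it, a minimal Python sketch follows. It assumes that the line quoted above occurs exactly once in the file, allowing for variations in spacing and quote style; the function name enable_xml_mode is hypothetical.

```python
import re

# Matches the declaration quoted in the text, tolerating variations
# in spacing and in the quote character used.
SGML_SWITCH = re.compile(r"""(<!ENTITY\s+%\s+sgml\s+)(['"])INCLUDE\2""")


def enable_xml_mode(dtd_text):
    """Return the DTD text with the 'sgml' switch changed to IGNORE."""
    new_text, count = SGML_SWITCH.subn(r"\g<1>\g<2>IGNORE\g<2>", dtd_text)
    if count != 1:
        # Refuse to guess if the switch is missing or appears more than once.
        raise ValueError(f"expected one sgml declaration, found {count}")
    return new_text


print(enable_xml_mode("<!ENTITY % sgml 'INCLUDE' >"))
```

Working on a copy of "ead.dtd" rather than the original is advisable, so that the SGML-compliant form of the file remains available.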

When you make this change, observe that the explanatory note in this section of the DTD file points out that "for XML, the eadnotat.ent file should be invoked in the declaration subset of [the] individual instance." This means that the file "eadnotat.ent" must be explicitly declared as an entity in the prolog of each EAD instance that contains links to notational (nontextual) data such as graphics files (see section 6.2.3 for a general discussion of the document prolog). For XML instances, the prolog of EAD-encoded finding aids should therefore read:

	<!DOCTYPE ead PUBLIC "-//Society of American Archivists//DTD ead.dtd
	(Encoded Archival Description (EAD) Version 1.0)//EN" "ead.dtd"
	[
	<!ENTITY % eadnotat PUBLIC "-//Society of American
	Archivists//DTD eadnotat.ent (Encoded Archival Description (EAD)
	Notation Declarations Version 1.0)//EN" "eadnotat.ent">
	%eadnotat;
	]>

While it is not necessary to declare the notation file "eadnotat.ent" if the finding aid does not contain a link to notational data such as graphics files, it is probably safest to add it in all cases as a default. Note that the Uniform Resource Identifiers (URIs), in this case simple file names that refer to the "ead.dtd" and the "eadnotat.ent" files, must point to the exact physical location of these two files on your system. Their content may therefore vary from the above examples in accordance with your local storage practices for the DTD and its associated files.

4.3.2.2. Case Sensitivity
Markup in SGML is not case sensitive for element or attribute names. For compatibility with the Unicode character set specifications, however, XML markup is case sensitive for this data. The EAD DTD prescribes that element and attribute names must all be in lower case for XML compliance. Some SGML authoring or parsing software automatically writes out such names in upper case. The resulting files must be edited to change all attribute and element names to lower case if such files are intended for use in an XML-compliant system. A conversion macro written for Microsoft Word is available via the EAD Help Pages.(78)
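For those who prefer a script to a word processing macro, the lower-casing step might be sketched in Python as shown below. This is an assumption-laden illustration, not the macro mentioned above: it presumes that all attribute values are quoted and contain no ">" character, and it leaves the document type declaration untouched. The function name lowercase_markup is hypothetical.

```python
import re

# A start- or end-tag: optional slash, element name, then any attributes.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)([^<>]*)>")
# An attribute name followed by "=" and a quoted value.
ATTR_NAME = re.compile(r"""([A-Za-z][\w.-]*)(\s*=\s*(?:"[^"]*"|'[^']*'))""")


def lowercase_markup(text):
    """Lower-case element and attribute names; values and content are kept."""
    def fix_tag(match):
        slash, name, rest = match.groups()
        rest = ATTR_NAME.sub(lambda m: m.group(1).lower() + m.group(2), rest)
        return f"<{slash}{name.lower()}{rest}>"
    return TAG.sub(fix_tag, text)


print(lowercase_markup('<UNITTITLE LABEL="Title">Papers of JOHN Smith</UNITTITLE>'))
```

Because attribute values and the text of the finding aid are not altered, the result should still be parsed against the DTD afterward to confirm that nothing was missed.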
4.3.2.3. The XML Declaration
A declaration that the file is an XML document must appear at the beginning of each XML instance. It has three components, which must appear in this order: the XML version employed; optionally, the character encoding scheme utilized (such as UTF-8); and, also optionally, whether the document stands alone or depends on external declarations such as the EAD DTD. A typical XML declaration might read:

	<?xml version="1.0" encoding="UTF-8" standalone="no"?>

4.3.2.4. Empty Elements
Lastly, SGML and XML differ in the markup syntax used for elements declared to be "empty" in the DTD; that is, elements that contain neither other elements nor PCDATA. The relevant EAD elements are <lb>, <extptr>, <extptrloc>, <ptr>, and <ptrloc>. Except for <lb>, which is a formatting device, these are all linking elements that utilize attribute values to point to other locations or files. In SGML, empty elements require only a start-tag (<lb>). XML adds an additional form called the empty-element tag, which has the syntax <lb/>. In XML systems either the empty-element tag form (<lb/>) or the use of both start- and end-tags (<lb></lb>) is valid; however, the XML standard declares that the empty-element tag form must be employed "for interoperability." While the meaning of this statement is admittedly vague, it is easiest simply to use the empty-element-tag syntax (<lb/>) as your default in XML documents.
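When converting an existing SGML instance, the start-tags of the five empty elements listed above can be rewritten mechanically. The Python sketch below is one hedged way to do so; it assumes the input uses the SGML form (a start-tag with no end-tag) and that attribute values contain no ">" character. The function name to_empty_element_tags is hypothetical.

```python
import re

EMPTY_ELEMENTS = ("lb", "extptr", "extptrloc", "ptr", "ptrloc")

# A start-tag for one of the empty elements that is not already in
# empty-element form (the lookbehind rejects a "/" just before ">").
EMPTY_TAG = re.compile(
    r"<(%s)(\s[^<>]*?)?\s*(?<!/)>" % "|".join(EMPTY_ELEMENTS)
)


def to_empty_element_tags(text):
    """Rewrite SGML-style empty start-tags such as <lb> as <lb/>."""
    return EMPTY_TAG.sub(
        lambda m: f"<{m.group(1)}{m.group(2) or ''}/>", text
    )


print(to_empty_element_tags('Line one<lb>Line two <ptr target="a1">'))
```

Tags that are already in the empty-element form are left as they are, so the routine can safely be run more than once on the same file.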

4.3.3. Parsing

It is important to verify the conformity of an EAD document to the specifications of the DTD. This should be done regularly during the authoring process in order to reveal any errors made during encoding, and it also must be done once as a final step prior to publishing an encoded finding aid. This process is known as parsing and may be accomplished in several ways (see section 6.2.4 for additional technical information on parsing). Native SGML and XML editors, as well as programs like WordPerfect and FrameMaker + SGML, have built-in parsers that continuously monitor DTD compliance. There are also numerous stand-alone parsers that are freely available. James Clark's SP program is considered to be among the best for SGML; it may be configured for XML as well.(79) Currently other free XML parsers are available from IBM/Lotus, DataChannel, and Microsoft. One must observe a bit of caution with XML parsers, as a number of those currently on the market are non-validating; this means that they check only for a "well-formed" XML file rather than actually validating the file against the appropriate DTD.
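The difference between a well-formedness check and validation against the DTD can be seen in the short Python sketch below. The parser in Python's standard library is non-validating, so it illustrates only the first kind of check; the function name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as well-formed XML (no DTD check)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Well-formed, yet <unittitle> directly inside <ead> is not valid EAD.
print(is_well_formed("<ead><unittitle>Papers</unittitle></ead>"))
# Not well-formed: the end-tags are improperly nested.
print(is_well_formed("<ead><did><unittitle>Papers</did></unittitle></ead>"))
```

The first document passes this check even though it violates the EAD DTD, which is why a validating parser is still needed before a finding aid is published.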

4.3.4. Sharing Data Between MARC Records and EAD Finding Aids

Many archival repositories (particularly in the United States) that create electronic finding aids also produce a MARC catalog record for each collection described in an EAD finding aid, and operational advantages may be possible by sharing data between the two electronic files (see section 1.6 for a discussion of the descriptive relationship between catalog records and finding aids). The migration of data may flow in either direction (from finding aid to MARC record, or vice versa), depending on factors such as the relationship of the data in each, the sequence in which each is created, and institution-specific workflow. Two techniques for data exchange currently are possible.

Both the Windows and Macintosh operating systems permit the transfer of data from one application to another (via the clipboard and scrapbook respectively), and it is therefore a simple matter to cut-and-paste text between a catalog record and a finding aid document. You can simply open your catalog editing software and EAD authoring application simultaneously, in separate windows on the desktop, and transfer the information. This may be particularly useful if the existing legacy finding aid comprises only a container list and can be combined with the contents of a MARC record containing summary contextual information such as a scope and content note.

Some repositories may require a more automated approach in order to transfer large quantities of such data in batch mode. One option would be to write your own conversion program to transform the data from MARC into EAD, or vice versa. Another approach would be to use the MARC DTD developed by the Library of Congress. A simple DOS program from Logos Research(80) converts records from the MARC "transmission format" into the MARC DTD structure, and vice versa. The Library of Congress also offers two free programs, written in the Perl language, to convert records between these two formats. (81) Once a MARC record has been converted into the MARC DTD structure, you can use a transforming application to render the data from the MARC DTD syntax into the appropriate EAD syntax. Such transformations may be accomplished by various tools such as an XSL processor used in conjunction with an XSL stylesheet (see section 5.3.3.2 for a discussion of XSL and stylesheets). Among these tools is PatML, a freeware product from IBM.(82)
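For repositories that choose to write their own conversion program, the core of such a routine is a crosswalk between MARC fields and EAD elements. The Python sketch below is illustrative only: it assumes the MARC data has already been reduced to simple (tag, text) pairs, covers just four fields, and uses a hypothetical function name, marc_to_ead. It is not the MARC DTD or XSL route described above.

```python
from xml.sax.saxutils import escape

# Illustrative crosswalk: MARC tag -> EAD element (far from complete).
CROSSWALK = {
    "245": "unittitle",
    "300": "physdesc",
    "520": "scopecontent",
    "545": "bioghist",
}


def marc_to_ead(fields):
    """Return EAD fragments for (tag, text) pairs found in CROSSWALK."""
    fragments = []
    for tag, text in fields:
        element = CROSSWALK.get(tag)
        if element is None:
            continue  # no mapping defined in this sketch
        body = escape(text)
        if element in ("scopecontent", "bioghist"):
            body = f"<p>{body}</p>"  # these elements hold paragraphs
        fragments.append(
            f'<{element} encodinganalog="{tag}">{body}</{element}>'
        )
    return fragments


for fragment in marc_to_ead([("245", "John Smith papers"),
                             ("520", "Letters & diaries.")]):
    print(fragment)
```

Recording the source field in the encodinganalog attribute keeps the relationship between the two records visible, which may help if data later needs to flow back from the finding aid to the catalog record.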

Future development of the xml:namespace standard may make it possible to include information encoded in more than one DTD within a single EAD instance. As a result, we may have a third option in the future in which MARC data, in the MARC DTD structure, might be embedded directly in the EAD instance without first necessitating its transformation into EAD.

4.3.5. Effects of Various Features on Output

While EAD markup focuses on designating the content and structure of the finding aid, certain aspects of encoding may affect document output, both in online display and in printouts. Among these are whitespace, punctuation, headings, tabular displays, and certain other elements and attributes.(83) In applying the following guidelines, be aware that the need for consistency in union catalogs of finding aids may require that you make modifications to accommodate system requirements.
4.3.5.1. Whitespace
Areas within an EAD document that contain no visible text, because they consist of a blank space (such as between words), a tab, a carriage return, or a line feed, may nonetheless have meaning. Such areas are known as whitespace, which is preserved in SGML within the text of a document, though not necessarily in markup, according to a complex set of specifications. In contrast, an XML processor must pass along all characters, including whitespace, to an application, which may or may not preserve them.

We cannot necessarily anticipate the actions that future processors will take on current text. With both SGML and XML, it is therefore prudent not to attempt to format text by incorporating whitespace in your document, other than between words, but rather to manipulate display completely through a stylesheet. Two examples may help illustrate how SGML and XML handle whitespace.

Keying text in the following manner into an SGML authoring application

	<p>November 1:       The work of the Commission began ... </p>

will not ensure that rendering software such as a browser will actually display the text as follows:

	November 1:       The work of the Commission began ...

It is much more likely with current software that the six blank spaces will be compressed into one. There are certain circumstances, however, in which one must be careful to ensure that at least one space does appear between words. This is true for inline elements, especially those that might be nested. Consider the following example:

	<p>The movie, <title render="italic">Shakespeare in
	Love</title>, won the Academy Award for best picture.</p>

Without the space before the <title> start-tag and after the </title> end-tag, the text might be rendered as follows:

	The movie,Shakespeare in Love, won the Academy Award for best picture.

Where the need for spacing in a prescribed situation can be anticipated (such as when a <unitdate> always follows <unittitle>) and a universally valid style rule can therefore be applied, a stylesheet may be used to supply the whitespace required. Unfortunately, not all situations are so predictable; for example, as shown above, you may not be able to guarantee that a space will be required after every instance of <title> when it occurs within a <p>. In such cases, your markup should include a single space following the inline element. It's better to be safe than sorry! Most processing software will reduce extra whitespace to a single space, but it would be quite problematic to expect your system to supply spacing where none exists.
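
The two behaviors described above can be imitated with a short Python sketch. It assumes a hypothetical rendering step, here called `rendered_text`, that concatenates the text of a fragment and collapses every run of whitespace into a single space; actual browsers and formatters may behave differently.

```python
import xml.etree.ElementTree as ET

def rendered_text(xml_fragment):
    """Join all text in the fragment and collapse whitespace runs to one space."""
    root = ET.fromstring(xml_fragment)
    return " ".join("".join(root.itertext()).split())

# Extra spaces keyed for layout are lost.
padded = rendered_text(
    "<p>November 1:       The work of the Commission began ... </p>"
)

# A space omitted before an inline element cannot be recovered.
missing_space = rendered_text(
    '<p>The movie,<title render="italic">Shakespeare in\n'
    "Love</title>, won the Academy Award for best picture.</p>"
)

print(padded)
print(missing_space)
```

In this sketch the padded example comes out with a single space after the colon, while the second example comes out with "movie," and "Shakespeare" run together, because collapsing whitespace can remove spaces but cannot add one that was never keyed.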

4.3.5.2. Punctuation
Terminal punctuation that appears at the end of the text within an element may be keyed into the body of the document or supplied by a stylesheet. It is advisable to key in periods at the end of full sentences. Other decisions will be situational.

Punctuation within the body of a paragraph of text must be entered as data. Marks of punctuation such as colons and commas that are used between EAD elements for visual recognition or clarity, however, may be more safely supplied by a stylesheet; this approach enables global changes in such formatting to be accomplished at a later date with a minimum of effort by simple changes to the stylesheet.

The XSL style language provides the ability to reorder the sequence of elements, and such resorting of text may affect output punctuation as well. Elements that are initially keyed in a particular sequence and separated by some mark of punctuation may later be resorted into another sequence. For example, embedded punctuation in the <unittitle> element in the following markup

	<unittitle>Papers,</unittitle><unitdate>1975-1997</unitdate>

might read correctly in one case when displayed as:

	Papers, 1975-1997

but would include a superfluous mark of punctuation if this text were presented "out of line," in this manner:

	Title:   Papers,
	Dates: 1975-1997

It is therefore preferable to supply such punctuation for display purposes through your stylesheet. In some circumstances, however, such as the <persname> element in the following example, you should supply the punctuation within the markup; this is because you cannot predict whether a comma will always follow a <persname> element within a <p>. Moreover, this text is unlikely to be reordered for display. Since the text might be extracted for indexing purposes, however, it is advisable to place the second comma outside the <persname> element so that it is not inadvertently treated as a part of the name:

	<p>The author, <persname altrender="bold">Bill Smith</persname>, was born in 1912.</p>
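
The advantage of leaving connecting punctuation out of <unittitle> and <unitdate> can be sketched in Python. The two functions below stand in for two hypothetical stylesheets (their names and the "Title:" and "Dates:" labels are assumptions for illustration); each supplies its own punctuation, so the same markup displays cleanly in either arrangement.

```python
import xml.etree.ElementTree as ET

# Title and date are keyed without any connecting punctuation.
did = ET.fromstring(
    "<did><unittitle>Papers</unittitle><unitdate>1975-1997</unitdate></did>"
)

def render_inline(did):
    """Inline display: the comma is supplied at display time."""
    title = did.findtext("unittitle", default="").strip()
    date = did.findtext("unitdate", default="").strip()
    return ", ".join(part for part in (title, date) if part)

def render_labeled(did):
    """Labeled display: no comma is needed, so none is supplied."""
    return [
        "Title: " + did.findtext("unittitle", default="").strip(),
        "Dates: " + did.findtext("unitdate", default="").strip(),
    ]

print(render_inline(did))
for line in render_labeled(did):
    print(line)
```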

4.3.5.3. Headings
The <head> element, which is widely available in EAD, and the LABEL attribute, which is available only in the <did> subelements, provide two methods for incorporating text that assigns a name or header to certain sections of a finding aid.(84) The phrase "Scope and Content of the Collection" at the beginning of a <scopecontent> element is one example of such a heading. There may be long-term advantages to generating such display via a stylesheet, however, in lieu of actually encoding the heading phrase into the finding aid by using <head> or LABEL, because the stylesheet approach facilitates global modification of the heading information.

On the other hand, the content of a heading may be unique to a particular finding aid in a way that cannot be anticipated by a stylesheet or derived from other text in the document; in such a case, the use of <head> is ideal.(85)
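
The trade-off between generated and encoded headings can be sketched as follows. This is an illustrative Python stand-in for stylesheet logic, not a prescribed method: the `DEFAULT_HEADINGS` table, the `heading_for` function, the "Biographical Note" wording, and the sample finding aid text are all assumptions. An encoded <head> is used when present; otherwise a heading is generated from the element name, so changing the table changes every finding aid at once.

```python
import xml.etree.ElementTree as ET

# Hypothetical display-time defaults, keyed by EAD element name.
DEFAULT_HEADINGS = {
    "scopecontent": "Scope and Content of the Collection",
    "bioghist": "Biographical Note",  # hypothetical default wording
}

def heading_for(element):
    """Prefer an encoded <head>; otherwise fall back to a generated heading."""
    head = element.find("head")
    if head is not None and head.text and head.text.strip():
        return head.text.strip()
    return DEFAULT_HEADINGS.get(element.tag, "")

plain = ET.fromstring(
    "<scopecontent><p>Letters and diaries.</p></scopecontent>"
)
custom = ET.fromstring(
    "<scopecontent><head>About the Expedition Files</head>"
    "<p>Letters and diaries.</p></scopecontent>"
)

print(heading_for(plain))
print(heading_for(custom))
```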

4.3.5.4. Tabular Display
Finding aids often present data in a columnar or tabular format. Typical examples are the listing of biographical data by year and event, or the layout of box and folder or microfilm reel and frame numbers in container listings. Such displays may be thought of in the same terms as a spreadsheet: a series of cells containing data in a grid of rows and columns. Elements such as <date> and <event> define the data that constitutes the information in each "cell" that will be included within a tabular display. They are then wrapped up in a <chronlist>, <list>, or <table> element. It is possible for a stylesheet to formulate the desired tabular layout based on this markup by implication, rather than by having to designate explicitly that every <event> is a separate cell.
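
How a tabular layout can be derived "by implication" from this kind of markup may be easier to see in a small Python sketch. It assumes a <chronlist> whose <chronitem> children each hold one <date> and one <event>; the sample entries and the function names are hypothetical, and a stylesheet would normally do this work.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data for illustration only.
CHRONLIST = """<chronlist>
  <chronitem><date>1912</date><event>Born</event></chronitem>
  <chronitem><date>1975</date><event>Earliest dated papers</event></chronitem>
</chronlist>"""

def chron_rows(xml_text):
    """Treat each <chronitem> as a row and its <date>/<event> as cells."""
    root = ET.fromstring(xml_text)
    return [
        (
            item.findtext("date", default="").strip(),
            item.findtext("event", default="").strip(),
        )
        for item in root.iter("chronitem")
    ]

def as_text_table(rows):
    """Lay the rows out in two aligned columns."""
    width = max(len(date) for date, _ in rows)
    return "\n".join(f"{date.ljust(width)}  {event}" for date, event in rows)

rows = chron_rows(CHRONLIST)
print(as_text_table(rows))
```

No cell-level markup is needed in the source: the row and column structure follows from the element names alone.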

Within the <dsc> element, EAD includes an optional model of tabular displays that does require the deliberate specification of each cell, wrapping <drow> and <dentry> tags around them. Experience with EAD has shown, however, that effective tabular displays can be generated in the <dsc> and other areas of the finding aid by using stylesheets without the need to add this extra layer of tabular markup. Both the Cascading Style Sheets (CSS) language and the Extensible Style Language (XSL) can create tabular layout (see section 5.3.3 on stylesheet languages). Consequently, the <drow> and <dentry> tabular model is not included as a default feature of the EAD DTD, nor is it detailed in these Guidelines, though its application is documented in the Tag Library.(86)

Should you wish to invoke tabular layout, you must alter the section of the ead.dtd file headed "<!-- TABULAR DSC INCLUSION/EXCLUSION -->" in the following two ways:

4.4. Document Control

As you begin to create EAD documents, the consistency of your data and effective management of the various electronic files will quickly become significant administrative concerns. The issues fall into five major categories:

4.4.1. Encoding Consistency

In addition to mastering the guidance offered in the EAD Tag Library and these Guidelines, you should develop local "best practice" guidelines to record your decisions regarding the EAD elements and attributes to be used, their relationships to each other, the order in which they will be presented to structure a complete finding aid, depth of tagging of data such as proper names, and other such determinations. Regardless of the software you are using to encode your finding aids, the use of templates will help to enforce consistency.

4.4.2. DTD Conformance

Various types of SGML software can help enforce valid encoding by controlling how and where an encoder may apply particular elements and attributes. In addition, running a complete EAD instance through an SGML parser will determine whether the finished file constitutes valid SGML.
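
As a very limited illustration of automated checking, the Python sketch below tests whether an XML-encoded fragment is well-formed. Note the assumptions: it uses Python's standard XML parser, it applies only to XML (an SGML instance that relies on SGML-only features would be rejected), and it does not validate against the EAD DTD, so it is no substitute for a validating parser. The function name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment):
    """Return True if the fragment parses as well-formed XML."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True

good = "<did><unittitle>Papers</unittitle><unitdate>1975-1997</unitdate></did>"
# The end-tag for <unittitle> has been mistyped as a second start-tag.
bad = "<did><unittitle>Papers<unittitle><unitdate>1975-1997</unitdate></did>"

print(is_well_formed(good))
print(is_well_formed(bad))
```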

4.4.3. File Names and Locations

As your population of EAD documents grows from a few dozen to hundreds or even thousands in number, the resulting array of computer files will become a problem if not properly managed. You will have EAD document files, entity files of text and images, stylesheet files, and the EAD DTD files themselves, all of which require careful file management.

Section 5.4 discusses the effects that issues such as changing file names, file directory structure schemes, or Web site locations have on the publication process and suggests options (such as an SGML catalog, file handlers, and PURLs) for dealing with them. Good file management, however, begins during the authoring process with the systematic assignment of a standard naming protocol for files and a logical directory structure for organizing files on your computer. You also will need some type of system (a file, an index, or even a database) that tracks the names that have been used and associates them with a unique and meaningful description of the collection represented by the electronic version. This is necessary both to ensure administrative sanity and to enable the proper functioning of systems for user indexing, display, and retrieval.
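
A naming protocol and a tracking system of the kind described above might be sketched in Python as follows. Everything here is hypothetical and for illustration only: the protocol (a lowercase repository code plus a zero-padded collection number), the ".xml" extension, the sample repository code, and the `NameRegistry` class, which could just as well be a spreadsheet or a database table.

```python
import re

def make_file_name(repository_code, collection_number):
    """Hypothetical protocol: lowercase code + five-digit number + '.xml'."""
    code = re.sub(r"[^a-z0-9]", "", repository_code.lower())
    return f"{code}{collection_number:05d}.xml"

class NameRegistry:
    """Tracks which file names are in use and what each one describes."""

    def __init__(self):
        self._entries = {}

    def register(self, file_name, description):
        if file_name in self._entries:
            raise ValueError(f"file name already in use: {file_name}")
        self._entries[file_name] = description

    def describe(self, file_name):
        return self._entries[file_name]

registry = NameRegistry()
name = make_file_name("US-Ex", 42)
registry.register(name, "Bill Smith Papers, 1975-1997")
print(name, "->", registry.describe(name))
```

Refusing to register a name twice is the point of the design: it keeps each file name tied to exactly one collection description.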

4.4.4. Version Control

Long before EAD was developed, it was routine for repositories to update existing finding aids, whether to incorporate additions to collections, to reflect improved physical processing, or to upgrade descriptive information. Archivists therefore are already aware of the importance of maintaining a record of earlier versions of finding aids in order to ensure that queries based on those versions can be successfully answered. As finding aids are encoded in EAD and made more widely available, the frequency of revisions may increase, due both to ease of revision and to feedback received from users. It will therefore be important to maintain awareness among repository staff of the importance of such record keeping.

Use of the <revisiondesc> subelement within <eadheader> may be useful in this regard (see section 3.6.1.4 for additional information). Documentation of the processes that you develop for encoding will also be helpful.

4.4.5. Security

As with all important computer files, you should perform regular backups of your finding aid files, locate at least one set of backup files offsite, and maintain virus protection software on your system. In addition, you should give users only read-only access to copies of finding aid files, not read-write access to the original files.


Footnotes

  72. The Encoded Archival Description Official Web Site is available at: <//www.loc.gov/ead/>.

  73. The EAD Help Pages are available at: <http://jefferson.village.virginia.edu/ead>.

  74. Other taxonomies are certainly possible.

  75. Information about many of the software products, both commercial and freeware, and the specific EAD-related files mentioned in this chapter is provided in the EAD Help Pages, available at: <http://jefferson.village.virginia.edu/ead>. New information will be added as additional products become available. Detailed information on many products also may be found in Robin Cover's SGML/XML Web Page, available at: <http://www.oasis-open.org/cover>.

  76. The EAD Help Pages are available at: <http://jefferson.village.virginia.edu/ead>.

  77. For more information, consult the company's Web site, available at: <http://www.interface.com/ead>.

  78. The EAD Help Pages are available at: <http://jefferson.village.virginia.edu/ead>.

  79. James Clark's SP program is available at: <http://www.jclark.com>.

  80. Logos Research's MARC-related resources are available at: <http://www.logos.com/marc>.

  81. The Library of Congress' MARC-related programs are available at: <//www.loc.gov/marc/marcdtd/usermanual.html>.

  82. Information about PatML is available at: <http://alphaworks.ibm.com/tech/patml>.

  83. Section 4.3.5.3 describes how the <head> and <emph> elements and the LABEL and RENDER attributes affect display.

  84. See section 3.5.1.1 for a discussion of the difference between <head> and LABEL.

  85. Note that the same uses and limitations of stylesheets apply in similar fashion in the context of the <emph> element and the RENDER attribute.

  86. Encoded Archival Description Tag Library, Version 1.0 (Chicago: Society of American Archivists, 1998), 33-35, 108-109, 115-116.


Copyright Society of American Archivists, 1999.
All Rights Reserved.



Library of Congress Help Desk (11/01/00)