Illustrated Book Study: Digital Conversion Requirements of Printed Illustrations

Anne R. Kenney and Louis H. Sharpe II
with Barbara Berger, Rick Crowhurst, D. Michael Ott, and Allen Quirk
July 1999

Part One -- Part Two

RELIEF

Wood Engraving

INTAGLIO

Photogravure

PLANOGRAPHIC

Collotype

Report to the Library of Congress - Preservation Directorate
Contract #IN97C-22/97CLCCT7021

Contractors
Cornell University Library, Department of Preservation and Conservation, Ithaca, NY
Picture Elements, Inc., Berkeley, CA

Cover illustration: The authors intentionally chose representative eyes from the various book illustrations as a tribute to James M. Reilly, Director of the Image Permanence Institute. In his "Flowchart for Identification Guide," Care and Identification of 19th Century Prints (1996), Reilly used photomacrographs of eyes to illustrate the differences among various 19th century photographic prints.

TABLE OF CONTENTS

ABSTRACT

1.0 INTRODUCTION

2.0 PROJECT METHODOLOGY

2.1 SELECTING SAMPLE PAGES

TABLE 1. COMMON BOOK ILLUSTRATIONS OF THE 19TH AND 20TH CENTURIES

2.2 CHARACTERIZING THE ATTRIBUTES OF DIFFERENT ILLUSTRATION PROCESS TYPES AT VARIOUS LEVELS

2.3 MAPPING ILLUSTRATION PROCESS TYPES TO ELECTRONIC CONTENT TYPES

2.4 DIGITIZING SAMPLE PAGES

2.5 EVALUATING SAMPLE IMAGES

2.5.1 Structure
2.5.2 Detail
2.5.3 Essence
2.5.4 Print Evaluation

2.6 INVESTIGATING METHODS FOR AUTOMATIC DETECTION OF ILLUSTRATION CONTENT REGIONS

2.6.1 Introduction
2.6.2 Background
2.6.3 A Possible Strategy for Identifying Illustration Regions
2.6.4 Document Understanding
2.6.5 Detecting Text Regions
2.6.6 Conclusion

2.7 INVESTIGATING AUTOMATIC METHODS TO DISCRIMINATE DIFFERENT ILLUSTRATION PROCESS TYPES

2.7.1 Introduction
2.7.2 Necessary Distinctions
2.7.3 Possible Characteristics for Classifying Illustration Processes
2.7.4 The Special Case of Halftones
2.7.5 Conclusions on Discriminating Illustration Process Types

2.8 INVESTIGATING METHODS FOR PROCESSING DIFFERENT ILLUSTRATION TYPES

2.8.1 Introduction
2.8.2 Possible Image Processing Steps
2.8.3 General Concepts
2.8.4 Common Processing Steps
2.8.5 Halftone Illustration Regions
2.8.6 Soft Process Illustration Regions
2.8.7 Fine-Featured Hard Process Illustration Regions
2.8.8 Hard Process Illustration Regions

2.9 AN EXAMPLE UTILITY FOR HALFTONE PROCESSING

2.10 TESTING AND VERIFYING THE PROCESS

2.10.1 Full Resolution View
2.10.2 Derivative Creation
2.10.3 Printing
2.10.4 Other Observations
2.10.5 Compound Documents
2.10.6 Desirable Enhancements to the Utility

3.0 CONCLUSIONS

REFERENCES

APPENDIX 1. ADVISORY COMMITTEE MEMBERS

LIBRARY OF CONGRESS AFFILIATION
CORNELL UNIVERSITY AFFILIATION

APPENDIX 2. HALFTONE UTILITY USER'S MANUAL

Abstract

The Cornell University Library Department of Preservation and Conservation and Picture Elements, Incorporated undertook a joint study for the Library of Congress to determine the best means for digitizing the vast array of illustrations used in 19th and early 20th century commercial publications. This work builds on two previous studies. A Cornell study[1] characterized a given illustration type based upon its essence, detail, and structure. A Picture Elements study[2] created guidelines for deciding how a given physical content region type should be captured as an electronic content type. Using those procedures, appropriate mappings of different physical content regions (representing instances of different illustration processes) to electronic content types were created. These mappings differed based on the illustration type and on the need to preserve information at the essence, detail, or structure level. Example pages that are typical of early commercial illustrations were identified, characterized in terms of the processes used to create them (e.g., engraving, lithograph, halftone), and then scanned at high resolutions in 8-bit grayscale. Digital versions that retained evidence of information at the structure or process level were derived from those scans, which for many illustration types required high resolution to capture. A general consensus was reached, however, that 400 dpi 8-bit capture could serve to preserve the essence and detail information present in all the illustration types studied, regardless of the production process used to create the published originals. This recommendation represents a good cost-benefit requirement for imaging when process identification is not an absolute requirement and in circumstances where mass-produced books, containing both illustrations and text, are to be converted. 
Project staff investigated the available means for automatic detection of illustration content regions and methods for automatically discriminating different illustration process types, and for encoding and processing them. A public domain example utility was created and tested, which automatically detects the presence and location of a halftone region in a scan of an illustrated book page and applies special processing to it.

1.0 Introduction

In 1998, Cornell University Library's Department of Preservation and Conservation and Picture Elements, Incorporated conducted a joint study for the Library of Congress to determine the best means for digitizing the vast array of illustrations found in 19th and early 20th century commercial publications. This project was intended as the first step in the development of automated means for detecting, identifying, and treating each illustration process type in an optimal manner to create electronic images that can rival the quality of analog capture. The technology does not currently exist to do this. In fact, no thorough attempt has been made to even characterize all of the features of importance for illustrations produced by a given production process, from the point of view of high-fidelity digital image capture.

The Illustrated Book Study had a number of key objectives:

Select representative samples of relief, intaglio, and planographic illustration processes prevalent in book production in the 19th and early 20th century.
Characterize the key attributes of different illustration process types by subjective examination, identifying significant informational content for each type at three levels: essence, detail, and structure (see 2.2).
Develop appropriate mapping of illustration content types to electronic content types that preserve their essential features to an appropriate degree.
Investigate methods for automatic detection of illustration content regions.
Investigate automatic methods to discriminate different illustration process types.
Investigate methods for processing different illustration process types.
Create an example utility for halftone detection and processing.
Report project results to the Library of Congress and to the broader preservation community.

As a result of this study, recommendations on digital capture have been advanced for use in preservation reformatting of the range of book illustrations typically found in commercial publications from the past two centuries. The groundwork has also been laid for further development that could lead to fully automated processing of such illustrations to ensure high fidelity to the original. Such automated processing will be exceptionally useful during the next decade as cost-effective, high-quality production scanning will be needed to capture these materials for inclusion in electronic libraries.

2.0 Project Methodology

2.1 Selecting Sample Pages

An Advisory Committee of Cornell University and Library of Congress curators, faculty, and other experts in printmaking and the graphic arts played a critical role in the selection process (see Appendix 1 for names). Project staff consulted a number of extremely useful publications [3-6] in assembling a group of books and journals containing known illustration types from Cornell University Library's circulating collection. From this grouping, the Advisory Committee chose nine examples that represented the range of printmaking techniques prevalent in the 19th and early 20th century commercial book trade. These included a wood engraving, a halftone, steel and copper engravings, an etching, a mezzotint, a photogravure, a lithograph, and a collotype. All examples appeared in bound volumes, either as separate plates or as illustrations on a text page, and they varied in size, level of detail, and sophistication of technique. The following table characterizes the attributes of these illustration types.

Table 1. Common Types of Illustration in 19th and 20th Century Books

ILLUSTRATION TYPE	PREVALENCE	ILLUSTRATION CHARACTERISTICS	CHARACTERISTICS OF SPECIFIC EXAMPLE
RELIEF PRINTING	Most common method of this era	Raised printing surface; image created subtractively. "White" surface removed to leave "black" printing surface. Matte or glossy paper; separate plate or presented in text; little to no tonal variation.
Wood Engraving[7]	1400s-1890s; most prevalent illustration process in letterpress until introduction of halftones	Created along the end grain of wood. Technique permits finer detail than woodcut. Line width varies. Ink appears darker around edges. Illustration quality varies depending upon where it is in the press run.	Typical of 1860s school. Carefully tooled. Presented on text page. Matte paper. 0.04 mm feature size.
Halftone[8]	1880s-Present	Photo-mechanical reproduction process. Regularly spaced dots of variable sizes. Ridges of ink along dot edges. Poor reproduction of detail. Common screen rulings, 110-200.	Halftone of a painting. Presented on text page. Glossy paper. 166 screen ruling at 45 degrees.
INTAGLIO PRINTING	1400s-Present	Recessed printing surface: "black" areas removed to create grooves to hold ink. Tonal variation created by groove size and depth; separate plate or presented in text. Illustration quality depends on print run.
Steel Engraving[9]	1820s-Present	Metal removed to create lines; finer detail than wood engraving and larger print runs than copper. Lines are fine, uniform, smooth, and parallel, with crisp edges that tend to be tapered at the end; cross hatchings represent mid-tones.	Typical example; includes some etching. Presented on text page. Matte paper. 0.02-0.04 mm feature size.
Copper Engraving[10]	1700-1880s	Lines are fine, uniform, smooth, and parallel, with crisp edges that tend to be pointed at the end; cross hatchings represent mid-tones. Difficult to distinguish from steel engraving. Softer than steel; large print runs show signs of plate wear, with loss of fine lines.	Topographical scene. Separate plate; no plate mark. Matte paper, covered with protective sheet. 0.04 mm feature size.
Etching[11]	1600s-1880s	Illustration drawn with needle on wax or gelatin covered plate which is etched by dipping in acid. Lines characterized by blunt ends; width varies as result of acid dips. More free-form than engravings.	Separate plate; plate mark is present. Matte paper. Etching with dry point. 0.02-0.06 mm feature size; most 0.04 mm.
Photogravure[12]	1880s-Present	Virtually continuous tone photo-mechanical reproduction. Varied amounts of ink on page offer excellent reproduction of detail and mimic tonal variation. Extremely fine grid screen of soft, ragged dots or irregular grain, like confectioner's sugar.	Representation of photograph. Separate plate; plate mark present. Matte paper, covered with protective sheet. Under 0.01 mm feature size.
Mezzotint[13]	1780s-1870s	Plate surface is roughened to a texture of fine sand paper. Surface then burnished to produce lighter tones. Irregular sandy grain structure; occasional linear pattern detected.	Typical example. Separate plate; plate mark present. Matte paper, covered with protective sheet. 0.01 mm feature size.
PLANOGRAPHIC PRINTING	1820s-Present	Flatness of both paper and ink, no plate marks. Wide tonal appearance possible. Matte or glossy paper.
Lithograph[14]	1820s-Present	Image transferred or drawn directly on the printing surface. Drawing substance must be greasy. Irregular pebbly grain structure; appears as crayon on coarse paper.	Good example of process. Separate plate. Matte paper. 0.04 mm feature size.
Collotype[15]	1870s-1910	Virtually continuous tone photo-mechanical reproduction. Telltale irregular and fine cracks (reticulation). Process is used where accuracy of tone is important; excellent detail rendering.	Collotype of an engraving. Printed on separate paper, trimmed, and pasted into the book. Glossy paper, covered with protective sheet. 0.01 mm feature size.

2.2 Characterizing the Attributes of Different Illustration Process Types at Various Levels

Determining what information in an original artifact should be represented in a digital reproduction is a subjective decision that must be based on a solid understanding of the nature and significance of the material to be converted. Advisory Committee members characterized the key attributes of commercially reproduced versions of the different illustration processes, and assessed the significant informational content that must be conveyed by an electronic surrogate to support various research needs. They articulated the telltale characteristics of the various relief, intaglio, and planographic production processes reviewed, and their descriptions have been summarized in Table 1. Finally, the Advisory Committee was also asked to reflect on the intended uses of the sample documents in the context of their having been issued as part of larger published works rather than as individual pieces of art.

Three levels of presentation were determined:

structure: representing the process or technique used to create the original. The level required for a positive identification of the illustration type varies with the process used to create it. For instance, it is easy to make a positive identification of a woodcut or a halftone with the unaided eye. The telltale reticulation of a collotype, however, may only be observable at magnification rates above 25x.
detail: representing the smallest significant part typically observable close up or under slight magnification, e.g., two times, again a psycho-visual determination.
essence: representing what the unaided eye can detect at a normal reading distance. This view is based on the psycho-visual experience of the reader rather than any feature associated with the source document.

2.3 Mapping Illustration Process Types to Electronic Content Types

Once the various illustration process types had been characterized subjectively, project staff then sought to represent these attributes objectively, e.g., by measuring the spatial extent of the finest lines. The next step involved translating the objective measurements of the original illustrations into similar assessments that pertain directly to the electronic version.

Digital imaging is a process of representing an original document by sampling and mapping it as a grid of uniform dots or picture elements (pixels). Each pixel is assigned a tonal value and represented as a digital number. Conventional wisdom regarding full capture is to have one to two pixels span the finest feature.

Digital requirements to reflect the structure view were predicted by measuring the finest element of the various print processes, which was easy to do for those characterized by well defined, distinct edge-based features, including the engravings, the etching, and the halftone. Despite differences in their identifying characteristics, project staff measured features ranging from .02 mm to .06 mm in size, with the majority of them measuring .04 mm. Evidence of the collotype structure was found in microscopically thin reticulation lines, measuring .01 mm or finer. For those items that were continuous tone-like, exhibiting soft grainy, dotted, or pebbly structures (e.g., the photogravure, mezzotint, and lithograph), feature details were hard to characterize and measure. Feature size estimates ranged from .04 mm to below .01 mm.

Based on the feature size measurements taken at Cornell, we quickly concluded that the resolution required to faithfully represent the structural characteristics would overwhelm any scanning project involving commercially produced publications. At a minimum, the resolution needed to preserve structural evidence in the digital surrogate, calculated at one pixel/feature, ranged from 635 dpi to over 2,500 dpi.
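The arithmetic behind these figures can be sketched in a few lines of Python. This is an illustrative check, not part of the study's tooling; the function name dpi_for_feature and the pixels_per_feature parameter are assumptions introduced here, with one pixel per feature corresponding to the minimum case described above.

```python
MM_PER_INCH = 25.4

def dpi_for_feature(feature_mm, pixels_per_feature=1.0):
    """Resolution (dpi) at which `pixels_per_feature` pixels span one feature."""
    return pixels_per_feature * MM_PER_INCH / feature_mm

# Feature sizes taken from the measurements reported in the text.
for feature_mm in (0.04, 0.02, 0.01):
    one_px = dpi_for_feature(feature_mm)
    two_px = dpi_for_feature(feature_mm, pixels_per_feature=2.0)
    print(f"{feature_mm:.2f} mm feature: {one_px:.0f} dpi (1 px), {two_px:.0f} dpi (2 px)")
```

At one pixel per feature, a .04 mm feature calls for 635 dpi and a .01 mm feature for 2,540 dpi; spanning each feature with two pixels doubles those values.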

Predictions of digital requirements for the essence view were based on what a person with 20/20 vision could expect to discern under normal lighting at a reading distance of 16 inches. According to optometrists, such a person can distinguish a small letter "e" subtending 5 minutes of arc at that distance. The "e" comprises five parts, each represented by 1 minute of arc. A minute of arc equals 1/60 of a degree, or .01667 degrees. To determine the size x of the smallest feature discernible at 16 inches, the following formula is used: x/16 = tan(.01667 degrees), so x = .004656 inches. This means that a person with 20/20 vision can detect features as fine as 1/215th of an inch (118 micrometers) at a 16 inch distance. Brian Wandell makes reference to studies showing that, at high ambient light levels, the highest detectable spatial frequency is 50 to 60 cycles per degree (cpd). Since there are 60 minutes of arc in one degree of arc, this means we need two digital samples (one for the black bar of the cycle and one for the white bar of the cycle) in one minute of arc, or 120 of them in one degree of arc. This is reasonably consistent with the optometrists' metric, which is based on visual perception under normal light conditions.[16]

These human visual capabilities suggested that a reasonable digital requirement for an on screen view representing the essence of a page would be 215 dpi. Predictions of digital requirements for the detail view were pegged at 2x magnification, which would require a digital resolution of 430 dpi. The resolution required to produce a print equivalency was estimated to be higher because printing is a notorious "quality sink."[17]
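The acuity calculation and the two resolutions derived from it can be reproduced with a short Python sketch. The names below are assumptions made for illustration. The sketch uses an exact 1/60 degree rather than the rounded .01667, so the intermediate feature size differs from the figure quoted above in the last digit.

```python
import math

READING_DISTANCE_IN = 16.0   # reading distance used in the text, in inches
ARC_MINUTE_DEG = 1.0 / 60.0  # one minute of arc, in degrees

def smallest_feature_inches(distance_in=READING_DISTANCE_IN, angle_deg=ARC_MINUTE_DEG):
    """Size of a feature subtending `angle_deg` at `distance_in` inches."""
    return distance_in * math.tan(math.radians(angle_deg))

feature_in = smallest_feature_inches()
essence_dpi = 1.0 / feature_in   # one sample per smallest discernible feature
detail_dpi = 2.0 * essence_dpi   # detail view pegged at 2x magnification

print(f"smallest feature: {feature_in:.6f} in ({feature_in * 25400:.0f} micrometers)")
print(f"essence view: about {essence_dpi:.0f} dpi")
print(f"detail view:  about {detail_dpi:.0f} dpi")
```

Rounded to whole numbers, the results agree with the 215 dpi and 430 dpi figures given above.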

2.4 Digitizing Sample Pages

Each sample page was scanned at a variety of resolutions with 8-bit grayscale data captured. Grayscale data is essential to reproduce the subtleties of perceived tonality inherent in many of the illustration types. It also permits accurate representation of fully bitonal features (having little tonality) when the feature size decreases toward the size of the image sampling function. Grayscale images allow various techniques used by skilled illustration artisans to have the intended tonal effects. For example, grayscale can preserve the modulation of the acid bite in an etching or the variation of the depth of a gouge in an engraving. Grayscale further permits the production of reduced-resolution images from a high-resolution original by means of accurate scaling algorithms.

All illustrations were captured at a fixed spatial resolution of approximately 24 dots per millimeter (600 dots per inch) with an attempt made to capture the entire page that contained the illustration. These full view images were captured on a PhaseOne PowerPhase camera back having a 7,072 pixel moving tri-linear color CCD array. A Hasselblad camera body and Zeiss lenses were used, with a TG-1 filter intended to produce a photopic-like response from the array's wavelength characteristics and the tungsten lighting (using ENH-type reflector bulbs). A color balanced grayscale output was created by the PhaseOne system from the red, green, and blue inputs. For those finely inscribed or continuous tone-like illustrations, a high magnification was used to capture close-up views of their structure. These zoom images were captured on a Kodak Ektron 1400 series camera, having a 4,096 element moving linear grayscale CCD array. Nikon 35mm enlarging lenses and extension tubes were used.

2.5 Evaluating Sample Images

Project staff at Cornell evaluated these images on screen to make a preliminary assessment on resolution requirements for representing the structure of the different illustration processes. They determined that in the case of the relief printing examples (the wood engraving and the halftone) the full view images successfully represented the structure. For the intaglio and planographic illustrations, the zoom images were needed to represent structural evidence. A set of images at lower spatial resolutions (ranging from 200 dpi to 600 dpi) was created from these source images by a process of bi-cubic scaling. Project staff prepared two views of these images to make preliminary judgements regarding the essence and detail representation. View 1 presented image segments at their native resolutions in a 100% view (1:1). For the second view, the lower resolution images were resampled up to 600 dpi using bi-cubic interpolation, a scaling procedure that predicts a new value between two real pixels based on more than the immediately adjacent pixels. The resampled images allowed reviewers to assess images that were the same size on screen. The staff concluded that the 200 dpi versions represented the essence view in all cases, and that the detail view was represented somewhere between 300 and 500 dpi.

Sample images are located at: http://www.library.cornell.edu/preservation/illbk/AdCom.htm.

The Advisory Committee met several times, both in Ithaca, NY and Washington, DC, and assessed the digital surrogates at the three levels of view, comparing them to the original illustrations with and without magnification, and to printouts created from the essence and detail images.

2.5.1 Structure

In most cases, the Advisory Committee agreed with the project staff's judgement regarding structure representation, but noted that the concept could represent two meanings. The first interpreted "structure" as a view that allowed for identification of process type; the second required a view that faithfully replicated the sample under review. The resolution demands for the latter are much higher. For instance, it is easy to identify a halftone, even at relatively low resolutions. In the examples presented on the Web site, the halftone pattern is evident in the 300 dpi view. Representing the exact dot shape and ruling of the original 166 lines per inch (lpi) halftone placed at a 45 degree angle, however, required a 600 dpi - or perhaps a 900 dpi representation. The Advisory Committee also noted that it may be difficult to differentiate between similar process types even at high resolution without additional testimonial evidence conveyed by the original artifact. These include date of publication, creator's name, whether the illustration appears on a separate plate or paper stock, and whether there was evidence of a plate mark. Finally, committee members felt that process identification for the softer edged images required both close examination and a pull back view to reflect on the nature of the overall composition. For instance, identification of the lithograph process relied on assessing the crayon-like appearance of the representation as well as examination of the pebbly grain structure revealed at higher resolutions or under magnification.

In conclusion, most members of the Advisory Committee determined that digital images could provide good evidence of structure, but at the price of very high-resolution image files. A number suggested that while this might be justified for individual artwork or selective samples, this was an impractical expectation in digitizing most commercially produced monographs and journals. One member suggested that a sample of the higher resolution image (e.g., 2,000 x 2,000 pixel clip) could be produced for identification purposes when necessary.

2.5.2 Detail

Advisory Committee members generally agreed that the 400 dpi on-screen view sufficiently captured the detail present in the original when viewed close up or under slight magnification, using a magnifying glass. With two examples (the copper engraving and the etching), some committee members were divided between the 400 dpi and 500 dpi views. Both cases represented intaglio printing with characteristic hard-edged details, which seem easier to judge in terms of accurate representation than the softer featured illustrations. Nonetheless, the committee's judgement regarding the on-screen detail view was remarkably consistent, and varied little with the illustration type.

[Figure: Detail Represented at 400 dpi. Comparison panels: 600 dpi, 500 dpi, 400 dpi, 300 dpi, 200 dpi.]

Committee members agreed that 400 dpi 8-bit capture represented a good cost-benefit requirement for imaging when process identification was not an absolute requirement. The value of this approach is that it represents an assessment of close reading requirements that are based on visual perception, not on the informational content of the original materials. This is an important distinction, and suggests a uniform approach to determining conversion requirements for items that contain a broad range of illustration types or that are difficult to quantify objectively. It also represents a reasonable conversion requirement for mixed items, containing both illustrations and text. The complete work can be imaged at the same level, and the files post-processed to reflect the best presentation of the informational content: displayed on screen to support various views, printed out to meet readers' needs, or used to create an equivalent to a preservation photocopy or microfilm. Where analyzing the print process of the original source is critical to an understanding of the work, the artifact itself should be preserved.

2.5.3 Essence

There was broad consensus from the Advisory Committee on the adequacy of the 200 dpi on-screen view to represent the essence of the original. Lower resolution versions - say 70-100 dpi - will provide a fair likeness of the general image content of the original, but will not match the psycho-visual perception of the original at normal viewing distances. Some tradeoff of perception, however, may be justified in cases where the original can be viewed completely on-screen, particularly for users with lower resolution monitors. For instance, a reader could display the complete image at 200 dpi on an 800 x 600 monitor, only if the dimensions of the original illustration did not exceed 4 inches by 3 inches. At 100 dpi, the complete image could be displayed for illustrations whose dimensions did not exceed 8 inches by 6 inches. In the future, as monitor resolutions increase, the 200 dpi view may become a practical standard for presenting the essence of original graphic illustrations.
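The monitor examples follow from dividing the screen's pixel dimensions by the display resolution. A minimal Python sketch, with the hypothetical helper name max_full_view_inches and ignoring any screen area taken up by window borders or menus:

```python
def max_full_view_inches(screen_px, dpi):
    """Largest original (width, height) in inches that fits on screen at `dpi`."""
    width_px, height_px = screen_px
    return (width_px / dpi, height_px / dpi)

for dpi in (200, 100):
    width_in, height_in = max_full_view_inches((800, 600), dpi)
    print(f"{dpi} dpi on 800 x 600: up to {width_in:g} x {height_in:g} inches")
```

The same function can be used to see how larger monitors extend the range of illustrations viewable in full at 200 dpi.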

2.5.4 Print Evaluation

After the first Advisory Committee meeting in which consensus was reached regarding the essence and detail views, project staff prepared a variety of printouts at the two resolution levels for review. Prints were created on a Hewlett Packard 4MV (HP) laser printer at 600 dpi, using the printer's default dithering algorithm to translate grayscale into the bitonal halftone print. Prints were also produced on the Tektronix Phaser 440, a 300 dpi dye sublimation printer, which offers continuous tone printing rather than halftoning. Project staff created prints two ways: first at the native size of the original, and second in an enlarged mode to review the detail present in the file (e.g., simulating a 400 dpi print on a 300 dpi printer). For comparison purposes, project staff also created photocopies of the original illustrations on the 6085 Canon copier used at Cornell to produce preservation photocopies.

Comparative evaluation of the prints generated by the two printers varied depending on the process used to create the originals. The structure of many of the originals was so fine that when viewed without magnification they appeared to contain shades of gray. Their underlying structure - dots, grains, and lines - became obvious only under magnification. In the photogravure, the detail is so fine (evident only at 50x magnification) that the deposit of ink appears translucent, perhaps allowing some of the light of the paper support to shine through, thus introducing a gray appearance to the black medium. Although the Tektronix printer had half the resolution of the HP printer, its ability to produce actual grays at each pixel resulted in superior print quality. The laser printer relied on a halftoning process to simulate the gray, at a comparatively low 106 lpi, enabling the representation of only 33 gray levels. True grayscale representation proved to be most advantageous in generating prints for those illustration types with soft-edged features that appear to have continuous tonal variation. The photogravure, which conveys tonality rivaling photographic prints, was well reproduced even at 150 dpi on the Tektronix. To rival this quality through halftoning would have required a 2,400 dpi printer capable of producing a 150 lpi screen with the 256 gray values fully represented. On the other hand, distinct, hard-edged representations, such as the wood engraving, rely more on resolution than apparent tonal range in conveying information, as demonstrated by the enlarged Tektronix view.
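The gray-level figures quoted above are consistent with a common rule of thumb for digital halftoning, assumed here since the report does not state its formula: each halftone cell is (printer dpi / screen lpi) device dots on a side, and the number of distinct tones is the number of dots in the cell plus one for bare paper. A short Python sketch with the hypothetical name halftone_gray_levels:

```python
import math

def halftone_gray_levels(printer_dpi, screen_lpi):
    """Approximate tones available from a halftone cell (rule-of-thumb estimate)."""
    cell_side = printer_dpi / screen_lpi       # device dots per cell side
    return math.floor(cell_side * cell_side) + 1  # +1 for the unprinted cell

print(halftone_gray_levels(600, 106))    # laser printer case from the text
print(halftone_gray_levels(2400, 150))   # hypothetical 2,400 dpi printer case
```

Under this assumption the 600 dpi laser printer at 106 lpi yields 33 levels, and a 2,400 dpi device at 150 lpi yields 257, enough to represent 256 gray values.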

Advisory Committee members found the printed versions noticeably inferior to the on-screen views, but adequate presentations of the originals, when considered in the context of a preservation reformatting program for brittle books. With one possible exception, prints generated from the 400 dpi 8-bit files produced on either the dye sublimation or laser printers were judged superior to the preservation photocopies made directly from the original illustrations. In the case of the very fine, uniformly inscribed copper engraving, a noticeable moiré appeared in the sky region - nonetheless, project staff favored it over the photocopy. Even the 200 dpi 8-bit image produced a better quality print on the laser printer than was obtained via photocopy. A complete set of all prints generated has been supplied to the Library of Congress.

2.6 Investigating Methods for Automatic Detection of Illustration Content Regions

2.6.1 Introduction

The goal of this part of the project was to develop some approaches for the detection of illustrations of any type, not for the discrimination of one type of illustration from another. This is a most useful general goal, especially if similar processing is to be applied to other illustration types.

This part of the project was aimed at developing general approaches to the problem rather than at developing actual working algorithms, with the expectation that future work will tackle the creation of software for this purpose. This is in contradistinction to the portion of the project that developed the halftone utility. It is worth pointing out, however, that the halftone utility assumes it to be known that at least one halftone region is present in the page image on which it is run. For this reason, the methods of this and the next section will be necessary precursors in a truly automatic processing tool chain for illustrated book pages.

The steps in such a tool chain could be as follows:

1. Perform basic common processing steps, for example:
   - Conservative brightening
   - Deskewing
2. Detect/locate illustration content regions (Section 2.6, this section)
3. Identify illustration region process types (Section 2.7)
4. Apply processing steps specific to each illustration type (Section 2.8)
(The halftone utility developed in this project is an instance of a tool fitting into step 4.)

2.6.2 Background

It is typically desirable to handle illustrations in a different manner than text. One reason for this is to permit higher spatial resolutions to be used on the text, since careful rendering of fine character features is key to its legibility. The human eye is quite sensitive to the uniformity of style inherent in a carefully designed typeface across all its characters. If text is rendered with too few samples per stroke, this uniformity is destroyed, with one stroke being one pixel wide and the next being two pixels wide. When a larger number of samples occur across strokes of characters, these variations might be, for example, 10 pixels across one stroke and 11 across the next, a nearly imperceptible difference. The accurate rendition of fine serif features also requires high spatial resolution.

Another reason is that illustration regions, more often than text regions, require grayscale or color (what we will call "multitonal") electronic representations in order to be reproduced with fidelity. For text regions, bitonal data often suffices, especially when the spatial resolution is high enough.

The cost of preserving multitonal data rises sharply as the spatial resolution rises, even when moderate compression ratios (of 8:1 or so) are introduced. Since we want to store these multitonal illustration regions at more moderate spatial resolutions, this also argues for separating the regions, allowing separate treatment.
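The growth in storage can be illustrated with a rough Python estimate. The 6 x 9 inch page size is a hypothetical example, not a figure from the study, and the compression ratio is treated as a simple divisor, which real codecs only approximate.

```python
def multitonal_bytes(width_in, height_in, dpi, bits_per_pixel=8, compression_ratio=1.0):
    """Approximate stored size in bytes of a scanned region."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel / 8 / compression_ratio

# Hypothetical 6 x 9 inch page, 8-bit grayscale, 8:1 compression.
for dpi in (200, 300, 400, 600):
    size_mb = multitonal_bytes(6, 9, dpi, compression_ratio=8.0) / 1_000_000
    print(f"{dpi} dpi: about {size_mb:.2f} MB")
```

Because the pixel count grows with the square of the resolution, doubling the dpi quadruples the stored size at any fixed compression ratio.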

2.6.3 A Possible Strategy for Identifying Illustration Regions

Given the rich variations seen across illustration process types, one approach is not to detect every possible illustration type, but only to detect regions containing textual content and regions containing background. Then, by exclusion, the remaining non-background regions are declared to be illustration regions. This is the approach we suggest.

2.6.4 Document Understanding

Many approaches to segmentation of mixed content pages have been published in the open literature as part of the domain of research referred to as document understanding. Methods for understanding documents attempt to parse a page image into layout blocks or layout elements. These are objects or groups of related objects having a single purpose in the original layout of the page [18]. Several good surveys of this field exist [19, 20].

A variety of top-down or bottom-up methods exist for breaking the various regions of the image into layout blocks. Once this is done, the content within the bounding rectangles of these blocks may be analyzed and classified.

2.6.5 Detecting Text Regions

Many methods for identifying a layout element as textual exist. In most cases, the image is first deskewed and thresholded to yield a bitonal image.

Text regions have a variety of distinctive characteristics. They have a relatively predictable ratio of white to black pixels, often on the order of 10 to 1 in regions mostly containing characters. The statistical distributions of run lengths (counts of adjacent pixels of the same color) for white and black pixels are quite different, with long runs dominating for white pixels and relatively shorter, more consistent lengths (corresponding to strokes) prevalent in black runs.
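As a sketch of how these statistics might be gathered, the following Python collects run lengths and the white-to-black pixel ratio from a bitonal region. The representation (a list of rows holding 0 for white and 1 for black) and the function names are assumptions made for illustration; the report does not prescribe an implementation.

```python
from itertools import groupby

def run_lengths(row):
    """Return (white_runs, black_runs) for one row of 0 (white) / 1 (black) pixels."""
    white, black = [], []
    for value, group in groupby(row):
        length = sum(1 for _ in group)
        (black if value else white).append(length)
    return white, black

def region_stats(rows):
    """White-to-black pixel ratio and mean run lengths over a bitonal region."""
    white_runs, black_runs = [], []
    for row in rows:
        white, black = run_lengths(row)
        white_runs.extend(white)
        black_runs.extend(black)
    white_px, black_px = sum(white_runs), sum(black_runs)
    return {
        "white_to_black": white_px / black_px if black_px else float("inf"),
        "mean_white_run": white_px / len(white_runs) if white_runs else 0.0,
        "mean_black_run": black_px / len(black_runs) if black_runs else 0.0,
    }

# Tiny synthetic region: thin vertical strokes separated by wide gaps.
sample = [[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]] * 4
print(region_stats(sample))
```

A region whose ratio is near 10 to 1, with short and consistent black runs and much longer white runs, would match the textual profile described above.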

The correlation (or degree of similarity) between adjacent horizontal scan lines is quite high in text regions, this being the principal way in which the Group 4 MMR (modified modified READ) compression algorithm achieves its results.
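One simple way to quantify this similarity, offered only as an illustrative measure and not as the coding scheme Group 4 itself uses, is the fraction of pixels that match between each scan line and the line below it. The function name and the row-list representation (0 for white, 1 for black) are assumptions.

```python
def adjacent_line_agreement(rows):
    """Fraction of pixels equal between each scan line and the next one."""
    total = same = 0
    for upper, lower in zip(rows, rows[1:]):
        for a, b in zip(upper, lower):
            total += 1
            same += (a == b)
    return same / total if total else 0.0

# Three short scan lines; only one pixel differs between the last two.
rows = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 1],
]
print(adjacent_line_agreement(rows))
```

Values close to 1.0 indicate strongly correlated scan lines; where the threshold for "text-like" should sit would have to be established empirically.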

Block text regions of Roman characters have a very distinctive texture. They have horizontal spacings between the centroids of black objects which have dominant peaks corresponding to inter-character and inter-word gaps, and vertical spacings which correspond to inter-line gaps. The distribution of inter-word gaps relates to the statistics of the language used. Text lines have highly consistent character baselines and top-lines, with well-known frequencies of occurrence of ascenders and descenders poking through these boundaries (which incidentally can allow the detection of upside-down pages).

2.6.6 Conclusion

To automatically distinguish illustration regions, the best approach seems to be first to perform a general document understanding operation to identify all the layout elements in a page image. Next, each non-background region is examined to determine if it is a primarily textual region. If not, it is classified as an illustration region.
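The classification-by-exclusion idea can be outlined in Python as follows. This is a sketch only: it assumes layout blocks have already been found and summarized by two hypothetical features (fraction of black pixels and mean black run length), and every threshold value is a placeholder chosen for illustration, not a result from this study.

```python
def classify_block(black_fraction, mean_black_run,
                   background_max_black=0.005,
                   text_black_range=(0.05, 0.20),
                   text_max_black_run=8.0):
    """Label a layout block as background, text, or (by exclusion) illustration.

    All threshold defaults are hypothetical placeholders.
    """
    if black_fraction <= background_max_black:
        return "background"
    low, high = text_black_range
    if low <= black_fraction <= high and mean_black_run <= text_max_black_run:
        return "text"
    # Anything that is neither background nor text is declared an illustration.
    return "illustration"

blocks = {
    "margin": (0.0, 0.0),
    "paragraph": (0.09, 3.0),    # roughly 10:1 white to black, short strokes
    "plate": (0.45, 20.0),
}
for name, (fraction, run) in blocks.items():
    print(name, "->", classify_block(fraction, run))
```

The design choice mirrors the strategy above: only text and background need explicit tests, so new illustration process types do not require new detectors.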
