
ALTO <Styles> & <Layout> Usage
Text Styles
Text styles are empty elements (they have no content). The attributes are:
- FONTFAMILY
- FONTSIZE
- FONTCOLOR
- FONTWEIGHT
- FONTSTYLE
- FONTPITCH
- FONTCHARSET
Only FONTFAMILY and FONTSIZE are required.
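As a minimal sketch of how such a style might be declared, the fragment below sets the two required attributes plus one optional one. The ID attribute and all attribute values are assumptions made for illustration; they are not taken from the list above.

```xml
<Styles>
  <!-- Illustrative values only. ID is an assumed attribute, added so that
       a STYLEREFS attribute elsewhere has something to point at. -->
  <TextStyle ID="TXT_0"
             FONTFAMILY="Times New Roman"
             FONTSIZE="10"
             FONTCOLOR="000000"/>
</Styles>
```

The element is written as self-closing because a text style carries attributes only.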
Paragraph Styles
Paragraph styles have no content. The attributes are:
- ALIGN (Left, Right, Center, Block)
- LEFT [numeric]
- RIGHT [numeric]
- LINESPACE [numeric]
- FIRSTLINE [numeric]
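A hedged example of a paragraph style using the attributes listed above might look like the following. The ID attribute is an assumption, and so is the reading of the numeric values as being in the same unit as the layout coordinates; the values themselves are invented.

```xml
<Styles>
  <!-- Illustrative values only. ID is an assumed attribute; the numeric
       values are assumed to share the unit of the layout coordinates. -->
  <ParagraphStyle ID="PAR_0"
                  ALIGN="Block"
                  LEFT="0"
                  RIGHT="0"
                  LINESPACE="50"
                  FIRSTLINE="40"/>
</Styles>
```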
Attributes of a Page Element
- PAGECLASS
- STYLEREFS
- HEIGHT
- WIDTH
- PHYSICAL_IMG_NR
- PRINTED_IMG_NR
- QUALITY (OK, Damaged, Missing)
- POSITION (Left, Right, Foldout, Single)
- PROCESSING (A link to processing information)
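The attributes above could be combined on a page element roughly as follows. This is only a sketch: the ID attribute, the referenced style and processing IDs, the PAGECLASS value, and the dimensions (an A4 page, assuming units of 1/10 mm) are all hypothetical.

```xml
<Layout>
  <!-- Illustrative values only. ID, TXT_0, PAR_0 and OCR_0 are assumed names. -->
  <Page ID="P1"
        PAGECLASS="Title"
        STYLEREFS="TXT_0 PAR_0"
        HEIGHT="2970"
        WIDTH="2100"
        PHYSICAL_IMG_NR="1"
        PRINTED_IMG_NR="i"
        QUALITY="OK"
        POSITION="Right"
        PROCESSING="OCR_0"/>
</Layout>
```

PHYSICAL_IMG_NR counts the scanned images, while PRINTED_IMG_NR records the number as printed on the page, which is why the sketch uses a roman numeral for it.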
Page Areas
Each page is divided into different areas (TopMargin, LeftMargin, RightMargin, BottomMargin and PrintSpace). The margins may contain text or other objects that are not part of the main body.
The positions are given as HPOS, VPOS, WIDTH and HEIGHT.
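To make the geometry concrete, here is a hedged sketch of an A4-sized page (2100 x 2970, assuming units of 1/10 mm) split into the five areas named above. All coordinates are invented for illustration; the only property they are meant to show is that the margins and the print space tile the page without overlapping.

```xml
<Page ID="P1" WIDTH="2100" HEIGHT="2970" PHYSICAL_IMG_NR="1">
  <!-- Illustrative coordinates only, assumed to be in 1/10 mm. -->
  <TopMargin    HPOS="0"    VPOS="0"    WIDTH="2100" HEIGHT="200"/>
  <LeftMargin   HPOS="0"    VPOS="200"  WIDTH="200"  HEIGHT="2570"/>
  <RightMargin  HPOS="1900" VPOS="200"  WIDTH="200"  HEIGHT="2570"/>
  <BottomMargin HPOS="0"    VPOS="2770" WIDTH="2100" HEIGHT="200"/>
  <PrintSpace   HPOS="200"  VPOS="200"  WIDTH="1700" HEIGHT="2570"/>
</Page>
```

In this sketch the widths add up horizontally (200 + 1700 + 200 = 2100) and the heights add up vertically (200 + 2570 + 200 = 2970).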
Margins
- TopMargin
- The area between the top line of print and the upper edge of the leaf. It may contain a page number or a running title.
- InnerMargin
- The margin of a page adjacent to the binding edge of a book.
- OuterMargin
- The space between the text and the outer extremity of the leaf of a book. It may contain margin notes.
- BottomMargin
- The area between the bottom line of letterpress or writing and the bottom edge of the leaf. It may contain a page number, a signature number or a catchword.
- PrintSpace
- The rectangle surrounding the printed area of a page. The page number and running title are not part of the print space.
The position of the margins on a page is illustrated in this picture:

The structure of one of the page area (PageSpace) elements
The page area elements have the attributes:
- HPOS
- Horizontal position of the upper-left corner (1/10 mm)
- VPOS
- Vertical position of the upper-left corner (1/10 mm)
- WIDTH
- Width (1/10 mm)
- HEIGHT
- Height (1/10 mm)
- ROTATION
- In degrees as floating point number (optional)
All subelements carry these same attributes with the same meaning, except SP, which has no HEIGHT.
Each page area may contain any number of elements, each of which is one of the following:
- TextBlock
- A block of text
- ComposedBlock
- A block that consists of other blocks
- Illustration
- A picture or image
- GraphicalElement
- A graphic used to separate blocks. Mostly a line or rectangle.
Each of them may have the following attributes:
- ID
- Unique ID
- STYLEREFS
- Reference to text or paragraph styles
- HPOS
- Horizontal position of the upper-left corner (1/10 mm)
- VPOS
- Vertical position of the upper-left corner (1/10 mm)
- WIDTH
- Width (1/10 mm)
- HEIGHT
- Height (1/10 mm)
- ROTATION
- In degrees as floating point number (optional)
- IDNEXT
- Reference to the next element related to reading order.
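The block types and attributes above might be combined inside a print space as in the sketch below. Every ID, style reference, and coordinate is an illustrative assumption, and the TextLine children of the text blocks are omitted for brevity.

```xml
<PrintSpace HPOS="200" VPOS="200" WIDTH="1700" HEIGHT="2570">
  <!-- All IDs, style references and coordinates are illustrative assumptions.
       TextLine children are left out to keep the sketch short. -->
  <TextBlock ID="TB_1" STYLEREFS="TXT_0 PAR_0"
             HPOS="200" VPOS="200" WIDTH="1700" HEIGHT="600"
             IDNEXT="TB_2"/>
  <!-- A thin horizontal rule separating the two parts of the page -->
  <GraphicalElement ID="GE_1"
                    HPOS="200" VPOS="820" WIDTH="1700" HEIGHT="5"/>
  <!-- A composed block grouping a picture with the text next to it -->
  <ComposedBlock ID="CB_1"
                 HPOS="200" VPOS="850" WIDTH="1700" HEIGHT="1200">
    <Illustration ID="IL_1"
                  HPOS="200" VPOS="850" WIDTH="800" HEIGHT="1200"/>
    <TextBlock ID="TB_2" STYLEREFS="TXT_0 PAR_0"
               HPOS="1020" VPOS="850" WIDTH="880" HEIGHT="1200"
               ROTATION="0.0"/>
  </ComposedBlock>
</PrintSpace>
```

Here IDNEXT on TB_1 points at TB_2, so a reader following the reading order skips the rule and the illustration and continues with the second text block.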
If the shape of an element is not rectangular, a SHAPE element may be added.
Polygons are coded as x,y x,y ... with the coordinate pairs separated by spaces.
Circles and ellipses are allowed in principle but are not supported by some vendor tools such as docWORKS. Such shapes are instead represented as polygons with sufficient accuracy.
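As a hedged illustration of the polygon coding described above, a triangular illustration could be written as follows. The element and attribute names Shape, Polygon and POINTS are assumptions about how the schema spells them, and the IDs and coordinates are invented.

```xml
<Illustration ID="IL_2" HPOS="300" VPOS="400" WIDTH="600" HEIGHT="500">
  <!-- Shape, Polygon and POINTS are assumed names; coordinates are illustrative. -->
  <Shape>
    <!-- A triangle: three x,y pairs separated by spaces -->
    <Polygon POINTS="300,900 600,400 900,900"/>
  </Shape>
</Illustration>
```

The rectangle given by HPOS, VPOS, WIDTH and HEIGHT is the bounding box of the polygon in this sketch: x runs from 300 to 900 and y from 400 to 900.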
A TextBlock is divided into lines and those are divided into strings, spaces and hyphens:
<TextBlock>
  <TextLine>
    <String/>
    <SP/>
    <HYP/>
  </TextLine>
</TextBlock>
Meaning of those tags:
- TextBlock
- A paragraph of text
- TextLine
- A line of text
- String
- A single word
- SP
- White space
- HYP
- Hyphenation characteristics
Additional Attributes of the tags
| Element | Attribute name | Description |
|---|---|---|
| TextBlock | language | ISO 639-2 language code |
| String | CONTENT | String content (word) |
| | SUBS_TYPE | HypPart1: the content is the first part of a hyphenated word. Applies only to the last word of a line, if it is hyphenated. |
| | | HypPart2: the content is the second part of a hyphenated word. Applies only to the first word of a line, if it is hyphenated. |
| | SUBS_CONTENT | Complete content of a hyphenated word |
| | WC | Word confidence: confidence level of the OCR result for this string. A float value between 0 (unsure) and 1 (confident). |
| | CC | Character confidence: confidence level of each character in the string. A list of numbers, one number between 0 (confident) and 9 (unsure) for each character. Note that this scale runs in the opposite direction to WC. |
| | STYLEREFS | Text style used for this string, if it differs from the style of the parent text block |
| | STYLE | Any combination of font styles (italics, bold, …) |
| | ALTERNATIVE | (element) Any number of alternative strings to be used instead |
| Illustration | TYPE | A user-defined description of the type of the illustration |
| | FILEID | A link to a separate file that contains just the illustration |
| ComposedBlock | TYPE | A user-defined description of the type of the composed block |
| | FILEID | A link to a separate file that contains just the composed block |
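The hyphenation and confidence attributes are easiest to follow on a concrete case. The sketch below shows the word "document" broken across two lines. All values are illustrative assumptions, positional attributes are omitted for brevity, and writing CC as one digit per character without separators is also an assumption, since the separator is not specified above.

```xml
<TextBlock ID="TB_1" language="eng">
  <!-- All values are illustrative; positional attributes are omitted. -->
  <TextLine>
    <String CONTENT="digital" WC="0.98" CC="0000000"/>
    <SP/>
    <!-- Last word of the line: first part of the hyphenated word -->
    <String CONTENT="docu" SUBS_TYPE="HypPart1" SUBS_CONTENT="document"
            WC="0.91" CC="0010"/>
    <HYP/>
  </TextLine>
  <TextLine>
    <!-- First word of the next line: second part of the same word -->
    <String CONTENT="ment" SUBS_TYPE="HypPart2" SUBS_CONTENT="document"
            WC="0.87" CC="0200"/>
  </TextLine>
</TextBlock>
```

Both halves carry the full word in SUBS_CONTENT, so a consumer that wants running text can use SUBS_CONTENT once and skip the second half. Each CC value has as many digits as its CONTENT has characters.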