A file encoding scheme for formal metadata
by Peter Schweitzer (USGS)
Revised 7-February-1997
Since the Content Standard for Digital Geospatial Metadata, as the name
implies, specifies only the contents of metadata files and not their
encoding, it was necessary to devise this specification for metadata
encoding in order to develop and use the metadata compiler, mp. The
encoding format is purely textual and the fidelity of the compiler to
this format is fanatical.
In general, this encoding format uses an outline-like list of element
names in an ASCII text file. The hierarchy of the Standard is encoded
explicitly and is expressed using indentation.
Note: mp does not read word-processor documents; it reads only ASCII text!
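One quick way to check a file before handing it to mp is to confirm that every byte is 7-bit ASCII. The Python sketch below is an illustration only: the name is_ascii_text is hypothetical, and nothing here describes how mp itself examines its input.

```python
def is_ascii_text(data: bytes) -> bool:
    """Return True if every byte in data is 7-bit ASCII (0 through 127)."""
    return all(byte < 128 for byte in data)


def file_is_ascii(path: str) -> bool:
    """Read a file in binary mode and apply the same check (hypothetical helper)."""
    with open(path, "rb") as handle:
        return is_ascii_text(handle.read())
```

A word-processor document will normally fail this check because it contains binary data in addition to text.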
Terms:
- tab: ASCII 9
- space: ASCII 32
- element name: A sequence of bytes consisting of alphanumeric characters,
  the underscore, hyphen, apostrophe, and forward slash. This sequence
  is one of the formal names given to metadata elements in the
  formal syntax specification of the metadata content standard.
  Examples:
  - Citation
  - Identification_Information
  - Data_Set_G-Polygon_Outer_G-Ring
  - Range_of_Dates/Times
- value: A text string associated with an element by the author of
  the metadata record.
- compound element: An element that exists to contain other elements.
  These are used to convey the hierarchical relationships among
  component elements.
- CR: ASCII 13, carriage return
- LF: ASCII 10, line feed
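The character set allowed in an element name can be written as a regular expression. The Python sketch below checks only that a string uses the permitted characters; it does not check that the string is one of the formal names in the content standard. The names ELEMENT_NAME_CHARS and has_valid_name_characters are hypothetical.

```python
import re

# Alphanumeric characters, underscore, hyphen, apostrophe, and forward slash.
ELEMENT_NAME_CHARS = re.compile(r"[A-Za-z0-9_\-'/]+")


def has_valid_name_characters(name: str) -> bool:
    """Return True if name is non-empty and uses only the permitted characters."""
    return ELEMENT_NAME_CHARS.fullmatch(name) is not None
```

A string containing a space, such as "Range of Dates", fails the check, while Range_of_Dates/Times passes.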
Arrangement:
- Metadata files contain only ASCII characters; lines may be terminated
using CR, LF, or CR followed by LF.
- Each file contains exactly one metadata record.
- The number of characters per line is not limited.
- Indentation is accomplished using tab characters or spaces. A single
tab character represents the same degree of logical indentation as a
single space character.
- Blank lines may occur anywhere in the file.
- Element names are spelled out in the metadata file exactly as in the
syntax rules of the metadata content standard.
- A single colon or equal sign may follow each element name.
- Spaces or tabs may occur between element name and colon or equal sign,
and may occur after the colon or equal sign.
- Values are associated with an element in one of three ways:
  - The value begins at the first nonblank character following the
    element name (or following the colon or equal sign) and extends to
    the end of the line. Example:
        Originator: Beeblebrox, Zaphod
  - The value begins on the line following the element name. It is
    indented more than the element name, i.e. there are more spaces or
    tabs preceding the value than precede the element name. Example:
        Title:
          Geometeorological data collected by the USGS Desert Winds
          Project at Gold Spring, Great Basin Desert, northeastern
          Arizona, 1979 - 1992
  - The value begins on the line containing the element name. It extends
    onto subsequent lines, where it is indented more than the element
    name, i.e. there are more spaces or tabs preceding the value on lines
    following the element name than precede the element name. Example:
        Title: Geometeorological data collected by the USGS Desert Winds
          Project at Gold Spring, Great Basin Desert, northeastern
          Arizona, 1979 - 1992
- Compound elements must be specified if any of their components
(or their components' components, and so on) contain text values.
- Compound elements contain only elements and not text. No extra
text may be included as a child of a compound element. Example:
    Metadata:
      Identification_Information:
        (no plain text is permitted here, only component elements)
        Citation:
          Citation_Information:
            Originator:
            Publication_Date:
- Components of compound elements occur on successive lines using the
same degree of indentation. In the example below, the components of
Citation_Information are indented the same, and the components
of Series_Information are indented the same, but are more
indented than their parent element. Degree of indentation does not
have to be uniform throughout the file but all of the children of a
specific compound element must be indented the same way.
    Citation:
      Citation_Information:
        Originator:
        Publication_Date:
        Publication_Time:
        Title:
        Type_of_Map:
        Series_Information:
          Series_Name:
          Issue_Identification:
        Publication_Information:
          Publication_Place:
          Publisher:
        Other_Citation_Details:
        Online_Linkage:
        Larger_Work_Citation:
- The lines in a textual value may have variable indentation as long as
they are all more indented than the element name to which they belong.
However, this indentation will be lost in subsequent processing of the
metadata, so it is not recommended. Example:
    Title:
      Geometeorological data
          collected by the USGS Desert Winds
        Project at Gold Spring, Great Basin Desert, northeastern
            Arizona, 1979 - 1992
In the output of mp and other metadata-processing tools the additional
indentation in the text will be lost, and the text of the title will
appear as follows:
    Title:
      Geometeorological data
      collected by the USGS Desert Winds
      Project at Gold Spring, Great Basin Desert, northeastern
      Arizona, 1979 - 1992
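The arrangement rules above can be sketched as a small indentation-driven reader. The Python below is an illustration written for this document, not the code of mp, and it makes several simplifying assumptions: the caller supplies the set of element names (known_names), a line counts as an element only if its first token is one of those names, and the rule that sibling elements share the same indentation is not enforced. The names Element, ELEMENT_LINE, and parse_outline are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Name characters from the Terms section, then an optional colon or equal
# sign, with optional spaces or tabs on either side, then the rest of the line.
ELEMENT_LINE = re.compile(r"([A-Za-z0-9_\-'/]+)[ \t]*[:=]?[ \t]*(.*)")


@dataclass
class Element:
    name: str
    value_lines: list = field(default_factory=list)
    children: list = field(default_factory=list)


def parse_outline(text, known_names):
    """Build a tree of Element objects from an indented metadata outline."""
    root = Element("(root)")
    stack = [(-1, root)]  # (indentation, element); the root sits above every line
    # Lines may end with CR, LF, or CR followed by LF.
    for raw in re.split(r"\r\n|\r|\n", text):
        content = raw.strip(" \t")
        if not content:
            continue  # blank lines may occur anywhere
        # A tab counts for the same degree of indentation as a single space.
        indent = len(raw) - len(raw.lstrip(" \t"))
        # Discard elements that are not less indented than this line.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        match = ELEMENT_LINE.fullmatch(content)
        if match and match.group(1) in known_names:
            element = Element(match.group(1))
            if match.group(2):
                element.value_lines.append(match.group(2))  # value on the same line
            parent.children.append(element)
            stack.append((indent, element))
        else:
            # Text that continues the value of the enclosing element; any extra
            # indentation within the text is dropped.
            parent.value_lines.append(content)
    return root
```

Calling parse_outline on a record returns a root element whose children mirror the outline. Because continuation lines are stored stripped, variable indentation inside a value is lost, as described above.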
This file is <http://geology.usgs.gov/tools/metadata/tools/doc/encoding.html>
Last updated 20-Jul-1998