Coding set (i.e. collection) identification
The MSG resolved to avoid semantically loaded terms and agreed to use "set"
rather than "collection" to describe the various types of aggregated materials.
- Sets can consist of the following:
  - Materials purchased from a vendor as concrete units
  - Materials created by the Library (or by faculty projects) as concrete units
  - Materials brought together by the Library as usefully related, but not
    inherently related to each other
- Materials inherently related to each other by their bibliographic nature
are considered series (e.g. electronic texts bearing a series statement).
- The set code serves only to link the individual object back to the collection
object. No hierarchical relationship is implied by the <set> elements; all
hierarchy is carried by the collection objects.
DescMeta
<surrogate>
<set code="UVA-LIB-ArtArchit"/>
</surrogate>

GDMS
<gdmshead>
<filedesc>
<setstmt>
<set code="UVA-LIB-ArtArchit"/>
</setstmt>
</filedesc>
</gdmshead>

TEI
<fileDesc>
<seriesStmt>
<title level="s">University of Virginia, Modern English collection</title>
<idno type="uva-set">UVA-LIB-ModEngl</idno>
</seriesStmt>
</fileDesc>
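Because the set code sits in a different place in each encoding, software that reads it has to check each location. The following is a minimal Python sketch, assuming un-namespaced XML shaped like the three examples above; extract_set_code is a hypothetical helper name, not part of any existing UVa tool.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def extract_set_code(xml_text: str) -> Optional[str]:
    """Return the set code from a DescMeta, GDMS, or TEI fragment, or None."""
    root = ET.fromstring(xml_text)

    # DescMeta (<surrogate>) and GDMS (<setstmt>): code attribute on <set>.
    for set_el in root.iter("set"):
        code = set_el.get("code")
        if code:
            return code

    # TEI: text content of <idno type="uva-set"> inside <seriesStmt>.
    idno = root.find(".//seriesStmt/idno[@type='uva-set']")
    if idno is not None and idno.text:
        return idno.text.strip()
    return None
```

Since the set code only links an object back to its collection object, the helper returns a single flat string and makes no attempt to infer any hierarchy.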
Set code conventions
- Set codes will begin with UVA-LIB for any sets created or collected by
the Library. This includes faculty projects that have been selected and
collected by the Library.
- For vendor collections, set codes will begin with standard abbreviations
in all caps (e.g. SI-SAAM for the Smithsonian Institution, Smithsonian American
Art Museum).
- For the remainder of the set code, abbreviations should be taken, where
possible, from the standardized List of Title Word Abbreviations (which
follows the ISO 4 standard, Rules for the abbreviation of title words and
titles of publications). Follow its abbreviation conventions, but not its
punctuation or capitalization rules.
- There should be a central authority for determining the "official" name
of a set (regardless of which set category it falls into).
- Once the official name has been determined, the code should be prefixed as
above, followed by a hyphen, followed by the standardized abbreviation, with
all words strung together as a compound word.
- The first letter of each abbreviated word should be capitalized (e.g. ArtArchit).
- All set codes must be unique.
- Given the above formulation, the phase 2 sets are:
- The Art and Architecture collection: UVA-LIB-ArtArchit
- The Barcelona collection: UVA-LIB-Barcelona
- The Architecture of Jefferson Country: UVA-LIB-ArchitJeffCtry
- The Catlin collection: SI-SAAM-CatlinIndianPaint (from The Smithsonian
American Art Museum Catlin Indian Paintings Collection)
- The Fowler collection: UCLA-FOWLER-AfrArt
- The Modern English Text collection: UVA-LIB-ModEngl
- The Finding Aids collection: UVA-LIB-FindAids
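The code-building rules above can be sketched in Python. This is a minimal sketch under stated assumptions: make_set_code, SAMPLE_ABBREVIATIONS and SKIP_WORDS are hypothetical names; the abbreviation table is a tiny hand-picked excerpt that reproduces some phase 2 codes, not the real List of Title Word Abbreviations (it includes the local choice "Jeff" for Jefferson); and dropping articles, conjunctions and prepositions is assumed from general ISO 4 practice.

```python
from typing import Set

# Hypothetical excerpt: word -> abbreviation pairs chosen to reproduce the
# phase 2 examples above. This is NOT the actual LTWA.
SAMPLE_ABBREVIATIONS = {
    "architecture": "Archit",
    "country": "Ctry",
    "english": "Engl",
    "jefferson": "Jeff",
    "modern": "Mod",
}

# Assumption: short function words are dropped, as ISO 4 generally does.
SKIP_WORDS = {"the", "a", "an", "and", "of", "for", "in"}


def make_set_code(prefix: str, official_name: str, existing_codes: Set[str]) -> str:
    """Build PREFIX-CompoundAbbreviation from a set's official name."""
    words = [w for w in official_name.split() if w.lower() not in SKIP_WORDS]
    # Use an abbreviation where one is known; otherwise keep the word as-is.
    parts = [SAMPLE_ABBREVIATIONS.get(w.lower(), w) for w in words]
    # Capitalize the first letter of each part and string them together.
    compound = "".join(p[0].upper() + p[1:] for p in parts)
    code = f"{prefix}-{compound}"
    # All set codes must be unique.
    if code in existing_codes:
        raise ValueError(f"set code {code!r} is already in use")
    return code
```

For example, make_set_code("UVA-LIB", "The Architecture of Jefferson Country", set()) yields UVA-LIB-ArchitJeffCtry. The uniqueness check needs the list of codes already assigned, which in practice would come from the central naming authority.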
Descriptive Metadata for Images
Image objects will contain only:
- Their parent pointer: <idno type="parent">
- A label:
  - For page images (based on the existence of a pb tag), the label will be:
    book title, page [value of the n attribute (page number)]
  - For figures (based on the existence of a fig tag), the label will be:
    book title, [fig caption]
    Some figures have extensive captions. For phase 2, we will use the
    entire caption. If this proves unwieldy, we will consider limiting
    captions to a fixed number of characters in phase 3.
- Their rights information. Page images can inherit their rights from their
parent, but the images referred to by GDMS objects must carry their own
individual rights.
This descriptive metadata for image objects will not populate the discovery
index; full descriptive metadata will be inherited from the parent on demand.
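The label rules can be expressed as a small function. This is a minimal sketch, assuming the caller has already pulled the book title, the pb tag's n value or the figure caption from the TEI; image_label and max_caption_chars are hypothetical names, and collapsing whitespace in captions is an added assumption, not a stated rule.

```python
from typing import Optional


def image_label(book_title: str,
                page_n: Optional[str] = None,
                fig_caption: Optional[str] = None,
                max_caption_chars: Optional[int] = None) -> str:
    """Build the label for an image object from its parent's title."""
    if page_n is not None:
        # Page image (marked by a pb tag): "book title, page N".
        return f"{book_title}, page {page_n}"
    if fig_caption is not None:
        # Assumption: collapse runs of whitespace left over from the markup.
        caption = " ".join(fig_caption.split())
        # Phase 2 keeps the whole caption; a phase 3 limit could be passed in.
        if max_caption_chars is not None:
            caption = caption[:max_caption_chars]
        return f"{book_title}, {caption}"
    raise ValueError("an image label needs a page number or a figure caption")
```

Leaving max_caption_chars as None reproduces the phase 2 behavior, so a phase 3 limit would be a change to the caller rather than to the function.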
Where to store PIDs in the "master" metadata records
All "master" records should contain their Fedora PIDs. When files
are spun off from their masters for archiving or other purposes, their PIDs
will then already be embedded in the metadata.
TEI
<fileDesc>
<publicationStmt>
<idno type="uva-pid">

GDMS
<gdmshead>
<idno type="uva-pid">

EAD
<filedesc>
<publicationstmt>
<num type="uva-pid">

TIFF
In the TIFF dump (for phase 2), not in the actual TIFF headers.
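A tool that reads PIDs back out of spun-off files needs to know each of these locations. This is a minimal sketch, assuming un-namespaced XML and assuming the TEI, GDMS and EAD locations sit under <teiHeader>, <gdmshead> and <eadheader> respectively; find_pid, PID_PATHS and the sample PID values are hypothetical. TIFF is omitted because its PIDs live in the TIFF dump, not in XML headers.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Location of the Fedora PID, relative to the header element named in the key.
PID_PATHS = {
    "teiHeader": "./fileDesc/publicationStmt/idno[@type='uva-pid']",
    "gdmshead": "./idno[@type='uva-pid']",
    "eadheader": "./filedesc/publicationstmt/num[@type='uva-pid']",
}


def find_pid(xml_text: str) -> Optional[str]:
    """Return the first uva-pid found in a TEI, GDMS, or EAD master record."""
    root = ET.fromstring(xml_text)
    for header_tag, path in PID_PATHS.items():
        # The header may be the document root or nested somewhere inside it.
        if root.tag == header_tag:
            headers = [root]
        else:
            headers = root.findall(f".//{header_tag}")
        for header in headers:
            el = header.find(path)
            if el is not None and el.text:
                return el.text.strip()
    return None
```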
Dealing with TEI headers that represent serials & monographic sets
The challenge: Each issue/volume of a multi-volume publication is a separate
file, will be a separate Fedora Object, and needs a separate TEI header.
We can create alternate titles in the TEI to ease searchability, but we don't
want to populate the discovery index with each volume's metadata. To
achieve this end:
Individual volume/issue headers
- Each individual issue/volume has its own individual TEI header,
which describes the issue/volume in hand.
- Each individual TEI header, as extracted from VIRGO, contains the element <idno type="UVa Title Control Number">.
- Each individual TEI header has an element identifying the "form" of the item. This is coded as follows:
<profileDesc>
<keywords scheme="uva-form">
<term>periodical issue</term>
</keywords>
</profileDesc>
The scheme "uva-form" must also be declared:
<classDecl>
<taxonomy id="uva-form">
<bibl>UVa Library Form Categories</bibl>
</taxonomy>
</classDecl>
The uva-form keyword scheme is a locally developed thesaurus.
Valid terms for this scheme currently are:
- article
- broadside
- manuscript
- monograph
- monographic set
- monographic volume
- newspaper
- newspaper issue
- periodical
- periodical issue
- periodical volume
- serial
- serial volume
Please contact Erin Stalberg, MSG Chair, if you need additional terms added to
this list.
An additional advantage of using <keywords scheme="uva-form"> in this way is that the digital library will be able to group hits by uva-form value. The user will be able to scan a hit list with the hits grouped by uva-form, e.g. first the monograph hits and then the periodical hits.
- We modified the TEI DTD to allow <biblScope> within <fileDesc> and within <sourceDesc><biblFull>:
<fileDesc>
<titleStmt>
<title n="245|a" type="main">The Cavalier Daily</title>
<biblScope type="volume"><num value="79">79th Year</num></biblScope>
<biblScope type="issue"><num value="2">Number 2</num></biblScope>
<biblScope type="date">
<date value="1968-09-13">Friday, September 13, 1968</date>
</biblScope>
</titleStmt>
</fileDesc>
and
<sourceDesc>
<biblFull>
<titleStmt>
<title n="245|a" type="main">The Cavalier daily</title>
<biblScope type="volume"><num value="79">79th Year</num></biblScope>
<biblScope type="issue"><num value="2">Number 2</num></biblScope>
<biblScope type="date">
<date value="1968-09-13">Friday, September 13, 1968</date>
</biblScope>
</titleStmt>
</biblFull>
</sourceDesc>
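Taken together, the rules for individual volume/issue headers lend themselves to a simple check. The following is a minimal sketch, assuming an un-namespaced (TEI P4 style) header whose root is <teiHeader> and reading <biblScope> only from the <fileDesc> title statement; read_individual_header and UVA_FORM_TERMS are hypothetical names. It confirms that uva-form is declared in <classDecl>, that every uva-form term is on the current list, and returns the normalized value= attributes of the volume, issue and date scopes.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Valid uva-form terms, copied from the list above (the list may grow).
UVA_FORM_TERMS = frozenset({
    "article", "broadside", "manuscript", "monograph", "monographic set",
    "monographic volume", "newspaper", "newspaper issue", "periodical",
    "periodical issue", "periodical volume", "serial", "serial volume",
})


def read_individual_header(header_xml: str) -> Dict[str, object]:
    """Return the uva-form term(s) and biblScope values of one TEI header."""
    root = ET.fromstring(header_xml)

    # The uva-form scheme must be declared in <classDecl>.
    if root.find(".//classDecl/taxonomy[@id='uva-form']") is None:
        raise ValueError("uva-form taxonomy is not declared in <classDecl>")

    forms: List[str] = [
        t.text.strip()
        for t in root.findall(".//keywords[@scheme='uva-form']/term")
        if t.text
    ]
    unknown = [f for f in forms if f not in UVA_FORM_TERMS]
    if unknown:
        raise ValueError(f"unknown uva-form term(s): {unknown}")

    # Prefer the normalized value= attribute on a <num> or <date> child;
    # fall back to the display text.
    scopes: Dict[str, str] = {}
    for scope in root.findall("./fileDesc/titleStmt/biblScope"):
        child = scope.find("num")
        if child is None:
            child = scope.find("date")
        if child is not None and child.get("value"):
            scopes[scope.get("type", "")] = child.get("value")
        else:
            scopes[scope.get("type", "")] = "".join(scope.itertext()).strip()
    return {"uva_form": forms, "bibl_scope": scopes}
```

For the Cavalier Daily example this returns ["periodical issue"] as the form and {"volume": "79", "issue": "2", "date": "1968-09-13"} as the scopes.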
Standalone headers
- Standalone TEI headers are created to represent the serial or monographic
set as a whole. The UVa standalone header is based on the concept of TEI Independent Headers (see the TEI website). We have not followed the practice as written in TEI, however, because the TEI Independent Header DTD does not allow accompanying extension files as the normal TEI DTD does. Instead, we have adapted the concept with local modifications.
- Standalone TEI headers also include the <keywords scheme="uva-form"> element, here describing the parent. For example:
Individual TEI header (child)
<keywords scheme="uva-form">
<term>periodical issue</term>
</keywords>
Standalone TEI header (parent)
<keywords scheme="uva-form">
<term>periodical</term>
</keywords>
"uva-form" is also declared in the <classDecl> for the standalone header as described above.
- Each title in a sequence of title changes will have its own Standalone Header.
- The digital library software will link the Standalone (parent) headers
to the Individual (children) headers by the value of the UVa Title Control
Number idno.
- Only the Standalone Header will populate its metadata to the digital
discovery index. Therefore, when searching the digital discovery index,
the user will first locate the parent and then find all its children (or
related titles, in the case of title changes). When searching the full-text TEI
index, the user will be able to discover both the parent and child records
separately.
- A new Fedora content model will be developed.
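The parent/child link by UVa Title Control Number can be sketched as a simple join. This is a minimal sketch, assuming the title control number and a standalone/individual flag have already been read from each header, and assuming one standalone header per title control number; HeaderInfo and link_headers are hypothetical names, not the digital library software's actual interface.

```python
from typing import Dict, List, NamedTuple


class HeaderInfo(NamedTuple):
    pid: str                    # Fedora PID of the header's object
    title_control_number: str   # value of <idno type="UVa Title Control Number">
    standalone: bool            # True for a Standalone (parent) header


def link_headers(headers: List[HeaderInfo]) -> Dict[str, List[str]]:
    """Map each standalone header's PID to the PIDs of its individual headers."""
    # Assumption: one standalone header per title control number.
    parents = {h.title_control_number: h.pid for h in headers if h.standalone}
    links: Dict[str, List[str]] = {pid: [] for pid in parents.values()}
    for h in headers:
        if not h.standalone and h.title_control_number in parents:
            links[parents[h.title_control_number]].append(h.pid)
    return links
```

Only the parents (the keys) would be sent to the discovery index; the child lists are what the user reaches after locating a parent.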
TEI sort titles
The MARC-to-TEI script generates sort titles based on the second (nonfiling characters) indicator of the MARC record's 245 field. The script also normalizes the data according to the NACO Normalization Rules and converts all characters to lower case. The DL sorts on the content of <title type="sort"> and displays the content of <title type="main">.
MARC
245 04 $a The Cavalier daily
TEI
<title n="245|a" type="main">The Cavalier daily</title>
<title type="sort">cavalier daily</title>
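The example above can be reproduced with a short function. This is a minimal sketch, assuming the 245 $a text and the second indicator are already available; sort_title is a hypothetical name, not the MARC-to-TEI script itself, and its normalization (strip diacritics, turn punctuation into spaces, collapse whitespace, lowercase) is a simplified stand-in for the full NACO Normalization Rules.

```python
import re
import unicodedata


def sort_title(title_245a: str, second_indicator: str) -> str:
    """Derive a sort title from 245 $a and its nonfiling-characters indicator."""
    # The 245 second indicator gives the number of leading characters to skip.
    skip = int(second_indicator) if second_indicator.isdigit() else 0
    text = title_245a[skip:]
    # Simplified normalization, not the complete NACO rules:
    # drop diacritics, replace punctuation with spaces, collapse whitespace.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split()).lower()
```

With the record above, sort_title("The Cavalier daily", "4") skips "The " and returns "cavalier daily".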
Overview of IRIS import/export processes
PDF documents created by Jack Kelly to document his workflow for importing data into and exporting data from IRIS using Perl and AppleScript.
IRIS-GDMS
Data Import into IRIS
Access Rights
The level of access that a member of the UVa community or the general public can have to this resource. Currently there are only four valid values:
| Machine-processable value | Display value |
| public | Publicly accessible |
| uva | Accessible to UVa community only |
| viva | Accessible to VIVA community only |
| restricted | Restricted to Library staff for management only |
Note: VIVA is the Virtual Library of Virginia. UVa hosts a number of resources on their behalf.
- Access Rights must conform to one of the above. If you have additional restrictions not accounted for above, please contact Erin Stalberg, MSG Chair.
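Because only the four values in the table are valid, a loader can reject anything else up front. This is a minimal sketch of that check; ACCESS_RIGHTS and access_display are hypothetical names, and the exact, case-sensitive match is an assumption.

```python
ACCESS_RIGHTS = {
    "public": "Publicly accessible",
    "uva": "Accessible to UVa community only",
    "viva": "Accessible to VIVA community only",
    "restricted": "Restricted to Library staff for management only",
}


def access_display(value: str) -> str:
    """Map a machine-processable Access Rights value to its display value."""
    try:
        return ACCESS_RIGHTS[value]
    except KeyError:
        # Anything outside the four valid values is rejected, not guessed at.
        raise ValueError(
            f"invalid Access Rights value {value!r}; "
            f"expected one of {sorted(ACCESS_RIGHTS)}"
        ) from None
```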