University of Virginia Community Digitization Guidelines
This document is also available as a PDF (500K) for easy printing and reference. This document offers guidance and minimum recommendations that are in line with the UVa Library's current practice for faculty who are planning digitization projects. Because inherent or unique characteristics of different source materials necessitate different approaches to scanning and conversion, requirements for digital projects should be considered on a case-by-case basis (particularly for grant projects with specific requirements). These guidelines have been developed in order to:
Because technology and industry standards are constantly improving and changing, we view this as a continually evolving document.
Before You Begin Digitizing
Before you digitize anything, take some time to consider your needs. The worst possible outcome is to spend time digitizing materials that end up being inappropriate for the goals of your project. To avoid this scenario, consider a number of issues ahead of time.
Storage
Storage options for your digitized media should be considered before you begin digitizing. Storage space needs vary significantly, depending on file formats and the quality of media desired. Backup policies should always be implemented. There are numerous solutions for storing media. Hard drives and CDs or DVDs offer local but limited storage space for many media types. For storage of larger media like raw digital audio and video, you might consider an external hard drive. The FireWire standard allows for faster access to these drives. MiniDV/DVCAM tape is a commonly used medium for storing video.
Definitions
For a more extensive glossary and links, visit: http://www.lib.virginia.edu/digital/reports/dl_terminology_uva.htm
Images - Bitmap/Raster
Illustrations/Graphs/Charts - Vector
Electronic Texts
Audio
Creation
| Purpose | Format | Resolution & Sample rate | Description |
| Master | Broadcast WAV | 44.1 kHz, 16 bits per sample | Maintain channel pattern of original, e.g. stereo, mono, and multi-channel. |
Deliverables
| Purpose | Format | Resolution & Sample rate | Description |
| Service | MPEG 1/2 Layer 3 (.mp3); MPEG 4/AAC | Appropriate to type and quality of original | Maintain channel pattern where practical. |
| Deliverable | MPEG 1/2 Layer 3 (.mp3); MPEG 4/AAC | Appropriate to delivery needs and conditions | |
| Preview | MPEG 1/2 Layer 3 (.mp3) | | Reduce duration to create a representative sample: a "clip." |
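Because storage needs depend heavily on format and quality, it can help to estimate the size of uncompressed audio masters before digitizing. The following is a minimal sketch, assuming plain uncompressed PCM audio at the master settings listed above (44.1 kHz, 16 bits per sample) and ignoring the small file header; the function names are illustrative and not part of any Library tool.

```python
def uncompressed_audio_bytes(seconds, sample_rate_hz=44_100,
                             bits_per_sample=16, channels=2):
    """Approximate size of uncompressed PCM audio data, ignoring file headers."""
    bytes_per_sample = bits_per_sample // 8
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


def to_megabytes(num_bytes):
    """Convert bytes to megabytes, using 1 MB = 1,000,000 bytes."""
    return num_bytes / 1_000_000


if __name__ == "__main__":
    # One hour of audio at the master settings, stereo versus mono.
    one_hour_stereo = uncompressed_audio_bytes(3600, channels=2)
    one_hour_mono = uncompressed_audio_bytes(3600, channels=1)
    print(f"1 hour stereo: {to_megabytes(one_hour_stereo):.0f} MB")
    print(f"1 hour mono:   {to_megabytes(one_hour_mono):.0f} MB")
```

Under these assumptions, one hour of stereo audio comes to roughly 635 MB, which is close to the capacity of a single CD.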
Video
Creation
| Purpose | Format | Compression | Description |
| Master | NTSC DV, DV-Cam tape, Beta-SP | DV | Media should be stored in an environmentally stable location. |
Deliverables
| Purpose | Format | Compression | Description |
| Service | Select as appropriate for use | Appropriate to format and use | Service, i.e. editable, versions produced as required by "dubbing"; implies change of storage medium and/or format. Very large file sizes; not network distributable. |
| Deliverable | MPEG1, MPEG2, MPEG4 | Appropriate to format and use | Only highly compressed forms, network distributable. |
| Preview | MPEG4 | Appropriate to format and use | Reduce duration to create a representative sample: a "clip." |
| Thumbnail | 120 pixels on the longest side, JPEG | JPEG is automatically compressed; select High or level 10 compression | Representative frame: indication of content. |
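The thumbnail rule above (120 pixels on the longest side) can be sketched as a small calculation. This is an illustrative helper, not a Library tool: it only computes target dimensions while preserving the aspect ratio, and it assumes square pixels. The actual resizing and JPEG export would be done in whatever imaging software you use.

```python
def thumbnail_size(width, height, longest_side=120):
    """Return (width, height) scaled so the longest side equals longest_side."""
    if width <= 0 or height <= 0:
        raise ValueError("frame dimensions must be positive")
    longest = max(width, height)
    # Scale both sides by the same ratio so the aspect ratio is preserved.
    new_width = max(1, round(width * longest_side / longest))
    new_height = max(1, round(height * longest_side / longest))
    return new_width, new_height


if __name__ == "__main__":
    print(thumbnail_size(720, 480))  # landscape frame
    print(thumbnail_size(480, 720))  # portrait frame
```

For a 720 x 480 frame this gives a 120 x 80 thumbnail.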
Statistical/Numeric Data
| Purpose | Format | Comments |
| Master copy | ASCII columnar format; SPSS, STATA, or SAS program code and/or machine-readable, text-based documentation to define data for analysis | ASCII delimited preferred. DDI standard metadata is the preferred documentation format. Follows the ICPSR standard for data archiving and preservation. |
| Service | Data stored in a statistical package format (SAS, SPSS, STATA) or in a queryable SQL database system | Storage for access, retrieval, or extraction. |
| Deliverable | SAS, STATA, SPSS, Excel, or delimited ASCII format with data map or variable list | Excel not advised for very large files. All users get documentation built from DDI records. |
| Preview | Screen dump of 5% of records, no more than 100 | Practice not currently in place. |
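The preview rule in the table (5% of records, no more than 100) can be expressed as a short calculation. The sketch below is only one possible reading: rounding up and taking the first records in file order are assumptions, since the guidelines do not say how the sample should be chosen, and the function names are hypothetical.

```python
def preview_record_count(total_records, percent=5, cap=100):
    """Number of records to show: percent of the total, rounded up, at most cap."""
    if total_records < 0:
        raise ValueError("total_records must not be negative")
    # Integer arithmetic avoids floating-point surprises when rounding up.
    count = (total_records * percent + 99) // 100
    return min(cap, count)


def preview_rows(rows, percent=5, cap=100):
    """Return the leading rows of a dataset as its preview (an assumed choice)."""
    return rows[:preview_record_count(len(rows), percent, cap)]


if __name__ == "__main__":
    for total in (10, 1000, 10000):
        print(total, "records ->", preview_record_count(total), "in preview")
```

With these assumptions, a 1,000-record file yields a 50-record preview, and any file of 2,000 records or more is capped at 100.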
Spatial Data - Raster
| Purpose | Format | Comments |
| Master copy | Photography or remote sensing imagery: Uncompressed TIF + world file or GeoTIFF (preferred), BIL, IMG (Erdas Imagine) | Also applicable for geo-referenced maps. GeoTIFF retains geographic information in the TIFF header; a world file does the same as a separate file. |
| | Non-image raster data: ASCII-based storage and exchange format (Arc Exchange .e00; ArcGenerate .gen; Spatial Data Transfer Standard (SDTS)) | SDTS is a federal standard but is not widely adopted in commercial industry or government; the format is cumbersome for further processing. |
| Service | Photography or remote sensing imagery: GeoTIFF, BIL, IMG, SID + world file | |
| | Non-image raster data: Arc Exchange; GeoTIFF; native data formats (.cdo); native software data models (ArcGRID) | Users will almost always need to process stored data. TIFFs can store pixel value as color value and be converted in GIS software; native data formats are common in federal data. The GRID data model is directory-based, not file-based, but could be stored for access purposes. |
| Deliverable | Photography or remote sensing imagery: GeoTIFF, BIL, IMG, SID + world file, JPG + world file | |
| | Non-image raster data: Arc Exchange, native formats or models, GeoTIFF | |
| Preview | JPG, GIF, or SID | Sizes may need to be slightly larger than those outlined for other types of images. |
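To illustrate what a world file stores alongside a TIF, JPG, or SID image, here is a minimal sketch that reads the six-line world file layout and converts a pixel position to map coordinates. The sample values (30-unit pixels and the corner coordinates) are made up for illustration, and the function names are hypothetical rather than part of any GIS package.

```python
def parse_world_file(text):
    """Parse the six numeric lines of a world file into named parameters."""
    values = [float(token) for token in text.split()]
    if len(values) != 6:
        raise ValueError("a world file must contain exactly six numbers")
    # Line order in the file is A, D, B, E, C, F.
    a, d, b, e, c, f = values
    return {"A": a, "D": d, "B": b, "E": e, "C": c, "F": f}


def pixel_to_map(col, row, wf):
    """Map coordinates of the center of the pixel at (col, row)."""
    x = wf["A"] * col + wf["B"] * row + wf["C"]
    y = wf["D"] * col + wf["E"] * row + wf["F"]
    return x, y


# Hypothetical world file: 30-unit pixels, no rotation, north-up image.
SAMPLE = """30.0
0.0
0.0
-30.0
500015.0
4200985.0
"""

if __name__ == "__main__":
    world = parse_world_file(SAMPLE)
    print(pixel_to_map(0, 0, world))    # center of the upper-left pixel
    print(pixel_to_map(10, 20, world))  # 10 columns right, 20 rows down
```

The negative fourth value reflects that row numbers increase downward while map y coordinates increase upward.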
Spatial Data - Vector
| Purpose | Format | Comments |
| Master copy | ASCII-based exchange format such as SDTS, Arc Exchange (.e00), ArcGenerate (.gen), or delimited text | Note that two of these are tied to proprietary software formats and are not available for all data models. SDTS is available but rarely used in data distribution. |
| Service | Industry standard formats such as ESRI shape (.shp) or ArcInfo Coverage model, or CAD format such as Microstation (.dgn) or AutoCAD (.dwg). Possible storage in SQL-based system through proprietary middleware (ArcSDE, Oracle Spatial) | Note that ESRI's shapefile model consists of several related files. The ArcInfo Coverage model is directory-based. RDBMS models are still relatively new. |
| Deliverable | Industry standard formats such as ESRI shape, Arc Exchange, or CAD formats | |
| Preview | GIF, JPG, or other raster image format | Preview graphics need to be large enough to convey the general "look" of the data. |
Describing Your Digital Resources
The Library suggests a minimum list of categories of information that you should use to describe the content of your resources as well as the nature of the digital files themselves. We recommend that you send email to <lib-metadata-help@virginia.edu> at the start of your project. A librarian with expertise in your subject area will be happy to work with you in setting up a process and identifying appropriate descriptive terminology.
The guidelines that follow outline the type of descriptive information that we recommend you collect and give you some basics for structuring that data. For assistance with creating a database or choosing a metadata format to encode your descriptions, please feel free to send email to <lib-metadata-help@virginia.edu>.
There is important descriptive information to be gathered both about the intellectual content of the resource and about the digital creation. These elements are outlined below. Some fields are strongly recommended, some are required, and others are optional. In order for the Library to take ownership of your resource and/or commit to digital preservation, we ask that you consider all of the fields for describing the intellectual content or the digital resource. The absolutely required fields are marked with asterisks. Please document your practices and standards and be prepared to include that documentation with any data files you deliver to the Library.
The Notes in the third column refer to the notes available online at http://www.lib.virginia.edu/digital/metadata/communityguidelines.html.
Describing the Intellectual Content
| *Title | The actual title of the content of the resource, or a brief descriptive phrase. | |
| *Agent | The name(s) of individuals or organizations that bear some important relationship to the content. At least one agent of some sort is required. Agents have types (creator, publisher, contributor), and one of these types must also be specified in the data. | |
| *Date | Date or date range associated with the creation of the content. | |
| Place | A physical location associated with the creation of the content (e.g. the place of publication or the location of a building or of a painting). | |
| Physical Description | The extent of the resource (number of pages of the print book), physical dimensions (for paintings or sculpture), the medium (bronze, oil), etc. | |
| *Content Type | The nature of the content being described. | |
Describing the Digital Resource
| *Identifier | A name/code for each resource that is unique within your database. | |
| *Access Rights | The level of access that a member of the UVa community or the general public can have to this resource. | |
| Agent | The name(s) of individuals or organizations that bear some important relationship to the digital resource. | |
| *Resource Type | The type of digital object being described. | |
| *Date | The date the digital file was created. | |
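If you keep your descriptions in a database or spreadsheet, a short script can check that the fields marked with asterisks are filled in before you deliver files. The sketch below is one possible approach, not a Library requirement: the dictionary layout and key names (content, digital, agents, and so on) are hypothetical, and treating creator, publisher, and contributor as the complete list of agent types is an assumption.

```python
from collections import Counter

REQUIRED_CONTENT_FIELDS = ("title", "agents", "date", "content_type")
REQUIRED_DIGITAL_FIELDS = ("identifier", "access_rights", "resource_type", "date")
AGENT_TYPES = {"creator", "publisher", "contributor"}  # assumed complete list


def validate_record(record):
    """Return a list of problems found in one record (empty list if none)."""
    problems = []
    content = record.get("content", {})
    digital = record.get("digital", {})
    for name in REQUIRED_CONTENT_FIELDS:
        if not content.get(name):
            problems.append(f"content.{name} is required")
    for name in REQUIRED_DIGITAL_FIELDS:
        if not digital.get(name):
            problems.append(f"digital.{name} is required")
    # Every agent listed for the content must carry one of the agent types.
    for agent in content.get("agents") or []:
        if agent.get("type") not in AGENT_TYPES:
            problems.append(f"agent {agent.get('name')!r} needs a valid type")
    return problems


def duplicate_identifiers(records):
    """Return identifiers that appear more than once across the records."""
    counts = Counter(r.get("digital", {}).get("identifier") for r in records)
    return sorted(i for i, n in counts.items() if i is not None and n > 1)


if __name__ == "__main__":
    example = {
        "content": {
            "title": "View of the Rotunda",
            "agents": [{"name": "Unknown photographer", "type": "creator"}],
            "date": "1895",
            "content_type": "photograph",
        },
        "digital": {
            "identifier": "img0001",
            "access_rights": "public",
            "resource_type": "image",
            "date": "2004-10-08",
        },
    }
    print(validate_record(example))
```

The second helper checks the rule that each identifier must be unique within your database.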
Optional Elements
| Culture | A culture of origin or context for a given resource. | |
| Style | A style or period associated with the content. | |
| Description | Descriptive text, notes, remarks, or comments about the resource. | |
| Language | The language(s) of the intellectual content of the resource. | |
| Subject/Keywords | Topic of the resource. Typically the subject will be expressed as keywords or phrases that describe the subject content of the resource, or terms related to significant associations of people, events, or other contextual information. | |
| Place coverage | A physical location represented by the content (e.g. the geographic subject of a book or the representation of a place within a painting). | |
| Date coverage | Date or date range represented by the content (e.g. the temporal subject of a book). | |
| Relationships | Used to relate two metadata records together, e.g. items in a set, issues of a newspaper, or a painting located within a church. | |
| Mimetype | A standard for the formatting of files so that they can be sent over the Internet. | |
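For the Mimetype element, the value can often be derived from the file extension rather than typed by hand. The following is a small sketch using Python's standard mimetypes module; the helper name and the fallback value are illustrative choices, and the result for less common extensions can vary between systems.

```python
import mimetypes


def guess_mimetype(filename, default="application/octet-stream"):
    """Guess a MIME type from the file extension, falling back to a default."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type or default


if __name__ == "__main__":
    for name in ("page_001.jpg", "interview.mp3", "README"):
        print(name, "->", guess_mimetype(name))
```

A file with no recognizable extension receives the generic fallback value, which is a signal to record the type manually.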
Where to Get More Help
Digital Media Lab
Clemons Library, 3rd Floor
Judy Thomas, jthomas@virginia.edu
Jama Coartney, jama@virginia.edu
http://lib.virginia.edu/clemons/RMC/dml.html

Digital Scholarship Services
Scholars' Lab
Alderman Library, 4th Floor
Donna Tolson, dtolson@virginia.edu
http://www.lib.virginia.edu/scholarslab/

Rare Materials Digital Services
Small Library, 2nd Floor
Bradley Daigle, bjd2b@virginia.edu
http://www.lib.virginia.edu/rmds/

Fiske Kimball Fine Arts Library
Campbell Hall
Liz Gushee, egushee@virginia.edu
http://www.lib.virginia.edu/fine-arts/collections/visual_res.html

Instructional Scanning Services
Alderman Library, 3rd Floor
Mitch Farish, ISS Coordinator, lib-iss@virginia.edu
http://lib.virginia.edu/leo/iss.html

Charles L. Brown Science & Engineering Library Research Computing Lab
Clark Hall
Andrew Sallans, sallans@virginia.edu
http://www.lib.virginia.edu/science/rescomp/
Developed by the UVa Library - October 8, 2004