Digitizing Collections

 
 

In order to broaden access to the Archives’ collections and reduce the impact of frequent handling, the Archives is digitizing its most valuable and most frequently used collections. High-resolution surrogates of the digitized collections are available online for researchers, scholars, and the public to view and to download for personal and educational purposes. A portion of our digitized holdings is placed in the Smithsonian Transcription Center, where volunteers help transcribe these original handwritten documents online.

Digitization Standards

The materials in the Archives’ collections, such as letterpress from the 1850s, glass plate negatives from the early 20th century, and videotape from the 1970s, vary in fragility. The handling and light exposure necessary for digitization contribute to the wear and tear of collections. For video and audio material, tapes that are brittle or suffer from “sticky shed” syndrome may not survive multiple playbacks. Therefore, the Archives employs digital curation methodologies and standards to avoid repeated digitization. The outcome is high-resolution images, audio, and video in preservation-quality digital file formats. The Archives’ metadata standards ensure that descriptive and technical characteristics are noted in its collection management system and embedded in the surrogate files as appropriate. Access derivatives are created from the digital preservation masters to fulfill reference requests and meet public interest.

Digitization Specifications

  • Images: 6,000 pixels along the long axis (minimum 600 ppi), RGB un-compressed TIFF format
  • Audio: Uncompressed Broadcast Wave Format (BWF; WAV), 16 bit depth, sampling rate of 96 kHz or 44.1 kHz for spoken word
  • Video: MPEG-2 4:2:2 and Motion JPEG 2000 (MXF wrapper)

The Archives’ approach is based on several professional standards and best practice guidelines. The Federal Agencies Digitization Guidelines Initiative (FADGI) is a useful distillation of the many applicable guidelines. The Archives meets or exceeds the FADGI guidelines as described below.

Still Images

The following specifications are followed when digitizing materials such as photographs, negatives, documents, manuscripts, diaries, and books.

  • Resolution
    • 6,000 pixels on the long axis of the image
    • Minimum value is 600 ppi (pixels per inch), increasing resolution in intervals of 25 ppi as necessary to achieve a minimum of 6,000 pixels along the long axis.
    • For images from microfilm, a resolution of 300 ppi grayscale is acceptable.
  • Digital File Format
    • Tagged Image File Format (TIFF) using Windows (PC) byte orientation
    • For color images, a 24 bit RGB setting is used, yielding 8 bits per color channel.
    • For black and white images, a 24 bit RGB setting is used.
  • File Compression
    • None
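
The resolution rule above (start at 600 ppi and step up in 25 ppi intervals until the long axis reaches 6,000 pixels) can be sketched as a small calculation. This is an illustrative sketch only, not the Archives’ own tooling: the function name `scan_ppi` and the assumption that the original is measured in inches along its long axis are hypothetical.

```python
import math

MIN_PPI = 600          # minimum scanning resolution stated above
PPI_STEP = 25          # resolution is raised in 25 ppi intervals
TARGET_PIXELS = 6000   # minimum pixel count along the long axis


def scan_ppi(long_axis_inches: float) -> int:
    """Return the lowest ppi (at least 600, in steps of 25) that yields
    at least 6,000 pixels along the long axis of the original."""
    if long_axis_inches <= 0:
        raise ValueError("long_axis_inches must be positive")
    needed = TARGET_PIXELS / long_axis_inches  # exact ppi required
    if needed <= MIN_PPI:
        return MIN_PPI
    # Round up to the next 25 ppi interval above the 600 ppi minimum.
    steps = math.ceil((needed - MIN_PPI) / PPI_STEP)
    return MIN_PPI + steps * PPI_STEP
```

Under these assumptions, an original 10 inches or longer is scanned at the 600 ppi minimum, while a 7-inch original would need 875 ppi (6,125 pixels along the long axis).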

Audiovisual Materials

In years past, audio preservation professionals relied on reformatting to alternate analog formats as a means of preservation. Excellent-quality media formats were available and the digital options were limited. Today, the availability of analog playback equipment is dwindling and the audiovisual industry has shifted to digital production. Digital preservation has become the preferred preservation approach, and it also facilitates access.

With the Archives acquiring over 300 new accessions annually, its collection of audiovisual materials spans both analog and digital recordings. Despite born-digital audiovisual content being much “younger” than its analog counterparts, it faces similar risks to its long-term accessibility.

The Federal Agencies Digitization Guidelines Initiative (FADGI) serves as a useful preservation-quality benchmark for both digitization and digital migration specifications. The following physical care and digitization standards are followed for each media type.

Analog Audiovisual Media Digitization

Upon acquisition or prior to playback, ensure that any record tabs and buttons have been removed or depressed to avoid accidentally recording over the content present.

  • Digital File Format
    • Broadcast Wave File (BWF), uncompressed
    • 16-bit, 48 kHz is preferred for spoken word; 16-bit, 96 kHz for performances, unless originally recorded at the lower rate
  • Metadata
    • See the FADGI Guidelines. The Archives embeds the metadata field values in the BWF preservation files.

Digital Audio Preservation

Digital audio can be received on specialized media like DAT tapes or on computer-based media like a hard drive or CD. In either case, the preservation format and resolution are the same. As with analog media, any record tabs and buttons are removed or depressed to avoid accidentally recording over the content present.

  • Digital File Format
    • Broadcast Wave File (BWF), uncompressed
    • 16-bit, 48 kHz or 44.1 kHz, whichever is the originally recorded sampling rate
  • Metadata
    • See the FADGI Guidelines. The Archives embeds the metadata field values in the BWF preservation files.
  • Digital Audio Received on Computer Storage Media
    • Retain the contents in the originally acquired file format
    • Generate a preservation master following the above specifications
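
As a rough illustration of the audio specifications above, the header of a preservation file could be checked with Python’s standard `wave` module. This is a minimal sketch, not the Archives’ workflow: the function name `check_audio_master` is hypothetical, the set of allowed rates simply collects the sampling rates named in this document, and the check does not inspect the embedded BWF metadata.

```python
import wave

ALLOWED_RATES = {44_100, 48_000, 96_000}  # sampling rates named in this document
REQUIRED_SAMPLE_WIDTH = 2                 # bytes per sample, i.e. 16-bit


def check_audio_master(path: str) -> list[str]:
    """Return a list of problems found in the WAV header.

    An empty list means the header matches the assumed specification.
    """
    problems = []
    try:
        with wave.open(path, "rb") as wav:
            if wav.getsampwidth() != REQUIRED_SAMPLE_WIDTH:
                problems.append(
                    f"bit depth is {wav.getsampwidth() * 8}, expected 16"
                )
            if wav.getframerate() not in ALLOWED_RATES:
                problems.append(
                    f"unexpected sampling rate: {wav.getframerate()} Hz"
                )
    except wave.Error as exc:
        # The wave module rejects files it cannot read as PCM WAV.
        problems.append(f"not readable as uncompressed PCM WAV: {exc}")
    return problems
```

A file that passes returns an empty list; any other result lists what differs from the assumed specification, so the file can be reviewed before it is accepted as a preservation master.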

Analog Video Media Digitization

Upon acquisition or prior to playback, ensure that any record tabs and buttons have been removed or depressed to avoid accidentally recording over the content present.

  • Digital File Format
    • For Preservation: Motion JPEG 2000 with lossless compression in an MXF wrapper
    • For Production Uses: MPEG-2 in a 4:2:2 format
    • For Access: Windows Media Video file and H.264 in a MOV wrapper
  • Metadata
    • Currently, we do not embed metadata into our video files. However, an XML file containing descriptive and technical metadata is generated for, and kept with, the MXF file
  • Digital Video Received on Computer Storage Media
    • Retain the contents in the originally acquired file format, along with any and all codecs accompanying the original content
    • Generate an MPEG-2 4:2:2 format preservation master file
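
The practice of keeping an XML metadata file alongside each MXF file, described in the Metadata item above, might look something like the following. This is a sketch under stated assumptions: the Archives’ actual schema is not given here, so the element names (`videoMetadata`, `descriptive`, `technical`), the function name `write_sidecar`, and the choice to name the sidecar after the video file are all hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def write_sidecar(mxf_path: str, descriptive: dict, technical: dict) -> Path:
    """Write an XML sidecar next to the MXF file and return its path.

    Dictionary keys become element names, so they must be valid XML names.
    """
    mxf = Path(mxf_path)
    # Hypothetical root element; the real schema is not described in the source.
    root = ET.Element("videoMetadata", {"file": mxf.name})
    for section, fields in (("descriptive", descriptive), ("technical", technical)):
        parent = ET.SubElement(root, section)
        for name, value in fields.items():
            ET.SubElement(parent, name).text = str(value)
    # Same folder and base name as the video, with an .xml extension.
    sidecar = mxf.with_suffix(".xml")
    ET.ElementTree(root).write(str(sidecar), encoding="utf-8", xml_declaration=True)
    return sidecar
```

Keeping the sidecar in the same folder and under the same base name as the video is one simple way to make sure the two files travel together.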

Accessing the Archives’ Digitized Collections

Our digitized holdings are accessible through the Archives’ website, the Smithsonian’s Collections Search Center, and cultural heritage aggregators like the Digital Public Library of America. Individually cataloged documents with online surrogates can be found by searching for the content, then selecting the “Media” heading at the top of the search result list. For digital content within a collection, the collection’s finding aid indicates when digital content is available and links to where the content can be found online.