Smithsonian Institution Archives

The Bigger Picture: Visual Archives and the Smithsonian

Digital Video Preservation: Identifying Containers and Codecs

by Killian Escobedo, Intern, Digital Services Division on July 26, 2011

In addition to a rich collection of analog moving image material currently being digitized, the Smithsonian Institution Archives (SIA) accessions large quantities of born-digital video from hard drives, CDs, DVDs, and websites across the Institution. Just as digitization preserves moving image content before it degrades on an analog carrier, digital material must be retrieved from its storage media before hardware failure or media degradation renders the content unplayable. The digital video files selected for archiving, like other electronic records at the Archives, are then incorporated into digital preservation workflows to ensure that the files remain playable for future generations.

MediaInfo detects video and audio streams in an AVI file, identifying the video codec as Indeo 4. The application can also detect other attributes including a frame rate of 10.00 FPS. That’s almost slide show speed!

Ensuring the longevity of digital video at the Smithsonian begins with inventorying a collection of video files and capturing the technical information (codec, resolution, frame rate, etc.) for each file so that its format can be identified. This process provides the means for assessing what may be at risk of obsolescence and for determining what needs to be prioritized for preservation. And because thousands of video files have already been accessioned, with the potential for that number to grow exponentially in the coming years, understanding what’s in the Archives’ collections is key to developing priorities, better management practices, and preservation strategies for digital video.
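As a rough illustration of the inventory step, the Python sketch below walks a folder and records each file’s name, extension, and size, plus a guess at the container based on the file’s first bytes. The function names, the short signature list, and the column names are assumptions made for this example, not the Archives’ actual tooling. Codec, resolution, and frame rate would still have to come from an analysis tool such as MediaInfo.

```python
import csv
import os

FIELDS = ["file_name", "extension", "size_bytes", "container_guess"]


def sniff_container(header):
    """Guess a container from a file's leading bytes (illustrative, not exhaustive)."""
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "AVI"
    if header[:4] == b"\x1a\x45\xdf\xa3":  # EBML header used by Matroska and WebM
        return "Matroska/WebM"
    if header[:4] == b"\x00\x00\x01\xba":  # MPEG program stream pack start code
        return "MPEG program stream"
    if header[4:8] in (b"ftyp", b"moov", b"mdat"):  # common leading atoms
        return "MP4/QuickTime"
    if header[:4] == b".RMF":
        return "RealMedia"
    return "unknown"


def inventory(root):
    """Return one row per file found under root."""
    rows = []
    for folder, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(folder, name)
            with open(path, "rb") as fh:
                header = fh.read(12)
            rows.append({
                "file_name": name,
                "extension": os.path.splitext(name)[1].lower(),
                "size_bytes": os.path.getsize(path),
                "container_guess": sniff_container(header),
            })
    return rows


def write_inventory(rows, out_path):
    """Save the rows as a CSV file that a spreadsheet can open."""
    with open(out_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Comparing the extension column with the container guess is a quick way to spot files whose extension does not match their contents.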

VLC media player cannot play video encoded with the Indeo 4 codec. VLC politely informs the user with an error message and provides the FOURCC (the four-character code that identifies the codec) so that he or she can seek another source to enable playback of the file.
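The FOURCC mentioned above is stored in the file itself. In an AVI file, each stream header chunk (marked "strh") carries a stream type such as "vids" followed by the handler FOURCC. The Python sketch below pulls those codes out with a naive byte scan. The function name is hypothetical, and a real tool would parse the RIFF chunk structure properly rather than searching for the marker.

```python
def avi_stream_fourccs(data):
    """Return (stream_type, handler_fourcc) pairs found in AVI header bytes."""
    found = []
    pos = data.find(b"strh")
    while pos != -1:
        # Layout: 'strh' (4 bytes), chunk size (4), stream type (4), handler (4).
        fields = data[pos + 8:pos + 16]
        if len(fields) == 8:
            stream_type = fields[:4].decode("ascii", errors="replace")
            handler = fields[4:].decode("ascii", errors="replace")
            found.append((stream_type, handler))
        pos = data.find(b"strh", pos + 4)
    return found
```

Calling the function on the first few kilobytes of an AVI file is usually enough, since the stream headers sit near the start. For Indeo 4 video, the handler code is typically IV41.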

Digital video files wrap video and audio streams in a container, or wrapper, that is typically identified by the file’s extension, so the extension is an important detail for archivists to track. Because uncompressed video is so large, streams are often compressed to more manageable sizes by a compressor/decompressor program called a codec. Media player applications like Windows Media, RealPlayer, QuickTime, and VLC detect the codec type and call on a program to decode the video and audio streams for playback. Some codecs are lossless, meaning the compression is mathematically reversible and no data is lost in the process. Others are lossy: they discard some data to achieve smaller files, which makes them an effective means of providing high-quality access copies. However, because data loss weighs heavily on an archivist’s conscience, any digital video format used for preservation will use lossless compression or no compression at all.
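To make "mathematically reversible" concrete, here is a small Python sketch. It uses the general-purpose zlib compressor from the standard library as a stand-in for a lossless video codec (real lossless video codecs work differently, but the round-trip property is the same), and a crude rounding step as a stand-in for lossy compression. The fake frame data is invented for the example.

```python
import zlib

# Stand-in for one uncompressed frame: a repeating 0-255 gradient (4,096 bytes).
frame = bytes(range(256)) * 16

# Lossless: compress, then decompress, and every byte comes back unchanged.
compressed = zlib.compress(frame)
restored = zlib.decompress(compressed)
print("lossless round trip identical:", restored == frame)
print("smaller than the original:", len(compressed) < len(frame))

# Lossy stand-in: drop the low four bits of every byte before storing it.
# The discarded detail cannot be recovered afterwards.
lossy = bytes(b & 0xF0 for b in frame)
print("lossy copy identical:", lossy == frame)
```

The first two checks print True and the last prints False, which is the distinction that matters when choosing a preservation format.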

 

This video from Smithsonian Institution Archives Accession 11-014 is an excerpt of a larger video file found on a CD-ROM created in 2000 by the National Museum of Natural History. Before being converted for playback on the web, the video was in an MPG file container and was compressed with the MPEG-2 codec, which is playable in most media players. Initially, it was converted to MOV but would not play in YouTube. This version is now in WebM for YouTube playback.

For my internship at the Archives, I inventoried a variety of video files, taking note of each file’s container and codec types. The inventory yielded almost ten thousand video files with over twenty different container types, fifty video codec types, and twenty audio codec types, all of which were tested for playback in the Windows Media, RealPlayer, QuickTime (Mac and Windows), and VLC media players. As it turned out, some 20 percent of those video files would not play in any of those four media player applications. Surprisingly, some of the relatively modern codec types were more susceptible to playback issues than the older, more obscure codecs, which appear to have more established support in consumer media players. As you might imagine, a file that can play in only one media player is at greater risk of format and software obsolescence.
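The kind of tally described above can be sketched in a few lines of Python. The rows and the column layout are made up for illustration (they are not the internship’s data); each row records which players were able to play a file.

```python
# Hypothetical inventory rows: a file name plus the players that played it.
rows = [
    {"file": "a.mpg", "plays_in": ["Windows Media", "QuickTime", "VLC"]},
    {"file": "b.avi", "plays_in": []},
    {"file": "c.rm", "plays_in": ["RealPlayer"]},
    {"file": "d.mov", "plays_in": ["QuickTime", "VLC"]},
]


def playback_summary(rows):
    """Count files that play nowhere and files that depend on a single player."""
    unplayable = [r["file"] for r in rows if not r["plays_in"]]
    single_player = [r["file"] for r in rows if len(r["plays_in"]) == 1]
    percent = 100 * len(unplayable) / len(rows) if rows else 0.0
    return {
        "unplayable": unplayable,
        "single_player": single_player,
        "percent_unplayable": percent,
    }


summary = playback_summary(rows)
print(summary)
```

Files in the single-player list are the ones most exposed if that one application is discontinued, so they are natural candidates for early attention.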

 

This excerpted video, from Smithsonian Institution Archives Accession 05-173, was accessioned as a result of efforts to preserve the Smithsonian 150th anniversary website. The file was created in 1997. At first I could get it to play both video and sound only in RealPlayer, but I was eventually able to get it to play in YouTube.

I used various software applications to analyze and identify video and audio streams and their respective codecs, but each application had its own nomenclature for identifying a video file and its streams. Terms like “format,” “format name,” “format profile,” “compressor name,” “codec,” and “codec ID” were all used to identify a codec. Resources like Wikipedia and MultimediaWiki turned out to be helpful in keeping a consistent scheme for the identification of codecs and addressing discrepancies between identification tools.
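One way to reconcile the differing vocabularies is a pair of lookup tables: one that recognizes the labels a tool might use for the codec field, and one that maps the values reported to a single preferred name. The labels below are the ones quoted above. The alias entries, the function name, and the sample reports are assumptions for illustration and do not reflect any particular tool’s actual output.

```python
# Labels that different tools used for what is, in effect, the codec field.
CODEC_LABELS = {
    "format", "format name", "format profile",
    "compressor name", "codec", "codec id",
}

# Illustrative aliases mapping reported values to one preferred name.
CODEC_ALIASES = {
    "iv41": "Indeo 4",
    "indeo 4": "Indeo 4",
    "mpeg-2": "MPEG-2",
    "mpeg2video": "MPEG-2",
}


def canonical_codec(report):
    """Return the preferred codec name found in one tool's report, or None."""
    for label, value in report.items():
        if label.strip().lower() in CODEC_LABELS:
            name = CODEC_ALIASES.get(str(value).strip().lower())
            if name:
                return name
    return None


# Two hypothetical reports describing the same file in different terms.
print(canonical_codec({"Codec ID": "IV41", "Frame rate": "10.00"}))
print(canonical_codec({"compressor name": "Indeo 4"}))
```

Values that fall through to None are the ones worth checking by hand against a reference such as MultimediaWiki before adding a new alias.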

Automating the capture of all this technical metadata is crucial in accessioning digital video into the Archives, especially as this data will serve as a key tool in managing and accessing these assets and in making preservation-related decisions throughout their lives in the Archives.

Categories: What Gets Saved
Tags: Web/Tech, Film/Video, Conservation

Comments (4)

Earl J Moniz

Killian, most excellent article... We are considering MediaBeacon for our digital asset management system to get employed in the next month or so... I'm interested in how you cataloged all this metadata you dis/un/covered during this stage of the conversion. It appears that the LOC may move away from MARC in the not-too-distant future. Does the Smithsonian have any such plan as well? Just curious. Since we haven't deployed our solution just yet, I'm curious about whether we should even consider MARC in our deployment? Until that time ... Earl J.

Earl J Moniz July 30, 2011 at 1:09 pm
Sadek

Nice article about Digital Video Preservation! It was a pleasure to read it

Sadek July 26, 2011 at 12:58 pm
Graham Hukill

This is a stellar post Killian, learning stuff left and right.

Graham Hukill July 26, 2011 at 11:34 am
Lynda

Earl, SIA is currently not cataloging these items with the extensive technical metadata detailed in the post, but rather exploring options for doing so in the future. The initial inventory was a simple spreadsheet with file name, accession number, file extension, successful playback, which players, etc. This data can be imported into a future system. We do have MARC records and finding aids for our accessions at the collection level. Best wishes, Lynda

Lynda August 4, 2011 at 1:24 pm

Produced by the Smithsonian Institution Archives. For copyright questions, please see the Terms of Use.
