The Bigger Picture: Visual Archives and the Smithsonian
Digital Video Preservation: Identifying Containers and Codecs
In addition to a rich collection of analog moving image material currently being digitized, the Smithsonian Institution Archives (SIA) accessions large quantities of born-digital video from various hard drives, CDs, DVDs, and websites across the Institution. Just as digitization preserves moving image content before it degrades on an analog carrier, digital material must be retrieved from hard drives and optical media before hardware failure or media degradation renders the content unplayable. The digital video files that are selected to be archived, like other electronic records at the Archives, are then incorporated into digital preservation workflows to ensure that the files will remain playable for future generations.
Ensuring the longevity of digital video at the Smithsonian begins with inventorying a collection of video files and capturing the technical information (codec, resolution, frame rate, etc.) for each file so its format can be identified. This process provides the means for assessing what may be at risk of obsolescence and for determining what needs to be prioritized for preservation. Finally, because thousands of video files have already been accessioned, and that number has the potential to grow exponentially in the coming years, understanding what’s in the Archives’ collections is key to developing priorities, better management practices, and preservation strategies for digital video.
Digital video files wrap video and audio streams in a container, or wrapper, that is typically identified by the file’s extension, which makes the extension important for archivists to track. Because uncompressed video is so large, streams are often compressed to more manageable sizes by a compressor/decompressor program called a codec. Media player applications like Windows Media, RealPlayer, QuickTime, and VLC detect the codec type and call on a program to decode the video and audio streams for playback. Some codecs are lossless, meaning the compression is mathematically reversible and no data is lost in the compression process. Other codecs are lossy; they discard some data to achieve smaller files and are an effective means of providing high-quality access copies. However, because data loss weighs heavily on an archivist’s conscience, any digital video format used for preservation will use lossless compression or no compression at all.
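As a rough illustration of how an extension hints at a container, here is a minimal Python sketch. The mapping table and the function name guess_container are assumptions made for this example, not the Archives' actual list or tooling, and an extension is only a hint: a file can be misnamed, and the extension says nothing about the codecs inside.

```python
from pathlib import Path

# Illustrative and deliberately incomplete mapping (an assumption for this
# sketch). Extensions hint at the container only, never at the codec.
CONTAINER_BY_EXTENSION = {
    ".mpg": "MPEG program stream",
    ".mpeg": "MPEG program stream",
    ".mov": "QuickTime",
    ".avi": "AVI",
    ".wmv": "Advanced Systems Format (ASF)",
    ".rm": "RealMedia",
    ".webm": "WebM",
    ".mp4": "MPEG-4 Part 14",
}


def guess_container(filename: str) -> str:
    """Return a likely container name based on the file extension alone."""
    extension = Path(filename).suffix.lower()
    return CONTAINER_BY_EXTENSION.get(extension, "unknown")


if __name__ == "__main__":
    for name in ("clip.MPG", "tour.mov", "notes.txt"):
        print(name, "->", guess_container(name))
```

Because the guess rests on the file name alone, a result of "unknown" or a surprising match is a cue to inspect the file with an analysis tool rather than a final answer.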
This video from Smithsonian Institution Archives Accession 11-014 is an excerpt of a larger video file found on a CD-ROM created in 2000 by the National Museum of Natural History. Before being converted for playback on the web, the video was in an MPG file container and was compressed with the MPEG-2 codec, which is playable in most media players. Initially, it was converted to MOV but would not play in YouTube. This version is now in WebM for YouTube playback.
For my internship at the Archives, I inventoried a variety of video files, taking note of each file’s container and codec types. The inventory yielded almost ten thousand video files with over twenty different container types, fifty video codec types, and twenty audio codec types, all of which were tested for playback in the Windows Media, RealPlayer, QuickTime (Mac and Windows), and VLC media players. As it turned out, some 20 percent of those video files would not play in those four media player applications. Surprisingly, some of the relatively modern codec types were more susceptible to playback issues than the older, more obscure codecs, which appear to have more established support in consumer media players. As you might imagine, a file that can only play in one media player is at greater risk of format and software obsolescence.
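An inventory like this can be tallied with a few lines of Python. The sketch below is only an illustration: the rows are hypothetical placeholders rather than real inventory data, and the field names and the three risk labels are assumptions chosen for the example.

```python
from collections import Counter

# Hypothetical placeholder rows; none of these are real inventory entries.
inventory = [
    {"file": "example_a.mpg", "container": "MPG", "video_codec": "MPEG-2",
     "plays_in": ["Windows Media", "QuickTime", "VLC"]},
    {"file": "example_b.rm", "container": "RM", "video_codec": "RealVideo",
     "plays_in": ["RealPlayer"]},
    {"file": "example_c.mov", "container": "MOV", "video_codec": "Sorenson Video",
     "plays_in": []},
]


def risk_level(row):
    """Label a file by how many tested players could play it."""
    players = len(row["plays_in"])
    if players == 0:
        return "unplayable"
    if players == 1:
        return "single-player"
    return "multi-player"


def summarize(rows):
    """Count containers, video codecs, and risk labels across the inventory."""
    return {
        "containers": Counter(r["container"] for r in rows),
        "video_codecs": Counter(r["video_codec"] for r in rows),
        "risk": Counter(risk_level(r) for r in rows),
    }


def share_unplayable(rows):
    """Fraction of files that played in none of the tested players."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if not r["plays_in"]) / len(rows)


if __name__ == "__main__":
    summary = summarize(inventory)
    print(summary["risk"])
    print(f"{share_unplayable(inventory):.0%} unplayable")
```

Files labeled "single-player" or "unplayable" would be natural first candidates for a closer look when setting preservation priorities.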
This excerpted video, from Smithsonian Institution Archives Accession 05-173, was accessioned as a result of efforts to preserve the Smithsonian 150th anniversary website. The file was created in 1997. At first I could only get it to play both video and sound in RealPlayer, but I was eventually able to get it to play on YouTube.
I used various software applications to analyze and identify video and audio streams and their respective codecs, but each application had its own nomenclature for identifying a video file and its streams. Terms like “format,” “format name,” “format profile,” “compressor name,” “codec,” and “codec ID” were all used to identify a codec. Resources like Wikipedia and MultimediaWiki turned out to be helpful in keeping a consistent scheme for the identification of codecs and addressing discrepancies between identification tools.
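One way to reconcile these differing labels is to treat them all as candidates for a single codec field and then map the reported values onto one preferred name. The sketch below assumes each tool's report has already been read into a Python dictionary; the field priority order, the alias table, and the function name identify_codec are hypothetical choices for illustration, not the scheme used at the Archives.

```python
from typing import Optional

# Field labels that different tools may use for the codec, in the order this
# sketch checks them (the ordering is an assumption).
CODEC_FIELDS = (
    "codec", "codec id", "compressor name",
    "format", "format name", "format profile",
)

# Illustrative alias table: several spellings mapped to one preferred name.
CANONICAL_CODEC = {
    "mpeg2video": "MPEG-2",
    "mpeg-2 video": "MPEG-2",
    "mpeg2": "MPEG-2",
}


def normalize_key(key: str) -> str:
    """Lowercase a field label and treat underscores as spaces."""
    return key.strip().lower().replace("_", " ")


def identify_codec(report: dict) -> Optional[str]:
    """Return a consistent codec name from one tool's report, if present."""
    fields = {normalize_key(k): str(v).strip() for k, v in report.items()}
    for label in CODEC_FIELDS:
        value = fields.get(label)
        if value:
            # Fall back to the tool's own wording when no alias is known.
            return CANONICAL_CODEC.get(value.lower(), value)
    return None


if __name__ == "__main__":
    print(identify_codec({"Compressor Name": "MPEG-2 Video"}))
    print(identify_codec({"codec_id": "mpeg2video"}))
```

Values that fall through to the tool's own wording are the ones worth checking against a reference such as MultimediaWiki before adding them to the alias table.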
Automating the capture of all this technical metadata is crucial in accessioning digital video into the Archives, especially as this data will serve as a key tool in managing and accessing these assets and in making preservation-related decisions throughout their lives in the Archives.
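As one possible approach to that automation, the sketch below parses the JSON report that FFmpeg's ffprobe tool can produce and keeps only the container, codec, resolution, and frame rate. The choice of ffprobe is an assumption made for this example, since the tools used for the inventory are not named here; the sample report is hypothetical, and the function name summarize_probe is likewise illustrative.

```python
import json
from fractions import Fraction

# Command assumed to have produced the report (ffprobe ships with FFmpeg):
#   ffprobe -v quiet -print_format json -show_format -show_streams FILE

# Hypothetical sample report, trimmed to the fields this sketch reads.
SAMPLE = """
{
  "format": {"format_name": "mpeg"},
  "streams": [
    {"codec_type": "video", "codec_name": "mpeg2video",
     "width": 720, "height": 480, "r_frame_rate": "30000/1001"},
    {"codec_type": "audio", "codec_name": "mp2",
     "sample_rate": "48000", "channels": 2}
  ]
}
"""


def summarize_probe(json_text):
    """Reduce an ffprobe JSON report to a small technical-metadata record."""
    data = json.loads(json_text)
    record = {
        "container": data.get("format", {}).get("format_name"),
        "video": [],
        "audio": [],
    }
    for stream in data.get("streams", []):
        kind = stream.get("codec_type")
        if kind == "video":
            try:
                # Frame rate is reported as a ratio such as "30000/1001".
                fps = round(float(Fraction(stream.get("r_frame_rate", "0/0"))), 3)
            except (ValueError, ZeroDivisionError):
                fps = None
            record["video"].append({
                "codec": stream.get("codec_name"),
                "resolution": (stream.get("width"), stream.get("height")),
                "frame_rate": fps,
            })
        elif kind == "audio":
            record["audio"].append({
                "codec": stream.get("codec_name"),
                "sample_rate": stream.get("sample_rate"),
                "channels": stream.get("channels"),
            })
    return record


if __name__ == "__main__":
    print(summarize_probe(SAMPLE))
```

A record in this shape could be written to a spreadsheet row per file, which keeps the inventory portable if it is later imported into another system.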
Comments (4)
Killian, most excellent article... We are considering MediaBeacon for our digital asset management system to get employed in the next month or so... I'm interested in how you cataloged all this metadata you dis/un/covered during this stage of the conversion. It appears that the LOC may move away from MARC in the not-too-distant future. Does the Smithsonian have any such plan as well? Just curious. Since we haven't deployed our solution just yet, I'm curious about whether we should even consider MARC in our deployment? Until that time ... Earl J.
Nice article about Digital Video Preservation! It was a pleasure to read it
This is a stellar post Killian, learning stuff left and right.
Earl, SIA is currently not cataloging these items with the extensive technical metadata detailed in the post, but rather exploring options for doing so in the future. The initial inventory was a simple spreadsheet with file name, accession number, file extension, successful playback, which players, etc. This data can be imported into a future system. We do have MARC records and finding aids for our accessions at the collection level. Best wishes, Lynda
Produced by the Smithsonian Institution Archives. For copyright questions, please see the Terms of Use.

