I recently attended the 2012 Association of Moving Image Archivists (AMIA) Annual Conference, and digital video was a common thread in most of the presentations. The topics ranged from how to digitize film, to providing online access to digital files, to ways of preserving born-digital video files. There was also a workshop on using FFmpeg, an open source command line program (you type commands, as in DOS, for those who remember it) that converts audio and video files into a variety of different formats. When digital video is created, a codec is used to encode the raw data for storage and later to decode it into something you can view, and many different codecs are in use. Additionally, digital video is wrapped together with its audio in a container to form a package. Usually, the codec and container for a given video file are specific to the proprietary software that was used to create the file. FFmpeg is a tool archivists can use to decode a multitude of audio and video file formats with differing codecs and containers and convert them into a preservation format.
Last summer, Killian Escobedo, intern for the Digital Services Division, wrote about some of the challenges of born-digital video preservation, including the occasional inability to determine the codec and container format for a given video file. The FFmpeg family contains a program called ffprobe, which uses two libraries of codecs and containers (libavcodec and libavformat, respectively) to extract technical metadata and determine the codec and container of just about any digital video file. The libavcodec and libavformat libraries can also be integrated with open source media players like VLC, which allows the player to play back any file whose codec and container are listed in the libraries. One important aspect of these libraries is that once a codec or container is added, it will only be removed if it poses a security risk. This is especially important because materials are usually accessioned several years after they were created. Both the codec and the container information are needed to play back digital video, and archivists can use FFmpeg to get that information for file formats that may have become obsolete.
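To make the ffprobe step concrete, here is a minimal sketch in Python of how a file's container and codecs might be read from ffprobe's JSON report. It assumes ffprobe is installed and on the system path; the function names are my own, chosen for illustration, and the script is not something that was shown at the workshop.

```python
import json
import subprocess


def ffprobe_command(path):
    """Build an ffprobe call that reports container and stream details as JSON."""
    return [
        "ffprobe",
        "-v", "error",            # only print real errors, not the banner
        "-print_format", "json",  # machine-readable report
        "-show_format",           # container-level metadata
        "-show_streams",          # one entry per video/audio stream
        path,
    ]


def summarize(probe_json):
    """Pull the container name and each stream's codec out of ffprobe's JSON."""
    data = json.loads(probe_json)
    container = data.get("format", {}).get("format_name")
    codecs = [
        (stream.get("codec_type"), stream.get("codec_name"))
        for stream in data.get("streams", [])
    ]
    return container, codecs


def identify(path):
    """Run ffprobe on a file (requires ffprobe to be installed)."""
    result = subprocess.run(
        ffprobe_command(path), capture_output=True, text=True, check=True
    )
    return summarize(result.stdout)
```

Calling `identify()` on a file path (any path used here would be a placeholder) returns the container name along with a list of stream types and codec names, which is the information that was missing for the files Killian described.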
While FFmpeg contains several tools for analyzing existing digital video, its main purpose is to convert digital video and audio files from one codec and container format to another using a command line interface. The transcoding process starts by unwrapping the container and decoding the video to get to the raw data, and then encodes that data into the codec and container specified in the command. The commands to simply transcode a video from one container and codec to another are fairly basic, but FFmpeg also allows you to make additional transformations during the transcode process, such as specifying a new aspect ratio or bit rate for the video. These transformations are not ideal for the preservation of digital video because they can drastically change how the video looks. Additionally, FFmpeg can be used to generate MD5 checksums for a video after it has been converted from one format to another. This is a more reliable check than the checksums from programs like JHOVE or DROID because FFmpeg looks at the frame-by-frame raw data contained within the video, which should remain the same even once the codec and container have been changed, provided the conversion is lossless.
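The transcode-and-verify workflow described above might look something like the following Python sketch. It assumes ffmpeg is installed, and it assumes FFV1 video in a Matroska file as the target, which is my choice for illustration and not a format named at the workshop. The function names are likewise hypothetical. FFmpeg's `framemd5` output format writes one MD5 per decoded frame, and those per-frame values are what get compared.

```python
import subprocess


def transcode_command(src, dst):
    """Build a plain transcode with no resizing or bit rate changes."""
    # Assumption: FFV1 (a lossless codec) for video; audio is copied unchanged.
    return ["ffmpeg", "-i", src, "-c:v", "ffv1", "-c:a", "copy", dst]


def framemd5_command(src):
    """Build a call that prints one MD5 per decoded video frame to stdout."""
    return [
        "ffmpeg", "-v", "error",
        "-i", src,
        "-map", "0:v:0",    # first video stream only
        "-f", "framemd5",   # per-frame checksum report
        "-",                # write the report to stdout
    ]


def frame_hashes(report):
    """Extract the hash column from a framemd5 report, skipping '#' header lines."""
    hashes = []
    for line in report.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Columns are: stream, dts, pts, duration, size, hash
        hashes.append(line.split(",")[-1].strip())
    return hashes


def same_frames(src, dst):
    """Compare the decoded frames of two files (requires ffmpeg to be installed)."""
    reports = []
    for path in (src, dst):
        result = subprocess.run(
            framemd5_command(path), capture_output=True, text=True, check=True
        )
        reports.append(frame_hashes(result.stdout))
    return reports[0] == reports[1]
```

Only the hash column is compared because timestamps can be written differently by different containers. The hashes should match only when the transcode is lossless and FFmpeg did not have to change the pixel format along the way. A mismatch is a signal to look more closely at the conversion and does not by itself prove the file is damaged.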
Thanks to the workshop, I now have a greater understanding of how to use FFmpeg to convert our born-digital video to a preservation format. I am looking forward to running ffprobe on the files that Killian was unable to identify to see if it can determine the codec and container formats of some of the more complicated files. Hopefully, this will help with the development of a long-term preservation plan for the multitude of codecs and containers that are rapidly becoming obsolete.
- Digital Video Preservation: Further Challenges for Preserving Digital Video and Beyond, The Bigger Picture blog, Smithsonian Institution Archives
- Association of Moving Image Archivists