The Archives manages and preserves digital collections containing a wide variety of content, including documents and spreadsheets, images, audio, video, text, email, databases, architectural designs, scientific data sets, geographical data, one-of-a-kind software, and website and social media records.
In most circumstances, the Archives acquires born-digital material, or electronic records, as part of a larger record accession, received on removable storage media and intermingled with analog records. Files on acquired media are often 10-15 years old, but can range from a week old to more than 30 years old. In virtually every instance, the content on each piece of digital media is unique. Copies or backups are rarely included. Scale is also an issue; a given accession may contain a handful of digital files or an email account containing hundreds of thousands of messages.
The Archives’ preservation approach aligns with professional best practices and begins with stabilization and ingest. Shortly after acquisition, the electronic records are transferred to a temporary location on a network server to avoid relying on hardware and operating systems that are already obsolete or will be within a few years. Multiple backups of the working environment and its contents ensure redundancy (LOCKSS: Lots of Copies Keep Stuff Safe), and offline copies provide a means of disaster recovery should it ever be required.
In addition to transferring the digital content, the Archives generates fixity markers for each file and uses them to verify integrity and authenticity throughout the life of the digital object. Assessment of the digital content is an essential step in the ingest phase. File format recognition software is used to determine file types and software versions and to inform preservation decisions. If a format is in danger of becoming obsolete soon, action is taken to ensure continued accessibility. Documentation records the steps taken at each stage of the process and by whom.
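The ingest steps described above (generating a fixity marker per file and recognizing its format) can be sketched roughly as follows. This is a minimal illustration, not the Archives' actual tooling: the function names, the manifest layout, and the tiny signature table are assumptions made for the example, and production format recognition software relies on far larger signature registries.

```python
import hashlib
from pathlib import Path

# Illustrative file signatures ("magic numbers") only; this short table is an
# assumption for the sketch and is nowhere near a complete registry.
SIGNATURES = {
    b"%PDF-": "PDF",
    b"II*\x00": "TIFF (little-endian)",
    b"MM\x00*": "TIFF (big-endian)",
    b"\x89PNG\r\n\x1a\n": "PNG",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def guess_format(path):
    """Guess a format name from the first bytes of the file."""
    with open(path, "rb") as handle:
        header = handle.read(16)
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "unknown"


def build_manifest(root):
    """Map each file's relative path to its fixity value and guessed format."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            manifest[rel] = {"md5": md5_of(path), "format": guess_format(path)}
    return manifest
```

Calling `build_manifest` on the temporary transfer location would yield one record per file, which could then be stored alongside the process documentation.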
A Blend of Three Strategies
With every accession, the Archives keeps a bit-level preserved version of the digital holdings. Where appropriate, a migration strategy is also applied and the results are kept in a parallel set of files. When necessary for research access, an emulation plan is developed.
Bit-level preservation keeps a file in its original format with each bit preserved in its original order. This strategy ensures that every aspect of the original file is retained with integrity. It does not, however, guarantee future accessibility. An example is a document created in 1985 with proprietary software that no longer exists or that current operating systems cannot run, and for which no viewer or conversion software is available. Bit-level means the file is kept as is. Whatever is known about the file or files, such as the software used, is also noted for future work. In some instances the file might become viewable later, as other technologies develop that make migration or emulation viable strategies.
If a file is already in an acceptable preservation format, such as uncompressed TIFF, no additional action is taken at that time. A copy of the file might be revisited after five years to confirm it is still renderable and reliable. MD5 hashes are also rechecked to confirm the file is still intact and has not become corrupted.
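A periodic fixity recheck of this kind can be sketched as below. It is an illustration under stated assumptions: `verify_fixity` and the shape of the `recorded` mapping (relative path to MD5 hex digest) are hypothetical names chosen for the example, not a description of the Archives' systems.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(root, recorded):
    """Compare current MD5 values with previously recorded ones.

    `recorded` maps a path relative to `root` to the MD5 hex digest that was
    stored at ingest. Returns lists of intact, changed, and missing files.
    """
    root = Path(root)
    report = {"ok": [], "changed": [], "missing": []}
    for rel, expected in sorted(recorded.items()):
        target = root / rel
        if not target.is_file():
            report["missing"].append(rel)
        elif md5_of(target) == expected:
            report["ok"].append(rel)
        else:
            report["changed"].append(rel)
    return report
```

Any entry under `changed` or `missing` would signal that a file should be restored from one of the redundant copies.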
Depending on the significant properties of a digital object, migrating it from its original file format to a preservation-quality file format may be the best strategy to ensure both preservation and enduring access. An appropriate preservation format is selected according to those properties and the transformation is performed. The migrated version is stored together with the original.
The intent is to preserve the content completely, along with its intended look and functionality. The Archives will migrate a file to its accepted preservation format when required and if possible. For instance, a word-processing document created in proprietary software will be preserved as a PDF/A or PDF file, since PDF is a well-established and open format that does not rely on one specific software platform or version for renderability. Because the original file is always retained, future preservation work can still be performed directly from the original format.
When migration is not appropriate and the bit-level-preserved original cannot be opened with readily available equipment, an emulation strategy is appropriate. An emulation strategy creates an environment that renders the file as it would have appeared or functioned in its original form. Emulation can be useful for complex digital packages, including computer games, digital art, multimedia, and executables that rely on specific hardware, operating systems, and software to perform and render accurately. An example is an old computer game written for a now-obsolete operating system such as Atari DOS that can function on a current machine thanks to an emulator.
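How the three strategies combine can be summarized in a small decision sketch. This is only one reading of the text above: the `Assessment` fields and the `plan_preservation` function are hypothetical, and real decisions rest on an archivist's judgment about significant properties rather than on four flags.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Assessment:
    """Hypothetical summary of what was learned about a file at ingest."""
    in_preservation_format: bool      # e.g. already uncompressed TIFF
    migration_target: Optional[str]   # e.g. "PDF/A"; None if migration does not fit
    renderable_today: bool            # opens with readily available equipment
    needed_for_research: bool         # access has actually been requested


def plan_preservation(assessment: Assessment) -> List[str]:
    """Return the strategies to apply, always starting with bit-level."""
    steps = ["bit-level"]  # the original is retained in every case
    if assessment.in_preservation_format:
        return steps  # no further action until the next periodic review
    if assessment.migration_target is not None:
        steps.append(f"migrate to {assessment.migration_target}")
    elif not assessment.renderable_today and assessment.needed_for_research:
        steps.append("emulate")
    return steps
```

For example, a proprietary word-processing document with `migration_target="PDF/A"` would yield bit-level preservation plus migration, while an unrenderable game requested by a researcher would yield bit-level preservation plus emulation.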