These are some of the common challenges that archives and other organizations face in digital preservation.
Proprietary and Obsolete Formats
Formats that are optimal for long-term preservation and access tend to be open, well established, and not dependent on a single software application, hardware platform, or operating system. If the Archives receives a digital file that is not already in an accepted preservation format, it determines whether the file is at immediate risk of obsolescence and, if so, migrates it to a preservation format where possible.
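The triage step described above can be sketched as a simple decision function. This is a hypothetical illustration, not the Archives' actual workflow: the extension lists below are assumptions chosen for the example, and a real repository would identify formats by file content rather than by extension alone (an extension cannot, for instance, distinguish PDF/A from ordinary PDF).

```python
from pathlib import Path

# Hypothetical policy tables for illustration only; the Archives' actual
# lists of accepted and at-risk formats are not given in the text.
PRESERVATION_EXTENSIONS = {".pdf", ".wav"}  # assumed stand-ins for PDF/A and Broadcast WAV
AT_RISK_EXTENSIONS = {".doc", ".wpd"}       # assumed examples of legacy word-processing files


def triage(filename: str) -> str:
    """Return a hypothetical action for an incoming file, based only on its extension."""
    ext = Path(filename).suffix.lower()
    if ext in PRESERVATION_EXTENSIONS:
        return "keep"     # already in an accepted preservation format
    if ext in AT_RISK_EXTENSIONS:
        return "migrate"  # at immediate risk of obsolescence: migrate if possible
    return "review"       # neither accepted nor flagged: staff assess the risk


for name in ["report.PDF", "memo.doc", "drawing.dwg"]:
    print(name, "->", triage(name))
```

The three outcomes mirror the prose: files already in an accepted format are kept, files judged at immediate risk are queued for migration, and everything else is left for a person to evaluate.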
Accessibility of Files
In some cases, a preservation format can also serve as the access format for a digital file. For example, a collection might include a set of word-processing files created in XyWrite decades ago. XyWrite was used mostly in DOS and Windows environments in the late 1980s and early 1990s. Once migrated to PDF/A, such files should be accessible on both current and older computers with the proper software, and many software applications can open PDF files.
On the other hand, a research request might include a set of audio files. The Archives’ preservation format for audio is Broadcast WAV, which can hold uncompressed audio along with embedded metadata. Uncompressed files can be large and difficult to deliver quickly to a researcher, so access copies are delivered as MP3 files instead: they are much smaller, and the difference in audio quality is not typically apparent to most listeners.
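To give a sense of the size difference, the arithmetic can be worked out directly. Uncompressed PCM audio takes sample rate × bytes per sample × channels bytes per second, while a constant-bitrate MP3 takes bitrate ÷ 8 bytes per second. The default parameters below (44.1 kHz, 16-bit stereo, 128 kbps MP3) are assumptions for illustration; the text does not say what settings the Archives uses.

```python
def pcm_bytes(seconds: int, sample_rate: int = 44_100, bit_depth: int = 16, channels: int = 2) -> int:
    """Size of uncompressed PCM audio data in bytes (header and metadata chunks excluded)."""
    return seconds * sample_rate * (bit_depth // 8) * channels


def mp3_bytes(seconds: int, bitrate_kbps: int = 128) -> int:
    """Approximate size of a constant-bitrate MP3 stream in bytes (tags excluded)."""
    return seconds * bitrate_kbps * 1000 // 8


one_hour = 60 * 60
wav = pcm_bytes(one_hour)
mp3 = mp3_bytes(one_hour)
print(f"WAV: {wav / 1e6:.0f} MB, MP3: {mp3 / 1e6:.0f} MB, ratio {wav / mp3:.1f}x")
# prints "WAV: 635 MB, MP3: 58 MB, ratio 11.0x"
```

At higher archival settings the gap widens: `pcm_bytes(one_hour, 96_000, 24)` gives 2,073,600,000 bytes, roughly 2 GB for one hour of stereo audio.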
Using Storage and Backups Effectively
While storage costs continue to decline, the volume of digital records is growing exponentially. Cheaper storage can lead to the false notion that everything should be saved simply because there is room for it. Digital records need to be managed effectively to be useful. This means making thoughtful decisions about which file versions to retain (draft vs. final), consulting records schedules, using informative and consistent file names, and adding metadata.
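Consistent file names are easiest to maintain when the convention is written down precisely enough to be checked automatically. The sketch below assumes a hypothetical convention, `YYYY-MM-DD_short-description_draft|final.ext` in lowercase with no spaces; it is an illustration of the idea, not a convention the Archives prescribes. Note that the pattern checks the shape of the date but does not validate it as a real calendar date.

```python
import re

# Hypothetical naming convention, for illustration only:
#   YYYY-MM-DD_short-description_(draft|final).ext
# lowercase letters and digits, hyphens between words, no spaces.
NAME_PATTERN = re.compile(
    r"\d{4}-\d{2}-\d{2}_[a-z0-9]+(?:-[a-z0-9]+)*_(?:draft|final)\.[a-z0-9]+"
)


def check_names(names: list[str]) -> list[str]:
    """Return the file names that do not follow the convention."""
    return [n for n in names if not NAME_PATTERN.fullmatch(n)]


files = [
    "2023-05-01_board-minutes_final.pdf",
    "2023-05-01_board-minutes_draft.docx",
    "Board Minutes FINAL (2).docx",
]
print(check_names(files))  # prints ['Board Minutes FINAL (2).docx']
```

Encoding the draft/final status in the name also supports the retention decisions mentioned above, since drafts can be found and reviewed in bulk.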
Online storage also carries an energy cost, because the machines must stay powered to keep files available 24/7. Offline storage avoids some of that cost but still requires management so that digital records can be retrieved in a timely manner when needed. Archives and libraries also need to forecast their storage needs regularly as digital collections grow.
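A basic storage forecast can be as simple as a compound-growth projection multiplied by the number of copies kept. The function below is a minimal sketch that assumes a constant annual growth rate; real collections grow unevenly, so the inputs (here a hypothetical 50 TB collection growing 30% per year, kept in three copies) should be revisited as actual figures come in.

```python
def forecast_tb(current_tb: float, annual_growth: float, years: int, copies: int = 1) -> list[float]:
    """Projected total storage in TB for each year 0..years.

    Assumes constant compound growth of the collection; `copies` is the number
    of full copies kept (for example, a primary copy plus backups).
    """
    return [current_tb * copies * (1 + annual_growth) ** y for y in range(years + 1)]


# Hypothetical inputs for illustration only.
for year, tb in enumerate(forecast_tb(50, 0.30, 5, copies=3)):
    print(f"year {year}: {tb:,.0f} TB")
```

Multiplying by `copies` matters because every backup copy grows along with the collection it protects.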
The Archives’ storage needs are met by network servers behind a firewall at a Smithsonian data center. These servers are backed up regularly to multiple tapes, and, as a best practice, one copy is kept offsite. As the machines age, the Smithsonian plans for their eventual replacement so that digital collections remain free from corruption and accessible.
Planning Ahead for Software, Hardware, and Operating System Obsolescence
Software developers update and improve their applications. Hardware slows down as it ages, and manufacturers build faster machines. Operating systems also evolve and adapt to changes in computing environments. The Archives monitors these developments and plans for how they might affect its digital collections. If an accession arrives containing a 3.5” floppy diskette of Microsoft Word 97 documents, staff need the proper equipment and procedures to access both the obsolete media and the files. In this case, the Archives can connect a USB 3.5” floppy drive to a PC and transfer the files off the diskette. Staff then use either specialized viewer/conversion software or a newer version of Microsoft Word to open a copy of each file and migrate it to its preservation format, PDF/A or PDF.
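The transfer step can be sketched in a few lines. This is a hypothetical example, not the Archives' documented procedure: it assumes the diskette is already mounted at some path, copies every file to a working folder with `shutil.copy2` (which also copies the modification time where the platform supports it), and compares SHA-256 checksums to confirm each copy matches the original before any migration happens. The mount point and destination folder in the usage line are placeholders.

```python
import hashlib
import shutil
from pathlib import Path


def sha256(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def transfer(media_root: Path, dest_root: Path) -> dict[str, str]:
    """Copy every file under media_root to dest_root, verifying each copy.

    Returns a manifest mapping each relative path to its SHA-256 digest.
    """
    manifest = {}
    for src in sorted(p for p in media_root.rglob("*") if p.is_file()):
        rel = src.relative_to(media_root)
        dst = dest_root / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # also copies modification time where supported
        digest = sha256(src)
        if sha256(dst) != digest:
            raise OSError(f"copy of {rel} does not match the original")
        manifest[rel.as_posix()] = digest
    return manifest


# Placeholder paths: the drive's mount point depends on the operating system.
# manifest = transfer(Path("A:/"), Path("accession_working_copy"))
```

Keeping the manifest alongside the copied files gives staff a record they can check again later, for example after the files are migrated or moved to long-term storage.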
Related Resources
- Together We Can Meet The Email Preservation and Access Challenge, The Bigger Picture, Smithsonian Institution Archives
- Digital Dilemma: Preserving Computer Aided Design (CAD) Files, The Bigger Picture, Smithsonian Institution Archives
- Clean Sweep in the New Year: Organizing Digital Photos, The Bigger Picture, Smithsonian Institution Archives