This document is intended for Smithsonian staff responsible for organizing and managing electronic records. It describes Smithsonian Institution Archives’ guidelines regarding file formats used for the long-term preservation of electronic records. All electronic records transferred to the Archives requiring permanent retention will be handled according to the information contained in this document and related procedural documents. This document addresses file format concerns only. For guidance on the full set of practices necessary to ensure reliable, authentic electronic records are created and maintained prior to transfer to the Archives, please contact the Digital Services team.
Smithsonian units that need help determining how paper and electronic records should be maintained should reach out to their designated Archives contact. An archivist can help staff determine whether records should be permanently maintained in the archives, temporarily stored in the records center, or discarded on-site. Learn more about the Records Management program at the Archives.
Smithsonian staff, interns, and volunteers use a wide variety of equipment and software in the course of creating electronic records that include text, images, audio, video, GIS files, email, CAD files, databases, websites, and social media accounts. Digital preservation best practices recommend specific file formats (typically open, non-proprietary, and widely available) for long-term archival use. This document outlines the formats preferred by the Archives for long-term preservation. The electronic record types most commonly used at the Smithsonian are described below along with the corresponding primary (preferred) preservation format. Secondary preservation formats are noted when available and appropriate, and are to be used only when the primary preservation format cannot be produced. Converting records to these formats is known as migration. For applications or formats not listed in this document, contact the Archives’ Digital Services team for guidance.
These formats were chosen because of their documented acceptance by the archival and digital preservation communities. Factors leading to this acceptance include format longevity and maturity, adoption in relevant professional communities, incorporated information standards, and long-term accessibility of any required viewing software. For instance, uncompressed TIFF is considered a good preservation format for born-digital and digitized still images because of its maturity, wide adoption in various communities, thorough documentation, and accessibility in many software applications.
Preservation of digital items at the Archives employs a strategy of migrating official electronic records (when needed) following transfer. This process helps avoid reliance on obsolete operating systems, hardware, and software that may be inoperable or unavailable in the future. For example, 3.5” floppy disk drives are no longer routinely installed in computers. SIA ingests electronic records as soon as they are received by the Digital Services team, when possible. The Archives does not actively preserve software or hardware.
The original source files are also retained in their native formats. This is considered the bit-level preservation version. SIA’s best practices include periodic review of the migrated records, the media, and storage conditions in order to ensure the longevity of the accessions. Preserved collections are stored on secure servers backed up by the Smithsonian’s Office of the Chief Information Officer (OCIO). Copies are made onto LTO (Linear Tape-Open) tapes and stored off-site. Metadata records are also kept with the collections.
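Periodic review of bit-level copies is often supported by fixity checks, in which a checksum recorded at ingest is compared with one recomputed later. The sketch below illustrates that general idea only. This document does not describe SIA’s actual tooling, so the choice of SHA-256 and the function names `sha256_of`, `build_manifest`, and `changed_files` are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """List manifest entries that are now missing or whose bytes differ."""
    current = build_manifest(root)
    return [name for name, checksum in manifest.items()
            if current.get(name) != checksum]
```

A manifest built at transfer time can be stored with the accession's metadata. Running `changed_files` against it during a later review flags any file that has been lost or altered since ingest.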
The Archives makes every effort to ensure staff and researchers will be able to access these permanent materials indefinitely, but sometimes older files simply will not transfer or open due to corruption of the media or data.
The Archives will request that Smithsonian units publish or finalize their electronic records in either a primary preservation format (preferred) or a secondary format prior to transfer to the Archives. Ideally, this occurs when the record is originally created or soon after. If this does not happen, Digital Services staff will attempt to preserve the records received by performing the format conversion prescribed in this document on a copy of the transferred records. Successful conversions will be verified. See OCIO’s Technical Reference Model policy and preferred product list for the Smithsonian (both accessible by Smithsonian staff only).
Smithsonian Institution Archives Digital Preservation Formats
In accordance with best practices, SIA prefers to preserve transferred electronic records in the formats described below. These formats tend to be open, standard, non-proprietary, and well-established.
If a file cannot be created or saved in a preservation format, then the file will receive bit-level preservation (maintained as is) by SIA. An example of this could be a document in a proprietary format more than 30 years old that cannot be identified by current format detection tools and cannot be accessed. SIA will maintain the file in its current state.
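Format detection tools generally work by matching the leading bytes of a file against known signatures. The sketch below shows the idea with three well-known signatures (PDF, TIFF, and RIFF/WAVE) and falls back to bit-level preservation when nothing matches. It is a toy illustration, not the tooling SIA uses. The names `SIGNATURES`, `identify`, and `preservation_action` are hypothetical, and production tools rely on far larger signature registries.

```python
from typing import Optional

# A tiny, illustrative signature list: (leading bytes, format name).
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
]


def identify(header: bytes) -> Optional[str]:
    """Return a format name if the header matches a known signature."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    # WAV files are RIFF containers: "RIFF", a 4-byte size, then "WAVE".
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "WAV"
    return None


def preservation_action(header: bytes) -> str:
    """Pick a handling path based on whether the format is identified."""
    fmt = identify(header)
    if fmt is None:
        return "bit-level preservation (maintain as is)"
    return f"identified as {fmt}; eligible for migration review"
```

For example, `preservation_action(open(path, "rb").read(16))` would route an unrecognized legacy file to bit-level preservation, mirroring the fallback described above.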
When possible, consider creating your files in the formats below from the start (for example, create or shoot images as TIFF) or saving them in these formats once they are complete (for example, save a finished Word file as a PDF).
|Type||Primary Preservation Format (preferred)||Secondary Preservation Format (acceptable)|
|Text/word processing applications|
XML with schema
|Spreadsheet applications or structured data|
PDF/A (must capture entire workbook – macros disabled)
|Video||Motion JPEG 2000, MOV, AVI||MPEG-4|
|Audio||BWF-Broadcast WAV (.wav is the extension)|
|Websites and social media records||WARC|
Files from Content Management System
|Email messages/account||XML email preservation format - Consult SIA Digital Services||Consult SIA Digital Services|
|Database Management Systems (DBMS)||Keep original||XML with schema|
|CAD||PDF/A, PDF/E or PDF with original file||Original|
|Other||Consult SIA Digital Services|
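For units that script their transfer preparation, the table above can be expressed as a simple lookup. The sketch below encodes only the rows whose primary and secondary cells are fully legible in the table as shown here, and returns the "Other" guidance for everything else. The names `Preference`, `PREFERRED_FORMATS`, and `recommend` are hypothetical and are not part of any SIA tool.

```python
from typing import NamedTuple, Optional


class Preference(NamedTuple):
    primary: str
    secondary: Optional[str]  # None when no secondary is listed in the row


# Rows copied from the table above; incomplete rows are deliberately omitted.
PREFERRED_FORMATS = {
    "video": Preference("Motion JPEG 2000, MOV, AVI", "MPEG-4"),
    "audio": Preference("BWF (Broadcast WAV, .wav extension)", None),
    "database management systems": Preference("Keep original", "XML with schema"),
    "cad": Preference("PDF/A, PDF/E or PDF with original file", "Original"),
}

FALLBACK = "Consult SIA Digital Services"


def recommend(record_type: str) -> str:
    """Return the preferred format text for a record type, or the fallback."""
    pref = PREFERRED_FORMATS.get(record_type.strip().lower())
    if pref is None:
        return FALLBACK
    if pref.secondary is None:
        return pref.primary
    return f"{pref.primary} (acceptable: {pref.secondary})"
```

Calling `recommend("Video")` returns the primary formats with the acceptable secondary in parentheses, while an unlisted type such as `recommend("GIS")` returns the instruction to consult Digital Services.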
Related Blog Posts
- Word-processing files need love, too, The Bigger Picture, Smithsonian Institution Archives
- The Importance of the Original, The Bigger Picture, Smithsonian Institution Archives
- Digital Dilemma: Preserving Computer Aided Design (CAD) Files, The Bigger Picture, Smithsonian Institution Archives
- Sustainability of Digital Formats: Planning for Library of Congress Collections, The Library of Congress
- Risk Management of Digital Information: A File Format Investigation, Council on Library and Information Resources
- The Technical Registry: Pronom, The United Kingdom National Archives