About the Project
For more than two decades, the Smithsonian Libraries and Archives (SLA) Electronic Records Program has preserved and stewarded millions of born-digital holdings in the Archives' permanent collections. The term “born-digital” refers specifically to records and primary source materials that are created digitally. For the past 10 years, around 50 percent of the Archives' annual average of 289 accessions have contained born-digital materials. The Archives' stewardship of these specialized holdings aligns with the Trustworthy Digital Repository standard and the Open Archival Information System (OAIS) Reference Model. Preserving this material digitally helps ensure that an authentic object remains accessible for decades to come.
The Smithsonian Institution Archives' holdings contain more than 400 individual file formats and a wide variety of removable media. Prior to the launch of the Born-Digital Access Project (BDAP), our finding aids indicated the presence and details of these records only at the folder level. Exploring the folder contents at the file level required working with the Archives' reference team to identify content and then requesting access either in the Archives' reading room or via electronic file transfer. The BDAP aims to give researchers worldwide more immediate and detailed online access to our file-level holdings by publishing detailed born-digital file inventories through our online finding aids.
This multi-phased project will begin with hyperlinks from the collection finding aids to static file inventories of our born-digital content that reflect the original file structure and hierarchy used by the content creator; future phases will aim to provide direct access to born-digital file content when possible. In certain cases, access versions of the creator's files may differ from the original file format, for example, when the original software program is no longer available.
Please contact osiaref@si.edu with additional reference questions.
How to Read the Born-Digital Inventory Reports - Definitions
For the purposes of the SIA Born-Digital Access Project (BDAP), the following values and column headers are defined as follows:
Name:
The heading folder lists the SIA collection number in the format SIA-##-###.
Top-level folders, which are nested below the heading folder, include the SIA collection number in the format SIA-##-### followed by a short description derived from the folder content. The designation for the archival box number, “B##”, and the designation for the archival disk number, “D##”, are added to the end. In some cases, these folder names have been shortened due to character limitations; however, the substance and meaning have not changed. For example, “Smithsonian” may sometimes appear as “SI”.
Names of subfolders and files display the digital folder and/or file name as submitted by the original creator. In some cases, as above, the folder names have been shortened due to character limitations.
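As an illustration of the naming convention above, the following minimal Python sketch splits a top-level folder name into its parts. The separators between the parts (space, underscore, or hyphen), the digit counts, and the sample name used below are assumptions made for this example; they are not taken from an actual inventory.

```python
import re

# Assumed layout: "SIA-##-###", a free-text description, then "B##" and "D##".
# The separator characters and digit counts are guesses, not documented rules.
TOP_LEVEL = re.compile(
    r"^(?P<collection>SIA-\d{2}-\d{3})"
    r"[ _-]+(?P<description>.+?)"
    r"[ _-]+B(?P<box>\d+)"
    r"[ _-]+D(?P<disk>\d+)\s*$"
)


def parse_top_level(name):
    """Return the parts of a top-level folder name, or None if it does not match."""
    match = TOP_LEVEL.match(name)
    if match is None:
        return None
    return {
        "collection": match.group("collection"),
        "description": match.group("description"),
        "box": int(match.group("box")),
        "disk": int(match.group("disk")),
    }


# Hypothetical folder name, invented for this example.
print(parse_top_level("SIA-12-345 SI Newsletter B01 D02"))
```

Names of subfolders and files come from the original creator and will generally not match this pattern, in which case the function returns None.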
Size: The storage size of the folder or digital file, in megabytes.
Files: The count of files included in the line item. Folders may contain multiple files, while a single file will display a file count of one.
Folders: The count of sub-folders nested within a given folder. Single files also show a count, which is always "0."
% of Parent (Allocated): Shows the size of each folder and file, as a percentage, in relation to the collection's total born-digital content. For the purpose of the born-digital inventories, “Parent” represents the total storage size of the born-digital content in each collection number.
***Please be aware that the term "Parent" has many different meanings in other contexts relating to digital and digitized content, so any question about the inventories that references "Parent" should specify the source of the information.
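The percentage defined above is simple arithmetic: the size of the line item divided by the total born-digital size of the collection, multiplied by 100. A minimal sketch follows; the function name and the sizes used are hypothetical, and the rounding applied in the actual inventory reports is not specified here.

```python
def percent_of_parent(item_size_mb, collection_total_mb):
    """Share of the collection's total born-digital size taken up by one item.

    Both arguments are sizes in megabytes. "Parent" here means the whole
    collection, following the definition used in the inventories.
    """
    if collection_total_mb <= 0:
        raise ValueError("collection total must be greater than zero")
    return item_size_mb / collection_total_mb * 100


# Hypothetical numbers: a 25 MB folder in a collection totaling 200 MB.
print(percent_of_parent(25.0, 200.0))
```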
Last Modified: The date of the last change made to the folder or file. Top-level folder names were modified during the creation of the born-digital inventories, so their dates represent an archival processing date. The dates on the individual files have not been altered and represent the date of the last saved changes to the content as provided by the creator.
Type: Indicates whether a line item is a folder or an individual file. Individual files will include the file type, based on the file extension.
File Extension: Indicates the likely format of a digital file. This is determined using the suffix of the provided file name (e.g., .jpg, .png, .pdf). Please be aware that older legacy files did not always use the software program's default extension, e.g., a text file named “minutes.jan” or “project.rpt”. Such files often show up as format “unspecified”. At times, file extensions can also be inaccurate.
For a detailed list of file types and their meaning, see the Library of Congress's list of format descriptions here: https://loc.gov/preservation/digital/formats/fdd/browse_list.shtml.
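To show how a file extension relates to a file name, the short Python sketch below takes the suffix of a name and labels it "unspecified" when the suffix is not in a small list of recognized extensions. The list and the function name are assumptions made for this example; the tool and the rules actually used to produce the inventories are not described here.

```python
from pathlib import PurePath

# Tiny illustrative list; a real list of recognized extensions would be far longer.
KNOWN_EXTENSIONS = {".jpg", ".png", ".pdf", ".txt", ".doc"}


def extension_label(file_name):
    """Return the lowercased suffix if recognized, otherwise "unspecified"."""
    suffix = PurePath(file_name).suffix.lower()
    return suffix if suffix in KNOWN_EXTENSIONS else "unspecified"


for name in ["photo.JPG", "minutes.jan", "project.rpt"]:
    print(name, "->", extension_label(name))
```

Because the label depends only on the name, a renamed or mislabeled file would be reported under the wrong format, which is why the extension indicates only the likely format.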