One Lens for Multiple Archives: A Pan-Institutional Survey of Born Digital Holdings

In just a handful of decades, our society has gone from hearing about the impending miracles of the digital age to living daily lives permeated with digital culture. As a result, digital objects have become part of the Smithsonian’s historical record, and its digital archives are managed and preserved by the Smithsonian Institution Archives (SIA). Born digital holdings rarely arrive carefully set to the side, with documentation about what is on the storage media and with a backup or copy. At the Archives today, one out of three accessions contains born digital material, most commonly mixed in with the paper files.

Similarly, other archives at the Institution have been steadily acquiring born digital holdings over the past several decades. Four years ago, the Smithsonian Institution Archives and archives within the National Museum of Natural History (National Anthropological Archives, Human Studies Film Archive), the National Air and Space Museum, the Archives Center at the National Museum of American History, the Archives of American Art, and the National Museum of African American History and Culture gathered to frame a collaborative survey of their born digital holdings. Key goals of this effort were to uncover hidden holdings, establish physical and intellectual control of born digital material, and perform a baseline preservation assessment, thereby strengthening the collections care provided. An integral part of the survey’s design is its shared methodology and metrics, which can serve as a foundation for future joint preservation initiatives and stewardship planning.

With its first grant, received in 2012, the survey work focused initially on building an inventory of the removable storage media present in each archive while completing questionnaires that evaluated how prepared the archives were to manage these types of collections. A second grant, received in 2014, funded completing the survey work, performing risk analysis at the individual file level, and providing essential interventions to stabilize these fragile materials. The survey was completed in April 2015, and the resulting qualitative and quantitative insights are being incorporated into the collections stewardship planning of the participating archives and museums.

Leveraging familiar waters

Established eleven years ago, the Archives’ Electronic Records Program (ERP) conducted its first born digital holdings survey in 2004-2005. As a result, changes were made to the acquisition, processing, and preservation workflows to achieve best practices for holdings that can vary dramatically in format, age, and quantity. What started as documents, spreadsheets, and simple databases from the late 1990s has now grown to include images, audio, video, mobile apps, websites and social media, construction drawings, GIS data, email accounts, scientific data sets, and even custom-built software programs, with an estimated half a terabyte of new born digital holdings acquired each year.

Obsolete storage media.

Electronic Records Archivist Lynda Schmitz Fuhrig and ERP volunteer Peter Finkel assisted regularly throughout the survey. Together with the shared workflows and software tools, they continue to serve as mentors and a common resource for the survey’s participating archives.

In many ways, the survey implemented the principles laid out in Ricky Erway’s white paper, "You've Got to Walk Before You Can Run."

Determining levels of risk

Preservation risk for content on media that could be read was determined on the basis of format and age, creating a simple mechanism to rank individual files:

  • Severe (1) indicated files older than ten years in formats that the participating archive was unable to access.
  • High (2) indicated files younger than ten years in formats that the participating archive was unable to access.
  • Medium (3) indicated files older than ten years in formats that the participating archive was able to access.
  • Low (4) indicated files younger than ten years in formats that the participating archive was able to access.
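
The four levels above amount to a two-question decision: is the file older than ten years, and can the archive open its format? The following Python sketch illustrates that ranking only; it is not the survey's actual tooling. The names Risk and risk_level are hypothetical, and treating a file that is exactly ten years old as "younger" is an assumption, since the definitions above do not cover that boundary.

```python
from enum import IntEnum


class Risk(IntEnum):
    """Risk levels as numbered in the survey (1 is the most urgent)."""
    SEVERE = 1
    HIGH = 2
    MEDIUM = 3
    LOW = 4


# Age threshold taken from the survey's definitions.
AGE_THRESHOLD_YEARS = 10


def risk_level(age_years: float, format_accessible: bool) -> Risk:
    """Rank one file by age and by whether the archive can open its format.

    Assumption: a file exactly ten years old counts as "younger",
    because the survey text does not define that boundary.
    """
    older = age_years > AGE_THRESHOLD_YEARS
    if not format_accessible:
        return Risk.SEVERE if older else Risk.HIGH
    return Risk.MEDIUM if older else Risk.LOW
```

For example, risk_level(15, False) yields Risk.SEVERE, while risk_level(3, True) yields Risk.LOW. Because the levels are integers, a file inventory can be sorted by risk so that the most urgent items come first.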

Taken as a whole, risk was distributed as 14% Severe, 5% High, 43% Medium, and 38% Low, as shown in the image below:

Aggregate risk level by percentage from pan-Smithsonian survey of born digital holdings.

The results

More than 470 accessions were inspected, 6,613 pieces of removable media were inventoried, and 651,629 born digital files were assessed for preservation risks. Concurrently, the assessed files were stabilized: they were scanned for viruses, their fixity values were determined, backups were made into secure storage environments, and metadata was generated, so that a minimum of bit-level preservation of well-defined holdings is now in effect. Combined with the portion of SIA holdings that had already been assessed and preserved prior to the survey, close to 1.5 million born digital holdings across six archives are now under proper archival control. Placed in the context of the recently published POWRR framework, the progress made by this survey is striking.
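
To make the fixity step concrete: a fixity value is a checksum recorded for a file so that a later recomputation can confirm the bits have not changed. The Python sketch below shows the general technique using the standard hashlib module. The choice of SHA-256 and the function names fixity_value and verify_fixity are assumptions made for illustration; the survey does not state which algorithm or tools were used.

```python
import hashlib


def fixity_value(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of the file at `path`.

    The file is read in chunks so that large files do not need to fit
    in memory. SHA-256 is an assumed default, not the survey's choice.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, expected, algorithm="sha256"):
    """Recompute the digest and compare it with a previously stored value."""
    return fixity_value(path, algorithm) == expected.strip().lower()
```

In practice, the value returned by fixity_value would be stored alongside the file's other metadata at the time of stabilization, and verify_fixity would be run against the backup copy and again at later audits.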

State of born digital holdings preservation among survey participants of 2012:


State of born digital holdings preservation among survey participants at the survey conclusion:


We are excited about the enduring effect this survey will have on the born digital holdings within Smithsonian collections and on their stakeholders, as well as on the wider stewardship community and the born digital advocacy the survey empowers.


Produced by the Smithsonian Institution Archives. For copyright questions, please see the Terms of Use.