The Work Continues While We Are Away

Even though our physical office in Washington, D.C., is closed to staff and visitors due to the COVID-19 pandemic, the Smithsonian Institution Archives staff is able to work remotely on some projects. For those of us who work with the born-digital collections, this means we are continuing to focus on web archiving, reference requests for some accessible materials, and cataloging and metadata projects, while making sure various systems can be securely accessed.

It is also a time to catch up on backlogs and projects that had been put on the back burner, while adapting to the everyday challenges of working with limited technology away from the office.

One of those ongoing projects is prep work for sharing more of our born-digital materials online. Winter intern Julie Rockwell recently wrote about some workflow and access ideas for the Archives. We are exploring which engaging and interesting materials we could post online and how we would do so.

How does this work start, though? Our in-house database of born-digital collection items, DArcInfo (Digital Archives Information System), helps with this review. With it, we can sort collections by year, restriction status, file type, and other parameters to narrow the field to the best candidates.
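To illustrate the kind of narrowing described above, here is a minimal Python sketch. DArcInfo's actual schema and interface aren't described in this post, so the Item record and its fields (accession, year, restricted, file_format) are hypothetical stand-ins, and the function simply combines the criteria in one pass.

```python
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical record; the real DArcInfo fields are not documented here.
    accession: str
    year: int
    restricted: bool
    file_format: str


def candidates(items, start_year, end_year, formats):
    """Keep unrestricted items from the year range whose format is wanted,
    sorted by year and then by accession number."""
    return sorted(
        (
            item
            for item in items
            if start_year <= item.year <= end_year
            and not item.restricted
            and item.file_format in formats
        ),
        key=lambda item: (item.year, item.accession),
    )
```

In practice the database does this sorting and filtering itself; the sketch only shows how the separate criteria combine to shrink a large list into a short set of candidates.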

A few considerations for sharing born-digital files online include:

Format – is the file in a format that is accessible and accurately renderable? For instance, a WordPerfect file requires specific software for viewing since it is a proprietary format. At the Archives, the preservation format for WordPerfect files is PDF/A or PDF. A PDF also serves as the access copy for that WordPerfect file.
On the other hand, a file might be a mystery because our tools cannot identify it at this time (it could be corrupt, or it could be too old or rare for current format identification tools to recognize). Such a file receives bit-level preservation but has no access copy available, so it is not a candidate for posting at this time.
Context – is there enough information within the file (an embedded caption or an accurate file name) or elsewhere? For example, a CD can have incorrect labels or none at all, with files named only IMG001, IMG002, IMG003, and so on. In some cases, viewing the associated finding aid and other items (paper or digital) in the collection can also provide clues.

It can be fun, though, to post a mystery photo to see if the public can identify people, objects, or places.
Privacy issues – even unrestricted collections can contain sensitive data that should not be made public. A few files out of thousands might contain personally identifiable information (PII), so a careful review involving both software tools and human judgment is necessary. In some cases, there may also be intellectual property rights issues.
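One common way software assists with a PII review is pattern matching: it flags strings that look like PII so that a person can decide what they actually are. Below is a minimal Python sketch of that idea. The two patterns (the shape of a U.S. Social Security number and an email address) and the flag_pii function are illustrative assumptions, not the Archives' actual rules, and dedicated scanning tools cover far more cases.

```python
import re

# Illustrative patterns only; not a complete or authoritative PII rule set.
PII_PATTERNS = {
    "possible SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email address": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def flag_pii(text):
    """Return (label, matched text) pairs for a human reviewer to check."""
    hits = []
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits
```

Every hit is only a candidate: a number in that shape may be an invoice or catalog number, and an email address may belong to an office rather than a person, which is why human review stays in the loop.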

Images in a thumbnail display from a CD from the National Air and Space Museum, SIA Accession 15-233

The example here is from the National Air and Space Museum’s Office of Special Events. This collection has both paper and digital records and documents the National Air and Space Museum’s Trophy Awards. The images were on CDs labeled “NASM Awards,” and the word-processing files were on 3.5” floppy disks with labels referring to the 1995 and 1996 Trophy Award scripts.

Viewing the files from the CDs as thumbnails in File Explorer, along with their .tif extensions, makes it clear that they are images. However, there are no captions identifying the event or the people in this set, which is possibly from 2005.

The people in the photograph above are NASM employees being recognized for their work, an honor separate from the Trophy Awards. Unfortunately, there are no names for anyone and no photographer credit, whether on the CD, on the paper folder it was in, or embedded within the files. I do recognize Gen. J.R. “Jack” Dailey, the former NASM director, in some of them; however, to identify the others we’ll need to review the entire collection or related collections, or talk to NASM staff. Another potential setback is that the image metadata shows a creation date of 2002, so it’s possible that the photographs are not from a 2005 event and the CD was mislabeled.
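A creation date like this is stored inside the image file itself. For TIFF files, one place it can live is the baseline DateTime tag (tag 306), written as "YYYY:MM:DD HH:MM:SS". The Python sketch below reads that tag using only the standard library. It assumes a classic (non-BigTIFF) file, checks only the first image directory, and the function names are hypothetical; it is not necessarily the field or tool that produced the 2002 date mentioned above.

```python
import struct

DATETIME_TAG = 306  # baseline TIFF tag, value formatted "YYYY:MM:DD HH:MM:SS"
ASCII_TYPE = 2      # TIFF field type for NUL-terminated ASCII text


def tiff_datetime(data):
    """Return the DateTime string from the first IFD of a classic TIFF, or None."""
    if data[:2] == b"II":
        endian = "<"  # little-endian byte order
    elif data[:2] == b"MM":
        endian = ">"  # big-endian byte order
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i  # each directory entry is 12 bytes
        tag, field_type, count = struct.unpack(endian + "HHI", data[start:start + 8])
        if tag == DATETIME_TAG and field_type == ASCII_TYPE:
            if count <= 4:
                raw = data[start + 8:start + 8 + count]  # short value stored inline
            else:
                (offset,) = struct.unpack(endian + "I", data[start + 8:start + 12])
                raw = data[offset:offset + count]  # longer value stored elsewhere
            return raw.rstrip(b"\x00").decode("ascii")
    return None


def tiff_datetime_from_file(path):
    with open(path, "rb") as f:
        return tiff_datetime(f.read())
```

Embedded dates are clues rather than proof: a scanner or camera clock can be set wrong, and editing or copying software can rewrite the tag, so a date like this still has to be weighed against the rest of the collection.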

WordPerfect files from 1995 and 1996 from SIA Accession 15-233 that were identified with file format tools

The other files from this accession require more digging, and this is where format identification tools assist in the detective work. Using JHOVE and DROID in this instance, we identified the files as WordPerfect 5.1. As noted above, the preservation and access copy for these file types is PDF/A or PDF, since PDFs can be viewed more easily in an online environment.

Screenshot (2020) of WordPerfect files with some unusual file extensions from SIA Accession 16-147, Smithsonian African American Association, Program Records, 1988-1998. File format tools helped determine what they are.

The MAILFLYR file is a WordPerfect file for a Smithsonian African American Association event in 1998. SIA Accession 16-147, Smithsonian African American Association, Program Records, 1988-1998.

More examples are WordPerfect files from programs held by the Smithsonian African American Association. Again, it isn’t immediately clear at a glance what these files might be. Note the creative extensions used (or the lack of one), which was commonplace in the 1980s and 1990s: .mem for memo, .98 for a file created in 1998, and .ins for instructions. It is unclear what .NTC stands for. This is another reason file extensions aren’t always an accurate indicator of what a file is. The MAILFLYR file is about a mentor group program that was set for January 30, 1998.
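Format identification tools such as DROID look inside a file for byte signatures instead of trusting its name. As a minimal illustration, WordPerfect 5.x and later documents begin with the four bytes FF 57 50 43 (0xFF followed by the letters "WPC"). The Python sketch below checks only that prefix; it is much cruder than DROID's signatures, which can also tell versions apart, and the has_wpc_signature and survey functions are hypothetical names for this example.

```python
from pathlib import Path

WPC_SIGNATURE = b"\xffWPC"  # leading bytes of WordPerfect 5.x and later documents


def has_wpc_signature(head):
    """True if the file's leading bytes match the WordPerfect prefix."""
    return head[:4] == WPC_SIGNATURE


def survey(folder):
    """Map each file name to (extension, whether the WordPerfect signature was found)."""
    report = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            with open(path, "rb") as f:
                head = f.read(4)
            report[path.name] = (path.suffix or "(none)", has_wpc_signature(head))
    return report
```

A file whose extension says nothing useful (.mem, .98, or none at all) can still match the signature, which is exactly why identification relies on a file's contents rather than its name. Older files, such as those from WordPerfect 4.2, lack this prefix, so a negative result here does not rule WordPerfect out.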

Stay tuned as we continue to work to share more of our born-digital materials.

Related Resources

Digital Preservation Challenges and Solutions, Smithsonian Institution Archives
Accessing Digital Archives, UNC-Chapel Hill, University Libraries
“Finding the Digital Treasures,” by Lynda Schmitz Fuhrig, The Bigger Picture, Smithsonian Institution Archives

Produced by the Smithsonian Institution Archives. For copyright questions, please see the Terms of Use.
