Web-enabling Archive Digital Content

As a web developer for the Smithsonian Institution Archives, I deal with different issues than most of my colleagues, who are professional archivists. Accordingly, I have an entirely different perspective on my role here at the Archives.

Archivists are focused on the content of our collections on a very fundamental level: they actually get to touch and feel the physical objects in our collections (with gloves as appropriate, of course!). They are trained in organization, appraisal, description, conservation, recovery, and storage, and they provide reference support, answering questions for those doing research within the Archives’ collections. Whether the items our archivists work with are physical documents or “born digital” objects, such as word processing documents or digital photographs, the same issues come up. The only difference is how these items are stored and handled.

Sarah Stauderman, Susannah Wells, and Marguerite Roby.

As a web developer, my goal is to provide access to two levels of information: first, to let others know what the Archives has in its collections, and second, to help provide access to “digital surrogates” of the physical artifacts in our collections.

The primary purpose of our online Finding Aids, or Collection Guides, is to help the public explore the Archives’ myriad holdings. The Finding Aids are not historical documents themselves, but descriptive summaries of what is contained within each collection. Our Finding Aid Search (right side of page) allows full-text searching of these documents to help you discover which specific collection may be of interest. The finding aids are created using a standardized XML format called Encoded Archival Description (EAD), and are generated from data created by archivists when they accession a new collection. XML is not very pretty to read . . . unless you are a data geek. I like XML because it is very generic and easy for a computer to process for display, compared to an average word processing document or database. Our finding aid search applies a visual style to make the Finding Aids easier to read. Additionally, the XML format allows us to share our data with collaborative efforts like ArchiveGrid, a search engine that allows one to search through historical documents, papers, and histories held in archives around the world.
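To give a sense of why XML is easy to process for display, here is a minimal Python sketch. It is not the code behind our finding aid search. The fragment is a made-up example that uses a few common EAD element names (`unittitle`, `unitdate`, `container`, and numbered components such as `c01` and `c02`); the collection title, dates, and the function name `render_outline` are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical EAD-style fragment. The titles and dates are placeholders,
# not the content of any real finding aid.
SAMPLE_EAD = """
<ead>
  <archdesc level="collection">
    <did>
      <unittitle>Example Collection</unittitle>
      <unitdate>1900-1950</unitdate>
    </did>
    <dsc>
      <c01 level="series">
        <did>
          <unittitle>Series 1: Correspondence</unittitle>
        </did>
        <c02 level="file">
          <did>
            <container type="box">1</container>
            <unittitle>Letters, A-M</unittitle>
          </did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""


def is_component(element):
    """Return True for EAD component elements: <c>, <c01>, <c02>, ..."""
    tag = element.tag
    return tag == "c" or (tag.startswith("c") and tag[1:].isdigit())


def render_outline(ead_xml):
    """Turn an EAD-style document into an indented, human-readable outline."""
    root = ET.fromstring(ead_xml)
    archdesc = root.find("archdesc")
    title = archdesc.findtext("did/unittitle", default="(untitled)")
    date = archdesc.findtext("did/unitdate")
    lines = [f"{title} ({date})" if date else title]

    def walk(parent, depth):
        # Visit nested components in document order, indenting by depth.
        for child in parent:
            if is_component(child):
                label = child.findtext("did/unittitle", default="(untitled)")
                box = child.findtext("did/container")
                prefix = f"Box {box}: " if box else ""
                lines.append("  " * depth + prefix + label)
                walk(child, depth + 1)

    dsc = archdesc.find("dsc")
    if dsc is not None:
        walk(dsc, 1)
    return lines


if __name__ == "__main__":
    print("\n".join(render_outline(SAMPLE_EAD)))
```

The same tree walk could just as easily emit HTML with styling applied, which is the general idea behind presenting a finding aid on a web page: the data stays in one generic format, and the display is layered on top.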

As a sample, here is how a small piece of the finding aid for our accession RU7055, the Vail Telegraph Collection, appears in XML:

And here is how it appears on our web site:

Now, the second level, and likely the one of most interest to readers, is direct access to digital surrogates for artifacts via the web. We use the term “surrogates” because the form of digitization varies widely, depending on the format of the object. For example, if you have ever used born-digital images from modern cameras or made scans of print documents, you are familiar with the digital files they produce and you can display them on your computer. But items like movie film, video, or architectural CAD drawings can present challenges for both storage, because of their size, and display, because the software that created them may no longer function in the future.

This is a much bigger issue due to the volume of material already in our archive: 37,412 cubic feet for 5,408 collections, and growing! The good news is that 95 percent of that material is digitally described, and the unrestricted items are searchable with both classic and full-text searches. However, of that material, only about 250,000 items from our collections have digital surrogates, and most of them are born-digital items acquired since 1994. We’ve barely put a dent in the volumes of historical collections the Archives holds.

Lynda Schmitz Fuhrig

But as the digital surrogates are created, we are developing ways to deliver them via the web, and we are currently experimenting with different possibilities. Recently, for example, I’ve worked with one of our interns to test linking to digital surrogates directly within the EAD finding aid document. The other half of this problem is the delivery mechanism. Our system for the storage, retrieval, and display of digital images is great. However, it won’t work for PDF/A, the ISO-standard archival format used for our scanned documents. We use PDF/A since it is the recommended format for the long-term archiving of electronic documents. The PDF/A format also gives us other advantages, like grouping related documents from different collections across the Smithsonian, such as those from the Field Book Project, a collaborative effort between the National Museum of Natural History and the Archives that creates an online location for scholars and others to visit when searching for field research materials.
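As a rough illustration of what linking a digital surrogate from within a finding aid can look like, here is a small Python sketch. It is a sketch under stated assumptions, not the approach we settled on. EAD provides a `<dao>` (digital archival object) element for pointing at digital material; this example assumes a plain `href` attribute and places the `<dao>` inside the component's `<did>`. The component content, the `example.org` URL, and the helper names `attach_surrogate` and `surrogate_links` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical EAD-style component describing one folder in a box.
COMPONENT = """
<c02 level="file">
  <did>
    <container type="box">1</container>
    <unittitle>Letters, A-M</unittitle>
  </did>
</c02>
"""


def attach_surrogate(component, url, title):
    """Add a <dao> link to the component's <did> and return the new element."""
    did = component.find("did")
    if did is None:
        raise ValueError("component has no <did> element")
    dao = ET.SubElement(did, "dao")
    dao.set("href", url)      # assumption: plain href rather than xlink:href
    dao.set("title", title)
    return dao


def surrogate_links(component):
    """List (title, url) pairs for every <dao> found under the component."""
    return [(d.get("title"), d.get("href")) for d in component.iter("dao")]


component = ET.fromstring(COMPONENT)
attach_surrogate(
    component,
    "https://example.org/surrogates/box1-letters.pdf",  # placeholder URL
    "Letters, A-M (PDF/A scan)",
)

if __name__ == "__main__":
    for title, url in surrogate_links(component):
        print(f"{title}: {url}")
```

A display layer can then turn each `(title, url)` pair into a link next to the item's description, so a reader browsing the finding aid can jump straight to the scan.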

As you can imagine, the volume of scans and digitized material is huge and expanding daily. In addition, we continually struggle with available resources, from digital storage space, to scanning equipment that can support the volume and quality that archival work demands, to the staff needed to handle it all. With the help of many interns this summer, we hope to put a small dent in our continuing flow of content while giving them much-needed exposure and experience in the world of digital archiving. By the end of summer, we hope to have approximately 10,000 more digital surrogates available and a plan to make them web accessible.

These are the kinds of challenges that many archives face today, but with patience, good planning, and a steady pace we will continue to make the resources of the Archives available to the world.

Produced by the Smithsonian Institution Archives. For copyright questions, please see the Terms of Use.
