When I last wrote about email in 2011, there were rumblings that the electronic communication tool was dying. Claims that email is on its last legs continue today. Texting and social media tools do offer additional options, depending on the message content and its intended audience, but a business contract is not going to be sent via Facebook Messenger. Also consider that many online forms still require an email address from the person seeking a service, newsletter, etc.
Email both helps us (contacting multiple parties at once) and hinders us (messages that get buried). It was 10 years ago that the Smithsonian Institution Archives and the Rockefeller Archive Center launched their Collaborative Electronic Records Project (CERP), which evolved into an email preservation project. At that time, the largest email account we worked with was 1.5 GB, or 28,000 messages. Today, our collections include individual email accounts that are nearly 30 GB, or more than 250,000 messages and attachments. These email collections come from accounts that are no longer active at the Smithsonian, dating from the late 1990s through 2015.
Even if email is obsolete in five years, memory institutions will continue to receive email accounts from previous years that need to be accessible to researchers.
Other archives, libraries, museums, universities, and various organizations also are exploring email preservation challenges within their collections. These messages and attachments come from artists, authors, professors, and government officials, to name a few. Researchers, scholars, and journalists have always had an interest in the correspondence from the past. Previously this information was in the printed form of letters, memos, cards, etc.
In June the Library of Congress and the National Archives and Records Administration hosted the Archiving Email Symposium. There were about 150 attendees, which included archivists, librarians, technology specialists, curators, and others. The event included presentations by the Smithsonian Institution Archives, the Library of Virginia, Stanford University, and various federal agencies. Topics included toolsets, appraisal, legal and records management issues, and processing workflows. A workshop on the second day focused on challenges and next steps for the interested parties to address through additional collaboration.
More tools and approaches are being developed across the preservation community to provide access and to help preserve email collections. This is just a sampling of some projects:
- Library of Virginia's Kaine Email Project makes the emails from Governor Tim Kaine’s administration searchable online in full-text PDFs.
- Stanford University’s ePADD (email: Process, Appraise, Discover, Deliver) processes an email account and allows for searching, browsing, and restricting messages, as well as applying user-created lexicons to help in finding confidential information. Some visualization features also are available. (Note: The Smithsonian Institution Archives assisted in testing the software and providing feedback.)
- The University of Maryland is working with email collections from companies that have failed. The project is dealing with issues of PII (personally identifiable information) and researcher access.
- Harvard University developed a system, now used by its curatorial partners, that takes in email content, processes the materials, and offers long-term preservation of the messages and attachments.
The Archives also has been busy improving its in-house tools for email preservation work. Since the Smithsonian email accounts have grown in size, our original preservation processing software was showing its limitations. We have been testing an in-house program called DArcMail (Digital Archive Mail System) written in Python that still gives us the XML preservation output we adopted during CERP, as well as a database for searching email messages and attachments within accounts. So far the results have been promising with faster output, multiple options for searching, and viewing related emails within a chain.
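For readers curious what pairing an XML preservation output with a searchable database can look like in practice, here is a minimal Python sketch. It is not DArcMail's code and does not follow the CERP XML schema: the element names, table layout, sample messages, and function names are all assumptions made up for illustration, and only the Python standard library is used.

```python
import sqlite3
import xml.etree.ElementTree as ET
from email import message_from_string

# Two tiny made-up messages standing in for an exported account (assumption).
RAW_MESSAGES = [
    "From: a@example.org\nTo: b@example.org\nSubject: Budget draft\n"
    "Message-ID: <1@example.org>\n\nPlease review the attached budget.",
    "From: b@example.org\nTo: a@example.org\nSubject: Re: Budget draft\n"
    "Message-ID: <2@example.org>\nIn-Reply-To: <1@example.org>\n\nLooks fine to me.",
]

HEADERS = ("Message-ID", "In-Reply-To", "From", "To", "Subject")


def to_xml(messages):
    """Build a simple XML tree; element names are hypothetical, not the CERP schema."""
    root = ET.Element("Account")
    for msg in messages:
        node = ET.SubElement(root, "Message")
        for header in HEADERS:
            if msg[header] is not None:
                # "Message-ID" becomes <MessageID>, "In-Reply-To" becomes <InReplyTo>.
                ET.SubElement(node, header.replace("-", "")).text = msg[header]
        ET.SubElement(node, "Body").text = msg.get_payload()
    return root


def build_index(messages):
    """Load the same messages into an in-memory SQLite table for searching."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (message_id TEXT PRIMARY KEY, in_reply_to TEXT, "
        "sender TEXT, subject TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?, ?, ?, ?)",
        [
            (m["Message-ID"], m["In-Reply-To"], m["From"], m["Subject"], m.get_payload())
            for m in messages
        ],
    )
    return conn


def search(conn, keyword):
    """Return IDs of messages whose subject or body contains the keyword."""
    pattern = f"%{keyword}%"
    rows = conn.execute(
        "SELECT message_id FROM messages WHERE subject LIKE ? OR body LIKE ? "
        "ORDER BY message_id",
        (pattern, pattern),
    )
    return [row[0] for row in rows]


def replies_to(conn, message_id):
    """Return IDs of messages that directly reply to the given message."""
    rows = conn.execute(
        "SELECT message_id FROM messages WHERE in_reply_to = ? ORDER BY message_id",
        (message_id,),
    )
    return [row[0] for row in rows]


messages = [message_from_string(raw) for raw in RAW_MESSAGES]
root = to_xml(messages)
conn = build_index(messages)

print(ET.tostring(root, encoding="unicode"))
print(search(conn, "budget"))
print(replies_to(conn, "<1@example.org>"))
```

The XML tree is the copy meant for long-term keeping, while the database is a disposable index that can be rebuilt from it at any time. Following the `In-Reply-To` header is one simple way to view related emails within a chain. A real account would also need handling for attachments, character encodings, and multipart messages, all of which this sketch leaves out.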
The various options that are being tested and implemented demonstrate that many institutions and organizations understand the importance of preserving email communications from the late 20th and early 21st centuries.
- The History of Email at the Smithsonian, The Bigger Picture blog, Smithsonian Institution Archives
- Emerging Collaborations for Accessing and Preserving Email, The Signal: Digital Preservation blog, Library of Congress