Email Preservation - Collaborative Electronic Records Project


Since the mid-2000s, the Smithsonian Institution Archives has led and collaborated on the earliest efforts in the field of email preservation. Email was first used at the Smithsonian in the 1980s, with ELM (Electronic Mail). Since then, the Smithsonian has used a variety of email applications, including PINE, cc:Mail, Lotus Notes, and GroupWise, before adopting Microsoft Outlook and Exchange in the mid-2000s. Smithsonian employees were told to print out email for recordkeeping, as was customary at other organizations and businesses at the time. Because the Archives receives content that can be decades old, a preservation solution for email correspondence was essential.

Very few organizations and archives were working on the email preservation challenge in the mid-2000s. The Smithsonian Institution Archives teamed up with the Rockefeller Archive Center in 2005 to tackle the issues of long-term preservation of digital records. The focus quickly turned to email preservation for the three-year Collaborative Electronic Records Project (CERP), which resulted in tools and processes for acquiring, documenting, processing, and preserving email accounts and associated attachments.

During CERP, an XML email preservation schema was co-developed with the E-mail Collection and Preservation (EMCAP) project; the Archives still uses it today. Unlike earlier email research projects, which had preserved email message by message, CERP took an account-level approach, which makes collection and metadata management easier.

Issues Identified during CERP

  • The variety of attachment formats - some could not be easily identified or viewed.
  • The different versions of email message formats - MBOX is used as a preservation format among some archives, but MBOX files can vary depending on the software that created them, since MBOX is not a controlled standard.
  • The different management styles of email accounts - some account holders kept every email in the inbox while others deleted most messages on a weekly basis. Some had personal messages mixed in with business messages.
  • PII (personally identifiable information) within accounts.
  • Duplicate messages within an account.

CERP documented its findings and approaches to these issues, in addition to creating guidance for account holders regarding managing and weeding of accounts prior to transfer to an archives or repository. Many of these issues persist today.
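
To make the duplicate-message issue listed above concrete, here is a minimal sketch in Python of how duplicates within a single MBOX file might be counted. It is an illustration only and does not describe how the CERP tools or DArcMail handle duplicates. The function names and the fallback hashing strategy are assumptions made for this example; only the standard library `mailbox`, `email`, and `hashlib` modules are used.

```python
import hashlib
import mailbox
from email.message import Message
from typing import Dict


def message_key(msg: Message) -> str:
    """Return a key for a message: its Message-ID, else a content hash.

    Hypothetical helper for this sketch, not part of any archival tool.
    """
    message_id = msg.get("Message-ID")
    if message_id:
        return str(message_id).strip()
    # Assumption: with no Message-ID, a hash of a few headers plus the
    # payload is a good enough stand-in for identity.
    digest = hashlib.sha256()
    for header in ("From", "To", "Date", "Subject"):
        digest.update(str(msg.get(header, "")).encode("utf-8", "replace"))
    digest.update(str(msg.get_payload()).encode("utf-8", "replace"))
    return "sha256:" + digest.hexdigest()


def find_duplicates(mbox_path: str) -> Dict[str, int]:
    """Map each key that occurs more than once in the file to its count."""
    counts: Dict[str, int] = {}
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            key = message_key(msg)
            counts[key] = counts.get(key, 0) + 1
    finally:
        box.close()
    return {key: n for key, n in counts.items() if n > 1}
```

Calling `find_duplicates("account.mbox")` (a hypothetical file name) would return the keys that appear more than once. Because MBOX files vary with the software that wrote them, as noted in the list above, a real workflow would likely need additional checks before treating two messages as identical.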

Email Preservation Today at the Archives

Email accounts are accessioned by the Archives based on the role of the account holder at the Smithsonian, and accession usually coincides with the account holder's departure from the Smithsonian. Accounts considered for permanent accession can include those of curators, museum and unit directors, undersecretaries, the secretary, and other senior Smithsonian officials. In some cases, email messages about a particular high-level project are acquired by the Archives. Email collections have a fifteen-year access restriction.

In 2018, the Archives released DArcMail, a new email preservation application and successor to the CERP tool. The DArcMail application adds a graphical user interface, powerful search and filtering, and reduced processing time to the original preservation functionality. DArcMail is available as open source software delivered on a Python platform for Windows, Mac OS, and Linux operating systems. Read more about using DArcMail or download the software.
