Email Preservation - DArcMail


The Smithsonian Institution Archives joined early email preservation efforts in the mid-2000s. Its collections include email records dating to the 1980s that were generated using ELM (Electronic Mail), the first email program used at the Smithsonian. Since then, the Smithsonian has used a variety of email applications and formats, including PINE, cc:Mail, Lotus Notes, and GroupWise, before adopting Microsoft Outlook and Exchange just as the Collaborative Electronic Records Project (CERP), the Archives' joint email preservation research project with the Rockefeller Archive Center, was getting underway.

Through its research with the Rockefeller Archive Center and later collaborations with other organizations, the Archives developed a methodology better suited to a major shift in how email is used: from an isolated convenience to a ubiquitous medium for both informal and formal correspondence. The Archives also developed open-source applications, first for preservation (the CERP Email Preservation Parser) and, more recently, for processing and access as well (DArcMail).

Current Approach

The Archives focuses its email collections development on the accounts of identified record creators. These include senior Institution managers, museum directors, and some senior curators. However, noteworthy individuals are found at many different levels of the Institution, so some accounts are selected on a case-by-case basis. This approach is more widely known as Capstone. For more details, consult our functional records schedule.

Challenges and Solutions

Email by its nature presents challenges to archival preservation, most notably: 1) the variety of email message formats; 2) message components that the email client hides by default; 3) capturing the internal organization of the account, often nested folders, that the account owner gave to the email; 4) capturing the interrelationships between messages and attachments within an account; and 5) the variety of file formats among embedded attachments.
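To make challenges 2 through 5 more concrete, the following is a minimal sketch, using only Python's standard `email` and `sqlite3` modules, of how a single message might be recorded so that its full header set, its folder location, and its attachments remain linked. It is illustrative only and is not DArcMail's code or schema: the table layout, the slash-joined folder path convention, the `header` and `store_message` helpers, and the sample message are all hypothetical.

```python
import sqlite3
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser

# Hypothetical schema: one row per message, one row per attachment,
# linked by a foreign key so message/attachment relationships survive.
SCHEMA = """
CREATE TABLE message (
    id INTEGER PRIMARY KEY,
    folder TEXT NOT NULL,      -- nested folder path, e.g. 'Inbox/Projects/2019'
    message_id TEXT,
    sender TEXT,
    subject TEXT,
    raw_headers TEXT           -- every header, including ones clients hide
);
CREATE TABLE attachment (
    id INTEGER PRIMARY KEY,
    message_id INTEGER NOT NULL REFERENCES message(id),
    filename TEXT,
    content_type TEXT,
    size_bytes INTEGER
);
"""


def header(msg, name):
    """Return a header as plain text, or None if the header is absent."""
    value = msg[name]
    return None if value is None else str(value)


def store_message(conn, folder, raw):
    """Parse one raw message and record it, all its headers, and its attachments."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    # Keep the complete header block, not just the fields a client displays.
    all_headers = "\n".join(f"{name}: {value}" for name, value in msg.items())
    cur = conn.execute(
        "INSERT INTO message (folder, message_id, sender, subject, raw_headers) "
        "VALUES (?, ?, ?, ?, ?)",
        (folder, header(msg, "Message-ID"), header(msg, "From"),
         header(msg, "Subject"), all_headers),
    )
    row_id = cur.lastrowid
    # walk() also descends into nested MIME parts, not just top-level ones.
    for part in msg.walk():
        if part.get_content_disposition() == "attachment":
            payload = part.get_payload(decode=True) or b""
            conn.execute(
                "INSERT INTO attachment (message_id, filename, content_type, size_bytes) "
                "VALUES (?, ?, ?, ?)",
                (row_id, part.get_filename(), part.get_content_type(), len(payload)),
            )
    return row_id


if __name__ == "__main__":
    # Hypothetical sample message with one attachment.
    sample = EmailMessage()
    sample["From"] = "curator@example.org"
    sample["To"] = "archives@example.org"
    sample["Subject"] = "Exhibit notes"
    sample["Message-ID"] = "<sample-1@example.org>"
    sample.set_content("Notes attached.")
    sample.add_attachment(b"%PDF-1.4 placeholder", maintype="application",
                          subtype="pdf", filename="notes.pdf")

    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    store_message(conn, "Inbox/Exhibits", sample.as_bytes())
    rows = conn.execute(
        "SELECT m.folder, m.subject, a.filename, a.content_type "
        "FROM message m JOIN attachment a ON a.message_id = m.id"
    ).fetchall()
    print(rows)  # [('Inbox/Exhibits', 'Exhibit notes', 'notes.pdf', 'application/pdf')]
```

Storing the folder path and a foreign key on each attachment row is one simple way to keep account organization and message/attachment relationships queryable after the original client is gone; a production tool would also need to handle many more source formats and much larger volumes.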

Scale presents another significant challenge. Because of the nature of the Smithsonian's work and the length of time depositors spend at the Institution, accounts range in volume from tens of thousands of emails to, in some cases, half a million or more. The number varies from year to year, but we typically acquire roughly fifteen accounts annually. Many available preservation tools cannot handle accounts of this size effectively.

DArcMail (Digital Archives of Email)

DArcMail v. 2.0 released!

Smithsonian Libraries and Archives is pleased to announce the release of DArcMail v. 2.0 on November 4, 2021. DArcMail is our open-source email preservation application, designed to preserve one or more email accounts and to support limited processing and access. First released in 2018, it is written in Python and uses SQLite as its backend; as a result, a full DArcMail implementation is free, platform-agnostic, and easy to maintain. DArcMail includes a graphical user interface, powerful search and filtering, and faster processing than our original open-source application, the CERP Email Preservation Parser. Read more about using DArcMail or download the software on GitHub.