The Smithsonian Institution Archives joined early email preservation efforts in the mid-2000s. Its collections include email records dating to the 1980s that were generated using the ELM (Electronic Mail) program, the first email program used at the Smithsonian. Since then, the Smithsonian has used a variety of email applications and formats, including PINE, cc:Mail, Lotus Notes, and GroupWise, before adopting Microsoft Outlook and Exchange just as the Collaborative Electronic Records Project (CERP), the Archives’ joint email preservation research project with the Rockefeller Archive Center, was getting underway.
As a result of the research with the Rockefeller Archive Center and subsequent collaborations with other organizations, the Archives developed a methodology better suited to the major shift in email use, from isolated use for convenience to a ubiquitous form of informal and formal correspondence. The Archives also developed open-source applications, first to preserve email (CERP Email Preservation Parser) and more recently to add processing and access (DArcMail).
The Archives focuses its email collections development on the accounts of identified record creators. These include senior Institution managers, museum directors, and some senior curators. However, noteworthy individuals are found at many different levels of the Institution, so some selection of accounts occurs on a case-by-case basis. This approach has become more popularly known as Capstone. For more details, consult our functional records schedule.
Challenges and Solutions
Email by its nature presents challenges to archival preservation, most notably: 1) the variety of email message formats; 2) message components that the email client hides by default; 3) capturing the internal organization, often nested, that the account owner gives to the email; 4) capturing the interrelationships between messages and attachments within an account; and 5) the variety of file formats among embedded attachments.
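Several of these challenges become concrete once a message is parsed rather than viewed in a client. The sketch below uses only Python's standard `email` package and is not taken from the CERP parser or DArcMail. The sample message, the addresses, and the function names `build_sample` and `describe` are hypothetical and exist only for illustration. It lists every header, including ones a client would normally hide, records the identifiers that link a reply to its parent, and inventories attachments by file name and type.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_sample() -> bytes:
    """Create a small illustrative message with one attachment (made-up data)."""
    msg = EmailMessage()
    msg["From"] = "curator@example.org"
    msg["To"] = "director@example.org"
    msg["Subject"] = "Exhibit checklist"
    msg["Message-ID"] = "<msg-002@example.org>"
    msg["In-Reply-To"] = "<msg-001@example.org>"  # links this reply to its parent
    msg["X-Mailer"] = "ExampleMail 1.0"  # a header most clients hide by default
    msg.set_content("Checklist attached.")
    msg.add_attachment(
        b"%PDF-1.4 placeholder bytes",
        maintype="application",
        subtype="pdf",
        filename="checklist.pdf",
    )
    return msg.as_bytes()


def describe(raw: bytes) -> dict:
    """Parse raw message bytes and summarize headers, threading ids, and attachments."""
    msg = message_from_bytes(raw, policy=policy.default)
    headers = [(name, str(value)) for name, value in msg.items()]
    attachments = [
        {
            "filename": part.get_filename(),
            "content_type": part.get_content_type(),
            "size": len(part.get_content()),
        }
        for part in msg.iter_attachments()
    ]
    return {
        "message_id": str(msg["Message-ID"]),
        "in_reply_to": str(msg["In-Reply-To"]),
        "headers": headers,
        "attachments": attachments,
    }


if __name__ == "__main__":
    summary = describe(build_sample())
    for name, value in summary["headers"]:
        print(f"{name}: {value}")
    print(summary["attachments"])
```

A preservation workflow would need to retain all of this information for every message, along with the folder path in which each message was filed.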
The sheer scale of accounts presents another significant challenge. The nature of the Smithsonian’s work and the depositors’ length of time at the Institution result in accounts that range in volume from tens of thousands of emails to, in some cases, half a million emails or more. Although the number varies by year, we are likely to acquire roughly fifteen accounts annually. Many available preservation tools are unable to handle accounts of this size effectively.
Our most recent application, DArcMail (Digital Archives of eMail), was created to accomplish preservation and limited processing and access of one or more email accounts. Released in 2018, this open-source tool is Python-based and uses either SQLite or MySQL on the backend. Consequently, a full DArcMail implementation is no-cost, platform-agnostic, and easy to maintain. DArcMail adds a graphical user interface, powerful search and filtering, and reduced processing time to the CERP tool’s preservation functionality. Read more about using DArcMail or download the software.
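To give a sense of why a Python tool with a SQLite backend is inexpensive to run and maintain, here is a minimal sketch of indexing message metadata and filtering it with Python's built-in `sqlite3` module. This is not DArcMail's actual schema or code. The table name `message`, its columns, the sample rows, and the functions `build_index` and `search` are all assumptions made for illustration.

```python
import sqlite3


def build_index(rows):
    """Load (folder, sender, subject, sent_date) tuples into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE message ("
        "id INTEGER PRIMARY KEY, folder TEXT, sender TEXT, "
        "subject TEXT, sent_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO message (folder, sender, subject, sent_date) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def search(conn, sender=None, subject_contains=None):
    """Return (folder, subject) pairs matching the optional filters, oldest first."""
    clauses, params = [], []
    if sender is not None:
        clauses.append("sender = ?")
        params.append(sender)
    if subject_contains is not None:
        clauses.append("subject LIKE ?")
        params.append(f"%{subject_contains}%")
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = "SELECT folder, subject FROM message" + where + " ORDER BY sent_date"
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    # Hypothetical sample rows; the folder column keeps the nested path intact.
    sample = [
        ("Inbox/Exhibits/2015", "curator@example.org", "Exhibit checklist", "2015-03-02"),
        ("Inbox/Admin", "director@example.org", "Budget review", "2015-03-05"),
        ("Inbox/Exhibits/2016", "curator@example.org", "Loan agreement", "2016-01-11"),
    ]
    conn = build_index(sample)
    print(search(conn, sender="curator@example.org"))
    print(search(conn, subject_contains="budget"))
```

Because SQLite stores the whole database in a single file and ships with Python, a setup along these lines needs no separate database server, which is consistent with the no-cost and easy-to-maintain goals described above.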