The DArcMail Suite, created by the Smithsonian Institution Archives, produces XML preservation files of email accounts and lets a user search a collection of email messages.

Email Management Remains Important

Archives and libraries explore various tools to preserve and share email collections.

Happy New Year and Happy 2020!

Love it or hate it, email is still a big part of most of our lives, despite the growth of cloud, mobile, and social media platforms for communication over the last ten years. Radicati, a technology market research group, predicted that there will be more than four billion email users in 2020. That's more than half of the world's population.

A new year is a great time to think about or even revisit email account practices.

An email inbox showing a list of CNN alerts.

Should all email messages be kept? No! We have an excellent post on weeding email messages in the workplace that is still relevant today. Some of that advice also applies to personal email, for example to spam or to messages superseded by later replies in the same thread. Some email clients can even help automate cleanup, depending on the features they offer. For instance, many clients let you set up filters or rules that automatically move messages from a specific sender into a designated folder.
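As a rough illustration of how such a rule works, here is a minimal Python sketch that sorts messages into folders by sender address using the standard library's `email` package. The sender address, folder names, and the `apply_rules` and `make_message` helpers are hypothetical; real email clients set up rules through their own settings screens rather than code.

```python
from email.message import EmailMessage
from email.utils import parseaddr

# Hypothetical rules: map a sender's address to a destination folder name.
RULES = {
    "alerts@example.com": "News Alerts",
}

def make_message(sender: str, subject: str) -> EmailMessage:
    """Build a minimal message for demonstration purposes."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["Subject"] = subject
    msg.set_content("...")
    return msg

def apply_rules(messages, rules, default="Inbox"):
    """Group messages into folders based on the sender's address."""
    folders = {}
    for msg in messages:
        # parseaddr splits "Display Name <address>" into (name, address).
        _, address = parseaddr(str(msg["From"]))
        folder = rules.get(address.lower(), default)
        folders.setdefault(folder, []).append(msg)
    return folders

inbox = [
    make_message("Breaking News <alerts@example.com>", "Alert"),
    make_message("colleague@example.org", "Meeting notes"),
]

for folder, msgs in apply_rules(inbox, RULES).items():
    print(folder, [str(m["Subject"]) for m in msgs])
```

Matching on the parsed address rather than the full "From" line keeps a rule working even when the sender's display name changes.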

Many archives, libraries, and other institutions recognize the value of email messages and collections, and they are exploring, in a variety of ways, the issues and tools involved in email appraisal, sensitive data, duplicate messages, preservation, and access.

Good email practices are important to us in the Archives. Email collections can offer researchers a wealth of information on key business decisions, day-to-day operations, and the social networks evident in “To” and “From” lines. Whether email can be used successfully in research depends on how a collection is managed by its creator over time and, eventually, by a repository. Just as a donor typically should not bequeath a box of grocery store receipts (depending on the collecting mission of the archive, of course), an email collection usually should not permanently retain messages about donuts in the breakroom or neighborhood listserv announcements about pet sitters. The repository needs to make sure that messages and attachments remain accessible while retaining their authenticity and integrity, even if that means presenting them in a format or program different from the software in which the email was originally viewed.

Email collections are further complicated by the possibility that messages or attachments contain sensitive data, such as bank account numbers, Social Security numbers, or other personal information. Some current email applications, though, can flag or block this information before a message is sent.
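Tools that search for sensitive data often begin with simple pattern matching. The Python sketch below flags strings shaped like U.S. Social Security numbers (NNN-NN-NNNN) in message text. The pattern and the `find_possible_ssns` helper are illustrative assumptions, not the method used by any of the projects described here. A regular expression like this will miss numbers written in other formats and can match unrelated numbers, so anything it flags still needs human review.

```python
import re

# Hypothetical pattern: 3 digits, 2 digits, 4 digits, separated by hyphens.
# The lookarounds prevent matches inside longer runs of digits.
SSN_PATTERN = re.compile(r"(?<!\d)(\d{3})-(\d{2})-(\d{4})(?!\d)")

def find_possible_ssns(text: str) -> list[str]:
    """Return substrings that look like Social Security numbers."""
    matches = []
    for m in SSN_PATTERN.finditer(text):
        area, group, serial = m.groups()
        # Skip combinations never issued as SSNs:
        # area 000, 666, or 900-999; group 00; serial 0000.
        if area in ("000", "666") or area.startswith("9"):
            continue
        if group == "00" or serial == "0000":
            continue
        matches.append(m.group(0))
    return matches

body = "Please update my file: SSN 123-45-6789. Call 555-123-4567 or use ref 000-12-3456."
print(find_possible_ssns(body))  # → ['123-45-6789']
```

The phone number does not fit the pattern, and the reference number is skipped because area 000 is never issued.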

Some email projects that are exploring these issues:

  • The Review, Appraisal, and Triage of Mail (RATOM) is a project at the University of North Carolina at Chapel Hill in partnership with the State Archives of North Carolina. Funded by a grant from the Andrew W. Mellon Foundation, the project is exploring open-source tools and procedures to identify email in born-digital collections and to detect sensitive information. It also helps identify important messages for preservation, which can then be tagged or labeled for better organization and retrieval.
  • PDF as an Archival Container for Email, a University of Illinois at Urbana-Champaign project also funded by the Andrew W. Mellon Foundation, is defining the requirements for preserving email as PDF. A draft of the requirements is expected to be released for public comment in early 2020. This work could help standardize PDF as another option for preserving email messages and collections and making them accessible.

(Full disclosure: the author is involved with both projects, serving on the advisory board for the RATOM project and as a collaborator on the PDF email project.)

An email alert about the weather, with a labels tag above the message.

  • ePADD – This open-source project from Stanford Libraries lets a potential donor or account holder sort their email, decide which messages they want to donate to a repository, and export them. An archive can also use ePADD to search email for sensitive information such as Social Security numbers and to filter by correspondent. Researchers can use ePADD to view a collection of email messages through its user-friendly interface.

Screenshot from the Archives' DArcMail suite.

  • DArcMail Suite – This open-source application was developed by the Smithsonian Institution Archives for its processing, review, access, and preservation work with email collections. It creates XML preservation files of email accounts and offers basic searching, sorting, and browsing within email collections. Work has also begun on searching for sensitive information.
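To give a rough sense of what an XML preservation file records, here is a minimal Python sketch that wraps a message's key headers and plain-text body in XML using only the standard library. The element names, sample message, and `message_to_xml` helper are hypothetical and do not reflect the schema DArcMail actually writes. A real preservation format also has to handle attachments, multipart messages, folder structure, and character encodings.

```python
import xml.etree.ElementTree as ET
from email import message_from_string
from email.policy import default

RAW = """From: Jane Doe <jane@example.org>
To: archives@example.org
Subject: Quarterly report
Date: Mon, 6 Jan 2020 09:30:00 -0500
Message-ID: <1234@example.org>

The report is attached.
"""

def message_to_xml(raw: str) -> ET.Element:
    """Wrap selected headers and the plain-text body in hypothetical XML elements."""
    msg = message_from_string(raw, policy=default)
    root = ET.Element("Message")
    for name in ("Message-ID", "Date", "From", "To", "Subject"):
        value = msg[name]
        if value is not None:
            # Hypothetical element names: "Message-ID" becomes "MessageID".
            ET.SubElement(root, name.replace("-", "")).text = str(value)
    # get_body returns the best text part; for a plain message, the message itself.
    body = msg.get_body(preferencelist=("plain",))
    ET.SubElement(root, "Body").text = body.get_content() if body is not None else ""
    return root

print(ET.tostring(message_to_xml(RAW), encoding="unicode"))
```

Keeping headers as separate elements preserves the "To" and "From" information that researchers use to reconstruct correspondence networks.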

While practitioners generally agree on some areas of digital preservation (for example, uncompressed Broadcast WAV for audio and PDF/A or PDF for proprietary word-processing documents), there is no single best solution for processing, preserving, and providing access to email collections. More options have been developed since the early-to-mid-2000s, when this work was just beginning. Each archive or library needs to examine the systems and workflows it already has in place to determine which email tools and procedures to adopt.


Produced by the Smithsonian Institution Archives. For copyright questions, please see the Terms of Use.