The Collaborative Electronic Records Project

Email Preservation Parser

The Email Preservation Parser is available for download and use. We hope that as archivists and other users work with the parser, its user community will develop, incorporate, and share additional enhancements and functionality. We therefore offer it under open source software and documentation licenses.

The parser is designed to be used on email accounts, or groups of related email messages, that are transferred to an archival organization as an accession, in contrast to email records that are harvested incrementally from active email systems.

The application runs in the open source Squeak Smalltalk virtual machine environment, which works on Windows, Macintosh, and other operating systems. The parser itself has been tested only in a Windows XP environment. See the parser installation and user guide documentation for full details.

We would like to know if you choose to download and try the parser. Please send an email to IT Archivist Ricc Ferrante (ferranter [at] si.edu). We would be glad to answer questions about the parser's installation and use, and about how the Smithsonian Institution Archives (SIA) is using it in its ongoing digital preservation activities.

Download CERP's Email Preservation Parser

Please download and review the installation guide before installing the application.


Parser Software

Test email account

  • Email Account Example.zip
  • This is a test email account in the MBOX format that can be used for testing once the parser is installed.

Tips to keep in mind

  • The MBOX file must have the .mbox extension. If you have .mbx files, rename them to .mbox before running the parser. Thunderbird uses the .mbx extension, which the parser does not recognize.
  • When installing the parser, save all files at the top level of the C:\ drive so that the parser runs correctly; that is, do not install it on your Desktop or in Program Files. The installation folder should be C:\EmailParser.
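
For accounts with many mailbox files, the extension change described in the first tip can be scripted. The following is a minimal sketch in Python and is not part of the parser distribution; the function name rename_mbx_to_mbox and the folder passed to it are hypothetical, and it assumes the files only need a new extension, with no change to their contents.

```python
from pathlib import Path


def rename_mbx_to_mbox(folder):
    """Rename every *.mbx file directly inside `folder` to *.mbox.

    Hypothetical helper, not part of the parser. Returns the list of
    new paths. A file is skipped if its .mbox target already exists,
    so nothing is overwritten.
    """
    renamed = []
    # sorted() materializes the listing before any file is renamed.
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() == ".mbx":
            target = path.with_suffix(".mbox")
            if target.exists():
                continue  # leave both files in place for manual review
            path.rename(target)
            renamed.append(target)
    return renamed
```

Calling rename_mbx_to_mbox on the folder that holds the transferred mailbox files returns the renamed paths, which can be checked before the parser is run. Subfolders are not searched; working on a copy of the account keeps the original transfer untouched.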

Squeak (Smalltalk) Virtual Machine Environment

Open Source Licenses

The Squeak license is posted at http://www.squeak.org/SqueakLicense/ on the Squeak website and applies to the Squeak VME. The parser itself is covered by the MIT License. The documentation is covered by both the MIT License and a Creative Commons Attribution Non-commercial Share Alike 3.0 license.