Home Directory for the Email Parser Web Service Control Application

Intro

This index.html file is the default page that the Web Application front end of the Squeak 3.9 email parser expects to find. It must be present, named index.html, in the Squeak parser home directory.

At the time of this writing (10/4/2008), this file is a placeholder.  It should be expanded into a multipage FAQ on topics such as:
Since a set of FAQ pages can be built and edited with any simple HTML editor (I use the Mozilla "Composer"), I'm leaving it as a sketch for now. Note: if someone uses one of the high-powered HTML editors to add to or modify this FAQ, the tool will very likely transmogrify simple HTML pages into wildly complex ones that can no longer be edited manually and, typically, can be edited only with the same tool that created them. I recommend using only the simplest tools unless there is a VERY good reason to lock yourself into one of the trickier ones.

Basic assumptions about the parsing tool

The Web interface assumes that all mail accounts to be parsed are rooted in the designated Email_Accounts directory located within the Squeak Parser home directory. That is, the root directories for all Account/folder/mbox trees must be subdirectories of the Email_Accounts directory.
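The rule that every account root must be a subdirectory of Email_Accounts can be checked mechanically. The following is a minimal Python sketch, not part of the Squeak parser itself; the function name resolve_account is hypothetical, and it assumes "subdirectory" means an immediate child of Email_Accounts.

```python
from pathlib import Path


def resolve_account(parser_home: str, account_name: str) -> Path:
    """Return the account root directory, requiring it to be an immediate
    child of <parser_home>/Email_Accounts (assumed layout check)."""
    accounts_root = (Path(parser_home) / "Email_Accounts").resolve()
    account = (accounts_root / account_name).resolve()
    # Reject names such as "../elsewhere" or "a/b" that escape or nest deeper.
    if account.parent != accounts_root:
        raise ValueError(f"{account_name!r} is not a subdirectory of {accounts_root}")
    if not account.is_dir():
        raise FileNotFoundError(f"no account directory at {account}")
    return account
```

Resolving both paths before comparing them means that symbolic links and ".." components cannot make a directory outside Email_Accounts look like an account root.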
It also assumes that each subdirectory of a well-formed Account directory (there must be at least one) contains one and only one xxxx.mbox file, which holds the email from a single folder. The name of that subdirectory is taken to be the folder name the user gave it, and the mbox file must contain all the email messages in that folder. If the archivist doing the parsing received a single mbox file with no indication of what the user named the folder containing it, the archivist will have to choose a name, presumably guided by the sender, receiver, or subject lines in the messages (as seen by opening the mbox file in a plain editor such as Notepad).
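Before starting a parse, an archivist could check these well-formedness rules with a small script. This is a hedged Python sketch rather than the parser's own validation: the name check_account is hypothetical, and it only inspects the immediate subdirectories of the account directory.

```python
from __future__ import annotations

from pathlib import Path


def check_account(account_dir: str) -> dict[str, Path]:
    """Map each folder name to its single .mbox file, or raise ValueError
    if the account does not follow the layout described above."""
    root = Path(account_dir)
    folders = sorted(p for p in root.iterdir() if p.is_dir())
    if not folders:
        raise ValueError(f"{root} has no folder subdirectories")
    result: dict[str, Path] = {}
    for folder in folders:
        mboxes = [f for f in folder.glob("*.mbox") if f.is_file()]
        if len(mboxes) != 1:
            raise ValueError(
                f"{folder.name}: expected exactly one .mbox file, found {len(mboxes)}"
            )
        # The subdirectory name is taken as the folder name the user chose.
        result[folder.name] = mboxes[0]
    return result
```

Reporting the offending folder by name makes it easy to spot a missing or duplicated mbox file before clicking "Proceed with parsing".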

How to use the Web Interface

The user will see a text box in which to enter the directory of the account to be parsed. Below that box is a button labeled "Proceed with parsing". If the account is well-formed, clicking that button starts parsing on the server.

Below that is a "Parse Status" button. Once a parse has begun, clicking that button displays the parser status below it (the same text you would see in the Transcript window within Squeak).

Well-formed accounts

Note: all parser input is expected to be exactly as it is for parsing from inside Squeak, i.e., a tree of folder directories, each with its mbox file, if any. Output will also be placed on the server just as before: the XML output, the MessageSummary.csv file, the attachment files, and the BadMessage files, if any, will be created in the same places as when parsing from inside Squeak.
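Because the input is described here as a tree of folder directories, each with its mbox file if any, a recursive listing can show what the parser will see. The following Python sketch is illustrative only; walk_folders is a hypothetical helper, and it assumes each folder holds at most one .mbox file.

```python
import os
from typing import Iterator, Optional, Tuple


def walk_folders(account_dir: str) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (folder path relative to the account root, mbox filename or None)
    for every folder directory in the account tree, top-down."""
    for dirpath, dirnames, filenames in os.walk(account_dir):
        dirnames.sort()  # visit subfolders in a stable, alphabetical order
        rel = os.path.relpath(dirpath, account_dir)
        if rel == os.curdir:
            continue  # the account root itself is not a mail folder
        # Assumes at most one .mbox per folder; the first (sorted) one is reported.
        mboxes = sorted(f for f in filenames if f.endswith(".mbox"))
        yield rel, (mboxes[0] if mboxes else None)
```

Folders reported with None have no mbox file; according to the note above, that is permitted in the input tree.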

Other technical tidbits

The Comanche web server listens on port 9090, and the Seaside parsing application service listens on port 9091. Both Comanche and Seaside run as independent processes inside the running Squeak image. The home directory for the Comanche web server is assumed to be the directory in which the parser .exe and image files live.
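If the Web interface does not respond, a quick first check is whether anything is accepting connections on the two ports named above. This Python sketch assumes it runs on the same machine as the Squeak image (hence "localhost"). It only tests TCP reachability and does not confirm that Comanche or Seaside is the program listening.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host unreachable
        return False


if __name__ == "__main__":
    # Ports from this page: Comanche on 9090, Seaside parser on 9091.
    for name, port in (("Comanche", 9090), ("Seaside parser", 9091)):
        state = "listening" if port_open("localhost", port) else "not reachable"
        print(f"{name} (port {port}): {state}")
```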