Home Directory for the Email Parser Web Service Control Application
Intro
This index.html file is the default index.html page assumed by the Web
Application front-end of
the Squeak 3.9 email parser. It must be present (named
index.html) in the Squeak parser home directory.
At the time
of this writing (10/4/2008), this file is a placeholder. It
should be expanded into a multipage FAQ on topics such as:
- How to use the parser, both from the Web Interface and from
within Squeak.
- How to prepare the Account tree, i.e., what files must be where
in a well-formed account tree.
- What files are produced by the parser and what they mean:
MailAcctName.xml (the main parser output), attachxxxxx.xml,
BadMessagexxxxx.eml, MessageSummary.csv, parseStatus.txt, and so forth.
- Possibly, how to install the parser
- Some pedigree on Squeak 3.9 and how to learn Smalltalk/Squeak
- Certainly how to stop/start both Comanche and Seaside
- Some links to Seaside documentation in case the user needs to
modify anything
Since a set of FAQ pages can be built and edited with any simple html
editor (I use the Mozilla "Composer"), I'm leaving it as a sketch for
now. Note: if someone uses one of the high-powered html editors
to add to or modify this FAQ, the tool will very likely transmogrify
simple html pages into wildly complex pages that can no longer be
edited manually and, typically, can only be edited by the same tool that
created them. I recommend using only the simplest tools unless
there is a VERY good reason to lock yourself into using one of the
trickier tools.
Basic assumptions about the parsing tool
The Web interface assumes that all mail accounts to be parsed
are rooted in the designated Email_Accounts directory located within
the Squeak Parser home directory.
That is, the root directories for all Account/folder/mbox
trees must be subdirectories of the Email_Accounts directory.
It also assumes that each subdirectory of a well-formed Account
directory (of which there must be at least one) contains one and only
one xxxx.mbox file, which holds all the email messages in one folder.
The name of that subdirectory is assumed to be the name of the folder as
given by the user. If the archivist doing the parsing happened to
receive a single mbox file with no indication of what the user might
have named the folder containing it, the archivist will have to choose
a name, presumably guided by the sender, receiver, or subject lines in
the messages (as determined by looking at the mbox file with some
vanilla editor such as Notepad).
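The layout rules above can be checked before a parse is started. The following is a minimal sketch in Python, not part of the parser itself: the function name check_account is hypothetical, the case-insensitive match on the ".mbox" extension is an assumption, and only the first level of folder subdirectories under the account is inspected.

```python
from pathlib import Path


def check_account(account_dir):
    """Return a list of problems found in one account directory.

    An empty list means the account looks well-formed under the rules
    described above: at least one folder subdirectory, and exactly one
    .mbox file in each folder subdirectory.
    """
    account = Path(account_dir)
    if not account.is_dir():
        return [f"{account} is not a directory"]

    problems = []
    # Each direct subdirectory is treated as one mail folder.
    folders = sorted(p for p in account.iterdir() if p.is_dir())
    if not folders:
        problems.append(f"{account.name}: no folder subdirectories")

    for folder in folders:
        # Assumption: the extension is matched case-insensitively.
        mboxes = [f for f in folder.iterdir()
                  if f.is_file() and f.suffix.lower() == ".mbox"]
        if len(mboxes) != 1:
            problems.append(
                f"{folder.name}: expected exactly one .mbox file, "
                f"found {len(mboxes)}")
    return problems
```

Calling check_account on an account directory under Email_Accounts returns an empty list when every folder subdirectory holds exactly one mbox file, and otherwise one message per folder that breaks the rule.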
How to use the Web Interface
The user will see a text box in which to choose the directory of
the account to be parsed. Below that box is a button labeled
"Proceed with parsing". Assuming that the account is well-formed,
parsing will begin on the server when that button is clicked.
Below that is a "Parse Status" button. Once a parse has begun,
clicking that button displays the parser status below it (what you
would see in the Transcript window within Squeak).
Well-formed accounts
Note: all parser input will be expected to be just as it is for parsing
from inside
Squeak, i.e., a tree of folder directories each with its mbox file, if
any.
Output is also placed on the server just as it is when parsing from
inside Squeak, i.e., the xml output, the MessageSummary.csv file, the
attachment files, and the BadMessage files, if any, are created in the
same places.
Other technical tidbits
The Comanche web server is exposed on port 9090
and the Seaside parsing application service is exposed on port 9091.
Both Comanche and Seaside run as independent processes inside the
running
Squeak image. The home directory for the Comanche web server is assumed
to be the same directory as that in which the parser .exe and image
files live.
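A quick way to confirm that both services are up is to test whether the two ports accept TCP connections. The following is a minimal sketch in Python under stated assumptions: the host defaults to "localhost" (the real server name is not given here), and the function names port_is_open and service_status are hypothetical. It only checks that something is listening on each port, not that the listener is actually Comanche or Seaside.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def service_status(host="localhost"):
    """Report whether the two ports named above accept connections."""
    return {
        "Comanche (9090)": port_is_open(host, 9090),
        "Seaside (9091)": port_is_open(host, 9091),
    }
```

For example, service_status() returns a dictionary with one True/False entry per service. A False for either entry suggests that service needs to be started inside the Squeak image.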