The content of any given message may be small or large. With multiple attachments, the total content size for a single message can exceed 10 megabytes. Storing all message content in the database may be unwise for an account that contains more than 100,000 messages. For this reason, DArcMail allows you to divide the storage for message content between the database and external storage. Here are the rules:
- All header information for all messages is stored directly in the database. The content of any given message may be stored in the database or externally.
- The mbox format divides the content of a message into "parts". In the DArcMail database, each message part is stored separately; the maximum size for database storage of a single message part is 60,000 bytes.
- When loading an email account (or loading one email folder of an account), you may specify the upper size limit, in bytes, for database storage of a message part. If you specify 0 for this upper limit, then all parts of all messages will be stored externally. If you specify 60,000 for this upper limit, all parts that are 60,000 bytes or smaller will be stored in the database.
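The size rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described, not DArcMail's actual code; the function name `storage_location` and the constant `MAX_DB_PART_SIZE` are hypothetical.

```python
# Hypothetical sketch of the size rule; names are illustrative only.
MAX_DB_PART_SIZE = 60_000  # maximum size (bytes) of a part stored in the database


def storage_location(part_size, upper_limit):
    """Return 'database' or 'external' for a message part of part_size bytes."""
    if not 0 <= upper_limit <= MAX_DB_PART_SIZE:
        raise ValueError("upper limit must be between 0 and 60,000 bytes")
    # A limit of 0 means every part is stored externally, even an empty one.
    if upper_limit == 0:
        return "external"
    # Parts at or below the limit go into the database; larger parts go outside.
    return "database" if part_size <= upper_limit else "external"
```

For example, with an upper limit of 60,000 a part of exactly 60,000 bytes is stored in the database, while a part of 60,001 bytes is stored externally.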
If a message part is stored externally, it is stored in a file located in the same directory as the mbox file for the email folder that contained the message. File names for externally stored message parts are constructed from UUIDs (universally unique identifiers). They look like this:
60c675f1-a16e-4295-a20f-ae3d31405754.raw
d50e05ed-ed31-4e25-84d0-62bb660cf727.raw
Depending on the number of messages, the number of parts, and the upper size limit you specify, there may be thousands of these external files. To prevent the directory from becoming too large, you can tell the account loader program to distribute the externally stored content in subdirectories of the email folder directory. If you do request storage in subdirectories, then the loader program will create up to 1296 subdirectories ('aa','ab',...,'az','a0','a1',...,'a9','ba','bb',...,'z9',...,'99') within the folder directory and will distribute the content files among these subdirectories. The subdirectory in which a file is stored is determined by the 10th and 11th characters in the file name; since these characters are randomly distributed in a set of UUIDs, each subdirectory will contain roughly the same number of files. Subdirectories are created only as needed: if an email account has only one externally stored message part, then only one subdirectory will be created. When you delete a folder or account, if you specify that externally stored content also be deleted, then any subdirectories left empty will also be deleted.
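As an illustration of the subdirectory rule described above, the following Python sketch derives the storage path for an externally stored part from its file name. The function names `new_part_file_name` and `external_part_path` are assumptions made for this example and are not taken from the DArcMail source.

```python
import uuid
from pathlib import Path


def new_part_file_name():
    """Build a UUID-based file name such as '60c675f1-...-ae3d31405754.raw'."""
    return f"{uuid.uuid4()}.raw"


def external_part_path(folder_dir, file_name, use_subdirs):
    """Return the path at which an externally stored part would be written."""
    folder = Path(folder_dir)
    if not use_subdirs:
        return folder / file_name
    # The 10th and 11th characters (1-based) are at indexes 9 and 10 (0-based).
    subdir = file_name[9:11]
    return folder / subdir / file_name
```

With subdirectories enabled, the file 60c675f1-a16e-4295-a20f-ae3d31405754.raw would be placed in subdirectory 'a1', and d50e05ed-ed31-4e25-84d0-62bb660cf727.raw in subdirectory 'ed'. A real loader would also have to create the subdirectory on first use, for example with `Path.mkdir(exist_ok=True)`.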
It is possible for a single piece of content to occur in more than one message. For example, a signature block may be in its own message part, occurring in thousands of messages. Likewise, a large file may be attached to several messages. DArcMail optimizes storage in the following way:
- When processing a message part, the load program computes the SHA1 checksum for the part.
- The database is searched to see if it already contains a message part which (a) is part of a message in the same account and the same folder as the folder currently being loaded, and (b) has the same checksum. If a match is found, then instead of duplicating storage, the program associates the existing part with the new message.
- If a shared message part is an attachment, then it typically has a message-specific name. So for attachments, the database stores the message-specific name separately for each message that uses the shared content.
- The same optimization logic applies to message parts stored internally in the database and to message parts stored externally. However, when a part is to be stored internally, only existing internally stored parts are searched, and when a part is to be stored externally, only existing externally stored parts are searched. This condition is an automatic consequence of these rules: (a) Storage optimization applies only within a single email folder. (b) When loading a folder, the same "max size for internal storage" parameter applies to all message parts within the folder.
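The deduplication steps above can be sketched as follows. This is a minimal in-memory model, assuming a dictionary in place of the real database lookup and `hashlib.sha1` for the checksum; the class `PartIndex` and its method `add_part` are hypothetical names, not DArcMail's schema or API.

```python
import hashlib


class PartIndex:
    """Hypothetical in-memory stand-in for the database's part lookup."""

    def __init__(self):
        self._by_key = {}   # (account, folder, checksum) -> part id
        self._next_id = 1
        self.links = []     # (message id, part id, attachment name)

    def add_part(self, account, folder, message_id, content, attachment_name=None):
        """Store or reuse a part; return (part_id, reused)."""
        checksum = hashlib.sha1(content).hexdigest()
        # Matching is scoped to the same account and the same folder.
        key = (account, folder, checksum)
        part_id = self._by_key.get(key)
        reused = part_id is not None
        if not reused:
            part_id = self._next_id
            self._next_id += 1
            self._by_key[key] = part_id
        # The message-specific attachment name lives on the link, not the part.
        self.links.append((message_id, part_id, attachment_name))
        return part_id, reused
```

In this sketch, loading the same signature block twice into one folder yields a single stored part with two links, while loading it into a different folder creates a new part, since the optimization applies only within a single folder.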