5.1. On-disk layout
Everything the TOA server keeps for itself lives under
toa-server.dataRoot (see Server settings). The layout is
plain folders and files - no database - so an operator with shell
access can answer support questions and do manual cleanup with
ordinary tools.
5.1.1. Top-level structure
<dataRoot>/
  <domain.code>/                  one folder per configured domain
    cmserver2.xml                 cached templates (URL-based domains only)
    yyyy-MM-dd/                   one folder per server-local calendar day
      HHmm-xxxxxxxx/              one folder per import (HHmm + 8 hex chars)
        import.json
        doc-1/
          pages/
            page-1.bin
            page-1.meta.json      EML pages only
            page-2.bin
            ...
        doc-2/                    sibling document, see below
          pages/
            page-1.bin
            ...
The per-domain folder is created on startup. The
yyyy-MM-dd folder is created the first time an import lands on
that calendar day. The HHmm-xxxxxxxx folder is created when the
add-in calls POST /import/<domain>. Nothing else writes into
<dataRoot>.
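Because the layout is only folders and files, an operator script can enumerate imports by walking two directory levels. The following is a minimal sketch, not part of the TOA server: the function name list_imports and the regular expressions are illustrative assumptions, and the pattern accepts both upper- and lower-case hex because the case of the suffix is not stated here.

```python
import re
from pathlib import Path

# Folder-name patterns derived from the layout above (assumed, not server code).
DATE_DIR = re.compile(r"\d{4}-\d{2}-\d{2}")
IMPORT_DIR = re.compile(r"\d{4}-[0-9a-fA-F]{8}")


def list_imports(data_root, domain):
    """Return sorted (date_folder, import_folder) name pairs for one domain."""
    domain_dir = Path(data_root) / domain
    found = []
    for date_dir in domain_dir.iterdir():
        # Skips cmserver2.xml and anything else that is not a date folder.
        if not (date_dir.is_dir() and DATE_DIR.fullmatch(date_dir.name)):
            continue
        for import_dir in date_dir.iterdir():
            if import_dir.is_dir() and IMPORT_DIR.fullmatch(import_dir.name):
                found.append((date_dir.name, import_dir.name))
    return sorted(found)
```

Sorting by name also sorts chronologically, since both folder levels use fixed-width, most-significant-first fields.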
The yyyy-MM-dd and HHmm parts use the server-local time
zone. The same zone governs the retention cutoff, so a misconfigured
zone has visible consequences both here and in
Retention and cleanup. Pin the JVM time zone explicitly on every
TOA server instance - see Time zone.
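To illustrate why the zone matters, the sketch below derives the date-folder and HHmm parts for one instant under two different zones. It is an illustration only: folder_names is a hypothetical helper, and fixed UTC offsets stand in for real named time zones.

```python
from datetime import datetime, timedelta, timezone


def folder_names(instant, zone):
    """Return the (yyyy-MM-dd, HHmm) parts for an instant seen in a given zone."""
    local = instant.astimezone(zone)
    return local.strftime("%Y-%m-%d"), local.strftime("%H%M")


# One and the same instant: 23:30 UTC on 1 March 2024.
instant = datetime(2024, 3, 1, 23, 30, tzinfo=timezone.utc)

print(folder_names(instant, timezone.utc))                  # ('2024-03-01', '2330')
print(folder_names(instant, timezone(timedelta(hours=2))))  # ('2024-03-02', '0130')
```

An import created late in the evening can therefore land in a different date folder, and fall on a different side of the retention cutoff, depending on which zone the server believes it is in.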
5.1.2. Import identifier
The API-level import id has the form:
yyyy-MM-dd_HHmm-xxxxxxxx
The two halves correspond directly to the date folder and the import folder on disk. Given an id, an operator can locate the import on disk without searching:
<dataRoot>/<domain>/<yyyy-MM-dd>/<HHmm-xxxxxxxx>/
The xxxxxxxx suffix is 8 random hex characters; it makes the id
unguessable for download URLs and keeps imports unique within the
same minute.
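The mapping from id to folder can be sketched as a small helper. This is an assumption-laden illustration rather than server code: import_path and the validation pattern are hypothetical, and the strict match is there so that a malformed id can never resolve to a path outside the expected two levels.

```python
import re
from pathlib import Path

# yyyy-MM-dd_HHmm-xxxxxxxx, split at the underscore into the two folder names.
IMPORT_ID = re.compile(r"(\d{4}-\d{2}-\d{2})_(\d{4}-[0-9a-fA-F]{8})")


def import_path(data_root, domain, import_id):
    """Resolve an API-level import id to its folder under <dataRoot>/<domain>/."""
    match = IMPORT_ID.fullmatch(import_id)
    if match is None:
        raise ValueError(f"not a valid import id: {import_id!r}")
    date_folder, import_folder = match.groups()
    return Path(data_root) / domain / date_folder / import_folder
```

For example, import_path("/srv/toa", "acme", "2024-03-01_0915-a1b2c3d4") yields /srv/toa/acme/2024-03-01/0915-a1b2c3d4, where /srv/toa and acme are placeholder values.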
5.1.3. What each file means
import.json
    Single source of truth for an import. Operator-relevant fields:
    - status - DRAFT, PENDING, SUBMITTED or FAILED. FAILED carries an error message; SUBMITTED carries damisBatchId for cross-referencing the storage server.
    - userName / userEmail - whoever created the import in Outlook. userEmail is also the ownership key enforced on subsequent mutations.
    - documents[] - the list of documents in this import; each entry carries its template id, attribute values and pages[] metadata (filename, byte size, content type, sidecar filename if any).
    The file is rewritten atomically (write to import.json.tmp, then ATOMIC_MOVE). If you ever see import.json.tmp left over, the server crashed mid-rewrite - it is safe to delete; the previous import.json is intact.

doc-N/
    One folder per document inside the import. doc-1 always exists and corresponds to the original message the user uploaded. doc-2, doc-3, … are sibling documents created by the “extract attachments” flow - each holds the attachments split out of an EML page in doc-1 (or a later sibling).

doc-N/pages/page-M.bin
    Raw page payload. The byte stream is whatever the client posted - typically an EML message for page-1 of doc-1, or a single extracted attachment for sibling-document pages. The .bin extension is intentional; the semantic content type lives in import.json and (for EML) in the sidecar.

doc-N/pages/page-M.meta.json
    Sidecar produced for message/rfc822 pages only. Contains the decoded from / subject and the list of MIME attachments (filename, content type, decoded size). It is a convenience index - if the sidecar is missing or unreadable, the .bin is still the source of truth and the server falls back to re-parsing on demand. Sidecars are rewritten when attachments are extracted, so they always match the on-disk EML.

cmserver2.xml
    Only present for domains whose templates are loaded from a URL (see Template catalogue). Cached copy of the last successfully downloaded catalogue; used as the fallback when the next refresh fails. Safe to delete - the next refresh re-downloads it. If you delete it while the remote URL is also unreachable, the domain has no catalogue until either the URL recovers or you drop in a copy by hand.
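For support questions it is often enough to pull a handful of fields out of import.json. The sketch below is a hypothetical operator helper, not part of the server; it assumes the fields named above (status, error, damisBatchId, userEmail, documents) are top-level keys of the JSON object, which is not spelled out here.

```python
import json
from pathlib import Path


def summarize_import(import_dir):
    """Collect the operator-relevant facts about one import folder."""
    import_dir = Path(import_dir)
    data = json.loads((import_dir / "import.json").read_text(encoding="utf-8"))
    status = data.get("status")
    summary = {
        "status": status,
        "owner": data.get("userEmail"),
        "documents": len(data.get("documents", [])),
        # A leftover import.json.tmp indicates a crash mid-rewrite.
        "stale_tmp": (import_dir / "import.json.tmp").exists(),
    }
    if status == "FAILED":
        summary["error"] = data.get("error")
    elif status == "SUBMITTED":
        summary["damisBatchId"] = data.get("damisBatchId")
    return summary
```

The helper only reads, so it is safe to run against a live server.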
5.1.4. Atomicity guarantees
The layout is designed so that an operator’s mental model matches the filesystem state without race conditions:
- A page binary file existing on disk implies the upload completed. Interrupted uploads leave no page-N.bin at all - never a half-written one. The controller streams the request body to a sibling .tmp file and then moves it into place with ATOMIC_MOVE.
- import.json and the page binaries inside the same import folder are mutated under a per-import lock held by DomainStorage, so concurrent addPage / createDocumentFromAttachments / submit calls cannot interleave their writes.
- Atomicity is per-import. Two different imports under the same date folder are independent; backing up or deleting one never affects the other.
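The write-to-temp-then-move pattern can be sketched in a few lines. This is a Python analogue for illustration, not the server's implementation: os.replace stands in for the ATOMIC_MOVE step, and write_json_atomically is a hypothetical name. The rename is atomic only when the temporary file and the target are on the same filesystem, which a sibling .tmp file normally guarantees.

```python
import json
import os
from pathlib import Path


def write_json_atomically(target, payload):
    """Write payload to target so readers see the old or the new file, never a mix."""
    target = Path(target)
    tmp = target.with_name(target.name + ".tmp")  # e.g. import.json.tmp
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(payload, fh)
        fh.flush()
        os.fsync(fh.fileno())  # push the bytes to disk before the rename
    # A crash before this line leaves only the .tmp file; the old target is intact.
    os.replace(tmp, target)
```

This is also why a leftover .tmp file is harmless: the rename either happened completely or not at all.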
5.1.5. Manual operations
Because the layout has no database, the following are all safe shell operations, as long as the server is not actively writing to the target import:
- Inspect an import: cat <importPath>/import.json, ls <importPath>/doc-*/pages/.
- Archive an import: tar or zip the HHmm-xxxxxxxx folder. The server will not notice it is gone until the next API call references it.
- Delete a single import: rm -rf the HHmm-xxxxxxxx folder. The corresponding API id will then return 404.
- Bulk delete old date folders: see Retention and cleanup.
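Archiving and deleting can be combined into one scripted step. The sketch below is an illustrative assumption rather than a supported tool: archive_and_delete and the archive naming are hypothetical, and it should only be run while the server is not writing to the import in question.

```python
import shutil
import tarfile
from pathlib import Path


def archive_and_delete(import_dir, archive_dir):
    """Tar one HHmm-xxxxxxxx folder, verify the archive, then remove the folder."""
    import_dir = Path(import_dir)
    # Name the archive after the import id: <yyyy-MM-dd>_<HHmm-xxxxxxxx>.tar.gz
    name = f"{import_dir.parent.name}_{import_dir.name}.tar.gz"
    archive = Path(archive_dir) / name
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(import_dir, arcname=import_dir.name)
    # Re-open the archive and check it before deleting anything.
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getnames()
    if f"{import_dir.name}/import.json" not in members:
        raise RuntimeError(f"archive {archive} lacks import.json; nothing deleted")
    shutil.rmtree(import_dir)
    return archive
```

Verifying the archive before the delete keeps the operation recoverable if the disk fills up halfway through.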
Do not rename folders or hand-edit import.json while the
server is running - the per-import lock is in-process only, so it
cannot guard against external changes, and the server may observe a
rename or edit mid-operation. Stop the server first.