5.2. Retention and cleanup
The TOA server enforces a single, time-based retention policy on
imports: any per-date folder older than toa-server.dataRetention
is deleted automatically by a background sweep. Operators do not need
to write cleanup scripts for the common case - the sweep covers it.
5.2.1. How the sweep works
A daemon thread inside the server runs the sweep:
once when the application context finishes starting up (so a long downtime catches up immediately on restart),
and then once every 24 h for as long as the process lives.
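The schedule above can be pictured as a small loop on a daemon thread. The following is an illustrative Python sketch, not the server's actual code: start_sweep_thread, run_sweep and interval_seconds are hypothetical names chosen for the example.

```python
import threading

SWEEP_INTERVAL_SECONDS = 24 * 60 * 60  # once every 24 h, as described above


def start_sweep_thread(run_sweep, interval_seconds=SWEEP_INTERVAL_SECONDS):
    """Call run_sweep() once immediately, then once per interval until stopped.

    Returns (thread, stop_event); setting stop_event ends the loop early.
    """
    stop_event = threading.Event()

    def loop():
        while True:
            run_sweep()
            # wait() returns True as soon as stop_event is set, False on timeout.
            if stop_event.wait(interval_seconds):
                break

    # daemon=True: the thread never keeps the process alive on its own.
    thread = threading.Thread(target=loop, name="retention-sweep", daemon=True)
    thread.start()
    return thread, stop_event
```

Running the sweep before the first wait is what makes a restart after a long downtime catch up immediately instead of waiting a further 24 h.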
For each configured domain the sweep lists the entries directly under
<dataRoot>/<code>/ and deletes every subfolder whose name parses
as yyyy-MM-dd and whose date is strictly before
today - dataRetention. The deletion is recursive (the entire date
folder, including all its imports, document folders, page binaries
and metadata sidecars). Today’s date folder is never touched.
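The selection rule can be sketched as a pure function over the entry names under one <dataRoot>/<code>/ folder. This is a minimal Python illustration under stated assumptions: parse_date_folder and expired_folders are hypothetical names, names that are not valid calendar dates are skipped, and negative retention values are treated like 0 (the text only documents 0d).

```python
import re
from datetime import date, timedelta

# Exactly four digits, two digits, two digits: the yyyy-MM-dd folder name.
DATE_NAME = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def parse_date_folder(name):
    """Return the date encoded in a yyyy-MM-dd folder name, or None."""
    if not DATE_NAME.fullmatch(name):
        return None
    try:
        return date.fromisoformat(name)
    except ValueError:  # right shape but not a real date, e.g. 2024-02-31
        return None


def expired_folders(names, today, retention_days):
    """Return the names a sweep would delete, in sorted order."""
    if retention_days <= 0:  # 0d disables the sweep (negatives: assumption)
        return []
    cutoff = today - timedelta(days=retention_days)
    selected = []
    for name in names:
        folder_date = parse_date_folder(name)
        # "Strictly before": a folder dated exactly at the cutoff is kept.
        if folder_date is not None and folder_date < cutoff:
            selected.append(name)
    return sorted(selected)
```

With today = 2024-03-31 and a 30-day window the cutoff is 2024-03-01, so 2024-02-29 is selected while 2024-03-01 and today's folder are kept. Entries such as cmserver2.xml never match because their names do not parse as dates.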
5.2.2. Tuning the window
The window is set in Server settings (toa-server.dataRetention,
default 30 days). Base it on the longest interval over which you
realistically need to investigate a failed import or re-export an
already submitted one. Once a date folder falls outside the window, its
data is gone - plan accordingly.
Setting dataRetention: 0d disables the sweep. Use this only if a
separate process (e.g. a snapshotting backup tool) takes over the
cleanup; otherwise dataRoot will grow without bound.
5.2.3. What the sweep does not delete
The cached templates XML (<dataRoot>/<code>/cmserver2.xml) - it has no date in the name and is regenerated by Template catalogue.
Date folders under domains that are no longer in toa-server.domains[]. If you remove a domain from the configuration, its data folder stays on disk in full. Delete it manually if you want the space back.
Anything outside dataRoot.
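Folders left behind by removed domains can be spotted by comparing dataRoot with the configured domain codes. The sketch below is a hypothetical helper, not part of the server: it assumes each domain's data lives directly under <dataRoot>/<code>/ and that the list of configured codes is passed in by the caller.

```python
from pathlib import Path


def unconfigured_domain_folders(data_root, configured_codes):
    """Return names of subfolders of data_root that match no configured code."""
    configured = set(configured_codes)
    return sorted(
        entry.name
        for entry in Path(data_root).iterdir()
        # Only directories count; stray files in data_root are ignored.
        if entry.is_dir() and entry.name not in configured
    )
```

The function only reports names and deletes nothing, so the operator can review the list before reclaiming the space.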
5.2.4. Manual cleanup
Because the on-disk layout is plain dated folders (see On-disk layout), an operator can supplement or replace the automatic sweep with standard tools:
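# Approximates a one-shot sweep with N=14 (GNU find). Note: -mtime tests the
# folder's modification time, not the date in its name.
find /var/lib/toa-server/data/<domain>/ -maxdepth 1 -type d \
  -regextype posix-extended \
  -regex '.*/[0-9]{4}-[0-9]{2}-[0-9]{2}' \
  -mtime +14 -exec rm -rf {} +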
This is also the way to recover space after a domain has been removed
from configuration: the automatic sweep no longer touches that path,
so a one-shot rm -rf on the whole domain folder is the right tool.
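The find command selects folders by modification time, whereas the sweep is described as keyed on the date in the folder name. The two can disagree, for example after a folder has been copied or restored. For a review before deleting anything, a name-based dry run can be sketched as follows. This is an illustrative Python script with hypothetical names (dry_run, folder_size_bytes, keep_days); it lists candidates and removes nothing.

```python
import re
from datetime import date, timedelta
from pathlib import Path

DATE_NAME = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def folder_size_bytes(folder):
    """Total size of the regular files below folder."""
    return sum(p.stat().st_size for p in folder.rglob("*") if p.is_file())


def dry_run(domain_dir, keep_days, today=None):
    """Return (path, size) for date folders older than keep_days; delete nothing."""
    today = today or date.today()
    cutoff = today - timedelta(days=keep_days)
    candidates = []
    for entry in sorted(Path(domain_dir).iterdir()):
        # Skip files and folders whose name is not shaped like yyyy-MM-dd.
        if not entry.is_dir() or not DATE_NAME.fullmatch(entry.name):
            continue
        try:
            folder_date = date.fromisoformat(entry.name)
        except ValueError:  # right shape but not a real calendar date
            continue
        if folder_date < cutoff:
            candidates.append((entry, folder_size_bytes(entry)))
    return candidates
```

Calling dry_run on a domain folder with keep_days=14 returns the date folders a 14-day window would remove, together with the space each would free.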
5.2.5. Backups
The retention policy is destructive and runs without confirmation.
If you need a longer-term archive, snapshot dataRoot to external
storage at an interval shorter than dataRetention, so that every date
folder is captured at least once before the sweep deletes it.
Filesystem-level snapshots (LVM, ZFS, btrfs) are the cheapest option
because the on-disk layout is append-only at the per-import level: there
is no database to flush, and the only consistency concern is capturing
import.json together with the adjacent page-N.bin files in the same folder.