.. _operate-retention:

=====================
Retention and cleanup
=====================

The TOA server enforces a single, time-based retention policy on imports: any
per-date folder older than ``toa-server.dataRetention`` is deleted
automatically by a background sweep. Operators do not need to write cleanup
scripts for the common case - the sweep covers it.

How the sweep works
===================

A daemon thread inside the server runs the sweep:

* once when the application context finishes starting up (so a long downtime
  catches up immediately on restart),
* and then once every 24 h for as long as the process lives.

For each configured domain the sweep lists the entries directly under ``//``
and deletes every subfolder whose name parses as ``yyyy-MM-dd`` and whose date
is strictly before ``today - dataRetention``. The deletion is recursive (the
entire date folder, including all its imports, document folders, page binaries
and metadata sidecars). Today's date folder is never touched.

Tuning the window
=================

The window is set in :ref:`configure-server` (``toa-server.dataRetention``,
default 30 days). Choose it from the longest interval over which you
realistically need to investigate a failed import or re-export an already
submitted one. After the window expires, the data is gone - plan accordingly.

Setting ``dataRetention: 0d`` disables the sweep. Use this only if a separate
process (e.g. a snapshotting backup tool) takes over the cleanup; otherwise
``dataRoot`` will grow without bound.

What the sweep does *not* delete
================================

* The cached templates XML (``//cmserver2.xml``) - it has no date in the name
  and is regenerated by :ref:`configure-templates`.
* Date folders under domains that are no longer in ``toa-server.domains[]``.
  If you remove a domain from the configuration, its data folder stays on disk
  in full. Delete it manually if you want the space back.
* Anything outside ``dataRoot``.

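
To preview which folders a given window would select, the rule described in
"How the sweep works" (name parses as ``yyyy-MM-dd``, date strictly before
``today - dataRetention``) can be approximated in shell. This is a minimal
dry-run sketch, not the server's implementation: the function names
``cutoff_date``, ``is_expired`` and ``list_expired`` are hypothetical, the
window is passed as a plain number of days, "today" is taken in UTC (the
server's own time zone handling is not documented here), and GNU ``date`` and
bash are assumed. Nothing is deleted.

.. code-block:: bash

   # cutoff_date TODAY DAYS: print TODAY minus DAYS as yyyy-MM-dd (GNU date).
   cutoff_date() {
       date -u -d "$1 -$2 days" +%F
   }

   # is_expired NAME CUTOFF: succeed if NAME is a real yyyy-MM-dd date
   # that lies strictly before CUTOFF.
   is_expired() {
       local name=$1 cutoff=$2
       [[ $name =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]] || return 1
       # Reject names such as 2023-13-40 that match the pattern but are not dates.
       date -u -d "$name" +%F >/dev/null 2>&1 || return 1
       # ISO dates compare chronologically when compared as strings.
       [[ $name < $cutoff ]]
   }

   # list_expired DOMAIN_DIR DAYS [TODAY]: print the date folders directly
   # under DOMAIN_DIR that the rule would select. Read-only.
   list_expired() {
       local dir=$1 days=$2 today=${3:-$(date -u +%F)} cutoff entry
       # Mirror the documented behaviour: a window of 0 disables the sweep.
       (( days > 0 )) || return 0
       cutoff=$(cutoff_date "$today" "$days")
       for entry in "$dir"/*/; do
           [[ -d $entry ]] || continue
           entry=${entry%/}
           if is_expired "${entry##*/}" "$cutoff"; then
               printf '%s\n' "$entry"
           fi
       done
   }

Calling ``list_expired "$DOMAIN_DIR" 30`` with ``DOMAIN_DIR`` set to one
domain's data folder prints the candidates, one per line. Because the
selection is by folder name rather than by modification time, folders that
were touched after their date are still listed.
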
Manual cleanup
==============

Because the on-disk layout is plain dated folders (see
:ref:`operate-data-layout`), an operator can supplement or replace the
automatic sweep with standard tools:

.. code-block:: bash

   # Approximates a one-shot sweep with N=14 (GNU find).
   # -mindepth 1 keeps the starting directory itself out of the match.
   # -regextype posix-extended is needed for the {n} repetition syntax.
   find /var/lib/toa-server/data// -mindepth 1 -maxdepth 1 -type d \
        -regextype posix-extended \
        -regex '.*/[0-9]{4}-[0-9]{2}-[0-9]{2}' \
        -mtime +14 -exec rm -rf {} +

Note that ``-mtime`` selects by the folder's modification time, not by the
date in its name, so the result can differ from the sweep for folders that
were modified after their date.

This is also the way to recover space after a domain has been removed from
configuration: the automatic sweep no longer touches that path, so a one-shot
``rm -rf`` on the whole domain folder is the right tool.

Backups
=======

The retention policy is destructive and runs without confirmation. If you need
a longer-term archive, snapshot ``dataRoot`` to external storage on a schedule
shorter than ``dataRetention``. Filesystem-level snapshots (LVM, ZFS, btrfs)
are the cheapest option because the on-disk layout is append-only at the
per-import level - no database to flush, no consistent-cut concerns beyond
``import.json`` and the adjacent ``page-N.bin`` files in the same folder.