Maintenance and health workflows

Org2 treats corpus maintenance as a core compiler capability: cleanup, graph repair, and review become part of the normal workflow.

A knowledge corpus has source files, derived artifacts, links, IDs, provenance, publish outputs, editor indexes, and generated views. The compiler model gives Org2 one place to parse those files, check trust boundaries, and report or repair drift before it reaches publishing or daily planning.

Core passes Org2 should own

The maintenance story groups existing and planned checks into a few explicit passes:

Format normalization with org2 fmt: keep syntax and prose layout stable enough for clean diffs and generated edits.
Corpus lint / health with org2 lint: validate artifact metadata, IDs, provenance references, and corpus-flow conventions.
Link graph health with org2 backlinks, org2 query, and org2 roam graph: surface broken or ambiguous knowledge links before they become stale navigation.
Generated artifact maintenance with org2 publish and export options: rebuild HTML/views from source files instead of editing generated outputs by hand.
Editor index maintenance with org2 roam db-sync and LSP features: keep file IDs and editor intelligence aligned with the corpus model.

Recommended maintenance loop

Run a lightweight check before committing note changes:

npm run org2 -- fmt --dir ~/notes --recursive --check
npm run org2 -- lint --dir ~/notes --recursive
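
To run both checks from a script (for example a pre-commit hook), something like the TypeScript sketch below could work. Only the org2 commands themselves come from the lines above. The names preCommitChecks, runChecks, Runner, and CheckStep are hypothetical, and the sketch assumes each command exits non-zero when it finds problems, which should be confirmed against the actual CLI.

```typescript
// Hypothetical helper: a Runner executes a command and returns its exit code.
type Runner = (command: string, args: string[]) => number;

interface CheckStep {
  label: string;
  args: string[]; // arguments passed to npm
}

// Builds the two read-only checks shown above for a given notes directory.
function preCommitChecks(notesDir: string): CheckStep[] {
  const common = ["--dir", notesDir, "--recursive"];
  return [
    { label: "fmt --check", args: ["run", "org2", "--", "fmt", ...common, "--check"] },
    { label: "lint", args: ["run", "org2", "--", "lint", ...common] },
  ];
}

// Runs every step without short-circuiting so a single attempt reports
// all failing checks. Returns the labels of the steps that failed.
function runChecks(steps: CheckStep[], run: Runner): string[] {
  const failed: string[] = [];
  for (const step of steps) {
    // Assumption: a non-zero exit code means the check found problems.
    if (run("npm", step.args) !== 0) {
      failed.push(step.label);
    }
  }
  return failed;
}
```

In a real hook, the runner could wrap Node's child_process.spawnSync and return its status. Note that a leading ~ is expanded by the shell, not by spawnSync, so pass an absolute notes path when no shell is involved.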

Run the repair-style passes intentionally, review their diffs, then commit the result:

npm run org2 -- fmt --dir ~/notes --recursive --apply
npm run org2 -- roam db-sync --dir ~/notes --recursive --apply
npm run org2 -- publish docs-site --config org2.json --preview

Use org2 roam graph --format report for an inspectable local maintenance view. The report covers orphan and high-degree nodes, alias collisions, unresolved or ambiguous links, and linkify suggestions with source locations. When wiring these checks into scripts or CI, use --format json on lint, backlink, query, graph, and roam workflows; the graph JSON carries the same maintenance payload under its maintenance key.
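
As a rough illustration of consuming the graph JSON in a script, the TypeScript sketch below counts the entries in each array-valued category found under the maintenance key. It assumes only that maintenance is an object. The helper name summarizeMaintenance is hypothetical, and the real payload shape should be checked against actual org2 roam graph --format json output before relying on it.

```typescript
// Hypothetical helper: counts entries per array-valued category under the
// top-level "maintenance" key. No specific category names are assumed.
function summarizeMaintenance(graphJson: string): Record<string, number> {
  const counts: Record<string, number> = {};
  const parsed: unknown = JSON.parse(graphJson);
  if (typeof parsed !== "object" || parsed === null) {
    return counts;
  }
  const maintenance = (parsed as Record<string, unknown>)["maintenance"];
  if (typeof maintenance !== "object" || maintenance === null) {
    return counts;
  }
  const categories = maintenance as Record<string, unknown>;
  for (const key of Object.keys(categories)) {
    const value = categories[key];
    // Non-array values (strings, numbers, nested objects) are skipped.
    if (Array.isArray(value)) {
      counts[key] = value.length;
    }
  }
  return counts;
}
```

A CI job could fail the build when any count it cares about is non-zero, once the actual category names have been confirmed.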

Repository generated artifact hygiene

The Org2 repository intentionally keeps a small set of generated outputs under version control so releases, docs hosting, and spec conformance are inspectable from a checkout:

Path	Source-controlled?	Why
`dist/`	yes	TypeScript build output used by the package `bin` entrypoints and npm distribution.
`site/`	yes	Published HTML docs generated from `docs/site/*.org` for static hosting.
`spec/v0/tests/*.json`	yes	Expected canonical AST fixtures paired with sibling `*.org` inputs.
ad hoc `compiled/`, `views/`, local reports, temp exports	no	Disposable corpus outputs unless a maintainer explicitly promotes them into docs, fixtures, or another reviewed artifact.

Before committing changes that affect parser output, docs publishing, or TypeScript build output, run:

npm run check:generated

This rebuilds dist/, validates fixture pairs with tools/fixture-runner.mjs --e2e, republishes the docs site, and then runs git diff --exit-code -- dist site spec/v0/tests. If it fails, review the generated diff: commit intentional changes alongside the source change that caused them, and if the changes are unexpected, fix the source of nondeterminism before committing.

Trust boundaries

Org2 should preserve a clear distinction between source files and generated artifacts.

Source notes are edited by people and tools.
Compiled views, publish outputs, and generated reports should carry artifact role/provenance metadata when they live in the corpus.
Lint should flag missing or stale provenance before derived content is trusted as source.
Repair passes should default to preview/check modes where possible and require --apply before writing.
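
The preview-by-default rule can be sketched as follows. This is a minimal illustration of the pattern, not Org2's implementation: runRepair, RepairResult, and the in-memory file map are hypothetical, and the only behavior taken from the text is that nothing is written unless apply is set.

```typescript
interface RepairResult {
  changed: string[]; // paths the repair would modify (or did modify)
  written: boolean;  // true only when apply was set and something changed
}

// Hypothetical repair driver over an in-memory path -> text map.
// Without { apply: true } it only reports what would change.
function runRepair(
  files: Map<string, string>,
  repair: (text: string) => string,
  options: { apply?: boolean } = {},
): RepairResult {
  const apply = options.apply === true;
  const changed: string[] = [];
  files.forEach((text, path) => {
    const fixed = repair(text);
    if (fixed === text) {
      return; // already clean, nothing to report
    }
    changed.push(path);
    if (apply) {
      files.set(path, fixed); // the only place a write happens
    }
  });
  return { changed, written: apply && changed.length > 0 };
}
```

Keeping the write behind a single explicit flag means a forgotten option leaves the corpus untouched, which is the safer failure mode for repair-style passes.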

Current concrete workflow

Today, org2 lint is the main corpus-health command. It checks artifact metadata, duplicate IDs, unresolved provenance references, graph link health (broken id: links, unresolved wiki links, and ambiguous wiki labels), and conventional corpus-flow directories such as raw/, notes/, compiled/, views/, and publish/.
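
To make one of these checks concrete, here is a small TypeScript sketch of duplicate-ID detection. It is not how org2 lint is implemented. It assumes IDs are declared on Org-style :ID: property lines, and findDuplicateIds is a hypothetical helper name.

```typescript
// Illustrative only: reports IDs declared more than once across a set of
// files, keyed by ID, with "path:line" locations for each declaration.
// Assumption: IDs appear on Org-style `:ID: value` property lines.
function findDuplicateIds(files: Record<string, string>): Record<string, string[]> {
  const seen = new Map<string, string[]>();
  // Sort paths so the reported locations are deterministic.
  for (const path of Object.keys(files).sort()) {
    const lines = (files[path] ?? "").split("\n");
    lines.forEach((line, index) => {
      const match = /^\s*:ID:\s+(\S+)\s*$/i.exec(line);
      if (match === null) {
        return;
      }
      const id = match[1] ?? "";
      const locations = seen.get(id) ?? [];
      locations.push(`${path}:${index + 1}`);
      seen.set(id, locations);
    });
  }
  const duplicates: Record<string, string[]> = {};
  seen.forEach((locations, id) => {
    if (locations.length > 1) {
      duplicates[id] = locations;
    }
  });
  return duplicates;
}
```

Reporting every location, rather than only the second occurrence, lets a reviewer decide which declaration should keep the ID.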

The same diagnostics are available with --format json for CI and editor consumers, so VS Code and scripts can surface corpus trust-boundary problems without scraping report text.

For command details, see Tooling reference: Corpus lint.