Corpus flow

Canonical zones for Org2 knowledge corpora

Org2 treats a corpus like a small local knowledge compiler. Source material moves through explicit zones so humans and tools can tell which files are authored, which files are derived, and which boundaries need provenance.

The canonical flow is:

raw/ -> notes/ -> compiled/ -> views/ -> publish/

You do not need every zone on day one. The benefit of adopting these names is that Org2 lint can then check the trust boundary instead of relying on convention alone.

Zones

Directory | Role | Canonical? | Purpose
raw/ | raw | source evidence | Inbox material, imports, captures, transcripts, clippings, and other minimally edited inputs.
notes/ | canonical | yes | Human-authored durable notes, tasks, decisions, IDs, links, and properties. This is the primary corpus.
compiled/ | compiled | generated | Normalized or aggregated machine outputs derived from raw/canonical inputs.
views/ | view | generated | Query results, indexes, dashboards, agenda snapshots, graph views, or other read models.
publish/ | report | generated/public-facing | Exported reports, sites, feeds, handoff documents, or other presentation outputs.

Canonical authorship lives in notes/. Generated outputs may be useful and may be committed, but they should be reproducible from provenance whenever possible.

Trust boundaries

The most important boundary is between canonical material and generated material:

  • raw/ can be messy and externally sourced; preserve it when evidence matters.

  • notes/ is the reviewed human source of truth.

  • compiled/, views/, and publish/ are derived outputs and should say where they came from.

  • Generated outputs should not silently become canonical notes. Promote useful generated text by copying or rewriting it into notes/ and reviewing it there.

Artifact metadata

Use file-level or headline-level property drawers to declare artifact roles and provenance.

Canonical note example:

:PROPERTIES:
:ID: project-alpha
:ORG2_ARTIFACT_ROLE: canonical
:END:

Generated view example:

:PROPERTIES:
:ID: project-alpha-dashboard
:ORG2_ARTIFACT_ROLE: view
:ORG2_PROVENANCE: id:project-alpha, query:todo-status-open
:ORG2_GENERATOR: org2 query --format org
:ORG2_GENERATED_AT: 2026-05-10T22:30:00Z
:ORG2_SOURCE_HASHES: file:notes/project-alpha.org2=sha256:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
:ORG2_REVIEW_STATUS: review-required
:END:
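
For tooling that needs to read these drawers, the following is a minimal sketch of a parser, assuming each property sits on its own `:KEY: value` line between `:PROPERTIES:` and `:END:`. The function name `parse_property_drawer` is hypothetical and is not part of Org2; the real parser may handle cases this sketch ignores.

```python
import re

# Matches one ":KEY: value" line inside a property drawer.
PROPERTY_LINE = re.compile(r"^:([A-Za-z0-9_]+):\s*(.*)$")


def parse_property_drawer(text: str) -> dict:
    """Return the key/value pairs of the first :PROPERTIES: ... :END: drawer.

    Hypothetical helper for illustration; not an Org2 API.
    """
    properties = {}
    inside = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == ":PROPERTIES:":
            inside = True
            continue
        if stripped == ":END:":
            if inside:
                break  # stop at the end of the first drawer
            continue
        if inside:
            match = PROPERTY_LINE.match(stripped)
            if match:
                properties[match.group(1)] = match.group(2).strip()
    return properties


drawer = """\
:PROPERTIES:
:ID: project-alpha-dashboard
:ORG2_ARTIFACT_ROLE: view
:ORG2_PROVENANCE: id:project-alpha, query:todo-status-open
:ORG2_REVIEW_STATUS: review-required
:END:
"""
props = parse_property_drawer(drawer)
print(props["ORG2_ARTIFACT_ROLE"])  # prints "view"
```

Text without a drawer yields an empty dictionary, so callers can treat "no metadata" and "empty metadata" the same way.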

The standard generated-artifact property set is:

  • ORG2_ARTIFACT_ROLE records the trust boundary: raw, canonical, compiled, view, or report.

  • ORG2_PROVENANCE records source references as kind:value tokens.

  • ORG2_GENERATOR records the deterministic command, job, or workflow that produced the artifact.

  • ORG2_GENERATED_AT records an ISO 8601 date or timestamp for the generation run.

  • ORG2_SOURCE_HASHES optionally records content hashes for specific inputs as kind:value=sha256:<64 hex chars>.

  • ORG2_REVIEW_STATUS records the review state of generated content: generated, review-required, reviewed, or promoted.
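
The property set above can be checked mechanically. The sketch below is illustrative only. It assumes that provenance, generator, and generated-at are required for the compiled, view, and report roles, and that the review status is validated only when present. The names `check_generated_metadata` and `REQUIRED_FOR_GENERATED` are hypothetical, and Org2 lint may apply different rules.

```python
# Roles that mark derived outputs, per the zone descriptions.
GENERATED_ROLES = {"compiled", "view", "report"}

# Assumption: these three properties are treated as required for generated artifacts.
REQUIRED_FOR_GENERATED = ("ORG2_PROVENANCE", "ORG2_GENERATOR", "ORG2_GENERATED_AT")

# Review-status values listed in the property set.
REVIEW_STATUSES = {"generated", "review-required", "reviewed", "promoted"}


def check_generated_metadata(props: dict) -> list:
    """Return a list of problems for one artifact's properties (empty if none)."""
    problems = []
    if props.get("ORG2_ARTIFACT_ROLE") not in GENERATED_ROLES:
        return problems  # raw and canonical artifacts are not checked here
    for key in REQUIRED_FOR_GENERATED:
        if not props.get(key, "").strip():
            problems.append(f"missing {key}")
    status = props.get("ORG2_REVIEW_STATUS")
    if status is not None and status not in REVIEW_STATUSES:
        problems.append(f"unknown ORG2_REVIEW_STATUS: {status}")
    return problems


view = {
    "ORG2_ARTIFACT_ROLE": "view",
    "ORG2_PROVENANCE": "id:project-alpha",
    "ORG2_REVIEW_STATUS": "approved",  # not one of the listed values
}
for problem in check_generated_metadata(view):
    print(problem)
```

Returning a list of problems rather than raising on the first one lets a caller report everything wrong with an artifact in a single pass.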

The supported ORG2_ARTIFACT_ROLE values are:

  • raw

  • canonical

  • compiled

  • view

  • report

The conventional directory mapping is:

  • raw/ -> raw

  • notes/ -> canonical

  • compiled/ -> compiled

  • views/ -> view

  • publish/ -> report
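
The directory mapping above is simple enough to express as a lookup table. The following sketch assumes paths are given relative to the corpus root with forward slashes. `expected_role` and `role_matches_path` are hypothetical helper names, not Org2 functions.

```python
from pathlib import PurePosixPath
from typing import Optional

# Conventional top-level directory -> ORG2_ARTIFACT_ROLE value.
ZONE_ROLES = {
    "raw": "raw",
    "notes": "canonical",
    "compiled": "compiled",
    "views": "view",
    "publish": "report",
}


def expected_role(relative_path: str) -> Optional[str]:
    """Return the role implied by the file's top-level directory, if any."""
    parts = PurePosixPath(relative_path).parts
    if len(parts) < 2:
        return None  # a file directly in the corpus root has no zone
    return ZONE_ROLES.get(parts[0])


def role_matches_path(relative_path: str, declared_role: str) -> bool:
    """True when the declared role agrees with the path, or the path has no zone."""
    expected = expected_role(relative_path)
    return expected is None or expected == declared_role


print(expected_role("views/dashboard.org2"))                   # prints "view"
print(role_matches_path("notes/project-alpha.org2", "view"))   # prints "False"
```

Files outside the conventional directories return `None` and are treated as unconstrained in this sketch, which mirrors the idea that a corpus does not need to adopt every zone.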

ORG2_PROVENANCE entries use kind:value tokens separated by commas or semicolons. Supported kinds are id, file, query, run, url, note, and artifact. ORG2_SOURCE_HASHES uses the same source kinds plus a SHA-256 digest, for example file:notes/project-alpha.org2=sha256:....
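
As an illustration of the token syntax, here is a minimal parsing sketch. It makes several assumptions that the text above does not confirm: ORG2_SOURCE_HASHES entries use the same comma or semicolon separators as ORG2_PROVENANCE, hex digits are accepted in either case, and values never contain a comma or semicolon. The function names are hypothetical.

```python
import re

# Source kinds listed for ORG2_PROVENANCE.
SOURCE_KINDS = {"id", "file", "query", "run", "url", "note", "artifact"}

# Assumption: upper- and lower-case hex digits are both accepted.
SHA256_HEX = re.compile(r"^[0-9a-fA-F]{64}$")


def split_kind_value(token: str) -> tuple:
    """Split 'kind:value' on the first colon and validate the kind."""
    kind, sep, value = token.partition(":")
    if not sep or kind not in SOURCE_KINDS or not value:
        raise ValueError(f"invalid source token: {token!r}")
    return kind, value


def parse_provenance(value: str) -> list:
    """Parse an ORG2_PROVENANCE value into (kind, value) pairs."""
    tokens = (t.strip() for t in re.split(r"[,;]", value))
    return [split_kind_value(t) for t in tokens if t]


def parse_source_hashes(value: str) -> dict:
    """Parse an ORG2_SOURCE_HASHES value into {'kind:value': digest}."""
    hashes = {}
    for token in (t.strip() for t in re.split(r"[,;]", value)):
        if not token:
            continue
        source, sep, digest = token.rpartition("=sha256:")
        if not sep or not SHA256_HEX.match(digest):
            raise ValueError(f"invalid source hash: {token!r}")
        split_kind_value(source)  # validates the kind:value part
        hashes[source] = digest
    return hashes


print(parse_provenance("id:project-alpha, query:todo-status-open"))
# prints [('id', 'project-alpha'), ('query', 'todo-status-open')]
```

Splitting on the first colon keeps `url:` values intact, since the remainder of the token (including `https://`) is treated as the value.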

Lint workflow

Run lint over the corpus root:

npm run org2 -- lint --dir /path/to/corpus --recursive

When files live under the conventional directories, lint checks that their declared ORG2_ARTIFACT_ROLE matches the path. It also checks generated artifacts for provenance, generator metadata, generated timestamps, source-hash syntax, review-status values, duplicate IDs, and unresolved provenance references.
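
To make two of these checks concrete, the sketch below looks for duplicate IDs and for `id:` provenance references that point at no known artifact. It is not the lint implementation. It assumes each artifact has already been reduced to a dictionary of its properties, and the function names are hypothetical.

```python
import re
from collections import Counter


def find_duplicate_ids(artifacts: list) -> list:
    """Return IDs that are declared by more than one artifact, sorted."""
    counts = Counter(props["ID"] for props in artifacts if "ID" in props)
    return sorted(artifact_id for artifact_id, n in counts.items() if n > 1)


def find_unresolved_id_refs(artifacts: list) -> list:
    """Return (referencing ID, missing ID) pairs for id: provenance tokens."""
    known = {props["ID"] for props in artifacts if "ID" in props}
    unresolved = []
    for props in artifacts:
        for token in re.split(r"[,;]", props.get("ORG2_PROVENANCE", "")):
            kind, sep, ref = token.strip().partition(":")
            if kind == "id" and sep and ref not in known:
                unresolved.append((props.get("ID", "<no id>"), ref))
    return unresolved


corpus = [
    {"ID": "project-alpha", "ORG2_ARTIFACT_ROLE": "canonical"},
    {"ID": "project-alpha-dashboard", "ORG2_ARTIFACT_ROLE": "view",
     "ORG2_PROVENANCE": "id:project-alpha, id:project-beta"},
    {"ID": "project-alpha", "ORG2_ARTIFACT_ROLE": "compiled",
     "ORG2_PROVENANCE": "id:project-alpha"},
]
print(find_duplicate_ids(corpus))       # prints ['project-alpha']
print(find_unresolved_id_refs(corpus))  # prints [('project-alpha-dashboard', 'project-beta')]
```

Only `id:` references are resolved here; resolving `file:` or `artifact:` tokens would need access to the corpus directory and is left out of the sketch.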

A minimal corpus can start like this:

corpus/
  raw/
  notes/
  compiled/
  views/
  publish/

Then wire editor, query, export, and publish commands to read from notes/ and write derived results into compiled/, views/, or publish/ rather than back into canonical notes.
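
The starting layout can be created by hand or with a few lines of script. The following is a minimal sketch; `scaffold_corpus` is a hypothetical helper, not an Org2 command, and the example writes into a temporary directory so it leaves nothing behind.

```python
import tempfile
from pathlib import Path

# The five canonical zone directories, in flow order.
ZONES = ("raw", "notes", "compiled", "views", "publish")


def scaffold_corpus(root: Path) -> list:
    """Create the zone directories under root and return their paths."""
    created = []
    for zone in ZONES:
        zone_dir = root / zone
        # exist_ok makes the call safe to repeat on an existing corpus.
        zone_dir.mkdir(parents=True, exist_ok=True)
        created.append(zone_dir)
    return created


with tempfile.TemporaryDirectory() as tmp:
    dirs = scaffold_corpus(Path(tmp) / "corpus")
    print([d.name for d in dirs])
    # prints ['raw', 'notes', 'compiled', 'views', 'publish']
```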