External LLM consumer pattern

Cited Org2 context without provider calls in core

Org2 is useful to LLM assistants because it compiles local plain-text notes into cited, machine-readable evidence. The core Org2 CLI should stay deterministic: it parses, queries, lints, and compiles a corpus, but it does not need an LLM provider or hosted service to do that work.

This page describes the recommended boundary for OpenClaw, Claude, Codex, local models, or any other external agent. For the proposed in-repo AI job architecture, see AI processing architecture.

Boundary

Org2 guarantees local compiler outputs:

stable command outputs such as org2 query --format json and org2 compile corpus
source files, line ranges, heading ancestry, matched lines, and snippets for citations
artifact roles, provenance metadata, graph links, backlinks, agenda state, and lint diagnostics
no network or provider dependency for non-AI commands

External LLM consumers handle model-specific work:

ranking which cited sources matter for a prompt
summarizing or answering from the provided context
proposing edits, links, TODOs, or generated draft artifacts
choosing providers, model names, credentials, and retry policy outside note files

Generated text should remain reviewable. Write it to views/, compiled/, or a patch/draft artifact with provenance; promote it to notes/ only after human review.

Query-to-context workflow

Use org2 query to fetch grounded context before asking an LLM to answer a question.

npm run org2 -- query "What did we decide about Project Alpha?" \
  --dir ~/notes \
  --recursive \
  --subtree \
  --answer-context \
  --format json \
  > /tmp/org2-query.json

node examples/external-llm-context.mjs --limit=6 < /tmp/org2-query.json \
  > /tmp/org2-llm-context.txt

The formatter is intentionally tiny and provider-free. It turns Org2 query JSON into a context packet with source markers such as [S1] and explicit file:line citations. An external model prompt can then say: “Use only the evidence below. Preserve citations as [S1], [S2].”
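For agents written in Python, here is a minimal sketch of the same idea. It is not a port of examples/external-llm-context.mjs. The top-level `results` key and the hit fields `file`, `startLine`, `endLine`, `headings`, and `snippet` are assumptions about the query JSON, so check them against your own org2 query --format json output.

```python
import json  # used by callers, e.g. format_context(json.load(open(path)))


def format_context(query_json: dict, limit: int = 6) -> str:
    """Turn Org2 query JSON into a [S1]-style context packet.

    Hypothetical input shape: {"results": [{"file", "startLine",
    "endLine", "headings", "snippet"}, ...]}.
    """
    blocks = []
    for i, hit in enumerate(query_json.get("results", [])[:limit], start=1):
        # file:start-end citation that the model is told to preserve
        citation = f"{hit['file']}:{hit['startLine']}-{hit['endLine']}"
        ancestry = " > ".join(hit.get("headings", []))
        header = f"[S{i}] {citation}"
        if ancestry:
            header += f" ({ancestry})"
        blocks.append(f"{header}\n{hit.get('snippet', '').strip()}")
    return "\n\n".join(blocks)
```

Numbering restarts at [S1] for every packet, so markers are only meaningful together with the packet they came from.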

Prompt skeleton

You answer questions from an Org2 knowledge base.

Rules:
- Use only the supplied Org2 context packet.
- Cite every factual claim with [S1], [S2], etc.
- If the evidence is missing or contradictory, say so.
- Do not invent file paths, dates, people, or decisions.

<context>
... contents of /tmp/org2-llm-context.txt ...
</context>

Question: What did we decide about Project Alpha?
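An orchestrator can assemble this skeleton in code so the rules stay fixed and only the context packet and question change. The following Python sketch copies the rules from the skeleton. `build_prompt` is a hypothetical helper, and rejecting packets that contain a closing `</context>` tag is an illustrative guard, not an Org2 requirement.

```python
RULES = """You answer questions from an Org2 knowledge base.

Rules:
- Use only the supplied Org2 context packet.
- Cite every factual claim with [S1], [S2], etc.
- If the evidence is missing or contradictory, say so.
- Do not invent file paths, dates, people, or decisions."""


def build_prompt(context_packet: str, question: str) -> str:
    """Wrap a context packet in the prompt skeleton (hypothetical helper)."""
    # A note containing the closing tag could break out of the evidence block.
    if "</context>" in context_packet:
        raise ValueError("context packet must not contain </context>")
    return (
        f"{RULES}\n\n<context>\n{context_packet.strip()}\n</context>\n\n"
        f"Question: {question.strip()}"
    )
```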

Compiled-corpus workflow

For larger agent workflows, compile a corpus artifact and let the external orchestration layer decide retrieval, ranking, and model calls.

npm run org2 -- compile corpus --dir ~/notes --recursive --out compiled/corpus.json
npm run org2 -- lint --dir ~/notes --recursive --format json > views/corpus-health.json

The compiled artifact supplies file hashes, heading nodes, IDs, aliases, links, backlinks, planning data, properties, snippets, and source ranges. Keep provider credentials and model configuration outside the corpus artifact. If an agent saves a report or summary, stamp it with the generated-artifact metadata described in Corpus flow.
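Because the compiled artifact records file hashes, an agent can check whether compiled/corpus.json still matches the notes before relying on it. This Python sketch assumes a hypothetical corpus shape: a top-level `files` list of `{"path", "hash"}` entries holding hex SHA-256 digests of the raw file bytes. The real field names and hash algorithm may differ, so adapt `stale_files` to the artifact you actually compile.

```python
import hashlib
from pathlib import Path


def stale_files(corpus: dict, root: Path) -> list[str]:
    """Return corpus paths whose current bytes no longer match the recorded hash.

    Assumes a hypothetical shape {"files": [{"path": ..., "hash": ...}]}
    with hex SHA-256 digests; files that no longer exist count as stale.
    """
    stale = []
    for entry in corpus.get("files", []):
        path = root / entry["path"]
        if not path.is_file():
            stale.append(entry["path"])
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest != entry["hash"]:
            stale.append(entry["path"])
    return stale
```

If anything is stale, rerun org2 compile corpus rather than patching the JSON by hand.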

Agent thread records

Agent conversations that matter should be represented as durable Org2 headings rather than living only in an external chat transcript. Use KIND: agent-thread for a thread record and keep the remote session reference separate from selected context.

* Thread: Firebolt report help
:PROPERTIES:
:ID: thread-1
:KIND: agent-thread
:AGENT: openclaw
:SESSION: openclaw:session:abc123
:STATUS: active
:CONTEXT: id:report-1, file:reports/firebolt.csv, ticket:REP-52
:TRANSCRIPT: file:threads/thread-1.transcript.org2
:STORAGE: summary
:END:

** Context attachments
- [[id:report-1][Firebolt report]]
- [[file:reports/firebolt.csv][latest CSV artifact]]

** Durable outputs
- [ ] Follow up on validation notes.

The AGENT and SESSION properties identify the external worker and its session. CONTEXT is a comma- or semicolon-separated list of typed references such as id:, file:, ticket:, report:, entity:, artifact:, or url:. The body can also include normal Org2 links under a Context attachments section for readable labels and editor navigation.
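A client that reads the raw CONTEXT property, instead of the fetched JSON, can split it with a few lines of Python. This is a sketch of the documented format, not Org2's own parser. Accepting any lowercase `type:` prefix, rather than only the listed types, is an assumption, and so is raising an error on untyped entries.

```python
import re
from typing import NamedTuple


class TypedRef(NamedTuple):
    kind: str   # e.g. "id", "file", "ticket"
    value: str  # everything after the first colon


def parse_context(raw: str) -> list[TypedRef]:
    """Split a CONTEXT property such as 'id:report-1, file:reports/x.csv'."""
    refs = []
    for part in re.split(r"[,;]", raw):
        part = part.strip()
        if not part:
            continue
        # partition on the first colon so url:https://... keeps its scheme
        kind, sep, value = part.partition(":")
        if not sep or not re.fullmatch(r"[a-z][a-z0-9-]*", kind):
            raise ValueError(f"not a typed reference: {part!r}")
        refs.append(TypedRef(kind, value.strip()))
    return refs
```

This sketch would incorrectly split a URL that itself contains a comma or semicolon.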

org2 agent fetch --id thread-1 --format json exposes this as thread.contextAttachments along with thread.agent, thread.session, thread.status, thread.transcript, and thread.storage. Attachments that point at known id:, file:, wiki/note targets, or exact typed refs such as report:report-1 include a target object with the resolved title, file, line range, and citation.
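A client can use the presence of `target` to separate attachments it can cite directly from references that still need a lookup in an external system, such as ticket:REP-52. In this minimal sketch, only `contextAttachments` and `target` come from the paragraph above. The `ref` key and the `citation` key inside `target` are assumed names.

```python
def split_attachments(thread: dict) -> tuple[list[str], list[str]]:
    """Separate citable attachments from refs Org2 could not resolve.

    Hypothetical shape: each item in thread["contextAttachments"] has a
    "ref" string and, when resolved, a "target" dict with a "citation".
    """
    cited, unresolved = [], []
    for att in thread.get("contextAttachments", []):
        target = att.get("target")
        if target and target.get("citation"):
            cited.append(target["citation"])
        else:
            unresolved.append(att.get("ref", "?"))
    return cited, unresolved
```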

Fetching a context object works in the other direction: org2 agent fetch --id report-1 --format json includes relatedThreads when agent-thread records attach to that node. Each related thread carries its id/title/session/status/citation plus matchingAttachments, so a Workspace or Agenda client can show “open existing thread” from a TODO/report/entity without scanning every thread itself. This preserves citations and avoids storing raw transcripts unless the user opts in.
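A client can then make the “open existing thread” decision from relatedThreads alone. This sketch assumes each entry exposes `id`, `status`, and `session` keys, as described above. Two more points are illustrative policies rather than Org2 behavior: treating only `active` threads as resumable, and filtering by an `openclaw:` session prefix.

```python
from typing import Optional


def pick_thread(fetched: dict, agent_prefix: str = "openclaw:") -> Optional[dict]:
    """Return the related thread to reopen, or None to start a new session.

    Assumes relatedThreads entries carry "id", "status", and "session";
    resuming only "active" threads is a hypothetical policy.
    """
    for thread in fetched.get("relatedThreads", []):
        session = thread.get("session") or ""
        if thread.get("status") == "active" and session.startswith(agent_prefix):
            return thread
    return None
```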

For prompt-ready context, org2 context --id report-1 --format markdown renders the selected node directly as a cited context pack. When attached agent-thread records exist, the rendered pack includes a Related agent threads section with the session/status and matching attachments so a client can decide whether to open an existing OpenClaw session or start a new one. Selecting the thread itself with org2 context --id thread-1 renders a Selected agent threads section with agent/session/status/storage/transcript metadata plus each resolved context attachment, so OpenClaw handoffs can recover what the thread is about without scraping the thread body.

Collaboration and workflow state

Shared human/agent work should keep enough routing and policy state in the Org2 document for every client to agree on what can happen next. Use normal properties on the task, report, draft, or thread heading rather than hiding that state inside one chat UI.

* TODO Send support follow-up
:PROPERTIES:
:ID: support-followup-1
:OWNER: Casey
:ASSIGNEE: openclaw
:AGENT: codex
:NEXT_ACTION: Draft the reply for Casey's approval
:WAITING_ON: Casey approval
:LIFECYCLE: review
:REQUIRES_HUMAN_APPROVAL: true
:ALLOW_AGENT_EDIT: true
:ALLOW_EXTERNAL_SEND: false
:SESSION: openclaw:session:triage-1
:RUN_ID: run-42
:RUN_STARTED_AT: 2026-05-15T10:00:00-07:00
:RUN_LOG: file:runs/support-followup-1.log
:SOURCE_ARTIFACTS: file:tickets/support.json, artifact:reports/ticket-volume.csv
:HANDOFF_SUMMARY: Draft is ready; do not send externally until approved.
:HANDOFF_LINKS: id:support-decision-1
:END:

The first stable convention is property-backed and intentionally small:

ownership/routing: OWNER, ASSIGNEE, AGENT, NEXT_ACTION, WAITING_ON
lifecycle: LIFECYCLE, WORKFLOW_STATE, or WORKFLOW_STATUS
policy gates: REQUIRES_HUMAN_APPROVAL, ALLOW_AGENT_EDIT, ALLOW_EXTERNAL_SEND
run/session metadata: SESSION, RUN_ID, RUN_STARTED_AT, RUN_FINISHED_AT, RUN_LOG, SOURCE_ARTIFACTS
handoff metadata: HANDOFF_SUMMARY, HANDOFF_NEXT_ACTION, HANDOFF_LINKS
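A client that reads the property drawer directly, instead of calling org2 agent fetch, has to normalize the lifecycle aliases and boolean gates itself. The following Python sketch makes several assumptions. It checks LIFECYCLE first, then WORKFLOW_STATE, then WORKFLOW_STATUS. It accepts only the literals true and false (case-insensitive). Any missing or unrecognized gate falls back to a fail-closed default: approval required, no agent edits, no external sends.

```python
LIFECYCLE_KEYS = ("LIFECYCLE", "WORKFLOW_STATE", "WORKFLOW_STATUS")

# Fail-closed defaults for absent or unparseable gates (illustrative policy).
GATE_DEFAULTS = {
    "REQUIRES_HUMAN_APPROVAL": True,
    "ALLOW_AGENT_EDIT": False,
    "ALLOW_EXTERNAL_SEND": False,
}


def normalize_collaboration(props: dict[str, str]) -> dict:
    """Resolve lifecycle aliases and parse policy gates from raw properties."""
    lifecycle = next((props[k] for k in LIFECYCLE_KEYS if props.get(k)), None)
    gates = {}
    for key, default in GATE_DEFAULTS.items():
        raw = props.get(key, "").strip().lower()
        gates[key] = {"true": True, "false": False}.get(raw, default)
    return {"lifecycle": lifecycle, "gates": gates}
```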

org2 agent fetch --id support-followup-1 --format json exposes these fields under collaboration. Policy gates are booleans, and source/handoff references use the same typed attachment shape as agent threads, resolving known id:, file:, report:, entity:, and related refs when possible. org2 context --id support-followup-1 renders a Collaboration state section with owner, assignee, lifecycle, next action, waiting state, policy gates, run metadata, and handoff summary.
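An agent can check the fetched collaboration gates before acting. In this minimal guard sketch, the camelCase keys `requiresHumanApproval`, `allowAgentEdit`, and `allowExternalSend` are assumed names, not documented fields. The caller-supplied `approved` flag is also hypothetical. The sketch reads REQUIRES_HUMAN_APPROVAL as gating external sends rather than local draft edits, which matches the example task but is an interpretation.

```python
class PolicyError(Exception):
    """Raised when a collaboration gate forbids the requested action."""


def check_action(collaboration: dict, action: str, approved: bool = False) -> None:
    """Raise PolicyError unless `action` ("edit" or "send") is allowed.

    Field names are hypothetical camelCase versions of the Org2 properties;
    a missing gate is treated as closed.
    """
    gate = {"edit": "allowAgentEdit", "send": "allowExternalSend"}.get(action)
    if gate is None:
        raise ValueError(f"unknown action: {action}")
    if collaboration.get(gate) is not True:
        raise PolicyError(f"{gate} is not enabled")
    # Assumed reading: approval gates outbound sends, not local drafts.
    if action == "send" and collaboration.get("requiresHumanApproval", True) and not approved:
        raise PolicyError("human approval required before external send")
```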

These conventions define the minimum shared state needed for CLI, Workspace, OpenClaw, Codex, and future clients to render the same next action and avoid unsafe external sends or silent canonical edits.

Data and query records

Data-backed reports should keep executable details and materialized artifacts as typed headings. Use KIND: warehouse-query for external systems, KIND: dataset for local/ad hoc sources, and KIND: data-link, sql-view, event-stream, or timeline-link for adjacent projections.

* Report: package fetches
:PROPERTIES:
:ID: report-fetches
:KIND: report
:END:

** Data link: package fetches by company
:PROPERTIES:
:ID: query-fetches-by-company
:KIND: warehouse-query
:SYSTEM: clickhouse
:DATABASE: scarf_analytics
:SCHEMA: package_usage
:TABLE: package_fetches_by_company
:COLUMNS: company_id:string, package:string, fetches:int
:PRIMARY_KEY: company_id, package
:PARTITION_BY: package
:SORT_BY: fetches:desc, company_id:asc
:DIMENSIONS: company_id, package
:MEASURES: fetches
:GRAIN: daily
:FILTERS: package = 'firebolt/foo'; fetches > 0
:GROUP_BY: company_id, package
:TIME_COLUMN: fetched_at
:TIMEZONE: America/Los_Angeles
:QUERY_ID: scarf.package_fetches_by_company.v1
:QUERY: SELECT company_id, package, fetches FROM package_fetches_by_company WHERE package = {package}
:PARAMS: {"packages":["firebolt/foo"],"from":"2026-01-01"}
:CREDENTIAL_REF: env:SCARF_ANALYTICS_TOKEN
:CONFIG_REF: profile:product-analytics
:LAST_RUN: 2026-06-12T12:30:00-07:00
:ARTIFACT: customer-reports/firebolt/package_fetches_by_company.csv
:ROW_COUNT: 1234
:RESULT_LIMIT: 500
:SAMPLE_SIZE: 250
:SAMPLE_RATE: 10%
:SAMPLING_METHOD: stratified
:COVERAGE: customers-with-package-fetches
:WINDOW_START: 2026-01-01
:WINDOW_END: 2026-06-12
:FRESHNESS_SLA: 2h
:WATERMARK: 2026-06-12T12:00:00-07:00
:DATA_LATENCY: 30m
:AVAILABILITY: available
:BACKFILL_STATUS: complete
:FRESHNESS: live
:REFRESH_REF: query-data:package_fetches_by_company
:REFRESH_COMMAND: org2 query-data --file reports/firebolt.org2 --results package_fetches_by_company --out views/package_fetches_by_company.org2
:REFRESH_STATUS: ready
:REFRESH_AFTER: 24h
:NEXT_REFRESH: 2026-06-13T12:30:00-07:00
:VALIDATION_STATUS: sampled
:VALIDATION_AT: 2026-06-12T13:00:00-07:00
:VALIDATION_BY: Casey
:VALIDATION_NOTE: Compared row counts against the dashboard export.
:CONFIDENCE: high
:VALIDATION_REFS: file:validation/package-fetches-check.md
:DATA_OWNER: Product Analytics
:DATA_STEWARD: Casey
:SENSITIVITY: customer-private
:VISIBILITY: internal
:ACCESS_POLICY: approval-required
:RETENTION: 90d
:LINEAGE_REFS: query:scarf.raw_package_fetches.v1, artifact:warehouse/raw_fetches.parquet
:DATA_CONTRACT: contract:package-fetches-v1
:SCHEMA_VERSION: v3
:QUALITY_STATUS: passed
:QUALITY_SCORE: 0.98
:QUALITY_CHECKS: row-count-reconciled, no-null-company-id
:QUALITY_NOTE: Latest warehouse checks passed before materialization.
:END:

Keep credentials and secrets outside org2 files. Store only non-secret metadata:

stable query IDs, non-secret query text or params, and non-secret refresh handles
relation shape metadata such as DATABASE, SCHEMA, TABLE, VIEW, COLUMNS, PRIMARY_KEY, PARTITION_BY, SORT_BY, DIMENSIONS, MEASURES, GRAIN, FILTERS, GROUP_BY, TIME_COLUMN, and TIMEZONE
safe credential/config handles such as env:SCARF_ANALYTICS_TOKEN or profile:product-analytics
source paths/URIs, artifact references, row counts, and result limit/window/sampling metadata
operational readiness metadata such as FRESHNESS_SLA, WATERMARK, DATA_LATENCY, AVAILABILITY, and BACKFILL_STATUS, plus freshness and materialization state
query hashes, source hashes, provenance refs, lineage refs, and data contracts/schema versions
validation/confidence notes and quality status/checks
lightweight ownership/access metadata such as DATA_OWNER, DATA_STEWARD, SENSITIVITY, VISIBILITY, ACCESS_POLICY, and RETENTION

This lets agents trace claims back to data and understand handling constraints without embedding warehouse-scale rows in notes or leaking secrets inline.
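A consumer can resolve a safe handle such as env:SCARF_ANALYTICS_TOKEN at run time instead of reading a secret from the note. This minimal sketch handles only the env: scheme and leaves profile: and other schemes to the orchestration layer. `resolve_credential` is a hypothetical name. A bare value with no scheme, which could be an inline secret, is rejected rather than used.

```python
import os


def resolve_credential(ref: str, environ=None) -> str:
    """Resolve a CREDENTIAL_REF handle without storing the secret in Org2.

    Only env: is implemented here; other schemes (e.g. profile:) are
    expected to be handled by the caller's own config system.
    """
    environ = os.environ if environ is None else environ
    scheme, sep, name = ref.partition(":")
    if not sep or not name:
        raise ValueError(f"expected a scheme:name handle, got {ref!r}")
    if scheme != "env":
        raise NotImplementedError(f"{scheme}: handles are resolved elsewhere")
    try:
        return environ[name]
    except KeyError:
        raise LookupError(f"environment variable {name} is not set") from None
```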

org2 agent fetch --id query-fetches-by-company --format json exposes this record as dataLink metadata, including:

kind and system
relation shape: database, schema, table, view, parsed columns, primaryKey, partitionBy, sortBy, dimensions, and measures arrays, plus grain, filters, groupBy, timeColumn, and timezone
query details: source / path, queryId, query, queryHash, and params (parsed as JSON when possible)
safe credentialRef / configRef values
result metadata: artifact, rowCount, resultLimit, resultOffset, sampleSize, sampleRate, samplingMethod, coverage, windowStart, windowEnd, and lastRun
operational readiness: freshnessSla, watermark, dataLatency, availability, backfillStatus, freshness, and materialized
optional refresh metadata: refreshRef, refreshCommand, refreshStatus, refreshAfter, and nextRefresh
validation metadata: validationStatus, validationAt, validationBy, validationNote, confidence, and validationRefs
ownership/access metadata: dataOwner, dataSteward, sensitivity, visibility, accessPolicy, and retention
optional lineageRefs (from LINEAGE_REFS, DATA_LINEAGE, or UPSTREAM_REFS), dataContract, and schemaVersion
quality metadata: qualityStatus, qualityScore, qualityChecks, and qualityNote
optional provenance refs from ORG2_PROVENANCE and optional sourceHashes from ORG2_SOURCE_HASHES

A local DuckDB-style dataset can use KIND: dataset with ENGINE, PATH, TABLE, COLUMNS, PARAMS, and RESULTS; the same dataLink field is returned for Workspace/search clients.
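Fields such as lastRun and refreshAfter let an agent decide when to add the staleness caveat that the answer contract requires. This Python sketch makes three assumptions. It expects refreshAfter in the compact `<number><unit>` form used in the example (24h, 30m, 90d). It expects lastRun in ISO 8601 with a UTC offset. It treats "stale" as meaning past lastRun plus refreshAfter. The helper names are hypothetical.

```python
import re
from datetime import datetime, timedelta

_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text: str) -> timedelta:
    """Parse compact durations like '30m', '2h', '24h', '90d'."""
    match = re.fullmatch(r"(\d+)([mhd])", text.strip())
    if not match:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})


def is_stale(data_link: dict, now: datetime) -> bool:
    """True when `now` (timezone-aware) is past lastRun + refreshAfter."""
    last_run = datetime.fromisoformat(data_link["lastRun"])
    return now > last_run + parse_duration(data_link["refreshAfter"])
```

With the example record (LAST_RUN 2026-06-12T12:30:00-07:00, REFRESH_AFTER 24h), the cutoff is 2026-06-13T12:30:00-07:00, which matches its NEXT_REFRESH value.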

Fetching a parent report or file also exposes descendant data/query records as relatedDataLinks. For example, org2 agent fetch --id report-fetches --format json includes child warehouse-query and dataset headings with their citations, claimState review/freshness metadata, and compact dataLink metadata. The prompt-ready renderer mirrors this with a Related data links section, so a Workspace client can show the report's evidence, relation shape, partition/sort hints, semantic dimensions/measures/grain, filters/grouping/time-bucket hints, source path/URI, non-secret query text and params, artifacts, row counts, result limits/windows/sampling notes, data freshness/SLA/watermark/latency/availability/backfill state, materialization state, refresh status, validation/confidence state, lineage/contract/quality state, access/retention constraints, claim review state, and source-hash snapshot next to the selected note without reparsing the org2 file.

Data/query records can also live outside the selected report subtree, such as in a central data catalog. Point them at the selected note with CONTEXT: id:report-fetches, TARGET_ID: report-fetches, REPORT_ID: report-fetches, or a normal [[id:report-fetches]] link. relatedDataLinks includes those referenced records with matchingAttachments so clients can explain why the query belongs with the selected report/entity.
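A client that builds its own catalog view can reproduce this matching rule for records outside the report subtree. The sketch below covers only the four routes described above: CONTEXT, TARGET_ID, REPORT_ID, and an [[id:...]] link in the body. It is not Org2's implementation. `matching_reasons` is a hypothetical name, and its return values only loosely mirror matchingAttachments.

```python
import re

# [[id:target]] or [[id:target][description]]
ID_LINK = re.compile(r"\[\[id:([^\]\[]+)\](?:\[[^\]]*\])?\]")


def matching_reasons(props: dict[str, str], body: str, selected_id: str) -> list[str]:
    """Explain why a data/query record belongs to `selected_id`, if it does."""
    reasons = []
    context_refs = [r.strip() for r in re.split(r"[,;]", props.get("CONTEXT", ""))]
    if f"id:{selected_id}" in context_refs:
        reasons.append("CONTEXT")
    for key in ("TARGET_ID", "REPORT_ID"):
        if props.get(key, "").strip() == selected_id:
            reasons.append(key)
    if selected_id in ID_LINK.findall(body):
        reasons.append("link")
    return reasons
```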

When the selected node is itself a warehouse-query, dataset, or other data-link record, org2 context --id query-fetches-by-company also includes that selected record in Related data links. This gives an OpenClaw or Workspace handoff the same structured query/artifact/freshness metadata whether the user starts from a parent report or directly from the data node.

Event-backed records use the same dataLink envelope, but emphasize timeline provenance instead of result rows. Use KIND: event-stream for an append-only external stream and KIND: timeline-link for a curated projection over one or more streams.

* Timeline link: support state changes
:PROPERTIES:
:ID: support-state-changes
:KIND: timeline-link
:SYSTEM: linear
:TIMELINE: support.ticket.lifecycle
:EVENT_TYPE: status_changed
:ENTITY: ticket:SUP-42
:ACTOR: Casey
:OCCURRED_AT: 2026-05-15T09:30:00-07:00
:CAPTURED_AT: 2026-05-15T09:31:00-07:00
:SOURCE_CURSOR: linear:SUP-42:9
:STREAM_POSITION: 9
:PARTITION_KEY: ticket:SUP-42
:CORRELATION_ID: support-sync-42
:CAUSATION_ID: webhook-delivery-42
:CHANGE_ID: change-42
:CONTEXT: id:support-followup-1
:END:

org2 agent fetch exposes event/timeline metadata as eventStream, timeline, eventType, entity, actor, occurredAt, capturedAt, sourceCursor, streamPosition, partitionKey, correlationId, causationId, changeId, and changeHash when present. This lets Workspace and OpenClaw clients answer "what changed and why?", replay or order durable activity, and group related changes with cited pointers into an event ledger while keeping high-volume event bodies in the source system.
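To replay activity, a client can order events within each partition by streamPosition and then group them by correlationId. This sketch assumes the fetched records expose the camelCase fields listed above. It also assumes streamPosition is an integer or numeric string that increases within a partition. Positions from different partitions are not comparable, so partitions are only sorted by key here, which keeps the output deterministic without implying a global order.

```python
from collections import defaultdict


def replay_order(events: list[dict]) -> list[dict]:
    """Order events per partition by stream position (assumed monotonic)."""
    return sorted(events, key=lambda e: (e.get("partitionKey", ""), int(e["streamPosition"])))


def group_by_correlation(events: list[dict]) -> dict[str, list[str]]:
    """Map each correlationId to its changeIds in replay order."""
    groups: dict[str, list[str]] = defaultdict(list)
    for event in replay_order(events):
        groups[event.get("correlationId", "uncorrelated")].append(event["changeId"])
    return dict(groups)
```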

What to save back

Good generated outputs are inspectable artifacts, not silent canonical edits:

views/project-alpha-brief.org2 for a cited briefing
views/weekly-digest.org2 for a generated report
compiled/retrieval-index.json for a derived machine artifact
patch files or draft notes for proposed canonical changes

Generated Org2 files should include properties such as ORG2_ARTIFACT_ROLE, ORG2_PROVENANCE, ORG2_GENERATOR, and ORG2_GENERATED_AT. Run org2 lint before trusting or publishing generated outputs.
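Here is a minimal sketch of stamping a generated view with these properties before writing it under views/. The role value, generator string, and comma-separated provenance format in the usage are illustrative assumptions. Check Corpus flow for the accepted values, and lint the result.

```python
from datetime import datetime


def stamp_artifact(title: str, body: str, role: str, provenance: list[str],
                   generator: str, now: datetime) -> str:
    """Prefix a generated Org2 heading with an ORG2_* property drawer."""
    drawer = [
        ":PROPERTIES:",
        f":ORG2_ARTIFACT_ROLE: {role}",
        # provenance as comma-separated typed refs (assumed format)
        f":ORG2_PROVENANCE: {', '.join(provenance)}",
        f":ORG2_GENERATOR: {generator}",
        f":ORG2_GENERATED_AT: {now.isoformat(timespec='seconds')}",
        ":END:",
    ]
    return "\n".join([f"* {title}", *drawer, body.rstrip(), ""])
```

For example, `stamp_artifact("Project Alpha brief", text, "view", ["id:alpha-decision"], "external-agent/0.1", now)` could produce the contents of views/project-alpha-brief.org2. In that call, the ID, role, and generator name are hypothetical.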

Minimal answer contract

When an external agent answers from Org2 data, require:

cited claims using Org2 source markers
a short list of source files or IDs used
a caveat when the answer came from stale or generated artifacts
no automatic write to canonical notes unless the user explicitly applies/promotes it
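Part of this contract can be checked mechanically before an answer is shown. The following sketch verifies only citation markers. Every [Sn] in the answer must appear in the context packet, and the answer must contain at least one marker. It cannot confirm that each claim is actually supported by its source, so that still needs review.

```python
import re

MARKER = re.compile(r"\[S(\d+)\]")


def check_citations(answer: str, context_packet: str) -> list[str]:
    """Return contract problems found in `answer` (an empty list means it passed)."""
    known = set(MARKER.findall(context_packet))
    used = MARKER.findall(answer)
    problems = []
    if not used:
        problems.append("answer cites no sources")
    for n in sorted(set(used) - known, key=int):
        problems.append(f"unknown source marker [S{n}]")
    return problems
```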