Rules & Invariants

What the system enforces and why

How Rules Work

The system enforces quality through rule files — machine-readable documents that tell each agent what to check, what to flag, and how severely to penalize violations. Every critic reads the relevant rules before reviewing. Every worker reads them before creating.

This page is a reference for all 11 rule files, the 22 content invariants, the severity gradient, and the quality gates. If you fork the repo and change nothing else, these rules still apply.

Rule Files Overview

Eleven files in .claude/rules/ govern the system. Each file has a clear scope — no rule appears in two places.

File	Purpose	Primary readers
`agents.md`	Worker-critic pairs, separation of powers, 3-strike escalation	Orchestrator, all agents
`content-invariants.md`	22 non-negotiable checks (paper, code, talk, traceability)	All critics, verifier, lint hook
`content-standards.md`	Table format, figure format, PDF processing, exploration protocol	Coder, writer, coder-critic, writer-critic
`lifecycle.md`	Pre-dispatch and post-completion validation protocol	Orchestrator
`logging.md`	Session report, research journal, and pipeline state formats	Orchestrator, all agents
`meta-governance.md`	Dual-nature rule (template vs. working project), learning promotion	Orchestrator, user
`permissions.md`	Agent registry: phase, dependencies, outputs, escalation targets, quality weights	Orchestrator
`quality.md`	Scoring protocol, gate thresholds, severity gradient, deduction scaling	Orchestrator, all critics
`revision.md`	R&R cycle: comment classification and routing	Orchestrator, writer, coder
`workflow.md`	Plan-first protocol, orchestrator loop, dependency graph, context management	Orchestrator, user
`working-paper-format.md`	LaTeX preamble standard, title page format, bibliography setup	Writer, writer-critic, verifier

What each file governs

agents.md defines the adversarial pairing system. Every worker has a paired critic. Critics never create artifacts; creators never self-score. When a pair fails to converge after 3 rounds, the file specifies where to escalate (often back to the user for fundamental disagreements).

content-invariants.md lists the 22 non-negotiable rules that every agent checks. These are the “constitution” of the system: violations produce deductions, not suggestions. Critics cite invariant numbers (e.g., “violates INV-3”) in their reports. See the Content Invariants section for the full table.

content-standards.md specifies how tables, figures, and PDFs should look. It covers booktabs formatting, coefficient display conventions (including AEA no-stars policy), preferred R packages (modelsummary, fixest::etable), figure typography (serif fonts, no in-plot titles), and the exploration folder lifecycle.

lifecycle.md defines PRE-dispatch and POST-completion validation. Before dispatching any agent, the Orchestrator checks that required input artifacts exist and contain the right sections. After completion, it verifies outputs were created and the critic score was recorded. This prevents launching agents with missing inputs or advancing past agents with missing outputs.

logging.md standardizes three persistence mechanisms: the session report (append-only human-readable log), the research journal (structured agent-by-agent entries with scores), and the pipeline state JSON (machine-readable status for session recovery).

meta-governance.md addresses the repo’s dual nature as both a working project and a public template. The “one rule” is simple: before committing, ask whether another researcher forking the repo would benefit. Project-specific state goes in .claude/state/; reusable patterns go in the repo itself. It also governs learning promotion — patterns validated across 3+ projects can be elevated to new invariants, but only with user approval.

permissions.md is the agent registry. Each agent’s entry declares its phase, parallel group, required inputs, expected outputs, paired critic, escalation target, and quality weight. The Orchestrator reads this file to determine what to dispatch and when. Adding a new agent means adding an entry here — no other file needs to change.

quality.md defines how individual critic scores aggregate into the project-wide score that gates submission. It also specifies the severity gradient, under which the same violation costs more in later phases. See the Severity Gradient and Quality Gates sections for details.

revision.md handles the R&R cycle. When real referee reports arrive, /revise classifies each comment as NEW ANALYSIS (routed to coder), CLARIFICATION (routed to writer), DISAGREE (flagged for user), or MINOR (writer handles directly). The system never autonomously pushes back on referees — DISAGREE items always require human judgment.
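The routing described above amounts to a fixed lookup from comment class to handler. Here is a minimal sketch, assuming the classification has already been made; the names `ROUTES`, `route_comment`, and `needs_human` are hypothetical and not taken from revision.md.

```python
# Hypothetical routing table mirroring the four comment classes described above.
ROUTES = {
    "NEW ANALYSIS": "coder",
    "CLARIFICATION": "writer",
    "DISAGREE": "user",  # never handled autonomously
    "MINOR": "writer",
}


def route_comment(classification: str) -> str:
    """Return who handles a referee comment of the given class."""
    key = classification.strip().upper()
    if key not in ROUTES:
        raise ValueError(f"unknown comment class: {classification!r}")
    return ROUTES[key]


def needs_human(classification: str) -> bool:
    """DISAGREE items always require human judgment."""
    return route_comment(classification) == "user"
```

Keeping DISAGREE as an explicit route to the user, rather than a special case buried in code, makes the "never push back autonomously" rule easy to audit.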

workflow.md covers the plan-first protocol (plan before coding, save plans to disk, get user approval), the orchestrator’s dependency-driven loop (identify, dispatch, review, verify, score), standalone vs. pipeline mode, and context management (compaction discipline, session recovery).

working-paper-format.md specifies the LaTeX preamble, title page conventions, abstract format, section ordering, and bibliography setup. It distinguishes required items (blocking deductions if violated) from recommended items (advisory, no penalty). The writer-critic checks every paper against this standard.

Content Invariants

These are non-negotiable. Violations produce deductions, not suggestions. Critics cite invariant numbers (e.g., “violates INV-7”) in their reports.

Full Invariant Table

#	Rule	Category	Enforced by	Typical deduction
INV-1	Every table has notes (variables, sample, source)	Paper	writer-critic	-5
INV-2	Every figure has a caption with note (what, how to read, source)	Paper	writer-critic	-5
INV-3	No `\hline` — use booktabs (`\toprule`, `\midrule`, `\bottomrule`). No vertical rules	Paper	writer-critic	-3
INV-4	Significance stars follow journal profile (AEA: no stars)	Paper	writer-critic	-3
INV-5	Abstract is 150 words or fewer	Paper	writer-critic	-5
INV-6	JEL codes and keywords present after abstract	Paper	writer-critic	-5
INV-7	Notation consistent across all sections	Paper	writer-critic	-5
INV-8	Every causal claim has a corresponding identification section	Paper	writer-critic	-10
INV-9	`biblatex` + `biber`, not `natbib` + `bibtex`	Paper	writer-critic, verifier	-3
INV-10	`hyperref` loaded second-to-last; `cleveref` loaded after it	Paper	writer-critic, verifier	-2
INV-11	Numbers in text match tables/figures exactly	Paper	writer-critic	-5
INV-12	No titles inside figures; titles go in LaTeX `\caption{}`	Paper	writer-critic	-3
INV-13	Scripts export bare `tabular` — no `\begin{table}`, `\caption`, or notes	Paper / Code	writer-critic, coder-critic	-3
INV-14	`set.seed()` called exactly once, at top, if stochastic	Code	coder-critic, verifier	-5
INV-15	All packages loaded at top, before any computation	Code	coder-critic, verifier	-3
INV-16	No absolute paths; use `here()` / `pathlib.Path` / `joinpath(@__DIR__)`	Code	coder-critic, verifier	-5
INV-17	No growing vectors in loops; pre-allocate or vectorize	Code	coder-critic	-3
INV-18	Output files go to the path specified in CLAUDE.md	Code	coder-critic	-3
INV-19	No prohibited functions (`setwd()`, `rm(list=ls())`, `install.packages()`, `attach()`)	Code	coder-critic, verifier	-5
INV-20	Talk notation matches paper exactly	Talk	storyteller-critic	-5
INV-21	Every claim on a slide is traceable to the paper	Talk	storyteller-critic	-5
INV-22	Every numerical claim has an entry in the claim-source map	Traceability	writer-critic	-5

Invariants by Category

Paper (INV-1 through INV-13)

Most paper invariants enforce standard economics formatting that journals expect. A few deserve explanation:

INV-4 (significance stars) adapts to the target journal. AEA journals prohibit significance stars entirely — report standard errors and confidence intervals instead. Working papers default to the * p < 0.10, ** p < 0.05, *** p < 0.01 convention. The journal profile determines which convention applies.

INV-7 (notation consistency) catches a common drafting problem: using \(\beta\) for the treatment effect in Section 3 but \(\delta\) in Section 6, or using \(i\) for individuals in one place and firms in another. The same symbol must mean the same thing everywhere.

INV-8 (causal claims need identification) prevents the most common referee complaint in empirical economics. If you write “X causes Y,” you need an identification section explaining how you separate causation from correlation. Descriptive papers should not use causal language at all.

INV-11 (numbers match) catches stale values — when you re-run the analysis and get 0.045 but the text still says 0.052 from a previous run. The writer-critic cross-references every number in the text against the tables and figures.

INV-13 (bare tabular export) enforces the division of labor between code and paper. R/Python/Julia scripts produce the raw table content. The main.tex file wraps it with \begin{table}, \caption{}, and notes. This means you can change a caption without re-running code, and code changes automatically flow into the paper.
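A check for the wrapper part of INV-13 can be sketched as a simple pattern search over the exported file's text. This is an illustration only, not the coder-critic's actual logic: `find_wrapper_commands` is a hypothetical helper, the sample table content is made up, and table notes are not detected because they have no single LaTeX signature.

```python
import re

# Constructs that belong in main.tex, not in a script-exported table file.
WRAPPER_PATTERN = re.compile(r"\\begin\{table\*?\}|\\caption\b|\\end\{table\*?\}")


def find_wrapper_commands(tex: str) -> list[str]:
    """Return the wrapper commands found in an exported table fragment."""
    return WRAPPER_PATTERN.findall(tex)


# A bare export: only the tabular environment, as INV-13 requires.
bare = r"""\begin{tabular}{lc}
\toprule
 & (1) \\
\midrule
Treatment & 0.045 \\
\bottomrule
\end{tabular}"""

# The same content wrapped in a float, which INV-13 forbids in script output.
wrapped = r"""\begin{table}
\caption{Main results}
""" + bare + r"""
\end{table}"""

print(find_wrapper_commands(bare))     # no wrapper commands expected
print(find_wrapper_commands(wrapped))  # the float and caption commands
```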

Code (INV-14 through INV-19)

Code invariants enforce reproducibility and portability:

INV-14 (single seed) ensures stochastic results are reproducible. One set.seed() call at the top of the master script, not scattered throughout. Multiple seeds make debugging non-deterministic failures nearly impossible.
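The invariant is written in terms of R's set.seed(). For Python scripts, one plausible analogue (an assumption, not something content-invariants.md spells out) is to create a single seeded generator at the top and pass it to every stochastic step. The seed value and the `simulate` function below are hypothetical.

```python
import random

SEED = 20240101  # hypothetical value; defined once, at the top of the master script


def simulate(rng: random.Random, n: int = 5) -> list[float]:
    """Draw n uniform values from the generator that is passed in."""
    return [rng.random() for _ in range(n)]


# Two fresh runs that start from the same seed produce identical draws.
run_a = simulate(random.Random(SEED))
run_b = simulate(random.Random(SEED))
print(run_a == run_b)
```

Passing the generator explicitly avoids hidden global state, so a function cannot quietly reseed in the middle of a run.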

INV-16 (no absolute paths) is the single most common portability failure. Code that works on your machine but breaks on a collaborator’s because of /Users/yourname/... paths is not reproducible code. Use here::here() in R, pathlib.Path in Python, or joinpath(@__DIR__, ...) in Julia.
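For the Python case, a minimal sketch of the pathlib.Path approach follows. It assumes the script lives one directory below the project root; the helper `project_path` and the output file name are hypothetical.

```python
from pathlib import Path

# Anchor every path at the project root instead of a machine-specific location.
# Assumption: this script sits in <project>/scripts/, so the root is one level up.
try:
    PROJECT_ROOT = Path(__file__).resolve().parent.parent
except NameError:  # __file__ is undefined in interactive sessions
    PROJECT_ROOT = Path.cwd()


def project_path(*parts: str) -> Path:
    """Build a path below the project root from relative components."""
    candidate = Path(*parts)
    if candidate.is_absolute():
        raise ValueError(f"absolute path not allowed: {candidate}")
    return PROJECT_ROOT / candidate


# Hypothetical output file, resolved relative to the project root.
table_file = project_path("output", "tables", "main_results.tex")
print(table_file)
```

Rejecting absolute components inside the helper means a stray machine-specific path fails loudly on the author's machine instead of silently on a collaborator's.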

INV-19 (prohibited functions) bans functions that break reproducibility or are dangerous in production. setwd() changes global state and breaks when paths differ across machines. rm(list = ls()) destroys the workspace and breaks interactive debugging. install.packages() in scripts is a side effect that should not run automatically. attach() puts a data frame's columns on the search path, where they can silently mask, or be masked by, other objects with the same name.
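A scanner for these calls can be sketched in a few lines. This is an illustrative regex pass over R source, not the actual lint hook or verifier; `find_prohibited` is a hypothetical name, and the comment stripping is deliberately crude (it does not handle a `#` inside a string literal).

```python
import re

PROHIBITED = ("setwd", "install.packages", "attach")
CALL_PATTERN = re.compile(
    r"(?<![\w.])(" + "|".join(re.escape(name) for name in PROHIBITED) + r")\s*\("
)
# rm() is matched separately because only the rm(list = ls()) form is banned.
RM_ALL_PATTERN = re.compile(r"(?<![\w.])rm\s*\(\s*list\s*=\s*ls\s*\(\s*\)\s*\)")


def find_prohibited(r_source: str) -> list[tuple[int, str]]:
    """Return (line number, function) pairs for prohibited calls in R code."""
    hits = []
    for lineno, line in enumerate(r_source.splitlines(), start=1):
        code = line.split("#", 1)[0]  # crude comment stripping
        for match in CALL_PATTERN.finditer(code):
            hits.append((lineno, match.group(1)))
        if RM_ALL_PATTERN.search(code):
            hits.append((lineno, "rm(list=ls())"))
    return hits


sample = """library(fixest)
rm(list = ls())
setwd("/Users/yourname/project")
# install.packages("fixest")  commented out, so not flagged
df <- read.csv(here::here("data", "panel.csv"))
"""

print(find_prohibited(sample))
```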

Talk (INV-20 and INV-21)

Talk invariants enforce fidelity between the presentation and the paper:

INV-20 (notation matches) prevents the common problem of simplifying notation for slides and introducing inconsistencies. If the paper uses \(\hat{\beta}_{ATT}\), the slides must use the same symbol — not \(\beta\) or \(\hat{\tau}\).

INV-21 (traceable claims) prevents “orphan results” that appear on slides but not in the paper. Every number, every claim, every figure on a slide must trace back to the manuscript.

Traceability (INV-22)

INV-22 (claim-source map) is the audit trail. Every numerical claim in the manuscript — “treatment effect of 4.5 percentage points” — must have an entry in quality_reports/claim_source_map_{project}.md that traces it to a specific script line and output file. This makes it possible for a referee (or a future you) to verify any number in the paper by following the chain from text to table to code.

Severity Gradient

Critics do not apply the same harshness at every stage. A missing citation in an early brainstorm is a gentle nudge; the same gap in a near-final manuscript is a serious deduction. The Orchestrator tells each critic which severity level to use.

Phase-Based Severity

Phase	Critic stance	Rationale
Discovery	Encouraging (low)	Early ideas need space to develop. Over-criticizing kills exploration.
Strategy	Constructive (medium)	Identification must be sound, but the critic should suggest alternatives, not just reject.
Execution	Strict (high)	Code and paper are near-final. Bugs missed here are expensive to fix later.
Peer Review	Adversarial (maximum)	Simulates real referees. The goal is to find every weakness before a journal does.
Presentation	Professional (medium-high)	Talks should be polished, but scores are advisory — they do not block submission.

The Orchestrator includes the severity level explicitly in the critic’s prompt:

You are reviewing at SEVERITY: HIGH (Execution phase).
Flag all issues. Do not suggest "consider" --- state what must change.
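Generating that first prompt line from the phase table could look like the sketch below. Only the HIGH / Execution wording is confirmed by the example above; the other upper-case labels and the function name `severity_header` are assumptions made for illustration.

```python
# Severity labels taken from the Phase-Based Severity table; the upper-case
# spellings other than HIGH are assumed, not quoted from the rule files.
SEVERITY_BY_PHASE = {
    "Discovery": "LOW",
    "Strategy": "MEDIUM",
    "Execution": "HIGH",
    "Peer Review": "MAXIMUM",
    "Presentation": "MEDIUM-HIGH",
}


def severity_header(phase: str) -> str:
    """Build the first line of a critic prompt for the given phase."""
    level = SEVERITY_BY_PHASE[phase]  # raises KeyError for an unknown phase
    return f"You are reviewing at SEVERITY: {level} ({phase} phase)."


print(severity_header("Execution"))
```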

Deduction Scaling

The same violation costs at least as much in later phases, and usually more. This table shows how four representative issues scale:

Issue	Discovery	Strategy	Execution	Peer Review
Missing citation	-2	-5	-10	-15
Notation inconsistency	-1	-3	-5	-5
Hedging language	—	—	-3	-5
Missing robustness check	—	-5	-15	-20

The principle: early phases are about getting the direction right. Late phases are about getting the details right. A notation inconsistency in Discovery is a reminder; the same inconsistency in Execution means the writer is not reading their own paper carefully enough.
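The scaling table can be treated as a lookup keyed by issue and phase. The sketch below copies the four rows above and applies them to the start-at-100 scoring that critics use. Two points are assumptions rather than documented behavior: a dash is treated as no deduction, and the score is floored at 0.

```python
PHASES = ("Discovery", "Strategy", "Execution", "Peer Review")

# Values copied from the Deduction Scaling table; None marks a dash.
DEDUCTIONS = {
    "Missing citation": (-2, -5, -10, -15),
    "Notation inconsistency": (-1, -3, -5, -5),
    "Hedging language": (None, None, -3, -5),
    "Missing robustness check": (None, -5, -15, -20),
}


def deduction(issue: str, phase: str) -> int:
    """Look up the deduction for an issue in a phase; a dash counts as 0 (assumed)."""
    value = DEDUCTIONS[issue][PHASES.index(phase)]
    return 0 if value is None else value


def critic_score(issues: list[str], phase: str) -> int:
    """Start at 100 and subtract each deduction, never going below 0 (assumed floor)."""
    return max(0, 100 + sum(deduction(issue, phase) for issue in issues))


print(critic_score(["Missing citation", "Notation inconsistency"], "Execution"))
```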

Why “adversarial” in Peer Review?

The simulated referees are intentionally harsh because real referees are harsh. A system that scores a paper 95/100, only for a journal to reject it, has failed. The adversarial stance exists to catch problems before submission, not after. If the paper survives the system’s referees, it is better prepared for the real ones.

Quality Gates

Quality scores determine what actions are allowed. The system uses a weighted aggregate of individual critic scores, with hard thresholds at each gate.

The Four Gates

Gate	Overall score	Per-component minimum	What it means
Commit	>= 80	None	Work can be committed to the repository
PR	>= 90	None	Work can be opened as a pull request
Submission	>= 95	>= 80 per component	Work can be submitted to a journal
Blocked	< 80	—	Work cannot be committed; issues must be fixed first

How Scores Aggregate

Each critic produces a score from 0 to 100 by starting at 100 and deducting for issues found. These individual scores combine via a weighted average:

Component	Weight	Source
Literature coverage	10%	librarian-critic
Data quality	10%	explorer-critic
Identification validity	25%	strategist-critic
Theory (when present)	20%	theorist-critic
Code quality	15%	coder-critic
Paper quality	25%	Avg(domain-referee, methods-referee)
Manuscript polish	10%	writer-critic
Replication readiness	5%	Verifier (pass/fail, mapped to 0 or 100)

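Theory weight applies only to papers with a formal theory section (econometric methods, theory + empirics, structural identification). For applied papers using off-the-shelf estimators, the theory row is excluded; the seven remaining weights already sum to 100%. With the theory row included, the eight listed weights sum to 120%, so they are renormalized proportionally to sum to 100%.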

When Components Are Missing

Not every project has every component. If a component has not been scored:

It is excluded from the weighted average
Remaining weights are renormalized proportionally
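Example: an applied paper (no theory row) with no literature review yet. The six remaining weights (data, identification, code, paper, polish, replication) become roughly 11%, 28%, 17%, 28%, 11%, 6% after rounding, which is why they sum to 101%.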

This means you can run partial pipelines (e.g., just Strategy + Execution) and still get meaningful scores.
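The renormalization can be sketched directly from the component table. The short component keys and function names below are hypothetical labels for this illustration; the weights are the ones listed above.

```python
# Weights from the component table, in percent. Theory is listed here and is
# dropped automatically whenever no theory score is supplied.
WEIGHTS = {
    "literature": 10,
    "data": 10,
    "identification": 25,
    "theory": 20,
    "code": 15,
    "paper": 25,
    "polish": 10,
    "replication": 5,
}


def renormalized_weights(scored: set[str]) -> dict[str, float]:
    """Rescale the weights of the scored components so they sum to 1."""
    if not scored:
        raise ValueError("no components have been scored")
    total = sum(WEIGHTS[name] for name in scored)
    return {name: WEIGHTS[name] / total for name in WEIGHTS if name in scored}


def aggregate(scores: dict[str, float]) -> float:
    """Weighted average over the components that have a score."""
    weights = renormalized_weights(set(scores))
    return sum(weights[name] * scores[name] for name in scores)


# Applied paper (no theory) with no literature review yet.
partial = renormalized_weights(
    {"data", "identification", "code", "paper", "polish", "replication"}
)
percentages = [round(100 * weight) for weight in partial.values()]
print(percentages)  # [11, 28, 17, 28, 11, 6]
```

The paper component would itself be the mean of the two referee scores before it enters `aggregate`.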

The Per-Component Minimum

For submission, the aggregate score must be >= 95 and no individual component can be below 80. A perfect literature review cannot compensate for broken identification. A beautiful manuscript cannot compensate for code that does not reproduce.

This rule prevents the “average-up” problem — where excellence in one area masks serious deficiencies in another.
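Put together with the thresholds in the gate table, the decision can be sketched as a single function. The name `gate` is hypothetical, and treating a project with no component scores as ineligible for submission is an assumption.

```python
def gate(overall: float, components: dict[str, float]) -> str:
    """Return the highest gate a project clears, using the documented thresholds."""
    # Assumption: with no component scores at all, submission is not allowed.
    weakest = min(components.values()) if components else 0.0
    if overall >= 95 and weakest >= 80:
        return "Submission"
    if overall >= 90:
        return "PR"
    if overall >= 80:
        return "Commit"
    return "Blocked"


# A high aggregate with one weak component stops at the PR gate.
print(gate(96, {"code": 79, "paper": 100}))
```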

Recovering from a Low Score

When a score falls below the gate threshold:

The Orchestrator identifies which components are blocking
It re-dispatches the relevant worker-critic pair to fix the flagged issues
After fixes, the critic re-reviews (up to 3 rounds per pair)
If 3 rounds fail, the system escalates — often back to the user for a design decision

The goal is convergence, not perfection on the first pass. Most artifacts reach 80+ by round 2.
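The recovery steps above can be sketched as a bounded loop. The `revise` and `review` callables stand in for a real worker and critic, and the toy scores are invented for illustration; none of these names come from the rule files.

```python
from typing import Callable

MAX_ROUNDS = 3  # the 3-round limit per worker-critic pair


def converge(
    revise: Callable[[int], None],
    review: Callable[[int], float],
    threshold: float = 80.0,
) -> tuple[str, int, float]:
    """Alternate worker fixes and critic reviews until the score clears the threshold.

    Returns (outcome, rounds used, last score); outcome is "passed" or "escalate".
    """
    score = 0.0
    for round_no in range(1, MAX_ROUNDS + 1):
        revise(round_no)          # worker addresses the flagged issues
        score = review(round_no)  # critic re-scores the artifact
        if score >= threshold:
            return ("passed", round_no, score)
    return ("escalate", MAX_ROUNDS, score)


# Toy stand-ins: invented scores that improve each round.
toy_scores = {1: 72.0, 2: 84.0, 3: 91.0}
result = converge(lambda n: None, lambda n: toy_scores[n])
print(result)
```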

The submission gate is intentionally high

A score of 95 is difficult to achieve. It requires all components to be polished, all robustness checks to pass, and the simulated referees to find no fatal issues. This is by design — the system should flag problems before a journal does, not after. If the gate feels too strict, the alternative is a rejected paper.

Working Paper Format

The working-paper-format.md rule file specifies the LaTeX standard that all papers must follow. The full reference preamble, title page conventions, and bibliography setup are documented there. Here is a summary of what the writer-critic enforces.

Required (Blocking Deductions)

Violation	Deduction
Wrong document class or font size	-5
Missing `\doublespacing` in body	-5
Using `natbib` instead of `biblatex`	-3
Missing `fancyhdr` page number setup	-2
`\textbf{}` wrapping `\title{}`	-3
`\and` between authors instead of `\quad`	-3
Repeated affiliation text outside `\thanks{}`	-3
Missing JEL codes or keywords	-5
`\hline` instead of booktabs rules	-3
Missing table notes	-5
Missing figure notes	-5
`hyperref` not loaded second-to-last	-2
Missing `cleveref` after `hyperref`	-2
Manual `Figure~\ref{}` instead of `\cref{}`	-1 per occurrence (max -5)
Using `bibtex` instead of `biber`	-3
Missing `microtype`	-2

Recommended (Advisory Only)

These are reported but do not produce deductions:

Missing lmodern font
Non-default citation color
Missing captionsetup
Missing hidelinks in hyperref

Agent Enforcement Summary

Different agents check different subsets of the rules. This table shows which invariants each enforcer is responsible for and what happens when it finds a violation.

Agent	Checks	Action on violation
writer-critic	INV-1 through INV-13, INV-22	Deduct per scoring rubric
coder-critic	INV-13 through INV-19	Deduct per scoring rubric
storyteller-critic	INV-20, INV-21	Deduct per scoring rubric
verifier	INV-9, INV-10, INV-14, INV-15, INV-16, INV-19	FAIL (binary — no partial credit)
lint hook	INV-14, INV-15, INV-16, INV-19	Advisory warning (non-blocking)

Note the overlap: INV-13 is checked by both the writer-critic (paper side) and the coder-critic (code side). INV-9 and INV-10 are checked by both the writer-critic (with deductions) and the verifier (with a hard FAIL); INV-14 through INV-16 and INV-19 are checked by both the coder-critic (with deductions) and the verifier (with a hard FAIL). This redundancy is intentional: the critic catches issues during development; the verifier catches anything that slipped through at submission time.
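The overlaps can be read off the table mechanically. The sketch below transcribes the Checks column into sets and intersects them; the helper names `inv` and `overlap` are hypothetical.

```python
def inv(*numbers: int) -> set[str]:
    """Build a set of invariant labels such as {'INV-9', 'INV-10'}."""
    return {f"INV-{n}" for n in numbers}


# Transcribed from the Agent Enforcement Summary table.
CHECKS = {
    "writer-critic": inv(*range(1, 14), 22),
    "coder-critic": inv(*range(13, 20)),
    "storyteller-critic": inv(20, 21),
    "verifier": inv(9, 10, 14, 15, 16, 19),
    "lint hook": inv(14, 15, 16, 19),
}


def overlap(agent_a: str, agent_b: str) -> list[str]:
    """Invariants checked by both agents, sorted by number."""
    shared = CHECKS[agent_a] & CHECKS[agent_b]
    return sorted(shared, key=lambda label: int(label.split("-")[1]))


# Invariants that no enforcer covers; an empty set means full coverage.
unassigned = inv(*range(1, 23)) - set().union(*CHECKS.values())

print(overlap("writer-critic", "verifier"))
print(overlap("coder-critic", "verifier"))
```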

Lifecycle Validation

The Orchestrator runs two validation checks around every agent dispatch, defined in lifecycle.md.

PRE-dispatch: Before launching any agent, the Orchestrator reads the agent’s entry in permissions.md and verifies that all required input artifacts exist and contain the expected sections. If validation fails, the agent is not dispatched — instead, the system reports what is missing and suggests which skill to run first.

POST-completion: After an agent finishes, the Orchestrator verifies that the expected output artifacts were created, that required sections are present in those outputs, and that the paired critic has produced a scored report logged in the research journal. If validation fails, the agent is re-dispatched with the specific gaps noted.

The principle is fail fast: never launch an agent with missing inputs and hope it works, and never advance past an agent with missing outputs. Every handoff between agents is validated.
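Both checks reduce to the same question: does each required artifact exist, and does it contain the expected sections? Here is a minimal in-memory sketch, assuming sections are identified by heading lines. The function names, the artifact name, and the section names are hypothetical; lifecycle.md defines the real protocol.

```python
def missing_sections(text: str, required: list[str]) -> list[str]:
    """Return required headings that do not appear as a line in the text."""
    present = {line.strip().lstrip("#").strip() for line in text.splitlines()}
    return [heading for heading in required if heading not in present]


def validate(artifacts: dict[str, str], requirements: dict[str, list[str]]) -> list[str]:
    """Check that each required artifact exists and has its sections.

    artifacts maps artifact name -> content; requirements maps artifact
    name -> required headings. Returns a list of problems (empty means pass).
    """
    problems = []
    for name, sections in requirements.items():
        if name not in artifacts:
            problems.append(f"missing artifact: {name}")
            continue
        for heading in missing_sections(artifacts[name], sections):
            problems.append(f"{name}: missing section '{heading}'")
    return problems


# Hypothetical artifact and section names, for illustration only.
requirements = {"strategy_memo.md": ["Identification", "Threats"]}
artifacts = {"strategy_memo.md": "# Strategy\n## Identification\nDiD with staggered adoption.\n"}
problems = validate(artifacts, requirements)
print(problems)
```

Run against an agent's required inputs, a non-empty result means "do not dispatch"; run against its expected outputs, it means "re-dispatch with these gaps noted".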