What’s New

Release History

Versioning: CalVer (YY.MM) — one version per month maximum. Git commits provide granularity within a release.
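
As an illustration of the scheme, here is a minimal sketch of how CalVer strings could be parsed and ordered. The optional third component is an assumption inferred from headings such as 26.04.3 in this history, and the names CalVer and parse_calver are hypothetical, not part of the project.

```python
import re
from typing import NamedTuple


class CalVer(NamedTuple):
    year: int   # two-digit year, e.g. 26
    month: int  # 1 to 12
    patch: int  # 0 when the version has no third component


_PATTERN = re.compile(r"^(\d{2})\.(\d{2})(?:\.(\d+))?$")


def parse_calver(text: str) -> CalVer:
    """Parse 'YY.MM' or 'YY.MM.N' into a tuple that sorts chronologically."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a CalVer string: {text!r}")
    year, month = int(match.group(1)), int(match.group(2))
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {text!r}")
    patch = int(match.group(3)) if match.group(3) else 0
    return CalVer(year, month, patch)


releases = ["26.04", "26.03.6", "26.05", "26.04.3", "26.04.1"]
newest_first = sorted(releases, key=parse_calver, reverse=True)
print(newest_first)
```

Sorting on the parsed tuple rather than the raw string avoids surprises once a third component reaches two digits.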


26.05 — May 2026

HTML Report Pipeline

Two Python generators produce self-contained interactive HTML from agent markdown output:

  • scripts/generate_dashboard.py — project-level overview (manuscript sections, data, code, quality scorecard, review history, active plans)
  • scripts/generate_html_report.py — 5 detail report subcommands (peer-review, code-audit, strategy-review, quality-gate, literature)

Shared design system in templates/html/base/. Dark mode, print support, collapsible sections, filter engine.

MAS Evolution v2

Six architectural phases drawn from OxyGent, LangGraph, Data-to-Paper, MAR, and Cunningham’s Referee 2:

  • Permission Registry — centralized agent registry replacing hardcoded dispatch tables. Adding a new agent is a one-file change.
  • Lifecycle Validation — PRE-dispatch and POST-completion checks with fail-fast.
  • Writer Evolution: writer-critic rebuilt with 8 categories, including voice fidelity and claim-source traceability (INV-22). The writer now has hard gates: it refuses to draft Results without tables, refuses to draft without a style guide, and pauses at section-level approval checkpoints.
  • Cold-Read Critics — all 7 critics evaluate blind (no round history). Dual-critic dispatch for gate artifacts.
  • Pipeline Checkpointing — structured JSON state (pipeline_state.json) survives sessions. Execution traces with Mermaid graphs.
  • Learning Loop — post-pipeline pattern detection (HIGH-PERF, FRICTION, ESCALATION). Learning promotion after 3+ project validation.
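
To make the first two phases concrete, the sketch below shows one way a centralized registry with a fail-fast pre-dispatch check might look. It is not the project's implementation: AgentSpec, AgentRegistry, the allowed_tools and required_inputs fields, and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    name: str
    allowed_tools: frozenset      # hypothetical permission field
    required_inputs: tuple = ()   # hypothetical pre-dispatch requirements


class AgentRegistry:
    """Single place where agents are declared, instead of dispatch tables."""

    def __init__(self):
        self._agents = {}

    def register(self, spec: AgentSpec) -> None:
        if spec.name in self._agents:
            raise ValueError(f"agent already registered: {spec.name}")
        self._agents[spec.name] = spec

    def pre_dispatch(self, name: str, tool: str, available_inputs: set) -> AgentSpec:
        """Fail fast before the agent runs; return its spec when all checks pass."""
        spec = self._agents.get(name)
        if spec is None:
            raise LookupError(f"unknown agent: {name}")
        if tool not in spec.allowed_tools:
            raise PermissionError(f"{name} may not use {tool}")
        missing = [item for item in spec.required_inputs if item not in available_inputs]
        if missing:
            raise ValueError(f"{name} is missing inputs: {missing}")
        return spec


registry = AgentRegistry()
registry.register(
    AgentSpec("writer", frozenset({"Read", "Edit"}), required_inputs=("style_guide",))
)
spec = registry.pre_dispatch("writer", "Edit", {"style_guide"})
```

Because every check raises before any work starts, a misconfigured dispatch stops at the boundary instead of partway through a run.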

Skill-Centric Restructure

Skills are now rich directories. Each skill folder contains templates, gotchas, references, and config. Agents slimmed to identity + voice + constraints only.

  • 76 new files across 13 skill folders (templates, gotchas, references, config)
  • 15 agents slimmed (total lines: 4,454 → 1,898, −57%)
  • Three-level loading: metadata (always) → SKILL.md (on trigger) → templates (on demand)
  • Gotchas — every skill documents known failure points
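
The three-level loading idea can be sketched as lazy file access. This is an illustration under assumptions: the Skill class, the metadata keys, and the templates/ subfolder layout are hypothetical; only the SKILL.md file name comes from the notes above.

```python
import tempfile
from functools import cached_property
from pathlib import Path


class Skill:
    def __init__(self, folder: Path, metadata: dict):
        self.folder = folder
        self.metadata = metadata  # level 1: always held in memory

    @cached_property
    def body(self) -> str:
        # Level 2: SKILL.md is read only the first time the skill is triggered.
        return (self.folder / "SKILL.md").read_text(encoding="utf-8")

    def template(self, name: str) -> str:
        # Level 3: a single template is read only when it is asked for.
        return (self.folder / "templates" / name).read_text(encoding="utf-8")


# Build a throwaway skill folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "review"
    (folder / "templates").mkdir(parents=True)
    (folder / "SKILL.md").write_text("# Review skill\n", encoding="utf-8")
    (folder / "templates" / "report.md").write_text("## Findings\n", encoding="utf-8")

    skill = Skill(folder, {"name": "review", "trigger": "/review"})
    loaded_before = "body" in vars(skill)  # nothing has been read from disk yet
    body = skill.body
    loaded_after = "body" in vars(skill)   # cached after the first access
    report = skill.template("report.md")
```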

Session-Scoped Guards

  • /freeze [dirs] — blocks edits outside specified directories
  • /careful — blocks destructive bash commands
  • Backed by session-guard.py PreToolUse hook; zero overhead when inactive
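
The core decision behind /freeze can be sketched as a path-containment check. This is a hypothetical helper, not the contents of session-guard.py; how the hook receives the tool input and reports a block is not covered here.

```python
from pathlib import Path


def is_edit_allowed(target: str, frozen_dirs: list, root: str = ".") -> bool:
    """Return True when target lies inside one of the allowed directories.

    An empty list means no freeze is active, so every edit passes.
    """
    if not frozen_dirs:
        return True
    base = Path(root).resolve()
    resolved = (base / target).resolve()  # also collapses any ".." segments
    for directory in frozen_dirs:
        allowed = (base / directory).resolve()
        if resolved == allowed or allowed in resolved.parents:
            return True
    return False


print(is_edit_allowed("paper/main.tex", ["paper"]))
print(is_edit_allowed("paper/../scripts/run.py", ["paper"]))
```

Resolving both paths before comparing them means a target such as paper/../scripts/run.py cannot slip past the check by string prefix alone.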

Quarto-First Talks

/talk create defaults to Quarto RevealJS. Beamer via --beamer.

Guide Site

9 pages rewritten with guide-writer voice (bold openings, problem-solution arcs, progressive disclosure). Thariqs aesthetic (ivory/clay/serif). agents.qmd compressed 603 → 205 lines. Skill-folder architecture documented. Nav reordered for user journey.

Also

  • CalVer adoption (YY.MM)
  • Rewind strategy in workflow.md
  • pre-compact.py exit code fix
  • literature report: Zotero-like filterable bibliography (category, proximity, method filters, sort, search, copy-cite)

Shared design system in templates/html/base/ (styles.css + components.js). Dark mode, print support, collapsible sections, filter engine. All skills auto-generate HTML after saving markdown reports.

Guide Site Overhaul

Migrated from cyberpunk neon (dark, scanlines, pink/cyan glow) to the thariqs aesthetic (ivory background, clay accents, serif headings). All mermaid diagrams updated to match. Readability pass across all pages — trimmed agent descriptions, broke up dense paragraphs, removed duplicate content.


26.04.3 — Theorist Pair, Personal Style Guide, Checkpoint

Released 2026-04-17

Three feature additions inspired by parallel work in the Claude-Code-for-economists space and by a theorist pair developed in the bad-controls project.

Theorist + theorist-critic pair

A first-class pair for formal theory sections — assumptions, definitions, lemmas, theorems, and proofs calibrated to top methods journals (Econometrica, Journal of Econometrics, Quantitative Economics, Annals of Statistics).

The theorist drafts identification results, consistency, asymptotic normality, influence functions, DML, bootstrap validity, test properties, and comparative-static propositions. Paper-type aware — activates for econometric methods, theory+empirics, structural identification, and methodological reduced-form papers.

The theorist-critic reviews through 4 sequential phases with early-stop on critical gaps: triage, proof validity (logical, measurability, expansions, identification, asymptotic distribution), assumption minimality + statement calibration, citations + linkage + polish.

Field-specific anchors live in a new Theoretical Foundational References table in domain-profile.md; broad defaults cover DiD, IV, RDD, DML, semiparametric efficiency, GMM, and bootstrap. A Paper Author Team table lets the critic avoid lecturing authors on results they themselves wrote.

Scoring: 20% weight in the aggregate when a theory section is present, renormalized otherwise. Invoke via /strategize theory [target] or audit via /review --theory [target].
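
A worked example may help with the renormalization rule. Only the 20% theory weight comes from the text above; the other component names and weights below are invented for illustration, as is the aggregate function.

```python
# Hypothetical weights: only the 0.20 for theory is taken from the release notes.
WEIGHTS = {"theory": 0.20, "strategy": 0.30, "code": 0.25, "writing": 0.25}


def aggregate(scores: dict) -> float:
    """Weighted mean over the components present, renormalizing their weights."""
    if not scores:
        raise ValueError("no component scores given")
    weights = {name: WEIGHTS[name] for name in scores}
    total_weight = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total_weight


with_theory = aggregate({"theory": 80, "strategy": 90, "code": 70, "writing": 85})
without_theory = aggregate({"strategy": 90, "code": 70, "writing": 85})
print(with_theory, without_theory)
```

With a theory section the weights already sum to 1, so the score is 16 + 27 + 17.5 + 21.25 = 81.75. Without it, the remaining weights sum to 0.80 and the score is 65.75 / 0.80 = 82.1875.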

Personal style guide

A /write style-guide [paper-dir] mode that extracts the user’s writing voice from their prior papers, then feeds it back to the writer on every subsequent invocation.

Strategic sampling across intros, section openings, abstracts, conclusions, and results paragraphs. Quantitative patterns (sentence-length distribution, passive/active ratio, em dash rate) and qualitative patterns (paragraph openings, section openings, lexicon used and avoided, hedging, comparison style, citation split, tone markers). Self-citation check surfaces author self-citations missing from Bibliography_base.bib.
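
Two of the quantitative patterns, sentence-length distribution and em dash rate, can be approximated with the standard library alone. The sketch below is a rough illustration, not the skill's extraction logic: the sentence splitter is naive, and the function name and output keys are hypothetical.

```python
import re
import statistics


def style_metrics(text: str) -> dict:
    """Naive sentence-length and em dash statistics for a block of prose."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(re.findall(r"[A-Za-z0-9']+", s)) for s in sentences]
    words = sum(lengths)
    if words == 0:
        raise ValueError("text contains no words")
    return {
        "sentences": len(sentences),
        "mean_length": statistics.mean(lengths),
        "median_length": statistics.median(lengths),
        # "\u2014" is the em dash character.
        "em_dashes_per_100_words": 100 * text.count("\u2014") / words,
    }


sample = "We study wages. The effect is large \u2014 and robust. It persists."
metrics = style_metrics(sample)
print(metrics)
```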

Writes to .claude/references/personal-style-guide.md with quoted examples from the corpus — never invents patterns. The writer auto-loads it if real content is present; voice guide overrides generic academic defaults but never INV-1..21.

Compaction discipline + /checkpoint skill

Codifies compaction hygiene and adds a project-level session-handoff skill.

Compaction discipline (workflow.md Section 5): manual /compact at natural stopping points over auto-compression; 5–10 turn focused sessions; /checkpoint before /compact or session end; Session Recovery starts at Step 0 — read recent checkpoint artifacts before the plan.

/checkpoint — scaffold-friendly port. Core (always on, fork-friendly): auto-memory updates, SESSION_REPORT.md append, research_journal.md append, git-state snapshot. Obsidian integration is gated: activates only when .claude/state/obsidian-config.md exists AND Obsidian MCP is connected. Fork users get the template; user-specific paths stay local via .gitignore.


26.04.2 — Modern LaTeX Stack

Released 2026-04-11

Modernizes the LaTeX infrastructure with Overleaf-compatible upgrades.

latexmk Build System

paper/latexmkrc configures XeLaTeX, TEXINPUTS, and BIBINPUTS. One command replaces the manual 4-command build: cd paper && latexmk main.tex. No dot prefix — Overleaf reads it automatically. Updated across CLAUDE.md, verifier, /tools compile, and /talk compile.

tabularray

Modern table engine with key-value interface. Hand-written tables use tblr/talltblr (captions, notes, and rules in one declarative block). R/Python/Julia output continues exporting bare tabular wrapped with threeparttable. Both approaches documented with examples.

cleveref

\usepackage[nameinlink]{cleveref} loaded after hyperref. \cref{fig:x} auto-generates “Figure 1” — eliminates Figure~\ref{} boilerplate. Writer-critic deducts for missing cleveref and manual ref patterns.

microtype Promotion

Promoted from Recommended to Required. Writer-critic now applies a -2 deduction for missing microtype.

GitHub Actions CI

.github/workflows/compile-paper.yml compiles the paper on push/PR when paper/** files change. Skips gracefully in template repos (no main.tex). Uploads compiled PDF as artifact.


26.04.1 — Enforcement Layer

Released 2026-04-09

Adds mechanical enforcement, structured pre-flight reporting, and decision traceability across the pipeline.

Content Invariants

21 numbered rules (INV-1 through INV-21) in .claude/rules/content-invariants.md. Non-negotiable standards for paper, code, and talks — critics now cite violations by invariant number. Covers table format, figure notes, notation consistency, anti-hedging, reproducibility requirements, and more.

Pre-Flight Reports

Two new structured reports that agents produce before doing work:

  • Pre-Strategy Report (Strategist): documents paper type classification, available data, and candidate designs before proposing a strategy
  • Pre-Code Report (Coder): documents naming map, script structure, and numerical guards before writing code

Grep-Based Linter

/tools lint [file|dir] runs mechanical checks on R, Python, and Julia scripts — prohibited patterns, style violations, reproducibility issues. /review --code now runs lint as Step 1 before dispatching coder-critic. Fast, deterministic, no LLM calls.
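
A grep-style linter of this kind reduces to running a list of regular expressions over each line. The sketch below shows the general shape only; the three rules and their identifiers are made up for illustration and are not the project's rule set.

```python
import re

# Hypothetical rules: (rule id, pattern, message).
RULES = [
    ("setwd", re.compile(r"\bsetwd\("), "avoid setwd(); use project-relative paths"),
    (
        "abs-path",
        re.compile(r"""["'](?:/Users/|/home/|[A-Za-z]:[\\/])"""),
        "absolute path hurts reproducibility",
    ),
    ("rm-ls", re.compile(r"rm\(\s*list\s*=\s*ls\(\)\s*\)"), "do not clear the workspace"),
]


def lint(source: str) -> list:
    """Return (line number, rule id, message) for every rule hit, in file order."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule_id, pattern, message in RULES:
            if pattern.search(line):
                findings.append((lineno, rule_id, message))
    return findings


sample = 'setwd("/Users/me/project")\nx <- 1\nrm(list = ls())\n'
findings = lint(sample)
for lineno, rule_id, message in findings:
    print(f"line {lineno}: [{rule_id}] {message}")
```

Because the checks are plain pattern matches, the same input always yields the same findings, which is what makes this step cheap enough to run before any critic is dispatched.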

Decision Records

ADR-style records at discovery and strategy stages. Template: templates/decision-record.md. Saved to quality_reports/decisions/. Each record captures the decision, alternatives considered, rationale, and status.

PostToolUse Lint Hook

Auto-lints R, Python, and Julia files on every Edit or Write operation. Advisory only — reports issues but does not block. Catches problems at write time instead of review time.


26.04 — Paper-Type Architecture

Released 2026-04-08

Every agent now knows whether it’s working on a reduced-form, structural, theory+empirics, or descriptive paper — and adapts accordingly.

7 Agents Rewritten

| Agent | Before | After |
|---|---|---|
| Writer | Humanizer-first, one template | Paragraph-level argument moves, 4 paper-type section templates |
| Writer-critic | 6 checks, format-focused | 8 checks: structure, coherence, design-specific completeness |
| Strategist | Reduced-form only | 4 paper types: reduced-form, structural, theory+empirics, descriptive |
| Strategist-critic | DiD/IV/RDD checklists | + structural model checks, prediction sharpness, construct validity |
| Coder | Basic script standards | Engineering discipline: naming maps, numerical guards, function-per-file |
| Coder-critic | 12 checks | 16 checks: + numerical discipline, prohibited patterns |
| Methods-referee | Reduced-form evaluation only | Paper-type-specific dimensions and scoring rubrics |

Numerical Discipline

New in the Coder and Coder-critic, derived from C++ Core Guidelines:

  • Float comparison guards (no == on floats)
  • CDF clamping to [0, 1], inverse link protection
  • Integer literals (1L, seq_len(n))
  • Pre-allocation (no growing lists in loops)
  • Bootstrap/parallel patterns with proper seed handling
  • Language-specific coding standards: .claude/references/coding-standards-{r,python,julia}.md
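
Several of the guards above translate directly into Python. The following is a sketch of the ideas, not an excerpt from the coding standards files; all function names are hypothetical.

```python
import math
import random


def floats_equal(a: float, b: float, rel_tol: float = 1e-9, abs_tol: float = 1e-12) -> bool:
    """Tolerance-based comparison to use instead of == on floats."""
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)


def clamp_probability(p: float) -> float:
    """Force a computed CDF value back into [0, 1] after rounding error."""
    return min(1.0, max(0.0, p))


def safe_logistic(x: float) -> float:
    """Inverse logit that never calls exp() on a large positive argument."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bootstrap_means(data: list, n_draws: int, seed: int) -> list:
    """Bootstrap sample means with a local, explicitly seeded generator."""
    rng = random.Random(seed)  # local generator, no global state
    means = [0.0] * n_draws    # pre-allocated instead of grown in the loop
    n = len(data)
    for i in range(n_draws):
        sample = [data[rng.randrange(n)] for _ in range(n)]
        means[i] = sum(sample) / n
    return means


print(0.1 + 0.2 == 0.3, floats_equal(0.1 + 0.2, 0.3))
```

The naive form 1 / (1 + math.exp(-x)) raises OverflowError for very negative x, which is the kind of failure the inverse link protection is meant to prevent.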

Scope: Back to Economics

The 26.03.4 release claimed to serve “all empirical social science.” That was too broad — the pipeline’s agents, rules, and section templates are built for economics.

26.04 is honest about this: built for economics, adaptable to adjacent fields. Finance, accounting, marketing, and management researchers can customize the domain profile and journal profiles. The 30 journal profiles across these fields are retained as useful reference data.

Profile-Aware Table Standards

Significance stars are now journal-dependent:

  • Working papers (default): Stars OK
  • AEA journals (AER, AEJ:Applied, AEJ:Policy, AER:Insights): No stars, per current AEA style guide
  • All others: Stars acceptable

Journal profiles now include a Table format field for overrides.
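
The star policy above amounts to a small lookup with an optional per-profile override. The sketch below assumes an override that takes the values "stars" or "no-stars"; the actual contents of the Table format field are not specified here, and the function name is hypothetical.

```python
# Journal abbreviations as listed in the release notes.
AEA_JOURNALS = {"AER", "AEJ:Applied", "AEJ:Policy", "AER:Insights"}


def stars_allowed(journal=None, table_format=None) -> bool:
    """Decide whether significance stars may appear in tables.

    journal=None stands for a working paper. table_format is a hypothetical
    override value ("stars" or "no-stars") taken from a journal profile.
    """
    if table_format is not None:
        return table_format == "stars"
    if journal is None:
        return True  # working-paper default: stars OK
    return journal not in AEA_JOURNALS


print(stars_allowed(), stars_allowed("AER"), stars_allowed("QJE"))
```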

Compilation Alignment

  • Fixed bibtex → biber mismatch across CLAUDE.md and tools/SKILL.md
  • Fixed remaining uppercase paths (Paper → paper, Talks → paper/talks)

Dead Code Removal

  • Deleted scripts/quality_score.py (761 lines targeting nonexistent directories)
  • Deleted scripts/sync_to_docs.sh (92 lines targeting nonexistent deployment structure)

README Honesty

  • “turns your terminal into a research assistant” → “scaffold for empirical economics research”
  • “works autonomously” → removed
  • “Realistic Peer Review” → “Simulated Peer Review”
  • “calibrated referee pools” → “configured referee pools”
  • Added Limitations section

26.03.6 — 2026-03-24

Output organization setting (by-script default). Bug fixes for Paper/ → paper/ path migration, domain-profile path, plan mode in /new-project.


26.03.5 — 2026-03-23

Skill detail restoration: 22 items lost in the 26.03 to 26.03.4 consolidation were restored while keeping the 26.03.4 structure.


26.03.4 — 2026-03-20

Major architecture redesign. Peer review simulation with Editor + dispositions + desk reject. 30 journal profiles. Demand-loaded references. Skill consolidation (26 → 10 commands). Scope clarification to economics.