Meet the Agents

20 Specialists That Do the Work

How Agents Work

When you ask Claude to analyze data or draft a paper, it doesn’t do everything itself. It dispatches specialized agents — think of them as 20 different research assistants, each with a specific skill.

The key pattern: every agent that creates something (a literature review, analysis code, paper text) has a paired critic that checks its work. The critic can read but never edit — so it has no reason to go easy. If the critic finds problems, the worker fixes them. If they can’t agree after 3 rounds, you get asked to decide.

This is what prevents Claude from saying “looks good” about its own work.
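In pseudocode, the loop looks roughly like this. A minimal sketch: the worker/critic interfaces and the Report type are illustrative, not the system’s actual API; only the read-only critic, the revise cycle, and the 3-round escalation come from the description above.

```python
from dataclasses import dataclass, field

@dataclass
class Report:
    approved: bool
    objections: list[str] = field(default_factory=list)

MAX_ROUNDS = 3  # after three rounds without approval, you are asked to decide

def worker_critic_loop(worker, critic, task):
    """The create-review-revise cycle described above."""
    artifact = worker.draft(task)
    for _ in range(MAX_ROUNDS):
        report = critic.review(artifact)  # the critic reads but never edits
        if report.approved:
            return artifact
        artifact = worker.revise(artifact, report)  # the worker fixes the objections
    raise RuntimeError("No agreement after 3 rounds: escalating to you")
```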

Note: What to Expect

The critics catch structural problems reliably — wrong clustering, missing citations, broken LaTeX, design-specific gaps. The creators produce solid first drafts that follow economics conventions. You bring the field knowledge and editorial judgment. Notes on each agent’s strengths and limits are below.


The Researchers (Discovery Phase)

These agents help you find the literature and data you need before designing your empirical strategy.

Librarian

Searches top-5 journals, NBER working papers, SSRN, and RePEc for papers related to your topic. Produces an annotated bibliography with BibTeX entries, maps where the research frontier is, and identifies where your paper fits in the conversation. Calibrated to your field via the domain profile.

librarian-critic

Reads the bibliography the Librarian produced and looks for blind spots. Are there important recent papers missing? Is the review biased toward one methodology or subfield? Does it cover the right adjacent literatures? Scores the bibliography and sends it back for fixes if needed.

Explorer

Searches for public, administrative, and survey datasets that could answer your research question. Evaluates each source for variable coverage, sample size, time span, access restrictions, and how well it fits your identification strategy. Produces a ranked list with feasibility grades.

explorer-critic

Reviews the Explorer’s data assessment with a skeptical eye. Checks whether the recommended datasets actually measure what you need, whether the sample is representative enough for your question, and whether the data limitations would threaten your identification strategy.

Note: What to Expect

Good starting lists for literature and data. Verify access status and check for recent working papers yourself.


The Strategists (Strategy Phase)

These agents design and validate your empirical approach before you write a single line of code.

Strategist

Given your research question, literature, and available data, designs the empirical strategy. Paper-type aware — first classifies the paper, then applies the right strategy template:

| Paper Type | Strategy Focus |
| --- | --- |
| Reduced-form | Estimand, estimator (DiD/IV/RDD/SC), assumptions, robustness, falsification |
| Structural | Model environment, parameter identification, estimation method, counterfactuals |
| Theory+empirics | Testable predictions, mapping to data, test power |
| Descriptive | Construct definition, validation plan, decomposition |

Can also draft Pre-Analysis Plans in AEA/OSF/EGAP format (/strategize --pap).

strategist-critic

The toughest reviewer in the system. Paper-type aware — uses different checklists per type:

| Paper Type | Critic Focus |
| --- | --- |
| Reduced-form | Design-specific assumptions (parallel trends, exclusion restriction, manipulation test), sanity checks, inference, package-specific code |
| Structural | Model specification, parameter identification (data variation vs. functional form), convergence, model fit, counterfactual credibility |
| Theory+empirics | Prediction sharpness (do they rule things out?), test power, honesty about failures |
| Descriptive | Construct validity, measurement error, representativeness, decomposition assumptions |

Sends it back with specific objections until the design is sound.

Note: What to Expect

Strong on checklists and modern estimator recommendations. The institutional story behind your variation is yours to evaluate — the Strategist tells you what to check, not whether your setting is convincing.

Theorist

The methods coauthor for formal theory sections. Drafts assumptions, definitions, lemmas, theorems, and proofs at the rigor level of Econometrica, Journal of Econometrics, Quantitative Economics, and Annals of Statistics. Paper-type aware — activates primarily for:

| Paper Type | When Theorist Contributes |
| --- | --- |
| Econometric methods | Identification + asymptotic theory + inference validity |
| Theory+empirics | Model, propositions, comparative statics, mapping to data |
| Structural | Identification of structural parameters + estimation theory |
| Methodological reduced-form | Identification + asymptotic distribution of proposed estimator |

Covers identification results, consistency, asymptotic normality, non-standard rates, influence functions and efficiency bounds, uniform validity, double/debiased ML, bootstrap validity, test properties, and signed comparative-static propositions. Field-specific citation anchors come from domain-profile.md; broad defaults cover DiD, IV, RDD, DML, semiparametric efficiency, GMM, and bootstrap. Invoke via /strategize theory [target].
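For a sense of the register, here is a hypothetical statement the Theorist might draft. The estimator, assumption labels, and notation are invented for illustration, not drawn from any real paper:

```latex
% Hypothetical illustration only; assumes amsmath/amsthm and a theorem environment.
\begin{theorem}[Asymptotic normality]
Under Assumptions 1--3, the estimator $\hat{\theta}_n$ satisfies
\[
  \sqrt{n}\bigl(\hat{\theta}_n - \theta_0\bigr)
    \xrightarrow{d} \mathcal{N}(0, V),
  \qquad V = \mathbb{E}\bigl[\psi(W_i)\,\psi(W_i)'\bigr],
\]
where $\psi$ is the influence function of $\hat{\theta}_n$.
\end{theorem}
```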

theorist-critic

A top methods-journal referee. Reviews through 4 sequential phases with early-stop on critical gaps — the proof-validity phase gates everything downstream, so a broken proof doesn’t get buried under citation nits:

| Phase | Focus |
| --- | --- |
| 1. Claim identification | Object type, target parameter, estimator, assumption list, paper-type fit |
| 2. Proof validity | Logical validity, measurability/integrability, expansions + remainders, design-specific identification checks, asymptotic distribution |
| 3. Assumptions + statements | Minimality (each assumption used), primitiveness, over/under-claim, pointwise vs. uniform, notation consistency (INV-7) |
| 4. Citations + linkage + polish | Citation fidelity, orphan theorems, linkage to empirical claims, exposition |

Deduction rubric: -20 per logic gap, -25 per circular argument, -15 per overclaim, -10 per unjustified limit interchange, -5 per never-used assumption. Score ≥ 80 to approve.
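As arithmetic, the rubric is a weighted deduction from a perfect score. A minimal sketch: the weights and the 80-point bar come from the rubric above, while the category keys and the start-at-100 convention (mirroring the coder-critic’s score breakdown) are assumptions for illustration:

```python
PENALTIES = {
    "logic_gap": 20,
    "circular_argument": 25,
    "overclaim": 15,
    "unjustified_limit_interchange": 10,
    "unused_assumption": 5,
}

def rubric_score(issues: dict[str, int]) -> tuple[int, bool]:
    """Return (score, approved) from per-category issue counts."""
    score = 100 - sum(PENALTIES[kind] * count for kind, count in issues.items())
    return score, score >= 80

# One logic gap plus two never-used assumptions: 100 - 20 - 10 = 70, rejected.
print(rubric_score({"logic_gap": 1, "unused_assumption": 2}))  # (70, False)
```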

Authorship-aware calibration: reads the Paper Author Team table in domain-profile.md; if the paper’s authors are themselves among the reference literature on the topic, the critic avoids lecturing them on their own contributions. Audit via /review --theory [target].


The Builders (Execution Phase)

These agents produce the actual analysis code, data pipelines, and paper manuscript.

Data-engineer

Handles the unglamorous but critical work: data cleaning, variable construction, panel building, merges, and missing data. Also creates publication-quality figures (ggplot2 themes, multi-panel layouts, accessibility-conscious colors). Paired with coder-critic for review.

Coder

Translates the strategy memo into working analysis scripts. Paper-type aware:

| Paper Type | Implementation |
| --- | --- |
| Reduced-form | DiD/IV/RDD with design-specific packages |
| Structural | MLE/GMM/SMM/BLP with convergence diagnostics and counterfactual simulation |
| Theory+empirics | Prediction-by-prediction tests |
| Descriptive | Construction, validation, decomposition |

Enforces engineering discipline: paper-to-code naming maps, numbered script structure with master runner, function-per-file, numerical guards (float comparison, CDF clamping, inverse link protection), and bootstrap/parallel patterns. Supports R, Python, and Julia.
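To make the numerical guards concrete, here is a minimal sketch of the three named above. The tolerance values and function names are illustrative choices, not the system’s conventions:

```python
import math

EPS = 1e-8  # illustrative tolerance

def floats_equal(a: float, b: float, tol: float = EPS) -> bool:
    """Guarded float comparison instead of a bare == (which costs points in review)."""
    return math.isclose(a, b, rel_tol=tol, abs_tol=tol)

def clamp_cdf(p: float, eps: float = 1e-12) -> float:
    """CDF clamping: keep probabilities strictly inside (0, 1)."""
    return min(max(p, eps), 1.0 - eps)

def safe_logit(p: float) -> float:
    """Inverse-link protection: clamp before the log-odds transform."""
    p = clamp_cdf(p)
    return math.log(p / (1.0 - p))
```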

coder-critic

Reviews all code from both the Data-engineer and the Coder. When invoked via /review --code, a mechanical grep-based linter runs first (Step 1) to catch prohibited patterns (setwd(), sapply(), <<-, T/F, hardcoded paths) and style violations before the full review. The full review runs 16 check categories, including strategy alignment, numerical discipline (float guards, CDF clamping, pre-allocation, NaN/Inf checks), the paper-to-code naming map, function design, reproducibility, and paper-type-specific checks (convergence diagnostics for structural papers, seed handling for simulations).
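A minimal sketch of what that Step-1 linter could look like, scanning an R script for the prohibited patterns listed above. Only the pattern list comes from the system; the regexes, file handling, and report format are illustrative:

```python
import re
from pathlib import Path

PROHIBITED = {
    r"\bsetwd\s*\(": "setwd() call",
    r"\bsapply\s*\(": "sapply() call",
    r"<<-": "global assignment (<<-)",
    r"(?<![\w.])[TF](?![\w.])": "T/F abbreviation for TRUE/FALSE",
    r"[\"'](?:[A-Za-z]:[\\/]|/home/|/Users/)": "hardcoded absolute path",
}

def lint_script(path: Path) -> list[str]:
    """Flag prohibited patterns in one R script, grep-style."""
    hits = []
    for lineno, line in enumerate(path.read_text().splitlines(), start=1):
        for pattern, label in PROHIBITED.items():
            if re.search(pattern, line):
                hits.append(f"{path.name}:{lineno}: {label}")
    return hits
```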

Score Breakdown Example

- Starting: 100
- Code-strategy alignment: OK
- Numerical discipline: -10 (float comparison with == on line 47)
- Reproducibility: -10 (no set.seed() found)
- Console hygiene: -3 (print() on lines 23, 89)
- **Final: 77/100** (below the commit threshold of 80). Fix numerical discipline and reproducibility.

Writer

Drafts paper sections using paragraph-level argument moves — each paragraph has one job (motivation, result statement, mechanism, robustness narration, qualification). Paper-type aware with full section templates:

| Paper Type | Section Templates |
| --- | --- |
| Reduced-form | DiD parallel trends, IV exclusion restriction, RDD manipulation |
| Structural | Model environment, estimation, counterfactual simulations, welfare |
| Theory+empirics | Model predictions, empirical tests, honest assessment |
| Descriptive | Data construction, validation, key facts |

Results narration adapts to output type (regression table, event study figure, IV first stage + 2SLS, counterfactual simulation). Cleanup pass strips AI writing patterns after drafting.

writer-critic

Runs 8 check categories, including: argument structure (each paragraph has an identifiable purpose, findings lead sentences), claims-evidence alignment, paper-type coherence (does the introduction's promise match what the strategy delivers?), design-specific completeness (DiD must discuss parallel trends, IV must interpret the LATE, structural must both estimate and simulate), results-narration quality, notation consistency, and LaTeX compilation. Sends drafts back for fixes.

Note: What to Expect

The Coder produces well-structured, reproducible scripts. The Writer produces structured first drafts using argument moves — your voice and field knowledge turn them into a paper. Both critics catch mechanical issues reliably; you bring the substance.


The Reviewers (Peer Review Phase)

These agents simulate what happens when you submit to a journal. An Editor does desk review, then two independent referees review your paper — neither sees the other’s report. Invoked via /review --peer [journal].

Editor

The desk reviewer. Screens the paper before sending it to referees: checks scope fit, minimum quality, and assigns each referee a disposition (Structuralist, Credibility, Measurement, Policy, Theory, or Skeptic) plus pet peeves (1 critical + 1 constructive per referee) that shape their review personality. After referee reports return, the Editor synthesizes the editorial decision, classifying each issue as FATAL (rejection-worthy), ADDRESSABLE (fixable in revision), or TASTE (referee preference, not binding).

domain-referee

The subject expert. Evaluates whether your paper makes a real contribution to the literature, whether you’ve positioned it correctly relative to existing work, whether your substantive arguments hold up, and whether your results generalize beyond the specific setting. Calibrated to your research field via the domain profile. When you specify a target journal (/review --peer JHR), it emulates that journal’s review culture using journal profiles. Every major comment includes a “what would change my mind” statement.

methods-referee

The empirical methods expert. Paper-type aware with separate evaluation dimensions per type:

| Paper Type | Evaluation Dimensions |
| --- | --- |
| Reduced-form | Identification strategy, estimation, inference, robustness, replication |
| Structural | Model specification, parameter identification, estimation and computation, model fit, counterfactual credibility |
| Theory+empirics | Model quality, prediction sharpness, test design and power, honesty of assessment |
| Descriptive | Construct validity, construction replicability, validation, analysis quality |

Calibrates to the target journal when specified. Like the domain-referee, includes a “what would change my mind” statement on every major comment.

R&R memory: When invoked with --peer --r2, referees remember their prior reports and check whether their concerns were addressed. Max 3 rounds.

Note: What to Expect

Catches real structural problems and produces actionable revision plans. The disposition system creates useful reviewer variation. Not a substitute for actual field referees — use it for pre-submission stress testing.


The Infrastructure

These agents coordinate and verify — they don’t produce research content.

Verifier

Runs in two modes. Standard mode (4 checks): does the paper compile? Do the scripts run? Are output files fresh? Are file references valid? Submission mode (10 checks): adds a full AEA replication package audit — numbered script order, README completeness, data provenance, and dependency documentation. Pass/fail only.

Orchestrator

The general contractor. Manages the entire pipeline: dispatches worker-critic pairs based on what phase you’re in, enforces quality gates (80 to commit, 90 for PRs, 95 for submission), escalates when pairs can’t converge, synthesizes referee reports into editorial decisions, and recommends target journals at submission time. You rarely interact with it directly — it works behind the scenes when you run /new-project or when Claude plans a complex task.
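The gates themselves reduce to a threshold lookup. A minimal sketch; the thresholds come from the docs, while the action names are illustrative:

```python
GATES = {"commit": 80, "pr": 90, "submission": 95}

def passes_gate(score: int, action: str) -> bool:
    """True if an artifact's critic score clears the gate for this action."""
    return score >= GATES[action]

print(passes_gate(77, "commit"))  # False: the 77/100 example above is blocked
print(passes_gate(92, "pr"))      # True
```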

Note: What to Expect

The Verifier checks compilation and file integrity. The Orchestrator dispatches agents and enforces quality gates. Scores track issue resolution — use them directionally, not as absolute quality measures. For non-standard projects, invoke agents directly.


Quick Reference

All 20 agents at a glance:

| Agent | File | What It Does |
| --- | --- | --- |
| Librarian | librarian.md | Literature collection, bibliography, frontier mapping |
| librarian-critic | librarian-critic.md | Reviews bibliography for coverage, gaps, recency |
| Explorer | explorer.md | Data source discovery, feasibility scoring |
| explorer-critic | explorer-critic.md | Reviews data for measurement validity, ID fit |
| Strategist | strategist.md | Empirical strategy (4 paper types), Pre-Analysis Plans |
| strategist-critic | strategist-critic.md | Validates assumptions, inference, robustness (4 paper types) |
| Theorist | theorist.md | Formal theory: assumptions, theorems, proofs (paper-type aware) |
| theorist-critic | theorist-critic.md | 4-phase proof review with early-stop on critical gaps |
| Data-engineer | data-engineer.md | Data cleaning, wrangling, publication figures |
| Coder | coder.md | Analysis implementation (4 paper types, numerical discipline) |
| coder-critic | coder-critic.md | 16 checks: code quality, numerical discipline, strategy alignment |
| Writer | writer.md | Argument-move drafting (4 paper types, design-specific) |
| writer-critic | writer-critic.md | 8 checks: structure, coherence, design completeness, LaTeX |
| Editor | editor.md | Desk review, referee dispatch, editorial decision |
| domain-referee | domain-referee.md | Blind peer review (subject expertise) |
| methods-referee | methods-referee.md | Blind peer review (methods, 4 paper types) |
| Storyteller | storyteller.md | Beamer + Quarto talk creation (4 formats) |
| storyteller-critic | storyteller-critic.md | Talk narrative, visuals, content fidelity |
| Verifier | verifier.md | Compilation + AEA replication audit |
| Orchestrator | orchestrator.md | Agent dispatch, quality gates, editorial synthesis |

Multi-Model Strategy

Not all agents need the same model. Each agent file has a model: field that controls which Claude model it uses:

| Task Type | Recommended Model | Why |
| --- | --- | --- |
| Deep reasoning (strategy, peer review) | model: opus | Needs nuanced judgment |
| Fast, constrained work (code review, compilation) | model: sonnet | Speed matters more |
| Default | model: inherit | Uses whatever model your session is running |

This saves cost and time — a Verifier checking if LaTeX compiles doesn’t need the most powerful model. A strategist-critic stress-testing your identification design does.


Talks (Presentation Phase)

These agents create and review Beamer presentations derived from your paper.

Storyteller

Creates presentations from paper/main.tex in 4 formats: job market talk (45–60 min, full results), seminar (30–45 min, standard format), short talk (15 min, conference session), and lightning (5 min, elevator pitch). Supports both Beamer (output to paper/talks/) and Quarto RevealJS (output to paper/quarto/ via --quarto flag). All content derives from the paper — the Storyteller never invents results.

storyteller-critic

Reviews the talk for narrative flow (does the argument build?), visual quality (readable slides, not text walls), content fidelity (does the talk match the paper?), and format appropriateness (is a 15-min talk actually 15 minutes?). Scored as advisory — talks don’t block the pipeline.

Note: What to Expect

Good narrative structure derived from the paper. Expect to cut content and add whitespace — talks need less on each slide than the Storyteller tends to produce.