[{"content":"Assistant Professor of Economics at UAB Collat School of Business. I study how large-scale disruptions — immigration shocks, environmental crises, gender dynamics — reshape labor markets, and I build AI tooling to improve empirical research.\nCurrently obsessed with difference-in-differences when time-varying covariates carry the identification.\nRecent Publications # \u0026ldquo;The Effects of the Venezuelan Refugee Crisis on the Brazilian Labor Market\u0026rdquo; (with Samyam Shrestha) Journal of Economic Geography, 2026 — paper\n\u0026ldquo;The End of Free Movement and International Migration\u0026rdquo; (with Samyam Shrestha) Economics Letters, 2025 — paper\nAll research →\nThe Clo-Author # Econ AI Research Assistant for Claude Code — a worker-critic agent scaffold for empirical economics research, from literature review through writing.\nLearn more →\n","date":"1 May 2026","externalUrl":null,"permalink":"/","section":"","summary":"","title":"","type":"page"},{"content":"I am from the beautiful city of Rio de Janeiro, Brazil. Rio has undoubtedly been a major influence on my enthusiasm for economics.\nI am also a huge fan of history, cooking, and physics. When I am not doing research, I am probably in my kitchen making an experimental dish. My favorite cuisines are Brazilian-Portuguese, Lebanese, and Italian.\nA few of my snapshots from Rio # Ipanema Beach at winter sunset Copacabana at night, Sugar Loaf in the distance Rainbow over Botafogo Bay Christ the Redeemer emerging from clouds atop Corcovado A marmoset (sagui) perched on a railing A pier on the Barra lagoon, high-rises across the water Sunset over the Pedra Branca chain — Pedra da Bruxa rising in the middle ","date":"1 May 2026","externalUrl":null,"permalink":"/about/","section":"","summary":"","title":"About","type":"page"},{"content":" ⬇ Download CV ↗ Open in new tab Job Market Paper (PDF) Your browser cannot display the embedded CV. Click here to download the PDF. ","date":"1 May 2026","externalUrl":null,"permalink":"/cv/","section":"","summary":"","title":"CV","type":"page"},{"content":" Publications # \u0026ldquo;The Effects of the Venezuelan Refugee Crisis on the Brazilian Labor Market\u0026rdquo; (with Samyam Shrestha) Journal of Economic Geography, 2026 — paper — details\n\u0026ldquo;The End of Free Movement and International Migration\u0026rdquo; (with Samyam Shrestha) Economics Letters, 2025 — paper\nWorking Papers # \u0026ldquo;Gender Differences in Comparative Advantage Matches: Evidence from Linked Employer-Employee Data\u0026rdquo; (Job Market Paper) Working Paper — details\n\u0026ldquo;Difference in Differences with Bad Controls\u0026rdquo; (with Carolina Caetano, Brantly Callaway, and Stroud Payne) Working Paper — details\n\u0026ldquo;Labor Market Effects of an Environmental Disaster: Evidence from the 2015 Mariana Dam Failure\u0026rdquo; Working Paper — details\nSelected Work in Progress # \u0026ldquo;Immigration Enforcement and Business Dynamics\u0026rdquo; (with Samyam Shrestha)\nWe analyze whether reducing the undocumented immigrant population affects local business dynamics and the entrepreneurial climate by leveraging the temporal and spatial variation in the implementation of the Secure Communities (SC) program. SC relies on data-sharing between local law enforcement agencies to identify and arrest undocumented immigrants. 
We find that SC implementation at the commuting zone level reduced the number of establishments and establishment entries, and increased establishment exits in the construction sector, along with a decrease in job creation. As expected, we find no effect on economic sectors with a traditionally low percentage of immigrant workers. Surprisingly, we also find no significant effects in the agricultural sector. We are currently testing four potential mechanisms to explain the construction-sector effects: the entrepreneurial drain effect, the chilling effect, the labor cost effect, and the consumption effect.\n\u0026ldquo;The Effects of Crime on Firm Entry and Exit: Evidence from Rio de Janeiro\u0026rdquo; (with Samyam Shrestha)\nThis paper examines the effects of crime on firm entry and exit in Rio de Janeiro, using data on the universe of firms and establishments in the city from 2007 to 2017. By spatially locating firms and merging this information with granular neighborhood-level crime data, we investigate how crime influences the local business environment. We address endogeneity through an instrumental variable approach, leveraging spatiotemporal variations in the Pacifying Police Unit program — the deployment of the Brazilian military across Rio neighborhoods in the lead-up to the 2014 World Cup and 2016 Olympics. We identify neighborhoods with persistently high crime levels that did not receive military intervention to serve as a control group. We explore heterogeneity at the level of crime type, firm size, industry, and productivity distribution.\n","date":"1 May 2026","externalUrl":null,"permalink":"/research/","section":"","summary":"","title":"Research","type":"page"},{"content":" Instructor of Record # University of Alabama at Birmingham — Collat School of Business\nManagerial Economics — graduate Principles of Microeconomics — undergraduate Intermediate Microeconomics — undergraduate Recognition # Outstanding Teaching Assistant Award — Center for Teaching and Learning, University of Georgia\nSwift Undergraduate Teaching Fellowship Award — Terry College of Business, University of Georgia\nCausal Inference: The Mixtape # Back in my master\u0026rsquo;s, I collaborated with Scott Cunningham to translate every model in Causal Inference: The Mixtape from Stata to R. 
Check it out — a new edition is coming soon.\n","date":"1 May 2026","externalUrl":null,"permalink":"/teaching/","section":"","summary":"","title":"Teaching","type":"page"},{"content":"Detail pages for papers in progress — abstracts, manuscripts, and replication code.\n","date":"1 May 2026","externalUrl":null,"permalink":"/workingpapers/","section":"Working Papers","summary":"","title":"Working Papers","type":"workingpapers"},{"content":"","date":"20 April 2026","externalUrl":null,"permalink":"/tags/ai/","section":"Tags","summary":"","title":"Ai","type":"tags"},{"content":"","date":"20 April 2026","externalUrl":null,"permalink":"/categories/","section":"Categories","summary":"","title":"Categories","type":"categories"},{"content":"","date":"20 April 2026","externalUrl":null,"permalink":"/categories/notebook/","section":"Categories","summary":"","title":"Notebook","type":"categories"},{"content":"","date":"20 April 2026","externalUrl":null,"permalink":"/posts/","section":"Posts","summary":"","title":"Posts","type":"posts"},{"content":"","date":"20 April 2026","externalUrl":null,"permalink":"/tags/research/","section":"Tags","summary":"","title":"Research","type":"tags"},{"content":"","date":"20 April 2026","externalUrl":null,"permalink":"/tags/","section":"Tags","summary":"","title":"Tags","type":"tags"},{"content":"Last post I sold you on worker-critic pairs as the elegant core of the architecture.\nBut I think the writer pair is kinda bad?\nThe diminishing returns of scientific paper writing # \u0026ldquo;The last 20% of a paper takes 80% of the time\u0026rdquo; is a phrase I heard constantly in grad school. For instance, say you start a difference-in-differences paper on immigration enforcement. The first stretch flies: you code the main spec, the point estimate goes the right way, you\u0026rsquo;re already sketching the intro in your head. Somewhere around the halfway mark of the time you\u0026rsquo;ll eventually spend, you have something that genuinely looks like 80% of a paper.\nThen you hit the flat part of the curve.\nWe spend half of the time pushing the first draft, the other half fine-tuning.\nThis is where the fine-tuning hell lives. You need to reread the introduction several times to check that it sustains the proper narrative. The mechanism section has to be tight and requires back-and-forth, especially if you have co-authors. Economists rewrite sections endlessly to find that sweet spot.\nNone of this should be new to anyone who\u0026rsquo;s written a paper in grad school or under a tenure clock. So what does it have to do with AI?\nContext rot is a thing # Context rot is something you\u0026rsquo;ve probably noticed when using AI in long sessions. Language models, once a session accumulates enough context, stop reliably using what\u0026rsquo;s in the conversation and drift back toward their generic priors.\nRoughly the shape of the degradation.\nThis has been measured — see RULER and Chroma\u0026rsquo;s context-rot evals — and the usual fix is session hygiene. Short sessions with scoped tasks. Say you want to generate the first block of code for your results section. In your current session, ask Claude to write a task document: the spec, the data, the expected output, something like a to-do. Then start a new session, point it at the document, and have it dispatch the coder-critic pair. 
You dodge context rot because Claude has no memory across sessions — it starts fresh every time, reading only the document.\nThe writer carries everyone else\u0026rsquo;s context # Session hygiene works beautifully for the librarian, the coder, the theorist. Each of them has a scoped job. The librarian queries the literature. The coder reads a spec and a test file. The theorist works a derivation that fits in one session. None of them needs to know what the others are doing to do its own job.\nThe writer can\u0026rsquo;t work that way. Writing a paper is synthesis at its core. By definition, the paper is where all the other aspects of the research are glued together. You can\u0026rsquo;t draft a paragraph of the introduction without all of it loaded. The writer needs the union of every other agent\u0026rsquo;s context.\nThe sweet spot is an interior solution # Let us go back to the production function of a paper. Assume a classic log form for quality as a function of effort:\n$$Q(e) = \\alpha \\log(1 + k \\cdot e)$$concave, with diminishing returns. Now layer in context rot. Let $R(c)$ be the probability of rot at context level $c$. Under AI-assisted work, the usable quality is the production function scaled by reliability:\n$$Q_{\\text{AI}}(e, c) = Q(e) \\cdot \\bigl(1 - R(c)\\bigr)$$Human intervention is the gap:\n$$H(e, c) = Q(e) - Q_{\\text{AI}}(e, c) = Q(e) \\cdot R(c)$$ The gap between the two curves is what human intervention has to fill in.\nWhat I learned # The coder-critic sits in a wide sweet spot — it can run on autopilot until the tests pass, precisely because the tests exist. The writer-critic sits in a narrow one and spends most of its life on the wrong end of the rot curve, precisely because no analogous test exists.\nWhat I\u0026rsquo;m doing now is letting the writer-critic run once, maybe twice, and then pulling the plug.\n","date":"20 April 2026","externalUrl":null,"permalink":"/posts/research-pipeline-not-linear/","section":"Posts","summary":"On diminishing returns, context rot, and the one worker-critic pair in Clo-Author where both curves go bad at once.","title":"The research pipeline is not linear","type":"posts"},{"content":"","date":"20 April 2026","externalUrl":null,"permalink":"/tags/writing/","section":"Tags","summary":"","title":"Writing","type":"tags"},{"content":"","date":"19 April 2026","externalUrl":null,"permalink":"/tags/claude-code/","section":"Tags","summary":"","title":"Claude-Code","type":"tags"},{"content":"I am an AI skeptic. Or I was. Or I still am, sort of — it\u0026rsquo;s complicated, and that\u0026rsquo;s actually the reason I started building Clo-Author.\nI\u0026rsquo;ve been at it for months. This post (and the ones that follow) is me writing down what I\u0026rsquo;ve found. Less a tutorial, more a notebook: what I tried, what broke, what surprised me, what I changed my mind about.\nWhat Clo-Author is # A Claude Code scaffold for empirical economics research. It started as a fork of Pedro Sant\u0026rsquo;Anna\u0026rsquo;s claude-code-my-workflow, which I reoriented from lecture production to research papers. My idea was to become the principal investigator of a team of virtual research assistants.\nThe novelty in the architecture is actually pretty simple: worker-critic pairs. Every phase in the research pipeline has one: the worker creates and cannot audit; the critic audits and cannot edit. 
One pair for literature review, another for data search, etc.\nPicture Karl Popper\u0026rsquo;s epistemology, raised by the Separatist Army.\nEvery creator gets shadowed by a critic that exists only to attack it. The librarian writes a literature review; the librarian-critic hunts for missed papers and weak framings. The coder ships a script; the coder-critic stress-tests it. Nothing gets through without surviving a refutation attempt.\nLooks absurd from outside — half the army exists to fight the other half. But I think it\u0026rsquo;s the only setup that keeps a creator agent honest. Left alone, no matter how good the prompt, they all start believing their own drafts.\nThe scoring side has quality gates so a paper can\u0026rsquo;t ship with one weak section masked by strong others. There\u0026rsquo;s also a simulated peer review pipeline (/review --peer [journal]) that runs an editor desk review, assigns two referees with intellectual dispositions weighted by the journal\u0026rsquo;s culture, and produces an editorial decision with FATAL / ADDRESSABLE / TASTE classifications.\nVery cool. Does it work?\nWhy I\u0026rsquo;m writing this series # Two reasons.\nBefore I get to those — a detour.\nMost AI users, and most non-users, still think AI is just ChatGPT. Heck, even people using Claude Code are still debating whether \u0026ldquo;it p-hacks or not.\u0026rdquo; Lols. Let me humbly attempt to convince you that those discussions are somewhat pointless — all the same critiques apply in a world without AI. Yeah, the AI will p-hack. So will your Ph.D. student. Yeah, the AI will hallucinate. Yes — just like an RA reads a paper and misinterprets it completely. At the end of the day, it\u0026rsquo;s your judgment call, not the machine\u0026rsquo;s. It\u0026rsquo;s like blaming a Canon DSLR for taking bad photos.\nAnyway. The reasons.\nFirst, every time I extend the scaffold, I learn something about how Claude Code behaves under sustained, structured workflows that I didn\u0026rsquo;t expect from one-shot prompting. Compaction discipline, demand-loading reference files, the difference between an agent that drafts and an agent that scores — these are real design choices with real tradeoffs, and I keep rediscovering the same lessons because I never wrote them down.\nSecond, I think more economists should be building infrastructure like this for themselves, and I\u0026rsquo;d rather show the rough edges than pretend it\u0026rsquo;s polished. The README sells the architecture. These posts will document the messy parts: the agent that hallucinated a citation, the rubric that was too lenient, the refactor that broke three workflows, the moment I realized the writer was just paraphrasing the theorist instead of writing.\nIf you want to follow along, the repo is here and the live guide is at hsantanna.org/clo-author. Feedback welcome — open an issue or email me. 
Especially if you\u0026rsquo;re an economist tinkering with something similar.\n","date":"19 April 2026","externalUrl":null,"permalink":"/posts/notes-from-building-clo-author/","section":"Posts","summary":"Kicking off a series of notes on building Clo-Author.","title":"Notes from building Clo-Author","type":"posts"},{"content":"","date":"19 April 2026","externalUrl":null,"permalink":"/tags/workflow/","section":"Tags","summary":"","title":"Workflow","type":"tags"},{"content":" Manuscript (arXiv) Code (GitHub) Abstract # We use administrative panel data on the universe of Brazilian formal workers to investigate the labor market effects of the Venezuelan crisis in Brazil, focusing on the border state of Roraima. The results using difference-in-differences show that the monthly wages of Brazilians in Roraima increased by around 2 percent, mostly driven by those working in sectors and occupations with no refugee involvement. The study finds negligible job displacement for Brazilians but evidence of native workers moving to occupations without immigrants. We also find that immigrants in the informal market offset the substitution effects in the formal market.\nSuggested Citation # Sant\u0026rsquo;Anna, Hugo and Samyam Shrestha. \u0026ldquo;The Effects of the Venezuelan Refugee Crisis on the Brazilian Labor Market.\u0026rdquo; Journal of Economic Geography, 2026. https://doi.org/10.1093/jeg/lbaf047\n","date":"1 January 2026","externalUrl":null,"permalink":"/workingpapers/vzcrisis/","section":"Working Papers","summary":"","title":"The Effects of the Venezuelan Refugee Crisis on the Brazilian Labor Market","type":"workingpapers"},{"content":" Manuscript (PDF) Abstract # In this paper, I introduce a novel decomposition method based on Gaussian mixtures and k-Means clustering, applied to a large Brazilian administrative dataset, to analyze the gender wage gap through the lens of worker–firm interactions shaped by comparative advantage. These interactions generate log wage levels that exceed the simple sum of worker and firm components, making them challenging for traditional linear models to capture effectively. I find that these \u0026ldquo;complementarity effects\u0026rdquo; account for approximately 17% of the gender wage gap. Larger firms, high human capital, STEM degrees, and managerial roles are closely related to these effects. For instance, among managerial occupations, the match effect goes as high as one-third of the total gap. I also find women are less likely to be employed by firms offering higher returns to both human capital and firm-specific premiums, resulting in a significantly larger firm contribution to the gender wage gap than previously estimated. Combined, these factors explain nearly half of the overall gender wage gap, suggesting the importance of understanding firm–worker matches in addressing gender-based pay disparities.\nSuggested Citation # Sant\u0026rsquo;Anna, Hugo. 
\u0026ldquo;Gender Differences in Comparative Advantage Matches: Evidence from Linked Employer-Employee Data.\u0026rdquo; Working Paper.\n","date":"1 January 2025","externalUrl":null,"permalink":"/workingpapers/assortmatch/","section":"Working Papers","summary":"","title":"Gender Differences in Comparative Advantage Matches: Evidence from Linked Employer-Employee Data","type":"workingpapers"},{"content":"","date":"4 August 2024","externalUrl":null,"permalink":"/tags/bunching/","section":"Tags","summary":"","title":"Bunching","type":"tags"},{"content":"","date":"4 August 2024","externalUrl":null,"permalink":"/tags/causal-inference/","section":"Tags","summary":"","title":"Causal-Inference","type":"tags"},{"content":"","date":"4 August 2024","externalUrl":null,"permalink":"/categories/r/","section":"Categories","summary":"","title":"R","type":"categories"},{"content":"","date":"4 August 2024","externalUrl":null,"permalink":"/tags/r/","section":"Tags","summary":"","title":"R","type":"tags"},{"content":"Here I explain the basic concepts of bunching as a causal inference method. Full replication code is on GitHub.\nlibrary(tidyverse) library(ggdag) library(AER) set.seed(20240804) Constructing the data # We need a data generating process where an unobserved confounder $\\eta$ affects both the treatment (cigarettes) and the outcome (birth weight), while a covariate (education) is also observed.\nn \u0026lt;- 5000 educ \u0026lt;- sample(6:16, n, replace = TRUE) eta \u0026lt;- rnorm(n) cigs_star \u0026lt;- 20 - 1.5 * educ + 5 * eta + rnorm(n, 0, 3) cigs \u0026lt;- pmax(0, round(cigs_star)) bw \u0026lt;- 3000 - 20 * cigs + 15 * educ - 10 * eta + rnorm(n, 0, 100) data \u0026lt;- tibble(bw = bw, cigs = cigs, educ = educ) The key feature: $\\eta$ directly affects both cigs_star (smoking propensity) and bw (birth weight). This is the unobserved confounder that will bias naive estimates.\nThe naive approach # Imagine you\u0026rsquo;re interested in the causal effect of smoking on birth weights. You observe a covariate, mom\u0026rsquo;s education, and you control for it:\n$$y_i = \\beta X_i + \\gamma Z_i + \\varepsilon_i$$naive_model \u0026lt;- lm(bw ~ cigs + educ, data = data) summary(naive_model) Notice something weird? Can you say that for every cigarette smoked per day, the baby loses about 40 grams?\nProbably not. # The hint is in the educ coefficient — wrong-signed or insignificant. It implies babies are worse off (or no better off) as we increase the mom\u0026rsquo;s education level.\nWhat if there\u0026rsquo;s an unobserved $\\eta$ that influences both birth weights and the propensity to smoke?\nThere are 3 causal paths to birth weights: a direct path from education, another from $\\eta$, and another where cigarettes act as a mediator variable. Not only are we incorrectly estimating the cigarettes-birth weights relationship, we\u0026rsquo;re probably messing with the education path too.\nEnter bunching # The key assumption: there is a proxy relationship between smoking and covariates. Individuals have a propensity to smoke, like a utility function. Some maximize this utility by smoking heavily. Others are very averse — they would pay not to smoke. And then there are the marginally inclined — they would smoke given a chance but are slightly better off not smoking.\nWe partially observe this mechanism. We only see individuals who are positively inclined to smoke — there is no negative cigarette consumption.\nNotice the linear relationship until we hit zero — then a bunching pattern of birth weights. 
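To reproduce a figure like the one just described from the simulated data, here is a minimal ggplot2 sketch (the point transparency and the red mean line are my own styling guesses, not necessarily the original design):\n# Birth weight against cigarettes: the mean of bw per cigs value traces the linear part,
# while the vertical mass at cigs = 0 is the bunching
ggplot(data, aes(x = cigs, y = bw)) +
  geom_point(alpha = 0.15) +
  stat_summary(fun = mean, geom = \u0026#34;line\u0026#34;, color = \u0026#34;red\u0026#34;) +
  labs(x = \u0026#34;Cigarettes per day\u0026#34;, y = \u0026#34;Birth weight (grams)\u0026#34;)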
The puzzle: why are there so many data points accumulated at zero?\nThe running variable # There\u0026rsquo;s a variable we don\u0026rsquo;t observe that \u0026ldquo;runs\u0026rdquo; continuously through zero and accepts negative values. If we could capture this variable, we could isolate the cigarettes–birth weight relationship without the confounding $\\eta$.\nFormally:\n$$X = \\max(0, X^*)$$$X$ is a proxy for $X^*$, the running variable that is continuous at zero and can assume negative values.\nAssuming linearity:\n$$Y = \\beta X + Z'\\gamma + \\delta \\eta + \\varepsilon$$$$X^* = Z'\\pi + \\eta$$Combining:\n$$\\mathbb{E}(Y \\mid X,Z) = X\\beta + Z'(\\gamma - \\pi\\delta) + \\delta\\left(X + \\mathbb{E}(X^* \\mid X^* \\leq 0, Z)\\cdot\\mathbb{1}(X = 0)\\right)$$The trick: we impute $\\mathbb{E}(X^* \\mid X^* \\leq 0, Z)\\cdot\\mathbb{1}(X = 0)$ as a proxy for when $X$ becomes 0 — we now \u0026ldquo;observe\u0026rdquo; negative cigarette values. This accounts for the unobservable confounder.\nThe Tobit approach # Since our data is censored at zero, we assume normality of the latent error: $\\eta \\sim \\mathcal{N}(0, \\sigma^2)$. A Tobit model recovers the truncated conditional expectation:\ntobit_model \u0026lt;- tobit(cigs ~ educ, data = data, left = 0) sigma_hat \u0026lt;- tobit_model$scale xb \u0026lt;- predict(tobit_model, type = \u0026#34;lp\u0026#34;) mills \u0026lt;- dnorm(-xb / sigma_hat) / pnorm(-xb / sigma_hat) trunc_exp \u0026lt;- xb - sigma_hat * mills data \u0026lt;- data |\u0026gt; mutate(cf_imput = ifelse(cigs == 0, trunc_exp, 0)) The key step is the inverse Mills ratio correction. The linear predictor $Z\u0026rsquo;\\hat{\\pi}$ alone is not $\\mathbb{E}(X^* \\mid X^* \\leq 0, Z)$ — we must account for the truncation.\ncf_model \u0026lt;- lm(bw ~ cigs + educ + cf_imput, data = data) The coefficient on cigs should now be close to the true value of $\\beta = -20$. Compare with the naive OLS, which was biased toward $-40$.\nWhere is the randomness? # Causal experiments usually rely on random shocks. But here individuals bunched at zero simply because they cannot smoke less than zero. We need randomness to ensure unobservables are not affecting the treatment effect — in bunching, we exploit the bunched values to reach the unobservable confounder and ultimately control for it.\nFurther reading # For a deeper overview, see Bertanha, McCallum, and Seegert (2024) and the original Caetano (2015).\n","date":"4 August 2024","externalUrl":null,"permalink":"/posts/bunching/","section":"Posts","summary":"A brief introduction to bunching analysis as a causal method, based on Caetano (2015).","title":"So, what is bunching?","type":"posts"},{"content":" Manuscript (arXiv) Abstract # This paper considers identification and estimation of causal effect parameters from participating in a binary treatment in a difference-in-differences (DID) setup when the parallel trends assumption holds after conditioning on observed covariates. Relative to existing work in the econometrics literature, we consider the case where the value of covariates can change over time and, potentially, where participating in the treatment can affect the covariates themselves. We propose new empirical strategies in both cases. We also consider two-way fixed effects (TWFE) regressions that include time-varying regressors, which is the most common way DID identification strategies are implemented under conditional parallel trends. 
We show that, even with only two time periods, these TWFE regressions are not generally robust to (i) time-varying covariates being affected by the treatment, (ii) treatment effects and/or paths of untreated potential outcomes depending on the level of time-varying covariates in addition to only the change in the covariates over time, (iii) treatment effects and/or paths of untreated potential outcomes depending on time-invariant covariates, (iv) treatment effect heterogeneity with respect to observed covariates, and (v) violations of strong functional form assumptions, both for outcomes over time and the propensity score, that are unlikely to be plausible in most DID applications. Thus, TWFE regressions can deliver misleading estimates of causal effect parameters in a number of empirically relevant cases. We propose both doubly robust estimands and regression adjustment / imputation strategies that are robust to these issues while not being substantially more challenging to implement.\nSuggested Citation # Caetano, Carolina, Brantly Callaway, Stroud Payne, and Hugo Sant\u0026rsquo;Anna. \u0026ldquo;Difference in Differences with Time-Varying Covariates.\u0026rdquo; June 2024. Working Paper.\n","date":"1 June 2024","externalUrl":null,"permalink":"/workingpapers/badcontrols/","section":"Working Papers","summary":"","title":"Difference in Differences with Time-Varying Covariates","type":"workingpapers"},{"content":" This page has moved to the full interactive guide.\nGo to the Guide R Package on GitHub ","date":"1 June 2024","externalUrl":null,"permalink":"/badcontrols/","section":"","summary":"","title":"Difference-in-Differences with Bad Controls","type":"page"},{"content":" Manuscript (arXiv) Abstract # This paper examines the labor market impacts of the 2015 Mariana Dam disaster in Brazil. It contrasts two theoretical models — an urban spatial equilibrium model and a factor-of-production model — with diverging perspectives on environmental influences on labor outcomes. Utilizing rich national administrative and spatial data, the study reveals that the unusual environmental alteration, with minimal human capital loss, primarily affected outcomes via the factor-of-production channel. Nevertheless, spatial equilibrium dynamics are discernible within certain market segments. This research contributes to the growing literature on environmental changes and their economic consequences.\nSuggested Citation # Sant\u0026rsquo;Anna, Hugo. \u0026ldquo;Labor Market Effects of an Environmental Disaster: Evidence from the 2015 Mariana Dam Failure.\u0026rdquo; May 2024. Working Paper.\n","date":"1 May 2024","externalUrl":null,"permalink":"/workingpapers/mariana/","section":"Working Papers","summary":"","title":"Labor Market Effects of an Environmental Disaster: Evidence from the 2015 Mariana Dam Failure","type":"workingpapers"},{"content":"","date":"5 April 2024","externalUrl":null,"permalink":"/tags/em-algorithm/","section":"Tags","summary":"","title":"Em-Algorithm","type":"tags"},{"content":"The EM algorithm is a powerful iterative method for finding maximum likelihood estimates in statistical models with \u0026ldquo;latent variables\u0026rdquo; or missing data. It was first proposed by Dempster, Laird, and Rubin (1977) and has since been a common tool in machine learning models.\nThe core idea is to iteratively alternate between two steps until convergence:\nExpectation (E) step: Estimate the missing data given the observed data and current parameter estimates. 
Maximization (M) step: Update the parameters to maximize the likelihood, treating the estimated missing data as if it were observed. Eventually (hopefully) the algorithm converges. This is particularly useful for Gaussian mixture models.\nA labor market with hidden types # Say we observe labor market data with log wages and we suspect it is composed of two types of workers: low types and high types. We do not observe the worker type — only the worker identifier and the wage.\nlibrary(tidyverse) set.seed(123) n \u0026lt;- 10000 true_means \u0026lt;- c(2, 3) true_sds \u0026lt;- c(0.5, 0.5) true_weights \u0026lt;- c(0.6, 0.4) lmarket \u0026lt;- tibble( worker_id = 1:n, type = c(rep(1, n * true_weights[1]), rep(2, n * true_weights[2])), log_wage = c( rnorm(n * true_weights[1], true_means[1], true_sds[1]), rnorm(n * true_weights[2], true_means[2], true_sds[2]) ) ) |\u0026gt; mutate(log_wage = log_wage - min(log_wage) + 1) We can formally write this mixture as:\n$$f(w_i; \\mu, \\sigma, \\pi) = \\sum^2_{k = 1} \\pi_k \\mathcal{N}(w_i; \\mu_k, \\sigma_k)$$ The histogram # Notice how we can barely see the mixture components. In real-world data, wages are just as jumbled. There must be \u0026ldquo;hidden\u0026rdquo; distributions blended together. So how do we extract them?\nThe EM algorithm # The first step is to guess the initial moments and priors using k-means.\ninitial_guess \u0026lt;- kmeans(lmarket$log_wage, centers = 2, nstart = 25)$cluster mu1 \u0026lt;- mean(lmarket$log_wage[initial_guess == 1]) mu2 \u0026lt;- mean(lmarket$log_wage[initial_guess == 2]) sigma1 \u0026lt;- sd(lmarket$log_wage[initial_guess == 1]) sigma2 \u0026lt;- sd(lmarket$log_wage[initial_guess == 2]) pi1 \u0026lt;- mean(initial_guess == 1) pi2 \u0026lt;- mean(initial_guess == 2) The observed-data log-likelihood is:\n$$\\ell = \\sum_i \\log \\left( \\sum_k \\pi_k \\mathcal{N}(w_i; \\mu_k, \\sigma_k) \\right)$$Note the log of the sum — this is what makes direct maximization difficult and motivates EM.\nsum_finite \u0026lt;- function(x) sum(x[is.finite(x)]) L \u0026lt;- c(-Inf, sum(log(pi1 * dnorm(lmarket$log_wage, mu1, sigma1) + pi2 * dnorm(lmarket$log_wage, mu2, sigma2)))) current_iter \u0026lt;- 2 max_iter \u0026lt;- 500 while (abs(L[current_iter] - L[current_iter - 1]) \u0026gt;= 1e-8 \u0026amp;\u0026amp; current_iter \u0026lt; max_iter) { # E step comp1 \u0026lt;- pi1 * dnorm(lmarket$log_wage, mu1, sigma1) comp2 \u0026lt;- pi2 * dnorm(lmarket$log_wage, mu2, sigma2) comp_sum \u0026lt;- comp1 + comp2 p1 \u0026lt;- comp1 / comp_sum p2 \u0026lt;- comp2 / comp_sum # M step pi1 \u0026lt;- sum_finite(p1) / length(lmarket$log_wage) pi2 \u0026lt;- sum_finite(p2) / length(lmarket$log_wage) mu1 \u0026lt;- sum_finite(p1 * lmarket$log_wage) / sum_finite(p1) mu2 \u0026lt;- sum_finite(p2 * lmarket$log_wage) / sum_finite(p2) sigma1 \u0026lt;- sqrt(sum_finite(p1 * (lmarket$log_wage - mu1)^2) / sum_finite(p1)) sigma2 \u0026lt;- sqrt(sum_finite(p2 * (lmarket$log_wage - mu2)^2) / sum_finite(p2)) current_iter \u0026lt;- current_iter + 1 L[current_iter] \u0026lt;- sum(log(pi1 * dnorm(lmarket$log_wage, mu1, sigma1) + pi2 * dnorm(lmarket$log_wage, mu2, sigma2))) } The EM algorithm guarantees monotone ascent: $\\ell(\\theta^{(t+1)}) \\geq \\ell(\\theta^{(t)})$.\nResults # Converged in 498 iterations Component 1: mu = 2.654, sigma = 0.498, pi = 0.614 Component 2: mu = 3.641, sigma = 0.502, pi = 0.386 The estimated parameters are very close to the true values.\nThe two Gaussian components (teal and dark teal) are clearly separated, and their weighted sum (dashed red) closely matches the empirical density.
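A minimal ggplot2 sketch that reproduces a figure like the one described, using the fitted parameters from the loop above (the teal hex codes are my approximation of the palette, not the original values):\n# Empirical density of log wages, the two fitted components, and their implied mixture
ggplot(lmarket, aes(x = log_wage)) +
  geom_histogram(aes(y = after_stat(density)), bins = 60, fill = \u0026#34;grey85\u0026#34;, color = \u0026#34;white\u0026#34;) +
  stat_function(fun = function(x) pi1 * dnorm(x, mu1, sigma1), color = \u0026#34;#2a9d8f\u0026#34;) +
  stat_function(fun = function(x) pi2 * dnorm(x, mu2, sigma2), color = \u0026#34;#1d6e64\u0026#34;) +
  stat_function(fun = function(x) pi1 * dnorm(x, mu1, sigma1) + pi2 * dnorm(x, mu2, sigma2),
                color = \u0026#34;red\u0026#34;, linetype = \u0026#34;dashed\u0026#34;) +
  labs(x = \u0026#34;Log wage\u0026#34;, y = \u0026#34;Density\u0026#34;)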
What about model selection? # What if we assume 3 worker types instead of 2? You\u0026rsquo;ll get a tiny third component ($\\hat{\\pi}_3 \\approx 0.018$) that the algorithm struggles to fit — a clear sign of overfitting. AIC and BIC penalize complexity and help balance fit against overfitting.\n","date":"5 April 2024","externalUrl":null,"permalink":"/posts/expectation-maximization/","section":"Posts","summary":"A gentle explanation of Gaussian Mixtures and the Expectation Maximization algorithm, with a labor market application in R.","title":"Expectation Maximization Algorithm and Gaussian Mixtures","type":"posts"},{"content":"","date":"5 April 2024","externalUrl":null,"permalink":"/tags/labor/","section":"Tags","summary":"","title":"Labor","type":"tags"},{"content":"","date":"13 December 2022","externalUrl":null,"permalink":"/tags/econometrics/","section":"Tags","summary":"","title":"Econometrics","type":"tags"},{"content":"Here, I will provide a simple explanation of doubly robust estimators using R\u0026rsquo;s tidyverse. This post is heavily inspired by Matheus Facure\u0026rsquo;s Causal Inference for the Brave and True. There you can find comprehensive tutorials of many causal inference models in Python.\nIn a first introduction to causal inference, we learn about linear estimators and propensity score weighting methods to estimate the average treatment effect conditional on covariates.\nHowever, covariates are the main source of confounding in causal inference settings, so we should ask: which method should we use? In fact, we can combine both to achieve consistent estimation even when one of the models is misspecified.\nSetting up the data # Let us imagine we are investigating the causal relationship between income and a government training program in which individuals may or may not participate.\nFirst, we create a very simple static \u0026ldquo;labor market\u0026rdquo;. Assume 10,000 workers for whom we observe two variables, X1 and X2.\nlibrary(tidyverse) library(fixest) set.seed(123) n \u0026lt;- 10000 # Generate covariates X1 \u0026lt;- rnorm(n) X2 \u0026lt;- rnorm(n) For the sake of simplicity, X1 and X2 are standard normal draws. We can assume there is a function that maps these draws to real-world data variables — for example, education and work experience.\nTreatment assignment # Let us now create a treatment variable. In many real-world settings without a randomized experiment, assignment to treatment is correlated with observed variables through a functional form we may not know.\nThe key identifying assumption in this exercise is unconfoundedness (also called selection on observables): conditional on X1 and X2, treatment assignment is independent of potential outcomes:\n$$\\left(Y(1), Y(0)\\right) \\perp D \\mid X_1, X_2$$This means that, after conditioning on the covariates, there are no remaining unobserved factors that simultaneously affect both treatment and outcomes. 
We also require overlap: $0 \u0026lt; P(D = 1 \\mid X) \u0026lt; 1$, ensuring that every type of individual has a positive probability of being treated or untreated.\n# Generate treatment via logistic model ps_true \u0026lt;- plogis(-1 - 0.5 * X1 - 0.5 * X2) treat \u0026lt;- rbinom(n, 1, ps_true) The naive ATE # tau \u0026lt;- 0.5 df \u0026lt;- tibble( id = 1:n, X1 = X1, X2 = X2, treat = treat, Y = tau * treat + 0.25 * X1 + 0.25 * X2 + rnorm(n, 0, 0.5) ) naive_ATE \u0026lt;- df |\u0026gt; group_by(treat) |\u0026gt; summarize(meanY = mean(Y)) |\u0026gt; summarize(ATE = diff(meanY)) |\u0026gt; pull() naive_ATE #\u0026gt; [1] 0.2863138 The true treatment effect is $\\tau = 0.5$. The naive ATE is biased downward by nearly half. Terrible estimates.\nThe correct linear model # correct_ols \u0026lt;- feols(Y ~ treat + X1 + X2, data = df) #\u0026gt; treat: 0.5118 We found 0.51 — right on target. So as long as we know the true specification, OLS works.\nThe propensity score approach # ps_model \u0026lt;- feglm(treat ~ X1 + X2, data = df, family = binomial) df \u0026lt;- df |\u0026gt; mutate( ps = predict(ps_model, type = \u0026#34;response\u0026#34;), weight = ifelse(treat == 1, 1 / ps, 1 / (1 - ps)) ) Y1 \u0026lt;- sum(df$Y[df$treat == 1] * df$weight[df$treat == 1]) / nrow(df) Y0 \u0026lt;- sum(df$Y[df$treat == 0] * df$weight[df$treat == 0]) / nrow(df) correct_PS \u0026lt;- Y1 - Y0 #\u0026gt; [1] 0.5085938 We invert the propensity scores: heavier weights on rarity. A treated observation that looks like a control is more valuable than otherwise. The estimate of 0.508 is very close.\nThe star of the show: Doubly Robust # The main idea: even if we misspecify either the propensity score model or the outcome model (but not both), we still obtain consistent estimates when combining them.\nThe Augmented Inverse Probability Weighting (AIPW) estimator, due to Robins, Rotnitzky, and Zhao (1994), is:\n$$\\widehat{ATE} = \\frac{1}{N} \\sum \\left( \\frac{D_i(Y_i - \\hat{\\mu}_1(X_i))}{\\hat{P}(X_i)} + \\hat{\\mu}_1(X_i) \\right) - \\frac{1}{N} \\sum \\left( \\frac{(1-D_i)(Y_i - \\hat{\\mu}_0(X_i))}{1-\\hat{P}(X_i)} + \\hat{\\mu}_0(X_i) \\right)$$mu1_model \u0026lt;- feols(Y ~ X1 + X2, data = df |\u0026gt; filter(treat == 1)) mu0_model \u0026lt;- feols(Y ~ X1 + X2, data = df |\u0026gt; filter(treat == 0)) df \u0026lt;- df |\u0026gt; mutate( mu1_hat = predict(mu1_model, newdata = df), mu0_hat = predict(mu0_model, newdata = df) ) aipw_1 \u0026lt;- mean(df$treat * (df$Y - df$mu1_hat) / df$ps + df$mu1_hat) aipw_0 \u0026lt;- mean((1 - df$treat) * (df$Y - df$mu0_hat) / (1 - df$ps) + df$mu0_hat) DR_ATE \u0026lt;- aipw_1 - aipw_0 #\u0026gt; [1] 0.5118 Excellent — right on target.\nWhy does it work? # Case 1: Outcome model is correct, propensity score is wrong. Both terms have $Y_i - \\hat{\\mu}_d(X_i)$. If the outcome model is correct, $\\mathbb{E}[Y_i - \\hat{\\mu}_d(X_i) \\mid X_i] = 0$ — the propensity score component vanishes.\nCase 2: Propensity score is correct, outcome model is wrong. 
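To see this concretely, rewrite the treated-arm term of the AIPW estimator above (plain algebra; the untreated arm is symmetric):\n$$\\frac{1}{N} \\sum \\left( \\frac{D_i(Y_i - \\hat{\\mu}_1(X_i))}{\\hat{P}(X_i)} + \\hat{\\mu}_1(X_i) \\right) = \\frac{1}{N} \\sum \\left( \\frac{D_i Y_i}{\\hat{P}(X_i)} - \\hat{\\mu}_1(X_i) \\cdot \\frac{D_i - \\hat{P}(X_i)}{\\hat{P}(X_i)} \\right)$$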
The rearrangement isolates $\\hat{\\mu}_d(X_i)$ paired with $D_i - \\hat{P}(X_i)$, which is zero in expectation when the propensity score is correctly specified.\nDemonstration with an intentionally misspecified outcome model:\nmu1_wrong \u0026lt;- feols(Y ~ 1, data = df |\u0026gt; filter(treat == 1)) mu0_wrong \u0026lt;- feols(Y ~ 1, data = df |\u0026gt; filter(treat == 0)) df \u0026lt;- df |\u0026gt; mutate( mu1_wrong_hat = predict(mu1_wrong, newdata = df), mu0_wrong_hat = predict(mu0_wrong, newdata = df) ) aipw_misspec_1 \u0026lt;- mean(df$treat * (df$Y - df$mu1_wrong_hat) / df$ps + df$mu1_wrong_hat) aipw_misspec_0 \u0026lt;- mean((1 - df$treat) * (df$Y - df$mu0_wrong_hat) / (1 - df$ps) + df$mu0_wrong_hat) DR_misspec \u0026lt;- aipw_misspec_1 - aipw_misspec_0 #\u0026gt; [1] 0.5086 Even with an intercept-only outcome model, AIPW still delivers 0.51 — the correct propensity score carries the identification.\nThe naive estimator (red) is severely biased. OLS, IPW, and both AIPW variants (green) cluster tightly around the true $\\tau = 0.5$.\nWhat comes next? # How do we know the correct specification for propensity scores? Sant\u0026rsquo;Anna and Zhao (2020) provide a pathway to use difference-in-differences with doubly robust estimators. Kennedy (2022) reviews semiparametric methods for doubly robust estimation.\n","date":"13 December 2022","externalUrl":null,"permalink":"/posts/quick-start-doubly-robust/","section":"Posts","summary":"A quick tutorial explaining why Doubly Robust estimators are so powerful — with fully reproducible R code.","title":"Quick Start to Doubly Robust Estimators","type":"posts"},{"content":"","externalUrl":null,"permalink":"/authors/","section":"Authors","summary":"","title":"Authors","type":"authors"},{"content":"Notes-style posts from ongoing projects. Less polished than the research posts — these are the rough edges, the design choices, the things I changed my mind about.\n","externalUrl":null,"permalink":"/notebook/","section":"Notebook","summary":"","title":"Notebook","type":"notebook"},{"content":"","externalUrl":null,"permalink":"/series/","section":"Series","summary":"","title":"Series","type":"series"}]