The Optimal Policy Generator: A Causal Inference Protocol for Maximizing Median Health and Wealth Through Public Policy

Systematic Generation of Enact/Replace/Repeal/Maintain Recommendations Using Quasi-Experimental Methods and Bradford Hill Criteria

Author: Mike P. Sinn

Abstract

Centuries of public policy variation across thousands of jurisdictions (countries, states, cities) constitute a massive natural experiment. The data to identify which policies maximize welfare exists but has not been systematically harvested.

The Optimal Policy Generator (OPG) applies causal inference methods (synthetic control, difference-in-differences, regression discontinuity) and Bradford Hill criteria to this cross-jurisdictional data, measuring policy impact on two welfare metrics: real after-tax median income growth and median healthy life years.

For any jurisdiction, OPG produces four categories of public policy recommendations: ENACT (evidence-supported policies the jurisdiction lacks), REPLACE (policies set at suboptimal levels), REPEAL (policies with net welfare harm), and MAINTAIN (policies aligned with evidence). Each recommendation includes expected effects on both metrics, confidence grades, and blocking factors including freedom and autonomy constraints.

The framework is agnostic to which party enacted each policy, evaluating only whether it improved outcomes. Projected welfare gains for typical US states: 5-15% of GDP (90% CI: 2-25%). At system scale, the model’s Optimal-Governance Path reaches 56.7x the Earth baseline after 20 years, raises average income to $1.16M versus $20.5K on the status-quo path, reaches $10.7 quadrillion in total output, and recovers roughly $101T/year in suppressed value (The Political Dysfunction Tax).

Keywords

policy evaluation, causal inference, quasi-experimental methods, synthetic control, difference-in-differences, Bradford Hill criteria, evidence-based policy, policy recommendations, jurisdiction analysis

Abstract

This specification describes the Optimal Policy Generator (OPG), a framework for producing jurisdiction-specific policy recommendations from quasi-experimental evidence. OPG measures policy impact on two fundamental welfare dimensions: real after-tax median income growth (economic welfare) and median healthy life years (health welfare). These metrics capture the primary welfare effects of most policies while remaining directly interpretable.

OPG output classes: ENACT (add policy), REPLACE (adjust policy level), REPEAL (remove harmful policy), MAINTAIN (retain evidence-aligned policy).

OPG answers four questions: “What should we add? Change? Remove? Keep?” The framework operates at any jurisdiction level (country, state, county, city) and produces four outputs:

  1. Enact: New policies the jurisdiction should adopt
  2. Replace: Existing policies to modify
  3. Repeal: Harmful policies to remove
  4. Maintain: Current policies aligned with evidence

Each recommendation includes expected effects on both metrics, confidence grades, and blocking factors (see Section 56.10.2).

Scale note. If recommendation quality is the bottleneck, the upside from fixing it is enormous. Under the project’s best-case governance ceiling, the recoverable upside is $101T per year. The 20-year Optimal-Governance Path reaches 56.7x the Earth baseline, raises average income to $1.16M versus $20.5K on the status-quo path, and reaches $10.7 quadrillion in total output. This specification focuses on the policy-selection layer of that plan; the full derivation lives in The Political Dysfunction Tax46.

JEL Classification: H10, D72, C54, I18, D61

H10 (Public Finance, Structure and Scope), D72 (Political Economy), C54 (Quantitative Policy Modeling), I18 (Health Policy), D61 (Allocative Efficiency; Cost-Benefit Analysis)

56.1 The Two Welfare Metrics

OPG measures policy impact using the two-metric welfare function defined in the Optimocracy137 Framework:

  1. Real after-tax median income growth (pp/year) - economic welfare
  2. Median healthy life years (years) - health welfare

See the Optimocracy paper for full justification of these metric choices, data sources, and the welfare function formula.

The two things that matter: having money and being alive to spend it. You’d think this would be obvious, but governments often forget the second bit.


56.1.1 Why Only Two Metrics?

Simplicity: These two metrics capture the primary welfare dimensions affected by most policies while remaining directly interpretable. No complex conversion factors (VSL, QALY→$) are needed.

Coverage gap: Freedom and autonomy concerns are handled as blocking factors rather than adding metric complexity. A policy that improves income and health but restricts freedom is flagged, not silently scored. Environmental impacts and distributional effects are tracked as supplementary indicators where data permits.

56.1.2 Income Metric Definition

Real after-tax median income growth is defined narrowly as: wages, salaries, and self-employment income, minus taxes paid. This metric captures what appears in household budgets.

What counts as income effects:

  • Wage increases from productivity gains (e.g., fewer sick days → measurably higher wages)
  • Tax changes that directly affect take-home pay
  • Employment effects that translate to wage income

What does NOT count as income effects:

  • Healthcare cost savings (these are health system efficiency gains, not personal income)
  • Reduced insurance premiums (unless they translate to higher take-home pay via employer pass-through)
  • Quality-of-life improvements that don’t appear in wages

Implication for policy analysis: This creates genuine tradeoffs that the two-metric framework makes explicit. A tobacco tax, for example, may show:

  • Income effect: Negative for smokers (direct tax burden), partially offset by productivity gains for those who quit
  • Health effect: Positive (reduced smoking → longer healthy life)

The framework does not hide this tradeoff by claiming healthcare cost savings are “income gains.” If a policy improves health but costs money, both effects are reported honestly. This is a feature, not a bug: it prevents corner solutions (an infinite tobacco tax would maximize health but devastate income for smokers) and surfaces the welfare tradeoff for democratic deliberation.

56.1.3 Outcome Translation Methodology

While OPG uses only two terminal metrics (income growth and healthy life years), evidence often measures surrogate outcomes (smoking rates, traffic deaths, crime rates). This section specifies how surrogate outcomes are translated to the terminal metrics.

The translation chain:

Policy → Proximate Outcome → Intermediate Outcome → Terminal Metric
| Stage | Example (Tobacco Tax) | Example (Seat Belt Law) |
|---|---|---|
| Proximate | Cigarette sales (-8%) | Seat belt usage (+15 pp) |
| Intermediate | Smoking prevalence (-3 pp) | Crash fatalities (-11%) |
| Terminal | Healthy life years (+0.25) | Healthy life years (+0.15) |

Conversion factors must be explicit:

Each translation step requires a documented conversion factor with source:

| Conversion | Factor | Source | Uncertainty |
|---|---|---|---|
| 1 pp smoking reduction → healthy life years | +0.083 years | CDC life tables138 | ±30% |
| 1% fatality reduction → healthy life years | +0.015 years | NHTSA data139 | ±25% |
| 1 pp employment increase → income growth | +0.12 pp/year | BLS wage data | ±40% |

Uncertainty propagation:

When multiple translation steps are chained, uncertainties compound:

\[ \sigma_{\text{terminal}} = \sqrt{\sum_i \left(\frac{\partial f}{\partial x_i}\right)^2 \sigma_i^2} \]

For linear translations, this simplifies to:

\[ \text{CV}_{\text{terminal}} = \sqrt{\sum_i \text{CV}_i^2} \]

Where CV is the coefficient of variation at each translation step. A three-step translation with 30% uncertainty at each step yields ~52% uncertainty in the terminal metric.
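
As a concreteness check, here is a minimal Python sketch of a chained translation with CV compounding. The +0.083 years/pp factor and ±30% bound come from the table above; the two unit-factor upstream steps and the function name are illustrative assumptions, not spec values.

import math

def chain_translation(surrogate_effect, steps):
    # Multiply through the conversion chain while compounding each
    # step's coefficient of variation in quadrature:
    # CV_terminal = sqrt(sum(CV_i^2)).
    value, cv_sq = surrogate_effect, 0.0
    for factor, cv in steps:
        value *= factor
        cv_sq += cv ** 2
    return value, math.sqrt(cv_sq)

# Tobacco example: 3 pp smoking reduction -> healthy life years via the
# documented +0.083 years/pp factor (±30%); the two unit-factor upstream
# steps with 30% CVs are illustrative placeholders.
terminal, cv = chain_translation(3.0, [(1.0, 0.30), (1.0, 0.30), (0.083, 0.30)])
print(f"{terminal:.2f} healthy life years, CV ≈ {cv:.0%}")  # 0.25 years, ≈ 52%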

Important clarification: The claim that “no complex conversion factors are needed” in the abstract refers to the terminal metrics themselves (income and health are directly interpretable, unlike utility or welfare indices). Translation from surrogate outcomes to terminal metrics does require conversion factors, which must be documented and include uncertainty bounds.

56.2 The Evidence Base: Centuries of Natural Policy Experiments

Every jurisdiction that enacted a policy created a natural experiment. The evidence to know what works already exists, scattered across thousands of jurisdictions and hundreds of years. OPG systematically harvests this evidence.

56.2.1 Scale of Available Natural Experiments

| Level | Jurisdictions | Years | Policy-Years |
|---|---|---|---|
| US States | 50 | 70+ | 3,500+ |
| Countries | 200+ | 230+ | 46,000+ |
| EU Regions | 300+ | 50+ | 15,000+ |
| US Counties | 3,000+ | 50+ | 150,000+ |
| Cities worldwide | 10,000+ | varies | millions |

Each policy change creates a before/after comparison. Each jurisdiction that didn’t adopt creates a control group. This represents a vast, largely untapped evidence base.

US states give you 3,500 policy-years of data. Cities worldwide give you millions. It’s like comparing a cookbook to the entire history of food.


56.2.2 The OPG Pipeline

Data goes in, gets organized, analyzed, scored, then spits out recommendations. It’s a sausage factory, but for telling politicians what works instead of what kills you.

┌─────────────────────────────────────────────────────────────────┐
│                    OPG EVIDENCE PIPELINE                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  1. INGEST                                                       │
│     └── All policy changes with dates, jurisdictions, details   │
│                                                                  │
│  2. ALIGN                                                        │
│     └── Match policies to outcome time series by jurisdiction   │
│                                                                  │
│  3. ANALYZE                                                      │
│     └── Apply quasi-experimental methods (synth control, DiD)   │
│                                                                  │
│  4. SCORE                                                        │
│     └── Compute Policy Impact Scores using Bradford Hill        │
│                                                                  │
│  5. RANK                                                         │
│     └── Generate jurisdiction-specific recommendations          │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

56.2.3 Why This Hasn’t Been Done Before

  1. Data fragmentation: Policy records scattered across legislative databases, government archives, academic papers
  2. Computational limits: Meta-analysis at this scale requires modern infrastructure
  3. Methodological advances: Synthetic control (2003), modern DiD (2021) are recent
  4. Incentive structures: No existing institution has mandate + capability + incentive

OPG aggregates fragmented evidence, applies modern causal inference at scale, and produces actionable output.

Four reasons this was impossible before: scattered data, slow computers, bad methods, nobody cared. Now: fast computers, good methods, some people care. Progress is three steps forward, four barriers removed.


56.3 System Overview

56.3.1 What Policymakers See

A jurisdiction-specific dashboard showing which policies to enact, replace, repeal, or maintain, ranked by expected welfare impact.

See Appendix A for a complete worked example showing jurisdiction-specific recommendations.

56.3.2 What Policy Analysts See

Eight different types of data combine to tell you if a policy actually works. Like ingredients in a recipe, except this one tells you which recipes poison people.

  • Effect estimates with standard errors, confidence intervals, and heterogeneity statistics
  • Policy Impact Scores (PIS) for each policy-outcome relationship (intermediate metric)
  • Bradford Hill criteria scores for causality assessment
  • Analysis method used (synthetic control, DiD, RDD) with quality diagnostics
  • Confounders controlled and potential threats to validity
  • Natural experiments identified for validation opportunities
  • Jurisdiction-specific adjustments based on demographics, existing policies, and context

56.4 Introduction

56.4.1 Why Policy Ranking Fails Today

Current policy adoption follows a process dominated by political economy dynamics well-documented in the public choice literature140,141:

  1. Lobbying intensity: Policies that benefit concentrated interests (with resources to lobby) are adopted over policies that benefit diffuse majorities142,143
  2. Ideological priors: Policymakers filter evidence through pre-existing beliefs, accepting studies that confirm priors and rejecting those that don’t
  3. Anecdote-driven reasoning: Vivid individual cases drive policy more than systematic evidence (“If it saves one child…”)
  4. Status quo bias: Existing policies persist regardless of evidence because change requires political capital
  5. Salience heuristics: Policies addressing visible problems (terrorism, rare diseases) receive disproportionate resources relative to invisible problems (air pollution, chronic disease)

The result: welfare losses from documented policy failures. Evidence-based policy movements have attempted to address these failures144,145, but lack systematic, jurisdiction-specific recommendation generation.

Evidence says policy X works. But lobbying, fear of change, and shiny distractions filter it out. It’s like having the cure but drinking the poison because the bottle is prettier.


56.4.2 Scale of Available Evidence

The evidence base comprises millions of policy-years of natural experiments across all jurisdictional levels (see Section 56.2 for detailed counts). Systematically analyzing this data provides a far better basis for policy adoption than the current system of lobbying-driven, ideology-filtered decision-making.

Current system: decide based on feelings, maybe 10 examples. New system: decide based on millions of examples. It’s the difference between astrology and astronomy, but for governance.


56.4.3 Contributions

  1. Methodological: A systematic framework for translating quasi-experimental evidence into jurisdiction-specific policy recommendations, extending beyond generic evidence ratings to actionable output in four categories (enact/replace/repeal/maintain).

Evidence becomes a score. Score tells you: do this new thing, swap that old thing, stop doing that terrible thing, or keep doing that good thing. It’s like Marie Kondo, but for laws.

  2. Taxonomic: We formalize the four recommendation types and introduce the Policy Impact Score (PIS) as an intermediate metric combining effect magnitude, causal confidence (Bradford Hill criteria), and methodological quality. This provides a standardized approach to evidence aggregation.

  3. Applied: We demonstrate the complete framework with a worked example for Texas traffic safety policy, showing how generic effect estimates are translated into context-adjusted, prioritized recommendations with blocking factors and tracking guidance.

56.4.4 Performance Benchmarking

OPG is designed to generate deployable, evidence-weighted recommendations. Retrospective and prospective benchmarking refine the thresholds, context adjustments, and prioritization rules over time (see Section 56.20).

56.6 Theoretical Framework

56.6.1 The Policy Optimization Problem

Let \(\mathcal{P}\) denote the set of available policies. For jurisdiction \(j\), let \(P_j \subseteq \mathcal{P}\) denote the current policy bundle. Welfare under policy bundle \(P\) is defined using the two core metrics:

\[ W_j(P) = \alpha \cdot \text{IncomeGrowth}_j(P) + (1-\alpha) \cdot \text{HealthyYears}_j(P) \]

Where:

  • \(\text{IncomeGrowth}_j(P)\) = Real after-tax median income growth (pp/year)
  • \(\text{HealthyYears}_j(P)\) = Median healthy life expectancy (years)
  • \(\alpha = 0.5\) (default equal weighting; can be adjusted for jurisdiction priorities)

The social planner’s problem: \[ P_j^* = \arg\max_{P \subseteq \mathcal{P}} W_j(P) \quad \text{subject to feasibility constraints} \]

Assumption 1 (Additive Separability): For tractability, assume each metric is approximately additively separable across policies: \[ \text{IncomeGrowth}_j(P) \approx \sum_{p \in P} \beta^{\text{inc}}_{jp} + \varepsilon_{\text{inc}} \] \[ \text{HealthyYears}_j(P) \approx \sum_{p \in P} \beta^{\text{hlth}}_{jp} + \varepsilon_{\text{hlth}} \]

where \(\beta^{\text{inc}}_{jp}\) and \(\beta^{\text{hlth}}_{jp}\) are the marginal effects of policy \(p\) on each metric in jurisdiction \(j\), and interaction terms are assumed to be second-order.

Two circles: what you do now, what you should do. The bits that don’t overlap are where people are dying unnecessarily. Venn diagrams finally do something useful.


Justification and limitations: Additive separability is a standard simplifying assumption in policy analysis (see146 for regulatory impact analysis applications). This assumption is most valid when: (1) policies operate through distinct mechanisms, (2) jurisdictions have not reached saturation in any policy domain, and (3) policies do not create complementarities or substitution effects. When these conditions fail (for example, when a carbon tax interacts with renewable energy subsidies), the marginal effects may be mis-estimated.

Policy Interaction Detection:

OPG flags potential interaction effects using the following heuristics:

  1. Effect heterogeneity test: If a policy’s effect varies significantly depending on whether another policy is present, flag the pair as potentially interacting.

  2. Known interaction database: Documented policy complementarities and substitutes:

| Policy A | Policy B | Interaction Type | Evidence |
|---|---|---|---|
| Seat belt law | Speed limit | Complementary | Both target crash fatalities |
| Nutrition labeling | School lunch programs | Complementary | Both improve dietary outcomes |
| Tobacco tax | Smoking ban | Complementary | Reinforce each other |
| Income tax cut | Sales tax increase | Substitutable | Offsetting fiscal effects |
  3. Sensitivity analysis recommendation: For high-priority recommendations, report: “How would this recommendation change if policies X and Y interact?” with bounds on combined effect.

Proposition 1 (Policy Gap Characterization): Under Assumption 1, the welfare-optimal policy set satisfies: \[ P_j^* = \{p \in \mathcal{P} : w_j(p) > 0\} \] where \(w_j(p) = \alpha \cdot \beta^{\text{inc}}_{jp} + (1-\alpha) \cdot \beta^{\text{hlth}}_{jp}\) is the marginal welfare contribution of policy \(p\) in jurisdiction \(j\).

and the policy gap for jurisdiction \(j\) is: \[ \Delta_j = (P_j^* \setminus P_j) \cup (P_j \setminus P_j^*) \]

where \((P_j^* \setminus P_j)\) represents beneficial policies the jurisdiction lacks (enact candidates) and \((P_j \setminus P_j^*)\) represents harmful policies the jurisdiction has (repeal candidates). See Section 56.9 for the operational implementation.

Proof: Direct consequence of additive separability. Include policy \(p\) if and only if \(w_j(p) > 0\). ∎

56.6.2 Evidence Aggregation Properties

Proposition 2 (PIS as Precision-Weighted Evidence): Under random-effects meta-analysis with between-jurisdiction variance \(\tau^2\), the pooled effect estimate \(\hat{\beta}_{\text{pooled}}\) is (see Section 56.13 for implementation): \[ \hat{\beta}_{\text{pooled}} = \frac{\sum_j \frac{1}{\text{SE}_j^2 + \tau^2} \hat{\beta}_j}{\sum_j \frac{1}{\text{SE}_j^2 + \tau^2}} \]

with variance: \[ \text{Var}(\hat{\beta}_{\text{pooled}}) = \frac{1}{\sum_j \frac{1}{\text{SE}_j^2 + \tau^2}} \]

Proof: Standard random-effects meta-analysis derivation (DerSimonian-Laird). ∎

Proposition 3 (Heterogeneity Bounds Transferability): When \(I^2 > 75\%\) (high heterogeneity): \[ \text{Var}[\hat{\beta}_j | \hat{\beta}_{\text{pooled}}] > 0.75 \cdot \text{Var}[\hat{\beta}_j] \]

meaning the pooled estimate explains less than 25% of cross-jurisdiction variation. Context-specific estimates are required rather than direct application of the pooled effect. This constraint is operationalized in Section 56.13.4.

Proof: By definition, \(I^2 = \frac{\tau^2}{\tau^2 + \bar{\sigma}^2}\) where \(\bar{\sigma}^2\) is typical within-study variance. When \(I^2 > 0.75\), between-study variance dominates, and the pooled estimate provides limited information about any individual jurisdiction’s true effect. ∎

56.6.3 Information Value

Proposition 4 (Value of Additional Evidence): The expected value of information from an additional jurisdiction study is: \[ \text{VOI} = E[\max_{a \in \{adopt, reject\}} U(a | \text{new data})] - \max_{a} E[U(a | \text{current data})] \]

which is maximized when prior uncertainty is high and decision stakes are large.

Proof: Standard Bayesian decision theory144. ∎

Corollary 1 (Trial Prioritization): Policies with (1) high prior variance in effect estimates, (2) large potential welfare impact, and (3) low trial cost should be prioritized for experimental validation. See Section 56.17 for implementation.

56.7 Core Methodology

56.7.1 Policy-Outcome Data Structure

The OPG system uses a relational database schema. The following is a reference implementation showing the conceptual data model; production deployments may vary.

How the database connects policies to outcomes. It’s plumbing, but for knowledge instead of waste. Although some policies are also waste.


56.7.1.1 Core Tables

-- Hierarchical jurisdictions (country > state > county > city)
jurisdictions (
    id, name, jurisdiction_type, -- 'country', 'state', 'county', 'city'
    parent_id, -- FK to parent jurisdiction (e.g., Texas -> USA)
    iso_code, population, gdp_per_capita,
    constitution_type, -- constraints on policy space
    data_quality_score, -- how complete is our policy inventory?
    latitude, longitude, ...
)

-- Policy types (canonical definitions)
policy_types (
    id, name, policy_category_id, policy_type,
    is_continuous, typical_onset_delay_days,
    typical_duration_of_effect_years, canonical_text, ...
)

-- Current policy inventory by jurisdiction
jurisdiction_policies (
    jurisdiction_id, policy_type_id,
    has_policy BOOLEAN,
    policy_strength, -- e.g., tobacco tax amount, not just yes/no
    implementation_date,
    policy_details_json,
    data_source, last_verified
)

-- Two core welfare metrics (fixed schema)
outcome_metrics (
    id,
    metric_type ENUM('income', 'health'), -- Only two types
    jurisdiction_id,
    measurement_date,
    value, -- pp/year for income; years for health
    confidence_interval_low,
    confidence_interval_high,
    data_source -- Census/BLS for income; WHO/BRFSS for health
)

-- Policy recommendations (generated output)
policy_recommendations (
    jurisdiction_id, policy_type_id,
    recommendation_type, -- 'enact', 'replace', 'repeal', 'maintain'
    current_status, -- what they have now (NULL if nothing)
    recommended_target, -- what evidence suggests
    -- Two-metric effects
    income_effect_pp, -- Expected effect on median income growth (pp/year)
    income_effect_ci_low, income_effect_ci_high,
    health_effect_years, -- Expected effect on healthy life years
    health_effect_ci_low, health_effect_ci_high,
    evidence_grade, priority_score,
    blocking_factors, -- 'constitutional', 'federal_preemption', 'political', 'autonomy', etc.
    similar_jurisdictions,
    -- Jurisdictional level guidance
    minimum_effective_level, recommended_level,
    -- Tracking for feedback loop
    tracking_frequency, tracking_baseline_method,
    last_generated
)

56.7.1.2 Policy Types

| Type | Description | Example | Measurement |
|---|---|---|---|
| law | Statutory law passed by legislature | Environmental regulation law | Binary (exists/not) |
| regulation | Administrative rule by agency | Agency emission standards | Continuous (stringency) |
| tax_policy | Tax rate, bracket, credit, deduction | Investment income tax rate | Continuous (rate) |
| budget_allocation | Spending decision | Education spending per pupil | Continuous ($/capita) |
| executive_order | Executive action | Enforcement priority directive | Binary |
| court_ruling | Judicial precedent | Constitutional interpretation | Binary |
| treaty | International agreement | Multilateral cooperation treaty | Binary |
| local_ordinance | Municipal rule | Land use restrictions | Categorical |

56.7.2 Analysis Methods

Different ways to figure out if policies work when you can’t run proper experiments because ethics committees get upset about randomly killing control groups.


The OPG system supports multiple quasi-experimental designs, reflecting the “credibility revolution” in applied economics152. Each method is appropriate for different data structures153:

56.7.2.1 Synthetic Control Method

Use case: Single treated jurisdiction, good donor pool of similar untreated jurisdictions.

Method: Construct a “synthetic” control as a weighted average of untreated jurisdictions that matches the treated jurisdiction’s pre-treatment outcome trajectory. Post-treatment divergence estimates the causal effect.

Quality metrics:

  • pre_treatment_rmse: How well does synthetic control match pre-treatment? (Lower is better)
  • placebo_p_value: Permutation test comparing treated effect to placebo effects (Lower is better)

Example: Effect of a state tobacco tax increase on smoking rates, using similar states without tax changes as donors154,155. For comprehensive reviews of the synthetic control method, see156.

56.7.2.2 Difference-in-Differences (DiD)

Use case: Multiple treated jurisdictions, staggered adoption timing, parallel trends assumption plausible.

Two lines run parallel, then one gets the policy and diverges. The gap between them is how much the policy helped or hurt. It’s like twins, but one gets vegetables.


Method: Compare pre-post change in treated jurisdictions to pre-post change in control jurisdictions. Difference of differences estimates treatment effect. For settings with staggered adoption, modern estimators account for heterogeneous treatment effects across cohorts157.

Quality metrics:

  • parallel_trends_test_stat: Test statistic for pre-treatment trend equality
  • parallel_trends_p_value: P-value for parallel trends test (Higher is better, want to fail to reject)

Example: Effect of occupational licensing reforms across states with different adoption timing.
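
For the canonical 2x2 case, the estimator is simply the treated group's pre/post change minus the control group's pre/post change. A minimal sketch with hypothetical wage means (function name and values are illustrative, not spec content):

def did_estimate(y_treat_pre, y_treat_post, y_ctrl_pre, y_ctrl_post):
    # Treated group's pre/post change minus the control group's
    # pre/post change; unbiased under the parallel trends assumption.
    return (y_treat_post - y_treat_pre) - (y_ctrl_post - y_ctrl_pre)

# Hypothetical mean wages around a licensing reform.
effect = did_estimate(50_000, 52_500, 50_200, 51_200)  # = 1,500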

56.7.2.3 Regression Discontinuity Design (RDD)

Use case: Sharp eligibility threshold determines treatment assignment.

Dots on either side of a line, big jump at the cutoff. People just above the line do better. It’s like being born one day later and getting free healthcare.


Method: Compare outcomes just above vs. just below the threshold. If other characteristics are smooth across the threshold, the discontinuity in outcomes estimates the causal effect.

Quality metrics:

  • Bandwidth selection diagnostics
  • McCrary density test for manipulation
  • Covariate balance at threshold

Example: Effect of program eligibility on outcomes at an income or age threshold (e.g., retirement benefits at age 65).

56.7.2.4 Event Study / Interrupted Time Series

Use case: Need to visualize pre-trends and dynamic treatment effects.

Nothing happens, nothing happens, nothing happens, policy hits, then things change. It’s like a heart rate monitor, but for legislation instead of life.


Method: Estimate treatment effects at each time period relative to treatment, including leads (pre-treatment) and lags (post-treatment).

Quality metrics:

  • Pre-treatment coefficients should be near zero (no anticipation)
  • Post-treatment coefficients show effect dynamics

Example: Effect of unemployment insurance extensions on job search behavior, showing both anticipation effects (before benefits expire) and persistence of impact (after return to baseline).

56.7.2.5 Confidence Weighting by Method

The following weights reflect default values based on the methodological rigor hierarchy in applied economics.

| Method | Base Confidence Weight | Rationale |
|---|---|---|
| Randomized experiment | 1.00 | Gold standard; rare for policies |
| Regression discontinuity | 0.90 | Local randomization at threshold |
| Synthetic control | 0.85 | Good pre-treatment fit implies validity |
| Difference-in-differences | 0.80 | Requires untestable parallel trends |
| Event study | 0.75 | Descriptive of dynamics; less rigorous |
| Interrupted time series | 0.65 | Single-unit; history threats |
| Simple before-after | 0.40 | No control group; confounding likely |
| Cross-sectional | 0.25 | Snapshot; severe confounding |

56.7.3 Bradford Hill Criteria Scoring Functions

Bradford Hill’s criteria for causality158, originally developed for epidemiology, are operationalized here as explicit scoring functions. Each criterion maps to a saturation function that produces a score in \([0, 1]\).

Take nine different ways to check if something causes something else, squish them into numbers between 0 and 1. Science loves turning confidence into decimals.


56.7.3.1 Strength of Association

Larger effect estimates provide stronger evidence. We use an exponential saturation function:

\[ S_{\text{strength}} = 1 - e^{-|\hat{\beta}_{\text{std}}| / \beta_{\text{sig}}} \]

Where \(|\hat{\beta}_{\text{std}}|\) is the absolute standardized effect size and \(\beta_{\text{sig}} = 0.3\) is the saturation parameter.

Parameter justification: The threshold \(\beta_{\text{sig}} = 0.3\) corresponds to Cohen’s convention for a “medium” effect size in social science159. This is a starting point; sensitivity analysis shows PIS changes by ±15% when \(\beta_{\text{sig}}\) varies from 0.2 to 0.4. A standardized effect of 0.3 yields \(S_{\text{strength}} \approx 0.63\); effects of 0.6+ yield scores \(>0.86\).

56.7.3.2 Consistency Across Jurisdictions

Replication across contexts provides stronger evidence. Scored by number of independent jurisdiction studies:

\[ S_{\text{consistency}} = 1 - e^{-N_j / N_{\text{sig}}} \]

Where \(N_j\) is the number of jurisdictions with concordant effect direction and \(N_{\text{sig}} = 10\) is the saturation parameter.

Parameter justification: The threshold \(N_{\text{sig}} = 10\) reflects that replication across 10+ independent jurisdictions provides strong evidence against idiosyncratic local effects. This aligns with meta-analytic conventions where 10+ studies enable reliable heterogeneity estimation160. Sensitivity analysis shows PIS varies by ±12% when \(N_{\text{sig}}\) ranges from 7 to 15. Five concordant jurisdictions yield \(S_{\text{consistency}} \approx 0.39\); ten yield \(\approx 0.63\).

56.7.3.3 Temporality (Required)

Policy adoption must precede outcome change. This is binary (either satisfied or not):

\[ S_{\text{temporality}} = \begin{cases} 1.0 & \text{if } \delta > 0 \\ 0.0 & \text{otherwise} \end{cases} \]

Where \(\delta\) is the lag between policy implementation and outcome measurement. If temporality is violated, the overall CCS is zeroed regardless of other criteria.

56.7.3.4 Dose-Response Gradient

For continuous policies (tax rates, spending levels), dose-response strengthens causal inference:

\[ S_{\text{gradient}} = \frac{r_{\text{dose}}^2}{r_{\text{dose}}^2 + r_{\text{sig}}^2} \]

Where \(r_{\text{dose}}\) is the correlation between policy intensity and outcome magnitude, and \(r_{\text{sig}} = 0.5\) is the saturation parameter.

Parameter justification: The threshold \(r_{\text{sig}} = 0.5\) reflects that a correlation of 0.5 between policy intensity and outcome represents moderate dose-response evidence. This is analogous to toxicological dose-response standards where monotonic relationships strengthen causal inference161. Sensitivity analysis shows PIS varies by ±8% when \(r_{\text{sig}}\) ranges from 0.3 to 0.7. A dose-response correlation of 0.5 yields \(S_{\text{gradient}} = 0.5\); correlation of 0.7 yields \(\approx 0.66\).

Binary policies: For binary (yes/no) policies, dose-response cannot be assessed. Rather than defaulting to a neutral score of 0.5, binary policies are marked as “N/A” for gradient and this criterion is excluded from the CCS calculation (weights are renormalized across remaining criteria). This prevents binary policies from being systematically penalized relative to continuous policies.

56.7.3.5 Experiment Quality

Quality of the quasi-experimental design, weighted by validity diagnostic violations:

\[ S_{\text{experiment}} = w_{\text{method}} \times (1 - v_{\text{violations}}) \]

Where \(w_{\text{method}}\) is the base method weight and \(v_{\text{violations}} \in [0, 1]\) is the proportion of validity checks failed (parallel trends, pre-treatment fit, placebo tests).

56.7.3.6 Plausibility (Mechanistic)

Economic or behavioral mechanism linking policy to outcome. Scored by expert-validated mechanism database:

\[ S_{\text{plausibility}} = \frac{\sum_i w_i \cdot m_i}{\sum_i w_i} \]

Where \(m_i \in \{0, 1\}\) indicates whether mechanism component \(i\) is satisfied and \(w_i\) are component weights.

Mechanism component checklist:

| Component | Weight | Assessment Criterion |
|---|---|---|
| Economic theory predicts direction | 0.30 | Peer-reviewed theory paper supports predicted sign |
| Behavioral response documented | 0.25 | Empirical evidence of behavioral change in response to similar policies |
| No implausible required assumptions | 0.20 | Mechanism doesn’t require assumptions contradicted by evidence |
| Timing consistent with mechanism | 0.15 | Effect onset matches expected mechanism timeline |
| Magnitude plausible | 0.10 | Effect size within range predicted by mechanism |

Scoring procedure: Each component is scored binary (0 or 1) by literature review. The weighted sum yields \(S_{\text{plausibility}} \in [0, 1]\). When expert-validated mechanism assessments are unavailable, this score defaults to 0.5 with a note that mechanism plausibility is unassessed.

56.7.3.7 Coherence with Literature

Consistency with broader economic and social science evidence:

\[ S_{\text{coherence}} = 1 - e^{-N_{\text{studies}} / N_{\text{sig}}} \]

Where \(N_{\text{studies}}\) is the count of supporting studies in the literature and \(N_{\text{sig}} = 5\). Three supporting studies yield \(S_{\text{coherence}} \approx 0.45\); ten yield \(\approx 0.86\).

56.7.3.8 Specificity

Whether the policy affects specific outcomes rather than everything:

\[ S_{\text{specificity}} = \frac{1}{1 + \log(1 + N_{\text{outcomes}})} \]

Where \(N_{\text{outcomes}}\) is the number of outcome categories with significant effects and the logarithm is natural. A policy affecting a single outcome has \(S_{\text{specificity}} \approx 0.59\); a policy affecting 10+ outcomes has \(S_{\text{specificity}} < 0.3\). Lower specificity suggests confounding or measurement artifact.

56.7.4 Causal Confidence Score (CCS) Calculation

The aggregate CCS combines the eight non-temporality criteria with explicit weights, gated by temporality:

\[ \text{CCS} = S_{\text{temporality}} \times \frac{\sum_{k \neq \text{temp}} w_k \cdot S_k}{\sum_{k \neq \text{temp}} w_k} \]

\(S_{\text{temporality}}\) acts as a binary gate: if temporality fails (policy doesn’t precede outcome), the entire CCS is zero regardless of other criteria scores.

Proposed default criterion weights:

These weights reflect the relative importance of each criterion for causal inference in policy contexts. They can be adjusted based on domain expertise, sensitivity analysis, or implementation experience. The weights are adapted from the epidemiological Bradford Hill framework.

| Criterion | Weight | Role |
|---|---|---|
| Temporality | Gate | Binary prerequisite (must be 1.0 to proceed) |
| Experiment | 0.225 | Method quality is primary for causal inference |
| Consistency | 0.19 | Replication across jurisdictions crucial |
| Strength | 0.15 | Effect magnitude matters for welfare |
| Gradient | 0.125 | Dose-response is strong causal evidence |
| Coherence | 0.10 | Literature support adds confidence |
| Plausibility | 0.09 | Mechanism existence supports causation |
| Specificity | 0.06 | Targeted effects more credible |
| Analogy | 0.06 | Transfer learning from similar policies |

Weights for the eight scored criteria sum to 1.0. Temporality is not weighted because it is a binary gate, not a continuous score.
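
As a concreteness check, here is a minimal Python sketch of the gated, weight-renormalized aggregation described above. Function and variable names are illustrative, not part of the spec; the criterion scores in the example are placeholder values.

DEFAULT_WEIGHTS = {
    "experiment": 0.225, "consistency": 0.19, "strength": 0.15,
    "gradient": 0.125, "coherence": 0.10, "plausibility": 0.09,
    "specificity": 0.06, "analogy": 0.06,
}

def causal_confidence_score(scores, temporality_ok, weights=DEFAULT_WEIGHTS):
    # Temporality is a binary gate: if the policy does not precede the
    # outcome, CCS is zero regardless of the other criteria.
    if not temporality_ok:
        return 0.0
    # Drop N/A criteria (e.g., gradient for binary policies) and
    # renormalize the remaining weights, per Section 56.7.3.4.
    scored = {k: s for k, s in scores.items() if s is not None}
    total_w = sum(weights[k] for k in scored)
    return sum(weights[k] * s for k, s in scored.items()) / total_w

# Binary policy: gradient is N/A, so its 0.125 weight is redistributed.
ccs = causal_confidence_score(
    {"experiment": 0.76, "consistency": 0.63, "strength": 0.63,
     "gradient": None, "coherence": 0.45, "plausibility": 0.50,
     "specificity": 0.59, "analogy": 0.50},
    temporality_ok=True)  # ≈ 0.62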

56.8 Jurisdiction Policy Inventory

56.8.1 Tracking Current Policies by Jurisdiction

Before generating recommendations, OPG must know what policies each jurisdiction currently has. The jurisdiction_policies table tracks:

| Field | Description | Example |
|---|---|---|
| has_policy | Whether jurisdiction has this policy type | TRUE/FALSE |
| policy_strength | For continuous policies, the current level | $1.41/pack (tobacco tax) |
| implementation_date | When current policy took effect | 2009-01-01 |
| policy_details_json | Structured details about implementation | {“primary_enforcement”: false} |
| data_source | Where this information came from | “Texas Tax Code §154.021” |
| last_verified | When this was last confirmed accurate | 2024-06-15 |

56.8.2 Data Sources for Policy Status

| Jurisdiction Level | Primary Sources | Update Frequency |
|---|---|---|
| Country | WTO, OECD, IMF policy databases | Annual |
| US State | NCSL, state legislative databases, LexisNexis | Continuous |
| EU Member | EUR-Lex, national legal databases | Continuous |
| US City/County | Municipal code databases, Municode | Varies |
| Other Subnational | National statistics offices, academic datasets | Varies |

56.8.3 Handling Missing Data

Data completeness varies by jurisdiction and policy type:

| Data Quality Score | Interpretation | Recommendation Confidence |
|---|---|---|
| > 0.9 | Comprehensive inventory | Full confidence |
| 0.7 - 0.9 | Most major policies tracked | High confidence |
| 0.5 - 0.7 | Significant gaps | Medium confidence; flag gaps |
| < 0.5 | Sparse data | Low confidence; prioritize data collection |

Recommendations are only generated when policy status is known with reasonable confidence.

56.9 Policy Gap Analysis

56.9.1 Comparing Current to Optimal

For each jurisdiction \(j\), the policy gap for policy type \(p\) is:

\[ \text{Gap}_{jp} = \text{Evidence-Supported}_{p} - \text{Current}_{jp} \]

Where:

  • Evidence-Supported: What the evidence suggests the jurisdiction should have
  • Current: What the jurisdiction actually has

56.9.2 Gap Types

| Gap Type | Definition | Example |
|---|---|---|
| Missing policy | Jurisdiction lacks a policy with strong positive evidence | Texas lacks primary seat belt enforcement |
| Harmful policy | Jurisdiction has a policy with strong negative evidence | Jurisdiction has policy X shown to increase mortality |
| Suboptimal strength | Continuous policy set below evidence-supported level | Minimum wage below optimal level |
| Excessive strength | Continuous policy set above evidence-supported level | Speed limit at 85 mph vs. optimal ~70 mph |

56.9.3 Priority Scoring

Recommendations are ranked by priority score, which combines gap magnitude, evidence quality, and expected welfare impact:

\[ \text{Priority}_{jp} = |\text{Gap}_{jp}| \times \text{PIS}_p \times M_{jp} \]

Where:

  • \(|\text{Gap}_{jp}|\) = Absolute difference between evidence-supported and current policy level (normalized to \([0, 1]\))
  • \(\text{PIS}_p\) = Policy Impact Score (see Section 56.12), capturing effect magnitude and causal confidence
  • \(M_{jp}\) = Monetized annual welfare impact, adjusted for jurisdiction \(j\)’s population and context (normalized across candidate policies so the priority score stays in \([0, 1]\))

Priority tiers:

| Tier | Priority Score | Interpretation |
|---|---|---|
| Critical | \(\geq 0.80\) | Immediate action recommended |
| High | \([0.50, 0.80)\) | Strong candidate for adoption |
| Medium | \([0.25, 0.50)\) | Consider if political capital available |
| Low | \(< 0.25\) | Monitor for better evidence |

High-priority recommendations have:

  1. Large gap between current and optimal
  2. Strong evidence (Grade A or B; high PIS)
  3. Large expected welfare impact (high M)
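
A minimal sketch of the priority computation under these definitions. Normalizing \(M\) against the largest candidate impact is an assumption of this sketch (the spec leaves the scaling implicit), and the function names are illustrative.

def priority_score(gap, pis, welfare_impact, max_welfare_impact):
    # |Gap| and PIS are already in [0, 1]; the monetized impact M is
    # normalized against the largest candidate so the product maps onto
    # the tier thresholds (a scaling assumption made for illustration).
    m_norm = welfare_impact / max_welfare_impact
    return abs(gap) * pis * m_norm

def priority_tier(score):
    # Tier cutoffs from the table above.
    if score >= 0.80:
        return "Critical"
    if score >= 0.50:
        return "High"
    if score >= 0.25:
        return "Medium"
    return "Low"

# Example: large gap (0.9), Grade A-level PIS (0.85), near-top impact.
print(priority_tier(priority_score(0.9, 0.85, 90e6, 100e6)))  # "High"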

56.9.4 Context Adjustment

Effect estimates are adjusted for jurisdiction characteristics:

| Adjustment Factor | Description | Example |
|---|---|---|
| Demographics | Age structure, income distribution | Tobacco tax effect varies by income |
| Existing policies | Interaction with current policy bundle | Effect depends on what else is in place |
| Institutional capacity | Enforcement capability | Weak institutions → smaller effects |
| Cultural factors | Compliance norms | Varies by society |

Context Adjustment Algorithm:

The context adjustment multiplier is computed as:

\[ \text{Context Adjustment}_j = \prod_{k \in \{D, P, I, C\}} \left(1 + \delta_k \cdot d_{jk}\right) \]

Where:

  • \(d_{jk}\) = Standardized distance between jurisdiction \(j\) and the evidence-weighted mean on factor \(k\)
  • \(\delta_k\) = Sensitivity coefficient for factor \(k\) (estimated from heterogeneity analysis)
| Factor | \(\delta_k\) Default | Quantification Method |
|---|---|---|
| Demographics (\(D\)) | 0.15 | Distance on age/income/education distributions |
| Existing policies (\(P\)) | 0.10 | Policy overlap comparison |
| Institutional (\(I\)) | 0.20 | World Bank Governance Indicators |
| Cultural (\(C\)) | 0.10 | Hofstede dimensions + compliance indices |

Uncertainty widening:

When jurisdiction \(j\) differs substantially from the evidence base (context adjustment \(< 0.7\) or \(> 1.3\)), confidence intervals are widened:

\[ \text{CI}_{\text{adjusted}} = \text{CI}_{\text{pooled}} \times \left(1 + 0.5 \times |1 - \text{Context Adjustment}_j|\right) \]

This reflects increased uncertainty when extrapolating beyond the observed evidence distribution.
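
A minimal sketch of both steps, assuming the CI multiplier scales the interval's half-width about its midpoint (the formula above leaves the centering implicit); names and the example distances are illustrative.

def context_adjustment(distances, sensitivities):
    # Product over factors k of (1 + delta_k * d_jk), where d_jk is the
    # standardized distance from the evidence-weighted mean on factor k.
    adj = 1.0
    for k, d in distances.items():
        adj *= 1.0 + sensitivities[k] * d
    return adj

def widened_ci(ci_low, ci_high, adjustment):
    # Scale the half-width about the midpoint by (1 + 0.5 * |1 - adj|)
    # when extrapolating beyond the evidence distribution.
    mid = (ci_low + ci_high) / 2.0
    half = (ci_high - ci_low) / 2.0 * (1.0 + 0.5 * abs(1.0 - adjustment))
    return mid - half, mid + half

deltas = {"D": 0.15, "P": 0.10, "I": 0.20, "C": 0.10}  # table defaults
adj = context_adjustment({"D": -0.4, "P": 0.1, "I": -0.5, "C": 0.2}, deltas)
# adj ≈ 0.87: estimated effects shrink and the CI widens modestly.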

56.10 Recommendation Generation

56.10.1 Recommendation Types

| Type | Question | When to Use | Example |
|---|---|---|---|
| Enact | “Add this?” | New policy the jurisdiction doesn’t have | “ENACT primary seat belt law” |
| Replace | “Change this?” | Modify existing policy level or approach | “REPLACE tobacco tax: $1.41 → $2.50” |
| Repeal | “Remove this?” | Remove policy with negative evidence | “REPEAL [harmful policy]” |
| Maintain | “Keep this?” | Current policy is evidence-supported | “MAINTAIN DUI threshold at 0.08 BAC” |

For continuous policies (taxes, spending levels), Replace specifies the change from current to optimal level. Enact is reserved for truly new policies that don’t exist in the jurisdiction.

56.10.2 Blocking Factors

Recommendations flag constraints that may impede adoption:

| Blocking Factor | Severity | Description | Example |
|---|---|---|---|
| Constitutional Constraint | Hard | Requires constitutional amendment | Takings Clause limits on land use regulations |
| Federal Preemption | Hard | Federal law prevents state/local action | Federal minimum wage floor |
| Treaty Obligation | Hard | International agreement constrains policy | WTO rules on tariffs |
| Autonomy Concern | Soft | Restricts individual freedom/choice | Mandatory helmet laws |
| Political Feasibility | Soft | Strong organized opposition | Industry lobbying |
| Implementation Cost | Soft | High fixed costs to implement | New regulatory agency needed |

Design rationale: Why blocking factors are metadata only

OPG produces evidence-based rankings, not political forecasts. Blocking factors are flagged but do not affect algorithmic priority scores, for three reasons:

  1. Political feasibility shifts over time. A policy “impossible” in 2020 may be mainstream by 2025. Filtering by current political feasibility would lock in the status quo and fail to surface the evidence-supported set.

  2. Politicians know their context. An elected official in Texas understands local political dynamics better than any algorithm. OPG provides the evidence; filtering is left to policymaker judgment.

  3. Autonomy tradeoffs require human judgment. A universal helmet law may save lives but restrict freedom. This is a value judgment, not an evidence question. OPG surfaces the health/income effects; the autonomy tradeoff is for democratic deliberation.

Hard vs. Soft blocking factors:

  • Hard blockers (constitutional, preemption, treaty): These represent legal impossibility at the current jurisdictional level. Recommendations with hard blockers are marked distinctly but still shown, as they may inform advocacy for constitutional change or higher-level policy.

  • Soft blockers (political, cost, autonomy): These represent practical difficulty, not impossibility. Many transformative policies faced “impossible” political opposition before adoption.

Important: The full evidence-supported recommendation set is always shown. Users can filter by blocking factor severity if desired, but the default view shows all recommendations ranked by expected welfare impact.

56.10.3 Similar Jurisdictions

For each recommendation, OPG identifies jurisdictions that:

  1. Had similar characteristics to the target jurisdiction
  2. Adopted the recommended policy
  3. Experienced the predicted effects

This provides concrete examples for policymakers: “Vermont (similar demographics, adopted this in 2015, saw -7.1 pp smoking reduction).”

How to find good examples to copy: find places like you, who did the thing, and didn’t collapse. It’s like plagiarism, but encouraged.


56.10.3.1 Computing Jurisdiction Similarity

Similarity between jurisdictions \(j_1\) and \(j_2\) is computed as a weighted sum across three dimensions:

\[ \text{sim}(j_1, j_2) = w_D \cdot \text{sim}_D(j_1, j_2) + w_P \cdot \text{sim}_P(j_1, j_2) + w_I \cdot \text{sim}_I(j_1, j_2) \]

Where default weights are \(w_D = 0.4\), \(w_P = 0.3\), \(w_I = 0.3\).

Demographic Similarity (\(\text{sim}_D\)):

| Variable | Weight | Normalization |
|---|---|---|
| Log GDP per capita | 0.25 | By cross-jurisdiction SD |
| Population log | 0.15 | By cross-jurisdiction SD |
| Median age | 0.20 | By cross-jurisdiction SD |
| Urban population % | 0.15 | By cross-jurisdiction SD |
| Education (years) | 0.15 | By cross-jurisdiction SD |
| Gini coefficient | 0.10 | By cross-jurisdiction SD |

\[ \text{sim}_D = 1 - \frac{\sum_k w_k |z_{j_1,k} - z_{j_2,k}|}{\sum_k w_k \cdot 4} \]

Where \(z\) values are z-scores and the denominator normalizes to [0,1] (4 SD maximum difference).

Institutional Similarity (\(\text{sim}_I\)):

| Feature | Comparison |
|---|---|
| Federal vs. unitary | Binary match (1.0 if same, 0.5 if different) |
| Legal tradition | Common law, civil law, mixed (1.0/0.5/0.0) |
| Enforcement capacity | World Bank governance indicator proximity |
| Corruption level | Transparency International CPI proximity |

Usage: Jurisdictions with \(\text{sim}(j_1, j_2) > 0.7\) are considered “similar” for evidence transfer purposes. Effect estimates from similar jurisdictions receive higher weight in context adjustment.
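
A sketch of the demographic component, assuming z-score differences are capped at 4 SD so the score stays in \([0, 1]\) (the normalization above implies this cap); names are illustrative.

SIM_D_WEIGHTS = {"log_gdp_pc": 0.25, "log_pop": 0.15, "median_age": 0.20,
                 "urban_pct": 0.15, "education_yrs": 0.15, "gini": 0.10}

def demographic_similarity(z1, z2, weights=SIM_D_WEIGHTS):
    # Weighted mean absolute z-score distance, normalized by the 4-SD
    # maximum so the score lands in [0, 1]; differences capped at 4 SD.
    total_w = sum(weights.values())
    dist = sum(w * min(abs(z1[k] - z2[k]), 4.0) for k, w in weights.items())
    return 1.0 - dist / (4.0 * total_w)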

56.11 Optimal Jurisdictional Level for Policy Implementation

56.11.1 The Subsidiarity Principle for Evidence Generation

OPG recommends policies be implemented at the lowest jurisdictional level where the policy can be effective, for two reasons:

  1. Maximize experimental data: 50 states experimenting > 1 federal policy. 3,000+ counties > 50 states. More jurisdictions = more natural experiments = faster evidence accumulation.

Federal level: little data, big risk. County level: lots of data, small risk. It’s safer to experiment in Shropshire than with the entire country.

  2. Minimize harm from policy failures: A failed city ordinance affects thousands; a failed federal policy affects hundreds of millions. Lower-level experimentation bounds downside risk.

56.11.2 When Higher Levels Are Necessary

Some policies require higher jurisdictional levels:

| Reason | Example | Recommendation |
|---|---|---|
| Externalities | Pollution crosses borders | State or federal |
| Race-to-bottom risk | Labor standards, tax competition | Federal floor, state variation above |
| Network effects | Infrastructure standards | Federal coordination |
| Economies of scale | Defense, diplomacy | National |

56.11.3 Jurisdictional Level in Recommendations

For each policy recommendation, OPG specifies:

| Field | Example |
|---|---|
| Minimum effective level | “City or higher” |
| Recommended level | “City (maximize data collection)” |
| Current adoption | “12 states, 47 cities have this” |
| Level constraints | “Federal preemption prevents city-level” |

56.12 Policy Impact Score (Intermediate Metric)

56.12.1 Overview

The Policy Impact Score (PIS) is the intermediate metric used to generate recommendations. It quantifies the strength of evidence that a policy affects an outcome, combining effect magnitude, causal confidence, and analysis quality into a single score.

56.12.2 Jurisdiction-Level PIS Calculation

How to calculate if a policy works: add up how big the effect is, how sure we are, and how good the data is, for both money and health. Then argue about the number.


For each jurisdiction \(j\) and policy \(p\), compute PIS separately for each of the two metrics:

\[ \text{PIS}^{\text{inc}}_{jp} = |\hat{\beta}^{\text{inc}}_{jp}| \times \text{CCS}^{\text{inc}}_{jp} \times Q_{jp} \]

\[ \text{PIS}^{\text{hlth}}_{jp} = |\hat{\beta}^{\text{hlth}}_{jp}| \times \text{CCS}^{\text{hlth}}_{jp} \times Q_{jp} \]

Where:

  • \(|\hat{\beta}^{\text{inc}}_{jp}|\) = Absolute standardized effect on median income growth (pp/year)
  • \(|\hat{\beta}^{\text{hlth}}_{jp}|\) = Absolute standardized effect on healthy life years
  • \(\text{CCS}\) = Causal Confidence Score from Bradford Hill criteria (see Section 56.7.3)
  • \(Q_{jp}\) = Quality adjustment factor based on analysis method

The combined PIS for ranking purposes is:

\[ \text{PIS}_{jp} = 0.5 \times \text{PIS}^{\text{inc}}_{jp} + 0.5 \times \text{PIS}^{\text{hlth}}_{jp} \]
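
A minimal sketch of the metric-level and combined scores under the definitions above (function names are illustrative):

def policy_impact_score(beta_std, ccs, quality):
    # |standardized effect| x causal confidence x method-quality factor.
    return abs(beta_std) * ccs * quality

def combined_pis(beta_inc_std, ccs_inc, beta_hlth_std, ccs_hlth, quality):
    # Equal-weight combination used only for ranking; the two
    # metric-level scores are always reported separately (56.12.3).
    return (0.5 * policy_impact_score(beta_inc_std, ccs_inc, quality)
            + 0.5 * policy_impact_score(beta_hlth_std, ccs_hlth, quality))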

56.12.3 Always Report Both Metrics Separately

While combined PIS is useful for ranking, recommendations should always display effects on both metrics:

| Policy | Income Effect | Health Effect | CCS | Grade |
|---|---|---|---|---|
| Primary seat belt | +0.02 pp/yr | +0.15 years | 0.81 | A |
| Speed limit reduction | +0.01 pp/yr | +0.06 years | 0.73 | B |

The two-metric format makes tradeoffs explicit when they exist, preventing policies from hiding negative effects behind aggregated scores.

56.12.4 Effect Estimate Standardization

Each metric is standardized using cross-jurisdictional standard deviations:

\[ \hat{\beta}^{\text{inc}}_{\text{std}} = \frac{\hat{\beta}^{\text{inc}}_{\text{raw}}}{\sigma_{\text{income}}} \quad\quad \hat{\beta}^{\text{hlth}}_{\text{std}} = \frac{\hat{\beta}^{\text{hlth}}_{\text{raw}}}{\sigma_{\text{health}}} \]

Where \(\sigma_{\text{income}}\) is the cross-jurisdictional SD of median income growth (typically ~1.5 pp/year) and \(\sigma_{\text{health}}\) is the cross-jurisdictional SD of healthy life expectancy (typically ~3-5 years).

56.12.5 Quality Adjustment Factor

\[ Q = w_{\text{method}} \cdot (1 - \text{violations}) \]

Where:

  • \(w_{\text{method}}\) = Method confidence weight (see table above)
  • \(\text{violations}\) = Proportion of validity checks failed (parallel trends, pre-treatment fit, etc.)

56.12.6 Confounder Adjustment

For each analysis, we track which confounders were controlled:

{
    "confounders_controlled": ["gdp_growth", "unemployment", "population_age_structure"],
    "confounders_not_controlled": ["neighboring_policy_spillovers", "measurement_error"],
    "confounder_sensitivity": 0.85
}

The confounder_sensitivity field estimates how much the effect estimate might change if uncontrolled confounders were addressed (Oster’s delta162).

Policy causes outcome, but other things also cause outcome. We control for the things we know about. The things we don’t know about are called ‘oops.’


56.13 Global (Aggregate) PIS Calculation

Aggregate estimates combine jurisdiction-level analyses via random-effects meta-analysis.

56.13.1 Pooled Effect Estimate

\[ \hat{\beta}_{\text{pooled}} = \frac{\sum_j w_j \hat{\beta}_j}{\sum_j w_j} \]

Where weights incorporate both within-study variance and between-study heterogeneity:

\[ w_j = \frac{1}{\text{SE}_j^2 + \tau^2} \]

56.13.2 Pooled PIS Across Jurisdictions

The aggregate PIS for policy \(p\) and outcome \(o\) is:

\[ \text{PIS}_{\text{pooled}} = \frac{\sum_j w_j \cdot \text{PIS}_j}{\sum_j w_j} \]

This precision-weighted average gives more influence to high-precision estimates (low SE) while accounting for true heterogeneity (\(\tau^2\)).

56.13.3 Heterogeneity Statistics

Following standard meta-analysis conventions148:

  • I²: Percentage of variance due to heterogeneity (vs. sampling error)

    • \(I^2 < 25\%\): Low heterogeneity
    • \(25\% \leq I^2 < 75\%\): Moderate heterogeneity
    • \(I^2 \geq 75\%\): High heterogeneity (effects vary substantially across jurisdictions)
  • τ²: Estimated between-study variance

  • Q statistic: Cochran’s test for heterogeneity

High heterogeneity suggests moderators (policy effects vary by context) rather than a single true effect.
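
A minimal NumPy sketch of the DerSimonian-Laird procedure behind Sections 56.13.1-56.13.3, producing the pooled estimate, \(\tau^2\), and \(I^2\). This is illustrative, not a production implementation.

import numpy as np

def dersimonian_laird(betas, ses):
    # Fixed-effect weights and estimate, needed for Cochran's Q.
    betas, ses = np.asarray(betas, float), np.asarray(ses, float)
    w_fe = 1.0 / ses**2
    beta_fe = np.sum(w_fe * betas) / np.sum(w_fe)
    q = np.sum(w_fe * (betas - beta_fe) ** 2)  # Cochran's Q statistic
    df = len(betas) - 1
    # Method-of-moments estimate of between-study variance tau^2.
    c = np.sum(w_fe) - np.sum(w_fe**2) / np.sum(w_fe)
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights 1/(SE_j^2 + tau^2), pooled estimate, variance.
    w = 1.0 / (ses**2 + tau2)
    pooled = np.sum(w * betas) / np.sum(w)
    var_pooled = 1.0 / np.sum(w)
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # I^2 heterogeneity share
    return pooled, var_pooled, tau2, i2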

56.13.4 Evidence Grading

Evidence grades are assigned using explicit thresholds on PIS, heterogeneity (\(I^2\)), and jurisdiction count (\(N_j\)):

\[ \text{Grade} = \begin{cases} A & \text{if } \text{PIS} \geq 0.80 \text{ AND } I^2 < 0.50 \text{ AND } N_j \geq 5 \\ B & \text{if } \text{PIS} \geq 0.60 \text{ AND } I^2 < 0.50 \text{ AND } N_j \geq 3 \\ C & \text{if } \text{PIS} \geq 0.40 \text{ AND } I^2 < 0.75 \text{ AND } N_j \geq 2 \\ D & \text{if } \text{PIS} \geq 0.20 \\ F & \text{otherwise} \end{cases} \]

Grade interpretation:

| Grade | PIS Threshold | Heterogeneity | Jurisdictions | Interpretation |
|---|---|---|---|---|
| A | \(\geq 0.80\) | \(I^2 < 50\%\) | \(\geq 5\) | Strong evidence; ready for implementation |
| B | \(\geq 0.60\) | \(I^2 < 50\%\) | \(\geq 3\) | Good evidence; consider piloting |
| C | \(\geq 0.40\) | \(I^2 < 75\%\) | \(\geq 2\) | Suggestive evidence; suitable for targeted pilots |
| D | \(\geq 0.20\) | Any | Any | Weak evidence; exploratory only |
| F | \(< 0.20\) | Any | Any | Insufficient evidence |
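
The decision rule transcribes directly to code; the modifiers listed further below (conflicting evidence, RCT override, single-jurisdiction cap) are applied separately. A minimal sketch (function name illustrative):

def evidence_grade(pis, i2, n_jurisdictions):
    # Thresholds from the grading table; i2 is a fraction in [0, 1].
    if pis >= 0.80 and i2 < 0.50 and n_jurisdictions >= 5:
        return "A"
    if pis >= 0.60 and i2 < 0.50 and n_jurisdictions >= 3:
        return "B"
    if pis >= 0.40 and i2 < 0.75 and n_jurisdictions >= 2:
        return "C"
    if pis >= 0.20:
        return "D"
    return "F"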

Threshold benchmarking methodology:

These thresholds can be benchmarked against historical policy adoption outcomes. The benchmarking procedure:

  1. Historical validation: Apply OPG grading to policies adopted 10+ years ago with known outcomes
  2. Target validation rates: Grade A recommendations should validate at 70%+ rate; Grade B at 50%+
  3. Threshold adjustment: If observed validation rates differ from targets, adjust PIS and \(I^2\) thresholds

Heterogeneity threshold rationale:

The \(I^2 < 50\%\) threshold for Grades A and B follows Cochrane Collaboration guidance that heterogeneity above 50% indicates “substantial” variability across studies148. Grade C allows heterogeneity up to 75% (the “high” threshold) with explicit acknowledgment that effects are context-dependent. Above 75%, pooled estimates provide limited guidance for any specific jurisdiction.

Evidence grading decision rule (text summary): start with PIS threshold, then apply heterogeneity threshold (\(I^2\)), then jurisdiction count (\(N_j\)). The canonical Grade A/B threshold in this spec is \(I^2 < 50\%\).

Additional grade modifiers:

  • Conflicting evidence: Downgrade by 1 letter if direction of effect differs across high-quality studies
  • High-quality RCT: Automatic Grade A if RCT with low risk of bias, regardless of other criteria
  • Single jurisdiction: Maximum Grade C unless effect is extraordinarily large (\(|\hat{\beta}| > 1.0\) SD)
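
Taken together, the cases formula and the modifiers form a small decision procedure. A sketch with \(I^2\) on a 0-1 scale; argument names are illustrative, and the modifier ordering is one reasonable reading of the rules above:

```python
def evidence_grade(pis, i2, n_j, conflicting=False, rct_low_bias=False,
                   abs_beta_sd=0.0):
    """Assign an A-F grade from PIS, I^2, and jurisdiction count,
    then apply the grade modifiers described above."""
    if pis >= 0.80 and i2 < 0.50 and n_j >= 5:
        grade = "A"
    elif pis >= 0.60 and i2 < 0.50 and n_j >= 3:
        grade = "B"
    elif pis >= 0.40 and i2 < 0.75 and n_j >= 2:
        grade = "C"
    elif pis >= 0.20:
        grade = "D"
    else:
        grade = "F"
    if rct_low_bias:
        grade = "A"                               # RCT override
    if conflicting and grade in "ABCD":
        grade = {"A": "B", "B": "C", "C": "D", "D": "F"}[grade]
    if n_j == 1 and abs_beta_sd <= 1.0 and grade in ("A", "B"):
        grade = "C"                               # single-jurisdiction cap
    return grade
```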

56.13.5 Context-Specific Confidence

Effects may vary by jurisdiction characteristics. We report confidence separately for:

Context Description Example Modifier
High-income countries OECD members, GDP/capita > $30K Tax policy effects
Low-income countries GDP/capita < $5K Different institutional capacity
Federal systems Policy set at national level vs. subnational variation
Subnational States, provinces, cities Local policy autonomy

56.14 Quality Requirements & Benchmarking

56.14.1 Minimum Thresholds for Inclusion

Criterion Minimum Rationale
Pre-treatment periods 4 Need to assess pre-trends
Post-treatment periods 2 Need to observe effect
Outcome observations 20 Statistical power
Control jurisdictions (for DiD) 5 Donor pool size
Pre-treatment RMSE (synthetic control) < 2 SD Acceptable pre-treatment fit
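
A minimal inclusion filter implementing these thresholds; the dictionary keys are hypothetical field names, not schema defined in this spec:

```python
# Minimum thresholds from the table above (hypothetical field names)
MINIMUMS = {"pre_periods": 4, "post_periods": 2,
            "outcome_obs": 20, "did_controls": 5}

def failed_criteria(pair):
    """Return the inclusion criteria a policy-outcome pair fails to meet."""
    failures = [k for k, v in MINIMUMS.items() if pair.get(k, 0) < v]
    # Synthetic control fit, only meaningful when an SC run exists
    if pair.get("sc_pretreatment_rmse_sd", 0.0) >= 2.0:
        failures.append("sc_pretreatment_rmse_sd")
    return failures
```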

56.14.3 Pre-Treatment Fit (Synthetic Control)

How to check if your fake control group is good enough: measure error, try fake treatments, reject if it’s rubbish. Quality control for imaginary things.

For synthetic control analyses:

  1. Calculate RMSE of synthetic vs. actual treated unit pre-treatment
  2. Compare to distribution of placebo RMSEs (treating each donor as “treated”)
  3. If treated RMSE is in top 10% of placebo RMSEs, flag as poor fit
  4. Report ratio of post-treatment effect to pre-treatment RMSE
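
A sketch of steps 2-4, assuming pre-treatment RMSEs for the treated unit and for each placebo run are already computed:

```python
import numpy as np

def synthetic_control_fit_checks(treated_rmse, placebo_rmses, post_effect):
    """Rank treated RMSE among placebos; report post-effect / pre-RMSE ratio."""
    placebo_rmses = np.asarray(placebo_rmses, float)
    poor_fit = treated_rmse >= np.quantile(placebo_rmses, 0.90)  # top 10%
    ratio = post_effect / treated_rmse
    return poor_fit, ratio
```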

56.14.4 Placebo and Robustness Tests

Test Purpose Implementation
In-time placebo Does “treatment” show effect before it happened? Assign fake treatment date before actual
In-space placebo Do untreated units show similar effects? Apply analysis to control jurisdictions
Leave-one-out Is result driven by single jurisdiction? Re-estimate dropping each jurisdiction
Bandwidth sensitivity (For RDD) Is result robust to bandwidth choice? Estimate with multiple bandwidths
Covariate adjustment Does controlling for confounders change result? Add covariates, compare estimates

56.15 Interpreting Recommendations

56.15.1 Priority Tiers

Tier Criteria Action
Quick Wins High impact, low blocking factors, Grade A evidence Immediate adoption recommended
Major Reforms High impact, significant blocking factors Requires political capital; strategic timing
Long-Term Moderate impact, constitutional or treaty constraints Requires structural change
Monitor Moderate impact, Grade C/D evidence Watch for better evidence

56.15.2 Political Feasibility Notes

While OPG does not filter by political feasibility, it provides context:

  • Organized opposition: Industries or groups likely to lobby against
  • Public opinion: Polling data on similar policies where available
  • Adjacent jurisdictions: Whether neighbors have adopted (diffusion effects)
  • Historical attempts: Previous failed attempts and why

56.15.3 Sequencing Guidance

Start with easy wins, build momentum, bundle things together, hit critical mass. It’s like a diet plan, but for governance and with better success rates.

Some policies are easier to adopt after others:

  1. Quick wins first: Build political capital with easy, high-impact changes
  2. Complementary bundles: Some policies work better together
  3. Threshold effects: Some benefits only appear after critical mass of policies

56.16 Effect Size Benchmarks

Effect sizes are calibrated to cross-jurisdictional variation to aid interpretation:

Size Income (pp/year) Health (years) Example
Small < 0.05 < 0.1 Minor regulatory changes
Medium 0.05 - 0.15 0.1 - 0.3 Typical tax policy effects
Large 0.15 - 0.30 0.3 - 0.5 Major reform programs
Very Large > 0.30 > 0.5 Transformative policies (rare)

Calibration basis: US states vary by ~1.5 pp/year in median income growth and ~3-5 years in healthy life expectancy. A “medium” effect represents ~10% of cross-state variation.

Confidence interval interpretation:

  • Narrow (< 25% of effect): Precise estimate; high confidence
  • Moderate (25-50% of effect): Reasonable precision
  • Wide (> 50% of effect): Imprecise; low confidence
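
Both the effect-size benchmarks and the CI-width bands are simple threshold lookups; a sketch (assuming a nonzero point estimate):

```python
def effect_size_class(income_pp=None, health_years=None):
    """Classify an income (pp/year) or health (years) effect per the benchmarks."""
    def bucket(x, small, large, very_large):
        if x < small: return "Small"
        if x < large: return "Medium"
        if x < very_large: return "Large"
        return "Very Large"
    if income_pp is not None:
        return bucket(abs(income_pp), 0.05, 0.15, 0.30)
    return bucket(abs(health_years), 0.1, 0.3, 0.5)

def ci_precision(effect, lo, hi):
    """Narrow/Moderate/Wide based on CI width relative to the effect size."""
    rel = (hi - lo) / abs(effect)
    return "Narrow" if rel < 0.25 else "Moderate" if rel <= 0.50 else "Wide"
```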

For the complete two-metric framework definition, see Section 56.1.

56.17 Trial Prioritization

56.17.1 Value of Information Calculation

The expected value of running a randomized trial on policy \(p\) is:

\[ \text{VOI}_p = P(\text{adopt}|\text{trial}) \cdot E[\text{benefit}|\text{trial}] - P(\text{adopt}|\text{no trial}) \cdot E[\text{benefit}|\text{no trial}] - \text{Cost}_{\text{trial}} \]

Policies with high VOI have:

  • High prior uncertainty: Current evidence is inconclusive
  • High potential impact: If the policy works, benefits are large
  • Low trial cost: Policy can be randomized in small jurisdictions cheaply
  • Decision relevance: Trial result would change adoption decision
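
The formula itself reduces to a few arithmetic operations; a sketch with fabricated inputs:

```python
def value_of_information(p_adopt_trial, benefit_trial,
                         p_adopt_no_trial, benefit_no_trial, trial_cost):
    """Expected value of running a trial, per the VOI formula above."""
    return (p_adopt_trial * benefit_trial
            - p_adopt_no_trial * benefit_no_trial
            - trial_cost)

# Fabricated example: a trial that triples adoption probability
voi = value_of_information(0.6, 120.0, 0.2, 120.0, 15.0)  # 72 - 24 - 15 = 33
```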

56.17.2 Natural Experiment Identification

The system automatically identifies potential natural experiments:

Type Identification Method Example
Border discontinuity Adjacent jurisdictions with different policies Minimum wage differences at state borders
Temporal discontinuity Abrupt policy change Court ruling invalidating previous policy
Eligibility threshold Sharp cutoff for policy application Income threshold for benefit eligibility
Staggered adoption Different jurisdictions adopting at different times Unemployment insurance extensions by state
Lottery Random assignment (rare) Charter school lotteries
Court mandate Externally imposed change Desegregation orders

Identified natural experiments are stored in the natural_experiments table for validation.

56.18 Data Sources

56.18.1 Primary Policy Databases

Database Coverage URL Use Case
V-Dem 202 countries, 1789-present v-dem.net163 Democracy indices, political institutions
Polity V 167 countries, 1800-present systemicpeace.org164 Regime type, political stability
CPDS 36 OECD, 1960-present cpds-data.org165 Economic policy, welfare state
OECD iLibrary OECD members oecd-ilibrary.org166 Tax, labor, education policy
Congress.gov US federal, 1973-present congress.gov167 US federal legislation
EUR-Lex EU, 1951-present eur-lex.europa.eu168 EU legislation and regulations

56.18.2 Primary Outcome Databases

Database Coverage URL Use Case
World Bank WDI 217 countries, 1960-present data.worldbank.org169 GDP, poverty, education, health
Our World in Data Global, varies ourworldindata.org170 Curated outcome metrics
WHO GHO Global who.int/data/gho171 Health outcomes
Penn World Tables 183 countries, 1950-present ggdc.net/pwt172 GDP, productivity, prices
SIPRI Global, 1949-present sipri.org173 Military spending
IMF 190 countries imf.org/data174 Fiscal, monetary indicators

56.18.3 Subnational Data

Country Source Coverage
United States Census Bureau, BLS, state agencies 50 states + territories
European Union Eurostat regional database ~300 NUTS-2 regions
India CMIE, NSS, state data portals 28 states + territories
China National Bureau of Statistics 31 provinces
Brazil IBGE 27 states

56.18.4 Jurisdiction Policy Inventory Sources

Level Source Coverage
US States NCSL State Legislation Database All 50 states, continuous updates
US States State government websites Primary verification
US Cities Municode, American Legal Publishing Major cities
Countries OECD Government at a Glance OECD members
Countries World Bank Doing Business (archived) 190 economies
EU EUR-Lex All member states

56.19 Limitations

56.19.1 Oracle Capture Risk

The measurement process itself can be captured:

  1. Outcome measurement: Agencies reporting outcomes have incentives to manipulate
  2. Policy implementation dates: Recording when policies “really” took effect is subjective
  3. Confounder selection: Which confounders to control affects estimates

Mitigation: Multiple independent data sources, pre-registered analysis protocols, adversarial audits.

56.19.2 Confounding Severity

Policy effects face more confounding than drug trials:

Confounder Type Example Mitigation
Economic cycles Recession coincides with policy Control for GDP growth, unemployment
Secular trends Improving health over time Include time trends, compare to controls
Selection Jurisdictions adopting policies differ Matching, synthetic control
Spillovers Neighboring policies affect outcomes Spatial controls, SUTVA violations noted
Reverse causality Outcomes drive policy adoption Instruments, timing-based identification

56.19.3 Heterogeneous Effects

Policy effects vary by:

  • Jurisdiction characteristics (income, institutions, culture)
  • Implementation fidelity
  • Complementary policies
  • Time period

High heterogeneity (I² > 75%) suggests context-dependence rather than universal effects.

Same policy, different places, different results. Turns out context matters. Who knew, apart from everyone who’s ever tried anything anywhere.

56.19.4 Jurisdiction-Specific Caveats

Caveat Description Mitigation
Data completeness Policy inventory may be incomplete Flag data quality; recommend verification
Context transfer Effect in State A may not transfer to State B Adjust for observable differences; widen CIs
Implementation variation Same policy, different enforcement Track implementation quality where possible
Interaction effects Effect depends on other policies in place Model policy bundles, not just single policies

56.19.5 Time-Varying Effects

  • Short-run vs. long-run: Immediate effects may differ from sustained effects
  • Policy drift: Implementation changes over time (amendment_notes tracking)
  • Adaptation: Jurisdictions and individuals adapt to policies

The event study design explicitly models dynamic effects; we report both immediate and sustained impact estimates.

Immediate effect, people adapt, effect drifts, long-run effect settles. Policies age like milk, not wine.

56.19.6 Publication Bias

Studies that find nothing don’t get published, so we think everything works. Funnel plots fish the failures out of the file drawer. Science learns to count its zeros.

The policy evaluation literature suffers from systematic publication bias:

  1. Null effects underreported: Studies finding “no significant effect” are less likely to be published
  2. Positive framing: Researchers may frame results to emphasize statistically significant findings
  3. File drawer problem: Failed replications rarely published
  4. Jurisdiction selection: Jurisdictions with cleaner natural experiments are overrepresented

Mitigation strategies:

  • Weight by inverse probability of publication (using funnel plot asymmetry tests)
  • Require pre-registration of analysis protocols before data access
  • Include unpublished working papers and government reports
  • Apply trim-and-fill or PET-PEESE corrections for funnel plot asymmetry
  • Report null findings prominently in the database
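
One concrete asymmetry diagnostic from the list above is Egger's regression test; a sketch, assuming per-study effects and standard errors are available (trim-and-fill and PET-PEESE corrections would build on the same inputs):

```python
import numpy as np
from scipy import stats

def egger_test(betas, ses):
    """Egger's test: a nonzero intercept when regressing standardized
    effects on precision suggests funnel-plot asymmetry."""
    z = np.asarray(betas, float) / np.asarray(ses, float)
    precision = 1.0 / np.asarray(ses, float)
    res = stats.linregress(precision, z)
    t = res.intercept / res.intercept_stderr      # t-test on the intercept
    p = 2 * stats.t.sf(abs(t), df=len(betas) - 2)
    return res.intercept, p
```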

56.19.7 Operational Scope

OPG is built to rank policies by expected welfare impact, generate jurisdiction-specific recommendations, identify strong candidates for piloting, quantify heterogeneity, and flag likely harms. It relies on quasi-experimental designs with explicit diagnostics, robustness checks, and jurisdiction-level context adjustment.

Randomized trials, local implementation knowledge, and expert review complement OPG by deepening the evidence base where higher-resolution evidence is valuable.

56.20 Benchmarking Framework

56.20.1 The Critical Question

The decisive question is straightforward: Do jurisdictions that adopt high-priority OPG recommendations see better outcomes than those that don’t?

56.20.2 Addressing Adoption Bias

A naive retrospective comparison suffers from adoption bias: jurisdictions that voluntarily adopt policies may differ systematically from those that don’t. States adopting tobacco tax increases may already have anti-smoking momentum, overstating the causal effect of the tax itself.

Instrumental variable approach:

To address adoption bias, validation should exploit exogenous shocks to adoption:

Exogenous Shock Example Rationale
Court rulings State court strikes down previous policy Adoption forced by legal ruling, not political choice
Federal mandates Clean Air Act state implementation Compliance driven by federal law, not state preference
Close electoral outcomes Ballot measure passes 51-49% Near-randomization around threshold
Leadership turnover New governor from different party Adoption reflects leadership change, not underlying trends

These quasi-random adoption events provide cleaner tests of OPG predictions than voluntary adoption comparisons.

56.20.3 Retrospective Benchmark Study

Design: Retrospective prediction using instrumental variable identification.

Check if the system would have been right in the past: run the model on old data, identify adopted policies, compare predictions to reality, grade yourself. It’s like marking your own homework, but honest.

Method:

  1. Compute OPG recommendations for all jurisdictions using only data available before a cutoff date (e.g., 2015)
  2. Identify exogenously-induced policy adoptions (court rulings, mandates, close votes) after the cutoff
  3. Compare actual outcome changes in adopting jurisdictions to OPG predictions
  4. Assess prediction accuracy and prioritization value

Success Metrics:

Metric Definition Target
Discrimination (AUC) Does adopting recommendations predict “welfare improved”? AUC > 0.70
Calibration Correlation between predicted effect and actual effect r > 0.5
Prioritization value High-priority validation rate vs. low-priority rate Ratio > 2:1
False positive rate High-priority recommendations that harmed welfare < 10%

Expected Outcomes:

  • If high-priority recommendations show validation rate of 60%+ and low-priority show rate < 30%, the system has practical utility
  • If no discrimination observed, the methodology needs recalibration or fundamental revision
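
A sketch of the discrimination and calibration computations, assuming a validation table of predicted effects, realized effects, and a binary "welfare improved" flag (both outcome classes must be present):

```python
import numpy as np

def validation_metrics(predicted, actual, improved):
    """Rank-based AUC (discrimination) and Pearson r (calibration)."""
    pred = np.asarray(predicted, float)
    act = np.asarray(actual, float)
    imp = np.asarray(improved, bool)
    # AUC via the Mann-Whitney identity: P(score_improved > score_not)
    pos, neg = pred[imp], pred[~imp]
    auc = (np.mean(pos[:, None] > neg[None, :])
           + 0.5 * np.mean(pos[:, None] == neg[None, :]))
    r = np.corrcoef(pred, act)[0, 1]
    return auc, r
```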

56.20.4 Prospective Pre-Registration

To prevent hindsight bias, OPG should publish recommendations before adoption decisions are made:

  1. Quarterly publication of jurisdiction-specific recommendations with timestamps
  2. Public pre-commitment to methodology (no post-hoc adjustments)
  3. Tracking of which recommendations were subsequently adopted
  4. Comparison of pre-registered predictions to actual outcomes

This creates an auditable record that prevents retrofitting methodology to match observed outcomes.

Promise what you’ll measure before you measure it, then stick to the promise. Prevents ‘we meant to test that all along’ syndrome.

56.20.5 Benchmarking Questions

  1. Context adjustment accuracy: Do jurisdiction-specific adjustments improve prediction?
  2. Blocking factor impact: Are recommendations with blocking factors less likely to be adopted?
  3. Evidence grade thresholds: Are the A-F grade cutoffs appropriately calibrated?
  4. Heterogeneity interpretation: Does high I² actually indicate context-dependence vs. measurement noise?
  5. Translation pipeline accuracy: Do surrogate→terminal metric conversions introduce systematic bias?

56.20.6 Continuous Improvement via Adoption Feedback

OPG improves through a learning loop:

  1. OPG generates recommendation with expected effect ± uncertainty
  2. Jurisdiction adopts policy at recommended level
  3. Jurisdiction tracks primary metric per tracking guidance
  4. Jurisdiction reports outcomes to OPG feedback system
  5. OPG incorporates new data point into meta-analysis
  6. Future recommendations reflect updated evidence

This transforms OPG from a static evidence aggregator into a self-improving system where every adoption strengthens the evidence base. The tracking guidance included with each recommendation standardizes what data jurisdictions should collect and report.
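
Mechanically, each adoption report is one more row in the meta-analysis. A minimal sketch (holding \(\tau^2\) fixed for brevity; in practice it would be re-estimated after each report):

```python
import numpy as np

def repool_with_report(betas, ses, new_beta, new_se, tau2):
    """Fold one reported outcome into the random-effects pool."""
    betas = np.append(np.asarray(betas, float), new_beta)
    ses = np.append(np.asarray(ses, float), new_se)
    w = 1.0 / (ses ** 2 + tau2)                  # w_j = 1 / (SE_j^2 + tau^2)
    return np.sum(w * betas) / np.sum(w), np.sqrt(1.0 / np.sum(w))
```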

Recommend policy, place tries it, place reports results, analysis updates, better recommendations. It’s machine learning, but for government instead of cat pictures.

Adoption feedback improves the system but does not substitute for exogenous-shock benchmarking. Jurisdictions that implement OPG-recommended policies and report outcomes may differ systematically from those that do not, so instrumental-variable designs remain the cleanest performance test.

56.21 Future Directions

56.21.1 Benchmarking Priorities

Ways to check if predictions work, ranked by importance: retrospective studies, prospective trials, expert review, cross-validation. Trust in descending order.

  1. Retrospective validation study (highest priority): Test OPG predictions against subsequent outcomes
  2. Prospective prediction pre-registration: Publicly commit to recommendations before policy adoption decisions
  3. Domain expert review: Have policy experts assess face validity of rankings
  4. Cross-validation: Hold out jurisdictions, predict their outcomes from others

56.21.2 Data Infrastructure

Collect laws, teach computers to read them, standardize the results, give researchers access. It’s a library, but the books are alive and the librarian is an algorithm.

  1. Automated policy tracking: NLP pipeline to detect policy changes from legislative databases
  2. Outcome harmonization: Standardized outcome definitions across jurisdictions
  3. API access: Enable researchers to query OPG data programmatically
  4. Version control: Track how recommendations change as new data arrives

56.21.3 Integration with Decision-Making

Show data, model scenarios, get feedback, repeat. It’s a disciplined operating loop instead of vibes with better branding.

  1. Policy dashboard: Real-time recommendations for policymakers
  2. Uncertainty communication: Visualizations that convey confidence appropriately
  3. Scenario modeling: “What if” analysis for proposed policies based on similar historical policies
  4. Feedback mechanisms: Track whether recommendations were actually adopted and outcomes realized

56.22 Conclusion

The Optimal Policy Generator provides a systematic framework for translating policy-outcome evidence into jurisdiction-specific recommendations. By comparing each jurisdiction’s current policy inventory to the evidence-supported set, OPG produces recommendations in four categories (enact/replace/repeal/maintain), ranked by expected welfare impact, turning scattered natural-experiment evidence into actionable guidance.

Acknowledgments

[To be added: acknowledgments for seminar participants, reviewers, and colleagues who provided feedback.]

56.23 References

56.24 Appendix A: Worked Example - Texas Policy Recommendations

56.24.1 Warning: SYNTHETIC DATA - NOT EMPIRICAL FINDINGS

All numbers in this appendix are fabricated for illustration. The effect sizes (+0.15 years, +0.02 pp/year, etc.), confidence intervals, and Bradford Hill scores are synthetic placeholders demonstrating the OPG framework’s output format. They were not derived from actual data analysis.

Do not cite these numbers as empirical findings. Actual policy effects would require jurisdiction-specific evidence analysis using real data from the sources described in this specification.

56.24.2 Overview

This worked example demonstrates the complete OPG output for a specific jurisdiction: Texas. It shows how generic policy evidence is translated into jurisdiction-specific recommendations.

56.24.3 Texas Policy Inventory (Sample)

Policy Status Income Effect Health Effect Recommendation Grade
Primary seat belt Missing +0.02 pp/yr +0.15 years ENACT A
Motorcycle helmet (all ages) Partial +0.01 pp/yr +0.08 years ENACT A
Speed limit (85→70 mph) Excessive +0.01 pp/yr +0.06 years REPLACE B
DUI threshold (0.08 BAC) Optimal N/A N/A MAINTAIN A
Graduated licensing Optimal N/A N/A MAINTAIN A

56.24.4 Step 1: Calculate Policy Impact Scores

Example: Primary Seat Belt Law

From meta-analysis of 47 US states (2000-2020):

Metric Effect SE I² Grade
Income (pp/yr) +0.025 0.008 28% A
Health (years) +0.18 0.04 28% A

Income effect derives from reduced healthcare costs and fewer disability-related productivity losses. Health effect converts mortality reduction to median healthy life years.

Bradford Hill Criteria Scores (applies to both metrics):

Criterion Score Rationale
Strength 0.75 Moderate standardized effects on both metrics
Consistency 0.82 I² = 28%, consistent across states
Temporality 0.95 Clear temporal ordering
Plausibility 0.90 Clear mechanism (increased compliance)
Experiment 0.85 Multiple synthetic control studies

CCS = 0.81 → Grade A
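
A sketch of the CCS computation. The criterion weights below are illustrative placeholders, not the spec's calibrated values (see Section 56.7.3); with these placeholder weights the five scores above pool to roughly 0.85 rather than the 0.81 reported:

```python
# Illustrative placeholder weights, not the spec's calibrated values
WEIGHTS = {"strength": 0.25, "consistency": 0.25, "temporality": 0.20,
           "plausibility": 0.15, "experiment": 0.15}

def causal_confidence_score(scores):
    """Weighted average of Bradford Hill criterion scores, each in [0, 1]."""
    return sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)

ccs = causal_confidence_score({"strength": 0.75, "consistency": 0.82,
                               "temporality": 0.95, "plausibility": 0.90,
                               "experiment": 0.85})   # ~0.845 with these weights
```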

56.24.5 Step 2: Apply Context Adjustment for Texas

Factor Texas Value Adjustment
Current seat belt use 91.5% Effect may be smaller (already high)
Rural driving proportion High Effect may be larger (more severe crashes)
Population 29.5M Scale up total state-level impact

Adjusted expected effects for Texas:

  • Income: +0.02 pp/year (slightly smaller due to already-high compliance)
  • Health: +0.15 years
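
One simple functional form for such adjustments is multiplicative; the multipliers below are illustrative assumptions chosen so the pooled +0.18 years shrinks to roughly the +0.15 shown above:

```python
def adjust_for_context(base_effect, multipliers):
    """Scale a pooled effect by jurisdiction-specific multipliers (assumed form)."""
    for factor, m in multipliers.items():
        base_effect *= m
    return base_effect

# High existing compliance shrinks the effect; heavy rural driving enlarges it
tx_health = adjust_for_context(0.18, {"high_compliance": 0.75,
                                      "rural_share": 1.10})   # ~0.15 years
```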

56.24.6 Step 3: Generate Recommendations

OPG Recommendations for Texas

56.25 ENACT (New Policies to Adopt)

1. Primary Seat Belt Enforcement Law (Current: secondary enforcement only)

Metric Expected Effect 95% CI
Income +0.02 pp/year [+0.01, +0.03]
Health +0.15 years [+0.10, +0.20]
  • Evidence grade: A
  • Priority: High
  • Blocking factors: None identified
  • Similar jurisdictions: Florida adopted 2009; saw +0.18 years health effect

2. Universal Motorcycle Helmet Requirement (Current: partial - under 21 only)

Metric Expected Effect 95% CI
Income +0.01 pp/year [+0.005, +0.015]
Health +0.08 years [+0.05, +0.12]
  • Evidence grade: A
  • Priority: Medium
  • Blocking factors: Political (autonomy concerns from rider groups)
  • Similar jurisdictions: California (all ages since 1992)

56.26 REPLACE (Policies to Modify)

3. Maximum Speed Limit: 85 mph → 70 mph

Metric Expected Effect 95% CI
Income +0.01 pp/year [+0.005, +0.02]
Health +0.06 years [+0.03, +0.09]
  • Current level: 85 mph (highest in US)
  • Recommended level: 70 mph
  • Evidence grade: B
  • Priority: Low
  • Blocking factors: Political (driver opposition), autonomy

56.27 REPEAL (Policies to Remove)

No high-priority repeal recommendations for Texas at this time.

(Example format: If Texas had a policy shown to cause net harm, it would appear here with expected welfare gain from removal.)

56.28 MAINTAIN (No Change Needed)

4. DUI Threshold at 0.08 BAC

  • Current level: 0.08 BAC (national standard)
  • Evidence: Aligned with evidence-supported level
  • Status: Continue current policy

5. Graduated Driver Licensing Program

  • Current level: Three-stage system with night/passenger restrictions
  • Evidence: Consistent with best practices
  • Status: Continue current policy

56.28.1 Step 4: Summary Dashboard

Total Expected Welfare Gain by Recommendation Type

Type Recommendation Income Effect Health Effect Grade
ENACT Primary seat belt +0.02 pp/yr +0.15 years A
ENACT Universal helmet +0.01 pp/yr +0.08 years A
REPLACE Speed limit: 85→70 mph +0.01 pp/yr +0.06 years B
MAINTAIN DUI threshold, GDL N/A N/A A
Total from changes +0.04 pp/yr +0.29 years

Note: MAINTAIN items confirm evidence alignment and require no action. REPEAL section empty for Texas; no harmful policies identified with strong evidence.

56.28.2 Interpretation

This example demonstrates how OPG transforms generic evidence into actionable, jurisdiction-specific recommendations using the two-metric framework:

If Texas adopted all recommendations, expected effects are:

  • +0.04 pp/year increase to median income growth
  • +0.29 years added to median healthy life expectancy

How policies affect money versus how they affect not dying. Ideally, both go up. Often, you have to pick one.

The two-metric format enables direct interpretation without complex conversion factors. When genuine tradeoffs exist (health gains with income costs, or vice versa), both effects are reported explicitly rather than hidden behind aggregated scores.

56.29 Appendix B: OPG Analysis Workflow

56.29.1 Complete OPG Pipeline

┌─────────────────────────────────────────────────────────────────┐
│               OPTIMAL POLICY GENERATOR WORKFLOW                  │
└─────────────────────────────────────────────────────────────────┘

Phase 1: DATA COLLECTION
─────────────────────────
1. Policy database ingestion
   ├── Parse legislative text
   ├── Record implementation dates by jurisdiction
   └── Classify policy type and category

2. Jurisdiction policy inventory
   ├── Pull current policy status for each jurisdiction
   ├── Record policy strength (for continuous policies)
   ├── Flag data quality and gaps
   └── Identify last verification date

3. Outcome data collection
   ├── Pull from primary sources (World Bank, WHO, etc.)
   ├── Harmonize units and definitions
   ├── Identify missing data patterns
   └── Flag measurement quality issues

4. Confounder data collection
   ├── Economic indicators (GDP, unemployment)
   ├── Demographic variables (age structure, education)
   ├── Political variables (regime type, election cycles)
   └── Geographic variables (neighbors' policies)

Phase 2: EVIDENCE ANALYSIS (Quasi-Experimental)
───────────────────────────────────────────────
5. Policy-outcome pair identification
   ├── Match policies to plausible outcome categories
   ├── Filter by minimum data requirements
   └── Identify applicable quasi-experimental methods

6. Method selection
   ├── Synthetic control: single treated, good donors
   ├── Difference-in-differences: multiple treated, parallel trends
   ├── Regression discontinuity: sharp threshold exists
   ├── Event study: need dynamic effects
   └── Interrupted time series: fallback

7. Effect estimation
   ├── Run primary analysis
   ├── Calculate standard errors (clustered)
   ├── Compute confidence intervals
   └── Store jurisdiction-level results

8. Robustness checks
   ├── In-time placebo tests
   ├── In-space placebo tests
   ├── Leave-one-out sensitivity
   └── Covariate adjustment sensitivity

Phase 3: AGGREGATION & PIS CALCULATION
──────────────────────────────────────
9. Meta-analysis
   ├── Pool jurisdiction estimates (random effects)
   ├── Calculate I², τ², Q statistics
   ├── Test for publication bias (funnel plot)
   └── Apply trim-and-fill if needed

10. Bradford Hill scoring
    ├── Score each criterion (0-1)
    ├── Apply criterion weights
    ├── Compute CCS (causal confidence score)
    └── Document evidence for each criterion

11. PIS calculation
    ├── Standardize effect estimate
    ├── Calculate quality adjustment
    ├── Compute final PIS
    └── Assign evidence grade (A-F)

Phase 4: RECOMMENDATION GENERATION
──────────────────────────────────
12. Policy gap analysis (per jurisdiction)
    ├── Compare current inventory to evidence-supported
    ├── Calculate gap magnitude
    ├── Identify gap type (missing, harmful, suboptimal)
    └── Flag blocking factors

13. Context adjustment
    ├── Adjust effect estimates for jurisdiction characteristics
    ├── Widen confidence intervals for context uncertainty
    ├── Identify similar jurisdictions for comparison
    └── Note implementation considerations

14. Priority scoring
    ├── Rank by |Gap| × Evidence Grade × Impact
    ├── Assign to priority tiers (Quick Win, Major Reform, etc.)
    ├── Generate enact/replace/repeal/maintain lists
    └── Calculate total expected welfare gain

Phase 5: OUTPUT GENERATION
──────────────────────────
15. Recommendation dashboard
    ├── Enact list (new policies to adopt)
    ├── Replace list (existing policies to modify: current → optimal)
    ├── Repeal list (harmful policies to remove)
    ├── Maintain list (policies aligned with evidence)
    └── Jurisdictional level and tracking guidance for each

16. Two-metric reporting
    ├── Income effect (pp/year)
    ├── Health effect (years)
    └── Combined welfare score (for ranking)

17. Documentation
    ├── Generate jurisdiction-specific reports
    ├── Create methodology audit trail
    ├── Version control all recommendations
    └── Publish to API/dashboard

56.29.2 Minimum Data Requirements Checklist

Before generating recommendations, verify:

  • ≥ 4 pre-treatment and ≥ 2 post-treatment periods for each policy-outcome pair (Section 56.14.1)
  • ≥ 20 outcome observations; for DiD, ≥ 5 control jurisdictions in the donor pool
  • Synthetic control pre-treatment RMSE < 2 SD
  • Jurisdiction policy inventory verified against primary sources, with data-quality gaps flagged
  • Outcome definitions harmonized across jurisdictions

56.30 Appendix C: Glossary

Brief definitions for quick reference. See referenced sections for full details.

Term Definition See Section
OPG Optimal Policy Generator: produces jurisdiction-specific recommendations Section 56.3
PIS Policy Impact Score: effect × causal confidence × quality Section 56.12
CCS Causal Confidence Score: weighted Bradford Hill average Section 56.7.3
Evidence Grade A-F rating based on PIS, heterogeneity, jurisdiction count Section 56.13.4
Policy Gap Difference between current and evidence-supported policy Section 56.9
Blocking Factor Constraint on adoption (constitutional, political, etc.) Section 56.10.2
I² Between-study heterogeneity (>75% = high) Section 56.13
Synthetic Control Weighted donor pool matching pre-treatment trajectory Section 56.7.2
DiD Difference-in-differences under parallel trends Section 56.7.2
RDD Regression discontinuity at eligibility threshold Section 56.7.2

Recommendation types: ENACT (add new), REPLACE (modify level), REPEAL (remove harmful), MAINTAIN (keep current). See Section 56.10.

Bradford Hill criteria: Strength, Consistency, Temporality, Gradient, Experiment, Plausibility, Coherence, Analogy, Specificity. See Section 56.7.3.


Corresponding Author: Mike P. Sinn, Decentralized Institutes of Health ([email protected])

Data Availability: This specification describes a methodological framework. Policy databases referenced (V-Dem, Polity V, CPDS, World Bank WDI) are publicly available at URLs provided in Data Sources section. A complete replication package including data extraction scripts, analysis code, and recommendation generation algorithms will be deposited in a public repository upon system deployment.