Methodology — Five-Method Causal Sensitivity Pentagon

The five methods

Each method addresses a different bias structure.

No single observational method is sufficient. The pentagon delivers convergent or divergent evidence depending on what biases are actually present in your cohort; the Γ-bound then quantifies the residual fragility.

METHOD 01

AIPW — Augmented Inverse Propensity Weighting

Doubly-robust ATT estimator: consistent if EITHER the propensity model OR the outcome model is correctly specified. Resilient to single-model misspecification.

When this method is appropriate: primary workhorse for observational treatment-effect estimation when you have rich measured covariates. Three covariate-enrichment stages supported (base, severity-augmented, trajectory-augmented) to test stability across feature-set richness.
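A minimal sketch of the doubly-robust ATT formula may make the "either model" property concrete. It assumes the propensity scores and control-outcome predictions have already been fitted elsewhere; the function name `aipw_att` and its argument names are illustrative and not part of the Rosenbound platform.

```python
from typing import Sequence


def aipw_att(
    y: Sequence[float],
    t: Sequence[int],
    e_hat: Sequence[float],
    mu0_hat: Sequence[float],
) -> float:
    """Doubly-robust (AIPW-style) estimate of the ATT.

    y        observed outcomes
    t        treatment indicators (1 = treated, 0 = control)
    e_hat    fitted propensity scores P(T=1 | X), assumed strictly below 1
    mu0_hat  fitted control-outcome predictions E[Y | T=0, X]
    """
    n_treated = sum(t)
    if n_treated == 0:
        raise ValueError("no treated units")
    total = 0.0
    for yi, ti, ei, m0 in zip(y, t, e_hat, mu0_hat):
        residual = yi - m0
        if ti == 1:
            # Treated units contribute their outcome-model residual directly.
            total += residual
        else:
            # Controls are reweighted by the odds e/(1-e) to resemble the
            # treated population; their residuals correct outcome-model error.
            total -= ei / (1.0 - ei) * residual
    return total / n_treated
```

If the outcome model is right, the control residuals average to zero and the first term carries the estimate; if only the propensity model is right, the odds-weighted control residuals cancel the outcome model's bias.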

References: Robins, Rotnitzky, & Zhao (1994). "Estimation of regression coefficients when some regressors are not always observed." JASA, 89(427), 846–866. · Bang, H. & Robins, J. M. (2005). "Doubly robust estimation in missing data and causal inference models." Biometrics, 61(4), 962–973.

METHOD 02

DR-ATT with Crump-2009 overlap trim

Doubly-robust ATT (Hahn 1998 estimand, Lunceford-Davidian 2004 formulation) computed on the Crump-trimmed overlap region. Units whose estimated propensity score falls outside [0.10, 0.90] are dropped, so the cohort is restricted to the region where treatment-effect identification is genuinely supported.

When this method is appropriate: small-N panels with possible propensity-score blow-up, or any cohort where overlap is questionable. The trim sacrifices statistical efficiency for identification credibility, and it narrows the estimand to the trimmed population. We report the estimate at three trim thresholds (α = 0.05, 0.10, 0.15) so reviewers can see whether the trim choice drives the result.
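The trim and its sensitivity sweep can be sketched as follows. This is an illustration under stated assumptions, not the platform's implementation: `crump_keep` and `trim_sensitivity` are hypothetical names, and `estimator` stands in for whatever doubly-robust ATT routine is run on the retained units.

```python
from typing import Callable, Dict, List, Sequence


def crump_keep(e_hat: Sequence[float], alpha: float) -> List[int]:
    """Indices of units whose propensity score lies in [alpha, 1 - alpha]."""
    if not 0.0 <= alpha < 0.5:
        raise ValueError("alpha must be in [0, 0.5)")
    return [i for i, e in enumerate(e_hat) if alpha <= e <= 1.0 - alpha]


def trim_sensitivity(
    e_hat: Sequence[float],
    estimator: Callable[[List[int]], float],
    alphas: Sequence[float] = (0.05, 0.10, 0.15),
) -> Dict[float, dict]:
    """Re-run `estimator` on each trimmed subsample and record what was kept."""
    report: Dict[float, dict] = {}
    for alpha in alphas:
        kept = crump_keep(e_hat, alpha)
        report[alpha] = {"n_kept": len(kept), "estimate": estimator(kept)}
    return report
```

Reporting `n_kept` next to each estimate shows how much of the cohort each threshold discards, which is the cost side of the efficiency-for-credibility trade.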

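References: Hahn, J. (1998). "On the role of the propensity score in efficient semiparametric estimation of average treatment effects." Econometrica, 66(2), 315–331. · Lunceford, J. K., & Davidian, M. (2004). "Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study." Statistics in Medicine, 23(19), 2937–2960. · Crump, R. K., Hotz, V. J., Imbens, G. W., & Mitnik, O. A. (2009). "Dealing with limited overlap in estimation of average treatment effects." Biometrika, 96(1), 187–199.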

METHOD 03

IV-LATE — Instrumental Variable, Two-Stage Least Squares

Two-stage least squares for the Local Average Treatment Effect on the complier subpopulation (units whose treatment status is moved by the instrument). When assignment is provider-driven, the instrument is per-prescriber preference (n=4,383 prescribers in MIMIC-IV). Instrument strength is checked with the stage-1 F-statistic, and confidence intervals come from an m-of-n bootstrap.

When this method is appropriate: when an instrument plausibly satisfies the exclusion restriction (provider preference, geographic variation, policy discontinuities). IV-LATE targets a different estimand from AIPW (LATE vs ATT), so the two need not agree exactly. Convergence across them is evidence of structural soundness; divergence points to instrument-specific local effects or to violated assumptions.
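For intuition, here is a minimal sketch of the simplest case: one instrument and no covariates, where two-stage least squares reduces to the ratio cov(Z, Y) / cov(Z, D). The function name `wald_late` is hypothetical, and a production pipeline would also adjust for covariates and bootstrap the interval.

```python
from typing import Sequence, Tuple


def wald_late(
    y: Sequence[float], d: Sequence[float], z: Sequence[float]
) -> Tuple[float, float]:
    """Return (LATE estimate, first-stage F) for one instrument, no covariates.

    y: outcomes, d: treatment received, z: instrument values.
    """
    n = len(z)
    if n < 3:
        raise ValueError("need at least 3 observations")
    z_bar, d_bar, y_bar = sum(z) / n, sum(d) / n, sum(y) / n
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    s_zd = sum((zi - z_bar) * (di - d_bar) for zi, di in zip(z, d))
    s_zy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    if s_zz == 0 or s_zd == 0:
        raise ValueError("instrument has no variation or no first stage")
    # Just-identified 2SLS: reduced-form slope divided by first-stage slope.
    late = s_zy / s_zd
    # First stage: regress d on z; with one regressor, F equals the squared t.
    slope = s_zd / s_zz
    intercept = d_bar - slope * z_bar
    rss = sum((di - intercept - slope * zi) ** 2 for zi, di in zip(z, d))
    f_stat = float("inf") if rss == 0 else slope ** 2 * s_zz / (rss / (n - 2))
    return late, f_stat
```

A common rule of thumb treats a first-stage F below 10 as a weak-instrument warning; that threshold is a convention assumed here, not a value taken from the text above.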

References: Imbens, G. W., & Angrist, J. D. (1994). "Identification and estimation of local average treatment effects." Econometrica, 62(2), 467–475. · Bickel, P. J., & Sakov, A. (2008). "On the choice of m in the m out of n bootstrap and confidence bounds for extrema." Statistica Sinica, 18(3), 967–985.

METHOD 04

Neural counterfactual estimator

Individual-level treatment-effect estimation with representation-balanced counterfactual learning. Patent-pending architecture; trains end-to-end on the same cohort the other four methods see, outputs per-unit conditional average treatment effects (CATE), and aggregates to ATT for direct comparison with AIPW and DR-ATT.

When this method is appropriate: when treatment-effect heterogeneity matters (sub-population identification, individualized risk surfaces) or when high-dimensional confounding strains parametric methods. The neural estimator's role in the pentagon is to surface heterogeneity that the parametric methods average away.

Implementation specifics: covered under our USPTO provisional patent (filed 2026-03-22). Architectural detail available to design partners under NDA after term-sheet signing.

METHOD 05

Rosenbaum Γ-sensitivity bounds

Quantitative sensitivity analysis: how strong an unmeasured confounder would have to be (on the odds-ratio scale) to flip the inferred treatment effect from significant to null. Reported as Γ_zero (the Γ at which the bound crosses zero) and visualized via an interactive Γ-slider on every study result.

When this method is appropriate: every observational study, always. The Γ-bound is not an alternative to the other four methods; it is a layer ON TOP of them that quantifies their fragility. Γ bounds how much an unmeasured confounder could multiply the odds of treatment between two units with identical measured covariates. A Γ_zero of 1.06 means "a 6% shift in those odds flips the estimate", which is very sensitive. A Γ_zero of 2.0+ means "the result is robust to substantial residual confounding."
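The mechanics can be illustrated with Rosenbaum's sign test for matched pairs, the simplest setting in which the bound has a closed form. This is a sketch under assumptions: it uses a p-value threshold rather than the estimate-crossing-zero definition of Γ_zero given above, tied pairs are assumed to have been dropped before counting, and both function names are hypothetical.

```python
from math import comb, inf


def sign_test_upper_p(n_pairs: int, n_treated_higher: int, gamma: float) -> float:
    """Worst-case one-sided p-value of the sign test under hidden bias gamma.

    Under bias at most gamma, the chance that the treated unit has the higher
    outcome in a pair is at most gamma / (1 + gamma) when there is no effect.
    """
    if gamma < 1.0:
        raise ValueError("gamma must be >= 1")
    p_plus = gamma / (1.0 + gamma)
    return sum(
        comb(n_pairs, k) * p_plus ** k * (1.0 - p_plus) ** (n_pairs - k)
        for k in range(n_treated_higher, n_pairs + 1)
    )


def critical_gamma(
    n_pairs: int,
    n_treated_higher: int,
    alpha: float = 0.05,
    step: float = 0.01,
    gamma_max: float = 10.0,
) -> float:
    """Smallest grid value of gamma whose worst-case p-value exceeds alpha.

    Returns math.inf if the result stays significant up to gamma_max.
    """
    n_steps = int(round((gamma_max - 1.0) / step))
    for i in range(n_steps + 1):
        gamma = 1.0 + i * step
        if sign_test_upper_p(n_pairs, n_treated_higher, gamma) > alpha:
            return gamma
    return inf
```

A return value of exactly 1.0 means the finding is not significant even with no hidden bias; larger values mean more unmeasured confounding is needed to overturn it.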

References: Rosenbaum, P. R. (2002). Observational Studies, 2nd ed. Springer. Ch. 4: "Sensitivity to Hidden Bias."

Why the Γ-bound is THE differentiating reporting layer.

Every observational study has unmeasured confounders. The standard practice is to acknowledge this in a qualitative discussion paragraph at the end of the manuscript. Rosenbound delivers the quantitative answer alongside every estimate: "this result is robust up to Γ = X; beyond that, the conclusion flips."

The FDA's March 2024 Non-Interventional Studies draft guidance asks for this kind of analysis: "assessment of unmeasured confounding factors… planned sensitivity analyses to assess the robustness of study findings." The PRINCIPLED process (BMJ 2024) makes it explicit: "deterministic sensitivity analyses, quantitative bias analyses, and net bias evaluation." The Γ-bound answers both requests with a single, reportable number.

Output guarantees

Every study produces the same six artifacts.

Regardless of the cohort, the treatment contrast, or the outcome, a Rosenbound study delivers the following six artifacts.

1. Four point estimates with confidence intervals, plus the Γ-bound envelope

AIPW, DR-ATT, IV-LATE, and the neural counterfactual each yield a point estimate with a confidence interval; the Γ-bound envelope is layered on top. Side-by-side comparison surfaces convergence (evidence of structural soundness) or divergence (interpret carefully).

2. Per-method covariate balance + diagnostic table

For AIPW + DR-ATT: standardized mean differences across treatment and control after weighting. For IV-LATE: stage-1 F-statistic + weak-instrument flag. For the neural estimator: representation-distance diagnostic. For Rosenbaum: per-Γ envelope width.
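As an illustration of the balance diagnostic, a weighted standardized mean difference for a single covariate could be computed as below. The function name is hypothetical, and dividing by the unweighted pooled standard deviation is one common convention assumed here rather than a documented platform choice.

```python
from math import sqrt
from statistics import variance
from typing import Sequence


def weighted_smd(
    x: Sequence[float], t: Sequence[int], w: Sequence[float]
) -> float:
    """Standardized mean difference of covariate x after weighting.

    Means use the weights w; the denominator is the unweighted pooled
    standard deviation, so the scale does not move with the weights.
    """
    xt = [xi for xi, ti in zip(x, t) if ti == 1]
    xc = [xi for xi, ti in zip(x, t) if ti == 0]
    wt = [wi for wi, ti in zip(w, t) if ti == 1]
    wc = [wi for wi, ti in zip(w, t) if ti == 0]
    if len(xt) < 2 or len(xc) < 2:
        raise ValueError("need at least two units in each group")
    mean_t = sum(wi * xi for wi, xi in zip(wt, xt)) / sum(wt)
    mean_c = sum(wi * xi for wi, xi in zip(wc, xc)) / sum(wc)
    pooled_sd = sqrt((variance(xt) + variance(xc)) / 2.0)
    if pooled_sd == 0:
        raise ValueError("covariate is constant in both groups")
    return (mean_t - mean_c) / pooled_sd
```

An absolute value below roughly 0.1 is often read as adequate balance, though that cutoff is a convention and not a guarantee.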

3. Per-method feature attribution

Which features each estimator weighted most heavily. SHAP-style attribution where the underlying estimator supports it. "Method X attributed 65% of the propensity to feature Y" — auditable, not narrative.

4. Interactive Γ-slider

Drag the Γ value, watch the bound envelope update in real time, see the crossing-at-zero point shift. Reviewers explore the sensitivity surface directly rather than reading a static table.

5. Reproducibility certificate

Cohort definition hash + cohort data hash + certificate ID + git commit of the platform version + pinned library versions for every method used. Re-runnable by an external auditor with access only to the certificate and the raw data.
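One way such a certificate could be assembled is sketched below. Every field name, the choice of SHA-256, and the idea of deriving the certificate ID from a hash of the canonicalized body are assumptions made for illustration; the actual certificate format is not specified here.

```python
import hashlib
import json
from typing import Dict


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def build_certificate(
    cohort_definition: str,
    cohort_data: bytes,
    git_commit: str,
    library_versions: Dict[str, str],
) -> dict:
    """Assemble a certificate whose ID is a hash of its own contents."""
    body = {
        "cohort_definition_hash": sha256_hex(cohort_definition.encode("utf-8")),
        "cohort_data_hash": sha256_hex(cohort_data),
        "git_commit": git_commit,
        "library_versions": dict(sorted(library_versions.items())),
    }
    # Canonical JSON (sorted keys, fixed separators) keeps the ID independent
    # of the order in which fields or library versions were supplied.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return {**body, "certificate_id": sha256_hex(canonical.encode("utf-8"))}
```

An auditor holding the raw data can recompute `cohort_data_hash` and compare it with the certificate before re-running anything.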

6. Methodology PDF export

One-click TRIPOD+AI-aligned submission package: reproducibility certificate + per-method methodology section + sensitivity-pentagon figures + hash-chained audit trail. Aligned with the FDA 7-step AI credibility framework for direct inclusion in regulatory submissions.

Five methods. One pentagon. Full sensitivity.