Methodology

Five methods. One pentagon. Full sensitivity.

Every Rosenbound study runs five complementary methods on the same cohort and reports them side by side: four causal estimators plus a Rosenbaum Γ sensitivity layer. The point estimates triangulate. The Γ-bound quantifies how strong unmeasured confounding would need to be to flip them. This is the methodology that ships in production today, not a roadmap aspiration.

Try the pentagon at rosenbound.ai →

The five methods

Each method addresses a different bias structure.

No single observational method is sufficient. The pentagon delivers convergent or divergent evidence depending on what biases are actually present in your cohort; the Γ-bound then quantifies the residual fragility.

METHOD 01

AIPW — Augmented Inverse Propensity Weighting

Doubly-robust ATT estimator: consistent if EITHER the propensity model OR the outcome model is correctly specified. Resilient to single-model misspecification.

When this method is appropriate: primary workhorse for observational treatment-effect estimation when you have rich measured covariates. Three covariate-enrichment stages supported (base, severity-augmented, trajectory-augmented) to test stability across feature-set richness.

References: Robins, Rotnitzky, & Zhao (1994). "Estimation of regression coefficients when some regressors are not always observed." JASA, 89(427), 846–866. · Bang, H. & Robins, J. M. (2005). "Doubly robust estimation in missing data and causal inference models." Biometrics, 61(4), 962–973.
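The double-robustness property described above can be illustrated with a minimal sketch. It assumes the propensity scores and control-outcome predictions have already been fitted elsewhere; the function name `aipw_att` and its arguments are hypothetical and are not part of the Rosenbound SDK. The formula is the standard doubly-robust ATT form: treated residuals minus odds-weighted control residuals, divided by the number of treated units.

```python
from typing import Sequence


def aipw_att(
    y: Sequence[float],
    t: Sequence[int],
    e_hat: Sequence[float],
    m0_hat: Sequence[float],
) -> float:
    """Doubly-robust (AIPW-style) estimate of the ATT.

    y      : observed outcomes
    t      : treatment indicators (1 = treated, 0 = control)
    e_hat  : fitted propensity scores P(T=1 | X), assumed strictly below 1
    m0_hat : fitted control-outcome predictions E[Y | T=0, X]
    """
    n_treated = sum(t)
    if n_treated == 0:
        raise ValueError("need at least one treated unit")

    total = 0.0
    for yi, ti, ei, m0i in zip(y, t, e_hat, m0_hat):
        residual = yi - m0i
        if ti == 1:
            # Treated units contribute their residual against the control model.
            total += residual
        else:
            # Controls correct the outcome model, weighted by the odds e/(1-e).
            total -= ei / (1.0 - ei) * residual
    return total / n_treated


# Toy cohort: two covariate strata, one treated and one control unit in each.
y = [3.0, 5.0, 1.0, 2.0]
t = [1, 1, 0, 0]
e_hat = [0.5, 0.5, 0.5, 0.5]  # correct propensity model for this toy design

# Correct outcome model: predictions match the control outcomes per stratum.
att_good_outcome_model = aipw_att(y, t, e_hat, m0_hat=[1.0, 2.0, 1.0, 2.0])
# Badly misspecified outcome model (all zeros), propensity model still correct.
att_bad_outcome_model = aipw_att(y, t, e_hat, m0_hat=[0.0, 0.0, 0.0, 0.0])
```

In this toy cohort both calls return the same ATT, because the correct propensity weights absorb the error of the misspecified outcome model.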

METHOD 02

DR-ATT with Crump-2009 overlap trim

Doubly-robust ATT (Hahn 1998 estimand, Lunceford-Davidian 2004 formulation) computed on the Crump-trimmed overlap region. Tight propensity clipping at the [0.10, 0.90] range; cohorts trimmed to where treatment-effect identification is genuinely supported.

When this method is appropriate: small-N panels with possible propensity-score blow-up, or any cohort where overlap is questionable. The trim sacrifices statistical efficiency for identification credibility. We report trim sensitivity at three thresholds (α = 0.05, 0.10, 0.15; the default [0.10, 0.90] range corresponds to α = 0.10) so reviewers see whether the trim choice drives the estimate.

References: Hahn, J. (1998). "On the role of the propensity score in efficient semiparametric estimation of average treatment effects." Econometrica, 66(2), 315–331. · Crump, R. K., Hotz, V. J., Imbens, G. W., & Mitnik, O. A. (2009). "Dealing with limited overlap in estimation of average treatment effects." Biometrika, 96(1), 187–199.
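The trim-sensitivity report described above can be sketched as follows. This is an illustration under stated assumptions, not the platform's implementation: the helper names are hypothetical, propensity scores are taken as given, and a normalized odds-weighted ATT stands in for the full doubly-robust estimator to keep the example short.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Estimator = Callable[[Sequence[float], Sequence[int], Sequence[float]], float]


def overlap_mask(e_hat: Sequence[float], alpha: float) -> List[bool]:
    """Keep units whose propensity score lies in [alpha, 1 - alpha]."""
    return [alpha <= e <= 1.0 - alpha for e in e_hat]


def weighted_att(y: Sequence[float], t: Sequence[int], e_hat: Sequence[float]) -> float:
    """Stand-in estimator: treated mean minus odds-weighted control mean."""
    treated = [yi for yi, ti in zip(y, t) if ti == 1]
    controls = [(yi, ei / (1.0 - ei)) for yi, ti, ei in zip(y, t, e_hat) if ti == 0]
    if not treated or not controls:
        raise ValueError("trimmed sample needs both treated and control units")
    control_mean = sum(w * yi for yi, w in controls) / sum(w for _, w in controls)
    return sum(treated) / len(treated) - control_mean


def trim_sensitivity(
    y: Sequence[float],
    t: Sequence[int],
    e_hat: Sequence[float],
    estimator: Estimator = weighted_att,
    alphas: Sequence[float] = (0.05, 0.10, 0.15),
) -> Dict[float, Tuple[int, float]]:
    """Re-estimate on each trimmed sample; returns {alpha: (n_kept, estimate)}."""
    results: Dict[float, Tuple[int, float]] = {}
    for alpha in alphas:
        keep = overlap_mask(e_hat, alpha)
        y_k = [v for v, k in zip(y, keep) if k]
        t_k = [v for v, k in zip(t, keep) if k]
        e_k = [v for v, k in zip(e_hat, keep) if k]
        results[alpha] = (len(y_k), estimator(y_k, t_k, e_k))
    return results


# Toy cohort with extreme propensity scores at both ends.
e_hat = [0.02, 0.07, 0.12, 0.5, 0.5, 0.88, 0.93, 0.98]
t = [0, 0, 0, 1, 0, 1, 1, 1]
y = [1.0, 1.0, 1.0, 3.0, 1.0, 3.0, 3.0, 3.0]
report = trim_sensitivity(y, t, e_hat)
```

If the estimate moves materially as α changes, the trim choice is driving the result; if it stays put while the retained sample shrinks, the estimate is stable over the overlap region.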

METHOD 03

IV-LATE — Instrumental Variable, Two-Stage Least Squares

Two-stage least squares for the Local Average Treatment Effect on the marginal-complier subpopulation. Uses the per-prescriber preference instrument when assignment is provider-driven (n = 4,383 prescribers in MIMIC-IV), with stage-1 F-statistic diagnostics for instrument strength and an m-of-n bootstrap for confidence intervals.

When this method is appropriate: when an instrument plausibly satisfies the exclusion restriction (provider preference, geographic variation, policy discontinuities). IV-LATE targets a different estimand than AIPW (LATE vs ATT), so convergence across the two is evidence of structural soundness, while divergence indicates instrument-specific local effects or assumption violations.

References: Imbens, G. W., & Angrist, J. D. (1994). "Identification and estimation of local average treatment effects." Econometrica, 62(2), 467–475. · Bickel, P. J., & Sakov, A. (2008). "On the choice of m in the m out of n bootstrap." Statistica Sinica.
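For intuition, the simplest case of the estimator above (one instrument, no covariates) can be sketched directly. In that case two-stage least squares reduces to the ratio cov(Z, Y) / cov(Z, D), and the stage-1 F-statistic follows from the R² of regressing treatment on the instrument. The function name `wald_iv` is hypothetical, and the F < 10 cutoff used in the example is only the conventional rule of thumb, not a documented platform threshold.

```python
from typing import Sequence, Tuple


def _centered_cross(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of cross-products of a and b around their means."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (v - mean_b) for x, v in zip(a, b))


def wald_iv(
    y: Sequence[float], d: Sequence[float], z: Sequence[float]
) -> Tuple[float, float]:
    """Single-instrument 2SLS estimate and stage-1 F-statistic.

    y : outcomes, d : treatment received, z : instrument.
    Returns (late_estimate, first_stage_f).
    """
    n = len(z)
    s_zz = _centered_cross(z, z)
    s_dd = _centered_cross(d, d)
    s_zd = _centered_cross(z, d)
    s_zy = _centered_cross(z, y)
    if s_zd == 0:
        raise ValueError("instrument is uncorrelated with treatment")

    late = s_zy / s_zd  # equals cov(Z, Y) / cov(Z, D)
    r_squared = s_zd ** 2 / (s_zz * s_dd)  # stage-1 R^2
    if r_squared >= 1.0:
        return late, float("inf")
    first_stage_f = r_squared / (1.0 - r_squared) * (n - 2)
    return late, first_stage_f


# Toy data: the instrument shifts treatment uptake, and the outcome is 2 * d.
z = [0, 0, 0, 0, 1, 1, 1, 1]
d = [0, 0, 0, 1, 0, 1, 1, 1]
y = [2.0 * di for di in d]
late, f_stat = wald_iv(y, d, z)
weak_instrument = f_stat < 10.0  # conventional rule of thumb, an assumption here
```

With only eight units the instrument is flagged as weak even though the ratio recovers the true effect, which is why the F-statistic is reported next to the estimate rather than left implicit.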

METHOD 04

Neural counterfactual estimator

Individual-level treatment-effect estimation with representation-balanced counterfactual learning. Patent-pending architecture; trains end-to-end on the same cohort the other four methods see, outputs per-unit conditional average treatment effects (CATE), and aggregates to ATT for direct comparison with AIPW and DR-ATT.

When this method is appropriate: when treatment-effect heterogeneity matters (sub-population identification, individualized risk surfaces) or when high-dimensional confounding strains parametric methods. The neural estimator's role in the pentagon is to surface heterogeneity that the parametric methods average away.

Implementation specifics: covered under our USPTO provisional patent (filed 2026-03-22). Architectural detail available to design partners under NDA after term-sheet signing.

METHOD 05

Rosenbaum Γ-sensitivity bounds

Quantitative sensitivity analysis: how strong an unmeasured confounder would have to be (on the odds-ratio scale) to flip the inferred treatment effect from significant to null. Reported as Γ_zero (the Γ at which the bound crosses zero) and visualized via an interactive Γ-slider on every study result.

When this method is appropriate: every observational study, always. The Γ-bound is not an alternative to the other four methods; it is a layer ON TOP of them that quantifies their fragility. A Γ_zero of 1.06 means "an unmeasured confounder that shifts the odds of treatment by just 6% between otherwise comparable units is enough to flip the estimate", which is very sensitive. A Γ_zero of 2.0+ means "the result is robust to substantial residual confounding."

References: Rosenbaum, P. R. (2002). Observational Studies, 2nd ed. Springer. Ch. 4: "Sensitivity to Hidden Bias."
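The mechanics of a Γ-bound can be sketched with the textbook matched-pairs sign test. Under hidden bias of magnitude Γ, the chance that the treated unit in a pair has the better outcome under the null is at most Γ / (1 + Γ), which gives a worst-case p-value. The sketch below searches for the smallest Γ at which that worst-case p-value stops being significant. The function names, the 0.05 level, and the grid search are assumptions for illustration; the platform's Γ_zero is defined on the estimate's bound and may be computed differently.

```python
from math import comb
from typing import Optional


def sign_test_upper_p(n_pairs: int, n_positive: int, gamma: float) -> float:
    """Worst-case one-sided sign-test p-value under hidden bias gamma >= 1.

    n_pairs    : matched pairs with a non-zero outcome difference
    n_positive : pairs in which the treated unit had the higher outcome
    """
    p_max = gamma / (1.0 + gamma)  # upper bound on P(treated > control) under H0
    return sum(
        comb(n_pairs, k) * p_max ** k * (1.0 - p_max) ** (n_pairs - k)
        for k in range(n_positive, n_pairs + 1)
    )


def critical_gamma(
    n_pairs: int,
    n_positive: int,
    alpha: float = 0.05,
    step: float = 0.01,
    gamma_max: float = 10.0,
) -> Optional[float]:
    """Smallest grid value of gamma at which the worst-case p-value exceeds alpha.

    Returns None if the finding stays significant up to gamma_max.
    """
    n_steps = int(round((gamma_max - 1.0) / step))
    for i in range(n_steps + 1):
        gamma = 1.0 + i * step
        if sign_test_upper_p(n_pairs, n_positive, gamma) > alpha:
            return gamma
    return None


# 20 matched pairs, treated unit better in all of them: a robust finding.
robust = critical_gamma(n_pairs=20, n_positive=20)
# 10 pairs, treated better in only 5: not significant even with no hidden bias.
fragile = critical_gamma(n_pairs=10, n_positive=5)
```

A critical Γ close to 1 means very little hidden bias is needed to explain the finding away; a larger value means the finding survives stronger unmeasured confounding.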

Why the Γ-bound is THE differentiating reporting layer.

Every observational study has unmeasured confounders. The standard practice is to acknowledge this in a qualitative discussion paragraph at the end of the manuscript. Rosenbound delivers the quantitative answer alongside every estimate: "this result is robust up to Γ = X; beyond that, the conclusion flips."

The FDA's March 2024 Non-Interventional Studies draft guidance directly asks for this: "assessment of unmeasured confounding factors… planned sensitivity analyses to assess the robustness of study findings." The PRINCIPLED process (BMJ 2024) makes it explicit: "deterministic sensitivity analyses, quantitative bias analyses, and net bias evaluation." The Γ-bound is a quantitative sensitivity analysis of exactly this kind.

Output guarantees

Every study produces the same six artifacts.

Regardless of the cohort, the treatment contrast, or the outcome — a Rosenbound study always delivers these six things, every time.

1. Four point estimates with confidence intervals, plus the Γ-bound envelope

AIPW, DR-ATT, IV-LATE, and neural counterfactual estimates, with the Γ-bound envelope layered on top. Side-by-side comparison surfaces convergence (evidence of structural soundness) or divergence (a signal to interpret carefully).

2. Per-method covariate balance + diagnostic table

For AIPW + DR-ATT: standardized mean differences between treated and control groups after weighting. For IV-LATE: stage-1 F-statistic + weak-instrument flag. For the neural estimator: representation-distance diagnostic. For Rosenbaum: per-Γ envelope width.
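The weighted balance diagnostic for a single covariate can be sketched as below. This is a minimal illustration, assuming ATT-style weights on controls only and an unweighted pooled standard deviation in the denominator (one common convention among several); the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Optional, Sequence


def weighted_mean(x: Sequence[float], w: Sequence[float]) -> float:
    """Mean of x under non-negative weights w."""
    return sum(wi * xi for xi, wi in zip(x, w)) / sum(w)


def standardized_mean_difference(
    x_treated: Sequence[float],
    x_control: Sequence[float],
    w_control: Optional[Sequence[float]] = None,
) -> float:
    """SMD of one covariate between treated units and (weighted) controls.

    The denominator pools the unweighted sample variances of the two groups,
    so the scale does not change when the weights change.
    """
    if w_control is None:
        w_control = [1.0] * len(x_control)
    pooled_sd = sqrt((variance(x_treated) + variance(x_control)) / 2.0)
    return (mean(x_treated) - weighted_mean(x_control, w_control)) / pooled_sd


# Toy covariate: controls skew lower than treated units before weighting.
x_treated = [2.0, 4.0]
x_control = [1.0, 3.0]
smd_before = standardized_mean_difference(x_treated, x_control)
# Up-weighting the control that resembles the treated group shrinks the gap.
smd_after = standardized_mean_difference(x_treated, x_control, w_control=[1.0, 3.0])
```

An absolute SMD below roughly 0.1 after weighting is a widely used informal benchmark for acceptable balance; whether the platform applies that cutoff is not stated here.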

3. Per-method feature attribution

Which features each estimator weighted most heavily. SHAP-style attribution where the underlying estimator supports it. "Method X attributed 65% of the propensity to feature Y" — auditable, not narrative.

4. Interactive Γ-slider

Drag the Γ value, watch the bound envelope update in real time, see the crossing-at-zero point shift. Reviewers explore the sensitivity surface directly rather than reading a static table.

5. Reproducibility certificate

Cohort definition hash + cohort data hash + certificate ID + git commit of the platform version + pinned library versions for every method used. Re-runnable by an external auditor with access only to the certificate and the raw data.
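How such a certificate could be assembled is sketched below, using SHA-256 over a canonical JSON encoding. All field names, the ID derivation, and the function `build_certificate` are hypothetical and chosen for illustration; the actual certificate schema is not described in this document.

```python
import hashlib
import json
from typing import Any, Dict


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest."""
    return hashlib.sha256(data).hexdigest()


def canonical_json(obj: Any) -> bytes:
    """Deterministic JSON encoding: sorted keys, no insignificant whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def build_certificate(
    cohort_definition: Dict[str, Any],
    cohort_bytes: bytes,
    git_commit: str,
    library_versions: Dict[str, str],
) -> Dict[str, Any]:
    """Assemble a certificate whose ID is derived from its own contents."""
    body: Dict[str, Any] = {
        "cohort_definition_hash": sha256_hex(canonical_json(cohort_definition)),
        "cohort_data_hash": sha256_hex(cohort_bytes),
        "git_commit": git_commit,
        "library_versions": dict(sorted(library_versions.items())),
    }
    # The ID is computed before it is added, so it covers every other field.
    body["certificate_id"] = sha256_hex(canonical_json(body))[:16]
    return body


# Illustrative inputs only; none of these values come from a real study.
definition = {"treatment": "drug_a", "outcome": "readmission_30d", "min_age": 18}
data = b"patient_id,treated,outcome\n1,1,0\n2,0,1\n"
cert = build_certificate(definition, data, "abc1234", {"numpy": "1.26.4"})

# An auditor with the same definition and raw data reproduces the same ID,
# regardless of the key order in the definition.
reordered = {"min_age": 18, "outcome": "readmission_30d", "treatment": "drug_a"}
cert_again = build_certificate(reordered, data, "abc1234", {"numpy": "1.26.4"})
```

Because the encoding is canonical, any change to the cohort definition, the raw data, the platform commit, or a pinned library version produces a different certificate ID.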

6. Methodology PDF export

One-click TRIPOD+AI-aligned submission package: reproducibility certificate + per-method methodology section + sensitivity-pentagon figures + hash-chained audit trail. Aligned with the FDA 7-step AI credibility framework for direct inclusion in regulatory submissions.
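A hash-chained audit trail of the kind mentioned above can be sketched in a few lines: each entry stores the hash of its predecessor, so altering any earlier entry invalidates every later one. The entry layout and function names are assumptions for illustration, not the platform's actual format.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first entry


def _entry_hash(prev_hash: str, payload: Dict[str, Any]) -> str:
    """SHA-256 over the predecessor hash and a canonical payload encoding."""
    blob = json.dumps(
        {"prev": prev_hash, "payload": payload},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def append_entry(chain: List[Dict[str, Any]], payload: Dict[str, Any]) -> None:
    """Append a payload, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append(
        {"prev": prev_hash, "payload": payload, "hash": _entry_hash(prev_hash, payload)}
    )


def verify_chain(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every hash and check each link back to the genesis value."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(entry["prev"], entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


# Illustrative trail for one study run.
trail: List[Dict[str, Any]] = []
append_entry(trail, {"step": "cohort_uploaded"})
append_entry(trail, {"step": "aipw_fitted"})
append_entry(trail, {"step": "certificate_issued"})
intact = verify_chain(trail)

# Editing an earlier entry after the fact breaks verification.
trail[1]["payload"]["step"] = "aipw_refitted"
tampered_ok = verify_chain(trail)
```

The design choice is that verification needs nothing beyond the trail itself, which suits an external auditor who has only the exported package.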

Watch the product walkthrough at rosenbound.ai — three moments that define the platform: the Cognitive Validation Report refusing incoherent data, the live Γ-bound sensitivity visualization, and the reproducibility certificate generated on every study. The full platform stays gated for Founding Partners.

pip install rosenbound  —  Official Python SDK for programmatic access: cohort upload, sensitivity-bounded study runs, and reproducibility certificate retrieval. Apache 2.0; Pydantic v2 typed; py.typed for IDE autocomplete + mypy. Platform access gated by Bearer token + RBAC + tenant scoping — the SDK is open, the audit substrate is not.

See the methodology in action on your cohort.

The Founding Partner Program includes a benchmark co-authorship clause: Rosenbound runs the full pentagon on your in-house cohort (under your IP terms) and the resulting methodology paper carries your team as co-authors. Two-way value.