Causal Inference for Clinical & Pharmaceutical Research

Causal inference you can defend in front of regulators.

Most clinical-AI platforms sell observational point estimates as if they were RCT-grade evidence. They aren't. We ship causal estimates with explicit Rosenbaum sensitivity bounds and 21 CFR Part 11 ALCOA+ audit trails. The first platform that reports the fragility of its own claims.

546K MIMIC-IV admissions ingested
1.64M clinical notes processed
8 benchmark lanes locked
1 USPTO provisional filed (Mar 2026)

The Founding Insight

Why we exist.

We ran five different causal-inference estimators on a real ICU anticoagulation comparison — heparin vs LMWH on 153,708 admissions from MIMIC-IV. None of them recovered the RCT-published direction on bleeding outcomes. Most observational-AI vendors don't publish that result. We led with it.

The W12 Open-Confounder Benchmark

"No method recovers RCT direction. Here's the bound that explains why."

Across AIPW with progressively richer covariate sets, instrumental-variable 2SLS using per-prescriber preference, and a patent-pending neural counterfactual learner with Sinkhorn-Wasserstein representation balancing — every estimator converged on the same answer: positive ATT for LMWH on bleeding, opposite to RCT direction.

Rosenbaum sensitivity bounds quantified why: an unobserved confounder of strength Γ = 1.06 (a 5.6% odds-ratio shift) would flip the estimate to null. Exactly the magnitude physician-judgment artifacts plausibly produce.
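For readers who want the definition behind Γ: in Rosenbaum's sensitivity model, two patients i and j with identical observed covariates may still differ in their odds of receiving treatment, but by at most a factor Γ. The symbols π_i and π_j (each patient's treatment probability) are notation introduced here for illustration, not taken from the platform's documentation.

```latex
\frac{1}{\Gamma} \;\le\; \frac{\pi_i\,(1-\pi_j)}{\pi_j\,(1-\pi_i)} \;\le\; \Gamma,
\qquad
\Gamma = 1.06 \;\Rightarrow\; \tfrac{1}{1.06} \approx 0.943 \;\le\; \text{odds ratio} \;\le\; 1.06
```

Γ = 1 corresponds to no hidden bias (a randomized comparison within matched covariates); the closer the flip point sits to 1, the more fragile the estimate.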

Methods tested: 5 estimators + Rosenbaum bounds
RCT direction recovered: Y_vte yes (IV-LATE) · Y_bleed no
Γ to flip estimate: 1.06 (very sensitive)
What we report: all five + the sensitivity bound

Honest reporting of fragility is more valuable than pretending you don't have it. Pharmaceutical drug-safety teams, clinical-research methodologists, and physicians making treatment decisions all need this honesty — and current vendor tooling doesn't provide it.

Four Verticals, One Substrate

Where Rosenbound deploys.

Same causal-inference engine + audit-trail ledger across four buyer types. Each vertical pays differently; the technical substrate is shared.

PV
Pharmacovigilance

Drug-safety triage with sensitivity bounds

For drug-safety directors at top-20 pharma and safety leads at major CROs. Replaces FAERS-disproportionality workflows. Every adverse-event signal comes with Rosenbaum bounds and 21 CFR Part 11 ALCOA+ provenance — regulator-ready by default.

CR
Clinical Research

Real-world-evidence methodology, packaged

For RWE methodology leads at academic medical centers and CROs. Five-method causal sensitivity pentagon as a single API call. Replaces the hand-rolled methodologist work behind every observational pharma study. Validated on ACIC22 (3,400 datasets, coverage 77.53%).

DD
Drug Discovery

Causal effect estimation for target validation

For computational biology and target-discovery teams. Augments DMPNN-based molecular property prediction (validated on BACE / BBBP / HIV scaffold splits) with causal-inference-driven mechanism-of-action analysis. Honest bounds on every effect estimate.

RX
Physician Decision Support

Treatment-selection risk surface (post-2027)

For ICU and clinical-AMC physicians making individualized treatment decisions. Risk surface with explicit "this estimate is fragile to unobserved confounding at Γ = X" disclosure. Targeted FDA Class II SaMD 510(k) pathway; available 2027+ post-clinical-deployment validation.

Methodology

Five methods, one pentagon, full sensitivity.

Standard practice ships one ATT estimate. We ship the full sensitivity pentagon plus quantitative bounds. Every number comes with the uncertainty its method admits.

  1. AIPW (Augmented Inverse Propensity Weighting)
    Robins-Rotnitzky-Zhao 1994 / Bang-Robins 2005. Three covariate enrichment stages: base, severity-at-switch, multi-day trajectory.
    +0.0253 → +0.0301 (W12)
  2. DR-ATT with Crump-2009 overlap trim
    Hahn-1998 doubly-robust ATT, Lunceford-Davidian 2004 formulation. Tight propensity clipping for small-N panels.
    +0.0084 stroke / -0.0396 bleed (W13, RCT-consistent)
  3. IV-LATE (2SLS instrumental variable)
    Per-prescriber preference instrument (n=4,383 providers in MIMIC-IV). Stage-1 F = 67,250.
    Y_vte CI ⊃ 0 ✓ recovered RCT non-inferiority
  4. DCIE neural counterfactual
    Patent-pending stack: TARNet + Sinkhorn-Wasserstein IPM + NCAAP twin-network + Dragonnet propensity head.
    Y_bleed +0.0255 (AIPW-equivalent)
  5. Rosenbaum 2002 Γ-sensitivity bounds
    Quantitative measure of unobserved-bias strength required to flip estimates. THE differentiating reporting layer.
    Γ_zero = 1.06 (bleed) / 1.17 (vte)

Benchmarks

Actual numbers, actual logs.

Every metric below is from production training runs on the full data corpus. Test AUROC / AUPRC / ECE / coverage values are quoted directly from the timestamped log files; there are no smoke-run inflations and no cherry-picked seeds. Specific log file references are listed at the bottom of this section.

MIMIC-IV v3.1 clinical prediction (full corpus, 5-fold CV + temporal test split)

| Lane | Outcome | Cohort | Test AUROC (or c-index) | Calibration / secondary metric | Published reference |
|---|---|---|---|---|---|
| W1 | In-hospital mortality | n=546K admits | 0.9476 | Beta-Bayes ECE 0.0024 | Tomasev 2019: 0.92 |
| W2 | 30-day all-cause readmission | n=534K excl. deaths | 0.7034 | Isotonic ECE 0.0040 | Rajkomar 2018: 0.75-0.76 (different cohort) |
| W3 | Sepsis-3 onset (first 48h ICU) | n=74,829 ICU stays | 0.8908 | AUPRC 0.8710 | Saria 0.83-0.85, Komorowski 0.85, Hyland 0.82-0.86 |
| W5 | KDIGO AKI Stage 1+ (first 48h) | n=75K ICU stays | 0.8371 | AUPRC 0.6804 | Tomasev 2019: 0.82, Koyner 0.82 |
| W11 | Mortality time-to-event (Cox PH) | n=546K admits | c-index 0.8019 | IPCW c 0.6468 · IBS 0.0841 | Cheng 2019 LSTM: 0.81-0.85 |
| W11 | Mortality time-to-event (AFT) | n=546K admits | c-index 0.7707 | IPCW c 0.6375 | dual-comparator survival |

All five lanes are locked with the calibrator auto-picked between Beta-Bayes and isotonic by validation ECE; the calibrator is chosen per workload, not per dataset family.
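A minimal sketch of what an ECE-driven auto-pick can look like, assuming equal-width probability bins and candidate calibrators that are already fitted. The function names and the two stand-in calibration maps are hypothetical; the platform's Beta-Bayes and isotonic calibrators are not reproduced here.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE: size-weighted mean of |observed rate - mean prob|."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(observed - mean_prob)
    return ece


def pick_calibrator(
    candidates: Dict[str, Callable[[float], float]],
    val_probs: Sequence[float],
    val_labels: Sequence[int],
) -> Tuple[str, Dict[str, float]]:
    """Return the candidate with the lowest validation ECE, plus all scores."""
    scores = {
        name: expected_calibration_error([f(p) for p in val_probs], val_labels)
        for name, f in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical stand-ins for two fitted calibrators (assumption, not the real ones).
candidates = {"identity": lambda p: p, "shrink": lambda p: 0.5 * p}
val_probs = [0.8] * 10                       # over-confident raw scores
val_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # observed event rate 0.4
best, scores = pick_calibrator(candidates, val_probs, val_labels)
```

Selecting on the validation split rather than the test split keeps the reported test ECE an honest out-of-sample number.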

FAERS pharmacovigilance: Pipeline B severity classifier (full 20M-row corpus, temporal split)

| Split | Cohort | AUROC | AUPRC | ECE | Brier |
|---|---|---|---|---|---|
| Train | n=420,294 (pre-2020) | 0.9562 | 0.8616 | 0.0946 | 0.0919 |
| Validation | n=170,716 | 0.8858 | 0.7836 | 0.1751 | 0.1692 |
| Test (2020-2025) | n=158,732 | 0.8872 | 0.7680 | 0.1744 | 0.1669 |

Test AUROC (0.8872) is roughly 11pp above the demographics-only Bate 2019 baseline (~0.78). ECE can be recovered through SMCE capability-conditional abstention on the same test split (A2/A3/A4 PASS).
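To illustrate the general idea of capability-conditional abstention, here is a small selective-prediction sketch: predictions whose capability score falls below a threshold are withheld, and the Brier score is recomputed on what remains. The capability scores, the threshold and the function names are hypothetical; how SMCE derives its per-prediction capability estimate is not described here, and Brier is used instead of ECE only to keep the example short.

```python
from typing import Optional, Sequence, Tuple


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def abstain_and_score(
    probs: Sequence[float],
    labels: Sequence[int],
    capability: Sequence[float],
    threshold: float,
) -> Tuple[float, Optional[float]]:
    """Keep predictions with capability >= threshold.

    Returns (retained fraction, Brier score on retained predictions).
    The score is None when every prediction is withheld.
    """
    kept = [(p, y) for p, y, c in zip(probs, labels, capability) if c >= threshold]
    retained = len(kept) / len(probs)
    if not kept:
        return retained, None
    kept_probs, kept_labels = zip(*kept)
    return retained, brier_score(kept_probs, kept_labels)


# Toy data (assumption): the two low-capability predictions are the poor ones.
probs = [0.9, 0.2, 0.6, 0.4]
labels = [1, 0, 0, 1]
capability = [0.95, 0.90, 0.30, 0.20]
retained, selective = abstain_and_score(probs, labels, capability, threshold=0.5)
overall = brier_score(probs, labels)
```

Reporting the retained fraction next to the selective score matters: a better score at a much lower retention rate is a different claim from a better score on the full split.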

ACIC22 Track-2 causal inference challenge (V3 lock, full 3,400-cohort canonical)

| Estimator | Bias | RMSE | Coverage | Width | Lift over baseline |
|---|---|---|---|---|---|
| DCIE Ensemble | +19.26 | 28.80 | 77.53% | 78.04 | 11× coverage lift (vs 7% baseline) |

40.86 sec/dataset on a 12-thread laptop. The coverage gain came from three Phase-1 ensemble fixes: 7-module checkpoint completeness, stratified bootstrap fallback, and a Sinkhorn cost-matrix clamp.
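The four columns in the table above follow standard definitions, which can be sketched as below. This assumes one true effect, one point estimate and one interval per dataset; the function name and the toy numbers are illustrative, and the exact ACIC22 scoring script may aggregate differently.

```python
from math import sqrt
from typing import Dict, Sequence


def interval_metrics(
    truth: Sequence[float],
    estimate: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
) -> Dict[str, float]:
    """Bias, RMSE, interval coverage and mean interval width across datasets."""
    n = len(truth)
    errors = [e - t for e, t in zip(estimate, truth)]
    covered = [lo <= t <= hi for t, lo, hi in zip(truth, lower, upper)]
    return {
        "bias": sum(errors) / n,
        "rmse": sqrt(sum(err * err for err in errors) / n),
        "coverage": sum(covered) / n,  # share of intervals containing the truth
        "width": sum(hi - lo for lo, hi in zip(lower, upper)) / n,
    }


# Toy example with four datasets (made-up numbers, not benchmark output).
metrics = interval_metrics(
    truth=[1.0, 2.0, 3.0, 4.0],
    estimate=[1.5, 2.5, 2.5, 6.0],
    lower=[0.0, 2.2, 2.0, 5.0],
    upper=[2.0, 3.0, 3.0, 7.0],
)
```

Coverage and width should be read together: intervals can always reach nominal coverage by becoming wide enough to be uninformative.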

W12 anticoagulation causal sensitivity pentagon (heparin vs LMWH ATT, n=153,708 ICU admissions)

| Method | Y_bleed ATT | Y_vte ATT | RCT-direction recovery |
|---|---|---|---|
| v3 AIPW (baseline covariates) | +0.0253 [+0.004, +0.047] | +0.0766 [+0.050, +0.106] | No |
| v4 AIPW (severity-augmented 24h) | +0.0271 [+0.005, +0.049] | +0.0794 [+0.052, +0.107] | No |
| v5 AIPW (trajectory 72h × 3 bins) | +0.0301 [+0.009, +0.052] | +0.0802 [+0.053, +0.108] | No |
| IV-LATE (2SLS, n=4,383 prescribers) | +0.0625 [+0.048, +0.079] | +0.0102 [-0.002, +0.021] | Y_vte: yes (CI ⊃ 0) |
| DCIE neural counterfactual | +0.0255 [+0.0245, +0.0266] | deferred (full-Sinkhorn full-N hardware-gated) | No (matches AIPW) |
| Rosenbaum Γ-bound | Γ_zero = 1.06 (bleed) | Γ_zero = 1.17 (vte) | Quantitative fragility |

Five methods on the same cohort. None recovers RCT direction on Y_bleed; IV-LATE recovers RCT non-inferiority on Y_vte. Γ = 1.06 means a 5.6% odds-ratio shift from an unobserved confounder flips the bleed estimate.
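For intuition on how a flip point like Γ_zero can be computed, here is a textbook Rosenbaum bound for a binary outcome on matched pairs, using the sign (McNemar-type) test. Under hidden bias Γ, the chance that the treated unit is the one with the event in a discordant pair is at most Γ/(1+Γ), which gives a worst-case p-value. This is a sketch under stated assumptions: the pair counts are invented, the function names are hypothetical, and the test statistic actually used for the W12 bounds is not stated in this page.

```python
from math import comb
from typing import Optional


def upper_p_value(n_discordant: int, n_treated_events: int, gamma: float) -> float:
    """Worst-case one-sided sign-test p-value under hidden bias gamma.

    n_discordant: matched pairs where exactly one unit had the event.
    n_treated_events: discordant pairs where the treated unit had it.
    """
    p_plus = gamma / (1.0 + gamma)  # upper bound on P(treated unit has the event)
    return sum(
        comb(n_discordant, k) * p_plus**k * (1.0 - p_plus) ** (n_discordant - k)
        for k in range(n_treated_events, n_discordant + 1)
    )


def gamma_zero(
    n_discordant: int,
    n_treated_events: int,
    alpha: float = 0.05,
    step: float = 0.01,
    gamma_max: float = 10.0,
) -> Optional[float]:
    """Smallest gamma on a grid where the worst-case p-value exceeds alpha."""
    n_steps = int(round((gamma_max - 1.0) / step))
    for i in range(n_steps + 1):
        gamma = 1.0 + i * step
        if upper_p_value(n_discordant, n_treated_events, gamma) > alpha:
            return round(gamma, 2)
    return None  # finding survives every gamma on the grid


# Invented counts: 100 discordant pairs, treated unit had the event in 60.
flip_point = gamma_zero(n_discordant=100, n_treated_events=60)
```

A flip point only slightly above 1 means a very weak unobserved confounder would be enough to explain the finding away, which is the situation the table reports for Y_bleed.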

W13 anticoagulation comparator, positive control (DOAC vs warfarin DR-ATT, AFib+CKD)

| Outcome | ATT [95% CI] | Direction | RCT consistency |
|---|---|---|---|
| Y_stroke | +0.0084 [+0.0012, +0.0168] | DOAC marginally worse | RE-LY / ROCKET-AF / ARISTOTLE / ENGAGE-AF AFib+CKD subgroups |
| Y_bleed | -0.0396 [-0.0521, -0.0277] | DOAC ~27% relative reduction | RCT-consistent |

Cohort n=8,990 → 4,220 post Crump-2009 trim. First observational W-lane to recover RCT direction on both outcomes. Trim sensitivity α=0.05/0.10/0.15 robust (Δ ATT < 0.004).
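The trim-then-estimate step can be sketched as follows, assuming the propensity scores e(X) and the control-outcome predictions μ0(X) have already been fitted elsewhere. The code uses a fixed trimming threshold α (as in the sensitivity check above) rather than a data-driven optimal cutoff, and a common textbook form of the doubly-robust ATT; function names and toy values are hypothetical, and this is not presented as the exact W13 implementation.

```python
from typing import List, Sequence


def trim_overlap(propensity: Sequence[float], alpha: float = 0.10) -> List[int]:
    """Indices of units whose propensity lies in [alpha, 1 - alpha]."""
    return [i for i, e in enumerate(propensity) if alpha <= e <= 1.0 - alpha]


def dr_att(
    treated: Sequence[int],
    outcome: Sequence[float],
    propensity: Sequence[float],
    mu0: Sequence[float],
) -> float:
    """Doubly-robust ATT.

    Treated units contribute their residual against the control-outcome model;
    control units contribute the same residual, reweighted by e / (1 - e)
    and subtracted. The sum is divided by the number of treated units.
    """
    n_treated = sum(treated)
    total = 0.0
    for t, y, e, m0 in zip(treated, outcome, propensity, mu0):
        residual = y - m0
        if t == 1:
            total += residual
        else:
            total -= e / (1.0 - e) * residual
    return total / n_treated


# Toy cohort (made-up values); the last two units have extreme propensities.
treated = [1, 1, 0, 0, 1, 0]
outcome = [0.3, 0.4, 0.3, 0.2, 1.0, 0.1]
propensity = [0.6, 0.5, 0.4, 0.5, 0.98, 0.02]
mu0 = [0.2, 0.3, 0.2, 0.3, 0.9, 0.1]

keep = trim_overlap(propensity, alpha=0.10)
att = dr_att(
    [treated[i] for i in keep],
    [outcome[i] for i in keep],
    [propensity[i] for i in keep],
    [mu0[i] for i in keep],
)
```

Re-running the same estimate at several α values and comparing the results is the kind of trim-sensitivity check summarized by the Δ ATT figure above.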

DMPNN MoleculeNet, drug-discovery lane (Chemprop v2 parity, scaffold splits, 3 seeds)

| Benchmark | Rosenbound D-MPNN AUROC | Chemprop v2 reference | Within σ overlap |
|---|---|---|---|
| BBBP (blood-brain barrier penetration) | 0.9144 ± 0.0113 | 0.897 ± 0.012 | Yes (matches) |
| BACE (β-secretase inhibition) | 0.8861 ± 0.001 | 0.859 ± 0.024 | Yes (matches) |
| HIV (replication inhibition) | 0.7937 ± 0.0149 | 0.776 ± 0.020 | Yes (matches) |

In-house PyTorch rewrite (~500 LOC, no Chemprop runtime dependency). 5-test gradient-check suite green. Bemis-Murcko scaffold splits, no ensembling — published-baseline parity under stricter conditions.

HNSI clinical-NER pipeline (MIMIC-IV-Note v2.2, scispaCy + medspacy)

| Stage | Volume | Throughput | Coverage |
|---|---|---|---|
| W1 radiology corpus | 570K notes / 572 chunks / 141 MB output | 9.7 notes/sec | ~30M entities, 13-18% negation rate |
| W2 discharge + radiology corpus | 1.07M notes / 1,071 chunks / 975 MB output | 9.7 notes/sec | ~50M+ entities |
| Total processed | 1.64M clinical notes | en_core_sci_md NER + medspacy_sectionizer + ConText | 80M+ entities with negation/historical/family attributes |

Stage 1 cohort builder: 4.3 min via DuckDB. Block-NumPy-ABI-fix-1 unblocked the 3.7.5 spaCy stack on Windows (numpy < 2.0 + spacy 3.7.4 + thinc 8.2.5).

Source logs: raw training/eval log files referenced above are stored in RUN LOGS OF CAI/: MIMIC-IV W1/W2/W3/W5/W11 (05-03-2026-p1.txt); FAERS Pipeline B (04-22-2026.txt); ACIC22 V3 lock (05-02&03-2026.txt); W12 sensitivity pentagon + DCIE (05-04-2026.txt); DMPNN MoleculeNet (04-19-2026.txt, 04-20-2026.txt); HNSI extraction (04-26-26-p1.txt through 04-29-2026). Available to design partners under NDA on request.

Regulatory Pathway

Built to regulator standards from day one.

Non-device CDS exemption

The pharmacovigilance triage product satisfies all four non-device CDS criteria of 21st Century Cures Act § 3060 (no medical image or signal analysis, display or analysis of medical information, recommendations addressed to a professional, and independent review of the basis for each recommendation). We can ship to pharma drug-safety customers in 2026 without a 510(k).

21 CFR Part 11 ALCOA+

Patent-pending VBSM module satisfies 21 CFR Part 11 § 11.10(a)–(k) by design. SHA-256 chained ledger entries, monotonic_ns timestamps, fsync + PID-lock multi-writer protection. Pharma clients use our outputs in their own regulated workflows without bolting on separate audit infrastructure.
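The ledger properties named above (SHA-256 chaining, monotonic_ns timestamps, fsync on write) can be illustrated with a short append-only log. This is a minimal sketch, not the VBSM module: the JSON-lines layout, the field names, the all-zero genesis hash and the function names are assumptions, and the PID-lock multi-writer protection is left out.

```python
import hashlib
import json
import os
import time
from typing import Dict

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def _digest(prev_hash: str, mono_ns: int, payload: Dict) -> str:
    """SHA-256 over a canonical JSON encoding of the entry's contents."""
    body = json.dumps(
        {"prev": prev_hash, "mono_ns": mono_ns, "payload": payload},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(path: str, payload: Dict) -> Dict:
    """Append one entry chained to the previous entry's hash, then fsync."""
    prev_hash = GENESIS
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as fh:
            lines = fh.read().splitlines()
        if lines:
            prev_hash = json.loads(lines[-1])["hash"]
    mono_ns = time.monotonic_ns()
    entry = {
        "prev": prev_hash,
        "mono_ns": mono_ns,
        "payload": payload,
        "hash": _digest(prev_hash, mono_ns, payload),
    }
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, sort_keys=True) + "\n")
        fh.flush()
        os.fsync(fh.fileno())  # force the write to stable storage
    return entry


def verify_chain(path: str) -> bool:
    """Recompute every hash and check each entry links to its predecessor."""
    prev_hash = GENESIS
    with open(path, "r", encoding="utf-8") as fh:
        for line in fh:
            entry = json.loads(line)
            if entry["prev"] != prev_hash:
                return False
            expected = _digest(entry["prev"], entry["mono_ns"], entry["payload"])
            if entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
    return True
```

Because each hash covers the previous entry's hash, editing or deleting any earlier record makes `verify_chain` fail from that point on, which is what makes the trail tamper-evident rather than merely append-only.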

510(k) pathway documented (post-2027)

Q-Submission scoping for direct clinician-facing CDS deployment is scheduled for Q3 2026, contingent on the first signed pilot. The W12 anticoagulation lane may be eligible for Breakthrough Device designation, given its quantitative advantage over existing point-estimate-based decision support.

RWE Framework alignment

Aligned with FDA's 2024 Real-World Evidence Framework and EMA's 2023 Reflection Paper on RWE. Sensitivity-bounded reporting is exactly what both agencies are now requiring in observational submissions; we provide it as the default output, not as an afterthought.

Intellectual Property

USPTO provisional patent filed.

Founder Harsh Singh filed a USPTO provisional patent on March 22, 2026 covering the four core modules of the Rosenbound platform — DCIE neural counterfactual learner, VBSM ALCOA+ commit-gate ledger, PSIM online causal memory, and SMCE capability-conditional abstention — plus their integration architecture. 30+ claims. 12-month conversion window to non-provisional or PCT through March 2027. A second provisional covering the W-lane sensitivity-pentagon orchestration layer is in preparation.

DCIE

Deterministic Causal Inference Engine. TARNet + Sinkhorn-Wasserstein IPM + NCAAP twin-network + Dragonnet propensity head. Production-validated on ACIC22 V3 lock and MIMIC-IV vasopressor TE.

VBSM

Verifiable Boundary State Machine. 21 CFR Part 11 / ALCOA+ commit-gate ledger with cryptographically chained entries. Production-validated on W1, W2, Pipeline B, and DCIE retrains.

PSIM

Patient State Identity Manager. Online accumulating causal memory (ESCG + HCMC + MCLR). Production-validated on 2.83M FAERS-row backfill with full memory loop.

SMCE

Self-Monitoring Capability Estimator. Probe-based per-prediction capability estimation for abstain-when-uncertain. Production-validated A2/A3/A4 PASS on Pipeline B test split (n=158,732).

Become a design partner.

Looking for one pharmacovigilance, clinical-research, drug-discovery, or AMC partner who'd benefit from sensitivity-bounded causal reporting in their existing workflow. The pilot is no-cost; you provide an adverse-event corpus or observational cohort, we deliver causal estimates with Rosenbaum bounds and a 21 CFR Part 11 audit trail.

Email harsh22@bu.edu

Or connect via LinkedIn: linkedin.com/in/harshsingh2103