Rosenbound — Sensitivity-Bounded Clinical AI

The Founding Insight

Why we exist.

We ran five causal-inference estimators on a real ICU anticoagulation comparison: unfractionated heparin vs low-molecular-weight heparin (LMWH) on 153,708 admissions from MIMIC-IV. None of them recovered the direction reported in the RCTs on bleeding outcomes. Most observational-AI vendors don't publish that kind of result. We led with it.

The W12 Open-Confounder Benchmark

"No method recovers RCT direction. Here's the bound that explains why."

Across three AIPW (augmented inverse-probability weighting) variants with progressively richer covariate sets, an instrumental-variable 2SLS estimator using per-prescriber preference, and a patent-pending neural counterfactual learner with Sinkhorn-Wasserstein representation balancing, every estimator converged on the same answer for bleeding: a positive ATT (average treatment effect on the treated) for LMWH, opposite to the RCT direction.

Rosenbaum sensitivity bounds quantified why: an unobserved confounder of strength Γ = 1.06 (a 5.6% odds-ratio shift) would be enough to pull the estimate to null. That is exactly the magnitude physician-judgment artifacts plausibly produce.
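A minimal sketch of how a Γ threshold of this kind can be computed, assuming the textbook matched-pair form of Rosenbaum's bound for a binary outcome (the page does not say which variant Rosenbound runs). Among discordant pairs, hidden bias of at most Γ caps the chance that the treated unit is the one with the event at Γ/(1+Γ), which gives a worst-case p-value. The function names and the pair counts in the usage note are hypothetical.

```python
import math


def binom_upper_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), summing the pmf in log space."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


def rosenbaum_upper_p(discordant: int, treated_events: int, gamma: float) -> float:
    """Worst-case one-sided p-value when hidden bias is at most gamma (gamma >= 1)."""
    p_plus = gamma / (1.0 + gamma)  # upper bound on P(treated unit has the event)
    return binom_upper_tail(discordant, treated_events, p_plus)


def critical_gamma(discordant, treated_events, alpha=0.05, step=0.01, gamma_max=5.0):
    """Smallest gamma on the grid at which the worst-case p-value exceeds alpha."""
    n_steps = int(round((gamma_max - 1.0) / step))
    for i in range(n_steps + 1):
        gamma = 1.0 + i * step
        if rosenbaum_upper_p(discordant, treated_events, gamma) > alpha:
            return round(gamma, 2)
    return None  # effect survives every gamma up to gamma_max
```

For example, `critical_gamma(100, 60)` scans Γ upward from 1.00 for a hypothetical study with 100 discordant pairs, 60 of which have the event in the treated unit. A result close to 1 signals the kind of fragility described above.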

Methods tested

5 estimators + Rosenbaum bounds

RCT-direction recovered

Y_vte: yes (IV-LATE) · Y_bleed: no

Γ to flip estimate

1.06 (very sensitive)

What we report

All five + the sensitivity bound

Honest reporting of fragility is more valuable than pretending you don't have it. Pharmaceutical drug-safety teams, clinical-research methodologists, and physicians making treatment decisions all need this honesty — and current vendor tooling doesn't provide it.

Four Verticals, One Substrate

Where Rosenbound deploys.

Same causal-inference engine + audit-trail ledger across four buyer types. Each vertical pays differently; the technical substrate is shared.

Pharmacovigilance

Drug-safety triage with sensitivity bounds

For drug-safety directors at top-20 pharma and safety leads at major CROs. Replaces FAERS-disproportionality workflows. Every adverse-event signal comes with Rosenbaum bounds and 21 CFR Part 11 ALCOA+ provenance — regulator-ready by default.

Clinical Research

Real-world-evidence methodology, packaged

For RWE methodology leads at academic medical centers and CROs. Five-method causal sensitivity pentagon as a single API call. Replaces the hand-rolled methodologist work behind every observational pharma study. Validated on ACIC22 (3,400 datasets, coverage 77.53%).

Drug Discovery

Causal effect estimation for target validation

For computational biology and target-discovery teams. Augments DMPNN-based molecular property prediction (validated on BACE / BBBP / HIV scaffold splits) with causal-inference-driven mechanism-of-action analysis. Honest bounds on every effect estimate.

Physician Decision Support

Treatment-selection risk surface (post-2027)

For ICU and clinical-AMC physicians making individualized treatment decisions. Risk surface with explicit "this estimate is fragile to unobserved confounding at Γ = X" disclosure. Targeted FDA Class II SaMD 510(k) pathway; available 2027+ post-clinical-deployment validation.

Benchmarks

Actual numbers, actual logs.

Every metric below is from production training runs on the full data corpus. Test AUROC / AUPRC / ECE / coverage values are quoted directly from the timestamped log files; there are no smoke-run inflations and no cherry-picked seeds. Specific log file references are listed at the bottom of this section.

MIMIC-IV v3.1 clinical prediction · Full corpus, 5-fold CV + temporal test split

Lane	Outcome	Cohort	Test AUROC	Calibration / secondary metric	Published reference
W1	In-hospital mortality	n=546K admits	0.9476	Beta-Bayes ECE 0.0024	Tomasev 2019: 0.92
W2	30-day all-cause readmission	n=534K excl. deaths	0.7034	Isotonic ECE 0.0040	Rajkomar 2018: 0.75-0.76 (different cohort)
W3	Sepsis-3 onset (first 48h ICU)	n=74,829 ICU stays	0.8908	AUPRC 0.8710	Saria 0.83-0.85, Komorowski 0.85, Hyland 0.82-0.86
W5	KDIGO AKI Stage 1+ (first 48h)	n=75K ICU stays	0.8371	AUPRC 0.6804	Tomasev 2019: 0.82, Koyner 0.82
W11	Mortality time-to-event (Cox PH)	n=546K admits	c-index 0.8019	IPCW c 0.6468 · IBS 0.0841	Cheng 2019 LSTM: 0.81-0.85
W11	Mortality time-to-event (AFT)	n=546K admits	c-index 0.7707	IPCW c 0.6375	dual-comparator survival

All five lanes are locked with an automatic pick between Beta-Bayes and isotonic calibration based on validation ECE; the calibrator is chosen per workload, not per dataset family.
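The auto-pick can be illustrated with a short sketch, assuming an equal-width 10-bin ECE and candidate probabilities that have already been calibrated on held-out data. `expected_calibration_error` and `pick_calibrator` are hypothetical names, and the binning scheme is an assumption rather than the documented one.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |mean confidence - event rate| per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(p for p, _ in members) / len(members)
        event_rate = sum(y for _, y in members) / len(members)
        ece += len(members) / n * abs(mean_conf - event_rate)
    return ece


def pick_calibrator(candidates, labels, n_bins=10):
    """Return the candidate name with the lowest validation ECE, plus all scores.

    `candidates` maps a calibrator name to its calibrated validation probabilities.
    """
    scores = {
        name: expected_calibration_error(probs, labels, n_bins)
        for name, probs in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores
```

Calling `pick_calibrator({"beta_bayes": p_beta, "isotonic": p_iso}, y_val)` once per workload mirrors the per-workload selection described above.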

FAERS pharmacovigilance: Pipeline B severity classifier · Full 20M-row corpus, temporal split

Split	Cohort	AUROC	AUPRC	ECE	Brier
Train	n=420,294 (pre-2020)	0.9562	0.8616	0.0946	0.0919
Validation	n=170,716	0.8858	0.7836	0.1751	0.1692
Test (2020-2025)	n=158,732	0.8872	0.7680	0.1744	0.1669

Test AUROC is approximately +9pp above the demographics-only Bate 2019 baseline (~0.78). The elevated test ECE can be reduced with SMCE capability-conditional abstention on the same test split (A2/A3/A4 PASS).

ACIC22 Track-2 causal inference challenge · V3 lock, full 3,400-cohort canonical

Estimator	Bias	RMSE	Coverage	Width	Lift over baseline
DCIE Ensemble	+19.26	28.80	77.53%	78.04	11× coverage lift (vs 7% baseline)

40.86 sec/dataset on 12-thread laptop. Coverage gain unlocked by three Phase-1 ensemble fixes: 7-module checkpoint completeness, stratified bootstrap fallback, Sinkhorn cost-matrix clamp.
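For readers unfamiliar with the columns, here is a minimal sketch of how bias, RMSE, interval coverage, and mean interval width are conventionally computed over a set of benchmark datasets with known ground truth. It assumes one point estimate and one interval per dataset; `interval_metrics` is a hypothetical helper, not the challenge's official scorer.

```python
import math


def interval_metrics(estimates, lowers, uppers, truths):
    """Summarize point and interval quality across datasets with known effects."""
    n = len(truths)
    errors = [est - truth for est, truth in zip(estimates, truths)]
    bias = sum(errors) / n
    rmse = math.sqrt(sum(err * err for err in errors) / n)
    # Coverage: share of datasets whose interval contains the true effect.
    covered = sum(lo <= truth <= hi for lo, hi, truth in zip(lowers, uppers, truths))
    width = sum(hi - lo for lo, hi in zip(lowers, uppers)) / n
    return {"bias": bias, "rmse": rmse, "coverage": covered / n, "width": width}
```

On this definition the reported lift is a plain ratio: 77.53% coverage against a 7% baseline is roughly 11×.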

W12 anticoagulation causal sensitivity pentagon · Heparin vs LMWH ATT, n=153,708 ICU admissions

Method	Y_bleed ATT	Y_vte ATT	RCT-direction recovery
v3 AIPW (baseline covariates)	+0.0253 [+0.004, +0.047]	+0.0766 [+0.050, +0.106]	No
v4 AIPW (severity-augmented 24h)	+0.0271 [+0.005, +0.049]	+0.0794 [+0.052, +0.107]	No
v5 AIPW (trajectory 72h × 3 bins)	+0.0301 [+0.009, +0.052]	+0.0802 [+0.053, +0.108]	No
IV-LATE (2SLS, n=4,383 prescribers)	+0.0625 [+0.048, +0.079]	+0.0102 [-0.002, +0.021]	Y_vte: yes (CI includes 0) · Y_bleed: no
DCIE neural counterfactual	+0.0255 [+0.0245, +0.0266]	deferred (full-Sinkhorn full-N hardware-gated)	No (matches AIPW)
Rosenbaum Γ-bound	Γ_zero = 1.06	Γ_zero = 1.17	Quantitative fragility

Five methods on the same cohort. None recovers RCT direction on Y_bleed; IV-LATE recovers RCT non-inferiority on Y_vte. Γ = 1.06 means a 5.6% odds-ratio shift from an unobserved confounder flips the bleed estimate.
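The AIPW rows rest on a doubly robust ATT estimator. A minimal sketch of the standard form follows, assuming the propensity scores e(X) and the control-outcome predictions μ0(X) have already been fitted by some upstream model. `aipw_att` is a hypothetical name, and this is not claimed to be the v3/v4/v5 implementation.

```python
def aipw_att(treated, outcomes, propensity, mu0):
    """Doubly robust ATT from precomputed nuisance estimates.

    treated    : 0/1 treatment indicators
    outcomes   : observed outcomes Y
    propensity : estimated P(T=1 | X), strictly below 1 for controls
    mu0        : estimated E[Y | T=0, X]
    """
    n_treated = sum(treated)
    total = 0.0
    for t, y, e, m0 in zip(treated, outcomes, propensity, mu0):
        residual = y - m0
        if t == 1:
            # Treated units contribute their outcome minus the predicted control outcome.
            total += residual
        else:
            # Controls, weighted by the odds e/(1-e), correct errors in mu0.
            total -= e / (1.0 - e) * residual
    return total / n_treated
```

The estimate stays consistent if either the propensity model or the outcome model is correct, but no choice of observed covariates protects it against an unobserved confounder, which is what the Γ row quantifies.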

W13 anticoagulation comparator (positive control) · DOAC vs warfarin DR-ATT, AFib+CKD

Outcome	ATT [95% CI]	Direction	RCT consistency
Y_stroke	+0.0084 [+0.0012, +0.0168]	DOAC marginally worse	RE-LY / ROCKET-AF / ARISTOTLE / ENGAGE-AF AFib+CKD subgroups
Y_bleed	-0.0396 [-0.0521, -0.0277]	DOAC ~27% relative reduction	RCT-consistent

Cohort n=8,990, reduced to 4,220 after Crump (2009) propensity trimming. First observational W-lane to recover RCT direction on both outcomes. Results are robust across trim thresholds α = 0.05 / 0.10 / 0.15 (Δ ATT < 0.004).
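Propensity trimming in the style of Crump et al. (2009) keeps only units whose estimated propensity lies in [α, 1 − α]. A minimal sketch of the fixed-threshold rule and of a sensitivity sweep over α follows; `crump_trim` and `trim_sweep` are hypothetical names, and the propensity values in the usage note are made up for illustration.

```python
def crump_trim(propensity, alpha=0.10):
    """Return indices of units with estimated propensity inside [alpha, 1 - alpha]."""
    return [i for i, e in enumerate(propensity) if alpha <= e <= 1.0 - alpha]


def trim_sweep(propensity, alphas=(0.05, 0.10, 0.15)):
    """Map each trim threshold to the number of units that survive it."""
    return {alpha: len(crump_trim(propensity, alpha)) for alpha in alphas}
```

With `propensity = [0.02, 0.07, 0.12, 0.5, 0.88, 0.93, 0.99]`, the sweep keeps 5, 3, and 1 units at α = 0.05, 0.10, and 0.15. Re-estimating the ATT on each retained subset and comparing the results is one way to run the robustness check described above.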

DMPNN MoleculeNet (drug-discovery lane) · Chemprop v2 parity, scaffold splits, 3 seeds

Benchmark	Rosenbound D-MPNN AUROC	Chemprop v2 reference	Within σ overlap
BBBP (blood-brain barrier penetration)	0.9144 ± 0.0113	0.897 ± 0.012	Yes (matches)
BACE (β-secretase inhibition)	0.8861 ± 0.001	0.859 ± 0.024	No (above reference; 1σ bands miss by ~0.002)
HIV (replication inhibition)	0.7937 ± 0.0149	0.776 ± 0.020	Yes (matches)

In-house PyTorch rewrite (~500 LOC, no Chemprop runtime dependency). 5-test gradient-check suite green. Bemis-Murcko scaffold splits, no ensembling — published-baseline parity under stricter conditions.
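A scaffold split puts every molecule that shares a Bemis-Murcko scaffold into the same partition, so the test set contains chemotypes the model never saw in training. A minimal sketch of the usual greedy assignment follows, assuming the scaffold string for each molecule has already been computed (for example with RDKit, which is deliberately not imported here). `scaffold_split` and the 80/10/10 fractions are assumptions, not the documented configuration.

```python
from collections import defaultdict


def scaffold_split(scaffolds, frac_train=0.8, frac_valid=0.1):
    """Greedy scaffold split: largest scaffold groups fill train first.

    scaffolds : one precomputed scaffold string per molecule
    Returns (train, valid, test) lists of molecule indices.
    """
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)
    # Largest groups first; ties broken by first occurrence for determinism.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))
    n = len(scaffolds)
    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= frac_train * n:
            train.extend(group)
        elif len(valid) + len(group) <= frac_valid * n:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test
```

Because whole groups are assigned at once, the realized split sizes only approximate the requested fractions. The benefit is that no scaffold leaks across partitions.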

HNSI clinical-NER pipeline · MIMIC-IV-Note v2.2, scispaCy + medspacy

Stage	Volume	Throughput	Coverage
W1 radiology corpus	570K notes / 572 chunks / 141 MB output	9.7 notes/sec	~30M entities, 13-18% negation rate
W2 discharge + radiology corpus	1.07M notes / 1,071 chunks / 975 MB output	9.7 notes/sec	~50M+ entities
Total processed	1.64M clinical notes	en_core_sci_md NER + medspacy_sectionizer + ConText	80M+ entities with negation/historical/family attributes

Stage 1 cohort builder: 4.3 min via DuckDB. Block-NumPy-ABI-fix-1 unblocked the 3.7.5 spaCy stack on Windows (numpy < 2.0 + spacy 3.7.4 + thinc 8.2.5).

Source logs — raw training/eval log files referenced above are stored in RUN LOGS OF CAI/: MIMIC-IV W1/W2/W3/W5/W11 (05-03-2026-p1.txt); FAERS Pipeline B (04-22-2026.txt); ACIC22 V3 lock (05-02&03-2026.txt); W12 sensitivity pentagon + DCIE (05-04-2026.txt); DMPNN MoleculeNet (04-19-2026.txt, 04-20-2026.txt); HNSI extraction (04-26-26-p1.txt through 04-29-2026). Available to design partners under NDA on request.

Regulatory Pathway

Built to regulator standards from day one.

✓ Non-device CDS exemption

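The pharmacovigilance triage product is designed to meet all four non-device CDS criteria added by 21st Century Cures Act § 3060 (FD&C Act § 520(o)(1)(E)): it does not acquire, process, or analyze medical images or signals; it displays and analyzes medical information; it supports recommendations rather than replacing professional judgment; and it lets the reviewer independently examine the basis for each recommendation. That lets us ship to pharma drug-safety customers in 2026 without a 510(k).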

✓ 21 CFR Part 11 ALCOA+

Patent-pending VBSM module satisfies 21 CFR Part 11 § 11.10(a)–(k) by design. SHA-256 chained ledger entries, monotonic_ns timestamps, fsync + PID-lock multi-writer protection. Pharma clients use our outputs in their own regulated workflows without bolting on separate audit infrastructure.
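The ledger mechanics named above (SHA-256 chaining, `monotonic_ns` timestamps, fsync on write) can be sketched with the Python standard library alone. This is an illustrative sketch, not the VBSM implementation: the JSON-lines layout, the field names, and the `append_entry` / `verify_chain` helpers are assumptions, and the PID-lock multi-writer protection is omitted.

```python
import hashlib
import json
import os
import time

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def _digest(record):
    """SHA-256 over a canonical JSON encoding of the record."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(path, payload, prev_hash):
    """Append one chained entry and force it to disk; return its hash."""
    record = {
        "prev_hash": prev_hash,
        "ts_monotonic_ns": time.monotonic_ns(),  # ordering clock, not wall time
        "payload": payload,
    }
    entry_hash = _digest(record)
    line = json.dumps({"hash": entry_hash, "record": record}, sort_keys=True)
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(line + "\n")
        fh.flush()
        os.fsync(fh.fileno())  # durable before we report success
    return entry_hash


def verify_chain(path):
    """Recompute every hash and check each entry points at its predecessor."""
    prev = GENESIS
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            entry = json.loads(line)
            record = entry["record"]
            if record["prev_hash"] != prev or _digest(record) != entry["hash"]:
                return False
            prev = entry["hash"]
    return True
```

Editing any stored payload changes its recomputed digest, so `verify_chain` fails from the altered entry onward. A monotonic clock orders events within one process, so a production ledger would typically also record wall-clock time.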

✓ 510(k) pathway documented (post-2027)

Q-Submission scoping for direct clinician-facing CDS deployment is scheduled for Q3 2026, contingent on the first signed pilot. The W12 anticoagulation lane is a candidate for Breakthrough Device designation, given its quantitative advantage over existing point-estimate-based decision support.

✓ RWE Framework alignment

Aligned with FDA's 2024 Real-World Evidence Framework and EMA's 2023 Reflection Paper on RWE. Sensitivity-bounded reporting is exactly what both agencies are now requiring in observational submissions; we provide it as the default output, not as an afterthought.

Intellectual Property

USPTO provisional patent filed.

Founder Harsh Singh filed a USPTO provisional patent on March 22, 2026 covering the four core modules of the Rosenbound platform — DCIE neural counterfactual learner, VBSM ALCOA+ commit-gate ledger, PSIM online causal memory, and SMCE capability-conditional abstention — plus their integration architecture. 30+ claims. 12-month conversion window to non-provisional or PCT through March 2027. A second provisional covering the W-lane sensitivity-pentagon orchestration layer is in preparation.

DCIE

Deterministic Causal Inference Engine. TARNet + Sinkhorn-Wasserstein IPM + NCAAP twin-network + Dragonnet propensity head. Production-validated on ACIC22 V3 lock and MIMIC-IV vasopressor TE.

VBSM

Verifiable Boundary State Machine. 21 CFR Part 11 / ALCOA+ commit-gate ledger with cryptographically chained entries. Production-validated on W1, W2, Pipeline B, and DCIE retrains.

PSIM

Patient State Identity Manager. Online accumulating causal memory (ESCG + HCMC + MCLR). Production-validated on 2.83M FAERS-row backfill with full memory loop.

SMCE

Self-Monitoring Capability Estimator. Probe-based per-prediction capability estimation for abstain-when-uncertain. Production-validated A2/A3/A4 PASS on Pipeline B test split (n=158,732).
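Abstain-when-uncertain can be sketched as a simple gate, assuming each prediction already carries a capability score in [0, 1] from some upstream probe. How SMCE derives that score is not described here, so `selective_report`, the 0.5 threshold, and the use of the Brier score as the quality measure are all hypothetical choices for illustration.

```python
def selective_report(probs, labels, capability, threshold=0.5):
    """Keep predictions whose capability score clears the threshold.

    Returns the retained indices, the fraction retained, and the Brier score
    on the retained predictions (None if everything was abstained).
    """
    kept = [i for i, c in enumerate(capability) if c >= threshold]
    fraction_kept = len(kept) / len(probs)
    if not kept:
        return kept, fraction_kept, None
    brier = sum((probs[i] - labels[i]) ** 2 for i in kept) / len(kept)
    return kept, fraction_kept, brier
```

Sweeping the threshold trades the fraction of predictions answered against the quality of the answers that remain, which is the trade-off an abstention policy has to report.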

Causal inference you can defend in front of regulators.