Most clinical-AI platforms sell observational point estimates as if they were RCT-grade evidence. They aren't. We ship causal estimates with explicit Rosenbaum sensitivity bounds and 21 CFR Part 11 ALCOA+ audit trails. The first platform that reports the fragility of its own claims.
We ran five different causal-inference estimators on a real ICU anticoagulation comparison — heparin vs LMWH on 153,708 admissions from MIMIC-IV. None of them recovered the RCT-published direction on bleeding outcomes. Most observational-AI vendors don't publish that result. We led with it.
Across AIPW with progressively richer covariate sets, instrumental-variable 2SLS using per-prescriber preference, and a patent-pending neural counterfactual learner with Sinkhorn-Wasserstein representation balancing — every estimator converged on the same answer: positive ATT for LMWH on bleeding, opposite to RCT direction.
Rosenbaum sensitivity bounds quantified why: an unobserved confounder of strength Γ = 1.06 (a 5.6% odds-ratio shift) would be enough to make the estimate indistinguishable from null. That is exactly the magnitude physician-judgment artifacts plausibly produce.
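To make the Γ figure concrete, here is a minimal sketch of a Rosenbaum bound for a binary outcome on matched pairs (the McNemar form). It is an illustration under stated assumptions, not the platform's code: the function names are hypothetical, the analysis is assumed to use 1:1 matched pairs, and the pair counts in the usage line are invented. At sensitivity Γ, the chance that the treated unit is the one with the event in a discordant pair is at most Γ/(1+Γ), which gives a worst-case p-value.

```python
import math


def binom_upper_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    total = 0.0
    for j in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                   + j * math.log(p) + (n - j) * math.log(1.0 - p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def worst_case_p(discordant: int, treated_events: int, gamma: float) -> float:
    """Rosenbaum upper bound on the one-sided McNemar p-value at sensitivity gamma."""
    p_plus = gamma / (1.0 + gamma)  # largest chance the treated unit has the event
    return binom_upper_tail(discordant, treated_events, p_plus)


def gamma_zero(discordant: int, treated_events: int, alpha: float = 0.05,
               step: float = 0.01, gamma_max: float = 5.0) -> float:
    """Smallest gamma on the grid whose worst-case p-value exceeds alpha."""
    gamma = 1.0
    while gamma <= gamma_max:
        if worst_case_p(discordant, treated_events, gamma) > alpha:
            return round(gamma, 2)
        gamma += step
    return gamma_max


# Invented counts: 400 discordant pairs, 228 of them with the event in the treated unit.
print(gamma_zero(400, 228))
```

A value close to 1 means that a very small hidden bias in treatment assignment is enough to explain the finding away.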
Honest reporting of fragility is more valuable than pretending you don't have it. Pharmaceutical drug-safety teams, clinical-research methodologists, and physicians making treatment decisions all need this honesty — and current vendor tooling doesn't provide it.
Same causal-inference engine + audit-trail ledger across four buyer types. Each vertical pays differently; the technical substrate is shared.
For drug-safety directors at top-20 pharma and safety leads at major CROs. Replaces FAERS-disproportionality workflows. Every adverse-event signal comes with Rosenbaum bounds and 21 CFR Part 11 ALCOA+ provenance — regulator-ready by default.
For RWE methodology leads at academic medical centers and CROs. Five-method causal sensitivity pentagon as a single API call. Replaces the hand-rolled methodologist work behind every observational pharma study. Validated on ACIC22 (3,400 datasets, coverage 77.53%).
For computational biology and target-discovery teams. Augments DMPNN-based molecular property prediction (validated on BACE / BBBP / HIV scaffold splits) with causal-inference-driven mechanism-of-action analysis. Honest bounds on every effect estimate.
For ICU and clinical-AMC physicians making individualized treatment decisions. Risk surface with explicit "this estimate is fragile to unobserved confounding at Γ = X" disclosure. Targeted FDA Class II SaMD 510(k) pathway; availability is targeted for 2027 or later, after clinical-deployment validation.
Standard practice ships one ATT estimate. We ship the full sensitivity pentagon plus quantitative bounds. Every number comes with the uncertainty its method admits.
Every metric below is from production training runs on the full data corpus. Test AUROC / AUPRC / ECE / coverage values are quoted directly from the timestamped log files; there are no smoke-run inflations and no cherry-picked seeds. Specific log file references are listed at the bottom of this section.
| Lane | Outcome | Cohort | Test AUROC | Calibration | Published reference |
|---|---|---|---|---|---|
| W1 | In-hospital mortality | n=546K admits | 0.9476 | Beta-Bayes ECE 0.0024 | Tomasev 2019: 0.92 |
| W2 | 30-day all-cause readmission | n=534K excl. deaths | 0.7034 | Isotonic ECE 0.0040 | Rajkomar 2018: 0.75-0.76 (different cohort) |
| W3 | Sepsis-3 onset (first 48h ICU) | n=74,829 ICU stays | 0.8908 | AUPRC 0.8710 | Saria 0.83-0.85, Komorowski 0.85, Hyland 0.82-0.86 |
| W5 | KDIGO AKI Stage 1+ (first 48h) | n=75K ICU stays | 0.8371 | AUPRC 0.6804 | Tomasev 2019: 0.82, Koyner 0.82 |
| W11 | Mortality time-to-event (Cox PH) | n=546K admits | c-index 0.8019 | IPCW c 0.6468 · IBS 0.0841 | Cheng 2019 LSTM: 0.81-0.85 |
| W11 | Mortality time-to-event (AFT) | n=546K admits | c-index 0.7707 | IPCW c 0.6375 | dual-comparator survival |
All five lanes are locked under an automatic Beta-Bayes vs isotonic pick based on validation ECE; the calibrator is chosen per workload, not per dataset family.
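The auto-pick rule can be sketched as follows. This is a simplified illustration, not the production selector: `expected_calibration_error` uses a conventional equal-width 10-bin ECE (the binning used in the runs above is not stated), `pick_calibrator` is a hypothetical name, and the two candidates in the usage lines are toy stand-ins rather than fitted Beta or isotonic calibrators.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def expected_calibration_error(probs: Sequence[float], labels: Sequence[int],
                               n_bins: int = 10) -> float:
    """Equal-width-bin ECE: weighted mean of |observed rate - mean confidence|."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        conf = sum(p for p, _ in bucket) / len(bucket)
        rate = sum(y for _, y in bucket) / len(bucket)
        ece += len(bucket) / n * abs(rate - conf)
    return ece


def pick_calibrator(candidates: Dict[str, Callable[[float], float]],
                    val_probs: Sequence[float], val_labels: Sequence[int]) -> str:
    """Return the name of the candidate with the lowest validation ECE."""
    scores = {name: expected_calibration_error([f(p) for p in val_probs], val_labels)
              for name, f in candidates.items()}
    return min(scores, key=scores.get)


# Toy validation set: the raw model says 0.8 everywhere, the observed rate is 0.4.
val_probs = [0.8] * 10
val_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
candidates = {"identity": lambda p: p, "halve": lambda p: 0.5 * p}
print(pick_calibrator(candidates, val_probs, val_labels))
```

Selecting on validation ECE only and reporting on the held-out test split keeps the calibrator choice from leaking into the reported test numbers.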
| Split | Cohort | AUROC | AUPRC | ECE | Brier |
|---|---|---|---|---|---|
| Train | n=420,294 (pre-2020) | 0.9562 | 0.8616 | 0.0946 | 0.0919 |
| Validation | n=170,716 | 0.8858 | 0.7836 | 0.1751 | 0.1692 |
| Test (2020-2025) | n=158,732 | 0.8872 | 0.7680 | 0.1744 | 0.1669 |
Test AUROC is approximately +11pp above the demographics-only Bate 2019 baseline (~0.78). ECE is recoverable via SMCE capability-conditional abstention on the same test split (A2/A3/A4 PASS).
| Estimator | Bias | RMSE | Coverage | Width | Lift over baseline |
|---|---|---|---|---|---|
| DCIE Ensemble | +19.26 | 28.80 | 77.53% | 78.04 | 11× coverage lift (vs 7% baseline) |
40.86 sec/dataset on 12-thread laptop. Coverage gain unlocked by three Phase-1 ensemble fixes: 7-module checkpoint completeness, stratified bootstrap fallback, Sinkhorn cost-matrix clamp.
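For readers comparing against their own runs, the four columns above can be computed per benchmark as in the sketch below. It uses the conventional definitions (mean error, root mean squared error, share of intervals containing the truth, mean interval width); the official ACIC22 scoring may differ in detail, and `interval_metrics` and the toy numbers are hypothetical.

```python
import math
from typing import Dict, Sequence


def interval_metrics(truth: Sequence[float], estimate: Sequence[float],
                     lower: Sequence[float], upper: Sequence[float]) -> Dict[str, float]:
    """Bias, RMSE, interval coverage and mean interval width across datasets."""
    n = len(truth)
    errors = [est - tru for est, tru in zip(estimate, truth)]
    covered = [lo <= tru <= hi for tru, lo, hi in zip(truth, lower, upper)]
    return {
        "bias": sum(errors) / n,
        "rmse": math.sqrt(sum(err * err for err in errors) / n),
        "coverage": sum(covered) / n,  # fraction of intervals containing the truth
        "width": sum(hi - lo for lo, hi in zip(lower, upper)) / n,
    }


# Toy values for four datasets with known true effects (simulated benchmarks only).
truth = [1.0, 2.0, 3.0, 4.0]
estimate = [1.5, 2.5, 2.5, 5.0]
lower = [0.0, 2.2, 2.0, 4.5]
upper = [2.0, 3.0, 3.0, 5.5]
print(interval_metrics(truth, estimate, lower, upper))
```

Coverage can only be scored where the true effect is known, which is why it is reported on a simulated benchmark and not on the clinical cohorts.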
| Method | Y_bleed ATT | Y_vte ATT | RCT-direction recovery |
|---|---|---|---|
| v3 AIPW (baseline covariates) | +0.0253 [+0.004, +0.047] | +0.0766 [+0.050, +0.106] | No |
| v4 AIPW (severity-augmented 24h) | +0.0271 [+0.005, +0.049] | +0.0794 [+0.052, +0.107] | No |
| v5 AIPW (trajectory 72h × 3 bins) | +0.0301 [+0.009, +0.052] | +0.0802 [+0.053, +0.108] | No |
| IV-LATE (2SLS, n=4,383 prescribers) | +0.0625 [+0.048, +0.079] | +0.0102 [-0.002, +0.021] | Y_vte: yes (CI includes 0) |
| DCIE neural counterfactual | +0.0255 [+0.0245, +0.0266] | deferred (full-Sinkhorn full-N hardware-gated) | No (matches AIPW) |
| Rosenbaum Γ-bound | Γ_zero = 1.06 | Γ_zero = 1.17 | Quantitative fragility |
Five methods on the same cohort. None recovers RCT direction on Y_bleed; IV-LATE recovers RCT non-inferiority on Y_vte. Γ = 1.06 means a 5.6% odds-ratio shift from an unobserved confounder flips the bleed estimate.
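Two of the point estimators in the table can be written down compactly. The sketch below shows the standard doubly robust (AIPW) form of the ATT and the single-instrument 2SLS ratio; it is not the platform's implementation. The function names are hypothetical, the propensities `e` and control-outcome predictions `m0` are assumed to come from models fitted elsewhere, the 2SLS form omits the covariates a real analysis would include, and all data are invented.

```python
from typing import Sequence


def aipw_att(t: Sequence[int], y: Sequence[float],
             e: Sequence[float], m0: Sequence[float]) -> float:
    """Doubly robust ATT. e = P(T=1 | X), assumed < 1; m0 = E[Y | T=0, X]."""
    n_treated = sum(t)
    if n_treated == 0:
        raise ValueError("need at least one treated unit")
    total = 0.0
    for ti, yi, ei, m0i in zip(t, y, e, m0):
        residual = yi - m0i
        if ti == 1:
            total += residual                    # treated: observed minus predicted control outcome
        else:
            total -= ei / (1.0 - ei) * residual  # comparator: odds-weighted residual correction
    return total / n_treated


def _cov(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


def wald_iv(z: Sequence[float], t: Sequence[float], y: Sequence[float]) -> float:
    """2SLS with one instrument and no covariates: cov(z, y) / cov(z, t)."""
    first_stage = _cov(z, t)
    if abs(first_stage) < 1e-12:
        raise ValueError("instrument does not move treatment")
    return _cov(z, y) / first_stage


# Toy AIPW data: the outcome model is exact and the true effect is +0.03.
t = [1, 1, 0, 0]
m0 = [0.10, 0.20, 0.10, 0.20]
y = [0.13, 0.23, 0.10, 0.20]
e = [0.6, 0.4, 0.6, 0.4]
print(round(aipw_att(t, y, e, m0), 4))

# Toy IV data: z is a prescriber-preference indicator, true effect is +0.05.
z = [0, 0, 0, 0, 1, 1, 1, 1]
t_iv = [0, 0, 0, 1, 0, 1, 1, 1]
y_iv = [0.1 + 0.05 * ti for ti in t_iv]
print(round(wald_iv(z, t_iv, y_iv), 4))
```

The two estimators target different quantities: AIPW estimates the effect among the treated under no unmeasured confounding, while the IV ratio estimates a local effect among patients whose treatment follows prescriber preference, which is one reason the rows need not agree.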
| Outcome | ATT [95% CI] | Direction | RCT consistency |
|---|---|---|---|
| Y_stroke | +0.0084 [+0.0012, +0.0168] | DOAC marginally worse | RE-LY / ROCKET-AF / ARISTOTLE / ENGAGE-AF AFib+CKD subgroups |
| Y_bleed | -0.0396 [-0.0521, -0.0277] | DOAC ~27% relative reduction | RCT-consistent |
Cohort n=8,990 → 4,220 post Crump-2009 trim. First observational W-lane to recover RCT direction on both outcomes. Trim sensitivity α=0.05/0.10/0.15 robust (Δ ATT < 0.004).
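The trimming step and its sensitivity check can be sketched as below. This is an illustration only: `crump_trim` and `trim_sensitivity` are hypothetical names, `estimate_att` stands in for whichever ATT estimator is rerun on the retained units, and the default α = 0.10 is the commonly cited rule of thumb rather than a statement of which threshold produced the 4,220-patient cohort.

```python
from typing import Callable, Dict, List, Sequence


def crump_trim(propensity: Sequence[float], alpha: float = 0.10) -> List[int]:
    """Indices of units whose propensity lies in [alpha, 1 - alpha]."""
    return [i for i, e in enumerate(propensity) if alpha <= e <= 1.0 - alpha]


def trim_sensitivity(propensity: Sequence[float],
                     estimate_att: Callable[[List[int]], float],
                     alphas: Sequence[float] = (0.05, 0.10, 0.15)) -> Dict[float, float]:
    """Re-estimate the ATT on each trimmed cohort and return one value per alpha."""
    return {a: estimate_att(crump_trim(propensity, a)) for a in alphas}


# Toy propensities and a stand-in estimator (mean outcome over the kept units).
propensity = [0.02, 0.07, 0.12, 0.50, 0.88, 0.93, 0.97]
outcome = [0.0, 0.1, 0.2, 0.2, 0.2, 0.3, 0.9]


def toy_att(kept: List[int]) -> float:
    return sum(outcome[i] for i in kept) / len(kept)


results = trim_sensitivity(propensity, toy_att)
spread = max(results.values()) - min(results.values())
print(results, spread)
```

A small spread across thresholds is what the "Δ ATT < 0.004" robustness statement above refers to.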
| Benchmark | Rosenbound D-MPNN AUROC | Chemprop v2 reference | Within σ overlap |
|---|---|---|---|
| BBBP (blood-brain barrier penetration) | 0.9144 ± 0.0113 | 0.897 ± 0.012 | Yes (matches) |
| BACE (β-secretase inhibition) | 0.8861 ± 0.001 | 0.859 ± 0.024 | No (above reference; 1σ intervals miss by 0.002) |
| HIV (replication inhibition) | 0.7937 ± 0.0149 | 0.776 ± 0.020 | Yes (matches) |
In-house PyTorch rewrite (~500 LOC, no Chemprop runtime dependency). 5-test gradient-check suite green. Bemis-Murcko scaffold splits, no ensembling — published-baseline parity under stricter conditions.
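A scaffold split keeps every molecule that shares a Bemis-Murcko scaffold in the same partition, so test molecules come from chemotypes the model has not seen. The sketch below shows the usual deterministic grouping logic and is not the in-house code: it assumes scaffold strings were computed beforehand (for example with RDKit's `MurckoScaffold.MurckoScaffoldSmiles`), the 80/10/10 fractions are an assumption, and `scaffold_split` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def scaffold_split(scaffolds: Sequence[str], frac_train: float = 0.8,
                   frac_valid: float = 0.1) -> Tuple[List[int], List[int], List[int]]:
    """Assign whole scaffold groups to train, then valid, then test."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)
    # Largest groups first; ties broken by first index so the split is deterministic.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))
    n = len(scaffolds)
    train: List[int] = []
    valid: List[int] = []
    test: List[int] = []
    for group in ordered:
        if len(train) + len(group) <= frac_train * n:
            train.extend(group)
        elif len(valid) + len(group) <= frac_valid * n:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


# Toy scaffold labels for ten molecules.
scaffolds = ["A"] * 6 + ["B"] * 2 + ["C", "D"]
train, valid, test = scaffold_split(scaffolds)
print(len(train), len(valid), len(test))
```

Because rare scaffolds end up in validation and test, this split is typically harder than a random split, which is the sense of "stricter conditions" above.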
| Stage | Volume | Throughput | Coverage |
|---|---|---|---|
| W1 radiology corpus | 570K notes / 572 chunks / 141 MB output | 9.7 notes/sec | ~30M entities, 13-18% negation rate |
| W2 discharge + radiology corpus | 1.07M notes / 1,071 chunks / 975 MB output | 9.7 notes/sec | ~50M+ entities |
| Total processed | 1.64M clinical notes | en_core_sci_md NER + medspacy_sectionizer + ConText | 80M+ entities with negation/historical/family attributes |
Stage 1 cohort builder: 4.3 min via DuckDB. Block-NumPy-ABI-fix-1 unblocked the 3.7.5 spaCy stack on Windows (numpy < 2.0 + spacy 3.7.4 + thinc 8.2.5).
RUN LOGS OF CAI/:
MIMIC-IV W1/W2/W3/W5/W11 (05-03-2026-p1.txt); FAERS Pipeline B (04-22-2026.txt);
ACIC22 V3 lock (05-02&03-2026.txt); W12 sensitivity pentagon + DCIE (05-04-2026.txt);
DMPNN MoleculeNet (04-19-2026.txt, 04-20-2026.txt);
HNSI extraction (04-26-26-p1.txt through 04-29-2026).
Available to design partners under NDA on request.
Pharmacovigilance triage product satisfies all four criteria of 21st Century Cures Act § 3060 (transparent algorithms, documented provenance, human-in-the-loop, independent review of basis). Ship to pharma drug-safety customers in 2026 without 510(k).
Patent-pending VBSM module satisfies 21 CFR Part 11 § 11.10(a)–(k) by design. SHA-256 chained ledger entries, monotonic_ns timestamps, fsync + PID-lock multi-writer protection. Pharma clients use our outputs in their own regulated workflows without bolting on separate audit infrastructure.
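The hash-chaining idea can be illustrated with a few lines of standard-library Python. This is a simplified single-writer sketch, not the VBSM module: the file layout, field names, and function names are all hypothetical, and the PID-lock multi-writer protection is omitted. Note that `time.monotonic_ns` values are only comparable within one running system, so a sketch like this orders entries but does not by itself record wall-clock time.

```python
import hashlib
import json
import os
import time

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, ts_ns: int, payload: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the entry contents."""
    body = json.dumps({"prev": prev_hash, "ts_ns": ts_ns, "payload": payload},
                      sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(path: str, payload: dict) -> str:
    """Append one entry that commits to the hash of the previous entry."""
    prev_hash = GENESIS
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    prev_hash = json.loads(line)["hash"]
    ts_ns = time.monotonic_ns()
    entry = {"prev": prev_hash, "ts_ns": ts_ns, "payload": payload,
             "hash": _digest(prev_hash, ts_ns, payload)}
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, sort_keys=True) + "\n")
        fh.flush()
        os.fsync(fh.fileno())  # force the entry to disk before returning
    return entry["hash"]


def verify(path: str) -> bool:
    """Recompute every hash and check each entry points at its predecessor."""
    prev_hash = GENESIS
    with open(path, "r", encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            entry = json.loads(line)
            if entry["prev"] != prev_hash:
                return False
            if entry["hash"] != _digest(entry["prev"], entry["ts_ns"], entry["payload"]):
                return False
            prev_hash = entry["hash"]
    return True
```

Calling `append_entry("ledger.jsonl", {"event": "model_locked"})` twice and then `verify("ledger.jsonl")` should return `True`; editing any earlier line afterwards changes its hash and breaks the chain from that point on.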
Q-Submission scoping for direct clinician-facing CDS deployment scheduled Q3 2026 contingent on first signed pilot. Breakthrough Device designation eligible for the W12 anticoagulation lane given quantitative advantage over existing point-estimate-based decision support.
Aligned with FDA's 2024 Real-World Evidence Framework and EMA's 2023 Reflection Paper on RWE. Sensitivity-bounded reporting is exactly what both agencies are now requiring in observational submissions; we provide it as the default output, not as an afterthought.
Founder Harsh Singh filed a USPTO provisional patent on March 22, 2026 covering the four core modules of the Rosenbound platform — DCIE neural counterfactual learner, VBSM ALCOA+ commit-gate ledger, PSIM online causal memory, and SMCE capability-conditional abstention — plus their integration architecture. 30+ claims. 12-month conversion window to non-provisional or PCT through March 2027. A second provisional covering the W-lane sensitivity-pentagon orchestration layer is in preparation.
Deterministic Causal Inference Engine. TARNet + Sinkhorn-Wasserstein IPM + NCAAP twin-network + Dragonnet propensity head. Production-validated on ACIC22 V3 lock and MIMIC-IV vasopressor TE.
Verifiable Boundary State Machine. 21 CFR Part 11 / ALCOA+ commit-gate ledger with cryptographically chained entries. Production-validated on W1, W2, Pipeline B, and DCIE retrains.
Patient State Identity Manager. Online accumulating causal memory (ESCG + HCMC + MCLR). Production-validated on 2.83M FAERS-row backfill with full memory loop.
Self-Monitoring Capability Estimator. Probe-based per-prediction capability estimation for abstain-when-uncertain. Production-validated A2/A3/A4 PASS on Pipeline B test split (n=158,732).
Looking for one pharmacovigilance, clinical-research, drug-discovery, or AMC partner who'd benefit from sensitivity-bounded causal reporting in their existing workflow. The pilot is no-cost; you provide an adverse-event corpus or observational cohort, we deliver causal estimates with Rosenbaum bounds and a 21 CFR Part 11 audit trail.
Email harsh22@bu.edu
Or connect via LinkedIn: linkedin.com/in/harshsingh2103