

Validation

How well does the model reproduce real-world NRMP outcomes? Each chart compares a predicted (simulated cohort) value against a target (held-out NRMP statistic that the calibration code never sees). Tolerances and pass criteria are described inline.

Per-pool match rates

Simulated vs. NRMP 2024 target

Pool-indicator weights in Wp are calibrated at build time so that cohort match runs reproduce real per-pool match rates. Bars show the simulated rate (teal) vs. the NRMP target (amber).
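The build-time calibration can be sketched as a simple feedback loop: nudge a pool's indicator weight until the simulated match rate lands within tolerance of the NRMP target. This is a minimal illustrative sketch, not the site's actual calibration code; `simulate_rate`, the step size, and the tolerance are all hypothetical stand-ins.

```python
def calibrate_weight(simulate_rate, target, w0=1.0, tol=0.005,
                     step=0.05, max_iter=200):
    """Adjust a single pool-indicator weight until the simulated
    per-pool match rate is within `tol` of the NRMP target.
    `simulate_rate` stands in for a full cohort match run."""
    w = w0
    for _ in range(max_iter):
        rate = simulate_rate(w)
        if abs(rate - target) <= tol:
            break
        # raise the weight when the pool under-matches, lower it otherwise
        w += step if rate < target else -step
    return w

# Toy stand-in: match rate rises monotonically with the weight.
toy = lambda w: min(1.0, 0.5 + 0.3 * w)
w = calibrate_weight(toy, target=0.935)
```

In practice all pool weights interact through the match, so the real calibration has to re-run the cohort simulation jointly rather than tuning one weight at a time.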


Per-specialty match rates

Bootstrap mean vs. NRMP holdout target

Specialty-level match-rate calibration. A check passes when the bootstrap mean is within tolerance of the NRMP target, or when the target falls inside the 95% CI of the bootstrap samples.
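The two-part pass criterion described above can be expressed directly. A minimal sketch, assuming a percentile-based 95% CI over the bootstrap samples; the function name and the tolerance value are illustrative, not the site's actual settings.

```python
import statistics

def specialty_check_passes(bootstrap_rates, target, tol=0.02):
    """Pass if the bootstrap mean is within `tol` of the NRMP target,
    OR the target falls inside the 95% percentile CI of the samples."""
    mean = statistics.fmean(bootstrap_rates)
    ranked = sorted(bootstrap_rates)
    lo = ranked[int(0.025 * (len(ranked) - 1))]
    hi = ranked[int(0.975 * (len(ranked) - 1))]
    return abs(mean - target) <= tol or lo <= target <= hi
```

The OR makes the check lenient for noisy specialties: a small specialty with wide bootstrap spread can still pass via the CI clause even when its mean drifts outside the fixed tolerance.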


Couples-match outcomes

NRMP Table 16 (1987–2024) reference

The cohort precompute doesn't simulate couples directly (joint Roth-Peranson on couple-pairs is a separate build-time step). At runtime we redistribute joint-success cohort outcomes into the empirical Table-16 split (89.84% both matched, 8.04% one partner only, 2.11% neither), a 5-year average over 2020–2024.
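The runtime redistribution amounts to splitting a count of couples into the Table-16 proportions quoted above. A minimal sketch; the function and bucket names are illustrative, and the remainder-handling rule (assign leftover couples to the largest bucket) is an assumption, not the site's documented behavior.

```python
# Empirical NRMP Table-16 split, 5-year average 2020-2024 (from the text above).
TABLE16_SPLIT = {"both": 0.8984, "one_only": 0.0804, "neither": 0.0211}

def redistribute_couples(n_couples):
    """Split n_couples into Table-16 outcome buckets, preserving the total."""
    counts = {k: round(n_couples * p) for k, p in TABLE16_SPLIT.items()}
    # Rounding can leave a remainder; fold it into the largest bucket.
    counts["both"] += n_couples - sum(counts.values())
    return counts
```

For example, 1,000 simulated couples split into roughly 899 both-matched, 80 one-only, and 21 neither, with the rounding remainder folded into the "both" bucket so the total stays at 1,000.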


What we don't model

Known limits of the model, stated plainly. Each is a real source of error that the calibration metrics on this page do not capture.

  • No second-order beliefs. Strategic behavior (signaling, audition rotation choice) is driven by what each side believes the other side perceives. We model a single round of deferred acceptance with deterministic preferences. See methodology for the perception-matrix framework that would capture this.
  • No per-program Wp variation. Within a specialty all programs share the same per-feature weights. Real programs vary: academic vs. community, research vs. clinical focus. This is one of the larger sources of error for borderline applicants.
  • No Doximity-style reputation signal. Program prestige enters the model only via fill-rate posterior (a noisy proxy). Doximity Residency Navigator is paywalled and we don't scrape it.
  • Limited geographic preference modeling. Applicant-side preferences for region are sampled synthetically; we don't observe real applicants' geographic constraints. State-licensure-track and home-state effects are captured only at the visa-status level.
  • Couples-match precision. Anon-couples mode (single account, partner data via sliders) approximates partner ARCS / experiences. Linked couples (two accounts) is deferred to a future phase.
  • No ARCS adoption in Wp. The ARCS publication-impact metric is recent and not yet in NRMP / PD-Survey instruments. We compute and surface it on user profiles for self-comparison, but programs aren't calibrated to weight it directly.

Validation is a continuous discipline — we'll re-run these checks with each cohort regen and update this page automatically.