Tier 0 — Federal ZIP completeness audit (2026-05-01)
Status: Complete. 366 gaps inserted; 451 discrepancies logged for review; 1,353 extras logged. Closes Issue #73 Path 2.
Purpose: Systematic completeness check of federal-30 + NY ZIP coverage in `zip_county` against the U.S. Census 2020 ZCTA-County universe. This is the companion to the Tier 1 zip-county audit: Tier 1 checks accuracy (at 100% match), while Tier 0 checks coverage.
Summary
| Metric | Count |
|---|---|
| Census federal+NY universe (zip, countyFips tuples) | 29,793 |
| DB before audit | 30,329 |
| DB after audit | 30,695 |
| Insertable gaps (Census has, DB doesn't, CMS confirms) | 366 |
| Discrepancies (Census says X, CMS says Y — log only, not inserted) | 451 |
| Extras (DB has, Census doesn't — informational, not modified) | 1,353 |
| Needs-PUF (county entirely missing — would need PUF re-ingest) | 0 |
| CMS errors during audit | 0 |
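
The counts in this table reconcile arithmetically. The sketch below works through that check, assuming that every Census tuple missing from the DB lands in exactly one gap class and that extras are exactly the DB tuples missing from Census. The object and function names are illustrative and do not come from the audit scripts.

```javascript
// Counts copied from the summary table.
const counts = {
  censusUniverse: 29793,
  dbBefore: 30329,
  insertable: 366,
  discrepancies: 451,
  needsPuf: 0,
  cmsErrors: 0,
};

// Assumption: the four gap classes partition Census \ DB.
function reconcile(c) {
  const censusMinusDb = c.insertable + c.discrepancies + c.needsPuf + c.cmsErrors;
  const overlap = c.censusUniverse - censusMinusDb; // tuples present in both sources
  const extras = c.dbBefore - overlap;              // DB \ Census
  const dbAfter = c.dbBefore + c.insertable;        // only insertable gaps are written
  return { censusMinusDb, overlap, extras, dbAfter };
}

const summary = reconcile(counts);
// summary.extras is 1353 and summary.dbAfter is 30695, matching the table.
console.log(summary);
```

If a future run produces counts that fail this reconciliation, that would suggest a tuple was classified twice or dropped.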
What was inserted
All 366 inserts are NY-side multi-county additions. Every NY ZIP that gained a doc already had ≥1 sibling doc in the DB; the audit added the missing additional county docs. The original NY ingest (`scripts/db/load-ny-2026.js`, 2026-04-12) loaded the primary county per ZIP for many multi-county ZIPs but missed the secondary counties.
Examples of fixed ZIPs:
- 10463 → had only Bronx; added New York County (Manhattan)
- 10470 → had only Bronx; added Westchester County
- 10509 → had only Putnam; added Westchester County
- 10940 → had only Orange; added Sullivan County
User impact: residents of these NY ZIPs whose actual address is in the secondary county will now correctly see plans for that county. Pre-audit they may have been mapped to the wrong county's plan flow.
Per-state breakdown
Insertable (366 gaps inserted)
| State | Count |
|---|---|
| NY | 366 (all) |
All other federal-30 states: 0 insertable gaps. Federal-30 ZIP coverage was already complete at the (zip, countyFips) tuple level — only NY had the multi-county-secondary-county gap pattern.
Extras (1,353 — logged, not modified)
DB has these (zip, countyFips) tuples; Census 2020 ZCTA doesn't recognize them. Likely because:
- Our PUF Service Areas data (CMS, more current) tracks ZIPs Census 2020 ZCTA didn't capture (Census ZCTA boundaries lag USPS by years)
- Some county-boundary changes since Census 2020
- Multi-county tracking in our DB beyond what Census 2020 reflects
Distribution:
| State | Extras | Notes |
|---|---|---|
| TX | 155 | Largest absolute count (consistent with state size) |
| IA | 96 | |
| IN | 91 | |
| OH | 86 | |
| KS | 77 | |
| MO | 71 | |
| NE | 68 | |
| AR | 67 | |
| MI | 61 | |
| WI | 55 | |
| TN | 51 | |
| OK | 50 | |
| SD | 50 | |
| NC | 46 | |
| WV | 46 | |
| FL | 43 | |
| MS | 40 | |
| ND | 38 | |
| AL | 33 | |
| LA | 32 | |
| MT | 27 | |
| NH | 14 | |
| OR | 14 | |
| SC | 14 | |
| WY | 11 | |
| AK | 9 | |
| UT | 4 | |
| AZ | 3 | |
| HI | 1 | |
These extras are not corrected because:
- They serve users correctly today (users in these ZIPs get plans via existing data)
- Removing them risks stranding real users on USPS ZIPs Census 2020 doesn't yet recognize
- CMS-side validation hasn't been run on the extras (out of scope for Tier 0)
A future audit could spot-check the extras against CMS to confirm they're real ZIPs. Out of scope here.
Discrepancies (451 — logged, not inserted)
Census 2020 says ZIP X is in county A; CMS says it's in county B (different county). For these, trust CMS. The audit doesn't insert based on Census's view because CMS is canonical.
Discrepancies are heavily NY-skewed (~440 of 451). The likely cause is that Census 2020 ZCTA boundaries differ from the current CMS Marketplace API view of the ZIP→county mapping, particularly for upstate NY ZIPs.
Sample:
| ZIP | Census says | CMS says |
|---|---|---|
| 03458 | NH/Cheshire | NH/Hillsborough |
| 11370 | NY/Bronx | NY/Queens |
| 12120 | NY/Greene | NY/Albany |
| 12763 | NY/Ulster | NY/Sullivan |
| 12785 | NY/Orange | NY/Sullivan |
For these ZIPs, our DB likely already has the CMS-correct county. The discrepancy is just "Census 2020 ZCTA is mildly stale relative to current CMS data." No action.
The full list is in `scripts/db/data/federal-gap-report-2026-05-01.json` under the `discrepancy` key.
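
To check the NY skew against the report, the discrepancy entries can be tallied by state. This is a minimal sketch: only the `discrepancy` key is documented here, so the per-entry shape (`zip`, `census`, `cms`) and the function name are hypothetical. The sample is built from the five rows shown in the table.

```javascript
// Assumed (hypothetical) entry shape:
//   { zip, census: { state, county }, cms: { state, county } }
function tallyDiscrepanciesByState(report) {
  const tally = {};
  for (const entry of report.discrepancy ?? []) {
    const state = entry.census.state; // tally by the state Census reports
    tally[state] = (tally[state] ?? 0) + 1;
  }
  return tally;
}

// Inline sample mirroring the sample table; not the real report contents.
const sampleReport = {
  discrepancy: [
    { zip: "03458", census: { state: "NH", county: "Cheshire" }, cms: { state: "NH", county: "Hillsborough" } },
    { zip: "11370", census: { state: "NY", county: "Bronx" }, cms: { state: "NY", county: "Queens" } },
    { zip: "12120", census: { state: "NY", county: "Greene" }, cms: { state: "NY", county: "Albany" } },
    { zip: "12763", census: { state: "NY", county: "Ulster" }, cms: { state: "NY", county: "Sullivan" } },
    { zip: "12785", census: { state: "NY", county: "Orange" }, cms: { state: "NY", county: "Sullivan" } },
  ],
};

const tally = tallyDiscrepanciesByState(sampleReport);
// For this sample, tally.NH is 1 and tally.NY is 4.
console.log(tally);
```

Against the real file, the report object would come from `JSON.parse(fs.readFileSync(path, "utf8"))`, with the field names adjusted to whatever the report actually uses.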
Methodology
Input sources
- Census 2020 ZCTA-County relationship file: universe of every U.S. ZIP→county mapping. Free, federal, refreshed annually. Source: https://www2.census.gov/geo/docs/maps-data/data/rel2020/zcta520/tab20_zcta520_county20_natl.txt
- MongoDB `zip_county` collection: current state of our ZIP coverage data. Filter: `state ∈ federal-30 ∪ {NY}` AND `sbeRedirect: { $exists: false }`.
- CMS Marketplace API `/counties/by/zip/{zip}`: canonical truth for each gap, used to verify Census's claim before inserting.
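
The DB filter and the Census-versus-DB comparison can be sketched as follows. This is an illustration under assumptions, not the code in `audit-federal-completeness.js`: the `FEDERAL_STATES` list is truncated to three states, the `zip` field name is assumed, and the helper names are made up. `$in` and `$exists` are standard MongoDB query operators.

```javascript
// Assumption: the real allowlist has 30 states; three are shown for brevity.
const FEDERAL_STATES = ["TX", "FL", "UT"];

// MongoDB filter matching the description: federal-30 ∪ {NY}, no SBE redirects.
function buildZipCountyFilter(federalStates) {
  return {
    state: { $in: [...federalStates, "NY"] },
    sbeRedirect: { $exists: false },
  };
}

// Key each record by its (zip, countyFips) tuple so both sources compare as sets.
const tupleKey = ({ zip, countyFips }) => `${zip}|${countyFips}`;

function diffTuples(censusRows, dbDocs) {
  const census = new Set(censusRows.map(tupleKey));
  const db = new Set(dbDocs.map(tupleKey));
  return {
    gaps: [...census].filter((k) => !db.has(k)),   // Census \ DB: verified against CMS
    extras: [...db].filter((k) => !census.has(k)), // DB \ Census: logged only
  };
}

const censusRows = [
  { zip: "10463", countyFips: "36005" },
  { zip: "10463", countyFips: "36061" },
];
const dbDocs = [
  { zip: "10463", countyFips: "36005" },
  { zip: "99999", countyFips: "36005" }, // made-up extra for illustration
];

const filter = buildZipCountyFilter(FEDERAL_STATES);
const diff = diffTuples(censusRows, dbDocs);
// diff.gaps is ["10463|36061"]; diff.extras is ["99999|36005"].
console.log(filter, diff);
```

Keying on the full tuple rather than the ZIP alone is what lets the audit see a multi-county ZIP that has its primary county but is missing a secondary one.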
Pipeline
```
Census ZCTA file ──→ build-federal-snapshot.js ──→ federal-zip-state-2020.csv (committed)
                                                        │
                                                        ▼
                          audit-federal-completeness.js ──→ federal-gap-report-2026-05-01.json
                                                        │       (committed)
                                                        ▼
                                          seed-federal-completeness.js
                                                     --apply
                                                        │
                                                        ▼
                                               zip_county collection
                                               (366 new docs with
                                                _seedSource marker)
```
Classification logic per gap
For each (zip, countyFips) tuple in Census \ DB:
- Query CMS for the ZIP. If CMS returns counties:
  - CMS confirms (state, fips) match → check regionId lookup
    - regionId found in DB (county already has siblings) → insertable
    - regionId not found (county entirely absent from our PUF) → needs-PUF
  - CMS doesn't return Census's expected county → discrepancy (logged only)
  - CMS state ≠ Census state → discrepancy (logged only)
- CMS error → cms-errors (logged for retry)
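
The decision tree above can be sketched as a single pure function. This is a sketch under assumptions: the `cmsResult` shape, the `regionIdByCounty` map, and the function name are illustrative, and an empty county list from CMS is treated as a discrepancy (the source does not state how that case is handled). The `regionId` value in the usage is a placeholder.

```javascript
// gap:              { zip, state, countyFips } taken from Census \ DB
// cmsResult:        { error: true } or { counties: [{ state, fips }] } (assumed shape)
// regionIdByCounty: Map keyed by "state|countyFips", built from existing DB docs
function classifyGap(gap, cmsResult, regionIdByCounty) {
  if (cmsResult.error) return { bucket: "cms-errors" }; // logged for retry

  const confirmed = (cmsResult.counties ?? []).some(
    (c) => c.state === gap.state && c.fips === gap.countyFips
  );
  // Covers both "expected county not returned" and "state mismatch".
  if (!confirmed) return { bucket: "discrepancy" };

  const regionId = regionIdByCounty.get(`${gap.state}|${gap.countyFips}`);
  return regionId === undefined
    ? { bucket: "needs-PUF" }             // county has no sibling docs at all
    : { bucket: "insertable", regionId }; // safe to insert with the sibling's regionId
}

const gap = { zip: "10463", state: "NY", countyFips: "36061" };
const cms = { counties: [{ state: "NY", fips: "36005" }, { state: "NY", fips: "36061" }] };
const regions = new Map([["NY|36061", "placeholder-region"]]);

const classified = classifyGap(gap, cms, regions);
// classified.bucket is "insertable" for this input.
console.log(classified);
```

Keeping the classifier free of I/O means the CMS call and the DB lookup can be mocked, so each branch of the tree is checkable in isolation.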
Safety guards
- Hard-coded state allowlist (`FEDERAL_STATES ∪ {NY}`): won't touch SBE-state docs
- Per-(zip, countyFips) keying for inserts; idempotent
- Marker tag `_seedSource: "federal-completeness-audit-2026-05-01"` on every insert → unambiguous rollback
- regionId sourced from existing DB siblings (any other ZIP in the same `(state, countyFips)`), which guarantees rating-area consistency within a county
- Never modifies existing docs
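
A minimal sketch of how these guards could combine when planning inserts. It is not the code in `seed-federal-completeness.js`: the function name, the plan counters, and the sample `regionId` values are hypothetical, and the collection is stood in for by a plain array so the logic can run without a database.

```javascript
const SEED_SOURCE = "federal-completeness-audit-2026-05-01";

function planInserts(insertableGaps, existingDocs, allowedStates) {
  const existingKeys = new Set(existingDocs.map((d) => `${d.zip}|${d.countyFips}`));
  // regionId comes from any sibling doc in the same (state, countyFips).
  const regionIdByCounty = new Map(
    existingDocs.map((d) => [`${d.state}|${d.countyFips}`, d.regionId])
  );
  const plan = { toInsert: [], alreadyPresent: 0, rejectedState: 0, rejectedFields: 0 };

  for (const gap of insertableGaps) {
    const key = `${gap.zip}|${gap.countyFips}`;
    const regionId = regionIdByCounty.get(`${gap.state}|${gap.countyFips}`);
    if (!allowedStates.has(gap.state)) plan.rejectedState++;
    else if (!gap.zip || !gap.countyFips || regionId === undefined) plan.rejectedFields++;
    else if (existingKeys.has(key)) plan.alreadyPresent++; // idempotent skip
    else {
      plan.toInsert.push({ ...gap, regionId, _seedSource: SEED_SOURCE });
      existingKeys.add(key); // a repeated gap in the same run is skipped too
    }
  }
  return plan; // existingDocs is never mutated
}

const existing = [
  { zip: "10463", state: "NY", countyFips: "36005", regionId: "placeholder-a" },
  { zip: "10001", state: "NY", countyFips: "36061", regionId: "placeholder-b" },
];
const gaps = [
  { zip: "10463", state: "NY", countyFips: "36061" },
  { zip: "10463", state: "NY", countyFips: "36061" }, // duplicate on purpose
  { zip: "90001", state: "CA", countyFips: "06037" }, // SBE state, must be rejected
];

const plan = planInserts(gaps, existing, new Set(["NY", "TX"]));
// plan.toInsert has one doc, plan.alreadyPresent is 1, plan.rejectedState is 1.
console.log(plan);
```

The four counters line up with the rows of the apply-results table, which makes a re-run easy to audit: a second pass over the same gaps should report every one as already present.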
Verification
Apply results identical on staging + prod:
| Step | Result |
|---|---|
| Inserted | 366 |
| Already present (idempotent skip) | 0 |
| Rejected — state allowlist | 0 |
| Rejected — missing fields | 0 |
Validation tier (Phase 8)
| Test | Result |
|---|---|
| Calculator baseline diff (12 scenarios — UT, TX, FL, NY, SBE redirect, PO Box, Medicaid) | ZERO DIFFS |
| Prod consistency check (no-marker docs unchanged) | 30,326 → 30,326 (verified) |
| Multi-county integrity check (sample 5 inserted ZIPs) | All return correct multi-county responses (e.g., 10463 → Bronx + New York County) |
Smoke probe matrix on prod (post-deploy)
```
zip=10463 → counties:[{Bronx, 36005}, {New York County, 36061}]     ← multi-county now
zip=10470 → counties:[{Bronx, 36005}, {Westchester County, 36119}]  ← multi-county now
zip=10509 → counties:[{Putnam, 36079}, {Westchester County, 36119}] ← multi-county now
zip=10512 → counties:[{Putnam, 36079}, {Dutchess County, 36027}]    ← multi-county now
zip=10940 → counties:[{Orange, 36071}, {Sullivan County, 36105}]    ← multi-county now
```
Rollback
```bash
MONGODB_WRITE_URI=$(aws --profile askflorence-prod secretsmanager get-secret-value \
  --secret-id prod/mongodb/app-write --query SecretString --output text) \
  node scripts/db/seed-federal-completeness.js --rollback
```
This removes only docs with `_seedSource: "federal-completeness-audit-2026-05-01"`. The 30,326 legacy federal/NY docs, the 3 docs carrying the federal-gap-fix marker, and the 17,537 docs carrying the SBE marker are untouched.
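
The marker-scoped rollback can be illustrated with an in-memory stand-in for the collection. This is a sketch, not the script's implementation: `rollbackFilter` is the kind of filter a rollback would plausibly pass to MongoDB's `deleteMany`, and the second marker string is a made-up placeholder for the other seed markers mentioned above.

```javascript
const SEED_SOURCE = "federal-completeness-audit-2026-05-01";

// Filter matching only this audit's inserts.
const rollbackFilter = { _seedSource: SEED_SOURCE };

// In-memory equivalent of applying the filter: split docs into removed and kept.
function partitionForRollback(docs) {
  const removed = docs.filter((d) => d._seedSource === rollbackFilter._seedSource);
  const kept = docs.filter((d) => d._seedSource !== rollbackFilter._seedSource);
  return { removed, kept };
}

const sampleDocs = [
  { zip: "10463", countyFips: "36005" },                                      // legacy doc, no marker
  { zip: "10463", countyFips: "36061", _seedSource: SEED_SOURCE },            // inserted by this audit
  { zip: "00000", countyFips: "00000", _seedSource: "some-other-marker" },    // hypothetical other marker
];

const { removed, kept } = partitionForRollback(sampleDocs);
// removed holds only the doc tagged by this audit; the other two are kept.
console.log(removed.length, kept.length);
```

Because the match is on the exact marker string, docs with no marker and docs with any other marker fall outside the filter by construction.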
Annual refresh
Add to the data-sources playbook:
- Re-pull Census ZCTA file at plan-year transition
- Re-run `build-federal-snapshot.js` → updated CSV
- Re-run `audit-federal-completeness.js` → updated report
- Triage classification counts; should be ~0 new gaps in steady state (the federal-30 ingest captures things directly via PUF)
- If gaps surface, run `seed-federal-completeness.js` after triage
- Append change-log entry
Files
- `scripts/db/build-federal-snapshot.js` (build the universe CSV)
- `scripts/db/data/federal-zip-state-2020.csv` (committed snapshot, 29,793 rows)
- `scripts/db/audit-federal-completeness.js` (run the audit)
- `scripts/db/data/federal-gap-report-2026-05-01.json` (committed report, 467 KB)
- `scripts/db/seed-federal-completeness.js` (apply the inserts)
Related
- Issue #73: parent (Path 1: 3 known gaps fixed in commit `aa2a97a`; Path 2: this audit)
- `docs/validation/methodology.md`: audit methodology reference
- `docs/infrastructure/data-sources.md`: ingest pipeline overview
- Tier 1 zip-county audit (`scripts/audit/tier-1-zip-county.js`): companion accuracy check
- Tier 1.5 SBE zip-county audit (Issue #70): companion SBE-side audit