Known Gotchas — Things Learned the Hard Way

Non-obvious behaviors of the PUF and CMS API that bit us during validation, documented here so future engineers don't repeat the same mistakes.

1. Same plan, different rates per county within the same RA = wrong RA mapping

The trap: CMS returned three different rates for the same Ambetter AL plan in counties all labeled "Rating Area 13" in our zip_county. We thought CMS had per-county adjustments and tried to "correct" the rates.

The truth: Per ACA regulations (45 CFR 156.255), plans MUST have one rate per rating area. If CMS shows different rates for the same plan in counties of the same RA, the RA labels are wrong, not the rates. Some counties were misassigned to RA13 when they actually belonged to RA5 or RA7.

Fix: Update zip_county.regionId to the actual RA. Don't touch plan rates.

Lesson: CMS API rates are ground truth for which RA each county belongs to. Match CMS rate → find the matching RA in PUF → that's the correct mapping.
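The matching step can be sketched roughly as below. This is a minimal illustration, not the actual validation script: `inferRatingAreas`, the shape of `pufRatesByArea` (rating-area label to the plan's age-21 PUF rate), the one-cent tolerance, and the dollar figures are all assumptions made for the example.

```javascript
// Hypothetical helper: return every PUF rating area whose filed rate
// matches the rate CMS returned for a county (within a small tolerance).
function inferRatingAreas(cmsRate, pufRatesByArea, toleranceDollars = 0.01) {
  return Object.entries(pufRatesByArea)
    .filter(([, pufRate]) => Math.abs(pufRate - cmsRate) <= toleranceDollars)
    .map(([ratingArea]) => ratingArea);
}

// Illustrative numbers only, not real filings.
const pufRatesByArea = {
  'Rating Area 5': 412.1,
  'Rating Area 7': 398.55,
  'Rating Area 13': 430.02,
};

// CMS quoted 398.55 for a county we had labeled "Rating Area 13".
const matches = inferRatingAreas(398.55, pufRatesByArea);
console.log(matches);
```

A single match identifies the RA to write into zip_county.regionId. If two RAs happen to share the same rate for one plan, the result is ambiguous and a second plan's rates would be needed to break the tie.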

2. PUF rate file (`rate-puf.csv`) is incomplete

The trap: Some plans serve counties (per service-area-puf.csv) but rate-puf.csv has no rate entries for the corresponding rating area.

The truth: The PUF rate file only contains rates for RAs the issuer explicitly filed. CMS supplements with rates from another source. If you reload ageRatesByArea purely from rate-puf.csv, you'll lose RAs.

Fix: Use the per-state PUF JSONs ({state}-2026-puf.json) as the authoritative list of RAs, then enrich with rates from rate-puf.csv for those RAs only. For RAs not in the CSV, CMS gap-fill is required.
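A rough sketch of that merge follows. The function name, the `{ ratingArea, age, rate }` row shape, and the `needsGapFill` list are assumptions for illustration; the real CSV columns and JSON layout may differ.

```javascript
// raList: rating areas taken from the per-state PUF JSON (authoritative list).
// csvRates: rows parsed from rate-puf.csv, reduced to an assumed
// { ratingArea, age, rate } shape.
function buildAgeRatesByArea(raList, csvRates) {
  const ageRatesByArea = {};
  const needsGapFill = [];
  for (const ra of raList) {
    const rows = csvRates.filter((row) => row.ratingArea === ra);
    if (rows.length === 0) {
      // The RA exists for this plan but the CSV has no rates for it:
      // record it so the CMS gap-fill step can supply them.
      needsGapFill.push(ra);
      continue;
    }
    ageRatesByArea[ra] = Object.fromEntries(rows.map((row) => [row.age, row.rate]));
  }
  return { ageRatesByArea, needsGapFill };
}

// Illustrative data only.
const merged = buildAgeRatesByArea(
  ['Rating Area 1', 'Rating Area 2'],
  [
    { ratingArea: 'Rating Area 1', age: 21, rate: 300 },
    { ratingArea: 'Rating Area 1', age: 35, rate: 366.6 },
  ],
);
console.log(merged.needsGapFill);
```

Driving the loop from the RA list rather than from the CSV rows is what prevents an RA from silently disappearing on reload.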

3. Iterative rate corrections cause cumulative corruption

The trap: Running a rate correction script more than once applies each new ratio on top of already-corrected rates, so the corrections compound.

Example:

PUF rate: $605.95
Correction script run 1: × 1.0716 → $649.34
Correction script run 2: × 1.0308 (because the new ratio is computed from the corrected base) → $669.34

Lesson: Rate corrections must be single-pass from a clean PUF baseline. Always restore to PUF originals before applying corrections. Store correction metadata (_cmsRatio, _originalPufAge21) for auditability.
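One way to make the correction safe to re-run is to compute it only from the untouched PUF rates and never from stored, previously corrected values. The sketch below assumes a hypothetical `correctFromPufBaseline` helper and a simple age-to-rate object; only the `_cmsRatio` and `_originalPufAge21` field names come from these notes, and the age-35 figure is illustrative.

```javascript
// Hypothetical single-pass correction. The input is always the clean PUF
// baseline, so running it twice yields the same output instead of compounding.
function correctFromPufBaseline(pufRatesByAge, cmsAge21) {
  const originalAge21 = pufRatesByAge[21];
  const ratio = cmsAge21 / originalAge21;
  const rates = {};
  for (const [age, rate] of Object.entries(pufRatesByAge)) {
    rates[age] = Math.round(rate * ratio * 100) / 100; // round to cents
  }
  // Keep the audit metadata next to the corrected rates.
  return { rates, _cmsRatio: ratio, _originalPufAge21: originalAge21 };
}

// Frozen so the baseline cannot be overwritten by accident.
const pufBaseline = Object.freeze({ 21: 605.95, 35: 740.47 });

const run1 = correctFromPufBaseline(pufBaseline, 649.34);
const run2 = correctFromPufBaseline(pufBaseline, 649.34);
console.log(run1.rates[21], run2.rates[21]);
```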

4. Plan `countiesServed` derived from rating areas is WRONG

The trap: The original ingest derived countiesServed from rating area presence — if a plan had a premium for RA1, it claimed to serve all counties in RA1. This caused massive over-listing.

The truth: The authoritative source is service-area-puf.csv which has explicit county-level data (with FIPS codes). Plans serve specific counties, not whole rating areas.

Fix: Always derive countiesServed from service-area-puf.csv keyed by (IssuerId, ServiceAreaId). Note: ServiceAreaId is NOT globally unique — it's unique per issuer within a state.

5. ServiceAreaId is not globally unique

The trap: Two different issuers in the same state can both have a service area called "MIS001". If you build a global lookup keyed by ServiceAreaId, you'll merge their service areas.

Fix: Always key by ${issuerId}:${serviceAreaId}.
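As a minimal sketch of gotchas 4 and 5 together, the lookup below collects counties per composite key. `IssuerId`, `ServiceAreaId` are named in these notes; the `County` column name, the issuer IDs, and the FIPS values are assumptions used only to make the example run.

```javascript
// Build a service-area lookup keyed by issuer AND service area, so that two
// issuers reusing the same ServiceAreaId never share a county list.
function buildCountiesByServiceArea(rows) {
  const lookup = new Map();
  for (const row of rows) {
    const key = `${row.IssuerId}:${row.ServiceAreaId}`;
    if (!lookup.has(key)) lookup.set(key, new Set());
    lookup.get(key).add(row.County);
  }
  return lookup;
}

// Illustrative rows only: two issuers both use "MIS001".
const serviceAreaRows = [
  { IssuerId: '11111', ServiceAreaId: 'MIS001', County: '26065' },
  { IssuerId: '11111', ServiceAreaId: 'MIS001', County: '26037' },
  { IssuerId: '22222', ServiceAreaId: 'MIS001', County: '26163' },
];

const countiesByServiceArea = buildCountiesByServiceArea(serviceAreaRows);
console.log([...countiesByServiceArea.get('22222:MIS001')]);
```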

6. Partial county coverage

The trap: Some plans serve only specific zip codes within a county (e.g., Ambetter MI serves only zips 48827, 49251, 49264 of Ingham County). If you query by county name only, you'll show the plan to users in zips it doesn't actually serve.

Fix: Read PartialCounty='Yes' rows from service-area-puf.csv, capture the ZipCodes list, store as partialCountyZips: { county: [zips] }. Filter at query time:

$or: [
  { [`partialCountyZips.${county}`]: { $exists: false } },
  { [`partialCountyZips.${county}`]: userZip },
]
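For unit tests or in-memory checks, the same rule can be expressed as a plain predicate. This is a sketch that mirrors the `$or` fragment above, assuming `partialCountyZips` holds arrays of zip strings; `planServesZip` is a hypothetical name.

```javascript
// In-memory equivalent of the $or filter: a plan with no partial-county entry
// for this county covers the whole county; otherwise the zip must be listed.
function planServesZip(plan, county, userZip) {
  const zips = plan.partialCountyZips?.[county];
  if (zips === undefined) return true;
  return zips.includes(userZip);
}

// Zip list taken from the Ingham County example above.
const partialPlan = { partialCountyZips: { Ingham: ['48827', '49251', '49264'] } };

console.log(planServesZip(partialPlan, 'Ingham', '48827'));
console.log(planServesZip(partialPlan, 'Ingham', '48823'));
```

One caveat worth checking against your data: MongoDB treats a period in a dotted query path as a nesting separator, so a county key such as `St. Clair` would probably not resolve through `partialCountyZips.${county}` as written and may need an escaped key or a different storage layout.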

7. CMS API doesn't return APTC plans for Medicaid-eligible income

The trap: Querying CMS with income < 138% FPL returns plans WITHOUT APTC discounts, because CMS assumes the user should be on Medicaid. Comparing this to our DB output (which auto-adjusts income) produces spurious mismatches.

Fix: When auditing Medicaid-income scenarios:

  1. Detect Medicaid eligibility (income < 138% FPL)
  2. Apply our auto-adjustment (138.5% FPL + $500)
  3. Query CMS with the adjusted income
  4. Compare against our DB output (which uses the same logic)
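The adjustment in the steps above could look roughly like this. The sketch reads "138.5% FPL + $500" as `1.385 × FPL + 500`, which is an interpretation; `auditIncome` is a hypothetical name, and the FPL amount is passed in rather than hard-coded because it depends on year and household size.

```javascript
// Hypothetical audit helper: returns the income to send to CMS.
// fplForHousehold is the 100% FPL dollar amount for the household being tested.
function auditIncome(income, fplForHousehold) {
  const medicaidThreshold = 1.38 * fplForHousehold;
  if (income >= medicaidThreshold) {
    return { income, adjusted: false };
  }
  // Assumed reading of the auto-adjustment: 138.5% of FPL plus $500.
  return { income: Math.round(1.385 * fplForHousehold + 500), adjusted: true };
}

// 10000 is a round placeholder, not a real FPL figure.
const lowIncomeCase = auditIncome(12000, 10000);
const normalCase = auditIncome(20000, 10000);
console.log(lowIncomeCase, normalCase);
```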

8. CMS API has a 200-plan response limit

The trap: Querying CMS without a metal level filter for a large county (e.g., Harris TX has 250+ plans across all metals) returns only 200 plans. The cheapest plan of a given metal level may be among those cut off.

Fix: Either (a) query each metal level separately with filter: { metal_levels: ['Silver'] }, or (b) paginate with offset. Per-metal queries are simpler and stay under the limit.
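Option (a) can be sketched as a small request builder plus a truncation check. Only `filter: { metal_levels: [...] }` and the 200-plan limit come from these notes; `buildPerMetalQueries`, `looksTruncated`, the base query shape, and the list of metal names are assumptions that should be checked against the CMS API documentation.

```javascript
const RESPONSE_LIMIT = 200; // limit observed in these notes

// Produce one request body per metal level, preserving any existing filters.
function buildPerMetalQueries(baseQuery, metalLevels) {
  return metalLevels.map((metal) => ({
    ...baseQuery,
    filter: { ...(baseQuery.filter ?? {}), metal_levels: [metal] },
  }));
}

// A response that hits the limit exactly is suspicious and worth re-querying
// with a narrower filter or an offset.
const looksTruncated = (plans) => plans.length >= RESPONSE_LIMIT;

// Placeholder base query; the real request body has more fields.
const perMetalQueries = buildPerMetalQueries(
  { place: { state: 'TX' } },
  ['Bronze', 'Silver', 'Gold'],
);
console.log(perMetalQueries.map((q) => q.filter.metal_levels[0]));
```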

9. CMS zip → county mapping has multi-county zips

The trap: Zip code 79118 in TX returns 3 counties from CMS: Randall, Armstrong, Potter. If your zip_county has only one entry per zip, the county picker never fires for these zips and the user is silently routed to whichever county your DB happened to pick.

Fix: Allow multiple zip_county entries per zip. Drop the unique index on zip alone, replace with compound unique on (zip, countyFips). Run a CMS sweep to populate all multi-county mappings (38% of zips span multiple counties).
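The effect of the compound key can be illustrated in memory. This is a sketch only: `indexCountiesByZip` and `needsCountyPicker` are hypothetical names, and the entry shape `{ zip, countyFips, county }` is assumed.

```javascript
// Group zip_county entries by zip, deduplicating on (zip, countyFips) the same
// way a compound unique index would.
function indexCountiesByZip(entries) {
  const byZip = new Map();
  const seen = new Set();
  for (const { zip, countyFips, county } of entries) {
    const key = `${zip}:${countyFips}`;
    if (seen.has(key)) continue;
    seen.add(key);
    if (!byZip.has(zip)) byZip.set(zip, []);
    byZip.get(zip).push({ countyFips, county });
  }
  return byZip;
}

// The county picker should fire whenever a zip maps to more than one county.
const needsCountyPicker = (byZip, zip) => (byZip.get(zip)?.length ?? 0) > 1;

const countiesByZip = indexCountiesByZip([
  { zip: '79118', countyFips: '48381', county: 'Randall' },
  { zip: '79118', countyFips: '48011', county: 'Armstrong' },
  { zip: '79118', countyFips: '48375', county: 'Potter' },
  { zip: '79118', countyFips: '48375', county: 'Potter' }, // duplicate from a re-run sweep
]);
console.log(needsCountyPicker(countiesByZip, '79118'));
```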

10. County name conventions differ between CMS and PUF

The trap: CMS returns "Saint Clair", PUF lists "St. Clair". Same county, different name. If you key plans by countiesServed: county and the names don't match, plans are invisible.

Known patterns:

CMS name	PUF name
`Saint X`	`St. X`
`Sainte Genevieve`	`Ste. Genevieve`
`LaPaz`	`La Paz`
`LaSalle`	`La Salle`
`LaCrosse`	`La Crosse`
`LaMoure`	`LaMoure`
`Lagrange`	`LaGrange`
`Desoto`	`DeSoto` (or `De Soto` for LA)
`Obrien`	`O'Brien`
`Juneau`	`Juneau City and` (AK borough)
`Sitka`	`Sitka City and` (AK borough)
`St. Louis City`	`St. Louis city` (lowercase city)

Fix: Normalize CMS-sourced names to PUF conventions when ingesting. Build a normalization map.
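A normalization map built from the table above might start like this. It is a sketch that covers only the listed patterns; `normalizeCountyName` is a hypothetical name, and taking the state as a second argument is an assumption made to handle the `De Soto` case for LA.

```javascript
// Exact CMS -> PUF replacements taken from the known-patterns table.
const EXACT_COUNTY_NAMES = new Map([
  ['Sainte Genevieve', 'Ste. Genevieve'],
  ['LaPaz', 'La Paz'],
  ['LaSalle', 'La Salle'],
  ['LaCrosse', 'La Crosse'],
  ['Lagrange', 'LaGrange'],
  ['Desoto', 'DeSoto'],
  ['Obrien', "O'Brien"],
  ['Juneau', 'Juneau City and'],
  ['Sitka', 'Sitka City and'],
  ['St. Louis City', 'St. Louis city'],
]);

function normalizeCountyName(cmsName, state) {
  // State-specific exception from the table.
  if (cmsName === 'Desoto' && state === 'LA') return 'De Soto';
  if (EXACT_COUNTY_NAMES.has(cmsName)) return EXACT_COUNTY_NAMES.get(cmsName);
  // General rule: "Saint X" becomes "St. X".
  return cmsName.replace(/^Saint /, 'St. ');
}

console.log(normalizeCountyName('Saint Clair', 'AL'));
console.log(normalizeCountyName('Desoto', 'LA'));
```

Names that match no rule pass through unchanged, so new mismatches surface as invisible plans rather than as wrong renames; logging unmatched CMS names during ingest is one way to catch them.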

11. Plan IDs in the PUF include CSR variant suffixes; CMS responses do not

The trap: Plan IDs in plan-attributes-puf.csv look like 53932AL0100012-00, -01, -02, etc. CMS API returns IDs like 53932AL0100012 (the StandardComponentId). When matching, strip the variant suffix.

Fix: Use the first 14 characters as the canonical plan ID (hiosId.slice(0, 14)).
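A small sketch of the matching step, using the example ID from above. `canonicalPlanId` and `groupByCanonicalId` are hypothetical helper names.

```javascript
// Strip the CSR variant suffix ("-00", "-01", ...) by keeping the first
// 14 characters, which form the StandardComponentId.
function canonicalPlanId(hiosId) {
  return hiosId.slice(0, 14);
}

// Group PUF variant IDs under the ID that the CMS API returns.
function groupByCanonicalId(pufPlanIds) {
  const groups = new Map();
  for (const id of pufPlanIds) {
    const key = canonicalPlanId(id);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(id);
  }
  return groups;
}

const variantGroups = groupByCanonicalId([
  '53932AL0100012-00',
  '53932AL0100012-01',
  '53932AL0100012-02',
]);
console.log([...variantGroups.keys()]);
```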

12. Some issuers use non-standard age curves

The trap: ACA standard age curve (45 CFR 147.102) says age 35 should be 1.222× age 21. UT issuers use ratios closer to 1.39 — a state-specific curve allowed by regulation.

Fix: When sanity-checking age curves in Tier 4, allow state-specific curves. UT plans should pass with ratios 1.35-1.45.
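The check might be structured as below. The 1.222 ratio and the UT range of 1.35 to 1.45 come from these notes; the function name, the per-state override table, and the 0.01 tolerance for the default curve are assumptions.

```javascript
const DEFAULT_AGE35_RATIO = 1.222; // standard curve: age 35 relative to age 21

// States with their own curve get an accepted range instead of a single ratio.
const STATE_AGE35_RANGES = { UT: [1.35, 1.45] };

function ageCurveLooksSane(state, age21Rate, age35Rate, tolerance = 0.01) {
  const ratio = age35Rate / age21Rate;
  const range = STATE_AGE35_RANGES[state];
  if (range) return ratio >= range[0] && ratio <= range[1];
  return Math.abs(ratio - DEFAULT_AGE35_RATIO) <= tolerance;
}

// Illustrative rates only.
console.log(ageCurveLooksSane('AL', 100, 122.2));
console.log(ageCurveLooksSane('UT', 100, 139));
```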

13. Atlas backups are raw WiredTiger files

The trap: Atlas snapshot downloads aren't BSON — they're raw WiredTiger files. You can't mongorestore them directly to extract a single collection.

Fix: Use the Atlas web console or CLI to do a full cluster restore. To restore a single collection, you need to:

  1. Spin up a temporary cluster
  2. Restore the snapshot there
  3. Use mongoexport to dump the collection
  4. mongoimport into the production cluster

Alternatively, take your own snapshot before each bulk write with atlas backups snapshots create, so a known-good restore point always exists.