Data sources & ingest cadence

Purpose: Canonical record of every external data source we ingest into MongoDB, refresh cadence, lineage, and the script that owns each pipeline. SOC 2 CC8.1 (Change Management) + CMS EDE Phase 3 data-provenance evidence.

Convention: every row in the Sources table below maps to one ingest pipeline (one or more scripts under scripts/db/) and its target MongoDB collection(s). The owning script's header docstring embeds the refresh playbook.

Sources

| Domain | Source | Format | License | Cadence | Script | MongoDB collection(s) |
| --- | --- | --- | --- | --- | --- | --- |
| ZIP → county → state (SBE-state ZIPs) | CMS Marketplace API /counties/by/zip/{zip} (canonical for byte-for-byte audit parity) | JSON snapshot | Public domain (CMS) | Annual (plan-year transition) | scripts/db/build-cms-snapshot.js → scripts/db/seed-sbe-zips-from-cms.js | zip_county (per-county redirect docs with _seedSource marker) |
| ZIP → county (federal-30 + NY) | U.S. Census ZCTA + hand-curated NY data | JSON | Public / internal | Annual (plan-year transition) | scripts/db/load-zip-county.js | zip_county (county docs) |
| ZIP USPS-completeness universe (federal-30 + NY) | zipcodes npm package (USPS-derived, MIT); catches PO-Box-only / business-only / single-building ZIPs Census ZCTA misses | CSV snapshot | MIT (npm) | Annual (plan-year transition); upgrade to HUD ZIP-County crosswalk recommended | scripts/db/build-usps-snapshot.js → scripts/db/audit-federal-completeness-tier-0-5.js → scripts/db/seed-federal-tier-0-5.js | zip_county (_seedSource: "federal-tier-0-5-audit-2026-05-01") |
| Plans, premiums, rate areas (federal-30) | CMS Federal Marketplace PUF (Plan Attributes, Benefits & Cost Sharing, Service Areas) | CSV | Public domain (CMS) | Annual (CMS releases ~Sept) | scripts/db/ingest-puf-augment.js, ingest-qrs-ratings.js | plans, regions, plan_years |
| Plans, premiums (NY) | NY State of Health Essential Plan + DFS rate filings | JSON | Public (NY DFS) | Annual | scripts/db/load-ny-2026.js | plans, regions, plan_years |
| Stale ZIP redirects + PO Box ZIPs | Manual curation from CMS/Census discrepancies | Hardcoded JS array | n/a | Ad-hoc (audit-driven) | scripts/db/fix-stale-zips.js | zip_county (specific overrides) |
| (SUPERSEDED) ZIP → state (SBE redirects, original) | U.S. Census 2020 ZCTA-to-County relationship file | Pipe-delimited TXT | n/a | n/a | scripts/db/seed-sbe-zips.js (deprecated) | n/a |

Why CMS replaced Census ZCTA for SBE-state ZIPs (2026-04-30): the original seed used Census-derived FIPS codes, which do not always agree with what the CMS Marketplace API returns for the same ZIP. Future audits compare our DB byte-for-byte against CMS, the canonical source for ACA marketplace ZIP→county mapping, and seeding from a different source breaks that property. The corrective seed queries CMS directly for every SBE-state ZIP, persists the response to scripts/db/data/sbe-zip-cms-snapshot.json for reproducibility, and inserts per-county docs with FIPS anchors. See the change-log entries from 2026-04-30 for the full lineage.
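
The parity property described above can be checked with a small set comparison. This is a minimal sketch, not the audit scripts' actual logic: the snapshot shape (ZIP mapped to a list of county FIPS codes), the DB doc fields other than countyFips, and the function name diffAgainstCmsSnapshot are all assumptions made for illustration.

```javascript
// Hypothetical sketch: compare zip_county docs against a CMS snapshot.
// Assumed shapes (not the real schema):
//   snapshot = { "<zip>": ["<countyFips>", ...] }
//   dbDocs   = [{ zip: "<zip>", countyFips: "<countyFips>" }, ...]
function diffAgainstCmsSnapshot(snapshot, dbDocs) {
  // Group the DB's FIPS codes by ZIP.
  const dbByZip = new Map();
  for (const doc of dbDocs) {
    if (!dbByZip.has(doc.zip)) dbByZip.set(doc.zip, new Set());
    dbByZip.get(doc.zip).add(doc.countyFips);
  }

  const mismatches = [];
  const allZips = new Set([...Object.keys(snapshot), ...dbByZip.keys()]);
  for (const zip of [...allZips].sort()) {
    const cms = new Set(snapshot[zip] ?? []);
    const db = dbByZip.get(zip) ?? new Set();
    const missingInDb = [...cms].filter((fips) => !db.has(fips)).sort();
    const extraInDb = [...db].filter((fips) => !cms.has(fips)).sort();
    if (missingInDb.length > 0 || extraInDb.length > 0) {
      mismatches.push({ zip, missingInDb, extraInDb });
    }
  }
  return mismatches; // empty array means the DB matches the snapshot
}
```

An empty result is what a clean audit would look like under these assumptions; any entry names the ZIP and the direction of the disagreement.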

Refresh cadence

Every plan-year transition (typically October-November of the year before the plan year, e.g., October 2026 for the 2027 plan year) runs the same sequence:

  1. Refresh STATE_BASED_MARKETPLACES in src/lib/constants.ts. A state may have transitioned to/from SBE for the new plan year. Cross-check with CMS's Marketplace operating-status page.
  2. Federal-30 + NY plan + ZIP refresh — primary annual ingest. Owner: ingest-puf-augment.js + load-zip-county.js + load-ny-2026.js.
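  3. SBE-state ZIP refresh: see the scripts/db/seed-sbe-zips.js header for the full playbook (download Census ZCTA → regenerate snapshot CSV → dry-run → apply staging → apply prod). The Sources table lists seed-sbe-zips.js as deprecated, superseded by scripts/db/build-cms-snapshot.js → scripts/db/seed-sbe-zips-from-cms.js, so confirm which playbook applies before running.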
  4. Tier 0 federal-completeness audit + apply — see Tier 0 doc. Catches Census-ZCTA-tracked federal+NY ZIPs missing from DB.
  5. Tier 0.5 USPS-completeness audit + apply: see Tier 0.5 doc. Catches PO-Box-only / business-only / single-building ZIPs that Census ZCTA misses but USPS recognizes. Playbook:
     a. npm update zipcodes
     b. node scripts/db/build-usps-snapshot.js
     c. node scripts/db/audit-federal-completeness-tier-0-5.js (default concurrency=5; CMS rate-limits at concurrency=10)
     d. node scripts/audit/validate-cms-errors.js --tier=1 for any CMS errors
     e. Triage the report.
     f. Phased seed-federal-tier-0-5.js --apply per Constraint 2, with a mongodump backup before each batch per Constraint 1.
     Recommended upgrade: swap the zipcodes npm package for the HUD ZIP-County crosswalk (https://www.huduser.gov/portal/datasets/usps_crosswalk.html; quarterly refresh, free with a HUD account).
  6. Tier 1 + Tier 1.5 audits re-run on prod. Expect TRUE 100% pass post-refresh. Run validate-cms-errors.js --tier=1 and --tier=1.5 for any leftover errors before declaring 100%.
  7. Append change-log entry in change-log.md with timestamp + commit SHA + counts.
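
Step 5 runs the audit at a default concurrency of 5 because CMS rate-limits at 10. The idea of capping in-flight requests can be sketched as a small worker pool. This is an illustrative sketch only: mapWithConcurrency and the worker callback are hypothetical names, and the audit script's real implementation may differ.

```javascript
// Hypothetical sketch of a bounded-concurrency map. At most `limit` calls to
// `worker` are in flight at any time; results keep the input order.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  async function runWorker() {
    while (next < items.length) {
      const i = next++; // claim an index; no other code runs between check and claim
      results[i] = await worker(items[i], i);
    }
  }

  // Start up to `limit` workers; each keeps pulling items until none remain.
  const workers = Array.from({ length: Math.min(limit, items.length) }, runWorker);
  await Promise.all(workers);
  return results;
}
```

Under these assumptions an audit loop would look like `await mapWithConcurrency(zips, 5, checkZip)`, where checkZip is whatever per-ZIP lookup the caller supplies.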

For all the operational patterns (CMS rate-limit defenses, backup discipline, phased apply protocol, audit-script behaviors, post-apply validation), see the audit operations runbook. Required reading for any future plan-year refresh or ad-hoc audit work.
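
The backup discipline and phased apply protocol mentioned above can be pictured as a loop that takes a backup before every batch. The sketch below assumes injected backup and applyBatch callbacks (for example, wrappers around mongodump and a bulk write) and an arbitrary batch size; none of these names or values come from the runbook.

```javascript
// Hypothetical sketch: split pending docs into fixed-size batches and take a
// backup before each batch is applied. The callbacks and batch size are
// assumptions for illustration.
function chunk(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

async function phasedApply(docs, { batchSize, backup, applyBatch }) {
  const batches = chunk(docs, batchSize);
  let applied = 0;
  for (const [index, batch] of batches.entries()) {
    await backup(index); // backup first, so every batch has a restore point
    await applyBatch(batch, index); // a throw here stops all later batches
    applied += batch.length;
  }
  return { batches: batches.length, applied };
}
```

Running batches sequentially keeps the blast radius of a bad batch to one restore point, at the cost of a slower apply.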

Quarterly (within a plan year) refreshes are optional and only triggered if a known data drift is observed.

Provenance + audit trail

  • Source URLs: every script's header docstring records the upstream data URL the snapshot was derived from.
  • Committed snapshots: source data is committed to the repo under scripts/db/data/ (e.g., sbe-zip-state-2020.csv for SBE ZIPs). Allows reproducibility, airgap-safe re-runs, and git blame-able lineage.
  • CloudTrail + Atlas audit log: every --apply run produces MongoDB write operations recorded by Atlas's audit log (HIPAA tier) and any AWS-side credential fetches recorded by CloudTrail. Cross-reference at audit time.
  • Tier audits (scripts/audit/tier-1 through tier-5): structural validation of ingest correctness, run after every refresh.
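
One way to tie a change-log entry to the exact bytes of a committed snapshot is to record a SHA-256 digest next to the commit SHA and counts. The sketch below uses Node's built-in crypto module; the helper names and the line layout are assumptions, since the change-log format is only described here as timestamp + commit SHA + counts.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical helper: SHA-256 hex digest of snapshot bytes (Buffer or string),
// e.g. the contents of a file under scripts/db/data/.
function snapshotDigest(bytes) {
  return createHash("sha256").update(bytes).digest("hex");
}

// Hypothetical change-log line; the field order and layout are assumptions.
function changeLogLine({ timestamp, commitSha, snapshotPath, bytes, counts }) {
  const countText = Object.entries(counts)
    .map(([name, n]) => `${name}=${n}`)
    .join(", ");
  return `${timestamp} ${commitSha} ${snapshotPath} sha256:${snapshotDigest(bytes)} (${countText})`;
}
```

In practice the bytes argument would come from reading the committed snapshot file, so a later re-run can confirm it is working from the same input.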

SBE-state ZIP redirect data — specific notes

The seed-sbe-zips.js pipeline has three layers of safety guards documented in the script header. Key invariants:

  1. Federal-30 + NY county docs are never modified by this script. Guard 2 detects any countyFips field on an existing doc and skips the source row with a CONFLICT log.
  2. Border ZIPs that span SBE + federal counties require per-county redirect handling, which is out of scope for the current schema. The 2026 ingest identified 57 such ZIPs: 12 already had fix-stale-zips.js redirect entries, and the other 45 are skipped as conflicts. Tracked as a future enhancement on Issue #68.
  3. Marketplace strings are sourced from STATE_BASED_MARKETPLACES in src/lib/constants.ts (the application's single source of truth). The script copies them inline; a cross-check at script start would catch divergence (TODO if drift becomes a concern).
  4. IL was added to STATE_BASED_MARKETPLACES in this seed (2026-04-30) reflecting Get Covered Illinois launch in 2025 plan year. Prior to this seed, IL ZIPs returned 404 from /api/counties (then HTTP-403 from CMS via the temp-fix fallback).
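
Guard 2 from the first invariant can be sketched as a planning pass that runs before any write. This is an illustration under stated assumptions: only the countyFips field and the CONFLICT log label come from the description above, while planSeed, the row shape, and the log wording are hypothetical.

```javascript
// Hypothetical sketch of Guard 2: if any existing doc for a ZIP carries a
// countyFips field, treat it as verified county data, skip the source row,
// and record a CONFLICT entry instead of writing.
function planSeed(sourceRows, existingDocsByZip) {
  const toInsert = [];
  const conflicts = [];
  for (const row of sourceRows) {
    const existing = existingDocsByZip.get(row.zip) ?? [];
    const hasCountyDoc = existing.some((doc) => doc.countyFips !== undefined);
    if (hasCountyDoc) {
      conflicts.push(`CONFLICT ${row.zip}: existing county doc, skipping ${row.state} redirect`);
      continue;
    }
    toInsert.push({ zip: row.zip, state: row.state });
  }
  return { toInsert, conflicts };
}
```

Separating planning from writing makes a dry-run trivial: print the plan and stop before any insert.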

Conflict log archive (45 ZIPs as of 2026-04-30)

Border ZIPs where Census data assigns the ZIP to an SBE state but our verified federal data wins. Listed for audit and for scoping the future per-county-redirect feature:

ME/NH border:    03579
DE/MD border:    19973 (also covered by fix-stale-zips: 21874, 21912)
VA/WV border:    20135 (also fix-stale-zips: 24604, 24622)
NC/VA border:    27048, 28675
AL/GA border:    36855 (also fix-stale-zips: 30165, 30741)
TN/VA border:    37642, 37752
TN/KY border:    38079, 38549, 42223, 42602, 40965 (last via fix-stale-zips)
IA/MN border:    51360
SD/MN border:    56136, 56144, 56164, 56219, 56220, 56257, 57026, 57030, 57068
ND/MN border:    58030, 58225 (also fix-stale-zips: 56027, 56744)
MT/ID border:    59847
NM border:       (fix-stale-zips: 88430, 87328)
CO border:       (fix-stale-zips: 81324)

These ZIPs continue to serve federal county data (i.e., the user gets a real plan-search experience for the federal county). Per-county SBE redirect support — where the user could pick "this county is in MD, redirect me" vs "this county is in DE, search plans" — is a future enhancement to the /api/counties response shape + frontend useCalculator() consumer.
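
To make the per-county choice concrete, the sketch below builds one option per county for a border ZIP, marking each as either a plan search or a redirect. Everything here is hypothetical: the option shape, the action values, and the assumption that STATE_BASED_MARKETPLACES can be read as a state-to-marketplace-name object. It does not reflect the current /api/counties contract.

```javascript
// Hypothetical response-shape sketch for a border ZIP. `sbeMarketplaces`
// stands in for STATE_BASED_MARKETPLACES; its shape is assumed.
function buildCountyOptions(counties, sbeMarketplaces) {
  return counties.map((county) => {
    const base = { fips: county.fips, name: county.name, state: county.state };
    const marketplace = sbeMarketplaces[county.state];
    return marketplace
      ? { ...base, action: "redirect", marketplace } // SBE county: send the user to the state exchange
      : { ...base, action: "search" }; // federal county: normal plan search
  });
}
```

A frontend consumer could then render one choice per option and branch on action.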

Optional CMS API cross-check

scripts/db/seed-sbe-zips.js --verify-cms-sample cross-checks 200 ZIPs against the CMS Marketplace API at 1 req/sec (~3 minutes). Recommended on the first seed of any plan year; skippable on quarterly refreshes if previously clean. The script refuses to --apply if the CMS sample finds mismatches (configurable behavior).
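
A paced sample check of this kind can be sketched as a sequential loop with a fixed delay between requests. The sketch below is not the script's code: verifySample, lookupCms, and sleep are hypothetical names, and the lookup and delay are injected so the pacing can be exercised without network calls.

```javascript
// Hypothetical sketch of a sequential, rate-limited sample check.
// `lookupCms(zip)` resolves to the state CMS reports for the ZIP and
// `sleep(ms)` resolves after the delay; both are supplied by the caller.
async function verifySample(sample, expectedByZip, { lookupCms, sleep, intervalMs = 1000 }) {
  const mismatches = [];
  for (let i = 0; i < sample.length; i++) {
    if (i > 0) await sleep(intervalMs); // pace requests: one per interval
    const zip = sample[i];
    const cmsState = await lookupCms(zip);
    if (cmsState !== expectedByZip[zip]) {
      mismatches.push({ zip, expected: expectedByZip[zip], cms: cmsState });
    }
  }
  return { checked: sample.length, mismatches, ok: mismatches.length === 0 };
}
```

With 200 ZIPs and a 1000 ms interval the loop waits 199 times, roughly 200 seconds plus request latency, which lines up with the ~3 minute estimate. A caller would refuse to apply when ok is false.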

Future improvements

  • Self-checking: at script start, parse src/lib/constants.ts and assert STATE_BASED_MARKETPLACES matches the script's inlined copy. Currently manual.
  • Per-county SBE redirect support for border ZIPs (Issue #68 follow-up scope).
  • Atlas-CLI-driven snapshot refresh: a one-command scripts/db/refresh-sbe-snapshot.sh that pulls the latest Census file, regenerates the CSV, and emits a diff report.
  • Annual refresh runbook landed as a GitHub Action that opens a PR with the regenerated CSV + dry-run output for human review.
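
The self-checking item in the list above could start as a plain object comparison. The sketch assumes both the application constant and the script's inlined copy can be loaded as objects mapping state to marketplace string; the real shape of STATE_BASED_MARKETPLACES may differ, and loading it from src/lib/constants.ts is left out on purpose.

```javascript
// Hypothetical drift check between the application constant and the script's
// inlined copy. Both arguments are assumed to be { state: marketplaceString }.
function findMarketplaceDrift(appConstants, inlinedCopy) {
  const drift = [];
  const states = new Set([...Object.keys(appConstants), ...Object.keys(inlinedCopy)]);
  for (const state of [...states].sort()) {
    const app = appConstants[state];
    const inlined = inlinedCopy[state];
    if (app !== inlined) drift.push({ state, app, inlined });
  }
  return drift;
}

// Throws at script start so a stale inlined copy never reaches --apply.
function assertNoMarketplaceDrift(appConstants, inlinedCopy) {
  const drift = findMarketplaceDrift(appConstants, inlinedCopy);
  if (drift.length > 0) {
    throw new Error(`STATE_BASED_MARKETPLACES drift: ${JSON.stringify(drift)}`);
  }
}
```

A state missing from either side shows up with undefined on that side, which covers the case of a newly added SBE state that was not copied into the script.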

AskFlorence Internal Documentation. Not for public distribution.