AskFlorence — System Architecture

Status: Living document. Last updated April 7, 2026. Version: 0.2.0 (pre-launch architecture)


Overview

AskFlorence provides 50-state ACA health insurance plan comparison with real-time subsidy calculations. Two complementary data paths serve a unified consumer experience, with licensed broker fulfillment for enrollment.


Data Paths

Path A: CMS Marketplace API (30 Federal-Platform States)

Real-time proxy to the Centers for Medicare & Medicaid Services Marketplace API.

States: AL, AK, AZ, AR, DE, FL, HI, IN, IA, KS, LA, MI, MS, MO, MT, NE, NH, NC, ND, OH, OK, OR, SC, SD, TN, TX, UT, WI, WV, WY

Medicaid income handling: When CMS returns is_medicaid_chip: true, the AskFlorence API automatically re-queries CMS with the income raised above the Medicaid threshold so that the response includes the correct CSR-enhanced benefits. APTC is then recalculated using the IRS formula with the consumer's real income. This serves asset-ineligible Medicaid populations, our primary target audience.
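
The re-query flow above can be sketched roughly as follows. This is a minimal illustration, not the production code: `QuoteResponse`, `fetchQuotes`, `incomeAboveThreshold`, and `applicablePct` are hypothetical names, the upstream response is reduced to the fields the sketch needs, and the applicable percentage is assumed to be looked up elsewhere from the IRS table for the plan year. The APTC step uses the standard premium tax credit relation: benchmark (second-lowest-cost Silver) premium minus the expected contribution, floored at zero.

```typescript
// Hypothetical, simplified shape of an upstream quote response.
interface QuoteResponse {
  is_medicaid_chip: boolean;
  slcsp_monthly_premium: number; // benchmark (second-lowest-cost Silver) premium
  plans: { plan_id: string; sticker_premium: number }[];
}

// Hypothetical function that queries the upstream API for a given annual income.
type FetchQuotes = (annualIncome: number) => Promise<QuoteResponse>;

// Monthly APTC = max(0, benchmark premium - expected monthly contribution).
// `applicablePct` (e.g. 0.04) is supplied by the caller from the IRS table
// for the plan year; no percentages are hard-coded here.
function monthlyAptc(slcsp: number, annualIncome: number, applicablePct: number): number {
  const expectedMonthly = (annualIncome * applicablePct) / 12;
  const credit = Math.round((slcsp - expectedMonthly) * 100) / 100; // round to cents
  return Math.max(0, credit);
}

async function quoteWithMedicaidHandling(
  fetchQuotes: FetchQuotes,
  realIncome: number,
  incomeAboveThreshold: number, // assumed to be derived elsewhere from FPL tables
  applicablePct: number,
): Promise<{ requeried: boolean; aptc: number; plans: QuoteResponse["plans"] }> {
  let res = await fetchQuotes(realIncome);
  let requeried = false;
  if (res.is_medicaid_chip) {
    // Re-query with the bumped income to obtain marketplace plans and CSR variants.
    res = await fetchQuotes(incomeAboveThreshold);
    requeried = true;
  }
  // APTC is always computed from the consumer's real income, not the bumped one.
  const aptc = monthlyAptc(res.slcsp_monthly_premium, realIncome, applicablePct);
  return { requeried, aptc, plans: res.plans };
}
```

Keeping the bumped income confined to the upstream query, and passing the real income to `monthlyAptc`, is the point of the sketch: the subsidy shown to the consumer never reflects the artificial income.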

Path B: PUF Data + Federal Formula (20 SBE States + DC)

Batch-loaded plan data from CMS Public Use Files, combined with real-time APTC/CSR/state subsidy calculations.

States: CA, CO, CT, DC, GA, ID, IL, KY, ME, MD, MA, MN, NV, NJ, NM, NY, PA, RI, VT, VA, WA
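
As an illustration of how a request might be routed between the two paths, the sketch below maps a state code to a data path. The function name `resolveDataPath` and the two sets are hypothetical; the path labels reuse the `data_source` values from the response schema, and GA is placed under Path B only, which is an assumption of this sketch.

```typescript
type DataPath = "cms_api" | "puf_calculated";

// Path B: state-based exchanges served from PUF data (includes DC).
const PATH_B_STATES = new Set<string>([
  "CA", "CO", "CT", "DC", "GA", "ID", "IL", "KY", "ME", "MD", "MA",
  "MN", "NV", "NJ", "NM", "NY", "PA", "RI", "VT", "VA", "WA",
]);

// Path A: states proxied to the CMS Marketplace API.
// Assumption: GA is routed to Path B and therefore omitted here.
const PATH_A_STATES = new Set<string>([
  "AL", "AK", "AZ", "AR", "DE", "FL", "HI", "IN", "IA", "KS",
  "LA", "MI", "MS", "MO", "MT", "NE", "NH", "NC", "ND", "OH",
  "OK", "OR", "SC", "SD", "TN", "TX", "UT", "WI", "WV", "WY",
]);

function resolveDataPath(state: string): DataPath {
  const code = state.trim().toUpperCase();
  if (PATH_B_STATES.has(code)) return "puf_calculated";
  if (PATH_A_STATES.has(code)) return "cms_api";
  // Fail loudly rather than silently defaulting to one path.
  throw new Error(`Unsupported state code: ${state}`);
}
```

For example, `resolveDataPath("ny")` selects the PUF path and `resolveDataPath("TX")` selects the CMS proxy; an unknown code throws instead of falling through.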

Unified Response Schema

Both paths return an identical shape to the frontend:

```typescript
interface PlanResult {
  plan_id: string;
  issuer_name: string;
  plan_name: string;
  metal_level: string;        // "Silver", "Silver CSR 94", etc.
  plan_type: string;          // "HMO", "PPO", "EPO"
  sticker_premium: number;
  subsidized_premium: number;
  federal_aptc: number;
  state_subsidy: number;
  state_credit: number;
  total_savings: number;
  deductible: number;
  max_oop: number;
  copays: Record<string, string>;
  quality_rating: number;
  benefits_url: string | null;
  formulary_url: string | null;
  network_url: string | null;
  data_source: "cms_api" | "puf_calculated";
  data_year: number;
}
```

Consumer & Broker Enrollment Flow

See consumer-broker-flow.md for the detailed end-to-end journey.


MongoDB Collections


Infrastructure

Cost Estimate

| Phase | Config | Monthly |
|---|---|---|
| Pre-launch | Atlas M10, Fargate 0.25 vCPU, Redis micro | ~$87 |
| Post-launch (1K-10K users) | Atlas M20, Fargate 0.5 vCPU ×2, Redis small, CloudFront | ~$255 |
| Scale (10K-50K users) | Atlas M30, Fargate 1 vCPU ×4, Redis medium, WAF | ~$800 |

Data Architecture — Multi-Source Ingestion

Phase 1: Source File Storage & Audit Trail

All source files are stored in S3 with a manifest per state per year for auditability.

s3://askflorence-data/
├── sources/
│   ├── {state}/{year}/
│   │   ├── dfs-exhibits/         # State regulatory filings (ZIP/Excel)
│   │   ├── marketplace-scrape/   # State marketplace HTML (raw backup)
│   │   ├── puf/                  # CMS PUF CSVs (when available)
│   │   └── official-docs/        # Rating region PDFs, plan lineups
│   └── federal/{year}/
│       └── cms-api-snapshots/    # Periodic CMS API response snapshots
├── processed/
│   └── {state}-{year}-unified.json  # Parsed, merged, validated
└── manifests/
    └── {state}-{year}-manifest.json # Source URLs, dates, checksums, validation results
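
To make the manifest idea concrete, here is one possible shape for a per-state, per-year manifest plus a checksum check. The field names (`key`, `source_url`, `retrieved_at`, `sha256`, `validation`) and the helper names are assumptions for illustration only; the source specifies just that manifests hold source URLs, dates, checksums, and validation results. Hashing uses Node's built-in `node:crypto` module.

```typescript
import { createHash } from "node:crypto";

// Hypothetical manifest entry for one stored source file.
interface SourceEntry {
  key: string;          // S3 object key under sources/
  source_url: string;   // where the file was downloaded from
  retrieved_at: string; // ISO 8601 timestamp
  sha256: string;       // hex digest of the stored bytes
}

// Hypothetical manifest for one state and plan year.
interface Manifest {
  state: string;
  year: number;
  sources: SourceEntry[];
  validation: { passed: boolean; notes: string[] };
}

function sha256Hex(data: string | Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

function buildEntry(
  key: string,
  sourceUrl: string,
  body: string | Uint8Array,
  retrievedAt: Date = new Date(),
): SourceEntry {
  return {
    key,
    source_url: sourceUrl,
    retrieved_at: retrievedAt.toISOString(),
    sha256: sha256Hex(body),
  };
}

// Returns the keys whose current bytes are missing or no longer match the manifest.
function findChecksumMismatches(
  manifest: Manifest,
  bodies: Map<string, string | Uint8Array>,
): string[] {
  return manifest.sources
    .filter((entry) => {
      const body = bodies.get(entry.key);
      return body === undefined || sha256Hex(body) !== entry.sha256;
    })
    .map((entry) => entry.key);
}
```

A re-ingestion job could run `findChecksumMismatches` before parsing, so that a silently changed or missing source file is flagged in the audit trail rather than flowing into the unified JSON.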

Phase 1: Data Sources by State Type

Phase 1: API Facade (Competitive Opacity)

The frontend NEVER knows where data comes from. All requests hit AskFlorence API routes. Whether the backend queries MongoDB or proxies CMS is invisible to the client and competitors.

Key principle: The response schema is always ours. Even when proxying CMS, we strip CMS-specific fields and normalize the result. No marketplace.api.healthcare.gov URLs should be visible in the browser's network inspector. The data_source field in PlanResult should be removed from client-facing responses.
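
One way to enforce this principle is an allow-list at the API boundary: only named fields are copied into the client response, so upstream-specific fields and `data_source` are dropped by default. The sketch below is an assumption about how that could look; `CLIENT_FIELDS`, `toClientPlan`, and `leaksUpstreamHost` are hypothetical names, and the field list mirrors PlanResult minus `data_source`.

```typescript
// Fields allowed in client-facing responses: PlanResult without data_source.
const CLIENT_FIELDS = [
  "plan_id", "issuer_name", "plan_name", "metal_level", "plan_type",
  "sticker_premium", "subsidized_premium", "federal_aptc", "state_subsidy",
  "state_credit", "total_savings", "deductible", "max_oop", "copays",
  "quality_rating", "benefits_url", "formulary_url", "network_url", "data_year",
] as const;

type ClientField = (typeof CLIENT_FIELDS)[number];
type ClientPlan = Record<ClientField, unknown>;

// Copies only allow-listed fields; anything else (including data_source and
// upstream-specific fields) never reaches the client. Missing fields become null.
function toClientPlan(raw: Record<string, unknown>): ClientPlan {
  const out: Record<string, unknown> = {};
  for (const field of CLIENT_FIELDS) {
    out[field] = raw[field] ?? null;
  }
  return out as ClientPlan;
}

const UPSTREAM_HOST = "marketplace.api.healthcare.gov";

// Guard usable in tests or CI: true if a serialized payload mentions the upstream host.
function leaksUpstreamHost(payload: unknown): boolean {
  return JSON.stringify(payload).indexOf(UPSTREAM_HOST) !== -1;
}
```

An allow-list is used here rather than deleting known fields because new upstream fields are then excluded automatically instead of leaking until someone notices them.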

Phase 1: MongoDB Collections (Updated)

Phase 1: Ingestion Scripts

scripts/ingest/
├── common/
│   ├── build-zip-county-map.py    # HUD crosswalk → zip_county collection
│   ├── validate.py                # SLCSP match, copay consistency checks
│   └── load-to-mongo.py           # Unified JSON → MongoDB collections
├── ny/
│   ├── download-dfs-exhibits.py   # 14 ZIP files from DFS portal
│   ├── parse-exhibit-23.py        # Excel → premiums JSON
│   ├── scrape-nysoh-listings.py   # 62 counties, plan availability
│   ├── scrape-nysoh-details.py    # 282 plan detail pages
│   └── build-unified.py           # Merge DFS + NYSOH → unified JSON
├── ca/
│   └── (similar structure)
└── federal/
    └── snapshot-cms-api.py        # Periodic CMS API snapshots for owned data

Data Refresh Pipeline


EDE Readiness Checklist

This architecture is designed so EDE certification is an addition, not a rewrite.

| Requirement | Current Status | EDE Gap |
|---|---|---|
| HIPAA compliance | ✅ BAA + encryption + audit logs | None |
| Data model | ✅ Consumer, enrollment, plan schemas | Add CMS enrollment fields |
| Identity verification | ❌ Not built | Integrate ID proofing service |
| CMS API integration | ✅ 28-state proxy working | Add 20+ EDE-specific endpoints |
| Security audit (NIST 294) | ❌ Not done | $150K-$400K third-party audit |
| Legal agreements | ❌ Not filed | CMS EDE application |
| Broker NPN tracking | ✅ Schema ready | Populate with licensed brokers |
| Audit trail | ✅ CloudWatch + MongoDB | Verify covers CMS requirements |

Migration Path

| Step | Timeline | What |
|---|---|---|
| 1. Database | Weeks 1-2 | MongoDB Atlas setup, migrate waitlist, load config |
| 2. PUF Ingestion | Weeks 3-5 | Parse 21 state PUFs, load to MongoDB, build Path B queries |
| 3. Unified API | Weeks 5-7 | ECS Fargate service, state routing, Redis cache, audit logging |
| 4. Frontend Update | Weeks 7-8 | Point to unified API, 50-state UI, state subsidy display |
| 5. HIPAA Docs | Parallel | BAA signatures, risk assessment, incident response plan |
| 6. Rx/Condition Matching | Weeks 8-10 | Integrate CMS formulary data for plan recommendation |
| 7. Broker Queue | Weeks 10-12 | PII collection, encryption, broker portal, enrollment flow |
| 8. EDE Prep | Months 6-12 | NIST audit, CMS application, additional API integration |

AskFlorence Internal Documentation. Not for public distribution.