AskFlorence — System Architecture
Status: Living document. Last updated April 7, 2026. Version: 0.2.0 (pre-launch architecture)
Overview
AskFlorence provides 50-state ACA health insurance plan comparison with real-time subsidy calculations. Two complementary data paths serve a unified consumer experience, with licensed broker fulfillment for enrollment.
Data Paths
Path A: CMS Marketplace API (28 Federal States)
Real-time proxy to the Centers for Medicare & Medicaid Services Marketplace API.
States: AL, AK, AZ, AR, DE, FL, GA, HI, IN, IA, KS, LA, MI, MS, MO, MT, NE, NH, NC, ND, OH, OK, OR, SC, SD, TN, TX, UT, WI, WV, WY
Medicaid income handling: When CMS returns `is_medicaid_chip: true`, the API automatically re-queries with the income raised above the Medicaid threshold so that the response carries the correct CSR-enhanced benefits. APTC is then recalculated with the IRS formula, using the consumer's real income rather than the raised figure. This serves asset-ineligible Medicaid populations, our primary target audience.
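The re-query flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the production handler: `CmsQuote`, `FetchQuote`, `monthlyAptc`, and `quoteWithMedicaidFallback` are hypothetical names, only `is_medicaid_chip` comes from the text, and both the Medicaid threshold and the applicable percentage are supplied by the caller because they vary by state, household size, and plan year.

```typescript
// Hypothetical, trimmed-down view of a CMS quote; only is_medicaid_chip is named in this document.
interface CmsQuote {
  is_medicaid_chip: boolean;
  slcsp_premium: number; // assumed field: monthly benchmark (second-lowest-cost Silver) premium
  csr_variant: string;   // assumed field: CSR level applied at the queried income
}

// Assumed fetcher signature: queries CMS for a given annual household income.
type FetchQuote = (annualIncome: number) => Promise<CmsQuote>;

// Monthly APTC = benchmark premium minus the expected monthly contribution, floored at zero.
function monthlyAptc(slcspPremium: number, annualIncome: number, applicablePct: number): number {
  const expectedMonthlyContribution = (annualIncome * applicablePct) / 12;
  return Math.max(0, slcspPremium - expectedMonthlyContribution);
}

interface QuoteOutcome {
  quote: CmsQuote;
  aptc: number;
  requeried: boolean;
}

async function quoteWithMedicaidFallback(
  fetchQuote: FetchQuote,
  realIncome: number,
  medicaidThresholdIncome: number, // caller-supplied; varies by state and household size
  applicablePct: number,           // caller-supplied; taken from the IRS table for the plan year
): Promise<QuoteOutcome> {
  const first = await fetchQuote(realIncome);
  if (!first.is_medicaid_chip) {
    // No flag: keep the first response (the same formula is reused here for simplicity).
    return { quote: first, aptc: monthlyAptc(first.slcsp_premium, realIncome, applicablePct), requeried: false };
  }
  // Flagged as Medicaid/CHIP: ask again just above the threshold to obtain CSR-enhanced plans...
  const bumped = await fetchQuote(medicaidThresholdIncome + 1);
  // ...but price the subsidy on the income the consumer actually reported.
  return { quote: bumped, aptc: monthlyAptc(bumped.slcsp_premium, realIncome, applicablePct), requeried: true };
}
```

Passing the fetcher in as a parameter keeps the sketch testable without network access; the second call is the only place the raised income is used.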
Path B: PUF Data + Federal Formula (21+DC SBE States)
Batch-loaded plan data from CMS Public Use Files, combined with real-time APTC/CSR/state subsidy calculations.
States: CA, CO, CT, DC, GA, ID, IL, KY, ME, MD, MA, MN, NV, NJ, NM, NY, PA, RI, VT, VA, WA
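Because Path B has no CMS response to lean on, the CSR tier has to be derived locally from income. A minimal sketch of that step, assuming the standard federal CSR bands (94% up to 150% FPL, 87% up to 200%, 73% up to 250%): `fplPercent` and `silverVariantFor` are hypothetical helper names, the poverty guideline is passed in rather than hard-coded, and the "Silver CSR 87" and "Silver CSR 73" labels are extrapolated from the "Silver CSR 94" label used in the response schema. State-specific CSR programs and eligibility below 100% FPL are not modeled.

```typescript
// Labels follow the "Silver CSR 94" convention; the 87 and 73 labels are assumed by analogy.
type SilverVariant = "Silver CSR 94" | "Silver CSR 87" | "Silver CSR 73" | "Silver";

// Household income as a percentage of the poverty guideline for that household size.
// The guideline is caller-supplied because it changes by year, household size, and region.
function fplPercent(annualIncome: number, povertyGuideline: number): number {
  return (annualIncome / povertyGuideline) * 100;
}

// Federal CSR bands; upper bounds are treated as inclusive in this sketch.
function silverVariantFor(pct: number): SilverVariant {
  if (pct <= 150) return "Silver CSR 94";
  if (pct <= 200) return "Silver CSR 87";
  if (pct <= 250) return "Silver CSR 73";
  return "Silver"; // above 250% FPL: standard Silver, no CSR enhancement
}
```

Keeping the band lookup separate from the FPL computation lets a state-specific override replace `silverVariantFor` without touching the income math.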
Unified Response Schema
Both paths return an identical shape to the frontend:
```typescript
interface PlanResult {
  plan_id: string;
  issuer_name: string;
  plan_name: string;
  metal_level: string; // "Silver", "Silver CSR 94", etc.
  plan_type: string; // "HMO", "PPO", "EPO"
  sticker_premium: number;
  subsidized_premium: number;
  federal_aptc: number;
  state_subsidy: number;
  state_credit: number;
  total_savings: number;
  deductible: number;
  max_oop: number;
  copays: Record<string, string>;
  quality_rating: number;
  benefits_url: string | null;
  formulary_url: string | null;
  network_url: string | null;
  data_source: "cms_api" | "puf_calculated";
  data_year: number;
}
```

Consumer & Broker Enrollment Flow
See consumer-broker-flow.md for the detailed end-to-end journey.
MongoDB Collections
Infrastructure
Cost Estimate
| Phase | Config | Monthly |
|---|---|---|
| Pre-launch | Atlas M10, Fargate 0.25vCPU, Redis micro | ~$87 |
| Post-launch (1K-10K users) | Atlas M20, Fargate 0.5vCPU ×2, Redis small, CloudFront | ~$255 |
| Scale (10K-50K users) | Atlas M30, Fargate 1vCPU ×4, Redis medium, WAF | ~$800 |
Data Architecture — Multi-Source Ingestion
Phase 1: Source File Storage & Audit Trail
All source files are stored in S3 with a manifest per state per year for auditability.
```
s3://askflorence-data/
├── sources/
│   ├── {state}/{year}/
│   │   ├── dfs-exhibits/          # State regulatory filings (ZIP/Excel)
│   │   ├── marketplace-scrape/    # State marketplace HTML (raw backup)
│   │   ├── puf/                   # CMS PUF CSVs (when available)
│   │   └── official-docs/         # Rating region PDFs, plan lineups
│   └── federal/{year}/
│       └── cms-api-snapshots/     # Periodic CMS API response snapshots
├── processed/
│   └── {state}-{year}-unified.json    # Parsed, merged, validated
└── manifests/
    └── {state}-{year}-manifest.json   # Source URLs, dates, checksums, validation results
```

Phase 1: Data Sources by State Type
Phase 1: API Facade (Competitive Opacity)
The frontend NEVER knows where data comes from. All requests hit AskFlorence API routes. Whether the backend queries MongoDB or proxies CMS is invisible to the client and competitors.
Key principle: The response schema is always ours. Even when proxying CMS, we strip CMS-specific fields and normalize the payload. No `marketplace.api.healthcare.gov` URLs should be visible in the browser's network inspector. The `data_source` field in `PlanResult` should be removed from client-facing responses.
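One way to enforce this at the API boundary is sketched below. It is an illustration under assumptions rather than the actual route code: `InternalPlan` is a trimmed-down stand-in for `PlanResult`, and `toClientPlan` and `leaksUpstreamHost` are hypothetical helper names.

```typescript
type DataSource = "cms_api" | "puf_calculated";

// Trimmed-down stand-in for PlanResult; the real interface has many more fields.
interface InternalPlan {
  plan_id: string;
  plan_name: string;
  sticker_premium: number;
  benefits_url: string | null;
  data_source: DataSource;
}

// What the browser is allowed to see: everything except the provenance field.
type ClientPlan = Omit<InternalPlan, "data_source">;

const UPSTREAM_HOST = "marketplace.api.healthcare.gov";

// Drop data_source before the plan leaves the backend.
function toClientPlan(plan: InternalPlan): ClientPlan {
  const { data_source: _provenance, ...clientFields } = plan;
  return clientFields;
}

// Guard for tests or response middleware: true if the serialized payload mentions the CMS host.
function leaksUpstreamHost(payload: object): boolean {
  return JSON.stringify(payload).indexOf(UPSTREAM_HOST) !== -1;
}
```

Typing the client shape as `Omit<InternalPlan, "data_source">` means the compiler rejects any handler that tries to return the internal object unchanged, and `leaksUpstreamHost` can run in a test suite against recorded responses.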
Phase 1: MongoDB Collections (Updated)
Phase 1: Ingestion Scripts
```
scripts/ingest/
├── common/
│   ├── build-zip-county-map.py     # HUD crosswalk → zip_county collection
│   ├── validate.py                 # SLCSP match, copay consistency checks
│   └── load-to-mongo.py            # Unified JSON → MongoDB collections
├── ny/
│   ├── download-dfs-exhibits.py    # 14 ZIP files from DFS portal
│   ├── parse-exhibit-23.py         # Excel → premiums JSON
│   ├── scrape-nysoh-listings.py    # 62 counties, plan availability
│   ├── scrape-nysoh-details.py     # 282 plan detail pages
│   └── build-unified.py            # Merge DFS + NYSOH → unified JSON
├── ca/
│   └── (similar structure)
└── federal/
    └── snapshot-cms-api.py         # Periodic CMS API snapshots for owned data
```

Data Refresh Pipeline
EDE Readiness Checklist
This architecture is designed so EDE certification is an addition, not a rewrite.
| Requirement | Current Status | EDE Gap |
|---|---|---|
| HIPAA compliance | ✅ BAA + encryption + audit logs | None |
| Data model | ✅ Consumer, enrollment, plan schemas | Add CMS enrollment fields |
| Identity verification | ❌ Not built | Integrate ID proofing service |
| CMS API integration | ✅ 28-state proxy working | Add 20+ EDE-specific endpoints |
| Security audit (NIST SP 800-53) | ❌ Not done | $150K-$400K third-party audit |
| Legal agreements | ❌ Not filed | CMS EDE application |
| Broker NPN tracking | ✅ Schema ready | Populate with licensed brokers |
| Audit trail | ✅ CloudWatch + MongoDB | Verify covers CMS requirements |
Migration Path
| Step | Timeline | What |
|---|---|---|
| 1. Database | Weeks 1-2 | MongoDB Atlas setup, migrate waitlist, load config |
| 2. PUF Ingestion | Weeks 3-5 | Parse 21 state PUFs, load to MongoDB, build Path B queries |
| 3. Unified API | Weeks 5-7 | ECS Fargate service, state routing, Redis cache, audit logging |
| 4. Frontend Update | Weeks 7-8 | Point to unified API, 50-state UI, state subsidy display |
| 5. HIPAA Docs | Parallel | BAA signatures, risk assessment, incident response plan |
| 6. Rx/Condition Matching | Weeks 8-10 | Integrate CMS formulary data for plan recommendation |
| 7. Broker Queue | Weeks 10-12 | PII collection, encryption, broker portal, enrollment flow |
| 8. EDE Prep | Months 6-12 | NIST audit, CMS application, additional API integration |