
Session log: 2026-04-23 / 2026-04-24 UTC, Phase 10 cutover

Scope

Flip the Cloudflare apex and www DNS records from Vercel to the prod CloudFront distribution, moving live customer traffic to the AWS stack. Identify and fix the latent bugs surfaced during post-cutover smoke testing (Vercel write failure, broken Resend account, missing S3 upload permissions on the prod task role). Vercel stays warm as the rollback target for the first 48h.

Actors

  • Human: Taha Abbasi.
  • Agent: Claude Opus 4.7 (1M context), running in Claude Code CLI.

Tickets

  • Advances Issue #47 from Phase 8 (prod canary) through Phase 10 (live cutover).
  • Surfaces a pre-existing Vercel prod bug (MONGODB_WRITE_URI empty for ~2 weeks), resolved in-session by rotating the app-write password and repopulating the Vercel env.
  • Starts Phase 11 items: SES production-access request filed; Resend confirmed for retirement rather than revived.

External systems touched

Cloudflare DNS (askflorence.health)

Two records were edited in the Cloudflare dashboard. Both changed from Proxied to DNS-only, with TTL 300s:

| Record | Before | After |
| --- | --- | --- |
| askflorence.health (apex) | A 216.198.79.1 (proxied) | CNAME d1pnfyzua893hx.cloudfront.net (DNS only) |
| www.askflorence.health | CNAME askflorence.health (proxied) | CNAME d1pnfyzua893hx.cloudfront.net (DNS only) |

Global DNS propagation was observed within 15 seconds. The first CloudFront edge log entry from a real user request arrived ~30s after the change was saved.

AWS prod (039624954211)

  • Prod ECS task definition revisions: :5 (EMAIL_PROVIDER=resend, failover attempt), :6 (back to EMAIL_PROVIDER=ses), and :7 (final; adds S3_AGENT_SURVEY_BUCKET=askflorence-data). The service was rolled to :7.
  • Prod task role gained inline policy S3AgentSurveyUploadsWrite via Terraform (infra/envs/prod/ecs.tf).
  • Prod Atlas IP access list unchanged: both 0.0.0.0/0 and 10.20.0.0/16 are present. Removing 0.0.0.0/0 is deferred until the 48h Phase 10 bake completes.

AWS management (778477254880)

  • The askflorence-data bucket gained a Terraform-managed bucket policy (the first TF-owned aspect of that bucket), defined in the new file infra/envs/management/s3-askflorence-data.tf. It preserves the existing DenyNonSSLRequests statement and adds the cross-account grant AllowProdEcsTaskRolePutAgentSurveyUploads, which lets arn:aws:iam::039624954211:role/askflorence-prod-app-task call s3:PutObject on askflorence-data/agent-survey-uploads/*.
  • KMS CMK alias/askflorence-data key policy unchanged. The existing AllowOrgPrincipalsForTfstate statement (bound via kms:ViaService to s3.us-east-1) already covers the prod task role's need to call GenerateDataKey when writing KMS-encrypted objects.
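A minimal sketch of the resulting bucket policy document, written as a Python dict so it can be inspected and serialized to JSON. The statement IDs, role ARN, bucket, and prefix come from this log; the exact body of DenyNonSSLRequests is not recorded here, so the usual aws:SecureTransport deny shown below is an assumption. The real definition lives in infra/envs/management/s3-askflorence-data.tf.

```python
import json

# Values taken from this session log.
BUCKET = "askflorence-data"
PREFIX = "agent-survey-uploads/"
PROD_TASK_ROLE = "arn:aws:iam::039624954211:role/askflorence-prod-app-task"


def build_bucket_policy() -> dict:
    """Sketch of the askflorence-data bucket policy after Phase 10.

    ASSUMPTION: DenyNonSSLRequests follows the common
    aws:SecureTransport=false pattern; the real statement may differ.
    """
    bucket_arn = f"arn:aws:s3:::{BUCKET}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Pre-existing statement, preserved by the Terraform file.
                "Sid": "DenyNonSSLRequests",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # New cross-account grant: the prod task role may only
                # PutObject under the agent-survey-uploads/ prefix.
                "Sid": "AllowProdEcsTaskRolePutAgentSurveyUploads",
                "Effect": "Allow",
                "Principal": {"AWS": PROD_TASK_ROLE},
                "Action": "s3:PutObject",
                "Resource": f"{bucket_arn}/{PREFIX}*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_bucket_policy(), indent=2))
```

For cross-account access, this resource-side grant is only half of the picture: the prod task role's own identity policy must also allow s3:PutObject on the same resource, which is what S3AgentSurveyUploadsWrite provides.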

MongoDB Atlas (prod project 69dc20c64005b222804dafa4)

  • app-write user password rotated via atlas dbusers update. The rotation was safe: because of the pre-existing Vercel bug (MONGODB_WRITE_URI=""), no production consumer relied on the old password.
  • IP access list entries unchanged.
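When the rotated password is written into a connection string for Secrets Manager and Vercel, reserved characters in the credentials (such as @, :, / and %) must be percent-encoded or the driver cannot parse the URI. A minimal standard-library sketch; the host name is a hypothetical placeholder, and the options are the retryWrites/w pair Atlas typically shows in its connection strings, not necessarily the values stored in prod/mongodb/app-write.

```python
import secrets
from typing import Dict, Optional
from urllib.parse import quote, urlencode


def new_password(nbytes: int = 24) -> str:
    """Random password from the URL-safe base64 alphabet (A-Z, a-z, 0-9, '-', '_')."""
    return secrets.token_urlsafe(nbytes)


def build_srv_uri(user: str, password: str, host: str,
                  options: Optional[Dict[str, str]] = None) -> str:
    """Assemble a mongodb+srv:// URI, percent-encoding the credentials.

    quote(..., safe="") encodes '@', ':', '/' and '%', which would
    otherwise be misread as URI delimiters or escapes.
    """
    creds = f"{quote(user, safe='')}:{quote(password, safe='')}"
    uri = f"mongodb+srv://{creds}@{host}/"
    return f"{uri}?{urlencode(options)}" if options else uri
```

With a token_urlsafe password the encoding is a no-op, but it keeps hand-chosen passwords from silently producing a malformed URI.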

Vercel

  • MONGODB_WRITE_URI env var repopulated with the rotated app-write URI.
  • vercel --prod redeploy triggered so the running functions pick up the new env.
  • Vercel write path restored to a working state (it had been broken since 2026-04-16, per the env var's modification timestamp).
  • Deployment otherwise unchanged; Vercel stays warm as the Phase 10 rollback target.

Resend (external)

  • Investigated: the API key stored in the Vercel env ends with a literal \n (backslash + n), the same bug class as the previously fixed CMS_API_KEY. Stripping the literal \n yields a key that authenticates.
  • The Resend account's updates.askflorence.health domain has been in status "failed" since 2026-04-10, because the required DKIM CNAMEs were never added to Cloudflare. Email sending from Vercel has therefore not worked for ~2 weeks (compounding the empty MONGODB_WRITE_URI bug).
  • Decision: do not revive Resend. The send path moves to AWS SES full-time, and Phase 11 retires the Resend account.

AWS Support (prod account SES case)

  • SES production-access request filed with conservative transactional framing (< 100/day currently, < 500/day ceiling over the next 60 days, < 5k/day through the end of 2026). AWS's initial response asked for details on email types, bounce/complaint/unsubscribe handling, example content, and verified-identity status. A detailed response was submitted at 2026-04-24T02:05Z; approval is pending (typical turnaround 24-72h).

The three bugs surfaced during cutover smoke testing

(1) Vercel MONGODB_WRITE_URI="", latent since 2026-04-16

Consumer waitlist, agent waitlist, and agent discovery writes on Vercel prod were failing with the application error "MONGODB_URI_WAITLIST_WRITE or MONGODB_URI_SURVEY_WRITE or MONGODB_WRITE_URI must be set". No user or monitor caught this, because the UI renders the same "You're on the list" success page regardless of the backend outcome: the email is fire-and-forget, and the Mongo write is best-effort, running after the response has already been formed in memory. Nothing alerted on MongoParseError in the Vercel logs either.
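A minimal sketch of resolving the write URI once at startup (on Vercel, at function cold start), treating empty or whitespace-only values as unset, so a misconfigured deploy fails loudly instead of on each request. The variable names come from the error message above; the precedence order, the exception type, and the function name are assumptions.

```python
from collections.abc import Mapping

# Names from the error message; ORDER IS ASSUMED (most specific first).
WRITE_URI_VARS = (
    "MONGODB_URI_WAITLIST_WRITE",
    "MONGODB_URI_SURVEY_WRITE",
    "MONGODB_WRITE_URI",
)


class MissingWriteUri(RuntimeError):
    """Raised when no usable write URI is configured (hypothetical type)."""


def resolve_write_uri(env: Mapping[str, str]) -> str:
    """Return the first non-blank write URI, or raise.

    An empty string (the Vercel state in this incident) is treated
    exactly like an absent variable.
    """
    for name in WRITE_URI_VARS:
        value = env.get(name, "").strip()
        if value:
            return value
    raise MissingWriteUri(" or ".join(WRITE_URI_VARS) + " must be set")
```

Calling resolve_write_uri(os.environ) at module load surfaces the problem in deploy logs immediately; the request handler can then treat a failed write as an error rather than rendering the success page.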

Discovered while reading the Vercel env during Phase 8 secret population (Vercel had stored the key with an empty value). Confirmed via a direct Mongo query: there are no agent waitlist rows from Vercel in the ~2-week window.

Fix: rotate the prod Atlas app-write password (Atlas CLI), populate prod/mongodb/app-write in AWS Secrets Manager, push the same URI to the Vercel env, and redeploy Vercel. Verified post-fix: POST /api/waitlist against Vercel returns 200 with a real _id.

Lesson: a trailing literal \n on a secret value and an empty string in a required env var are one class of bug, and that class needs a pre-commit or CI guard. Adding such a validation step to a CI job is captured in the Phase 11 todo list.
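A minimal sketch of the kind of CI guard this lesson calls for, assuming the job can obtain the env vars to check as a name-to-value mapping (how it fetches them from Vercel or Secrets Manager is out of scope). It flags empty or missing values, a trailing literal backslash-n, and, as a general precaution, real trailing whitespace or newlines. The function name and the required-variable list are hypothetical.

```python
from collections.abc import Iterable, Mapping


def find_secret_problems(env: Mapping[str, str], required: Iterable[str]) -> list:
    """Return human-readable problems; an empty list means the env passed."""
    problems = []
    for name in required:
        value = env.get(name)
        if value is None or value == "":
            # The MONGODB_WRITE_URI="" case.
            problems.append(f"{name}: missing or empty")
            continue
        if value.endswith("\\n"):
            # Two characters, backslash then 'n' (the CMS_API_KEY / Resend case).
            problems.append(f"{name}: ends with a literal backslash-n")
        if value != value.rstrip():
            # Real trailing whitespace, including LF or CRLF.
            problems.append(f"{name}: has trailing whitespace or newline")
    return problems
```

A CI step would print the list and exit non-zero when it is not empty; reporting names only, never values, keeps secrets out of build logs.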

(2) Resend API key with a literal \n, and domain in "failed" status

During the AWS SES cutover smoke test, a failover to Resend was attempted (via EMAIL_PROVIDER=resend). The Resend API returned "API key is invalid". Hex dump of the end of the Vercel-stored value:

N   q   k   i   \   n   "   \n

That is a literal backslash-n (followed in the dump by the quote character and a real LF), the same bug class as the CMS_API_KEY episode. Stripping the literal \n (shell: "${V%\\n}") produces a valid key. A follow-up test with the stripped key returned "domain updates.askflorence.health is not verified", which is a separate issue.
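A minimal Python sketch of the same cleanup and of the byte-level check that distinguishes the two kinds of "\n", assuming the value has already been read without its surrounding quotes. Both function names are hypothetical; the strip step mirrors "${V%\\n}", which removes exactly one trailing backslash-n.

```python
def strip_trailing_literal_newline(value: str) -> str:
    r"""Python equivalent of the shell expansion "${V%\\n}", plus LF cleanup.

    1. Drop real trailing CR/LF characters first.
    2. Then remove ONE trailing literal backslash-n, as ${V%\\n} does.
    """
    value = value.rstrip("\r\n")
    if value.endswith("\\n"):
        value = value[:-2]
    return value


def tail_hex(value: str, n: int = 6) -> str:
    """Hex bytes of the last n characters: '5c 6e' is a literal backslash-n, '0a' a real LF."""
    return value[-n:].encode("utf-8").hex(" ")
```

Running tail_hex on the value as the application actually receives it answers the "literal or real newline?" question without relying on how a terminal renders the string.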

Resend dashboard state (queried via the API): updates.askflorence.health was added 2026-04-10 and is in status failed, and Cloudflare has no DKIM CNAMEs for Resend. Email sending on this Resend account has therefore been non-functional for ~2 weeks, independently of the Mongo bug.

Decision: AWS SES is the forward path, and Resend is retired in Phase 11, so there is no need to verify Resend DKIM now.

(3) Prod ECS task role had no S3 upload permission

POST /api/agents/discovery/upload on prod returned 400 "Only PDF, JPG, or PNG files are accepted" even with correct docType and blankConfirmed fields. Reading the upload route revealed:

  • It writes to the S3 bucket askflorence-data in the management account (778477254880), not in prod.
  • Vercel had cross-account access via static AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY env vars (IAM user credentials, to be retired in Phase 11 after cutover).
  • The prod ECS task role had only ses:SendEmail in its inline policy and no S3 grant, so the upload path was dead end-to-end.
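The 400 named the file type even though the problem the route revealed was a missing S3 grant. An upload handler that reports storage failures as validation errors hides exactly this kind of bug. A minimal sketch of keeping the two apart, assuming a handler shaped roughly like the one described: sniff the type from the file's leading magic bytes (PDF "%PDF-", JPEG FF D8 FF, PNG 89 50 4E 47 0D 0A 1A 0A), return 400 only for that, and surface storage errors as 5xx. The real route's validation and error handling are not shown in this log; handle_upload and put_object are hypothetical.

```python
from collections.abc import Callable
from typing import Optional

# Leading "magic bytes" for the three accepted formats.
MAGIC = {
    "application/pdf": b"%PDF-",
    "image/jpeg": b"\xff\xd8\xff",
    "image/png": b"\x89PNG\r\n\x1a\n",
}


def sniff_type(data: bytes) -> Optional[str]:
    """Return the MIME type implied by the leading bytes, or None."""
    for mime, magic in MAGIC.items():
        if data.startswith(magic):
            return mime
    return None


def handle_upload(data: bytes, key: str,
                  put_object: Callable[[str, bytes], None]) -> tuple:
    """Hypothetical handler returning (HTTP status, message or object key)."""
    if sniff_type(data) is None:
        # Client error: the only path allowed to blame the file type.
        return 400, "Only PDF, JPG, or PNG files are accepted"
    try:
        put_object(key, data)
    except Exception as exc:  # e.g. an AccessDenied error from the S3 client
        # Server-side failure (such as a missing IAM grant): report a 5xx
        # so monitoring sees it instead of it looking like bad input.
        return 500, f"storage write failed: {exc}"
    return 200, key
```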

Proper fix via Terraform (not IAM user keys):

  1. infra/envs/management/s3-askflorence-data.tf (new file) manages the bucket policy on askflorence-data. This is the first TF-owned aspect of that bucket; the bucket resource itself predates TF and stays unmanaged. The policy preserves the existing DenyNonSSLRequests statement and adds an explicit cross-account grant for the prod task role.
  2. infra/envs/prod/ecs.tf: the task role gains the S3AgentSurveyUploadsWrite inline policy, granting s3:PutObject on the same prefix, and the task def gains the env var S3_AGENT_SURVEY_BUCKET=askflorence-data.
  3. No KMS key policy change: the existing AllowOrgPrincipalsForTfstate statement on the mgmt CMK already covers requests made through S3 (kms:ViaService).
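A minimal sketch of what the S3AgentSurveyUploadsWrite inline policy could look like, as a Python dict mirroring the JSON that infra/envs/prod/ecs.tf would render. The policy name, action, bucket, and prefix come from this log; the exact document is an assumption. The helper is a rough prefix check, not an IAM policy evaluator. One caveat, hedged: if objects are encrypted with the management-account CMK, AWS requires cross-account KMS use to be allowed both in the key policy and in the caller's identity policy, so whether the real role policy also carries KMS permissions is worth confirming.

```python
from fnmatch import fnmatchcase

BUCKET_ARN = "arn:aws:s3:::askflorence-data"


def task_role_inline_policy() -> dict:
    """Sketch of S3AgentSurveyUploadsWrite (exact document is assumed)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"{BUCKET_ARN}/agent-survey-uploads/*",
            }
        ],
    }


def allows_put(policy: dict, object_key: str) -> bool:
    """Rough check: does any Allow statement cover s3:PutObject on this key?

    fnmatchcase's '*' matches across '/', like IAM's '*' in simple
    resource patterns such as the one above.
    """
    arn = f"{BUCKET_ARN}/{object_key}"
    return any(
        s["Effect"] == "Allow"
        and s["Action"] == "s3:PutObject"
        and fnmatchcase(arn, s["Resource"])
        for s in policy["Statement"]
    )
```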

Verified end-to-end: PDF upload → 200 + object present at askflorence-data/agent-survey-uploads/custom/1776993996441-0a767d98490801537e44789e-consent-template.pdf. GuardDuty Malware Protection scans the new object automatically.
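The verified object key appears to follow the shape agent-survey-uploads/&lt;docType&gt;/&lt;epoch-ms&gt;-&lt;24-hex token&gt;-&lt;original filename&gt;. That layout is inferred from this single example, not from the route's code, so the parser below is a hypothetical sketch. It is useful for confirming when an upload happened: the example's timestamp decodes to 2026-04-24T01:26:36.441Z, inside this session's window.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# INFERRED from one example key; the real route may build keys differently.
KEY_RE = re.compile(
    r"^agent-survey-uploads/"
    r"(?P<doc_type>[^/]+)/"
    r"(?P<epoch_ms>\d{13})-"
    r"(?P<token>[0-9a-f]{24})-"
    r"(?P<filename>.+)$"
)

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass(frozen=True)
class UploadKey:
    doc_type: str
    uploaded_at: datetime
    token: str
    filename: str


def parse_upload_key(key: str) -> UploadKey:
    m = KEY_RE.match(key)
    if m is None:
        raise ValueError(f"unrecognized upload key: {key!r}")
    # Integer milliseconds via timedelta avoid float rounding.
    uploaded_at = EPOCH + timedelta(milliseconds=int(m["epoch_ms"]))
    return UploadKey(m["doc_type"], uploaded_at, m["token"], m["filename"])
```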

Verification (Phase 9 + Phase 10 combined)

Pre-cutover (Phase 9 gate):

  • HTTP parity probe: 60/60 PASS across 20 stratified scenarios × 3 endpoints (/api/counties, /api/eligibility, /api/plans). Scenarios are stratified across federal-marketplace states (TX, FL, OH, GA, NC, AZ, PA, and UT, including Utah's state-specific age curve) plus the SBE state NY (Manhattan, Rochester, Syracuse), household sizes 1-4, and incomes spanning CSR-94 through no-CSR ranges.
  • Prod canary POST /api/waitlist → 200 with a real Mongo _id via VPC peering.
  • Direct ALB smoke test from CI runners via origin.askflorence.health, which bypasses the WAF's false-positive block on GitHub IP ranges.
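A minimal sketch of a parity probe's core loop: send each scenario to every endpoint on both origins and require identical parsed JSON. The fetch function is injected so the comparison logic can be shown without network access; the payload shape, the choice to compare whole bodies, and all names here are assumptions, not the probe that produced the 60/60 result.

```python
from collections.abc import Callable, Iterable, Sequence

# (origin, endpoint, payload) -> parsed JSON body. Injected; hypothetical.
Fetch = Callable[[str, str, dict], object]

ENDPOINTS = ("/api/counties", "/api/eligibility", "/api/plans")


def parity_probe(
    fetch: Fetch,
    baseline: str,
    candidate: str,
    scenarios: Sequence[dict],
    endpoints: Iterable[str] = ENDPOINTS,
) -> tuple:
    """Return (passed, total, failure descriptions)."""
    endpoints = tuple(endpoints)
    failures = []
    total = 0
    for i, scenario in enumerate(scenarios):
        for ep in endpoints:
            total += 1
            a = fetch(baseline, ep, scenario)
            b = fetch(candidate, ep, scenario)
            # Parsed JSON compares structurally, so key order does not matter.
            if a != b:
                failures.append(f"scenario {i} {ep}: responses differ")
    return total - len(failures), total, failures
```

Comparing whole bodies is strict by design; a real probe may need to drop fields that legitimately differ between origins (timestamps, request IDs) before comparing.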

Post-cutover:

  • Every public route (/, /plans, /agents, /agent-onboarding, /agent-discovery, /updates, /privacy, /terms) → 200.
  • POST /api/eligibility with the correct nested request shape → 200 with real APTC and CSR values (APTC = $425, CSR = 73% AV Silver for a single 35-year-old with $35k income in Dallas, TX).
  • POST /api/plans with the same shape → 200 with 100 plans and full cost-share data.
  • POST /api/waitlist consumer + agent variants → 200 with real Mongo writes.
  • POST /api/agents/discovery/upload with valid PDF → 200 with real S3 object key.
  • WAF SQLi probe → 403 blocked.
  • Response headers clean: server: AskFlorence, HSTS, CSP, X-Frame-Options DENY. No trace of Vercel or Cloudflare proxy.
  • ECS: 2 HA tasks, rollout COMPLETED, task def :7, zero 5xx responses over a 10-minute window.
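As a sanity check on the eligibility result, the CSR tier follows from household income as a percentage of the federal poverty level (FPL). A minimal sketch, assuming the 2025 HHS poverty guidelines for the 48 contiguous states (used for 2026 coverage: $15,650 for one person plus $5,500 per additional person) and the statutory CSR bands (94% AV up to 150% FPL, 87% up to 200%, 73% up to 250%, standard 70% silver above that). It does not reproduce the $425 APTC, which also depends on the benchmark silver premium and the applicable-percentage table.

```python
from typing import Optional

# ASSUMPTION: 2025 HHS poverty guidelines (48 contiguous states + DC),
# which apply to 2026 marketplace coverage.
FPL_BASE = 15_650       # household of 1
FPL_PER_PERSON = 5_500  # each additional person


def fpl_percent(income: float, household_size: int) -> float:
    guideline = FPL_BASE + FPL_PER_PERSON * (household_size - 1)
    return 100.0 * income / guideline


def csr_variant(fpl_pct: float) -> Optional[int]:
    """Silver-plan actuarial value (%) for this income, or None below 100% FPL."""
    if fpl_pct < 100:
        return None  # simplification: ignores the lawfully-present exception
    if fpl_pct <= 150:
        return 94
    if fpl_pct <= 200:
        return 87
    if fpl_pct <= 250:
        return 73
    return 70  # standard silver, no CSR
```

For the smoke scenario, $35,000 for a household of one is about 223.6% FPL, which lands in the 73% AV band reported by /api/eligibility.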

Next

  • T+48h: remove 0.0.0.0/0 from prod Atlas IP access list (closes Vercel's reach into prod Atlas; Vercel keeps running without DB access as a pure DNS-level rollback target).
  • T+48h: archive the Vercel project (don't delete; keep it for reference).
  • T+48h: reset the Cloudflare TTL from 300s back to Auto.
  • SES production-access approval: expected 24-72h from reply submission. Once granted, all email sends resume from SES without any sandbox recipient limitations.
  • Phase 11 hardening: retire the Resend account, finalize the PostHog self-host vs. replace decision, activate Drata read-only IAM, schedule the first pen test, and add the secret-validation CI guard to catch future literal-\n bugs.
  • Phase 12 compliance docs: SOC 2 + HIPAA + EDE control-mapping.

AskFlorence Internal Documentation. Not for public distribution.
