Session log — 2026-04-22 / 2026-04-23 UTC — Phase 8 prod AWS mirror

Scope

Build the entire AWS prod stack in askflorence-prod (039624954211), mirroring staging 1:1 with prod-scoped values + HA. Get the current Next.js app (consumer marketplace, agent flows, APIs) running behind a CloudFront + WAF edge on a private canary URL (prod-canary.askflorence.health) for end-to-end validation before Phase 10 cutover. Vercel prod (askflorence.health + www) continues to serve real user traffic throughout. The apex does not move in this phase.

Actor

  • Human: Taha Abbasi.
  • Agent: Claude Opus 4.7 (1M context), running in Claude Code CLI.

Tickets

  • Advances Issue #47 Phase 8.
  • Identifies a pre-existing Vercel prod bug worth a follow-up: MONGODB_WRITE_URI on Vercel prod is an empty string, so writes from Vercel have been failing for ~6 days. Taha sets the new URI on Vercel post-session to restore Vercel writes until cutover.

External systems touched

AWS (prod account 039624954211)

  • Network module applied — VPC 10.20.0.0/16, 2 AZs (us-east-1a/b), 2 NAT gateways (HA vs staging's 1), 6 VPC endpoints multi-AZ (kms, secretsmanager, bedrock-runtime interface + S3 gateway + ECR api/dkr), 90-day flow-log retention. Disjoint from staging (10.40.0.0/16) and the future log-archive CIDR for eventual org-wide peering.
  • KMS — new CMK alias/askflorence-prod, rotation on, 30-day deletion window. Same service-principal grants as staging (Secrets Manager + CloudWatch Logs).
  • Secrets Manager — 15 prod shells under prod/*. Populated during the session:
    • prod/mongodb/app-read — from Vercel prod env (the app-read user's SRV URI).
    • prod/mongodb/app-write — freshly generated URI using a rotated app-write password (via atlas dbusers update). Safe rotation because Vercel's MONGODB_WRITE_URI was empty, so nothing on Vercel currently used app-write credentials.
    • prod/mongodb/{waitlist,survey,agents}-write + prod/mongodb/agents-admin — stopgap-populated with the same app-write URI until a follow-up #56 prod session creates narrow-scoped users on the prod Atlas project.
    • prod/mongodb/audit-read — populated with the app-read URI.
    • prod/cms-api-key + prod/resend-api-key + prod/unsubscribe-token-secret + prod/posthog-key — copied from Vercel prod env.
    • Florence / Bedrock / Whisper shells left as PLACEHOLDER.
  • ACM cert — askflorence.health + www.askflorence.health + *.askflorence.health in us-east-1 (required for CloudFront). DNS validation via 2 CNAMEs Taha added at Cloudflare. Status ISSUED in ~3 min after records landed.
  • SES identity — updates.askflorence.health verified (6 records added at Cloudflare by Taha: 3 DKIM CNAMEs + MX for MAIL FROM + SPF TXT + DMARC TXT p=quarantine). DKIM SUCCESS, MAIL FROM SUCCESS, VerifiedForSending true. SES account still in sandbox (production-access request filed separately).
  • ECR — askflorence-app repo with immutable tags (prod-strict — each tag can only be written once, no :latest drift), scan-on-push, 50-image lifecycle retention, CMK-encrypted.
  • ECS — cluster askflorence-prod, task definition family askflorence-prod-app-task (0.5 vCPU / 1 GB per task), service askflorence-prod-app desired 2 (HA across AZs), SES-send inline policy on task role, 90-day CloudWatch Logs retention.
  • ALB — askflorence-prod-alb-1177205004.us-east-1.elb.amazonaws.com. HTTPS listener with the prod cert, HTTP→HTTPS redirect. Deletion protection ON (prevents accidental terraform destroy from nuking the hostname CloudFront points at).
  • CloudFront distribution E9RU8LOGSYL9I (d1pnfyzua893hx.cloudfront.net). Serves 3 aliases: askflorence.health, www.askflorence.health, prod-canary.askflorence.health. PriceClass_All, HTTP/2+HTTP/3, TLSv1.2_2021, the same WAFv2 rule set used on staging (5 managed groups + rate rule 2000 req/5min/IP). Same response-headers policy — HSTS + CSP + X-Frame-Options DENY + Server override to AskFlorence.
  • Atlas prod peering pcx-0cefe999865679045 — Atlas-initiated, accepted on AWS, AllowDnsResolutionFromRemoteVpc=true on accepter side, routes added in both prod private route tables (192.168.248.0/21 → pcx). Atlas IP access list adds 10.20.0.0/16. Legacy 0.0.0.0/0 entry tagged dev stays in place until Phase 10 cutover (Vercel still needs reachability).
  • deploy-prod.yml GitHub Actions workflow — manual workflow_dispatch trigger (GitHub Team plan doesn't support required-reviewers on private-repo environments; workflow_dispatch is the approval surrogate). OIDC federation to arn:aws:iam::039624954211:role/GitHubActionsDeployRole. Smokes origin.askflorence.health/api/health (direct ALB, not CloudFront) because WAF managed rule groups false-positive-block GitHub Actions runner IPs on the prod-canary.* path.
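The address ranges above have to stay disjoint for the peering routes to work. A minimal sketch of that check, using only the CIDRs quoted in this log (the dictionary labels are illustrative names, not resource identifiers):

```python
import ipaddress
from itertools import combinations

# CIDRs quoted in this log; the keys are illustrative labels only.
CIDRS = {
    "prod-vpc": ipaddress.ip_network("10.20.0.0/16"),
    "staging-vpc": ipaddress.ip_network("10.40.0.0/16"),
    "atlas-prod": ipaddress.ip_network("192.168.248.0/21"),
}


def overlapping_pairs(cidrs):
    """Return sorted (name_a, name_b) pairs whose address ranges overlap."""
    return sorted(
        (a, b)
        for (a, net_a), (b, net_b) in combinations(cidrs.items(), 2)
        if net_a.overlaps(net_b)
    )


if __name__ == "__main__":
    # An empty list means no two ranges collide.
    print(overlapping_pairs(CIDRS))
```

The same helper could be rerun with the future log-archive CIDR added once that range is decided.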

Cloudflare (zone askflorence.health)

DNS records added manually by Taha during the session. All DNS-only (proxy OFF):

  • 2 × CNAME for ACM validation (_<hex>.askflorence.health, _<hex>.www.askflorence.health)
  • 3 × CNAME for SES DKIM (<token>._domainkey.updates)
  • 1 × MX + 1 × TXT for SES MAIL FROM (mail.updates)
  • 1 × TXT for DMARC (_dmarc.updates)
  • 1 × CNAME origin.askflorence.health → ALB DNS (CloudFront origin handshake target)
  • 1 × CNAME prod-canary.askflorence.health → CloudFront d1pnfyzua893hx.cloudfront.net (private canary URL for validation)

Apex + www CNAMEs stay pointed at Vercel through this phase. Phase 10 is the swap.

MongoDB Atlas (prod project 69dc20c64005b222804dafa4)

  • Peering connection from prod project to the new prod AWS VPC. Added project-level route to 10.20.0.0/16 in prod VPC.
  • app-write user password rotated via atlas dbusers update. Because Vercel's MONGODB_WRITE_URI is empty (pre-existing Vercel bug), nothing consumer-facing was using the old password, so the rotation is a no-op from Vercel's perspective.
  • IP access list — added 10.20.0.0/16. Kept the legacy 0.0.0.0/0 entry (tag dev) because Vercel still serves real traffic and uses unpredictable egress IPs.
  • Prod cluster data — untouched.
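The effect of keeping the legacy 0.0.0.0/0 entry can be illustrated with a simplified model of an IP access list. This is a sketch, not Atlas's actual evaluation logic; the two list constants and the sample addresses are assumptions chosen for illustration (203.0.113.7 is a documentation-range address standing in for an arbitrary Vercel egress IP).

```python
import ipaddress


def allowed(source_ip, access_list):
    """True if source_ip falls inside any CIDR entry of the access list."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in access_list)


# Access list as described for this phase, and as planned once the legacy entry is removed.
PHASE_8 = ["10.20.0.0/16", "0.0.0.0/0"]
POST_PHASE_10 = ["10.20.0.0/16"]

for label, acl in (("phase 8", PHASE_8), ("post phase 10", POST_PHASE_10)):
    # First value: a host inside the prod VPC. Second value: an arbitrary public address.
    print(label, allowed("10.20.1.15", acl), allowed("203.0.113.7", acl))
```

Under this model the prod VPC stays reachable in both states, while arbitrary public addresses lose access only after the legacy entry is pulled.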

Vercel

  • Not touched. Apex DNS unchanged, env vars unchanged. Vercel continues serving real production traffic through this phase.

GitHub

  • Repo variables NEXT_PUBLIC_POSTHOG_PROJECT_TOKEN + NEXT_PUBLIC_POSTHOG_HOST already set during Phase 5.5 — the prod workflow reuses them.
  • No production environment protection rule created — GitHub Team plan limits required-reviewers to public repos. workflow_dispatch is the approval gate instead.

What shipped (chronological)

  1. Terraform scaffolding applied in prod. Phase 3 had already planted versions.tf, providers.tf, github-oidc.tf for the prod env — they came up clean on terraform plan. Added network.tf, kms.tf.
  2. terraform apply 1 — 28 resources: VPC + subnets + NAT HA + 6 VPC endpoints + KMS CMK + flow logs + IGW + RTs.
  3. terraform apply 2 — 31 resources: 15 Secrets Manager shells + ACM cert (request only; validation pending DNS). Paused for Taha to add ACM validation CNAMEs at Cloudflare.
  4. SES identity applied alongside ACM. Paused again for Taha to add the 6 SES records at Cloudflare (3 DKIM CNAMEs + MX + 2 TXT).
  5. Polled ACM + SES status until all four signals green (ACM ISSUED, DKIM SUCCESS, MAIL FROM SUCCESS, VerifiedForSending true). ~5 min total propagation after DNS.
  6. terraform apply 3 — 24 resources: ECR + ECS cluster/task-def/service (desired=0) + ALB + CloudFront distribution (slow first-create) + WAFv2 web ACL + response-headers policy + CloudWatch log groups.
  7. Secrets populated with values pulled via vercel env pull. Discovered MONGODB_WRITE_URI="" on Vercel; rotated the app-write password via Atlas CLI, populated prod/mongodb/app-write, and applied the same URI to the 4 narrow-scoped secrets (waitlist-write, survey-write, agents-write, agents-admin) as a stopgap until the #56 prod session.
  8. Atlas prod peering handshake via atlas networking peering create aws + aws ec2 accept-vpc-peering-connection. Routes added in both private RTs, allowlist entry added. Terraform-imported the accepter + routes into peering.tf.
  9. deploy-prod.yml workflow written and pushed to main. First invocation ran, image built, ECS deployed, tasks came up, smoke step blocked by WAF on the runner IP (false-positive from AnonymousIpList/AmazonIpReputationList). Fix: switched smoke target from prod-canary.* (CloudFront + WAF) to origin.* (direct ALB). Second invocation failed on ECR immutable-tag rejection of :latest. Fix: stopped pushing :latest on prod + switched buildx cache from inline-in-ECR to GHA cache backend. Third invocation fully green.
  10. Full canary validation — all endpoints serve correctly on prod-canary.askflorence.health with parity against Vercel responses; Mongo write over the peered network path succeeds with a real waitlist_submission_id; WAF blocks SQLi probes; CloudFront security headers + server override all correct.
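The polling in step 5 can be sketched as a small loop that waits for all four signals. This is a sketch under assumptions: the status keys and the fetch callable are hypothetical (the log does not record how status was queried), and a real fetch would wrap whatever ACM and SES status calls the operator prefers.

```python
import time

# The four signals named in step 5 and the value each must reach.
# The key names are hypothetical; only the target values come from the log.
TARGETS = {
    "acm_status": "ISSUED",
    "dkim_status": "SUCCESS",
    "mail_from_status": "SUCCESS",
    "verified_for_sending": True,
}


def pending(status):
    """Names of signals that have not reached their target value yet."""
    return [name for name, want in TARGETS.items() if status.get(name) != want]


def wait_until_green(fetch, interval_s=15.0, timeout_s=900.0, sleep=time.sleep):
    """Poll fetch() until every signal is green; return the number of polls made.

    Raises TimeoutError if signals are still pending once timeout_s has elapsed.
    """
    deadline = time.monotonic() + timeout_s
    polls = 0
    while True:
        polls += 1
        waiting = pending(fetch())
        if not waiting:
            return polls
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still pending: {waiting}")
        sleep(interval_s)
```

Passing sleep as a parameter keeps the loop testable without real delays; the interval and timeout defaults are placeholders, not values used in the session.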

Two gotchas worth preserving

(1) workflow_dispatch as approval gate on private Team-plan repos. The original plan had push: branches: [main] + a production GitHub environment with required-reviewers. Confirmed GitHub Team does not support required-reviewers on environments attached to private repos — that's an Enterprise ($21/user) feature. workflow_dispatch gives us the same "nothing deploys without Taha clicking a button" guarantee without a plan upgrade or making the repo public. Any Vercel-era "release on merge" habit doesn't apply.

(2) Immutable ECR tags + inline buildx cache are incompatible. --cache-to type=inline embeds the cache metadata into the image itself, which means re-pushing the same tag. Fine with staging's immutable_tags=false. On prod with immutable_tags=true, every cached rebuild attempts a tag rewrite and gets rejected. The resolution: ditch :latest entirely on prod (task defs pin :<sha>, so no one consumes :latest) and use GitHub Actions' layer cache (type=gha) instead. Side benefit: GHA cache is repository-scoped, so cached layers are not shared outside the repo.
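The second gotcha can be reproduced with a toy in-memory model. This is not the ECR API; the class, the function, and the tag and digest values are all hypothetical, and the model makes the simplifying assumption that an immutable repository rejects any push to an existing tag.

```python
class TagImmutableError(Exception):
    """Raised when a push targets a tag that already exists in an immutable repo."""


class ToyRegistry:
    """In-memory stand-in for an image repository. Not the ECR API."""

    def __init__(self, immutable_tags):
        self.immutable_tags = immutable_tags
        self.tags = {}  # tag -> digest of the image it points at

    def push(self, tag, digest):
        # Simplifying assumption: with immutability on, any push to an existing tag fails.
        if self.immutable_tags and tag in self.tags:
            raise TagImmutableError(f"tag {tag!r} already exists")
        self.tags[tag] = digest


def build_and_push(repo, commit_tag, digest, also_push_latest):
    """Mimic one deploy: always push the per-commit tag, optionally re-push :latest."""
    repo.push(commit_tag, digest)
    if also_push_latest:
        repo.push("latest", digest)


prod = ToyRegistry(immutable_tags=True)
build_and_push(prod, "commit-1", "sha256:aaa", also_push_latest=True)  # first :latest push is accepted
try:
    build_and_push(prod, "commit-2", "sha256:bbb", also_push_latest=True)
except TagImmutableError as err:
    print("rejected:", err)
build_and_push(prod, "commit-3", "sha256:ccc", also_push_latest=False)  # per-commit tags only
```

In this model the first deploy succeeds and every later one fails on :latest, which is why dropping :latest and keeping only per-commit tags resolves the conflict.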

Verification

From operator laptop, direct public internet, against https://prod-canary.askflorence.health (via Cloudflare CNAME → CloudFront edge → origin.askflorence.health CNAME → ALB → ECS):

  • GET /api/health → 200 {"status":"ok","commit":"a189041…","env":"prod"}
  • GET /api/counties?state=TX&zip=75001 → 200 identical JSON to Vercel prod (CMS proxy)
  • GET /api/counties?state=NY&zip=10001 → 200 identical JSON to Vercel prod (owned-data path)
  • POST /api/waitlist → 200 with waitlist_submission_id (Mongo write via peering — NAT never touched)
  • Response headers from CloudFront: server: AskFlorence, strict-transport-security: max-age=31536000; includeSubDomains; preload, x-frame-options: DENY, content-security-policy: …
  • GET /?id=1' OR '1'='1 → 403 blocked by WAF SQLiRuleSet
  • ECS state: desired 2, running 2, rollout COMPLETED, task def revision :4 (after the narrow-user secret populate + force-new-deployment)
  • CloudWatch aws-waf-logs-askflorence-prod-web-acl receiving WAF logs
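The response-header checks above lend themselves to a small repeatable assertion. A minimal sketch, assuming the expected values are exactly those quoted in the list; the function names are hypothetical, and the content-security-policy value is elided in this log, so only its presence is checked.

```python
import urllib.request

# Expected values copied from the verification list above.
EXPECTED_HEADERS = {
    "server": "AskFlorence",
    "strict-transport-security": "max-age=31536000; includeSubDomains; preload",
    "x-frame-options": "DENY",
}
# The CSP value is elided in the log, so only presence is checked.
REQUIRED_PRESENT = ("content-security-policy",)


def header_problems(headers):
    """Return a list of mismatches; an empty list means the headers match."""
    got = {name.lower(): value for name, value in headers.items()}
    problems = []
    for name, want in EXPECTED_HEADERS.items():
        if got.get(name) != want:
            problems.append(f"{name}: expected {want!r}, got {got.get(name)!r}")
    for name in REQUIRED_PRESENT:
        if not got.get(name):
            problems.append(f"{name}: missing")
    return problems


def fetch_headers(url, timeout=10):
    """Fetch response headers for url as a plain dict (performs a real HTTP request)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return dict(resp.headers.items())
```

A run against the canary would look like header_problems(fetch_headers("https://prod-canary.askflorence.health/api/health")), with any non-empty result treated as a failed check.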

The SES send path on the /api/waitlist flow returned the expected sandbox rejection: the recipient taha@askflorence.health is verified in the staging SES account, not prod. Non-blocker: the app returns 200 because sendEmail is fire-and-forget. The code path is exercised and will work after SES production-access approval + prod-side sandbox recipient verification.

What this session does NOT do

  • Does not move production DNS. Cloudflare apex still points at Vercel.
  • Does not retire Vercel. Vercel continues to serve real users exactly as before.
  • Does not populate prod secrets that Florence + Bedrock + Whisper will eventually use — those shells stay as PLACEHOLDER until the relevant workloads ship.
  • Does not provision narrow-scoped prod Atlas users. Stopgap points the app_writer_* secrets at the broad app-write URI. The proper scoped users land in a follow-up #56 prod session.
  • Does not remove the legacy 0.0.0.0/0 entry from the prod Atlas allowlist. Removing that would break Vercel right now. Phase 10 cutover is where it comes out.
  • Does not request SES production access from the prod account. Separate manual request; not blocking because nothing real sends email from the prod AWS stack yet.

Next

  • Phase 9 — canary bake. Real-ish synthetic traffic against prod-canary.askflorence.health for 48 h. Full audit tier 1-5 parity run. GuardDuty + Security Hub clean. Nothing in Phase 9 touches apex DNS.
  • Phase 10: flip the Cloudflare records for askflorence.health + www from Vercel to d1pnfyzua893hx.cloudfront.net. After 48 h of clean post-cutover operation: pull 0.0.0.0/0 from the prod Atlas allowlist, retire Vercel prod.
  • Phase 11 — post-cutover hardening. Resend retirement, PostHog self-host/replace decision, Drata read-only role activation, annual pen test vendor selection.
  • Phase 12 — SOC 2 + HIPAA + EDE control mapping docs closed out against the operating state established from Phase 2 onward.
AskFlorence Internal Documentation. Not for public distribution.
