
ADR 0004: Cross-cluster Atlas reads from prod via AWS PrivateLink

Status

Accepted — 2026-05-08.

Context

The doctor + Rx coverage flow on prod (askflorence.health) needs to read 2.14M NPI provider docs (providers_staging) and 12,557 RxCUIs (~30M drug-plan tuples, formularies_staging). All of it is public CMS marketplace data: non-PHI by classification and sourced from the §1311 MRF ingest pipeline.

The canonical copy of this data lives on the staging Atlas cluster (askflorence-staging, project_id 69e31af12fd2c0aef51bbb41, M30 tier, ~$382/mo). The §1311 ingest writes it there, the staging app reads it there, and the non-prod surface for the YC demo URL is exercised there.

To make doctor + Rx work on prod, we considered three paths:

  • Path A: duplicate the data onto the prod cluster. This forces the prod cluster up from M10 HIPAA ($56/mo) to M30 ($382/mo) to handle the collection size and index footprint. Total Atlas cost rises from $438/mo to $764/mo (+$326/mo recurring). It gives the cleanest audit boundary but is expensive. It also adds public reference data to the prod cluster's PHI audit surface, which arguably makes the future EDE Phase 3 audit harder rather than easier.

  • Path B: VPC peering between the prod VPC and the staging Atlas project. It is free and AWS-backbone-only. However, it is blocked by a CIDR conflict that cannot be fixed in place:
    • Both Atlas projects use the default 192.168.248.0/21 for their network containers.
    • Atlas does not allow changing CIDRs on existing projects.
    • Re-creating either project means a data migration plus multi-day operational disruption.

  • Path B1: AWS PrivateLink. An AWS Interface VPC Endpoint in the prod VPC targets a PrivateLink endpoint service that Atlas exposes for the staging project.
    • Traffic stays on the AWS backbone at the network layer and is TLS-encrypted at the application layer (doubly protected).
    • Access is identity-bound at the AWS account level.
    • It does not rely on route-table CIDRs. The endpoint is an ENI in our subnets and traffic flows through it directly, with no peering, transit gateway, or route-table involvement.
    • This is the documented pattern Atlas and AWS designed for cross-Atlas-project access where peering does not fit.
    • Cost is ~$7-10/mo for the Interface endpoint plus negligible AWS data egress.
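Path B's blocker can be checked mechanically. The conflict description implies that the prod VPC already routes 192.168.248.0/21 to the prod Atlas project; this is an assumption here. Under it, peering with a second container that uses the same block would need a duplicate route, and a VPC route table cannot hold two routes with the same destination. The sketch below expands a CIDR block to its address range and tests two blocks for overlap. The helper names are illustrative and are not from the AskFlorence codebase.

```typescript
// Illustrative helpers (hypothetical names): expand IPv4 CIDR blocks and test overlap.
// Arithmetic is used instead of bitwise ops to avoid JavaScript's signed 32-bit shifts.

// Parse "a.b.c.d/len" into an inclusive [start, end] range of unsigned 32-bit addresses.
function cidrRange(cidr: string): [number, number] {
  const [addr, lenStr] = cidr.split("/");
  const len = Number(lenStr);
  const octets = addr.split(".").map(Number);
  const badOctet = octets.some((o) => !Number.isInteger(o) || o < 0 || o > 255);
  if (octets.length !== 4 || badOctet || !Number.isInteger(len) || len < 0 || len > 32) {
    throw new Error(`invalid CIDR: ${cidr}`);
  }
  const ip = octets.reduce((acc, o) => acc * 256 + o, 0);
  const size = 2 ** (32 - len);
  const start = Math.floor(ip / size) * size; // clear the host bits
  return [start, start + size - 1];
}

// Two blocks overlap when their address ranges intersect.
function cidrsOverlap(a: string, b: string): boolean {
  const [aStart, aEnd] = cidrRange(a);
  const [bStart, bEnd] = cidrRange(b);
  return aStart <= bEnd && bStart <= aEnd;
}

// Render an unsigned 32-bit address in dotted-quad form.
function formatIp(n: number): string {
  return [24, 16, 8, 0].map((shift) => Math.floor(n / 2 ** shift) % 256).join(".");
}

const atlasDefault = "192.168.248.0/21";
const [lo, hi] = cidrRange(atlasDefault);
console.log(`${formatIp(lo)} - ${formatIp(hi)}`); // 192.168.248.0 - 192.168.255.255
console.log(cidrsOverlap(atlasDefault, atlasDefault)); // true
```

PrivateLink avoids this check entirely. The interface endpoint is an ENI addressed from the prod subnets, so no route for the Atlas container CIDR is ever added.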

The three paths are analyzed under "Cross-cluster reference reads via AWS PrivateLink" in docs/decisions/2026-05-03-pivot-cms-api-direct.md and in the decision-matrix walkthrough on #101.

Decision

The prod VPC reads non-PHI public CMS reference data from the staging Atlas cluster over AWS PrivateLink. Concretely:

  • An AWS Interface VPC Endpoint, vpce-0c81aea11e29bb928, sits in prod VPC vpc-09201679b87261b6d and is deployed multi-AZ across the prod private subnets.
  • The endpoint targets the Atlas-issued endpoint service com.amazonaws.vpce.us-east-1.vpce-svc-0d8138ea0f6542afa (Atlas endpointId 69fe75c5b02c024f32d2af50).
  • Connections authenticate as the read-only app_read_staging user on the askflorence database.
  • The connection string is stored in AWS Secrets Manager (prod/mongodb/reference-uri), encrypted with the project CMK.
  • The application layer uses a distinct connection pool (getReferenceDb() in src/lib/db.ts) configured through the MONGODB_REFERENCE_URI env var. When that variable is unset, it falls back to MONGODB_URI, so dev and staging keep working without code changes.
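The routing rule in the last bullet can be sketched as follows. This is not the actual src/lib/db.ts. The MongoDB driver is replaced by an injected connect callback so the example runs without a database. The Pool type, getDb, requireUri, and makeDbAccessors are hypothetical names, and treating an empty MONGODB_REFERENCE_URI as unset is an assumption.

```typescript
// Hypothetical sketch of the pool routing behind getReferenceDb(); not the real
// src/lib/db.ts. The MongoDB driver is replaced by an injected `connect` callback.

type Env = Record<string, string | undefined>;

// Stand-in for a driver connection pool (hypothetical).
interface Pool {
  uri: string;
}

function requireUri(env: Env, name: string): string {
  const value = env[name]?.trim();
  if (!value) throw new Error(`${name} is not set`);
  return value;
}

// Prefer MONGODB_REFERENCE_URI and fall back to MONGODB_URI when it is unset.
// Treating an empty string as unset is an assumption of this sketch.
function resolveReferenceUri(env: Env): string {
  const ref = env.MONGODB_REFERENCE_URI?.trim();
  return ref ? ref : requireUri(env, "MONGODB_URI");
}

// Each accessor lazily opens its own pool once and reuses it afterwards.
function makeDbAccessors(env: Env, connect: (uri: string) => Pool) {
  let primaryPool: Pool | undefined;
  let referencePool: Pool | undefined;
  return {
    // Primary database (the PHI-bearing prod cluster in prod).
    getDb: (): Pool => (primaryPool ??= connect(requireUri(env, "MONGODB_URI"))),
    // Reference data; in prod this URI points at the PrivateLink endpoint.
    getReferenceDb: (): Pool => (referencePool ??= connect(resolveReferenceUri(env))),
  };
}

const connectCalls: string[] = [];
const fakeConnect = (uri: string): Pool => {
  connectCalls.push(uri);
  return { uri };
};

// Placeholder URIs, not real hosts.
const prod = makeDbAccessors(
  { MONGODB_URI: "mongodb+srv://prod.example.invalid", MONGODB_REFERENCE_URI: "mongodb://reference.example.invalid" },
  fakeConnect,
);
console.log(prod.getReferenceDb().uri); // mongodb://reference.example.invalid
console.log(prod.getDb().uri); // mongodb+srv://prod.example.invalid

// Dev/staging: reference reads fall back to MONGODB_URI.
console.log(resolveReferenceUri({ MONGODB_URI: "mongodb://localhost:27017" })); // mongodb://localhost:27017
```

This sketch opens a separate pool even in the fallback case. Whether the real module reuses the primary pool when MONGODB_REFERENCE_URI is unset is not stated in this ADR.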

The prod cluster (askflorence-prod-01, M10 HIPAA) remains the only PHI processor. The staging cluster remains the only home for formularies_staging + providers_staging (and the §1311 ingest pipeline that writes them).

Consequences

Positive:

  • Saves ~$326/mo recurring vs duplicating data onto a prod M30 cluster.
  • The EDE Phase 3 audit boundary stays clean. PHI lives only on the prod cluster, and non-PHI public reference data lives only on staging. The cross-cluster path is read-only, AWS-backbone-only, and easy to point at in an audit ("identity-bound, no public IP, doubly-protected encryption").
  • Avoids the CIDR conflict that blocks Path B without requiring a project re-creation.
  • §1311 delta-aware MRF refresh (#98) gets a clean architectural target: refresh runs in the staging AWS account, prod picks up the refresh automatically via PrivateLink with no prod-side cron, no double-ingest, no cluster cutover.
  • PrivateLink is the documented pattern Atlas + AWS designed for this case — it survives the EDE Phase 3 cutover to FedRAMP-authorized Atlas Government with the same architecture.
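The ~$326/mo figure is the gross difference in Atlas spend between Path A and the current layout. The arithmetic below reproduces it from the tier prices quoted under Path A, then nets out the ~$7-10/mo PrivateLink endpoint estimate. All prices are this ADR's figures, not live Atlas or AWS pricing, and data egress is treated as zero.

```typescript
// Monthly prices as quoted in this ADR (not live pricing).
const ATLAS_MONTHLY_USD = { M10_HIPAA: 56, M30: 382 } as const;
// Path B1 Interface endpoint estimate as quoted; egress treated as negligible.
const PRIVATELINK_MONTHLY_USD = { low: 7, high: 10 } as const;

// Current layout: prod M10 HIPAA + staging M30.
const baseline = ATLAS_MONTHLY_USD.M10_HIPAA + ATLAS_MONTHLY_USD.M30;
// Path A: prod upsized to M30 + staging M30.
const pathA = ATLAS_MONTHLY_USD.M30 + ATLAS_MONTHLY_USD.M30;
const pathAIncrease = pathA - baseline;

// Recurring saving of Path B1 over Path A after paying for the endpoint.
const netSaving = {
  low: pathAIncrease - PRIVATELINK_MONTHLY_USD.high,
  high: pathAIncrease - PRIVATELINK_MONTHLY_USD.low,
};

console.log(baseline, pathA, pathAIncrease); // 438 764 326
console.log(netSaving); // { low: 316, high: 319 }
```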

Accepted costs:

  • Two Atlas projects must stay configured + monitored together for the data layer to function. Cross-cluster posture is now load-bearing for the doctor + Rx feature.
  • The staging cluster's IP allowlist remains permissive during pre-launch, because Taha's laptop and CI runners need IP-based access for ingest. Hardening it to PrivateLink-only is deferred until after launch, when ingest jobs move to ECS Fargate in the staging VPC. Tracked in #71.
  • If a future writer ever puts PHI on the staging cluster, the PrivateLink "non-PHI cross-cluster read" architectural claim would silently break. This drift risk is mitigated by the CI guard at #100, which ships in two complementary phases:
    • Phase 1 shipped 2026-05-08 — static check at scripts/audit/staging-collections-guard.ts runs on every PR via .github/workflows/staging-collections-guard.yml. Fails the build if any getReferenceDb() call is made against a collection not on STAGING_ALLOWED_COLLECTIONS in src/lib/db.ts. Catches string-literal, dynamic-name, and inline-call patterns. Verified on synthetic violations 2026-05-08.
    • Phase 2 shipped 2026-05-09 — live nightly check at scripts/audit/staging-cluster-drift.ts runs at 08:00 UTC daily via .github/workflows/staging-cluster-drift.yml (cron + manual dispatch). Audits the actual Atlas state of app_read_staging: verifies the user has exactly one role (role_reader_reference@admin) granting only FIND action on exactly askflorence.formularies_staging + askflorence.providers_staging and nothing else (no extra roles, no inheritedRoles, no wider actions, no extra collections, no DB-wide grants). Opens a P1 GitHub issue on drift. Catches the runtime cases the static guard cannot — privilege escalation via Atlas Admin UI, out-of-band role changes, etc. Verified 2026-05-09 against three synthetic violations (extra collection grant / wider action / extra role on user) — all caught with correct violation reports.
    • As part of Phase 2, app_read_staging's role was tightened from built-in read@askflorence (whole-DB scope) to custom role_reader_reference@admin (per-collection FIND on the 2 collections actually consumed: formularies_staging + providers_staging). This is the "future audit requires per-collection scoping" follow-up referenced in docs/runbooks/atlas-user-provisioning.md. Verified prod cross-cluster reads (drug + provider tier fallback) remain healthy after the tightening — drug_tier=PreferredBrand and network_tier=Preferred smoke responses byte-identical to baseline.
  • The Atlas BAA must enumerate both projects in writing. The follow-up is tracked in #57.
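The Phase 1 guard's core check can be approximated in a few lines. The sketch below is a hypothetical, regex-based approximation, not the shipped scripts/audit/staging-collections-guard.ts; a production guard might walk the TypeScript AST instead. It first tracks identifiers bound to getReferenceDb(). It then looks at .collection(...) calls made on those handles, or on an inline getReferenceDb() call, and flags two cases: a literal name outside the allowlist, and a non-literal name it cannot verify. The allowlist holds the two runtime collections named in Phase 1; findReferenceDbViolations and GuardViolation are illustrative names.

```typescript
// Hypothetical, regex-based sketch of a Phase 1-style guard; not the shipped script.

// Mirrors the two runtime collections named for STAGING_ALLOWED_COLLECTIONS.
const STAGING_ALLOWED_COLLECTIONS: ReadonlySet<string> = new Set([
  "formularies_staging",
  "providers_staging",
]);

interface GuardViolation {
  line: number;
  message: string;
}

function findReferenceDbViolations(
  source: string,
  allowed: ReadonlySet<string> = STAGING_ALLOWED_COLLECTIONS,
): GuardViolation[] {
  // Identifiers bound to a reference handle, e.g. `const ref = await getReferenceDb();`.
  const bound = new Set<string>();
  const bindRe = /\b(?:const|let|var)\s+([A-Za-z_$][\w$]*)\s*=\s*(?:await\s+)?getReferenceDb\(\)/g;
  for (const m of source.matchAll(bindRe)) bound.add(m[1]);

  // `.collection(arg)` whose receiver is an inline getReferenceDb() call
  // (optionally wrapped as `(await getReferenceDb())`) or any identifier.
  const callRe =
    /(\(\s*await\s+getReferenceDb\(\)\s*\)|getReferenceDb\(\)|\b[A-Za-z_$][\w$]*)\s*\.collection\s*\(\s*([^),]*?)\s*[),]/g;

  const violations: GuardViolation[] = [];
  for (const m of source.matchAll(callRe)) {
    const receiver = m[1];
    // Skip identifiers that were never bound to getReferenceDb().
    if (!receiver.includes("getReferenceDb(") && !bound.has(receiver)) continue;
    const line = source.slice(0, m.index).split("\n").length;
    const arg = m[2];
    // A plain quoted or backtick name (no interpolation) counts as a literal.
    const literal = /^(["'`])([\w.-]+)\1$/.exec(arg);
    if (!literal) {
      violations.push({ line, message: `dynamic collection name (${arg}) cannot be verified` });
    } else if (!allowed.has(literal[2])) {
      violations.push({ line, message: `collection "${literal[2]}" is not in STAGING_ALLOWED_COLLECTIONS` });
    }
  }
  return violations;
}

// Lines 3 and 4 are flagged; line 6 is ignored because `db` is not a reference handle.
const sample = [
  "const ref = await getReferenceDb();",
  'await ref.collection("formularies_staging").findOne({});',
  'await ref.collection("members").findOne({});',
  "const rows = (await getReferenceDb()).collection(name);",
  "getReferenceDb().collection(`providers_staging`);",
  'db.collection("members");',
].join("\n");

for (const v of findReferenceDbViolations(sample)) console.log(`line ${v.line}: ${v.message}`);
```

Flagging non-literal names, rather than ignoring them, is what lets a check like this cover the dynamic-name case. A name the scanner cannot resolve statically is treated as unverified.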

Alternatives considered

  • Path A: duplicate data onto the prod cluster. Rejected. The recurring $326/mo cost is prohibitive for a pre-revenue startup, and co-locating public reference data with PHI on the prod audit surface complicates the EDE Phase 3 narrative.
  • Path B: VPC peering prod VPC ↔ staging Atlas. Rejected. It is blocked by the CIDR conflict (192.168.248.0/21 on both Atlas projects). Resolving it would require destroying and re-creating one project, with multi-day operational disruption.
  • Status quo: keep the doctor + Rx feature staging-only. Rejected. Doctor + Rx is a launch-tier feature for askflorence.health, and product priority ruled out deferring it past launch.

Revisit triggers

Revisit the architecture if any of these triggers fire:

  1. Staging cluster cost > $500/mo sustained for >2 months → evaluate M20 with delta refresh (#98).
  2. Cross-cluster read p99 latency > 250ms → evaluate co-locating data on prod cluster (Path A revisited).
  3. Auditor flags cross-cluster path under EDE Phase 3 review → migrate both clusters to FedRAMP-authorized Atlas Government (architecture transfers; PrivateLink stays).
  4. Any PHI ever needs to land on the staging cluster → immediate cutover. CI guard #100 is the early-warning system for this.
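Triggers 1 and 2 are metric thresholds, so they lend themselves to an automated check. Triggers 3 and 4 are events (an auditor finding, PHI landing on staging) and are not modeled here. The sketch below assumes that monthly staging bills and a window of cross-cluster read latencies are available. It reads "sustained for >2 months" as three consecutive monthly bills above $500, and it computes p99 with the nearest-rank method. The input shape and function names are hypothetical.

```typescript
// Hypothetical sketch: evaluate revisit triggers 1 and 2 from monitoring data.

// Nearest-rank percentile: the smallest sample with at least p% of samples <= it.
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

interface TriggerInput {
  stagingMonthlyCostUsd: readonly number[]; // oldest first, most recent month last
  crossClusterReadLatencyMs: readonly number[]; // samples from the evaluation window
}

function firedTriggers(input: TriggerInput): string[] {
  const fired: string[] = [];
  // Trigger 1 (assumed reading): the last three monthly bills all exceed $500.
  const recent = input.stagingMonthlyCostUsd.slice(-3);
  if (recent.length === 3 && recent.every((cost) => cost > 500)) {
    fired.push("1: staging cost > $500/mo for >2 months; evaluate M20 with delta refresh (#98)");
  }
  // Trigger 2: cross-cluster read p99 above 250 ms.
  const latencies = input.crossClusterReadLatencyMs;
  if (latencies.length > 0 && percentile(latencies, 99) > 250) {
    fired.push("2: cross-cluster read p99 > 250ms; evaluate Path A");
  }
  return fired;
}

// 98 fast reads plus two slow ones: the 99th-ranked sample is 310 ms.
const p99Samples = [...new Array<number>(98).fill(120), 310, 320];
console.log(percentile(p99Samples, 99)); // 310
console.log(firedTriggers({ stagingMonthlyCostUsd: [382, 382, 382], crossClusterReadLatencyMs: p99Samples })); // fires trigger 2 only
```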

Amendment 2026-05-11 (ENG-257 closeout)

role_reader_reference's canonical scope is four collections, not two: formularies_staging, providers_staging, plans, mrpuf_issuers_staging. All four are part of the §1311 / MRF reference dataset on the staging cluster and share the same non-PHI data classification.

When this ADR shipped on 2026-05-08, the role was tightened to two collections (the runtime-fallback consumers only). On 2026-05-09, the §1311 re-validation audit (ENG-230) needed read access to plans and mrpuf_issuers_staging, so the role was widened to four. ENG-257 was filed to narrow the role back to two collections once the audit cycle closed.

Re-examining on 2026-05-11: the wider scope is the correct permanent posture, not a temporary tradeoff. The role's responsibility is "cross-cluster reads of staging-cluster §1311 reference data for both runtime tier-fallback AND periodic audit re-validation." Both purposes:

  • Operate against the same dataset family (§1311 / MRF).
  • Share the same data classification (non-PHI public CMS marketplace data).
  • Use the same AWS PrivateLink path (no incremental network surface).
  • Recur on a known cadence (audit re-validation runs each refresh cycle — ENG-231 makes that ongoing).

Narrowing back to two collections and re-widening on each future audit cycle would be operational churn that delivers no posture benefit — the data classification is identical and the network path is unchanged.

Resolution: re-baseline the matrix at four collections as canonical. No Atlas change (the live role has been at four since 2026-05-09). No code-consumer change (the runtime consumers still touch only formularies_staging + providers_staging; plans + mrpuf_issuers_staging are reachable via the same role for audit harnesses only). The drift CI guard (Phase 2 above) stays green because the matrix matches Atlas state. ENG-257 closed as not planned with this rationale. The ADR's "non-PHI cross-cluster read" architectural claim is unchanged — only the enumeration of reachable collections widens to reflect the role's full purpose.
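The re-baselined matrix is the expected input to the nightly drift comparison. A hypothetical sketch of that comparison follows. The role and user shapes loosely follow Atlas custom-role and database-user objects, but they are assumptions here, and the shipped scripts/audit/staging-cluster-drift.ts may fetch and compare them differently. CANONICAL_COLLECTIONS and driftViolations are illustrative names. Run against the four-collection role described above, the sketch reports no drift. Checked against the original two-collection matrix instead, the same role reports two extra grants.

```typescript
// Hypothetical sketch of the Phase 2 drift comparison, re-baselined at four collections.

interface RoleResource {
  db?: string;
  collection?: string; // "" or absent is treated as a database-wide grant
  cluster?: boolean;
}
interface RoleAction {
  action: string;
  resources: RoleResource[];
}
interface CustomRole {
  roleName: string;
  actions: RoleAction[];
  inheritedRoles: { db: string; role: string }[];
}
interface DbUser {
  username: string;
  roles: { roleName: string; databaseName: string }[];
}

// Canonical matrix per the 2026-05-11 amendment.
const CANONICAL_COLLECTIONS = ["formularies_staging", "providers_staging", "plans", "mrpuf_issuers_staging"];

function driftViolations(user: DbUser, role: CustomRole, expected: readonly string[] = CANONICAL_COLLECTIONS): string[] {
  const violations: string[] = [];
  // The user must hold exactly one role: role_reader_reference@admin.
  const userRoles = user.roles.map((r) => `${r.roleName}@${r.databaseName}`);
  if (userRoles.length !== 1 || userRoles[0] !== "role_reader_reference@admin") {
    violations.push(`${user.username} roles are [${userRoles.join(", ")}], expected [role_reader_reference@admin]`);
  }
  if (role.roleName !== "role_reader_reference") violations.push(`unexpected role name ${role.roleName}`);
  if (role.inheritedRoles.length > 0) violations.push("role has inheritedRoles");

  // Only FIND, only single askflorence collections.
  const granted = new Set<string>();
  for (const { action, resources } of role.actions) {
    if (action !== "FIND") violations.push(`action wider than FIND: ${action}`);
    for (const r of resources) {
      const coll = r.collection;
      if (r.cluster || !coll || r.db !== "askflorence") {
        violations.push(`resource is not a single askflorence collection: ${JSON.stringify(r)}`);
      } else {
        granted.add(coll);
      }
    }
  }
  // Granted collections must match the matrix exactly.
  for (const c of granted) if (!expected.includes(c)) violations.push(`extra collection grant: ${c}`);
  for (const c of expected) if (!granted.has(c)) violations.push(`missing expected collection: ${c}`);
  return violations;
}

// The live role as described in the amendment: FIND on all four collections.
const liveRole: CustomRole = {
  roleName: "role_reader_reference",
  inheritedRoles: [],
  actions: [{ action: "FIND", resources: CANONICAL_COLLECTIONS.map((c) => ({ db: "askflorence", collection: c })) }],
};
const appReadStaging: DbUser = {
  username: "app_read_staging",
  roles: [{ roleName: "role_reader_reference", databaseName: "admin" }],
};

console.log(driftViolations(appReadStaging, liveRole)); // []
console.log(driftViolations(appReadStaging, liveRole, ["formularies_staging", "providers_staging"]).length); // 2
```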

If a future cycle wants to ratchet back to a 2-collection runtime role plus a dedicated audit_reader user, the original recipe is preserved in the description of GH #122.

References

  • Issue #101 — Phase 11 shipped: 2.14M provider + 12.5K medication data live on prod via cross-cluster Atlas PrivateLink
  • Decision doc: docs/decisions/2026-05-03-pivot-cms-api-direct.md "Cross-cluster reference reads via AWS PrivateLink"
  • Terraform: infra/envs/prod/atlas-staging-privatelink.tf
  • Code: src/lib/db.ts getReferenceDb(), src/lib/drug-tier-fallback.ts
  • ADR 0001 — Atlas project isolation — the project-boundary baseline this decision builds on
  • ADR 0003 — Narrow-scoped MongoDB users — the role-shape pattern app_read_staging follows
  • SOC 2 controls mapping: docs/security-compliance/soc2-control-mapping.md CC6.6 (additional row) + CC6.7
  • Vendor register: docs/security-compliance/vendor-register.md MongoDB Atlas row (both project IDs enumerated)
  • Cross-references: #57 (BAA enumeration), #71 (staging IP allowlist hardening), #96 (Phase D provider-network fallback — same pattern), #98 (delta-aware MRF refresh), #100 (CI guard)
  • AWS PrivateLink for MongoDB Atlas — docs (vendor reference)
  • HIPAA §164.312(e)(1) — Transmission Security
  • SOC 2 TSC 2017 — CC6.6, CC6.7
  • CMS EDE Audit Program Appendix A § 3 (Environment Separation), § 4 (Encryption in Transit)

AskFlorence Internal Documentation. Not for public distribution.
