2026-05-08 — Phase 11: Cross-cluster Atlas reads from prod via AWS PrivateLink
Scope
Bring 2.14M NPI provider docs + 12,557 RxCUI / ~30M drug-plan tuples of public CMS marketplace reference data within reach of the prod app — without duplicating the data onto the prod Atlas cluster. Goal: unblock the doctor + Rx coverage flow on askflorence.health (was returning empty tier metadata for tier-fallback-eligible drugs like Eliquis 2.5mg) at minimum recurring cost and best compliance posture.
Actor
taha.abbasi@abbasiindustries.com driving from ~/Developer/ask-florence-doctor-rx/ worktree (branch doctor-rx-flow); agent: Claude Opus 4.7 (1M context).
Tickets / ADRs
- Created: ADR 0004, Issue #100 (CI guard), Issue #101 (Phase 11 umbrella with decision matrix comment)
- Cross-referenced: #57 (BAA enumeration), #71 (staging IP allowlist hardening, post-launch), #96 (Phase D provider-network fallback), #98 (delta-aware MRF refresh)
- Built on: ADR 0001 (project isolation), ADR 0003 (narrow-scoped users)
External systems touched
| System | Account / Project | Delta |
|---|---|---|
| Atlas (staging project askflorence-staging) | 69e31af12fd2c0aef51bbb41 | Created PrivateLink endpoint service (Atlas endpointId 69fe75c5b02c024f32d2af50); created read-only app_read_staging Atlas user on askflorence database; approved AWS-side VPC endpoint vpce-0c81aea11e29bb928 |
| Atlas (prod project askflorence-prod-01) | 69dc20c64005b222804dafa4 | No changes |
| AWS prod (account 039624954211) | prod VPC vpc-09201679b87261b6d | NEW: aws_security_group askflorence-prod-atlas-staging-privatelink; NEW: aws_vpc_endpoint af-prod-to-staging-reference-pl (multi-AZ Interface endpoint); NEW: Secrets Manager secret prod/mongodb/reference-uri (project CMK encrypted); NEW: ECS task def revision 53 (with MONGODB_REFERENCE_URI env binding); ECS service rolled to revision 53, 2/2 tasks running |
| GitHub (repo askflorencehealth/ask-florence) | n/a | Filed #100, filed #101; cross-link comments on #57, #71, #96, #98; merged 6 commits on doctor-rx-flow (incl. 2bba8d4 feat(db): add getReferenceDb() + dd06efe feat(infra): Phase 11 cross-cluster Atlas reference reads via AWS PrivateLink) into main (advanced from 9179c8d to 1ac9a58) |
| GitHub Actions (Deploy prod) | n/a | Run 25587085583 — success in 6m59s, container image 1ac9a584006cc10df88864e08536e01159515f86 |
What shipped
Code
- `src/lib/db.ts`: added `getReferenceDb()` two-pool helper. Prod uses `MONGODB_REFERENCE_URI` (cross-cluster path); dev + staging use `MONGODB_URI` (same as before) when `MONGODB_REFERENCE_URI` is unset. Backward-compat preserved.
- `src/lib/drug-tier-fallback.ts`: switched from `getDb()` to `getReferenceDb()`. Module-level comment updated to explain the cluster-routing rationale.
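
The fallback rule behind `getReferenceDb()` can be illustrated with a minimal sketch. It is written in Python purely to show the selection logic; the real helper is TypeScript in `src/lib/db.ts` and is not reproduced here. The function name `resolve_reference_uri` and the choice to treat an empty string as unset are assumptions made for illustration.

```python
from typing import Mapping


def resolve_reference_uri(env: Mapping[str, str]) -> str:
    """Pick the connection string for reference-data reads.

    Hypothetical illustration of the fallback rule: use
    MONGODB_REFERENCE_URI when it is set, otherwise reuse MONGODB_URI.
    Treating an empty string as "unset" is an assumption.
    """
    reference_uri = env.get("MONGODB_REFERENCE_URI", "")
    if reference_uri:
        return reference_uri  # prod: cross-cluster path
    return env["MONGODB_URI"]  # dev + staging: same cluster as before
```

Passing `os.environ` as `env` would mirror how a process reads its configuration. Keeping the rule in a single function is what lets callers such as the drug-tier fallback stay unaware of which cluster they are reading from.
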
Infrastructure (Terraform)
- `infra/envs/prod/atlas-staging-privatelink.tf` (NEW): defines the `aws_security_group` (MongoDB ports from prod VPC CIDR) + `aws_vpc_endpoint` (multi-AZ, Atlas-issued service name, `private_dns_enabled = false` since Atlas issues its own DNS via private connection string).
- `infra/envs/prod/secrets.tf`: added `mongodb/reference-uri` entry with `data_class = "public"` (non-PHI public CMS marketplace data).
- `infra/envs/prod/ecs.tf`: added `MONGODB_REFERENCE_URI` to `secrets_from_manager` map; ECS task def revision 53 was registered directly via AWS CLI (since the module's `lifecycle.ignore_changes = [container_definitions]` pattern means CI/CD owns task-def revisions, not Terraform).
- `.gitignore`: added `infra/**/*.tfvars` + `*.tfvars.json` defensive ignore.
Atlas (CLI)
- `atlas privateEndpoints aws create --projectId 69e31af12fd2c0aef51bbb41 --region US_EAST_1`: created the endpoint service (Atlas endpointId `69fe75c5b02c024f32d2af50`).
- `atlas privateEndpoints aws interfaces create 69fe75c5b02c024f32d2af50 --privateEndpointId vpce-0c81aea11e29bb928 --projectId 69e31af12fd2c0aef51bbb41`: Atlas-side approval of the AWS VPC endpoint.
- `atlas dbusers create`: created `app_read_staging` Atlas user with read-only role on `askflorence` database.
- `atlas privateEndpoints aws describe 69fe75c5b02c024f32d2af50`: final state is `status: AVAILABLE`, interface endpoint `vpce-0c81aea11e29bb928`.
Documentation
- ADR 0004 (NEW): full decision record (context, decision, consequences, alternatives, revisit triggers, references).
- `docs/decisions/2026-05-03-pivot-cms-api-direct.md`: added "Cross-cluster reference reads via AWS PrivateLink" section with decision-matrix table, compliance posture, mitigations.
- `docs/security-compliance/soc2-control-mapping.md`: new CC6.6 row (cross-cluster reference reads) + new CC6.7 row (transmission encryption with TLS + PrivateLink dual protection). (Was at `docs/compliance/soc2/controls.md` until the 2026-05-11 doc consolidation.)
- `docs/security-compliance/vendor-register.md`: MongoDB Atlas row expanded to enumerate both project IDs under organization-level BAA + cross-cluster posture.
- `docs/infrastructure/change-log.md`: Phase 11 entry at top.
- `docs/infrastructure/mongodb-setup.md`: added cross-cluster reference reads section.
- `docs/infrastructure/data-classification.md`: added `formularies_staging` + `providers_staging` collection rows.
- `docs/runbooks/atlas-user-provisioning.md`: added step for `app_read_staging` user creation.
Verification
End-to-end on prod (live test)
`POST askflorence.health/api/drugs/covered` with Eliquis 2.5mg (RxCUI 1364441) on 8 UT plans:
- `plan 42261UT0060023 → coverage=Covered, drug_tier=PreferredBrand, quantity_limit=true`
- `plan 42261UT0060026 → coverage=Covered, drug_tier=PreferredBrand, quantity_limit=true`
- `plan 68781UT0020024 → coverage=Covered, drug_tier=PreferredBrand`
- `plan 68781UT0040006 → coverage=Covered, drug_tier=PreferredBrand`
- `plan 68781UT0200007 → coverage=Covered, drug_tier=PreferredBrand`
- `plan 68781UT0200009 → coverage=Covered, drug_tier=PreferredBrand`
- `plan 68781UT0200014 → coverage=Covered, drug_tier=PreferredBrand`
- `plan 68781UT0020025 → coverage=Covered, drug_tier=PreferredBrand`

The `drug_tier` and UM flags are populated only by `lookupStagingDrugTiers()` reading from `formularies_staging` via the cross-cluster path, so their presence is the proof. The differing `quantity_limit` values across plans confirm real per-plan formulary data, not a stub. Latency was sub-second (~225-465ms) on the AWS-backbone path.
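
The two checks described above (every plan carries a tier, and the utilization-management flag is not uniform) can be sketched as a small script. The values are transcribed from the live test results; the dictionary layout and the helper names `tiers_populated` and `flags_vary` are assumptions for illustration and do not represent the API's actual response schema.

```python
# Results transcribed from the live test above. A missing quantity_limit
# key means the flag was absent from the reported line.
results = {
    "42261UT0060023": {"drug_tier": "PreferredBrand", "quantity_limit": True},
    "42261UT0060026": {"drug_tier": "PreferredBrand", "quantity_limit": True},
    "68781UT0020024": {"drug_tier": "PreferredBrand"},
    "68781UT0040006": {"drug_tier": "PreferredBrand"},
    "68781UT0200007": {"drug_tier": "PreferredBrand"},
    "68781UT0200009": {"drug_tier": "PreferredBrand"},
    "68781UT0200014": {"drug_tier": "PreferredBrand"},
    "68781UT0020025": {"drug_tier": "PreferredBrand"},
}


def tiers_populated(rows: dict) -> bool:
    """True when every plan has a non-empty drug_tier."""
    return all(row.get("drug_tier") for row in rows.values())


def flags_vary(rows: dict) -> bool:
    """True when quantity_limit differs across plans (not a uniform stub)."""
    return len({row.get("quantity_limit") for row in rows.values()}) > 1


print(tiers_populated(results), flags_vary(results))
```

A check like this could run after each deploy to detect the enrichment silently going empty, which is the failure mode the cross-cluster path was built to fix.
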
Infrastructure state
- AWS WAF: no relevant rule changes (the WAF item from #47 PostHog/Telegram remains an open follow-up — separate work).
- Atlas connection status: `AVAILABLE` on both sides (Atlas-side describe + AWS-side `aws ec2 describe-vpc-endpoints`).
- ECS service: rev 53, 2/2 tasks running, `rolloutState=COMPLETED`.
- ECR: container image `1ac9a584006cc10df88864e08536e01159515f86` (the merge commit SHA on main).
Cost outcome (verified vs estimate)
| Component | Tier | Monthly | Holds |
|---|---|---|---|
| Prod cluster askflorence-prod-01 | M10 HIPAA | ~$56 | PHI-scope app data |
| Staging cluster askflorence-staging | M30 | ~$382 | Public CMS reference data |
| Total | | ~$438/mo | |
| (avoided) duplicate-on-prod | M30 prod + M30 staging | ~$764 | Would have doubled tier cost |
| Savings | | ~$326/mo | |
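
The arithmetic behind the table can be reproduced in a few lines. The figures are the approximate monthly costs listed above; pricing the avoided design as two M30 clusters at the staging M30 rate is an assumption inferred from the ~$764 figure, and the variable names are illustrative.

```python
# Approximate monthly costs (USD) taken from the table above.
PROD_M10_HIPAA = 56
STAGING_M30 = 382

# Shipped design: small prod cluster + M30 staging cluster.
shipped_total = PROD_M10_HIPAA + STAGING_M30

# Avoided design: M30 on both prod and staging (assumed same M30 rate).
avoided_total = 2 * STAGING_M30

savings = avoided_total - shipped_total
print(shipped_total, avoided_total, savings)  # 438 764 326
```
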
Deviations from plan
- Initial attempt to create the secret directly via `aws secretsmanager create-secret` used the AWS-default KMS key. Caught + corrected: scheduled the bad secret for deletion and re-created via the Terraform secrets module which uses the project CMK (`key/860e6ae2-1ddb-4b3d-969e-9496d8dec7af`), then populated value via `put-secret-value`.
- Terraform `apply` showed "no changes" when adding `MONGODB_REFERENCE_URI` to the ECS task def env: the module's `lifecycle.ignore_changes = [container_definitions]` pattern means CI/CD owns task-def revisions, not Terraform. Bypassed by registering revision 53 directly via AWS CLI: download current task def, inject env binding via Python, `aws ecs register-task-definition` + `aws ecs update-service`. The Terraform side stays correct as the source-of-truth for next CI deploy; this CLI registration just bridges the gap until the next deploy bakes the env binding in normally.
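
The "inject env binding via Python" step can be sketched roughly as follows. This is not the script used in the session: the function name `inject_secret`, the container name, and the ARN values in the usage note are hypothetical, and the sketch assumes the binding went into the container's `secrets` list (Secrets Manager reference) rather than the plain `environment` list.

```python
import copy
from typing import Any

# Fields returned by `aws ecs describe-task-definition` that
# `aws ecs register-task-definition` does not accept as input.
READ_ONLY_FIELDS = (
    "taskDefinitionArn", "revision", "status", "requiresAttributes",
    "compatibilities", "registeredAt", "registeredBy", "deregisteredAt",
)


def inject_secret(task_def: dict[str, Any], container: str,
                  name: str, value_from: str) -> dict[str, Any]:
    """Return a registrable copy of task_def with one secret binding added.

    The input is left untouched. An existing binding with the same name
    is replaced so the function can be re-run safely.
    """
    new_def = copy.deepcopy(task_def)
    for field in READ_ONLY_FIELDS:
        new_def.pop(field, None)
    for cdef in new_def["containerDefinitions"]:
        if cdef["name"] == container:
            secrets = [s for s in cdef.get("secrets", []) if s["name"] != name]
            secrets.append({"name": name, "valueFrom": value_from})
            cdef["secrets"] = secrets
            return new_def
    raise ValueError(f"container {container!r} not found")
```

In practice the input would be the `taskDefinition` object from the describe call, and the returned dictionary, serialized as JSON, could be passed to `aws ecs register-task-definition --cli-input-json`. Stripping the read-only fields first matters because the register call rejects them.
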
Compliance posture impact
| Framework | Control | Status |
|---|---|---|
| HIPAA | §164.312(e)(1) Transmission security | TLS 1.2+ at app layer + AWS-backbone-only at network layer (PrivateLink). Doubly-protected encryption. |
| SOC 2 | CC6.6 — restrictions on logical access from outside boundaries | Cross-cluster reads identity-bound at AWS account level (PrivateLink) + Atlas Mongo auth. Documented in docs/security-compliance/soc2-control-mapping.md (additional row added). |
| SOC 2 | CC6.7 — transmission encryption | TLS-only Atlas endpoint + PrivateLink eliminates public-network exposure. New row added. |
| SOC 2 | CC8.1 — change management | This session log + change-log entry + ADR 0004 = full evidence trail. |
| CMS EDE Phase 3 | Environment separation + audit boundary | PHI lives only on prod cluster. Non-PHI public reference data lives only on staging. One-way private read prod → staging. Audit narrative is clean. |
| Atlas BAA | §164.314(a) | Organization-level BAA covers both projects. Confirmation-in-writing chase tracked in #57. |
What was NOT done (deferred)
- Staging IP allowlist hardening — staging cluster's allowlist still permits Taha's laptop IP + a few CI runner egress IPs. Pre-launch ingest scripts (RxNorm refresh, NPPES refresh, formulary loads) need IP-based access. Tracked in #71 for post-launch closure once ingest moves to ECS Fargate in staging VPC.
- CI guard implementation — issue #100 filed today with full spec but not yet implemented. P1, separate session.
- Phase D provider-network fallback: same architectural pattern for `providers_staging` as we just shipped for `formularies_staging`. Cross-cluster path already wired (`getReferenceDb()` is the same helper); just needs the analogous lookup function + route handler. Tracked in #96.
- Atlas BAA written enumeration: open ask of Atlas support per #57; does not block.
Rollback
If the cross-cluster path needs to be torn down:
```bash
# Application layer rollback
# 1. Revert env binding (re-deploy without MONGODB_REFERENCE_URI)
# 2. Code path falls back to MONGODB_URI automatically (getReferenceDb fallback)
# 3. drug-tier-fallback returns empty Map; CMS coverage stays authoritative;
#    only the tier-enrichment becomes silently unavailable on prod

# Infrastructure rollback (Terraform)
cd ~/Developer/askflorence  # or any worktree with Terraform state access
AWS_PROFILE=askflorence-prod terraform -chdir=infra/envs/prod destroy \
  -target=aws_vpc_endpoint.atlas_staging \
  -target=aws_security_group.atlas_staging_privatelink

# Atlas rollback (CLI)
atlas privateEndpoints aws delete 69fe75c5b02c024f32d2af50 \
  --projectId 69e31af12fd2c0aef51bbb41 --force
atlas dbusers delete app_read_staging \
  --projectId 69e31af12fd2c0aef51bbb41 --force

# Secret cleanup (30-day recovery window)
aws secretsmanager delete-secret \
  --secret-id prod/mongodb/reference-uri \
  --recovery-window-in-days 30
```

The 30-day Secrets Manager recovery window is a deliberate safety net: accidental deletion is reversible via `aws secretsmanager restore-secret` within that window.