Runbook — Provision Atlas users and roles (Issue #56) ​

🔴 LIVE STATE → see Atlas Access Matrix (auto-generated from infra/atlas/access-matrix.ts, CI-guarded for accuracy). This runbook documents the ORIGINAL provisioning steps for #56 plus the Phase 11 cross-cluster reader, but the matrix is the canonical view of every user that exists today (including post-runbook additions like app_read_local_staging from ENG-271, app_writer_hubspot_sync from PR91, and app_admin_schema from ENG-266). When provisioning a new user, update the matrix in lock-step — CI enforces it.

This runbook reproduces the user/role setup executed against the askflorence-staging Atlas project on 2026-04-17. Use it to bring a second project (prod, a future DR region, or a throwaway sandbox) to the same state. It assumes an empty Atlas project — no pre-existing role names collide.

Design rationale lives in ADR 0003. Project-isolation rationale in ADR 0001. Append-only audit log rationale in ADR 0002.

Cross-cluster read user (app_read_staging) — the read-only user prod uses to read non-PHI public CMS reference data from the staging cluster via AWS PrivateLink. Provisioning steps for this user are documented in the dedicated section at the bottom of this runbook ("Cross-cluster app_read_staging user — Phase 11"). Decision rationale: ADR 0004.

Prerequisites ​

  • atlas CLI v1.x with an authenticated session (atlas auth login).
  • mongosh installed (for verification probes).
  • openssl installed (for password generation).
  • Project Owner role in the Atlas project you're targeting.
  • The target project ID recorded: atlas projects list.

Variables you'll substitute ​

bash
PROJECT_ID=<target Atlas project ID>          # e.g. 69e31af12fd2c0aef51bbb41 for staging
CLUSTER_HOST=<cluster SRV host>               # e.g. askflorence-staging.efsikmv.mongodb.net
DB_NAME=askflorence                           # never changes

Step 1 — Confirm no role name collisions ​

bash
atlas customDbRoles list --projectId $PROJECT_ID --output json

Expected on a fresh project: []. If any of the five role names (role_writer_survey, role_writer_plans, role_writer_agents, role_admin_agents, role_audit_reader) already exists, stop and investigate before creating anything.
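The collision check can be scripted rather than eyeballed. This is a minimal sketch, assuming each role in the JSON output appears under a "roleName" key (the field name the Atlas Admin API uses for custom roles). The helper find_collisions and the sample JSON are hypothetical and not part of the original provisioning session.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print any of the five runbook role names found in a
# JSON role listing read from stdin. Assumes each role appears as "roleName": "<name>".
RUNBOOK_ROLES=(role_writer_survey role_writer_plans role_writer_agents role_admin_agents role_audit_reader)

find_collisions() {
  local json name
  json="$(cat)"
  for name in "${RUNBOOK_ROLES[@]}"; do
    if grep -Eq "\"roleName\"[[:space:]]*:[[:space:]]*\"${name}\"" <<<"$json"; then
      echo "$name"
    fi
  done
}

# Illustrative input; in practice pipe in the output of:
#   atlas customDbRoles list --projectId "$PROJECT_ID" --output json
SAMPLE='[{"roleName": "role_writer_plans", "actions": []}]'
COLLISIONS="$(find_collisions <<<"$SAMPLE")"
if [[ -n "$COLLISIONS" ]]; then
  echo "Collision(s): $COLLISIONS"
fi
```

An empty result means it is safe to continue to Step 2.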

Step 2 — Create the five custom roles ​

role_writer_survey ​

bash
atlas customDbRoles create role_writer_survey \
  --privilege FIND@${DB_NAME}.agent_survey_responses,INSERT@${DB_NAME}.agent_survey_responses,UPDATE@${DB_NAME}.agent_survey_responses,REMOVE@${DB_NAME}.agent_survey_responses \
  --projectId $PROJECT_ID

role_writer_plans ​

bash
ACTIONS=(FIND INSERT UPDATE REMOVE CREATE_INDEX DROP_INDEX COLL_MOD)
COLLS=(plans zip_county regions plan_years audit_log)
PRIV=""
for a in "${ACTIONS[@]}"; do
  for c in "${COLLS[@]}"; do
    PRIV="${PRIV}${a}@${DB_NAME}.${c},"
  done
done
PRIV="${PRIV%,}"
atlas customDbRoles create role_writer_plans --privilege "$PRIV" --projectId $PROJECT_ID
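The nested loop expands to 7 actions × 5 collections = 35 privileges, which is easy to get wrong silently. The sketch below wraps the same expansion in a hypothetical function, build_privs, so the string can be inspected and counted before it is sent to Atlas. The function name and the counting step are illustrative additions.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the original runbook): build the comma-separated
# ACTION@db.collection list for every action/collection pair.
# Usage: build_privs <db> "<space-separated actions>" "<space-separated collections>"
build_privs() {
  local db="$1" out="" a c
  for a in $2; do
    for c in $3; do
      out+="${a}@${db}.${c},"
    done
  done
  printf '%s\n' "${out%,}"   # strip the trailing comma
}

PRIV="$(build_privs askflorence \
  "FIND INSERT UPDATE REMOVE CREATE_INDEX DROP_INDEX COLL_MOD" \
  "plans zip_county regions plan_years audit_log")"

# Count the entries before handing the string to atlas customDbRoles create.
PRIV_COUNT="$(tr ',' '\n' <<<"$PRIV" | wc -l | tr -d ' ')"
echo "privileges: $PRIV_COUNT"
echo "first: ${PRIV%%,*}"
```

If the count is not 35, an action or collection was dropped from one of the lists.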

role_writer_agents — append-only on agent_audit_log ​

bash
AGENTS_RW="FIND@${DB_NAME}.agents,INSERT@${DB_NAME}.agents,UPDATE@${DB_NAME}.agents,REMOVE@${DB_NAME}.agents,FIND@${DB_NAME}.agencies,INSERT@${DB_NAME}.agencies,UPDATE@${DB_NAME}.agencies,REMOVE@${DB_NAME}.agencies,FIND@${DB_NAME}.agent_sessions,INSERT@${DB_NAME}.agent_sessions,UPDATE@${DB_NAME}.agent_sessions,REMOVE@${DB_NAME}.agent_sessions"
AUDIT_APPEND="FIND@${DB_NAME}.agent_audit_log,INSERT@${DB_NAME}.agent_audit_log"
atlas customDbRoles create role_writer_agents --privilege "${AGENTS_RW},${AUDIT_APPEND}" --projectId $PROJECT_ID

role_admin_agents — same as above plus admins ​

bash
ADMINS_RW="FIND@${DB_NAME}.admins,INSERT@${DB_NAME}.admins,UPDATE@${DB_NAME}.admins,REMOVE@${DB_NAME}.admins"
atlas customDbRoles create role_admin_agents --privilege "${AGENTS_RW},${AUDIT_APPEND},${ADMINS_RW}" --projectId $PROJECT_ID

Why not --inheritedRole? Atlas rejected --inheritedRole role_writer_agents@askflorence with ATLAS_INVALID_CUSTOM_ROLE_INHERITED_SCOPE. Custom-role inheritance in Atlas is strict about the scope — explicit enumeration is more robust and the privilege list is still short.

role_audit_reader ​

bash
atlas customDbRoles create role_audit_reader \
  --privilege FIND@${DB_NAME}.agent_audit_log \
  --projectId $PROJECT_ID

Step 3 — Create the six users ​

Important: when assigning a custom role to a user, the role must be referenced as role_name@admin, not role_name@${DB_NAME}. Atlas rejects the latter with UNSUPPORTED_ROLE: Custom role X must scoped to admin database. The role's privileges target askflorence.* collections; the role itself is assigned via @admin.

Generate a 32-char alphanumeric password per user (openssl rand -base64 48 | tr -dc 'A-Za-z0-9' | head -c 32). Do not reuse passwords across users.
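Filtering base64 output down to alphanumerics can, in rare cases, leave fewer than 32 characters. A small sketch that guards against that by accumulating filtered output until the target length is reached follows. The helper names gen_password and random_b64 are hypothetical, and the /dev/urandom fallback is an assumption for hosts without openssl (the prerequisites above already require openssl).

```shell
#!/usr/bin/env bash
# Emit 48 random bytes as base64. Uses openssl as in the runbook; the
# /dev/urandom branch is an assumed fallback for hosts without openssl.
random_b64() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -base64 48
  else
    head -c 48 /dev/urandom | base64
  fi
}

# Hypothetical helper: print an alphanumeric password of exactly N characters
# (default 32), accumulating filtered output until enough characters exist.
gen_password() {
  local len="${1:-32}" pw=""
  while (( ${#pw} < len )); do
    pw+="$(random_b64 | tr -dc 'A-Za-z0-9')"
  done
  printf '%s' "${pw:0:len}"
}

PW="$(gen_password)"
echo "length: ${#PW}"
```

Because the result is alphanumeric only, it can be placed in a mongodb+srv:// URI without percent-encoding.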

bash
declare -a USERS=(
  "app_read_staging:read@${DB_NAME}:MONGODB_URI"
  "app_writer_survey:role_writer_survey@admin:MONGODB_URI_SURVEY_WRITE"
  "app_writer_plans:role_writer_plans@admin:MONGODB_URI_PLANS_WRITE"
  "app_writer_agents:role_writer_agents@admin:MONGODB_URI_AGENTS_WRITE"
  "app_admin_agents:role_admin_agents@admin:MONGODB_URI_AGENTS_ADMIN"
  "audit_reader:role_audit_reader@admin:MONGODB_URI_AUDIT_READ"
)

PW_FILE=$(mktemp /tmp/.atlas-provision-pws.XXXXXX)   # unpredictable name, created mode 600; deleted at the end
chmod 600 "$PW_FILE"

for entry in "${USERS[@]}"; do
  DB_USER="${entry%%:*}"      # not USER, which would clobber the shell's login-name variable
  rest="${entry#*:}"
  ROLE="${rest%%:*}"
  ENV_NAME="${rest#*:}"
  PW=$(openssl rand -base64 48 | tr -dc 'A-Za-z0-9' | head -c 32)
  atlas dbusers create --username "$DB_USER" --password "$PW" --role "$ROLE" --projectId "$PROJECT_ID"
  echo "${ENV_NAME}=mongodb+srv://${DB_USER}:${PW}@${CLUSTER_HOST}/${DB_NAME}?retryWrites=true&w=majority" >> "$PW_FILE"
done

For the prod rollout, rename the first user from app_read_staging to app_read_prod (or just keep the existing app-read and skip that entry).
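Each entry in the USERS array is split on its colons with parameter expansion, and custom roles must carry the @admin suffix. The sketch below shows that split in isolation and rejects a custom role scoped to anything other than admin before Atlas gets the chance to. The helper parse_entry and the convention that custom roles start with role_ are assumptions drawn from the names used in this runbook.

```shell
#!/usr/bin/env bash
# Split a "user:role:env" entry the same way the provisioning loop does, and
# flag custom roles (role_*) that are not scoped to @admin.
# parse_entry is a hypothetical helper name.
parse_entry() {
  local entry="$1" user rest role env_name
  user="${entry%%:*}"      # text before the first colon
  rest="${entry#*:}"       # everything after the first colon
  role="${rest%%:*}"       # text before the next colon
  env_name="${rest#*:}"    # remainder: the env var name
  if [[ "$role" == role_* && "$role" != *@admin ]]; then
    echo "ERROR: custom role '$role' for $user must be referenced as ${role%%@*}@admin" >&2
    return 1
  fi
  printf '%s %s %s\n' "$user" "$role" "$env_name"
}

parse_entry "app_writer_plans:role_writer_plans@admin:MONGODB_URI_PLANS_WRITE"
parse_entry "app_read_staging:read@askflorence:MONGODB_URI"   # built-in role, database scope is fine
parse_entry "app_writer_plans:role_writer_plans@askflorence:X" || echo "rejected"
```

Built-in roles such as read keep their database scope; only custom roles are forced to @admin.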

Step 4 — Write creds to the env file ​

Move the contents of $PW_FILE to the appropriate local env file:

  • Staging cluster: .env.staging.local (mode 600, gitignored).
  • Prod cluster: do not write to .env.local on a dev machine. Move directly to Vercel env (or AWS Secrets Manager post-migration) and securely share with only the engineers who need local prod access.

Then:

bash
rm "$PW_FILE"

Step 5 — Verify (positive + negative probes) ​

Source the env file with the line-by-line loader (raw source breaks on & in SRV strings):

bash
while IFS= read -r line; do
  [[ -z "$line" || "$line" == \#* ]] && continue
  k="${line%%=*}"; v="${line#*=}"
  export "$k=$v"
done < .env.staging.local
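A plain source fails here because the shell treats the unquoted & in the query string as a command separator, so the variable never receives the full URI. The sketch below wraps the loader in a hypothetical function, load_env_file, and exercises it against a throwaway file containing a placeholder URI (not a real credential). The extra condition on read, which handles a file without a final newline, is an addition to the original loop.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the loader above so it can be reused per env file.
load_env_file() {
  local file="$1" line k v
  while IFS= read -r line || [[ -n "$line" ]]; do   # also handle a missing final newline
    [[ -z "$line" || "$line" == \#* ]] && continue  # skip blanks and comments
    k="${line%%=*}"; v="${line#*=}"                 # split on the first '=' only
    export "$k=$v"
  done < "$file"
}

# Demo with a throwaway file and a placeholder URI.
DEMO_ENV="$(mktemp)"
printf '%s\n' '# comment' \
  'DEMO_URI=mongodb+srv://u:p@host.example/db?retryWrites=true&w=majority' > "$DEMO_ENV"
load_env_file "$DEMO_ENV"
rm -f "$DEMO_ENV"
echo "$DEMO_URI"
```

Splitting on the first = only matters because the query string itself contains further = characters.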

Run all 12 probes — expect 6 positive ACKs and 6 "user is not allowed to do action" denials. The full probe script is in docs/session-log/2026-04-17-atlas-staging.md. Any unexpected outcome means a role privilege is wrong — stop and fix before handing the creds to any consumer.

Step 6 — Probe-row hygiene ​

The positive probe against role_writer_agents inserts {_probe: true} into agent_audit_log. By design (ADR 0002), no app-tier user can delete this row. Leave it; it ages out with retention.

Step 7 — Cleanup ​

  • Ensure $PW_FILE was deleted.
  • If a temp restore admin was used for a seeded cluster, delete it (atlas dbusers delete tmp_restore_admin --projectId $PROJECT_ID --force).
  • Remove any local mongodump output (rm -rf ./tmp/prod-snapshot/).

When you're done — handoff ​

Produce a session brief with: project ID, cluster host, six usernames, env file location, region/tier/version. Never include passwords. Briefs live in docs/briefs/ — the staging handoff is at docs/briefs/SESSION_BRIEF_2026-04-17_atlas.md; for the prod rollout use the same filename pattern (SESSION_BRIEF_<YYYY-MM-DD>_<topic>.md).

Rollback ​

For a newly-minted project, one command drops everything:

bash
atlas projects delete $PROJECT_ID

For an in-flight provisioning against an existing project (e.g. prod), revert by deleting each user created this session and each custom role created this session, in that order (users first, so the roles are unused).
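The ordering can be made explicit with a dry run that prints the delete commands, users first and roles second, for review before anything is executed. This is a sketch only: the SESSION_USERS and SESSION_ROLES arrays hold example values, EXAMPLE_PROJECT_ID is a placeholder, and the emit wrapper is a hypothetical convenience.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print (do not execute) the rollback commands, users first,
# then roles. Change emit() to run its arguments only after reviewing the list.
PROJECT_ID="${PROJECT_ID:-EXAMPLE_PROJECT_ID}"
SESSION_USERS=(app_writer_survey app_writer_plans)      # users created this session (example)
SESSION_ROLES=(role_writer_survey role_writer_plans)    # roles created this session (example)

emit() { echo "$@"; }

rollback_plan() {
  local u r
  for u in "${SESSION_USERS[@]}"; do
    emit atlas dbusers delete "$u" --projectId "$PROJECT_ID" --force
  done
  for r in "${SESSION_ROLES[@]}"; do
    emit atlas customDbRoles delete "$r" --projectId "$PROJECT_ID" --force
  done
}

rollback_plan
```

Listing only what this session created keeps the rollback from touching users or roles that predate it.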

Cross-cluster app_read_staging user — Phase 11 ​

This user lives on the staging Atlas project (askflorence-staging, project_id 69e31af12fd2c0aef51bbb41). The prod app uses it over AWS PrivateLink to read non-PHI public CMS reference data: formularies_staging and providers_staging at runtime, plus plans and mrpuf_issuers_staging for audit re-validation (the four-collection scope set in Step B). Decision rationale: ADR 0004.

The user is distinct from the staging-side app_writer_* / app_admin_* users defined earlier in this runbook — those scope writes to staging-app collections; app_read_staging is a narrow read-only consumer used by external (prod) callers only.

Step A — Generate a strong password ​

bash
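PW_FILE=$(mktemp)
chmod 600 "$PW_FILE"
# Alphanumeric only: raw base64 can contain + / = which would need URL-encoding in the connection string.
openssl rand -base64 48 | tr -dc 'A-Za-z0-9' | head -c 32 > "$PW_FILE"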

The password is later loaded into AWS Secrets Manager as part of the connection string (prod/mongodb/reference-uri) on the prod AWS account. Never commit, paste into chat, or store outside Secrets Manager.

Step B — Create the Atlas user with the custom role_reader_reference role ​

bash
PROJECT_ID=69e31af12fd2c0aef51bbb41   # staging
DB_NAME=askflorence

# First, ensure the custom role exists (one-time per project — idempotent).
# If it already exists this returns a 409; safe to ignore.
# Canonical scope: 4 collections (see ADR 0004 amendment 2026-05-11 /
# ENG-257 closeout). Two for runtime tier-fallback (formularies_staging +
# providers_staging) and two for audit re-validation (plans +
# mrpuf_issuers_staging) — all part of the §1311 / MRF reference dataset.
atlas customDbRoles create role_reader_reference \
  --privilege FIND@${DB_NAME}.formularies_staging,FIND@${DB_NAME}.providers_staging,FIND@${DB_NAME}.plans,FIND@${DB_NAME}.mrpuf_issuers_staging \
  --projectId $PROJECT_ID

# Then create the user with that role.
atlas dbusers create \
  --username app_read_staging \
  --password "$(cat "$PW_FILE")" \
  --role role_reader_reference@admin \
  --projectId $PROJECT_ID

The custom role role_reader_reference grants only FIND on the four §1311 / MRF reference collections: formularies_staging, providers_staging, plans, and mrpuf_issuers_staging. That is tighter than the built-in read role, which grants read on the entire askflorence DB. Per ADR 0004 amendment 2026-05-11 (ENG-257 closeout), the four-collection scope is the canonical permanent posture: two collections back the runtime tier-fallback path on prod, and two back periodic audit re-validation cycles (ENG-230, future ENG-231 refresh cycles). All four share the same non-PHI data classification and the same AWS PrivateLink network path. Two reasons we ship the tight version:

  • Defense in depth. Even though the data classification policy + Phase 1 static guard (scripts/audit/staging-collections-guard.ts) keep the staging cluster non-PHI, the principle of least privilege says the cross-cluster reader should only see what it needs.
  • Phase 2 nightly drift check (scripts/audit/staging-cluster-drift.ts) audits this exact role shape every night via the Atlas Admin API. Any drift (extra collection grant, wider action like INSERT, additional role on the user, escalation to a built-in like read/readWrite) opens a P1 GitHub issue.

If you ever need to widen the role (new cross-cluster collection, etc.), update both the role on Atlas AND the constants STAGING_REFERENCE_READ_COLLECTIONS (src/lib/db.ts) + EXPECTED_READ_COLLECTIONS_SCRIPT_COPY (scripts/audit/staging-cluster-drift.ts) in lock-step. Otherwise the next nightly run flags the change as drift.
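The lock-step requirement comes down to a set comparison between what the role grants and what the code expects. The sketch below illustrates that comparison in shell. It is not the real audit, which lives in scripts/audit/staging-cluster-drift.ts, and the function name collection_drift and the sample inputs are hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative only: the real nightly audit is scripts/audit/staging-cluster-drift.ts.
# Compare the collections a role grants against the expected list and report
# anything extra or missing.
EXPECTED=(formularies_staging providers_staging plans mrpuf_issuers_staging)

collection_drift() {
  # "$@" = collections currently granted on the role
  local actual expected
  actual="$(printf '%s\n' "$@" | LC_ALL=C sort -u)"
  expected="$(printf '%s\n' "${EXPECTED[@]}" | LC_ALL=C sort -u)"
  # comm -23: lines only in the first input; comm -13: lines only in the second.
  LC_ALL=C comm -23 <(echo "$actual") <(echo "$expected") | sed 's/^/extra: /'
  LC_ALL=C comm -13 <(echo "$actual") <(echo "$expected") | sed 's/^/missing: /'
}

# A role that was widened with one extra collection:
collection_drift formularies_staging providers_staging plans mrpuf_issuers_staging agents
```

An empty result means the role and the expected list agree. Any extra or missing line is the kind of difference the nightly run would report as drift.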

Emergency rollback (revert to the historical built-in read@askflorence, e.g. if Phase 2 audit ever breaks production cross-cluster reads):

bash
atlas dbusers update app_read_staging \
  --role read@${DB_NAME} \
  --projectId $PROJECT_ID

This is reversible, and the cross-cluster reader keeps working immediately. It does suspend the data-classification posture, though: the user can read anything in the askflorence DB on the staging cluster. Investigate and revert the rollback as soon as the audit is fixed.

Step C — Resolve the private connection string ​

After AWS creates the VPC endpoint and Atlas approves the connection (per infra/envs/prod/atlas-staging-privatelink.tf), Atlas issues a private SRV connection string:

bash
atlas privateEndpoints aws describe 69fe75c5b02c024f32d2af50 \
  --projectId 69e31af12fd2c0aef51bbb41
# Look for endpointServiceName + connection string under interfaceEndpoints

The connection string format is:

mongodb+srv://app_read_staging:<password>@askflorence-staging-pl-0.<random>.mongodb.net/askflorence?retryWrites=true&w=majority
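Because this string embeds the password, it helps to mask it whenever the URI has to appear in a log, a brief, or a terminal capture. The sketch below is one way to do that, assuming the user:password@host form shown above. The helper redact_uri and the sample URI are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mask the password in a MongoDB URI so the string can be
# shown in logs or briefs. Assumes the user:password@host form.
redact_uri() {
  sed -E 's#^(mongodb(\+srv)?://[^:/@]+):[^@]*@#\1:***@#' <<<"$1"
}

# Placeholder URI, not a real credential.
redact_uri 'mongodb+srv://app_read_staging:EXAMPLEpw123@host.example.mongodb.net/askflorence?retryWrites=true&w=majority'
```

The host and query string stay readable, which is usually all a reviewer needs to confirm the right endpoint is in use.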

Step D — Store in prod Secrets Manager (project CMK encrypted) ​

bash
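# CONNECTION_STRING = the Step C private SRV string with the Step A password; <private-srv-host> is the host Atlas returned.
CONNECTION_STRING="mongodb+srv://app_read_staging:$(cat "$PW_FILE")@<private-srv-host>/askflorence?retryWrites=true&w=majority"
AWS_PROFILE=askflorence-prod aws secretsmanager put-secret-value \
  --secret-id prod/mongodb/reference-uri \
  --secret-string "$CONNECTION_STRING"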

The secret shell is created by Terraform (infra/envs/prod/secrets.tf) which sets kms_key_arn to the prod project CMK so the value is encrypted with our key, not the AWS-default key. Do not create the secret directly via aws secretsmanager create-secret — that uses the AWS-default KMS key and is inconsistent with the rest of our secret hygiene.

Step E — Wire the env binding ​

Already declared in infra/envs/prod/ecs.tf:

hcl
secrets_from_manager = {
  ...
  MONGODB_REFERENCE_URI = module.secrets.secret_arns["mongodb/reference-uri"]
}

The next ECS deploy bakes the binding into a fresh task def. Application code (getReferenceDb() in src/lib/db.ts) routes via MONGODB_REFERENCE_URI and falls back to MONGODB_URI when unset — dev + staging keep working without code changes.
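The fallback rule can be illustrated in shell with nested default expansion. This is a sketch of the rule only, not the actual getReferenceDb() implementation in src/lib/db.ts. The function name resolve_reference_uri and the example hosts are hypothetical, and treating an empty value the same as an unset one is an assumption of this sketch.

```shell
#!/usr/bin/env bash
# Shell illustration of the fallback rule only; the real logic is
# getReferenceDb() in src/lib/db.ts. resolve_reference_uri is a hypothetical name.
resolve_reference_uri() {
  # Prefer MONGODB_REFERENCE_URI; fall back to MONGODB_URI when it is unset or empty.
  printf '%s\n' "${MONGODB_REFERENCE_URI:-${MONGODB_URI:-}}"
}

MONGODB_URI='mongodb+srv://primary.example/askflorence'
unset MONGODB_REFERENCE_URI
resolve_reference_uri     # falls back to MONGODB_URI

MONGODB_REFERENCE_URI='mongodb+srv://reference.example/askflorence'
resolve_reference_uri     # uses the dedicated reference URI
```

This is why dev and staging need no change: they never set MONGODB_REFERENCE_URI, so reference reads go to their own cluster.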

Step F — Verify end-to-end ​

bash
# From a host that can reach prod ECS — typically a smoke test against
# askflorence.health that exercises the cross-cluster read path.
curl -sS -X POST https://askflorence.health/api/drugs/covered \
  -H 'Content-Type: application/json' \
  -d '{"rxcuis":["1364441"],"plan_id":"42261UT0060023","year":2026}'

Expected response: coverage=Covered, drug_tier=PreferredBrand. The drug_tier field is only populated by lookupStagingDrugTiers() reading from formularies_staging via the cross-cluster path — its presence is the proof.
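The check on drug_tier can be automated in a smoke script. The sketch below assumes the response is JSON with a string-valued drug_tier key. That shape is inferred from the field names above, not taken from the API contract, and both sample bodies and the helper has_drug_tier are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only when a response body carries a non-empty
# string drug_tier field. The JSON shape is assumed, not taken from the API.
has_drug_tier() {
  grep -Eq '"drug_tier"[[:space:]]*:[[:space:]]*"[^"]+"' <<<"$1"
}

SAMPLE_OK='{"coverage":"Covered","drug_tier":"PreferredBrand"}'   # illustrative body
SAMPLE_MISSING='{"coverage":"Covered","drug_tier":null}'          # illustrative body

has_drug_tier "$SAMPLE_OK"      && echo "cross-cluster read path OK"
has_drug_tier "$SAMPLE_MISSING" || echo "drug_tier missing: cross-cluster read not proven"
```

In a real run the body would come from the curl command above, for example by capturing its output into a variable first.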

Step G — Cleanup ​

  • Ensure $PW_FILE was deleted (rm -f "$PW_FILE").
  • Confirm the Secrets Manager secret has a populated value (aws secretsmanager get-secret-value --secret-id prod/mongodb/reference-uri --query 'SecretString' --output text should return a non-empty string; never paste the value into chat, a commit, or a log).
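Since that get-secret-value command prints the secret to the terminal, one option is to pipe its output through a length check so the value is never displayed. A minimal sketch, with a hypothetical helper secret_is_populated and a placeholder value standing in for the real command output:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether stdin is non-empty without echoing it.
secret_is_populated() {
  local bytes
  bytes="$(wc -c | tr -d ' ')"
  [[ "$bytes" -gt 0 ]]
}

# In practice, pipe the get-secret-value command from the bullet above into
# the helper. Demo with a placeholder value:
if printf '%s' 'placeholder-secret' | secret_is_populated; then
  echo "secret has a value"
fi
```

Only the byte count is inspected, so nothing sensitive lands in scrollback or CI logs.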

Step H — API key for nightly drift check (Phase 2) ​

Phase 2 of #100 / ENG-239 adds a nightly GitHub Actions workflow (.github/workflows/staging-cluster-drift.yml) that audits the live Atlas state of app_read_staging via the Atlas Admin API. To run the audit in CI, the workflow needs an Atlas Programmatic API key bound to GitHub Actions secrets.

One-time provisioning:

  1. Create the API key in Atlas Org settings. Requires Org Owner role.

    Atlas UI → Organization Settings → Access Manager → Applications → Create API Key
    Name:        gh-actions-staging-drift-check
    Org Permission: leave at default (no Org-level role)

    On the next page, when prompted for project access:

    Project:                     askflorence-staging (69e31af12fd2c0aef51bbb41)
    Project permission:          Project Read Only

    Do NOT grant Org-level permissions, and do NOT grant any other project. Project Read Only on staging is the only access this key needs (it reads app_read_staging's role + the role_reader_reference definition, nothing else).

  2. Capture the public + private key pair. The private key is shown ONCE at creation. Save to 1Password under Atlas — gh-actions-staging-drift-check.

  3. Bind to GitHub Actions secrets:

    bash
    gh secret set ATLAS_DRIFT_CHECK_PUBLIC_KEY  --body "<public-key>"
    gh secret set ATLAS_DRIFT_CHECK_PRIVATE_KEY --body "<private-key>"

    The workflow exposes these as MONGODB_ATLAS_PUBLIC_API_KEY + MONGODB_ATLAS_PRIVATE_API_KEY env vars at runtime — the atlas CLI consumes those automatically (no atlas auth login step needed in CI).

  4. Manually trigger to verify. First run should pass (the role is in canonical state):

    bash
    gh workflow run staging-cluster-drift.yml
    gh run list --workflow staging-cluster-drift.yml --limit 1

Rotating the key: Atlas does not auto-rotate Programmatic Keys. Rotate at least annually, scheduling the rotation during the quarterly review that touches STAGING_ALLOWED_COLLECTIONS. To rotate:

Atlas UI → Org Settings → Access Manager → Applications → gh-actions-staging-drift-check → ... → Rotate

Then re-run gh secret set for both keys with the new values. The old key auto-deactivates after the rotation grace period.

Drift-check rollback (emergency): if the nightly audit ever produces false positives (e.g. Atlas API schema change), disable the schedule by editing .github/workflows/staging-cluster-drift.yml and removing the schedule: block (keeping workflow_dispatch: for manual runs while debugging). The workflow itself does not modify Atlas state — it's a read-only audit — so disabling it carries zero security blast radius beyond losing the nightly check.

Rollback ​

bash
# Atlas:
atlas dbusers delete app_read_staging \
  --projectId 69e31af12fd2c0aef51bbb41 --force

# AWS Secrets Manager (30-day recovery window):
AWS_PROFILE=askflorence-prod aws secretsmanager delete-secret \
  --secret-id prod/mongodb/reference-uri \
  --recovery-window-in-days 30

To fully tear down the cross-cluster path including the AWS PrivateLink endpoint + security group, see the rollback section in the Phase 11 session log.


AskFlorence Internal Documentation. Not for public distribution.
