Appearance
ADR 0006 — MongoDB user simplification (4-user functional model)
Status
Accepted — 2026-05-12 (ENG-279). Supersedes ADR 0003.
Context
ADR 0003 provisioned five custom roles + six narrow-scoped database users (app_writer_survey, app_writer_plans, app_writer_agents, app_admin_agents, audit_reader) on the staging Atlas project. Subsequent issues added more narrow users (app_writer_waitlist, app_writer_hubspot_sync, app_read_local_staging, app_read_staging, app_admin_schema) until the live user count reached 10 on staging + 4 on prod = 14 active database users by 2026-05-11.
The narrow-scoped model produced three silent regressions in the same week:
| Issue | Cause | Impact |
|---|---|---|
| ENG-271 | Narrowed staging MONGODB_URI to app_read_local_staging (lacked FIND on providers_staging, formularies_staging); getReferenceDb() silently fell back to it | /api/providers/covered + /api/drugs/covered returned HTTP 500 on apex (the YC-application surface). Caught only when founder typed a real doctor name. |
| ENG-272 | Added app_read_local_staging + removed silent fallback; fixed deployed staging but missed canonical .env.local | Local dev pipeline ran against the wrong user; calculator regression scripts gave misleading results until manually corrected |
| ENG-279 (this ADR) | Discovered the local-side miss while wrapping up ENG-214 compliance docs | Surfaced the systemic pattern: every narrow grant creates a new way to mis-bind |
The pattern is over-segmentation — exactly the anti-pattern MongoDB's own documentation calls out:
- Use built-in roles before custom roles
- Custom roles at DB level before collection level
- Per-tenant data isolation via views + JWT in app middleware, NOT per-tenant DB users
Pre-Phase-5, askflorence content is public CMS marketplace data + agent waitlist PII. No row-level filtering is required at the DB layer — a DB-wide read grant for reads and readWrite for writes is the right scope.
Decision
Roll back to a 4-user functional model per cluster (= 8 active users across both clusters):
| User | Role | Privileges | Env var bindings |
|---|---|---|---|
app_read | Built-in read@askflorence | DB-wide FIND on askflorence | MONGODB_URI (prod + staging + local primary); MONGODB_REFERENCE_URI (prod cross-cluster via PrivateLink to staging cluster's app_read; staging local; local dev) |
app_write | Built-in readWrite@askflorence | DB-wide readWrite on askflorence | MONGODB_WRITE_URI (all envs) |
app_audit_writer | Custom role_audit_writer — FIND + INSERT on agent_audit_log only | Append-only (ADR 0002 preserved verbatim) | MONGODB_AUDIT_WRITE_URI (all envs); consumed via getAuditDb() helper in src/lib/db.ts; no Phase-1-4 consumer (Phase 5 audit writers drop in) |
app_admin_schema | Custom role_admin_schema — FIND + CREATE/LIST/DROP INDEX on agent_waitlist_submissions + agent_survey_responses + hubspot_sync_log | Same as today (ENG-266 architectural separation preserved) | MONGODB_URI_ADMIN_SCHEMA (staging + prod); wired only to deploy-time ECS RunTask, NOT to the runtime app — keeps runtime task role from holding index-mgmt grants |
The 5 deprecated env vars (MONGODB_URI_PLANS_WRITE, _SURVEY_WRITE, _WAITLIST_WRITE, _HUBSPOT_SYNC_WRITE, _AGENTS_WRITE, _AGENTS_ADMIN, _AUDIT_READ) collapse to MONGODB_WRITE_URI (the writers) and MONGODB_URI (the audit read).
Append-only agent_audit_log (ADR 0002 preserved)
app_audit_writer holds a custom DB role (role_audit_writer) with FIND + INSERT on agent_audit_log ONLY. UPDATE + REMOVE return "not authorized on askflorence" at the MongoDB layer regardless of app code — same property ADR 0002 specified, now consolidated into a single user across both clusters. Verified at Phase B via direct probes: insert succeeds, UPDATE/REMOVE return MongoDB unauthorized errors.
Schema-admin runtime/deploy-time separation (ENG-266 preserved)
app_admin_schema stays as a 4th user (NOT folded into app_write) because ENG-266 Phase 3.5 deliberately split runtime-app credentials from deploy-time index-maintenance credentials. The ECS RunTask family that runs scripts/db/ensure-indexes.ts mounts this secret; the runtime ECS service does NOT. Folding it would re-introduce index-management privileges into the runtime task role's blast radius for zero functional gain.
Re-narrowing playbook for Phase 5 PHI (does NOT add new DB users)
Phase 5 introduces agent + member data containing PHI. The 4-user model holds by shifting the security boundary from "narrow DB user per collection" to JWT-in-middleware + MongoDB views:
- JWT identity (app middleware). Authenticated agent requests carry a JWT signed by our auth layer (magic link + TOTP). Payload includes
agent_id+agent_npn. Middleware extracts these into a request context. Admins carry a JWT withadmin_id+admin_role. - MongoDB views. Per-tenant data isolation is enforced via views filtered on
agent_id/member_id. Agent portal route handlers query the view, NOT the underlying collection. Views are created at agent provisioning time viascripts/db/setup-agent-views.ts(Phase 5 work). The grants onapp_read/app_writestay unchanged. - Admin tier reads everything via underlying collections + WHERE clauses computed from their JWT role. Audit-log every admin read (
app_audit_writerinsert) — the audit trail IS the access-control evidence. - Audit-log writes go through
app_audit_writerviaMONGODB_AUDIT_WRITE_URI— append-only property preserved across agent portal, admin dashboard, super-admin path. - Super-admin uses the
/sa-loginflow (CLAUDE.md): password + TOTP + IP allowlist. Connects with an Atlas user not held by any task role — break-glass operation only.
ENG-279 acceptance criteria cap total users at ≤ 5 per cluster. The 4-user baseline plus an optional 5th app_admin_full at Phase 5 (if the admin dashboard needs writes that views cannot express) lands at the cap. Any 6th user requires a follow-up ADR.
Cross-cluster PrivateLink (ADR 0004) preserved
The prod app continues to read non-PHI public CMS reference data (formularies_staging, providers_staging, plans, mrpuf_issuers_staging) from the staging Atlas cluster via AWS PrivateLink. The user backing MONGODB_REFERENCE_URI rotates from app_read_staging (custom role scoped to 4 collections) to the new app_read (built-in read on the whole askflorence DB). All four collections remain non-PHI per the data-classification policy; the wider role doesn't change posture because the data classification still constrains what may live on the staging cluster (enforced by the staging-collections-guard CI workflow).
The defense-in-depth argument for the narrower custom role (ADR 0004 Step B) is replaced by:
- Phase-1 static guard (
staging-collections-guard.ts) continues to enforce the allow-list of collections that may be read cross-cluster - Phase-2 nightly drift check simplifies because the custom role no longer needs per-collection enumeration — just verify
app_read@staginghas the built-inread@askflorencerole and nothing more - Cross-cluster reads never touch PHI because the staging cluster's data classification posture forbids PHI from living there in the first place
Data-ingestion coverage (ENG-230, ENG-231, ENG-236, ENG-252)
The new app_write user covers all four parallel data-ingestion workstreams without any role widening:
- ENG-230 (done): re-validate §1311 staging audit — read-only,
app_readcovers it - ENG-231 (done): refresh cadence DESIGN — no DB writes
- ENG-236 (in flight): LARK delta-aware MRF refresh pipeline — needs writes on
mrf_file_state_staging(new),formularies_staging,providers_staging. Today'sapp_writer_plansLACKS these grants (latent blocker that ENG-279 simplification removes) - ENG-252 (in flight): plan-catalog refresh cadence investigation — reads + potentially writes on
plans. Both covered byapp_read+app_write.
Consequences
Positive:
- 13 active users → 8 active users (~40% drift-surface reduction)
- The silent-regression class collapses because there are fewer narrow grants to mis-bind
- Local dev works with vanilla
MONGODB_URI+MONGODB_REFERENCE_URI— no env-var juggling required for calculator regression - ENG-236 + ENG-252 ingestion workstreams unblocked (the consolidated
app_writecoversmrf_file_state_stagingand other emerging collections without role updates) - Append-only
agent_audit_logproperty is now backed by a SINGLE consolidated user (app_audit_writer) rather than two split users (role_writer_agents+role_admin_agents) — simpler to verify, drift-check, document - ENG-266 runtime↔deploy-time separation preserved verbatim (
app_admin_schema)
Accepted costs:
- DB-wide write grant means a compromised app-tier write credential touches every collection. This is the explicit MongoDB pattern — defense in depth shifts up the stack to: (1) JWT + middleware + views for per-tenant isolation post-Phase-5; (2) ADR 0002 append-only property for the audit trail; (3)
staging-collections-guard+ Phase-2 drift check for data classification - Loss of the cross-cluster reader's per-collection role enumeration. Compensated by: data-classification policy + Phase-1 static guard ensuring nothing PHI ever lives on the staging cluster
Alternatives considered
- Keep the narrow-scoped 10-role model (ADR 0003 status quo) — rejected. Source of three silent regressions in one week. MongoDB documentation specifically calls out the over-segmentation anti-pattern.
- Per-tenant Mongo users at Phase 5 — rejected. Atlas isn't designed for per-tenant user provisioning at agent-portal scale; rotation + audit-trail complexity scales with N agents. Views + JWT scale O(1) at the DB layer.
- Single
app_fulluser withdbAdmin@askflorence— rejected. Indexes go through the deploy-time-onlyapp_admin_schemafor ENG-266 separation; lumping admin + runtime would re-introduce the blast-radius problem. - Eliminate the cross-cluster reader; replicate reference data into the prod cluster — would close the PrivateLink network path but doubles the storage cost on the prod M10 tier and adds an ingest sync to keep two copies aligned. Deferred to a future ADR if the cross-cluster path proves problematic.
References
- Issue: ENG-279
- Session brief:
docs/briefs/SESSION_BRIEF_ENG-279_mongo-user-simplification.md - Pre-change baselines:
/tmp/eng-279-baselines/SUMMARY.md(proof artifact retained until PR merges) - Superseded ADR: ADR 0003
- Preserved ADRs: ADR 0001, ADR 0002, ADR 0004
- Live state of record:
docs/infrastructure/atlas-access-matrix.md(auto-generated frominfra/atlas/access-matrix.ts) - MongoDB pattern reference: built-in roles, custom roles at DB level, views for filtering, JWT for tenant identity — see ENG-279 comment for full citation
- Related: ENG-271, ENG-272 (the regression history); ENG-266 (
app_admin_schemaseparation preserved)