ADR 0006 — MongoDB user simplification (4-user functional model)

Status

Accepted — 2026-05-12 (ENG-279). Supersedes ADR 0003.

Context

ADR 0003 provisioned five custom roles + six narrow-scoped database users (app_writer_survey, app_writer_plans, app_writer_agents, app_admin_agents, audit_reader) on the staging Atlas project. Subsequent issues added more narrow users (app_writer_waitlist, app_writer_hubspot_sync, app_read_local_staging, app_read_staging, app_admin_schema) until the live user count reached 10 on staging + 4 on prod = 14 active database users by 2026-05-11.

The narrow-scoped model produced three silent regressions in the same week:

Issue	Cause	Impact
ENG-271	Narrowed staging `MONGODB_URI` to `app_read_local_staging` (lacked FIND on `providers_staging`, `formularies_staging`); `getReferenceDb()` silently fell back to it	`/api/providers/covered` + `/api/drugs/covered` returned HTTP 500 on apex (the YC-application surface). Caught only when founder typed a real doctor name.
ENG-272	Added `app_read_local_staging` + removed silent fallback; fixed deployed staging but missed canonical `.env.local`	Local dev pipeline ran against the wrong user; calculator regression scripts gave misleading results until manually corrected
ENG-279 (this ADR)	Discovered the local-side miss while wrapping up ENG-214 compliance docs	Surfaced the systemic pattern: every narrow grant creates a new way to mis-bind

The pattern is over-segmentation — exactly the anti-pattern MongoDB's own documentation calls out:

Use built-in roles before custom roles
Custom roles at DB level before collection level
Per-tenant data isolation via views + JWT in app middleware, NOT per-tenant DB users

Pre-Phase-5, askflorence content is public CMS marketplace data + agent waitlist PII. No row-level filtering is required at the DB layer — a DB-wide read grant for reads and readWrite for writes is the right scope.

Decision

Roll back to a 4-user functional model per cluster (= 8 active users across both clusters):

User	Role	Privileges	Env var bindings
`app_read`	Built-in `read@askflorence`	DB-wide FIND on `askflorence`	`MONGODB_URI` (prod + staging + local primary); `MONGODB_REFERENCE_URI` (prod cross-cluster via PrivateLink to staging cluster's `app_read`; staging local; local dev)
`app_write`	Built-in `readWrite@askflorence`	DB-wide readWrite on `askflorence`	`MONGODB_WRITE_URI` (all envs)
`app_audit_writer`	Custom `role_audit_writer` — FIND + INSERT on `agent_audit_log` only	Append-only (ADR 0002 preserved verbatim)	`MONGODB_AUDIT_WRITE_URI` (all envs); consumed via `getAuditDb()` helper in `src/lib/db.ts`; no Phase-1-4 consumer (Phase 5 audit writers drop in)
`app_admin_schema`	Custom `role_admin_schema` — FIND + CREATE/LIST/DROP INDEX on `agent_waitlist_submissions` + `agent_survey_responses` + `hubspot_sync_log`	Same as today (ENG-266 architectural separation preserved)	`MONGODB_URI_ADMIN_SCHEMA` (staging + prod); wired only to deploy-time ECS RunTask, NOT to the runtime app — keeps runtime task role from holding index-mgmt grants

The 5 deprecated env vars (MONGODB_URI_PLANS_WRITE, _SURVEY_WRITE, _WAITLIST_WRITE, _HUBSPOT_SYNC_WRITE, _AGENTS_WRITE, _AGENTS_ADMIN, _AUDIT_READ) collapse to MONGODB_WRITE_URI (the writers) and MONGODB_URI (the audit read).

Append-only `agent_audit_log` (ADR 0002 preserved)

app_audit_writer holds a custom DB role (role_audit_writer) with FIND + INSERT on agent_audit_log ONLY. UPDATE + REMOVE return "not authorized on askflorence" at the MongoDB layer regardless of app code — same property ADR 0002 specified, now consolidated into a single user across both clusters. Verified at Phase B via direct probes: insert succeeds, UPDATE/REMOVE return MongoDB unauthorized errors.

Schema-admin runtime/deploy-time separation (ENG-266 preserved)

app_admin_schema stays as a 4th user (NOT folded into app_write) because ENG-266 Phase 3.5 deliberately split runtime-app credentials from deploy-time index-maintenance credentials. The ECS RunTask family that runs scripts/db/ensure-indexes.ts mounts this secret; the runtime ECS service does NOT. Folding it would re-introduce index-management privileges into the runtime task role's blast radius for zero functional gain.

Re-narrowing playbook for Phase 5 PHI (does NOT add new DB users)

Phase 5 introduces agent + member data containing PHI. The 4-user model holds by shifting the security boundary from "narrow DB user per collection" to JWT-in-middleware + MongoDB views:

JWT identity (app middleware). Authenticated agent requests carry a JWT signed by our auth layer (magic link + TOTP). Payload includes agent_id + agent_npn. Middleware extracts these into a request context. Admins carry a JWT with admin_id + admin_role.
MongoDB views. Per-tenant data isolation is enforced via views filtered on agent_id / member_id. Agent portal route handlers query the view, NOT the underlying collection. Views are created at agent provisioning time via scripts/db/setup-agent-views.ts (Phase 5 work). The grants on app_read / app_write stay unchanged.
Admin tier reads everything via underlying collections + WHERE clauses computed from their JWT role. Audit-log every admin read (app_audit_writer insert) — the audit trail IS the access-control evidence.
Audit-log writes go through app_audit_writer via MONGODB_AUDIT_WRITE_URI — append-only property preserved across agent portal, admin dashboard, super-admin path.
Super-admin uses the /sa-login flow (CLAUDE.md): password + TOTP + IP allowlist. Connects with an Atlas user not held by any task role — break-glass operation only.

ENG-279 acceptance criteria cap total users at ≤ 5 per cluster. The 4-user baseline plus an optional 5th app_admin_full at Phase 5 (if the admin dashboard needs writes that views cannot express) lands at the cap. Any 6th user requires a follow-up ADR.

Cross-cluster PrivateLink (ADR 0004) preserved

The prod app continues to read non-PHI public CMS reference data (formularies_staging, providers_staging, plans, mrpuf_issuers_staging) from the staging Atlas cluster via AWS PrivateLink. The user backing MONGODB_REFERENCE_URI rotates from app_read_staging (custom role scoped to 4 collections) to the new app_read (built-in read on the whole askflorence DB). All four collections remain non-PHI per the data-classification policy; the wider role doesn't change posture because the data classification still constrains what may live on the staging cluster (enforced by the staging-collections-guard CI workflow).

The defense-in-depth argument for the narrower custom role (ADR 0004 Step B) is replaced by:

Phase-1 static guard (staging-collections-guard.ts) continues to enforce the allow-list of collections that may be read cross-cluster
Phase-2 nightly drift check simplifies because the custom role no longer needs per-collection enumeration — just verify app_read@staging has the built-in read@askflorence role and nothing more
Cross-cluster reads never touch PHI because the staging cluster's data classification posture forbids PHI from living there in the first place

Data-ingestion coverage (ENG-230, ENG-231, ENG-236, ENG-252)

The new app_write user covers all four parallel data-ingestion workstreams without any role widening:

ENG-230 (done): re-validate §1311 staging audit — read-only, app_read covers it
ENG-231 (done): refresh cadence DESIGN — no DB writes
ENG-236 (in flight): LARK delta-aware MRF refresh pipeline — needs writes on mrf_file_state_staging (new), formularies_staging, providers_staging. Today's app_writer_plans LACKS these grants (latent blocker that ENG-279 simplification removes)
ENG-252 (in flight): plan-catalog refresh cadence investigation — reads + potentially writes on plans. Both covered by app_read + app_write.

Consequences

Positive:

13 active users → 8 active users (~40% drift-surface reduction)
The silent-regression class collapses because there are fewer narrow grants to mis-bind
Local dev works with vanilla MONGODB_URI + MONGODB_REFERENCE_URI — no env-var juggling required for calculator regression
ENG-236 + ENG-252 ingestion workstreams unblocked (the consolidated app_write covers mrf_file_state_staging and other emerging collections without role updates)
Append-only agent_audit_log property is now backed by a SINGLE consolidated user (app_audit_writer) rather than two split users (role_writer_agents + role_admin_agents) — simpler to verify, drift-check, document
ENG-266 runtime↔deploy-time separation preserved verbatim (app_admin_schema)

Accepted costs:

DB-wide write grant means a compromised app-tier write credential touches every collection. This is the explicit MongoDB pattern — defense in depth shifts up the stack to: (1) JWT + middleware + views for per-tenant isolation post-Phase-5; (2) ADR 0002 append-only property for the audit trail; (3) staging-collections-guard + Phase-2 drift check for data classification
Loss of the cross-cluster reader's per-collection role enumeration. Compensated by: data-classification policy + Phase-1 static guard ensuring nothing PHI ever lives on the staging cluster

Alternatives considered

Keep the narrow-scoped 10-role model (ADR 0003 status quo) — rejected. Source of three silent regressions in one week. MongoDB documentation specifically calls out the over-segmentation anti-pattern.
Per-tenant Mongo users at Phase 5 — rejected. Atlas isn't designed for per-tenant user provisioning at agent-portal scale; rotation + audit-trail complexity scales with N agents. Views + JWT scale O(1) at the DB layer.
Single app_full user with dbAdmin@askflorence — rejected. Indexes go through the deploy-time-only app_admin_schema for ENG-266 separation; lumping admin + runtime would re-introduce the blast-radius problem.
Eliminate the cross-cluster reader; replicate reference data into the prod cluster — would close the PrivateLink network path but doubles the storage cost on the prod M10 tier and adds an ingest sync to keep two copies aligned. Deferred to a future ADR if the cross-cluster path proves problematic.

References

Issue: ENG-279
Session brief: docs/briefs/SESSION_BRIEF_ENG-279_mongo-user-simplification.md
Pre-change baselines: /tmp/eng-279-baselines/SUMMARY.md (proof artifact retained until PR merges)
Superseded ADR: ADR 0003
Preserved ADRs: ADR 0001, ADR 0002, ADR 0004
Live state of record: docs/infrastructure/atlas-access-matrix.md (auto-generated from infra/atlas/access-matrix.ts)
MongoDB pattern reference: built-in roles, custom roles at DB level, views for filtering, JWT for tenant identity — see ENG-279 comment for full citation
Related: ENG-271, ENG-272 (the regression history); ENG-266 (app_admin_schema separation preserved)