Sensitive data handling — Member portal

Status: Draft. Phase A deliverable per ENG-187. Must be reviewed + signed off before Phase B sections that collect SSN / immigration documents / payment data ship code.

Owner: Taha (founder, CTO-of-record). Reviewer: Asad (CFO, compliance owner).

Scope: every field the member portal collects that, if leaked, would create an identity-theft, fraud, or HIPAA breach risk. Concretely: SSN, immigration document numbers, full DOB, full address, full income detail, payment account / card numbers. The control set below is the floor — Phase B sections may add controls but cannot remove any.

This doc lives under docs/security-compliance/ alongside encryption-policy.md, data-retention-policy.md, and access-control-policy.md. It's referenced from ENG-187 under the "Sensitive data handling" plan section.

1. Storage at rest

Field	At-rest treatment	Rationale
SSN	CSFLE field-level encryption with KMS-CMK-derived data keys, using the deterministic algorithm (AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic) so equality-match queries work for SSA verification re-runs	Highest-value identity theft vector. AWS BAA covers KMS. Driver-layer encryption means even direct Atlas queries by `app_write` return ciphertext
Immigration document numbers (I-551, I-94, I-766, etc.)	CSFLE field-level encryption, same key family as SSN	Same risk class. SAVE verification re-uses the value
Full DOB	CSFLE field-level encryption	Combined with name + ZIP this is a re-identification vector
Full home address	Plaintext in main collection, encrypted backups	Already widely-handled in agent flows; not a unique re-id vector on its own
Phone, email	Plaintext	Standard contact info
Income per source ($amount, frequency, employer name)	Plaintext in main collection	Member sees this in their portal as the editable record; FFM submission requires plaintext
Bank routing + account number	CSFLE field-level encryption, separate key from SSN family for blast-radius isolation	Direct fraud vector
Card primary account number (PAN)	Never stored in member_applications. PAN tokenized via payment vendor (Stripe / Square / chosen Phase B vendor) at moment of entry; we store only vendor-side token + last-4 + brand	PCI-DSS: storing PANs requires Level 1 attestation we don't want to assume

CSFLE is enforced at the MongoDB driver layer (autoEncryption with mongocryptd). Application code SETs values as plaintext; driver encrypts on write. Reads through the driver decrypt automatically. Direct Atlas queries (e.g. an Atlas admin running a query in the UI) return ciphertext for encrypted fields. This is the property we want for tamper / leak / insider-access containment.
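A minimal sketch of how the table above might translate into a schemaMap fragment for the driver's autoEncryption option. Only the deterministic SSN treatment and the separate bank key come from this doc. The field names, the "portal" database name, the string bsonType for every field, and the choice of deterministic encryption for document numbers and random encryption for DOB and bank fields are assumptions for illustration. Key IDs are passed in as opaque values; in a real deployment they would be BSON UUIDs from __keyVault.

```typescript
// Algorithm identifiers defined by MongoDB client-side field level encryption.
const DETERMINISTIC = "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic";
const RANDOM = "AEAD_AES_256_CBC_HMAC_SHA_512-Random";

// Opaque handle for a DEK id (assumption: a BSON UUID in a real deployment).
type KeyId = unknown;

interface EncryptRule {
  encrypt: { bsonType: "string"; algorithm: string; keyId: [KeyId] };
}

function rule(keyId: KeyId, algorithm: string): EncryptRule {
  return { encrypt: { bsonType: "string", algorithm, keyId: [keyId] } };
}

// Hypothetical field names and namespace; the real ones come from the Phase B schemas.
function buildSchemaMap(identityKey: KeyId, bankKey: KeyId) {
  return {
    "portal.member_applications": {
      bsonType: "object",
      properties: {
        // Deterministic so equality-match queries work (per the table above).
        ssn: rule(identityKey, DETERMINISTIC),
        // Assumption: same treatment as SSN, since it shares the key family.
        immigrationDocumentNumber: rule(identityKey, DETERMINISTIC),
        // Assumption: random encryption where no equality query is needed.
        dateOfBirth: rule(identityKey, RANDOM),
        // Separate key from the SSN family for blast-radius isolation.
        bankRoutingNumber: rule(bankKey, RANDOM),
        bankAccountNumber: rule(bankKey, RANDOM),
      },
    },
  };
}
```

The returned object would be passed as autoEncryption.schemaMap when constructing the client, alongside keyVaultNamespace and kmsProviders.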

Key management: KMS-CMK in the prod AWS account, alias alias/prod-member-portal-csfle-master, rotation enabled (annual). Data Encryption Keys (DEKs) are stored in a dedicated __keyVault collection per MongoDB CSFLE convention. DEKs are wrapped by the CMK.

Key rotation cadence: annual rotation of the master CMK (automatic). Re-encryption of existing data is not automatic, and envelope encryption does not require it: KMS retains prior key versions, so existing DEK wrappings still unwrap, and only new wrappings use the new key version. Member-portal application data has a 7-year retention horizon (per data-retention-policy.md); the re-encryption job is deferred unless a compromise indicator forces it.

2. Transport

TLS 1.2 minimum on every hop: viewer → CloudFront, CloudFront → ALB (custom-origin), ALB → ECS task. The ACM cert covering *.askflorence.health is used for both the CloudFront viewer cert AND the origin-side TLS handshake to ALB
Internal VPC traffic is TLS too: no plaintext on the wire even within our VPC. ALB → task uses HTTPS (port 443, with the certificate presented at the task per the ECS service config)
MongoDB Atlas connections use TLS 1.2 minimum, certificate-validated

3. Presentation to the member

For SSN and immigration document numbers — the two highest-value re-identification fields — apply this rule:

Default presentation: masked. When a portal page renders a previously-captured value, the value is masked: ***-**-1234 for SSN, A123 4567 **** style for document numbers. The full value is never sent to the browser on the standard read path.
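The SSN masking rule above can be sketched as a small server-side helper. The function name maskSsn is hypothetical; the point is that masking happens before the response is built, so only the last four digits ever leave the server on the standard read path.

```typescript
// Mask an SSN for display, keeping only the last four digits.
// Accepts input with or without separators; rejects anything that is not 9 digits.
function maskSsn(ssn: string): string {
  const digits = ssn.replace(/\D/g, "");
  if (digits.length !== 9) {
    // The error message deliberately omits the input value.
    throw new Error("SSN must contain exactly 9 digits");
  }
  return `***-**-${digits.slice(-4)}`;
}

console.log(maskSsn("123-45-6789")); // ***-**-6789
```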

Edit path: step-up verification. When the member clicks "edit my SSN" or "edit my immigration document":

The page presents a step-up verification challenge (re-enter password OR re-do magic link OR TOTP if enabled)
On successful challenge, server returns the unmasked value to the browser ONE TIME, in the edit form
After submit (or cancel), the page reverts to masked display
The unmasked-value response includes a Cache-Control: no-store + Clear-Site-Data: "cache" header to prevent retention in browser cache
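As an illustration of the last step, the two headers on the unmasked-value response might be built as below. The helper name is hypothetical. Note that Clear-Site-Data directives are quoted strings, so the double quotes are part of the header value.

```typescript
// Headers for the one-time unmasked-value response.
function revealResponseHeaders(): Record<string, string> {
  return {
    // Tell the browser and intermediaries not to store the response at all.
    "Cache-Control": "no-store",
    // The directive must be sent as a quoted string: "cache" including the quotes.
    "Clear-Site-Data": '"cache"',
  };
}
```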

No "always-visible" mode, ever. Members who want to verify their SSN with a third party (employer, bank) must reveal it through the step-up path.

For DOB: shown in full (it's already on the calculator and on every form section the member filled). For income: shown in full (the member wrote it). For address: shown in full. For payment fields: vendor-tokenized; we only show **** 1234 · Visa (last-4 + brand).

4. Step-up verification before reveal

The challenge required to unmask a sensitive field:

Phase	Challenge
MVP-1 (single-factor)	Re-enter email + receive magic link + click within 10 min
Phase F (MFA on)	Magic link + TOTP / hardware-key challenge

The challenge response token is single-use, 10-min TTL, bound to the specific field reveal (scope: "reveal_ssn" in the JWT claim). Re-using the token for a different reveal action fails server-side validation.
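A minimal sketch of the server-side checks described above, run after the JWT signature has already been verified (signature verification is out of scope here). The claim names jti and sub, the account-binding check, and the in-memory set are assumptions; a real deployment would need a store shared across ECS tasks for single-use enforcement.

```typescript
interface RevealClaims {
  jti: string;   // unique token id, used for single-use enforcement (assumed claim)
  sub: string;   // account the token was issued to (assumed claim)
  scope: string; // e.g. "reveal_ssn"
  exp: number;   // expiry in seconds since epoch (issued with a 10-min TTL)
}

type RevealCheck =
  | { ok: true }
  | { ok: false; reason: "expired" | "wrong_scope" | "wrong_account" | "already_used" };

// In-memory stand-in for a shared store of consumed token ids.
const usedTokenIds = new Set<string>();

function checkRevealToken(
  claims: RevealClaims,
  requiredScope: string,
  accountId: string,
  nowMs: number = Date.now(),
): RevealCheck {
  if (claims.exp * 1000 <= nowMs) return { ok: false, reason: "expired" };
  // A token minted for one reveal action cannot be replayed for another.
  if (claims.scope !== requiredScope) return { ok: false, reason: "wrong_scope" };
  if (claims.sub !== accountId) return { ok: false, reason: "wrong_account" };
  if (usedTokenIds.has(claims.jti)) return { ok: false, reason: "already_used" };
  usedTokenIds.add(claims.jti); // consume the token on first successful use
  return { ok: true };
}
```

The token is consumed only after every other check passes, so a request with the wrong scope does not burn a token that is still valid for its intended reveal.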

5. Logging discipline

Application logs MUST NEVER include sensitive field values. The deny-list is enforced at the structured-logger layer (src/lib/logger.ts — to be created Phase A). Deny-listed property names:

ssn, ssn_last_four, immigration_document_number, immigration_doc_number,
date_of_birth, dob, bank_routing_number, bank_account_number,
card_pan, card_number, card_cvv, card_exp

The logger redacts these property names recursively in any object passed to logger.{info,warn,error} calls — value becomes [REDACTED]. A CI check (Phase B) scans logger.* call sites for hand-built strings that include deny-listed substrings.
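The recursive redaction described above could look roughly like this. It is a sketch, not the contents of src/lib/logger.ts (which does not exist yet): the function name redact, exact case-sensitive key matching, and the "[CIRCULAR]" marker are assumptions.

```typescript
// Deny-listed property names, copied from the list above.
const DENY_LIST = new Set([
  "ssn", "ssn_last_four", "immigration_document_number", "immigration_doc_number",
  "date_of_birth", "dob", "bank_routing_number", "bank_account_number",
  "card_pan", "card_number", "card_cvv", "card_exp",
]);

// Return a copy of `value` with every deny-listed property replaced by "[REDACTED]".
// Objects that were already visited (cycles or repeated references) become "[CIRCULAR]".
function redact(value: unknown, seen: WeakSet<object> = new WeakSet()): unknown {
  if (value === null || typeof value !== "object") return value;
  if (seen.has(value)) return "[CIRCULAR]";
  seen.add(value);
  if (Array.isArray(value)) return value.map((item) => redact(item, seen));
  const out: Record<string, unknown> = {};
  for (const [key, inner] of Object.entries(value)) {
    // Redact by key name; the value is never inspected or logged.
    out[key] = DENY_LIST.has(key) ? "[REDACTED]" : redact(inner, seen);
  }
  return out;
}
```

Matching on key names only means a sensitive value stored under an unexpected key, or interpolated into a message string, passes through. That gap is what the Phase B CI check on hand-built strings is meant to cover. This sketch also copies only enumerable own properties, so values such as Date instances would need explicit handling in a real logger.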

Stack traces are stripped of all query, body, headers.cookie, headers.authorization fields before being shipped to CloudWatch / observability backends.

No PHI in error messages returned to clients. Server errors that bubble from validation must use generic copy ("We couldn't save your changes — please try again or contact support").

6. Backup + restore

Encrypted Atlas backups remain encrypted at rest. The MongoDB CSFLE master key (KMS-CMK) is also backed up — losing it means permanent loss of all sensitive data.

Dual-control restore: restoring from backup requires (a) Atlas project admin (Taha) + (b) AWS KMS-CMK access (Taha; Asad as backup). Both controls are documented in access-control-policy.md. The restore runbook lives alongside break-glass-root-login.md and atlas-user-provisioning.md — to be created Phase A as member-portal-restore.md in the same runbook directory.

7. Egress controls

Sensitive fields NEVER appear in:

HubSpot sync — member-portal data has zero HubSpot egress. The HubSpot sync worker explicitly skips the member_applications collection. Codified in src/lib/hubspot-sync.ts allowlist (collection-scoped, not field-scoped, for defense-in-depth)
First-party analytics events (OpenPanel) — event names + properties use only sanitized fields (step_key, section_completed, submission_status, bucketed values). NEVER include identity values, SSN, doc numbers, raw income, etc. (ADR 0009)
SES email content — email templates pull only the member's first name, the plan name, and the application ID. No SSN, DOB, income, or document numbers in email bodies
Outbound webhooks (Phase E+ FFM ack handler) — payloads sanitized via a separate outboundPayloadFilter before send

A CI check scans for hand-built JSON bodies that include sensitive field names.
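The collection-scoped HubSpot allowlist mentioned above might take a shape like the following. The collection names in the allowlist and the function name are hypothetical; the only grounded fact is that member_applications is never on the list.

```typescript
// Hypothetical allowlist: only collections named here may be synced to HubSpot.
// member_applications is deliberately absent, so it is blocked by default.
const HUBSPOT_SYNC_ALLOWLIST: ReadonlySet<string> = new Set([
  "marketing_leads",
  "newsletter_signups",
]);

// Throws unless the collection is explicitly allowlisted for HubSpot sync.
function assertSyncAllowed(collection: string): void {
  if (!HUBSPOT_SYNC_ALLOWLIST.has(collection)) {
    throw new Error(`HubSpot sync refused for collection "${collection}"`);
  }
}
```

An allowlist fails closed: a new collection added later is blocked until someone deliberately lists it, which a field-level deny-list would not guarantee.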

8. What we present vs what we hide

Field	Member can see in portal	Pattern
Full name	Yes	Always visible
Date of birth	Yes	Always visible
Sex	Yes	Always visible
Home address	Yes	Always visible
Phone	Yes	Always visible
Email	Yes	Always visible
SSN	Masked default; step-up to reveal	`***-**-1234`
Immigration doc number	Masked default; step-up to reveal	`A123 4567 ****`
Citizenship status	Yes	Always visible
Income detail (per source)	Yes	Always visible
Employer name + EIN	Yes	Always visible
Bank routing	Masked	`***0123`
Bank account	Masked default; step-up to reveal	`*****1234`
Card	Tokenized; last-4 + brand only	`**** 1234 · Visa`

For household members: same rules apply to each member's sensitive fields. The primary applicant can see masked summaries of all household members' fields (since they entered them); step-up reveal is per-member.

9. Audit trail

Every read of a sensitive field — even by the member themselves — appends an entry to agent_audit_log:

{
  event: "member_sensitive_field_accessed",
  actor: { type: "member" | "agent" | "system", accountId, sessionId },
  applicationId,
  fieldPath,                    // generic name, never the value
  revealedTo: "member_ui" | "agent_review" | "ffm_submission",
  timestamp
}

An auditor with the audit_reader role can reconstruct who saw what, and when. The audit log is append-only per ADR 0002 and tamper-evident.
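The entry shape above can be expressed as a TypeScript type with a small builder. This is a sketch: the string types for the IDs, the ISO-8601 timestamp format, and the builder name are assumptions, and the actual write to agent_audit_log is left out.

```typescript
type ActorType = "member" | "agent" | "system";
type RevealTarget = "member_ui" | "agent_review" | "ffm_submission";

interface SensitiveFieldAccessEvent {
  event: "member_sensitive_field_accessed";
  actor: { type: ActorType; accountId: string; sessionId: string };
  applicationId: string;
  fieldPath: string; // generic name such as "ssn", never the value
  revealedTo: RevealTarget;
  timestamp: string; // assumption: ISO-8601 UTC string
}

// Build an audit entry; the shape has no slot for the field's value, only its path.
function buildAccessEvent(
  actor: SensitiveFieldAccessEvent["actor"],
  applicationId: string,
  fieldPath: string,
  revealedTo: RevealTarget,
  now: Date = new Date(),
): Readonly<SensitiveFieldAccessEvent> {
  const entry: SensitiveFieldAccessEvent = {
    event: "member_sensitive_field_accessed",
    actor,
    applicationId,
    fieldPath,
    revealedTo,
    timestamp: now.toISOString(),
  };
  return Object.freeze(entry); // shallow freeze to discourage mutation before the append
}
```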

10. Data retention

Per data-retention-policy.md, member-portal data retention windows:

State	Retention
Abandoned drafts (no email captured)	90 days from last update, then hard-delete
Abandoned drafts (email captured)	18 months from last update, then hard-delete
Submitted-but-not-yet-active	Through coverage year + 7 years (tax retention)
Active member	Through coverage year + 7 years post-termination
Florence conversation logs	Same as parent application doc
`agent_audit_log`	7 years minimum (HIPAA + EDE Year 9 default), 10 years for safety
Backups	90 days rolling; encrypted; subject to dual-control restore
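For the two abandoned-draft rows in the table above, the hard-delete date can be computed from the last update as sketched below. The state names and function name are hypothetical, and the calculation uses UTC calendar arithmetic as an assumption; data-retention-policy.md remains the authority on how the windows are measured.

```typescript
type DraftState = "abandoned_no_email" | "abandoned_email_captured";

// Compute the date at which an abandoned draft becomes eligible for hard-delete.
function hardDeleteDate(state: DraftState, lastUpdated: Date): Date {
  const due = new Date(lastUpdated.getTime()); // copy so the input is not mutated
  if (state === "abandoned_no_email") {
    due.setUTCDate(due.getUTCDate() + 90); // 90 days from last update
  } else {
    // 18 calendar months from last update. If the target month is shorter than
    // the start day (e.g. the 31st), JavaScript rolls over into the next month.
    due.setUTCMonth(due.getUTCMonth() + 18);
  }
  return due;
}

console.log(hardDeleteDate("abandoned_no_email", new Date("2025-01-01T00:00:00Z")).toISOString());
// 2025-04-01T00:00:00.000Z
```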

Deletion is hard-delete (not soft-delete) at TTL boundary. CSFLE-encrypted fields go to ciphertext-shredding at TTL — the DEK for that range is destroyed, rendering the ciphertext unrecoverable even with the master key.

A right-to-be-forgotten request that comes BEFORE the retention TTL fires triggers an immediate ciphertext-shredding path on the affected document, EXCEPT for the agent_audit_log entries which must be retained for compliance — those rows have the personallyIdentifiable: false invariant (only IDs + event types, no names/SSNs/DOBs).

11. Plan-of-record for Phase B section integration

Every Phase B section that collects a sensitive field must:

Tag the field in the per-section Zod schema with .brand<"sensitive">()
Add an entry to SENSITIVE_FIELDS registry in src/lib/portal/sensitive-fields.ts (drives the masking + step-up + logger deny-list)
Write an integration test asserting (a) masked display on read, (b) step-up challenge on edit, (c) audit-log entry on reveal, (d) deny-list scrub in logs
Document the field in a Phase B addendum to this doc
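One possible shape for the SENSITIVE_FIELDS registry named above, showing how a single entry could drive both the logger deny-list and the step-up decision. The entry layout, field keys, and helper names are assumptions; the presentation values mirror the table in section 8. The Zod .brand<"sensitive">() tagging is not shown here.

```typescript
type Presentation = "masked_step_up" | "masked_only";

interface SensitiveFieldSpec {
  logKeys: readonly string[]; // property names the logger must redact
  presentation: Presentation; // how the portal renders a stored value
}

// Hypothetical registry entries; Phase B sections would add their own.
const SENSITIVE_FIELDS: Record<string, SensitiveFieldSpec> = {
  ssn: { logKeys: ["ssn", "ssn_last_four"], presentation: "masked_step_up" },
  immigrationDocumentNumber: {
    logKeys: ["immigration_document_number", "immigration_doc_number"],
    presentation: "masked_step_up",
  },
  bankRoutingNumber: { logKeys: ["bank_routing_number"], presentation: "masked_only" },
  bankAccountNumber: { logKeys: ["bank_account_number"], presentation: "masked_step_up" },
};

// Derive the logger deny-list from the registry so the two cannot drift apart.
function loggerDenyList(): Set<string> {
  return new Set(Object.values(SENSITIVE_FIELDS).flatMap((spec) => [...spec.logKeys]));
}

// True when editing or revealing the field requires a step-up challenge.
function requiresStepUp(field: string): boolean {
  return SENSITIVE_FIELDS[field]?.presentation === "masked_step_up";
}
```

Deriving the deny-list from the registry means that registering a new field is the only step needed to get it scrubbed from logs.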

Sections that include sensitive fields (per the canonical 9-section scope):

Section 1 (primary applicant identity): SSN
Section 2 (household composition): per-member SSN, DOB
Section 3 (citizenship / immigration): immigration document numbers
Section 9 (payment): bank or card details

Section 4 (income) does NOT collect a directly-sensitive field per this doc's definition (income amount is not a re-identification vector on its own), but it IS subject to the HubSpot-egress block and the first-party-analytics no-PHI rule (OpenPanel: bucketed income only, never raw).

12. Open questions for sign-off

[ ] Approve the masked-by-default + step-up-reveal pattern for SSN and document numbers, OR opt in to a different pattern (e.g. "never show, even on edit; always require re-entry")
[ ] Approve the 18-month retention for abandoned drafts with captured email — too long? too short?
[ ] Approve the dual-control restore designation: Taha + Asad. Need a third on-call backup before Asad is fully onboarded
[ ] Approve the payment vendor approach (Stripe / Square / other) — separate vendor decision, but it locks the PCI scope downstream

Sign-off

Reviewer	Role	Status	Date
Taha Abbasi	CTO-of-record	Pending	—
Asad Khalid	CFO / compliance owner	Pending	—

Once both sign off, link this doc from the Phase A PR and from the ENG-187 Linear issue.