Incident Response Plan

Status: Active. Effective 2026-05-11. Owner: Taha Abbasi (Incident Commander) + Asad Khalid (Compliance Liaison) + Ian Friend (Comms Lead). Reviewed: Annually + after every incident rated SEV-2 or more severe, as part of the post-mortem.

Purpose

Defines roles, escalation paths, regulatory notification timelines, and the operational playbook for security and privacy incidents. Required artifact for:

HIPAA §164.308(a)(6) — security incident procedures
HIPAA Breach Notification Rule (45 CFR §164.400-414) — 60-day window from discovery; HHS + affected individuals + (if >500) media notification
CMS EDE Phase 3 / MARS-E 2.2 IR-1 through IR-8 — incident response policy + procedures + handling + monitoring + reporting + assistance + plan + spillage response
SOC 2 CC7.3 — incident response; CC7.4 — disclosure
NIST 800-53 R4 Moderate IR-family controls
State breach notification statutes (varies; default to longest applicable — CA = 30 days for affected residents, others vary)

The operational playbook lives at docs/runbooks/security-incident-response.md. This document defines roles, criteria, and timelines; the runbook is the moment-of-incident first-responder checklist.

Roles

Role	Person	Backup	Responsibilities
Incident Commander	Taha Abbasi	Asad Khalid (until a second technical admin is provisioned, see risk R-001)	Owns the incident end-to-end. Triggers containment, drives RCA, owns the post-mortem.
Compliance Liaison	Asad Khalid	Taha Abbasi	Owns regulatory clock — calculates Breach Notification Rule timing, drafts HHS / state-AG / affected-individual notifications. Owns vendor BAA + sub-processor coordination.
Comms Lead	Ian Friend	Asad Khalid	Owns external comms — affected-individual emails, public statement (if needed), agent + investor comms.
Engineering Responder	Whoever is most relevant to the incident's blast radius	All engineers	Hands-on remediation, evidence preservation, technical RCA
Customer + Agent Liaison	Ian Friend	Asad Khalid	Direct contact with affected agents / future members. Owns the "what to tell them" decision matrix.

Severity classification

Severity	Definition	Notification window (internal)	Examples
SEV-0	Confirmed PHI breach OR active exploit OR site fully down to customers	Immediate (within 15 min of detection — page the IC)	Database publicly accessible; ransomware; credential dump on the dark web
SEV-1	Suspected PHI breach OR significant data integrity issue OR auth/MFA bypass OR vendor BAA breach	Within 1 hour	Suspect lateral movement; suspected SQL injection; vendor reports BAA violation
SEV-2	Limited data exposure OR availability degradation OR misconfiguration with potential breach implication	Within 4 hours	A `GET /api/waitlist` triggers spurious emails; PostHog free-tier captures unexpected data; an admin login from an unrecognized IP
SEV-3	Operational defect with potential compliance implication but no immediate data exposure	Within 24 hours	Quarterly access review missed; BAA expiration not renewed in time; CI guard fails on a PR

PHI involvement upgrades the severity by one tier minimum. When in doubt, classify higher.
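
The classification rule and the internal notification windows can be expressed as a small lookup. The following is a minimal sketch, not an existing tool: the function names are hypothetical, severity is modelled as an integer (0 for SEV-0), and the PHI rule is applied as exactly one tier, which is only the floor the plan sets. A responder can always classify higher.

```python
from datetime import datetime, timedelta, timezone

# Internal notification windows, copied from the severity table.
NOTIFY_WINDOW = {
    0: timedelta(minutes=15),
    1: timedelta(hours=1),
    2: timedelta(hours=4),
    3: timedelta(hours=24),
}


def effective_severity(base_sev: int, phi_involved: bool) -> int:
    """Apply the PHI rule: raise severity by one tier, never past SEV-0."""
    if base_sev not in NOTIFY_WINDOW:
        raise ValueError(f"unknown severity: SEV-{base_sev}")
    return max(0, base_sev - 1) if phi_involved else base_sev


def page_ic_by(detected_at: datetime, base_sev: int, phi_involved: bool) -> datetime:
    """Latest time by which the IC should have been notified."""
    sev = effective_severity(base_sev, phi_involved)
    return detected_at + NOTIFY_WINDOW[sev]


detected = datetime(2026, 5, 11, 9, 0, tzinfo=timezone.utc)
print(effective_severity(2, phi_involved=True))    # 1
print(page_ic_by(detected, 2, phi_involved=True))  # 2026-05-11 10:00:00+00:00
```

Keeping the windows in one table means a change to the policy is a one-line change to the lookup.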

Detection sources

The incident-response practice depends on these signals being monitored:

Source	What it surfaces	Status today
AWS GuardDuty	Account-level threat detection (anomalous API patterns, credential misuse, malware)	Active, org-wide, log-archive aggregation
AWS Security Hub	Cross-service findings (CIS, FSBP, NIST 800-53 standards)	Active, org-wide
AWS CloudTrail	All AWS API activity, 7-year retention in log-archive S3	Active, org-wide
Atlas database audit	DB-layer auth attempts + admin actions	Active on M10 HIPAA tier (prod) + M30 (staging)
`agent_audit_log` collection	App-layer auth + admin actions + future per-record PHI access	Live (append-only enforced at DB layer)
`staging-cluster-drift` workflow	Nightly drift check on cross-cluster reader role; opens P1 issue on drift	Active, daily 08:00 UTC
`staging-collections-guard` workflow	Per-PR static guard on cross-cluster data classification	Active
`validate-secrets` workflow	Per-PR check for malformed secrets (the bug class that broke Resend)	Active
Vercel / AWS WAF logs	L7 attack patterns, blocked requests	Active (separate WAF rule-exclusion item pending for PostHog crawlers)
Customer / agent report	External party emails `security@askflorence.health` or `taha@`	Always active; no formal triage queue yet
Founder direct observation	"Email's not sending," "the cost spiked," "this CTA does nothing"	The 2026-04-30 home CTA no-op + 2026-04-10 Resend incident were detected this way

Procedure (5-step lifecycle)

The runbook is the moment-of-incident checklist; this is the lifecycle framework.

Step 1 — Detect

Trigger source: any of the detection sources listed above. Page the IC within the internal notification window for the incident's severity.
The IC acknowledges within the same window; a text or call from any team member is an acceptable page.
Open a private incident channel (Slack DM thread or Google Chat space — 🚨 sev-N incident <date> <short slug>).

Step 2 — Contain

Block the immediate vector. Examples:
- Revoke compromised credentials (Secrets Manager update-secret + IAM key rotation)
- Disable the affected route (/api/... 503 toggle or feature flag)
- Quarantine the affected ECS task / Lambda function
- Block the source IP at WAF or Atlas allowlist
Preserve evidence: do NOT delete logs and do NOT clean up. Snapshot the relevant Atlas cluster + S3 bucket BEFORE remediating if there is any chance the post-mortem will need them.

Step 3 — Assess

IC + Engineering Responder: what data was accessed? Which subjects affected? Scope of exposure?
Compliance Liaison: is this a HIPAA breach (45 CFR §164.402 definition)? If yes, the 60-day clock starts at discovery (not "investigation complete") — log the timestamp.
Time-bound the assessment. SEV-0/1 assessment in ≤ 24h; SEV-2 in ≤ 72h.

Step 4 — Notify

Recipient	Trigger	Timeline	Channel
Affected individuals	HIPAA breach involving their PHI	Within 60 days of discovery; state laws may require earlier (CA = 30 days for affected residents)	Letter or email per individual preference + HHS-mandated content
HHS Office for Civil Rights	HIPAA breach affecting any individual	Within 60 days of discovery (500 or more affected); within 60 days after the end of the calendar year of discovery (fewer than 500)	Online submission via OCR breach portal
Media	HIPAA breach affecting >500 individuals in a state	Within 60 days; "prominent media outlet" in the state	Press release
State Attorney General	Per state-specific law (CA, NY, etc.)	Varies — default to 30 days	Per state-specific procedure
CMS EDE program contact	EDE program-eligibility-relevant incident	Per EDE Phase 3 program requirements (once submitted)	EDE program portal
Vendor BAA partners	Incident involving their data flows	Per BAA terms (typically 30 days)	Per vendor contract
Investors + advisors	SEV-0 customer-facing incident	Same business day	Email + scheduled brief
Internal team	Any SEV-0/1	Immediate	Private Google Chat / Slack

Compliance Liaison owns the regulatory clock. A spreadsheet template lives in the runbook for tracking notification deadlines per breach.
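
The deadline arithmetic behind that tracking sheet can be sketched as follows. This is an illustration under stated assumptions, not the runbook's template. The function and key names are hypothetical. The 30-day figure is this plan's default for state-level notices rather than a survey of state law. The HHS threshold follows 45 CFR §164.408 (500 or more individuals) and the media threshold follows §164.406 (more than 500 residents of one state). All dates are outer bounds, because the rule also requires notice "without unreasonable delay".

```python
from datetime import date, timedelta

HIPAA_WINDOW = timedelta(days=60)
DEFAULT_STATE_WINDOW = timedelta(days=30)  # this plan's default for state-level notices


def notification_deadlines(discovered: date, affected: int,
                           max_residents_one_state: int,
                           ca_residents_affected: bool = False) -> dict[str, date]:
    """Return outer-bound deadlines keyed by recipient for a confirmed HIPAA breach."""
    deadlines = {
        "affected_individuals": discovered + HIPAA_WINDOW,
        "state_attorney_general": discovered + DEFAULT_STATE_WINDOW,
    }
    if ca_residents_affected:
        deadlines["affected_individuals_ca"] = discovered + DEFAULT_STATE_WINDOW
    if affected >= 500:
        deadlines["hhs_ocr"] = discovered + HIPAA_WINDOW
    else:
        # Fewer than 500: due 60 days after the end of the discovery calendar year.
        deadlines["hhs_ocr"] = date(discovered.year, 12, 31) + HIPAA_WINDOW
    if max_residents_one_state > 500:
        deadlines["media"] = discovered + HIPAA_WINDOW
    return deadlines


small = notification_deadlines(date(2026, 6, 1), affected=40, max_residents_one_state=25)
print(small["affected_individuals"])  # 2026-07-31
print(small["hhs_ocr"])               # 2027-03-01
```

The discovery date is the only clock input, which matches the rule that the clock starts at discovery rather than at the end of the investigation.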

Step 5 — Remediate + post-mortem

Implement the fix. Document the fix.
Within 5 business days of incident close, the IC files a post-mortem at docs/session-log/<date>-incident-<slug>.md covering:
- Timeline (detection → containment → notification → remediation)
- Root cause analysis
- Contributing factors
- What worked
- What didn't
- Preventive measures (with owners + due dates)
- Status of regulatory notifications
Post-mortem is reviewed at the next quarterly access review; preventive-measure follow-ups are tracked to closure.
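
The filing deadline and file location can be computed mechanically. The sketch below makes several assumptions: "business days" means Monday to Friday with public holidays ignored, the `<date>` in the path is the incident date in ISO format, and the helper names and the example slug are hypothetical.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Advance by `days` weekdays (Monday to Friday); public holidays are ignored."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


def postmortem_path(incident_date: date, slug: str) -> str:
    """Build the post-mortem file path from the template in this plan."""
    return f"docs/session-log/{incident_date.isoformat()}-incident-{slug}.md"


closed = date(2026, 4, 30)  # a Thursday
print(add_business_days(closed, 5))               # 2026-05-07
print(postmortem_path(closed, "home-cta-no-op"))  # docs/session-log/2026-04-30-incident-home-cta-no-op.md
```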

Documented incident history (worked examples)

These are the documented incidents AskFlorence has encountered. They are the IRP's worked examples: drilling against these scenarios builds the muscle for real ones.

2026-04-10 — Resend transactional email outage (SEV-2 in retrospect)

Detection: founder-side test send showed no delivery; downstream Vercel logs showed Resend 401s
Root cause: a literal \n character embedded in the RESEND_API_KEY Vercel env var, plus Resend domain DKIM CNAMEs that were never published; together these caused the updates.askflorence.health domain to fail Resend's status check
Impact: 3 weeks of broken transactional email on Vercel-hosted prod (waitlist confirmations + ops notifications). No external party complained because the volume was low pre-AWS-cutover; impact = ~30-40 lost confirmation emails to early waitlist signups
Resolution: decision to retire Resend in favor of AWS SES (v0.33.0 commits, 2026-04-30)
Preventive measure: validate-secrets CI workflow (now live)
What the IRP would have done differently: had this been classified SEV-1, we would have notified affected individuals (waitlist signups who never received their confirmation); the retrospective decision was to absorb the customer-trust impact rather than re-notify
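
The bug class behind this incident is easy to check for mechanically. The following is a minimal sketch of such a check, not the actual validate-secrets workflow, whose implementation is not reproduced in this plan. The function name and the sample values are hypothetical. Multi-line secrets such as PEM keys legitimately contain line breaks and would need an allowlist.

```python
def find_malformed_secrets(env: dict[str, str]) -> dict[str, str]:
    """Return {name: reason} for values that look corrupted by copy/paste."""
    problems: dict[str, str] = {}
    for name, value in env.items():
        if "\\n" in value or "\\r" in value:
            # Backslash followed by 'n' or 'r', stored as two literal characters.
            problems[name] = "contains a literal backslash escape sequence"
        elif value != value.strip():
            problems[name] = "has leading or trailing whitespace"
        elif "\n" in value or "\r" in value:
            problems[name] = "contains an embedded line break"
    return problems


sample = {
    "RESEND_API_KEY": "re_example\\n",  # fake value with a literal backslash-n
    "SES_REGION": "us-east-1",          # fake value, well formed
}
print(find_malformed_secrets(sample))
# {'RESEND_API_KEY': 'contains a literal backslash escape sequence'}
```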

2026-04-10 — `GET /api/waitlist` crawler-triggered SES sends (SEV-3)

Detection: founder observed unexpected SES Send metric values + spam-folder mail to a hardcoded address
Root cause: GET handler triggered a real SES send for every request; Googlebot / unfurlers / monitoring tools hit the URL ~15-25 times over 30 days
Impact: ~15-25 spurious emails to a hardcoded ops address. No PII / PHI exposure (no external recipient leak). Documented at commit 4422ca8
Preventive measure: engineering rule documented in CLAUDE.md — "no side-effect-triggering code in a GET handler unless gated on auth or NODE_ENV != production"
IRP role this exercised: detection + containment + post-mortem. No regulatory notification was needed (no PHI / PII involved beyond a single internal address).
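
The engineering rule quoted above reduces to a single predicate. The sketch below is framework-agnostic and written in Python only for illustration; the function name is hypothetical and the production code is not shown in this plan. It encodes the rule exactly as stated: non-GET methods are unrestricted, and a GET may trigger a side effect only when the caller is authenticated or the environment is not production.

```python
def side_effect_allowed(method: str, authenticated: bool, node_env: str) -> bool:
    """True when a handler may trigger a side effect such as sending email."""
    if method.upper() != "GET":
        return True
    return authenticated or node_env != "production"


# A crawler hitting the route in production: unauthenticated GET.
print(side_effect_allowed("GET", authenticated=False, node_env="production"))   # False
# The real signup path: a POST from the form.
print(side_effect_allowed("POST", authenticated=False, node_env="production"))  # True
```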

2026-04-30 — Home CTA no-op (SEV-2)

Detection: founder noticed signup count was zero in HubSpot post-deploy
Root cause: v0.29.0 home swap shipped a fake-success CTA handler that didn't actually call /api/waitlist
Impact: every click between v0.29.0 deploy and the fix produced no record (no Mongo row, no SES email, no PostHog event). Estimated 5-15 lost signups based on landing-page traffic
Preventive measure: post-deploy smoke testing of conversion-critical CTAs
IRP exercise: detection by founder observation; remediation in same session

2026-05-06 — CMS ingest cost spike (SEV-3)

Detection: monthly Atlas billing email; cost on M60 ~$2,800/mo
Root cause: full re-ingest pattern on M60 instead of delta-aware refresh
Impact: financial only — $2,000+ unbudgeted Atlas spend
Resolution: delta-aware refresh cadence per decisions/2026-05-09-refresh-cadence.md
Preventive measure: budget alarms on AWS + Atlas cost (planned)
IRP exercise: assessment + remediation + post-mortem (in the decision doc)
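
The difference between a full re-ingest and a delta-aware refresh can be illustrated with content hashing. This is one common approach and an assumption on our part; the design actually adopted is described in decisions/2026-05-09-refresh-cadence.md and is not reproduced here. All names are hypothetical.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """Stable content hash; sort_keys makes key order irrelevant."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def plan_delta(stored_hashes: dict[str, str], incoming: dict[str, dict]):
    """Split incoming records into upserts and deletes relative to stored hashes."""
    upserts = {key: rec for key, rec in incoming.items()
               if stored_hashes.get(key) != record_hash(rec)}
    deletes = [key for key in stored_hashes if key not in incoming]
    return upserts, deletes


stored = {"a": record_hash({"x": 1}), "b": record_hash({"x": 2}), "c": record_hash({"x": 3})}
incoming = {"a": {"x": 1}, "b": {"x": 20}, "d": {"x": 4}}
upserts, deletes = plan_delta(stored, incoming)
print(sorted(upserts))  # ['b', 'd']
print(deletes)          # ['c']
```

Only changed or new records are written, so write volume scales with the size of the change rather than the size of the dataset.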

HubSpot gdpr-delete of a real email address

Detection: founder noticed a real email address (taha@askflorence.health) was in the test-data cleanup list of a HubSpot script run
Root cause: the test cleanup script called POST /crm/v3/objects/contacts/gdpr-delete indiscriminately; that endpoint applies an irreversible, portal-level blocklist
Impact: primary work email of the founder permanently blocked from HubSpot's UI / CSV-import paths (auto-write via API still works); HubSpot Support confirmed irreversible
Preventive measure: engineering convention captured in CLAUDE.md — use +alias@ test addresses for HubSpot data; archive endpoint for soft-delete; never gdpr-delete a real address
IRP exercise: assessment, no regulatory notification needed (no PHI affected; only impacted internal workflow ergonomics)
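
A cleanup script can enforce the +alias convention before it touches the irreversible endpoint. The sketch below is an assumption about how such a guard might look, not the script in use. The function names are hypothetical, and whether gdpr-delete is ever appropriate for test data remains a policy decision. The guard only refuses addresses that do not look like +alias test addresses.

```python
def is_plus_alias(email: str) -> bool:
    """True for addresses like name+tag@example.com with a non-empty tag."""
    local, sep, domain = email.rpartition("@")
    if not sep or not local or not domain:
        return False
    base, plus, tag = local.partition("+")
    return bool(base and plus and tag)


def gdpr_delete_permitted(email: str) -> bool:
    """Refuse the irreversible delete for anything that is not a +alias address."""
    return is_plus_alias(email)


print(gdpr_delete_permitted("taha+test1@askflorence.health"))  # True
print(gdpr_delete_permitted("taha@askflorence.health"))        # False
```

Addresses that fail the check would go to the archive endpoint for a soft delete instead.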

Tabletop exercise

The IRP is only as effective as the team's practice against it. The first tabletop is scheduled for the Q2 2026 access review (July 2026): a 60-minute walkthrough of a SEV-1 PHI-breach scenario, role-playing IC + Compliance Liaison + Comms Lead. Tabletop outcomes are documented in the same access-review file.

Reference

Operational playbook: docs/runbooks/security-incident-response.md
Break-glass procedure: docs/runbooks/break-glass-root-login.md
Risk Assessment — incident-relevant risks
Data Retention Policy — erasure flow that may be triggered by an incident
Privacy Impact Assessment — data flows that determine incident scope
Access Control Policy — credential rotation as remediation
Vendor / Subprocessor Register — vendor BAA contact info
HIPAA Breach Notification Rule: 45 CFR §§164.400-414
HHS OCR Breach Portal: https://ocrportal.hhs.gov/ocr/breach/wizard_breach.jsf