Data Classification Policy
Status: Active. Last updated April 12, 2026. Purpose: SOC 2 evidence for CC6.1 (Logical Access), CC6.5 (Data Protection), A1.2 (Availability), and P6.1 (Privacy)
Classification Levels
| Level | Definition | Examples | Encryption | Retention |
|---|---|---|---|---|
| Public | No restrictions. Intentionally published. | Plan names, metal levels, issuer names, premium amounts | At rest (AES-256) | Indefinite |
| Internal | Business-sensitive. Not for external sharing. | SLCSP calculations, data source URLs, API keys | At rest + in transit (TLS) | Duration of use |
| PII | Personally identifiable information. | Email, name, phone, address | At rest + in transit + field-level (CSFLE) | Per purpose + 7yr audit |
| PHI | Protected health information (HIPAA). | SSN, DOB, income (with health context), enrollment records | At rest + in transit + field-level (CSFLE + KMS) | Per purpose + 7yr audit |
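The encryption column above can be read as a minimum set of controls per level. A minimal sketch of that reading, assuming hypothetical control names (`at_rest`, `in_transit`, `field_level`, `kms`) that are not defined anywhere in this policy:

```python
from enum import Enum


class Level(Enum):
    PUBLIC = "Public"
    INTERNAL = "Internal"
    PII = "PII"
    PHI = "PHI"


# Minimum encryption controls per level, transcribed from the table above.
# The control names are illustrative labels, not identifiers from any tool.
REQUIRED_CONTROLS = {
    Level.PUBLIC: {"at_rest"},
    Level.INTERNAL: {"at_rest", "in_transit"},
    Level.PII: {"at_rest", "in_transit", "field_level"},
    Level.PHI: {"at_rest", "in_transit", "field_level", "kms"},
}


def missing_controls(level, applied):
    """Return the controls required for `level` that are absent from `applied`."""
    return REQUIRED_CONTROLS[level] - set(applied)
```

For example, `missing_controls(Level.PHI, {"at_rest", "in_transit", "field_level"})` returns `{"kms"}`, which flags a PHI store that has field-level encryption but no KMS-managed keys.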
Collection Classification
Phase 1 Collections (Active)
| Collection | Classification | Contains PII/PHI? | Encryption | Retention | Access |
|---|---|---|---|---|---|
| plan_years | Public | No | At rest (Atlas default) | Per plan year (keep all years) | app-read, app-write |
| plans | Public | No | At rest (Atlas default) | Per plan year (keep all years) | app-read, app-write |
| regions | Public | No | At rest (Atlas default) | Per plan year (keep all years) | app-read, app-write |
| zip_county | Public | No | At rest (Atlas default) | Indefinite (geographic data) | app-read, app-write |
| audit_log | Internal | May contain IP addresses | At rest (Atlas default) | 7 years (TTL index) | audit-write (insert), admin (read) |
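Key: Phase 1 collections contain no PII or PHI. All plan data is publicly available information from government sources (DFS filings, marketplace data, CMS PUF). The one collection to watch is audit_log, which may record IP addresses and is therefore classified Internal rather than Public.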
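The 7-year retention on audit_log relies on a TTL index. A minimal sketch of how such an index might be created, assuming a hypothetical `created_at` date field (the policy does not name the timestamp field) and approximating 7 years as 7 × 365 days:

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: 7 years is approximated as 7 * 365 days; leap days are ignored.
AUDIT_RETENTION_SECONDS = 7 * 365 * SECONDS_PER_DAY


def ensure_audit_ttl(collection, timestamp_field="created_at"):
    """Create a TTL index so audit documents expire after the retention window.

    `collection` is expected to be a pymongo Collection, or any object with a
    compatible `create_index` method. `created_at` is a hypothetical field
    name; TTL expiry only applies to fields that hold BSON dates.
    """
    return collection.create_index(
        timestamp_field, expireAfterSeconds=AUDIT_RETENTION_SECONDS
    )
```

With pymongo this would be called as `ensure_audit_ttl(db["audit_log"])`. Because the audit-write role is insert-only, the index would need to be created by a role with index privileges, such as the Atlas admin.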
Phase 2 Collections (Future — Not Yet Created)
| Collection | Classification | Contains PII/PHI? | Encryption | Retention | Access |
|---|---|---|---|---|---|
| consumers | PHI | Yes (SSN, name, DOB, address) | At rest + CSFLE + KMS | Per purpose + 7yr audit trail | Scoped (per-consumer access) |
| enrollments | PHI | Yes (links consumer to health plan) | At rest + CSFLE | Per purpose + 7yr audit trail | Broker (assigned only), consumer (own) |
| broker_assignments | Internal | No (broker business info only) | At rest | Duration of relationship | Admin |
Phase 2 prerequisite: MongoDB Client-Side Field Level Encryption (CSFLE) backed by AWS KMS must be in place before these collections are created. See docs/security-compliance.md for the encryption architecture.
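To illustrate what the CSFLE prerequisite involves, here is a minimal sketch that builds an automatic-encryption JSON schema for the future consumers collection. The field names, BSON types, and the choice of algorithm per field are assumptions for illustration only; the actual design is in docs/security-compliance.md.

```python
# Algorithm identifiers defined by MongoDB for client-side field level encryption.
DETERMINISTIC = "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic"
RANDOM = "AEAD_AES_256_CBC_HMAC_SHA_512-Random"


def build_encrypt_schema(key_id, fields):
    """Build a CSFLE JSON schema for one collection.

    key_id: the data encryption key id (a BSON Binary UUID in a real deployment).
    fields: mapping of field name -> (bson_type, algorithm).
    """
    return {
        "bsonType": "object",
        "encryptMetadata": {"keyId": [key_id]},
        "properties": {
            name: {"encrypt": {"bsonType": bson_type, "algorithm": algorithm}}
            for name, (bson_type, algorithm) in fields.items()
        },
    }


# Hypothetical field names and types for the future `consumers` collection.
CONSUMER_FIELDS = {
    "ssn": ("string", DETERMINISTIC),  # deterministic permits equality lookups
    "dob": ("date", RANDOM),
    "address": ("object", RANDOM),  # objects require the random algorithm
}
```

In a pymongo deployment, the resulting dictionary would typically be supplied through the `schema_map` argument of `AutoEncryptionOpts`, keyed by the `database.collection` namespace, alongside the AWS entry in `kms_providers`.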
Data Flow Classification
| Data Flow | Classification | Handling |
|---|---|---|
| User enters zip + age + income | Not stored | Stateless; used for calculation only; not persisted |
| Plan search results | Public | Returned to client; no PII |
| Waitlist email submission | PII | Stored via Resend API; not in MongoDB |
| Enrollment application (future) | PHI | Field-level encrypted in MongoDB; audit logged |
| Broker view of consumer data (future) | PHI access event | Decrypted on-demand; time-limited session; audit logged |
Source File Classification
| Source | Classification | Storage | Retention |
|---|---|---|---|
| DFS Final Exhibit ZIPs | Public (government filings) | S3 + local backup | Indefinite |
| NYSOH scraped HTML | Public (public marketplace data) | S3 + local backup | Indefinite |
| CMS PUF CSVs | Public (government data) | S3 + local backup | Indefinite |
| Official NY documents (PDFs) | Public | S3 + local backup | Indefinite |
| Data ingestion manifests | Internal | S3 (with source file checksums) | Indefinite |
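The ingestion manifests record a checksum for each source file. A minimal sketch of how such a manifest might be produced, assuming SHA-256 as the checksum algorithm and a hypothetical manifest layout (`name`, `bytes`, `sha256`), neither of which is specified in this policy:

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths):
    """Build a manifest dict listing each source file with its size and checksum."""
    entries = []
    for path in sorted(str(p) for p in paths):
        entries.append(
            {
                "name": Path(path).name,
                "bytes": Path(path).stat().st_size,
                "sha256": sha256_of(path),
            }
        )
    return {"files": entries}


def to_json(manifest):
    """Serialize the manifest deterministically so repeated runs can be compared."""
    return json.dumps(manifest, indent=2, sort_keys=True)
```

Recomputing the checksums against the S3 copy and the local backup gives a simple way to confirm that both hold the same source files.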
Role-to-Collection Access Matrix
| Role | plan_years | plans | regions | zip_county | audit_log |
|---|---|---|---|---|---|
| app-read | Read | Read | Read | Read | None |
| app-write | Read/Write | Read/Write | Read/Write | Read/Write | None |
| audit-write | None | None | None | None | Insert only |
| Atlas admin | Full | Full | Full | Full | Full |
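The matrix above can also be expressed as data, which makes it possible to test access decisions. A minimal sketch, assuming that "Write" and "Full" both expand to insert, update, and delete in addition to read (the policy does not enumerate the individual actions):

```python
DATA_COLLECTIONS = ("plan_years", "plans", "regions", "zip_county")
ALL_COLLECTIONS = DATA_COLLECTIONS + ("audit_log",)

READ = frozenset({"read"})
# Assumption: "Read/Write" and "Full" are modeled as the same action set here.
READ_WRITE = frozenset({"read", "insert", "update", "delete"})
INSERT_ONLY = frozenset({"insert"})

# Role -> collection -> allowed actions, transcribed from the matrix above.
ACCESS_MATRIX = {
    "app-read": {name: READ for name in DATA_COLLECTIONS},
    "app-write": {name: READ_WRITE for name in DATA_COLLECTIONS},
    "audit-write": {"audit_log": INSERT_ONLY},
    "Atlas admin": {name: READ_WRITE for name in ALL_COLLECTIONS},
}


def is_allowed(role, collection, action):
    """Return True if `role` may perform `action` on `collection`.

    Unknown roles and collections are denied by default.
    """
    return action in ACCESS_MATRIX.get(role, {}).get(collection, frozenset())
```

For instance, `is_allowed("audit-write", "audit_log", "read")` is `False`, reflecting that the audit writer can append entries but cannot read them back.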
SOC 2 Control Mapping
| Control | Evidence |
|---|---|
| CC6.1 (Logical Access) | Role-to-collection matrix, minimum necessary access |
| CC6.5 (Data Protection) | Classification levels, encryption requirements per level |
| A1.2 (Availability) | Retention policies, backup configuration |
| P6.1 (Privacy — Data Use) | Data flow classification, "not stored" for anonymous queries |