Data Classification Policy

Status: Active. Last updated April 12, 2026. Purpose: SOC 2 evidence for CC6.1 (Logical Access), CC6.5 (Data Protection), A1.2 (Availability), P6.1 (Privacy)


Classification Levels

| Level | Definition | Examples | Encryption | Retention |
|---|---|---|---|---|
| Public | No restrictions. Intentionally published. | Plan names, metal levels, issuer names, premium amounts | At rest (AES-256) | Indefinite |
| Internal | Business-sensitive. Not for external sharing. | SLCSP calculations, data source URLs, API keys | At rest + in transit (TLS) | Duration of use |
| PII | Personally identifiable information. | Email, name, phone, address | At rest + in transit + field-level (CSFLE) | Per purpose + 7yr audit |
| PHI | Protected health information (HIPAA). | SSN, DOB, income (with health context), enrollment records | At rest + in transit + field-level (CSFLE + KMS) | Per purpose + 7yr audit |
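
The levels above can be encoded so that application code looks up handling requirements instead of hard-coding them. The following is a minimal sketch, not the project's implementation: the names `Level`, `Handling`, and `strictest` are hypothetical, and both the ordering of PII below PHI and the rule that a record takes the level of its most sensitive field are assumptions not stated in this policy.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Classification levels, ordered least to most sensitive (assumed order)."""
    PUBLIC = 0
    INTERNAL = 1
    PII = 2
    PHI = 3


@dataclass(frozen=True)
class Handling:
    at_rest: bool
    in_transit: bool
    field_level: bool
    retention: str


# Values transcribed from the Classification Levels table.
HANDLING = {
    Level.PUBLIC: Handling(True, False, False, "Indefinite"),
    Level.INTERNAL: Handling(True, True, False, "Duration of use"),
    Level.PII: Handling(True, True, True, "Per purpose + 7yr audit"),
    Level.PHI: Handling(True, True, True, "Per purpose + 7yr audit"),
}


def strictest(levels):
    """Return the most sensitive level present; an empty input is Public."""
    return max(levels, default=Level.PUBLIC)


def requires_field_level_encryption(levels):
    """True when the strictest level present calls for CSFLE."""
    return HANDLING[strictest(levels)].field_level
```

For example, a record mixing a Public plan name with a PHI field would resolve to `Level.PHI` and therefore require field-level encryption.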

Collection Classification

Phase 1 Collections (Active)

| Collection | Classification | Contains PII/PHI? | Encryption | Retention | Access |
|---|---|---|---|---|---|
| plan_years | Public | No | At rest (Atlas default) | Per plan year (keep all years) | app-read, app-write |
| plans | Public | No | At rest (Atlas default) | Per plan year (keep all years) | app-read, app-write |
| regions | Public | No | At rest (Atlas default) | Per plan year (keep all years) | app-read, app-write |
| zip_county | Public | No | At rest (Atlas default) | Indefinite (geographic data) | app-read, app-write |
| audit_log | Internal | May contain IP addresses | At rest (Atlas default) | 7 years (TTL index) | audit-write (insert), admin (read) |

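Key: Phase 1 collections contain no PII or PHI. The plan data collections (plan_years, plans, regions, zip_county) hold only publicly available plan information from government sources (DFS filings, marketplace data, CMS PUF). audit_log is classified Internal and may contain IP addresses.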
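
The 7-year TTL retention on audit_log could be expressed as a MongoDB `createIndexes` command document. This is a sketch under stated assumptions: the timestamp field name `created_at` and the index name are hypothetical, and counting each year as 366 days is a conservative choice made here so that nothing expires before seven full calendar years.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: treat every year as 366 days so documents are never
# removed before seven full calendar years have elapsed.
RETENTION_SECONDS = 7 * 366 * SECONDS_PER_DAY


def audit_log_ttl_command(timestamp_field="created_at"):
    """Build a createIndexes command document for the audit_log TTL index.

    The indexed field must hold BSON dates for TTL expiry to apply.
    """
    return {
        "createIndexes": "audit_log",
        "indexes": [
            {
                "key": {timestamp_field: 1},
                "name": f"{timestamp_field}_ttl_7y",
                "expireAfterSeconds": RETENTION_SECONDS,
            }
        ],
    }
```

With PyMongo, the resulting document could be passed to `Database.command`. The value stays well below the 32-bit limit on `expireAfterSeconds`.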

Phase 2 Collections (Future — Not Yet Created)

| Collection | Classification | Contains PII/PHI? | Encryption | Retention | Access |
|---|---|---|---|---|---|
| consumers | PHI | Yes (SSN, name, DOB, address) | At rest + CSFLE + KMS | Per purpose + 7yr audit trail | Scoped (per-consumer access) |
| enrollments | PHI | Yes (links consumer to health plan) | At rest + CSFLE | Per purpose + 7yr audit trail | Broker (assigned only), consumer (own) |
| broker_assignments | Internal | No (broker business info only) | At rest | Duration of relationship | Admin |

Phase 2 requires MongoDB Client-Side Field Level Encryption (CSFLE) with AWS KMS to be in place before these collections are created. See docs/security-compliance.md for the encryption architecture.
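
To illustrate what CSFLE configuration for the consumers collection might involve, here is a hedged sketch of the JSON schema that tells a MongoDB driver which fields to encrypt automatically. The field names (`ssn`, `dob`, `name`, `address`), their BSON types, and the choice of deterministic versus random encryption per field are all assumptions made for illustration; the actual design is in docs/security-compliance.md.

```python
# Algorithm identifiers defined by MongoDB CSFLE.
DETERMINISTIC = "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic"
RANDOM = "AEAD_AES_256_CBC_HMAC_SHA_512-Random"


def consumers_schema(data_key_id):
    """Return an automatic-encryption schema for a consumers collection.

    data_key_id identifies a data encryption key (a BSON Binary UUID in
    real use) whose master key would be held in AWS KMS.
    """
    def encrypted(bson_type, algorithm):
        return {"encrypt": {"bsonType": bson_type, "algorithm": algorithm}}

    return {
        "bsonType": "object",
        "encryptMetadata": {"keyId": [data_key_id]},
        "properties": {
            # Deterministic encryption keeps equality lookups possible.
            "ssn": encrypted("string", DETERMINISTIC),
            # Random encryption for fields assumed never queried directly.
            "dob": encrypted("date", RANDOM),
            "name": encrypted("string", RANDOM),
            "address": encrypted("object", RANDOM),
        },
    }
```

In PyMongo, a schema like this would be keyed by namespace (database and collection name) and supplied through the `schema_map` argument of `AutoEncryptionOpts`.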


Data Flow Classification

| Data Flow | Classification | Handling |
|---|---|---|
| User enters zip + age + income | Not stored | Stateless; used for calculation only; not persisted |
| Plan search results | Public | Returned to client; no PII |
| Waitlist email submission | PII | Stored via Resend API; not in MongoDB |
| Enrollment application (future) | PHI | Field-level encrypted in MongoDB; audit logged |
| Broker view of consumer data (future) | PHI access event | Decrypted on-demand; time-limited session; audit logged |

Source File Classification

| Source | Classification | Storage | Retention |
|---|---|---|---|
| DFS Final Exhibit ZIPs | Public (government filings) | S3 + local backup | Indefinite |
| NYSOH scraped HTML | Public (public marketplace data) | S3 + local backup | Indefinite |
| CMS PUF CSVs | Public (government data) | S3 + local backup | Indefinite |
| Official NY documents (PDFs) | Public | S3 + local backup | Indefinite |
| Data ingestion manifests | Internal | S3 (with source file checksums) | Indefinite |
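
Since ingestion manifests carry source file checksums, a manifest entry could be produced and later verified roughly as follows. This is a sketch only: the policy does not specify the hash algorithm or manifest layout, so SHA-256 and the entry fields (`file`, `bytes`, `sha256`, `classification`) are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large ZIPs and CSVs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_entry(path, classification="Public"):
    """Describe one source file for an ingestion manifest (assumed layout)."""
    path = Path(path)
    return {
        "file": path.name,
        "bytes": path.stat().st_size,
        "sha256": sha256_of(path),
        "classification": classification,
    }


def verify(path, entry):
    """Return True when the file on disk still matches its manifest entry."""
    return sha256_of(path) == entry["sha256"]
```

Re-running `verify` against the S3 copy and the local backup would show whether either has drifted from the file that was originally ingested.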

Role-to-Collection Access Matrix

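| Role | plan_years | plans | regions | zip_county | audit_log |
|---|---|---|---|---|---|
| app-read | Read | Read | Read | Read | None |
| app-write | Read/Write | Read/Write | Read/Write | Read/Write | None |
| audit-write | None | None | None | None | Insert only |
| Atlas admin | Full | Full | Full | Full | Full |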
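
The matrix lends itself to a deny-by-default lookup that tests or an application-layer guard could use. The sketch below is illustrative rather than the enforced mechanism (enforcement would sit in the database roles themselves). The action names are hypothetical, "Read/Write" is assumed to include inserts, and "Full" is simplified to the same three data actions.

```python
READ, INSERT, WRITE = "read", "insert", "write"

PLAN_COLLECTIONS = ("plan_years", "plans", "regions", "zip_county")
ALL_COLLECTIONS = PLAN_COLLECTIONS + ("audit_log",)

# Transcribed from the Role-to-Collection Access Matrix; a missing
# collection under a role means no access.
ACCESS = {
    "app-read": {c: {READ} for c in PLAN_COLLECTIONS},
    "app-write": {c: {READ, INSERT, WRITE} for c in PLAN_COLLECTIONS},
    "audit-write": {"audit_log": {INSERT}},
    "Atlas admin": {c: {READ, INSERT, WRITE} for c in ALL_COLLECTIONS},
}


def is_allowed(role, collection, action):
    """Deny by default: unknown roles, collections, or actions return False."""
    return action in ACCESS.get(role, {}).get(collection, set())
```

For instance, `is_allowed("audit-write", "audit_log", "read")` is False, which matches the insert-only cell in the matrix.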

SOC 2 Control Mapping

| Control | Evidence |
|---|---|
| CC6.1 (Logical Access) | Role-to-collection matrix, minimum necessary access |
| CC6.5 (Data Protection) | Classification levels, encryption requirements per level |
| A1.2 (Availability) | Retention policies, backup configuration |
| P6.1 (Privacy: Data Use) | Data flow classification, "not stored" for anonymous queries |

AskFlorence Internal Documentation. Not for public distribution.