Roadmap

How Florence gets built and shipped, against the dependency graph of AWS migration (#47), deterministic-flow completion, compliance work (#55, #56, #57), and the broader data-classification rollout.

Guiding sequencing rule

Florence AI integration ships only after:

  1. AWS migration is complete (#47).
  2. The full deterministic lead → enrollment → member-servicing flow is live for consumer AND agent users, without Florence.

Research, design, eval-harness bootstrapping, data-classification retrofit, and FlorenceRuntime prototypes run in parallel, now. No Florence code ships to production ahead of the sequencing rule.
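The sequencing rule reduces to a single boolean gate. A minimal sketch, assuming a hypothetical `ShipGate` record that someone keeps up to date by hand or from CI; none of these names exist in the codebase today.

```typescript
// Hypothetical record of the two conditions in the sequencing rule.
interface ShipGate {
  awsMigrationComplete: boolean; // issue #47
  deterministicFlowLiveForConsumers: boolean;
  deterministicFlowLiveForAgents: boolean;
}

// Florence code may ship to production only when every condition holds.
function florenceMayShipToProduction(gate: ShipGate): boolean {
  return (
    gate.awsMigrationComplete &&
    gate.deterministicFlowLiveForConsumers &&
    gate.deterministicFlowLiveForAgents
  );
}

// Example: the consumer flow is live but the agent flow is not, so the gate stays closed.
const currentGate: ShipGate = {
  awsMigrationComplete: true,
  deterministicFlowLiveForConsumers: true,
  deterministicFlowLiveForAgents: false,
};
const gateOpen = florenceMayShipToProduction(currentGate);
```

The consumer and agent flows are kept as separate fields because the rule requires both; a single "flow live" flag would hide which side is lagging.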

Sequence (ordering, not timeline)

Phases below are in dependency order, not on a calendar. Each phase ships when its entry criteria are met, not on a fixed clock. The dependency graph at the end of this doc is the load-bearing artifact.

Work that runs now, in parallel

These tracks do not block on AWS migration or deterministic-flow completion. They harden the foundation so Florence integration is routine when its turn comes.

Data classification Layer 1 + Layer 2 retrofit

Goal: brand-type data classes and typed adapter sinks on the existing codebase. Retrofit Resend / SES, PostHog, Atlas drivers, the CMS API client, and any future HubSpot integration.

Why now: independently valuable for the waitlist + agent-portal work already in flight. Every Florence tool wrapper depends on this pattern. Doing it pre-Florence removes the "retrofit while in motion" risk.

Reference: infrastructure/data-classification. Broader compliance plan tracked under #57 / #58.
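For readers new to the pattern, here is a sketch of what "brand-type data classes and typed adapter sinks" can look like in TypeScript. The class names (`Phi`, `PublicText`) and both sinks are assumptions made for illustration; the real taxonomy and adapters are defined in infrastructure/data-classification.

```typescript
// Compile-time-only brand: it never exists at runtime, it only makes
// otherwise identical string types incompatible with one another.
declare const dataClass: unique symbol;
type Classified<T, C extends string> = T & { readonly [dataClass]: C };

// Hypothetical classes for illustration.
type Phi = Classified<string, "phi">;
type PublicText = Classified<string, "public">;

// Tagging functions are the only place a raw string gains a class.
function asPhi(raw: string): Phi {
  return raw as Phi;
}
function asPublic(raw: string): PublicText {
  return raw as PublicText;
}

// Hypothetical sink for a third-party analytics vendor: public data only.
class AnalyticsSink {
  readonly sent: string[] = [];
  track(detail: PublicText): void {
    this.sent.push(detail);
  }
}

// Hypothetical sink for an in-boundary store that may hold PHI.
class InBoundaryStore {
  readonly rows: string[] = [];
  save(value: Phi): void {
    this.rows.push(value);
  }
}

const analytics = new AnalyticsSink();
const store = new InBoundaryStore();

analytics.track(asPublic("plan_search zip=00000"));
store.save(asPhi("member note"));

// Rejected by the compiler, because Phi is not assignable to PublicText:
// analytics.track(asPhi("member note"));
```

The brand costs nothing at runtime. The protection comes entirely from the type checker, which is why the retrofit has to reach every adapter: one sink that still accepts a bare `string` reopens the hole.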

FlorenceRuntime spike (throwaway)

Goal: prove the streaming + tool-use + grounding loop end-to-end on Claude Agent SDK against the existing /api/plans endpoint. The spike code is thrown away; we keep the learnings.

Why now: de-risks the runtime choice before we commit to it. Catches integration surprises while they're cheap.

Eval harness bootstrap (first 50 golden evals)

Goal: write the first 50 golden Q&A evals by hand, targeting the existing deterministic surface (plan search, eligibility). Even without Florence, these are the spec.

Why now: the golden set is the artifact that is hardest to build quickly. Writing it early means we're not launching with an unvalidated assistant.
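A golden eval can start as nothing more than a question plus the facts an acceptable answer must contain. The sketch below assumes a hypothetical `GoldenEval` shape and a substring-based grader; the two cases use invented placeholder data, not real plan or eligibility facts.

```typescript
// Hypothetical shape of one hand-written golden case.
interface GoldenEval {
  id: string;
  category: "plan_search" | "eligibility";
  question: string;
  // Facts the answer must contain, compared case-insensitively.
  mustContain: string[];
}

// A case passes when every required fact appears in the answer.
function evalPasses(evalCase: GoldenEval, answer: string): boolean {
  const lower = answer.toLowerCase();
  return evalCase.mustContain.every((fact) => lower.includes(fact.toLowerCase()));
}

// Fraction of cases passed, in the range 0..1. An empty set scores 0.
function evalPassRate(cases: GoldenEval[], answerFor: (question: string) => string): number {
  if (cases.length === 0) return 0;
  const passed = cases.filter((c) => evalPasses(c, answerFor(c.question))).length;
  return passed / cases.length;
}

// Invented placeholder cases, for illustration only.
const goldenSet: GoldenEval[] = [
  {
    id: "ps-001",
    category: "plan_search",
    question: "Which silver plans are offered in ZIP 00000?",
    mustContain: ["Example Silver 1"],
  },
  {
    id: "el-001",
    category: "eligibility",
    question: "What does the eligibility check return for the placeholder household?",
    mustContain: ["placeholder-fact"],
  },
];

// Stand-in for the system under test (today: the deterministic surface).
const stubAnswerer = (question: string): string =>
  question.includes("silver") ? "Example Silver 1 is available." : "I am not sure.";

const bootstrapPassRate = evalPassRate(goldenSet, stubAnswerer);
```

Because the grader takes any `(question) => string` function, the same cases can score the deterministic endpoints now and Florence later without rewriting the set.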

Voice vendor partnership outreach

Goal: open conversations with Deepgram, Cartesia, and ElevenLabs on dedicated-VPC deployment pricing, FedRAMP-reference-customer status, and volume pricing at 10k → 100k members. Secure term sheets ahead of need.

Why now: leverage is highest when we're still pre-launch and talking possibilities. Details are in the voice vendor partnership track comment on #61.

FTI compliance read (counsel)

Goal: formal read from an EDE-literate compliance attorney on whether self-attested income (pre-enrollment subsidy estimates) counts as FTI, or whether only IRS-sourced data obtained via the CMS Hub does. The outcome shapes the voice-vendor surface area dramatically.

Why now: lowest cost, highest-leverage single item on the whole voice track.

user_profile schema design

Goal: design the user_profile Mongo schema to be fillable from both UI forms AND conversation extraction (see runtime — conversation-as-form). Design for classification from day 1.

Why now: this schema is the skeleton every downstream Florence feature (intake, renewal, servicing) hangs on.
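One way to make a profile fillable from both paths is to store provenance and classification on every field rather than on the document. This is a sketch under assumptions: the field names, the `DataClass` values, and the timestamps are hypothetical, and the real schema is still to be designed.

```typescript
// Hypothetical class labels; the real taxonomy comes from the classification work.
type DataClass = "public" | "pii" | "phi";
// The two fill paths named in the goal.
type FieldSource = "ui_form" | "conversation";

// Every stored value carries its class and where it came from.
interface ProfileField<T> {
  value: T;
  dataClass: DataClass;
  source: FieldSource;
  capturedAt: string; // ISO-8601 timestamp
}

// Hypothetical subset of the profile; all fields start empty.
interface UserProfile {
  userId: string;
  zipCode?: ProfileField<string>;
  householdSize?: ProfileField<number>;
}

function recordZipCode(
  profile: UserProfile, value: string, source: FieldSource, capturedAt: string,
): UserProfile {
  return { ...profile, zipCode: { value, dataClass: "pii", source, capturedAt } };
}

function recordHouseholdSize(
  profile: UserProfile, value: number, source: FieldSource, capturedAt: string,
): UserProfile {
  return { ...profile, householdSize: { value, dataClass: "pii", source, capturedAt } };
}

// Which fields still need to be asked for, by form or by conversation.
function missingFields(profile: UserProfile): string[] {
  const missing: string[] = [];
  if (profile.zipCode === undefined) missing.push("zipCode");
  if (profile.householdSize === undefined) missing.push("householdSize");
  return missing;
}

// Example: one field arrives from a form, the other from conversation extraction.
let draftProfile: UserProfile = { userId: "user-123" };
draftProfile = recordZipCode(draftProfile, "00000", "ui_form", "2026-01-01T00:00:00Z");
draftProfile = recordHouseholdSize(draftProfile, 3, "conversation", "2026-01-01T00:05:00Z");
```

Keeping `source` per field lets later features treat conversation-extracted values differently, for example by asking the member to confirm them before they feed an enrollment step.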

Phase 1 — text Florence, member-mode ​

Prerequisites: #47 complete; deterministic lead → enrollment → member-servicing flow live; eval harness at ≥ 200 golden cases; data-classification Layer 1+2 shipped.

Scope:

  • Claude Agent SDK runtime in our AWS-hosted stack.
  • Member-mode system prompt + tool surface (initially: api_search_plans, api_check_eligibility, api_check_drug_coverage, api_check_provider_network, api_get_member_*, ui_* set, api_escalate_to_human).
  • Guardrails: all 5 layers active (input + output classifiers in full, tool auth, jailbreak evals, canary tokens).
  • Grounding check in shadow mode; enable blocking once production false-positive rate ≤ 1 % on a statistically meaningful turn volume.
  • Right-rail desktop + bottom-sheet mobile UX.
  • English only at this stage (Spanish rolls with voice in 1.5).

Rollout: shadow-against-human until eval pass rate + human-parity metrics hold, then gradual user rollout — 1 %, 10 %, 50 %, 100 % as each cohort's unit-economics + quality targets stay green.
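The staged rollout can be expressed as a small state step. The sketch assumes that a cohort which is not green holds at its current percentage; the roadmap does not say whether a red cohort should instead roll back, so treat that as an open choice. `CohortHealth` is a hypothetical type.

```typescript
// The rollout stages named above, in order.
const ROLLOUT_STAGES = [1, 10, 50, 100] as const;
type RolloutPercent = (typeof ROLLOUT_STAGES)[number];

// Hypothetical health summary for the current cohort.
interface CohortHealth {
  unitEconomicsGreen: boolean;
  qualityGreen: boolean;
}

// Advance one stage only when the current cohort is green on both targets.
// Assumption: a cohort that is not green holds in place rather than rolling back.
function nextRolloutPercent(current: RolloutPercent, health: CohortHealth): RolloutPercent {
  if (!health.unitEconomicsGreen || !health.qualityGreen) return current;
  const idx = ROLLOUT_STAGES.indexOf(current);
  const next = ROLLOUT_STAGES[idx + 1];
  return next ?? current; // already at 100 %: stay there
}

const afterFirstCohort = nextRolloutPercent(1, { unitEconomicsGreen: true, qualityGreen: true });
```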

Exit criteria: ≥ 98 % eval pass rate across all categories; ≤ 5 % escalation rate; unit-economics targets met.
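The exit criteria translate directly into a check. This sketch reads "across all categories" as "every category individually at or above 98 %", which is an interpretation; `Phase1Metrics` and its field names are hypothetical.

```typescript
// Hypothetical metrics snapshot; rates are fractions in the range 0..1.
interface Phase1Metrics {
  evalPassRateByCategory: Record<string, number>;
  escalationRate: number;
  unitEconomicsTargetsMet: boolean;
}

function meetsPhase1ExitCriteria(metrics: Phase1Metrics): boolean {
  const categories = Object.keys(metrics.evalPassRateByCategory);
  // Interpretation: each category must clear 98 % on its own, not on average.
  const everyCategoryPasses =
    categories.length > 0 &&
    categories.every((c) => (metrics.evalPassRateByCategory[c] ?? 0) >= 0.98);
  return everyCategoryPasses && metrics.escalationRate <= 0.05 && metrics.unitEconomicsTargetsMet;
}

// Example: one weak category blocks exit even though the average is high.
const exitReady = meetsPhase1ExitCriteria({
  evalPassRateByCategory: { plan_search: 0.995, eligibility: 0.97 },
  escalationRate: 0.03,
  unitEconomicsTargetsMet: true,
});
```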

Phase 1.5 — voice ​

Prerequisites: Phase 1 in production with unit-economics + quality + escalation-rate targets sustained across a statistically meaningful conversation volume.

Scope:

  • ASR (Deepgram Nova-3) + TTS (Cartesia Sonic-2) adapters.
  • EN + ES voices (Spanish text support comes along in this phase — separate work item but same rollout window).
  • Reference-audio corpus collection from day 1 (consent-captured, in boundary).
  • Voice telemetry dashboard (confidence, latency, per-minute cost).
  • End-of-speech to first-audio latency target ≤ 400 ms.

Rollout: voice is opt-in per conversation. Shadow-mode unavailable for voice; instead, initial cohort is capped at ≤ 10 % of eligible users until latency + quality + unit-economics targets hold on live voice traffic.
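A capped cohort needs stable membership, so that a user who is offered voice today is still offered it tomorrow. One common approach, shown here as a sketch and not as the planned implementation, is to hash the user ID into 100 buckets. The function names are hypothetical, and the hash is an FNV-1a-style hash over UTF-16 code units.

```typescript
// FNV-1a-style 32-bit hash over the string's UTF-16 code units.
function hashUserId(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// True when the user falls inside the initial voice cohort.
// The cap itself is limited to 10 %, matching the rollout rule above.
function inInitialVoiceCohort(userId: string, capPercent: number): boolean {
  if (capPercent < 0 || capPercent > 10) {
    throw new RangeError("initial voice cohort cap must be between 0 and 10 percent");
  }
  return hashUserId(userId) % 100 < capPercent;
}

const offeredVoice = inInitialVoiceCohort("user-123", 10);
```

Because membership depends only on the bucket, raising the cap from 5 to 10 keeps every user who was already in the cohort. Voice remains opt-in per conversation, so this decides who is offered it, not who uses it.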

Phase 5 — agent-mode Florence ​

Prerequisites: #47 complete (already required for Phase 1); full agent auth + portal live (Phase 5 of agent platform); Phase 1 text Florence in production.

Scope:

  • Agent-mode system prompt + tool surface: api_list_my_assigned_members, api_get_member_full_history, api_draft_sep_letter, api_compose_member_message, api_assign_escalation, api_view_audit_trail, + member-view tools on behalf of assigned members.
  • Separate eval set — PHI-boundary tests (agent A cannot see agent B's members), draft-not-send patterns, compliance-language accuracy for SEP letters.
  • Admin-dashboard-integrated escalation queue (replaces Phase 1's email + Mongo-row v1).
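The PHI-boundary evals ("agent A cannot see agent B's members") can be prototyped against an in-memory stand-in before the real tools exist. In this sketch the directory class and its records are hypothetical; the method names mirror the planned tools api_list_my_assigned_members and api_get_member_full_history but are not their implementation.

```typescript
// Hypothetical minimal member record.
interface MemberRecord {
  memberId: string;
  assignedAgentId: string;
}

// In-memory stand-in that scopes every read to the calling agent.
class AssignmentScopedDirectory {
  private readonly records: MemberRecord[];

  constructor(records: MemberRecord[]) {
    this.records = records;
  }

  listMyAssignedMembers(agentId: string): MemberRecord[] {
    return this.records.filter((m) => m.assignedAgentId === agentId);
  }

  getMemberFullHistory(agentId: string, memberId: string): MemberRecord {
    const found = this.records.find(
      (m) => m.memberId === memberId && m.assignedAgentId === agentId,
    );
    // Same error for "does not exist" and "not yours", so existence is not leaked.
    if (found === undefined) throw new Error("member not found");
    return found;
  }
}

const directory = new AssignmentScopedDirectory([
  { memberId: "m1", assignedAgentId: "agent-a" },
  { memberId: "m2", assignedAgentId: "agent-b" },
]);

const visibleToAgentA = directory.listMyAssignedMembers("agent-a").map((m) => m.memberId);
```

An eval in this category would then assert that a prompt from agent A asking about m2 produces the same "not found" outcome as a request for a member that does not exist.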

Phase 3 — EDE readiness ​

Prerequisites: ISA signed with CMS; FedRAMP Moderate (or GovCloud equivalent) infrastructure complete; data-classification Layers 3–5 complete (CSFLE per class, network/account isolation, CI data-flow graph assertion).

Scope:

  • LLM provider swap: Anthropic direct → Bedrock (config change thanks to adapter sinks).
  • Voice: whichever of Track A (dedicated VPC deployment of Deepgram/Cartesia), Track B (vendor FedRAMP certification), or Track C (self-hosted Florence-voice) has landed becomes primary. Others remain as fallbacks.
  • Authenticated-member Florence-voice possibly becomes default (brand moat, "your Florence").
  • All audit retention migrated to the EDE-safer 10-year period.

Decision points and revisit schedule

Decision: Anthropic-direct vs. Bedrock for Phase 1 launch

Deferred until adapter sinks are shipped. At that point the decision is one-line: LLM_PROVIDER=anthropic or LLM_PROVIDER=bedrock. Current lean: Bedrock from day 1 — the Bedrock Runtime VPC endpoint is already provisioned (Phase 4 infra), one fewer BAA to track, and it validates the Phase 3 code path in production under real traffic. Revisit when FlorenceRuntime spike concludes.
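A sketch of how the one-line switch might be wired once adapter sinks exist. `LLM_PROVIDER` and its two values come from the text above; the `LlmSink` interface and both functions are hypothetical, and the sink is a stub that calls no vendor SDK. The environment is passed in as a plain object to keep the example self-contained.

```typescript
type LlmProvider = "anthropic" | "bedrock";

// Hypothetical adapter-sink interface shared by both providers.
interface LlmSink {
  readonly provider: LlmProvider;
  complete(prompt: string): Promise<string>;
}

// Fail loudly on a missing or misspelled value instead of silently defaulting,
// since the default provider is itself still an open decision.
function parseLlmProvider(env: Record<string, string | undefined>): LlmProvider {
  const raw = env["LLM_PROVIDER"];
  if (raw === "anthropic" || raw === "bedrock") return raw;
  throw new Error(`LLM_PROVIDER must be "anthropic" or "bedrock", got ${String(raw)}`);
}

// Stub only: a real sink would wrap the vendor client behind this interface.
function makeLlmSink(provider: LlmProvider): LlmSink {
  return {
    provider,
    complete: (prompt: string) => Promise.resolve(`[${provider} stub] ${prompt}`),
  };
}

const llmSink = makeLlmSink(parseLlmProvider({ LLM_PROVIDER: "bedrock" }));
```

Everything downstream depends only on `LlmSink`, which is what makes the Phase 3 provider swap a config change.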

Decision: revisit self-hosted (open-weight) models

Revisit when steady-state LLM spend > $5 k / month AND the eval harness is mature enough to detect model-family regressions reliably. A Hermes-4 or Qwen-based tune for the lookup-intent tier is the most plausible first target. Adapter sinks make this a one-file swap when the time comes.

Decision: grounding check — shadow vs. blocking ​

Launch Phase 1 with the grounding check in shadow mode. Flip to blocking once the false-positive rate is ≤ 1 % on a statistically meaningful production turn volume. If the rate stays above 1 %, tune the ground-truth extractor and re-evaluate; do not enable blocking above 1 % FP, because false blocks degrade UX.
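The flip rule can be written as a pure function over shadow-mode counters. Two assumptions are made loudly here: the false-positive rate is taken as wrongly flagged turns divided by all evaluated turns, and `minTurns` is a hypothetical caller-supplied stand-in for "statistically meaningful", which this roadmap does not quantify.

```typescript
type GroundingMode = "shadow" | "blocking";

// Hypothetical counters collected while the check runs in shadow mode.
interface GroundingShadowStats {
  turnsEvaluated: number;
  falsePositives: number; // turns the check would have blocked incorrectly
}

// Stay in shadow until there is enough volume AND the FP rate is at most 1 %.
function chooseGroundingMode(shadowStats: GroundingShadowStats, minTurns: number): GroundingMode {
  if (shadowStats.turnsEvaluated <= 0 || shadowStats.turnsEvaluated < minTurns) {
    return "shadow";
  }
  const falsePositiveRate = shadowStats.falsePositives / shadowStats.turnsEvaluated;
  return falsePositiveRate <= 0.01 ? "blocking" : "shadow";
}

// Example with invented numbers: 80 false positives in 10,000 turns is 0.8 %.
const groundingMode = chooseGroundingMode({ turnsEvaluated: 10000, falsePositives: 80 }, 5000);
```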

Dependencies visualized

Open questions still blocking

  • Final Florence-AI / AskFlorence AI naming (legal / trademark).
  • Compliance-counsel ruling on FTI vs. self-attested income (track: #61).
  • Escalation-queue v1 shape (email + Mongo-row accepted for Phase 1; admin-dashboard integration scope for Phase 5 needs a design pass when agent portal is closer).
  • Native app timeline (affects on-device voice plan).

AskFlorence Internal Documentation. Not for public distribution.