AskFlorence

Principles

Invariants that govern every Florence AI decision. Change these only by amending this document in a PR with explicit rationale.

1. Deterministic grounding: Florence never computes a fact

Florence is a natural-language interface, not a knowledge base. Every factual claim in a Florence response must trace to a tool call in the current turn. This is enforced in three ways:

  1. System prompt contract: the prompt explicitly forbids arithmetic, benefit derivation, or advisory claims without a backing tool result.
  2. Post-response grounding check: after every assistant turn, a cheap Haiku call scans the response for factual claims and asserts that each one traces to a tool-result ID from the same turn. Any ungrounded claim is blocked, logged, and escalated.
  3. Hallucination dragnet in evals: the CI eval suite regex-matches every number in every response against that turn's tool-result JSON. Any unbacked number fails the test.
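The dragnet rule can be sketched as a small pure function. This is a minimal sketch, assuming tool results arrive as parsed JSON and that a number counts as "backed" when the same numeric value appears anywhere in that JSON (including inside strings such as "$1,500"). `unbackedNumbers`, `toNumber`, and the regex are hypothetical, not the actual eval code.

```typescript
// Matches numbers such as 412, 1,500, 412.50 (currency symbols and % stay outside the match).
const NUMBER_RE = /\d[\d,]*(?:\.\d+)?/g;

// "1,500" -> 1500, "412.50" -> 412.5
function toNumber(raw: string): number {
  return parseFloat(raw.replace(/,/g, ""));
}

// Hypothetical dragnet helper: returns every number in `response` whose value
// does not appear anywhere in this turn's tool-result JSON.
function unbackedNumbers(response: string, toolResults: unknown[]): string[] {
  const backed = new Set<number>();

  // Walk the tool-result JSON and record every numeric value, including
  // numbers embedded in strings.
  const collect = (value: unknown): void => {
    if (typeof value === "number") {
      backed.add(value);
    } else if (typeof value === "string") {
      for (const m of value.match(NUMBER_RE) ?? []) backed.add(toNumber(m));
    } else if (Array.isArray(value)) {
      value.forEach((v) => collect(v));
    } else if (value !== null && typeof value === "object") {
      Object.values(value).forEach((v) => collect(v));
    }
  };
  toolResults.forEach((r) => collect(r));

  return (response.match(NUMBER_RE) ?? []).filter((m) => !backed.has(toNumber(m)));
}
```

In CI, a non-empty return value would fail the test for that turn.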

Consequence: model upgrades (Claude 4.7 → 5 → N) are safe, because knowledge lives in tools, not weights.

2. Text is the source of truth; voice is a UI affordance

Text transcripts are the legal record. Voice I/O (ASR on the way in, TTS on the way out) wraps the same text code path. We do not use integrated voice-to-voice models (OpenAI Realtime, Gemini Live, etc.) because they abstract away the tool loop, the grounding check, and the audit trail, which are exactly the things we cannot abstract away.
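The wrapping can be sketched as plain function composition. This is a minimal sketch under assumed interfaces: `Asr`, `Tts`, `TextTurn`, and `makeVoiceTurn` are hypothetical names, since this page does not specify the FlorenceRuntime API. The point it illustrates is that a voice turn calls the unchanged text turn, so it inherits the tool loop, grounding check, and audit row rather than bypassing them.

```typescript
// Hypothetical signatures; the real FlorenceRuntime interfaces are not specified on this page.
type Asr = (audio: Uint8Array) => Promise<string>;
type Tts = (text: string) => Promise<Uint8Array>;
interface TextTurnResult {
  text: string;    // grounded, style-normalized reply
  auditId: string; // audit row written by the text path
}
type TextTurn = (userText: string) => Promise<TextTurnResult>;

// A voice turn is ASR, then the unchanged text turn, then TTS. The tool loop,
// grounding check, and audit row all live inside `textTurn`, so voice cannot skip them.
function makeVoiceTurn(asr: Asr, textTurn: TextTurn, tts: Tts) {
  return async (audio: Uint8Array) => {
    const transcript = await asr(audio);       // what the user said, as text
    const result = await textTurn(transcript); // the same code path text users hit
    const speech = await tts(result.text);     // audio is rendered from the grounded text only
    return { transcript, text: result.text, auditId: result.auditId, speech };
  };
}
```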

3. Camouflage: raise the cost of fingerprinting

Competitors should have to work to know which model powers Florence. No "powered by Claude" badges. All model calls server-mediated. Tool-use blocks never stream to the client. Output runs through a style normalizer that enforces the Florence voice and strips model-family tells. Full detail in guardrails & camouflage.
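The server-side filtering can be sketched as follows. This is a minimal sketch, assuming a simplified event type (real provider streaming events have a different shape) and a hypothetical two-entry list of model-family tells; the actual normalizer rules are described in guardrails & camouflage. For simplicity it buffers the whole turn before normalizing; a streaming version would normalize per sentence so that a phrase split across deltas is still caught.

```typescript
// Simplified, hypothetical event shape; real provider streaming events differ.
type ModelEvent =
  | { kind: "text"; delta: string }
  | { kind: "tool_use"; name: string; input: unknown }
  | { kind: "tool_result"; toolUseId: string; content: unknown };

// Hypothetical examples of model-family tells to strip.
const TELLS: RegExp[] = [
  /\bAs an AI (?:language )?model,?\s*/gi,
  /\bI'm Claude\b[.,]?\s*/gi,
];

function normalizeStyle(text: string): string {
  return TELLS.reduce((t, re) => t.replace(re, ""), text);
}

// Only text deltas are kept; tool_use and tool_result events never leave the server.
function toClient(events: ModelEvent[]): string {
  const text = events
    .filter((e): e is Extract<ModelEvent, { kind: "text" }> => e.kind === "text")
    .map((e) => e.delta)
    .join("");
  return normalizeStyle(text);
}
```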

4. Unit economics: committed targets

These are binding design goals, not aspirations. Architecture choices that break them require explicit review.

| Metric | Target |
|---|---|
| Text turn (LLM + guardrails + grounding) | ≤ $0.005 |
| Text conversation (~10 turns) | ≤ $0.05 |
| Voice turn (ASR + LLM + TTS) | ≤ $0.03 |
| Voice conversation (~5 min, ~15 turns) | ≤ $0.50 |
| Per-member-per-month (PMPM) Florence cost | ≤ $0.50 |
| LLM + voice + infra as % of PMPM revenue | ≤ 3% at 10k members; ≤ 2% at 100k+ |
| Escalation-to-human rate | ≤ 5% |
| First-token latency (text) | ≤ 500 ms |
| End-of-speech to first audio (voice) | ≤ 400 ms |
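As a consistency check, the per-conversation rows follow from the per-turn rows if every turn lands exactly at its target cost, and the PMPM cap then bounds usage per member. This assumes Florence's PMPM cost is entirely conversation cost, ignoring any fixed per-member infrastructure share:

```latex
\begin{aligned}
10 \times \$0.005 &= \$0.05 && \text{text conversation, at target} \\
15 \times \$0.03 &= \$0.45 \le \$0.50 && \text{voice conversation, at target} \\
\$0.50 \div \$0.05 &= 10 && \text{text conversations per member per month within the PMPM cap} \\
\$0.50 \div \$0.45 &\approx 1.1 && \text{voice conversations per member per month within the PMPM cap}
\end{aligned}
```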

Three moves make or break these targets:

  1. Prompt caching as a first-class design constraint. Prompts follow a fixed-order structure so that only the per-turn delta consists of fresh (uncached) tokens. Target: ≥ 85% input-token cache-hit rate.
  2. Haiku-default model routing. Target mix: ~85% Haiku 4.5, ~14% Sonnet 4.6, ~1% Opus 4.7. Measured monthly; alert on drift.
  3. Tool-result caching with clear TTLs. Plan data, drug coverage, and provider-network results are cached per input hash for minutes, not seconds.

Missing any one of these means 10× the cost at scale. See runtime for implementation.
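Tool-result caching with per-tool TTLs can be sketched as follows, assuming tool inputs are plain JSON. The class, the TTL value, the tool name, and the cache-key format are hypothetical. The key uses a sorted-key JSON string; a production version would likely store a hash (for example SHA-256) of that string instead.

```typescript
// Deterministic serialization: object keys are sorted so {a, b} and {b, a} produce the same key.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([k, v]) => `${JSON.stringify(k)}:${stableStringify(v)}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

interface CacheEntry {
  value: unknown;
  expiresAt: number; // epoch ms
}

// Hypothetical per-tool TTL cache. Tools without a declared TTL bypass the cache.
class ToolResultCache {
  private readonly entries = new Map<string, CacheEntry>();
  private readonly ttlMsByTool: Record<string, number>;
  private readonly now: () => number;

  constructor(ttlMsByTool: Record<string, number>, now: () => number = Date.now) {
    this.ttlMsByTool = ttlMsByTool;
    this.now = now; // injectable clock, so expiry is testable
  }

  async get<T>(tool: string, input: unknown, load: () => Promise<T>): Promise<T> {
    const ttl = this.ttlMsByTool[tool];
    if (ttl === undefined) return load();
    const key = `${tool}:${stableStringify(input)}`;
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) return hit.value as T;
    const value = await load();
    this.entries.set(key, { value, expiresAt: this.now() + ttl });
    return value;
  }
}
```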

5. Data classification is enforced in code, not policy

Every vendor integration is a typed adapter sink that declares the data classes it accepts. Routing FTI to HubSpot is a compile error, not a policy violation. Every stored field carries a compliance class, and every MongoDB document is encrypted with the CMK for its class. Full detail is in infrastructure/data-classification; tool surface applies it to Florence's tools.
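Here is a minimal sketch of how "routing FTI to HubSpot is a compile error" can work in TypeScript: a sink's accepted classes are a type parameter, and a value tagged with any other class fails to type-check. The class names other than FTI and PHI, the `Classified` wrapper, `classify`, and the sink objects are assumptions for illustration; the canonical class list lives in infrastructure/data-classification.

```typescript
// Hypothetical class list; FTI and PHI come from this page, the others are placeholders.
type DataClass = "public" | "pii" | "phi" | "fti";

// A value tagged, at the type level, with its compliance class.
interface Classified<C extends DataClass, T> {
  readonly dataClass: C;
  readonly value: T;
}

function classify<C extends DataClass, T>(dataClass: C, value: T): Classified<C, T> {
  return { dataClass, value };
}

// A sink declares the classes it accepts as a type parameter.
interface Sink<Accepts extends DataClass> {
  readonly name: string;
  send(item: Classified<Accepts, unknown>): void;
}

const delivered: string[] = [];

// Hypothetical sinks: a marketing CRM that takes only public data and PII,
// and an encrypted store that accepts every class.
const hubspot: Sink<"public" | "pii"> = {
  name: "hubspot",
  send: (item) => { delivered.push(`hubspot:${item.dataClass}`); },
};
const encryptedStore: Sink<DataClass> = {
  name: "encrypted-store",
  send: (item) => { delivered.push(`encrypted-store:${item.dataClass}`); },
};

hubspot.send(classify("pii", { email: "member@example.com" })); // compiles
encryptedStore.send(classify("fti", { agi: 41000 }));           // compiles
// hubspot.send(classify("fti", { agi: 41000 }));
// ^ does not compile: "fti" is not one of hubspot's accepted classes
```

Because the check is on the `dataClass` literal type, a value widened to the full `DataClass` union is also rejected by a narrow sink, so code cannot launder FTI by losing track of its class.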

6. Tool access is scoped by user context

Every tool call carries an auth context: anonymous | authenticated_member | authenticated_agent | authenticated_admin. Tools declare which contexts they accept. An anonymous user cannot invoke member-specific tools. An agent cannot invoke member-data tools for members not assigned to them. Enforced in the tool wrapper, not the prompt.
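The wrapper-side enforcement can be sketched as follows. The four context kinds come from this page; the per-context fields, `ToolDef`, `invokeTool`, and the example tool are hypothetical. The rule that a member may only access their own record is an added assumption; the agent-assignment rule is the one stated above.

```typescript
// The four contexts named above; the per-context fields are assumptions.
type AuthContext =
  | { kind: "anonymous" }
  | { kind: "authenticated_member"; memberId: string }
  | { kind: "authenticated_agent"; agentId: string; assignedMemberIds: ReadonlySet<string> }
  | { kind: "authenticated_admin"; adminId: string };

// Hypothetical tool definition: each tool declares the contexts it accepts.
interface ToolDef<I extends { memberId?: string }, O> {
  name: string;
  accepts: ReadonlyArray<AuthContext["kind"]>;
  run: (input: I) => O;
}

// Every call goes through this wrapper, so no prompt content can talk its way past it.
function invokeTool<I extends { memberId?: string }, O>(tool: ToolDef<I, O>, ctx: AuthContext, input: I): O {
  if (!tool.accepts.includes(ctx.kind)) {
    throw new Error(`${tool.name} is not available to ${ctx.kind}`);
  }
  const memberId: string | undefined = input.memberId;
  if (memberId !== undefined) {
    // Assumption: members may only touch their own record.
    if (ctx.kind === "authenticated_member" && ctx.memberId !== memberId) {
      throw new Error(`${tool.name}: members may only access their own record`);
    }
    // Agents may only touch members assigned to them.
    if (ctx.kind === "authenticated_agent" && !ctx.assignedMemberIds.has(memberId)) {
      throw new Error(`${tool.name}: member ${memberId} is not assigned to agent ${ctx.agentId}`);
    }
  }
  return tool.run(input);
}
```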

7. Evals are deployment gates

Florence's prompts, tool schemas, and model selections are code. Every change runs the eval suite, and a regression of more than 2% in any category blocks the merge. The bar: better than a licensed human health insurance agent on factual recall, appropriately deferential on advisory judgment, and reliably escalatory on edge cases. See evals & observability.
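The merge gate can be sketched as a comparison of two eval runs. This is a minimal sketch, assuming per-category pass rates in [0, 1] from a baseline and a candidate run, and reading "more than 2%" as a drop of more than 2 percentage points (a relative reading would change only the comparison line). `gateRegressions` and the category names are hypothetical, and treating a missing category as a failure is a choice made for this sketch.

```typescript
// Pass rate per eval category, in [0, 1].
type CategoryScores = Record<string, number>;

// Assumption: "2%" means 2 percentage points of pass rate.
const MAX_REGRESSION = 0.02;
// Small tolerance so floating-point noise exactly at the threshold does not block a merge.
const EPSILON = 1e-9;

// Returns one message per blocking regression; an empty array means the merge may proceed.
function gateRegressions(baseline: CategoryScores, candidate: CategoryScores): string[] {
  const failures: string[] = [];
  for (const [category, base] of Object.entries(baseline)) {
    const score = candidate[category];
    if (score === undefined) {
      // In this sketch, a category that disappears from the run counts as a failure.
      failures.push(`${category}: missing from candidate run`);
    } else if (base - score > MAX_REGRESSION + EPSILON) {
      failures.push(`${category}: ${base.toFixed(3)} -> ${score.toFixed(3)}`);
    }
  }
  return failures;
}
```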

8. Every turn is an audit record

Every Florence turn produces an immutable audit-log row: user identity, turn content (encrypted with the appropriate CMK), tools called, tool parameters, tool result summaries, model used, token counts, grounding-check outcome, any escalation, and any PHI/FTI touched. Retention is at least 6 years (HIPAA) or 10 years (the safer choice for EDE). See evals & observability.
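The row can be sketched as a typed, append-only record. Field names and types are assumptions that mirror the list above; encryption is represented only by an already-encrypted string and a CMK alias, and the 10-year default picks the EDE-safer window. `AuditLog` is an in-memory stand-in: real immutability has to come from the storage layer.

```typescript
// Field list follows the principle above; names and types are assumptions.
interface AuditRow {
  readonly turnId: string;
  readonly userId: string;
  readonly encryptedContent: string; // turn content, already encrypted under the class-appropriate CMK
  readonly cmkAlias: string;         // which CMK was used
  readonly tools: ReadonlyArray<{ name: string; params: unknown; resultSummary: string }>;
  readonly model: string;
  readonly inputTokens: number;
  readonly outputTokens: number;
  readonly groundingPassed: boolean;
  readonly escalated: boolean;
  readonly sensitiveClassesTouched: ReadonlyArray<"phi" | "fti">;
  readonly retainUntil: string;      // ISO date (YYYY-MM-DD)
}

// Assumption: use the longer, EDE-safer 10-year window.
const RETENTION_YEARS = 10;

function retainUntil(turnAt: Date, years: number = RETENTION_YEARS): string {
  const d = new Date(turnAt.getTime());
  d.setUTCFullYear(d.getUTCFullYear() + years);
  return d.toISOString().slice(0, 10);
}

// In-memory stand-in: append is the only write. Object.freeze is shallow and
// only guards this process; it is not a substitute for an append-only store.
class AuditLog {
  private readonly rows: AuditRow[] = [];

  append(row: AuditRow): void {
    this.rows.push(Object.freeze({ ...row }));
  }

  all(): ReadonlyArray<AuditRow> {
    return this.rows.slice();
  }
}
```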

9. Member and agent: one runtime, two prompts

Florence serves both sides. Same FlorenceRuntime, same Claude Agent SDK wiring, same grounding check, same audit log. The delta is system prompt + tool surface + auth context. Agent-side Florence has her own tool registry (draft_sep_letter, list_my_assigned_members, compose_member_message) and her own eval set (PHI boundary tests, compliance-language accuracy).
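The split can be sketched as configuration, assuming the side-specific delta (system prompt, tool surface, auth context, eval set) is expressible as data layered over shared wiring. `SideProfile`, `florenceConfig`, the prompt paths, eval-set names, and member-side tool names are hypothetical; the agent tool names are the ones listed above.

```typescript
type AuthKind = "anonymous" | "authenticated_member" | "authenticated_agent" | "authenticated_admin";

// What differs between member-side and agent-side Florence.
interface SideProfile {
  systemPrompt: string;            // path or ID of the side's system prompt
  tools: ReadonlyArray<string>;    // the side's tool registry
  authContexts: ReadonlyArray<AuthKind>;
  evalSet: string;
}

// What both sides share; deliberately not parameterized by side.
const SHARED = {
  runtime: "FlorenceRuntime",
  groundingCheck: "post-response",
  auditLog: "append-only",
} as const;

// Hypothetical values. Agent tool names come from this page; member-side names are placeholders.
const PROFILES: Record<"member" | "agent", SideProfile> = {
  member: {
    systemPrompt: "prompts/member",
    tools: ["search_plans", "check_drug_coverage"],
    authContexts: ["anonymous", "authenticated_member"],
    evalSet: "member-core",
  },
  agent: {
    systemPrompt: "prompts/agent",
    tools: ["draft_sep_letter", "list_my_assigned_members", "compose_member_message"],
    authContexts: ["authenticated_agent"],
    evalSet: "agent-phi-boundary-and-compliance-language",
  },
};

// Shared wiring first, side profile second: the profile can only add the delta.
function florenceConfig(side: "member" | "agent") {
  return { ...SHARED, ...PROFILES[side] };
}
```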

10. Provider independence: portable by construction

Florence's core intelligence runs on a third-party LLM. The specific vendor is a commodity choice, not a differentiator. Every LLM call goes through a provider abstraction; tool schemas are model-neutral Zod with per-provider renderers; prompts have per-provider adaptation layers; evals run against the primary AND at least one warm-standby provider daily, not quarterly. A vendor switch is a config change plus a known-quantity quality delta, not a platform rewrite.

The four switch tiers (same-vendor transport → model version → cross-vendor → self-hosted open-weight), and the risk register that justifies each, live in provider risk & portability. Treat that document as binding; the enablers it requires (abstraction layer, warm-standby evals, adaptation prompts, kill-switch) are non-optional.
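A "model-neutral schema with per-provider renderers" can be sketched as follows, using a plain JSON-Schema-shaped object in place of Zod so the example has no dependencies. The two output shapes follow the public Anthropic Messages API tool format (`name`, `description`, `input_schema`) and the OpenAI Chat Completions tool format (`type: "function"` wrapping `name`, `description`, `parameters`). `NeutralTool`, `renderTools`, and the example tool are hypothetical.

```typescript
// Minimal JSON-Schema-shaped neutral definition (a stand-in for the Zod schemas described above).
interface NeutralTool {
  name: string;
  description: string;
  parameters: { type: "object"; properties: Record<string, unknown>; required?: string[] };
}

// Anthropic Messages API tool shape: { name, description, input_schema }.
function toAnthropic(t: NeutralTool) {
  return { name: t.name, description: t.description, input_schema: t.parameters };
}

// OpenAI Chat Completions tool shape: { type: "function", function: { name, description, parameters } }.
function toOpenAI(t: NeutralTool) {
  return {
    type: "function" as const,
    function: { name: t.name, description: t.description, parameters: t.parameters },
  };
}

type Provider = "anthropic" | "openai";
const renderers: Record<Provider, (t: NeutralTool) => unknown> = {
  anthropic: toAnthropic,
  openai: toOpenAI,
};

// Tool definitions never change; only the renderer selected by config does.
function renderTools(provider: Provider, tools: NeutralTool[]): unknown[] {
  return tools.map((t) => renderers[provider](t));
}
```

Under this design, switching vendor means changing the `Provider` value passed to `renderTools` while tool definitions stay untouched.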

11. Florence ships after AWS migration + full deterministic flow

No Florence code runs in production until:

  1. AWS migration completes (#47)
  2. The full deterministic lead → enrollment → member-servicing flow is live for consumer + agent users

Research, architecture, eval harness, data-classification retrofit, and FlorenceRuntime spike can all run now, in parallel. Integration and launch wait. See roadmap.

AskFlorence Internal Documentation. Not for public distribution.
