Florence AI - the voice-synced rendered WOW flow

Research + a runnable local demo. Linear ENG-356. Branch eng-356-florence-wow-flow-research. No em-dashes anywhere per the CLAUDE.md hard rule (hyphens only). Brand: Florence AI / AskFlorence AI per index.

0. One paragraph

A member talks to Florence by voice. The screen composes each beat in lockstep with her voice (a Clicky-style rendered, guided experience, not a chat): the hook, then ZIP / household / income collected conversationally, then the real subsidized plan revealed with the sticker-to-subsidized strike, then optional doctor / Rx coverage, then onto the waitlist for the plan they pick. Florence orchestrates only. Every dollar, subsidy, deductible, and coverage answer comes verbatim from the existing deterministic AskFlorence pipeline (the same fetchPlansForHousehold + /api/* the home calculator uses) - she never computes a number. ElevenLabs Conversational AI is the ears + turn-taking + voice; our deterministic client tools are the only source of facts.

1. Clicky distilled (from the actual farzaa/clicky code)

Concrete, named patterns - not vague praise. Files read: worker/src/index.ts, ElevenLabsTTSClient.swift, CompanionManager.swift, CompanionResponseOverlay.swift, CompanionPanelView.swift, BuddyDictationManager.swift, AssemblyAIStreamingTranscriptionProvider.swift, OverlayWindow.swift, DesignSystem.swift.

1a. Their voice-loop architecture

Clicky is a macOS menu-bar companion. The loop:

  1. Push-to-talk (Control+Option) starts mic capture (BuddyDictationManager).
  2. Audio streams over a websocket to AssemblyAI v3 streaming for ASR; partial transcripts compose live (storedTurnTranscriptsByOrder -> accumulated text).
  3. On release, the final transcript + a screenshot go to Claude (api.anthropic.com/v1/messages, streaming SSE) via a Cloudflare Worker that holds the keys (/chat, /tts, /transcribe-token).
  4. Claude's reply (point-tag stripped) is spoken via ElevenLabs TTS (/v1/text-to-speech/{voiceId}, model_id: eleven_flash_v2_5, streaming, AVAudioPlayer).
  5. Barge-in: a new push-to-talk cancels the in-flight response task and stopPlayback() immediately.

The load-bearing fact for us: Clicky uses ElevenLabs as TTS only. Claude is the brain; AssemblyAI is the ears. This matches our documented three-stream voice architecture 1:1 and proves the separation works. We go one step further by using ElevenLabs Conversational AI (with its own integrated ASR + turn-taking) to clear a far higher conversational bar than Clicky's push-to-talk.
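Step 5 of the loop is the piece we would have to own ourselves in a decomposed stack (section 2); in the demo, ElevenLabs Conversational AI handles interruption natively. A minimal TypeScript sketch of the same barge-in rule, assuming a hypothetical `TurnController` and a `Player` with a `stop()` method (neither is Clicky or AskFlorence code): every new turn aborts the in-flight request through an `AbortController` and silences playback first.

```typescript
// Hypothetical sketch: TurnController and Player are illustrative names,
// not Clicky or AskFlorence code.
interface Player {
  stop(): void; // halts any audio that is currently playing
}

class TurnController {
  private inFlight: AbortController | null = null;
  private readonly player: Player;

  constructor(player: Player) {
    this.player = player;
  }

  // Barge-in: cancel the pending LLM / TTS work and stop playback immediately.
  bargeIn(): void {
    this.inFlight?.abort();
    this.inFlight = null;
    this.player.stop();
  }

  // A new turn always supersedes the previous one. The returned signal is what
  // the streaming request (e.g. fetch(url, { signal })) should honor.
  beginTurn(): AbortSignal {
    this.bargeIn();
    this.inFlight = new AbortController();
    return this.inFlight.signal;
  }
}
```

Threading the signal into the streaming request means a cancelled turn cannot keep talking after the member has started speaking again.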

1b. Their rendering craft (the part the founder is pointing at)

| Clicky pattern (code) | Effect | Our editorial translation (shipped) |
|---|---|---|
| Cursor-glued non-activating overlay, 60fps, never steals focus (CompanionResponseOverlay) | A presence on the canvas, not a window | FlorencePresence: a lantern mark on cream with a gold glow, anchored to the active scene |
| `[POINT:x,y:label]` -> triangle flies to + points at the exact element; bezier arc, smoothstep ease, tangent rotation, scale pulse at apex, glow intensifies in flight (OverlayWindow 495-568) | The AI directs attention as it talks | On-canvas spotlight: the named element gets a gold focus ring + the rest dims; spotlight target is part of the Scene state |
| Visible voice state: dot color + status text + live audio-power waveform; RMS boosted 10.2x, decay `max(level, prev*0.72)`, 70ms sample (CompanionManager, BuddyDictationManager 687-734) | User always knows what it is doing | FlorencePresence 4 states (greeting / listening / thinking / speaking); the listening/speaking ring scale is driven by getInputVolume/getOutputVolume with the same 0.72 decay + 70ms cadence |
| Streaming narration near the cursor, fades after read | Narration rendered where attention is | Caption ribbon bound to the agent transcript |
| Latency hiding: hold the processing/spinner state until TTS audio truly starts (sendTranscriptToClaudeWithScreenshot) | No dead air without a visual signal | The searching beat (gold pulse) holds until the find_plans tool result, reusing the proven home cinematic |
| Onboarding self-reveal (welcome anim -> intro -> at 40s the buddy demos itself -> char-streamed prompt) | The product teaches itself, cinematically | Florence's hook IS the self-reveal; the first scene composes as she speaks it |
| Spring 0.2-0.4s / 0.6 damping, char-stream 30-60ms, fade 0.4s, cursor blue #3380FF (DesignSystem) | Alive, not mechanical | Reused as the home register's cubic-bezier(0.16,1,0.3,1) 520ms stagger + 1400ms gold pulse; color stays our gold-2 #B8903F on cream |
| Conversation history capped at 10; point-tag stripped before TTS | Clean spoken text, bounded context | Agent transcript is the record; a grounding dragnet flags any spoken number absent from a tool result |
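The voice-state row is the one with exact numbers behind it. A minimal sketch of that meter math, assuming Clicky's raw level is an RMS over float PCM samples in [-1, 1] and that the boosted value is clamped to 1 (the clamp is our assumption). On our side the input would instead be the 0..1 value read from getInputVolume / getOutputVolume every 70ms, with the same 0.72 decay and no boost; `pollLevel` is a hypothetical helper.

```typescript
// Hypothetical sketch of the level meter: boost (Clicky only), clamp, then a
// geometric fall-off so the ring eases down instead of snapping to zero.
const RMS_BOOST = 10.2;        // Clicky's boost for raw mic RMS
const DECAY = 0.72;            // shared: displayed level falls to 72% per tick
const SAMPLE_INTERVAL_MS = 70; // shared: polling cadence

// Root-mean-square of a PCM buffer (samples assumed in [-1, 1]).
function rms(samples: ArrayLike<number>): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) sumSquares += samples[i] * samples[i];
  return Math.sqrt(sumSquares / samples.length);
}

// Clicky-style raw level: boosted RMS, clamped to 1 (the clamp is an assumption).
function clickyLevel(samples: ArrayLike<number>): number {
  return Math.min(1, rms(samples) * RMS_BOOST);
}

// max(level, prev * 0.72): rises instantly, falls smoothly.
function smooth(level: number, prev: number): number {
  const clamped = Math.min(1, Math.max(0, level));
  return Math.max(clamped, prev * DECAY);
}

// Polls a 0..1 volume source (e.g. getOutputVolume) every 70ms and reports
// the smoothed level; returns a function that stops polling.
function pollLevel(read: () => number, onLevel: (level: number) => void): () => void {
  let prev = 0;
  const timer = setInterval(() => {
    prev = smooth(read(), prev);
    onLevel(prev);
  }, SAMPLE_INTERVAL_MS);
  return () => clearInterval(timer);
}
```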

2. Voice vendor evaluation: Cartesia vs ElevenLabs

This section is the written research, decoupled from the demo (the demo is ElevenLabs per the founder decision). The load-bearing factor is BAA: members speak medications, doctor names, and conditions into the mic, so the voice vendor processes PHI-adjacent audio AND stores a transcript (which IS PHI).

| Axis | ElevenLabs | Cartesia |
|---|---|---|
| Product fit | Conversational AI (Agents): integrated ASR + turn-taking model + barge-in + TTS + tools + per-conversation transcripts. One vendor, one socket. This is what the founder validated on elevenlabs.io. | Sonic-2 TTS is best-in-class for latency (~75ms first audio) but Cartesia is TTS-first; ASR + turn-taking + agent orchestration are not a single integrated product to the same degree. You assemble the loop yourself (Deepgram ASR + your turn logic + Sonic TTS). |
| Time-to-first-audio | Low; turn-taking model hides latency well in practice | Sonic-2 ~75ms first audio (the strongest single number); but end-to-end depends on the ASR + LLM you bolt on |
| Naturalness | Very high; large voice library; the "Sarah" voice reads as a reassuring professional | Very high; fewer voices; excellent prosody |
| Turn-taking / barge-in | Native in Conversational AI (their turn model + interruption handling). This is the hard part and it is solved for you. | You build it (VAD + endpointing). More control, more work, more risk for an overnight bar. |
| Browser SDK + mic | First-class: @elevenlabs/react useConversation + WebRTC, mic + playback + barge-in handled | Browser TTS SDK exists; full conversational loop in-browser is more assembly |
| STT side | Their ASR inside Conversational AI (no separate STT vendor needed). Standalone "Scribe" model also exists. | No first-party real-time ASR at parity; pairs with Deepgram Nova-3 (the voice.md Phase 1.5 ASR pick, has a BAA) |
| Cost at expected volume | Conversational AI priced per minute; higher than raw TTS; acceptable at pre-scale, watch at 100k members against the unit-economics targets | Sonic TTS cheaper per minute; total cost depends on the ASR you add |
| HIPAA / BAA | The decision gate. ElevenLabs offers a BAA on enterprise/scale tiers (not the default self-serve tier). Confirm: (a) does the BAA cover Conversational AI specifically (ASR audio + the hosted LLM turn) or only TTS; (b) the transcript store. Until a signed BAA explicitly covering Conversational AI + transcript storage is in hand, ElevenLabs Conversational AI is demo-acceptable but NOT production-acceptable for real member PHI. | Cartesia: confirm BAA on TTS. Because you would pair it with Deepgram (BAA: yes) + our own LLM, the PHI surface is more decomposable and each subprocessor's BAA is individually known. Easier to reason about for production. |
| Transcript = audit AND PHI | ElevenLabs stores a transcript of every conversation (a free audit trail - good). But a transcript of a member speaking meds/doctors/conditions IS PHI. Must resolve before production: where does that transcript live (ElevenLabs side), retention + deletion controls, and is it inside the ElevenLabs BAA. The audit-trail benefit only counts if that store is BAA-covered OR we disable/redirect it to our own BAA-covered storage (Mongo florence_* per runtime.md). | Same question, smaller blast radius: with the decomposed stack the transcript can be produced + stored by OUR runtime (text-as-source-of-truth, voice.md) inside our existing BAA boundary, rather than vendor-side. |

Recommendation

  • Demo (now): ElevenLabs Conversational AI. Fastest path to the WOW bar, founder-validated, the turn-taking is solved. Acceptable because the demo uses synthetic data only; no real member PHI.
  • Production: do NOT let the demo's vendor choice imply the production vendor. Two viable production shapes, decided by which BAA lands cleanest:
    1. ElevenLabs Conversational AI with a signed BAA that explicitly covers Conversational AI (ASR + transcript store) + bring-your-own LLM pointed at our Bedrock Claude so the reasoning + grounding stay ours. Transcript retention/deletion contractually pinned or disabled in favor of our Mongo store.
    2. Decomposed: Deepgram Nova-3 ASR (BAA) + our Bedrock Claude + Cartesia Sonic-2 TTS (BAA), our runtime owns the transcript inside the existing AWS BAA boundary. More wiring, cleanest compliance story, matches voice.md Phase 1.5 exactly.
  • Tie-in to vendor-BAA discipline (#57): add ElevenLabs (Conversational AI tier + transcript store) and Cartesia to the vendor register with BAA status = OPEN; a vendor that will not sign a BAA covering the conversational + transcript surface is disqualifying for production even though it is fine for this demo.

3. The proposed WOW flow (what shipped in the demo)

Persona: the warmest, most expert, most genuinely helpful health insurance guide alive - a sharp friend who is the best agent in the country. Hook (spoken first, no tool):

"Turns out good, affordable healthcare in America is real. It is just hidden from the people who need it. I am Florence. Plans here can start at zero to seven dollars a month, many with no deductible and strong coverage. The same plans run close to a thousand dollars a month on healthcare dot gov, and usually only show up through a broker. Tell me three quick things and I will pull real plans with the subsidy already applied. No social security number, no spam, just real numbers. First, what is your ZIP code?"

| Beat | Scene composes | Tool (deterministic) | Spotlight |
|---|---|---|---|
| greeting | hook + presence bloom | none | presence |
| collect_zip | ZIP prompt; location chip fills | collect_location -> GET /api/counties (multi-county / SBE / PO-box edges handled) | zip chip |
| collect_household | who + ages | set_household | household chip |
| collect_income | rough yearly | set_income | income chip |
| searching | gold pulse, holds for audio + result | find_plans -> fetchPlansForHousehold (the exact shared pipeline = /api/eligibility + /api/plans) | center |
| reveal | PriceReveal strike: sticker -> subsidized | (from find_plans) | price |
| plans | 3-col PlanCard micro stagger | (from find_plans) | top plan |
| coverage | per-plan coverage pills light | check_provider / check_drug -> /api/providers, /api/drugs | |
| waitlist | hold-your-spot card | select_plan then join_waitlist -> POST /api/waitlist (interest:"plan_interest", source_page:"florence_voice") | waitlist |
| done | warm close + what's next | none | presence |
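The waitlist row is the only beat that writes data. A minimal sketch of the request join_waitlist could build, assuming hypothetical `email` and `plan_id` body fields; only the endpoint and the `interest` / `source_page` values come from the table. Returning a typed refusal instead of throwing lets Florence handle the "bad email (re-ask)" edge conversationally, while the server remains the authority on what is valid.

```typescript
// Hypothetical sketch of the join_waitlist request builder. The email and
// plan_id field names are assumptions about the /api/waitlist contract.
interface WaitlistRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

type WaitlistDecision =
  | { ok: true; request: WaitlistRequest }
  | { ok: false; reason: "bad_email" | "no_plan_selected" };

// Deliberately loose shape check; the server stays the authority on validity.
const EMAIL_SHAPE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function buildWaitlistRequest(email: string, selectedPlanId: string | null): WaitlistDecision {
  const trimmed = email.trim();
  if (!EMAIL_SHAPE.test(trimmed)) return { ok: false, reason: "bad_email" }; // Florence re-asks
  if (!selectedPlanId) return { ok: false, reason: "no_plan_selected" };     // select_plan comes first
  return {
    ok: true,
    request: {
      url: "/api/waitlist",
      init: {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          email: trimmed,
          plan_id: selectedPlanId,
          interest: "plan_interest",
          source_page: "florence_voice",
        }),
      },
    },
  };
}
```

The client tool would then call `fetch(request.url, request.init)` only when `ok` is true.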

Voice state machine (UX layer; ElevenLabs owns transport turn-taking): IDLE -> GREETING -> LISTENING -> THINKING(tool) -> SPEAKING -> LISTENING ... -> WAITLISTED. Barge-in cancels scene + spotlight.
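A minimal sketch of that UX-layer machine as a pure transition function, assuming hypothetical event names for whatever SDK callbacks the UI listens to. Barge-in is the only transition that sets the cancel flag, which the Scene Director would use to drop the current scene and spotlight.

```typescript
// Hypothetical sketch of the UX-layer voice state machine. ElevenLabs still owns
// transport turn-taking; this only decides what the screen shows. Event names
// are illustrative, not SDK callback names.
type VoiceState = "IDLE" | "GREETING" | "LISTENING" | "THINKING" | "SPEAKING" | "WAITLISTED";

type VoiceEvent =
  | { type: "session_started" }
  | { type: "agent_speech_started" }
  | { type: "agent_speech_ended" }
  | { type: "tool_called"; tool: string }
  | { type: "user_barge_in" }
  | { type: "waitlist_joined" };

interface Transition {
  next: VoiceState;
  cancelSceneAndSpotlight: boolean; // true only on barge-in
}

function go(next: VoiceState, cancelSceneAndSpotlight = false): Transition {
  return { next, cancelSceneAndSpotlight };
}

function step(state: VoiceState, event: VoiceEvent): Transition {
  const stay = go(state);
  if (state === "WAITLISTED") return stay; // terminal state
  switch (event.type) {
    case "session_started":
      return state === "IDLE" ? go("GREETING") : stay;
    case "agent_speech_ended": // hook or answer finished: hand the turn back
      return state === "GREETING" || state === "SPEAKING" ? go("LISTENING") : stay;
    case "tool_called": // THINKING(tool): e.g. the searching beat holds here
      return state === "LISTENING" || state === "SPEAKING" ? go("THINKING") : stay;
    case "agent_speech_started":
      return state === "LISTENING" || state === "THINKING" ? go("SPEAKING") : stay;
    case "user_barge_in": // cancels the in-progress scene + spotlight
      return state === "GREETING" || state === "SPEAKING" ? go("LISTENING", true) : stay;
    case "waitlist_joined":
      return state === "IDLE" ? stay : go("WAITLISTED");
    default:
      return stay;
  }
}
```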

Edges handled: mis-heard / invalid ZIP (re-ask), multi-county ZIP (ask which), SBE state (honest stop, name the state marketplace, no fabricated numbers), PO-box / unsupported ZIP (suggest nearby county), no plans (honest), provider/drug not found (re-ask), bad email (re-ask), tool error (Florence says she could not pull it + offers retry; never fabricates), mic denied (typed fallback on the dedicated page).
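For the mis-heard ZIP edge, one option is to have the collect_location client tool normalize its ZIP argument before calling /api/counties, so an ambiguous value becomes a re-ask instead of a lookup. A minimal sketch, assuming the argument may arrive as digits or as spoken words; `parseSpokenZip` is hypothetical, and treating a 9-digit ZIP+4 as its first five digits is our assumption.

```typescript
// Hypothetical sketch: normalize a ZIP argument that may be digits ("84094")
// or ASR words ("eight four zero nine four"). Returns null when the result is
// not a usable ZIP, which the tool reports back so Florence re-asks.
const WORD_DIGITS = new Map<string, string>([
  ["zero", "0"], ["oh", "0"], ["one", "1"], ["two", "2"], ["three", "3"],
  ["four", "4"], ["five", "5"], ["six", "6"], ["seven", "7"], ["eight", "8"], ["nine", "9"],
]);

function parseSpokenZip(raw: string): string | null {
  const tokens: string[] = raw.toLowerCase().match(/[a-z]+|\d+/g) ?? [];
  let digits = "";
  for (const token of tokens) {
    if (/^\d+$/.test(token)) digits += token;
    else digits += WORD_DIGITS.get(token) ?? ""; // other words are ignored
  }
  if (/^\d{5}$/.test(digits)) return digits;
  if (/^\d{9}$/.test(digits)) return digits.slice(0, 5); // ZIP+4 -> ZIP (assumption)
  return null; // too few or too many digits: re-ask rather than guess
}
```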

Grounding (the architectural invariant): the agent prompt forbids stating any number not in a tool result this turn; a client-side dragnet scans every salient number Florence speaks against the tool-result numbers and soft-flags ungrounded ones (logged + surfaced in dev). Production uses the Haiku grounding pass per principles #1.
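A minimal sketch of the client-side dragnet, assuming tool results are plain JSON and that "salient" means amounts written with digits after a `$` or before a `%` (spelled-out numbers and unit conversions such as fractions vs percentages are out of scope). A spoken amount counts as grounded if it equals a tool number to the cent, or equals one rounded to the whole dollar; that rounding rule is our assumption, chosen so "$1,051" still matches a tool value of 1051.30.

```typescript
// Hypothetical sketch of the client-side grounding dragnet. It only reports;
// the caller decides whether to log, surface in dev, or ignore (soft flag).
const SALIENT_NUMBER = /\$\s?\d[\d,]*(?:\.\d+)?|\d+(?:\.\d+)?\s?%/g;

// "$1,051.30" -> 1051.3, "60%" -> 60 (a trailing comma from prose is stripped).
function toNumber(token: string): number {
  return Number(token.replace(/[$,%\s]/g, ""));
}

function spokenSalientNumbers(text: string): number[] {
  const matches: string[] = text.match(SALIENT_NUMBER) ?? [];
  return matches.map(toNumber).filter((n) => Number.isFinite(n));
}

// Collects every numeric leaf from a JSON-like tool result.
function collectNumbers(value: unknown, out: number[]): void {
  if (typeof value === "number" && Number.isFinite(value)) {
    out.push(value);
  } else if (Array.isArray(value)) {
    for (const item of value) collectNumbers(item, out);
  } else if (value !== null && typeof value === "object") {
    for (const item of Object.values(value)) collectNumbers(item, out);
  }
}

// Grounded = same amount to the cent, or a tool amount rounded to whole dollars.
function isGrounded(spoken: number, toolNumbers: number[]): boolean {
  const spokenCents = Math.round(spoken * 100);
  return toolNumbers.some((n) => Math.round(n * 100) === spokenCents || Math.round(n) === spoken);
}

function ungroundedNumbers(spokenText: string, toolResults: unknown[]): number[] {
  const toolNumbers: number[] = [];
  for (const result of toolResults) collectNumbers(result, toolNumbers);
  return spokenSalientNumbers(spokenText).filter((n) => !isGrounded(n, toolNumbers));
}
```

With synthetic tool output `{ premium: 0, sticker: 1051.3 }`, the sentence "It is $0 a month instead of $1,051.30. Another runs $873." flags only 873, mirroring the Scene Director proof described in section 5.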

4. Architecture - demo vs production

Both keep every number byte-for-byte ours. They differ only in where the conversational LLM runs.

  • Demo (shipped): ElevenLabs Conversational AI + their hosted LLM (gemini-2.0-flash, orchestration only) + our deterministic CLIENT tools. The agent is forbidden from stating a number it did not get from a tool; it must call client tools that run in the browser and hit our same-origin /api/*. Zero tunnel, fully local, lowest latency, most robust, numbers 100% deterministic. Client-tool calls double as the Scene Director's timing events.
  • Production (recommended, seam shipped, off the demo path): bring-your-own-LLM -> our Bedrock Claude via the OpenAI-shaped shim at /api/florence/byo-llm (uses @anthropic-ai/bedrock-sdk, FLORENCE_BEDROCK_MODEL_ID default us.anthropic.claude-sonnet-4-6, flips to us.anthropic.claude-opus-4-7 by one env var once Bedrock model-access for opus-4-7 is granted on the mgmt account). Requires ElevenLabs cloud to reach our endpoint (deploy or tunnel) - that is why it is the production path, decoupled from the local demo.
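A minimal sketch of the request translation such a shim needs, assuming the incoming body uses OpenAI Chat Completions-style `messages` and the shim forwards Anthropic Messages-style parameters (the shape `@anthropic-ai/bedrock-sdk` accepts in `messages.create`). Tool-call translation, streaming, and the Bedrock call itself are left out, and the 1024 default for `max_tokens` is an assumption. `resolveModelId` shows the one-env-var model flip.

```typescript
// Hypothetical sketch of the request translation inside an OpenAI-shaped shim.
// Streaming, tool-call translation, and the Bedrock call itself are omitted.
type Env = Record<string, string | undefined>;

interface OpenAIChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface OpenAIChatRequest {
  messages: OpenAIChatMessage[];
  max_tokens?: number;
  temperature?: number;
}

interface AnthropicMessage {
  role: "user" | "assistant";
  content: string;
}

interface AnthropicMessagesParams {
  model: string;
  max_tokens: number; // required by the Messages API
  system?: string;
  temperature?: number;
  messages: AnthropicMessage[];
}

const DEFAULT_MODEL_ID = "us.anthropic.claude-sonnet-4-6";
const DEFAULT_MAX_TOKENS = 1024; // assumption, not a value from the demo

// One env var flips the model, e.g. to us.anthropic.claude-opus-4-7.
function resolveModelId(env: Env): string {
  return env.FLORENCE_BEDROCK_MODEL_ID?.trim() || DEFAULT_MODEL_ID;
}

function toAnthropicParams(req: OpenAIChatRequest, env: Env): AnthropicMessagesParams {
  const systemParts: string[] = [];
  const messages: AnthropicMessage[] = [];
  for (const m of req.messages) {
    if (m.role === "system") {
      systemParts.push(m.content); // Anthropic takes the system prompt as a top-level field
      continue;
    }
    const role: AnthropicMessage["role"] = m.role === "user" ? "user" : "assistant";
    const last = messages[messages.length - 1];
    if (last !== undefined && last.role === role) {
      last.content += "\n\n" + m.content; // merge consecutive same-role turns
    } else {
      messages.push({ role, content: m.content });
    }
  }
  const params: AnthropicMessagesParams = {
    model: resolveModelId(env),
    max_tokens: req.max_tokens ?? DEFAULT_MAX_TOKENS,
    messages,
  };
  if (systemParts.length > 0) params.system = systemParts.join("\n\n");
  if (req.temperature !== undefined) params.temperature = req.temperature;
  return params;
}
```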

Why this resolves the comment-6 fork: ElevenLabs Agents brings its own LLM, but the agent is contractually + structurally barred from being the source of any fact - the only way it can say a price is to call our tool and read back exactly what the deterministic pipeline returned. Audio + transcript transit ElevenLabs in BOTH wirings, so the BAA question is identical either way and is the production gate, not a demo blocker.
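The same rule expressed as code: a minimal sketch of a client-tool wrapper, assuming a hypothetical `makeClientTool` helper and `SceneEvent` shape (the actual registration with the ElevenLabs SDK is not shown). The wrapper hands the agent the pipeline's JSON verbatim (no formatting, no arithmetic), reports start and finish to the Scene Director, and on error returns an explicit failure payload so Florence can say she could not pull it and offer a retry.

```typescript
// Hypothetical sketch: a client tool is a thin, deterministic wrapper around
// the shared pipeline. It never formats or recomputes numbers.
type SceneEvent =
  | { kind: "tool_started"; tool: string }
  | { kind: "tool_finished"; tool: string; result: unknown }
  | { kind: "tool_failed"; tool: string };

type ClientTool<P> = (params: P) => Promise<string>;

function makeClientTool<P>(
  name: string,
  run: (params: P) => Promise<unknown>, // e.g. a call into fetchPlansForHousehold
  emit: (event: SceneEvent) => void,    // Scene Director timing hook
): ClientTool<P> {
  return async (params: P) => {
    emit({ kind: "tool_started", tool: name });
    try {
      const result = await run(params);
      emit({ kind: "tool_finished", tool: name, result });
      return JSON.stringify(result); // the agent reads back exactly this
    } catch {
      emit({ kind: "tool_failed", tool: name });
      // An explicit failure lets Florence say she could not pull it; never a guess.
      return JSON.stringify({ error: "unavailable", retryable: true });
    }
  };
}
```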

Files (post-M0 monorepo): apps/web/src/lib/florence/* (agent prompt + provision, scene steps + director, deterministic client tools, Bedrock seam), apps/web/src/app/api/florence/* (agent-session, flag, byo-llm), apps/web/src/components/florence/* (the shared hook + presence + the two dedicated experiences + launcher), apps/web/src/app/florence/* (flag-gated page + CSS). Server flag FLORENCE_WOW_DEMO_ENABLED (plain server env, default off, mirrors session-flag.ts).

Two dedicated experiences, one brain: desktop (1496x756, side presence + on-canvas spotlight + dim, 3-col stagger) and a separate purpose-built mobile-native tree (390x844, full-bleed beat scenes, persistent presence, 56px thumb CTA, safe-area, progressive disclosure). Selected by a real device decision (UA hint + matchMedia swap), not a CSS reflow.
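A minimal sketch of that device decision as a pure function, assuming hypothetical input names. The 600px threshold and the `?device=` override come from section 5; the UA hint would be something like `navigator.userAgentData?.mobile` where the browser supports User-Agent Client Hints. In the page, a `matchMedia` listener on the same breakpoint would re-run this function and swap trees, rather than letting one tree reflow.

```typescript
// Hypothetical sketch of the one-time device decision (names are illustrative).
type DeviceTree = "desktop" | "mobile";

interface DeviceSignals {
  override: string | null;           // value of ?device=, if present
  uaMobileHint: boolean | undefined; // e.g. navigator.userAgentData?.mobile where supported
  viewportWidth: number;             // CSS px
}

const MOBILE_MAX_WIDTH_PX = 600;

function chooseDeviceTree(signals: DeviceSignals): DeviceTree {
  const forced = signals.override;
  if (forced === "mobile" || forced === "desktop") return forced; // explicit override wins
  if (signals.uaMobileHint === true) return "mobile";             // a phone stays on the phone tree
  return signals.viewportWidth <= MOBILE_MAX_WIDTH_PX ? "mobile" : "desktop";
}
```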

5. Running the demo

Local only. No deploy, no staging.

```sh
# from the worktree
cd ~/Developer/ask-florence-eng-356-florence-wow-flow-research/apps/web
PORT=3056 npx next dev
# open http://localhost:3056/florence  (grant microphone)
# resize to <=600px wide to load the dedicated mobile tree, or
#   /florence?device=mobile  /florence?device=desktop  to force a tree
```

The flag FLORENCE_WOW_DEMO_ENABLED=enabled and ELEVENLABS_API_KEY live in the canonical .env.local (gitignored, symlinked into the worktree). Flag off -> /florence 404s and the launcher does not mount.
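A minimal sketch of the gate, assuming the flag is on only when the variable is exactly `enabled` (the value used above) and off for anything else, including unset. How session-flag.ts actually parses its value is not shown in this document, so treat the exact-match rule as an assumption.

```typescript
// Hypothetical sketch of the FLORENCE_WOW_DEMO_ENABLED gate (default off).
type Env = Record<string, string | undefined>;

function isFlorenceWowEnabled(env: Env): boolean {
  // Exact match only: unset, "", "true", or "1" all leave the demo off.
  return env.FLORENCE_WOW_DEMO_ENABLED === "enabled";
}

// Shape returned by /api/florence/flag.
function flagResponse(env: Env): { enabled: boolean } {
  return { enabled: isFlorenceWowEnabled(env) };
}

// In the /florence page (Next.js App Router), the 404-when-off guard would be:
//   import { notFound } from "next/navigation";
//   if (!isFlorenceWowEnabled(process.env)) notFound();
```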

Real vs simulated (honest)

| Verified live in the sandbox | How |
|---|---|
| ElevenLabs Conversational AI integration end to end | POST /api/florence/agent-session provisions a real agent + mints a real WebRTC conversation token with the founder's key (agent AskFlorence WOW (ENG-356), voice Sarah). The voice brain is real, not stubbed. |
| Both dedicated experiences render on-register | Desktop intro pixel-centered at 1496x756 (DOM geometry + screenshot), mobile a separate tree at 390x844 with a 56px thumb CTA (screenshot). Zero console errors. tsc clean across apps/web. |
| The full voice-synced rendered arc + grounding | A network-free Scene Director proof runs the entire golden arc (greeting -> ... -> done) and confirms the grounding dragnet flags ONLY a fabricated $873, never the real tool-sourced $0 / $1,051.30. |
| Flag gate | /api/florence/flag -> {enabled:true}; the 404-when-off path is the notFound() guard. |

| NOT live-captured here | Why |
|---|---|
| The spoken golden-scenario numbers (84094 -> Salt Lake UT -> Medicaid -> ~$1,041 APTC -> Tyler Wood -> Ozempic narrowing) | This sandbox firewalls outbound TCP 27017, so MongoDB Atlas is unreachable. This breaks the ENTIRE app's data layer here (the home calculator, /plans, every /api/* that hits getDb()), not anything Florence-specific. The Florence layer delegates 100% to the unchanged shared pipeline and recomputes nothing, so on any Atlas-reachable machine (the founder's local, prod) the real byte-for-byte numbers flow. The Atlas CLI is authed but allowlisting cannot fix an egress-port block. |
| The live spoken loop (mic in, Florence voice out) | A headless preview cannot grant a real microphone, run WebRTC, or play audio. Founder: open http://localhost:3056/florence on your machine (Atlas-reachable), grant the mic, and speak the golden scenario; the spoken numbers will be the live deterministic values. |

6. Open decisions / asks for the founder

  1. Bedrock Opus 4.7 access. Verified working on the askflorence-mgmt profile: Sonnet 4.6 / Opus 4.6 / Opus 4.5 / Haiku 4.5. Opus 4.7 is an ACTIVE inference profile but anthropic.claude-opus-4-7 is not yet granted (AccessDeniedException). Enable model-access in the Bedrock console and the production BYO-LLM flips to Opus 4.7 with one env var.
  2. Production voice-vendor + BAA decision (section 2). Demo-acceptable != production-signable. Needs: a signed BAA explicitly covering ElevenLabs Conversational AI (ASR + transcript store) OR a decision to go decomposed (Deepgram + Bedrock + Cartesia). Add both vendors to the #57 register, BAA status OPEN.
  3. Transcript-as-PHI (section 2). Decide: contractually pin ElevenLabs transcript retention/deletion inside a BAA, or disable vendor-side transcripts and persist only to our Mongo florence_* store (text-as-source-of-truth). Production blocker, not a demo one.
  4. Prod brain-wiring = bring-your-own-LLM -> Bedrock Claude + server tools (seam shipped at /api/florence/byo-llm). Confirm direction.
  5. Integrated launcher placement. FlorenceLauncher is built + self-gating but intentionally NOT mounted on home / /plans / plan-detail in this PR (avoid regression risk on conversion-critical pages during an unattended build). Say where to mount it.
AskFlorence Internal Documentation. Not for public distribution.
