Tool surface

Florence interacts with the deterministic AskFlorence platform exclusively via LLM tool use. Every factual claim in a Florence response comes from a tool call; Florence does not compute or recall facts. The tool surface is therefore the single most important contract in the architecture.

This document defines the shape of that contract, the two tool families (api_* and ui_*), the auth/classification model, and the pattern every deterministic endpoint follows when it becomes a Florence tool.

Two families of tools

`api_*` — read / write against the deterministic platform

Server-side tools that invoke deterministic endpoints. Examples: api_search_plans, api_check_drug_coverage, api_get_member_plan_details. The result is a structured JSON payload that becomes part of Florence's conversation context.

`ui_*` — drive the frontend state

Server-side tools whose result is delivered to the client as a message over the SSE stream, triggering state changes in the UI. Examples: ui_set_plan_filter, ui_open_plan, ui_highlight_field, ui_start_intake_section. No direct deterministic-API call; the UI reflects Florence's intent.

ui_* tools are what make Florence a co-pilot rather than a chat widget. "Show me plans my kid's pediatrician takes" → Florence calls api_check_provider_network, then ui_set_plan_filter, and narrates the result. The user sees the filter change.
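As a rough illustration of how a ui_* result could travel to the client, the sketch below frames a command as a single Server-Sent Events message. The UiCommand shape, the ui_command event name, and the filter argument are assumptions made for the example; they are not taken from the codebase.

```typescript
// Hypothetical payload for a ui_* tool result (shape assumed for illustration).
interface UiCommand {
  tool: string;                    // e.g. "ui_set_plan_filter"
  args: Record<string, unknown>;   // tool-specific arguments
  callId: string;                  // ties the UI change back to the tool call
}

// Frame one command as one SSE message: an event line, a data line, a blank line.
// "ui_command" is an assumed event name.
function toSseFrame(cmd: UiCommand): string {
  if (!cmd.tool.startsWith("ui_")) {
    throw new Error(`not a ui_* tool: ${cmd.tool}`);
  }
  // JSON.stringify without indentation emits no raw newlines,
  // so the payload fits on a single data: line.
  return `event: ui_command\ndata: ${JSON.stringify(cmd)}\n\n`;
}

// Example: the filter change from the pediatrician scenario (argument name is hypothetical).
const frame = toSseFrame({
  tool: "ui_set_plan_filter",
  args: { providerInNetwork: "provider-123" },
  callId: "call-1",
});
```

Carrying the callId in the payload lets the client and the audit log refer to the same tool call.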

The tool definition contract

Every Florence tool is defined in one place with a uniform contract:

// src/lib/florence/tools/types.ts
import type { ZodSchema } from "zod";
type DataClass = "Public" | "PHI" | "PII" | "FTI" | "ApplicationPayload";
type AuthContext = "anonymous" | "authenticated_member" | "authenticated_agent" | "authenticated_admin";

interface FlorenceTool<Input, Output> {
  name: string;                         // e.g. "api_check_drug_coverage"
  description: string;                  // shown to the LLM
  inputSchema: ZodSchema<Input>;        // LLM-visible schema
  outputSchema: ZodSchema<Output>;      // runtime-validated

  // Classification contract
  acceptsAuthContexts: readonly AuthContext[];
  inputClass: DataClass;                // highest class an input field may carry
  outputClass: DataClass;               // highest class the output may carry

  // Execution
  execute(input: Input, ctx: ToolExecutionContext): Promise<Output>;

  // Caching
  cacheKey?(input: Input): string;      // opt-in result cache (see TTL rules)
  cacheTtlSeconds?: number;

  // Evolution
  version: string;                      // semver; tools are versioned (see lifecycle)
  status: "stable" | "beta" | "deprecated";
}

The ToolExecutionContext carries the auth context, the conversation ID, the turn ID, the member ID (if authenticated), and the parent Florence system role (member-mode vs. agent-mode). The wrapper enforces acceptsAuthContexts before the underlying endpoint is called — unauthorized tool calls are rejected at the tool layer, not the prompt.
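A minimal sketch of the context and the auth gate might look like the following. Apart from authContext and acceptsAuthContexts, the field names are guesses derived from the description above, and checkAuth and AuthDecision are hypothetical helpers rather than the real wrapper API.

```typescript
type AuthContext =
  | "anonymous"
  | "authenticated_member"
  | "authenticated_agent"
  | "authenticated_admin";

// Field names other than authContext are assumptions based on the prose.
interface ToolExecutionContext {
  authContext: AuthContext;
  conversationId: string;
  turnId: string;
  memberId?: string;                          // present only when authenticated
  systemRole: "member-mode" | "agent-mode";
}

// Hypothetical result type for the gate.
interface AuthDecision {
  allowed: boolean;
  reason?: string;
}

// Runs before the underlying endpoint is touched.
function checkAuth(
  tool: { name: string; acceptsAuthContexts: readonly AuthContext[] },
  ctx: ToolExecutionContext,
): AuthDecision {
  if (tool.acceptsAuthContexts.includes(ctx.authContext)) {
    return { allowed: true };
  }
  return {
    allowed: false,
    reason: `${tool.name} does not accept auth context "${ctx.authContext}"`,
  };
}

// Example: an anonymous session asking for member data is denied at the tool layer.
const memberTool = {
  name: "api_get_member_plan_details",
  acceptsAuthContexts: ["authenticated_member"] as const,
};
const anonymousCtx: ToolExecutionContext = {
  authContext: "anonymous",
  conversationId: "conv-1",
  turnId: "turn-1",
  systemRole: "member-mode",
};
const denied = checkAuth(memberTool, anonymousCtx);
```

Returning a structured decision rather than throwing keeps the denial easy to log and easy to hand back to the model as a readable error.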

The deterministic-integration pattern

Every Florence api_* tool wraps a deterministic endpoint. The pattern is uniform so that adding a new API (drug lookup, provider lookup, appointment booking, claims, renewal analysis, etc.) is a routine operation, not an architectural decision.

Steps every wrapper performs, in order

1. Input validation against the Zod schema. Malformed inputs from the LLM are rejected with a structured error the LLM can read and retry against.
2. Auth gate. Compare ctx.authContext against tool.acceptsAuthContexts. Deny unambiguously if mismatched. Log the denial.
3. Classification gate. If the tool's outputClass is FTI or ApplicationPayload, the destination adapter sink (see data classification) must allow that class. This is enforced at compile time via branded types, with a runtime check as a belt-and-suspenders backstop.
4. Result cache check. If the tool declares a cacheKey and a fresh result exists within cacheTtlSeconds, return the cached value. Plan data, drug coverage, and provider network are prime cache candidates. Member-specific tools generally are not.
5. Execute the underlying deterministic endpoint, using the standard retry policy (3 attempts, exponential backoff, circuit breaker at 5 consecutive failures).
6. Output validation against the Zod schema. Deterministic responses are strict: a schema mismatch is a deployment bug, not a user-visible error.
7. Serialize into a compact, LLM-friendly shape. See serialization below.
8. Audit emit. Every tool call produces an audit-log row (tool name + version, input hash, output summary hash, auth context, auth decision, cache hit/miss, latency, errors).
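The ordering of these steps can be sketched as a single function. This is a simplified illustration, not the real helpers/execute.ts: it is synchronous to stay short (the actual contract returns a Promise), it stands in for the Zod schemas with plain parse functions, and it omits the classification gate, retries, and serialization. Every name in it (SketchTool, runTool, AuditRow, the sample values) is hypothetical.

```typescript
type AuthContext = "anonymous" | "authenticated_member";

interface SketchTool<I, O> {
  name: string;
  version: string;
  acceptsAuthContexts: readonly AuthContext[];
  parseInput(raw: unknown): I;     // stands in for the Zod input schema; throws on bad input
  parseOutput(raw: unknown): O;    // stands in for the Zod output schema
  cacheKey?(input: I): string;
  cacheTtlSeconds?: number;
  execute(input: I): unknown;      // synchronous only to keep the sketch short
}

interface AuditRow {
  tool: string;
  authDecision: "allow" | "deny" | "not_evaluated";
  cacheHit: boolean;
  error?: string;
}

type ToolResult<O> = { ok: true; value: O } | { ok: false; error: string };

const cache = new Map<string, { value: unknown; expiresAt: number }>();
const auditLog: AuditRow[] = [];

function runTool<I, O>(
  tool: SketchTool<I, O>,
  raw: unknown,
  auth: AuthContext,
  now: number = Date.now(),
): ToolResult<O> {
  const row: AuditRow = {
    tool: `${tool.name}@${tool.version}`,
    authDecision: "not_evaluated",
    cacheHit: false,
  };
  try {
    const input = tool.parseInput(raw);                          // step 1: input validation
    if (!tool.acceptsAuthContexts.includes(auth)) {              // step 2: auth gate
      row.authDecision = "deny";
      throw new Error("auth context not accepted");
    }
    row.authDecision = "allow";
    const key = tool.cacheKey ? `${tool.name}:${tool.cacheKey(input)}` : undefined;
    const hit = key !== undefined ? cache.get(key) : undefined;
    if (hit !== undefined && hit.expiresAt > now) {              // step 4: cache check
      row.cacheHit = true;
      return { ok: true, value: hit.value as O };
    }
    const value = tool.parseOutput(tool.execute(input));         // steps 5 and 6
    if (key !== undefined && tool.cacheTtlSeconds !== undefined) {
      cache.set(key, { value, expiresAt: now + tool.cacheTtlSeconds * 1000 });
    }
    return { ok: true, value };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    row.error = message;
    return { ok: false, error: message };                        // structured, readable error
  } finally {
    auditLog.push(row);                                          // step 8: one audit row per call
  }
}

// Example tool with a call counter so the cache hit is observable.
let calls = 0;
const searchPlans: SketchTool<{ state: string }, { count: number }> = {
  name: "api_search_plans",
  version: "1.0.0",
  acceptsAuthContexts: ["anonymous", "authenticated_member"],
  parseInput(raw) {
    const state = (raw as { state?: unknown } | null)?.state;
    if (typeof state !== "string") throw new Error("state must be a string");
    return { state };
  },
  parseOutput(raw) {
    const count = (raw as { count?: unknown } | null)?.count;
    if (typeof count !== "number") throw new Error("count must be a number");
    return { count };
  },
  cacheKey: (input) => input.state,
  cacheTtlSeconds: 300,
  execute: () => {
    calls += 1;
    return { count: 12 };
  },
};

const first = runTool(searchPlans, { state: "TX" }, "anonymous", 0);
const second = runTool(searchPlans, { state: "TX" }, "anonymous", 1000);  // served from cache
const bad = runTool(searchPlans, { state: 42 }, "anonymous", 2000);       // rejected at step 1
```

Putting the audit write in a finally block means denials, cache hits, and failures all leave a row, which matches the "every tool call" wording of the audit step.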

Wrappers live in `src/lib/florence/tools/`

src/lib/florence/tools/
  index.ts              — exports the tool registry
  types.ts              — FlorenceTool, ToolExecutionContext, DataClass, AuthContext
  registry.ts           — imports every tool module; builds the registry
  helpers/
    execute.ts          — the uniform wrapper (auth, classification, cache, audit)
    cache.ts            — tool-result cache
    serializer.ts       — serialization helpers
  api/                  — one file per api_* tool
    search-plans.ts
    check-drug-coverage.ts
    check-provider-network.ts
    get-member-plan-details.ts
    ...
  ui/                   — one file per ui_* tool
    set-plan-filter.ts
    open-plan.ts
    ...

One tool per file. Adding a tool is a single PR that touches: the tool file, the registry, the tool-registry doc, and the eval set. See adding a tool.

Serialization

LLM context is expensive; serialized outputs should be compact, well-typed, and self-describing.

Prefer stable field names over positional arrays: { premium: 234.50, deductible: 1500 } not [234.50, 1500].
Include a _meta.tool field with the tool name and version in every serialized result, so the grounding check can tie factual claims back to a specific call.
Include a _meta.callId UUID so audit logs and grounding checks can cross-reference. This UUID is how Florence "cites" her claims internally.
Truncate wisely: a 50-plan list returned as full detail is ~30 k tokens. A 50-plan list returned as { planId, issuer, metal, premiumMonthly, rating, topBenefits } is ~3 k tokens. Florence asks for detail on a specific plan via a follow-up tool call.
Never include raw PII/FTI fields that Florence does not need. A member-profile tool returns medication-and-provider context to Florence; it does not return SSN or DOB just because the record has them.
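A small sketch of these rules applied to a plan list follows. The summary field names come from the compact shape mentioned above; PlanRecord, benefitsDetail, the nested layout of _meta.tool, and the sample values are assumptions made for the example.

```typescript
// Hypothetical full record as it might come back from the deterministic endpoint.
interface PlanRecord {
  planId: string;
  issuer: string;
  metal: string;
  premiumMonthly: number;
  rating: number;
  topBenefits: string[];
  benefitsDetail: Record<string, string>;   // bulky detail, left out of list results
}

type PlanSummary = Pick<
  PlanRecord,
  "planId" | "issuer" | "metal" | "premiumMonthly" | "rating" | "topBenefits"
>;

// The nesting of name and version under _meta.tool is an assumed layout.
interface Serialized<T> {
  _meta: { tool: { name: string; version: string }; callId: string };
  data: T;
}

function serializePlanList(
  plans: PlanRecord[],
  tool: { name: string; version: string },
  callId: string,
): Serialized<PlanSummary[]> {
  return {
    _meta: { tool: { name: tool.name, version: tool.version }, callId },
    // Copy named fields explicitly so nothing extra leaks into LLM context.
    data: plans.map(({ planId, issuer, metal, premiumMonthly, rating, topBenefits }) => ({
      planId,
      issuer,
      metal,
      premiumMonthly,
      rating,
      topBenefits,
    })),
  };
}

const serialized = serializePlanList(
  [
    {
      planId: "plan-1",
      issuer: "Example Health",
      metal: "Silver",
      premiumMonthly: 234.5,
      rating: 4,
      topBenefits: ["Primary care visits"],
      benefitsDetail: { "Primary care visits": "long description omitted from summaries" },
    },
  ],
  { name: "api_search_plans", version: "1.0.0" },
  "call-1",
);
```

An explicit allowlist of fields, rather than deleting unwanted ones, means a new field on the underlying record stays out of the model's context until someone deliberately adds it.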

The current tool surface

See tool registry for the living inventory. At the time of this writing:

Stable: api_search_plans (wraps /api/plans), api_check_eligibility (wraps /api/eligibility)
Coming soon (Phase C, #17): api_check_drug_coverage
Coming soon (Phase D, #18): api_check_provider_network
Coming soon (Phase 5 member auth): api_get_member_profile, api_get_member_plan_details, api_initiate_sep_workflow
Coming soon (Phase 5 agent auth): api_list_my_assigned_members, api_get_member_full_history, api_draft_sep_letter, api_compose_member_message, api_assign_escalation
Always present: ui_set_plan_filter, ui_open_plan, ui_highlight_field, ui_start_intake_section, ui_show_comparison
Escalation: api_escalate_to_human (v1: email + Mongo row; Phase 5: admin-dashboard queue)

SBE vs FFM data lookup — which backend each tool hits per state

Read this before touching any coverage tool. The deterministic endpoints behind Florence's coverage tools do NOT all use the same data source. The backend is chosen by the plan's state, not by plan-ID format (NY uses 14-char FFM-format IDs; CA uses 16-char — only state disambiguates). The canonical fork is isOwnedDataState(state) in @askflorence/shared (today: NY + CA). See docs/data-sources/sbe-state-watchouts.md for the full per-state decision log.

Two data planes:

FFM states (~30) — CMS Marketplace API is authoritative for plans, drug coverage, and provider coverage. Provider identity = NPPES NPI.
Owned-data states (NY, CA) — CMS does not index these plans. We serve them from our own collections via getReferenceDb() (staging cluster over PrivateLink, ADR 0004). Drug coverage from formularies_staging; provider coverage from providers_staging. CA provider identity = Symphony providerId, NOT NPPES NPI (the CalHEERS/Symphony directory does not expose NPI — external_ids.npi is null on every CA provider doc).

What this means for each tool (verified on apex 2026-05-28):

| Tool | Endpoint | Lookup key | FFM | NY | CA |
|---|---|---|---|---|---|
| `find_plans` | `/api/plans` | `state` | ✅ CMS | ✅ owned | ✅ owned |
| `check_drug` | `/api/drugs/find` | rxcui + plan_id against `formularies_staging` | ✅ | ✅ | ✅ |
| `find_doctors` (suggest) | `/api/providers/suggest` (passes `state`) | zip + Symphony for CA | ✅ NPPES | ✅ | ✅ Symphony |
| `check_provider` | `/api/providers/autocomplete` (NPPES) → `/api/providers/covered` | NPPES NPI | ✅ | ✅ | ❌ see below |
Why check_drug is state-agnostic and just works: drugs are identified by RxCUI, a single national identifier. /api/drugs/find queries formularies_staging by (rxcui, plan_id) regardless of state, and CA drugs were ingested there (ENG-395). Search and coverage use the same key, so the lookup works end to end in every state. No state param is needed.

Why check_provider is broken for CA (known gap, [ENG-410]): the discovery step (/api/providers/autocomplete) hits NPPES and returns an NPPES NPI, but CA provider data in providers_staging is keyed by Symphony providerId (_id: "ca-sym:<id>", npi: null). There is no NPPES↔Symphony bridge, so /api/providers/covered finds no match and returns NotCovered for every CA doctor a user names, including in-network ones. coverAcrossAllPlans() also does not thread state today. find_doctors/suggest is unaffected because it discovers via Symphony, so its providerIds match. Until ENG-410 lands (routing CA provider discovery through /api/providers/search-ca to obtain Symphony providerIds), prefer the suggest path for CA provider questions and do not trust a check_provider "not covered" result for a CA plan.

Rule for any new or modified coverage tool:

Thread the user's state (from store.location.state) into the deterministic call. Owned-data routes need it to dispatch off CMS.
Know your identifier: national (RxCUI) → state-agnostic; provider (NPI vs Symphony providerId) → state-specific, and discovery + coverage MUST use the same ID system or they won't join.
If you add a state to OWNED_DATA_STATES, re-verify every coverage tool against a real plan from that state before announcing the tool as working there.
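The rule can be made concrete with a small guard. The sketch below uses a local stand-in for isOwnedDataState (the real one lives in @askflorence/shared and may differ), and ProviderIdSystem, coverageIdSystem, and canJoinCoverage are hypothetical names introduced only for illustration.

```typescript
// Local stand-in for the shared helper; today's owned-data states per this document.
const OWNED_DATA_STATES: readonly string[] = ["NY", "CA"];

function isOwnedDataState(state: string): boolean {
  return OWNED_DATA_STATES.includes(state.toUpperCase());
}

type ProviderIdSystem = "nppes_npi" | "symphony_provider_id";

// Which identifier the coverage data is keyed by, following the data-plane notes:
// CA providers are keyed by Symphony providerId; FFM states and NY use NPPES NPI.
function coverageIdSystem(state: string): ProviderIdSystem {
  return state.toUpperCase() === "CA" ? "symphony_provider_id" : "nppes_npi";
}

interface ProviderRef {
  idSystem: ProviderIdSystem;
  id: string;
}

// Discovery and coverage only join when they share an ID system.
function canJoinCoverage(state: string, provider: ProviderRef): boolean {
  return provider.idSystem === coverageIdSystem(state);
}

// An NPI found via NPPES cannot be checked against CA coverage data.
const npiRef: ProviderRef = { idSystem: "nppes_npi", id: "1234567890" };
const symphonyRef: ProviderRef = { idSystem: "symphony_provider_id", id: "ca-sym:123" };
const caNpiJoin = canJoinCoverage("CA", npiRef);
const caSymphonyJoin = canJoinCoverage("CA", symphonyRef);
```

A guard like this would let a coverage tool return an explicit "cannot verify" result instead of a misleading "not covered" when the identifier systems do not match.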

Versioning

Tools are versioned because they will evolve.

Minor versions (additive fields, relaxed input validation): no breaking change, no eval reset, old calls still valid.
Major versions (input rename, output shape change, behavior change): register as api_foo_v2, keep api_foo available during deprecation, run both through evals, retire the old version after the next successful audit window.

The registry tracks version + status per tool. The system prompt includes only stable + beta tools; deprecated tools are still callable (for in-flight conversations) but not announced to the model.
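The split between announced tools and callable tools could look roughly like this. RegistryEntry, announcedTools, and findCallable are hypothetical names, api_bar is an invented placeholder, and the version numbers are sample values only.

```typescript
type ToolStatus = "stable" | "beta" | "deprecated";

interface RegistryEntry {
  name: string;
  version: string;   // semver
  status: ToolStatus;
}

// Sample registry during a major-version migration of api_foo.
const registry: RegistryEntry[] = [
  { name: "api_foo", version: "1.4.0", status: "deprecated" },
  { name: "api_foo_v2", version: "2.0.0", status: "stable" },
  { name: "api_bar", version: "0.3.0", status: "beta" },
];

// Tools listed in the system prompt: stable and beta only.
function announcedTools(entries: readonly RegistryEntry[]): RegistryEntry[] {
  return entries.filter((entry) => entry.status !== "deprecated");
}

// Any registered tool stays callable, so in-flight conversations keep working.
function findCallable(
  entries: readonly RegistryEntry[],
  name: string,
): RegistryEntry | undefined {
  return entries.find((entry) => entry.name === name);
}

const announcedNames = announcedTools(registry).map((entry) => entry.name);
const legacy = findCallable(registry, "api_foo");
```

Keeping the two lookups separate means that retiring a tool is a matter of removing it from the registry, while deprecating it only changes what the model is told about.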

Extensibility — tool surface grows with the platform

The integration pattern is deliberately uniform so that drug lookup, provider lookup, appointment booking, renewal analysis, claims, bills, and anything else the deterministic platform grows to support can be exposed to Florence as routine work:

1. Build the deterministic endpoint as usual.
2. Write a tool wrapper in src/lib/florence/tools/api/ declaring input, output, classification, auth contexts.
3. Register it in registry.ts.
4. Add it to the tool registry doc.
5. Write eval coverage (at minimum: three factual cases, one adversarial, one auth-boundary).
6. Ship behind a feature flag; monitor the cost + latency + routing-mix impact for a week; graduate to stable.

Full playbook: adding a tool.