Deferred architecture decisions

This page is the canonical home for architecture decisions that are correctly deferred today but worth documenting so future engineers (and auditors) can see we've thought about them. Pattern: keep Linear backlog focused on actively-actionable work; park "revisit when conditions X happen" decisions here.

For each deferred decision: current state, known limitations, trigger conditions to revisit, proposed migration plan + effort, cross-references.

Companion to: docs/data-sources/cms-dependency-map.md (same pattern, applied to CMS dependency posture).

How to use this page:

  • Reviewing this page quarterly is enough to catch shifts in trigger conditions
  • When a trigger fires, file a Linear issue for the migration work and reference the section here
  • New deferred decisions land here, not as perpetual Linear backlog issues

Rate-limiter storage: per-task in-memory → Redis (ElastiCache)

Source: ENG-286 audit M8. Originally filed as ENG-334 (cancelled 2026-05-14 — moved here).

Current state

Per-task in-memory Map<ip, timestamps[]> rate limiter in src/lib/agent-db.ts:136-195. It backs the waitlist and agent-discovery routes and, since ENG-321, every state-changing POST and every CMS-proxy route.

typescript
// Pattern (simplified): sliding-window log, one timestamp array per client IP.
const buckets = new Map<string, number[]>();

function checkRateLimit(ip: string, limit: number, windowMs: number): boolean {
  const now = Date.now();
  // Keep only the timestamps that still fall inside the window.
  const timestamps = (buckets.get(ip) ?? []).filter(t => now - t < windowMs);
  if (timestamps.length >= limit) return false; // over the cap: reject
  timestamps.push(now);
  buckets.set(ip, timestamps);
  return true;
}

Known limitation: fuzzy cap

  • ECS service runs N tasks (currently 2 in prod)
  • Each task holds its own in-memory Map
  • Effective user-facing cap is N × configured cap — a user load-balanced across tasks gets up to N× the per-task throughput
  • Not a breakage; just means configured 30/5min is in practice up to 60/5min for a real user

ENG-321 explicitly documents acceptance of this fuzziness: for anti-scraping defense the limiter is a speed bump, not a hard ceiling. A real scraper still hits a meaningful (if fuzzy) cap.
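
The N × cap arithmetic can be checked with a small simulation. This is a minimal sketch, not the code in src/lib/agent-db.ts: makeLimiter and allowedAcrossTasks are hypothetical helpers, and strict round-robin load balancing is an assumption that gives the worst case (the "up to" in "up to N×").

```typescript
// Illustrative model only: each "task" owns a private limiter, as each ECS task does today.
type Limiter = (ip: string, now: number) => boolean;

function makeLimiter(limit: number, windowMs: number): Limiter {
  const buckets = new Map<string, number[]>();
  return (ip, now) => {
    const recent = (buckets.get(ip) ?? []).filter(t => now - t < windowMs);
    if (recent.length >= limit) return false;
    recent.push(now);
    buckets.set(ip, recent);
    return true;
  };
}

// Count how many of `requests` back-to-back requests from one IP are allowed
// when they are spread round-robin (assumed) over `taskCount` tasks.
function allowedAcrossTasks(
  taskCount: number,
  limit: number,
  windowMs: number,
  requests: number,
): number {
  const tasks = Array.from({ length: taskCount }, () => makeLimiter(limit, windowMs));
  let allowed = 0;
  for (let i = 0; i < requests; i++) {
    const task = tasks[i % taskCount];
    // Every request arrives at t = 0, so all of them fall inside one window.
    if (task !== undefined && task("203.0.113.7", 0)) allowed++;
  }
  return allowed;
}

const FIVE_MIN_MS = 5 * 60 * 1000;
console.log(allowedAcrossTasks(1, 30, FIVE_MIN_MS, 100)); // 30
console.log(allowedAcrossTasks(2, 30, FIVE_MIN_MS, 100)); // 60
```

With one task the configured 30/5min holds exactly; with two tasks the same client gets 60 requests through, matching the figure above.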

Trigger conditions to revisit

Migrate to shared-state rate limiting when any of:

  • ECS scales to ≥4 tasks (effective cap drift ≥4x, abuse defense gets too loose)
  • Legitimate traffic grows to where N × per-task cap matters for legitimate UX (currently low volume; user portal milestone will change this)
  • Anti-scraping precision becomes a strategic requirement vs. "speed bump" deterrent
  • Specific abuse pattern observed that the fuzzy cap permits (e.g., scraper exploiting per-task state intentionally)

Proposed migration plan

Target: ElastiCache Redis cluster in VPC, shared across all ECS tasks for rate-limit state.

Scope:

  • Provision ElastiCache Redis cluster (cache.t4g.micro for start) in existing VPC subnet group
  • Wire security group: ECS task SG → Redis SG, port 6379
  • Add Redis connection string to Secrets Manager (prod/redis-rate-limit, staging/redis-rate-limit)
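  • Refactor src/lib/agent-db.ts rate-limit logic to use Redis INCR + EXPIRE (a fixed-window counter, rather than today's sliding-window log):
typescript
// Assumes `redis` is an already-connected client exposing incr() and expire().
async function checkRateLimit(ip: string, route: string, limit: number, windowSec: number): Promise<boolean> {
  const key = `rl:${route}:${ip}`;
  const count = await redis.incr(key); // one shared counter per route + IP across all tasks
  if (count === 1) await redis.expire(key, windowSec); // first hit in a window starts the TTL
  return count <= limit;
}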
  • Update ECS task-def to inject Redis connection string env var
  • Add Redis health check to startup probes
  • Fall-open behavior: if Redis unreachable, allow request (log WARN with [rate-limit-degraded] marker per ENG-330 observability pattern)
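
The fall-open bullet could look roughly like the sketch below. It is a sketch under assumptions, not the planned implementation: CounterClient is a hypothetical minimal interface rather than a specific Redis library's type, checkRateLimitFailOpen and fakeClient are illustrative names, and the warn callback stands in for whatever logger ENG-330 prescribes.

```typescript
// Minimal client surface this sketch needs; hypothetical, not a specific library's type.
interface CounterClient {
  incr(key: string): Promise<number>;
  expire(key: string, seconds: number): Promise<unknown>;
}

async function checkRateLimitFailOpen(
  client: CounterClient,
  ip: string,
  route: string,
  limit: number,
  windowSec: number,
  warn: (msg: string) => void = console.warn,
): Promise<boolean> {
  const key = `rl:${route}:${ip}`;
  try {
    const count = await client.incr(key);
    // First hit in a window starts the TTL, so the counter resets windowSec later.
    if (count === 1) await client.expire(key, windowSec);
    return count <= limit;
  } catch (err) {
    // Fall open: an unreachable Redis degrades rate limiting instead of blocking requests.
    const reason = err instanceof Error ? err.message : String(err);
    warn(`[rate-limit-degraded] allowing ${route} request: ${reason}`);
    return true;
  }
}

// In-memory stand-in for Redis, used only to exercise the sketch.
function fakeClient(failing = false): CounterClient {
  const counts = new Map<string, number>();
  return {
    async incr(key) {
      if (failing) throw new Error("connection refused");
      const next = (counts.get(key) ?? 0) + 1;
      counts.set(key, next);
      return next;
    },
    async expire() {
      return 1;
    },
  };
}
```

One caveat worth weighing at migration time: INCR and EXPIRE are two separate commands, so a failure between them can leave a counter without a TTL. A production version would likely make the pair atomic, for example with a Lua script.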

Effort: ~4h (Terraform + code + verification)

Reversibility: trivial. Reverting the code change restores the in-memory Map; the ElastiCache cluster can stay running for future use or be destroyed.

Related future opportunities (not part of the rate-limiter migration itself)

When Redis lands for rate limiting, there are two adjacent opportunities to consider (file separate Linear issues at that time):

  1. Marketing session storage — ENG-322's marketing_sessions Mongo collection could optionally move to Redis. Faster reads (~ms vs ~10-50ms), but Mongo with encryption-at-rest is sufficient for marketing-tier (non-PHI) data. Decide at migration time based on actual perf data.
  2. Distributed locks for delayed-job coordination — currently scheduler-coordinated; Redis-backed locks would enable finer-grained job orchestration. Phase 5 (user portal) work may need this.

Cross-references

  • src/lib/agent-db.ts:134 — current rate-limiter
  • ENG-321 — rate limits + Origin allowlist + test bypass (consumer of this rate-limiter today)
  • ENG-322 — session-cookie architecture (potential future co-tenant on Redis)
  • ENG-330 — graceful degradation + observability pattern (same [degraded] log marker pattern applies)
  • ENG-286 audit doc docs/audit/comprehensive-code-review-2026-05-12.md — finding M8

ECS task execution role: shared → per-task-def secret ARN scoping

Source: ENG-286 audit I16. Originally filed as ENG-333 (cancelled 2026-05-14 — moved here).

Current state

Prod ECS task execution role gets secretsmanager:GetSecretValue on every ARN in values(module.secrets.secret_arns) — broader than the per-task-def need.

hcl
# infra/envs/prod/ecs.tf:117
task_execution_secret_arns = values(module.secrets.secret_arns)

Known limitation: defense-in-depth gap

  • The TASK role (different from execution role) is correctly narrow
  • The EXECUTION role's job is to pull secrets at task startup and inject them as env vars
  • Execution role is invoked once per task spawn, never used by the running app
  • If an attacker did compromise the execution role (highly unusual, since it is a startup-time credential rather than a runtime one), they could pull secrets the task doesn't reference
  • BUT: the task definition's secrets_from_manager map limits which secrets actually get injected into the running task at runtime

So this is a defense-in-depth gap: tightening the IAM grant would align it with what each task def needs, but the current breadth doesn't change runtime behavior. The audit explicitly flagged this as Info-severity ("fine for current scale, refine when scaling").

Trigger conditions to revisit

Tighten to per-task-def secret ARN scoping when any of:

  • User portal milestone adds new task definitions (multiple task defs sharing one execution role = bigger blast radius if execution role compromised)
  • SOC 2 audit specifically flags least-privilege evidence requirements
  • General Terraform refactor sweeps the ECS module (fold this in for free)

Proposed migration plan

In infra/modules/ecs-service (or per-env config), build the list of secret ARNs from the actual task_definition.secrets_from_manager map rather than values(module.secrets.secret_arns). Each task def's execution role policy contains only the ARNs that task def references.
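
The scoping logic, and the pre/post diff used to verify it, can be modeled outside Terraform. The sketch below is illustrative only and is not the module's code. It assumes module.secrets.secret_arns is a map of secret name to ARN and that secrets_from_manager maps env var name to secret name; the real shapes may differ. scopeSecretArns and every name and ARN in the sample data are placeholders.

```typescript
// Illustrative model of the Terraform change, not the module's actual code.
// Assumed shapes: secretArns maps secret name -> ARN;
// secretsFromManager maps env var name -> secret name.
type StringMap = Record<string, string>;

interface ScopedGrant {
  granted: string[]; // ARNs the execution role keeps
  dropped: string[]; // ARNs the tightening removes from the policy
  missing: string[]; // referenced secret names with no known ARN (config error)
}

function scopeSecretArns(secretArns: StringMap, secretsFromManager: StringMap): ScopedGrant {
  const referenced = new Set(Object.values(secretsFromManager));
  const granted: string[] = [];
  const dropped: string[] = [];
  for (const [name, arn] of Object.entries(secretArns)) {
    (referenced.has(name) ? granted : dropped).push(arn);
  }
  const missing = Array.from(referenced).filter(name => secretArns[name] === undefined);
  return { granted: granted.sort(), dropped: dropped.sort(), missing: missing.sort() };
}

// Placeholder names and ARNs, for illustration only.
const allSecrets: StringMap = {
  "example/db-uri": "arn:aws:secretsmanager:us-east-1:111122223333:secret:example/db-uri",
  "example/api-key": "arn:aws:secretsmanager:us-east-1:111122223333:secret:example/api-key",
  "example/signing-key": "arn:aws:secretsmanager:us-east-1:111122223333:secret:example/signing-key",
};
const webTaskSecrets: StringMap = { DB_URI: "example/db-uri" };

console.log(scopeSecretArns(allSecrets, webTaskSecrets));
```

In this model, the dropped list is what the IAM policy JSON diff should show as removed, and a non-empty missing list would indicate a task def referencing a secret the secrets module does not define.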

Effort: ~30min Terraform refactor + verification (IAM policy JSON diff pre/post)

Reversibility: trivial — revert the Terraform change.

Cross-references

  • infra/envs/prod/ecs.tf:117 — current task-execution-role secret ARN grant
  • ENG-286 audit doc docs/audit/comprehensive-code-review-2026-05-12.md — finding I16

Pattern: when does a decision belong here vs in Linear?

Belongs in Linear (actionable now, time-boxed, milestone-bound):

  • The work has an immediate acute pain it addresses
  • The work is part of an active milestone or sprint
  • The work has a definite "done" state achievable in the current cycle

Belongs here (deferred-pending-trigger):

  • No acute pain today
  • Specific trigger conditions exist that would change the calculus
  • Migration plan can be sketched but execution waits for the trigger
  • Auditor / future engineer benefits from documented thinking

When a trigger fires:

  1. File a new Linear issue
  2. Reference the relevant section here
  3. The Linear issue captures the execution; this doc captures the decision and trigger

This page is reviewed quarterly to catch shifts in trigger conditions and surface any items that have become actionable.

AskFlorence Internal Documentation. Not for public distribution.