Design Doc

Bedrock Access Gateway – Design Spec

Doc status: Draft v1.0
Owner: Miguel Merlin
Date: Aug 12, 2025
Reviewers:


1) Overview

Provide Bedrock access to internal IAM users via a controlled gateway that works from developer IDEs. The system exposes a stable HTTP API (OpenAI‑style), enforces per‑user throttling and daily token budgets, and includes multiple kill‑switches.

Primary goals

  • Simple IDE integration via HTTPS + API key

  • Per‑user TPS throttling and daily output token cap

  • Hard and soft kill‑switches

  • Centralized audit, cost visibility, and guardrails

Non‑goals

  • Expose IAM credentials to users’ machines

  • Provide direct Bedrock access from user principals

  • Replace existing enterprise identity provider (SSO remains unchanged)


2) Requirements

2.1 Functional

  1. Users can call /v1/chat and /v1/complete from IDEs.

  2. Requests authenticated via API Key; no IAM credentials on client.

  3. Enforce rate limits (TPS/burst) per user (or per team plan) at the edge.

  4. Enforce daily output token budget per user; reject after budget exceeded.

  5. Support streaming responses (SSE) for chat/completions.

  6. Maintain an allowlist of Bedrock models; requests for any other model are rejected.

  7. Provide usage/remaining quota endpoint.

  8. Emit structured logs and metrics for cost and auditing.

2.2 Operational

  1. Soft kill: feature flag to disable traffic quickly.

  2. Hard kill: SCP denying Bedrock; model access toggle in Bedrock.

  3. Config changes (caps, models, plans) without redeploy.

  4. SLO: 99.9% monthly availability for gateway.

  5. P90 end‑to‑end latency target: ≤ 1.5s for 2‑KB prompts, non‑streaming.

2.3 Security & Compliance

  • Users cannot directly call Bedrock; only the gateway’s IAM role can.

  • TLS 1.2+, HSTS, no PII in logs by default; opt‑in redaction allowlist.

  • Least‑privilege IAM and VPC endpoints for Bedrock where applicable.


3) High‑Level Architecture

IDE/CLI ──HTTPS──> API Gateway ──(authorizer)──> Lambda Proxy ──> Bedrock
                               │                 └─> DynamoDB (usage)
                               └─> Usage Plans/API Keys

Control plane: SSM Parameter Store (feature flags), DynamoDB (config/limits)
Kill‑switches: SSM flag (soft), Org SCP + Bedrock model access (hard)
Edge safety: AWS WAF (optional)

Key Components

  • API Gateway (REST API): Routing, API Keys, Usage Plans, throttling. Usage Plans and API Keys are REST API features and are not available on HTTP APIs.

  • Lambda Authorizer: Validates API key, checks feature flag, reads per‑user caps, pre‑checks remaining daily tokens (estimated).

  • Lambda Proxy: Validates payload, calls Bedrock, streams results, updates usage counters from actual token usage.

  • DynamoDB: Token metering per user/day; user profiles; config.

  • SSM Parameter Store: Feature flags + global toggles.

  • Organizations SCP: Bedrock deny policy for emergency.

  • WAF (optional): Rate‑based block/allow rules.


4) Data Model

4.1 DynamoDB Tables

Table: BedrockTokenUsage

  • PK: userId (S)

  • SK: date (S, YYYY-MM-DD)

  • Attributes: inputTokensTotal (N), outputTokensTotal (N), lastUpdated (S RFC3339), version (N)

  • TTL (optional): expireAt (N, unix epoch) for 90‑day retention
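
As a rough illustration of the optional TTL attribute, the sketch below derives expireAt from the item's date sort key. It assumes retention is counted from 00:00 UTC of that day; the helper name usage_item_key_and_ttl is hypothetical and not part of the spec.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 90  # value from the spec; 90 vs 180 days is still an open question


def usage_item_key_and_ttl(user_id: str, day: str, retention_days: int = RETENTION_DAYS) -> dict:
    """Build the key attributes plus expireAt for one user/day usage item.

    Assumption: retention is measured from 00:00 UTC of the `day` sort key.
    """
    start = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    expire_at = int((start + timedelta(days=retention_days)).timestamp())
    return {"userId": user_id, "date": day, "expireAt": expire_at}
```

DynamoDB TTL expects the attribute as a Number holding Unix epoch seconds, which is why the value is truncated to an integer.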

Table: GatewayUsers

  • PK: userId (S)

  • Attributes: apiKeyId (S), planId (S), dailyOutputCap (N), maxTokensPerCall (N), allowedModels (SS), status (S: ACTIVE|SUSPENDED)

Table: GatewayConfig

  • PK: configId (S)

  • Attributes: defaultDailyOutputCap (N), defaultMaxTokensPerCall (N), allowedModels (SS), plans (M: rate, burst)

4.2 SSM Parameters

  • /app/bedrock/enabled = true|false (global soft kill‑switch)


5) API Design

5.1 Authentication

  • Header: x-api-key: <key> (API Gateway API Key)

  • Authorizer attaches principalId = userId into request context.

5.2 Endpoints (JSON)

POST /v1/chat

Request:

{
  "model": "anthropic.claude-3-5-sonnet-20240620-v1:0",
  "messages": [
    {"role": "user", "content": "Explain reservoir sampling."}
  ],
  "max_tokens": 400,
  "temperature": 0.2,
  "stream": false
}

Response (non‑stream):

{
  "id": "chatcmpl_...",
  "model": "...",
  "created": 1723456789,
  "usage": {"input_tokens": 123, "output_tokens": 278},
  "choices": [
    {"index": 0, "message": {"role": "assistant", "content": "..."}, "finish_reason": "stop"}
  ]
}

Response (stream): text/event-stream with data: {"delta": "..."} frames; final frame includes usage.
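
A minimal sketch of the streaming framing described above, assuming each delta is sent as its own SSE data frame and the final frame carries only the usage object. The function name sse_frames and the exact shape of the final frame are assumptions.

```python
import json
from typing import Iterable, Iterator


def sse_frames(deltas: Iterable[str], usage: dict) -> Iterator[str]:
    """Yield one SSE `data:` frame per text delta, then a final usage frame."""
    for delta in deltas:
        payload = json.dumps({"delta": delta})
        # An SSE event is terminated by a blank line.
        yield "data: " + payload + "\n\n"
    # Assumption: the final frame contains only the usage object.
    yield "data: " + json.dumps({"usage": usage}) + "\n\n"
```

A client can split the stream on blank lines, strip the data: prefix, and parse each payload as JSON.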

POST /v1/complete

Simple prompt → completion variant; same usage object.

GET /v1/models

Returns allowed model IDs.

GET /v1/usage/today

Response:

{
  "date": "2025-08-12",
  "output_tokens_used": 9217,
  "output_tokens_cap": 50000,
  "remaining": 40783
}
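
The response fields relate as remaining = cap − used. A small sketch, assuming remaining is clamped at zero in case post-accounting pushes usage past the cap; the function name usage_today is hypothetical.

```python
def usage_today(date: str, used: int, cap: int) -> dict:
    """Build the /v1/usage/today response body from stored counters."""
    return {
        "date": date,
        "output_tokens_used": used,
        "output_tokens_cap": cap,
        # Assumption: never report a negative remainder.
        "remaining": max(cap - used, 0),
    }
```

With the values from the example above, usage_today("2025-08-12", 9217, 50000) yields remaining = 40783.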

Error Format

{ "error": { "code": "DAILY_CAP_EXCEEDED", "message": "Reached daily token cap." } }

HTTP Codes: 200, 400 (validation), 401/403 (auth/quota), 429 (rate limit), 5xx (upstream/internal).
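
One way the Proxy Lambda could produce this error envelope is sketched below, using the Lambda proxy integration response shape (statusCode, headers, body). The code-to-status mapping is an assumption based on the HTTP codes listed above: quota and disabled-flag rejections map to 403, and MODEL_NOT_ALLOWED is treated as a validation error.

```python
import json

# Assumption: mapping chosen to match "400 (validation), 401/403 (auth/quota)".
STATUS_BY_CODE = {
    "DAILY_CAP_EXCEEDED": 403,
    "DISABLED_FLAG": 403,
    "MODEL_NOT_ALLOWED": 400,
}


def error_response(code: str, message: str) -> dict:
    """Return a Lambda proxy integration response carrying the error envelope."""
    return {
        "statusCode": STATUS_BY_CODE.get(code, 500),  # unknown codes fall back to 500
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"error": {"code": code, "message": message}}),
    }
```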


6) Throttling & Quotas

6.1 Edge TPS (API Gateway Usage Plans)

Plans:

  • Standard: rate=2 RPS, burst=10

  • Power: rate=5 RPS, burst=20

Attach one or more API keys per plan. Optionally add API Gateway request quota per day as a coarse control.
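
To make the rate/burst semantics concrete: API Gateway documents its throttling as a token bucket, where burst is the bucket capacity and rate is the refill speed. The class below is only an illustration of that model for the Standard plan numbers, not the service's implementation; the class name and explicit clock argument are assumptions for the sketch.

```python
class TokenBucket:
    """Illustrative token bucket: `burst` is capacity, `rate` is tokens refilled per second."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)  # bucket starts full
        self.last = 0.0

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` (seconds) is admitted."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # would surface as HTTP 429 at the edge
```

With rate=2 and burst=10, a client can send 10 requests at once, after which it is admitted at roughly 2 requests per second.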

6.2 Token Budgets (Authorizer + Proxy)

  • Pre‑check (Authorizer): estimate estOut = min(request.max_tokens, user.maxTokensPerCall); if used + estOut > dailyCap, deny.

  • Post‑accounting (Proxy): parse Bedrock usage and ADD to BedrockTokenUsage atomically:

    • SET outputTokensTotal = if_not_exists(outputTokensTotal, :zero) + :out

    • SET inputTokensTotal = if_not_exists(inputTokensTotal, :zero) + :in
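
The pre-check and the atomic counter update can be sketched as follows. The first function mirrors the estOut rule above; the second only builds keyword arguments in the shape accepted by the boto3 DynamoDB client's update_item, combining both counters into a single SET clause so they are updated in one request. Function names are hypothetical.

```python
def precheck_allows(used_today: int, requested_max_tokens: int,
                    max_tokens_per_call: int, daily_cap: int) -> bool:
    """Deny when used + estimated output would exceed the daily cap."""
    est_out = min(requested_max_tokens, max_tokens_per_call)
    return used_today + est_out <= daily_cap


def usage_update_kwargs(user_id: str, day: str, input_tokens: int, output_tokens: int) -> dict:
    """Build update_item arguments that add actual token usage to the user/day item."""
    return {
        "TableName": "BedrockTokenUsage",
        "Key": {"userId": {"S": user_id}, "date": {"S": day}},
        # One SET clause with two actions; if_not_exists covers the first write of the day.
        "UpdateExpression": (
            "SET outputTokensTotal = if_not_exists(outputTokensTotal, :zero) + :out, "
            "inputTokensTotal = if_not_exists(inputTokensTotal, :zero) + :in"
        ),
        "ExpressionAttributeValues": {
            ":zero": {"N": "0"},
            ":out": {"N": str(output_tokens)},
            ":in": {"N": str(input_tokens)},
        },
    }
```

Because the pre-check uses an estimate and concurrent requests are not serialized, actual usage can still slightly overshoot the cap; the post-accounting write records the true totals.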

6.3 Kill‑switches

  • Soft: /app/bedrock/enabled=false ⇒ Authorizer denies all.

  • Hard: Org SCP denying bedrock:*; or revoke model access in Bedrock.
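
For reference, a hard-kill SCP could look roughly like the policy built below. This is a sketch of a blanket deny; the Sid value and function name are hypothetical. Note that an SCP attached to the account or OU also blocks the gateway's own role, which is the intended effect of the hard kill.

```python
import json


def bedrock_deny_scp() -> dict:
    """Return a service control policy document that denies every Bedrock action."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAllBedrock",  # hypothetical statement id
                "Effect": "Deny",
                "Action": "bedrock:*",
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(bedrock_deny_scp(), indent=2))
```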


7) Request Flow (Sequence)

  1. IDE sends request with x-api-key → API Gateway.

  2. API Gateway authenticates API key and enforces plan TPS/burst.

  3. Lambda Authorizer executes:

    • Check SSM flag enabled; check GatewayUsers.status.

    • Load user profile & today's usage; pre‑check against cap.

    • Return Allow with context {userId, planId} or Deny.

  4. Lambda Proxy validates payload, ensures model is in allowlist, clamps max_tokens.

  5. Call Bedrock Converse/Invoke; stream or buffer output back to client.

  6. On completion, parse usage and update BedrockTokenUsage.

  7. Emit metrics and structured logs; return final body.
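
Step 3 of the flow can be sketched as a pure decision function that returns the REST API Lambda authorizer response shape (principalId, policyDocument, context). The inputs are assumed to have been loaded already from SSM and DynamoDB; the function name and parameters are hypothetical.

```python
def authorizer_response(user_id: str, plan_id: str, method_arn: str, *,
                        enabled: bool, status: str, within_cap: bool) -> dict:
    """Return Allow only if the flag is on, the user is ACTIVE, and the cap pre-check passed."""
    allow = enabled and status == "ACTIVE" and within_cap
    return {
        "principalId": user_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": "Allow" if allow else "Deny",
                    "Resource": method_arn,
                }
            ],
        },
        # Context values are passed to the Proxy Lambda via the request context.
        "context": {"userId": user_id, "planId": plan_id},
    }
```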


8) IAM & Network

8.1 Execution Role (Proxy Lambda)

Permissions:

  • bedrock:InvokeModel, bedrock:InvokeModelWithResponseStream (scoped to allowed models); the Converse and ConverseStream APIs are authorized by these same actions

  • dynamodb:GetItem/UpdateItem on BedrockTokenUsage and GatewayUsers; dynamodb:GetItem on GatewayConfig (model allowlist)

  • ssm:GetParameter on /app/bedrock/enabled

  • CloudWatch Logs

8.2 End‑User Principals

  • Explicit Deny policy for bedrock:* to prevent bypass.

  • Allow invoke of API Gateway only via API Keys (no SigV4 from clients).

8.3 Network

  • Private VPC Endpoint to Bedrock (where available); add IAM condition aws:SourceVpce on the Lambda role.


9) Validation & Limits

  • Payload size: limit prompt/messages total ≤ 64 KB (configurable).

  • Per‑call max_tokens: clamp to user or global max (e.g., 1024).

  • Allowed models: configured in GatewayConfig; reject others.

  • Timeouts: Lambda 30s–60s; API Gateway integration timeout 29s for non‑streaming calls. For streaming, use Lambda function URLs or a REST API with chunked responses.
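
The payload, model, and max_tokens checks above could be combined along these lines. This is a sketch under assumptions: payload size is measured on the JSON-serialized messages, and the error code PAYLOAD_TOO_LARGE and all names are hypothetical (only MODEL_NOT_ALLOWED appears in this spec).

```python
import json

MAX_PAYLOAD_BYTES = 64 * 1024  # configurable per the spec


class ValidationError(Exception):
    def __init__(self, code: str, message: str):
        super().__init__(message)
        self.code = code


def validate_chat_request(body: dict, allowed_models: set, max_tokens_limit: int = 1024) -> dict:
    """Reject oversized or disallowed requests and clamp max_tokens."""
    # Assumption: size is measured on the serialized messages array.
    size = len(json.dumps(body.get("messages", [])).encode("utf-8"))
    if size > MAX_PAYLOAD_BYTES:
        raise ValidationError("PAYLOAD_TOO_LARGE", "Messages exceed the payload limit.")
    if body.get("model") not in allowed_models:
        raise ValidationError("MODEL_NOT_ALLOWED", "Model is not in the allowlist.")
    clamped = dict(body)
    requested = int(body.get("max_tokens", max_tokens_limit))
    clamped["max_tokens"] = min(requested, max_tokens_limit)
    return clamped
```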


10) Observability

10.1 Metrics (CloudWatch, with EMF)

  • Requests: count, 4xx, 5xx, latency p50/p90/p99

  • Bedrock call duration and errors by model

  • Tokens: input/output per user/day; top N users

  • Rejections: DAILY_CAP_EXCEEDED, MODEL_NOT_ALLOWED, DISABLED_FLAG
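
A token metric emitted in CloudWatch Embedded Metric Format might look like the record built below. The namespace BedrockGateway and the function name are assumptions. userId is kept as a plain log property rather than a dimension to avoid creating one metric per user; per-user views can still be derived from the logs.

```python
import json
import time
from typing import Optional


def emf_record(model: str, user_id: str, input_tokens: int, output_tokens: int,
               namespace: str = "BedrockGateway", now_ms: Optional[int] = None) -> str:
    """Serialize one EMF log line with token counts, dimensioned by model."""
    record = {
        "_aws": {
            "Timestamp": now_ms if now_ms is not None else int(time.time() * 1000),
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,  # hypothetical namespace
                    "Dimensions": [["Model"]],
                    "Metrics": [
                        {"Name": "InputTokens", "Unit": "Count"},
                        {"Name": "OutputTokens", "Unit": "Count"},
                    ],
                }
            ],
        },
        "Model": model,
        "userId": user_id,  # property only, not a metric dimension
        "InputTokens": input_tokens,
        "OutputTokens": output_tokens,
    }
    return json.dumps(record)
```

Printing this line to stdout from the Lambda is enough for CloudWatch to extract the metrics from the log stream.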

10.2 Logs

  • Correlation id (request id) end‑to‑end

  • Redact message content by default (toggle for debugging)
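
Default redaction could be as simple as replacing each message's content with a length placeholder before logging, as in this sketch. The function name and the debug_unredacted toggle are hypothetical stand-ins for the debugging toggle mentioned above.

```python
def redact_messages(request: dict, debug_unredacted: bool = False) -> dict:
    """Return a copy of the request that is safe to log (content removed by default)."""
    if debug_unredacted:
        return request
    redacted = dict(request)
    redacted["messages"] = [
        {
            "role": m.get("role"),
            # Keep only the length so log readers can still reason about prompt size.
            "content": f"[redacted {len(str(m.get('content', '')))} chars]",
        }
        for m in request.get("messages", [])
    ]
    return redacted
```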

10.3 Alerts (SNS/Slack)

  • 5xx rate > 2% over 5m

  • /app/bedrock/enabled set to false (notify)

  • User at 80% and 100% of daily cap (optional DM)

  • Cost anomaly (tokens/day spike)


11) Security Considerations

  • No API key leakage in logs; rotate keys quarterly.

  • Optional HMAC sidecar token per request (defense in depth).

  • WAF geo/IP allowlist if necessary.

  • Content guardrails (Bedrock Guardrails) for policy compliance.


12) Deployment & IaC

  • CDK (TypeScript/Java) project containing:

    • API Gateway (routes, models), Usage Plans, API Keys (seed via script)

    • Lambdas (Authorizer + Proxy) with env vars for table names/params

    • DynamoDB tables, autoscaling RCU/WCU

    • SSM params with defaults

    • Optional WAF association

  • Environments: dev, staging, prod; feature flags per env

  • CI/CD: GitHub Actions to synth/deploy; unit + integration tests


13) Runbooks

13.1 Emergency Shutdown

  1. Set /app/bedrock/enabled=false (Immediate soft stop).

  2. If not sufficient, attach Org SCP denying bedrock:* to account/OU.

  3. Optionally disable Bedrock model access in console.

13.2 Raise/Lower User Cap

  • Update GatewayUsers.dailyOutputCap; change takes effect immediately.

13.3 Key Rotation

  • Create new API key, map to user, notify; revoke old key after grace period.

13.4 Hotspot/Abuse

  • Move user to stricter Usage Plan or suspend user; add WAF rule.


14) Testing Strategy

  • Unit tests: payload validation, authorizer cap math, DDB updates

  • Contract tests: /v1/chat success/429/403/400 cases

  • Load tests: confirm API GW TPS enforcement; backpressure behavior

  • Chaos: Bedrock 5xx/latency injections; ensure graceful degradation

  • Security: key rotation test; WAF rule efficacy


15) Cost Model (rough)

  • API Gateway requests (per million)

  • Lambda GB‑seconds (authorizer + proxy)

  • DynamoDB RCUs/WCUs proportional to calls (two writes per request worst‑case)

  • Bedrock tokens (dominant cost) – tracked via usage metrics

Optimizations: batch writes (streaming buffer), on‑demand → provisioned with autoscaling, aggregate counters with periodic compaction.


16) Open Questions

  • Do we need per‑team budgets in addition to per‑user?

  • Retention for logs and token usage (90 vs 180 days)?