Design Doc

Bedrock Access Gateway – Design Spec

Doc status: Draft v1.0
Owner: Miguel Merlin
Date: Aug 12, 2025
Reviewers:

1) Overview

Provide Bedrock access to internal IAM users via a controlled gateway that works from developer IDEs. The system exposes a stable HTTP API (OpenAI‑style), enforces per‑user throttling and daily token budgets, and includes multiple kill‑switches.

Primary goals

Simple IDE integration via HTTPS + API key

Per‑user TPS throttling and daily output token cap

Hard and soft kill‑switches

Centralized audit, cost visibility, and guardrails

Non‑goals

Expose IAM credentials to users’ machines

Provide direct Bedrock access from user principals

Replace existing enterprise identity provider (SSO remains unchanged)

2) Requirements

2.1 Functional

Users can call /v1/chat and /v1/complete from IDEs.

Requests authenticated via API Key; no IAM credentials on client.

Enforce rate limits (TPS/burst) per user (or per team plan) at the edge.

Enforce daily output token budget per user; reject after budget exceeded.

Support streaming responses (SSE) for chat/completions.

Allowlist of Bedrock models; requests for any other model are rejected.

Provide usage/remaining quota endpoint.

Emit structured logs and metrics for cost and auditing.

2.2 Operational

Soft kill: feature flag to disable traffic quickly.

Hard kill: SCP denying Bedrock; model access toggle in Bedrock.

Config changes (caps, models, plans) without redeploy.

SLO: 99.9% monthly availability for gateway.

P90 end‑to‑end latency target: ≤ 1.5s for 2‑KB prompts, non‑streaming.

2.3 Security & Compliance

Users cannot directly call Bedrock; only the gateway’s IAM role can.

TLS 1.2+, HSTS, no PII in logs by default; opt‑in redaction allowlist.

Least‑privilege IAM and VPC endpoints for Bedrock where applicable.

3) High‑Level Architecture

IDE/CLI ──HTTPS──> API Gateway ──(authorizer)──> Lambda Proxy ──> Bedrock
                               │                 └─> DynamoDB (usage)
                               └─> Usage Plans/API Keys

Control plane: SSM Parameter Store (feature flags), DynamoDB (config/limits)
Kill‑switches: SSM flag (soft), Org SCP + Bedrock model access (hard)
Edge safety: AWS WAF (optional)

Key Components

API Gateway (REST API): Routing, API Keys, Usage Plans, throttling. API keys and usage plans are available on REST APIs only, not on HTTP APIs.

Lambda Authorizer: Validates API key, checks feature flag, reads per‑user caps, pre‑checks remaining daily tokens (estimated).

Lambda Proxy: Validates payload, calls Bedrock, streams results, updates usage counters from actual token usage.

DynamoDB: Token metering per user/day; user profiles; config.

SSM Parameter Store: Feature flags + global toggles.

Organizations SCP: Bedrock deny policy for emergency.

WAF (optional): Rate‑based block/allow rules.

4) Data Model

4.1 DynamoDB Tables

Table: `BedrockTokenUsage`

PK: userId (S)

SK: date (S, YYYY-MM-DD)

Attributes: inputTokensTotal (N), outputTokensTotal (N), lastUpdated (S RFC3339), version (N)

TTL (optional): expireAt (N, unix epoch) for 90‑day retention
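
As a rough illustration of the optional TTL attribute, the sketch below derives expireAt from the row's date key. The helper name expire_at and the choice to count the 90 days from midnight UTC of the usage date are assumptions, not something this spec fixes.

```python
from datetime import date, datetime, time, timedelta, timezone

RETENTION_DAYS = 90  # retention period named in the spec


def expire_at(usage_date: str, retention_days: int = RETENTION_DAYS) -> int:
    """Return the unix-epoch second after which a usage row may expire.

    Assumption: retention is counted from 00:00 UTC of the row's date (YYYY-MM-DD).
    """
    day = date.fromisoformat(usage_date)
    midnight_utc = datetime.combine(day, time.min, tzinfo=timezone.utc)
    return int((midnight_utc + timedelta(days=retention_days)).timestamp())
```

DynamoDB TTL expects the attribute as a number of epoch seconds, which is why the helper returns an int.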

Table: `GatewayUsers`

PK: userId (S)

Attributes: apiKeyId (S), planId (S), dailyOutputCap (N), maxTokensPerCall (N), allowedModels (SS), status (S: ACTIVE|SUSPENDED)

Table: `GatewayConfig`

PK: configId (S)

Attributes: defaultDailyOutputCap (N), defaultMaxTokensPerCall (N), allowedModels (SS), plans (M: rate, burst)

4.2 SSM Parameters

/app/bedrock/enabled = true|false (global soft kill‑switch)
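
One way the authorizer might interpret this flag is sketched below. The value would come from an SSM GetParameter call on /app/bedrock/enabled; the helper name gateway_enabled and the fail-closed behavior (a missing or malformed value disables traffic) are assumptions for illustration.

```python
from typing import Optional


def gateway_enabled(raw_value: Optional[str]) -> bool:
    """Interpret the soft kill-switch value.

    Assumption: fail closed, so anything other than the string 'true'
    (case-insensitive, surrounding whitespace ignored) disables traffic.
    """
    if raw_value is None:
        return False
    return raw_value.strip().lower() == "true"
```

Failing closed means a deleted or mistyped parameter stops traffic rather than silently leaving the gateway open; whether that trade-off is right for this gateway is a design decision.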

5) API Design

5.1 Authentication

Header: x-api-key: <key> (API Gateway API Key)

Authorizer attaches principalId = userId into request context.
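
A minimal sketch of what the authorizer could return, assuming a REST API Lambda authorizer. The outer shape (principalId, policyDocument, context) is the standard authorizer output; the function name and the exact context keys are assumptions based on the request flow in this document.

```python
from typing import Any, Dict


def authorizer_response(user_id: str, plan_id: str, method_arn: str, allow: bool) -> Dict[str, Any]:
    """Build a Lambda authorizer response carrying userId/planId in the context."""
    return {
        "principalId": user_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": "Allow" if allow else "Deny",
                    "Resource": method_arn,
                }
            ],
        },
        # Context values must be strings, numbers, or booleans.
        "context": {"userId": user_id, "planId": plan_id},
    }
```

The proxy Lambda can then read these values from the request context instead of re-resolving the API key.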

5.2 Endpoints (JSON)

`POST /v1/chat`

Request:

{
  "model": "anthropic.claude-3-5-sonnet-20240620-v1:0",
  "messages": [
    {"role": "user", "content": "Explain reservoir sampling."}
  ],
  "max_tokens": 400,
  "temperature": 0.2,
  "stream": false
}

Response (non‑stream):

{
  "id": "chatcmpl_...",
  "model": "...",
  "created": 1723456789,
  "usage": {"input_tokens": 123, "output_tokens": 278},
  "choices": [
    {"index": 0, "message": {"role": "assistant", "content": "..."}, "finish_reason": "stop"}
  ]
}

Response (stream): text/event-stream with data: {"delta": "..."} frames; final frame includes usage.
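
To make the streaming shape concrete, here is a small sketch that turns text deltas into SSE frames. The delta frame follows the format above; the shape of the final frame ({"usage": ...}) is an assumption, since the spec only says the final frame includes usage.

```python
import json
from typing import Dict, Iterable, Iterator


def sse_frames(deltas: Iterable[str], usage: Dict[str, int]) -> Iterator[str]:
    """Yield one SSE 'data:' frame per delta, then a final frame with usage."""
    for delta in deltas:
        yield f"data: {json.dumps({'delta': delta})}\n\n"
    # Assumed final-frame shape; only the presence of usage is specified.
    yield f"data: {json.dumps({'usage': usage})}\n\n"
```

Each frame ends with a blank line, which is what delimits events in text/event-stream.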

`POST /v1/complete`

Simple prompt → completion variant; same usage object.

`GET /v1/models`

Returns allowed model IDs.

`GET /v1/usage/today`

Response:

{
  "date": "2025-08-12",
  "output_tokens_used": 9217,
  "output_tokens_cap": 50000,
  "remaining": 40783
}
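
The response body could be assembled from the day's usage row and the user's cap roughly as follows. The function name and the decision to floor remaining at zero (a final call may overshoot the cap slightly) are assumptions.

```python
from typing import Dict, Union


def usage_today(day: str, output_tokens_used: int, output_tokens_cap: int) -> Dict[str, Union[str, int]]:
    """Build the /v1/usage/today body; remaining never goes below zero."""
    return {
        "date": day,
        "output_tokens_used": output_tokens_used,
        "output_tokens_cap": output_tokens_cap,
        "remaining": max(output_tokens_cap - output_tokens_used, 0),
    }
```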

Error Format

{ "error": { "code": "DAILY_CAP_EXCEEDED", "message": "Reached daily token cap." } }

HTTP Codes: 200, 400 (validation), 401/403 (auth/quota), 429 (rate limit), 5xx (upstream/internal).
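
A sketch of a helper that produces this error envelope together with a status code. The envelope matches the format above; the per-code status mapping is hypothetical, because the spec only groups statuses as 400, 401/403, 429, and 5xx.

```python
import json
from typing import Tuple

# Hypothetical mapping from error code to HTTP status.
STATUS_BY_CODE = {
    "DAILY_CAP_EXCEEDED": 403,
    "MODEL_NOT_ALLOWED": 400,
    "DISABLED_FLAG": 403,
}


def error_response(code: str, message: str) -> Tuple[int, str]:
    """Return (status, JSON body) in the gateway's error format."""
    status = STATUS_BY_CODE.get(code, 500)  # unknown codes are treated as internal errors
    body = json.dumps({"error": {"code": code, "message": message}})
    return status, body
```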

6) Throttling & Quotas

6.1 Edge TPS (API Gateway Usage Plans)

Plans:

Standard: rate=2 RPS, burst=10

Power: rate=5 RPS, burst=20

Attach one or more API keys per plan. Optionally add API Gateway request quota per day as a coarse control.
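
API Gateway describes its throttling as a token bucket, so the interplay of rate and burst can be pictured with a small local simulation. This is an illustrative model only, not API Gateway's implementation; the class name and the explicit clock argument are assumptions made so the behavior is easy to test.

```python
class TokenBucket:
    """Toy token bucket: `burst` is the capacity, `rate` the refill per second."""

    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # time of the previous call, in seconds

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With the Standard plan's values (rate=2, burst=10), ten back-to-back requests pass, further requests in the same instant are rejected, and one second later two more are admitted.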

6.2 Token Budgets (Authorizer + Proxy)

Pre‑check (Authorizer): estimate estOut = min(request.max_tokens, user.maxTokensPerCall); if used + estOut > dailyCap, deny.
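
The pre-check arithmetic can be sketched as a pure function. Parameter names are illustrative; the comparison mirrors the rule above (deny only when used + estOut exceeds the cap).

```python
def precheck_allows(used_output_tokens: int, requested_max_tokens: int,
                    max_tokens_per_call: int, daily_cap: int) -> bool:
    """Return True if the request may proceed under the daily output cap."""
    est_out = min(requested_max_tokens, max_tokens_per_call)
    return used_output_tokens + est_out <= daily_cap
```

Because the estimate uses the clamped max_tokens rather than the actual output, the check is conservative: a user near the cap can be denied even though a short answer would have fit.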

Post‑accounting (Proxy): parse Bedrock usage and add it to BedrockTokenUsage atomically, using one UpdateItem call whose update expression has a single SET clause with both assignments:
- outputTokensTotal = if_not_exists(outputTokensTotal, :zero) + :out
- inputTokensTotal = if_not_exists(inputTokensTotal, :zero) + :in
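
A sketch of the request parameters for that update, written as a builder so it can be inspected without AWS access. The result is meant to be passed to the low-level DynamoDB client as update_item(**kwargs); the function name and the extra lastUpdated assignment are assumptions. Both counters are combined into one SET clause, since an update expression may contain SET only once.

```python
from typing import Any, Dict


def usage_update_kwargs(table_name: str, user_id: str, day: str,
                        input_tokens: int, output_tokens: int,
                        now_rfc3339: str) -> Dict[str, Any]:
    """Build UpdateItem parameters that atomically increment both counters."""
    return {
        "TableName": table_name,
        "Key": {"userId": {"S": user_id}, "date": {"S": day}},
        "UpdateExpression": (
            "SET outputTokensTotal = if_not_exists(outputTokensTotal, :zero) + :out, "
            "inputTokensTotal = if_not_exists(inputTokensTotal, :zero) + :in, "
            "lastUpdated = :now"
        ),
        "ExpressionAttributeValues": {
            ":zero": {"N": "0"},
            ":out": {"N": str(output_tokens)},
            ":in": {"N": str(input_tokens)},
            ":now": {"S": now_rfc3339},
        },
        "ReturnValues": "UPDATED_NEW",
    }
```

Returning UPDATED_NEW gives the proxy the new totals in the same round trip, which could feed the 80%/100% cap notifications without a second read.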

6.3 Kill‑switches

Soft: /app/bedrock/enabled=false ⇒ Authorizer denies all.

Hard: Org SCP denying bedrock:*; or revoke model access in Bedrock.
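
For reference, the hard kill‑switch SCP could look like the policy generated below. The Sid and function name are hypothetical. Note that an SCP like this denies Bedrock for every principal in the attached account or OU, including the gateway's own role, which is the intended effect of a hard stop.

```python
import json
from typing import Any, Dict


def bedrock_deny_scp() -> Dict[str, Any]:
    """Return a service control policy document that denies all Bedrock actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAllBedrock",  # hypothetical statement id
                "Effect": "Deny",
                "Action": "bedrock:*",
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into the Organizations console or IaC.
    print(json.dumps(bedrock_deny_scp(), indent=2))
```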

7) Request Flow (Sequence)

IDE sends request with x-api-key → API Gateway.

API Gateway authenticates API key and enforces plan TPS/burst.

Lambda Authorizer executes:
- Check SSM flag enabled; check GatewayUsers.status.
- Load user profile & today's usage; pre‑check against cap.
- Return Allow with context {userId, planId} or Deny.

Lambda Proxy validates payload, ensures model is in allowlist, clamps max_tokens.

Call Bedrock Converse/Invoke; stream or buffer output back to client.

On completion, parse usage and update BedrockTokenUsage.

Emit metrics and structured logs; return final body.

8) IAM & Network

8.1 Execution Role (Proxy Lambda)

Permissions:

bedrock:InvokeModel, bedrock:InvokeModelWithResponseStream (scoped to allowed models; the Converse and ConverseStream APIs are authorized by these same actions)

dynamodb:GetItem/UpdateItem on the tables in section 4.1

ssm:GetParameter on /app/bedrock/enabled

CloudWatch Logs
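
A sketch of how the Bedrock part of this role could be scoped to the allowlist. It spells out the two invoke actions explicitly and builds foundation-model ARNs from model IDs; the function name and the region used in any call are assumptions.

```python
from typing import Any, Dict, List


def bedrock_invoke_statement(region: str, model_ids: List[str]) -> Dict[str, Any]:
    """Return an IAM statement allowing invocation of only the listed models."""
    return {
        "Effect": "Allow",
        "Action": [
            "bedrock:InvokeModel",
            "bedrock:InvokeModelWithResponseStream",
        ],
        # Foundation-model ARNs have an empty account-id field.
        "Resource": [
            f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
            for model_id in model_ids
        ],
    }
```

Generating the resource list from the same allowlist the proxy enforces keeps the IAM boundary and the application check from drifting apart.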

8.2 End‑User Principals

Explicit Deny policy for bedrock:* to prevent bypass.

Allow invoke of API Gateway only via API Keys (no SigV4 from clients).

8.3 Network

Private VPC Endpoint to Bedrock (where available); add IAM condition aws:SourceVpce on the Lambda role.

9) Validation & Limits

Payload size: limit prompt/messages total ≤ 64 KB (configurable).

Per‑call `max_tokens`: clamp to user or global max (e.g., 1024).

Allowed models: configured in GatewayConfig; reject others.

Timeouts: Lambda 30s–60s; API Gateway 29s for non‑streaming calls; for streaming, integrate via Lambda function URLs or a REST API with chunked responses.
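
The payload checks in this section might be combined as in the sketch below. MODEL_NOT_ALLOWED comes from this document; the PAYLOAD_TOO_LARGE code, the exception class, and measuring size as the UTF-8 length of the serialized messages are assumptions.

```python
import json
from typing import Any, Dict, Set

MAX_PAYLOAD_BYTES = 64 * 1024  # 64 KB limit from the spec (configurable)


class RequestRejected(Exception):
    """Raised when a request fails validation; `code` maps to the error format."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code


def validate_chat_request(body: Dict[str, Any], allowed_models: Set[str],
                          max_tokens_cap: int,
                          max_payload_bytes: int = MAX_PAYLOAD_BYTES) -> Dict[str, Any]:
    """Check allowlist and size, then return a copy with max_tokens clamped."""
    if body.get("model") not in allowed_models:
        raise RequestRejected("MODEL_NOT_ALLOWED", "Model is not in the allowlist.")
    size = len(json.dumps(body.get("messages", [])).encode("utf-8"))
    if size > max_payload_bytes:
        raise RequestRejected("PAYLOAD_TOO_LARGE", "Messages exceed the size limit.")
    # Assumption: a missing max_tokens defaults to the cap.
    requested = body.get("max_tokens", max_tokens_cap)
    return {**body, "max_tokens": min(requested, max_tokens_cap)}
```

The function returns a new dict rather than mutating the caller's body, so the original request stays available for logging.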

10) Observability

10.1 Metrics (CloudWatch, with EMF)

Requests: count, 4xx, 5xx, latency p50/p90/p99

Bedrock call duration and errors by model

Tokens: input/output per user/day; top N users

Rejections: DAILY_CAP_EXCEEDED, MODEL_NOT_ALLOWED, DISABLED_FLAG
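
As an illustration of the EMF approach, the sketch below builds one log record that CloudWatch would turn into token metrics. The `_aws` envelope follows the Embedded Metric Format; the namespace, metric names, and the choice to keep userId as a plain property instead of a dimension (to avoid high-cardinality metrics) are assumptions.

```python
import json
from typing import Any, Dict


def emf_record(namespace: str, model: str, user_id: str,
               input_tokens: int, output_tokens: int,
               timestamp_ms: int) -> Dict[str, Any]:
    """Build an Embedded Metric Format record with per-model token metrics."""
    return {
        "_aws": {
            "Timestamp": timestamp_ms,  # milliseconds since the epoch
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    "Dimensions": [["Model"]],
                    "Metrics": [
                        {"Name": "InputTokens", "Unit": "Count"},
                        {"Name": "OutputTokens", "Unit": "Count"},
                    ],
                }
            ],
        },
        "Model": model,
        "userId": user_id,  # searchable property, not a metric dimension
        "InputTokens": input_tokens,
        "OutputTokens": output_tokens,
    }


def emit(record: Dict[str, Any]) -> None:
    """Write the record as one JSON line; in Lambda, stdout goes to CloudWatch Logs."""
    print(json.dumps(record))
```

Per-user totals and top-N views could then be derived from the userId property with Logs Insights queries, while the per-model metrics remain low-cardinality.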

10.2 Logs

Correlation id (request id) end‑to‑end

Redact message content by default (toggle for debugging)
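
A minimal sketch of redaction by default: the log line carries the correlation id and message count, and includes message content only when the debugging toggle is switched on. Field names and the function name are assumptions.

```python
import json
from typing import Any, Dict, List


def log_line(request_id: str, user_id: str, model: str,
             messages: List[Dict[str, Any]], redact: bool = True) -> str:
    """Return one structured log line; message content is omitted unless redact=False."""
    record: Dict[str, Any] = {
        "requestId": request_id,
        "userId": user_id,
        "model": model,
        "messageCount": len(messages),
    }
    if not redact:
        record["messages"] = messages  # debugging only
    return json.dumps(record)
```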

10.3 Alerts (SNS/Slack)

5xx rate > 2% over 5m

Soft kill‑switch engaged: /app/bedrock/enabled set to false (notify)

User at 80% and 100% of daily cap (optional DM)

Cost anomaly (tokens/day spike)
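
For the per-user cap alerts, one possible approach is to notify only when an update crosses a threshold, so each user gets at most one message per threshold per day. The function below is a sketch under that assumption; names are illustrative and integer arithmetic is used to avoid floating-point edge cases.

```python
from typing import List, Sequence


def crossed_thresholds(prev_used: int, new_used: int, cap: int,
                       percents: Sequence[int] = (80, 100)) -> List[int]:
    """Return the percent thresholds crossed when usage moves from prev_used to new_used."""
    return [
        pct for pct in percents
        if prev_used * 100 < pct * cap <= new_used * 100
    ]
```

For a cap of 50000, moving from 39000 to 40100 tokens crosses only the 80% mark, while a later update within the 80–100% band crosses nothing and sends no duplicate notification.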

11) Security Considerations

No API key leakage in logs; rotate keys quarterly.

Optional HMAC sidecar token per request (defense in depth).

WAF geo/IP allowlist if necessary.

Content guardrails (Bedrock Guardrails) for policy compliance.

12) Deployment & IaC

CDK (TypeScript/Java) project containing:
- API Gateway (routes, models), Usage Plans, API Keys (seed via script)
- Lambdas (Authorizer + Proxy) with env vars for table names/params
- DynamoDB tables, autoscaling RCU/WCU
- SSM params with defaults
- Optional WAF association

Environments: dev, staging, prod; feature flags per env

CI/CD: GitHub Actions to synth/deploy; unit + integration tests

13) Runbooks

13.1 Emergency Shutdown

Set /app/bedrock/enabled=false (Immediate soft stop).

If not sufficient, attach Org SCP denying bedrock:* to account/OU.

Optionally disable Bedrock model access in console.

13.2 Raise/Lower User Cap

Update GatewayUsers.dailyOutputCap; change takes effect immediately.

13.3 Key Rotation

Create new API key, map to user, notify; revoke old key after grace period.

13.4 Hotspot/Abuse

Move user to stricter Usage Plan or suspend user; add WAF rule.

14) Testing Strategy

Unit tests: payload validation, authorizer cap math, DDB updates

Contract tests: /v1/chat success/429/403/400 cases

Load tests: confirm API GW TPS enforcement; backpressure behavior

Chaos: Bedrock 5xx/latency injections; ensure graceful degradation

Security: key rotation test; WAF rule efficacy

15) Cost Model (rough)

API Gateway requests (per million)

Lambda GB‑seconds (authorizer + proxy)

DynamoDB RCUs/WCUs proportional to calls (two writes per request worst‑case)

Bedrock tokens (dominant cost) – tracked via usage metrics

Optimizations: batch writes (streaming buffer), on‑demand → provisioned with autoscaling, aggregate counters with periodic compaction.

16) Open Questions

Do we need per‑team budgets in addition to per‑user?

Retention for logs and token usage (90 vs 180 days)?