Bedrock Access Gateway – Design Spec
Doc status: Draft v1.0
Owner: Miguel Merlin
Date: Aug 12, 2025
Reviewers:
1) Overview
Provide Bedrock access to internal IAM users via a controlled gateway that works from developer IDEs. The system exposes a stable HTTP API (OpenAI‑style), enforces per‑user throttling and daily token budgets, and includes multiple kill‑switches.
Primary goals
- Simple IDE integration via HTTPS + API key
- Per‑user TPS throttling and daily output token cap
- Hard and soft kill‑switches
- Centralized audit, cost visibility, and guardrails
Non‑goals
- Expose IAM credentials to users’ machines
- Provide direct Bedrock access from user principals
- Replace existing enterprise identity provider (SSO remains unchanged)
2) Requirements
2.1 Functional
- Users can call /v1/chat and /v1/complete from IDEs.
- Requests are authenticated via API Key; no IAM credentials on the client.
- Enforce rate limits (TPS/burst) per user (or per team plan) at the edge.
- Enforce a daily output token budget per user; reject requests once the budget is exceeded.
- Support streaming responses (SSE) for chat/completions.
- Allowlist of Bedrock models; requests to other models are rejected.
- Provide a usage/remaining quota endpoint.
- Emit structured logs and metrics for cost and auditing.
2.2 Operational
- Soft kill: feature flag to disable traffic quickly.
- Hard kill: SCP denying Bedrock; model access toggle in Bedrock.
- Config changes (caps, models, plans) without redeploy.
- SLO: 99.9% monthly availability for gateway.
- P90 end‑to‑end latency target: ≤ 1.5s for 2‑KB prompts, non‑streaming.
2.3 Security & Compliance
- Users cannot directly call Bedrock; only the gateway’s IAM role can.
- TLS 1.2+, HSTS, no PII in logs by default; opt‑in redaction allowlist.
- Least‑privilege IAM and VPC endpoints for Bedrock where applicable.
3) High‑Level Architecture
IDE/CLI ──HTTPS──> API Gateway ──(authorizer)──> Lambda Proxy ──> Bedrock
                       │                             └─> DynamoDB (usage)
                       └─> Usage Plans/API Keys
Control plane: SSM Parameter Store (feature flags), DynamoDB (config/limits)
Kill‑switches: SSM flag (soft), Org SCP + Bedrock model access (hard)
Edge safety: AWS WAF (optional)
Key Components
- API Gateway (HTTP/REST): Routing, API Keys, Usage Plans, throttling.
- Lambda Authorizer: Validates API key, checks feature flag, reads per‑user caps, pre‑checks remaining daily tokens (estimated).
- Lambda Proxy: Validates payload, calls Bedrock, streams results, updates usage counters from actual token usage.
- DynamoDB: Token metering per user/day; user profiles; config.
- SSM Parameter Store: Feature flags + global toggles.
- Organizations SCP: Bedrock deny policy for emergency.
- WAF (optional): Rate‑based block/allow rules.
4) Data Model
4.1 DynamoDB Tables
Table: BedrockTokenUsage
- PK: userId (S)
- SK: date (S, YYYY-MM-DD)
- Attributes: inputTokensTotal (N), outputTokensTotal (N), lastUpdated (S, RFC3339), version (N)
- TTL (optional): expireAt (N, unix epoch) for 90‑day retention
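As an illustration of how the per-user/day counters could be written, the sketch below builds the arguments for a DynamoDB `UpdateItem` call that atomically adds to the totals. The function name `build_usage_update` and the choice to bump `version` by one per write are assumptions, not part of the spec; the attribute names come from the table definition above.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 90  # matches the optional 90-day TTL in the table definition


def build_usage_update(table_name, user_id, input_tokens, output_tokens, now=None):
    """Build kwargs for a low-level DynamoDB UpdateItem call (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    expire_at = int((now + timedelta(days=RETENTION_DAYS)).timestamp())
    return {
        "TableName": table_name,
        "Key": {
            "userId": {"S": user_id},
            "date": {"S": now.strftime("%Y-%m-%d")},
        },
        # ADD creates missing counters at 0 first, so no prior read is required.
        "UpdateExpression": (
            "ADD #in :in, #out :out, #ver :one "
            "SET #ts = :ts, #exp = if_not_exists(#exp, :exp)"
        ),
        # Name placeholders sidestep any DynamoDB reserved-word clashes.
        "ExpressionAttributeNames": {
            "#in": "inputTokensTotal",
            "#out": "outputTokensTotal",
            "#ver": "version",
            "#ts": "lastUpdated",
            "#exp": "expireAt",
        },
        "ExpressionAttributeValues": {
            ":in": {"N": str(input_tokens)},
            ":out": {"N": str(output_tokens)},
            ":one": {"N": "1"},
            ":ts": {"S": now.isoformat()},
            ":exp": {"N": str(expire_at)},
        },
    }
```

The result can be passed to `boto3.client("dynamodb").update_item(**kwargs)`. Because the update is a single atomic `ADD`, concurrent requests from the same user do not overwrite each other's counts.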
Table: GatewayUsers
- PK: userId (S)
- Attributes: apiKeyId (S), planId (S), dailyOutputCap (N), maxTokensPerCall (N), allowedModels (SS), status (S: ACTIVE|SUSPENDED)
Table: GatewayConfig
- PK: configId (S)
- Attributes: defaultDailyOutputCap (N), defaultMaxTokensPerCall (N), allowedModels (SS), plans (M: rate, burst)
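One way the per-user attributes and the global defaults could combine is sketched below. It assumes that a missing per-user value falls back to the GatewayConfig default and that a user's allowedModels can only narrow the global list; neither rule is stated in the spec, and `resolve_limits` and `EffectiveLimits` are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class EffectiveLimits:
    daily_output_cap: int
    max_tokens_per_call: int
    allowed_models: FrozenSet[str]


def _pick(user_value, default):
    # Only fall back when the attribute is absent; an explicit 0 is respected.
    return default if user_value is None else user_value


def resolve_limits(user: dict, config: dict) -> EffectiveLimits:
    """Merge a GatewayUsers item with GatewayConfig defaults (already deserialized dicts)."""
    global_models = frozenset(config.get("allowedModels", ()))
    user_models = user.get("allowedModels")
    models = global_models if user_models is None else frozenset(user_models) & global_models
    return EffectiveLimits(
        daily_output_cap=int(_pick(user.get("dailyOutputCap"), config["defaultDailyOutputCap"])),
        max_tokens_per_call=int(_pick(user.get("maxTokensPerCall"), config["defaultMaxTokensPerCall"])),
        allowed_models=models,
    )
```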
4.2 SSM Parameters
- /app/bedrock/enabled = true|false (global soft kill‑switch)
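Reading the flag from SSM on every request adds latency, so a short-lived cache is a common compromise. The sketch below is one possible shape; `FlagCache`, the 30-second TTL, and the fail-closed parsing are assumptions. The `fetch` callable stands in for something like `ssm.get_parameter(Name="/app/bedrock/enabled")["Parameter"]["Value"]`.

```python
import time


class FlagCache:
    """Caches the soft kill-switch value for a short TTL (hypothetical helper)."""

    def __init__(self, fetch, ttl_seconds=30.0, clock=time.monotonic):
        self._fetch = fetch          # callable returning the raw parameter value
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = False
        self._expires_at = None      # None means "never fetched"

    def is_enabled(self) -> bool:
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            raw = self._fetch()
            # Fail closed: anything other than the literal "true" disables traffic.
            self._value = str(raw).strip().lower() == "true"
            self._expires_at = now + self._ttl
        return self._value
```

The trade-off is that a flip of the flag can take up to one TTL to reach every warm Lambda instance, so the TTL bounds how "immediate" the soft kill is.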
5) API Design
5.1 Authentication
- Header: x-api-key: <key> (API Gateway API Key)
- Authorizer attaches principalId = userId into request context.
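For reference, a REST API Lambda authorizer returns a principal, an IAM policy document, and an optional context map. A minimal sketch of building that response follows; the helper name `build_authorizer_response` is hypothetical, and the context keys shown in the usage note are the ones named in the request flow.

```python
def build_authorizer_response(user_id, method_arn, allow, context=None):
    """Build the output document of an API Gateway (REST) Lambda authorizer."""
    return {
        "principalId": user_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": "Allow" if allow else "Deny",
                    "Resource": method_arn,
                }
            ],
        },
        # Context values must be scalars (string, number, boolean); stringify to be safe.
        "context": {key: str(value) for key, value in (context or {}).items()},
    }
```

For example, `build_authorizer_response("u-123", event["methodArn"], True, {"userId": "u-123", "planId": "standard"})` would let the Proxy Lambda read `userId` and `planId` from `event["requestContext"]["authorizer"]`.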
5.2 Endpoints (JSON)
POST /v1/chat
Request:
{
  "model": "anthropic.claude-3-5-sonnet-20240620-v1:0",
  "messages": [
    {"role": "user", "content": "Explain reservoir sampling."}
  ],
  "max_tokens": 400,
  "temperature": 0.2,
  "stream": false
}
Response (non‑stream):
{
  "id": "chatcmpl_...",
  "model": "...",
  "created": 1723456789,
  "usage": {"input_tokens": 123, "output_tokens": 278},
  "choices": [
    {"index": 0, "message": {"role": "assistant", "content": "..."}, "finish_reason": "stop"}
  ]
}
Response (stream): text/event-stream with data: {"delta": "..."} frames; final frame includes usage.
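The streaming format can be sketched as a generator that turns text chunks into SSE frames. The exact shape of the final frame (an empty `delta` plus a `usage` object) is an assumption; the spec only says the final frame includes usage. `sse_frames` is a hypothetical name.

```python
import json


def sse_frames(text_chunks, usage):
    """Yield text/event-stream frames: one per chunk, then a final frame with usage."""
    for chunk in text_chunks:
        payload = json.dumps({"delta": chunk})
        yield "data: " + payload + "\n\n"
    # Assumed final-frame shape: empty delta plus the usage object.
    final = json.dumps({"delta": "", "usage": usage})
    yield "data: " + final + "\n\n"
```

Each frame ends with a blank line, which is how SSE clients detect the end of an event.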
POST /v1/complete
Simple prompt → completion variant; same usage object.
GET /v1/models
Returns allowed model IDs.
GET /v1/usage/today
Response:
{
  "date": "2025-08-12",
  "output_tokens_used": 9217,
  "output_tokens_cap": 50000,
  "remaining": 40783
}
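The `remaining` field is the cap minus the tokens used. A small sketch of how the body could be assembled, assuming the value is floored at zero when a final response overshoots the cap (the flooring and the name `usage_today_body` are assumptions):

```python
def usage_today_body(date, output_tokens_used, output_tokens_cap):
    """Build the GET /v1/usage/today response body."""
    return {
        "date": date,
        "output_tokens_used": output_tokens_used,
        "output_tokens_cap": output_tokens_cap,
        # Never report a negative remainder, even if the last call overshot the cap.
        "remaining": max(0, output_tokens_cap - output_tokens_used),
    }
```

With the numbers from the example above, 50000 - 9217 gives the 40783 shown in the response.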
Error Format
{ "error": { "code": "DAILY_CAP_EXCEEDED", "message": "Reached daily token cap." } }
HTTP Codes: 200, 400 (validation), 401/403 (auth/quota), 429 (rate limit), 5xx (upstream/internal).
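A possible helper for producing this error envelope from the Proxy Lambda is sketched below, using the Lambda proxy integration response shape (`statusCode`, `headers`, `body`). The code-to-status mapping is an assumption derived from the HTTP code list above, and `VALIDATION_ERROR` is a hypothetical code that does not appear in the spec.

```python
import json

# Assumed mapping; the spec lists status classes but not a per-code table.
STATUS_BY_CODE = {
    "VALIDATION_ERROR": 400,    # hypothetical code
    "MODEL_NOT_ALLOWED": 400,
    "DAILY_CAP_EXCEEDED": 403,
}


def error_response(code, message):
    """Return a Lambda proxy integration response carrying the error envelope."""
    return {
        "statusCode": STATUS_BY_CODE.get(code, 500),
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"error": {"code": code, "message": message}}),
    }
```

Requests denied by the authorizer never reach this code path; API Gateway answers those itself, so their body would need to be shaped through gateway responses if the same envelope is wanted.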
6) Throttling & Quotas
6.1 Edge TPS (API Gateway Usage Plans)
Plans:
- Standard: rate=2 RPS, burst=10
- Power: rate=5 RPS, burst=20

Attach one or more API keys per plan. Optionally add API Gateway request quota per day as a coarse control.
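To make the rate/burst pairs concrete, here is a small token-bucket simulation. API Gateway documents its throttling as a token bucket, but this class is only an illustration of how the two numbers interact, not a model of the service's internals.

```python
class TokenBucket:
    """Illustrative token bucket: `rate` tokens refill per second, up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = float(rate)
        self.capacity = float(burst)
        self.tokens = float(burst)   # start full
        self.last = 0.0

    def allow(self, now):
        # Refill for the elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


standard = TokenBucket(rate=2, burst=10)
accepted = sum(standard.allow(0.0) for _ in range(12))  # 12 requests at the same instant
```

Under the Standard plan, a client that fires 12 requests at once gets 10 through and 2 throttled (429); after that, it sustains roughly 2 requests per second.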
6.2 Token Budgets (Authorizer + Proxy)
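Section 3 assigns an estimated pre-check to the authorizer and the actual-usage update to the proxy. A minimal sketch of the pre-check arithmetic follows, assuming the request's max_tokens is also clamped to whatever budget remains so a single call cannot overshoot the cap by much. That clamp and the name `precheck_budget` are assumptions, not stated policy.

```python
def precheck_budget(output_used, daily_cap, requested_max_tokens, max_tokens_per_call):
    """Return (allowed, effective_max_tokens) before Bedrock is called."""
    remaining = daily_cap - output_used
    if remaining <= 0:
        # Budget exhausted: caller should answer DAILY_CAP_EXCEEDED.
        return False, 0
    # Clamp to the per-call limit and to the remaining budget (assumed behavior).
    return True, min(requested_max_tokens, max_tokens_per_call, remaining)
```

After the call completes, the proxy would add the output token count reported by Bedrock to the day's counter, so the next pre-check sees actual rather than estimated usage.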
6.3 Kill‑switches
- Soft: /app/bedrock/enabled=false ⇒ Authorizer denies all.
- Hard: Org SCP denying bedrock:*; or revoke model access in Bedrock.
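The hard kill-switch policy could look like the document generated below. This is a sketch of the simplest possible deny-all SCP; the `Sid` value and the function name are assumptions, and a real policy might carve out exemptions.

```python
import json


def bedrock_deny_scp():
    """Return a service control policy document that denies every Bedrock action."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "EmergencyDenyBedrock",  # hypothetical statement id
                "Effect": "Deny",
                "Action": "bedrock:*",
                "Resource": "*",
            }
        ],
    }


policy_json = json.dumps(bedrock_deny_scp(), indent=2)
```

Note that SCPs do not apply to the organization's management account, so the gateway should run in a member account for this switch to be effective.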
7) Request Flow (Sequence)
- IDE sends request with x-api-key → API Gateway.
- API Gateway authenticates API key and enforces plan TPS/burst.
- Lambda Authorizer executes:
  - Check SSM flag enabled; check GatewayUsers.status.
  - Load user profile & today's usage; pre‑check against cap.
  - Return Allow with context {userId, planId} or Deny.
- Lambda Proxy validates payload, ensures model is in allowlist, clamps max_tokens.
- Call Bedrock Converse/Invoke; stream or buffer output back to client.
- On completion, parse usage and update BedrockTokenUsage.
- Emit metrics and structured logs; return final body.
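The Bedrock call and usage parsing steps can be sketched as two pure mapping functions around the Converse API: one from the gateway's /v1/chat request to Converse arguments, and one from the Converse response back to the gateway's response shape. The function names and the `finish_reason` mapping are assumptions; the sketch handles only text content and user/assistant roles.

```python
import time
import uuid

# Assumed mapping from Converse stopReason to the gateway's finish_reason.
FINISH_REASONS = {"end_turn": "stop", "stop_sequence": "stop", "max_tokens": "length"}


def to_converse_kwargs(request, max_tokens):
    """Translate a /v1/chat request body into kwargs for bedrock-runtime converse()."""
    inference = {"maxTokens": max_tokens}
    if "temperature" in request:
        inference["temperature"] = request["temperature"]
    return {
        "modelId": request["model"],
        "messages": [
            {"role": m["role"], "content": [{"text": m["content"]}]}
            for m in request["messages"]
        ],
        "inferenceConfig": inference,
    }


def from_converse_response(model, response):
    """Translate a Converse response into the gateway's non-stream response body."""
    blocks = response["output"]["message"]["content"]
    text = "".join(block.get("text", "") for block in blocks)
    usage = response["usage"]
    return {
        "id": "chatcmpl_" + uuid.uuid4().hex,
        "model": model,
        "created": int(time.time()),
        "usage": {
            "input_tokens": usage["inputTokens"],
            "output_tokens": usage["outputTokens"],
        },
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": text},
                "finish_reason": FINISH_REASONS.get(response.get("stopReason"), "stop"),
            }
        ],
    }
```

In the proxy this would be used roughly as `resp = bedrock_runtime.converse(**to_converse_kwargs(req, clamped))`, after which `from_converse_response(req["model"], resp)["usage"]` supplies the numbers written to BedrockTokenUsage.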
8) IAM & Network
8.1 Execution Role (Proxy Lambda)
Permissions:
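- bedrock:InvokeModel* (scoped to allowed models; Converse and ConverseStream calls are authorized through the InvokeModel and InvokeModelWithResponseStream actions)
- dynamodb:GetItem/UpdateItem on the gateway's DynamoDB tables
- ssm:GetParameter on /app/bedrock/enabled
- CloudWatch Logs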
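A least-privilege policy along these lines could be generated as shown below. This is a sketch: the function name, statement layout, and the broad CloudWatch Logs resource are assumptions, and the optional `vpce_id` argument adds the aws:SourceVpce condition described under Network.

```python
def proxy_role_policy(region, account_id, model_ids, table_names, vpce_id=None):
    """Return an IAM policy document (dict) for the Proxy Lambda execution role."""
    bedrock = {
        "Effect": "Allow",
        "Action": ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"],
        # Foundation-model ARNs carry no account id.
        "Resource": [
            f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
            for model_id in model_ids
        ],
    }
    if vpce_id:
        bedrock["Condition"] = {"StringEquals": {"aws:SourceVpce": vpce_id}}
    dynamodb = {
        "Effect": "Allow",
        "Action": ["dynamodb:GetItem", "dynamodb:UpdateItem"],
        "Resource": [
            f"arn:aws:dynamodb:{region}:{account_id}:table/{name}"
            for name in table_names
        ],
    }
    ssm = {
        "Effect": "Allow",
        "Action": "ssm:GetParameter",
        "Resource": f"arn:aws:ssm:{region}:{account_id}:parameter/app/bedrock/enabled",
    }
    logs = {
        "Effect": "Allow",
        "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
        "Resource": f"arn:aws:logs:{region}:{account_id}:*",
    }
    return {"Version": "2012-10-17", "Statement": [bedrock, dynamodb, ssm, logs]}
```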
8.2 End‑User Principals
- Explicit Deny policy for bedrock:* to prevent bypass.
- Allow invoke of API Gateway only via API Keys (no SigV4 from clients).
8.3 Network
- Private VPC Endpoint to Bedrock (where available); add IAM condition aws:SourceVpce on the Lambda role.
9) Validation & Limits
- Payload size: limit prompt/messages total ≤ 64 KB (configurable).
- Per‑call max_tokens: clamp to user or global max (e.g., 1024).
- Allowed models: configured in GatewayConfig; reject others.
- Timeouts: Lambda 30s–60s; API GW 29s for non‑stream; for streaming, use an integration with Lambda function URLs or a REST API with chunked responses.
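The payload checks above could be combined into a single validation step in the Proxy Lambda, roughly as follows. The error codes other than MODEL_NOT_ALLOWED are hypothetical, and measuring the whole request body rather than just the prompt/messages is a simplifying assumption.

```python
import json

MAX_PAYLOAD_BYTES = 64 * 1024  # configurable per the spec


class ValidationError(Exception):
    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def validate_chat_request(raw_body, allowed_models, max_tokens_cap=1024):
    """Parse and validate a /v1/chat body; returns the request with max_tokens clamped."""
    if len(raw_body.encode("utf-8")) > MAX_PAYLOAD_BYTES:
        raise ValidationError("PAYLOAD_TOO_LARGE", "Request body exceeds 64 KB.")
    try:
        request = json.loads(raw_body)
    except json.JSONDecodeError:
        raise ValidationError("INVALID_JSON", "Body is not valid JSON.")
    if not isinstance(request, dict):
        raise ValidationError("INVALID_JSON", "Body must be a JSON object.")
    if request.get("model") not in allowed_models:
        raise ValidationError("MODEL_NOT_ALLOWED", "Model is not on the allowlist.")
    messages = request.get("messages")
    if not isinstance(messages, list) or not messages:
        raise ValidationError("INVALID_MESSAGES", "messages must be a non-empty list.")
    requested = request.get("max_tokens", max_tokens_cap)
    if type(requested) is not int or requested < 1:
        raise ValidationError("INVALID_MAX_TOKENS", "max_tokens must be a positive integer.")
    # Clamp rather than reject, as described above.
    request["max_tokens"] = min(requested, max_tokens_cap)
    return request
```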
10) Observability
10.1 Metrics (CloudWatch, with EMF)
- Requests: count, 4xx, 5xx, latency p50/p90/p99
- Bedrock call duration and errors by model
- Tokens: input/output per user/day; top N users
- Rejections: DAILY_CAP_EXCEEDED, MODEL_NOT_ALLOWED, DISABLED_FLAG
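With the CloudWatch Embedded Metric Format, a metric is emitted by printing one structured JSON log line. The sketch below shows what such a line could look like for the token metrics; the namespace, metric names, and the choice to keep `userId` as a plain property instead of a dimension are assumptions.

```python
import json
import time


def emf_record(namespace, model, user_id, input_tokens, output_tokens, latency_ms, now_ms=None):
    """Return one EMF log line (JSON string) for a completed Bedrock call."""
    return json.dumps({
        "_aws": {
            "Timestamp": now_ms if now_ms is not None else int(time.time() * 1000),
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    "Dimensions": [["Model"]],
                    "Metrics": [
                        {"Name": "InputTokens", "Unit": "Count"},
                        {"Name": "OutputTokens", "Unit": "Count"},
                        {"Name": "BedrockLatency", "Unit": "Milliseconds"},
                    ],
                }
            ],
        },
        "Model": model,
        # Kept as a searchable property, not a dimension, to avoid one metric per user.
        "userId": user_id,
        "InputTokens": input_tokens,
        "OutputTokens": output_tokens,
        "BedrockLatency": latency_ms,
    })
```

Inside Lambda, `print(emf_record(...))` is enough for CloudWatch to extract the metrics; per-user and top-N views can then be built with Logs Insights queries over the `userId` property.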
10.2 Logs
- Correlation id (request id) end‑to‑end
- Redact message content by default (toggle for debugging)
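A minimal sketch of redact-by-default logging follows. It replaces each message's content with its length and attaches the request id for correlation; the field names `requestId` and `content_chars` and both function names are assumptions.

```python
import json


def redact_messages(request, debug=False):
    """Return a log-safe copy of the request; content is kept only in debug mode."""
    if debug:
        return request
    safe = dict(request)
    safe["messages"] = [
        {"role": m.get("role"), "content_chars": len(str(m.get("content", "")))}
        for m in request.get("messages", [])
    ]
    return safe


def log_line(request_id, request, debug=False):
    """Build one structured log line carrying the correlation id."""
    return json.dumps({"requestId": request_id, "request": redact_messages(request, debug)})
```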
10.3 Alerts (SNS/Slack)
- 5xx rate > 2% over 5m
- /app/bedrock/enabled set to false (notify)
- User at 80% and 100% of daily cap (optional DM)
- Cost anomaly (tokens/day spike)
11) Security Considerations
- No API key leakage in logs; rotate keys quarterly.
- Optional HMAC sidecar token per request (defense in depth).
- WAF geo/IP allowlist if necessary.
- Content guardrails (Bedrock Guardrails) for policy compliance.
12) Deployment & IaC
- CDK (TypeScript/Java) project containing:
  - API Gateway (routes, models), Usage Plans, API Keys (seed via script)
  - Lambdas (Authorizer + Proxy) with env vars for table names/params
  - DynamoDB tables, autoscaling RCU/WCU
  - SSM params with defaults
  - Optional WAF association
- Environments: dev, staging, prod; feature flags per env
- CI/CD: GitHub Actions to synth/deploy; unit + integration tests
13) Runbooks
13.1 Emergency Shutdown
- Set /app/bedrock/enabled=false (immediate soft stop).
- If not sufficient, attach Org SCP denying bedrock:* to account/OU.
- Optionally disable Bedrock model access in console.
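The first runbook step can be scripted so that it is a single call during an incident. The sketch below takes an SSM client as an argument (for example `boto3.client("ssm")`) so it can be exercised without AWS access; the function name is hypothetical.

```python
def set_gateway_enabled(ssm_client, enabled, name="/app/bedrock/enabled"):
    """Flip the soft kill-switch by overwriting the SSM parameter."""
    ssm_client.put_parameter(
        Name=name,
        Value="true" if enabled else "false",
        Type="String",
        Overwrite=True,  # the parameter already exists; replace its value
    )
```

Calling `set_gateway_enabled(boto3.client("ssm"), False)` performs the soft stop. If the authorizer caches the flag, the stop takes effect once that cache expires.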
13.2 Raise/Lower User Cap
- Update GatewayUsers.dailyOutputCap; change takes effect immediately.
13.3 Key Rotation
- Create new API key, map to user, notify; revoke old key after grace period.
13.4 Hotspot/Abuse
- Move user to stricter Usage Plan or suspend user; add WAF rule.
14) Testing Strategy
- Unit tests: payload validation, authorizer cap math, DDB updates
- Contract tests: /v1/chat success/429/403/400 cases
- Load tests: confirm API GW TPS enforcement; backpressure behavior
- Chaos: Bedrock 5xx/latency injections; ensure graceful degradation
- Security: key rotation test; WAF rule efficacy
15) Cost Model (rough)
- API Gateway requests (per million)
- Lambda GB‑seconds (authorizer + proxy)
- DynamoDB RCUs/WCUs proportional to calls (two writes per request worst‑case)
- Bedrock tokens (dominant cost) – tracked via usage metrics
Optimizations: batch writes (streaming buffer), on‑demand → provisioned with autoscaling, aggregate counters with periodic compaction.