Bedrock Access Gateway – Design Spec
Doc status: Draft v1.1
Owner: Miguel Merlin
Date: Aug 11, 2025
Reviewers: Brandon Yen
1) Overview
Provide Bedrock access to internal IAM users via a controlled gateway that works from a variety of interfaces, ranging from developer IDE extensions (Cline) to a front-facing UI. The system exposes a stable HTTP API (OpenAI‑style), enforces per‑user throttling and monthly token budgets, and includes multiple kill‑switches.
Primary goals
- Simple IDE integration via HTTPS + API key
- Per‑user TPS throttling and monthly cost limits
- Hard and soft kill‑switches
- Centralized audit, cost visibility, and guardrails
2) Requirements
2.1 Functional
- Users can call the Lambda Function URL from IDE extensions or the UI.
  - Requests from IDE extensions are authenticated via STS credentials generated from IAM credentials.
  - Requests from the UI are authenticated via Cognito JWT tokens.
- Enforce rate limits (TPS/burst) per user (or per team plan) at the edge.
- Enforce monthly cost limits per user; reject requests once the budget is exceeded.
- Support streaming responses (SSE) for chat/completions.
- Allowlist of Bedrock models; requests to others are rejected.
  - Current list of models includes:
    - Claude 3 Haiku
    - Claude 3.5 Sonnet
  - All other Anthropic models were deemed not necessary or require access through provisioned throughput.
- Provide usage/remaining quota endpoint.
- Emit structured logs and metrics for cost and auditing.
2.2 Operational
- Soft kill: feature flag to disable traffic quickly.
- Hard kill: SCP denying Bedrock; model access toggle in Bedrock.
- Per-user soft kill: activate/deactivate access keys for each user.
- Config changes (caps, models, plans) without redeploy.
- SLO: 99.9% monthly availability for gateway.
- P90 end‑to‑end latency target: ≤ 1.5s for 2‑KB prompts, non‑streaming.
2.3 Security & Compliance
- Users cannot directly call Bedrock; only the gateway’s IAM role can.
- TLS 1.2+, HSTS, no PII in logs by default; opt‑in redaction allowlist.
3) High‑Level Architecture
IDE/CLI ──HTTPS──> Lambda Function URL ──> Lambda Proxy ──> Bedrock
                                              └─> DynamoDB (usage)
Control plane: DynamoDB (config/limits)
Kill‑switches: Org SCP + Bedrock model access (hard)
Key Components
- API Gateway (HTTP/REST): Routing, API Keys, Usage Plans, throttling.
- Lambda Authorizer: Validates API key, checks feature flag, reads per‑user caps, pre‑checks remaining daily tokens (estimated).
- Lambda Proxy: Validates payload, calls Bedrock, streams results, updates usage counters from actual token usage.
- DynamoDB: Token metering per user/day; user profiles; config.
- SSM Parameter Store: Feature flags + global toggles.
- Organizations SCP: Bedrock deny policy for emergency.
- WAF (optional): Rate‑based block/allow rules.
4) Data Model
4.1 DynamoDB Tables
Transaction Table:
- PK: userId (S)
- SK: timestamp (S, YYYY-MM-DDTHH:mm:ss)
- Attributes: cost (float), modelId (S), usage (outputTokens (S), inputTokens (S))
Monthly Usage Table:
- PK: userArn (S)
- SK: month_year (S, MM_YYYY)
- Attributes: cost (float), invocations (int)
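The key formats above can be made concrete with a small sketch. It builds the primary keys for both tables and the keyword arguments for a boto3 `Table.update_item` call that increments the monthly counters atomically. The function names are hypothetical, and storing `cost` as a `Decimal` is an assumption driven by boto3, which rejects Python floats for DynamoDB numbers.

```python
from datetime import datetime, timezone
from decimal import Decimal


def transaction_key(user_id: str, when: datetime) -> dict:
    """Primary key for the Transaction Table (PK userId, SK timestamp)."""
    return {"userId": user_id, "timestamp": when.strftime("%Y-%m-%dT%H:%M:%S")}


def monthly_usage_key(user_arn: str, when: datetime) -> dict:
    """Primary key for the Monthly Usage Table (PK userArn, SK month_year as MM_YYYY)."""
    return {"userArn": user_arn, "month_year": when.strftime("%m_%Y")}


def monthly_usage_update(user_arn: str, when: datetime, cost: Decimal) -> dict:
    """Keyword arguments for Table.update_item that add one invocation and its
    cost to the month's counters in a single atomic write."""
    return {
        "Key": monthly_usage_key(user_arn, when),
        # Attribute names go through placeholders so the expression stays
        # valid even if a name collides with a DynamoDB reserved word.
        "UpdateExpression": "ADD #cost :cost, #inv :one",
        "ExpressionAttributeNames": {"#cost": "cost", "#inv": "invocations"},
        "ExpressionAttributeValues": {":cost": cost, ":one": 1},
    }
```

A caller would pass the result straight through, for example `table.update_item(**monthly_usage_update(arn, datetime.now(timezone.utc), Decimal("0.0031")))`. Because `ADD` creates missing attributes, the first request of a month needs no separate initialization step.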
5) API Design
5.1 Authentication
- Headers:
  - x-aws-session-token: <key> (STS session token)
  - x-aws-access-key: <key> (STS access key)
  - x-aws-secret-key: <key> (STS secret key)
- Inference proxy finds user ARN based on credentials
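One way the proxy could resolve the user ARN from these headers is to call STS `GetCallerIdentity` with the supplied credentials. The sketch below assumes that approach; `resolve_user_arn` and `default_sts_client` are hypothetical names, and the client factory is injectable so the logic can be exercised without AWS access.

```python
from typing import Callable, Mapping, Optional


def default_sts_client(access_key: str, secret_key: str, session_token: str):
    """Build a real STS client from the caller's temporary credentials."""
    import boto3  # imported lazily so the sketch loads without boto3 installed

    return boto3.client(
        "sts",
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        aws_session_token=session_token,
    )


def resolve_user_arn(
    headers: Mapping[str, str],
    client_factory: Callable = default_sts_client,
) -> Optional[str]:
    """Return the caller's ARN, or None if a header is missing or STS rejects the credentials."""
    lowered = {k.lower(): v for k, v in headers.items()}  # HTTP header names are case-insensitive
    try:
        creds = (
            lowered["x-aws-access-key"],
            lowered["x-aws-secret-key"],
            lowered["x-aws-session-token"],
        )
    except KeyError:
        return None
    try:
        return client_factory(*creds).get_caller_identity()["Arn"]
    except Exception:  # broad on purpose: any STS failure is treated as "not authenticated"
        return None
```

The shape of the returned ARN depends on how the temporary credentials were issued, so the value used as the table key should be checked against real STS output before relying on it.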
5.2 Endpoints (JSON)
POST (Lambda Function URL)
Request:
{
  "modelId": "anthropic.claude-3-haiku-20240307-v1:0",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "text": "<task>"
        },
        {
          "text": "<system prompt>"
        },
        {
          "text": "<environment details>"
        }
      ]
    }
  ],
  "system": [
    {
      "text": "<system prompt>"
    }
  ],
  "inferenceConfig": {
    "maxTokens": 4096,
    "temperature": 0
  },
  "additionalModelRequestFields": {}
}
GET /v1/usage
Include the userArn in a request header to retrieve that user's monthly usage statistics and monthly limit.
Error Format
{ "error": { "code": "DAILY_CAP_EXCEEDED", "message": "Reached daily token cap." } }
HTTP Codes: 200, 400 (validation), 401/403 (auth/quota), 429 (rate limit), 5xx (upstream/internal).
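A minimal sketch of a helper that produces this error envelope as a Lambda Function URL response. The status mapping is an assumption inferred from the code list above (400 validation, 403 quota, 429 rate limit); `RATE_LIMITED` is a hypothetical code, and the statuses chosen for `MODEL_NOT_ALLOWED` and `DISABLED_FLAG` are guesses that the gateway owner should confirm.

```python
import json

# Assumed mapping from error code to HTTP status; adjust to the gateway's real policy.
STATUS_BY_CODE = {
    "MODEL_NOT_ALLOWED": 400,   # validation failure
    "DAILY_CAP_EXCEEDED": 403,  # quota
    "DISABLED_FLAG": 403,       # assumption: soft kill reported as forbidden
    "RATE_LIMITED": 429,        # hypothetical code for throttled requests
}


def error_response(code: str, message: str) -> dict:
    """Build a Lambda response dict carrying the spec's error envelope."""
    return {
        "statusCode": STATUS_BY_CODE.get(code, 500),  # unknown codes fall back to 500
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"error": {"code": code, "message": message}}),
    }
```

Keeping the mapping in one table means clients can rely on the `code` field while the HTTP status stays a coarse signal.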
6) Throttling & Quotas
6.1 Edge TPS (API Gateway Usage Plans)
Plans:
- Standard: rate=2 RPS, burst=10
- Power: rate=5 RPS, burst=20
Attach one or more API keys per plan. Optionally add an API Gateway request quota per day as a coarse control.
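To illustrate what rate and burst mean in practice, here is a small token-bucket sketch using the Standard plan numbers. API Gateway performs this throttling itself, so this is only a model of the behaviour, not something the gateway would need to implement; the `TokenBucket` class and its injectable clock are hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, holds at most `burst` tokens."""

    def __init__(self, rate: float, burst: int, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = burst
        self.tokens = float(burst)  # start full, so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when the caller should get a 429."""
        now = self.clock()
        # Refill for the elapsed time, never exceeding the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With rate=2 and burst=10, a client can send ten requests back to back, after which it is limited to two per second until the bucket refills.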
6.2 Token Budgets (Authorizer + Proxy)
6.3 Kill‑switches
- Soft: /app/bedrock/enabled=false ⇒ Authorizer denies all.
- Hard: Org SCP denying bedrock:*; or revoke model access in Bedrock.
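The soft kill check could look roughly like the sketch below, which reads the flag from SSM Parameter Store and treats anything other than "true" as disabled. Failing closed when the parameter cannot be read is an assumption, not something the spec states; it trades availability for safety and may need revisiting against the SLO. The function names are hypothetical.

```python
from typing import Callable

FLAG_NAME = "/app/bedrock/enabled"


def read_flag_from_ssm(name: str) -> str:
    """Fetch the raw parameter value from SSM Parameter Store."""
    import boto3  # imported lazily so the sketch loads without boto3 installed

    return boto3.client("ssm").get_parameter(Name=name)["Parameter"]["Value"]


def gateway_enabled(read_flag: Callable[[str], str] = read_flag_from_ssm) -> bool:
    """Return True only when the flag is readable and set to "true"."""
    try:
        value = read_flag(FLAG_NAME)
    except Exception:
        return False  # assumption: fail closed if the flag cannot be read
    return value.strip().lower() == "true"
```

In a Lambda, the value would normally be cached for a few seconds so that every request does not incur an SSM call, at the cost of a short delay before the kill-switch takes effect.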
7) Request Flow (Sequence)
1. IDE sends request with x-api-key → API Gateway.
2. API Gateway authenticates API key and enforces plan TPS/burst.
3. Lambda Authorizer executes:
   - Check SSM flag enabled; check GatewayUsers.status.
   - Load user profile & today's usage; pre‑check against cap.
   - Return Allow with context {userId, planId} or Deny.
4. Lambda Proxy validates payload, ensures model is in allowlist, clamps max_tokens.
5. Call Bedrock Converse/Invoke; stream or buffer output back to client.
6. On completion, parse usage and update BedrockTokenUsage.
7. Emit metrics and structured logs; return final body.
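The ordering of these checks can be sketched as a single function with every external dependency injected. This is an illustration of the sequence only: the `Deps` container, its field names, and the `USER_SUSPENDED` code are hypothetical, and the real system splits the work between the authorizer and the proxy.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Deps:
    """Injected collaborators (all hypothetical stand-ins for SSM, DynamoDB, Bedrock)."""
    flag_enabled: Callable[[], bool]
    user_active: Callable[[str], bool]
    used_today: Callable[[str], int]
    daily_cap: Callable[[str], int]
    model_allowed: Callable[[str], bool]
    call_bedrock: Callable[[dict], dict]
    record_usage: Callable[[str, dict], None]


def handle(user_id: str, payload: dict, deps: Deps) -> dict:
    # Authorizer stage: global flag, user status, then the cap pre-check.
    if not deps.flag_enabled():
        return {"status": 403, "code": "DISABLED_FLAG"}
    if not deps.user_active(user_id):
        return {"status": 403, "code": "USER_SUSPENDED"}  # hypothetical code
    if deps.used_today(user_id) >= deps.daily_cap(user_id):
        return {"status": 403, "code": "DAILY_CAP_EXCEEDED"}
    # Proxy stage: allowlist check, upstream call, then usage accounting.
    if not deps.model_allowed(payload.get("modelId", "")):
        return {"status": 400, "code": "MODEL_NOT_ALLOWED"}
    result = deps.call_bedrock(payload)
    deps.record_usage(user_id, result.get("usage", {}))
    return {"status": 200, "body": result}
```

Running the cheap checks first means a disabled flag or exhausted cap never reaches Bedrock, and usage is recorded only from the actual response.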
8) IAM & Network
8.1 Execution Role (Proxy Lambda)
Permissions:
- bedrock:InvokeModel*, bedrock:Converse* (scoped to allowed models)
- dynamodb:GetItem/UpdateItem on both tables
- ssm:GetParameter on /app/bedrock/enabled
- CloudWatch Logs
8.2 End‑User Principals
- Explicit Deny policy for bedrock:* to prevent bypass.
- Allow invoke of API Gateway only via API Keys (no SigV4 from clients).
8.3 Network
Private VPC Endpoint to Bedrock (where available); add IAM condition aws:SourceVpce on the Lambda role.
9) Validation & Limits
- Payload size: limit prompt/messages total ≤ 64 KB (configurable).
- Per‑call max_tokens: clamp to user or global max (e.g., 1024).
- Allowed models: configured in GatewayConfig; reject others.
- Timeouts: Lambda 30s–60s; API GW 29s for non‑stream; for streaming use integration with Lambda function URLs or REST API w/ chunked responses.
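A minimal sketch of these checks applied to the request body from section 5.2. Several details are assumptions: the size limit is measured on the whole body rather than on the messages alone, the effective token limit is taken as the smaller of the user and global maximums, the allowlist is hard-coded instead of being read from GatewayConfig, and `PAYLOAD_TOO_LARGE` is a hypothetical error code.

```python
import json
from typing import Optional

MAX_PAYLOAD_BYTES = 64 * 1024
GLOBAL_MAX_TOKENS = 1024
# In the real gateway this set would come from GatewayConfig.
ALLOWED_MODELS = {"anthropic.claude-3-haiku-20240307-v1:0"}


def validate_and_clamp(raw_body: str, user_max_tokens: Optional[int] = None) -> dict:
    """Parse the request body, reject oversized or disallowed requests, clamp maxTokens."""
    if len(raw_body.encode("utf-8")) > MAX_PAYLOAD_BYTES:
        raise ValueError("PAYLOAD_TOO_LARGE")  # hypothetical error code
    payload = json.loads(raw_body)
    if payload.get("modelId") not in ALLOWED_MODELS:
        raise ValueError("MODEL_NOT_ALLOWED")
    # Assumption: a user-specific maximum can only lower the global one.
    limit = min(GLOBAL_MAX_TOKENS, user_max_tokens) if user_max_tokens else GLOBAL_MAX_TOKENS
    config = payload.setdefault("inferenceConfig", {})
    config["maxTokens"] = min(int(config.get("maxTokens", limit)), limit)
    return payload
```

Clamping rather than rejecting keeps IDE clients working when they send a large default such as 4096, while still bounding the output cost per call.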
10) Observability
10.1 Metrics (CloudWatch, with EMF)
- Requests: count, 4xx, 5xx, latency p50/p90/p99
- Bedrock call duration and errors by model
- Tokens: input/output per user/day; top N users
- Rejections: DAILY_CAP_EXCEEDED, MODEL_NOT_ALLOWED, DISABLED_FLAG
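As an illustration of the EMF approach, the sketch below builds one Embedded Metric Format log line carrying token counts. Printing such a line from a Lambda lets CloudWatch extract the metrics without a separate API call. The namespace, metric names, and the choice to keep `userId` as a plain property instead of a dimension are assumptions; a per-user dimension would create one metric series per user.

```python
import json
import time


def emf_record(user_id: str, model_id: str, input_tokens: int, output_tokens: int,
               namespace: str = "BedrockGateway") -> str:
    """Return one EMF-formatted JSON log line with token metrics by model."""
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # epoch milliseconds
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    "Dimensions": [["ModelId"]],
                    "Metrics": [
                        {"Name": "InputTokens", "Unit": "Count"},
                        {"Name": "OutputTokens", "Unit": "Count"},
                    ],
                }
            ],
        },
        "ModelId": model_id,
        "InputTokens": input_tokens,
        "OutputTokens": output_tokens,
        "userId": user_id,  # plain property: searchable in logs, not a metric dimension
    }
    return json.dumps(record)
```

Per-user totals and top-N views could then come from log queries over the `userId` property or from the DynamoDB usage tables.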
10.2 Logs
- Correlation id (request id) end‑to‑end
- Redact message content by default (toggle for debugging)
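A small sketch of how both points could be combined in one structured log line. It assumes the request fields from section 5.2, so `messages` and `system` are the content-bearing fields to redact; the `log_line` name and the `debug` toggle are hypothetical.

```python
import json

REDACTED = "[REDACTED]"
CONTENT_FIELDS = ("messages", "system")  # fields that carry user or prompt text


def log_line(request_id: str, payload: dict, debug: bool = False) -> str:
    """Return a JSON log line tagged with the request id, with content redacted unless debugging."""
    safe = dict(payload)  # shallow copy so the caller's payload is left untouched
    if not debug:
        for field in CONTENT_FIELDS:
            if field in safe:
                safe[field] = REDACTED
    return json.dumps({"requestId": request_id, "request": safe})
```

Non-content fields such as `modelId` and `inferenceConfig` stay visible, which keeps the logs useful for cost and audit queries.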
10.3 Alerts (SNS/Slack)
- 5xx rate > 2% over 5m
- DISABLED_FLAG set to false (notify)
- User at 80% and 100% of daily cap (optional DM)
- Cost anomaly (tokens/day spike)
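The 80% and 100% cap notifications are easiest to get right when they fire once, at the moment a request crosses the threshold. The sketch below shows that crossing check under the assumption that usage before and after the request is known; the `cap_alert` name and the string labels are hypothetical.

```python
from typing import Optional


def cap_alert(used_before: int, used_after: int, cap: int) -> Optional[str]:
    """Return "100%" or "80%" when this request crossed that threshold, else None."""
    # Check the higher threshold first so one large request reports only "100%".
    for label, fraction in (("100%", 1.0), ("80%", 0.8)):
        threshold = cap * fraction
        if used_before < threshold <= used_after:
            return label
    return None
```

Comparing before and after values avoids re-sending the same alert on every subsequent request once a user is already past a threshold.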
11) Security Considerations
- No API key leakage in logs; rotate keys quarterly.
- Optional HMAC sidecar token per request (defense in depth).
- WAF geo/IP allowlist if necessary.
- Content guardrails (Bedrock Guardrails) for policy compliance.
12) Deployment & IaC
- CDK (TypeScript/Java) project containing:
  - API Gateway (routes, models), Usage Plans, API Keys (seed via script)
  - Lambdas (Authorizer + Proxy) with env vars for table names/params
  - DynamoDB tables, autoscaling RCU/WCU
  - SSM params with defaults
  - Optional WAF association
- Environments: dev, staging, prod; feature flags per env
- CI/CD: GitHub Actions to synth/deploy; unit + integration tests
13) Runbooks
13.1 Emergency Shutdown
1. Set /app/bedrock/enabled=false (immediate soft stop).
2. If not sufficient, attach Org SCP denying bedrock:* to account/OU.
3. Optionally disable Bedrock model access in console.
13.2 Raise/Lower User Cap
Update GatewayUsers.dailyOutputCap; change takes effect immediately.
13.3 Key Rotation
Create new API key, map to user, notify; revoke old key after grace period.
13.4 Hotspot/Abuse
Move user to stricter Usage Plan or suspend user; add WAF rule.
14) Testing Strategy
- Unit tests: payload validation, authorizer cap math, DDB updates
- Contract tests: /v1/chat success/429/403/400 cases
- Load tests: confirm API GW TPS enforcement; backpressure behavior
- Chaos: Bedrock 5xx/latency injections; ensure graceful degradation
- Security: key rotation test; WAF rule efficacy
15) Cost Model (rough)
- API Gateway requests (per million)
- Lambda GB‑seconds (authorizer + proxy)
- DynamoDB RCUs/WCUs proportional to calls (two writes per request worst‑case)
- Bedrock tokens (dominant cost) – tracked via usage metrics
Optimizations: batch writes (streaming buffer), on‑demand → provisioned with autoscaling, aggregate counters with periodic compaction.
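Since Bedrock tokens dominate, the per-request cost written to the usage tables can be derived from the token counts in the response. The sketch below assumes per-1,000-token pricing passed in as parameters; it deliberately contains no real prices, and the values in the usage note are placeholders, not published rates.

```python
from decimal import Decimal


def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_1k: Decimal, output_price_per_1k: Decimal) -> Decimal:
    """Cost of one request, given token counts and per-1,000-token prices."""
    total = (Decimal(input_tokens) * input_price_per_1k
             + Decimal(output_tokens) * output_price_per_1k)
    return total / Decimal(1000)
```

For example, with placeholder prices of `Decimal("0.001")` per 1K input tokens and `Decimal("0.002")` per 1K output tokens, a request with 2000 input and 500 output tokens costs `Decimal("0.003")`. Using `Decimal` avoids float rounding drift when many small costs are summed into a monthly total.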