Bedrock Access Gateway – Design Spec
Doc status: Draft v1.1
Owner: Miguel Merlin
Date: Aug 11, 2025
Reviewers: Brandon Yen
1) Overview
Provide Bedrock access to internal IAM users via a controlled gateway that works from a variety of interfaces, ranging from developer IDE extensions (Cline) to a front-facing UI. The system exposes a stable HTTP API (OpenAI‑style), enforces per‑user throttling and monthly token budgets, and includes multiple kill‑switches.
Primary goals
- Simple IDE integration via HTTPS + API key
- Per‑user TPS throttling and monthly cost limits
- Centralized audit, cost visibility, and guardrails
2) Requirements
2.1 Functional
- Users can call the Lambda Function URL from IDE extensions or the UI.
  - Requests from IDE extensions are authenticated via STS credentials generated from IAM credentials.
  - Requests from the UI are authenticated via Cognito JWT tokens.
- Enforce monthly cost limits per user; reject requests once the budget is exceeded.
- Support streaming responses (SSE) for chat/completions.
- Allow list of Bedrock models; requests to others are rejected.
  - Current list of models includes:
    - Claude 3 Haiku
    - Claude 3.5 Sonnet
  - All other Anthropic models were deemed not necessary or require access through provisioned throughput.
- Provide usage/remaining quota endpoint.
- Emit structured logs and metrics for cost and auditing.
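The allow list and budget rules above can be sketched as a single pre-flight check. This is a minimal sketch, not the gateway's actual code: the function name, the `Decision` type, and the HTTP status codes are assumptions, and the Claude 3.5 Sonnet model ID is assumed and should be checked against the Bedrock console.

```python
from dataclasses import dataclass

# Allow-listed Bedrock model IDs. The Haiku ID comes from section 5.2;
# the Claude 3.5 Sonnet ID is an assumption, not taken from this spec.
ALLOWED_MODEL_IDS = frozenset({
    "anthropic.claude-3-haiku-20240307-v1:0",
    "anthropic.claude-3-5-sonnet-20240620-v1:0",
})


@dataclass(frozen=True)
class Decision:
    allowed: bool
    status: int   # HTTP status the gateway would return (assumed values)
    reason: str


def check_request(model_id: str, spent_usd: float, monthly_limit_usd: float) -> Decision:
    """Reject models outside the allow list, then users who are over budget."""
    if model_id not in ALLOWED_MODEL_IDS:
        return Decision(False, 403, "model not on allow list")
    if spent_usd >= monthly_limit_usd:
        return Decision(False, 429, "monthly budget exceeded")
    return Decision(True, 200, "ok")
```

Checking the allow list first means a rejected model never triggers a usage lookup.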
2.2 Operational
- Hard kill: SCP denying Bedrock; model access toggle in Bedrock.
- Per-user soft kill: activate/deactivate access keys for each user.
- Config changes (caps, models, plans) without redeploy.
- SLO: 99.9% monthly availability for gateway.
- P90 end‑to‑end latency target: ≤ 1.5s for 2‑KB prompts, non‑streaming.
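The per-user soft kill could look roughly like the sketch below. It assumes an IAM client object with the boto3 `list_access_keys` and `update_access_key` methods is passed in (for example `boto3.client("iam")`); the function name is hypothetical.

```python
from typing import Any, List


def set_user_keys_status(iam_client: Any, user_name: str, active: bool) -> List[str]:
    """Activate or deactivate every access key of one IAM user.

    Returns the IDs of the keys whose status actually changed.
    """
    status = "Active" if active else "Inactive"
    changed: List[str] = []
    response = iam_client.list_access_keys(UserName=user_name)
    for meta in response["AccessKeyMetadata"]:
        if meta["Status"] != status:
            iam_client.update_access_key(
                UserName=user_name,
                AccessKeyId=meta["AccessKeyId"],
                Status=status,
            )
            changed.append(meta["AccessKeyId"])
    return changed
```

Passing the client in keeps the function testable without AWS access and leaves credential handling to the caller.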
2.3 Security & Compliance
- Users cannot directly call Bedrock; only the gateway’s IAM role can.
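Both the hard kill in 2.2 and the gateway-only rule here can be expressed as a deny policy. The sketch below builds such a policy document as a Python dict; the function name and the example role ARN are hypothetical, and the exact scoping used in the real SCP is not specified in this doc.

```python
import json
from typing import Optional


def bedrock_deny_policy(exempt_role_arn: Optional[str] = None) -> dict:
    """Build a deny-Bedrock policy document.

    With no exempt role, this denies Bedrock for everyone (hard kill).
    With an exempt role, it denies Bedrock for every principal except that role.
    """
    statement = {
        "Sid": "DenyBedrock",
        "Effect": "Deny",
        "Action": "bedrock:*",
        "Resource": "*",
    }
    if exempt_role_arn is not None:
        statement["Condition"] = {
            "ArnNotLike": {"aws:PrincipalArn": exempt_role_arn}
        }
    return {"Version": "2012-10-17", "Statement": [statement]}


if __name__ == "__main__":
    # Hypothetical ARN, for illustration only.
    gateway_role = "arn:aws:iam::123456789012:role/bedrock-gateway-lambda"
    print(json.dumps(bedrock_deny_policy(gateway_role), indent=2))
```

Switching from the exempting form to the unconditional form is what turns normal operation into a hard kill.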
3) High‑Level Architecture
CLI/IDE ──> HTTPS ──> Lambda Function URL ──> Lambda Proxy ──> Bedrock
                                                └─> DynamoDB (usage)
Control plane: DynamoDB (config/limits)
Key Components
- API Gateway (HTTP/REST): Routing, API Keys, Usage Plans, throttling.
- Lambda Proxy: Validates payload, authenticates users, calls Bedrock, streams results, updates usage counters from actual token usage.
- DynamoDB: Token metering per user/day; user profiles; config.
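Updating usage counters from actual token usage implies a token-to-cost conversion. A minimal sketch follows, assuming the proxy reads input and output token counts from the Bedrock response; the price table holds placeholder values and the function name is hypothetical.

```python
from decimal import Decimal

# USD per 1,000 tokens as (input, output). Placeholder values for illustration;
# real prices should come from config, not from this table.
PRICE_PER_1K = {
    "anthropic.claude-3-haiku-20240307-v1:0": (Decimal("0.00025"), Decimal("0.00125")),
}


def request_cost(model_id: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Cost in USD of one request, from its actual token counts."""
    price_in, price_out = PRICE_PER_1K[model_id]
    return (price_in * input_tokens + price_out * output_tokens) / 1000
```

`Decimal` avoids float rounding drift when many small costs are summed into a monthly total.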
4) Data Model
4.1 DynamoDB Tables
Transaction Table:
- PK: userId (S)
- SK: timestamp (S, YYYY-MM-DDTHH:mm:ss)
- Attributes: cost (float), modelId (S), usage (outputTokens (S), inputTokens (S))
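As an illustration of the Transaction Table schema, one item could be built as below. The function name is hypothetical and UTC timestamps are an assumption; token counts are stored as strings because the schema types them as S.

```python
from datetime import datetime, timezone
from decimal import Decimal
from typing import Optional


def build_transaction_item(
    user_id: str,
    model_id: str,
    input_tokens: int,
    output_tokens: int,
    cost: Decimal,
    now: Optional[datetime] = None,
) -> dict:
    """Build one Transaction Table item matching the schema above."""
    now = now or datetime.now(timezone.utc)  # UTC is an assumption
    return {
        "userId": user_id,                                # PK
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%S"),   # SK, second resolution
        "cost": cost,
        "modelId": model_id,
        "usage": {
            "inputTokens": str(input_tokens),
            "outputTokens": str(output_tokens),
        },
    }
```

With a second-resolution sort key, two requests from the same user within the same second would map to the same item key, so the later write would overwrite the earlier one.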
Monthly Usage Table:
- PK: userArn (S)
- SK: month_year (S, MM_YYYY)
- Attributes: cost (float), invocations (int)
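The monthly counters could be incremented atomically with a DynamoDB `ADD` update. The sketch below only builds the arguments for such a call; the function names are hypothetical, and UTC month boundaries are an assumption.

```python
from datetime import datetime, timezone
from decimal import Decimal
from typing import Optional


def month_key(now: datetime) -> str:
    """Format the sort key as MM_YYYY, e.g. 08_2025."""
    return now.strftime("%m_%Y")


def monthly_usage_update(user_arn: str, cost: Decimal, now: Optional[datetime] = None) -> dict:
    """Build update_item keyword arguments that add one invocation and its cost."""
    now = now or datetime.now(timezone.utc)  # UTC is an assumption
    return {
        "Key": {"userArn": user_arn, "month_year": month_key(now)},
        # ADD creates the attributes on first use and increments them afterwards.
        "UpdateExpression": "ADD #cost :cost, #inv :one",
        "ExpressionAttributeNames": {"#cost": "cost", "#inv": "invocations"},
        "ExpressionAttributeValues": {":cost": cost, ":one": 1},
    }
```

With a boto3 `Table` resource, the result would be passed as `table.update_item(**monthly_usage_update(arn, cost))`. Because the month is part of the key, a new month starts from a fresh item without any reset job.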
5) API Design
5.1 Authentication
- Headers:
  - x-aws-session-token: <key> (STS session token)
  - x-aws-access-key: <key> (STS access key)
  - x-aws-secret-key: <key> (STS secret key)
- Inference proxy finds user ARN based on credentials.
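One way the proxy could resolve the user ARN is to call STS `GetCallerIdentity` with the forwarded credentials. This is a sketch under that assumption, not a description of the actual implementation; the function names are hypothetical, and the STS client factory is injected (in production it might be `lambda **kw: boto3.client("sts", **kw)`).

```python
from typing import Any, Callable, Dict, Mapping

# Request header -> keyword argument accepted by boto3.client(...)
HEADER_TO_KWARG = {
    "x-aws-access-key": "aws_access_key_id",
    "x-aws-secret-key": "aws_secret_access_key",
    "x-aws-session-token": "aws_session_token",
}


def credentials_from_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Extract the three STS credential headers, case-insensitively."""
    lowered = {name.lower(): value for name, value in headers.items()}
    missing = [h for h in HEADER_TO_KWARG if not lowered.get(h)]
    if missing:
        raise ValueError("missing credential headers: " + ", ".join(missing))
    return {kwarg: lowered[h] for h, kwarg in HEADER_TO_KWARG.items()}


def resolve_user_arn(headers: Mapping[str, str], sts_client_factory: Callable[..., Any]) -> str:
    """Ask STS who the caller is, using the caller's own credentials."""
    client = sts_client_factory(**credentials_from_headers(headers))
    return client.get_caller_identity()["Arn"]
```

Invalid or expired credentials would surface as an error from the STS call, which the proxy could map to an authentication failure.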
5.2 Endpoints (JSON)
POST (Lambda Function URL)
Request:
{
  "modelId": "anthropic.claude-3-haiku-20240307-v1:0",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "text": "<task>"
        },
        {
          "text": "<system prompt>"
        },
        {
          "text": "<environment details>"
        }
      ]
    }
  ],
  "system": [
    {
      "text": "<system prompt>"
    }
  ],
  "inferenceConfig": {
    "maxTokens": 4096,
    "temperature": 0
  },
  "additionalModelRequestFields": {}
}
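A client call matching the request above might be assembled as follows. This is a sketch using only the standard library; the function name, the shape of the `creds` dict, and any Function URL passed in are assumptions.

```python
import json
import urllib.request
from typing import Mapping


def build_inference_request(
    function_url: str,
    creds: Mapping[str, str],
    prompt: str,
    model_id: str = "anthropic.claude-3-haiku-20240307-v1:0",
) -> urllib.request.Request:
    """Build a POST to the Lambda Function URL with the section 5.1 headers."""
    body = {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": 4096, "temperature": 0},
    }
    return urllib.request.Request(
        function_url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "x-aws-access-key": creds["access_key"],
            "x-aws-secret-key": creds["secret_key"],
            "x-aws-session-token": creds["session_token"],
        },
    )
```

The request would be sent with `urllib.request.urlopen(request)`; the response format is not specified in this doc, so parsing is left out.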
GET /v1/usage
Include the userArn in a request header to retrieve that user's monthly usage statistics and monthly limit.
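The response body is not specified here. As a sketch, it could combine the Monthly Usage Table item with the configured limit; the function name and the `monthlyLimit` and `remaining` fields are assumptions.

```python
from decimal import Decimal


def usage_summary(
    user_arn: str,
    month_year: str,
    cost: Decimal,
    invocations: int,
    monthly_limit: Decimal,
) -> dict:
    """Build a JSON-serializable usage summary for one user and month."""
    remaining = max(monthly_limit - cost, Decimal("0"))  # never report negative quota
    return {
        "userArn": user_arn,
        "month_year": month_year,
        "cost": float(cost),
        "invocations": invocations,
        "monthlyLimit": float(monthly_limit),
        "remaining": float(remaining),
    }
```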