Bedrock Access Gateway – Design Spec

Doc status: Draft v1.1
Owner: Miguel Merlin
Date: Aug 11, 2025
Reviewers: Brandon Yen

1) Overview

Provide Bedrock access to internal IAM users via a controlled gateway that works from a variety of interfaces, ranging from developer IDE extensions (Cline) to a front-facing UI. The system exposes a stable HTTP API (OpenAI‑style), enforces per‑user throttling and monthly token budgets, and includes multiple kill‑switches.

Primary goals

Simple IDE integration via HTTPS + API key
Per‑user TPS throttling and monthly cost limits
Hard and soft kill‑switches
Centralized audit, cost visibility, and guardrails

2) Requirements

2.1 Functional

Users can call the Lambda Function URL from IDE extensions or the UI.
Requests from IDE extensions are authenticated via STS credentials generated from IAM credentials.
Requests from the UI are authenticated via Cognito JWTs.
Enforce rate limits (TPS/burst) per user (or per team plan) at the edge.
Enforce monthly cost limits per user; reject after budget exceeded.
Support streaming responses (SSE) for chat/completions.
Allowlist of Bedrock models; requests for any other model are rejected.
1. The current allowlist is:
  1. Claude 3 Haiku
  2. Claude 3.5 Sonnet
2. All other Anthropic models were deemed unnecessary or are only available through provisioned throughput.
Provide usage/remaining quota endpoint.
Emit structured logs and metrics for cost and auditing.
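
The allowlist and monthly budget requirements above could be combined into a single admission check. The following is a minimal sketch, not the gateway's actual implementation: the function name `admit`, the `MONTHLY_BUDGET_EXCEEDED` code, and the Claude 3.5 Sonnet model ID are assumptions (only the Haiku ID and `MODEL_NOT_ALLOWED` appear in this spec).

```python
from decimal import Decimal

# Assumed allowlist. The Haiku ID is taken from the request example in this
# spec; the Sonnet ID is an assumption and should be checked against Bedrock.
ALLOWED_MODELS = frozenset({
    "anthropic.claude-3-haiku-20240307-v1:0",
    "anthropic.claude-3-5-sonnet-20240620-v1:0",
})


def admit(model_id, spent_this_month, monthly_limit, allowed_models=ALLOWED_MODELS):
    """Return None if the request may proceed, otherwise an error code string."""
    if model_id not in allowed_models:
        return "MODEL_NOT_ALLOWED"
    # Reject once the budget is reached; hypothetical error code.
    if spent_this_month >= monthly_limit:
        return "MONTHLY_BUDGET_EXCEEDED"
    return None
```

Checking the allowlist first means a request for an unsupported model is rejected the same way regardless of how much budget the user has left.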

2.2 Operational

Soft kill: feature flag to disable traffic quickly.
Hard kill: SCP denying Bedrock; model access toggle in Bedrock.
Per-user soft kill: activate/deactivate access keys for each user.
Config changes (caps, models, plans) without redeploy.
SLO: 99.9% monthly availability for gateway.
P90 end‑to‑end latency target: ≤ 1.5s for 2‑KB prompts, non‑streaming.
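
One way the soft kill could work without a redeploy is a feature flag read from SSM Parameter Store and cached briefly inside the Lambda. This is a sketch under assumptions: the `FeatureFlag` class, the 30-second TTL, and the convention that the parameter holds the string `true` or `false` are illustrative; the parameter name `/app/bedrock/enabled` comes from the IAM section of this spec.

```python
import time

FLAG_NAME = "/app/bedrock/enabled"


class FeatureFlag:
    """Caches a boolean flag so most invocations skip the parameter lookup."""

    def __init__(self, fetch, ttl_seconds=30.0, clock=time.monotonic):
        self._fetch = fetch          # callable: parameter name -> string value
        self._ttl = ttl_seconds      # assumed TTL; bounds how long a kill takes
        self._clock = clock
        self._value = None
        self._expires_at = 0.0

    def enabled(self):
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            raw = self._fetch(FLAG_NAME)
            self._value = raw.strip().lower() == "true"
            self._expires_at = now + self._ttl
        return self._value


def ssm_fetch(name):
    """Fetch the raw parameter value from SSM (boto3 imported lazily)."""
    import boto3
    return boto3.client("ssm").get_parameter(Name=name)["Parameter"]["Value"]
```

In a handler this would be used as `flag = FeatureFlag(ssm_fetch)` at module scope and `flag.enabled()` per request. The TTL is a trade-off: a shorter value makes the kill-switch take effect sooner at the cost of more SSM calls.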

2.3 Security & Compliance

Users cannot directly call Bedrock; only the gateway’s IAM role can.
TLS 1.2+, HSTS, no PII in logs by default; opt‑in redaction allowlist.

3) High‑Level Architecture

IDE/CLI ──HTTPS──> Lambda Function URL ──> Lambda Proxy ──> Bedrock
                                                └─> DynamoDB (usage)
                             

Control plane: DynamoDB (config/limits)
Kill‑switches: Org SCP + Bedrock model access (hard)

Key Components

API Gateway (HTTP/REST): Routing, API Keys, Usage Plans, throttling.
Lambda Authorizer: Validates API key, checks feature flag, reads per‑user caps, pre‑checks remaining daily tokens (estimated).
Lambda Proxy: Validates payload, calls Bedrock, streams results, updates usage counters from actual token usage.
DynamoDB: Token metering per user/day; user profiles; config.
SSM Parameter Store: Feature flags + global toggles.
Organizations SCP: Bedrock deny policy for emergency.
WAF (optional): Rate‑based block/allow rules.

4) Data Model

4.1 DynamoDB Tables

Transaction Table:

PK: userId (S)
SK: timestamp (S, YYYY-MM-DDTHH:mm:ss)
Attributes: cost (float), modelId (S), usage (outputTokens (S), inputTokens (S))

Monthly Usage Table:

PK: userArn (S)
SK: month_year (S, MM_YYYY)
Attributes: cost (float), invocations (int)
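
To illustrate how the Monthly Usage table might be updated after each call, the sketch below builds the keyword arguments for a DynamoDB `update_item` call that atomically adds to `cost` and `invocations`. The helper names are hypothetical. It assumes the boto3 `Table` resource is used, which is why the cost is a `Decimal` (boto3 does not accept Python floats for DynamoDB numbers).

```python
from datetime import datetime, timezone
from decimal import Decimal


def month_key(now):
    """Format the sort key as MM_YYYY, matching the table definition above."""
    return now.strftime("%m_%Y")


def monthly_usage_update(user_arn, cost, now):
    """Build kwargs for Table.update_item; ADD creates missing attributes."""
    return {
        "Key": {"userArn": user_arn, "month_year": month_key(now)},
        "UpdateExpression": "ADD #cost :cost, #inv :one",
        # Placeholders avoid any clash with DynamoDB reserved words.
        "ExpressionAttributeNames": {"#cost": "cost", "#inv": "invocations"},
        "ExpressionAttributeValues": {":cost": cost, ":one": 1},
    }
```

A caller would pass the result straight through, for example `table.update_item(**monthly_usage_update(arn, Decimal("0.0125"), datetime.now(timezone.utc)))`. Because `ADD` is applied server-side, concurrent requests from the same user do not overwrite each other's totals.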

5) API Design

5.1 Authentication

Headers:
- x-aws-session-token: <key> (STS session token)
- x-aws-access-key: <key> (STS access key)
- x-aws-secret-key: <key> (STS secret key)
The inference proxy resolves the caller's user ARN from these credentials.
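
The spec does not say how the ARN lookup is performed. One plausible approach, shown here as a sketch, is to call STS `GetCallerIdentity` with the forwarded credentials. The function name `resolve_user_arn` and the injectable `sts_factory` parameter are assumptions made for illustration and testability.

```python
REQUIRED_HEADERS = (
    "x-aws-access-key",
    "x-aws-secret-key",
    "x-aws-session-token",
)


def resolve_user_arn(headers, sts_factory=None):
    """Return the caller ARN for the STS credentials carried in the headers."""
    # Header names are matched case-insensitively.
    lowered = {k.lower(): v for k, v in headers.items()}
    missing = [h for h in REQUIRED_HEADERS if h not in lowered]
    if missing:
        raise ValueError("missing credential headers: " + ", ".join(missing))

    if sts_factory is None:
        import boto3  # imported lazily so the sketch can be tested without it

        def sts_factory(**creds):
            return boto3.client("sts", **creds)

    client = sts_factory(
        aws_access_key_id=lowered["x-aws-access-key"],
        aws_secret_access_key=lowered["x-aws-secret-key"],
        aws_session_token=lowered["x-aws-session-token"],
    )
    # GetCallerIdentity succeeds only if the credentials are valid.
    return client.get_caller_identity()["Arn"]
```

Since `GetCallerIdentity` fails for invalid or expired credentials, the same call could serve as both authentication and identity lookup, with any exception mapped to a 401 response.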

5.2 Endpoints (JSON)

`POST (Lambda Function URL)`

Request:

{
    "modelId": "anthropic.claude-3-haiku-20240307-v1:0",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "text": "<task>"
                },
                {
                    "text": "<system prompt>"
                },
                {
                    "text": "<environment details>"
                }
            ]
        }
    ],
    "system": [
        {
            "text": "<system prompt>"
        }
    ],
    "inferenceConfig": {
        "maxTokens": 4096,
        "temperature": 0
    },
    "additionalModelRequestFields": {}
}
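
For reference, a client could assemble this request using only the Python standard library, as in the sketch below. The function name and the example URL in the usage note are placeholders; the header names are those listed in section 5.1.

```python
import json
import urllib.request


def build_gateway_request(function_url, payload, access_key, secret_key, session_token):
    """Build (but do not send) a POST request for the gateway."""
    return urllib.request.Request(
        function_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-aws-access-key": access_key,
            "x-aws-secret-key": secret_key,
            "x-aws-session-token": session_token,
        },
        method="POST",
    )
```

Sending it would then be `urllib.request.urlopen(build_gateway_request("https://<function-url>/", payload, ak, sk, token))`, where `<function-url>` stands for the deployed Lambda Function URL.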

`GET /v1/usage`

Pass the caller's userArn in a request header to retrieve that user's monthly usage statistics and monthly limit.

Error Format

{ "error": { "code": "DAILY_CAP_EXCEEDED", "message": "Reached daily token cap." } }
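
A small helper could keep every error response in this shape. The sketch below assumes the proxy returns the standard Lambda proxy response structure (`statusCode`, `headers`, `body`); the function name `error_response` is hypothetical, and the choice of status code is left to the caller.

```python
import json


def error_response(status, code, message):
    """Wrap an error code and message in the gateway's JSON error envelope."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"error": {"code": code, "message": message}}),
    }
```

For example, `error_response(403, "DAILY_CAP_EXCEEDED", "Reached daily token cap.")` yields a body identical to the error shown above.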

~~HTTP Codes: 200, 400 (validation), 401/403 (auth/quota), 429 (rate limit), 5xx (upstream/internal).~~

6) Throttling & Quotas

6.1 Edge TPS (API Gateway Usage Plans)

~~Plans:~~

~~Standard: rate=2 RPS, burst=10~~

~~Power: rate=5 RPS, burst=20~~

~~Attach one or more API keys per plan. Optionally add an API Gateway per-day request quota as a coarse control.~~

6.2 Token Budgets (Authorizer + Proxy)

~~Pre‑check (Authorizer): estimate estOut = min(request.max_tokens, user.maxTokensPerCall); if used + estOut > dailyCap, deny.~~

~~Post‑accounting (Proxy): parse Bedrock usage and ADD to BedrockTokenUsage atomically:~~
- SET outputTokensTotal = if_not_exists(outputTokensTotal, :zero) + :out
- SET inputTokensTotal = if_not_exists(inputTokensTotal, :zero) + :in

6.3 Kill‑switches

~~Soft: /app/bedrock/enabled=false ⇒ Authorizer denies all.~~

~~Hard: Org SCP denying bedrock:*; or revoke model access in Bedrock.~~

7) Request Flow (Sequence)

~~IDE sends request with x-api-key → API Gateway.~~

~~API Gateway authenticates API key and enforces plan TPS/burst.~~

~~Lambda Authorizer executes:~~
- ~~Check SSM flag enabled; check GatewayUsers.status.~~
- ~~Load user profile & today's usage; pre‑check against cap.~~
- ~~Return Allow with context {userId, planId} or Deny.~~

~~Lambda Proxy validates payload, ensures model is in allowlist, clamps max_tokens.~~

~~Call Bedrock Converse/Invoke; stream or buffer output back to client.~~

~~On completion, parse usage and update BedrockTokenUsage.~~

~~Emit metrics and structured logs; return final body.~~

8) IAM & Network

8.1 Execution Role (Proxy Lambda)

~~Permissions:~~

~~bedrock:InvokeModel*, bedrock:Converse* (scoped to allowed models)~~

~~dynamodb:GetItem/UpdateItem on both tables~~

ssm:GetParameter on /app/bedrock/enabled

~~CloudWatch Logs~~

8.2 End‑User Principals

~~Explicit Deny policy for bedrock:* to prevent bypass.~~

~~Allow invoke of API Gateway only via API Keys (no SigV4 from clients).~~

8.3 Network

~~Private VPC Endpoint to Bedrock (where available); add IAM condition aws:SourceVpce on the Lambda role.~~

9) Validation & Limits

~~Payload size: limit prompt/messages total ≤ 64 KB (configurable).~~

Per‑call max_tokens: clamp to user or global max (e.g., 1024).

~~Allowed models: configured in GatewayConfig; reject others.~~

~~Timeouts: Lambda 30s–60s; API GW 29s for non‑stream; for streaming use integration with Lambda function URLs or REST API w/ chunked responses.~~

10) Observability

10.1 Metrics (CloudWatch, with EMF)

~~Requests: count, 4xx, 5xx, latency p50/p90/p99~~

~~Bedrock call duration and errors by model~~

~~Tokens: input/output per user/day; top N users~~

~~Rejections: DAILY_CAP_EXCEEDED, MODEL_NOT_ALLOWED, DISABLED_FLAG~~

10.2 Logs

~~Correlation id (request id) end‑to‑end~~

~~Redact message content by default (toggle for debugging)~~

10.3 Alerts (SNS/Slack)

~~5xx rate > 2% over 5m~~

~~DISABLED_FLAG set to false (notify)~~

~~User at 80% and 100% of daily cap (optional DM)~~

~~Cost anomaly (tokens/day spike)~~

11) Security Considerations

~~No API key leakage in logs; rotate keys quarterly.~~

~~Optional HMAC sidecar token per request (defense in depth).~~

~~WAF geo/IP allowlist if necessary.~~

~~Content guardrails (Bedrock Guardrails) for policy compliance.~~

12) Deployment & IaC

~~CDK (TypeScript/Java) project containing:~~
- ~~API Gateway (routes, models), Usage Plans, API Keys (seed via script)~~
- ~~Lambdas (Authorizer + Proxy) with env vars for table names/params~~
- ~~DynamoDB tables, autoscaling RCU/WCU~~
- ~~SSM params with defaults~~
- ~~Optional WAF association~~

~~Environments: dev, staging, prod; feature flags per env~~

~~CI/CD: GitHub Actions to synth/deploy; unit + integration tests~~

13) Runbooks

13.1 Emergency Shutdown

~~Set /app/bedrock/enabled=false (immediate soft stop).~~

~~If not sufficient, attach Org SCP denying bedrock:* to account/OU.~~

~~Optionally disable Bedrock model access in console.~~

13.2 Raise/Lower User Cap

~~Update GatewayUsers.dailyOutputCap; change takes effect immediately.~~

13.3 Key Rotation

~~Create new API key, map to user, notify; revoke old key after grace period.~~

13.4 Hotspot/Abuse

~~Move user to stricter Usage Plan or suspend user; add WAF rule.~~

14) Testing Strategy

~~Unit tests: payload validation, authorizer cap math, DDB updates~~

~~Contract tests: /v1/chat success/429/403/400 cases~~

~~Load tests: confirm API GW TPS enforcement; backpressure behavior~~

~~Chaos: Bedrock 5xx/latency injections; ensure graceful degradation~~

~~Security: key rotation test; WAF rule efficacy~~

15) Cost Model (rough)

~~API Gateway requests (per million)~~

~~Lambda GB‑seconds (authorizer + proxy)~~

~~DynamoDB RCUs/WCUs proportional to calls (two writes per request worst‑case)~~

~~Bedrock tokens (dominant cost) – tracked via usage metrics~~

~~Optimizations: batch writes (streaming buffer), on‑demand → provisioned with autoscaling, aggregate counters with periodic compaction.~~