Bedrock Access Gateway – Design Spec

Doc status: Draft v1.1
Owner: Miguel Merlin
Date: Aug 11, 2025
Reviewers: Brandon Yen


1) Overview

Provide Bedrock access to internal IAM users via a controlled gateway that works from a variety of interfaces, ranging from developer IDE extensions (Cline) to a front-facing UI. The system exposes a stable HTTP API (OpenAI‑style), enforces per‑user throttling and monthly token budgets, and includes multiple kill‑switches.

Primary goals

  • Simple IDE integration via HTTPS + API key

  • Per‑user TPS throttling and monthly cost limits

  • Hard and soft kill‑switches

  • Centralized audit, cost visibility, and guardrails


2) Requirements

2.1 Functional

  1. Users can call the Lambda Function URL from IDE extensions or the UI.

  2. Requests from IDE extensions are authenticated via STS credentials generated from IAM credentials.
  3. Requests from the UI are authenticated via Cognito JWT tokens.
  4. Enforce rate limits (TPS/burst) per user (or per team plan) at the edge.

  5. Enforce monthly cost limits per user; reject after budget exceeded.

  6. Support streaming responses (SSE) for chat/completions.

  7. Allowlist of Bedrock models; requests for any other model are rejected.

    1. Current list of models includes:
      1. Claude 3 Haiku
      2. Claude 3.5 Sonnet
    2. All other Anthropic models were deemed unnecessary or require access through provisioned throughput.
  8. Provide usage/remaining quota endpoint.

  9. Emit structured logs and metrics for cost and auditing.

2.2 Operational

  1. Soft kill: feature flag to disable traffic quickly.

  2. Hard kill: SCP denying Bedrock; model access toggle in Bedrock.

  3. Per-user soft kill: activate/deactivate access keys for each user.
  4. Config changes (caps, models, plans) without redeploy.

  5. SLO: 99.9% monthly availability for gateway.

  6. P90 end‑to‑end latency target: ≤ 1.5s for 2‑KB prompts, non‑streaming.
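
As a sanity check on the availability target, the allowed downtime can be worked out directly. A minimal sketch, assuming a 30-day month (the function name is illustrative); for 99.9% it comes to roughly 43 minutes per month.

```python
def downtime_budget_minutes(slo: float, days_in_month: int = 30) -> float:
    """Minutes of allowed unavailability per month for a given availability SLO."""
    total_minutes = days_in_month * 24 * 60  # 43,200 minutes in a 30-day month
    return total_minutes * (1.0 - slo)


if __name__ == "__main__":
    print(round(downtime_budget_minutes(0.999), 1))
```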

2.3 Security & Compliance

  • Users cannot directly call Bedrock; only the gateway’s IAM role can.

  • TLS 1.2+, HSTS, no PII in logs by default; opt‑in redaction allowlist.


3) High‑Level Architecture

IDE/CLI ──HTTPS──> Lambda Function URL ──> Lambda Proxy ──> Bedrock
                                                └─> DynamoDB (usage)
                             

Control plane: DynamoDB (config/limits)
Kill‑switches: Org SCP + Bedrock model access (hard)

Key Components

  • API Gateway (HTTP/REST): Routing, API Keys, Usage Plans, throttling.

  • Lambda Authorizer: Validates API key, checks feature flag, reads per‑user caps, pre‑checks remaining daily tokens (estimated).

  • Lambda Proxy: Validates payload, calls Bedrock, streams results, updates usage counters from actual token usage.

  • DynamoDB: Token metering per user/day; user profiles; config.

  • SSM Parameter Store: Feature flags + global toggles.

  • Organizations SCP: Bedrock deny policy for emergency.

  • WAF (optional): Rate‑based block/allow rules.


4) Data Model

4.1 DynamoDB Tables

**Transaction Table:**

  • PK: userId (S)

  • SK: timestamp (S, YYYY-MM-DDTHH:mm:ss)

  • Attributes: cost (N, float), modelId (S), usage (M: inputTokens (N), outputTokens (N))

  • TTL (optional): expireAt (N, unix epoch) for 90‑day retention

**Monthly Usage Table:**

  • PK: userArn (S)

  • SK: month_year (S, MM_YYYY)

  • Attributes: dailyOutputCap (N), maxTokensPerCall (N), allowedModels (SS), status (S: ACTIVE|SUSPENDED)

**Table: GatewayConfig**

  • PK: configId (S)

  • Attributes: cost (N, float), invocations (N), allowedModels (SS), plans (M: rate, burst)
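
The two sort-key formats can be derived from a single request timestamp. A minimal sketch, assuming UTC and the second-resolution timestamp and MM_YYYY month key named above; the helper names are hypothetical.

```python
from datetime import datetime, timezone


def transaction_sort_key(ts: datetime) -> str:
    """Second-resolution timestamp key, e.g. 2025-08-11T14:30:05."""
    return ts.strftime("%Y-%m-%dT%H:%M:%S")


def month_year_key(ts: datetime) -> str:
    """Month bucket key in MM_YYYY form, e.g. 08_2025."""
    return ts.strftime("%m_%Y")


if __name__ == "__main__":
    now = datetime.now(timezone.utc)  # assumption: keys are written in UTC
    print(transaction_sort_key(now), month_year_key(now))
```

Note that MM_YYYY keys do not sort chronologically across years; a YYYY_MM form would, if range queries over months ever matter.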

4.2 SSM Parameters

  • /app/bedrock/enabled = true|false (global soft kill‑switch)


5) API Design

5.1 Authentication

  • Headers:

    • x-aws-session-token: <key> (STS session token)

    • x-aws-access-key: <key> (STS access key)

    • x-aws-secret-key: <key> (STS secret key)

  • Inference proxy finds the user ARN based on the credentials.
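
On the client side, the three headers can be assembled from a set of temporary STS credentials. A minimal sketch: the header names are taken from the list above, while `StsCredentials` and `build_auth_headers` are hypothetical names. In practice the credentials might come from an STS call such as `get_session_token`, which is not shown here.

```python
from typing import Dict, NamedTuple


class StsCredentials(NamedTuple):
    """Temporary credentials as issued by STS (illustrative container)."""
    access_key_id: str
    secret_access_key: str
    session_token: str


def build_auth_headers(creds: StsCredentials) -> Dict[str, str]:
    """Map STS credentials onto the gateway's authentication headers."""
    return {
        "x-aws-access-key": creds.access_key_id,
        "x-aws-secret-key": creds.secret_access_key,
        "x-aws-session-token": creds.session_token,
        "content-type": "application/json",
    }


if __name__ == "__main__":
    demo = StsCredentials("ASIAEXAMPLE", "secret-example", "token-example")
    print(sorted(build_auth_headers(demo)))  # print header names only
```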

5.2 Endpoints (JSON)

POST /v1/chat (Lambda Function URL)

Request:

{
    "modelId": "anthropic.claude-3-haiku-20240307-v1:0",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "text": "<task>"
                },
                {
                    "text": "<system prompt>"
                },
                {
                    "text": "<environment details>"
                }
            ]
        }
    ],
    "system": [
        {
            "text": "<system prompt>"
        }
    ],
    "inferenceConfig": {
        "maxTokens": 4096,
        "temperature": 0
    },
    "additionalModelRequestFields": {},
    "stream": false
}

Response (non‑stream):

{
  "id": "chatcmpl_...",
  "model": "...",
  "created": 1723456789,
  "usage": {"input_tokens": 123, "output_tokens": 278},
  "choices": [
    {"index": 0, "message": {"role": "assistant", "content": "..."}, "finish_reason": "stop"}
  ]
}

Response (stream): text/event-stream with data: {"delta": "..."} frames; final frame includes usage.
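
A client can reassemble the streamed text by concatenating the delta frames and keeping the usage object from the last frame. A minimal sketch, assuming the final frame carries a "usage" key shaped like the non-stream response; `collect_stream` is a hypothetical helper.

```python
import json
from typing import Dict, Iterable, List, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[Dict[str, int]]]:
    """Join `delta` fragments from SSE data lines and capture the usage frame."""
    parts: List[str] = []
    usage: Optional[Dict[str, int]] = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data SSE fields
        frame = json.loads(line[len("data:"):].strip())
        if "delta" in frame:
            parts.append(frame["delta"])
        if "usage" in frame:  # assumption: only the final frame has this key
            usage = frame["usage"]
    return "".join(parts), usage


if __name__ == "__main__":
    sample = ['data: {"delta": "Hi"}', "", 'data: {"usage": {"output_tokens": 1}}']
    print(collect_stream(sample))
```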

POST /v1/complete

Simple prompt → completion variant; same usage object.

GET /v1/usage

Include the userArn in the header to retrieve the user's monthly usage statistics and monthly limit.

GET /v1/usage/today

Response:

{
  "date": "2025-08-12",
  "output_tokens_used": 9217,
  "output_tokens_cap": 50000,
  "remaining": 40783
}
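
The remaining field is simply the cap minus the tokens used. A minimal sketch of building this response; clamping at zero when a user overshoots the cap is an assumption, and `usage_today` is a hypothetical helper.

```python
from typing import Dict, Union


def usage_today(date: str, used: int, cap: int) -> Dict[str, Union[str, int]]:
    """Build the /v1/usage/today body from the day's counters."""
    return {
        "date": date,
        "output_tokens_used": used,
        "output_tokens_cap": cap,
        "remaining": max(cap - used, 0),  # assumption: never report a negative value
    }


if __name__ == "__main__":
    print(usage_today("2025-08-12", 9217, 50000))
```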

Error Format

{ "error": { "code": "DAILY_CAP_EXCEEDED", "message": "Reached daily token cap." } }

HTTP Codes: 200, 400 (validation), 401/403 (auth/quota), 429 (rate limit), 5xx (upstream/internal).


6) Throttling & Quotas

6.1 Edge TPS (API Gateway Usage Plans)

Plans:

  • Standard: rate=2 RPS, burst=10

  • Power: rate=5 RPS, burst=20

Attach one or more API keys per plan. Optionally add an API Gateway per-day request quota as a coarse control.
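
The rate and burst pair behaves like a token bucket: burst is the bucket size and rate is the refill speed. A minimal simulation of the Standard plan to illustrate the semantics; it is a sketch, not API Gateway's implementation, and the explicit `now` argument exists only to keep the example deterministic.

```python
class TokenBucket:
    """Token bucket with `burst` capacity refilled at `rate` tokens per second."""

    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full, so an initial burst is allowed
        self.last = 0.0

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` (seconds) is admitted."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # the gateway would answer 429 here


if __name__ == "__main__":
    bucket = TokenBucket(rate=2, burst=10)
    print([bucket.allow(now=0.0) for _ in range(12)])
```

With these settings, twelve simultaneous requests admit the first ten and reject the rest; one second later two more tokens are available.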

6.2 Token Budgets (Authorizer + Proxy)

  • Pre‑check (Authorizer): estimate estOut = min(request.max_tokens, user.maxTokensPerCall); if used + estOut > dailyCap, deny.

  • Post‑accounting (Proxy): parse Bedrock usage and ADD to BedrockTokenUsage atomically:

    • SET outputTokensTotal = if_not_exists(outputTokensTotal, :zero) + :out

    • SET inputTokensTotal = if_not_exists(inputTokensTotal, :zero) + :in
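
The pre-check and the atomic counter update can be sketched as follows. DynamoDB accepts the SET keyword only once per update expression, so the two increments are combined into a single clause. The table key is passed in by the caller, and the function names are hypothetical; the returned dict is intended for a low-level DynamoDB `update_item` call, which is not made here.

```python
from typing import Any, Dict


def precheck(request_max_tokens: int, user_max_tokens_per_call: int,
             used_today: int, daily_cap: int) -> bool:
    """Return True if the estimated output still fits under the daily cap."""
    est_out = min(request_max_tokens, user_max_tokens_per_call)
    return used_today + est_out <= daily_cap


def usage_update_kwargs(key: Dict[str, Any], input_tokens: int,
                        output_tokens: int) -> Dict[str, Any]:
    """Arguments for an atomic increment of both token counters."""
    return {
        "TableName": "BedrockTokenUsage",
        "Key": key,
        "UpdateExpression": (
            "SET outputTokensTotal = if_not_exists(outputTokensTotal, :zero) + :out, "
            "inputTokensTotal = if_not_exists(inputTokensTotal, :zero) + :in"
        ),
        "ExpressionAttributeValues": {
            ":zero": {"N": "0"},
            ":out": {"N": str(output_tokens)},
            ":in": {"N": str(input_tokens)},
        },
    }


if __name__ == "__main__":
    print(precheck(400, 1024, 49000, 50000))
    print(usage_update_kwargs({"userId": {"S": "u1"}}, 123, 278)["UpdateExpression"])
```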

6.3 Kill‑switches

  • Soft: /app/bedrock/enabled=false ⇒ Authorizer denies all.

  • Hard: Org SCP denying bedrock:*; or revoke model access in Bedrock.
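
Reading the soft flag on every request adds a parameter-store call to the hot path, so a short-lived cache is one option. A minimal sketch: `SoftKillSwitch`, the 30-second TTL, and the fail-closed parsing are all assumptions, and `fetch` stands in for whatever reads /app/bedrock/enabled (for example an SSM `get_parameter` call).

```python
import time
from typing import Callable, Optional


class SoftKillSwitch:
    """Caches the enabled flag for `ttl_seconds` between parameter reads."""

    def __init__(self, fetch: Callable[[], str], ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cached: Optional[bool] = None
        self._fetched_at = 0.0

    def enabled(self) -> bool:
        now = self._clock()
        if self._cached is None or now - self._fetched_at >= self._ttl:
            # Fail closed: anything other than "true" disables traffic.
            self._cached = self._fetch().strip().lower() == "true"
            self._fetched_at = now
        return self._cached


if __name__ == "__main__":
    switch = SoftKillSwitch(fetch=lambda: "true")
    print(switch.enabled())
```

The TTL bounds how long a flipped flag takes to reach a warm Lambda instance; a shorter TTL stops traffic faster at the cost of more parameter reads.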


7) Request Flow (Sequence)

  1. IDE sends request with x-api-key → API Gateway.

  2. API Gateway authenticates API key and enforces plan TPS/burst.

  3. Lambda Authorizer executes:

    • Check SSM flag enabled; check GatewayUsers.status.

    • Load user profile & today's usage; pre‑check against cap.

    • Return Allow with context {userId, planId} or Deny.

  4. Lambda Proxy validates payload, ensures model is in allowlist, clamps max_tokens.

  5. Call Bedrock Converse/Invoke; stream or buffer output back to client.

  6. On completion, parse usage and update BedrockTokenUsage.

  7. Emit metrics and structured logs; return final body.


8) IAM & Network

8.1 Execution Role (Proxy Lambda)

Permissions:

  • bedrock:InvokeModel*, bedrock:Converse* (scoped to allowed models)

  • dynamodb:GetItem/UpdateItem on both tables

  • ssm:GetParameter on /app/bedrock/enabled

  • CloudWatch Logs
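
The permissions above could translate into a policy document along these lines. A sketch only: region, account ID, and table names are placeholders supplied by the caller, CloudWatch Logs statements are omitted, and it assumes the Converse operations are authorized through the InvokeModel actions, which should be verified against the Bedrock IAM reference.

```python
import json
from typing import Any, Dict, List


def proxy_role_policy(region: str, account_id: str, model_ids: List[str],
                      table_names: List[str]) -> Dict[str, Any]:
    """Build an IAM policy document scoped to the allowed models and tables."""
    model_arns = [f"arn:aws:bedrock:{region}::foundation-model/{m}" for m in model_ids]
    table_arns = [f"arn:aws:dynamodb:{region}:{account_id}:table/{t}" for t in table_names]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "InvokeAllowedModels",
                "Effect": "Allow",
                "Action": ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"],
                "Resource": model_arns,
            },
            {
                "Sid": "UsageTables",
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:UpdateItem"],
                "Resource": table_arns,
            },
            {
                "Sid": "KillSwitchFlag",
                "Effect": "Allow",
                "Action": "ssm:GetParameter",
                "Resource": f"arn:aws:ssm:{region}:{account_id}:parameter/app/bedrock/enabled",
            },
        ],
    }


if __name__ == "__main__":
    policy = proxy_role_policy("us-east-1", "123456789012",
                               ["anthropic.claude-3-haiku-20240307-v1:0"],
                               ["BedrockTokenUsage"])
    print(json.dumps(policy, indent=2))
```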

8.2 End‑User Principals

  • Explicit Deny policy for bedrock:* to prevent bypass.

  • Allow invoke of API Gateway only via API Keys (no SigV4 from clients).

8.3 Network

  • Private VPC Endpoint to Bedrock (where available); add IAM condition aws:SourceVpce on the Lambda role.


9) Validation & Limits

  • Payload size: limit prompt/messages total ≤ 64 KB (configurable).

  • **Per‑call** max_tokens: clamp to user or global max (e.g., 1024).

  • Allowed models: configured in GatewayConfig; reject others.

  • Timeouts: Lambda 30s–60s; API GW 29s for non‑stream; for streaming use integration with Lambda function URLs or REST API w/ chunked responses.
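
The size, allowlist, and clamping rules can be combined into one validation step in the proxy. A minimal sketch that takes already-extracted values, so it does not depend on the request field names; `validate_call`, `ValidationError`, and the PAYLOAD_TOO_LARGE code are hypothetical.

```python
from typing import Dict, Set


class ValidationError(Exception):
    """Carries an error code that maps onto the gateway's error format."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code
        self.message = message

    def to_body(self) -> Dict[str, Dict[str, str]]:
        return {"error": {"code": self.code, "message": self.message}}


def validate_call(model_id: str, requested_max_tokens: int, payload_bytes: int,
                  allowed_models: Set[str], max_tokens_cap: int = 1024,
                  max_payload_bytes: int = 64 * 1024) -> int:
    """Reject oversized or disallowed calls; return the clamped max tokens."""
    if payload_bytes > max_payload_bytes:
        raise ValidationError("PAYLOAD_TOO_LARGE", "Prompt exceeds the size limit.")
    if model_id not in allowed_models:
        raise ValidationError("MODEL_NOT_ALLOWED", f"Model {model_id} is not allowed.")
    # Clamp rather than reject, so over-asking clients still get a response.
    return max(1, min(requested_max_tokens, max_tokens_cap))


if __name__ == "__main__":
    allowed = {"anthropic.claude-3-haiku-20240307-v1:0"}
    print(validate_call("anthropic.claude-3-haiku-20240307-v1:0", 4096, 2048, allowed))
```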


10) Observability

10.1 Metrics (CloudWatch, with EMF)

  • Requests: count, 4xx, 5xx, latency p50/p90/p99

  • Bedrock call duration and errors by model

  • Tokens: input/output per user/day; top N users

  • Rejections: DAILY_CAP_EXCEEDED, MODEL_NOT_ALLOWED, DISABLED_FLAG
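
With EMF, each metric is emitted as a structured log line that CloudWatch converts into metrics. A minimal sketch of one record per request; the BedrockGateway namespace and the metric names are assumptions. userId is kept as a plain property rather than a dimension to limit metric cardinality, and could be promoted to a dimension if per-user metrics are wanted.

```python
import json
import time
from typing import Optional


def emf_record(user_id: str, model_id: str, input_tokens: int, output_tokens: int,
               latency_ms: float, timestamp_ms: Optional[int] = None) -> str:
    """Serialize one request's metrics in CloudWatch Embedded Metric Format."""
    ts = int(time.time() * 1000) if timestamp_ms is None else timestamp_ms
    record = {
        "_aws": {
            "Timestamp": ts,  # epoch milliseconds
            "CloudWatchMetrics": [{
                "Namespace": "BedrockGateway",  # assumed namespace
                "Dimensions": [["ModelId"]],
                "Metrics": [
                    {"Name": "InputTokens", "Unit": "Count"},
                    {"Name": "OutputTokens", "Unit": "Count"},
                    {"Name": "LatencyMs", "Unit": "Milliseconds"},
                ],
            }],
        },
        "ModelId": model_id,
        "userId": user_id,  # searchable property, not a metric dimension
        "InputTokens": input_tokens,
        "OutputTokens": output_tokens,
        "LatencyMs": latency_ms,
    }
    return json.dumps(record)


if __name__ == "__main__":
    # In Lambda, printing the line to stdout is enough for it to reach CloudWatch Logs.
    print(emf_record("user-1", "anthropic.claude-3-haiku-20240307-v1:0", 123, 278, 840.0))
```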

10.2 Logs

  • Correlation id (request id) end‑to‑end

  • Redact message content by default (toggle for debugging)

10.3 Alerts (SNS/Slack)

  • 5xx rate > 2% over 5m

  • Kill‑switch flag /app/bedrock/enabled set to false (notify)

  • User at 80% and 100% of daily cap (optional DM)

  • Cost anomaly (tokens/day spike)


11) Security Considerations

  • No API key leakage in logs; rotate keys quarterly.

  • Optional HMAC sidecar token per request (defense in depth).

  • WAF geo/IP allowlist if necessary.

  • Content guardrails (Bedrock Guardrails) for policy compliance.


12) Deployment & IaC

  • CDK (TypeScript/Java) project containing:

    • API Gateway (routes, models), Usage Plans, API Keys (seed via script)

    • Lambdas (Authorizer + Proxy) with env vars for table names/params

    • DynamoDB tables, autoscaling RCU/WCU

    • SSM params with defaults

    • Optional WAF association

  • Environments: dev, staging, prod; feature flags per env

  • CI/CD: GitHub Actions to synth/deploy; unit + integration tests


13) Runbooks

13.1 Emergency Shutdown

  1. Set /app/bedrock/enabled=false (Immediate soft stop).

  2. If not sufficient, attach Org SCP denying bedrock:* to account/OU.

  3. Optionally disable Bedrock model access in console.

13.2 Raise/Lower User Cap

  • Update GatewayUsers.dailyOutputCap; change takes effect immediately.

13.3 Key Rotation

  • Create new API key, map to user, notify; revoke old key after grace period.

13.4 Hotspot/Abuse

  • Move user to stricter Usage Plan or suspend user; add WAF rule.


14) Testing Strategy

  • Unit tests: payload validation, authorizer cap math, DDB updates

  • Contract tests: /v1/chat success/429/403/400 cases

  • Load tests: confirm API GW TPS enforcement; backpressure behavior

  • Chaos: Bedrock 5xx/latency injections; ensure graceful degradation

  • Security: key rotation test; WAF rule efficacy


15) Cost Model (rough)

  • API Gateway requests (per million)

  • Lambda GB‑seconds (authorizer + proxy)

  • DynamoDB RCUs/WCUs proportional to calls (two writes per request worst‑case)

  • Bedrock tokens (dominant cost) – tracked via usage metrics

Optimizations: batch writes (streaming buffer), on‑demand → provisioned with autoscaling, aggregate counters with periodic compaction.