| title | API Proxy Sidecar |
|---|---|
| description | Secure LLM API credential management using an isolated proxy sidecar container. |
The AWF firewall supports an optional Node.js-based API proxy sidecar that securely holds LLM API credentials and automatically injects authentication headers while routing all traffic through Squid to respect domain whitelisting.
:::note For a deep dive into how AWF handles authentication tokens and credential isolation, see the Authentication Architecture guide. :::
When enabled, the API proxy sidecar:
- Credential isolation: API keys are never exposed to the agent container
- Automatic authentication: Bearer tokens and API keys are injected into outbound requests
- Multi-provider support: OpenAI (Codex), Anthropic (Claude), GitHub Copilot, and Google Gemini APIs
- Transparent proxying: Agent code uses the standard SDK environment variables unchanged
- Squid routing: All traffic routes through Squid to respect domain whitelisting
```
┌─────────────────────────────────────────────────┐
│ AWF Network (172.30.0.0/24)                     │
│                                                 │
│ ┌──────────────┐        ┌─────────────────┐     │
│ │    Squid     │◄───────│  Node.js Proxy  │     │
│ │ 172.30.0.10  │        │  172.30.0.30    │     │
│ └──────┬───────┘        └─────────────────┘     │
│        │                        ▲               │
│        │   ┌──────────────────────────────┐     │
│        │   │       Agent Container        │     │
│        │   │       172.30.0.20            │     │
│        │   │  OPENAI_BASE_URL=            │     │
│        │   │    http://172.30.0.30:10000  │─────┘
│        │   │  ANTHROPIC_BASE_URL=         │
│        │   │    http://172.30.0.30:10001  │
│        │   └──────────────────────────────┘
│        │
└────────┼────────────────────────────────────────┘
         │ (Domain whitelist enforced)
         ↓
  api.openai.com or api.anthropic.com
```
Traffic flow:
1. The agent makes a request to `172.30.0.30:10000` (OpenAI) or `172.30.0.30:10001` (Anthropic)
2. The API proxy strips any client-supplied auth headers and injects the real credentials
3. The API proxy routes the request through Squid via `HTTP_PROXY`/`HTTPS_PROXY`
4. Squid enforces the domain whitelist (only allowed domains pass)
5. The request reaches `api.openai.com` or `api.anthropic.com`
```bash
# Set API keys in environment
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."

# Enable API proxy sidecar
sudo awf --enable-api-proxy \
  --allow-domains api.openai.com,api.anthropic.com \
  -- your-command
```

OpenAI (Codex) example:

```bash
export OPENAI_API_KEY="sk-..."
sudo awf --enable-api-proxy \
  --allow-domains api.openai.com \
  -- npx @openai/codex -p "write a hello world function"
```

The agent container automatically uses `http://172.30.0.30:10000` as the OpenAI base URL.

Anthropic (Claude) example:

```bash
export ANTHROPIC_API_KEY="sk-ant-..."
sudo awf --enable-api-proxy \
  --allow-domains api.anthropic.com \
  -- claude-code "write a hello world function"
```

The agent container automatically uses `http://172.30.0.30:10001` as the Anthropic base URL.

Both providers at once:

```bash
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
sudo awf --enable-api-proxy \
  --allow-domains api.openai.com,api.anthropic.com \
  -- your-multi-llm-tool
```

AWF manages environment variables differently across the three containers (squid, api-proxy, agent) to ensure secure credential isolation.
The Squid proxy container runs with minimal environment variables:
| Variable | Value | Description |
|---|---|---|
| `HTTP_PROXY` | Not set | Squid is the proxy, not a client |
| `HTTPS_PROXY` | Not set | Squid is the proxy, not a client |
The API proxy sidecar receives real credentials and routing configuration:
| Variable | Value | When set | Description |
|---|---|---|---|
| `OPENAI_API_KEY` | Real API key | `--enable-api-proxy` and env set | OpenAI API key (injected into requests) |
| `ANTHROPIC_API_KEY` | Real API key | `--enable-api-proxy` and env set | Anthropic API key (injected into requests) |
| `COPILOT_GITHUB_TOKEN` | Real token | `--enable-api-proxy` and env set | GitHub Copilot token (injected into requests) |
| `COPILOT_API_KEY` | Real API key | `--enable-api-proxy` and env set | GitHub Copilot BYOK key (injected into requests) |
| `GEMINI_API_KEY` | Real API key | `--enable-api-proxy` and env set | Google Gemini API key (injected into requests) |
| `HTTP_PROXY` | `http://172.30.0.10:3128` | Always | Routes through Squid for domain filtering |
| `HTTPS_PROXY` | `http://172.30.0.10:3128` | Always | Routes through Squid for domain filtering |
:::danger[Real credentials in api-proxy] The api-proxy container holds real, unredacted credentials. These are used to authenticate requests to LLM providers. This container is isolated from the agent and has all capabilities dropped for security. :::
The agent container receives redacted placeholders and proxy URLs:
| Variable | Value | When set | Description |
|---|---|---|---|
| `OPENAI_BASE_URL` | `http://172.30.0.30:10000` | `OPENAI_API_KEY` provided to host | Redirects OpenAI SDK to proxy |
| `ANTHROPIC_BASE_URL` | `http://172.30.0.30:10001` | `ANTHROPIC_API_KEY` provided to host | Redirects Anthropic SDK to proxy |
| `ANTHROPIC_AUTH_TOKEN` | `placeholder-token-for-credential-isolation` | `ANTHROPIC_API_KEY` provided to host | Placeholder token (real auth via BASE_URL) |
| `CLAUDE_CODE_API_KEY_HELPER` | `/usr/local/bin/get-claude-key.sh` | `ANTHROPIC_API_KEY` provided to host | Helper script for Claude Code CLI |
| `COPILOT_API_URL` | `http://172.30.0.30:10002` | `COPILOT_GITHUB_TOKEN` or `COPILOT_API_KEY` provided to host | Redirects Copilot CLI to proxy |
| `COPILOT_TOKEN` | `placeholder-token-for-credential-isolation` | `COPILOT_GITHUB_TOKEN` or `COPILOT_API_KEY` provided to host | Placeholder token (real auth via API_URL) |
| `COPILOT_GITHUB_TOKEN` | `placeholder-token-for-credential-isolation` | `COPILOT_GITHUB_TOKEN` provided to host | Placeholder token protected by one-shot-token |
| `COPILOT_API_KEY` | `placeholder-token-for-credential-isolation` | `COPILOT_API_KEY` provided to host | BYOK placeholder token protected by one-shot-token |
| `COPILOT_OFFLINE` | `true` | `COPILOT_API_KEY` provided to host | Enables offline+BYOK mode (skips GitHub OAuth handshake) |
| `COPILOT_PROVIDER_BASE_URL` | `http://172.30.0.30:10002` | `COPILOT_API_KEY` provided to host | Points Copilot CLI BYOK provider at sidecar |
| `COPILOT_PROVIDER_API_KEY` | `placeholder-token-for-credential-isolation` | `COPILOT_API_KEY` provided to host | BYOK provider API key placeholder (real key in sidecar) |
| `GOOGLE_GEMINI_BASE_URL` | `http://172.30.0.30:10003` | `GEMINI_API_KEY` provided to host | Redirects Gemini CLI to proxy (primary var read by Gemini CLI) |
| `GEMINI_API_BASE_URL` | `http://172.30.0.30:10003` | `GEMINI_API_KEY` provided to host | Redirects Gemini SDK to proxy (kept for backward compatibility) |
| `GEMINI_API_KEY` | `gemini-api-key-placeholder-for-credential-isolation` | `GEMINI_API_KEY` provided to host | Placeholder so Gemini CLI auth check passes (real key in sidecar) |
| `OPENAI_API_KEY` | Not set | `--enable-api-proxy` | Excluded from agent (held in api-proxy) |
| `ANTHROPIC_API_KEY` | Not set | `--enable-api-proxy` | Excluded from agent (held in api-proxy) |
| `HTTP_PROXY` | `http://172.30.0.10:3128` | Always | Routes through Squid proxy |
| `HTTPS_PROXY` | `http://172.30.0.10:3128` | Always | Routes through Squid proxy |
| `NO_PROXY` | `localhost,127.0.0.1,172.30.0.30` | `--enable-api-proxy` | Bypass proxy for localhost and api-proxy |
| `AWF_API_PROXY_IP` | `172.30.0.30` | `--enable-api-proxy` | Used by iptables setup script |
| `AWF_ONE_SHOT_TOKENS` | `COPILOT_GITHUB_TOKEN,GITHUB_TOKEN,...` | Always | Tokens protected by one-shot-token library |
:::note[Gemini setup is conditional]
GOOGLE_GEMINI_BASE_URL, GEMINI_API_BASE_URL, the GEMINI_API_KEY placeholder, the ~/.gemini home directory mount, and the AWF_GEMINI_ENABLED signal are only configured when GEMINI_API_KEY is provided to the host AWF process. This avoids spurious log entries and unnecessary directory setup in non-Gemini runs (e.g. Copilot-only workflows).
GOOGLE_GEMINI_BASE_URL is the primary variable read by the Gemini CLI (google-gemini/gemini-cli). GEMINI_API_BASE_URL is kept for backward compatibility with older SDK versions.
Important: GEMINI_API_KEY must be set as a runner-level environment variable (e.g. env: GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }} in the workflow step), not only as a GitHub Actions secret. The AWF process running on the runner must be able to read it so it can pass the key to the api-proxy sidecar container.
:::
:::tip[Placeholder tokens]
Token variables in the agent are set to `placeholder-token-for-credential-isolation` instead of real values. This ensures:
- Agent code cannot exfiltrate credentials
- CLI tools that check for token presence still work
- Real authentication happens via the `*_BASE_URL` or `*_API_URL` environment variables
- The one-shot-token library protects placeholder values from being read more than once
:::
These environment variables are recognized by:
- OpenAI Python SDK (`openai`)
- OpenAI Node.js SDK (`openai`)
- Anthropic Python SDK (`anthropic`)
- Anthropic TypeScript SDK (`@anthropic-ai/sdk`)
- GitHub Copilot CLI (`@github/copilot`)
- Codex CLI
- Claude Code CLI
:::tip
You don't need to change any agent code. The SDKs automatically read *_BASE_URL environment variables and redirect API calls through the proxy.
:::
API keys are held in the sidecar container, not the agent:
- Agent code cannot read API keys from environment variables
- A compromised agent cannot exfiltrate credentials
- Keys are not exposed in the agent container's stdout/stderr logs
:::danger[Protect host credentials] API keys are stored in the sidecar container's environment and in the Docker Compose configuration on disk. Protect the host filesystem and configuration accordingly. Only non-sensitive key prefixes are logged for debugging. :::
The proxy enforces domain-level egress control:
- The agent can only reach the API proxy IP (`172.30.0.30`) for API calls
- The sidecar routes all traffic through the Squid proxy
- Squid enforces the domain whitelist (L7 filtering)
- iptables rules prevent the agent from bypassing the proxy
The sidecar has strict resource constraints:
- 512 MB memory limit
- 100 process limit
- All capabilities dropped
- `no-new-privileges` security option
When you pass `--enable-api-proxy`:
1. AWF starts a Node.js API proxy at `172.30.0.30`
2. API keys are passed to the sidecar via environment variables
3. `HTTP_PROXY`/`HTTPS_PROXY` in the sidecar are configured to route through Squid
4. The agent container waits for the sidecar health check to pass
```
Agent Code
  ↓ (HTTP request to 172.30.0.30:10000)
Node.js API Proxy
  ↓ (strips client auth headers)
  ↓ (injects Authorization: Bearer $OPENAI_API_KEY)
  ↓ (routes via HTTPS_PROXY to Squid)
Squid Proxy
  ↓ (enforces domain whitelist)
  ↓ (TLS connection to api.openai.com)
OpenAI API
```
The Node.js proxy automatically:
- Strips any client-supplied `Authorization`, `x-api-key`, `Proxy-Authorization`, and `X-Forwarded-*` headers
- Injects the correct authentication headers:
  - OpenAI: `Authorization: Bearer $OPENAI_API_KEY`
  - Anthropic: `x-api-key: $ANTHROPIC_API_KEY` and `anthropic-version: 2023-06-01` (if not already set by the client)
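The strip-and-inject step can be illustrated with a short Python sketch. This is a hypothetical mirror of the behavior described above, not the sidecar's actual code (the real proxy is implemented in Node.js):

```python
# Hypothetical sketch of the sidecar's header rewriting (the real proxy is Node.js).
STRIPPED = {"authorization", "x-api-key", "proxy-authorization"}

def rewrite_headers(client_headers, provider, keys):
    """Drop client-supplied auth and X-Forwarded-* headers, then inject real credentials."""
    out = {
        k: v for k, v in client_headers.items()
        if k.lower() not in STRIPPED and not k.lower().startswith("x-forwarded-")
    }
    if provider == "openai":
        out["Authorization"] = f"Bearer {keys['OPENAI_API_KEY']}"
    elif provider == "anthropic":
        out["x-api-key"] = keys["ANTHROPIC_API_KEY"]
        # Only injected if the client did not already set a version header
        out.setdefault("anthropic-version", "2023-06-01")
    return out

# The agent's placeholder token and forwarding headers are removed;
# the real key (held only in the sidecar) is injected.
headers = rewrite_headers(
    {"Authorization": "Bearer placeholder-token-for-credential-isolation",
     "X-Forwarded-For": "172.30.0.20",
     "content-type": "application/json"},
    "openai",
    {"OPENAI_API_KEY": "sk-real"},
)
```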
:::caution The proxy enforces a 10 MB request body size limit to prevent denial-of-service via large payloads. :::
Before running the user command, the agent container runs a health check script (api-proxy-health-check.sh) that verifies:
- API keys are not present in the agent environment (credential isolation working)
- The API proxy is reachable and responding (connectivity established)
If either check fails, the agent exits immediately without running the user command.
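The gate logic amounts to two boolean conditions. A minimal Python sketch (hypothetical function; the actual check is the `api-proxy-health-check.sh` shell script):

```python
# Hypothetical mirror of api-proxy-health-check.sh: both conditions must hold.
def health_gate(agent_env, proxy_responding):
    """Pass only if no real API keys leaked into the agent env and the proxy is up."""
    leaked = [k for k in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY") if k in agent_env]
    return not leaked and proxy_responding

# Normal run: only proxy-routing variables are present in the agent container
ok = health_gate({"OPENAI_BASE_URL": "http://172.30.0.30:10000"}, True)
# Broken isolation: a real key appears in the agent environment
bad = health_gate({"OPENAI_API_KEY": "sk-real"}, True)
```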
```bash
sudo awf --enable-api-proxy [OPTIONS] -- COMMAND
```

Required environment variables (at least one):
- `OPENAI_API_KEY` — OpenAI API key
- `ANTHROPIC_API_KEY` — Anthropic API key
- `GEMINI_API_KEY` — Google Gemini API key
- `COPILOT_GITHUB_TOKEN` — GitHub Copilot access token
- `COPILOT_API_KEY` — GitHub Copilot API key (BYOK)
:::caution[GitHub Actions: expose keys as runner env vars]
When running AWF in a GitHub Actions workflow, API keys must be available as runner-level environment variables — not just as GitHub Actions secrets. AWF reads the key from the environment at startup to pass it to the api-proxy sidecar container. Use `env:` in the workflow step and `sudo --preserve-env` to ensure keys pass through:

```yaml
- name: Run agent
  env:
    GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
  run: sudo --preserve-env=GEMINI_API_KEY awf --enable-api-proxy ...
```

Note: `sudo` strips most environment variables by default. Use `--preserve-env=VAR` (or `sudo -E` to preserve all) to ensure API keys are visible to the AWF process.

If the key is present only in `secrets.*` but not exported into the step's `env:`, AWF will warn that no Gemini key was found and the api-proxy Gemini listener will return 503.
:::
Recommended domain whitelist:
- `api.openai.com` — for OpenAI/Codex
- `api.anthropic.com` — for Anthropic/Claude
Optional flags for custom upstream endpoints:

| Flag | Default | Description |
|---|---|---|
| `--openai-api-target <host>` | `api.openai.com` | Custom upstream for OpenAI API requests (e.g. Azure OpenAI or an internal LLM router). Can also be set via `OPENAI_API_TARGET` env var. |
| `--anthropic-api-target <host>` | `api.anthropic.com` | Custom upstream for Anthropic API requests (e.g. an internal Claude router). Can also be set via `ANTHROPIC_API_TARGET` env var. |
| `--copilot-api-target <host>` | auto-derived | Custom upstream for GitHub Copilot API requests (useful for GHES). Can also be set via `COPILOT_API_TARGET` env var. |

Important: When using a custom `--openai-api-target` or `--anthropic-api-target`, you must add the target domain to `--allow-domains` so the firewall permits outbound traffic. AWF will emit a warning if a custom target is set but not in the allowlist.
The sidecar container:
- Image: `ghcr.io/github/gh-aw-firewall/api-proxy:latest`
- Base: `node:22-alpine`
- Network: `awf-net` at `172.30.0.30`
- Ports: 10000 (OpenAI), 10001 (Anthropic), 10002 (GitHub Copilot), 10003 (Google Gemini), 10004 (OpenCode, requires `--enable-opencode`)
- Proxy: Routes via Squid at `http://172.30.0.10:3128`
Docker healthcheck on the /health endpoint (port 10000):
- Interval: 1s
- Timeout: 1s
- Retries: 5
- Start period: 2s
The /health endpoint returns a JSON object that includes a models_fetch_complete field, indicating whether the startup model-discovery pass has finished:
```json
{
  "status": "healthy",
  "service": "awf-api-proxy",
  "squid_proxy": "http://172.30.0.10:3128",
  "providers": { "openai": true, "anthropic": false, "gemini": false, "copilot": false },
  "key_validation": { "complete": true, "results": { "openai": "valid" } },
  "models_fetch_complete": true,
  "metrics_summary": { "total_requests": 0, "success_rate": 100, "avg_latency_ms": 0 },
  "rate_limits": {}
}
```

Use `models_fetch_complete` as a readiness gate before submitting the first inference request, ensuring model lists are warm. See the Readiness polling recipe below.
Poll /health (or /reflect) until models_fetch_complete: true before launching the agent command, so model lists are fully cached:
```bash
# Wait up to 30 seconds for model discovery to complete
for i in $(seq 1 30); do
  result=$(curl -sf http://172.30.0.30:10000/health 2>/dev/null)
  if [ "$(echo "$result" | jq -r '.models_fetch_complete')" = "true" ]; then
    echo "Model discovery complete"
    break
  fi
  echo "Waiting for model discovery... ($i/30)"
  sleep 1
done
```

Or use `/reflect` directly if you also need the model lists:

```bash
curl -sf http://172.30.0.30:10000/reflect | jq '.models_fetch_complete, .endpoints[].models'
```

The management port (10000) also exposes a `GET /reflect` endpoint for dynamic provider and model discovery. This allows agent harnesses to query which providers are configured and which models are available at runtime.

```bash
curl http://172.30.0.30:10000/reflect
```

Example response:
```json
{
  "endpoints": [
    {
      "provider": "openai",
      "port": 10000,
      "base_url": "http://api-proxy:10000",
      "configured": true,
      "models": ["gpt-4o", "gpt-4o-mini"],
      "models_url": "http://api-proxy:10000/v1/models"
    },
    {
      "provider": "anthropic",
      "port": 10001,
      "base_url": "http://api-proxy:10001",
      "configured": false,
      "models": null,
      "models_url": "http://api-proxy:10001/v1/models"
    },
    {
      "provider": "copilot",
      "port": 10002,
      "base_url": "http://api-proxy:10002",
      "configured": true,
      "models": ["gpt-4o", "claude-3.5-sonnet"],
      "models_url": "http://api-proxy:10002/models"
    },
    {
      "provider": "gemini",
      "port": 10003,
      "base_url": "http://api-proxy:10003",
      "configured": false,
      "models": null,
      "models_url": "http://api-proxy:10003/v1beta/models"
    },
    {
      "provider": "opencode",
      "port": 10004,
      "base_url": "http://api-proxy:10004",
      "configured": true,
      "models": null,
      "models_url": null
    }
  ],
  "models_fetch_complete": true
}
```

Fields:
- `configured` — `true` if an API key for this provider was found at startup
- `models` — list of model IDs fetched from the provider at startup; `null` if the provider is not configured or model fetch failed
- `models_fetch_complete` — `true` once the startup model-fetch pass has finished
- `models_url` — URL to query for the live model list; `null` for OpenCode (which routes to other providers)
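For example, a harness could pick usable providers from the `/reflect` payload like this (a sketch; the field names match the example response, but the helper name is ours):

```python
def configured_models(reflect):
    """Map each configured provider to its discovered model list (may be None)."""
    return {
        endpoint["provider"]: endpoint["models"]
        for endpoint in reflect["endpoints"]
        if endpoint["configured"]
    }

# Trimmed-down /reflect payload, as returned by the management port
reflect = {
    "endpoints": [
        {"provider": "openai", "configured": True, "models": ["gpt-4o", "gpt-4o-mini"]},
        {"provider": "anthropic", "configured": False, "models": None},
        {"provider": "opencode", "configured": True, "models": None},
    ],
    "models_fetch_complete": True,
}
available = configured_models(reflect)
```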
When --enable-api-proxy is active and GEMINI_API_KEY is provided to the AWF runner, GOOGLE_GEMINI_BASE_URL, GEMINI_API_BASE_URL, and a placeholder GEMINI_API_KEY are injected into the agent container. If the real GEMINI_API_KEY was not set in the AWF runner environment, the Gemini routing vars are never set and the api-proxy Gemini listener (port 10003) responds with 503 to any requests that do reach it.
Solution: Export GEMINI_API_KEY in the runner environment before invoking AWF. In GitHub Actions, add it to the step's env: block and use sudo --preserve-env:
```yaml
- name: Run Gemini agent
  env:
    GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
  run: |
    sudo --preserve-env=GEMINI_API_KEY \
      awf --enable-api-proxy \
      --allow-domains generativelanguage.googleapis.com \
      -- gemini ...
```

Note: Exit code 41 ("no auth method") should no longer occur with `--enable-api-proxy` since the placeholder key satisfies the CLI's pre-flight check. If you see exit 41, ensure `--enable-api-proxy` is active.
If you see this warning:

```
⚠️ API proxy enabled but no API keys found in environment
Set OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, COPILOT_GITHUB_TOKEN, or COPILOT_API_KEY to use the proxy
```

Solution: Export API keys before running awf (use `sudo --preserve-env` in CI):

```bash
export OPENAI_API_KEY="sk-..."
# or
export ANTHROPIC_API_KEY="sk-ant-..."
```

Check if the API proxy container started:

```bash
docker ps | grep awf-api-proxy
```

View API proxy logs:

```bash
docker logs awf-api-proxy
```

Ensure the API domains are whitelisted:

```bash
sudo awf --enable-api-proxy \
  --allow-domains api.openai.com,api.anthropic.com \
  -- your-command
```

Check Squid logs for denied requests:

```bash
docker exec awf-squid cat /var/log/squid/access.log | grep DENIED
```

AWF supports OIDC-based credential exchange with multiple cloud providers via GitHub Actions workload identity federation. Set `AWF_AUTH_TYPE=github-oidc` and `AWF_AUTH_PROVIDER` to select the provider.
| Variable | Required | Description |
|---|---|---|
| `AWF_AUTH_TYPE` | ✅ | Set to `github-oidc` to enable OIDC authentication |
| `AWF_AUTH_PROVIDER` | No | Cloud provider: `azure` (default), `aws`, or `gcp` |
| `AWF_AUTH_OIDC_AUDIENCE` | No | Override the OIDC audience (provider-specific defaults apply) |
| `ACTIONS_ID_TOKEN_REQUEST_URL` | ✅ | Provided automatically by the GitHub Actions runtime |
| `ACTIONS_ID_TOKEN_REQUEST_TOKEN` | ✅ | Provided automatically by the GitHub Actions runtime |
Exchanges the GitHub OIDC JWT for an Azure AD access token via workload identity federation, then injects it as a Bearer token on upstream requests.
| Variable | Required | Default | Description |
|---|---|---|---|
| `AWF_AUTH_AZURE_TENANT_ID` | ✅ | — | Azure AD tenant ID |
| `AWF_AUTH_AZURE_CLIENT_ID` | ✅ | — | Azure AD application (client) ID for the federated credential |
| `AWF_AUTH_AZURE_SCOPE` | No | `https://cognitiveservices.azure.com/.default` | Azure token scope |
| `AWF_AUTH_AZURE_CLOUD` | No | `public` | Azure cloud environment (`public`, `usgovernment`, or `china`) |

Default OIDC audience: `api://AzureADTokenExchange`
```yaml
jobs:
  agent:
    permissions:
      id-token: write # required for OIDC token request
      contents: read
    steps:
      - name: Run agent with Azure OpenAI
        env:
          AWF_AUTH_TYPE: github-oidc
          AWF_AUTH_AZURE_TENANT_ID: ${{ vars.AZURE_TENANT_ID }}
          AWF_AUTH_AZURE_CLIENT_ID: ${{ vars.AZURE_CLIENT_ID }}
          OPENAI_API_TARGET: my-deployment.openai.azure.com
        run: |
          sudo --preserve-env=AWF_AUTH_TYPE,AWF_AUTH_AZURE_TENANT_ID,AWF_AUTH_AZURE_CLIENT_ID,OPENAI_API_TARGET \
            awf --enable-api-proxy \
            --openai-api-target my-deployment.openai.azure.com \
            --allow-domains my-deployment.openai.azure.com \
            -- your-agent-command
```

:::caution
Azure OpenAI deployments use a different base URL format from OpenAI. Set `--openai-api-target` to your Azure endpoint hostname and add it to `--allow-domains`.
:::
Exchanges the GitHub OIDC JWT for temporary AWS credentials via STS AssumeRoleWithWebIdentity. The sidecar uses these credentials to sign requests to AWS Bedrock using SigV4.
| Variable | Required | Default | Description |
|---|---|---|---|
| `AWF_AUTH_AWS_ROLE_ARN` | ✅ | — | IAM role ARN to assume via OIDC federation |
| `AWF_AUTH_AWS_REGION` | ✅ | — | AWS region for the Bedrock endpoint |
| `AWF_AUTH_AWS_ROLE_SESSION_NAME` | No | `awf-oidc-session` | Session name for the STS call |

Default OIDC audience: `sts.amazonaws.com`
```yaml
jobs:
  agent:
    permissions:
      id-token: write
      contents: read
    steps:
      - name: Run agent with AWS Bedrock
        env:
          AWF_AUTH_TYPE: github-oidc
          AWF_AUTH_PROVIDER: aws
          AWF_AUTH_AWS_ROLE_ARN: ${{ vars.AWS_ROLE_ARN }}
          AWF_AUTH_AWS_REGION: us-east-1
        run: |
          sudo --preserve-env=AWF_AUTH_TYPE,AWF_AUTH_PROVIDER,AWF_AUTH_AWS_ROLE_ARN,AWF_AUTH_AWS_REGION \
            awf --enable-api-proxy \
            --allow-domains bedrock-runtime.us-east-1.amazonaws.com,sts.us-east-1.amazonaws.com \
            -- your-agent-command
```

:::note
AWS Bedrock uses IAM/SigV4 request signing rather than Bearer tokens. The sidecar signs the complete request (method, path, headers, body hash) with the temporary credentials.
:::
Exchanges the GitHub OIDC JWT for a GCP access token via the Security Token Service, optionally followed by service account impersonation. The resulting token is injected as a Bearer token.
| Variable | Required | Default | Description |
|---|---|---|---|
| `AWF_AUTH_GCP_WORKLOAD_IDENTITY_PROVIDER` | ✅ | — | Full resource name of the Workload Identity Provider |
| `AWF_AUTH_GCP_SERVICE_ACCOUNT` | No | — | Service account email to impersonate (omit for direct access) |
| `AWF_AUTH_GCP_SCOPE` | No | `https://www.googleapis.com/auth/cloud-platform` | OAuth2 scope |

Default OIDC audience: the `gcpWorkloadIdentityProvider` value
```yaml
jobs:
  agent:
    permissions:
      id-token: write
      contents: read
    steps:
      - name: Run agent with GCP Vertex AI
        env:
          AWF_AUTH_TYPE: github-oidc
          AWF_AUTH_PROVIDER: gcp
          AWF_AUTH_GCP_WORKLOAD_IDENTITY_PROVIDER: projects/123456/locations/global/workloadIdentityPools/my-pool/providers/github
          AWF_AUTH_GCP_SERVICE_ACCOUNT: my-sa@my-project.iam.gserviceaccount.com
        run: |
          sudo --preserve-env=AWF_AUTH_TYPE,AWF_AUTH_PROVIDER,AWF_AUTH_GCP_WORKLOAD_IDENTITY_PROVIDER,AWF_AUTH_GCP_SERVICE_ACCOUNT \
            awf --enable-api-proxy \
            --allow-domains sts.googleapis.com,iamcredentials.googleapis.com,us-central1-aiplatform.googleapis.com \
            -- your-agent-command
```

:::note
`ACTIONS_ID_TOKEN_REQUEST_URL` and `ACTIONS_ID_TOKEN_REQUEST_TOKEN` are injected by the Actions runner automatically; AWF forwards them to the sidecar when `AWF_AUTH_TYPE=github-oidc`.
:::

:::tip
When `gcpServiceAccount` is omitted, the federated token is used directly without service account impersonation. This requires that the federated principal has direct access grants on the target resource.
:::
The API proxy can enforce a cumulative effective token budget per run. When enabled, the proxy tracks weighted token usage across all LLM requests and rejects new requests once the budget is exhausted.
Set in the AWF config file (not available as a CLI flag):

```json
{
  "apiProxy": {
    "maxEffectiveTokens": 500000,
    "modelMultipliers": {
      "o3-pro": 15,
      "o3": 4,
      "claude-sonnet-4-20250514": 1,
      "gpt-4.1-mini": 0.5
    }
  }
}
```

Raw token counts from upstream responses are not treated equally. Each category has a fixed weight that reflects its relative cost:
| Category | Weight | Example fields |
|---|---|---|
| Input | ×1.0 | prompt_tokens, input_tokens |
| Cache read | ×0.1 | cache_read_input_tokens |
| Output | ×4.0 | completion_tokens, output_tokens |
| Reasoning | ×4.0 | reasoning_tokens |
The formula for a single response is:

```
effective_tokens = model_multiplier × (1.0×input + 0.1×cache_read + 4.0×output + 4.0×reasoning)
```

If no model multiplier is configured, it defaults to 1.
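The weighting can be reproduced with a short Python sketch, using the category weights from the table above and the `o3` multiplier from the example config:

```python
# Category weights as documented: input ×1.0, cache read ×0.1, output ×4.0, reasoning ×4.0
WEIGHTS = {"input": 1.0, "cache_read": 0.1, "output": 4.0, "reasoning": 4.0}

def effective_tokens(usage, model_multiplier=1.0):
    """Weighted token cost for a single response; missing categories count as 0."""
    return model_multiplier * sum(w * usage.get(cat, 0) for cat, w in WEIGHTS.items())

# An o3 response (multiplier 4 in the example config):
cost = effective_tokens(
    {"input": 1000, "cache_read": 200, "output": 500, "reasoning": 300},
    model_multiplier=4,
)
# 4 × (1000 + 20 + 2000 + 1200) = 16880
```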
After each successful upstream response, the proxy accumulates the effective tokens. Before forwarding the next request, the proxy checks the running total:
- Under budget: Request is forwarded normally.
- Budget reached or exceeded: Request is rejected immediately with:
  - HTTP `429 Too Many Requests`
  - Error body:

    ```json
    {
      "error": {
        "type": "effective_tokens_limit_exceeded",
        "message": "Maximum effective tokens exceeded (512345.67 / 500000).",
        "total_effective_tokens": 512345.67,
        "max_effective_tokens": 500000
      }
    }
    ```
- WebSocket upgrade requests are also rejected with 429 when the budget is reached or exceeded.
:::caution Once the budget is reached or exceeded, all subsequent requests in the run are rejected. The budget is not recoverable — there is no way to "free up" tokens within a single run. :::
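The enforcement described above amounts to a non-refundable accumulator: check before forwarding, accumulate after each response. A minimal sketch (not the sidecar's actual code; class and method names are ours):

```python
class TokenBudget:
    """Cumulative effective-token budget for a run; once exhausted it never recovers."""

    def __init__(self, max_effective_tokens):
        self.max = max_effective_tokens
        self.total = 0.0

    def allow_request(self):
        # Checked before forwarding each request (and each WebSocket upgrade)
        return self.total < self.max

    def record_usage(self, effective):
        # Accumulated after each successful upstream response
        self.total += effective

budget = TokenBudget(500000)
budget.record_usage(499999)
first = budget.allow_request()   # still under budget
budget.record_usage(2)           # response pushes the total over the limit
second = budget.allow_request()  # every later request is rejected with 429
```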
The proxy tracks which usage thresholds have been crossed:
| Threshold | Tracked once per run |
|---|---|
| 50% | Yes |
| 75% | Yes |
| 90% | Yes |
| 95% | Yes |
Crossed thresholds are exposed via /reflect in effective_tokens.thresholds_crossed.
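Threshold tracking is a pure function of the running total and the configured maximum, as sketched below (illustrative helper, not the proxy's code):

```python
def thresholds_crossed(total, maximum, thresholds=(50, 75, 90, 95)):
    """Return the documented percentage thresholds the running total has crossed."""
    if maximum <= 0:
        return []
    pct = 100.0 * total / maximum
    return [t for t in thresholds if pct >= t]

# At ~46.9% used (as in the /reflect example below the thresholds table),
# no threshold has been crossed yet.
crossed = thresholds_crossed(234567.89, 500000)
```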
Query the /reflect endpoint on any provider port to see the current budget state:
```bash
curl http://172.30.0.30:10000/reflect
```

The response includes:

```json
{
  "effective_tokens": {
    "enabled": true,
    "max_effective_tokens": 500000,
    "total_effective_tokens": 234567.89,
    "remaining_effective_tokens": 265432.11,
    "percent_used": 46.91,
    "thresholds_crossed": []
  }
}
```

Agents and orchestrators should detect the 429 response and the `effective_tokens_limit_exceeded` error type. The error body is structured JSON and can be parsed programmatically:

```javascript
if (response.status === 429) {
  const body = await response.json();
  if (body.error?.type === 'effective_tokens_limit_exceeded') {
    // Budget exhausted — stop making API calls
    console.log(`Token budget exceeded: ${body.error.total_effective_tokens} / ${body.error.max_effective_tokens}`);
  }
}
```

Known limitations:
- Keys must be set as environment variables (not file-based)
- No request/response logging (by design, for security)
- Authentication Architecture — detailed credential isolation internals
- Security — overall security model
- Environment Variables — environment variable configuration
- Troubleshooting — common issues and solutions
- Architecture — overall system architecture