AI Tool Gateways: Sandboxing Agent Access in Kubernetes

You've sandboxed containers. You've sandboxed Lambda functions. You've sandboxed database connections with least-privilege IAM roles. You haven't sandboxed your AI agents.

This is the part of the AI adoption curve most platform teams haven't reached yet. The conversation has been about capability — what can agents do, which model is best, how do you chain tools together. The security conversation follows when something goes wrong: an agent calls a delete endpoint it shouldn't have access to, leaks credentials through a prompt injection, or hammers an external API at 10,000 requests per minute because nobody set a rate limit.

An AI Tool Gateway is the proxy layer between an agent and the tools it's allowed to call. It enforces allow-lists, rate limits, auth, and logging at the boundary — the same way an API gateway enforces them for human callers. The difference is the caller is an LLM that can be prompted into attempting things a human wouldn't.


The Threat Model for AI Agents

Before building a gateway, you need a threat model. AI agents have a different attack surface from services because:

The caller is non-deterministic. A conventional API client, written by a developer, attempts a fixed set of actions. An agent's actions depend on the prompt, the conversation history, the tools available, and the model's interpretation of all three. The same agent can behave differently on Tuesday than it did on Monday because the context changed.

Tool call arguments are LLM-generated. When an agent calls search_database(query="..."), the query string is generated by a model, not written by a developer. A sufficiently crafted input to the agent (prompt injection) can cause it to generate tool calls with arguments designed to exfiltrate data or cause side effects.

Agents can chain calls autonomously. A multi-step agent might call five tools in sequence without human review. If step three produces an unexpected result, the agent may attempt to correct it by calling additional tools — including ones it was given access to but wasn't expected to use.

The blast radius of a misconfigured agent is unbounded by default. An agent with access to a GitHub token and a file system tool can, if prompted, read every repository and write to any file. The fact that you didn't intend this is irrelevant.

The gateway addresses this by reducing capability to what's explicitly needed, rate-limiting what's explicitly allowed, and logging everything for audit.


What a Tool Gateway Does

A Tool Gateway sits between the agent and its tools. When an agent wants to call a tool, the request passes through the gateway, which:

  1. Authenticates the agent (which agent is this?)
  2. Authorises the call (is this agent allowed to call this tool?)
  3. Validates the arguments (do the arguments match the schema? Do they pass content inspection?)
  4. Rate limits the call (has this agent exceeded its quota?)
  5. Logs the call (what did the agent call, with what arguments, and what did it get back?)
  6. Forwards to the actual tool
  7. Returns the result — or a sanitised version of it

This is not fundamentally different from an API gateway. The operational difference is that it has to sit in front of every tool the agent can call, because any tool left outside the gateway is an uncontrolled part of the blast radius.
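
Step 3 in the list above (argument validation) is the one most specific to agents, since the arguments are model-generated. A minimal sketch of a deny-by-default schema check, using only the standard library, might look like this. The ArgSpec type, the SEARCH_SCHEMA contents, and the limits are all hypothetical and only illustrate the shape of the check:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArgSpec:
    """Expected type and optional length cap for one tool argument."""
    type_: type
    required: bool = True
    max_length: Optional[int] = None


# Hypothetical schema for a "search" tool; names and limits are illustrative.
SEARCH_SCHEMA = {
    "query": ArgSpec(str, max_length=500),
    "limit": ArgSpec(int, required=False),
}


def validate_arguments(args: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the call may proceed."""
    errors = []
    # Deny by default: reject any argument the schema does not declare.
    for name in args:
        if name not in schema:
            errors.append(f"unexpected argument: {name}")
    for name, spec in schema.items():
        if name not in args:
            if spec.required:
                errors.append(f"missing argument: {name}")
            continue
        value = args[name]
        # bool is a subclass of int in Python, so reject it unless bool is expected.
        is_stray_bool = isinstance(value, bool) and spec.type_ is not bool
        if is_stray_bool or not isinstance(value, spec.type_):
            errors.append(f"wrong type for {name}")
        elif spec.max_length is not None and len(value) > spec.max_length:
            errors.append(f"{name} exceeds {spec.max_length} characters")
    return errors
```

Returning a list of problems rather than raising on the first one keeps the audit log informative: a single rejected call records everything that was wrong with it.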


Building a Tool Gateway on Kubernetes

The simplest implementation uses Envoy Gateway (already running from part two) plus an Envoy extension filter for policy enforcement. For production, dedicated tools like Portkey, LiteLLM Proxy, or a custom FastAPI gateway give you more control.

Option 1: Envoy Gateway with Traffic Policy

For HTTP-based tools, the gateway from part two already handles auth and rate limiting via Envoy Gateway policy resources such as BackendTrafficPolicy:

manifests/gateway/agent-traffic-policy.yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: agent-rate-limit
  namespace: agents  # a policy must live in the same namespace as the route it targets
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: tool-github
  rateLimit:
    type: Global
    global:
      rules:
        - clientSelectors:
            - headers:
                - type: Distinct
                  name: x-agent-id
          limit:
            requests: 100
            unit: Minute

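Rate limiting on distinct values of the x-agent-id header means each agent gets its own quota. One agent hitting its limit doesn't affect others. Note that Global rate limiting in Envoy Gateway relies on its rate limit service, which needs a Redis backend configured.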
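
To make the per-agent semantics concrete, here is a small Python model of a quota keyed by agent ID. It is a sketch of the behaviour, not of how Envoy implements it; the PerAgentLimiter class and the injectable clock are assumptions added for illustration and testing:

```python
import time
from collections import defaultdict, deque
from typing import Callable


class PerAgentLimiter:
    """Sliding-window limiter that keeps a separate quota per agent ID."""

    def __init__(self, limit: int, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self._calls = defaultdict(deque)

    def allow(self, agent_id: str) -> bool:
        now = self.clock()
        calls = self._calls[agent_id]
        # Drop timestamps that have aged out of the window.
        while calls and now - calls[0] >= self.window:
            calls.popleft()
        if len(calls) >= self.limit:
            return False
        calls.append(now)
        return True


# Usage with a fake clock: agent "a" exhausts its quota, agent "b" is unaffected.
fake_now = [0.0]
limiter = PerAgentLimiter(limit=2, clock=lambda: fake_now[0])
results = [limiter.allow("a"), limiter.allow("a"), limiter.allow("a"), limiter.allow("b")]
```

The quota is only as trustworthy as the header: if agents can set x-agent-id themselves, a misbehaving agent can claim a fresh identity, so the header should be set or verified by the auth layer.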

For authentication, use Envoy's external auth filter pointed at an OIDC-issuing service. Agents authenticate with short-lived tokens scoped to their allowed tool set. The gateway validates the token before forwarding the request.
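
The idea of a short-lived token scoped to a tool set can be sketched with the standard library alone. This is an illustration of the checks involved (signature, expiry, scope), not an OIDC or JWT implementation; the shared SIGNING_KEY, the claim names, and both function names are assumptions, and a real issuer would sign with asymmetric keys:

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Assumption: a shared demo key. A real issuer would use asymmetric keys.
SIGNING_KEY = b"demo-key-not-for-production"


def issue_token(agent_id: str, tools: list, ttl_seconds: int,
                now: Optional[float] = None) -> str:
    """Create a signed token naming the agent, its allowed tools, and an expiry."""
    issued = time.time() if now is None else now
    claims = {"sub": agent_id, "tools": tools, "exp": issued + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{signature}"


def authorise(token: str, tool: str, now: Optional[float] = None) -> bool:
    """Accept only tokens with a valid signature, unexpired, and scoped to the tool."""
    current = time.time() if now is None else now
    try:
        payload, signature = token.rsplit(".", 1)
    except ValueError:
        return False  # no separator, so not a token we issued
    expected = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison; verify the signature before parsing the payload.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return current < claims["exp"] and tool in claims["tools"]
```

Because the tool scope travels inside the signed token, the gateway can reject an out-of-scope call without looking anything up, and a leaked token stops working once its expiry passes.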

Option 2: A Dedicated Gateway Service

For more control — argument validation, content inspection, structured audit logs — a lightweight FastAPI gateway is practical:

gateway/main.py
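import json
import logging
import time

import httpx
from fastapi import FastAPI, HTTPException, Request
from fastapi.responses import JSONResponse

# Without explicit configuration, INFO-level audit records would be dropped.
logging.basicConfig(level=logging.INFO, format="%(message)s")

app = FastAPI()
logger = logging.getLogger("tool-gateway")

# Deny by default: a tool that is not declared here does not exist.
TOOL_ALLOW_LIST: dict[str, dict] = {
    "search": {
        "upstream": "http://search-service.tools:8080",
        "allowed_agents": ["research-agent", "support-agent"],
        "rate_limit_per_minute": 60,
        "max_query_length": 500,
    },
    "code_executor": {
        "upstream": "http://executor.tools:8080",
        "allowed_agents": ["dev-agent"],
        "rate_limit_per_minute": 10,
        "blocked_patterns": ["rm -rf", "DROP TABLE", "os.system"],
    },
}

# In-memory sliding window keyed by "agent:tool"; each process keeps its own counts.
request_counts: dict[str, list[float]] = {}

def check_rate_limit(agent_id: str, tool_name: str, limit: int) -> None:
    key = f"{agent_id}:{tool_name}"
    now = time.monotonic()  # monotonic clock: unaffected by wall-clock adjustments
    window = 60.0
    calls = [t for t in request_counts.get(key, []) if now - t < window]
    if len(calls) >= limit:
        raise HTTPException(status_code=429, detail="Rate limit exceeded")
    calls.append(now)
    request_counts[key] = calls

@app.post("/tools/{tool_name}")
async def call_tool(tool_name: str, request: Request) -> JSONResponse:
    agent_id = request.headers.get("x-agent-id")
    if not agent_id:
        raise HTTPException(status_code=401, detail="Missing x-agent-id header")

    tool = TOOL_ALLOW_LIST.get(tool_name)
    if not tool:
        raise HTTPException(status_code=404, detail=f"Tool '{tool_name}' not found")

    if agent_id not in tool["allowed_agents"]:
        logger.warning("Agent %s attempted unauthorised access to %s", agent_id, tool_name)
        raise HTTPException(status_code=403, detail="Agent not authorised for this tool")

    check_rate_limit(agent_id, tool_name, tool["rate_limit_per_minute"])

    # Reject malformed bodies instead of failing with a 500
    try:
        body = await request.json()
    except ValueError:
        raise HTTPException(status_code=400, detail="Request body must be valid JSON")
    args = body.get("arguments", {}) if isinstance(body, dict) else None
    if not isinstance(args, dict):
        raise HTTPException(status_code=400, detail="'arguments' must be a JSON object")

    # Content inspection: enforce the declared query length cap
    query = args.get("query")
    max_len = tool.get("max_query_length")
    if max_len is not None and isinstance(query, str) and len(query) > max_len:
        raise HTTPException(status_code=400, detail="Query exceeds maximum length")

    # Content inspection: blocked substrings in any top-level string argument
    for pattern in tool.get("blocked_patterns", []):
        for v in args.values():
            if isinstance(v, str) and pattern in v:
                logger.error("Blocked pattern '%s' in %s call from %s", pattern, tool_name, agent_id)
                raise HTTPException(status_code=400, detail="Argument contains blocked content")

    # Audit log: serialise to JSON so the log pipeline can parse the fields
    logger.info(json.dumps({
        "event": "tool_call",
        "agent_id": agent_id,
        "tool": tool_name,
        "arguments": args,
        "timestamp": time.time(),
    }, default=str))

    async with httpx.AsyncClient() as client:
        response = await client.post(
            tool["upstream"] + request.url.path,
            json=body,
            headers={"x-forwarded-agent": agent_id},
            timeout=30.0,
        )

    # Upstream tools may return a non-JSON body, for example on errors
    try:
        content = response.json()
    except ValueError:
        content = {"detail": response.text}
    return JSONResponse(content=content, status_code=response.status_code)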

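This is not production-grade: rate limiting is in-memory rather than backed by Redis, so each replica keeps its own counters, and the x-agent-id header is trusted as sent, so it only means something if an upstream auth layer sets or verifies it. The structure is correct, though. The key pieces, in order: allow-list first (no tool exists unless declared), agent authorisation check, rate limiting, content inspection, structured audit log, then forward.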
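
The request flow described earlier ends with returning a sanitised version of the result, which the gateway code here does not attempt. One possible shape for that step is a recursive redaction pass over the upstream JSON before it goes back to the agent. The sanitise function and the two patterns below are illustrative assumptions; a real deployment would tune them to the credential formats actually in use:

```python
import re
from typing import Any

# Illustrative patterns only; extend them for the secrets your tools can return.
SECRET_PATTERNS = [
    re.compile(r"gh[pousr]_[A-Za-z0-9]{36,}"),  # GitHub-style tokens
    re.compile(r"AKIA[0-9A-Z]{16}"),            # AWS access key IDs
]


def sanitise(value: Any) -> Any:
    """Recursively replace credential-like substrings in a JSON-compatible value."""
    if isinstance(value, str):
        for pattern in SECRET_PATTERNS:
            value = pattern.sub("[REDACTED]", value)
        return value
    if isinstance(value, dict):
        return {key: sanitise(item) for key, item in value.items()}
    if isinstance(value, list):
        return [sanitise(item) for item in value]
    # Numbers, booleans, and None pass through unchanged.
    return value
```

Pattern-based redaction is a backstop, not a guarantee: it only catches formats it knows about, so it complements least-privilege tool credentials rather than replacing them.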

Deploy it as a Kubernetes Deployment in a dedicated agents namespace, behind an HTTPRoute from the gateway:

manifests/gateway/httproutes.yaml (addition)
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: tool-gateway
  namespace: agents
spec:
  parentRefs:
    - name: platform-gateway
      namespace: infra
  hostnames:
    - "tools.local"
  rules:
    - backendRefs:
        - name: tool-gateway
          port: 8080

Agents call tools at http://tools.local:8080/tools/{tool_name}. The gateway is the only path in.


Network Isolation for Agent Workloads

The gateway is software: it can fail, be misconfigured, or be bypassed if an agent has direct network access to the tool services. Kubernetes NetworkPolicy resources, enforced by Cilium (from part four), close that path at the network layer:

manifests/netpol/agent-isolation.yaml
# Agents can only reach the tool gateway, not tools directly
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agents-egress
  namespace: agents
spec:
  podSelector:
    matchLabels:
      role: ai-agent
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: agents
          podSelector:
            matchLabels:
              app: tool-gateway
      ports:
        - port: 8080
          protocol: TCP
    - to:  # DNS (UDP, plus TCP for responses too large for UDP)
        - namespaceSelector: {}
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
---
# Tool services only accept calls from the gateway, not directly from agents
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tools-ingress
  namespace: tools
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: agents
          podSelector:
            matchLabels:
              app: tool-gateway

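Now the architecture is enforced at two layers: the gateway software layer and the Cilium network layer. A buggy or prompt-injected agent cannot bypass the gateway even if it tries, because the network drops any direct connection to the tool services. One caveat: the egress rule only allows traffic straight to the tool-gateway pods, so if agents reach the gateway through the Envoy proxy (as the tools.local route implies), the proxy pods must be allowed as a destination too.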
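
It is worth verifying the isolation from inside an agent pod rather than assuming it. A minimal probe, assuming the in-cluster service names tool-gateway.agents and search-service.tools resolve as the manifests suggest (the can_connect and probe helpers are hypothetical), could look like this:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def probe(targets: dict) -> dict:
    """Map each label to whether its (host, port) target is reachable."""
    return {label: can_connect(host, port) for label, (host, port) in targets.items()}


# Assumed service names; run probe(EXPECTED) from inside a pod labelled role: ai-agent.
EXPECTED = {
    "tool gateway (should be reachable)": ("tool-gateway.agents", 8080),
    "search service (should be blocked)": ("search-service.tools", 8080),
}
```

If the second target ever reports reachable, the policy is not being applied to that pod, most often because the pod is missing the role: ai-agent label.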


Audit Logging and Observability

The gateway emits structured JSON logs. Promtail (from part three) picks them up automatically. In Loki, query:

{namespace="agents", app="tool-gateway"} | json event="event", agent_id="agent_id", tool="tool", arguments="arguments" | event="tool_call"
| line_format "{{.agent_id}} → {{.tool}} ({{.arguments}})"

This gives you a chronological record of every tool call every agent made, filterable by agent ID, tool name, or time range. Compliance review of an agent incident becomes a Loki query, not a log archaeology exercise.

For metrics, add Prometheus instrumentation to the gateway:

from prometheus_client import Counter, Histogram
 
tool_calls_total = Counter(
    "tool_gateway_calls_total",
    "Total tool calls",
    ["agent_id", "tool", "status"]
)
tool_call_duration = Histogram(
    "tool_gateway_duration_seconds",
    "Tool call duration",
    ["agent_id", "tool"]
)

Alert when the rate of tool_gateway_calls_total{status="403"} spikes, for example on rate(tool_gateway_calls_total{status="403"}[5m]). A burst of 403s means an agent is repeatedly attempting to access tools it's not authorised for, which is either a bug or an active prompt injection attempt.


The Architecture in Full

Looking back across the six parts:

The k3d cluster from part one is the substrate. Envoy Gateway routes to everything. Cilium enforces network boundaries. Kyverno enforces resource shape. The LGTM stack observes the whole thing. The Tool Gateway is the access control boundary for agent workloads.

Every layer is running locally. Every layer is production-identical in design. When this moves to a real cluster, the changes are infrastructure (a real load balancer, real persistent storage, real authentication providers) — the architecture doesn't change.


The Pattern Worth Taking Away

AI agents are not a new category of compute that needs a new security model. They're a new type of caller that needs the same security model applied deliberately: least-privilege access, rate limiting, auth at the boundary, audit logging, and network isolation.

The platform teams that get this right early will be the ones who can deploy agent workloads in production confidently. The ones who don't will be the ones reading an incident retro six months from now about a prompt injection that called a deletion endpoint nobody knew the agent had access to.

Build the gateway before you need it. You'll know why the moment you turn on an agent with real credentials.