Network policies in most Kubernetes clusters are cargo cult. Teams write them, Kubernetes accepts them, and pod-to-pod traffic continues exactly as before. The policies exist in YAML. The enforcement doesn't exist at all — because the CNI plugin they installed doesn't support NetworkPolicy.
The default CNI in k3d is flannel, inherited from k3s (kind ships its own kindnet instead). Flannel on its own does not enforce NetworkPolicy. With a CNI that ignores policies, you can apply all the deny-all ingress policies you want and every pod can still talk to every other pod. If you're lucky, you'll find this out during a security review and not during an incident.
Cilium enforces network policies. It goes further: L7 policies (HTTP method, path, gRPC method), identity-based policies that don't rely on IP addresses, and a network flow observability layer that shows you exactly which flows are allowed and which are being dropped. This is the difference between writing a policy and knowing it works.
Kyverno operates a layer up: admission control. Before a resource lands in etcd, Kyverno validates, mutates, or blocks it. Missing team labels on PrometheusRule resources? Blocked at the door. Deployments without resource limits? Mutated to add defaults before the kubelet sees them.
Together, they're the enforcement layer that gives the multi-tenant model from part three its teeth.
Local Platform Engineering Series
- Running Local Kubernetes with k3d: Fast, Ephemeral, and Kind to Your Battery
- Gateway API in Practice: From Ingress Migration to Envoy Debugging
- Multi-Tenant Observability: LGTM at Platform Scale
- Network Control with Cilium and Kyverno: Policies That Actually Work
- Observing LLM Inference: The Metrics That Actually Matter
- AI Tool Gateways: Sandboxing Agent Access in Kubernetes
Replacing the Default CNI with Cilium
k3d with the default configuration uses flannel. Swap it out at cluster creation:
```yaml
apiVersion: k3d.io/v1alpha5
kind: Simple
metadata:
  name: platform-local
servers: 1
agents: 2
options:
  k3s:
    extraArgs:
      - arg: "--disable=traefik"
        nodeFilters:
          - server:*
      - arg: "--disable-network-policy"
        nodeFilters:
          - server:*
      - arg: "--flannel-backend=none"
        nodeFilters:
          - server:*
```

--flannel-backend=none disables flannel. --disable-network-policy disables the k3s built-in NetworkPolicy controller, so it doesn't run alongside Cilium's own enforcement. We'll replace both with Cilium.
Install Cilium:
```bash
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium \
  --version 1.16.0 \
  --namespace kube-system \
  --set image.pullPolicy=IfNotPresent \
  --set ipam.mode=kubernetes \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true \
  --wait
```

hubble.relay and hubble.ui enable Cilium's network observability layer. Hubble is how you see which flows are allowed and which are being dropped: not by reading logs, but by watching live L3/L4/L7 traffic in real time.
```bash
cilium status   # install the CLI: brew install cilium-cli
```

All pods should be running and Cilium should report OK.
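The --set flags passed to helm install can also be kept in a values file, which is easier to review and version. The following is a minimal sketch that mirrors those flags one-to-one; the filename cilium-values.yaml is an assumption, not something the chart requires.

```yaml
# cilium-values.yaml (hypothetical filename)
# Mirrors the --set flags from the helm install command one-to-one.
image:
  pullPolicy: IfNotPresent
ipam:
  mode: kubernetes
hubble:
  relay:
    enabled: true
  ui:
    enabled: true
```

Pass it to the same helm install command with -f cilium-values.yaml in place of the --set lines.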
Default-Deny Network Policies
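The first policy every multi-tenant cluster needs: deny all ingress and egress within a namespace by default, then explicitly allow what's needed. NetworkPolicy rules are additive: they can only allow traffic, and a pod that no policy selects accepts everything. A default-deny policy selects every pod in the namespace, which turns the permissive default into a restrictive one that later rules open up selectively.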
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: payments
spec:
  podSelector: {}   # applies to all pods in the namespace
  policyTypes:
    - Ingress
    - Egress
```

Apply this to every application namespace. After this, nothing can talk to anything in payments, including the pods within it. Explicitly allow what's needed:
```yaml
# Allow the payments API to receive traffic from the gateway
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-gateway-ingress
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payments-api
  policyTypes:
    - Ingress
  ingress:
    - from:
        # One list entry: namespace AND pod labels must both match
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: infra
          podSelector:
            matchLabels:
              app.kubernetes.io/name: envoy
---
# Allow payments to call the identity service
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-to-identity
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payments-api
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: identity
      ports:
        - port: 8080
          protocol: TCP
    - to:   # always allow DNS
        - namespaceSelector: {}
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP   # DNS falls back to TCP for large responses
```

The DNS egress rule is easy to forget and hard to debug when you forget it. Always add it explicitly. A pod that can't resolve DNS looks like a connectivity failure.
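The DNS rule in allow-egress-to-identity opens port 53 to every namespace. A tighter variant, sketched here as a separate policy, limits DNS egress to the cluster DNS pods. It assumes CoreDNS runs in kube-system with the label k8s-app: kube-dns, which is common but worth checking with kubectl get pods -n kube-system --show-labels. The policy name allow-dns is hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns              # hypothetical name
  namespace: payments
spec:
  podSelector: {}              # every pod in the namespace needs DNS
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns   # assumed label on the cluster DNS pods
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP        # DNS falls back to TCP for large responses
```

Because NetworkPolicy rules are additive, this can sit alongside allow-egress-to-identity, and the broader DNS rule in that policy can then be removed.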
Cilium L7 Policies
Standard Kubernetes NetworkPolicy operates at L3/L4 — IP addresses and ports. Cilium extends this to L7. You can write policies that allow GET /api/v1/status but deny DELETE /api/v1/users.
```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: payments-api-l7
  namespace: payments
spec:
  endpointSelector:
    matchLabels:
      app: payments-api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: payments-frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: GET
                path: "^/api/v1/"
              - method: POST
                path: "^/api/v1/payments$"
```

This says: only payments-frontend can reach the API on port 8080, and only with GET /api/v1/* or POST /api/v1/payments. Traffic from any other source is dropped at L3/L4, and a request from payments-frontend with any other method or path is rejected at the proxy layer (Cilium's proxy answers with an HTTP 403) before the application code runs. The application cannot accidentally expose a misconfigured endpoint.
L7 policies are expensive — Cilium has to parse every HTTP request. Use them at trust boundaries, not everywhere. The right model is: L3/L4 default-deny as the baseline everywhere, L7 policies at the perimeter of sensitive namespaces.
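L7 rules also cover gRPC, since a gRPC call is an HTTP/2 POST whose path has the form /package.Service/Method. A minimal sketch, assuming a hypothetical payments.v1.PaymentService served on port 50051 and a hypothetical caller labelled app: payments-worker:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: payments-grpc-l7            # hypothetical name
  namespace: payments
spec:
  endpointSelector:
    matchLabels:
      app: payments-api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: payments-worker    # hypothetical client
      toPorts:
        - ports:
            - port: "50051"         # assumed gRPC port
              protocol: TCP
          rules:
            http:
              # gRPC methods appear as POST /package.Service/Method
              - method: POST
                path: "/payments.v1.PaymentService/GetPayment"
```

Only the listed method is allowed; other methods on the same service would need their own entries.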
Observing Flows with Hubble
This is where Cilium earns its setup cost. Hubble gives you real-time network flow visibility:
```bash
kubectl port-forward -n kube-system svc/hubble-ui 12000:80
```

Open http://localhost:12000 and select the payments namespace. You'll see a service map with live traffic flows. Green arrows are allowed flows. Red arrows are denied flows. Click a denied flow and you get the policy that dropped it, the source pod, and the destination port.
From the CLI:
```bash
hubble observe --namespace payments --last 50
```

```
payments/payments-api → identity/identity-api:8080 FORWARDED
payments/payments-api → monitoring/prometheus:9090 DROPPED (policy)
```
The second flow is being dropped because the default-deny policy in payments blocks egress to monitoring. If this was unexpected, you found a gap in your allow-list. If it was expected, you confirmed the policy is working.
This is the difference between writing a network policy and knowing it's enforced.
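If the dropped flow to monitoring/prometheus turns out to be one the service needs, the fix is another narrow allow rule. A sketch, assuming the Prometheus pods in monitoring carry the label app.kubernetes.io/name: prometheus (check your own deployment's labels); the policy name is hypothetical:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-to-prometheus   # hypothetical name
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payments-api
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
          podSelector:
            matchLabels:
              app.kubernetes.io/name: prometheus   # assumed label
      ports:
        - port: 9090
          protocol: TCP
```

After applying it, the same hubble observe command should report the flow as FORWARDED, provided the monitoring namespace does not block the matching ingress.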
Kyverno: Admission Control
Kyverno intercepts every resource creation and update request. Before the API server writes to etcd, Kyverno can:
- Validate — reject resources that don't meet a policy
- Mutate — modify resources to add missing fields
- Generate — create additional resources when a resource is created
Install Kyverno:
```bash
helm repo add kyverno https://kyverno.github.io/kyverno/
helm install kyverno kyverno/kyverno \
  --namespace kyverno \
  --create-namespace \
  --wait
```

Requiring resource limits
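Every workload that lands in an application namespace should have resource limits. Without limits, a runaway pod can consume all available CPU on a node and starve neighbours. This policy enforces it at admission. It matches Pods, and Kyverno's rule auto-generation extends Pod rules to controllers such as Deployments, so a non-compliant Deployment is rejected too: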
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-container-limits
      match:
        any:
          - resources:
              kinds: ["Pod"]
              namespaceSelector:
                matchExpressions:
                  - key: platform.io/managed
                    operator: In
                    values: ["true"]
      validate:
        message: "All containers must have CPU and memory limits."
        pattern:
          spec:
            containers:
              - (name): "*"
                resources:
                  limits:
                    memory: "?*"
                    cpu: "?*"
```

The namespaceSelector scopes this to namespaces with platform.io/managed: "true". Platform namespaces like kube-system and monitoring are exempt. Application namespaces are labelled at creation time.
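Validation rejects non-compliant workloads; the introduction also mentions mutating them to add defaults. A minimal sketch of that variant follows, using Kyverno's add-if-absent anchor, written +(key). The policy name and the default values 500m and 256Mi are assumptions to be tuned per cluster.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-resource-limits   # hypothetical name
spec:
  rules:
    - name: set-container-limits
      match:
        any:
          - resources:
              kinds: ["Pod"]
              namespaceSelector:
                matchExpressions:
                  - key: platform.io/managed
                    operator: In
                    values: ["true"]
      mutate:
        patchStrategicMerge:
          spec:
            containers:
              - (name): "*"               # apply to every container
                resources:
                  limits:
                    +(memory): "256Mi"    # set only if absent (assumed default)
                    +(cpu): "500m"        # set only if absent (assumed default)
```

Mutating admission runs before validating admission, so a Pod that omits limits would receive these defaults first and then pass require-resource-limits.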
Requiring team labels on alerts
From part three — every PrometheusRule must have a team label or Alertmanager can't route it:
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-alert-team-label
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-prometheusrule-labels
      match:
        any:
          - resources:
              kinds: ["PrometheusRule"]
      validate:
        message: "PrometheusRule must have a 'team' label for Alertmanager routing."
        pattern:
          metadata:
            labels:
              team: "?*"
```

?* means "non-empty string". A PrometheusRule without a team label is rejected before it touches Prometheus. The platform team doesn't need to audit alert configs after the fact.
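For reference, here is a sketch of a PrometheusRule that would pass this check. The rule name, metric, and threshold are hypothetical; the only part the policy inspects is the team label under metadata.labels.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: payments-api-alerts        # hypothetical name
  namespace: payments
  labels:
    team: payments                 # non-empty, so it satisfies the "?*" pattern
spec:
  groups:
    - name: payments-api
      rules:
        - alert: PaymentsApiHighErrorRate
          # hypothetical metric name and threshold
          expr: sum(rate(http_requests_total{job="payments-api",code=~"5.."}[5m])) > 1
          for: 10m
          labels:
            severity: warning
```

Note that the policy checks the label on the PrometheusRule object itself, not the labels attached to individual alerts inside spec.groups.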
Auto-generating NetworkPolicies
When a new application namespace is created, Kyverno can automatically create the default-deny policy:
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: generate-default-deny
spec:
  rules:
    - name: create-default-deny
      match:
        any:
          - resources:
              kinds: ["Namespace"]
              selector:
                matchLabels:
                  platform.io/managed: "true"
      generate:
        apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        name: default-deny-all
        namespace: "{{request.object.metadata.name}}"
        synchronize: true
        data:
          spec:
            podSelector: {}
            policyTypes:
              - Ingress
              - Egress
```

synchronize: true means Kyverno will recreate this policy if someone manually deletes it. The default-deny baseline becomes permanent and automatic. New namespaces are secure by default.
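The trigger for this rule is the namespace label. A minimal sketch of a namespace that would match, with orders as a hypothetical tenant name:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: orders                     # hypothetical tenant namespace
  labels:
    platform.io/managed: "true"    # quoted, because label values must be strings
```

Once it is created, kubectl get networkpolicy -n orders should list default-deny-all, and the same label brings the namespace under require-resource-limits.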
The Enforcement Model
What you've built across parts three and four is a layered enforcement model:
| Layer | Tool | What it enforces |
|---|---|---|
| Admission | Kyverno | Resource shape, required labels, generated resources |
| Network (L3/L4) | Kubernetes NetworkPolicy, enforced by Cilium | Pod-to-pod connectivity |
| Network (L7) | CiliumNetworkPolicy | HTTP method, path, gRPC method |
| Observability | Hubble | Real-time flow visibility and policy verification |
The critical insight is that enforcement and observability must be paired. Cilium without Hubble means you enforce policies you can't debug. Conventions without admission control mean you catch problems in production instead of at deployment time. The combination turns "our policies say this should be safe" into "we can see that it is safe".
What's Next
Your cluster is secure, observable, and properly routed. Now point it at a real workload: an LLM inference server. Part five covers what metrics actually matter for LLM performance — TTFT is your SLO, not throughput — and which frameworks give you observability out of the box.