Network Control with Cilium and Kyverno: Policies That Actually Work

Network policies in most Kubernetes clusters are cargo cult. Teams write them, Kubernetes accepts them, and pod-to-pod traffic continues exactly as before. The policies exist in YAML. The enforcement doesn't exist at all — because the CNI plugin they installed doesn't support NetworkPolicy.

k3d runs k3s, whose default CNI is flannel; kind ships its own CNI, kindnet. Flannel by itself does not enforce NetworkPolicy, and older kindnet releases did not either. On a CNI without enforcement, you can apply all the deny-all ingress policies you want and every pod can still talk to every other pod. If you're lucky, you'll find this out during a security review rather than during an incident.

Cilium enforces network policies. It goes further: L7 policies (HTTP method, path, gRPC method), identity-based policies that don't rely on IP addresses, and a network flow observability layer that shows you exactly which flows are allowed and which are being dropped. This is the difference between writing a policy and knowing it works.

Kyverno operates a layer up: admission control. Before a resource lands in etcd, Kyverno validates, mutates, or blocks it. Missing team labels on PrometheusRule resources? Blocked at the door. Deployments without resource limits? Mutated to add defaults before the kubelet sees them.

Together, they're the enforcement layer that gives the multi-tenant model from part three its teeth.


Replacing the Default CNI with Cilium

k3d with the default configuration uses flannel. Swap it out at cluster creation:

k3d-config.yaml
apiVersion: k3d.io/v1alpha5
kind: Simple
metadata:
  name: platform-local
servers: 1
agents: 2
options:
  k3s:
    extraArgs:
      - arg: "--disable=traefik"
        nodeFilters:
          - server:*
      - arg: "--disable-network-policy"
        nodeFilters:
          - server:*
      - arg: "--flannel-backend=none"
        nodeFilters:
          - server:*

--flannel-backend=none disables flannel. --disable-network-policy disables the NetworkPolicy controller embedded in k3s, so that it doesn't run alongside Cilium's own enforcement. Cilium replaces both. Until Cilium is installed the nodes have no CNI and will report NotReady.

Install Cilium:

helm repo add cilium https://helm.cilium.io/
 
helm install cilium cilium/cilium \
  --version 1.16.0 \
  --namespace kube-system \
  --set image.pullPolicy=IfNotPresent \
  --set ipam.mode=kubernetes \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true \
  --wait

hubble.relay and hubble.ui enable Cilium's network observability layer. Hubble is how you see which flows are allowed and which are being dropped — not by reading logs, but by watching live L3/L4/L7 traffic in real time.
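
If you'd rather keep these settings in version control than in a long command line, the same flags translate directly into a Helm values file. A minimal sketch; the filename cilium-values.yaml is an assumption, and the keys are simply the --set paths from the install command:

```yaml
# cilium-values.yaml (hypothetical filename)
# Equivalent to the --set flags passed to helm install.
image:
  pullPolicy: IfNotPresent
ipam:
  mode: kubernetes
hubble:
  relay:
    enabled: true   # gRPC endpoint the hubble CLI and UI read from
  ui:
    enabled: true   # web UI with the service map
```

Pass it with helm install cilium cilium/cilium --version 1.16.0 --namespace kube-system -f cilium-values.yaml --wait.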

cilium status  # install the CLI: brew install cilium-cli

All pods should be running and Cilium should report OK.


Default-Deny Network Policies

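The first policy every multi-tenant cluster needs: deny all ingress and egress within a namespace by default, then explicitly allow what's needed. NetworkPolicy is purely additive: there are no deny rules, and a pod that no policy selects accepts all traffic. A default-deny policy that selects every pod is what flips the namespace from allow-all to deny-all, so that every later policy is an explicit exception.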

manifests/netpol/default-deny.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: payments
spec:
  podSelector: {}      # applies to all pods in the namespace
  policyTypes:
    - Ingress
    - Egress

Apply this to every application namespace. After this, nothing can talk to anything in payments — including the pods within it. Explicitly allow what's needed:

manifests/netpol/payments-allow.yaml
# Allow the payments API to receive traffic from the gateway
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-gateway-ingress
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payments-api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: infra
          podSelector:
            matchLabels:
              app.kubernetes.io/name: envoy
---
# Allow payments to call the identity service
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-to-identity
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payments-api
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: identity
      ports:
        - port: 8080
          protocol: TCP
    - to:                    # always allow DNS (UDP, plus TCP for large responses)
        - namespaceSelector: {}
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP

The DNS egress rule is easy to forget and hard to debug when you forget it. Always add it explicitly. A pod that can't resolve DNS looks like a general connectivity failure: every call to a service name times out, even when the destination itself is allowed.
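
The DNS rule in the manifest allows port 53 towards every namespace. A tighter variant, sketched here under the assumption that cluster DNS runs in kube-system with the conventional k8s-app: kube-dns label and listens on port 53 (common for CoreDNS, but check your distribution), limits DNS egress to the DNS pods and covers every pod in the namespace. The policy name is hypothetical:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress          # hypothetical name
  namespace: payments
spec:
  podSelector: {}                 # every pod in the namespace needs DNS
  policyTypes:
    - Egress
  egress:
    - to:
        # namespaceSelector and podSelector in ONE list item means
        # "pods with this label inside that namespace" (logical AND)
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns   # assumed label on the DNS pods
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
```

Because policies are additive, this can sit next to the per-service egress policies without repeating the DNS block in each of them.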


Cilium L7 Policies

Standard Kubernetes NetworkPolicy operates at L3/L4 — IP addresses and ports. Cilium extends this to L7. You can write policies that allow GET /api/v1/status but deny DELETE /api/v1/users.

manifests/cilium/l7-policy.yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: payments-api-l7
  namespace: payments
spec:
  endpointSelector:
    matchLabels:
      app: payments-api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: payments-frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: GET
                path: "^/api/v1/.*"
              - method: POST
                path: "^/api/v1/payments$"

This says: only payments-frontend can reach the API on port 8080, and only with GET /api/v1/* or POST /api/v1/payments. Traffic from any other source is dropped at L3/L4. A request from payments-frontend with any other method or path is rejected by Cilium's Envoy proxy with an HTTP 403 before the application code runs. A misconfigured endpoint in the application stays unreachable.
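
The intro mentioned gRPC methods as well. gRPC calls travel as HTTP/2 POST requests to a path of the form /package.Service/Method, so the same http rule shape can express them. A sketch, assuming a hypothetical payments.v1.PaymentService exposed on port 9090; neither the service name nor the port comes from this cluster:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: payments-grpc-l7          # hypothetical name
  namespace: payments
spec:
  endpointSelector:
    matchLabels:
      app: payments-api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: payments-frontend
      toPorts:
        - ports:
            - port: "9090"        # assumed gRPC port
              protocol: TCP
          rules:
            http:
              # A gRPC call is an HTTP/2 POST to /<package>.<Service>/<Method>
              - method: POST
                path: '^/payments\.v1\.PaymentService/GetPayment$'
```

Every other method on the service, including ones added later, stays blocked until it is listed.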

L7 policies are expensive — Cilium has to parse every HTTP request. Use them at trust boundaries, not everywhere. The right model is: L3/L4 default-deny as the baseline everywhere, L7 policies at the perimeter of sensitive namespaces.
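
For the L3/L4 baseline, a CiliumNetworkPolicy with no rules block under toPorts keeps traffic out of the HTTP proxy while still selecting peers by label identity rather than by IP. A sketch that reuses the gateway labels from the earlier NetworkPolicy; the policy name is hypothetical, and the k8s:io.kubernetes.pod.namespace label is the one Cilium uses to select endpoints in another namespace:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: payments-api-l4           # hypothetical name
  namespace: payments
spec:
  endpointSelector:
    matchLabels:
      app: payments-api
  ingress:
    - fromEndpoints:
        - matchLabels:
            # Without this label, fromEndpoints only matches pods
            # in the policy's own namespace.
            "k8s:io.kubernetes.pod.namespace": infra
            app.kubernetes.io/name: envoy
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          # no "rules:" block, so this is an L4 decision only
```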


Observing Flows with Hubble

This is where Cilium earns its setup cost. Hubble gives you real-time network flow visibility:

kubectl port-forward -n kube-system svc/hubble-ui 12000:80

Open http://localhost:12000 and select the payments namespace. You'll see a service map with live traffic flows. Green arrows are allowed flows. Red arrows are denied flows. Click a denied flow and you get the policy that dropped it, the source pod, and the destination port.

From the CLI (install the hubble CLI and expose the relay first, for example with cilium hubble port-forward):

hubble observe --namespace payments --last 50
payments/payments-api → identity/identity-api:8080 FORWARDED
payments/payments-api → monitoring/prometheus:9090 DROPPED (policy)

The second flow is being dropped because the default-deny policy in payments blocks egress to monitoring. If this was unexpected, you found a gap in your allow-list. If it was expected, you confirmed the policy is working.
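
If the flow to Prometheus turns out to be one the service needs, the fix is one more explicit allow. A sketch, assuming the Prometheus pods carry the label app.kubernetes.io/name: prometheus (an assumption; check the labels in your monitoring namespace) and using a hypothetical policy name:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-to-prometheus   # hypothetical name
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payments-api
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
          podSelector:
            matchLabels:
              app.kubernetes.io/name: prometheus   # assumed label
      ports:
        - port: 9090
          protocol: TCP
```

Re-run hubble observe afterwards; the verdict for that flow should change from DROPPED to FORWARDED. If the monitoring namespace has its own default-deny, it needs a matching ingress rule too.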

This is the difference between writing a network policy and knowing it's enforced.


Kyverno: Admission Control

Kyverno intercepts every resource creation and update request. Before the API server writes to etcd, Kyverno can:

  • Validate — reject resources that don't meet a policy
  • Mutate — modify resources to add missing fields
  • Generate — create additional resources when a resource is created

Install Kyverno:

helm repo add kyverno https://kyverno.github.io/kyverno/
helm install kyverno kyverno/kyverno \
  --namespace kyverno \
  --create-namespace \
  --wait

Requiring resource limits

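Every Deployment that lands in an application namespace should have resource limits. Without limits, a runaway pod can consume all available CPU on a node and starve neighbours. This policy enforces it at admission. The rule matches Pods; Kyverno's rule auto-generation extends it to Pod controllers such as Deployments, so a non-compliant Deployment is rejected when it is applied rather than when its Pods fail to create: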

manifests/kyverno/require-limits.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-container-limits
      match:
        any:
          - resources:
              kinds: ["Pod"]
              namespaceSelector:
                matchExpressions:
                  - key: platform.io/managed
                    operator: In
                    values: ["true"]
      validate:
        message: "All containers must have CPU and memory limits."
        pattern:
          spec:
            containers:
              - (name): "*"
                resources:
                  limits:
                    memory: "?*"
                    cpu: "?*"

The namespaceSelector scopes this to namespaces with platform.io/managed: "true". Platform namespaces like kube-system and monitoring are exempt. Application namespaces are labelled at creation time.
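
The intro also mentioned mutation. A companion rule can add default limits to containers that omit them; since mutating admission runs before validating admission, a Pod defaulted this way then passes the validation check. A sketch using Kyverno's add-if-absent anchor, +( ). The policy name and the 500m / 256Mi defaults are placeholder assumptions, not recommendations:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-resource-limits   # hypothetical name
spec:
  rules:
    - name: add-default-limits
      match:
        any:
          - resources:
              kinds: ["Pod"]
              namespaceSelector:
                matchExpressions:
                  - key: platform.io/managed
                    operator: In
                    values: ["true"]
      mutate:
        patchStrategicMerge:
          spec:
            containers:
              - (name): "*"              # apply to every container
                resources:
                  limits:
                    +(memory): "256Mi"   # added only if memory is unset
                    +(cpu): "500m"       # added only if cpu is unset
```

Whether to default or to reject is a policy decision: defaults keep deploys unblocked, while rejection forces teams to pick numbers deliberately.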

Requiring team labels on alerts

From part three — every PrometheusRule must have a team label or Alertmanager can't route it:

manifests/kyverno/require-team-label.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-alert-team-label
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-prometheusrule-labels
      match:
        any:
          - resources:
              kinds: ["PrometheusRule"]
      validate:
        message: "PrometheusRule must have a 'team' label for Alertmanager routing."
        pattern:
          metadata:
            labels:
              team: "?*"

?* means "non-empty string". A PrometheusRule without a team label is rejected before it touches Prometheus. The platform team doesn't need to audit alert configs after the fact.
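
For reference, a PrometheusRule that passes this check might look like the following. The rule name, alert, and expression are illustrative assumptions; only the metadata.labels.team field matters to the policy:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: payments-api-alerts      # hypothetical name
  namespace: payments
  labels:
    team: payments               # any non-empty value satisfies "?*"
spec:
  groups:
    - name: payments-api
      rules:
        - alert: PaymentsApiDown             # illustrative alert
          expr: up{job="payments-api"} == 0
          for: 5m
          labels:
            severity: critical
```

Removing the team label, or setting it to an empty string, causes the API server to reject the resource with the message from the policy.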

Auto-generating NetworkPolicies

When a new application namespace is created, Kyverno can automatically create the default-deny policy:

manifests/kyverno/generate-default-deny.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: generate-default-deny
spec:
  rules:
    - name: create-default-deny
      match:
        any:
          - resources:
              kinds: ["Namespace"]
              selector:
                matchLabels:
                  platform.io/managed: "true"
      generate:
        apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        name: default-deny-all
        namespace: "{{request.object.metadata.name}}"
        synchronize: true
        data:
          spec:
            podSelector: {}
            policyTypes:
              - Ingress
              - Egress

synchronize: true means Kyverno will recreate this policy if someone manually deletes it. The default-deny baseline becomes permanent and automatic. New namespaces are secure by default.
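
The trigger is nothing more than the label on the Namespace. A sketch, using a hypothetical namespace called orders:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: orders                     # hypothetical namespace
  labels:
    # Matched by generate-default-deny, and by the namespaceSelector
    # in require-resource-limits.
    platform.io/managed: "true"
```

After applying it, kubectl get networkpolicy -n orders should list default-deny-all, assuming Kyverno has RBAC permission to create NetworkPolicy resources.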


The Enforcement Model

What you've built across parts three and four is a layered enforcement model:

| Layer | Tool | What it enforces |
|---|---|---|
| Admission | Kyverno | Resource shape, required labels, generated resources |
| Network (L3/L4) | Cilium NetworkPolicy | Pod-to-pod connectivity |
| Network (L7) | CiliumNetworkPolicy | HTTP method, path, gRPC method |
| Observability | Hubble | Real-time flow visibility and policy verification |

The critical insight is that enforcement and observability must be paired. Cilium without Hubble means you enforce policies you can't debug. A cluster without Kyverno's admission validation catches problems in production instead of at deployment time. The combination turns "our policies say this should be safe" into "we can see that it is safe".


What's Next

Your cluster is secure, observable, and properly routed. Now point it at a real workload: an LLM inference server. Part five covers what metrics actually matter for LLM performance — TTFT is your SLO, not throughput — and which frameworks give you observability out of the box.

Observing LLM Inference: The Metrics That Actually Matter →