~/ emre.cavunt_

Network Control with Cilium and Kyverno: Policies That Actually Work


Network policies in most Kubernetes clusters are cargo cult. Teams write them, Kubernetes accepts them, and pod-to-pod traffic continues exactly as before. The policies exist in YAML. The enforcement doesn't exist at all — because the CNI plugin they installed doesn't support NetworkPolicy.

kind's default CNI is kindnet. Unlike flannel, kindnet in recent kind releases does implement NetworkPolicy, so policies aren't entirely ignored. But kindnet stops at L3/L4: IP addresses and ports. You cannot write a policy that allows GET /api/v1/status but denies DELETE /api/v1/users. You cannot see which flows are being allowed or dropped in real time. You cannot enforce access based on workload identity rather than IP. For a multi-tenant platform, that's not enough.

Cilium enforces network policies. It goes further: L7 policies (HTTP method, path, gRPC method), identity-based policies that don't rely on IP addresses, and a network flow observability layer that shows you exactly which flows are allowed and which are being dropped. This is the difference between writing a policy and knowing it works.

Kyverno operates a layer up: admission control. Before a resource lands in etcd, Kyverno validates, mutates, or blocks it. Missing team labels on PrometheusRule resources? Blocked at the door. Deployments without resource limits? Mutated to add defaults before the kubelet sees them.

Together, they're the enforcement layer that gives the multi-tenant model from part three its teeth.


Replacing the Default CNI with Cilium

Swap out kindnet at cluster creation by disabling the default CNI in your kind config:

kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: platform-local
networking:
  disableDefaultCNI: true
  podSubnet: "192.168.0.0/16"
nodes:
  - role: control-plane
  - role: worker
  - role: worker

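disableDefaultCNI: true tells kind not to install kindnet. The podSubnet sets the cluster's pod CIDR. Because Cilium is installed below with ipam.mode=kubernetes, it allocates pod IPs from the per-node ranges Kubernetes carves out of this subnet, so there is no separate Cilium address pool to keep in sync. Recreate the cluster with this config before continuing (kind delete cluster --name platform-local, then kind create cluster --config kind-config.yaml).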

Install Cilium:

helm repo add cilium https://helm.cilium.io/
 
helm install cilium cilium/cilium \
  --version 1.16.0 \
  --namespace kube-system \
  --set image.pullPolicy=IfNotPresent \
  --set ipam.mode=kubernetes \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true \
  --wait

hubble.relay and hubble.ui enable Cilium's network observability layer. Hubble is how you see which flows are allowed and which are being dropped — not by reading logs, but by watching live L3/L4/L7 traffic in real time.
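The same settings can live in a values file instead of a chain of --set flags, which is easier to review and keep in version control. This is a direct translation of the flags above; the filename is an assumption, not something the chart requires.

```yaml
# values-cilium.yaml (hypothetical filename)
# Equivalent to the --set flags used in the helm install command above.
image:
  pullPolicy: IfNotPresent
ipam:
  mode: kubernetes        # take pod CIDRs from the Kubernetes node objects
hubble:
  relay:
    enabled: true         # aggregates flows from every node
  ui:
    enabled: true         # service map served by the hubble-ui Service
```

It would be passed with -f values-cilium.yaml in place of the individual --set arguments, keeping --version, --namespace, and --wait as they are.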

cilium status  # install the CLI: brew install cilium-cli

All pods should be running and Cilium should report OK.


Default-Deny Network Policies

The first policy every multi-tenant cluster needs: deny all ingress and egress within a namespace by default, then explicitly allow what's needed. NetworkPolicy is purely additive: a pod that no policy selects accepts all traffic, and rules can only allow more. Selecting every pod with an empty rule set is what turns that permissive default into deny.

manifests/netpol/default-deny.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: payments
spec:
  podSelector: {}      # applies to all pods in the namespace
  policyTypes:
    - Ingress
    - Egress

Apply this to every application namespace. After this, nothing can talk to anything in payments — including the pods within it. Explicitly allow what's needed:

manifests/netpol/payments-allow.yaml
# Allow the payments API to receive traffic from the gateway
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-gateway-ingress
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payments-api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: infra
          podSelector:
            matchLabels:
              app.kubernetes.io/name: envoy
---
# Allow payments to call the identity service
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-to-identity
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payments-api
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: identity
      ports:
        - port: 8080
          protocol: TCP
    - to:                    # always allow DNS
        - namespaceSelector: {}
      ports:
        - port: 53
          protocol: UDP
        - port: 53           # DNS falls back to TCP for large responses
          protocol: TCP

The DNS egress rule is easy to forget and hard to debug when you forget it. Always add it explicitly. A pod that can't resolve DNS looks like a connectivity failure.
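The DNS rule above allows port 53 to pods in every namespace. A tighter variant is sketched here, assuming cluster DNS runs as pods labelled k8s-app: kube-dns in kube-system (the usual CoreDNS setup on kind and kubeadm clusters; check the labels in your own cluster). The policy name is hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress          # hypothetical name
  namespace: payments
spec:
  podSelector: {}                 # every pod in the namespace may resolve DNS
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:            # same list item as namespaceSelector: both must match
            matchLabels:
              k8s-app: kube-dns   # assumption: default CoreDNS pod label
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
```

Keeping DNS in its own namespace-wide policy means individual allow rules no longer have to repeat it.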


Cilium L7 Policies

Standard Kubernetes NetworkPolicy operates at L3/L4 — IP addresses and ports. Cilium extends this to L7. You can write policies that allow GET /api/v1/status but deny DELETE /api/v1/users.

manifests/cilium/l7-policy.yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: payments-api-l7
  namespace: payments
spec:
  endpointSelector:
    matchLabels:
      app: payments-api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: payments-frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: GET
                path: "^/api/v1/.*"   # trailing .* so sub-paths match even if the regex is anchored to the full path
              - method: POST
                path: "^/api/v1/payments$"

This says: only payments-frontend can reach the API on port 8080, and only with GET requests under /api/v1/ or POST /api/v1/payments. Traffic from any other source is dropped at L3/L4. A request from payments-frontend with any other method or path is rejected by Cilium's L7 proxy, typically with an HTTP 403, before the application code runs. The application cannot accidentally expose a misconfigured endpoint.
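gRPC methods can be restricted with the same http rules, because a gRPC call is an HTTP/2 POST to /package.Service/Method. The sketch below assumes a hypothetical gRPC service named payments.v1.PaymentService listening on a hypothetical port 9090; neither appears elsewhere in this series.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: payments-grpc-l7          # hypothetical name
  namespace: payments
spec:
  endpointSelector:
    matchLabels:
      app: payments-api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: payments-frontend
      toPorts:
        - ports:
            - port: "9090"        # hypothetical gRPC port
              protocol: TCP
          rules:
            http:
              # gRPC maps each call to POST /<package>.<Service>/<Method>
              - method: POST
                path: "/payments.v1.PaymentService/GetStatus"
```

Only the GetStatus call is allowed here; any other method on the same service would be refused at the proxy.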

L7 policies are expensive — Cilium has to parse every HTTP request. Use them at trust boundaries, not everywhere. The right model is: L3/L4 default-deny as the baseline everywhere, L7 policies at the perimeter of sensitive namespaces.


Observing Flows with Hubble

This is where Cilium earns its setup cost. Hubble gives you real-time network flow visibility:

kubectl port-forward -n kube-system svc/hubble-ui 12000:80

Open http://localhost:12000 and select the payments namespace. You'll see a service map with live traffic flows. Green arrows are allowed flows. Red arrows are denied flows. Click a denied flow and you get the policy that dropped it, the source pod, and the destination port.

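From the CLI, with the hubble CLI installed and Hubble Relay reachable (for example via cilium hubble port-forward):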

hubble observe --namespace payments --last 50
payments/payments-api → identity/identity-api:8080 FORWARDED
payments/payments-api → monitoring/prometheus:9090 DROPPED (policy)

The second flow is being dropped because the default-deny policy in payments blocks egress to monitoring. If this was unexpected, you found a gap in your allow-list. If it was expected, you confirmed the policy is working.
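If the dropped flow turns out to be a gap rather than a confirmation, closing it is one more allow rule. This is a sketch under the assumption that payments-api really should reach Prometheus on port 9090 in the monitoring namespace; the policy name is hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-to-prometheus   # hypothetical name
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payments-api
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - port: 9090
          protocol: TCP
```

After applying it, the same hubble observe command should report that flow as FORWARDED instead of DROPPED.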

This is the difference between writing a network policy and knowing it's enforced.


Kyverno: Admission Control

Kyverno intercepts every resource creation and update request. Before the API server writes to etcd, Kyverno can:

  • Validate — reject resources that don't meet a policy
  • Mutate — modify resources to add missing fields
  • Generate — create additional resources when a resource is created

Install Kyverno:

helm repo add kyverno https://kyverno.github.io/kyverno/
helm install kyverno kyverno/kyverno \
  --namespace kyverno \
  --create-namespace \
  --wait

Requiring resource limits

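Every Deployment that lands in an application namespace should have resource limits. Without limits, a runaway pod can consume all available CPU on a node and starve neighbours. This policy enforces it at admission. The rule matches Pods; Kyverno's rule auto-generation extends Pod rules to controllers such as Deployments, so a non-compliant Deployment is rejected too: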

manifests/kyverno/require-limits.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-container-limits
      match:
        any:
          - resources:
              kinds: ["Pod"]
              namespaceSelector:
                matchExpressions:
                  - key: platform.io/managed
                    operator: In
                    values: ["true"]
      validate:
        message: "All containers must have CPU and memory limits."
        pattern:
          spec:
            containers:
              - (name): "*"
                resources:
                  limits:
                    memory: "?*"
                    cpu: "?*"

The namespaceSelector scopes this to namespaces with platform.io/managed: "true". Platform namespaces like kube-system and monitoring are exempt. Application namespaces are labelled at creation time.
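The validate rule rejects Pods without limits. The alternative mentioned at the start of this article, mutating them to add defaults, might look like the following sketch. The policy name and the 250m / 256Mi defaults are assumptions chosen for illustration, not values from this series.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-resource-limits   # hypothetical name
spec:
  rules:
    - name: set-default-limits
      match:
        any:
          - resources:
              kinds: ["Pod"]
              namespaceSelector:
                matchExpressions:
                  - key: platform.io/managed
                    operator: In
                    values: ["true"]
      mutate:
        patchStrategicMerge:
          spec:
            containers:
              - (name): "*"            # apply to every container
                resources:
                  limits:
                    # +(key) adds the field only when it is not already set
                    +(memory): "256Mi"  # illustrative default
                    +(cpu): "250m"      # illustrative default
```

Mutating admission runs before validating admission, so with both policies installed a Pod that omits limits would receive the defaults and then pass the validate rule rather than being rejected.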

Requiring team labels on alerts

From part three — every PrometheusRule must have a team label or Alertmanager can't route it:

manifests/kyverno/require-team-label.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-alert-team-label
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-prometheusrule-labels
      match:
        any:
          - resources:
              kinds: ["PrometheusRule"]
      validate:
        message: "PrometheusRule must have a 'team' label for Alertmanager routing."
        pattern:
          metadata:
            labels:
              team: "?*"

?* means "non-empty string". A PrometheusRule without a team label is rejected before it touches Prometheus. The platform team doesn't need to audit alert configs after the fact.
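For reference, a PrometheusRule that this policy would admit could look like the sketch below. The rule name, alert name, and metric are hypothetical; the only part the policy checks is the team label under metadata.labels.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: payments-api-alerts        # hypothetical name
  namespace: payments
  labels:
    team: payments                 # the label the Kyverno pattern requires
spec:
  groups:
    - name: payments-api
      rules:
        - alert: PaymentsApiDown   # hypothetical alert
          expr: up{job="payments-api"} == 0
          for: 5m
          labels:
            severity: critical
```

Removing the team label from metadata.labels should cause the admission request to fail with the message defined in the policy.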

Auto-generating NetworkPolicies

When a new application namespace is created, Kyverno can automatically create the default-deny policy:

manifests/kyverno/generate-default-deny.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: generate-default-deny
spec:
  rules:
    - name: create-default-deny
      match:
        any:
          - resources:
              kinds: ["Namespace"]
              selector:
                matchLabels:
                  platform.io/managed: "true"
      generate:
        apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        name: default-deny-all
        namespace: "{{request.object.metadata.name}}"
        synchronize: true
        data:
          spec:
            podSelector: {}
            policyTypes:
              - Ingress
              - Egress

synchronize: true means Kyverno will recreate this policy if someone manually deletes it. The default-deny baseline becomes permanent and automatic. New namespaces are secure by default.
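The trigger for this rule is nothing more than the namespace label. A minimal namespace that would receive the generated policy might look like this; the name orders is a hypothetical example.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: orders                     # hypothetical application namespace
  labels:
    platform.io/managed: "true"    # matches the selector in generate-default-deny
```

Shortly after creation, kubectl get networkpolicy -n orders should list default-deny-all without anyone having applied it by hand.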


The Enforcement Model

What you've built across parts three and four is a layered enforcement model:

Layer            Tool                  What it enforces
Admission        Kyverno               Resource shape, required labels, generated resources
Network (L3/L4)  Cilium NetworkPolicy  Pod-to-pod connectivity
Network (L7)     CiliumNetworkPolicy   HTTP method, path, gRPC method
Observability    Hubble                Real-time flow visibility and policy verification

The critical insight is that enforcement and observability must be paired. Cilium without Hubble means you can enforce policies you can't debug. Without admission validation from Kyverno, you catch problems in production instead of at deployment time. The combination turns "our policies say this should be safe" into "we can see that it is safe".


What's Next

Your cluster is secure, observable, and properly routed. Now point it at a real workload: an LLM inference server. Part five covers which metrics actually matter for LLM performance (time to first token, TTFT, is your SLO, not throughput) and which frameworks give you observability out of the box.
