Multi-Tenant Observability: LGTM at Platform Scale

Your tenants want dashboards. Your security team wants data isolation. Your SREs want a single pane of glass. Your engineering director wants an alert that pages the right team, not every team.

These are not opposing requirements. They feel that way because most observability setups on Kubernetes are built for one team at a time and then stretched to cover ten. You end up with a single Prometheus instance where every team's metrics are visible to every other team. A single Grafana organisation where someone deleted the production dashboard at 2am and nobody knows who. Alert rules in a flat file that routes everything to a shared Slack channel.

The LGTM stack — Loki for logs, Grafana for dashboards, Tempo for traces, Mimir for long-term metrics — is designed for multi-tenancy. Most teams install it and ignore that design entirely. This article covers what it actually looks like when you use it correctly.


The Multi-Tenant O11y Problem

The core problem in multi-tenant observability is data ownership combined with visibility scope:

  • A tenant should see their own metrics, logs, and traces
  • A tenant should not see another tenant's data
  • The platform team should see everything
  • Alerts should route to the team responsible for the service, not the platform team

Each of these is solvable. The mistake is solving them independently with different tools, resulting in four inconsistent access control models that all break in different ways when someone changes team structure.

The LGTM approach is to enforce tenancy at the data layer — in Loki and Mimir — and then project it upward through Grafana's RBAC and folder structure. One consistent model from ingestion to dashboard.


Deploying the Stack

Add the Helm repos and create the two namespaces the stack uses:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
kubectl create namespace monitoring
kubectl create namespace tracing

Prometheus and Grafana

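The kube-prometheus-stack chart is the practical starting point. It bundles Prometheus, Alertmanager, and Grafana with sane defaults for Kubernetes monitoring. Prometheus itself has no tenant model, so in this setup metrics are separated by label rather than isolated per tenant; hard isolation for metrics arrives with Mimir, which is deferred until later.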

helm/values-kube-prometheus.yaml
grafana:
  enabled: true
  adminPassword: "localdev"
  service:
    type: ClusterIP
  grafana.ini:
    auth:
      disable_login_form: false
    auth.basic:
      enabled: true
    users:
      allow_sign_up: false
  sidecar:
    datasources:
      enabled: true
    dashboards:
      enabled: true
      searchNamespace: ALL
      folderAnnotation: grafana_folder
      provider:
        # Must be true, or the folder annotation is ignored
        foldersFromFilesStructure: true
  additionalDataSources:
    - name: Loki
      uid: loki  # referenced by tracesToLogsV2.datasourceUid below
      type: loki
      url: http://loki.monitoring:3100
      access: proxy
    - name: Tempo
      type: tempo
      url: http://tempo.tracing:3100
      access: proxy
      jsonData:
        tracesToLogsV2:
          datasourceUid: "loki"
          filterByTraceID: true
 
prometheus:
  prometheusSpec:
    retention: 30d
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 20Gi
    serviceMonitorSelectorNilUsesHelmValues: false
    serviceMonitorSelector: {}
    serviceMonitorNamespaceSelector: {}
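    podMonitorSelectorNilUsesHelmValues: false
    podMonitorSelector: {}
    podMonitorNamespaceSelector: {}
    # Discover PrometheusRule objects in every namespace, not only chart-labelled ones
    ruleSelectorNilUsesHelmValues: false
    ruleSelector: {}
    ruleNamespaceSelector: {}
    # Stamped on every series and alert that leaves this Prometheus
    externalLabels:
      cluster: platform-local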
 
alertmanager:
  config:
    global:
      resolve_timeout: 5m
    route:
      group_by: ['alertname', 'namespace']
      group_wait: 10s
      group_interval: 5m
      repeat_interval: 4h
      receiver: 'platform-team'
      routes:
        - matchers:
            - team="payments"
          receiver: 'payments-team'
        - matchers:
            - team="identity"
          receiver: 'identity-team'
    receivers:
      - name: 'platform-team'
        slack_configs:
          - channel: '#platform-alerts'
            api_url: '<to fill out: Slack webhook>'
      - name: 'payments-team'
        slack_configs:
          - channel: '#payments-oncall'
            api_url: '<to fill out: Slack webhook>'
      - name: 'identity-team'
        slack_configs:
          - channel: '#identity-oncall'
            api_url: '<to fill out: Slack webhook>'

The externalLabels.cluster addition is not cosmetic — when you federate multiple clusters into Mimir later, this label is what lets you filter by cluster in Grafana. Add it from day one.
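
As a rough sketch of where that label ends up, the fragment below shows what shipping this Prometheus's samples to Mimir could look like. The remoteWrite and headers fields belong to the Prometheus operator's spec. The Mimir service URL and the tenant ID platform are assumptions for illustration, since Mimir is not deployed in this article.

```yaml
# Sketch only: Mimir is not installed in this article.
prometheus:
  prometheusSpec:
    externalLabels:
      cluster: platform-local        # travels with every remote-written series
    remoteWrite:
      - url: http://mimir-nginx.monitoring.svc/api/v1/push   # hypothetical service name
        headers:
          X-Scope-OrgID: platform    # assumed tenant ID for cluster-level metrics
```

Once several clusters write to the same Mimir tenant, a selector such as {cluster="platform-local"} is what separates them in Grafana.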

helm install kube-prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values helm/values-kube-prometheus.yaml \
  --wait --timeout 5m

Loki with Multi-Tenancy

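Loki's multi-tenancy model is based on tenant IDs passed in the X-Scope-OrgID HTTP header. With auth_enabled: true, Loki requires the header on every request and scopes writes and reads to the tenant it names. Loki does not authenticate the caller, though: it trusts whatever header it receives. Isolation therefore holds only if tenants cannot reach Loki directly and something in front of it (Grafana, or an authenticating gateway) decides which tenant ID each user gets.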

For a platform setup, the Promtail agents run per-node and tag logs with namespace labels. Loki's per-tenant storage means you can give teams read access to their own org ID and nothing else.
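
On the read side, a Grafana data source that sends no X-Scope-OrgID is rejected once auth_enabled is true, so each tenant needs a data source that carries its ID. A minimal sketch, assuming one data source per tenant added to the kube-prometheus-stack values; the names and uid values are illustrative, while httpHeaderName1 and httpHeaderValue1 are Grafana's standard custom-header provisioning fields:

```yaml
grafana:
  additionalDataSources:
    - name: Loki (payments)          # illustrative name
      uid: loki-payments             # illustrative uid
      type: loki
      url: http://loki.monitoring:3100
      access: proxy
      jsonData:
        httpHeaderName1: X-Scope-OrgID
      secureJsonData:
        httpHeaderValue1: payments   # tenant ID this data source queries as
    - name: Loki (identity)
      uid: loki-identity
      type: loki
      url: http://loki.monitoring:3100
      access: proxy
      jsonData:
        httpHeaderName1: X-Scope-OrgID
      secureJsonData:
        httpHeaderValue1: identity
```

Who is allowed to use which data source is a separate Grafana permission question and is not shown here.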

helm/values-loki.yaml
loki:
  auth_enabled: true
  storage:
    type: filesystem
  commonConfig:
    replication_factor: 1
  limits_config:
    retention_period: 720h  # 30 days
    ingestion_rate_mb: 10
    ingestion_burst_size_mb: 20
    # Per-tenant limits enforced here
    per_tenant_override_config: /etc/loki/overrides.yaml
 
promtail:
  enabled: true
  config:
    clients:
      - url: http://loki.monitoring:3100/loki/api/v1/push
        tenant_id: "platform"  # default tenant; override per-namespace via pipeline stages
    snippets:
      pipelineStages:
        - docker: {}
        - match:
            selector: '{namespace="payments"}'
            stages:
              - tenant:
                  value: "payments"
        - match:
            selector: '{namespace="identity"}'
            stages:
              - tenant:
                  value: "identity"

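The match + tenant pipeline stages are how you route logs from different namespaces to different Loki tenants. Promtail reads the namespace label (attached by its Kubernetes service discovery relabelling, not by the container runtime) and sets the X-Scope-OrgID header accordingly on the push request. Anything that matches no selector falls through to the default platform tenant. The docker: {} stage parses Docker's JSON log format; on containerd nodes, use cri: {} instead.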
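
If one match block per team becomes tedious, the tenant stage can also take the tenant ID from a label. The sketch below assumes a Promtail release whose tenant stage supports the label option, containerd nodes, and that using the namespace name as the tenant ID is acceptable. That last point is a different convention from the per-team IDs above, because every namespace, including system ones, becomes its own tenant.

```yaml
promtail:
  config:
    snippets:
      pipelineStages:
        - cri: {}                 # assumes containerd; use docker: {} on Docker nodes
        - tenant:
            label: namespace      # tenant ID is read from the namespace label
```

The explicit match blocks remain the better fit when several namespaces belong to one team.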

helm install loki grafana/loki-stack \
  --namespace monitoring \
  --values helm/values-loki.yaml \
  --wait
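
The values file points per_tenant_override_config at /etc/loki/overrides.yaml without showing its contents. A sketch of what that file could hold follows; the numbers are assumptions, not recommendations, and mounting the file into the Loki pod is not shown.

```yaml
# /etc/loki/overrides.yaml (illustrative values)
overrides:
  payments:
    ingestion_rate_mb: 20
    ingestion_burst_size_mb: 40
    retention_period: 2160h   # 90 days, assumed requirement for this tenant
  identity:
    ingestion_rate_mb: 5
    ingestion_burst_size_mb: 10
```

Tenants without an entry keep the defaults from limits_config.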

Tempo for Distributed Tracing

helm/values-tempo.yaml
tempo:
  storage:
    trace:
      backend: local
  retention: 336h  # 14 days
  receivers:  # the chart's key is plural
    otlp:
      protocols:
        http:
          endpoint: "0.0.0.0:4318"
        grpc:
          endpoint: "0.0.0.0:4317"
  multitenancyEnabled: true  # chart value; rendered as multitenancy_enabled in Tempo's config

With multitenancy enabled, Tempo also uses X-Scope-OrgID for trace isolation. Traces pushed with X-Scope-OrgID: payments are visible only when querying as the payments tenant.
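
On the write side, something has to attach that header to each trace export. One option is an OpenTelemetry Collector per tenant. The sketch below is a minimal Collector config under that assumption: the tempo.tracing:4317 endpoint follows the install in this article, and plaintext gRPC inside the cluster is assumed.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # applications in the payments namespace send here
exporters:
  otlp:
    endpoint: tempo.tracing:4317
    tls:
      insecure: true             # assumption: no TLS between collector and Tempo
    headers:
      X-Scope-OrgID: payments    # tenant ID stamped on every export
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
```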

helm install tempo grafana/tempo \
  --namespace tracing \
  --values helm/values-tempo.yaml \
  --wait

Dashboard Management at Platform Scale

The most common observability failure on multi-tenant platforms isn't bad metrics — it's bad dashboard governance. Someone creates a dashboard manually in the UI, forgets to save it to a folder, it gets deleted, and now the SRE on call can't see the panels they built last month.

Grafana has a first-class solution for this: provisioning via ConfigMaps.

manifests/grafana-dashboards/platform-overview.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dashboard-platform-overview
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
  annotations:
    grafana_folder: "Platform"
data:
  platform-overview.json: |
    <to fill out: export JSON from Grafana UI → Share → Export>

The grafana_dashboard: "1" label triggers the Grafana sidecar to auto-import this ConfigMap as a dashboard. The grafana_folder annotation places it in the right folder in the UI. It has to be an annotation rather than a label, because the sidecar is configured with folderAnnotation, and the folder is only honoured when provider.foldersFromFilesStructure is true in the Grafana values. Every dashboard is a ConfigMap. Every ConfigMap is in Git. Nobody deletes a dashboard at 2am without leaving a trace.

For tenant dashboards:

metadata:
  labels:
    grafana_dashboard: "1"
  annotations:
    grafana_folder: "Payments Team"

The folder structure in Grafana mirrors your team structure. Platform team dashboards are in the Platform folder. The Payments team gets their folder. RBAC in Grafana maps folders to roles, so the payments team can only edit dashboards in their folder.


Alert Management: Routing That Actually Works

The Alertmanager config above routes based on a team label. This only works if your alert rules set that label:

manifests/alerts/payments-alerts.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: payments-slos
  namespace: payments
spec:
  groups:
    - name: payments.slos
      rules:
        - alert: PaymentsP99LatencyHigh
          expr: |
            histogram_quantile(0.99,
              rate(http_request_duration_seconds_bucket{namespace="payments"}[5m])
            ) > 0.5
          for: 5m
          labels:
            severity: warning
            team: payments       # <-- this is what Alertmanager routes on
            namespace: payments
          annotations:
            summary: "Payments API p99 latency above 500ms"
            description: "P99 is {{ $value | humanizeDuration }}"

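PrometheusRule resources in application namespaces are picked up by the Prometheus operator only when rule discovery is opened up the same way as ServiceMonitor discovery: ruleSelectorNilUsesHelmValues: false with an empty ruleSelector and ruleNamespaceSelector in prometheusSpec. The serviceMonitor* settings do not cover rules; without the rule equivalents, the chart only loads rules labelled with its own release name. With discovery open, application teams own their alert definitions; the platform team owns the routing config.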

The trap most teams fall into: the platform team writes all the alert rules, which means they get paged for application failures they don't own. Get the team label on every rule from day one.
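
Once the team label is reliable, the same routing tree can split further inside a team. The fragment below is a sketch to merge by hand into the payments route: it sends critical alerts to a pager while warnings stay in Slack. The payments-pager receiver and the choice of PagerDuty are assumptions, not part of the config earlier in this article.

```yaml
alertmanager:
  config:
    route:
      routes:
        - matchers:
            - team="payments"
          receiver: 'payments-team'          # warnings stay in Slack
          routes:
            - matchers:
                - severity="critical"
              receiver: 'payments-pager'     # hypothetical receiver
    receivers:
      - name: 'payments-pager'
        pagerduty_configs:
          - routing_key: '<to fill out: PagerDuty integration key>'
```

Child routes are evaluated only for alerts that already matched team="payments", so another team's critical alert never reaches this pager.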


What the Platform Gives, What Tenants Own

This is the model that scales:

Platform team provides:

  • Prometheus scrape infrastructure
  • Loki and Tempo ingestion endpoints
  • Grafana instance with org-level RBAC
  • Default dashboards for cluster health (nodes, control plane, namespaces)
  • Alert routing infrastructure (Alertmanager, receiver configuration)
  • ServiceMonitor CRDs and discovery config
  • Multi-tenant isolation at the data layer

Tenants own:

  • Their ServiceMonitor and PodMonitor resources
  • Their PrometheusRule resources (with mandatory team label)
  • Their Grafana dashboards (as ConfigMaps in their namespace, auto-imported)
  • Their OTEL instrumentation and trace data
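
As an illustration of the tenant side, a ServiceMonitor owned by the payments team might look like the sketch below. The service name, label, and port name are hypothetical. It relies on the open serviceMonitorSelector settings from the Prometheus values, so no chart-specific label is needed.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: payments-api            # hypothetical service
  namespace: payments
spec:
  selector:
    matchLabels:
      app: payments-api         # assumed label on the Service
  endpoints:
    - port: http-metrics        # assumed named port on the Service
      path: /metrics
      interval: 30s
```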

Tenants never touch:

  • The Prometheus operator config (platform only)
  • Alertmanager receiver configs for other teams
  • Dashboards in other teams' Grafana folders

This isn't just a policy — it's enforced by the access control model. Kyverno policies (covered in part four) can block PrometheusRule resources without a valid team label at admission time. The platform team doesn't need to audit rules after the fact.
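
To make that concrete ahead of part four, here is one possible shape for such a policy. It is a sketch under assumptions, not the policy from that article: the policy and rule names are hypothetical, and it only checks that a team label is present on alerting rules, not that its value names a real team.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-team-label-on-alerts    # hypothetical name
spec:
  validationFailureAction: Enforce      # reject at admission instead of auditing
  background: false
  rules:
    - name: alert-rules-need-team
      match:
        any:
          - resources:
              kinds:
                - PrometheusRule
      validate:
        message: "Every alerting rule must set labels.team."
        foreach:
          # Flatten the rules of all groups into one list
          - list: "request.object.spec.groups[].rules[]"
            deny:
              conditions:
                all:
                  # Only alerting rules; recording rules have no alert field
                  - key: "{{ element.alert || '' }}"
                    operator: NotEquals
                    value: ""
                  # Deny when the team label is missing or empty
                  - key: "{{ element.labels.team || '' }}"
                    operator: Equals
                    value: ""
```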


What's Next

You can see everything happening in the cluster. You can't yet control what's allowed to happen. Part four covers network policies with Cilium and admission control with Kyverno — the enforcement layer that turns good intentions into guaranteed behaviour.
