
SUSE Cloud Native Foundations: My Study Notes


I completed the SUSE Cloud Native Foundations scholarship through Udacity. These are my lesson notes, structured for reference, not for reading top to bottom.

I kept the parts that felt operationally useful and skipped most of the certification fluff.


Lesson 2: Cloud Native Application Design

Context discovery

Before writing a line of code, it's worth doing a proper context discovery pass. Two things to nail down:

Functional requirements — what the application actually needs to do:

  • Who are the stakeholders?
  • What are the core functionalities?
  • Who are the end users (customer-facing vs. internal tool)?
  • What are the inputs and outputs?
  • Which engineering teams are involved?

Available resources — what you actually have to work with:

  • Engineering headcount and skill set
  • Budget and financial constraints
  • Timelines
  • Existing internal knowledge

If you skip this step, you usually pay for it later in rework.

Monoliths and microservices

Most business applications still resolve into the same three tiers regardless of architecture:

  • UI — handles HTTP requests and returns a response
  • Business logic — the code that provides the actual service
  • Data layer — access and storage of data

The difference is how those tiers are packaged and deployed.

Monolith: all tiers are part of the same unit — one repository, shared resources (CPU/memory), one programming language, one release binary.

Microservice: each tier (or sub-component) is an independent unit — separate repositories, isolated resource allocation, a well-defined API surface, language of choice, its own release binary.

Trade-offs

  • Development complexity — monolith: one language, one repo, sequential; microservices: multiple languages/repos, concurrent
  • Scalability — monolith: replicate the entire stack; microservices: replicate only the hot unit
  • Time to deploy — monolith: one pipeline, higher risk per release; microservices: many pipelines, lower risk per release
  • Flexibility — monolith: restructuring required for new features; microservices: change an independent unit
  • Operational cost — monolith: low initially, exponential at scale; microservices: high initially, proportional at scale
  • Reliability — monolith: whole stack fails together, low observability; microservices: isolated failures, high per-unit visibility

Neither is universally better. The right choice depends on team size, traffic patterns, and how the application will be maintained at scale. The architecture will also change over time: services get split, merged, replaced, or retired as the product matures.

Maintenance operations

Once in production, architectures change. The common operations:

  • Split — a service has grown too large and complex; break it into smaller, manageable units
  • Merge — two closely coupled services make more sense as one
  • Replace — a more efficient implementation is available (e.g., rewriting a Java service in Go for latency gains)
  • Stale — a service no longer provides business value; archive or deprecate it

Same job, different shape: keep the system useful without making it miserable to run.

Application best practices

Regardless of architecture, apply these practices across every service to improve resilience, reduce time to recovery, and enable observability.

Health checks

Expose an HTTP endpoint (typically /healthz or /status) that returns the current health state. Kubernetes uses readiness checks to decide whether to send traffic to a Pod, and liveness checks to decide when to restart it.

from flask import Flask
import json

app = Flask(__name__)

@app.route('/status')
def status():
    response = app.response_class(
        response=json.dumps({"result": "OK - healthy"}),
        status=200,
        mimetype='application/json'
    )
    return response

Metrics

Expose a /stats endpoint reporting runtime statistics if you want a simple application view, or a Prometheus-compatible /metrics endpoint if you want standard scraping. What matters is that your platform can scrape it and your team can act on it.

@app.route('/stats')
def stats():
    response = app.response_class(
        response=json.dumps({
            "status": "success",
            "code": 0,
            "data": {"UserCount": 140, "UserCountActive": 23}
        }),
        status=200,
        mimetype='application/json'
    )
    return response

Logs

Log to STDOUT and STDERR. A logging tool or node-level agent can collect from there without coupling the application to local files. Standard log levels:

  • DEBUG — fine-grained process events
  • INFO — coarse-grained operational info
  • WARN — potential issue, not yet an error
  • ERROR — error encountered, application still running
  • FATAL — critical failure, application not operational

Always include a timestamp on every log line.

import logging

# include a timestamp on every log line
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s %(levelname)s %(message)s'
)

@app.route('/status')
def healthcheck():
    app.logger.info('Status request successful')
    ...

Tracing

Tracing builds a full picture of how a request flows through multiple services. Individual service records are spans; a collection of spans forms a trace. Jaeger is the common implementation in Kubernetes environments.
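The span/trace relationship is easy to internalise with a plain-data sketch. This is not the Jaeger or OpenTelemetry API; the field names are made up for illustration:

```python
import time
import uuid

# Illustrative data model only; real tracers (Jaeger, OpenTelemetry)
# manage this for you. Field names here are invented for the sketch.
def make_span(trace_id, name, parent_id=None):
    return {
        "trace_id": trace_id,         # shared by every span in one trace
        "span_id": uuid.uuid4().hex,  # unique per service hop
        "parent_id": parent_id,       # links spans into a call tree
        "name": name,
        "start": time.time(),
    }

trace_id = uuid.uuid4().hex
root = make_span(trace_id, "GET /checkout")
child = make_span(trace_id, "charge-card", parent_id=root["span_id"])

# a trace is simply the collection of spans sharing one trace_id
trace = [root, child]
```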

Resource consumption

Know your CPU and memory baselines. Benchmark network throughput. Without resource awareness, you can't set meaningful Kubernetes requests and limits and the scheduler is flying blind.
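One quick way to get a memory baseline for a Python process is the standard library's resource module (a sketch, not a profiler; note that ru_maxrss is reported in kilobytes on Linux but bytes on macOS):

```python
import resource

# Peak resident set size of the current process so far.
# Units differ by OS: kilobytes on Linux, bytes on macOS.
usage = resource.getrusage(resource.RUSAGE_SELF)
print(f"peak RSS: {usage.ru_maxrss}")
```

Numbers like these are exactly what feed into the requests and limits you set later in a Deployment manifest.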


Lesson 3: Container Orchestration with Kubernetes

Docker for application packaging

Three moving parts: Dockerfile, Docker image, Docker registry.

Dockerfile

A set of instructions that produces a layered image. Each instruction creates a layer; layers are cached. Change a layer early in the file and everything after it rebuilds.

Core instructions:

FROM    # set the base image
RUN     # execute a command during build
COPY    # copy files from host to container filesystem
ADD     # like COPY, but also handles URLs and tar extraction
CMD     # default command to run when the container starts
EXPOSE  # document the port the application listens on

Example — packaging a Go application:

FROM golang:alpine
 
WORKDIR /go/src/app
 
ADD . .
 
RUN go build -o helloworld
 
EXPOSE 6111
 
CMD ["./helloworld"]
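Since layers are cached top-down, instruction order matters. A cache-friendlier variant of the same build (assuming a hypothetical project that uses Go modules with go.mod/go.sum files) copies the dependency manifests before the source, so the dependency layers survive source-only changes:

```dockerfile
FROM golang:alpine

WORKDIR /go/src/app

# copy only the module files first: these layers stay cached
# until go.mod or go.sum actually changes
COPY go.mod go.sum ./
RUN go mod download

# source changes invalidate the cache only from this point down
COPY . .
RUN go build -o helloworld

EXPOSE 6111

CMD ["./helloworld"]
```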

Docker image

Build and run:

# build from current directory, tag as go-helloworld
docker build -t go-helloworld .
 
# run in detached mode, map host port 5111 to container port 6111
docker run -d -p 5111:6111 go-helloworld
 
# retrieve container logs
docker logs <CONTAINER_ID>

Docker registry

Tag before pushing. An untagged image gets a non-human-readable ID; a tag provides registry/repo/name:version.

# tag for DockerHub
docker tag go-helloworld pixelpotato/go-helloworld:v1.0.0
 
# push
docker push pixelpotato/go-helloworld:v1.0.0

Public registries: DockerHub, GitHub Container Registry. Private: GCR, ECR, Harbor, Artifact Registry.

Docker command reference

docker build [OPTIONS] PATH          # build an image
docker run [OPTIONS] IMAGE           # run a container
docker logs CONTAINER_ID             # get container logs
docker images                        # list available images
docker ps                            # list running containers
docker tag SOURCE_IMAGE TARGET_IMAGE # tag an image
docker login                         # authenticate to DockerHub
docker push NAME[:TAG]               # push to registry
docker pull NAME[:TAG]               # pull from registry

Kubernetes architecture

Kubernetes is a container orchestrator. You declare desired state; Kubernetes works continuously to achieve and maintain it.

A cluster is made up of nodes — physical or virtual servers. Nodes split into two planes:

  • Control plane (master nodes) — makes cluster-wide decisions
  • Data plane (worker nodes) — hosts workloads

Control plane components

  • kube-apiserver — the nucleus. Exposes the Kubernetes API; all operations flow through it. Validates and persists state to etcd.
  • etcd — distributed key-value store. The source of truth for the entire cluster. Back it up.
  • kube-scheduler — watches for unscheduled Pods and assigns them to nodes based on resource availability, affinity, taints, and tolerations.
  • kube-controller-manager — runs the control loops (Deployment, ReplicaSet, Node controllers, etc.). Each loop reconciles desired vs. actual state.

Data plane components

  • kubelet — runs on every node. Receives PodSpecs from the API server and ensures described containers are running and healthy.
  • kube-proxy — manages network rules on each node; routes traffic to the correct Pod for a given Service.

kubelet and kube-proxy are installed on all nodes — master and worker alike.


Bootstrapping a cluster

Provisioning manually is error-prone. Tooling handles this automatically.

Production-grade: kubeadm, Kubespray, Kops, k3s

Development-grade: kind, minikube, k3d

For local dev with k3s via Vagrant:

vagrant status   # inspect available boxes
vagrant up       # spin up the box
vagrant ssh      # SSH in

Kubeconfig

Grants access to a cluster. Default location: ~/.kube/config. k3s places it at /etc/rancher/k3s/k3s.yaml.

Three sections:

  • Cluster — cluster name, API server endpoint, CA certificate
  • User — credentials (username/password, token, or client certificates)
  • Context — links a user to a cluster; current-context sets the active one

Example kubeconfig:
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: {{ CA }}
    server: https://127.0.0.1:63668
  name: udacity-cluster
users:
- name: udacity-user
  user:
    client-certificate-data: {{ CERT }}
    client-key-data: {{ KEY }}
- name: green-user
  user:
    token: {{ TOKEN }}
contexts:
- context:
    cluster: udacity-cluster
    user: udacity-user
  name: udacity-context
current-context: udacity-context

Inspect the cluster once the kubeconfig is in place:
kubectl cluster-info                    # control plane and add-on endpoints
kubectl get nodes                       # list all nodes
kubectl get nodes -o wide               # with internal IPs and container runtime
kubectl describe node <NODE_NAME>       # full node config including pod CIDR

Kubernetes resources

Pods

The atomic unit. A Pod wraps one or more containers that share a network namespace and storage. The 1:1 Pod-to-container ratio is the recommended default. Don't run bare Pods in production; they won't be rescheduled if the node dies.

# headless pod for testing
kubectl run -it busybox-test --image=busybox --restart=Never

Deployments and ReplicaSets

A Deployment describes the desired state of an application. It manages a ReplicaSet, which ensures the specified number of replicas are running at all times.

Rolling update strategies:

  • RollingUpdate — replaces pods incrementally; supports maxSurge and maxUnavailable
  • Recreate — kills all existing pods before creating new ones

kubectl create deploy go-helloworld --image=pixelpotato/go-helloworld:v1.0.0 -n test

Full Deployment manifest with probes and resource limits:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: go-helloworld
  name: go-helloworld
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: go-helloworld
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: go-helloworld
    spec:
      containers:
      - image: pixelpotato/go-helloworld:v2.0.0
        imagePullPolicy: IfNotPresent
        name: go-helloworld
        ports:
        - containerPort: 6112
          protocol: TCP
        livenessProbe:
          httpGet:
            path: /
            port: 6112
        readinessProbe:
          httpGet:
            path: /
            port: 6112
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"

readinessProbe gates traffic. livenessProbe restarts stuck pods. For long-running production services, both should be treated as standard, not optional extras.

Services

A Service provides a stable virtual IP and DNS name for a set of Pods. Pod IPs are ephemeral; the Service abstracts over that churn.

  • ClusterIP — internal only; service-to-service traffic (the default)
  • NodePort — node IP + static port; direct external access for dev/testing
  • LoadBalancer — cloud load balancer; production external ingress

kubectl expose deploy go-helloworld --port=8111 --target-port=6112

Full Service manifest:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: go-helloworld
  name: go-helloworld
  namespace: default
spec:
  ports:
  - port: 8111
    protocol: TCP
    targetPort: 6112
  selector:
    app: go-helloworld
  type: ClusterIP

Ingress

Manages external HTTP/HTTPS access to services. An Ingress Controller reads the rules and configures the load balancer.

Request flow: external user → load balancer → Ingress Controller (applying the Ingress rules) → Service → Pod

ConfigMaps and Secrets

ConfigMaps store non-confidential key-value pairs. Secrets store sensitive data, but the values are only base64-encoded by default. If etcd encryption at rest is not enabled, that data is not meaningfully protected. In production, enable encryption at rest and preferably use an external secrets system.

kubectl create configmap test-cm --from-literal=color=yellow
kubectl create secret generic test-secret --from-literal=color=blue
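The "only base64-encoded" point is worth internalising: base64 is an encoding, not encryption, and anyone with read access to the Secret can reverse it:

```python
import base64

# what Kubernetes stores for the Secret value "blue"
encoded = base64.b64encode(b"blue").decode()
print(encoded)   # Ymx1ZQ==

# anyone who can read the Secret can decode it instantly
decoded = base64.b64decode(encoded).decode()
print(decoded)   # blue
```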

Namespaces

Logical separation within a cluster by team, environment, or tenant. Resource quotas and network policies apply per namespace, which limits noisy-neighbour resource contention.

kubectl create ns test-udacity
kubectl get po -n test-udacity

Imperative vs. declarative management

Imperative — kubectl create, kubectl run, kubectl expose run directly against the live cluster. Fast for development; not version-controlled, not repeatable.

Declarative — YAML manifests applied with kubectl apply -f. Recommended for production. Manifests live in Git; changes are auditable.

Every YAML manifest has four required sections:

apiVersion:  # API version for the resource type
kind:        # resource type (Deployment, Service, ConfigMap, etc.)
metadata:    # name, namespace, labels
spec:        # desired configuration state

# apply all manifests in a directory
kubectl apply -f exercises/manifests/
 
# delete resources defined in a manifest
kubectl delete -f manifest.yaml
 
# generate a manifest template without creating the resource
kubectl create deploy demo --image=nginx --dry-run=client -o yaml

kubectl command reference

kubectl create RESOURCE NAME [FLAGS]        # create a resource
kubectl describe RESOURCE NAME              # detailed resource info
kubectl get RESOURCE NAME [-o yaml]         # get resource (optionally as YAML)
kubectl edit RESOURCE NAME                  # edit resource in-place
kubectl label RESOURCE NAME [PARAMS]        # add or update labels
kubectl port-forward RESOURCE/NAME [PARAMS] # forward a local port to a pod
kubectl logs RESOURCE/NAME [FLAGS]          # stream or retrieve logs
kubectl delete RESOURCE NAME                # delete a resource

Failure modes

Kubernetes handles low-level failures automatically:

  • ReplicaSets — maintain the desired replica count
  • Liveness probes — restart pods in an errored state
  • Readiness probes — remove unhealthy pods from load balancer rotation
  • Services — single stable entry point across pod churn

Control plane failure is a separate category. Applications continue running and handling traffic, but no new workloads can be scheduled and no configuration changes can be applied. Recovering the control plane is a critical priority, but its loss doesn't take down live traffic.


Lesson 4: Open Source PaaS

The problem PaaS solves

Running Kubernetes across multiple environments (sandbox, staging, production) and multiple regions compounds quickly. Three environments × three regions = nine clusters to upgrade, patch, and maintain. If you do not have a platform team, that is a fast way to manufacture operational overhead.

Cloud Foundry

Cloud Foundry is an application platform. Push source code; CF handles buildpacks, containerisation, routing, and scaling.

# target org and space
cf login -a https://api.example.com
cf target -o my-org -s production
 
# push an application
cf push my-app -b go_buildpack -m 256M -i 2
 
# scale horizontally
cf scale my-app -i 5
 
# tail logs
cf logs my-app --recent
 
# set environment variables
cf set-env my-app DB_HOST postgres.example.com
cf restage my-app

CF is opinionated: standard buildpacks, managed routing, one pipeline model. That's the value for standard web applications. The ceiling appears when you need fine-grained resource control, custom networking, or workloads that don't map cleanly to an HTTP process model.

Function as a Service

FaaS (AWS Lambda, GCP Cloud Functions) is the far end of the managed spectrum. You provide a function; the platform handles everything else.

Best suited for: event-driven, stateless, short-lived tasks. Not suited for: long-running processes, persistent connections, or complex warm-up requirements.
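To make the scale concrete: a whole FaaS deployable can be one function. A minimal sketch of a Lambda-style Python handler (the event shape, including the "name" key, is a hypothetical example, not a fixed contract):

```python
# Sketch of a Lambda-style handler. The "name" key in the event is a
# hypothetical example, not part of any platform contract.
def handler(event, context):
    name = event.get("name", "world")
    return {"statusCode": 200, "body": f"Hello, {name}!"}

print(handler({"name": "SUSE"}, None))
```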


Glossary

  • Monolith — application design where all tiers are managed as a single unit
  • Microservice — application design where tiers are independent, separately deployed units
  • Dockerfile — set of instructions used to build a Docker image
  • Docker image — read-only template for creating a runnable container
  • Docker registry — central mechanism to store and distribute Docker images
  • Node — a physical or virtual server in a Kubernetes cluster
  • Cluster — a collection of distributed nodes for managing and hosting workloads
  • Master node — control plane node; makes global cluster decisions
  • Worker node — data plane node; hosts application workloads
  • Bootstrap — process of provisioning a cluster so each node is fully operational
  • Kubeconfig — metadata file that grants access to a Kubernetes cluster
  • Pod — smallest deployable unit; provides the execution environment for a container
  • ReplicaSet — ensures a desired number of Pod replicas are running at all times
  • Deployment — describes and manages the desired state of an application
  • Service — stable network abstraction over a collection of Pods
  • Ingress — manages external HTTP/HTTPS access to cluster services
  • ConfigMap — stores non-confidential configuration data as key-value pairs
  • Secret — stores sensitive data as key-value pairs (base64-encoded)
  • Namespace — logical separation between applications and their resources
  • CRD — Custom Resource Definition; extends the Kubernetes API
  • Imperative config — managing resources via direct kubectl commands against the live cluster
  • Declarative config — managing resources via YAML manifests stored and version-controlled locally