I completed the SUSE Cloud Native Foundations scholarship through Udacity. These are my lesson notes, structured for reference, not for reading top to bottom.
I kept the parts that felt operationally useful and skipped most of the certification fluff.
Lesson 2: Cloud Native Application Design
Context discovery
Before writing a line of code, it's worth doing a proper context discovery pass. Two things to nail down:
Functional requirements — what the application actually needs to do:
- Who are the stakeholders?
- What are the core functionalities?
- Who are the end users (customer-facing vs. internal tool)?
- What are the inputs and outputs?
- Which engineering teams are involved?
Available resources — what you actually have to work with:
- Engineering headcount and skill set
- Budget and financial constraints
- Timelines
- Existing internal knowledge
If you skip this step, you usually pay for it later in rework.
Monoliths and microservices
Most business applications still resolve into the same three tiers regardless of architecture:
- UI — handles HTTP requests and returns a response
- Business logic — the code that provides the actual service
- Data layer — access and storage of data
The difference is how those tiers are packaged and deployed.
Monolith: all tiers are part of the same unit — one repository, shared resources (CPU/memory), one programming language, one release binary.
Microservice: each tier (or sub-component) is an independent unit — separate repositories, isolated resource allocation, a well-defined API surface, language of choice, its own release binary.
Trade-offs
| Dimension | Monolith | Microservices |
|---|---|---|
| Development complexity | One language, one repo, sequential | Multiple languages/repos, concurrent |
| Scalability | Replicate the entire stack | Replicate only the hot unit |
| Time to deploy | One pipeline, higher risk per release | Many pipelines, lower risk per release |
| Flexibility | Restructuring required for new features | Change an independent unit |
| Operational cost | Low initially, exponential at scale | High initially, proportional at scale |
| Reliability | Whole stack fails together; low observability | Isolated failures; high per-unit visibility |
Neither is universally better. The right choice depends on team size, traffic patterns, and how the application will be maintained at scale. The architecture will also change over time: services get split, merged, replaced, or retired as the product matures.
Maintenance operations
Once in production, architectures change. The common operations:
- Split — a service has grown too large and complex; break it into smaller, manageable units
- Merge — two closely coupled services make more sense as one
- Replace — a more efficient implementation is available (e.g., rewriting a Java service in Go for latency gains)
- Stale — a service no longer provides business value; archive or deprecate it
Same job, different shape: keep the system useful without making it miserable to run.
Application best practices
Regardless of architecture, apply these practices across every service to improve resilience, reduce time to recovery, and enable observability.
Health checks
Expose an HTTP endpoint (typically /healthz or /status) that returns the current health state. Kubernetes uses readiness checks to decide whether to send traffic to a Pod, and liveness checks to decide when to restart it.
@app.route('/status')
def status():
    response = app.response_class(
        response=json.dumps({"result": "OK - healthy"}),
        status=200,
        mimetype='application/json'
    )
    return response
Metrics
Expose a /stats endpoint reporting runtime statistics if you want a simple application view, or a Prometheus-compatible /metrics endpoint if you want standard scraping. What matters is that your platform can scrape it and your team can act on it.
@app.route('/stats')
def stats():
    response = app.response_class(
        response=json.dumps({
            "status": "success",
            "code": 0,
            "data": {"UserCount": 140, "UserCountActive": 23}
        }),
        status=200,
        mimetype='application/json'
    )
    return response
Logs
Log to STDOUT and STDERR. A logging tool or node-level agent can collect from there without coupling the application to local files. Standard log levels:
- DEBUG — fine-grained process events
- INFO — coarse-grained operational info
- WARN — potential issue, not yet an error
- ERROR — error encountered, application still running
- FATAL — critical failure, application not operational
Always include a timestamp on every log line.
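With Python's stdlib logging, a timestamp comes from the format string — a minimal sketch (the `%(asctime)s` and friends are standard LogRecord attributes):

```python
import logging

# %(asctime)s prepends a timestamp to every log line
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)

logging.info("Status request successful")
# emits something like: 2021-06-19 10:31:22,401 INFO Status request successful
```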
import logging

logging.basicConfig(level=logging.DEBUG)

@app.route('/status')
def healthcheck():
    app.logger.info('Status request successful')
    ...
Tracing
Tracing builds a full picture of how a request flows through multiple services. Individual service records are spans; a collection of spans forms a trace. Jaeger is the common implementation in Kubernetes environments.
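The span/trace relationship can be shown with a toy model (illustrative only — real tracing uses an SDK such as the OpenTelemetry or Jaeger client libraries, and these field names are assumptions):

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    service: str        # which service produced this record
    operation: str      # what that service did
    duration_ms: float
    children: list = field(default_factory=list)

# A trace is the tree of spans produced by one request
trace = Span("frontend", "GET /checkout", 120.0, children=[
    Span("cart-service", "load items", 40.0),
    Span("payment-service", "authorize card", 60.0),
])

# Walking the trace shows where request time was spent
slowest = max(trace.children, key=lambda s: s.duration_ms)
print(f"{slowest.service}: {slowest.duration_ms}ms")  # payment-service: 60.0ms
```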
Resource consumption
Know your CPU and memory baselines. Benchmark network throughput. Without resource awareness, you can't set meaningful Kubernetes requests and limits and the scheduler is flying blind.
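For a first memory number, the stdlib alone can report the process's peak resident set size — a sketch (note: `resource` is Unix-only, and `ru_maxrss` is kibibytes on Linux but bytes on macOS):

```python
import resource

# Do some representative work, then read the peak RSS
payload = [b"x" * 1024 for _ in range(10_000)]  # roughly 10 MiB retained

peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"peak RSS: {peak}")  # KiB on Linux, bytes on macOS

# Numbers like this feed Kubernetes requests/limits:
# request ~= steady-state usage, limit ~= observed peak plus headroom
```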
Lesson 3: Container Orchestration with Kubernetes
Docker for application packaging
Three moving parts: Dockerfile, Docker image, Docker registry.
Dockerfile
A set of instructions that produces a layered image. Each instruction creates a layer; layers are cached. Change a layer early in the file and everything after it rebuilds.
Core instructions:
FROM    # set the base image
RUN     # execute a command during build
COPY    # copy files from host to container filesystem
ADD     # like COPY, but also handles URLs and tar extraction
CMD     # default command to run when the container starts
EXPOSE  # document the port the application listens on
Example — packaging a Go application:
FROM golang:alpine
WORKDIR /go/src/app
ADD . .
RUN go build -o helloworld
EXPOSE 6111
CMD ["./helloworld"]
Docker image
Build and run:
# build from current directory, tag as go-helloworld
docker build -t go-helloworld .
# run in detached mode, map host port 5111 to container port 6111
docker run -d -p 5111:6111 go-helloworld
# retrieve container logs
docker logs <CONTAINER_ID>
Docker registry
Tag before pushing. An untagged image gets a non-human-readable ID; a tag provides registry/repo/name:version.
# tag for DockerHub
docker tag go-helloworld pixelpotato/go-helloworld:v1.0.0
# push
docker push pixelpotato/go-helloworld:v1.0.0
Public registries: DockerHub, GitHub Container Registry. Private: GCR, ECR, Harbor, Artifact Registry.
Docker command reference
docker build [OPTIONS] PATH              # build an image
docker run [OPTIONS] IMAGE               # run a container
docker logs CONTAINER_ID                 # get container logs
docker images                            # list available images
docker ps                                # list running containers
docker tag SOURCE_IMAGE TARGET_IMAGE     # tag an image
docker login                             # authenticate to DockerHub
docker push NAME[:TAG]                   # push to registry
docker pull NAME[:TAG]                   # pull from registry
Kubernetes architecture
Kubernetes is a container orchestrator. You declare desired state; Kubernetes works continuously to achieve and maintain it.
A cluster is made up of nodes — physical or virtual servers. Nodes split into two planes:
- Control plane (master nodes) — makes cluster-wide decisions
- Data plane (worker nodes) — hosts workloads
Control plane components
- kube-apiserver — the nucleus. Exposes the Kubernetes API; all operations flow through it. Validates and persists state to etcd.
- etcd — distributed key-value store. The source of truth for the entire cluster. Back it up.
- kube-scheduler — watches for unscheduled Pods and assigns them to nodes based on resource availability, affinity, taints, and tolerations.
- kube-controller-manager — runs the control loops (Deployment, ReplicaSet, Node controllers, etc.). Each loop reconciles desired vs. actual state.
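The reconcile idea behind every control loop can be sketched in a few lines (a toy illustration, not the actual controller-manager code):

```python
def reconcile(desired_replicas: int, running: list) -> list:
    """One reconciliation pass: observe actual state, move it toward desired."""
    actual = len(running)
    if actual < desired_replicas:
        # scale up: create the missing pods
        running = running + [f"pod-{i}" for i in range(actual, desired_replicas)]
    elif actual > desired_replicas:
        # scale down: remove the surplus
        running = running[:desired_replicas]
    return running

state = ["pod-0"]            # actual: 1 replica
state = reconcile(3, state)  # desired: 3
print(state)                 # ['pod-0', 'pod-1', 'pod-2']
```

Real controllers run this loop continuously, which is why deleting a managed Pod just results in a replacement appearing.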
Data plane components
- kubelet — runs on every node. Receives PodSpecs from the API server and ensures described containers are running and healthy.
- kube-proxy — manages network rules on each node; routes traffic to the correct Pod for a given Service.
kubelet and kube-proxy are installed on all nodes — master and worker alike.
Bootstrapping a cluster
Provisioning manually is error-prone. Tooling handles this automatically.
Production-grade: kubeadm, Kubespray, Kops, k3s
Development-grade: kind, minikube, k3d
For local dev with k3s via Vagrant:
vagrant status  # inspect available boxes
vagrant up      # spin up the box
vagrant ssh     # SSH in
Kubeconfig
Grants access to a cluster. Default location: ~/.kube/config. k3s places it at /etc/rancher/k3s/k3s.yaml.
Three sections:
- Cluster — cluster name, API server endpoint, CA certificate
- User — credentials (username/password, token, or client certificates)
- Context — links a user to a cluster; current-context sets the active one
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: {{ CA }}
    server: https://127.0.0.1:63668
  name: udacity-cluster
users:
- name: udacity-user
  user:
    client-certificate-data: {{ CERT }}
    client-key-data: {{ KEY }}
- name: green-user
  user:
    token: {{ TOKEN }}
contexts:
- context:
    cluster: udacity-cluster
    user: udacity-user
  name: udacity-context
current-context: udacity-context
kubectl cluster-info                 # control plane and add-on endpoints
kubectl get nodes                    # list all nodes
kubectl get nodes -o wide            # with internal IPs and container runtime
kubectl describe node <NODE_NAME>    # full node config including pod CIDR
Kubernetes resources
Pods
The atomic unit. A Pod wraps one or more containers that share a network namespace and storage. The 1:1 Pod-to-container ratio is the recommended default. Don't run bare Pods in production: they won't be rescheduled if the node dies.
# headless pod for testing
kubectl run -it busybox-test --image=busybox --restart=Never
Deployments and ReplicaSets
A Deployment describes the desired state of an application. It manages a ReplicaSet, which ensures the specified number of replicas are running at all times.
Rolling update strategies:
- RollingUpdate — replaces pods incrementally; supports maxSurge and maxUnavailable
- Recreate — kills all existing pods before creating new ones
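To make the percentages concrete: Kubernetes converts maxSurge to an absolute count by rounding up, and maxUnavailable by rounding down. A quick check:

```python
import math

def rolling_update_bounds(replicas: int, max_surge_pct: int, max_unavail_pct: int):
    """Return (max pods during rollout, min ready pods during rollout)."""
    surge = math.ceil(replicas * max_surge_pct / 100)            # rounded up
    unavailable = math.floor(replicas * max_unavail_pct / 100)   # rounded down
    return replicas + surge, replicas - unavailable

# 3 replicas with the 25% defaults:
high, low = rolling_update_bounds(3, 25, 25)
print(high, low)  # 4 3 -> at most 4 pods running, never fewer than 3 ready
```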
kubectl create deploy go-helloworld --image=pixelpotato/go-helloworld:v1.0.0 -n test
Full Deployment manifest with probes and resource limits:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: go-helloworld
  name: go-helloworld
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: go-helloworld
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: go-helloworld
    spec:
      containers:
      - image: pixelpotato/go-helloworld:v2.0.0
        imagePullPolicy: IfNotPresent
        name: go-helloworld
        ports:
        - containerPort: 6112
          protocol: TCP
        livenessProbe:
          httpGet:
            path: /
            port: 6112
        readinessProbe:
          httpGet:
            path: /
            port: 6112
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
readinessProbe gates traffic. livenessProbe restarts stuck pods. For long-running production services, both should be treated as standard, not optional extras.
Services
A Service provides a stable virtual IP and DNS name for a set of Pods. Pod IPs are ephemeral; the Service abstracts over that churn.
| Type | Scope | Use case |
|---|---|---|
ClusterIP | Internal only | Service-to-service (default) |
NodePort | Node IP + static port | Direct external access for dev/testing |
LoadBalancer | Cloud LB | Production external ingress |
kubectl expose deploy go-helloworld --port=8111 --target-port=6112
Full Service manifest:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: go-helloworld
  name: go-helloworld
  namespace: default
spec:
  ports:
  - port: 8111
    protocol: TCP
    targetPort: 6112
  selector:
    app: go-helloworld
  type: ClusterIP
Ingress
Manages external HTTP/HTTPS access to services. An Ingress Controller reads the rules and configures the load balancer.
Request flow: external user → LoadBalancer → Ingress Controller (applying the Ingress rules) → Service → Pod
ConfigMaps and Secrets
ConfigMaps store non-confidential key-value pairs. Secrets store sensitive data, but the values are only base64-encoded by default. If etcd encryption at rest is not enabled, that data is not meaningfully protected. In production, enable encryption at rest and preferably use an external secrets system.
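The base64 point is easy to demonstrate — it is encoding, not encryption, and anyone who can read the Secret object can reverse it:

```python
import base64

# What a Secret stores for --from-literal=password=hunter2
encoded = base64.b64encode(b"hunter2").decode()
print(encoded)  # aHVudGVyMg==

# Decoding requires no key whatsoever
print(base64.b64decode(encoded).decode())  # hunter2
```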
kubectl create configmap test-cm --from-literal=color=yellow
kubectl create secret generic test-secret --from-literal=color=blue
Namespaces
Logical separation within a cluster by team, environment, or tenant. Resource quotas and network policies apply per namespace, limiting noisy-neighbour resource contention.
kubectl create ns test-udacity
kubectl get po -n test-udacity
Imperative vs. declarative management
Imperative — kubectl create, kubectl run, kubectl expose directly against the live cluster. Fast for development; not version-controlled, not repeatable.
Declarative — YAML manifests applied with kubectl apply -f. Recommended for production. Manifests live in Git; changes are auditable.
Every YAML manifest has four required sections:
apiVersion: # API version for the resource type
kind: # resource type (Deployment, Service, ConfigMap, etc.)
metadata: # name, namespace, labels
spec: # desired configuration state
# apply all manifests in a directory
kubectl apply -f exercises/manifests/
# delete resources defined in a manifest
kubectl delete -f manifest.yaml
# generate a manifest template without creating the resource
kubectl create deploy demo --image=nginx --dry-run=client -o yaml
kubectl command reference
kubectl create RESOURCE NAME [FLAGS]          # create a resource
kubectl describe RESOURCE NAME                # detailed resource info
kubectl get RESOURCE NAME [-o yaml]           # get resource (optionally as YAML)
kubectl edit RESOURCE NAME                    # edit resource in-place
kubectl label RESOURCE NAME [PARAMS]          # add or update labels
kubectl port-forward RESOURCE/NAME [PARAMS]   # forward a local port to a pod
kubectl logs RESOURCE/NAME [FLAGS]            # stream or retrieve logs
kubectl delete RESOURCE NAME                  # delete a resource
Failure modes
Kubernetes handles low-level failures automatically:
- ReplicaSets — maintain the desired replica count
- Liveness probes — restart pods in an errored state
- Readiness probes — remove unhealthy pods from load balancer rotation
- Services — single stable entry point across pod churn
Control plane failure is a separate category. Applications continue running and handling traffic, but no new workloads can be scheduled and no configuration changes can be applied. Recovering the control plane is a critical priority but it doesn't take down live traffic.
Lesson 4: Open Source PaaS
The problem PaaS solves
Running Kubernetes across multiple environments (sandbox, staging, production) and multiple regions compounds quickly. Three environments × three regions = nine clusters to upgrade, patch, and maintain. If you do not have a platform team, that is a fast way to manufacture operational overhead.
Cloud Foundry
Cloud Foundry is an application platform. Push source code; CF handles buildpacks, containerisation, routing, and scaling.
# target org and space
cf login -a https://api.example.com
cf target -o my-org -s production
# push an application
cf push my-app -b go_buildpack -m 256M -i 2
# scale horizontally
cf scale my-app -i 5
# tail logs
cf logs my-app --recent
# set environment variables
cf set-env my-app DB_HOST postgres.example.com
cf restage my-app
CF is opinionated: standard buildpacks, managed routing, one pipeline model. That's the value for standard web applications. The ceiling appears when you need fine-grained resource control, custom networking, or workloads that don't map cleanly to an HTTP process model.
Function as a Service
FaaS (AWS Lambda, GCP Cloud Functions) is the far end of the managed spectrum. You provide a function; the platform handles everything else.
Best suited for: event-driven, stateless, short-lived tasks. Not suited for: long-running processes, persistent connections, or complex warm-up requirements.
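The programming model is just a handler function. A minimal AWS Lambda-style sketch (the event shape and message contents here are illustrative assumptions; only the `handler(event, context)` signature follows the Lambda convention):

```python
import json

def handler(event, context):
    # The platform invokes this once per event; there is no
    # long-lived server process for you to manage.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }

# Local invocation for testing (context is unused here)
print(handler({"name": "udacity"}, None))
```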
Glossary
| Term | Definition |
|---|---|
| Monolith | Application design where all tiers are managed as a single unit |
| Microservice | Application design where tiers are independent, separately deployed units |
| Dockerfile | Set of instructions used to build a Docker image |
| Docker image | Read-only template for creating a runnable container |
| Docker registry | Central mechanism to store and distribute Docker images |
| Node | A physical or virtual server in a Kubernetes cluster |
| Cluster | A collection of distributed nodes for managing and hosting workloads |
| Master node | Control plane node — makes global cluster decisions |
| Worker node | Data plane node — hosts application workloads |
| Bootstrap | Process of provisioning a cluster so each node is fully operational |
| Kubeconfig | Metadata file that grants access to a Kubernetes cluster |
| Pod | Smallest deployable unit; provides the execution environment for a container |
| ReplicaSet | Ensures a desired number of Pod replicas are running at all times |
| Deployment | Describes and manages the desired state of an application |
| Service | Stable network abstraction over a collection of Pods |
| Ingress | Manages external HTTP/HTTPS access to cluster services |
| ConfigMap | Stores non-confidential configuration data as key-value pairs |
| Secret | Stores sensitive data as key-value pairs (base64-encoded) |
| Namespace | Logical separation between applications and their resources |
| CRD | Custom Resource Definition — extends the Kubernetes API |
| Imperative config | Managing resources via direct kubectl commands against the live cluster |
| Declarative config | Managing resources via YAML manifests stored and version-controlled locally |