Container & Kubernetes Security: Real-World Vulnerabilities, Exploit Paths, and a Defense Blueprint
By CyberDudeBivash, Founder, CyberDudeBivash | Cybersecurity & AI

 


Executive summary

Containers and Kubernetes move fast, and so do attackers. Most real incidents stem from misconfiguration, weak identity controls, and supply-chain compromise rather than “exotic kernel 0-days”, but container escapes and API-server flaws still happen. This guide maps the attack surface for Docker/containerd/runc and the Kubernetes control and worker planes, walks through common exploits, and gives copy-paste hardening, detection, and response steps you can implement today.


1) How the stack fits together (threat model)

Container runtime path: image → containerd/CRI-O → runc (creates Linux namespaces/cgroups) → kernel.
Kubernetes control plane: etcd (state) ⇄ API Server ⇄ Controller Manager & Scheduler; nodes run kubelet + CNI (network) + CSI (storage) + runtime.

Trust boundaries (red zones):

  1. Container ↔ host kernel (escapes, privilege escalation).

  2. Kubelet / API Server (authn/authz mistakes, open ports).

  3. etcd (cluster secrets).

  4. Admission & Supply chain (malicious images, mutable tags).

  5. CNI/Network (flat east-west traffic).

  6. Volumes / hostPath (symlink & subPath tricks).


2) Container vulnerabilities & exploit patterns

2.1 Runtime escapes (runc/containerd)

  • runc “/proc/self/exe” & FD leaks (e.g., CVE-2019-5736; later variants like CVE-2024-21626): attacker overwrites or abuses the runc binary/FD during exec to gain host execution.

  • containerd CRI bugs (e.g., CVE-2022-23648): a crafted image configuration or mount specification gives the container unexpected access to host files.

  • Kernel bugs reachable from inside namespaces: use-after-free or overlayfs flaws that let a container process write to the host.

Exploit flow (typical): run a malicious image → trigger runc/containerd bug during docker exec or pod start → obtain host shell → pivot to node credentials → cluster admin via kubelet API or cloud IAM.

2.2 Capability & privilege abuse

  • Containers running as root with CAP_SYS_ADMIN (or --privileged) can: mount filesystems, manipulate cgroups, or access /dev to break isolation.

  • No seccomp/AppArmor/SELinux → dangerous syscalls (e.g., ptrace, bpf) allowed.

2.3 hostPath & subPath volume attacks

  • hostPath mounts expose host directories; combined with symlink races → arbitrary host file write.

  • subPath volume handling historically hit symlink/TOCTOU issues (e.g., CVE-2021-25741 class).

  • Kubernetes Windows hostProcess containers can reach host services if misused.

2.4 Image & registry supply-chain

  • Typosquatted/malicious images on public registries.

  • Mutable tags (:latest) pull different bits over time; poisoned base images or Dockerfile FROM chains.

  • Embedded secrets in layers (forgotten .npmrc, SSH keys).

  • Build system takeover (CI tokens → push trojaned images).
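The "embedded secrets in layers" problem is easy to check for once you have a file listing of each layer. A minimal sketch, assuming you already extracted the paths (for example by listing each layer tarball); the function name and the set of file names are illustrative and far from complete:

```python
import posixpath

# Hypothetical, deliberately short list of file names that often hold credentials.
SECRET_BASENAMES = {".npmrc", ".netrc", ".env", "id_rsa", "id_ed25519"}
SECRET_SUFFIXES = (".key",)


def suspicious_layer_paths(paths):
    """Return the layer paths whose file name looks like a credential file."""
    hits = []
    for path in paths:
        base = posixpath.basename(path)
        if base in SECRET_BASENAMES or base.endswith(SECRET_SUFFIXES):
            hits.append(path)
    return hits


if __name__ == "__main__":
    listing = ["app/server.js", "root/.npmrc", "home/app/.ssh/id_rsa"]
    for hit in suspicious_layer_paths(listing):
        print("possible secret in layer:", hit)
```

This is a name-based heuristic only. It will miss secrets baked into ordinary files, so treat it as a complement to a real secret scanner, not a replacement.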


3) Kubernetes vulnerabilities & exploit paths

3.1 API server & aggregated APIs

  • Proxy/upgrade request mishandling can forward attacker traffic to backends (e.g., CVE-2018-1002105 class), yielding privilege escalation.

  • Over-permissive RBAC (e.g., create pods / secrets in prod) converts any workload compromise into cluster admin.
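To make "over-permissive RBAC" concrete, here is a rough sketch that scans a Role or ClusterRole (as a Python dict, e.g. loaded from kubectl JSON output) for a few grants that commonly turn a workload compromise into something bigger. The RISKY_GRANTS set is an assumption chosen for illustration, and the check ignores apiGroups and resourceNames for brevity:

```python
# Hypothetical shortlist of (resource, verb) pairs worth reviewing.
RISKY_GRANTS = {
    ("pods", "create"),
    ("pods/exec", "create"),
    ("secrets", "get"),
    ("secrets", "list"),
}


def risky_rules(role):
    """Return the sorted (resource, verb) pairs from RISKY_GRANTS that the role allows."""
    hits = set()
    for rule in role.get("rules", []):
        resources = rule.get("resources", [])
        verbs = rule.get("verbs", [])
        for resource, verb in RISKY_GRANTS:
            resource_ok = resource in resources or "*" in resources
            verb_ok = verb in verbs or "*" in verbs
            if resource_ok and verb_ok:
                hits.add((resource, verb))
    return sorted(hits)


if __name__ == "__main__":
    role = {"rules": [{"apiGroups": [""], "resources": ["pods"], "verbs": ["create"]}]}
    print(risky_rules(role))
```

Run it over every role bound to a workload service account; any hit in a production namespace deserves a second look.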

3.2 etcd exposure

  • etcd stores Secrets. If reachable without TLS+auth, one GET = full cluster compromise.

3.3 Kubelet issues

  • Legacy readonly port (10255) exposure leaks metrics/pods; misconfigured kubelet credentials allow exec/cp into pods.

3.4 CNI / Network

  • Flat networks let a compromised pod scan and reach every service.

  • Services that claim arbitrary externalIPs or load-balancer IPs (e.g., CVE-2020-8554 class) can intercept traffic and enable MITM within the cluster.

3.5 Admission & policy gaps

  • No Pod Security Admission or other replacement for the removed PodSecurityPolicy → pods request privileged, hostPID, hostNetwork, or unsafe capabilities and get them.

  • Missing image signature policy lets untrusted images in.


4) Realistic attacker playbooks

Playbook A — Malicious public image → node → cluster

  1. Pull image company/app:latest (poisoned).

  2. Container phones home, drops toolkit, enumerates service account token.

  3. If pod has list/create pods or secrets, attacker spawns privileged pod / steals cluster secrets.

  4. Queries the cloud metadata service for the node's IAM role credentials to escalate in the cloud account.

Playbook B — Exposed Docker API (2375) or kubelet creds

  1. An internet scan finds an unauthenticated Docker API or kubelet; the attacker runs docker run --privileged -v /:/host, or uses kubectl exec with stolen kubelet credentials.

  2. Writes SSH key to /host/root/.ssh/authorized_keys; persistence achieved.

  3. Mass-deploy crypto-miners or exfil data.

Playbook C — hostPath/subPath write

  1. Dev grants hostPath: /var/lib/kubelet/pods to sidecar for debugging.

  2. Attacker symlinks to /etc/shadow or kubelet cert dir → writes controlled content.

  3. Replaces node service or steals kubelet cert → cluster admin.


5) Hardening that actually works (copy-paste)

5.1 Container run options (least privilege)

yaml
# pod-security.yaml (snippet)
# Container-level securityContext
securityContext:
  runAsNonRoot: true
  runAsUser: 10000
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]
  seccompProfile:
    type: RuntimeDefault  # or Localhost
# Avoid host namespaces & devices unless absolutely necessary
# (these three fields sit on the pod spec, not inside securityContext)
hostNetwork: false
hostPID: false
hostIPC: false
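If you want to audit existing workloads against these settings, a small checker goes a long way. The sketch below takes a pod spec as a Python dict and reports gaps; the function names are illustrative, and it only looks at container-level securityContext, so values inherited from the pod-level securityContext are not considered:

```python
def container_findings(container):
    """Return hardening gaps for one container entry of a pod spec."""
    sc = container.get("securityContext") or {}
    findings = []
    if sc.get("privileged") is True:
        findings.append("privileged is true")
    if sc.get("runAsNonRoot") is not True:
        findings.append("runAsNonRoot is not true")
    if sc.get("allowPrivilegeEscalation") is not False:
        findings.append("allowPrivilegeEscalation is not false")
    if sc.get("readOnlyRootFilesystem") is not True:
        findings.append("readOnlyRootFilesystem is not true")
    if "ALL" not in ((sc.get("capabilities") or {}).get("drop") or []):
        findings.append("capabilities.drop does not include ALL")
    if (sc.get("seccompProfile") or {}).get("type") not in ("RuntimeDefault", "Localhost"):
        findings.append("seccompProfile is unset or Unconfined")
    return findings


def pod_findings(pod_spec):
    """Return host-namespace findings plus per-container findings for a pod spec."""
    findings = [
        f"{key} is true"
        for key in ("hostNetwork", "hostPID", "hostIPC")
        if pod_spec.get(key) is True
    ]
    for container in pod_spec.get("containers", []):
        name = container.get("name", "?")
        findings += [f"{name}: {item}" for item in container_findings(container)]
    return findings


if __name__ == "__main__":
    spec = {"hostPID": True, "containers": [{"name": "dbg", "securityContext": {"privileged": True}}]}
    for finding in pod_findings(spec):
        print(finding)
```

An empty result means the pod matches the snippet above; anything else is a list you can hand straight to the owning team.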

5.2 Block dangerous mounts

  • Avoid hostPath. If required, use type: Directory (the mount fails if the path is missing instead of creating it), mount it read-only, and enforce path allowlists at admission.

  • Do not mount /var/run/docker.sock, kubelet dirs, or /proc or /sys into pods.
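A path allowlist for hostPath can be sketched in a few lines. The example below assumes a pod spec as a Python dict and a list of allowed host path prefixes that you define; the helper names are hypothetical. It normalizes ".." segments but cannot see symlinks on the node, so it does not defend against the symlink races described earlier:

```python
import posixpath


def _under(path, prefix):
    """True if the normalized path equals the prefix or sits below it."""
    path = posixpath.normpath(path)
    prefix = posixpath.normpath(prefix)
    return path == prefix or path.startswith(prefix.rstrip("/") + "/")


def hostpath_violations(pod_spec, allowed_prefixes):
    """Flag hostPath volumes outside the allowlist and read-write hostPath mounts."""
    violations = []
    host_volumes = set()
    for volume in pod_spec.get("volumes", []):
        host_path = volume.get("hostPath")
        if host_path is None:
            continue
        host_volumes.add(volume["name"])
        path = host_path.get("path", "")
        if not any(_under(path, prefix) for prefix in allowed_prefixes):
            violations.append(f"volume {volume['name']}: hostPath {path} is not allowlisted")
    for container in pod_spec.get("containers", []):
        for mount in container.get("volumeMounts", []):
            if mount.get("name") in host_volumes and mount.get("readOnly") is not True:
                violations.append(
                    f"container {container.get('name')}: hostPath volume {mount['name']} is mounted read-write"
                )
    return violations


if __name__ == "__main__":
    spec = {
        "volumes": [{"name": "kubelet", "hostPath": {"path": "/var/lib/kubelet/pods"}}],
        "containers": [{"name": "sidecar", "volumeMounts": [{"name": "kubelet", "mountPath": "/k"}]}],
    }
    print(hostpath_violations(spec, ["/var/log/app"]))
```

The same logic belongs in an admission policy so the pod is rejected before it is scheduled; running it offline is useful for finding what would break first.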

5.3 Admission control: Pod Security + policy engines

Pod Security Admission (v1.25+): label namespaces to baseline/restricted:

yaml
metadata:
  labels:
    pod-security.kubernetes.io/enforce: "restricted"
    pod-security.kubernetes.io/audit: "restricted"
    pod-security.kubernetes.io/warn: "restricted"
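Before flipping enforcement on, it helps to know which namespaces are missing these labels. A minimal sketch, assuming Namespace objects as Python dicts (for instance from kubectl get ns -o json); the function name is illustrative:

```python
PSA_PREFIX = "pod-security.kubernetes.io/"


def psa_gaps(namespace, required_level="restricted"):
    """Return the Pod Security modes whose label is missing or not at the required level."""
    labels = (namespace.get("metadata") or {}).get("labels") or {}
    return [
        mode
        for mode in ("enforce", "audit", "warn")
        if labels.get(PSA_PREFIX + mode) != required_level
    ]


if __name__ == "__main__":
    ns = {"metadata": {"name": "dev", "labels": {PSA_PREFIX + "enforce": "baseline"}}}
    print(ns["metadata"]["name"], psa_gaps(ns))
```

A namespace that returns an empty list carries all three labels at the required level.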

Kyverno/Gatekeeper examples:

  • Deny privileged: true, hostNetwork: true, hostPID: true.

  • Require runAsNonRoot, readOnlyRootFilesystem, seccompProfile: RuntimeDefault.

  • Enforce image: registry.example.com/* and signature required (Sigstore/cosign).

5.4 Image supply-chain controls

  • Build SBOMs (Syft) and scan (Trivy/Grype).

  • Sign images: cosign sign --key kms://… <image>; admission webhook rejects unsigned.

  • Pin immutable digests:

yaml
image: registry.example.com/api@sha256:3c1f... # not :latest

  • Avoid root in Dockerfiles: set USER 10000:10000.
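The digest-pinning and registry rules are simple enough to lint in CI before an admission controller ever sees the manifest. A rough sketch, assuming image references as plain strings and a registry allowlist you supply; it does not verify signatures, which still needs cosign or an admission webhook:

```python
import re

# A reference counts as pinned only if it ends in a full sha256 digest.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def image_policy_errors(image, allowed_registries):
    """Return policy errors for one image reference (empty list means it passes)."""
    errors = []
    if not DIGEST_RE.search(image):
        errors.append("not pinned to a sha256 digest")
    # Treat the text before the first "/" as the registry host; bare names have none.
    registry = image.split("/", 1)[0] if "/" in image else ""
    if registry not in allowed_registries:
        errors.append("registry not allowlisted")
    return errors


if __name__ == "__main__":
    for ref in ("registry.example.com/api:latest", "nginx"):
        print(ref, image_policy_errors(ref, ("registry.example.com",)))
```

Failing the build on any error keeps :latest and unknown registries out of manifests long before deploy time.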

5.5 Node & runtime hardening

  • Keep runc/containerd current; enforce AppArmor or SELinux profiles (Linux Security Modules) on every node.

  • Enable seccomp default (RuntimeDefault).

  • Isolate node IAM roles; disable cloud metadata access from non-system pods (IMDSv2 hop-limit, firewall).

5.6 Network policies (CNI)

yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: default-deny, namespace: prod }
spec:
  podSelector: {}
  policyTypes: ["Ingress","Egress"]

Then add allow policies per app (db only from app, egress only to needed APIs).
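To find namespaces that still lack a default-deny policy, you can check the NetworkPolicy objects you already have. The sketch below assumes policies as Python dicts (e.g. from kubectl get networkpolicy -A -o json) and uses a strict definition: empty podSelector, both policy types, and no allow rules. The function names are illustrative:

```python
def is_default_deny(policy):
    """True if the policy selects every pod, covers both directions, and allows nothing."""
    spec = policy.get("spec", {})
    selects_all = spec.get("podSelector") == {}
    both_directions = set(spec.get("policyTypes", [])) == {"Ingress", "Egress"}
    no_allows = not spec.get("ingress") and not spec.get("egress")
    return selects_all and both_directions and no_allows


def namespaces_without_default_deny(namespaces, policies):
    """Return the sorted namespaces that have no default-deny policy."""
    covered = {p["metadata"]["namespace"] for p in policies if is_default_deny(p)}
    return sorted(set(namespaces) - covered)


if __name__ == "__main__":
    deny = {
        "metadata": {"name": "default-deny", "namespace": "prod"},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }
    print(namespaces_without_default_deny(["prod", "dev"], [deny]))
```

Keep in mind that a NetworkPolicy only has an effect if the CNI plugin enforces it, so a clean report here is necessary but not sufficient.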

5.7 etcd & control plane

  • etcd TLS/mTLS only; isolate on control-plane network; encryption at rest.

  • API server: enable audit logging, use mTLS for admission webhooks, disable anonymous auth, and rate-limit requests.


6) Detection & response (ready-to-use)

6.1 Falco rules (container breakout attempts)

yaml
# Assumes a macro named write_to_known_sensitive_file is defined elsewhere in your rules
- rule: Write below root
  desc: Container writing to sensitive host paths
  condition: write_to_known_sensitive_file
  output: "Write to sensitive file (user=%user.name proc=%proc.name file=%fd.name)"
  priority: CRITICAL

6.2 K8s audit log – privileged pod creation

json
{"stage":"ResponseComplete","verb":"create","objectRef":{"resource":"pods"},"requestObject":{"spec":{"containers":[{"securityContext":{"privileged":true}}]}}}

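Alert on any such event outside break-glass namespaces. Note that requestObject is only recorded when the audit policy logs pod writes at the Request or RequestResponse level.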
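If your SIEM cannot express this match directly, the same filter can be run over raw audit log lines. A minimal sketch, assuming one JSON audit event per line and a set of break-glass namespace names that you choose; the function name and the shape of the returned alert are illustrative:

```python
import json


def privileged_pod_alerts(lines, break_glass_namespaces=frozenset()):
    """Return one alert per completed pod create that requests a privileged container."""
    alerts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        ref = event.get("objectRef") or {}
        is_pod_create = (
            event.get("stage") == "ResponseComplete"
            and event.get("verb") == "create"
            and ref.get("resource") == "pods"
            and not ref.get("subresource")  # skip pods/exec, pods/portforward, etc.
        )
        if not is_pod_create or ref.get("namespace") in break_glass_namespaces:
            continue
        request = event.get("requestObject") or {}
        spec = request.get("spec") or {}
        containers = spec.get("containers", []) + spec.get("initContainers", [])
        if any((c.get("securityContext") or {}).get("privileged") is True for c in containers):
            alerts.append({
                "user": (event.get("user") or {}).get("username"),
                "namespace": ref.get("namespace"),
                "pod": (request.get("metadata") or {}).get("name"),
            })
    return alerts


if __name__ == "__main__":
    sample = '{"stage":"ResponseComplete","verb":"create","objectRef":{"resource":"pods","namespace":"prod"},"requestObject":{"spec":{"containers":[{"securityContext":{"privileged":true}}]}}}'
    print(privileged_pod_alerts([sample]))
```

Pods created indirectly through Deployments or Jobs still show up here, because the controller's pod create request passes through the API server and is audited like any other.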

6.3 Hunt queries

  • Container spawning shell from web svc:

    • Parent: nginx/httpd/java/w3wp → Child: bash/sh/powershell/cmd

  • Kubelet exec storms: audit exec count per user/node > baseline.

  • Image drift: deployment digest ≠ last approved digest.
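The three hunts above reduce to small filters once the telemetry is normalized. The sketch below assumes hypothetical event shapes (dicts with parent/child process names, and exec events with user and node fields) that you would map from your own sensor or audit pipeline:

```python
from collections import Counter

WEB_PARENTS = {"nginx", "httpd", "java", "w3wp"}
SHELLS = {"bash", "sh", "powershell", "cmd"}


def shell_from_web(process_events):
    """Process events where a web-facing parent spawned a shell."""
    return [e for e in process_events if e["parent"] in WEB_PARENTS and e["child"] in SHELLS]


def exec_storms(exec_events, baseline):
    """(user, node) pairs whose exec count exceeds the baseline."""
    counts = Counter((e["user"], e["node"]) for e in exec_events)
    return {key: count for key, count in counts.items() if count > baseline}


def image_drift(deployed, approved):
    """Workloads whose deployed digest differs from the last approved digest."""
    return {name: digest for name, digest in deployed.items() if approved.get(name) != digest}


if __name__ == "__main__":
    print(shell_from_web([{"parent": "nginx", "child": "sh"}]))
    print(exec_storms([{"user": "alice", "node": "n1"}] * 5, baseline=3))
    print(image_drift({"web": "sha256:bbb"}, {"web": "sha256:ccc"}))
```

The baseline for exec storms is something you have to measure per environment; a fixed number is only a starting point.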

6.4 Incident playbook (short)

  1. Contain: cordon node or isolate namespace; block egress via NetworkPolicy; revoke service-account tokens.

  2. Triage: collect kubectl describe pod output, container logs, node journalctl, Falco events, the kube-audit trail, and the image digest & SBOM.

  3. Scope: search for similar pods, suspicious admissions, unsigned images.

  4. Recover: redeploy from signed images; rotate secrets; re-issue kubelet cert if node compromised.

  5. Lessons: add/adjust admission policies; write regression tests.


7) Program plan (30–60–90)

Days 1–30 – Baseline

  • Enforce Pod Security restricted in all namespaces.

  • Default-deny NetworkPolicy + egress allow-lists.

  • Block :latest; require cosign signatures for prod.

  • Patch runc/containerd; enable seccomp RuntimeDefault.

Days 31–60 – Supply-chain & policy

  • SBOMs + image scanning in CI; fail builds on critical vulns.

  • Kyverno/Gatekeeper policies (no privileged/host namespaces; require non-root).

  • etcd TLS/mTLS; secrets encryption at rest; rotate root certs.

Days 61–90 – Detection & drills

  • Deploy Falco/eBPF sensor; wire to SIEM; create dashboards for admissions, exec, privileged attempts.

  • Red-team: hostPath abuse, kubelet exec misuse, unsigned image admission.

  • Practice node compromise → cluster recovery.


8) Quick checklist

  • No privileged pods, no host namespaces, non-root users

  • Seccomp/AppArmor/SELinux enforced

  • NetworkPolicy default-deny + egress controls

  • Signed images, immutable digests, SBOM + scan

  • Admission control (Kyverno/Gatekeeper) enforcing baseline

  • etcd mTLS + encryption at rest; API audit logs on

  • Runtime patched (runc/containerd); kernel up-to-date

  • Falco/eBPF + SIEM detections; incident playbooks tested


Closing

Most “container hacks” are preventable: kill privileged pods, enforce policy at admission, sign what you run, and observe what runs. Do that, and even if a new runc/K8s CVE appears, your blast radius is small and your recovery is fast.
