Container & Kubernetes Security: Real-World Vulnerabilities, Exploit Paths, and a Defense Blueprint
By CyberDudeBivash, Founder, CyberDudeBivash | Cybersecurity & AI
Executive summary
Containers and Kubernetes move fast—and so do attackers. Most real incidents are misconfig + weak identity + supply-chain rather than “exotic kernel 0-days”. But container escapes and API-server flaws still happen. This guide maps the attack surface for Docker/containerd/runc and Kubernetes control/worker planes, walks through common exploits, and gives copy-paste hardening, detection, and response you can implement today.
1) How the stack fits together (threat model)
Container runtime path: image → containerd/CRI-O → runc (creates Linux namespaces/cgroups) → kernel.
Kubernetes control plane: API Server ⇄ etcd (state) ⇄ Controller Manager & Scheduler; nodes run kubelet + CNI (network) + CSI (storage) + runtime.
Trust boundaries (red zones):
- Container ↔ host kernel (escapes, privilege escalation).
- Kubelet / API Server (authn/authz mistakes, open ports).
- etcd (cluster secrets).
- Admission & Supply chain (malicious images, mutable tags).
- CNI/Network (flat east-west traffic).
- Volumes / hostPath (symlink & subPath tricks).
2) Container vulnerabilities & exploit patterns
2.1 Runtime escapes (runc/containerd)
- runc “/proc/self/exe” & FD leaks (e.g., CVE-2019-5736; later variants like CVE-2024-21626): attacker overwrites or abuses the runc binary/FD during exec to gain host execution.
- containerd CRI bugs (e.g., CVE-2022-23648): crafted images/manifests or mounts leading to unexpected host access.
- Kernel bugs reachable from namespaces: use-after-free or overlayfs flaws that allow writes to the host.
Exploit flow (typical): run a malicious image → trigger runc/containerd bug during docker exec or pod start → obtain host shell → pivot to node credentials → cluster admin via kubelet API or cloud IAM.
2.2 Capability & privilege abuse
- Containers running as root with CAP_SYS_ADMIN (or --privileged) can: mount filesystems, manipulate cgroups, or access /dev to break isolation.
- No seccomp/AppArmor/SELinux → dangerous syscalls (e.g., ptrace, bpf) allowed.
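To make these settings concrete, here is a minimal sketch that flags them in a pod spec. It assumes the spec has already been parsed into a Python dict (for example from `kubectl get pod -o json`); the function name and the capability list are illustrative choices, not a standard.

```python
from typing import Any

# Capabilities this sketch treats as isolation-breaking; extend as needed.
RISKY_CAPS = {"SYS_ADMIN", "SYS_PTRACE", "SYS_MODULE", "NET_ADMIN", "ALL"}


def risky_settings(pod_spec: dict[str, Any]) -> list[str]:
    """Return human-readable findings for one pod spec (the `spec` mapping)."""
    findings = []
    for flag in ("hostPID", "hostNetwork", "hostIPC"):
        if pod_spec.get(flag):
            findings.append(f"pod uses {flag}")
    pod_sc = pod_spec.get("securityContext") or {}
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        sc = container.get("securityContext") or {}
        if sc.get("privileged"):
            findings.append(f"{name}: privileged")
        # Accept both "SYS_ADMIN" and "CAP_SYS_ADMIN" spellings.
        added = {
            cap.upper().removeprefix("CAP_")
            for cap in (sc.get("capabilities") or {}).get("add", [])
        }
        for cap in sorted(added & RISKY_CAPS):
            findings.append(f"{name}: adds capability {cap}")
        # A container-level seccompProfile overrides the pod-level one.
        profile = sc.get("seccompProfile") or pod_sc.get("seccompProfile")
        if not profile or profile.get("type") == "Unconfined":
            findings.append(f"{name}: no seccomp profile")
    return findings
```

A missing seccomp profile is reported conservatively here; whether the node applies a default in that case depends on kubelet configuration.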
2.3 hostPath & subPath volume attacks
- hostPath mounts expose host directories; combined with symlink races → arbitrary host file write.
- subPath volume handling historically hit symlink/TOCTOU issues (e.g., CVE-2021-25741 class).
- Kubernetes Windows hostProcess containers can reach host services if misused.
2.4 Image & registry supply-chain
- Typosquatted/malicious images on public registries.
- Mutable tags (:latest) pull different bits over time; poisoned base images or Dockerfile FROM chains.
- Embedded secrets in layers (forgotten .npmrc, SSH keys).
- Build system takeover (CI tokens → push trojaned images).
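The embedded-secrets case is easy to check mechanically. The sketch below walks one image layer and reports file names that commonly carry credentials. It assumes each layer is available as its own tar file (for example, unpacked from a saved image archive); the name list is illustrative and far from exhaustive.

```python
import tarfile
from pathlib import PurePosixPath

# File names that commonly carry credentials; an illustrative list only.
SENSITIVE_NAMES = {".npmrc", ".netrc", ".pypirc", "id_rsa", "id_ed25519", ".env"}


def secrets_in_layer(layer: tarfile.TarFile) -> list[str]:
    """Return paths in an opened layer tarball whose file name looks sensitive."""
    hits = []
    for member in layer.getmembers():
        # Only regular files matter; directories and links carry no content.
        if member.isfile() and PurePosixPath(member.name).name in SENSITIVE_NAMES:
            hits.append(member.name)
    return sorted(hits)
```

Usage would look like `with tarfile.open("layer.tar") as layer: print(secrets_in_layer(layer))`. Note that a file deleted in a later layer still exists in the earlier one, so every layer needs scanning, not just the final filesystem.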
3) Kubernetes vulnerabilities & exploit paths
3.1 API server & aggregated APIs
- Proxy/upgrade request mishandling can forward attacker traffic to backends (e.g., CVE-2018-1002105 class), yielding privilege escalation.
- Over-permissive RBAC (e.g., create pods/secrets in prod) converts any workload compromise into cluster admin.
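As a rough illustration of the RBAC point, this sketch lists the escalation-prone permissions a Role or ClusterRole grants. It assumes the role is already a parsed dict (e.g., from `kubectl get clusterrole NAME -o json`); the set of (resource, verb) pairs is an assumption chosen for the example, not an official list.

```python
from typing import Any

# (resource, verb) pairs this sketch treats as escalation paths; illustrative.
ESCALATION_PAIRS = {
    ("pods", "create"),
    ("pods/exec", "create"),
    ("secrets", "get"),
    ("secrets", "list"),
}


def risky_rules(role: dict[str, Any]) -> list[tuple[str, str]]:
    """Return the escalation-prone (resource, verb) pairs a role grants."""
    granted = set()
    for rule in role.get("rules", []):
        resources = rule.get("resources", [])
        verbs = rule.get("verbs", [])
        for resource, verb in ESCALATION_PAIRS:
            # "*" is the RBAC wildcard for resources and for verbs.
            resource_ok = resource in resources or "*" in resources
            verb_ok = verb in verbs or "*" in verbs
            if resource_ok and verb_ok:
                granted.add((resource, verb))
    return sorted(granted)
```

Running this over every role bound to workload service accounts gives a quick view of which compromises would escalate.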
3.2 etcd exposure
- etcd stores Secrets. If reachable without TLS+auth, one GET = full cluster compromise.
3.3 Kubelet issues
- Legacy readonly port (10255) exposure leaks metrics/pods; misconfigured kubelet credentials allow exec/cp into pods.
3.4 CNI / Network
- Flat networks let a compromised pod scan and reach every service.
- DNAT/externalIPs misconfig (e.g., CVE-2020-8554 class) can enable MITM within cluster.
3.5 Admission & policy gaps
- No Pod Security / PodSecurityPolicy (legacy) replacement → pods request privileged, hostPID, hostNetwork, or unsafe capabilities and get them.
- Missing image signature policy lets untrusted images in.
4) Realistic attacker playbooks
Playbook A — Malicious public image → node → cluster
- Pull image company/app:latest (poisoned).
- Container phones home, drops toolkit, enumerates service account token.
- If pod has list/create pods or secrets, attacker spawns privileged pod / steals cluster secrets.
- Uses cloud metadata IAM via kube-node role to escalate in cloud.
Playbook B — Exposed Docker API (2375) or kubelet creds
- Internet scan finds unauth Docker/kubelet; attacker runs docker run --privileged -v /:/host or kubectl exec.
- Writes SSH key to /host/root/.ssh/authorized_keys; persistence achieved.
- Mass-deploys crypto-miners or exfiltrates data.
Playbook C — hostPath/subPath write
- Dev grants hostPath: /var/lib/kubelet/pods to sidecar for debugging.
- Attacker symlinks to /etc/shadow or kubelet cert dir → writes controlled content.
- Replaces node service or steals kubelet cert → cluster admin.
5) Hardening that actually works (copy-paste)
5.1 Container run options (least privilege)
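A minimal sketch of least-privilege run options, assuming the Docker CLI. The helper name, the image reference, and the specific limits (UID 10000, 256 PIDs) are placeholders to adapt, not recommendations from a standard.

```python
import shlex


def least_privilege_run(image: str, uid: int = 10000, gid: int = 10000) -> list[str]:
    """Build a `docker run` argv that drops privileges by default."""
    return [
        "docker", "run", "--rm",
        "--user", f"{uid}:{gid}",               # run as a non-root UID:GID
        "--read-only",                          # immutable root filesystem
        "--cap-drop", "ALL",                    # drop every Linux capability
        "--security-opt", "no-new-privileges",  # block setuid-style escalation
        "--pids-limit", "256",                  # bound runaway process creation
        "--tmpfs", "/tmp",                      # the only writable scratch space
        image,
    ]


# Hypothetical image name, used only to show the resulting command line.
print(shlex.join(least_privilege_run("registry.example.com/app:1.0")))
```

Docker applies its default seccomp profile unless told otherwise, so the sketch does not set one explicitly. Add back individual capabilities with `--cap-add` only when the workload demonstrably needs them.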
5.2 Block dangerous mounts
- Avoid hostPath. If required, use type: DirectoryOrCreate, read-only, and path-allowlists.
- Do not mount /var/run/docker.sock, kubelet dirs, or /proc and /sys into pods.
5.3 Admission control: Pod Security + policy engines
Pod Security Admission (v1.25+): label namespaces to baseline/restricted:
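The namespace labels can be generated rather than typed by hand. This sketch builds the three Pod Security Admission labels and the matching `kubectl label` command; the helper names and the `payments` namespace are hypothetical.

```python
PSA_PREFIX = "pod-security.kubernetes.io"
LEVELS = ("privileged", "baseline", "restricted")


def psa_labels(enforce: str, warn: str = "restricted", audit: str = "restricted") -> dict[str, str]:
    """Return Pod Security Admission labels for one namespace."""
    for level in (enforce, warn, audit):
        if level not in LEVELS:
            raise ValueError(f"unknown Pod Security level: {level!r}")
    return {
        f"{PSA_PREFIX}/enforce": enforce,  # violating pods are rejected
        f"{PSA_PREFIX}/warn": warn,        # violations return a client warning
        f"{PSA_PREFIX}/audit": audit,      # violations are annotated in audit logs
    }


def label_command(namespace: str, labels: dict[str, str]) -> list[str]:
    """Build the kubectl argv that applies the labels to a namespace."""
    pairs = [f"{key}={value}" for key, value in labels.items()]
    return ["kubectl", "label", "--overwrite", "namespace", namespace, *pairs]


# Hypothetical namespace: enforce baseline now, surface restricted violations.
print(" ".join(label_command("payments", psa_labels("baseline"))))
```

Enforcing baseline while warning and auditing at restricted is one way to see what would break before tightening enforcement.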
Kyverno/Gatekeeper examples:
- Deny privileged: true, hostNetwork: true, hostPID: true.
- Require runAsNonRoot, readOnlyRootFilesystem, seccompProfile: RuntimeDefault.
- Enforce image: registry.example.com/* and signature required (Sigstore/cosign).
5.4 Image supply-chain controls
- Build SBOMs (Syft) and scan (Trivy/Grype).
- Sign images: cosign sign --key kms://… <image>; admission webhook rejects unsigned.
- Pin images to immutable digests instead of tags.
- Block root in Dockerfiles: USER 10000:10000.
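Digest pinning and a registry allowlist are simple to verify in CI before a manifest ships. A small sketch, assuming image references are plain strings and that `registry.example.com/` is the only approved registry; the function name is illustrative.

```python
import re

# A pinned reference ends in "@sha256:" followed by 64 hex characters.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def image_violations(image: str, allowed_prefix: str = "registry.example.com/") -> list[str]:
    """Return policy problems for one image reference; empty means acceptable."""
    problems = []
    if not image.startswith(allowed_prefix):
        problems.append("registry not allowlisted")
    if not DIGEST_RE.search(image):
        problems.append("not pinned to a sha256 digest")
    return problems
```

This only checks the shape of the reference. Verifying that the digest is signed is still the job of the admission webhook.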
5.5 Node & runtime hardening
- Keep runc/containerd current; enable AppArmor/SELinux; lock kernel with LSMs.
- Enable seccomp default (RuntimeDefault).
- Isolate node IAM roles; disable cloud metadata access from non-system pods (IMDSv2 hop-limit, firewall).
5.6 Network policies (CNI)
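Start from a default-deny policy in every namespace. The sketch below builds one as a plain dict and prints it as JSON, which `kubectl apply -f -` accepts; the policy name and the `prod` namespace are placeholders.

```python
import json


def default_deny(namespace: str) -> dict:
    """Build a NetworkPolicy that denies all ingress and egress in a namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            # An empty podSelector selects every pod in the namespace.
            "podSelector": {},
            # Both types are listed with no rules, so nothing is allowed.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


# Hypothetical namespace; pipe the output to `kubectl apply -f -`.
print(json.dumps(default_deny("prod"), indent=2))
```

Denying egress also blocks DNS lookups, so an allow rule for the cluster DNS service is usually the first exception to add. The policy only takes effect if the CNI plugin enforces NetworkPolicy.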
Then add allow policies per app (db only from app, egress only to needed APIs).
5.7 etcd & control plane
- etcd TLS/mTLS only; isolate on control-plane network; encryption at rest.
- API server: audit logging, admission webhooks mTLS, restrict anonymous auth, rate-limit.
6) Detection & response (ready-to-use)
6.1 Falco rules (container breakout attempts)
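As a rough sketch of the kind of condition a breakout rule encodes, the check below scores a single runtime event. This is plain Python, not Falco rule syntax, and the event shape (`container_id`, `type`, `proc_name`, `path`) is a simplified assumption rather than Falco's field schema; the binary and path lists are illustrative.

```python
from typing import Any

# Binaries rarely needed inside an application container; illustrative only.
BREAKOUT_BINARIES = {"nsenter", "unshare", "mount", "insmod", "modprobe"}
# Runtime control sockets and kernel hooks abused in escapes; illustrative only.
SENSITIVE_SUFFIXES = ("docker.sock", "containerd.sock", "release_agent", "core_pattern")


def breakout_signals(event: dict[str, Any]) -> list[str]:
    """Return reasons an in-container event looks like an escape attempt."""
    if not event.get("container_id"):
        return []  # host processes are out of scope for this check
    signals = []
    if event.get("type") == "exec" and event.get("proc_name") in BREAKOUT_BINARIES:
        signals.append(f"namespace or mount tool executed: {event['proc_name']}")
    path = event.get("path", "")
    if event.get("type") == "open" and path.endswith(SENSITIVE_SUFFIXES):
        signals.append(f"sensitive host interface opened: {path}")
    return signals
```

The same two conditions (unexpected binaries, access to host control interfaces) are a reasonable starting point when writing the equivalent rules in your sensor's own language.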
6.2 K8s audit log – privileged pod creation
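A minimal filter over the API server audit log, assuming one JSON event per line and an audit policy that records pod creation at the Request level or higher (otherwise `requestObject` is absent). The function name and the break-glass parameter are illustrative.

```python
import json
from typing import Any, Iterable, Iterator


def privileged_pod_creations(
    lines: Iterable[str], break_glass: frozenset[str] = frozenset()
) -> Iterator[dict[str, Any]]:
    """Yield audit events that create a pod with a privileged container."""
    for line in lines:
        event = json.loads(line)
        ref = event.get("objectRef") or {}
        if event.get("verb") != "create" or ref.get("resource") != "pods":
            continue
        if ref.get("subresource"):
            continue  # skip pods/exec, pods/binding and similar
        if ref.get("namespace") in break_glass:
            continue  # approved emergency namespaces
        spec = (event.get("requestObject") or {}).get("spec") or {}
        containers = spec.get("containers", []) + spec.get("initContainers", [])
        if any((c.get("securityContext") or {}).get("privileged") for c in containers):
            yield event
```

Feed it the audit log file object directly, e.g. `privileged_pod_creations(open(path), frozenset({"break-glass"}))`, and forward each hit to the SIEM.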
Alert on any such event outside break-glass namespaces.
6.3 Hunt queries
- Container spawning shell from web svc:
  - Parent: nginx/httpd/java/w3wp → Child: bash/sh/powershell/cmd
- Kubelet exec storms: audit exec count per user/node > baseline.
- Image drift: deployment digest ≠ last approved digest.
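The first hunt translates directly into a parent/child match. This sketch assumes process events arrive as dicts with `parent` and `child` keys holding process names or paths; that shape is hypothetical, so map it to whatever your sensor emits.

```python
WEB_PARENTS = {"nginx", "httpd", "java", "w3wp"}
SHELLS = {"bash", "sh", "powershell", "cmd"}


def _base(process: str) -> str:
    """Reduce a Unix or Windows path to a bare lowercase process name."""
    name = process.replace("\\", "/").rsplit("/", 1)[-1].lower()
    return name.removesuffix(".exe")


def shell_from_web(events: list[dict[str, str]]) -> list[dict[str, str]]:
    """Return events where a web-service process spawned a shell."""
    return [
        event
        for event in events
        if _base(event["parent"]) in WEB_PARENTS and _base(event["child"]) in SHELLS
    ]
```

Expect some noise from legitimate health checks and entrypoint scripts; allowlist those by container image rather than loosening the match.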
6.4 Incident playbook (short)
- Contain: cordon node or isolate namespace; block egress via NetworkPolicy; revoke service-account tokens.
- Triage: fetch pod describe, container logs, node journalctl, Falco events, kube-audit trail, image digest & SBOM.
- Scope: search for similar pods, suspicious admissions, unsigned images.
- Recover: redeploy from signed images; rotate secrets; re-issue kubelet cert if node compromised.
- Lessons: add/adjust admission policies; write regression tests.
7) Program plan (30–60–90)
Days 1–30 – Baseline
- Enforce Pod Security restricted in all namespaces.
- Default-deny NetworkPolicy + egress allow-lists.
- Block :latest; require cosign signatures for prod.
- Patch runc/containerd; enable seccomp RuntimeDefault.
Days 31–60 – Supply-chain & policy
- SBOMs + image scanning in CI; fail builds on critical vulns.
- Kyverno/Gatekeeper policies (no privileged/host namespaces; require non-root).
- etcd TLS/mTLS; secrets encryption at rest; rotate root certs.
Days 61–90 – Detection & drills
- Deploy Falco/eBPF sensor; wire to SIEM; create dashboards for admissions, exec, privileged attempts.
- Red-team: hostPath abuse, kubelet exec misuse, unsigned image admission.
- Practice node compromise → cluster recovery.
8) Quick checklist
- No privileged pods, no host namespaces, non-root users
- Seccomp/AppArmor/SELinux enforced
- NetworkPolicy default-deny + egress controls
- Signed images, immutable digests, SBOM + scan
- Admission control (Kyverno/Gatekeeper) enforcing baseline
- etcd mTLS + encryption at rest; API audit logs on
- Runtime patched (runc/containerd); kernel up-to-date
- Falco/eBPF + SIEM detections; incident playbooks tested
Closing
Most “container hacks” are preventable: kill privileged pods, enforce policy at admission, sign what you run, and observe what runs. Do that, and even if a new runc/K8s CVE appears, your blast radius is small and your recovery is fast.