Product leaders hear 'we need Kubernetes' and picture endless YAML and SRE hiring sprees. Vextrosys runs client workloads on EKS, GKE, and, for smaller tiers, managed platforms like Railway. Kubernetes earns its keep above roughly $30k/month in infrastructure spend, or when you need heterogeneous workloads, GPU nodes, and standardized deploys across services - not for a three-container MVP.
What Kubernetes gives you (in plain terms)
- Self-healing: crashed pods restart; unhealthy instances are removed from load balancers
- Horizontal scaling: the Horizontal Pod Autoscaler (HPA) scales on CPU, memory, or custom metrics such as queue depth
- Zero-downtime rollouts: rolling updates with readiness probes
- Portable packaging: same container image from laptop to production
When to skip K8s
If you have one monolith and one database, managed PaaS (Fly.io, Render, ECS Fargate) is cheaper in engineering time. We recommend K8s when you have 4+ services, multiple environments with parity requirements, or compliance mandates for network policies.
The abstraction stack that hides YAML hell
- Developers push images; CI builds and scans (Trivy) artifacts
- Helm charts or Kustomize overlays templatize about 80% of manifests - app teams never hand-edit raw YAML in prod
- Argo CD or Flux syncs Git state to cluster - GitOps is the rollback story
- Ingress via AWS Load Balancer Controller or Gateway API; cert-manager for TLS
- External Secrets Operator pulls from AWS Secrets Manager - no secrets in Git
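As a rough sketch of the last bullet, an ExternalSecret for the External Secrets Operator might look like the following. The store name `aws-secrets-manager`, the secret path `prod/api/database`, and the key names are hypothetical, and the `apiVersion` depends on which operator release is installed.

```yaml
# Sketch only: all names and paths below are illustrative assumptions.
apiVersion: external-secrets.io/v1beta1   # check the version served by your operator release
kind: ExternalSecret
metadata:
  name: api-db-credentials
  namespace: prod
spec:
  refreshInterval: 1h              # how often the operator re-reads the upstream secret
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets-manager      # hypothetical store backed by AWS Secrets Manager
  target:
    name: api-db-credentials       # Kubernetes Secret the operator creates and keeps in sync
  data:
    - secretKey: DATABASE_URL      # key inside the resulting Kubernetes Secret
      remoteRef:
        key: prod/api/database     # hypothetical secret name in AWS Secrets Manager
        property: url              # JSON field within that secret
```

Only this reference lives in Git; the secret value itself stays in AWS Secrets Manager.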
# Minimal Deployment pattern (Helm template; team never edits by hand)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  # apps/v1 requires a selector that matches the pod template labels
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: 123.dkr.ecr.us-east-1.amazonaws.com/api:{{ .Values.tag }}
          readinessProbe:
            httpGet: { path: /health, port: 8080 }

Namespaces, environments, and blast radius
One cluster with namespace-per-environment (dev, staging, prod) works for mid-size teams if network policies isolate prod. Larger orgs use cluster-per-env. Resource quotas prevent staging from starving prod. Label everything: app, team, cost-center for chargeback.
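A minimal sketch of the quota and labeling idea, assuming a `staging` namespace in a shared cluster. The label values and the numeric limits are placeholders to be sized per team, not recommendations.

```yaml
# Sketch only: label values and quota sizes are illustrative assumptions.
apiVersion: v1
kind: Namespace
metadata:
  name: staging
  labels:
    team: payments          # hypothetical owning team
    cost-center: cc-1234    # hypothetical chargeback code
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: staging-quota
  namespace: staging
spec:
  hard:
    requests.cpu: "8"       # total CPU requests allowed across the namespace
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "50"              # cap on the number of pods
```

Once a quota covers CPU and memory, pods in that namespace must declare requests and limits (or inherit them from a LimitRange), otherwise the API server rejects them.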
Probes that actually work
A failed liveness probe makes the kubelet restart the container; a failed readiness probe only removes the pod from Service endpoints. The classic mistake: liveness hits the same DB-dependent path as readiness, so a brief DB blip restarts the whole fleet at once. Liveness should be cheap (/health/live); readiness checks dependencies with short timeouts.
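The split between the two probes can be sketched as a container fragment like the one below. The `/health/ready` path, the image name, and the timing values are assumptions for illustration; only `/health/live` comes from the text above.

```yaml
# Container fragment (goes under spec.template.spec in a Deployment).
# Sketch only: paths other than /health/live and all timings are assumptions.
containers:
  - name: api
    image: example/api:1.0.0          # placeholder image
    livenessProbe:
      # Cheap check: process is up and can answer HTTP. No DB or downstream calls.
      httpGet: { path: /health/live, port: 8080 }
      periodSeconds: 10
      timeoutSeconds: 1
      failureThreshold: 3             # restart only after 3 consecutive failures
    readinessProbe:
      # Dependency-aware check: fails fast when the DB is unreachable,
      # which takes the pod out of rotation without restarting it.
      httpGet: { path: /health/ready, port: 8080 }
      periodSeconds: 5
      timeoutSeconds: 2
      failureThreshold: 2
```

With this layout a database outage drains traffic from affected pods but leaves them running, so they rejoin the Service as soon as the dependency recovers.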
Autoscaling without surprise bills
- HPA min/max replicas set per service with business input
- Cluster Autoscaler needs headroom - we run node pools with buffer capacity
- VPA for right-sizing recommendations; don't auto-apply without review
- Spot instances for batch workers; on-demand for API critical path
The most expensive Kubernetes cluster is the one nobody monitors - set CPU/memory alerts and review monthly with finance.
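For the per-service min/max bounds, a HorizontalPodAutoscaler might be sketched as follows. The replica counts and the 70% CPU target are placeholder values; in practice they would come from the business conversation mentioned above.

```yaml
# Sketch only: replica bounds and the utilization target are illustrative assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3            # floor: keeps capacity for baseline traffic
  maxReplicas: 12           # ceiling: bounds the worst-case bill
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of each pod's CPU request
```

CPU utilization is measured against the container's CPU request, so the target Deployment needs requests set for this metric to work.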
Observability product teams should care about
You do not need to read Prometheus queries daily, but you need the golden signals: latency, traffic, errors, saturation. We deploy kube-prometheus-stack or use Datadog agents. Distributed tracing (OpenTelemetry) from ingress to DB lets product see which features correlate with latency spikes.
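On the kube-prometheus-stack route, scraping a service is typically declared with a ServiceMonitor. The sketch below assumes the Helm release is named `kube-prometheus-stack`, that the Service carries an `app: api` label, and that it exposes a named port `http` serving `/metrics`; all of these are assumptions.

```yaml
# Sketch only: release name, labels, port name, and path are illustrative assumptions.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: api
  labels:
    release: kube-prometheus-stack   # assumed release name; the chart's default selector matches on it
spec:
  selector:
    matchLabels:
      app: api                       # selects the Service (not the pods) to scrape
  endpoints:
    - port: http                     # named port on the Service
      path: /metrics
      interval: 30s
```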
Deploy workflow developers feel
- Merge to main → CI tests → image push → Argo sync (auto or manual promote)
- Preview environments per PR via Helm values + temporary namespace (optional)
- Rollback = revert Git commit or Argo history - minutes, not hours
- Runbooks linked from PagerDuty for failed deploy vs. failed health check
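The Argo sync step in the workflow above could be declared roughly like this. The repository URL, chart path, and values file name are hypothetical.

```yaml
# Sketch only: repo URL, path, and file names are illustrative assumptions.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: api-staging
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/deploy.git   # hypothetical GitOps repo
    targetRevision: main
    path: charts/api
    helm:
      valueFiles:
        - values-staging.yaml
  destination:
    server: https://kubernetes.default.svc   # the cluster Argo CD itself runs in
    namespace: staging
  syncPolicy:
    automated:
      prune: true      # delete resources that were removed from Git
      selfHeal: true   # revert manual drift in the cluster
```

Leaving out `syncPolicy.automated` gives the manual-promote variant: Argo CD reports the app as out of sync and waits for someone to trigger the sync.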
Platform team sizing
Expect 0.5-1 FTE platform engineer per 8-12 microservices until patterns stabilize. Templates and golden paths reduce this over time - that's what we deliver in platform engagements.
Security basics non-negotiable
- RBAC: developers deploy to staging; prod access requires a break-glass procedure
- NetworkPolicies: default deny east-west, allow explicit service mesh paths
- Pod security standards: non-root containers, read-only root filesystem where possible
- Image signing and admission controllers for prod clusters
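The default-deny posture can be sketched with two NetworkPolicies: one that blocks all ingress in the namespace, and one that opens a single explicit path. The `web` and `api` labels and port 8080 are assumptions for illustration.

```yaml
# Sketch only: namespace, labels, and port are illustrative assumptions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: prod
spec:
  podSelector: {}        # empty selector = every pod in the namespace
  policyTypes:
    - Ingress            # no ingress rules listed, so all inbound traffic is denied
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-web-to-api
  namespace: prod
spec:
  podSelector:
    matchLabels:
      app: api           # policy applies to the api pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: web   # only web pods in the same namespace may connect
      ports:
        - protocol: TCP
          port: 8080
```

These policies only take effect when the cluster's CNI plugin enforces NetworkPolicy; on a CNI without enforcement they are accepted but ignored.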
EKS specifics we standardize on
Managed node groups, IRSA for pod IAM roles (no long-lived AWS keys in clusters), ALB ingress, and RDS outside the cluster. Stateful workloads in K8s only when necessary - operators for Redis/Postgres exist but managed services reduce pager pain.
# IRSA annotation on ServiceAccount (pattern)
annotations:
  eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT:role/api-pod-role

Kubernetes is infrastructure glue, not a product feature. Product teams win when platform engineers expose golden paths - deploy button, logs, metrics, env vars - and keep YAML as an implementation detail. That's how we implement K8s for clients who need scale without hiring a 10-person platform org on day one.