DevOps & Cloud Learning Roadmap
Master modern DevOps practices and cloud infrastructure management
Duration: 28 weeks | 3 steps | 37 topics
Career Opportunities
- DevOps Engineer
- Site Reliability Engineer
- Cloud Architect
- Infrastructure Engineer
Step 1: Linux & Command Line
Master Linux fundamentals and essential command line tools for system administration
Time: 6 weeks | Level: beginner
- Linux File System & Navigation (required) — Learn the Linux directory hierarchy, absolute and relative paths, and essential navigation commands like ls, cd, and pwd.
- The root directory (/) is the top of the filesystem hierarchy
- Key directories include /etc for configuration, /var for logs, and /home for users
- ls -la shows detailed file listings including hidden files and permissions
- Absolute paths start from / while relative paths start from the current directory
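A quick sketch of these navigation commands, using a throwaway directory so nothing on the system is touched:

```shell
# Build a small tree in a temporary directory to practice navigation
workdir=$(mktemp -d)
mkdir -p "$workdir/projects/app"

cd "$workdir/projects/app"    # absolute path: starts from /
pwd                           # prints the full path to the current directory

cd ../..                      # relative path: up two levels
pwd                           # back at the temporary directory

ls -la "$workdir"             # long listing, including hidden entries
```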
- File Manipulation (required) — Master file operations with cp, mv, rm, and powerful text processing tools like find, grep, sed, and awk for everyday tasks.
- cp copies files and directories, with -r for recursive directory copies
- find searches the filesystem by name, type, size, or modification time
- grep filters text using regular expressions across files and streams
- sed performs stream editing for search-and-replace transformations
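A minimal sketch of these tools working together on throwaway files (filenames and log contents are invented for the example):

```shell
# Practice find, grep, sed, and cp on disposable files
dir=$(mktemp -d)
printf 'error: disk full\ninfo: ok\nerror: timeout\n' > "$dir/app.log"
printf 'all good\n' > "$dir/other.log"

find "$dir" -name '*.log' -type f        # locate files by name and type
grep -c '^error' "$dir/app.log"          # count lines starting with "error"
sed 's/error/ERROR/' "$dir/app.log"      # stream edit: rewrite matching text
cp "$dir/app.log" "$dir/app.log.bak"     # keep a backup before editing in place
```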
- Users & Permissions (required) — Manage Linux users, groups, and file permissions using chmod, chown, groups, and sudo for secure system administration.
- Permissions use read (4), write (2), and execute (1) for owner, group, and others
- chmod changes permissions using numeric (755) or symbolic (u+x) notation
- chown changes file ownership to a different user or group
- sudo grants temporary root privileges for administrative commands
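The numeric and symbolic notations side by side, on a temporary file (`stat -c` is the GNU coreutils form; macOS uses `stat -f '%Lp'`):

```shell
# Numeric permissions add read (4), write (2), and execute (1) per group
f=$(mktemp)
chmod 640 "$f"               # owner rw-, group r--, others ---
stat -c '%a' "$f"            # prints 640

chmod u+x "$f"               # symbolic form: add execute for the owner only
stat -c '%a' "$f"            # prints 740
```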
- Shell Scripting (required) — Write bash scripts with variables, conditionals, loops, and functions to automate repetitive system administration tasks.
- Scripts start with #!/bin/bash (shebang) and need execute permissions
- Variables are assigned without spaces and referenced with $ prefix
- Conditionals use if/elif/else with test brackets for comparisons
- Functions encapsulate reusable logic and accept positional parameters
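The four constructs above fit in a few lines of bash:

```shell
#!/bin/bash
# Variables, a function, a loop, and a conditional in one small script
greet() {                      # functions receive positional parameters
  local name="$1"
  echo "Hello, $name"
}

count=0
for word in one two three; do  # loop over a word list
  count=$((count + 1))
done

if [ "$count" -gt 2 ]; then    # test brackets for comparisons
  greet "world"                # prints: Hello, world
fi
```

Save it, `chmod +x` it, and run it with `./script.sh`.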
- Process Management (required) — Monitor and control running processes with ps, top, and kill, manage services with systemctl, and schedule recurring tasks with cron.
- ps aux lists all running processes with their PID, CPU, and memory usage
- top provides a real-time interactive view of system resource usage
- kill sends signals to processes, with SIGTERM (15) for graceful and SIGKILL (9) for forced termination
- cron schedules recurring tasks using the minute/hour/day/month/weekday format
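A sketch of graceful termination using a disposable background process; the crontab path at the end is a hypothetical example:

```shell
# Start a throwaway background process, then stop it gracefully
sleep 60 &
pid=$!

ps -p "$pid" -o pid,comm       # confirm it is running
kill -TERM "$pid"              # SIGTERM (15): request a clean exit
wait "$pid" || true            # reap it; exit status 143 = 128 + signal 15

# A crontab entry (minute hour day month weekday) running a hypothetical
# script nightly at 02:30 would look like:
# 30 2 * * * /usr/local/bin/backup.sh
```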
- Networking Basics (required) — Understand TCP/IP fundamentals, DNS resolution, port management, SSH connections, and diagnostic tools like curl and netstat.
- TCP/IP is a four-layer model: link, internet, transport, and application
- DNS translates domain names to IP addresses through recursive resolution
- Common ports include 22 (SSH), 80 (HTTP), 443 (HTTPS), and 5432 (PostgreSQL)
- curl makes HTTP requests from the command line for testing APIs and endpoints
- Package Management (recommended) — Install and manage software packages using apt, yum, and snap, and learn to compile from source when packages are unavailable.
- apt is the default package manager for Debian/Ubuntu distributions
- yum/dnf manages packages on RHEL, CentOS, and Fedora systems
- snap provides containerized cross-distribution package installation
- Compiling from source requires configure, make, and make install steps
- Vim/Nano Editors (recommended) — Edit files directly in the terminal using Vim and Nano, including basic navigation, editing, and search/replace operations.
- Vim has normal, insert, and command modes for different operations
- Nano is simpler with commands shown at the bottom of the screen
- In Vim, :wq saves and quits, :q! quits without saving
- Vim's /pattern searches forward and :%s/old/new/g replaces globally
- Environment Variables & Profiles (recommended) — Configure environment variables, shell profiles like .bashrc and .profile, manage the PATH variable, and use export for child processes.
- .bashrc runs for interactive non-login shells and is the common place for aliases
- PATH determines which directories the shell searches for executable commands
- export makes variables available to child processes spawned from the shell
- env lists all current environment variables in the session
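The export distinction is easiest to see by spawning a child shell; the `$HOME/bin` PATH entry is just an illustrative choice:

```shell
# Exported variables reach child processes; plain shell variables do not
LOCAL_ONLY="visible here"
export SHARED="visible in children"

sh -c 'echo "$SHARED"'        # prints the exported value
sh -c 'echo "$LOCAL_ONLY"'    # prints an empty line: not exported

# Appending a directory to PATH for this session
export PATH="$PATH:$HOME/bin"
```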
- SSH & Remote Access (recommended) — Securely connect to remote servers using SSH with key-based authentication, configure SSH settings, set up tunneling, and transfer files with SCP.
- ssh-keygen generates public/private key pairs for passwordless authentication
- ~/.ssh/config simplifies connections with host aliases and default settings
- SSH tunneling forwards local or remote ports securely through encrypted connections
- SCP copies files securely between local and remote machines over SSH
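A hypothetical `~/.ssh/config` entry (host alias, address, user, and key path are all examples; 203.0.113.10 is a documentation-reserved IP):

```
Host web1
    HostName 203.0.113.10
    User deploy
    IdentityFile ~/.ssh/id_ed25519
    Port 22
```

With this in place, `ssh web1` replaces the full command line, and `scp app.tar.gz web1:/tmp/` reuses the same alias for file transfer.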
- Log Management (optional) — View and manage system logs using journalctl and syslog, configure log rotation, and understand centralized logging basics.
- journalctl queries the systemd journal for service and kernel logs
- syslog stores messages in /var/log with facility and severity levels
- logrotate prevents log files from consuming all available disk space
- Centralized logging aggregates logs from multiple servers for analysis
- Linux Security Basics (optional) — Secure Linux systems using firewalls like ufw and iptables, understand SELinux policies, and apply system hardening techniques.
- ufw provides a simplified frontend for managing iptables firewall rules
- iptables defines packet filtering rules at the kernel level for network traffic
- SELinux enforces mandatory access controls beyond traditional Unix permissions
- Hardening includes disabling root SSH login, using fail2ban, and minimizing installed packages
Step 2: Containerization & CI/CD
Learn container technologies and orchestration with Docker and Kubernetes, and build CI/CD pipelines
Time: 10 weeks | Level: intermediate
- Docker Fundamentals (required) — Understand Docker images, containers, Dockerfiles, image layers, and the build process for packaging applications.
- Images are read-only templates built from Dockerfiles in sequential layers
- Containers are running instances of images with their own writable layer
- Each Dockerfile instruction creates a new layer that is cached for faster rebuilds
- docker build, run, stop, and rm are the core lifecycle commands
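A minimal Dockerfile sketch for a hypothetical Node.js app, showing how instruction order interacts with layer caching:

```dockerfile
# Each instruction below creates one cached layer
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev        # cached until the package files change
COPY . .                     # source changes invalidate only this layer onward
CMD ["node", "server.js"]
```

A typical lifecycle would be `docker build -t myapp:1.0 .` followed by `docker run --rm -p 3000:3000 myapp:1.0` (image name and port are examples).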
- Docker Compose (required) — Define and run multi-container applications with Docker Compose using volumes, networks, and service dependency management.
- docker-compose.yml defines services, networks, and volumes declaratively
- Volumes persist data beyond the container lifecycle for databases and state
- Networks isolate communication between groups of related containers
- depends_on controls service startup order but does not wait for readiness
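A `docker-compose.yml` sketch tying these pieces together (service names, ports, and the password are placeholder examples):

```yaml
services:
  web:
    build: .
    ports:
      - "8000:8000"
    depends_on:
      - db          # controls start order only, not readiness
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example   # use secrets, not literals, in real projects
    volumes:
      - db-data:/var/lib/postgresql/data   # persists beyond the container
volumes:
  db-data:
```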
- Container Registries (required) — Push and pull container images from registries including Docker Hub, Amazon ECR, and Google Container Registry with proper tagging strategies.
- Docker Hub is the default public registry for community and official images
- Private registries like ECR and GCR store proprietary application images securely
- Semantic version tags (v1.2.3) are preferred over mutable tags like latest
- Image scanning detects known vulnerabilities in base images and dependencies
- CI/CD Concepts (required) — Understand continuous integration, continuous delivery, and continuous deployment pipeline stages and best practices.
- Continuous integration merges and tests code changes frequently, ideally multiple times per day
- Continuous delivery ensures code is always in a deployable state through automated pipelines
- Continuous deployment automatically releases every change that passes the pipeline to production
- Pipeline stages typically include build, test, security scan, and deploy
- GitHub Actions (required) — Build CI/CD workflows with GitHub Actions using workflow files, jobs, steps, secrets management, and matrix build strategies.
- Workflows are defined in YAML files under .github/workflows/
- Jobs run in parallel by default and can be configured with dependencies
- Secrets are encrypted environment variables for API keys and credentials
- Matrix builds test across multiple OS versions, language versions, or configurations
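A workflow sketch combining a matrix, a secret, and a branch-gated step (`deploy.sh` and `API_KEY` are hypothetical names):

```yaml
# .github/workflows/ci.yml
name: ci
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node: [18, 20]          # runs the job once per version
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
      - run: npm ci && npm test
      - run: ./deploy.sh
        if: github.ref == 'refs/heads/main'
        env:
          API_KEY: ${{ secrets.API_KEY }}   # encrypted repository secret
```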
- Jenkins Basics (required) — Set up Jenkins CI/CD pipelines using declarative Jenkinsfiles, configure plugins, and manage build agents for distributed builds.
- Declarative pipelines use a structured Jenkinsfile with stages and steps
- Plugins extend Jenkins with integrations for Docker, Kubernetes, Slack, and more
- Build agents distribute workload across multiple machines for parallel execution
- Shared libraries promote code reuse across multiple pipeline definitions
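A declarative Jenkinsfile sketch with two stages (the `make` commands are placeholder build steps):

```groovy
pipeline {
    agent any                      // run on any available build agent
    stages {
        stage('Build') {
            steps {
                sh 'make build'
            }
        }
        stage('Test') {
            steps {
                sh 'make test'
            }
        }
    }
    post {
        failure {
            echo 'Build failed'    // plugins can route this to Slack, email, etc.
        }
    }
}
```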
- Container Orchestration Concepts (recommended) — Understand why Kubernetes is needed for production workloads and learn core concepts including pods, services, and deployments.
- Kubernetes automates deployment, scaling, and management of containerized apps
- Pods are the smallest deployable units containing one or more containers
- Services expose pods to network traffic with stable endpoints and load balancing
- Deployments manage pod replicas and enable rolling updates with zero downtime
- GitOps Workflow (recommended) — Implement GitOps practices using tools like ArgoCD and Flux for declarative, Git-driven infrastructure and application management.
- Git is the single source of truth for both application and infrastructure state
- ArgoCD continuously syncs the cluster state with the desired state in Git
- Pull-based deployment is more secure than push-based as the cluster pulls changes
- Declarative configuration eliminates manual kubectl commands in production
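An ArgoCD `Application` sketch illustrating the pull-based model (repository URL, paths, and names are examples):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/deploy-configs
    targetRevision: main
    path: apps/my-app
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual drift back to the Git state
```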
- Artifact Management (recommended) — Manage build artifacts using npm registries, Nexus Repository, and implement versioning strategies for reproducible builds.
- Artifact repositories store versioned build outputs for deployment and rollback
- Semantic versioning communicates the nature of changes in each release
- Nexus and Artifactory support multiple package formats in a single repository
- Testing in CI/CD (recommended) — Integrate unit tests, integration tests, and automated test suites into CI/CD pipelines for continuous quality assurance.
- Unit tests run first and fastest, catching issues at the function level
- Integration tests verify interactions between services and external dependencies
- Test reports and coverage metrics should be published as pipeline artifacts
- Flaky tests must be quarantined to maintain pipeline reliability
- GitLab CI/CD (optional) — Configure CI/CD pipelines in GitLab using .gitlab-ci.yml, manage runners, and set up deployment environments.
- .gitlab-ci.yml defines pipeline stages, jobs, and their execution rules
- Runners are agents that execute CI/CD jobs on shared or dedicated infrastructure
- Environments track deployments to staging, production, and review apps
- Auto DevOps provides pre-configured pipelines for common project types
- Build Tools & Strategies (optional) — Optimize container builds with multi-stage builds, layer caching, and build strategies that reduce image size and build time.
- Multi-stage builds separate build dependencies from the final runtime image
- Layer caching skips unchanged layers to dramatically speed up rebuilds
- Ordering Dockerfile instructions from least to most frequently changed maximizes cache hits
- Distroless and Alpine base images minimize the attack surface and image size
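A multi-stage Dockerfile sketch for a hypothetical Go service: the toolchain stays in the builder stage, and only the static binary ships in a distroless runtime image:

```dockerfile
FROM golang:1.22 AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download          # cached until the module files change
COPY . .
RUN CGO_ENABLED=0 go build -o /bin/app ./cmd/app

FROM gcr.io/distroless/static
COPY --from=build /bin/app /app
ENTRYPOINT ["/app"]
```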
Step 3: Cloud & Infrastructure as Code
Master cloud services and infrastructure as code with AWS, Terraform, and Kubernetes
Time: 12 weeks | Level: advanced
- AWS Core Services (required) — Learn essential AWS services including EC2 for compute, S3 for storage, VPC for networking, IAM for access control, RDS for databases, and Lambda for serverless.
- EC2 provides resizable virtual servers with multiple instance types for different workloads
- S3 offers virtually unlimited object storage with 99.999999999% durability
- VPC creates isolated network environments with subnets, route tables, and security groups
- IAM controls access with users, roles, and policies following least-privilege principles
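A least-privilege IAM policy sketch granting read-only access to a single hypothetical S3 bucket:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadOnlyAppBucket",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::example-app-bucket",
        "arn:aws:s3:::example-app-bucket/*"
      ]
    }
  ]
}
```

Attaching this to a role rather than a user keeps credentials short-lived and auditable.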
- Infrastructure as Code (Terraform) (required) — Provision and manage cloud infrastructure declaratively using Terraform with providers, resources, state management, and reusable modules.
- Providers connect Terraform to cloud platforms like AWS, GCP, and Azure
- Resources define the infrastructure components to create and manage
- State tracks the mapping between configuration and real-world resources
- Modules encapsulate reusable infrastructure patterns with input variables and outputs
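A Terraform sketch showing a provider, a variable, a resource, and an output in one file (region and bucket name are examples):

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.region
}

variable "region" {
  type    = string
  default = "us-east-1"
}

resource "aws_s3_bucket" "artifacts" {
  bucket = "example-artifacts-bucket"   # must be globally unique
}

output "bucket_arn" {
  value = aws_s3_bucket.artifacts.arn
}
```

The usual loop is `terraform init`, `terraform plan` to preview changes against state, then `terraform apply`.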
- Kubernetes Deep Dive (required) — Master advanced Kubernetes concepts including pod management, deployments, services, ConfigMaps, Secrets, and Ingress controllers.
- Deployments manage ReplicaSets and enable rolling updates and rollbacks
- ConfigMaps decouple configuration from container images for environment flexibility
- Secrets store sensitive data like passwords and tokens, though base64 encoding is obfuscation, not encryption
- Ingress controllers route external HTTP/HTTPS traffic to internal services
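A Deployment sketch wiring configuration and credentials into a container from a ConfigMap and a Secret (all names, keys, and the image tag are examples):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: example/web:1.2.3
          env:
            - name: LOG_LEVEL
              valueFrom:
                configMapKeyRef:
                  name: web-config
                  key: log_level
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: web-secrets
                  key: db_password
```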
- Monitoring & Observability (required) — Implement monitoring and observability using Prometheus for metrics collection, Grafana for dashboards, and configure alerting rules.
- Prometheus scrapes metrics from targets at configured intervals using a pull model
- PromQL queries time-series data for alerts, dashboards, and ad-hoc analysis
- Grafana visualizes metrics from multiple data sources in customizable dashboards
- Alerting rules trigger notifications when metrics exceed defined thresholds
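A Prometheus alerting rule sketch using the built-in `up` metric: fire a page when a scrape target stays unreachable for five minutes:

```yaml
groups:
  - name: availability
    rules:
      - alert: InstanceDown
        expr: up == 0        # PromQL: target failed its last scrape
        for: 5m              # must hold continuously before firing
        labels:
          severity: page
        annotations:
          summary: "{{ $labels.instance }} has been unreachable for 5 minutes"
```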
- Logging at Scale (required) — Aggregate and analyze logs at scale using the ELK/EFK stack, AWS CloudWatch, and implement structured logging practices.
- ELK stack combines Elasticsearch, Logstash, and Kibana for log aggregation and search
- EFK replaces Logstash with Fluentd for lighter-weight log forwarding in Kubernetes
- CloudWatch provides native AWS log collection, monitoring, and alarming
- Structured logging with JSON format enables efficient parsing and querying
- Helm Charts (recommended) — Package Kubernetes applications with Helm using templates, values files, chart repositories, and manage upgrades and rollbacks.
- Helm charts are packages of pre-configured Kubernetes resource templates
- Values files customize chart behavior without modifying the templates directly
- Chart repositories host and distribute charts like package registries
- helm upgrade and rollback manage releases with version history tracking
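A `values.yaml` sketch overriding a chart's defaults (the available keys depend entirely on the chart being used):

```yaml
replicaCount: 3
image:
  repository: example/web
  tag: "1.2.3"     # pin a version; avoid mutable tags like latest
```

Applied with something like `helm upgrade --install web ./chart -f values.yaml`; a bad release can be undone with `helm rollback web <revision>`.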
- AWS Advanced (recommended) — Explore advanced AWS services including ECS, EKS, CloudFormation, Route53 for DNS, and messaging services like SNS and SQS.
- ECS runs containers on AWS with Fargate for serverless or EC2 for managed compute
- EKS provides managed Kubernetes with AWS integration for networking and IAM
- Route53 handles DNS routing with health checks and failover configurations
- SNS/SQS enable decoupled architectures with pub/sub and message queue patterns
- GCP / Azure Basics (recommended) — Compare major cloud providers including GCP and Azure, and understand multi-cloud considerations for avoiding vendor lock-in.
- GCP excels in data analytics, machine learning, and Kubernetes (GKE)
- Azure integrates deeply with Microsoft enterprise tools and Active Directory
- Multi-cloud strategies reduce vendor lock-in but increase operational complexity
- Terraform and Pulumi enable infrastructure code that works across cloud providers
- Service Mesh (recommended) — Implement service mesh with Istio and Envoy for traffic management, mutual TLS encryption, and observability between microservices.
- A service mesh manages service-to-service communication with sidecar proxies
- Istio provides traffic management, security, and observability for microservices
- Envoy proxy handles load balancing, retries, and circuit breaking transparently
- Mutual TLS (mTLS) encrypts all service-to-service communication automatically
- Secrets Management (recommended) — Securely store and manage secrets using HashiCorp Vault, AWS Secrets Manager, and Kubernetes Sealed Secrets.
- Vault provides dynamic secrets, encryption as a service, and access control
- AWS Secrets Manager rotates credentials automatically on a configured schedule
- Sealed Secrets encrypt Kubernetes secrets safely for storage in Git repositories
- Never store secrets in code, environment files, or container images
- Chaos Engineering (optional) — Practice chaos engineering with tools like Chaos Monkey and Litmus, run game days, and understand blast radius management.
- Chaos engineering proactively injects failures to discover system weaknesses
- Start with small blast radius experiments in non-production environments
- Game days are scheduled events where teams practice incident response with controlled chaos
- Litmus provides Kubernetes-native chaos experiments with CRD-based workflows
- Cost Optimization (optional) — Optimize cloud spending using FinOps practices, reserved and spot instances, and right-sizing resources to match actual workloads.
- FinOps brings financial accountability to cloud spending through cross-team collaboration
- Reserved instances save 30-60% over on-demand pricing for predictable workloads
- Spot instances offer up to 90% savings for fault-tolerant and flexible workloads
- Right-sizing matches instance types to actual resource utilization to eliminate waste
- Platform Engineering (optional) — Build internal developer platforms using tools like Backstage to provide self-service infrastructure and improve developer experience.
- Platform engineering builds golden paths that simplify infrastructure for developers
- Backstage provides a unified developer portal with service catalogs and templates
- Self-service platforms reduce time-to-deploy and dependency on platform teams
- Internal developer platforms standardize tooling while allowing flexibility for teams
