The Complete Terraform Guide: From Zero to Multi-Cloud
I still remember the moment Terraform clicked for me. I was SSHing into a production server at 2 AM, manually running apt-get install nginx for the fourth time that month, when it hit me: I was being a human script runner. Everything I was doing could be — and should be — codified, versioned, and automated.
That was five years ago. Since then, I've used Terraform to manage infrastructure across AWS, GCP, and Azure. I've made every mistake in the book: accidentally destroying production databases, creating circular dependencies that took hours to untangle, and writing modules so complex that nobody could understand them. This guide distills all of that pain into something useful.
Whether you're provisioning your first EC2 instance or architecting a multi-cloud platform, this guide covers the entire Terraform journey. No hand-waving, no "just use a module from the registry" — real understanding from the ground up.
Chapter 1: What Terraform Actually Is (and Isn't)
Terraform is a declarative infrastructure-as-code tool made by HashiCorp. You describe the desired state of your infrastructure in .tf files, and Terraform figures out how to make reality match your description.
The key insight is declarative vs imperative. You don't tell Terraform "create a server, then attach a disk, then configure a firewall rule." You say "I want a server with this disk and these firewall rules," and Terraform handles the sequencing, dependencies, and API calls.
The Core Workflow
```hcl
# 1. Write your infrastructure code
# main.tf
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"

  tags = {
    Name = "birjob-web-server"
  }
}
```

```bash
# 2. Initialize the working directory
terraform init

# 3. Preview what will change
terraform plan

# 4. Apply the changes
terraform apply

# 5. When you're done, tear it all down
terraform destroy
```
What Terraform Is Not
- Not a configuration management tool. Terraform provisions infrastructure (servers, networks, databases). Ansible, Chef, and Puppet configure what runs on that infrastructure. They're complementary, not competitors.
- Not a deployment tool. Terraform creates the server and the load balancer. Your CI/CD pipeline deploys your application code to them.
- Not magic. It still calls the same cloud APIs you would. It just does it consistently, repeatably, and with state tracking.
Chapter 2: State — Terraform's Memory
State is the concept that confuses most Terraform beginners and causes the most production incidents for experts. Terraform's state file (terraform.tfstate) is a JSON file that maps your configuration to real-world resources.
Why State Exists
Without state, Terraform would have to query every resource in your cloud account to figure out what it manages. With thousands of resources, this would be impossibly slow. State provides a cache and a mapping: "resource aws_instance.web in my config corresponds to EC2 instance i-0abc123def456 in AWS."
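To make that mapping concrete, here is a trimmed, illustrative sketch of what one entry in `terraform.tfstate` looks like — the field names follow the state format, but the values are invented for this example and real state files carry many more attributes:

```json
{
  "resources": [
    {
      "mode": "managed",
      "type": "aws_instance",
      "name": "web",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "attributes": {
            "id": "i-0abc123def456",
            "ami": "ami-0c55b159cbfafe1f0",
            "instance_type": "t3.micro"
          }
        }
      ]
    }
  ]
}
```

This is also why state can contain secrets in plain text: whatever attributes the provider returns get recorded here, which is one more reason to encrypt and restrict access to the backend.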
Remote State: Non-Negotiable for Teams
The default state is stored locally in a file. This is fine for learning but catastrophic for teams. Two people running terraform apply simultaneously with local state will corrupt your infrastructure. According to HashiCorp's documentation on remote state, remote backends with state locking are required for any team workflow.
```hcl
# backend.tf — S3 backend with DynamoDB locking
terraform {
  backend "s3" {
    bucket         = "birjob-terraform-state"
    key            = "production/terraform.tfstate"
    region         = "eu-west-1"
    encrypt        = true
    dynamodb_table = "terraform-lock"
  }
}

# Create the lock table first
resource "aws_dynamodb_table" "terraform_lock" {
  name         = "terraform-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```
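One wrinkle worth knowing: backend blocks cannot interpolate variables, so values must be literal. A common workaround is partial configuration — declare an empty `backend "s3" {}` and supply the details at init time (the values below are the same example values as above):

```bash
# backend.tf contains only: terraform { backend "s3" {} }
terraform init \
  -backend-config="bucket=birjob-terraform-state" \
  -backend-config="key=production/terraform.tfstate" \
  -backend-config="region=eu-west-1"
```

This lets the same configuration target different state buckets per environment without editing code.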
State Operations You Need to Know
```bash
# List all resources in state
terraform state list

# Show details of a specific resource
terraform state show aws_instance.web

# Move a resource (renamed in config)
terraform state mv aws_instance.web aws_instance.web_server

# Remove from state without destroying the real resource
terraform state rm aws_instance.legacy

# Import an existing resource into state (adopting existing resources)
terraform import aws_instance.web i-0abc123def456

# Force unlock state (dangerous, only when lock is stale)
terraform force-unlock LOCK_ID
```
Chapter 3: HCL Deep Dive — The Language
HashiCorp Configuration Language (HCL) is purpose-built for infrastructure. It's not a general-purpose programming language — and that's intentional. The constraints are features.
Variables and Outputs
```hcl
# variables.tf
variable "environment" {
  description = "Deployment environment"
  type        = string
  default     = "staging"

  validation {
    condition     = contains(["staging", "production"], var.environment)
    error_message = "Environment must be 'staging' or 'production'."
  }
}

variable "instance_config" {
  description = "EC2 instance configuration"
  type = object({
    instance_type     = string
    disk_size_gb      = number
    enable_monitoring = bool
  })
  default = {
    instance_type     = "t3.micro"
    disk_size_gb      = 20
    enable_monitoring = false
  }
}

# outputs.tf
output "web_server_ip" {
  description = "Public IP of the web server"
  value       = aws_instance.web.public_ip
  sensitive   = false
}

output "database_connection_string" {
  description = "Database connection string"
  value       = "postgresql://${var.db_user}:${var.db_password}@${aws_db_instance.main.endpoint}/birjob"
  sensitive   = true # Won't show in CLI output
}
```
Locals, Data Sources, and Dynamic Blocks
```hcl
# Locals: computed values you reference multiple times
locals {
  common_tags = {
    Project     = "birjob"
    Environment = var.environment
    ManagedBy   = "terraform"
    CostCenter  = "engineering"
  }

  is_production = var.environment == "production"
  instance_type = local.is_production ? "t3.large" : "t3.micro"
}

# Data sources: read existing infrastructure
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

# Dynamic blocks: generate repeated nested blocks
resource "aws_security_group" "web" {
  name = "birjob-web-sg"

  dynamic "ingress" {
    for_each = var.allowed_ports
    content {
      from_port   = ingress.value
      to_port     = ingress.value
      protocol    = "tcp"
      cidr_blocks = ["0.0.0.0/0"]
    }
  }
}
```
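The dynamic block above assumes an `allowed_ports` variable exists; a matching declaration might look like this (the default ports are illustrative):

```hcl
variable "allowed_ports" {
  description = "Inbound TCP ports to open on the web security group"
  type        = set(number)
  default     = [80, 443]
}
```

Using `set(number)` rather than `list(number)` avoids duplicate ingress rules if the same port is listed twice.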
For Expressions and Conditionals
```hcl
# for expression: transform collections
locals {
  # List of uppercase names
  server_names = [for s in var.servers : upper(s.name)]

  # Map from name to IP
  server_ips = { for s in aws_instance.servers : s.tags.Name => s.private_ip }

  # Filtered list
  production_servers = [for s in var.servers : s if s.environment == "production"]
}

# Conditional resource creation
resource "aws_cloudwatch_metric_alarm" "high_cpu" {
  count = local.is_production ? 1 : 0 # Only in production

  alarm_name          = "birjob-high-cpu"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = 300
  statistic           = "Average"
  threshold           = 80
}

# for_each: better than count for named resources
resource "aws_iam_user" "team" {
  for_each = toset(["alice", "bob", "charlie"])
  name     = each.value
}
```
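Why `for_each` beats `count` for named resources: `count` addresses instances by position, so removing an item from the middle of the list shifts every later index and makes Terraform plan to destroy and recreate the shifted resources. A sketch of the pitfall (resource names here are illustrative):

```hcl
variable "users" {
  type    = list(string)
  default = ["alice", "bob", "charlie"]
}

# With count: instances are team_by_count[0], [1], [2].
# Removing "bob" renumbers charlie from [2] to [1], which Terraform
# sees as "destroy charlie at [2], create charlie at [1]".
resource "aws_iam_user" "team_by_count" {
  count = length(var.users)
  name  = var.users[count.index]
}

# With for_each: instances are keyed by name, e.g. team_by_each["alice"].
# Removing "bob" only plans the destruction of team_by_each["bob"].
resource "aws_iam_user" "team_by_each" {
  for_each = toset(var.users)
  name     = each.value
}
```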
Chapter 4: Modules — Reusable Infrastructure
Modules are Terraform's unit of reuse. A module is just a directory of .tf files with input variables and outputs. The Terraform module documentation provides conventions, but here's what actually works in practice.
Module Structure
```text
modules/
├── vpc/
│   ├── main.tf
│   ├── variables.tf
│   ├── outputs.tf
│   └── README.md
├── ecs-service/
│   ├── main.tf
│   ├── variables.tf
│   ├── outputs.tf
│   ├── iam.tf
│   └── alb.tf
└── rds/
    ├── main.tf
    ├── variables.tf
    └── outputs.tf
```
Writing a Good Module
```hcl
# modules/ecs-service/variables.tf
variable "service_name" {
  description = "Name of the ECS service"
  type        = string
}

variable "container_image" {
  description = "Docker image URI"
  type        = string
}

variable "container_port" {
  description = "Port the container listens on"
  type        = number
  default     = 3000
}

variable "desired_count" {
  description = "Number of task instances"
  type        = number
  default     = 2
}

variable "environment_variables" {
  description = "Environment variables for the container"
  type        = map(string)
  default     = {}
}

# modules/ecs-service/main.tf
resource "aws_ecs_task_definition" "this" {
  family                   = var.service_name
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 256
  memory                   = 512
  execution_role_arn       = aws_iam_role.execution.arn

  container_definitions = jsonencode([{
    name      = var.service_name
    image     = var.container_image
    essential = true

    portMappings = [{
      containerPort = var.container_port
      protocol      = "tcp"
    }]

    environment = [
      for key, value in var.environment_variables : {
        name  = key
        value = value
      }
    ]

    logConfiguration = {
      logDriver = "awslogs"
      options = {
        "awslogs-group"         = aws_cloudwatch_log_group.this.name
        "awslogs-region"        = data.aws_region.current.name
        "awslogs-stream-prefix" = var.service_name
      }
    }
  }])
}

# modules/ecs-service/outputs.tf
output "service_name" {
  value = aws_ecs_service.this.name
}

output "task_definition_arn" {
  value = aws_ecs_task_definition.this.arn
}

# Using the module
module "birjob_api" {
  source = "./modules/ecs-service"

  service_name    = "birjob-api"
  container_image = "123456789.dkr.ecr.eu-west-1.amazonaws.com/birjob-api:latest"
  container_port  = 3000
  desired_count   = 3

  environment_variables = {
    DATABASE_URL = module.rds.connection_string
    REDIS_URL    = module.elasticache.endpoint
    NODE_ENV     = "production"
  }
}
```
Module Design Principles
After writing dozens of modules, these principles have saved me the most time:
- One module, one concern. A "vpc" module creates a VPC with subnets. It doesn't also create EC2 instances.
- Expose what consumers need, hide what they don't. Internal implementation details should not be variables.
- Use sensible defaults. A module should work with zero optional variables for the common case.
- Version your modules. Use Git tags:
```hcl
source = "git::https://github.com/org/modules.git//vpc?ref=v2.1.0"
```
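For modules published to the Terraform Registry, the equivalent is a `version` constraint on the module block. As an illustration (using the widely used community VPC module; the version constraint is an example, not a recommendation):

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0" # allow patch/minor updates within 5.x

  # ... module inputs ...
}
```

Either way, pinning means a module change is an explicit, reviewable diff rather than a surprise on the next `terraform init`.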
Chapter 5: Multi-Cloud Architecture
Running on multiple clouds is one of Terraform's greatest strengths — and greatest sources of complexity. The provider system lets you manage AWS, GCP, Azure, Cloudflare, GitHub, and hundreds of other services from the same codebase.
Provider Configuration
```hcl
# providers.tf
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
    cloudflare = {
      source  = "cloudflare/cloudflare"
      version = "~> 4.0"
    }
  }
}

provider "aws" {
  region = "eu-west-1"

  default_tags {
    tags = local.common_tags
  }
}

provider "google" {
  project = "birjob-production"
  region  = "europe-west1"
}

# Multiple provider instances (aliases)
provider "aws" {
  alias  = "us_east"
  region = "us-east-1"
}

# Use aliased provider
resource "aws_acm_certificate" "cdn_cert" {
  provider    = aws.us_east # CloudFront requires certs in us-east-1
  domain_name = "birjob.com"
}
```
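Aliased providers also have to be handed to modules explicitly, because modules only inherit default provider configurations. A sketch (the module path is illustrative):

```hcl
module "cdn" {
  source = "./modules/cdn"

  # Pass the us-east-1 alias into the module under the name it expects
  providers = {
    aws = aws.us_east
  }
}
```

If a module needs several configurations of the same provider at once, it declares them via `configuration_aliases` in its own `required_providers` block.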
Multi-Cloud Strategy: When It Makes Sense
| Scenario | Recommendation | Why |
|---|---|---|
| Avoiding vendor lock-in | Usually not worth it | Abstraction cost > switching cost for most companies |
| Best-of-breed services | Good reason | GCP BigQuery + AWS Lambda + Cloudflare CDN |
| Regulatory requirements | Often necessary | Data residency laws may require specific regions/providers |
| Disaster recovery | Good for critical systems | Survive a full cloud provider outage |
| Cost optimization | High effort, high reward | Spot/preemptible pricing varies by cloud |
Chapter 6: Terraform in CI/CD
Manual terraform apply from a laptop is how production incidents happen. Every team beyond two people should run Terraform from CI/CD. According to HashiCorp's automation guide, automated workflows reduce configuration drift and human error significantly.
GitHub Actions Workflow
```yaml
# .github/workflows/terraform.yml
name: Terraform

on:
  pull_request:
    paths: ['infrastructure/**']
  push:
    branches: [main]
    paths: ['infrastructure/**']

permissions:
  id-token: write
  contents: read
  pull-requests: write

jobs:
  plan:
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.0
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789:role/terraform-ci
          aws-region: eu-west-1
      - name: Terraform Init
        run: terraform init
        working-directory: infrastructure
      - name: Terraform Plan
        id: plan
        run: terraform plan -no-color -out=tfplan
        working-directory: infrastructure
      - name: Comment Plan on PR
        uses: actions/github-script@v7
        with:
          script: |
            const plan = `${{ steps.plan.outputs.stdout }}`;
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## Terraform Plan\n\`\`\`\n${plan}\n\`\`\``
            });

  apply:
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    environment: production
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.0
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789:role/terraform-ci
          aws-region: eu-west-1
      - name: Terraform Init
        run: terraform init
        working-directory: infrastructure
      - name: Terraform Apply
        run: terraform apply -auto-approve
        working-directory: infrastructure
```
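Two cheap gates worth running before the plan step in any Terraform pipeline (the step names are my own convention): a formatting check, which fails the build on unformatted files, and `terraform validate`, which catches syntax and internal-consistency errors without touching any cloud API.

```yaml
      - name: Terraform Format Check
        run: terraform fmt -check -recursive
        working-directory: infrastructure
      - name: Terraform Validate
        run: terraform validate
        working-directory: infrastructure
```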
Safety Mechanisms
```hcl
# Prevent accidental destruction of critical resources
resource "aws_db_instance" "production" {
  # ... configuration ...

  lifecycle {
    prevent_destroy = true # Terraform will error if you try to destroy this
  }
}

# Ignore changes made outside of Terraform
resource "aws_instance" "web" {
  # ... configuration ...

  lifecycle {
    ignore_changes = [
      tags["LastModifiedBy"], # Auto-updated by AWS Config
      user_data,              # Don't redeploy for user_data changes
    ]
  }
}
```
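A third lifecycle argument worth knowing is `create_before_destroy`, which flips the default replacement order so the new resource exists before the old one is removed — useful for anything where a gap means downtime. A sketch:

```hcl
resource "aws_launch_template" "web" {
  # ... configuration ...

  lifecycle {
    # When a change forces replacement, build the new resource
    # first, repoint dependents, then destroy the old one.
    create_before_destroy = true
  }
}
```

Note that this only works when the old and new resources can coexist (no unique-name collisions), so it often pairs with `name_prefix`-style arguments.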
Chapter 7: Advanced Patterns
Workspaces for Environment Isolation
```bash
# Create workspaces
terraform workspace new staging
terraform workspace new production

# Switch workspace
terraform workspace select production
```

```hcl
# Use workspace in configuration
locals {
  environment = terraform.workspace

  config = {
    staging = {
      instance_type = "t3.micro"
      min_size      = 1
      max_size      = 2
    }
    production = {
      instance_type = "t3.large"
      min_size      = 3
      max_size      = 10
    }
  }

  current = local.config[local.environment]
}
```
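The per-workspace config map is then consumed wherever the environments differ; a sketch, reusing the `local.current` lookup from above (the AMI data source is assumed to exist elsewhere in the configuration):

```hcl
resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = local.current.instance_type

  tags = {
    Name        = "birjob-web-${local.environment}"
    Environment = local.environment
  }
}
```

The lookup also acts as a guard: running in an unlisted workspace fails fast with a key-not-found error instead of silently using defaults.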
Terragrunt for DRY Configuration
Terragrunt is a thin wrapper around Terraform that keeps configurations DRY across environments:
```hcl
# live/
# ├── terragrunt.hcl (root config)
# ├── staging/
# │   ├── vpc/terragrunt.hcl
# │   ├── ecs/terragrunt.hcl
# │   └── rds/terragrunt.hcl
# └── production/
#     ├── vpc/terragrunt.hcl
#     ├── ecs/terragrunt.hcl
#     └── rds/terragrunt.hcl

# live/production/ecs/terragrunt.hcl
terraform {
  source = "../../../modules//ecs-service"
}

include "root" {
  path = find_in_parent_folders()
}

dependency "vpc" {
  config_path = "../vpc"
}

dependency "rds" {
  config_path = "../rds"
}

inputs = {
  service_name    = "birjob-api"
  container_image = "123456789.dkr.ecr.eu-west-1.amazonaws.com/birjob-api:v2.5.0"
  vpc_id          = dependency.vpc.outputs.vpc_id
  subnet_ids      = dependency.vpc.outputs.private_subnet_ids
  database_url    = dependency.rds.outputs.connection_string
  desired_count   = 3
}
```
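Because the `dependency` blocks describe a graph, Terragrunt can also operate on an entire environment at once, resolving stacks in dependency order. A command sketch:

```bash
# From live/production/: plan or apply vpc, rds, and ecs
# in the order the dependency graph dictates
terragrunt run-all plan
terragrunt run-all apply
```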
Testing Terraform Code
```hcl
# Built-in validation (Terraform 1.5+)
check "health_check" {
  data "http" "birjob_health" {
    url = "https://${aws_lb.main.dns_name}/api/health"
  }

  assert {
    condition     = data.http.birjob_health.status_code == 200
    error_message = "Health check failed after deployment"
  }
}
```

```go
// test/vpc_test.go — Terratest (Go-based integration testing)
package test

import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)

func TestVpc(t *testing.T) {
	terraformOptions := &terraform.Options{
		TerraformDir: "../modules/vpc",
		Vars: map[string]interface{}{
			"cidr_block":  "10.0.0.0/16",
			"environment": "test",
		},
	}

	// Destroy the test infrastructure even if assertions fail
	defer terraform.Destroy(t, terraformOptions)
	terraform.InitAndApply(t, terraformOptions)

	vpcId := terraform.Output(t, terraformOptions, "vpc_id")
	assert.NotEmpty(t, vpcId)
}
```
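Terraform 1.6 also shipped a native test framework: `.tftest.hcl` files containing `run` blocks, executed with `terraform test`. A minimal sketch against the VPC module (the variable and output names are assumptions matching the Terratest example above):

```hcl
# tests/vpc.tftest.hcl — run with `terraform test`
variables {
  cidr_block  = "10.0.0.0/16"
  environment = "test"
}

run "creates_vpc" {
  command = apply

  assert {
    condition     = output.vpc_id != ""
    error_message = "vpc_id output should not be empty"
  }
}
```

Resources created by a `run` block with `command = apply` are destroyed automatically when the test finishes, which makes this a lighter-weight option than Terratest for module-level checks.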
Chapter 8: My Opinionated Take
After five years of daily Terraform use, here's what I'd tell someone starting today:
1. Start simple, stay simple as long as possible. A single main.tf file with 200 lines is better than a "best practices" directory structure with 30 files and 3 modules for a simple project. Complexity should be earned by real pain, not anticipated.
2. State is your single point of failure. Treat your state backend with the same care you'd treat a production database. Enable versioning on the S3 bucket, enable encryption, restrict access. A corrupted state file can ruin your week.
3. Don't abstract too early. I've seen teams create elaborate module hierarchies for infrastructure they haven't validated yet. Write the resources inline first. When you find yourself copy-pasting the same block for the third time, then extract a module.
4. terraform plan is not enough. A plan that shows "0 to add, 1 to change, 0 to destroy" can still break your application. A security group change that's "in-place" might cause a brief connection drop. Always understand what is changing, not just the count.
5. Terraform is not the right tool for everything. Application configuration (feature flags, environment variables that change frequently) should live in your application's config system, not in Terraform. If you find yourself running terraform apply multiple times a day, something is in the wrong layer.
Chapter 9: Action Plan — Your First 30 Days
Days 1-7: Learn the Basics
- Install Terraform CLI and set up a free AWS account
- Create an EC2 instance, an S3 bucket, and a security group
- Practice `terraform plan`, `apply`, and `destroy`
- Read the official getting started tutorial
Days 8-14: State and Backend
- Set up remote state with S3 + DynamoDB locking
- Practice `terraform state` commands
- Import an existing resource into state
- Break your state on purpose (in dev!) and fix it
Days 15-21: Modules and Patterns
- Write your first module (start with a simple VPC)
- Use the module from a root configuration
- Explore the Terraform Registry for community modules
- Set up a basic CI/CD pipeline with `terraform plan` on PRs
Days 22-30: Production Readiness
- Implement workspaces or Terragrunt for multi-environment
- Set up automated `terraform apply` on merge to main
- Write your first `check` block for post-apply validation
- Document your infrastructure decisions in ADRs
Conclusion
Terraform changed how I think about infrastructure. It's not just an automation tool — it's a way of making infrastructure decisions visible, reviewable, and reversible. Every change goes through a PR. Every deployment has a plan. Every environment is reproducible.
The learning curve is real, but the payoff is enormous. Start small, commit your .tf files to version control from day one, and never, ever store state locally in production.
Sources
- HashiCorp Terraform Product Page
- Terraform Documentation: Remote State
- Terraform Documentation: Module Development
- Terraform Documentation: Providers
- Terraform Automation Tutorials
- Terragrunt: DRY Terraform
- Terraform Registry
I'm Ismat, and I build BirJob — Azerbaijan's job aggregator scraping 80+ sources daily.
