The Complete Terraform Guide: From Zero to Multi-Cloud
I still remember the moment Terraform clicked for me. I was SSHing into a production server at 2 AM, manually running apt-get install nginx for the fourth time that month, when it hit me: I was being a human script runner. Everything I was doing could be — and should be — codified, versioned, and automated.
That was five years ago. Since then, I've used Terraform to manage infrastructure across AWS, GCP, and Azure. I've made every mistake in the book: accidentally destroying production databases, creating circular dependencies that took hours to untangle, and writing modules so complex that nobody could understand them. This guide distills all of that pain into something useful.
Whether you're provisioning your first EC2 instance or architecting a multi-cloud platform, this guide covers the entire Terraform journey. No hand-waving, no "just use a module from the registry" — real understanding from the ground up.
Chapter 1: What Terraform Actually Is (and Isn't)
Terraform is a declarative infrastructure-as-code tool made by HashiCorp. You describe the desired state of your infrastructure in .tf files, and Terraform figures out how to make reality match your description.
The key insight is declarative vs imperative. You don't tell Terraform "create a server, then attach a disk, then configure a firewall rule." You say "I want a server with this disk and these firewall rules," and Terraform handles the sequencing, dependencies, and API calls.
The Core Workflow
```hcl
# 1. Write your infrastructure code
# main.tf
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"

  tags = {
    Name = "birjob-web-server"
  }
}
```

```bash
# 2. Initialize the working directory
terraform init

# 3. Preview what will change
terraform plan

# 4. Apply the changes
terraform apply

# 5. When you're done, tear it all down
terraform destroy
```
What Terraform Is Not
- Not a configuration management tool. Terraform provisions infrastructure (servers, networks, databases). Ansible, Chef, and Puppet configure what runs on that infrastructure. They're complementary, not competitors.
- Not a deployment tool. Terraform creates the server and the load balancer. Your CI/CD pipeline deploys your application code to them.
- Not magic. It still calls the same cloud APIs you would. It just does it consistently, repeatably, and with state tracking.
Chapter 2: State — Terraform's Memory
State is the concept that confuses most Terraform beginners and causes the most production incidents for experts. Terraform's state file (terraform.tfstate) is a JSON file that maps your configuration to real-world resources.
Why State Exists
Without state, Terraform would have to query every resource in your cloud account to figure out what it manages. With thousands of resources, this would be impossibly slow. State provides a cache and a mapping: "resource aws_instance.web in my config corresponds to EC2 instance i-0abc123def456 in AWS."
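To make that mapping concrete, here is a trimmed, illustrative sketch of what one entry in `terraform.tfstate` looks like — the field names follow the state format, but the values are invented for this example and real state files carry many more attributes:

```json
{
  "resources": [
    {
      "mode": "managed",
      "type": "aws_instance",
      "name": "web",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "attributes": {
            "id": "i-0abc123def456",
            "ami": "ami-0c55b159cbfafe1f0",
            "instance_type": "t3.micro"
          }
        }
      ]
    }
  ]
}
```

This is also why state can contain secrets in plain text: whatever attributes the provider returns get recorded here, which is one more reason to encrypt and restrict access to the backend.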
Remote State: Non-Negotiable for Teams
The default state is stored locally in a file. This is fine for learning but catastrophic for teams. Two people running terraform apply simultaneously with local state will corrupt your infrastructure. According to HashiCorp's documentation on remote state, remote backends with state locking are required for any team workflow.
```hcl
# backend.tf — S3 backend with DynamoDB locking
terraform {
  backend "s3" {
    bucket         = "birjob-terraform-state"
    key            = "production/terraform.tfstate"
    region         = "eu-west-1"
    encrypt        = true
    dynamodb_table = "terraform-lock"
  }
}

# Create the lock table first
resource "aws_dynamodb_table" "terraform_lock" {
  name         = "terraform-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```
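One wrinkle worth knowing: backend blocks cannot interpolate variables, so values must be literal. A common workaround is partial configuration — declare an empty `backend "s3" {}` and supply the details at init time (the values below are the same example values as above):

```bash
# backend.tf contains only: terraform { backend "s3" {} }
terraform init \
  -backend-config="bucket=birjob-terraform-state" \
  -backend-config="key=production/terraform.tfstate" \
  -backend-config="region=eu-west-1"
```

This lets the same configuration target different state buckets per environment without editing code.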
State Operations You Need to Know
```bash
# List all resources in state
terraform state list

# Show details of a specific resource
terraform state show aws_instance.web

# Move a resource (renamed in config)
terraform state mv aws_instance.web aws_instance.web_server

# Remove from state without destroying the real resource
terraform state rm aws_instance.legacy

# Import an existing resource into state (adopting existing resources)
terraform import aws_instance.web i-0abc123def456

# Force unlock state (dangerous, only when lock is stale)
terraform force-unlock LOCK_ID
```
Chapter 3: HCL Deep Dive — The Language
HashiCorp Configuration Language (HCL) is purpose-built for infrastructure. It's not a general-purpose programming language — and that's intentional. The constraints are features.
Variables and Outputs
```hcl
# variables.tf
variable "environment" {
  description = "Deployment environment"
  type        = string
  default     = "staging"

  validation {
    condition     = contains(["staging", "production"], var.environment)
    error_message = "Environment must be 'staging' or 'production'."
  }
}

variable "instance_config" {
  description = "EC2 instance configuration"
  type = object({
    instance_type     = string
    disk_size_gb      = number
    enable_monitoring = bool
  })
  default = {
    instance_type     = "t3.micro"
    disk_size_gb      = 20
    enable_monitoring = false
  }
}

# outputs.tf
output "web_server_ip" {
  description = "Public IP of the web server"
  value       = aws_instance.web.public_ip
  sensitive   = false
}

output "database_connection_string" {
  description = "Database connection string"
  value       = "postgresql://${var.db_user}:${var.db_password}@${aws_db_instance.main.endpoint}/birjob"
  sensitive   = true # Won't show in CLI output
}
```
Locals, Data Sources, and Dynamic Blocks
```hcl
# Locals: computed values you reference multiple times
locals {
  common_tags = {
    Project     = "birjob"
    Environment = var.environment
    ManagedBy   = "terraform"
    CostCenter  = "engineering"
  }

  is_production = var.environment == "production"
  instance_type = local.is_production ? "t3.large" : "t3.micro"
}

# Data sources: read existing infrastructure
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

# Dynamic blocks: generate repeated nested blocks
resource "aws_security_group" "web" {
  name = "birjob-web-sg"

  dynamic "ingress" {
    for_each = var.allowed_ports
    content {
      from_port   = ingress.value
      to_port     = ingress.value
      protocol    = "tcp"
      cidr_blocks = ["0.0.0.0/0"]
    }
  }
}
```
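The dynamic block above assumes an `allowed_ports` variable exists; a matching declaration might look like this (the default ports are illustrative):

```hcl
variable "allowed_ports" {
  description = "Inbound TCP ports to open on the web security group"
  type        = set(number)
  default     = [80, 443]
}
```

Using `set(number)` rather than `list(number)` avoids duplicate ingress rules if the same port is listed twice.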
For Expressions and Conditionals
```hcl
# for expression: transform collections
locals {
  # List of uppercase names
  server_names = [for s in var.servers : upper(s.name)]

  # Map from name to IP
  server_ips = { for s in aws_instance.servers : s.tags.Name => s.private_ip }

  # Filtered list
  production_servers = [for s in var.servers : s if s.environment == "production"]
}

# Conditional resource creation
resource "aws_cloudwatch_metric_alarm" "high_cpu" {
  count = local.is_production ? 1 : 0 # Only in production

  alarm_name          = "birjob-high-cpu"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = 300
  statistic           = "Average"
  threshold           = 80
}

# for_each: better than count for named resources
resource "aws_iam_user" "team" {
  for_each = toset(["alice", "bob", "charlie"])
  name     = each.value
}
```
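Why `for_each` beats `count` for named resources: `count` addresses instances by position, so removing an item from the middle of the list shifts every later index and makes Terraform plan to destroy and recreate the shifted resources. A sketch of the pitfall (resource names here are illustrative):

```hcl
variable "users" {
  type    = list(string)
  default = ["alice", "bob", "charlie"]
}

# With count: instances are team_by_count[0], [1], [2].
# Removing "bob" renumbers charlie from [2] to [1], which Terraform
# sees as "destroy charlie at [2], create charlie at [1]".
resource "aws_iam_user" "team_by_count" {
  count = length(var.users)
  name  = var.users[count.index]
}

# With for_each: instances are keyed by name, e.g. team_by_each["alice"].
# Removing "bob" only plans the destruction of team_by_each["bob"].
resource "aws_iam_user" "team_by_each" {
  for_each = toset(var.users)
  name     = each.value
}
```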
Chapter 4: Modules — Reusable Infrastructure
Modules are Terraform's unit of reuse. A module is just a directory of .tf files with input variables and outputs. The Terraform module documentation provides conventions, but here's what actually works in practice.
Module Structure
```text
modules/
├── vpc/
│   ├── main.tf
│   ├── variables.tf
│   ├── outputs.tf
│   └── README.md
├── ecs-service/
│   ├── main.tf
│   ├── variables.tf
│   ├── outputs.tf
│   ├── iam.tf
│   └── alb.tf
└── rds/
    ├── main.tf
    ├── variables.tf
    └── outputs.tf
```
Writing a Good Module
```hcl
# modules/ecs-service/variables.tf
variable "service_name" {
  description = "Name of the ECS service"
  type        = string
}

variable "container_image" {
  description = "Docker image URI"
  type        = string
}

variable "container_port" {
  description = "Port the container listens on"
  type        = number
  default     = 3000
}

variable "desired_count" {
  description = "Number of task instances"
  type        = number
  default     = 2
}

variable "environment_variables" {
  description = "Environment variables for the container"
  type        = map(string)
  default     = {}
}

# modules/ecs-service/main.tf
resource "aws_ecs_task_definition" "this" {
  family                   = var.service_name
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 256
  memory                   = 512
  execution_role_arn       = aws_iam_role.execution.arn

  container_definitions = jsonencode([{
    name      = var.service_name
    image     = var.container_image
    essential = true

    portMappings = [{
      containerPort = var.container_port
      protocol      = "tcp"
    }]

    environment = [
      for key, value in var.environment_variables : {
        name  = key
        value = value
      }
    ]

    logConfiguration = {
      logDriver = "awslogs"
      options = {
        "awslogs-group"         = aws_cloudwatch_log_group.this.name
        "awslogs-region"        = data.aws_region.current.name
        "awslogs-stream-prefix" = var.service_name
      }
    }
  }])
}

# modules/ecs-service/outputs.tf
output "service_name" {
  value = aws_ecs_service.this.name
}

output "task_definition_arn" {
  value = aws_ecs_task_definition.this.arn
}

# Using the module
module "birjob_api" {
  source = "./modules/ecs-service"

  service_name    = "birjob-api"
  container_image = "123456789.dkr.ecr.eu-west-1.amazonaws.com/birjob-api:latest"
  container_port  = 3000
  desired_count   = 3

  environment_variables = {
    DATABASE_URL = module.rds.connection_string
    REDIS_URL    = module.elasticache.endpoint
    NODE_ENV     = "production"
  }
}
```
Module Design Principles
After writing dozens of modules, these principles have saved me the most time:
- One module, one concern. A "vpc" module creates a VPC with subnets. It doesn't also create EC2 instances.
- Expose what consumers need, hide what they don't. Internal implementation details should not be variables.
- Use sensible defaults. A module should work with zero optional variables for the common case.
- Version your modules. Use Git tags:
```hcl
source = "git::https://github.com/org/modules.git//vpc?ref=v2.1.0"
```
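For modules published to the Terraform Registry, the equivalent is a `version` constraint on the module block. As an illustration (using the widely used community VPC module; the version constraint is an example, not a recommendation):

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0" # allow patch/minor updates within 5.x

  # ... module inputs ...
}
```

Either way, pinning means a module change is an explicit, reviewable diff rather than a surprise on the next `terraform init`.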
Chapter 5: Multi-Cloud Architecture
Running on multiple clouds is one of Terraform's greatest strengths — and greatest sources of complexity. The provider system lets you manage AWS, GCP, Azure, Cloudflare, GitHub, and hundreds of other services from the same codebase.
Provider Configuration
```hcl
# providers.tf
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
    cloudflare = {
      source  = "cloudflare/cloudflare"
      version = "~> 4.0"
    }
  }
}

provider "aws" {
  region = "eu-west-1"

  default_tags {
    tags = local.common_tags
  }
}

provider "google" {
  project = "birjob-production"
  region  = "europe-west1"
}

# Multiple provider instances (aliases)
provider "aws" {
  alias  = "us_east"
  region = "us-east-1"
}

# Use aliased provider
resource "aws_acm_certificate" "cdn_cert" {
  provider    = aws.us_east # CloudFront requires certs in us-east-1
  domain_name = "birjob.com"
}
```
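Aliased providers also have to be handed to modules explicitly, because modules only inherit default provider configurations. A sketch (the module path is illustrative):

```hcl
module "cdn" {
  source = "./modules/cdn"

  # Pass the us-east-1 alias into the module under the name it expects
  providers = {
    aws = aws.us_east
  }
}
```

If a module needs several configurations of the same provider at once, it declares them via `configuration_aliases` in its own `required_providers` block.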
Multi-Cloud Strategy: When It Makes Sense
| Scenario | Recommendation | Why |
|---|---|---|
| Avoiding vendor lock-in | Usually not worth it | Abstraction cost > switching cost for most companies |
| Best-of-breed services | Good reason | GCP BigQuery + AWS Lambda + Cloudflare CDN |
| Regulatory requirements | Often necessary | Data residency laws may require specific regions/providers |
| Disaster recovery | Good for critical systems | Survive a full cloud provider outage |
| Cost optimization | High effort, high reward | Spot/preemptible pricing varies by cloud |
Chapter 6: Terraform in CI/CD
Manual terraform apply from a laptop is how production incidents happen. Every team beyond two people should run Terraform from CI/CD. According to HashiCorp's automation guide, automated workflows reduce configuration drift and human error significantly.
GitHub Actions Workflow
```yaml
# .github/workflows/terraform.yml
name: Terraform

on:
  pull_request:
    paths: ['infrastructure/**']
  push:
    branches: [main]
    paths: ['infrastructure/**']

permissions:
  id-token: write
  contents: read
  pull-requests: write

jobs:
  plan:
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.0
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789:role/terraform-ci
          aws-region: eu-west-1
      - name: Terraform Init
        run: terraform init
        working-directory: infrastructure
      - name: Terraform Plan
        id: plan
        run: terraform plan -no-color -out=tfplan
        working-directory: infrastructure
      - name: Comment Plan on PR
        uses: actions/github-script@v7
        with:
          script: |
            const plan = `${{ steps.plan.outputs.stdout }}`;
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## Terraform Plan\n\`\`\`\n${plan}\n\`\`\``
            });

  apply:
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    environment: production
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.0
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789:role/terraform-ci
          aws-region: eu-west-1
      - name: Terraform Init
        run: terraform init
        working-directory: infrastructure
      - name: Terraform Apply
        run: terraform apply -auto-approve
        working-directory: infrastructure
```
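Two cheap gates worth running before the plan step in any Terraform pipeline (the step names are my own convention): a formatting check, which fails the build on unformatted files, and `terraform validate`, which catches syntax and internal-consistency errors without touching any cloud API.

```yaml
      - name: Terraform Format Check
        run: terraform fmt -check -recursive
        working-directory: infrastructure
      - name: Terraform Validate
        run: terraform validate
        working-directory: infrastructure
```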
Safety Mechanisms
```hcl
# Prevent accidental destruction of critical resources
resource "aws_db_instance" "production" {
  # ... configuration ...

  lifecycle {
    prevent_destroy = true # Terraform will error if you try to destroy this
  }
}

# Ignore changes made outside of Terraform
resource "aws_instance" "web" {
  # ... configuration ...

  lifecycle {
    ignore_changes = [
      tags["LastModifiedBy"], # Auto-updated by AWS Config
      user_data,              # Don't redeploy for user_data changes
    ]
  }
}
```
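A third lifecycle argument worth knowing is `create_before_destroy`, which flips the default replacement order so the new resource exists before the old one is removed — useful for anything where a gap means downtime. A sketch:

```hcl
resource "aws_launch_template" "web" {
  # ... configuration ...

  lifecycle {
    # When a change forces replacement, build the new resource
    # first, repoint dependents, then destroy the old one.
    create_before_destroy = true
  }
}
```

Note that this only works when the old and new resources can coexist (no unique-name collisions), so it often pairs with `name_prefix`-style arguments.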
Chapter 7: Advanced Patterns
Workspaces for Environment Isolation
```bash
# Create workspaces
terraform workspace new staging
terraform workspace new production

# Switch workspace
terraform workspace select production
```

```hcl
# Use workspace in configuration
locals {
  environment = terraform.workspace

  config = {
    staging = {
      instance_type = "t3.micro"
      min_size      = 1
      max_size      = 2
    }
    production = {
      instance_type = "t3.large"
      min_size      = 3
      max_size      = 10
    }
  }

  current = local.config[local.environment]
}
```
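The per-workspace config map is then consumed wherever the environments differ; a sketch, reusing the `local.current` lookup from above (the AMI data source is assumed to exist elsewhere in the configuration):

```hcl
resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = local.current.instance_type

  tags = {
    Name        = "birjob-web-${local.environment}"
    Environment = local.environment
  }
}
```

The lookup also acts as a guard: running in an unlisted workspace fails fast with a key-not-found error instead of silently using defaults.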
Terragrunt for DRY Configuration
Terragrunt is a thin wrapper around Terraform that keeps configurations DRY across environments:
```hcl
# live/
# ├── terragrunt.hcl (root config)
# ├── staging/
# │   ├── vpc/terragrunt.hcl
# │   ├── ecs/terragrunt.hcl
# │   └── rds/terragrunt.hcl
# └── production/
#     ├── vpc/terragrunt.hcl
#     ├── ecs/terragrunt.hcl
#     └── rds/terragrunt.hcl

# live/production/ecs/terragrunt.hcl
terraform {
  source = "../../../modules//ecs-service"
}

include "root" {
  path = find_in_parent_folders()
}

dependency "vpc" {
  config_path = "../vpc"
}

dependency "rds" {
  config_path = "../rds"
}

inputs = {
  service_name    = "birjob-api"
  container_image = "123456789.dkr.ecr.eu-west-1.amazonaws.com/birjob-api:v2.5.0"
  vpc_id          = dependency.vpc.outputs.vpc_id
  subnet_ids      = dependency.vpc.outputs.private_subnet_ids
  database_url    = dependency.rds.outputs.connection_string
  desired_count   = 3
}
```
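Because the `dependency` blocks describe a graph, Terragrunt can also operate on an entire environment at once, resolving stacks in dependency order. A command sketch:

```bash
# From live/production/: plan or apply vpc, rds, and ecs
# in the order the dependency graph dictates
terragrunt run-all plan
terragrunt run-all apply
```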
Testing Terraform Code
```hcl
# Built-in validation (Terraform 1.5+)
check "health_check" {
  data "http" "birjob_health" {
    url = "https://${aws_lb.main.dns_name}/api/health"
  }

  assert {
    condition     = data.http.birjob_health.status_code == 200
    error_message = "Health check failed after deployment"
  }
}
```

```go
// test/vpc_test.go — Terratest (Go-based integration testing)
package test

import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)

func TestVpc(t *testing.T) {
	terraformOptions := &terraform.Options{
		TerraformDir: "../modules/vpc",
		Vars: map[string]interface{}{
			"cidr_block":  "10.0.0.0/16",
			"environment": "test",
		},
	}

	// Destroy the test infrastructure even if assertions fail
	defer terraform.Destroy(t, terraformOptions)
	terraform.InitAndApply(t, terraformOptions)

	vpcId := terraform.Output(t, terraformOptions, "vpc_id")
	assert.NotEmpty(t, vpcId)
}
```
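Terraform 1.6 also shipped a native test framework: `.tftest.hcl` files containing `run` blocks, executed with `terraform test`. A minimal sketch against the VPC module (the variable and output names are assumptions matching the Terratest example above):

```hcl
# tests/vpc.tftest.hcl — run with `terraform test`
variables {
  cidr_block  = "10.0.0.0/16"
  environment = "test"
}

run "creates_vpc" {
  command = apply

  assert {
    condition     = output.vpc_id != ""
    error_message = "vpc_id output should not be empty"
  }
}
```

Resources created by a `run` block with `command = apply` are destroyed automatically when the test finishes, which makes this a lighter-weight option than Terratest for module-level checks.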
Chapter 8: My Opinionated Take
After five years of daily Terraform use, here's what I'd tell someone starting today:
1. Start simple, stay simple as long as possible. A single main.tf file with 200 lines is better than a "best practices" directory structure with 30 files and 3 modules for a simple project. Complexity should be earned by real pain, not anticipated.
2. State is your single point of failure. Treat your state backend with the same care you'd treat a production database. Enable versioning on the S3 bucket, enable encryption, restrict access. A corrupted state file can ruin your week.
3. Don't abstract too early. I've seen teams create elaborate module hierarchies for infrastructure they haven't validated yet. Write the resources inline first. When you find yourself copy-pasting the same block for the third time, then extract a module.
4. terraform plan is not enough. A plan that shows "0 to add, 1 to change, 0 to destroy" can still break your application. A security group change that's "in-place" might cause a brief connection drop. Always understand what is changing, not just the count.
5. Terraform is not the right tool for everything. Application configuration (feature flags, environment variables that change frequently) should live in your application's config system, not in Terraform. If you find yourself running terraform apply multiple times a day, something is in the wrong layer.
Chapter 9: Action Plan — Your First 30 Days
Days 1-7: Learn the Basics
- Install Terraform CLI and set up a free AWS account
- Create an EC2 instance, an S3 bucket, and a security group
- Practice `terraform plan`, `apply`, and `destroy`
- Read the official getting started tutorial
Days 8-14: State and Backend
- Set up remote state with S3 + DynamoDB locking
- Practice `terraform state` commands
- Import an existing resource into state
- Break your state on purpose (in dev!) and fix it
Days 15-21: Modules and Patterns
- Write your first module (start with a simple VPC)
- Use the module from a root configuration
- Explore the Terraform Registry for community modules
- Set up a basic CI/CD pipeline with `terraform plan` on PRs
Days 22-30: Production Readiness
- Implement workspaces or Terragrunt for multi-environment
- Set up automated `terraform apply` on merge to main
- Write your first `check` block for post-apply validation
- Document your infrastructure decisions in ADRs
Conclusion
Terraform changed how I think about infrastructure. It's not just an automation tool — it's a way of making infrastructure decisions visible, reviewable, and reversible. Every change goes through a PR. Every deployment has a plan. Every environment is reproducible.
The learning curve is real, but the payoff is enormous. Start small, commit your .tf files to version control from day one, and never, ever store state locally in production.
Sources
- HashiCorp Terraform Product Page
- Terraform Documentation: Remote State
- Terraform Documentation: Module Development
- Terraform Documentation: Providers
- Terraform Automation Tutorials
- Terragrunt: DRY Terraform
- Terraform Registry
I'm Ismat, and I build BirJob — Azerbaijan's job aggregator scraping 80+ sources daily.
