How Load Balancers Work: A Visual Guide for Developers
Early in my career, I deployed an application to a single server and called it a day. Traffic grew. The server got slow. I vertically scaled it (bigger machine). Traffic grew more. The server got slow again. Eventually, I hit the ceiling: the biggest machine wasn't big enough. That's when I learned about load balancers, and everything changed.
A load balancer is deceptively simple in concept: it distributes incoming requests across multiple servers. But the details matter enormously. Which algorithm distributes traffic most efficiently? Where should the load balancer sit in your architecture? How do you handle SSL termination? What happens when a backend server dies?
This guide answers all of these questions with clear explanations and practical comparisons. Whether you're deploying your first multi-server application or optimizing an existing architecture, this is what you need to know.
What Is a Load Balancer?
A load balancer sits between clients and servers. It receives incoming requests and forwards them to one of several backend servers (also called "upstream servers" or "backend pool"). The client doesn't know (or care) which server handles the request.
The core benefits:
- Horizontal scaling: Add more servers to handle more traffic (instead of making one server bigger)
- High availability: If one server fails, the load balancer routes traffic to healthy servers
- Maintenance flexibility: Take servers offline for updates without downtime
- Geographic distribution: Route users to the nearest server location
According to Nginx's documentation, a single Nginx instance can handle over 10,000 concurrent connections on commodity hardware, making it an effective load balancer for most applications. For larger scale, HAProxy has been benchmarked handling over 2 million concurrent connections.
Layer 4 vs. Layer 7 Load Balancing
Load balancers operate at different layers of the OSI model. The two most important are Layer 4 (Transport) and Layer 7 (Application).
| Aspect | Layer 4 (L4) | Layer 7 (L7) |
|---|---|---|
| Operates on | TCP/UDP connections | HTTP/HTTPS requests |
| Decision based on | IP address, port number | URL path, headers, cookies, body |
| Performance | Very fast (no content inspection) | Slower (parses HTTP) |
| SSL termination | Pass-through or terminate | Always terminates (must read HTTP) |
| Content-based routing | No | Yes (/api to backend A, /static to backend B) |
| Connection type | Connection-level (one upstream per connection) | Request-level (different upstreams per request) |
| Example tools | HAProxy (TCP mode), AWS NLB, LVS | Nginx, HAProxy (HTTP mode), AWS ALB, Envoy |
| Use case | Database, game servers, non-HTTP protocols | Web applications, APIs, microservices |
Rule of thumb: For web applications and APIs, use Layer 7. You'll want content-based routing, header inspection, and proper HTTP health checks. Use Layer 4 for non-HTTP protocols or when you need maximum performance and don't need HTTP-level features.
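To make the L7 routing row concrete, here is a sketch of path-based routing in Nginx. The upstream names and IP addresses are placeholders, not a recommended topology:

```nginx
# Two backend pools; names and addresses are illustrative.
upstream api_backend {
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;
}

upstream static_backend {
    server 10.0.2.10:8080;
}

server {
    listen 80;

    # L7 routing: the load balancer inspects the URL path
    # and picks a pool per request, not per connection.
    location /api/ {
        proxy_pass http://api_backend;
    }

    location /static/ {
        proxy_pass http://static_backend;
    }
}
```

A Layer 4 balancer could not do this: it forwards whole TCP connections without ever parsing the URL.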
Load Balancing Algorithms
1. Round Robin
The simplest algorithm. Requests are distributed to servers in sequence: Server 1, Server 2, Server 3, Server 1, Server 2, Server 3, and so on.
Pros: Dead simple. Zero overhead. No state to maintain.
Cons: Ignores server capacity and current load. If one server is slower than the others, it accumulates a backlog.
Best for: Homogeneous server pools where all servers have identical hardware and handle requests in roughly the same time.
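Round robin needs nothing more than a rotating pointer. A minimal sketch in Python (server names are placeholders):

```python
from itertools import cycle

servers = ["server1", "server2", "server3"]
rr = cycle(servers)

def next_server():
    """Return the next server in fixed rotation."""
    return next(rr)

# Six requests cycle through the pool twice.
assignments = [next_server() for _ in range(6)]
```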
2. Weighted Round Robin
Like round robin, but each server gets a weight. A server with weight 3 gets three times as many requests as a server with weight 1.
Best for: Server pools with mixed hardware (e.g., some servers have 4 CPUs, others have 8).
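The simplest way to picture weights is to expand each server into the rotation as many times as its weight. A sketch (the naive expansion below sends bursts to one server; production implementations such as Nginx's smooth weighted round robin interleave the picks):

```python
def weighted_pool(weights):
    """Expand {server: weight} into a rotation list.
    A server with weight 3 appears three times per cycle."""
    pool = []
    for server, weight in weights.items():
        pool.extend([server] * weight)
    return pool

pool = weighted_pool({"big": 3, "small": 1})

# Over two full cycles, "big" gets three of every four requests.
assignments = [pool[i % len(pool)] for i in range(8)]
```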
3. Least Connections
Route each new request to the server with the fewest active connections. This naturally balances load when requests have varying processing times.
Pros: Adapts to actual server load. Great when request durations vary significantly.
Cons: Requires tracking connection counts. Slightly more overhead than round robin.
Best for: Applications with varying request durations (e.g., API servers where some endpoints are fast and others are slow).
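The core of least connections is a counter per server, incremented when a request starts and decremented when it finishes. A sketch with placeholder server names:

```python
active = {"server1": 0, "server2": 0, "server3": 0}

def pick_least_connections():
    """Route to the server with the fewest active connections."""
    return min(active, key=active.get)

def start_request():
    server = pick_least_connections()
    active[server] += 1
    return server

def finish_request(server):
    active[server] -= 1

# Three long-running requests spread across the pool...
a, b, c = start_request(), start_request(), start_request()
finish_request(b)      # ...one finishes early...
d = start_request()    # ...and the next request goes to the freed server.
```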
4. Least Response Time
Route to the server with the lowest average response time and the fewest active connections. More sophisticated than least connections because it considers actual performance.
Best for: Performance-sensitive applications where you want to optimize for user-perceived latency.
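One way to combine the two signals is to score each server by its average latency weighted by current load and pick the lowest score. The formula and the stats below are illustrative; real balancers use their own blend of these signals:

```python
# Hypothetical per-server stats: (avg response time in ms, active connections).
stats = {
    "server1": (120.0, 4),
    "server2": (45.0, 9),
    "server3": (60.0, 2),
}

def pick_least_response_time():
    """Score = avg latency x (active connections + 1); lower is better.
    server2 is fast but busy, server1 is idle but slow, so the
    lightly loaded, reasonably fast server3 wins."""
    return min(stats, key=lambda s: stats[s][0] * (stats[s][1] + 1))

choice = pick_least_response_time()
```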
5. IP Hash
Hash the client's IP address and use it to consistently route to the same server. This provides "sticky sessions" without cookies.
Pros: Session affinity without any application-level changes.
Cons: Uneven distribution if traffic comes from a few large IP ranges (corporate NATs, CDNs). Adding or removing servers changes the hash mapping and disrupts existing sessions.
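A sketch of the idea: hash the client IP, take it modulo the pool size, and the same client always lands on the same server. The modulo step is also exactly why resizing the pool remaps almost every client:

```python
import hashlib

servers = ["server1", "server2", "server3"]

def pick_by_ip(client_ip):
    """Hash the client IP to a stable server index."""
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

# The same client IP always maps to the same server.
first = pick_by_ip("203.0.113.7")
second = pick_by_ip("203.0.113.7")
```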
6. Consistent Hashing
A more sophisticated version of IP hash. When servers are added or removed, only a fraction of requests are remapped (instead of all of them). Used by CDNs and distributed caches.
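A minimal hash ring with virtual nodes, to show the key property: removing a server only remaps the keys that pointed to it. This is a sketch, not production code; real implementations add replication, weights, and better hash functions:

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, servers, vnodes=100):
        # Each server owns `vnodes` points on the ring, stored sorted.
        self.ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get(self, key):
        # Walk clockwise to the next virtual node (wrapping around).
        idx = bisect.bisect(self.ring, (self._hash(key),)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["server1", "server2", "server3"])
before = {k: ring.get(k) for k in ("alice", "bob", "carol", "dave")}

# Remove server3: keys that were NOT on server3 must stay put.
smaller = HashRing(["server1", "server2"])
unexpectedly_moved = [
    k for k in before
    if before[k] != "server3" and smaller.get(k) != before[k]
]
```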
| Algorithm | State Required | Handles Heterogeneous Servers | Session Affinity | Best For |
|---|---|---|---|---|
| Round Robin | None | No | No | Homogeneous pools |
| Weighted Round Robin | Weights | Yes | No | Mixed hardware |
| Least Connections | Connection counts | Naturally | No | Variable-duration requests |
| Least Response Time | Response times | Naturally | No | Latency-sensitive apps |
| IP Hash | None | No | Yes | Simple session affinity |
| Consistent Hashing | Hash ring | Via virtual nodes | Yes | Cache proxies, CDNs |
Health Checks: How Load Balancers Detect Failures
A load balancer is only useful if it knows which servers are healthy. Health checks are the mechanism for this.
Types of Health Checks
- TCP Health Check: Can the load balancer establish a TCP connection to the server? Fastest but least informative. A server might accept TCP connections but return 500 errors.
- HTTP Health Check: Does the server return a 200 OK for a specific endpoint (typically /health or /healthz)? More informative. Can check that the application is actually running.
- Deep Health Check: Does the server's /health endpoint verify database connectivity, cache availability, and other dependencies? Most informative but slowest. Risk of cascading failures if a shared dependency (like a database) fails and all servers become "unhealthy" simultaneously.
My recommendation: Use a two-tier approach. A liveness check (/healthz) that returns 200 if the process is running. A readiness check (/ready) that verifies the server can handle requests (database connected, cache available). The load balancer uses the readiness check for traffic routing.
This pattern is used by Kubernetes natively (liveness, readiness, and startup probes) and has become the industry standard.
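The two-tier pattern fits in a few lines. A sketch using only the Python standard library; the dependency checks are placeholders you would replace with real connection pings:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def database_ok():
    """Placeholder dependency check; replace with a real connection ping."""
    return True

def cache_ok():
    """Placeholder dependency check; replace with a real connection ping."""
    return True

def route_health(path):
    """Map a health-check path to an HTTP (status_code, body) pair."""
    if path == "/healthz":
        # Liveness: the process is up. No dependency checks here.
        return 200, {"status": "alive"}
    if path == "/ready":
        # Readiness: can this server actually handle traffic right now?
        ok = database_ok() and cache_ok()
        return (200 if ok else 503), {"ready": ok}
    return 404, {"error": "not found"}

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        code, body = route_health(self.path)
        payload = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

# To serve: HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```

Point the load balancer's health check at /ready, so a server with a broken database connection is pulled from the pool even though its process is alive.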
Health Check Configuration
| Parameter | Typical Value | Description |
|---|---|---|
| Interval | 10-30 seconds | How often to check each server |
| Timeout | 5 seconds | How long to wait for a response |
| Unhealthy threshold | 2-3 failures | Consecutive failures before marking unhealthy |
| Healthy threshold | 2 successes | Consecutive successes before marking healthy again |
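The threshold logic in the table is a small state machine: consecutive failures flip a server to unhealthy, consecutive successes flip it back. A sketch with illustrative threshold values:

```python
UNHEALTHY_THRESHOLD = 3  # consecutive failures before removal from the pool
HEALTHY_THRESHOLD = 2    # consecutive successes before re-adding

class ServerHealth:
    """Track consecutive check results and flip state at the thresholds."""

    def __init__(self):
        self.healthy = True
        self.failures = 0
        self.successes = 0

    def record(self, check_passed):
        if check_passed:
            self.failures = 0
            self.successes += 1
            if not self.healthy and self.successes >= HEALTHY_THRESHOLD:
                self.healthy = True
        else:
            self.successes = 0
            self.failures += 1
            if self.healthy and self.failures >= UNHEALTHY_THRESHOLD:
                self.healthy = False
        return self.healthy

s = ServerHealth()
s.record(False)                               # one blip...
s.record(True)                                # ...does not remove the server
down = [s.record(False) for _ in range(3)]    # three in a row does
first = s.record(True)                        # one success is not enough
second = s.record(True)                       # two consecutive: back in the pool
```

The thresholds exist to damp flapping: a single dropped packet should not eject a server, and a single lucky response should not re-admit a broken one.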
SSL/TLS Termination
SSL termination means the load balancer decrypts HTTPS traffic from clients and forwards plain HTTP to backend servers. This has several advantages:
- Centralized certificate management: Install certificates in one place instead of every server
- Reduced backend CPU: TLS encryption/decryption is CPU-intensive; offloading it to the load balancer frees backend CPU for application logic
- Simplified backend configuration: Backend servers only need to handle HTTP
The trade-off is that traffic between the load balancer and backend servers is unencrypted. In many environments (same VPC, same data center), this is acceptable. For zero-trust environments, use SSL re-encryption (load balancer terminates the client's TLS, then establishes a new TLS connection to the backend) or SSL pass-through (load balancer forwards the encrypted traffic without decrypting it).
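In Nginx, the difference between termination and re-encryption is one directive. The certificate paths and backend addresses below are placeholders:

```nginx
# TLS terminates here; the backend receives plain HTTP.
server {
    listen 443 ssl;
    server_name example.com;

    ssl_certificate     /etc/ssl/certs/example.com.crt;
    ssl_certificate_key /etc/ssl/private/example.com.key;

    location / {
        proxy_pass http://10.0.1.10:8080;    # plain HTTP to the backend

        # For SSL re-encryption instead, proxy over https:// and the
        # load balancer opens a new TLS connection to the backend:
        # proxy_pass https://10.0.1.10:8443;
    }
}
```

Pass-through, by contrast, is a Layer 4 configuration: the balancer forwards the encrypted bytes untouched and the backend holds the certificate.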
Cloud Load Balancers Compared
| Feature | AWS ALB | AWS NLB | GCP HTTP(S) LB | Azure App Gateway |
|---|---|---|---|---|
| Layer | 7 | 4 | 7 | 7 |
| Protocols | HTTP, HTTPS, gRPC | TCP, UDP, TLS | HTTP, HTTPS | HTTP, HTTPS |
| Content routing | Path, host, header | No | Path, host, header | Path, host |
| WebSocket | Yes | Yes | Yes | Yes |
| WAF integration | AWS WAF | No | Cloud Armor | Azure WAF |
| Global/Regional | Regional | Regional | Global | Regional |
| Pricing model | Per hour + LCU | Per hour + LCU | Per rule + bandwidth | Per hour + capacity |
| Auto-scaling | Automatic | Automatic | Automatic | Automatic |
According to AWS documentation, an Application Load Balancer can handle millions of requests per second. For most web applications, a cloud load balancer is the right choice because it eliminates the operational burden of managing load balancer infrastructure.
Self-Managed Load Balancers: Nginx vs. HAProxy vs. Envoy
| Feature | Nginx | HAProxy | Envoy |
|---|---|---|---|
| Primary use | Web server + reverse proxy | Dedicated load balancer | Service proxy (service mesh) |
| Configuration | Config file (reload) | Config file (reload) | Dynamic API (xDS) |
| L4 + L7 | Both | Both | Both |
| HTTP/2 upstream | Yes (1.25.1+) | Yes (2.4+) | Yes |
| gRPC | Yes | Yes (2.0+) | Native |
| Hot reload | Yes (graceful) | Yes (hitless) | Yes (via API) |
| Observability | Basic (logs, stub_status) | Excellent (stats page, Prometheus) | Excellent (built-in Prometheus) |
| Learning curve | Low | Medium | High |
| Best for | General purpose | High-performance LB | Kubernetes / service mesh |
HAProxy consistently outperforms Nginx in load balancing benchmarks, handling higher connection counts with lower latency. A 2024 HAProxy benchmark showed it handling 2 million concurrent connections with sub-millisecond latency on a single instance. However, Nginx's versatility (web server + reverse proxy + load balancer) makes it the more common choice for teams that don't need HAProxy-level performance.
My Opinionated Take
1. Use your cloud provider's load balancer. Unless you have a specific reason not to (cost at extreme scale, multi-cloud requirements, specific features), use AWS ALB/NLB, GCP HTTP(S) Load Balancer, or Azure App Gateway. The operational simplicity is worth the premium.
2. Least connections should be your default algorithm. Round robin is fine for truly homogeneous workloads, but in practice, requests are never uniform. Least connections adapts naturally to varying request durations and server performance.
3. Always implement proper health checks. A load balancer without health checks is just a traffic splitter. Use HTTP-level readiness checks that verify the application can actually handle requests.
4. SSL termination at the load balancer is the right default. Unless you're in a regulated environment that requires end-to-end encryption, terminate TLS at the load balancer and use plain HTTP behind it. The CPU savings and operational simplicity are significant.
5. Avoid sticky sessions whenever possible. Sticky sessions (routing the same user to the same server) make it impossible to scale or maintain servers independently. Store session state in Redis or your database instead. If your application requires sticky sessions, it's a sign of a design problem that should be fixed.
Action Plan: Setting Up Load Balancing
Phase 1: Basic Setup
- Deploy your application to at least 2 servers
- Set up a load balancer (cloud LB or Nginx/HAProxy)
- Configure least connections algorithm
- Add HTTP health checks (/healthz endpoint)
- Configure SSL termination at the load balancer
Phase 2: Resilience
- Test failover: stop one server and verify traffic routes to the healthy server
- Set up monitoring: alert when a server is removed from the pool
- Configure connection draining for graceful shutdowns
- Add a /ready endpoint that checks database and cache connectivity
Phase 3: Optimization
- Enable HTTP/2 between the load balancer and clients
- Configure connection keepalives between the load balancer and backend servers
- Add request logging at the load balancer level for debugging
- Consider auto-scaling backend servers based on load balancer metrics
Key Takeaways
- A load balancer distributes traffic across multiple servers, enabling horizontal scaling and high availability.
- Use Layer 7 (HTTP) load balancing for web applications; Layer 4 (TCP) for non-HTTP protocols or maximum performance.
- Least connections is the best default algorithm for most workloads.
- Health checks are essential: use HTTP readiness checks, not just TCP port checks.
- SSL termination at the load balancer simplifies operations and reduces backend CPU usage.
- Cloud load balancers (AWS ALB, GCP HTTPS LB) should be your default choice; self-manage (Nginx, HAProxy) only when you have specific needs.
- Avoid sticky sessions. Externalize session state instead.
Sources
- Nginx Load Balancing Guide
- HAProxy Documentation
- Envoy Proxy Documentation
- AWS Elastic Load Balancing Documentation
- Kubernetes Health Probes Documentation
- OSI Model - Wikipedia
I'm Ismat, and I build BirJob — Azerbaijan's job aggregator scraping 80+ sources daily.
