How Load Balancers Work: A Visual Guide for Developers
Early in my career, I deployed an application to a single server and called it a day. Traffic grew. The server got slow. I vertically scaled it (bigger machine). Traffic grew more. The server got slow again. Eventually, I hit the ceiling: the biggest machine wasn't big enough. That's when I learned about load balancers, and everything changed.
A load balancer is deceptively simple in concept: it distributes incoming requests across multiple servers. But the details matter enormously. Which algorithm distributes traffic most efficiently? Where should the load balancer sit in your architecture? How do you handle SSL termination? What happens when a backend server dies?
This guide answers all of these questions with clear explanations and practical comparisons. Whether you're deploying your first multi-server application or optimizing an existing architecture, this is what you need to know.
What Is a Load Balancer?
A load balancer sits between clients and servers. It receives incoming requests and forwards them to one of several backend servers (also called "upstream servers" or "backend pool"). The client doesn't know (or care) which server handles the request.
The core benefits:
- Horizontal scaling: Add more servers to handle more traffic (instead of making one server bigger)
- High availability: If one server fails, the load balancer routes traffic to healthy servers
- Maintenance flexibility: Take servers offline for updates without downtime
- Geographic distribution: Route users to the nearest server location
According to Nginx's documentation, a single Nginx instance can handle over 10,000 concurrent connections on commodity hardware, making it an effective load balancer for most applications. For larger scale, HAProxy has been benchmarked handling over 2 million concurrent connections.
Layer 4 vs. Layer 7 Load Balancing
Load balancers operate at different layers of the OSI model. The two most important are Layer 4 (Transport) and Layer 7 (Application).
| Aspect | Layer 4 (L4) | Layer 7 (L7) |
|---|---|---|
| Operates on | TCP/UDP connections | HTTP/HTTPS requests |
| Decision based on | IP address, port number | URL path, headers, cookies, body |
| Performance | Very fast (no content inspection) | Slower (parses HTTP) |
| SSL termination | Pass-through or terminate | Always terminates (must read HTTP) |
| Content-based routing | No | Yes (/api to backend A, /static to backend B) |
| Connection type | Connection-level (one upstream per connection) | Request-level (different upstreams per request) |
| Example tools | HAProxy (TCP mode), AWS NLB, LVS | Nginx, HAProxy (HTTP mode), AWS ALB, Envoy |
| Use case | Database, game servers, non-HTTP protocols | Web applications, APIs, microservices |
Rule of thumb: For web applications and APIs, use Layer 7. You'll want content-based routing, header inspection, and proper HTTP health checks. Use Layer 4 for non-HTTP protocols or when you need maximum performance and don't need HTTP-level features.
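To make the L7 routing row concrete, here is a sketch of path-based routing in Nginx. The upstream names and IP addresses are placeholders, not a recommended topology:

```nginx
# Two backend pools; names and addresses are illustrative.
upstream api_backend {
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;
}

upstream static_backend {
    server 10.0.2.10:8080;
}

server {
    listen 80;

    # L7 routing: the load balancer inspects the URL path
    # and picks a pool per request, not per connection.
    location /api/ {
        proxy_pass http://api_backend;
    }

    location /static/ {
        proxy_pass http://static_backend;
    }
}
```

A Layer 4 balancer could not do this: it forwards whole TCP connections without ever parsing the URL.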
Load Balancing Algorithms
1. Round Robin
The simplest algorithm. Requests are distributed to servers in sequence: Server 1, Server 2, Server 3, Server 1, Server 2, Server 3, and so on.
Pros: Dead simple. Zero overhead. No state to maintain.
Cons: Ignores server capacity and current load. If one server is slower than the others, it accumulates a backlog.
Best for: Homogeneous server pools where all servers have identical hardware and handle requests in roughly the same time.
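Round robin needs nothing more than a rotating pointer. A minimal sketch in Python (server names are placeholders):

```python
from itertools import cycle

servers = ["server1", "server2", "server3"]
rr = cycle(servers)

def next_server():
    """Return the next server in fixed rotation."""
    return next(rr)

# Six requests cycle through the pool twice.
assignments = [next_server() for _ in range(6)]
```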
2. Weighted Round Robin
Like round robin, but each server gets a weight. A server with weight 3 gets three times as many requests as a server with weight 1.
Best for: Server pools with mixed hardware (e.g., some servers have 4 CPUs, others have 8).
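The simplest way to picture weights is to expand each server into the rotation as many times as its weight. A sketch (the naive expansion below sends bursts to one server; production implementations such as Nginx's smooth weighted round robin interleave the picks):

```python
def weighted_pool(weights):
    """Expand {server: weight} into a rotation list.
    A server with weight 3 appears three times per cycle."""
    pool = []
    for server, weight in weights.items():
        pool.extend([server] * weight)
    return pool

pool = weighted_pool({"big": 3, "small": 1})

# Over two full cycles, "big" gets three of every four requests.
assignments = [pool[i % len(pool)] for i in range(8)]
```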
3. Least Connections
Route each new request to the server with the fewest active connections. This naturally balances load when requests have varying processing times.
Pros: Adapts to actual server load. Great when request durations vary significantly.
Cons: Requires tracking connection counts. Slightly more overhead than round robin.
Best for: Applications with varying request durations (e.g., API servers where some endpoints are fast and others are slow).
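The core of least connections is a counter per server, incremented when a request starts and decremented when it finishes. A sketch with placeholder server names:

```python
active = {"server1": 0, "server2": 0, "server3": 0}

def pick_least_connections():
    """Route to the server with the fewest active connections."""
    return min(active, key=active.get)

def start_request():
    server = pick_least_connections()
    active[server] += 1
    return server

def finish_request(server):
    active[server] -= 1

# Three long-running requests spread across the pool...
a, b, c = start_request(), start_request(), start_request()
finish_request(b)      # ...one finishes early...
d = start_request()    # ...and the next request goes to the freed server.
```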
4. Least Response Time
Route to the server with the lowest average response time and the fewest active connections. More sophisticated than least connections because it considers actual performance.
Best for: Performance-sensitive applications where you want to optimize for user-perceived latency.
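One way to combine the two signals is to score each server by its average latency weighted by current load and pick the lowest score. The formula and the stats below are illustrative; real balancers use their own blend of these signals:

```python
# Hypothetical per-server stats: (avg response time in ms, active connections).
stats = {
    "server1": (120.0, 4),
    "server2": (45.0, 9),
    "server3": (60.0, 2),
}

def pick_least_response_time():
    """Score = avg latency x (active connections + 1); lower is better.
    server2 is fast but busy, server1 is idle but slow, so the
    lightly loaded, reasonably fast server3 wins."""
    return min(stats, key=lambda s: stats[s][0] * (stats[s][1] + 1))

choice = pick_least_response_time()
```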
5. IP Hash
Hash the client's IP address and use it to consistently route to the same server. This provides "sticky sessions" without cookies.
Pros: Session affinity without any application-level changes.
Cons: Uneven distribution if traffic comes from a few large IP ranges (corporate NATs, CDNs). Adding or removing servers changes the hash mapping and disrupts existing sessions.
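A sketch of the idea: hash the client IP, take it modulo the pool size, and the same client always lands on the same server. The modulo step is also exactly why resizing the pool remaps almost every client:

```python
import hashlib

servers = ["server1", "server2", "server3"]

def pick_by_ip(client_ip):
    """Hash the client IP to a stable server index."""
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

# The same client IP always maps to the same server.
first = pick_by_ip("203.0.113.7")
second = pick_by_ip("203.0.113.7")
```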
6. Consistent Hashing
A more sophisticated version of IP hash. When servers are added or removed, only a fraction of requests are remapped (instead of all of them). Used by CDNs and distributed caches.
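A minimal hash ring with virtual nodes, to show the key property: removing a server only remaps the keys that pointed to it. This is a sketch, not production code; real implementations add replication, weights, and better hash functions:

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, servers, vnodes=100):
        # Each server owns `vnodes` points on the ring, stored sorted.
        self.ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get(self, key):
        # Walk clockwise to the next virtual node (wrapping around).
        idx = bisect.bisect(self.ring, (self._hash(key),)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["server1", "server2", "server3"])
before = {k: ring.get(k) for k in ("alice", "bob", "carol", "dave")}

# Remove server3: keys that were NOT on server3 must stay put.
smaller = HashRing(["server1", "server2"])
unexpectedly_moved = [
    k for k in before
    if before[k] != "server3" and smaller.get(k) != before[k]
]
```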
| Algorithm | State Required | Handles Heterogeneous Servers | Session Affinity | Best For |
|---|---|---|---|---|
| Round Robin | None | No | No | Homogeneous pools |
| Weighted Round Robin | Weights | Yes | No | Mixed hardware |
| Least Connections | Connection counts | Naturally | No | Variable-duration requests |
| Least Response Time | Response times | Naturally | No | Latency-sensitive apps |
| IP Hash | None | No | Yes | Simple session affinity |
| Consistent Hashing | Hash ring | Via virtual nodes | Yes | Cache proxies, CDNs |
Health Checks: How Load Balancers Detect Failures
A load balancer is only useful if it knows which servers are healthy. Health checks are the mechanism for this.
Types of Health Checks
- TCP Health Check: Can the load balancer establish a TCP connection to the server? Fastest but least informative. A server might accept TCP connections but return 500 errors.
- HTTP Health Check: Does the server return a 200 OK for a specific endpoint (typically /health or /healthz)? More informative. Can check that the application is actually running.
- Deep Health Check: Does the server's /health endpoint verify database connectivity, cache availability, and other dependencies? Most informative but slowest. Risk of cascading failures if a shared dependency (like a database) fails and all servers become "unhealthy" simultaneously.
My recommendation: Use a two-tier approach. A liveness check (/healthz) that returns 200 if the process is running. A readiness check (/ready) that verifies the server can handle requests (database connected, cache available). The load balancer uses the readiness check for traffic routing.
This pattern is used by Kubernetes natively (liveness, readiness, and startup probes) and has become the industry standard.
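The two-tier pattern fits in a few lines. A sketch using only the Python standard library; the dependency checks are placeholders you would replace with real connection pings:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def database_ok():
    """Placeholder dependency check; replace with a real connection ping."""
    return True

def cache_ok():
    """Placeholder dependency check; replace with a real connection ping."""
    return True

def route_health(path):
    """Map a health-check path to an HTTP (status_code, body) pair."""
    if path == "/healthz":
        # Liveness: the process is up. No dependency checks here.
        return 200, {"status": "alive"}
    if path == "/ready":
        # Readiness: can this server actually handle traffic right now?
        ok = database_ok() and cache_ok()
        return (200 if ok else 503), {"ready": ok}
    return 404, {"error": "not found"}

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        code, body = route_health(self.path)
        payload = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

# To serve: HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```

Point the load balancer's health check at /ready, so a server with a broken database connection is pulled from the pool even though its process is alive.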
Health Check Configuration
| Parameter | Typical Value | Description |
|---|---|---|
| Interval | 10-30 seconds | How often to check each server |
| Timeout | 5 seconds | How long to wait for a response |
| Unhealthy threshold | 2-3 failures | Consecutive failures before marking unhealthy |
| Healthy threshold | 2 successes | Consecutive successes before marking healthy again |
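The threshold logic in the table is a small state machine: consecutive failures flip a server to unhealthy, consecutive successes flip it back. A sketch with illustrative threshold values:

```python
UNHEALTHY_THRESHOLD = 3  # consecutive failures before removal from the pool
HEALTHY_THRESHOLD = 2    # consecutive successes before re-adding

class ServerHealth:
    """Track consecutive check results and flip state at the thresholds."""

    def __init__(self):
        self.healthy = True
        self.failures = 0
        self.successes = 0

    def record(self, check_passed):
        if check_passed:
            self.failures = 0
            self.successes += 1
            if not self.healthy and self.successes >= HEALTHY_THRESHOLD:
                self.healthy = True
        else:
            self.successes = 0
            self.failures += 1
            if self.healthy and self.failures >= UNHEALTHY_THRESHOLD:
                self.healthy = False
        return self.healthy

s = ServerHealth()
s.record(False)                               # one blip...
s.record(True)                                # ...does not remove the server
down = [s.record(False) for _ in range(3)]    # three in a row does
first = s.record(True)                        # one success is not enough
second = s.record(True)                       # two consecutive: back in the pool
```

The thresholds exist to damp flapping: a single dropped packet should not eject a server, and a single lucky response should not re-admit a broken one.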
SSL/TLS Termination
SSL termination means the load balancer decrypts HTTPS traffic from clients and forwards plain HTTP to backend servers. This has several advantages:
- Centralized certificate management: Install certificates in one place instead of every server
- Reduced backend CPU: TLS encryption/decryption is CPU-intensive; offloading it to the load balancer frees backend CPU for application logic
- Simplified backend configuration: Backend servers only need to handle HTTP
The trade-off is that traffic between the load balancer and backend servers is unencrypted. In many environments (same VPC, same data center), this is acceptable. For zero-trust environments, use SSL re-encryption (load balancer terminates the client's TLS, then establishes a new TLS connection to the backend) or SSL pass-through (load balancer forwards the encrypted traffic without decrypting it).
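In Nginx, the difference between termination and re-encryption is one directive. The certificate paths and backend addresses below are placeholders:

```nginx
# TLS terminates here; the backend receives plain HTTP.
server {
    listen 443 ssl;
    server_name example.com;

    ssl_certificate     /etc/ssl/certs/example.com.crt;
    ssl_certificate_key /etc/ssl/private/example.com.key;

    location / {
        proxy_pass http://10.0.1.10:8080;    # plain HTTP to the backend

        # For SSL re-encryption instead, proxy over https:// and the
        # load balancer opens a new TLS connection to the backend:
        # proxy_pass https://10.0.1.10:8443;
    }
}
```

Pass-through, by contrast, is a Layer 4 configuration: the balancer forwards the encrypted bytes untouched and the backend holds the certificate.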
Cloud Load Balancers Compared
| Feature | AWS ALB | AWS NLB | GCP HTTP(S) LB | Azure App Gateway |
|---|---|---|---|---|
| Layer | 7 | 4 | 7 | 7 |
| Protocols | HTTP, HTTPS, gRPC | TCP, UDP, TLS | HTTP, HTTPS | HTTP, HTTPS |
| Content routing | Path, host, header | No | Path, host, header | Path, host |
| WebSocket | Yes | Yes | Yes | Yes |
| WAF integration | AWS WAF | No | Cloud Armor | Azure WAF |
| Global/Regional | Regional | Regional | Global | Regional |
| Pricing model | Per hour + LCU | Per hour + LCU | Per rule + bandwidth | Per hour + capacity |
| Auto-scaling | Automatic | Automatic | Automatic | Automatic |
According to AWS documentation, an Application Load Balancer can handle millions of requests per second. For most web applications, a cloud load balancer is the right choice because it eliminates the operational burden of managing load balancer infrastructure.
Self-Managed Load Balancers: Nginx vs. HAProxy vs. Envoy
| Feature | Nginx | HAProxy | Envoy |
|---|---|---|---|
| Primary use | Web server + reverse proxy | Dedicated load balancer | Service proxy (service mesh) |
| Configuration | Config file (reload) | Config file (reload) | Dynamic API (xDS) |
| L4 + L7 | Both | Both | Both |
| HTTP/2 upstream | Yes (1.25.1+) | Yes (2.4+) | Yes |
| gRPC | Yes | Yes (2.0+) | Native |
| Hot reload | Yes (graceful) | Yes (hitless) | Yes (via API) |
| Observability | Basic (logs, stub_status) | Excellent (stats page, Prometheus) | Excellent (built-in Prometheus) |
| Learning curve | Low | Medium | High |
| Best for | General purpose | High-performance LB | Kubernetes / service mesh |
HAProxy consistently outperforms Nginx in load balancing benchmarks, handling higher connection counts with lower latency. A 2024 HAProxy benchmark showed it handling 2 million concurrent connections with sub-millisecond latency on a single instance. However, Nginx's versatility (web server + reverse proxy + load balancer) makes it the more common choice for teams that don't need HAProxy-level performance.
My Opinionated Take
1. Use your cloud provider's load balancer. Unless you have a specific reason not to (cost at extreme scale, multi-cloud requirements, specific features), use AWS ALB/NLB, GCP HTTP(S) Load Balancer, or Azure App Gateway. The operational simplicity is worth the premium.
2. Least connections should be your default algorithm. Round robin is fine for truly homogeneous workloads, but in practice, requests are never uniform. Least connections adapts naturally to varying request durations and server performance.
3. Always implement proper health checks. A load balancer without health checks is just a traffic splitter. Use HTTP-level readiness checks that verify the application can actually handle requests.
4. SSL termination at the load balancer is the right default. Unless you're in a regulated environment that requires end-to-end encryption, terminate TLS at the load balancer and use plain HTTP behind it. The CPU savings and operational simplicity are significant.
5. Avoid sticky sessions whenever possible. Sticky sessions (routing the same user to the same server) make it impossible to scale or maintain servers independently. Store session state in Redis or your database instead. If your application requires sticky sessions, it's a sign of a design problem that should be fixed.
Action Plan: Setting Up Load Balancing
Phase 1: Basic Setup
- Deploy your application to at least 2 servers
- Set up a load balancer (cloud LB or Nginx/HAProxy)
- Configure least connections algorithm
- Add HTTP health checks (/healthz endpoint)
- Configure SSL termination at the load balancer
Phase 2: Resilience
- Test failover: stop one server and verify traffic routes to the healthy server
- Set up monitoring: alert when a server is removed from the pool
- Configure connection draining for graceful shutdowns
- Add a /ready endpoint that checks database and cache connectivity
Phase 3: Optimization
- Enable HTTP/2 between the load balancer and clients
- Configure connection keepalives between the load balancer and backend servers
- Add request logging at the load balancer level for debugging
- Consider auto-scaling backend servers based on load balancer metrics
Key Takeaways
- A load balancer distributes traffic across multiple servers, enabling horizontal scaling and high availability.
- Use Layer 7 (HTTP) load balancing for web applications; Layer 4 (TCP) for non-HTTP protocols or maximum performance.
- Least connections is the best default algorithm for most workloads.
- Health checks are essential: use HTTP readiness checks, not just TCP port checks.
- SSL termination at the load balancer simplifies operations and reduces backend CPU usage.
- Cloud load balancers (AWS ALB, GCP HTTPS LB) should be your default choice; self-manage (Nginx, HAProxy) only when you have specific needs.
- Avoid sticky sessions. Externalize session state instead.
Sources
- Nginx Load Balancing Guide
- HAProxy Documentation
- Envoy Proxy Documentation
- AWS Elastic Load Balancing Documentation
- Kubernetes Health Probes Documentation
- OSI Model - Wikipedia
I'm Ismat, and I build BirJob — Azerbaijan's job aggregator scraping 80+ sources daily.
