Rate Limiting, Throttling, and Backpressure: A Developer's Guide
I once watched a partner integration take down our entire API. Not a DDoS attack. Not a bug. Just one enthusiastic client running a data sync without any delay between requests. They sent 12,000 API calls in 45 seconds. Our database connection pool was exhausted in under a minute. Every other customer got 503 errors for the next ten minutes while we scrambled to recover.
That day, I learned a lesson that every developer learns eventually: if you don't control how traffic enters your system, someone else will control it for you, usually at the worst possible time.
Rate limiting, throttling, and backpressure are three related but distinct mechanisms for managing traffic flow. They're often conflated, sometimes misunderstood, and frequently implemented incorrectly. This guide breaks down each one, shows you when to use which, and gives you practical implementations you can deploy today.
Definitions: Getting the Terminology Right
Before we go deeper, let's be precise about what these terms mean:
| Mechanism | What It Does | Who Enforces It | Response to Excess |
|---|---|---|---|
| Rate Limiting | Caps the number of requests in a time window | Server / API Gateway | Reject (HTTP 429) |
| Throttling | Slows down request processing | Server / Client | Queue or delay |
| Backpressure | Signals upstream to slow down | Consumer / Downstream | Propagate signal upstream |
Rate limiting says "no more than 100 requests per minute." Throttling says "I'll process your request, but you might have to wait." Backpressure says "I'm overwhelmed, please slow down." They work at different layers and solve different problems.
Rate Limiting Algorithms: The Core Four
1. Fixed Window Counter
The simplest approach. Divide time into fixed windows (e.g., 1-minute intervals). Count requests per window. Reject when the count exceeds the limit.
Problem: Burst at window boundaries. A client can send 100 requests at 11:59:59 and another 100 at 12:00:00, effectively sending 200 requests in 2 seconds while staying within the "100 per minute" limit for both windows.
Use when: You need something simple and the boundary burst problem is acceptable.
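The fixed window counter fits in a few lines. Below is a minimal in-memory sketch (the `FixedWindowLimiter` name and injectable clock are my own illustration, not a library API); a production version would keep the counters in a shared store like Redis.

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Counts requests per fixed time window; rejects once the limit is hit."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable for testing
        self.counters = defaultdict(int)  # (key, window index) -> request count

    def allow(self, key):
        window_index = int(self.clock()) // self.window  # which window are we in?
        bucket = (key, window_index)
        if self.counters[bucket] >= self.limit:
            return False  # over the limit for this window -> respond with 429
        self.counters[bucket] += 1
        return True
```

Note that nothing in this code prevents the boundary burst described above: a fresh window index means a fresh counter, regardless of what happened one second earlier.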
2. Sliding Window Log
Store the timestamp of every request. When a new request arrives, remove timestamps older than the window size, then count remaining entries. Reject if the count exceeds the limit.
Problem: Memory-intensive. Storing timestamps for every request for every user adds up quickly. If you have 10,000 users each making 100 requests per minute, that's 1 million timestamps in memory.
Use when: You need precise rate limiting and memory isn't a constraint.
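A sliding window log can be sketched with a deque of timestamps (again an illustrative in-memory version; names are mine). The memory cost is visible directly: one float per request still inside the window.

```python
import time
from collections import deque

class SlidingWindowLog:
    """Keeps a timestamp per request; precise but memory-hungry."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.log = deque()  # timestamps of recent requests, oldest first

    def allow(self):
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.log and now - self.log[0] > self.window:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False
        self.log.append(now)
        return True
```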
3. Sliding Window Counter
A hybrid approach. Keep counters for the current and the previous fixed window, and estimate the rate with a weighted count based on how far into the current window you are. If you are 30% into the current window, the estimate is: (current window count) + (previous window count * 70%).
Advantage: Good accuracy with low memory overhead. This is what most production systems use.
Use when: You want a good balance of accuracy and performance. This should be your default choice.
4. Token Bucket
Imagine a bucket that holds tokens. Tokens are added at a fixed rate (e.g., 10 per second). Each request consumes one token. If the bucket is empty, the request is rejected. The bucket has a maximum capacity, which allows short bursts.
Advantage: Naturally handles bursts up to the bucket capacity while enforcing a long-term average rate. This is what Stripe, GitHub, and most major APIs use.
Use when: You want to allow burst traffic while maintaining a steady average rate.
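A token bucket is easy to express with lazy refills: instead of a background timer adding tokens, compute how many tokens accrued since the last call. This is a minimal in-memory sketch with illustrative names, not any particular library's API.

```python
import time

class TokenBucket:
    """Refills tokens at a steady rate; capacity bounds the burst size."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens added per second (long-term average rate)
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start with a full bucket
        self.clock = clock
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Lazy refill: add tokens for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The `cost` parameter is a common extension: expensive endpoints can consume more than one token per request.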
| Algorithm | Memory | Accuracy | Burst Handling | Complexity |
|---|---|---|---|---|
| Fixed Window | Very Low | Low | Poor | Very Low |
| Sliding Window Log | High | Very High | Good | Medium |
| Sliding Window Counter | Low | High | Good | Medium |
| Token Bucket | Very Low | High | Excellent | Low |
Implementing Rate Limiting with Redis
Redis is the de facto standard for distributed rate limiting. Its atomic operations and built-in key expiration make it ideal for implementing a sliding window counter.
The key insight is using Redis's MULTI/EXEC for atomicity. The pattern is:
- Increment the counter for the current window
- Set expiration on the key (so old windows are automatically cleaned up)
- Get the counter for the previous window
- Calculate the weighted count
- Decide: allow or reject
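The steps above can be sketched in memory as follows. The dict stands in for Redis keys (in production, each counter would be an `INCR` plus `EXPIRE` inside `MULTI`/`EXEC`), and for simplicity this sketch computes the weighted count before incrementing. Class and method names are illustrative assumptions.

```python
import time
from collections import defaultdict

class SlidingWindowCounter:
    """Weighted blend of the current and previous fixed-window counters."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.counters = defaultdict(int)  # (key, window index) -> count

    def allow(self, key):
        now = self.clock()
        idx = int(now // self.window)                # current window index
        elapsed = (now % self.window) / self.window  # fraction into current window
        current = self.counters[(key, idx)]
        previous = self.counters[(key, idx - 1)]
        # Weighted count: e.g. 30% into the window -> current + previous * 70%.
        weighted = current + previous * (1 - elapsed)
        if weighted >= self.limit:
            return False  # reject -> respond with 429
        self.counters[(key, idx)] += 1  # count this request in the current window
        return True
```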
For the token bucket algorithm, Redis's EVALSHA with a Lua script is the way to go. The Lua script runs atomically on the Redis server, preventing race conditions between checking the bucket and consuming tokens.
Redis's own benchmarks show a single instance handling well over 100,000 operations per second on commodity hardware, which is more than enough headroom for most applications' rate limit checks. For higher scale, Redis Cluster distributes the load across multiple nodes.
Rate Limiting at Different Layers
API Gateway Level
This is your first line of defense. Tools like Kong, Nginx, and cloud providers' API gateways (AWS API Gateway, GCP Cloud Endpoints) offer built-in rate limiting. Configure it here and you protect your entire backend without any code changes.
AWS API Gateway, for example, allows you to set a rate of 10,000 requests per second with a burst capacity of 5,000. According to AWS documentation, these limits can be configured per API key, per stage, and per method.
Application Level
For more nuanced rate limiting (per user, per endpoint, per pricing tier), implement it in your application code. Libraries like rate-limiter-flexible for Node.js, django-ratelimit for Python, and Rack::Attack for Ruby make this straightforward.
Database Level
Often overlooked but critical. Connection pools are a form of rate limiting. PostgreSQL's max_connections setting (default 100) is a hard rate limit on concurrent database access. PgBouncer sits in front of PostgreSQL and provides connection pooling with configurable limits.
Throttling: Slowing Down Instead of Rejecting
Sometimes rejecting requests is too harsh. Throttling offers a gentler alternative: slow down the processing instead of refusing it entirely.
Server-Side Throttling
Request queuing: Instead of returning 429, queue excess requests and process them at a controlled rate. This works well for background jobs and batch processing but is problematic for real-time user requests where latency matters.
Priority-based throttling: Assign priority to requests. When the system is under load, process high-priority requests normally while delaying or queuing low-priority ones. For example, a SaaS application might prioritize paying customers over free-tier users during peak load.
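Priority-based throttling is essentially a priority queue in front of the worker pool. A minimal sketch (the `PriorityThrottle` name and priority numbering are assumptions of mine):

```python
import heapq
import itertools

class PriorityThrottle:
    """Serve high-priority requests first when the system is saturated."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order per priority

    def submit(self, priority, request):
        # Lower number = higher priority (e.g. 0 = paying customer, 1 = free tier).
        heapq.heappush(self._heap, (priority, next(self._seq), request))

    def next_request(self):
        """Pop the highest-priority (then oldest) pending request, or None."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

Under light load the queue stays near-empty and priorities are irrelevant; they only start to matter once requests arrive faster than workers drain them.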
Client-Side Throttling
Good API clients throttle themselves. This is both a courtesy and a practical necessity, because if you don't, the server will rate limit you anyway, and 429 errors are more disruptive than a slight delay.
Exponential backoff with jitter is the standard approach for retries after throttling. The formula: delay = min(cap, base * 2^attempt) + random(0, jitter). The jitter prevents the "thundering herd" problem where many clients retry at the same time.
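The formula translates directly to code. Here is a small sketch (function names and the default constants are my choices, not a standard):

```python
import random
import time

def backoff_delay(attempt, base=0.5, cap=30.0, jitter=1.0):
    """delay = min(cap, base * 2^attempt) + random(0, jitter)."""
    return min(cap, base * (2 ** attempt)) + random.uniform(0, jitter)

def call_with_retries(do_request, max_attempts=5):
    """Retry a callable that raises on failure (e.g. on an HTTP 429)."""
    for attempt in range(max_attempts):
        try:
            return do_request()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            time.sleep(backoff_delay(attempt))
```

The cap matters: without it, attempt 10 at a 0.5-second base would mean waiting over eight minutes.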
Google's SRE book recommends a more sophisticated approach called "adaptive throttling" where the client tracks its own error rate and progressively reduces its sending rate as errors increase.
Backpressure: The Often-Ignored Third Pillar
Backpressure is fundamentally different from rate limiting and throttling. Instead of the server telling the client "you're sending too much," it's a downstream component telling an upstream component "I can't keep up." The signal flows backward through the system, hence "back" pressure.
Where Backpressure Matters
- Message queues: When a Kafka consumer can't process messages fast enough, the consumer lag grows. This is a backpressure signal.
- Stream processing: In systems like Apache Flink or Kafka Streams, backpressure is a first-class concept. When a processing stage is slow, it signals upstream stages to slow down.
- HTTP/2: The protocol has built-in flow control. A receiver can signal to the sender to stop sending data by adjusting the flow control window.
- TCP: TCP's sliding window protocol is the original backpressure mechanism. When the receiver's buffer is full, it tells the sender to stop.
Backpressure Strategies
| Strategy | How It Works | Pros | Cons |
|---|---|---|---|
| Drop oldest | Discard the oldest items in the buffer | Keeps most recent data | Data loss |
| Drop newest | Reject new items when buffer is full | Simple, preserves order | Data loss |
| Block producer | Stop the producer until space is available | No data loss | Can cascade upstream |
| Buffer to disk | Spill excess to disk storage | No data loss, no blocking | Slower, disk management |
| Sample | Only process a percentage of items | Controlled degradation | Incomplete data |
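As one example from the table, the drop-oldest strategy falls out of a bounded deque almost for free. A sketch (the class and the eviction counter are my own illustration):

```python
from collections import deque

class DropOldestBuffer:
    """Bounded buffer that discards the oldest item when full (drop-oldest)."""

    def __init__(self, capacity):
        self.items = deque(maxlen=capacity)  # deque evicts the oldest on overflow
        self.dropped = 0

    def push(self, item):
        if len(self.items) == self.items.maxlen:
            self.dropped += 1  # track evictions as a backpressure signal
        self.items.append(item)
```

Exposing the `dropped` count as a metric turns silent data loss into an observable backpressure signal you can alert on.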
Reactive programming frameworks like Reactive Streams (Java), RxJS (JavaScript), and Project Reactor (Spring) have backpressure built into their APIs. If you're using these frameworks, you get backpressure handling "for free" by using their operators correctly.
Real-World Rate Limiting: How the Big Players Do It
| API Provider | Rate Limit | Algorithm | Notable Feature |
|---|---|---|---|
| GitHub | 5,000 req/hr (authenticated) | Token Bucket | X-RateLimit-* headers |
| Stripe | 100 req/sec (live), 25 req/sec (test) | Token Bucket | Idempotency keys for retries |
| Twitter/X | Varies by endpoint (15-900 per 15 min) | Fixed Window | Per-endpoint limits |
| OpenAI | Varies by tier and model | Token Bucket | Both RPM and TPM limits |
| Shopify | 40 req/sec (Plus), 2 req/sec (basic) | Leaky Bucket | Tier-based limits |
Notice a pattern: every major API uses clear response headers to communicate rate limit status. At minimum, you should return:
- X-RateLimit-Limit: Maximum requests allowed
- X-RateLimit-Remaining: Requests remaining in the current window
- X-RateLimit-Reset: Unix timestamp when the window resets
- Retry-After: Seconds to wait before retrying (on 429 responses)
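Building these headers from a limiter's state might look like this. The function and its parameters are illustrative assumptions; the header names follow the de facto X-RateLimit-* convention.

```python
import time

def rate_limit_headers(limit, remaining, reset_ts, allowed, now=None):
    """Build the conventional rate limit response headers."""
    now = time.time() if now is None else now
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_ts),  # Unix timestamp of the window reset
    }
    if not allowed:  # pair with an HTTP 429 response
        headers["Retry-After"] = str(max(0, int(reset_ts - now)))
    return headers
```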
My Opinionated Take
After years of building and consuming rate-limited APIs, here are my strong opinions:
1. Rate limiting is not optional. Every API that will be consumed by external clients needs rate limiting from day one. Not "when we scale." Not "when we have abuse." Day one. I've seen startups without rate limiting get accidentally DDoS'd by a single partner integration more times than I can count.
2. The token bucket is the right default. Unless you have a specific reason to use something else, start with a token bucket. It handles bursts gracefully, it's intuitive to explain to API consumers, and it's easy to implement with Redis.
3. Client-side throttling should be mandatory. If you're building an API client library (SDK), build in automatic retry with exponential backoff. Don't make developers implement it themselves. They won't, or they'll do it wrong.
4. Backpressure is the most underused mechanism. Most web applications have no backpressure signals. A slow database query doesn't cause the API to stop accepting requests. A full message queue doesn't cause producers to slow down. Adding backpressure signals, even simple ones like monitoring queue depth, can prevent cascading failures.
5. Rate limits should be tier-based from the start. Even if you only have one tier today, structure your rate limiting code to support multiple tiers. When you add a paid plan (and you will), you don't want to refactor your rate limiting infrastructure.
Action Plan: Implementing Rate Limiting in Your API
Phase 1: Basic Protection (Day 1)
- Add rate limiting at your API gateway or reverse proxy (Nginx, Cloudflare, AWS API Gateway)
- Set a generous global limit (e.g., 1,000 requests per minute per IP)
- Return proper 429 responses with Retry-After headers
- Add monitoring: alert when any client hits the rate limit
Phase 2: Per-User Limits (Week 1-2)
- Implement token bucket rate limiting in your application code with Redis
- Set per-user limits based on authentication level
- Add X-RateLimit-* response headers to all API responses
- Document your rate limits in your API documentation
Phase 3: Granular Controls (Month 1)
- Add per-endpoint rate limits (write endpoints get lower limits than read endpoints)
- Implement tier-based limits (free vs. paid)
- Add client-side throttling to your SDK/client libraries
- Build a rate limit dashboard showing usage patterns
Phase 4: Backpressure (Month 2-3)
- Add health checks that monitor downstream dependency load
- Implement circuit breakers for external service calls
- Add queue depth monitoring with alerts
- Consider adaptive throttling based on system load
Key Takeaways
- Rate limiting rejects excess traffic; throttling slows it down; backpressure signals upstream to reduce flow.
- Token bucket is the best default algorithm for API rate limiting.
- Redis is the standard backend for distributed rate limiting.
- Always return rate limit headers so clients can self-regulate.
- Implement client-side throttling (exponential backoff with jitter) in your SDK.
- Backpressure is the least implemented but most valuable mechanism for preventing cascading failures.
- Start with gateway-level rate limiting on day one, then add application-level granularity.
Sources
- Redis Rate Limiter Pattern
- Stripe Rate Limits Documentation
- GitHub REST API Rate Limits
- AWS Architecture Blog - Exponential Backoff and Jitter
- Google SRE Book - Handling Overload
- AWS API Gateway Throttling
- Reactive Streams Specification
I'm Ismat, and I build BirJob — Azerbaijan's job aggregator scraping 80+ sources daily.
