Postgres Performance Tuning: From Slow Queries to Sub-Second
Six months ago, BirJob's main job search query took 3.2 seconds. We had about 350,000 job listings in our database, scraped from 80+ sources daily. Users were bouncing before results loaded. Today, the same query returns in 47 milliseconds — a 68x improvement. We did not rewrite the application, switch databases, or throw money at bigger servers. We tuned PostgreSQL.
This article documents every optimization we made, from indexing strategy to query rewriting, from connection pooling to configuration tuning. If you are running PostgreSQL in production and things are getting slow, this is your playbook.
1. Diagnosing the Problem: Where Is the Time Going?
Before optimizing anything, you need to know what is slow. PostgreSQL has excellent built-in diagnostics, but you have to enable them.
Enable Query Logging
-- postgresql.conf
log_min_duration_statement = 100 -- Log queries taking > 100ms
shared_preload_libraries = 'pg_stat_statements'
-- After restart, create the extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
The pg_stat_statements extension is the single most valuable diagnostic tool in PostgreSQL. It tracks execution statistics for all SQL statements. According to the PostgreSQL documentation, it normalizes queries (replacing literal values with parameters) so you can see which query patterns are the most expensive.
Finding Your Worst Queries
-- Top 10 queries by total execution time
SELECT
substring(query, 1, 100) AS short_query,
calls,
round(total_exec_time::numeric, 2) AS total_ms,
round(mean_exec_time::numeric, 2) AS avg_ms,
round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS pct_total,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
When I first ran this on BirJob's database, one query accounted for 43% of total database time. It was our full-text search query, and it was doing a sequential scan on 350,000 rows. That single finding drove most of our optimization effort.
2. Indexing: The 80/20 of Performance
Proper indexing solves 80% of PostgreSQL performance problems. But "add more indexes" is not the answer — wrong indexes waste space, slow down writes, and can actually make queries slower.
Understanding EXPLAIN ANALYZE
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM jobs
WHERE company = 'Kapital Bank'
AND created_at > '2026-01-01'
ORDER BY created_at DESC
LIMIT 20;
-- Output (before optimization):
Limit  (cost=18234.56..18234.61 rows=20 width=284) (actual time=892.4..892.5 rows=20 loops=1)
  ->  Sort  (cost=18234.56..18245.12 rows=4223 width=284) (actual time=892.3..892.4 rows=20 loops=1)
        Sort Key: created_at DESC
        Sort Method: top-N heapsort  Memory: 32kB
        ->  Seq Scan on jobs  (cost=0.00..18123.00 rows=4223 width=284) (actual time=0.02..876.1 rows=4156 loops=1)
              Filter: ((company = 'Kapital Bank') AND (created_at > '2026-01-01'))
              Rows Removed by Filter: 345844
              Buffers: shared hit=12456 read=3211
The Seq Scan on 350,000 rows is the problem. PostgreSQL is reading every single row and filtering. The fix:
CREATE INDEX CONCURRENTLY idx_jobs_company_created
ON jobs (company, created_at DESC);
-- After indexing:
Limit  (cost=0.42..2.87 rows=20 width=284) (actual time=0.12..0.24 rows=20 loops=1)
  ->  Index Scan using idx_jobs_company_created on jobs  (cost=0.42..512.34 rows=4223 width=284)
        Index Cond: ((company = 'Kapital Bank') AND (created_at > '2026-01-01'))
        Buffers: shared hit=5
From 892ms to 0.24ms. That is a 3,717x improvement from a single index.
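Plans can also be checked mechanically, for example to guard against regressions in CI. A minimal Python sketch (illustrative, not part of BirJob's codebase) that flags sequential scans in EXPLAIN text output:

```python
import re

def find_seq_scans(plan_text: str) -> list[str]:
    """Return table names that EXPLAIN reports as sequentially scanned."""
    # EXPLAIN prints plan nodes like "Seq Scan on jobs  (cost=...)"
    return re.findall(r"Seq Scan on (\w+)", plan_text)

plan = """\
Limit  (cost=18234.56..18234.61 rows=20 width=284)
  ->  Sort  (cost=18234.56..18245.12 rows=4223 width=284)
        ->  Seq Scan on jobs  (cost=0.00..18123.00 rows=4223 width=284)
"""

print(find_seq_scans(plan))  # ['jobs'] -> this plan needs an index
```

A test could run EXPLAIN on your hottest queries and fail the build if this list is non-empty for any large table.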
Index Types and When to Use Them
| Index Type | Best For | Example |
|---|---|---|
| B-tree (default) | Equality and range queries, ORDER BY | CREATE INDEX ON jobs (company, created_at) |
| GIN | Full-text search, JSONB containment, arrays | CREATE INDEX ON jobs USING GIN (to_tsvector('english', title)) |
| GiST | Geometric data, full-text search (ranking), range types | CREATE INDEX ON locations USING GiST (coordinates) |
| Hash | Exact equality only (rare use case) | CREATE INDEX ON sessions USING HASH (session_id) |
| BRIN | Large tables with naturally ordered data | CREATE INDEX ON logs USING BRIN (timestamp) |
| Partial | Queries that always filter on a condition | CREATE INDEX ON jobs (title) WHERE is_active = true |
3. Full-Text Search Optimization
Full-text search was our biggest performance bottleneck at BirJob. We needed to search job titles, company names, and descriptions across 350,000+ rows with ranking and highlighting. Here is how we optimized it.
Step 1: Add a tsvector Column
Instead of computing the tsvector on every query, store it as a column and keep it updated with a trigger:
-- Add the column
ALTER TABLE jobs ADD COLUMN search_vector tsvector;
-- Populate it
UPDATE jobs SET search_vector =
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(company, '')), 'B') ||
setweight(to_tsvector('english', coalesce(description, '')), 'C');
-- Create GIN index
CREATE INDEX idx_jobs_search ON jobs USING GIN (search_vector);
-- Auto-update trigger
CREATE OR REPLACE FUNCTION jobs_search_trigger() RETURNS trigger AS $$
BEGIN
NEW.search_vector :=
setweight(to_tsvector('english', coalesce(NEW.title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(NEW.company, '')), 'B') ||
setweight(to_tsvector('english', coalesce(NEW.description, '')), 'C');
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trig_jobs_search
BEFORE INSERT OR UPDATE ON jobs
FOR EACH ROW EXECUTE FUNCTION jobs_search_trigger();
Step 2: Optimize the Query
-- Before: 3.2 seconds
SELECT *, ts_rank(search_vector, query) AS rank
FROM jobs, plainto_tsquery('english', 'software developer baku') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;
-- After optimization: 47ms
SELECT j.id, j.title, j.company, j.apply_link, j.created_at,
ts_rank_cd(j.search_vector, query, 32) AS rank
FROM jobs j, websearch_to_tsquery('english', 'software developer baku') AS query
WHERE j.search_vector @@ query
AND j.is_active = true
ORDER BY rank DESC, j.created_at DESC
LIMIT 20;
Key changes: websearch_to_tsquery instead of plainto_tsquery (supports natural language queries), ts_rank_cd instead of ts_rank (cover density ranking is faster and often more relevant), and selecting only needed columns instead of SELECT *.
4. Query Optimization Patterns
N+1 Queries
The N+1 problem is the most common performance killer in ORMs. Your code fetches a list of N items, then makes N additional queries to fetch related data:
-- BAD: N+1 pattern (1 query + 20 queries)
SELECT * FROM jobs WHERE is_active = true LIMIT 20;
-- Then for each job:
SELECT * FROM companies WHERE id = $1; -- 20 times!
-- GOOD: Single query with JOIN
SELECT j.*, c.name AS company_name, c.logo_url
FROM jobs j
LEFT JOIN companies c ON j.company_id = c.id
WHERE j.is_active = true
LIMIT 20;
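To see the cost difference concretely, here is a toy Python simulation (assumed data, not BirJob's real code) that counts database round trips under each pattern:

```python
# Toy illustration: the "database" is an in-memory dict, and run_query
# counts round trips instead of actually talking to PostgreSQL.
jobs = [{"id": i, "company_id": i % 3} for i in range(20)]
companies = {0: "Kapital Bank", 1: "PASHA Bank", 2: "ABB"}

query_count = 0

def run_query(fn):
    """Stand-in for one database round trip."""
    global query_count
    query_count += 1
    return fn()

# N+1 pattern: one query for the list, then one per row for its company
query_count = 0
for job in run_query(lambda: list(jobs)):
    job["company"] = run_query(lambda j=job: companies[j["company_id"]])
n_plus_one_trips = query_count

# JOIN pattern: the database resolves companies in a single round trip
query_count = 0
joined = run_query(
    lambda: [{**j, "company": companies[j["company_id"]]} for j in jobs]
)
join_trips = query_count

print(n_plus_one_trips, join_trips)  # 21 1
```

Each round trip also pays network latency, so on a remote database the 21-query version is far worse than 21x the work.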
Pagination Optimization
-- BAD: OFFSET-based pagination (gets slower on later pages)
SELECT * FROM jobs ORDER BY created_at DESC LIMIT 20 OFFSET 10000;
-- PostgreSQL must read 10,020 rows and discard 10,000
-- GOOD: Cursor-based pagination
SELECT * FROM jobs
WHERE created_at < '2026-03-20T10:00:00Z' -- cursor from last page
ORDER BY created_at DESC
LIMIT 20;
-- PostgreSQL reads exactly 20 rows using the index
According to Use The Index, Luke, offset-based pagination is O(n) — performance degrades linearly as the offset increases. Keyset (cursor-based) pagination is effectively constant: a single index descent finds the start of the page, no matter how deep into the results you are.
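On the application side, the cursor is usually the last row's sort key serialized into an opaque token. A minimal Python sketch (the token format and the id tie-breaker are assumptions for illustration, not BirJob's implementation):

```python
import base64
import json

def encode_cursor(created_at: str, job_id: int) -> str:
    """Serialize the last row's sort keys into an opaque page token."""
    raw = json.dumps({"created_at": created_at, "id": job_id})
    return base64.urlsafe_b64encode(raw.encode()).decode()

def decode_cursor(token: str) -> dict:
    return json.loads(base64.urlsafe_b64decode(token.encode()))

# The client echoes the token back; the server plugs the decoded values into
#   WHERE (created_at, id) < (%(created_at)s, %(id)s)
# Including id as a tie-breaker keeps ordering stable when timestamps collide.
token = encode_cursor("2026-03-20T10:00:00Z", 41873)
print(decode_cursor(token))  # {'created_at': '2026-03-20T10:00:00Z', 'id': 41873}
```

The composite `(created_at, id)` comparison needs a matching composite index to stay fast.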
EXISTS vs IN vs JOIN
| Pattern | Best When | Avoid When |
|---|---|---|
| EXISTS (SELECT 1 FROM ...) | Checking if related rows exist, subquery has many matches | N/A — generally safe |
| WHERE id IN (SELECT ...) | Small subquery result set | Subquery returns thousands of rows |
| JOIN | You need data from both tables | You only need to check existence |
5. Connection Pooling
PostgreSQL creates a new process for each connection. Each process uses 5-10MB of RAM. If your application opens 200 connections, that is 1-2GB of RAM just for connection overhead. PgBouncer solves this by pooling connections:
# pgbouncer.ini
[databases]
birjob = host=localhost port=5432 dbname=birjob
[pgbouncer]
listen_port = 6432
listen_addr = 127.0.0.1
auth_type = md5
pool_mode = transaction # Most common for web apps
max_client_conn = 1000 # Accept up to 1000 app connections
default_pool_size = 20 # Use only 20 actual PostgreSQL connections
reserve_pool_size = 5
reserve_pool_timeout = 3
With pool_mode = transaction, a PostgreSQL connection is assigned to a client only for the duration of a transaction. Between transactions, the connection is returned to the pool. This means 1,000 application connections can share 20 PostgreSQL connections efficiently.
Supabase's benchmarks show that PgBouncer can handle 10,000+ concurrent connections with minimal overhead, while raw PostgreSQL connections start degrading at around 200-300.
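The arithmetic of transaction pooling can be sketched with a toy semaphore-based pool in Python (illustrative only; PgBouncer's real implementation is single-threaded, event-driven C):

```python
import threading
import time

class ToyPool:
    """Toy transaction-mode pool: many clients share `size` backend slots."""
    def __init__(self, size: int):
        self._slots = threading.Semaphore(size)
        self._lock = threading.Lock()
        self._in_use = 0
        self.peak = 0  # highest number of backends in use at once

    def run_transaction(self, work):
        with self._slots:  # block until a backend connection is free
            with self._lock:
                self._in_use += 1
                self.peak = max(self.peak, self._in_use)
            try:
                work()  # the backend is held only for this transaction
            finally:
                with self._lock:
                    self._in_use -= 1

pool = ToyPool(size=20)
clients = [threading.Thread(target=pool.run_transaction,
                            args=(lambda: time.sleep(0.001),))
           for _ in range(200)]
for c in clients:
    c.start()
for c in clients:
    c.join()

print(pool.peak)  # never exceeds 20, no matter how many clients arrive
```

Clients queue briefly instead of each holding an idle 5-10MB backend process, which is exactly the trade transaction pooling makes.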
6. PostgreSQL Configuration Tuning
PostgreSQL's default configuration is designed to run on a Raspberry Pi. For any production workload, you need to tune these settings. Here are the key parameters, based on PGTune recommendations for a server with 16GB RAM and SSD storage:
| Parameter | Default | Recommended (16GB RAM) | What It Does |
|---|---|---|---|
| shared_buffers | 128MB | 4GB | PostgreSQL's internal cache. Set to 25% of RAM. |
| effective_cache_size | 4GB | 12GB | Estimate of OS cache. Set to 75% of RAM. Affects query planner. |
| work_mem | 4MB | 64MB | Memory for sorting and hashing per operation. Be careful — each query can use multiple. |
| maintenance_work_mem | 64MB | 1GB | Memory for VACUUM, CREATE INDEX. Set higher for faster maintenance. |
| random_page_cost | 4.0 | 1.1 | Cost estimate for random disk I/O. Lower for SSDs to encourage index usage. |
| effective_io_concurrency | 1 | 200 | Concurrent I/O operations. Set to 200 for SSDs. |
| max_connections | 100 | 50 (with PgBouncer) | Lower is better when using connection pooling. |
| checkpoint_completion_target | 0.5 (0.9 since PostgreSQL 14) | 0.9 | Spread checkpoint I/O over more time. |
7. VACUUM and Maintenance
PostgreSQL uses MVCC (Multi-Version Concurrency Control), which means deleted and updated rows are not immediately removed — they become "dead tuples." VACUUM cleans up these dead tuples and updates statistics. Without regular vacuuming, your database will bloat and queries will slow down.
-- Check table bloat
SELECT
schemaname || '.' || relname AS table_name,
n_live_tup AS live_rows,
n_dead_tup AS dead_rows,
round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
last_autovacuum,
last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
Autovacuum configuration for high-write tables:
-- For tables with frequent updates (like job listings)
ALTER TABLE jobs SET (
autovacuum_vacuum_scale_factor = 0.01, -- Vacuum when 1% of rows are dead (default 20%)
autovacuum_analyze_scale_factor = 0.005, -- Analyze when 0.5% of rows changed
autovacuum_vacuum_cost_delay = 2 -- Minimal throttling between cleanup rounds (this is already the default since PG 12; older versions defaulted to 20ms)
);
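Autovacuum fires when dead tuples exceed `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples`. A quick Python check of what these settings mean for a table the size of BirJob's jobs table:

```python
def autovacuum_trigger(n_live_tup: int, scale_factor: float,
                       base_threshold: int = 50) -> int:
    """Dead-tuple count at which autovacuum kicks in:
    autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples."""
    return int(base_threshold + scale_factor * n_live_tup)

# Default scale factor (0.2): a 350k-row table accumulates 70k dead rows first
print(autovacuum_trigger(350_000, 0.2))   # 70050
# Tuned to 0.01 as above: vacuum kicks in after ~3.5k dead rows
print(autovacuum_trigger(350_000, 0.01))  # 3550
```

With the default, a hot table can carry tens of thousands of dead tuples before any cleanup happens, which is why per-table tuning pays off on high-write tables.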
8. Monitoring in Production
You cannot optimize what you cannot measure. Here are the essential metrics to monitor:
-- Active connections and their state
SELECT state, count(*) FROM pg_stat_activity GROUP BY state;
-- Long-running queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active' AND now() - query_start > interval '5 seconds'
ORDER BY duration DESC;
-- Index usage statistics
SELECT
schemaname || '.' || relname AS table_name,
indexrelname AS index_name,
idx_scan AS times_used,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC
LIMIT 20; -- Least used indexes — candidates for removal
-- Cache hit ratio (should be > 99%)
SELECT
round(100.0 * sum(blks_hit) / sum(blks_hit + blks_read), 2) AS cache_hit_ratio
FROM pg_stat_database;
For production monitoring, I recommend postgres_exporter for Prometheus + Grafana, or pganalyze for a managed solution. Set alerts for: cache hit ratio below 99%, replication lag above 10 seconds, dead tuple percentage above 10%, and long-running queries above 30 seconds.
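The alert thresholds above can live as a simple rule table in whatever glue sits between your exporter and your pager. An illustrative Python sketch (the metric names are assumptions; map them to whatever your exporter emits):

```python
def check_alerts(m: dict) -> list[str]:
    """Evaluate the alert thresholds recommended above against a metrics snapshot."""
    alerts = []
    if m["cache_hit_ratio"] < 99.0:
        alerts.append(f"cache hit ratio low: {m['cache_hit_ratio']}%")
    if m["replication_lag_s"] > 10:
        alerts.append(f"replication lag high: {m['replication_lag_s']}s")
    if m["dead_tuple_pct"] > 10.0:
        alerts.append(f"dead tuples high: {m['dead_tuple_pct']}%")
    if m["longest_query_s"] > 30:
        alerts.append(f"long-running query: {m['longest_query_s']}s")
    return alerts

snapshot = {"cache_hit_ratio": 98.2, "replication_lag_s": 2,
            "dead_tuple_pct": 14.0, "longest_query_s": 5}
print(check_alerts(snapshot))
# two alerts fire: low cache hit ratio and high dead-tuple percentage
```

In practice you would express these as Prometheus alerting rules rather than application code, but the thresholds are the same.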
9. My Opinionated Take
Most performance problems are query problems, not database problems. Before tuning PostgreSQL configuration, look at your queries. In my experience, 90% of performance issues are caused by missing indexes, N+1 queries, or sequential scans that should be index scans. Configuration tuning is important, but it is the last 10%.
Do not add indexes preemptively. Every index slows down writes and uses disk space. Add indexes in response to actual slow queries that you have measured. The best approach: deploy to production, monitor with pg_stat_statements, optimize the top 5 queries, repeat.
EXPLAIN ANALYZE is your best friend. Learn to read query plans. The PostgreSQL documentation on EXPLAIN is excellent, and tools like explain.dalibo.com make query plans visual and easy to understand.
PostgreSQL scales further than you think. I have seen teams switch to NoSQL or add caching layers when PostgreSQL could handle the load with proper tuning. PostgreSQL handles billions of rows, thousands of concurrent queries, and complex analytical workloads. Before adding complexity to your stack, make sure you have exhausted PostgreSQL's capabilities.
10. Action Plan: Speed Up Your Database This Week
Day 1: Diagnose
- Enable pg_stat_statements
- Set log_min_duration_statement = 100
- Identify your top 10 slowest queries
Day 2-3: Index
- Run EXPLAIN ANALYZE on each slow query
- Add missing indexes (use CREATE INDEX CONCURRENTLY)
- Remove unused indexes
Day 4: Configure
- Run PGTune with your server specs
- Apply recommended settings
- Set up PgBouncer if you have more than 50 connections
Day 5: Monitor
- Set up dashboards for key metrics
- Configure alerts for degradation
- Schedule weekly reviews of pg_stat_statements
Sources
- PostgreSQL — pg_stat_statements Documentation
- PostgreSQL — EXPLAIN Documentation
- Use The Index, Luke — SQL Indexing Guide
- PGTune — PostgreSQL Configuration Calculator
- PgBouncer — Connection Pooling
- Supabase — Connection Pooling Benchmarks
- Dalibo — EXPLAIN Plan Visualizer
I'm Ismat, and I build BirJob — Azerbaijan's job aggregator scraping 80+ sources daily.
