Structured Logging: From console.log to Production-Ready Observability
It was 3 AM and our checkout service was dropping 15% of transactions. I opened CloudWatch, searched for "payment error," and got back 47,000 results — a wall of unstructured text that told me nothing useful. Half were console.log("payment error", error) with no context about which user, which order, which payment provider, or which server instance. The other half were stack traces that had been truncated by the log collector.
We spent 90 minutes correlating timestamps across five services before finding the root cause: a third-party payment provider had changed their API response format, and our parser was silently failing. Ninety minutes of downtime because our logs were console.log statements scattered across the codebase like confetti.
The next week, I migrated our entire logging stack to structured JSON logs with correlation IDs. The following incident took 4 minutes to diagnose — because I could filter by service=payment AND level=error AND correlation_id=abc123 and see the exact sequence of events across all services.
Structured logging isn't glamorous. Nobody writes blog posts about switching from console.log to pino. But it's the single highest-ROI investment you can make in operational readiness. According to Splunk's 2024 State of Observability Report, organizations with mature logging practices resolve incidents 69% faster than those without.
What Makes Logging "Structured"?
Unstructured logging is human-readable text. Structured logging is machine-parseable data (typically JSON) with consistent fields. Here's the difference:
// Unstructured (bad for production)
console.log(`[${new Date().toISOString()}] ERROR: Payment failed for user ${userId} - ${error.message}`);
// Output: [2026-03-20T14:23:45.123Z] ERROR: Payment failed for user usr_abc - timeout
// Structured (good for production)
logger.error({
message: 'Payment processing failed',
userId: 'usr_abc',
orderId: 'ord_xyz',
paymentProvider: 'stripe',
errorCode: 'TIMEOUT',
errorMessage: error.message,
durationMs: 5023,
correlationId: req.correlationId,
service: 'payment-service',
environment: 'production'
});
// Output: {"level":"error","message":"Payment processing failed","userId":"usr_abc","orderId":"ord_xyz","paymentProvider":"stripe","errorCode":"TIMEOUT","durationMs":5023,...,"timestamp":"2026-03-20T14:23:45.123Z"}
The structured version is harder to read in a terminal — but it's infinitely easier to search, filter, aggregate, and alert on. You can write a query like "show me all errors from the payment service in the last hour where the payment provider is Stripe and duration exceeded 5 seconds" and get precise results in milliseconds.
The Five Properties of Production-Ready Logs
| Property | Description | Example |
|---|---|---|
| Structured | Machine-parseable format (JSON) | {"level":"error","message":"..."} |
| Contextual | Includes who, what, where, when | userId, orderId, service, environment |
| Correlated | Traceable across services | correlationId / traceId shared across services |
| Leveled | Severity classification | debug, info, warn, error, fatal |
| Sampled | Cost-controlled at high volume | Log 100% of errors, 10% of info |
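The sampling row deserves a concrete sketch. A minimal level-aware sampler might look like the following — a hypothetical helper, not part of pino or any library, with illustrative rates:

```javascript
// Minimal log sampler: always keep errors and warnings, sample debug.
// Rates are illustrative, not a recommendation.
const SAMPLE_RATES = {
  fatal: 1.0,
  error: 1.0,
  warn: 1.0,
  info: 1.0,
  debug: 0.1, // keep roughly 10% of debug lines
};

// `rand` is injectable so the decision is testable; defaults to Math.random.
function shouldLog(level, rand = Math.random) {
  const rate = SAMPLE_RATES[level] ?? 1.0;
  return rand() < rate;
}

// Usage: gate noisy calls before they hit the logger
// if (shouldLog('debug')) logger.debug({ cacheKey }, 'Cache miss');
```

The key design point: sampling decisions happen per line, before serialization, so dropped lines cost almost nothing.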
Choosing a Logging Library: Pino vs Winston vs Bunyan
In the Node.js ecosystem, three libraries dominate production logging:
| Feature | Pino | Winston | Bunyan |
|---|---|---|---|
| Performance (ops/sec) | ~180,000 | ~25,000 | ~35,000 |
| Output format | JSON (native) | JSON or text | JSON (native) |
| Child loggers | Yes (fast) | Yes | Yes |
| Transport plugins | Separate thread | In-process | Streams |
| Ecosystem | Growing | Largest | Stable (minimal updates) |
My recommendation: Pino. It's 5-7x faster than Winston in published benchmarks, produces structured JSON by default, and runs transports (log shipping) in a separate thread so logging never blocks your application's event loop.
Setting Up Pino: The Complete Configuration
// logger.js — Production-ready Pino setup
const pino = require('pino');
const logger = pino({
level: process.env.LOG_LEVEL || 'info',
// Base context added to every log line
base: {
service: process.env.SERVICE_NAME || 'api',
environment: process.env.NODE_ENV || 'development',
version: process.env.APP_VERSION || 'unknown',
hostname: require('os').hostname()
},
// Timestamp in ISO format
timestamp: pino.stdTimeFunctions.isoTime,
// Redact sensitive fields
redact: {
paths: ['req.headers.authorization', 'req.headers.cookie', 'password', 'token', 'creditCard'],
censor: '[REDACTED]'
},
// Serializers for common objects
serializers: {
err: pino.stdSerializers.err,
req: pino.stdSerializers.req,
res: pino.stdSerializers.res
},
// Pretty print in development only
transport: process.env.NODE_ENV === 'development'
? { target: 'pino-pretty', options: { colorize: true } }
: undefined
});
module.exports = logger;
Request-Scoped Logging with Child Loggers
// middleware/requestLogger.js
const { randomUUID } = require('crypto');
const logger = require('./logger');
const requestLogger = (req, res, next) => {
// Generate or extract correlation ID
const correlationId = req.headers['x-correlation-id'] || randomUUID();
req.correlationId = correlationId;
// Create a child logger with request context
req.log = logger.child({
correlationId,
requestId: randomUUID(),
method: req.method,
path: req.path,
userAgent: req.headers['user-agent'],
ip: req.ip
});
// Log request start
const startTime = Date.now();
req.log.info('Request started');
// Log request end
res.on('finish', () => {
req.log.info({
statusCode: res.statusCode,
durationMs: Date.now() - startTime,
contentLength: res.get('content-length')
}, 'Request completed');
});
// Pass correlation ID downstream (for microservices)
res.setHeader('x-correlation-id', correlationId);
next();
};
// Usage in route handlers
app.get('/api/orders/:id', async (req, res) => {
req.log.info({ orderId: req.params.id }, 'Fetching order');
try {
const order = await OrderService.findById(req.params.id);
req.log.info({ orderId: order.id, status: order.status }, 'Order found');
res.json(order);
} catch (err) {
req.log.error({ orderId: req.params.id, err }, 'Failed to fetch order');
res.status(500).json({ error: 'Internal server error' });
}
});
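Under the hood, a child logger is conceptually just context merging: the child carries its parent's bindings plus its own, and every line combines both with call-site fields. A toy illustration (not pino's actual implementation, which precomputes the serialized prefix for speed):

```javascript
// Toy child-logger sketch: bindings accumulate down the chain, and each
// log line merges base context + per-call fields into one JSON object.
function createLogger(bindings = {}, write = console.log) {
  return {
    child(extra) {
      return createLogger({ ...bindings, ...extra }, write);
    },
    info(fields, message) {
      write(JSON.stringify({
        level: 'info',
        ...bindings,
        ...fields,
        message,
        timestamp: new Date().toISOString(),
      }));
    },
  };
}

// Usage: a request-scoped child inherits service-level context
const base = createLogger({ service: 'payment-service' });
const reqLog = base.child({ correlationId: 'abc-123' });
reqLog.info({ orderId: 'ord_xyz' }, 'Order found');
```

This is why child loggers are cheap to create per request: no copying of log history, just a small merged context object.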
Log Levels: When to Use What
Log levels are not arbitrary. Each level has a specific purpose, and misusing them makes your logs useless:
| Level | When to Use | Example | Alert? |
|---|---|---|---|
| fatal | Application is crashing | Uncaught exception, out of memory | Page immediately |
| error | Operation failed, needs attention | Payment processing failed, DB connection lost | Alert within minutes |
| warn | Unexpected but recovered | Retry succeeded, deprecated API used, rate limit approaching | Dashboard / weekly review |
| info | Normal business operations | User logged in, order created, job completed | No |
| debug | Development / troubleshooting | SQL query, cache hit/miss, function arguments | No (off in prod) |
| trace | Granular debugging | Entry/exit of functions, loop iterations | No (off in prod) |
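Levels form an ordered threshold: setting LOG_LEVEL=info suppresses everything below it. A sketch of the numeric comparison most loggers perform (the values shown are pino's level numbers):

```javascript
// Numeric level values as used by pino: trace lowest, fatal highest.
const LEVELS = { trace: 10, debug: 20, info: 30, warn: 40, error: 50, fatal: 60 };

// A line is emitted only if its level meets the configured threshold.
function isEnabled(lineLevel, configuredLevel = 'info') {
  return LEVELS[lineLevel] >= LEVELS[configuredLevel];
}
```

With the default of info: isEnabled('debug') is false, isEnabled('warn') is true. This is why flipping LOG_LEVEL=debug in an incident instantly surfaces the detail you pre-wrote but normally suppress.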
The Log Pipeline: From Application to Dashboard
Your application produces logs. But logs are useless if they're sitting in a file on a server that nobody reads. A production log pipeline typically looks like this:
Application → stdout (JSON) → Log Collector → Log Aggregator → Dashboard/Alerts
Concrete stack examples:
1. Cloud-native:
App → stdout → CloudWatch/GCP Logging → Dashboard + Alerts
2. ELK Stack:
App → stdout → Filebeat → Logstash → Elasticsearch → Kibana
3. Modern lightweight:
App → stdout → Vector/Fluent Bit → Grafana Loki → Grafana
4. Managed:
App → stdout → Datadog Agent → Datadog → Dashboards + Alerts
Tool Comparison: Where to Send Your Logs
| Platform | Type | Cost (100 GB/mo) | Query Speed | Best For |
|---|---|---|---|---|
| ELK (Elasticsearch) | Self-hosted | $100-500 (infra) | Fast | Full control, complex queries |
| Grafana Loki | Self-hosted/Cloud | $50-200 | Good (label-based) | Cost-effective, Grafana users |
| Datadog | Managed | $1,000-3,000 | Very fast | Enterprise, full observability |
| AWS CloudWatch | Managed | $50-300 | Moderate | AWS-native workloads |
| Axiom | Managed | $25-200 | Fast | Startups, cost-sensitive |
Correlation IDs: Tracing Requests Across Services
In a microservices architecture, a single user request might touch 5-10 services. Without correlation IDs, connecting logs across services requires manual timestamp matching — which is slow, error-prone, and often impossible.
// Correlation ID propagation pattern
// Service A: Generates or extracts the correlation ID
const correlationId = req.headers['x-correlation-id'] || randomUUID();
// Service A calls Service B with the correlation ID
// Note: the method must be set explicitly — fetch rejects a GET request that has a body
const response = await fetch('http://service-b/api/orders', {
method: 'POST',
headers: {
'x-correlation-id': correlationId,
'Content-Type': 'application/json'
},
body: JSON.stringify(orderData)
});
// Service B extracts and propagates the correlation ID
app.use((req, res, next) => {
req.correlationId = req.headers['x-correlation-id'] || randomUUID();
req.log = logger.child({ correlationId: req.correlationId });
next();
});
// Now you can search all logs across all services with:
// correlationId = "abc-123-def"
// and see the complete request lifecycle
For full distributed tracing (not just log correlation), consider OpenTelemetry — it provides standardized trace context propagation, spans, and metrics alongside logs.
My Opinionated Logging Rules
1. Never use console.log in production code. console.log is synchronous, unstructured, and missing context. Replace every instance with a proper logger. This is not negotiable.
2. Log outcomes, not implementations. Bad: logger.info("Calling Stripe API"). Good: logger.info({ orderId, amount, provider: 'stripe' }, "Payment initiated"). Log what happened and why it matters, not what function you're about to call.
3. Never log sensitive data. Use Pino's redact option to automatically censor passwords, tokens, credit card numbers, and PII from logs. The OWASP Logging Cheat Sheet lists the data you should never log.
4. Every error log must include context to reproduce the issue. The error message alone is useless. Include: what operation was being performed, who triggered it, what inputs were provided, and what the system state was.
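To make rule 4 mechanical rather than aspirational, you can centralize error-context construction in a small helper. This is a hypothetical utility — adapt the field names to your own log schema:

```javascript
// Build a consistent error payload: the operation, the actor, the inputs,
// and the error itself. Field names here are illustrative, not a standard.
function errorContext(err, { operation, userId, input } = {}) {
  return {
    operation,            // what was being attempted
    userId,               // who triggered it
    input,                // the inputs involved (redact sensitive fields first!)
    errName: err.name,
    errMessage: err.message,
    stack: err.stack,
  };
}

// Usage in a handler:
// req.log.error(
//   errorContext(err, { operation: 'fetchOrder', userId, input: { orderId } }),
//   'Failed to fetch order'
// );
```

A shared helper like this also makes it easy to enforce the schema in code review: an error log without errorContext() stands out immediately.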
5. Logging is not free — budget for it. At scale, log storage costs more than compute. At BirJob, I log 100% of errors and warnings, 100% of request start/end, but only 10% of debug-level events. This keeps costs manageable while maintaining visibility.
Action Plan: Production Logging in 2 Weeks
Week 1: Foundation
- Replace all console.log calls with Pino (or your language's equivalent)
- Configure base context (service, environment, version)
- Add request-scoped child loggers with correlation IDs
- Enable sensitive data redaction
- Establish log level guidelines for the team
Week 2: Pipeline and Observability
- Set up log collection (Fluent Bit, Vector, or cloud-native)
- Deploy a log aggregation platform (Loki, ELK, or managed)
- Create dashboards for error rates, response times, and service health
- Set up alerts for error rate spikes and specific error patterns
- Document logging standards and share with the team
Sources and Further Reading
- Pino — Fast Node.js Logger
- Pino Benchmarks vs Winston, Bunyan
- Splunk — 2024 State of Observability Report
- OpenTelemetry — Distributed Tracing Standard
- Grafana Loki — Log Aggregation
- OWASP — Logging Cheat Sheet
- 12-Factor App — Logs as Event Streams
I'm Ismat, and I build BirJob — Azerbaijan's job aggregator scraping 80+ sources daily.
