Monitoring Production Next.js Apps: The Complete Stack
Last year, we shipped a Next.js app to production with zero monitoring. No error tracking, no performance metrics, no log aggregation. It was a job aggregator processing 50,000+ scraping operations daily, serving 200,000+ page views monthly. When things broke — and they broke often — we found out from users. Sometimes days later. The fix-it-when-it-breaks approach cost us an estimated 15% of daily active users before we instrumented everything.
This guide is the monitoring stack I wish existed when I started. It covers every layer — from browser-side Real User Monitoring to server-side APM, from log aggregation to synthetic monitoring — with specific Next.js configuration, real code, and honest cost analysis.
The Four Pillars of Next.js Monitoring
Before diving into tools, let's establish a framework. Production monitoring for Next.js applications sits on four pillars:
| Pillar | What It Tracks | Key Metrics | Primary Tools |
|---|---|---|---|
| Error Tracking | Exceptions, unhandled rejections, API errors | Error rate, error frequency, affected users | Sentry, Bugsnag, Rollbar |
| Performance (APM) | Response times, throughput, Core Web Vitals | p50/p95/p99 latency, TTFB, LCP, CLS | Vercel Analytics, Datadog, New Relic |
| Logs | Application logs, access logs, build logs | Log volume, error log ratio, latency correlation | Axiom, Datadog Logs, Logtail |
| Uptime & Synthetics | Availability, SSL, DNS, critical user flows | Uptime %, response time from global PoPs | BetterUptime, Checkly, Pingdom |
According to Datadog's 2024 State of Serverless report, 68% of organizations running serverless or edge-deployed applications (like Next.js on Vercel) lack adequate observability. The New Relic 2024 Observability Forecast found that organizations with mature monitoring practices resolve incidents 69% faster.
Pillar 1: Error Tracking with Sentry
Sentry is the gold standard for error tracking in Next.js applications. It has first-class Next.js support with the @sentry/nextjs SDK, which automatically instruments both client-side and server-side errors.
Installation and Configuration
# Install the Sentry Next.js SDK
npm install @sentry/nextjs
# Run the setup wizard (recommended for first-time setup)
npx @sentry/wizard@latest -i nextjs
The wizard creates three configuration files:
// sentry.client.config.ts
import * as Sentry from '@sentry/nextjs';
Sentry.init({
dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
environment: process.env.NODE_ENV,
// Performance monitoring
tracesSampleRate: process.env.NODE_ENV === 'production' ? 0.1 : 1.0,
// Session replay for debugging user-reported issues
replaysSessionSampleRate: 0.01, // 1% of sessions
replaysOnErrorSampleRate: 0.5, // 50% of sessions with errors
integrations: [
Sentry.replayIntegration({
maskAllText: false,
blockAllMedia: false,
}),
],
// Filter out noise
ignoreErrors: [
'ResizeObserver loop limit exceeded',
'ResizeObserver loop completed with undelivered notifications',
'Non-Error promise rejection captured',
/Loading chunk \d+ failed/,
/ChunkLoadError/,
],
beforeSend(event) {
// Don't send errors from bots/crawlers
// The client config runs in the browser, so read the user agent directly
const userAgent = typeof navigator !== 'undefined' ? navigator.userAgent : '';
if (/bot|crawler|spider|googlebot|bingbot/i.test(userAgent)) {
return null;
}
return event;
},
});
// sentry.server.config.ts
import * as Sentry from '@sentry/nextjs';
Sentry.init({
dsn: process.env.SENTRY_DSN,
environment: process.env.NODE_ENV,
tracesSampleRate: process.env.NODE_ENV === 'production' ? 0.2 : 1.0,
// Capture unhandled promise rejections
integrations: [
Sentry.prismaIntegration(), // If using Prisma
],
beforeSend(event) {
// Redact sensitive data
if (event.request?.data) {
const data = event.request.data as Record<string, unknown>;
if (data.password) data.password = '[REDACTED]';
if (data.token) data.token = '[REDACTED]';
if (data.creditCard) data.creditCard = '[REDACTED]';
}
return event;
},
});
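The server-side `beforeSend` above redacts only top-level keys, so a nested payload such as `{ user: { password } }` would pass through untouched. A minimal sketch of a recursive variant follows. `redactDeep` and `SENSITIVE_KEYS` are hypothetical names, not part of the Sentry SDK, and the key list is an assumption to adapt. It expects plain JSON-like data.

```typescript
// Hypothetical helper: recursively masks sensitive keys in a JSON-like value.
// The key list is an assumption; extend it to match your own payloads.
const SENSITIVE_KEYS = new Set(['password', 'token', 'creditcard', 'secret']);

function redactDeep(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map(redactDeep);
  }
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      // Compare case-insensitively so `creditCard` and `CreditCard` both match
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? '[REDACTED]' : redactDeep(inner);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}
```

Inside `beforeSend`, this could replace the three `if` checks with `event.request.data = redactDeep(event.request.data)`.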
// sentry.edge.config.ts
import * as Sentry from '@sentry/nextjs';
Sentry.init({
dsn: process.env.SENTRY_DSN,
tracesSampleRate: 0.1,
});
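To my understanding, recent versions of `@sentry/nextjs` (8.x and later) load the server and edge configs through the Next.js `instrumentation.ts` hook rather than picking them up automatically. The wizard normally generates this file. If it is missing, a sketch along these lines should work, assuming the config files sit next to it in the project root:

```typescript
// instrumentation.ts (project root, or src/ if the app uses a src directory)
export async function register() {
  // Next.js sets NEXT_RUNTIME to 'nodejs' or 'edge' depending on where the code runs
  if (process.env.NEXT_RUNTIME === 'nodejs') {
    await import('./sentry.server.config');
  }
  if (process.env.NEXT_RUNTIME === 'edge') {
    await import('./sentry.edge.config');
  }
}
```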
Next.js-Specific Error Boundaries
// app/global-error.tsx — catches errors in the root layout
'use client';
import * as Sentry from '@sentry/nextjs';
import { useEffect } from 'react';
export default function GlobalError({
error,
reset,
}: {
error: Error & { digest?: string };
reset: () => void;
}) {
useEffect(() => {
Sentry.captureException(error);
}, [error]);
return (
<html>
<body>
<div style={{ padding: '2rem', textAlign: 'center' }}>
<h2>Something went wrong</h2>
<p>We've been notified and are working on a fix.</p>
<button onClick={reset}>Try again</button>
</div>
</body>
</html>
);
}
// app/error.tsx — catches errors in page components
'use client';
import * as Sentry from '@sentry/nextjs';
import { useEffect } from 'react';
export default function Error({
error,
reset,
}: {
error: Error & { digest?: string };
reset: () => void;
}) {
useEffect(() => {
Sentry.captureException(error, {
tags: { boundary: 'page-error' },
});
}, [error]);
return (
<div className="error-container">
<h2>This page encountered an error</h2>
<p>Error ID: {error.digest}</p>
<button onClick={reset}>Reload</button>
</div>
);
}
Custom Error Context for API Routes
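// lib/api-error-handler.ts
import * as Sentry from '@sentry/nextjs';
import { NextRequest, NextResponse } from 'next/server';

// Assumed application error types; replace these with your own definitions
export class ValidationError extends Error {
  constructor(message: string, public fields: Record<string, string> = {}) {
    super(message);
    this.name = 'ValidationError';
  }
}
export class AuthorizationError extends Error {
  constructor(message = 'Unauthorized') {
    super(message);
    this.name = 'AuthorizationError';
  }
}

// Never attach credentials to error reports
const SENSITIVE_HEADERS = new Set(['authorization', 'cookie']);

export function withErrorHandling(
  handler: (req: NextRequest) => Promise<NextResponse>
) {
  return async (req: NextRequest): Promise<NextResponse> => {
    try {
      return await handler(req);
    } catch (error) {
      Sentry.captureException(error, {
        tags: {
          route: req.nextUrl.pathname,
          method: req.method,
        },
        extra: {
          url: req.url,
          // Header names arrive lowercased, so a plain set lookup is enough
          headers: Object.fromEntries(
            [...req.headers.entries()].filter(([name]) => !SENSITIVE_HEADERS.has(name))
          ),
          searchParams: Object.fromEntries(req.nextUrl.searchParams.entries()),
        },
      });
      if (error instanceof ValidationError) {
        return NextResponse.json(
          { error: error.message, fields: error.fields },
          { status: 400 }
        );
      }
      if (error instanceof AuthorizationError) {
        return NextResponse.json(
          { error: 'Unauthorized' },
          { status: 401 }
        );
      }
      return NextResponse.json(
        { error: 'Internal server error' },
        { status: 500 }
      );
    }
  };
}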
// Usage in an API route
// app/api/jobs/route.ts
export const GET = withErrorHandling(async (req) => {
const jobs = await prisma.job.findMany({
take: 20,
orderBy: { createdAt: 'desc' },
});
return NextResponse.json(jobs);
});
Pillar 2: Performance Monitoring and Core Web Vitals
Next.js has built-in support for Core Web Vitals reporting. Here's how to capture and act on these metrics.
Web Vitals Reporting
// app/layout.tsx — report Web Vitals
import { SpeedInsights } from '@vercel/speed-insights/next';
import { Analytics } from '@vercel/analytics/react';
export default function RootLayout({ children }: { children: React.ReactNode }) {
return (
<html>
<body>
{children}
<SpeedInsights />
<Analytics />
</body>
</html>
);
}
// For custom Web Vitals collection (non-Vercel deployments)
// components/web-vitals.tsx
'use client';
import { useReportWebVitals } from 'next/web-vitals';
export function WebVitals() {
useReportWebVitals((metric) => {
const body = {
name: metric.name,
value: metric.value,
rating: metric.rating, // 'good' | 'needs-improvement' | 'poor'
delta: metric.delta,
id: metric.id,
navigationType: metric.navigationType,
url: window.location.pathname,
};
// Send to your analytics endpoint
if (navigator.sendBeacon) {
navigator.sendBeacon('/api/vitals', JSON.stringify(body));
} else {
fetch('/api/vitals', {
method: 'POST',
body: JSON.stringify(body),
keepalive: true,
});
}
});
return null;
}
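The component above posts to `/api/vitals`, which is not shown. One detail worth knowing: `navigator.sendBeacon` with a string body sends it as `text/plain`, so the receiving route has to read the raw text and parse it itself. Below is a sketch of the validation step only. `parseVitalsPayload` and the `VitalsPayload` shape are hypothetical names that mirror a subset of the fields sent above.

```typescript
// Hypothetical validation for the body received by /api/vitals.
type Rating = 'good' | 'needs-improvement' | 'poor';

interface VitalsPayload {
  name: string;
  value: number;
  rating: Rating;
  url: string;
}

const RATINGS: readonly string[] = ['good', 'needs-improvement', 'poor'];

function parseVitalsPayload(raw: string): VitalsPayload | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // body was not JSON
  }
  if (data === null || typeof data !== 'object') return null;
  const d = data as Record<string, unknown>;
  if (typeof d.name !== 'string') return null;
  if (typeof d.value !== 'number' || !Number.isFinite(d.value)) return null;
  if (typeof d.rating !== 'string' || !RATINGS.includes(d.rating)) return null;
  if (typeof d.url !== 'string') return null;
  return { name: d.name, value: d.value, rating: d.rating as Rating, url: d.url };
}
```

In a route handler this would be called as `parseVitalsPayload(await req.text())`, returning a 400 response when the result is `null`.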
Core Web Vitals Thresholds
| Metric | Good | Needs Improvement | Poor | What It Measures |
|---|---|---|---|---|
| LCP | ≤ 2.5s | 2.5-4.0s | > 4.0s | Loading: when main content is visible |
| INP | ≤ 200ms | 200-500ms | > 500ms | Interactivity: response to user input |
| CLS | ≤ 0.1 | 0.1-0.25 | > 0.25 | Visual stability: layout shift |
| TTFB | ≤ 800ms | 800ms-1.8s | > 1.8s | Server response time (supporting metric, not a Core Web Vital) |
| FCP | ≤ 1.8s | 1.8-3.0s | > 3.0s | First paint: when any content appears (supporting metric, not a Core Web Vital) |
According to Google's research, sites meeting Core Web Vitals thresholds see 24% fewer abandoned page loads. For a job board like BirJob, that translates directly to more applications submitted.
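The thresholds in the table can be turned into a small classifier, which is handy for checking stored values server-side or in tests. This is a sketch: `rateMetric` and `THRESHOLDS` are hypothetical names, and values are in milliseconds except CLS, which is unitless.

```typescript
type Rating = 'good' | 'needs-improvement' | 'poor';
type MetricName = 'LCP' | 'INP' | 'CLS' | 'TTFB' | 'FCP';

// [upper bound for "good", upper bound for "needs improvement"], taken from the table above
const THRESHOLDS: Record<MetricName, [number, number]> = {
  LCP: [2500, 4000],
  INP: [200, 500],
  CLS: [0.1, 0.25],
  TTFB: [800, 1800],
  FCP: [1800, 3000],
};

function rateMetric(name: MetricName, value: number): Rating {
  const [good, needsImprovement] = THRESHOLDS[name];
  if (value <= good) return 'good';
  if (value <= needsImprovement) return 'needs-improvement';
  return 'poor';
}
```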
Server-Side Performance: Tracking API Route Latency
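// middleware.ts: request IDs and middleware timing
import { NextRequest, NextResponse } from 'next/server';
export function middleware(req: NextRequest) {
  const start = Date.now();
  // Reuse an upstream ID if present so logs correlate across services
  const requestId = req.headers.get('x-request-id') ?? crypto.randomUUID();
  // Forward the ID on the request so route handlers can read it
  const requestHeaders = new Headers(req.headers);
  requestHeaders.set('x-request-id', requestId);
  const response = NextResponse.next({ request: { headers: requestHeaders } });
  // NextResponse.next() returns before the route runs, so this duration
  // covers the middleware only, not the full request
  response.headers.set('Server-Timing', `middleware;dur=${Date.now() - start}`);
  response.headers.set('X-Request-Id', requestId);
  return response;
}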
// lib/performance.ts: function-level instrumentation
import * as Sentry from '@sentry/nextjs';

export function measureAsync<T>(
  name: string,
  fn: () => Promise<T>,
  tags?: Record<string, string>
): Promise<T> {
  // Span attributes go under `attributes`; spreading them into the options drops them
  return Sentry.startSpan({ name, op: 'function', attributes: tags }, async () => {
    const start = performance.now();
    try {
      return await fn();
    } finally {
      // Runs on success and on failure, so slow failures are logged too
      const duration = performance.now() - start;
      if (duration > 1000) {
        console.warn(`Slow operation: ${name} took ${duration.toFixed(0)}ms`);
      }
    }
  });
}
// Usage
const jobs = await measureAsync('prisma.job.findMany', () =>
prisma.job.findMany({
where: { active: true },
take: 50,
orderBy: { createdAt: 'desc' },
}),
{ 'db.system': 'postgresql' }
);
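The p50/p95/p99 figures from the pillars table are percentiles over recorded durations. APM tools compute them for you, but when checking numbers by hand the nearest-rank method is enough. A sketch, with `percentile` as a hypothetical helper and made-up sample durations:

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  if (p <= 0 || p > 100) throw new Error('p must be in (0, 100]');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// Illustrative durations in milliseconds (not real measurements)
const durationsMs = [120, 95, 310, 88, 1500, 140, 101, 99, 230, 105];
const p50 = percentile(durationsMs, 50); // 105
const p95 = percentile(durationsMs, 95); // 1500
```

With only ten samples a single outlier decides p95, which is one reason the alert rules later in this guide evaluate percentiles over a time window rather than per request.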
Pillar 3: Structured Logging
Next.js's default console.log is insufficient for production. You need structured, searchable, correlated logs.
Setting Up Pino for Next.js
// lib/logger.ts
import pino from 'pino';
const isProduction = process.env.NODE_ENV === 'production';
export const logger = pino({
level: process.env.LOG_LEVEL || (isProduction ? 'info' : 'debug'),
// Human-readable in dev, JSON in production
transport: isProduction
? undefined // JSON output for log aggregators
: { target: 'pino-pretty', options: { colorize: true } },
// Add default fields to every log line
base: {
env: process.env.NODE_ENV,
service: 'birjob-web',
version: process.env.NEXT_PUBLIC_APP_VERSION || 'unknown',
},
// Redact sensitive fields automatically
redact: {
paths: ['req.headers.authorization', 'req.headers.cookie', 'password', 'token', 'secret'],
censor: '[REDACTED]',
},
// Custom serializers
serializers: {
err: pino.stdSerializers.err,
req: (req) => ({
method: req.method,
url: req.url,
userAgent: req.headers?.['user-agent'],
ip: req.headers?.['x-forwarded-for'] || req.socket?.remoteAddress,
}),
},
});
// Create child loggers for different modules
export const scraperLogger = logger.child({ module: 'scraper' });
export const apiLogger = logger.child({ module: 'api' });
export const authLogger = logger.child({ module: 'auth' });
export const paymentLogger = logger.child({ module: 'payment' });
Correlating Logs with Request IDs
// lib/request-context.ts
import { AsyncLocalStorage } from 'node:async_hooks';
import { NextRequest, NextResponse } from 'next/server';
import { logger } from './logger';
interface RequestContext {
requestId: string;
userId?: string;
startTime: number;
route: string;
}
export const requestContext = new AsyncLocalStorage<RequestContext>();
export function getRequestLogger() {
const ctx = requestContext.getStore();
if (!ctx) return logger;
return logger.child({
requestId: ctx.requestId,
userId: ctx.userId,
route: ctx.route,
});
}
// Usage in API routes
export async function withRequestContext(
req: NextRequest,
handler: () => Promise<NextResponse>
): Promise<NextResponse> {
const context: RequestContext = {
requestId: req.headers.get('x-request-id') || crypto.randomUUID(),
startTime: Date.now(),
route: req.nextUrl.pathname,
};
return requestContext.run(context, async () => {
const log = getRequestLogger();
log.info({ method: req.method }, 'Request started');
try {
const response = await handler();
log.info({ status: response.status, duration: Date.now() - context.startTime }, 'Request completed');
response.headers.set('X-Request-Id', context.requestId);
return response;
} catch (error) {
// Log under the `err` key so the pino error serializer configured in lib/logger.ts applies
log.error({ err: error, duration: Date.now() - context.startTime }, 'Request failed');
throw error;
}
});
}
Log Aggregation: Axiom Integration
// For Vercel deployments, Axiom is the best option
// next.config.js
const { withAxiom } = require('next-axiom');
module.exports = withAxiom({
// your Next.js config
});
// Usage with next-axiom
import { NextRequest } from 'next/server';
import { log } from 'next-axiom';
export async function GET(req: NextRequest) {
  const start = Date.now();
  log.info('Job search', {
    query: req.nextUrl.searchParams.get('q'),
    page: req.nextUrl.searchParams.get('page'),
  });
  // ... handle request (assumed to produce `jobs` and return a response)
  log.info('Job search completed', {
    resultCount: jobs.length,
    duration: Date.now() - start,
  });
}
Pillar 4: Uptime and Synthetic Monitoring
Uptime monitoring tells you when your application is down before your users tell you.
Health Check Endpoint
// app/api/health/route.ts
import { NextResponse } from 'next/server';
import { prisma } from '@/lib/prisma';
interface HealthCheckResult {
status: 'healthy' | 'degraded' | 'unhealthy';
timestamp: string;
version: string;
checks: {
database: { status: string; latencyMs: number };
memory: { usedMB: number; totalMB: number; percentage: number };
uptime: number;
};
}
export async function GET(): Promise<NextResponse<HealthCheckResult>> {
const checks: HealthCheckResult['checks'] = {
database: { status: 'unknown', latencyMs: 0 },
memory: { usedMB: 0, totalMB: 0, percentage: 0 },
uptime: process.uptime(),
};
// Check database
const dbStart = Date.now();
try {
await prisma.$queryRaw`SELECT 1`;
checks.database = { status: 'connected', latencyMs: Date.now() - dbStart };
} catch {
checks.database = { status: 'disconnected', latencyMs: Date.now() - dbStart };
}
// Check memory
const mem = process.memoryUsage();
checks.memory = {
usedMB: Math.round(mem.heapUsed / 1024 / 1024),
totalMB: Math.round(mem.heapTotal / 1024 / 1024),
percentage: Math.round((mem.heapUsed / mem.heapTotal) * 100),
};
const isHealthy = checks.database.status === 'connected' && checks.memory.percentage < 90;
const isDegraded = checks.database.latencyMs > 500 || checks.memory.percentage > 75;
const result: HealthCheckResult = {
status: !isHealthy ? 'unhealthy' : isDegraded ? 'degraded' : 'healthy',
timestamp: new Date().toISOString(),
version: process.env.NEXT_PUBLIC_APP_VERSION || 'unknown',
checks,
};
return NextResponse.json(result, {
status: isHealthy ? 200 : 503,
headers: { 'Cache-Control': 'no-store' },
});
}
Synthetic Monitoring with Checkly
// __checks__/homepage.check.ts (Checkly Monitoring-as-Code)
import { ApiCheck, AssertionBuilder } from 'checkly/constructs';
new ApiCheck('homepage-availability', {
name: 'Homepage loads successfully',
activated: true,
frequency: 5, // Check every 5 minutes
locations: ['eu-west-1', 'us-east-1', 'ap-southeast-1'],
request: {
method: 'GET',
url: 'https://www.birjob.com',
assertions: [
AssertionBuilder.statusCode().equals(200),
AssertionBuilder.responseTime().lessThan(3000),
AssertionBuilder.textBody().contains('BirJob'),
],
},
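// slackChannel and emailChannel are alert channel constructs assumed to be defined elsewhere in the project
alertChannels: [slackChannel, emailChannel],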
});
// Browser check for critical user flow
import { BrowserCheck } from 'checkly/constructs';
new BrowserCheck('job-search-flow', {
name: 'Job search returns results',
activated: true,
frequency: 15, // Every 15 minutes
locations: ['eu-west-1'],
code: {
content: `
const { expect, test } = require('@playwright/test');
test('search for jobs', async ({ page }) => {
await page.goto('https://www.birjob.com');
await page.fill('[data-testid="search-input"]', 'developer');
await page.click('[data-testid="search-button"]');
await page.waitForSelector('[data-testid="job-card"]');
const results = await page.locator('[data-testid="job-card"]').count();
expect(results).toBeGreaterThan(0);
});
`,
},
});
The Complete Monitoring Stack: Cost Comparison
| Tool | Category | Free Tier | Paid Starting At | Best For |
|---|---|---|---|---|
| Sentry | Error Tracking | 5K errors/month | $26/mo | Next.js error tracking (best SDK) |
| Vercel Analytics | Web Vitals | 2.5K events/mo | $10/mo (Pro plan) | Vercel-hosted sites |
| Axiom | Logs | 500MB ingest/mo | $25/mo | Vercel log drain |
| BetterUptime | Uptime | 10 monitors | $20/mo | Simple uptime + status page |
| Checkly | Synthetics | 5 checks | $30/mo | Playwright-based E2E monitoring |
| Datadog | Full Stack | 14-day trial | $15/host/mo | Enterprise all-in-one |
| Grafana Cloud | Dashboards | Generous free tier | $29/mo | Custom dashboards + alerting |
My recommended stack for small-to-medium Next.js apps (under $100/month total):
- Sentry Developer plan ($0-26/mo) for error tracking
- Vercel Analytics (included in Pro plan) for Web Vitals
- Axiom free tier for logs
- BetterUptime free tier for uptime monitoring
Alerting: Don't Wake Up for Everything
The biggest mistake teams make is over-alerting. Alert fatigue is real — according to PagerDuty's 2024 report, 44% of on-call engineers receive alerts that don't require action, leading to desensitization.
Alert Severity Framework
| Severity | Criteria | Notification Channel | Response Time |
|---|---|---|---|
| P0 - Critical | Site down, data loss, security breach | Phone call + SMS + Slack | 15 minutes |
| P1 - High | Major feature broken, error rate >5% | Slack + Email | 1 hour |
| P2 - Medium | Degraded performance, non-critical errors | Slack | 4 hours |
| P3 - Low | Warning threshold, minor issues | Daily digest email | Next business day |
// Example: Sentry alert rules configuration
// In Sentry dashboard: Settings > Alerts
// P0: Site-wide error spike
// Trigger: Error count > 100 in 5 minutes
// Action: PagerDuty page + Slack #incidents
// P1: High error rate on specific routes
// Trigger: Error rate > 5% on /api/* routes in 10 minutes
// Action: Slack #alerts
// P2: Performance degradation
// Trigger: p95 response time > 3s for 15 minutes
// Action: Slack #monitoring
// P3: New unhandled errors
// Trigger: New issue (first seen)
// Action: Slack #errors-feed
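The severity table and the rules above boil down to a mapping from a few signals to a severity and a set of channels. Here is a sketch of that mapping as code, which can help when routing alerts from a custom webhook. `classifyAlert`, `AlertSignal`, and the channel strings are hypothetical, and the thresholds are the ones from the rules above.

```typescript
type Severity = 'P0' | 'P1' | 'P2' | 'P3';

interface AlertSignal {
  siteDown: boolean;
  errorRate: number;    // fraction of failed requests, 0..1
  p95LatencyMs: number; // p95 response time over the evaluation window
}

// Notification channels per severity, following the severity table
const CHANNELS: Record<Severity, string[]> = {
  P0: ['phone', 'sms', 'slack'],
  P1: ['slack', 'email'],
  P2: ['slack'],
  P3: ['daily-digest'],
};

function classifyAlert(signal: AlertSignal): { severity: Severity; channels: string[] } {
  let severity: Severity = 'P3';
  if (signal.siteDown) severity = 'P0';
  else if (signal.errorRate > 0.05) severity = 'P1';
  else if (signal.p95LatencyMs > 3000) severity = 'P2';
  return { severity, channels: CHANNELS[severity] };
}
```

Checking the most severe condition first means an outage with a high error rate pages once as P0 instead of also firing a P1.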
Opinionated: What Most Guides Get Wrong
1. You don't need Datadog. For 90% of Next.js applications, the Sentry + Vercel Analytics + Axiom + BetterUptime stack covers everything at 1/10th the cost. Datadog is extraordinary, but its per-host pricing model makes it overkill for teams under 20 engineers.
2. Don't monitor everything equally. Your job search endpoint matters more than your about page. Your payment flow matters more than your blog. Allocate monitoring budget (both financial and attention) proportionally to business impact.
3. Sampling is not optional. If you're sending 100% of traces to Sentry, you're paying too much and getting too much noise. Sample 10-20% in production. You'll still catch patterns. For errors, always capture 100%.
4. Logs without structure are noise. If your logs look like console.log('user did thing'), they're worthless in production. Every log line needs a severity, a timestamp, a request ID, and structured metadata. Use Pino, not console.log.
5. Status pages build trust. A public status page (via BetterUptime, Statuspage, or even a simple page on your site) transforms outages from "is this site broken?" into "they know, they're working on it." It costs nothing and saves support tickets.
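Points 2 and 3 combine naturally: sample traces at a higher rate on routes that matter to the business and at a lower rate elsewhere. Sentry's `tracesSampler` option accepts a function for this. The sketch below keeps the decision in a pure function so it can be tested on its own. The route prefixes and rates are assumptions for illustration, and `sampleRateFor` is a hypothetical name.

```typescript
// Hypothetical per-route trace sampling rates; the first matching prefix wins.
const ROUTE_RATES: Array<[string, number]> = [
  ['/api/health', 0],     // uptime pings add volume but little insight
  ['/api/payments', 0.5], // assumed high-value flow
  ['/api/jobs', 0.2],
];
const DEFAULT_RATE = 0.1;

function sampleRateFor(pathname: string, isProduction: boolean): number {
  if (!isProduction) return 1.0; // trace everything outside production
  for (const [prefix, rate] of ROUTE_RATES) {
    if (pathname.startsWith(prefix)) return rate;
  }
  return DEFAULT_RATE;
}
```

The function would be called from `tracesSampler` in the Sentry config. How to extract the pathname from the sampling context depends on the SDK version, so check that part against the version in use. Error events are not affected by trace sampling.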
Action Plan: From Zero to Full Observability in One Sprint
Day 1: Error Tracking
- Install @sentry/nextjs
- Configure client, server, and edge configs
- Add error boundaries (error.tsx, global-error.tsx)
- Set up Slack integration for new errors
Day 2: Performance
- Add Vercel Analytics / Speed Insights
- Or set up custom Web Vitals reporting endpoint
- Add server-side timing headers
- Benchmark current Core Web Vitals scores
Day 3: Logging
- Install Pino, configure structured logging
- Replace all console.log calls with logger calls
- Set up Axiom or your log aggregator
- Add request ID correlation
Day 4: Uptime & Alerts
- Create /api/health endpoint
- Set up BetterUptime with 3-5 monitors
- Configure alert severity levels
- Create a public status page
Day 5: Dashboard & Review
- Create a single "overview" dashboard with key metrics
- Document runbooks for common alert scenarios
- Schedule weekly monitoring review
- Verify all alerts fire correctly (trigger a test error)
Conclusion: Monitoring Is Not Optional
After instrumenting our Next.js application, we caught three critical bugs within the first week that had been silently affecting users for months. A database connection leak that caused 500 errors during peak hours. A memory leak in a Server Component that grew 10MB/hour. A third-party API timeout that cascaded into our job search being empty for users in certain regions.
Without monitoring, these bugs would have continued eroding user trust. With monitoring, we fixed them in hours. The total cost of our monitoring stack: $36/month. The cost of the bugs it catches: incalculable.
Sources
- Datadog State of Serverless 2024
- New Relic Observability Forecast 2024
- Google Web Vitals
- Google — Web Vitals Business Impact
- PagerDuty State of Digital Operations 2024
- Sentry Next.js SDK Documentation
- Next.js OpenTelemetry Documentation
- Axiom Vercel Integration
I'm Ismat, and I build BirJob — Azerbaijan's job aggregator scraping 80+ sources daily.
