Monitoring Production Next.js Apps: The Complete Stack

Last year, we shipped a Next.js app to production with zero monitoring. No error tracking, no performance metrics, no log aggregation. It was a job aggregator processing 50,000+ scraping operations daily, serving 200,000+ page views monthly. When things broke — and they broke often — we found out from users. Sometimes days later. The fix-it-when-it-breaks approach cost us an estimated 15% of daily active users before we instrumented everything.

This guide is the monitoring stack I wish existed when I started. It covers every layer — from browser-side Real User Monitoring to server-side APM, from log aggregation to synthetic monitoring — with specific Next.js configuration, real code, and honest cost analysis.

The Four Pillars of Next.js Monitoring

Before diving into tools, let's establish a framework. Production monitoring for Next.js applications sits on four pillars:

Pillar	What It Tracks	Key Metrics	Primary Tools
Error Tracking	Exceptions, unhandled rejections, API errors	Error rate, error frequency, affected users	Sentry, Bugsnag, Rollbar
Performance (APM)	Response times, throughput, Core Web Vitals	p50/p95/p99 latency, TTFB, LCP, CLS	Vercel Analytics, Datadog, New Relic
Logs	Application logs, access logs, build logs	Log volume, error log ratio, latency correlation	Axiom, Datadog Logs, Logtail
Uptime & Synthetics	Availability, SSL, DNS, critical user flows	Uptime %, response time from global PoPs	BetterUptime, Checkly, Pingdom

According to Datadog's 2024 State of Serverless report, 68% of organizations running serverless or edge-deployed applications (like Next.js on Vercel) lack adequate observability. The New Relic 2024 Observability Forecast found that organizations with mature monitoring practices resolve incidents 69% faster.
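The p50/p95/p99 latency figures listed under Performance are percentiles, and it helps to see why they are preferred over an average. What follows is a minimal sketch using the nearest-rank method; `percentile` and the sample data are hypothetical and not part of any monitoring SDK. Real APM tools typically compute percentiles over histograms or sketches rather than raw arrays.

```typescript
// Nearest-rank percentile over a list of latency samples (milliseconds).
// `percentile` is a hypothetical helper for illustration only.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error('percentile() needs at least one sample');
  }
  if (p <= 0 || p > 100) {
    throw new RangeError('p must be in the range (0, 100]');
  }
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest-rank: the smallest value with at least p% of samples at or below it
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

// Nine fast requests and one very slow one (made-up numbers)
const latenciesMs = [120, 95, 110, 2400, 130, 105, 98, 101, 115, 125];

const p50 = percentile(latenciesMs, 50); // the typical request
const p95 = percentile(latenciesMs, 95); // the slow tail
console.log({ p50, p95 });
```

The median stays near the typical request while p95 surfaces the single slow outlier, which is the behaviour you want an alert to key on.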

Pillar 1: Error Tracking with Sentry

Sentry is my default choice for error tracking in Next.js applications. Its @sentry/nextjs SDK has first-class Next.js support and automatically instruments both client-side and server-side errors.

Installation and Configuration

# Install the Sentry Next.js SDK
npm install @sentry/nextjs

# Run the setup wizard (recommended for first-time setup)
npx @sentry/wizard@latest -i nextjs

The wizard creates three configuration files:

// sentry.client.config.ts
import * as Sentry from '@sentry/nextjs';

Sentry.init({
  dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
  environment: process.env.NODE_ENV,

  // Performance monitoring
  tracesSampleRate: process.env.NODE_ENV === 'production' ? 0.1 : 1.0,

  // Session replay for debugging user-reported issues
  replaysSessionSampleRate: 0.01, // 1% of sessions
  replaysOnErrorSampleRate: 0.5,  // 50% of sessions with errors
  integrations: [
    Sentry.replayIntegration({
      maskAllText: false,
      blockAllMedia: false,
    }),
  ],

  // Filter out noise
  ignoreErrors: [
    'ResizeObserver loop limit exceeded',
    'ResizeObserver loop completed with undelivered notifications',
    'Non-Error promise rejection captured',
    /Loading chunk \d+ failed/,
    /ChunkLoadError/,
  ],

  beforeSend(event) {
    // Don't send errors from bots/crawlers
    const userAgent = event.request?.headers?.['user-agent'] || '';
    if (/bot|crawler|spider|googlebot|bingbot/i.test(userAgent)) {
      return null;
    }
    return event;
  },
});

// sentry.server.config.ts
import * as Sentry from '@sentry/nextjs';

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  environment: process.env.NODE_ENV,
  tracesSampleRate: process.env.NODE_ENV === 'production' ? 0.2 : 1.0,

  // Trace database queries (include this only if you use Prisma)
  integrations: [
    Sentry.prismaIntegration(),
  ],

  beforeSend(event) {
    // Redact sensitive data
    if (event.request?.data) {
      const data = event.request.data as Record<string, unknown>;
      if (data.password) data.password = '[REDACTED]';
      if (data.token) data.token = '[REDACTED]';
      if (data.creditCard) data.creditCard = '[REDACTED]';
    }
    return event;
  },
});

// sentry.edge.config.ts
import * as Sentry from '@sentry/nextjs';

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  tracesSampleRate: 0.1,
});
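The bot filter inside the client config's `beforeSend` is easier to reason about when pulled out into a small function that can be tested on its own. This is a sketch only: `isBotUserAgent` is a hypothetical helper, and the pattern is a trimmed version of the one in the config (the `googlebot` and `bingbot` alternatives are already covered by `bot`).

```typescript
// Hypothetical helper extracted from the beforeSend filter.
// No `g` flag on the regex, so test() keeps no state between calls.
const BOT_PATTERN = /bot|crawler|spider/i;

function isBotUserAgent(userAgent: string | undefined): boolean {
  return userAgent !== undefined && BOT_PATTERN.test(userAgent);
}

console.log(isBotUserAgent('Mozilla/5.0 (compatible; Googlebot/2.1)'));
console.log(isBotUserAgent('Mozilla/5.0 (Windows NT 10.0) Chrome/120.0'));
```

A substring match like this is deliberately crude: it will also drop events from any legitimate client whose user agent happens to contain "bot", so it is worth reviewing what gets filtered before relying on it.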

Next.js-Specific Error Boundaries

// app/global-error.tsx — catches errors in the root layout
'use client';

import * as Sentry from '@sentry/nextjs';
import { useEffect } from 'react';

export default function GlobalError({
  error,
  reset,
}: {
  error: Error & { digest?: string };
  reset: () => void;
}) {
  useEffect(() => {
    Sentry.captureException(error);
  }, [error]);

  return (
    <html>
      <body>
        <div style={{ padding: '2rem', textAlign: 'center' }}>
          <h2>Something went wrong</h2>
          <p>We've been notified and are working on a fix.</p>
          <button onClick={reset}>Try again</button>
        </div>
      </body>
    </html>
  );
}

// app/error.tsx — catches errors in page components
'use client';

import * as Sentry from '@sentry/nextjs';
import { useEffect } from 'react';

export default function Error({
  error,
  reset,
}: {
  error: Error & { digest?: string };
  reset: () => void;
}) {
  useEffect(() => {
    Sentry.captureException(error, {
      tags: { boundary: 'page-error' },
    });
  }, [error]);

  return (
    <div className="error-container">
      <h2>This page encountered an error</h2>
      <p>Error ID: {error.digest}</p>
      <button onClick={reset}>Reload</button>
    </div>
  );
}

Custom Error Context for API Routes

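// lib/api-error-handler.ts
import * as Sentry from '@sentry/nextjs';
import { NextRequest, NextResponse } from 'next/server';
// App-specific error classes; the module path here is an assumption
import { ValidationError, AuthorizationError } from '@/lib/errors';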

export function withErrorHandling(
  handler: (req: NextRequest) => Promise<NextResponse>
) {
  return async (req: NextRequest): Promise<NextResponse> => {
    try {
      return await handler(req);
    } catch (error) {
      Sentry.captureException(error, {
        tags: {
          route: req.nextUrl.pathname,
          method: req.method,
        },
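        extra: {
          url: req.url,
          // Strip credentials before attaching headers to the event
          headers: Object.fromEntries(
            [...req.headers.entries()].filter(
              ([name]) => !['authorization', 'cookie'].includes(name.toLowerCase())
            )
          ),
          searchParams: Object.fromEntries(req.nextUrl.searchParams.entries()),
        },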
      });

      if (error instanceof ValidationError) {
        return NextResponse.json(
          { error: error.message, fields: error.fields },
          { status: 400 }
        );
      }

      if (error instanceof AuthorizationError) {
        return NextResponse.json(
          { error: 'Unauthorized' },
          { status: 401 }
        );
      }

      return NextResponse.json(
        { error: 'Internal server error' },
        { status: 500 }
      );
    }
  };
}

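// Usage in an API route
// app/api/jobs/route.ts
import { NextResponse } from 'next/server';
import { prisma } from '@/lib/prisma';
import { withErrorHandling } from '@/lib/api-error-handler';
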
export const GET = withErrorHandling(async (req) => {
  const jobs = await prisma.job.findMany({
    take: 20,
    orderBy: { createdAt: 'desc' },
  });
  return NextResponse.json(jobs);
});
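The error handler attaches request headers to each Sentry event, and headers often carry credentials. One way to keep that safe is to pass them through a redaction step first. The sketch below is an illustration under assumptions: `redactHeaders` and the list of sensitive header names are hypothetical and should be adjusted to whatever your app actually sends.

```typescript
// Header names treated as sensitive (an assumed list, extend as needed)
const SENSITIVE_HEADERS = new Set(['authorization', 'cookie', 'set-cookie', 'x-api-key']);

// Hypothetical helper: copies headers into a plain object, masking secrets.
// Accepts any iterable of [name, value] pairs, such as Headers.entries().
function redactHeaders(headers: Iterable<[string, string]>): Record<string, string> {
  const safe: Record<string, string> = {};
  for (const [name, value] of headers) {
    const key = name.toLowerCase();
    safe[key] = SENSITIVE_HEADERS.has(key) ? '[REDACTED]' : value;
  }
  return safe;
}

const exampleHeaders = redactHeaders([
  ['Authorization', 'Bearer abc123'],
  ['User-Agent', 'Mozilla/5.0'],
]);
console.log(exampleHeaders);
```

Inside the handler this would be called as `redactHeaders(req.headers.entries())`. Masking the value rather than dropping the key keeps the fact that a credential was present visible during debugging.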

Pillar 2: Performance Monitoring and Core Web Vitals

Next.js has built-in support for Core Web Vitals reporting. Here's how to capture and act on these metrics.

Web Vitals Reporting

// app/layout.tsx — report Web Vitals
import { SpeedInsights } from '@vercel/speed-insights/next';
import { Analytics } from '@vercel/analytics/react';

export default function RootLayout({ children }: { children: React.ReactNode }) {
  return (
    <html>
      <body>
        {children}
        <SpeedInsights />
        <Analytics />
      </body>
    </html>
  );
}

// For custom Web Vitals collection (non-Vercel deployments)
// components/web-vitals.tsx
'use client';

import { useReportWebVitals } from 'next/web-vitals';

export function WebVitals() {
  useReportWebVitals((metric) => {
    const body = {
      name: metric.name,
      value: metric.value,
      rating: metric.rating, // 'good' | 'needs-improvement' | 'poor'
      delta: metric.delta,
      id: metric.id,
      navigationType: metric.navigationType,
      url: window.location.pathname,
    };

    // Send to your analytics endpoint
    if (navigator.sendBeacon) {
      navigator.sendBeacon('/api/vitals', JSON.stringify(body));
    } else {
      fetch('/api/vitals', {
        method: 'POST',
        body: JSON.stringify(body),
        keepalive: true,
      });
    }
  });

  return null;
}

Core Web Vitals Thresholds

Metric	Good	Needs Improvement	Poor	What It Measures
LCP	≤ 2.5s	2.5-4.0s	> 4.0s	Loading: when main content is visible
INP	≤ 200ms	200-500ms	> 500ms	Interactivity: response to user input
CLS	≤ 0.1	0.1-0.25	> 0.25	Visual stability: layout shift
TTFB	≤ 800ms	800ms-1.8s	> 1.8s	Server response time
FCP	≤ 1.8s	1.8-3.0s	> 3.0s	First paint: when any content appears
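The thresholds in the table can be applied in code, for example to bucket metrics arriving at a custom `/api/vitals` endpoint. The following is a sketch: `rateMetric` and `THRESHOLDS` are hypothetical names, and time-based metrics are assumed to be in milliseconds, which is how the `web-vitals` metric values are reported.

```typescript
type Rating = 'good' | 'needs-improvement' | 'poor';

// [upper bound for "good", upper bound for "needs improvement"]
// Time-based metrics are in milliseconds; CLS is unitless.
const THRESHOLDS: Record<string, [number, number]> = {
  LCP: [2500, 4000],
  INP: [200, 500],
  CLS: [0.1, 0.25],
  TTFB: [800, 1800],
  FCP: [1800, 3000],
};

// Hypothetical helper mapping a metric value onto the table's three bands
function rateMetric(name: string, value: number): Rating {
  const bounds = THRESHOLDS[name];
  if (!bounds) {
    throw new Error(`Unknown metric: ${name}`);
  }
  const [good, needsImprovement] = bounds;
  if (value <= good) return 'good';
  if (value <= needsImprovement) return 'needs-improvement';
  return 'poor';
}

console.log(rateMetric('LCP', 3200), rateMetric('CLS', 0.05));
```

Note that the `metric.rating` field already carries this classification on the client; a server-side version is mainly useful when you want to change the cut-offs or re-rate stored values.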

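Strictly speaking, only LCP, INP, and CLS are Core Web Vitals; TTFB and FCP are supporting diagnostics that help explain a slow LCP. According to Google's research, sites meeting Core Web Vitals thresholds see 24% fewer abandoned page loads. For a job board like BirJob, that translates directly to more applications submitted.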

Server-Side Performance: Tracking API Route Latency

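// middleware.ts: request ID propagation and middleware timing
import { NextRequest, NextResponse } from 'next/server';

export function middleware(req: NextRequest) {
  const start = Date.now();
  const requestId = req.headers.get('x-request-id') ?? crypto.randomUUID();

  // Forward the request ID to route handlers through the request headers
  const requestHeaders = new Headers(req.headers);
  requestHeaders.set('x-request-id', requestId);
  const response = NextResponse.next({ request: { headers: requestHeaders } });

  // Middleware returns before the route handler runs, so this duration covers
  // only the middleware itself, not the full request.
  response.headers.set('Server-Timing', `middleware;dur=${Date.now() - start}`);
  response.headers.set('X-Request-Id', requestId);

  return response;
}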

// lib/performance.ts: function-level instrumentation
import * as Sentry from '@sentry/nextjs';

export function measureAsync<T>(
  name: string,
  fn: () => Promise<T>,
  tags?: Record<string, string>
): Promise<T> {
  // Tags belong on the span as attributes, not as top-level span options
  return Sentry.startSpan({ name, op: 'function', attributes: tags }, async () => {
    const start = performance.now();
    try {
      return await fn();
    } finally {
      const duration = performance.now() - start;

      // Log slow operations, whether they succeeded or failed
      if (duration > 1000) {
        console.warn(`Slow operation: ${name} took ${duration.toFixed(0)}ms`);
      }
    }
  });
}

// Usage
const jobs = await measureAsync('prisma.job.findMany', () =>
  prisma.job.findMany({
    where: { active: true },
    take: 50,
    orderBy: { createdAt: 'desc' },
  }),
  { 'db.system': 'postgresql' }
);

Pillar 3: Structured Logging

Plain console.log output is insufficient for production. You need structured, searchable, correlated logs.

Setting Up Pino for Next.js

// lib/logger.ts
import pino from 'pino';

const isProduction = process.env.NODE_ENV === 'production';

export const logger = pino({
  level: process.env.LOG_LEVEL || (isProduction ? 'info' : 'debug'),

  // Human-readable in dev, JSON in production
  transport: isProduction
    ? undefined // JSON output for log aggregators
    : { target: 'pino-pretty', options: { colorize: true } },

  // Add default fields to every log line
  base: {
    env: process.env.NODE_ENV,
    service: 'birjob-web',
    version: process.env.NEXT_PUBLIC_APP_VERSION || 'unknown',
  },

  // Redact sensitive fields automatically
  redact: {
    paths: ['req.headers.authorization', 'req.headers.cookie', 'password', 'token', 'secret'],
    censor: '[REDACTED]',
  },

  // Custom serializers
  serializers: {
    err: pino.stdSerializers.err,
    req: (req) => ({
      method: req.method,
      url: req.url,
      userAgent: req.headers?.['user-agent'],
      ip: req.headers?.['x-forwarded-for'] || req.socket?.remoteAddress,
    }),
  },
});

// Create child loggers for different modules
export const scraperLogger = logger.child({ module: 'scraper' });
export const apiLogger = logger.child({ module: 'api' });
export const authLogger = logger.child({ module: 'auth' });
export const paymentLogger = logger.child({ module: 'payment' });

Correlating Logs with Request IDs

// lib/request-context.ts
import { AsyncLocalStorage } from 'node:async_hooks';
import type { NextRequest, NextResponse } from 'next/server';
import { logger } from './logger';

interface RequestContext {
  requestId: string;
  userId?: string;
  startTime: number;
  route: string;
}

export const requestContext = new AsyncLocalStorage<RequestContext>();

export function getRequestLogger() {
  const ctx = requestContext.getStore();
  if (!ctx) return logger;
  return logger.child({
    requestId: ctx.requestId,
    userId: ctx.userId,
    route: ctx.route,
  });
}

// Usage in API routes
export async function withRequestContext(
  req: NextRequest,
  handler: () => Promise<NextResponse>
): Promise<NextResponse> {
  const context: RequestContext = {
    requestId: req.headers.get('x-request-id') || crypto.randomUUID(),
    startTime: Date.now(),
    route: req.nextUrl.pathname,
  };

  return requestContext.run(context, async () => {
    const log = getRequestLogger();
    log.info({ method: req.method }, 'Request started');

    try {
      const response = await handler();
      log.info({ status: response.status, duration: Date.now() - context.startTime }, 'Request completed');
      response.headers.set('X-Request-Id', context.requestId);
      return response;
    } catch (error) {
      // Use the `err` key so pino's err serializer captures message and stack
      log.error({ err: error, duration: Date.now() - context.startTime }, 'Request failed');
      throw error;
    }
  });
}
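The request-context module relies on `AsyncLocalStorage` keeping each request's store separate, even when requests overlap in time. A dependency-free sketch of that behaviour, with hypothetical names (`demoContext`, `handle`) standing in for the real logger wiring:

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

interface DemoContext {
  requestId: string;
}

const demoContext = new AsyncLocalStorage<DemoContext>();

// Reads the ID of whichever "request" the current call chain belongs to
function currentRequestId(): string {
  return demoContext.getStore()?.requestId ?? 'no-request';
}

// Simulates a handler that does async work before reading its context
function handle(requestId: string, delayMs: number): Promise<string> {
  return demoContext.run({ requestId }, async () => {
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    return currentRequestId();
  });
}

// Synchronous lookups: inside run() the store is visible, outside it is not
const insideRun = demoContext.run({ requestId: 'req-sync' }, () => currentRequestId());
const outsideRun = currentRequestId();

// Two overlapping "requests": each resolves with its own ID even though
// the second one finishes first
Promise.all([handle('req-a', 20), handle('req-b', 5)]).then((ids) => {
  console.log(ids, insideRun, outsideRun);
});
```

This is why `getRequestLogger()` can be called from deep inside a call stack without threading a request ID through every function signature.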

Log Aggregation: Axiom Integration

// For Vercel deployments, Axiom is the best option
// next.config.js
const { withAxiom } = require('next-axiom');

module.exports = withAxiom({
  // your Next.js config
});

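// Usage with next-axiom
import type { NextRequest } from 'next/server';
import { log } from 'next-axiom';

export async function GET(req: NextRequest) {
  const start = Date.now(); // used for the duration field below

  log.info('Job search', {
    query: req.nextUrl.searchParams.get('q'),
    page: req.nextUrl.searchParams.get('page'),
  });

  // ... handle request (`jobs` below is the result set this step produces)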

  log.info('Job search completed', {
    resultCount: jobs.length,
    duration: Date.now() - start,
  });
}

Pillar 4: Uptime and Synthetic Monitoring

Uptime monitoring tells you when your application is down before your users tell you.

Health Check Endpoint

// app/api/health/route.ts
import { NextResponse } from 'next/server';
import { prisma } from '@/lib/prisma';

interface HealthCheckResult {
  status: 'healthy' | 'degraded' | 'unhealthy';
  timestamp: string;
  version: string;
  checks: {
    database: { status: string; latencyMs: number };
    memory: { usedMB: number; totalMB: number; percentage: number };
    uptime: number;
  };
}

export async function GET(): Promise<NextResponse<HealthCheckResult>> {
  const checks: HealthCheckResult['checks'] = {
    database: { status: 'unknown', latencyMs: 0 },
    memory: { usedMB: 0, totalMB: 0, percentage: 0 },
    uptime: process.uptime(),
  };

  // Check database
  const dbStart = Date.now();
  try {
    await prisma.$queryRaw`SELECT 1`;
    checks.database = { status: 'connected', latencyMs: Date.now() - dbStart };
  } catch {
    checks.database = { status: 'disconnected', latencyMs: Date.now() - dbStart };
  }

  // Check memory
  const mem = process.memoryUsage();
  checks.memory = {
    usedMB: Math.round(mem.heapUsed / 1024 / 1024),
    totalMB: Math.round(mem.heapTotal / 1024 / 1024),
    percentage: Math.round((mem.heapUsed / mem.heapTotal) * 100),
  };

  const isHealthy = checks.database.status === 'connected' && checks.memory.percentage < 90;
  const isDegraded = checks.database.latencyMs > 500 || checks.memory.percentage > 75;

  const result: HealthCheckResult = {
    status: !isHealthy ? 'unhealthy' : isDegraded ? 'degraded' : 'healthy',
    timestamp: new Date().toISOString(),
    version: process.env.NEXT_PUBLIC_APP_VERSION || 'unknown',
    checks,
  };

  return NextResponse.json(result, {
    status: isHealthy ? 200 : 503,
    headers: { 'Cache-Control': 'no-store' },
  });
}
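The status decision in the health endpoint is the part most worth unit testing, and it is easier to test once separated from the database and process calls. Here is a sketch of the same rules as a pure function; `deriveStatus` and `HealthInputs` are hypothetical names, while the thresholds are the ones used in the route above.

```typescript
type HealthStatus = 'healthy' | 'degraded' | 'unhealthy';

interface HealthInputs {
  dbConnected: boolean;
  dbLatencyMs: number;
  memoryPercentage: number;
}

// Same rules as the route handler: unhealthy wins over degraded
function deriveStatus(c: HealthInputs): HealthStatus {
  const isHealthy = c.dbConnected && c.memoryPercentage < 90;
  if (!isHealthy) return 'unhealthy';

  const isDegraded = c.dbLatencyMs > 500 || c.memoryPercentage > 75;
  return isDegraded ? 'degraded' : 'healthy';
}

console.log(deriveStatus({ dbConnected: true, dbLatencyMs: 40, memoryPercentage: 60 }));
console.log(deriveStatus({ dbConnected: true, dbLatencyMs: 800, memoryPercentage: 60 }));
console.log(deriveStatus({ dbConnected: false, dbLatencyMs: 0, memoryPercentage: 60 }));
```

One consequence of these rules: a degraded service still answers with HTTP 200, so an uptime monitor that only checks the status code will not see degradation unless it also asserts on the `status` field in the body.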

Synthetic Monitoring with Checkly

// __checks__/homepage.check.ts (Checkly Monitoring-as-Code)
import { ApiCheck, AssertionBuilder } from 'checkly/constructs';

new ApiCheck('homepage-availability', {
  name: 'Homepage loads successfully',
  activated: true,
  frequency: 5, // Check every 5 minutes
  locations: ['eu-west-1', 'us-east-1', 'ap-southeast-1'],
  request: {
    method: 'GET',
    url: 'https://www.birjob.com',
    assertions: [
      AssertionBuilder.statusCode().equals(200),
      AssertionBuilder.responseTime().lessThan(3000),
      AssertionBuilder.body().contains('BirJob'),
    ],
  },
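  // slackChannel and emailChannel are alert channel constructs defined elsewhere in the Checkly project
  alertChannels: [slackChannel, emailChannel],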
});

// Browser check for critical user flow
import { BrowserCheck } from 'checkly/constructs';

new BrowserCheck('job-search-flow', {
  name: 'Job search returns results',
  activated: true,
  frequency: 15, // Every 15 minutes
  locations: ['eu-west-1'],
  code: {
    content: `
      const { expect, test } = require('@playwright/test');
      test('search for jobs', async ({ page }) => {
        await page.goto('https://www.birjob.com');
        await page.fill('[data-testid="search-input"]', 'developer');
        await page.click('[data-testid="search-button"]');
        await page.waitForSelector('[data-testid="job-card"]');
        const results = await page.locator('[data-testid="job-card"]').count();
        expect(results).toBeGreaterThan(0);
      });
    `,
  },
});

The Complete Monitoring Stack: Cost Comparison

Tool	Category	Free Tier	Paid Starting At	Best For
Sentry	Error Tracking	5K errors/month	$26/mo	Next.js error tracking (best SDK)
Vercel Analytics	Web Vitals	2.5K events/mo	$10/mo (Pro plan)	Vercel-hosted sites
Axiom	Logs	500MB ingest/mo	$25/mo	Vercel log drain
BetterUptime	Uptime	10 monitors	$20/mo	Simple uptime + status page
Checkly	Synthetics	5 checks	$30/mo	Playwright-based E2E monitoring
Datadog	Full Stack	14-day trial	$15/host/mo	Enterprise all-in-one
Grafana Cloud	Dashboards	Generous free tier	$29/mo	Custom dashboards + alerting

My recommended stack for small-to-medium Next.js apps (under $100/month total):

Sentry Developer plan ($0-26/mo) for error tracking
Vercel Analytics (included in Pro plan) for Web Vitals
Axiom free tier for logs
BetterUptime free tier for uptime monitoring
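To sanity-check the "under $100/month" claim, the recommended stack can be totalled from the prices in the comparison table. The breakdown below is an assumption on my part (Sentry at its first paid tier, Vercel Analytics at the Pro plan price, the rest on free tiers), not an itemized bill from the source.

```typescript
interface StackItem {
  tool: string;
  monthlyUsd: number;
}

// Prices taken from the comparison table; which tier applies is an assumption
const stack: StackItem[] = [
  { tool: 'Sentry (first paid tier)', monthlyUsd: 26 },
  { tool: 'Vercel Analytics (Pro plan)', monthlyUsd: 10 },
  { tool: 'Axiom (free tier)', monthlyUsd: 0 },
  { tool: 'BetterUptime (free tier)', monthlyUsd: 0 },
];

const totalMonthlyUsd = stack.reduce((sum, item) => sum + item.monthlyUsd, 0);
const withinBudget = totalMonthlyUsd < 100;

console.log({ totalMonthlyUsd, withinBudget });
```

Under these assumptions the total comes to $36 per month, which leaves room to move Axiom or BetterUptime to a paid tier before hitting the $100 ceiling.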

Alerting: Don't Wake Up for Everything

The biggest mistake teams make is over-alerting. Alert fatigue is real — according to PagerDuty's 2024 report, 44% of on-call engineers receive alerts that don't require action, leading to desensitization.

Alert Severity Framework

Severity	Criteria	Notification Channel	Response Time
P0 - Critical	Site down, data loss, security breach	Phone call + SMS + Slack	15 minutes
P1 - High	Major feature broken, error rate >5%	Slack + Email	1 hour
P2 - Medium	Degraded performance, non-critical errors	Slack	4 hours
P3 - Low	Warning threshold, minor issues	Daily digest email	Next business day

// Example: Sentry alert rules configuration
// In Sentry dashboard: Settings > Alerts

// P0: Site-wide error spike
// Trigger: Error count > 100 in 5 minutes
// Action: PagerDuty page + Slack #incidents

// P1: High error rate on specific routes
// Trigger: Error rate > 5% on /api/* routes in 10 minutes
// Action: Slack #alerts

// P2: Performance degradation
// Trigger: p95 response time > 3s for 15 minutes
// Action: Slack #monitoring

// P3: New unhandled errors
// Trigger: New issue (first seen)
// Action: Slack #errors-feed
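If alerts are routed through your own code rather than configured in a dashboard, the severity framework can be written down as a small classifier. This is a sketch under assumptions: `classifySeverity`, `Signal`, and the channel identifiers are hypothetical, and the thresholds are lifted from the table and rules above (error rate above 5% for P1, p95 above 3 seconds for P2).

```typescript
type Severity = 'P0' | 'P1' | 'P2' | 'P3';

interface Signal {
  siteDown: boolean;
  errorRate: number;    // fraction of failed requests, e.g. 0.05 for 5%
  p95LatencyMs: number; // 95th percentile response time
}

// Checks run from most to least severe, so the worst condition wins
function classifySeverity(s: Signal): Severity {
  if (s.siteDown) return 'P0';
  if (s.errorRate > 0.05) return 'P1';
  if (s.p95LatencyMs > 3000) return 'P2';
  return 'P3';
}

// Notification channels per severity, mirroring the framework table
const CHANNELS: Record<Severity, string[]> = {
  P0: ['phone', 'sms', 'slack'],
  P1: ['slack', 'email'],
  P2: ['slack'],
  P3: ['daily-digest'],
};

const severity = classifySeverity({ siteDown: false, errorRate: 0.08, p95LatencyMs: 1200 });
console.log(severity, CHANNELS[severity]);
```

Keeping the mapping in one place makes it easy to review which conditions are allowed to page a human and which only land in a digest.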

Opinionated: What Most Guides Get Wrong

1. You don't need Datadog. For 90% of Next.js applications, the Sentry + Vercel Analytics + Axiom + BetterUptime stack covers everything at 1/10th the cost. Datadog is extraordinary, but its per-host pricing model makes it overkill for teams under 20 engineers.

2. Don't monitor everything equally. Your job search endpoint matters more than your about page. Your payment flow matters more than your blog. Allocate monitoring budget (both financial and attention) proportionally to business impact.

3. Sampling is not optional. If you're sending 100% of traces to Sentry, you're paying too much and getting too much noise. Sample 10-20% in production. You'll still catch patterns. For errors, always capture 100%.

4. Logs without structure are noise. If your logs look like console.log('user did thing'), they're worthless in production. Every log line needs a severity, a timestamp, a request ID, and structured metadata. Use Pino, not console.log.

5. Status pages build trust. A public status page (via BetterUptime, Statuspage, or even a simple page on your site) transforms outages from "is this site broken?" into "they know, they're working on it." It costs nothing and saves support tickets.
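Points 2 and 3 combine naturally: instead of one flat trace sample rate, the rate can vary by route so that business-critical paths get more coverage. The sketch below shows only the rate lookup. The route prefixes and rates are assumptions chosen for illustration, and `sampleRateFor` is a hypothetical helper.

```typescript
// Route prefix to sample rate, checked in order (assumed values)
const ROUTE_SAMPLE_RATES: Array<[string, number]> = [
  ['/api/health', 0],     // uptime pings are high-volume noise
  ['/api/payments', 0.5], // business-critical: sample more
  ['/api/jobs', 0.2],     // core search endpoint
];

// Fallback matching the 10% production rate used in the client config
const DEFAULT_SAMPLE_RATE = 0.1;

function sampleRateFor(pathname: string): number {
  for (const [prefix, rate] of ROUTE_SAMPLE_RATES) {
    if (pathname.startsWith(prefix)) return rate;
  }
  return DEFAULT_SAMPLE_RATE;
}

console.log(sampleRateFor('/api/payments/checkout'), sampleRateFor('/about'));
```

Sentry's `tracesSampler` option accepts a function that returns a rate per transaction, so a lookup like this could be plugged in there. How the pathname is read from the sampling context depends on the SDK version, so that part is left out here.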

Action Plan: From Zero to Full Observability in One Sprint

Day 1: Error Tracking

Install @sentry/nextjs
Configure client, server, and edge configs
Add error boundaries (error.tsx, global-error.tsx)
Set up Slack integration for new errors

Day 2: Performance

Add Vercel Analytics / Speed Insights
Or set up custom Web Vitals reporting endpoint
Add server-side timing headers
Benchmark current Core Web Vitals scores

Day 3: Logging

Install Pino, configure structured logging
Replace all console.log with logger calls
Set up Axiom or your log aggregator
Add request ID correlation

Day 4: Uptime & Alerts

Create /api/health endpoint
Set up BetterUptime with 3-5 monitors
Configure alert severity levels
Create a public status page

Day 5: Dashboard & Review

Create a single "overview" dashboard with key metrics
Document runbooks for common alert scenarios
Schedule weekly monitoring review
Verify all alerts fire correctly (trigger a test error)

Conclusion: Monitoring Is Not Optional

After instrumenting our Next.js application, we caught three critical bugs within the first week that had been silently affecting users for months. A database connection leak that caused 500 errors during peak hours. A memory leak in a Server Component that grew 10MB/hour. A third-party API timeout that cascaded into our job search being empty for users in certain regions.

Without monitoring, these bugs would have continued eroding user trust. With monitoring, we fixed them in hours. The total cost of our monitoring stack: $36/month. The cost of the bugs it catches: incalculable.

I'm Ismat, and I build BirJob — Azerbaijan's job aggregator scraping 80+ sources daily.
