Your App Is Probably Misconfigured for Database Connections

I've diagnosed the same production incident pattern more times than I'd like. App is running fine, traffic picks up, database starts throwing "too many connections" or queries start timing out even though the database itself isn't under heavy load. Check the connection count: it's at the max. Check the active queries: most of them are idle, just sitting there holding connections.

This is a pooling problem. It's fixable, but you need to understand what's actually happening.

What a Connection Is

A database connection is a stateful TCP connection with established authentication, an assigned process on the database server, and allocated memory for session state. Postgres, for instance, spawns a backend process per connection. Connections are not cheap.

Most databases have a hard limit on how many connections they'll accept (max_connections in Postgres, default 100). When you hit that limit, new connection attempts are refused not queued, not retried refused. Your app gets an error.

A connection pool maintains a set of pre-established connections and hands them out to application threads or requests on demand, releasing them back to the pool when done. The goal is to avoid the overhead of establishing and tearing down connections per request, and to enforce a ceiling on how many connections the application holds.

The Serverless Problem

Traditional long-running servers and serverless/edge functions have completely different connection dynamics.

A traditional Node.js server initializes a pool at startup, keeps connections warm, and the pool stays at whatever size makes sense for the load. Simple.

A serverless function (Vercel functions, AWS Lambda, Netlify functions) spins up a new process per request. Each process initializes its own connection pool. If you have 500 concurrent requests, you potentially have 500 pool initializations, each trying to hold pool.max connections. Against a Postgres instance with max_connections=100, this is immediately catastrophic.

This is why you see recommendations to use PgBouncer or Prisma Accelerate or Supabase's connection pooler with serverless. You need an external connection pooler that sits between your functions and the database, maintaining a shared pool.

Setting Up pg Properly

For a long-running Node.js server, the native pg module's pool:


typescript
// lib/db.ts
import { Pool } from 'pg'

// One pool instance for the entire application lifetime
// Never create a pool per-request
export const pool = new Pool({
  connectionString: process.env.DATABASE_URL,

  // Pool sizing — the rule of thumb:
  // max = number of CPU cores on the DB server (for CPU-bound queries)
  // or = desired max concurrent queries + some headroom
  max: 20,

  // Minimum connections to keep warm
  min: 2,

  // How long a connection can sit idle before being released
  idleTimeoutMillis: 30_000,

  // How long to wait for a connection from the pool before erroring
  connectionTimeoutMillis: 5_000,

  // How long a query can run before timing out (client-side)
  // statement_timeout in Postgres config does this server-side
  query_timeout: 30_000,
})

// Surface pool health in logs
pool.on('connect', () => {
  // New physical connection established
})

pool.on('error', (err) => {
  console.error('Unexpected pool error:', err)
})

// Expose pool stats for monitoring
export function getPoolStats() {
  return {
    total: pool.totalCount,
    idle: pool.idleCount,
    waiting: pool.waitingCount,
  }
}

The most important rule: one pool per application, not one pool per request. I've seen people create new Pool() inside their request handler. That means a new pool — and potentially new TCP connections — for every single request.

Transaction Handling and Connection Leaks

A connection is returned to the pool when the query completes. But transactions hold the connection until they're committed or rolled back. Forgetting to commit/rollback is the most common source of connection exhaustion.


typescript
// Wrong: if the logic throws, the connection leaks
async function transferBalance(fromId: string, toId: string, amount: number) {
  const client = await pool.connect()
  await client.query('BEGIN')
  await client.query('UPDATE accounts SET balance = balance - $1 WHERE id = $2', [amount, fromId])
  await client.query('UPDATE accounts SET balance = balance + $1 WHERE id = $2', [amount, toId])
  await client.query('COMMIT')
  client.release()  // Never reached if an error is thrown
}

// Right: always release in a finally block
async function transferBalance(fromId: string, toId: string, amount: number) {
  const client = await pool.connect()

  try {
    await client.query('BEGIN')
    await client.query(
      'UPDATE accounts SET balance = balance - $1 WHERE id = $2',
      [amount, fromId]
    )
    await client.query(
      'UPDATE accounts SET balance = balance + $1 WHERE id = $2',
      [amount, toId]
    )
    await client.query('COMMIT')
  } catch (err) {
    await client.query('ROLLBACK')
    throw err
  } finally {
    client.release()  // Always runs
  }
}

Better: wrap this in a helper so you never forget:


typescript
async function withTransaction<T>(
  fn: (client: PoolClient) => Promise<T>
): Promise<T> {
  const client = await pool.connect()

  try {
    await client.query('BEGIN')
    const result = await fn(client)
    await client.query('COMMIT')
    return result
  } catch (err) {
    await client.query('ROLLBACK')
    throw err
  } finally {
    client.release()
  }
}

// Clean usage
await withTransaction(async (client) => {
  await client.query('UPDATE accounts SET balance = balance - $1 WHERE id = $2', [amount, fromId])
  await client.query('UPDATE accounts SET balance = balance + $1 WHERE id = $2', [amount, toId])
})

Pool Size: The Math

There's a famous HikariCP post (the Java connection pool) that argues the optimal pool size is surprisingly small — often as small as (core_count * 2) + effective_spindle_count for the database server. For a 4-core Postgres server with SSDs, that's around 10.

The intuition: Postgres is CPU-bound for most query workloads. More connections than CPUs means contexts switching and waiting, not more parallelism. Large connection pools often hurt performance rather than help.

The practical guidance:

Start with max: 10 for small apps
Increase based on observed pool.waitingCount in your metrics
Watch for query latency increasing as pool size increases — that's a sign you've gone too far

For serverless with PgBouncer:


typescript
// With an external pooler, your app-level pool can be small
// because PgBouncer is doing the real work
export const pool = new Pool({
  connectionString: process.env.DATABASE_URL, // Points to PgBouncer
  max: 5,  // Small — PgBouncer manages the real connections
  min: 1,
})

Healthchecks and Pool Recovery

Connections go stale. A Postgres restart, a network blip, an idle connection killed by a firewall — all of these leave the pool holding dead connections that return errors on use.

Most pools have a health check mechanism:


typescript
export const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20,
  // pg doesn't have a built-in health check, but you can test connections
  // by catching and retrying on specific error codes
})

// Wrapper that retries on connection errors once
async function query(text: string, params?: unknown[]) {
  try {
    return await pool.query(text, params)
  } catch (err: any) {
    // Connection terminated, connection reset, broken pipe
    if (['57P01', '08006', '08001', 'ECONNRESET'].includes(err.code)) {
      // Wait a moment and retry once
      await new Promise((r) => setTimeout(r, 100))
      return pool.query(text, params)
    }
    throw err
  }
}

Prisma handles this internally. If you're using raw pg, add at minimum a liveness check to your app's healthcheck endpoint:


typescript
app.get('/health', async (req, res) => {
  try {
    await pool.query('SELECT 1')
    res.json({ status: 'ok', pool: getPoolStats() })
  } catch {
    res.status(503).json({ status: 'db_unavailable' })
  }
})

What to Monitor

These three numbers tell you most of what you need to know about pool health:

pool.totalCount total connections open (established or being established). Should stay well below max_connections on your DB.

pool.idleCount connections sitting idle waiting to be used. Too high means your max is set higher than needed.

pool.waitingCount requests waiting for a connection. Any sustained non-zero value here means your pool is undersized for the load.

Expose these to your metrics (Datadog, Prometheus, whatever you use) and alert on waitingCount > 0 sustained for more than 30 seconds. That's the canary for "connection starvation is happening right now."