
Your App Is Probably Misconfigured for Database Connections
I've diagnosed the same production incident pattern more times than I'd like. App is running fine, traffic picks up, database starts throwing "too many connections" or queries start timing out even though the database itself isn't under heavy load. Check the connection count: it's at the max. Check the active queries: most of them are idle, just sitting there holding connections.
This is a pooling problem. It's fixable, but you need to understand what's actually happening.
What a Connection Is
A database connection is a stateful TCP connection with established authentication, an assigned process on the database server, and allocated memory for session state. Postgres, for instance, spawns a backend process per connection. Connections are not cheap.
Most databases have a hard limit on how many connections they'll accept (max_connections in Postgres, default 100). When you hit that limit, new connection attempts are refused not queued, not retried refused. Your app gets an error.
A connection pool maintains a set of pre-established connections and hands them out to application threads or requests on demand, releasing them back to the pool when done. The goal is to avoid the overhead of establishing and tearing down connections per request, and to enforce a ceiling on how many connections the application holds.
The Serverless Problem
Traditional long-running servers and serverless/edge functions have completely different connection dynamics.
A traditional Node.js server initializes a pool at startup, keeps connections warm, and the pool stays at whatever size makes sense for the load. Simple.
A serverless function (Vercel functions, AWS Lambda, Netlify functions) spins up a new process per request. Each process initializes its own connection pool. If you have 500 concurrent requests, you potentially have 500 pool initializations, each trying to hold pool.max connections. Against a Postgres instance with max_connections=100, this is immediately catastrophic.
This is why you see recommendations to use PgBouncer or Prisma Accelerate or Supabase's connection pooler with serverless. You need an external connection pooler that sits between your functions and the database, maintaining a shared pool.
Setting Up pg Properly
For a long-running Node.js server, the native pg module's pool:
typescript// lib/db.ts import { Pool } from 'pg' // One pool instance for the entire application lifetime // Never create a pool per-request export const pool = new Pool({ connectionString: process.env.DATABASE_URL, // Pool sizing — the rule of thumb: // max = number of CPU cores on the DB server (for CPU-bound queries) // or = desired max concurrent queries + some headroom max: 20, // Minimum connections to keep warm min: 2, // How long a connection can sit idle before being released idleTimeoutMillis: 30_000, // How long to wait for a connection from the pool before erroring connectionTimeoutMillis: 5_000, // How long a query can run before timing out (client-side) // statement_timeout in Postgres config does this server-side query_timeout: 30_000, }) // Surface pool health in logs pool.on('connect', () => { // New physical connection established }) pool.on('error', (err) => { console.error('Unexpected pool error:', err) }) // Expose pool stats for monitoring export function getPoolStats() { return { total: pool.totalCount, idle: pool.idleCount, waiting: pool.waitingCount, } }
The most important rule: one pool per application, not one pool per request. I've seen people create new Pool() inside their request handler. That means a new pool — and potentially new TCP connections — for every single request.
Transaction Handling and Connection Leaks
A connection is returned to the pool when the query completes. But transactions hold the connection until they're committed or rolled back. Forgetting to commit/rollback is the most common source of connection exhaustion.
typescript// Wrong: if the logic throws, the connection leaks async function transferBalance(fromId: string, toId: string, amount: number) { const client = await pool.connect() await client.query('BEGIN') await client.query('UPDATE accounts SET balance = balance - $1 WHERE id = $2', [amount, fromId]) await client.query('UPDATE accounts SET balance = balance + $1 WHERE id = $2', [amount, toId]) await client.query('COMMIT') client.release() // Never reached if an error is thrown } // Right: always release in a finally block async function transferBalance(fromId: string, toId: string, amount: number) { const client = await pool.connect() try { await client.query('BEGIN') await client.query( 'UPDATE accounts SET balance = balance - $1 WHERE id = $2', [amount, fromId] ) await client.query( 'UPDATE accounts SET balance = balance + $1 WHERE id = $2', [amount, toId] ) await client.query('COMMIT') } catch (err) { await client.query('ROLLBACK') throw err } finally { client.release() // Always runs } }
Better: wrap this in a helper so you never forget:
typescriptasync function withTransaction<T>( fn: (client: PoolClient) => Promise<T> ): Promise<T> { const client = await pool.connect() try { await client.query('BEGIN') const result = await fn(client) await client.query('COMMIT') return result } catch (err) { await client.query('ROLLBACK') throw err } finally { client.release() } } // Clean usage await withTransaction(async (client) => { await client.query('UPDATE accounts SET balance = balance - $1 WHERE id = $2', [amount, fromId]) await client.query('UPDATE accounts SET balance = balance + $1 WHERE id = $2', [amount, toId]) })
Pool Size: The Math
There's a famous HikariCP post (the Java connection pool) that argues the optimal pool size is surprisingly small — often as small as (core_count * 2) + effective_spindle_count for the database server. For a 4-core Postgres server with SSDs, that's around 10.
The intuition: Postgres is CPU-bound for most query workloads. More connections than CPUs means contexts switching and waiting, not more parallelism. Large connection pools often hurt performance rather than help.
The practical guidance:
- Start with
max: 10for small apps - Increase based on observed
pool.waitingCountin your metrics - Watch for query latency increasing as pool size increases — that's a sign you've gone too far
For serverless with PgBouncer:
typescript// With an external pooler, your app-level pool can be small // because PgBouncer is doing the real work export const pool = new Pool({ connectionString: process.env.DATABASE_URL, // Points to PgBouncer max: 5, // Small — PgBouncer manages the real connections min: 1, })
Healthchecks and Pool Recovery
Connections go stale. A Postgres restart, a network blip, an idle connection killed by a firewall — all of these leave the pool holding dead connections that return errors on use.
Most pools have a health check mechanism:
typescriptexport const pool = new Pool({ connectionString: process.env.DATABASE_URL, max: 20, // pg doesn't have a built-in health check, but you can test connections // by catching and retrying on specific error codes }) // Wrapper that retries on connection errors once async function query(text: string, params?: unknown[]) { try { return await pool.query(text, params) } catch (err: any) { // Connection terminated, connection reset, broken pipe if (['57P01', '08006', '08001', 'ECONNRESET'].includes(err.code)) { // Wait a moment and retry once await new Promise((r) => setTimeout(r, 100)) return pool.query(text, params) } throw err } }
Prisma handles this internally. If you're using raw pg, add at minimum a liveness check to your app's healthcheck endpoint:
typescriptapp.get('/health', async (req, res) => { try { await pool.query('SELECT 1') res.json({ status: 'ok', pool: getPoolStats() }) } catch { res.status(503).json({ status: 'db_unavailable' }) } })
What to Monitor
These three numbers tell you most of what you need to know about pool health:
pool.totalCount total connections open (established or being established). Should stay well below max_connections on your DB.
pool.idleCount connections sitting idle waiting to be used. Too high means your max is set higher than needed.
pool.waitingCount requests waiting for a connection. Any sustained non-zero value here means your pool is undersized for the load.
Expose these to your metrics (Datadog, Prometheus, whatever you use) and alert on waitingCount > 0 sustained for more than 30 seconds. That's the canary for "connection starvation is happening right now."