
Postgres Full-Text Search Is Better Than You Think (Until It Isn't)
Every few months someone on a team I'm advising will say they need to add search, and the first suggestion is usually Elasticsearch or Algolia or Typesense. Sometimes they're right. But a lot of the time, they're already running Postgres and they don't need to operate another service, pay another bill, or figure out syncing.
Postgres has had full-text search since version 8. It's not Elasticsearch. But if your search volume is under a few million rows and you don't need faceted filtering with sub-millisecond latency, it might be all you need.
Here's an honest walkthrough.
The Data Model
Postgres full-text search converts text into a tsvector a sorted list of lexemes (normalized word roots) with their positions in the document. A search query becomes a tsquery. You check if the tsquery matches the tsvector.
sqlSELECT to_tsvector('english', 'The quick brown foxes are jumping over fences') -- 'brown':3 'fence':8 'fox':4 'jump':6 'quick':2
Notice: "foxes" → "fox", "jumping" → "jump", "fences" → "fence", "The" → gone. This is stemming and stop-word removal. It means "fox" searches match "foxes" automatically.
sqlSELECT to_tsquery('english', 'fox & jump') -- 'fox' & 'jump' SELECT to_tsvector('english', 'The quick brown foxes are jumping') @@ to_tsquery('english', 'fox & jump'); -- t (true — it matches)
Setting It Up for a Real Table
Don't call to_tsvector at query time on millions of rows. Precompute and index it.
sql-- Add a generated tsvector column (Postgres 12+) ALTER TABLE articles ADD COLUMN search_vector tsvector GENERATED ALWAYS AS ( setweight(to_tsvector('english', coalesce(title, '')), 'A') || setweight(to_tsvector('english', coalesce(body, '')), 'B') || setweight(to_tsvector('english', coalesce(tags::text, '')), 'A') ) STORED; -- GIN index for fast lookups CREATE INDEX articles_search_vector_idx ON articles USING GIN (search_vector);
The setweight function lets you rank matches differently depending on where they appear. 'A' weight is highest. A title match ranks higher than a body match.
Querying it:
sqlSELECT id, title, ts_rank(search_vector, query) AS rank FROM articles, to_tsquery('english', 'postgres & search') query WHERE search_vector @@ query ORDER BY rank DESC LIMIT 20;
Handling User Input
Users don't type boolean queries. They type phrases. plainto_tsquery converts free text to an AND query, phraseto_tsquery requires exact phrase order, and websearch_to_tsquery parses Google-style syntax:
sqlSELECT websearch_to_tsquery('english', 'postgres search -elasticsearch') -- 'postgres' & 'search' & !'elasticsearch' SELECT websearch_to_tsquery('english', '"full text" search') -- 'full' <-> 'text' & 'search' (phrase match + AND)
websearch_to_tsquery is what I use for user-facing search. It's the most natural and handles the weird inputs users actually type.
Highlighting Results
sqlSELECT id, title, ts_headline( 'english', body, to_tsquery('english', 'postgres'), 'MaxFragments=3, MaxWords=35, MinWords=20, StartSel=<mark>, StopSel=</mark>' ) AS excerpt FROM articles WHERE search_vector @@ to_tsquery('english', 'postgres');
ts_headline is surprisingly capable. It finds the best matching fragments and wraps hits in your chosen markup. You can configure fragment count, length, and delimiters.
The Full Search Function
Here's what I put into a Postgres function for a docs search feature:
sqlCREATE OR REPLACE FUNCTION search_articles( query_text TEXT, page_num INTEGER DEFAULT 1, page_size INTEGER DEFAULT 20 ) RETURNS TABLE ( id UUID, title TEXT, excerpt TEXT, rank REAL, total_count BIGINT ) LANGUAGE plpgsql AS $$ DECLARE ts_query tsquery; BEGIN -- Convert user input to a tsquery, handle empty/invalid input BEGIN ts_query := websearch_to_tsquery('english', query_text); EXCEPTION WHEN others THEN ts_query := plainto_tsquery('english', query_text); END; RETURN QUERY WITH search_results AS ( SELECT a.id, a.title, a.body, a.search_vector, ts_rank_cd(a.search_vector, ts_query) AS rank, COUNT(*) OVER () AS total_count FROM articles a WHERE a.search_vector @@ ts_query AND a.published = true ORDER BY rank DESC OFFSET (page_num - 1) * page_size LIMIT page_size ) SELECT sr.id, sr.title, ts_headline( 'english', sr.body, ts_query, 'MaxFragments=2, MaxWords=30, MinWords=15, StartSel=<mark>, StopSel=</mark>' ), sr.rank, sr.total_count FROM search_results sr; END; $$;
Call it from your app:
typescriptconst results = await db.query( 'SELECT * FROM search_articles($1, $2, $3)', [searchQuery, page, pageSize] )
Where Postgres Falls Over
Be honest with yourself about when to move on:
Fuzzy/typo matching. to_tsquery('english', 'potsgres') returns nothing. There's no built-in fuzzy matching. You can get creative with pg_trgm (trigram matching), which does handle typos, but combining it with tsvector ranking is awkward.
sql-- pg_trgm for fuzzy matching, different approach CREATE INDEX articles_title_trgm_idx ON articles USING GIN (title gin_trgm_ops); SELECT title, similarity(title, 'potsgres') AS sim FROM articles WHERE title % 'potsgres' ORDER BY sim DESC;
The two approaches (tsvector and trigram) can coexist but you're essentially maintaining two search systems and combining their results, which gets complicated.
Synonyms and custom dictionaries. You can configure custom Postgres text search dictionaries and thesauruses, but the configuration is XML files on the server and it's painful enough that most teams don't bother.
Faceted search at scale. Filtering by multiple categories while ranking by relevance is doable in Postgres, but if you have complex facets with counts ("32 results in 'JavaScript', 18 in 'Python'"), the query plans get expensive fast.
More than ~5M rows. This isn't a hard cutoff and it depends heavily on your query patterns, but this is where I start benchmarking seriously and considering alternatives.
My Decision Framework
Postgres search first if:
- You're on Postgres already
- Less than a few million searchable records
- Mostly keyword search, not fuzzy/semantic
- You don't need real-time index updates for search features (Postgres replication and GENERATED columns keep this fairly current)
Dedicated search when:
- Typo tolerance is a hard requirement
- Semantic/vector search (though Postgres + pgvector is changing this)
- Facets with live counts at scale
- Sub-100ms search across 10M+ records
The operational cost of another service is real. Running Elasticsearch for a 50,000-row knowledge base is overkill, and you'll spend more time on ops than on features. Start with Postgres. Migrate when you have data that proves you need to.