Postgres Full-Text Search Is Better Than You Think (Until It Isn't)

Every few months someone on a team I'm advising will say they need to add search, and the first suggestion is usually Elasticsearch or Algolia or Typesense. Sometimes they're right. But a lot of the time, they're already running Postgres and they don't need to operate another service, pay another bill, or figure out syncing.

Postgres has had full-text search since version 8. It's not Elasticsearch. But if your search volume is under a few million rows and you don't need faceted filtering with sub-millisecond latency, it might be all you need.

Here's an honest walkthrough.

The Data Model

Postgres full-text search converts text into a tsvector a sorted list of lexemes (normalized word roots) with their positions in the document. A search query becomes a tsquery. You check if the tsquery matches the tsvector.


sql
SELECT to_tsvector('english', 'The quick brown foxes are jumping over fences')
-- 'brown':3 'fence':8 'fox':4 'jump':6 'quick':2

Notice: "foxes" → "fox", "jumping" → "jump", "fences" → "fence", "The" → gone. This is stemming and stop-word removal. It means "fox" searches match "foxes" automatically.


sql
SELECT to_tsquery('english', 'fox & jump')
-- 'fox' & 'jump'

SELECT to_tsvector('english', 'The quick brown foxes are jumping') @@
       to_tsquery('english', 'fox & jump');
-- t (true — it matches)

Setting It Up for a Real Table

Don't call to_tsvector at query time on millions of rows. Precompute and index it.


sql
-- Add a generated tsvector column (Postgres 12+)
ALTER TABLE articles
ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (
  setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
  setweight(to_tsvector('english', coalesce(body, '')), 'B') ||
  setweight(to_tsvector('english', coalesce(tags::text, '')), 'A')
) STORED;

-- GIN index for fast lookups
CREATE INDEX articles_search_vector_idx
ON articles USING GIN (search_vector);

The setweight function lets you rank matches differently depending on where they appear. 'A' weight is highest. A title match ranks higher than a body match.

Querying it:


sql
SELECT
  id,
  title,
  ts_rank(search_vector, query) AS rank
FROM
  articles,
  to_tsquery('english', 'postgres & search') query
WHERE
  search_vector @@ query
ORDER BY
  rank DESC
LIMIT 20;

Handling User Input

Users don't type boolean queries. They type phrases. plainto_tsquery converts free text to an AND query, phraseto_tsquery requires exact phrase order, and websearch_to_tsquery parses Google-style syntax:


sql
SELECT websearch_to_tsquery('english', 'postgres search -elasticsearch')
-- 'postgres' & 'search' & !'elasticsearch'

SELECT websearch_to_tsquery('english', '"full text" search')
-- 'full' <-> 'text' & 'search'  (phrase match + AND)

websearch_to_tsquery is what I use for user-facing search. It's the most natural and handles the weird inputs users actually type.

Highlighting Results


sql
SELECT
  id,
  title,
  ts_headline(
    'english',
    body,
    to_tsquery('english', 'postgres'),
    'MaxFragments=3, MaxWords=35, MinWords=20, StartSel=<mark>, StopSel=</mark>'
  ) AS excerpt
FROM articles
WHERE search_vector @@ to_tsquery('english', 'postgres');

ts_headline is surprisingly capable. It finds the best matching fragments and wraps hits in your chosen markup. You can configure fragment count, length, and delimiters.

The Full Search Function

Here's what I put into a Postgres function for a docs search feature:


sql
CREATE OR REPLACE FUNCTION search_articles(
  query_text TEXT,
  page_num INTEGER DEFAULT 1,
  page_size INTEGER DEFAULT 20
)
RETURNS TABLE (
  id UUID,
  title TEXT,
  excerpt TEXT,
  rank REAL,
  total_count BIGINT
)
LANGUAGE plpgsql AS $$
DECLARE
  ts_query tsquery;
BEGIN
  -- Convert user input to a tsquery, handle empty/invalid input
  BEGIN
    ts_query := websearch_to_tsquery('english', query_text);
  EXCEPTION WHEN others THEN
    ts_query := plainto_tsquery('english', query_text);
  END;

  RETURN QUERY
  WITH search_results AS (
    SELECT
      a.id,
      a.title,
      a.body,
      a.search_vector,
      ts_rank_cd(a.search_vector, ts_query) AS rank,
      COUNT(*) OVER () AS total_count
    FROM articles a
    WHERE a.search_vector @@ ts_query
      AND a.published = true
    ORDER BY rank DESC
    OFFSET (page_num - 1) * page_size
    LIMIT page_size
  )
  SELECT
    sr.id,
    sr.title,
    ts_headline(
      'english',
      sr.body,
      ts_query,
      'MaxFragments=2, MaxWords=30, MinWords=15, StartSel=<mark>, StopSel=</mark>'
    ),
    sr.rank,
    sr.total_count
  FROM search_results sr;
END;
$$;

Call it from your app:


typescript
const results = await db.query(
  'SELECT * FROM search_articles($1, $2, $3)',
  [searchQuery, page, pageSize]
)

Where Postgres Falls Over

Be honest with yourself about when to move on:

Fuzzy/typo matching. to_tsquery('english', 'potsgres') returns nothing. There's no built-in fuzzy matching. You can get creative with pg_trgm (trigram matching), which does handle typos, but combining it with tsvector ranking is awkward.


sql
-- pg_trgm for fuzzy matching, different approach
CREATE INDEX articles_title_trgm_idx
ON articles USING GIN (title gin_trgm_ops);

SELECT title, similarity(title, 'potsgres') AS sim
FROM articles
WHERE title % 'potsgres'
ORDER BY sim DESC;

The two approaches (tsvector and trigram) can coexist but you're essentially maintaining two search systems and combining their results, which gets complicated.

Synonyms and custom dictionaries. You can configure custom Postgres text search dictionaries and thesauruses, but the configuration is XML files on the server and it's painful enough that most teams don't bother.

Faceted search at scale. Filtering by multiple categories while ranking by relevance is doable in Postgres, but if you have complex facets with counts ("32 results in 'JavaScript', 18 in 'Python'"), the query plans get expensive fast.

More than ~5M rows. This isn't a hard cutoff and it depends heavily on your query patterns, but this is where I start benchmarking seriously and considering alternatives.

My Decision Framework

Postgres search first if:

You're on Postgres already
Less than a few million searchable records
Mostly keyword search, not fuzzy/semantic
You don't need real-time index updates for search features (Postgres replication and GENERATED columns keep this fairly current)

Dedicated search when:

Typo tolerance is a hard requirement
Semantic/vector search (though Postgres + pgvector is changing this)
Facets with live counts at scale
Sub-100ms search across 10M+ records

The operational cost of another service is real. Running Elasticsearch for a 50,000-row knowledge base is overkill, and you'll spend more time on ops than on features. Start with Postgres. Migrate when you have data that proves you need to.