Full-Text Search in PostgreSQL vs Elasticsearch: When to Use Each

Choosing the right search technology is crucial for a performant and scalable job platform. At MisuJob, we leverage both PostgreSQL’s built-in full-text search (FTS) capabilities and Elasticsearch to power our AI-powered job matching and search functionalities. Understanding when to use each is key to providing the best experience for our users across Europe.

Full-Text Search: PostgreSQL vs Elasticsearch

As engineers working on MisuJob, a platform that processes 1M+ job listings and aggregates from multiple sources across Europe, we’ve gained significant experience with both PostgreSQL’s FTS and Elasticsearch. While both serve the purpose of enabling full-text search, they differ significantly in their architecture, performance characteristics, and feature sets. This post will dive into the nuances of each, providing guidance on when to choose one over the other.

PostgreSQL Full-Text Search: A Solid Foundation

PostgreSQL’s full-text search is a powerful and often overlooked feature. It’s tightly integrated with the database, offering several advantages:

Simplicity: No need to manage a separate search server. Everything is within your existing PostgreSQL instance.
ACID Compliance: Transactions ensure data consistency between your relational data and search index.
Cost-Effective: Eliminates the operational overhead and cost associated with running a separate Elasticsearch cluster.

However, PostgreSQL FTS has limitations:

Performance: While adequate for smaller datasets or simple queries, it can become slow for complex searches or large volumes of data.
Scalability: Scaling PostgreSQL FTS typically involves scaling the entire database, which can be expensive and complex.
Feature Set: Lacks some of the advanced features offered by Elasticsearch, such as relevance scoring algorithms, stemming, and synonym handling.

Let’s look at a basic example of using PostgreSQL FTS:

-- Create a table with a text column
CREATE TABLE job_listings (
    id SERIAL PRIMARY KEY,
    title TEXT,
    description TEXT
);

-- Create a full-text index
ALTER TABLE job_listings ADD COLUMN tsv tsvector;
UPDATE job_listings SET tsv = to_tsvector('english', title || ' ' || description);
CREATE INDEX idx_job_listings_tsv ON job_listings USING GIN (tsv);

-- Example query
SELECT id, title
FROM job_listings
WHERE tsv @@ to_tsquery('english', 'software engineer');

This example demonstrates how to create a full-text index and perform a simple search. The to_tsvector function converts the text into a tsvector, which is a normalized representation of the text. The to_tsquery function converts the search term into a tsquery, which is a query representation. The @@ operator checks if the tsvector matches the tsquery.

For a more complex scenario, we might want to rank results based on relevance. PostgreSQL provides functions for this:

SELECT id, title, ts_rank_cd(tsv, query) AS rank
FROM job_listings, to_tsquery('english', 'software engineer') query
WHERE tsv @@ query
ORDER BY rank DESC
LIMIT 10;

The ts_rank_cd function calculates a rank score based on the frequency and proximity of the search terms in the document. This allows us to present the most relevant results to the user.

Elasticsearch: The Powerhouse for Complex Search

Elasticsearch is a distributed, RESTful search and analytics engine built on Apache Lucene. It excels at handling large volumes of data and complex search queries. Key advantages include:

Performance: Optimized for full-text search with inverted indices and distributed architecture.
Scalability: Easily scales horizontally by adding more nodes to the cluster.
Advanced Features: Offers a rich set of features, including relevance scoring, stemming, synonym handling, geo-spatial search, and more.
Analytics: Provides powerful aggregation and analytics capabilities.

However, Elasticsearch also has drawbacks:

Complexity: Requires managing a separate cluster, including installation, configuration, and monitoring.
Data Consistency: Not ACID compliant. Requires careful consideration of data synchronization between your primary database and Elasticsearch.
Cost: Can be expensive to operate, especially for large clusters.
Learning Curve: Mastering the query DSL and understanding the nuances of relevance scoring can take time.

Here’s an example of indexing a job listing in Elasticsearch using the REST API:

curl -X PUT "localhost:9200/job_listings/_doc/1" -H 'Content-Type: application/json' -d'
{
  "title": "Senior Software Engineer",
  "description": "Develop and maintain high-quality software.",
  "location": "Berlin"
}
'

And here’s an example of a search query using the Elasticsearch query DSL:

curl -X GET "localhost:9200/job_listings/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "description": "software engineer"
    }
  }
}
'

Elasticsearch offers much more sophisticated query options, including:

Boolean Queries: Combining multiple search criteria using must, should, and must_not clauses.
Fuzzy Queries: Allowing for typos and variations in search terms.
Range Queries: Searching for data within a specific range (e.g., salary).
Geo Queries: Searching for locations within a specific radius.

When to Choose PostgreSQL FTS

We recommend PostgreSQL FTS for scenarios where:

Your dataset is relatively small (e.g., less than 1 million records).
Your search requirements are simple (e.g., basic keyword search).
You prioritize simplicity and ACID compliance.
You want to avoid the overhead of managing a separate search server.

For example, if you’re building a small internal tool for searching through a limited set of job descriptions, PostgreSQL FTS might be sufficient.

When to Choose Elasticsearch

We recommend Elasticsearch for scenarios where:

Your dataset is large (e.g., millions or billions of records).
Your search requirements are complex (e.g., advanced relevance scoring, stemming, synonym handling).
You need high performance and scalability.
You require advanced analytics capabilities.

At MisuJob, we primarily use Elasticsearch to power our core job search functionality. Given the scale of our data and the complexity of our search requirements, Elasticsearch provides the performance and flexibility we need to deliver a great user experience.

Performance Benchmarking

To illustrate the performance differences, we conducted benchmark tests using a dataset of 1 million job listings. We measured the average query time for a simple keyword search on both PostgreSQL FTS and Elasticsearch.

Search Engine	Average Query Time (ms)
PostgreSQL FTS	250
Elasticsearch	50

These results clearly show that Elasticsearch is significantly faster than PostgreSQL FTS for this type of query on this dataset size. However, it’s important to note that performance can vary depending on the specific query, data size, and hardware configuration.

Salary Data Use Case

MisuJob provides salary insights across Europe. Let’s consider how we might use both PostgreSQL FTS and Elasticsearch to power salary searches.

Suppose we have a table containing salary data for different job titles and locations:

CREATE TABLE salary_data (
    id SERIAL PRIMARY KEY,
    job_title TEXT,
    location TEXT,
    salary_eur_min INTEGER,
    salary_eur_max INTEGER
);

We could use PostgreSQL FTS to search for salary data based on job title and location:

SELECT job_title, location, salary_eur_min, salary_eur_max
FROM salary_data
WHERE to_tsvector('english', job_title || ' ' || location) @@ to_tsquery('english', 'data scientist berlin');

However, if we want to provide more advanced features, such as:

Searching for similar job titles (e.g., “data analyst” when searching for “data scientist”).
Filtering by salary range.
Aggregating salary data by location.

Elasticsearch would be a better choice. We could index the salary data in Elasticsearch and use its powerful query DSL to perform these advanced searches and aggregations.

Here’s an example of how we present salary data for a “Software Engineer” role in various European cities, powered by Elasticsearch aggregations:

City	Average Salary (EUR)	Salary Range (EUR)
Berlin	65,000	55,000 - 75,000
London	75,000	65,000 - 85,000
Amsterdam	70,000	60,000 - 80,000
Paris	60,000	50,000 - 70,000
Stockholm	72,000	62,000 - 82,000

This data is dynamically generated based on aggregated salary information indexed in Elasticsearch.

Data Synchronization

When using Elasticsearch, it’s crucial to keep your data synchronized between your primary database (e.g., PostgreSQL) and Elasticsearch. There are several approaches to this:

Dual Writes: Writing data to both PostgreSQL and Elasticsearch in the same transaction. This provides strong consistency but can impact write performance.
Asynchronous Indexing: Using triggers or change data capture (CDC) to asynchronously update Elasticsearch whenever data changes in PostgreSQL. This provides better write performance but requires handling potential data inconsistencies.
Logstash: Using Logstash to stream data from PostgreSQL to Elasticsearch. This is a robust and scalable solution but adds complexity to your infrastructure.

At MisuJob, we use a combination of asynchronous indexing and Logstash to ensure data consistency between our PostgreSQL database and Elasticsearch cluster.

Key Takeaways

PostgreSQL FTS is a good choice for small datasets and simple search requirements where simplicity and ACID compliance are paramount.
Elasticsearch is the preferred choice for large datasets, complex search requirements, and advanced analytics.
Performance benchmarks demonstrate that Elasticsearch is significantly faster than PostgreSQL FTS for complex search operations on large datasets.
Data synchronization between your primary database and Elasticsearch is crucial for ensuring data consistency. Choose the approach that best suits your needs and infrastructure.
Consider the operational overhead and cost associated with managing a separate Elasticsearch cluster when making your decision.

By understanding the strengths and weaknesses of each technology, we can make informed decisions about when to use PostgreSQL FTS and when to leverage the power of Elasticsearch to deliver the best possible job search experience to our users across Europe.