How we made a single PostgreSQL instance handle 1M+ rows, 55 indexes, real-time matching, and 10K daily writes without breaking a sweat.
We run MisuJob, a job matching platform that processes over one million tech job listings. The entire system runs on a single PostgreSQL instance. No Elasticsearch. No Redis search. Just Postgres, a materialized view, and a lot of hard-learned lessons.
Here’s how we got query times from 30 seconds down to 50ms — and the mistakes that got us to 30 seconds in the first place.
The Problem
Every day, our system imports 10,000 to 100,000 new job listings from 50+ ATS platforms. Each job has a title, company, location, description, skills array, salary range, remote type, and a dozen other fields.
Users upload their CV, and our AI extracts their skills, experience, and preferences. Then we need to match them against all 1M+ active jobs and return results in under a second.
Sounds like a job for Elasticsearch, right? That’s what everyone told us. Here’s why we didn’t listen, and why we’re glad we didn’t.
Why Not Elasticsearch?
Three reasons:
Operational complexity. We run this on a single GCP VM. Adding an ES cluster means another system to monitor, back up, and debug at 3 AM when it runs out of heap.
Data consistency. Every job has an is_active flag, a visibility field, and a skills array that gets updated by multiple services. With ES, we'd need to keep two data stores in sync. With Postgres, there's one source of truth.
PostgreSQL is more capable than people think. With the right indexes, ILIKE on short strings is fast. ANY(array) with GIN indexes is fast. tsvector full-text search is fast. You just need to know what NOT to do.
Mistake #1: ILIKE on Long Text Fields
Our category pages (like Python Jobs or Remote Jobs) filter jobs using WHERE clauses. Our first version:
-- The query that took 30+ seconds
SELECT * FROM jobs
WHERE is_active = true
AND visibility = 'public'
AND (
'Python' = ANY(skills)
OR title ILIKE '%python%'
OR description ILIKE '%python%' -- THIS IS THE PROBLEM
)
ORDER BY posted_date DESC
LIMIT 50;
description ILIKE '%python%' on 1M rows means PostgreSQL has to read every single description (average 2KB each) and do a case-insensitive substring match. That’s a full sequential scan of ~2GB of data. Every. Single. Request.
The fix was brutal but effective: don’t search descriptions.
-- The query that takes <100ms
SELECT * FROM jobs
WHERE is_active = true
AND visibility = 'public'
AND (
'Python' = ANY(skills)
OR title ILIKE '%python%'
)
ORDER BY posted_date DESC NULLS LAST
LIMIT 50;
Why does this work? Because:
skills is a TEXT[] array with a GIN index → instant lookup
title is a short string (avg 50 chars) → ILIKE on short strings is fast even without an index
description is a long text field → ILIKE causes a full table scan
Lesson: Index-friendly filtering on structured fields will always beat full-text search on unstructured blobs. If you need description search, use tsvector with a GIN index — not ILIKE.
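If description search ever does become a requirement, here is a minimal sketch of the tsvector route. The column and index names are ours, not from our actual schema, and the generated-column syntax needs Postgres 12+:

```sql
-- Hypothetical: keep a precomputed tsvector alongside the description
ALTER TABLE jobs
  ADD COLUMN description_tsv tsvector
  GENERATED ALWAYS AS (to_tsvector('english', coalesce(description, ''))) STORED;

CREATE INDEX idx_jobs_description_tsv ON jobs USING GIN (description_tsv);

-- The search then goes through the GIN index instead of a sequential scan
SELECT id, title FROM jobs
WHERE description_tsv @@ plainto_tsquery('english', 'python');
```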
Mistake #2: COUNT(*) on Complex Filters
Every category page showed “Showing 1-50 of 12,847 Python jobs.” To get that number:
-- This query alone took 15-20 seconds
SELECT COUNT(*) FROM jobs
WHERE is_active = true
AND visibility = 'public'
AND ('Python' = ANY(skills) OR title ILIKE '%python%');
COUNT(*) with complex WHERE clauses is PostgreSQL’s kryptonite. It has to scan all matching rows just to count them. There’s no shortcut.
The fix: Don’t count. Estimate.
// Fetch pageSize + 1 to check if there are more results
const result = await readPool.query(`
SELECT id, title, company, location, ...
FROM jobs
WHERE is_active = true AND visibility = 'public'
AND ('Python' = ANY(skills) OR title ILIKE '%python%')
AND posted_date < $1
ORDER BY posted_date DESC NULLS LAST
LIMIT $2
`, [lastSeenDate, pageSize + 1]);
const hasMore = result.rows.length > pageSize;
const jobs = hasMore ? result.rows.slice(0, pageSize) : result.rows;
Does the user see “Showing Python jobs” without an exact count? Yes. Do they care? No. Does the page load 20 seconds faster? Yes.
Lesson: Exact counts are a luxury. If your query is slow, estimate.
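If the UI really does need a number, the planner's statistics give a cheap approximation of the table's row count with no scan at all. Note this estimates the whole table, not a filtered subset, and is only as fresh as the last ANALYZE:

```sql
-- Approximate row count from planner statistics (kept current by autovacuum/ANALYZE)
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'jobs';
```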
Mistake #3: OFFSET Pagination on Large Tables
-- Page 1: instant
SELECT * FROM jobs ORDER BY id LIMIT 50 OFFSET 0;
-- Page 100: 3 seconds
SELECT * FROM jobs ORDER BY id LIMIT 50 OFFSET 5000;
-- Page 1000: 30 seconds
SELECT * FROM jobs ORDER BY id LIMIT 50 OFFSET 50000;
OFFSET N means PostgreSQL fetches N + LIMIT rows, then throws away the first N. At OFFSET 50000, it’s scanning 50,050 rows to return 50. The deeper you paginate, the slower it gets.
The fix: Keyset pagination.
-- Always instant, regardless of "page" depth
SELECT * FROM job_matches
WHERE (user_id, job_id) > ($lastUserId, $lastJobId)
ORDER BY user_id, job_id
LIMIT 50;
With keyset pagination, PostgreSQL uses the index to jump directly to the right position. Page 1 and page 1,000 take the same time.
We use this for our DailyAgePenaltyRecalculator, which iterates over millions of job match records. Before keyset pagination: 83 seconds per page. After: < 5 seconds per page.
The index that makes this work:
CREATE INDEX idx_job_matches_stale_recalc
ON job_matches (user_id, job_id)
WHERE match_percentage >= 50;
Partial indexes are PostgreSQL’s secret weapon. This index only includes rows where match_percentage >= 50, which is exactly what the recalculator queries. Smaller index = faster lookups.
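The iteration loop around that query can be sketched in plain JavaScript. Here an in-memory array sorted by (user_id, job_id) stands in for the job_matches table; in production each fetchPage call would be one SQL query with WHERE (user_id, job_id) > ($1, $2):

```javascript
// Composite-key comparison, matching SQL row-value ordering:
// (a, b) > (x, y) means a > x, or a = x and b > y
function afterCursor(row, cursor) {
  if (!cursor) return true; // first page: no cursor yet
  return (
    row.user_id > cursor.user_id ||
    (row.user_id === cursor.user_id && row.job_id > cursor.job_id)
  );
}

// Stand-in for one keyset query; `rows` must be sorted by (user_id, job_id)
function fetchPage(rows, cursor, limit) {
  return rows.filter((r) => afterCursor(r, cursor)).slice(0, limit);
}

// Walk all pages, advancing the cursor to the last row of each page
function* paginate(rows, limit) {
  let cursor = null;
  for (;;) {
    const page = fetchPage(rows, cursor, limit);
    if (page.length === 0) return;
    yield page;
    cursor = page[page.length - 1];
  }
}
```

Because the cursor is the last row already seen, a crash mid-run can resume from the stored cursor instead of restarting from page one.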
Mistake #4: Refreshing Materialized Views During Peak Hours
We have a materialized view mv_job_search that pre-joins jobs with match scores for the search page. Refreshing it takes 2-3 minutes and locks the table.
-- This blocks all reads for 2-3 minutes
REFRESH MATERIALIZED VIEW mv_job_search;
During a refresh, every search query hangs. For 2-3 minutes. In production.
The fix: CONCURRENTLY.
-- This allows reads during refresh
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_job_search;
The CONCURRENTLY keyword requires a unique index on the view, but it allows reads while the refresh happens in the background. No more blocked queries.
-- Required for CONCURRENTLY
CREATE UNIQUE INDEX mv_job_search_user_job_idx
ON mv_job_search (user_id, job_id);
Lesson: Always use REFRESH MATERIALIZED VIEW CONCURRENTLY in production. The unique index requirement is a small price to pay.
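Triggering the concurrent refresh can be as simple as a timer in the app process. This is a sketch, not MisuJob's actual setup; `pool` is assumed to be a node-postgres Pool, and the interval is up to you:

```javascript
// One refresh; CONCURRENTLY keeps reads flowing while the view rebuilds
async function refreshOnce(pool) {
  await pool.query('REFRESH MATERIALIZED VIEW CONCURRENTLY mv_job_search');
}

// Fire the refresh on a fixed interval, logging (not throwing) on failure
function scheduleRefresh(pool, intervalMs) {
  return setInterval(() => {
    refreshOnce(pool).catch((err) =>
      console.error('mv_job_search refresh failed:', err)
    );
  }, intervalMs);
}
```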
The AI Matching Layer
Once PostgreSQL returns candidate jobs efficiently, the AI matching layer scores them. Here’s the simplified flow:
User uploads CV
│
v
LLM extracts: skills, experience, preferences, languages
│
v
PostgreSQL: Filter jobs by skills overlap + location + remote preference
│
v
Scoring engine: Weighted match (skills 40%, location 25%, experience 20%, recency 15%)
│
v
Age penalty: -0.5% per day since posting (jobs get stale)
│
v
Return top matches with explanation
The key insight: use the database for filtering, use application code for scoring. PostgreSQL is great at “give me all Python jobs in Berlin” but terrible at “rank these by how well they match this specific person’s career trajectory.”
The age penalty recalculation runs daily across all active matches. With keyset pagination and partial indexes, it processes millions of records in under 10 minutes.
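The scoring step above can be sketched as plain application code. The weights come from the flow diagram; the 0-to-1 component scale and the reading of "-0.5% per day" as half a percentage point per day are our assumptions:

```javascript
// Weights from the matching flow: skills 40%, location 25%,
// experience 20%, recency 15%
const WEIGHTS = { skills: 0.4, location: 0.25, experience: 0.2, recency: 0.15 };

// components: { skills, location, experience, recency }, each assumed 0..1
function matchScore(components) {
  let score = 0;
  for (const [key, weight] of Object.entries(WEIGHTS)) {
    score += weight * (components[key] ?? 0);
  }
  return score * 100; // express as a match percentage
}

// Age penalty: half a percentage point per day since posting, floored at 0
function applyAgePenalty(scorePct, daysSincePosting) {
  return Math.max(0, scorePct - 0.5 * daysSincePosting);
}
```

Keeping the weights in one table also makes the "with explanation" part cheap: each weighted component is a line item you can show the user.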
The Numbers
| Metric | Before | After |
|---|---|---|
| Category page load | 30+ seconds | < 100ms |
| Search query | 18-30 seconds | < 50ms (cached) |
| Daily recalculation | 83s/page | < 5s/page |
| MV refresh | Blocked reads 3 min | Zero blocking |
| Total active jobs | - | 1,000,000+ |
| PostgreSQL indexes | - | 55 |
| Uptime | Frequent timeouts | 99.9% |
55 Indexes: Too Many?
Yes, probably. But here’s the thing — on a write-heavy table that also needs fast reads across dozens of query patterns, you either add indexes or you add Elasticsearch. We chose indexes.
The trade-off is slower writes. Each INSERT updates 55 indexes. But our writes are batched (bulk imports at night) while reads are real-time (users browsing). The trade-off makes sense for our workload.
-- Some of our most impactful indexes
CREATE INDEX idx_jobs_active_public ON jobs (is_active, visibility)
WHERE is_active = true AND visibility = 'public';
CREATE INDEX idx_jobs_skills_gin ON jobs USING GIN (skills);
CREATE INDEX idx_jobs_content_dedup ON jobs (lower(title), lower(company))
WHERE is_active = true;
CREATE INDEX idx_jobs_posted_date ON jobs (posted_date DESC NULLS LAST)
WHERE is_active = true AND visibility = 'public';
What I’d Do Differently
Start with partial indexes. Full indexes on a 1M-row table waste space and slow writes. Most queries filter by is_active = true, so every index should have that as a WHERE clause.
Implement stale-while-revalidate caching from day one. Our PublicJobsCacheService serves cached data immediately and refreshes in the background. This turned 18-second cold calls into 50ms responses. We should have built this first, not after users complained.
Never use COUNT(*) on user-facing queries. Estimate. Always estimate.
Monitor slow queries in production. We log any query over 5 seconds. This caught the ILIKE '%...%' on description before it became a permanent problem.
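The stale-while-revalidate pattern fits in a few lines. This is a generic sketch, not the actual PublicJobsCacheService API; the key is that after the first fetch, callers always get an answer immediately, and expired entries trigger a background refresh instead of making anyone wait:

```javascript
// createSwrCache(fetcher, ttlMs): wraps an async fetcher with
// stale-while-revalidate semantics (names and shape are ours).
function createSwrCache(fetcher, ttlMs) {
  const entries = new Map(); // key -> { value, fetchedAt, refreshing }

  return async function get(key) {
    const entry = entries.get(key);
    if (!entry) {
      // Cold miss: only the very first caller waits for the fetch
      const value = await fetcher(key);
      entries.set(key, { value, fetchedAt: Date.now(), refreshing: false });
      return value;
    }
    const stale = Date.now() - entry.fetchedAt > ttlMs;
    if (stale && !entry.refreshing) {
      entry.refreshing = true;
      // Refresh in the background; callers keep getting the stale value
      fetcher(key)
        .then((value) =>
          entries.set(key, { value, fetchedAt: Date.now(), refreshing: false })
        )
        .catch(() => {
          entry.refreshing = false; // allow a retry on the next read
        });
    }
    return entry.value; // stale or fresh, always instant after the first fetch
  };
}
```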
Try It
You can see all of this in action at MisuJob. Browse 1M+ tech jobs, filter by technology, location, or work type. Every page loads in under 100ms, served by a single PostgreSQL instance.
If you’re considering adding Elasticsearch to your stack just for search — try optimizing Postgres first. You might be surprised.
Have you pushed PostgreSQL to its limits? We’d love to hear your optimization stories. Find us on LinkedIn or check out MisuJob.

