Building an AI Job Matching Engine: From CV Upload to Ranked Results in Under a Second

Upload a CV. Get matched to 1,000,000+ positions. See why each job is a good fit. All in under a second.

Here’s how we built the matching engine behind MisuJob — and why the scoring algorithm matters more than the AI model.

The Pipeline

CV Upload (PDF/DOCX)
    │
    ▼
LLM Extraction
    │  → Skills: ["Python", "PostgreSQL", "AWS", "Docker"]
    │  → Experience: 5 years
    │  → Languages: ["English", "German"]
    │  → Preferences: remote, Berlin, €60-80K
    │
    ▼
PostgreSQL Filtering
    │  → WHERE skills overlap + location match + remote preference
    │  → ~2,000 candidate jobs (from 1M)
    │
    ▼
Scoring Engine
    │  → Weighted multi-factor scoring
    │  → Skills: 40%, Location: 25%, Experience: 20%, Recency: 15%
    │
    ▼
Age Penalty
    │  → -0.5% per day since posting
    │
    ▼
Top 50 Matches with Explanations
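
To make the hand-off between the extraction step and the database filter concrete, here is a minimal sketch of what the extracted profile could look like as a TypeScript type. The interface name, the field names, and the validation helper are assumptions for illustration, not the production schema; the example values are the ones from the diagram above.

```typescript
// Hypothetical shape of the LLM extraction output (all field names are assumptions).
interface ExtractedProfile {
  skills: string[];
  experienceYears: number;
  languages: string[];
  preferences: {
    remoteType: string | null; // e.g. "remote"
    location: string | null;   // e.g. "Berlin"
    salaryMinEur: number | null;
    salaryMaxEur: number | null;
  };
}

// The example values from the pipeline diagram.
const exampleProfile: ExtractedProfile = {
  skills: ["Python", "PostgreSQL", "AWS", "Docker"],
  experienceYears: 5,
  languages: ["English", "German"],
  preferences: {
    remoteType: "remote",
    location: "Berlin",
    salaryMinEur: 60000,
    salaryMaxEur: 80000,
  },
};

// LLM output is untrusted text, so check its shape before using it in a query.
function isExtractedProfile(value: unknown): value is ExtractedProfile {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const isStringArray = (x: unknown): boolean =>
    Array.isArray(x) && x.every((s) => typeof s === "string");
  return (
    isStringArray(v.skills) &&
    isStringArray(v.languages) &&
    typeof v.experienceYears === "number" &&
    typeof v.preferences === "object" &&
    v.preferences !== null
  );
}
```

Validating the shape up front means a malformed extraction fails early instead of producing an empty or misleading result set.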

Why Database First, AI Second

The naive approach: send all 1M jobs to an AI model for scoring. That’s insane. Even at an optimistic 1ms per comparison, 1M comparisons take 1,000 seconds, almost 17 minutes per CV.

Instead, we use PostgreSQL as a coarse filter:

SELECT id, title, company, location, skills, posted_date
FROM jobs
WHERE is_active = true
  AND visibility = 'public'
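  AND skills && $1::text[]              -- Array overlap: at least one shared skill
  AND (remote_type = $2 OR $2 IS NULL)  -- NULL disables the remote filter
  AND (location ILIKE $3 OR $3 IS NULL) -- $3 is a pattern such as '%berlin%'
LIMIT 2000;                             -- No ORDER BY: which 2,000 rows is unspecified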

The && operator, backed by a GIN index on skills, keeps this fast: from 1M jobs, we get ~2,000 candidates in under 50ms.
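
As a rough sketch of the plumbing around that query, the snippet below shows one way the index could be declared and how the three parameters might be built from a profile. The index name, the FilterInput type, and both helper functions are hypothetical. Two details are worth keeping in mind: array overlap compares text elements case-sensitively, so skills should be normalized the same way on both sides, and user-supplied location text should have its LIKE wildcards escaped before being wrapped in %.

```typescript
// Hypothetical index DDL; the index name is an assumption.
const CREATE_SKILLS_INDEX =
  "CREATE INDEX idx_jobs_skills ON jobs USING GIN (skills);";

interface FilterInput {
  skills: string[];
  remoteType: string | null;
  location: string | null;
}

// Escape LIKE/ILIKE wildcards (backslash is PostgreSQL's default escape character)
// so that user input is matched literally.
function escapeLike(text: string): string {
  return text.replace(/[\\%_]/g, (ch) => "\\" + ch);
}

// Returns the values for $1, $2 and $3 in the filter query.
// A null entry disables the corresponding filter via the "OR $n IS NULL" branch.
function buildFilterParams(
  input: FilterInput
): [string[], string | null, string | null] {
  const pattern =
    input.location === null ? null : `%${escapeLike(input.location)}%`;
  return [input.skills, input.remoteType, pattern];
}
```

With a PostgreSQL client such as node-postgres, the returned array would be passed as the query's parameter list.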

Then the scoring engine runs on those 2,000 — a much more tractable problem.

The Scoring Formula

function calculateMatch(job: Job, profile: UserProfile): number {
  const skillScore = calculateSkillOverlap(job.skills, profile.skills);
  const locationScore = calculateLocationMatch(job.location, profile.preferences);
  const experienceScore = calculateExperienceMatch(job, profile.experience);
  const recencyScore = calculateRecency(job.posted_date);

  const raw = (skillScore * 0.40) +
              (locationScore * 0.25) +
              (experienceScore * 0.20) +
              (recencyScore * 0.15);

  // Age penalty: a multiplier that drops 0.5% per day and reaches 0 at 200 days
  const daysOld = daysSince(job.posted_date);
  const agePenalty = Math.max(0, 1 - (daysOld * 0.005));

  return raw * agePenalty;
}
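
calculateMatch calls several helpers that are not shown. The sketch below is one plausible way to write three of them, not the production code: the case-insensitive skill comparison and the 30-day recency window are assumptions chosen for illustration, and an optional now parameter is added so the functions are easy to test.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days elapsed since a date (never negative).
function daysSince(date: Date, now: Date = new Date()): number {
  const elapsed = now.getTime() - date.getTime();
  return Math.max(0, Math.floor(elapsed / MS_PER_DAY));
}

// Fraction of the job's distinct skills that the candidate has (0 to 1).
// Assumption: skills are compared case-insensitively after trimming.
function calculateSkillOverlap(
  jobSkills: string[],
  profileSkills: string[]
): number {
  if (jobSkills.length === 0) return 0;
  const normalize = (s: string): string => s.trim().toLowerCase();
  const have = new Set(profileSkills.map(normalize));
  const required = Array.from(new Set(jobSkills.map(normalize)));
  const matched = required.filter((skill) => have.has(skill)).length;
  return matched / required.length;
}

// Linear decay from 1 (posted today) to 0 (windowDays or older).
// Assumption: the 30-day default window is illustrative only.
function calculateRecency(
  postedDate: Date,
  now: Date = new Date(),
  windowDays: number = 30
): number {
  return Math.max(0, 1 - daysSince(postedDate, now) / windowDays);
}
```

Every helper returns a value between 0 and 1, which is what lets the weights in calculateMatch sum to a score on the same scale.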

Why recency matters: A 90% skill match posted today beats a 95% match posted 60 days ago, because the older position is likely filled already.
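
The arithmetic behind that comparison can be checked in a few lines. This sketch reuses the multiplier from calculateMatch and assumes, for simplicity, that "90%" and "95%" are the raw weighted scores before the penalty is applied.

```typescript
// Same multiplier as in calculateMatch: lose 0.5% per day, never below 0.
function agePenalty(daysOld: number): number {
  return Math.max(0, 1 - daysOld * 0.005);
}

// Assumption: 0.90 and 0.95 are the raw weighted scores of the two jobs.
const freshScore = 0.90 * agePenalty(0);  // multiplier is 1.0 on day 0
const staleScore = 0.95 * agePenalty(60); // multiplier is about 0.7 after 60 days

console.log(freshScore.toFixed(3), staleScore.toFixed(3));
```

After 60 days the stale job keeps about 70% of its raw score, so it lands near 0.665 and ranks well below the fresh job at 0.900.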

Why the age penalty is separate: It’s recalculated daily in batch (not per-request), so match scores naturally decay over time.

The Daily Recalculation

Every night, we recalculate age penalties for all active matches:

// Keyset pagination: each query resumes after the last (user_id, job_id) seen
let lastUserId = 0, lastJobId = 0;

while (true) {
  const batch = await pool.query(`
    SELECT user_id, job_id, match_percentage, calculated_at
    FROM job_matches
    WHERE match_percentage >= 50
      AND (user_id, job_id) > ($1, $2)
    ORDER BY user_id, job_id
    LIMIT 1000
  `, [lastUserId, lastJobId]);

  if (batch.rows.length === 0) break;

  // Recalculate age penalty for each match
  for (const match of batch.rows) {
    const newScore = applyAgePenalty(match);
    await updateMatchScore(match.user_id, match.job_id, newScore);
  }

  lastUserId = batch.rows.at(-1).user_id;
  lastJobId = batch.rows.at(-1).job_id;
}

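This processes millions of records in under 10 minutes. Keyset pagination is what keeps it fast: given an index on (user_id, job_id), the row comparison lets PostgreSQL seek straight to the next batch, whereas OFFSET would scan and discard every row already processed.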
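
The row comparison (user_id, job_id) > ($1, $2) is lexicographic: it compares user_id first and falls back to job_id only on a tie. The in-memory sketch below mimics that behavior so the cursor logic can be tested without a database. The names MatchKey, isAfter, fetchPage, and paginateAll are hypothetical, and the array filter merely stands in for the SQL query.

```typescript
interface MatchKey {
  user_id: number;
  job_id: number;
}

// Lexicographic comparison, mirroring SQL's (user_id, job_id) > ($1, $2).
function isAfter(row: MatchKey, cursor: MatchKey): boolean {
  return (
    row.user_id > cursor.user_id ||
    (row.user_id === cursor.user_id && row.job_id > cursor.job_id)
  );
}

// Stand-in for the SQL query; rows must already be sorted by (user_id, job_id).
function fetchPage<T extends MatchKey>(
  sorted: T[],
  cursor: MatchKey,
  limit: number
): T[] {
  return sorted.filter((row) => isAfter(row, cursor)).slice(0, limit);
}

// Walks the whole data set page by page, advancing the cursor each time.
function paginateAll<T extends MatchKey>(sorted: T[], limit: number): T[][] {
  const pages: T[][] = [];
  let cursor: MatchKey = { user_id: 0, job_id: 0 };
  while (true) {
    const page = fetchPage(sorted, cursor, limit);
    if (page.length === 0) break;
    pages.push(page);
    const last = page[page.length - 1];
    cursor = { user_id: last.user_id, job_id: last.job_id };
  }
  return pages;
}
```

Because the cursor is a pair of key values rather than a row offset, rows inserted or deleted while the job runs cannot shift the page boundaries and cause a match to be processed twice.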

What We Learned

The model doesn’t matter as much as the scoring weights. We spent weeks tuning skill overlap vs location vs experience weights. The LLM extraction was the easy part.
Users care about “why,” not just “what.” Showing “85% match: 5/6 skills match, location matches, posted 3 days ago” gets more engagement than showing a bare number.
Recency is king. A stale job with a perfect skill match is worse than a fresh job with an 80% skill match. People want to apply to positions that are actually open.
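
The “why” string from the second lesson is cheap to generate once the individual factors are known. Here is a minimal sketch of such a formatter; the input type and function name are hypothetical, and the wording simply follows the example quoted above.

```typescript
// Hypothetical input: the per-factor facts behind one match.
interface MatchExplanationInput {
  matchPercentage: number; // 0 to 100
  matchedSkills: number;
  totalSkills: number;
  locationMatches: boolean;
  daysSincePosted: number;
}

// Builds a one-line, human-readable reason for a match.
function explainMatch(m: MatchExplanationInput): string {
  const posted =
    m.daysSincePosted === 0
      ? "posted today"
      : `posted ${m.daysSincePosted} day${m.daysSincePosted === 1 ? "" : "s"} ago`;
  const parts = [
    `${m.matchedSkills}/${m.totalSkills} skills match`,
    m.locationMatches ? "location matches" : "location differs",
    posted,
  ];
  return `${Math.round(m.matchPercentage)}% match: ${parts.join(", ")}`;
}

console.log(
  explainMatch({
    matchPercentage: 85,
    matchedSkills: 5,
    totalSkills: 6,
    locationMatches: true,
    daysSincePosted: 3,
  })
);
// → 85% match: 5/6 skills match, location matches, posted 3 days ago
```

Building the explanation from the same factors that feed the score keeps the text and the number from drifting apart.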

Try it yourself: upload your CV at MisuJob and see your matches.

Building recommendation systems? We’d love to hear how you handle scoring and ranking.