---
license: apache-2.0
base_model: LiquidAI/LFM2.5-1.2B-Instruct
tags:
  - recruitment
  - cv-matching
  - keyword-extraction
  - resume-screening
  - hr-tech
  - sft
  - dpo
  - lora
  - unsloth
datasets:
  - custom
language:
  - en
pipeline_tag: text-generation
model-index:
  - name: LFM2.5-1.2B-MOAT
    results:
      - task:
          type: text-generation
          name: CV-JD Assessment
        metrics:
          - name: Score MAE
            type: mae
            value: 6.82
          - name: JSON Parse Rate
            type: accuracy
            value: 99.9
          - name: Verdict Accuracy
            type: accuracy
            value: 76.8
          - name: Score Bias
            type: custom
            value: 1.53
---

# LFM2.5-1.2B-MOAT

**M**ulti-task **O**ptimized **A**ssessment **T**ool: a fine-tuned [LiquidAI/LFM2.5-1.2B-Instruct](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct) model for recruitment tasks.

Handles two tasks with a single model:
1. **CV-JD Assessment** — Match scoring + qualitative analysis
2. **Keyword Extraction** — Structured keyword extraction from job descriptions and CVs

## Training

- **Base model**: LiquidAI/LFM2.5-1.2B-Instruct (1.2B params, hybrid of gated short-convolution and grouped-query attention blocks)
- **Stage 1 — Multi-task SFT**: 39,641 examples (19,588 assessments + 20,053 keywords), LoRA r=32/α=64, 1 epoch, LR=5e-5
- **Stage 2 — Targeted DPO**: 2,374 pairs filtered to problematic cases (|score diff| ≥ 5 pts), LoRA r=16/α=32, beta=0.2, LR=5e-6
- **Hardware**: NVIDIA RTX 5080 16GB, total training time ~3.5 hours
- **Training data**: Gemini-generated assessments and keyword extractions across tech, healthcare, finance, and blue-collar domains
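The Stage 2 filter can be illustrated with a short sketch. This is not the author's pipeline: it assumes each candidate pair records the Stage 1 SFT model's predicted `match_score` and the reference (Gemini) score, that "score diff" means their absolute difference, and that the reference output is the chosen response. The `PreferencePair` fields and `filter_problematic_pairs` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PreferencePair:
    # Hypothetical record layout for one CV-JD example.
    prompt: str
    chosen: str       # assumed: reference (Gemini) JSON output
    rejected: str     # assumed: Stage 1 SFT model's JSON output
    ref_score: float  # match_score in the reference output
    sft_score: float  # match_score predicted by the SFT model


def filter_problematic_pairs(
    pairs: Iterable[PreferencePair], min_diff: float = 5.0
) -> List[PreferencePair]:
    """Keep only pairs where the SFT score misses the reference by >= min_diff points."""
    return [p for p in pairs if abs(p.sft_score - p.ref_score) >= min_diff]


pairs = [
    PreferencePair("p1", "ref1", "sft1", ref_score=72.0, sft_score=70.0),  # off by 2: dropped
    PreferencePair("p2", "ref2", "sft2", ref_score=20.0, sft_score=31.0),  # off by 11: kept
    PreferencePair("p3", "ref3", "sft3", ref_score=88.0, sft_score=83.0),  # off by 5: kept
]
kept = filter_problematic_pairs(pairs)
print([p.prompt for p in kept])  # ['p2', 'p3']
```

Filtering this way focuses the DPO signal on examples where the SFT model's score was clearly wrong, rather than spending preference updates on near-identical outputs.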

## Performance

### CV-JD Assessment (4,898 held-out samples)

| Metric | V1 Baseline | MOAT V2 | Target |
|--------|------------|---------|--------|
| JSON Parse Rate | 97.0% | **99.9%** | ≥95% |
| Score MAE | 13.1 pts | **6.82 pts** | <8 pts |
| Score Bias | -13.0 pts | **+1.53 pts** | ~0 |
| Verdict Accuracy | 50.0% | **76.8%** | >60% |
| Within 5 pts | — | **51.4%** | — |
| Within 10 pts | — | **77.5%** | — |
| Median Absolute Error | — | **4.90 pts** | — |
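The score metrics in this table can be computed from paired predictions and reference scores roughly as follows. This is a sketch under assumptions: Score Bias is taken as the mean signed error (predicted minus reference, so a positive value means over-scoring), and "within N pts" is counted inclusively; the card does not state either convention. `score_metrics` is a hypothetical helper.

```python
import statistics
from typing import Dict, Sequence


def score_metrics(pred: Sequence[float], ref: Sequence[float]) -> Dict[str, float]:
    """MAE, signed bias, median absolute error and within-N rates for match_score."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("pred and ref must be non-empty and the same length")
    errors = [p - r for p, r in zip(pred, ref)]  # signed error (assumed sign convention)
    abs_errors = [abs(e) for e in errors]
    n = len(errors)
    return {
        "mae": sum(abs_errors) / n,
        "bias": sum(errors) / n,  # > 0 means the model scores too high on average
        "median_ae": statistics.median(abs_errors),
        "within_5": sum(e <= 5 for e in abs_errors) / n,   # inclusive (assumed)
        "within_10": sum(e <= 10 for e in abs_errors) / n,  # inclusive (assumed)
    }


pred = [80.0, 45.0, 62.0, 20.0]
ref = [75.0, 50.0, 50.0, 21.0]
print(score_metrics(pred, ref))
```

Reporting both MAE and signed bias matters here: the V1 baseline's large negative bias and MOAT V2's near-zero bias describe systematic under- versus balanced scoring, which MAE alone cannot distinguish.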

### Keyword Extraction (10 diverse samples across domains)

| Field | Accuracy |
|-------|----------|
| JSON Parse Rate | 100% |
| Schema Complete | 100% |
| Experience Years | 100% |
| Domain | 90% |
| Education | 80% |
| Seniority | 80% |
| Skills (avg F1) | 0.58 |

Skills F1 varies by domain: white-collar roles score 0.74-0.84, while blue-collar and healthcare roles score 0.33-0.58. The model generally extracts the correct skills but sometimes phrases them at a different granularity than the reference labels.
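The card does not say how skills are matched when computing F1. A minimal sketch assuming exact set matching after lowercasing and trimming shows why granularity differences are penalized: "forklift operation" and "forklift" count as a miss even though they describe the same skill. `skills_f1` is a hypothetical helper.

```python
from typing import Iterable


def skills_f1(predicted: Iterable[str], reference: Iterable[str]) -> float:
    """Set-based F1 between skill lists, using exact match after normalization (assumed)."""
    pred = {s.strip().lower() for s in predicted if s.strip()}
    ref = {s.strip().lower() for s in reference if s.strip()}
    if not pred and not ref:
        return 1.0  # both empty: treat as perfect agreement
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)


# Equivalent skills at a different granularity: only "osha safety" matches exactly.
pred = ["Forklift operation", "OSHA safety", "pallet jack operation"]
ref = ["forklift", "osha safety", "pallet jack"]
print(round(skills_f1(pred, ref), 3))  # 0.333
```

A fuzzier matcher (for example, token overlap or embedding similarity) would likely raise the blue-collar scores, but it would also be a different metric from the one reported above.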

## Usage with vLLM

```python
from vllm import LLM, SamplingParams

model = LLM(
    model="GazTrab/LFM2.5-1.2B-MOAT",
    max_model_len=4096,
    gpu_memory_utilization=0.85,
    dtype="bfloat16",
    trust_remote_code=True,
    max_num_seqs=64,
)
tokenizer = model.get_tokenizer()

sampling_params = SamplingParams(
    temperature=0.1,
    top_p=0.1,
    top_k=50,
    repetition_penalty=1.05,
    max_tokens=2048,
)

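# Build the prompt with the chat template.
# SYSTEM_PROMPT and USER_PROMPT are placeholders for the task prompts in "Task Prompts" below.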
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": USER_PROMPT},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = model.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```

### Important Notes

- **max_model_len=4096** — the model was trained with this context length
- **temperature=0.1, top_p=0.1** — low temperature for consistent structured output
- **trust_remote_code=True** — required for the LFM2.5 architecture (hybrid short-convolution + grouped-query attention)
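- Prompts longer than ~2048 tokens should be truncated, since the remaining 2048 tokens of the 4096-token context are reserved for generation (`max_tokens=2048`)
- The model emits raw JSON without markdown fences, so the output can be passed directly to a JSON parser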
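The truncation advice in these notes can be sketched in a tokenizer-agnostic way. `truncate_to_budget` is a hypothetical helper, and the toy whitespace tokenizer only exists so the example runs without model files; with the real model you would pass something like `lambda s: tokenizer.encode(s, add_special_tokens=False)` and `tokenizer.decode`. The 300-token overhead for the system prompt and template is an assumed figure; measure it for your actual prompts.

```python
from typing import Callable, Dict, List

MAX_MODEL_LEN = 4096
MAX_NEW_TOKENS = 2048
PROMPT_BUDGET = MAX_MODEL_LEN - MAX_NEW_TOKENS  # tokens left for system + user prompt


def truncate_to_budget(
    text: str,
    encode: Callable[[str], List[int]],
    decode: Callable[[List[int]], str],
    max_tokens: int,
) -> str:
    """Keep at most `max_tokens` tokens from the start of `text`."""
    ids = encode(text)
    if len(ids) <= max_tokens:
        return text
    return decode(ids[:max_tokens])


# Toy whitespace "tokenizer" (illustration only).
word_to_id: Dict[str, int] = {}
id_to_word: List[str] = []


def toy_encode(s: str) -> List[int]:
    ids = []
    for word in s.split():
        if word not in word_to_id:
            word_to_id[word] = len(id_to_word)
            id_to_word.append(word)
        ids.append(word_to_id[word])
    return ids


def toy_decode(ids: List[int]) -> str:
    return " ".join(id_to_word[i] for i in ids)


cv_text = " ".join(f"line{i}" for i in range(5000))
overhead = 300  # assumed cost of system prompt + template tokens
short_cv = truncate_to_budget(cv_text, toy_encode, toy_decode, PROMPT_BUDGET - overhead)
print(len(toy_encode(short_cv)))  # 1748
```

Truncating the CV or JD text, rather than the fully assembled prompt, keeps the template and the closing "Respond with JSON only:" line intact.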

## Usage with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "GazTrab/LFM2.5-1.2B-MOAT"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# SYSTEM_PROMPT and USER_PROMPT are placeholders for the task prompts in "Task Prompts" below.
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": USER_PROMPT},
]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=2048,
    temperature=0.1,
    top_p=0.1,
    top_k=50,
    repetition_penalty=1.05,
    do_sample=True,
)
response = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```

## Task Prompts

### Task 1: CV-JD Assessment

**System prompt:**
```
You are an expert recruitment AI that analyzes CV-JD compatibility.
You MUST respond with valid JSON only. No additional text before or after the JSON.

Output schema:
{
  "match_score": <float 0-100>,
  "executive_summary": "<2-3 sentence overview>",
  "strengths": ["<quantified strength 1>", "<quantified strength 2>", ...],
  "gaps": ["<specific gap 1>", "<specific gap 2>", ...],
  "recommendation": "Interview|Consider|Not recommended",
  "verdict": "STRONG_MATCH|GOOD_MATCH|MODERATE_MATCH|WEAK_MATCH|NOT_SUITABLE"
}

Guidelines:
- Be specific and quantified in strengths/gaps (e.g., "5/7 required skills", "3 years below requirement")
- Reference actual skills from the JD and CV
- Verdict must align with match_score brackets
- Keep strengths and gaps to 2-4 items each
```
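Because the model returns raw JSON, a light validation step can catch the rare parse failure (the held-out parse rate is 99.9%) and schema drift before results reach downstream systems. The sketch below is a possible post-processing step, not part of the model release: `validate_assessment` is a hypothetical helper, and its checks mirror the schema and guidelines in the system prompt using only the standard library.

```python
import json
from typing import Any, Dict, List

RECOMMENDATIONS = {"Interview", "Consider", "Not recommended"}
VERDICTS = {"STRONG_MATCH", "GOOD_MATCH", "MODERATE_MATCH", "WEAK_MATCH", "NOT_SUITABLE"}


def _is_number(x: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def validate_assessment(raw: str) -> Dict[str, Any]:
    """Parse a Task 1 response; raise ValueError if it breaks the prompt's schema."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("response is not a JSON object")
    problems: List[str] = []
    score = data.get("match_score")
    if not _is_number(score) or not 0 <= score <= 100:
        problems.append("match_score must be a number in [0, 100]")
    if not isinstance(data.get("executive_summary"), str):
        problems.append("executive_summary must be a string")
    for key in ("strengths", "gaps"):
        items = data.get(key)
        # The prompt asks for 2-4 items; relax this if you only need valid JSON.
        if not isinstance(items, list) or not 2 <= len(items) <= 4:
            problems.append(f"{key} should be a list of 2-4 items")
    if not isinstance(data.get("recommendation"), str) or data["recommendation"] not in RECOMMENDATIONS:
        problems.append("unknown recommendation")
    if not isinstance(data.get("verdict"), str) or data["verdict"] not in VERDICTS:
        problems.append("unknown verdict")
    if problems:
        raise ValueError("; ".join(problems))
    return data


raw = json.dumps({
    "match_score": 72.5,
    "executive_summary": "Strong Python background; limited cloud exposure.",
    "strengths": ["5/7 required skills", "6 years of Python vs 4 required"],
    "gaps": ["No Kubernetes experience", "No AWS certification"],
    "recommendation": "Interview",
    "verdict": "GOOD_MATCH",
})
print(validate_assessment(raw)["verdict"])  # GOOD_MATCH
```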

**User prompt format:**
```
Analyze the following CV against the Job Description and provide a structured assessment.

=== JOB DESCRIPTION ===
{jd_text}

=== CANDIDATE CV ===
{cv_text}

Respond with JSON only:
```
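Filling this template and assembling the chat messages can be sketched as follows. `ASSESSMENT_SYSTEM_PROMPT` is a placeholder for the full Task 1 system prompt (paste it verbatim), `build_assessment_messages` is a hypothetical helper, and the exact whitespace is assumed to match the template shown. `str.format` is safe for the user template because its only braces are the `{jd_text}` and `{cv_text}` placeholders, and substituted values are not re-parsed, so braces inside a CV are harmless. The system prompt contains literal JSON braces, so it is used as-is rather than formatted.

```python
from typing import Dict, List

# Placeholder: replace with the full Task 1 system prompt, copied verbatim.
ASSESSMENT_SYSTEM_PROMPT = "You are an expert recruitment AI that analyzes CV-JD compatibility. ..."

ASSESSMENT_USER_TEMPLATE = (
    "Analyze the following CV against the Job Description and provide a structured assessment.\n"
    "\n"
    "=== JOB DESCRIPTION ===\n"
    "{jd_text}\n"
    "\n"
    "=== CANDIDATE CV ===\n"
    "{cv_text}\n"
    "\n"
    "Respond with JSON only:"
)


def build_assessment_messages(jd_text: str, cv_text: str) -> List[Dict[str, str]]:
    """Return chat messages ready for tokenizer.apply_chat_template."""
    user = ASSESSMENT_USER_TEMPLATE.format(jd_text=jd_text.strip(), cv_text=cv_text.strip())
    return [
        {"role": "system", "content": ASSESSMENT_SYSTEM_PROMPT},
        {"role": "user", "content": user},
    ]


messages = build_assessment_messages(
    "Senior Python developer, 5+ years, Django, PostgreSQL.",
    "7 years of Python; built Django services backed by PostgreSQL.",
)
print(messages[1]["content"])
```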

**Verdict-to-score mapping:**

| Verdict | Score Range |
|---------|-------------|
| STRONG_MATCH | 85-100 |
| GOOD_MATCH | 70-84 |
| MODERATE_MATCH | 50-69 |
| WEAK_MATCH | 30-49 |
| NOT_SUITABLE | 0-29 |
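The mapping can be used to check that a response's `verdict` agrees with its `match_score`, or as one possible way to derive verdicts from scores. A minimal sketch; the table uses integer bounds, so how fractional scores such as 84.5 are assigned is an assumption (here each lower bound is inclusive, so 84.5 falls in GOOD_MATCH). `score_to_verdict` and `verdict_is_consistent` are hypothetical helpers.

```python
from typing import List, Tuple

# Inclusive lower bound of each bracket, highest first (from the table above).
VERDICT_BRACKETS: List[Tuple[float, str]] = [
    (85.0, "STRONG_MATCH"),
    (70.0, "GOOD_MATCH"),
    (50.0, "MODERATE_MATCH"),
    (30.0, "WEAK_MATCH"),
    (0.0, "NOT_SUITABLE"),
]


def score_to_verdict(score: float) -> str:
    """Map a match_score in [0, 100] to its verdict bracket."""
    if not 0 <= score <= 100:
        raise ValueError(f"match_score out of range: {score}")
    for lower, verdict in VERDICT_BRACKETS:
        if score >= lower:
            return verdict
    return "NOT_SUITABLE"  # not reached: the 0.0 bracket covers every valid score


def verdict_is_consistent(response: dict) -> bool:
    """True if the response's verdict matches the bracket of its match_score."""
    return score_to_verdict(float(response["match_score"])) == response["verdict"]


print(score_to_verdict(84.5))  # GOOD_MATCH
```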

### Task 2: Keyword Extraction

**System prompt:**
```
You are an expert recruitment AI that extracts structured keywords from documents.
You MUST respond with valid JSON only. No additional text before or after the JSON.

Output schema:
{
  "skills": ["<skill 1>", "<skill 2>", ...],
  "experience_years": <integer>,
  "education": "<phd|master|bachelor|associate|diploma|certificate|high_school|none>",
  "certifications": ["<cert 1>", "<cert 2>", ...],
  "domain": "<2-4 word domain>",
  "seniority": "<intern|junior|mid|senior|lead|principal|director|manager>"
}

Guidelines:
- Extract only explicitly stated skills, not inferred ones
- For CVs: infer experience_years from work history dates
- For JDs: use the stated requirement, or 0 if not specified
- Skills should be lowercase
- Keep domain to 2-4 words
```
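A companion check for Task 2 output follows, again a standard-library sketch with a hypothetical helper name (`check_keywords`). It encodes the enumerations and guidelines from this system prompt: allowed `education` and `seniority` values, a non-negative integer `experience_years` (0 is valid for JDs with no stated requirement), lowercase skills, and a 2-4 word `domain`. It returns a list of problems instead of raising, which makes drift easy to log.

```python
import json
from typing import Any, List

EDUCATION = {"phd", "master", "bachelor", "associate", "diploma", "certificate", "high_school", "none"}
SENIORITY = {"intern", "junior", "mid", "senior", "lead", "principal", "director", "manager"}


def _in_enum(value: Any, allowed: set) -> bool:
    # Guard against non-string values (e.g. lists), which are unhashable for set lookup
    return isinstance(value, str) and value in allowed


def check_keywords(raw: str) -> List[str]:
    """Return schema problems for a Task 2 response; an empty list means it passed."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        return ["response is not a JSON object"]
    problems = []
    skills = data.get("skills")
    if not isinstance(skills, list) or not all(isinstance(s, str) for s in skills):
        problems.append("skills must be a list of strings")
    elif any(s != s.lower() for s in skills):
        problems.append("skills should be lowercase")
    years = data.get("experience_years")
    if not isinstance(years, int) or isinstance(years, bool) or years < 0:
        problems.append("experience_years must be a non-negative integer")
    if not _in_enum(data.get("education"), EDUCATION):
        problems.append("unknown education level")
    if not isinstance(data.get("certifications"), list):
        problems.append("certifications must be a list")
    domain = data.get("domain")
    if not isinstance(domain, str) or not 2 <= len(domain.split()) <= 4:
        problems.append("domain should be 2-4 words")
    if not _in_enum(data.get("seniority"), SENIORITY):
        problems.append("unknown seniority level")
    return problems


sample = (
    '{"skills": ["python", "SQL"], "experience_years": 5, "education": "bachelor", '
    '"certifications": [], "domain": "backend software engineering", "seniority": "senior"}'
)
print(check_keywords(sample))  # ['skills should be lowercase']
```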

**User prompt format (for JDs):**
```
Extract structured keywords from the following Job Description.

=== JOB DESCRIPTION ===
{jd_text}

Respond with JSON only:
```

**User prompt format (for CVs):**
```
Extract structured keywords from the following CV/Resume.

=== CANDIDATE CV ===
{cv_text}

Respond with JSON only:
```
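The two Task 2 templates differ only in the instruction line and the section header, so they can be selected by document type. A minimal sketch; `build_keyword_prompt` and the `"jd"`/`"cv"` keys are hypothetical names, and the whitespace is assumed to match the templates above.

```python
KEYWORD_TEMPLATES = {
    "jd": ("Extract structured keywords from the following Job Description.", "=== JOB DESCRIPTION ==="),
    "cv": ("Extract structured keywords from the following CV/Resume.", "=== CANDIDATE CV ==="),
}


def build_keyword_prompt(doc_type: str, text: str) -> str:
    """Fill the Task 2 user prompt for a job description ("jd") or a CV ("cv")."""
    try:
        instruction, header = KEYWORD_TEMPLATES[doc_type]
    except KeyError:
        raise ValueError(f"doc_type must be 'jd' or 'cv', got {doc_type!r}") from None
    return f"{instruction}\n\n{header}\n{text.strip()}\n\nRespond with JSON only:"


print(build_keyword_prompt("cv", "Electrician, 8 years, NICEIC certified."))
```

The result is the user message; pair it with the Task 2 system prompt and pass both through `apply_chat_template` as in the usage examples.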

## Limitations

- **Low-score bias**: For reference scores in the 0-20 range, predictions are overestimated by ~8 points on average; the model rarely outputs scores below ~17
- **Blue-collar granularity**: Keyword extraction for trade and blue-collar roles sometimes produces overly verbose skill descriptions
- **Training data domains**: Training data is concentrated in tech, healthcare, and finance; the model generalizes to other domains, but with somewhat lower quality
- **Context length**: Long CVs or JDs may need truncation to stay within the ~2048-token prompt budget
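A low-score bias like this is easy to miss in a single overall Score Bias figure, because errors in different score ranges can cancel out. Bucketing the signed error (predicted minus reference, an assumed convention) by reference score makes it visible. This is a sketch: `bias_by_bucket` is a hypothetical helper and the 20-point bucket width is an arbitrary choice.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def bias_by_bucket(
    pred: Sequence[float], ref: Sequence[float], width: float = 20.0
) -> Dict[Tuple[float, float], float]:
    """Mean signed error (pred - ref) per reference-score bucket starting at lo."""
    sums: Dict[float, float] = defaultdict(float)
    counts: Dict[float, int] = defaultdict(int)
    for p, r in zip(pred, ref):
        lo = min(r // width * width, 100.0 - width)  # fold a score of 100 into the top bucket
        sums[lo] += p - r
        counts[lo] += 1
    return {(lo, lo + width): sums[lo] / counts[lo] for lo in sorted(sums)}


pred = [18.0, 25.0, 60.0, 58.0, 90.0]
ref = [10.0, 15.0, 62.0, 58.0, 100.0]
for bucket, bias in bias_by_bucket(pred, ref).items():
    print(bucket, round(bias, 2))
```

In this toy data the overall mean error is about +1.2, yet the 0-20 bucket is +9 and the 80-100 bucket is -10, which is the kind of pattern the overall metric hides.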

## Citation

```bibtex
@misc{gaztrab2026moat,
  title={LFM2.5-1.2B-MOAT: Multi-task Optimized Assessment Tool for Recruitment},
  author={GazTrab},
  year={2026},
  url={https://huggingface.co/GazTrab/LFM2.5-1.2B-MOAT}
}
```