# Pulse CV External Assets Audit

## 1. Executive Verdict

Pulse CV is already on a better architectural path than most of the external "ATS" assets reviewed. Most of them push back toward:

- flat keyword matching
- black-box resume scoring
- English-only assumptions
- synthetic pair scores without explainability

The main conclusions:

- **Do not adopt any of these assets directly into production core logic**
- **Do evaluate several narrow-scope assets as benchmark / weak supervision / narrow parser utility**
- **Quickly reject most of the repos and models that sell "ATS matching" without an evidence model**

The only assets worth careful evaluation:

- `Priyanka-Balivada/en_Resume_Matching_Keywords`
  - only as a reference / parser-side auxiliary for English CV entity extraction
- `VaishnaviGude/ats-resume-dataset-1lakh`
  - only as synthetic weak supervision / calibration stress-test
- `AzharAli05/Resume-Screening-Dataset`
  - only as a parser benchmark / resume structure stress-test
- `cgy11102/pairwise-ai-matching`
  - only as a reference for typed payloads / ranking experiments / limited UI inspiration

The answer to the central question:

> Yes. Pulse CV is already on a better and more reliable path than most of these external ATS assets.

## 2. Asset-by-Asset Audit Table

| Asset | Type | Real value | Risks / weaknesses | Best use in Pulse CV | Better than current Pulse CV? | Final decision |
|---|---|---|---|---|---|---|
| `VaishnaviGude/ats-resume-dataset-1lakh` | HF dataset | Relatively large (121k pairs); includes score-like columns | Appears synthetic/heuristic by construction; no evidence annotations; English-style pair scoring bias; may teach exactly what Pulse CV is trying to replace | Weak supervision only; calibration stress-test | Only for a narrow subtask | **Use only as weak-supervision / synthetic calibration input** |
| `AzharAli05/Resume-Screening-Dataset` | HF dataset | Public dataset, MIT-licensed; can serve as a stress-test for CV parsing | Sparse card; labeling depth unclear; does not appear to be an explainable JD-vs-CV corpus | Resume parser benchmark only | Only for a narrow subtask | **Benchmark/reference only** |
| `Youseff1987/resume-matching-dataset-v2` | HF dataset | Relatively large corpus | Korean-only (`language:ko`); grading-oriented; not suited to our manufacturing domain | Perhaps a very distant stress-test | No | **Reject for current product path** |
| `Divyanandh/resume-matching-dataset-v2` | HF dataset | Technically exists | Very sparse metadata; no cardData; low visibility; unclear quality | None | No | **Reject** |
| `jminc/resume-matching-dataset-v2` | HF dataset | Technically exists | Sparse metadata; low downloads; source/labeling unclear | None | No | **Reject** |
| `Philseok/resume_fit_dataset` | HF dataset | Tiny dataset with splits | Very small (244 train); insufficient for real calibration; not evidence-based | Benchmark toy set only | No | **Reject for production use; optional tiny benchmark only** |
| `InferencePrince555/Resume-Dataset` | HF dataset | Apache-2.0; small user base; can help with plain resume parsing stress-tests | Looks more like a corpus of resumes than a matching/evidence set; not domain-specific | Resume parsing stress-test only | Only for a narrow subtask | **Benchmark/reference only** |
| `C0ldSmi1e/resume-dataset` | HF dataset | Small English resume dataset | English-only; small; does not support JD parsing or evidence mapping | Parser stress-test only | No | **Reject / optional parser smoke data only** |
| `MikePfunk28/resume-training-dataset` | HF dataset | Multilingual-ish tags (`en/vi/zh`) | Does not appear to be a JD-vs-CV fit corpus; looks like a summaries/text collection; unclear resume provenance | Weak parser benchmark at most | No | **Reject for core; optional parsing experiment only** |
| `zoraizbinsamee/resume-job-matching-sbert` | HF model | Sentence-similarity fine-tune on top of MiniLM; possibly usable as an experimental ranking component | Black-box similarity; small dataset (~4457); no explainable evidence; likely English/generalist | Offline comparator / reranker experiment only | No | **Use only as auxiliary offline ranking experiment** |
| `Priyanka-Balivada/en_Resume_Matching_Keywords` | HF model | NER model with useful labels: SKILLS, QUALIFICATION, EXPERIENCE, CERTIFICATIONS, etc. | English-only; F1 ~72 is not enough for a source of truth; resume-side only; no typed JD extraction | Optional CV parser helper behind flag | Only for a narrow subtask | **Adopt only as optional parser helper / benchmark** |
| `Resume-screener/Skill-matching` | HF model | Name suggests skill matching | Almost no metadata; no clear pipeline; negligible usage | None | No | **Reject** |
| `Resume-screener/experience-matching-model` | HF model | Almost none | 0 downloads; no card; no evaluation; no explainability | None | No | **Reject** |
| `jminc/llama3-lora-resume-matching-r64` | HF model | LoRA experiment exists | Black-box LLM fine-tune; no evaluation card; no evidence traceability; likely synthetic | None | No | **Reject** |
| `vbanwari/JobApplicationAssistant` | HF Space | Product/UI inspiration for helper flow | Gradio app; broad assistant framing; not an evidence engine | UI inspiration only | No | **UI inspiration only** |
| `csccorner/Agentic-Resume-Parser` | HF Space | Parser UX inspiration | "Agentic" parser style, but likely opaque extraction; not domain-aware or evidence-grounded | UI / parser workflow inspiration only | No | **UI inspiration only** |
| `sakthiiiiivel/ats-optimizer` | GitHub repo | Very small proof-of-concept | Almost empty; no license; thin README; no clear evaluation | None | No | **Reject fast** |
| `ai-naymul/ATS-Catalyst` | GitHub repo | Product messaging / onboarding inspiration at most | "Magical tool" framing; AI HR theater; score-first positioning; no license | Surface UX inspiration only | No | **Reject for logic; UI inspiration only** |
| `cgy11102/pairwise-ai-matching` | GitHub repo | Typed API payloads, ranking flow, clear architecture notes, UI flow ideas | Embeddings+LLM evidence pipeline is not auditable enough; not manufacturing-specific; no deterministic requirement/evidence core | Narrow ranking experiment / payload/UI reference | Only for a narrow subtask | **Benchmark/reference only** |
| `fosetorico/resume_ATS_scanner` | GitHub repo | Typical ATS scanner baseline; can be used as "what not to regress toward" | Gemini + keyword extraction + score-first; classic ATS theater; no evidence model | Comparator / anti-pattern reference only | No | **Reject** |
| `Ujjwal226/Advanced-ATS-Resume-Checker` | GitHub repo | Little visible value | README effectively empty; no license; low signal | None | No | **Reject fast** |

## 3. Recommended Integration Map

### Parsing

- `Priyanka-Balivada/en_Resume_Matching_Keywords`
  - optional CV entity extraction benchmark/helper
  - use only behind a feature flag
  - compare against the current deterministic CV parse

### Benchmarking

- `AzharAli05/Resume-Screening-Dataset`
  - parser robustness benchmark
- `InferencePrince555/Resume-Dataset`
  - resume parsing stress-test
- `VaishnaviGude/ats-resume-dataset-1lakh`
  - weak-supervision / calibration stress-test, not a truth source
- `cgy11102/pairwise-ai-matching`
  - compare ranking outputs against our evidence-based engine in offline scenarios

### Weak Supervision / Synthetic Data

- `VaishnaviGude/ats-resume-dataset-1lakh`
  - use only to test threshold behavior and coarse score monotonicity
  - do not use as gold labels

### Ranking Experiments

- `zoraizbinsamee/resume-job-matching-sbert`
  - offline only
  - can provide one auxiliary semantic score for comparison
- `cgy11102/pairwise-ai-matching`
  - architecture reference for optional reranker experiments

### UI Inspiration

- `vbanwari/JobApplicationAssistant`
- `csccorner/Agentic-Resume-Parser`
- small surface inspiration from `ai-naymul/ATS-Catalyst`

## 4. Hard Rejection List

These assets should **not** be used in Pulse CV production logic:

- `Youseff1987/resume-matching-dataset-v2`
  - Korean-only, grading-style dataset, wrong domain/language bias
- `Divyanandh/resume-matching-dataset-v2`
  - weak metadata, unclear provenance, low confidence in quality
- `jminc/resume-matching-dataset-v2`
  - same issue: opaque, low-signal, no evidence structure
- `Philseok/resume_fit_dataset`
  - too small to matter for real calibration
- `C0ldSmi1e/resume-dataset`
  - too generic and English-only for our current evidence architecture
- `MikePfunk28/resume-training-dataset`
  - not aligned to JD-vs-CV evidence matching
- `Resume-screener/Skill-matching`
  - nearly no usable documentation or evaluation
- `Resume-screener/experience-matching-model`
  - effectively unusable from an engineering trust standpoint
- `jminc/llama3-lora-resume-matching-r64`
  - black-box matching LoRA with no traceability
- `sakthiiiiivel/ats-optimizer`
  - too thin and under-specified
- `fosetorico/resume_ATS_scanner`
  - actively pulls Pulse CV backward to score/keyword theater
- `Ujjwal226/Advanced-ATS-Resume-Checker`
  - insufficient documentation and no reliable signal

## 5. Codex Implementation Advice

### Asset: `Priyanka-Balivada/en_Resume_Matching_Keywords`

- Touch:
  - [engine.ts](C:\Users\user\Documents\Playground\pulse-cv\server\analysis\engine.ts)
  - [types.ts](C:\Users\user\Documents\Playground\pulse-cv\server\analysis\types.ts)
  - add a new adapter file under `server/analysis/`, such as `cvEntityAux.ts`
- Role:
  - **optional**
  - parser-side auxiliary only
  - not a source of truth
- Scope:
  - offline eval first
  - hidden behind a feature flag, e.g. `ENABLE_AUX_EN_CV_NER`

### Asset: `VaishnaviGude/ats-resume-dataset-1lakh`

- Touch:
  - [scripts/run-manufacturing-benchmark.ts](C:\Users\user\Documents\Playground\pulse-cv\scripts\run-manufacturing-benchmark.ts)
  - add a synthetic calibration script under `scripts/`
  - add docs under `docs/validation-report.md`
- Role:
  - **offline only**
  - weak supervision / calibration stress-test only
- Scope:
  - never production primary
  - no direct model fitting into live scoring without a human-reviewed benchmark

### Asset: `AzharAli05/Resume-Screening-Dataset`

- Touch:
  - parsing evaluation scripts
  - benchmark fixtures directory under `benchmark/`
- Role:
  - benchmark only
  - parser stress-test only
- Scope:
  - optional
  - no live dependency

### Asset: `zoraizbinsamee/resume-job-matching-sbert`

- Touch:
  - add an optional semantic comparator under `server/analysis/` or `intelligence/`
  - log comparator output into telemetry only
- Role:
  - auxiliary offline signal
  - not a production truth source
- Scope:
  - feature flag only
  - use to compare score ordering, not to replace the evidence map

### Asset: `cgy11102/pairwise-ai-matching`

- Touch:
  - mostly none; use as an architecture reference
  - optionally borrow response-shape or ranking experiment patterns
- Role:
  - parser/API/UI inspiration only
  - possibly an offline ranking baseline
- Scope:
  - no direct code import
  - no production dependency

## Final Answer to the Most Important Question

Yes. **Pulse CV is already on a better path than these external ATS assets**.

Why:

- it preserves deterministic truth
- it exposes evidence
- it supports must-have logic explicitly
- it already includes JD quality review
- it is verticalized for manufacturing/industrial use
- it is less vulnerable to keyword theater and opaque black-box scoring

The right move is **not** to import these assets into the core engine.

The right move is to use a very small subset of them in supporting roles:

- benchmark
- parser stress-testing
- weak supervision
- optional comparator experiments
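
To make the "optional comparator experiments" role concrete, here is a minimal TypeScript sketch of the advice in section 5 for an auxiliary semantic scorer: gate it behind a feature flag, compare score ordering only, and send the result to telemetry without touching the evidence-based ranking. Every name in it (`ScoredCandidate`, `ComparatorTelemetry`, `isFlagEnabled`, `pairwiseOrderAgreement`, `rankWithOptionalComparator`, and whatever flag name is passed in) is hypothetical and not taken from the Pulse CV codebase. The pairwise agreement measure is one simple choice among several and is used here only for illustration.

```typescript
// Hypothetical sketch: these types and functions are illustrative only and
// do not come from the Pulse CV codebase.

type ScoredCandidate = { candidateId: string; score: number };

type ComparatorTelemetry = {
  comparedPairs: number;
  orderAgreement: number | null; // null when no comparable pairs exist
};

/** Treats only "1" and "true" (case-insensitive) as an enabled flag. */
function isFlagEnabled(
  env: Record<string, string | undefined>,
  name: string
): boolean {
  const raw = (env[name] ?? "").trim().toLowerCase();
  return raw === "1" || raw === "true";
}

/**
 * Share of candidate pairs that both scorers order the same way.
 * Pairs tied in either scorer, and candidates missing from the
 * auxiliary scores, are skipped.
 */
function pairwiseOrderAgreement(
  primary: ScoredCandidate[],
  auxiliary: ScoredCandidate[]
): ComparatorTelemetry {
  const auxScores = new Map<string, number>();
  for (const c of auxiliary) auxScores.set(c.candidateId, c.score);

  // Keep only candidates that both scorers have rated.
  const shared: Array<{ p: number; a: number }> = [];
  for (const c of primary) {
    const a = auxScores.get(c.candidateId);
    if (a !== undefined) shared.push({ p: c.score, a });
  }

  let concordant = 0;
  let comparedPairs = 0;
  shared.forEach((x, i) => {
    for (const y of shared.slice(i + 1)) {
      const dp = x.p - y.p;
      const da = x.a - y.a;
      if (dp === 0 || da === 0) continue; // skip ties
      comparedPairs += 1;
      if (Math.sign(dp) === Math.sign(da)) concordant += 1;
    }
  });

  return {
    comparedPairs,
    orderAgreement: comparedPairs === 0 ? null : concordant / comparedPairs,
  };
}

/**
 * Returns the evidence-based ranking unchanged. The auxiliary scores only
 * ever produce telemetry, and only when the flag is enabled.
 */
function rankWithOptionalComparator(
  env: Record<string, string | undefined>,
  flagName: string,
  primary: ScoredCandidate[],
  auxiliary: ScoredCandidate[]
): { ranking: ScoredCandidate[]; telemetry: ComparatorTelemetry | null } {
  const telemetry = isFlagEnabled(env, flagName)
    ? pairwiseOrderAgreement(primary, auxiliary)
    : null;
  return { ranking: primary, telemetry };
}
```

The design point is that `ranking` is always the primary (evidence-based) list. A low `orderAgreement` value is a prompt to inspect specific candidate pairs offline, not a reason to re-rank anything in production.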