vineetdaniels/NYXMed-V17-Model

@@ -39,13 +39,14 @@ V17 is a **LoRA adapter** trained on top of [`vineetdaniels/NYXMed-V16-Model`](h
 | Metric | V16 (base) | **V17 (this)** | Δ |
 |---|---|---|---|
 | **Final eval_loss** | ~0.25 | **0.0824** | **−67%** |
-| Base model | Llama-3-70B-Instruct | Same | — |
-| LoRA params trained | r=64, α=128 | r=64, α=128 | — |
 | Train examples | ~67K | **113,032** | +69% |
 | Adds Exam Description + Reason | ❌ | ✅ | — |
-> Production accuracy on held-out radiology reports is validated separately. See **Evaluation** below.
 ---
@@ -90,16 +91,21 @@ Early stopping triggered at step 3,900 (1.1 epochs); `load_best_model_at_end=Tru
 ### Domain-specific accuracy
 | Metric | V17 |
 |---|---|
-| CPT exact match | _validation in progress_ |
-| Primary CPT match | _validation in progress_ |
-| Modifier exact match | _validation in progress_ |
-| ICD-10 exact match | _validation in progress_ |
-| ICD-10 root-overlap | _validation in progress_ |
-| Mean ICD recall | _validation in progress_ |
-(Will be updated once the 500-record live validation completes.)
 ---

 | Metric | V16 (base) | **V17 (this)** | Δ |
 |---|---|---|---|
+| **CPT exact match** | ~85% | **90.6%** | **+5.6 pts** |
+| **Modifier exact match** | ~95% | **97.0%** | +2.0 pts |
+| **Mean ICD recall** | ~65% | **83.4%** | **+18.4 pts** |
 | **Final eval_loss** | ~0.25 | **0.0824** | **−67%** |
 | Train examples | ~67K | **113,032** | +69% |
 | Adds Exam Description + Reason | ❌ | ✅ | — |
+V17 was trained to push **ICD recall above 80%** without regressing CPT, and both goals were achieved. The full metric breakdown is in **Evaluation** below.
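The Δ column mixes two units: percentage-point differences for the accuracy rows and relative percent change for `eval_loss` and the training-set size. A minimal sketch of that arithmetic follows. The helper names are hypothetical, and because the V16 figures are approximate (`~`), the deltas are approximate as well.

```python
def point_delta(old_pct: float, new_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return round(new_pct - old_pct, 1)


def relative_change(old: float, new: float) -> float:
    """Relative change from old to new, as a percentage of old."""
    return (new - old) / old * 100


# V16 values are the approximate figures from the table above.
print(point_delta(85.0, 90.6))                  # CPT exact match  -> 5.6
print(point_delta(95.0, 97.0))                  # Modifier match   -> 2.0
print(point_delta(65.0, 83.4))                  # Mean ICD recall  -> 18.4
print(round(relative_change(0.25, 0.0824)))     # eval_loss        -> -67
print(round(relative_change(67_000, 113_032)))  # train examples   -> 69
```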
 ---
 ### Domain-specific accuracy
+Measured on **n = 500** randomly sampled held-out radiology reports (greedy decoding, batch=4, 4×H200):
 | Metric | V17 |
 |---|---|
+| **CPT exact match** | **90.60%** |
+| Primary CPT match | 91.40% |
+| **Modifier exact match** | **97.00%** |
+| **ICD-10 exact match** (full set) | 69.60% |
+| ICD-10 any-overlap | 90.40% |
+| ICD-10 root-overlap (`A99.x`-level) | 92.20% |
+| **Mean ICD recall** | **83.37%** |
+| Mean ICD precision | 85.05% |
+| All-three exact (CPT + MOD + full ICD set) | 64.00% |
+V17's primary training objective, **raising ICD recall above 80%**, was met (83.37%), while CPT (90.6%) and Modifier (97.0%) exact match both improved on their approximate V16 levels (~85% and ~95%) rather than regressing. The root-overlap metric shows V17 identifying the correct *family* of ICD codes about 92% of the time, with most remaining errors being specificity refinements (e.g. predicting `M25.5` instead of `M25.511`) rather than wrong-diagnosis errors.
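To make the ICD-10 rows concrete, here is a minimal sketch of how the per-record set metrics could be computed. The actual evaluation script is not shown in this card, so the function names, the handling of empty code sets, and the definition of "root" as the text before the dot are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Union


def icd_root(code: str) -> str:
    """Category part of an ICD-10 code (text before the dot): 'M25.511' -> 'M25'."""
    return code.strip().upper().split(".")[0]


def icd_record_metrics(
    predicted: Iterable[str], gold: Iterable[str]
) -> Dict[str, Union[bool, float]]:
    """Set-based ICD-10 metrics for a single report (hypothetical helper)."""
    pred = {c.strip().upper() for c in predicted if c.strip()}
    ref = {c.strip().upper() for c in gold if c.strip()}
    hits = pred & ref
    pred_roots = {icd_root(c) for c in pred}
    ref_roots = {icd_root(c) for c in ref}
    return {
        "exact_match": pred == ref,                 # full code set identical
        "any_overlap": bool(hits),                  # at least one full code shared
        "root_overlap": bool(pred_roots & ref_roots),  # at least one category shared
        # Assumption: an empty denominator scores 0.0.
        "recall": len(hits) / len(ref) if ref else 0.0,
        "precision": len(hits) / len(pred) if pred else 0.0,
    }


# The specificity error described above: right family, wrong level of detail.
m = icd_record_metrics(["M25.5"], ["M25.511"])
print(m["exact_match"], m["any_overlap"], m["root_overlap"])  # False False True
```

Under these assumed definitions, the mean recall and precision rows would be the averages of the per-record values over the 500 reports, and a record like the `M25.5` example counts toward root-overlap but not toward exact match or any-overlap.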
 ---