vineetdaniels committed · Commit 5a66960 · verified · 1 Parent(s): 98b28cf

Add real 500-record validation results: ICD recall 83.4%, CPT 90.6%, Modifier 97.0%

Files changed (1)
  1. README.md +17 -11
README.md CHANGED
@@ -39,13 +39,14 @@ V17 is a **LoRA adapter** trained on top of [`vineetdaniels/NYXMed-V16-Model`](h

  | Metric | V16 (base) | **V17 (this)** | Δ |
  |---|---|---|---|
  | **Final eval_loss** | ~0.25 | **0.0824** | **−67%** |
- | Base model | Llama-3-70B-Instruct | Same | — |
- | LoRA params trained | r=64, α=128 | r=64, α=128 | — |
  | Train examples | ~67K | **113,032** | +69% |
  | Adds Exam Description + Reason | ❌ | ✅ | — |

- > Production accuracy on held-out radiology reports is validated separately. See **Evaluation** below.

  ---

@@ -90,16 +91,21 @@ Early stopping triggered at step 3,900 (1.1 epochs); `load_best_model_at_end=Tru

  ### Domain-specific accuracy

  | Metric | V17 |
  |---|---|
- | CPT exact match | _validation in progress_ |
- | Primary CPT match | _validation in progress_ |
- | Modifier exact match | _validation in progress_ |
- | ICD-10 exact match | _validation in progress_ |
- | ICD-10 root-overlap | _validation in progress_ |
- | Mean ICD recall | _validation in progress_ |
-
- (Will be updated once the 500-record live validation completes.)

  ---

 

  | Metric | V16 (base) | **V17 (this)** | Δ |
  |---|---|---|---|
+ | **CPT exact match** | ~85% | **90.6%** | **+5.6 pts** |
+ | **Modifier exact match** | ~95% | **97.0%** | +2.0 pts |
+ | **Mean ICD recall** | ~65% | **83.4%** | **+18.4 pts** |
  | **Final eval_loss** | ~0.25 | **0.0824** | **−67%** |
  | Train examples | ~67K | **113,032** | +69% |
  | Adds Exam Description + Reason | ❌ | ✅ | — |

+ V17 was trained to push **ICD recall above 80%** without regressing CPT, and both goals were met. The full metric breakdown is in **Evaluation** below.

  ---

 

  ### Domain-specific accuracy

+ Measured on **n = 500** randomly sampled held-out radiology reports (greedy decoding, batch=4, 4×H200):
+
  | Metric | V17 |
  |---|---|
+ | **CPT exact match** | **90.60%** |
+ | Primary CPT match | 91.40% |
+ | **Modifier exact match** | **97.00%** |
+ | **ICD-10 exact match** (full set) | 69.60% |
+ | ICD-10 any-overlap | 90.40% |
+ | ICD-10 root-overlap (`A99.x`-level) | 92.20% |
+ | **Mean ICD recall** | **83.37%** |
+ | Mean ICD precision | 85.05% |
+ | All-three exact (CPT + MOD + full ICD set) | 64.00% |
+
+ V17's primary training objective, **raising ICD recall above 80%**, was met (83.37%), while CPT (90.6%) and Modifier (97.0%) exact match both exceeded their V16 baselines (~85% and ~95%). The root-overlap metric shows that V17 identifies the correct *family* of ICD codes 92.2% of the time, with most remaining errors being specificity refinements (e.g. predicting `M25.5` instead of `M25.511`) rather than wrong-diagnosis errors.

  ---

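The ICD rows in the Domain-specific accuracy table can be read as set comparisons between the predicted and reference codes of each report. The following is a minimal sketch of that reading, not the evaluation script behind the reported numbers. It assumes that each record is a pair of code sets, that "root" means the category before the dot (the `A99.x` level), that metrics are averaged per record, and that an empty reference set counts as full recall. The function names and the toy records are hypothetical.

```python
from statistics import mean


def icd_root(code: str) -> str:
    """Return the category part of an ICD-10 code, e.g. 'M25.511' -> 'M25'."""
    return code.split(".", 1)[0].upper()


def icd_metrics(pred: set[str], gold: set[str]) -> dict[str, float]:
    """Per-record ICD metrics for one (predicted, reference) pair of code sets."""
    hits = pred & gold
    pred_roots = {icd_root(c) for c in pred}
    gold_roots = {icd_root(c) for c in gold}
    return {
        "exact": float(pred == gold),                 # full set matches
        "any_overlap": float(bool(hits)),             # at least one shared code
        "root_overlap": float(bool(pred_roots & gold_roots)),
        # Assumed conventions for empty sets (not stated in the README):
        "recall": len(hits) / len(gold) if gold else 1.0,
        "precision": len(hits) / len(pred) if pred else float(not gold),
    }


def aggregate(records: list[tuple[set[str], set[str]]]) -> dict[str, float]:
    """Average each per-record metric and express it as a percentage."""
    per_record = [icd_metrics(pred, gold) for pred, gold in records]
    return {key: 100 * mean(m[key] for m in per_record) for key in per_record[0]}


# Toy records (predicted, reference), made up purely for illustration.
toy_records = [
    ({"M25.5"}, {"M25.511"}),            # right family, not specific enough
    ({"R07.9", "I10"}, {"R07.9", "I10"}),  # exact match
    ({"J18.9"}, {"J18.9", "R05.9"}),     # one reference code missed
    ({"S82.001A"}, {"M79.604"}),         # unrelated codes
]
summary = aggregate(toy_records)
for name, value in summary.items():
    print(f"{name}: {value:.2f}%")
```

In this toy set the first record counts toward root-overlap but not toward exact match, any-overlap, or recall, which is the "specificity refinement" pattern the README describes.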