🏨 ICD-10 to MS-DRG Mapper
Qwen2.5-1.5B fine-tuned to map ICD-10-CM diagnosis codes to MS-DRG codes
Part of the Medical AI Fine-tuned Model Suite — 16 specialist models, one per task
TL;DR
Maps ICD-10-CM diagnosis codes to the matching MS-DRG code, relative weight, and geometric mean length of stay.
INPUT: ICD-10-CM code: I21.09
OUTPUT: MS-DRG 280 — Acute Myocardial Infarction, Discharged Alive with MCC\nRelative Weight: 2.8613\nGeometric Mean LOS: 4.7 days
| Property | Value |
|---|---|
| Base model | unsloth/Qwen2.5-1.5B-Instruct |
| Method | QLoRA, 4-bit NF4, rank 16 |
| Training data | icd10-to-drg-sft — 5,385 real-world rows |
| Training compute | NVIDIA A40 (48GB), ~0.5h |
| License | Apache 2.0 |
Architecture
                +-------------------------+
user prompt --> | Qwen2.5-1.5B-Instruct   | --> base weights (frozen, 4-bit NF4)
                | + LoRA adapter (r=16)   | --> icd10-to-drg-qwen25-1b
                +-------------------------+
                            |
                            v
                    structured output
              (code / JSON / classification)
This repo contains only the LoRA adapter (~20MB), not the full merged weights. Load it on top of the base model as shown below — this keeps the download small and lets you swap adapters on one base model in memory.
Intended use
Hospital reimbursement prediction and revenue cycle automation under Medicare IPPS.
Direct use
Give an ICD-10-CM code, get back the MS-DRG, relative weight, and geometric mean LOS.
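Downstream code usually needs these three fields as typed values rather than free text. The following is a minimal parsing sketch, assuming the model's reply follows the sample output shown in the TL;DR. The `DrgResult` class and `parse_drg_output` function are hypothetical helpers, not part of this repo, and the patterns may need adjusting if real outputs differ.

```python
import re
from dataclasses import dataclass


@dataclass
class DrgResult:
    drg_code: str
    description: str
    relative_weight: float
    geometric_mean_los_days: float


# Patterns follow the sample output in this card. The separator after the
# DRG number is matched loosely because dash styles can vary.
_HEADER = re.compile(r"MS-DRG\s+(\d{3})\s*\W+\s*(.+)")
_WEIGHT = re.compile(r"Relative Weight:\s*([0-9.]+)")
_LOS = re.compile(r"Geometric Mean LOS:\s*([0-9.]+)")


def parse_drg_output(text: str) -> DrgResult:
    # The card prints "\n" literally, so accept escaped and real newlines.
    normalized = text.replace("\\n", "\n")
    header = _HEADER.search(normalized)
    weight = _WEIGHT.search(normalized)
    los = _LOS.search(normalized)
    if not (header and weight and los):
        raise ValueError(f"Unrecognized model output: {text!r}")
    return DrgResult(
        drg_code=header.group(1),
        description=header.group(2).strip(),
        relative_weight=float(weight.group(1)),
        geometric_mean_los_days=float(los.group(1)),
    )


sample = (
    "MS-DRG 280 \u2014 Acute Myocardial Infarction, Discharged Alive with MCC"
    "\\nRelative Weight: 2.8613\\nGeometric Mean LOS: 4.7 days"
)
result = parse_drg_output(sample)
print(result)
```

Raising on anything that does not match keeps malformed generations from silently entering a billing pipeline.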
Downstream use
Feed into a hospital revenue forecasting tool or a DRG validation audit pipeline.
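As a rough illustration of the forecasting use, the sketch below multiplies a relative weight by a hospital base rate, which is the simplified starting point of an IPPS operating payment before adjustments such as wage index, outliers, or add-on payments. The base rate here is a made-up placeholder, and `estimate_drg_payment` is a hypothetical helper rather than anything shipped with this model.

```python
from decimal import Decimal, ROUND_HALF_UP

# Placeholder value for illustration only; use your hospital's actual rate.
HYPOTHETICAL_BASE_RATE = "6500.00"


def estimate_drg_payment(relative_weight: str, base_rate: str) -> Decimal:
    """Simplified estimate: relative weight x base rate, rounded to cents."""
    payment = Decimal(relative_weight) * Decimal(base_rate)
    return payment.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# 2.8613 is the relative weight from the sample output in this card.
estimate = estimate_drg_payment("2.8613", HYPOTHETICAL_BASE_RATE)
print(estimate)
```

`Decimal` is used instead of `float` so that currency rounding stays exact.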
Out of scope
Multi-diagnosis DRG grouping that depends on the full official grouper software. This model approximates single-code lookups; it does not replicate the complete CMS MS-DRG grouper algorithm with its MCC/CC interaction rules.
This model is not a substitute for a certified medical professional's judgment. Output should be reviewed by a qualified person before being used in a clinical or billing decision.
Quickstart
Option A — Transformers + PEFT
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "unsloth/Qwen2.5-1.5B-Instruct"
adapter = "AmareshHebbar/icd10-to-drg-qwen25-1b"

# Load the frozen base model, then attach the LoRA adapter on top of it.
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)

messages = [
    {"role": "system", "content": "You are a DRG grouper. Given ICD-10-CM codes, return the MS-DRG code, relative weight, and geometric mean LOS."},
    {"role": "user", "content": "ICD-10-CM code: I21.09"},
]

# return_dict=True returns input_ids plus attention_mask, so generate() gets both.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.1, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt.
prompt_length = inputs["input_ids"].shape[1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))
Expected output:
MS-DRG 280 — Acute Myocardial Infarction, Discharged Alive with MCC\nRelative Weight: 2.8613\nGeometric Mean LOS: 4.7 days
Option B — Unsloth (2x faster load + inference)
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="AmareshHebbar/icd10-to-drg-qwen25-1b",
    max_seq_length=512,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

messages = [
    {"role": "system", "content": "You are a DRG grouper. Given ICD-10-CM codes, return the MS-DRG code, relative weight, and geometric mean LOS."},
    {"role": "user", "content": "ICD-10-CM code: J18.9"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.1, do_sample=True)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
Option C — vLLM (production serving, OpenAI-compatible)
vllm serve unsloth/Qwen2.5-1.5B-Instruct \
    --enable-lora \
    --lora-modules icd10-to-drg-qwen25-1b=AmareshHebbar/icd10-to-drg-qwen25-1b \
    --host 0.0.0.0 --port 8000 --dtype bfloat16
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="icd10-to-drg-qwen25-1b",
    messages=[
        {"role": "system", "content": "You are a DRG grouper. Given ICD-10-CM codes, return the MS-DRG code, relative weight, and geometric mean LOS."},
        {"role": "user", "content": "MS-DRG: 343"},
    ],
    temperature=0.1,
)
print(response.choices[0].message.content)
Option D — GGUF / llama.cpp (CPU / edge inference)
This repo ships LoRA adapter weights, not a pre-merged GGUF. To run on llama.cpp, merge first:
pip install unsloth
python -c "
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained('AmareshHebbar/icd10-to-drg-qwen25-1b', load_in_4bit=False)
model.save_pretrained_gguf('icd10-to-drg-qwen25-1b-gguf', tokenizer, quantization_method='q4_k_m')
"
Training details
Data
Trained on 5,385 examples extracted from the CMS MS-DRG v43.1 Definitions Manual and the FY2026 Final Rule Table 5 weights (source). No synthetic or LLM-generated training data was used: every example pairs a real-world input with its authoritative output.
| Split | Rows |
|---|---|
| Train | 4,308 |
| Validation | 538 |
| Test | 539 |
Full extraction pipeline documented on the dataset card.
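The split sizes above can be checked against the stated total with a few lines of arithmetic. This is only a consistency check on the numbers in the table, not a reproduction of the actual splitting procedure, which is documented on the dataset card.

```python
# Row counts copied from the split table in this card.
splits = {"train": 4308, "validation": 538, "test": 539}

total = sum(splits.values())
fractions = {name: rows / total for name, rows in splits.items()}

for name, rows in splits.items():
    print(f"{name:<10} {rows:>5}  {fractions[name]:.1%}")
print(f"{'total':<10} {total:>5}")
```

The counts sum to the 5,385 rows quoted elsewhere in the card and correspond to roughly an 80/10/10 split.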
Hyperparameters
| Parameter | Value |
|---|---|
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Quantization | 4-bit NF4 (QLoRA) |
| Max sequence length | 512 |
| Optimizer | paged_adamw_8bit |
| Learning rate (schedule) | 2e-4 (cosine) |
| Gradient checkpointing | Unsloth (smart offload) |
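To get a feel for how small the adapter is, the sketch below counts the parameters that rank-16 LoRA adds to the seven target modules. The layer dimensions are assumptions about the Qwen2.5-1.5B architecture that are not stated in this card, so check them against the base model's config.json before relying on the result.

```python
# Assumed Qwen2.5-1.5B dimensions (not stated in this card; verify in config.json).
HIDDEN = 1536
INTERMEDIATE = 8960
NUM_LAYERS = 28
KV_DIM = 256  # assumed: 2 key/value heads x head_dim 128
RANK = 16     # LoRA rank from the hyperparameter table

# (in_features, out_features) for each targeted linear layer.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    # LoRA adds two matrices per layer: A (rank x in) and B (out x rank).
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGETS.values())
total = per_layer * NUM_LAYERS
print(f"{per_layer:,} per layer, {total:,} total")
```

Under these assumptions the adapter has about 18.5 million trainable parameters, which is on the order of 1% of the base model's roughly 1.5 billion.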
Training compute
| Item | Value |
|---|---|
| GPU | NVIDIA A40 (48GB) |
| Cloud provider | RunPod |
| Training time | ~0.5h (incl. eval + hub push) |
| Tracking | W&B run |
| CO2 estimate | self-reported, not measured with a carbon tracker; treat as approximate |
Fine-tuned with Unsloth for 2x faster training and reduced VRAM, using TRL's SFTTrainer. Full project: wandb.ai/amareshhebbar-/axiomapper.
Bias, risks & limitations
Data recency. Training data reflects a specific snapshot in time (CMS FY2026 / dataset publish date). Codes, rates, and rules referenced may become outdated as source authorities issue updates — always cross-check against the live authoritative source before high-stakes use.
Failure mode. Like any LLM, this model can produce a plausible-sounding but incorrect output, especially on rare, ambiguous, or highly compound real-world cases that fall outside the training distribution. It does not know when it's wrong.
Language. English-language input only (Hindi-medical model excepted, where Hindi system prompts are used but underlying clinical reasoning data is largely English-sourced).
Not a regulated medical device. This model has not been validated, cleared, or approved by any regulatory body (FDA, CDSCO, or equivalent) as a medical device or clinical decision support tool. It is a research/engineering artifact.
Misapplication risk. Do not use this model as the sole basis for a clinical, billing, or compliance decision affecting a real patient or claim. Do not deploy in an emergency triage context without a human-in-the-loop and clear escalation paths.
FAQ
Q: Can I merge the adapter into the base model for faster inference?
Yes. Use model.merge_and_unload() after loading with PEFT, or use Unsloth's save_pretrained_merged() method.
Q: Why QLoRA instead of full fine-tuning?
The base model already has strong language and medical knowledge from pretraining. QLoRA adapts only ~0.5-1% of parameters, which is enough to specialize the output format and domain without the cost or overfitting risk of full fine-tuning.
Q: Can I fine-tune this further on my own data?
Yes, this adapter can be used as a starting checkpoint for continued fine-tuning. Note this may require merging first depending on your training framework.
Q: Why is the output format so strict?
Each task was trained on a fixed system prompt and consistent output structure. Following the documented system prompt closely (see Quickstart above) gives the most reliable results; deviating from it may produce inconsistent formatting.
Q: Does this model store or transmit my input data?
No. Like any open-weight model, all inference happens locally on your own infrastructure (or wherever you deploy it). Nothing is sent back to the model author.
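Because results depend on matching the trained prompt, it can help to build the chat messages in one place. The sketch below reuses the system prompt from the Quickstart and covers only the ICD-10-CM input form. `build_messages` and the shape-check regex are hypothetical additions: the regex tests only that the input looks like an ICD-10-CM code, not that the code exists.

```python
import re

# System prompt copied verbatim from the Quickstart examples.
SYSTEM_PROMPT = (
    "You are a DRG grouper. Given ICD-10-CM codes, return the MS-DRG code, "
    "relative weight, and geometric mean LOS."
)

# Shape check only (assumption): letter, digit, alphanumeric, then an optional
# dot followed by one to four alphanumerics.
_ICD10CM_SHAPE = re.compile(r"^[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?$")


def build_messages(code: str) -> list[dict[str, str]]:
    normalized = code.strip().upper()
    if not _ICD10CM_SHAPE.match(normalized):
        raise ValueError(f"Not shaped like an ICD-10-CM code: {code!r}")
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"ICD-10-CM code: {normalized}"},
    ]


messages = build_messages(" i21.09 ")
print(messages[1]["content"])
```

The returned list can be passed straight to `tokenizer.apply_chat_template` or to an OpenAI-compatible client.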
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| ValueError: padding_token not set | Base tokenizer has no pad token | Set tokenizer.pad_token = tokenizer.eos_token before inference |
| Garbled / repeated output | Wrong chat template applied | Make sure you use tokenizer.apply_chat_template, not a raw string prompt |
| CUDA OOM on load | Insufficient VRAM | Use load_in_4bit=True (as in Option B above) or reduce max_seq_length |
| Adapter loads but ignores fine-tuning | Base model mismatch | Confirm you loaded the exact base listed above — adapters are not portable across different base models or quantizations |
Related models in this suite
| Model | Task | Size |
|---|---|---|
| icd10-coder-qwen25-7b | ICD-10-CM medical coding | 7B |
| snomed-mapper-qwen25-7b | Clinical concept mapping | 7B |
| icd10-to-drg-qwen25-1b | ICD-10 to DRG reimbursement | 1.5B |
| pmjay-classifier-qwen25-3b | India PM-JAY classification | 3B |
Full suite overview: AmareshHebbar/medical-ai-model-suite
Changelog
| Version | Date | Notes |
|---|---|---|
| v1.0 | 2026 | Initial release — QLoRA fine-tune on 5,385 real-world rows |
Citation
@misc{medicalai2026,
author = {Hebbar, Amaresh},
title = {Medical AI Fine-tuning Suite},
year = {2026},
publisher = {HuggingFace},
url = {https://huggingface.co/AmareshHebbar}
}
Contact