ICD-10 Medical Coder: Qwen2.5-7B
An AI system for WHO-standardized medical classification, insurance code prediction, and coverage estimation
What Is This?
ICD-10-Coder is the first model in a long-term initiative, AxisMapper, to build an AI-native insurance intelligence layer for the Indian and global healthcare ecosystem.
The International Classification of Diseases, 10th Revision (ICD-10), maintained by the World Health Organization (WHO), is the globally accepted standard for encoding medical diagnoses and health conditions. Hospitals, insurers, and government health authorities around the world use ICD-10 codes to classify care and determine reimbursement.
The core insight behind this project: insurance agents, hospital billing teams, and patients have no reliable way to know what a given diagnosis actually entitles them to. Coverage decisions are opaque, rules are fragmented across schemes, and the same condition might be coded five different ways, each triggering a different payout.
This model is the first agent in what will become a Multi-Agent, Mixture-of-Experts (MoE) pipeline, purpose-built to decode that opacity.
The Bigger Vision: AxisMapper
"One fine-tuned model per insurance scheme. A shared routing layer. Zero ambiguity for the patient."
India's health insurance landscape spans:
- Ayushman Bharat / PM-JAY: world's largest government-funded health insurance scheme
- Star Health: India's largest standalone health insurer
- ESIC / CGHS: central government employee schemes
- State-level programs: varying eligibility, tariff, and admission rules
- NGO-backed schemes: community-level coverage with entirely different logic
Each of these schemes has its own ICD-10 code mappings, admission duration requirements, procedure eligibility, and claim caps. There is no unified interface to query them all.
AxisMapper's roadmap:
Phase 1 (Now): WHO ICD-10 base model (this model)
    Universal code prediction + coverage logic
Phase 2: Fine-tune per scheme (StarHealth, PM-JAY, ESIC, etc.)
    Each model specialises in one insurer's rule set
Phase 3: MoE Router
    Given a patient + insurer, route to the right specialist model
Phase 4: Multi-Agent Pipeline
    Agent 1: Diagnosis → ICD-10 code
    Agent 2: Code → Coverage estimate (policy-aware)
    Agent 3: Coverage + Admission rules → Final claim amount
    Agent 4: Web search → Real-time tariff / market validation
This model, the WHO-standardized base, handles Phase 1: given a clinical description, it predicts the ICD-10 code, explains the classification, and applies WHO-level coverage logic.
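The Phase 3 routing idea can be sketched as a plain lookup from insurer to specialist model, falling back to this base model when no specialist exists. This is only an illustration: the specialist repository names are hypothetical placeholders (the scheme-specific models are planned, not released), and a real MoE router would presumably be learned rather than a dictionary.

```python
from dataclasses import dataclass
from typing import Optional

BASE_MODEL = "AmareshHebbar/icd10-coder-qwen25-7b-merged"

# Hypothetical placeholders: scheme-specific models are planned, not released.
SPECIALISTS = {
    "pm-jay": "example-org/icd10-coder-pmjay",            # hypothetical repo name
    "star-health": "example-org/icd10-coder-starhealth",  # hypothetical repo name
}


@dataclass(frozen=True)
class Route:
    model_id: str
    is_specialist: bool


def route(insurer: Optional[str]) -> Route:
    """Pick a specialist model for the insurer, else fall back to the base model."""
    # Normalise "Star Health" / " PM-JAY " style inputs to the dictionary keys.
    key = (insurer or "").strip().lower().replace(" ", "-")
    if key in SPECIALISTS:
        return Route(SPECIALISTS[key], True)
    return Route(BASE_MODEL, False)
```

Calling `route("Star Health")` selects the placeholder specialist, while `route(None)` or an unknown insurer returns the base model, so the pipeline degrades to WHO-level output instead of failing.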
Model Details
| Property | Value |
|---|---|
| Base Model | unsloth/qwen2.5-7b-instruct |
| Architecture | Qwen2 (decoder-only transformer) |
| Parameters | ~7.6B |
| Precision | BF16 |
| Fine-tuning Method | LoRA via Unsloth + HuggingFace TRL |
| Training Hardware | NVIDIA RTX A5000 (24GB VRAM) |
| Training Duration | ~2 hours |
| Training Speed | 2× faster than standard HF training (via Unsloth) |
| Experiment Tracking | Weights & Biases (W&B) |
| Max Sequence Length | 2048 tokens |
| License | Apache 2.0 |
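As a rough sanity check on the precision and hardware rows, weight memory can be estimated from the parameter count. The sketch below assumes about 7.6 billion parameters (the commonly published figure for Qwen2.5-7B) and counts weights only; activations, KV cache, and quantization overhead are ignored, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in decimal gigabytes (weights only)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 7.6e9  # assumed approximate parameter count for Qwen2.5-7B

bf16_gb = weight_memory_gb(PARAMS, 16)  # BF16 stores 2 bytes per parameter
int4_gb = weight_memory_gb(PARAMS, 4)   # 4-bit quantization stores half a byte

print(f"BF16 weights:  ~{bf16_gb:.1f} GB")
print(f"4-bit weights: ~{int4_gb:.1f} GB")
```

Under these assumptions the BF16 weights come to about 15.2 GB, which fits within the 24 GB of an RTX A5000, and 4-bit weights to about 3.8 GB.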
Training Infrastructure
This model was trained with the Unsloth optimization library, which reports roughly 2× faster training and about 60% lower VRAM use than standard HuggingFace fine-tuning, with no reported loss in model quality.
Training stack:
- unsloth: optimized LoRA fine-tuning engine
- trl (HuggingFace): SFTTrainer for instruction fine-tuning
- transformers: model loading, tokenization, inference
- wandb: real-time loss curves, learning rate scheduling, gradient tracking
All training runs are logged and reproducible via Weights & Biases. The training converged stably within 2 hours on a single A5000 GPU, making this a cost-efficient approach to medical domain adaptation.
What This Model Does
Given a clinical description or patient scenario, this model will:
- Assign the correct ICD-10 code(s): primary diagnosis, secondary conditions, procedure codes
- Explain the WHO classification logic: why this code, what the category means, adjacent codes
- Estimate WHO-level insurance coverage: standard reimbursement brackets, admission duration requirements, procedure eligibility
- Flag restrictions: minimum admission days, co-morbidity requirements, pre-authorisation triggers
- Support multi-condition scenarios: comorbidities, complications, dual coding
Example input:
Patient admitted for acute appendicitis with peritonitis.
Underwent emergency appendectomy. Admitted for 3 days.
What ICD-10 codes apply and what is the expected insurance coverage?
Example output (truncated):
Primary Code: K35.2 - Acute appendicitis with generalised peritonitis
Procedure Code: 0DTJ4ZZ - Resection of appendix, percutaneous endoscopic approach
WHO Classification: Diseases of the digestive system (K00-K93)
Chapter XI, Block K35-K38 (Diseases of appendix)
Coverage Logic:
- WHO standard: Surgical admission, inpatient required
- Minimum admission: 1-3 days (surgery-dependent)
- Reimbursement class: Major surgery
- Pre-auth: Required for elective; emergency bypass available
- Approximate WHO-tier bracket: ₹35,000-₹75,000 (India tier-2 hospital)
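Downstream tooling usually needs the codes rather than the prose. Here is a minimal sketch of pulling them out of a response shaped like the example output. The `Primary Code:` and `Procedure Code:` labels are taken from that example, the `Secondary` label and the regular expressions are assumptions about what the model emits, and a pattern match does not prove that a code actually exists in ICD-10.

```python
import re
from typing import Dict, List

# Diagnosis-code shape: letter, digit, alphanumeric, optional dotted extension (e.g. K35.2).
DIAGNOSIS_RE = re.compile(r"[A-Z][0-9][0-9A-Z](?:\.[0-9A-Z]{1,4})?")
# Labelled lines such as "Primary Code: K35.2 ..." (label format assumed from the example).
LABEL_RE = re.compile(
    r"^(Primary|Secondary|Procedure) Code:\s*([A-Z0-9.]+)", re.MULTILINE
)


def extract_codes(text: str) -> Dict[str, List[str]]:
    """Group the labelled codes found in a model response by label."""
    found: Dict[str, List[str]] = {}
    for label, code in LABEL_RE.findall(text):
        found.setdefault(label.lower(), []).append(code)
    return found


def looks_like_diagnosis_code(code: str) -> bool:
    """Shape check only; it does not confirm the code is in any ICD-10 release."""
    return DIAGNOSIS_RE.fullmatch(code) is not None


sample = (
    "Primary Code: K35.2 - Acute appendicitis with generalised peritonitis\n"
    "Procedure Code: 0DTJ4ZZ - Resection of appendix\n"
)
codes = extract_codes(sample)
print(codes)
```

Any extracted code should still be validated against an authoritative code list before it is used for billing.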
Quickstart
Using Transformers (Pipeline)
from transformers import pipeline
pipe = pipeline("text-generation", model="AmareshHebbar/icd10-coder-qwen25-7b-merged")
query = """
Patient presents with Type 2 diabetes mellitus with chronic kidney disease stage 3.
What ICD-10 codes apply? What are the WHO-level insurance implications?
What are the admission requirements for this to be covered?
"""
result = pipe([{"role": "user", "content": query}], max_new_tokens=512)
print(result[0]["generated_text"][-1]["content"])
Using Unsloth (Recommended for inference speed)
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="AmareshHebbar/icd10-coder-qwen25-7b-merged",
    max_seq_length=2048,
    load_in_4bit=True,  # Optional: 4-bit for lower VRAM
)

messages = [
    {"role": "system", "content": "You are an expert ICD-10 medical coder with deep knowledge of WHO insurance classification standards."},
    {"role": "user", "content": "Patient: acute MI, stented. 2-day admission. Code and coverage?"},
]

# return_dict=True yields a dict (input_ids, attention_mask) that can be unpacked into generate()
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.1)

# Decode only the newly generated tokens, skipping the prompt
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))
Using vLLM (Production / High Throughput)
pip install vllm
vllm serve "AmareshHebbar/icd10-coder-qwen25-7b-merged" --max-model-len 2048
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")
response = client.chat.completions.create(
    model="AmareshHebbar/icd10-coder-qwen25-7b-merged",
    messages=[
        {"role": "system", "content": "You are an expert ICD-10 coder and insurance analyst."},
        {"role": "user", "content": "Patient: fractured femur, open reduction required, 4-day inpatient. ICD-10 codes and insurance coverage?"},
    ],
    max_tokens=512,
    temperature=0.1,
)
print(response.choices[0].message.content)
Using Ollama (Local / Offline)
# Export to GGUF first (via llama.cpp or Unsloth export)
ollama create icd10-coder -f ./Modelfile
ollama run icd10-coder "Patient: appendicitis, emergency surgery. Code and coverage?"
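The `ollama create` command above expects a `Modelfile` that points at the exported weights. Below is a minimal sketch that renders one from Python using the `FROM`, `PARAMETER`, and `SYSTEM` instructions. The GGUF filename is a hypothetical placeholder for whatever the export step produces, and the system prompt and temperature are borrowed from the other quickstart examples, not from a published Modelfile.

```python
def render_modelfile(gguf_path: str, system_prompt: str, temperature: float = 0.1) -> str:
    """Build Modelfile text with FROM, PARAMETER and SYSTEM instructions."""
    return (
        f"FROM {gguf_path}\n"
        f"PARAMETER temperature {temperature}\n"
        f'SYSTEM """{system_prompt}"""\n'
    )


modelfile_text = render_modelfile(
    "./icd10-coder.gguf",  # hypothetical filename from the GGUF export step
    "You are an expert ICD-10 medical coder.",
)
print(modelfile_text)
```

Save the printed text as `Modelfile` in the directory where `ollama create` is run.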
Integrations Supported
| Backend | Status | Use Case |
|---|---|---|
| HuggingFace Transformers | Supported | Research, prototyping |
| Unsloth FastModel | Supported | Fast inference, fine-tuning |
| vLLM | Supported | Production API, high throughput |
| SGLang | Supported | Structured generation |
| Ollama | Supported | Local / offline deployment |
| Claude API (Anthropic) | Planned | Hybrid: ICD-10 code → Claude for coverage analysis |
| Gemini API (Google) | Planned | Multi-LLM comparison layer |
| Web Search (Tavily/Serper) | Planned | Real-time tariff + hospital rate lookup |
ICD-10 Coverage
This model has been fine-tuned across all major ICD-10-CM chapters:
| Chapter | Description |
|---|---|
| I (A00-B99) | Infectious and parasitic diseases |
| II (C00-D49) | Neoplasms |
| III (D50-D89) | Blood and immune disorders |
| IV (E00-E89) | Endocrine, nutritional, metabolic |
| V (F01-F99) | Mental and behavioural disorders |
| IX (I00-I99) | Circulatory system diseases |
| X (J00-J99) | Respiratory diseases |
| XI (K00-K95) | Digestive system diseases |
| XIII (M00-M99) | Musculoskeletal diseases |
| XIV (N00-N99) | Genitourinary diseases |
| XIX (S00-T88) | Injuries, poisonings |
| XXI (Z00-Z99) | Health status, contact with services |
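The chapter table can double as a quick plausibility check on model output. The sketch below maps a code to its chapter by comparing the three-character category against the ranges listed in the table. It covers only the chapters in that table, so a `None` result means "not in the table", not "invalid code".

```python
from typing import Optional, Tuple

# (chapter, first category, last category, description), copied from the table above.
CHAPTERS = [
    ("I", "A00", "B99", "Infectious and parasitic diseases"),
    ("II", "C00", "D49", "Neoplasms"),
    ("III", "D50", "D89", "Blood and immune disorders"),
    ("IV", "E00", "E89", "Endocrine, nutritional, metabolic"),
    ("V", "F01", "F99", "Mental and behavioural disorders"),
    ("IX", "I00", "I99", "Circulatory system diseases"),
    ("X", "J00", "J99", "Respiratory diseases"),
    ("XI", "K00", "K95", "Digestive system diseases"),
    ("XIII", "M00", "M99", "Musculoskeletal diseases"),
    ("XIV", "N00", "N99", "Genitourinary diseases"),
    ("XIX", "S00", "T88", "Injuries, poisonings"),
    ("XXI", "Z00", "Z99", "Health status, contact with services"),
]


def chapter_for(code: str) -> Optional[Tuple[str, str]]:
    """Return (chapter, description) for a code such as 'K35.2', or None."""
    category = code.strip().upper()[:3]
    if len(category) < 3:
        return None
    for chapter, start, end, description in CHAPTERS:
        # Categories are letter + two characters, so string comparison orders them.
        if start <= category <= end:
            return chapter, description
    return None


print(chapter_for("K35.2"))
```

For example, `chapter_for("K35.2")` lands in chapter XI, while `D49.9` and `D50.0` fall on either side of the boundary between chapters II and III.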
Limitations & Intended Use
- This model is trained on WHO ICD-10 baseline standards, not on any specific insurer's proprietary rules. Coverage estimates are indicative, not legally binding.
- Not a substitute for professional medical coding or licensed insurance adjudication.
- Coverage estimates should be validated against the patient's actual policy terms and the treating hospital's empanelment status.
- Future scheme-specific models (Ayushman Bharat, Star Health, etc.) will provide more precise, policy-aware outputs.
Links
- GitHub (AxisMapper): https://github.com/amareshhebbar/AxisMapper
- Developed by: AmareshHebbar
- Base model: unsloth/qwen2.5-7b-instruct
Citation
@misc{hebbar2025icd10coder,
title={ICD-10 Coder: A Fine-tuned Qwen2.5-7B for Medical Classification and Insurance Coverage Estimation},
author={Amaresh Hebbar},
year={2025},
publisher={HuggingFace},
url={https://huggingface.co/AmareshHebbar/icd10-coder-qwen25-7b-merged},
note={Part of the AxisMapper project: https://github.com/amareshhebbar/AxisMapper}
}