---
language: en
license: llama3.1
base_model: meta-llama/Meta-Llama-3.1-8B
tags:
- llama-3.1
- qlora
- lora
- clinical-nlp
- fhir
- medical
- fine-tuned
- unsloth
datasets:
- ai-galileo/clinical-notes-to-fhir
---

# LLaMA 3.1 8B — Clinical Notes to FHIR

A fine-tuned LoRA adapter that converts unstructured clinical notes into structured FHIR R4 JSON.

## Model Details

- **Base model**: `unsloth/meta-llama-3.1-8b-bnb-4bit`
- **Method**: QLoRA (4-bit quantization + LoRA adapters)
- **LoRA config**: r=16, alpha=32, dropout=0.05
- **Trainable params**: ~20M / 8B (0.25%)
- **Hardware**: NVIDIA L40 (47.7 GB VRAM)
- **Training time**: ~4 minutes

## Training Metrics

| Metric | Value |
|--------|-------|
| Train loss (epoch 1) | 2.398 |
| Train loss (epoch 3) | 1.394 |
| Eval loss | 1.389 |
| Peak VRAM | 8.3 GB |

## Usage

```python
from unsloth import FastLanguageModel
from peft import PeftModel

# Load the 4-bit quantized base model and its tokenizer.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/meta-llama-3.1-8b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach the fine-tuned LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(model, "AnukeerthiReddy/llama-3.1-8b-clinical-fhir-lora")

note = "Patient: 58M with chest pain radiating to left arm x 2h. HTN, T2DM. BP 158/92."
inputs = tokenizer(note, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=512)

# The decoded text contains the input note followed by the generated continuation.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Dataset

[ai-galileo/clinical-notes-to-fhir](https://huggingface.co/datasets/ai-galileo/clinical-notes-to-fhir)

## Training Infrastructure

- GPU: NVIDIA L40 (47.7 GB), Georgia State University
- Framework: Unsloth + TRL SFTTrainer
- Logging: Weights & Biases
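
## Parsing the Output (sketch)

The Usage snippet prints the full decoded text, which includes the input note as well as whatever the model generates after it. Downstream code usually wants the JSON as a Python object instead. The helper below is a minimal sketch of one way to pull the first parseable JSON object out of that text using only the standard library. The function name `extract_first_json_object` and the sample string are illustrative assumptions, not part of this model's tooling, and this card does not document the exact output format. The helper only checks that the text parses as JSON; it does not validate the result against the FHIR R4 specification.

```python
import json
from typing import Optional


def extract_first_json_object(text: str) -> Optional[dict]:
    """Return the first parseable JSON object embedded in `text`, or None.

    Hypothetical helper: scans for each '{' and tries to decode a JSON
    value starting there, skipping positions that fail to parse.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning at `start`
            # and ignores any trailing text after it.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        start = text.find("{", start + 1)
    return None


# Hypothetical decoded output: the note followed by a generated resource.
sample = (
    "Patient: 58M with chest pain. "
    '{"resourceType": "Condition", "code": {"text": "Chest pain"}}'
)
resource = extract_first_json_object(sample)
if resource is not None:
    print(resource.get("resourceType"))
```

In practice the argument would be the string returned by `tokenizer.decode(...)` in the Usage section. A `None` result means no well-formed JSON object was found, which can happen if generation is cut off by `max_new_tokens` before the closing brace.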