--- base_model: Qwen/Qwen2.5-VL-3B-Instruct library_name: peft pipeline_tag: text-generation tags: - base_model:adapter:Qwen/Qwen2.5-VL-3B-Instruct - lora - sft - trl - vision-language - medical --- # Qwen2.5-VL-3B-Instruct - MIMIC-CXR Fine-tuned This repository contains a **LoRA fine-tuned adapter** for [Qwen/Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct), trained on the **MIMIC-CXR** dataset. The goal is to adapt a powerful **multimodal vision-language model** for **medical chest X-ray interpretation**, generating clinical-style reports from chest radiographs. --- ## How to Use ```python from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor from peft import PeftModel from qwen_vl_utils import process_vision_info import torch base_model_id = "Qwen/Qwen2.5-VL-3B-Instruct" adapter_id = "onurulu17/qwen2.5-vl-3b-instruct-mimic-cxr" # Load base model model = Qwen2_5_VLForConditionalGeneration.from_pretrained( base_model_id, device_map="auto", torch_dtype=torch.bfloat16, ) # Load LoRA adapter model = PeftModel.from_pretrained(model, adapter_id) # Processor processor = AutoProcessor.from_pretrained(base_model_id) # Example inference def generate_text_from_sample(model, processor, sample, max_new_tokens=1024, device="cuda"): text_input = processor.apply_chat_template( sample[:1], tokenize=False, add_generation_prompt=True ) image_inputs, _ = process_vision_info(sample) model_inputs = processor( text=[text_input], images=image_inputs, return_tensors="pt", ).to(device) generated_ids = model.generate(**model_inputs, max_new_tokens=max_new_tokens) trimmed_generated_ids = [out_ids[len(in_ids) :] for in_ids, out_ids in zip(model_inputs.input_ids, generated_ids)] output_text = processor.batch_decode( trimmed_generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False ) return output_text[0] sample = [ {'role': 'user', 'content': [{'type': 'image', 'image': "./chest_xray.jpg"}, {'type': 'text', 'text': 'Please analyze this chest X-ray and provide the findings and impression.'}]}, ] output = generate_text_from_sample(model, processor, sample) print(output) ``` --- ## Model Details - **Base model:** Qwen/Qwen2.5-VL-3B-Instruct - **Adapter type:** LoRA (PEFT) - **Training objective:** Supervised fine-tuning (SFT) on chest X-ray reports - **Dataset:** [MIMIC-CXR](https://physionet.org/content/mimic-cxr/2.0.0/) (radiology images + reports) - **Languages:** English (medical reporting domain) - **Frameworks:** `transformers`, `peft`, `trl` --- ## Intended Uses ### Direct Use - Generating radiology-style reports from chest X-ray images. - Research on applying large multimodal models to medical imaging tasks. ### Downstream Use - Medical text generation tasks where radiological image context is available. - Adaptation for other healthcare VQA (Visual Question Answering) tasks. ### Out-of-Scope Use ⚠️ **Not for clinical decision-making.** This model is intended **for research purposes only**. Do not use it in medical practice without proper validation and regulatory approval.