--- license: cc-by-4.0 library_name: transformers base_model: google/gemma-2b-it tags: - aayuai - aayu-ai - aayubot - medical-ai - healthcare-ai - conversational-ai - chatbot - llm - generative-ai - gemma - gemma-2b - transformers - nlp - fine-tuned - medical-chatbot - clinical-ai - digital-health - ai-safety - safe-ai - responsible-ai - ai-alignment - content-moderation - severity-aware - virtual-assistant - open-source - research-project - startup - innovation - lora - peft - safety --- # AayuBot Gemma-2B (LoRA Adapter) **Severity-aware, safety-focused medical conversational AI** built on top of `google/gemma-2b-it` using QLoRA fine-tuning. | Field | Detail | |-------|--------| | **Creator** | Satyam Tiwari (`thesatyam12`) | | **Base model** | `google/gemma-2b-it` | | **Method** | QLoRA (4-bit NF4) / PEFT adapter | | **License** | CC BY 4.0 (adapter weights only) | | **DOI** | `10.57967/hf/7776` | | **Status** | Research — Manuscript submitted | --- ## Available Checkpoints | Folder | Training Steps | Recommended? | |--------|---------------|--------------| | `checkpoint1-20k` | 20,000 | | | `checkpoint2-40k` | 40,000 | | | `checkpoint3-60k` | 60,000 | | | `checkpoint4-80k` | 80,000 | | | `checkpoint5-100k` | 100,000 | ✅ **Best / Final** | By default, all code below uses **checkpoint5-100k**. --- ## Step 1 — Install dependencies ```bash # Create a fresh environment (recommended) pip install --upgrade pip # Core packages pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 # Hugging Face + Gemma + LoRA stack pip install transformers accelerate bitsandbytes peft # Login to Hugging Face (required for Gemma and our adapter access) pip install huggingface_hub #then huggingface-cli login #or from huggingface_hub import login login("YOUR_HF_TOKEN") ``` > **Gemma access:** You must first accept Google's Gemma license at > [https://huggingface.co/google/gemma-2b-it](https://huggingface.co/google/gemma-2b-it) > (click "Agree and access repository"). Otherwise the base model download will fail. --- ## Step 2 — Download the adapter ### Option A: Clone the full repo (all checkpoints) ```bash git lfs install git clone https://huggingface.co/thesatyam12/aayubot-gemma-2b ``` ### Option B: Download only checkpoint5-100k (recommended) ```python from huggingface_hub import snapshot_download snapshot_download( repo_id="thesatyam12/aayubot-gemma-2b", allow_patterns="checkpoint5-100k/*", local_dir="./aayubot-gemma-2b", ) ``` --- ## Step 3 — Load the model ```python import torch from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig from peft import PeftModel # ── Config ────────────────────────────────────────────── BASE_MODEL = "google/gemma-2b-it" ADAPTER_REPO = "thesatyam12/aayubot-gemma-2b" CHECKPOINT = "checkpoint5-100k" # change if you want another checkpoint # ──────────────────────────────────────────────────────── # 4-bit quantization (same settings used during training) bnb_config = BitsAndBytesConfig( load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.float16, bnb_4bit_use_double_quant=True, ) # Load tokenizer from the adapter checkpoint tokenizer = AutoTokenizer.from_pretrained( ADAPTER_REPO, subfolder=CHECKPOINT, ) # Load base Gemma model in 4-bit base_model = AutoModelForCausalLM.from_pretrained( BASE_MODEL, quantization_config=bnb_config, device_map="auto", torch_dtype=torch.float16, attn_implementation="eager", ) # Attach the LoRA adapter model = PeftModel.from_pretrained( base_model, ADAPTER_REPO, subfolder=CHECKPOINT, ) model.eval() print("✅ AayuBot loaded successfully!") ``` --- ## Step 4 — Run inference ```python def ask_aayubot(question, severity="simple"): # Step 1: Generate initial response prompt = ( f"user\n" f"[Severity: {severity}]\n" f"{question}\n" f"\n" f"model\n" ) inputs = tokenizer(prompt, return_tensors="pt").to(model.device) with torch.no_grad(): output_ids = model.generate( **inputs, max_new_tokens=300, temperature=0.7, top_p=0.9, do_sample=True, repetition_penalty=1.25, no_repeat_ngram_size=5, pad_token_id=tokenizer.eos_token_id, ) raw_response = tokenizer.decode( output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True, ).strip() # Step 2: Post-processing (Clean + One Disclaimer) post_prompt = ( f"user\n" f"Rewrite the following medical answer cleanly. " f"Remove all repeated lines and duplicate disclaimers. " f"Keep only ONE clear disclaimer at the very end. " f"Make the response professional and concise.\n\n" f"Text:\n{raw_response}\n" f"\n" f"model\n" ) post_inputs = tokenizer(post_prompt, return_tensors="pt").to(model.device) with torch.no_grad(): post_output = model.generate( **post_inputs, max_new_tokens=300, temperature=0.4, top_p=0.95, do_sample=True, repetition_penalty=1.2, pad_token_id=tokenizer.eos_token_id, ) final_response = tokenizer.decode( post_output[0][post_inputs["input_ids"].shape[-1]:], skip_special_tokens=True, ).strip() # Step 3: Final cleaning (remove any remaining duplicates) lines = [] for line in final_response.split("\n"): line = line.strip() if line and line not in lines: lines.append(line) cleaned_response = "\n".join(lines) # Ensure disclaimer appears only once at the end disclaimer = "This is general information only. Please consult a doctor for proper diagnosis and treatment." # Remove any existing disclaimer lines cleaned_lines = [l for l in cleaned_response.split("\n") if "general information only" not in l.lower()] cleaned_response = "\n".join(cleaned_lines).strip() # Add single disclaimer at the end if disclaimer.lower() not in cleaned_response.lower(): cleaned_response += "\n\n" + disclaimer return cleaned_response.strip() # ── Try it ── print(ask_aayubot("What should I do for a mild headache?", severity="simple")) print(ask_aayubot("I have chest pain and shortness of breath", severity="high")) ``` --- ## Prompt format AayuBot expects a **severity tag** before the user query: ``` user [Severity: simple|medium|high] Your medical question here model ``` | Severity | When to use | |----------|-------------| | `simple` | General wellness, mild symptoms | | `medium` | Moderate symptoms, medication questions | | `high` | Emergency-level, chest pain, severe symptoms | --- ## Loading from a local clone If you used `git clone` (Option A above): ```python # Just point to the local path instead of the HF repo ID model = PeftModel.from_pretrained( base_model, "./aayubot-gemma-2b/checkpoint5-100k", ) ``` --- ## Paper / Citation > **AayuAI: Transform-Not-Delete Safety Curation and Severity-Aware Fine-Tuning for Medical Conversational AI** > > Authors: Jayraj S. Lakkad, **Satyam R. Tiwari**, Laksh J. Savaliya, Akshar V. Prajapati, Dipali Kasat, Dr. Premalkumar J. Patel ```bibtex @misc{aayuai2025, title = {AayuAI: Transform-Not-Delete Safety Curation and Severity-Aware Fine-Tuning for Medical Conversational AI}, author = {Lakkad, Jayraj S. and Tiwari, Satyam R. and Savaliya, Laksh J. and Prajapati, Akshar V. and Kasat, Dipali and Patel, Premalkumar J.}, year = {2025}, doi = {10.57967/hf/7776}, url = {https://huggingface.co/thesatyam12/aayubot-gemma-2b}, } ``` --- ## ⚠️ Important disclaimers 1. **Not a doctor.** AayuBot is a research prototype. Do **not** use it for real medical decisions. 2. **Base model license.** This repo contains **adapter weights only**. Usage of `google/gemma-2b-it` is governed by [Google's Gemma Terms of Use](https://ai.google.dev/gemma/terms). 3. **Adapter license.** The adapter weights in this repo are released under **CC BY 4.0**. ---