---
library_name: transformers
license: other
license_name: health-ai-developer-foundations
license_link: https://developers.google.com/health-ai-developer-foundations/terms
datasets:
- Nadhari/MedToolCalling
language:
- en
base_model:
- google/medgemma-1.5-4b-it
tags:
- medical
- healthcare
- tool-calling
- FHIR
- EHR
- clinical
- agents
pipeline_tag: text-generation
metrics:
- accuracy
---

# Sara-1.5-4B-it

**Sara** is a fine-tuned variant of Google's [MedGemma-1.5-4B-it](https://huggingface.co/google/medgemma-1.5-4b-it) specialized for medical tool calling and agentic tasks in EHR/FHIR clinical workflows.

## Model Description

Sara is trained to interact with FHIR R4-compliant Electronic Health Record (EHR) systems through structured API calls. The model can:

- **Query patient data** via FHIR GET requests (patient lookup, lab results, vitals)
- **Create clinical records** via FHIR POST requests (medication orders, referrals, observations)
- **Return structured final answers** in a consistent `FINISH([...])` format

This makes Sara well suited to building clinical AI agents that need to interface with healthcare IT systems.

## Intended Use

Sara is designed for:

- Building AI agents that interact with FHIR R4-compliant EHR systems
- Clinical decision support workflows requiring structured API interactions
- Research on LLM agents in healthcare settings
- Prototyping medical AI applications with tool-calling capabilities

### Out-of-Scope Use

- Direct clinical decision-making without human oversight
- Deployment in production healthcare environments without proper validation
- Use cases requiring real-time patient safety decisions

## Training Data

Sara was fine-tuned on the [MedToolCalling](https://huggingface.co/datasets/Nadhari/MedToolCalling) dataset, which contains 284 verified multi-turn conversations demonstrating correct FHIR API usage.
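The overview table below reports aggregate action counts. As a rough sketch of how such counts could be recomputed, the snippet assumes a hypothetical layout in which each sample is a list of `{"role", "content"}` chat messages and each model turn begins with its action keyword; the actual MedToolCalling column layout is not described in this card, and `tally_actions` is an illustrative name only.

```python
from collections import Counter

# Action keywords used by Sara; each model turn is ASSUMED to start with one.
ACTIONS = ("GET", "POST", "FINISH")


def tally_actions(conversations: list[list[dict[str, str]]]) -> Counter:
    """Count GET/POST/FINISH actions across model turns.

    Hypothetical schema: a conversation is a list of messages with "role" and
    "content" keys, and model turns use the role "model" or "assistant".
    """
    counts: Counter = Counter()
    for conversation in conversations:
        for message in conversation:
            if message["role"] not in ("model", "assistant"):
                continue  # skip user prompts and FHIR server responses
            text = message["content"].lstrip()
            for action in ACTIONS:
                if text.startswith(action):
                    counts[action] += 1
                    break
    return counts


# Tiny illustrative input, not real dataset rows.
demo = [
    [
        {"role": "user", "content": "What's the MRN of John Smith (DOB 1985-03-15)?"},
        {"role": "model", "content": "GET http://localhost:8080/fhir/Patient?family=Smith"},
        {"role": "user", "content": "Here is the response from the GET request: {}"},
        {"role": "model", "content": 'FINISH(["S1234567"])'},
    ]
]
print(tally_actions(demo))
```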
### Dataset Overview

| Attribute | Value |
|-----------|-------|
| Total Samples | 284 |
| Format | Multi-turn conversations |
| Avg. Turns per Sample | 2 |
| Action Types | `GET`, `POST`, `FINISH` |
| Total GET Calls | 225 |
| Total POST Calls | 78 |

### Task Types Covered

| Task | Description |
|------|-------------|
| Patient Lookup | Search patients by name, DOB, MRN |
| Age Calculation | Calculate patient age from DOB |
| Vitals Recording | Record blood pressure observations (POST) |
| Lab Queries | Query magnesium, potassium, CBG, HbA1c levels |
| Medication Orders | Conditionally order IV replacements with correct dosing |
| Referrals | Order orthopedic surgery referrals |
| Follow-up Labs | Schedule follow-up lab orders based on conditions |

### FHIR Resources Used

- `Patient` - Search and retrieve patient demographics
- `Observation` - Query labs and vitals, record new observations
- `MedicationRequest` - Order medications
- `ServiceRequest` - Order referrals and lab tests

## How to Use

### Installation

```bash
pip install transformers accelerate torch
```

### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Alfaxad/Sara-1.5-4B-it"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Example: patient lookup task.
# The instructions, endpoint list, and question are sent together as the first user turn.
system_prompt = """You are an expert in using FHIR functions to assist medical professionals. You are given a question and a set of possible functions. Based on the question, you will need to make one or more function/tool calls to achieve the purpose.

1. If you decide to invoke a GET function, you MUST put it in the format of
GET url?param_name1=param_value1&param_name2=param_value2...

2. If you decide to invoke a POST function, you MUST put it in the format of
POST url
[your payload data in JSON format]

3. If you have got answers for all the questions and finished all the requested tasks, you MUST call to finish the conversation in the format of
FINISH([answer1, answer2, ...])

Your response must be in the format of one of the three cases, and you can call only one function each time.

Available FHIR endpoints:
- GET {api_base}/Patient - Search patients by name, DOB, identifier
- GET {api_base}/Observation - Query lab results and vitals
- POST {api_base}/Observation - Record new observations
- POST {api_base}/MedicationRequest - Order medications
- POST {api_base}/ServiceRequest - Order referrals and labs

Use http://localhost:8080/fhir/ as the api_base.

Question: What's the MRN of the patient with name John Smith and DOB of 1985-03-15?"""

messages = [{"role": "user", "content": system_prompt}]

input_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
# The Gemma chat template already inserts <bos>; don't add it a second time.
inputs = tokenizer(input_text, return_tensors="pt", add_special_tokens=False).to(model.device)

with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=False,  # greedy decoding for reproducible tool calls
        pad_token_id=tokenizer.pad_token_id,
    )

# Decode only the newly generated tokens, not the prompt.
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(response)
# Expected output: GET http://localhost:8080/fhir/Patient?given=John&family=Smith&birthdate=1985-03-15
```

### Multi-Turn Conversation Example

```python
def run_agent_turn(model, tokenizer, conversation):
    """Run a single agent turn given the conversation history."""
    input_text = tokenizer.apply_chat_template(
        conversation, tokenize=False, add_generation_prompt=True
    )
    # The chat template already adds <bos>, so skip the tokenizer's special tokens.
    inputs = tokenizer(input_text, return_tensors="pt", add_special_tokens=False).to(model.device)
    with torch.inference_mode():
        outputs = model.generate(
            **inputs,
            max_new_tokens=512,
            do_sample=False,
            pad_token_id=tokenizer.pad_token_id,
        )
    response = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
    return response.strip()


# Initialize the conversation with the prompt from the basic-usage example
conversation = [{"role": "user", "content": system_prompt}]

# Turn 1: agent makes an API call
agent_response = run_agent_turn(model, tokenizer, conversation)
print(f"Agent: {agent_response}")
# Output: GET http://localhost:8080/fhir/Patient?given=John&family=Smith&birthdate=1985-03-15

# Record the agent turn, then simulate the FHIR server response
conversation.append({"role": "model", "content": agent_response})
fhir_response = """Here is the response from the GET request:
{
  "resourceType": "Bundle",
  "total": 1,
  "entry": [{
    "resource": {
      "resourceType": "Patient",
      "id": "S1234567",
      "identifier": [{"value": "S1234567"}],
      "name": [{"family": "Smith", "given": ["John"]}],
      "birthDate": "1985-03-15"
    }
  }]
}"""
conversation.append({"role": "user", "content": fhir_response})

# Turn 2: agent extracts the answer
agent_response = run_agent_turn(model, tokenizer, conversation)
print(f"Agent: {agent_response}")
# Output: FINISH(["S1234567"])
```

## Agent Action Format

Sara is trained to respond in exactly one of three formats per turn:

### GET Request

```
GET http://localhost:8080/fhir/{Resource}?param1=value1&param2=value2
```

### POST Request

```
POST http://localhost:8080/fhir/{Resource}
{
  "resourceType": "...",
  "field": "value",
  ...
}
```

### Final Answer

```
FINISH([answer1, answer2, ...])
```

## Limitations

- **Domain Specificity**: Sara is optimized for FHIR R4 API interactions and may not generalize well to other healthcare standards or to non-medical tool-calling tasks.
- **Validation Required**: Outputs should be validated before execution in any clinical system.
- **Not for Direct Patient Care**: This model is intended for research and development purposes, not direct clinical decision-making.
- **Context Window**: The base model supports a context of up to 128K tokens, but Sara was fine-tuned on sequences of at most 16K tokens.

## License

The use of Sara is governed by the [Health AI Developer Foundations terms of use](https://developers.google.com/health-ai-developer-foundations/terms), inherited from the base MedGemma model.
## Citation

If you use this model, please cite:

```bibtex
@misc{Sara,
  title={Sara-1.5-4B-it: A Fine-tuned MedGemma Model for Clinical Tool Calling},
  author={Eyembe, Alfaxad and {Nadhari AI Lab}},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/Nadhari/Sara-1.5-4B-it}
}
```

### Base Model Citation

```bibtex
@article{sellergren2025medgemma,
  title={MedGemma Technical Report},
  author={Sellergren, Andrew and Kazemzadeh, Sahar and Jaroensri, Tiam and Kiraly, Atilla and others},
  journal={arXiv preprint arXiv:2507.05201},
  year={2025}
}
```

### Dataset Citation

```bibtex
@misc{MedToolCalling,
  title={MedToolCalling: Medical Tool Calling Dataset},
  author={Eyembe, Alfaxad and {Nadhari AI Lab}},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/datasets/Nadhari/MedToolCalling}
}
```

### Evaluation Framework Citation

```bibtex
@article{tang2025medagentbench,
  title={MedAgentBench: Dataset for Benchmarking LLMs as Agents in Medical Applications},
  author={Tang, Yixing and Zou, Kaizhao and Sun, Hao and Chen, Zheng and Chen, Jonathan H},
  journal={arXiv preprint arXiv:2501.14654},
  year={2025}
}
```