---
library_name: transformers
license: other
license_name: health-ai-developer-foundations
license_link: https://developers.google.com/health-ai-developer-foundations/terms
datasets:
- Nadhari/MedToolCalling
language:
- en
base_model:
- google/medgemma-1.5-4b-it
tags:
- medical
- healthcare
- tool-calling
- FHIR
- EHR
- clinical
- agents
pipeline_tag: text-generation
metrics:
- accuracy
---

# Sara-1.5-4B-it

**Sara** is a fine-tuned variant of Google's [MedGemma-1.5-4B-it](https://huggingface.co/google/medgemma-1.5-4b-it), specialized for medical tool calling and agentic tasks in EHR/FHIR clinical workflows.

## Model Description

Sara is specifically trained to interact with FHIR R4-compliant Electronic Health Record (EHR) systems through structured API calls. The model can:

- **Query patient data** via FHIR GET requests (patient lookup, lab results, vitals)
- **Create clinical records** via FHIR POST requests (medication orders, referrals, observations)
- **Extract and return structured answers** in a consistent `FINISH([...])` format

This makes Sara well suited to building clinical AI agents that interface with healthcare IT systems.

## Intended Use

Sara is designed for:

- Building AI agents that interact with FHIR R4-compliant EHR systems
- Clinical decision support workflows requiring structured API interactions
- Research on LLM agents in healthcare settings
- Prototyping medical AI applications with tool-calling capabilities

### Out-of-Scope Use

- Direct clinical decision-making without human oversight
- Deployment in production healthcare environments without proper validation
- Use cases requiring real-time patient safety decisions

## Training Data

Sara was fine-tuned on the [MedToolCalling](https://huggingface.co/datasets/Nadhari/MedToolCalling) dataset, which contains 284 verified multi-turn conversations demonstrating correct FHIR API usage.

### Dataset Overview

| Attribute | Value |
|-----------|-------|
| Total Samples | 284 |
| Format | Multi-turn conversations |
| Avg. Turns per Sample | 2 |
| Action Types | `GET`, `POST`, `FINISH` |
| Total GET Calls | 225 |
| Total POST Calls | 78 |

### Task Types Covered

| Task | Description |
|------|-------------|
| Patient Lookup | Search patients by name, DOB, MRN |
| Age Calculation | Calculate patient age from DOB |
| Vitals Recording | Record blood pressure observations (POST) |
| Lab Queries | Query magnesium, potassium, CBG, HbA1C levels |
| Medication Orders | Conditionally order IV replacements with correct dosing |
| Referrals | Order orthopedic surgery referrals |
| Follow-up Labs | Schedule follow-up lab orders based on conditions |
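
For the age calculation task, a test harness may want to verify the model's answer independently. The snippet below is a minimal sketch of that check, not a description of how Sara computes ages internally; the function name `age_in_years` and the reference dates are illustrative assumptions.

```python
from datetime import date


def age_in_years(birth_date: date, today: date) -> int:
    """Return the number of completed years between birth_date and today."""
    years = today.year - birth_date.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


# birthDate as it appears in a FHIR Patient resource (ISO 8601 date string).
dob = date.fromisoformat("1985-03-15")
print(age_in_years(dob, date(2026, 3, 14)))  # 40 (day before the birthday)
print(age_in_years(dob, date(2026, 3, 15)))  # 41 (on the birthday)
```

Passing the reference date explicitly keeps the check reproducible, since benchmark tasks usually fix "today" rather than use the wall clock.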

### FHIR Resources Used

- `Patient` - Search and retrieve patient demographics
- `Observation` - Query labs and vitals, record new observations
- `MedicationRequest` - Order medications
- `ServiceRequest` - Order referrals and lab tests
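
To make the `Observation` write path concrete, here is a sketch of a blood pressure payload built in Python. It follows the general FHIR R4 vital-signs shape with standard LOINC codes; the helper name, patient ID, timestamp, and values are made up for illustration, and the payloads in the training data may use different codes or a simpler structure.

```python
import json


def blood_pressure_observation(patient_id, systolic, diastolic, when):
    """Build a FHIR R4 Observation dict for one blood pressure reading (illustrative)."""

    def component(loinc_code, display, value):
        return {
            "code": {"coding": [{"system": "http://loinc.org",
                                 "code": loinc_code, "display": display}]},
            "valueQuantity": {"value": value, "unit": "mmHg",
                              "system": "http://unitsofmeasure.org", "code": "mm[Hg]"},
        }

    return {
        "resourceType": "Observation",
        "status": "final",
        "category": [{"coding": [{
            "system": "http://terminology.hl7.org/CodeSystem/observation-category",
            "code": "vital-signs",
        }]}],
        "code": {"coding": [{"system": "http://loinc.org", "code": "85354-9",
                             "display": "Blood pressure panel with all children optional"}]},
        "subject": {"reference": f"Patient/{patient_id}"},
        "effectiveDateTime": when,
        "component": [
            component("8480-6", "Systolic blood pressure", systolic),
            component("8462-4", "Diastolic blood pressure", diastolic),
        ],
    }


# Hypothetical patient ID, reading, and timestamp.
obs = blood_pressure_observation("S1234567", 118, 77, "2026-01-15T09:30:00+00:00")
print("POST http://localhost:8080/fhir/Observation")
print(json.dumps(obs, indent=2))
```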


## How to Use

### Installation
```bash
pip install transformers accelerate torch
```

### Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Alfaxad/Sara-1.5-4B-it"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Example: Patient lookup task
system_prompt = """You are an expert in using FHIR functions to assist medical professionals. You are given a question and a set of possible functions. Based on the question, you will need to make one or more function/tool calls to achieve the purpose.

1. If you decide to invoke a GET function, you MUST put it in the format of
GET url?param_name1=param_value1&param_name2=param_value2...

2. If you decide to invoke a POST function, you MUST put it in the format of
POST url
[your payload data in JSON format]

3. If you have got answers for all the questions and finished all the requested tasks, you MUST call to finish the conversation in the format of
FINISH([answer1, answer2, ...])

Your response must be in the format of one of the three cases, and you can call only one function each time.

Available FHIR endpoints:
- GET {api_base}/Patient - Search patients by name, DOB, identifier
- GET {api_base}/Observation - Query lab results and vitals
- POST {api_base}/Observation - Record new observations
- POST {api_base}/MedicationRequest - Order medications
- POST {api_base}/ServiceRequest - Order referrals and labs

Use http://localhost:8080/fhir/ as the api_base.

Question: What's the MRN of the patient with name John Smith and DOB of 1985-03-15?"""

messages = [{"role": "user", "content": system_prompt}]

input_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

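# The chat template already inserts the BOS token, so do not add special tokens a second time.
inputs = tokenizer(input_text, return_tensors="pt", add_special_tokens=False).to(model.device)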

with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
    )

response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
# Expected output: GET http://localhost:8080/fhir/Patient?given=John&family=Smith&birthdate=1985-03-15
```

### Multi-Turn Conversation Example
```python
def run_agent_turn(model, tokenizer, conversation):
    """Run a single agent turn given the conversation history."""
    input_text = tokenizer.apply_chat_template(
        conversation,
        tokenize=False,
        add_generation_prompt=True
    )
    
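    # The chat template already inserts the BOS token, so do not add special tokens a second time.
    inputs = tokenizer(input_text, return_tensors="pt", add_special_tokens=False).to(model.device)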
    
    with torch.inference_mode():
        outputs = model.generate(
            **inputs,
            max_new_tokens=512,
            do_sample=False,
            pad_token_id=tokenizer.pad_token_id,
        )
    
    response = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:], 
        skip_special_tokens=True
    )
    return response.strip()

# Initialize conversation with system prompt
conversation = [{"role": "user", "content": system_prompt}]

# Turn 1: Agent makes API call
agent_response = run_agent_turn(model, tokenizer, conversation)
print(f"Agent: {agent_response}")
# Expected output: GET http://localhost:8080/fhir/Patient?given=John&family=Smith&birthdate=1985-03-15

# Record the agent's action in the history, then simulate the FHIR server response
conversation.append({"role": "model", "content": agent_response})
fhir_response = """Here is the response from the GET request:
{
  "resourceType": "Bundle",
  "total": 1,
  "entry": [{
    "resource": {
      "resourceType": "Patient",
      "id": "S1234567",
      "identifier": [{"value": "S1234567"}],
      "name": [{"family": "Smith", "given": ["John"]}],
      "birthDate": "1985-03-15"
    }
  }]
}"""
conversation.append({"role": "user", "content": fhir_response})

# Turn 2: Agent extracts answer
agent_response = run_agent_turn(model, tokenizer, conversation)
print(f"Agent: {agent_response}")
# Expected output: FINISH(["S1234567"])
```
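
The two hand-written turns above generalize to a loop that runs until the model emits `FINISH(...)`. The following is a minimal sketch under stated assumptions: `run_agent`, `generate`, `execute`, and `max_turns` are hypothetical names, and the scripted stand-ins replace the real model and FHIR server so the snippet runs on its own.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def run_agent(
    generate: Callable[[List[Message]], str],
    execute: Callable[[str], str],
    first_message: str,
    max_turns: int = 8,
) -> str:
    """Alternate model turns and tool results until the model emits FINISH(...)."""
    conversation: List[Message] = [{"role": "user", "content": first_message}]
    for _ in range(max_turns):
        action = generate(conversation).strip()
        conversation.append({"role": "model", "content": action})
        if action.startswith("FINISH("):
            return action
        # Feed the tool result back to the model as the next user message.
        conversation.append({"role": "user", "content": execute(action)})
    raise RuntimeError(f"No FINISH(...) after {max_turns} turns")


# Stand-ins so the sketch runs without a model or a FHIR server.
scripted = iter([
    "GET http://localhost:8080/fhir/Patient?given=John&family=Smith&birthdate=1985-03-15",
    'FINISH(["S1234567"])',
])
result = run_agent(
    generate=lambda conversation: next(scripted),
    execute=lambda action: 'Here is the response from the GET request:\n{"resourceType": "Bundle", "total": 1}',
    first_message="Question: ...",
)
print(result)  # FINISH(["S1234567"])
```

With a loaded model, `generate` could be `lambda conv: run_agent_turn(model, tokenizer, conv)`, and `execute` would issue the HTTP request against your FHIR server. The turn cap guards against a model that never finishes.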

## Agent Action Format

Sara responds in exactly one of three formats per turn:

### GET Request
```
GET http://localhost:8080/fhir/{Resource}?param1=value1&param2=value2
```

### POST Request
```
POST http://localhost:8080/fhir/{Resource}
{
  "resourceType": "...",
  "field": "value",
  ...
}
```

### Final Answer
```
FINISH([answer1, answer2, ...])
```
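
A harness consuming these outputs needs to tell the three formats apart. The parser below is one possible sketch, not part of the model release: `Action` and `parse_action` are hypothetical names, and it assumes the `FINISH` argument and the `POST` body are valid JSON, which a real deployment should not take for granted.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Action:
    kind: str                  # "GET", "POST", or "FINISH"
    url: Optional[str] = None  # set for GET and POST
    payload: Any = None        # JSON body for POST, answer list for FINISH


def parse_action(text: str) -> Action:
    """Parse one model response into an Action, or raise ValueError."""
    text = text.strip()
    if text.startswith("GET "):
        return Action("GET", url=text[len("GET "):].strip())
    if text.startswith("POST "):
        # First line is the URL; everything after it is the JSON payload.
        first_line, _, body = text[len("POST "):].partition("\n")
        return Action("POST", url=first_line.strip(), payload=json.loads(body))
    if text.startswith("FINISH(") and text.endswith(")"):
        return Action("FINISH", payload=json.loads(text[len("FINISH("):-1]))
    raise ValueError(f"Unrecognized action: {text[:40]!r}")


print(parse_action('FINISH(["S1234567"])').payload)  # ['S1234567']
```

`json.JSONDecodeError` is a subclass of `ValueError`, so callers can catch a single exception type for every malformed response.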

## Limitations

- **Domain Specificity**: Sara is optimized for FHIR R4 API interactions and may not generalize well to other healthcare standards or non-medical tool-calling tasks.
- **Validation Required**: Outputs should be validated before execution in any clinical system.
- **Not for Direct Patient Care**: This model is intended for research and development purposes, not direct clinical decision-making.
- **Context Window**: While the model supports up to 128K tokens, it was fine-tuned on sequences up to 16K tokens.
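
As one way to act on the validation point above, generated requests can be checked against an allowlist before anything is executed. This is a sketch only: the `API_BASE` value and the method/resource pairs mirror the endpoints listed in the example prompt, while `ALLOWED` and `is_allowed` are hypothetical names. A real deployment would also validate payload contents and authorization.

```python
API_BASE = "http://localhost:8080/fhir/"

# Method/resource pairs taken from the endpoint list in the example prompt.
ALLOWED = {
    "GET": {"Patient", "Observation"},
    "POST": {"Observation", "MedicationRequest", "ServiceRequest"},
}


def is_allowed(method: str, url: str) -> bool:
    """Return True only for an allowlisted method on an allowlisted resource."""
    if not url.startswith(API_BASE):
        return False
    # Whatever follows the base, minus the query string, must be exactly a resource name.
    resource = url[len(API_BASE):].split("?", 1)[0]
    return resource in ALLOWED.get(method, set())


print(is_allowed("GET", "http://localhost:8080/fhir/Patient?given=John"))  # True
print(is_allowed("POST", "http://localhost:8080/fhir/Patient"))            # False
```

Requiring an exact resource name after the base rejects path tricks such as `../`, at the cost of also rejecting reads by ID (for example `Patient/123`), which would need an explicit rule.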

## License

The use of Sara is governed by the [Health AI Developer Foundations terms of use](https://developers.google.com/health-ai-developer-foundations/terms), inherited from the base MedGemma model.

## Citation

If you use this model, please cite:
```bibtex
@misc{Sara,
  title={Sara-1.5-4B-it: A Fine-tuned MedGemma Model for Clinical Tool Calling},
  author={Alfaxad Eyembe and {Nadhari AI Lab}},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/Nadhari/Sara-1.5-4B-it}
}
```

### Base Model Citation
```bibtex
@article{sellergren2025medgemma,
  title={MedGemma Technical Report},
  author={Sellergren, Andrew and Kazemzadeh, Sahar and Jaroensri, Tiam and Kiraly, Atilla and others},
  journal={arXiv preprint arXiv:2507.05201},
  year={2025}
}
```

### Dataset Citation
```bibtex
@misc{MedToolCalling,
  author = {Alfaxad Eyembe and {Nadhari AI Lab}},
  title = {MedToolCalling: Medical Tool Calling Dataset},
  year = {2026},
  publisher = {Hugging Face},
  url={https://huggingface.co/datasets/Nadhari/MedToolCalling}
}
```

### Evaluation Framework Citation
```bibtex
@article{tang2025medagentbench,
  title={MedAgentBench: Dataset for Benchmarking LLMs as Agents in Medical Applications},
  author={Jiang, Yixing and Black, Kameron C. and Geng, Gloria and Park, Danny and Zou, James and Ng, Andrew Y. and Chen, Jonathan H.},
  journal={arXiv preprint arXiv:2501.14654},
  year={2025}
}
```