# T³: Test-Time Model Merging for Medical Vision-Language Models

![T³ Workflow](figures/method.png)  
*Figure 1: Dynamic test-time merging workflow of T³*

Official implementation of **T³: Test-Time Model Merging in Vision-Language Models for Zero-Shot Medical Imaging**, a method for adaptive fusion of pretrained and fine-tuned vision-language models at test time using Jensen-Shannon divergence.

---

## Key Features
- 🧠 **Mutual Information Guidance**: Uses JS divergence to measure model consensus.
- ⚡ **Backpropagation-Free**: No gradient updates required during inference.
- 🏥 **Medical Modality Agnostic**: Validated on four medical imaging domains.
- 🚀 **Batch-Wise Efficiency**: Reduces compute cost by 32x vs sample-wise merging.
- 📈 **SOTA Performance**: Outperforms 8+ baselines in accuracy & robustness.

---

## Table of Contents
- [Installation](#installation)
- [Method Overview](#method-overview)
- [Folder Structure](#folder-structure)
- [Reproducing Results](#reproducing-results)
- [Pretrained Weights](#pretrained-weights)
- [License](#license)
- [Contact](#contact)

## Installation

1. Clone repository:
```bash
git clone https://github.com/yourusername/T3.git
cd T3
```

2. Create conda environment:
```bash
conda create -n t3 python=3.9
conda activate t3
pip install -r requirements.txt
```

## Method Overview

### Adaptive Merging via Jensen-Shannon Divergence
The interpolation coefficient λ is computed dynamically for each sample using the following equation:

```math
\lambda(x) = \lambda_{\min} + (\lambda_{\max} - \lambda_{\min}) \, \sigma\bigl(\gamma \cdot \mathrm{JS}(p_{pt}(x) \Vert p_{ft}(x))\bigr)
```

Where:
- `JS` = Jensen-Shannon divergence between pretrained and fine-tuned model predictions.
- `σ` = Sigmoid function for smooth scaling.
- `γ` = Scaling factor (default=0.5).
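
The formula above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code in `t_cube.py`: the function names, the natural-log convention for JS, and the `lam_min=0.0` / `lam_max=1.0` defaults are assumptions made for the example. Only `gamma=0.5` comes from the text above.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (natural log) between rows of p and q."""
    # Clip to avoid log(0); eps is an illustrative choice.
    p = np.clip(np.asarray(p, dtype=np.float64), eps, 1.0)
    q = np.clip(np.asarray(q, dtype=np.float64), eps, 1.0)
    m = 0.5 * (p + q)  # mixture distribution
    kl_pm = np.sum(p * np.log(p / m), axis=-1)
    kl_qm = np.sum(q * np.log(q / m), axis=-1)
    return 0.5 * (kl_pm + kl_qm)

def interpolation_coefficient(p_pt, p_ft, gamma=0.5, lam_min=0.0, lam_max=1.0):
    """lambda(x) = lam_min + (lam_max - lam_min) * sigmoid(gamma * JS)."""
    js = js_divergence(p_pt, p_ft)
    sigmoid = 1.0 / (1.0 + np.exp(-gamma * js))
    return lam_min + (lam_max - lam_min) * sigmoid

# Toy batch of two samples over three classes (made-up probabilities).
p_pt = np.array([[0.7, 0.2, 0.1], [0.9, 0.05, 0.05]])
p_ft = np.array([[0.7, 0.2, 0.1], [0.05, 0.9, 0.05]])
lam = interpolation_coefficient(p_pt, p_ft)  # one coefficient per sample
```

Under these assumptions, the natural-log JS divergence lies in `[0, ln 2]`, so with `gamma=0.5` the sigmoid term stays between 0.5 and roughly 0.586. The effective range of λ is therefore set mostly by `lam_min` and `lam_max`.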

### Visual Explanation of the Method
The derivation and figures below motivate the method and illustrate its effectiveness.

### Dynamic Weighting Based on Model Agreement

We propose using Jensen–Shannon (JS) divergence to measure mutual information between pretrained (`p_pt`) and fine-tuned (`p_ft`) model predictions, offering a more robust gauge of joint confidence than entropy-based methods like DaWin's entropy ratio:

```math
R(x) = \frac{\mathcal{H}(p_{ft}(x))}{\mathcal{H}(p_{pt}(x)) + \mathcal{H}(p_{ft}(x))}
```
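
For reference, the entropy ratio can be computed as follows. This is a minimal NumPy sketch of the formula only. The function names and the `eps` clipping are assumptions for the example, and it does not reproduce how DaWin or `baselines.py` turn the ratio into merging weights.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy (natural log) along the last axis."""
    p = np.clip(np.asarray(p, dtype=np.float64), eps, 1.0)
    return -np.sum(p * np.log(p), axis=-1)

def entropy_ratio(p_pt, p_ft, eps=1e-12):
    """R(x) = H(p_ft) / (H(p_pt) + H(p_ft))."""
    h_pt = entropy(p_pt)
    h_ft = entropy(p_ft)
    # eps guards against a zero denominator when both predictions are one-hot.
    return h_ft / (h_pt + h_ft + eps)

# Made-up example: an uncertain pretrained model and a confident fine-tuned model.
p_pt = np.array([1 / 3, 1 / 3, 1 / 3])
p_ft = np.array([0.98, 0.01, 0.01])
r = entropy_ratio(p_pt, p_ft)  # below 0.5: the fine-tuned model has lower entropy
```

The ratio depends only on the two entropies, so it does not change if the class labels of one prediction are permuted.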

JS divergence explicitly captures agreement vs. disagreement by comparing full predictive distributions:

```math
I(x) = \frac{1}{2} \Bigl(\mathrm{KL}(p_{pt}(x) \Vert \bar{p}(x)) + \mathrm{KL}(p_{ft}(x) \Vert \bar{p}(x))\Bigr)
```
where
```math
\bar{p}(x) = 0.5 \cdot (p_{pt}(x) + p_{ft}(x))
```

This ensures:
- \(I(x) = 0\) when the two models' predictions are identical.
- \(I(x) > 0\) whenever they differ, with the largest values when confident predictions disagree.

Empirically, \(I(x)\) correlates positively with \(R(x)\), but better distinguishes disagreements, validating its use for adaptive merging.
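
A small toy case shows why comparing full distributions helps. In the sketch below (made-up probabilities, hypothetical helper names), the fine-tuned model is equally confident in both scenarios, so the entropy ratio is 0.5 both times. The JS divergence is zero when the models agree and clearly positive when they pick different classes.

```python
import numpy as np

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p))

def kl(p, q):
    p = np.clip(p, 1e-12, 1.0)
    q = np.clip(q, 1e-12, 1.0)
    return np.sum(p * np.log(p / q))

def entropy_ratio(p_pt, p_ft):
    return entropy(p_ft) / (entropy(p_pt) + entropy(p_ft))

def js(p_pt, p_ft):
    p_bar = 0.5 * (p_pt + p_ft)  # mixture of the two predictions
    return 0.5 * (kl(p_pt, p_bar) + kl(p_ft, p_bar))

p_pt = np.array([0.90, 0.05, 0.05])
agree = np.array([0.90, 0.05, 0.05])     # same confident prediction
disagree = np.array([0.05, 0.90, 0.05])  # equally confident, different class

for name, p_ft in [("agree", agree), ("disagree", disagree)]:
    print(f"{name}: R = {entropy_ratio(p_pt, p_ft):.3f}, I = {js(p_pt, p_ft):.3f}")
```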

2. **Mutual Information vs. Entropy**  
   ![MI vs Entropy](figures/mi_v_ent.png)  
   *Figure 3: Relationship between mutual information and entropy for adaptive merging.*

3. **Performance Across Modalities**  
   ![Performance Comparison](figures/results.png)  
   *Figure 4: T³ achieves superior performance across multiple medical imaging modalities.*

---

## Folder Structure

```
T3/
├── clip/              # CLIP model adaptations
├── data/              # Data utilities
├── utils/             # Helper functions
├── baselines.py       # Comparison methods
├── t_cube.py          # Core T³ implementation
├── BetaMixture.py     # Auxiliary models
└── README.md          # This document
```

---

## Reproducing Results

To reproduce the results from the paper, you can run the `t_cube.py` script. This script handles the evaluation of T³ and its baselines across multiple datasets and severity levels. Additional baselines are available in `baselines.py`.

To understand the script better:
- Refer to `compute_samplewise_tcube_weights` for entropy-based merging (the DaWin baseline) and `compute_samplewise_tcube_weights_MI` for our mutual information-based merging.
- Check the `evaluate_on_test_set` function for how datasets and severities are processed.
- Explore the `evaluate_tcube` function for the merging and evaluation logic.
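
As a rough mental model of the merging step, the sketch below linearly interpolates two parameter dictionaries with a single coefficient. It is an illustration under stated assumptions, not the logic of `evaluate_tcube`: the helper `merge_parameters`, the convention that λ weights the fine-tuned model, and the use of the batch mean of sample-wise coefficients are all hypothetical choices made for this example.

```python
import numpy as np

def merge_parameters(theta_pt, theta_ft, lam):
    """Return (1 - lam) * theta_pt + lam * theta_ft for every shared parameter.

    Assumption: lam weights the fine-tuned model; the repository may use
    the opposite convention.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if theta_pt.keys() != theta_ft.keys():
        raise ValueError("both models must share the same parameter names")
    return {name: (1.0 - lam) * theta_pt[name] + lam * theta_ft[name]
            for name in theta_pt}

# Toy "models" with made-up parameter names and values.
theta_pt = {"proj.weight": np.zeros((2, 2)), "proj.bias": np.zeros(2)}
theta_ft = {"proj.weight": np.ones((2, 2)), "proj.bias": np.ones(2)}

# Hypothetical sample-wise coefficients for one batch, reduced to a single
# batch-level coefficient by averaging (an assumption for this sketch).
samplewise_lam = np.array([0.50, 0.55, 0.58, 0.52])
batch_lam = float(samplewise_lam.mean())

merged = merge_parameters(theta_pt, theta_ft, batch_lam)
```

Merging once per batch instead of once per sample means the interpolated weights are built a single time for all samples in the batch.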

---

## Pretrained Weights

We provide pretrained weights for the following models:
1. **Generalist CLIP**: A pretrained model for general vision-language tasks.
2. **Expert CLIPs**: Four fine-tuned models, one for each of the following medical imaging domains:
   - Breast Imaging
   - Fundoscopy
   - Cell Microscopy
   - Retinal OCT

If you would like access to these weights, please contact us directly at [Raza Imam](mailto:raza.imam@mbzuai.ac.ae).

---

## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Contact
For questions or collaborations, contact [Raza Imam](mailto:raza.imam@mbzuai.ac.ae).