---
license: mit
language:
- en
- zh
pipeline_tag: text-generation
library_name: transformers
base_model:
- zai-org/GLM-5.1
tags:
- macaron
- personal-agent
- tool-use
- mixture-of-lora
- generative-ui
- a2ui
- glm
---

# Macaron-V1-Preview-749B

Macaron-V1-Preview-749B is a 749B-class Mixture-of-LoRA personal-agent model from MindLab Research, post-trained from GLM-5.1 with MinT. It combines a 744B base model with five specialist LoRA adapters and a router-driven serving design for multi-turn personal-life assistance, tool-grounded planning, coding and terminal workflows, and protocol-grounded Generative UI.

Release blog: https://macaron.im/mindlab/research/macaron-v1-preview

## Highlights

- 749B-class Mixture-of-LoRA preview model: 744B base + 5 specialist LoRAs.
- Built for personal-agent tasks where user intent, private state, tools, and world state change across turns.
- Uses an explicit router-tool design: the default adapter can route to specialist LoRAs through `change_model`.
- Covers personal planning, search/calendar/tool workflows, coding and terminal tasks, computer-agent workflows, and A2UI Generative UI.
- Ships as a single Hugging Face repository: base model files at root, LoRA adapters in `l0/` through `l4/`.

## Model Overview

| Field | Value |
|---|---|
| Model name | Macaron-V1-Preview-749B |
| Organization | MindLab Research |
| Base model | GLM-5.1 |
| Architecture | Mixture-of-LoRA |
| Parameter footprint | 749B-class: 744B base + 5 x ~1B LoRA |
| Post-training system | MinT |
| Primary domain | Personal agents, tool-use agents, Generative UI |
| Release type | Preview |
| Checkpoint format | Single HF repo: base checkpoint at root; LoRAs under `l0/`-`l4/` |
| Context length | 202,752 tokens, from `config.json` / `tokenizer_config.json` |
| Precision | bfloat16, from `config.json` |
| License | MIT; see [License](#license) |

## Repository Layout

The release is intentionally kept in one Hugging Face model repository:

```text
.
|-- config.json
|-- generation_config.json
|-- model.safetensors.index.json
|-- model-00001-of-00282.safetensors
|-- ...
|-- model-00282-of-00282.safetensors
|-- tokenizer.json
|-- tokenizer_config.json
|-- l0/
|   |-- adapter_config.json
|   `-- adapter_model.safetensors
|-- l1/
|-- l2/
|-- l3/
`-- l4/
```

Adapter roles:

| Adapter | Role |
|---|---|
| `l0` | Default chat, general-purpose behavior, and routing entry point |
| `l1` | Personal-agent tasks such as calendar, planning, search, and life automation |
| `l2` | Coding, terminal, repository, and shell tasks |
| `l3` | A2UI and Generative UI |
| `l4` | Computer-agent / OpenClaw-style workflows |

## What Macaron Is For

A useful personal agent has to work where the user actually lives. Daily life is full of small contingent decisions: what to eat tonight, where to find a quiet table, how to reroute when traffic changes, how to schedule an errand around family obligations, or how to choose the right UI surface for a task. These tasks become hard because the user, tools, and environment all change while the agent is working.

Macaron-V1-Preview-749B targets three linked abilities:

- **Capability**: using real tools such as search, maps, restaurants, calendars, coding environments, and task APIs.
- **Coherence**: tracking a real human's preferences, constraints, and changing intent across turns.
- **Expression**: choosing the right surface, such as text, card, form, table, slider, or dashboard, and rendering it quickly enough to remain useful.

## Architecture

### Mixture-of-LoRA

Macaron-V1-Preview-749B keeps divergent skill families in separate LoRAs over a shared base model. This is intended to reduce interference between chat, personal-agent tool use, coding, computer-agent behavior, and Generative UI, while still allowing the system to add new specialist domains without modifying the base model or existing specialists.

### Router Tool

Macaron exposes model selection as a tool call rather than as an opaque separate router model. The default adapter is `l0`. When a specialist is needed, the serving harness can route through an OpenAI-compatible tool call such as:

```json
{
  "name": "change_model",
  "arguments": {
    "target_model": "l1"
  }
}
```

The route is visible in traces and compatible with a standard tool-calling serving loop. A complete deployment should define the adapter registry, routing policy, confirmation policy, and how the system returns to the default adapter after a specialist turn.
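
As a minimal sketch of how a serving loop might handle this call, the snippet below validates a `change_model` tool call against an adapter registry and returns the adapter to activate. The registry entries mirror the adapter roles listed under Repository Layout. The helper name `resolve_route`, the choice to keep the current adapter on unknown or malformed input, and the handling of `arguments` as a JSON-encoded string (common in OpenAI-style tool calls) are assumptions for illustration, not part of the released harness.

```python
import json

# Adapter names taken from the repository layout; roles abbreviated.
ADAPTER_REGISTRY = {
    "l0": "default chat and routing entry point",
    "l1": "personal-agent tasks",
    "l2": "coding and terminal tasks",
    "l3": "A2UI and Generative UI",
    "l4": "computer-agent workflows",
}
DEFAULT_ADAPTER = "l0"


def resolve_route(tool_call: dict, current: str = DEFAULT_ADAPTER) -> str:
    """Return the adapter to activate after a tool call (hypothetical helper).

    Calls to other tools, malformed arguments, and unknown targets leave the
    current adapter unchanged.
    """
    if tool_call.get("name") != "change_model":
        return current
    arguments = tool_call.get("arguments", {})
    # OpenAI-style APIs often deliver arguments as a JSON-encoded string.
    if isinstance(arguments, str):
        try:
            arguments = json.loads(arguments)
        except json.JSONDecodeError:
            return current
    if not isinstance(arguments, dict):
        return current
    target = arguments.get("target_model")
    if isinstance(target, str) and target in ADAPTER_REGISTRY:
        return target
    return current


call = {"name": "change_model", "arguments": {"target_model": "l1"}}
print(resolve_route(call))
```

Keeping the registry as plain data means a new specialist can be added by extending the dictionary, without touching the dispatch logic.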

### Harness Co-Design

Macaron-V1-Preview-749B is a model-and-harness release. The model was trained and evaluated with a production-style agent harness that manages LoRA routing, tool calls, memory/state exposure, system prompts, and task metadata. Deployments that remove or replace that harness should expect behavior and benchmark results to change.

## Generative UI and A2UI

Generative UI is a core Macaron capability. For many personal-agent tasks, the best answer is not only text: it may be a comparison card, editable task summary, booking form, route choice, slider, or dashboard.

Macaron-V1-Preview-749B is trained and evaluated with A2UI-style protocol actions. A2UI-Bench scores Generative UI along three layers:

- **Protocol correctness**: emitted actions are well formed and faithful to protocol semantics.
- **Task construction correctness**: the generated UI answers the user's request.
- **User-experience lift**: the UI makes the task easier than a text-only answer.

The evaluation also includes rendered visual checks for failures that text-only judges can miss, such as overflow, broken layouts, hidden controls, and spacing issues.

## Evaluation

The headline benchmark suite focuses on personal-agent behavior, daily-life task surfaces, Generative UI, and OpenClaw-style workflows.

![Macaron-V1-Preview-749B benchmark bar chart](assets/macaron_benchmark_bar_chart.png)

![Macaron-V1-Preview-749B benchmark radar chart](assets/macaron_benchmark_radar_chart.png)

![Macaron-V1-Preview-749B benchmark table](assets/macaron_benchmark_table.png)

Higher is better for all scores shown in the figures.

### Evaluation Protocols

**Macaron LivingBench.** Models are evaluated on 30 multi-turn personal-agent cases with a 10-turn budget. The tested agent may make up to three tool-use decisions per user turn. API calls use a 240-second timeout and up to three request-level retries. Each case is scored as `0.7 x need score + 0.3 x process score`, and the reported value is the mean over cases.
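
The LivingBench weighting can be written out as a short calculation. The sketch below assumes the reported figure is an unweighted mean over cases; the function names and the sample component scores are made up for illustration and are not benchmark data.

```python
def case_score(need: float, process: float) -> float:
    """Weighted case score: 0.7 x need score + 0.3 x process score."""
    return 0.7 * need + 0.3 * process


def mean_case_score(cases: list[tuple[float, float]]) -> float:
    """Unweighted mean of case scores over (need, process) pairs."""
    if not cases:
        raise ValueError("at least one case is required")
    return sum(case_score(n, p) for n, p in cases) / len(cases)


# Made-up component scores for three cases, for illustration only.
example = [(1.0, 0.5), (0.5, 1.0), (0.0, 0.0)]
print(round(mean_case_score(example), 4))
```

Because the weighting is linear, averaging per-case scores gives the same result as weighting the mean need score and the mean process score.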

**A2UI-Bench.** Macaron-V1-Preview-749B is evaluated without explicit schema hints. Scores include protocol correctness, task construction correctness, and rendered UI quality.

**VitaBench.** VitaBench is used to stress realistic daily-life workflows. Because the original official judge model is no longer available, GLM-5.1 serves as both the judge and the user model. Each query is run three times, and the reported value is the average score.

**PinchBench.** PinchBench is used for search-grounded, high-precision personal-agent tasks. The reported setup uses Claude Haiku 4.5 as the judge model and Perplexity as the search API, and reports the best observed score.

**Tau3 Bench.** The reported setup uses GPT-5.2 with `reasoning_effort=low` as the user simulator and reports pass@1.

**SWE-Bench Verified.** The reported setup allows up to three retries only when an evaluation error occurs and reports the best successful attempt. The overall evaluation-error rate is approximately 0.8%.

**Terminal-Bench 2.0.** The reported setup uses the Harbor framework to run Macaron with the Pi Coding Agent Harness in sandboxed environments, with a maximum timeout of 4 hours, and reports pass@1.

**AIME 2026.** The reported score is included as a general-capability reference; the preview release is optimized primarily for personal-agent behavior and Generative UI rather than for maximizing this benchmark.

## Intended Use

Macaron-V1-Preview-749B is intended for:

- personal assistant research
- multi-turn tool-use agents
- daily-life planning and automation
- coding and terminal-agent research
- Generative UI / A2UI research
- agent benchmark evaluation
- research on modular post-training and LoRA specialization

## Out-of-Scope Use

Macaron-V1-Preview-749B is not intended for:

- autonomous high-stakes decisions without human confirmation
- medical, legal, financial, or safety-critical advice as a sole authority
- covert surveillance or privacy-invasive automation
- fully unsupervised payments, bookings, messages, calendar changes, or other external write actions
- production deployment without task-specific safety testing, audit logs, and confirmation flows

## Installation and Loading

The repository contains both the base checkpoint and LoRA adapters, but full Macaron behavior depends on the router-aware serving harness. Loading a single LoRA is useful for inspection and specialist experiments; it is not equivalent to the full routed personal-agent system.

Install dependencies:

```bash
pip install -U transformers accelerate peft safetensors
```

Example: load the base checkpoint and attach one specialist LoRA:

```python
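import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

repo_id = "mindlab-research/Macaron-V1-Preview-749B"
adapter = "l1"  # one of "l0" through "l4"; see the adapter roles table

# The tokenizer and base checkpoint live at the repository root.
tokenizer = AutoTokenizer.from_pretrained(
    repo_id,
    trust_remote_code=True,
)

# bfloat16 matches the precision in config.json; device_map="auto" lets
# accelerate place the weights across the available devices.
base_model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Attach a single specialist LoRA from its subfolder in the same repository.
model = PeftModel.from_pretrained(
    base_model,
    repo_id,
    subfolder=adapter,
)
model.eval()  # inference mode: disables dropout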
```

For full routed serving, use a harness that:

- registers all five LoRA specialists
- starts each conversation from `l0`
- exposes `change_model` as a tool call
- routes to specialists according to the adapter registry
- returns control to `l0` after specialist turns
- enforces confirmation for external write actions
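
The routing-related items in this list can be sketched as a small session object. This is an illustration under stated assumptions rather than the released harness: the class `RoutedSession`, its method names, and the rule that control returns to `l0` after exactly one specialist turn are hypothetical, and confirmation handling is left out.

```python
from dataclasses import dataclass, field

ADAPTERS = ("l0", "l1", "l2", "l3", "l4")  # the five adapters in this repo
DEFAULT_ADAPTER = "l0"


@dataclass
class RoutedSession:
    """Tracks which adapter is active for one conversation (hypothetical)."""

    active: str = DEFAULT_ADAPTER
    trace: list = field(default_factory=list)  # visible routing history

    def change_model(self, target_model: str) -> None:
        """Handle a `change_model` tool call by switching adapters."""
        if target_model not in ADAPTERS:
            raise ValueError(f"unknown adapter: {target_model!r}")
        self.trace.append((self.active, target_model))
        self.active = target_model

    def end_turn(self) -> None:
        """Return control to the default adapter after a specialist turn."""
        if self.active != DEFAULT_ADAPTER:
            self.trace.append((self.active, DEFAULT_ADAPTER))
            self.active = DEFAULT_ADAPTER


session = RoutedSession()
session.change_model("l2")  # route a coding request to the specialist
session.end_turn()          # hand control back to l0
print(session.active, session.trace)
```

In a PEFT-based deployment, the `active` field could drive a call such as `model.set_adapter(...)` before each generation step; the recorded `trace` keeps every route inspectable.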

## Tool Use

Macaron-V1-Preview-749B is designed to operate with external tools. Personal-agent deployments may include:

- search
- calendar
- route planning
- restaurant/place lookup
- booking
- messaging
- task-specific APIs
- A2UI rendering actions
- coding, shell, and repository tools

The model should request explicit user confirmation before external write actions such as booking, sending messages, changing calendars, or making purchases.
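
One way to enforce this in the harness, rather than relying on model behavior alone, is a gate in front of tool execution. The sketch below is illustrative: the tool names in `WRITE_TOOLS`, the `confirm` callback, and the `execute_tool` function are hypothetical and would need to match the deployment's actual tool schema.

```python
from typing import Any, Callable

# Hypothetical names for tools that change external state.
WRITE_TOOLS = {"book", "send_message", "update_calendar", "purchase"}


def execute_tool(
    name: str,
    arguments: dict,
    tools: dict[str, Callable[..., Any]],
    confirm: Callable[[str, dict], bool],
) -> dict:
    """Run a tool, asking the user first when it is an external write."""
    if name not in tools:
        return {"status": "error", "reason": f"unknown tool: {name}"}
    if name in WRITE_TOOLS and not confirm(name, arguments):
        return {"status": "declined", "reason": "user did not confirm"}
    return {"status": "ok", "result": tools[name](**arguments)}


tools = {
    "search": lambda query: f"results for {query}",
    "book": lambda place, time: f"booked {place} at {time}",
}
deny_all = lambda name, arguments: False  # stand-in for a real user prompt

print(execute_tool("search", {"query": "quiet cafe"}, tools, deny_all))
print(execute_tool("book", {"place": "Cafe A", "time": "19:00"}, tools, deny_all))
```

Read-only tools pass straight through, while a declined write returns a structured result that the model can use to explain what was not done.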

## Safety, Privacy, and Limitations

Macaron-V1-Preview-749B is designed for personal-agent settings where user state, calendar details, preferences, and inferred motivations may be sensitive. The model should avoid revealing private state unless the user explicitly authorizes disclosure.

Deployment recommendations:

- keep audit logs for tool calls
- require confirmation for external write actions
- separate private user state from visible conversation
- evaluate privacy leakage in the target harness
- test tool schemas before production use
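
For the first recommendation, a minimal sketch of an audit trail is an append-only list of JSON records, one per tool call. The record fields, the `AuditLog` class, and the choice to store a hash of the arguments instead of the raw values (so private state is not copied into the log verbatim) are assumptions for illustration; a real deployment would write to durable storage and decide its own retention policy.

```python
import hashlib
import json
import time


class AuditLog:
    """Append-only in-memory record of tool calls (hypothetical helper)."""

    def __init__(self) -> None:
        self.records: list[str] = []

    def record(self, adapter: str, tool: str, arguments: dict, status: str) -> dict:
        # Hash the arguments so the log shows that a call happened
        # without storing potentially private values verbatim.
        payload = json.dumps(arguments, sort_keys=True).encode("utf-8")
        entry = {
            "timestamp": time.time(),
            "adapter": adapter,
            "tool": tool,
            "arguments_sha256": hashlib.sha256(payload).hexdigest(),
            "status": status,
        }
        self.records.append(json.dumps(entry))
        return entry


log = AuditLog()
log.record("l1", "update_calendar", {"event": "dentist", "time": "09:00"}, "ok")
print(len(log.records))
```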

Limitations:

- Preview release; behavior may change across versions.
- Full behavior depends on a correct harness, router, and tool schema.
- Agent performance can degrade if tools return stale, partial, or contradictory data.
- Long-horizon personal-agent tasks still require human confirmation for external actions.
- A2UI quality depends on renderer and protocol compatibility.
- Benchmark scores may not transfer to deployments with different tools, user simulators, routing policies, or safety constraints.

## License

Macaron-V1-Preview-749B is released under the MIT License. Users should also respect any requirements inherited from the GLM-5.1 base model and from dependencies used by the serving harness.

## Citation

```bibtex
@misc{macaron2026preview749b,
  title = {Macaron-V1-Preview-749B: Mixture-of-LoRA Personal Agent Model},
  author = {MindLab Research},
  year = {2026},
  howpublished = {Hugging Face}
}
```

## Contact

- Organization: MindLab Research
- Project: Macaron
- Release blog: https://macaron.im/mindlab/research/macaron-v1-preview