Image-Text-to-Text
Transformers
Safetensors
qwen3_5
vero
vision-language-model
multimodal
visual-reasoning
reinforcement-learning
conversational
Instructions for using zlab-princeton/Vero-Qwen35-9B-Base with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use zlab-princeton/Vero-Qwen35-9B-Base with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="zlab-princeton/Vero-Qwen35-9B-Base")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("zlab-princeton/Vero-Qwen35-9B-Base")
model = AutoModelForMultimodalLM.from_pretrained("zlab-princeton/Vero-Qwen35-9B-Base")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (skip the prompt)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use zlab-princeton/Vero-Qwen35-9B-Base with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "zlab-princeton/Vero-Qwen35-9B-Base"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "zlab-princeton/Vero-Qwen35-9B-Base",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```
Use Docker
```shell
docker model run hf.co/zlab-princeton/Vero-Qwen35-9B-Base
```
- SGLang
How to use zlab-princeton/Vero-Qwen35-9B-Base with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "zlab-princeton/Vero-Qwen35-9B-Base" \
  --host 0.0.0.0 \
  --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "zlab-princeton/Vero-Qwen35-9B-Base",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "zlab-princeton/Vero-Qwen35-9B-Base" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "zlab-princeton/Vero-Qwen35-9B-Base",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```
- Docker Model Runner
How to use zlab-princeton/Vero-Qwen35-9B-Base with Docker Model Runner:
```shell
docker model run hf.co/zlab-princeton/Vero-Qwen35-9B-Base
```
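Both the vLLM and SGLang servers above expose an OpenAI-compatible `/v1/chat/completions` endpoint, so the curl calls can also be made from Python. The sketch below uses only the standard library and builds the same request body as the curl examples. The helper names `build_payload` and `post_chat` are hypothetical; the ports (8000 for vLLM, 30000 for SGLang) are taken from the commands above.

```python
import json
import urllib.request

MODEL_ID = "zlab-princeton/Vero-Qwen35-9B-Base"


def build_payload(prompt: str, image_url: str, model: str = MODEL_ID) -> dict:
    """Build the same chat-completions body that the curl examples send."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def post_chat(payload: dict, base_url: str = "http://localhost:8000") -> dict:
    """POST the payload to an OpenAI-compatible server and return the parsed JSON.

    Use base_url "http://localhost:8000" for the vLLM example and
    "http://localhost:30000" for the SGLang example.
    """
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, with the SGLang server running, `reply = post_chat(build_payload("Describe this image in one sentence.", image_url), base_url="http://localhost:30000")` sends the request; in the OpenAI response format the generated text is at `reply["choices"][0]["message"]["content"]`.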
jinja template
#1
Opened by kalle07
Minor issue, but with a significant impact:
https://huggingface.co/mradermacher/Vero-Qwen35-9B-Base-GGUF
These are not working.
Your Jinja template is wrong; it is not accepted by LM Studio and several GGUF providers.
Here is one that works. You can change it if you think something is more closely tied to your training, but the system prompt doesn't need hard-coding like in your template.
Thinking can simply be enabled or disabled with the `enable_thinking` line at the top.
{%- set enable_thinking = True-%}
{%- set fixed_system_prompt = "Describe the image clearly and accurately." %}
{%- set image_count = namespace(value=0) %}
{%- set video_count = namespace(value=0) %}
{%- macro render_content(content, do_vision_count=false, is_system_content=false) %}
{%- if content is string %}
{{- content }}
{%- elif content is iterable and content is not mapping %}
{%- for item in content %}
{# ---------- IMAGE ---------- #}
{%- if (
('image' in item)
or ('image_url' in item)
or (item.type is defined and item.type == 'image')
) %}
{%- if is_system_content %}
{{- raise_exception('System message cannot contain images.') }}
{%- endif %}
{%- if do_vision_count %}
{%- set image_count.value = image_count.value + 1 %}
{%- endif %}
{%- if add_vision_id %}
{{- 'Picture ' ~ image_count.value ~ ': ' }}
{%- endif %}
{{- '<|vision_start|><|image_pad|><|vision_end|>' }}
{# ---------- VIDEO ---------- #}
{%- elif (
('video' in item)
or (item.type is defined and item.type == 'video')
) %}
{%- if is_system_content %}
{{- raise_exception('System message cannot contain videos.') }}
{%- endif %}
{%- if do_vision_count %}
{%- set video_count.value = video_count.value + 1 %}
{%- endif %}
{%- if add_vision_id %}
{{- 'Video ' ~ video_count.value ~ ': ' }}
{%- endif %}
{{- '<|vision_start|><|video_pad|><|vision_end|>' }}
{# ---------- TEXT ---------- #}
{%- elif item.text is defined %}
{{- item.text }}
{%- else %}
{{- raise_exception('Unexpected item type in content.') }}
{%- endif %}
{%- endfor %}
{%- elif content is none or content is undefined %}
{{- '' }}
{%- else %}
{{- raise_exception('Unexpected content type.') }}
{%- endif %}
{%- endmacro %}
{# ====================================================== #}
{# VALIDATION #}
{# ====================================================== #}
{%- if not messages %}
{{- raise_exception('No messages provided.') }}
{%- endif %}
{# ====================================================== #}
{# SYSTEM PROMPT #}
{# ====================================================== #}
{%- if tools and tools is iterable and tools is not mapping %}
{{- '<|im_start|>system\n' }}
{{- "# Tools\n\nYou have access to the following functions:\n\n<tools>" }}
{%- for tool in tools %}
{{- "\n" }}
{{- tool | tojson }}
{%- endfor %}
{{- "\n</tools>" }}
{{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\n- Function calls MUST follow the specified format\n- Required parameters MUST be specified\n- If there is no function call available, answer normally\n</IMPORTANT>' }}
{%- set content = fixed_system_prompt | trim %}
{%- if content %}
{{- '\n\n' + content }}
{%- endif %}
{{- '<|im_end|>\n' }}
{%- else %}
{{- '<|im_start|>system\n' }}
{{- fixed_system_prompt | trim }}
{{- '<|im_end|>\n' }}
{%- endif %}
{# ====================================================== #}
{# FIND LAST USER QUERY #}
{# ====================================================== #}
{%- set ns = namespace(
multi_step_tool=true,
last_query_index=messages|length - 1
) %}
{%- for message in messages[::-1] %}
{%- set index = (messages|length - 1) - loop.index0 %}
{%- if ns.multi_step_tool and message.role == "user" %}
{%- set content = render_content(message.content, false) | trim %}
{%- if not (
content.startswith('<tool_response>')
and content.endswith('</tool_response>')
) %}
{%- set ns.multi_step_tool = false %}
{%- set ns.last_query_index = index %}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if ns.multi_step_tool %}
{{- raise_exception('No user query found in messages.') }}
{%- endif %}
{# ====================================================== #}
{# RENDER MESSAGES #}
{# ====================================================== #}
{%- for message in messages %}
{%- set content = render_content(message.content, true) | trim %}
{# ---------- SYSTEM ---------- #}
{%- if message.role == "system" %}
{%- if not loop.first %}
{{- raise_exception('System message must be at the beginning.') }}
{%- endif %}
{# ---------- USER ---------- #}
{%- elif message.role == "user" %}
{{- '<|im_start|>user\n' }}
{{- content }}
{{- '<|im_end|>\n' }}
{# ---------- ASSISTANT ---------- #}
{%- elif message.role == "assistant" %}
{%- set reasoning_content = '' %}
{%- if message.reasoning_content is string %}
{%- set reasoning_content = message.reasoning_content %}
{%- else %}
{%- if '</think>' in content %}
{%- set reasoning_content =
content.split('</think>')[0]
.rstrip('\n')
.split('<think>')[-1]
.lstrip('\n')
%}
{%- set content =
content.split('</think>')[-1]
.lstrip('\n')
%}
{%- endif %}
{%- endif %}
{%- set reasoning_content = reasoning_content | trim %}
{{- '<|im_start|>assistant\n' }}
{%- if loop.index0 > ns.last_query_index and reasoning_content %}
{{- '<think>\n' }}
{{- reasoning_content }}
{{- '\n</think>\n\n' }}
{%- endif %}
{{- content }}
{# ---------- TOOL CALLS ---------- #}
{%- if message.tool_calls
and message.tool_calls is iterable
and message.tool_calls is not mapping
%}
{%- for tool_call in message.tool_calls %}
{%- if tool_call.function is defined %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '\n<tool_call>\n' }}
{{- '<function=' + tool_call.name + '>\n' }}
{%- if tool_call.arguments is defined %}
{%- for args_name, args_value in tool_call.arguments.items() %}
{{- '<parameter=' + args_name + '>\n' }}
{%- if args_value is mapping
or (
args_value is iterable
and args_value is not string
)
%}
{{- args_value | tojson }}
{%- else %}
{{- args_value | string }}
{%- endif %}
{{- '\n</parameter>\n' }}
{%- endfor %}
{%- endif %}
{{- '</function>\n</tool_call>' }}
{%- endfor %}
{%- endif %}
{{- '<|im_end|>\n' }}
{# ---------- TOOL ---------- #}
{%- elif message.role == "tool" %}
{%- if loop.previtem and loop.previtem.role != "tool" %}
{{- '<|im_start|>user' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{{- content }}
{{- '\n</tool_response>' }}
{%- if loop.last or loop.nextitem.role != "tool" %}
{{- '<|im_end|>\n' }}
{%- endif %}
{# ---------- INVALID ROLE ---------- #}
{%- else %}
{{- raise_exception('Unexpected message role.') }}
{%- endif %}
{%- endfor %}
{# ====================================================== #}
{# GENERATION PROMPT #}
{# ====================================================== #}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- if enable_thinking is defined and enable_thinking %}
{{- '<think>\n' }}
{%- else %}
{{- '<think>\n\n</think>\n\n' }}
{%- endif %}
{%- endif %}
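To check what prompt string a runtime should produce from this template, it can help to mirror its rules in plain Python. The sketch below is a text-only approximation, not the model authors' reference implementation: it ignores tools, images, and videos, and models the intended token layout rather than every whitespace quirk of a particular Jinja engine. It reproduces the template's main rules: the fixed system prompt is always emitted (a caller's system text is validated but dropped), `<think>` blocks are kept only for assistant turns after the last user query, and the generation prompt either opens a `<think>` block or pre-fills an empty one. The names `FIXED_SYSTEM_PROMPT`, `split_reasoning`, and `render_prompt` are hypothetical.

```python
from typing import Optional

# Hypothetical constant mirroring `fixed_system_prompt` in the template above.
FIXED_SYSTEM_PROMPT = "Describe the image clearly and accurately."


def split_reasoning(content: str, reasoning_content: Optional[str] = None) -> tuple[str, str]:
    """Mirror the assistant branch: separate <think>...</think> from the answer."""
    reasoning = ""
    if isinstance(reasoning_content, str):
        reasoning = reasoning_content
    elif "</think>" in content:
        reasoning = content.split("</think>")[0].rstrip("\n").split("<think>")[-1].lstrip("\n")
        content = content.split("</think>")[-1].lstrip("\n")
    return reasoning.strip(), content


def render_prompt(messages: list[dict], add_generation_prompt: bool = True,
                  enable_thinking: bool = True) -> str:
    """Text-only sketch of the template's rendering (no tools, images, or videos)."""
    if not messages:
        raise ValueError("No messages provided.")
    # Without tool responses, the last user query is simply the last user message.
    user_indices = [i for i, m in enumerate(messages) if m["role"] == "user"]
    if not user_indices:
        raise ValueError("No user query found in messages.")
    last_query_index = user_indices[-1]

    out = [f"<|im_start|>system\n{FIXED_SYSTEM_PROMPT.strip()}<|im_end|>\n"]
    for i, message in enumerate(messages):
        role = message["role"]
        content = (message.get("content") or "").strip()
        if role == "system":
            # Like the template, only check position; the text itself is not emitted.
            if i != 0:
                raise ValueError("System message must be at the beginning.")
        elif role == "user":
            out.append(f"<|im_start|>user\n{content}<|im_end|>\n")
        elif role == "assistant":
            reasoning, content = split_reasoning(content, message.get("reasoning_content"))
            out.append("<|im_start|>assistant\n")
            # Reasoning is re-emitted only for turns after the last user query.
            if i > last_query_index and reasoning:
                out.append(f"<think>\n{reasoning}\n</think>\n\n")
            out.append(f"{content}<|im_end|>\n")
        else:
            raise ValueError("Unexpected message role.")

    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
        out.append("<think>\n" if enable_thinking else "<think>\n\n</think>\n\n")
    return "".join(out)
```

For a single user turn asking "What animal is on the candy?", `render_prompt` returns the fixed system block, the user block, and `<|im_start|>assistant\n<think>\n`; passing `enable_thinking=False` ends the prompt with the empty `<think>\n\n</think>\n\n` block instead.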