Instructions for using arcee-ai/Trinity-Nano-Preview with libraries, inference providers, notebooks, and local apps.

Libraries

How to use arcee-ai/Trinity-Nano-Preview with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="arcee-ai/Trinity-Nano-Preview", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("arcee-ai/Trinity-Nano-Preview", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("arcee-ai/Trinity-Nano-Preview", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use arcee-ai/Trinity-Nano-Preview with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "arcee-ai/Trinity-Nano-Preview"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "arcee-ai/Trinity-Nano-Preview",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
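
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server started above is listening on its default port 8000, and the helper names `build_chat_request` and `ask` are made up for illustration, not part of vLLM.

```python
import json
import urllib.request

# URL and model ID taken from the curl example above.
VLLM_URL = "http://localhost:8000/v1/chat/completions"
MODEL_ID = "arcee-ai/Trinity-Nano-Preview"


def build_chat_request(prompt, url=VLLM_URL, model=MODEL_ID):
    """Build (but do not send) a POST request mirroring the curl call."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]


# Usage (requires the server to be running):
# print(ask("What is the capital of France?"))
```

Passing `url="http://localhost:30000/v1/chat/completions"` should work the same way against the SGLang server described further down, since both expose an OpenAI-compatible endpoint.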

Use Docker

docker model run hf.co/arcee-ai/Trinity-Nano-Preview

SGLang

How to use arcee-ai/Trinity-Nano-Preview with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "arcee-ai/Trinity-Nano-Preview" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "arcee-ai/Trinity-Nano-Preview",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "arcee-ai/Trinity-Nano-Preview" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "arcee-ai/Trinity-Nano-Preview",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use arcee-ai/Trinity-Nano-Preview with Docker Model Runner:
```
docker model run hf.co/arcee-ai/Trinity-Nano-Preview
```

An example of Trinity-Nano-Preview punching above its weight.

by BingoBird - opened Jan 26

Discussion

BingoBird · Jan 26 · edited Jan 26

The model produced a response that recognizes both the contradiction and the self-reference:

Prompt: What's funny about "DNUA: Do Not Use Abbreviations"? Please explain logically why it is funny in a one-line extended paragraph.

Response: DNUA is an abbreviation of the phrase "Do Not Use Abbreviations". The humor in this term comes from the fact that it itself is an abbreviation, and as such, it is an example of a practice that it is advising against. By using the phrase "Do Not Use Abbreviations", it is directly violating the very advice it is giving, making it a humorous example of a rule that should be followed. This creates a paradoxical or contradictory statement that is inherently funny,

Command used:
$ llama-server --chat-template-kwargs '{"enable_thinking": false}' -m Trinity-Nano-Preview.i1-Q5_K_M.gguf -t 5 -c 2048 -fa auto --context-shift --jinja -ngl -1

Other models I tried with 6GB and smaller quants failed to answer correctly.

I think contradiction recognition and self-reference awareness are rare at this model size. I would be happy to read other opinions.
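
For anyone who wants to repeat the test on other models or quants, here is a rough sketch. It assumes llama-server is running with the command above on its default address (127.0.0.1:8080, since no port is given) and is reached through its OpenAI-compatible chat endpoint. The keyword check is only a crude stand-in for "answered correctly" that I made up for illustration; it is not how the responses above were judged, and reading the output yourself is still the real test.

```python
import json
import urllib.request

# Assumption: llama-server's default host/port; change if you pass --host/--port.
LLAMA_URL = "http://127.0.0.1:8080/v1/chat/completions"

PROMPT = (
    'What\'s funny about "DNUA: Do Not Use Abbreviations"? '
    "Please explain logically why it is funny in a one-line extended paragraph."
)


def build_request(prompt=PROMPT, url=LLAMA_URL):
    """Build a chat-completions request; the server answers with the loaded model."""
    # Add a "model" field here if your server build requires one.
    payload = {"messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def mentions_self_contradiction(text):
    """Hypothetical heuristic: does the reply name both the abbreviation and the conflict?"""
    t = text.lower()
    names_abbreviation = "abbreviation" in t or "acronym" in t
    names_conflict = any(
        key in t
        for key in ("contradict", "paradox", "self-defeating", "violat", "iron")
    )
    return names_abbreviation and names_conflict


def run_once():
    """Send the prompt and return (reply_text, heuristic_result)."""
    with urllib.request.urlopen(build_request()) as resp:
        reply = json.load(resp)["choices"][0]["message"]["content"]
    return reply, mentions_self_contradiction(reply)


# Usage (requires a running llama-server):
# reply, ok = run_once()
# print(ok, reply)
```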

BingoBird · Jan 26 · edited Jan 26

[EDIT] At Q4_K_M I get:

Response: It's funny because it's a clear, direct warning about not using abbreviations, yet it uses an abbreviation itself, creating a self-defeating loop of ambiguity. This meta-humorous situation plays on the common practice of using acronyms, where the very act of using "DNUA" to warn against abbreviations becomes an example of the very behavior it's criticizing. The humor is in the unintentional contradiction between
