An example of Trinity-Nano-Preview punching above its weight.
The model produced a response that recognizes both the contradiction and the self-reference:
Prompt: What's funny about "DNUA: Do Not Use Abbreviations"? Please explain logically why it is funny in a one-line extended paragraph.
Response: DNUA is an abbreviation of the phrase "Do Not Use Abbreviations". The humor in this term comes from the fact that it itself is an abbreviation, and as such, it is an example of a practice that it is advising against. By using the phrase "Do Not Use Abbreviations", it is directly violating the very advice it is giving, making it a humorous example of a rule that should be followed. This creates a paradoxical or contradictory statement that is inherently funny,
Generated with:
$ llama-server --chat-template-kwargs '{"enable_thinking": false}' -m Trinity-Nano-Preview.i1-Q5_K_M.gguf -t 5 -c 2048 -fa auto --context-shift --jinja -ngl -1
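To reproduce the test against a running server, the prompt can be sent over llama-server's OpenAI-compatible chat endpoint. This is a minimal sketch, assuming the server listens on its default address (127.0.0.1, port 8080) since the command above sets neither host nor port; the helper names `build_payload` and `ask` are hypothetical and only for illustration.

```python
import json
import urllib.request

# Assumption: llama-server is on its default address; adjust if --host/--port were set.
SERVER_URL = "http://127.0.0.1:8080/v1/chat/completions"

# The prompt exactly as quoted in this post.
PROMPT = (
    'What\'s funny about "DNUA: Do Not Use Abbreviations"? '
    "Please explain logically why it is funny in a one-line extended paragraph."
)


def build_payload(prompt: str) -> dict:
    """Build an OpenAI-style chat completion request body with one user turn."""
    return {"messages": [{"role": "user", "content": prompt}]}


def ask(prompt: str, url: str = SERVER_URL, timeout: float = 120.0) -> str:
    """POST the prompt to the server and return the assistant's reply text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Supplying `data` makes urllib issue a POST request.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(ask(PROMPT))` prints the model's reply. Sampling settings are left at the server defaults here, so the output will not necessarily match the responses quoted in this post.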
The other models I tried, at quants of 6 GB and smaller, failed to answer correctly.
I think contradiction recognition and self-reference awareness are rare in models of this size. I would be happy to read other opinions.
[EDIT] At Q4_K_M I get:
Response: It's funny because it's a clear, direct warning about not using abbreviations, yet it uses an abbreviation itself, creating a self-defeating loop of ambiguity. This meta-humorous situation plays on the common practice of using acronyms, where the very act of using "DNUA" to warn against abbreviations becomes an example of the very behavior it's criticizing. The humor is in the unintentional contradiction between
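For anyone who wants to compare more models or quants on this prompt, a rough first-pass filter could flag responses that mention both the abbreviation and its self-contradiction. This is only a sketch: the cue lists and the function name `recognizes_self_reference` are hypothetical guesses of mine, not a validated metric, and a keyword match can be fooled, so flagged responses still need to be read by a person.

```python
# Hypothetical cue lists, chosen by hand for illustration only.
ABBREVIATION_CUES = ("abbreviation", "acronym")
SELF_REFERENCE_CUES = ("itself", "self-", "paradox", "contradict", "violat")


def recognizes_self_reference(response: str) -> bool:
    """Return True if the text mentions an abbreviation AND a self-reference cue."""
    text = response.lower()
    mentions_abbreviation = any(cue in text for cue in ABBREVIATION_CUES)
    mentions_self_reference = any(cue in text for cue in SELF_REFERENCE_CUES)
    return mentions_abbreviation and mentions_self_reference


# Excerpts from the two responses quoted in this post.
q5_excerpt = "The humor in this term comes from the fact that it itself is an abbreviation"
q4_excerpt = "yet it uses an abbreviation itself, creating a self-defeating loop of ambiguity"

print(recognizes_self_reference(q5_excerpt))
print(recognizes_self_reference(q4_excerpt))
```

Both excerpts pass this check. A response that never mentions abbreviations, or that mentions them without any hint of the rule applying to itself, would not.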