Instructions to use 01-ai/Yi-34B-200K with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use 01-ai/Yi-34B-200K with Transformers:

    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="01-ai/Yi-34B-200K")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("01-ai/Yi-34B-200K")
    model = AutoModelForCausalLM.from_pretrained("01-ai/Yi-34B-200K")

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use 01-ai/Yi-34B-200K with vLLM:
Install from pip and serve model

    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "01-ai/Yi-34B-200K"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "01-ai/Yi-34B-200K",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'

Use Docker
docker model run hf.co/01-ai/Yi-34B-200K
- SGLang
How to use 01-ai/Yi-34B-200K with SGLang:
Install from pip and serve model

    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "01-ai/Yi-34B-200K" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "01-ai/Yi-34B-200K",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'

Use Docker images

    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "01-ai/Yi-34B-200K" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "01-ai/Yi-34B-200K",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'

- Docker Model Runner
How to use 01-ai/Yi-34B-200K with Docker Model Runner:
docker model run hf.co/01-ai/Yi-34B-200K
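The curl calls in the vLLM and SGLang instructions can also be issued from Python. This is a minimal sketch using only the standard library, assuming a server started with one of those commands is already running and reachable; the helper names `build_completion_request` and `send_completion_request` are hypothetical, and the JSON payload simply mirrors the curl examples.

```python
import json
import urllib.request


def build_completion_request(
    base_url: str,
    model: str,
    prompt: str,
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build (but do not send) a POST to an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_completion_request(req: urllib.request.Request, timeout: float = 600.0) -> str:
    """Send the request and return the text of the first completion choice."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # OpenAI-style completion responses carry generated text in choices[i]["text"].
    return body["choices"][0]["text"]


# Hypothetical usage against the vLLM server (use port 30000 for the SGLang server):
# req = build_completion_request("http://localhost:8000", "01-ai/Yi-34B-200K", "Once upon a time,")
# print(send_completion_request(req))
```

Splitting request construction from sending keeps the payload easy to inspect before any network call is made.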
Weight updates?
I noticed the weights of this model got updated!
RoPE theta is different too.
What changed? Is the long context performance stronger now?
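One way to see what changed between two uploads is to keep a copy of the repository's `config.json` from before and after the update and compare them key by key. In Llama-style configs the RoPE theta is stored under `rope_theta`. This is a minimal sketch, assuming both files were saved locally; the file names and the helpers `load_config` and `diff_configs` are hypothetical.

```python
import json
from pathlib import Path
from typing import Any

ABSENT = "<absent>"  # placeholder shown when a key exists in only one of the configs


def load_config(path: str) -> dict[str, Any]:
    """Read a Hugging Face-style config.json into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def diff_configs(old: dict[str, Any], new: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {key: (old_value, new_value)} for every top-level key whose value differs."""
    changed: dict[str, tuple[Any, Any]] = {}
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key, ABSENT), new.get(key, ABSENT)
        if before != after:
            changed[key] = (before, after)
    return changed


# Hypothetical usage, assuming both files were saved beforehand:
# for key, (before, after) in diff_configs(load_config("config_old.json"),
#                                          load_config("config_new.json")).items():
#     print(f"{key}: {before!r} -> {after!r}")
```

Only top-level keys are compared, which is enough to spot a changed scalar such as `rope_theta`; nested sections would need a recursive comparison.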
@brucethemoose We have indeed enhanced the capabilities of this model. In the "Needle-in-a-Haystack" test, Yi-34B-200K's performance improved by 10.5 percentage points, rising from 89.3% to an impressive 99.8%. You are always welcome to refer to the news section on our model card for the most up-to-date and detailed information.
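The move from 89.3% to 99.8% is 10.5 in absolute terms, which is a gain in percentage points; measured relative to the old score, it is roughly an 11.8% improvement. This small sketch makes the distinction explicit; the function name `score_change` is hypothetical.

```python
def score_change(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (absolute change in percentage points, relative change in percent)."""
    points = after_pct - before_pct
    relative = points / before_pct * 100.0
    return points, relative


points, relative = score_change(89.3, 99.8)
print(f"{points:.1f} percentage points, {relative:.1f}% relative")
# prints: 10.5 percentage points, 11.8% relative
```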
Pardon the pings to you guys, and forgive me if this is bad etiquette, but I thought this update might be worth the heads-up in case any of you have future models in the pipeline, so you can update beforehand (love your work btw!)
I shall return to the background in peace and leave you guys be, thank you.
It would be good to see 01-ai do fine-tuning/DPO on 200K context, like Nous Capybara (quite a good model).
Please consider giving the newly updated weights a new version number. There needs to be something that differentiates these weights from the originals.
Thanks for the heads up @ParasiticRogue
https://huggingface.co/01-ai/Yi-34B-200K/discussions/13#65e961b7d0cc7c76b925e133
Thank you for the ping! Appreciate it! I wouldn't have known otherwise. @ParasiticRogue
@MeisterDeLaV In the future, please release new versions when you update (e.g. Yi-34B-200K-v0.2); otherwise we don't know that there's been an update.
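Until updates carry version numbers, one way to notice a silent weight change is to record checksums of the downloaded files and compare them after re-downloading. The Hub also identifies every snapshot by a commit hash, which `from_pretrained` accepts through its `revision` argument if you want to pin a specific upload. This is a minimal sketch of the checksum approach; the helper names `fingerprint_files` and `changed_files` are hypothetical.

```python
import hashlib
from pathlib import Path


def fingerprint_files(paths: list[str], chunk_size: int = 1 << 20) -> dict[str, str]:
    """Return {file name: SHA-256 hex digest}, hashing each file in chunks to bound memory."""
    digests: dict[str, str] = {}
    for path in map(Path, paths):
        h = hashlib.sha256()
        with path.open("rb") as f:
            # Read fixed-size chunks until read() returns b"" at end of file.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        digests[path.name] = h.hexdigest()
    return digests


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List file names that were added, removed, or whose digest differs."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))
```

Chunked hashing matters here because the weight shards of a 34B model are far too large to read into memory at once.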