GGUF version please
Could you please release the GGUF version of this model?
Yeah. I am busy today but will kick off the imatrix quantization tonight, I have been meaning to mess with that anyway.
That's great. I'm waiting for that.
Please release Q5_K_M and Q4_K_M too, if that's possible.
Yeah, they will all be imatrixed.
Trying to convert to GGUF but it's missing a tokenizer.model file. Using the one from regular Yi leads to other errors.
Still doing this, but I literally fell asleep on my keyboard, lol.
I think I know how to generate a tokenizer as well, let's see
I'm really looking forward to it.
I'm kinda stumped tbh, if I run:
from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained("/home/alpha/Models/Raw/RPmerge/")
tok.save_pretrained("/home/alpha/Models/Raw/temp/", legacy_format=True)
There is still no option to output a tokenizer.model. Currently trying to trace back and see how it's even generated.
python convert.py /home/alpha/Models/Raw/RPmerge/ --vocab-only --vocab-type hfft --outfile tokenizer.model
Seems to work? I will quantize and see if it actually does.
I'm waiting for the results.
Do think of adding the Orca-Vicuna chat template to tokenizer_config.json:
"chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{message['role'] + ' :' + message['content'] + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ 'ASSISTANT: ' }}{% endif %}",
I'll make my own quants just in case.
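For anyone who wants to patch the config by hand, here is a minimal sketch of writing a "chat_template" key into a tokenizer config with only the standard library. Assumptions: the function name add_chat_template is made up for illustration, the config written in the demo is a stand-in rather than this model's real file, and the template string already includes the separator fix suggested later in the thread (': ' after the role, and 'ASSISTANT:' read as having no trailing space).

```python
import json
import tempfile
from pathlib import Path

# Orca-Vicuna style template from the thread, with the separator fix that is
# suggested further down (assumption: 'ASSISTANT:' without a trailing space).
# The '\n' below becomes a real newline inside the Jinja string literal.
CHAT_TEMPLATE = (
    "{% if not add_generation_prompt is defined %}"
    "{% set add_generation_prompt = false %}{% endif %}"
    "{% for message in messages %}"
    "{{ message['role'] + ': ' + message['content'] + '\n' }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ 'ASSISTANT:' }}{% endif %}"
)


def add_chat_template(config_path, template=CHAT_TEMPLATE):
    """Insert or overwrite the "chat_template" key in a tokenizer config file.

    Hypothetical helper, not part of any library.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["chat_template"] = template
    path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return config


# Demo on a throwaway file so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "tokenizer_config.json"
    # Stand-in config content, not the model's actual settings.
    cfg_path.write_text(
        json.dumps({"tokenizer_class": "LlamaTokenizer"}), encoding="utf-8"
    )
    add_chat_template(cfg_path)
    reloaded = json.loads(cfg_path.read_text(encoding="utf-8"))

print(sorted(reloaded))
```

Going through json instead of pasting the string by hand means the quotes and the newline get escaped for you, which is the easiest part to get wrong.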
Yeah I figured it out as well, making some imatrix quants
Adding the chat template is a good idea.
Is this template correct though? I don't see anything that adds the USER: or SYSTEM: message.
Uploading now: https://huggingface.co/brucethemoose/Yi-34B-200K-RPMerge-iMat.GGUF
Also:
GGUFs uploading now: https://huggingface.co/MarsupialAI/Yi-34B-200K-RPMerge_GGUF
{{message['role'] + ' :' + message['content'] + '\n'}}
Here the role is USER, SYSTEM or ASSISTANT, so the prefix comes from each message itself.
{% if add_generation_prompt %}{{ 'ASSISTANT: ' }}{% endif %}
This adds 'ASSISTANT:' when you only send the history.
Also, there's an error in the template: it should be ASSISTANT: and message['role'] + ': ' + message['content'].
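To see what the separator fix changes, here is a rough plain-Python mirror of the template logic. It is a sketch, not the real thing: the actual rendering is done by Jinja inside the tokenizer, the function name render_chat and the sample messages are made up, upper-case role names are assumed as described above, and "ASSISTANT:" is read as dropping the trailing space.

```python
def render_chat(messages, add_generation_prompt=False, fixed=True):
    """Mimic the chat template: one "role + separator + content" line per message.

    fixed=False reproduces the separator as originally posted (' :'),
    fixed=True uses the corrected one (': ').
    """
    sep = ": " if fixed else " :"
    out = "".join(m["role"] + sep + m["content"] + "\n" for m in messages)
    if add_generation_prompt:
        # Cue for the model to answer; only added when requested.
        out += "ASSISTANT:" if fixed else "ASSISTANT: "
    return out


# Made-up history, with the roles taken from the messages themselves.
history = [
    {"role": "SYSTEM", "content": "You are a helpful narrator."},
    {"role": "USER", "content": "Once upon a time,"},
]

as_posted = render_chat(history, add_generation_prompt=True, fixed=False)
corrected = render_chat(history, add_generation_prompt=True, fixed=True)

print(repr(as_posted))
print(repr(corrected))
```

The as-posted variant yields lines like "USER :Once upon a time," while the corrected one yields "USER: Once upon a time,", which is the form the correction asks for.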
Whoops
Thank you so much for all your hard work. I really appreciate it.