Fix eos_token init and \n\n tokenization
Simply setting eos_token to \n\n causes transformers to append it to the end of the vocab as a new token (index 65530), and tokenization then uses this new token instead of the original one (index 261).
FYI, setting eos_token to \n\n in the first place breaks tokenization by itself: transformers splits special tokens out before normal tokenization, so a sequence such as \n \n\n is tokenized to 262 261 instead of 3330 11 as in the original tokenizer!
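To make the two effects concrete, here is a minimal toy sketch in Python. It does not use transformers or the real RWKV vocab. It assumes a greedy longest-match tokenizer and a four-entry vocabulary in which only the \n\n -> 261 mapping comes from the description above; the strings assigned to 11, 262 and 3330 are guesses chosen to reproduce the quoted IDs, and the function names are hypothetical.

```python
# Toy vocabulary. Only "\n\n" -> 261 is stated in the discussion; the other
# three string-to-ID mappings are assumptions made for illustration.
VOCAB = {"\n": 11, "\n\n": 261, "\n ": 262, "\n \n": 3330}
MAX_LEN = max(len(token) for token in VOCAB)


def greedy_encode(text):
    """Greedy longest-match encoding over the toy vocabulary."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(MAX_LEN, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i += n
                break
        else:
            raise ValueError(f"no token matches at position {i}")
    return ids


def encode_with_special(text, special, special_id):
    """Split on the special token first, then encode the remaining pieces."""
    ids = []
    parts = text.split(special)
    for k, part in enumerate(parts):
        if part:
            ids.extend(greedy_encode(part))
        if k < len(parts) - 1:
            ids.append(special_id)
    return ids


text = "\n \n\n"
print(greedy_encode(text))                        # [3330, 11]
print(encode_with_special(text, "\n\n", 261))     # [262, 261]
print(encode_with_special(text, "\n\n", 65530))   # [262, 65530]
```

The second call shows the pretokenization problem even when the special token keeps its original ID, and the third shows what happens when the special token has additionally been appended at index 65530.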
Please contribute to RWKV-LM, since we only transform RWKV to fla's format.
These are your changes, made so that it runs in transformers, are they not? None of this is in the original code.