Fix eos_token init and \n\n tokenization

by CISCai - opened May 28, 2025

base: refs/heads/main ← from: refs/pr/3

Fix eos_token init and \n\n tokenization (fc0a14c0)

CISCai

May 28, 2025

Just setting eos_token to \n\n causes transformers to add it to the end of the vocab (index 65530), and tokenization then uses this new token instead of the original token (index 261).
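
As a rough illustration of the effect described above, here is a toy sketch in plain Python. It is not transformers code: `ToyTokenizer`, `add_special_token`, and the `reuse_existing` flag are hypothetical names made up for this example, and reusing the existing id is only one possible remedy, not necessarily what this PR does. The ids 261 and 65530 are the ones quoted in the comment.

```python
class ToyTokenizer:
    """Hypothetical stand-in for a tokenizer's vocab bookkeeping (not transformers code)."""

    def __init__(self, vocab, next_free_id):
        self.vocab = dict(vocab)          # base vocab: token text -> id
        self.added = {}                   # tokens registered after loading
        self.next_free_id = next_free_id  # first id past the end of the base vocab

    def add_special_token(self, text, reuse_existing=False):
        # With reuse_existing=False the token is always appended, even when
        # the same text already has an id in the base vocab.
        if reuse_existing and text in self.vocab:
            self.added[text] = self.vocab[text]
        else:
            self.added[text] = self.next_free_id
            self.next_free_id += 1
        return self.added[text]

    def token_to_id(self, text):
        # Added tokens shadow base-vocab entries with the same text.
        return self.added.get(text, self.vocab.get(text))


# Ids 261 and 65530 are taken from the comment above; everything else is assumed.
naive = ToyTokenizer({"\n\n": 261}, next_free_id=65530)
naive.add_special_token("\n\n")
naive_id = naive.token_to_id("\n\n")    # shadowed by the appended entry

fixed = ToyTokenizer({"\n\n": 261}, next_free_id=65530)
fixed.add_special_token("\n\n", reuse_existing=True)
fixed_id = fixed.token_to_id("\n\n")    # still the original id
```

In this toy model the naive registration makes \n\n resolve to the appended id, while the variant that reuses the existing entry keeps the original one.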

handle AddedToken input (22c0bea9)

CISCai

May 28, 2025

FYI, setting eos_token to \n\n in the first place breaks tokenization by itself, because transformers splits special tokens out of the text before regular tokenization, so sequences such as \n \n\n get tokenized to 262 261 instead of 3330 11 as in the original tokenizer!
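
The splitting effect can be sketched with a small self-contained example. This is a simplification under stated assumptions: it assumes a greedy longest-match tokenizer, and `greedy_encode`, `encode_with_special`, and `TOY_VOCAB` (including its ids) are hypothetical and do not correspond to the real RWKV vocab or to transformers internals.

```python
def greedy_encode(text, vocab):
    """Greedy longest-match encoding over a text -> id vocab."""
    ids, pos = [], 0
    max_len = max(len(token) for token in vocab)
    while pos < len(text):
        # Try the longest candidate first, then shrink.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + length]
            if piece in vocab:
                ids.append(vocab[piece])
                pos += length
                break
        else:
            raise ValueError(f"no token matches at position {pos}")
    return ids


def encode_with_special(text, vocab, special):
    """Split the special token out first, then encode the pieces in between."""
    ids = []
    chunks = text.split(special)
    for i, chunk in enumerate(chunks):
        if chunk:
            ids.extend(greedy_encode(chunk, vocab))
        if i < len(chunks) - 1:
            ids.append(vocab[special])
    return ids


# Hypothetical toy vocab; these ids are made up for illustration.
TOY_VOCAB = {"\n": 1, " ": 2, "\n\n": 3, "\n ": 4, "\n \n": 5}

text = "\n \n\n"
plain = greedy_encode(text, TOY_VOCAB)                       # "\n \n" + "\n"
split_first = encode_with_special(text, TOY_VOCAB, "\n\n")   # "\n " + "\n\n"
```

With the toy vocab, the plain greedy pass picks the longest match spanning the first three characters, whereas splitting on the special token first cuts that match in half, which mirrors the kind of divergence reported above.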

zhiyuan8

fla-hub org May 28, 2025

Please contribute to RWKV-LM, since we only transform RWKV to fla's format.

CISCai

May 28, 2025

> Please contribute to RWKV-LM, since we only transform RWKV to fla's format.

These are your changes to make it run in transformers, are they not? None of this is in the original code.

Ready to merge

This branch is ready to get merged automatically.