Instructions to use Akicou/Qwen3-30B-A3B-Instruct-REAMINI with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Akicou/Qwen3-30B-A3B-Instruct-REAMINI with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Akicou/Qwen3-30B-A3B-Instruct-REAMINI")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Akicou/Qwen3-30B-A3B-Instruct-REAMINI")
# Qwen3-30B-A3B-Instruct is a text-only model, so load it with the causal LM class
model = AutoModelForCausalLM.from_pretrained("Akicou/Qwen3-30B-A3B-Instruct-REAMINI")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the chat template and tokenize in one step
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Akicou/Qwen3-30B-A3B-Instruct-REAMINI with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Akicou/Qwen3-30B-A3B-Instruct-REAMINI"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Akicou/Qwen3-30B-A3B-Instruct-REAMINI",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/Akicou/Qwen3-30B-A3B-Instruct-REAMINI

SGLang

How to use Akicou/Qwen3-30B-A3B-Instruct-REAMINI with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Akicou/Qwen3-30B-A3B-Instruct-REAMINI" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Akicou/Qwen3-30B-A3B-Instruct-REAMINI",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Akicou/Qwen3-30B-A3B-Instruct-REAMINI" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Akicou/Qwen3-30B-A3B-Instruct-REAMINI",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
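
The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library, assuming a server started as shown above is listening locally (port 30000 for SGLang, 8000 for vLLM). The helper names `build_chat_request` and `ask` are hypothetical and are not part of either project.

```python
import json
import urllib.request

MODEL_ID = "Akicou/Qwen3-30B-A3B-Instruct-REAMINI"


def build_chat_request(prompt, base_url="http://localhost:30000", model=MODEL_ID):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # OpenAI-style responses put the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With a server running, `ask("What is the capital of France?")` targets the SGLang port, and `ask("What is the capital of France?", base_url="http://localhost:8000")` targets the vLLM default.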

Docker Model Runner
How to use Akicou/Qwen3-30B-A3B-Instruct-REAMINI with Docker Model Runner:
```
docker model run hf.co/Akicou/Qwen3-30B-A3B-Instruct-REAMINI
```

Will it break if I run it as is?

by sasa2000 - opened Feb 23

Discussion

sasa2000

Feb 23

from transformers import AutoModelForCausalLM, AutoTokenizer
from ream_moe import observe_model, prune_model, PruningConfig

# Load model
model = AutoModelForCausalLM.from_pretrained(
    "moonshotai/Moonlight-16B-A3B-Instruct",
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("moonshotai/Moonlight-16B-A3B-Instruct", trust_remote_code=True)

# Collect activation statistics on calibration data
observer_data = observe_model(
    model,
    calibration_input_ids,
    calibration_attention_mask,
)

# Prune 25% of experts
config = PruningConfig(compression_ratio=0.25)
retained_counts = prune_model(model, observer_data, config)

# Save compressed model
model.save_pretrained("./compressed_model")
tokenizer.save_pretrained("./compressed_model")

Loading checkpoint shards: 100% 27/27 [00:03<00:00, 5.62it/s]
WARNING:accelerate.big_modeling:Some parameters are on the meta device because they were offloaded to the cpu.

NameError                                 Traceback (most recent call last)
/tmp/ipython-input-2843650284.py in <cell line: 0>()
     14 observer_data = observe_model(
     15     model,
---> 16     calibration_input_ids,
     17     calibration_attention_mask,
     18 )

NameError: name 'calibration_input_ids' is not defined

Akicou

Owner Feb 23

I haven't added Moonshot support, or at least not yet.

Also, I recommend using the terminal rather than running it the way you are, to avoid errors (I only work in the terminal). Thank you for pointing it out, though.

python examples/compress.py \
    --model user/model \
    --dataset hardcoded \
    --samples [1-50]

I'm writing this on my phone, but I recommend this approach since it works for me.

Adding a model, however, is pretty much the same as in REAP.

sasa2000

Feb 23

Thank you!

Akicou

Owner Feb 23

I forgot to mention: make sure --method merge is included as an argument, because the repo supports both REAM and REAP.
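
For reference, the flags mentioned in this thread can be assembled into one command. This is only a sketch: the flag names are copied from the comments above and have not been checked against the repository's CLI, `build_compress_command` is a hypothetical helper, and reading `[1-50]` as "a number from 1 to 50" is an assumption.

```python
import shlex


def build_compress_command(model, dataset, samples, method="merge"):
    """Assemble the argument list for examples/compress.py from the thread's flags."""
    return [
        "python", "examples/compress.py",
        "--model", model,
        "--dataset", dataset,
        "--samples", str(samples),  # assumed: an integer between 1 and 50
        "--method", method,         # "merge" as recommended in the thread
    ]


# Placeholder values taken from the thread; replace them with real ones
cmd = build_compress_command("user/model", "hardcoded", 50)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root, or the printed line can be pasted into a terminal.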

Akicou

Owner Feb 23

Also, that was my bad, since the calibration set wasn't mentioned. The main branch is updated with better docs and an ipynb notebook, but I still recommend the terminal.
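
The NameError in the original post arises because `calibration_input_ids` and `calibration_attention_mask` were never defined. Below is a minimal sketch of one way to build them, assuming `observe_model` accepts padded token IDs and a matching attention mask, as the call in the original post suggests; the thread does not confirm the exact types it expects. `build_calibration_batch` and `max_length` are hypothetical names, not part of ream_moe.

```python
def build_calibration_batch(tokenizer, texts, max_length=512):
    """Tokenize calibration texts into padded input IDs and an attention mask."""
    # Some tokenizers ship without a pad token; reuse EOS so padding works
    if getattr(tokenizer, "pad_token", None) is None:
        tokenizer.pad_token = tokenizer.eos_token
    encoded = tokenizer(
        list(texts),
        padding=True,        # pad every sample to the longest one in the batch
        truncation=True,     # cut samples longer than max_length
        max_length=max_length,
        return_tensors="pt",
    )
    return encoded["input_ids"], encoded["attention_mask"]
```

With a Hugging Face tokenizer, `calibration_input_ids, calibration_attention_mask = build_calibration_batch(tokenizer, texts)` would yield PyTorch tensors, which may need to be moved to the model's device before being passed on.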

Akicou changed discussion status to closed Feb 26
