Instructions to use Akicou/Qwen3-30B-A3B-Instruct-REAMINI with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Akicou/Qwen3-30B-A3B-Instruct-REAMINI with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Akicou/Qwen3-30B-A3B-Instruct-REAMINI")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly (text-only causal LM, so AutoModelForCausalLM)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Akicou/Qwen3-30B-A3B-Instruct-REAMINI")
model = AutoModelForCausalLM.from_pretrained("Akicou/Qwen3-30B-A3B-Instruct-REAMINI")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Akicou/Qwen3-30B-A3B-Instruct-REAMINI with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Akicou/Qwen3-30B-A3B-Instruct-REAMINI"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Akicou/Qwen3-30B-A3B-Instruct-REAMINI",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker
docker model run hf.co/Akicou/Qwen3-30B-A3B-Instruct-REAMINI
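The same OpenAI-compatible endpoint can also be called from Python. The sketch below uses only the standard library (urllib and json) and assumes a server is already running locally. The helper names build_chat_request and chat are hypothetical. The base URL is http://localhost:8000 for the vLLM command above and http://localhost:30000 for the SGLang commands that follow.

```python
import json
import urllib.request

MODEL_ID = "Akicou/Qwen3-30B-A3B-Instruct-REAMINI"


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send one user message and return the assistant's reply text."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style chat responses put the reply text in choices[0].message.content
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server; `vllm serve` listens on port 8000 by default):
# print(chat("http://localhost:8000", MODEL_ID, "What is the capital of France?"))
```

Splitting request construction from sending keeps the payload inspectable without a live server.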
- SGLang
How to use Akicou/Qwen3-30B-A3B-Instruct-REAMINI with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Akicou/Qwen3-30B-A3B-Instruct-REAMINI" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Akicou/Qwen3-30B-A3B-Instruct-REAMINI",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Akicou/Qwen3-30B-A3B-Instruct-REAMINI" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Akicou/Qwen3-30B-A3B-Instruct-REAMINI",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
- Docker Model Runner
How to use Akicou/Qwen3-30B-A3B-Instruct-REAMINI with Docker Model Runner:
docker model run hf.co/Akicou/Qwen3-30B-A3B-Instruct-REAMINI
Will it break if I run it as is?
from transformers import AutoModelForCausalLM, AutoTokenizer
from ream_moe import observe_model, prune_model, PruningConfig

# Load model
model = AutoModelForCausalLM.from_pretrained(
    "moonshotai/Moonlight-16B-A3B-Instruct",
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("moonshotai/Moonlight-16B-A3B-Instruct", trust_remote_code=True)

# Collect activation statistics on calibration data
observer_data = observe_model(
    model,
    calibration_input_ids,
    calibration_attention_mask,
)

# Prune 25% of experts
config = PruningConfig(compression_ratio=0.25)
retained_counts = prune_model(model, observer_data, config)

# Save compressed model
model.save_pretrained("./compressed_model")
tokenizer.save_pretrained("./compressed_model")
Loading checkpoint shards: 100% 27/27 [00:03<00:00, 5.62it/s]
WARNING:accelerate.big_modeling:Some parameters are on the meta device because they were offloaded to the cpu.
NameError Traceback (most recent call last)
/tmp/ipython-input-2843650284.py in <cell line: 0>()
14 observer_data = observe_model(
15 model,
---> 16 calibration_input_ids,
17 calibration_attention_mask,
18 )
NameError: name 'calibration_input_ids' is not defined
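The NameError occurs because calibration_input_ids and calibration_attention_mask are never defined in the snippet. One way to build them is to tokenize a small list of calibration texts as a padded batch. The sketch below assumes observe_model accepts padded token IDs and the matching attention mask as PyTorch tensors of shape (batch, seq_len). The thread does not confirm this. The helper build_calibration_batch, the max_length default, and the sample texts are hypothetical.

```python
def build_calibration_batch(tokenizer, texts, max_length=512):
    """Tokenize calibration texts into padded input IDs and an attention mask.

    `tokenizer` is a Hugging Face-style tokenizer (callable, with pad/eos tokens).
    Returns (input_ids, attention_mask), each of shape (len(texts), seq_len).
    """
    if not texts:
        raise ValueError("need at least one calibration text")
    # Many causal-LM tokenizers ship without a pad token; reuse EOS for padding.
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    encoded = tokenizer(
        texts,
        padding=True,          # pad every text to the longest one in the batch
        truncation=True,       # cut texts longer than max_length tokens
        max_length=max_length,
        return_tensors="pt",   # return PyTorch tensors
    )
    # attention_mask is 1 for real tokens and 0 for padding positions
    return encoded["input_ids"], encoded["attention_mask"]


# Usage with the tokenizer loaded in the snippet above (hypothetical sample texts):
# calibration_input_ids, calibration_attention_mask = build_calibration_batch(
#     tokenizer,
#     ["The quick brown fox jumps over the lazy dog.",
#      "Mixture-of-experts models route each token to a few experts."],
# )
```

In practice, the calibration texts should resemble the data the compressed model will see, because the collected activation statistics depend on them.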
I haven't added Moonshot support yet...
I also recommend running it from the terminal rather than a notebook to avoid errors (I only use the terminal). Thank you for pointing it out, though.
python examples/compress.py \
    --model user/model \
    --dataset hardcoded \
    --samples [1-50]
I'm writing this on my phone, but I recommend doing it this way since it works for me.
Adding a new model, however, works pretty much the same way as in REAP.
Thank you!
I forgot to mention: make sure --method merge is included as an argument, because the repo supports both REAM (merging) and REAP (pruning).
Also, that was my fault, since the calibration set wasn't mentioned. The main branch has been updated with better docs and an .ipynb notebook, but I still recommend the terminal.