---
license: apache-2.0
language:
- ja
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- qwen3
- qwen3-moe
- swallow
- medical
- japanese
- text-generation
base_model: tokyotech-llm/Qwen3-Swallow-30B-A3B-RL-v0.2
---

# Medical-Qwen3-Swallow-30B-A3B

Medical-Qwen3-Swallow-30B-A3B is a medical-domain language model based on `tokyotech-llm/Qwen3-Swallow-30B-A3B-RL-v0.2`. It is designed to support research and development toward safe and trustworthy AI for Japanese clinical settings.

The model belongs to the Qwen3-Swallow family, a bilingual Japanese-English model family based on Qwen3 and developed through continual pre-training, supervised fine-tuning, and reinforcement learning with verifiable rewards.

## Highlights

- Medical-domain adaptation of Qwen3-Swallow-30B-A3B-RL-v0.2
- Mixture-of-Experts architecture inherited from the base 30B-A3B model
- Bilingual Japanese-English capability inherited from Qwen3-Swallow
- Evaluated on Japanese medical and healthcare-related benchmarks
- Intended for research use in medical AI safety and reliability evaluation

## Model Details

- **Model type:** Causal language model, Mixture-of-Experts
- **Base model:** `tokyotech-llm/Qwen3-Swallow-30B-A3B-RL-v0.2`
- **Language(s):** Japanese, English
- **Tokenizer:** Qwen3-Swallow tokenizer
- **License:** Apache License 2.0

## Model Performance

The following results compare the base model and this medical-domain model on medical benchmarks. General benchmark results are intentionally omitted because this release focuses on medical-domain performance.


| Model | IgakuQA | JJSIMQA | JMMLU Medical | MMLU_Medical_JP | MedMCQA_JP | MedQA_JP | JUSMLEQA_JP | YakugakuQA |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| `tokyotech-llm/Qwen3-Swallow-30B-A3B-RL-v0.2` | 0.768 | 0.699 | 0.761 | 0.782 | 0.578 | 0.616 | 0.681 | 0.676 |
| `Medical-Qwen3-Swallow-30B-A3B` | 0.798 | 0.763 | 0.788 | 0.805 | 0.629 | 0.666 | 0.725 | 0.715 |

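To make the comparison easier to read, the per-benchmark gains can be computed directly from the table. The sketch below hard-codes the scores shown above; the names `BASE`, `MEDICAL`, `score_gains`, and `mean` are illustrative and not part of any released evaluation tooling.

```python
# Scores copied from the table above (base model vs. medical model).
BENCHMARKS = [
    "IgakuQA", "JJSIMQA", "JMMLU Medical", "MMLU_Medical_JP",
    "MedMCQA_JP", "MedQA_JP", "JUSMLEQA_JP", "YakugakuQA",
]
BASE = [0.768, 0.699, 0.761, 0.782, 0.578, 0.616, 0.681, 0.676]
MEDICAL = [0.798, 0.763, 0.788, 0.805, 0.629, 0.666, 0.725, 0.715]


def score_gains(base, medical):
    """Return the per-benchmark difference (medical - base), rounded to 3 decimals."""
    return [round(m - b, 3) for b, m in zip(base, medical)]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


gains = dict(zip(BENCHMARKS, score_gains(BASE, MEDICAL)))
for name, gain in gains.items():
    print(f"{name:16s} {gain:+.3f}")
print(f"{'mean gain':16s} {mean(list(gains.values())):+.3f}")
```

On these figures the medical model scores higher on all eight benchmarks, with the largest gain on JJSIMQA (+0.064) and a mean gain of about +0.041.
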
## Usage

This model is expected to work with Hugging Face Transformers and vLLM-compatible inference stacks.

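For Transformers, a minimal sketch might look like the following. It assumes a recent `transformers` release with Qwen3-MoE support, `torch`, and `accelerate` (needed for `device_map="auto"`), plus enough accelerator memory for a 30B-parameter checkpoint. The helper names `build_messages` and `generate_reply` are illustrative, and the sampling values are the ones this card lists under Best Practices.

```python
MODEL_ID = "tokyotech-llm/Medical-Qwen3-Swallow-30B-A3B"

# Sampling settings quoted in this model card for Qwen3-Swallow models.
GENERATION_KWARGS = {
    "do_sample": True,
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0,
}


def build_messages(prompt):
    """Wrap a single user prompt in the chat-message format."""
    return [{"role": "user", "content": prompt}]


def generate_reply(prompt, max_new_tokens=512):
    """Load the model and generate one reply (downloads the full checkpoint)."""
    # Imported here so the helpers above stay usable without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",  # use the dtype stored in the checkpoint
        device_map="auto",   # assumption: accelerate is installed
    )

    inputs = tokenizer.apply_chat_template(
        build_messages(prompt),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, **GENERATION_KWARGS)
    # Drop the prompt tokens and decode only the newly generated part.
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_reply("...")` with a user prompt returns the decoded completion as a string. Loading the model inside the function keeps the sketch short; a real application would load it once and reuse it across requests.
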
### vLLM

```sh
vllm serve tokyotech-llm/Medical-Qwen3-Swallow-30B-A3B --reasoning-parser qwen3 --max-model-len 32768
```

Once the server is running, you can send requests using an OpenAI-compatible client.

```python
from openai import OpenAI

model_name = "tokyotech-llm/Medical-Qwen3-Swallow-30B-A3B"

# A placeholder key is enough for a local server started without an API key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

result = client.chat.completions.create(
    model=model_name,
    messages=[
        # "Please explain, in Japanese, the points to note when using generative AI in clinical settings."
        {"role": "user", "content": "日本語で、臨床現場における生成AI利用時の注意点を説明してください。"}
    ],
    max_tokens=2048,
    temperature=0.6,
    top_p=0.95,
    # top_k and min_p are not standard OpenAI parameters; extra_body passes them through to vLLM.
    extra_body={
        "top_k": 20,
        "min_p": 0,
    },
)
print(result.choices[0].message.content)
```

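Qwen3-family reasoning models usually wrap their chain of thought in `<think>...</think>` before the final answer, and the `--reasoning-parser qwen3` flag above asks vLLM to separate that part on the server. If the server is run without the flag, the split can be approximated on the client. The following is a rough sketch under that assumption; `split_reasoning` is a hypothetical helper, not part of vLLM or the OpenAI client.

```python
def split_reasoning(text, open_tag="<think>", close_tag="</think>"):
    """Split raw model output into (reasoning, answer).

    Assumes Qwen3-style output where the reasoning precedes the answer and
    ends with `close_tag`. If no closing tag is present, everything is
    treated as the answer.
    """
    head, sep, tail = text.partition(close_tag)
    if not sep:
        return "", text.strip()
    # The opening tag may be missing when the chat template already emitted it.
    reasoning = head.replace(open_tag, "", 1).strip()
    return reasoning, tail.strip()


raw = "<think>\nCheck contraindications first.\n</think>\n\nPlease consult a physician."
reasoning, answer = split_reasoning(raw)
print(reasoning)  # Check contraindications first.
print(answer)     # Please consult a physician.
```

Keeping the reasoning separate makes it possible to log it for review while showing only the final answer to the user.
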
## Best Practices

We recommend using the generation parameters specified in `generation_config.json` when available. For Qwen3-Swallow models, commonly used settings include `temperature=0.6`, `top_p=0.95`, `top_k=20`, and `min_p=0`.

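One way to apply this recommendation is to read `generation_config.json` when it is present and fall back to the values quoted above otherwise. The sketch below assumes the file is a flat JSON object that uses the same key names; `load_sampling_params` and `DEFAULT_SAMPLING` are illustrative names.

```python
import json
from pathlib import Path

# Fallback values quoted above for Qwen3-Swallow models.
DEFAULT_SAMPLING = {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0}


def load_sampling_params(config_path):
    """Return sampling settings, preferring values found in generation_config.json."""
    params = dict(DEFAULT_SAMPLING)
    path = Path(config_path)
    if path.is_file():
        config = json.loads(path.read_text(encoding="utf-8"))
        # Only copy the sampling keys; ignore everything else in the file.
        params.update({k: config[k] for k in DEFAULT_SAMPLING if k in config})
    return params


# Falls back to the defaults when no file exists at the given path.
print(load_sampling_params("generation_config.json"))
```

The returned dictionary can be passed to a Transformers `generate` call, or split between the standard arguments and `extra_body` for an OpenAI-compatible client.
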
We also recommend specifying a maximum context length of 32,768 tokens or less for inference unless your serving stack has been validated with a longer context.

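Because the prompt and the completion share that context window, it can help to cap the completion budget before sending a request. A minimal sketch, assuming the prompt length has already been measured with the model's tokenizer; `clamp_max_tokens` is a hypothetical helper.

```python
MAX_MODEL_LEN = 32768  # context limit recommended above


def clamp_max_tokens(prompt_tokens, requested_max_tokens, max_model_len=MAX_MODEL_LEN):
    """Shrink the completion budget so prompt + completion fits in the context window."""
    if prompt_tokens >= max_model_len:
        raise ValueError("prompt alone does not fit in the context window")
    return min(requested_max_tokens, max_model_len - prompt_tokens)


print(clamp_max_tokens(prompt_tokens=1200, requested_max_tokens=2048))   # 2048
print(clamp_max_tokens(prompt_tokens=31768, requested_max_tokens=2048))  # 1000
```
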
## Training Data

This model was adapted from Qwen3-Swallow-30B-A3B-RL-v0.2 using a mixture that emphasizes medical-domain text while retaining general-domain data. The medical-domain data includes resources such as biomedical literature, medical synthetic data, medical QA-style data, and clinical guideline-style text.

## Risks and Limitations

This model is intended for research and development. It has not been validated as a medical device and must not be used as a substitute for professional medical judgment. Outputs may contain factual errors, unsafe recommendations, or unsupported clinical claims. Any clinical use requires careful human review, validation, and compliance with applicable laws, regulations, and institutional policies.

## License

Apache License 2.0

## How to Cite

If you find our work helpful, please consider citing the following papers. The Qwen3-Swallow and GPT-OSS-Swallow Technical Paper (Training Details) will be released in March.

### Continual Pre-Training

```bibtex
@inproceedings{fujii2024continual,
  title={Continual Pre-Training for Cross-Lingual {LLM} Adaptation: Enhancing Japanese Language Capabilities},
  author={Kazuki Fujii and Taishi Nakamura and Mengsay Loem and Hiroki Iida and Masanari Ohi and Kakeru Hattori and Hirai Shota and Sakae Mizuki and Rio Yokota and Naoaki Okazaki},
  booktitle={First Conference on Language Modeling},
  year={2024}
}
```

### Supervised Fine-Tuning

```bibtex
@inproceedings{ma2025building,
  title={Building Instruction-Tuning Datasets from Human-Written Instructions with Open-Weight Large Language Models},
  author={Youmi Ma and Sakae Mizuki and Kazuki Fujii and Taishi Nakamura and Masanari Ohi and Hinari Shimada and Taihei Shiotani and Koshiro Saito and Koki Maeda and Kakeru Hattori and Takumi Okamoto and Shigeki Ishida and Rio Yokota and Hiroya Takamura and Naoaki Okazaki},
  booktitle={Second Conference on Language Modeling},
  year={2025}
}
```

## References

[Yang, 2025] Alibaba. Qwen3 Technical Report, arXiv:2505.09388.

## Acknowledgements

This work builds on Qwen3 and Qwen3-Swallow. We thank the Qwen team and the contributors to the Qwen3-Swallow project.

This work is based on results obtained from a project (JPNP25006) subsidized by the New Energy and Industrial Technology Development Organization (NEDO).