Instructions to use internlm/Intern-S1-Pro with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use internlm/Intern-S1-Pro with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="internlm/Intern-S1-Pro", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("internlm/Intern-S1-Pro", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use internlm/Intern-S1-Pro with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "internlm/Intern-S1-Pro"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "internlm/Intern-S1-Pro",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
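
The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send_chat_request` are hypothetical, and it assumes the server started by `vllm serve` is listening on `http://localhost:8000` and returns the usual OpenAI-style response with a `choices` list.

```python
import json
import urllib.request


def build_chat_request(image_url, prompt,
                       model="internlm/Intern-S1-Pro",
                       base_url="http://localhost:8000"):
    """Build (without sending) the same chat-completions request as the curl call.

    `base_url` is an assumption: the default address used by `vllm serve`.
    """
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=120):
    """Send the request and return the text of the first choice.

    Assumes an OpenAI-compatible response body: choices[0].message.content.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example (requires a running server):
# req = build_chat_request(
#     "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
#     "Describe this image in one sentence.",
# )
# print(send_chat_request(req))
```

Splitting request construction from sending keeps the payload easy to inspect before any network call is made. For the SGLang server, the same sketch should apply with `base_url="http://localhost:30000"`.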

Use Docker

docker model run hf.co/internlm/Intern-S1-Pro

SGLang

How to use internlm/Intern-S1-Pro with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "internlm/Intern-S1-Pro" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "internlm/Intern-S1-Pro",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
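
A large checkpoint can take a while to load, so a request sent right after launching the server may be refused. The sketch below polls the server until it answers; the helper name `wait_until_ready` is hypothetical, and it assumes the OpenAI-compatible server exposes `GET /v1/models` on the host and port used above once it is ready.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(base_url="http://localhost:30000",
                     timeout_s=600.0, interval_s=5.0):
    """Poll GET {base_url}/v1/models until it returns HTTP 200.

    Returns True once the server answers, or False if `timeout_s` elapses first.
    The endpoint path is an assumption based on the OpenAI-compatible API.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/v1/models", timeout=10) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            # Server is not accepting connections yet, or answered with an error.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Example (requires a server being started in another process):
# if wait_until_ready():
#     print("server is ready")
```

The default timeout and polling interval are illustrative values, not recommendations from the model authors.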

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "internlm/Intern-S1-Pro" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use internlm/Intern-S1-Pro with Docker Model Runner:
```
docker model run hf.co/internlm/Intern-S1-Pro
```

jack-zxy committed on Feb 12

Commit 1814c32 (verified) · 1 parent: 1ba4154

Update vllm info

vLLM TP deployment seems to be broken, so remove it for now. Also add detailed information about the vLLM Docker image.

Files changed (1)

deployment_guide.md +1 -19

@@ -59,25 +59,7 @@ lmdeploy serve api_server \
 ## vLLM
-- Tensor Parallelism + Expert Parallelism
-```bash
-# start ray on node 0 and node 1
-# node 0
-export VLLM_ENGINE_READY_TIMEOUT_S=10000
-vllm serve internlm/Intern-S1-Pro \
-    --tensor-parallel-size 16 \
-    --enable-expert-parallel \
-    --distributed-executor-backend ray \
-    --max-model-len 65536 \
-    --trust-remote-code \
-    --reasoning-parser deepseek_r1 \
-    --enable-auto-tool-choice \
-    --tool-call-parser hermes
-```
-- Data Parallelism + Expert Parallelism
+You can use the vLLM nightly-built docker image `vllm/vllm-openai:nightly` to deploy. Refer to [using-docker](https://docs.vllm.ai/en/latest/deployment/docker/?h=docker) for more.
 ```bash
 # node 0