Instructions to use nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

# Llama 3.3 is a text-only causal language model, so AutoModelForCausalLM is the matching class
tokenizer = AutoTokenizer.from_pretrained("nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1")
model = AutoModelForCausalLM.from_pretrained("nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
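
The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on `http://localhost:8000` and returns the usual OpenAI-style response shape; the helper names `build_chat_request` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build (without sending) a POST request for /v1/chat/completions.

    base_url is an assumption: it must match the host and port the server
    was actually started on.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the text of the first reply choice."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-compatible servers place the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` should return the model's answer as a string. Passing `base_url="http://localhost:30000"` would point the same helper at a server listening on port 30000 instead.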

Use Docker

docker model run hf.co/nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1

SGLang

How to use nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'


License Compatibility

by qiuqiu666 - opened Jun 24, 2025

Discussion

qiuqiu666

Jun 24, 2025

Hi, I’d like to report a potential license compatibility issue in nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1. From what I can tell, this model appears to be a merged version that includes tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.1 as one of its components, which is licensed under the Gemma License.

However, the merged model is currently published under the LLaMA 3.3 Community License, which may not be fully compatible with the terms of the Gemma license. This could raise legal and compliance issues regarding redistribution and downstream usage.

⚠️ Key Potential Conflicts Between the Gemma License and LLaMA 3.3 License:

Gemma License Restrictions:
• Only permits use and modification for non-commercial research purposes
• Prohibits redistribution unless all components comply with Google's terms
• Requires clear attribution and license propagation for any derivatives
• Includes Google’s Acceptable Use Policy, which must be preserved

LLaMA 3.3 License:
• Requires use of the LLaMA 3.3 Community License exclusively
• Imposes Meta's Acceptable Use Policy
• Mandates a “NOTICE” file with exact attribution and license link
• Imposes downstream licensing constraints that are not easily reconciled with Gemma's

Conflict:
→ If the LLaMA 3.3 license is applied globally to the merged model, it may remove or override critical obligations from the Gemma license, especially around redistribution and use-case restrictions.
→ Merged models must respect the **most restrictive** license among all base components.

🔹 Suggestions for Resolving

To ensure license compliance and clarity:

1. Acknowledge in the model card or README that the model merges components under the Gemma license
2. Include both the Gemma license and the LLaMA 3.3 license in the repository or model card
3. Add a NOTICE file with:
   • Attribution to both Meta (LLaMA 3.3) and Google (Gemma)
   • All required license texts and URLs
4. Clarify the scope of allowed usage:
   • If Gemma prohibits commercial use, that restriction should apply to the entire merged model
5. Consider using dual licensing tags or clarifying license scope for each merged component
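
As a rough illustration of suggestion 3, the sketch below assembles a NOTICE file from a list of merged components. Everything in it is an assumption made for illustration: the `Component` fields, the `render_notice` helper, the file layout, and the placeholder URLs. The exact attribution wording a real NOTICE file needs is set by the licenses themselves and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One upstream model or license source that the merged model depends on."""
    name: str
    licensor: str
    license_name: str
    license_url: str  # placeholder; substitute the official license URL


def render_notice(merged_model, components):
    """Return NOTICE text listing every component, its licensor, and its license."""
    lines = [
        f"NOTICE for {merged_model}",
        "",
        "This model merges components released under the following terms:",
        "",
    ]
    for component in components:
        lines.append(f"- {component.name} ({component.licensor}): {component.license_name}")
        lines.append(f"  License text: {component.license_url}")
    return "\n".join(lines) + "\n"


# Hypothetical entries mirroring the two licensors named in this post.
components = [
    Component("LLaMA 3.3", "Meta", "LLaMA 3.3 Community License", "<LLAMA_3_3_LICENSE_URL>"),
    Component("Gemma", "Google", "Gemma License", "<GEMMA_LICENSE_URL>"),
]

notice_text = render_notice("nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1", components)
```

Writing `notice_text` to a file named `NOTICE` in the repository root would give downstream users one place to find both attributions.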

Let me know if I misunderstood anything — happy to help clarify further!

Thanks for your attention!

nitky

Owner Jul 2, 2025

Nice, thanks for confirming!

nitky changed discussion status to closed Jul 2, 2025
