Instructions to use nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
# (AutoModelForCausalLM is the auto class for text-only Llama checkpoints)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1")
model = AutoModelForCausalLM.from_pretrained("nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the chat template and tokenize in one step
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1 with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker
docker model run hf.co/nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1
- SGLang
How to use nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1 with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
- Docker Model Runner
How to use nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1 with Docker Model Runner:
docker model run hf.co/nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1
License Compatibility
Hi, I’d like to report a potential license compatibility issue in nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1. From what I can tell, this model appears to be a merged version that includes tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.1 as one of its components, which is licensed under the Gemma License.
However, the merged model is currently published under the LLaMA 3.3 Community License, which may not be fully compatible with the terms of the Gemma license. This could raise legal and compliance issues regarding redistribution and downstream usage.
⚠️ Key Potential Conflicts Between the Gemma License and LLaMA 3.3 License:
Gemma License Restrictions:
• Only permits use and modification for non-commercial research purposes
• Prohibits redistribution unless all components comply with Google's terms
• Requires clear attribution and license propagation for any derivatives
• Includes Google’s Acceptable Use Policy, which must be preserved
LLaMA 3.3 License:
• Requires use of the LLaMA 3.3 Community License exclusively
• Imposes Meta's Acceptable Use Policy
• Mandates a “NOTICE” file with exact attribution and license link
• Imposes downstream licensing constraints that are not easily reconciled with Gemma's
Conflict:
→ If the LLaMA 3.3 license is applied globally to the merged model, it may remove or override critical obligations from the Gemma license, especially around redistribution and use-case restrictions.
→ Merged models must respect the **most restrictive** license among all base components.
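To make the "most restrictive" rule concrete, here is a minimal Python sketch. It assumes each component's license can be reduced to a few yes/no permissions plus a set of obligations, which is a heavy simplification. `LicenseTerms`, `combine`, and the example values are hypothetical and are not a reading of the actual Gemma or LLaMA 3.3 terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseTerms:
    """Hypothetical, simplified summary of one component's license."""
    name: str
    commercial_use: bool
    redistribution: bool
    obligations: frozenset  # notices or policies that must be carried forward


def combine(components):
    """Apply 'most restrictive wins' across all merged components."""
    return LicenseTerms(
        name=" + ".join(c.name for c in components),
        # A permission survives only if every component grants it.
        commercial_use=all(c.commercial_use for c in components),
        redistribution=all(c.redistribution for c in components),
        # Obligations accumulate: the merged model inherits all of them.
        obligations=frozenset().union(*(c.obligations for c in components)),
    )


# Placeholder values only; they do not describe any real license.
license_a = LicenseTerms("License A", True, True, frozenset({"NOTICE file", "AUP A"}))
license_b = LicenseTerms("License B", False, True, frozenset({"attribution", "AUP B"}))

merged = combine([license_a, license_b])
print(merged.commercial_use)  # False, because License B does not grant it
print(sorted(merged.obligations))
```

Permissions are intersected and obligations are unioned, so a single restrictive component is enough to restrict the whole merge. Whether two real licenses can be combined this way at all is a legal question the sketch does not answer.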
🔹 Suggestions for Resolution
To ensure license compliance and clarity:
1. Acknowledge in the model card or README that the model merges components under the Gemma license
2. Include both the Gemma license and the LLaMA 3.3 license in the repository or model card
3. Add a NOTICE file with:
• Attribution to both Meta (LLaMA 3.3) and Google (Gemma)
• All required license texts and URLs
4. Clarify the scope of allowed usage:
• If Gemma prohibits commercial use, that restriction should apply to the entire merged model
5. Consider using dual licensing tags or clarifying license scope for each merged component
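As a rough illustration of suggestion 3, the sketch below renders a NOTICE text that lists each upstream component with its licensor and license. `build_notice` and the field names are hypothetical, the license URLs are left as placeholders to be filled in from the official sources, and the exact wording each license requires should be taken from the license texts themselves.

```python
def build_notice(model_id, components):
    """Render NOTICE text listing each upstream component and its license."""
    lines = [
        f"NOTICE for {model_id}",
        "",
        "This model merges components under the following licenses:",
        "",
    ]
    for comp in components:
        lines.append(f"- {comp['name']}")
        lines.append(f"  Licensor: {comp['licensor']}")
        lines.append(f"  License: {comp['license']}")
        lines.append(f"  License text: {comp['license_url']}")
    return "\n".join(lines) + "\n"


# Component entries are illustrative; URLs are deliberately left as placeholders.
components = [
    {
        "name": "LLaMA 3.3",
        "licensor": "Meta",
        "license": "LLaMA 3.3 Community License",
        "license_url": "<URL of the license text>",
    },
    {
        "name": "Gemma",
        "licensor": "Google",
        "license": "Gemma license",
        "license_url": "<URL of the license text>",
    },
]

notice = build_notice("nitky/Llama-3.3-SuperSwallow-70B-Instruct-v0.1", components)
print(notice)
```

The returned string can be saved as a NOTICE file in the repository, for example with `pathlib.Path("NOTICE").write_text(notice)`.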
Let me know if I misunderstood anything — happy to help clarify further!
Thanks for your attention!
Nice, thanks for confirming!