Instructions for using sowilow/Next2-Air-DGX-Spark-GGUF with libraries, inference providers, notebooks, and local apps.
- Libraries
- llama-cpp-python
How to use sowilow/Next2-Air-DGX-Spark-GGUF with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="sowilow/Next2-Air-DGX-Spark-GGUF",
    filename="next2-air-mmproj-f16.gguf",
)
llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this image in one sentence."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    }
                }
            ]
        }
    ]
)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use sowilow/Next2-Air-DGX-Spark-GGUF with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf sowilow/Next2-Air-DGX-Spark-GGUF:F16
# Run inference directly in the terminal:
llama-cli -hf sowilow/Next2-Air-DGX-Spark-GGUF:F16
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf sowilow/Next2-Air-DGX-Spark-GGUF:F16
# Run inference directly in the terminal:
llama-cli -hf sowilow/Next2-Air-DGX-Spark-GGUF:F16
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf sowilow/Next2-Air-DGX-Spark-GGUF:F16
# Run inference directly in the terminal:
./llama-cli -hf sowilow/Next2-Air-DGX-Spark-GGUF:F16
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf sowilow/Next2-Air-DGX-Spark-GGUF:F16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf sowilow/Next2-Air-DGX-Spark-GGUF:F16
Use Docker
docker model run hf.co/sowilow/Next2-Air-DGX-Spark-GGUF:F16
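Once `llama-server` is running, any OpenAI-compatible client can talk to it. The following is a minimal sketch using only the Python standard library. It assumes the server listens on `http://localhost:8080/v1`, the same address the Pi and Hermes configurations in this document point at; the helper names `build_chat_request` and `ask` are illustrative and not part of llama.cpp.

```python
import json
import urllib.request

# Assumption: llama-server is reachable at this address (adjust if you
# started it with a different host or port).
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "sowilow/Next2-Air-DGX-Spark-GGUF:F16"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires a running server):
# print(ask("Say hello in one sentence."))
```

Keeping request construction separate from sending makes the payload easy to inspect before anything goes over the network.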
- LM Studio
- Jan
- vLLM
How to use sowilow/Next2-Air-DGX-Spark-GGUF with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "sowilow/Next2-Air-DGX-Spark-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "sowilow/Next2-Air-DGX-Spark-GGUF",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
Use Docker
docker model run hf.co/sowilow/Next2-Air-DGX-Spark-GGUF:F16
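The JSON body in the curl call above can also be assembled programmatically, which helps when the prompt or image URL varies. This is a sketch only: `build_image_payload` is a hypothetical helper name, and the structure simply mirrors the text-plus-image message shown in the curl example.

```python
import json

# Model name as passed to `vllm serve` in the example above.
MODEL_ID = "sowilow/Next2-Air-DGX-Spark-GGUF"


def build_image_payload(prompt: str, image_url: str) -> dict:
    """Return a chat-completions body with one text part and one image part."""
    return {
        "model": MODEL_ID,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


payload = build_image_payload(
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
# Serialized form, suitable for the body of a POST to /v1/chat/completions.
body = json.dumps(payload, indent=2)
```

The resulting `body` string can be sent with any HTTP client to the endpoint used in the curl example.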
- Ollama
How to use sowilow/Next2-Air-DGX-Spark-GGUF with Ollama:
ollama run hf.co/sowilow/Next2-Air-DGX-Spark-GGUF:F16
- Unsloth Studio
How to use sowilow/Next2-Air-DGX-Spark-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for sowilow/Next2-Air-DGX-Spark-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for sowilow/Next2-Air-DGX-Spark-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for sowilow/Next2-Air-DGX-Spark-GGUF to start chatting
- Pi
How to use sowilow/Next2-Air-DGX-Spark-GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf sowilow/Next2-Air-DGX-Spark-GGUF:F16
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "sowilow/Next2-Air-DGX-Spark-GGUF:F16"
        }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
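If you prefer to generate the Pi configuration rather than edit it by hand, a small script can emit the same JSON. This is a sketch under the assumption that `~/.pi/agent/models.json` contains only the provider block shown above; `pi_models_config` and `write_config` are hypothetical helpers, and `write_config` overwrites whatever file is at the target path.

```python
import json
from pathlib import Path


def pi_models_config(model_id: str, base_url: str = "http://localhost:8080/v1") -> dict:
    """Return the provider block from the Pi example as a Python dict."""
    return {
        "providers": {
            "llama-cpp": {
                "baseUrl": base_url,
                "api": "openai-completions",
                "apiKey": "none",
                "models": [{"id": model_id}],
            }
        }
    }


def write_config(path: Path, config: dict) -> None:
    """Write the config as pretty-printed JSON, creating parent folders.

    Caution: this replaces any existing file at `path`.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")


config = pi_models_config("sowilow/Next2-Air-DGX-Spark-GGUF:F16")
# To write it to the location named above (uncomment to run):
# write_config(Path.home() / ".pi" / "agent" / "models.json", config)
```

If the file already holds other providers, merge the `llama-cpp` entry into it by hand instead of overwriting.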
- Hermes Agent
How to use sowilow/Next2-Air-DGX-Spark-GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf sowilow/Next2-Air-DGX-Spark-GGUF:F16
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default sowilow/Next2-Air-DGX-Spark-GGUF:F16
Run Hermes
hermes
- Docker Model Runner
How to use sowilow/Next2-Air-DGX-Spark-GGUF with Docker Model Runner:
docker model run hf.co/sowilow/Next2-Air-DGX-Spark-GGUF:F16
- Lemonade
How to use sowilow/Next2-Air-DGX-Spark-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull sowilow/Next2-Air-DGX-Spark-GGUF:F16
Run and chat with the model
lemonade run user.Next2-Air-DGX-Spark-GGUF-F16
List all available models
lemonade list
🚀 v0.1.6: Real-time Metrics & Blackwell-Optimized Docker (Recommended)
This model is fully compatible with the DGX-Spark-llama.cpp-Bench. Experience the state-of-the-art inference engine optimized for NVIDIA Blackwell (DGX Spark) hardware.
🌟 Key Features (v0.1.6)
- Real-time Performance Metrics: Now visualizes Input TPS and Output TPS during streaming.
- Improved Reasoning UI: Seamlessly renders and stabilizes the model's Chain-of-Thought (CoT).
- Blackwell Optimization: Native support for ARM64/SM121 and CUDA 13.0 FP4.
🐳 Quick Start
# Pull the latest optimized image
docker pull ghcr.io/sowilow/dgx-spark-llama.cpp-bench:v0.1.6
For more details, visit our GitHub Repository.
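To make the Input TPS and Output TPS metrics concrete, here is a generic sketch of how tokens-per-second figures are commonly derived from token counts and elapsed time. How DGX-Spark-llama.cpp-Bench computes them internally is not stated here, so the `StreamTiming` fields and the split between prompt time and generation time are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class StreamTiming:
    """Hypothetical timing record for one streamed response."""
    prompt_tokens: int
    prompt_seconds: float   # time spent processing the prompt
    output_tokens: int
    output_seconds: float   # time spent generating the reply


def tokens_per_second(tokens: int, seconds: float) -> float:
    """Throughput in tokens per second; 0.0 when no time has elapsed."""
    return tokens / seconds if seconds > 0 else 0.0


def summarize(timing: StreamTiming) -> dict:
    """Return input and output throughput for one response."""
    return {
        "input_tps": tokens_per_second(timing.prompt_tokens, timing.prompt_seconds),
        "output_tps": tokens_per_second(timing.output_tokens, timing.output_seconds),
    }


# Made-up numbers: 512 prompt tokens in 0.5 s, 128 output tokens in 4 s.
print(summarize(StreamTiming(512, 0.5, 128, 4.0)))
# → {'input_tps': 1024.0, 'output_tps': 32.0}
```

Input throughput is usually much higher than output throughput because the prompt is processed in parallel while output tokens are generated one at a time.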
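🚀 v0.1.6: Real-time Metrics and Blackwell-Optimized Docker (Recommended)
This model is optimized for the DGX-Spark-llama.cpp-Bench system. Get the most out of NVIDIA Blackwell (DGX Spark) hardware.
🌟 Key Features (v0.1.6)
- Real-time performance metric visualization: Displays Input TPS and Output TPS in real time during streaming.
- Enhanced intelligent reasoning UI: Renders the model's thinking process (CoT) more stably.
- Blackwell optimization: Supports the ARM64/SM121 architecture and CUDA 13.0 FP4 acceleration.
🐳 How to Run
# Pull the latest optimized image
docker pull ghcr.io/sowilow/dgx-spark-llama.cpp-bench:v0.1.6
For detailed usage, see the GitHub repository.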
🚀 v0.1.5: Real-time Metrics & Blackwell-Optimized Docker (Recommended)
This model is fully compatible with the DGX-Spark-llama.cpp-Bench. Experience the state-of-the-art inference engine optimized for NVIDIA Blackwell (DGX Spark) hardware.
🌟 Key Features (v0.1.5)
- Real-time Performance Metrics: Now visualizes Input TPS and Output TPS during streaming.
- Improved Reasoning UI: Seamlessly renders and stabilizes the model's Chain-of-Thought (CoT).
- Blackwell Optimization: Native support for ARM64/SM121 and CUDA 13.0 FP4.
🐳 Quick Start
# Pull the latest optimized image
docker pull ghcr.io/sowilow/dgx-spark-llama.cpp-bench:v0.1.5
For more details, visit our GitHub Repository.
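🚀 v0.1.5: Real-time Metrics and Blackwell-Optimized Docker (Recommended)
This model is optimized for the DGX-Spark-llama.cpp-Bench system. Get the most out of NVIDIA Blackwell (DGX Spark) hardware.
🌟 Key Features (v0.1.5)
- Real-time performance metric visualization: Displays Input TPS and Output TPS in real time during streaming.
- Enhanced intelligent reasoning UI: Renders the model's thinking process (CoT) more stably.
- Blackwell optimization: Supports the ARM64/SM121 architecture and CUDA 13.0 FP4 acceleration.
🐳 How to Run
# Pull the latest optimized image
docker pull ghcr.io/sowilow/dgx-spark-llama.cpp-bench:v0.1.5
For detailed usage, see the GitHub repository.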
🚀 v0.1.4: Quick Start with Blackwell-Optimized Docker (Recommended)
This model is fully compatible with the DGX-Spark-llama.cpp-Bench. Experience the best performance on NVIDIA Blackwell (DGX Spark) hardware with our optimized inference engine.
🌟 Key Features (v0.1.4)
- Blackwell Optimized: Native support for ARM64/SM121 and CUDA 13.0 FP4.
- Intelligent Reasoning UI: Automatic extraction and visualization of reasoning processes (CoT).
- One-Click Deployment: Standardized environment via GHCR Docker image.
🐳 How to Run
# Pull the latest optimized image
docker pull ghcr.io/sowilow/dgx-spark-llama.cpp-bench:v0.1.4
# Follow the instructions in our repo to serve this model
# GitHub: https://github.com/sowilow/DGX-Spark-llama.cpp-Bench
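🚀 v0.1.4: Blackwell-Optimized Docker Quick Start (Recommended)
This model is optimized for the DGX-Spark-llama.cpp-Bench system. Try the optimized inference engine that makes the most of NVIDIA Blackwell (DGX Spark) hardware.
🌟 Key Features (v0.1.4)
- Blackwell optimization: Supports the ARM64/SM121 architecture and CUDA 13.0 FP4 hardware acceleration.
- Intelligent reasoning UI: Automatically detects and visualizes the model's thinking process (CoT).
- Easy deployment: Runs immediately from the GHCR Docker image with no environment setup.
🐳 How to Run
# Pull the latest optimized image
docker pull ghcr.io/sowilow/dgx-spark-llama.cpp-bench:v0.1.4
For detailed usage, see the GitHub repository.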
🚀 Quick Start with Docker (Recommended)
You can easily run this model using the DGX-Spark-llama.cpp-Bench inference engine. It's pre-configured for high-performance inference on NVIDIA hardware (especially Blackwell/DGX Spark).
1. Pull the Docker Image
docker pull ghcr.io/sowilow/dgx-spark-llama.cpp-bench:latest
2. Run the Inference Server
For detailed configuration and usage, visit the GitHub Repository.
Next2-Air-DGX-Spark-GGUF
This repository contains GGUF-quantized weights for Next2-Air, specifically optimized for NVIDIA Blackwell (DGX Spark) hardware.
🚀 Key Features
- Hardware Optimized: Built with CUDA 13.0 and SM121 (Blackwell) native acceleration.
- Quantization: Specialized for high-speed airborne/mobile visual reasoning.
- Base Model Integration: Linked directly to the original thelamapi/Next-2-Air-GGUF.
⚖️ License & Attribution
This model is subject to the Apache License 2.0.
📂 Files Included
- Next2-specific GGUF and mmproj files.
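Because the repository ships both model weights and mmproj (multimodal projector) files, it can help to tell them apart before passing a `filename` to a loader. The sketch below classifies file names by a simple naming heuristic. `split_gguf_files` is a hypothetical helper, the heuristic assumes projector files contain "mmproj" in their name (as `next2-air-mmproj-f16.gguf` does), and `example-weights.gguf` is a made-up name used only for illustration.

```python
from typing import Iterable, List, Tuple


def split_gguf_files(filenames: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split file names into (model weight files, mmproj files).

    Heuristic assumption: projector files contain "mmproj" in their name;
    every other .gguf file is treated as model weights.
    """
    ggufs = [name for name in filenames if name.lower().endswith(".gguf")]
    mmproj = [name for name in ggufs if "mmproj" in name.lower()]
    weights = [name for name in ggufs if "mmproj" not in name.lower()]
    return weights, mmproj


# Illustrative input; "example-weights.gguf" is NOT a real file in this repo.
files = ["README.md", "example-weights.gguf", "next2-air-mmproj-f16.gguf"]
weights, projectors = split_gguf_files(files)
print(weights)      # → ['example-weights.gguf']
print(projectors)   # → ['next2-air-mmproj-f16.gguf']

# With the huggingface_hub package installed, the real file list can be
# fetched like this (requires network access):
# from huggingface_hub import list_repo_files
# files = list_repo_files("sowilow/Next2-Air-DGX-Spark-GGUF")
```

Check the actual file list before relying on the heuristic, since naming conventions vary between repositories.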
Created using DGX-Spark-llama.cpp-Bench