
Web UI

Start

./start-server.sh

Opens the web interface at http://localhost:8080.

Network Mode

To use the web UI from your phone's browser or any other device on the same Wi-Fi network, start the server in network mode. The model still runs on your computer; the other device only displays the interface:

./start-server.sh --network

The script will print the URL to open on other devices.

Options

These flags work with start-server.sh (Linux, macOS, Android):

Flag        Description
--network   Bind to all interfaces (allows LAN access)
--port=N    Use a different port (default: 8080)
--cpu       CPU-only mode (no GPU offload)

Windows

Double-click start-server.bat or run from Command Prompt:

start-server.bat

The script uses the CUDA build if available and otherwise falls back to Vulkan. It always runs with the default settings (localhost, port 8080, GPU enabled); to change them, start the server manually as described in the Manual Start section below.

Manual Start

If you prefer to start the server directly:

# Linux (CUDA)
cd linux-cuda
LD_LIBRARY_PATH=. ./llama-server -m ../evr-llama-3.1-8b-instruct.gguf -ngl 99 --port 8080 --path ../webui

# Linux (Vulkan)
cd linux-vulkan
LD_LIBRARY_PATH=. ./llama-server -m ../evr-llama-3.1-8b-instruct.gguf -ngl 99 --port 8080 --path ../webui

# macOS (Apple Silicon)
cd metal
./llama-server -m ../evr-llama-3.1-8b-instruct.gguf -ngl 99 --port 8080 --path ../webui

# Windows (CUDA)
cd windows-cuda
llama-server.exe -m ..\evr-llama-3.1-8b-instruct.gguf -ngl 99 --port 8080 --path ..\webui

# Windows (Vulkan)
cd windows-vulkan
llama-server.exe -m ..\evr-llama-3.1-8b-instruct.gguf -ngl 99 --port 8080 --path ..\webui
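
The commands above all use the default settings. As a minimal sketch of changing them, the helper below prints a llama-server command line with a custom host, port, and GPU layer count. The function name `server_command` is hypothetical and not part of the package; `--host`, `--port`, and `-ngl` are standard llama-server options, where `--host 0.0.0.0` listens on all interfaces and `-ngl 0` keeps every layer on the CPU. The paths assume the Linux/macOS layout shown above.

```shell
#!/bin/sh
# Print a llama-server command line with non-default settings.
# server_command is a hypothetical helper, not part of the package.
# Usage: server_command HOST PORT GPU_LAYERS
server_command() {
  host="$1"   # 0.0.0.0 = listen on all interfaces, 127.0.0.1 = local only
  port="$2"
  ngl="$3"    # number of layers to offload to the GPU; 0 = CPU only
  printf '%s\n' "./llama-server -m ../evr-llama-3.1-8b-instruct.gguf -ngl $ngl --host $host --port $port --path ../webui"
}

# Example: LAN access on port 8081 without GPU offload
server_command 0.0.0.0 8081 0
```

Run the printed command from the build directory for your platform (for example linux-cuda, where it also needs the LD_LIBRARY_PATH=. prefix). On Windows, use llama-server.exe and backslash paths as in the commands above.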

API

The server exposes an OpenAI-compatible API:

# Chat completion
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}],"stream":false}'

# Text completion
curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt":"The main causes of","max_tokens":200,"stream":false}'

# Health check
curl http://localhost:8080/health
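
When scripting against the API, it can help to wait until the model has finished loading before sending requests. The following is a minimal sketch, assuming /health returns a success status only once the server is ready (llama-server normally answers with an error status while the model is still loading). `wait_for_server` is a hypothetical helper name, not something shipped with the package.

```shell
#!/bin/sh
# Poll the /health endpoint until the server answers or we give up.
# wait_for_server is a hypothetical helper, not part of the package.
# Usage: wait_for_server [BASE_URL] [MAX_TRIES]
wait_for_server() {
  url="${1:-http://localhost:8080}"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl exit non-zero on HTTP error statuses,
    # -s silences progress output, -o /dev/null discards the body.
    if curl -sf -o /dev/null "$url/health"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

Calling `wait_for_server http://localhost:8080 60` before the chat completion request above avoids errors from a server that is still starting. The function returns 0 once the health check succeeds and 1 if it never does within the given number of one-second attempts.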

Troubleshooting

Server won't start: Make sure no other process is using port 8080, or start the server on a different port, for example with --port=8081.
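
To check whether something is already listening on a port before starting the server, one option is to probe it with curl. This is a rough sketch and not part of the package: `port_in_use` and `find_free_port` are hypothetical helpers, and the check relies on curl's exit code 7 (failed to connect) to mean that nothing is listening. Any other result, including a timeout, is treated as "in use".

```shell
#!/bin/sh
# Return 0 if something accepts connections on the given local port.
# port_in_use and find_free_port are hypothetical helpers.
port_in_use() {
  rc=0
  curl -s -o /dev/null --max-time 2 "http://127.0.0.1:$1/" || rc=$?
  # curl exit code 7 = failed to connect, i.e. the port is free
  [ "$rc" -ne 7 ]
}

# Print the first port at or above the given one that is not in use.
find_free_port() {
  port="${1:-8080}"
  while port_in_use "$port"; do
    port=$((port + 1))
  done
  echo "$port"
}
```

With these defined, `./start-server.sh --port="$(find_free_port 8080)"` would start the server on the first free port at or above 8080.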

Slow generation: Check that GPU offload is active (manual starts need -ngl 99, and --cpu disables offload) and that the CUDA or Vulkan drivers are installed.
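
A quick way to see whether driver tooling is present is to look for the usual command-line utilities. The sketch below assumes that `nvidia-smi` is installed alongside the NVIDIA driver and that `vulkaninfo` comes from the Vulkan tools package; a missing tool suggests, but does not prove, a missing driver. `check_gpu_tools` is a hypothetical helper name, and the check does not apply to the macOS Metal build.

```shell
#!/bin/sh
# Report whether common GPU diagnostic tools are on the PATH.
# check_gpu_tools is a hypothetical helper, not part of the package.
check_gpu_tools() {
  for tool in nvidia-smi vulkaninfo; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found"
    else
      echo "$tool: not found"
    fi
  done
}

check_gpu_tools
```

If a tool is reported as found, running it directly shows the detected GPU and driver details.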

Can't access from phone: Start the server with the --network flag, make sure both devices are on the same Wi-Fi network, and check that your computer's firewall allows incoming connections on the server port.