# Web UI

## Start

```bash
./start-server.sh
```

Opens the web interface at **http://localhost:8080**.

## Network Mode

Access the web UI from your phone's browser or any device on the same WiFi (the model runs on your computer, your phone is just the display):

```bash
./start-server.sh --network
```

The script will print the URL to open on other devices.

## Options

These flags work with `start-server.sh` (Linux, macOS, Android):

| Flag | Description |
|------|-------------|
| `--network` | Bind to all interfaces (allows LAN access) |
| `--port=N` | Use a different port (default: 8080) |
| `--cpu` | CPU-only mode (no GPU offload) |

## Windows

Double-click `start-server.bat` or run from Command Prompt:

```
start-server.bat
```

Uses the CUDA build if available, otherwise falls back to Vulkan.

The `.bat` script uses the default settings (localhost, port 8080, GPU enabled). To change these, use the Manual Start section below.

## Manual Start

If you prefer to start the server directly:

```bash
# Linux (CUDA)
cd linux-cuda
LD_LIBRARY_PATH=. ./llama-server -m ../evr-llama-3.1-8b-instruct.gguf -ngl 99 --port 8080 --path ../webui

# Linux (Vulkan)
cd linux-vulkan
LD_LIBRARY_PATH=. ./llama-server -m ../evr-llama-3.1-8b-instruct.gguf -ngl 99 --port 8080 --path ../webui

# macOS (Apple Silicon)
cd metal
./llama-server -m ../evr-llama-3.1-8b-instruct.gguf -ngl 99 --port 8080 --path ../webui

# Windows (CUDA)
cd windows-cuda
llama-server.exe -m ..\evr-llama-3.1-8b-instruct.gguf -ngl 99 --port 8080 --path ..\webui

# Windows (Vulkan)
cd windows-vulkan
llama-server.exe -m ..\evr-llama-3.1-8b-instruct.gguf -ngl 99 --port 8080 --path ..\webui
```

## API

The server exposes an OpenAI-compatible API:

```bash
# Chat completion
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}],"stream":false}'

# Text completion
curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt":"The main causes of","max_tokens":200,"stream":false}'

# Health check
curl http://localhost:8080/health
```

## Troubleshooting

**Server won't start:** Make sure no other process is using port 8080. Try `--port=8081`.

**Slow generation:** Ensure GPU offload is working (`-ngl 99`). Check that CUDA/Vulkan drivers are installed.

**Can't access from phone:** Use the `--network` flag. Make sure both devices are on the same WiFi network. Check firewall settings.
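
When it is unclear whether the server actually came up, polling the health check endpoint from a script can help. The following is a minimal sketch, not part of `start-server.sh`: the helper names `health_url` and `wait_for_server` are hypothetical, and it assumes `curl` is installed and that `/health` returns a successful HTTP status once the server is ready.

```bash
# Hypothetical helpers (not shipped with this project).

# Build the health-check URL for a given port (default: 8080).
health_url() {
  local port="${1:-8080}"
  echo "http://localhost:${port}/health"
}

# Poll the health endpoint up to $2 times (default: 30), one second apart.
# Returns 0 as soon as curl reports a successful HTTP status, 1 otherwise.
wait_for_server() {
  local port="${1:-8080}"
  local tries="${2:-30}"
  local i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl fail on HTTP error codes; output is discarded.
    if curl -fsS -o /dev/null "$(health_url "$port")" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Example: wait up to 60 seconds for a server started with --port=8081.
# wait_for_server 8081 60 && echo "server is ready"
```

If the loop times out, the troubleshooting steps above (port conflicts, drivers, firewall) are the next things to check.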