# Web UI

## Start

```bash
./start-server.sh
```

Opens the web interface at **http://localhost:8080**.
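Loading the model can take a while, so the page may not respond right away. Below is a minimal sketch that polls the server's `/health` endpoint until it returns HTTP 200 or a timeout expires. It assumes that `/health` answers with a non-200 status (503 in current llama.cpp builds) while the model is still loading. The function name `wait_for_server` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll /health until the server reports ready.
# Usage: wait_for_server [base_url] [max_attempts]
wait_for_server() {
  local url=${1:-http://localhost:8080} attempts=${2:-60} i=0 code=""
  while [ "$i" -lt "$attempts" ]; do
    # -s: silent, -o /dev/null: discard the body, -w: print only the HTTP status.
    # curl prints 000 when the connection itself fails (server not listening yet).
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 "$url/health")
    if [ "$code" = 200 ]; then
      echo "Server ready at $url"
      return 0
    fi
    sleep 1          # roughly one attempt per second
    i=$((i + 1))
  done
  echo "Server at $url not ready after $attempts attempts (last status: $code)" >&2
  return 1
}
```

For example, after starting the server in one terminal, `wait_for_server http://localhost:8080 120` in another terminal blocks until the web interface is usable.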

## Network Mode

To use the web UI from your phone's browser or any other device on the same Wi-Fi network, start the server in network mode. The model still runs on your computer; the other device only displays the interface:

```bash
./start-server.sh --network
```

The script will print the URL to open on other devices.

## Options

These flags work with `start-server.sh` (Linux, macOS, Android):

| Flag | Description |
|------|------------|
| `--network` | Bind to all interfaces (allows LAN access) |
| `--port=N` | Use a different port (default: 8080) |
| `--cpu` | CPU-only mode (no GPU offload) |

## Windows

Double-click `start-server.bat` or run from Command Prompt:

```
start-server.bat
```

The script uses the CUDA build if available and otherwise falls back to Vulkan. It always runs with the default settings (localhost, port 8080, GPU enabled); to change them, start the server manually as described in Manual Start below.

## Manual Start

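If you prefer to start `llama-server` directly, run these commands from the package root (the directory that contains the `.gguf` model and the `webui` folder):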

```bash
# Linux (CUDA)
cd linux-cuda
LD_LIBRARY_PATH=. ./llama-server -m ../evr-llama-3.1-8b-instruct.gguf -ngl 99 --port 8080 --path ../webui

# Linux (Vulkan)
cd linux-vulkan
LD_LIBRARY_PATH=. ./llama-server -m ../evr-llama-3.1-8b-instruct.gguf -ngl 99 --port 8080 --path ../webui

# macOS (Apple Silicon)
cd metal
./llama-server -m ../evr-llama-3.1-8b-instruct.gguf -ngl 99 --port 8080 --path ../webui

# Windows (CUDA)
cd windows-cuda
llama-server.exe -m ..\evr-llama-3.1-8b-instruct.gguf -ngl 99 --port 8080 --path ..\webui

# Windows (Vulkan)
cd windows-vulkan
llama-server.exe -m ..\evr-llama-3.1-8b-instruct.gguf -ngl 99 --port 8080 --path ..\webui
```
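The `start-server.sh` flags can be reproduced in a manual start with llama-server's own options. Below is a minimal sketch for Linux and macOS. It assumes that `--network` corresponds to `--host 0.0.0.0` (llama-server listens on 127.0.0.1 by default) and that `--cpu` corresponds to `-ngl 0`. The helper name `start_manual` and the `DRY_RUN` variable are hypothetical. Run it from inside a build directory such as `linux-cuda` or `metal`. The same `--host`, `--port`, and `-ngl` options can be added to the Windows commands.

```bash
#!/usr/bin/env bash
# Hypothetical helper: manual start with equivalents of --network, --port=N, --cpu.
# Set DRY_RUN=1 to print the command instead of running it.
start_manual() {
  local host=127.0.0.1 port=8080 ngl=99 arg
  for arg in "$@"; do
    case "$arg" in
      --network) host=0.0.0.0 ;;           # assumed: listen on all interfaces
      --port=*)  port=${arg#--port=} ;;     # strip the "--port=" prefix
      --cpu)     ngl=0 ;;                   # assumed: offload no layers to the GPU
      *) echo "unknown option: $arg" >&2; return 2 ;;
    esac
  done
  case "$port" in
    ''|*[!0-9]*) echo "invalid port: $port" >&2; return 2 ;;
  esac
  local cmd=(./llama-server -m ../evr-llama-3.1-8b-instruct.gguf
             -ngl "$ngl" --host "$host" --port "$port" --path ../webui)
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    # LD_LIBRARY_PATH is needed on Linux and harmless on macOS.
    LD_LIBRARY_PATH=. "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 start_manual --network --port=9000` prints the command line without starting anything.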

## API

The server exposes an OpenAI-compatible API:

```bash
# Chat completion
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}],"stream":false}'

# Text completion
curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt":"The main causes of","max_tokens":200,"stream":false}'

# Health check
curl http://localhost:8080/health
```
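Writing JSON by hand in `curl -d` breaks as soon as a prompt contains quotes or newlines. Below is a minimal sketch that builds the request body with `jq` and prints only the assistant's reply. It assumes `curl` and `jq` are installed and that the response follows the OpenAI `choices[0].message.content` layout. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the chat endpoint; requires curl and jq.

# Build the request body; jq --arg escapes quotes and newlines in the prompt.
build_chat_payload() {
  jq -nc --arg prompt "$1" \
    '{messages: [{role: "user", content: $prompt}], stream: false}'
}

# Send one user message and print only the assistant's reply text.
chat() {
  local url=${CHAT_URL:-http://localhost:8080}
  build_chat_payload "$1" |
    curl -sS "$url/v1/chat/completions" \
      -H "Content-Type: application/json" \
      --data-binary @- |      # read the request body from stdin
    jq -r '.choices[0].message.content'
}
```

Usage: `chat 'Explain "GPU offload" in one sentence.'` If the server returns an error object instead of a completion, the `jq` filter prints `null`.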

## Troubleshooting

**Server won't start:** Make sure no other process is using port 8080. If one is, choose a different port, e.g. `--port=8081`.
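To see which process holds the port on Linux or macOS, `lsof -i :8080` is the usual tool. Below is a minimal sketch that checks ports without extra tools by using bash's `/dev/tcp` pseudo-device. It only tests connections to 127.0.0.1, so a process bound to a single other interface would go unnoticed. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers built on bash's /dev/tcp (bash-only, not POSIX sh).

# Succeeds if something on this machine accepts TCP connections on the port.
port_in_use() {
  # The subshell opens fd 3 to the port and closes it on exit;
  # a failed connect means nothing is listening there.
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Print the first free port at or above the start port (default 8080).
find_free_port() {
  local port=${1:-8080}
  while port_in_use "$port"; do
    port=$((port + 1))
    [ "$port" -le 65535 ] || return 1   # ran out of valid port numbers
  done
  echo "$port"
}
```

Usage: `./start-server.sh --port="$(find_free_port 8080)"`.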

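**Slow generation:** Make sure model layers are being offloaded to the GPU: use `-ngl 99` when starting manually, and don't pass `--cpu` to `start-server.sh`. Also check that the CUDA or Vulkan drivers are installed.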
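A quick way to rule out driver problems is to confirm that the GPU tools respond at all. Below is a minimal sketch that assumes `nvidia-smi` (installed with the NVIDIA driver) or `vulkaninfo` (from the Vulkan tools) may be present. It does not cover Apple Silicon, where Metal needs no separate driver, and it cannot prove that llama-server actually offloads. For that, look for the line in the server's startup log that reports how many layers were offloaded to the GPU. The function name `check_gpu` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic: report which GPU tooling responds on this machine.
# It only checks drivers/tools; it does not inspect llama-server itself.
check_gpu() {
  local found=0
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
    echo "NVIDIA driver responds; GPUs:"
    nvidia-smi -L                         # one line per detected NVIDIA GPU
    found=1
  fi
  if command -v vulkaninfo >/dev/null 2>&1 && vulkaninfo --summary >/dev/null 2>&1; then
    echo "Vulkan loader responds (run 'vulkaninfo --summary' for device details)"
    found=1
  fi
  if [ "$found" = 0 ]; then
    echo "No working nvidia-smi or vulkaninfo found; check the GPU driver installation."
    return 1
  fi
}
```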

**Can't access from phone:** Start the server with the `--network` flag. Make sure both devices are on the same Wi-Fi network, and check that the computer's firewall allows incoming connections on the server port.
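You can narrow the problem down from the computer itself by requesting `/health` on the machine's LAN address instead of `localhost`. If that fails, the server is probably still bound to localhost only. If it succeeds but the phone still can't connect, a firewall is the likely cause. On Ubuntu with `ufw`, for example, `sudo ufw allow 8080/tcp` opens the default port. Below is a minimal sketch for Linux and macOS. It assumes the Wi-Fi interface on macOS is `en0` or `en1`, and that the first address from `hostname -I` on Linux is the LAN one; that is not guaranteed when Docker or VPN interfaces exist. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: find the LAN IPv4 address and test reachability on it.

lan_ip() {
  local ip=""
  if command -v ipconfig >/dev/null 2>&1; then
    # macOS: en0 is usually Wi-Fi on laptops; fall back to en1.
    ip=$(ipconfig getifaddr en0 2>/dev/null || ipconfig getifaddr en1 2>/dev/null)
  fi
  if [ -z "$ip" ]; then
    # Linux: hostname -I lists all addresses; the first is not always the Wi-Fi one.
    ip=$(hostname -I 2>/dev/null | awk '{print $1}')
  fi
  [ -n "$ip" ] && echo "$ip"
}

check_lan_access() {
  local port=${1:-8080} ip
  ip=$(lan_ip) || { echo "Could not determine the LAN IP address" >&2; return 1; }
  # Without -f, curl succeeds on any HTTP reply (even 503), which is enough
  # to prove the port is reachable on the LAN address.
  if curl -s -o /dev/null --max-time 3 "http://$ip:$port/health"; then
    echo "Reachable at http://$ip:$port; if the phone still fails, check the firewall."
  else
    echo "Not reachable at http://$ip:$port; was the server started with --network?" >&2
    return 1
  fi
}
```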