---
library_name: rkllm
license: mit
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Llama-8B
tags:
- rkllm
- rknn-llm
- rk3588
- rockchip
- edge-ai
- llm
- deepseek
pipeline_tag: text-generation
---
# DeepSeek-R1-Distill-Llama-8B — RKLLM build for RK3588 boards
### Built with DeepSeek
**Author:** @jamescallander
**Source model:** [deepseek-ai/DeepSeek-R1-Distill-Llama-8B · Hugging Face](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B)
**Target:** Rockchip RK3588 NPU via **RKNN-LLM Runtime**
> This repository hosts a **conversion** of `DeepSeek-R1-Distill-Llama-8B` for use on Rockchip RK3588 single-board computers (Orange Pi 5 Plus, Radxa Rock 5B+, Banana Pi M7, etc.). The conversion was performed using the [RKNN-LLM toolkit](https://github.com/airockchip/rknn-llm?utm_source=chatgpt.com).
#### Conversion details
- RKLLM-Toolkit version: v1.2.1
- NPU driver: v0.9.8
- Python: 3.12
- Quantization: `w8a8_g128`
- Output: single-file `.rkllm` artifact
- Tokenizer: not required at runtime (UI handles prompt I/O)
## Intended use
- On-device inference on RK3588 SBCs.
- **Reasoning-focused** model — designed to handle multi-step thinking, problem-solving, and structured explanations.
- Well-suited for tasks that need **step-by-step reasoning** or more careful breakdowns than typical instruction models.
## Limitations
- Requires 9 GB of free memory.
- Quantized build (`w8a8_g128`) may show small quality differences vs. full-precision upstream.
- Tested on Radxa Rock 5B+; other devices may require different drivers/toolkit versions.
- Although the model is strong at reasoning, its performance on the RK3588’s NPU is limited compared to high-end GPUs.
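The memory requirement above can be checked before loading the model. The following is a minimal sketch, assuming a Linux board where `/proc/meminfo` reports `MemAvailable`, and assuming that figure is a reasonable proxy for the memory the runtime needs. The names `REQUIRED_BYTES`, `available_memory_bytes`, and `has_enough_memory` are hypothetical helpers, not part of the RKNN-LLM toolkit.

```python
# Assumption: the 9 GB figure from the list above is interpreted as 9 GiB.
REQUIRED_BYTES = 9 * 1024**3


def available_memory_bytes(meminfo_text: str) -> int:
    """Return the MemAvailable value from /proc/meminfo-style text, in bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Line format: "MemAvailable:    12345678 kB"
            return int(line.split()[1]) * 1024
    raise ValueError("MemAvailable not found in meminfo text")


def has_enough_memory(meminfo_text: str, required: int = REQUIRED_BYTES) -> bool:
    """True if the reported available memory meets the requirement."""
    return available_memory_bytes(meminfo_text) >= required
```

On the board itself, the text would come from reading `/proc/meminfo`, for example `has_enough_memory(open("/proc/meminfo").read())`.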
## Quick start (RK3588)
### 1) Install runtime
The RKNN-LLM toolkit and instructions can be found on the website of your development board's manufacturer or on [airockchip's GitHub page](https://github.com/airockchip).
Download and install the required packages as per the toolkit's instructions.
### 2) Simple Flask server deployment
The simplest way to deploy the converted `.rkllm` model is to use the example script provided in the toolkit, in this directory: `rknn-llm/examples/rkllm_server_demo`
```bash
python3 <TOOLKIT_PATH>/rknn-llm/examples/rkllm_server_demo/flask_server.py \
--rkllm_model_path <MODEL_PATH>/DeepSeek-R1-Distill-Llama-8B_w8a8_g128_rk3588.rkllm \
--target_platform rk3588
```
### 3) Sending a request
A basic request body has the following format:
```json
{
"model":"DeepSeek-R1-Distill-Llama-8B",
"messages":[{
"role":"user",
"content":"<YOUR_PROMPT_HERE>"}],
"stream":false
}
```
Example request using `curl`:
```bash
curl -s -X POST <SERVER_IP_ADDRESS>:8080/rkllm_chat \
-H 'Content-Type: application/json' \
-d '{"model":"DeepSeek-R1-Distill-Llama-8B","messages":[{"role":"user","content":"In 2 or 3 sentences, who was Napoleon Bonaparte?"}],"stream":false}'
```
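The same request can be sent from Python using only the standard library. This is a sketch under the assumptions already shown above (port `8080`, endpoint `/rkllm_chat`, plain HTTP); the function names `build_payload`, `build_request`, and `send_chat` are hypothetical, and the 300-second timeout is an arbitrary choice to allow for slow on-device generation.

```python
import json
import urllib.request


def build_payload(prompt, model="DeepSeek-R1-Distill-Llama-8B", stream=False):
    """Build the request body in the format shown above."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }


def build_request(server, prompt):
    """Create a POST request for the /rkllm_chat endpoint on port 8080."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    return urllib.request.Request(
        f"http://{server}:8080/rkllm_chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(server, prompt, timeout=300):
    """Send the prompt and return the decoded JSON response as a dict."""
    with urllib.request.urlopen(build_request(server, prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `send_chat("<SERVER_IP_ADDRESS>", "In 2 or 3 sentences, who was Napoleon Bonaparte?")` would mirror the `curl` command above.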
The response is formatted in the following way:
```json
{
"choices":[{
"finish_reason":"stop",
"index":0,
"logprobs":null,
"message":{
"content":"<MODEL_REPLY_HERE>",
"role":"assistant"}}],
"created":null,
"id":"rkllm_chat",
"object":"rkllm_chat",
"usage":{
"completion_tokens":null,
"prompt_tokens":null,
"total_tokens":null}
}
```
Example response:
```json
{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"","role":"assistant"}}],"created":null,"id":"rkllm_chat","object":"rkllm_chat","usage":{"completion_tokens":null,"prompt_tokens":null,"total_tokens":null}}
```
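To pull the model's reply out of a response in this format, a client only needs the `content` field of the first entry in `choices`. A minimal sketch follows; `extract_reply` is a hypothetical helper name, and the error handling for an empty `choices` list is an assumption about how a client might want to behave.

```python
import json


def extract_reply(response_text):
    """Return the assistant's message content from a raw JSON response string."""
    data = json.loads(response_text)
    choices = data.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    # The reply text lives under choices[0].message.content.
    return choices[0]["message"]["content"]
```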
#### Note on reasoning traces
This model outputs **intermediate reasoning text** (e.g., chains of thought) before its final response, delimited by `<think>` and `</think>` markers.
- Many OpenAI-compatible UIs automatically **suppress or hide this internal reasoning**.
- If your client does not, you may see the reasoning steps along with the final answer.
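If your client shows the reasoning, it can be separated from the final answer on the client side. The sketch below assumes the reasoning ends at the first `</think>` marker and that an opening `<think>` tag may or may not be present in the output; `split_reasoning` is a hypothetical helper, not something the server provides.

```python
def split_reasoning(text):
    """Split model output into (reasoning, answer).

    Everything before the first </think> marker is treated as reasoning.
    If no marker is found, the whole text is returned as the answer.
    """
    head, marker, tail = text.partition("</think>")
    if not marker:
        return "", text.strip()
    # Drop an opening <think> tag if the output includes one.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()
```

For example, `split_reasoning(reply)[1]` keeps only the final answer for display.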
### 4) UI compatibility
This server exposes an **OpenAI-compatible Chat Completions API**.
You can connect it to any OpenAI-compatible client or UI (for example, [Open WebUI](https://github.com/open-webui/open-webui?utm_source=chatgpt.com)).
- Configure your client with the API base: `http://<SERVER_IP_ADDRESS>:8080` and use the endpoint: `/rkllm_chat`
- Make sure the `model` field matches the converted model’s name, for example:
```json
{
"model": "DeepSeek-R1-Distill-Llama-8B",
"messages": [{"role":"user","content":"Hello!"}],
"stream": false
}
```
## License
This conversion follows the [MIT License](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md).
- Attribution: **Built with DeepSeek-R1-Distill-Llama-8B (DeepSeek-AI)**
- Required notice: see [`NOTICE`](NOTICE)
- Modifications: quantization (`w8a8_g128`), export to `.rkllm` format for RK3588 SBCs