---
library_name: rkllm
license: mit
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Llama-8B
tags:
- rkllm
- rknn-llm
- rk3588
- rockchip
- edge-ai
- llm
- deepseek
pipeline_tag: text-generation
---
# DeepSeek-R1-Distill-Llama-8B — RKLLM build for RK3588 boards
### Built with DeepSeek
**Author:** @jamescallander
**Source model:** [deepseek-ai/DeepSeek-R1-Distill-Llama-8B · Hugging Face](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B)
**Target:** Rockchip RK3588 NPU via **RKNN-LLM Runtime**
> This repository hosts a **conversion** of `DeepSeek-R1-Distill-Llama-8B` for use on Rockchip RK3588 single-board computers (Orange Pi 5 Plus, Radxa Rock 5B+, Banana Pi M7, etc.). The conversion was performed with the [RKNN-LLM toolkit](https://github.com/airockchip/rknn-llm?utm_source=chatgpt.com).
#### Conversion details
- RKLLM-Toolkit version: v1.2.1
- NPU driver: v0.9.8
- Python: 3.12
- Quantization: `w8a8_g128`
- Output: single-file `.rkllm` artifact
- Tokenizer: not required at runtime (UI handles prompt I/O)
## Intended use
- On-device inference on RK3588 SBCs.
- **Reasoning-focused** model — designed to handle multi-step thinking, problem-solving, and structured explanations.
- Well-suited for tasks that need **step-by-step reasoning** or more careful breakdowns than typical instruction models.
## Limitations
- Requires 9 GB of free memory.
- Quantized build (`w8a8_g128`) may show small quality differences vs. full-precision upstream.
- Tested on Radxa Rock 5B+; other devices may require different drivers/toolkit versions.
- Although the model is strong at reasoning, inference on the RK3588 NPU is slower than on high-end GPUs.
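A quick pre-flight check for the memory requirement can be sketched as follows. This is only an illustration: it assumes a Linux board that exposes `MemAvailable` in `/proc/meminfo`, and it reads "9 GB" as 9 GiB. The function names are hypothetical and not part of the RKNN-LLM toolkit.

```python
from typing import Optional

# Assumption: the "9 GB" figure above is interpreted as 9 GiB.
REQUIRED_BYTES = 9 * 1024**3


def mem_available_bytes(meminfo_text: str) -> Optional[int]:
    """Return MemAvailable in bytes from /proc/meminfo content, or None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Line format: "MemAvailable:   12345678 kB"
            return int(line.split()[1]) * 1024
    return None


def has_enough_memory(meminfo_text: str, required: int = REQUIRED_BYTES) -> bool:
    """True when the reported available memory meets the requirement."""
    available = mem_available_bytes(meminfo_text)
    return available is not None and available >= required
```

On the board, pass the contents of `/proc/meminfo` to `has_enough_memory` before starting the server.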
## Quick start (RK3588)
### 1) Install runtime
The RKNN-LLM toolkit and its installation instructions are available from your development board manufacturer's website or from [airockchip's GitHub page](https://github.com/airockchip).
Download and install the required packages as per the toolkit's instructions.
### 2) Simple Flask server deployment
The simplest way to deploy the converted `.rkllm` model is to use the example script provided in the toolkit, in this directory: `rknn-llm/examples/rkllm_server_demo`
```bash
python3 <TOOLKIT_PATH>/rknn-llm/examples/rkllm_server_demo/flask_server.py \
--rkllm_model_path <MODEL_PATH>/DeepSeek-R1-Distill-Llama-8B_w8a8_g128_rk3588.rkllm \
--target_platform rk3588
```
### 3) Sending a request
A basic request body has the following format:
```json
{
"model":"DeepSeek-R1-Distill-Llama-8B",
"messages":[{
"role":"user",
"content":"<YOUR_PROMPT_HERE>"}],
"stream":false
}
```
Example request using `curl`:
```bash
curl -s -X POST <SERVER_IP_ADDRESS>:8080/rkllm_chat \
-H 'Content-Type: application/json' \
-d '{"model":"DeepSeek-R1-Distill-Llama-8B","messages":[{"role":"user","content":"In 2 or 3 sentences, who was Napoleon Bonaparte?"}],"stream":false}'
```
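The same request can be sent from Python using only the standard library. This is a minimal sketch that mirrors the `curl` example above; the function names and the 300-second timeout are illustrative assumptions, not part of the toolkit.

```python
import json
import urllib.request

ENDPOINT = "/rkllm_chat"  # endpoint path taken from the curl example


def build_request(server: str, prompt: str,
                  model: str = "DeepSeek-R1-Distill-Llama-8B") -> urllib.request.Request:
    """Build a POST request carrying the same JSON body as the curl example."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        url=f"http://{server}:8080{ENDPOINT}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(server: str, prompt: str, timeout: float = 300.0) -> dict:
    """Send the prompt and return the decoded JSON response.

    The generous default timeout is an assumption: on-device generation can be slow.
    """
    with urllib.request.urlopen(build_request(server, prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `send_chat("<SERVER_IP_ADDRESS>", "Hello!")` returns the parsed response as a dictionary.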
The response is formatted in the following way:
```json
{
"choices":[{
"finish_reason":"stop",
"index":0,
"logprobs":null,
"message":{
"content":"<MODEL_REPLY_HERE>",
"role":"assistant"}}],
"created":null,
"id":"rkllm_chat",
"object":"rkllm_chat",
"usage":{
"completion_tokens":null,
"prompt_tokens":null,
"total_tokens":null}
}
```
Example response:
```json
{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"","role":"assistant"}}],"created":null,"id":"rkllm_chat","object":"rkllm_chat","usage":{"completion_tokens":null,"prompt_tokens":null,"total_tokens":null}}
```
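Pulling the reply text out of a response like the one above could look like this. It is a sketch that assumes the response always follows the structure shown; `extract_reply` is a hypothetical helper name.

```python
import json


def extract_reply(response_text: str) -> str:
    """Return the assistant message content from a response body."""
    data = json.loads(response_text)
    choices = data.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    # The reply lives in the first (and, in the examples above, only) choice.
    return choices[0]["message"]["content"]
```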
#### Note on reasoning traces
This model outputs **intermediate reasoning text** (e.g., chains of thought) before its final response, enclosed in `<think>` ... `</think>` tags.
- Many OpenAI-compatible UIs automatically **suppress or hide this internal reasoning**.
- If your client does not, you may see the reasoning steps along with the final answer.
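If your client shows the reasoning, one way to separate it from the final answer is sketched below. It assumes the reasoning ends at the first `</think>` tag and may or may not start with an opening `<think>` tag; `split_reasoning` is a hypothetical helper, not part of the server.

```python
from typing import Tuple

OPEN_TAG = "<think>"
CLOSE_TAG = "</think>"


def split_reasoning(text: str) -> Tuple[str, str]:
    """Split model output into (reasoning, answer).

    If no closing tag is present, the whole text is treated as the answer.
    """
    head, sep, tail = text.partition(CLOSE_TAG)
    if not sep:
        return "", text.strip()
    reasoning = head.strip()
    # Drop the opening tag when the server includes it in the output.
    if reasoning.startswith(OPEN_TAG):
        reasoning = reasoning[len(OPEN_TAG):].strip()
    return reasoning, tail.strip()
```

For example, `split_reasoning(reply)[1]` keeps only the final answer for display.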
### 4) UI compatibility
This server exposes an **OpenAI-compatible Chat Completions API**.
You can connect it to any OpenAI-compatible client or UI (for example, [Open WebUI](https://github.com/open-webui/open-webui?utm_source=chatgpt.com)).
- Configure your client with the API base: `http://<SERVER_IP_ADDRESS>:8080` and use the endpoint: `/rkllm_chat`
- Make sure the `model` field matches the converted model’s name, for example:
```json
{
"model": "DeepSeek-R1-Distill-Llama-8B",
"messages": [{"role":"user","content":"Hello!"}],
"stream": false
}
```
## License
This conversion follows the [MIT License](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md).
- Attribution: **Built with DeepSeek-R1-Distill-Llama-8B (DeepSeek-AI)**
- Required notice: see [`NOTICE`](NOTICE)
- Modifications: quantization (w8a8_g128), export to `.rkllm` format for RK3588 SBCs