---
language:
- en
- zh
- ja
- ko
- fr
- es
- pt
- de
- it
- ru
- ar
- vi
- th
tags:
- code
- coding
- qwen3.0
- onnx
- int4
- web-ui
license: unknown
---

# JiRack Coder Reasoning 14B INT4

A fast and efficient coding assistant with a clean built-in web UI, powered by **Qwen3.0-Coder-14B-Instruct** and optimized using Microsoft ONNX Runtime.

- JiRack is cloud-ready and helps reduce cloud infrastructure costs. It can serve as an expert model in RAG deployments, with the ONNX JiRack Java server available as an alternative.
- Subscription: **$1 per month per user** (updated license for non-company use).
- Corporate subscription: **$3 per month per user** (updated license for company use).
- The model also works without a subscription, but it then displays a message about subscribing.

## Quick Start

Watch JiRack Coder Reasoning 14B in action:

**DEMO**: [JiRack Coder Reasoning 14B Web UI](https://youtu.be/mq1DxIov7Bw)

### Run with Docker

**Default CPU**

```bash
docker run -d \
  --name jirack_coder_reasoning_14b \
  -p 7869:7869 \
  --restart unless-stopped \
  cmsmanhattan/jirack_coder_14b_int4_qwenbase:latest
```

**Multi-core CPU (with memory and CPU limits)**

```bash
docker run -d \
  --name jirack_coder_reasoning_14b \
  -p 7869:7869 \
  --restart unless-stopped \
  --memory=20g \
  --cpus=12 \
  cmsmanhattan/jirack_coder_14b_int4_qwenbase:latest
```

**GPU (Coming soon)**

```bash
docker run -d \
  --name jirack_coder_reasoning_14b \
  -p 7869:7869 \
  --gpus all \
  --restart unless-stopped \
  cmsmanhattan/jirack_coder_14b_int4_gpu_qwenbase:latest
```
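Once a container is running, the standard Docker CLI is enough to watch and manage it. The commands below use the container name from the examples above; `docker stats` is handy for checking actual memory use against the `--memory` limit.

```bash
docker logs -f jirack_coder_reasoning_14b   # follow startup and inference logs
docker stats jirack_coder_reasoning_14b     # live CPU and memory usage
docker stop jirack_coder_reasoning_14b      # stop the container
docker rm jirack_coder_reasoning_14b        # remove it before re-running docker run with new flags
```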

### Docker Compose Example

```yaml
services:
  jirack:
    image: cmsmanhattan/jirack_coder_14b_int4_qwenbase:latest
    container_name: jirack_onnx_service
    ports:
      - "7869:7869"
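    volumes:
      # Optional, for development only: these bind mounts hide whatever the image
      # ships at /app and /app/web. Remove them unless the current directory
      # contains the JiRack application files.
      - .:/app
      - ./web:/app/web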
    environment:
      - MAX_TOKENS=1024
      - TEMPERATURE=0.7
      - TOP_P=0.9
      - DEFAULT_STREAM=False
      - INTRA_THREADS=4
      - USE_ENV_ALLOCATOR=1
    deploy:
      resources:
        limits:
          memory: 32g
```
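To start the service and follow its logs, a minimal sketch, assuming Docker Compose v2 (the `docker compose` plugin; older installs use `docker-compose`) and that the YAML above is saved as `compose.yaml` or `docker-compose.yml` in the current directory:

```bash
docker compose up -d            # start the "jirack" service in the background
docker compose logs -f jirack   # follow its logs while the model loads
docker compose down             # stop and remove the container
```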

## Access the UI

Once the container is running, open your browser and navigate to:

`http://localhost:7869`

This opens the JiRack Coder UI — a clean web interface designed for coding.
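Loading a 14B model can take a while, so the UI may not answer immediately after `docker run`. The helper below is a hypothetical sketch (not part of the JiRack image) that polls the UI until it responds; it assumes `curl` is installed and defaults to `http://localhost:7869`, the address given above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll the JiRack web UI until it responds or a timeout expires.
# Assumes curl is installed and the UI is published on the given URL.
wait_for_jirack() {
  local url="${1:-http://localhost:7869}"   # UI address (default from this model card)
  local timeout_s="${2:-120}"               # approximate timeout; counts sleep time only
  local waited=0
  # -f: exit non-zero on HTTP status >= 400; -s: no progress output;
  # -o /dev/null: discard the page body; --max-time: cap each attempt at 5 seconds.
  until curl -fs -o /dev/null --max-time 5 "$url"; do
    if (( waited >= timeout_s )); then
      echo "JiRack UI not reachable at $url after ${timeout_s}s" >&2
      return 1
    fi
    sleep 2
    waited=$(( waited + 2 ))
  done
  echo "JiRack UI is up at $url"
}
```

For example, running `wait_for_jirack http://localhost:7869 180` right after `docker run` waits for up to about three minutes while the model loads, and returns a non-zero exit status if the UI never answers.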

## Changing the Port

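You can change the listening port from the **Settings** panel in the JiRack Coder UI. When running in Docker, update the `-p` mapping (or the `ports:` entry in Docker Compose) to match the new container port; otherwise the UI will no longer be reachable from the host.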
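Alternatively, if you only want JiRack on a different host port, Docker's port mapping can handle that without changing any setting inside the container. A sketch, assuming the container keeps listening on its default port 7869 and using host port 8080 purely as an example value:

```bash
# Publish container port 7869 on host port 8080 (example value; pick any free port).
docker run -d \
  --name jirack_coder_reasoning_14b \
  -p 8080:7869 \
  --restart unless-stopped \
  cmsmanhattan/jirack_coder_14b_int4_qwenbase:latest
```

Remove any existing container with the same name first (`docker rm -f jirack_coder_reasoning_14b`), then open `http://localhost:8080`.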

## Licensing

- The JiRack Coder Reasoning 14B model is provided under a commercial license ($12 per user per year).
- All JiRack UI clients are provided under a commercial license.
- However, the UI clients can be used for free when running together with the official JiRack Docker containers, as long as they are not redistributed separately.

JiRack Coder 32B is available exclusively under a commercial enterprise license.

For commercial licensing, cluster deployment, or enterprise use of JiRack Coder 32B and JiRack Coder 14B, please contact us.

- **JiRack MS Windows 11 Desktop Client (with Ollama API):**  
  https://huggingface.co/kgrabko/JiRackTernary_1b/resolve/main/jirack-chat.zip

- **Live email chat with the model:** support@cmsmanhattan.com

## Hardware Recommendations for AMD Systems

Note: This model is heavier than JiRack Coder 7B INT8.

### Recommended Hardware for JiRack Coder Reasoning 14B INT4 (single Docker container)

| Use Case         | CPU                      | GPU (ROCm)                | VRAM / RAM | Expected Speed   | Recommendation |
|------------------|--------------------------|---------------------------|------------|------------------|----------------|
| Recommended      | Ryzen 7 7700 / 9700X     | RX 7900 XTX / 7900 XT     | 24GB VRAM  | 50-75 tokens/s   | Best choice    |
| High Performance | Ryzen 9 7950X / 9950X    | RX 7900 XTX               | 24GB+ VRAM | 65-90 tokens/s   | Excellent      |
| Enterprise       | EPYC 7003/9004 series    | MI300X or 2x RX 7900 XTX  | 48GB+ VRAM | 90-140 tokens/s  | For 32B model  |
| Budget Option    | Ryzen 5 7600 / 9600X     | RX 7800 XT (16GB)         | 16GB VRAM  | 35-50 tokens/s   | Acceptable     |

## Important Memory Notes

Even though the 14B INT4 model itself takes approximately 5–6 GB, we recommend at least 24GB VRAM for the following reasons:

- KV-cache consumption during generation, especially with long context
- ONNX Runtime overhead and temporary buffers
- System stability and avoiding out-of-memory errors
- Room for larger context windows
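To see why the KV-cache bullet dominates at long context, the cache size can be estimated from the model's shape. This is a rough sketch that assumes a Qwen3-14B-style configuration (40 layers, 8 key/value heads under grouped-query attention, head dimension 128) and an FP16 cache ($b = 2$ bytes per value); these values are assumptions, and the actual JiRack ONNX export may differ. The leading factor 2 accounts for storing both keys and values, and $T$ is the number of tokens in the context.

$$
\begin{aligned}
M_{\mathrm{KV}} &= 2 \cdot n_{\mathrm{layers}} \cdot n_{\mathrm{kv}} \cdot d_{\mathrm{head}} \cdot b \cdot T \\
M_{\mathrm{KV}} / T &= 2 \cdot 40 \cdot 8 \cdot 128 \cdot 2\ \mathrm{B} = 163{,}840\ \mathrm{B} = 160\ \mathrm{KiB} \\
M_{\mathrm{KV}}(T = 32{,}768) &= 160\ \mathrm{KiB} \cdot 32{,}768 = 5\ \mathrm{GiB}
\end{aligned}
$$

This is per sequence: concurrent users or batched requests each need their own cache, which is why headroom well beyond the weights themselves is recommended.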

**Minimum recommended:** 24GB VRAM (RX 7900 series)  
**Ideal:** 24–32GB VRAM

For pure CPU inference (no GPU), we recommend at least 64GB of system RAM (for example, on a Ryzen 9 7950X/9950X system).

The default model is also included in full FP32 precision. It serves as the base for quantization, allowing us to find the optimal balance between model size and performance.

## 📧 Contact & Licensing

For joint venture opportunities, hardware integration, or licensing inquiries:

- **Email:** grabko@cmsmanhattan.com
- **Phone:** +1 (516) 777-0945
- **Location:** New York, USA