rbentaarit commited on
Commit
30e7b44
·
verified ·
1 Parent(s): cfe1637

Card: canonical naming + note adapter now in-repo under adapter/

Browse files
Files changed (1) hide show
  1. README.md +18 -15
README.md CHANGED
@@ -15,20 +15,21 @@ tags:
15
  - gguf
16
  ---
17
 
18
- # kubelm-edge-v0.3 — Q4_K_M GGUF
19
 
20
  A 2B parameter K8sGPT MCP tool-use specialist, trained with QLoRA on
21
  Qwen3.5-2B and quantized to Q4_K_M for CPU-only deployment. The
22
- headline deployable of the [kubelm](https://github.com/rbentaarit/kubelm)
23
- project — supersedes
24
- [`kubelm-edge-v0`](https://huggingface.co/rbentaarit/kubelm-edge-v0-GGUF).
 
25
 
26
  ## TL;DR
27
 
28
  On the 35-scenario v0.3 evaluation library, served via `llama-server`
29
  at temperature 0:
30
 
31
- | metric | qwen2.5-7b (reference) | kubelm-edge-v0 + corrected prompt | **kubelm-edge-v0.3** |
32
  |---|---|---|---|
33
  | `conclusion_rubric_passed` | 28 / 35 | 29 / 35 | **32 / 35** |
34
  | `reference_calls_passed` | 28 / 35 | 27 / 35 | **32 / 35** |
@@ -50,7 +51,7 @@ ollama 0.23.1's `qwen3next` loader currently rejects this GGUF (see
50
  ```bash
51
  # Boot the model (Apple Silicon shown; on Linux drop -ngl or set 0)
52
  brew install llama.cpp # or: build from https://github.com/ggml-org/llama.cpp
53
- huggingface-cli download rbentaarit/kubelm-edge-v0.3-GGUF \
54
  kubelm-edge.Q4_K_M.gguf --local-dir .
55
 
56
  llama-server \
@@ -83,7 +84,7 @@ Sample chat-completion call with a K8sGPT MCP tool:
83
  curl -sS http://127.0.0.1:8088/v1/chat/completions \
84
  -H 'Content-Type: application/json' \
85
  -d '{
86
- "model": "kubelm-edge-v0.3",
87
  "temperature": 0.0,
88
  "max_tokens": 2048,
89
  "chat_template_kwargs": {"enable_thinking": false},
@@ -105,8 +106,8 @@ so the model can call real tools against a real cluster.
105
 
106
  - **Tool-use specialist** for K8sGPT MCP investigations on CPU-only
107
  hardware (M-series Macs, modest Linux boxes).
108
- - Drop-in upgrade from `kubelm-edge-v0` for K8sGPT integrations that
109
- already speak the OpenAI Chat Completions API.
110
  - Local component of agentic K8s diagnosis pipelines where the
111
  destructive-action layer is handled by K8sGPT's operator + Mutation
112
  CR policy gates (i.e. **the model proposes; the operator gates**).
@@ -136,7 +137,8 @@ so the model can call real tools against a real cluster.
136
  [dataset card](https://huggingface.co/datasets/rbentaarit/kubelm-seed-v0)
137
  "v0.2 corpus" section for the full provenance.
138
  - **Method:** QLoRA, rank 32 / alpha 64, target modules
139
- `q_proj k_proj v_proj o_proj gate_proj up_proj down_proj`.
 
140
  - **Schedule:** 1 epoch, batch 8 × grad-accum 2, lr 2e-4 cosine,
141
  warmup 3%, max_seq_length 16384, seed 42. Train loss bottomed at
142
  0.14–0.17 (no overfit; v0.2 on Qwen 2.5 1.5B bottomed at 0.024 and
@@ -190,8 +192,9 @@ Full bench summary (rows for all four columns, every scenario):
190
  Qwen 3.5 loader stabilizes.
191
  - **CPU latency on weak hardware.** Per-turn latency on M1 Max with
192
  Metal offload is ~1.5–2 s; on a 2-core / 2 GB edge box without
193
- hardware acceleration, expect single-digit seconds per turn. For
194
- per-step latency budgets < 1 s, see `kubelm-edge-v0` (1.5B Qwen 2.5).
 
195
  - **No native tool-call format other than OpenAI Chat Completions.**
196
  Anthropic-style tool-use, Cohere-style, and custom XML formats are
197
  not trained. Use a translation layer.
@@ -205,11 +208,11 @@ model is Qwen 3.5 2B (Apache 2.0). The training corpus is
205
  ## Citation
206
 
207
  ```
208
- @misc{kubelm_edge_v03,
209
- title = {kubelm-edge-v0.3},
210
  author = {Ramzi Ben Taarit and contributors},
211
  year = {2026},
212
- url = {https://huggingface.co/rbentaarit/kubelm-edge-v0.3-GGUF},
213
  note = {QLoRA on Qwen3.5-2B; trained against K8sGPT v0.4.32 MCP trajectories}
214
  }
215
  ```
 
15
  - gguf
16
  ---
17
 
18
+ # kubelm-qwen3.5-2b-v1 — Q4_K_M GGUF
19
 
20
  A 2B parameter K8sGPT MCP tool-use specialist, trained with QLoRA on
21
  Qwen3.5-2B and quantized to Q4_K_M for CPU-only deployment. The
22
+ headline deployable (**edge+** tier) of the
23
+ [kubelm](https://github.com/rbentaarit/kubelm) project — supersedes the
24
+ edge tier
25
+ [`kubelm-qwen2.5-1.5b-v1`](https://huggingface.co/rbentaarit/kubelm-qwen2.5-1.5b-v1).
26
 
27
  ## TL;DR
28
 
29
  On the 35-scenario v0.3 evaluation library, served via `llama-server`
30
  at temperature 0:
31
 
32
+ | metric | qwen2.5-7b (reference) | kubelm-qwen2.5-1.5b-v1 (edge) | **kubelm-qwen3.5-2b-v1** |
33
  |---|---|---|---|
34
  | `conclusion_rubric_passed` | 28 / 35 | 29 / 35 | **32 / 35** |
35
  | `reference_calls_passed` | 28 / 35 | 27 / 35 | **32 / 35** |
 
51
  ```bash
52
  # Boot the model (Apple Silicon shown; on Linux drop -ngl or set 0)
53
  brew install llama.cpp # or: build from https://github.com/ggml-org/llama.cpp
54
+ huggingface-cli download rbentaarit/kubelm-qwen3.5-2b-v1 \
55
  kubelm-edge.Q4_K_M.gguf --local-dir .
56
 
57
  llama-server \
 
84
  curl -sS http://127.0.0.1:8088/v1/chat/completions \
85
  -H 'Content-Type: application/json' \
86
  -d '{
87
+ "model": "kubelm-qwen3.5-2b",
88
  "temperature": 0.0,
89
  "max_tokens": 2048,
90
  "chat_template_kwargs": {"enable_thinking": false},
 
106
 
107
  - **Tool-use specialist** for K8sGPT MCP investigations on CPU-only
108
  hardware (M-series Macs, modest Linux boxes).
109
+ - Drop-in upgrade from `kubelm-qwen2.5-1.5b-v1` for K8sGPT integrations
110
+ that already speak the OpenAI Chat Completions API.
111
  - Local component of agentic K8s diagnosis pipelines where the
112
  destructive-action layer is handled by K8sGPT's operator + Mutation
113
  CR policy gates (i.e. **the model proposes; the operator gates**).
 
137
  [dataset card](https://huggingface.co/datasets/rbentaarit/kubelm-seed-v0)
138
  "v0.2 corpus" section for the full provenance.
139
  - **Method:** QLoRA, rank 32 / alpha 64, target modules
140
+ `q_proj k_proj v_proj o_proj gate_proj up_proj down_proj`. LoRA
141
+ adapter included in this repo under `adapter/`.
142
  - **Schedule:** 1 epoch, batch 8 × grad-accum 2, lr 2e-4 cosine,
143
  warmup 3%, max_seq_length 16384, seed 42. Train loss bottomed at
144
  0.14–0.17 (no overfit; v0.2 on Qwen 2.5 1.5B bottomed at 0.024 and
 
192
  Qwen 3.5 loader stabilizes.
193
  - **CPU latency on weak hardware.** Per-turn latency on M1 Max with
194
  Metal offload is ~1.5–2 s; on a 2-core / 2 GB edge box without
195
+ hardware acceleration, expect single-digit seconds per turn. For the
196
+ lowest per-step latency and smallest footprint, see the ultra-edge
197
+ `kubelm-qwen3.5-0.8b-v1`.
198
  - **No native tool-call format other than OpenAI Chat Completions.**
199
  Anthropic-style tool-use, Cohere-style, and custom XML formats are
200
  not trained. Use a translation layer.
 
208
  ## Citation
209
 
210
  ```
211
+ @misc{kubelm_qwen35_2b_v1,
212
+ title = {kubelm-qwen3.5-2b-v1},
213
  author = {Ramzi Ben Taarit and contributors},
214
  year = {2026},
215
+ url = {https://huggingface.co/rbentaarit/kubelm-qwen3.5-2b-v1},
216
  note = {QLoRA on Qwen3.5-2B; trained against K8sGPT v0.4.32 MCP trajectories}
217
  }
218
  ```