Commit 8d49739 by Nishant2414 and YTan2000 · 0 parent(s)

Duplicate from YTan2000/Qwen3.6-35B-A3B-TQ3_4S

Co-authored-by: Y Tan <YTan2000@users.noreply.huggingface.co>

.gitattributes ADDED
@@ -0,0 +1,38 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ thumbnail.png filter=lfs diff=lfs merge=lfs -text
+ mmproj-BF16.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3.6-35B-A3B-TQ3_4S.gguf filter=lfs diff=lfs merge=lfs -text
Qwen3.6-35B-A3B-TQ3_4S.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6af1c2df5cdeac6975079a6e129bc2043f7bbc0f0ac72125a7572016b452216e
+ size 13298875360
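The pointer above records only the object's SHA-256 (`oid`) and byte `size`. A minimal sketch for checking a downloaded copy against those two values, assuming GNU coreutils (`sha256sum`; on macOS `shasum -a 256` is the usual substitute). The function name `verify_lfs_object` is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare a local file with the oid/size of its Git LFS pointer.
# Assumes GNU coreutils (sha256sum). Returns 0 on match, 1 on any mismatch.
verify_lfs_object() {
  local file="$1" expected_sha="$2" expected_size="$3"
  local size sha

  # Cheap check first: byte count (tr drops the padding some wc implementations add)
  size=$(wc -c < "$file" | tr -d '[:space:]')
  if [ "$size" != "$expected_size" ]; then
    echo "size mismatch for $file: got ${size:-none}, expected $expected_size" >&2
    return 1
  fi

  # Expensive check: full SHA-256 (sha256sum prints "<hash>  <file>")
  sha=$(sha256sum "$file" | awk '{print $1}')
  if [ "$sha" != "$expected_sha" ]; then
    echo "sha256 mismatch for $file: got $sha" >&2
    return 1
  fi
  echo "OK: $file"
}

# Example with the values from the pointer above:
# verify_lfs_object Qwen3.6-35B-A3B-TQ3_4S.gguf \
#   6af1c2df5cdeac6975079a6e129bc2043f7bbc0f0ac72125a7572016b452216e 13298875360
```

Checking the size first avoids hashing roughly 13 GB when a download was merely truncated.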
README.md ADDED
@@ -0,0 +1,128 @@
+ ---
+ license: apache-2.0
+ base_model:
+ - Qwen/Qwen3.6-35B-A3B
+ language:
+ - en
+ tags:
+ - GGUF
+ - llama.cpp
+ - qwen3.6
+ - qwen
+ - quantization
+ - turboquant
+ - tq3_4s
+ - multimodal
+ - Mixture of Experts
+ - conversational
+ pipeline_tag: image-text-to-text
+ ---
+
+ ![thumbnail](thumbnail.png)
+
+ # Qwen3.6-35B-A3B-TQ3_4S
+
+ GGUF quantization of [`Qwen/Qwen3.6-35B-A3B`](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) using **TQ3_4S** with mixed-precision MoE compression: 2-bit expert gate/up weights, 3-bit expert down weights, and 4-bit attention.
+
+ ## Files
+
+ | File | Description |
+ |------|-------------|
+ | `Qwen3.6-35B-A3B-TQ3_4S.gguf` | Main model (12.4 GiB, 3.07 bits per weight) |
+ | `mmproj-BF16.gguf` | Multimodal projector (BF16) |
+
+ ## Quantization
+
+ MoE experts tolerate aggressive compression because only 8 of the 256 experts are active for any given token. This quantization exploits that asymmetry:
+
+ | Component | Quant type | Rationale |
+ |-----------|------------|-----------|
+ | Expert MLP gate/up | Q2_K | 98% of params, MoE-tolerant |
+ | Expert MLP down | Q3_K | More sensitive: writes back into the residual stream |
+ | Attention Q/K/V/O | TQ3_4S | Protected by a Walsh-Hadamard transform (WHT) |
+ | Embeddings + output | Q6_K | Kept at higher precision as a quality anchor |
+
+ ## Runtime Requirement
+
+ This model requires the public TurboQuant fork of llama.cpp:
+ * https://github.com/turbo-tan/llama.cpp-tq3
+
+ ## Recommended Settings (16 GB VRAM)
+
+ ```bash
+ ./build/bin/llama-server \
+   -m Qwen3.6-35B-A3B-TQ3_4S.gguf \
+   -ngl 99 -c 4096 -np 1 \
+   -ctk q4_0 -ctv tq3_0 -fa on \
+   --jinja \
+   --reasoning off --reasoning-budget 0 --reasoning-format deepseek
+ ```
+
+ With vision (`--no-mmproj-offload` keeps the multimodal projector on the CPU):
+
+ ```bash
+ ./build/bin/llama-server \
+   -m Qwen3.6-35B-A3B-TQ3_4S.gguf \
+   --mmproj mmproj-BF16.gguf \
+   -ngl 99 -c 4096 -np 1 \
+   -ctk q4_0 -ctv tq3_0 -fa on \
+   --jinja --no-mmproj-offload \
+   --reasoning off --reasoning-budget 0 --reasoning-format deepseek
+ ```
+
+ ## Performance (RTX 5060 Ti 16GB)
+
+ | Metric | Value |
+ |--------|------:|
+ | Prompt processing (pp512) | 1832 tok/s |
+ | Token generation (tg128) | 107 tok/s |
+ | Model size | 12.4 GiB |
+ | Bits per weight (BPW) | 3.07 |
+ | GPU layers (`-ngl`) | 99 (all layers on GPU) |
+
+ With the settings above, the model fits entirely in 16 GB of VRAM, so no CPU offload is needed.
+
+ ## Quality
+
+ 10/10 correct on a small 10-question sanity check (capital of France, 2+2, Python reverse string, gravity, WW2, primes, boiling point, Shakespeare, Jupiter, hello→Hola).
+
+ ## Base Model
+
+ * [`Qwen/Qwen3.6-35B-A3B`](https://huggingface.co/Qwen/Qwen3.6-35B-A3B)
+ * Source: [`unsloth/Qwen3.6-35B-A3B-GGUF`](https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF) (Q8_0)
+
+ ## License
+
+ Apache 2.0, the same as the base model.
+
+ ## Tool Call Validation
+
+ Tested with `--jinja`, both with `--reasoning off` and with `--reasoning on --reasoning-budget 2048`:
+
+ | Test | reasoning off | reasoning on |
+ |------|:---:|:---:|
+ | Basic tool call trigger | ✅ | ✅ |
+ | Tool response → final answer (no loop) | ✅ | ✅ |
+ | Correct tool selection from multiple | ✅ | ✅ |
+ | No tool call for simple questions | ✅ | ✅ |
+ | Multi-step tool use | ✅ | ✅ |
+ | Nested quote escaping retry (no loop) | ✅ | ✅ |
+ | **Total (individual checks)** | **10/10** | **10/10** |
+
+ ### Recommended settings for tool-use / agentic workflows
+
+ ```bash
+ --jinja --reasoning off --reasoning-budget 0 --reasoning-format deepseek
+ ```
+
+ Avoid `--presence-penalty` values above 0.5 for tool use: high values diversify reasoning tokens but do not improve structured JSON output, and they can cause repeated near-identical tool calls in agent loops.
+
+ If you use `--reasoning on`, make sure your agent framework detects consecutive identical tool calls and stops after 2-3 retries.
+
+ ### Run tests yourself
+ With `llama-server` running, pass its port as the script's only argument (the script defaults to 8085, while `llama-server` listens on 8080 unless started with `--port`):
+ ```bash
+ chmod +x test_tool_calls.sh
+ ./test_tool_calls.sh 8085
+ ```
+
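The advice above to stop repeated identical tool calls can be sketched as a small stateful guard. This is a minimal bash illustration under stated assumptions: the names `record_tool_call`, `MAX_IDENTICAL`, `LAST_CALL` and `REPEATS` are hypothetical, the agent loop is assumed to pass each call's tool name and raw `arguments` string, and "identical" means byte-identical arguments (the same comparison Test 6 of `test_tool_calls.sh` makes):

```bash
#!/usr/bin/env bash
# Hypothetical loop guard for an agent loop written in bash.
# MAX_IDENTICAL=3 means: the original call plus 2 identical retries, then stop.
MAX_IDENTICAL=3
LAST_CALL=""
REPEATS=0

# record_tool_call NAME ARGS: returns 1 once the same (NAME, ARGS) pair
# has been emitted MAX_IDENTICAL times in a row, 0 otherwise.
record_tool_call() {
  local signature="$1|$2"   # tool name + raw JSON arguments
  if [ "$signature" = "$LAST_CALL" ]; then
    REPEATS=$((REPEATS + 1))
  else
    LAST_CALL="$signature"
    REPEATS=1
  fi
  if [ "$REPEATS" -ge "$MAX_IDENTICAL" ]; then
    echo "loop detected: $1 called $REPEATS times in a row with identical arguments" >&2
    return 1
  fi
  return 0
}

# Inside an agent loop (sketch):
#   record_tool_call "$NAME" "$ARGS" || break
```

Any change in tool name or arguments resets the counter, so a model that genuinely varies its retries is not cut off.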
mmproj-BF16.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:356dfaa3111376a4f7165e32e8749713378d1700b37cf52e0c50d9f23322334d
+ size 902822624
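The byte sizes in the two pointers can be used to cross-check the README figures (12.4 GiB, 3.07 BPW). A sketch using `awk` for the floating-point arithmetic; the commit never states an exact parameter count, so rather than assume one, the sketch derives the count implied by the reported BPW (approximate, since GGUF metadata also occupies a small part of the file). The helper names `gib` and `implied_params_b` are hypothetical:

```bash
#!/usr/bin/env bash
# Cross-check README figures against the LFS pointer sizes (bytes).
MODEL_BYTES=13298875360   # Qwen3.6-35B-A3B-TQ3_4S.gguf
MMPROJ_BYTES=902822624    # mmproj-BF16.gguf
REPORTED_BPW=3.07         # from the README

# Bytes -> GiB (1 GiB = 2^30 bytes), two decimals
gib() { awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 2^30 }'; }

# Parameter count (billions) implied by file size and bits per weight:
#   params = bytes * 8 / bpw
implied_params_b() { awk -v b="$1" -v bpw="$2" 'BEGIN { printf "%.1f\n", b * 8 / bpw / 1e9 }'; }

echo "model:  $(gib "$MODEL_BYTES") GiB"
echo "mmproj: $(gib "$MMPROJ_BYTES") GiB"
echo "total:  $(gib $((MODEL_BYTES + MMPROJ_BYTES))) GiB"
echo "implied parameters: $(implied_params_b "$MODEL_BYTES" "$REPORTED_BPW")B"
```

This prints 12.39 GiB for the model (the README's 12.4 GiB), 0.84 GiB for the projector, 13.23 GiB in total, and an implied 34.7B parameters, consistent with the "35B" in the model name.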
test_tool_calls.sh ADDED
@@ -0,0 +1,139 @@
+ #!/bin/bash
+ # Comprehensive tool call test for Qwen3.6-35B-A3B
+ # Usage: ./test_tool_calls.sh [port]
+ # Default port: 8085 (llama-server itself listens on 8080 unless started with --port)
+
+ PORT="${1:-8085}"
+ BASE="http://localhost:$PORT/v1/chat/completions"
+ PASS=0
+ FAIL=0
+
+ # check NAME PATTERN ACTUAL: passes when ACTUAL matches the grep (basic regex) PATTERN
+ check() {
+   local name="$1" expected="$2" actual="$3"
+   if echo "$actual" | grep -q "$expected"; then
+     echo " ✅ $name"
+     PASS=$((PASS + 1))
+   else
+     echo " ❌ $name (expected '$expected', got: $actual)"
+     FAIL=$((FAIL + 1))
+   fi
+ }
+
+ echo "=== Tool Call Tests ==="
+ echo "Server: http://localhost:$PORT"
+ curl -s "http://localhost:$PORT/health" | grep -q ok || { echo "❌ Server not running"; exit 1; }
+ echo ""
+
+ # Test 1: Basic tool call
+ echo "Test 1: Basic tool call trigger"
+ R=$(curl -s --max-time 60 "$BASE" -H "Content-Type: application/json" -d '{
+   "messages": [{"role": "user", "content": "What is the weather in London?"}],
+   "tools": [{"type": "function", "function": {"name": "get_weather", "description": "Get weather", "parameters": {"type": "object", "properties": {"location": {"type": "string"}}, "required": ["location"]}}}],
+   "max_tokens": 256, "temperature": 0
+ }')
+ TCALLS=$(echo "$R" | python3 -c "import sys,json;d=json.load(sys.stdin);tc=d['choices'][0]['message'].get('tool_calls');print(tc[0]['function']['name'] if tc else 'none')" 2>/dev/null)
+ FINISH=$(echo "$R" | python3 -c "import sys,json;d=json.load(sys.stdin);print(d['choices'][0]['finish_reason'])" 2>/dev/null)
+ check "triggers tool call" "get_weather" "$TCALLS"
+ check "finish_reason=tool_calls" "tool_calls" "$FINISH"
+
+ # Test 2: Tool response → final answer (no loop)
+ echo ""
+ echo "Test 2: Tool response produces final answer (no loop)"
+ R=$(curl -s --max-time 60 "$BASE" -H "Content-Type: application/json" -d '{
+   "messages": [
+     {"role": "user", "content": "What is the weather in London?"},
+     {"role": "assistant", "content": "", "tool_calls": [{"id": "call_1", "type": "function", "function": {"name": "get_weather", "arguments": "{\"location\":\"London\"}"}}]},
+     {"role": "tool", "tool_call_id": "call_1", "content": "{\"temperature\": 15, \"condition\": \"cloudy\"}"}
+   ],
+   "tools": [{"type": "function", "function": {"name": "get_weather", "description": "Get weather", "parameters": {"type": "object", "properties": {"location": {"type": "string"}}, "required": ["location"]}}}],
+   "max_tokens": 256, "temperature": 0
+ }')
+ CONTENT=$(echo "$R" | python3 -c "import sys,json;d=json.load(sys.stdin);print(d['choices'][0]['message'].get('content',''))" 2>/dev/null)
+ TCALLS=$(echo "$R" | python3 -c "import sys,json;d=json.load(sys.stdin);tc=d['choices'][0]['message'].get('tool_calls');print('has_calls' if tc else 'no_calls')" 2>/dev/null)
+ FINISH=$(echo "$R" | python3 -c "import sys,json;d=json.load(sys.stdin);print(d['choices'][0]['finish_reason'])" 2>/dev/null)
+ check "has content" "London\|15\|cloudy" "$CONTENT"
+ check "no further tool calls" "no_calls" "$TCALLS"
+ check "finish_reason=stop" "stop" "$FINISH"
+
+ # Test 3: Correct tool selection
+ echo ""
+ echo "Test 3: Selects correct tool from multiple"
+ R=$(curl -s --max-time 60 "$BASE" -H "Content-Type: application/json" -d '{
+   "messages": [{"role": "user", "content": "Search for latest AI news"}],
+   "tools": [
+     {"type": "function", "function": {"name": "get_weather", "description": "Get weather for a location", "parameters": {"type": "object", "properties": {"location": {"type": "string"}}, "required": ["location"]}}},
+     {"type": "function", "function": {"name": "search_web", "description": "Search the web", "parameters": {"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]}}}
+   ],
+   "max_tokens": 256, "temperature": 0
+ }')
+ TCALLS=$(echo "$R" | python3 -c "import sys,json;d=json.load(sys.stdin);tc=d['choices'][0]['message'].get('tool_calls');print(tc[0]['function']['name'] if tc else 'none')" 2>/dev/null)
+ check "picks search_web not get_weather" "search_web" "$TCALLS"
+
+ # Test 4: No tool call when not needed
+ echo ""
+ echo "Test 4: No tool call for simple question"
+ R=$(curl -s --max-time 60 "$BASE" -H "Content-Type: application/json" -d '{
+   "messages": [{"role": "user", "content": "What is 2+2?"}],
+   "tools": [{"type": "function", "function": {"name": "get_weather", "description": "Get weather", "parameters": {"type": "object", "properties": {"location": {"type": "string"}}, "required": ["location"]}}}],
+   "max_tokens": 256, "temperature": 0
+ }')
+ CONTENT=$(echo "$R" | python3 -c "import sys,json;d=json.load(sys.stdin);print(d['choices'][0]['message'].get('content',''))" 2>/dev/null)
+ TCALLS=$(echo "$R" | python3 -c "import sys,json;d=json.load(sys.stdin);tc=d['choices'][0]['message'].get('tool_calls');print('has_calls' if tc else 'no_calls')" 2>/dev/null)
+ check "answers directly with 4" "4" "$CONTENT"
+ check "no tool call" "no_calls" "$TCALLS"
+
+ # Test 5: Multi-step tool use
+ echo ""
+ echo "Test 5: Multi-step (2 cities)"
+ R=$(curl -s --max-time 60 "$BASE" -H "Content-Type: application/json" -d '{
+   "messages": [{"role": "user", "content": "Compare weather in London and Paris"}],
+   "tools": [{"type": "function", "function": {"name": "get_weather", "description": "Get weather for a location", "parameters": {"type": "object", "properties": {"location": {"type": "string"}}, "required": ["location"]}}}],
+   "max_tokens": 256, "temperature": 0
+ }')
+ # "or []" also covers a response that contains "tool_calls": null
+ TCALLS=$(echo "$R" | python3 -c "import sys,json;d=json.load(sys.stdin);tc=d['choices'][0]['message'].get('tool_calls') or [];print(len(tc))" 2>/dev/null)
+ check "calls tool (1 or 2 calls)" "^[12]$" "$TCALLS"
+
+ # Test 6: Nested quote escaping (stress test)
+ # Note: samples with presence_penalty 1.5, above the 0.5 the README recommends for tool use
+ echo ""
+ echo "Test 6: Nested bash quote escaping (2 rounds)"
+ TOOLS_T='[{"type":"function","function":{"name":"terminal","description":"Execute a bash command","parameters":{"type":"object","properties":{"command":{"type":"string"}},"required":["command"]}}}]'
+
+ R1=$(curl -s --max-time 120 "$BASE" -H "Content-Type: application/json" -d "{
+   \"messages\":[{\"role\":\"user\",\"content\":\"Run: bash ~/script/proxy.sh \\\"web read --url \\\\\\\"https://example.com/\\\\\\\"\\\"\"}],
+   \"tools\":$TOOLS_T, \"max_tokens\":512, \"temperature\":1.0, \"top_p\":0.95, \"top_k\":20, \"presence_penalty\":1.5
+ }")
+ # json.dumps re-encodes the arguments as a JSON string literal, so CMD1 can be embedded in the next request verbatim
+ CMD1=$(echo "$R1" | python3 -c "import sys,json;d=json.load(sys.stdin);tc=d['choices'][0]['message'].get('tool_calls') or [];print(json.dumps(tc[0]['function']['arguments']) if tc else 'no_call')" 2>/dev/null)
+
+ if [ -z "$CMD1" ] || [ "$CMD1" = "no_call" ]; then
+   echo " ❌ no tool call in round 1 (got: ${CMD1:-invalid response})"
+   FAIL=$((FAIL + 1))
+ else
+   R2=$(curl -s --max-time 120 "$BASE" -H "Content-Type: application/json" -d "{
+     \"messages\":[
+       {\"role\":\"user\",\"content\":\"Run: bash ~/script/proxy.sh \\\"web read --url \\\\\\\"https://example.com/\\\\\\\"\\\"\"},
+       {\"role\":\"assistant\",\"content\":\"\",\"tool_calls\":[{\"id\":\"c1\",\"type\":\"function\",\"function\":{\"name\":\"terminal\",\"arguments\":$CMD1}}]},
+       {\"role\":\"tool\",\"tool_call_id\":\"c1\",\"content\":\"{\\\"output\\\":\\\"bash: unexpected EOF\\\\nSTATUS:FAILURE\\\",\\\"exit_code\\\":1}\"}
+     ],
+     \"tools\":$TOOLS_T, \"max_tokens\":512, \"temperature\":1.0, \"top_p\":0.95, \"top_k\":20, \"presence_penalty\":1.5
+   }")
+   CMD2=$(echo "$R2" | python3 -c "import sys,json;d=json.load(sys.stdin);tc=d['choices'][0]['message'].get('tool_calls') or [];print(json.dumps(tc[0]['function']['arguments']) if tc else 'gave_up')" 2>/dev/null)
+   if [ "$CMD1" = "$CMD2" ]; then
+     echo " ⚠️ identical commands (potential loop)"
+     FAIL=$((FAIL + 1))
+   else
+     echo " ✅ commands differ across retries (no loop)"
+     PASS=$((PASS + 1))
+   fi
+   echo " R1: $CMD1"
+   echo " R2: $CMD2"
+ fi
+
+ # Summary
+ echo ""
+ echo "=== Results: $PASS passed, $FAIL failed ==="
+ # Exit non-zero if any check failed
+ [ "$FAIL" -eq 0 ]
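Every check in the script above parses the server's OpenAI-style JSON with an inline `python3 -c` one-liner. If you extend the suite, the same extraction can be factored into one helper and exercised offline against a canned response, without a running server. A sketch only: `tool_call_summary` is a hypothetical name, and the sample response is made up for illustration, not captured from this model:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read one /v1/chat/completions response on stdin and print
# "<finish_reason> <n_tool_calls> <first_tool_name|none>", mirroring the script's one-liners.
tool_call_summary() {
  python3 -c '
import json, sys
choice = json.load(sys.stdin)["choices"][0]
calls = choice["message"].get("tool_calls") or []   # absent or null -> no calls
name = calls[0]["function"]["name"] if calls else "none"
print(choice["finish_reason"], len(calls), name)
'
}

# Offline usage with a canned (made-up) response:
SAMPLE='{"choices":[{"finish_reason":"tool_calls","message":{"content":null,
  "tool_calls":[{"id":"call_1","type":"function",
    "function":{"name":"get_weather","arguments":"{\"location\":\"London\"}"}}]}}]}'
printf '%s\n' "$SAMPLE" | tool_call_summary
```

Emitting a single space-separated line keeps the helper easy to consume with `read -r finish n_calls name`, and one parser per response replaces the two or three separate `python3` invocations each test currently makes.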
thumbnail.png ADDED

Git LFS Details

  • SHA256: a6e2b038c38bee031b6fe99b76ebe5759bd1e3571f99a37bba729b4490897758
  • Pointer size: 131 Bytes
  • Size of remote file: 108 kB