keithnull committed on
Commit d53391f · verified · 1 Parent(s): bbb919a

Upload folder using huggingface_hub

eval-results-2026-05-05/README.md ADDED
@@ -0,0 +1,32 @@
+ # REAM-192 HumanEval bundle (2026-05-05)
+
+ Raw lm-eval-harness output for the comparison reported in the model card. Six runs:
+
+ | Subdir | Model | pass@1 | Notes |
+ |---|---|---:|---|
+ | `bf16/` | REAM-192 bf16 (vLLM) | 0.7134 | Reference |
+ | `q4km/` | REAM-192 Q4_K_M (llama-server) | 0.6768 | This repo |
+ | `q3ks/` | REAM-192 Q3_K_S (llama-server) | 0.6768 | This repo |
+ | `reap-q4km/` | atbender REAP-26B Q4_K_M | 0.6646 | Direct merge-vs-prune at same compression |
+ | `bartowski-q4km/` | Bartowski Qwen3.6-35B-A3B Q4_K_M (unmerged) | 0.6463 | Community gold-standard quant of unmerged base |
+ | `unsloth-ud-q4km/` | Unsloth Qwen3.6-35B-A3B UD-Q4_K_M (unmerged) | 0.6280 | Dynamic-quant variant of unmerged base |
+
+ Each subdir contains lm-eval's `results_<timestamp>.json` with full per-task metrics + generation kwargs + sample outputs. The corresponding `<label>-eval.log` next to each subdir is the stdout/stderr from that run.
+
+ ## Methodology
+
+ - **Task**: lm-eval-harness `humaneval` (164 problems, OpenAI 2021), default config
+ - **Decoding**: greedy (`temperature=0`), `max_gen_toks=1024`
+ - **Mode**: raw completion (no `--apply_chat_template`), thinking off
+ - **Stop sequences**: lm-eval default `['\nclass', '\ndef', '\n#', '\nif', '\nprint']`
+ - **Inference engines**: vLLM 0.6+ (bf16), upstream `llama-server` b9020-era CUDA build (GGUFs)
+ - **Tokenizer**: `/workspace/REAM-192-bf16` HF dir shared across all runs (consistent client-side tokenization)
+ - **Hardware**: RunPod A100 SXM (80 GB VRAM), one model at a time
+ - **Total cost**: ~$3.50
+
+ ## Caveats (in short)
+
+ - Single benchmark, single greedy run, ~3.7 pp standard error
+ - Numbers don't match published Qwen3.6 ~85% (different methodology: chat template + thinking on)
+ - Confidence intervals overlap meaningfully across the unmerged-base trio; ranking is directional not statistically definitive
+ - See main model-card section for full caveats + interpretation
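The ~3.7 pp standard error quoted in the README's caveats can be reproduced from pass@1 and the problem count alone. The sketch below assumes each of the 164 problems is an independent pass/fail outcome and that the variance is the sample variance (dividing by n - 1). That is an assumption about how the harness aggregates, although the formula does agree with the stderr values reported in this commit to four decimals. The helper names are hypothetical.

```python
import math

N_PROBLEMS = 164  # HumanEval size, per the README


def pass_at_1_stderr(p: float, n: int = N_PROBLEMS) -> float:
    """Standard error of the mean of n pass/fail outcomes (sample variance, n - 1)."""
    return math.sqrt(p * (1.0 - p) / (n - 1))


def normal_interval(p: float, n: int = N_PROBLEMS, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% interval, p +/- z * stderr (normal approximation)."""
    se = pass_at_1_stderr(p, n)
    return (p - z * se, p + z * se)


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True when two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    # pass@1 values copied from the README table
    scores = {"bf16": 0.7134, "bartowski-q4km": 0.6463}
    for label, p in scores.items():
        low, high = normal_interval(p)
        print(f"{label}: stderr={pass_at_1_stderr(p):.4f}, interval=({low:.3f}, {high:.3f})")
    print(intervals_overlap(normal_interval(0.7134), normal_interval(0.6463)))
```

Under these assumptions even the highest and one of the lowest scores in the table have overlapping intervals, which is consistent with the README's wording that the ranking is directional.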
eval-results-2026-05-05/bartowski-q4km-eval.log ADDED
@@ -0,0 +1,26 @@
+ /usr/local/lib/python3.12/dist-packages/requests/__init__.py:113: RequestsDependencyWarning: urllib3 (2.5.0) or chardet (6.0.0.post1)/charset_normalizer (3.4.3) doesn't match a supported version!
+ warnings.warn(
+ 2026-05-05:22:44:38 INFO [_cli.run:376] Selected Tasks: ['humaneval']
+ 2026-05-05:22:44:39 INFO [evaluator:211] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
+ 2026-05-05:22:44:39 WARNING [evaluator:223] generation_kwargs: {'temperature': 0, 'max_gen_toks': 1024} specified through cli, these settings will update set parameters in yaml tasks. Ensure 'do_sample=True' for non-greedy decoding!
+ 2026-05-05:22:44:39 INFO [evaluator:236] Initializing local-completions model, with arguments: {'model': '/workspace/bartowski-gguf/Qwen_Qwen3.6-35B-A3B-Q4_K_M.gguf', 'tokenizer': '/workspace/REAM-192-bf16', 'base_url': 'http://127.0.0.1:8080/v1/completions', 'num_concurrent': 1, 'max_retries': 3, 'tokenized_requests': False}
+ 2026-05-05:22:44:39 INFO [models.openai_completions:42] Remote tokenizer not supported. Using huggingface tokenizer backend.
+ 2026-05-05:22:44:39 INFO [models.api_models:172] Using max length 2048 - 1
+ 2026-05-05:22:44:39 INFO [models.api_models:175] Concurrent requests are disabled. To enable concurrent requests, set `num_concurrent` > 1.
+ 2026-05-05:22:44:39 INFO [models.api_models:193] Using tokenizer huggingface
+ 2026-05-05:22:44:50 INFO [tasks:700] Selected tasks:
+ 2026-05-05:22:44:50 INFO [tasks:691] Task: humaneval (humaneval/humaneval.yaml)
+ 2026-05-05:22:44:50 INFO [evaluator:314] humaneval: Using gen_kwargs: {'until': ['\nclass', '\ndef', '\n#', '\nif', '\nprint'], 'max_gen_toks': 1024, 'do_sample': False, 'temperature': 0}
+ 2026-05-05:22:44:50 INFO [api.task:311] Building contexts for humaneval on rank 0...
+
  0%| | 0/164 [00:00<?, ?it/s]
  79%|███████▉ | 130/164 [00:00<00:00, 1297.43it/s]
+ 2026-05-05:22:44:50 INFO [evaluator:584] Running generate_until requests
+ 2026-05-05:22:44:50 INFO [models.api_models:733] Tokenized requests are disabled. Context + generation length is not checked.
+
+ fatal: not a git repository (or any parent up to mount point /)
+ Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
+ 2026-05-05:22:47:39 INFO [loggers.evaluation_tracker:247] Saving results aggregated
+ local-completions ({'model': '/workspace/bartowski-gguf/Qwen_Qwen3.6-35B-A3B-Q4_K_M.gguf', 'tokenizer': '/workspace/REAM-192-bf16', 'base_url': 'http://127.0.0.1:8080/v1/completions', 'num_concurrent': 1, 'max_retries': 3, 'tokenized_requests': False}), gen_kwargs: ({'temperature': 0, 'max_gen_toks': 1024}), limit: None, num_fewshot: None, batch_size: 1
+ | Tasks |Version| Filter |n-shot|Metric| |Value | |Stderr|
+ |---------|------:|-----------|-----:|------|---|-----:|---|-----:|
+ |humaneval| 1|create_test| 0|pass@1| |0.6463|± |0.0374|
+
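Each `*-eval.log` ends with a pipe-delimited summary table, so the headline score can be pulled out of a log without opening the JSON. The following is a minimal sketch, assuming the nine-column layout shown in the log above (Tasks, Version, Filter, n-shot, Metric, an unlabeled column, Value, a `±` column, Stderr). `parse_summary_row` is a hypothetical helper, not part of lm-eval.

```python
def parse_summary_row(row: str) -> dict:
    """Split one '|'-delimited result row from the log into named fields.

    A leading '+' (diff marker) is tolerated and ignored.
    """
    text = row.strip().lstrip("+").strip()
    cells = [cell.strip() for cell in text.strip("|").split("|")]
    if len(cells) != 9:
        raise ValueError(f"expected 9 cells, got {len(cells)}: {row!r}")
    task, version, filter_name, n_shot, metric, _unlabeled, value, _pm, stderr = cells
    return {
        "task": task,
        "version": version,
        "filter": filter_name,
        "n_shot": int(n_shot),
        "metric": metric,
        "value": float(value),
        "stderr": float(stderr),
    }


if __name__ == "__main__":
    row = "|humaneval|      1|create_test|     0|pass@1|   |0.6463|±  |0.0374|"
    print(parse_summary_row(row))
```

The header and separator rows are not handled here; a caller would pass only the data row, for example the last line of the log that starts with `|humaneval|`.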
eval-results-2026-05-05/bartowski-q4km/__workspace__bartowski-gguf__Qwen_Qwen3.6-35B-A3B-Q4_K_M.gguf/results_2026-05-05T22-47-39.616055.json ADDED
@@ -0,0 +1,156 @@
+ {
+ "results": {
+ "humaneval": {
+ "alias": "humaneval",
+ "pass@1,create_test": 0.6463414634146342,
+ "pass@1_stderr,create_test": 0.03744805613781577
+ }
+ },
+ "group_subtasks": {
+ "humaneval": []
+ },
+ "configs": {
+ "humaneval": {
+ "task": "humaneval",
+ "dataset_path": "openai/openai_humaneval",
+ "test_split": "test",
+ "doc_to_text": "{{prompt}}",
+ "doc_to_target": "{{test}}\ncheck({{entry_point}})",
+ "unsafe_code": true,
+ "description": "",
+ "target_delimiter": " ",
+ "fewshot_delimiter": "\n\n",
+ "fewshot_config": {
+ "sampler": "default",
+ "split": null,
+ "process_docs": null,
+ "fewshot_indices": null,
+ "samples": null,
+ "doc_to_text": "{{prompt}}",
+ "doc_to_choice": null,
+ "doc_to_target": "{{test}}\ncheck({{entry_point}})",
+ "gen_prefix": null,
+ "fewshot_delimiter": "\n\n",
+ "target_delimiter": " "
+ },
+ "num_fewshot": 0,
+ "metric_list": [
+ {
+ "metric": "def pass_at_k(references: list[str], predictions: list[list[str]], k: list[int] = None):\n global compute_\n assert k is not None\n if isinstance(k, int):\n k = [k]\n res = compute_.compute(\n references=references,\n predictions=predictions,\n k=k,\n )\n return res[0]\n",
+ "aggregation": "mean",
+ "higher_is_better": true,
+ "k": [
+ 1
+ ]
+ }
+ ],
+ "output_type": "generate_until",
+ "generation_kwargs": {
+ "until": [
+ "\nclass",
+ "\ndef",
+ "\n#",
+ "\nif",
+ "\nprint"
+ ],
+ "max_gen_toks": 1024,
+ "do_sample": false,
+ "temperature": 0
+ },
+ "repeats": 1,
+ "filter_list": [
+ {
+ "name": "create_test",
+ "filter": [
+ {
+ "function": "custom",
+ "filter_fn": "<function build_predictions at 0x7068f4240180>"
+ }
+ ]
+ }
+ ],
+ "should_decontaminate": false,
+ "metadata": {
+ "version": 1.0,
+ "model": "/workspace/bartowski-gguf/Qwen_Qwen3.6-35B-A3B-Q4_K_M.gguf",
+ "tokenizer": "/workspace/REAM-192-bf16",
+ "base_url": "http://127.0.0.1:8080/v1/completions",
+ "num_concurrent": 1,
+ "max_retries": 3,
+ "tokenized_requests": false
+ }
+ }
+ },
+ "versions": {
+ "humaneval": 1.0
+ },
+ "n-shot": {
+ "humaneval": 0
+ },
+ "higher_is_better": {
+ "humaneval": {
+ "pass_at_k": true
+ }
+ },
+ "n-samples": {
+ "humaneval": {
+ "original": 164,
+ "effective": 164
+ }
+ },
+ "config": {
+ "model": "local-completions",
+ "model_args": {
+ "model": "/workspace/bartowski-gguf/Qwen_Qwen3.6-35B-A3B-Q4_K_M.gguf",
+ "tokenizer": "/workspace/REAM-192-bf16",
+ "base_url": "http://127.0.0.1:8080/v1/completions",
+ "num_concurrent": 1,
+ "max_retries": 3,
+ "tokenized_requests": false
+ },
+ "batch_size": 1,
+ "batch_sizes": [],
+ "device": "cuda:0",
+ "use_cache": null,
+ "limit": null,
+ "bootstrap_iters": 100000,
+ "gen_kwargs": {
+ "temperature": 0,
+ "max_gen_toks": 1024
+ },
+ "random_seed": 0,
+ "numpy_seed": 1234,
+ "torch_seed": 1234,
+ "fewshot_seed": 1234
+ },
+ "git_hash": null,
+ "date": 1778021078.100925,
+ "pretty_env_info": "PyTorch version: 2.11.0+cu130\nIs debug build: False\nCUDA used to build PyTorch: 13.0\nROCM used to build PyTorch: N/A\n\nOS: Ubuntu 24.04.3 LTS (x86_64)\nGCC version: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0\nClang version: Could not collect\nCMake version: version 3.28.3\nLibc version: glibc-2.39\n\nPython version: 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0] (64-bit runtime)\nPython platform: Linux-6.8.0-110-generic-x86_64-with-glibc2.39\nIs CUDA available: True\nCUDA runtime version: 12.8.93\nCUDA_MODULE_LOADING set to: \nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-80GB\nNvidia driver version: 580.126.20\ncuDNN version: Probably one of the following:\n/usr/lib/x86_64-linux-gnu/libcudnn.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.8.0\nIs XPU available: False\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\nCaching allocator config: N/A\n\nCPU:\nArchitecture: x86_64\nCPU op-mode(s): 32-bit, 64-bit\nAddress sizes: 48 bits physical, 48 bits virtual\nByte Order: Little Endian\nCPU(s): 256\nOn-line CPU(s) list: 0-254\nOff-line CPU(s) list: 255\nVendor ID: AuthenticAMD\nModel name: AMD EPYC 7763 64-Core Processor\nCPU family: 25\nModel: 1\nThread(s) per core: 2\nCore(s) per socket: 64\nSocket(s): 2\nStepping: 1\nFrequency boost: enabled\nCPU(s) scaling MHz: 74%\nCPU max MHz: 3530.4929\nCPU min MHz: 0.0000\nBogoMIPS: 4900.44\nFlags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni 
pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin brs arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap ibpb_exit_to_user\nVirtualization: AMD-V\nL1d cache: 4 MiB (128 instances)\nL1i cache: 4 MiB (128 instances)\nL2 cache: 64 MiB (128 instances)\nL3 cache: 512 MiB (16 instances)\nNUMA node(s): 2\nNUMA node0 CPU(s): 0-63,128-191\nNUMA node1 CPU(s): 64-127,192-254\nVulnerability Gather data sampling: Not affected\nVulnerability Indirect target selection: Not affected\nVulnerability Itlb multihit: Not affected\nVulnerability L1tf: Not affected\nVulnerability Mds: Not affected\nVulnerability Meltdown: Not affected\nVulnerability Mmio stale data: Not affected\nVulnerability Reg file data sampling: Not affected\nVulnerability Retbleed: Not affected\nVulnerability Spec rstack overflow: Mitigation; Safe RET\nVulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected\nVulnerability Srbds: Not affected\nVulnerability Tsa: Vulnerable: Clear CPU buffers attempted, no microcode\nVulnerability Tsx async abort: Not affected\nVulnerability Vmscape: 
Mitigation; IBPB before exit to userspace\n\nVersions of relevant libraries:\n[pip3] numpy==2.1.2\n[pip3] nvidia-cublas==13.1.0.3\n[pip3] nvidia-cublas-cu12==12.8.4.1\n[pip3] nvidia-cuda-cupti==13.0.85\n[pip3] nvidia-cuda-cupti-cu12==12.8.90\n[pip3] nvidia-cuda-nvrtc==13.0.88\n[pip3] nvidia-cuda-nvrtc-cu12==12.8.93\n[pip3] nvidia-cuda-runtime==13.0.96\n[pip3] nvidia-cuda-runtime-cu12==12.8.90\n[pip3] nvidia-cudnn-cu12==9.10.2.21\n[pip3] nvidia-cudnn-cu13==9.19.0.56\n[pip3] nvidia-cudnn-frontend==1.18.0\n[pip3] nvidia-cufft==12.0.0.61\n[pip3] nvidia-cufft-cu12==11.3.3.83\n[pip3] nvidia-curand==10.4.0.35\n[pip3] nvidia-curand-cu12==10.3.9.90\n[pip3] nvidia-cusolver==12.0.4.66\n[pip3] nvidia-cusolver-cu12==11.7.3.90\n[pip3] nvidia-cusparse==12.6.3.3\n[pip3] nvidia-cusparse-cu12==12.5.8.93\n[pip3] nvidia-cusparselt-cu12==0.7.1\n[pip3] nvidia-cusparselt-cu13==0.8.0\n[pip3] nvidia-nccl-cu12==2.27.3\n[pip3] nvidia-nccl-cu13==2.28.9\n[pip3] nvidia-nvjitlink==13.0.88\n[pip3] nvidia-nvjitlink-cu12==12.8.93\n[pip3] nvidia-nvtx==13.0.85\n[pip3] nvidia-nvtx-cu12==12.8.90\n[pip3] torch==2.11.0\n[pip3] torch_c_dlpack_ext==0.1.5\n[pip3] torchaudio==2.11.0\n[pip3] torchvision==0.26.0\n[pip3] triton==3.6.0\n[conda] Could not collect",
+ "transformers_version": "5.8.0",
+ "lm_eval_version": "0.4.11",
+ "upper_git_hash": null,
+ "tokenizer_pad_token": [
+ "<|endoftext|>",
+ "248044"
+ ],
+ "tokenizer_eos_token": [
+ "<|im_end|>",
+ "248046"
+ ],
+ "tokenizer_bos_token": [
+ null,
+ "None"
+ ],
+ "eot_token_id": 248046,
+ "max_length": 2047,
+ "task_hashes": {},
+ "model_source": "local-completions",
+ "model_name": "/workspace/bartowski-gguf/Qwen_Qwen3.6-35B-A3B-Q4_K_M.gguf",
+ "model_name_sanitized": "__workspace__bartowski-gguf__Qwen_Qwen3.6-35B-A3B-Q4_K_M.gguf",
+ "system_instruction": null,
+ "system_instruction_sha": null,
+ "fewshot_as_multiturn": null,
+ "chat_template": null,
+ "chat_template_sha": null,
+ "total_evaluation_time_seconds": "209.96706952992827"
+ }
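In the results file above, the aggregated score sits under `results` → `humaneval`, with the metric name and the filter name joined by a comma (`pass@1,create_test`). A minimal sketch for reading it back follows; `read_pass_at_1` is a hypothetical helper, and the default key names are taken from this file rather than from any documented schema.

```python
import json
from pathlib import Path


def read_pass_at_1(results_path, task="humaneval", filter_name="create_test"):
    """Return (pass@1, stderr) from an lm-eval results JSON laid out like the one above."""
    data = json.loads(Path(results_path).read_text(encoding="utf-8"))
    metrics = data["results"][task]
    score = metrics[f"pass@1,{filter_name}"]
    stderr = metrics[f"pass@1_stderr,{filter_name}"]
    return score, stderr
```

A caller would pass the path to one of the `results_<timestamp>.json` files in the bundle. A missing task or filter raises `KeyError`, which seems preferable to silently returning a default for an evaluation artifact.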
eval-results-2026-05-05/bf16-eval.log ADDED
@@ -0,0 +1,26 @@
+ /usr/local/lib/python3.12/dist-packages/requests/__init__.py:113: RequestsDependencyWarning: urllib3 (2.5.0) or chardet (6.0.0.post1)/charset_normalizer (3.4.3) doesn't match a supported version!
+ warnings.warn(
+ 2026-05-05:21:35:20 INFO [_cli.run:376] Selected Tasks: ['humaneval']
+ 2026-05-05:21:35:21 INFO [evaluator:211] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
+ 2026-05-05:21:35:21 WARNING [evaluator:223] generation_kwargs: {'temperature': 0, 'max_gen_toks': 1024} specified through cli, these settings will update set parameters in yaml tasks. Ensure 'do_sample=True' for non-greedy decoding!
+ 2026-05-05:21:35:21 INFO [evaluator:236] Initializing local-completions model, with arguments: {'model': '/workspace/REAM-192-bf16', 'tokenizer': '/workspace/REAM-192-bf16', 'base_url': 'http://127.0.0.1:8000/v1/completions', 'num_concurrent': 1, 'max_retries': 3, 'tokenized_requests': False}
+ 2026-05-05:21:35:21 INFO [models.openai_completions:42] Remote tokenizer not supported. Using huggingface tokenizer backend.
+ 2026-05-05:21:35:21 INFO [models.api_models:172] Using max length 2048 - 1
+ 2026-05-05:21:35:21 INFO [models.api_models:175] Concurrent requests are disabled. To enable concurrent requests, set `num_concurrent` > 1.
+ 2026-05-05:21:35:21 INFO [models.api_models:193] Using tokenizer huggingface
+ 2026-05-05:21:35:32 INFO [tasks:700] Selected tasks:
+ 2026-05-05:21:35:32 INFO [tasks:691] Task: humaneval (humaneval/humaneval.yaml)
+ 2026-05-05:21:35:32 INFO [evaluator:314] humaneval: Using gen_kwargs: {'until': ['\nclass', '\ndef', '\n#', '\nif', '\nprint'], 'max_gen_toks': 1024, 'do_sample': False, 'temperature': 0}
+ 2026-05-05:21:35:32 INFO [api.task:311] Building contexts for humaneval on rank 0...
+
  0%| | 0/164 [00:00<?, ?it/s]
  82%|████████▏ | 134/164 [00:00<00:00, 1334.28it/s]
+ 2026-05-05:21:35:32 INFO [evaluator:584] Running generate_until requests
+ 2026-05-05:21:35:32 INFO [models.api_models:733] Tokenized requests are disabled. Context + generation length is not checked.
+
+ fatal: not a git repository (or any parent up to mount point /)
+ Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
+ 2026-05-05:21:37:20 INFO [loggers.evaluation_tracker:247] Saving results aggregated
+ local-completions ({'model': '/workspace/REAM-192-bf16', 'tokenizer': '/workspace/REAM-192-bf16', 'base_url': 'http://127.0.0.1:8000/v1/completions', 'num_concurrent': 1, 'max_retries': 3, 'tokenized_requests': False}), gen_kwargs: ({'temperature': 0, 'max_gen_toks': 1024}), limit: None, num_fewshot: None, batch_size: 1
+ | Tasks |Version| Filter |n-shot|Metric| |Value | |Stderr|
+ |---------|------:|-----------|-----:|------|---|-----:|---|-----:|
+ |humaneval| 1|create_test| 0|pass@1| |0.7134|± |0.0354|
+
eval-results-2026-05-05/bf16/__workspace__REAM-192-bf16/results_2026-05-05T21-37-20.029190.json ADDED
@@ -0,0 +1,156 @@
+ {
+ "results": {
+ "humaneval": {
+ "alias": "humaneval",
+ "pass@1,create_test": 0.7134146341463414,
+ "pass@1_stderr,create_test": 0.035416383329935054
+ }
+ },
+ "group_subtasks": {
+ "humaneval": []
+ },
+ "configs": {
+ "humaneval": {
+ "task": "humaneval",
+ "dataset_path": "openai/openai_humaneval",
+ "test_split": "test",
+ "doc_to_text": "{{prompt}}",
+ "doc_to_target": "{{test}}\ncheck({{entry_point}})",
+ "unsafe_code": true,
+ "description": "",
+ "target_delimiter": " ",
+ "fewshot_delimiter": "\n\n",
+ "fewshot_config": {
+ "sampler": "default",
+ "split": null,
+ "process_docs": null,
+ "fewshot_indices": null,
+ "samples": null,
+ "doc_to_text": "{{prompt}}",
+ "doc_to_choice": null,
+ "doc_to_target": "{{test}}\ncheck({{entry_point}})",
+ "gen_prefix": null,
+ "fewshot_delimiter": "\n\n",
+ "target_delimiter": " "
+ },
+ "num_fewshot": 0,
+ "metric_list": [
+ {
+ "metric": "def pass_at_k(references: list[str], predictions: list[list[str]], k: list[int] = None):\n global compute_\n assert k is not None\n if isinstance(k, int):\n k = [k]\n res = compute_.compute(\n references=references,\n predictions=predictions,\n k=k,\n )\n return res[0]\n",
+ "aggregation": "mean",
+ "higher_is_better": true,
+ "k": [
+ 1
+ ]
+ }
+ ],
+ "output_type": "generate_until",
+ "generation_kwargs": {
+ "until": [
+ "\nclass",
+ "\ndef",
+ "\n#",
+ "\nif",
+ "\nprint"
+ ],
+ "max_gen_toks": 1024,
+ "do_sample": false,
+ "temperature": 0
+ },
+ "repeats": 1,
+ "filter_list": [
+ {
+ "name": "create_test",
+ "filter": [
+ {
+ "function": "custom",
+ "filter_fn": "<function build_predictions at 0x7db0822100e0>"
+ }
+ ]
+ }
+ ],
+ "should_decontaminate": false,
+ "metadata": {
+ "version": 1.0,
+ "model": "/workspace/REAM-192-bf16",
+ "tokenizer": "/workspace/REAM-192-bf16",
+ "base_url": "http://127.0.0.1:8000/v1/completions",
+ "num_concurrent": 1,
+ "max_retries": 3,
+ "tokenized_requests": false
+ }
+ }
+ },
+ "versions": {
+ "humaneval": 1.0
+ },
+ "n-shot": {
+ "humaneval": 0
+ },
+ "higher_is_better": {
+ "humaneval": {
+ "pass_at_k": true
+ }
+ },
+ "n-samples": {
+ "humaneval": {
+ "original": 164,
+ "effective": 164
+ }
+ },
+ "config": {
+ "model": "local-completions",
+ "model_args": {
+ "model": "/workspace/REAM-192-bf16",
+ "tokenizer": "/workspace/REAM-192-bf16",
+ "base_url": "http://127.0.0.1:8000/v1/completions",
+ "num_concurrent": 1,
+ "max_retries": 3,
+ "tokenized_requests": false
+ },
+ "batch_size": 1,
+ "batch_sizes": [],
+ "device": "cuda:0",
+ "use_cache": null,
+ "limit": null,
+ "bootstrap_iters": 100000,
+ "gen_kwargs": {
+ "temperature": 0,
+ "max_gen_toks": 1024
+ },
+ "random_seed": 0,
+ "numpy_seed": 1234,
+ "torch_seed": 1234,
+ "fewshot_seed": 1234
+ },
+ "git_hash": null,
+ "date": 1778016920.1216605,
+ "pretty_env_info": "PyTorch version: 2.11.0+cu130\nIs debug build: False\nCUDA used to build PyTorch: 13.0\nROCM used to build PyTorch: N/A\n\nOS: Ubuntu 24.04.3 LTS (x86_64)\nGCC version: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0\nClang version: Could not collect\nCMake version: version 3.28.3\nLibc version: glibc-2.39\n\nPython version: 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0] (64-bit runtime)\nPython platform: Linux-6.8.0-110-generic-x86_64-with-glibc2.39\nIs CUDA available: True\nCUDA runtime version: 12.8.93\nCUDA_MODULE_LOADING set to: \nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-80GB\nNvidia driver version: 580.126.20\ncuDNN version: Probably one of the following:\n/usr/lib/x86_64-linux-gnu/libcudnn.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.8.0\nIs XPU available: False\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\nCaching allocator config: N/A\n\nCPU:\nArchitecture: x86_64\nCPU op-mode(s): 32-bit, 64-bit\nAddress sizes: 48 bits physical, 48 bits virtual\nByte Order: Little Endian\nCPU(s): 256\nOn-line CPU(s) list: 0-254\nOff-line CPU(s) list: 255\nVendor ID: AuthenticAMD\nModel name: AMD EPYC 7763 64-Core Processor\nCPU family: 25\nModel: 1\nThread(s) per core: 2\nCore(s) per socket: 64\nSocket(s): 2\nStepping: 1\nFrequency boost: enabled\nCPU(s) scaling MHz: 76%\nCPU max MHz: 3530.4929\nCPU min MHz: 0.0000\nBogoMIPS: 4900.44\nFlags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni 
pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin brs arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap ibpb_exit_to_user\nVirtualization: AMD-V\nL1d cache: 4 MiB (128 instances)\nL1i cache: 4 MiB (128 instances)\nL2 cache: 64 MiB (128 instances)\nL3 cache: 512 MiB (16 instances)\nNUMA node(s): 2\nNUMA node0 CPU(s): 0-63,128-191\nNUMA node1 CPU(s): 64-127,192-254\nVulnerability Gather data sampling: Not affected\nVulnerability Indirect target selection: Not affected\nVulnerability Itlb multihit: Not affected\nVulnerability L1tf: Not affected\nVulnerability Mds: Not affected\nVulnerability Meltdown: Not affected\nVulnerability Mmio stale data: Not affected\nVulnerability Reg file data sampling: Not affected\nVulnerability Retbleed: Not affected\nVulnerability Spec rstack overflow: Mitigation; Safe RET\nVulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected\nVulnerability Srbds: Not affected\nVulnerability Tsa: Vulnerable: Clear CPU buffers attempted, no microcode\nVulnerability Tsx async abort: Not affected\nVulnerability Vmscape: 
Mitigation; IBPB before exit to userspace\n\nVersions of relevant libraries:\n[pip3] numpy==2.1.2\n[pip3] nvidia-cublas==13.1.0.3\n[pip3] nvidia-cublas-cu12==12.8.4.1\n[pip3] nvidia-cuda-cupti==13.0.85\n[pip3] nvidia-cuda-cupti-cu12==12.8.90\n[pip3] nvidia-cuda-nvrtc==13.0.88\n[pip3] nvidia-cuda-nvrtc-cu12==12.8.93\n[pip3] nvidia-cuda-runtime==13.0.96\n[pip3] nvidia-cuda-runtime-cu12==12.8.90\n[pip3] nvidia-cudnn-cu12==9.10.2.21\n[pip3] nvidia-cudnn-cu13==9.19.0.56\n[pip3] nvidia-cudnn-frontend==1.18.0\n[pip3] nvidia-cufft==12.0.0.61\n[pip3] nvidia-cufft-cu12==11.3.3.83\n[pip3] nvidia-curand==10.4.0.35\n[pip3] nvidia-curand-cu12==10.3.9.90\n[pip3] nvidia-cusolver==12.0.4.66\n[pip3] nvidia-cusolver-cu12==11.7.3.90\n[pip3] nvidia-cusparse==12.6.3.3\n[pip3] nvidia-cusparse-cu12==12.5.8.93\n[pip3] nvidia-cusparselt-cu12==0.7.1\n[pip3] nvidia-cusparselt-cu13==0.8.0\n[pip3] nvidia-nccl-cu12==2.27.3\n[pip3] nvidia-nccl-cu13==2.28.9\n[pip3] nvidia-nvjitlink==13.0.88\n[pip3] nvidia-nvjitlink-cu12==12.8.93\n[pip3] nvidia-nvtx==13.0.85\n[pip3] nvidia-nvtx-cu12==12.8.90\n[pip3] torch==2.11.0\n[pip3] torch_c_dlpack_ext==0.1.5\n[pip3] torchaudio==2.11.0\n[pip3] torchvision==0.26.0\n[pip3] triton==3.6.0\n[conda] Could not collect",
+ "transformers_version": "5.8.0",
+ "lm_eval_version": "0.4.11",
+ "upper_git_hash": null,
+ "tokenizer_pad_token": [
+ "<|endoftext|>",
+ "248044"
+ ],
+ "tokenizer_eos_token": [
+ "<|im_end|>",
+ "248046"
+ ],
+ "tokenizer_bos_token": [
+ null,
+ "None"
+ ],
+ "eot_token_id": 248046,
+ "max_length": 2047,
+ "task_hashes": {},
+ "model_source": "local-completions",
+ "model_name": "/workspace/REAM-192-bf16",
+ "model_name_sanitized": "__workspace__REAM-192-bf16",
+ "system_instruction": null,
+ "system_instruction_sha": null,
+ "fewshot_as_multiturn": null,
+ "chat_template": null,
+ "chat_template_sha": null,
+ "total_evaluation_time_seconds": "150.0233008810319"
+ }
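Because this is a single greedy sample per problem with `"aggregation": "mean"` over 164 problems, each pass@1 value corresponds to a whole number of solved problems. The sketch below converts the two scores seen so far back into counts; it assumes every problem contributes exactly 0 or 1 to the mean, and `solved_count` is a hypothetical helper.

```python
def solved_count(pass_at_1: float, n_problems: int = 164) -> int:
    """Number of problems solved, assuming each problem scores exactly 0 or 1."""
    return round(pass_at_1 * n_problems)


bf16 = solved_count(0.7134146341463414)       # from the bf16 results file
bartowski = solved_count(0.6463414634146342)  # from the Bartowski results file
print(bf16, bartowski, bf16 - bartowski)      # prints: 117 106 11
```

Read this way, the gap between the bf16 reference and the Bartowski quant is 11 problems out of 164, which helps put the reported standard errors in perspective.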
eval-results-2026-05-05/q3ks-eval.log ADDED
@@ -0,0 +1,26 @@
+ /usr/local/lib/python3.12/dist-packages/requests/__init__.py:113: RequestsDependencyWarning: urllib3 (2.5.0) or chardet (6.0.0.post1)/charset_normalizer (3.4.3) doesn't match a supported version!
+ warnings.warn(
+ 2026-05-05:21:42:20 INFO [_cli.run:376] Selected Tasks: ['humaneval']
+ 2026-05-05:21:42:21 INFO [evaluator:211] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
+ 2026-05-05:21:42:21 WARNING [evaluator:223] generation_kwargs: {'temperature': 0, 'max_gen_toks': 1024} specified through cli, these settings will update set parameters in yaml tasks. Ensure 'do_sample=True' for non-greedy decoding!
+ 2026-05-05:21:42:21 INFO [evaluator:236] Initializing local-completions model, with arguments: {'model': '/workspace/gguf/Qwen3.6-35B-A3B-REAM-192-Q3_K_S.gguf', 'tokenizer': '/workspace/REAM-192-bf16', 'base_url': 'http://127.0.0.1:8080/v1/completions', 'num_concurrent': 1, 'max_retries': 3, 'tokenized_requests': False}
+ 2026-05-05:21:42:21 INFO [models.openai_completions:42] Remote tokenizer not supported. Using huggingface tokenizer backend.
+ 2026-05-05:21:42:21 INFO [models.api_models:172] Using max length 2048 - 1
+ 2026-05-05:21:42:21 INFO [models.api_models:175] Concurrent requests are disabled. To enable concurrent requests, set `num_concurrent` > 1.
+ 2026-05-05:21:42:21 INFO [models.api_models:193] Using tokenizer huggingface
+ 2026-05-05:21:42:32 INFO [tasks:700] Selected tasks:
+ 2026-05-05:21:42:32 INFO [tasks:691] Task: humaneval (humaneval/humaneval.yaml)
+ 2026-05-05:21:42:32 INFO [evaluator:314] humaneval: Using gen_kwargs: {'until': ['\nclass', '\ndef', '\n#', '\nif', '\nprint'], 'max_gen_toks': 1024, 'do_sample': False, 'temperature': 0}
+ 2026-05-05:21:42:32 INFO [api.task:311] Building contexts for humaneval on rank 0...
+
  0%| | 0/164 [00:00<?, ?it/s]
  83%|████████▎ | 136/164 [00:00<00:00, 1351.25it/s]
+ 2026-05-05:21:42:32 INFO [evaluator:584] Running generate_until requests
+ 2026-05-05:21:42:32 INFO [models.api_models:733] Tokenized requests are disabled. Context + generation length is not checked.
+
+ fatal: not a git repository (or any parent up to mount point /)
+ Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
+ 2026-05-05:21:45:27 INFO [loggers.evaluation_tracker:247] Saving results aggregated
+ local-completions ({'model': '/workspace/gguf/Qwen3.6-35B-A3B-REAM-192-Q3_K_S.gguf', 'tokenizer': '/workspace/REAM-192-bf16', 'base_url': 'http://127.0.0.1:8080/v1/completions', 'num_concurrent': 1, 'max_retries': 3, 'tokenized_requests': False}), gen_kwargs: ({'temperature': 0, 'max_gen_toks': 1024}), limit: None, num_fewshot: None, batch_size: 1
+ | Tasks |Version| Filter |n-shot|Metric| |Value | |Stderr|
+ |---------|------:|-----------|-----:|------|---|-----:|---|-----:|
+ |humaneval| 1|create_test| 0|pass@1| |0.6768|± |0.0366|
+
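The logs record the model arguments and generation kwargs but not the command line itself. The sketch below rebuilds a plausible invocation as an argument list. It is an assumption, not the author's actual command: the flag names reflect my understanding of the lm-eval CLI and should be checked against `lm_eval --help` for the installed version, the output path is a placeholder, and `build_lm_eval_argv` is a hypothetical helper. HumanEval executes generated code, so the harness may also require explicit opt-in (for example the `HF_ALLOW_CODE_EVAL=1` environment variable used by the Hugging Face `code_eval` metric).

```python
def build_lm_eval_argv(model_path: str, tokenizer_dir: str, base_url: str, output_path: str) -> list[str]:
    """Assemble an lm_eval argument list mirroring the settings printed in the logs."""
    model_args = ",".join([
        f"model={model_path}",
        f"tokenizer={tokenizer_dir}",
        f"base_url={base_url}",
        "num_concurrent=1",
        "max_retries=3",
        "tokenized_requests=False",
    ])
    return [
        "lm_eval",
        "--model", "local-completions",
        "--model_args", model_args,
        "--tasks", "humaneval",
        "--gen_kwargs", "temperature=0,max_gen_toks=1024",
        "--batch_size", "1",
        "--output_path", output_path,   # placeholder; not taken from the logs
        "--confirm_run_unsafe_code",    # assumed flag for tasks that execute code
    ]


if __name__ == "__main__":
    argv = build_lm_eval_argv(
        model_path="/workspace/gguf/Qwen3.6-35B-A3B-REAM-192-Q3_K_S.gguf",
        tokenizer_dir="/workspace/REAM-192-bf16",
        base_url="http://127.0.0.1:8080/v1/completions",
        output_path="eval-results/q3ks",  # hypothetical directory
    )
    print(" ".join(argv))
```

Building the list in code rather than a shell string keeps the six runs consistent: only the model path, the server URL, and the output directory change between them.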
eval-results-2026-05-05/q3ks/__workspace__gguf__Qwen3.6-35B-A3B-REAM-192-Q3_K_S.gguf/results_2026-05-05T21-45-27.983146.json ADDED
@@ -0,0 +1,156 @@
+ {
+ "results": {
+ "humaneval": {
+ "alias": "humaneval",
+ "pass@1,create_test": 0.676829268292683,
+ "pass@1_stderr,create_test": 0.03663209644602847
+ }
+ },
+ "group_subtasks": {
+ "humaneval": []
+ },
+ "configs": {
+ "humaneval": {
+ "task": "humaneval",
+ "dataset_path": "openai/openai_humaneval",
+ "test_split": "test",
+ "doc_to_text": "{{prompt}}",
+ "doc_to_target": "{{test}}\ncheck({{entry_point}})",
+ "unsafe_code": true,
+ "description": "",
+ "target_delimiter": " ",
+ "fewshot_delimiter": "\n\n",
+ "fewshot_config": {
+ "sampler": "default",
+ "split": null,
+ "process_docs": null,
+ "fewshot_indices": null,
+ "samples": null,
+ "doc_to_text": "{{prompt}}",
+ "doc_to_choice": null,
+ "doc_to_target": "{{test}}\ncheck({{entry_point}})",
+ "gen_prefix": null,
+ "fewshot_delimiter": "\n\n",
+ "target_delimiter": " "
+ },
+ "num_fewshot": 0,
+ "metric_list": [
+ {
+ "metric": "def pass_at_k(references: list[str], predictions: list[list[str]], k: list[int] = None):\n global compute_\n assert k is not None\n if isinstance(k, int):\n k = [k]\n res = compute_.compute(\n references=references,\n predictions=predictions,\n k=k,\n )\n return res[0]\n",
+ "aggregation": "mean",
+ "higher_is_better": true,
+ "k": [
+ 1
+ ]
+ }
+ ],
+ "output_type": "generate_until",
+ "generation_kwargs": {
+ "until": [
+ "\nclass",
+ "\ndef",
+ "\n#",
+ "\nif",
+ "\nprint"
+ ],
+ "max_gen_toks": 1024,
+ "do_sample": false,
+ "temperature": 0
+ },
+ "repeats": 1,
+ "filter_list": [
+ {
+ "name": "create_test",
+ "filter": [
+ {
+ "function": "custom",
+ "filter_fn": "<function build_predictions at 0x7cf5e6c48220>"
+ }
+ ]
+ }
+ ],
+ "should_decontaminate": false,
+ "metadata": {
+ "version": 1.0,
+ "model": "/workspace/gguf/Qwen3.6-35B-A3B-REAM-192-Q3_K_S.gguf",
+ "tokenizer": "/workspace/REAM-192-bf16",
+ "base_url": "http://127.0.0.1:8080/v1/completions",
+ "num_concurrent": 1,
+ "max_retries": 3,
+ "tokenized_requests": false
+ }
+ }
+ },
+ "versions": {
+ "humaneval": 1.0
86
+ },
87
+ "n-shot": {
88
+ "humaneval": 0
89
+ },
90
+ "higher_is_better": {
91
+ "humaneval": {
92
+ "pass_at_k": true
93
+ }
94
+ },
95
+ "n-samples": {
96
+ "humaneval": {
97
+ "original": 164,
98
+ "effective": 164
99
+ }
100
+ },
101
+ "config": {
102
+ "model": "local-completions",
103
+ "model_args": {
104
+ "model": "/workspace/gguf/Qwen3.6-35B-A3B-REAM-192-Q3_K_S.gguf",
105
+ "tokenizer": "/workspace/REAM-192-bf16",
106
+ "base_url": "http://127.0.0.1:8080/v1/completions",
107
+ "num_concurrent": 1,
108
+ "max_retries": 3,
109
+ "tokenized_requests": false
110
+ },
111
+ "batch_size": 1,
112
+ "batch_sizes": [],
113
+ "device": "cuda:0",
114
+ "use_cache": null,
115
+ "limit": null,
116
+ "bootstrap_iters": 100000,
117
+ "gen_kwargs": {
118
+ "temperature": 0,
119
+ "max_gen_toks": 1024
120
+ },
121
+ "random_seed": 0,
122
+ "numpy_seed": 1234,
123
+ "torch_seed": 1234,
124
+ "fewshot_seed": 1234
125
+ },
126
+ "git_hash": null,
127
+ "date": 1778017340.007905,
128
+ "pretty_env_info": "PyTorch version: 2.11.0+cu130\nIs debug build: False\nCUDA used to build PyTorch: 13.0\nROCM used to build PyTorch: N/A\n\nOS: Ubuntu 24.04.3 LTS (x86_64)\nGCC version: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0\nClang version: Could not collect\nCMake version: version 3.28.3\nLibc version: glibc-2.39\n\nPython version: 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0] (64-bit runtime)\nPython platform: Linux-6.8.0-110-generic-x86_64-with-glibc2.39\nIs CUDA available: True\nCUDA runtime version: 12.8.93\nCUDA_MODULE_LOADING set to: \nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-80GB\nNvidia driver version: 580.126.20\ncuDNN version: Probably one of the following:\n/usr/lib/x86_64-linux-gnu/libcudnn.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.8.0\nIs XPU available: False\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\nCaching allocator config: N/A\n\nCPU:\nArchitecture: x86_64\nCPU op-mode(s): 32-bit, 64-bit\nAddress sizes: 48 bits physical, 48 bits virtual\nByte Order: Little Endian\nCPU(s): 256\nOn-line CPU(s) list: 0-254\nOff-line CPU(s) list: 255\nVendor ID: AuthenticAMD\nModel name: AMD EPYC 7763 64-Core Processor\nCPU family: 25\nModel: 1\nThread(s) per core: 2\nCore(s) per socket: 64\nSocket(s): 2\nStepping: 1\nFrequency boost: enabled\nCPU(s) scaling MHz: 76%\nCPU max MHz: 3530.4929\nCPU min MHz: 0.0000\nBogoMIPS: 4900.44\nFlags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni 
pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin brs arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap ibpb_exit_to_user\nVirtualization: AMD-V\nL1d cache: 4 MiB (128 instances)\nL1i cache: 4 MiB (128 instances)\nL2 cache: 64 MiB (128 instances)\nL3 cache: 512 MiB (16 instances)\nNUMA node(s): 2\nNUMA node0 CPU(s): 0-63,128-191\nNUMA node1 CPU(s): 64-127,192-254\nVulnerability Gather data sampling: Not affected\nVulnerability Indirect target selection: Not affected\nVulnerability Itlb multihit: Not affected\nVulnerability L1tf: Not affected\nVulnerability Mds: Not affected\nVulnerability Meltdown: Not affected\nVulnerability Mmio stale data: Not affected\nVulnerability Reg file data sampling: Not affected\nVulnerability Retbleed: Not affected\nVulnerability Spec rstack overflow: Mitigation; Safe RET\nVulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected\nVulnerability Srbds: Not affected\nVulnerability Tsa: Vulnerable: Clear CPU buffers attempted, no microcode\nVulnerability Tsx async abort: Not affected\nVulnerability Vmscape: 
Mitigation; IBPB before exit to userspace\n\nVersions of relevant libraries:\n[pip3] numpy==2.1.2\n[pip3] nvidia-cublas==13.1.0.3\n[pip3] nvidia-cublas-cu12==12.8.4.1\n[pip3] nvidia-cuda-cupti==13.0.85\n[pip3] nvidia-cuda-cupti-cu12==12.8.90\n[pip3] nvidia-cuda-nvrtc==13.0.88\n[pip3] nvidia-cuda-nvrtc-cu12==12.8.93\n[pip3] nvidia-cuda-runtime==13.0.96\n[pip3] nvidia-cuda-runtime-cu12==12.8.90\n[pip3] nvidia-cudnn-cu12==9.10.2.21\n[pip3] nvidia-cudnn-cu13==9.19.0.56\n[pip3] nvidia-cudnn-frontend==1.18.0\n[pip3] nvidia-cufft==12.0.0.61\n[pip3] nvidia-cufft-cu12==11.3.3.83\n[pip3] nvidia-curand==10.4.0.35\n[pip3] nvidia-curand-cu12==10.3.9.90\n[pip3] nvidia-cusolver==12.0.4.66\n[pip3] nvidia-cusolver-cu12==11.7.3.90\n[pip3] nvidia-cusparse==12.6.3.3\n[pip3] nvidia-cusparse-cu12==12.5.8.93\n[pip3] nvidia-cusparselt-cu12==0.7.1\n[pip3] nvidia-cusparselt-cu13==0.8.0\n[pip3] nvidia-nccl-cu12==2.27.3\n[pip3] nvidia-nccl-cu13==2.28.9\n[pip3] nvidia-nvjitlink==13.0.88\n[pip3] nvidia-nvjitlink-cu12==12.8.93\n[pip3] nvidia-nvtx==13.0.85\n[pip3] nvidia-nvtx-cu12==12.8.90\n[pip3] torch==2.11.0\n[pip3] torch_c_dlpack_ext==0.1.5\n[pip3] torchaudio==2.11.0\n[pip3] torchvision==0.26.0\n[pip3] triton==3.6.0\n[conda] Could not collect",
129
+ "transformers_version": "5.8.0",
130
+ "lm_eval_version": "0.4.11",
131
+ "upper_git_hash": null,
132
+ "tokenizer_pad_token": [
133
+ "<|endoftext|>",
134
+ "248044"
135
+ ],
136
+ "tokenizer_eos_token": [
137
+ "<|im_end|>",
138
+ "248046"
139
+ ],
140
+ "tokenizer_bos_token": [
141
+ null,
142
+ "None"
143
+ ],
144
+ "eot_token_id": 248046,
145
+ "max_length": 2047,
146
+ "task_hashes": {},
147
+ "model_source": "local-completions",
148
+ "model_name": "/workspace/gguf/Qwen3.6-35B-A3B-REAM-192-Q3_K_S.gguf",
149
+ "model_name_sanitized": "__workspace__gguf__Qwen3.6-35B-A3B-REAM-192-Q3_K_S.gguf",
150
+ "system_instruction": null,
151
+ "system_instruction_sha": null,
152
+ "fewshot_as_multiturn": null,
153
+ "chat_template": null,
154
+ "chat_template_sha": null,
155
+ "total_evaluation_time_seconds": "219.2053124109516"
156
+ }
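To pull the headline numbers out of a results file like the one above, a short script is enough. This is a minimal sketch: `SAMPLE` is a trimmed inline copy of the `results` object shown above (so the block runs without the file on disk), and `extract_pass_at_1` is a hypothetical helper, not part of lm-eval. The metric keys combine the metric name and the filter name, as in `pass@1,create_test`.

```python
import json

# Trimmed copy of the "results" section from the JSON above.
SAMPLE = """
{
  "results": {
    "humaneval": {
      "alias": "humaneval",
      "pass@1,create_test": 0.676829268292683,
      "pass@1_stderr,create_test": 0.03663209644602847
    }
  }
}
"""


def extract_pass_at_1(doc: dict, task: str = "humaneval",
                      filter_name: str = "create_test") -> tuple[float, float]:
    """Return (pass@1, stderr) for one task/filter pair from a results document."""
    metrics = doc["results"][task]
    return (metrics[f"pass@1,{filter_name}"],
            metrics[f"pass@1_stderr,{filter_name}"])


score, stderr = extract_pass_at_1(json.loads(SAMPLE))
print(f"{score:.4f} +/- {stderr:.4f}")
```

For a real file, replace `json.loads(SAMPLE)` with `json.load(open(path))`, where `path` points at one of the `results_<timestamp>.json` files in the bundle.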
eval-results-2026-05-05/q4km-eval.log ADDED
@@ -0,0 +1,26 @@
1
+ /usr/local/lib/python3.12/dist-packages/requests/__init__.py:113: RequestsDependencyWarning: urllib3 (2.5.0) or chardet (6.0.0.post1)/charset_normalizer (3.4.3) doesn't match a supported version!
2
+ warnings.warn(
3
+ 2026-05-05:21:38:34 INFO [_cli.run:376] Selected Tasks: ['humaneval']
4
+ 2026-05-05:21:38:35 INFO [evaluator:211] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
5
+ 2026-05-05:21:38:35 WARNING [evaluator:223] generation_kwargs: {'temperature': 0, 'max_gen_toks': 1024} specified through cli, these settings will update set parameters in yaml tasks. Ensure 'do_sample=True' for non-greedy decoding!
6
+ 2026-05-05:21:38:35 INFO [evaluator:236] Initializing local-completions model, with arguments: {'model': '/workspace/gguf/Qwen3.6-35B-A3B-REAM-192-Q4_K_M.gguf', 'tokenizer': '/workspace/REAM-192-bf16', 'base_url': 'http://127.0.0.1:8080/v1/completions', 'num_concurrent': 1, 'max_retries': 3, 'tokenized_requests': False}
7
+ 2026-05-05:21:38:35 INFO [models.openai_completions:42] Remote tokenizer not supported. Using huggingface tokenizer backend.
8
+ 2026-05-05:21:38:35 INFO [models.api_models:172] Using max length 2048 - 1
9
+ 2026-05-05:21:38:35 INFO [models.api_models:175] Concurrent requests are disabled. To enable concurrent requests, set `num_concurrent` > 1.
10
+ 2026-05-05:21:38:35 INFO [models.api_models:193] Using tokenizer huggingface
11
+ 2026-05-05:21:38:46 INFO [tasks:700] Selected tasks:
12
+ 2026-05-05:21:38:46 INFO [tasks:691] Task: humaneval (humaneval/humaneval.yaml)
13
+ 2026-05-05:21:38:46 INFO [evaluator:314] humaneval: Using gen_kwargs: {'until': ['\nclass', '\ndef', '\n#', '\nif', '\nprint'], 'max_gen_toks': 1024, 'do_sample': False, 'temperature': 0}
14
+ 2026-05-05:21:38:46 INFO [api.task:311] Building contexts for humaneval on rank 0...
15
+
16
  0%| | 0/164 [00:00<?, ?it/s]
17
  80%|███████▉ | 131/164 [00:00<00:00, 1303.95it/s]
18
+ 2026-05-05:21:38:46 INFO [evaluator:584] Running generate_until requests
19
+ 2026-05-05:21:38:46 INFO [models.api_models:733] Tokenized requests are disabled. Context + generation length is not checked.
20
+
21
+ fatal: not a git repository (or any parent up to mount point /)
22
+ Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
23
+ 2026-05-05:21:41:15 INFO [loggers.evaluation_tracker:247] Saving results aggregated
24
+ local-completions ({'model': '/workspace/gguf/Qwen3.6-35B-A3B-REAM-192-Q4_K_M.gguf', 'tokenizer': '/workspace/REAM-192-bf16', 'base_url': 'http://127.0.0.1:8080/v1/completions', 'num_concurrent': 1, 'max_retries': 3, 'tokenized_requests': False}), gen_kwargs: ({'temperature': 0, 'max_gen_toks': 1024}), limit: None, num_fewshot: None, batch_size: 1
25
+ | Tasks |Version| Filter |n-shot|Metric| |Value | |Stderr|
26
+ |---------|------:|-----------|-----:|------|---|-----:|---|-----:|
27
+ |humaneval| 1|create_test| 0|pass@1| |0.6768|± |0.0366|
28
+
eval-results-2026-05-05/q4km/__workspace__gguf__Qwen3.6-35B-A3B-REAM-192-Q4_K_M.gguf/results_2026-05-05T21-41-15.435960.json ADDED
@@ -0,0 +1,156 @@
1
+ {
2
+ "results": {
3
+ "humaneval": {
4
+ "alias": "humaneval",
5
+ "pass@1,create_test": 0.676829268292683,
6
+ "pass@1_stderr,create_test": 0.036632096446028474
7
+ }
8
+ },
9
+ "group_subtasks": {
10
+ "humaneval": []
11
+ },
12
+ "configs": {
13
+ "humaneval": {
14
+ "task": "humaneval",
15
+ "dataset_path": "openai/openai_humaneval",
16
+ "test_split": "test",
17
+ "doc_to_text": "{{prompt}}",
18
+ "doc_to_target": "{{test}}\ncheck({{entry_point}})",
19
+ "unsafe_code": true,
20
+ "description": "",
21
+ "target_delimiter": " ",
22
+ "fewshot_delimiter": "\n\n",
23
+ "fewshot_config": {
24
+ "sampler": "default",
25
+ "split": null,
26
+ "process_docs": null,
27
+ "fewshot_indices": null,
28
+ "samples": null,
29
+ "doc_to_text": "{{prompt}}",
30
+ "doc_to_choice": null,
31
+ "doc_to_target": "{{test}}\ncheck({{entry_point}})",
32
+ "gen_prefix": null,
33
+ "fewshot_delimiter": "\n\n",
34
+ "target_delimiter": " "
35
+ },
36
+ "num_fewshot": 0,
37
+ "metric_list": [
38
+ {
39
+ "metric": "def pass_at_k(references: list[str], predictions: list[list[str]], k: list[int] = None):\n global compute_\n assert k is not None\n if isinstance(k, int):\n k = [k]\n res = compute_.compute(\n references=references,\n predictions=predictions,\n k=k,\n )\n return res[0]\n",
40
+ "aggregation": "mean",
41
+ "higher_is_better": true,
42
+ "k": [
43
+ 1
44
+ ]
45
+ }
46
+ ],
47
+ "output_type": "generate_until",
48
+ "generation_kwargs": {
49
+ "until": [
50
+ "\nclass",
51
+ "\ndef",
52
+ "\n#",
53
+ "\nif",
54
+ "\nprint"
55
+ ],
56
+ "max_gen_toks": 1024,
57
+ "do_sample": false,
58
+ "temperature": 0
59
+ },
60
+ "repeats": 1,
61
+ "filter_list": [
62
+ {
63
+ "name": "create_test",
64
+ "filter": [
65
+ {
66
+ "function": "custom",
67
+ "filter_fn": "<function build_predictions at 0x79bc0bf44180>"
68
+ }
69
+ ]
70
+ }
71
+ ],
72
+ "should_decontaminate": false,
73
+ "metadata": {
74
+ "version": 1.0,
75
+ "model": "/workspace/gguf/Qwen3.6-35B-A3B-REAM-192-Q4_K_M.gguf",
76
+ "tokenizer": "/workspace/REAM-192-bf16",
77
+ "base_url": "http://127.0.0.1:8080/v1/completions",
78
+ "num_concurrent": 1,
79
+ "max_retries": 3,
80
+ "tokenized_requests": false
81
+ }
82
+ }
83
+ },
84
+ "versions": {
85
+ "humaneval": 1.0
86
+ },
87
+ "n-shot": {
88
+ "humaneval": 0
89
+ },
90
+ "higher_is_better": {
91
+ "humaneval": {
92
+ "pass_at_k": true
93
+ }
94
+ },
95
+ "n-samples": {
96
+ "humaneval": {
97
+ "original": 164,
98
+ "effective": 164
99
+ }
100
+ },
101
+ "config": {
102
+ "model": "local-completions",
103
+ "model_args": {
104
+ "model": "/workspace/gguf/Qwen3.6-35B-A3B-REAM-192-Q4_K_M.gguf",
105
+ "tokenizer": "/workspace/REAM-192-bf16",
106
+ "base_url": "http://127.0.0.1:8080/v1/completions",
107
+ "num_concurrent": 1,
108
+ "max_retries": 3,
109
+ "tokenized_requests": false
110
+ },
111
+ "batch_size": 1,
112
+ "batch_sizes": [],
113
+ "device": "cuda:0",
114
+ "use_cache": null,
115
+ "limit": null,
116
+ "bootstrap_iters": 100000,
117
+ "gen_kwargs": {
118
+ "temperature": 0,
119
+ "max_gen_toks": 1024
120
+ },
121
+ "random_seed": 0,
122
+ "numpy_seed": 1234,
123
+ "torch_seed": 1234,
124
+ "fewshot_seed": 1234
125
+ },
126
+ "git_hash": null,
127
+ "date": 1778017114.2524564,
128
+ "pretty_env_info": "PyTorch version: 2.11.0+cu130\nIs debug build: False\nCUDA used to build PyTorch: 13.0\nROCM used to build PyTorch: N/A\n\nOS: Ubuntu 24.04.3 LTS (x86_64)\nGCC version: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0\nClang version: Could not collect\nCMake version: version 3.28.3\nLibc version: glibc-2.39\n\nPython version: 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0] (64-bit runtime)\nPython platform: Linux-6.8.0-110-generic-x86_64-with-glibc2.39\nIs CUDA available: True\nCUDA runtime version: 12.8.93\nCUDA_MODULE_LOADING set to: \nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-80GB\nNvidia driver version: 580.126.20\ncuDNN version: Probably one of the following:\n/usr/lib/x86_64-linux-gnu/libcudnn.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.8.0\nIs XPU available: False\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\nCaching allocator config: N/A\n\nCPU:\nArchitecture: x86_64\nCPU op-mode(s): 32-bit, 64-bit\nAddress sizes: 48 bits physical, 48 bits virtual\nByte Order: Little Endian\nCPU(s): 256\nOn-line CPU(s) list: 0-254\nOff-line CPU(s) list: 255\nVendor ID: AuthenticAMD\nModel name: AMD EPYC 7763 64-Core Processor\nCPU family: 25\nModel: 1\nThread(s) per core: 2\nCore(s) per socket: 64\nSocket(s): 2\nStepping: 1\nFrequency boost: enabled\nCPU(s) scaling MHz: 73%\nCPU max MHz: 3530.4929\nCPU min MHz: 0.0000\nBogoMIPS: 4900.44\nFlags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni 
pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin brs arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap ibpb_exit_to_user\nVirtualization: AMD-V\nL1d cache: 4 MiB (128 instances)\nL1i cache: 4 MiB (128 instances)\nL2 cache: 64 MiB (128 instances)\nL3 cache: 512 MiB (16 instances)\nNUMA node(s): 2\nNUMA node0 CPU(s): 0-63,128-191\nNUMA node1 CPU(s): 64-127,192-254\nVulnerability Gather data sampling: Not affected\nVulnerability Indirect target selection: Not affected\nVulnerability Itlb multihit: Not affected\nVulnerability L1tf: Not affected\nVulnerability Mds: Not affected\nVulnerability Meltdown: Not affected\nVulnerability Mmio stale data: Not affected\nVulnerability Reg file data sampling: Not affected\nVulnerability Retbleed: Not affected\nVulnerability Spec rstack overflow: Mitigation; Safe RET\nVulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected\nVulnerability Srbds: Not affected\nVulnerability Tsa: Vulnerable: Clear CPU buffers attempted, no microcode\nVulnerability Tsx async abort: Not affected\nVulnerability Vmscape: 
Mitigation; IBPB before exit to userspace\n\nVersions of relevant libraries:\n[pip3] numpy==2.1.2\n[pip3] nvidia-cublas==13.1.0.3\n[pip3] nvidia-cublas-cu12==12.8.4.1\n[pip3] nvidia-cuda-cupti==13.0.85\n[pip3] nvidia-cuda-cupti-cu12==12.8.90\n[pip3] nvidia-cuda-nvrtc==13.0.88\n[pip3] nvidia-cuda-nvrtc-cu12==12.8.93\n[pip3] nvidia-cuda-runtime==13.0.96\n[pip3] nvidia-cuda-runtime-cu12==12.8.90\n[pip3] nvidia-cudnn-cu12==9.10.2.21\n[pip3] nvidia-cudnn-cu13==9.19.0.56\n[pip3] nvidia-cudnn-frontend==1.18.0\n[pip3] nvidia-cufft==12.0.0.61\n[pip3] nvidia-cufft-cu12==11.3.3.83\n[pip3] nvidia-curand==10.4.0.35\n[pip3] nvidia-curand-cu12==10.3.9.90\n[pip3] nvidia-cusolver==12.0.4.66\n[pip3] nvidia-cusolver-cu12==11.7.3.90\n[pip3] nvidia-cusparse==12.6.3.3\n[pip3] nvidia-cusparse-cu12==12.5.8.93\n[pip3] nvidia-cusparselt-cu12==0.7.1\n[pip3] nvidia-cusparselt-cu13==0.8.0\n[pip3] nvidia-nccl-cu12==2.27.3\n[pip3] nvidia-nccl-cu13==2.28.9\n[pip3] nvidia-nvjitlink==13.0.88\n[pip3] nvidia-nvjitlink-cu12==12.8.93\n[pip3] nvidia-nvtx==13.0.85\n[pip3] nvidia-nvtx-cu12==12.8.90\n[pip3] torch==2.11.0\n[pip3] torch_c_dlpack_ext==0.1.5\n[pip3] torchaudio==2.11.0\n[pip3] torchvision==0.26.0\n[pip3] triton==3.6.0\n[conda] Could not collect",
129
+ "transformers_version": "5.8.0",
130
+ "lm_eval_version": "0.4.11",
131
+ "upper_git_hash": null,
132
+ "tokenizer_pad_token": [
133
+ "<|endoftext|>",
134
+ "248044"
135
+ ],
136
+ "tokenizer_eos_token": [
137
+ "<|im_end|>",
138
+ "248046"
139
+ ],
140
+ "tokenizer_bos_token": [
141
+ null,
142
+ "None"
143
+ ],
144
+ "eot_token_id": 248046,
145
+ "max_length": 2047,
146
+ "task_hashes": {},
147
+ "model_source": "local-completions",
148
+ "model_name": "/workspace/gguf/Qwen3.6-35B-A3B-REAM-192-Q4_K_M.gguf",
149
+ "model_name_sanitized": "__workspace__gguf__Qwen3.6-35B-A3B-REAM-192-Q4_K_M.gguf",
150
+ "system_instruction": null,
151
+ "system_instruction_sha": null,
152
+ "fewshot_as_multiturn": null,
153
+ "chat_template": null,
154
+ "chat_template_sha": null,
155
+ "total_evaluation_time_seconds": "194.68436025094707"
156
+ }
eval-results-2026-05-05/reap-q4km-eval.log ADDED
@@ -0,0 +1,26 @@
1
+ /usr/local/lib/python3.12/dist-packages/requests/__init__.py:113: RequestsDependencyWarning: urllib3 (2.5.0) or chardet (6.0.0.post1)/charset_normalizer (3.4.3) doesn't match a supported version!
2
+ warnings.warn(
3
+ 2026-05-05:22:51:14 INFO [_cli.run:376] Selected Tasks: ['humaneval']
4
+ 2026-05-05:22:51:16 INFO [evaluator:211] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
5
+ 2026-05-05:22:51:16 WARNING [evaluator:223] generation_kwargs: {'temperature': 0, 'max_gen_toks': 1024} specified through cli, these settings will update set parameters in yaml tasks. Ensure 'do_sample=True' for non-greedy decoding!
6
+ 2026-05-05:22:51:16 INFO [evaluator:236] Initializing local-completions model, with arguments: {'model': '/workspace/reap-gguf/Qwen3.6-VL-REAP-26B-A3B-text-Q4_K_M.gguf', 'tokenizer': '/workspace/REAM-192-bf16', 'base_url': 'http://127.0.0.1:8080/v1/completions', 'num_concurrent': 1, 'max_retries': 3, 'tokenized_requests': False}
7
+ 2026-05-05:22:51:16 INFO [models.openai_completions:42] Remote tokenizer not supported. Using huggingface tokenizer backend.
8
+ 2026-05-05:22:51:16 INFO [models.api_models:172] Using max length 2048 - 1
9
+ 2026-05-05:22:51:16 INFO [models.api_models:175] Concurrent requests are disabled. To enable concurrent requests, set `num_concurrent` > 1.
10
+ 2026-05-05:22:51:16 INFO [models.api_models:193] Using tokenizer huggingface
11
+ 2026-05-05:22:51:27 INFO [tasks:700] Selected tasks:
12
+ 2026-05-05:22:51:27 INFO [tasks:691] Task: humaneval (humaneval/humaneval.yaml)
13
+ 2026-05-05:22:51:27 INFO [evaluator:314] humaneval: Using gen_kwargs: {'until': ['\nclass', '\ndef', '\n#', '\nif', '\nprint'], 'max_gen_toks': 1024, 'do_sample': False, 'temperature': 0}
14
+ 2026-05-05:22:51:27 INFO [api.task:311] Building contexts for humaneval on rank 0...
15
+
16
  0%| | 0/164 [00:00<?, ?it/s]
17
  81%|████████ | 133/164 [00:00<00:00, 1325.08it/s]
18
+ 2026-05-05:22:51:27 INFO [evaluator:584] Running generate_until requests
19
+ 2026-05-05:22:51:27 INFO [models.api_models:733] Tokenized requests are disabled. Context + generation length is not checked.
20
+
21
+ fatal: not a git repository (or any parent up to mount point /)
22
+ Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
23
+ 2026-05-05:22:54:14 INFO [loggers.evaluation_tracker:247] Saving results aggregated
24
+ local-completions ({'model': '/workspace/reap-gguf/Qwen3.6-VL-REAP-26B-A3B-text-Q4_K_M.gguf', 'tokenizer': '/workspace/REAM-192-bf16', 'base_url': 'http://127.0.0.1:8080/v1/completions', 'num_concurrent': 1, 'max_retries': 3, 'tokenized_requests': False}), gen_kwargs: ({'temperature': 0, 'max_gen_toks': 1024}), limit: None, num_fewshot: None, batch_size: 1
25
+ | Tasks |Version| Filter |n-shot|Metric| |Value | |Stderr|
26
+ |---------|------:|-----------|-----:|------|---|-----:|---|-----:|
27
+ |humaneval| 1|create_test| 0|pass@1| |0.6646|± | 0.037|
28
+
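As a rough check on whether the gap between the REAM-192 Q4_K_M run (0.6768) and this REAP run (0.6646) is distinguishable from noise, the two standard errors can be combined into a z-score. The sketch below treats the runs as independent samples, which is a simplifying assumption: both runs use the same 164 problems, so a paired test on the per-problem outcomes would be the more appropriate analysis. `difference_z_score` is a hypothetical helper name.

```python
import math


def difference_z_score(score_a: float, stderr_a: float,
                       score_b: float, stderr_b: float) -> float:
    """z-score of (a - b), assuming the two estimates are independent."""
    combined_stderr = math.sqrt(stderr_a ** 2 + stderr_b ** 2)
    return (score_a - score_b) / combined_stderr


# Values copied from the Q4_K_M and REAP results files in this bundle.
z = difference_z_score(0.676829268292683, 0.036632096446028474,
                       0.6646341463414634, 0.03697915163403716)
print(f"z = {z:.2f}")
```

A z-score well below 1 indicates that, under this approximation, a 1.2-point gap on 164 problems is not enough to rank the two models.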
eval-results-2026-05-05/reap-q4km/__workspace__reap-gguf__Qwen3.6-VL-REAP-26B-A3B-text-Q4_K_M.gguf/results_2026-05-05T22-54-14.733493.json ADDED
@@ -0,0 +1,156 @@
1
+ {
2
+ "results": {
3
+ "humaneval": {
4
+ "alias": "humaneval",
5
+ "pass@1,create_test": 0.6646341463414634,
6
+ "pass@1_stderr,create_test": 0.03697915163403716
7
+ }
8
+ },
9
+ "group_subtasks": {
10
+ "humaneval": []
11
+ },
12
+ "configs": {
13
+ "humaneval": {
14
+ "task": "humaneval",
15
+ "dataset_path": "openai/openai_humaneval",
16
+ "test_split": "test",
17
+ "doc_to_text": "{{prompt}}",
18
+ "doc_to_target": "{{test}}\ncheck({{entry_point}})",
19
+ "unsafe_code": true,
20
+ "description": "",
21
+ "target_delimiter": " ",
22
+ "fewshot_delimiter": "\n\n",
23
+ "fewshot_config": {
24
+ "sampler": "default",
25
+ "split": null,
26
+ "process_docs": null,
27
+ "fewshot_indices": null,
28
+ "samples": null,
29
+ "doc_to_text": "{{prompt}}",
30
+ "doc_to_choice": null,
31
+ "doc_to_target": "{{test}}\ncheck({{entry_point}})",
32
+ "gen_prefix": null,
33
+ "fewshot_delimiter": "\n\n",
34
+ "target_delimiter": " "
35
+ },
36
+ "num_fewshot": 0,
37
+ "metric_list": [
38
+ {
39
+ "metric": "def pass_at_k(references: list[str], predictions: list[list[str]], k: list[int] = None):\n global compute_\n assert k is not None\n if isinstance(k, int):\n k = [k]\n res = compute_.compute(\n references=references,\n predictions=predictions,\n k=k,\n )\n return res[0]\n",
40
+ "aggregation": "mean",
41
+ "higher_is_better": true,
42
+ "k": [
43
+ 1
44
+ ]
45
+ }
46
+ ],
47
+ "output_type": "generate_until",
48
+ "generation_kwargs": {
49
+ "until": [
50
+ "\nclass",
51
+ "\ndef",
52
+ "\n#",
53
+ "\nif",
54
+ "\nprint"
55
+ ],
56
+ "max_gen_toks": 1024,
57
+ "do_sample": false,
58
+ "temperature": 0
59
+ },
60
+ "repeats": 1,
61
+ "filter_list": [
62
+ {
63
+ "name": "create_test",
64
+ "filter": [
65
+ {
66
+ "function": "custom",
67
+ "filter_fn": "<function build_predictions at 0x728612718180>"
68
+ }
69
+ ]
70
+ }
71
+ ],
72
+ "should_decontaminate": false,
73
+ "metadata": {
74
+ "version": 1.0,
75
+ "model": "/workspace/reap-gguf/Qwen3.6-VL-REAP-26B-A3B-text-Q4_K_M.gguf",
76
+ "tokenizer": "/workspace/REAM-192-bf16",
77
+ "base_url": "http://127.0.0.1:8080/v1/completions",
78
+ "num_concurrent": 1,
79
+ "max_retries": 3,
80
+ "tokenized_requests": false
81
+ }
82
+ }
83
+ },
84
+ "versions": {
85
+ "humaneval": 1.0
86
+ },
87
+ "n-shot": {
88
+ "humaneval": 0
89
+ },
90
+ "higher_is_better": {
91
+ "humaneval": {
92
+ "pass_at_k": true
93
+ }
94
+ },
95
+ "n-samples": {
96
+ "humaneval": {
97
+ "original": 164,
98
+ "effective": 164
99
+ }
100
+ },
101
+ "config": {
102
+ "model": "local-completions",
103
+ "model_args": {
104
+ "model": "/workspace/reap-gguf/Qwen3.6-VL-REAP-26B-A3B-text-Q4_K_M.gguf",
105
+ "tokenizer": "/workspace/REAM-192-bf16",
106
+ "base_url": "http://127.0.0.1:8080/v1/completions",
107
+ "num_concurrent": 1,
108
+ "max_retries": 3,
109
+ "tokenized_requests": false
110
+ },
111
+ "batch_size": 1,
112
+ "batch_sizes": [],
113
+ "device": "cuda:0",
114
+ "use_cache": null,
115
+ "limit": null,
116
+ "bootstrap_iters": 100000,
117
+ "gen_kwargs": {
118
+ "temperature": 0,
119
+ "max_gen_toks": 1024
120
+ },
121
+ "random_seed": 0,
122
+ "numpy_seed": 1234,
123
+ "torch_seed": 1234,
124
+ "fewshot_seed": 1234
125
+ },
126
+ "git_hash": null,
127
+ "date": 1778021474.3773422,
128
+ "pretty_env_info": "PyTorch version: 2.11.0+cu130\nIs debug build: False\nCUDA used to build PyTorch: 13.0\nROCM used to build PyTorch: N/A\n\nOS: Ubuntu 24.04.3 LTS (x86_64)\nGCC version: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0\nClang version: Could not collect\nCMake version: version 3.28.3\nLibc version: glibc-2.39\n\nPython version: 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0] (64-bit runtime)\nPython platform: Linux-6.8.0-110-generic-x86_64-with-glibc2.39\nIs CUDA available: True\nCUDA runtime version: 12.8.93\nCUDA_MODULE_LOADING set to: \nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-80GB\nNvidia driver version: 580.126.20\ncuDNN version: Probably one of the following:\n/usr/lib/x86_64-linux-gnu/libcudnn.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.8.0\nIs XPU available: False\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\nCaching allocator config: N/A\n\nCPU:\nArchitecture: x86_64\nCPU op-mode(s): 32-bit, 64-bit\nAddress sizes: 48 bits physical, 48 bits virtual\nByte Order: Little Endian\nCPU(s): 256\nOn-line CPU(s) list: 0-254\nOff-line CPU(s) list: 255\nVendor ID: AuthenticAMD\nModel name: AMD EPYC 7763 64-Core Processor\nCPU family: 25\nModel: 1\nThread(s) per core: 2\nCore(s) per socket: 64\nSocket(s): 2\nStepping: 1\nFrequency boost: enabled\nCPU(s) scaling MHz: 75%\nCPU max MHz: 3530.4929\nCPU min MHz: 0.0000\nBogoMIPS: 4900.44\nFlags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni 
pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin brs arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap ibpb_exit_to_user\nVirtualization: AMD-V\nL1d cache: 4 MiB (128 instances)\nL1i cache: 4 MiB (128 instances)\nL2 cache: 64 MiB (128 instances)\nL3 cache: 512 MiB (16 instances)\nNUMA node(s): 2\nNUMA node0 CPU(s): 0-63,128-191\nNUMA node1 CPU(s): 64-127,192-254\nVulnerability Gather data sampling: Not affected\nVulnerability Indirect target selection: Not affected\nVulnerability Itlb multihit: Not affected\nVulnerability L1tf: Not affected\nVulnerability Mds: Not affected\nVulnerability Meltdown: Not affected\nVulnerability Mmio stale data: Not affected\nVulnerability Reg file data sampling: Not affected\nVulnerability Retbleed: Not affected\nVulnerability Spec rstack overflow: Mitigation; Safe RET\nVulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected\nVulnerability Srbds: Not affected\nVulnerability Tsa: Vulnerable: Clear CPU buffers attempted, no microcode\nVulnerability Tsx async abort: Not affected\nVulnerability Vmscape: 
Mitigation; IBPB before exit to userspace\n\nVersions of relevant libraries:\n[pip3] numpy==2.1.2\n[pip3] nvidia-cublas==13.1.0.3\n[pip3] nvidia-cublas-cu12==12.8.4.1\n[pip3] nvidia-cuda-cupti==13.0.85\n[pip3] nvidia-cuda-cupti-cu12==12.8.90\n[pip3] nvidia-cuda-nvrtc==13.0.88\n[pip3] nvidia-cuda-nvrtc-cu12==12.8.93\n[pip3] nvidia-cuda-runtime==13.0.96\n[pip3] nvidia-cuda-runtime-cu12==12.8.90\n[pip3] nvidia-cudnn-cu12==9.10.2.21\n[pip3] nvidia-cudnn-cu13==9.19.0.56\n[pip3] nvidia-cudnn-frontend==1.18.0\n[pip3] nvidia-cufft==12.0.0.61\n[pip3] nvidia-cufft-cu12==11.3.3.83\n[pip3] nvidia-curand==10.4.0.35\n[pip3] nvidia-curand-cu12==10.3.9.90\n[pip3] nvidia-cusolver==12.0.4.66\n[pip3] nvidia-cusolver-cu12==11.7.3.90\n[pip3] nvidia-cusparse==12.6.3.3\n[pip3] nvidia-cusparse-cu12==12.5.8.93\n[pip3] nvidia-cusparselt-cu12==0.7.1\n[pip3] nvidia-cusparselt-cu13==0.8.0\n[pip3] nvidia-nccl-cu12==2.27.3\n[pip3] nvidia-nccl-cu13==2.28.9\n[pip3] nvidia-nvjitlink==13.0.88\n[pip3] nvidia-nvjitlink-cu12==12.8.93\n[pip3] nvidia-nvtx==13.0.85\n[pip3] nvidia-nvtx-cu12==12.8.90\n[pip3] torch==2.11.0\n[pip3] torch_c_dlpack_ext==0.1.5\n[pip3] torchaudio==2.11.0\n[pip3] torchvision==0.26.0\n[pip3] triton==3.6.0\n[conda] Could not collect",
+ "transformers_version": "5.8.0",
+ "lm_eval_version": "0.4.11",
+ "upper_git_hash": null,
+ "tokenizer_pad_token": [
+ "<|endoftext|>",
+ "248044"
+ ],
+ "tokenizer_eos_token": [
+ "<|im_end|>",
+ "248046"
+ ],
+ "tokenizer_bos_token": [
+ null,
+ "None"
+ ],
+ "eot_token_id": 248046,
+ "max_length": 2047,
+ "task_hashes": {},
+ "model_source": "local-completions",
+ "model_name": "/workspace/reap-gguf/Qwen3.6-VL-REAP-26B-A3B-text-Q4_K_M.gguf",
+ "model_name_sanitized": "__workspace__reap-gguf__Qwen3.6-VL-REAP-26B-A3B-text-Q4_K_M.gguf",
+ "system_instruction": null,
+ "system_instruction_sha": null,
+ "fewshot_as_multiturn": null,
+ "chat_template": null,
+ "chat_template_sha": null,
+ "total_evaluation_time_seconds": "207.06668718799483"
+ }
eval-results-2026-05-05/unsloth-ud-q4km-eval.log ADDED
@@ -0,0 +1,26 @@
+ /usr/local/lib/python3.12/dist-packages/requests/__init__.py:113: RequestsDependencyWarning: urllib3 (2.5.0) or chardet (6.0.0.post1)/charset_normalizer (3.4.3) doesn't match a supported version!
+ warnings.warn(
+ 2026-05-05:22:33:20 INFO [_cli.run:376] Selected Tasks: ['humaneval']
+ 2026-05-05:22:33:22 INFO [evaluator:211] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
+ 2026-05-05:22:33:22 WARNING [evaluator:223] generation_kwargs: {'temperature': 0, 'max_gen_toks': 1024} specified through cli, these settings will update set parameters in yaml tasks. Ensure 'do_sample=True' for non-greedy decoding!
+ 2026-05-05:22:33:22 INFO [evaluator:236] Initializing local-completions model, with arguments: {'model': '/workspace/unsloth-gguf/Qwen3.6-35B-A3B-UD-Q4_K_M.gguf', 'tokenizer': '/workspace/REAM-192-bf16', 'base_url': 'http://127.0.0.1:8080/v1/completions', 'num_concurrent': 1, 'max_retries': 3, 'tokenized_requests': False}
+ 2026-05-05:22:33:22 INFO [models.openai_completions:42] Remote tokenizer not supported. Using huggingface tokenizer backend.
+ 2026-05-05:22:33:22 INFO [models.api_models:172] Using max length 2048 - 1
+ 2026-05-05:22:33:22 INFO [models.api_models:175] Concurrent requests are disabled. To enable concurrent requests, set `num_concurrent` > 1.
+ 2026-05-05:22:33:22 INFO [models.api_models:193] Using tokenizer huggingface
+ 2026-05-05:22:33:33 INFO [tasks:700] Selected tasks:
+ 2026-05-05:22:33:33 INFO [tasks:691] Task: humaneval (humaneval/humaneval.yaml)
+ 2026-05-05:22:33:33 INFO [evaluator:314] humaneval: Using gen_kwargs: {'until': ['\nclass', '\ndef', '\n#', '\nif', '\nprint'], 'max_gen_toks': 1024, 'do_sample': False, 'temperature': 0}
+ 2026-05-05:22:33:33 INFO [api.task:311] Building contexts for humaneval on rank 0...
+
  0%| | 0/164 [00:00<?, ?it/s]
  77%|███████▋ | 127/164 [00:00<00:00, 1262.46it/s]
+ 2026-05-05:22:33:33 INFO [evaluator:584] Running generate_until requests
+ 2026-05-05:22:33:33 INFO [models.api_models:733] Tokenized requests are disabled. Context + generation length is not checked.
+
+ fatal: not a git repository (or any parent up to mount point /)
+ Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
+ 2026-05-05:22:36:12 INFO [loggers.evaluation_tracker:247] Saving results aggregated
+ local-completions ({'model': '/workspace/unsloth-gguf/Qwen3.6-35B-A3B-UD-Q4_K_M.gguf', 'tokenizer': '/workspace/REAM-192-bf16', 'base_url': 'http://127.0.0.1:8080/v1/completions', 'num_concurrent': 1, 'max_retries': 3, 'tokenized_requests': False}), gen_kwargs: ({'temperature': 0, 'max_gen_toks': 1024}), limit: None, num_fewshot: None, batch_size: 1
+ | Tasks |Version| Filter |n-shot|Metric| |Value| |Stderr|
+ |---------|------:|-----------|-----:|------|---|----:|---|-----:|
+ |humaneval| 1|create_test| 0|pass@1| |0.628|± |0.0379|
+
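The `± 0.0379` in the table above appears consistent with a plain standard error of the mean over 164 pass/fail outcomes. The sketch below is a sanity check, not lm-eval's own code: it assumes the stderr is the sample standard deviation (with the `n - 1` correction) divided by `sqrt(n)`, and it infers the count of 103 solved problems from `0.628 * 164`, since the log does not state it. The helper name `binomial_stderr` is hypothetical.

```python
import math


def binomial_stderr(passed: int, total: int) -> float:
    """Standard error of the mean for 0/1 outcomes, using the (n - 1) sample variance."""
    p = passed / total
    return math.sqrt(p * (1.0 - p) / (total - 1))


# Assumption: 103 solved problems, inferred from pass@1 * 164 (not stated in the log).
passed, total = 103, 164
pass_at_1 = passed / total
stderr = binomial_stderr(passed, total)

print(f"pass@1 = {pass_at_1:.4f} ± {stderr:.4f}")
```

Under these assumptions the computed values agree with the reported `0.628` and `0.0379` to the precision shown in the table.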
eval-results-2026-05-05/unsloth-ud-q4km/__workspace__unsloth-gguf__Qwen3.6-35B-A3B-UD-Q4_K_M.gguf/results_2026-05-05T22-36-12.437795.json ADDED
@@ -0,0 +1,156 @@
+ {
+ "results": {
+ "humaneval": {
+ "alias": "humaneval",
+ "pass@1,create_test": 0.6280487804878049,
+ "pass@1_stderr,create_test": 0.037856972501334366
+ }
+ },
+ "group_subtasks": {
+ "humaneval": []
+ },
+ "configs": {
+ "humaneval": {
+ "task": "humaneval",
+ "dataset_path": "openai/openai_humaneval",
+ "test_split": "test",
+ "doc_to_text": "{{prompt}}",
+ "doc_to_target": "{{test}}\ncheck({{entry_point}})",
+ "unsafe_code": true,
+ "description": "",
+ "target_delimiter": " ",
+ "fewshot_delimiter": "\n\n",
+ "fewshot_config": {
+ "sampler": "default",
+ "split": null,
+ "process_docs": null,
+ "fewshot_indices": null,
+ "samples": null,
+ "doc_to_text": "{{prompt}}",
+ "doc_to_choice": null,
+ "doc_to_target": "{{test}}\ncheck({{entry_point}})",
+ "gen_prefix": null,
+ "fewshot_delimiter": "\n\n",
+ "target_delimiter": " "
+ },
+ "num_fewshot": 0,
+ "metric_list": [
+ {
+ "metric": "def pass_at_k(references: list[str], predictions: list[list[str]], k: list[int] = None):\n global compute_\n assert k is not None\n if isinstance(k, int):\n k = [k]\n res = compute_.compute(\n references=references,\n predictions=predictions,\n k=k,\n )\n return res[0]\n",
+ "aggregation": "mean",
+ "higher_is_better": true,
+ "k": [
+ 1
+ ]
+ }
+ ],
+ "output_type": "generate_until",
+ "generation_kwargs": {
+ "until": [
+ "\nclass",
+ "\ndef",
+ "\n#",
+ "\nif",
+ "\nprint"
+ ],
+ "max_gen_toks": 1024,
+ "do_sample": false,
+ "temperature": 0
+ },
+ "repeats": 1,
+ "filter_list": [
+ {
+ "name": "create_test",
+ "filter": [
+ {
+ "function": "custom",
+ "filter_fn": "<function build_predictions at 0x748e7a340220>"
+ }
+ ]
+ }
+ ],
+ "should_decontaminate": false,
+ "metadata": {
+ "version": 1.0,
+ "model": "/workspace/unsloth-gguf/Qwen3.6-35B-A3B-UD-Q4_K_M.gguf",
+ "tokenizer": "/workspace/REAM-192-bf16",
+ "base_url": "http://127.0.0.1:8080/v1/completions",
+ "num_concurrent": 1,
+ "max_retries": 3,
+ "tokenized_requests": false
+ }
+ }
+ },
+ "versions": {
+ "humaneval": 1.0
+ },
+ "n-shot": {
+ "humaneval": 0
+ },
+ "higher_is_better": {
+ "humaneval": {
+ "pass_at_k": true
+ }
+ },
+ "n-samples": {
+ "humaneval": {
+ "original": 164,
+ "effective": 164
+ }
+ },
+ "config": {
+ "model": "local-completions",
+ "model_args": {
+ "model": "/workspace/unsloth-gguf/Qwen3.6-35B-A3B-UD-Q4_K_M.gguf",
+ "tokenizer": "/workspace/REAM-192-bf16",
+ "base_url": "http://127.0.0.1:8080/v1/completions",
+ "num_concurrent": 1,
+ "max_retries": 3,
+ "tokenized_requests": false
+ },
+ "batch_size": 1,
+ "batch_sizes": [],
+ "device": "cuda:0",
+ "use_cache": null,
+ "limit": null,
+ "bootstrap_iters": 100000,
+ "gen_kwargs": {
+ "temperature": 0,
+ "max_gen_toks": 1024
+ },
+ "random_seed": 0,
+ "numpy_seed": 1234,
+ "torch_seed": 1234,
+ "fewshot_seed": 1234
+ },
+ "git_hash": null,
+ "date": 1778020400.7048757,
+ "pretty_env_info": "PyTorch version: 2.11.0+cu130\nIs debug build: False\nCUDA used to build PyTorch: 13.0\nROCM used to build PyTorch: N/A\n\nOS: Ubuntu 24.04.3 LTS (x86_64)\nGCC version: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0\nClang version: Could not collect\nCMake version: version 3.28.3\nLibc version: glibc-2.39\n\nPython version: 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0] (64-bit runtime)\nPython platform: Linux-6.8.0-110-generic-x86_64-with-glibc2.39\nIs CUDA available: True\nCUDA runtime version: 12.8.93\nCUDA_MODULE_LOADING set to: \nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-80GB\nNvidia driver version: 580.126.20\ncuDNN version: Probably one of the following:\n/usr/lib/x86_64-linux-gnu/libcudnn.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.8.0\n/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.8.0\nIs XPU available: False\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\nCaching allocator config: N/A\n\nCPU:\nArchitecture: x86_64\nCPU op-mode(s): 32-bit, 64-bit\nAddress sizes: 48 bits physical, 48 bits virtual\nByte Order: Little Endian\nCPU(s): 256\nOn-line CPU(s) list: 0-254\nOff-line CPU(s) list: 255\nVendor ID: AuthenticAMD\nModel name: AMD EPYC 7763 64-Core Processor\nCPU family: 25\nModel: 1\nThread(s) per core: 2\nCore(s) per socket: 64\nSocket(s): 2\nStepping: 1\nFrequency boost: enabled\nCPU(s) scaling MHz: 76%\nCPU max MHz: 3530.4929\nCPU min MHz: 0.0000\nBogoMIPS: 4900.44\nFlags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni 
pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin brs arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap ibpb_exit_to_user\nVirtualization: AMD-V\nL1d cache: 4 MiB (128 instances)\nL1i cache: 4 MiB (128 instances)\nL2 cache: 64 MiB (128 instances)\nL3 cache: 512 MiB (16 instances)\nNUMA node(s): 2\nNUMA node0 CPU(s): 0-63,128-191\nNUMA node1 CPU(s): 64-127,192-254\nVulnerability Gather data sampling: Not affected\nVulnerability Indirect target selection: Not affected\nVulnerability Itlb multihit: Not affected\nVulnerability L1tf: Not affected\nVulnerability Mds: Not affected\nVulnerability Meltdown: Not affected\nVulnerability Mmio stale data: Not affected\nVulnerability Reg file data sampling: Not affected\nVulnerability Retbleed: Not affected\nVulnerability Spec rstack overflow: Mitigation; Safe RET\nVulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected\nVulnerability Srbds: Not affected\nVulnerability Tsa: Vulnerable: Clear CPU buffers attempted, no microcode\nVulnerability Tsx async abort: Not affected\nVulnerability Vmscape: 
Mitigation; IBPB before exit to userspace\n\nVersions of relevant libraries:\n[pip3] numpy==2.1.2\n[pip3] nvidia-cublas==13.1.0.3\n[pip3] nvidia-cublas-cu12==12.8.4.1\n[pip3] nvidia-cuda-cupti==13.0.85\n[pip3] nvidia-cuda-cupti-cu12==12.8.90\n[pip3] nvidia-cuda-nvrtc==13.0.88\n[pip3] nvidia-cuda-nvrtc-cu12==12.8.93\n[pip3] nvidia-cuda-runtime==13.0.96\n[pip3] nvidia-cuda-runtime-cu12==12.8.90\n[pip3] nvidia-cudnn-cu12==9.10.2.21\n[pip3] nvidia-cudnn-cu13==9.19.0.56\n[pip3] nvidia-cudnn-frontend==1.18.0\n[pip3] nvidia-cufft==12.0.0.61\n[pip3] nvidia-cufft-cu12==11.3.3.83\n[pip3] nvidia-curand==10.4.0.35\n[pip3] nvidia-curand-cu12==10.3.9.90\n[pip3] nvidia-cusolver==12.0.4.66\n[pip3] nvidia-cusolver-cu12==11.7.3.90\n[pip3] nvidia-cusparse==12.6.3.3\n[pip3] nvidia-cusparse-cu12==12.5.8.93\n[pip3] nvidia-cusparselt-cu12==0.7.1\n[pip3] nvidia-cusparselt-cu13==0.8.0\n[pip3] nvidia-nccl-cu12==2.27.3\n[pip3] nvidia-nccl-cu13==2.28.9\n[pip3] nvidia-nvjitlink==13.0.88\n[pip3] nvidia-nvjitlink-cu12==12.8.93\n[pip3] nvidia-nvtx==13.0.85\n[pip3] nvidia-nvtx-cu12==12.8.90\n[pip3] torch==2.11.0\n[pip3] torch_c_dlpack_ext==0.1.5\n[pip3] torchaudio==2.11.0\n[pip3] torchvision==0.26.0\n[pip3] triton==3.6.0\n[conda] Could not collect",
+ "transformers_version": "5.8.0",
+ "lm_eval_version": "0.4.11",
+ "upper_git_hash": null,
+ "tokenizer_pad_token": [
+ "<|endoftext|>",
+ "248044"
+ ],
+ "tokenizer_eos_token": [
+ "<|im_end|>",
+ "248046"
+ ],
+ "tokenizer_bos_token": [
+ null,
+ "None"
+ ],
+ "eot_token_id": 248046,
+ "max_length": 2047,
+ "task_hashes": {},
+ "model_source": "local-completions",
+ "model_name": "/workspace/unsloth-gguf/Qwen3.6-35B-A3B-UD-Q4_K_M.gguf",
+ "model_name_sanitized": "__workspace__unsloth-gguf__Qwen3.6-35B-A3B-UD-Q4_K_M.gguf",
+ "system_instruction": null,
+ "system_instruction_sha": null,
+ "fewshot_as_multiturn": null,
+ "chat_template": null,
+ "chat_template_sha": null,
+ "total_evaluation_time_seconds": "203.78883513098117"
+ }
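The headline numbers in a results file like the one above sit under `results.<task>`, keyed as `<metric>,<filter>` and `<metric>_stderr,<filter>`. The following is a minimal sketch of reading them back, assuming that key layout holds for the other runs as well. The `extract_metric` helper is hypothetical, and the JSON is a trimmed inline copy so the snippet runs on its own; a real script would `json.load()` the `results_<timestamp>.json` file instead.

```python
import json

# Trimmed inline copy of the "results" object shown above (illustration only).
RESULTS_JSON = """
{"results": {"humaneval": {"alias": "humaneval",
  "pass@1,create_test": 0.6280487804878049,
  "pass@1_stderr,create_test": 0.037856972501334366}}}
"""


def extract_metric(results: dict, task: str, metric: str, filter_name: str) -> tuple[float, float]:
    """Return (value, stderr) for one metric/filter pair of one task."""
    task_results = results["results"][task]
    value = task_results[f"{metric},{filter_name}"]
    stderr = task_results[f"{metric}_stderr,{filter_name}"]
    return value, stderr


value, stderr = extract_metric(
    json.loads(RESULTS_JSON), "humaneval", "pass@1", "create_test"
)
print(f"humaneval pass@1 = {value:.4f} +/- {stderr:.4f}")
```

Looking the keys up by name rather than by position keeps the helper usable if a run reports additional metrics or filters.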