Hyun9junn committed on
Commit a81f41b · verified · 1 Parent(s): 5e04abe

Update README.md

Files changed (1)
  1. README.md +4 -6
README.md CHANGED
@@ -16,8 +16,8 @@ base_model: LGAI-EXAONE/K-EXAONE-236B-A23B
  ---

  # K-EXAONE-236B-A23B-W4A16-G128
- ### 🔄 (2026-04-13) v2 Update - Improved Quantization (# of Calibration Dataset 512, Sequence len 512)
- ### 🔄 (2026-04-10) Initial commit (# of Calibration Dataset 32, Sequence len 128)

  > **Note - Early release**
  >
@@ -63,6 +63,8 @@ This is the **first W4A16 AWQ checkpoint** for K-EXAONE-236B-A23B publicly avail

  Quantization was performed using [llm-compressor](https://github.com/vllm-project/llm-compressor) with a **MoE-aware AWQ** recipe.

  **Method:** AWQ applies channel-wise scaling to minimize quantization error by protecting salient weights, using a calibration dataset to determine optimal scales.

  **Recipe highlights:**
@@ -75,8 +77,6 @@ Quantization was performed using [llm-compressor](https://github.com/vllm-projec
  * Layer 0 (dense MLP) and `lm_head` are excluded from quantization
  * Gate weight tensors are excluded from quantization

- The full recipe is available in `recipe.yaml`. The MoE-aware AWQ recipe was developed in [SqueezeBits/EXAONE](https://github.com/SqueezeBits/EXAONE).
-
  **Calibration dataset:** [`neuralmagic/LLM_compression_calibration`](https://huggingface.co/datasets/neuralmagic/LLM_compression_calibration) (512 samples, sequence length 2048)

  ---
@@ -241,5 +241,3 @@ If you use this model, please cite the original K-EXAONE work:
  url = {https://huggingface.co/LGAI-EXAONE/K-EXAONE-236B-A23B}
  }
  ```
-
- Quantization produced by [Hyun9junn](https://huggingface.co/Hyun9junn) using [llm-compressor](https://github.com/vllm-project/llm-compressor).
 
  ---

  # K-EXAONE-236B-A23B-W4A16-G128
+ **🔄 (2026-04-13) Improved Quantization** - scale-up calibration dataset (# of Calibration Dataset 512, Sequence len 512)
+ **🔄 (2026-04-10) Initial commit** (# of Calibration Dataset 32, Sequence len 128)

  > **Note - Early release**
  >
 

  Quantization was performed using [llm-compressor](https://github.com/vllm-project/llm-compressor) with a **MoE-aware AWQ** recipe.

+ The EXAONE specific MoE-aware AWQ recipe was developed in [SqueezeBits/llm-compressor-K-EXAONE](https://github.com/SqueezeBits/llm-compressor-K-EXAONE).
+
  **Method:** AWQ applies channel-wise scaling to minimize quantization error by protecting salient weights, using a calibration dataset to determine optimal scales.

  **Recipe highlights:**
 
  * Layer 0 (dense MLP) and `lm_head` are excluded from quantization
  * Gate weight tensors are excluded from quantization

  **Calibration dataset:** [`neuralmagic/LLM_compression_calibration`](https://huggingface.co/datasets/neuralmagic/LLM_compression_calibration) (512 samples, sequence length 2048)

  ---
 
  url = {https://huggingface.co/LGAI-EXAONE/K-EXAONE-236B-A23B}
  }
  ```
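
The **Method** paragraph in the diff above describes AWQ as channel-wise scaling applied before weight quantization. The idea can be illustrated with a rough pure-Python sketch. This is not the recipe used for this checkpoint and not llm-compressor code: the function names, the asymmetric round-to-nearest scheme, and the hand-picked `channel_scales` are assumptions made for illustration. Real AWQ searches for the scales using calibration activations.

```python
import random


def quantize_group(values, bits=4):
    """Round-to-nearest asymmetric quantization of one weight group.

    Returns the dequantized floats so the rounding error can be inspected.
    (Illustrative scheme; the actual checkpoint format may differ.)
    """
    qmax = (1 << bits) - 1                 # 15 for 4-bit weights
    lo, hi = min(values), max(values)
    step = (hi - lo) / qmax or 1.0         # guard against a constant group
    codes = [round((v - lo) / step) for v in values]   # integers in [0, qmax]
    return [lo + c * step for c in codes]


def quantize_row(row, group_size=128):
    """Quantize one weight row in independent groups (the 'G128' in W4A16-G128)."""
    out = []
    for start in range(0, len(row), group_size):
        out.extend(quantize_group(row[start:start + group_size]))
    return out


def awq_style_quantize(row, channel_scales, group_size=128):
    """Scale each input channel, quantize, then undo the scale.

    Dividing the dequantized weight by s is mathematically the same as
    dividing the matching activation by s, which is how the scale is
    normally folded into the preceding operation.
    """
    scaled = [w * s for w, s in zip(row, channel_scales)]
    dequantized = quantize_row(scaled, group_size)
    return [w / s for w, s in zip(dequantized, channel_scales)]


if __name__ == "__main__":
    random.seed(0)
    row = [random.uniform(-1.0, 1.0) for _ in range(256)]
    # Hypothetical scales: treat channel 3 as "salient" and scale it up.
    scales = [1.0] * 256
    scales[3] = 2.0
    plain = quantize_row(row)
    scaled = awq_style_quantize(row, scales)
    print("error on channel 3, plain :", abs(plain[3] - row[3]))
    print("error on channel 3, scaled:", abs(scaled[3] - row[3]))
```

Whether a given scale actually lowers the error depends on how it shifts the group's value range, which is why AWQ chooses the scales from calibration data rather than fixing them by hand.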