Total trainable parameters: **≈48 M** (with weight tying).
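
Weight tying shares a single matrix between the input token embedding and the output (LM-head) projection, so that matrix appears only once in the parameter total. The sketch below shows the size of that saving. It assumes a bias-free LM head of shape `(vocab_size, d_model)`, and the sizes in the usage are hypothetical, not this model's actual configuration. In PyTorch, `sum(p.numel() for p in model.parameters() if p.requires_grad)` already counts a tied matrix once, because `Module.parameters()` skips duplicate parameters.

```python
def embedding_and_head_params(vocab_size: int, d_model: int, tie_weights: bool) -> int:
    """Count parameters in the token embedding plus the LM-head projection.

    Assumes a bias-free LM head of shape (vocab_size, d_model). With tying,
    the head reuses the embedding matrix, so it contributes only once.
    """
    embedding = vocab_size * d_model
    lm_head = 0 if tie_weights else vocab_size * d_model
    return embedding + lm_head


if __name__ == "__main__":
    # Hypothetical sizes for illustration only, not this model's config.
    vocab_size, d_model = 32_000, 512
    tied = embedding_and_head_params(vocab_size, d_model, tie_weights=True)
    untied = embedding_and_head_params(vocab_size, d_model, tie_weights=False)
    print(f"tied:   {tied:,}")            # tied:   16,384,000
    print(f"untied: {untied:,}")          # untied: 32,768,000
    print(f"saved:  {untied - tied:,}")   # saved:  16,384,000
```

For a small model the embedding matrix can be a large share of all weights, which is why tying noticeably lowers the headline count.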
### Benchmark Evaluation Metrics

| Category | Benchmark | Metric | Score / Value | Status |
| :--- | :--- | :--- | :---: | :---: |
| **Linguistics & Grammar** | BLiMP | Accuracy | 68.12% | Success |
| **Commonsense & Reasoning** | PIQA | Normalized Accuracy | 57.83% | Success |
| | COPA | Accuracy | 57.00% | Success |
| | BoolQ | Accuracy | 52.17% | Success |
| | WinoGrande | Accuracy | 47.36% | Success |
| | HellaSwag | Normalized Accuracy | 28.49% | Success |
| | RACE | Accuracy | 26.41% | Success |
| | CommonsenseQA | Accuracy | 20.31% | Success |
| **Academic & Knowledge** | SciQ | Normalized Accuracy | 49.00% | Success |
| | ARC-Easy | Normalized Accuracy | 36.49% | Success |
| | MMLU | Accuracy | 25.64% | Success |
| | ARC-Challenge | Normalized Accuracy | 25.17% | Success |
| | OpenBookQA | Normalized Accuracy | 25.40% | Success |
| **Language Modeling** | LAMBADA | Accuracy | 15.87% | Success |
| | WikiText-2 | Word Perplexity (lower is better) | 251.76 | Success |

*Note: The Arithmetic benchmark failed because of an outdated task script (`arithmetic.py`), and SocialIQA failed because of a task-registration error for the `siqa` tag. The remaining 15 tasks completed successfully.*
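
The table mixes plain accuracy, length-normalized accuracy and word perplexity. The task names in the note (`arithmetic`, `siqa`) suggest EleutherAI's lm-evaluation-harness was used; assuming so, the sketch below shows roughly how these metrics are defined there. It is a simplified illustration rather than the harness's own code, and the log-likelihoods in the usage are made up.

```python
import math
from typing import Sequence


def pick_choice(loglikelihoods: Sequence[float], choices: Sequence[str], normalize: bool) -> int:
    """Index of the answer the model 'picks' in a multiple-choice task.

    Accuracy: argmax of each choice's summed log-likelihood.
    Normalized accuracy: argmax of log-likelihood divided by the choice's
    length in characters, which offsets the bias toward short answers
    (longer strings accumulate more negative log-probability).
    """
    if normalize:
        scores = [ll / len(text) for ll, text in zip(loglikelihoods, choices)]
    else:
        scores = list(loglikelihoods)
    return max(range(len(scores)), key=lambda i: scores[i])


def word_perplexity(total_loglikelihood: float, text: str) -> float:
    """exp(-log-likelihood per word), with words split on whitespace.

    Lower is better: a model spreading probability uniformly over a
    vocabulary of V words would score V.
    """
    num_words = len(text.split())
    return math.exp(-total_loglikelihood / num_words)


if __name__ == "__main__":
    lls = [-6.0, -9.0]  # hypothetical summed log-likelihoods per choice
    choices = ["a cat", "a very large dog"]
    print(pick_choice(lls, choices, normalize=False))  # 0
    print(pick_choice(lls, choices, normalize=True))   # 1  (-9/16 > -6/5)
    print(round(word_perplexity(-8.0, "one two three four"), 3))  # 7.389
```

The example shows why the two accuracy variants can disagree: the longer answer loses on raw log-likelihood but wins once each score is divided by answer length.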
## Uses
### Direct Use