ThingsAI committed on
Commit 334fa55 · verified · 1 Parent(s): be51c2e

Update README.md

Files changed (1)
  1. README.md +23 -0
README.md CHANGED
@@ -60,6 +60,29 @@ The architecture closely follows the efficient‑small‑LM blueprint popularise

  Total trainable parameters: **≈48 M** (with weight tying).

+ ### Benchmark Evaluation Metrics
+
+ | Category | Benchmark | Metric | Score / Value | Status |
+ | :--- | :--- | :--- | :---: | :---: |
+ | **Linguistics & Grammar** | BLiMP | Accuracy | 68.12% | Success |
+ | **Commonsense & Reasoning** | PIQA | Normalized Accuracy | 57.83% | Success |
+ | | COPA | Accuracy | 57.00% | Success |
+ | | BoolQ | Accuracy | 52.17% | Success |
+ | | WinoGrande | Accuracy | 47.36% | Success |
+ | | HellaSwag | Normalized Accuracy | 28.49% | Success |
+ | | RACE | Accuracy | 26.41% | Success |
+ | | CommonsenseQA | Accuracy | 20.31% | Success |
+ | **Academic & Knowledge** | SciQ | Normalized Accuracy | 49.00% | Success |
+ | | ARC-Easy | Normalized Accuracy | 36.49% | Success |
+ | | MMLU | Accuracy | 25.64% | Success |
+ | | ARC-Challenge | Normalized Accuracy | 25.17% | Success |
+ | | OpenBookQA | Normalized Accuracy | 25.40% | Success |
+ | **Language Modeling** | LAMBADA | Accuracy | 15.87% | Success |
+ | | WikiText-2 | Word Perplexity | 251.76 | Success |
+
+ *Note: The Arithmetic benchmark failed due to outdated script support (`arithmetic.py`), and SocialIQA failed due to a registration tag error (`siqa`). Total baseline execution completed successfully for all other 15 tasks.*
+
+
  ## Uses

  ### Direct Use
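
The scores in the table added by this commit are easier to interpret next to a random-guessing baseline. The sketch below is not part of the commit. It copies the multiple-choice scores from the table and subtracts a uniform-guess baseline. The `NUM_CHOICES` values are assumptions based on the usual number of answer options per benchmark, not figures taken from this README, and the helper name `margin_over_chance` is hypothetical. LAMBADA and WikiText-2 are left out because they have no fixed set of answer options.

```python
# Scores (percent) copied from the benchmark table added in this commit.
SCORES = {
    "BLiMP": 68.12, "PIQA": 57.83, "COPA": 57.00, "BoolQ": 52.17,
    "WinoGrande": 47.36, "HellaSwag": 28.49, "RACE": 26.41,
    "CommonsenseQA": 20.31, "SciQ": 49.00, "ARC-Easy": 36.49,
    "MMLU": 25.64, "ARC-Challenge": 25.17, "OpenBookQA": 25.40,
}

# ASSUMPTION: typical number of answer options per task (not from the README).
NUM_CHOICES = {
    "BLiMP": 2, "PIQA": 2, "COPA": 2, "BoolQ": 2, "WinoGrande": 2,
    "HellaSwag": 4, "RACE": 4, "CommonsenseQA": 5, "SciQ": 4,
    "ARC-Easy": 4, "MMLU": 4, "ARC-Challenge": 4, "OpenBookQA": 4,
}


def margin_over_chance(scores, num_choices):
    """Return score minus uniform-guess accuracy, in percentage points."""
    margins = {}
    for task, score in scores.items():
        chance = 100.0 / num_choices[task]  # uniform guessing baseline
        margins[task] = round(score - chance, 2)
    return margins


if __name__ == "__main__":
    margins = margin_over_chance(SCORES, NUM_CHOICES)
    # Print tasks from largest to smallest margin over chance.
    for task, margin in sorted(margins.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{task:15s} {margin:+6.2f} pts vs. chance")
```

Under these assumed baselines, a margin near zero means the score is roughly what uniform guessing would produce, so it should be read with that in mind.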