kaiokendev
/

superhot-13b-16k-no-rlhf-test

kaiokendev committed on Jun 23, 2023

Commit

fdf735d

·

1 Parent(s): 6f55672

Update README.md

Files changed (1)

README.md +24 -0

@@ -1,3 +1,27 @@
 ---
 license: mit
 ---

+### SuperHOT Prototype 2 w/ 16K Context
+This is the second SuperHOT prototype, this time with 16K context and no RLHF. It uses the same technique described in [the GitHub blog](https://kaiokendev.github.io/til#extending-context-to-8k).
+Tests have shown that the model does indeed leverage the extended context at 8K, so naturally, let's try going even further.
+You will need to either **use the monkeypatch** or, if you are already using it, **change the scaling factor to 0.125 and the maximum sequence length to 16384**.
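
The two numbers above are related: 0.125 is what you get when a 2048-token window is stretched to 16384 tokens. The sketch below illustrates that arithmetic and how scaled positions feed into rotary angles. It is not the monkeypatch itself. The 2048 base context, the function names, and the rotary base of 10000 are assumptions made for illustration.

```python
# Assumption: the base model was pre-trained with a 2048-token context.
BASE_CONTEXT = 2048
# Maximum sequence length named in this README.
TARGET_CONTEXT = 16384


def scaling_factor(base_context: int, target_context: int) -> float:
    """Ratio used to squeeze target positions into the base position range."""
    return base_context / target_context


def scaled_positions(seq_len: int, scale: float) -> list[float]:
    """Position indices 0..seq_len-1 multiplied by the scaling factor."""
    return [i * scale for i in range(seq_len)]


def rotary_angles(position: float, head_dim: int, base: float = 10000.0) -> list[float]:
    """Rotary embedding angles for one (possibly fractional) position.

    Hypothetical helper: uses the common inverse-frequency schedule
    base ** (-2i / head_dim) for each pair of dimensions.
    """
    return [position / (base ** (2 * i / head_dim)) for i in range(head_dim // 2)]


scale = scaling_factor(BASE_CONTEXT, TARGET_CONTEXT)
positions = scaled_positions(TARGET_CONTEXT, scale)

# Every scaled position stays inside the range the base model saw in training.
print(scale, positions[-1])
```

With this scaling, token 8 in a 16K sequence gets the same rotary angles as token 1 in an unscaled 2K sequence, which is why the factor and the maximum sequence length have to be changed together.
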
+I trained the LoRA with the following configuration:
+- 1200 samples (~400 of them longer than 2048 tokens)
+- learning rate of 3e-4
+- 3 epochs
+- The exported modules are:
+    - q_proj
+    - k_proj
+    - v_proj
+    - o_proj
+- no bias
+- Rank = 4
+- Alpha = 8
+- no dropout
+- weight decay of 0.1
+- AdamW with beta1 of 0.9, beta2 of 0.99, and epsilon of 1e-5
+- Trained on a 4-bit base model
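
For reference, the configuration listed above can be collected into a small, dependency-free structure. This is a sketch rather than the training script that was actually used: the class name, field names, and the helper methods are hypothetical, and the hidden size of 5120 in the usage lines is an assumption about the 13B base model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraHyperparams:
    """Hyperparameters copied from the list in this README (names are illustrative)."""
    rank: int = 4
    alpha: int = 8
    dropout: float = 0.0
    bias: str = "none"
    target_modules: tuple = ("q_proj", "k_proj", "v_proj", "o_proj")
    learning_rate: float = 3e-4
    epochs: int = 3
    weight_decay: float = 0.1
    adam_beta1: float = 0.9
    adam_beta2: float = 0.99
    adam_epsilon: float = 1e-5

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update by alpha / rank.
        return self.alpha / self.rank

    def added_params_per_layer(self, hidden_size: int) -> int:
        # Each adapted square projection adds A (rank x hidden) and B (hidden x rank).
        return len(self.target_modules) * 2 * self.rank * hidden_size


cfg = LoraHyperparams()
# Assumption: hidden size of 5120 for the 13B base model.
print(cfg.lora_scaling, cfg.added_params_per_layer(5120))
```

If you use a LoRA library, these fields would typically map onto its rank, alpha, dropout, bias, and target-module settings, while the learning rate, epochs, weight decay, and AdamW values belong to the optimizer and trainer configuration.
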