kaiokendev committed
Commit fdf735d · 1 Parent(s): 6f55672

Update README.md

Files changed (1)
  1. README.md +24 -0
README.md CHANGED
@@ -1,3 +1,27 @@
  ---
  license: mit
  ---
+
+ ### SuperHOT Prototype 2 w/ 16K Context
+
+ This is a second prototype of SuperHOT, this time with a 16K context and no RLHF. It uses the same technique described in [the GitHub blog](https://kaiokendev.github.io/til#extending-context-to-8k).
+ Tests have shown that the model does leverage the extended context at 8K, so naturally, let's try going even further.
+
+ You will need to **use the monkeypatch** or, if you are already using it, **change the scaling factor to 0.125 and the maximum sequence length to 16384**.
+
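The two values in the instruction above fit together arithmetically: assuming the base model was pretrained with a 2048-token context (an assumption, suggested by the 2048 figure in the training notes), 2048 / 16384 = 0.125. The sketch below is not the monkeypatch itself. It only illustrates the general idea of scaling position indices before computing rotary-embedding angles, and the function names `scaled_positions` and `rotary_angles` are hypothetical.

```python
# Illustrative sketch only; not the actual monkeypatch code.
BASE_CONTEXT = 2048   # assumed pretraining context length of the base model
MAX_SEQ_LEN = 16384   # maximum sequence length given in the README
SCALE = 0.125         # scaling factor given in the README


def scaled_positions(seq_len, scale=SCALE):
    """Map each integer position i to the interpolated position i * scale."""
    return [i * scale for i in range(seq_len)]


def rotary_angles(position, dim=8, base=10000.0):
    """Rotary-embedding angles for one (possibly fractional) position.

    dim is a small hypothetical head dimension chosen for readability.
    """
    inv_freq = [base ** (-2 * k / dim) for k in range(dim // 2)]
    return [position * f for f in inv_freq]


positions = scaled_positions(MAX_SEQ_LEN)

# The README's factor is exactly the ratio of the two context lengths.
print(SCALE == BASE_CONTEXT / MAX_SEQ_LEN)

# The last of the 16384 positions still falls inside the assumed 2048 range.
print(positions[-1])  # 2047.875

# Token 8 in the extended context gets the same angles as token 1 unscaled.
print(rotary_angles(positions[8]) == rotary_angles(1.0))
```

With this scaling, all 16384 positions map into the range the base model is assumed to have seen in pretraining, at the cost of fractional positions that the LoRA is then trained to handle.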
+ I trained the LoRA with the following configuration:
+ - 1200 samples (~400 of them over a sequence length of 2048)
+ - learning rate of 3e-4
+ - 3 epochs
+ - The exported modules are:
+   - q_proj
+   - k_proj
+   - v_proj
+   - o_proj
+ - no bias
+ - Rank = 4
+ - Alpha = 8
+ - no dropout
+ - weight decay of 0.1
+ - AdamW beta1 of 0.9, beta2 of 0.99, and epsilon of 1e-5
+ - Trained on 4-bit base model
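For readers who want to reproduce a similar setup, the listed hyperparameters can be collected into two plain dictionaries. This is a sketch under assumptions: the README does not say which training framework was used. The key names are chosen to match keyword arguments of `peft.LoraConfig` and `transformers.TrainingArguments`, but that pairing is a guess, and the variable names are hypothetical.

```python
# Hypothetical transcription of the README's hyperparameters.
# The training framework is not named in the README; key names mirror
# peft.LoraConfig and transformers.TrainingArguments as an assumption.
lora_config = {
    "r": 4,                                   # Rank = 4
    "lora_alpha": 8,                          # Alpha = 8
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "lora_dropout": 0.0,                      # no dropout
    "bias": "none",                           # no bias
}

training_config = {
    "learning_rate": 3e-4,
    "num_train_epochs": 3,
    "weight_decay": 0.1,
    "adam_beta1": 0.9,
    "adam_beta2": 0.99,
    "adam_epsilon": 1e-5,
}

# In standard LoRA the low-rank update is multiplied by alpha / r.
lora_scale = lora_config["lora_alpha"] / lora_config["r"]
print(lora_scale)  # 2.0
```

The sample count and the 4-bit base model are left out of the dictionaries because they describe the data and the loading setup rather than optimizer or adapter settings.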