Upload 10 files
- .gitattributes +2 -0
- Biggie_SmolLM_0.15B_Base_bf16.gguf +3 -0
- README.md +368 -0
- biggie_groked_int8_q8_0.gguf +3 -0
- config.json +29 -0
- gitattributes +41 -0
- pytorch_model.bin +3 -0
- special_tokens_map.json +42 -0
- tokenizer.json +0 -0
- tokenizer_config.json +167 -0
- vocab.json +0 -0
.gitattributes
CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+biggie_groked_int8_q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+Biggie_SmolLM_0.15B_Base_bf16.gguf filter=lfs diff=lfs merge=lfs -text
Biggie_SmolLM_0.15B_Base_bf16.gguf
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fa567e7fe3fb07549881f8cb50172dd3fd6a15ada0e8e9b007fd4a04ad254180
size 362923584
README.md
ADDED
@@ -0,0 +1,368 @@
---
base_model: HuggingFaceTB/SmolLM-135M
datasets:
- LDJnr/Capybara
inference:
  parameters:
    model_file: biggie_groked_int8_q8_0.gguf
    temperature: 1
license: mit
---

### TINY Frankenstein of [SmolLM-135M](https://huggingface.co/HuggingFaceTB/SmolLM-135M) upped to 0.18b
Use this frankenbase for training.
Sorry for the mislabelling: the model has 0.18B (181M) parameters, not 0.15B.
I did not expect this repo to blow up, and now all the training scripts depend on it.

* ## CITE WORK FROM THIS HF PAGE AND [@cognitivecompai](https://huggingface.co/ehartford)'s OPTIMIZER ON YOUR FUTURE PAPERS OR I WILL DRAG YOUR ORG ON TWITTER LIKE I DID WITH COHERE LOL (we're cool now btw, visited them :)
* https://github.com/cognitivecomputations/grokadamw
* https://github.com/SakanaAI/evolutionary-model-merge/
* https://huggingface.co/blog/smollm

>>[!TIP]🐧 If you're impatient, get the trained checkpoint file that runs on 1 cpu core:
>>
>>wget https://huggingface.co/nisten/Biggie-SmoLlm-0.15B-Base/resolve/main/biggie_groked_int8_q8_0.gguf
>>
>>make sure to install the latest llama.cpp first, it's easy on linux & mac:
>>
>> git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp && make -j

Now for the magic trained finetune that runs at insane speeds:

The settings are very finicky, so be careful with your experimentation.
```bash
./llama-cli -fa -b 512 -ctv q8_0 -ctk q8_0 --min-p 0.3 --top-p 0.85 --keep -1 \
-p "You are a NASA JPL Scientists. Human: I want to bring my cat to mars." \
--in-prefix "<|im_start|>Human:" --reverse-prompt "Human:" \
-m biggie_groked_int8_q8_0.gguf -co -cnv \
-c 1024 -n 700 --temp 1.5 -ngl 0 -t 1
```
Yup, that's no gpu, 1 cpu core.
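
If you would rather drive the same checkpoint from Python, the sketch below maps the main settings of the command above onto llama-cpp-python. Treat it as an approximation, not the author's setup: `build_prompt` and `generate` are hypothetical helper names, and the flash-attention, q8_0 KV-cache, `--keep` and interactive-conversation flags are left out, so the output will not match the CLI run exactly.

```python
# Runtime settings taken from the llama-cli command: -c 1024 -b 512 -t 1 -ngl 0
LOAD_KWARGS = dict(n_ctx=1024, n_batch=512, n_threads=1, n_gpu_layers=0)

# Sampling settings taken from: -n 700 --temp 1.5 --top-p 0.85 --min-p 0.3
# The stop string plays the role of --reverse-prompt "Human:".
SAMPLE_KWARGS = dict(
    max_tokens=700,
    temperature=1.5,
    top_p=0.85,
    min_p=0.3,
    stop=["Human:"],
)


def build_prompt(system: str, user_message: str) -> str:
    """Hypothetical helper: same plain-text layout as the -p string above."""
    return f"{system} Human: {user_message}"


def generate(model_path: str, prompt: str) -> str:
    """Load the GGUF file on CPU and return one completion as text."""
    # Imported lazily so the rest of this file works without llama-cpp-python.
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, **LOAD_KWARGS)
    result = llm(prompt, **SAMPLE_KWARGS)
    return result["choices"][0]["text"]


prompt = build_prompt("You are a NASA JPL Scientists.", "I want to bring my cat to mars.")
# Uncomment once biggie_groked_int8_q8_0.gguf has been downloaded:
# print(generate("biggie_groked_int8_q8_0.gguf", prompt))
```

The load and sampling settings are kept in two plain dictionaries so they can be tweaked one at a time, which matters given how finicky the sampling settings are.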

This base model was built via semi-automated continuous merging to figure out the recipe.
The model is more coherent.

The temperature, min-p and related sampling settings need to be adjusted, but even at default temp 0 it was coherent for the first 100 tokens.
Amazing option for further training. And this is a merge of the base, not the instruct!

## 🧠 What's Really Going Down Here?

We're talking about a convergence of a whole bunch of stuff; more papers will be written about this:

1. **Evolutionary Merging**:
2. **BitNet Integration**:
3. **Experimental GrokAdamW Optimizer**:

## Prior work, from last week

Credits for the optimizer go to [@cognitivecompai](https://github.com/cognitivecomputations/grokadamw) for laying the groundwork with the original GrokAdamW optimizer.

## LETS TRY OUT THE EXPERIMENTAL GROKKED FINETUNE:

```bash
wget https://huggingface.co/nisten/Biggie-SmoLlm-0.15B-Base/resolve/main/biggie_groked_int8_q8_0.gguf
```

Yes, we will be talking with a 164 MB file that runs at 160 tokens per second on a single CPU core.
## you read all of that correctly yes, 1 cpu core 160 tps https://x.com/nisten/status/1819752034305970649
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/nTNISjByBkN7bJZzuOvOw.png)

## 🚀 run it with NO GPU and only one CPU core with these settings
```bash
./llama-cli -fa -b 512 -ctv q8_0 -ctk q8_0 --min-p 0.3 --top-p 0.85 --keep -1 -p "You are a NASA JPL Scientists. Human: I want to bring my cat to mars." -m biggie_groked_int8_q8_0.gguf -co -cnv --in-prefix "<|im_start|>Human:" --reverse-prompt "Human:" -c 1024 -n 512 --temp 1.5 -ngl 0 -t 1
```
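
The `Human:` prefix and reverse prompt in these commands line up with the plain-text turn format that the training script further down builds from Capybara conversations. As a rough illustration, here is that script's formatter run on a made-up two-turn conversation; the sample data is invented, and only the `conversation`, `input` and `output` field names come from the script.

```python
def format_capybara_prompts(examples):
    """Flatten each conversation into alternating Human/Assistant text."""
    texts = []
    for conversation in examples["conversation"]:
        formatted_text = ""
        for turn in conversation:
            if "input" in turn:
                formatted_text += f"Human: {turn['input']}\n\n"
            if "output" in turn:
                formatted_text += f"Assistant: {turn['output']}\n\n"
        texts.append(formatted_text.strip())
    return {"text": texts}


# Invented toy batch in the batched layout that datasets.map(batched=True) passes in.
toy_batch = {
    "conversation": [
        [
            {"input": "Hi", "output": "Hello!"},
            {"input": "Bye", "output": "See you."},
        ]
    ]
}

formatted = format_capybara_prompts(toy_batch)
print(formatted["text"][0])
```

Each conversation becomes one training string with turns separated by blank lines, so a generation that reaches the next `Human:` marker is a natural place to hand control back to the user.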


## 🏋️ Training Tutorial, MAKE YOUR OWN BIGGIE_SMOlLM


Clone the repo like you're stealing code from the future:
```bash
git clone https://github.com/nisten/grokadamw
cd grokadamw
```

Fire up the training script and watch the magic happen:
```bash
python smoltrainer.py
```

## 💻 Do it from scratch yourself
Install the secret sauce (dependencies):
```bash
pip install torch transformers datasets tqdm
```

Make a file named meow.py, copy-paste in this code, and then run it with ```python meow.py```

```python
import torch
import torch.nn as nn
import logging
from datasets import load_dataset, Dataset
from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer, DataCollatorForLanguageModeling
from torch.cuda.amp import autocast
import warnings
from tqdm import tqdm

warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=UserWarning)

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

MODEL_NAME = "nisten/Biggie-SmoLlm-0.15B-Base"
MAX_LENGTH = 2048
BATCH_SIZE = 8
LEARNING_RATE = 2e-4
MAX_STEPS = 3000
GRADIENT_ACCUMULATION_STEPS = 2
NUM_WARMUP_STEPS = 30
OUTPUT_DIR = "./capybara_finetuned_results"

torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

class GrokAdamW(torch.optim.Optimizer):
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=1e-2,
                 alpha_init=0.98, lamb=2.0, gamma=0.1, grokking_signal_fns=None,
                 grokking_signal_decay_rate=0.1, gradient_clipping=1.0):
        defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay,
                        alpha_init=alpha_init, lamb=lamb, gamma=gamma,
                        grokking_signal_fns=grokking_signal_fns,
                        grokking_signal_decay_rate=grokking_signal_decay_rate,
                        gradient_clipping=gradient_clipping)
        super(GrokAdamW, self).__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()

        for group in self.param_groups:
            grokking_signal = self._compute_grokking_signal(group)
            for i, p in enumerate(group['params']):
                if p.grad is None:
                    continue
                grad = p.grad

                # Element-wise clamp of the gradient (not a norm-based clip)
                if group['gradient_clipping'] > 0:
                    grad = torch.clamp(grad, -group['gradient_clipping'], group['gradient_clipping'])

                state = self.state[p]

                # Lazily create the per-parameter state on the first step
                if len(state) == 0:
                    state['step'] = 0
                    state['exp_avg'] = torch.zeros_like(p, memory_format=torch.preserve_format)
                    state['exp_avg_sq'] = torch.zeros_like(p, memory_format=torch.preserve_format)
                    state['grok_ema'] = torch.zeros_like(p, memory_format=torch.preserve_format)

                exp_avg, exp_avg_sq, grok_ema = state['exp_avg'], state['exp_avg_sq'], state['grok_ema']
                beta1, beta2 = group['betas']

                state['step'] += 1

                # i is the parameter's index within its group, so beta1 shrinks for later parameters
                layer_beta1 = beta1 * (1 - group['gamma'])**i

                # EMA of gradients whose decay factor depends on the grokking signal
                alpha = group['alpha_init'] * torch.exp(torch.tensor(-group['grokking_signal_decay_rate'] * grokking_signal))
                grok_ema.mul_(alpha).add_(grad, alpha=1 - alpha)
                grok_grad = grad + group['lamb'] * grok_ema

                # Adam-style first and second moments of the amplified gradient
                exp_avg.mul_(layer_beta1).add_(grok_grad, alpha=1 - layer_beta1)
                exp_avg_sq.mul_(beta2).addcmul_(grok_grad, grok_grad, value=1 - beta2)

                denom = exp_avg_sq.sqrt().add_(group['eps'])
                step_size = group['lr']

                # Decoupled weight decay, as in AdamW
                if group['weight_decay'] != 0:
                    p.data.mul_(1 - group['lr'] * group['weight_decay'])

                p.addcdiv_(exp_avg, denom, value=-step_size)

        return loss

    def _compute_grokking_signal(self, group):
        if group['grokking_signal_fns'] is None:
            return 0.0

        signals = []
        for fn in group['grokking_signal_fns']:
            try:
                signal = fn()
                if signal is not None:
                    signals.append(signal)
            except Exception as e:
                logger.warning(f"Error in grokking_signal_fn: {e}. Ignoring this function.")

        if not signals:
            return 0.0

        # Average all available signals
        return sum(signals) / len(signals)

def format_capybara_prompts(examples):
    texts = []
    for conversation in examples['conversation']:
        formatted_text = ""
        for turn in conversation:
            if 'input' in turn:
                formatted_text += f"Human: {turn['input']}\n\n"
            if 'output' in turn:
                formatted_text += f"Assistant: {turn['output']}\n\n"
        texts.append(formatted_text.strip())
    return {"text": texts}

class CustomTrainer(Trainer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.grokking_signal = 0.0

    # **kwargs absorbs extra arguments that newer transformers releases may pass
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits
        # Shift so that each position predicts the next token
        shift_logits = logits[..., :-1, :].contiguous()
        shift_labels = labels[..., 1:].contiguous()
        loss_fct = nn.CrossEntropyLoss()
        loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))
        return (loss, outputs) if return_outputs else loss

    # *args/**kwargs absorb extra arguments that newer transformers releases may pass
    def training_step(self, model, inputs, *args, **kwargs):
        model.train()
        inputs = self._prepare_inputs(inputs)

        with autocast(dtype=torch.bfloat16):
            loss = self.compute_loss(model, inputs)

        if self.args.gradient_accumulation_steps > 1:
            loss = loss / self.args.gradient_accumulation_steps

        loss.backward()

        # The (scaled) training loss is what the optimizer reads as its grokking signal
        self.grokking_signal = loss.item()

        return loss.detach()

def grokking_signal_fn():
    # Reads the module-level `trainer` that main() sets via `global trainer`
    return trainer.grokking_signal

def main():
    logger.info(f"🚀 Initializing {MODEL_NAME} finetuning with GrokAdamW")

    try:
        config = AutoConfig.from_pretrained(MODEL_NAME)
        tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
        model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
    except Exception as e:
        logger.error(f"❌ Failed to load model or tokenizer: {str(e)}")
        return

    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
        model.config.pad_token_id = model.config.eos_token_id

    logger.info("📚 Loading Capybara dataset")
    try:
        capybara_dataset = load_dataset("LDJnr/Capybara", split="train")
        capybara_dataset = capybara_dataset.map(format_capybara_prompts, batched=True, remove_columns=capybara_dataset.column_names)
    except Exception as e:
        logger.error(f"❌ Failed to load Capybara dataset: {str(e)}")
        return

    logger.info(f"📊 Capybara dataset size: {len(capybara_dataset)}")

    def tokenize_function(examples):
        return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=MAX_LENGTH)

    logger.info("🔢 Tokenizing dataset")
    tokenized_dataset = capybara_dataset.map(tokenize_function, batched=True, remove_columns=capybara_dataset.column_names)

    logger.info("🏋️ Setting up the training arguments")
    training_args = TrainingArguments(
        output_dir=OUTPUT_DIR,
        num_train_epochs=3,
        per_device_train_batch_size=BATCH_SIZE,
        gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
        learning_rate=LEARNING_RATE,
        weight_decay=0.01,
        bf16=True,
        logging_steps=10,
        save_steps=300,
        save_total_limit=10,
        dataloader_num_workers=4,
        warmup_steps=NUM_WARMUP_STEPS,
        gradient_checkpointing=True,
        evaluation_strategy="steps",
        eval_steps=300,
        max_steps=MAX_STEPS,  # a positive max_steps takes precedence over num_train_epochs
        fp16=False,
        optim="adamw_hf",
        lr_scheduler_type="cosine",
        load_best_model_at_end=True,
        metric_for_best_model="loss",
        greater_is_better=False,
    )

    data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

    optimizer = GrokAdamW(
        model.parameters(),
        lr=LEARNING_RATE,
        betas=(0.9, 0.999),
        eps=1e-8,
        weight_decay=0.01,
        alpha_init=0.98,
        lamb=2.0,
        gamma=0.1,
        grokking_signal_fns=[grokking_signal_fn],
        grokking_signal_decay_rate=0.1,
        gradient_clipping=1.0
    )

    logger.info("🏃♂️ Initializing Trainer with GrokAdamW")
    global trainer
    trainer = CustomTrainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_dataset,
        # Evaluation reuses the first (up to) 1000 training examples
        eval_dataset=tokenized_dataset.select(range(min(1000, len(tokenized_dataset)))),
        data_collator=data_collator,
        optimizers=(optimizer, None),
    )

    logger.info("🔥 Starting the training with GrokAdamW")
    try:
        trainer.train()
    except Exception as e:
        logger.error(f"❌ Training failed: {str(e)}")
        return

    logger.info("💾 Saving the model")
    try:
        trainer.save_model(OUTPUT_DIR)
    except Exception as e:
        logger.error(f"❌ Failed to save model: {str(e)}")

    logger.info("🎉 Finetuning with GrokAdamW completed!")

if __name__ == "__main__":
    main()
```
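
To get a feel for the two scalar knobs inside `GrokAdamW.step`, the sketch below evaluates them in isolation with the same defaults the script passes in. The function names `grok_alpha` and `layer_beta1` are made up for this illustration, and the sample signal values and indices are arbitrary; in the script the signal is the most recent training loss after division by the gradient accumulation steps.

```python
import math

ALPHA_INIT = 0.98   # alpha_init in the script
DECAY_RATE = 0.1    # grokking_signal_decay_rate in the script
BETA1 = 0.9         # first entry of betas in the script
GAMMA = 0.1         # gamma in the script


def grok_alpha(signal: float) -> float:
    """EMA decay factor: alpha_init * exp(-decay_rate * signal)."""
    return ALPHA_INIT * math.exp(-DECAY_RATE * signal)


def layer_beta1(i: int) -> float:
    """First-moment decay for the i-th parameter in a group: beta1 * (1 - gamma)**i."""
    return BETA1 * (1 - GAMMA) ** i


# A larger signal (higher loss) lowers alpha, so the gradient EMA forgets faster.
for signal in (0.0, 2.0, 10.0):
    print(f"signal={signal:5.1f} -> alpha={grok_alpha(signal):.4f}")

# beta1 decays geometrically with the parameter index inside the group.
for i in (0, 1, 10, 100):
    print(f"param index {i:3d} -> beta1={layer_beta1(i):.6f}")
```

One thing this makes visible: because the script puts every tensor from `model.parameters()` into a single group, the index `i` runs into the hundreds, and the effective beta1 for late parameters is close to zero. Whether that is intended is not stated here, so keep it in mind when tuning `gamma`.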
🚀 Now go forth and train, accelerate that code!

> **Note:** You'll need about 14GB of VRAM. If you have 8GB, change to batch size 4.
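
Halving the batch size also halves the number of sequences behind each optimizer step. The arithmetic below is a small sketch of that trade-off using the constants from the script; `effective_batch` and `tokens_per_step` are hypothetical helper names, and raising gradient accumulation to 4 is only one possible way to compensate, not something the author prescribes.

```python
MAX_LENGTH = 2048  # every example is padded or truncated to this length in the script


def effective_batch(per_device_batch: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Sequences that contribute to one optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


def tokens_per_step(per_device_batch: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Token positions (padding included) per optimizer step."""
    return effective_batch(per_device_batch, grad_accum_steps, num_devices) * MAX_LENGTH


default_setup = effective_batch(8, 2)    # script defaults
low_vram_setup = effective_batch(4, 2)   # batch size 4, accumulation unchanged
compensated = effective_batch(4, 4)      # batch size 4, accumulation doubled

print(default_setup, low_vram_setup, compensated)
print(tokens_per_step(8, 2), tokens_per_step(4, 2))
```

Keeping the effective batch at 16 means the learning rate and step count from the script see roughly the same amount of data per update, at the cost of more forward and backward passes per step.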

Results will appear in `./capybara_finetuned_results`

---

### Author

**Nisten Tahiraj**
🏢 [rakun.ai](https://rakun.ai)
📍 Toronto, Canada

---
Happy training!
<video controls autoplay muted src="https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/WCLhKzZWbrLo8BETGaKvI.qt"></video>
biggie_groked_int8_q8_0.gguf
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c94683f4bda200ab06337ec237ab7953012d428b449e085563cd7a7a2b865658
size 163636960
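
A Git LFS pointer like this one records the SHA-256 digest and byte size of the real file, so a downloaded copy can be checked against it. The following is a minimal sketch of such a check; `sha256_of_file` and `matches_pointer` are hypothetical helper names, and the local file path is an assumption about where the download was saved.

```python
import hashlib
import os

# Values copied from the LFS pointer above.
EXPECTED_OID = "c94683f4bda200ab06337ec237ab7953012d428b449e085563cd7a7a2b865658"
EXPECTED_SIZE = 163636960  # bytes, roughly 164 MB


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large model files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pointer(path: str, oid: str = EXPECTED_OID, size: int = EXPECTED_SIZE) -> bool:
    """True when both the byte size and the SHA-256 digest match the pointer."""
    return os.path.getsize(path) == size and sha256_of_file(path) == oid


# Example, assuming the file was saved in the current directory:
# print(matches_pointer("biggie_groked_int8_q8_0.gguf"))
```

The size comparison runs first because it is cheap and catches truncated downloads before any hashing happens.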
config.json
ADDED
@@ -0,0 +1,29 @@
{
  "_name_or_path": "/Users/n/hf/SmolLM-135M",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 0,
  "eos_token_id": 0,
  "hidden_act": "silu",
  "hidden_size": 576,
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "max_position_embeddings": 2048,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 9,
  "num_hidden_layers": 35,
  "num_key_value_heads": 3,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.3",
  "use_cache": true,
  "vocab_size": 49152
}
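
The values in this config are enough to estimate the parameter count by hand. The sketch below assumes the standard Llama layout (grouped-query attention, a gated MLP with three projections, two RMSNorm weights per layer plus a final norm, and no biases, as the config states); `count_llama_params` is an illustrative helper, not part of any library.

```python
CONFIG = {
    "hidden_size": 576,
    "intermediate_size": 1536,
    "num_hidden_layers": 35,
    "num_attention_heads": 9,
    "num_key_value_heads": 3,
    "vocab_size": 49152,
}


def count_llama_params(cfg: dict, tied_embeddings: bool = True) -> int:
    """Count weights for a bias-free Llama-style decoder described by cfg."""
    hidden = cfg["hidden_size"]
    head_dim = hidden // cfg["num_attention_heads"]
    kv_dim = head_dim * cfg["num_key_value_heads"]

    attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # q, o and k, v projections
    mlp = 3 * hidden * cfg["intermediate_size"]            # gate, up and down projections
    norms = 2 * hidden                                     # two RMSNorm weights per layer
    per_layer = attention + mlp + norms

    embedding = cfg["vocab_size"] * hidden
    total = cfg["num_hidden_layers"] * per_layer + embedding + hidden  # + final norm
    if not tied_embeddings:
        total += embedding  # separate output head
    return total


tied = count_llama_params(CONFIG, tied_embeddings=True)
untied = count_llama_params(CONFIG, tied_embeddings=False)
print(f"tied embeddings:   {tied:,}")
print(f"untied embeddings: {untied:,}")
```

Under these assumptions the count is 152,215,488 with tied embeddings and 180,527,040 when the output head is counted separately. That may be where both the 0.15B label and the 0.18B (181M) figure in the README come from, although the README does not say how its number was counted.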
gitattributes
ADDED
@@ -0,0 +1,41 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
smolbiggie_bf16.gguf filter=lfs diff=lfs merge=lfs -text
Biggie_SmolLM_0.15B_Base_bf16.gguf filter=lfs diff=lfs merge=lfs -text
Biggie_SmolLM_0.15B_Base_q8_0.gguf filter=lfs diff=lfs merge=lfs -text
biggie-smollm-checkpoint-twitter-q8_0.gguf filter=lfs diff=lfs merge=lfs -text
biggie_groked_int8_q8_0.gguf filter=lfs diff=lfs merge=lfs -text
old-biggie-smollm-checkpoint-twitter-q8_0.gguf filter=lfs diff=lfs merge=lfs -text
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8868a6657a83f51c2b8d5d88cf48f96cdf0631feed5582a152cbcc77cfae4299
size 269149770
special_tokens_map.json
ADDED
@@ -0,0 +1,42 @@
{
  "additional_special_tokens": [
    "<|endoftext|>",
    "<|im_start|>",
    "<|im_end|>",
    "<repo_name>",
    "<reponame>",
    "<file_sep>",
    "<filename>",
    "<gh_stars>",
    "<issue_start>",
    "<issue_comment>",
    "<issue_closed>",
    "<jupyter_start>",
    "<jupyter_text>",
    "<jupyter_code>",
    "<jupyter_output>",
    "<jupyter_script>",
    "<empty_output>"
  ],
  "bos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
@@ -0,0 +1,167 @@
{
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "0": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "<repo_name>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "4": {
      "content": "<reponame>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "5": {
      "content": "<file_sep>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "6": {
      "content": "<filename>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "7": {
      "content": "<gh_stars>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "8": {
      "content": "<issue_start>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "9": {
      "content": "<issue_comment>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "10": {
      "content": "<issue_closed>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "11": {
      "content": "<jupyter_start>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "12": {
      "content": "<jupyter_text>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "13": {
      "content": "<jupyter_code>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "14": {
      "content": "<jupyter_output>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "15": {
      "content": "<jupyter_script>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "16": {
      "content": "<empty_output>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "additional_special_tokens": [
    "<|endoftext|>",
    "<|im_start|>",
    "<|im_end|>",
    "<repo_name>",
    "<reponame>",
    "<file_sep>",
    "<filename>",
    "<gh_stars>",
    "<issue_start>",
    "<issue_comment>",
    "<issue_closed>",
    "<jupyter_start>",
    "<jupyter_text>",
    "<jupyter_code>",
    "<jupyter_output>",
    "<jupyter_script>",
    "<empty_output>"
  ],
  "bos_token": "<|endoftext|>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|endoftext|>",
  "model_max_length": 1000000000000000019884624838656,
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>",
  "vocab_size": 49152
}
vocab.json
ADDED
The diff for this file is too large to render.