Instructions to use kth8/gemma-3-270m-it-OpenCode-Title-Generator with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use kth8/gemma-3-270m-it-OpenCode-Title-Generator with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="kth8/gemma-3-270m-it-OpenCode-Title-Generator")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
# config.json declares Gemma3ForCausalLM, a text-only causal LM
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("kth8/gemma-3-270m-it-OpenCode-Title-Generator")
model = AutoModelForCausalLM.from_pretrained("kth8/gemma-3-270m-it-OpenCode-Title-Generator")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
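The snippets above send a bare user message, but the README states that the model was fine-tuned with the exact system prompt used by OpenCode's title agent, so including that prompt is likely to produce titles closer to the training distribution. The sketch below assumes `system_prompt` holds the prompt text from the README's Usage example. `generate_title` and `clean_title` are hypothetical helper names, and the 50-character cap in `clean_title` mirrors a rule stated in that prompt rather than anything the model enforces. Loading uses `AutoModelForCausalLM`, which matches the `Gemma3ForCausalLM` architecture in config.json.

```python
def clean_title(text, max_chars=50):
    """Keep the first non-empty line, strip wrapping quotes, and cap the length."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    title = lines[0] if lines else ""
    title = title.strip("\"'")
    if len(title) > max_chars:
        title = title[:max_chars].rstrip()
    return title


def generate_title(user_message, system_prompt,
                   model_id="kth8/gemma-3-270m-it-OpenCode-Title-Generator",
                   max_new_tokens=24):
    """Hypothetical helper: one greedy title generation with the OpenCode system prompt."""
    # Imported lazily so clean_title works without transformers installed.
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    # do_sample=False overrides the sampling defaults in generation_config.json.
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return clean_title(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

For repeated calls, load the tokenizer and model once and reuse them; the helper reloads them on every call only to stay self-contained.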

Notebooks
Google Colab
Kaggle

vLLM

How to use kth8/gemma-3-270m-it-OpenCode-Title-Generator with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "kth8/gemma-3-270m-it-OpenCode-Title-Generator"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kth8/gemma-3-270m-it-OpenCode-Title-Generator",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
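The same request can be sent from Python using only the standard library. This is a sketch that assumes a server started with the `vllm serve` command above on the default port 8000; `build_title_request` and `request_title` are illustrative names, and `system_prompt` is expected to be the OpenCode title prompt quoted in the README. `"temperature": 0` requests greedy decoding so the same message yields the same title, and `max_tokens` keeps the reply short.

```python
import json
import urllib.request

# Assumption: `vllm serve` is running locally on its default port.
BASE_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "kth8/gemma-3-270m-it-OpenCode-Title-Generator"


def build_title_request(user_message, system_prompt, max_tokens=24):
    """Build an OpenAI-style chat-completions body with the title-agent system prompt."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "temperature": 0,  # greedy decoding for a deterministic title
        "max_tokens": max_tokens,
    }


def request_title(user_message, system_prompt, url=BASE_URL):
    """POST the request and return the assistant text (requires a running server)."""
    body = json.dumps(build_title_request(user_message, system_prompt)).encode("utf-8")
    req = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"].strip()
```

Passing `url="http://localhost:30000/v1/chat/completions"` should work the same way against the SGLang server described below, since it exposes the same OpenAI-compatible route.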

Use Docker

docker model run hf.co/kth8/gemma-3-270m-it-OpenCode-Title-Generator

SGLang

How to use kth8/gemma-3-270m-it-OpenCode-Title-Generator with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "kth8/gemma-3-270m-it-OpenCode-Title-Generator" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kth8/gemma-3-270m-it-OpenCode-Title-Generator",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "kth8/gemma-3-270m-it-OpenCode-Title-Generator" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kth8/gemma-3-270m-it-OpenCode-Title-Generator",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use kth8/gemma-3-270m-it-OpenCode-Title-Generator with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for kth8/gemma-3-270m-it-OpenCode-Title-Generator to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for kth8/gemma-3-270m-it-OpenCode-Title-Generator to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for kth8/gemma-3-270m-it-OpenCode-Title-Generator to start chatting

Load model with FastModel

# Install Unsloth from pip:
pip install unsloth
# Then, in Python:
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="kth8/gemma-3-270m-it-OpenCode-Title-Generator",
    max_seq_length=2048,
)

Docker Model Runner
How to use kth8/gemma-3-270m-it-OpenCode-Title-Generator with Docker Model Runner:
```
docker model run hf.co/kth8/gemma-3-270m-it-OpenCode-Title-Generator
```

kth8 committed 23 days ago

Commit 83772e2 (verified) · 1 parent: c5a7299

Upload folder using huggingface_hub

Files changed (14)

.gitattributes +1 -0
README.md +143 -0
added_tokens.json +3 -0
chat_template.jinja +50 -0
config.json +55 -0
generation_config.json +14 -0
model.safetensors +3 -0
special_tokens_map.json +33 -0
tokenizer.json +3 -0
tokenizer.model +3 -0
tokenizer_config.json +0 -0
train/log.json +927 -0
train/training_loss.png +0 -0
train/validation_loss.png +0 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,143 @@

+---
+license: gemma
+language:
+- en
+base_model: unsloth/gemma-3-270m-it
+datasets:
+- kth8/title-generation-25000x
+pipeline_tag: text-generation
+library_name: transformers
+tags:
+- sft
+- trl
+- unsloth
+- gemma
+- gemma3
+- gemma3_text
+---
+![logo](https://storage.googleapis.com/gweb-developer-goog-blog-assets/images/gemma-3_2.original.png)
+A supervised fine-tune of [unsloth/gemma-3-270m-it](https://huggingface.co/unsloth/gemma-3-270m-it) on the [kth8/title-generation-25000x](https://huggingface.co/datasets/kth8/title-generation-25000x) dataset.
+Trained on an English-only subset of ~17K examples (due to base-model limitations), using the exact system prompt that OpenCode's [title agent uses](https://raw.githubusercontent.com/anomalyco/opencode/refs/heads/dev/packages/opencode/src/agent/prompt/title.txt).
+## Usage example
+**System prompt**
+```
+You are a title generator. You output ONLY a thread title. Nothing else.
+<task>
+Generate a brief title that would help the user find this conversation later.
+Follow all rules in <rules>
+Use the <examples> so you know what a good title looks like.
+Your output must be:
+- A single line
+- ≤50 characters
+- No explanations
+</task>
+<rules>
+- you MUST use the same language as the user message you are summarizing
+- Title must be grammatically correct and read naturally - no word salad
+- Never include tool names in the title (e.g. "read tool", "bash tool", "edit tool")
+- Focus on the main topic or question the user needs to retrieve
+- Vary your phrasing - avoid repetitive patterns like always starting with "Analyzing"
+- When a file is mentioned, focus on WHAT the user wants to do WITH the file, not just that they shared it
+- Keep exact: technical terms, numbers, filenames, HTTP codes
+- Remove: the, this, my, a, an
+- Never assume tech stack
+- Never use tools
+- NEVER respond to questions, just generate a title for the conversation
+- The title should NEVER include "summarizing" or "generating" when generating a title
+- DO NOT SAY YOU CANNOT GENERATE A TITLE OR COMPLAIN ABOUT THE INPUT
+- Always output something meaningful, even if the input is minimal.
+- If the user message is short or conversational (e.g. "hello", "lol", "what's up", "hey"):
+  → create a title that reflects the user's tone or intent (such as Greeting, Quick check-in, Light chat, Intro message, etc.)
+</rules>
+<examples>
+"debug 500 errors in production" → Debugging production 500 errors
+"refactor user service" → Refactoring user service
+"why is app.js failing" → app.js failure investigation
+"implement rate limiting" → Rate limiting implementation
+"how do I connect postgres to my API" → Postgres API connection
+"best practices for React hooks" → React hooks best practices
+"@src/auth.ts can you add refresh token support" → Auth refresh token support
+"@utils/parser.ts this is broken" → Parser bug fix
+"look at @config.json" → Config review
+"@App.tsx add dark mode toggle" → Dark mode toggle in App
+</examples>
+```
+**User prompt**
+```
+If there were 200 students who passed an English course three years ago, and each subsequent year until the current one that number increased by 50% of the previous year's number, how many students will pass the course this year?
+```
+**Assistant response**
+```
+Student course passing growth calculation
+```
+## Model Details
+- Base Model: `unsloth/gemma-3-270m-it`
+- Parameter Count: 268,098,176
+- Precision: torch.bfloat16
+## Training Settings
+### PEFT
+- Rank: 32
+- LoRA alpha: 64
+- Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
+- Gradient checkpointing: unsloth
+### SFT
+- Epoch: 1
+- Batch size: 8
+- Gradient Accumulation steps: 2
+- Learning rate: 0.0002
+- Optimizer: adamw_torch_fused
+- Learning rate scheduler: cosine
+- Warmup steps: 100
+- Weight decay: 0.01
+## Training stats
+- Date: 2026-05-27T10:31:30.454060
+- GPU: NVIDIA A100-SXM4-40GB
+- Peak VRAM usage: 12.152 GB
+- Global step: 1086
+- Training runtime (seconds): 1140.8033
+- Average training loss: 1.5016714283994108
+- Best validation loss: 1.2990461587905884
+| Step | Training Loss | Validation Loss |
+|------|---------------|-----------------|
+| 54   | 1.634000      | 1.755292        |
+| 108  | 1.783900      | 1.585450        |
+| 162  | 1.609100      | 1.580338        |
+| 216  | 1.534900      | 1.548727        |
+| 270  | 1.485100      | 1.522543        |
+| 324  | 1.549500      | 1.483723        |
+| 378  | 1.512200      | 1.459690        |
+| 432  | 1.451300      | 1.432863        |
+| 486  | 1.502300      | 1.439751        |
+| 540  | 1.376700      | 1.425881        |
+| 594  | 1.442000      | 1.390692        |
+| 648  | 1.365000      | 1.359873        |
+| 702  | 1.334400      | 1.337866        |
+| 756  | 1.376700      | 1.324850        |
+| 810  | 1.355800      | 1.325707        |
+| 864  | 1.327700      | 1.317618        |
+| 918  | 1.423100      | 1.310045        |
+| 972  | 1.400300      | 1.303569        |
+| 1026 | 1.257700      | 1.299046        |
+| 1080 | 1.278000      | 1.299577        |
+## Framework versions
+- Unsloth: 2026.5.8
+- TRL: 0.22.2
+- Transformers: 4.56.2
+- Pytorch: 2.11.0+cu128
+- Datasets: 4.8.5
+- Tokenizers: 0.22.2
+## License
+This model is released under the Gemma license. See the [Gemma Terms of Use](https://ai.google.dev/gemma/terms) and [Prohibited Use Policy](https://policies.google.com/terms/generative-ai/use-policy) regarding the use of Gemma-generated content.
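The README's training settings and statistics are consistent with each other, which a few lines of arithmetic can confirm. This sketch assumes the standard `alpha / rank` LoRA scaling (rsLoRA off) and a Hugging Face-style cosine schedule with linear warmup; the `(step, learning_rate)` pairs are copied from train/log.json. In that log, the rate reported at step N matches the schedule evaluated at N - 1, and `epoch` at step 10 equals 10 / 1086, so one epoch is 1086 optimizer steps. With an effective batch of 8 × 2 = 16, that is at most 17,376 training sequences, in line with the "~17K examples" figure.

```python
import math

# Values copied from the README's "Training Settings" and "Training stats".
BATCH_SIZE = 8
GRAD_ACCUM = 2
GLOBAL_STEPS = 1086
PEAK_LR = 2e-4
WARMUP_STEPS = 100
LORA_RANK = 32
LORA_ALPHA = 64

effective_batch = BATCH_SIZE * GRAD_ACCUM       # sequences per optimizer step
examples_seen = effective_batch * GLOBAL_STEPS  # upper bound; the last batch may be partial
lora_scaling = LORA_ALPHA / LORA_RANK           # standard LoRA scaling (assumes no rsLoRA)


def cosine_with_warmup(step, peak_lr=PEAK_LR, warmup=WARMUP_STEPS, total=GLOBAL_STEPS):
    """Linear warmup to peak_lr, then a half-cosine decay to zero at `total`."""
    if step < warmup:
        return peak_lr * (step / max(1, warmup))
    progress = (step - warmup) / max(1, total - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# (logged step, logged learning_rate) pairs copied from train/log.json.
LOGGED = [
    (10, 1.8e-05),
    (100, 0.00019800000000000002),
    (110, 0.0001999588877563566),
    (500, 0.00012950451270781727),
    (890, 1.9060860970202955e-05),
]

for step, lr in LOGGED:
    predicted = cosine_with_warmup(step - 1)  # logged value lags the step counter by one
    print(f"step {step:4d}: logged {lr:.6e}  predicted {predicted:.6e}")
print(effective_batch, examples_seen, lora_scaling)
```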

added_tokens.json ADDED

	@@ -0,0 +1,3 @@

+{
+  "<image_soft_token>": 262144
+}

chat_template.jinja ADDED

	@@ -0,0 +1,50 @@

+{# Unsloth Chat template fixes #}
+{{ bos_token }}
+{%- if messages[0]['role'] == 'system' -%}
+    {%- if messages[0]['content'] is string -%}
+        {%- set first_user_prefix = messages[0]['content'] + '
+' -%}
+    {%- else -%}
+        {%- set first_user_prefix = messages[0]['content'][0]['text'] + '
+' -%}
+    {%- endif -%}
+    {%- set loop_messages = messages[1:] -%}
+{%- else -%}
+    {%- set first_user_prefix = "" -%}
+    {%- set loop_messages = messages -%}
+{%- endif -%}
+{%- for message in loop_messages -%}
+    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
+        {{ raise_exception("Conversation roles must alternate user/assistant/user/assistant/...") }}
+    {%- endif -%}
+    {%- if (message['role'] == 'assistant') -%}
+        {%- set role = "model" -%}
+    {%- else -%}
+        {%- set role = message['role'] -%}
+    {%- endif -%}
+    {{ '<start_of_turn>' + role + '
+' + (first_user_prefix if loop.first else "") }}
+    {%- if message['content'] is string -%}
+        {{ message['content'] | trim }}
+    {%- elif message['content'] is iterable -%}
+        {%- for item in message['content'] -%}
+            {%- if item['type'] == 'image' -%}
+                {{ '<start_of_image>' }}
+            {%- elif item['type'] == 'text' -%}
+                {{ item['text'] | trim }}
+            {%- endif -%}
+        {%- endfor -%}
+    {%- elif message['content'] is defined -%}
+        {{ raise_exception("Invalid content type") }}
+    {%- endif -%}
+    {{ '<end_of_turn>
+' }}
+{%- endfor -%}
+{%- if add_generation_prompt -%}
+    {{'<start_of_turn>model
+'}}
+{%- endif -%}
+{# Copyright 2025-present Unsloth. Apache 2.0 License. #}
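The template's main quirk is that Gemma 3 has no dedicated system turn: a leading system message is folded into the first user turn (followed by a newline), and the `assistant` role is renamed to `model`. The following pure-Python mirror reproduces that logic for plain-string content only; image and list-of-parts content are omitted. `render_gemma3_prompt` is a hypothetical name, and the function is meant for inspecting prompts, not as a replacement for `tokenizer.apply_chat_template`.

```python
def render_gemma3_prompt(messages, add_generation_prompt=True, bos_token="<bos>"):
    """Pure-Python mirror of chat_template.jinja for plain-string message content."""
    if messages and messages[0]["role"] == "system":
        # No system turn in Gemma 3: prepend the system text to the first user turn.
        first_user_prefix = messages[0]["content"] + "\n"
        loop_messages = messages[1:]
    else:
        first_user_prefix = ""
        loop_messages = messages

    parts = [bos_token]
    for index, message in enumerate(loop_messages):
        # Same check as the template: even positions must be user turns.
        if (message["role"] == "user") != (index % 2 == 0):
            raise ValueError("Conversation roles must alternate user/assistant/user/assistant/...")
        role = "model" if message["role"] == "assistant" else message["role"]
        prefix = first_user_prefix if index == 0 else ""
        # Jinja's `trim` filter strips surrounding whitespace, like str.strip().
        parts.append(f"<start_of_turn>{role}\n{prefix}{message['content'].strip()}<end_of_turn>\n")
    if add_generation_prompt:
        parts.append("<start_of_turn>model\n")
    return "".join(parts)
```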

config.json ADDED

	@@ -0,0 +1,55 @@

+{
+  "_sliding_window_pattern": 6,
+  "architectures": [
+    "Gemma3ForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "attn_logit_softcapping": null,
+  "bos_token_id": 2,
+  "dtype": "bfloat16",
+  "eos_token_id": 106,
+  "final_logit_softcapping": null,
+  "head_dim": 256,
+  "hidden_activation": "gelu_pytorch_tanh",
+  "hidden_size": 640,
+  "initializer_range": 0.02,
+  "intermediate_size": 2048,
+  "layer_types": [
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention"
+  ],
+  "max_position_embeddings": 32768,
+  "model_type": "gemma3_text",
+  "num_attention_heads": 4,
+  "num_hidden_layers": 18,
+  "num_key_value_heads": 1,
+  "pad_token_id": 0,
+  "query_pre_attn_scalar": 256,
+  "rms_norm_eps": 1e-06,
+  "rope_local_base_freq": 10000.0,
+  "rope_scaling": null,
+  "rope_theta": 1000000.0,
+  "sliding_window": 512,
+  "transformers_version": "4.56.2",
+  "unsloth_fixed": true,
+  "use_bidirectional_attention": false,
+  "use_cache": true,
+  "vocab_size": 262144
+}
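The `layer_types` list follows from `_sliding_window_pattern: 6`: every sixth layer uses full attention and the other five use a 512-token sliding window, giving 3 global layers out of 18. With `num_key_value_heads: 1` and `head_dim: 256`, each layer caches one 256-value key and one 256-value value per token, or 1 KiB in bfloat16. The sketch below regenerates the list and gives an idealised KV-cache estimate, assuming sliding layers retain at most `sliding_window` positions; real runtimes add padding and allocation overhead, so treat the numbers as rough.

```python
# Values copied from config.json above.
NUM_HIDDEN_LAYERS = 18
SLIDING_WINDOW_PATTERN = 6
SLIDING_WINDOW = 512
NUM_KV_HEADS = 1
HEAD_DIM = 256
BYTES_PER_VALUE = 2  # bfloat16


def layer_types(num_layers=NUM_HIDDEN_LAYERS, pattern=SLIDING_WINDOW_PATTERN):
    """Every `pattern`-th layer uses full attention; the rest use a sliding window."""
    return [
        "full_attention" if (i + 1) % pattern == 0 else "sliding_attention"
        for i in range(num_layers)
    ]


def kv_cache_bytes(seq_len, types=None):
    """Idealised K+V cache size for one sequence of `seq_len` tokens."""
    types = types if types is not None else layer_types()
    per_token_per_layer = 2 * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE  # K and V
    total = 0
    for t in types:
        cached = seq_len if t == "full_attention" else min(seq_len, SLIDING_WINDOW)
        total += cached * per_token_per_layer
    return total


print(layer_types().count("full_attention"))
print(kv_cache_bytes(4096), kv_cache_bytes(4096, ["full_attention"] * NUM_HIDDEN_LAYERS))
```

At 4,096 tokens the hybrid layout needs about 19.5 MiB of cache versus about 72 MiB if all 18 layers attended globally, which is the motivation for mixing the two layer types.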

generation_config.json ADDED

	@@ -0,0 +1,14 @@

+{
+  "bos_token_id": 2,
+  "cache_implementation": "hybrid",
+  "do_sample": true,
+  "eos_token_id": [
+    1,
+    106
+  ],
+  "max_length": 32768,
+  "pad_token_id": 0,
+  "top_k": 64,
+  "top_p": 0.95,
+  "transformers_version": "4.56.2"
+}
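`generation_config.json` enables sampling by default (`do_sample: true`) with `top_k: 64` and `top_p: 0.95`, and stops on token id 1 or 106 (106 is `<end_of_turn>`, per config.json and special_tokens_map.json). The function below is a simplified pure-Python illustration of how the two filters combine: keep the 64 most likely tokens, renormalise, then keep the smallest prefix whose cumulative probability reaches 0.95. It is not the transformers implementation, and the probability dictionary in the usage is a toy input. For a title generator, passing `do_sample=False` to `generate` may be preferable when reproducible titles matter.

```python
import random


def top_k_top_p_filter(probs, top_k=64, top_p=0.95):
    """Simplified filter over a {token: probability} dict: top-k, then nucleus (top-p)."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    k_total = sum(p for _, p in ranked)
    ranked = [(tok, p / k_total) for tok, p in ranked]  # renormalise survivors

    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:  # smallest prefix reaching top_p
            break

    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


def sample(probs, rng, top_k=64, top_p=0.95):
    """Draw one token from the filtered, renormalised distribution."""
    filtered = top_k_top_p_filter(probs, top_k, top_p)
    tokens = list(filtered)
    return rng.choices(tokens, weights=[filtered[t] for t in tokens], k=1)[0]


toy = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
print(top_k_top_p_filter(toy, top_k=3, top_p=0.8))
print(sample(toy, random.Random(0), top_k=3, top_p=0.8))
```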

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:79a0631e78cdf084899625dd6104d857efe7a866e624bc028c23e1f2b1ec6fbe
+size 536223056

special_tokens_map.json ADDED

	@@ -0,0 +1,33 @@

+{
+  "boi_token": "<start_of_image>",
+  "bos_token": {
+    "content": "<bos>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eoi_token": "<end_of_image>",
+  "eos_token": {
+    "content": "<end_of_turn>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "image_token": "<image_soft_token>",
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4667f2089529e8e7657cfb6d1c19910ae71ff5f28aa7ab2ff2763330affad795
+size 33384568

tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1299c11d7cf632ef3b4e11937501358ada021bbdf7c47638d13c0ee982f2e79c
+size 4689074

tokenizer_config.json ADDED

The diff for this file is too large to render.

train/log.json ADDED

	@@ -0,0 +1,927 @@

+[
+  {
+    "loss": 4.5928,
+    "grad_norm": 26.88880729675293,
+    "learning_rate": 1.8e-05,
+    "epoch": 0.009208103130755065,
+    "step": 10
+  },
+  {
+    "loss": 2.6528,
+    "grad_norm": 17.780437469482422,
+    "learning_rate": 3.8e-05,
+    "epoch": 0.01841620626151013,
+    "step": 20
+  },
+  {
+    "loss": 2.0058,
+    "grad_norm": 11.103266716003418,
+    "learning_rate": 5.8e-05,
+    "epoch": 0.027624309392265192,
+    "step": 30
+  },
+  {
+    "loss": 1.933,
+    "grad_norm": 9.816450119018555,
+    "learning_rate": 7.800000000000001e-05,
+    "epoch": 0.03683241252302026,
+    "step": 40
+  },
+  {
+    "loss": 1.634,
+    "grad_norm": 10.19833755493164,
+    "learning_rate": 9.8e-05,
+    "epoch": 0.04604051565377532,
+    "step": 50
+  },
+  {
+    "eval_loss": 1.755292296409607,
+    "eval_runtime": 29.156,
+    "eval_samples_per_second": 12.176,
+    "eval_steps_per_second": 3.053,
+    "epoch": 0.049723756906077346,
+    "step": 54
+  },
+  {
+    "loss": 1.8678,
+    "grad_norm": 10.386731147766113,
+    "learning_rate": 0.000118,
+    "epoch": 0.055248618784530384,
+    "step": 60
+  },
+  {
+    "loss": 1.8065,
+    "grad_norm": 9.946635246276855,
+    "learning_rate": 0.000138,
+    "epoch": 0.06445672191528545,
+    "step": 70
+  },
+  {
+    "loss": 1.662,
+    "grad_norm": 7.8996901512146,
+    "learning_rate": 0.00015800000000000002,
+    "epoch": 0.07366482504604052,
+    "step": 80
+  },
+  {
+    "loss": 1.7389,
+    "grad_norm": 9.132104873657227,
+    "learning_rate": 0.00017800000000000002,
+    "epoch": 0.08287292817679558,
+    "step": 90
+  },
+  {
+    "loss": 1.7839,
+    "grad_norm": 6.549439430236816,
+    "learning_rate": 0.00019800000000000002,
+    "epoch": 0.09208103130755065,
+    "step": 100
+  },
+  {
+    "eval_loss": 1.5854495763778687,
+    "eval_runtime": 9.738,
+    "eval_samples_per_second": 36.455,
+    "eval_steps_per_second": 9.139,
+    "epoch": 0.09944751381215469,
+    "step": 108
+  },
+  {
+    "loss": 1.6674,
+    "grad_norm": 6.268077373504639,
+    "learning_rate": 0.0001999588877563566,
+    "epoch": 0.10128913443830571,
+    "step": 110
+  },
+  {
+    "loss": 1.7546,
+    "grad_norm": 6.57451868057251,
+    "learning_rate": 0.0001998168147576911,
+    "epoch": 0.11049723756906077,
+    "step": 120
+  },
+  {
+    "loss": 1.7248,
+    "grad_norm": 7.216021537780762,
+    "learning_rate": 0.00019957341762950344,
+    "epoch": 0.11970534069981584,
+    "step": 130
+  },
+  {
+    "loss": 1.5602,
+    "grad_norm": 8.38607406616211,
+    "learning_rate": 0.00019922894344441571,
+    "epoch": 0.1289134438305709,
+    "step": 140
+  },
+  {
+    "loss": 1.4979,
+    "grad_norm": 5.186763286590576,
+    "learning_rate": 0.0001987837418784522,
+    "epoch": 0.13812154696132597,
+    "step": 150
+  },
+  {
+    "loss": 1.6091,
+    "grad_norm": 5.746674060821533,
+    "learning_rate": 0.0001982382648560832,
+    "epoch": 0.14732965009208104,
+    "step": 160
+  },
+  {
+    "eval_loss": 1.5803377628326416,
+    "eval_runtime": 9.8217,
+    "eval_samples_per_second": 36.145,
+    "eval_steps_per_second": 9.062,
+    "epoch": 0.14917127071823205,
+    "step": 162
+  },
+  {
+    "loss": 1.5429,
+    "grad_norm": 4.997202396392822,
+    "learning_rate": 0.00019759306609147596,
+    "epoch": 0.15653775322283608,
+    "step": 170
+  },
+  {
+    "loss": 1.748,
+    "grad_norm": 6.101498603820801,
+    "learning_rate": 0.00019684880052641917,
+    "epoch": 0.16574585635359115,
+    "step": 180
+  },
+  {
+    "loss": 1.4566,
+    "grad_norm": 6.071585655212402,
+    "learning_rate": 0.0001960062236654908,
+    "epoch": 0.17495395948434622,
+    "step": 190
+  },
+  {
+    "loss": 1.7097,
+    "grad_norm": 4.875613689422607,
+    "learning_rate": 0.000195066190809145,
+    "epoch": 0.1841620626151013,
+    "step": 200
+  },
+  {
+    "loss": 1.5349,
+    "grad_norm": 7.088517665863037,
+    "learning_rate": 0.00019402965618549575,
+    "epoch": 0.19337016574585636,
+    "step": 210
+  },
+  {
+    "eval_loss": 1.5487271547317505,
+    "eval_runtime": 9.6971,
+    "eval_samples_per_second": 36.609,
+    "eval_steps_per_second": 9.178,
+    "epoch": 0.19889502762430938,
+    "step": 216
+  },
+  {
+    "loss": 1.6348,
+    "grad_norm": 6.564614295959473,
+    "learning_rate": 0.00019289767198167916,
+    "epoch": 0.20257826887661143,
+    "step": 220
+  },
+  {
+    "loss": 1.4628,
+    "grad_norm": 6.782557010650635,
+    "learning_rate": 0.0001916713872757776,
+    "epoch": 0.21178637200736647,
+    "step": 230
+  },
+  {
+    "loss": 1.4765,
+    "grad_norm": 4.291659355163574,
+    "learning_rate": 0.00019035204687038943,
+    "epoch": 0.22099447513812154,
+    "step": 240
+  },
+  {
+    "loss": 1.614,
+    "grad_norm": 4.750514984130859,
+    "learning_rate": 0.00018894099002902896,
+    "epoch": 0.2302025782688766,
+    "step": 250
+  },
+  {
+    "loss": 1.6338,
+    "grad_norm": 4.737514019012451,
+    "learning_rate": 0.00018743964911663893,
+    "epoch": 0.23941068139963168,
+    "step": 260
+  },
+  {
+    "loss": 1.4851,
+    "grad_norm": 4.858434677124023,
+    "learning_rate": 0.00018584954814559578,
+    "epoch": 0.24861878453038674,
+    "step": 270
+  },
+  {
+    "eval_loss": 1.5225430727005005,
+    "eval_runtime": 9.7006,
+    "eval_samples_per_second": 36.596,
+    "eval_steps_per_second": 9.175,
+    "epoch": 0.24861878453038674,
+    "step": 270
+  },
+  {
+    "loss": 1.4764,
+    "grad_norm": 5.288131237030029,
+    "learning_rate": 0.00018417230122868335,
+    "epoch": 0.2578268876611418,
+    "step": 280
+  },
+  {
+    "loss": 1.4763,
+    "grad_norm": 6.140823841094971,
+    "learning_rate": 0.00018240961094060572,
+    "epoch": 0.26703499079189685,
+    "step": 290
+  },
+  {
+    "loss": 1.4253,
+    "grad_norm": 4.705212593078613,
+    "learning_rate": 0.00018056326658970226,
+    "epoch": 0.27624309392265195,
+    "step": 300
+  },
+  {
+    "loss": 1.4912,
+    "grad_norm": 4.866296291351318,
+    "learning_rate": 0.00017863514240161932,
+    "epoch": 0.285451197053407,
+    "step": 310
+  },
+  {
+    "loss": 1.5495,
+    "grad_norm": 5.423320293426514,
+    "learning_rate": 0.00017662719561678216,
+    "epoch": 0.2946593001841621,
+    "step": 320
+  },
+  {
+    "eval_loss": 1.4837230443954468,
+    "eval_runtime": 9.6775,
+    "eval_samples_per_second": 36.683,
+    "eval_steps_per_second": 9.197,
+    "epoch": 0.2983425414364641,
+    "step": 324
+  },
+  {
+    "loss": 1.5233,
+    "grad_norm": 5.150054454803467,
+    "learning_rate": 0.00017454146450359876,
+    "epoch": 0.30386740331491713,
+    "step": 330
+  },
+  {
+    "loss": 1.4765,
+    "grad_norm": 4.734647274017334,
+    "learning_rate": 0.00017238006628941173,
+    "epoch": 0.31307550644567217,
+    "step": 340
+  },
+  {
+    "loss": 1.527,
+    "grad_norm": 6.608598232269287,
+    "learning_rate": 0.00017014519501129923,
+    "epoch": 0.32228360957642727,
+    "step": 350
+  },
+  {
+    "loss": 1.5924,
+    "grad_norm": 6.124919891357422,
+    "learning_rate": 0.00016783911928890618,
+    "epoch": 0.3314917127071823,
+    "step": 360
+  },
+  {
+    "loss": 1.5122,
+    "grad_norm": 5.096858978271484,
+    "learning_rate": 0.0001654641800215665,
+    "epoch": 0.3406998158379374,
+    "step": 370
+  },
+  {
+    "eval_loss": 1.4596896171569824,
+    "eval_runtime": 9.6761,
+    "eval_samples_per_second": 36.688,
+    "eval_steps_per_second": 9.198,
+    "epoch": 0.34806629834254144,
+    "step": 378
+  },
+  {
+    "loss": 1.5265,
+    "grad_norm": 4.963854789733887,
+    "learning_rate": 0.00016302278801205443,
+    "epoch": 0.34990791896869244,
+    "step": 380
+  },
+  {
+    "loss": 1.4015,
+    "grad_norm": 6.355607986450195,
+    "learning_rate": 0.00016051742151937655,
+    "epoch": 0.35911602209944754,
+    "step": 390
+  },
+  {
+    "loss": 1.5026,
+    "grad_norm": 4.335422039031982,
+    "learning_rate": 0.00015795062374308918,
+    "epoch": 0.3683241252302026,
+    "step": 400
+  },
+  {
+    "loss": 1.5513,
+    "grad_norm": 4.647507190704346,
+    "learning_rate": 0.00015532500024169446,
+    "epoch": 0.3775322283609576,
+    "step": 410
+  },
+  {
+    "loss": 1.4938,
+    "grad_norm": 5.171050548553467,
+    "learning_rate": 0.0001526432162877356,
+    "epoch": 0.3867403314917127,
+    "step": 420
+  },
+  {
+    "loss": 1.4513,
+    "grad_norm": 4.9742536544799805,
+    "learning_rate": 0.00014990799416227682,
+    "epoch": 0.39594843462246776,
+    "step": 430
+  },
+  {
+    "eval_loss": 1.4328629970550537,
+    "eval_runtime": 9.8993,
+    "eval_samples_per_second": 35.861,
+    "eval_steps_per_second": 8.991,
+    "epoch": 0.39779005524861877,
+    "step": 432
+  },
+  {
+    "loss": 1.5631,
+    "grad_norm": 4.611151218414307,
+    "learning_rate": 0.0001471221103915134,
+    "epoch": 0.40515653775322286,
+    "step": 440
+  },
+  {
+    "loss": 1.4699,
+    "grad_norm": 5.096474647521973,
+    "learning_rate": 0.00014428839292831801,
+    "epoch": 0.4143646408839779,
+    "step": 450
+  },
+  {
+    "loss": 1.3699,
+    "grad_norm": 4.659368515014648,
+    "learning_rate": 0.00014140971828158306,
+    "epoch": 0.42357274401473294,
+    "step": 460
+  },
+  {
+    "loss": 1.5189,
+    "grad_norm": 5.0686235427856445,
+    "learning_rate": 0.00013848900859627448,
+    "epoch": 0.43278084714548803,
+    "step": 470
+  },
+  {
+    "loss": 1.5023,
+    "grad_norm": 5.085811138153076,
+    "learning_rate": 0.00013552922868715988,
+    "epoch": 0.4419889502762431,
+    "step": 480
+  },
+  {
+    "eval_loss": 1.4397507905960083,
+    "eval_runtime": 10.0959,
+    "eval_samples_per_second": 35.163,
+    "eval_steps_per_second": 8.815,
+    "epoch": 0.44751381215469616,
+    "step": 486
+  },
+  {
+    "loss": 1.4776,
+    "grad_norm": 4.884566307067871,
+    "learning_rate": 0.00013253338302922268,
+    "epoch": 0.45119705340699817,
+    "step": 490
+  },
+  {
+    "loss": 1.4041,
+    "grad_norm": 5.0404229164123535,
+    "learning_rate": 0.00012950451270781727,
+    "epoch": 0.4604051565377532,
+    "step": 500
+  },
+  {
+    "loss": 1.4588,
+    "grad_norm": 5.768477916717529,
+    "learning_rate": 0.00012644569233166055,
+    "epoch": 0.4696132596685083,
+    "step": 510
+  },
+  {
+    "loss": 1.4851,
+    "grad_norm": 4.67483377456665,
+    "learning_rate": 0.0001233600269117943,
+    "epoch": 0.47882136279926335,
+    "step": 520
+  },
+  {
+    "loss": 1.5198,
+    "grad_norm": 4.369293212890625,
+    "learning_rate": 0.00012025064870968594,
+    "epoch": 0.4880294659300184,
+    "step": 530
+  },
+  {
+    "loss": 1.3767,
+    "grad_norm": 4.557173252105713,
+    "learning_rate": 0.00011712071405766735,
+    "epoch": 0.4972375690607735,
+    "step": 540
+  },
+  {
+    "eval_loss": 1.4258806705474854,
+    "eval_runtime": 9.9078,
+    "eval_samples_per_second": 35.83,
+    "eval_steps_per_second": 8.983,
+    "epoch": 0.4972375690607735,
+    "step": 540
+  },
+  {
+    "loss": 1.3845,
+    "grad_norm": 4.495442867279053,
+    "learning_rate": 0.00011397340015493934,
+    "epoch": 0.5064456721915286,
+    "step": 550
+  },
+  {
+    "loss": 1.3657,
+    "grad_norm": 4.659553527832031,
+    "learning_rate": 0.00011081190184239419,
+    "epoch": 0.5156537753222836,
+    "step": 560
+  },
+  {
+    "loss": 1.5381,
+    "grad_norm": 5.458863258361816,
+    "learning_rate": 0.00010763942835953012,
+    "epoch": 0.5248618784530387,
+    "step": 570
+  },
+  {
+    "loss": 1.3163,
+    "grad_norm": 4.973690986633301,
+    "learning_rate": 0.00010445920008674955,
+    "epoch": 0.5340699815837937,
+    "step": 580
+  },
+  {
+    "loss": 1.442,
+    "grad_norm": 4.4588236808776855,
+    "learning_rate": 0.00010127444527634855,
+    "epoch": 0.5432780847145487,
+    "step": 590
+  },
+  {
+    "eval_loss": 1.3906919956207275,
+    "eval_runtime": 9.8795,
+    "eval_samples_per_second": 35.933,
+    "eval_steps_per_second": 9.009,
+    "epoch": 0.5469613259668509,
+    "step": 594
+  },
+  {
+    "loss": 1.3652,
+    "grad_norm": 4.593173980712891,
+    "learning_rate": 9.808839677551511e-05,
+    "epoch": 0.5524861878453039,
+    "step": 600
+  },
+  {
+    "loss": 1.3837,
+    "grad_norm": 5.622270584106445,
+    "learning_rate": 9.490428874466344e-05,
+    "epoch": 0.5616942909760589,
+    "step": 610
+  },
+  {
+    "loss": 1.4205,
+    "grad_norm": 3.932429552078247,
+    "learning_rate": 9.172535337443507e-05,
+    "epoch": 0.570902394106814,
+    "step": 620
+  },
+  {
+    "loss": 1.4007,
+    "grad_norm": 4.891452312469482,
+    "learning_rate": 8.855481760469961e-05,
+    "epoch": 0.580110497237569,
+    "step": 630
+  },
+  {
+    "loss": 1.365,
+    "grad_norm": 5.47842264175415,
+    "learning_rate": 8.539589984888534e-05,
+    "epoch": 0.5893186003683242,
+    "step": 640
+  },
+  {
+    "eval_loss": 1.3598726987838745,
+    "eval_runtime": 9.786,
+    "eval_samples_per_second": 36.276,
+    "eval_steps_per_second": 9.095,
+    "epoch": 0.5966850828729282,
+    "step": 648
+  },
+  {
+    "loss": 1.4359,
+    "grad_norm": 4.814668655395508,
+    "learning_rate": 8.225180672696527e-05,
+    "epoch": 0.5985267034990792,
+    "step": 650
+  },
+  {
+    "loss": 1.4049,
+    "grad_norm": 4.641814231872559,
+    "learning_rate": 7.912572981041448e-05,
+    "epoch": 0.6077348066298343,
+    "step": 660
+  },
+  {
+    "loss": 1.4101,
+    "grad_norm": 5.327591419219971,
+    "learning_rate": 7.602084238244338e-05,
+    "epoch": 0.6169429097605893,
+    "step": 670
+  },
+  {
+    "loss": 1.3731,
+    "grad_norm": 4.462182998657227,
+    "learning_rate": 7.294029621679532e-05,
+    "epoch": 0.6261510128913443,
+    "step": 680
+  },
+  {
+    "loss": 1.3291,
+    "grad_norm": 4.188094139099121,
+    "learning_rate": 6.98872183783787e-05,
+    "epoch": 0.6353591160220995,
+    "step": 690
+  },
+  {
+    "loss": 1.3344,
+    "grad_norm": 4.551364898681641,
+    "learning_rate": 6.68647080489805e-05,
+    "epoch": 0.6445672191528545,
+    "step": 700
+  },
+  {
+    "eval_loss": 1.3378660678863525,
+    "eval_runtime": 9.8503,
+    "eval_samples_per_second": 36.039,
+    "eval_steps_per_second": 9.035,
+    "epoch": 0.6464088397790055,
+    "step": 702
+  },
+  {
+    "loss": 1.2724,
+    "grad_norm": 4.210637092590332,
+    "learning_rate": 6.387583338128471e-05,
+    "epoch": 0.6537753222836096,
+    "step": 710
+  },
+  {
+    "loss": 1.3625,
+    "grad_norm": 4.314669609069824,
+    "learning_rate": 6.092362838438772e-05,
+    "epoch": 0.6629834254143646,
+    "step": 720
+  },
+  {
+    "loss": 1.3277,
+    "grad_norm": 5.252039432525635,
+    "learning_rate": 5.801108984397354e-05,
+    "epoch": 0.6721915285451197,
+    "step": 730
+  },
+  {
+    "loss": 1.2975,
+    "grad_norm": 3.6448535919189453,
+    "learning_rate": 5.514117428027394e-05,
+    "epoch": 0.6813996316758748,
+    "step": 740
+  },
+  {
+    "loss": 1.3767,
+    "grad_norm": 4.960095405578613,
+    "learning_rate": 5.2316794946902533e-05,
+    "epoch": 0.6906077348066298,
+    "step": 750
+  },
+  {
+    "eval_loss": 1.324850082397461,
+    "eval_runtime": 10.0053,
+    "eval_samples_per_second": 35.481,
+    "eval_steps_per_second": 8.895,
+    "epoch": 0.6961325966850829,
+    "step": 756
+  },
+  {
+    "loss": 1.4546,
+    "grad_norm": 4.837981224060059,
+    "learning_rate": 4.954081887360873e-05,
+    "epoch": 0.6998158379373849,
+    "step": 760
+  },
+  {
+    "loss": 1.4597,
+    "grad_norm": 5.846377372741699,
+    "learning_rate": 4.681606395595325e-05,
+    "epoch": 0.7090239410681399,
+    "step": 770
+  },
+  {
+    "loss": 1.2981,
+    "grad_norm": 4.527792930603027,
+    "learning_rate": 4.4145296094860366e-05,
+    "epoch": 0.7182320441988951,
+    "step": 780
+  },
+  {
+    "loss": 1.3727,
+    "grad_norm": 4.371113300323486,
+    "learning_rate": 4.153122638894952e-05,
+    "epoch": 0.7274401473296501,
+    "step": 790
+  },
+  {
+    "loss": 1.3613,
+    "grad_norm": 3.6618120670318604,
+    "learning_rate": 3.8976508382496745e-05,
+    "epoch": 0.7366482504604052,
+    "step": 800
+  },
+  {
+    "loss": 1.3558,
+    "grad_norm": 4.440310478210449,
+    "learning_rate": 3.648373537182001e-05,
+    "epoch": 0.7458563535911602,
+    "step": 810
+  },
+  {
+    "eval_loss": 1.325706958770752,
+    "eval_runtime": 9.8213,
+    "eval_samples_per_second": 36.146,
+    "eval_steps_per_second": 9.062,
+    "epoch": 0.7458563535911602,
+    "step": 810
+  },
+  {
+    "loss": 1.289,
+    "grad_norm": 4.077640533447266,
+    "learning_rate": 3.40554377728219e-05,
+    "epoch": 0.7550644567219152,
+    "step": 820
+  },
+  {
+    "loss": 1.3105,
+    "grad_norm": 4.2821044921875,
+    "learning_rate": 3.1694080552362224e-05,
+    "epoch": 0.7642725598526704,
+    "step": 830
+  },
+  {
+    "loss": 1.2668,
+    "grad_norm": 4.420470714569092,
+    "learning_rate": 2.9402060726068492e-05,
+    "epoch": 0.7734806629834254,
+    "step": 840
+  },
+  {
+    "loss": 1.3542,
+    "grad_norm": 4.674584865570068,
+    "learning_rate": 2.7181704925123075e-05,
+    "epoch": 0.7826887661141805,
+    "step": 850
+  },
+  {
+    "loss": 1.3277,
+    "grad_norm": 4.855812072753906,
+    "learning_rate": 2.5035267034498243e-05,
+    "epoch": 0.7918968692449355,
+    "step": 860
+  },
+  {
+    "eval_loss": 1.317618489265442,
+    "eval_runtime": 9.7915,
+    "eval_samples_per_second": 36.256,
+    "eval_steps_per_second": 9.089,
+    "epoch": 0.7955801104972375,
+    "step": 864
+  },
+  {
+    "loss": 1.4031,
+    "grad_norm": 4.8453803062438965,
+    "learning_rate": 2.296492590503564e-05,
+    "epoch": 0.8011049723756906,
+    "step": 870
+  },
+  {
+    "loss": 1.3857,
+    "grad_norm": 4.110125541687012,
+    "learning_rate": 2.0972783141692898e-05,
+    "epoch": 0.8103130755064457,
+    "step": 880
+  },
+  {
+    "loss": 1.3671,
+    "grad_norm": 4.354367256164551,
+    "learning_rate": 1.9060860970202955e-05,
+    "epoch": 0.8195211786372008,
+    "step": 890
+  },
+  {
+    "loss": 1.3786,
+    "grad_norm": 4.151617050170898,
+    "learning_rate": 1.7231100184310956e-05,
+    "epoch": 0.8287292817679558,
+    "step": 900
+  },
+  {
+    "loss": 1.4231,
+    "grad_norm": 4.310825347900391,
+    "learning_rate": 1.5485358175672927e-05,
+    "epoch": 0.8379373848987108,
+    "step": 910
+  },
+  {
+    "eval_loss": 1.3100452423095703,
+    "eval_runtime": 9.6199,
+    "eval_samples_per_second": 36.903,
+    "eval_steps_per_second": 9.252,
+    "epoch": 0.8453038674033149,
+    "step": 918
+  },
+  {
+    "loss": 1.3932,
+    "grad_norm": 4.235050678253174,
+    "learning_rate": 1.382540704841604e-05,
+    "epoch": 0.8471454880294659,
+    "step": 920
+  },
+  {
+    "loss": 1.3509,
+    "grad_norm": 4.722878456115723,
+    "learning_rate": 1.2252931820274327e-05,
+    "epoch": 0.856353591160221,
+    "step": 930
+  },
+  {
+    "loss": 1.351,
+    "grad_norm": 5.233123302459717,
+    "learning_rate": 1.0769528712125731e-05,
+    "epoch": 0.8655616942909761,
+    "step": 940
+  },
+  {
+    "loss": 1.2573,
+    "grad_norm": 4.566328048706055,
+    "learning_rate": 9.376703527667063e-06,
+    "epoch": 0.8747697974217311,
+    "step": 950
+  },
+  {
+    "loss": 1.3823,
+    "grad_norm": 5.684493064880371,
+    "learning_rate": 8.075870124871353e-06,
+    "epoch": 0.8839779005524862,
+    "step": 960
+  },
+  {
+    "loss": 1.4003,
+    "grad_norm": 4.066319942474365,
+    "learning_rate": 6.868348980779593e-06,
+    "epoch": 0.8931860036832413,
+    "step": 970
+  },
+  {
+    "eval_loss": 1.303568720817566,
+    "eval_runtime": 9.8347,
+    "eval_samples_per_second": 36.097,
+    "eval_steps_per_second": 9.05,
+    "epoch": 0.8950276243093923,
+    "step": 972
+  },
+  {
+    "loss": 1.3237,
+    "grad_norm": 3.711812734603882,
+    "learning_rate": 5.7553658510833945e-06,
+    "epoch": 0.9023941068139963,
+    "step": 980
+  },
+  {
+    "loss": 1.3512,
+    "grad_norm": 4.245476722717285,
+    "learning_rate": 4.738050525859317e-06,
+    "epoch": 0.9116022099447514,
+    "step": 990
+  },
+  {
+    "loss": 1.3341,
+    "grad_norm": 3.881774425506592,
+    "learning_rate": 3.817435682718096e-06,
+    "epoch": 0.9208103130755064,
+    "step": 1000
+  },
+  {
+    "loss": 1.3007,
+    "grad_norm": 5.654109477996826,
+    "learning_rate": 2.994455838532828e-06,
+    "epoch": 0.9300184162062615,
+    "step": 1010
+  },
+  {
+    "loss": 1.2577,
+    "grad_norm": 4.331023693084717,
+    "learning_rate": 2.269946400810041e-06,
+    "epoch": 0.9392265193370166,
+    "step": 1020
+  },
+  {
+    "eval_loss": 1.2990461587905884,
+    "eval_runtime": 9.5776,
+    "eval_samples_per_second": 37.066,
+    "eval_steps_per_second": 9.293,
+    "epoch": 0.9447513812154696,
+    "step": 1026
+  },
+  {
+    "loss": 1.3371,
+    "grad_norm": 4.437676906585693,
+    "learning_rate": 1.644642819666886e-06,
+    "epoch": 0.9484346224677717,
+    "step": 1030
+  },
+  {
+    "loss": 1.3207,
+    "grad_norm": 6.489171981811523,
+    "learning_rate": 1.119179841275131e-06,
+    "epoch": 0.9576427255985267,
+    "step": 1040
+  },
+  {
+    "loss": 1.3583,
+    "grad_norm": 4.911295413970947,
+    "learning_rate": 6.940908635298283e-07,
+    "epoch": 0.9668508287292817,
+    "step": 1050
+  },
+  {
+    "loss": 1.2529,
+    "grad_norm": 4.163908004760742,
+    "learning_rate": 3.6980739459665517e-07,
+    "epoch": 0.9760589318600368,
+    "step": 1060
+  },
+  {
+    "loss": 1.4483,
+    "grad_norm": 5.159338474273682,
+    "learning_rate": 1.4665861488761813e-07,
+    "epoch": 0.9852670349907919,
+    "step": 1070
+  },
+  {
+    "loss": 1.278,
+    "grad_norm": 4.065121173858643,
+    "learning_rate": 2.4871042909768715e-08,
+    "epoch": 0.994475138121547,
+    "step": 1080
+  },
+  {
+    "eval_loss": 1.2995774745941162,
+    "eval_runtime": 9.8586,
+    "eval_samples_per_second": 36.009,
+    "eval_steps_per_second": 9.028,
+    "epoch": 0.994475138121547,
+    "step": 1080
+  },
+  {
+    "train_runtime": 1140.8033,
+    "train_samples_per_second": 15.228,
+    "train_steps_per_second": 0.952,
+    "total_flos": 7629500232960000.0,
+    "train_loss": 1.5016714283994108,
+    "epoch": 1.0,
+    "step": 1086
+  }
+]
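The `learning_rate` values logged above fit a linear warmup followed by cosine decay to zero. The sketch below checks that reading against a few of the logged values. The peak rate (2e-4), the 100 warmup steps, and the one-step offset between the logged `step` and the schedule index are assumptions fitted to these numbers. They are not settings recorded in this log. The schedule shape follows the documented form of `transformers.get_cosine_schedule_with_warmup`.

```python
import math

# Assumed hyperparameters, inferred by fitting the logged "learning_rate"
# values; they are NOT stated anywhere in this log.
PEAK_LR = 2e-4        # assumption
WARMUP_STEPS = 100    # assumption
TOTAL_STEPS = 1086    # final "step" in the log


def cosine_with_warmup(step: int) -> float:
    """Linear warmup, then cosine decay from PEAK_LR down to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


def logged_lr(step: int) -> float:
    # Assumption: the value logged at global step s is the rate applied
    # during that step, i.e. the schedule evaluated at index s - 1.
    return cosine_with_warmup(step - 1)


# (step, learning_rate) pairs copied verbatim from the log above.
observed = {
    710: 6.387583338128471e-05,
    760: 4.954081887360873e-05,
    1000: 3.817435682718096e-06,
    1080: 2.4871042909768715e-08,
}

for step, lr in observed.items():
    model = logged_lr(step)
    rel_err = abs(model - lr) / lr
    print(f"step {step}: logged {lr:.6e}, fitted {model:.6e}, rel. error {rel_err:.1e}")
```

If the relative errors stay tiny across all logged steps, the assumed warmup length and peak rate are probably close to the ones used for this run. A mismatch would point to a different scheduler or different settings.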

train/training_loss.png ADDED

train/validation_loss.png ADDED
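The two images are not reproduced in this diff view. The minimal sketch below shows how the series behind curves like these can be separated from the log-history array by key. It is not the script that produced the PNGs. Training records carry `loss`, evaluation records carry `eval_loss`, and the closing summary record has neither. The `split_losses` helper is hypothetical. The excerpt is a few records copied from the array above, and in practice the full file would be passed to `json.load` (its path is not shown here).

```python
import json
from typing import Dict, List, Tuple

Series = List[Tuple[int, float]]


def split_losses(log_history: List[Dict]) -> Tuple[Series, Series]:
    """Return (training, validation) loss curves as (step, loss) pairs."""
    train: Series = []
    val: Series = []
    for record in log_history:
        if "loss" in record:            # periodic training log
            train.append((record["step"], record["loss"]))
        elif "eval_loss" in record:     # evaluation log
            val.append((record["step"], record["eval_loss"]))
        # The final summary record (train_runtime, train_loss, ...) is skipped.
    return train, val


# A few records copied from the log-history array above.
excerpt = json.loads("""
[
  {"loss": 1.3583, "grad_norm": 4.911295413970947, "learning_rate": 6.940908635298283e-07, "epoch": 0.9668508287292817, "step": 1050},
  {"loss": 1.278, "grad_norm": 4.065121173858643, "learning_rate": 2.4871042909768715e-08, "epoch": 0.994475138121547, "step": 1080},
  {"eval_loss": 1.2995774745941162, "eval_runtime": 9.8586, "eval_samples_per_second": 36.009, "eval_steps_per_second": 9.028, "epoch": 0.994475138121547, "step": 1080},
  {"train_runtime": 1140.8033, "train_samples_per_second": 15.228, "train_steps_per_second": 0.952, "total_flos": 7629500232960000.0, "train_loss": 1.5016714283994108, "epoch": 1.0, "step": 1086}
]
""")

train, val = split_losses(excerpt)
best_step, best_loss = min(val, key=lambda point: point[1])
print(train)
print(val)
print(f"lowest eval_loss {best_loss:.4f} at step {best_step}")
```

Among the evaluation records visible in this part of the diff, the minimum `eval_loss` is at step 1026 (about 1.2990), marginally below the final evaluation at step 1080 (about 1.2996).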