Instructions to use kromvault/L3.1-Siithamo-v0.2-8B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use kromvault/L3.1-Siithamo-v0.2-8B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="kromvault/L3.1-Siithamo-v0.2-8B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
# This is a text-only Llama 3.1 merge, so the causal-LM auto class is the right loader.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("kromvault/L3.1-Siithamo-v0.2-8B")
model = AutoModelForCausalLM.from_pretrained("kromvault/L3.1-Siithamo-v0.2-8B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the chat template and move the resulting tensors to the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use kromvault/L3.1-Siithamo-v0.2-8B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "kromvault/L3.1-Siithamo-v0.2-8B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kromvault/L3.1-Siithamo-v0.2-8B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
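
The same request can also be sent from Python. This is a minimal sketch using only the standard library; it assumes the server started above is listening on `localhost:8000`, and the helper names `build_chat_request` and `chat` are made up for illustration. The response is read in the usual OpenAI-compatible shape (`choices[0].message.content`).

```python
import json
import urllib.request

MODEL_ID = "kromvault/L3.1-Siithamo-v0.2-8B"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text (hypothetical helper)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage, with the server running:
# print(chat("What is the capital of France?"))
```

For the SGLang server, the same helper should work with `base_url="http://localhost:30000"`.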

Use Docker

docker model run hf.co/kromvault/L3.1-Siithamo-v0.2-8B

SGLang

How to use kromvault/L3.1-Siithamo-v0.2-8B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "kromvault/L3.1-Siithamo-v0.2-8B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kromvault/L3.1-Siithamo-v0.2-8B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "kromvault/L3.1-Siithamo-v0.2-8B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kromvault/L3.1-Siithamo-v0.2-8B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use kromvault/L3.1-Siithamo-v0.2-8B with Docker Model Runner:
```
docker model run hf.co/kromvault/L3.1-Siithamo-v0.2-8B
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Second (third) time's the charm. After fighting with Formax trying to increase its max context to something that isn't 4k, I spat out this merge as a result. It still maintains a lot of v0.1's properties: creativity, literacy, and chattiness. Knowing everything I've learned making this, it's time to dive headfirst into making an L3.1 space whale.

I stg LLMs are testing me.

Quants

OG Q8 GGUF by me.

GGUFs by mradermacher

Details & Recommended Settings

Outputs a lot, pretty fucking chatty like Stheno. Pulls some chaotic creativity from Niitama, but it's mellowed out with Tamamo. Flowery, dramatic writing at times. Starts repeating at basic settings around 8k context, but DRY eliminates that, and the model can handle 32k context. Very good instruction following.

Rec. Settings:

Template: L3
Temperature: 1.4
Min P: 0.1
Repeat Penalty: 1.05
Repeat Penalty Tokens: 256
Dyn Temp: 0.9-1.05 at 0.1
Smooth Sampl: 0.18
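
For anyone unsure what Temperature and Min P do together, here is a rough sketch in plain Python. It assumes temperature is applied before the Min P cutoff, which is not true of every backend (sampler order is often configurable), and it leaves out repeat penalty, dynamic temperature, and smooth sampling entirely. The function name `min_p_filter` is made up for illustration.

```python
import math


def min_p_filter(logits, temperature=1.4, min_p=0.1):
    """Return renormalized probabilities after temperature scaling and Min P."""
    # Temperature > 1 flattens the distribution, < 1 sharpens it
    scaled = [x / temperature for x in logits]

    # Numerically stable softmax
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Min P: drop every token whose probability is below
    # min_p times the probability of the most likely token
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]

    # Renormalize the survivors so they sum to 1 again
    norm = sum(kept)
    return [p / norm for p in kept]


probs = min_p_filter([5.0, 4.0, 0.0, -5.0])
```

With these toy logits only the two strongest candidates survive the cutoff, which is how a high temperature like 1.4 can stay coherent: the tail gets flattened, then cut.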

Models Merged & Merge Theory

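The following models were included in the merge (as named in the config below):

ArliAI/ArliAI-Llama-3-8B-Formax-v1.0
gradientai/Llama-3-8B-Instruct-Gradient-1048k
Sao10K/L3.1-8B-Niitama-v1.1
Sao10K/L3-8B-Stheno-v3.3-32K
Sao10K/L3-8B-Tamamo-v1
Edgerunners/Lyraea-large-llama-3.1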

Compared to v0.1, the siithamol3.1 part stayed the same. To 'increase' the context of Formax, I just chopped off the latter half and replaced it with a ~1M context model, and that seemed to do the trick (after doing a bunch of other shit, this was the simplest and easiest route). Then I changed from dare_linear to breadcrumbs for the final merge, which gave a better output without the hassle. Again, anything TIES-based didn't work nearly as well.
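
The "chop off the latter half" step is just layer slicing. As a toy illustration (not mergekit code), treat each model as a list of 32 layers and stitch the first 16 from Formax onto the last 16 from the long-context model. The layer labels and the `passthrough_slices` helper are made up for this sketch; real passthrough merging copies the actual tensors.

```python
def passthrough_slices(slices):
    """Concatenate layer ranges from several models, in order, with no blending."""
    merged = []
    for layers, (start, end) in slices:
        # layer_range is half-open: [start, end)
        merged.extend(layers[start:end])
    return merged


# Stand-ins for the two 32-layer source models
formax = [f"formax.layer{i}" for i in range(32)]
gradient = [f"gradient1048k.layer{i}" for i in range(32)]

formax_ext = passthrough_slices([
    (formax, (0, 16)),     # first half from Formax
    (gradient, (16, 32)),  # latter half from the ~1M context model
])
```

The result still has 32 layers, so it stays shape-compatible with the other 8B models in the final merge.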

Config

slices:
- sources:
  - layer_range: [0, 16]
    model: ArliAI/ArliAI-Llama-3-8B-Formax-v1.0
- sources:
  - layer_range: [16, 32]
    model: gradientai/Llama-3-8B-Instruct-Gradient-1048k
parameters:
  int8_mask: true
merge_method: passthrough
dtype: float32
out_dtype: bfloat16
name: formax.ext
---
models:
    - model: Sao10K/L3.1-8B-Niitama-v1.1
    - model: Sao10K/L3-8B-Stheno-v3.3-32K
    - model: Sao10K/L3-8B-Tamamo-v1
base_model: Edgerunners/Lyraea-large-llama-3.1
parameters:
  normalize: false
  int8_mask: true
merge_method: model_stock
dtype: float32
out_dtype: bfloat16
name: siithamol3.1
---
models:
  - model: siithamol3.1
    parameters:
      weight: [0.5, 0.8, 0.9, 1]
      density: 0.9
      gamma: 0.01
  - model: formax.ext
    parameters:
      weight: [0.5, 0.2, 0.1, 0]
      density: 0.9
      gamma: 0.01
base_model: siithamol3.1
parameters:
  normalize: false
  int8_mask: true
merge_method: breadcrumbs
dtype: float32
out_dtype: bfloat16
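
To make the last config block easier to read, here is a rough Python sketch of two ideas in it. It is based on my understanding of mergekit, not copied from it, so treat the details as assumptions: (1) a list-valued `weight` is treated as a gradient, linearly interpolated across layer depth, and (2) breadcrumbs sparsifies each model's difference from the base by dropping the `gamma` fraction of largest-magnitude entries plus enough of the smallest so that a `density` fraction remains. The helper names `gradient_value` and `breadcrumbs_mask` are made up for illustration.

```python
def gradient_value(values, layer_idx, num_layers=32):
    """Linearly interpolate a gradient list at a given layer (assumed behavior)."""
    if len(values) == 1:
        return values[0]
    t = layer_idx / (num_layers - 1)         # 0.0 at first layer, 1.0 at last
    scaled = t * (len(values) - 1)
    i0 = min(int(scaled), len(values) - 2)   # clamp so i0 + 1 stays in range
    frac = scaled - i0
    return (1 - frac) * values[i0] + frac * values[i0 + 1]


def breadcrumbs_mask(delta, density=0.9, gamma=0.01):
    """Zero out magnitude outliers in a flat list of (model - base) differences."""
    n = len(delta)
    n_keep = int(density * n)          # entries that survive
    n_top = int(gamma * n)             # largest-magnitude entries to drop
    n_bottom = n - n_keep - n_top      # smallest-magnitude entries to drop
    order = sorted(range(n), key=lambda i: abs(delta[i]))
    kept = set(order[n_bottom:n - n_top])
    return [d if i in kept else 0.0 for i, d in enumerate(delta)]


siithamo_w = [gradient_value([0.5, 0.8, 0.9, 1], i) for i in range(32)]
formax_w = [gradient_value([0.5, 0.2, 0.1, 0], i) for i in range(32)]

# Toy delta vector with magnitudes 1..100
masked = breadcrumbs_mask([float(i) for i in range(1, 101)])
```

Under these assumptions the two weights sum to 1 at every layer: a 50/50 blend at the first layer that fades to pure siithamol3.1 at the last. Since siithamol3.1 is also the base model, its own difference from the base is zero, so in this sketch the formax.ext differences are what actually move the weights.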


Safetensors

Model size: 8B params
Tensor type: BF16

Model tree for kromvault/L3.1-Siithamo-v0.2-8B

Edgerunners/Lyraea-large-llama-3.1

OwenArli/Llama-3-8B-ArliAI-Formax-v1.0

Sao10K/L3-8B-Stheno-v3.3-32K

Sao10K/L3-8B-Tamamo-v1

Sao10K/L3.1-8B-Niitama-v1.1

gradientai/Llama-3-8B-Instruct-Gradient-1048k

Merge model

this model

Quantizations

3 models