Veyra 30M Base 5B Checkpoint
This is an early Veyra-30M base checkpoint trained for approximately 5B pretraining tokens.
It is not instruction tuned and should not be evaluated like a finished chat assistant. It is expected to hallucinate, repeat, fail simple factual/math prompts, and continue text in odd ways. This checkpoint is uploaded for transparency, reproducibility, and milestone tracking before further continuation training. This checkpoint is not optimized for use on edge devices yet.
Training summary
Approximate training stages:
- 1B tokens: Cosmopedia v2 bootstrap pretraining.
- +1.5B tokens: mixed continuation using Cosmopedia-v2 repository configs including cosmopedia-v2, fineweb-edu-dedup, and python-edu.
- +2.5B tokens: went back to Cosmopedia v2 but increased context length from 512 to 1024.
- Total: about 5B pretraining tokens.
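As a quick check, the approximate stage sizes above add up to the stated total. The sketch below uses only figures from this card; the stage labels and variable names are illustrative, and the tokens-per-parameter ratio relies on the exact parameter count listed under Architecture.

```python
# Approximate token counts per stage, in billions, as listed above.
stages_b = {
    "cosmopedia-v2 bootstrap": 1.0,
    "mixed continuation": 1.5,
    "cosmopedia-v2, longer context": 2.5,
}

total_b = sum(stages_b.values())

# Exact parameter count from the Architecture section of this card.
params = 31_988_224
tokens_per_param = total_b * 1e9 / params

print(f"total: {total_b:.1f}B tokens")
print(f"tokens per parameter: {tokens_per_param:.0f}")
```

With these approximate figures the checkpoint has seen roughly 156 tokens per parameter.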
Architecture
Veyra-30M is a small attention-sparse decoder-only language model.
Key details:
- Exact parameters: 31,988,224 / 31.99M
- Vocabulary: 8,192 tokens
- Hidden size: 512
- Layers: 8
- Layer pattern: AMAMAMAM
  - A = attention + MLP block
  - M = MLP-only block
- Attention heads: 8 query heads, 2 KV heads
- MLP intermediate size: 2048
- Activation: SwiGLU
- Normalization: RMSNorm
- Position encoding: RoPE
- Tied token embeddings / LM head
- Context in this checkpoint: 1024 tokens
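The details above are enough to tally the parameter count by hand. The sketch below is a back-of-the-envelope check, not the repository's code. It assumes a head dimension of hidden size divided by query heads, no bias terms, and one RMSNorm weight vector before each attention and MLP sub-layer plus a final norm. None of these are stated in this card.

```python
# Figures from the list above.
vocab, hidden, inter = 8192, 512, 2048
pattern = "AMAMAMAM"  # A = attention + MLP block, M = MLP-only block
q_heads, kv_heads = 8, 2

# Assumptions (not stated in this card): head_dim = hidden / q_heads,
# no bias terms, one RMSNorm weight per sub-layer plus a final norm.
head_dim = hidden // q_heads

embed = vocab * hidden                     # shared with the LM head (tied)
attn = 2 * hidden * q_heads * head_dim     # Q and output projections
attn += 2 * hidden * kv_heads * head_dim   # K and V projections (2 KV heads)
mlp = 3 * hidden * inter                   # SwiGLU: gate, up, down

n_attn = pattern.count("A")                # blocks that include attention
n_layers = len(pattern)                    # every block has an MLP
norms = (n_attn + n_layers + 1) * hidden   # pre-attention, pre-MLP, final

total = embed + n_attn * attn + n_layers * mlp + norms
print(f"{total:,}")  # prints 31,988,224
```

Under these assumptions the tally matches the exact count given above. The MLP weights account for most of the total, since every block carries an MLP while only half carry attention.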
Loading
This repository uses custom Transformers code.
Minimal usage:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

repo = "veyra-ai/veyra-30m-base-5b-tokens"

# trust_remote_code=True is needed because the repository ships custom model code.
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True, dtype=torch.float32)
model.eval()

prompt = "Photosynthesis is the process by which"
# Raw completion prompt: encode without special tokens.
input_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")

# Sample a short continuation without tracking gradients.
with torch.no_grad():
    out = model.generate(
        input_ids,
        do_sample=True,
        temperature=0.5,
        top_k=30,
        repetition_penalty=1.15,
        no_repeat_ngram_size=2,
        max_new_tokens=80,
    )

print(tokenizer.decode(out[0], skip_special_tokens=True))
For raw completion prompts, use add_special_tokens=False.
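Because this checkpoint was trained with a 1024-token context, the prompt plus generated tokens should stay within that budget. The helper below is a hypothetical sketch, not part of the repository. It works on a plain list of token IDs and keeps the most recent tokens, which is one possible truncation policy among several.

```python
def fit_prompt(token_ids, max_new_tokens, context_len=1024):
    """Keep the most recent prompt tokens so prompt + generation fits."""
    budget = context_len - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than context_len")
    # Negative slicing returns the whole list when it is already short enough.
    return token_ids[-budget:]


# Stand-in token IDs; in practice these would come from the tokenizer.
ids = list(range(1500))
trimmed = fit_prompt(ids, max_new_tokens=80)
print(len(trimmed))  # prints 944
```

For a batched tensor of shape (1, n), such as the one returned with return_tensors="pt", the equivalent slice would be taken along the last dimension.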
Optimizer
Training used:
- CosineGatedAdam / CGA-v0 on 2D projection matrices
- AdamW on embeddings, norms, tied head, and auxiliary parameters
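The split described above can be pictured as a partition of parameters by shape and role. The sketch below is only an illustration of that reading: the parameter names and shapes are hypothetical, the rule "2D and not an embedding" is an assumption, and CosineGatedAdam itself is not shown because its implementation is not part of this card.

```python
def split_params(shapes):
    """Partition parameter names into (matrix_group, adamw_group)."""
    matrix_group, adamw_group = [], []
    for name, shape in shapes.items():
        # Assumed rule: 2D weights go to the matrix optimizer, except the
        # token embedding, which is tied to the LM head and stays on AdamW.
        is_projection = len(shape) == 2 and "embed" not in name
        (matrix_group if is_projection else adamw_group).append(name)
    return matrix_group, adamw_group


# Hypothetical parameter names and shapes, for illustration only.
shapes = {
    "embed_tokens.weight": (8192, 512),
    "layers.0.attn.q_proj.weight": (512, 512),
    "layers.0.attn.k_proj.weight": (128, 512),
    "layers.0.mlp.down_proj.weight": (512, 2048),
    "layers.0.norm.weight": (512,),
}

matrix_group, adamw_group = split_params(shapes)
print("matrix optimizer:", matrix_group)
print("AdamW:", adamw_group)
```

In a PyTorch training script, the two resulting groups would typically be handed to two separate optimizer instances that are stepped together.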
Intended use
This checkpoint is primarily for:
- continued pretraining
- research / ablations
- tracking Veyra training milestones
- testing tiny model behavior
It is not intended for production use or reliable factual answering.
Known limitations
This model can:
- hallucinate confidently
- repeat phrases
- fail arithmetic
- fail simple factual questions
- produce fake code
- continue in textbook-like or tutorial-like styles
Further continuation pretraining and post-training are planned.