Instructions to use kshitijthakkar/loggenix-moe-0.3B-A0.1B-e3-lr7e5-b16-4090-v5.1-finetuned with libraries, inference providers, notebooks, and local apps.

Libraries

How to use kshitijthakkar/loggenix-moe-0.3B-A0.1B-e3-lr7e5-b16-4090-v5.1-finetuned with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="kshitijthakkar/loggenix-moe-0.3B-A0.1B-e3-lr7e5-b16-4090-v5.1-finetuned")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("kshitijthakkar/loggenix-moe-0.3B-A0.1B-e3-lr7e5-b16-4090-v5.1-finetuned")
model = AutoModelForCausalLM.from_pretrained("kshitijthakkar/loggenix-moe-0.3B-A0.1B-e3-lr7e5-b16-4090-v5.1-finetuned")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use kshitijthakkar/loggenix-moe-0.3B-A0.1B-e3-lr7e5-b16-4090-v5.1-finetuned with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "kshitijthakkar/loggenix-moe-0.3B-A0.1B-e3-lr7e5-b16-4090-v5.1-finetuned"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kshitijthakkar/loggenix-moe-0.3B-A0.1B-e3-lr7e5-b16-4090-v5.1-finetuned",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
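
The same call can be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send_chat_request` are illustrative and not part of vLLM, and the response is assumed to follow the OpenAI chat-completions shape that the server advertises.

```python
import json
import urllib.request

MODEL_ID = "kshitijthakkar/loggenix-moe-0.3B-A0.1B-e3-lr7e5-b16-4090-v5.1-finetuned"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the text of the first choice.

    Assumes an OpenAI-style response: choices[0].message.content.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]


# Usage (requires the server started above to be running):
# print(send_chat_request(build_chat_request("What is the capital of France?")))
```

For the SGLang server, pass `base_url="http://localhost:30000"` instead.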

Use Docker

docker model run hf.co/kshitijthakkar/loggenix-moe-0.3B-A0.1B-e3-lr7e5-b16-4090-v5.1-finetuned

SGLang

How to use kshitijthakkar/loggenix-moe-0.3B-A0.1B-e3-lr7e5-b16-4090-v5.1-finetuned with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "kshitijthakkar/loggenix-moe-0.3B-A0.1B-e3-lr7e5-b16-4090-v5.1-finetuned" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kshitijthakkar/loggenix-moe-0.3B-A0.1B-e3-lr7e5-b16-4090-v5.1-finetuned",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "kshitijthakkar/loggenix-moe-0.3B-A0.1B-e3-lr7e5-b16-4090-v5.1-finetuned" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kshitijthakkar/loggenix-moe-0.3B-A0.1B-e3-lr7e5-b16-4090-v5.1-finetuned",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use kshitijthakkar/loggenix-moe-0.3B-A0.1B-e3-lr7e5-b16-4090-v5.1-finetuned with Docker Model Runner:
```
docker model run hf.co/kshitijthakkar/loggenix-moe-0.3B-A0.1B-e3-lr7e5-b16-4090-v5.1-finetuned
```

🧠 LoggenixMoE315M: A Mid-Sized MoE Language Model (16E2A, 4K Context)

📝 Model Card

LoggenixMoE315M is a 315M parameter Mixture-of-Experts (MoE) Causal Language Model trained from scratch on a multi-task dataset featuring root cause analysis, code generation, instruction-following, and agentic reasoning tasks.

Architecture: Transformer with Qwen3-style MoE routing and extended 4K context capability.
Parameter Count: 315M total, with 2 experts active per token (approx. 182M active).
Experts: 16 in total, selected using token-level top-2 routing.
Special Tokens: Includes custom <think> and <tool_call> tokens to support planning and tool-based interactions for agentic AI use-cases.
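
To make the top-2 routing concrete, here is a small pure-Python sketch of how a router might pick 2 of 16 experts for a single token. It is an illustration, not the model's actual implementation: the logits are made up, and renormalizing the two selected weights so they sum to 1 is an assumption of this sketch.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k=2):
    """Pick the k highest-probability experts for one token.

    Returns (expert_index, weight) pairs, highest weight first.
    Weights are renormalized over the selected experts (an assumption).
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return [(i, probs[i] / mass) for i in top]


# Hypothetical router logits for one token over 16 experts.
token_logits = [0.1 * i for i in range(16)]
selected = route_top_k(token_logits, k=2)
print(selected)
```

Only the selected experts run for that token, which is why the active parameter count is lower than the total parameter count.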

📊 Training Details

Attribute	Value
Total Params	315M
MoE Config	16 experts, top-2 gating
Dataset Type	RCA, agent reasoning, code, instruction tasks
Training Epochs	5
Effective Tokens Seen	~5 billion
Train Loss (final)	1.568
Mean Token Accuracy	~77.6%
Samples/sec	10.04
Steps/sec	0.314
Optimizer	AdamW
Scheduler	Linear Warmup + Cosine Decay
Precision	FP16 with GradScaler
Checkpoint Format	HF-compatible
Training Cost	~$108 on Modal (H200) + ~$20 for a 7× RTX 4090 system on Hyperbolic + ~$10 for synthetic data generation + ~$10 for eval, plus the cost of failed attempts
Context Length	4096
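
The "Linear Warmup + Cosine Decay" scheduler listed above can be sketched as a plain function of the step number. The peak learning rate of 7e-5 is an assumption read from the `lr7e5` fragment of the model name, and the warmup and total step counts are hypothetical; none of these values are stated in the table.

```python
import math


def lr_at_step(step, total_steps, warmup_steps, peak_lr, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        # Ramp linearly so the last warmup step reaches peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine


# Hypothetical run: 1000 steps with 100 warmup steps, assumed peak LR 7e-5.
for step in (0, 99, 100, 550, 1000):
    print(step, lr_at_step(step, total_steps=1000, warmup_steps=100, peak_lr=7e-5))
```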

📈 Standard Benchmark Results

To provide context, here is how Loggenix-MoE-0.3B compares to a well-known small open-source baseline, GPT-2 Small (124M parameters). The model is not state-of-the-art on general tasks. Few-shot prompting helps on ARC-Easy, PIQA, and Winogrande but lowers scores on BoolQ, HellaSwag, and OpenBookQA; the model performs best on its specific training domains.

Task	0-Shot	5-Shot	GPT-2 (124M)
ARC-Challenge	24%	25%	19%
ARC-Easy	30%	40%	28%
BoolQ	59%	40%	55%
GSM8K	0%	0%	0%
HellaSwag	27%	25%	28%
OpenBookQA	9%	0%	15%
PIQA	53%	70%	65%
Winogrande	52%	60%	50%

Note: The 0% score on GSM8K is expected, as the model was not trained on complex mathematical reasoning datasets. Its strengths lie in logical and text-based tasks, not symbolic manipulation.
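
The table is easier to read as per-task changes. The sketch below copies the scores from the table above and computes the 5-shot minus 0-shot difference and the unweighted mean of each column; the variable and function names are illustrative.

```python
# Scores in percent, copied from the table: (0-shot, 5-shot, GPT-2 124M).
RESULTS = {
    "ARC-Challenge": (24, 25, 19),
    "ARC-Easy": (30, 40, 28),
    "BoolQ": (59, 40, 55),
    "GSM8K": (0, 0, 0),
    "HellaSwag": (27, 25, 28),
    "OpenBookQA": (9, 0, 15),
    "PIQA": (53, 70, 65),
    "Winogrande": (52, 60, 50),
}


def few_shot_delta(results):
    """Return the 5-shot minus 0-shot score for each task."""
    return {task: five - zero for task, (zero, five, _) in results.items()}


def column_means(results):
    """Return the unweighted mean of each column across tasks."""
    columns = list(zip(*results.values()))
    return tuple(sum(col) / len(col) for col in columns)


deltas = few_shot_delta(RESULTS)
improved = sorted(t for t, d in deltas.items() if d > 0)
regressed = sorted(t for t, d in deltas.items() if d < 0)
zero_shot_mean, five_shot_mean, gpt2_mean = column_means(RESULTS)

print("improved with 5-shot:", improved)
print("regressed with 5-shot:", regressed)
print("means:", zero_shot_mean, five_shot_mean, gpt2_mean)
```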

📊 Benchmark Chart:

🤖 Synthetic Task Performance

These tasks simulate real-world infra and agent use cases, with scores reflecting performance on a custom synthetic benchmark dataset (on a scale of 0.0 to 1.0).

🧠 Chain-of-Thought Reasoning: 0.80
🔍 Log Error Detection: 0.80
🧑‍🏫 LLM Eval Reasoning: 0.80
🛠 Python Function Calling: 0.80
🧠 Think Token Trace Generation: 0.60
📉 RCA, RAG, Observability, SLI/SLO: 0.40–0.60

📊 Performance Chart:

🧪 Intended Use

✅ Suitable for:

Instruction-following & logic-based Q&A
Root cause analysis and structured summarization
Lightweight code generation and debugging
Long-context reasoning (up to 4K tokens)
Foundation for edge deployable AI agents
Tool-augmented or chain-of-thought agents via <think> / <tool_call> tokens
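
For the agent use case in the last item, generated text has to be split into reasoning, tool calls, and the final answer. The sketch below shows one way to do that. It assumes closing tags `</think>` and `</tool_call>` and a JSON payload inside `<tool_call>`; the model card does not specify either, and the sample string and tool name are hypothetical.

```python
import json
import re

# Assumed tag format: <think>...</think> and <tool_call>...</tool_call>.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_agent_output(text):
    """Split generated text into thoughts, tool calls, and the remaining answer."""
    thoughts = [m.strip() for m in THINK_RE.findall(text)]
    tool_calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            tool_calls.append(json.loads(raw))
        except json.JSONDecodeError:
            # Keep unparseable payloads so the caller can inspect them.
            tool_calls.append({"raw": raw.strip()})
    answer = TOOL_CALL_RE.sub("", THINK_RE.sub("", text)).strip()
    return {"thoughts": thoughts, "tool_calls": tool_calls, "answer": answer}


# Hypothetical model output.
sample = (
    "<think>Check the DB logs first.</think>"
    '<tool_call>{"name": "get_logs", "arguments": {"service": "db"}}</tool_call>'
    "Fetching logs."
)
parsed = parse_agent_output(sample)
print(parsed)
```

If the tags are registered as special tokens in the tokenizer, decoding with `skip_special_tokens=True` would strip them, so decode with `skip_special_tokens=False` before parsing.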

🚫 Not suitable for:

Open-domain factual QA at scale
Tasks requiring multi-modal reasoning
Safety-critical or real-time systems without human validation

🧨 Example Usage

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("kshitijthakkar/loggenix-moe-0.3B-A0.1B-e3-lr7e5-b16-4090-v5-finetuned")
model = AutoModelForCausalLM.from_pretrained("kshitijthakkar/loggenix-moe-0.3B-A0.1B-e3-lr7e5-b16-4090-v5-finetuned", device_map="auto")

# Reuse the EOS token for padding
tokenizer.pad_token = tokenizer.eos_token

messages = [
   {"role": "system", "content": ""},
   {"role": "user", "content": "Summarize the root cause of a database timeout error."}
]

# add_generation_prompt=True appends the assistant turn marker so the model answers
# the user; return_dict=True also returns the attention mask for generate().
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)  # follow the device chosen by device_map="auto"

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.6,
        top_p=0.9,
        use_cache=False
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

🔧 Expert Routing

This model employs top-2 token-level expert routing using router logits.

Routing is guided by an auxiliary loss to balance expert load and improve convergence.

Enable output_router_logits=True in your inference pipeline for advanced inspection or analysis of routing behavior.
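
The model card does not give the exact form of the auxiliary loss. As an illustration, the sketch below computes a Switch-Transformer-style load-balancing term from router logits: the number of experts times the sum over experts of (fraction of routing slots dispatched to the expert) × (mean router probability for the expert). Treat the formula choice as an assumption; the logits are made up.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def load_balancing_loss(batch_router_logits, k=2):
    """Switch-style auxiliary loss over a batch of per-token router logits.

    Equals 1.0 when dispatch and probability mass are spread evenly,
    and grows as routing collapses onto a few experts.
    """
    num_tokens = len(batch_router_logits)
    num_experts = len(batch_router_logits[0])
    dispatch_counts = [0] * num_experts
    prob_sums = [0.0] * num_experts
    for logits in batch_router_logits:
        probs = softmax(logits)
        top = sorted(range(num_experts), key=lambda i: probs[i], reverse=True)[:k]
        for i in top:
            dispatch_counts[i] += 1
        for i, p in enumerate(probs):
            prob_sums[i] += p
    dispatch_frac = [c / (num_tokens * k) for c in dispatch_counts]
    mean_prob = [s / num_tokens for s in prob_sums]
    return num_experts * sum(f * p for f, p in zip(dispatch_frac, mean_prob))


# Hypothetical 4-expert example: each token prefers a different pair of experts.
balanced_logits = [
    [5.0 if e == t else 4.0 if e == (t + 1) % 4 else 0.0 for e in range(4)]
    for t in range(4)
]
# Every token prefers experts 0 and 1.
collapsed_logits = [[5.0, 4.0, 0.0, 0.0] for _ in range(4)]

balanced = load_balancing_loss(balanced_logits, k=2)
collapsed = load_balancing_loss(collapsed_logits, k=2)
print(balanced, collapsed)
```

Adding a term like this to the training loss penalizes the collapsed case, which pushes the router toward using all experts.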

📃 License

Released under the Apache 2.0 License for both commercial and research purposes.

🙌 Acknowledgements

Trained using:

🧨 Hugging Face Transformers

💾 7× RTX 4090 (24GB VRAM) and H200 (140GB VRAM)

🧠 Custom gradient checkpointing and mixed-precision pipeline

📈 Logged via Weights & Biases

🗣️ Citation

@misc{loggenix-moe-0.3B-A0.1B-e3-lr7e5-b16-4090-v5-finetuned,
  title = {LoggenixMoE315M: A Mid-Sized Mixture-of-Experts Model with 16E2A Routing},
  author = {Kshitij Thakkar},
  year = {2025},
  url = {https://huggingface.co/kshitijthakkar/loggenix-moe-0.3B-A0.1B-e3-lr7e5-b16-4090-v5.1-finetuned},
  note = {Trained from scratch on agentic reasoning + RCA + code datasets.}
}


Safetensors: model size 0.3B params, tensor type BF16
