🧠 LoggenixMoE315M: A Mid-Sized MoE Language Model (16E2A, 4K Context)
📝 Model Card
LoggenixMoE315M is a 315M parameter Mixture-of-Experts (MoE) Causal Language Model trained from scratch on a multi-task dataset featuring root cause analysis, code generation, instruction-following, and agentic reasoning tasks.
- Architecture: Transformer with Qwen3-style MoE routing and extended 4K context capability.
- Parameter Count: 315M total, with 2 experts active per token (approx. 182M active).
- Experts: 16 in total, selected using token-level top-2 routing.
- Special Tokens: Includes custom `<think>` and `<tool_call>` tokens to support planning and tool-based interactions for agentic AI use cases.
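As a rough illustration of token-level top-2 routing over 16 experts, the sketch below picks the two highest-probability experts for a single token and renormalizes their weights. The router logits are made up, and renormalizing over the selected pair is an assumption; the model's actual router configuration may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top2_route(router_logits):
    """Return [(expert_index, weight), (expert_index, weight)] for one token."""
    probs = softmax(router_logits)
    # Indices of the two experts with the highest router probability.
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    # Assumption: weights are renormalized so the selected pair sums to 1.
    norm = sum(probs[i] for i in top2)
    return [(i, probs[i] / norm) for i in top2]


# Hypothetical router logits for one token over 16 experts.
logits = [0.0] * 16
logits[3] = 2.0
logits[11] = 1.0

routes = top2_route(logits)
print(routes)  # experts 3 and 11, with weights that sum to 1
```

Only the two selected experts run for this token; the other 14 are skipped, which is why the active parameter count is lower than the total.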
📊 Training Details
| Attribute | Value |
|---|---|
| Total Params | 315M |
| MoE Config | 16 experts, top-2 gating |
| Dataset Type | RCA, agent reasoning, code, instruction tasks |
| Training Epochs | 5 |
| Effective Tokens Seen | ~5 billion |
| Train Loss (final) | 1.568 |
| Mean Token Accuracy | ~77.6% |
| Samples/sec | 10.04 |
| Steps/sec | 0.314 |
| Optimizer | AdamW |
| Scheduler | Linear Warmup + Cosine Decay |
| Precision | FP16 with GradScaler |
| Checkpoint Format | HF-compatible |
| Training Cost | ~$108 on Modal (H200) + ~$20 for a 7× RTX 4090 system on Hyperbolic + ~$10 for synthetic data generation + ~$10 for evaluation; the rest went to failed attempts |
| Context Length | 4096 |
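The "Linear Warmup + Cosine Decay" scheduler listed in the table can be sketched as a plain function of the step number. This is a minimal sketch, not the training code: the step counts are hypothetical, and the peak learning rate of 7e-5 is only a guess based on the `lr7e5` fragment of the model name.

```python
import math


def lr_at(step, total_steps, warmup_steps, peak_lr, min_lr=0.0):
    """Learning rate at `step` for linear warmup followed by cosine decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over the warmup phase.
        return peak_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    # Cosine curve from peak_lr down to min_lr.
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical values, chosen only for illustration.
TOTAL_STEPS = 1000
WARMUP_STEPS = 100
PEAK_LR = 7e-5  # assumption inferred from "lr7e5" in the model name

for step in (0, 50, 100, 550, 1000):
    print(step, lr_at(step, TOTAL_STEPS, WARMUP_STEPS, PEAK_LR))
```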
📈 Standard Benchmark Results
For context, the table below compares Loggenix-MoE-0.3B with a typical open-source model of a similar size (GPT-2 Small, 124M parameters). The model is not state-of-the-art on general tasks: 5-shot prompting improves ARC-Easy, PIQA, and Winogrande but lowers BoolQ, HellaSwag, and OpenBookQA. Its strongest results are on its specific training domains.
| Task | 0-Shot | 5-Shot | GPT-2 (124M) |
|---|---|---|---|
| ARC-Challenge | 24% | 25% | 19% |
| ARC-Easy | 30% | 40% | 28% |
| BoolQ | 59% | 40% | 55% |
| GSM8K | 0% | 0% | 0% |
| HellaSwag | 27% | 25% | 28% |
| OpenBookQA | 9% | 0% | 15% |
| PIQA | 53% | 70% | 65% |
| Winogrande | 52% | 60% | 50% |
Note: The 0% score on GSM8K is expected, as the model was not trained on complex mathematical reasoning datasets. Its strengths lie in logical and text-based tasks, not symbolic manipulation.
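To make the table easier to read, the short script below computes the 5-shot minus 0-shot change for each task and the gap between the better of the two scores and the GPT-2 baseline. The numbers are copied from the table above; nothing here is a new measurement.

```python
# Scores from the benchmark table, in percent: (0-shot, 5-shot, GPT-2 124M).
scores = {
    "ARC-Challenge": (24, 25, 19),
    "ARC-Easy": (30, 40, 28),
    "BoolQ": (59, 40, 55),
    "GSM8K": (0, 0, 0),
    "HellaSwag": (27, 25, 28),
    "OpenBookQA": (9, 0, 15),
    "PIQA": (53, 70, 65),
    "Winogrande": (52, 60, 50),
}

for task, (zero, five, gpt2) in scores.items():
    best = max(zero, five)
    print(f"{task:<14} 5-shot delta: {five - zero:+3d}   best vs GPT-2: {best - gpt2:+3d}")

# Tasks where few-shot prompting helps or hurts.
improved = [t for t, (z, f, _) in scores.items() if f > z]
regressed = [t for t, (z, f, _) in scores.items() if f < z]
print("improved with 5-shot:", improved)
print("regressed with 5-shot:", regressed)
```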
🤖 Synthetic Task Performance
These tasks simulate real-world infra and agent use cases, with scores reflecting performance on a custom synthetic benchmark dataset (on a scale of 0.0 to 1.0).
- 🧠 Chain-of-Thought Reasoning: 0.80
- 🔍 Log Error Detection: 0.80
- 🧑‍🏫 LLM Eval Reasoning: 0.80
- 🛠 Python Function Calling: 0.80
- 🧠 Think Token Trace Generation: 0.60
- 📉 RCA, RAG, Observability, SLI/SLO: 0.40–0.60
🧪 Intended Use
✅ Suitable for:
- Instruction-following & logic-based Q&A
- Root cause analysis and structured summarization
- Lightweight code generation and debugging
- Long-context reasoning (up to 4K tokens)
- Foundation for edge deployable AI agents
- Tool-augmented or chain-of-thought agents via `<think>` / `<tool_call>` tokens
🚫 Not suitable for:
- Open-domain factual QA at scale
- Tasks requiring multi-modal reasoning
- Safety-critical or real-time systems without human validation
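For the tool-augmented use case, an agent loop usually needs to separate the model's reasoning from its tool requests. The sketch below shows one way to do that with regular expressions. It assumes the model closes each span with `</think>` and `</tool_call>` and emits a JSON payload inside `<tool_call>`; neither is confirmed by this card, and the tool name `get_disk_usage` is hypothetical.

```python
import json
import re

# Assumed delimiters: the card names the opening tokens only.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_agent_output(text):
    """Split generated text into reasoning traces, tool calls, and the remaining answer."""
    thoughts = [t.strip() for t in THINK_RE.findall(text)]
    tool_calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            tool_calls.append(json.loads(raw))
        except json.JSONDecodeError:
            # Keep unparseable payloads so the caller can inspect them.
            tool_calls.append({"raw": raw.strip()})
    answer = TOOL_CALL_RE.sub("", THINK_RE.sub("", text)).strip()
    return {"thoughts": thoughts, "tool_calls": tool_calls, "answer": answer}


# Hypothetical model output, for illustration only.
sample = (
    "<think>The user wants disk usage; call the metrics tool.</think>"
    '<tool_call>{"name": "get_disk_usage", "arguments": {"host": "db-01"}}</tool_call>'
)
parsed = parse_agent_output(sample)
print(parsed)
```

If these tokens are registered as special tokens in the tokenizer, decoding with `skip_special_tokens=True` would likely strip them, so decode with `skip_special_tokens=False` before parsing.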
🧨 Example Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "kshitijthakkar/loggenix-moe-0.3B-A0.1B-e3-lr7e5-b16-4090-v5-finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Reuse the EOS token for padding.
tokenizer.pad_token = tokenizer.eos_token

messages = [
    {"role": "system", "content": ""},
    {"role": "user", "content": "Summarize the root cause of a database timeout error."},
]

# Apply the chat template and append the assistant turn marker so the model answers.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.6,
        top_p=0.9,
        use_cache=False,  # kept from the original example; disables the KV cache
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
🔧 Expert Routing
This model employs top-2 token-level expert routing using router logits. Routing is guided by an auxiliary loss that balances expert load and improves convergence. Enable `output_router_logits=True` in your inference pipeline for advanced inspection or analysis of routing behavior.
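To show what a load-balancing auxiliary loss measures, here is a minimal sketch of a Switch-Transformer-style formulation: the number of experts times the sum, over experts, of the fraction of routing slots an expert receives multiplied by its mean router probability. This is a common formulation and not necessarily the exact loss used to train this model; the batches of router logits are invented.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def load_balance_loss(batch_logits, top_k=2):
    """Switch-style auxiliary loss; equals 1.0 when load is perfectly balanced."""
    num_tokens = len(batch_logits)
    num_experts = len(batch_logits[0])
    counts = [0] * num_experts        # routing slots assigned to each expert
    prob_sums = [0.0] * num_experts   # accumulated router probability per expert
    for logits in batch_logits:
        probs = softmax(logits)
        chosen = sorted(range(num_experts), key=lambda j: probs[j], reverse=True)[:top_k]
        for i in chosen:
            counts[i] += 1
        for i, p in enumerate(probs):
            prob_sums[i] += p
    frac = [c / (num_tokens * top_k) for c in counts]
    mean_prob = [s / num_tokens for s in prob_sums]
    return num_experts * sum(f * p for f, p in zip(frac, mean_prob))


def one_token(hot_experts, num_experts=16, hot=5.0):
    """Hypothetical router logits that favor the given experts."""
    return [hot if i in hot_experts else 0.0 for i in range(num_experts)]


# Balanced: 8 tokens, each preferring a different pair of the 16 experts.
balanced = load_balance_loss([one_token({2 * t, 2 * t + 1}) for t in range(8)])
# Collapsed: every token prefers experts 0 and 1.
collapsed = load_balance_loss([one_token({0, 1}) for _ in range(8)])
print(balanced, collapsed)
```

The collapsed batch scores well above the balanced one, so minimizing this term during training pushes the router toward spreading tokens across experts.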
📃 License
Released under the Apache 2.0 License for both commercial and research purposes.
🙌 Acknowledgements
Trained using:
- 🧨 Hugging Face Transformers
- 💾 7× RTX 4090 (24GB VRAM) and H200 (140GB VRAM)
- 🧠 Custom gradient checkpointing and mixed-precision pipeline
- 📈 Logged via Weights & Biases
🗣️ Citation
```bibtex
@misc{loggenix-moe-0.3B-A0.1B-e3-lr7e5-b16-4090-v5-finetuned,
  title  = {LoggenixMoE315M: A Mid-Sized Mixture-of-Experts Model with 16E2A Routing},
  author = {Kshitij Thakkar},
  year   = {2025},
  url    = {https://huggingface.co/kshitijthakkar/loggenix-moe-0.3B-A0.1B-e3-lr7e5-b16-4090-v5.1-finetuned},
  note   = {Trained from scratch on agentic reasoning + RCA + code datasets.}
}
```