Hugging Face – Posts

All HF Hub posts

posted an update 2 days ago

A small gift for anyone building or studying foundation models.

Most "open" models hand you the weights and stop there. With Aether-7B-5Attn we wanted to hand over the whole thing — so you can actually learn from it, reproduce it, and build on it: the data recipe, the training code, every hyperparameter, the complete logs, and the intermediate checkpoints. All Apache-2.0, reproducible byte-for-byte.

What you can do with it:
🔁 Rebuild it from scratch, or fork the recipe for your own model
🔬 Study a real heterogeneous-attention MoE — 49 layers place 5 attention mechanisms on a 7×7 Latin square, arranged as a clean, attributable ablation
📈 Trace training dynamics across the released checkpoints (110k / 115k / 162k)

It's a modest 6.59B model, and an honest one — the limitations (no KV-cache in this build, small scale) are written right in the card. We're not claiming it's special. If any piece of it saves you time or teaches you something, that's exactly what we hoped for. 🤗

📖 Full write-up →
[blog] · https://huggingface.co/blog/FINAL-Bench/opensource-llm
📦 5 Attention Base · FINAL-Bench/Aether-7B-5Attn
🎯 5 Attention Instruct · FINAL-Bench/Aether-7B-5Attn-it
🚀 5 Attention Live demo · FINAL-Bench/Aether-Sovereign-AI
📦 7 Attention Base · https://huggingface.co/FINAL-Bench/Aether-7B-7Attn-base
📦 11 Attention Base · FINAL-Bench/Aether-6B-11Attn-base
🧬 Collection · https://huggingface.co/collections/FINAL-Bench/aether-foundation-model

#opensource #LLM #MoE #reproducibility #Apache2

5 replies

danielhanchen

posted an update about 24 hours ago

Introducing Unsloth for AMD 🚀
You can now train & run LLMs on your AMD hardware

• We collaborated with AMD to enable you to train & run 500+ models on AMD GPUs
• Works on Windows, WSL, Linux
• Train Qwen, Gemma on just 3GB VRAM

GitHub: https://github.com/unslothai/unsloth
Blog + Guide: https://unsloth.ai/docs/basics/amd

3 replies

danielhanchen

posted an update 3 days ago

Gemma 4 is now faster and much more accurate! 🚀

Google made huge improvements to tool-calling and chat accuracy, reliability, and speed.
To get the fixes, re-download our updated GGUF, MLX, and NVFP4 quants!

Unsloth quants: https://huggingface.co/collections/unsloth/gemma-4
Gemma 4 Guide: https://unsloth.ai/docs/models/gemma-4

7 replies

appvoid

posted an update 3 days ago

if you are a tinkerer of small language models and want to stay ahead of what small models can do, follow me!!! seriously, start following people who actually still make small models

i've made one recently btw

also, i'm keeping an eye on the AxiomicLabs leaderboard, which looks like the only current way to check where things are going

though, between us, i think they should add agentic/tool use benchmarks there

anyways,

enjoy!

appvoid/a-cool-model

19 replies

ucr-max

posted an update 1 day ago

We have released the first version of CompactLMIndex (UniversalComputingResearch/clmi), a curated benchmark for smaller language models.

The index evaluates models across six benchmarks covering commonsense, science and world knowledge, and grammatical competence. Scores are normalized for random chance, so results are easier to compare across tasks with different baselines.
The composite gives 50% weight to commonsense, 25% to science and world knowledge, and 25% to grammatical competence. BLiMP is macro-averaged across its 12 linguistic categories. Every result includes 95% bootstrap intervals, and rank ranges reflect paired comparisons on matched evaluation items.
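The scoring steps described above can be sketched roughly as follows. This is a minimal illustration, not the index's actual code: the normalization formula `(accuracy - chance) / (1 - chance)`, the percentile bootstrap, and the names `chance_normalize`, `composite`, `bootstrap_ci`, and the category keys are all assumptions; only the 50/25/25 weights and the 95% level come from the post.

```python
import random

def chance_normalize(accuracy, chance):
    """Rescale accuracy so the random-guess baseline maps to 0 and a perfect score to 1.

    Assumed formula; the post only says scores are "normalized for random chance".
    """
    return (accuracy - chance) / (1.0 - chance)

# Category weights as stated in the post; the key names are hypothetical.
WEIGHTS = {"commonsense": 0.50, "science_world": 0.25, "grammar": 0.25}

def composite(category_scores):
    """Weighted sum of chance-normalized category scores."""
    return sum(WEIGHTS[name] * category_scores[name] for name in WEIGHTS)

def bootstrap_ci(per_item_correct, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean accuracy over 0/1 item outcomes."""
    rng = random.Random(seed)
    n = len(per_item_correct)
    # Resample items with replacement and record the mean of each resample.
    means = sorted(
        sum(rng.choices(per_item_correct, k=n)) / n for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper
```

With this normalization, a four-way multiple-choice task (chance 0.25) and a two-way task (chance 0.5) both report 0 for a model that guesses, which is what makes scores comparable across tasks with different baselines.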

The goal is to compare compact models under one fixed, zero-shot evaluation protocol and make differences in architecture, training, and inference visible without treating parameter count as a substitute for quality.

If you have results for a model that fits the benchmark scope, please open a discussion on the Space.

salma-remyx

posted an update about 21 hours ago

There are plenty of AI coding tools now. None of them helps you decide what to build next.

Every week brings a flood of new methods, models, and techniques. Nobody has time to read everything and figure out what applies to their codebase, let alone test it all.

We've open sourced Outrider to systematically explore new ideas for improving your AI system. In a few steps, you can set up the Outrider GitHub Action to:

- continuously search for relevant advances
- evaluate whether they fit your codebase
- open draft PRs when it finds worthwhile improvements

so you never stop discovering what's next.
Outrider: https://github.com/remyxai/outrider
Try it at https://studio.remyx.ai
Docs at https://docs.remyx.ai

ManniX-ITA

posted an update 3 days ago

🚀 New release: Qwen3.6-27B-A3B-Coder

A code-specialized MoE carved out of Qwen3.6-35B-A3B by pure expert pruning, with no fine-tuning and no distillation. I profiled all 256 experts on balanced corpora plus targeted code benchmarks (LiveCodeBench + MultiPL-E), built a competence map with the code classes up-weighted 1.5×, and dropped the 72 weakest experts per layer (256→184, ~35B→27B). Router, attention, norms, the MTP head and the vision tower are all preserved; active params stay at A3B and routing is baked to top-10 (revert to top-8 anytime).
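The per-layer selection step could look something like the sketch below. It is an assumption-laden illustration, not the author's pipeline: the score layout (`scores[e][c]` as the competence of expert `e` on class `c`), the weighted-sum ranking, and the function name `select_experts` are hypothetical. Only the 1.5× code up-weighting and the 256→184 counts come from the post.

```python
def select_experts(scores, class_weights, n_drop):
    """Return the sorted indices of experts to keep in one layer.

    scores:        list of dicts, scores[e][c] = competence of expert e on class c
    class_weights: per-class multipliers; classes not listed default to 1.0
    n_drop:        number of lowest-ranked experts to remove
    """
    # Collapse each expert's per-class scores into one weighted competence value.
    weighted = [
        sum(class_weights.get(cls, 1.0) * value for cls, value in per_class.items())
        for per_class in scores
    ]
    # Rank experts from weakest to strongest and drop the first n_drop.
    ranked = sorted(range(len(scores)), key=lambda e: weighted[e])
    dropped = set(ranked[:n_drop])
    return [e for e in range(len(scores)) if e not in dropped]


# Toy layer with four experts: up-weighting "code" changes which experts survive.
toy = [
    {"code": 0.9, "prose": 0.1},
    {"code": 0.1, "prose": 0.9},
    {"code": 0.5, "prose": 0.5},
    {"code": 0.2, "prose": 0.2},
]
kept = select_experts(toy, {"code": 1.5}, n_drop=2)
```

In the toy layer the prose-heavy expert 1 is dropped once code is up-weighted, while experts 0 and 2 survive. The sketch covers selection only; removing the expert weights from a real checkpoint is a separate step not shown here.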

📊 Benchmarks (Q6_K, temp 0.6):
• MultiPL-E 0.840
• HumanEval 0.970
• LiveCodeBench 0.688
• GSM8K 0.970 · ARC-C 0.944 · AIME 0.733
• GPQA-Diamond 0.773 · MATH-500 0.620 · IFEval 0.730
• Average 0.808

27B footprint, A3B speed, coding that punches well above its size — and the preserved MTP head gives you speculative decoding out of the box (text + vision).

🔗 Model: ManniX-ITA/Qwen3.6-27B-A3B-Coder
📦 GGUF (+MTP): ManniX-ITA/Qwen3.6-27B-A3B-Coder-MTP-GGUF
🦙 Ollama: https://ollama.com/mannix/qwen3.6-27b-a3b-coder

6 replies

Banaxi-Tech

posted an update 4 days ago

Introducing BananaMind 2 Nano

BananaMind 2 Nano is the smallest member of the BananaMind 2.0 family — a 10M-parameter language model that shows how much you can squeeze out of a tiny footprint. It uses the family's digit-isolated tokenizer, so it keeps solid arithmetic despite its size, and it's small enough to run just about anywhere.
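For readers unfamiliar with the idea, here is a minimal sketch of what digit isolation can look like at the pre-tokenization stage. It assumes "digit-isolated" means every digit becomes its own piece before any subword merging; the post does not describe the actual BananaMind tokenizer, and `pre_tokenize` is a hypothetical name.

```python
import re

# Each digit is its own piece; runs of other non-space characters and runs of
# whitespace stay together. Assumed behaviour, not the released tokenizer.
_PIECES = re.compile(r"\d|[^\d\s]+|\s+")

def pre_tokenize(text):
    """Split text so that no piece contains more than one digit."""
    return _PIECES.findall(text)

pieces = pre_tokenize("12+345 = 357")
```

Because a number like 345 always arrives as `3`, `4`, `5`, the model sees the same digit symbols in every position instead of an arbitrary multi-digit token, which is one common explanation for why this helps small models with arithmetic.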

Trained on 30B tokens in about a day on a single RTX 5070 Ti (16GB), 4096-token context.

Benchmarks:

Average 35.77
ARC Easy 36.20
PIQA 55.98
ARC Challenge 23.38
HellaSwag 27.50

That 35.77 average edges out Pythia-31M (~34.79) at roughly a third of the parameters.
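The arithmetic behind those two claims is easy to check. The snippet below assumes the reported average is the unweighted mean of the four listed benchmarks and takes the parameter counts at face value (10M vs. 31M); the variable names are illustrative.

```python
# Benchmark scores as listed in the post.
scores = {
    "ARC Easy": 36.20,
    "PIQA": 55.98,
    "ARC Challenge": 23.38,
    "HellaSwag": 27.50,
}

# Unweighted mean of the four benchmarks (assumed to be how the average was computed).
average = sum(scores.values()) / len(scores)

# Parameter ratio of a 10M model to a 31M model.
param_ratio = 10 / 31
```

The mean works out to 35.765, which matches the reported 35.77 after rounding, and the parameter ratio is about 0.32, consistent with "roughly a third".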

Released under Apache 2.0 on Hugging Face: BananaMind/BananaMind-2-Nano — weights, tokenizer, and config included.

1 reply

HannesVonEssen

posted an update about 15 hours ago

📣 Count down to Kimi K3 with us!

Read our deep dive into the architecture of Kimi K3 and get notified on July 27, when it is released and becomes viewable on hfviewer.com!

https://hfviewer.com/moonshotai/kimi-k3

Banaxi-Tech

posted an update 1 day ago

Let's get to 80 followers on my account and 30 on BananaMind.
When we hit that, we are going to release BananaMind 2 Medium tomorrow.
Me: @Banaxi-Tech
BananaMind: BananaMind

Early checkpoint shows #1 for <50M on the Open SLM Leaderboard!

Keep Shipping! 🚀

7 replies
