Hugging Face
Open to Collab
90.0 TFLOPS
88
17
285
s3nh
PRO
256 followers · 116 following
s3nhxx
s3nh
AI & ML interests
Quantization, LLMs, Deep Learning for good. Follow me if you like my work. Patreon.com/s3nh
Recent Activity
liked a model about 7 hours ago
DJLougen/Qwen3.6-35B-A3B-REAP-90pct-GGUF
reacted to fblgit's post 1 day ago
Introducing `HarEmb - PII`, a single-transformer-block distilled layer from the OpenMed PII Privacy filter. It's a very tiny model that reaches comparable results at PII classification through Viterbi BIOES decoding, retaining ~98% of the original model's performance while being a tiny fraction of the base model's size. It doubles throughput (tokens/s) and dramatically reduces the active parameters and the VRAM footprint. The evaluation and benchmarking are in the model repository and can be reproduced. I trained it on an RTX 4090 without issues; it is compatible with the OpenMed suite and is an in-place replacement for the openai privacy-filter model. https://huggingface.co/fblgit/haremb-privacy-filter-opennemo I'm looking for people who want to co-author, contribute to, or endorse the HarEmb research and the technical paper for the model. Contact xavi@juanako.ai
reacted to lbourdois's post 1 day ago
New blog post! An introduction to a little-known but highly effective model reduction method: Trimming.

We show how to reduce model size (up to an 87.24% reduction in our experiments) while preserving performance. We applied this technique to 16 different model families across several modalities to illustrate that it works on any architecture (as long as the embedding layer is the last one of the model) and on any modality involving text. From these 16 families, we generated over 5,500 monolingual models in 124 different languages.

Key takeaways from our experiments:
1. Trimming does not require a GPU. Our models were obtained on a CPU.
2. This method scales up to at least 4B parameters (we did not test beyond that).
3. A trimmed model is smaller than the original while preserving its performance. If you observe a slight performance drop, just fine-tune to recover or even surpass the original performance.
4. For an equivalent compute budget, it is better to trim and then fine-tune than to fine-tune the original model. Since the model is smaller, you can run more epochs or show more data and end up with a better model than the original.
5. Trimming is a competitive alternative to distillation and quantization. E.g. we obtained our alternative to DistilBERT in 9 minutes on CPU vs. 90 hours of GPU for the latter.
6. Trimming could generate reasoning traces in the language of the trimmed model. This could be an alternative to generating traces in English and then translating them into the desired language.

Many other topics (such as how much data is needed, the impact of the database used, the order in which it should be done, etc.) are covered in the blog post!

Blogpost: https://huggingface.co/blog/lbourdois/introduction-to-trimming
Models: https://huggingface.co/spaces/alphaedge-ai/Trimming_models_search
Organizations
s3nh's datasets (1)
s3nh/alpaca-dolly-instruction-only-polish • Viewer • Updated May 2, 2023 • 23.7k • 23 • 6