---
language: en
tags:
- llm
- compression
- nanoquant
- quantization
- pruning
license: apache-2.0
datasets: []
model-index: []
---

# NanoQuant Compressed Model

## Model Description

This is a compressed version of [tencent/Hunyuan-MT-7B](https://huggingface.co/tencent/Hunyuan-MT-7B) created using NanoQuant, an advanced LLM compression toolkit.

## Compression Details

- **Compression Level**: medium
- **Size Reduction**: 77.0%
- **Techniques Used**:
  - Quantization: 8-bit
  - Pruning: magnitude
  - LoRA: r=32, alpha=32, dropout=0.1

## Deployment Options

### Option 1: Direct Usage with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the compressed weights and the matching tokenizer from the same identifier
model_id = "tencent_Hunyuan-MT-7B_nanoquant_medium"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```

### Option 2: Ollama Deployment

This model is also available for Ollama:

```bash
ollama pull nanoquant-tencent-Hunyuan-MT-7B:medium
```

## Performance Characteristics

Compared with the original model, this compressed model:

- Requires significantly less storage space
- Has faster loading times
- Uses less memory during inference
- Maintains most of the original model's capabilities

## Original Model

For information about the original model, please visit: https://huggingface.co/tencent/Hunyuan-MT-7B

## License

This model is released under the Apache 2.0 license.

## NanoQuant

NanoQuant is an advanced model compression system that achieves up to 99.95% size reduction while maintaining model performance. Learn more at [NanoQuant Documentation](https://github.com/nanoquant/nanoquant).
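
## Size Reduction Arithmetic

As a rough illustration of what the 77.0% size reduction listed under Compression Details means on disk, the sketch below converts a reduction percentage into a remaining size and a compression ratio. The 15.0 GB original size is a hypothetical placeholder chosen for the example, not a measured figure for the original checkpoint, and the helper names are illustrative rather than part of NanoQuant.

```python
def compressed_size_gb(original_gb: float, reduction_pct: float) -> float:
    """Return the size left after removing reduction_pct percent of original_gb."""
    if not 0.0 <= reduction_pct <= 100.0:
        raise ValueError("reduction_pct must be between 0 and 100")
    return original_gb * (1.0 - reduction_pct / 100.0)


def compression_ratio(reduction_pct: float) -> float:
    """Return original size divided by compressed size (4.0 means 4x smaller)."""
    if not 0.0 <= reduction_pct < 100.0:
        raise ValueError("reduction_pct must be at least 0 and below 100")
    return 1.0 / (1.0 - reduction_pct / 100.0)


# Assumption: hypothetical original checkpoint size, for illustration only.
ASSUMED_ORIGINAL_GB = 15.0
# Taken from the "Size Reduction" entry in this model card.
REPORTED_REDUCTION_PCT = 77.0

size = compressed_size_gb(ASSUMED_ORIGINAL_GB, REPORTED_REDUCTION_PCT)
ratio = compression_ratio(REPORTED_REDUCTION_PCT)
print(f"{size:.2f} GB remaining, about {ratio:.2f}x smaller")
```

Substituting the actual size of the original checkpoint for `ASSUMED_ORIGINAL_GB` gives the expected footprint of this compressed model.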