---
license: other
license_name: evrmind-free-1.0
license_link: LICENSE.md
language:
- en
library_name: llama.cpp
tags:
- llama
- llama-3.1
- gguf
- 3-bit
- quantization
- evr
- evrmind
- text-generation
- instruct
- chat
- on-device
- maano
pipeline_tag: text-generation
model-index:
- name: Evrmind EVR-1 Maano-8b-Instruct
  results:
  - task:
      type: text-generation
    metrics:
    - name: Perplexity (wikitext-2, ctx=512)
      type: perplexity
      value: 7.37
    - name: Coherence (rep4 @ 500 tokens)
      type: repetition-rate
      value: 2.77
    - name: Coherence (rep4 @ 1000 tokens)
      type: repetition-rate
      value: 9.66
---

# Evrmind EVR-1 Maano-8b-Instruct (3.93 GiB)

A custom 3-bit compression of Meta's Llama 3.1 8B Instruct that fits in **under 4 GiB** while maintaining coherent instruction-following at 1000+ tokens.

EVR-1 is not a standard quantization (not Q2, Q3, Q4, etc). It is a custom compression method developed by Evrmind. The compressed weights average approximately 3 bits per parameter; the total GGUF file (3.93 GiB) includes additional metadata and structure overhead.

## Key Results

| Metric | EVR-1 Instruct (3.93 GiB) |
|--------|--------------------------|
| Coherence @ 500 tokens | **2.77% repetition** |
| Coherence @ 1000 tokens | **9.66% repetition** |
| Perplexity (wikitext-2, ctx=512) | **7.37** |

## Chat Example

**User:** What are the main causes of the French Revolution?

**EVR-1 Maano-8b-Instruct:**

> The French Revolution, which lasted from 1789 to 1799, was a complex event with multiple causes. However, some of the main contributing factors include:
>
> 1. **Financial Crisis**: France was deeply in debt from its involvement in the American Revolutionary War and other wars. The burden of taxation fell on the common people, who were already facing economic hardship.
>
> 2. **Social Inequality**: The French nobility held a significant amount of power over commoners through their control over land and government positions.
>
> 3. **Enlightenment Ideas**: The ideas presented by Enlightenment thinkers such as Rousseau, Voltaire, Diderot encouraged people to think critically about their government and seek change.
>
> 4. **Food Shortages**: A series of poor harvests led to widespread hunger and starvation.
>
> 5. **Absolute Monarchy**: The rule of King Louis XVI was seen as tyrannical by many.
>
> 6. **The American Revolution's Influence**: The American Revolution inspired the French people with its ideas of liberty, equality, and fraternity.

## How to Run

Download the model file and the binary for your platform, then:

```bash
# Extract the binary
mkdir -p linux-cuda && tar xzf evrmind-linux-cuda.tar.gz -C linux-cuda

# Run (interactive chat)
cd linux-cuda
LD_LIBRARY_PATH=. ./llama-cli -m ../evr-llama-3.1-8b-instruct.gguf -ngl 99

# Run (single completion)
LD_LIBRARY_PATH=. ./llama-completion -m ../evr-llama-3.1-8b-instruct.gguf -p "Your prompt here" -n 500 -ngl 99
```

### Platform Binaries

| Platform | File | GPU Required |
|----------|------|-------------|
| Linux + NVIDIA | `evrmind-linux-cuda.tar.gz` | NVIDIA GPU (CUDA 12) |
| Linux + Any GPU | `evrmind-linux-vulkan.tar.gz` | Any Vulkan-capable GPU |
| Windows + NVIDIA | `evrmind-windows-cuda.zip` | NVIDIA GPU (CUDA 12) |
| Windows + Any GPU | `evrmind-windows-vulkan.zip` | Any Vulkan-capable GPU |
| macOS (Apple Silicon) | `evrmind-macos-metal.tar.gz` | M1/M2/M3/M4 |
| Android (Termux) | `evrmind-android-vulkan.tar.gz` | Vulkan |

> **Note:** The binaries are the same for all EVR-1 models. You only need to download them once. Just point them at whichever GGUF you want to run.

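The run commands above use the Linux CUDA build. As a rough sketch of how the platform table maps onto a POSIX shell, the hypothetical helper `pick_archive` below (not part of the Evrmind binaries) returns the archive name for a given kernel, GPU vendor, and operating system string. It assumes `uname -s` reports `Linux` or `Darwin`, that Termux reports `Android` from `uname -o`, and that any Mac is Apple Silicon. Windows is not covered.

```bash
# Hypothetical helper, not shipped with the EVR-1 binaries.
# Usage: pick_archive KERNEL GPU [OS]
#   KERNEL: output of `uname -s`
#   GPU:    "nvidia" to prefer the CUDA build, anything else for Vulkan
#   OS:     output of `uname -o` (used only to spot Android/Termux)
pick_archive() {
  kernel=$1
  gpu=$2
  os=${3:-}
  case "$kernel" in
    Linux)
      if [ "$os" = "Android" ]; then
        echo "evrmind-android-vulkan.tar.gz"
      elif [ "$gpu" = "nvidia" ]; then
        echo "evrmind-linux-cuda.tar.gz"
      else
        echo "evrmind-linux-vulkan.tar.gz"
      fi
      ;;
    Darwin)
      echo "evrmind-macos-metal.tar.gz"
      ;;
    *)
      # Anything else is outside the platform table.
      echo "unsupported" >&2
      return 1
      ;;
  esac
}
```

For example, `pick_archive "$(uname -s)" nvidia "$(uname -o 2>/dev/null)"` on a desktop Linux machine prints `evrmind-linux-cuda.tar.gz`; pass any other second argument to get the Vulkan build instead.
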
### Flags

| Flag | Description |
|------|------------|
| `-ngl 99` | Offload all layers to GPU (recommended) |
| `-n 500` | Generate 500 tokens |
| `-p "..."` | Your prompt |
| `-t 8` | Number of CPU threads (for CPU layers) |

## Model Details

- **Name:** Evrmind EVR-1 Maano-8b-Instruct
- **Base model:** Meta Llama 3.1 8B Instruct
- **Size:** 3.93 GiB (GGUF)
- **Method:** EVR-1 (Evrmind Reconstruction), a custom 3-bit compression method
- **Backends:** CUDA, Vulkan, Metal, CPU
- **Context:** Tested up to 2048 tokens; longer contexts have not been validated at 3-bit compression
- **Chat template:** Llama 3.1 instruct format (built-in)

## Benchmarks

### Coherence (5 continuation-style prompts, 500 and 1000 tokens each)

Average 4-gram repetition rate (lower = better):

| Model | Size | rep4 @ 500 | rep4 @ 1000 |
|-------|------|-----------|-------------|
| **EVR-1 Instruct** | **3.93 GiB** | **2.77%** | **9.66%** |

## Also Available

- **[EVR-1 Maano-8b](https://huggingface.co/evrmind/evr-1-maano-8b)**, base model (not instruction-tuned), for text completion and creative writing
- **[EVR-1 Bafethu-8b-Reasoning](https://huggingface.co/evrmind/evr-1-bafethu-8b-reasoning)**, reasoning model (DeepSeek R1)

## Intended Use

This model is intended for on-device chat and instruction-following on laptops, desktops, and edge devices where memory is constrained. An Android (Termux) build is also available. There is no iOS build.

## Limitations

- Math reasoning is limited, consistent with the base Llama 3.1 8B Instruct at this compression level.
- Occasional minor character-level artefacts (e.g., dropped letters) due to 3-bit compression.
- Generation quality degrades somewhat beyond 1000 tokens.
- As with all heavily quantized models, generated text may contain factual inaccuracies (e.g., incorrect numbers, dates, or scientific details). Always verify factual claims independently.

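The rep4 figures in the Benchmarks section are 4-gram repetition rates, but this card does not say how the text is tokenized or how repeats are counted. The sketch below shows one common definition, under stated assumptions: tokens are whitespace-separated words, and the rate is the percentage of 4-grams that have already appeared earlier in the same text. The function name `rep4` is hypothetical, and its numbers should not be expected to match the table.

```bash
# Hypothetical helper: 4-gram repetition rate of text read from stdin.
# Assumption: whitespace tokens; a 4-gram counts as repeated if it has
# been seen earlier in the same input. Prints a percentage with 2 decimals.
rep4() {
  awk '
    { for (i = 1; i <= NF; i++) tok[n++] = $i }   # collect all tokens
    END {
      total = n - 3                               # number of 4-grams
      if (total <= 0) { print "0.00"; exit }
      rep = 0
      for (i = 0; i < total; i++) {
        g = tok[i] SUBSEP tok[i+1] SUBSEP tok[i+2] SUBSEP tok[i+3]
        if (g in seen) rep++                      # already seen: a repeat
        seen[g] = 1
      }
      printf "%.2f\n", 100 * rep / total
    }'
}
```

To score a generation saved to a file, run `rep4 < output.txt`, where `output.txt` is a placeholder name. Averaging the result over several prompts mirrors the way the table reports a mean over five prompts.
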
## System Requirements

- **Storage:** ~4 GiB for model weights + ~50 MB for binaries
- **RAM:** 6 GiB minimum (8 GiB recommended)
- **GPU (recommended):** NVIDIA GPU with CUDA 12, Apple Silicon (M1/M2/M3/M4), or any Vulkan-capable GPU
- **CPU-only:** Supported but significantly slower
- **OS:** Linux (x86_64), macOS (Apple Silicon), Windows (x86_64), Android (Termux, ARM64)
- **Not supported:** iOS, 32-bit systems

## Safety and Responsible Use

This model inherits the capabilities and limitations of its base model (Meta Llama 3.1 8B Instruct). Like all language models, it can generate incorrect, biased, or harmful content. Users should:

- Not rely on this model for factual accuracy without verification
- Not use this model to generate content that could cause harm
- Apply appropriate content filtering for any user-facing applications
- Be aware that 3-bit compression may amplify certain failure modes of the base model

## Derivative Works

If you create derivative works, credit **"EVR-1 Maano"** in your model name and documentation. Commercial use is permitted subject to the Llama 3.1 Community License Agreement.

## License

Available for personal, research, and commercial use with attribution, subject to upstream license terms. See LICENSE.md for full terms.

Built with Llama. This model is a derivative of Meta's Llama 3.1 8B Instruct and is subject to the [Llama 3.1 Community License Agreement](https://www.llama.com/llama3_1/license/) in addition to the Evrmind license.

## Citation

```
@misc{evrmind2026evr1maano8binstruct,
  title={Evrmind EVR-1 Maano-8b-Instruct: A Custom 3-Bit Compression Method for Coherent On-Device Instruction-Following},
  author={Evrmind},
  year={2026},
  url={https://huggingface.co/evrmind/evr-1-maano-8b-instruct}
}
```

## Contact

- Email: hello@evrmind.io
- Issues: [GitHub](https://github.com/evrmind-uk/evr-llama/issues)