---
license: other
license_name: evrmind-free-1.0
license_link: LICENSE.md
language:
- en
library_name: llama.cpp
tags:
- llama
- llama-3.1
- gguf
- 3-bit
- quantization
- evr
- evrmind
- text-generation
- instruct
- chat
- on-device
- maano
pipeline_tag: text-generation
model-index:
- name: Evrmind EVR-1 Maano-8b-Instruct
  results:
  - task:
      type: text-generation
    metrics:
    - name: Perplexity (wikitext-2, ctx=512)
      type: perplexity
      value: 7.37
    - name: Coherence (rep4 @ 500 tokens)
      type: repetition-rate
      value: 2.77
    - name: Coherence (rep4 @ 1000 tokens)
      type: repetition-rate
      value: 9.66
---
# Evrmind EVR-1 Maano-8b-Instruct (3.93 GiB)
A custom 3-bit compression of Meta's Llama 3.1 8B Instruct that fits in **under 4 GiB** while maintaining coherent instruction-following at 1000+ tokens.
EVR-1 is not a standard quantization format (it is not Q2, Q3, Q4, etc.); it is a custom compression method developed by Evrmind. The compressed weights average approximately 3 bits per parameter; the total GGUF file (3.93 GiB) also includes metadata and structure overhead.
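
As a rough sanity check on the size figures above, the whole-file bits per parameter can be estimated from the file size alone. The sketch below assumes about 8.03 billion parameters for Llama 3.1 8B, a figure that is not stated in this card, and counts every byte of the 3.93 GiB file as if it were weights. The result is therefore an upper bound on the average weight width, not a measurement of the EVR-1 format itself.

```bash
# Estimate whole-file bits per parameter from the GGUF size.
# ASSUMPTION: the parameter count passed below (~8.03e9) is an approximate
# figure for Llama 3.1 8B and does not come from this model card.
bits_per_param() {
  # $1 = file size in GiB, $2 = parameter count
  awk -v gib="$1" -v params="$2" \
    'BEGIN { printf "%.2f\n", gib * 1073741824 * 8 / params }'
}

bits_per_param 3.93 8030000000
```

The card attributes the difference between this whole-file figure and the roughly 3 bits quoted for the compressed weights to metadata and structure overhead.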
## Key Results
| Metric | EVR-1 Instruct (3.93 GiB) |
|--------|--------------------------|
| Coherence @ 500 tokens | **2.77% repetition** |
| Coherence @ 1000 tokens | **9.66% repetition** |
| Perplexity (wikitext-2, ctx=512) | **7.37** |
## Chat Example
**User:** What are the main causes of the French Revolution?
**EVR-1 Maano-8b-Instruct:**
> The French Revolution, which lasted from 1789 to 1799, was a complex event with multiple causes. However, some of the main contributing factors include:
>
> 1. **Financial Crisis**: France was deeply in debt from its involvement in the American Revolutionary War and other wars. The burden of taxation fell on the common people, who were already facing economic hardship.
>
> 2. **Social Inequality**: The French nobility held a significant amount of power over commoners through their control over land and government positions.
>
> 3. **Enlightenment Ideas**: The ideas presented by Enlightenment thinkers such as Rousseau, Voltaire, Diderot encouraged people to think critically about their government and seek change.
>
> 4. **Food Shortages**: A series of poor harvests led to widespread hunger and starvation.
>
> 5. **Absolute Monarchy**: The rule of King Louis XVI was seen as tyrannical by many.
>
> 6. **The American Revolution's Influence**: The American Revolution inspired the French people with its ideas of liberty, equality, and fraternity.
## How to Run
Download the model file and the binary for your platform, then:
```bash
# Extract the binary
mkdir -p linux-cuda && tar xzf evrmind-linux-cuda.tar.gz -C linux-cuda
# Run (interactive chat)
cd linux-cuda
LD_LIBRARY_PATH=. ./llama-cli -m ../evr-llama-3.1-8b-instruct.gguf -ngl 99
# Run (single completion)
LD_LIBRARY_PATH=. ./llama-completion -m ../evr-llama-3.1-8b-instruct.gguf -p "Your prompt here" -n 500 -ngl 99
```
### Platform Binaries
| Platform | File | GPU Required |
|----------|------|-------------|
| Linux + NVIDIA | `evrmind-linux-cuda.tar.gz` | NVIDIA GPU (CUDA 12) |
| Linux + Any GPU | `evrmind-linux-vulkan.tar.gz` | Any Vulkan-capable GPU |
| Windows + NVIDIA | `evrmind-windows-cuda.zip` | NVIDIA GPU (CUDA 12) |
| Windows + Any GPU | `evrmind-windows-vulkan.zip` | Any Vulkan-capable GPU |
| macOS (Apple Silicon) | `evrmind-macos-metal.tar.gz` | M1/M2/M3/M4 |
| Android (Termux) | `evrmind-android-vulkan.tar.gz` | Vulkan |
> **Note:** The binaries are the same for all EVR-1 models. You only need to download them once. Just point them at whichever GGUF you want to run.
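
The table above can be turned into a small lookup for install scripts. This is a minimal sketch: the function name `evr_archive_for` is hypothetical, and the detection through `uname -s` is an assumption about how you would identify the platform. Windows is left out because its archives are `.zip` files that a POSIX shell script would not normally handle.

```bash
# Map a platform and backend to an archive name from the table above.
# evr_archive_for is a hypothetical helper, not part of the Evrmind tooling.
evr_archive_for() {
  # $1 = OS name (Linux, Darwin, or Android), $2 = backend (cuda or vulkan)
  case "$1:$2" in
    Linux:cuda|Linux:vulkan) echo "evrmind-linux-$2.tar.gz" ;;
    Darwin:*)                echo "evrmind-macos-metal.tar.gz" ;;
    Android:*)               echo "evrmind-android-vulkan.tar.gz" ;;
    *) echo "no archive listed for $1 with backend $2" >&2; return 1 ;;
  esac
}

# Termux reports "Linux" from uname -s, so pass "Android" explicitly there.
archive=$(evr_archive_for "$(uname -s)" vulkan) && echo "Download: $archive"
```

On Linux with an NVIDIA GPU, call `evr_archive_for Linux cuda` to get the CUDA build instead.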
### Flags
| Flag | Description |
|------|------------|
| `-ngl 99` | Offload all layers to GPU (recommended) |
| `-n 500` | Generate 500 tokens |
| `-p "..."` | Your prompt |
| `-t 8` | Number of CPU threads (for CPU layers) |
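
The flags above combine on one command line. The sketch below uses a hypothetical helper, `evr_completion_cmd`, that prints the command instead of running it so you can inspect it first. It assumes that `-ngl 0` keeps every layer on the CPU, as it does in upstream llama.cpp; this card does not confirm that for the Evrmind binaries.

```bash
# Build a llama-completion command line from the flags in the table above.
# evr_completion_cmd is a hypothetical helper; it only prints the command.
# The prompt is quoted naively, so avoid double quotes inside it.
evr_completion_cmd() {
  # $1 = model path, $2 = prompt, $3 = tokens (-n),
  # $4 = GPU layers (-ngl), $5 = CPU threads (-t)
  printf 'LD_LIBRARY_PATH=. ./llama-completion -m %s -p "%s" -n %s -ngl %s -t %s\n' \
    "$1" "$2" "$3" "$4" "$5"
}

# Full GPU offload, as recommended in the table.
evr_completion_cmd ../evr-llama-3.1-8b-instruct.gguf "Your prompt here" 500 99 8
# CPU-only run (assumption: -ngl 0 offloads no layers).
evr_completion_cmd ../evr-llama-3.1-8b-instruct.gguf "Your prompt here" 500 0 8
```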
## Model Details
- **Name:** Evrmind EVR-1 Maano-8b-Instruct
- **Base model:** Meta Llama 3.1 8B Instruct
- **Size:** 3.93 GiB (GGUF)
- **Method:** EVR-1 (Evrmind Reconstruction), a custom 3-bit compression method
- **Backends:** CUDA, Vulkan, Metal, CPU
- **Context:** Tested up to 2048 tokens; longer contexts have not been validated at 3-bit compression
- **Chat template:** Llama 3.1 instruct format (built-in)
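
The interactive `llama-cli` mode applies the built-in chat template for you. If you pass a raw prompt for a single completion, you may need to wrap it yourself. The sketch below shows the single-turn Llama 3.1 instruct layout. The helper name `llama31_prompt` is hypothetical, and whether the Evrmind binaries prepend `<|begin_of_text|>` on their own is not stated in this card, so drop that token if you see it doubled.

```bash
# Wrap one user message in the Llama 3.1 instruct format (single turn, no system prompt).
# llama31_prompt is a hypothetical helper, not part of the Evrmind tooling.
llama31_prompt() {
  # $1 = user message
  printf '<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n%s<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n' "$1"
}

llama31_prompt "What are the main causes of the French Revolution?"
```

Command substitution (`$(...)`) strips trailing newlines, so writing the output to a file keeps the two newlines after the assistant header intact.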
## Benchmarks
### Coherence (5 continuation-style prompts, 500 and 1000 tokens each)
Average 4-gram repetition rate (lower = better):
| Model | Size | rep4 @ 500 | rep4 @ 1000 |
|-------|------|-----------|-------------|
| **EVR-1 Instruct** | **3.93 GiB** | **2.77%** | **9.66%** |
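
This card does not spell out the exact rep4 formula. A common definition is the share of 4-grams in a generation that repeat an earlier 4-gram, and the sketch below implements that definition as an assumption. It splits on whitespace rather than on model tokens, so its numbers will not match the table exactly.

```bash
# 4-gram repetition rate: percentage of 4-grams that duplicate an earlier one.
# ASSUMPTIONS: this formula and the whitespace tokenization are illustrative;
# the card does not state how its rep4 values were computed.
rep4() {
  # Reads text on stdin and prints a percentage with two decimals.
  awk '
    { for (i = 1; i <= NF; i++) tok[++n] = $i }
    END {
      total = n - 3                      # number of 4-grams in the text
      if (total < 1) { print "0.00"; exit }
      for (i = 1; i <= total; i++) {
        g = tok[i] SUBSEP tok[i+1] SUBSEP tok[i+2] SUBSEP tok[i+3]
        if (!(g in seen)) { seen[g] = 1; uniq++ }
      }
      printf "%.2f\n", 100 * (1 - uniq / total)
    }'
}

printf 'a b c d a b c d\n' | rep4
```

To score a saved generation, pipe the file in: `rep4 < output.txt`, where `output.txt` is a placeholder name.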
## Also Available
- **[EVR-1 Maano-8b](https://huggingface.co/evrmind/evr-1-maano-8b)**, base model (not instruction-tuned), for text completion and creative writing
- **[EVR-1 Bafethu-8b-Reasoning](https://huggingface.co/evrmind/evr-1-bafethu-8b-reasoning)**, reasoning model (DeepSeek R1)
## Intended Use
This model is intended for on-device chat and instruction-following on laptops, desktops, and edge devices where memory is constrained. An Android (Termux) build is also available. There is no iOS build.
## Limitations
- Math reasoning is limited, as expected for Llama 3.1 8B Instruct at this compression level.
- Occasional minor character-level artefacts (e.g., dropped letters) due to 3-bit compression.
- Generation quality degrades somewhat beyond 1000 tokens; 4-gram repetition already rises from 2.77% at 500 tokens to 9.66% at 1000 tokens.
- As with all heavily quantized models, generated text may contain factual inaccuracies (e.g., incorrect numbers, dates, or scientific details). Always verify factual claims independently.
## System Requirements
- **Storage:** ~4 GiB for model weights + ~50 MB for binaries
- **RAM:** 6 GiB minimum (8 GiB recommended)
- **GPU (recommended):** NVIDIA GPU with CUDA 12, Apple Silicon (M1/M2/M3/M4), or any Vulkan-capable GPU
- **CPU-only:** Supported but significantly slower
- **OS:** Linux (x86_64), macOS (Apple Silicon), Windows (x86_64), Android (Termux, ARM64)
- **Not supported:** iOS, 32-bit systems
## Safety and Responsible Use
This model inherits the capabilities and limitations of its base model (Meta Llama 3.1 8B Instruct). Like all language models, it can generate incorrect, biased, or harmful content. Users should:
- Not rely on this model for factual accuracy without verification
- Not use this model to generate content that could cause harm
- Apply appropriate content filtering for any user-facing applications
- Be aware that 3-bit compression may amplify certain failure modes of the base model
## Derivative Works
If you create derivative works, credit **"EVR-1 Maano"** in your model name and documentation. Commercial use is permitted subject to the Llama 3.1 Community License Agreement.
## License
Available for personal, research, and commercial use with attribution, subject to upstream license terms. See LICENSE.md for full terms.
Built with Llama. This model is a derivative of Meta's Llama 3.1 8B Instruct and is subject to the [Llama 3.1 Community License Agreement](https://www.llama.com/llama3_1/license/) in addition to the Evrmind license.
## Citation
```bibtex
@misc{evrmind2026evr1maano8binstruct,
  title={Evrmind EVR-1 Maano-8b-Instruct: A Custom 3-Bit Compression Method for Coherent On-Device Instruction-Following},
  author={Evrmind},
  year={2026},
  url={https://huggingface.co/evrmind/evr-1-maano-8b-instruct}
}
```
## Contact
- Email: hello@evrmind.io
- Issues: [GitHub](https://github.com/evrmind-uk/evr-llama/issues)