EVR-1-Maano-8b-Instruct / MODEL_CARD.md
Evrmind's picture
Upload folder using huggingface_hub
1669554 verified
metadata
license: other
license_name: evrmind-free-1.0
license_link: LICENSE.md
language:
  - en
library_name: llama.cpp
tags:
  - llama
  - llama-3.1
  - gguf
  - 3-bit
  - quantization
  - evr
  - evrmind
  - text-generation
  - instruct
  - chat
  - on-device
  - maano
pipeline_tag: text-generation
model-index:
  - name: Evrmind EVR-1 Maano-8b-Instruct
    results:
      - task:
          type: text-generation
        metrics:
          - name: Perplexity (wikitext-2, ctx=512)
            type: perplexity
            value: 7.37
          - name: Coherence (rep4 @ 500 tokens)
            type: repetition-rate
            value: 2.77
          - name: Coherence (rep4 @ 1000 tokens)
            type: repetition-rate
            value: 9.66

Evrmind EVR-1 Maano-8b-Instruct (3.93 GiB)

A custom 3-bit compression of Meta's Llama 3.1 8B Instruct that fits in under 4 GiB while maintaining coherent instruction-following at 1000+ tokens.

EVR-1 is not a standard quantization (not Q2, Q3, Q4, etc). It is a custom compression method developed by Evrmind. The compressed weights average approximately 3 bits per parameter; the total GGUF file (3.93 GiB) includes additional metadata and structure overhead.

Key Results

Metric EVR-1 Instruct (3.93 GiB)
Coherence @ 500 tokens 2.77% repetition
Coherence @ 1000 tokens 9.66% repetition
Perplexity (wikitext-2, ctx=512) 7.37

Chat Example

User: What are the main causes of the French Revolution?

EVR-1 Maano-8b-Instruct:

The French Revolution, which lasted from 1789 to 1799, was a complex event with multiple causes. However, some of the main contributing factors include:

  1. Financial Crisis: France was deeply in debt from its involvement in the American Revolutionary War and other wars. The burden of taxation fell on the common people, who were already facing economic hardship.

  2. Social Inequality: The French nobility held a significant amount of power over commoners through their control over land and government positions.

  3. Enlightenment Ideas: The ideas presented by Enlightenment thinkers such as Rousseau, Voltaire, Diderot encouraged people to think critically about their government and seek change.

  4. Food Shortages: A series of poor harvests led to widespread hunger and starvation.

  5. Absolute Monarchy: The rule of King Louis XVI was seen as tyrannical by many.

  6. The American Revolution's Influence: The American Revolution inspired the French people with its ideas of liberty, equality, and fraternity.

How to Run

Download the model file and the binary for your platform, then:

# Extract the binary
mkdir -p linux-cuda && tar xzf evrmind-linux-cuda.tar.gz -C linux-cuda

# Run (interactive chat)
cd linux-cuda
LD_LIBRARY_PATH=. ./llama-cli -m ../evr-llama-3.1-8b-instruct.gguf -ngl 99

# Run (single completion)
LD_LIBRARY_PATH=. ./llama-completion -m ../evr-llama-3.1-8b-instruct.gguf -p "Your prompt here" -n 500 -ngl 99

Platform Binaries

Platform File GPU Required
Linux + NVIDIA evrmind-linux-cuda.tar.gz NVIDIA GPU (CUDA 12)
Linux + Any GPU evrmind-linux-vulkan.tar.gz Any Vulkan-capable GPU
Windows + NVIDIA evrmind-windows-cuda.zip NVIDIA GPU (CUDA 12)
Windows + Any GPU evrmind-windows-vulkan.zip Any Vulkan-capable GPU
macOS (Apple Silicon) evrmind-macos-metal.tar.gz M1/M2/M3/M4
Android (Termux) evrmind-android-vulkan.tar.gz Vulkan

Note: The binaries are the same for all EVR-1 models. You only need to download them once. Just point them at whichever GGUF you want to run.

Flags

Flag Description
-ngl 99 Offload all layers to GPU (recommended)
-n 500 Generate 500 tokens
-p "..." Your prompt
-t 8 Number of CPU threads (for CPU layers)

Model Details

  • Name: Evrmind EVR-1 Maano-8b-Instruct
  • Base model: Meta Llama 3.1 8B Instruct
  • Size: 3.93 GiB (GGUF)
  • Method: EVR-1 (Evrmind Reconstruction), a custom 3-bit compression method
  • Backends: CUDA, Vulkan, Metal, CPU
  • Context: Tested up to 2048 tokens; longer contexts have not been validated at 3-bit compression
  • Chat template: Llama 3.1 instruct format (built-in)

Benchmarks

Coherence (5 continuation-style prompts, 500 and 1000 tokens each)

Average 4-gram repetition rate (lower = better):

Model Size rep4 @ 500 rep4 @ 1000
EVR-1 Instruct 3.93 GiB 2.77% 9.66%

Also Available

Intended Use

This model is intended for on-device chat and instruction-following on laptops, desktops, and edge devices where memory is constrained. An Android (Termux) build is also available. There is no iOS build.

Limitations

  • Math reasoning is limited, consistent with the base Llama 3.1 8B Instruct at this compression level.
  • Occasional minor character-level artefacts (e.g., dropped letters) due to 3-bit compression.
  • Generation quality degrades somewhat beyond 1000 tokens.
  • As with all heavily quantized models, generated text may contain factual inaccuracies (e.g., incorrect numbers, dates, or scientific details). Always verify factual claims independently.

System Requirements

  • Storage: ~4 GiB for model weights + ~50 MB for binaries
  • RAM: 6 GiB minimum (8 GiB recommended)
  • GPU (recommended): NVIDIA GPU with CUDA 12, Apple Silicon (M1/M2/M3/M4), or any Vulkan-capable GPU
  • CPU-only: Supported but significantly slower
  • OS: Linux (x86_64), macOS (Apple Silicon), Windows (x86_64), Android (Termux, ARM64)
  • Not supported: iOS, 32-bit systems

Safety and Responsible Use

This model inherits the capabilities and limitations of its base model (Meta Llama 3.1 8B Instruct). Like all language models, it can generate incorrect, biased, or harmful content. Users should:

  • Not rely on this model for factual accuracy without verification
  • Not use this model to generate content that could cause harm
  • Apply appropriate content filtering for any user-facing applications
  • Be aware that 3-bit compression may amplify certain failure modes of the base model

Derivative Works

If you create derivative works, credit "EVR-1 Maano" in your model name and documentation. Commercial use is permitted subject to the Llama 3.1 Community License Agreement.

License

Available for personal, research, and commercial use with attribution, subject to upstream license terms. See LICENSE.md for full terms.

Built with Llama. This model is a derivative of Meta's Llama 3.1 8B Instruct and is subject to the Llama 3.1 Community License Agreement in addition to the Evrmind license.

Citation

@misc{evrmind2026evr1maano8binstruct,
  title={Evrmind EVR-1 Maano-8b-Instruct: A Custom 3-Bit Compression Method for Coherent On-Device Instruction-Following},
  author={Evrmind},
  year={2026},
  url={https://huggingface.co/evrmind/evr-1-maano-8b-instruct}
}

Contact