---
base_model:
  - deepreinforce-ai/Ornith-1.0-397B
base_model_relation: quantized
quantized_by: Alittlehammmer
license: mit
license_link: https://huggingface.co/deepreinforce-ai/Ornith-1.0-397B/blob/main/LICENSE
pipeline_tag: text-generation
library_name: gguf
tags:
  - gguf
  - quantized
  - llama.cpp
  - text-generation
  - agentic-coding
---

# Ornith-1.0-397B-GGUF

GGUF quantizations of [deepreinforce-ai/Ornith-1.0-397B](https://huggingface.co/deepreinforce-ai/Ornith-1.0-397B).

Converted to BF16 using `convert_hf_to_gguf.py`, then quantized using `llama-quantize` from [llama.cpp](https://github.com/ggml-org/llama.cpp).

No importance matrix (`imatrix`) was used for any of these quants. That is something I want to explore in future model uploads.

## Available quants

| Quant  | Bits  | Size    | Notes                            |
| ------ | ----- | ------- | -------------------------------- |
| Q4_K_M | 4     | ~241 GB | Recommended                      |
| Q6_K   | 6     | ~326 GB | Very high quality                |
| BF16   | 16    | ~793 GB | Full precision, reference file   |
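
If you only want one quant, you can fetch just its folder rather than the whole repository. The following is a minimal sketch, assuming the `huggingface-cli` tool from the `huggingface_hub` package is installed. The repo id is an assumption inferred from the uploader name and the model name, and the per-quant folder layout is inferred from the Q6_K path in the usage example; adjust both if they differ. `download_quant` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# ASSUMPTION: repo id inferred from the uploader and model name; change it if needed.
REPO="Alittlehammmer/Ornith-1.0-397B-GGUF"

# Hypothetical helper: download only the shards of one quant into the current directory.
# ASSUMPTION: each quant lives in a folder named Ornith-1.0-397B-<QUANT>/.
download_quant() {
  local quant="$1"
  huggingface-cli download "$REPO" \
    --include "Ornith-1.0-397B-${quant}/*" \
    --local-dir .
}

# Example (downloads roughly 241 GB):
# download_quant Q4_K_M
```

Make sure the target disk has room for the size listed in the table before starting.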

## Usage

Download all shards of the quant you want and run them with llama.cpp. The remaining
split files are loaded automatically when you point at the first shard:

```bash
llama-cli -m Ornith-1.0-397B-Q6_K/Ornith-1.0-397B-Q6_K-00001-of-00007.gguf -p "Hello"
```

Keep all shards of a quant in the same directory. llama.cpp finds the remaining
`-0000N-of-0000X` files from the first one, so you do not need to merge the shards manually.
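
Since loading starts from the first shard and relies on the others being present, a quick check before launching can catch an interrupted download. This is a small sketch rather than part of llama.cpp; `missing_shards` is a hypothetical helper, and it assumes the shard naming shown above (`<prefix>-00001-of-0000X.gguf`).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the name of every shard that is missing from a directory.
# Usage: missing_shards <dir> <prefix> <total>
missing_shards() {
  local dir="$1" prefix="$2" total="$3" i name
  for ((i = 1; i <= total; i++)); do
    # Build the expected file name, e.g. <prefix>-00001-of-00007.gguf
    printf -v name '%s-%05d-of-%05d.gguf' "$prefix" "$i" "$total"
    [[ -f "$dir/$name" ]] || echo "$name"
  done
}

# Example: check the seven Q6_K shards before starting llama-cli.
# missing_shards Ornith-1.0-397B-Q6_K Ornith-1.0-397B-Q6_K 7
```

If the function prints nothing, all expected shards are in place.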

## Original model

See the [original model card](https://huggingface.co/deepreinforce-ai/Ornith-1.0-397B)
for details on capabilities, benchmarks, and license.