---
base_model:
- deepreinforce-ai/Ornith-1.0-397B
base_model_relation: quantized
quantized_by: Alittlehammmer
license: mit
license_link: https://huggingface.co/deepreinforce-ai/Ornith-1.0-397B/blob/main/LICENSE
pipeline_tag: text-generation
library_name: gguf
tags:
- gguf
- quantized
- llama.cpp
- text-generation
- agentic-coding
---

# Ornith-1.0-397B-GGUF

GGUF quantizations of [deepreinforce-ai/Ornith-1.0-397B](https://huggingface.co/deepreinforce-ai/Ornith-1.0-397B).

The model was converted to BF16 with `convert_hf_to_gguf.py`, then quantized with `llama-quantize` from [llama.cpp](https://github.com/ggml-org/llama.cpp). None of these quants used an `imatrix`; that is something I want to explore for future model uploads.

## Available quants

| Quant  | Bits | Size    | Notes                          |
| ------ | ---- | ------- | ------------------------------ |
| Q4_K_M | 4    | ~241 GB | Recommended                    |
| Q6_K   | 6    | ~326 GB | Very high quality              |
| BF16   | 16   | ~793 GB | Full precision, reference file |

## Usage

Download the shards and run with llama.cpp. The split files are loaded automatically when you point at the first shard:

```bash
llama-cli -m Ornith-1.0-397B-Q6_K/Ornith-1.0-397B-Q6_K-00001-of-00007.gguf -p "Hello"
```

llama.cpp handles the `-00001-of-0000X` split automatically, so you do not need to merge the shards manually.

## Original model

See the [original model card](https://huggingface.co/deepreinforce-ai/Ornith-1.0-397B) for details on capabilities, benchmarks, and license.
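
## Checking shards before loading

Because the loader is pointed only at the first shard (see Usage), an incomplete download may not show up until the model fails to load. The sketch below checks that every shard is on disk first. It is not part of llama.cpp: `check_shards` is a hypothetical helper, and it assumes the shards sit in one directory and follow the five-digit `-00001-of-0000X` naming shown in the Usage example.

```bash
# Hypothetical helper (not a llama.cpp tool): verify that all shards of a
# split GGUF are present. Assumes names like <prefix>-00001-of-00007.gguf.
# Usage: check_shards <directory> <prefix> <total>
check_shards() {
  dir=$1
  prefix=$2
  total=$3   # plain integer, e.g. 7 (not zero-padded)
  missing=0
  i=1
  while [ "$i" -le "$total" ]; do
    # Build the expected file name for shard i.
    name=$(printf '%s-%05d-of-%05d.gguf' "$prefix" "$i" "$total")
    if [ ! -f "$dir/$name" ]; then
      echo "missing: $name" >&2
      missing=$((missing + 1))
    fi
    i=$((i + 1))
  done
  # Succeed only if no shard was missing.
  [ "$missing" -eq 0 ]
}

# Example, using the Q6_K paths from the Usage section:
# check_shards Ornith-1.0-397B-Q6_K Ornith-1.0-397B-Q6_K 7
```

The function returns a non-zero status and prints each missing file name to stderr, so it can be chained with `&&` in front of the `llama-cli` command.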