Transformers
GGUF
text-generation-inference
unsloth
llama
Eval Results (legacy)
imatrix
conversational
Instructions to use BenevolenceMessiah/Replete-Coder-Llama3-8B-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use BenevolenceMessiah/Replete-Coder-Llama3-8B-GGUF with Transformers:
# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("BenevolenceMessiah/Replete-Coder-Llama3-8B-GGUF", dtype="auto") - llama-cpp-python
How to use BenevolenceMessiah/Replete-Coder-Llama3-8B-GGUF with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="BenevolenceMessiah/Replete-Coder-Llama3-8B-GGUF", filename="replete-coder-llama3-8b-iq3_m-imat.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use BenevolenceMessiah/Replete-Coder-Llama3-8B-GGUF with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf BenevolenceMessiah/Replete-Coder-Llama3-8B-GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf BenevolenceMessiah/Replete-Coder-Llama3-8B-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf BenevolenceMessiah/Replete-Coder-Llama3-8B-GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf BenevolenceMessiah/Replete-Coder-Llama3-8B-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf BenevolenceMessiah/Replete-Coder-Llama3-8B-GGUF:Q4_K_M # Run inference directly in the terminal: ./llama-cli -hf BenevolenceMessiah/Replete-Coder-Llama3-8B-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf BenevolenceMessiah/Replete-Coder-Llama3-8B-GGUF:Q4_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf BenevolenceMessiah/Replete-Coder-Llama3-8B-GGUF:Q4_K_M
Use Docker
docker model run hf.co/BenevolenceMessiah/Replete-Coder-Llama3-8B-GGUF:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use BenevolenceMessiah/Replete-Coder-Llama3-8B-GGUF with Ollama:
ollama run hf.co/BenevolenceMessiah/Replete-Coder-Llama3-8B-GGUF:Q4_K_M
- Unsloth Studio
How to use BenevolenceMessiah/Replete-Coder-Llama3-8B-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for BenevolenceMessiah/Replete-Coder-Llama3-8B-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for BenevolenceMessiah/Replete-Coder-Llama3-8B-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for BenevolenceMessiah/Replete-Coder-Llama3-8B-GGUF to start chatting
- Atomic Chat new
- Docker Model Runner
How to use BenevolenceMessiah/Replete-Coder-Llama3-8B-GGUF with Docker Model Runner:
docker model run hf.co/BenevolenceMessiah/Replete-Coder-Llama3-8B-GGUF:Q4_K_M
- Lemonade
How to use BenevolenceMessiah/Replete-Coder-Llama3-8B-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull BenevolenceMessiah/Replete-Coder-Llama3-8B-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.Replete-Coder-Llama3-8B-GGUF-Q4_K_M
List all available models
lemonade list
| license: other | |
| license_name: llama-3 | |
| license_link: https://llama.meta.com/llama3/license/ | |
| tags: | |
| - text-generation-inference | |
| - transformers | |
| - unsloth | |
| - llama | |
| datasets: | |
| - Replete-AI/code_bagel_hermes-2.5 | |
| - Replete-AI/code_bagel | |
| - Replete-AI/OpenHermes-2.5-Uncensored | |
| - teknium/OpenHermes-2.5 | |
| - layoric/tiny-codes-alpaca | |
| - glaiveai/glaive-code-assistant-v3 | |
| - ajibawa-2023/Code-290k-ShareGPT | |
| - TIGER-Lab/MathInstruct | |
| - chargoddard/commitpack-ft-instruct-rated | |
| - iamturun/code_instructions_120k_alpaca | |
| - ise-uiuc/Magicoder-Evol-Instruct-110K | |
| - cognitivecomputations/dolphin-coder | |
| - nickrosh/Evol-Instruct-Code-80k-v1 | |
| - coseal/CodeUltraFeedback_binarized | |
| - glaiveai/glaive-function-calling-v2 | |
| - CyberNative/Code_Vulnerability_Security_DPO | |
| - jondurbin/airoboros-2.2 | |
| - camel-ai | |
| - lmsys/lmsys-chat-1m | |
| - CollectiveCognition/chats-data-2023-09-22 | |
| - CoT-Alpaca-GPT4 | |
| - WizardLM/WizardLM_evol_instruct_70k | |
| - WizardLM/WizardLM_evol_instruct_V2_196k | |
| - teknium/GPT4-LLM-Cleaned | |
| - GPTeacher | |
| - OpenGPT | |
| - meta-math/MetaMathQA | |
| - Open-Orca/SlimOrca | |
| - garage-bAInd/Open-Platypus | |
| - anon8231489123/ShareGPT_Vicuna_unfiltered | |
| - Unnatural-Instructions-GPT4 | |
| model-index: | |
| - name: Replete-Coder-llama3-8b | |
| results: | |
| - task: | |
| name: HumanEval | |
| type: text-generation | |
| dataset: | |
| type: openai_humaneval | |
| name: HumanEval | |
| metrics: | |
| - name: pass@1 | |
| type: pass@1 | |
| value: .64683835842678326 | |
| verified: True | |
| - task: | |
| name: AI2 Reasoning Challenge | |
| type: text-generation | |
| dataset: | |
| name: AI2 Reasoning Challenge (25-Shot) | |
| type: ai2_arc | |
| config: ARC-Challenge | |
| split: test | |
| args: | |
| num_few_shot: 25 | |
| metrics: | |
| - type: accuracy | |
| value: | |
| name: normalized accuracy | |
| source: | |
| url: https://www.placeholderurl.com | |
| name: Open LLM Leaderboard | |
| - task: | |
| name: Text Generation | |
| type: text-generation | |
| dataset: | |
| name: HellaSwag (10-Shot) | |
| type: hellaswag | |
| split: validation | |
| args: | |
| num_few_shot: 10 | |
| metrics: | |
| - type: accuracy | |
| value: | |
| name: normalized accuracy | |
| source: | |
| url: https://www.placeholderurl.com | |
| name: Open LLM Leaderboard | |
| - task: | |
| name: Text Generation | |
| type: text-generation | |
| dataset: | |
| name: MMLU (5-Shot) | |
| type: cais/mmlu | |
| config: all | |
| split: test | |
| args: | |
| num_few_shot: 5 | |
| metrics: | |
| - type: accuracy | |
| value: | |
| name: accuracy | |
| source: | |
| url: https://www.placeholderurl.com | |
| name: Open LLM Leaderboard | |
| - task: | |
| name: Text Generation | |
| type: text-generation | |
| dataset: | |
| name: TruthfulQA (0-shot) | |
| type: truthful_qa | |
| config: multiple_choice | |
| split: validation | |
| args: | |
| num_few_shot: 0 | |
| metrics: | |
| - type: multiple_choice_accuracy | |
| value: | |
| source: | |
| url: https://www.placeholderurl.com | |
| name: Open LLM Leaderboard | |
| - task: | |
| name: Text Generation | |
| type: text-generation | |
| dataset: | |
| name: Winogrande (5-shot) | |
| type: winogrande | |
| config: winogrande_xl | |
| split: validation | |
| args: | |
| num_few_shot: 5 | |
| metrics: | |
| - type: accuracy | |
| value: | |
| name: accuracy | |
| source: | |
| url: https://www.placeholderurl.com | |
| name: Open LLM Leaderboard | |
| - task: | |
| name: Text Generation | |
| type: text-generation | |
| dataset: | |
| name: GSM8k (5-shot) | |
| type: gsm8k | |
| config: main | |
| split: test | |
| args: | |
| num_few_shot: 5 | |
| metrics: | |
| - type: accuracy | |
| value: | |
| name: accuracy | |
| source: | |
| url: https://www.placeholderurl.com | |
| name: Open LLM Leaderboard | |
| Greetings friends, Asalamu Alaikum; I am pleased to provide you with GGUF vresions of this great model! The original model is: [Replete-AI/Replete-Coder-Llama3-8B](https://huggingface.co/Replete-AI/Replete-Coder-Llama3-8B/tree/main) by [Replete-AI](https://huggingface.co/Replete-AI) with the finetuning conducted by [rombo dawg](https://huggingface.co/rombodawg). | |
| <!-- description start --> | |
| ## Description (per [TheBloke](https://huggingface.co/TheBloke)) | |
| This repo contains GGUF format model files. | |
| These files were quantised using ggml-org/gguf-my-repo [https://huggingface.co/spaces/ggml-org/gguf-my-repo] | |
| <!-- description end --> | |
| <!-- README_GGUF.md-about-gguf start --> | |
| ### About GGUF (per [TheBloke](https://huggingface.co/TheBloke)) | |
| GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. | |
| Here is an incomplete list of clients and libraries that are known to support GGUF: | |
| * [llama.cpp](https://github.com/ggerganov/llama.cpp). The source project for GGUF. Offers a CLI and a server option. | |
| * [text-generation-webui](https://github.com/oobabooga/text-generation-webui), the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. | |
| * [KoboldCpp](https://github.com/LostRuins/koboldcpp), a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. | |
| * [GPT4All](https://gpt4all.io/index.html), a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. | |
| * [LM Studio](https://lmstudio.ai/), an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. | |
| * [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui), a great web UI with many interesting and unique features, including a full model library for easy model selection. | |
| * [Faraday.dev](https://faraday.dev/), an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. | |
| * [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. | |
| * [candle](https://github.com/huggingface/candle), a Rust ML framework with a focus on performance, including GPU support, and ease of use. | |
| * [ctransformers](https://github.com/marella/ctransformers), a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. | |
| <!-- README_GGUF.md-about-gguf end --> | |
| <!-- repositories-available start --> | |
| --- | |
| # Replete-Coder-llama3-8b | |
| Finetuned by: Rombodawg | |
| ### More than just a coding model! | |
| Although Replete-Coder has amazing coding capabilities, its trained on vaste amount of non-coding data, fully cleaned and uncensored. Dont just use it for coding, use it for all your needs! We are truly trying to make the GPT killer! | |
|  | |
| Thank you to TensorDock for sponsoring Replete-Coder-llama3-8b and Replete-Coder-Qwen2-1.5b | |
| you can check out their website for cloud compute rental below. | |
| - https://tensordock.com | |
| __________________________________________________________________________________________________ | |
| Replete-Coder-llama3-8b is a general purpose model that is specially trained in coding in over 100 coding languages. The data used to train the model contains 25% non-code instruction data and 75% coding instruction data totaling up to 3.9 million lines, roughly 1 billion tokens, or 7.27gb of instruct data. The data used to train this model was 100% uncensored, then fully deduplicated, before training happened. | |
| The Replete-Coder models (including Replete-Coder-llama3-8b and Replete-Coder-Qwen2-1.5b) feature the following: | |
| - Advanced coding capabilities in over 100 coding languages | |
| - Advanced code translation (between languages) | |
| - Security and vulnerability prevention related coding capabilities | |
| - General purpose use | |
| - Uncensored use | |
| - Function calling | |
| - Advanced math use | |
| - Use on low end (8b) and mobile (1.5b) platforms | |
| Notice: Replete-Coder series of models are fine-tuned on a context window of 8192 tokens. Performance past this context window is not guaranteed. | |
|  | |
| __________________________________________________________________________________________________ | |
| You can find the 25% non-coding instruction below: | |
| - https://huggingface.co/datasets/Replete-AI/OpenHermes-2.5-Uncensored | |
| And the 75% coding specific instruction data below: | |
| - https://huggingface.co/datasets/Replete-AI/code_bagel | |
| These two datasets were combined to create the final dataset for training, which is linked below: | |
| - https://huggingface.co/datasets/Replete-AI/code_bagel_hermes-2.5 | |
| __________________________________________________________________________________________________ | |
| ## Prompt Template: Custom Alpaca | |
| ``` | |
| ### System: | |
| {} | |
| ### Instruction: | |
| {} | |
| ### Response: | |
| {} | |
| ``` | |
| Note: The system prompt varies in training data, but the most commonly used one is: | |
| ``` | |
| Below is an instruction that describes a task, Write a response that appropriately completes the request. | |
| ``` | |
| End token: | |
| ``` | |
| <|endoftext|> | |
| ``` | |
| __________________________________________________________________________________________________ | |
| Thank you to the community for your contributions to the Replete-AI/code_bagel_hermes-2.5 dataset. Without the participation of so many members making their datasets free and open source for any to use, this amazing AI model wouldn't be possible. | |
| Extra special thanks to Teknium for the Open-Hermes-2.5 dataset and jondurbin for the bagel dataset and the naming idea for the code_bagel series of datasets. You can find both of their huggingface accounts linked below: | |
| - https://huggingface.co/teknium | |
| - https://huggingface.co/jondurbin | |
| Another special thanks to unsloth for being the main method of training for Replete-Coder. Bellow you can find their github, as well as the special Replete-Ai secret sause (Unsloth + Qlora + Galore) colab code document that was used to train this model. | |
| - https://github.com/unslothai/unsloth | |
| - https://colab.research.google.com/drive/1VAaxMQJN9-78WLsPU0GWg5tEkasXoTP9?usp=sharing | |
| __________________________________________________________________________________________________ | |
| ## Join the Replete-Ai discord! We are a great and Loving community! | |
| - https://discord.gg/ZZbnsmVnjD |