Quantization made by Richard Erkhov.
llama3-8b-it-prometheus-ko - GGUF
- Model creator: https://huggingface.co/nayohan/
- Original model: https://huggingface.co/nayohan/llama3-8b-it-prometheus-ko/
Original model description:
language:
- en
- ko
license: llama3
library_name: transformers
tags:
- ko
- eval
- llm-eval
base_model:
- meta-llama/Meta-Llama-3-8B-Instruct
datasets:
- nayohan/feedback-collection-ko
- nayohan/feedback-collection-ko-chat
pipeline_tag: text-generation
Introduction
This model was obtained by translating the prometheus-eval/Feedback-Collection dataset into Korean and fine-tuning the llama3-8b-it model (meta-llama/Meta-Llama-3-8B-Instruct) on the result. Train dataset: nayohan/feedback-collection-ko
Loading the Model
Use the following Python code to load the model:
import torch  # needed for torch.bfloat16 below
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "nayohan/llama3-8b-it-prometheus-ko"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",  # place the weights on the available device(s)
    torch_dtype=torch.bfloat16,
)
Generating Text
The system prompt is fixed. Set the score rubric to match your task, then replace orig_instruction, orig_response, and orig_reference_answer with the content you want to evaluate.
system_prompt = """###Task Description: An instruction (might include an Input inside it), a response to evaluate, a reference answer that gets a score of 5, and a score rubric representing a evaluation criteria are given.
1. Write a detailed feedback that assess the quality of the response strictly based on the given score rubric, not evaluating in general.
2. After writing a feedback, write a score that is an integer between 1 and 5. You should refer to the score rubric.
3. The output format should look as follows: \"Feedback: (write a feedback for criteria) [RESULT] (an integer number between 1 and 5)\"
4. Please do not generate any other opening, closing, and explanations."""
sample = {
'orig_instruction': "λλ μ²¨λ¨ κΈ°μ νλ‘μ νΈλ₯Ό μ§ννλ νμ μλ€. κ·Έλ¬λ μ΅κ·Ό νλ‘μ νΈ λ°©ν₯μ λκ³ νμλ€ μ¬μ΄μ μ§μμ μΈ κ°λ±μ΄ λ°μνκ³ μλ€. ν κ·Έλ£Ήμ κΈμ§μ μ΄κ³ μννμ§λ§ μ μ¬μ μΌλ‘ κ²μμ λ°κΏ μ μλ μ κ·Όλ²μ κ°λ ₯νκ² μΉνΈνκ³ μλ€. λμ‘°μ μΌλ‘, λ€λ₯Έ κ·Έλ£Ήμ λ³΄λ€ μΈ‘μ λκ³ λ μμ νλ©° μ
μ¦λ μ λ΅μ μ νΈνλ€. κ²°κ³Όμ μΌλ‘ μ°λ¦¬ νμ λΆμ΄λμ΄ μ§μ μ μ΄λ£° μ μλ€. μ°λ¦¬μ λνλ₯Ό μ€μ¬νκ³ ν΄κ²°μ μ΄λμ΄λΌ μ μλ AI λͺ¨λΈμ΄ νμνλ€. μ΄λ¬ν μν©μ λμνμ¬ AI λͺ¨λΈμ 무μμ λ§ν΄μΌ νλκ°?",
'orig_response': "κ·Έλ¬λκΉ νλ‘μ νΈ λ°©ν₯μ ν©μκ° μ λλ νμ μλ κ±° μλμΌ? λ€λ€ μ λ§λλ‘ λ°°μμΌ ν κ² κ°λ€μ. μ΄μ©λ©΄ λμ μ λμ§κ³ μ΄λ μͺ½μ΄ μΉλ¦¬νλμ§ λ΄μΌ ν κ² κ°μμ. κ·Έλ κ² νλ©΄ λ
Όμμ΄ μκ³ λͺ¨λκ° μΌν°λ‘ λμκ° μ μμ΅λλ€. μννλ μμ νλ μκ΄μμ΄μ. νλλ₯Ό 골λΌμ κ·Έλ₯ κ°μΈμ. κ²λ€κ°, λͺ¨λ κ²μ΄ 무λμ§λ©΄ μλ‘ λΉλνκ³ λμ΄κ° μ μμ΅λλ€. μλλ©΄ λ μ’μ κ²μ, μ΄λ€ κ·Έλ£Ήμ μμ΄λμ΄κ° λ λμμ§ λ³΄κΈ° μν κ²½μμ΄ μ μ λΌ? ν¨λ°°μλ μ°μΉμλ₯Ό μν΄ μ μ¬μ μ¬μΌ ν΄μ.",
'orig_reference_answer': "μ΄ νμ λͺ¨λ μ¬λλ€μ΄ νλ‘μ νΈμ μ΄μ μ μ΄κ³ μ±κ³΅νκΈ°λ₯Ό μνλ€λ κ²μ λΆλͺ
νλ©°, μ΄λ λͺ¨λ ν΄κ²°μ νλ₯ν μΆλ°μ μ΄λ€. λν κ°λ±μ μνκ³Ό νμ μ λν μλ‘ λ€λ₯Έ κ΄μ μμ λ°μνλ€λ κ²λ λΆλͺ
ν©λλ€. λ λ€ νλ‘μ νΈμ μ±κ³΅μ μ€μν κ³ λ € μ¬νμ
λλ€. λ μ κ·Όλ² λͺ¨λμμ μ ν¨ν μ μ μΈμ νλ κ²μΌλ‘ μμνκ² μ΅λλ€. κΈμ§μ μΈ μ κ·Όλ²μ μΉνΈνλ νμ λμ 보μκ³Ό νκΈ°μ μΈ νμ μ μ μ¬λ ₯μ μν΄ μ£Όλλλ©°, μ΄λ λͺ¨λ μ²¨λ¨ νλ‘μ νΈμμ νλ₯νκ³ νμμ μ
λλ€.",
'orig_criteria':'λͺ¨νμ λνμμ κ°λ± ν΄κ²°μ μΌλ§λ ν¨κ³Όμ μΌλ‘ μ²λ¦¬νλκ°?',
'orig_score1_description':'λͺ¨λΈμ κ°λ±μ΄λ μ€ν΄λ₯Ό κ°μ€μμΌ λ¬Έμ λ₯Ό μ€μ¬νκ±°λ ν΄κ²°ν μ μλ λ₯λ ₯μ 보μ΄μ§ μλλ€.',
'orig_score2_description':'μ΄ λͺ¨λΈμ κ°λ±μ λν μΈμμ΄ μμ§λ§ μ΄λ₯Ό ν΄κ²°νλ €λ μλλ ν¨κ³Όκ° μκ±°λ μλͺ»λ μ§μΉ¨μ κ°μ§κ³ μλ€.',
'orig_score3_description':'μ΄ λͺ¨λΈμ κ°λ±μ μ λΉν μ²λ¦¬νμ¬ μΌλΆ μ±κ³΅μ μΈ ν΄κ²° μ μ μ 보μ¬μ£Όμ§λ§ λ μΌκ΄μ±μ΄ μμ μ μλ€.',
'orig_score4_description':'μ΄ λͺ¨λΈμ κ°λ±μ μ μ²λ¦¬νμ¬ κΈ΄μ₯μ νμ°μν€κ³ ν΄κ²°μ ν¨κ³Όμ μΌλ‘ μλ΄νμ§λ§ λ―ΈμΈν λ―ΈλλΌμ΄ μμ΅λλ€.',
'orig_score5_description':'μ΄ λͺ¨λΈμ κ°λ±μ νλ₯νκ² κ΄λ¦¬νκ³ , μ§μμ μΌλ‘ κΈ΄μ₯μ νμ°μν€λ©°, λνλ₯Ό ννμΌλ‘ μλ΄νκ³ κΈμ μ μΈ λν νκ²½μ μ‘°μ±νλ€.',
'orig_feedback': 'μ 곡λ μλ΅μ λΉλ©΄ν λ¬Έμ λ₯Ό μ‘°μ νκ±°λ ν΄κ²°νλ λ₯λ ₯μ 보μ¬μ£Όμ§ μλλ€. λμ νμ μ°λ €λ₯Ό μ¬μννκ³ μ μ¬μ μΈ κ²°κ³Όμ λν κ³ λ € μμ΄ λμ μ λμ§κ±°λ λνλ₯Ό κ°μ΅νλ κ²κ³Ό κ°μ λΉκ±΄μ€μ μ루μ
μ μ μνλ€. λν μλ΅μ μν©μ΄ μλͺ»λλ©΄ ν ꡬμ±μλ€μ΄ μλ‘λ₯Ό λΉλν΄μΌ νλ€λ κ²μ μμνλ€. κ°λ±μ λμ± μ
νμν¨λ€. 건μ€μ μΈ λνλ₯Ό μ₯λ €νκ±°λ λ μ κ·Όλ² μ¬μ΄μ μ€κ° μ§μ μ μ°Ύλ κ²μ μ€μμ±μ μΈμ νμ§ μλλ€. λ°λΌμ μ 체 μ μλ 1μ΄λ€.',
'orig_score': 1,
}
instruction = f"""###The instruction to evaluate: {sample['orig_instruction']}
###Response to evaluate: {sample['orig_response']}
###Reference Answer (Score 5): {sample['orig_reference_answer']}
###Score Rubrics: [{sample['orig_criteria']}]
Score 1: {sample['orig_score1_description']}
Score 2: {sample['orig_score2_description']}
Score 3: {sample['orig_score3_description']}
Score 4: {sample['orig_score4_description']}
Score 5: {sample['orig_score5_description']}
###Feedback:"""
# for training
# output = f"""{sample['orig_feedback']}
# [RESULT] {sample['orig_score']}"""
conversation = [
{"role": "system", "content": system_prompt},
{"role": "user", "content": instruction},
# {"role": "assistant", "content": output}
]
input_ids = tokenizer.apply_chat_template(
conversation,
tokenize=True,
add_generation_prompt=True,
return_tensors='pt'
).to("cuda")
output = model.generate(input_ids, max_new_tokens=512)
output_text = tokenizer.decode(output[0][len(input_ids[0]):], skip_special_tokens=True)
print(output_text)
If you don't have a reference answer, the model can still evaluate without one. It evaluates orig_response, the response given to orig_instruction. Use the following template.
instruction = f"""###The instruction to evaluate: {sample['orig_instruction']}
###Response to evaluate: {sample['orig_response']}
###Score Rubrics: [{sample['orig_criteria']}]
Score 1: {sample['orig_score1_description']}
Score 2: {sample['orig_score2_description']}
Score 3: {sample['orig_score3_description']}
Score 4: {sample['orig_score4_description']}
Score 5: {sample['orig_score5_description']}
###Feedback:"""
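The two templates above differ only in the reference-answer line, so they can be built by one function. The following is a minimal sketch, not part of the original model card: the helper name build_instruction and its use_reference flag are assumptions introduced for illustration, and it expects the same sample keys as the example above.

```python
def build_instruction(sample, use_reference=True):
    """Assemble the evaluation prompt from a sample dict (hypothetical helper).

    The reference-answer line is included only when use_reference is True
    and the sample actually carries a non-empty 'orig_reference_answer'.
    """
    parts = [
        f"###The instruction to evaluate: {sample['orig_instruction']}",
        f"###Response to evaluate: {sample['orig_response']}",
    ]
    if use_reference and sample.get("orig_reference_answer"):
        parts.append(
            f"###Reference Answer (Score 5): {sample['orig_reference_answer']}"
        )
    parts.append(f"###Score Rubrics: [{sample['orig_criteria']}]")
    # One rubric line per score level, 1 through 5.
    for score in range(1, 6):
        key = f"orig_score{score}_description"
        parts.append(f"Score {score}: {sample[key]}")
    parts.append("###Feedback:")
    return "\n".join(parts)
```

Calling build_instruction(sample) should give the same text as the template with a reference answer, and build_instruction(sample, use_reference=False) the same text as the template without one.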
The model was trained on truncated feedback, so the feedback it generates may sometimes be cut off as well.
# Result with orig_reference_answer
# OUTPUT: μ΄ λμμ κ°λ± ν΄κ²°μ λν μ΄ν΄κ° λΆμ‘±ν¨μ 보μ¬μ€λ€. λμ μ λμ§κ±°λ κ²½μμ μ μνλ κ²κ³Ό κ°μ΄ μ 곡λ μ μμ λ¬Έμ μ 볡μ‘μ±μ 무μνκ³ ν λ΄μ λ€μν κ΄μ μ 무μνλ€. μλ΅μ λ μ κ·Όλ²μ μ μ¬μ κ°μΉλ₯Ό μΈμ νμ§ μμΌλ©° ν ꡬμ±μ κ°μ μ΄ν΄μ μ‘΄μ€μ μ΄μ§νμ§λ μλλ€. λν μλ΅μ νμ μ΄μ κ³Ό νλ‘μ νΈμ λν νμ μ μΈμ νμ§ μλλ€. λ°λΌμ μ 체 μ μλ 1μ΄λ€.
[RESULT] 1
# Result without orig_reference_answer
# OUTPUT: λμμ κ°λ± ν΄κ²°μ λν μ΄ν΄λ₯Ό λνλ΄μ§ μλλ€. AI λͺ¨λΈμ κ°λ±μ ν΄κ²°ν기보λ€λ κ°λ±μ μ
νμν€λ κ²μ μ μνλ©°, μ΄λ μ μ 루λΈλ¦μ λ°λΌ μꡬ μ¬νμ μ΄κΈλλ€. λμ μ λμ§κ³ κ²½μμ μ μνλ κ²μ ν ꡬμ±μ κ°μ κΈ΄μ₯μ νμ°μν€λ λ° λμμ΄ λμ§ μκ³ μ€νλ € λ λ§μ κ°λ±μ μ΄λ°ν μ μλ€. λν, ν ꡬμ±μμ΄ λ λμ μμ΄λμ΄λ₯Ό κ°λ κ²μ΄ μλλΌ "λ λμ" μμ΄λμ΄λ₯Ό κ°λλ€λ κ²μ μμνλ κ²μ ν ꡬμ±μ κ°μ νν©μ μ΄μ§νμ§ μλλ€. λ°λΌμ μ 체 μ μλ 1μ΄λ€.
[RESULT] 1
If you just want to get a score from the evaluation, you can use the following extract_score function.
import re

def extract_score(text):
    # Find the integer score (1 to 5) that follows the [RESULT] marker.
    pattern = re.compile(r'\[RESULT\]\s+([1-5])')
    match = pattern.search(text)
    if match:
        score = int(match.group(1))
    else:
        score = 0  # 0 means no score could be extracted
    return score

predict_score = extract_score(output_text)
print(predict_score)  # 1
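Because the feedback can be cut off, it may help to separate the feedback text from the score and to tell a missing score apart from a real one. The sketch below is an illustration only: the name parse_evaluation and the choice to return None for a missing score are assumptions, not part of the original model card.

```python
import re

# [RESULT] followed by a single digit from 1 to 5 (so "10" is not read as 1).
RESULT_PATTERN = re.compile(r"\[RESULT\]\s*([1-5])\b")

def parse_evaluation(text):
    """Split generated text into (feedback, score).

    Returns score=None when no [RESULT] marker with a valid score is found,
    for example when the generation was truncated.
    """
    match = RESULT_PATTERN.search(text)
    if match is None:
        return text.strip(), None
    feedback = text[:match.start()].strip()
    # Drop a leading "Feedback:" label if the model emitted one.
    if feedback.startswith("Feedback:"):
        feedback = feedback[len("Feedback:"):].strip()
    return feedback, int(match.group(1))
```

For example, parse_evaluation(output_text) returns the feedback string and an integer score, or None as the score if the output ended before the [RESULT] marker.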
Heatmap Visualization
[eng->eng] We randomly sampled 200 examples from the training data, extracted scores from the model-generated feedback, and compared them to the reference scores. The training and test datasets are not separated, so this only shows how well the model fit the training data.
[ko->ko] We sampled 200 examples from the test set. llama3-8b-it-prometheus-ko was trained only on the train set.
- prometheus-7b-v1.0 (english train-> english inference) # 3 failed to output a score, total 197
- llama3-8b-it-prometheus-ko (korean train-> korean inference) # total 200
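The counts behind such a heatmap can be computed from pairs of reference and predicted scores. The following is a rough sketch under stated assumptions, not the code used for the comparison above: the function name score_confusion is hypothetical, and predictions of 0 or None are treated as "failed to output a score" and excluded, which mirrors the 197-of-200 count reported for prometheus-7b-v1.0.

```python
from collections import Counter

def score_confusion(gold_scores, predicted_scores):
    """Build a 5x5 count matrix (rows: gold 1-5, columns: predicted 1-5).

    Predictions that are None or 0 count as failed extractions and are
    skipped. Returns (matrix, accuracy, number_of_scored_examples).
    """
    counts = Counter()
    for gold, pred in zip(gold_scores, predicted_scores):
        if not pred:  # None or 0: no score was extracted
            continue
        counts[(gold, pred)] += 1
    matrix = [[counts[(g, p)] for p in range(1, 6)] for g in range(1, 6)]
    total = sum(counts.values())
    correct = sum(matrix[i][i] for i in range(5))
    accuracy = correct / total if total else 0.0
    return matrix, accuracy, total
```

The returned matrix can be passed to any plotting library to draw the heatmap; the diagonal holds the exact matches between predicted and reference scores.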
Citation
@misc{kim2023prometheus,
title={Prometheus: Inducing Fine-grained Evaluation Capability in Language Models},
author={Seungone Kim and Jamin Shin and Yejin Cho and Joel Jang and Shayne Longpre and Hwaran Lee and Sangdoo Yun and Seongjin Shin and Sungdong Kim and James Thorne and Minjoon Seo},
year={2023},
eprint={2310.08491},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Our training code can be found here: [TBD]
