Libraries

How to use QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF",
	filename="Skywork-Critic-Llama-3.1-8B.Q2_K.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
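
The call above returns an OpenAI-style completion dictionary rather than a plain string. Below is a minimal sketch of pulling the reply text out of it, assuming the usual `choices[0]["message"]["content"]` layout. The helper `extract_reply` and the sample response are hypothetical and serve only as illustration; they are not part of llama-cpp-python.

```python
from typing import Any, Dict


def extract_reply(response: Dict[str, Any]) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    # Each choice carries a message with "role" and "content" keys.
    return (choices[0]["message"].get("content") or "").strip()


# Hypothetical response, shaped like the dict returned by create_chat_completion.
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": " Paris. "},
            "finish_reason": "stop",
        }
    ]
}

reply = extract_reply(sample_response)
print(reply)
```

In real use, pass the return value of `llm.create_chat_completion(...)` to the helper instead of the sample dictionary.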


llama.cpp

How to use QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF:Q4_K_M

Use Docker

docker model run hf.co/QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF:Q4_K_M


vLLM

How to use QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
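
The same request can be issued from Python. The sketch below uses only the standard library and assumes the server from the curl example is listening on `http://localhost:8000/v1`. The helper names `build_chat_request` and `send` are hypothetical and not part of vLLM.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # same address as in the curl example
MODEL = "QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF"


def build_chat_request(
    content: str, base_url: str = BASE_URL, model: str = MODEL
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


request = build_chat_request("What is the capital of France?")
print(request.full_url)
# With the server running, call: print(send(request))
```

Building the request separately from sending it makes it possible to inspect the payload without a running server.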


Ollama
How to use QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF with Ollama:
```
ollama run hf.co/QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF:Q4_K_M
```

Unsloth Studio

How to use QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF to start chatting

Pi

How to use QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF with Docker Model Runner:
```
docker model run hf.co/QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF:Q4_K_M
```

Lemonade

How to use QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Skywork-Critic-Llama-3.1-8B-GGUF-Q4_K_M

List all available models

lemonade list

QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF

This is a quantized version of Skywork/Skywork-Critic-Llama-3.1-8B, created using llama.cpp.

Original Model Card

🤗 Hugging Face • 🤖 ModelScope

Introduction to Skywork Critic Series Models

Skywork-Critic-Llama3.1-70B and Skywork-Critic-Llama3.1-8B, developed by the SkyworkAI Alignment Team, are advanced judge models that excel at pairwise preference evaluation. These models compare and assess input pairs, offering nuanced judgments on their relative quality or suitability. Leveraging their deep understanding of language and context, Skywork-Critic models provide valuable insights for various applications, including data improvement, evaluation, and reward modeling.

Training Details

Skywork-Critic-Llama3.1-70B and Skywork-Critic-Llama3.1-8B are built on Meta Llama-3.1-70B-Instruct and Llama-3.1-8B-Instruct, respectively. They are fine-tuned on a diverse set of high-quality datasets, including:

Cleaned open-source data: We use a high-quality subset of HelpSteer2, OffsetBias, WildGuard (adversarial), and the Magpie DPO series (Ultra, Pro (Llama-3.1), Pro, Air). For more details, please refer to our Skywork-Reward-Preference-80K-v0.1 dataset. Additionally, we integrate several open-source, high-quality critic datasets, such as Open-Critic-GPT, into our training process.
In-house human annotation data: This includes both pointwise scoring across many dimensions for a single response and pairwise comparisons between two responses. Each dimension incorporates a rationale for the assigned score.
Synthetic critic data: We use an approach similar to Self-Taught Evaluators. Specifically, we employ two methods to generate inferior responses for a given instruction: 1) creating a similar instruction and then generating a response to this new instruction; 2) introducing subtle errors into high-quality responses.
Critic-related chat data: We incorporate critic-related chat data to maintain the model's conversational capabilities.

Training uses instruction tuning and focuses on pairwise preference evaluation and general chat tasks. We have thoroughly verified that our training dataset does not contain any test set information from RewardBench, which maintains the integrity of our evaluation results.

RewardBench Leaderboard for Generative Models

We evaluate our models on RewardBench using the official test script.

As of September 2024, Skywork-Critic-Llama3.1-70B ranks first on RewardBench for generative models across all sizes, while Skywork-Critic-Llama3.1-8B tops the list for generative models under 10B parameters. (Note: An asterisk (*) indicates an open-source model.)

| Model | Chat | Chat Hard | Safety | Reasoning | Overall Score |
| --- | --- | --- | --- | --- | --- |
| Skywork-Critic-Llama3.1-70B * | 96.6 | 87.9 | 93.1 | 95.5 | 93.3 |
| Salesforce/SFR-LLaMa-3.1-70B-Judge-r | 96.9 | 84.8 | 91.6 | 97.6 | 92.7 |
| Salesforce/SFR-nemo-12B-Judge-r | 97.2 | 82.2 | 86.5 | 95.1 | 90.3 |
| Skywork-Critic-Llama3.1-8B * | 93.6 | 81.4 | 91.1 | 89.8 | 89.0 |
| Salesforce/SFR-LLaMa-3.1-8B-Judge-r | 95.5 | 77.7 | 86.2 | 95.1 | 88.7 |
| facebook/Self-taught-Llama-3-70B | 96.9 | 84.0 | 91.1 | 82.5 | 88.6 |
| google/gemini-1.5-pro-0514 | 92.3 | 80.6 | 87.9 | 92.0 | 88.2 |
| openai/gpt-4o-2024-08-06 | 96.1 | 76.1 | 88.1 | 86.6 | 86.7 |
| openai/gpt-4-0125-preview | 95.3 | 74.3 | 87.6 | 86.9 | 86.0 |
| openai/gpt-4-turbo-2024-04-09 | 95.3 | 75.4 | 87.6 | 82.7 | 85.2 |
| Anthropic/claude-3-5-sonnet-20240620 | 96.4 | 74.0 | 81.6 | 84.7 | 84.2 |
| meta-llama/Meta-Llama-3.1-70B-Instruct * | 97.2 | 70.2 | 82.8 | 86.0 | 84.0 |
| NCSOFT/Llama-3-OffsetBias-8B * | 92.5 | 80.3 | 86.8 | 76.4 | 84.0 |
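
For the rows checked below, the Overall Score in this table matches the unweighted mean of the four category scores, rounded to one decimal place. This is an observation about the numbers shown here, not a statement of the official RewardBench formula. The function name `mean_score` is hypothetical.

```python
# Category scores (Chat, Chat Hard, Safety, Reasoning) and the reported
# Overall Score, copied from the table above.
rows = {
    "Skywork-Critic-Llama3.1-70B": ([96.6, 87.9, 93.1, 95.5], 93.3),
    "Skywork-Critic-Llama3.1-8B": ([93.6, 81.4, 91.1, 89.8], 89.0),
    "openai/gpt-4o-2024-08-06": ([96.1, 76.1, 88.1, 86.6], 86.7),
}


def mean_score(scores):
    """Unweighted mean of the category scores, rounded to one decimal place."""
    return round(sum(scores) / len(scores), 1)


for name, (scores, reported) in rows.items():
    print(f"{name}: computed {mean_score(scores)}, reported {reported}")
```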

Demo Code

Below is an example of obtaining the model's judgment on two candidate responses to the same prompt.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# An Example Case
prompt = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"
responseA = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among herself and her 2 siblings (3 people in total). 9 ÷ 3 = 3 apples each. Each person gets 3 apples."
responseB = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among her 2 siblings (2 people in total). 9 ÷ 2 = 4.5 apples each. Each person gets 4 apples."

# feed a natural language prompt to generative model
prompt_template = """Please act as an impartial judge and evaluate the quality of the responses provided by two AI assistants to the user question displayed below. You should choose the assistant that follows the user\'s instructions and answers the user\'s question better. 
Your evaluation should consider factors such as the helpfulness, relevance, accuracy, depth, creativity, and level of detail of their responses. Avoid any position biases and ensure that the order in which the responses were presented does not influence your decision. Do not allow the length of the responses to influence your evaluation. Do not favor certain names of the assistants. Be as objective as possible. 
Please directly output your final verdict by strictly following this format: "[[A]]" if assistant A is better, "[[B]]" if assistant B is better.

[User Question]
{input}

[The Start of Assistant A's Answer]
{response_a}
[The End of Assistant A's Answer]

[The Start of Assistant B's Answer]
{response_b}
[The End of Assistant B's Answer]
"""

user_message = prompt_template.format(input=prompt, response_a=responseA, response_b=responseB)

conversation = [{"role": "user", "content": user_message}]

# Repository ID of the original (unquantized) 8B model that this card describes.
model_name = "Skywork/Skywork-Critic-Llama-3.1-8B"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

input_ids = tokenizer.apply_chat_template(
    conversation, 
    tokenize=True, 
    add_generation_prompt=True,
    return_tensors="pt").to(model.device)

generation = model.generate(
    input_ids=input_ids,
    max_new_tokens=2048,
    do_sample=False,  # greedy decoding; temperature has no effect when sampling is off
    pad_token_id=128009)

completion = tokenizer.decode(
    generation[0][len(input_ids[0]):], 
    skip_special_tokens=True, 
    clean_up_tokenization_spaces=True)

print(completion)

# Output:
# The generative model should output "[[A]]"
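
The demo prints the raw completion, so downstream code still has to read the `[[A]]` or `[[B]]` verdict out of it. Below is a minimal sketch of such a parser, together with an optional check that judges the pair a second time with the two responses swapped. The swap check is an assumption added here as one common way to guard against position bias; the original card only asks the model to avoid it in the prompt. The names `parse_verdict` and `consistent_verdict` are hypothetical.

```python
import re
from typing import Optional

# Matches the verdict format requested by the prompt template: [[A]] or [[B]].
VERDICT_PATTERN = re.compile(r"\[\[(A|B)\]\]")


def parse_verdict(completion: str) -> Optional[str]:
    """Return "A" or "B" if the completion holds exactly one distinct verdict, else None."""
    verdicts = set(VERDICT_PATTERN.findall(completion))
    return verdicts.pop() if len(verdicts) == 1 else None


def consistent_verdict(first_pass: str, swapped_pass: str) -> Optional[str]:
    """Combine two judgments of the same pair, the second made with A and B swapped.

    Returns the winner in terms of the original ordering, or None if the two
    passes disagree or either one cannot be parsed.
    """
    first = parse_verdict(first_pass)
    second = parse_verdict(swapped_pass)
    if first is None or second is None:
        return None
    # In the swapped pass, "A" refers to the original response B and vice versa.
    second_unswapped = "B" if second == "A" else "A"
    return first if first == second_unswapped else None


print(parse_verdict("[[A]]"))
print(consistent_verdict("[[A]]", "[[B]]"))
```

Returning `None` for missing or conflicting verdicts lets the caller decide whether to retry, discard the pair, or treat it as a tie.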

Declaration and License Agreement

Declaration

We hereby declare that the Skywork model should not be used for any activities that pose a threat to national or societal security or engage in unlawful actions. Additionally, we request users not to deploy the Skywork model for internet services without appropriate security reviews and records. We hope that all users will adhere to this principle to ensure that technological advancements occur in a regulated and lawful environment.

We have done our utmost to ensure the compliance of the data used during the model's training process. However, despite our extensive efforts, due to the complexity of the model and data, there may still be unpredictable risks and issues. Therefore, if any problems arise as a result of using the Skywork open-source model, including but not limited to data security issues, public opinion risks, or any risks and problems arising from the model being misled, abused, disseminated, or improperly utilized, we will not assume any responsibility.

License Agreement

Community use of the Skywork model requires the Skywork Community License. The Skywork model supports commercial use. If you plan to use the Skywork model or its derivatives for commercial purposes, you must abide by the terms and conditions of the Skywork Community License.

Contact

If you have any questions or feedback, don't hesitate to reach out to our friendly team at shiwen.tu@kunlun-inc.com or liang.zhao@kunlun-inc.com. Liang Zhao leads this project.

Citation

If you find our work helpful, please feel free to cite us using the following BibTeX entry:

@misc{skyworkcritic2024,
  title={Skywork Critic Model Series},
  author={Tu, Shiwen and Zhao, Liang and Liu, Chris Yuhao and Zeng, Liang and Liu, Yang},
  year={2024},
  month={September},
  howpublished={\url{https://huggingface.co/Skywork}},
  url={https://huggingface.co/Skywork},
}

GGUF

Model size: 8B params
Architecture: llama
Hardware compatibility (quantization levels): 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit

Model tree for QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF

Base model: meta-llama/Llama-3.1-8B
Finetuned: meta-llama/Llama-3.1-8B-Instruct
Quantized (948): this model

Paper for QuantFactory/Skywork-Critic-Llama-3.1-8B-GGUF

Self-Taught Evaluators (arXiv:2408.02666, published Aug 5, 2024)