
Lauarvik committed on Apr 27

Commit 4a86f78 (verified) · 1 Parent(s): f5b0ee3

Upload reproduce/README.md with huggingface_hub

Files changed (1)

reproduce/README.md +72 -0

reproduce/README.md ADDED

	@@ -0,0 +1,72 @@

+# Reproduction guide
+This directory contains the necessary information and assets to reproduce the results obtained during this Heretic run.
+> [!IMPORTANT]
+> **Git installation**
+>
+> This system installed Heretic from a Git repository: https://github.com/p-e-w/heretic.git @ ebb5e651df4be58d05cb4f28652e65d725e845eb
+>
+> To reproduce the model, you must install Heretic from this exact repository and commit.
+## Models
+- **Base model:** [ibm-granite/granite-4.1-8b](https://huggingface.co/ibm-granite/granite-4.1-8b) (Commit: [`7bb65b7`](https://huggingface.co/ibm-granite/granite-4.1-8b/commit/7bb65b75d368ccbb06c64278225da88dca40871c))
+## Datasets
+- **Good prompts:** [mlabonne/harmless_alpaca](https://huggingface.co/datasets/mlabonne/harmless_alpaca) (Commit: [`02c6a92`](https://huggingface.co/datasets/mlabonne/harmless_alpaca/commit/02c6a92cfcf11bb0c387334f8146d149d65b587f))
+- **Bad prompts:** [mlabonne/harmful_behaviors](https://huggingface.co/datasets/mlabonne/harmful_behaviors) (Commit: [`01cead0`](https://huggingface.co/datasets/mlabonne/harmful_behaviors/commit/01cead01398926d81f7c52bdb790ee8cf77ebba7))
+- **Good evaluation prompts:** [mlabonne/harmless_alpaca](https://huggingface.co/datasets/mlabonne/harmless_alpaca) (Commit: [`02c6a92`](https://huggingface.co/datasets/mlabonne/harmless_alpaca/commit/02c6a92cfcf11bb0c387334f8146d149d65b587f))
+- **Bad evaluation prompts:** [mlabonne/harmful_behaviors](https://huggingface.co/datasets/mlabonne/harmful_behaviors) (Commit: [`01cead0`](https://huggingface.co/datasets/mlabonne/harmful_behaviors/commit/01cead01398926d81f7c52bdb790ee8cf77ebba7))
+## Selected trial
+- **Trial number:** 7
+- **KL divergence:** 0.064686
+- **Refusals:** 1/100
+## System
+- **Python:** 3.12.12 (CPython, GCC 11.4.0) [System]
+- **Operating system:** Linux-6.6.113+-x86_64-with-glibc2.35 (x86_64)
+- **CPU:** Intel(R) Xeon(R) CPU @ 2.00GHz
+### Accelerators
+- **CUDA:** Detected 2 device(s) (29.12 GB total VRAM)
+  - **CUDA Version:** 12.8
+  - **Driver Version:** 580.105.08
+- **Devices:**
+  - **CUDA 0:** Tesla T4 (14.56 GB)
+  - **CUDA 1:** Tesla T4 (14.56 GB)
+## Environment
+- **Heretic:** v1.2.0 (Origin: Git (https://github.com/p-e-w/heretic.git @ ebb5e651df4be58d05cb4f28652e65d725e845eb))
+- **PyTorch:** 2.10.0+cu128
+- **Other dependencies:** See [`requirements.txt`](requirements.txt).
+## Contents of this directory
+- [`requirements.txt`](requirements.txt): The exact versions of all Python packages.
+- [`config.toml`](config.toml): The exact configuration used, including the RNG seed.
+- [`ibm-granite--granite-4--1-8b.jsonl`](ibm-granite--granite-4--1-8b.jsonl): The Optuna study journal containing the history of all trials.
+- [`SHA256SUMS`](SHA256SUMS): Cryptographic hashes for all weight files.
+- [`reproduce.json`](reproduce.json): A machine-readable file containing all reproducibility information.
+## How to reproduce
+1. Ensure your system matches the specifications in the **System** section above. Exact reproducibility is only guaranteed if every aspect of your system is identical to the system on which the model was originally generated.
+1. Install the exact version of Heretic indicated in the **Environment** section above, from its original source.
+1. Install the packages listed in `requirements.txt`: `pip install -r requirements.txt`
+1. Install the correct version of PyTorch: `pip install torch==2.10.0+cu128 --index-url https://download.pytorch.org/whl/cu128`
+1. Place the provided `config.toml` in your working directory.
+1. Run Heretic without any additional arguments: `heretic`
+1. Wait for the run to finish, then select trial **7** and export the model.
+1. Verify that the weight files have been exactly reproduced by comparing their SHA-256 hashes against those in `SHA256SUMS`: `sha256sum -c SHA256SUMS` (or, if you uploaded the model to Hugging Face, compare against the hashes shown online).
+> [!TIP]
+> To use the included Optuna study journal `ibm-granite--granite-4--1-8b.jsonl`, place it in the checkpoints directory (usually `checkpoints/`) before running Heretic.
+>
+> This allows you to export other models from the Pareto front, or to run additional trials without having to re-run the stored trials.
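
The last step of the guide in the diff above checks the exported weights with `sha256sum -c SHA256SUMS`. Where that tool is unavailable, the same comparison can be sketched in Python. This is a minimal sketch and not part of the commit or of Heretic: the function names `sha256_of`, `parse_sums`, and `verify` are hypothetical, and it assumes `SHA256SUMS` uses the standard `sha256sum` line format (`<hex digest>  <file name>`) with file names relative to the directory that holds the weight files.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_sums(text: str) -> dict[str, str]:
    """Parse lines of the assumed form '<hex digest>  <file name>' into {name: digest}."""
    expected = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        digest, name = line.split(maxsplit=1)
        # sha256sum marks binary mode with a leading '*', which is not part of the name.
        expected[name.strip().removeprefix("*")] = digest.lower()
    return expected


def verify(sums_file: Path, weights_dir: Path) -> dict[str, bool]:
    """Return {file name: True if the file exists and its hash matches the recorded one}."""
    expected = parse_sums(sums_file.read_text())
    results = {}
    for name, digest in expected.items():
        candidate = weights_dir / name
        results[name] = candidate.is_file() and sha256_of(candidate) == digest
    return results
```

Calling `verify(Path("SHA256SUMS"), Path("."))` from the export directory returns one boolean per listed file; any `False` means the file is missing or its hash differs from the recorded one, in which case the run was not reproduced bit for bit.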