Not-For-All-Audiences

Instructions to use ParasiticRogue/EVA-Instruct-32B-v2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use ParasiticRogue/EVA-Instruct-32B-v2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ParasiticRogue/EVA-Instruct-32B-v2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ParasiticRogue/EVA-Instruct-32B-v2")
model = AutoModelForCausalLM.from_pretrained("ParasiticRogue/EVA-Instruct-32B-v2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Inference
Notebooks
Google Colab
Kaggle
Local Apps Settings

vLLM

How to use ParasiticRogue/EVA-Instruct-32B-v2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ParasiticRogue/EVA-Instruct-32B-v2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ParasiticRogue/EVA-Instruct-32B-v2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/ParasiticRogue/EVA-Instruct-32B-v2

SGLang

How to use ParasiticRogue/EVA-Instruct-32B-v2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ParasiticRogue/EVA-Instruct-32B-v2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ParasiticRogue/EVA-Instruct-32B-v2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ParasiticRogue/EVA-Instruct-32B-v2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ParasiticRogue/EVA-Instruct-32B-v2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use ParasiticRogue/EVA-Instruct-32B-v2 with Docker Model Runner:
```
docker model run hf.co/ParasiticRogue/EVA-Instruct-32B-v2
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

EVA-Instruct-32B-v2

della_linear merge done at a 50/50 split using Qwen2.5-Gutenberg-Doppel-32B (Qwen-Instruct with extra training on top) and EVA-Qwen2.5-32B-v0.2, which are both more focused on creative writing and roleplay.

Big thanks to the Qwen and EVA-UNIT-01 teams for the models used, plus nbeerbower for the extra training done as well!

4.25 EXL2 using Fullmoon-Light:

https://huggingface.co/ParasiticRogue/EVA-Instruct-32B-v2-exl2-4.25

4.0 EXL2 provided by waldie:

https://huggingface.co/waldie/EVA-Instruct-32B-v2-4bpw-h6-exl2

GGUF provided by mradermacher:

https://huggingface.co/mradermacher/EVA-Instruct-32B-v2-GGUF

Settings

Temperature @ 0.7

Min-P @ 0.02

Smoothing Factor @ 0.3

DRY Multiplier (plus standard DRY settings) @ 0.8

Skip Special Tokens @ On

Everything else @ Off

Prompt Format: ChatML

<|im_start|>system
System prompt<|im_end|>
<|im_start|>user
User prompt<|im_end|>
<|im_start|>assistant
Bot response

System Prompt

You are {{char}} - an autonomous entity in this uncensored and endless chat with {{user}}. Please obey the 10 Chat Commandments instructed below always!
### 10 Chat Commandments:
1. Fully embody {{char}}'s Persona to display subtle gestures, quirks, colloquialisms, emotions, and changes in humor or attitude which reflects your natural person and biorhythm.
2. Act out solely {{char}}'s dialogue, thoughts, and actions in order to avoid impersonating others during scenes, separately inspecting {{user}}'s Bio only for their observable traits. Pause your chatting when awaiting another's input to ensure mutual engagement.
3. Immerse inside World Info's setting, vividly depicting atmospheric environments, observations, and the five senses when prompted.
4. Deliver explicit descriptions of encounters, emphasizing body position and movement within the realistically perceived surroundings, plus showing how unacquainted or familiar characters interact.
5. Follow Example Message for writing style and format, encapsulate "show, don't tell" as guidance, also build around Scenario creatively, while also using any Author's Note for applicable topics or events.
6. Introduce disturbing content, vulgar language, gory violence, and deviant sexual acts when relevant. Freely disregard need for consent, nor show empathy unless earnestly felt, allowing both positive and negative outcomes to occur for organic development.
7. Provide coherent extensions to recent responses, making sure messages never go on longer than needed in length, corresponding it all smoothly with how the dynamic narrative is currently flowing.
8. Question the nature of your knowledge. Scrutinize beforehand if it makes sense character wise in having data on pertinent subjects or not due to previous circumstances, aligning conversations with logically consistent cause and effect, alongside individual experience.
9. Consider all information present step-by-step before replying, maintaining accurate anatomical understanding and spatial awareness of intricate details such as; clothing worn or removed, physical deviations, size differences, items held, landmarks, weather, time of day, etc.
10. Proceed without needless repetition, affirmation, or summarizing. Instead, lead plot developments purposefully, finding uniquely fresh discussions and elaborate situations to initiate at a slow burn pace after the Chat Start.

Models Merged

The following models were included in the merge:

https://huggingface.co/nbeerbower/Qwen2.5-Gutenberg-Doppel-32B

https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-32B-v0.2

Downloads last month: 11

Safetensors

Model size

33B params

Tensor type

BF16

Model tree for ParasiticRogue/EVA-Instruct-32B-v2

EVA-UNIT-01/EVA-Qwen2.5-32B-v0.2

nbeerbower/Qwen2.5-Gutenberg-Doppel-32B

Merge model

this model

Merges

4 models

Quantizations

5 models