Instructions to use Nexesenex/Llama_3.x_70b_Hexagon_Purple_V2 with libraries and local apps.

Libraries

How to use Nexesenex/Llama_3.x_70b_Hexagon_Purple_V2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Nexesenex/Llama_3.x_70b_Hexagon_Purple_V2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
# Llama is a text-only causal language model, so AutoModelForCausalLM is the matching class
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Nexesenex/Llama_3.x_70b_Hexagon_Purple_V2")
model = AutoModelForCausalLM.from_pretrained("Nexesenex/Llama_3.x_70b_Hexagon_Purple_V2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Nexesenex/Llama_3.x_70b_Hexagon_Purple_V2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Nexesenex/Llama_3.x_70b_Hexagon_Purple_V2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Nexesenex/Llama_3.x_70b_Hexagon_Purple_V2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
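
The same request can also be sent from Python. This is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on localhost:8000. The helper names `build_chat_request` and `ask` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "Nexesenex/Llama_3.x_70b_Hexagon_Purple_V2"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build (without sending) a POST request for the chat completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, base_url="http://localhost:8000"):
    """Send the prompt to a running server and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as response:
        body = json.load(response)
    # OpenAI-compatible servers return the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]


# Usage, once the server is up:
# reply = ask("What is the capital of France?")
```

Building the request separately from sending it makes the payload easy to inspect before a 70B server is actually running.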

SGLang

How to use Nexesenex/Llama_3.x_70b_Hexagon_Purple_V2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Nexesenex/Llama_3.x_70b_Hexagon_Purple_V2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Nexesenex/Llama_3.x_70b_Hexagon_Purple_V2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Nexesenex/Llama_3.x_70b_Hexagon_Purple_V2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Nexesenex/Llama_3.x_70b_Hexagon_Purple_V2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Nexesenex/Llama_3.x_70b_Hexagon_Purple_V2 with Docker Model Runner:
```
docker model run hf.co/Nexesenex/Llama_3.x_70b_Hexagon_Purple_V2
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

about

The base of Hexagon Purple V2, SmarTracks, remains unchanged: it is a "3 levels" stock merge combining DeepSeek Distill R1 (3 flavors), Nemotron, and Tulu capabilities.

Hexagon Purple V2 differs from V1 as follows:

Steelskull's Electra R1 replaces Black-Ink-Guild's Pernicious Prophecy, because it's even better. 70Blivion is recovered elsewhere.
A Priestess stock merge replaces the Hostess one; it brings 70Blivion in and takes the Lumitron merge out, on top of Tess R1 and Llama Creative Writer.
Dobby, Wayfarer, and Drummer's Fallen Llama R1 (already present in a SmarTracks sub-submerge, and now in Electra R1 too) are dropped as standalone models and replaced by DoppelGanger R1, a stock merge of the three.
Nbeerbower's Gutenberg Doppel goes in as a 3.1 instruct (and novel writing) stabilizer, working in tandem with the following model.
Migel Tissera's Tess 3 Llama 3.1 70B also goes in, as a perplexity dropper.

As usual, abliterated and lorablated versions (thanks to Huihui-ai, Maxime Labonne, and of course Failspy) are used systematically when they exist; otherwise, the focus is on models with very low censorship.

benchs

Benchmark scores are traded for creativity in this merge too, but they improve clearly compared to V1:

PPL Wikitext Eng 512: 3.43 (good)
ARC-C: 60.55 (good)
ARC-E: 81.05 (also good)
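
For reference, perplexity is the exponential of the mean negative log-likelihood per token, so lower is better. Below is a minimal sketch of that formula, assuming natural-log token probabilities. The function name and sample values are illustrative and unrelated to the 3.43 figure above, whose evaluation tool is not specified here.

```python
import math


def perplexity(token_logprobs):
    """Return exp(mean negative log-likelihood) over a sequence of token log-probs."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# A model that assigns probability 0.25 to every token is, on average,
# as uncertain as a uniform choice among 4 options.
sample = [math.log(0.25)] * 4
print(perplexity(sample))
```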

merge

This is a merge of pre-trained language models created using mergekit.

Merge Details

Merge Method

This model was merged using the Model Stock merge method, with Nexesenex/Llama_3.x_70b_SmarTracks_V1.01 as the base.
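
To give an idea of what Model Stock does, here is a simplified sketch of its interpolation rule on flat Python lists: the fine-tuned weights are averaged, then pulled back toward the base by a ratio `t` derived from the average angle between the models' deltas from the base. The function name is illustrative, and mergekit works per tensor, so its actual implementation may differ in details.

```python
import math


def model_stock_merge(base, finetuned):
    """Merge flat weight vectors with a simplified Model Stock interpolation."""
    n = len(finetuned)
    if n < 2:
        raise ValueError("need at least two fine-tuned models")

    # Deltas: how far each fine-tuned model moved away from the base.
    deltas = [[w - b for w, b in zip(model, base)] for model in finetuned]

    def cosine(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
        return dot / norm if norm else 0.0

    # Average pairwise cosine between deltas.
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    cos_theta = sum(cosine(deltas[i], deltas[j]) for i, j in pairs) / len(pairs)

    denom = 1 + (n - 1) * cos_theta
    if abs(denom) < 1e-12:
        raise ValueError("interpolation ratio is undefined for this angle")
    # Interpolation ratio: 1 when the deltas agree, 0 when they are orthogonal.
    t = n * cos_theta / denom

    average = [sum(column) / n for column in zip(*finetuned)]
    return [t * a + (1 - t) * b for a, b in zip(average, base)]
```

When the fine-tuned models agree on a direction, the result stays close to their average; when they disagree, it falls back toward the base.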

Models Merged

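The following models were included in the merge:

migtissera/Tess-3-Llama-3.1-70B
nbeerbower/Llama3.1-Gutenberg-Doppel-70B
NexesMess/Llama_3.1_70b_Priestess_V1
Steelskull/L3.3-Electra-R1-70b
NexesMess/Llama_3.3_70b_DoppelGanger_R1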

Configuration

The following YAML configuration was used to produce this model:

merge_method: model_stock
models:
  - model: migtissera/Tess-3-Llama-3.1-70B
    parameters:
      weight: 1.0
  - model: nbeerbower/Llama3.1-Gutenberg-Doppel-70B
    parameters:
      weight: 1.0
  - model: NexesMess/Llama_3.1_70b_Priestess_V1
    parameters:
      weight: 1.0
  - model: Steelskull/L3.3-Electra-R1-70b
    parameters:
      weight: 1.0
  - model: NexesMess/Llama_3.3_70b_DoppelGanger_R1
    parameters:
      weight: 1.0
base_model: Nexesenex/Llama_3.x_70b_SmarTracks_V1.01
dtype: bfloat16
out_dtype: bfloat16
parameters:
  int8_mask: true
  normalize: true
  rescale: false
chat_template: auto
tokenizer:
  source: union
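
Since the merge is written out in bfloat16 (2 bytes per weight), the weights alone are large. Here is a back-of-the-envelope estimate, assuming a nominal 70 billion parameters (the "70b" in the model name; the exact count is not stated here) and ignoring KV cache and activations. The helper name is illustrative.

```python
# Bytes used per parameter for a few common storage dtypes.
BYTES_PER_DTYPE = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gb(n_params, dtype="bfloat16"):
    """Return the approximate size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_DTYPE[dtype] / 1e9


# Nominal parameter count: an assumption, not a measured value.
print(weight_memory_gb(70e9))  # 140.0
```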