about
The base of Hexagon Purple V2, SmarTracks, remains unchanged: it is a "3 levels" stock merge that combines Deepseek Distill R1 (3 flavors), Nemotron, and Tulu capabilities.
Hexagon Purple V2 diverges from V1 with the following:
- Steelskull's Electra R1 replaces Black-Ink-Guild's Pernicious Prophecy, because it's even better. 70Blivion is recovered elsewhere.
- A Priestess stock merge replaces the Hostess one. It brings 70Blivion in and takes the Lumitron merge out, on top of Tess R1 and Llama Creative Writer.
- Dobby, Wayfarer and Drummer's Fallen Llama R1 (already present in a SmarTracks sub-submerge and now in Electra R1) go out as standalone models, replaced by a stock merge of these 3, DoppelGanger R1.
- Nbeerbower's Gutenberg Doppel goes in, as a 3.1 instruct (and novel writing) stabilizer working in tandem with the following model.
- Miguel Tissera's Tess 3.0 70B 3.1 also goes in, as a perplexity dropper.
As usual, abliterated and lorablated versions (thanks Huihui-ai, Maxime Labonne, and ofc Failspy) are used systematically when they exist; otherwise, the focus is on very low censorship.
benchs
Benchmark scores are traded for creativity in this merge too, but they improve noticeably over V1:
- PPL Wikitext Eng 512: 3.43 (good)
- ARC-C: 60.55 (good)
- ARC-E: 81.05 (good also)
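For readers unfamiliar with the PPL figure, here is a minimal sketch of how perplexity is conventionally defined: the exponential of the mean negative log-likelihood per token. The function name and the toy inputs are illustrative assumptions; the card does not say which tool produced the 3.43 value or how the 512-token windows were handled.

```python
import math

def perplexity(token_logprobs):
    """Perplexity from natural-log probabilities of each evaluated token.

    Illustrative helper (not from the model card): exp of the mean
    negative log-likelihood, which is the usual textbook definition.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)

# Toy check: a model that gives every token probability 0.25
# is as uncertain as a uniform choice among 4 options.
toy_ppl = perplexity([math.log(0.25)] * 4)
```

Under this definition, a perplexity of 3.43 corresponds to a mean loss of about ln(3.43) ≈ 1.23 nats per token.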
merge
This is a merge of pre-trained language models created using mergekit.
Merge Details
Merge Method
This model was merged using the Model Stock merge method, with Nexesenex/Llama_3.x_70b_SmarTracks_V1.01 as the base.
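As a rough illustration of what Model Stock does to a single weight tensor, the sketch below follows the published rule: average the fine-tuned weights, then interpolate back toward the base with a ratio t = N·cosθ / (1 + (N−1)·cosθ), where cosθ is the cosine between the models' offsets from the base. This is a pure-Python toy on flat lists, not mergekit's code; the function name, the pairwise averaging of cosines, and the error handling are assumptions made for the example.

```python
import math
from itertools import combinations

def model_stock_layer(base, finetuned):
    """Merge one flattened weight tensor with the Model Stock rule.

    base: list of floats (base model weights)
    finetuned: list of lists of floats (one per fine-tuned model)
    Hypothetical helper for illustration only.
    """
    n = len(finetuned)
    if n < 2:
        raise ValueError("Model Stock needs at least two fine-tuned models")

    # Offsets of each fine-tuned model from the base weights.
    deltas = [[w - b for w, b in zip(ft, base)] for ft in finetuned]

    def cosine(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
        return dot / norm if norm > 0 else 0.0

    # Assumption: estimate cos(theta) as the mean over all model pairs.
    pairs = list(combinations(deltas, 2))
    cos_theta = sum(cosine(u, v) for u, v in pairs) / len(pairs)

    denom = 1 + (n - 1) * cos_theta
    if abs(denom) < 1e-12:
        raise ValueError("degenerate geometry: interpolation ratio undefined")
    t = n * cos_theta / denom

    # Interpolate between the average of the fine-tuned weights and the base.
    avg = [sum(col) / n for col in zip(*finetuned)]
    return [t * a + (1 - t) * b for a, b in zip(avg, base)]
```

When the offsets are orthogonal (cosθ = 0) the result falls back to the base weights; when all fine-tuned models agree (cosθ = 1) the result is their plain average.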
Models Merged
The following models were included in the merge:
- Steelskull/L3.3-Electra-R1-70b
- NexesMess/Llama_3.3_70b_DoppelGanger_R1
- nbeerbower/Llama3.1-Gutenberg-Doppel-70B
- NexesMess/Llama_3.1_70b_Priestess_V1
- migtissera/Tess-3-Llama-3.1-70B
Configuration
The following YAML configuration was used to produce this model:
merge_method: model_stock
models:
  - model: migtissera/Tess-3-Llama-3.1-70B
    parameters:
      weight: 1.0
  - model: nbeerbower/Llama3.1-Gutenberg-Doppel-70B
    parameters:
      weight: 1.0
  - model: NexesMess/Llama_3.1_70b_Priestess_V1
    parameters:
      weight: 1.0
  - model: Steelskull/L3.3-Electra-R1-70b
    parameters:
      weight: 1.0
  - model: NexesMess/Llama_3.3_70b_DoppelGanger_R1
    parameters:
      weight: 1.0
base_model: Nexesenex/Llama_3.x_70b_SmarTracks_V1.01
dtype: bfloat16
out_dtype: bfloat16
parameters:
  int8_mask: true
  normalize: true
  rescale: false
chat_template: auto
tokenizer:
  source: union
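To reproduce a merge from a configuration like this one, mergekit provides the `mergekit-yaml` command, which takes a config file and an output directory. The sketch below wraps that call in Python. The helper names and the example paths are hypothetical, and merging 70B models needs a great deal of disk space and memory, so treat this as an outline rather than a tested recipe.

```python
import shutil
import subprocess

def build_merge_command(config_path, output_dir):
    """Return the argv list for a mergekit run; nothing is executed here."""
    return ["mergekit-yaml", config_path, output_dir]

def run_merge(config_path, output_dir):
    """Run the merge, failing early if the mergekit CLI is not on PATH."""
    if shutil.which("mergekit-yaml") is None:
        raise RuntimeError("mergekit-yaml not found; install mergekit first")
    # check=True raises CalledProcessError if mergekit exits with an error.
    subprocess.run(build_merge_command(config_path, output_dir), check=True)

# Hypothetical paths: save the YAML above as hexagon_purple_v2.yml first.
command = build_merge_command("hexagon_purple_v2.yml", "./Hexagon_Purple_V2")
```

Calling `run_merge("hexagon_purple_v2.yml", "./Hexagon_Purple_V2")` would then start the actual merge.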