How to use from vLLM
Install from pip and serve the model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "PsiPi/NousResearch_Nous-Hermes-2-Vision-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PsiPi/NousResearch_Nous-Hermes-2-Vision-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
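The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on localhost:8000; the helper names `build_payload` and `describe_image` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL = "PsiPi/NousResearch_Nous-Hermes-2-Vision-GGUF"
URL = "http://localhost:8000/v1/chat/completions"


def build_payload(prompt: str, image_url: str) -> dict:
    """Build the same chat-completions body as the curl example."""
    return {
        "model": MODEL,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def describe_image(prompt: str, image_url: str, url: str = URL) -> str:
    """POST the payload and return the assistant's reply text.

    Assumes an OpenAI-compatible response, where the reply is found
    under choices[0].message.content.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt, image_url)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]

# Example call (requires the server to be running):
# print(describe_image("Describe this image in one sentence.", "<image URL>"))
```

Keeping payload construction separate from the network call makes it possible to inspect or log the request body without a running server.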
Use Docker
docker model run hf.co/PsiPi/NousResearch_Nous-Hermes-2-Vision-GGUF:

GGUF quants by Twobob. Thanks to @jartine and @cmp-nct for the assists.

It uses the Vicuna prompt format; ref: here

Caveat emptor: There is still some kind of bug in the inference that is likely to get fixed upstream. Just FYI.

Nous-Hermes-2-Vision - Mistral 7B

In the tapestry of Greek mythology, Hermes reigns as the eloquent Messenger of the Gods, a deity who deftly bridges the realms through the art of communication. It is in homage to this divine mediator that I name this advanced LLM "Hermes," a system crafted to navigate the complex intricacies of human discourse with celestial finesse.

Model description

Nous-Hermes-2-Vision stands as a pioneering Vision-Language Model, leveraging advancements from the renowned OpenHermes-2.5-Mistral-7B by teknium. This model incorporates two pivotal enhancements, setting it apart as a cutting-edge solution:

  • SigLIP-400M Integration: Where traditional approaches rely on substantial 3B vision encoders, Nous-Hermes-2-Vision uses the much smaller SigLIP-400M. This choice makes the model's architecture more lightweight while still capitalizing on SigLIP's capabilities. The result is a marked boost in performance despite the smaller encoder.

  • Custom Dataset Enriched with Function Calling: Our model's training data includes a unique feature – function calling. This distinctive addition transforms Nous-Hermes-2-Vision into a Vision-Language Action Model. Developers now have a versatile tool at their disposal, primed for crafting a myriad of ingenious automations.

This project is led by qnguyen3 and teknium.

Training

Dataset

  • 220K from LVIS-INSTRUCT4V
  • 60K from ShareGPT4V
  • 150K Private Function Calling Data
  • 50K conversations from teknium's OpenHermes-2.5

Usage

Prompt Format

  • Like other LLaVA variants, this model uses Vicuna-V1 as its prompt template. Please refer to conv_llava_v1 in this file.
  • For Gradio UI, please visit this GitHub Repo
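As a rough illustration of the Vicuna-V1 layout, the sketch below assembles a prompt string by hand. The system text, the "USER"/"ASSISTANT" role names, and the two separators (a space after user turns, "</s>" after assistant turns) reflect my understanding of LLaVA's conv_llava_v1 and are assumptions to verify against that file; `build_vicuna_v1_prompt` is a hypothetical helper, and image placeholder handling is left out.

```python
# Assumed values, modeled on LLaVA's conv_llava_v1; verify before relying on them.
SYSTEM = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the human's questions."
)
SEP = " "       # assumed separator after a user turn
SEP2 = "</s>"   # assumed separator after an assistant turn


def build_vicuna_v1_prompt(turns, system=SYSTEM):
    """Join (role, message) turns into one prompt string.

    Turns are expected to alternate USER, ASSISTANT, starting with USER.
    A message of None marks the open turn the model should complete.
    """
    prompt = system + SEP
    for index, (role, message) in enumerate(turns):
        if message is None:
            prompt += f"{role}:"
        else:
            separator = SEP if index % 2 == 0 else SEP2
            prompt += f"{role}: {message}{separator}"
    return prompt


prompt = build_vicuna_v1_prompt([
    ("USER", "Describe this image in one sentence."),
    ("ASSISTANT", None),
])
```

The open ASSISTANT turn at the end is what the model is asked to continue.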

Function Calling

  • For function calling, the message should start with a <fn_call> tag. Here is an example:
<fn_call>{
  "type": "object",
  "properties": {
    "bus_colors": {
      "type": "array",
      "description": "The colors of the bus in the image.",
      "items": {
        "type": "string",
        "enum": ["red", "blue", "green", "white"]
      }
    },
    "bus_features": {
      "type": "string",
      "description": "The features seen on the back of the bus."
    },
    "bus_location": {
      "type": "string",
      "description": "The location of the bus (driving or pulled off to the side).",
      "enum": ["driving", "pulled off to the side"]
    }
  }
}

Output:

{
  "bus_colors": ["red", "white"],
  "bus_features": "An advertisement",
  "bus_location": "driving"
}
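The message above can also be assembled programmatically. This is a minimal sketch, assuming the schema is held as a Python dict and that the model replies with a single JSON object as in the example output; `build_fn_call_message` and `parse_fn_call_output` are hypothetical helper names, not part of any library.

```python
import json

FN_CALL_TAG = "<fn_call>"


def build_fn_call_message(schema: dict) -> str:
    """Prefix the serialized schema with the <fn_call> tag."""
    return FN_CALL_TAG + json.dumps(schema, indent=2)


def parse_fn_call_output(text: str) -> dict:
    """Parse the outermost {...} span of the reply as JSON.

    Assumes the reply contains exactly one JSON object, possibly
    surrounded by extra text.
    """
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return json.loads(text[start:end + 1])


schema = {
    "type": "object",
    "properties": {
        "bus_location": {
            "type": "string",
            "description": "The location of the bus (driving or pulled off to the side).",
            "enum": ["driving", "pulled off to the side"],
        }
    },
}
message = build_fn_call_message(schema)
```

Serializing with `json.dumps` avoids hand-written JSON mistakes such as trailing commas.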

Example

Chat

image/png

Function Calling

Input image:

image/png

Input message:

<fn_call>{
    "type": "object",
    "properties": {
      "food_list": {
        "type": "array",
        "description": "List of all the food",
        "items": {
          "type": "string"
        }
      }
    }
}

Output:

{
    "food_list": [
        "Double Burger",
        "Cheeseburger",
        "French Fries",
        "Shakes",
        "Coffee"
    ]
}
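Since the model's reply is free text, it may be worth checking the parsed output against the schema that was sent. The following is a hand-rolled sketch that covers only the schema features used in the examples here (object, array, string, enum); it is not a full JSON Schema validator, and `check_value` is a hypothetical helper name.

```python
def check_value(value, schema):
    """Return a list of problems found when comparing value to schema.

    Supports only "object", "array", "string", and "enum"; anything
    else in the schema is ignored. Missing properties are not reported.
    """
    problems = []
    kind = schema.get("type")
    if kind == "object":
        if not isinstance(value, dict):
            return ["expected object"]
        for name, sub_schema in schema.get("properties", {}).items():
            if name in value:
                problems += [f"{name}: {p}" for p in check_value(value[name], sub_schema)]
    elif kind == "array":
        if not isinstance(value, list):
            return ["expected array"]
        item_schema = schema.get("items", {})
        for index, item in enumerate(value):
            problems += [f"[{index}]: {p}" for p in check_value(item, item_schema)]
    elif kind == "string":
        if not isinstance(value, str):
            return ["expected string"]
    if "enum" in schema and value not in schema["enum"]:
        problems.append(f"{value!r} not in enum")
    return problems


food_schema = {
    "type": "object",
    "properties": {
        "food_list": {
            "type": "array",
            "description": "List of all the food",
            "items": {"type": "string"},
        }
    },
}
food_output = {
    "food_list": ["Double Burger", "Cheeseburger", "French Fries", "Shakes", "Coffee"]
}
problems = check_value(food_output, food_schema)  # empty list when the output conforms
```

An empty list means the output matched the supported parts of the schema; each entry otherwise names the offending path.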

Model details

  • Format: GGUF
  • Model size: 7B params
  • Architecture: llama
  • Available quantizations: 2-bit, 4-bit, 5-bit