richardyoung committed on
Commit b70a33a · verified · 1 Parent(s): b264541

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +90 -0
README.md ADDED
@@ -0,0 +1,90 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ library_name: transformers
+ base_model: HuggingFaceTB/SmolVLM2-2.2B-Instruct
+ tags:
+ - vision
+ - multimodal
+ - llama-cpp
+ - gguf
+ - quantized
+ ---
+
+ # SmolVLM2-2.2B-Instruct GGUF
+
+ GGUF quantizations of [HuggingFaceTB/SmolVLM2-2.2B-Instruct](https://huggingface.co/HuggingFaceTB/SmolVLM2-2.2B-Instruct) for use with llama.cpp and Ollama.
+
+ ## Model Description
+
+ SmolVLM2 is a compact 2.2B-parameter vision-language model from Hugging Face with video understanding capabilities. It is designed to be fast and efficient while maintaining strong performance on vision-language tasks.
+
+ ## Key Features
+
+ - **Compact & Fast** - Only 2.2B parameters, so it runs efficiently on consumer hardware
+ - **Vision & Video** - Understands both images and video frames
+ - **Instruction-tuned** - Optimized for following user instructions
+ - **Apache 2.0** - Released under a permissive open-source license
+
+ ## Available Quantizations
+
+ | Filename | Quant | Size | Description |
+ |----------|-------|------|-------------|
+ | SmolVLM2-2.2B-Instruct-Q4_K_M.gguf | Q4_K_M | 1.0 GB | Best balance of quality and speed (recommended) |
+ | SmolVLM2-2.2B-Instruct-Q8_0.gguf | Q8_0 | 1.8 GB | Higher quality |
+ | SmolVLM2-2.2B-Instruct.gguf | F16 | 3.4 GB | Unquantized 16-bit weights |
+
+ ## Usage
+
+ ### With Ollama
+
+ ```bash
+ # Pull and run (Q4_K_M by default)
+ ollama run richardyoung/smolvlm2-2.2b-instruct
+
+ # Or specific quantization
+ ollama run richardyoung/smolvlm2-2.2b-instruct:q8_0
+ ollama run richardyoung/smolvlm2-2.2b-instruct:f16
+ ```
+
+ ### With llama.cpp
+
+ ```bash
+ # Download a quantization
+ wget https://huggingface.co/richardyoung/SmolVLM2-2.2B-Instruct-GGUF/resolve/main/SmolVLM2-2.2B-Instruct-Q4_K_M.gguf
+
+ # Run with llama.cpp
+ ./llama-cli -m SmolVLM2-2.2B-Instruct-Q4_K_M.gguf -p "Describe this image:" --image your_image.jpg
+ ```
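The download step above hard-codes the Q4_K_M file. As a minimal sketch, a small shell function can map each quantization tag to its download URL. The function name `gguf_url` is hypothetical, and the URLs for the Q8_0 and F16 files are an assumption: they are built from the filenames in the quantization table using the same `resolve/main` pattern as the `wget` example.

```shell
# Hypothetical helper: map a quantization tag to its download URL.
# Filenames come from the quantization table; the URL pattern is assumed
# to match the wget example for every file.
gguf_url() {
  base="https://huggingface.co/richardyoung/SmolVLM2-2.2B-Instruct-GGUF/resolve/main"
  case "$1" in
    q4_k_m) file="SmolVLM2-2.2B-Instruct-Q4_K_M.gguf" ;;
    q8_0)   file="SmolVLM2-2.2B-Instruct-Q8_0.gguf" ;;
    f16)    file="SmolVLM2-2.2B-Instruct.gguf" ;;
    *)      echo "unknown quantization: $1" >&2; return 1 ;;
  esac
  printf '%s/%s\n' "$base" "$file"
}
```

With this in place, `wget "$(gguf_url q8_0)"` would fetch the Q8_0 file instead of the default.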
+
+ ## Technical Requirements
+
+ - **Minimum:** 4 GB RAM, any modern CPU
+ - **Recommended:** 8 GB RAM or an Apple Silicon Mac
+
+ ## Chat Template
+
+ SmolVLM2 uses the ChatML format:
+ ```
+ <|im_start|>system
+ {system_message}<|im_end|>
+ <|im_start|>user
+ {user_message}<|im_end|>
+ <|im_start|>assistant
+ {assistant_response}<|im_end|>
+ ```
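To illustrate the template above, here is a minimal sketch that renders a single system and user turn and leaves the assistant turn open for the model to complete. The function name `chatml_prompt` is hypothetical, and the sketch assumes the layout is exactly as the README states it; it has not been checked against the template embedded in the GGUF files.

```shell
# Hypothetical helper: render one system + user turn in the ChatML layout
# shown above, ending with an open assistant turn.
# $1 = system message, $2 = user message
chatml_prompt() {
  printf '<|im_start|>system\n%s<|im_end|>\n' "$1"
  printf '<|im_start|>user\n%s<|im_end|>\n' "$2"
  printf '<|im_start|>assistant\n'
}
```

For example, `chatml_prompt "You are a helpful assistant." "Describe this image:"` writes the formatted prompt to standard output. Runtimes such as Ollama normally apply the chat template themselves, so manual formatting like this is mainly useful when sending raw prompts.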
+
+ ## Links
+
+ - **Original Model:** [HuggingFaceTB/SmolVLM2-2.2B-Instruct](https://huggingface.co/HuggingFaceTB/SmolVLM2-2.2B-Instruct)
+ - **Ollama:** [richardyoung/smolvlm2-2.2b-instruct](https://ollama.com/richardyoung/smolvlm2-2.2b-instruct)
+
+ ## Credits
+
+ - **Original Model:** Hugging Face
+ - **Quantization:** Richard Young ([deepneuro.ai](https://deepneuro.ai/richard))
+
+ ## License
+
+ Apache 2.0