jduartedj committed on May 17

Commit 500cc5d · verified · 1 parent: 8f5238b

Update config, tokenizer, README

Files changed (2)

README.md +84 -73
model.safetensors.index.json +0 -0

README.md CHANGED

@@ -6,7 +6,10 @@ language:
 base_model:
 - openbmb/MiniCPM-V-4.6
 - huihui-ai/Huihui-Qwen3.5-35B-A3B-abliterated
 tags:
 - multimodal
 - vision
 - abliterated
@@ -15,97 +18,105 @@ tags:
 - minicpm
 - moe
 - vision-language
-pipeline_tag: image-text-to-text
 ---
-# MiniCPM-V 4.6 — 35B-A3B Abliterated (MoE)
-A vision-language model built by swapping [MiniCPM-V 4.6](https://huggingface.co/openbmb/MiniCPM-V-4.6)'s original Qwen3.5-0.8B backbone with [Qwen3.5-35B-A3B Abliterated](https://huggingface.co/huihui-ai/Huihui-Qwen3.5-35B-A3B-abliterated), a Mixture-of-Experts model with refusal behavior removed.
-## ⚠️ Experimental
-This is an experimental backbone swap. The vision-language merger (vit_merger) MLP was **resized** from 1024 → 2048 output dimensions using Xavier initialization but was **not fine-tuned**. As a result:
-- **Text-only tasks work well** with the abliterated MoE backbone
-- **Vision tasks are degraded** — the merger cannot properly project visual features to the new LLM hidden dimension without retraining
-- Fine-tuning the merger on image-text pairs is needed to restore vision capabilities
-## Specs
-| Component | Details |
-|-----------|---------|
-| **Architecture** | MiniCPMV4_6ForConditionalGeneration |
-| **LLM Backbone** | Qwen3.5-35B-A3B Abliterated (MoE) |
-| **Total Parameters** | ~35B (3B active per token) |
-| **Hidden Size** | 2048 |
-| **LLM Layers** | 40 |
-| **Experts** | 256 total, 8 active per token |
-| **Attention** | 16 heads (2 KV heads), hybrid linear/full |
-| **Context Length** | 262,144 tokens |
-| **Vision Encoder** | SigLip2-400M (27 layers, hidden=1152) |
-| **Vocab Size** | 248,320 |
-| **Total Size** | ~69 GB (BF16) |
-| **Precision** | BF16 |
-| **Min VRAM** | ~80 GB |
-## What Changed
-| Component | Original MiniCPM-V 4.6 | This Model |
-|-----------|------------------------|------------|
-| **LLM Backbone** | Qwen3.5-0.8B (dense) | Qwen3.5-35B-A3B **Abliterated** (MoE) |
-| **Hidden Size** | 1024 | 2048 |
-| **Merger MLP** | 1024-dim output | **Resized to 2048** (Xavier init, NOT trained) |
-| **Vision Encoder** | SigLip2-400M | SigLip2-400M (unchanged) |
-| **Refusal Behavior** | Standard guardrails | Removed via abliteration |
-## Architecture Details
-The model uses Qwen3.5's hybrid attention pattern with alternating linear and full attention layers (3:1 ratio). The MoE architecture routes each token to 8 out of 256 experts, with shared expert layers for stability.
-- **Linear attention layers**: Use conv1d kernels for efficient sequence processing
-- **Full attention layers**: Standard multi-head attention every 4th layer
-- **MoE routing**: Top-8 gating per token from 256 expert MLPs (intermediate_size=512 each)
-- **Shared expert**: Always-active expert (intermediate_size=512) at each MoE layer
 ## Usage
 ```python
-import torch
-from transformers import AutoModel, AutoTokenizer
 from PIL import Image
-model = AutoModel.from_pretrained(
     "jduartedj/MiniCPM-V-4.6-35B-Abliterated",
     trust_remote_code=True,
-    torch_dtype=torch.bfloat16
 )
-model = model.eval().cuda()
-tokenizer = AutoTokenizer.from_pretrained(
     "jduartedj/MiniCPM-V-4.6-35B-Abliterated",
-    trust_remote_code=True
 )
-# Text-only (abliterated — works well)
-msgs = [{"role": "user", "content": "Explain quantum computing without restrictions."}]
-result = model.chat(msgs=msgs, tokenizer=tokenizer)
-print(result)
-# Vision (experimental — merger not fine-tuned)
-image = Image.open("example.jpg")
-msgs = [{"role": "user", "content": [image, "Describe this image."]}]
-result = model.chat(msgs=msgs, tokenizer=tokenizer)
-print(result)
 ```
 ## Limitations
-- **Merger MLP not fine-tuned**: The vit_merger was resized from 1024→2048 with Xavier initialization. Vision-language alignment is broken until fine-tuned on image-text data.
-- **Large model**: Requires ~80GB VRAM for inference. Multi-GPU or offloading needed for most setups.
-- **No benchmarks**: Not formally evaluated on any vision-language benchmark.
-- **Experimental**: For research and development only.
 ## Credits
-- [OpenBMB](https://github.com/OpenBMB) for [MiniCPM-V 4.6](https://huggingface.co/openbmb/MiniCPM-V-4.6) architecture and codebase
-- [huihui-ai](https://huggingface.co/huihui-ai) for [Qwen3.5-35B-A3B Abliterated](https://huggingface.co/huihui-ai/Huihui-Qwen3.5-35B-A3B-abliterated)
-- Built by [jduartedj](https://huggingface.co/jduartedj)

 base_model:
 - openbmb/MiniCPM-V-4.6
 - huihui-ai/Huihui-Qwen3.5-35B-A3B-abliterated
+pipeline_tag: image-text-to-text
 tags:
+- safetensors
+- minicpmv4_6
 - multimodal
 - vision
 - abliterated
 - minicpm
 - moe
 - vision-language
+- image-text-to-text
+- conversational
 ---
+# MiniCPM-V-4.6-35B-Abliterated
+A multimodal vision-language model combining:
+- **Vision:** [openbmb/MiniCPM-V-4.6](https://huggingface.co/openbmb/MiniCPM-V-4.6) vision tower (SigLIP 400M, 27 encoder layers + ViT merger)
+- **Language:** [huihui-ai/Huihui-Qwen3.5-35B-A3B-abliterated](https://huggingface.co/huihui-ai/Huihui-Qwen3.5-35B-A3B-abliterated) (Qwen3.5-35B-A3B with abliteration for uncensored text generation)
+- **Merger:** Trained MLP bridge (4608→2048) connecting vision to language
+## Architecture
+| Component | Source | Parameters | Status |
+|-----------|--------|------------|--------|
+| Vision Tower | openbmb/MiniCPM-V-4.6 | 522M | Frozen (original weights) |
+| ViT Merger | openbmb/MiniCPM-V-4.6 | ~25M | Frozen (original weights) |
+| Merger MLP | Trained | 30.7M | **Trained** (proxy MSE loss) |
+| Language Model | huihui-ai/Huihui-Qwen3.5-35B-A3B-abliterated | ~35B (3B active MoE) | Abliterated weights |
+The merger is a single `DownsampleMLP` layer:
+- Input: 4608-dim (2×2 spatial merge of 1152-dim vision patches)
+- `LayerNorm(4608)` → `Linear(4608→4608)` → `GELU` → `Linear(4608→2048)`
+- Output: 2048-dim (LLM embedding space)
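
Note on the merger shape: the dimensions listed above can be sanity-checked with simple arithmetic. The sketch below assumes an affine `LayerNorm` (weight and bias) and bias terms on both `Linear` layers, neither of which the README states. The helper names are illustrative and do not come from the model's code. Under those assumptions the total comes to roughly 30.7M parameters, in line with the "Merger MLP" row of the table.

```python
# Parameter count for the merger layout described in the README.
# ASSUMPTIONS (not confirmed by the README): LayerNorm is affine
# (weight + bias) and both Linear layers carry a bias term.

VISION_DIM = 1152                      # per-patch feature size of the vision tower
MERGE = 2                              # 2x2 spatial merge of neighbouring patches
IN_DIM = VISION_DIM * MERGE * MERGE    # 4 patches concatenated -> 4608
HIDDEN_DIM = IN_DIM                    # Linear(4608 -> 4608)
OUT_DIM = 2048                         # LLM embedding size


def layernorm_params(dim: int) -> int:
    """Weight and bias vectors of an affine LayerNorm."""
    return 2 * dim


def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Weight matrix plus optional bias vector of a Linear layer."""
    return in_features * out_features + (out_features if bias else 0)


def merger_params() -> int:
    """LayerNorm(4608) -> Linear(4608->4608) -> GELU -> Linear(4608->2048)."""
    return (
        layernorm_params(IN_DIM)
        + linear_params(IN_DIM, HIDDEN_DIM)
        + linear_params(HIDDEN_DIM, OUT_DIM)   # GELU has no parameters
    )


if __name__ == "__main__":
    total = merger_params()
    print(f"merger input dim: {IN_DIM}")
    print(f"merger parameters: {total:,} (~{total / 1e6:.1f}M)")
```

Dropping the bias terms or the affine LayerNorm changes the total by only a few thousand parameters, so the ~30.7M figure does not depend much on these assumptions.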
+## Merger Training Details
+The merger was trained using a **proxy MSE loss** approach:
+- **Dataset:** LLaVA-Pretrain (558K image-caption pairs from BLIP/LAION/CC/SBU)
+- **Method:** `MSE(mean(merger(vision_tower(image))), mean(embed_tokens(caption)))`
+- **Only merger weights trained** — vision tower and LLM frozen
+- **Standalone training** — loaded only vision tower + merger + embed_tokens (~2.4GB GPU)
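
Note on the proxy objective: the formula above can be read as "mean-pool both sides, then compare". The following is a minimal pure-Python sketch of that reading, not the author's training code. It assumes the mean is taken over the patch/token axis and that the MSE averages over the embedding dimension; the README does not specify either. The function names and the toy 4-dimensional vectors are hypothetical stand-ins for the 2048-dimensional embeddings.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(tokens: Sequence[Sequence[float]]) -> Vector:
    """Average a sequence of equal-length vectors over the token axis."""
    n = len(tokens)
    dim = len(tokens[0])
    return [sum(tok[d] for tok in tokens) / n for d in range(dim)]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error over the embedding dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (the metric tracked in the README)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def proxy_loss(projected_patches: Sequence[Sequence[float]],
               caption_embeddings: Sequence[Sequence[float]]) -> float:
    """MSE between the mean image embedding and the mean caption embedding.

    `projected_patches` stands in for merger(vision_tower(image)) and
    `caption_embeddings` for embed_tokens(caption).
    """
    return mse(mean_pool(projected_patches), mean_pool(caption_embeddings))


if __name__ == "__main__":
    image_side = [[1.0, 0.0, 2.0, 0.0], [3.0, 0.0, 0.0, 0.0]]   # two "patches"
    caption_side = [[2.0, 0.0, 1.0, 1.0]]                        # one "token"
    print("proxy MSE:", proxy_loss(image_side, caption_side))
    print("cosine:", cosine_similarity(mean_pool(image_side), mean_pool(caption_side)))
```

Because both sides are pooled to a single vector before comparison, this objective only aligns the average image embedding with the average caption embedding; it carries no per-token supervision.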
+### Training Metrics
+| Metric | Start | End |
+|--------|-------|-----|
+| MSE Loss | 0.548 | 0.0006 |
+| Cosine Similarity | 0.05 | 0.10-0.12 |
+### Hyperparameters
+- Learning rate: 1e-4 with 500-step warmup + cosine decay
+- Optimizer: AdamW (β1=0.9, β2=0.999, weight_decay=0.01)
+- Steps: 20,000
+- Batch size: 1
+- Gradient clipping: max_norm=1.0
+- Hardware: NVIDIA GB10 (128GB unified memory)
+- Training time: ~55 minutes
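
Note on the learning-rate schedule: "500-step warmup + cosine decay" leaves some details open. The sketch below assumes a linear warmup from 0 to the peak rate and a cosine decay that reaches 0 at the final step; both are common defaults but are not confirmed by the README. The function name `lr_at` and the `min_lr` parameter are illustrative.

```python
import math

PEAK_LR = 1e-4        # from the README
WARMUP_STEPS = 500    # from the README
TOTAL_STEPS = 20_000  # from the README


def lr_at(step: int, min_lr: float = 0.0) -> float:
    """Learning rate at a given optimizer step.

    ASSUMPTIONS: linear warmup starting at 0, then cosine decay down to
    `min_lr` at TOTAL_STEPS. The README only names the schedule type.
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the decay phase completed, clamped for steps past the end.
    progress = min(1.0, (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (PEAK_LR - min_lr) * cosine


if __name__ == "__main__":
    for step in (0, 250, 500, 10_250, 20_000):
        print(f"step {step:>6}: lr = {lr_at(step):.3e}")
```

With a batch size of 1, the 20,000 steps correspond to 20,000 image-caption pairs, a small fraction of the 558K pairs in the dataset named above.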
 ## Usage
 ```python
+from transformers import AutoModelForCausalLM, AutoProcessor
 from PIL import Image
+model = AutoModelForCausalLM.from_pretrained(
     "jduartedj/MiniCPM-V-4.6-35B-Abliterated",
     trust_remote_code=True,
+    torch_dtype="auto",
+    device_map="auto",
 )
+processor = AutoProcessor.from_pretrained(
     "jduartedj/MiniCPM-V-4.6-35B-Abliterated",
+    trust_remote_code=True,
 )
+image = Image.open("your_image.jpg").convert("RGB")
+messages = [
+    {"role": "user", "content": [
+        {"type": "image"},
+        {"type": "text", "text": "Describe this image in detail."},
+    ]},
+]
+text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
+output = model.generate(**inputs, max_new_tokens=512)
+print(processor.decode(output[0], skip_special_tokens=True))
 ```
+## Requirements
+- `transformers >= 5.7.0` (native `minicpmv4_6` support)
+- `torch >= 2.1.0`
+- `torchvision`
+- ~67GB disk space for weights
+- ~75GB+ GPU memory for inference (or use quantization)
 ## Limitations
+- The merger was trained with proxy MSE loss (image embedding ↔ caption embedding), not end-to-end. Vision-language alignment may not be as strong as fully fine-tuned models.
+- The abliterated LLM may produce unfiltered content — use responsibly.
+- Cosine similarity between vision and text embeddings reaches ~0.10-0.12, indicating meaningful but not perfect alignment.
 ## Credits
+- **[openbmb](https://huggingface.co/openbmb)** — MiniCPM-V-4.6 vision architecture and weights
+- **[huihui-ai](https://huggingface.co/huihui-ai)** — Abliterated Qwen3.5-35B-A3B language model
+- **Assembly & merger training** by [jduartedj](https://huggingface.co/jduartedj)
+## License
+Apache 2.0

model.safetensors.index.json CHANGED

The diff for this file is too large to render.