
Libraries

How to use osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF with Adapters:

from adapters import AutoAdapterModel

# Load the base model listed in this repository's model tree, then attach the adapter.
model = AutoAdapterModel.from_pretrained("meta-llama/Llama-3.2-3B")
model.load_adapter("osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF", set_active=True)

llama-cpp-python

How to use osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF with llama-cpp-python:

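# !pip install llama-cpp-python

from llama_cpp import Llama

# Download the GGUF file from the Hugging Face Hub (cached after the first run) and load it.
llm = Llama.from_pretrained(
    repo_id="osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF",
    filename="Nidum-Llama-3.2-3B-Uncensored-F16.gguf",
)

# Send a single-turn chat request; the result is an OpenAI-style completion dict.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ]
)
print(response["choices"][0]["message"]["content"])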
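
For longer replies it can help to print text as it is generated. The sketch below assumes that `create_chat_completion(..., stream=True)` yields OpenAI-style chunks whose `choices[0]["delta"]` may carry a `content` piece. `collect_stream` and `stream_reply` are hypothetical helper names, not part of llama-cpp-python.

```python
from typing import Iterable


def collect_stream(chunks: Iterable[dict]) -> str:
    """Print streamed chat deltas as they arrive and return the full reply."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0].get("delta", {})
        piece = delta.get("content")
        if piece:  # role-only and final chunks carry no text
            print(piece, end="", flush=True)
            parts.append(piece)
    print()
    return "".join(parts)


def stream_reply(llm, prompt: str) -> str:
    """Ask `llm` (assumed to be a llama_cpp.Llama instance) one question and stream the answer."""
    chunks = llm.create_chat_completion(
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    return collect_stream(chunks)
```

With the `llm` object created above, `stream_reply(llm, "What is the capital of France?")` would print the answer piece by piece and return it as a single string.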

llama.cpp

How to use osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF:Q4_K_M

Use Docker

docker model run hf.co/osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF:Q4_K_M

vLLM

How to use osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
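
The same request can be sent from Python. This is a minimal sketch using only the standard library; it assumes the server exposes the OpenAI-style `/v1/chat/completions` route shown in the curl call, and `build_chat_request` and `chat` are hypothetical helper names.

```python
import json
import urllib.request

MODEL_ID = "osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF"


def build_chat_request(base_url: str, prompt: str, model: str = MODEL_ID) -> urllib.request.Request:
    """Build a POST request for the chat completions endpoint under base_url."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, prompt: str, model: str = MODEL_ID, timeout: float = 120.0) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(base_url, prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

For the vLLM server started above, `chat("http://localhost:8000/v1", "What is the capital of France?")` should return the reply text. The llama.cpp server configurations elsewhere on this page use `http://localhost:8080/v1` as the base URL instead.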

Use Docker

docker model run hf.co/osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF:Q4_K_M

Ollama
How to use osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF with Ollama:
```
ollama run hf.co/osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF:Q4_K_M
```

Unsloth Studio

How to use osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF to start chatting

Pi

How to use osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF with Docker Model Runner:
```
docker model run hf.co/osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF:Q4_K_M
```

Lemonade

How to use osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull osmapi/Nidum-Llama-3.2-3B-Uncensored-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Nidum-Llama-3.2-3B-Uncensored-GGUF-Q4_K_M

List all available models

lemonade list

Nidum-Llama-3.2-3B-Uncensored

Welcome to Nidum!

At Nidum, we believe in pushing the boundaries of innovation by providing advanced and unrestricted AI models for every application. Dive into our world of possibilities and experience the freedom of Nidum-Llama-3.2-3B-Uncensored, tailored to meet diverse needs with exceptional performance.

Explore Nidum's Open-Source Projects on GitHub: https://github.com/NidumAI-Inc

Key Features

Uncensored Responses: Capable of addressing any query without content restrictions, offering detailed and uninhibited answers.
Versatility: Excels in diverse use cases, from complex technical queries to engaging casual conversations.
Advanced Contextual Understanding: Draws from an expansive knowledge base for accurate and context-aware outputs.
Extended Context Handling: Optimized for handling long-context interactions for improved continuity and depth.
Customizability: Adaptable to specific tasks and user preferences through fine-tuning.

Use Cases

Open-Ended Q&A
Creative Writing and Ideation
Research Assistance
Educational Queries
Casual Conversations
Mathematical Problem Solving
Long-Context Dialogues

How to Use

To start using Nidum-Llama-3.2-3B-Uncensored, follow the sample code below:

import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="nidum/Nidum-Llama-3.2-3B-Uncensored",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",  # replace with "mps" to run on a Mac device
)

messages = [
    {"role": "user", "content": "Tell me something fascinating."},
]

outputs = pipe(messages, max_new_tokens=256)
assistant_response = outputs[0]["generated_text"][-1]["content"].strip()
print(assistant_response)
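
When the pipeline is given a list of chat messages, `generated_text` holds the whole conversation with the new assistant message appended, which is why the sample reads the last element. The sketch below builds on that to keep a multi-turn history; `next_turn` is a hypothetical helper, and `pipe` is assumed to be the pipeline object created above.

```python
def next_turn(pipe, messages: list, user_text: str, max_new_tokens: int = 256) -> tuple:
    """Append a user message, run the pipeline, and return (reply, updated history)."""
    history = messages + [{"role": "user", "content": user_text}]
    outputs = pipe(history, max_new_tokens=max_new_tokens)
    updated = outputs[0]["generated_text"]  # full chat, assistant reply last
    return updated[-1]["content"].strip(), updated
```

A conversation can then be carried forward by passing the returned history back in, for example `reply, history = next_turn(pipe, [], "Tell me something fascinating.")` followed by `reply, history = next_turn(pipe, history, "Why is that?")`.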

Quantized Models Available for Download

Quantized Model Version	Description
Nidum-Llama-3.2-3B-Uncensored-F16.gguf	Full 16-bit floating point precision for maximum accuracy on high-end GPUs.
model-Q2_K.gguf	Optimized for minimal memory usage with lower precision, suitable for edge cases.
model-Q3_K_L.gguf	Balanced precision with enhanced memory efficiency for medium-range devices.
model-Q3_K_M.gguf	Mid-range quantization for moderate precision and memory usage balance.
model-Q3_K_S.gguf	Smaller quantization steps, offering moderate precision with reduced memory use.
model-Q4_0_4_4.gguf	Q4_0 variant with an interleaved weight layout aimed at ARM CPUs with NEON, suited to lightweight, low-memory deployment.
model-Q4_0_4_8.gguf	Q4_0 variant with an interleaved weight layout aimed at ARM CPUs with int8 matrix-multiply (i8mm) support.
model-Q4_0_8_8.gguf	Q4_0 variant with an interleaved weight layout aimed at ARM CPUs with SVE support.
model-Q4_K_M.gguf	High-efficiency quantization for moderate GPU resources.
model-Q4_K_S.gguf	Optimized for smaller-scale operations with compact memory footprint.
model-Q5_K_M.gguf	Balances performance and precision, ideal for robust inferencing environments.
model-Q5_K_S.gguf	Moderate quantization targeting performance with minimal resource usage.
model-Q6_K.gguf	High-precision quantization for accurate and stable inferencing tasks.
model-TQ1_0.gguf	Experimental ternary quantization (about 1.69 bits per weight) for targeted applications in test environments.
model-TQ2_0.gguf	Experimental ternary quantization (about 2.06 bits per weight) that trades a slightly larger file for faster inference.
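
To choose a file for a given amount of memory, a rough size estimate is parameters times bits per weight, divided by 8. The sketch below assumes the 3B parameter count shown on this page and nominal bits-per-weight figures for a few block formats; these figures are assumptions taken from the GGUF block layouts, and mixed K-quants such as Q4_K_M come out somewhat larger because they keep some tensors at higher precision.

```python
# Nominal bits per weight of the underlying block formats (assumed values;
# real files differ a little because of metadata and mixed-precision tensors).
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q6_K": 6.5625,
    "Q4_0": 4.5,
    "TQ2_0": 2.0625,
    "TQ1_0": 1.6875,
}


def estimate_size_gb(n_params: float, quant: str) -> float:
    """Approximate weight size in GB (10^9 bytes) for a given quantization."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9


for name in BITS_PER_WEIGHT:
    print(f"{name:6s} ~{estimate_size_gb(3e9, name):.2f} GB")
```

Under these assumptions the F16 file needs about 6 GB for the weights alone, before any memory for the context cache.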

Datasets and Fine-Tuning

The following fine-tuning datasets are leveraged to enhance specific model capabilities:

Uncensored Data: Enables unrestricted and uninhibited responses.
RAG-Based Fine-Tuning: Optimizes retrieval-augmented generation for knowledge-intensive tasks.
Long Context Fine-Tuning: Enhances the model's ability to process and maintain coherence in extended conversations.
Math-Instruct Data: Specially curated for precise and contextually accurate mathematical reasoning.

Benchmarks

After fine-tuning with uncensored data, Nidum-Llama-3.2-3B scores higher than the original Llama 3.2 3B model on the GPQA and HellaSwag benchmarks below, with the largest gain in GPQA exact match.

Benchmark Summary Table

Benchmark	Metric	LLaMA 3.2 3B	Nidum 3.2 3B	Observation
GPQA	Exact Match (Flexible)	0.3	0.5	Nidum 3B demonstrates significant improvement, particularly in generative tasks.
	Accuracy	0.4	0.5	Consistent improvement, especially in zero-shot scenarios.
HellaSwag	Accuracy	0.3	0.4	Better performance in common sense reasoning tasks.
	Normalized Accuracy	0.3	0.4	Enhanced ability to understand and predict context in sentence completion.
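	Normalized Accuracy (Stderr)	0.15275	0.1633	Standard error is slightly higher for Nidum 3B; the 0.1 gain in normalized accuracy is within one standard error.
	Accuracy (Stderr)	0.15275	0.1633	Same standard errors as for normalized accuracy; the 0.1 gain in accuracy is likewise within one standard error.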

Insights:

GPQA Results: Fine-tuning on uncensored data has boosted Nidum 3B's Exact Match and Accuracy, particularly excelling in generative and zero-shot tasks involving domain-specific knowledge.
HellaSwag Results: Nidum 3B consistently outperforms LLaMA 3B in common sense reasoning benchmarks, indicating enhanced contextual and semantic understanding.
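
The reported standard errors give a hint about the size of the evaluation. As a rough check, assuming the evaluation harness reports the sample standard error of a 0/1 score, sqrt(p(1 - p) / (n - 1)), the sketch below searches for the sample size that reproduces the HellaSwag values in the table. The sample size is an inference from the numbers, not something stated in this model card.

```python
import math


def binary_stderr(p: float, n: int) -> float:
    """Sample standard error of the mean of n 0/1 scores with mean p."""
    return math.sqrt(p * (1.0 - p) / (n - 1))


def infer_sample_size(p: float, reported: float, max_n: int = 10_000) -> int:
    """Return the n in [2, max_n] whose standard error is closest to the reported one."""
    return min(range(2, max_n + 1), key=lambda n: abs(binary_stderr(p, n) - reported))


# Accuracy and standard error pairs taken from the HellaSwag rows of the table.
for label, acc, stderr in [("LLaMA 3.2 3B", 0.3, 0.15275), ("Nidum 3.2 3B", 0.4, 0.1633)]:
    print(label, "-> n is about", infer_sample_size(acc, stderr))
```

If that inference holds, a difference of 0.1 corresponds to a single example, so a larger evaluation run would be needed to pin down the size of the gap.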

Contributing

We welcome contributions to improve and extend the model’s capabilities. Stay tuned for updates on how to contribute.

Contact

For inquiries, collaborations, or further information, please reach out to us at info@nidum.ai.

Explore the Possibilities

Dive into unrestricted creativity and innovation with Nidum Llama 3.2 3B Uncensored!

Downloads last month: 2,341

Format: GGUF
Model size: 3B params
Architecture: llama
Available quantizations: 1-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 16-bit
Base model: meta-llama/Llama-3.2-3B
Collection: Nidum Uncensored GGUF (4 items, updated Mar 21, 2025)