Libraries

MLX

How to use osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Local Apps

Pi

How to use osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit"
        }
      ]
    }
  }
}
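The provider entry above can also be generated from a script, which helps avoid JSON syntax slips when editing by hand. The following is a minimal sketch using only the Python standard library. The helper name `build_pi_models_config` is hypothetical and not part of Pi; the keys and values are copied from the configuration shown above.

```python
import json

MODEL_ID = "osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit"


def build_pi_models_config(model_id: str, base_url: str = "http://localhost:8080/v1") -> dict:
    """Return the provider structure from the snippet above as a dict.

    Hypothetical helper: it only mirrors the models.json example in this
    document and is not part of Pi's own tooling.
    """
    return {
        "providers": {
            "mlx-lm": {
                "baseUrl": base_url,
                "api": "openai-completions",
                "apiKey": "none",
                "models": [{"id": model_id}],
            }
        }
    }


# Serialize with indentation so the result is easy to review before saving.
config_text = json.dumps(build_pi_models_config(MODEL_ID), indent=2)
print(config_text)
```

The printed text can be saved as ~/.pi/agent/models.json. If that file already lists other providers, merge the "mlx-lm" entry into it rather than overwriting the file.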

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit

Run Hermes

hermes

MLX LM

How to use osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit"
# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
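The same request can be issued from Python without extra dependencies. This is a minimal sketch, assuming the server listens on port 8080 (the port used in the Pi and Hermes configurations in this document) and that the response follows the usual OpenAI-style shape. The function names `build_chat_request` and `send_chat` are hypothetical.

```python
import json
import urllib.request

MODEL_ID = "osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit"


def build_chat_request(prompt: str, base_url: str = "http://localhost:8080/v1") -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(prompt: str) -> str:
    """Send the request and return the reply text.

    Requires mlx_lm.server to be running. Assumes the response contains
    choices[0].message.content, as in OpenAI-style chat APIs.
    """
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Inspect the request without touching the network.
request = build_chat_request("Hello")
print(request.get_method(), request.full_url)
```

Once the server is running, `send_chat("Hello")` performs the actual call. If the server was started on a different port, pass a matching `base_url` to `build_chat_request`.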

Nidum-Llama-3.2-3B-Uncensored-MLX-4bit

Welcome to Nidum!

At Nidum, we are committed to delivering cutting-edge AI models that offer advanced capabilities and unrestricted access to innovation. With Nidum-Llama-3.2-3B-Uncensored-MLX-4bit, we bring you a performance-optimized, space-efficient, and feature-rich model designed for diverse use cases.

Explore Nidum's Open-Source Projects on GitHub: https://github.com/NidumAI-Inc

Key Features

Compact and Efficient: Built in the MLX-4bit format for optimized performance with minimal memory usage.
Versatility: Excels in a wide range of tasks, including technical problem-solving, educational queries, and casual conversations.
Extended Context Handling: Capable of maintaining coherence in long-context interactions.
Seamless Integration: Enhanced compatibility with the mlx-lm library for a streamlined development experience.
Uncensored Access: Provides uninhibited responses across a variety of topics and applications.

How to Use

To use Nidum-Llama-3.2-3B-Uncensored-MLX-4bit, install the mlx-lm library and run the example code below:

Installation

pip install mlx-lm

Usage

from mlx_lm import load, generate

# Load the model and tokenizer
model, tokenizer = load("nidum/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit")

# Create a prompt
prompt = "hello"

# Apply the chat template if available
if hasattr(tokenizer, "apply_chat_template") and tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

# Generate the response
response = generate(model, tokenizer, prompt=prompt, verbose=True)

# Print the response
print(response)

About the Model

The nidum/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit model was converted to MLX format from nidum/Nidum-Llama-3.2-3B-Uncensored using mlx-lm version 0.19.2, providing the following benefits:

Smaller Memory Footprint: Ideal for environments with limited hardware resources.
High Performance: Retains the advanced capabilities of the original model while optimizing inference speed and efficiency.
Plug-and-Play Compatibility: Easily integrate with the mlx-lm ecosystem for seamless deployment.

Use Cases

Technical Problem Solving
Research and Educational Assistance
Open-Ended Q&A
Creative Writing and Ideation
Long-Context Dialogues
Unrestricted Knowledge Exploration

Datasets and Fine-Tuning

The model inherits the fine-tuned capabilities of its predecessor, Nidum-Llama-3.2-3B-Uncensored, including:

Uncensored Data: Ensures detailed and uninhibited responses.
RAG-Based Fine-Tuning: Optimizes retrieval-augmented generation for information-intensive tasks.
Math-Instruct Data: Tailored for precise mathematical reasoning.
Long-Context Fine-Tuning: Enhanced coherence and relevance in extended interactions.

Quantized Model Download

The MLX-4bit version stores the weights in 4-bit quantized form, trading a small amount of precision for a much smaller memory footprint.
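To give a feel for the memory trade-off, here is a back-of-the-envelope estimate. It is a rough sketch under stated assumptions, none of which come from this model card: about 3.2 billion parameters, every weight quantized, and groups of 64 weights that each carry one 16-bit scale and one 16-bit bias. Real files differ somewhat because some tensors stay unquantized.

```python
def estimate_weight_memory_gb(
    num_params: float,
    bits: int = 4,
    group_size: int = 64,   # assumed quantization group size
    scale_bits: int = 16,   # assumed width of each per-group scale and bias
) -> float:
    """Rough size of quantized weights in GB (10^9 bytes).

    Each group of `group_size` weights is assumed to store one scale and
    one bias, which adds 2 * scale_bits / group_size bits per weight.
    """
    bits_per_weight = bits + 2 * scale_bits / group_size
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 3.2e9  # assumed approximate parameter count for a "3B" model

quantized_gb = estimate_weight_memory_gb(PARAMS)
fp16_gb = PARAMS * 16 / 8 / 1e9  # same weights stored as plain 16-bit floats

print(f"4-bit estimate: {quantized_gb:.2f} GB")
print(f"fp16 estimate : {fp16_gb:.2f} GB")
```

Under these assumptions the quantized weights take roughly 1.8 GB versus about 6.4 GB in 16-bit form. Memory use at inference time is higher than either figure because of the KV cache and activations.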

Benchmark

Benchmark	Metric	LLaMA 3B	Nidum 3B	Observation
GPQA	Exact Match (Flexible)	0.3	0.5	Nidum 3B scores higher on this generative metric.
GPQA	Accuracy	0.4	0.5	Nidum 3B scores higher in the zero-shot setting.
HellaSwag	Accuracy	0.3	0.4	Nidum 3B scores higher on common-sense reasoning.
HellaSwag	Normalized Accuracy	0.3	0.4	Nidum 3B scores higher on length-normalized sentence completion.
HellaSwag	Normalized Accuracy (Stderr)	0.15275	0.1633	Standard error is slightly higher for Nidum 3B, not lower.
HellaSwag	Accuracy (Stderr)	0.15275	0.1633	Both standard errors exceed the 0.1 accuracy gap, so the HellaSwag difference is within noise.
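The standard errors in the table allow a rough check of how many examples each score is based on. The sketch below assumes the values are sample standard errors of per-example 0/1 scores, that is sqrt(p * (1 - p) / (n - 1)). The model card does not state how they were computed, so treat the result as an inference rather than a reported fact. The function names are hypothetical.

```python
import math


def sample_stderr(accuracy: float, n: int) -> float:
    """Standard error of the mean of n 0/1 outcomes, using the (n - 1) sample variance."""
    return math.sqrt(accuracy * (1 - accuracy) / (n - 1))


def implied_sample_size(accuracy: float, stderr: float) -> float:
    """Invert the formula above to recover n from a reported accuracy and stderr."""
    return accuracy * (1 - accuracy) / stderr**2 + 1


# HellaSwag accuracy and stderr pairs taken from the table above.
reported = [("LLaMA 3B", 0.3, 0.15275), ("Nidum 3B", 0.4, 0.1633)]

for name, accuracy, stderr in reported:
    n = implied_sample_size(accuracy, stderr)
    print(f"{name}: implied sample size = {n:.2f}")
```

Under that assumption both rows work out to about 10 examples, which is consistent with scores that are all multiples of 0.1. With so few examples, a 0.1 difference corresponds to a single example and is smaller than one standard error.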

Insights:

Compact Efficiency: The MLX-4bit model ensures high performance with reduced resource usage.
Enhanced Usability: Optimized for seamless integration with lightweight deployment scenarios.

Contributing

We invite contributions to further enhance the MLX-4bit model's capabilities. Reach out to us for collaboration opportunities.

Contact

For inquiries, support, or feedback, email us at info@nidum.ai.

Explore the Future

Embrace the power of innovation with Nidum-Llama-3.2-3B-Uncensored-MLX-4bit, the ideal blend of performance and efficiency.

Safetensors

Model size: 0.5B params
Tensor type: F16, U32
Quantization: 4-bit (MLX)

Model tree for osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit

Base model: meta-llama/Llama-3.2-3B-Instruct
Adapter: osmapi/Nidum-Llama-3.2-3B-Uncensored
Adapters (5): including this model

Collection including osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit

Nidum Uncensored MLX (collection, 2 items, updated Dec 5, 2024)