Instructions for using osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit with libraries, notebooks, and local apps.
- Libraries
- Adapters
How to use osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit with Adapters:
from adapters import AutoAdapterModel

model = AutoAdapterModel.from_pretrained("undefined")
model.load_adapter("osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit", set_active=True)
- MLX
How to use osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit with MLX:
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit with Pi:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit"
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit"
        }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit with Hermes Agent:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit"
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit
Run Hermes
hermes
- MLX LM
How to use osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit with MLX LM:
Generate or start a chat session
# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit"
Run an OpenAI-compatible server
# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit"
# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
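The same request can be sent from Python. The sketch below uses only the standard library and assumes the server is reachable on port 8080, the port used in the Pi and Hermes configuration above. The helper names `build_chat_payload` and `chat` and the `max_tokens` value are illustrative choices, not part of mlx-lm.

```python
import json
import urllib.request

MODEL_ID = "osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit"
# Assumption: the server listens on port 8080, as in the Pi and Hermes configs.
BASE_URL = "http://localhost:8080/v1"


def build_chat_payload(user_message, model=MODEL_ID, max_tokens=256):
    """Build the JSON body for a /chat/completions request (hypothetical helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }


def chat(user_message, base_url=BASE_URL):
    """POST one user message and return the assistant's reply text."""
    body = json.dumps(build_chat_payload(user_message)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    # OpenAI-style responses carry the text under choices[0].message.content.
    return reply["choices"][0]["message"]["content"]


# Example (requires the server started above to be running):
# print(chat("Hello"))
```

Keeping the payload construction separate from the network call makes it easy to inspect or log the request body before anything is sent.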
Nidum-Llama-3.2-3B-Uncensored-MLX-4bit
Welcome to Nidum!
At Nidum, we are committed to delivering cutting-edge AI models that offer advanced capabilities and unrestricted access to innovation. With Nidum-Llama-3.2-3B-Uncensored-MLX-4bit, we bring you a performance-optimized, space-efficient, and feature-rich model designed for diverse use cases.
Explore Nidum's Open-Source Projects on GitHub: https://github.com/NidumAI-Inc
Key Features
- Compact and Efficient: Built in the MLX-4bit format for optimized performance with minimal memory usage.
- Versatility: Excels in a wide range of tasks, including technical problem-solving, educational queries, and casual conversations.
- Extended Context Handling: Capable of maintaining coherence in long-context interactions.
- Seamless Integration: Enhanced compatibility with the mlx-lm library for a streamlined development experience.
- Uncensored Access: Provides uninhibited responses across a variety of topics and applications.
How to Use
To use Nidum-Llama-3.2-3B-Uncensored-MLX-4bit, install the mlx-lm library and run the example code below:
Installation
pip install mlx-lm
Usage
from mlx_lm import load, generate

# Load the model and tokenizer
model, tokenizer = load("nidum/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit")

# Create a prompt
prompt = "hello"

# Apply the chat template if available
if hasattr(tokenizer, "apply_chat_template") and tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

# Generate the response
response = generate(model, tokenizer, prompt=prompt, verbose=True)

# Print the response
print(response)
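For conversations longer than one turn, the same chat-template call can be applied to a growing message list. The following is a minimal sketch, not the authors' method: `append_turn` and `reply` are hypothetical helpers, and `max_tokens=256` is an arbitrary choice.

```python
def append_turn(messages, role, content):
    """Return a new message list with one more chat turn appended."""
    return messages + [{"role": role, "content": content}]


def reply(model, tokenizer, messages, max_tokens=256):
    """Generate the next assistant turn for a running conversation."""
    # Imported lazily so append_turn can be used without mlx-lm installed.
    from mlx_lm import generate

    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    text = generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)
    return append_turn(messages, "assistant", text), text


# Example usage (requires Apple silicon and the downloaded model):
# from mlx_lm import load
# model, tokenizer = load("nidum/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit")
# history = append_turn([], "user", "hello")
# history, answer = reply(model, tokenizer, history)
# history = append_turn(history, "user", "Can you say that more briefly?")
# history, answer = reply(model, tokenizer, history)
```

Each call re-renders the full history through the chat template, so earlier turns stay in context for the next response.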
About the Model
The nidum/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit model was converted to MLX format from nidum/Nidum-Llama-3.2-3B-Uncensored using mlx-lm version 0.19.2, providing the following benefits:
- Smaller Memory Footprint: Ideal for environments with limited hardware resources.
- High Performance: Retains the advanced capabilities of the original model while optimizing inference speed and efficiency.
- Plug-and-Play Compatibility: Easily integrate with the mlx-lm ecosystem for seamless deployment.
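To put a rough number on the smaller memory footprint, the sketch below estimates weight storage at 16-bit and at 4-bit precision. Every figure in it is an assumption rather than a measurement: a nominal 3 billion parameters, a quantization group size of 64, and one 16-bit scale plus one 16-bit bias stored per group. Actual usage also includes activations and the KV cache.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Approximate weight storage in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 3e9   # assumption: nominal "3B" parameter count
GROUP_SIZE = 64  # assumption: quantization group size
# Assumption: each group stores a 16-bit scale and a 16-bit bias
# alongside its 4-bit weights, giving 4.5 effective bits per weight.
BITS_4BIT = 4 + (16 + 16) / GROUP_SIZE

fp16_gb = weight_memory_gb(N_PARAMS, 16)
q4_gb = weight_memory_gb(N_PARAMS, BITS_4BIT)

print(f"16-bit weights: {fp16_gb:.2f} GB")
print(f"4-bit weights:  {q4_gb:.2f} GB")
print(f"reduction:      {fp16_gb / q4_gb:.2f}x")
```

Under these assumptions the 4-bit weights take a little under 2 GB, compared with about 6 GB at 16-bit.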
Use Cases
- Technical Problem Solving
- Research and Educational Assistance
- Open-Ended Q&A
- Creative Writing and Ideation
- Long-Context Dialogues
- Unrestricted Knowledge Exploration
Datasets and Fine-Tuning
The model inherits the fine-tuned capabilities of its predecessor, Nidum-Llama-3.2-3B-Uncensored, including:
- Uncensored Data: Ensures detailed and uninhibited responses.
- RAG-Based Fine-Tuning: Optimizes retrieval-augmented generation for information-intensive tasks.
- Math-Instruct Data: Tailored for precise mathematical reasoning.
- Long-Context Fine-Tuning: Enhanced coherence and relevance in extended interactions.
Quantized Model Download
The MLX-4bit version is highly efficient, maintaining a balance between precision and memory usage.
Benchmark
| Benchmark | Metric | LLaMA 3B | Nidum 3B | Observation |
|---|---|---|---|---|
| GPQA | Exact Match (Flexible) | 0.3 | 0.5 | Nidum 3B demonstrates significant improvement, particularly in generative tasks. |
| | Accuracy | 0.4 | 0.5 | Consistent improvement, especially in zero-shot scenarios. |
| HellaSwag | Accuracy | 0.3 | 0.4 | Better performance in common sense reasoning tasks. |
| | Normalized Accuracy | 0.3 | 0.4 | Enhanced ability to understand and predict context in sentence completion. |
| | Normalized Accuracy (Stderr) | 0.15275 | 0.1633 | Standard error is slightly higher for Nidum 3B. |
| | Accuracy (Stderr) | 0.15275 | 0.1633 | Standard error is slightly higher for Nidum 3B. |
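One way to read the Stderr rows: the reported values match the sample standard error of a mean of 0/1 scores when each accuracy is computed over 10 items. The sample size is inferred from the numbers and is not stated in the source, so the sketch below is a consistency check under that assumption.

```python
import math


def sample_stderr(accuracy, n):
    """Standard error of the mean of n 0/1 scores, using the n - 1 sample variance."""
    return math.sqrt(accuracy * (1 - accuracy) / (n - 1))


N_ITEMS = 10  # assumption: inferred from the reported values, not stated in the source

for name, acc, reported in [("LLaMA 3B", 0.3, 0.15275), ("Nidum 3B", 0.4, 0.1633)]:
    computed = sample_stderr(acc, N_ITEMS)
    print(f"{name}: computed {computed:.5f}, reported {reported}")
```

If the assumption holds, a 0.1 difference in accuracy corresponds to a single item, and the standard error rises for Nidum 3B simply because its accuracy is closer to 0.5.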
Insights:
- Compact Efficiency: The MLX-4bit model ensures high performance with reduced resource usage.
- Enhanced Usability: Optimized for seamless integration with lightweight deployment scenarios.
Contributing
We invite contributions to further enhance the MLX-4bit model's capabilities. Reach out to us for collaboration opportunities.
Contact
For inquiries, support, or feedback, email us at info@nidum.ai.
Explore the Future
Embrace the power of innovation with Nidum-Llama-3.2-3B-Uncensored-MLX-4bit, a blend of performance and efficiency.
Model tree for osmapi/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit
Base model
meta-llama/Llama-3.2-3B-Instruct