How to use from MLX LM
Generate or start a chat session
# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "Irfanuruchi/SmolLM-1.7B-Instruct-MLX-4bit"
Run an OpenAI-compatible server
# Start the server on port 8000 (the port the curl example targets)
mlx_lm.server --model "Irfanuruchi/SmolLM-1.7B-Instruct-MLX-4bit" --port 8000
# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8000/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "Irfanuruchi/SmolLM-1.7B-Instruct-MLX-4bit",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
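The same request can also be sent from Python using only the standard library. This is a minimal sketch, assuming the server is reachable at the URL used in the curl example and that the response follows the OpenAI chat-completions shape (`choices[0].message.content`). `build_chat_request` and `chat` are hypothetical helper names, not part of MLX LM.

```python
import json
import urllib.request

MODEL = "Irfanuruchi/SmolLM-1.7B-Instruct-MLX-4bit"
# Assumption: same host and port as the curl example; adjust if your server differs.
BASE_URL = "http://localhost:8000"


def build_chat_request(prompt, model=MODEL, base_url=BASE_URL):
    """Build a POST request for the /v1/chat/completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, timeout=60):
    """Send the prompt to a running server and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    # Assumes an OpenAI-style response body.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Hello"))` should print the model's reply. Nothing is sent until `chat` is called.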

SmolLM-1.7B-Instruct (MLX 4-bit)

A 4-bit MLX-quantized build of HuggingFaceTB/SmolLM-1.7B-Instruct, intended for local inference on Apple Silicon.

Benchmark Environment

  • Device: MacBook Pro (M3 Pro)
  • Runtime: MLX
  • Quantization: ~4.5 bits per weight
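
The ~4.5 bits per weight figure is consistent with 4-bit group quantization that stores a scale and a bias for every group of weights. A small arithmetic sketch, assuming a group size of 64 and 16-bit scale and bias values (these settings are assumptions and are not stated in this card; `bits_per_weight` is a hypothetical helper):

```python
def bits_per_weight(bits=4, group_size=64, scale_bits=16, bias_bits=16):
    """Average storage cost per weight, including the per-group scale and bias."""
    return bits + (scale_bits + bias_bits) / group_size


print(bits_per_weight())  # 4.5
```

Under these assumptions the per-group metadata adds 32 / 64 = 0.5 bits to each 4-bit weight.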

Performance (Measured)

  • Disk size: ~922 MB
  • Peak memory: ~1.08 GB
  • Generation speed: ~110 tokens/sec

Benchmarks were collected on macOS (M3 Pro).
Performance on iPhone / iPad will vary based on hardware and available memory.

Usage

mlx_lm.generate \
  --model Irfanuruchi/SmolLM-1.7B-Instruct-MLX-4bit \
  --prompt "In 5 sentences, explain the Pomodoro technique and how to start today." \
  --max-tokens 140
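
The same generation can be run from Python. A minimal sketch, assuming the `mlx-lm` package is importable in the current environment (for example, installed with pip rather than as a uv tool) and using its `load` and `generate` functions. `build_messages` and `run` are hypothetical helper names.

```python
MODEL = "Irfanuruchi/SmolLM-1.7B-Instruct-MLX-4bit"


def build_messages(prompt):
    """Wrap a plain prompt as a single-turn chat conversation."""
    return [{"role": "user", "content": prompt}]


def run(prompt, max_tokens=140):
    """Load the model, apply its chat template, and return the generated text."""
    # Imported here so build_messages can be used without MLX installed.
    from mlx_lm import load, generate

    model, tokenizer = load(MODEL)
    chat_prompt = tokenizer.apply_chat_template(
        build_messages(prompt), add_generation_prompt=True
    )
    return generate(model, tokenizer, prompt=chat_prompt, max_tokens=max_tokens)
```

Calling `run("In 5 sentences, explain the Pomodoro technique and how to start today.")` roughly mirrors the command above. The first call downloads the model weights if they are not already cached.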

License

Upstream SmolLM is released under Apache-2.0. Preserve attribution and the original license terms.
