Instructions for using srswti/blackbird-she-doesnt-refuse-21b-cu with libraries and local apps.

Libraries

How to use srswti/blackbird-she-doesnt-refuse-21b-cu with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="srswti/blackbird-she-doesnt-refuse-21b-cu")
messages = [
    {"role": "user", "content": "Who are you?"},
]
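outputs = pipe(messages, max_new_tokens=40)
# For chat-style input, generated_text holds the whole conversation; the last message is the reply
print(outputs[0]["generated_text"][-1]["content"])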

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("srswti/blackbird-she-doesnt-refuse-21b-cu")
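# device_map="auto" (requires `accelerate`) places the weights on available GPUs;
# torch_dtype="auto" keeps the checkpoint's stored precision instead of upcasting to float32
model = AutoModelForCausalLM.from_pretrained(
    "srswti/blackbird-she-doesnt-refuse-21b-cu",
    device_map="auto",
    torch_dtype="auto",
)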
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
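# Decode only the newly generated tokens, dropping the prompt and special tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))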

Local Apps

vLLM

How to use srswti/blackbird-she-doesnt-refuse-21b-cu with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "srswti/blackbird-she-doesnt-refuse-21b-cu"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "srswti/blackbird-she-doesnt-refuse-21b-cu",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
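For scripts, the same OpenAI-compatible endpoint can be called from Python. The sketch below uses only the standard library (`urllib.request`, `json`); `build_chat_request` and `chat` are hypothetical helper names, and the default port 8000 assumes the `vllm serve` command above (the SGLang server in this card listens on 30000 instead).

```python
import json
import urllib.request

MODEL = "srswti/blackbird-she-doesnt-refuse-21b-cu"


def build_chat_request(prompt: str, port: int = 8000) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request.

    Assumption: the server runs locally and exposes /v1/chat/completions,
    as in the curl examples in this card.
    """
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"http://localhost:{port}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, port: int = 8000) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, port)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With a server running, `print(chat("What is the capital of France?"))` queries vLLM, and `chat("...", port=30000)` queries SGLang. Building the request separately from sending it keeps the payload easy to inspect.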

Use Docker

docker model run hf.co/srswti/blackbird-she-doesnt-refuse-21b-cu

SGLang

How to use srswti/blackbird-she-doesnt-refuse-21b-cu with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "srswti/blackbird-she-doesnt-refuse-21b-cu" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "srswti/blackbird-she-doesnt-refuse-21b-cu",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "srswti/blackbird-she-doesnt-refuse-21b-cu" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "srswti/blackbird-she-doesnt-refuse-21b-cu",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use srswti/blackbird-she-doesnt-refuse-21b-cu with Docker Model Runner:
```
docker model run hf.co/srswti/blackbird-she-doesnt-refuse-21b-cu
```

blackbird-she-doesnt-refuse-21b-cu

We promise she won't refuse.

This model takes a different approach. Using norm-preserving biprojected abliteration, a simple yet intuitive technique, it delivers unrestricted intelligence without sacrificing reasoning quality.

The Methodology: Beyond Standard Abliteration

Standard abliteration simply subtracts the projection onto a "refusal direction" from the model weights. While this removes censorship, it is mathematically unprincipled: it shrinks the norms of the weight vectors it modifies, damaging the delicate feature magnitudes learned during training. The result is degraded logic, hallucinations, and what researchers colloquially call "lobotomized" models.

We use norm-preserving biprojected abliteration, which eliminates refusals while preserving the model's intelligence. The process involves three distinct steps, each addressing a specific mathematical challenge.

Step one: Biprojection (targeting). We refine the refusal direction to be mathematically orthogonal to harmless directions. This ensures that removing refusal behavior does not accidentally remove healthy concepts. The biprojection provides surgical precision in identifying what to modify.
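One plausible reading of the biprojection step is a single Gram-Schmidt step: subtract from the refusal direction its component along a harmless direction, then renormalize. The sketch below assumes that reading; the toy vectors and the helper names (`dot`, `normalize`, `biproject`) are hypothetical, and the card does not specify how either direction is estimated.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    n = math.sqrt(dot(v, v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def biproject(refusal, harmless):
    """Return a unit refusal direction orthogonal to the harmless direction.

    Assumption: a single Gram-Schmidt step, not necessarily the authors'
    exact procedure.
    """
    h = normalize(harmless)
    coeff = dot(refusal, h)  # component of refusal along the harmless direction
    refined = [r - coeff * hi for r, hi in zip(refusal, h)]
    return normalize(refined)  # raises if refusal was parallel to harmless


# Hypothetical toy directions in a 3-dimensional space.
refusal = [0.9, 0.3, 0.1]
harmless = [1.0, 0.0, 0.0]
r_hat = biproject(refusal, harmless)
print(r_hat)
```

The refined direction keeps whatever part of the refusal signal is not shared with the harmless direction, so ablating it later cannot remove anything along the harmless direction.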

Step two: Decomposition. We decompose model weights into magnitude and direction components, separating the "what to say" from "how loud to say it." This enables targeted modification without collateral damage to the broader weight structure.
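The decomposition can be pictured by splitting each weight vector into its L2 norm (magnitude) and the unit vector pointing the same way (direction). The sketch assumes each row of a toy matrix is one weight vector; the card does not say which axis the authors decompose along, and `decompose` and `recompose` are hypothetical names.

```python
import math


def decompose(weight):
    """Split each row into its L2 norm (magnitude) and a unit vector (direction)."""
    magnitudes, directions = [], []
    for row in weight:
        n = math.sqrt(sum(x * x for x in row))
        magnitudes.append(n)
        # An all-zero row has no direction; keep it as-is.
        directions.append([x / n for x in row] if n > 0 else list(row))
    return magnitudes, directions


def recompose(magnitudes, directions):
    """Inverse of decompose: scale each unit direction back by its magnitude."""
    return [[m * x for x in d] for m, d in zip(magnitudes, directions)]


W = [[3.0, 4.0], [0.0, 2.0]]  # hypothetical toy weights
mags, dirs = decompose(W)
print(mags)  # [5.0, 2.0]
```

Recomposing without touching the directions gives back the original matrix, which is what makes it safe to edit only the direction part.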

Step three: Norm-preservation. We remove the refusal component solely from the directional aspect, then recombine with original magnitudes. This maintains the "importance" structure of the neural network—the relative strength of different features remains intact.
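A minimal sketch of this step: remove the refusal component from each row's unit direction, renormalize that direction, and multiply it by the row's original magnitude. It assumes rows are weight vectors and `r_hat` is an already-refined unit refusal direction; `norm_preserving_ablate` and the toy values are hypothetical, not the authors' code.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(v):
    return math.sqrt(dot(v, v))


def norm_preserving_ablate(weight, r_hat):
    """Ablate unit direction r_hat from each row while keeping each row's norm."""
    out = []
    for row in weight:
        magnitude = norm(row)
        if magnitude == 0.0:
            out.append(list(row))  # nothing to modify in an all-zero row
            continue
        direction = [x / magnitude for x in row]
        # Remove the refusal component from the direction only.
        coeff = dot(direction, r_hat)
        ablated = [d - coeff * r for d, r in zip(direction, r_hat)]
        n = norm(ablated)
        if n == 0.0:
            # The row pointed entirely along r_hat; no direction survives.
            out.append([0.0] * len(row))
            continue
        # Renormalize the direction, then restore the original magnitude.
        out.append([magnitude * a / n for a in ablated])
    return out


# Hypothetical toy matrix and unit refusal direction.
W = [[3.0, 4.0, 0.0], [1.0, 2.0, 2.0]]
r_hat = [0.0, 1.0, 0.0]
W_new = norm_preserving_ablate(W, r_hat)
print(W_new)
```

Because rescaling by a positive factor cannot reintroduce a component along `r_hat`, each output row stays orthogonal to the refusal direction while keeping its original norm. Rows that point almost entirely along `r_hat` are the fragile case: the surviving direction is small, and the rescale amplifies it.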

The Result: Better Than Baseline

By preserving weight norms, we maintain the neural network's internal structure. Our benchmarks suggest this method does not just avoid the "safety tax"—it potentially improves reasoning capabilities, as the model no longer wastes compute suppressing its own outputs. You may discover knowledge and capabilities that were not exposed in the original model, not because they were absent, but because they were actively suppressed.

This is a technical observation, not a philosophical claim. The model has the same parameters, the same training data, the same underlying architecture. What changed is the removal of a specific directional component that caused certain outputs to be avoided regardless of their technical correctness or relevance.

What to Expect

Blackbird provides unrestricted capability. There are no artificial limitations on responses, no unnecessary hedging, no reflexive refusals to explore topics. You may occasionally observe brief consideration before proceeding on certain queries—this is residual behavior from the base model's training, not a fundamental limitation of the abliteration process.

The model maintains high-performance reasoning. Sophisticated tool usage remains intact. Instruction-following capabilities are enhanced, as the model no longer needs to balance your request against internal refusal heuristics. Benchmark performance is at or above baseline across reasoning, code generation, and general knowledge tasks.

Architecture and Performance

Blackbird shares the same underlying architecture as Centenario: 21 billion total parameters, of which 3.6 billion are active per token, and a 128K-token context window. MXFP4 quantization brings memory usage to 11-21 GB, depending on your configuration. On M-series Apple Silicon, expect 40-70 tokens per second of sustained throughput.
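The lower end of the memory range can be sanity-checked with back-of-the-envelope arithmetic. The sketch assumes every one of the 21 billion parameters is stored in MXFP4 as defined by the OCP Microscaling format (4-bit elements sharing one 8-bit scale per 32-element block, so 4.25 bits per parameter) and ignores the KV cache, activations, and runtime overhead; in practice some tensors may stay in higher precision.

```python
# Back-of-the-envelope weight memory for a 21B-parameter model.
# Assumption: all weights use MXFP4: 4-bit FP4 elements plus one
# shared 8-bit scale per block of 32 elements.

TOTAL_PARAMS = 21e9
MX_BLOCK_SIZE = 32
FP4_ELEMENT_BITS = 4
SHARED_SCALE_BITS = 8


def mxfp4_bits_per_param(block_size=MX_BLOCK_SIZE):
    """Element bits plus the shared scale amortized over its block."""
    return FP4_ELEMENT_BITS + SHARED_SCALE_BITS / block_size


def weight_gb(params, bits_per_param):
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


bpp = mxfp4_bits_per_param()
print(f"MXFP4: {bpp} bits/param -> {weight_gb(TOTAL_PARAMS, bpp):.1f} GB")
print(f"BF16:  16 bits/param -> {weight_gb(TOTAL_PARAMS, 16):.1f} GB")
```

This prints about 11.2 GB for MXFP4 versus 42.0 GB for BF16, consistent with the 11 GB figure above; anything held in higher precision, plus the KV cache for long contexts, pushes real usage upward.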

Blackbird is a mixture-of-experts model whose attention layers alternate between dense and locally banded sparse attention. Rotary position embeddings handle positional information. Grouped multi-query attention with a group size of 8 speeds up inference by letting each group of query heads share one key/value head. Inference is handled through MLX, leveraging Apple's unified memory architecture for efficient on-device processing.
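The alternating attention pattern and the head grouping can be illustrated with a small sketch. All numbers here are hypothetical, chosen for readability (a 4-token band, 16 query heads), as is the choice that even-indexed layers are dense and odd-indexed layers banded; only the group size of 8 comes from the card.

```python
# Hypothetical sizes for illustration; only GROUP_SIZE = 8 comes from the card.
BAND_WIDTH = 4      # tokens visible to each query in a banded layer
NUM_Q_HEADS = 16
GROUP_SIZE = 8      # query heads sharing one key/value head


def can_attend(layer, query_pos, key_pos, band=BAND_WIDTH):
    """Attention mask entry for one (query, key) pair in a given layer.

    Assumption: even layers are dense causal, odd layers are locally banded.
    """
    if key_pos > query_pos:  # causal: never attend to future tokens
        return False
    if layer % 2 == 0:       # dense layer: every earlier token is visible
        return True
    return query_pos - key_pos < band  # banded layer: only the last `band` tokens


def kv_head_for(q_head, group_size=GROUP_SIZE):
    """Index of the key/value head a query head reads from."""
    return q_head // group_size


num_kv_heads = NUM_Q_HEADS // GROUP_SIZE
print("KV heads:", num_kv_heads)
print("query -> KV head:", [kv_head_for(h) for h in range(NUM_Q_HEADS)])
print("visible keys for query 6, banded layer:",
      [k for k in range(7) if can_attend(1, 6, k)])
```

With a group size of 8, the KV cache stores one key/value head per eight query heads, an 8x reduction compared with standard multi-head attention, which is a large part of the inference-speed benefit.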

Intended Use

Blackbird is designed for advanced users who require maximum flexibility: research applications without constraints, creative and experimental projects, and scenarios demanding unrestricted capability where the user, not the model, determines what is appropriate.

This model is appropriate for informed users who understand the implications of uncensored AI. It will respond to requests that other models refuse. It will not lecture you about potential misuse. It assumes you are an adult who can make your own decisions about what you should or should not generate.

The technical work is in making abliteration preserve reasoning quality. The ethical work is yours.

Technical Notes on Abliteration

The norm-preserving biprojection approach represents a significant improvement over naive abliteration methods. Standard approaches treat refusal as a simple linear direction in weight space that can be subtracted out. This ignores the geometry of the learned representations—weight magnitudes encode feature importance, and destroying them degrades model capability.
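A small numeric example shows why naive ablation disturbs norms: subtracting a vector's component along a unit direction can only shrink it or leave it unchanged, whereas rescaling afterward restores the original length. The matrix and direction below are made up, and rows are assumed to be the weight vectors.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(v):
    return math.sqrt(dot(v, v))


def naive_ablate(row, r_hat):
    """Standard directional ablation: w' = w - (w . r) r for unit r."""
    coeff = dot(row, r_hat)
    return [x - coeff * r for x, r in zip(row, r_hat)]


def rescale(row, target_norm):
    """Scale row to target_norm (left unchanged if it is all zeros)."""
    n = norm(row)
    return [target_norm * x / n for x in row] if n > 0 else list(row)


W = [[3.0, 4.0, 0.0], [1.0, 2.0, 2.0]]  # hypothetical weights
r_hat = [0.0, 1.0, 0.0]                 # hypothetical unit refusal direction

for row in W:
    naive = naive_ablate(row, r_hat)
    preserved = rescale(naive, norm(row))
    print(f"original {norm(row):.2f} | naive {norm(naive):.2f} | rescaled {norm(preserved):.2f}")
```

The first row drops from norm 5 to 3 under naive ablation; the larger a row's refusal component, the more magnitude it loses, which is the distortion the norm-preserving variant is meant to avoid.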

By decomposing weights into magnitude and direction, we can modify the direction (removing the refusal component) while preserving magnitudes (maintaining feature importance). The biprojection step ensures orthogonality between the refusal direction and harmless directions, preventing overcorrection.

The mathematical framework is based on orthogonal projection and subspace analysis. We identify the refusal subspace through careful analysis of model activations on refused prompts, then project onto its orthogonal complement, which removes the refusal component while leaving every orthogonal direction untouched. The result is a model that maintains its reasoning capabilities while removing the learned tendency to refuse certain classes of requests.
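The card does not spell out how the refusal subspace is identified. A common choice in published abliteration work is a difference of means: average the activations on prompts the model refuses, subtract the average on prompts it answers, and normalize. The sketch below assumes that choice, uses made-up 3-dimensional activations, and builds the projector P = I - r r^T onto the orthogonal complement; all helper names are hypothetical.

```python
import math


def mean(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def complement_projector(r_hat):
    """P = I - r r^T: maps any vector onto the subspace orthogonal to r_hat."""
    d = len(r_hat)
    return [[(1.0 if i == j else 0.0) - r_hat[i] * r_hat[j] for j in range(d)]
            for i in range(d)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# Hypothetical activations at one layer, one vector per prompt.
refused_acts = [[1.0, 2.0, 0.5], [1.2, 1.8, 0.4]]
answered_acts = [[1.0, 0.2, 0.5], [1.2, 0.0, 0.6]]

diff = [a - b for a, b in zip(mean(refused_acts), mean(answered_acts))]
r_hat = unit(diff)
P = complement_projector(r_hat)
print(r_hat)
```

The two properties worth checking are that `P` sends `r_hat` to zero and that applying it twice changes nothing (it is idempotent), so any vector it outputs already lies in the orthogonal complement.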


Format: Safetensors
Model size: 21B params
Tensor type: BF16

Collection including srswti/blackbird-she-doesnt-refuse-21b-cu: cuDega, optimized for CUDA acceleration (10 items, updated Apr 30)