Instructions for using prithivMLmods/Hatshepsut-Qwen3_QWQ-LCoT-4B with libraries, inference servers, and local apps.

Libraries

How to use prithivMLmods/Hatshepsut-Qwen3_QWQ-LCoT-4B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="prithivMLmods/Hatshepsut-Qwen3_QWQ-LCoT-4B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result so the output is visible when run as a script
result = pipe(messages)
print(result)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/Hatshepsut-Qwen3_QWQ-LCoT-4B")
model = AutoModelForCausalLM.from_pretrained("prithivMLmods/Hatshepsut-Qwen3_QWQ-LCoT-4B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
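
When the pipeline is called with a list of chat messages, recent Transformers releases return the whole conversation under `generated_text`, with the model's reply as the last message. The sketch below pulls out that reply. The helper name `last_assistant_reply` is hypothetical, the output shape is an assumption about the installed Transformers version, and the sample result is made up for illustration rather than taken from the model.

```python
def last_assistant_reply(pipeline_result):
    """Return the assistant's text from a chat-style text-generation result.

    Assumes the shape [{"generated_text": [{"role": ..., "content": ...}, ...]}];
    some versions may return a plain string under "generated_text" instead.
    """
    generated = pipeline_result[0]["generated_text"]
    if isinstance(generated, str):
        # Plain-string output: nothing to unpack.
        return generated
    # Chat output: the final message is the newly generated assistant turn.
    return generated[-1]["content"]


# Made-up result in the assumed shape, standing in for pipe(messages).
sample_result = [
    {
        "generated_text": [
            {"role": "user", "content": "Who are you?"},
            {"role": "assistant", "content": "(model reply)"},
        ]
    }
]
print(last_assistant_reply(sample_result))
```

In real use, pass the value returned by `pipe(messages)` instead of `sample_result`.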

Local Apps

vLLM

How to use prithivMLmods/Hatshepsut-Qwen3_QWQ-LCoT-4B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "prithivMLmods/Hatshepsut-Qwen3_QWQ-LCoT-4B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Hatshepsut-Qwen3_QWQ-LCoT-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/prithivMLmods/Hatshepsut-Qwen3_QWQ-LCoT-4B

SGLang

How to use prithivMLmods/Hatshepsut-Qwen3_QWQ-LCoT-4B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "prithivMLmods/Hatshepsut-Qwen3_QWQ-LCoT-4B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Hatshepsut-Qwen3_QWQ-LCoT-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "prithivMLmods/Hatshepsut-Qwen3_QWQ-LCoT-4B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Hatshepsut-Qwen3_QWQ-LCoT-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
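
The curl calls above can also be made from Python. The following is a minimal sketch using only the standard library, assuming a server is already running on one of the ports shown (8000 for vLLM, 30000 for SGLang). The function names `build_chat_request` and `chat` are illustrative, not part of either project, and the response parsing assumes the usual OpenAI-style `choices[0].message.content` layout.

```python
import json
import urllib.request

MODEL_ID = "prithivMLmods/Hatshepsut-Qwen3_QWQ-LCoT-4B"


def build_chat_request(base_url, user_content, model=MODEL_ID):
    """Build the same POST request that the curl examples send."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, user_content, timeout=60):
    """Send the request and return the first choice's message content."""
    request = build_chat_request(base_url, user_content)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumed OpenAI-style response layout.
    return body["choices"][0]["message"]["content"]


# Building the request does not touch the network, so it is safe to inspect.
request = build_chat_request("http://localhost:8000", "What is the capital of France?")
print(request.full_url)
```

With a server running, `chat("http://localhost:8000", "What is the capital of France?")` targets vLLM, and the same call with port 30000 targets SGLang.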

Docker Model Runner

How to use prithivMLmods/Hatshepsut-Qwen3_QWQ-LCoT-4B with Docker Model Runner:

```
docker model run hf.co/prithivMLmods/Hatshepsut-Qwen3_QWQ-LCoT-4B
```

Quantized versions of this model can be used in llama.cpp, Ollama, LM Studio, or any compatible app.

Hatshepsut-Qwen3_QWQ-LCoT-4B

Hatshepsut-Qwen3_QWQ-LCoT-4B is a fine-tuned variant of Qwen3-4B, trained on QWQ synthetic datasets with support for Least-to-Complexity-of-Thought (LCoT) prompting. It is optimized for precise mathematical reasoning, logic-driven multi-step solutions, and structured technical outputs, while remaining compute-efficient and instruction-aligned.

GGUF: https://huggingface.co/prithivMLmods/Hatshepsut-Qwen3_QWQ-LCoT-4B-Q4_K_M-GGUF

Key Features

- LCoT Prompting Mastery: Specifically tuned to handle Least-to-Complexity-of-Thought prompting, encouraging granular reasoning that moves from simple to complex steps when solving a problem.
- QWQ-Based Precision Reasoning: Built on the QWQ synthetic datasets, targeting high-fidelity outputs in symbolic logic, algebraic manipulation, and mathematical word problems.
- Code Understanding & Logic Generation: Interprets and writes concise, logically sound code snippets in Python, C++, and JavaScript, with a focus on algorithmic steps and edge-case handling.
- Structured Output Control: Produces responses in JSON, Markdown, LaTeX, and table formats, suited to educational material, notebooks, and structured reasoning chains.
- Multilingual Reasoning: Supports over 20 languages, enabling STEM problem solving and translation tasks across them.
- Efficient 4B Parameter Footprint: Lightweight yet capable, suitable for researchers, educators, and developers running on mid-tier GPUs (e.g., A10, 3090, or L4).
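
Because the model is described as producing JSON output, downstream code often needs to pull that JSON out of a longer reply. The following is a minimal sketch and not part of the model or its tooling: `extract_json` is a hypothetical helper, and the sample reply is made up to illustrate the parsing rather than being real model output.

```python
import json


def extract_json(reply):
    """Return the first JSON object or array found in a model reply.

    Scans for an opening brace or bracket and lets the JSON decoder
    consume exactly one value from there, ignoring surrounding prose.
    """
    decoder = json.JSONDecoder()
    for index, char in enumerate(reply):
        if char in "{[":
            try:
                value, _end = decoder.raw_decode(reply, index)
                return value
            except json.JSONDecodeError:
                continue  # Not valid JSON from here; keep scanning.
    raise ValueError("no JSON value found in reply")


# Made-up reply text standing in for a model response.
sample_reply = 'Final answer: {"x": 9, "steps": 3} (derived above)'
print(extract_json(sample_reply))
```

Scanning with `raw_decode` tolerates prose before and after the JSON value, and it skips over brace-like fragments such as LaTeX arguments that do not parse as JSON.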

Quickstart with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Hatshepsut-Qwen3_QWQ-LCoT-4B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Solve using LCoT: If 3x - 7 = 2(x + 1), what is the value of x?"

messages = [
    {"role": "system", "content": "You are a step-by-step reasoning assistant trained on QWQ datasets with LCoT support."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
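
Qwen3 base models can emit their reasoning inside `<think>...</think>` tags before the final answer. Assuming this fine-tune keeps that behavior, which the model card does not state, a small helper can separate the reasoning from the answer in the decoded `response`. The name `split_thinking` is hypothetical, and the sample string is made up rather than real model output.

```python
def split_thinking(response, close_tag="</think>", open_tag="<think>"):
    """Split a decoded response into (reasoning, final_answer).

    If no closing tag is present, the whole response is treated as the answer.
    """
    head, separator, tail = response.rpartition(close_tag)
    if not separator:
        return "", response.strip()
    # Drop the opening tag (if any) and surrounding whitespace.
    reasoning = head.replace(open_tag, "", 1).strip()
    return reasoning, tail.strip()


# Made-up text in the assumed format, standing in for the decoded response.
sample = "<think>3x - 7 = 2x + 2, so x = 9.</think>\n\nx = 9"
reasoning, answer = split_thinking(sample)
print(answer)
```

Splitting on the last closing tag keeps the helper safe for replies that contain no tags at all, in which case the text passes through unchanged as the answer.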