Instructions to use caid-technologies/parti-base with libraries, inference providers, notebooks, and local apps.

Libraries

How to use caid-technologies/parti-base with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="caid-technologies/parti-base")

# Or load the model directly (AutoModelForCausalLM includes the language-modeling head)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("caid-technologies/parti-base", dtype="auto")
```

Local Apps

vLLM

How to use caid-technologies/parti-base with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "caid-technologies/parti-base"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "caid-technologies/parti-base",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```
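For scripts, the same completion request can be sent from Python. A minimal sketch using only the standard library, assuming the vLLM server above is running on `localhost:8000` and returns the usual OpenAI-style `choices[0]["text"]` field; the helper names `build_completion_request` and `complete` are hypothetical, not part of vLLM.

```python
import json
import urllib.request


def build_completion_request(prompt: str,
                             base_url: str = "http://localhost:8000",
                             model: str = "caid-technologies/parti-base",
                             max_tokens: int = 512,
                             temperature: float = 0.5) -> urllib.request.Request:
    """Build (but do not send) a POST to an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **kwargs) -> str:
    """Send the request and return the text of the first completion choice."""
    req = build_completion_request(prompt, **kwargs)
    # Long plans can take a while to generate, so allow a generous timeout.
    with urllib.request.urlopen(req, timeout=600) as resp:
        body = json.load(resp)
    # OpenAI-style completions responses carry the generated text in choices[0]["text"].
    return body["choices"][0]["text"]
```

For example, `print(complete("Once upon a time,"))`. The same helper should also work against the SGLang server described below by passing `base_url="http://localhost:30000"`, since both expose `/v1/completions`.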

SGLang

How to use caid-technologies/parti-base with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "caid-technologies/parti-base" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "caid-technologies/parti-base",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "caid-technologies/parti-base" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "caid-technologies/parti-base",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```

Unsloth Studio

How to use caid-technologies/parti-base with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

```shell
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for caid-technologies/parti-base to start chatting
```

Install Unsloth Studio (Windows)

```powershell
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for caid-technologies/parti-base to start chatting
```

Using Hugging Face Spaces for Unsloth

No setup required: open https://huggingface.co/spaces/unsloth/studio in your browser and search for caid-technologies/parti-base to start chatting.

Load model with FastModel

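```shell
pip install unsloth
```

```python
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="caid-technologies/parti-base",
    # Training used max_seq_len 6144 (see Technical details); long full plans
    # may need more than 2048 tokens of context.
    max_seq_length=2048,
)
```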

Docker Model Runner
How to use caid-technologies/parti-base with Docker Model Runner:
```
docker model run hf.co/caid-technologies/parti-base
```

Last commit e11d9af by Hudeani: "Update README.md" (verified 5 days ago)

	---
	license: apache-2.0
	language:
	- en
	tags:
	- caid
	- blueprint
	- hardware
	- cad
	- text-to-cad
	- manufacturing
	- agents
	- structured-generation
	- json
	- qwen2.5
	- unsloth
	pipeline_tag: text-generation
	base_model: Qwen/Qwen2.5-3B-Instruct
	library_name: transformers
	---

	# Parti Base — Qwen2.5-3B

	Parti turns natural language prompts into hardware designs and plans.

	Tell it what you want to build — "a compact desk clock with an e-ink display and a remote" —
	and it gives back a structured blueprint: the parts list, how the parts connect, step-by-step
	build instructions, rough costs, and a quick design check. Everything comes out as clean,
	organized data that an app can read and build on.

	This is the all-in-one model — it runs on its own, no add-ons needed. (There's also a small
	adapter-only version at
	[blueprint-base-lora](https://huggingface.co/caid-technologies/blueprint-base-lora).)

	📌 Note: Great for drafting and exploring ideas — not a replacement for real engineering, CAD software, or safety review.

	## Questions

	Contact us:
	[Caid Technologies](mailto:team@caid-technologies.com)

	---

	## What it can do

	Give it a hardware idea and it can produce any of:

	- 📋 a parts list (components)
	- 🔌 a wiring/connection map between the parts
	- 🛠️ ordered build steps
	- 💲 rough sourcing and cost info
	- ✅ a basic design check
	- 📦 or the whole project plan at once

	You can ask for the complete plan, or just one piece (like only the parts list).
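A minimal sketch of asking for the whole plan or a single piece, reusing the system prompt from the "Try it" example. Only the full-project wording comes from this card; the per-piece phrasings in `PIECES` and the helper `build_messages` are hypothetical, so check the outputs before relying on them.

```python
# Hypothetical wording: only the "full project" phrasing appears in this card;
# the per-piece phrasings below are illustrative assumptions.
PIECES = {
    "full": "the full project",
    "parts": "only the parts list (components)",
    "wiring": "only the wiring/connection map between the parts",
    "steps": "only the ordered build steps",
    "sourcing": "only rough sourcing and cost info",
    "check": "only a basic design check",
}


def build_messages(request: str, piece: str = "full") -> list[dict]:
    """Return chat messages that ask for the whole plan or a single piece of it."""
    if piece not in PIECES:
        raise ValueError(f"unknown piece {piece!r}; choose from {sorted(PIECES)}")
    system = (
        "You design hobbyist electronics projects. Given a request, reply with a single "
        f"JSON object describing {PIECES[piece]}. Output only the JSON."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": request},
    ]
```

The returned list can be passed as `msgs` to `tok.apply_chat_template`, as in the "Try it" example.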

	## What it's good for — and not

	✅ Good for: brainstorming hardware projects, drafting parts lists and build steps, and
	turning a rough idea into an organized starting plan.

	🚫 Not for: final engineering decisions, production CAD models, electrical safety, or anything
	safety-critical. Treat the output as a helpful first draft to review, not a finished design.

	## Try it

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	REPO = "caid-technologies/parti-base"
	model = AutoModelForCausalLM.from_pretrained(REPO, device_map="auto", torch_dtype="bfloat16")
	tok = AutoTokenizer.from_pretrained(REPO)

	msgs = [
	    {"role": "system", "content":
	        "You design hobbyist electronics projects. Given a request, reply with a single "
	        "JSON object describing the full project. Output only the JSON."},
	    {"role": "user", "content": "A compact desk clock with an e-ink display and an IR remote."},
	]
	# return_dict=True also returns the attention mask, which generate() should receive
	inputs = tok.apply_chat_template(
	    msgs, add_generation_prompt=True, return_tensors="pt", return_dict=True
	).to(model.device)
	# Greedy decoding with the settings from the tip below and the technical details
	out = model.generate(
	    **inputs,
	    max_new_tokens=6144,
	    do_sample=False,
	    repetition_penalty=1.1,
	    pad_token_id=tok.eos_token_id,
	)
	# Decode only the newly generated tokens (skip the prompt)
	print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
	```
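Because the prompt asks for JSON only, the decoded text can usually be parsed directly. A small, defensive sketch (an assumption on our side, not part of the model's tooling) that tolerates a surrounding code fence or trailing text and returns `None` when the output does not parse, for example because it was cut off. The keys in the test strings are illustrative, since the card does not publish the output schema.

```python
import json


def extract_json(text: str):
    """Parse the first top-level JSON object in `text`; return None if it doesn't parse.

    raw_decode stops at the end of the object, so a trailing code fence or a stray
    sentence after the JSON is ignored. A truncated object (generation cut off by
    max_new_tokens) fails to parse and yields None rather than a partial fragment.
    """
    start = text.find("{")
    if start == -1:
        return None
    try:
        obj, _end = json.JSONDecoder().raw_decode(text, start)
    except json.JSONDecodeError:
        return None
    return obj
```

A `None` result is a signal to retry, typically with a larger `max_new_tokens`.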

	💡 Tip: keep `max_new_tokens` high (≥ 6000) so long plans aren't cut off, and keep
	`repetition_penalty=1.1` so wiring lists don't get stuck repeating. For Ollama/local apps,
	convert this model to GGUF with llama.cpp.
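One quick way to tell whether a plan was cut off by `max_new_tokens` is to look for brackets still open at the end of the output. A sketch under that assumption: it ignores brackets inside JSON strings and does not check that bracket types match. A generation stuck in a repetition loop often runs to the token limit as well, so it tends to show up here too.

```python
def open_brackets(text: str) -> list[str]:
    """Return the '{' / '[' still open at the end of `text`, ignoring ones inside JSON strings.

    Does not verify that each closing bracket matches its opener's type.
    """
    stack: list[str] = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            # Inside a string: track escapes so \" does not end the string.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append(ch)
        elif ch in "}]" and stack:
            stack.pop()
    return stack
```

An empty list suggests the JSON closed; a non-empty one means raising `max_new_tokens` is worth trying.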

	## What it learned from

	It was trained on about 130 real-world hardware projects — things like weather stations,
	small robots, drones, smart-home gadgets, lab tools, and audio gear — expanded into a few
	thousand practice examples. Everything is DIY, maker-friendly electronics-plus-hardware.

	Most common project types in the training data:

	| Project type | Share | Examples |
	|---|---|---|
	| Test & lab instruments | ~20% | function generator, Geiger counter |
	| Smart-home / IoT gadgets | ~15% | pet feeder, smart mailbox, pill dispenser |
	| Radio, comms & networking | ~9% | LoRa base station, APRS tracker, NAS |
	| Wearables & health | ~8% | sleep ring, heart-rate strap |
	| Audio & music | ~8% | synth module, guitar pedal, speaker |
	| Robotics & motion | ~7% | quadruped robot, robotic arm |
	| Environmental sensing | ~7% | air-quality monitor, weather station |
	| Clocks & e-ink displays | ~6% | word clock, e-ink calendar |
	| Maker / fabrication tools | ~5% | vinyl cutter, pen plotter |
	| Drones & aerial | ~5% | FPV drone, VTOL aircraft |
	| Everything else | ~10% | lighting, games, automotive, power |

	## Good to know (limitations)

	- It's a small model, so complex, many-part projects are harder for it.
	- It proposes designs; it doesn't verify them. Always sanity-check before building.
	- It's strongest on common project types (lab tools, smart-home) and weaker on rarer ones
	(games, automotive).

	## How well it works

	We tested it on projects it had never seen during training. Here's how often it produced a
	valid, well-structured result for each task:

	| Task | Valid result |
	|---|---|
	| 🛠️ Build steps | ~100% |
	| ✅ Design check | ~100% |
	| 📋 Parts list | ~95% |
	| 📦 Full project plan | ~85–97% |
	| 🔌 Wiring map | ~67% |

	It's strongest at build steps, design checks, and parts lists. Full end-to-end plans are close
	behind, and wiring maps are the hardest (and most sensitive to the `repetition_penalty` tip
	above). Figures are from held-out testing and are being finalized for the current version.
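The card does not define "valid, well-structured" precisely. To measure something similar on your own prompts, one simple assumption is that an output counts as valid when it parses as a single JSON object and contains the top-level keys you expect for that task. The key names used in the test (`components`, `steps`) are hypothetical.

```python
import json


def is_valid(output: str, required_keys: set[str]) -> bool:
    """True if `output` is a single JSON object containing every key in `required_keys`."""
    try:
        obj = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and required_keys.issubset(obj)


def validity_rate(outputs: list[str], required_keys: set[str]) -> float:
    """Fraction of outputs that pass is_valid (0.0 for an empty list)."""
    if not outputs:
        return 0.0
    return sum(is_valid(o, required_keys) for o in outputs) / len(outputs)
```

This check is strict: an object wrapped in a code fence counts as invalid, so strip fences first if your pipeline tolerates them.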

	---

	<details>
	<summary> <b>Technical details</b> </summary>

	- Base model: `Qwen/Qwen2.5-3B-Instruct`; this repo is the fine-tune merged to 16-bit
	(standalone, no adapter needed).
	- Method: QLoRA with Unsloth (LoRA r=32, alpha=32, all attention+MLP projections), then merged.
	- Training: 1 epoch, max_seq_len 6144, effective batch 8, lr 2e-4 (linear, 3% warmup),
	adamw_8bit, NEFTune α=5, loss masked to assistant turns, early stopping on eval loss.
	- Hardware: single RTX 4070 (12 GB).
	- Data: synthetic dataset projected into 6 task "modes" (full plan, parts, wiring,
	instructions, validation); split grouped by project so none leak between train/test.
	~3,242 rows; modes rebalanced (cap 350/mode) so the model doesn't coast on the easy ones.
	- Inference: `do_sample=False`, `repetition_penalty≈1.1`, `max_new_tokens≥6000`, pass the
	attention mask.
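For reference, the standard LoRA update behind the r and alpha settings above, assuming the default alpha/r scaling (not rank-stabilized LoRA): each adapted projection keeps its frozen weight W_0 and adds a trainable rank-r term.

```latex
h = W_0 x + \frac{\alpha}{r} B A x,
\qquad B \in \mathbb{R}^{d_{\mathrm{out}} \times r},\quad A \in \mathbb{R}^{r \times d_{\mathrm{in}}},
\qquad \frac{\alpha}{r} = \frac{32}{32} = 1
```

Merging folds the scaled term into the weights, W = W_0 + (alpha/r) B A, which is why this repo needs no adapter at inference time.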
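The learning-rate schedule above (peak 2e-4, linear, 3% warmup) can be sketched as a plain function. It mirrors the shape of the linear warmup/decay schedule in Hugging Face Transformers (`get_linear_schedule_with_warmup`), assuming the warmup length is `ceil(total_steps * 0.03)`. `MAX_STEPS` is only an upper bound computed from ~3,242 rows at effective batch 8 for one epoch, since the held-out test projects are not trained on.

```python
import math


def linear_warmup_lr(step: int, total_steps: int, peak_lr: float = 2e-4,
                     warmup_ratio: float = 0.03) -> float:
    """Learning rate at optimizer `step`: linear warmup to peak_lr, then linear decay to 0."""
    # Assumption: warmup length is the ratio of total steps, rounded up.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Upper bound on optimizer steps: ~3,242 rows / effective batch 8, one epoch.
# The real count is lower because held-out projects are excluded from training.
MAX_STEPS = math.ceil(3242 / 8)
```

With `MAX_STEPS` = 406, warmup covers the first 13 steps, after which the rate decays linearly to 0.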

	```bibtex
	@misc{parti_base,
	title = {Parti Base: Qwen2.5-3B for structured hardware generation},
	author = {Caid Technologies},
	year = {2026},
	howpublished = {\url{https://huggingface.co/caid-technologies}}
	}
	```

	Built with [Unsloth](https://github.com/unslothai/unsloth) and 🤗 Transformers / PEFT / TRL.

	</details>