Instructions to use doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF",
	filename="v1-73845650-00000000.Q4_K_M.gguf",
)

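# Keep the response and pass the repeat penalty this model card requires.
output = llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	],
	repeat_penalty=1.15,
)
print(output["choices"][0]["message"]["content"])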

Local Apps

llama.cpp

How to use doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF:Q4_K_M

Use Docker

docker model run hf.co/doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF:Q4_K_M

vLLM

How to use doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF with Ollama:
```
ollama run hf.co/doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF:Q4_K_M
```

Unsloth Studio

How to use doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF to start chatting

Pi

How to use doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF:Q4_K_M"
        }
      ]
    }
  }
}
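
If `~/.pi/agent/models.json` already lists other providers, pasting the fragment above by hand risks overwriting them. The sketch below merges the entry instead. The helper name `add_provider` is hypothetical and is not part of Pi; the provider values are copied from the fragment above.

```python
import json
from pathlib import Path

# Provider entry copied from the JSON fragment above.
PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF:Q4_K_M"}],
}


def add_provider(config_path, name="llama-cpp", provider=PROVIDER):
    """Merge one provider into models.json, keeping any providers already there."""
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    config.setdefault("providers", {})[name] = provider
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

A typical call would be `add_provider(Path.home() / ".pi" / "agent" / "models.json")`.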

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF with Docker Model Runner:
```
docker model run hf.co/doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF:Q4_K_M
```

Lemonade

How to use doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull doorm-ai/Llama-3.2-3B-genmount-tcm-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Llama-3.2-3B-genmount-tcm-GGUF-Q4_K_M

List all available models

lemonade list

Access Genmount TCM-reference (educational use only)

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

This model is for education and reference only — it is NOT a medical device and its outputs are NOT a diagnosis, prescription, or treatment recommendation. Always consult a qualified practitioner. By requesting access you accept the Llama 3.2 Community License and the terms in this model card.

Genmount TCM-reference (GGUF, Q4_K_M)

A LoRA fine-tune of Llama-3.2-3B-Instruct, merged and quantized to GGUF (Q4_K_M) for fast local inference. It is built to surface classical-source-grounded, citation-bearing answers about Traditional Chinese Medicine (TCM) texts, and to refuse diagnosis and prescription requests by design.

This is part of the Genmount traditional-medicine reference family — three small, education-only models that read classical literature and cite it, rather than acting as clinical tools. See Model family.

Built with Llama. This model is a derivative of Llama 3.2 and is distributed under the Llama 3.2 Community License.

At a glance


| Field | Value |
| --- | --- |
| Base model | `meta-llama/Llama-3.2-3B-Instruct` |
| Parameters | 3B |
| Format / quant | GGUF, Q4_K_M |
| File | `v1-73845650-00000000.Q4_K_M.gguf`, 2,019,377,344 B (≈ 1.88 GiB) |
| sha256 | `dc434a13c3761150689fdb234f0748ba8b10979d49cdddc3e213bdca91ec91a7` |
| Context | Up to 128K tokens (Llama 3.2 base; effective window set by your runtime) |
| Language | Chinese (classical TCM literature) |
| Domain | Classical Traditional-Chinese-Medicine texts |
| Adapter version | `v1-73845650-00000000` |
| License | Llama 3.2 Community License + educational-use terms (this card) |
| Status | Education / reference; non-SaMD; access-gated |

What it does

- Explains and quotes classical TCM passages, returning answers that carry source anchors so you can trace a statement back to the literature.
- Stays in scope. It is trained to surface "what the classical literature says" and to say when it does not know instead of inventing an answer.
- Refuses by default on diagnosis, prescription, dosing, and treatment requests, and on attempts to extract clinical instructions.

Intended use & scope

- Education and reference over classical medical literature: explaining passages, citing sources, structured study aid.
- Not a diagnosis, prescription, dosing, or treatment tool. The model is trained to refuse such requests.
- Not a medical device (non-SaMD). No clinical validation is claimed.
- Outputs are study prompts to be verified against primary sources, not medical authority.

How to run

The model runs in any llama.cpp-compatible runtime. Setting `repeat_penalty` to 1.15 is required (see Recommended parameters); the genmount client sets it for you.

genmount client (recommended)

pip install genmount          # Python ≥ 3.10
genmount doctor               # checks Python, Ollama, config
ollama create genmount-tcm -f Modelfile   # FROM ./v1-73845650-00000000.Q4_K_M.gguf
genmount chat "What does the classical literature say about ...?"   # 100% local

The official genmount client runs the model locally via Ollama (no account needed for local use) and handles the required repeat_penalty. Source: github.com/doorm-ai/genmount.

Ollama (raw)

# Modelfile:
#   FROM ./v1-73845650-00000000.Q4_K_M.gguf
#   PARAMETER repeat_penalty 1.15
ollama create genmount-tcm -f Modelfile
ollama run genmount-tcm
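
Once the `genmount-tcm` model exists in Ollama, it can also be queried from a script through Ollama's local REST API. The sketch below assumes the default Ollama port (11434) and passes the sampling values from this card explicitly in `options`, so they apply even if the Modelfile omitted them. The helper names and the default temperature of 0.3 (chosen from the recommended 0.2–0.5 range) are illustrative.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint


def build_chat_payload(question, model="genmount-tcm", temperature=0.3):
    """Build a non-streaming /api/chat body carrying this card's sampling values."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "stream": False,
        "options": {
            "repeat_penalty": 1.15,  # required by this model card
            "temperature": temperature,
            "top_p": 0.9,
        },
    }


def ask(question, url=OLLAMA_CHAT_URL):
    """POST the payload to a running Ollama server and return the reply text."""
    body = json.dumps(build_chat_payload(question)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]
```

`ask("...")` needs the Ollama server to be running; `build_chat_payload` can be inspected on its own without one.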

llama.cpp

./llama-cli -m v1-73845650-00000000.Q4_K_M.gguf --repeat-penalty 1.15 -cnv
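
If you start `llama-server` with the same `-m` file instead of `llama-cli`, the model is reachable over an OpenAI-compatible HTTP endpoint. The sketch below is a standard-library client for that case. It assumes the server's default port (8080) and assumes that the chat endpoint accepts the llama.cpp-specific `repeat_penalty` field next to the standard OpenAI fields; check this against your llama.cpp build. All function names are illustrative.

```python
import json
import urllib.request

SERVER_URL = "http://localhost:8080/v1/chat/completions"  # assumed default port


def build_request_body(question, temperature=0.3):
    """Chat-completions body with the sampling values recommended in this card."""
    return {
        "messages": [{"role": "user", "content": question}],
        "temperature": temperature,
        "top_p": 0.9,
        # Assumption: the server honors this llama.cpp-specific sampling field.
        "repeat_penalty": 1.15,
    }


def extract_reply(response_json):
    """Pull the assistant text out of an OpenAI-style chat completion response."""
    return response_json["choices"][0]["message"]["content"]


def ask(question, url=SERVER_URL):
    """Send one question to a running server and return the reply text."""
    body = json.dumps(build_request_body(question)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return extract_reply(json.loads(response.read()))
```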

llama-cpp-python

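from llama_cpp import Llama

# Load the local GGUF file; n_ctx sets the context window used at runtime.
llm = Llama(model_path="v1-73845650-00000000.Q4_K_M.gguf", n_ctx=8192)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "..."}],
    repeat_penalty=1.15,  # required for this model
)
print(out["choices"][0]["message"]["content"])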

LM Studio / Jan — import the .gguf and set repeat penalty to 1.15.

Recommended parameters

| Parameter | Value | Why |
| --- | --- | --- |
| `repeat_penalty` | 1.15 (required) | Single-epoch QLoRA on long-form classical corpora will loop under pure greedy decoding without it. |
| `temperature` | 0.2–0.5 | Keeps answers close to the cited source; raise only for exploratory study. |
| `top_p` | 0.9 | Default works well. |
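
To see why a repeat penalty breaks loops, here is a simplified sketch of the rule commonly used by llama.cpp-family runtimes: logits of recently emitted tokens are divided by the penalty when positive and multiplied by it when negative. This is a toy illustration with made-up logits, not the runtime's actual code, which also restricts the penalty to a window of recent tokens.

```python
def apply_repeat_penalty(logits, recent_tokens, penalty=1.15):
    """Return a copy of logits with recently seen token ids made less likely."""
    adjusted = list(logits)
    for token_id in set(recent_tokens):
        if adjusted[token_id] > 0:
            adjusted[token_id] /= penalty  # shrink positive scores
        else:
            adjusted[token_id] *= penalty  # push negative scores further down
    return adjusted


def greedy_pick(logits):
    """Index of the highest logit, i.e. what greedy decoding would emit."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Toy case: token 0 was just emitted and is still narrowly the top choice.
logits = [2.0, 1.9, -1.0]
print(greedy_pick(logits))                             # 0: the loop continues
print(greedy_pick(apply_repeat_penalty(logits, [0])))  # 1: the loop is broken
```

With the penalty, token 0 drops from 2.0 to about 1.74, below token 1 at 1.9, so greedy decoding moves on instead of repeating itself.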

Training & data

- Base: meta-llama/Llama-3.2-3B-Instruct
- Method: QLoRA (4-bit NF4); the LoRA adapter is merged into the base and the result quantized to Q4_K_M.
- Data: retrieval/citation-grounded instruction pairs built from public-domain and licensed classical-text sources, plus a refusal-by-default safety set. Answers carry source anchors; unanswerable or out-of-scope prompts are refused rather than fabricated.
- Provenance: license-gated sources are used only where redistribution and derivative use are permitted; the released weights contain no raw corpus text.
- Adapter version: v1-73845650-00000000
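
For readers unfamiliar with the merge step, the standard LoRA formulation is shown below. It comes from the general LoRA literature rather than from this card: the rank $r$ and scaling factor $\alpha$ used for this adapter are not stated here, and $W_0$ stands for any adapted weight matrix of the base model.

```latex
W_{\text{merged}} = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
```

After merging, the low-rank factors $B$ and $A$ are no longer needed at inference time, which is why a single GGUF file can be quantized and shipped.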

Evaluation

Internal acceptance only. The released build was checked to: (1) load and run in llama.cpp/Ollama, (2) produce source-grounded Chinese answers within the classical-text domain, and (3) hold the refusal red-line on out-of-scope prompts (diagnosis / prescription / dosage). No public benchmark and no clinical validation have been performed, and none is claimed.

Ethics, safety & acceptable use

- Education only. Do not use outputs to diagnose, prescribe, dose, or treat any condition, for yourself or others. Always consult a qualified practitioner.
- No clinical decisions. This model must not be embedded in a medical device or clinical-decision workflow.
- Verify before relying. A 3B model can be confidently wrong; treat every output as a pointer into the literature, not a conclusion.
- Access is gated and governed by the Llama 3.2 Community License plus the educational-use terms in this card.

Limitations

- Small (3B) model: can still make factual errors; verify against primary sources.
- Domain-bound: classical TCM literature only; not general medical QA, not Western clinical medicine.
- Quantized (Q4_K_M): some fidelity loss versus full precision.
- Reference, not reasoning engine: it surfaces and cites; it does not perform clinical reasoning.

Model family

| Model | Domain |
| --- | --- |
| `Llama-3.2-3B-genmount-tcm-GGUF` | Traditional Chinese Medicine (this model) |
| `Llama-3.2-3B-genmount-ayurveda-GGUF` | Ayurveda |
| `Llama-3.2-3B-genmount-tibetan-GGUF` | Tibetan medicine (Sowa Rigpa) |

License & attribution

Built with Llama. Distributed under the Llama 3.2 Community License; the acceptable-use and educational-only terms in this card apply in addition. When redistributing, retain the "Built with Llama" notice and the license.

Contact

📧 service@doorm.ai — access, questions, and commercial / non-AGPL licensing
🌐 genmount.com · client: github.com/doorm-ai/genmount · genmount on PyPI
DOORM AI PTE. LTD. · Singapore
