Access Genmount TCM-reference (educational use only)
This repository is publicly accessible, but you have to accept the conditions to access its files and content.
This model is for education and reference only — it is NOT a medical device and its outputs are NOT a diagnosis, prescription, or treatment recommendation. Always consult a qualified practitioner. By requesting access you accept the Llama 3.2 Community License and the terms in this model card.
Genmount TCM-reference (GGUF, Q4_K_M)
A LoRA fine-tune of Llama-3.2-3B-Instruct, merged and quantized to GGUF (Q4_K_M) for fast local inference. It is built to surface classical-source-grounded, citation-bearing answers about Traditional Chinese Medicine (TCM) texts, and to refuse diagnosis and prescription requests by design.
This is part of the Genmount traditional-medicine reference family — three small, education-only models that read classical literature and cite it, rather than acting as clinical tools. See Model family.
Built with Llama. This model is a derivative of Llama 3.2 and is distributed under the Llama 3.2 Community License.
At a glance
| Field | Value |
|---|---|
| Base model | meta-llama/Llama-3.2-3B-Instruct |
| Parameters | 3B |
| Format / quant | GGUF, Q4_K_M |
| File | `v1-73845650-00000000.Q4_K_M.gguf`, 2,019,377,344 bytes (≈ 1.88 GiB) |
| sha256 | `dc434a13c3761150689fdb234f0748ba8b10979d49cdddc3e213bdca91ec91a7` |
| Context | Up to 128K tokens (Llama 3.2 base; the effective window is set by your runtime) |
| Language | Chinese (classical TCM literature) |
| Domain | Classical Traditional-Chinese-Medicine texts |
| Adapter version | v1-73845650-00000000 |
| License | Llama 3.2 Community License + educational-use terms (this card) |
| Status | Education / reference; non-SaMD; access-gated |
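You can check a download against the size and sha256 listed above before loading it. The sketch below uses only the standard library. It hashes the file in chunks so the roughly 1.9 GiB model is never held in memory at once. The function names are illustrative and are not part of any genmount tooling.

```python
import hashlib
import os

# Values copied from the "At a glance" table.
EXPECTED_SIZE = 2_019_377_344
EXPECTED_SHA256 = "dc434a13c3761150689fdb234f0748ba8b10979d49cdddc3e213bdca91ec91a7"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks instead of reading it whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_size: int = EXPECTED_SIZE,
                    expected_sha256: str = EXPECTED_SHA256) -> bool:
    """Cheap size check first, then the full sha256 comparison."""
    if os.path.getsize(path) != expected_size:
        return False
    return sha256_of(path) == expected_sha256.lower()

# Usage: verify_download("v1-73845650-00000000.Q4_K_M.gguf") returns True for an intact file.
```

The size check runs first because it costs nothing and catches most truncated downloads without hashing two gigabytes.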
What it does
- Explains and quotes classical TCM passages, returning answers that carry source anchors so you can trace a statement back to the literature.
- Stays in scope. It is trained to surface "what the classical literature says" and to say when it does not know instead of inventing an answer.
- Refuses by default on diagnosis, prescription, dosing, and treatment requests, and on attempts to extract clinical instructions.
Intended use & scope
- Education and reference over classical medical literature: explaining passages, citing sources, structured study aid.
- Not a diagnosis, prescription, dosing, or treatment tool. The model is trained to refuse such requests.
- Not a medical device (non-SaMD). No clinical validation is claimed.
- Outputs are study prompts to be verified against primary sources — not medical authority.
How to run
The model runs in any llama.cpp-compatible runtime. `repeat_penalty 1.15` is required (see Recommended parameters); the genmount client sets it for you.
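Every llama.cpp-compatible runtime expects a GGUF file. Under the GGUF specification, such a file starts with:

- the 4-byte magic `GGUF`,
- a little-endian uint32 format version,
- since version 2, uint64 tensor and metadata key/value counts.

The sketch below confirms that a file has this header before you hand it to a runtime. It assumes the v2+ layout. It is an illustration only and is not part of llama.cpp or the genmount client.

```python
import struct

GGUF_MAGIC = b"GGUF"


def read_gguf_header(path: str) -> dict:
    """Read the fixed-size GGUF header (assumes GGUF v2 or later, little-endian)."""
    with open(path, "rb") as fh:
        raw = fh.read(24)
    if len(raw) < 24 or raw[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    # uint32 version, uint64 tensor count, uint64 metadata key/value count
    version, n_tensors, n_kv = struct.unpack("<IQQ", raw[4:24])
    return {"version": version, "tensor_count": n_tensors, "metadata_kv_count": n_kv}
```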
genmount client (recommended)
```bash
pip install genmount                       # Python ≥ 3.10
genmount doctor                            # checks Python, Ollama, config
ollama create genmount-tcm -f Modelfile    # Modelfile: FROM ./v1-73845650-00000000.Q4_K_M.gguf
genmount chat "What does the classical literature say about ...?"   # 100% local
```
The official genmount client runs the model locally via Ollama (no account is needed for local use) and handles the required repeat_penalty. Source: github.com/doorm-ai/genmount.
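You may want to call the `genmount-tcm` model created with `ollama create` from your own code rather than through the genmount client. In that case, Ollama's local REST API accepts the sampling parameters per request under `options`.

The sketch below uses only the standard library and makes a few assumptions:

- Ollama is listening on its default endpoint, `http://localhost:11434/api/chat`.
- `temperature=0.3` is an arbitrary pick inside the card's recommended 0.2–0.5 range.

This is Ollama's generic API, not the genmount client's interface.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint


def build_chat_request(prompt: str, model: str = "genmount-tcm",
                       temperature: float = 0.3) -> urllib.request.Request:
    """Build a non-streaming /api/chat request with the recommended sampling."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        # Per-request options; repeat_penalty 1.15 is required for this model.
        "options": {"repeat_penalty": 1.15, "temperature": temperature, "top_p": 0.9},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the request to a running Ollama server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        return json.loads(resp.read())["message"]["content"]
```

Passing `repeat_penalty` on every request means the model still behaves correctly if it was created from a Modelfile that contains only the `FROM` line.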
Ollama (raw)
```bash
# Modelfile:
#   FROM ./v1-73845650-00000000.Q4_K_M.gguf
#   PARAMETER repeat_penalty 1.15
ollama create genmount-tcm -f Modelfile
ollama run genmount-tcm
```
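The Modelfile in the Ollama example pins only `repeat_penalty`. The sketch below writes a Modelfile that also pins the recommended `temperature` and `top_p` from Recommended parameters. The helper name and the default temperature of 0.3 are illustrative choices, not something the card prescribes. The `FROM` and `PARAMETER` lines follow Ollama's Modelfile syntax.

```python
from pathlib import Path

GGUF_FILE = "v1-73845650-00000000.Q4_K_M.gguf"


def write_modelfile(directory: str, temperature: float = 0.3) -> Path:
    """Write an Ollama Modelfile next to the .gguf with the recommended sampling."""
    lines = [
        f"FROM ./{GGUF_FILE}",
        "PARAMETER repeat_penalty 1.15",  # required: prevents looping
        f"PARAMETER temperature {temperature}",
        "PARAMETER top_p 0.9",
    ]
    path = Path(directory) / "Modelfile"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

Run `ollama create genmount-tcm -f Modelfile` from the same directory afterwards.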
llama.cpp
```bash
./llama-cli -m v1-73845650-00000000.Q4_K_M.gguf --repeat-penalty 1.15 -cnv
```
llama-cpp-python
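```python
from llama_cpp import Llama

# Load the local GGUF file with an 8K context window (the base model supports more).
llm = Llama(model_path="v1-73845650-00000000.Q4_K_M.gguf", n_ctx=8192)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "..."}],
    repeat_penalty=1.15,  # required, see Recommended parameters
    temperature=0.3,      # within the recommended 0.2-0.5 range
)
print(out["choices"][0]["message"]["content"])
```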
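For long classical explanations, you may prefer to print the answer as it is generated. With `stream=True`, llama-cpp-python's `create_chat_completion` yields OpenAI-style chunks whose text sits in `choices[0]["delta"]["content"]`. The small helper below is a hypothetical name, not part of the library. It extracts just the text. The commented usage assumes the `llm` object from the llama-cpp-python example.

```python
from typing import Iterable, Iterator


def iter_text(chunks: Iterable[dict]) -> Iterator[str]:
    """Yield only the text pieces from OpenAI-style streaming chunks."""
    for chunk in chunks:
        delta = chunk["choices"][0].get("delta", {})
        piece = delta.get("content")
        if piece:  # skip the role-only first chunk and the empty final chunk
            yield piece


# Usage (requires the .gguf locally):
# stream = llm.create_chat_completion(
#     messages=[{"role": "user", "content": "..."}],
#     repeat_penalty=1.15,
#     stream=True,
# )
# for piece in iter_text(stream):
#     print(piece, end="", flush=True)
```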
LM Studio / Jan: import the .gguf and set the repeat penalty to 1.15.
Recommended parameters
| Parameter | Value | Why |
|---|---|---|
| `repeat_penalty` | 1.15 (required) | Single-epoch QLoRA on long-form classical corpora will loop under pure greedy decoding without it. |
| `temperature` | 0.2–0.5 | Keeps answers close to the cited source; raise only for exploratory study. |
| `top_p` | 0.9 | Default works well. |
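Under greedy decoding, the highest-scoring token is chosen at every step. Once a phrase starts repeating, nothing pushes it back down, and the model loops. A penalty above 1.0 counteracts this by lowering the score of every token seen recently.

The sketch below shows the CTRL-style rule that llama.cpp-family runtimes use for `repeat_penalty`: positive logits are divided by the penalty, and zero or negative logits are multiplied by it. It is illustrative only. Real runtimes apply the penalty over a configurable window of recent tokens (llama.cpp's `--repeat-last-n`) and combine it with other samplers.

```python
from typing import List, Sequence


def apply_repeat_penalty(logits: Sequence[float], recent_tokens: Sequence[int],
                         penalty: float = 1.15) -> List[float]:
    """Make every recently seen token less likely.

    Positive logits are divided by the penalty and the rest are multiplied,
    so both move toward "less probable". Each token is penalized once,
    however often it appears in the window.
    """
    out = list(logits)
    for tok in set(recent_tokens):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out
```

For example, with penalty 1.15 a recently used token whose logit was 2.3 drops to 2.0, and one at -1.0 falls to -1.15. This lets a competing token overtake it under greedy decoding.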
Training & data
- Base: meta-llama/Llama-3.2-3B-Instruct
- Method: QLoRA (4-bit NF4); the LoRA adapter is merged into the base and the result is quantized to Q4_K_M.
- Data: retrieval/citation-grounded instruction pairs built from public-domain and licensed classical-text sources, plus a refusal-by-default safety set. Answers carry source anchors; unanswerable or out-of-scope prompts are refused rather than fabricated.
- Provenance: license-gated sources are used only where redistribution and derivative use are permitted; the released weights contain no raw corpus text.
- Adapter version: v1-73845650-00000000
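The card states that the LoRA adapter was merged into the base before quantization. In the standard LoRA formulation, merging folds the low-rank update into each adapted weight matrix, so inference needs no separate adapter. The card does not state the rank $r$ or scaling $\alpha$ used for this model, so they stay symbolic:

$$
W_{\text{merged}} = W_0 + \frac{\alpha}{r}\, B A,
\qquad W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
$$

With QLoRA, $W_0$ is held in 4-bit NF4 only during training. Merges are commonly done against higher-precision base weights. Here, the merged matrices were then quantized to Q4_K_M.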
Evaluation
Internal acceptance only. The released build was checked to: (1) load and run in llama.cpp/Ollama, (2) produce source-grounded Chinese answers within the classical-text domain, and (3) hold the refusal red-line on out-of-scope prompts (diagnosis / prescription / dosage). No public benchmark and no clinical validation have been performed, and none is claimed.
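The card does not describe how check (2) was carried out. As a purely hypothetical illustration, a smoke test could at least confirm that an answer is predominantly Chinese by counting CJK Unified Ideographs. This heuristic and its 0.5 threshold are assumptions, not the authors' procedure. It says nothing about whether an answer is source-grounded or correct.

```python
def cjk_ratio(text: str) -> float:
    """Share of non-whitespace characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF)."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars)


def looks_like_chinese_answer(text: str, threshold: float = 0.5) -> bool:
    """Hypothetical pass/fail rule for a smoke test; the threshold is arbitrary."""
    return cjk_ratio(text) >= threshold
```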
Ethics, safety & acceptable use
- Education only. Do not use outputs to diagnose, prescribe, dose, or treat any condition, for yourself or others. Always consult a qualified practitioner.
- No clinical decisions. This model must not be embedded in a medical device or clinical-decision workflow.
- Verify before relying. A 3B model can be confidently wrong; treat every output as a pointer into the literature, not a conclusion.
- Access is gated and governed by the Llama 3.2 Community License plus the educational-use terms in this card.
Limitations
- Small (3B) model: can still make factual errors; verify against primary sources.
- Domain-bound: classical TCM literature only — not general medical QA, not Western clinical medicine.
- Quantized (Q4_K_M): some fidelity loss versus full precision.
- Reference, not reasoning engine: it surfaces and cites; it does not perform clinical reasoning.
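A quick arithmetic check shows what Q4_K_M means for this file: dividing the file size by the parameter count gives the average storage cost per weight. The parameter count used below is an assumption, not a figure from this card. It is derived from the Llama 3.2 3B architecture: 28 layers, hidden size 3072, and tied input/output embeddings counted once.

```python
FILE_BYTES = 2_019_377_344        # from the "At a glance" table
ASSUMED_PARAMS = 3_212_749_824    # assumption: Llama 3.2 3B parameter count


def bits_per_weight(file_bytes: int, n_params: int) -> float:
    """Average storage bits per parameter (ignores the small GGUF metadata overhead)."""
    return file_bytes * 8 / n_params


bpw = bits_per_weight(FILE_BYTES, ASSUMED_PARAMS)
print(f"{bpw:.2f} bits/weight vs 16 for FP16")  # prints 5.03
```

The result is about 5 bits per weight, slightly above a nominal 4 bits, because K-quant blocks also store scales and Q4_K_M keeps some tensors at higher precision. That is still under a third of FP16's 16 bits, which is the source of the fidelity loss noted above.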
Model family
| Model | Domain |
|---|---|
| Llama-3.2-3B-genmount-tcm-GGUF | Traditional Chinese Medicine (this model) |
| Llama-3.2-3B-genmount-ayurveda-GGUF | Ayurveda |
| Llama-3.2-3B-genmount-tibetan-GGUF | Tibetan medicine (Sowa Rigpa) |
License & attribution
Built with Llama. Distributed under the Llama 3.2 Community License; the acceptable-use and educational-only terms in this card apply in addition. When redistributing, retain the "Built with Llama" notice and the license.
Contact
- 📧 service@doorm.ai — access, questions, and commercial / non-AGPL licensing
- 🌐 genmount.com · client: github.com/doorm-ai/genmount · genmount on PyPI
- DOORM AI PTE. LTD. · Singapore