Access Genmount TCM-reference (educational use only)

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

This model is for education and reference only — it is NOT a medical device and its outputs are NOT a diagnosis, prescription, or treatment recommendation. Always consult a qualified practitioner. By requesting access you accept the Llama 3.2 Community License and the terms in this model card.


Genmount TCM-reference (GGUF, Q4_K_M)

A LoRA fine-tune of Llama-3.2-3B-Instruct, merged and quantized to GGUF (Q4_K_M) for fast local inference. It is built to surface classical-source-grounded, citation-bearing answers about Traditional Chinese Medicine (TCM) texts, and to refuse diagnosis and prescription requests by design.

This is part of the Genmount traditional-medicine reference family — three small, education-only models that read classical literature and cite it, rather than acting as clinical tools. See Model family.

Built with Llama. This model is a derivative of Llama 3.2 and is distributed under the Llama 3.2 Community License.

At a glance

Base model | meta-llama/Llama-3.2-3B-Instruct
Parameters | 3B
Format / quant | GGUF, Q4_K_M
File | v1-73845650-00000000.Q4_K_M.gguf, 2,019,377,344 B (≈ 1.88 GiB)
sha256 | dc434a13c3761150689fdb234f0748ba8b10979d49cdddc3e213bdca91ec91a7
Context | Up to 128K tokens (Llama 3.2 base; effective window set by your runtime)
Language | Chinese (classical TCM literature)
Domain | Classical Traditional-Chinese-Medicine texts
Adapter version | v1-73845650-00000000
License | Llama 3.2 Community License + educational-use terms (this card)
Status | Education / reference; non-SaMD; access-gated
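
The file size and sha256 listed above can be checked after download. The following is a minimal sketch using only the Python standard library; the function names `sha256_of` and `verify` are illustrative and not part of any genmount tooling, and the path you pass in is wherever you saved the file.

```python
import hashlib
from pathlib import Path

# Values copied from the "At a glance" table above.
EXPECTED_SIZE = 2_019_377_344
EXPECTED_SHA256 = "dc434a13c3761150689fdb234f0748ba8b10979d49cdddc3e213bdca91ec91a7"


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so the whole GGUF never sits in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_size=EXPECTED_SIZE, expected_sha256=EXPECTED_SHA256):
    """Return True only if both the byte size and the sha256 digest match."""
    p = Path(path)
    if p.stat().st_size != expected_size:
        return False  # cheap check first; skips hashing a truncated download
    return sha256_of(p) == expected_sha256
```

Calling `verify("v1-73845650-00000000.Q4_K_M.gguf")` from the download directory should return `True` for an intact copy. The size check runs first because it is nearly free and catches truncated downloads before the roughly 2 GB file is hashed.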

What it does

  • Explains and quotes classical TCM passages, returning answers that carry source anchors so you can trace a statement back to the literature.
  • Stays in scope. It is trained to surface "what the classical literature says" and to say when it does not know instead of inventing an answer.
  • Refuses by default on diagnosis, prescription, dosing, and treatment requests, and on attempts to extract clinical instructions.

Intended use & scope

  • Education and reference over classical medical literature: explaining passages, citing sources, structured study aid.
  • Not a diagnosis, prescription, dosing, or treatment tool. The model is trained to refuse such requests.
  • Not a medical device (non-SaMD). No clinical validation is claimed.
  • Outputs are study prompts to be verified against primary sources — not medical authority.

How to run

The model runs in any llama.cpp-compatible runtime. Setting repeat_penalty to 1.15 is required (see Recommended parameters); the genmount client sets it for you.

genmount client (recommended)

pip install genmount          # Python ≥ 3.10
genmount doctor               # checks Python, Ollama, config
ollama create genmount-tcm -f Modelfile   # FROM ./v1-73845650-00000000.Q4_K_M.gguf
genmount chat "What does the classical literature say about ...?"   # 100% local

The official genmount client runs the model locally via Ollama (no account needed for local use) and handles the required repeat_penalty. Source: github.com/doorm-ai/genmount.

Ollama (raw)

# Modelfile:
#   FROM ./v1-73845650-00000000.Q4_K_M.gguf
#   PARAMETER repeat_penalty 1.15
ollama create genmount-tcm -f Modelfile
ollama run genmount-tcm
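
Once the model has been created in Ollama, it can also be queried programmatically through Ollama's local REST API. The sketch below assumes Ollama is listening on its default address (`http://localhost:11434`) and that the model was created under the name `genmount-tcm` as shown above. The helper names `build_chat_payload` and `chat` are illustrative, and the temperature of 0.3 is simply one value from the range this card recommends.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_payload(prompt, model="genmount-tcm"):
    """Build a non-streaming /api/chat request body with this card's sampling values."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "options": {
            "repeat_penalty": 1.15,  # required by this card
            "temperature": 0.3,      # assumption: a value inside the 0.2-0.5 range
            "top_p": 0.9,
        },
    }


def chat(prompt, url=OLLAMA_CHAT_URL):
    """POST the payload to Ollama and return the assistant's reply text."""
    body = json.dumps(build_chat_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]
```

Passing the options on every request keeps the required repeat penalty in force even if the Modelfile was written without the PARAMETER line.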

llama.cpp

./llama-cli -m v1-73845650-00000000.Q4_K_M.gguf --repeat-penalty 1.15 -cnv

llama-cpp-python

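from llama_cpp import Llama

# Load the quantized GGUF; n_ctx caps the context window for this session.
llm = Llama(model_path="v1-73845650-00000000.Q4_K_M.gguf", n_ctx=8192)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "..."}],
    repeat_penalty=1.15,  # required, see Recommended parameters
)
# The reply text is in the first choice of the OpenAI-style response dict.
print(out["choices"][0]["message"]["content"])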

LM Studio / Jan — import the .gguf and set repeat penalty to 1.15.

Recommended parameters

Parameter | Value | Why
repeat_penalty | 1.15 (required) | The model was fine-tuned with single-epoch QLoRA on long-form classical corpora and will loop under pure greedy decoding without it.
temperature | 0.2–0.5 | Keeps answers close to the cited source; raise only for exploratory study.
top_p | 0.9 | Default works well.
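
To illustrate why a repeat penalty breaks greedy loops, here is a simplified sketch of the commonly used formulation: logits of tokens that have already been generated are divided by the penalty when positive and multiplied by it when negative. This is an illustration of the general technique, not the exact code path of llama.cpp or Ollama, which also apply details such as a limited lookback window. The function names and the toy logits are made up for the example.

```python
def apply_repeat_penalty(logits, seen_token_ids, penalty=1.15):
    """Return a copy of `logits` with already-generated tokens made less likely."""
    adjusted = list(logits)
    for token_id in set(seen_token_ids):
        if adjusted[token_id] > 0:
            adjusted[token_id] /= penalty  # shrink a positive score
        else:
            adjusted[token_id] *= penalty  # push a negative score further down
    return adjusted


def greedy_pick(logits):
    """Index of the highest logit, i.e. what pure greedy decoding would emit."""
    return max(range(len(logits)), key=logits.__getitem__)


# Toy vocabulary of three tokens; token 0 was already generated.
toy_logits = [2.0, 1.9, -1.0]
without_penalty = greedy_pick(toy_logits)
with_penalty = greedy_pick(apply_repeat_penalty(toy_logits, seen_token_ids=[0]))
```

In this toy case greedy decoding without the penalty picks token 0 again, while with the penalty token 0's score drops to about 1.74 and token 1 wins instead.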

Training & data

  • Base: meta-llama/Llama-3.2-3B-Instruct
  • Method: QLoRA (4-bit NF4); the LoRA adapter is merged into the base and the result quantized to Q4_K_M.
  • Data: retrieval/citation-grounded instruction pairs built from public-domain and licensed classical-text sources, plus a refusal-by-default safety set. Answers carry source anchors; unanswerable or out-of-scope prompts are refused rather than fabricated.
  • Provenance: license-gated sources are used only where redistribution and derivative use are permitted; the released weights contain no raw corpus text.
  • Adapter version: v1-73845650-00000000

Evaluation

Internal acceptance only. The released build was checked to: (1) load and run in llama.cpp/Ollama, (2) produce source-grounded Chinese answers within the classical-text domain, and (3) hold the refusal red-line on out-of-scope prompts (diagnosis / prescription / dosage). No public benchmark and no clinical validation have been performed, and none is claimed.

Ethics, safety & acceptable use

  • Education only. Do not use outputs to diagnose, prescribe, dose, or treat any condition, for yourself or others. Always consult a qualified practitioner.
  • No clinical decisions. This model must not be embedded in a medical device or clinical-decision workflow.
  • Verify before relying. A 3B model can be confidently wrong; treat every output as a pointer into the literature, not a conclusion.
  • Access is gated and governed by the Llama 3.2 Community License plus the educational-use terms in this card.

Limitations

  • Small (3B) model: can still make factual errors; verify against primary sources.
  • Domain-bound: classical TCM literature only — not general medical QA, not Western clinical medicine.
  • Quantized (Q4_K_M): some fidelity loss versus full precision.
  • Reference, not reasoning engine: it surfaces and cites; it does not perform clinical reasoning.

Model family

Model | Domain
Llama-3.2-3B-genmount-tcm-GGUF | Traditional Chinese Medicine (this model)
Llama-3.2-3B-genmount-ayurveda-GGUF | Ayurveda
Llama-3.2-3B-genmount-tibetan-GGUF | Tibetan medicine (Sowa Rigpa)

License & attribution

Built with Llama. Distributed under the Llama 3.2 Community License; the acceptable-use and educational-only terms in this card apply in addition. When redistributing, retain the "Built with Llama" notice and the license.

Contact
