Instructions to use Laplaces-Red-Devils/v01-QA-CoT with libraries and local apps.

Libraries

How to use Laplaces-Red-Devils/v01-QA-CoT with PEFT:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
model = PeftModel.from_pretrained(base_model, "Laplaces-Red-Devils/v01-QA-CoT")

Transformers

How to use Laplaces-Red-Devils/v01-QA-CoT with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Laplaces-Red-Devils/v01-QA-CoT")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly (a causal-LM class is needed for text generation)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("Laplaces-Red-Devils/v01-QA-CoT", dtype="auto")

Local Apps

vLLM

How to use Laplaces-Red-Devils/v01-QA-CoT with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Laplaces-Red-Devils/v01-QA-CoT"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Laplaces-Red-Devils/v01-QA-CoT",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
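
The same call can be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send_chat_request` are illustrative, not part of vLLM, and the response shape assumed is the usual OpenAI-style `choices[0].message.content`.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, user_content: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the first choice's message content."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]


request = build_chat_request(
    "http://localhost:8000",
    "Laplaces-Red-Devils/v01-QA-CoT",
    "What is the capital of France?",
)
# Uncomment once the server started above is running:
# print(send_chat_request(request))
```

Pointing `base_url` at `http://localhost:30000` should work the same way for a server listening on that port.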

SGLang

How to use Laplaces-Red-Devils/v01-QA-CoT with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Laplaces-Red-Devils/v01-QA-CoT" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Laplaces-Red-Devils/v01-QA-CoT",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Laplaces-Red-Devils/v01-QA-CoT" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Laplaces-Red-Devils/v01-QA-CoT",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Laplaces-Red-Devils/v01-QA-CoT with Docker Model Runner:
```
docker model run hf.co/Laplaces-Red-Devils/v01-QA-CoT
```

QA Stage 2: CoT Reasoning (NL + FOL → Answer + Explanation)

LoRA adapter for Qwen/Qwen2.5-3B-Instruct, fine-tuned on the Logic-Based Educational Queries dataset. Given natural-language (NL) premises, their first-order logic (FOL) translations, and a question, the model reasons step by step and outputs a JSON object containing an answer and an explanation.

Pipeline

NL premises + FOL premises + Question
        |
  QA COT Model (this adapter)
        |
  {"answer": "B", "explanation": "Premise 1 states..."}

This model is Stage 2 of a two-stage pipeline, in which the output of Stage 1 feeds Stage 2:

Stage 1 (FOL Model): NL → FOL (fol-v05-cot-augmented)
Stage 2 (This Model): NL + FOL + Question → Answer + Explanation
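
The chaining of the two stages can be sketched as below. This is an illustration only: `run_two_stage` and the two stand-in callables are hypothetical names, and the check that Stage 1 returns one formula per premise is an assumption about how the stages fit together, not something the card states.

```python
from typing import Callable, Dict, List


def run_two_stage(
    nl_premises: List[str],
    question: str,
    translate_to_fol: Callable[[List[str]], List[str]],
    answer_question: Callable[[List[str], List[str], str], Dict[str, str]],
) -> Dict[str, str]:
    """Chain Stage 1 (NL -> FOL) into Stage 2 (NL + FOL + question -> answer)."""
    fol_premises = translate_to_fol(nl_premises)
    # Assumption: Stage 1 yields exactly one FOL formula per NL premise.
    if len(fol_premises) != len(nl_premises):
        raise ValueError("Stage 1 must return one FOL formula per NL premise")
    return answer_question(nl_premises, fol_premises, question)


# Stand-in callables so the sketch runs without loading either model.
def fake_stage1(premises: List[str]) -> List[str]:
    return [f"FOL({i})" for i, _ in enumerate(premises, start=1)]


def fake_stage2(nl: List[str], fol: List[str], question: str) -> Dict[str, str]:
    return {"answer": "Yes", "explanation": f"Used {len(fol)} premise(s)."}


result = run_two_stage(["All students study."], "Does Ann study?", fake_stage1, fake_stage2)
print(result)
```

In real use, `translate_to_fol` would wrap the Stage 1 model and `answer_question` would wrap this adapter.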

Accuracy (Dev set, 40 samples)

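| Epoch | Raw Accuracy | Avg Latency |
|-------|--------------|-------------|
| 1 | 20.0% | 20.6s |
| 5 | 52.5% | 13.2s |
| 10 | 47.5% | 17.0s |
| 15 | 50.0% | 11.3s |
| 19 | 55.0% | 10.3s |
| 20 | 55.0% | 10.3s |
| 25 | 55.0% | 11.4s |
| 29 | 57.5% | 11.2s |
| 30 | 52.5% | 11.7s |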

Best raw accuracy: 57.5% (23/40) at epoch 29

Adjusted Accuracy (corrected gold labels)

Five samples in the dev set have gold-label errors: in each, the gold explanation contradicts the gold answer label. After manual verification against the gold explanation, the model's prediction was correct on all five.

| Sample | Gold (wrong) | Corrected | Pred | Evidence (gold explanation) |
|--------|--------------|-----------|------|-----------------------------|
| 4 | Unknown | A | A | "Option A is most effective because..." |
| 20 | No | Yes | Yes | "So such a programmer exists" |
| 25 | No | Yes | Yes | "Therefore, JavaScript supports..." |
| 28 | No | Yes | Yes | "Thus, all committee members approve" |
| 29 | No | Yes | Yes | "Thus, all faculty members think..." |

Adjusted accuracy: 70.0% (28/40)
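
The adjusted figure follows from the raw one by simple arithmetic. The sketch below reproduces it, assuming the correction is applied on top of the best raw result (23/40 at epoch 29); the card does not say this explicitly, but 28 - 5 = 23 is consistent with it.

```python
TOTAL_SAMPLES = 40
RAW_CORRECT = 23  # best raw result (epoch 29)

# sample id -> (original gold, corrected gold, model prediction), from the table above
relabelled = {
    4: ("Unknown", "A", "A"),
    20: ("No", "Yes", "Yes"),
    25: ("No", "Yes", "Yes"),
    28: ("No", "Yes", "Yes"),
    29: ("No", "Yes", "Yes"),
}

# Count samples scored wrong against the original gold but right after correction.
gained = sum(
    1
    for old_gold, new_gold, pred in relabelled.values()
    if pred != old_gold and pred == new_gold
)

adjusted_correct = RAW_CORRECT + gained
raw_accuracy = 100 * RAW_CORRECT / TOTAL_SAMPLES
adjusted_accuracy = 100 * adjusted_correct / TOTAL_SAMPLES
print(f"raw: {raw_accuracy:.1f}%  adjusted: {adjusted_accuracy:.1f}% ({adjusted_correct}/{TOTAL_SAMPLES})")
```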

Training Details

Hyperparameters

| Parameter | Value |
|-----------|-------|
| Base model | `Qwen/Qwen2.5-3B-Instruct` |
| Method | LoRA (PEFT) |
| LoRA r | 8 |
| LoRA alpha | 16 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj |
| Trainable params | 3,686,400 (0.12%) |
| Epochs | 30 (early stop patience=7) |
| Batch size | 1 (gradient accumulation=8, effective=8) |
| Learning rate | 2e-5 |
| Warmup ratio | 0.05 |
| Weight decay | 0.01 |
| Precision | INT8 (bitsandbytes) |
| Max seq length | 3500 |
| Seed | 42 |
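
The trainable-parameter count can be cross-checked from the LoRA rank and the shapes of the targeted projections. The architecture constants below (hidden size 2048, 36 layers, 16 attention heads, 2 key/value heads) are assumptions about the base model's configuration and are not stated in this card; with them, the arithmetic reproduces the reported 3,686,400.

```python
# Assumed base-model dimensions (not stated in this card).
HIDDEN_SIZE = 2048
NUM_LAYERS = 36
NUM_ATTENTION_HEADS = 16
NUM_KV_HEADS = 2

# From the hyperparameter table.
LORA_R = 8
LORA_ALPHA = 16

head_dim = HIDDEN_SIZE // NUM_ATTENTION_HEADS  # 128
kv_dim = NUM_KV_HEADS * head_dim               # 256 (grouped-query attention)

# (in_features, out_features) of each targeted projection.
projections = {
    "q_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "k_proj": (HIDDEN_SIZE, kv_dim),
    "v_proj": (HIDDEN_SIZE, kv_dim),
    "o_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds A (r x in_features) and B (out_features x r) per adapted layer."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, LORA_R) for i, o in projections.values())
trainable = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / LORA_R  # standard LoRA scaling factor alpha / r

print(per_layer, trainable, scaling)  # 102400 3686400 2.0
```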

Training Loss Curve

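| Epoch | Train Loss | Eval Loss | Token Accuracy |
|-------|------------|-----------|----------------|
| 1 | 1.487 | 1.468 | 69.8% |
| 3 | 0.854 | 0.420 | 89.9% |
| 5 | 0.370 | 0.380 | 90.4% |
| 10 | 0.345 | 0.347 | 91.1% |
| 15 | 0.333 | 0.328 | 91.5% |
| 20 | 0.328 | 0.319 | 91.7% |
| 24 | 0.321 | 0.316 | 91.8% |
| 25 | 0.315 | 0.316 | 91.9% |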

Best eval_loss: 0.3155 at step 1944 (epoch 24)
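
The step number and the epoch number agree with the reported dataset and batch sizes. The sketch below assumes that a final, partially filled accumulation window still counts as one optimizer step and that evaluation runs once per epoch; neither is stated in the card, but both are consistent with step 1944 landing exactly on epoch 24.

```python
import math

train_samples = 647      # QA samples in the training split
per_device_batch = 1
grad_accum = 8
best_step = 1944
patience = 7
max_epochs = 30

effective_batch = per_device_batch * grad_accum
# Assumption: the last partial accumulation window is counted as a step.
steps_per_epoch = math.ceil(train_samples / effective_batch)
best_epoch = best_step / steps_per_epoch

# Assumption: one evaluation per epoch, so early stopping needs
# `patience` further epochs without improvement after the best one.
earliest_stop_epoch = int(best_epoch) + patience

print(steps_per_epoch, best_epoch, earliest_stop_epoch)  # 81 24.0 31
```

Under these assumptions early stopping could not fire before epoch 31, which would explain why the run went the full 30 epochs.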

Dataset

Task: Logic-Based Educational Queries (MCQ + Yes/No)
Train: 647 QA samples (328 records)
Dev: 79 QA samples (41 records)
Test: 81 QA samples (41 records)
Input: NL premises + FOL premises + Question
Output: JSON {"answer": "<label>", "explanation": "<reasoning>"}
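
A small helper can assemble the input in the layout used by the example in the Usage section. This is a sketch: `build_user_prompt` is a hypothetical name, and the exact template used during training is only partly visible in this card, so the section labels and numbering are inferred from that example.

```python
from typing import List, Optional


def build_user_prompt(
    nl_premises: List[str],
    fol_premises: List[str],
    question: str,
    options: Optional[List[str]] = None,
) -> str:
    """Format premises and a question as numbered NL/FOL sections."""

    def numbered(items: List[str]) -> str:
        return "\n".join(f"{i}. {item}" for i, item in enumerate(items, start=1))

    prompt = "\n\n".join(
        [
            "Premises (NL):\n" + numbered(nl_premises),
            "Premises (FOL):\n" + numbered(fol_premises),
            "Question:\n" + question,
        ]
    )
    if options:
        # Multiple-choice options are lettered A, B, C, ...
        letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        prompt += "\n" + "\n".join(
            f"{letters[i]}. {option}" for i, option in enumerate(options)
        )
    return prompt


prompt = build_user_prompt(
    ["If a student attends lectures, they understand the material."],
    ["∀x (AttendsLectures(x) → UnderstandsMaterial(x))"],
    "Which conclusion is best supported?",
    ["First option", "Second option"],
)
print(prompt)
```

For Yes/No questions, `options` can simply be omitted.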

Training Infrastructure

Hardware: NVIDIA L4 (24GB VRAM)
Platform: Modal Cloud
Training time: ~6 hours (30 epochs)

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B-Instruct", device_map="auto")
model = PeftModel.from_pretrained(base_model, "Laplaces-Red-Devils/v01-QA-CoT")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")

messages = [
    {"role": "system", "content": "You are a logic-based educational QA system..."},
    {"role": "user", "content": "Premises (NL):\n1. If a student attends lectures...\n\nPremises (FOL):\n1. ∀x (AttendsLectures(x) → UnderstandsMaterial(x))\n\nQuestion:\nWhich conclusion is best supported?\nA. ...\nB. ..."},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
# {"answer": "B", "explanation": "Premise 1 states..."}
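
Because the model emits free text that should contain a JSON object, downstream code usually needs to extract and validate it. The helper below is one possible approach, not part of this repository: `parse_model_output` is a hypothetical name, and it scans for the first JSON object that has both an `answer` and an `explanation` key.

```python
import json


def parse_model_output(text: str) -> dict:
    """Return the first JSON object in `text` with 'answer' and 'explanation' keys."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores the rest.
            candidate, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            candidate = None
        if isinstance(candidate, dict) and {"answer", "explanation"} <= candidate.keys():
            return candidate
        start = text.find("{", start + 1)
    raise ValueError("no JSON object with 'answer' and 'explanation' found")


sample = 'Reasoning done. {"answer": "B", "explanation": "Premise 1 states..."}'
parsed = parse_model_output(sample)
print(parsed["answer"])
```

If generation hits the `max_new_tokens` limit before the closing brace, the object is incomplete and the helper raises `ValueError`; raising the limit is the likely remedy in that case.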