Libraries

How to use cs-552-2026-databand/general_knowledge_model with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="cs-552-2026-databand/general_knowledge_model")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the generated conversation returned by the pipeline
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("cs-552-2026-databand/general_knowledge_model")
model = AutoModelForCausalLM.from_pretrained("cs-552-2026-databand/general_knowledge_model")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate a reply and decode only the newly generated tokens
outputs = model.generate(**inputs, max_new_tokens=40)
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))

vLLM

How to use cs-552-2026-databand/general_knowledge_model with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "cs-552-2026-databand/general_knowledge_model"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cs-552-2026-databand/general_knowledge_model",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
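
The same request can also be issued from Python. The following is a minimal sketch that uses only the standard library and assumes the server started above is listening on localhost:8000. The helper names `build_chat_request` and `ask` are illustrative and are not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "cs-552-2026-databand/general_knowledge_model"


def build_chat_request(question, base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": question}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, base_url="http://localhost:8000"):
    """Send the request and return the reply text (requires a running server)."""
    with urllib.request.urlopen(build_chat_request(question, base_url)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Passing `base_url="http://localhost:30000"` would target the SGLang server described further down, since it exposes the same route.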

Use Docker

docker model run hf.co/cs-552-2026-databand/general_knowledge_model

SGLang

How to use cs-552-2026-databand/general_knowledge_model with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "cs-552-2026-databand/general_knowledge_model" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cs-552-2026-databand/general_knowledge_model",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "cs-552-2026-databand/general_knowledge_model" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cs-552-2026-databand/general_knowledge_model",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use cs-552-2026-databand/general_knowledge_model with Docker Model Runner:
```
docker model run hf.co/cs-552-2026-databand/general_knowledge_model
```

General Knowledge Model

This is the final General Knowledge individual model for the CS-552 Modern NLP Spring 2026 standardized project.

The submitted model is the SFT-only merged model. A later DPO experiment, run on ARC/CommonsenseQA mistakes, reduced accuracy on the SFT validation set and on all three external benchmarks, so it was not selected as the final model.

Model behavior

The model is specialized for multiple-choice general knowledge questions. It is prompted to output exactly one final boxed answer, for example:

\boxed{A}

The chat template enforces concise answer-only behavior and supports choices labeled from A through T.
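
Downstream code has to recover the letter from the boxed output. The following is a minimal sketch of one way to do that. The pattern and the function name `extract_boxed_letter` are assumptions for illustration; the project's evaluation script may use a different extraction rule.

```python
import re

# Matches \boxed{X} where X is a single label from A to T (the supported range).
BOXED_PATTERN = re.compile(r"\\boxed\{([A-T])\}")


def extract_boxed_letter(text):
    """Return the last boxed letter found in the text, or None if there is none."""
    matches = BOXED_PATTERN.findall(text)
    return matches[-1] if matches else None


print(extract_boxed_letter(r"\boxed{A}"))      # A
print(extract_boxed_letter("no boxed answer"))  # None
```

Taking the last match is a design choice: if a response ever contains more than one boxed letter, the final one is treated as the answer.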

Training setup

Starting point:

Baseline working model folder with the project chat template and generation config
LoRA SFT on top of the baseline model
Final model produced by merging the LoRA adapter into the baseline model

Training method:

LoRA supervised fine-tuning
Loss masked so that only the final assistant boxed answer contributes to training
Prompt, system message, question text, choices, chat markers, and template tokens are masked with -100
Assistant target format: \boxed{LETTER}
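
The masking described above can be sketched as follows. This is an illustration under stated assumptions, not the code from the training script: the function name `build_masked_labels` and the token ids are hypothetical. The value -100 is the default `ignore_index` of PyTorch's cross-entropy loss, so positions labeled -100 contribute nothing to the loss.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def build_masked_labels(prompt_ids, answer_ids):
    """Concatenate prompt and answer tokens and supervise only the answer tokens."""
    input_ids = list(prompt_ids) + list(answer_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(answer_ids)
    return {"input_ids": input_ids, "labels": labels}


# Toy token ids, purely illustrative (not real tokenizer output).
example = build_masked_labels(prompt_ids=[11, 12, 13, 14], answer_ids=[901, 902])
print(example["labels"])  # [-100, -100, -100, -100, 901, 902]
```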

LoRA configuration:

r = 16
lora_alpha = 32
lora_dropout = 0.05
Target modules:
- q_proj
- k_proj
- v_proj
- o_proj
- gate_proj
- up_proj
- down_proj
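
For reference, the settings above can be written as keyword arguments for PEFT's `LoraConfig`. This is a sketch rather than the project's actual configuration: the `task_type` value is an assumption, since the card does not state it.

```python
# Keyword arguments mirroring the listed settings.
lora_kwargs = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",  # assumption: not stated in the card
}

# Standard LoRA scales the low-rank update by lora_alpha / r.
scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]
print(scaling)  # 2.0

# With PEFT installed, the config object would be built as:
#   from peft import LoraConfig
#   config = LoraConfig(**lora_kwargs)
```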

Main training hyperparameters:

Learning rate: 8e-5
Epochs: 1
Batch size per device: 1
Gradient accumulation steps: 8
Max sequence length: 8192
Precision: bf16
Scheduler: cosine
Warmup steps: 20
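
To make these numbers concrete, the sketch below derives the effective batch size and the number of optimizer steps in one epoch, and evaluates a cosine schedule with linear warmup. It assumes a single device, the train split size reported in this card (26,120 examples), no sequence packing, and a decay to zero at the end of training. None of these assumptions is confirmed by the card.

```python
import math

PEAK_LR = 8e-5
WARMUP_STEPS = 20
PER_DEVICE_BATCH = 1
GRAD_ACCUM_STEPS = 8
TRAIN_EXAMPLES = 26_120  # train split size reported in this card
NUM_DEVICES = 1          # assumption: the card does not state the device count

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES
total_steps = math.ceil(TRAIN_EXAMPLES / effective_batch)  # one epoch


def lr_at(step):
    """Linear warmup to PEAK_LR, then cosine decay to zero at total_steps."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch, total_steps)  # 8 3265
```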

SFT datasets

The SFT training data was built from:

Kaggle LLM Science
EduQG
EduAdapt, MCQ-only questions
NCERT_MCQs
SciQ train
OpenBookQA train

Final SFT data sizes:

Train: 26,120
Validation: 2,000

The answer labels were balanced uniformly across A through T separately for train and validation.

Train answer distribution:

A through T: 1,306 examples each

Validation answer distribution:

A through T: 100 examples each
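
A quick way to verify such a distribution is to count the labels. The sketch below only checks balance; it does not reproduce how the data builder assigned labels, which the card does not describe. The function name `is_uniform` is illustrative.

```python
from collections import Counter
import string

LABELS = list(string.ascii_uppercase[:20])  # "A" through "T"


def is_uniform(answer_labels):
    """True if every label from A to T occurs the same number of times."""
    counts = Counter(answer_labels)
    return set(counts) == set(LABELS) and len(set(counts.values())) == 1


# The reported split sizes divide evenly across the 20 labels.
print(26_120 // len(LABELS), 2_000 // len(LABELS))  # 1306 100
```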

Evaluation

The final selected model is the SFT-only merged model.

The “SFT validation” set in the table is the held-out validation set built from the same six dataset families used for LoRA SFT training: Kaggle LLM Science, EduQG, EduAdapt MCQ, NCERT_MCQs, SciQ, and OpenBookQA. It contains 2,000 examples and is answer-balanced across A through T. Each “boxed” column in the table reports the share of responses from which a boxed answer could be extracted.

External benchmark sets:

MMLU Pro: 2,000 examples, uniformly sampled across categories
MMLU Redux: 2,000 examples, uniformly sampled across subjects
SuperGPQA: 2,000 examples, uniformly sampled across disciplines

Evaluation set	Baseline boxed	Baseline accuracy	SFT-only boxed	SFT-only accuracy	SFT + DPO boxed	SFT + DPO accuracy
SFT validation 2k	19.20%	16.00%	100.00%	85.30%	100.00%	79.75%
MMLU Pro 2k	60.25%	18.05%	100.00%	37.85%	100.00%	35.25%
MMLU Redux 2k	26.65%	11.40%	100.00%	56.25%	100.00%	50.90%
SuperGPQA 2k	66.95%	15.85%	99.95%	27.55%	100.00%	23.45%

The DPO experiment lowered accuracy on the SFT validation set and on all three external benchmarks. The SFT-only merged model was therefore selected as the final model.
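
The size of the drop can be read off the evaluation table. The short sketch below recomputes the accuracy change from SFT-only to SFT + DPO, using only the figures reported in the table.

```python
# Accuracy (%) copied from the evaluation table: (SFT-only, SFT + DPO).
accuracy = {
    "SFT validation 2k": (85.30, 79.75),
    "MMLU Pro 2k": (37.85, 35.25),
    "MMLU Redux 2k": (56.25, 50.90),
    "SuperGPQA 2k": (27.55, 23.45),
}

# Change in percentage points when DPO is applied on top of SFT.
deltas = {name: round(dpo - sft, 2) for name, (sft, dpo) in accuracy.items()}

for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f} points")
```

Every difference is negative, ranging from -2.60 points on MMLU Pro to -5.55 points on the SFT validation set.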

SFT validation details

SFT-only evaluation on the held-out SFT validation set:

Total: 2,000
Extracted boxed answer: 2,000 / 2,000 = 100.00%
Accuracy: 1,706 / 2,000 = 85.30%

Accuracy by validation source:

Source	Accuracy	Boxed extraction
eduadapt	82.35% (14/17)	100.00% (17/17)
eduqg	76.86% (93/121)	100.00% (121/121)
kaggle_llm_science	58.30% (130/223)	100.00% (223/223)
ncert_mcqs	93.33% (14/15)	100.00% (15/15)
openbookqa	80.83% (430/532)	100.00% (532/532)
sciq	93.86% (1025/1092)	100.00% (1092/1092)
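
As a consistency check, the per-source counts above add up to the overall validation result. The sketch below uses only the numbers from the table.

```python
# (correct, total) per validation source, copied from the table above.
per_source = {
    "eduadapt": (14, 17),
    "eduqg": (93, 121),
    "kaggle_llm_science": (130, 223),
    "ncert_mcqs": (14, 15),
    "openbookqa": (430, 532),
    "sciq": (1025, 1092),
}

total_correct = sum(correct for correct, _ in per_source.values())
total = sum(count for _, count in per_source.values())

print(total_correct, total, f"{100 * total_correct / total:.2f}%")  # 1706 2000 85.30%
```

Because sciq and openbookqa together account for 1,624 of the 2,000 examples, the overall score is dominated by those two sources.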

SuperGPQA boxed-answer edge case

The SFT-only model produced boxed answers for 1,999 out of 2,000 SuperGPQA examples. The single unboxed example was a long, LaTeX-heavy numerical analysis question whose answer choices contained multi-line mathematical derivations. Instead of producing a boxed option, the model continued copying part of one answer choice, generating text that begins with:

mathrm{d} x^{2} = 2.1730$ and $| R_{1} | ...

Increasing max_new_tokens from 20 to 64 did not change this outcome. The reported SuperGPQA result therefore keeps the strict extraction score of 99.95%.

Expected input format

The model expects a multiple-choice question formatted like:

Question text here?

Choices: A. first option B. second option C. third option D. fourth option

It should answer with only:

\boxed{A}
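
A prompt in this layout can be assembled with a small helper. The sketch below follows the layout exactly as shown above (question, blank line, then all choices on one "Choices:" line). The function name `format_mcq` is illustrative, and the exact whitespace used during training is an assumption.

```python
import string


def format_mcq(question, options):
    """Format a question and its options in the layout shown in this card."""
    if not 1 <= len(options) <= 20:
        raise ValueError("labels run from A through T, so at most 20 options")
    labeled = [
        f"{letter}. {option}"
        for letter, option in zip(string.ascii_uppercase, options)
    ]
    return f"{question}\n\nChoices: " + " ".join(labeled)


prompt = format_mcq(
    "Question text here?",
    ["first option", "second option", "third option", "fourth option"],
)
# The prompt is sent as a single user message, as in the usage examples.
messages = [{"role": "user", "content": prompt}]
print(prompt)
```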

Reproducibility notes

Important files from the training folder:

SFT trainer: scripts/train_v3_lora_sft_masked.py
SFT data builder: scripts/build_my_sft_data_balanced.py
DPO trainer used for the unselected experiment: scripts/train_v3_lora_dpo_boxed.py
Merge script: scripts/merge_v3_lora_adapter.py
Evaluation script: scripts/evaluate_mcq_accuracy.py

Final selected model folder before upload:

outputs/lora_sft_v3_boxed_only/merged_full_model

SFT LoRA adapter:

outputs/lora_sft_v3_boxed_only/final_adapter

DPO adapter, experimental and not selected:

outputs/lora_dpo_arc_csqa_on_sft/final_adapter

Safetensors

Model size: 2B params
Tensor type: BF16

Model tree for cs-552-2026-databand/general_knowledge_model

Base model: Qwen/Qwen3-1.7B-Base
Finetuned: Qwen/Qwen3-1.7B
Adapter: this model