cs-552-2026-databand/general_knowledge_model
joelleachkar committed about 7 hours ago

Commit f1a22ca · verified · 1 Parent(s): e85fa38

Upload final SFT boxed-only general knowledge model

Files changed (2)

README.md +155 -24
model.safetensors +1 -1

README.md CHANGED

@@ -1,40 +1,171 @@
 ---
 license: apache-2.0
+base_model: Qwen/Qwen3-1.7B
+library_name: transformers
+pipeline_tag: text-generation
+tags:
+- qwen3
+- multiple-choice
+- general-knowledge
+- lora
+- sft
+- boxed-answer
 ---
-## v3 GRPO general knowledge model
-Updated: 2026-06-04 11:22 UTC
-This repository stores the final v3 GRPO general knowledge model for the CS-552 2026 Databand project.
-Model source on the training cluster:
-/scratch/general_knowledge_sft_v3_lora_grpo/outputs/grpo_v3_maxredux_4000/final
-The model was trained from the v3 LoRA SFT model using GRPO on the MMLU-Pro / MMLU-Redux general-knowledge data split.
-The final model files were verified locally before upload, including:
-- config.json
-- generation_config.json
-- model.safetensors
-- tokenizer.json
-- tokenizer_config.json
-- chat_template.jinja
-Important generation/config fields:
-- bos_token_id = 151643
-- eos_token_id = 151645
-- pad_token_id = 151643
-- use_cache = True
-- generation eos_token_id = [151645, 151643]
-- temperature = 0.1
-- top_k = 20
-- top_p = 0.8
-Expected output format:
-\boxed{LETTER}
+# General Knowledge Model
+This is the final General Knowledge individual model for the CS-552 Modern NLP Spring 2026 standardized project.
+The submitted model is the SFT-only merged model. A later DPO experiment was run on ARC/CommonsenseQA mistakes, but it reduced benchmark accuracy, so it was not selected as the final model.
+## Model behavior
+The model is specialized for multiple-choice general knowledge questions. It is prompted to output exactly one final boxed answer, for example:
+\boxed{A}
+The chat template enforces concise answer-only behavior and supports choices labeled from A through T.
+## Training setup
+Starting point:
+- Baseline working model folder with the project chat template and generation config
+- LoRA SFT on top of the baseline model
+- Final model produced by merging the LoRA adapter into the baseline model
+Training method:
+- LoRA supervised fine-tuning
+- Loss masked so that only the final assistant boxed answer contributes to training
+- Prompt, system message, question text, choices, chat markers, and template tokens are masked with -100
+- Assistant target format: \boxed{LETTER}
+LoRA configuration:
+- r = 16
+- lora_alpha = 32
+- lora_dropout = 0.05
+- Target modules:
+  - q_proj
+  - k_proj
+  - v_proj
+  - o_proj
+  - gate_proj
+  - up_proj
+  - down_proj
+Main training hyperparameters:
+- Learning rate: 8e-5
+- Epochs: 1
+- Batch size per device: 1
+- Gradient accumulation steps: 8
+- Max sequence length: 8192
+- Precision: bf16
+- Scheduler: cosine
+- Warmup steps: 20
+## SFT datasets
+The SFT training data was built from:
+1. Kaggle LLM Science
+2. EduQG
+3. EduAdapt, MCQ-only questions
+4. NCERT_MCQs
+5. SciQ train
+6. OpenBookQA train
+The final SFT dataset was capped below 30,000 training rows.
+Final SFT data sizes:
+- Train: 26,120
+- Validation: 2,000
+The answer labels were balanced uniformly across A through T separately for train and validation.
+Train answer distribution:
+- A through T: 1,306 examples each
+Validation answer distribution:
+- A through T: 100 examples each
+## Evaluation
+The final selected model is the SFT-only merged model.
+Evaluation sets:
+- Validation set: 2,000
+- MMLU Pro: 2,000, uniformly sampled across categories
+- MMLU Redux: 2,000, uniformly sampled across subjects
+- SuperGPQA: 2,000, uniformly sampled across disciplines
+SFT-only results:
+| Benchmark | Boxed extraction | Accuracy |
+|---|---:|---:|
+| MMLU Pro 2k | 100.00% | 37.85% |
+| MMLU Redux 2k | 100.00% | 56.25% |
+| SuperGPQA 2k | 99.95% | 27.55% |
+Baseline comparison:
+| Benchmark | Baseline accuracy | SFT-only accuracy |
+|---|---:|---:|
+| Validation 2k | 16.00% | not logged in the final grep output |
+| MMLU Pro 2k | 18.05% | 37.85% |
+| MMLU Redux 2k | 11.40% | 56.25% |
+| SuperGPQA 2k | 15.85% | 27.55% |
+DPO experiment, not selected:
+| Benchmark | SFT + DPO accuracy |
+|---|---:|
+| Validation 2k | 79.75% |
+| MMLU Pro 2k | 35.25% |
+| MMLU Redux 2k | 50.90% |
+| SuperGPQA 2k | 23.45% |
+DPO improved the internal validation metric but reduced the external benchmark scores, so the SFT-only model was selected.
+## Expected input format
+The model expects a multiple-choice question formatted like:
+Question text here?
+Choices:
+A. first option
+B. second option
+C. third option
+D. fourth option
+It should answer with only:
+\boxed{A}
+## Reproducibility notes
+Important files from the training folder:
+- SFT trainer: scripts/train_v3_lora_sft_masked.py
+- SFT data builder: scripts/build_my_sft_data_balanced.py
+- Merge script: scripts/merge_v3_lora_adapter.py
+- Evaluation script: scripts/evaluate_mcq_accuracy.py
+Final selected model folder before upload:
+outputs/lora_sft_v3_boxed_only/merged_full_model
+SFT LoRA adapter:
+outputs/lora_sft_v3_boxed_only/final_adapter
+DPO adapter, experimental and not selected:
+outputs/lora_dpo_arc_csqa_on_sft/final_adapter
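
The answer-only loss masking listed under "Training method" in the new README can be illustrated with a small sketch. It is not taken from scripts/train_v3_lora_sft_masked.py, which this commit does not include; the helper name `build_masked_labels` and the toy token IDs are hypothetical. The one detail taken from the README is that every non-answer position is labeled -100, which is also the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # label value skipped by the loss (PyTorch cross-entropy default)


def build_masked_labels(prompt_ids, answer_ids):
    """Concatenate prompt and answer tokens; supervise only the answer tokens.

    Hypothetical helper: the real trainer may tokenize and mask differently.
    """
    input_ids = list(prompt_ids) + list(answer_ids)
    # Prompt positions (system message, question, choices, template tokens)
    # are masked out; only the answer positions keep their token IDs.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(answer_ids)
    return {"input_ids": input_ids, "labels": labels}


# Toy IDs standing in for a tokenized prompt and a tokenized boxed answer.
example = build_masked_labels(prompt_ids=[11, 12, 13, 14], answer_ids=[21, 22, 23])
supervised = sum(label != IGNORE_INDEX for label in example["labels"])
print(example["labels"])  # [-100, -100, -100, -100, 21, 22, 23]
print(supervised)         # 3
```

Under this scheme the number of supervised positions per example equals the length of the tokenized answer, however long the prompt is.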

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:813eaad9f372af34c6dbe827ef83b2f6fa4242f384b4b4513f0aa5030b9e5e20
+oid sha256:bbac80fd49d664b0096daf5c93346809d1275bde1375705d0bda731204b5ab90
 size 3441185608
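
The "Expected input format" and boxed-answer convention in the new README can be exercised with a pair of small helpers. This is a sketch, not the project's scripts/evaluate_mcq_accuracy.py: `format_question` and `extract_boxed_letter` are hypothetical names, and taking the last `\boxed{...}` match is an assumption about how extraction might work. The A through T range follows the README's statement that the chat template supports choices labeled A through T.

```python
import re
import string

LETTERS = string.ascii_uppercase[:20]  # "A" through "T", the range named in the README
BOXED_RE = re.compile(r"\\boxed\{([A-T])\}")


def format_question(question, choices):
    """Render a question in the layout shown under 'Expected input format'."""
    if not 1 <= len(choices) <= len(LETTERS):
        raise ValueError("expected between 1 and 20 choices")
    lines = [question, "Choices:"]
    lines += [f"{letter}. {text}" for letter, text in zip(LETTERS, choices)]
    return "\n".join(lines)


def extract_boxed_letter(output):
    """Return the letter of the last \\boxed{LETTER} in the output, or None."""
    matches = BOXED_RE.findall(output)
    return matches[-1] if matches else None


prompt = format_question(
    "Which planet is closest to the Sun?",
    ["Venus", "Mercury", "Earth", "Mars"],
)
print(prompt)
print(extract_boxed_letter(r"\boxed{B}"))        # B
print(extract_boxed_letter("no final answer"))   # None
```

With helpers like these, a "Boxed extraction" rate would be the fraction of outputs for which `extract_boxed_letter` returns a letter, and accuracy the fraction where that letter matches the gold label. Whether the reported numbers were computed this way is not stated in the commit.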