- Transformers
How to use cs-552-2026-databand/general_knowledge_model with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="cs-552-2026-databand/general_knowledge_model")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("cs-552-2026-databand/general_knowledge_model")
model = AutoModelForCausalLM.from_pretrained("cs-552-2026-databand/general_knowledge_model")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- vLLM
How to use cs-552-2026-databand/general_knowledge_model with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "cs-552-2026-databand/general_knowledge_model"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "cs-552-2026-databand/general_knowledge_model",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker
```shell
docker model run hf.co/cs-552-2026-databand/general_knowledge_model
```
- SGLang
How to use cs-552-2026-databand/general_knowledge_model with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "cs-552-2026-databand/general_knowledge_model" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "cs-552-2026-databand/general_knowledge_model",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "cs-552-2026-databand/general_knowledge_model" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "cs-552-2026-databand/general_knowledge_model",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
- Docker Model Runner
How to use cs-552-2026-databand/general_knowledge_model with Docker Model Runner:
```shell
docker model run hf.co/cs-552-2026-databand/general_knowledge_model
```
General Knowledge Model
This is the final General Knowledge individual model for the CS-552 Modern NLP Spring 2026 standardized project.
The submitted model is the SFT-only merged model. A later DPO experiment was run on ARC/CommonsenseQA mistakes, but it reduced external benchmark accuracy, so it was not selected as the final model.
Model behavior
The model is specialized for multiple-choice general knowledge questions. It is prompted to output exactly one final boxed answer, for example:
\boxed{A}
The chat template enforces concise answer-only behavior and supports choices labeled from A through T.
Training setup
Starting point:
- Baseline working model folder with the project chat template and generation config
- LoRA SFT on top of the baseline model
- Final model produced by merging the LoRA adapter into the baseline model
Training method:
- LoRA supervised fine-tuning
- Loss masked so that only the final assistant boxed answer contributes to training
- Prompt, system message, question text, choices, chat markers, and template tokens are masked with -100
- Assistant target format: \boxed{LETTER}
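The masking described above can be sketched as follows. This is a minimal illustration, not the code in scripts/train_v3_lora_sft_masked.py: the token IDs are made up, `build_masked_labels` is a hypothetical helper, and the sketch assumes the tokenized prompt is an exact prefix of the full training sequence. The value -100 is the index that PyTorch's cross-entropy loss ignores by default.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def build_masked_labels(prompt_ids, answer_ids):
    """Return (input_ids, labels) where only the answer tokens carry loss."""
    input_ids = list(prompt_ids) + list(answer_ids)
    # Every prompt position is masked; answer positions keep their token IDs.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(answer_ids)
    return input_ids, labels


# Made-up token IDs, for illustration only.
prompt_ids = [101, 7, 8, 9, 102]  # system message, question, choices, chat markers
answer_ids = [55, 56, 57]         # tokens of the boxed answer

input_ids, labels = build_masked_labels(prompt_ids, answer_ids)
print(labels)  # [-100, -100, -100, -100, -100, 55, 56, 57]
```

Because the masked positions contribute nothing to the loss, the gradient comes only from the few tokens of the boxed answer.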
LoRA configuration:
- r = 16
- lora_alpha = 32
- lora_dropout = 0.05
- Target modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
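As a rough sketch, the configuration above could be written as keyword arguments for `peft.LoraConfig`. Whether the training script uses PEFT is an assumption, as is the `task_type` value; the other values are copied from the list above.

```python
# Values copied from the list above; key names follow peft.LoraConfig.
lora_kwargs = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",  # assumption: not stated in this card
}

# Standard LoRA scales the low-rank update by lora_alpha / r.
scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]
print(scaling)  # 2.0

# With peft installed, the adapter config could then be built as:
# from peft import LoraConfig
# config = LoraConfig(**lora_kwargs)
```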
Main training hyperparameters:
- Learning rate: 8e-5
- Epochs: 1
- Batch size per device: 1
- Gradient accumulation steps: 8
- Max sequence length: 8192
- Precision: bf16
- Scheduler: cosine
- Warmup steps: 20
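To make these settings concrete, the sketch below computes the effective batch size and a learning-rate curve. It assumes a single device (the device count is not stated), linear warmup followed by cosine decay to zero, and the train split size of 26,120 examples reported in this card. The step count of the actual run may differ.

```python
import math

PEAK_LR = 8e-5
WARMUP_STEPS = 20
PER_DEVICE_BATCH = 1
GRAD_ACCUM = 8
NUM_DEVICES = 1          # assumption: device count is not stated
TRAIN_EXAMPLES = 26_120  # train split size reported in this card
EPOCHS = 1

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES
total_steps = math.ceil(TRAIN_EXAMPLES / effective_batch) * EPOCHS


def lr_at(step):
    """Linear warmup to PEAK_LR, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch, total_steps)  # 8 3265
for step in (0, 10, 20, total_steps):
    print(step, lr_at(step))
```

Under these assumptions, warmup covers fewer than 1% of the optimizer steps, so almost the entire epoch runs on the cosine decay.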
SFT datasets
The SFT training data was built from:
- Kaggle LLM Science
- EduQG
- EduAdapt, MCQ-only questions
- NCERT_MCQs
- SciQ train
- OpenBookQA train
Final SFT data sizes:
- Train: 26,120
- Validation: 2,000
The answer labels were balanced uniformly across A through T separately for train and validation.
Train answer distribution:
- A through T: 1,306 examples each
Validation answer distribution:
- A through T: 100 examples each
Evaluation
The final selected model is the SFT-only merged model.
The “SFT validation” set in the table is the held-out validation set created from the same six dataset families used for LoRA SFT training: Kaggle LLM Science, EduQG, EduAdapt MCQ, NCERT_MCQs, SciQ, and OpenBookQA. It contains 2,000 examples and is answer-balanced across A through T.
External benchmark sets:
- MMLU Pro: 2,000 examples, uniformly sampled across categories
- MMLU Redux: 2,000 examples, uniformly sampled across subjects
- SuperGPQA: 2,000 examples, uniformly sampled across disciplines
| Evaluation set | Baseline boxed extraction | Baseline accuracy | SFT-only boxed extraction | SFT-only accuracy | SFT + DPO boxed extraction | SFT + DPO accuracy |
|---|---|---|---|---|---|---|
| SFT validation 2k | 19.20% | 16.00% | 100.00% | 85.30% | 100.00% | 79.75% |
| MMLU Pro 2k | 60.25% | 18.05% | 100.00% | 37.85% | 100.00% | 35.25% |
| MMLU Redux 2k | 26.65% | 11.40% | 100.00% | 56.25% | 100.00% | 50.90% |
| SuperGPQA 2k | 66.95% | 15.85% | 99.95% | 27.55% | 100.00% | 23.45% |
The DPO experiment lowered accuracy on the SFT validation set (85.30% to 79.75%) and on all three external benchmarks. Therefore, the SFT-only merged model was selected as the final model.
SFT validation details
SFT-only evaluation on the held-out SFT validation set:
- Total: 2,000
- Extracted boxed answer: 2,000 / 2,000 = 100.00%
- Accuracy: 1,706 / 2,000 = 85.30%
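The two metrics above can be computed along these lines. This is a hedged sketch and not the contents of scripts/evaluate_mcq_accuracy.py: `extract_boxed` and `score` are hypothetical helpers, and taking the last boxed label in the output is an assumption. Outputs without a boxed label are counted as wrong.

```python
import re

# Matches \boxed{X} where X is a single label from A to T.
BOXED_RE = re.compile(r"\\boxed\{([A-T])\}")


def extract_boxed(text):
    """Return the last boxed label in text, or None if there is none."""
    matches = BOXED_RE.findall(text)
    return matches[-1] if matches else None


def score(outputs, gold):
    """Return (boxed_rate, accuracy); unboxed outputs count as wrong."""
    preds = [extract_boxed(o) for o in outputs]
    boxed = sum(p is not None for p in preds)
    correct = sum(p == g for p, g in zip(preds, gold))
    n = len(gold)
    return boxed / n, correct / n


# Toy outputs, for illustration only.
outputs = [r"\boxed{A}", r"\boxed{C}", "The answer is B", r"\boxed{T}"]
gold = ["A", "B", "B", "T"]
print(score(outputs, gold))  # (0.75, 0.5)
```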
Accuracy by validation source:
| Source | Accuracy | Boxed extraction |
|---|---|---|
| eduadapt | 82.35% (14/17) | 100.00% (17/17) |
| eduqg | 76.86% (93/121) | 100.00% (121/121) |
| kaggle_llm_science | 58.30% (130/223) | 100.00% (223/223) |
| ncert_mcqs | 93.33% (14/15) | 100.00% (15/15) |
| openbookqa | 80.83% (430/532) | 100.00% (532/532) |
| sciq | 93.86% (1025/1092) | 100.00% (1092/1092) |
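As a quick consistency check, the per-source counts in the table can be aggregated back into the overall score. The sketch also computes an unweighted (macro) average, which is not a metric reported in this card and is shown only to illustrate how much the larger sources dominate the overall number.

```python
# (correct, total) per validation source, copied from the table above.
per_source = {
    "eduadapt": (14, 17),
    "eduqg": (93, 121),
    "kaggle_llm_science": (130, 223),
    "ncert_mcqs": (14, 15),
    "openbookqa": (430, 532),
    "sciq": (1025, 1092),
}

correct = sum(c for c, _ in per_source.values())
total = sum(t for _, t in per_source.values())

# Micro average: every example weighs the same (the reported accuracy).
micro = correct / total
# Macro average: every source weighs the same, regardless of its size.
macro = sum(c / t for c, t in per_source.values()) / len(per_source)

print(correct, total, f"{micro:.2%}")  # 1706 2000 85.30%
print(f"{macro:.2%}")
```

The macro average comes out lower than the reported accuracy because sciq, the largest and highest-scoring source, accounts for more than half of the validation examples.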
SuperGPQA boxed-answer edge case
The SFT-only model produced boxed answers for 1,999 out of 2,000 SuperGPQA examples. The single unboxed example was a long, LaTeX-heavy numerical analysis question whose answer choices contained multi-line mathematical derivations. Instead of producing a boxed option, the model copied and continued part of one answer choice, generating text beginning with:
mathrm{d} x^{2} = 2.1730$ and $| R_{1} | ...
Increasing max_new_tokens from 20 to 64 did not change this outcome. The reported SuperGPQA result therefore keeps the strict extraction score of 99.95%.
Expected input format
The model expects a multiple-choice question formatted like:
Question text here?
Choices: A. first option B. second option C. third option D. fourth option
It should answer with only:
\boxed{A}
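A prompt in this layout could be assembled with a small helper such as the one below. `format_mcq` is a hypothetical function, and the exact whitespace (a newline before "Choices:" and single spaces between options) is an assumption based on the layout shown above.

```python
import string

LABELS = string.ascii_uppercase[:20]  # "A" through "T"


def format_mcq(question, choices):
    """Format a question and its choices in the layout shown above."""
    if not 1 <= len(choices) <= len(LABELS):
        raise ValueError("expected between 1 and 20 choices")
    labelled = " ".join(
        f"{label}. {text}" for label, text in zip(LABELS, choices)
    )
    return f"{question}\nChoices: {labelled}"


prompt = format_mcq(
    "What is the capital of France?",
    ["Lyon", "Paris", "Nice", "Lille"],
)
messages = [{"role": "user", "content": prompt}]
print(prompt)
```

The resulting `messages` list has the same shape as the one passed to the Transformers pipeline in the usage snippet at the top of this card.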
Reproducibility notes
Important files from the training folder:
- SFT trainer: scripts/train_v3_lora_sft_masked.py
- SFT data builder: scripts/build_my_sft_data_balanced.py
- DPO trainer used for the unselected experiment: scripts/train_v3_lora_dpo_boxed.py
- Merge script: scripts/merge_v3_lora_adapter.py
- Evaluation script: scripts/evaluate_mcq_accuracy.py
Final selected model folder before upload:
outputs/lora_sft_v3_boxed_only/merged_full_model
SFT LoRA adapter:
outputs/lora_sft_v3_boxed_only/final_adapter
DPO adapter, experimental and not selected:
outputs/lora_dpo_arc_csqa_on_sft/final_adapter