---
license: apache-2.0
language:
- en
base_model: openai-community/gpt2
tags:
- claritas
- data-selection
- mathematical-reasoning
- training-dynamics
- gpt2
datasets:
- DKYoon/SlimPajama-6B
- openbmb/UltraData-Math
---

# Claritas-GPT2: Trajectory-to-Trait Optimized Reasoning Model

**[[Paper](https://arxiv.org/abs/your-arxiv-id)]**

> **Claritas** (Latin for "clarity") is a training dynamics framework that optimizes Large Language Models by analyzing gradient trajectories. This repository contains the **model weights** for the lightweight implementation described in the paper *"Claritas: Trajectory-to-Trait Framework for Complex Reasoning Training Optimization in Large Language Models"*.

## 📝 Model Overview

This model serves as a reproducible validation of the **Claritas Dynamic Data Selection Algorithm**. Unlike standard Supervised Fine-Tuning (SFT) that treats all data equally, this model was trained by dynamically selecting high-value, conflict-free samples.

**Key Implementation Details:**

1. **Gradient Spectral Fingerprint (GSF)**: High-dimensional gradients were compressed into 128-dimensional signatures to track sample influence.
2. **Counterfactual Trajectory Contrast (CTC)**: Samples were scored based on their contribution to mathematical reasoning capabilities.
3. **Dynamic Conflict Graph**: The training process identified and excluded samples with opposing gradient directions to maximize learning efficiency.
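
The three steps above can be illustrated with a small, dependency-free sketch. It is not the paper's implementation. A seeded random Gaussian projection stands in for the GSF construction, the CTC scores are passed in as hypothetical inputs, and the names `fingerprint`, `cosine`, and `select_samples` are invented for this example. The defaults of -0.5 and 0.2 mirror the conflict and CTC thresholds listed under Training Specifications; reading the conflict threshold as a cosine-similarity cutoff is an assumption.

```python
import math
import random


def fingerprint(grad, dim=128, seed=0):
    """Project a flat gradient onto `dim` random Gaussian directions.

    Assumption: a random projection stands in for the paper's GSF.
    Reusing the same seed reuses the same projection matrix, so
    fingerprints of different samples remain comparable.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim)
    return [scale * sum(rng.gauss(0.0, 1.0) * g for g in grad) for _ in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def select_samples(fingerprints, scores, conflict_threshold=-0.5, score_threshold=0.2):
    """Greedily keep high-scoring samples that do not conflict with kept ones.

    `scores` are hypothetical per-sample CTC scores. Two samples "conflict"
    when the cosine similarity of their fingerprints is below the threshold.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if scores[i] < score_threshold:
            break  # every remaining sample scores lower still
        if all(cosine(fingerprints[i], fingerprints[j]) >= conflict_threshold for j in kept):
            kept.append(i)
    return sorted(kept)


# Toy usage: three made-up gradients with made-up scores.
grads = [[1.0, 0.0, 2.0], [-1.0, 0.0, -2.0], [0.5, 3.0, 0.0]]
fps = [fingerprint(g) for g in grads]
print(select_samples(fps, scores=[0.9, 0.5, 0.3]))
```

In the toy usage, the second gradient is the exact negation of the first, so their fingerprints have cosine similarity -1 and the second sample is dropped once the higher-scoring first sample has been kept. The greedy pass is only one simple way to obtain a conflict-free subset; the actual graph procedure may differ.
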
## 📊 Performance Highlights

While this specific release uses a GPT-2 backbone for accessibility, the underlying methodology has been validated on LLaMA-2-7B/70B in our paper, with the following results:

| Method | Data Usage | GSM8K | MATH | BBH |
| :--- | :---: | :---: | :---: | :---: |
| Standard SFT | 100% | 52.4 | 12.1 | 38.5 |
| Random Selection | 60% | 48.1 | 10.3 | 35.2 |
| **Claritas (Ours)** | **60%** | **55.8** | **14.5** | **41.2** |

> **Key Result**: The Claritas framework cuts training tokens by **40%** while improving Pass@1 accuracy over full-data SFT on every benchmark in the table: +3.4 on GSM8K, +2.4 on MATH, and +2.7 on BBH, about **+2.8 points** on average.

## 🚀 How to Use

You can load this model for inference with the Hugging Face `transformers` library.

### Installation

```bash
pip install transformers torch
```

### Inference Example

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the Claritas-optimized model
model_name = "Muki182/claritas-gpt2"  # Replace with your repo ID
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)

# GPT-2 ships without a pad token; reuse EOS if none is set
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Prepare input
prompt = "Question: If I have 3 apples and eat 1, how many are left?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate greedily; **inputs forwards both input_ids and attention_mask
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## ⚙️ Training Specifications

This model was trained using the Claritas algorithm with the following configuration:

* **Base Model**: `gpt2`
* **Training Data**: GSM8K (Mathematical Reasoning)
* **Algorithm**: Claritas (GSF + CTC + Dynamic Conflict Graph)
* **Spectral Dimension**: 128
* **Conflict Threshold**: -0.5
* **CTC Threshold**: 0.2
* **Optimizer**: AdamW

## 📖 Citation

If you use this model or reference the Claritas framework in your research, please cite our paper:

```bibtex
@article{feng2026claritas,
  title={Claritas: Trajectory-to-Trait Framework for Complex Reasoning Training Optimization in Large Language Models},
  author={Feng, Junjie},
  journal={arXiv preprint arXiv:your-arxiv-id},
  year={2026}
}
```

## License

This model is licensed under the **Apache License 2.0**.