---
base_model: Microsoft/FastContext-1.0-4B-SFT
base_model_relation: quantized
library_name: mlx
pipeline_tag: text-generation
license: mit
tags:
- qwen
- qwen3
- coder
- reasoning
- agent
- mlx
- omlx
- quantized
- apple-silicon
- image-text-to-text
- image-to-text
- video-to-text
- any-to-any
- Explorer SubAgent
- Repository Exploration
- FastContext
language:
- en
---

# FastContext-1.0-4B-SFT-oQ6

An oQ6 quantized version of FastContext-1.0-4B-SFT optimized for Apple Silicon using oMLX.

This model preserves the repository exploration capabilities of FastContext while significantly reducing memory usage and improving inference efficiency through mixed-precision oQ quantization.

## About FastContext

FastContext is a lightweight repository-exploration subagent designed for coding agents.

Instead of having a single model perform both repository exploration and problem solving, FastContext specializes in repository discovery and evidence gathering using parallel tool calls.

The model explores repositories through:

* READ
* GLOB
* GREP

and returns concise file paths and line references for downstream coding agents.

Original model: FastContext-1.0-4B-SFT.

## Quantization

This release uses:

* Quantization: oQ6
* Format: MLX
* Target Platform: Apple Silicon
* Mixed Precision: Enabled
* Optimized for local inference

The oQ quantization pipeline allocates higher precision to more sensitive weights while aggressively compressing less important regions of the network, providing a strong quality-to-size ratio.

## Recommended Inference Settings

For best performance:

```yaml
temperature: 0.7
top_p: 0.6
top_k: 20
min_p: 0
repetition_penalty: 1.05
presence_penalty: 1.5
thinking: true
```

### oMLX Preset

```yaml
temp: 0.7
top_p: 0.6
top_k: 20
min_p: 0
rep_penalty: 1.05
presence_penalty: 1.5
enable_thinking: true
```

These settings were selected to improve repository exploration quality, encourage broader search behavior, and maintain stable citation generation.
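
To make the sampling settings above more concrete, here is a minimal pure-Python sketch of how `temperature`, `top_k`, `min_p`, and `top_p` narrow the set of candidate tokens. It is an illustration only, not the MLX or oMLX implementation: the function name `filter_candidates`, the toy logits, and the order in which the filters are applied are all assumptions, and real samplers may order or combine these steps differently.

```python
import math


def filter_candidates(logits, temperature=0.7, top_k=20, top_p=0.6, min_p=0.0):
    """Return the renormalized token probabilities that survive filtering.

    Hypothetical helper for illustration; the filter order is an assumption.
    """
    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(weights.values())
    probs = {tok: w / total for tok, w in weights.items()}

    # top_k: keep only the k most likely tokens.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]

    # min_p: drop tokens whose probability is below min_p times the top one.
    floor = min_p * ranked[0][1]
    ranked = [(tok, p) for tok, p in ranked if p >= floor]

    # top_p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}


# Toy next-token scores (made up for this example).
toy_logits = {"GREP": 2.0, "GLOB": 1.5, "READ": 1.0, "DONE": -1.0}
print(filter_candidates(toy_logits))
```

With these toy logits, the `top_p: 0.6` cut leaves only the two most likely tokens, which shows how a fairly low `top_p` keeps sampling focused even at `temperature: 0.7`.
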
## Example Usage

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

# Local path or Hub repo id of the quantized model.
model, tokenizer = load("FastContext-1.0-4B-SFT-oQ6")

prompt = "Find where authentication tokens are validated."

# Recent mlx_lm releases take sampling parameters through a sampler object
# rather than as direct keyword arguments to generate().
sampler = make_sampler(temp=0.7, top_p=0.6, top_k=20)

response = generate(
    model,
    tokenizer,
    prompt=prompt,
    sampler=sampler,
)

print(response)
```

## Intended Use

This model is intended for:

* Repository exploration
* Codebase navigation
* SWE-bench style workflows
* Coding agents
* Retrieval and evidence gathering
* Search-heavy software engineering tasks

It is not intended to replace a primary coding model.

FastContext works best as a specialized exploration subagent paired with a stronger reasoning or code-generation model.

## Performance

FastContext was trained specifically to improve repository exploration efficiency and reduce the token overhead of repository search.

The original paper reports improved end-to-end coding-agent performance and lower token consumption across multiple SWE benchmarks.

## Recommended Deployment

Apple Silicon:

* M1 Pro / Max
* M2 Pro / Max / Ultra
* M3 Series
* M4 Series

Works well with:

* MLX
* oMLX
* Open WebUI
* LM Studio (MLX builds)
* Custom agent frameworks

## Credits

* Microsoft FastContext Team
* Qwen Team
* Apple MLX
* oMLX

## Citation

Please cite the original FastContext paper when using this model in research:

```bibtex
@misc{zhang2026fastcontexttrainingefficientrepository,
  title={FastContext: Training Efficient Repository Explorer for Coding Agents},
  author={Shaoqiu Zhang and Maoquan Wang and Yuling Shi and Yuhang Wang and Xiaodong Gu and Yongqiang Yao and Rao Fu and Shengyu Fu},
  year={2026},
  eprint={2606.14066},
  archivePrefix={arXiv},
  primaryClass={cs.SE}
}
```