---

base_model: Microsoft/FastContext-1.0-4B-SFT
base_model_relation: quantized
library_name: mlx
pipeline_tag: text-generation
license: mit
tags:
- qwen
- qwen3
- coder
- reasoning
- agent
- mlx
- omlx
- quantized
- apple-silicon
- image-text-to-text
- image-to-text
- video-to-text
- any-to-any
- Explorer SubAgent
- Repository Exploration
- FastContext
language:
  - en
  
---

# FastContext-1.0-4B-SFT-oQ6

An oQ6 quantized version of FastContext-1.0-4B-SFT optimized for Apple Silicon using oMLX.

This model preserves the repository exploration capabilities of FastContext while significantly reducing memory usage and improving inference efficiency through mixed-precision oQ quantization.

## About FastContext

FastContext is a lightweight repository-exploration subagent designed for coding agents. Instead of having a single model perform both repository exploration and problem solving, FastContext specializes in repository discovery and evidence gathering using parallel tool calls.

The model explores repositories through:

* READ
* GLOB
* GREP

and returns concise file paths and line references for downstream coding agents.

Original model: FastContext-1.0-4B-SFT.

## Quantization

This release uses:

* Quantization: oQ6
* Format: MLX
* Target Platform: Apple Silicon
* Mixed Precision: Enabled
* Optimized for local inference

The oQ quantization pipeline allocates higher precision to more sensitive weights while aggressively compressing less important regions of the network, providing a strong quality-to-size ratio.

## Recommended Inference Settings

For best performance:

```yaml
temperature: 0.7
top_p: 0.6
top_k: 20
min_p: 0
repetition_penalty: 1.05
presence_penalty: 1.5
thinking: true
```

### oMLX Preset

```yaml
temp: 0.7
top_p: 0.6
top_k: 20
min_p: 0
rep_penalty: 1.05
presence_penalty: 1.5
enable_thinking: true
```

These settings were selected to improve repository exploration quality, encourage broader search behavior, and maintain stable citation generation.

## Example Usage

```python
from mlx_lm import load, generate

model, tokenizer = load("FastContext-1.0-4B-SFT-oQ6")

prompt = "Find where authentication tokens are validated."

response = generate(
    model,
    tokenizer,
    prompt=prompt,
    temp=0.7,
    top_p=0.6,
    top_k=20,
)

print(response)
```

## Intended Use

This model is intended for:

* Repository exploration
* Codebase navigation
* SWE-bench style workflows
* Coding agents
* Retrieval and evidence gathering
* Search-heavy software engineering tasks

It is not intended to replace a primary coding model. FastContext works best as a specialized exploration subagent paired with a stronger reasoning or code-generation model.

## Performance

FastContext was trained specifically to improve repository exploration efficiency and reduce the token overhead associated with repository search. The original paper reports improved end-to-end coding-agent performance while reducing token consumption across multiple SWE benchmarks.

## Recommended Deployment

Apple Silicon:

* M1 Pro / Max
* M2 Pro / Max / Ultra
* M3 Series
* M4 Series

Works well with:

* MLX
* oMLX
* Open WebUI
* LM Studio (MLX builds)
* Custom agent frameworks

## Credits

* Microsoft FastContext Team
* Qwen Team
* Apple MLX
* oMLX

## Citation

Please cite the original FastContext paper when using this model in research:

```bibtex
@misc{zhang2026fastcontexttrainingefficientrepository,
      title={FastContext: Training Efficient Repository Explorer for Coding Agents},
      author={Shaoqiu Zhang and Maoquan Wang and Yuling Shi and Yuhang Wang and Xiaodong Gu and Yongqiang Yao and Rao Fu and Shengyu Fu},
      year={2026},
      eprint={2606.14066},
      archivePrefix={arXiv},
      primaryClass={cs.SE}
}
```