---
license: apple-ascl
tags:
  - open-lm
  - temporal
  - tic-lm
  - causal-lm
library_name: transformers
pipeline_tag: text-generation
---

# Open LM 3B — Knowledge Cutoff July 2024

This is a Hugging Face-format conversion of the Apple Open LM **3B** oracle model
trained with a knowledge cutoff of **July 2024**, from the
[TiC-LM (Time-Continual Language Modeling)](https://arxiv.org/abs/2410.14660) project.

## Model Details

| Property | Value |
|---|---|
| Architecture | LLaMA-style (pre-norm, SwiGLU, RoPE) |
| Parameters | ~2.7B |
| Training tokens | 220B |
| Knowledge cutoff | July 2024 |
| Vocab size | 50,432 |
| Context length | 2,048 |
| Original format | Apple Open LM |

## Usage

```python
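from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# trust_remote_code=True is required because the checkpoint uses the custom
# OpenLMForCausalLM class (see Conversion Notes).
model = AutoModelForCausalLM.from_pretrained(
    "dogtooth/open-lm-3b-202407",
    dtype=torch.bfloat16,  # older transformers releases name this argument torch_dtype
    device_map="auto",
    trust_remote_code=True,
)
# The tokenizer is loaded separately, from the GPT-NeoX-20B repository.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")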
```
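
Once the model and tokenizer are loaded as above, text can be generated through the standard `generate` API, assuming the custom `OpenLMForCausalLM` class supports it. The helper below is a minimal sketch and not part of the released code: the name `generate_text`, the greedy decoding settings, and the reuse of the EOS token as the padding id are assumptions made for illustration. It truncates the prompt so that prompt plus new tokens fit in the 2,048-token context length listed in Model Details.

```python
CONTEXT_LENGTH = 2048  # context length from the Model Details table


def generate_text(model, tokenizer, prompt, max_new_tokens=50):
    """Greedy-decode a continuation of `prompt` (hypothetical helper)."""
    # Truncate the prompt so the new tokens still fit in the context window.
    inputs = tokenizer(
        prompt,
        return_tensors="pt",
        truncation=True,
        max_length=CONTEXT_LENGTH - max_new_tokens,
    ).to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding; an assumption, not a recommendation
        pad_token_id=tokenizer.eos_token_id,  # assumed: reuse EOS as the pad id
    )
    # Decode the full sequence (prompt followed by the continuation).
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With `model` and `tokenizer` in scope, a call such as `print(generate_text(model, tokenizer, "The capital of France is"))` prints the prompt followed by up to 50 generated tokens.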

## Conversion Notes

- Converted from the original Open LM `.pt` checkpoint to a custom `OpenLMForCausalLM` format.
- Uses **LayerNorm** (not RMSNorm) to match the original Open LM training.
- Includes **QK norm** (LayerNorm on Q and K projections before attention).
- Architecture dimensions are auto-detected from checkpoint weights.
- Requires `trust_remote_code=True` when loading.
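
To make the normalization notes above concrete, here is a small pure-Python sketch of the difference between LayerNorm and RMSNorm, and of a QK-normed attention score. It is illustrative only and not the conversion code: the function names are hypothetical, the learnable gain and bias are omitted, and whether the real model normalizes per head or across the full projection is not stated in this card, so the sketch works on a single vector.

```python
import math


def layer_norm(x, eps=1e-5):
    """LayerNorm without learnable parameters: subtract mean, divide by std."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def rms_norm(x, eps=1e-5):
    """RMSNorm without learnable gain: divide by root-mean-square, no centering."""
    mean_sq = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(mean_sq + eps) for v in x]


def qk_norm_score(q, k):
    """Scaled dot-product score with LayerNorm applied to q and k first."""
    qn, kn = layer_norm(q), layer_norm(k)
    return sum(a * b for a, b in zip(qn, kn)) / math.sqrt(len(q))
```

The practical consequence for conversion is that LayerNorm centers its input while RMSNorm does not, so the two are not interchangeable when loading weights trained with one of them.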

## Citation

```bibtex
@article{jain2024ticlm,
  title={Time-Continual Learning from a Streaming Language Model},
  author={Jain, Ameya and Ramesh, Aakanksha and Li, Tianjian and others},
  journal={arXiv preprint arXiv:2410.14660},
  year={2024}
}
```