---
license: apple-ascl
tags:
- open-lm
- temporal
- tic-lm
- causal-lm
library_name: transformers
pipeline_tag: text-generation
---

# Open LM 3B — Knowledge Cutoff July 2024

This is a Hugging Face-format conversion of Apple's Open LM **3B** oracle model, trained with a knowledge cutoff of **July 2024**, from the [TiC-LM (Time-Continual Language Modeling)](https://arxiv.org/abs/2410.14660) project.

## Model Details

| Property | Value |
|---|---|
| Architecture | LLaMA-style decoder (pre-norm, SwiGLU, RoPE), with LayerNorm and QK norm |
| Parameters | ~2.7B |
| Training tokens | 220B |
| Knowledge cutoff | July 2024 |
| Vocab size | 50,432 |
| Context length | 2,048 tokens |
| Tokenizer | `EleutherAI/gpt-neox-20b` |
| Original format | Apple Open LM (`.pt` checkpoint) |

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "dogtooth/open-lm-3b-202407",
    dtype=torch.bfloat16,  # recent transformers releases; older ones use `torch_dtype`
    device_map="auto",
    trust_remote_code=True,  # the model class is defined in this repo's custom code
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

# Greedy continuation of a short prompt (assumes the custom model supports `generate()`).
inputs = tokenizer("The 2024 Summer Olympics were held in", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=32,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,  # the GPT-NeoX tokenizer defines no pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Conversion Notes

- Converted from the original Open LM `.pt` checkpoint to a custom `OpenLMForCausalLM` model class.
- Uses **LayerNorm** (not RMSNorm), matching the normalization used in the original Open LM training.
- Includes **QK norm**: LayerNorm is applied to the query and key projections before attention.
- Architecture dimensions are inferred automatically from the shapes of the checkpoint weights.
- Requires `trust_remote_code=True` when loading, because the model class lives in this repository's custom code.

## Citation

```bibtex
@article{jain2024ticlm,
  title={Time-Continual Learning from a Streaming Language Model},
  author={Jain, Ameya and Ramesh, Aakanksha and Li, Tianjian and others},
  journal={arXiv preprint arXiv:2410.14660},
  year={2024}
}
```
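## Sketch: LayerNorm vs. RMSNorm

The Conversion Notes state that this checkpoint uses LayerNorm rather than the RMSNorm used in LLaMA. The pure-Python sketch below shows the difference on a single vector. It is illustrative only: it assumes no learned scale or bias (the real layers may carry them) and an `eps` of `1e-5` chosen for the example. LayerNorm subtracts the mean before rescaling, while RMSNorm only rescales, so the two agree only when the input already has zero mean.

```python
import math


def layer_norm(x, eps=1e-5):
    """LayerNorm without learned parameters: center on the mean, divide by the std."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def rms_norm(x, eps=1e-5):
    """RMSNorm without a learned scale: divide by the root-mean-square, no centering."""
    n = len(x)
    rms = math.sqrt(sum(v * v for v in x) / n + eps)
    return [v / rms for v in x]


x = [1.0, 2.0, 3.0, 4.0]            # non-zero mean: the two norms disagree
zero_mean = [-1.0, 1.0, -2.0, 2.0]  # zero mean: the two norms coincide

print("LayerNorm:", [round(v, 3) for v in layer_norm(x)])
print("RMSNorm:  ", [round(v, 3) for v in rms_norm(x)])
print("zero-mean input, same result:", layer_norm(zero_mean) == rms_norm(zero_mean))
```

Because typical hidden states do not have zero mean, loading these weights into an RMSNorm-based LLaMA implementation would shift every normalized activation, which is why the conversion keeps LayerNorm.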
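## Sketch: QK norm

QK norm, as described in the Conversion Notes, normalizes queries and keys before the attention dot product. The sketch below computes single-head attention weights for one query in plain Python, with and without that step. It is a simplified illustration under stated assumptions: LayerNorm without learned parameters, no RoPE, no causal mask, and no multi-head splitting. It does not claim where exactly the real model applies the norm relative to RoPE or whether it normalizes per head.

```python
import math


def layer_norm(x, eps=1e-5):
    """LayerNorm over one vector, without learned scale or bias (simplification)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(q, keys, qk_norm=True):
    """Single-head attention weights of one query over a list of keys.

    RoPE, masking, and multi-head splitting are omitted for clarity.
    """
    if qk_norm:
        # QK norm: normalize the query and every key before the dot product.
        q = layer_norm(q)
        keys = [layer_norm(k) for k in keys]
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    return softmax(scores)


q = [1.0, 0.5, -0.5, 2.0]
keys = [[0.2, 1.0, -1.0, 0.3], [1.5, 0.1, 0.0, 2.5]]
big_keys = [[100.0 * v for v in k] for k in keys]  # same directions, 100x larger

print("with QK norm:   ", attention_weights(q, keys), attention_weights(q, big_keys))
print("without QK norm:", attention_weights(q, keys, qk_norm=False),
      attention_weights(q, big_keys, qk_norm=False))
```

Since LayerNorm is invariant to rescaling its input (up to `eps`), scaling the keys by 100 leaves the QK-normed weights essentially unchanged, while the unnormalized logits grow until the softmax saturates to a near one-hot distribution. Keeping attention logits bounded in this way is the usual motivation for QK norm.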
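## Sketch: Inferring dimensions from checkpoint weights

The Conversion Notes say architecture dimensions are detected from the checkpoint weights. One possible approach is sketched below, assuming the state dict has already been reduced to a `{name: shape}` mapping (for example `{k: tuple(v.shape) for k, v in state_dict.items()}`). The parameter names (`tok_embeddings`, `layers.N.attention.in_proj`, `layers.N.attention.q_norm`, `layers.N.feed_forward.w12`), the fused QKV and SwiGLU layouts, and the `module.`/`_orig_mod.` wrapper prefixes are all assumptions modeled on Open LM-style naming, not a description of the actual conversion script; `infer_dims` is a hypothetical helper. The head count is generally not recoverable from a fused projection's shape alone and would have to come from elsewhere.

```python
import re


def infer_dims(shapes):
    """Infer basic dimensions from a {parameter name: shape tuple} mapping.

    Hypothetical helper: the key names are assumed (Open LM-style); adjust them
    to whatever keys the real checkpoint contains.
    """
    clean = {}
    for name, shape in shapes.items():
        # Strip wrapper prefixes that DDP/FSDP or torch.compile may add (assumed).
        for prefix in ("module.", "_orig_mod."):
            if name.startswith(prefix):
                name = name[len(prefix):]
        clean[name] = tuple(shape)

    # Token embedding matrix: (vocab_size, d_model).
    vocab_size, d_model = clean["tok_embeddings.weight"]

    # Count distinct layer indices appearing in names like "layers.7.attention".
    layer_ids = set()
    for name in clean:
        match = re.match(r"layers\.(\d+)\.", name)
        if match:
            layer_ids.add(int(match.group(1)))

    # Fused Q/K/V projection assumed to be (3 * d_model, d_model); sanity-check it.
    qkv_shape = clean["layers.0.attention.in_proj.weight"]
    if qkv_shape != (3 * d_model, d_model):
        raise ValueError(f"unexpected in_proj shape {qkv_shape}")

    # SwiGLU gate and up projections assumed fused into w12: (2 * ffn_hidden, d_model).
    w12_rows, _ = clean["layers.0.feed_forward.w12.weight"]

    return {
        "vocab_size": vocab_size,
        "d_model": d_model,
        "n_layers": len(layer_ids),
        "ffn_hidden": w12_rows // 2,
        "qk_norm": any(n.startswith("layers.0.attention.q_norm") for n in clean),
    }


# Toy shapes for illustration only (not the real 3B checkpoint).
toy = {
    "module.tok_embeddings.weight": (50432, 64),
    "module.layers.0.attention.in_proj.weight": (192, 64),
    "module.layers.0.attention.q_norm.weight": (64,),
    "module.layers.0.feed_forward.w12.weight": (352, 64),
    "module.layers.1.attention.in_proj.weight": (192, 64),
    "module.layers.1.feed_forward.w12.weight": (352, 64),
}
print(infer_dims(toy))
```

Reading dimensions from shapes rather than hard-coding them lets one conversion path handle several model sizes, and the shape check on the fused projection fails loudly if the assumed layout does not match.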