--- base_model: LiquidAI/LFM2-350M-Extract license: apache-2.0 language: - en tags: - text-generation - instruction-tuning - structured-output - toon - lfm2 - unsloth - lora - transformers datasets: - yasserrmd/TOON-Unstructured-Structured model-index: - name: yasserrmd/LFM2-350M-Extract-TOON results: - task: name: TOON conversion (schema-driven extraction) type: text-generation dataset: name: yasserrmd/TOON-Unstructured-Structured type: text metrics: - name: Final Training Loss type: loss value: 0.2178 - name: Lowest Loss type: loss value: 0.2043 - name: Total Steps type: steps value: 430 --- # yasserrmd/LFM2-350M-Extract-TOON `yasserrmd/LFM2-350M-Extract-TOON` is a **fine-tuned variant of LiquidAI’s LFM2-350M-Extract**, built using the **Unsloth AI** framework and the dataset [`yasserrmd/TOON-Unstructured-Structured`](https://huggingface.co/datasets/yasserrmd/TOON-Unstructured-Structured). This model specializes in **schema-driven conversion of natural-language text into valid TOON (Token-Oriented Object Notation)** format β€” a compact, token-efficient alternative to JSON designed for large language models. --- ## Model Overview | Property | Description | |-----------|-------------| | **Base Model** | LiquidAI/LFM2-350M-Extract | | **Architecture** | LFM2-350M (Decoder-only Transformer) | | **Fine-tuning Method** | LoRA (via Unsloth AI) | | **Objective** | Structured extraction in TOON format | | **Dataset** | yasserrmd/TOON-Unstructured-Structured | | **Languages** | English | | **Frameworks** | Transformers, Unsloth, PyTorch | | **License** | LFM License v1.0 | | **Final Loss** | 0.2178 (Step 430) | --- ## What is TOON? **TOON (Token-Oriented Object Notation)** is a serialization format optimized for LLMs. It represents structured data with minimal tokens using a **header + rows** pattern: ``` users[2]{id,name,role}: 1,Alice,admin 2,Bob,user ```` Compared to JSON, TOON reduces token count by up to 60% and is easier for LLMs to generate deterministically. --- ## Training Summary The model was trained on 430 steps with the following key trends: - **Initial loss:** 1.3793 - **Final loss:** 0.2178 - **Lowest recorded loss:** 0.2043 - **Steady convergence** after step 250 with consistent decline below 0.3. - **Training method:** Unsloth LoRA (rank 16, alpha 32, learning rate 2e-4, batch size 64). - **Hardware:** 1x NVIDIA L4 (24 GB VRAM). - **Duration:** 1.5 hours. The training demonstrated strong stability and smooth convergence towards sub-0.25 loss, confirming excellent adaptation of the base model to TOON structure. --- ## 🧰 Usage Example ```python from transformers import AutoTokenizer, AutoModelForCausalLM model_id = "yasserrmd/LFM2-350M-Extract-TOON" tokenizer = AutoTokenizer.from_pretrained(model_id) model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto") schema = "animal{name,action,location}" text = "The cat sat on the mat." system = ( "You are a precise extractor that outputs TOON format only. " "Header must be