---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- text-enhancement
- grammar-correction
- text-rewriting
- encoder-decoder
- transformer
- pytorch
---

# EDEN: Encoder Decoder Enhancement Network

EDEN is a from-scratch PyTorch encoder-decoder Transformer that rewrites rough
text into clean, polished text. It fixes spelling, grammar, punctuation, and
phrasing while keeping the original meaning. The model was built and trained
from the ground up (architecture, tokenizer, and training loop) and runs
comfortably on a single machine, including Apple Silicon.

This repository contains everything needed to use the model, retrain it, and
extend it:

* The trained model weights in safetensors format.
* A Hugging Face Transformers integration (`AutoModel` with `trust_remote_code`).
* The full training, fine-tuning, and evaluation engine.
* A local web dashboard for training and trying the model in a browser.

## Model summary

| Property | Value |
| --- | --- |
| Architecture | Encoder-decoder Transformer with tied embeddings |
| Parameters | About 107 million |
| Encoder layers | 8 |
| Decoder layers | 8 |
| Hidden size | 640 |
| Attention heads | 10 |
| Feed-forward size | 2560 |
| Vocabulary | 24,000 byte-level BPE tokens |
| Max sequence length | 512 tokens |
| Held-out validation loss | 0.123 (cross entropy) |
| Precision | float32 |
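
The parameter count can be roughly reproduced from the other rows of the table. The sketch below assumes a standard Transformer layout: four d×d projection matrices per attention block, a two-matrix feed-forward block, cross-attention in every decoder layer, and one embedding matrix shared by the encoder input, decoder input, and output projection. It ignores biases, layer norms, and any positional parameters, so it is an estimate rather than an exact count of the checkpoint.

```python
# Rough parameter estimate for EDEN's configuration.
# Assumptions: standard encoder-decoder layout, one shared (tied) embedding
# matrix, and no biases, layer norms, or positional parameters counted.

D_MODEL = 640    # hidden size
N_HEADS = 10     # attention heads
D_FF = 2560      # feed-forward size
VOCAB = 24_000   # byte-level BPE vocabulary
N_ENC = 8        # encoder layers
N_DEC = 8        # decoder layers


def attention_params(d: int) -> int:
    """Query, key, value, and output projections: four d x d matrices."""
    return 4 * d * d


def feed_forward_params(d: int, d_ff: int) -> int:
    """Up-projection (d x d_ff) plus down-projection (d_ff x d)."""
    return 2 * d * d_ff


def estimate_params() -> int:
    encoder_layer = attention_params(D_MODEL) + feed_forward_params(D_MODEL, D_FF)
    # Each decoder layer has self-attention, cross-attention, and a feed-forward block.
    decoder_layer = 2 * attention_params(D_MODEL) + feed_forward_params(D_MODEL, D_FF)
    embeddings = VOCAB * D_MODEL  # counted once because the embeddings are tied
    return N_ENC * encoder_layer + N_DEC * decoder_layer + embeddings


if __name__ == "__main__":
    print(f"head size: {D_MODEL // N_HEADS}")             # head size: 64
    print(f"estimated parameters: {estimate_params():,}")  # estimated parameters: 107,110,400
```

The estimate of about 107.1 million agrees with the table; the exact figure for the checkpoint will differ slightly because of the terms the sketch leaves out.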

## Quick start

First install the two dependencies (one time):

```bash
pip3 install torch transformers
```

### Option 1: chat with EDEN in the terminal (recommended)

This opens a simple interactive interface, similar to Ollama. Type or paste
rough text, press Enter, and get the cleaned-up version. Type `/bye` or press
Ctrl+D to quit.

```bash
python3 examples/try_eden.py
```

macOS users can also double-click `Try EDEN.command` to open the same interface
in a terminal window.

Example session:

```text
>>> their are alot of reasons why this dont work proper
There are a lot of reasons why this do not work proper.
>>> /bye
Goodbye.
```

### Option 2: one terminal command

Paste this whole line into your terminal to clean a single sentence:

```bash
python3 -c "from transformers import AutoModel, AutoTokenizer; t=AutoTokenizer.from_pretrained('Rybib/EDEN', trust_remote_code=True); m=AutoModel.from_pretrained('Rybib/EDEN', trust_remote_code=True).eval(); print(m.enhance(t, 'i relly wnt this to sound beter'))"
```

### Option 3: a Python script

The lines below are Python, not terminal commands. Save them as a file such as
`run.py`, then run `python3 run.py`. Do not paste them straight into the
terminal.

```python
from transformers import AutoModel, AutoTokenizer

model_id = "Rybib/EDEN"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).eval()

rough = "i relly wnt this sentance to sound more profesional"
print(model.enhance(tokenizer, rough))
# I really want this sentence to sound more professional.
```

The `enhance` method handles long inputs by splitting them into sentence-aware
chunks, rewriting each chunk, and joining the results.
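
A simplified picture of that chunking step follows. It is not EDEN's implementation: it splits on sentence-ending punctuation with a regular expression and greedily packs whole sentences into chunks under a token budget. The `count_tokens` argument is a hypothetical stand-in that counts whitespace-separated words; with the real model you would count tokens with the EDEN tokenizer and keep the budget below the 512-token limit.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_text(
    text: str,
    max_tokens: int,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens tokens.

    A single sentence longer than max_tokens becomes its own oversized chunk;
    a real implementation would need to split it further.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in split_sentences(text):
        n = count_tokens(sentence)
        # Close the current chunk if this sentence would overflow the budget.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    text = "first sentense here. second one is here too! is this the third? yes."
    print(chunk_text(text, max_tokens=6))
    # ['first sentense here.', 'second one is here too!', 'is this the third? yes.']
```

Each chunk would then be rewritten independently and the results joined with spaces. Packing whole sentences keeps the model from seeing a sentence cut in half at a chunk boundary.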

### Decoding options

```python
# Assumes `model` and `tokenizer` were loaded as in Option 3.
result = model.enhance(
    tokenizer,
    "their are alot of reasons why this dont work proper",
    strategy="beam",  # "beam", "greedy", or "sample"
    beam_size=4,
    repetition_penalty=1.08,
    length_penalty=0.7,
)
print(result)
```
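
This card does not spell out how `repetition_penalty` is applied inside `enhance`. A common convention, used for example by the Hugging Face `RepetitionPenaltyLogitsProcessor`, divides the positive logits of already-generated tokens by the penalty and multiplies their negative logits by it, so repeats become less likely. The sketch below shows that convention on plain Python lists; whether EDEN's decoder does exactly this is an assumption.

```python
from typing import Iterable, List


def apply_repetition_penalty(
    logits: List[float], previous_tokens: Iterable[int], penalty: float
) -> List[float]:
    """Return a copy of logits with already-generated token ids penalized.

    CTRL-style rule: positive logits are divided by the penalty and negative
    logits are multiplied by it, so with penalty > 1 the score drops either way.
    Each seen id is penalized once, however many times it appeared.
    """
    out = list(logits)
    for token_id in set(previous_tokens):
        score = out[token_id]
        out[token_id] = score * penalty if score < 0 else score / penalty
    return out


if __name__ == "__main__":
    vocab_logits = [2.0, -1.0, 0.5, 3.0]
    # Token ids 0 and 1 were already generated; 1.08 matches the example above.
    print(apply_repetition_penalty(vocab_logits, [0, 1, 1], penalty=1.08))
```

A penalty only slightly above 1, such as 1.08, nudges the decoder away from repeating words without forbidding legitimate repeats, which matters for an editing model that should mostly copy its input.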

## What the model is good at

EDEN was trained on rough-to-polished text pairs covering several editing skills:

* Spelling and typo correction, including dyslexia-style letter swaps.
* Grammar correction.
* Punctuation and capitalization.
* Clearer, more fluent rewriting and light paraphrasing.
* Preserving the original meaning rather than inventing new content.

It is an editing model, not a chatbot or a general text generator. Give it a
sentence or paragraph to clean up, not a question or an instruction.

## Training data

The dataset is built from publicly available text-editing corpora plus generated
noise, combined into rough-text to clean-text pairs:

| Source | Role |
| --- | --- |
| JFLEG | Grammar correction examples |
| Grammarly CoEdIT | Correction and rewrite tasks |
| W&I / LOCNESS | Learner-English correction |
| ASSET | Sentence simplification |
| WikiSplit | Sentence and paragraph flow |
| MRPC | Meaning-preserving paraphrase pairs |
| Synthetic noise | Generated typos, swaps, punctuation, and capitalization fixes |

You can rebuild the dataset locally with the training engine described below.
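
The "Synthetic noise" row can be illustrated with a toy corruption function that turns a clean sentence into a rough input. This is not the project's actual noise generator (that lives in the training engine); it is a minimal sketch assuming three corruption types taken from the table: swapping adjacent letters, dropping punctuation, and lowercasing. A fixed random seed keeps the output reproducible.

```python
import random
import string


def add_noise(clean: str, swap_prob: float = 0.1, seed: int = 0) -> str:
    """Make a rough version of `clean` for a (rough, clean) training pair."""
    rng = random.Random(seed)
    chars = list(clean.lower())  # capitalization errors: drop all capitals
    i = 0
    while i < len(chars) - 1:
        # Typo: swap two adjacent letters with probability swap_prob.
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < swap_prob:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip ahead so a swapped letter is not swapped again
        else:
            i += 1
    # Punctuation errors: remove every ASCII punctuation character.
    return "".join(c for c in chars if c not in string.punctuation)


if __name__ == "__main__":
    target = "There are a lot of reasons why this does not work properly."
    pair = {"input": add_noise(target, swap_prob=0.15, seed=1), "target": target}
    print(pair)
```

Because the corruption runs from clean text to rough text, every generated pair has a known correct target, which is what makes synthetic noise cheap to produce at scale.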

## Retrain or extend the model

This repository ships the complete training engine as an importable `eden`
package and a command-line tool.

```bash
pip install -r requirements.txt

# Build the dataset and tokenizer, then train from scratch.
python -m eden.cli prepare
python -m eden.cli train

# Continue training on your own examples.
python -m eden.cli finetune --data my_pairs.jsonl --mix-base

# Enhance text from the command line.
python -m eden.cli enhance "i relly wnt this to sound beter"
```

Your own fine-tuning data is a JSONL file of input and target pairs:

```jsonl
{"input": "bad rough text here", "target": "Polished text here."}
{"input": "another messy sentance", "target": "Another polished sentence."}
```
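
If your pairs live in a Python list, a short script can write them in this format and check a file before fine-tuning. The helper names below are hypothetical and not part of the `eden` package; the only assumption about the format is the one shown above: one JSON object per line with string `input` and `target` fields.

```python
import json
from pathlib import Path
from typing import Dict, List


def write_pairs(path: Path, pairs: List[Dict[str, str]]) -> None:
    """Write one JSON object per line (JSONL)."""
    with path.open("w", encoding="utf-8") as f:
        for pair in pairs:
            f.write(json.dumps(pair, ensure_ascii=False) + "\n")


def validate_pairs(path: Path) -> int:
    """Check that every non-empty line has string 'input' and 'target'; return the count."""
    count = 0
    with path.open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue
            record = json.loads(line)
            for key in ("input", "target"):
                if not isinstance(record.get(key), str):
                    raise ValueError(f"line {line_no}: missing or non-string {key!r}")
            count += 1
    return count


if __name__ == "__main__":
    pairs = [
        {"input": "bad rough text here", "target": "Polished text here."},
        {"input": "another messy sentance", "target": "Another polished sentence."},
    ]
    out = Path("my_pairs.jsonl")
    write_pairs(out, pairs)
    print(validate_pairs(out), "pairs OK")
```

The resulting `my_pairs.jsonl` is the file passed to `--data` in the `finetune` command.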

Keeping `--mix-base` on is recommended so that the model learns your style
without forgetting its general spelling and grammar ability.

See [docs/TRAINING.md](docs/TRAINING.md) for the full workflow and
[docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for how the model is built.

## Local web dashboard

```bash
python -m eden.cli ui
# then open http://127.0.0.1:7860
```

The dashboard can start, pause, and resume training, shows live loss and
validation metrics, watches memory use, and runs a finished checkpoint directly
in the browser.

## Fine-tuning with the Transformers Trainer

The model also supports standard supervised training: `forward` accepts `labels`
and returns a loss, so it works with the Hugging Face `Trainer` for users who
prefer that workflow. Label positions that should be ignored by the loss use the
index `-100`, and `decoder_input_ids` are built automatically by shifting
`labels`.
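
Both conventions can be sketched in plain Python. Padding positions in `labels` are set to `-100` (the default `ignore_index` of PyTorch's `CrossEntropyLoss`), and decoder inputs are the labels shifted one position to the right behind a start token, the usual seq2seq arrangement in models such as BART and T5. The ids `PAD_ID` and `DECODER_START_ID` are placeholders; EDEN's real special-token ids come from its tokenizer and `config.json`, and its internal shifting code may differ in detail.

```python
from typing import List

IGNORE_INDEX = -100    # label positions with this value are skipped by the loss
PAD_ID = 0             # placeholder: use the tokenizer's real pad token id
DECODER_START_ID = 1   # placeholder: use the model's real decoder start id


def pad_labels(labels: List[int], length: int) -> List[int]:
    """Right-pad a target sequence with IGNORE_INDEX up to `length`."""
    return labels + [IGNORE_INDEX] * (length - len(labels))


def shift_right(labels: List[int]) -> List[int]:
    """Build decoder inputs: the start token, then labels without the last token.

    IGNORE_INDEX values are replaced by PAD_ID because -100 is not a valid
    token id for the embedding layer.
    """
    shifted = [DECODER_START_ID] + labels[:-1]
    return [PAD_ID if t == IGNORE_INDEX else t for t in shifted]


if __name__ == "__main__":
    labels = pad_labels([17, 42, 99, 2], length=6)
    print(labels)               # [17, 42, 99, 2, -100, -100]
    print(shift_right(labels))  # [1, 17, 42, 99, 2, 0]
```

At each position the decoder sees the previous target token and is trained to predict the current one, while the `-100` positions contribute nothing to the loss.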

## Files in this repository

| File | Purpose |
| --- | --- |
| `model.safetensors` | Trained model weights |
| `config.json` | Model configuration |
| `configuration_eden.py` | Configuration class for Transformers |
| `modeling_eden.py` | Model class for Transformers |
| `tokenizer.json` | Byte-level BPE tokenizer |
| `eden/` | Training, fine-tuning, and inference engine |
| `scripts/` | Checkpoint conversion and upload helpers |
| `examples/` | Runnable usage examples |
| `docs/` | Architecture and training guides |

## Limitations

* English only.
* Works best on sentence- and paragraph-length inputs, up to 512 tokens per chunk.
* It can occasionally change wording more than intended. Beam search with the
  default penalties gives the most conservative edits.
* It is not designed to answer questions, follow instructions, or generate new
  content from scratch.

## License

Released under the Apache License 2.0. See [LICENSE](LICENSE).

## Citation

```bibtex
@software{eden_text_enhancement,
  title  = {EDEN: Encoder Decoder Enhancement Network},
  author = {Dunn, Ryan},
  year   = {2026},
  url    = {https://huggingface.co/Rybib/EDEN}
}
```