---
language:
- en
license: mit
tags:
- text-generation
- style-transfer
- rewriting
- humanization
- seq2seq
- bart
- evaluation
- bertscore
- rouge
- chrf
library_name: transformers
base_model: facebook/bart-large
pipeline_tag: text-generation
paper:
- https://arxiv.org/abs/2604.11687v1
---

# cive202/humanize-ai-text-bart-large

Fine-tuned **BART-large** (`facebook/bart-large`) for **AI → Human rewriting** (“humanization”). This model is designed for **constrained rewriting**: preserve meaning while shifting style toward human-authored text.

- **Architecture**: encoder–decoder (seq2seq)
- **Parameters**: ~406M
- **Task format**: `humanize: {ai_text}` → `{human_text}`

## 📄 Paper

**“Rewriting the Machine: Encoder-Decoder vs. Decoder-Only Transformers for AI-to-Human Text Style Transfer”**  
**Authors:** Utsav Paneru et al.  
**arXiv:** https://arxiv.org/abs/2604.11687v1  
**Status:** Preprint (2026)

### Citation

```bibtex
@misc{paneru2026makesoundlikehuman,
      title={Please Make it Sound like Human: Encoder-Decoder vs. Decoder-Only Transformers for AI-to-Human Text Style Transfer}, 
      author={Utsav Paneru},
      year={2026},
      eprint={2604.11687},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2604.11687}, 
}
```

## Quickstart

```bash
pip install -U "transformers>=4.40.0" torch sentencepiece
```

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "cive202/humanize-ai-text-bart-large"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

ai_text = "Large language models often produce fluent, structured prose with recognizable regularities..."
inputs = tokenizer("humanize: " + ai_text, return_tensors="pt", truncation=True)

out = model.generate(
    **inputs,
    max_new_tokens=256,
    num_beams=4,
)

print(tokenizer.decode(out[0], skip_special_tokens=True))
```

## Training summary (from project config)

Full fine-tuning (no adapters) with a standard seq2seq cross-entropy objective:

- **LR / schedule**: `5e-5`, cosine scheduler
- **Warmup ratio**: `0.1`
- **Precision**: bf16
- **Effective batch size**: 16 (`per_device_train_batch_size = 2`, `gradient_accumulation_steps = 8`)
- **Epochs**: 5
- **Checkpoint selection**: best checkpoint by validation loss

## Dataset

Parallel chunk pairs created via sentence-aware chunking:

- **Train**: 25,140 pairs
- **Validation**: 1,390 examples
- **Test (evaluation subset)**: 1,390 examples

Preprocessing details (high-level):
- sentence tokenization (NLTK)
- greedy packing to a token budget (≤200 tokens measured with BART-base tokenizer)
- drop pairs with fewer than 10 words on either side
- document-disjoint splits (no `doc_id` overlap between splits)

## Evaluation (test n = 1,390)

All metrics computed on the same 1,390-example test subset.

### Reference similarity (higher is better)

- **BERTScore F1**: **0.9240**
- **ROUGE-L**: **0.5657**
- **chrF++**: **55.9219**

### Fluency proxy

- **GPT-2 perplexity (output)**: **27.1481**
- **GPT-2 perplexity (human reference)**: **23.6912**

### Linguistic marker shift (style movement)

- **Mean directional marker shift**: **0.8289**

Qualitative note:
- This run is characterized by comparatively **precise targeting** of human marker means on several features (e.g., average word length and lexical diversity were extremely close to human reference means in the project’s analysis).

## Limitations

- This model optimizes reference similarity and controlled rewriting; it may not “push style” as aggressively as decoder-only models that can overshoot.
- No guarantee of bypassing AI detectors.
- Generalization depends on domains/styles present in training data.

## Research paper (unpublished)

Part of an unpublished manuscript (2026):

**“Rewriting the Machine: Encoder-Decoder vs. Decoder-Only Transformers for AI-to-Human Text Style Transfer”**

- Status: **not published yet**
- Link: *(add your PDF/arXiv link when available)*

## License

MIT is a placeholder here—set this repo’s license to what you intend to distribute under, consistent with the base model’s terms.