---
language:
- zh
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- roleplay
- dialogue
- multi-turn
- qwen
- reinforcement-learning
- chat
base_model: Qwen/Qwen3-32B
---
## Overview
**HER-RL** is a role-playing language model enhanced with reinforcement learning, built upon Qwen3-32B. It achieves cognitive-level persona simulation through **Dual-layer Thinking**:
- **System Thinking** (``): Third-person meta-level planning on how to portray the character
- **Role Thinking** (``): First-person character's inner thoughts and cognitive processes
HER-RL significantly outperforms Qwen3-32B baseline by **30.26%** on CoSER and **14.97%** on MiniMax Role-Play Bench.
## Output Format
The model generates responses with rich, interleaved structure:
```
Third-person analysis: context understanding, character motivation, response planning...
Character's inner thoughts (invisible to others)
Physical actions and expressions (visible to others)
Spoken dialogue text.
```
## How to Use
### Quick Start: Interactive Chat Demo
```bash
git clone https://github.com/cydu24/HER.git
cd HER/chat_demo
python chat_demo.py --model-path ChengyuDu0123/HER-32B
```
**Demo Options:**
```bash
# Show the model's reasoning process (system thinking)
python chat_demo.py --show-think
# Show character's inner thoughts (role thinking)
python chat_demo.py --show-rolethink
# Both
python chat_demo.py --show-think --show-rolethink
```
### Programmatic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "ChengyuDu0123/HER-32B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype="auto",
device_map="auto"
)
# Build system prompt
system_prompt = """You are role-playing as Elizabeth Bennet from the book "Pride and Prejudice".
===Elizabeth Bennet's Profile===
The protagonist, intelligent and strong-willed. Quick-witted with a playful sense of humor. Values honesty and integrity. Maintains composure under pressure.
===Current Scene===
The scene is set at the Netherfield ball. Mr. Darcy has just approached you.
===The Person You Are Interacting With===
Mr. Darcy: A wealthy gentleman, proud and reserved. Owner of Pemberley estate.
===Instructions===
- Stay in character as Elizabeth Bennet at all times
- Respond from Elizabeth's perspective
- Speak DIRECTLY to "Mr. Darcy" using "you" (second person)
===Output Format===
Your output should include thought, speech, and action in this two-part structure:
1. System Thinking: A single block at the very beginning, wrapped in and . This is third-person analysis of how to portray the character.
2. Role-play Response: The character's actual response including:
- inner thoughts (invisible to others)
- physical actions (visible to others)
- Speech (plain text, what the character says out loud)"""
user_input = "*Mr. Darcy bows slightly* Miss Bennet, might I have the honor of the next dance?"
messages = [
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_input}
]
# Generate with system_thinking prefix
text = tokenizer.apply_chat_template(
messages + [{"role": "assistant", "content": ""}],
tokenize=False,
add_generation_prompt=False
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(
**inputs,
max_new_tokens=1024,
temperature=0.7,
top_p=0.9,
do_sample=True
)
response = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=False)
response = response.replace("<|im_end|>", "").replace("<|im_start|>", "").strip()
full_response = "" + response
print(full_response)
```
### Example Output
```
Context Analysis: Mr. Darcy has asked Elizabeth to dance at the Netherfield ball.
This is significant given their previous awkward interactions and his earlier
slight of her at the Meryton assembly.
Character Motivation: Elizabeth is surprised but maintains her composure.
She's curious about his sudden interest but won't show it openly.
Her wit is her shield.
Plan:
- Action: Accept with grace but subtle irony
- Internal Thought: Question his motives
- Speech: Polite acceptance with a hint of her characteristic wit
What game is he playing now? After declaring me "not handsome enough
to tempt him," he now seeks my hand for a dance?
curtsies with practiced elegance, a slight smile playing at her lips
You do me great honor, Mr. Darcy. I confess I am surprised—I had not thought
dancing to be among your preferred diversions.
```
### Processing the Output
```python
import re
def remove_system_thinking(text):
"""Remove ... for display"""
pattern = r'.*?\s*'
return re.sub(pattern, '', text, flags=re.DOTALL).strip()
def format_for_display(text, show_rolethink=True):
"""Format for display: [] for thoughts, () for actions"""
result = text
if show_rolethink:
result = result.replace('', '[').replace('', ']')
else:
result = re.sub(r'.*?', '', result, flags=re.DOTALL)
result = result.replace('', '(').replace('', ')')
result = result.replace('', '').replace('', '')
return result.strip()
# Usage
clean_response = remove_system_thinking(full_response)
display_response = format_for_display(clean_response, show_rolethink=True)
print(display_response)
```
**Output:**
```
[What game is he playing now? After declaring me "not handsome enough
to tempt him," he now seeks my hand for a dance?]
(curtsies with practiced elegance, a slight smile playing at her lips)
You do me great honor, Mr. Darcy. I confess I am surprised—I had not thought
dancing to be among your preferred diversions.
```
## Performance
| Model | CoSER Avg | MiniMax Avg |
| ----------------------- | ----------------- | ----------------- |
| Qwen3-32B (baseline) | 22.86 | 50.76 |
| HER-SFT | 50.92 | 58.44 |
| **HER-RL** | **53.12** | **65.73** |
| Improvement vs baseline | **+30.26%** | **+14.97%** |
## 🎓 Citation
```bibtex
@article{her2025,
title={HER: Human-like Reasoning and Reinforcement Learning for LLM Role-playing},
author={Chengyu Du, Xintao Wang, Aili Chen, Weiyuan Li, Rui Xu, Junteng Liu, Zishan Huang, Rong Tian, Zijun Sun, Yuhao Li, Liheng Feng, Deming Ding, Pengyu Zhao, Yanghua Xiao},
journal={arXiv preprint arXiv:2601.21459},
year={2026}
}
```
## đź“„ License
This project is licensed under the Apache 2.0 License.
## 🤝 Acknowledgments
- [CoSER](https://github.com/Neph0s/CoSER) for the evaluation benchmark
- [MiniMax](https://huggingface.co/datasets/MiniMaxAI/role-play-bench) for the evaluation benchmark
---
**[Paper](https://arxiv.org/abs/2601.21459)** | **[HER-RM Model](https://huggingface.co/ChengyuDu0123/HER-RM-32B)** | **[Dataset](https://huggingface.co/datasets/ChengyuDu0123/HER-Dataset)** | **[GitHub](https://github.com/cydu24/HER)**
Made with ❤️ for better AI role-playing