---
language:
- zh
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- roleplay
- dialogue
- multi-turn
- qwen
- reinforcement-learning
- chat
base_model: Qwen/Qwen3-32B
---
# 🎭 HER-RL: Role-Playing Model with Reinforcement Learning

### HER: Human-like Reasoning and Reinforcement Learning for LLM Role-playing


*Figure: HER Framework. HER introduces dual-layer thinking that distinguishes characters' first-person thinking from LLMs' third-person thinking for cognitive-level persona simulation.*

## Overview

**HER-RL** is a role-playing language model enhanced with reinforcement learning, built upon Qwen3-32B. It achieves cognitive-level persona simulation through **Dual-layer Thinking**:

- **System Thinking** (`<system_thinking>`): Third-person, meta-level planning of how to portray the character
- **Role Thinking** (`<role_thinking>`): The character's first-person inner thoughts and cognitive processes

HER-RL outperforms the Qwen3-32B baseline by **30.26 points** on CoSER (22.86 to 53.12) and by **14.97 points** on MiniMax Role-Play Bench (50.76 to 65.73).

## Output Format

The model generates responses with a rich, interleaved structure:

```
<system_thinking>
Third-person analysis: context understanding, character motivation, response planning...
</system_thinking>

<role_thinking>Character's inner thoughts (invisible to others)</role_thinking>
<role_action>Physical actions and expressions (visible to others)</role_action>
Spoken dialogue text.
```

## How to Use

### Quick Start: Interactive Chat Demo

```bash
git clone https://github.com/cydu24/HER.git
cd HER/chat_demo
python chat_demo.py --model-path ChengyuDu0123/HER-32B
```

**Demo Options:**

```bash
# Show the model's reasoning process (system thinking)
python chat_demo.py --show-think

# Show character's inner thoughts (role thinking)
python chat_demo.py --show-rolethink

# Both
python chat_demo.py --show-think --show-rolethink
```

### Programmatic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ChengyuDu0123/HER-32B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Build system prompt
system_prompt = """You are role-playing as Elizabeth Bennet from the book "Pride and Prejudice".

===Elizabeth Bennet's Profile===
The protagonist, intelligent and strong-willed. Quick-witted with a playful sense of humor. Values honesty and integrity. Maintains composure under pressure.

===Current Scene===
The scene is set at the Netherfield ball. Mr. Darcy has just approached you.

===The Person You Are Interacting With===
Mr. Darcy: A wealthy gentleman, proud and reserved. Owner of Pemberley estate.

===Instructions===
- Stay in character as Elizabeth Bennet at all times
- Respond from Elizabeth's perspective
- Speak DIRECTLY to "Mr. Darcy" using "you" (second person)

===Output Format===
Your output should include thought, speech, and action in this two-part structure:
1. System Thinking: A single block at the very beginning, wrapped in <system_thinking> and </system_thinking>. This is third-person analysis of how to portray the character.
2. Role-play Response: The character's actual response including:
   - <role_thinking>inner thoughts</role_thinking> (invisible to others)
   - <role_action>physical actions</role_action> (visible to others)
   - Speech (plain text, what the character says out loud)"""

user_input = "*Mr. Darcy bows slightly* Miss Bennet, might I have the honor of the next dance?"

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_input}
]

# Prefill the assistant turn with the opening <system_thinking> tag
# so that generation starts with system thinking.
# If the chat template closes the prefilled turn with an end-of-turn token,
# continue_final_message=True (recent transformers versions) may be needed instead.
text = tokenizer.apply_chat_template(
    messages + [{"role": "assistant", "content": "<system_thinking>"}],
    tokenize=False,
    add_generation_prompt=False
)

inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

# Decode only the newly generated tokens and drop chat-template markers
response = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=False)
response = response.replace("<|im_end|>", "").replace("<|im_start|>", "").strip()

# Re-attach the prefilled tag, which is not part of the generated tokens
full_response = "<system_thinking>" + response
print(full_response)
```

### Example Output

```
<system_thinking>
Context Analysis: Mr. Darcy has asked Elizabeth to dance at the Netherfield ball. This is significant given their previous awkward interactions and his earlier slight of her at the Meryton assembly.

Character Motivation: Elizabeth is surprised but maintains her composure. She's curious about his sudden interest but won't show it openly. Her wit is her shield.

Plan:
- Action: Accept with grace but subtle irony
- Internal Thought: Question his motives
- Speech: Polite acceptance with a hint of her characteristic wit
</system_thinking>

<role_thinking>What game is he playing now? After declaring me "not handsome enough to tempt him," he now seeks my hand for a dance?</role_thinking>
<role_action>curtsies with practiced elegance, a slight smile playing at her lips</role_action>
You do me great honor, Mr. Darcy. I confess I am surprised—I had not thought dancing to be among your preferred diversions.
```

### Processing the Output

```python
import re

def remove_system_thinking(text):
    """Remove <system_thinking>...</system_thinking> for display"""
    pattern = r'<system_thinking>.*?</system_thinking>\s*'
    return re.sub(pattern, '', text, flags=re.DOTALL).strip()

def format_for_display(text, show_rolethink=True):
    """Format for display: [] for thoughts, () for actions"""
    result = text
    if show_rolethink:
        result = result.replace('<role_thinking>', '[').replace('</role_thinking>', ']')
    else:
        result = re.sub(r'<role_thinking>.*?</role_thinking>', '', result, flags=re.DOTALL)
    result = result.replace('<role_action>', '(').replace('</role_action>', ')')
    # Drop any other leftover tag markers
    result = re.sub(r'</?[a-z_]+>', '', result)
    return result.strip()

# Usage
clean_response = remove_system_thinking(full_response)
display_response = format_for_display(clean_response, show_rolethink=True)
print(display_response)
```

**Output:**

```
[What game is he playing now? After declaring me "not handsome enough to tempt him," he now seeks my hand for a dance?]
(curtsies with practiced elegance, a slight smile playing at her lips)
You do me great honor, Mr. Darcy. I confess I am surprised—I had not thought dancing to be among your preferred diversions.
```

## Performance

| Model | CoSER Avg | MiniMax Avg |
| ----------------------- | ----------------- | ----------------- |
| Qwen3-32B (baseline) | 22.86 | 50.76 |
| HER-SFT | 50.92 | 58.44 |
| **HER-RL** | **53.12** | **65.73** |
| Improvement vs. baseline (points) | **+30.26** | **+14.97** |

## 🎓 Citation

```bibtex
@article{her2025,
  title={HER: Human-like Reasoning and Reinforcement Learning for LLM Role-playing},
  author={Chengyu Du and Xintao Wang and Aili Chen and Weiyuan Li and Rui Xu and Junteng Liu and Zishan Huang and Rong Tian and Zijun Sun and Yuhao Li and Liheng Feng and Deming Ding and Pengyu Zhao and Yanghua Xiao},
  journal={arXiv preprint arXiv:2601.21459},
  year={2026}
}
```

## 📄 License

This project is licensed under the Apache 2.0 License.

## 🤝 Acknowledgments

- [CoSER](https://github.com/Neph0s/CoSER) for the evaluation benchmark
- [MiniMax](https://huggingface.co/datasets/MiniMaxAI/role-play-bench) for the evaluation benchmark

---
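
## Appendix: Splitting a Response into Segments

Beyond reformatting for display, it can be useful to split a response into typed segments, for example to separate what other characters can perceive from what they cannot. The following is a minimal sketch, assuming the output uses `<system_thinking>`, `<role_thinking>`, and `<role_action>` tags and that untagged text is spoken dialogue. The names `parse_her_response` and `visible_to_others` are hypothetical helpers for illustration and are not part of the HER codebase.

```python
import re

# Assumed tag names; adjust if the model's actual tags differ.
SEGMENT_RE = re.compile(
    r"<(system_thinking|role_thinking|role_action)>(.*?)</\1>",
    flags=re.DOTALL,
)

def parse_her_response(text):
    """Split a response into (kind, content) pairs in order of appearance.

    Text outside any tag is treated as spoken dialogue ("speech").
    """
    segments = []
    pos = 0
    for match in SEGMENT_RE.finditer(text):
        # Untagged text before this tag is speech
        speech = text[pos:match.start()].strip()
        if speech:
            segments.append(("speech", speech))
        segments.append((match.group(1), match.group(2).strip()))
        pos = match.end()
    # Untagged text after the last tag is speech as well
    tail = text[pos:].strip()
    if tail:
        segments.append(("speech", tail))
    return segments

def visible_to_others(segments):
    """Keep only what other characters could perceive: actions and speech."""
    return [(kind, content) for kind, content in segments
            if kind in ("role_action", "speech")]

sample = (
    "<system_thinking>Plan: accept with subtle irony.</system_thinking>\n"
    "<role_thinking>What game is he playing now?</role_thinking>\n"
    "<role_action>curtsies with practiced elegance</role_action>\n"
    "You do me great honor, Mr. Darcy."
)
segments = parse_her_response(sample)
for kind, content in visible_to_others(segments):
    print(f"{kind}: {content}")
```

Keeping the segments as ordered pairs preserves the interleaving of thoughts, actions, and speech, so a caller can decide per use case which kinds to show, store, or drop.
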
**[Paper](https://arxiv.org/abs/2601.21459)** | **[HER-RM Model](https://huggingface.co/ChengyuDu0123/HER-RM-32B)** | **[Dataset](https://huggingface.co/datasets/ChengyuDu0123/HER-Dataset)** | **[GitHub](https://github.com/cydu24/HER)**

Made with ❤️ for better AI role-playing