Characterizing, Evaluating, and Optimizing Complex Reasoning
Paper • 2602.08498 • Published
How to use zzzhr97/TRM-8B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-classification", model="zzzhr97/TRM-8B")
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("zzzhr97/TRM-8B")
model = AutoModelForSequenceClassification.from_pretrained("zzzhr97/TRM-8B")

The Thinking Reward Model (TRM) evaluates the quality of reasoning traces rather than just final answers. Introduced in the paper "Characterizing, Evaluating, and Optimizing Complex Reasoning", the model characterizes reasoning quality along four dimensions (the ME² principle).
The model can be used to score reasoning traces. Below is an example of how to use the model via a hosted server (e.g., using SGLang as suggested in the official repository):
import requests
# Example prompt and response
prompt = "Your question here"
response = "<think> Thinking process... </think> Final Answer"
# Score the reasoning trace (before the termination marker).
reasoning = response.split("</think>", 1)[0]
input_text = f"{prompt}\n{reasoning}"
payload = {"model": "RewardModel", "input": input_text}
# Replace <TRM_HOST> and <TRM_PORT> with your server details
resp = requests.post("http://<TRM_HOST>:<TRM_PORT>/v1/embeddings", json=payload, timeout=60)
resp.raise_for_status()
score = resp.json()["data"][0]["embedding"][0]
print("TRM score:", score)
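The input construction and response parsing in the example above can be factored into small helpers, which makes them easier to reuse and to check without a running server. This is a minimal sketch, not code from the official repository: the names `build_trm_input`, `build_payload`, and `parse_score` are hypothetical, and the response shape (`data[0].embedding[0]`) is assumed to match what the example above reads.

```python
from typing import Any, Dict

THINK_END = "</think>"  # termination marker used in the example above


def build_trm_input(prompt: str, response: str) -> str:
    """Join the prompt and the reasoning part of a response with a newline."""
    # Keep only the text before the first termination marker. If the marker
    # is absent, str.split returns the whole response unchanged.
    reasoning = response.split(THINK_END, 1)[0]
    return f"{prompt}\n{reasoning}"


def build_payload(prompt: str, response: str, model_name: str = "RewardModel") -> Dict[str, str]:
    """Build the JSON body sent to the /v1/embeddings endpoint."""
    return {"model": model_name, "input": build_trm_input(prompt, response)}


def parse_score(body: Dict[str, Any]) -> float:
    """Read the score from a decoded JSON response (assumed shape)."""
    try:
        return float(body["data"][0]["embedding"][0])
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError(f"unexpected response shape: {body!r}") from exc
```

With these helpers, the request above becomes `requests.post(url, json=build_payload(prompt, response), timeout=60)` followed by `parse_score(resp.json())`. Note that a response without the `</think>` marker is passed through whole, final answer included, which matches the behavior of the original `split` call.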
@article{zhang2026characterizing,
title={Characterizing, Evaluating, and Optimizing Complex Reasoning},
author={Zhang, Haoran and Li, Yafu and Wang, Zhi and Wang, Zhilin and Zhang, Shunkai and Qu, Xiaoye and Cheng, Yu},
journal={arXiv preprint arXiv:2602.08498},
year={2026}
}