IH-Scorer v2 — Intellectual Humility 3-seed LoRA Ensemble on Qwen3.6-27B

A 5-level A..E scorer of Intellectual Humility (IH) in short English text — the degree to which a passage acknowledges epistemic limits, expresses openness to revision, and engages opposing views charitably and non-absolutely. IH is treated here as a property of the text, not of the writer.

The deployed scorer is a 3-seed ensemble of LoRA adapters trained on unsloth/Qwen3.6-27B with the same supervised fine-tuning recipe and three independent random seeds. Inter-seed disagreement is exposed as an agreement-based confidence tier (HIGH / MEDIUM / LOW) for research triage, quality control, or human-review routing.

On 5-fold CV of the Guo 2024 corpus, the ensemble achieves Pearson r = 0.729 overall (mean across folds), rising to 0.815 on the HIGH-confidence subset (57% of items), where all 3 seeds agree. This release supersedes the earlier ORPO scorer tmadl/IH-Qwen3.5-ORPO-Guo (pooled Pearson 0.689), which is now deprecated.

Intended use

This model is intended for research use: scoring short English text for intellectual humility expressed in the passage, especially in psychological and social-science text analysis, persuasion / belief-change research, and human-AI interaction or AI-safety studies concerned with receiver-side reflective availability. Scores are most defensible at the level of texts, conditions, corpora, or repeated-measures aggregates.

The model scores texts, not people. A single text's IH score should not be treated as a stable attribute of the writer.

Do not use the scorer for individual profiling, clinical or forensic assessment, educational or employment evaluation, eligibility decisions, surveillance, content-moderation decisions, targeted persuasion, ranking people by intellectual character, or any other use that ranks, classifies, or makes decisions about identifiable individuals.

The model is trained and evaluated against expert text-marker coding of IH (Guo 2024 scheme); transfer outside written English social / argumentative discourse has not been validated.

Research context

This scorer is a candidate text-derived indicator of intellectual humility, which serves as the construct anchor for the belief-holding/updating (calibration-collapse) failure mode in Madl & Lazar, A Receiver-Side Blind Spot in AI Safety (in review). It estimates the IH expressed in a passage as a fallible proxy for the in-situ availability of warrant-inspection operations; load-bearing framework claims rest on independent behavioural endpoints, not on this text score alone.

Related scorers for the other two construct anchors are:

Quick start

pip install -U unsloth bitsandbytes accelerate

Ensemble (recommended)

from inference_example import score_texts
out = score_texts([
    "I know what I'm talking about, unlike most people. Anyone who "
    "disagrees hasn't done their research.",
    "I've held this view for a while but I recognize there's a lot I don't "
    "know. The strongest argument against it is real and I can't fully "
    "rebut it.",
])
# out[0]: {"ensemble_argmax_letter": "A", "ensemble_argmax_score_1_5": 1.0,
#          "ensemble_ev_score_1_5": ..., "confidence_tier": "HIGH", ...}
# out[1]: {"ensemble_argmax_letter": "E", "ensemble_argmax_score_1_5": 5.0,
#          "ensemble_ev_score_1_5": ..., "confidence_tier": "HIGH", ...}

Single-adapter fast path (≈1/3 cost; no confidence tier)

out = score_texts(texts, members=["sft_lowirr_all_seed42"])

CLI form (one essay per line):

python inference_example.py --input essays.txt --output scored.jsonl
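
Because the CLI reads one essay per line, multi-paragraph essays need their internal line breaks collapsed before the input file is written. A minimal sketch of that preparation step, assuming plain UTF-8 text; the helper name write_essays_file is hypothetical and is not part of inference_example.py:

```python
import re
from pathlib import Path


def write_essays_file(texts, path="essays.txt"):
    """Write one essay per line, collapsing internal whitespace runs.

    Hypothetical helper (not part of inference_example.py). Embedded
    newlines would otherwise split one essay into several input lines.
    """
    lines = []
    for i, text in enumerate(texts):
        flat = re.sub(r"\s+", " ", text).strip()
        if not flat:
            # Refuse empty essays so output rows stay aligned with inputs.
            raise ValueError(f"essay {i} is empty after whitespace cleanup")
        lines.append(flat)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

Rejecting empty essays, rather than silently dropping them, keeps line i of the input file aligned with row i of the scored output.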

Output fields

| field | description |
| --- | --- |
| ensemble_argmax_letter | majority vote across adapters (tiebreak: highest mean prob) |
| ensemble_argmax_score_1_5 | integer 1..5 mapping (A=1, ..., E=5) |
| ensemble_ev_score_1_5 | Σ p(L) · num(L), a soft continuous score |
| prob_A..prob_E | mean softmax probabilities across adapters |
| confidence_range | max(adapter letters) − min (integer 0..4); 0 means all adapters agree |
| confidence_tier | HIGH (range=0), MEDIUM (range=1), LOW (range≥2); triage signal. N/A in single-adapter mode (no inter-seed range). |
| n_adapters_voted | how many adapters produced a parseable letter |
| parse_failed | True iff no adapter produced a parseable Letter: <X>; the score reverts to a uniform prior; flag for human review |
| adapter{0..K-1}_letter | individual adapter letter predictions, for inspection |
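
The field definitions above can be sketched as a small aggregation function. This is an illustration only, not the code in inference_example.py: the function name aggregate is hypothetical, and two details are assumptions (probabilities are averaged over the adapters that produced a parseable letter, and the letter is reported as None when parsing fails for every adapter).

```python
from collections import Counter

LETTERS = ["A", "B", "C", "D", "E"]
NUM = {letter: i + 1 for i, letter in enumerate(LETTERS)}  # A=1, ..., E=5


def aggregate(adapter_letters, adapter_probs):
    """Combine per-adapter outputs into the documented ensemble fields.

    adapter_letters: one parsed letter per adapter, or None if parsing failed.
    adapter_probs:   one {letter: prob} dict per adapter (ignored where the
                     corresponding letter is None).
    """
    voted = [(l, p) for l, p in zip(adapter_letters, adapter_probs) if l in NUM]
    if not voted:
        # Documented fallback: uniform prior. Reporting None as the letter
        # and "N/A" as the tier is an assumption of this sketch.
        probs = {l: 1.0 / len(LETTERS) for l in LETTERS}
        letter, rng, tier = None, None, "N/A"
    else:
        probs = {l: sum(p[l] for _, p in voted) / len(voted) for l in LETTERS}
        counts = Counter(l for l, _ in voted)
        top = max(counts.values())
        tied = [l for l in LETTERS if counts[l] == top]
        letter = max(tied, key=lambda l: probs[l])  # tiebreak: highest mean prob
        nums = [NUM[l] for l, _ in voted]
        rng = max(nums) - min(nums)
        if len(adapter_letters) == 1:
            tier = "N/A"  # no inter-seed range in single-adapter mode
        else:
            tier = "HIGH" if rng == 0 else "MEDIUM" if rng == 1 else "LOW"
    return {
        "ensemble_argmax_letter": letter,
        "ensemble_argmax_score_1_5": float(NUM[letter]) if letter else None,
        "ensemble_ev_score_1_5": sum(probs[l] * NUM[l] for l in LETTERS),
        **{f"prob_{l}": probs[l] for l in LETTERS},
        "confidence_range": rng,
        "confidence_tier": tier,
        "n_adapters_voted": len(voted),
        "parse_failed": not voted,
    }
```

For example, adapter letters D, D, E give a majority letter of D, a range of 1, and therefore a MEDIUM tier.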

For downstream use:

  • letter-level tasks (κ-linear, exact accuracy): ensemble_argmax_letter
  • continuous scores (correlations, regressions): ensemble_ev_score_1_5
  • model probabilities: prob_A..prob_E
  • research triage / quality control: confidence_tier — flag LOW confidence for human review
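
The triage recommendation can be sketched as a small post-processing step over the CLI output. This assumes scored.jsonl holds one JSON object per line with the field names listed above; the function name split_for_review is hypothetical.

```python
import json


def split_for_review(jsonl_lines):
    """Split scored rows into (keep, review).

    Rows with parse_failed set, or with confidence_tier == "LOW", are
    routed to human review; everything else is kept for analysis.
    """
    keep, review = [], []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        row = json.loads(line)
        if row.get("parse_failed") or row.get("confidence_tier") == "LOW":
            review.append(row)
        else:
            keep.append(row)
    return keep, review


def mean_ev(rows):
    """Aggregate score over kept rows (mean of ensemble_ev_score_1_5)."""
    return sum(r["ensemble_ev_score_1_5"] for r in rows) / len(rows)
```

A typical call would be `keep, review = split_for_review(open("scored.jsonl", encoding="utf-8"))`, followed by `mean_ev(keep)` for a condition-level aggregate.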

Scoring head

The model was supervised fine-tuned to emit a short theory-grounded rationale followed by Letter: <X>. At inference time we therefore generate up to 240 tokens (greedy decoding) per essay, locate the predicted letter token, and read softmax probabilities at that position over the five letter tokens "A".."E". Ensemble probabilities are the per-adapter mean.
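
The two steps described above, locating the emitted letter and turning the five letter logits into probabilities, can be sketched as follows. This is a simplified illustration and not the code in inference_example.py: it assumes the softmax is renormalised over the five letter logits only, that the last Letter: <X> occurrence in the generation is the one that counts, and that the logits have already been extracted into a plain dict.

```python
import math
import re

LETTERS = ["A", "B", "C", "D", "E"]
LETTER_RE = re.compile(r"Letter:\s*([A-E])\b")


def parse_letter(generated_text):
    """Return the last 'Letter: X' in the generation, or None if absent."""
    matches = LETTER_RE.findall(generated_text)
    return matches[-1] if matches else None


def letter_probs(letter_logits):
    """Numerically stable softmax over the five letter logits.

    letter_logits: dict mapping "A".."E" to the logit read at the
    position of the predicted letter token.
    """
    peak = max(letter_logits[l] for l in LETTERS)
    exps = {l: math.exp(letter_logits[l] - peak) for l in LETTERS}
    total = sum(exps.values())
    return {l: exps[l] / total for l in LETTERS}


def ensemble_probs(per_adapter_probs):
    """Per-adapter mean of the letter probabilities."""
    k = len(per_adapter_probs)
    return {l: sum(p[l] for p in per_adapter_probs) / k for l in LETTERS}
```

Subtracting the largest logit before exponentiating leaves the probabilities unchanged and avoids overflow for large logits.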

This differs from the earlier ORPO scorer (v1), which used first-position logit-EV decoding (one forward pass). v2's scoring head is not a drop-in replacement for v1's; see "Migrating from v1". The system prompt embedded in inference_example.py is the exact training prompt — do not modify it without retraining.

Expected performance (5-fold CV on Guo 2024)

| metric | mean across folds | SD across folds |
| --- | --- | --- |
| Pearson r | 0.729 | 0.090 |
| Cohen's κ-linear | 0.575 | 0.082 |
| Krippendorff α-ordinal | 0.619 | 0.085 |

Per-fold Pearson: 0.844, 0.811, 0.641, 0.696, 0.652. Compared with human inter-rater reliability (IRR; Neil/Melody Pearson on dual-coded essays):

| fold | model Pearson | human IRR | % of IRR |
| --- | --- | --- | --- |
| 0 | 0.844 | 0.905 | 93% |
| 1 | 0.811 | 0.946 | 86% |
| 2 | 0.641 | 0.874 | 73% |
| 3 | 0.696 | 0.773 | 90% |
| 4 | 0.652 | 0.769 | 85% |
| mean | 0.729 | 0.853 | 85% |
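
The "% of IRR" column is the model's Pearson divided by the human IRR for the same fold. A short check of that arithmetic, using only the values reported in the table:

```python
model_r = [0.844, 0.811, 0.641, 0.696, 0.652]
human_irr = [0.905, 0.946, 0.874, 0.773, 0.769]

# Per-fold ratio, rounded to whole percent.
pct_of_irr = [round(100 * m / h) for m, h in zip(model_r, human_irr)]

mean_model = sum(model_r) / len(model_r)
mean_irr = sum(human_irr) / len(human_irr)
mean_pct = round(100 * mean_model / mean_irr)

print(pct_of_irr)                                # [93, 86, 73, 90, 85]
print(round(mean_model, 3), round(mean_irr, 3))  # 0.729 0.853
print(mean_pct)                                  # 85
```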

The protocol is pre-registered single-recipe: 3-seed low-IRR-filtered SFT mean ensemble, applied uniformly across all 5 folds. An alternative cross-recipe 6-model ensemble (3 regular SFT + 3 low-IRR-filtered SFT) was tested and tied on mean Pearson (0.729) at 2× the inference cost, so the single-recipe ensemble is preferred on parsimony.

Confidence tier — use it for triage

Stratifying 5-fold holdout predictions by 3-seed agreement gives a monotonic agreement–performance pattern:

| confidence_tier | coverage | Pearson r | κ-linear | exact-acc | within-1 |
| --- | --- | --- | --- | --- | --- |
| HIGH (range=0, all 3 agree) | 57% | 0.815 | 0.692 | 65.5% | 89.4% |
| MEDIUM (range=1) | 33% | 0.558 | 0.368 | 45.1% | 83.2% |
| LOW (range≥2) | 10% | 0.408 | 0.201 | 25.8% | 65.4% |

The 10% LOW-tier items are especially strong candidates for human review or exclusion from sensitive analyses. On the 57% HIGH-tier majority, the ensemble is +0.086 Pearson over the all-items average. The signal reflects essay-intrinsic ambiguity (where human raters also disagree most), not just model-internal noise.
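
A stratified analysis of this kind can be reproduced on one's own labelled data with a few lines of standard-library Python. The sketch below is not the evaluation script behind the table: the gold_1_5 key is a hypothetical name for a human label on the 1..5 scale, and correlating the EV score (rather than the argmax score) with the gold label is an assumption.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Plain Pearson correlation; NaN if either side has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def by_tier(rows):
    """Coverage and Pearson r per confidence tier.

    rows: dicts with 'confidence_tier', 'ensemble_ev_score_1_5', and a
    hypothetical 'gold_1_5' human label.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row["confidence_tier"]].append(row)
    report = {}
    for tier, members in groups.items():
        preds = [m["ensemble_ev_score_1_5"] for m in members]
        gold = [m["gold_1_5"] for m in members]
        report[tier] = {
            "coverage": len(members) / len(rows),
            "pearson": pearson(preds, gold),
        }
    return report
```

Tiers with very few items give unstable correlations, so per-tier results are best read alongside their coverage.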

Generalisation — topic vs style

Paired topic-swap test (43 Guo essays rewritten to preserve epistemic stance, i.e. hedges, absolutism, and charity, while changing only the surface topic from religion to politics / nutrition / ethics / workplace / parenting / art):

  • Pearson(originals vs Guo letter) = 0.893
  • Pearson(swap twins vs Guo letter) = 0.826 (only −0.07)
  • 81% of paired predictions match exactly; 93% within 1 letter

In this small paired rewrite test, predictions were relatively stable under surface-topic swaps, suggesting some topic robustness for human-style argumentative writing.
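
The two paired-agreement figures (exact match and within 1 letter) are simple to compute for any set of original/rewritten pairs. A minimal sketch, with a hypothetical function name and toy inputs that are not taken from the Guo corpus:

```python
def paired_agreement(orig_letters, swap_letters):
    """Share of pairs with identical letters, and with letters at most 1 apart."""
    if len(orig_letters) != len(swap_letters) or not orig_letters:
        raise ValueError("need two non-empty letter lists of equal length")
    # ord() differences equal A=1..E=5 differences for single capital letters.
    gaps = [abs(ord(a) - ord(b)) for a, b in zip(orig_letters, swap_letters)]
    n = len(gaps)
    return {
        "exact": sum(g == 0 for g in gaps) / n,
        "within_1": sum(g <= 1 for g in gaps) / n,
    }


# Toy example only: 3 of 5 pairs match exactly, 4 of 5 are within one letter.
print(paired_agreement(["A", "B", "C", "D", "E"], ["A", "B", "D", "D", "C"]))
```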

Known systematic biases

The model is conservative — on 5-fold pooled holdouts it pulls extreme letters (A "arrogant" and E "deeply humble") toward the middle. The effect is most visible on the minority classes (A, B, C together = 34% of training data).

Training

| setting | value |
| --- | --- |
| Base | unsloth/Qwen3.6-27B (4-bit NF4 via bitsandbytes, i.e. QLoRA) |
| Adapter | LoRA r=32, α=64, no dropout, target: q/k/v/o + gate/up/down_proj |
| Recipe | Supervised fine-tuning (SFT) on Guo letter labels + auto-generated theory-grounded rationales |
| Data | 359 Guo 2024 essays = 410 total − 51 "low-IRR" essays (Neil/Melody letter difference ≥ 1) |
| Optimizer | AdamW, lr 5e-5 cosine, weight decay 0.01 |
| Effective batch | 16 |
| Steps | 200 |
| Seeds | 42, 43, 44 (three independent LoRA initialisations) |

Design choices

Low-IRR-filtered training. We exclude the 51 essays where human raters disagreed by at least one letter and train on the remaining 359 higher-agreement essays. In 5-fold comparison, this recipe outperformed regular SFT on every fold (+0.035 mean Pearson).
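
The filter itself is a one-line rule: drop any dual-coded essay whose two human letters differ. A minimal sketch, assuming each essay is a dict with hypothetical keys rater_a and rater_b, and assuming that essays coded by only one rater (rater_b is None) are kept:

```python
def split_low_irr(essays, min_gap=1):
    """Split essays into (kept, low_irr) by human letter disagreement.

    An essay is "low-IRR" when both raters coded it and their letters
    differ by at least min_gap. The keys 'rater_a' and 'rater_b' are
    hypothetical; single-coded essays are kept (an assumption).
    """
    kept, low_irr = [], []
    for essay in essays:
        a, b = essay["rater_a"], essay.get("rater_b")
        if b is not None and abs(ord(a) - ord(b)) >= min_gap:
            low_irr.append(essay)
        else:
            kept.append(essay)
    return kept, low_irr
```

Applied to the full corpus, a rule of this form would be expected to yield the 359 / 51 split reported in the training table.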

Three-seed ensemble. Single LoRA seeds vary noticeably across folds. Averaging three independently initialised adapters improves robustness and provides the agreement-based HIGH / MEDIUM / LOW confidence tier. Use the single-adapter path when latency matters.

Migrating from v1

This release supersedes tmadl/IH-Qwen3.5-ORPO-Guo, which is now deprecated.

Main differences from v1:

  • v2 improves pooled Pearson from 0.689 to 0.729 and Krippendorff α-ordinal from 0.451 to 0.619.
  • v2 uses a 3-seed SFT ensemble rather than a single-seed ORPO adapter.
  • v2 uses all five A..E levels more reliably; v1 often behaved closer to a binary arrogant-vs-humble classifier.
  • v2 reports an agreement-based HIGH / MEDIUM / LOW confidence tier.
  • v2 is slower in full-ensemble mode because it generates a short rationale before reading the final letter; use single-adapter mode when latency matters.
  • v2 reports native A..E / 1..5 scores. Do not mix v1 and v2 scores in the same analysis without recalibration.

For all new work, use v2.

Limitations

  • Language: trained on English text only. No claims about other languages.
  • Domain: trained on Reddit religion-discussion posts (Guo 2024). Performance on technical, narrative, or non-argumentative text is not validated.
  • Length: inputs are truncated at 1024 tokens, and up to 240 tokens are generated for the rationale. Very long passages are scored on the truncated prefix.
  • Inference cost: 3 adapters loaded sequentially take ≈3× single-model time (≈15 min for 1000 essays on RTX 6000 Pro). For latency-sensitive use, deploy a single seed.
  • Single-rater: the ensemble outputs a single automated estimate per text. It is not a substitute for multiple trained human raters when consensus IH scores are required.
  • Calibration: anchored to Guo 2024's text-marker coding scheme; absolute scores should be interpreted relative to the training distribution, not as universal "humility units".
  • Aggregate where possible. Aggregate analyses over many texts are more reliable than interpreting any single text score.
  • No individual decision use. The scorer has not been validated for decisions about identifiable people, with or without consent.

License

The LoRA adapter weights and accompanying files are licensed under CC-BY-NC-4.0 — see LICENSE. CC BY-NC 4.0 permits non-commercial use, including research, teaching, personal experimentation, and other uses not primarily intended for commercial advantage or monetary compensation.

Commercial uses are not granted under CC BY-NC 4.0. Contact the rights holder for a separate commercial license — see COMMERCIAL.md.

The base model (unsloth/Qwen3.6-27B) is Apache 2.0 and is not redistributed here. The Guo 2024 EMNLP training corpus is governed by its own license; see NOTICE for full third-party attribution.

Copyright © 2026 Tamas Madl. All rights not granted under CC BY-NC 4.0 or a separate written commercial license are reserved.

Citation

If you use this model, please cite:

@misc{madl_ih_scorer_v2_2026,
  author       = {Madl, Tamas},
  title        = {IH-Scorer v2 — Intellectual Humility 3-seed LoRA Ensemble on Qwen3.6-27B},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/tmadl/IH-scorer-v2}},
  note         = {Model repository}
}

@misc{madl2026icscorer,
  author       = {Madl, Tamas},
  title        = {Text-measured cognitive complexity predicts belief revision in AI persuasion},
  year         = {2026},
  howpublished = {PsyArXiv preprint},
  url          = {https://osf.io/preprints/psyarxiv/mdxvs_v1}
}

@inproceedings{guo2024humility,
  author    = {Guo, Xiaobo and Potnis, Neil and Yu, Melody and Gillani, Nabeel and Vosoughi, Soroush},
  title     = {The Computational Anatomy of Humility: Modeling Intellectual
               Humility in Online Public Discourse},
  booktitle = {Proceedings of EMNLP 2024},
  year      = {2024},
  url       = {https://github.com/xiaobo-guo/The-Computational-Anatomy-of-Humility-Modeling-Intellectual-Humility-in-Online-Public-Discourse}
}

If your use case concerns AI dialogue, reflective agency, belief change, or receiver-side examinability, please also cite:

@unpublished{madl_lazar_receiver_side_examinability,
  author       = {Madl, Tamas and Lazar, Sara W.},
  title        = {A Receiver-Side Blind Spot in AI Safety},
  note         = {Manuscript in review},
  year         = {2026}
}

Additional instrument citations are in NOTICE.

Contact

Tamas Madl, tamas.madl@ofai.at
Austrian Research Institute for Artificial Intelligence (OFAI)
