====== Perplexity statistics ======
Mean PPL(Q)                   :   8.855147 ±   0.069027
Mean PPL(base)                :   8.445938 ±   0.065177
Cor(ln(PPL(Q)), ln(PPL(base))):  98.10%
Mean ln(PPL(Q)/PPL(base))     :   0.047313 ±   0.001512
Mean PPL(Q)/PPL(base)         :   1.048450 ±   0.001585
Mean PPL(Q)-PPL(base)         :   0.409209 ±   0.013614

====== KL divergence statistics ======
Mean    KLD:   0.081621 ±   0.000566
Maximum KLD:  30.775106
99.9%   KLD:   2.516853
99.0%   KLD:   0.775507
Median  KLD:   0.033271
10.0%   KLD:   0.000066
 5.0%   KLD:   0.000007
 1.0%   KLD:   0.000000
Minimum KLD:  -0.000012

====== Token probability statistics ======
Mean    Δp: -1.121 ± 0.023 %
Maximum Δp:  95.704%
99.9%   Δp:  51.255%
99.0%   Δp:  23.109%
95.0%   Δp:   9.434%
90.0%   Δp:   4.560%
75.0%   Δp:   0.316%
Median  Δp:  -0.008%
25.0%   Δp:  -1.786%
10.0%   Δp:  -8.436%
 5.0%   Δp: -14.687%
 1.0%   Δp: -32.445%
 0.1%   Δp: -73.467%
Minimum Δp: -99.970%
RMS Δp    :  8.907 ± 0.055 %
Same top p: 87.915 ± 0.084 %
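
The figures above can be cross-checked against each other, and the per-token quantities can be illustrated on a toy example. The sketch below is not the tool's implementation. It assumes that KLD is the Kullback-Leibler divergence from the base model's token distribution to the quantized model's, that Δp is the change in probability assigned to the correct token (quantized minus base), and that "same top" means both models rank the same token first. The helper names (softmax, kl_divergence, token_metrics) and the toy logits are hypothetical.

```python
import math

# Part 1: consistency check using values copied from the statistics above.
ppl_q = 8.855147
ppl_base = 8.445938
mean_log_ratio = 0.047313

ratio_of_means = ppl_q / ppl_base          # compare with "Mean PPL(Q)/PPL(base)"
ratio_from_logs = math.exp(mean_log_ratio) # exp of "Mean ln(PPL(Q)/PPL(base))"
difference = ppl_q - ppl_base              # compare with "Mean PPL(Q)-PPL(base)"


# Part 2: toy per-token metrics under the assumed definitions (hypothetical helpers).
def softmax(logits):
    """Convert logits to probabilities, subtracting the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p_base, p_q):
    """KL(base || quantized) in nats; terms with p_base == 0 contribute nothing."""
    return sum(pb * math.log(pb / pq) for pb, pq in zip(p_base, p_q) if pb > 0.0)


def token_metrics(base_logits, q_logits, correct_id):
    """Return assumed-definition KLD, delta-p, and top-token agreement for one position."""
    p_base = softmax(base_logits)
    p_q = softmax(q_logits)
    return {
        "kld": kl_divergence(p_base, p_q),
        "delta_p": p_q[correct_id] - p_base[correct_id],
        "same_top": p_base.index(max(p_base)) == p_q.index(max(p_q)),
    }


if __name__ == "__main__":
    print(f"ratio of means : {ratio_of_means:.6f}")
    print(f"exp(mean ln)   : {ratio_from_logs:.6f}")
    print(f"difference     : {difference:.6f}")
    # Toy logits over a 3-token vocabulary; token 0 is taken as the correct one.
    print(token_metrics([2.0, 1.0, 0.0], [1.5, 1.2, 0.1], correct_id=0))
```

Both ratio estimates land near the reported 1.048450, which suggests the perplexity lines are internally consistent. The slightly negative minimum KLD in the log is most plausibly floating-point rounding, since an exact KL divergence cannot be negative.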