====== Perplexity statistics ======
Mean PPL(Q)                   :   8.949256 ±   0.069263
Mean PPL(base)                :   8.445938 ±   0.065177
Cor(ln(PPL(Q)), ln(PPL(base))):  97.57%
Mean ln(PPL(Q)/PPL(base))     :   0.057885 ±   0.001704
Mean PPL(Q)/PPL(base)         :   1.059593 ±   0.001805
Mean PPL(Q)-PPL(base)         :   0.503318 ±   0.015365

====== KL divergence statistics ======
Mean    KLD:   0.104258 ±   0.000668
Maximum KLD:  18.479706
99.9%   KLD:   3.157249
99.0%   KLD:   0.989329
Median  KLD:   0.042291
10.0%   KLD:   0.000107
 5.0%   KLD:   0.000012
 1.0%   KLD:   0.000000
Minimum KLD:  -0.000004

====== Token probability statistics ======
Mean    Δp:  -1.473 ± 0.025 %
Maximum Δp:  99.114%
99.9%   Δp:  53.562%
99.0%   Δp:  24.617%
95.0%   Δp:  10.221%
90.0%   Δp:   4.865%
75.0%   Δp:   0.294%
Median  Δp:  -0.017%
25.0%   Δp:  -2.146%
10.0%   Δp:  -9.929%
 5.0%   Δp: -17.075%
 1.0%   Δp: -37.839%
 0.1%   Δp: -77.867%
Minimum Δp: -99.916%
RMS Δp    :   9.937 ± 0.057 %
Same top p:  86.826 ± 0.088 %
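
The summary figures above can be cross-checked against each other. The sketch below recomputes the PPL ratio and difference from the two reported means, and shows one way a per-token KL divergence could be computed. The helper `kl_divergence`, the toy distributions, and the KL(base || Q) direction are assumptions made for illustration; they are not taken from the tool that produced this log.

```python
import math

# Values copied from the perplexity statistics above.
ppl_q = 8.949256
ppl_base = 8.445938
mean_log_ratio = 0.057885

# Ratio of the two mean perplexities, and the same ratio recovered
# by exponentiating the reported mean log-ratio.
ratio_from_means = ppl_q / ppl_base
ratio_from_log = math.exp(mean_log_ratio)
difference = ppl_q - ppl_base


def kl_divergence(p_base, p_q):
    """KL(base || Q) in nats for two distributions over the same vocabulary.

    Hypothetical helper: the direction base -> Q is an assumption.
    """
    return sum(pb * math.log(pb / pq) for pb, pq in zip(p_base, p_q) if pb > 0.0)


# Hypothetical toy next-token distributions (not data from the log).
toy_base = [0.7, 0.2, 0.1]
toy_q = [0.6, 0.3, 0.1]
toy_kld = kl_divergence(toy_base, toy_q)

print(f"ratio from means: {ratio_from_means:.6f}")
print(f"ratio from log  : {ratio_from_log:.6f}")
print(f"difference      : {difference:.6f}")
print(f"toy KLD         : {toy_kld:.6f}")
```

Both ratio estimates should agree with the reported Mean PPL(Q)/PPL(base) to about six decimal places. Since KL divergence is non-negative by definition, the slightly negative Minimum KLD in the log is most likely floating-point rounding rather than a real value.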