====== Perplexity statistics ======
Mean PPL(Q)                   :   9.141469 ±   0.071687
Mean PPL(base)                :   8.445938 ±   0.065177
Cor(ln(PPL(Q)), ln(PPL(base))):  97.40%
Mean ln(PPL(Q)/PPL(base))     :   0.079135 ±   0.001778
Mean PPL(Q)/PPL(base)         :   1.082351 ±   0.001925
Mean PPL(Q)-PPL(base)         :   0.695531 ±   0.016891

====== KL divergence statistics ======
Mean    KLD:   0.115653 ±   0.000636
Maximum KLD:  16.755222
99.9%   KLD:   2.903406
99.0%   KLD:   1.030481
Median  KLD:   0.051309
10.0%   KLD:   0.000107
 5.0%   KLD:   0.000012
 1.0%   KLD:   0.000000
Minimum KLD:  -0.000004

====== Token probability statistics ======
Mean    Δp: -1.357 ± 0.027 %
Maximum Δp: 99.787%
99.9%   Δp: 54.627%
99.0%   Δp: 26.888%
95.0%   Δp: 11.789%
90.0%   Δp:  5.843%
75.0%   Δp:  0.415%
Median  Δp: -0.011%
25.0%   Δp: -2.096%
10.0%   Δp: -10.461%
 5.0%   Δp: -18.305%
 1.0%   Δp: -39.181%
 0.1%   Δp: -79.693%
Minimum Δp: -99.947%
RMS Δp    : 10.469 ± 0.057 %
Same top p: 85.489 ± 0.091 %
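These statistics appear to come from llama.cpp's perplexity tool run in KL-divergence mode, comparing a model Q (presumably a quantized one) against a base model. The Python sketch below is a minimal illustration, not llama.cpp's code. It recomputes the three derived perplexity rows from the two reported means, which works because PPL = exp(mean negative log-likelihood), so ln(PPL(Q)/PPL(base)) is just the difference of the mean NLLs. It also shows one plausible way to compute the per-token rows. The definitions used there are assumptions: KLD as KL(base || Q) over the full vocabulary, Δp as the change in probability of the token that actually came next, "Same top p" as the share of positions where both models rank the same token first, and linear-interpolation percentiles. The helper names (derived_ppl_rows, token_stats, percentile, summarize) and the toy probability vectors are hypothetical.

```python
import math

# Mean perplexities copied from the "Perplexity statistics" block.
PPL_Q = 9.141469
PPL_BASE = 8.445938


def derived_ppl_rows(ppl_q, ppl_base):
    """Recompute the ratio/difference rows from the two mean perplexities.

    PPL = exp(mean NLL), so ln(PPL_Q / PPL_base) equals the difference
    of the two mean negative log-likelihoods.
    """
    ln_ratio = math.log(ppl_q / ppl_base)
    return ln_ratio, math.exp(ln_ratio), ppl_q - ppl_base


def token_stats(p_base, p_q, target):
    """Per-position metrics under ASSUMED definitions (hypothetical helper)."""
    # KL(base || Q); terms where p_base == 0 contribute nothing.
    kld = sum(pb * (math.log(pb) - math.log(pq))
              for pb, pq in zip(p_base, p_q) if pb > 0)
    # Delta-p: change in probability of the token that actually came next.
    dp = p_q[target] - p_base[target]
    # Do both models put their highest probability on the same token?
    top_base = max(range(len(p_base)), key=lambda i: p_base[i])
    top_q = max(range(len(p_q)), key=lambda i: p_q[i])
    return kld, dp, top_base == top_q


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (one convention among several)."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def summarize(positions):
    """Aggregate per-position stats into log-style summary numbers."""
    klds, dps, same = [], [], 0
    for p_base, p_q, target in positions:
        kld, dp, same_top = token_stats(p_base, p_q, target)
        klds.append(kld)
        dps.append(dp)
        same += int(same_top)
    n = len(positions)
    klds.sort()
    return {
        "mean_kld": sum(klds) / n,
        "median_kld": percentile(klds, 0.5),
        "mean_dp_pct": 100 * sum(dps) / n,
        "rms_dp_pct": 100 * math.sqrt(sum(d * d for d in dps) / n),
        "same_top_pct": 100 * same / n,
    }


if __name__ == "__main__":
    ln_ratio, ratio, diff = derived_ppl_rows(PPL_Q, PPL_BASE)
    print(f"Mean ln(PPL(Q)/PPL(base)): {ln_ratio:.6f}")  # ≈ 0.079135
    print(f"Mean PPL(Q)/PPL(base)    : {ratio:.6f}")     # ≈ 1.082351
    print(f"Mean PPL(Q)-PPL(base)    : {diff:.6f}")      # ≈ 0.695531

    # Hypothetical toy data: (p_base, p_q, index of the actual next token).
    toy = [
        ([0.7, 0.2, 0.1], [0.6, 0.3, 0.1], 0),
        ([0.1, 0.8, 0.1], [0.4, 0.5, 0.1], 1),
    ]
    print(summarize(toy))
```

The three recomputed rows agree with the log to the printed precision. Because KL divergence is non-negative in exact arithmetic, the slightly negative Minimum KLD (-0.000004) is most likely floating-point rounding on positions where the two distributions are nearly identical, not a real effect.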