====== Perplexity statistics ======
Mean PPL(Q)                   :   8.950671 ±   0.069566
Mean PPL(base)                :   8.445938 ±   0.065177
Cor(ln(PPL(Q)), ln(PPL(base))):  97.80%
Mean ln(PPL(Q)/PPL(base))     :   0.058043 ±   0.001625
Mean PPL(Q)/PPL(base)         :   1.059760 ±   0.001722
Mean PPL(Q)-PPL(base)         :   0.504733 ±   0.014789

====== KL divergence statistics ======
Mean    KLD:   0.095434 ±   0.000689
Maximum KLD:  32.811981
99.9%   KLD:   2.871047
99.0%   KLD:   0.908818
Median  KLD:   0.038622
10.0%   KLD:   0.000091
 5.0%   KLD:   0.000010
 1.0%   KLD:   0.000000
Minimum KLD:  -0.000006

====== Token probability statistics ======
Mean    Δp:  -1.299 ± 0.025 %
Maximum Δp:  99.105%
99.9%   Δp:  51.217%
99.0%   Δp:  24.267%
95.0%   Δp:  10.059%
90.0%   Δp:   4.891%
75.0%   Δp:   0.312%
Median  Δp:  -0.013%
25.0%   Δp:  -1.967%
10.0%   Δp:  -9.334%
 5.0%   Δp: -16.184%
 1.0%   Δp: -35.890%
 0.1%   Δp: -76.357%
Minimum Δp: -99.865%
RMS Δp    :   9.560 ± 0.056 %
Same top p:  87.237 ± 0.086 %
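
The three comparison figures in the perplexity statistics (log-ratio, ratio, difference) appear to be mutually consistent with the two mean PPL values. The sketch below is only an arithmetic cross-check on the reported numbers, not the tool's own computation; the function name `check_consistency` and the tolerance are assumptions made for illustration.

```python
import math

# Values copied verbatim from the perplexity statistics in this log.
PPL_Q = 8.950671            # Mean PPL(Q)
PPL_BASE = 8.445938         # Mean PPL(base)
MEAN_LOG_RATIO = 0.058043   # Mean ln(PPL(Q)/PPL(base))
REPORTED_RATIO = 1.059760   # Mean PPL(Q)/PPL(base)
REPORTED_DIFF = 0.504733    # Mean PPL(Q)-PPL(base)


def check_consistency(tol=1e-5):
    """Compare the reported ratio and difference against values
    recomputed from the two mean perplexities (hypothetical helper)."""
    ratio_from_means = PPL_Q / PPL_BASE
    ratio_from_log = math.exp(MEAN_LOG_RATIO)
    diff_from_means = PPL_Q - PPL_BASE
    return {
        "ratio_from_means": ratio_from_means,
        "ratio_from_log": ratio_from_log,
        "diff_from_means": diff_from_means,
        "ratio_matches_means": abs(ratio_from_means - REPORTED_RATIO) < tol,
        "ratio_matches_log": abs(ratio_from_log - REPORTED_RATIO) < tol,
        "diff_matches": abs(diff_from_means - REPORTED_DIFF) < tol,
    }


if __name__ == "__main__":
    for name, value in check_consistency().items():
        print(f"{name}: {value}")
```

If all three flags come back true, the reported ratio can be read as roughly a 6% perplexity increase of Q over base, whichever of the three lines is consulted.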