Here we explore how bootstrap results change as we vary the parameters of the procedure. The parameters we vary are the number of bootstrap samples \(B\), the size \(m\) of each bootstrap sample, and the dispersion of the original sample (the variance \(\sigma^2\) of the DGP).
We focus on a measure we call the interpercentile range (IPR): the difference between the 95th and 5th percentiles of the bootstrap distribution.
The problem that inspired this investigation is a hypothesis test for MSE (mean squared error). Being able to perform this test matters because it lets us distinguish the performance of two models, e.g. \(f_1\) and \(f_2\): a difference between their MSEs could arise just by chance, and an MSE hypothesis test lets us determine whether the difference is due to chance or to a true performance difference.
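The idea above can be sketched as follows. This is a minimal illustration of my own (the residual vectors, sample sizes, and the percentile-interval decision rule are assumptions, not the original code): bootstrap the difference in MSE between two hypothetical models on the same test set and check whether zero lies inside the central 90% interval.

```r
# sketch (not from the original text): bootstrap test for a difference in MSE
set.seed(1)
resid1 = rnorm(500, 0, 10)   # pretend residuals of model f1 on the test set
resid2 = rnorm(500, 0, 11)   # pretend residuals of model f2 on the same set
B = 1000                     # number of bootstrap samples
n = length(resid1)
diff_boot = rep(NA, B)
for (b in 1:B) {
  idx = sample(n, n, replace = TRUE)                   # resample paired residuals
  diff_boot[b] = mean(resid1[idx]^2) - mean(resid2[idx]^2)
}
# if 0 falls outside this interval, the performance difference
# is unlikely to be due to chance alone
quantile(diff_boot, probs = c(.05, .95))
```

Resampling the residuals in pairs keeps each bootstrap replicate evaluated on the same observations for both models.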
#200104 bootstrapping exploration
#preliminaries -------------------------------------------------
#objective: elaborate a test to see if two models have different
#performances, for this we need to study how the measure of performance
#behaves. we study the distribution of MSE first.
#other endeavours: study how accuracy changes, and other measures of
#classification models.
library(tidyverse)
We create a random sample from a normal distribution. This simulated sample stands in for the residuals of a model.
#build sample ------------------------------------------
mean1 = 10
sig1 = 10
n = 100
sample1 = rnorm(n,mean1,sig1)
mean(sample1)
## [1] 10.76882
mm = 100000 #number of monte carlo draws (fresh samples from the DGP, not yet a bootstrap)
vec_boots = rep(NA,mm)
for (ii in 1:mm) {
sample1 = rnorm(n,mean1,sig1)
mean_sample = mean(sample1)
vec_boots[ii] = mean_sample
}
hist(vec_boots,breaks=100)
mean(vec_boots) #mean of means
## [1] 9.99758
var(vec_boots) #var of the mean; theory says sigma^2/n = 100/100 = 1
## [1] 0.9977212
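The simulated variance of the mean matches the standard result \(\mathrm{Var}(\bar{X}) = \sigma^2/n\). A self-contained check with the parameters used above:

```r
# theoretical variance of the sample mean for the DGP above
sig1 = 10                  # sd of the DGP (as above)
n = 100                    # sample size (as above)
var_theory = sig1^2 / n    # Var(mean) = sigma^2 / n
var_theory                 # 1, matching the simulated value of ~0.998
```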
#mse problem
#we want to build a hypothesis test for whether two values of mse differ
#could the difference between the two values be due to chance alone?
#answering this is the purpose of hypothesis testing, and of the bootstrap here
set.seed(1)
n = 1000
sample1 = rnorm(n,mean1,sig1)
#sample1 #pretend is our training set
#our statistic, for the sake of this problem, is the mean of squares
#(an mse); it could equally be the plain mean or another statistic
#bootstrapping loop
mm = 10000
vec_boots = rep(NA,mm)
set.seed(111)
for (ii in 1:mm) {
sample2 = sample(sample1,n,replace=TRUE)
mean_sample = mean(sample2^2)
vec_boots[ii] = mean_sample
#print(ii)
}
vec_boots %>% hist(breaks=100)
quantile_vec = vec_boots %>% quantile(probs=c(.05,.5,.95))
quantile_vec
## 5% 50% 95%
## 191.6941 204.4745 217.8718
#ipr inter percentile range
ipr = quantile_vec[3] - quantile_vec[1]
ipr
## 95%
## 26.17775
IPR is the interpercentile range: the difference between the 95th and 5th percentiles of the bootstrap distribution.
It measures the span of values that the MSE statistic can plausibly assume. If another model's MSE falls outside this range, we can say that the two models have different performances.
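The computation above can be wrapped in a small helper. The function name and defaults are my own, not from the original code; it reproduces the same resample-statistic-quantile steps for any statistic.

```r
# helper (my own naming): bootstrap IPR of a statistic over B resamples
boot_ipr = function(x, stat = function(s) mean(s^2), B = 300) {
  n = length(x)
  boots = replicate(B, stat(sample(x, n, replace = TRUE)))
  q = quantile(boots, probs = c(.05, .95))
  unname(q[2] - q[1])      # 95th minus 5th percentile
}
set.seed(1)
boot_ipr(rnorm(1000, 10, 10))   # comparable to the ~26 value reported above
```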
How does the IPR change when we change the number of bootstrap samples?
#for mean square error-------------------------
#suppose error is gaussian
mm = 500
vec_boots = rep(NA,mm)
set.seed(111)
for (ii in 1:mm) {
sample2 = sample(sample1,n,replace=TRUE)
mean_sample = mean(sample2^2)
vec_boots[ii] = mean_sample
#print(ii)
}
vec_boots %>% hist(breaks=100)
quantile_vec = vec_boots %>% quantile(probs=c(.05,.5,.95))
quantile_vec
## 5% 50% 95%
## 191.7129 204.7830 217.8538
#ipr inter percentile range
ipr = quantile_vec[3] - quantile_vec[1]
ipr
## 95%
## 26.14091
It does not change much: with mm = 500 the IPR (26.14) is still close to the 26.18 obtained with mm = 10000. The histograms are noticeably different, though.
How does the IPR change with the number of bootstrap samples mm? We sweep mm from 1 to 10000 and record the IPR at each value.
vec_B = c(seq(1,100,by=1),seq(100,1000,by=10),seq(1000,10000,by=100))
len = length(vec_B)
vec_ipr_mm = rep(NA,len)
for (jj in 1:len){
mm = vec_B[jj]
vec_boots = rep(NA,mm)
#set.seed(111)
for (ii in 1:mm) {
sample2 = sample(sample1,n,replace=TRUE)
mean_sample = mean(sample2^2)
vec_boots[ii] = mean_sample
#print(paste(ii,";",jj))
}
quantile_vec = vec_boots %>% quantile(probs=c(.05,.5,.95))
ipr = quantile_vec[3] - quantile_vec[1]
vec_ipr_mm[jj] = ipr
}
plot(vec_B,vec_ipr_mm)
lines(vec_B,vec_ipr_mm,col='blue')
abline(h=26.14091,col='black')
#the black line is the bootstrap ipr with mm=500
#500 is sufficient for most of our problems
The black line is the IPR value for mm = 500.
xmin = 0
xmax = 50
ymin = 0
ymax = 35
plot(vec_B,vec_ipr_mm, xlim = c(xmin,xmax), ylim=c(ymin, ymax))
lines(vec_B,vec_ipr_mm,col='blue')
abline(h=26.14091,col='black')
#a view closer to zero
We notice that after mm = 50 the IPR is fairly stable. As a rule of thumb, let's take 300 bootstrap samples as enough.
Are there situations where we need an enormous amount of bootstrap samples?
xmin = 50
xmax = 100
ymin = 0
ymax = 35
plot(vec_B,vec_ipr_mm, xlim = c(xmin,xmax), ylim=c(ymin, ymax))
lines(vec_B,vec_ipr_mm,col='blue')
abline(h=26.14091,col='black')
#moving a little to the right we can see how ipr
#stabilizes
xmin = 100
xmax = 1000
ymin = 0
ymax = 35
plot(vec_B,vec_ipr_mm, xlim = c(xmin,xmax), ylim=c(ymin, ymax))
lines(vec_B,vec_ipr_mm,col='blue')
abline(h=26.14091,col='black')
xmin = 1000
xmax = 10000
ymin = 0
ymax = 35
plot(vec_B,vec_ipr_mm, xlim = c(xmin,xmax), ylim=c(ymin, ymax))
lines(vec_B,vec_ipr_mm,col='blue')
abline(h=26.14091,col='black')
All of the images above show the same graph at different zoom levels.
Here we investigate how changing the original sample size changes the IPR.
In the example below we use a sample size of 100, much smaller than the 1000 used before. Let's see how this change impacts the IPR.
# shrink test sample ----------------------------------------------------
#suppose now that we have only 100 observations in the test sample
sample1_1 = sample1[1:100]
#sample1_1 #smaller test sample
mm = 300 #300 we adopt as the default
vec_boots = rep(NA,mm)
set.seed(111)
len = length(sample1_1)
for (ii in 1:mm) {
sample2 = sample(sample1_1,len,replace=TRUE)
mean_sample = mean(sample2^2)
vec_boots[ii] = mean_sample
#print(ii)
}
vec_boots %>% hist(breaks=100)
quantile_vec = vec_boots %>% quantile(probs=c(.05,.5,.95))
quantile_vec
## 5% 50% 95%
## 166.3975 200.9483 239.3798
ipr = quantile_vec[3] - quantile_vec[1]
ipr
## 95%
## 72.98228
This is much bigger than the 26.14 obtained with a sample size of 1000: with a sample size of 100 the IPR is 72.98. That is, we have much less confidence in our MSE measure, hence the dispersion measure is larger.
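This increase is roughly what the central limit theorem predicts: the MSE is an average of \(n\) i.i.d. squared residuals, so its standard deviation, and hence the IPR, shrinks like \(1/\sqrt{n}\). Under that approximation,

\[
\mathrm{IPR}_{100} \approx \mathrm{IPR}_{1000} \times \sqrt{\tfrac{1000}{100}} \approx 26.14 \times 3.16 \approx 82.6,
\]

which is of the same order as the observed 72.98 (the resampled percentiles are themselves noisy, so an exact match is not expected).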
#test sample size change, ipr change -------------------------------------
#let's see how ipr changes with test sample size
vec_B = c(seq(1,100,by=1),seq(100,1000,by=10))
len = length(vec_B)
vec_ipr_mm = rep(NA,len)
for (jj in 1:len){
nn = vec_B[jj]
sample1_1 = sample1[1:nn]
#sample1_1 #smaller test sample
mm = 300 #300 we adopt as the standard
vec_boots = rep(NA,mm)
set.seed(111)
len_s = length(sample1_1) #use a fresh name, not the outer loop's len
for (ii in 1:mm) {
sample2 = sample(sample1_1,len_s,replace=TRUE)
mean_sample = mean(sample2^2)
vec_boots[ii] = mean_sample
#print(paste(ii,jj))
}
quantile_vec = vec_boots %>% quantile(probs=c(.05,.5,.95))
ipr = quantile_vec[3] - quantile_vec[1]
vec_ipr_mm[jj] = ipr
}
plot(vec_B,vec_ipr_mm)
lines(vec_B,vec_ipr_mm,col='blue')
abline(h=26.14091,col='black')
This graph shows how the IPR changes when we change the size of the test sample (and hence of each bootstrap resample of the MSE).
We see that as the sample size grows the IPR gets smaller, which means the uncertainty of the MSE statistic decreases. The black line is the value 26.14, the IPR obtained earlier with a test sample of size 1000.
xmin = 0
xmax = 50
ymin = 0
ymax = 500
plot(vec_B,vec_ipr_mm, xlim = c(xmin,xmax), ylim=c(ymin, ymax))
lines(vec_B,vec_ipr_mm,col='blue')
abline(h=26.14091,col='black')
This image is a closer view of the graph near 0.
xmin = 50
xmax = 500
ymin = 0
ymax = 500
plot(vec_B,vec_ipr_mm, xlim = c(xmin,xmax), ylim=c(ymin, ymax))
lines(vec_B,vec_ipr_mm,col='blue')
abline(h=26.14091,col='black')
xmin = 50
xmax = 500
ymin = 20
ymax = 150
plot(vec_B,vec_ipr_mm, xlim = c(xmin,xmax), ylim=c(ymin, ymax))
lines(vec_B,vec_ipr_mm,col='blue')
abline(h=26.14091,col='black')
Bootstrap procedures can be used to approximate the distributions of estimators.
Some estimators have complicated distributions that are hard to derive analytically.
Bootstraps can also be used to increase the number of replicates in a statistical procedure: if, for example, the available sample is too small for the IPR to stabilize, we can draw more bootstrap samples from it.
Usually in machine learning, test samples are larger than 300 observations.
xmin = 500
xmax = 1000
ymin = 20
ymax = 150
plot(vec_B,vec_ipr_mm, xlim = c(xmin,xmax), ylim=c(ymin, ymax))
lines(vec_B,vec_ipr_mm,col='blue')
abline(h=26.14091,col='black')
#a view of the 500 to 1000 range
Here we test how changing the dispersion of the DGP (the standard deviation passed to rnorm) changes the IPR.
#change variance of residuals, change ipr -------------------------------------
#how does the interpercentile range change with the dispersion (sd) of the residuals?
mean1=10
seq_var = c(seq(.01,1,by=.01),seq(1,10,by=.1),seq(10,100,by=1))
len = length(seq_var)
vec_ipr_mm = rep(NA,len)
set.seed(123)
for (ss in 1:len) {
sig1 = seq_var[ss]
sample1_2 = rnorm(1000,mean1,sig1)
mm = 300 #300 we adopt as the standard
vec_boots = rep(NA,mm)
set.seed(111)
for (ii in 1:mm) {
sample2 = sample(sample1_2,length(sample1_2),replace=TRUE) #resample all 1000 observations, not len of the sd grid
mean_sample = mean(sample2^2)
vec_boots[ii] = mean_sample
#print(paste(ii,ss))
}
quantile_vec = vec_boots %>% quantile(probs=c(.05,.5,.95))
ipr = quantile_vec[3] - quantile_vec[1]
vec_ipr_mm[ss] = ipr
}
plot(seq_var,vec_ipr_mm)
lines(seq_var,vec_ipr_mm,col='blue')
abline(h=26.14091,col='black')
#the black line marks 26.14, the ipr obtained earlier with sd = 10
#the ipr grows quickly with the dispersion of the residuals
We can see that the IPR grows rapidly, roughly quadratically, with the DGP's standard deviation.
xmin = 0
xmax = 10
ymin = 0
ymax = 50
plot(seq_var,vec_ipr_mm, xlim = c(xmin,xmax), ylim=c(ymin, ymax))
lines(seq_var,vec_ipr_mm,col='blue')
abline(h=26.14091,col='black')
Closer to zero, the standard deviation and the IPR have an approximately linear relationship.
The black line is 26.14, the IPR when the DGP's standard deviation is 10. The test sample size was fixed at 1000 and the number of bootstrap samples at 300.
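A rough calculation, assuming the Gaussian DGP used throughout, explains both regimes. For \(X \sim N(\mu, \sigma^2)\),

\[
\mathrm{Var}(X^2) = \mathrm{E}[X^4] - \mathrm{E}[X^2]^2 = 4\mu^2\sigma^2 + 2\sigma^4,
\]

so the standard deviation of the MSE over a test sample of size \(n\) is \(\sqrt{(4\mu^2\sigma^2 + 2\sigma^4)/n}\): approximately linear in \(\sigma\) while \(4\mu^2\sigma^2\) dominates (small \(\sigma\)) and approximately quadratic once \(2\sigma^4\) dominates (large \(\sigma\)). With \(\mu = 10\), \(\sigma = 10\), \(n = 1000\) this gives \(\sqrt{60000/1000} \approx 7.75\), and since the IPR is roughly \((z_{0.95} - z_{0.05}) \times \mathrm{sd} \approx 3.29 \times 7.75 \approx 25.5\), this agrees with the observed 26.14.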
#test IPR, boots sample size, and variance of DGP -------------------------
vec_B = c(seq(10,1000,by=10),seq(1000,10000,by=200))
vec_var = c(1,10,50,100)
len_var = length(vec_var)
len = length(vec_B)
tb_var = data.frame(matrix(NA,nrow=len,ncol=len_var))
names(tb_var) = paste0("sd",as.character(vec_var))
for (kk in 1:len_var){
set.seed(112)
vv = vec_var[kk]
sample2_1 = rnorm(10000,mean=10,sd=vv)
vec_ipr_mm = rep(NA,len)
for (jj in 1:len) {
nn = vec_B[jj]
sample1_1 = sample2_1[1:nn]
mm = 300 #300 we adopt as the standard
vec_boots = rep(NA,mm)
set.seed(111)
for (ii in 1:mm) {
sample2 = sample(sample1_1,nn,replace=TRUE)
mean_sample = mean(sample2^2)
vec_boots[ii] = mean_sample
}
quantile_vec = vec_boots %>% quantile(probs=c(.05,.5,.95))
ipr = quantile_vec[3] - quantile_vec[1]
vec_ipr_mm[jj] = ipr
}
tb_var[,kk] = vec_ipr_mm
}
xmin = 0
xmax = 10000
ymin = 0
ymax = 15000
plot(vec_B,tb_var[,4],type='l', xlim = c(xmin,xmax), ylim=c(ymin, ymax))
lines(vec_B,tb_var[,3],col='blue')
lines(vec_B,tb_var[,2],col='red')
lines(vec_B,tb_var[,1],col='darkred')
abline(h=26.14091,col='black')
xmin = 0
xmax = 10000
ymin = 0
ymax = 3000
plot(vec_B,tb_var[,4],type='l', xlim = c(xmin,xmax), ylim=c(ymin, ymax))
lines(vec_B,tb_var[,3],col='blue')
lines(vec_B,tb_var[,2],col='red')
lines(vec_B,tb_var[,1],col='darkred')
abline(h=26.14091,col='black')
xmin = 0
xmax = 10000
ymin = 0
ymax = 400
plot(vec_B,tb_var[,4],type='l', xlim = c(xmin,xmax), ylim=c(ymin, ymax))
lines(vec_B,tb_var[,3],col='blue')
lines(vec_B,tb_var[,2],col='red')
lines(vec_B,tb_var[,1],col='darkred')
abline(h=26.14091,col='black')
xmin = 0
xmax = 10000
ymin = 0
ymax = 100
plot(vec_B,tb_var[,4],type='l', xlim = c(xmin,xmax), ylim=c(ymin, ymax))
lines(vec_B,tb_var[,3],col='blue')
lines(vec_B,tb_var[,2],col='red')
lines(vec_B,tb_var[,1],col='darkred')
abline(h=26.14091,col='black')
If we increase the test sample size, the IPR decreases continuously.
After some point the gains in precision diminish: beyond a sample size of roughly 1000 the IPR decreases very slowly, so the cost of increasing the test sample size may not be worth the gain in precision.
Say we have a test sample of size 10,000 but we want to draw bootstrap samples of size 1000: what are the problems of doing this? What if the variance of the error is large? How does this affect the stability of the IPR?
Ideas for a classification problem: we should first build a DGP for a classification setting, e.g. a binomial model. Then we apply the same method of building a bootstrap interpercentile range and see how the IPR changes with the parameters of the bootstrap procedure.
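A minimal sketch of that idea, with all names and parameters of my own choosing (the label probability, the dummy model, and the accuracy statistic are assumptions for illustration): simulate Bernoulli labels, score a trivial classifier, and bootstrap the IPR of its accuracy.

```r
# sketch of the proposed classification extension (not original code)
set.seed(1)
n = 1000
p = 0.7                                 # P(y = 1) in the binomial DGP
y = rbinom(n, 1, p)                     # simulated test labels
pred = rep(1, n)                        # dummy model: always predicts class 1
correct = as.numeric(pred == y)         # 0/1 vector of correct predictions
B = 300                                 # the rule-of-thumb number of bootstraps
acc_boot = rep(NA, B)
for (b in 1:B) {
  acc_boot[b] = mean(sample(correct, n, replace = TRUE))  # bootstrap accuracy
}
q = quantile(acc_boot, probs = c(.05, .95))
ipr_acc = unname(q[2] - q[1])           # IPR of the accuracy statistic
ipr_acc
```

From here the same sweeps as above (over B, over n, over p) would show how the accuracy IPR behaves compared with the MSE IPR.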