Cross Validation


We will use the Auto dataset from the ISLR package to compare linear models of varying flexibility for modeling mpg as a function of horsepower, using cross-validation (CV) approaches

Leave One Out Cross Validation (LOOCV)

Loading the required packages

require(ISLR)   # Auto dataset
require(boot)   # cv.glm() and boot()
Loading required package: boot

Plot of mpg vs horsepower

plot(mpg ~ horsepower, data = Auto)
Using glm() to model mpg as a function of horsepower

glm.fit <- glm(mpg ~ horsepower, data = Auto)
cv.glm(Auto, glm.fit)$delta   # default K = n, i.e. LOOCV
[1] 24.23151 24.23114

cv.glm is a bit slow even for a linear model, as it refits the model n times. However, for linear models there is a simple closed-form formula for computing the LOOCV estimate of the test error without having to fit the model n times.

This formula is given as -

LOOCV(n) = (1/n) * sum_{i=1}^{n} [ (y_i - y_hat_i) / (1 - h_i) ]^2

where y_hat_i is the fitted value from the full-data fit and h_i is the leverage of the i-th observation.

We will write a user-defined function to compute test error estimate using this formula
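A compact implementation uses the leverage ("hat") values h_i returned by lm.influence():

```r
# LOOCV test-error estimate for a fitted linear model, using the
# closed-form shortcut: mean of squared "deleted" residuals
loocv <- function(fit)
{
  h <- lm.influence(fit)$h            # leverage values h_i
  mean((residuals(fit) / (1 - h))^2)  # no refitting required
}
```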

(test_err <- loocv(glm.fit))
[1] 24.23151

In the above code blocks, we estimated the test error by fitting a linear model of mpg on a linear horsepower term.

Now we will use LOOCV to compare linear models of varying degrees.

cv.error <- rep(0, 5)
degree <- 1:5

for (i in degree)
{
  glm.fit <- glm(mpg ~ poly(horsepower, i), data = Auto)
  cv.error[i] <- loocv(glm.fit)
}

Plot of the LOOCV test error estimate against polynomial degree

plot(x = degree, y = cv.error, type = 'b', col = 'blue')
The above plot shows that the degree-1 linear model does a poor job. The test error estimate drops sharply for the degree-2 model; including higher-degree terms makes little further difference.

Hence we can conclude that a degree-2 polynomial is well suited for modeling the relationship between mpg and horsepower.


K-fold Cross Validation (K = 10)

The concept remains the same as LOOCV, except that instead of leaving one observation out of the model-fitting process, we leave out approximately n/k observations at a time. The model-fitting process is repeated k times.
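The mechanics can be sketched by hand. This is an illustrative sketch of what cv.glm(Auto, glm.fit, K = 10) does, not its actual internals; the random fold assignment and the unweighted average of fold MSEs are simplifications:

```r
# Hand-rolled 10-fold CV for the degree-1 fit (illustrative sketch)
set.seed(1)
k <- 10
folds <- sample(rep(1:k, length.out = nrow(Auto)))  # random fold labels

fold_mse <- sapply(1:k, function(j) {
  fit  <- lm(mpg ~ horsepower, data = Auto[folds != j, ])  # fit on k-1 folds
  pred <- predict(fit, newdata = Auto[folds == j, ])       # predict held-out fold
  mean((Auto$mpg[folds == j] - pred)^2)                    # held-out MSE
})
mean(fold_mse)   # CV estimate of the test MSE
```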

We will reuse the above code block with a small change: we set the argument "K" to 10 in cv.glm(). Note that the argument name is a capital K; a lowercase k raises an "unused argument" error.

cv.error10 <- rep(0, 5)
for (i in degree)
{
  glm.fit <- glm(mpg ~ poly(horsepower, i), data = Auto)
  cv.error10[i] <- cv.glm(Auto, glm.fit, K = 10)$delta[1]
}

Comparing the test error estimates from the LOOCV and K-fold CV approaches

plot(x = degree, y = cv.error, type = 'b', col = 'blue')
lines(degree, cv.error10, type = "b", col = "darkred")
As we can observe, there is not much difference between the outputs of the two methods.

Generally speaking, we use 10-fold (or 5-fold) CV, as it is computationally less expensive than LOOCV and tends to produce a more stable estimate of the test error.



Bootstrap


We will illustrate the use of the bootstrap to estimate the standard errors of the coefficients estimated by linear regression.

Here we use the bootstrap approach in order to assess the variability of the estimates for β0 and β1, the intercept and slope terms for the linear regression model that uses horsepower to predict mpg in the Auto data set.

boot.fn <- function(data, index)
{
  return(coef(lm(mpg ~ horsepower, data = data, subset = index)))
}

Calling the boot() function to repeat the model-fitting process on 1000 bootstrapped samples

boot(data = Auto, boot.fn, R = 1000)

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = Auto, statistic = boot.fn, R = 1000)


Bootstrap Statistics :
      original        bias    std. error
t1* 39.9358610  0.0040975524 0.848703651
t2* -0.1578447 -0.0001175595 0.007325433

The output indicates that the bootstrap estimate of the standard error of β0 is about 0.849, and that of β1 is about 0.0073.

We can compare these values with the ones produced by the lm function by calling summary() on its output.

lm.out <- lm(mpg ~ horsepower, data = Auto)
summary(lm.out)

Call:
lm(formula = mpg ~ horsepower, data = Auto)

Residuals:
     Min       1Q   Median       3Q      Max 
-13.5710  -3.2592  -0.3435   2.7630  16.9240 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 39.935861   0.717499   55.66   <2e-16 ***
horsepower  -0.157845   0.006446  -24.49   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.906 on 390 degrees of freedom
Multiple R-squared:  0.6059,    Adjusted R-squared:  0.6049 
F-statistic: 599.7 on 1 and 390 DF,  p-value: < 2.2e-16

We see that the two sets of values differ.

Does this mean that our bootstrap estimate of the standard errors of the coefficients is incorrect? No, this is not the case. In fact, the bootstrap estimate is likely more accurate, as it does not rely on the model-based assumptions underlying the standard formulas.
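The mechanics behind boot() are just row resampling; a minimal hand-rolled version (a sketch of what boot() automates, with illustrative variable names) makes clear that no model assumptions enter the SE calculation:

```r
# Manual bootstrap of the coefficient SEs (sketch)
set.seed(1)
B <- 1000
coefs <- t(replicate(B, {
  idx <- sample(nrow(Auto), replace = TRUE)        # resample rows with replacement
  coef(lm(mpg ~ horsepower, data = Auto[idx, ]))   # refit on the bootstrap sample
}))
apply(coefs, 2, sd)   # SD across resamples = bootstrap SEs
```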

To determine the SE of the coefficients via the standard formulas, we need to estimate the unknown error (noise) variance sigma^2 -- for instance, SE(beta1_hat)^2 = sigma^2 / sum_i (x_i - x_bar)^2 -- and the estimate of this noise variance depends on the linear model being correct.

As we have seen, the relationship between mpg and horsepower is not linear. So the residuals from the linear fit will be inflated, and so will the estimated error variance.

To validate this, we will now fit a quadratic model and compare the estimates from the two methods.

set.seed(123)
boot.fn2 <- function(data, index)
{
  return(coef(lm(mpg ~ poly(horsepower, 2), data = data[index, ])))
}

boot(Auto, boot.fn2, R = 10000)

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = Auto, statistic = boot.fn2, R = 10000)


Bootstrap Statistics :
      original        bias    std. error
t1*   23.44592 -0.0009480357   0.3932161
t2* -120.13774  0.1498780916   4.7499967
t3*   44.08953 -0.3022614152   4.7778677
lm.out2 <- lm(mpg ~ poly(horsepower, 2), data = Auto)
summary(lm.out2)$coef
                       Estimate Std. Error   t value      Pr(>|t|)
(Intercept)            23.44592  0.2209163 106.13030 2.752212e-289
poly(horsepower, 2)1 -120.13774  4.3739206 -27.46683  4.169400e-93
poly(horsepower, 2)2   44.08953  4.3739206  10.08009  2.196340e-21

The bootstrap standard errors now correspond more closely to the formula-based values from summary(), consistent with the quadratic model providing a better fit.