A quick summary of ISLR Ch5-6

Below are the key points I summarized from Chapters 5 and 6 of An Introduction to Statistical Learning with Applications in R (ISLR).

Resampling Methods

Leave-one-out cross-validation (LOOCV): less bias than the validation set approach and always yields the same result (there is no randomness in the splits), but computationally expensive because the model must be refit n times
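
A minimal LOOCV sketch in Python on synthetic data (the book's labs use R; scikit-learn, the toy data, and the linear model here are my own illustrative assumptions):

```python
# LOOCV sketch: hold out each observation once and refit the model n times.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                       # 50 observations, 3 predictors
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=50)

scores = cross_val_score(LinearRegression(), X, y,
                         cv=LeaveOneOut(),
                         scoring="neg_mean_squared_error")
print(f"LOOCV estimate of test MSE: {-scores.mean():.3f}")
```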

K-fold cross-validation: computationally cheaper than LOOCV and often gives more accurate test error estimates because of the bias-variance trade-off (slightly more bias, but less variance)
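
A matching k-fold sketch (again scikit-learn on made-up data, not the book's R code): the model is refit only k = 10 times instead of n times.

```python
# 10-fold CV sketch: split the data into 10 folds and hold out one fold at a time.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=50)

kf = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=kf,
                         scoring="neg_mean_squared_error")
print(f"10-fold CV estimate of test MSE: {-scores.mean():.3f}")
```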

Bootstrap: quantifies the uncertainty (e.g., the standard error) of an estimator or statistical learning method by repeatedly resampling the observations with replacement
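
A small bootstrap sketch in plain numpy (illustrative only; it estimates the standard error of a sample mean rather than reproducing the book's examples):

```python
# Bootstrap sketch: resample the data with replacement B times and look at the
# spread of the recomputed statistic to estimate its standard error.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=100)    # synthetic sample

B = 1000
boot_means = np.empty(B)
for b in range(B):
    resample = rng.choice(data, size=data.size, replace=True)
    boot_means[b] = resample.mean()

print(f"Bootstrap SE of the mean: {boot_means.std(ddof=1):.3f}")
print(f"Theoretical SE (sigma / sqrt(n)): {2.0 / np.sqrt(100):.3f}")
```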

Subset Selection Methods

Best subset selection: fits all 2^p possible models, so it quickly becomes computationally infeasible as p grows
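
A brute-force sketch of best subset selection for small p (my own illustration, scoring each subset by 5-fold CV MSE; the exponential loop is exactly why the method does not scale):

```python
# Best subset selection sketch: try every non-empty subset of the p predictors.
import itertools
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
p = 6
X = rng.normal(size=(100, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=100)   # only 2 true signals

best_subset, best_mse = None, np.inf
for k in range(1, p + 1):
    for subset in itertools.combinations(range(p), k):                # 2^p - 1 subsets in total
        mse = -cross_val_score(LinearRegression(), X[:, list(subset)], y,
                               cv=5, scoring="neg_mean_squared_error").mean()
        if mse < best_mse:
            best_subset, best_mse = subset, mse

print(f"Best subset: {best_subset}, CV MSE: {best_mse:.3f}")
```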

Forward stepwise selection: computationally efficient and applicable even when the number of variables (p) exceeds the number of observations (n), but not guaranteed to find the best possible model
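
A forward stepwise sketch using scikit-learn's SequentialFeatureSelector (my own analogue; the book demonstrates this with regsubsets in R, and the target of two selected predictors here is arbitrary):

```python
# Forward stepwise sketch: start from no predictors and greedily add the one
# that most improves the cross-validated fit at each step.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 1] - 2.0 * X[:, 7] + rng.normal(scale=0.5, size=100)

sfs = SequentialFeatureSelector(LinearRegression(),
                                n_features_to_select=2,
                                direction="forward",
                                cv=KFold(5, shuffle=True, random_state=0))
sfs.fit(X, y)
print("Selected predictors:", np.flatnonzero(sfs.get_support()))
```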

Backward stepwise selection: computationally efficient, but requires n > p (so the full model can be fit) and is not guaranteed to find the best possible model
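
The same selector run in the other direction (again just an illustrative scikit-learn analogue): the method starts from the full least squares fit, which is why n > p is needed.

```python
# Backward stepwise sketch: start from all predictors and greedily drop the
# least useful one at each step.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))                     # n = 100 > p = 10
y = 3.0 * X[:, 1] - 2.0 * X[:, 7] + rng.normal(scale=0.5, size=100)

sfs = SequentialFeatureSelector(LinearRegression(),
                                n_features_to_select=2,
                                direction="backward",
                                cv=5)
sfs.fit(X, y)
print("Selected predictors:", np.flatnonzero(sfs.get_support()))
```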

Shrinkage Methods

They fit a model containing all p predictors with a penalty that shrinks the coefficient estimates toward zero. Compared to least squares, they can achieve a large reduction in variance at the cost of a small increase in bias, and they remain applicable when p > n; compared to subset selection, they are computationally much cheaper.

Ridge regression: uses a quadratic (squared) shrinkage penalty; in the simple special case discussed in the book, it shrinks every least squares coefficient by the same proportion, and the final model always includes all p predictors
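
A ridge sketch (scikit-learn calls the penalty weight alpha rather than lambda; the data and the alpha grid are illustrative assumptions):

```python
# Ridge sketch: standardize the predictors, pick lambda by cross-validation,
# and note that every coefficient stays nonzero.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

model = make_pipeline(StandardScaler(),
                      RidgeCV(alphas=np.logspace(-3, 3, 50)))
model.fit(X, y)
ridge = model.named_steps["ridgecv"]
print("Chosen lambda:", ridge.alpha_)
print("Nonzero coefficients:", np.count_nonzero(ridge.coef_), "of 10")
```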

LASSO: uses an absolute-value shrinkage penalty; in the same special case, it shrinks each least squares coefficient toward zero by a constant amount, setting the smallest ones exactly to zero, so it performs variable selection
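
The lasso analogue (same illustrative setup; the book's labs use glmnet in R):

```python
# Lasso sketch: with the absolute-value penalty, many coefficients end up
# exactly zero, so the fitted model itself picks a subset of the predictors.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

model = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))
model.fit(X, y)
lasso = model.named_steps["lassocv"]
print("Chosen lambda:", lasso.alpha_)
print("Selected predictors:", np.flatnonzero(lasso.coef_))
```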

Dimension Reduction Methods

Principal components regression (PCR): identifies the directions that best represent the predictors in an unsupervised way (i.e., the response is not used to construct them); not a feature selection method, since each component is a linear combination of all p predictors
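
A PCR sketch as a PCA-then-regression pipeline (illustrative; the book uses pcr from the pls package in R, and the choice of 3 components here is arbitrary):

```python
# PCR sketch: find the principal components of X without looking at y,
# then regress y on the first few components.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=100)

pcr = make_pipeline(StandardScaler(), PCA(n_components=3), LinearRegression())
mse = -cross_val_score(pcr, X, y, cv=5,
                       scoring="neg_mean_squared_error").mean()
print(f"PCR (3 components) CV MSE: {mse:.3f}")
```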

Partial least squares (PLS): identifies directions that help explain both the response and the predictors; tends to have less bias but higher variance than PCR
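
A PLS sketch for comparison (same illustrative data; scikit-learn's PLSRegression stands in for R's plsr):

```python
# PLS sketch: unlike PCR, the directions are chosen using y as well as X.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=100)

pls = make_pipeline(StandardScaler(), PLSRegression(n_components=3))
mse = -cross_val_score(pls, X, y, cv=5,
                       scoring="neg_mean_squared_error").mean()
print(f"PLS (3 components) CV MSE: {mse:.3f}")
```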