A quick summary of ISLR Ch5-6
Below are the key points I summarized from Chapters 5 and 6 in An Introduction to Statistical Learning with Applications in R (ISLR).
Resampling Methods
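Leave-one-out cross-validation (LOOCV): low bias, and it always yields the same result because there is no randomness in how the data are split; however, it is expensive to compute because the model is refit n times (for least squares, a shortcut formula needs only a single fit)
K-fold cross-validation: computationally cheaper than LOOCV (k fits instead of n) and often gives more accurate test error estimates because of the bias-variance trade-off: slightly more bias than LOOCV but lower variance; k = 5 or k = 10 is typical
Bootstrap: quantifies the uncertainty (e.g., the standard error) of an estimator or statistical learning method by repeatedly resampling the observations with replacement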
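As a rough illustration of the three resampling ideas above, here is a minimal Python sketch using only the standard library. The data are synthetic, and the helper names (fit_line, kfold_mse, bootstrap_se) are my own, not from ISLR. It fits a one-predictor least squares line, estimates test MSE with LOOCV and 5-fold CV, and bootstraps the standard error of the slope.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def kfold_mse(xs, ys, k, seed=0):
    """Cross-validated MSE with k folds; k == len(xs) is LOOCV.

    Squared errors are pooled over all held-out points, which equals the
    average of the fold MSEs when the folds have equal size.
    """
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)      # random assignment to folds
    folds = [idx[i::k] for i in range(k)]
    sq_err = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_err += sum((ys[i] - (a + b * xs[i])) ** 2 for i in fold)
    return sq_err / len(xs)


def bootstrap_se(xs, ys, stat, n_boot=500, seed=0):
    """Bootstrap standard error of stat(xs, ys)."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    for _ in range(n_boot):
        picks = [rng.randrange(n) for _ in range(n)]  # rows drawn with replacement
        estimates.append(stat([xs[i] for i in picks], [ys[i] for i in picks]))
    return statistics.stdev(estimates)


# Synthetic data, made up for illustration: y = 2 + 3x + noise
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(50)]
ys = [2 + 3 * x + rng.gauss(0, 1) for x in xs]

loocv = kfold_mse(xs, ys, k=len(xs))
cv5 = kfold_mse(xs, ys, k=5)
slope_se = bootstrap_se(xs, ys, lambda u, v: fit_line(u, v)[1])
print(f"LOOCV MSE: {loocv:.3f}")
print(f"5-fold MSE: {cv5:.3f}")
print(f"Bootstrap SE of slope: {slope_se:.3f}")
```

Changing the seed changes the 5-fold estimate, but not the LOOCV estimate (beyond floating-point rounding), since every observation is held out exactly once regardless of the shuffle.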
Subset Selection Methods
Best subset selection: considers all 2^p possible models, so it quickly becomes computationally infeasible as p grows
Forward stepwise selection: computationally efficient and applicable even when the number of variables (p) is bigger than the number of observations (n), but it is not guaranteed to find the best possible model
Backward stepwise selection: computationally efficient, but it requires n > p (so that the full model can be fit) and is not guaranteed to find the best possible model
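To make the "not guaranteed to find the best possible model" point concrete, here is a small Python sketch. The RSS table is entirely hypothetical (three predictors a, b, c with made-up training RSS values), and the function names are my own. The model counts use the formulas from ISLR: 2^p for best subset and 1 + p(p+1)/2 for forward stepwise.

```python
from itertools import combinations

# Hypothetical training RSS for every subset of predictors a, b, c.
# The numbers are made up; they only need to decrease as predictors are added.
RSS = {
    frozenset(): 10.0,
    frozenset("a"): 5.0,
    frozenset("b"): 6.0,
    frozenset("c"): 6.0,
    frozenset("ab"): 4.0,
    frozenset("ac"): 4.5,
    frozenset("bc"): 1.0,
    frozenset("abc"): 0.9,
}


def best_subset(predictors, rss):
    """Best model of each size, searching every subset."""
    best = {}
    for k in range(len(predictors) + 1):
        candidates = [frozenset(c) for c in combinations(predictors, k)]
        best[k] = min(candidates, key=rss.__getitem__)
    return best


def forward_stepwise(predictors, rss):
    """Greedy path: at each step add the predictor that lowers RSS the most."""
    current = frozenset()
    path = {0: current}
    for k in range(1, len(predictors) + 1):
        candidates = [current | {p} for p in predictors if p not in current]
        current = min(candidates, key=rss.__getitem__)
        path[k] = current
    return path


def n_models_best_subset(p):
    return 2 ** p


def n_models_forward(p):
    return 1 + p * (p + 1) // 2


best = best_subset("abc", RSS)
greedy = forward_stepwise("abc", RSS)
print(sorted(best[2]), RSS[best[2]])      # ['b', 'c'] 1.0
print(sorted(greedy[2]), RSS[greedy[2]])  # ['a', 'b'] 4.0
print(n_models_best_subset(20), n_models_forward(20))  # 1048576 211
```

In this toy table the best single predictor is a, so forward stepwise is locked into models that contain a and never reaches the better two-variable model {b, c}.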
Shrinkage Methods
Compared to best subset selection, they are computationally much cheaper; compared to least squares, they can achieve a large reduction in variance at the cost of a small increase in bias, and they remain applicable if p > n.
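Ridge regression: uses a quadratic (L2) shrinkage penalty; in the simple special case where the design matrix is the identity, it shrinks every least squares coefficient by the same proportion; the final model includes all p predictors
LASSO: uses an absolute-value (L1) shrinkage penalty; in the simple special case where the design matrix is the identity, it shrinks every least squares coefficient toward zero by a constant amount and sets sufficiently small ones exactly to zero, so it performs variable selection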
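The "same proportion" versus "constant amount" contrast can be sketched in a few lines of Python. This assumes the simple special case from ISLR Section 6.2 (n = p, identity design matrix, no intercept), where the least squares coefficients equal the responses; the function names and the example numbers are my own.

```python
def ridge_shrink(ols_coefs, lam):
    """Ridge in the special case: every coefficient is divided by (1 + lam)."""
    return [b / (1 + lam) for b in ols_coefs]


def lasso_shrink(ols_coefs, lam):
    """Lasso in the special case: soft-thresholding at lam / 2."""
    t = lam / 2
    shrunk = []
    for b in ols_coefs:
        if b > t:
            shrunk.append(b - t)   # move toward zero by a constant amount
        elif b < -t:
            shrunk.append(b + t)
        else:
            shrunk.append(0.0)     # small coefficients become exactly zero
    return shrunk


ols = [3.0, -1.5, 0.4]             # made-up least squares coefficients
print(ridge_shrink(ols, 1.0))      # [1.5, -0.75, 0.2]
print(lasso_shrink(ols, 1.0))      # [2.5, -1.0, 0.0]
```

Ridge keeps all three coefficients nonzero, while the lasso drops the smallest one, which is the variable selection behavior noted above.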
Dimension Reduction Methods
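Principal components regression (PCR): identifies the directions (principal components) that best represent the predictors in an unsupervised way (i.e., the response is not used) and then regresses the response on them; it is not a feature selection method, because each component is a linear combination of all p predictors
Partial least squares (PLS): identifies directions in a supervised way, so that they help explain both the response and the predictors; compared to PCR it can reduce bias but may increase variance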
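The unsupervised versus supervised difference can be seen by computing only the first direction of each method. The Python sketch below is a simplified illustration, not a full PCR or PLS fit. The data are synthetic (two highly correlated predictors, with the response driven by their difference), the helper names are my own, and scaling the PLS weights to unit length is a choice made here for comparability.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def standardize(col):
    """Center a column and scale it to unit sample standard deviation."""
    m = sum(col) / len(col)
    sd = math.sqrt(sum((v - m) ** 2 for v in col) / (len(col) - 1))
    return [(v - m) / sd for v in col]


def unit(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def first_pc_direction(cols, iters=200):
    """Unsupervised: top eigenvector of X^T X, found by power iteration."""
    gram = [[dot(ci, cj) for cj in cols] for ci in cols]
    v = unit([1.0] + [0.5] * (len(cols) - 1))   # arbitrary starting vector
    for _ in range(iters):
        v = unit([dot(row, v) for row in gram])
    return v


def first_pls_direction(cols, y):
    """Supervised: weight each predictor by its inner product with centered y."""
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    return unit([dot(col, yc) for col in cols])


def scores(cols, w):
    """Project every observation onto direction w."""
    return [sum(wj * col[i] for wj, col in zip(w, cols))
            for i in range(len(cols[0]))]


def corr(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du, dv = [a - mu for a in u], [b - mv for b in v]
    return dot(du, dv) / math.sqrt(dot(du, du) * dot(dv, dv))


# Synthetic data, made up for illustration: x1 and x2 share a common factor z,
# but y depends on their difference.
rng = random.Random(0)
n = 500
z = [rng.gauss(0, 1) for _ in range(n)]
x1 = [zi + 0.5 * rng.gauss(0, 1) for zi in z]
x2 = [zi + 0.5 * rng.gauss(0, 1) for zi in z]
y = [a - b + 0.1 * rng.gauss(0, 1) for a, b in zip(x1, x2)]

cols = [standardize(x1), standardize(x2)]
pc1 = first_pc_direction(cols)
pls1 = first_pls_direction(cols, y)
print("PC1 direction: ", [round(w, 3) for w in pc1])
print("PLS1 direction:", [round(w, 3) for w in pls1])
print("corr(PC1 score, y): ", round(corr(scores(cols, pc1), y), 3))
print("corr(PLS1 score, y):", round(corr(scores(cols, pls1), y), 3))
```

The first principal component follows the shared variation in x1 and x2 and ignores y, whereas the first PLS direction is pulled toward the combination of predictors that covaries with y.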