A quick summary of ISLR Ch7-8
Below are the key points I summarized from Chapters 7 and 8 of An Introduction to Statistical Learning with Applications in R (ISLR).
Simple Extensions of Linear Models
Polynomial regression: raise the original predictors to powers (squares, cubes, etc.) and include these polynomial terms as additional predictors in the regression
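A minimal sketch of the idea, using a hand-built design matrix rather than any particular R or Python modeling library; the data here are an illustrative noiseless quadratic, not from the book:

```python
import numpy as np

# Toy data generated from a known quadratic so the fit is easy to check.
x = np.linspace(-2, 2, 50)
y = 1.0 + 2.0 * x + 3.0 * x**2  # noiseless, so least squares is exact

# Build the polynomial design matrix [1, x, x^2] and solve least squares.
X = np.vander(x, N=3, increasing=True)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.round(beta, 6))  # recovers [1, 2, 3]
```

The point is that polynomial regression is still linear regression, just on transformed columns of the design matrix.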
Step functions: create a qualitative variable by cutting the range of a predictor into K distinct regions, and use the qualitative variable in the regression
Regression splines: divide the range of a predictor into K distinct regions and fit a polynomial within each region, constraining the pieces to join smoothly at the region boundaries (knots); a cubic spline is one example, and fitting it with K knots uses (K+4) degrees of freedom
Smoothing splines: result from minimizing RSS plus a smoothness penalty, λ ∫ g''(t)² dt; larger values of the tuning parameter λ give a smoother fit
Local regression: compute the fit at a target point x0 using only the nearby training observations; the regions used for different target points can overlap
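A sketch of local regression at a single target point, keeping the fraction s of nearest neighbours and weighting them with the tricube kernel that loess uses; the span s, the kernel, and the noiseless test line are all illustrative choices, not the book's exact implementation:

```python
import numpy as np

def local_fit(x, y, x0, s=0.3):
    # Keep the fraction s of training points nearest to x0.
    k = max(2, int(s * len(x)))
    d = np.abs(x - x0)
    idx = np.argsort(d)[:k]
    dmax = d[idx].max()
    w = (1 - (d[idx] / dmax) ** 3) ** 3  # tricube weights, 0 at the edge
    # Weighted linear least squares on the selected neighbourhood.
    X = np.column_stack([np.ones(k), x[idx]])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y[idx])
    return beta[0] + beta[1] * x0  # fitted value at the target point

x = np.linspace(0, 10, 200)
y = 2.0 * x + 1.0                      # noiseless line: local fit is exact
print(round(local_fit(x, y, 4.2), 4))  # 2 * 4.2 + 1 = 9.4
```

Repeating this for a grid of target points traces out the full local-regression curve.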
Generalized additive models (GAM): extend the methods above to deal with multiple predictors
Decision Tree Methods
Broadly speaking, there are two types of decision trees.
Regression tree: predict a quantitative response (typically the mean, sometimes the median, of the training responses in the terminal node)
Classification tree: predict a qualitative response (typically the most commonly occurring class among training observations in the terminal node)
Compared to more classical methods (e.g., linear regression), decision trees are easier to interpret and can be displayed graphically. However, they are less robust (a small change in the data can produce a large change in the fitted tree) and typically have lower predictive accuracy. Here are a number of ways to improve the predictive performance of trees:
Cost complexity pruning: add a penalty proportional to the number of terminal nodes to the quantity being minimized, which is RSS for a regression tree or one of three impurity measures (classification error rate, Gini index, entropy) for a classification tree
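The three classification impurity measures are simple functions of the class proportions p_k in a node; a sketch (using log base 2 for the entropy, one common convention):

```python
import math

# Node impurity measures for a classification tree, computed from the
# class proportions p_k within the node.
def class_error(p): return 1 - max(p)
def gini(p):        return sum(pk * (1 - pk) for pk in p)
def entropy(p):     return -sum(pk * math.log2(pk) for pk in p if pk > 0)

p = [0.5, 0.5]  # a maximally impure two-class node
print(class_error(p), gini(p), entropy(p))  # 0.5 0.5 1.0
```

All three are zero for a pure node and largest when the classes are evenly mixed; Gini and entropy are more sensitive to node purity than the error rate, which is why they are preferred when growing the tree.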
Bagging: construct B trees using B bootstrapped training datasets, and average the B predictions (or take a majority vote for classification)
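A sketch of bagging for regression, using one-split stumps as a stand-in for the deeper trees the book would grow; the step-shaped data and B = 50 are illustrative:

```python
import numpy as np

# Fit the best single-split "stump": choose the threshold minimizing RSS.
def fit_stump(x, y):
    best = None
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        rss = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if best is None or rss < best[0]:
            best = (rss, t, left.mean(), right.mean())
    _, t, lo, hi = best
    return lambda q: np.where(q <= t, lo, hi)

# Bagging: fit one stump per bootstrap resample, then average predictions.
def bagged_predict(x, y, q, B=50, seed=0):
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(B):
        idx = rng.integers(0, len(x), len(x))  # bootstrap resample
        preds.append(fit_stump(x[idx], y[idx])(q))
    return np.mean(preds, axis=0)

x = np.linspace(0, 10, 100)
y = np.where(x < 5, 0.0, 1.0)  # a clean step, easy for stumps
print(bagged_predict(x, y, np.array([1.0, 9.0])))
```

Averaging over bootstrap fits reduces the variance of a single high-variance tree.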
Random forest: at each split in a tree, a random sample of m predictors is chosen as split candidates from the full set of p predictors (a typical choice is m ≈ √p), which decorrelates the trees
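The only new ingredient relative to bagging is the per-split feature subsampling; a minimal sketch of that step alone, with p = 16 and m = √p chosen for illustration:

```python
import numpy as np

p = 16
m = int(np.sqrt(p))  # a common default, m ~ sqrt(p)

rng = np.random.default_rng(0)

# Each call simulates drawing the candidate feature set for one split:
# m distinct predictors sampled from the full set of p.
def split_candidates():
    return rng.choice(p, size=m, replace=False)

print(sorted(split_candidates()))
print(sorted(split_candidates()))  # usually a different subset
```

Because a strong predictor is often absent from the candidate set, the bagged trees look less alike, and averaging less-correlated trees reduces variance further than bagging alone.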
Boosting: grow trees sequentially, fitting each small tree to the residuals of the current model, so the ensemble slowly improves in areas where it does not perform well
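A sketch of the boosting loop for regression, again using one-split stumps as the small trees; the learning rate plays the role of the shrinkage parameter λ, and the data and settings are illustrative:

```python
import numpy as np

# Best single-split stump by RSS (same stand-in for a small tree as above).
def fit_stump(x, y):
    best = None
    for t in np.unique(x)[:-1]:
        l, r = y[x <= t], y[x > t]
        rss = ((l - l.mean())**2).sum() + ((r - r.mean())**2).sum()
        if best is None or rss < best[0]:
            best = (rss, t, l.mean(), r.mean())
    _, t, lo, hi = best
    return lambda q: np.where(q <= t, lo, hi)

def boost(x, y, n_rounds=200, learning_rate=0.1):
    pred = np.zeros_like(y)
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(x, y - pred)        # fit to current residuals
        pred = pred + learning_rate * stump(x)  # shrunken update
        stumps.append(stump)
    return lambda q: learning_rate * sum(s(q) for s in stumps)

x = np.linspace(0, 10, 100)
y = np.where(x < 5, 0.0, 2.0)
f = boost(x, y)
print(float(f(np.array([8.0]))[0]))  # close to the true level, 2
```

Each round removes only a small fraction of the remaining residual, which is what "learning slowly" means here; shrinking the updates tends to improve test accuracy.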