A quick summary of ISLR Ch7-8

Below are the key points I summarized from Chapters 7 and 8 of An Introduction to Statistical Learning with Applications in R (ISLR).

Simple Extensions of Linear Models

Polynomial regression: raise the original predictors to powers and add these polynomial terms to the regression as extra predictors
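
As a minimal sketch (the data, degree, and coefficients here are my own illustration, not from the book), a cubic polynomial regression is just ordinary least squares on the basis 1, x, x², x³:

```python
import numpy as np

# Toy cubic polynomial regression: least squares on powers of one predictor.
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 100)
y = 1.0 + 2.0 * x - 0.5 * x**3 + rng.normal(scale=0.1, size=x.size)

# Design matrix whose columns are powers of the original predictor.
X = np.column_stack([x**d for d in range(4)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares coefficients
y_hat = X @ beta                                # fitted values
```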

Step functions: cut the range of a predictor into K distinct regions to create an ordered qualitative variable, and use that variable in the regression
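
A toy illustration (the function names `step_function_fit` / `step_function_predict` and the data are hypothetical): cut the range into K equal-width bins and predict the mean response within each bin:

```python
import statistics

def step_function_fit(x, y, k):
    """Cut the range of x into k equal-width bins; record each bin's mean y."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / k
    # Bin index per observation (clamp the right endpoint into the last bin).
    bins = [min(int((xi - lo) / width), k - 1) for xi in x]
    means = []
    for b in range(k):
        ys = [yi for yi, bi in zip(y, bins) if bi == b]
        means.append(statistics.mean(ys) if ys else None)
    return lo, width, means

def step_function_predict(model, xi):
    lo, width, means = model
    b = min(max(int((xi - lo) / width), 0), len(means) - 1)
    return means[b]

x = [0.5, 1.5, 2.5, 3.5]
y = [1.0, 1.0, 3.0, 3.0]
model = step_function_fit(x, y, 2)
```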

Regression splines: divide the range of a predictor into K distinct regions and fit a separate polynomial within each region, constrained so the pieces join smoothly at the region boundaries (the knots); a cubic spline is one example, and fitting it with K knots uses K + 4 degrees of freedom
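
A sketch showing where the K + 4 count comes from, using the truncated power basis (the data and knot placement are my own illustration): the basis is 1, x, x², x³ plus one truncated cubic term (x − ξ)³₊ per knot ξ:

```python
import numpy as np

def cubic_spline_basis(x, knots):
    """Truncated power basis for a cubic spline: K knots -> K + 4 columns."""
    cols = [np.ones_like(x), x, x**2, x**3]
    for knot in knots:
        cols.append(np.clip(x - knot, 0.0, None) ** 3)  # (x - knot)_+^3
    return np.column_stack(cols)

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 200)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)

knots = [2.5, 5.0, 7.5]               # K = 3 knots
X = cubic_spline_basis(x, knots)      # 200 x 7 design matrix (K + 4 columns)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
```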

Smoothing splines: result from minimizing RSS plus a smoothness penalty on the fitted function (an integrated squared second derivative, scaled by a tuning parameter)
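
Written out, the smoothing spline is the function g minimizing

```latex
\sum_{i=1}^{n} \bigl( y_i - g(x_i) \bigr)^2 \;+\; \lambda \int g''(t)^2 \, dt
```

where λ ≥ 0 is a tuning parameter: λ = 0 lets g interpolate the data, while λ → ∞ forces g toward the linear least-squares fit.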

Local regression: compute the fit at a target point x0 using only the nearby training observations; the regions used for different target points can overlap
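
A hypothetical loess-style sketch (the function name and data are mine): at a target point x0, fit a straight line by weighted least squares using only a span of nearest observations, weighted by the tricube kernel:

```python
import numpy as np

def local_linear_fit(x, y, x0, span=0.3):
    """Local linear regression at x0 using the nearest span * n observations."""
    n = len(x)
    k = max(int(span * n), 2)
    d = np.abs(x - x0)
    idx = np.argsort(d)[:k]                        # the k nearest observations
    w = (1 - (d[idx] / d[idx].max()) ** 3) ** 3    # tricube weights, 0 at the edge
    X = np.column_stack([np.ones(k), x[idx]])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y[idx])
    return beta[0] + beta[1] * x0                  # evaluate the local line at x0

x = np.linspace(0, 1, 100)
y = x ** 2
```

Because the spans for nearby target points share observations, the resulting fits overlap, which is what makes the overall curve smooth.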

Generalized additive models (GAMs): extend the methods above to multiple predictors by fitting a separate (possibly nonlinear) function of each predictor and adding the contributions together
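
A sketch of the backfitting idea behind fitting a GAM (all names and data are mine, and a crude bin-means smoother stands in for the splines a real GAM would use): repeatedly smooth each predictor against the partial residuals left over from the others:

```python
import numpy as np

def bin_smoother(x, r, k=10):
    """Crude smoother: mean of the residuals r within k equal-width bins of x."""
    edges = np.linspace(x.min(), x.max(), k + 1)
    which = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, k - 1)
    means = np.array([r[which == b].mean() if np.any(which == b) else 0.0
                      for b in range(k)])
    return means[which]

rng = np.random.default_rng(2)
n = 500
x1 = rng.uniform(-1, 1, n)
x2 = rng.uniform(-1, 1, n)
y = np.sin(3 * x1) + x2 ** 2 + rng.normal(scale=0.1, size=n)

b0 = y.mean()
f1 = np.zeros(n)
f2 = np.zeros(n)
for _ in range(20):                      # backfitting iterations
    f1 = bin_smoother(x1, y - b0 - f2)   # smooth x1 against partial residuals
    f1 -= f1.mean()                      # centre each function for identifiability
    f2 = bin_smoother(x2, y - b0 - f1)
    f2 -= f2.mean()

resid = y - (b0 + f1 + f2)
```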

Decision Tree Methods

Broadly speaking, there are two types of decision trees.

Regression tree: predict a quantitative response, typically the mean response of the training observations in the terminal node an observation falls into

Classification tree: predict a qualitative response, typically the most commonly occurring class among the training observations in the terminal node

Compared with more classical methods such as linear regression, decision trees are easier to interpret and can be displayed graphically. However, a single tree is not robust (small changes in the data can produce a very different tree) and generally has lower predictive accuracy. A number of techniques improve the predictive performance of trees:

Cost complexity pruning: grow a large tree, then add a penalty proportional to the number of terminal nodes to the RSS for a regression tree, or to one of the three node measures (classification error rate, Gini index, entropy) for a classification tree, and select the subtree that minimizes the penalized criterion
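
For a regression tree the penalized criterion can be written out: for each value of a tuning parameter α, pruning selects the subtree T minimizing

```latex
\sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} \bigl( y_i - \hat{y}_{R_m} \bigr)^2 \;+\; \alpha \, |T|
```

where |T| is the number of terminal nodes, R_m is the region of the m-th terminal node, ŷ_{R_m} is its mean training response, and α is chosen by cross-validation.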

Bagging: construct B trees using B bootstrapped training datasets and average the B predictions (for classification, take a majority vote across trees)
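
A toy sketch of bagging (all names and data are mine; a one-split regression stump stands in for a full tree): grow each stump on a bootstrap resample and average the predictions:

```python
import random
import statistics

def fit_stump(x, y):
    """Best single-split regression stump, by exhaustive search over splits."""
    best = None
    for s in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= s]
        right = [yi for xi, yi in zip(x, y) if xi > s]
        if not left or not right:
            continue
        ml, mr = statistics.mean(left), statistics.mean(right)
        rss = (sum((yi - ml) ** 2 for yi in left)
               + sum((yi - mr) ** 2 for yi in right))
        if best is None or rss < best[0]:
            best = (rss, s, ml, mr)
    if best is None:                 # degenerate bootstrap sample: predict the mean
        m = statistics.mean(y)
        return lambda xi: m
    _, s, ml, mr = best
    return lambda xi: ml if xi <= s else mr

def bagged_predict(x, y, xi, n_trees=50, seed=0):
    rng = random.Random(seed)
    preds = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(x)) for _ in x]            # bootstrap sample
        stump = fit_stump([x[i] for i in idx], [y[i] for i in idx])
        preds.append(stump(xi))
    return statistics.mean(preds)                            # average predictions

x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
```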

Random forest: at each split in a tree, a random sample of m predictors is chosen as split candidates from the full set of p predictors (typically m ≈ √p), which decorrelates the bagged trees
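
A toy sketch of the random-forest tweak on bagging (all names and data are mine; since the "tree" here is a single split, the m-of-p restriction applies once per tree):

```python
import random
import statistics

def fit_stump_on_subset(X, y, feature_ids):
    """Best one-split stump, searching only the given subset of predictors."""
    best = None
    for j in feature_ids:
        for s in sorted(set(row[j] for row in X)):
            left = [yi for row, yi in zip(X, y) if row[j] <= s]
            right = [yi for row, yi in zip(X, y) if row[j] > s]
            if not left or not right:
                continue
            ml, mr = statistics.mean(left), statistics.mean(right)
            rss = (sum((yi - ml) ** 2 for yi in left)
                   + sum((yi - mr) ** 2 for yi in right))
            if best is None or rss < best[0]:
                best = (rss, j, s, ml, mr)
    if best is None:                  # degenerate sample: predict the mean
        m = statistics.mean(y)
        return lambda row: m
    _, j, s, ml, mr = best
    return lambda row: ml if row[j] <= s else mr

def random_forest_predict(X, y, row, m=1, n_trees=50, seed=0):
    rng = random.Random(seed)
    p = len(X[0])
    preds = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]   # bootstrap sample
        feats = rng.sample(range(p), m)            # m of p predictors eligible
        stump = fit_stump_on_subset([X[i] for i in idx],
                                    [y[i] for i in idx], feats)
        preds.append(stump(row))
    return statistics.mean(preds)

# Two predictors; the response is low for the first three rows, high after.
X = [[1, 9], [2, 7], [3, 8], [4, 1], [5, 3], [6, 2]]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
```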

Boosting: grow trees sequentially, fitting each small tree to the residuals of the current model and adding a shrunken version of its fit, so the model improves slowly in areas where it does not yet perform well
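
A toy boosting sketch (all names and data are mine; a one-split stump stands in for a small tree): each round fits a stump to the current residuals and adds its fit, shrunk by a learning rate lam:

```python
import statistics

def fit_stump(x, y):
    """Best single-split regression stump, by exhaustive search over splits."""
    best = None
    for s in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= s]
        right = [yi for xi, yi in zip(x, y) if xi > s]
        if not left or not right:
            continue
        ml, mr = statistics.mean(left), statistics.mean(right)
        rss = (sum((yi - ml) ** 2 for yi in left)
               + sum((yi - mr) ** 2 for yi in right))
        if best is None or rss < best[0]:
            best = (rss, s, ml, mr)
    _, s, ml, mr = best
    return lambda xi: ml if xi <= s else mr

def boost(x, y, n_trees=200, lam=0.1):
    pred = [0.0] * len(y)
    stumps = []
    for _ in range(n_trees):
        r = [yi - pi for yi, pi in zip(y, pred)]           # current residuals
        stump = fit_stump(x, r)                            # small tree fit to residuals
        stumps.append(stump)
        pred = [pi + lam * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: sum(lam * s(xi) for s in stumps)     # additive ensemble

x = [1, 2, 3, 4, 5, 6]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
f = boost(x, y)
```

The small learning rate is what makes the improvement slow and deliberate: each stump only nudges the fit toward the remaining residuals.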