A quick summary of ISLR Ch1-4

Below are the key points I summarized from Chapters 1-4 of An Introduction to Statistical Learning with Applications in R (ISLR).

Two Main Types of Learning Tasks

Supervised learning: predict y (response) using x (predictors)

Unsupervised learning: only x (predictors) are available, so we try to learn relationships among them

For each model, there is a bias-variance trade-off. A very flexible model tends to have low bias but high variance: it can fit the training data very closely (low training error), but it often overfits, so it performs poorly on new data (high test error). An inflexible model has the opposite problem: low variance but high bias. Ideally, we want a model with both low bias and low variance, which in practice means choosing an intermediate level of flexibility.
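The trade-off can be seen in a small sketch (my own illustration in Python, not from the book, which uses R): fit polynomials of increasing flexibility to noisy data and compare training error with test error.

```python
import numpy as np

# Simulated data: a smooth curve plus noise (made-up setup for illustration).
rng = np.random.default_rng(0)


def make_data(n=30):
    x = np.sort(rng.uniform(0.0, 1.0, n))
    y = np.sin(2.0 * np.pi * x) + rng.normal(0.0, 0.3, n)
    return x, y


x_train, y_train = make_data()
x_test, y_test = make_data()


def fit_and_score(degree):
    """Fit a polynomial of the given degree to the training data
    and return (training MSE, test MSE)."""
    coef = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coef, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coef, x_test) - y_test) ** 2)
    return train_mse, test_mse


# Higher degree = more flexibility: training error keeps falling,
# while test error typically rises once the model starts overfitting.
for d in (1, 3, 15):
    train_mse, test_mse = fit_and_score(d)
    print(f"degree {d:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

Degree 1 is too rigid (high bias), while degree 15 chases the noise (high variance); an intermediate degree balances the two.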

Parametric Approach

Logistic Regression: Coefficients are estimated using maximum likelihood. It's best for two-class classification.
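A minimal two-class example (a Python/scikit-learn sketch of my own; the data and variable names are made up, and the book itself works in R). Under the hood, scikit-learn's LogisticRegression estimates the coefficients by maximizing a penalized likelihood.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy one-predictor, two-class problem: classes separate around x = 2.5.
X = np.array([[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [4.5]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

# Predicted class labels for new points on each side of the boundary.
print(clf.predict([[1.0], [4.0]]))

# Predicted probabilities follow the logistic (sigmoid) curve:
# near the boundary, P(class 0) and P(class 1) are both close to 0.5.
print(clf.predict_proba([[2.5]]))
```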

Linear Discriminant Analysis: Observations within each class are assumed to come from a normal distribution with a class-specific mean and a common variance (a shared covariance matrix when there are multiple predictors). The estimates for these parameters are plugged into the Bayes classifier, giving a linear decision boundary. It extends more naturally to classification with more than two classes.

Quadratic Discriminant Analysis: It assumes class-specific covariance matrices instead of a common one, which yields a quadratic decision boundary. The model is more suitable when the training set is large or when the assumption of a common covariance matrix can't be justified.
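The difference between the two assumptions can be sketched on synthetic data (my own Python/scikit-learn illustration, not from the book): one class is drawn with a tight, round covariance and the other with a stretched one, so QDA's per-class covariances match the data better than LDA's pooled one.

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

# Two Gaussian classes with different covariances (made-up data).
rng = np.random.default_rng(1)
X0 = rng.normal([0.0, 0.0], [1.0, 1.0], size=(100, 2))  # round class
X1 = rng.normal([3.0, 3.0], [2.0, 0.5], size=(100, 2))  # stretched class
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

# LDA pools one covariance matrix across classes (linear boundary);
# QDA estimates one covariance matrix per class (quadratic boundary).
lda = LinearDiscriminantAnalysis().fit(X, y)
qda = QuadraticDiscriminantAnalysis().fit(X, y)

print("LDA accuracy:", lda.score(X, y))
print("QDA accuracy:", qda.score(X, y))
```

With well-separated classes both do well here; QDA's extra flexibility matters most when the class covariances differ sharply and there is enough data to estimate them.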

Non-Parametric Approach

K-Nearest Neighbors: To classify a new data point, it finds the K training observations closest to that point and assigns the point to the class most common among them (a majority vote). No assumptions are made about the shape of the decision boundary.
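The majority-vote rule is simple enough to write out directly (a from-scratch Python sketch of my own; the function name and toy points are made up):

```python
import numpy as np


def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest
    training points, using Euclidean distance."""
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]        # indices of the k closest points
    votes = y_train[nearest]
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]       # most common class wins


# Tiny 2-D example: two well-separated clusters of three points each.
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))  # near class 0
print(knn_predict(X_train, y_train, np.array([5.5, 5.5])))  # near class 1
```

The choice of K controls flexibility: K = 1 gives a very jagged (low-bias, high-variance) boundary, while large K gives a smoother (higher-bias, lower-variance) one.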