A quick summary of ISLR Ch9-10
Below are the key points I summarized from Chapters 9 and 10 of An Introduction to Statistical Learning with Applications in R (ISLR).
Support Vector Machines
Support vector classifier (SVC): a soft-margin classifier that separates classes with a linear boundary while allowing some observations to violate the margin
Support vector machine (SVM): extends the SVC by enlarging the feature space with kernels (e.g., polynomial or radial), producing non-linear decision boundaries
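The kernel idea above can be made concrete with a small numpy sketch (my own toy example, not from the book): a degree-2 polynomial kernel K(x, z) = (x · z)^2 gives the same inner product as an explicit quadratic feature expansion φ, without ever computing φ. This is why kernels let the SVM fit non-linear boundaries cheaply.

```python
import numpy as np

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel K(x, z) = (x . z)^2."""
    return float(np.dot(x, z)) ** 2

def phi(x):
    """Explicit quadratic feature expansion for 2-D inputs.
    phi(x) = (x1^2, x2^2, sqrt(2) * x1 * x2), chosen so that
    phi(x) . phi(z) equals (x . z)^2."""
    x1, x2 = x
    return np.array([x1 * x1, x2 * x2, np.sqrt(2) * x1 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
# Both quantities agree (up to floating-point error), but the kernel
# never forms the expanded feature vectors.
print(poly2_kernel(x, z), np.dot(phi(x), phi(z)))
```

The same trick underlies the radial kernel, whose implicit feature space is infinite-dimensional.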
Unsupervised Learning
Principal components analysis (PCA): as seen in Ch 6, it uses a small number of representative variables (principal components) that collectively explain most of the variability in the original feature set. There are min(n − 1, p) principal components in total, and we can choose how many to use based on the proportion of variance explained (PVE).
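A minimal numpy sketch of PCA and the PVE, on made-up data where the third feature is nearly a copy of the first, so two components should capture almost all the variance:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Third feature is x1 plus a little noise: most variance lies in 2 dimensions.
X = np.column_stack([x1, x2, x1 + 0.05 * rng.normal(size=n)])

Xc = X - X.mean(axis=0)              # center each feature
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
var = s**2 / (n - 1)                 # variance along each principal component
pve = var / var.sum()                # proportion of variance explained
print(np.round(pve, 3), np.round(np.cumsum(pve), 3))
```

Plotting the cumulative PVE (a scree plot) is the usual way to pick the number of components to keep.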
Clustering Methods
While PCA tries to find a low-dimensional representation of the observations, clustering looks to find homogeneous subgroups among the observations. Below are two popular clustering methods.
K-means clustering: requires specifying the number of desired clusters K in advance, and the results depend on the initial (random) cluster assignments, so the algorithm is typically run from several random starts and the solution with the lowest within-cluster variation is kept
Hierarchical clustering: does not require choosing K in advance and generates a tree-based representation of the observations (a dendrogram); it builds the tree from the bottom up using a dissimilarity measure between points (e.g., Euclidean or correlation-based distance) and between groups (e.g., complete, average, single, or centroid linkage)
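Both methods can be sketched on a toy dataset of two well-separated blobs (hypothetical data, not from the book): K-means via Lloyd's algorithm with multiple random starts, implemented directly in numpy, and hierarchical clustering via scipy's complete-linkage agglomerative routine, cutting the dendrogram at two clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
# Toy data: two tight, well-separated 2-D blobs of 25 points each.
X = np.vstack([rng.normal(0.0, 0.3, (25, 2)),
               rng.normal(5.0, 0.3, (25, 2))])

def kmeans(X, k, n_starts=5, n_iter=50, seed=0):
    """Lloyd's algorithm: repeat from several random starts and keep the
    labeling with the lowest total within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    best_labels, best_cost = None, np.inf
    for _ in range(n_starts):
        centers = X[rng.choice(len(X), k, replace=False)]
        for _ in range(n_iter):
            d = np.linalg.norm(X[:, None] - centers[None], axis=2)
            labels = d.argmin(axis=1)
            # Keep the old center if a cluster ever becomes empty.
            centers = np.array([X[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        cost = ((X - centers[labels]) ** 2).sum()
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels

km = kmeans(X, k=2)

# Hierarchical clustering: build the dendrogram with complete linkage,
# then cut it so that exactly two clusters remain.
hc = fcluster(linkage(X, method="complete"), t=2, criterion="maxclust")
```

On data this clearly separated, both methods recover the two blobs exactly; on real data, K-means results can vary by start and the dendrogram's shape depends on the linkage chosen.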