A quick summary of ISLR Ch9-10

Below are the key points I summarized from Chapters 9 and 10 in An Introduction to Statistical Learning with Applications in R (ISLR).

Support Vector Machines

Support vector classifier (SVC): classification with a linear decision boundary; a soft margin allows some observations to fall on the wrong side of the margin

Support vector machine (SVM): extends the SVC with kernels to produce non-linear decision boundaries
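ISLR demonstrates these classifiers in R, but the same ideas can be sketched with scikit-learn (an assumption on my part; the moons dataset and hyperparameters below are illustrative, not from the book). The only difference between the two fits is the kernel argument:

```python
# Sketch: support vector classifier (linear kernel) vs. SVM (RBF kernel).
# Dataset and parameter choices are illustrative, not from ISLR.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Support vector classifier: linear decision boundary
linear_clf = SVC(kernel="linear", C=1.0).fit(X_train, y_train)

# Support vector machine: radial (RBF) kernel gives a non-linear boundary
rbf_clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)

print("linear accuracy:", linear_clf.score(X_test, y_test))
print("rbf accuracy:", rbf_clf.score(X_test, y_test))
```

On data with curved class boundaries like the two moons, the RBF kernel typically outperforms the linear one, which is exactly the motivation for moving from the SVC to the SVM.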

Unsupervised Learning

Principal components analysis (PCA): As seen in Ch 6, it uses a smaller number of representative variables (or principal components) to collectively explain most of the variability in the original feature set. In total, there are min(n−1, p) principal components to choose from. We can select the number of components to use based on the proportion of variance explained (PVE).
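A minimal sketch of computing the PVE, here with scikit-learn rather than the R functions ISLR uses (the synthetic data is illustrative). Note that variables are standardized first, since PCA is sensitive to scale:

```python
# Sketch: PCA and the proportion of variance explained (PVE).
# Synthetic data; ISLR performs this analysis in R.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=100)  # a correlated feature

# Standardize before PCA so no variable dominates due to its units
X_scaled = StandardScaler().fit_transform(X)

pca = PCA().fit(X_scaled)            # up to min(n-1, p) components
pve = pca.explained_variance_ratio_  # PVE of each component
cum_pve = np.cumsum(pve)             # cumulative PVE, used to pick how many to keep
print("PVE:", pve)
print("cumulative PVE:", cum_pve)
```

Plotting the cumulative PVE and looking for an "elbow" is the informal rule ISLR suggests for choosing the number of components.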

Clustering Methods

While PCA tries to find a low-dimensional representation of the observations, clustering looks to find homogeneous subgroups among the observations. Below are two popular clustering methods.

K-Means clustering: the number of desired clusters K must be specified in advance, and the results depend on the initial (random) cluster assignment of each observation
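The dependence on the random initialization is usually handled by rerunning the algorithm from many starts and keeping the best solution. A sketch with scikit-learn (an assumed stand-in for ISLR's R code; the blob data is illustrative):

```python
# Sketch: K-means requires K up front; n_init reruns from multiple
# random initializations and keeps the lowest within-cluster sum of squares.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("labels:", km.labels_[:10])
print("total within-cluster SS:", km.inertia_)
```

The inertia_ value is the objective K-means minimizes, so comparing it across runs (or across values of K) is one rough way to assess a clustering.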

Hierarchical clustering: the choice of K is not required; it generates a tree-based representation of the observations (a dendrogram), building the tree from the bottom up using dissimilarity measures for points (e.g., Euclidean or correlation-based distance) and for groups (e.g., complete, average, single, or centroid linkage)
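The bottom-up procedure can be sketched with SciPy (again an assumed stand-in for ISLR's R code; the two-group data is illustrative). K is only chosen at the end, by cutting the dendrogram:

```python
# Sketch: agglomerative clustering with Euclidean distance between points
# and complete linkage between groups. Data is illustrative.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, size=(20, 2)),   # group near (0, 0)
               rng.normal(3, 0.3, size=(20, 2))])  # group near (3, 3)

# Pairwise Euclidean dissimilarities, fused bottom-up with complete linkage
Z = linkage(pdist(X, metric="euclidean"), method="complete")

# Only now choose K, by cutting the dendrogram into 2 clusters
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Swapping method="complete" for "average", "single", or "centroid" changes the group-level dissimilarity, and can noticeably change the shape of the resulting dendrogram.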