A quick summary of ISLR Ch9-10
Below are the key points I summarized from Chapters 9 and 10 of An Introduction to Statistical Learning with Applications in R (ISLR).
Support Vector Machines
Support vector classifier (SVC): a soft-margin classifier that separates classes with a linear boundary while allowing some observations to violate the margin
Support vector machine (SVM): extends the SVC by enlarging the feature space with kernels (e.g., polynomial or radial), producing non-linear decision boundaries
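The kernel idea above can be made concrete with a small numpy sketch (my own toy example, not from the book): a degree-2 polynomial kernel K(x, z) = (x · z)^2 gives the same inner product as an explicit quadratic feature expansion φ, without ever computing φ. This is why kernels let the SVM fit non-linear boundaries cheaply.

```python
import numpy as np

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel K(x, z) = (x . z)^2."""
    return float(np.dot(x, z)) ** 2

def phi(x):
    """Explicit quadratic feature expansion for 2-D inputs.
    phi(x) = (x1^2, x2^2, sqrt(2) * x1 * x2), chosen so that
    phi(x) . phi(z) equals (x . z)^2."""
    x1, x2 = x
    return np.array([x1 * x1, x2 * x2, np.sqrt(2) * x1 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
# Both quantities agree (up to floating-point error), but the kernel
# never forms the expanded feature vectors.
print(poly2_kernel(x, z), np.dot(phi(x), phi(z)))
```

The same trick underlies the radial kernel, whose implicit feature space is infinite-dimensional.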
Unsupervised Learning
Principal components analysis (PCA): as seen in Ch 6, it uses a small number of representative variables (principal components) that collectively explain most of the variability in the original feature set. There are min(n − 1, p) principal components in total, and we can choose how many to use based on the proportion of variance explained (PVE).
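A minimal numpy sketch of PCA and the PVE, on made-up data where the third feature is nearly a copy of the first, so two components should capture almost all the variance:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Third feature is x1 plus a little noise: most variance lies in 2 dimensions.
X = np.column_stack([x1, x2, x1 + 0.05 * rng.normal(size=n)])

Xc = X - X.mean(axis=0)              # center each feature
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
var = s**2 / (n - 1)                 # variance along each principal component
pve = var / var.sum()                # proportion of variance explained
print(np.round(pve, 3), np.round(np.cumsum(pve), 3))
```

Plotting the cumulative PVE (a scree plot) is the usual way to pick the number of components to keep.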
Clustering Methods
While PCA tries to find a low-dimensional representation of the observations, clustering looks to find homogeneous subgroups among the observations. Below are two popular clustering methods.
K-means clustering: requires specifying the number of desired clusters K in advance, and the results depend on the initial (random) cluster assignments, so the algorithm is typically run from several random starts and the solution with the lowest within-cluster variation is kept
Hierarchical clustering: does not require choosing K in advance and generates a tree-based representation of the observations (a dendrogram); it builds the tree from the bottom up using a dissimilarity measure between points (e.g., Euclidean or correlation-based distance) and between groups (e.g., complete, average, single, or centroid linkage)
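Both methods can be sketched on a toy dataset of two well-separated blobs (hypothetical data, not from the book): K-means via Lloyd's algorithm with multiple random starts, implemented directly in numpy, and hierarchical clustering via scipy's complete-linkage agglomerative routine, cutting the dendrogram at two clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
# Toy data: two tight, well-separated 2-D blobs of 25 points each.
X = np.vstack([rng.normal(0.0, 0.3, (25, 2)),
               rng.normal(5.0, 0.3, (25, 2))])

def kmeans(X, k, n_starts=5, n_iter=50, seed=0):
    """Lloyd's algorithm: repeat from several random starts and keep the
    labeling with the lowest total within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    best_labels, best_cost = None, np.inf
    for _ in range(n_starts):
        centers = X[rng.choice(len(X), k, replace=False)]
        for _ in range(n_iter):
            d = np.linalg.norm(X[:, None] - centers[None], axis=2)
            labels = d.argmin(axis=1)
            # Keep the old center if a cluster ever becomes empty.
            centers = np.array([X[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        cost = ((X - centers[labels]) ** 2).sum()
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels

km = kmeans(X, k=2)

# Hierarchical clustering: build the dendrogram with complete linkage,
# then cut it so that exactly two clusters remain.
hc = fcluster(linkage(X, method="complete"), t=2, criterion="maxclust")
```

On data this clearly separated, both methods recover the two blobs exactly; on real data, K-means results can vary by start and the dendrogram's shape depends on the linkage chosen.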