Unsupervised Methods:
- Principal Components
- K-means Clustering
- Hierarchical Clustering
1. Principal Component Analysis
We will use the USArrests data to illustrate the implementation of the Principal Component method.
library(dplyr)
glimpse(USArrests)
Observations: 50
Variables: 4
$ Murder <dbl> 13.2, 10.0, 8.1, 8.8, 9.0, 7.9, 3.3, 5.9, 15.4, 17.4, 5.3, 2.6, 10.4, 7.2, 2....
$ Assault <int> 236, 263, 294, 190, 276, 204, 110, 238, 335, 211, 46, 120, 249, 113, 56, 115,...
$ UrbanPop <int> 58, 48, 80, 50, 91, 78, 77, 72, 80, 60, 83, 54, 83, 65, 57, 66, 52, 66, 51, 6...
$ Rape <dbl> 21.2, 44.5, 31.0, 19.5, 40.6, 38.7, 11.1, 15.8, 31.9, 25.8, 20.2, 14.2, 24.0,...
We will now check the means and variances of the variables to decide whether standardization is required.
apply(USArrests, 2, var)
    Murder    Assault   UrbanPop       Rape
  18.97047 6945.16571  209.51878   87.72916
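The call above prints only the variances. The means can be inspected the same way; the following is a minimal sketch, where the names col_means and col_vars are chosen here for illustration:

```r
# USArrests ships with base R (datasets package), so no extra packages are needed
col_means <- sapply(USArrests, mean)
col_vars  <- sapply(USArrests, var)

# Show means and variances side by side, one column per variable
round(rbind(mean = col_means, variance = col_vars), 2)
```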
We notice that the means and variances are quite different: the variance of Assault (about 6945) is far larger than that of Murder (about 19). Since the Principal Component method looks for the linear combination of variables that maximizes the variance, the result on the raw data would be dominated by the variable with the greatest variance.
So, we will standardize the variables (i.e. rescale each variable to unit variance) before implementing the method.
This can be achieved by setting the scale argument of the prcomp function to TRUE.
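The fitting call itself is not shown, so here is a sketch of what it might look like. The object name pca_res is taken from the output that follows; the full argument name in prcomp() is scale. (with a trailing dot), and prcomp() centers the variables by default.

```r
# Fit PCA on centered and standardized variables
pca_res <- prcomp(USArrests, scale. = TRUE)

# The scaling factors stored in the result are the column standard deviations
all.equal(pca_res$scale, sapply(USArrests, sd))
```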
pca_res
Standard deviations (1, .., p=4):
[1] 1.5748783 0.9948694 0.5971291 0.4164494
Rotation (n x k) = (4 x 4):
                PC1        PC2        PC3         PC4
Murder   -0.5358995  0.4181809 -0.3412327  0.64922780
Assault  -0.5831836  0.1879856 -0.2681484 -0.74340748
UrbanPop -0.2781909 -0.8728062 -0.3780158  0.13387773
Rape     -0.5434321 -0.1673186  0.8177779  0.08902432
The standard deviations displayed in the result are those of the 4 principal components. (Remember that the total number of principal components for a dataset is min(n - 1, p); here n = 50 and p = 4, so there are 4.)
Notice that the standard deviations are always in decreasing order: each successive component explains no more variance than the one before it.
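Squaring the standard deviations gives the variance of each component, and dividing by their sum gives the proportion of variance explained. A minimal sketch, assuming the model is fitted as in the output above (the names pca_var and pve are illustrative):

```r
pca_res <- prcomp(USArrests, scale. = TRUE)

# Variance of each principal component
pca_var <- pca_res$sdev^2

# Proportion of variance explained by each component, and the running total
pve <- pca_var / sum(pca_var)
round(pve, 3)
round(cumsum(pve), 3)
```

Because the variables were standardized, the component variances sum to the number of variables (4).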
The Rotation matrix in the above summary contains the loadings: each column is the loading vector of one principal component.
The first principal component is loaded roughly equally on the 3 crime variables (Murder, Assault, Rape) and has a smaller loading on UrbanPop.
So the first principal component essentially measures the overall level of the 3 crimes in a state.
The second principal component is heavily loaded on UrbanPop.
Visualizing the Principal Components
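The plot is not reproduced here. One common way to draw it is a biplot of the first two components; the following is a sketch under that assumption, not necessarily the plotting code originally used.

```r
pca_res <- prcomp(USArrests, scale. = TRUE)

# scale = 0 draws the arrows in loading units, so they match the Rotation matrix
biplot(pca_res, scale = 0, cex = 0.6)

# Scores of the states on PC1, smallest first
head(sort(pca_res$x[, "PC1"]), 5)
```

The sign of each component is arbitrary, so on another platform the whole axis may appear flipped without changing the interpretation.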

Interpretation
Since the loadings on the first principal component are all negative, states with large negative scores on it have high crime rates (like Michigan, Nevada, California).
Similarly, the second principal component has a large negative loading on UrbanPop. Hence states with large negative scores on it, like New Jersey and Hawaii, have a high percentage of urban population.
2. K-means clustering
We will work with a simulated 2-dimensional dataset to illustrate the application of the k-means clustering method.
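The simulation code is not shown, so the following is only a hypothetical stand-in: 100 points drawn around 4 centers with unit-variance Gaussian noise. The seed, the centers (picked loosely near the cluster means reported later) and the name sim_data are assumptions; only cluster_assign is a name used in the text.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# True cluster label (1 to 4) for each of the 100 points
cluster_assign <- sample(1:4, 100, replace = TRUE)

# Assumed cluster centers, one row per cluster
centers <- matrix(c(-1, -3.5,  3, 1,  -8.5, -3,  -2.5, 1.5),
                  ncol = 2, byrow = TRUE)

# Each point = its cluster center + standard normal noise
sim_data <- centers[cluster_assign, ] + matrix(rnorm(200), ncol = 2)

plot(sim_data, col = cluster_assign, pch = 19, xlab = "X1", ylab = "X2")
```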

The vector cluster_assign stores the true cluster number of each data point.
We will now run the k-means algorithm on this dataset. The true cluster assignment will be hidden from the algorithm.
Determining the optimal value of k: Elbow Curve Method
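An elbow curve plots the total within-cluster sum of squares against k and looks for the point where further increases in k stop paying off. A sketch on hypothetical stand-in data (the data setup, the range 1 to 10 and nstart = 20 are assumptions):

```r
set.seed(1)  # arbitrary seed
# Hypothetical stand-in for the simulated data: 4 Gaussian clusters in 2 dimensions
cluster_assign <- sample(1:4, 100, replace = TRUE)
centers <- matrix(c(-1, -3.5, 3, 1, -8.5, -3, -2.5, 1.5), ncol = 2, byrow = TRUE)
sim_data <- centers[cluster_assign, ] + matrix(rnorm(200), ncol = 2)

# Total within-cluster sum of squares for k = 1, ..., 10
k_values <- 1:10
wss <- sapply(k_values, function(k) {
  kmeans(sim_data, centers = k, nstart = 20)$tot.withinss
})

plot(k_values, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```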

Based on the plot above, we will select k = 4.
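The fit with the chosen k might then look like the sketch below. The object name kmeans_out comes from the output that follows; the stand-in data and nstart = 20 are assumptions, so the numbers will not match the printed output exactly.

```r
set.seed(1)  # arbitrary seed
# Hypothetical stand-in for the simulated data
cluster_assign <- sample(1:4, 100, replace = TRUE)
centers <- matrix(c(-1, -3.5, 3, 1, -8.5, -3, -2.5, 1.5), ncol = 2, byrow = TRUE)
sim_data <- centers[cluster_assign, ] + matrix(rnorm(200), ncol = 2)

# Run k-means with 4 clusters; several random starts guard against poor local optima
kmeans_out <- kmeans(sim_data, centers = 4, nstart = 20)
```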
kmeans_out
K-means clustering with 4 clusters of sizes 29, 22, 28, 21
Cluster means:
       [,1]      [,2]
1 -1.208942 -3.512880
2  3.062712  1.015205
3 -8.447148 -3.005280
4 -2.368405  1.643897
Clustering vector:
[1] 1 4 2 3 3 1 4 4 3 4 3 2 3 2 3 2 2 4 1 1 2 3 3 1 3 3 2 3 1 1 4 1 3 3 2 1 2 3 1 2 3 1 1 2 2 4
[47] 1 2 1 4 4 4 2 1 3 1 3 1 4 4 1 4 3 1 3 2 1 4 2 3 1 1 3 1 2 4 4 2 3 3 1 1 3 1 1 3 1 2 3 4 2 2
[93] 2 1 4 3 4 4 4 3
Within cluster sum of squares by cluster:
[1] 40.73619 51.11144 72.60169 41.22388
(between_SS / total_SS = 91.6 %)
Available components:
[1] "cluster" "centers" "totss" "withinss" "tot.withinss" "betweenss"
[7] "size" "iter" "ifault"
The output of k-means provides us with a number of metrics.
- It displays the cluster centers
- Cluster assignment for the input dataset
- Within-cluster sum of squares for each cluster: the smaller the value, the more homogeneous the cluster
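Each of these metrics can be pulled out of the fitted object by name, using the components listed under "Available components". A sketch on hypothetical stand-in data, which also recomputes the between_SS / total_SS ratio from its parts:

```r
set.seed(1)  # arbitrary seed
# Hypothetical stand-in for the simulated data
cluster_assign <- sample(1:4, 100, replace = TRUE)
centers <- matrix(c(-1, -3.5, 3, 1, -8.5, -3, -2.5, 1.5), ncol = 2, byrow = TRUE)
sim_data <- centers[cluster_assign, ] + matrix(rnorm(200), ncol = 2)
kmeans_out <- kmeans(sim_data, centers = 4, nstart = 20)

kmeans_out$centers         # cluster centers
head(kmeans_out$cluster)   # cluster assignment of the first few points
kmeans_out$withinss        # within-cluster sum of squares, one value per cluster

# Share of the total variation that lies between clusters
ratio <- kmeans_out$betweenss / kmeans_out$totss
round(100 * ratio, 1)
```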
Visualizing the output of k-means
We will compare the cluster results from k-means with the true cluster assignments.
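One way to make this comparison is to plot the data twice, colored once by the true labels and once by the k-means labels. This is a sketch on hypothetical stand-in data, not necessarily the original plotting code:

```r
set.seed(1)  # arbitrary seed
# Hypothetical stand-in for the simulated data
cluster_assign <- sample(1:4, 100, replace = TRUE)
centers <- matrix(c(-1, -3.5, 3, 1, -8.5, -3, -2.5, 1.5), ncol = 2, byrow = TRUE)
sim_data <- centers[cluster_assign, ] + matrix(rnorm(200), ncol = 2)
kmeans_out <- kmeans(sim_data, centers = 4, nstart = 20)

# Two panels side by side: true labels on the left, k-means labels on the right
old_par <- par(mfrow = c(1, 2))
plot(sim_data, col = cluster_assign, pch = 19,
     main = "True clusters", xlab = "X1", ylab = "X2")
plot(sim_data, col = kmeans_out$cluster, pch = 19,
     main = "k-means clusters", xlab = "X1", ylab = "X2")
par(old_par)

# Cross-tabulation of k-means labels against true labels
comparison <- table(kmeans_out$cluster, cluster_assign)
comparison
```

The cluster numbers assigned by k-means are arbitrary, so the same group may appear in a different color in the two panels; what matters is whether the groupings coincide.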

We can see that k-means did a pretty good job in correctly assigning points to the clusters.
3. Hierarchical Clustering
We will use the same simulated dataset used to perform k-means clustering.
We use the function hclust(), which takes 2 main arguments: the distance matrix (as produced by dist()) and the linkage method.
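A sketch of the call with complete linkage follows. The stand-in data and the name hclust_complete are assumptions; dist() computes Euclidean distances by default.

```r
set.seed(1)  # arbitrary seed
# Hypothetical stand-in for the simulated data
cluster_assign <- sample(1:4, 100, replace = TRUE)
centers <- matrix(c(-1, -3.5, 3, 1, -8.5, -3, -2.5, 1.5), ncol = 2, byrow = TRUE)
sim_data <- centers[cluster_assign, ] + matrix(rnorm(200), ncol = 2)

# Pairwise Euclidean distances, then agglomerative clustering with complete linkage
dist_mat <- dist(sim_data)
hclust_complete <- hclust(dist_mat, method = "complete")

# Draw the dendrogram; leaf labels are suppressed to keep it readable
plot(hclust_complete, labels = FALSE, main = "Complete linkage",
     xlab = "", sub = "")
```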

Since we know that there are 4 clusters in the data, the dendrogram above does in fact show the presence of 4 major clusters (if we cut the dendrogram at a height between 5 and 10).
Recall that complete linkage uses the maximum pairwise-distance between points in 2 clusters.
We will now use other linkage methods:
- Single: minimum pairwise distance
- Average: average of the pairwise distances between 2 clusters
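Switching linkage only requires changing the method argument of hclust(). A sketch on hypothetical stand-in data (the names hclust_single and hclust_average are illustrative):

```r
set.seed(1)  # arbitrary seed
# Hypothetical stand-in for the simulated data
cluster_assign <- sample(1:4, 100, replace = TRUE)
centers <- matrix(c(-1, -3.5, 3, 1, -8.5, -3, -2.5, 1.5), ncol = 2, byrow = TRUE)
sim_data <- centers[cluster_assign, ] + matrix(rnorm(200), ncol = 2)
dist_mat <- dist(sim_data)

# Same distance matrix, two different linkage rules
hclust_single  <- hclust(dist_mat, method = "single")
hclust_average <- hclust(dist_mat, method = "average")

plot(hclust_single, labels = FALSE, main = "Single linkage", xlab = "", sub = "")
plot(hclust_average, labels = FALSE, main = "Average linkage", xlab = "", sub = "")
```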

As expected, single linkage produced long, stringy trees. The 4 clusters are not really prominent in the above dendrogram.

Average linkage, like the complete method, produces balanced trees. The 4 clusters are quite visible in the above dendrogram.
Comparing the result from the complete linkage method with the true cluster assignments
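To obtain cluster labels from a dendrogram, the tree has to be cut. A sketch using cutree() with k = 4 on hypothetical stand-in data; the name hclust_complete_cut is the one used in the table call below, while the rest of the setup is assumed.

```r
set.seed(1)  # arbitrary seed
# Hypothetical stand-in for the simulated data
cluster_assign <- sample(1:4, 100, replace = TRUE)
centers <- matrix(c(-1, -3.5, 3, 1, -8.5, -3, -2.5, 1.5), ncol = 2, byrow = TRUE)
sim_data <- centers[cluster_assign, ] + matrix(rnorm(200), ncol = 2)
hclust_complete <- hclust(dist(sim_data), method = "complete")

# Cut the tree so that exactly 4 clusters remain; returns one label per observation
hclust_complete_cut <- cutree(hclust_complete, k = 4)
head(hclust_complete_cut)
```

Cutting by number of clusters (k) rather than by height (h) avoids having to read a suitable height off the dendrogram.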
table(hclust_complete_cut, cluster_assign)
                   cluster_assign
hclust_complete_cut  1  2  3  4
                  1 29  0  0  0
                  2  0 20  0  0
                  3  0  0 22  0
                  4  1  0  0 28
The table above shows that only 1 observation has been assigned to a wrong cluster.
Comparing the result from the complete linkage method with the k-means cluster assignments
table(hclust_complete_cut, kmeans_out$cluster)
hclust_complete_cut  1  2  3  4
                  1 29  0  0  0
                  2  0  0  0 20
                  3  0 22  0  0
                  4  0  0 28  1
