**Finding Optimal Cluster Number**

After normalizing our dataset, we will use **KMeans Clustering** to group similar education data. The K-Means algorithm works by assigning each data point to the cluster whose centroid is nearest in terms of *Euclidean distance*.

We don’t know the number of clusters beforehand, so for KMeans Clustering we will determine it using the **Elbow Method**. It is called the Elbow Method because we choose the cluster count at the point of the graph where the curve stops falling **steeply** (*just like the elbow of an arm*). It is the point beyond which the **WCSS (Within-Cluster Sum of Squares)** decreases only slowly.

**WCSS** is the sum of squared distances between each point in a cluster and that cluster’s centroid.
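The Elbow Method can be sketched as below. This is a minimal example on synthetic stand-in data (the education dataset itself is not reproduced here); in scikit-learn, the fitted model’s `inertia_` attribute is exactly the WCSS:

```python
# Sketch of the Elbow Method on synthetic stand-in data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Two well-separated blobs as a stand-in for the education data.
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(5, 1, (50, 4))])
X_scaled = StandardScaler().fit_transform(X)

wcss = []
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    # inertia_ is the WCSS: sum of squared distances of each
    # point to its assigned cluster centre.
    wcss.append(km.inertia_)

# The "elbow" is where WCSS stops falling steeply; plotting
# k vs. wcss with matplotlib lets you read it off visually.
print(dict(zip(range(1, 8), np.round(wcss, 1))))
```

For clearly separated data like this, the drop from k=1 to k=2 dominates, which is what produces the elbow shape.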

From the above plot, we can see *2 or 3* could be the ideal number of clusters.

To choose between 2 and 3 as the ideal number of clusters, we will use the metrics discussed below:

- __Silhouette Score:__ ranges from -1 to 1; the higher the value, the better our clusters are. A value close to 1 means well-formed, well-separated clusters, 0 means the point lies at the border of its cluster, and a negative value means the point has been assigned to the wrong cluster.
- __Calinski-Harabasz Index:__ denotes how the data points are spread within a cluster. The higher the score, the denser and better separated the clusters. It starts at 0 and has no upper limit.
- __Davies-Bouldin Index:__ measures the average similarity between each cluster and its most similar cluster, using the ratio of __the average distance between each data point of a cluster and its cluster center__ to __the distance between a cluster and its closest cluster__. The closer the score is to 0, the better, as it indicates the clusters are well separated.

Let’s check the values of these metrics to find the ideal number of clusters for our KMeans algorithm on the scaled data. We already concluded that *2 or 3* would be the ideal number of clusters, but we will also test with 4 and 5 for the purpose of demonstration.
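The comparison can be sketched as follows, again on synthetic stand-in data, using scikit-learn’s three validity metrics:

```python
# Sketch: comparing cluster-validity metrics for several values of k.
# Synthetic stand-in data; swap in the scaled education dataset.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (silhouette_score,
                             calinski_harabasz_score,
                             davies_bouldin_score)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (60, 4)), rng.normal(6, 1, (60, 4))])

results = {}
for k in [2, 3, 4, 5]:
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    results[k] = {
        "silhouette": silhouette_score(X, labels),                # higher is better
        "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
        "davies_bouldin": davies_bouldin_score(X, labels),        # lower is better
    }

for k, scores in results.items():
    print(k, {m: round(v, 3) for m, v in scores.items()})
```

Since this stand-in data really has two clusters, k=2 should win on the Silhouette and Davies-Bouldin scores, mirroring what the article observes on the education data.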

As expected from the **Elbow Method**, **2** has the best **Silhouette Score** and **Davies-Bouldin Score**, and the second-best **Calinski-Harabasz Score**. So **2** clusters can be an ideal choice.

We have discussed that we should always **normalize** our data to a similar range before applying a *distance-based clustering* algorithm, but let’s also check the metric values using **KMeans** on the unnormalized data. *Remember that it is always good to experiment for better understanding and results.*
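Why scale matters can be sketched on synthetic data, assuming (as is typical) that the unnormalized features differ widely in scale. A large-scale noise feature dominates the Euclidean distance and KMeans stops recovering the true groups, which we can measure against known labels with the Adjusted Rand Index:

```python
# Sketch: effect of skipping normalization before KMeans.
# Synthetic data with one feature on a much larger scale (an assumption;
# the actual education dataset is not reproduced here).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Three informative features separating two groups of 60 points each.
small = np.vstack([rng.normal(0, 1, (60, 3)), rng.normal(4, 1, (60, 3))])
# One uninformative feature measured in much larger units.
big = rng.normal(1000, 500, (120, 1))
X_raw = np.hstack([small, big])
X_scaled = StandardScaler().fit_transform(X_raw)

true = np.repeat([0, 1], 60)
scores = {}
for name, X in [("unnormalized", X_raw), ("normalized", X_scaled)]:
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    # 1.0 = perfect recovery of the true groups, ~0 = random.
    scores[name] = adjusted_rand_score(true, labels)
    print(name, round(scores[name], 3))
```

On the unnormalized data the split follows the large-scale noise feature, so the true groups are essentially lost.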

We can see that the best cluster number in this case is **3**. But both the **Silhouette Score** and the **Davies-Bouldin Score** are worse than for the 2-cluster solution we evaluated before, though the **Calinski-Harabasz Score** improved a bit. **Overall, the model performance deteriorated.** So, as mentioned before, *normalizing* data points before clustering does give better results.

Next, we will use a **Hierarchical Clustering** technique called __Agglomerative Clustering__. It is a *bottom-up clustering approach* where each data point starts as its own cluster, and the closest clusters are merged according to a distance metric until a single cluster remains.

Hierarchical Clustering can be visualized using a **dendrogram**, as shown below. The dendrogram suggests **2** as the suitable number of clusters for this data (*the red and green branches*).
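A dendrogram like this can be built with SciPy’s hierarchy tools; a minimal sketch on synthetic stand-in data:

```python
# Sketch: hierarchical (agglomerative) clustering and a dendrogram cut.
# Synthetic stand-in for the normalized education data.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 1, (40, 4)), rng.normal(5, 1, (40, 4))])

# Ward linkage: at each step, merge the pair of clusters that
# minimally increases the total within-cluster variance.
Z = linkage(X, method="ward")

# Cutting the tree into 2 clusters, i.e. at the level where the
# dendrogram's two tallest branches split.
labels = fcluster(Z, t=2, criterion="maxclust")
print(np.unique(labels, return_counts=True))

# scipy.cluster.hierarchy.dendrogram(Z) with matplotlib would
# draw the tree itself.
```

For well-separated data, the two-cluster cut recovers the original groups exactly.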

The dendrogram showed that the ideal number of clusters for the dataset is **2**, which is also what the **KMeans** algorithm found. We will again use the *Silhouette Score, Calinski-Harabasz Index*, and *Davies-Bouldin Score* to validate this.

Different types of **Linkage** (*the criterion used to decide which two clusters to merge into a bigger cluster*) **Functions**:

- __Single Linkage:__ combines clusters based on the closest (minimum) distance between a pair of points, one from each cluster. The cluster pair with the smallest such distance gets merged.
- __Complete Linkage:__ combines clusters based on the farthest (maximum) distance between a pair of points from the two clusters. The cluster pair whose maximum pairwise distance is smallest gets merged.
- __Average Linkage:__ merges clusters based on the average distance of all points in one cluster from the points in the other cluster. The cluster pair with the lowest average distance gets merged.
- __Ward Linkage:__ merges the cluster pair that leads to the minimum increase in the total within-cluster sum of squared distances (variance).

Performing **Agglomerative Clustering** on **normalized** data:
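A sketch of this step, comparing linkage criteria and cluster counts on synthetic stand-in data with scikit-learn’s `AgglomerativeClustering`:

```python
# Sketch: Agglomerative Clustering under different linkage criteria,
# scored with the same three validity metrics (synthetic stand-in data).
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import (silhouette_score,
                             calinski_harabasz_score,
                             davies_bouldin_score)

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(6, 1, (50, 4))])

table = {}
for link in ["single", "complete", "average", "ward"]:
    for k in [2, 3, 4]:
        labels = AgglomerativeClustering(n_clusters=k,
                                         linkage=link).fit_predict(X)
        table[(link, k)] = (
            round(silhouette_score(X, labels), 3),           # higher is better
            round(calinski_harabasz_score(X, labels), 1),    # higher is better
            round(davies_bouldin_score(X, labels), 3),       # lower is better
        )

for key, scores in table.items():
    print(key, scores)
```

Printing `table` reproduces the kind of output table the article refers to, with one row per (linkage, k) pair.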

As observed from the output table, the ideal number of clusters is indeed **2**, with linkage method ‘**average**’ or ‘**ward**’.

Now let’s use **Agglomerative Clustering (ward linkage)** on the **unnormalized data** and check how it performs.

In the case of the unnormalized dataset, 2 clusters with complete linkage perform best. But on the normalized dataset the performance is better overall: the **Silhouette Score** and **Calinski-Harabasz Score** are higher, though the **Davies-Bouldin Score** is slightly worse.

With 2 clusters, both algorithms, KMeans and Agglomerative, have the same performance on the normalized dataset. Let’s see how the **values of each feature/parameter vary across** the two clusters for both algorithms.