The K-Means algorithm tries to separate the samples into k groups of equal variance by minimising the within-cluster sum of squares (also called inertia). The number of clusters k must be specified in advance. K-Means scales well to large numbers of samples and has been used in a wide range of application areas.
The cluster centre (centroid) is the arithmetic mean of all points belonging to that cluster. Each point is closer to the centre of its own cluster than to the centre of any other cluster.
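The two definitions above can be written compactly. The symbols are introduced here for illustration: $C_j$ is the set of points in cluster $j$, $\mu_j$ is its centroid, and $c(x)$ is the cluster a point $x$ is assigned to. K-Means looks for the assignment that minimises the within-cluster sum of squares $J$:

$$
\mu_j = \frac{1}{|C_j|} \sum_{x \in C_j} x, \qquad
c(x) = \arg\min_{j} \lVert x - \mu_j \rVert, \qquad
J = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2 .
$$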
Step by Step K-Means Algorithm:
- Random assignment of centroids: The algorithm chooses an initial centroid for each of the k clusters at random, for example by picking k samples from the dataset.
- Creating the first clusters: The algorithm assigns each point to the cluster of its nearest centroid, which yields the first k clusters. Assigning points to centroids requires a distance measure, and the most common choice is Euclidean distance.
- Recalculation of centres: For each cluster, the algorithm recalculates the centroid as the mean of all points in the cluster. Changes in centres are indicated by arrows in the figure. As the centres move, the algorithm reassigns the points to their nearest centre.
- Reassignments and clusters: The algorithm repeats the centroid recalculation and the point reassignment until no point changes cluster, that is, until it converges. For large datasets, other stopping criteria are often used instead, such as a maximum number of iterations or a minimum change in the centroids, which stop the algorithm before it fully converges.
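The four steps above can be sketched in plain Python. This is a minimal illustration of Lloyd's algorithm, not the notebook's code: the function name `kmeans`, its parameters `max_iter` and `seed`, and the six-point toy dataset are hypothetical, and points are assumed to be tuples of coordinates. It uses random initial centroids as in the first step and Euclidean distance (`math.dist`, Python 3.8+) as in the second.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Minimal K-Means (Lloyd's algorithm) on a list of coordinate tuples.

    Hypothetical helper for illustration. Returns (centroids, labels, n_iter).
    """
    rng = random.Random(seed)
    # Step 1: choose k distinct samples at random as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        # Steps 2 and 4: assign each point to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Converged: no point changed cluster since the previous pass.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels, n_iter


# Hypothetical toy data: two well-separated groups of three points each.
pts = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids, labels, n_iter = kmeans(pts, k=2)
print("labels:", labels)
print("centroids:", centroids)
print("iterations:", n_iter)
```

Here `max_iter` plays the role of the alternative stopping criterion mentioned in the last step. Libraries such as scikit-learn add smarter initialisation (k-means++) and several restarts, which this sketch omits for brevity.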
Those are the details of K-Means clustering. In this challenge, you should use the Elbow method to find the optimum number of clusters.
The Elbow method is used to determine the number of clusters k when applying the K-Means algorithm:
- Plot the value of the cost function (typically the within-cluster sum of squares) for different values of k
- Take the elbow point of the plotted curve, that is, the k beyond which the cost stops decreasing sharply
- This point is usually close to the optimum number of clusters
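The Elbow procedure described above can be sketched as follows. The sketch assumes the cost is the within-cluster sum of squares, which scikit-learn exposes as the `inertia_` attribute of a fitted `KMeans`. Everything here is hypothetical and for illustration only: the helper names `wcss_kmeans` and `elbow_k`, the three-blob toy dataset, and the rule used to locate the elbow. That rule picks the k whose cost lies farthest below the straight line joining the first and last points of the curve. To reduce the chance of a poor local optimum, each k is run several times with k-means++ seeding, and the lowest cost is kept.

```python
import math
import random


def wcss_kmeans(points, k, n_init=5, max_iter=100, seed=0):
    """Lowest within-cluster sum of squares found over n_init K-Means runs.

    Hypothetical helper for illustration; uses k-means++ seeding.
    """
    rng = random.Random(seed)
    best = math.inf
    for _run in range(n_init):
        # k-means++ seeding: each extra centre is drawn with probability
        # proportional to its squared distance from the nearest chosen centre.
        centroids = [rng.choice(points)]
        while len(centroids) < k:
            d2 = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
            centroids.append(rng.choices(points, weights=d2)[0])
        labels = None
        for _it in range(max_iter):
            # Assign each point to its nearest centroid.
            new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
            if new_labels == labels:
                break
            labels = new_labels
            # Move each centroid to the mean of its points.
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if members:
                    centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
        # Cost: squared distance of every point to its assigned centroid.
        cost = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
        best = min(best, cost)
    return best


def elbow_k(ks, costs):
    """Hypothetical elbow rule: the k whose cost lies farthest below the chord
    joining the first and last points of the cost curve."""
    k_first, k_last = ks[0], ks[-1]
    c_first, c_last = costs[0], costs[-1]

    def gap(k, c):
        # Vertical distance below the chord (proportional to the perpendicular distance).
        chord = c_first + (c_last - c_first) * (k - k_first) / (k_last - k_first)
        return chord - c

    return max(zip(ks, costs), key=lambda kc: gap(*kc))[0]


# Hypothetical toy data: three Gaussian blobs of 50 points each.
rng = random.Random(42)
blob_centres = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
points = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)) for cx, cy in blob_centres for _ in range(50)]

ks = list(range(1, 11))  # the 1-10 range used in this challenge
costs = [wcss_kmeans(points, k) for k in ks]
for k, c in zip(ks, costs):
    print(f"k={k:2d}  WCSS={c:10.1f}")
print("elbow at k =", elbow_k(ks, costs))
```

The toy data were generated from three blobs, so the curve should bend sharply at k = 3. On real data the elbow is often less clear-cut, and it is worth confirming any automatic rule by looking at the plot itself.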
We applied the K-Means algorithm in the code given in the dataset section. That notebook works with a dataset of 1000 samples (random_state=5) and sets K to 2 without knowing whether that is the right choice. Your task is to find the best K value using the Elbow method and to report the iteration_value you obtain for that K.
What is the value corresponding to the optimum K in the range 1 to 10 according to the Elbow method?
Submissions are evaluated using the Accuracy Score.