Definition

The k-means clustering algorithm solves the following optimization problem where is the number of clusters, a Partition is a set of clusters, and The k-means clustering algorithm aims to partition observations into clusters in which each observation belongs to the cluster with the nearest mean.
Algorithm
-
Initialize .
-
Assignment step: Assign each observation to the cluster with the nearest mean
where each is assigned to exactly one
-
Update step: Recalculate means for observations assigned to each cluster.
-
Iterate step 2 and 3 until the within cluster variation converges.
Facts
The k-means clustering uses spherical metric (Euclidean distance) to group data, so that it is useful only when clusters are convex sets.