Definition

The k-means clustering algorithm solves the following optimization problem where is the number of clusters, a Partition is a set of clusters, and The k-means clustering algorithm aims to partition observations into clusters in which each observation belongs to the cluster with the nearest mean.

Algorithm

  1. Initialize .

  2. Assignment step: Assign each observation to the cluster with the nearest mean

    where each is assigned to exactly one

  3. Update step: Recalculate means for observations assigned to each cluster.

  4. Iterate step 2 and 3 until the within cluster variation converges.

Facts

The k-means clustering uses spherical metric (Euclidean distance) to group data, so that it is useful only when clusters are convex sets.