Definition

The k-means clustering algorithm solves the following optimization problem:

$\min_{S} \sum_{k=1}^{K} \sum_{x_{i} \in S_{k}} \lVert x_{i} - m_{k} \rVert^{2}$

where $K$ is the number of clusters, the partition $S = \{S_{k} \mid k = 1, \dots, K\}$ is a set of clusters, and $m_{k} = \frac{1}{\lvert S_{k} \rvert} \sum_{x_{i} \in S_{k}} x_{i}$ is the mean of cluster $S_{k}$. In other words, k-means clustering aims to partition the observations into $K$ clusters in which each observation belongs to the cluster with the nearest mean.
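As a rough illustration of the objective, the sketch below evaluates the within-cluster sum of squares for two candidate partitions of the same four points. The function names (`squared_distance`, `cluster_mean`, `within_cluster_ss`) and the sample data are hypothetical, chosen only for this example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def cluster_mean(cluster):
    """Coordinate-wise mean m_k of a non-empty cluster S_k."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def within_cluster_ss(partition):
    """Objective value: sum over clusters of squared distances to the cluster mean."""
    total = 0.0
    for cluster in partition:
        m = cluster_mean(cluster)
        total += sum(squared_distance(x, m) for x in cluster)
    return total


# Two partitions of the same four points (illustrative data, not from the source).
tight = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
loose = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]

print(within_cluster_ss(tight))
print(within_cluster_ss(loose))
```

The partition that groups nearby points together yields the smaller objective value, which is the quantity k-means tries to minimize over all partitions.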

Algorithm

1. Initialize the means $\{m_{i}^{(1)} \mid i = 1, \dots, K\}$.
2. Assignment step: Assign each observation to the cluster with the nearest mean.

   $S_{i}^{(t)} = \{x_{p} \mid i = \arg\min_{1 \leq j \leq K} \lVert x_{p} - m_{j}^{(t)} \rVert^{2}\}$, where each $x_{p}$ is assigned to exactly one $S_{i}^{(t)}$, even if it is equally near to two or more means.
3. Update step: Recalculate the mean of the observations assigned to each cluster. $m_{i}^{(t + 1)} = \frac{1}{\lvert S_{i}^{(t)} \rvert} \sum_{x_{j} \in S_{i}^{(t)}} x_{j}$
4. Repeat steps 2 and 3 until the within-cluster variation converges.
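The steps above can be sketched in plain Python as follows. This is a minimal sketch under several assumptions that the note does not specify: the initial means are drawn at random from the observations, ties in the assignment step go to the lowest cluster index, an empty cluster keeps its previous mean, and convergence is declared when the within-cluster sum of squares stops decreasing by more than a tolerance. The names `k_means`, `max_iter`, `tol`, and `seed` are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100, tol=1e-9, seed=0):
    rng = random.Random(seed)
    # Step 1: initialize the means (assumption: k distinct random observations).
    means = rng.sample(points, k)
    prev_cost = float("inf")
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Step 2 (assignment): each point goes to exactly one nearest mean;
        # min() breaks ties in favor of the lowest index.
        clusters = [[] for _ in range(k)]
        for x in points:
            nearest = min(range(k), key=lambda j: squared_distance(x, means[j]))
            clusters[nearest].append(x)
        # Step 3 (update): recompute each mean; an empty cluster keeps its old mean.
        means = [mean(c) if c else means[i] for i, c in enumerate(clusters)]
        # Step 4: stop once the within-cluster variation no longer decreases.
        cost = sum(
            squared_distance(x, means[i])
            for i, c in enumerate(clusters)
            for x in c
        )
        if prev_cost - cost <= tol:
            break
        prev_cost = cost
    return means, clusters


# Illustrative data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
means, clusters = k_means(points, k=2)
print(means)
print(clusters)
```

On data this well separated the two groups are recovered, but in general the result depends on the initialization, and the procedure is only guaranteed to reach a local minimum of the objective.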

Facts

k-means clustering groups data by Euclidean distance, which favors roughly spherical clusters, so it is useful mainly when the true clusters are convex sets.
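One way to see where the convexity requirement comes from: for fixed means, the set of points nearest to a given mean is a convex region, so a cluster can never wrap around another one. The small check below illustrates this with hand-picked points; the means, the points `a` and `b`, and the helper `nearest_mean` are hypothetical and serve only as an example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest_mean(x, means):
    """Index of the mean closest to x (ties go to the lowest index)."""
    return min(range(len(means)), key=lambda j: squared_distance(x, means[j]))


# Three fixed means and two points that both fall in the first cluster.
means = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
a = (-1.0, 1.5)
b = (1.5, -2.0)

# The midpoint of a and b lies on the segment between them.
midpoint = tuple((ai + bi) / 2 for ai, bi in zip(a, b))

print(nearest_mean(a, means), nearest_mean(b, means), nearest_mean(midpoint, means))
```

Because every point on the segment between two members of a cluster is assigned to that same cluster, shapes such as rings or crescents cannot be captured as a single k-means cluster.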
