Corelab Seminar

Loukas Kavouras (NTUA)
k-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. It aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as the prototype of the cluster. I will present Lloyd's algorithm, which is very simple and fast in practical applications but has no approximation guarantees. By improving the initialization of the algorithm we get k-means++, a faster algorithm with approximation guarantees. Next, I will show a way to parallelize the algorithm, making it much more efficient for large data sets. Finally, I will present two algorithms in the streaming model.
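
As a rough illustration of the first two topics, here is a minimal Python sketch of Lloyd's iteration with k-means++ seeding. It is not taken from the talk: the function names (sq_dist, kmeanspp_init, lloyd), the tuple representation of points, the iteration cap and the fixed random seed are all assumptions made for this example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_init(points, k, rng):
    """k-means++ seeding: the first center is uniform at random, and each
    further center is sampled with probability proportional to its squared
    distance from the nearest center chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def lloyd(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate between assigning each point to its
    nearest center and moving each center to the mean of its cluster.
    `iters` and `seed` are hypothetical parameters of this sketch."""
    rng = random.Random(seed)
    centers = kmeanspp_init(points, k, rng)
    clusters = []
    for _ in range(iters):
        # Assignment step: nearest center by squared distance.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # Update step: mean of each cluster (an empty cluster keeps its center).
        new_centers = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments are stable, so the algorithm has converged
        centers = new_centers
    return centers, clusters


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, clusters = lloyd(points, k=2)
```

Replacing kmeanspp_init with a uniform random choice of k points gives plain Lloyd's algorithm, which is the variant without approximation guarantees.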