K-means clustering
K-means clustering is a popular unsupervised learning algorithm used in data mining and machine learning to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (the cluster centroid), which serves as a prototype of the cluster.
Overview
K-means clustering optimizes the positions of the centroids (the mean position of all the points in a cluster) to minimize the within-cluster sum of squares (WCSS). In other words, the algorithm tries to minimize the variance within each cluster. The 'means' in K-means refers to the averaging of the data points in each cluster, which yields the centroid.
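In symbols, writing the clusters as S = {S<sub>1</sub>, …, S<sub>k</sub>} and the centroid of S<sub>i</sub> as μ<sub>i</sub> (notation introduced here for clarity), the WCSS objective can be written as
<math display="block">
\underset{S}{\operatorname{arg\,min}} \sum_{i=1}^{k} \sum_{\mathbf{x} \in S_i} \left\| \mathbf{x} - \boldsymbol{\mu}_i \right\|^2 ,
</math>
where <math>\boldsymbol{\mu}_i</math> is the mean of the points in <math>S_i</math>.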
Algorithm
The standard algorithm for K-means clustering, often referred to as Lloyd's algorithm, involves four steps (sketched in code after the list):
- Initialization: Select k initial centroids, typically at random.
- Assignment: Assign each observation to the cluster with the closest centroid.
- Update: Calculate the new centroids as the mean of the observations in each cluster.
- Repeat: Repeat the assignment and update steps until convergence, that is, until the centroids no longer change significantly.
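A minimal sketch of these four steps in Python with NumPy is shown below; the function name kmeans and the parameters max_iter and tol are illustrative choices for this sketch, not part of any standard library.
<syntaxhighlight lang="python">
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-4, seed=None):
    # X: (n, d) array of observations; k: number of clusters.
    rng = np.random.default_rng(seed)

    # Initialization: pick k distinct observations as the starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]

    for _ in range(max_iter):
        # Assignment: each observation joins the cluster with the closest
        # centroid (squared Euclidean distance).
        distances = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = distances.argmin(axis=1)

        # Update: each new centroid is the mean of the observations assigned
        # to it (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])

        # Repeat until the centroids no longer change significantly.
        if np.linalg.norm(new_centroids - centroids) < tol:
            centroids = new_centroids
            break
        centroids = new_centroids

    return labels, centroids

# Example on synthetic data.
X = np.random.default_rng(0).normal(size=(300, 2))
labels, centroids = kmeans(X, k=3, seed=0)
</syntaxhighlight>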
Applications
K-means clustering is widely used in fields such as market research, pattern recognition, image analysis, and bioinformatics to group data into k distinct clusters based on their features.
Challenges and Solutions
One of the main challenges of K-means clustering is choosing the appropriate number of clusters (k). Several methods, such as the Elbow method and the Silhouette method, have been developed to address this issue. Another challenge is the algorithm's sensitivity to the initial centroid selection, which can lead to suboptimal clustering. Solutions include running the algorithm multiple times with different initializations and using more sophisticated initialization methods such as the k-means++ algorithm.
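As an illustration, the sketch below (assuming scikit-learn is available, and using a synthetic dataset from make_blobs purely for demonstration) records the WCSS, exposed by scikit-learn as inertia_, together with the silhouette score for several values of k; the "elbow" in the WCSS curve and the k that maximizes the silhouette score are common heuristics for choosing the number of clusters.
<syntaxhighlight lang="python">
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with four well-separated groups, used only for illustration.
X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ is the within-cluster sum of squares of the fitted model.
    print(k, model.inertia_, silhouette_score(X, model.labels_))
</syntaxhighlight>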
Variants
Several variants of the K-means algorithm exist, including:
- K-means++: Improves the initialization phase by spreading out the initial centroids, which tends to produce better final clusters.
- Fuzzy K-means: Allows observations to belong to more than one cluster with varying degrees of membership.
- Mini-batch K-means: Uses small random batches of observations for each iteration, reducing computation time (see the sketch after this list).
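As a rough illustration of the first and last variants, assuming scikit-learn is available and using arbitrary sizes and parameters, the sketch below fits the standard KMeans estimator (which defaults to the k-means++ initialization) and MiniBatchKMeans on the same synthetic data and compares their within-cluster sums of squares.
<syntaxhighlight lang="python">
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100_000, centers=5, random_state=0)

# Standard (Lloyd-style) K-means with the k-means++ initialization,
# which is scikit-learn's default.
full = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)

# Mini-batch K-means updates the centroids from small random batches each
# iteration, trading a little clustering quality for faster computation.
mini = MiniBatchKMeans(n_clusters=5, batch_size=1024, n_init=10, random_state=0).fit(X)

print(full.inertia_, mini.inertia_)
</syntaxhighlight>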


