I am trying to apply K-Means through the following code snippet in Python. Basically, arr is a NumPy array with values in three columns (data with multiple features that I want to cluster). Here, I have used the following values: cv.TERM_CRITERIA_EPS = 1.0, cv.TERM_CRITERIA_MAX_ITER = 10 and attempts = 10 (as per the default values in the OpenCV documentation). To be specific, my three-column arr is an RGB image, reshaped so that each column represents a color channel.
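For context, here is a minimal sketch of how such an arr could be built (the image file name is hypothetical, not from my actual setup):

import cv2
import numpy as np

# hypothetical input image; any H x W x 3 image works the same way
img = cv2.imread('image.png')

# flatten the image so each row is one pixel and each column one color channel
arr = img.reshape(-1, 3)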
import cv2
import numpy as np

# cv2.kmeans expects float32 input
Z = np.float32(arr)
# stop after 10 iterations or once the centers move by less than epsilon = 1.0
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
# K = 4 clusters, 10 attempts, random initial centers
ret, label, center = cv2.kmeans(Z, 4, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
labelToUse = (label.flatten()).astype('int32')
centerToUse = center.astype('float64')
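For reference, a typical way to use the two outputs is to rebuild the quantized image, roughly as follows (a sketch that reuses img from the snippet above):

# map each pixel's label back to its cluster center and restore the image shape
quantized = centerToUse[labelToUse].reshape(img.shape).astype('uint8')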
While this gives me perfect results about 70% of the time, in the other 30% I run into a weird case, as in the picture below (the left one is centerToUse, the right one is labelToUse). That is, all my cluster centers are (0, 0, 0), while the labels are 0 for all data points except the last three (3, 2 and 1 respectively). Moreover, for the same arr, this abnormal case arises in some runs, while in others the results are just perfect.
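For reference, the degenerate outcome is easy to spot programmatically; a minimal sketch, assuming the failure mode is exactly as described above (all centers collapse to zero):

import numpy as np

# True when every cluster center has collapsed to (0, 0, 0)
def is_degenerate(center):
    return np.allclose(center, 0.0)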
Can anyone suggest what my approach should be to eliminate this abnormality? I want to get decent results from K-Means every time, not wait for good fortune. Also (I don't know whether this is relevant here), the situation is the same with the scikit-learn implementation of K-Means. It improves a bit if I increase n_init and max_iter to 30 and 300 respectively, but the problem still persists. K-Means++ initialization also does not help.
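For completeness, the scikit-learn version I tried looks roughly like this (a sketch under the same assumptions; Z is the same float32 array as in the OpenCV snippet):

from sklearn.cluster import KMeans

# 4 clusters, k-means++ initialization, and the increased settings mentioned above
km = KMeans(n_clusters=4, init='k-means++', n_init=30, max_iter=300)
labels = km.fit_predict(Z)
centers = km.cluster_centers_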
Any leads on this issue would be greatly appreciated.