I am trying to apply K-Means through the following code snippet in Python. Basically,
arr is a NumPy array with values in three columns (data with multiple features that I want to cluster). Here, I have used the following values:
cv.TERM_CRITERIA_EPS = 1.0,
cv.TERM_CRITERIA_MAX_ITER = 10 and
attempts = 10 (as per the default values in the OpenCV documentation link above).
To be specific, my three-column
arr is an RGB image, reshaped so that each column represents a color channel.
import cv2
import numpy as np

Z = np.float32(arr)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
ret, label, center = cv2.kmeans(Z, 4, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
labelToUse = (label.flatten()).astype('int32')
centerToUse = center.astype('float64')
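For context, this is roughly how I build the three-column arr from the image (a minimal sketch; the synthetic random image here stands in for my actual image, which is loaded elsewhere, e.g. via cv2.imread):

```python
import numpy as np

# Hypothetical stand-in for the real RGB image loaded elsewhere
# (e.g. via cv2.imread); shape (height, width, 3), dtype uint8.
img = np.random.default_rng(0).integers(0, 256, size=(4, 5, 3), dtype=np.uint8)

# Reshape so each row is one pixel and each column one color channel,
# then convert to float32, which cv2.kmeans requires.
arr = img.reshape(-1, 3)
Z = np.float32(arr)

print(Z.shape)  # (20, 3): one row per pixel, one column per channel
print(Z.dtype)  # float32
```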
While this gives me perfect results about 70% of the time, in the other 30% I experience a weird case, as in the picture below (the left one is
centerToUse, the right one is
labelToUse). That is, all my cluster centers are (0, 0, 0), while the labels are 0 for all data points except the last three (3, 2 and 1 respectively). Also, for the same
arr, this abnormal case arises in some runs while in others the results are just perfect.
Can anyone suggest what my approach should be to eliminate this abnormality? I want to get decent results from K-Means every time, not wait for good fortune. Also (I don't know whether this is relevant here), the situation is the same with the scikit-learn implementation of K-Means. It improves a bit if I increase
max_iter to 30 and 300 respectively, but it still persists. K-Means++ initialization also does not help.
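One thing I have tried is ruling out bad input before clustering, since degenerate all-zero centers can come from the data rather than the algorithm. This is a sketch of the checks, not a confirmed diagnosis; the helper name and the specific conditions are my own choices:

```python
import numpy as np

def sanity_check(Z, k=4):
    """Verify the samples before calling cv2.kmeans.
    These conditions are assumptions about what could produce
    degenerate clusters, not a confirmed root cause."""
    assert Z.dtype == np.float32, "cv2.kmeans expects float32 samples"
    assert Z.ndim == 2, "expected one row per sample"
    assert np.isfinite(Z).all(), "NaN/inf values corrupt the centroid update"
    # Fewer than k distinct points cannot support k non-trivial clusters.
    assert len(np.unique(Z, axis=0)) >= k, "need at least k distinct samples"

# Small illustrative sample set with 4 clearly distinct colors.
Z = np.float32([[10, 10, 10], [12, 11, 10],
                [200, 0, 0], [0, 200, 0],
                [0, 0, 200], [128, 128, 128]])
sanity_check(Z)
print("input looks safe for k=4")
```

If the checks pass yet the degenerate case still appears, that would at least narrow the problem down to the clustering call itself rather than the data.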