K-means is a popular unsupervised machine learning algorithm that groups similar points together by recognizing naturally occurring patterns in data. Applied to the color space of an image, it can identify segments of the image in which more meaningful clustering can be applied. Color quantization has been used for decades to reduce the memory footprint of stored images. A typical image is composed of red, green, and blue channels, each represented by one byte in memory. Each pixel is therefore represented by 24 bits, giving 2^24 (about 16.8 million) unique colors. However, the human visual system is not sensitive enough to require this full color space, so it is beneficial to reduce the number of colors toward what the eye is able to distinguish. Doing so uses memory more efficiently while still preserving detail and color separation in the image. The key issue is determining how far a picture can be quantized before it degrades to the point that a human can discern the difference. Currently, no algorithm is available that reliably determines where this point occurs or whether each color channel should be treated identically. This research applies K-means color clustering to each color channel of the image separately to optimize compression. Replacing randomly seeded K-means with principal component analysis (PCA) informed K-means on each color channel further improves performance.
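As a rough illustration of clustering each color channel separately, the sketch below runs a plain one-dimensional K-means (Lloyd's algorithm) on each channel of an RGB array and replaces every intensity with its nearest cluster center. It is a minimal sketch under stated assumptions, not the method evaluated in this research: the function names `kmeans_1d` and `quantize_per_channel`, the quantile-based seeding, the fixed iteration count, the random test image, and the per-channel level counts (8, 8, 4) are all illustrative choices, and the PCA-informed seeding is not implemented here.

```python
import numpy as np


def kmeans_1d(values, k, iters=20):
    """Lloyd's algorithm on a 1-D array of intensities; returns (centers, labels)."""
    values = values.astype(np.float64)
    # Assumption: seed centers at evenly spaced quantiles (deterministic),
    # not the random or PCA-informed seeding discussed in the text.
    centers = np.quantile(values, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        # Assign each value to its nearest center.
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        # Move each center to the mean of the values assigned to it.
        for j in range(k):
            members = values[labels == j]
            if members.size:  # an empty cluster keeps its previous center
                centers[j] = members.mean()
    # Final assignment against the updated centers.
    labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
    return centers, labels


def quantize_per_channel(image, k_per_channel):
    """Quantize an (H, W, 3) uint8 image, clustering each channel on its own."""
    out = np.empty_like(image)
    height, width = image.shape[:2]
    for c, k in enumerate(k_per_channel):
        centers, labels = kmeans_1d(image[..., c].ravel(), k)
        # Replace each intensity with its (rounded) cluster center.
        out[..., c] = np.rint(centers[labels]).astype(np.uint8).reshape(height, width)
    return out


# Hypothetical input: a random image stands in for a real photograph.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

k_per_channel = (8, 8, 4)  # illustrative level counts, one per channel
quantized = quantize_per_channel(image, k_per_channel)
levels = [len(np.unique(quantized[..., c])) for c in range(3)]
```

With at most 8, 8, and 4 levels in the three channels, a pixel can take at most 8 × 8 × 4 = 256 distinct colors, so its three cluster indices fit in 3 + 3 + 2 = 8 bits instead of the original 24. Allowing a different level count per channel is what lets the channels be treated non-identically.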