K-means is a classic unsupervised learning clustering algorithm. In theory, it can work well in the field of image segmentation. But compared with other segmentation algorithms, this algorithm needs much more computation, and segmentation speed is slow. This limits its application. With the emergence of general-purpose computing on the GPU and the release of CUDA, some scholars try to implement K-means algorithm in parallel on the GPU, and applied to image segmentation at the same time. They have achieved some results, but the approach they use is not completely parallel, not take full advantage of GPU’s super computing power. K-means algorithm has two core steps: label and update, in current parallel realization of K-means, only labeling is parallel, update operation is still serial. In this paper, both of the two steps in K-means will be parallel to improve the degree of parallelism and accelerate this algorithm. Experimental results show that this improvement has reached a much quicker speed than the previous research.