Data Mass Clustering Algorithm

Clustering Data with Mass

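Summary: In clustering algorithms, the cluster centers determine the final clustering result, yet traditional partitioning clustering algorithms cannot locate the cluster centers accurately. Building on the data field, this paper proposes the new concept of a data-mass cluster center and presents a data-mass clustering algorithm that locates the cluster centers in a single step, with no iteration and no need to preset the number of clusters. Seven groups of comparative experiments show that the proposed method locates the cluster centers accurately and yields good, stable clustering results, outperforming traditional partitioning clustering algorithms and the density-peaks clustering algorithm.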

Abstract: The cluster centers have a strong effect on the clustering result. In this paper, a new concept, data mass, is proposed. The mass of a data point represents one of its inherent attributes, and it may differ depending on the data mining perspective. Based on this concept, a new clustering algorithm, clustering data with mass, is put forward. The algorithm finds the cluster centers from two attributes of the data: the data mass and the data distance. It completes the clustering process with only one pass over the whole dataset. Experimental results show that the proposed algorithm finds the cluster centers accurately and obtains better clustering results than some typical clustering algorithms, such as K-means, K-medoids, and clustering by fast search and find of density peaks.
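
As an illustration only, the following Python sketch shows one way the idea of locating centers from mass and distance could be realized. The abstract does not state the formulas, so every concrete choice here is an assumption rather than the paper's definition: the mass of a point is taken to be a Gaussian data-field potential summed over all other points, the distance of a point is taken to be its distance to the nearest point of greater mass, and the centers are taken to be the points ranked above the largest gap in the product of the two. The names `cluster_by_mass` and `sigma` are hypothetical.

```python
import math


def cluster_by_mass(points, sigma=1.0):
    """Hypothetical sketch: pick centers from (mass, distance), then label in one pass.

    Assumptions (not taken from the paper): Gaussian-kernel mass, distance to the
    nearest heavier point, and a largest-gap rule on mass * distance.
    Requires at least two points.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Assumed mass: Gaussian data-field potential contributed by all other points.
    mass = [
        sum(math.exp(-((dist[i][j] / sigma) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]

    # Visit points from heaviest to lightest (stable sort keeps ties in input order).
    order = sorted(range(n), key=lambda i: -mass[i])
    delta = [0.0] * n   # distance to the nearest point visited earlier
    parent = [-1] * n   # index of that nearest earlier point
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # heaviest point: use its largest distance
        else:
            j = min(order[:rank], key=lambda k: dist[i][k])
            delta[i], parent[i] = dist[i][j], j

    # Assumed center rule: cut the ranked scores at their largest gap.
    score = [mass[i] * delta[i] for i in range(n)]
    ranked = sorted(range(n), key=lambda i: -score[i])
    gaps = [score[ranked[k]] - score[ranked[k + 1]] for k in range(n - 1)]
    centers = ranked[: gaps.index(max(gaps)) + 1]

    # Single labeling pass: each non-center inherits the label of its parent,
    # which was visited (and therefore labeled) earlier.
    labels = [-1] * n
    for label, c in enumerate(centers):
        labels[c] = label
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return centers, labels
```

Under these assumptions no cluster count is supplied by the caller and no iterative refinement takes place, which mirrors the properties claimed in the abstract. The largest-gap rule is only a stand-in for whatever selection criterion the paper actually uses.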