
Applied Mathematics & Information Sciences

Authors

Jinchao Ji, School of Computer Science and Information Technology, Northeast Normal University, Changchun, 130117, China; Key Lab of Intelligent Information Processing of Jilin Universities, Northeast Normal University, Changchun, 130117, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China
Wei Pang, School of Natural and Computing Sciences, University of Aberdeen, Aberdeen, AB24 3UE, UK
Yanlin Zheng, School of Computer Science and Information Technology, Northeast Normal University, Changchun, 130117, China; Key Lab of Intelligent Information Processing of Jilin Universities, Northeast Normal University, Changchun, 130117, China
Zhe Wang, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, 130012, China
Zhiqiang Ma, School of Computer Science and Information Technology, Northeast Normal University, Changchun, 130117, China; Key Lab of Intelligent Information Processing of Jilin Universities, Northeast Normal University, Changchun, 130117, China
Libiao Zhang, School of Computer Science and Information Technology, Northeast Normal University, Changchun, 130117, China; Key Lab of Intelligent Information Processing of Jilin Universities, Northeast Normal University, Changchun, 130117, China

Author Country (or Countries)

China; United Kingdom

Abstract

The k-prototypes algorithms are well known for their efficiency in clustering mixed numeric and categorical data. In k-prototypes-type algorithms, the initial cluster centers are often chosen at random. It is widely acknowledged that the initial placement of cluster centers directly affects the performance of the k-prototypes algorithms. However, most existing initialization approaches are designed for the k-means or k-modes algorithms, which can handle only purely numeric or purely categorical data, respectively, but not a mixture of both. In this paper, we propose a novel cluster center initialization method for the k-prototypes algorithms to address this issue. In the proposed method, the centrality of data objects is introduced based on the concept of a neighborset, and centrality and distance are then exploited together to determine the initial cluster centers. The performance of the proposed method is demonstrated through a series of experiments comparing it with the traditional random initialization method.
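To make the idea of combining centrality and distance concrete, the following is a minimal sketch, not the authors' method. It uses Huang's standard k-prototypes dissimilarity (squared Euclidean distance on numeric attributes plus a weight `gamma` times the number of categorical mismatches). The abstract does not define the neighborset or the exact combination rule, so both are hypothetical here: an object's neighborset is assumed to be all objects within the mean pairwise distance, its centrality is assumed to be the size of that set, and each new center is assumed to maximize centrality multiplied by the distance to the nearest already chosen center. The names `mixed_distance`, `pairwise_distances` and `init_centers` are illustrative only.

```python
import numpy as np


def mixed_distance(x_num, x_cat, y_num, y_cat, gamma):
    """Huang-style k-prototypes dissimilarity: squared Euclidean distance on
    numeric attributes plus gamma times the count of categorical mismatches."""
    num = float(np.sum((np.asarray(x_num, float) - np.asarray(y_num, float)) ** 2))
    cat = sum(a != b for a, b in zip(x_cat, y_cat))
    return num + gamma * cat


def pairwise_distances(X_num, X_cat, gamma):
    """Symmetric n x n matrix of mixed dissimilarities."""
    n = len(X_num)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = mixed_distance(X_num[i], X_cat[i], X_num[j], X_cat[j], gamma)
            D[i, j] = D[j, i] = d
    return D


def init_centers(X_num, X_cat, k, gamma):
    """Return indices of k initial prototypes (hypothetical scoring rule)."""
    D = pairwise_distances(X_num, X_cat, gamma)
    n = D.shape[0]
    assert 1 <= k <= n, "k must be between 1 and the number of objects"
    # ASSUMPTION: neighborset = objects within the mean pairwise distance.
    radius = D[np.triu_indices(n, 1)].mean() if n > 1 else 0.0
    # ASSUMPTION: centrality = neighborset size, excluding the object itself.
    centrality = (D <= radius).sum(axis=1) - 1
    # First center: the most central object.
    centers = [int(np.argmax(centrality))]
    while len(centers) < k:
        # Distance from every object to its nearest chosen center.
        min_dist = D[:, centers].min(axis=1)
        # ASSUMPTION: favor objects that are both central and far from centers.
        score = centrality * min_dist
        score[centers] = -1.0  # never pick the same object twice
        centers.append(int(np.argmax(score)))
    return centers
```

For example, with two well-separated groups of objects that differ in both their numeric and categorical values, `init_centers(X_num, X_cat, k=2, gamma=1.0)` picks one object from each group. Multiplying centrality by distance is just one simple way to trade off "dense" against "spread out"; a random initializer offers neither guarantee.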
