Information Sciences Letters

Abstract

Data mining is the process of extracting valuable hidden knowledge, patterns, and associations from massive and sparse datasets. It relies on techniques such as classification, clustering, and association. Clustering is an important data mining technique that groups similar data items together. In this study, six clustering techniques are compared on six datasets using several evaluation parameters. Overall, the results indicate that the k-means algorithm is the best and simplest of the six: it produced quality clusters and delivered high performance. The EM algorithm performed worst, taking more time while producing inaccurate results. The hierarchical algorithm performed best on small datasets but took more time on huge datasets. The density-based clustering and Canopy algorithms performed almost the same, with only slight differences in results. We also compared our results with existing results and showed that the proposed results are reasonable and accurate. Our analysis and results give clustering researchers a better understanding with which to improve existing techniques, analyze further techniques, and propose new clustering techniques.
