Future Computing and Informatics Journal

An analysis of MapReduce efficiency in document clustering using parallel K-means algorithm

Tanvir Habib Sardar, tanvir.cs@pace.edu.in
Zahid Ansari, Computer Science and Engineering, P.A. College of Engineering, Mangalore, India

Abstract

Clustering is one of the most significant data mining techniques. As every field expands and digitizes, large datasets are being generated rapidly. Clustering such large datasets is a challenge for traditional sequential clustering algorithms because of the long processing time they require. Distributed parallel architectures and algorithms therefore help to meet the performance and scalability requirements of clustering large datasets. In this study, we design and evaluate a parallel k-means algorithm using the MapReduce programming model and compare its results with those of sequential k-means when clustering document datasets of varying size. The results demonstrate that the proposed parallel k-means outperforms sequential k-means when clustering documents.
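
As a rough illustration of how a k-means iteration can be phrased in MapReduce terms, the sketch below runs a map step (assign each point to its nearest centroid) and a reduce step (average the points per centroid) in a single Python process. It is not the authors' implementation: the function names `map_phase`, `reduce_phase`, and `kmeans` are hypothetical, documents are assumed to be already converted to numeric vectors, and a real deployment would distribute the two phases across a cluster.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every point."""
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield idx, p


def reduce_phase(pairs, centroids):
    """Reduce: replace each centroid with the mean of its assigned points."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    updated = list(centroids)  # centroids with no points keep their position
    for idx, members in groups.items():
        updated[idx] = tuple(sum(col) / len(members) for col in zip(*members))
    return updated


def kmeans(points, centroids, max_iter=20):
    """Repeat map and reduce until the centroids stop changing."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

In this sketch the map step is independent per point, which is the part a MapReduce framework could split across workers; only the per-centroid averaging needs the grouped output.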

Recommended Citation

Sardar, Tanvir Habib and Ansari, Zahid (2018) "An analysis of MapReduce efficiency in document clustering using parallel K-means algorithm," Future Computing and Informatics Journal: Vol. 3: Iss. 2, Article 7.
Available at: https://digitalcommons.aaru.edu.jo/fcij/vol3/iss2/7

Included in

Computer Engineering Commons
