Author Country (or Countries)

Egypt

Abstract

Clustering protein sequences based on their sequence patterns has attracted considerable research effort over the last decade. The central question for most clustering systems is how to represent and interpret protein sequences, since this choice largely determines the performance of the classifiers. In this paper, we propose a new methodology that defines a new descriptor to represent and interpret each sequence using its Probability Density Functions (PDFs). The Hellinger distance is used to measure the similarity between sequences. A hierarchical algorithm is then applied to cluster the protein sequences using the Hellinger distance. Two protein data sets are used in the experiments: the first is a mixture of Influenza and Ebola virus sequences, and the second is a set of Influenza sequences. We compare two hierarchical clustering algorithms. The first uses a similarity measure based on sequence alignment (HCAWSA). The second, the proposed approach, uses a similarity measure that requires no sequence alignment (HCAWOSA). The experimental results show that the proposed methodology is feasible and achieves good accuracy.
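The alignment-free pipeline described in the abstract (PDF descriptor, then Hellinger distance, then hierarchical clustering) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the abstract does not define the exact PDF descriptor, so the hypothetical `composition_pdf` simply uses the relative frequency of the 20 standard amino acids as a stand-in, and the choice of average linkage is likewise an assumption. The distance itself is the standard discrete Hellinger distance, H(P, Q) = (1/√2) · sqrt(Σ_i (√p_i − √q_i)²), which ranges from 0 (identical distributions) to 1 (disjoint supports). All function names are hypothetical.

```python
import math
from itertools import combinations

# Assumption: the 20 standard amino acids define the support of each PDF.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_pdf(seq):
    """Hypothetical descriptor: relative frequency of each standard amino acid.

    Stand-in for the paper's PDF descriptor, whose exact form is not given here.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for ch in seq.upper():
        if ch in counts:  # non-standard symbols (e.g. X, B) are ignored
            counts[ch] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def hellinger(p, q):
    """Discrete Hellinger distance between two probability vectors, in [0, 1]."""
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s) / math.sqrt(2)


def average_linkage(seqs, k):
    """Agglomerative clustering (average linkage, an assumed choice) down to k clusters.

    Returns clusters as lists of indices into `seqs`. No alignment is performed:
    sequences are compared only through their PDF descriptors.
    """
    pdfs = [composition_pdf(s) for s in seqs]
    n = len(pdfs)
    dist = [[hellinger(pdfs[i], pdfs[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        # Find the pair of clusters with the smallest mean pairwise distance.
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters
```

For example, `average_linkage(["AAAAGGGG", "AAAGGGGG", "WWWWYYYY", "WWWYYYYY"], 2)` returns `[[0, 1], [2, 3]]`: the two A/G-rich toy sequences share a cluster, as do the two W/Y-rich ones, because sequences with disjoint compositions are at the maximum Hellinger distance of 1. Replacing the composition vector with a richer descriptor (for example, k-mer frequencies) changes only `composition_pdf`; the distance and clustering steps stay the same.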

Digital Object Identifier (DOI)

http://dx.doi.org/10.18576/amis/100432

Recommended Citation

Abdel-Azim, Gamil (2016) "New Hierarchical Clustering Algorithm for Protein Sequences Based on Hellinger Distance," Applied Mathematics & Information Sciences: Vol. 10: Iss. 4, Article 32.
DOI: http://dx.doi.org/10.18576/amis/100432
Available at: https://digitalcommons.aaru.edu.jo/amis/vol10/iss4/32

Applied Mathematics & Information Sciences

New Hierarchical Clustering Algorithm for Protein Sequences Based on Hellinger Distance
