Email is one of the most ubiquitous applications, used daily by millions of people worldwide. It is a powerful communication tool because it saves time and cost. However, because of social networks and advertisers, most emails contain unwanted information called spam: unsolicited commercial email, also known as junk email. Spam affects not only ordinary internet users but also companies and organizations, which lose large amounts of money to reduced productivity, wasted user time, and wasted network bandwidth. Researchers have recently presented several email spam classification techniques. A spam classifier filters spam out of the inbox and moves it to the junk email folder, automatically classifying a set of messages as spam or ham based on their contents and on social features. Eliminating spam completely is very difficult because spammers change their techniques frequently. The proposed system is an efficient technique for classifying email spam using an ensemble method. It uses gradient boosting classification, an ensemble of weak decision trees combined by weighted majority voting, together with Naive Bayes classification. The system consists of two phases, a training phase and a testing phase. Precision, recall, and accuracy are used as the performance metrics for evaluation.
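The abstract names weighted majority voting and three evaluation metrics without giving their details. The following is a minimal sketch of both ideas in plain Python, not the authors' implementation: the function names, the three base-classifier outputs, and the weights are all hypothetical, and the abstract does not say how the weights are derived.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Combine one label per base classifier into a single label.

    predictions: list of labels, one per base classifier (hypothetical inputs)
    weights: list of non-negative floats, one per base classifier (assumed given)
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # The label with the largest total weight wins; ties go to the label seen first.
    return max(totals, key=totals.get)


def evaluate(y_true, y_pred, positive="spam"):
    """Compute precision, recall, and accuracy with `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": correct / len(pairs) if pairs else 0.0,
    }


# Illustrative data: each row holds the labels that three hypothetical
# base classifiers assigned to one email.
votes_per_email = [
    ["spam", "spam", "ham"],
    ["ham", "spam", "ham"],
    ["ham", "spam", "spam"],
    ["ham", "ham", "ham"],
]
weights = [0.6, 0.25, 0.15]  # assumed weights, not taken from the paper
y_true = ["spam", "ham", "spam", "ham"]

y_pred = [weighted_majority_vote(votes, weights) for votes in votes_per_email]
metrics = evaluate(y_true, y_pred)
print(y_pred)
print(metrics)
```

In this toy data the third email is true spam, but the most heavily weighted classifier labels it ham and outvotes the other two, so recall drops while precision stays perfect. This shows why the three metrics are reported separately.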
Digital Object Identifier (DOI)
Karthika Renuka, D.; Visalakshi, P.; and Rajamohana, SP.
"An Ensembled Classifier for Email Spam Classification in Hadoop Environment,"
Applied Mathematics & Information Sciences: Vol. 11: Iss. 4, Article 19.
Available at: https://digitalcommons.aaru.edu.jo/amis/vol11/iss4/19