Journal of Statistics Applications & Probability

A Data-based Method for Harmonising Heterogeneous Data Modelling Techniques Across Data Mining Applications

Kassim S. Mwitondi, Sheffield Hallam University, Department of Computing, Sheffield S1 1WB, UK
Raed A. T. Said

Abstract

We propose an iterative graphical data visualisation algorithm for optimal model selection. The algorithm is implemented on three domain-partitioning techniques: decision trees, neural networks and support vector machines. Each model is trained and tested on the Pima Indians and Bupa Liver Disorders datasets, with performance assessed in a multi-step process. Firstly, the conventional ROC curves and the Youden Index are applied to determine the optimal model. Then sequential moving differences involving the fitted parameters (true and false positives) are extracted, and their respective probability density estimates are used to track their variability using the proposed algorithm. The algorithm allows the use of data-dependent density bandwidths as tuning parameters in determining class separation across applications. Our results suggest that this novel approach yields robust predictions and minimises data obscurity and over-fitting. The algorithm's simple mechanics, which derive from the standard confusion matrix, together with its built-in graphical data visualisation and adaptive bandwidth features, make it suitable for multidisciplinary use and easily comprehensible to non-specialists. The paper's main outcomes are two-fold. Firstly, it combines the power of domain-partitioning techniques on Bayesian foundations with graphical data visualisation to provide a dynamic, discernible and comprehensible information representation. Secondly, it demonstrates that by converting mathematical formulations into visual objects, multi-disciplinary teams can jointly enhance their knowledge of concepts and contribute positively towards global consistency in the data-based characterisation of various phenomena across disciplines.
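
The abstract names the ingredients of the assessment (ROC points, the Youden Index, sequential differences of true and false positive rates, and density estimates with a data-dependent bandwidth) without giving formulas. The following is only a minimal sketch of those ingredients on toy data and is not the authors' algorithm. The function names, the toy scores and labels, the use of differences between consecutive ROC points as the "moving differences", and the normal-reference bandwidth rule are all assumptions made for illustration.

```python
import math
import statistics


def roc_points(scores, labels):
    """Return (FPR, TPR, threshold) for each distinct score, highest first."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos, t))
    return points


def youden_optimum(points):
    """Pick the ROC point maximising the Youden Index J = TPR - FPR."""
    return max(points, key=lambda p: p[1] - p[0])


def moving_differences(values):
    """Differences between consecutive values (assumed reading of the abstract)."""
    return [b - a for a, b in zip(values, values[1:])]


def reference_bandwidth(sample):
    """Normal-reference rule of thumb; one possible data-dependent bandwidth."""
    return 1.06 * statistics.stdev(sample) * len(sample) ** (-1 / 5)


def gaussian_kde(sample, bandwidth):
    """Return a function evaluating a Gaussian kernel density estimate."""
    norm = len(sample) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in sample) / norm

    return density


# Hypothetical classifier scores and true labels (not from the paper's datasets).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]

points = roc_points(scores, labels)
best = youden_optimum(points)

d_tpr = moving_differences([p[1] for p in points])
d_fpr = moving_differences([p[0] for p in points])

h = reference_bandwidth(d_tpr)
tpr_density = gaussian_kde(d_tpr, h)

print("Youden-optimal threshold:", best[2])
print("bandwidth for TPR differences:", round(h, 4))
```

In a sketch like this, the bandwidth `h` is the tuning parameter: a smaller value makes the estimated density of the differences more jagged, a larger one smooths it, which is the kind of variability a graphical comparison across models would display.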

Digital Object Identifier (DOI)

http://dx.doi.org/10.12785/jsap/020312

Recommended Citation

Mwitondi, Kassim S. and Said, Raed A. T. (2013) "A Data-based Method for Harmonising Heterogeneous Data Modelling Techniques Across Data Mining Applications," Journal of Statistics Applications & Probability: Vol. 2: Iss. 3, Article 12.
DOI: http://dx.doi.org/10.12785/jsap/020312
Available at: https://digitalcommons.aaru.edu.jo/jsap/vol2/iss3/12
