Journal of Statistics Applications & Probability
Abstract
We propose an iterative graphical data visualisation algorithm for optimal model selection. The algorithm is implemented on three domain-partitioning techniques: decision trees, neural networks and support vector machines. Each model is trained and tested on the Pima Indians and Bupa Liver Disorders datasets, with performance assessed in a multi-step process. Firstly, conventional ROC curves and the Youden Index are applied to determine the optimal model. Then, sequential moving differences involving the fitted parameters (true and false positives) are extracted, and their respective probability density estimates are used to track their variability using the proposed algorithm. The algorithm allows the use of data-dependent density bandwidths as tuning parameters in determining class separation across applications. Our results suggest that this novel approach yields robust predictions and minimises data obscurity and over-fitting. The algorithm's simple mechanics, which derive from the standard confusion matrix, together with its built-in graphical data visualisation and adaptive bandwidth features, make it multidisciplinary compliant and easily comprehensible to non-specialists. The paper's main outcomes are two-fold. Firstly, it combines the power of domain-partitioning techniques on Bayesian foundations with graphical data visualisation to provide a dynamic, discernible and comprehensible information representation. Secondly, it demonstrates that by converting mathematical formulations into visual objects, multi-disciplinary teams can jointly enhance the knowledge of concepts and positively contribute towards global consistency in the data-based characterisation of various phenomena across disciplines.
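The assessment steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not specify in detail. It assumes that "sequential moving differences" means successive differences of the true-positive rate across thresholds, and it uses a plain Gaussian kernel density estimate with a hand-picked bandwidth. The toy scores, labels, thresholds and all function names are hypothetical.

```python
import math

def roc_points(scores, labels, thresholds):
    """Return (threshold, TPR, FPR) per threshold; label 1 marks the positive class."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, tp / pos, fp / neg))
    return points

def youden_optimal(points):
    """Pick the ROC point maximising the Youden Index J = TPR - FPR."""
    return max(points, key=lambda p: p[1] - p[2])

def successive_differences(values):
    """Differences between consecutive values (assumed reading of 'moving differences')."""
    return [b - a for a, b in zip(values, values[1:])]

def gaussian_kde(sample, bandwidth):
    """Return a Gaussian kernel density estimate; bandwidth is the tuning parameter."""
    norm = len(sample) * bandwidth * math.sqrt(2 * math.pi)
    def density(x):
        return sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample) / norm
    return density

# Hypothetical toy classifier scores and true labels (not from the paper's datasets).
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.6]
labels = [0, 0, 1, 1, 1, 0, 1, 0]
thresholds = [0.2, 0.3, 0.5, 0.65, 0.75]

points = roc_points(scores, labels, thresholds)
best = youden_optimal(points)                       # (threshold, TPR, FPR) with largest J
tpr_diffs = successive_differences([p[1] for p in points])
density = gaussian_kde(tpr_diffs, bandwidth=0.1)    # bandwidth value chosen arbitrarily
```

Evaluating `density` on a grid for several bandwidths and plotting the curves would give the kind of visual comparison the abstract describes; a data-dependent bandwidth rule could replace the fixed value used here.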
Digital Object Identifier (DOI)
http://dx.doi.org/10.12785/jsap/020312
Recommended Citation
Mwitondi, Kassim S. and Said, Raed A. T. (2013) "A Data-based Method for Harmonising Heterogeneous Data Modelling Techniques Across Data Mining Applications," Journal of Statistics Applications & Probability: Vol. 2: Iss. 3, Article 12.
DOI: http://dx.doi.org/10.12785/jsap/020312
Available at: https://digitalcommons.aaru.edu.jo/jsap/vol2/iss3/12