Raúl Ramos-Pollán, Miguel Angel Guevara López, Eugénio Oliveira
Journal of Medical Systems: 1-13. 2011
This paper describes the BiomedTK software framework, created to perform massive explorations of machine learning classifier configurations for biomedical data analysis over distributed Grid computing resources. BiomedTK integrates ROC analysis throughout the complete classifier construction process and enables explorations of large parameter sweeps for training third-party classifiers, such as artificial neural networks and support vector machines, offering the capability to harness the vast computing power provided by Grid infrastructures.
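The combination of a parameter sweep with ROC-based evaluation can be illustrated with a minimal sketch. This is not BiomedTK's implementation or API. The names `auc`, `expand_sweep`, `explore`, and `toy_train_and_score`, as well as the parameter names `hidden` and `lr`, are hypothetical. The sketch assumes each configuration yields one score per test sample and ranks configurations by ROC AUC, which it computes with the Mann-Whitney statistic. The toy scorer stands in for a real classifier such as a neural network or SVM, and distribution of jobs over Grid resources is omitted.

```python
from itertools import product


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic: the probability that a randomly
    chosen positive sample scores higher than a randomly chosen negative one
    (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative examples")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def expand_sweep(grid):
    """Expand a dict of parameter value lists into one dict per configuration
    (the Cartesian product of all value lists)."""
    keys = sorted(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def explore(grid, train_and_score, labels):
    """Evaluate every configuration and rank them by AUC, best first.
    Each call to train_and_score is independent, so on a Grid it could be
    submitted as a separate job (not shown here)."""
    results = []
    for config in expand_sweep(grid):
        scores = train_and_score(config)
        results.append((auc(scores, labels), config))
    results.sort(key=lambda r: r[0], reverse=True)  # stable: ties keep sweep order
    return results


# Hypothetical toy data and scorer standing in for a real trained classifier.
labels = [0, 0, 1, 1]


def toy_train_and_score(config):
    # Pretend that more hidden units separate the classes better.
    second_negative = 0.4 if config["hidden"] > 4 else 0.6
    return [0.5, second_negative, 0.55, 0.9]


results = explore({"hidden": [2, 8], "lr": [0.1, 0.01]}, toy_train_and_score, labels)
for score, config in results:
    print(f"AUC={score:.2f} {config}")
```

Ranking by AUC rather than by accuracy at a single threshold follows the abstract's emphasis on ROC analysis. Because every configuration is scored independently, the loop in `explore` is the natural unit to parallelize across computing resources.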
In addition, it includes classifiers modified by the authors for ROC optimization, as well as functionality to build ensemble classifiers and to manipulate datasets (import/export, extract and transform data, etc.). BiomedTK was experimentally validated by training thousands of classifier configurations on representative biomedical UCI datasets, and in little time it reached classification levels comparable to those reported in the existing literature. The comprehensive method presented herein represents an improvement to biomedical data analysis in both methodology and the potential reach of machine-learning-based experimentation.