This page contains supplemental material for the following paper submitted to Bioinformatics:
Visualization-based cancer microarray data classification analysis
Minca Mramor, Gregor Leban, Janez Demšar, and Blaž Zupan
Abstract:
Motivation: Methods for analyzing cancer microarray data often face two distinct challenges: the models they infer need to perform well when classifying new tissue samples, while at the same time providing insight into the patterns and gene interactions hidden in the data. State-of-the-art supervised data mining methods often cover only one of these aspects well, motivating the development of methods whose predictive models combine solid classification performance with being easy to communicate to the domain expert.
Results: Data visualization can provide an excellent approach to knowledge discovery and analysis of class-labeled data. We have previously developed an approach called VizRank that can score and rank point-based visualizations according to the degree of separation between data instances of different classes. Here we extend VizRank with techniques to uncover outliers, score features (genes) and perform classification, and demonstrate that the proposed approach is well suited for cancer microarray analysis. Using VizRank and radviz visualization on a set of previously published cancer microarray data sets, we were able to find simple, interpretable data projections that include only a small subset of genes yet clearly differentiate among different cancer types. We also report that our approach to classification through visualization achieves performance comparable to state-of-the-art supervised data mining techniques.
We provide supplemental information on the following topics:
The methods reported in this paper were implemented in the Orange data mining suite; installation packages are available from the Orange installation pages.
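
To make the idea in the abstract concrete, the following is a minimal, self-contained Python sketch of scoring and ranking radviz projections. It is not the Orange implementation and does not use the Orange API. All function names (normalize_columns, radviz_project, knn_separation_score, rank_projections) are hypothetical. The scoring function is an assumption chosen for illustration: leave-one-out k-nearest-neighbour accuracy in the projected plane stands in for "degree of separation", and the actual VizRank score may be computed differently. The sketch also tries only one anchor ordering per gene subset, whereas the order of anchors affects a radviz projection.

```python
import math
from itertools import combinations


def normalize_columns(rows):
    """Scale every feature (column) to [0, 1]; constant columns become 0."""
    scaled_cols = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled_cols.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(r) for r in zip(*scaled_cols)]


def radviz_project(rows):
    """Radviz: place one anchor per feature evenly on the unit circle and
    put each instance at the average of the anchors, weighted by its
    (already normalized) feature values."""
    m = len(rows[0])
    anchors = [(math.cos(2 * math.pi * j / m), math.sin(2 * math.pi * j / m))
               for j in range(m)]
    points = []
    for row in rows:
        total = sum(row)
        if total == 0:
            points.append((0.0, 0.0))  # all-zero instance sits in the centre
            continue
        x = sum(v * a[0] for v, a in zip(row, anchors)) / total
        y = sum(v * a[1] for v, a in zip(row, anchors)) / total
        points.append((x, y))
    return points


def knn_separation_score(points, labels, k=3):
    """Assumed stand-in score: leave-one-out k-NN accuracy on 2D points."""
    correct = 0
    for i, (px, py) in enumerate(points):
        neighbours = sorted(
            ((math.hypot(px - qx, py - qy), labels[j])
             for j, (qx, qy) in enumerate(points) if j != i),
            key=lambda t: t[0],
        )[:k]
        votes = {}
        for _, lab in neighbours:
            votes[lab] = votes.get(lab, 0) + 1
        if max(votes, key=votes.get) == labels[i]:
            correct += 1
    return correct / len(points)


def rank_projections(rows, labels, n_features=3, k=3):
    """Score every n_features-sized feature subset, best first.
    Exhaustive search is only feasible for a handful of features."""
    scaled = normalize_columns(rows)
    results = []
    for subset in combinations(range(len(rows[0])), n_features):
        sub = [[r[j] for j in subset] for r in scaled]
        score = knn_separation_score(radviz_project(sub), labels, k)
        results.append((score, subset))
    results.sort(key=lambda t: -t[0])
    return results
```

Calling rank_projections(rows, labels) on a small matrix of expression values (one row per tissue sample, one column per gene) returns (score, gene-index subset) pairs sorted from the best-separated projection downward. On real microarray data with thousands of genes, exhaustive enumeration is impractical, so some heuristic for selecting candidate gene subsets would be needed.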