We compared the prediction accuracy of the best-ranked projections found by VizRank with that of four standard machine learning methods: support vector machines (SVM, with a linear kernel), k-nearest neighbors (k-NN, with k set to the square root of the number of training instances), the naive Bayesian classifier, and decision trees (Quinlan's C4.5 implementation, used with default parameters). Predictive accuracy was assessed on six cancer gene expression data sets using bootstrap resampling repeated 100 times, as recommended by Braga-Neto and Dougherty (2003). The final performance scores were computed using the 0.632 bootstrap estimator, as suggested in the same reference. The resulting average classification accuracies and areas under the ROC curve (AUC), with their respective standard deviations, are shown in the following tables.
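The 0.632 bootstrap estimator mentioned above can be illustrated with a minimal sketch. It is not the script used to produce the reported results: the nearest-centroid classifier, the toy data, and all function names (`bootstrap_632`, `fit_centroid`, `predict_centroid`) are hypothetical and chosen only to keep the example self-contained. The sketch assumes the usual form of the estimator, which weights the resubstitution score by 0.368 and the average out-of-bag score by 0.632.

```python
import random


def fit_centroid(X, y):
    """Hypothetical stand-in classifier: store the mean vector of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_centroid(model, x):
    """Predict the class whose centroid is closest (squared Euclidean distance)."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))


def accuracy(model, predict, X, y):
    hits = sum(predict(model, x) == label for x, label in zip(X, y))
    return hits / len(y)


def bootstrap_632(X, y, fit, predict, n_boot=100, seed=0):
    """0.632 bootstrap estimate of classification accuracy."""
    rng = random.Random(seed)
    n = len(y)
    # Resubstitution score: train and test on the complete data set.
    acc_resub = accuracy(fit(X, y), predict, X, y)
    oob_scores = []
    for _ in range(n_boot):
        # Draw n instances with replacement; the rest are out-of-bag.
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = set(idx)
        oob = [i for i in range(n) if i not in chosen]
        if not oob:  # extremely unlikely, but nothing to test on
            continue
        model = fit([X[i] for i in idx], [y[i] for i in idx])
        oob_scores.append(
            accuracy(model, predict, [X[i] for i in oob], [y[i] for i in oob])
        )
    acc_oob = sum(oob_scores) / len(oob_scores)
    return 0.368 * acc_resub + 0.632 * acc_oob


# Toy two-class data, invented for illustration only.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2],
     [2.0, 2.1], [2.2, 1.9], [1.9, 2.3], [2.1, 2.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
estimate = bootstrap_632(X, y, fit_centroid, predict_centroid, n_boot=100)
print(f"0.632 bootstrap accuracy: {estimate:.3f}")
```

Because accuracy is one minus the error rate, applying the 0.368/0.632 weights to accuracies is equivalent to applying them to errors. Any classifier exposing a fit/predict pair could be passed in place of the nearest-centroid stand-in.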
Classification accuracy (%, mean ± standard deviation)

| Data set | VizRank | SVM | k-NN | Naive Bayes | Decision trees |
|---|---|---|---|---|---|
| Leukemia | 96.40 ± 4.33 | 97.57 ± 3.71 | 92.72 ± 6.74 | 84.34 ± 10.33 | 90.46 ± 5.52 |
| DLBCL | 93.03 ± 5.67 | 97.85 ± 3.26 | 88.60 ± 6.29 | 83.76 ± 8.64 | 85.46 ± 9.10 |
| Prostate | 94.00 ± 4.53 | 93.47 ± 4.60 | 84.51 ± 6.58 | 81.10 ± 9.00 | 85.47 ± 7.77 |
| MLL | 95.00 ± 5.12 | 97.32 ± 3.21 | 89.65 ± 6.37 | 75.20 ± 9.67 | 88.31 ± 9.16 |
| SRBCT | 96.39 ± 5.01 | 99.42 ± 2.35 | 86.29 ± 6.96 | 75.31 ± 10.58 | 87.32 ± 8.00 |
| Lung cancer | 92.72 ± 3.40 | 94.67 ± 3.16 | 90.35 ± 3.44 | 75.28 ± 5.18 | 91.21 ± 5.09 |
| Ranks | 1.83 | 1.17 | 3.50 | 5.00 | 3.50 |

AUC (area under the ROC curve, mean ± standard deviation)

| Data set | VizRank | SVM | k-NN | Naive Bayes | Decision trees |
|---|---|---|---|---|---|
| Leukemia | 0.976 ± 0.040 | 0.997 ± 0.011 | 0.969 ± 0.049 | 0.819 ± 0.127 | 0.903 ± 0.069 |
| DLBCL | 0.946 ± 0.077 | 0.997 ± 0.010 | 0.925 ± 0.058 | 0.736 ± 0.095 | 0.818 ± 0.118 |
| Prostate | 0.961 ± 0.036 | 0.973 ± 0.026 | 0.912 ± 0.051 | 0.835 ± 0.091 | 0.870 ± 0.075 |
| MLL | 0.981 ± 0.030 | 0.998 ± 0.005 | 0.983 ± 0.020 | 0.860 ± 0.073 | 0.938 ± 0.059 |
| SRBCT | 0.989 ± 0.025 | 1.000 ± 0.001 | 0.978 ± 0.030 | 0.879 ± 0.076 | 0.942 ± 0.045 |
| Lung cancer | 0.969 ± 0.036 | 0.995 ± 0.007 | 0.974 ± 0.027 | 0.753 ± 0.054 | 0.935 ± 0.051 |
| Ranks | 2.33 | 1.00 | 2.67 | 5.00 | 4.00 |
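
The page does not say how the values labeled "Ranks" were obtained. One reading that reproduces them is the mean, over the six data sets, of each method's rank within a data set (rank 1 for the highest score). The sketch below works under that assumption; the mean scores are copied from the results above, and the helper names `ranks_desc` and `mean_ranks` are hypothetical.

```python
METHODS = ["VizRank", "SVM", "k-NN", "Naive Bayes", "Decision trees"]

# Mean scores per data set, in the order given by METHODS.
ACCURACY = {
    "Leukemia":    [96.40, 97.57, 92.72, 84.34, 90.46],
    "DLBCL":       [93.03, 97.85, 88.60, 83.76, 85.46],
    "Prostate":    [94.00, 93.47, 84.51, 81.10, 85.47],
    "MLL":         [95.00, 97.32, 89.65, 75.20, 88.31],
    "SRBCT":       [96.39, 99.42, 86.29, 75.31, 87.32],
    "Lung cancer": [92.72, 94.67, 90.35, 75.28, 91.21],
}
AUC = {
    "Leukemia":    [0.976, 0.997, 0.969, 0.819, 0.903],
    "DLBCL":       [0.946, 0.997, 0.925, 0.736, 0.818],
    "Prostate":    [0.961, 0.973, 0.912, 0.835, 0.870],
    "MLL":         [0.981, 0.998, 0.983, 0.860, 0.938],
    "SRBCT":       [0.989, 1.000, 0.978, 0.879, 0.942],
    "Lung cancer": [0.969, 0.995, 0.974, 0.753, 0.935],
}


def ranks_desc(scores):
    """Rank scores so the highest gets rank 1; tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def mean_ranks(table):
    """Average each method's per-data-set rank over all data sets."""
    per_set = [ranks_desc(row) for row in table.values()]
    return [round(sum(col) / len(col), 2) for col in zip(*per_set)]


for name, table in (("Classification accuracy", ACCURACY), ("AUC", AUC)):
    print(name, dict(zip(METHODS, mean_ranks(table))))
```

Under this assumption the computed mean ranks match the reported ones for both measures, with SVM ranked first on average and VizRank second.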
To download the scripts and data sets used to obtain these results click here.