Prediction accuracy of top ranked projections

We compared the prediction accuracy of the best-ranked projections found by VizRank with that of four standard machine learning methods: support vector machines (SVM, with a linear kernel), k-nearest neighbors (k-NN, with k set to the square root of the number of training instances), the naive Bayesian classifier, and decision trees (Quinlan's C4.5 implementation, used with default parameters). Predictive accuracy was assessed on six cancer gene expression data sets using bootstrap resampling repeated 100 times, as recommended by Braga-Neto and Dougherty (2003). The final performance scores were computed with the 0.632 bootstrap estimator, as suggested in the same reference. The tables below report the resulting average classification accuracy (in percent) and area under the ROC curve (AUC), each with its standard deviation. The "Ranks" row gives each method's rank averaged over the six data sets (1 = best).
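For readers unfamiliar with the 0.632 bootstrap estimator, the following is a minimal sketch of the idea, not the scripts used for the results reported here. It assumes the common formulation in which the final score is 0.368 times the resubstitution accuracy plus 0.632 times the mean accuracy on the instances left out of each bootstrap sample. The nearest-centroid classifier and all function names are hypothetical stand-ins chosen only to keep the example self-contained.

```python
import random

def nearest_centroid_fit(X, y):
    """Toy stand-in classifier (an assumption): one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def nearest_centroid_predict(model, row):
    """Return the class whose centroid is closest to `row`."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(model[c], row))
    return min(model, key=dist2)

def accuracy(model, X, y):
    hits = sum(nearest_centroid_predict(model, r) == t for r, t in zip(X, y))
    return hits / len(y)

def bootstrap_632(X, y, n_rounds=100, seed=0):
    """0.632 bootstrap estimate of classification accuracy (a sketch)."""
    rng = random.Random(seed)
    n = len(y)
    # Resubstitution accuracy: train and test on the full data set.
    resub = accuracy(nearest_centroid_fit(X, y), X, y)
    oob_scores = []
    for _ in range(n_rounds):
        # Draw n instances with replacement; the rest are "out of bag".
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = set(idx)
        oob = [i for i in range(n) if i not in chosen]
        if not oob:
            continue  # no held-out instances in this round
        model = nearest_centroid_fit([X[i] for i in idx], [y[i] for i in idx])
        oob_scores.append(
            accuracy(model, [X[i] for i in oob], [y[i] for i in oob])
        )
    if not oob_scores:
        raise ValueError("no bootstrap round produced out-of-bag instances")
    oob_mean = sum(oob_scores) / len(oob_scores)
    # Weighted combination of the optimistic and pessimistic estimates.
    return 0.368 * resub + 0.632 * oob_mean
```

Calling `bootstrap_632(X, y, n_rounds=100)` with a list of feature vectors `X` and class labels `y` returns a single accuracy estimate between 0 and 1. Averaging the out-of-bag accuracy per round, as done here, is one common variant; other formulations pool the out-of-bag predictions across rounds instead.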

Classification accuracy

Data set      VizRank         SVM             k-NN            Naive Bayes     Decision trees
Leukemia      96.40 ± 4.33    97.57 ± 3.71    92.72 ± 6.74    84.34 ± 10.33   90.46 ± 5.52
DLBCL         93.03 ± 5.67    97.85 ± 3.26    88.60 ± 6.29    83.76 ± 8.64    85.46 ± 9.10
Prostate      94.00 ± 4.53    93.47 ± 4.60    84.51 ± 6.58    81.10 ± 9.00    85.47 ± 7.77
MLL           95.00 ± 5.12    97.32 ± 3.21    89.65 ± 6.37    75.20 ± 9.67    88.31 ± 9.16
SRBCT         96.39 ± 5.01    99.42 ± 2.35    86.29 ± 6.96    75.31 ± 10.58   87.32 ± 8.00
Lung cancer   92.72 ± 3.40    94.67 ± 3.16    90.35 ± 3.44    75.28 ± 5.18    91.21 ± 5.09
Ranks         1.83            1.17            3.50            5.00            3.50

AUC (Area under ROC)

Data set      VizRank         SVM             k-NN            Naive Bayes     Decision trees
Leukemia      0.976 ± 0.040   0.997 ± 0.011   0.969 ± 0.049   0.819 ± 0.127   0.903 ± 0.069
DLBCL         0.946 ± 0.077   0.997 ± 0.010   0.925 ± 0.058   0.736 ± 0.095   0.818 ± 0.118
Prostate      0.961 ± 0.036   0.973 ± 0.026   0.912 ± 0.051   0.835 ± 0.091   0.870 ± 0.075
MLL           0.981 ± 0.030   0.998 ± 0.005   0.983 ± 0.020   0.860 ± 0.073   0.938 ± 0.059
SRBCT         0.989 ± 0.025   1.000 ± 0.001   0.978 ± 0.030   0.879 ± 0.076   0.942 ± 0.045
Lung cancer   0.969 ± 0.036   0.995 ± 0.007   0.974 ± 0.027   0.753 ± 0.054   0.935 ± 0.051
Ranks         2.33            1.00            2.67            5.00            4.00
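
The "Ranks" rows can be reproduced by ranking the five methods within each data set (1 = highest score) and averaging those ranks over the six data sets. The sketch below does this for the classification accuracy table; the same function applies unchanged to the AUC values. The names `rank_row` and `average_ranks` are illustrative, and sharing the mean rank among tied scores is an assumption (no ties occur in these tables).

```python
METHODS = ["VizRank", "SVM", "k-NN", "Naive Bayes", "Decision trees"]

# Mean classification accuracies copied from the table above.
ACCURACY = {
    "Leukemia":    [96.40, 97.57, 92.72, 84.34, 90.46],
    "DLBCL":       [93.03, 97.85, 88.60, 83.76, 85.46],
    "Prostate":    [94.00, 93.47, 84.51, 81.10, 85.47],
    "MLL":         [95.00, 97.32, 89.65, 75.20, 88.31],
    "SRBCT":       [96.39, 99.42, 86.29, 75.31, 87.32],
    "Lung cancer": [92.72, 94.67, 90.35, 75.28, 91.21],
}

def rank_row(scores):
    """Rank scores so the highest gets rank 1; ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of scores equal to the one at `pos`.
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

def average_ranks(table):
    """Average each method's per-data-set rank over all data sets."""
    rows = [rank_row(scores) for scores in table.values()]
    return [sum(col) / len(rows) for col in zip(*rows)]

if __name__ == "__main__":
    for method, rank in zip(METHODS, average_ranks(ACCURACY)):
        print(f"{method}: {rank:.2f}")
```

Rounded to two decimals, the averages for the accuracy table agree with the "Ranks" row reported there.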

To download the scripts and data sets used to obtain these results click here.
