
Data set name: leukemia


Original data set (Golub et al.)
Data set for Orange
Brief description:
This classification model is built on what is probably the most famous cancer gene-expression data set (Golub et al.), which contains gene-expression measurements for samples from human acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL). The original study was among the first to demonstrate cancer classification based on gene-expression monitoring with DNA microarrays.

Platform: Affymetrix HuGeneFL array

Diagnostic classes:
- acute lymphoblastic leukemia (ALL): 47 examples (65.3%)
- acute myeloid leukemia (AML): 25 examples (34.7%)
Number of genes: 5147
Number of samples: 72
Note: Of the 6817 probe sets originally measured, we kept only those called present (P) in at least one sample and removed the rest.
Predictive accuracy with 10-fold cross-validation (classification using the best projection with eight attributes):
Classification accuracy: 98.57%
Area under curve (AUC): 1.000
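
The page does not say how the 98.57% figure was aggregated. As an illustration only: if accuracy is averaged over the ten folds (an assumption, not something stated here), 72 samples split into two folds of 8 and eight folds of 7, with a single misclassified sample in one fold of 7, give exactly this value. The function name, fold layout, and error placement below are hypothetical.

```python
def mean_fold_accuracy(fold_sizes, errors_per_fold):
    """Average of per-fold accuracies (assumed aggregation, not confirmed by the source)."""
    accuracies = [(n - e) / n for n, e in zip(fold_sizes, errors_per_fold)]
    return sum(accuracies) / len(accuracies)

# Hypothetical layout: 72 samples in 10 folds -> two folds of 8, eight folds of 7.
fold_sizes = [8, 8, 7, 7, 7, 7, 7, 7, 7, 7]

# Assumption: one misclassified sample, falling in one of the folds of 7.
errors = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

accuracy = mean_fold_accuracy(fold_sizes, errors)
print(f"{accuracy * 100:.2f}%")  # prints 98.57%
```

Pooling all 72 predictions instead would give 71/72, about 98.61%, so the two aggregation schemes are distinguishable at this precision.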
The following are the three best-ranked visualizations, with eight, six, and four attributes respectively, according to the visualization score, that is, the visualizations in which examples from different diagnostic classes are best separated:

Score: 99.93%
Genes:
M11722_at: "deoxynucleotidyltransferase, terminal", DNTT
HG1612-HT1612_at: ---, ---
M92287_at: cyclin D3, CCND3
U05259_rna1_at: CD79A antigen (immunoglobulin-associated alpha), CD79A
M23197_at: CD33 antigen (gp67), CD33
X95735_at: zyxin, ZYX
M27891_at: cystatin C (amyloid angiopathy and cerebral hemorrhage), CST3
M84526_at: D component of complement (adipsin), DF
Score: 99.70%
Genes:
M11722_at: "deoxynucleotidyltransferase, terminal", DNTT
M31523_at: transcription factor 3 (E2A immunoglobulin enhancer binding factors E12/E47), TCF3
M92287_at: cyclin D3, CCND3
M84526_at: D component of complement (adipsin), DF
M23197_at: CD33 antigen (gp67), CD33
M27891_at: cystatin C (amyloid angiopathy and cerebral hemorrhage), CST3
Score: 99.05%
Genes:
M11722_at: "deoxynucleotidyltransferase, terminal", DNTT
M92287_at: cyclin D3, CCND3
M84526_at: D component of complement (adipsin), DF
M23197_at: CD33 antigen (gp67), CD33
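
The attribute ranking on this page refers to radviz visualizations. As a rough sketch of how a radviz projection places a single sample: each attribute gets an anchor evenly spaced on the unit circle, and the sample is drawn at the average of the anchors weighted by its attribute values scaled to [0, 1]. The function name and the example values are hypothetical, and this is not the Orange implementation.

```python
import math

def radviz_point(values):
    """Project one sample onto the plane.

    `values` are attribute values already scaled to [0, 1], in anchor order.
    """
    k = len(values)
    total = sum(values)
    if total == 0:
        return (0.0, 0.0)  # no pull from any anchor: centre of the circle
    # Anchor i sits at angle 2*pi*i/k on the unit circle.
    x = sum(v * math.cos(2 * math.pi * i / k) for i, v in enumerate(values)) / total
    y = sum(v * math.sin(2 * math.pi * i / k) for i, v in enumerate(values)) / total
    return (x, y)

# Made-up scaled expression values for a four-gene projection.
sample = [0.9, 0.1, 0.1, 0.1]
x, y = radviz_point(sample)
print(round(x, 3), round(y, 3))
```

A sample dominated by one gene is pulled toward that gene's anchor, while a sample with equal values on all genes lands at the centre, which is why well-chosen gene sets can separate the two diagnostic classes visually.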

Attribute ranking

The following histogram shows how often each gene appears in the top 100 radviz visualizations with eight attributes.

Genes:
M23197_at: CD33 antigen (gp67), CD33
M31523_at: transcription factor 3 (E2A immunoglobulin enhancer binding factors E12/E47), TCF3
M27891_at: cystatin C (amyloid angiopathy and cerebral hemorrhage), CST3
M92287_at: cyclin D3, CCND3
M84526_at: D component of complement (adipsin), DF
U05259_rna1_at: CD79A antigen (immunoglobulin-associated alpha), CD79A
X95735_at: zyxin, ZYX
M11722_at: "deoxynucleotidyltransferase, terminal", DNTT
J05243_at: "spectrin, alpha, non-erythrocytic 1 (alpha-fodrin)", SPTAN1
D88270_at: pre-B lymphocyte gene 1, VPREB1
HG1612-HT1612_at: ---, ---
M84371_rna1_s_at: CD19 antigen, CD19
M19507_at: myeloperoxidase, MPO
U29175_at: "SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily a, member 4", SMARCA4
M22960_at: protective protein for beta-galactosidase (galactosialidosis), PPGB
X59417_at: "proteasome (prosome, macropain) subunit, alpha type, 6", PSMA6
M55150_at: fumarylacetoacetate hydrolase (fumarylacetoacetase), FAH
M28170_at: ---, ---
M96326_rna1_at: azurocidin 1 (cationic antimicrobial protein 37), AZU1
X16546_at: "ribonuclease, RNase A family, 2 (liver, eosinophil-derived neurotoxin)", RNASE2
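
The ranking above is based on how often each gene appears among top-scoring projections. A minimal sketch of that counting step follows; it uses only the three best-ranked projections listed earlier on this page, not the top 100 eight-attribute projections (which are not reproduced here), so the counts are illustrative. Variable names are hypothetical.

```python
from collections import Counter

# Probe sets from the three best-ranked projections listed on this page.
projections = [
    ["M11722_at", "HG1612-HT1612_at", "M92287_at", "U05259_rna1_at",
     "M23197_at", "X95735_at", "M27891_at", "M84526_at"],
    ["M11722_at", "M31523_at", "M92287_at", "M84526_at",
     "M23197_at", "M27891_at"],
    ["M11722_at", "M92287_at", "M84526_at", "M23197_at"],
]

# Count in how many projections each probe set occurs.
counts = Counter(probe for projection in projections for probe in projection)

for probe, n in counts.most_common():
    print(f"{probe}: {n}")
```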