Introduction

http://apps.fao.org

Fig. 1

http://www.sgn.cornell.edu/

Materials and methods

Library construction

Source of tissues

Table 1 Characteristics of the 5 cDNA libraries used to develop the coffee EST database

| Library name | Tissue | Varieties | Average insert size, kb | Good quality ESTs |
|---|---|---|---|---|
| Leaf | Leaves, young | BP409 | 1.5±0.6 | 8,942 |
| Pericarp | Pericarp, all developmental stages | BP358, BP409, BP42, BP961, Q121 | 1.4±0.5 | 8,956 |
| Early stage cherry | Whole cherries, 18 and 22 weeks after pollination | BP358, BP409, BP42, Q121 | 1.4±0.3 | 9,843 |
| Middle stage seed | Endosperm and perisperm of seeds, 30 weeks after pollination | BP409, BP961, Q121 | 1.4±0.3 | 10,077 |
| Late stage seed | Endosperm and perisperm of seeds, 42 and 46 weeks after pollination | BP358, BP409, BP42, BP961, Q121 | 1.4±0.3 | 9,096 |

RNA and mRNA isolation

cDNA libraries

Sequencing

http://www.brc.cornell.edu

Sequence quality processing

Unigene assembly

http://www.sgn.cornell.edu

Annotation

Protein prediction

http://www.ebi.ac.uk/embl

Table 2 [...] Materials and methods

| | Tomato | Coffee |
|---|---|---|
| Total unigenes | 30,576 | 13,175 |
| Average unigene length, bp | 774 | 678 |
| Unigenes with coding regions | 96% | 95% |
| Average length (bp) of predicted peptides | 569 | 556 |
| Average ESTScan score | 409 | 346 |

BLAST matches between coffee unigenes and other sequence databases

http://www.ncbi.nlm.nih.gov

http://www.arabidopsis.org

ftp://ftp.arabidopsis.org/home/tair/

Solanum lycopersicum, Solanum tuberosum, Capsicum annuum, Petunia hybrida, Solanum melongena

http://www.sgn.cornell.edu

Functional annotation based on predicted peptides

http://www.geneontology.org

http://www.ebi.ac.uk/interpro

Functional categorization based on gene ontology

http://www.geneontology.org

Gene family analysis

Results and discussion

Generation of coffee EST database and unigene set

http://www.sgn.cornell.edu

Fig. 2

Differentiating between paralogs and alleles

http://www.arabidopsis.org

Fig. 3

Functional annotation of coffee EST-derived unigenes

Predicted coffee proteins

Protein domain annotation

Table 3 [...] Arabidopsis

| InterPro accession | Description | Coffee, % of unigenes | Tomato, % of unigenes (ranking) | Arabidopsis, % of unigenes (ranking) |
|---|---|---|---|---|
| IPR000719 | Protein kinase | 1.6 | 1.20 (1) | 3.0 (1) |
| IPR000694 | Proline-rich region | 1.3 | 0.91 (4) | 0.003 (1763) |
| IPR002290 | Serine/threonine protein kinase | 0.85 | 1.10 (2) | 0 |
| IPR001245 | Tyrosine protein kinase | 0.69 | 1.0 (3) | 0.15 (311) |
| IPR008271 | Serine/threonine protein kinase, active site | 0.61 | 0.68 (5) | 2.6 (2) |
| IPR000504 | RNA-binding region RNP-1 (RNA recognition motif) | 0.55 | 0.60 (6) | 0.59 (6) |
| IPR001680 | G-protein beta WD-40 repeat | 0.49 | 0.51 (8) | 0.51 (8) |
| IPR001611 | Leucine-rich repeat | 0.48 | 0.59 (7) | 0.59 (7) |
| IPR002048 | Calcium-binding EF-hand | 0.36 | 0.34 (13) | 0.34 (13) |
| IPR000379 | Esterase/lipase/thioesterase | 0.33 | 0.43 (10) | 0.43 (10) |
| IPR001806 | Ras GTPase superfamily | 0.32 | 0.26 (22) | 0.43 (70) |
| IPR003579 | Ras small GTPase, Rab type | 0.29 | 0.23 (27) | 0 |
| IPR005123 | 2OG-Fe(II) oxygenase superfamily | 0.27 | 0.26 (21) | 0.47 (52) |
| IPR000626 | Ubiquitin | 0.27 | 0.22 (32) | 0.40 (89) |
| IPR002401 | E-class P450, group I | 0.27 | 0.46 (8) | 0.77 (24) |
| IPR002347 | Glucose/ribitol dehydrogenase | 0.26 | 0.23 (28) | 0.33 (110) |
| IPR001005 | Myb DNA-binding domain | 0.26 | 0.34 (15) | 1.34 (8) |
| IPR005225 | Small GTP-binding protein domain | 0.26 | 0.24 (25) | 0.68 (27) |
| IPR000608 | Ubiquitin-conjugating enzymes | 0.26 | 0.21 (34) | 0.19 (221) |
| IPR007090 | Leucine-rich repeat, plant specific | 0.25 | 0.40 (12) | 1.07 (11) |

Gene ontology annotation

Fig. 4

In silico analysis of unigene expression

Complexity and uniqueness of different stages/tissues

Fig. 5 Characteristics of each coffee cDNA library in comparison to the entire coffee EST-derived unigene set. The total unigene and highly expressed unigene categories sum to greater than 100% since the same unigene may contain ESTs from more than one library

Differential expression of genes across stages/tissues

Table 4 [...] P

| Library | Pericarp | Early stage cherry | Middle stage seed | Late stage seed |
|---|---|---|---|---|
| Leaf | 384 | 752 | 548 | 562 |
| Pericarp | | 610 | 458 | 527 |
| Early stage cherry | | | 602 | 728 |
| Middle stage seed | | | | 585 |

Highly expressed genes

Table 5 [...] Arabidopsis

| Coffee unigene #: annotation | Best Arabidopsis match (e value/score) | Best Solanaceae match, unigene_species (e value/score) | Total ESTs | Leaf | Pericarp | Early stage cherry | Middle stage seed | Late stage seed |
|---|---|---|---|---|---|---|---|---|
| 125230: putative 2S seed storage protein | ND | 243065_tomato (e-103/238) | 1,219 | 7 | 15 | 21 | 1,037 | 139 |
| 120912: 11S seed storage protein | At5g44120 (1e-88/324) | 228376_tomato (0/802) | 687 | 0 | 3 | 28 | 244 | 412 |
| 121707: unknown function | At1g29050 (1e-139/489) | 246695_potato (e-163/283) | 324 | 2 | 3 | 1 | 149 | 169 |
| 120118: unknown function | At5g59320 (2e-21/99.8) | 221585_tomato (e-134/475) | 292 | 0 | 0 | 3 | 58 | 231 |
| 124988: unknown function | ND | ND | 204 | 58 | 84 | 55 | 1 | 6 |
| 120685: chitinase | At5g24090 (2e-43/172) | 214596_tomato (1e-35/84.5) | 202 | 99 | 40 | 58 | 0 | 5 |
| 124158: photoassimilate-responsive protein | At3g54040 (2e-36/149) | 196924_pepper (2e-39/138) | 182 | 1 | 1 | 2 | 150 | 28 |
| 119890: unknown function | ND | 204426_pepper (5e-07/52.8) | 183 | 0 | 0 | 0 | 183 | 0 |
| 123265: ADP-ribosylation factor | At2g47170 (1e-99/359) | 238338_tomato (0/693) | 182 | 58 | 14 | 65 | 21 | 24 |
| 124083: secretory peroxidase | At4g21960 (e-153/537) | 196145_pepper (0/681) | 161 | 55 | 5 | 19 | 49 | 33 |
| 124911: metallothionein | At5g02380 (0.32/32.3) | 207464_petunia (2e-06/51.0) | 163 | 40 | 65 | 30 | 11 | 17 |
| 119817: chitinase | At3g12500 (e-103/373) | 248120_potato (e-148/521) | 148 | 0 | 22 | 0 | 0 | 126 |
| 124815: unknown function | At3g29240 (1e-87/320) | 227940_tomato (e-146/517) | 145 | 2 | 0 | 1 | 0 | 142 |
| 122206: SAM synthase | At2g36880 (0/711) | 270415_petunia (0/887) | 142 | 1 | 9 | 130 | 0 | 2 |
| 119460: WRKY4 transcription factor | At1g80840 (3e-75/279) | 237166_tomato (e-137/487) | 123 | 0 | 0 | 123 | 0 | 0 |
| 123045: unknown function | At3g16000 (0.69/31.2) | 218824_tomato (90.36/33.1) | 123 | 81 | 17 | 22 | 3 | 0 |
| 120481: AdoMet synthase | At4g01850 (0/723) | 243236_potato (0/886) | 108 | 19 | 27 | 31 | 20 | 11 |
| 121265: Mob1/phocein | At5g45550 (e-119/425) | 196814_pepper (e-146/513) | 113 | 1 | 0 | 112 | 0 | 0 |
| 124791: plasmodesmal receptor | At5g15140 (1e-99/360) | 203764_pepper (8e-86/314) | 105 | 0 | 0 | 3 | 26 | 76 |
| 122071: rubisco small subunit | At1g67090 (9e-70/260) | 207453_petunia (3e-89/297) | 99 | 76 | 8 | 9 | 6 | 0 |

BLAST match values are given in parentheses.

Seed storage protein genes

Unigene 125230: a putative 2S seed storage protein

Unigene 120912: 11S seed storage protein

Other seed-specific genes

Early stage seed development

Middle stage seed development

Late stage seed development

Two highly expressed genes with homology to chitinase

Highly expressed genes unique to coffee

Unigene 124988

Unigene 119890

Gene families unique or significantly expanded in coffee

Table 6 [...] Arabidopsis

| Family # | # Arabidopsis | # Coffee family members | Longest coffee member | Annotation |
|---|---|---|---|---|
| 266 | 1 | 21 | 122330 | Retrotransposon gag protein, class I |
| 180 | 5 | 14 | 124952 | Polygalacturonase isoenzyme 1 beta subunit with BURP domain |
| 632 | 1 | 12 | 123451 | Acidic endochitinase |
| 386 | 2 | 10 | 124158 | Photoassimilate-responsive protein |
| 382 | 4 | 8 | 119672 | Hypersensitive-induced protein, band 7 protein |
| 394 | 2 | 7 | 122791 | E-class P450 |
| 483 | 2 | 6 | 120054 | Bet v I allergen |
| 623 | 3 | 6 | 119581 | Root hair defective protein |
| 1,182 | 1 | 5 | 126674 | Unknown function |
| 695 | 2 | 5 | 126974 | Tyrosine decarboxylase |
| 783 | 2 | 5 | 122423 | Unknown function |
| 1,117 | 2 | 5 | 119449 | Trypsin inhibitor Kunitz |

Table 7 [...] Arabidopsis

| Gene family # | # Family members | Longest member | Solanaceae hit | Annotation |
|---|---|---|---|---|
| 243 | 27 | 122956 | 258190 potato | Retrotransposon gag protein, class II |
| 687 | 11 | 120121 | 221585 tomato | Thaumatin, pathogenesis related |
| 965 | 10 | 119718 | 249401 potato | Zn-finger, CCHC type |
| 974 | 10 | 120244 | 2610402 potato | Disease resistance protein (TIR-NBS-LRR class) |
| 852 | 9 | 119638 | 225732 tomato | Retrotransposon gag protein, class III |
| 360 | 8 | 121998 | 23671 tomato | Disease resistance protein |
| 1,019 | 7 | 124574 | 222350 tomato | Leucine-rich repeat, disease resistance protein |
| 1,607 | 7 | 122216 | none | Unknown function |
| 1,610 | 7 | 130519 | none | Unknown function |
| 1,676 | 7 | 126264 | 243065 tomato | Unknown function |
| 708 | 6 | 123769 | 236157 tomato | ABA/WDS induced protein |
| 1,852 | 5 | 120284 | 213688 tomato | Proline-rich region, extensin-like protein |
| 2,362 | 5 | 122218 | 237314 tomato | Unknown function |
| 2,459 | 5 | 124466 | 267984 potato | Leucine-rich repeat, plant specific, receptor-related protein kinase |

Coffee-expanded gene families

Coffee-unique gene families

Fig. 6

Table 8 [...] Arabidopsis

| Coffee unigene | Solanaceae EST-derived unigene match | Score | GenBank (non-redundant and dbEST) best match | Score | Annotation |
|---|---|---|---|---|---|
| 124978 | 240871 tomato | 454 | | | Unknown function |
| 121324 | 235756 tomato | 429 | Brassica | 44 | Unknown function |
| 131820 | 213100 tomato | 426 | Oryza | 73 | Unknown function |
| 121542 | 240321 tomato | 416 | Oryza | 75 | Unknown function |
| 131934 | 219759 tomato | 377 | Oryza | 297 | TFIIH basal transcription factor p52 subunit |
| 121140 | 236347 tomato | 320 | | | Unknown function |
| 131445 | 225435 tomato | 320 | | | Unknown function |
| 125230 | 243065 tomato | 238 | Sesame | 45 | 2S albumin |
| 131030 | 246364 potato | 213 | Drosophila | 110 | Phosphatidyl inositol transfer protein |
| 120120 | 237254 tomato | 202 | gbICF349465.1 [Rose] | 52 | Unknown function |
| 126635 | 237314 tomato | 185 | | | Unknown function |
| 126575 | 237314 tomato | 182 | | | Unknown function |
| 130675 | 209387 petunia | 177 | Populus | 438 | Unknown function |
| 128020 | 237314 tomato | 167 | | | Unknown function |
| 123615 | 249253 potato | 163 | Oryza | 140 | Unknown function |
| 126432 | 240551 tomato | 163 | Rattus | 56 | Phosphatidylinositolglycan class N |
| 124384 | 197378 pepper | 156 | Vitis | 1,009 | Unknown function |
| 122126 | 239632 tomato | 153 | | | Unknown function |
| 131601 | 232010 tomato | 145 | Macaca | 74 | 40S Ribosomal protein S21 |
| 119644 | 237150 tomato | 143 | Oryza | 70 | Helicase |

The GenBank best matches exclude those from Solanaceae, Coffea and Hedyotis (both members of the Rubiaceae family).
http://www.sgn.cornell.edu/help/about/tomato_sequencing.html

Fig. 7

Conclusions