Introduction

In the rice genome as well, a small yet significant fraction of genes is organized in local coexpression domains. These domains predominantly consist of two, and at most four, genes that do not fall in the same functional category, irrespective of the expression platform used for the analyses. The presence of tandemly duplicated genes, shared promoter sequences, or gene distance does not fully explain the coexpression of genes in such chromosomal domains. Therefore, the regulation of local coexpression domains is postulated to occur at the level of higher-order chromatin structure. Given the similarities in the characteristics and occurrence of coexpression domains between Arabidopsis and rice, we investigated whether the genes involved showed microsynteny between the two genomes. These analyses did not identify syntenic local coexpression domains between Arabidopsis and rice.

Material and methods

Genome data

Expression data

http://mpss.udel.edu/rice

Identification of local coexpression domains

Duplicated genes

Analyses of gene orientation and gene distance

Gene distance was measured as the intergenic distance, defined as the length in nucleotides from the annotated end of one gene to the annotated start of the next gene. UTRs were included when known; otherwise, the translation start and stop sites were used. The data sets excluding the duplicated gene pairs were analyzed. For each data set, gene pairs were sorted by gene distance from short to long and analyzed in bins of 1,000 pairs, excluding the last bin if it contained fewer than 1,000 pairs.
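The sort-and-bin procedure described here can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code used in the study: the `GenePair` record, the function names, and the orientation labels (`tan`, `div`, `con`, borrowed from the row labels of Table 3) are hypothetical, and the per-bin summary assumes that each bin is characterized by its mean intergenic distance and by the coexpressed fraction within each orientation group.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class GenePair:
    """Hypothetical record for one adjacent, non-duplicated gene pair."""
    distance: int        # intergenic distance in nucleotides
    orientation: str     # assumed labels: "tan", "div" or "con"
    coexpressed: bool


def equal_size_bins(pairs, bin_size=1000):
    """Sort pairs by distance (short to long) and cut them into full bins.

    Trailing pairs that do not fill a complete bin are dropped.
    """
    ordered = sorted(pairs, key=lambda p: p.distance)
    n_full = len(ordered) // bin_size
    return [ordered[i * bin_size:(i + 1) * bin_size] for i in range(n_full)]


def summarize_bin(bin_pairs):
    """Return the mean distance and the coexpressed fraction per orientation."""
    counts = {}  # orientation -> (total pairs, coexpressed pairs)
    for pair in bin_pairs:
        total, coexp = counts.get(pair.orientation, (0, 0))
        counts[pair.orientation] = (total + 1, coexp + int(pair.coexpressed))
    return {
        "mean_distance": mean(p.distance for p in bin_pairs),
        "coexpressed_fraction": {
            orientation: coexp / total
            for orientation, (total, coexp) in counts.items()
        },
    }
```

With real data, `equal_size_bins(pairs)` would be called once per data set and `summarize_bin` once per returned bin. Fixing the number of pairs per bin, rather than the width of each distance interval, is what keeps every point in the resulting plot based on the same sample size.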
The advantage of equal-sized bins is that they avoid unequal numbers of gene pairs in different distance categories. For each 1,000-pair bin, gene distance was calculated as the average over all 1,000 pairs. For each bin and each orientation group, the fraction of coexpressed pairs relative to the total number of pairs in that group was calculated and plotted.

Functional categorization of genes

Assessing synteny between Arabidopsis and rice

Results

Local coexpression domains consist of two to four neighboring genes

Table 1 Description of rice expression data used for whole-genome local coexpression analysis
                                      MPSS      MA
Genes with expression
  Excluding overlapping genes         23,146    14,789
  Without expressed neighbor(s)       5,081     5,438
  Represented in pairs                18,065    9,351
Adjacent pairs
  Total                               12,920    6,032
  Tandemly duplicated pairs (td)      a         a
  Coexpressed                         b         b
  Total excluding td                  11,257    5,459
  Coexpressed excluding td            c         c
Coexpressed adjacent pairs
  Total                               584       320
  Tandemly duplicated pairs           d         d
Tandemly duplicated pairs
  Total                               1,663     573
  Coexpressed                         e         e

Table 2 Local coexpression domains in the rice genome
                Rice genome               Random genome (100×)
                a         b               c                P d
Pairs
  e             12,920    584 (4.52%)     408 ± 17         −17
  f             11,257    438 (3.89%)     356 ± 21         −6
  g             6,032     320 (5.30%)     301 ± 17         0.012
  h             5,459     288 (5.28%)     271 ± 16         0.014
Triplets
  MPSS + td     7,775     23 (0.30%)      8.78 ± 2.9       −5
  MPSS-td       6,831     13 (0.19%)      7.74 ± 3.0       0.025
  MA + td       2,461     5 (0.20%)       6.54 ± 2.7       n.s.
  MA-td         2,149     3 (0.14%)       5.10 ± 2.4       n.s.
Quadruplets
  MPSS + td     4,887     3 (0.06%)       0.24 ± 0.47      −3
  MPSS-td       4,318     0 (0%)          0.18 ± 0.39      n.s.
  MA + td       1,079     0 (0%)          0.14 ± 0.37      n.s.
  MA-td i       nd        nd              nd

Fig. 1 Distribution of local coexpression domains over all 12 rice chromosomes. Rectangles are schematic representations of chromosomes 1–12 from top to bottom. The numbers at the top show the scale in million bases along the chromosomes.
Each gene in a local coexpression domain is depicted with a black bar. Only MPSS data sets excluding tandemly duplicated genes are shown. The lanes in each rectangle show, in order: coexpressed pairs (first lane); coexpressed triplets (second lane); coexpressed quadruplets (third lane); and partially syntenic coexpression domains (PSCDs) between Arabidopsis and rice (fourth lane).

Orientation and distance do not solely explain the occurrence of local coexpression

Table 3 Orientation of coexpressed gene pairs
              a           b         c
MPSS
  tan-td                  5,621     239 (4.25%)
  div-td                  2,418     82 (3.39%)
  con-td                  3,218     117 (3.64%)
MA
  tan-td                  2,707     143 (5.28%)
  div-td                  1,224     72 (5.88%)
  con-td                  1,528     73 (4.78%)

Fig. 2 Panels A–F

Functional categorization of coexpressed genes

Table 4 Distribution of gene pairs over GOslim categories (Non-duplicated pairs)
                    a              b             c              P d
MPSS
  GO_func
    e               2502           100           2402           0.42
    f               365 (14.6%)    12 (12.0%)    353 (14.7%)
  GO_proc
    Covered         1366           50            1316           0.47
    sameKnCat       144 (10.5%)    7 (14%)       137 (10.4%)
  GO_comp
    Covered         383            17            366            0.60
    sameKnCat       113 (29.5%)    6 (35.3%)     107 (29.2%)
MA
  GO_func
    e               1365           83            1282           0.13
    f               177 (13.0%)    7 (8.43%)     170 (13.3%)
  GO_proc
    Covered         707            43            664            0.13
    sameKnCat       67 (9.48%)     2 (4.65%)     65 (9.79%)
  GO_comp
    Covered         202            10            192            0.24
    sameKnCat       45 (22.3%)     4 (40%)       41 (21.4%)

Microsynteny of local coexpression domains between rice and Arabidopsis

Partially syntenic local coexpression domains can occur by chance

In 34 cases, however, one gene of a coexpressed pair in one species was orthologous to at least one gene of a coexpressed pair in the other species. This corresponds to 3.6% of all (944) coexpressed pairs in Arabidopsis and 5.8% of all (584) coexpressed pairs in rice. We refer to such a case as a partially syntenic coexpression domain (PSCD).
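As a quick arithmetic check, the two PSCD percentages can be recomputed from the counts given in the text. The sketch below is illustrative only: the counts (34 cases, 944 and 584 coexpressed pairs) come from the text, while the helper name `percent_of` and the rounding to one decimal are assumptions.

```python
def percent_of(part: int, whole: int, digits: int = 1) -> float:
    """Return part/whole as a percentage rounded to `digits` decimals."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole, digits)


PSCD_CASES = 34
COEXPRESSED_PAIRS = {"Arabidopsis": 944, "rice": 584}

# Share of coexpressed pairs in each species that take part in a PSCD.
pscd_share = {
    species: percent_of(PSCD_CASES, n_pairs)
    for species, n_pairs in COEXPRESSED_PAIRS.items()
}
print(pscd_share)  # {'Arabidopsis': 3.6, 'rice': 5.8}
```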
To assess the significance of such partially syntenic domains, we evaluated all the genes in non-coexpressed pairs in Arabidopsis (15,629 pairs, including 617 duplicated pairs) and rice (12,336 pairs, including 1,517 duplicated pairs) to establish whether PSCDs are more frequent in the genome than partially syntenic non-coexpressed domains (PSNDs). We identified 4,488 PSNDs (72 due to duplicated pairs) among all non-coexpressed pairs of genes in both plant genomes. This is 28.7% of all Arabidopsis non-coexpressed pairs and 36.4% of all rice non-coexpressed pairs. The percentage of PSNDs among non-coexpressed pairs is thus 6–8 times higher than the percentage of PSCDs among coexpressed pairs. Therefore, PSCDs do not seem to occur more often than expected by chance alone.

Fig. 3 Schematic representation of the chromosomal regions covering the genes involved in a four-to-one orthology between Arabidopsis and rice. The top part of the figure shows the chromosomal region from rice (from gene locus Os07g43540.1 to gene locus Os07g43570.1). The bottom part shows the chromosomal region from Arabidopsis, representing 23 genes (from gene locus At4g23120 to At4g23340; the gene labels in the figure omit the "At4g" prefix). Black arrows represent the four Arabidopsis genes and the one rice gene involved in this orthology, and dashed curved connecting lines show the orthology relationships. Black bracket-like lines depict duplication: genes connected by, or enclosed within, a black bracket are duplicates of each other.
Dotted lines depict coexpression: genes connected by, or enclosed within, a dotted line are coexpressed with each other.

Discussion

Local coexpression domains represent only a small part of the genome

Parameters shaping local coexpression domains

Lack of microsyntenic coexpression

Electronic supplementary material

(XLS 231 KB)