J Mol Cell Cardiol 33, 587–591 (2001) doi:10.1006/jmcc.2000.1335, available online at http://www.idealibrary.com
Rapid Communication
Organization of Human Cardiovascular-expressed Genes on Chromosomes 21 and 22
Adam A. Dempsey¹,², Noel Pabalan², HongChang Tang² and Choong-Chin Liew¹,²
¹The Cardiovascular Genome Unit, Brigham and Women’s Hospital, 75 Francis Street, Thorn 1326; Harvard Medical School, Boston, MA 02115, USA; ²Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario M5G 1L5, Canada
(Received 19 December 2000, accepted 19 December 2000, published electronically 24 January 2001)
A. A. Dempsey, N. Pabalan, H.-C. Tang and C.-C. Liew. Organization of Human Cardiovascular-expressed Genes on Chromosomes 21 and 22. Journal of Molecular and Cellular Cardiology (2001) 33, 587–591. The recent availability of the sequenced and annotated DNA of chromosomes 21 and 22 has initiated the next phase of the human genome project: the application of this resource. One facet of these data is that they provide an ordered list of genes along each chromosome, which can be used to investigate gene position effects. Specifically, the physical position and distribution of genes along the chromosomes may be related to gene expression in specific organs or organ systems. In this report we index the subset of genes constituting the human “cardiovascular genome” on chromosomes 21q and 22q and report the identification of several “cardiovascular gene” clusters. These gene clusters are suggestive of a higher order of tissue-specific gene regulation at the chromosomal level.
© 2001 Academic Press
Key Words: Chromosome gene density; Chromosome remodeling; Cardiovascular Expressed Sequence Tags (ESTs); Cardiovascular genomics.
Introduction
Current evidence supports the hypothesis that chromatin structure participates in gene regulation, and that dynamic chromatin remodelling is used as a biological strategy to regulate gene transcription.1–3 This is especially important when extensive phenotypic changes occur, as during cell differentiation and ontogenesis, because chromatin remodelling has been shown to regulate gene expression in a temporal- and tissue-specific manner.4–6 If chromatin structure plays a role in the transcriptional regulation of genes during development, the physical distribution of genes along the chromosomes may have a higher order of organization than currently envisaged. In support of this, a recent report using gene expression profile data generated from oligonucleotide microarrays of yeast at various stages of the cell cycle indicated that adjacent genes have a high probability of being co-regulated.7 To explore this concept, we took advantage of the recently completed and annotated DNA sequences of human chromosomes 21 (C21) and 22 (C22).8,9 By searching all 223 and 545 transcript-encoding genes located on C21q and C22q, respectively, for sequence similarity against our human cardiovascular-based Expressed Sequence Tag (EST) database (cvEST), which houses 111 224 ESTs, we determined which of these genes are expressed in the cardiovascular system (CVS).10,11 We identified at least 100 (44.8%) and 248 (45.5%) genes on the q-arms of C21 and C22, respectively, that are expressed in the CVS.
Please address all correspondence to: Dr C. C. Liew, The Cardiovascular Genome Unit, Brigham and Women’s Hospital, 75 Francis Street, Thorn 1326; Harvard Medical School, Boston, MA 02115, USA.
0022–2828/01/030587+05 $35.00/0
© 2001 Academic Press
A. A. Dempsey et al.
Figure 1 Gene density across chromosomes 21q and 22q. This plot represents the gene density as a proportion of the total number of genes in a 1 Mb window. The blue line represents the complete set of genes on the chromosome; the red line represents the genes expressed in the CVS based on sequence similarity searches of our EST database for each gene. Regions with statistically significant differences (P<0.05) in gene density are indicated by arrows. The numbers adjacent to the arrows indicate the fold-difference in the CVS v the total number of genes in the region. Chi-squared tests were used to determine significant differences in gene distribution (P<0.05, df=1).
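The per-window proportions plotted in Fig. 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: each gene is reduced to a single base-pair coordinate (for example its start position), and the chromosome is cut into fixed, non-overlapping 1 Mb bins; the caption does not say which coordinate was used or whether the windows slide. The fold-difference is taken here as the CVS proportion divided by the all-gene proportion in the same window, which is one plausible reading of the caption. The function names are hypothetical.

```python
from collections import Counter

WINDOW_BP = 1_000_000  # 1 Mb window size, as in Fig. 1

def window_proportions(positions, window=WINDOW_BP):
    """Fraction of the given genes whose coordinate falls in each window.

    Hypothetical helper: `positions` are single base-pair coordinates
    (assumed here to be gene starts), binned into fixed, non-overlapping windows.
    """
    total = len(positions)
    counts = Counter(pos // window for pos in positions)
    return {w: n / total for w, n in counts.items()}

def fold_differences(all_positions, cvs_positions, window=WINDOW_BP):
    """CVS-gene proportion divided by all-gene proportion, per window.

    Windows with no genes at all are skipped; a window that has genes but
    no CVS-expressed genes gets a fold-difference of 0.
    """
    all_prop = window_proportions(all_positions, window)
    cvs_prop = window_proportions(cvs_positions, window)
    return {w: cvs_prop.get(w, 0.0) / p for w, p in sorted(all_prop.items())}
```

A fold-difference above 1 means the window holds a larger share of the chromosome's CVS-expressed genes than of all its genes; significance is a separate question, addressed by the χ² test in the Methods.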
Materials and Methods
Developing the cardiovascular EST database (cvEST)
The protocols used for construction of the cDNA libraries, generation of the ESTs and analysis of the data have been described previously.10,11 The 57 241 ESTs collected from GenBank (dbEST) were obtained from the National Center for Biotechnology Information (NCBI) website (http://www.ncbi.nlm.nih.gov/UniGene). Sequence sets from six aorta cDNA libraries (UniGene library numbers 57, 182, 332, 333, 678, 780), seven adult heart cDNA libraries (UniGene library numbers 10, 46, 121, 241, 326, 334, 335) and four fetal heart cDNA libraries (UniGene library numbers 224, 317, 465, 466) were collected and combined with the 53 983 ESTs generated in our laboratory [from one aorta, two normal adult, three diseased (hypertrophic left ventricle) and three fetal heart (8 to 12 weeks) cDNA libraries] to create the cardiovascular EST database (cvEST) containing 111 224 ESTs.
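Pooling the public and in-house EST sets into one database can be sketched as below. This is a minimal illustration, assuming each library's ESTs are available as FASTA text; the library labels are hypothetical, and the original pipeline (described in refs 10 and 11) may have differed. Each EST is tagged with its source library so the public and in-house sequences stay distinguishable after pooling.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def pool_ests(libraries):
    """Combine EST sets from several libraries into one tagged list.

    `libraries` maps a library label (hypothetical, e.g. "UniGene-57" or
    "in-house-aorta") to that library's FASTA text. Returns a list of
    (library, header, sequence) tuples in input order.
    """
    pooled = []
    for label, fasta_text in libraries.items():
        for header, seq in parse_fasta(fasta_text):
            pooled.append((label, header, seq))
    return pooled
```

As a consistency check on the counts in the text, 57 241 public ESTs plus 53 983 in-house ESTs give the 111 224 ESTs reported for cvEST.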
Gene identification in the cvEST using chromosomes 21 and 22
Nucleotide sequences of 225 genes in 21q were obtained from the MPIMG website (www.rzpd.de/general/html/Chrom21). Nucleotide sequences of the 545 genes in 22q were obtained from the Sanger Centre (www.sanger.ac.uk/cgi-bin/c22_genes_table.pl) and NCBI (www.ncbi.nlm.nih.gov) websites. Sequence similarity searches of all these genes against the cvEST were performed with the BLAST algorithm on a SUN Ultra 60 workstation. A match was scored as positive if the nucleotide sequence identity was at least 95% and the E-value was 1×10⁻⁴⁰ or less.
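The match criterion above can be expressed as a filter over BLAST results. This sketch assumes the hits are available in the standard 12-column tabular format (legacy `blastall -m 8`, or BLAST+ `-outfmt 6`); the paper does not state which output format was used, and the gene and EST identifiers in any input are hypothetical. A gene is counted as CVS-expressed if at least one of its hits meets both thresholds.

```python
import csv
from io import StringIO

# Standard 12-column tabular BLAST output (assumed format):
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
MIN_IDENTITY = 95.0  # minimum percent identity, from the Methods
MAX_EVALUE = 1e-40   # maximum E-value, from the Methods

def cvs_expressed_genes(tabular_text, min_identity=MIN_IDENTITY, max_evalue=MAX_EVALUE):
    """Return the set of query genes with at least one hit passing both thresholds."""
    expressed = set()
    for row in csv.reader(StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        gene, pident, evalue = row[0], float(row[2]), float(row[10])
        if pident >= min_identity and evalue <= max_evalue:
            expressed.add(gene)
    return expressed
```

Requiring both conditions means a short, near-identical hit with a weak E-value is rejected, as is a strong but divergent hit.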
Cardiovascular Gene Clusters on Chromosomes 21 and 22
Statistical analysis
Chi-squared ($\chi^2$) statistics were used to test for significant differences between the subset of genes expressed in the cardiovascular system and all the genes on the chromosome. The following formula was used:
$$\chi^2 = \frac{(O - E)^2}{E}$$
For each 1 megabase (Mb) region along the chromosome, the number of genes expressed in the CVS (observed value, $O$) and the total number of genes in the region were determined. The number of genes expected to be expressed in the CVS (expected value, $E$) was determined from the proportion of the chromosome's genes found in that 1 Mb region. The total chi-squared value ($\chi^2_{\mathrm{total}}$) was calculated by adding the $\chi^2$ value for the 1 Mb region ($\chi^2_{\mathrm{region}}$) to the $\chi^2$ value calculated from the sum of all remaining regions on the chromosome ($\chi^2_{\mathrm{remaining}}$):
$$\chi^2_{\mathrm{total}} = \chi^2_{\mathrm{region}} + \chi^2_{\mathrm{remaining}}$$
A degree of freedom (df) of 1 and P<0.05 were used to assess statistical significance.
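The region-versus-remainder test can be sketched for a single region as follows. This is one reading of the procedure, not the authors' code: the expected CVS count in a region is taken as the total number of CVS-expressed genes on the chromosome multiplied by the region's share of all genes, and only the CVS counts (region and remainder) enter the statistic. For one degree of freedom the upper-tail probability has the closed form P = erfc(sqrt(χ²/2)), so no statistics library is needed. The function names are hypothetical.

```python
import math

def chi2_region(region_cvs, region_total, chrom_cvs, chrom_total):
    """Chi-squared for one region versus the rest of the chromosome.

    region_cvs   -- CVS-expressed genes observed in the region (O)
    region_total -- all genes in the region
    chrom_cvs    -- CVS-expressed genes on the whole chromosome
    chrom_total  -- all genes on the whole chromosome
    """
    # Expected CVS genes in the region: the region's share of all genes
    # times the chromosome-wide number of CVS-expressed genes (assumption).
    expected_region = chrom_cvs * region_total / chrom_total
    expected_rest = chrom_cvs - expected_region
    observed_rest = chrom_cvs - region_cvs
    chi2_reg = (region_cvs - expected_region) ** 2 / expected_region
    chi2_rest = (observed_rest - expected_rest) ** 2 / expected_rest
    return chi2_reg + chi2_rest  # chi2_total = chi2_region + chi2_remaining

def p_value_df1(chi2):
    """Upper-tail P value for a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))
```

For the 21q hotspot reported in the Results (10 CVS-expressed of 11 genes, against 100 of 223 on C21q), `chi2_region(10, 11, 100, 223)` gives approximately 5.48 and `p_value_df1` of that value approximately 0.02, above the df = 1 critical value of 3.84 and consistent with the reported P<0.05.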
Results and Discussion
Nucleotide sequence comparisons of all genes located on chromosomes 21 and 22 against the cvEST identified 100 of 223 (44.8%) genes on C21q and 248 of 545 (45.5%) genes on C22q as expressed in the CVS. At the resolution of cytogenetic bands, the distribution of CVS-expressed genes along the chromosomes differed little from that of all genes. However, restricting this comparison to smaller 1 Mb windows revealed several notable chromosomal segments (Fig. 1). The region of chromosome 21q spanning 12 to 14 Mb (centromere to telomere) contains 11 genes, of which ten are expressed in the CVS. This density of CVS-expressed genes is significantly higher than expected (P<0.05) and is highly suggestive of a cluster of genes co-regulated in the CVS. Similarly, one CVS gene cluster was found on chromosome 22q spanning 22 to 27 Mb (P<0.05); it contains 108 genes, 73 of which are expressed in the CVS. We also identified a 2 Mb region on chromosome 22, extending from the centromere to 2 Mb, that contains fewer CVS-expressed genes than expected (nine of 41; P<0.05) and thus may have more functional significance in another organ or organ system (Fig. 1). Table 1 lists the genes expressed in the CVS within the three “hotspot” regions on C21q and C22q. Examination of the function of the individual genes within each cluster
Table 1 Genes expressed in the cardiovascular system located within the three “hotspots” on chromosomes 21q and 22q

Chromosome 21q (12 to 14 Mb): ten of 11 genes expressed in the CVS
  Complete cDNA FLJ20451
  Disintegrin-like and metalloprotease with thrombospondin type 1 motif, 5
  Gene similar to MARCKS, cDNA DKFZp564P1664
  Gene similar to mouse junctional adhesion molecule, spliced EST AA725566
  Human metalloproteinase with thrombospondin type 1 motifs
  Human mitochondrial ATPase coupling factor 6 subunit
  Human mRNA for amyloid A4 precursor of Alzheimer’s disease
  Human nuclear respiratory factor-2 subunit alpha
  Spliced EST AI016585
  Spliced EST N23422

Chromosome 22q (centromere to 2 Mb): nine of 41 genes expressed in the CVS
  Actin, beta-like 1
  ADP-ribosyltransferase (NAD+; poly(ADP-ribose) polymerase)-like 4
  Matches EST cluster
  Matches EST cluster
  Similar in part to Tr:Q14692, KIAA0187
  Similar to part of Tr:O16024, novel A kinase anchor protein from Drosophila
  Similar to Tr:P15287, California sea hare ATRIAL GLAND-SPECIFIC ANTIGEN PRECURSOR
  Similar to Wp:CE21003, predicted C. elegans gene
  YME1 (S. cerevisiae)-like 2

Chromosome 22q (22 to 27 Mb): ten most highly expressed of the 73 of 108 genes expressed in the CVS
  Aconitase 2, mitochondrial
  Activating transcription factor 4 (tax-responsive enhancer element B67)
  dJ186O1.2 ESTs
  NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 6 (14 kD, B14)
  Non-histone chromosome protein 2 (S. cerevisiae)-like 1
  Ribosomal protein L3
  Ring-box 1
  Similar to Sw:P46781, human 40S RIBOSOMAL PROTEIN S9
  Thyroid autoantigen 70 kD (Ku antigen)
  Ubiquinol-cytochrome c reductase, Rieske iron-sulfur polypeptide-like 1

The genes expressed in the CVS for each hotspot region are listed. Only the ten most highly expressed genes in the CVS are listed for the 22 to 27 Mb region of chromosome 22q. The genes listed were extracted from the tables presented in refs 8 and 9.
may reveal some potential “functional clusters”, an area that we are currently investigating. These results indicate that subsets of genes sharing a common expression pattern, in this case expression in the same organ system, are physically clustered on the chromosomes. This supports the concept that clusters of genes spanning large chromosome segments have a higher order of organization and may be under regulatory control at the chromosomal level. These large chromosome regions also appear to be regulated in a tissue- or organ-specific manner. Such organization is logical in light of the on/off and gene-dosage mechanisms that control many cellular processes, such as cell commitment and differentiation. Furthermore, these findings have implications for disease: these gene clusters may have a greater influence on aberrant cardiovascular phenotypes, for example in Down’s syndrome, and could be considered high-risk genetic regions for cardiovascular disease. Although these data are preliminary, and further evidence may be required to substantiate the hypothesis, the observation is intriguing, has important implications for cardiovascular development and disease, and warrants further investigation.
Acknowledgments
This work is currently supported by the Medical Research Council of Canada and the Heart and Stroke Foundation of Ontario. Adam A. Dempsey is a recipient of a Heart and Stroke Foundation of Canada Studentship.
References
1. Kadonaga JT. Eukaryotic transcription: an interlaced network of transcription factors and chromatin-modifying machines. Cell 1998; 92: 307–313.
2. Workman JL, Kingston RE. Alteration of nucleosome structure as a mechanism of transcriptional regulation. Annu Rev Biochem 1998; 67: 545–579.
3. Bonifer C. Developmental regulation of eukaryotic gene loci: which cis-regulatory information is required? Trends Genet 2000; 17: 310–315.
4. Bonifer C, Huber MC, Jägle U, Faust N, Sippel AE. Prerequisites for tissue specific and position independent expression of a gene locus in transgenic mice. J Mol Med 1996; 74: 663–671.
5. Huang W-Y, Liew CC. Chromatin structure and cardiac gene expression. J Mol Cell Cardiol 1998; 30: 1673–1681.
6. Cohen BA, Mitra RD, Hughes JD, Church GM. A computational analysis of whole-genome expression data reveals chromosomal domains of gene expression. J Mol Med 1996; 74: 663–671.
7. Bonifer C. Long-distance chromatin mechanisms controlling tissue-specific gene locus activation. Gene 1999; 238: 277–289.
8. Dunham I, Hunt AR, Collins JE, et al. The DNA sequence of human chromosome 22. Nature 1999; 402: 489–499.
9. Hattori M, Fujiyama A, et al. The DNA sequence of human chromosome 21. Nature 2000; 405: 311–319.
10. Liew CC, Hwang DM, Fung YW, Laurenssen C, Cukerman E, Tsui S, Lee CY. A catalogue of genes in the cardiovascular system as identified by expressed sequence tags (ESTs). Proc Natl Acad Sci USA 1994; 91: 10645–10649.
11. Hwang DM, Dempsey AA, Wang RX, Rezvani M, Barrans JD, Dai KS, Wang HY, Ma H, Cukerman E, Liu YQ, Gu JR, Zhang JH, Tsui SKW, Waye MMY, Fung KP, Lee CY, Liew CC. A genome-based resource for molecular cardiovascular medicine: towards a compendium of cardiovascular genes. Circulation 1997; 96: 4146–4203.