Update
TRENDS in Genetics Vol.21 No.3 March 2005
A eukaryotic gene family related to retroelement integrases
Xiang Gao and Daniel F. Voytas
Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
Proteins encoded by mobile genetic elements occasionally assume cellular roles. Telomerase, for example, is a reverse transcriptase that replicates chromosome ends, and Rag1 is a transposase that mediates immunoglobulin gene rearrangements. In this article, we report cellular genes related to integrases that are not associated with a retrovirus or retrotransposon. These integrases are found in diverse eukaryotes and are evolving under functional constraint. We propose that the cellular integrases have assumed a host role and, like their retroelement counterparts, probably function in DNA metabolism.
Corresponding author: Voytas, D.F. Available online 25 January 2005.
[Figure 1a: alignment of integrase and transposase catalytic domains, with arrows marking the conserved D, D and E residues. Sequences shown, by group:
Bel: Bel, Cer9
Ty1/copia: AtRE1, Retrofit, Tst1, RIRE1, Tnt1, Tto1, 1731, copia, SIRE1, Ty1
c-integrase: Dic_ESTgi19313657, gi1244745_XMU4333, Fugu_SINFRUP00000155622, Zebrafish_ENSDARP00000021777, Zebrafish_ENSDARP00000022449, Celchr1_13109149_13107986, C.briggsae_CBG20709, Celchr3_7961480_7960431, Tlr, Human_gi29844577, Canis_gi18821731, Human_gi29824588, Bos_ESTgi15380147
Fob1: Fob1p
Retrovirus: HIV1, FIV, RSV, ALV, BLV, MuLV, WDSV, SnRV, SFV
Ty3/gypsy: Tma1, Cer1, Ty3, Cft-1, Sushi, Tf1, Athila4, Hm
Gin: Gin-1
Transposon: Tc1_X01005, Tcb1_gi|84417, Bari1_X67681, Mos1_M14653]
[Figure 1b: phylogenetic tree of the catalytic domains with bootstrap values; labeled clades are Ty3/gypsy, c-integrase, Fob1, Bel, Ty1/copia, Retrovirus, Gin-1 and Transposons; scale bar 0.5.]
Figure 1. Relationships between the catalytic domain of c-integrases, retroelement integrases and transposases. (a) Sequence alignments of the catalytic domains of integrases and transposases. Conserved aspartate and glutamate residues are marked by arrows. Integrase sequences include representatives from the major branches of retrotransposons and the various genera of retroviruses. (b) A phylogenetic tree constructed from amino acid sequences of the catalytic domain. Phylogenetic and molecular evolutionary analyses were conducted using MEGA version 3.0 [14]. The c-integrase data set includes both genomic and EST sequences. GenBank accession numbers or chromosome coordinates are provided in the figure. Complete sequence alignments are available at http://www.public.iastate.edu/~voytas/MSsupplementary/c_integrase. Bootstrap replicates (1000) were conducted and values are shown at the base of relevant clades. Note that there is not sufficient evidence to conclude that the c-integrases are monophyletic or to assess the ancestor or descendant relationship between Fob1p and the c-integrases. We also wish to point out that the mammalian c-integrases are distinct from the other vertebrate genes (i.e. from fish), and might define a paralogous lineage.
Introduction
Retrotransposons and retroviruses (collectively referred to as retroelements) copy their mRNA into cDNA and then integrate the cDNA into new chromosomal sites. This process requires the pol gene products, reverse transcriptase and integrase. Of the hundreds of retroelements described to date, to the best of our knowledge, all encode reverse transcriptase and integrase as a single polypeptide, which is post-translationally processed by a retroelement-encoded protease.
A novel family of integrases
In cataloging retroelements from the completed eukaryotic genome sequences, we identified a group of related integrases that lack adjacent reverse transcriptases. These integrases are derived from diverse eukaryotes, including humans, fish and worms, and include a previously described integrase from Tetrahymena. All of these integrases (referred to hereafter as cellular integrases or c-integrases) have a catalytic core domain (the ‘D,D35E motif’) that is characteristic of retroviral integrases (Figure 1a). The catalytic domains share 20–90% amino acid identity and have diverged significantly from retroelement integrases (10–25% amino acid identity) and DNA transposable element transposases (<10% amino acid identity). The residues and spacing that define the D,D35E motif indicate that the c-integrases are more similar to retroelement integrases than transposases. This relationship is further supported by phylogenetic analyses performed with catalytic domain sequences (Figure 1b). The c-integrases are distinct from retroelement integrases, and the degree of diversity among them is comparable to the diversity observed among retroelement integrases, suggesting a long evolutionary history of these genes with their hosts, multiple independent origins and/or functional divergence. It should be noted that an integrase has previously been reported in humans (Gin-1) that is not associated with a retroelement, but this gene is closely related to Ty3/gypsy retroelement integrases and therefore does not share an evolutionary history with the c-integrases [1].
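The identity ranges quoted above come from pairwise comparisons of aligned catalytic domains. The text does not say how identity was computed, so the following is only a minimal sketch of one common convention: count matching residues over the alignment columns in which neither sequence has a gap. The function name `percent_identity` and the two toy fragments are hypothetical and are not taken from Figure 1a.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned sequence has a gap.

    Assumption: both inputs are rows of the same alignment, so they have
    equal length and position i in one corresponds to position i in the other.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, for illustration only).
toy_a = "DNGKEF-TS"
toy_b = "DNGREFVTA"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")  # 75.0% identity
```

Other denominators (for example, the full alignment length or the length of the shorter sequence) give different values for the same pair, so identity figures are only comparable when the same convention is used throughout.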
Similar to the retroelement integrases, the mammalian c-integrases have a predicted N-terminal metal-binding motif (HHCC) (Figure 2a). However, downstream of the catalytic domain, the c-integrases lack conserved motifs such as the GPY/F motif, characteristic of some Ty3/gypsy retrotransposon and retrovirus integrases, and the GKGY motif, a universal feature of Ty1/copia retrotransposon integrases [2,3]. A chromodomain is located at the C-termini of all c-integrases except those derived from mammalian hosts and Tetrahymena. The most thoroughly characterized chromodomain is derived from heterochromatin protein 1 (HP1) and binds to the N-terminus of histone H3 that is methylated on lysine 9 [4]. Some Ty3/gypsy retrotransposon integrases have chromodomains that might have a role in target specificity by directing integrase to specific chromatin domains [5].
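As a rough illustration of how the presence or absence of short motifs such as GPY/F and GKGY might be screened in a protein sequence, the sketch below treats each motif name as a literal pattern. This is an assumption made for brevity: real motifs tolerate substitutions and are normally identified from alignments or profiles, not exact string matches. The names `MOTIFS` and `find_motifs` and the toy sequence are hypothetical.

```python
import re

# Literal readings of two motif names used in the text. Treating them as
# exact patterns is an illustrative assumption, not a validated definition.
MOTIFS = {
    "GPY/F": re.compile(r"GP[YF]"),  # G, P, then Y or F
    "GKGY": re.compile(r"GKGY"),
}


def find_motifs(protein: str) -> dict[str, list[int]]:
    """Return the 0-based start positions of each motif in the sequence."""
    return {
        name: [m.start() for m in pattern.finditer(protein)]
        for name, pattern in MOTIFS.items()
    }


# Hypothetical toy sequence, not a real integrase.
toy = "MSTAGPYLLKRGKGYQQAGPFT"
print(find_motifs(toy))  # {'GPY/F': [4, 18], 'GKGY': [11]}
```

An empty list for a motif simply means the literal pattern was not found; it does not by itself show that a diverged form of the motif is absent.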
[Figure 2a: schematic of domain structures for retrovirus integrases, Ty3/gypsy integrases, Ty1/copia integrases, Fob1p (566 aa), mammalian c-integrases and other c-integrases; marked features are HHCC, the D, D and E residues, GPF/Y, GKGY and the chromodomain (+/−).]
[Figure 2b: histogram; x-axis dN:dS in bins from 0.1 to 1 and >1; y-axis number of pairwise c-integrase comparisons (0 to 50).]
Figure 2. Coding features of c-integrases. (a) Comparison of c-integrase and retroelement integrase domain structures. Conserved amino acid sequence motifs are described in the main text. (b) A histogram of the ratio of non-synonymous to synonymous distances from pairwise c-integrase comparisons. Included in the analysis are c-integrases from Caenorhabditis elegans (four genes), Caenorhabditis briggsae (two genes), zebrafish (five genes), Xiphophorus maculatus (one gene) and Fugu (four genes), all of which form a monophyletic lineage. Of the 120 possible pairs from the 16 c-integrases, meaningful dS and dN values could be calculated for 88 pairs by the Yang and Nielsen method [15], and these are presented in the histogram.
The relationship of c-integrases to mobile DNA
Although the c-integrases are similar to retroelement integrases, it is unlikely that they are part of a conventional retroelement: no similarity to the retroelement gag and pol genes is evident in the adjacent sequences. Furthermore, the c-integrases are not bounded by characteristic retroelement long terminal direct repeats. Although it is possible that the c-integrases act in trans to mediate the integration of retroelement cDNA, we view this as unlikely, because their degree of divergence from retroelement integrases suggests that they would be unable to bind to cDNA from distantly related retroelements and perform integration [6].
The c-integrases are also unlikely to be vestiges of degenerate retroelements. The reading frames are intact, and the synonymous substitution rate greatly exceeds the rate for non-synonymous substitutions. Of the 120 possible c-integrase sequence pairs derived from the 16 worm and fish c-integrase genes, none has a dN/dS ratio (non-synonymous to synonymous nucleotide changes) >0.8; 73 pairs have a ratio <0.3 (Figure 2b). The c-integrases, therefore, are evolving under functional constraint, suggesting that they still have a biological role.
Because there are multiple copies of c-integrases in some organisms (i.e. Caenorhabditis elegans, zebrafish, Fugu, Tetrahymena and Xiphophorus maculatus), it is possible that they are part of a new type of transposable element. This has been previously proposed for the c-integrases of Tetrahymena, which are found within a 22-kb repeat unit called Tlr1 [7]. Only one Tlr repeat unit has been completely sequenced; therefore, it is not known whether these repeats are well conserved. The multiple Tlr repeats could be derived through segmental duplication rather than by transposition.
In C. elegans, there is compelling evidence that the multiple copies of c-integrase arose through segmental duplication. The C. elegans c-integrases and adjacent flanking sequences form two duplicate pairs. There is little similarity in the length of the duplicate region and sequence composition between these pairs (Figure 3). One of the duplicate pairs from C. elegans is found in C. briggsae; however, the structure of the duplicate region differs between these species (Figure 3b). Only a single gene adjacent to the c-integrase is preserved at all four chromosomal loci, and the region appears to have sustained a translocation since divergence of the two species. Despite the changes in the surrounding coding sequences, the c-integrases in both C. elegans and C. briggsae are under strong selective pressure and have >85% amino acid similarity and a dN/dS ratio <0.1. This implies a functional role for the c-integrases. It should be noted that flanking sequence similarity is not a universal feature of the c-integrases; it is not observed among the seven Fugu integrases or the two human integrases.
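The pair count used in the dN/dS analysis follows directly from the number of genes: 16 genes give 16 × 15 / 2 = 120 unordered pairs. The sketch below checks that arithmetic using the per-species gene counts listed in the Figure 2 caption, and shows one way the pairwise ratios could be tallied against the 0.3 and 0.8 thresholds mentioned in the text. The `summarize` helper and the `toy_ratios` values are hypothetical and are not data from the study; estimating dN and dS themselves (here by the Yang and Nielsen method [15]) is outside the scope of this example.

```python
from itertools import combinations
from math import comb

# Gene counts per species, as listed in the Figure 2 caption.
genes_per_species = {
    "Caenorhabditis elegans": 4,
    "Caenorhabditis briggsae": 2,
    "zebrafish": 5,
    "Xiphophorus": 1,
    "Fugu": 4,
}

n_genes = sum(genes_per_species.values())
n_pairs = comb(n_genes, 2)  # unordered pairs: n * (n - 1) / 2
print(n_genes, n_pairs)  # 16 120

# The same count obtained by enumerating the pairs explicitly.
assert len(list(combinations(range(n_genes), 2))) == n_pairs


def summarize(ratios, low=0.3, high=0.8):
    """Count dN/dS ratios below `low` and above `high`."""
    below = sum(1 for r in ratios if r < low)
    above = sum(1 for r in ratios if r > high)
    return below, above


# Hypothetical ratios for illustration only; NOT values from the paper.
toy_ratios = [0.05, 0.12, 0.29, 0.45, 0.61, 0.79]
print(summarize(toy_ratios))  # (3, 0)
```

A ratio well below 1 means that amino acid-changing substitutions accumulate more slowly than silent ones, which is the pattern the text interprets as functional constraint.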
[Figure 3a: gene maps of the c-integrase (c-int) loci on C. elegans Chr3 and Chr4 with annotated flanking genes (including lipase and ribonucleotide reductase genes at both loci); scale bar 5 kb.]
[Figure 3b: gene maps of the c-int loci on C. briggsae contigs cb25.fpc4206 and cb25.NA_051 and on C. elegans Chr1 and ChrX, with annotated flanking genes; scale bars 1 kb and 15 kb.]
Figure 3. Genomic context of c-integrases in Caenorhabditis elegans and Caenorhabditis briggsae. The four C. elegans c-integrases resulted from the duplication of two homologs: (a) c-integrase genes and annotated flanking sequences from Chromosome (Chr) 3 and Chr4; and (b) c-integrases on Chr1 and ChrX. c-integrases are indicated by black arrows, and predicted genes in the flanking regions are indicated by grey arrows. Each duplicate region in C. elegans has >85% DNA sequence identity (indicated by shading) and is delimited by incomplete (Chr3 and Chr4) and complete (Chr1 and ChrX) inverted repeats. There is no similarity in flanking sequences between the two duplicate pairs in C. elegans. Panel b also illustrates the corresponding duplicate pair from C. briggsae.
The function of c-integrases
Because they do not appear to be part of a mobile element, and because they are evolving under functional constraint, it is possible that the c-integrases have a cellular role. This is suggested by global genome expression data, indicating that one of the C. elegans genes is expressed. Genome-wide RNA interference (RNAi) experiments in C. elegans did not reveal an observable phenotype for the c-integrases, but this might be due to genetic redundancy or the limited number of phenotypes scored [8]. A fish c-integrase (from Xiphophorus maculatus) is also expressed, and a chromosome rearrangement that places this c-integrase adjacent to a receptor tyrosine kinase results in hereditary melanoma [9]. Expression is also indicated by the existence of ESTs encoding both of the human c-integrases in addition to the dog, cow and slime mold (Dictyostelium) genes.
The c-integrases are distantly related to Fob1p of Saccharomyces [10], a protein involved in rDNA metabolism [11]. Owing to the degree of divergence, it is difficult to conclude whether Fob1p is orthologous to the c-integrases (Figure 1b). Furthermore, Fob1p does not have the conserved D,D35E residues that characterize the integrase catalytic domain (Figure 1a). Nonetheless, Fob1p is thought to initiate double-strand breaks within the rDNA repeats. The repair of these breaks through recombination results in the expansion or contraction of the rDNA array and generates the rDNA circles that signal the onset of aging in yeast [10]. The role of Fob1p in repeat sequence regulation is similar to the proposed role for the Tetrahymena c-integrase: facilitating DNA loss during the transition from the micronucleus to the macronucleus [12].
It is important to note that other cellular proteins related to transposases act on or modulate DNA sequence repeats. These include Rag1 transposase, which mediates immunoglobulin gene rearrangements, and the transposase-derived CENP-B proteins, which bind to centromeric α-satellite DNA in humans and are required for de novo assembly of centromeric chromatin [13].
Concluding remarks
As proposed for Fob1p and the Tetrahymena protein, we suggest c-integrases regulate the amplification or loss of DNA repeats.
The presence of a chromodomain predicts that some c-integrases act in heterochromatin, perhaps at tandemly arranged sequence repeats. Increasingly, repeated sequences and their associated epigenetic marks are implicated in heterochromatin function. Consequently, there might be a selective advantage to having a cellular integrase regulate the size and extent of sequence repeat tracts. Functional analyses will ultimately be required to determine the biological activity of c-integrases. Nonetheless, the diversity of proteins bearing similarity to transposases and integrases (from Rag1 and CENP-B to Fob1p and the c-integrases) suggests that, during evolution, cells have found multiple uses for the ‘cutting and pasting’ machinery that is crucial to the replication and spread of mobile DNAs.
Acknowledgements
We thank Henry Levin and Weiwu Xie for helpful comments on this article. This work was supported by NIH grant GM61657 to D.F.V.
References
1 Llorens, C. and Marin, I. (2001) A mammalian gene evolved from the integrase domain of an LTR retrotransposon. Mol. Biol. Evol. 18, 1597–1600
2 Malik, H.S. and Eickbush, T.H. (1999) Modular evolution of the integrase domain in the Ty3/Gypsy class of LTR retrotransposons. J. Virol. 73, 5186–5190
3 Peterson-Burch, B.D. and Voytas, D.F. (2002) Genes of the Pseudoviridae (Ty1/copia retrotransposons). Mol. Biol. Evol. 19, 1832–1845
4 Jenuwein, T. and Allis, C.D. (2001) Translating the histone code. Science 293, 1074–1080
5 Sandmeyer, S. (2003) Integration by design. Proc. Natl. Acad. Sci. U. S. A. 100, 5586–5588
6 Brown, P.O. (1997) Integration. In Retroviruses (Coffin, J.M. et al., eds), pp. 161–204, Cold Spring Harbor Laboratory Press
7 Wuitschick, J.D. et al. (2002) A novel family of mobile genetic elements is limited to the germline genome in Tetrahymena thermophila. Nucleic Acids Res. 30, 2524–2537
8 Kamath, R.S. et al. (2003) Systematic functional analysis of the Caenorhabditis elegans genome using RNAi. Nature 421, 231–237
9 Fornzler, D. et al. (1996) The Xmrk oncogene promoter is derived from a novel amplified locus of unusual organization. Genome Res. 6, 102–113
10 Dlakic, M. (2002) A model of the replication fork blocking protein Fob1p based on the catalytic core domain of retroviral integrases. Protein Sci. 11, 1274–1277
11 Kobayashi, T. and Horiuchi, T. (1996) A yeast gene product, Fob1 protein, required for both replication fork blocking and recombinational hotspot activities. Genes Cells 1, 465–474
12 Gershan, J.A. and Karrer, K.M. (2000) A family of developmentally excised DNA elements in Tetrahymena is under selective pressure to maintain an open reading frame encoding an integrase-like protein. Nucleic Acids Res. 28, 4105–4112
13 Ohzeki, J. et al. (2002) CENP-B box is required for de novo centromere chromatin assembly on human alphoid DNA. J. Cell Biol. 159, 765–775
14 Kumar, S. et al. (2004) MEGA3: Integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief. Bioinform. 5, 150–163
15 Yang, Z. and Nielsen, R. (2000) Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. 17, 32–43
0168-9525/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.tig.2005.01.006