Unity and diversity among viral kinases


Journal Pre-proofs

Research paper

Chintalapati Janaki, Manoharan Malini, Nidhi Tyagi, Narayanaswamy Srinivasan

PII: S0378-1119(19)30793-0
DOI: https://doi.org/10.1016/j.gene.2019.144134
Reference: GENE 144134

To appear in: Gene

Received Date: 8 February 2019
Revised Date: 12 September 2019
Accepted Date: 16 September 2019

Please cite this article as: C. Janaki, M. Malini, N. Tyagi, N. Srinivasan, Unity and diversity among viral kinases, Gene (2019), doi: https://doi.org/10.1016/j.gene.2019.144134

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 Published by Elsevier B.V.

Chintalapati Janaki1,2, Manoharan Malini1,3, Nidhi Tyagi1,4 and Narayanaswamy Srinivasan1*

Affiliations:

1Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India

2Centre for Development of Advanced Computing, Knowledge Park, Byappanahalli, Bangalore, India

3Present Address: Medgenome Labs Pvt Ltd., 3rd Floor, Narayana Netralaya Building, Narayana Health City, #258/A, Bommasandra, Hosur Road, Bangalore, India

4Present Address: The EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom

*Corresponding author: Narayanaswamy Srinivasan

Contact: [email protected]

Short title: Viral kinases


Abstract

Viral kinases are known to undergo autophosphorylation and also to phosphorylate viral and host substrates. Viral kinases have been implicated in various diseases, and some viruses are also known to acquire host kinases to mimic cellular functions and exhibit virulence. Although substantial analyses of the diversity of viral kinases have been reported in the literature, there is a gap in the understanding of sequence and structural similarity among kinases from different classes of viruses. In this study, we performed a comprehensive analysis of protein kinases encoded in viral genomes. Homology search methods have been used to identify kinases from 104,282 viral genomic datasets. Serine/threonine and tyrosine kinases are identified in only 390 viral genomes. Of the seven viral classes defined by the nature of their genetic material, only double-stranded DNA viruses and single-stranded RNA retroviruses are found to encode kinases. The 716 identified protein kinases are classified into 63 subfamilies based on their sequence similarity within each cluster, and sequence signatures have been identified for each subfamily. Eleven clusters are well represented, with at least 10 members in each. Kinases from the dsDNA viruses Phycodnaviridae, which infect green algae, and Herpesvirales, which infect vertebrates including humans, form a major group. From our analysis, it has been observed that the protein kinases in viruses belonging to the same taxonomic lineages form discrete clusters and that the kinases encoded in alphaherpesviruses form host-specific clusters. A comprehensive sequence and structure-based analysis enabled us to identify the conserved residues or motifs in the kinase catalytic domain regions across all viral kinases. Conserved sequence regions that are specific to a particular viral kinase cluster, and the kinases that show close similarity to eukaryotic kinases, were identified by using sequence and three-dimensional structural regions of eukaryotic kinases as reference. The regions specific to each viral kinase cluster can be used as signatures in the future for classifying uncharacterized viral kinases. We note that kinases from the giant viruses Marseilleviridae have close similarity to viral oncogenes in the functional regions and in putative substrate binding regions, indicating their possible role in cancer.

Keywords: Viral genomes, Kinase classification, Kinase subfamilies, Phycodnaviridae kinases, Herpesviral kinases, Poxvirus kinases, Retroviral kinases


1. Introduction

Protein kinases influence almost every aspect of cell biology, from transcription, DNA replication and the cell cycle to signal transduction (Hardie, 1990; Henneke et al., 2003; Mellon et al., 1989; Pines, 1994). Phosphorylation by protein kinases is one of the most common and important post-translational modifications of proteins, with enormous influence on biological activity in the cell (Cohen, 2002; Roskoski, 2015). Apart from cell signaling, phosphorylation plays a role in pathogen recognition and defense in plants and animals (Lee and Lucas, 2001). Kinases are involved in the control of metabolism, transcription, cell division and movement, and programmed cell death, and also participate in the immune response and nervous system function (Pereira et al., 2011). The assorted functions of protein kinases in signal transduction pathways have been studied in detail (Hanks and Hunter, 1995). More than 518 human protein kinases have been recognized through their conserved sequence motifs, reflecting the importance of kinases in eukaryotic cell signal transduction and metabolism (Kostich et al., 2002; Krupa and Srinivasan, 2002; Manning et al., 2002). The kinases constitute the third most populous protein family and represent ∼1.7 to 2.5% of the human genome (Quintaje and Orchard, 2008). Deregulation of protein kinases results in a number of disorders, and thus many human kinases have become major targets for therapy (Cohen, 2002). As many mutations in kinases are known to cause diseases such as cancer, understanding the sequences and structures of disease-causing mutants is important for the development of selective and specific targeted therapies for cancer (Dixit and Verkhivker, 2014; Gooding and Schiemann, 2016). Genomic studies revealed the presence of Ser/Thr kinases in many bacterial species (Krupa and Srinivasan, 2005), although their physiological roles have largely been unclear.
The first reports of kinases in viruses were published in the 1970s, and since then the role of protein phosphorylation in the life cycle of viruses has been a topic of widespread interest (Collett and Erikson, 1978; Tan, 1975). Over the last few years, several new insights have been gained concerning the pleiotropic functions of viral protein kinases. Some viruses are known to acquire host kinases for their own benefit and mimic their cellular counterparts (Leader, 1993; Leader and Katan, 1988). However, not much is known about the host kinases that phosphorylate viral substrates (Keating and Striker, 2012).


Serine/threonine kinases and tyrosine kinases have been reported largely in double-stranded (ds) DNA viruses, retroviruses, and Herpesvirales (Jacob et al., 2011). Retroviruses are known to be associated with different neurological and immunological disorders. Studies of oncogenic retroviruses established fundamental principles of modern molecular cancer biology, and the oncogenes that have predominant roles in human cancer were first identified in retroviruses (Vogt, 2012). Retroviral kinases share close similarity to the cellular kinases. Kinases in the dsDNA viruses have largely been reported in the Herpesviridae group of viruses. Herpesviruses are known to encode a variety of accessory proteins, and the biological functions of these viral protein kinases are not clear (Gershburg et al., 2015). These kinases do not share high sequence similarity with the eukaryotic kinases. Earlier studies on viral protein kinases largely focused on viruses that have human hosts (Jacob et al., 2011; Leader and Katan, 1988). A comprehensive analysis of kinases encoded in viruses having diverse hosts has not been reported so far. In this study, we have identified a comprehensive set of 716 putative protein kinases encoded in 390 viral genomes using remote homology detection methods. These viruses have diverse hosts. This analysis also describes the characteristic features of kinases grouped into clusters based on their sequence similarity. We also analyzed the conservation of known eukaryotic kinase substrate binding residues in the viral kinases.

2. Materials and Methods

For this study, 1,947,340 protein sequences from sequence data sets of 104,282 viruses (different strains of viruses included) have been retrieved from the UniProt database (www.uniprot.org/).
The methodology developed in-house to identify kinases (Anamika et al., 2005; Krupa et al., 2004), based on homology search methods such as RPS-BLAST (Marchler-Bauer and Bryant, 2004) and hidden Markov model matching using HMMER (Mistry et al., 2013), was used to identify viral kinases. RPS-BLAST is reverse position-specific BLAST, which employs BLAST (Altschul et al., 1990) in searches against a profile database comprising Position-Specific Scoring Matrices (PSSMs). Conditions for hits in RPS-BLAST searches include an e-value cut-off of 0.0001, and more than 70% of the profile should be covered by the query in the alignment. The e-value cut-off used in the HMM search is 0.01. In all the putative hits we ensured the presence of the catalytic aspartate residue, which is conserved in almost all known Ser/Thr or Tyr kinases. 813 kinases belonging to 390 genomes were identified after applying the aforementioned criteria in RPS-BLAST and HMM searches. Sequences of catalytic domain regions of the identified kinases were extracted from the UniProt database (www.uniprot.org). The duplicate sequences with 100% identity in their kinase catalytic domain regions were removed, after which a dataset of 716 sequences has been used for further analysis. To group the kinases based on sequence similarity, the BLASTClust program from NCBI (Altschul et al., 1990) was used, with criteria of 35% sequence identity and 90% query coverage. The methodology followed is given as a flowchart in Figure 1. The sequences that are grouped together based on their sequence similarity will henceforth be referred to as a kinase cluster. The viruses that fall into these clusters belong to a particular viral taxonomic lineage, i.e., order, family, subfamily, or genus as per the International Committee on Taxonomy of Viruses (ICTV) (Fauquet and Fargette, 2005). The multiple sequence alignment program ClustalW (Thompson et al., 1994) has been used to align kinase domain sequences from each of these clusters. Phylogenetic trees were constructed using the maximum-likelihood method in MEGA 4.0 and rendered using FigTree v1.4.3 (Kishino et al., 1990; Tamura et al., 2007; Morariu et al., 2008). The viral kinases belonging to each cluster are aligned against the profile generated using three-dimensional structures of representative eukaryotic kinases from different subfamilies. The methodology used to build the profile is discussed in Janaki et al. (2016). The objective of this exercise is to map the known substrate binding regions of eukaryotic kinases onto the corresponding viral kinases and understand how far these regions are conserved compared to their cellular counterparts. A total of 11 major clusters of viral kinases have been identified, representing as many subfamilies of viral kinases.
Kinase sequences belonging to each of the 11 clusters are aligned and analyzed using ConSurf (Ashkenazy et al., 2016). ConSurf takes either a single sequence or a multiple sequence alignment as input and calculates the degree of conservation at each amino acid position within proteins using various homology-based methods. It builds a 3D model of the given query sequence using MODELLER (Sali and Blundell, 1993) and identifies structurally and functionally important residues (Ashkenazy et al., 2016).
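The hit-selection criteria described in this section can be illustrated with a minimal Python sketch. It is not the in-house pipeline: the `Hit` record, its field names and the toy values are hypothetical, the way RPS-BLAST and HMMER hits are combined is an assumption, and exact string matching is used as a stand-in for "100% identity" in the catalytic domain. Only the three thresholds are taken from the text.

```python
from dataclasses import dataclass

# Thresholds quoted in the Methods text.
RPS_EVALUE_MAX = 1e-4         # RPS-BLAST e-value cut-off
PROFILE_COVERAGE_MIN = 0.70   # more than 70% of the profile must be covered
HMM_EVALUE_MAX = 1e-2         # HMM search e-value cut-off


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one search hit (illustrative field names)."""
    seq_id: str
    method: str               # "rpsblast" or "hmmer"
    evalue: float
    profile_coverage: float   # fraction 0..1, used only for RPS-BLAST hits
    has_catalytic_asp: bool   # catalytic aspartate present in the alignment
    domain_seq: str           # catalytic-domain sequence


def passes_criteria(hit: Hit) -> bool:
    """Apply the e-value, coverage and catalytic-aspartate checks."""
    if not hit.has_catalytic_asp:
        return False
    if hit.method == "rpsblast":
        return (hit.evalue <= RPS_EVALUE_MAX
                and hit.profile_coverage > PROFILE_COVERAGE_MIN)
    if hit.method == "hmmer":
        return hit.evalue <= HMM_EVALUE_MAX
    return False


def deduplicate(hits):
    """Keep the first hit for each distinct catalytic-domain sequence.

    Assumption: identical strings approximate '100% identity'.
    """
    seen = set()
    unique = []
    for hit in hits:
        if hit.domain_seq not in seen:
            seen.add(hit.domain_seq)
            unique.append(hit)
    return unique
```

Applied to a list of hits, `deduplicate([h for h in hits if passes_criteria(h)])` mirrors the order of operations in the text: filter first, then remove redundant catalytic domains.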


3. Results and Discussion

From sequence data sets of 390 viruses, 813 kinases have been identified, and after removing sequences that are 100% identical in their catalytic domain regions, 716 unique putative kinase sequences were obtained (Supplementary Table 1). The methodology employed to identify kinases using sequence similarity search methods is shown in Figure 1. The genomes of dsDNA viruses and retroviruses that encode kinases are listed in a classified form in Figure 2. The dsDNA viruses such as Herpesvirales, which infect both vertebrates and invertebrates, and Phycodnaviridae, which infect marine and freshwater algae, were found to have a large number of kinases. Some of the kinases are characteristic of viral families such as Phycodnaviridae, Baculoviridae, Mimiviridae, Poxviridae, Iridoviridae, Ascoviridae and Nudiviridae. Some of them are specific to a viral genus within a family. For example, the genus Marseillevirus belonging to the Marseilleviridae family codes for certain kinases. There are also kinases encoded in specific species within a subfamily or genus. For example, Herpes simplex virus (HHV-1), HHV-2 and HHV-3, belonging to the subfamily Alphaherpesvirinae, code for certain kinases. Possible reasons for the presence of kinases only in dsDNA viruses and retroviruses have been proposed (Jacob et al., 2011). One of the proposed reasons is that the kinases are useful but not absolutely essential for viruses; owing to the large size of their genomes, dsDNA viruses could afford to accommodate kinase coding regions.

An alternative hypothesis by the same authors is that the kinase genes were lost during evolution in recently diverged viral genomes. A more attractive hypothesis is that the kinases are, in fact, quite important for viruses and therefore the ancient dsDNA viruses are still around despite the long evolution of their hosts (Jacob et al., 2011). Another possible reason is that, as there are many redundant cellular kinases that phosphorylate viral proteins (Keating and Striker, 2012), viruses gradually adapted to survive and exhibit pathogenicity even in the absence of their own kinases.

We used homology search methods such as BLAST and hmmsearch to detect close homologues of viral kinases in eukaryotes. All eukaryotic kinases are known to have a conserved catalytic domain with an N-terminal small lobe mainly associated with ATP binding and a C-terminal large lobe associated with peptide binding and catalysis (Pearce et al., 2010). Figure 3a depicts a representative structure of the eukaryotic kinase catalytic domain region (PDB code: 1ATP) (Kornev and Taylor, 2010), where the N-terminal lobe is highlighted in light blue and the C-terminal lobe in red. The glycine-rich loop present in the N-terminal lobe connects the β-1 and β-2 strands and harbors the adenine ring of ATP. Figure 3b depicts the key conserved residues in the catalytic kinase domain, and the function of these key residues is given in Table 1. Glycine (G) 52 in the glycine-rich loop is involved in ATP binding; Lysine (K) 72 couples the phosphates of ATP to the C-helix; Glutamic acid (E) 91 forms a salt bridge with K72; the Histidine (H), x (any amino acid), Asparagine (N) (HxN) motif mediates key structural interactions; the Histidine (H), Arginine (R), Aspartic acid (D) (HRD) motif in the catalytic loop is the active site; the DFG motif and the Alanine (A), Proline (P), Glutamic acid (E) (APE) motif lie in the activation segment, which recognizes ATP-bound Mg2+ ions; and Tryptophan (W) in the F-helix (αF) is conserved in many kinase families. The AP of the APE motif docks to the tryptophan in the F-helix (Taylor, 2012). From sequence analysis, it has been observed that most of these key residues of the eukaryotic kinase catalytic domain region are conserved within viral kinases, indicating a basis for their mimicry of host kinases for replication and invasion. Supplementary Table 2 gives a list of conserved structural and functional residues within each viral kinase cluster as reported by ConSurf.

3.1 Viral Kinase Clusters

To understand the similarity among viral kinases, the sequences are clustered based on their sequence similarity using NCBI BLASTClust. After clustering the kinases based on their catalytic domain region, 63 clusters are formed, of which 11 are well represented with at least ten members each. The sequences within each cluster are aligned using ClustalW and, using this alignment as input, an HMM profile is generated for each cluster using hmmbuild, a tool within the HMMER package (Eddy, 2011). All eleven HMM profiles are made available as Supplementary data 3. We performed phylogenetic analysis using maximum likelihood for sequences from these 11 clusters (Figure 4).
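The clustering step can be pictured with a small Python sketch. It is an illustration under stated assumptions, not a reimplementation of BLASTClust: it assumes single-linkage grouping (the behaviour BLASTClust is generally described as having), takes precomputed pairwise identity and coverage values as input, and uses hypothetical sequence identifiers. The 35% identity and 90% coverage thresholds and the ten-member cut-off for a "well represented" cluster come from the text.

```python
from collections import defaultdict


def single_linkage(ids, pairs, min_identity=35.0, min_coverage=90.0):
    """Group sequences that are connected by any chain of qualifying pairs.

    ids   : list of sequence identifiers
    pairs : iterable of (id_a, id_b, percent_identity, percent_coverage)
    """
    parent = {i: i for i in ids}

    def find(x):
        # Walk up to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, coverage in pairs:
        if identity >= min_identity and coverage >= min_coverage:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = defaultdict(list)
    for i in ids:
        clusters[find(i)].append(i)
    # Largest clusters first.
    return sorted(clusters.values(), key=len, reverse=True)


def well_represented(clusters, min_members=10):
    """Clusters with at least `min_members` sequences."""
    return [c for c in clusters if len(c) >= min_members]
```

With single linkage, two kinases can share a cluster without directly meeting the thresholds themselves, as long as a chain of qualifying pairs connects them; this is one reason cluster membership should be read together with the alignments.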
Distinct features of each cluster are given below.

3.1.1 Cluster 1 - Serine/threonine kinases encoded in Phycodnaviridae

Phycodnaviridae is a family of giant viruses (Van Etten and Meints, 1999) that have large icosahedral capsids and dsDNA genomes, and infect marine or freshwater algae (Wilson et al., 2009). Their genome size ranges from 160 to 560 kb, and they belong to the nucleocytoplasmic large DNA viruses (NCLDV), or order Megavirales (Dunigan et al., 2006). Ascoviridae, Asfarviridae, Iridoviridae, Marseilleviridae, Pandoraviridae, Poxviridae, and Mimiviridae are other giant virus families that belong to the NCLDV (Colson et al., 2013; Wilson et al., 2009) and have genes for DNA replication. Though Phycodnaviridae is mainly grouped into six genera, i.e., Chlorovirus, Coccolithovirus, Phaeovirus, Prasinovirus, Prymnesiovirus, and Raphidovirus, it is interesting to observe that protein kinases are mainly encoded in chloroviruses that infect green microalgae (Jeanniard et al., 2013). Protein kinases encoded in two chloroviruses, Acanthocystis turfacea chlorella virus and Paramecium bursaria chlorella virus, formed two distinct clusters: Cluster 1, formed by Ser/Thr protein kinases, and Cluster 2, formed by PBCV-specific basic adaptor domain-containing proteins (represented in green in Figure 4). Most of the kinases within this cluster have the catalytic His-Arg-Asp (HRD) motif conserved, except for a few having a His-Leu-Asp (HLD) motif. Similarly, a few kinases within this cluster have an Asp-Leu-Gly (DLG) motif instead of the Asp-Phe-Gly (DFG) motif, and all the sequences with the HLD motif have the DLG motif. The DFG motif, present in the N-terminal region of the kinase activation loop, is known to play a role in kinase activity and inhibitor binding (Treiber and Shah, 2013; Vijayan et al., 2015). In the inactive DFG-out state, Asp in the DFG motif flips by 180° and Asp and Phe swap positions, due to which Asp moves away from the ATP binding site and Phe moves into the binding site, thus creating an allosteric pocket and blocking ATP binding (Artim et al., 2012; Vijayan et al., 2015). This movement of Phe can disturb the kinase regulatory spine (R-spine) and catalytic spine (C-spine), and many kinase inhibitors such as Imatinib are known to bind to this hydrophobic pocket in the classical DFG-out conformation.
The autoinhibitory mechanism of kinases can be disturbed by replacement of Phe with Leu in the DFG motif, as Leu is not close to the binding site, thus retaining the active state of the kinase (Artim et al., 2012). The DLG motif is found in eukaryotic WNK (with no Lys) kinases, which are linked to hypertension (Min et al., 2004; Scheeff et al., 2009).

3.1.2 Cluster 2 - PBCV-specific basic adaptor domain proteins in Phycodnaviridae

Paramecium bursaria chlorella virus (PBCV) encodes a few proteins containing the PBCV-specific basic adaptor domain, a positively charged C-terminal domain that acts as a targeting device for specific substrates (Dunigan et al., 2006; Iyer et al., 2006). S/T kinases in PBCV have two copies of the PBCV-specific basic adaptor domain, a small positively charged C-terminal domain and an additional domain tethered to it (Dunigan et al., 2012). The RENxVH motif is found to be conserved within this cluster of kinases and is present in the αC helix in the N-lobe; Glu within this motif is important for orienting the phosphoryl group of ATP during phosphoryl transfer (Taylor, 2012). Sequence alignment of these kinases against close homologues from eukaryotes using BLASTP shows more than 30% sequence similarity to the dual-specificity kinase FUZ7 in Ustilago maydis (corn smut fungus) and the mitogen-activated kinase HOG1 (high osmolarity glycerol response protein 1) in algae. In some of the sequences within this cluster, the conserved Arg of the HRD motif in the catalytic loop is replaced by a small or aliphatic residue, i.e., [G/A/L]. From earlier studies, it is known that such substitutions may not disturb the catalytic activity (Scheeff et al., 2009).

3.1.3 Cluster 3 - Kinases encoded in Baculoviridae

This cluster comprises 68 kinases encoded in lepidopteran-specific nucleopolyhedroviruses (Alphabaculovirus) and lepidopteran-specific granuloviruses (Betabaculovirus) (Jehle et al., 2006). These viruses have arthropods as their natural hosts. However, we did not find kinases encoded in the genomes of the other two types of baculovirus, i.e., hymenopteran-specific nucleopolyhedroviruses (Gammabaculovirus) and dipteran-specific nucleopolyhedroviruses (Deltabaculovirus). From sequence comparison of kinases encoded in alphabaculoviruses and betabaculoviruses, it is observed that betabaculovirus kinases lack the GxGxxG motif in the glycine-rich loop, a triad required for catalysis; a substitution in place of the second glycine can render the kinase non-functional (Hemmer et al., 1997). Kinases of betabaculoviruses also differ from those of alphabaculoviruses by the presence of a conserved proline, P223 [UniProt ID F4ZKN0, Clostera anachoreta granulovirus]. It is also observed that the conserved Pro is preceded by a conserved DNF[D/N]P region in some of the granuloviruses. The protein kinases in this cluster are protein kinase 1 (PK1) proteins, which are encoded in the late phases of virus infection (Liang et al., 2013) and differ from protein kinase 2 (PK2) of baculoviruses in functionality.
PK2 inhibits a heme-regulated inhibitor (HRI)-like kinase of the eukaryotic translation initiation factor eIF2 through a lobe-swap mechanism, thus increasing viral fitness (Li et al., 2015). PK1 in nucleopolyhedroviruses is useful in nucleocapsid assembly (Liang et al., 2013). PK2 lacks the HRD catalytic motif and instead has an HHN motif, whereas PK1 has an HND catalytic motif. PK1 has a D[Y/F]G motif and PK2 has an MFG motif. PK1 of Epinotia aporema granulovirus [UniProt ID K4EQ12], a highly pathogenic granulovirus infecting Epinotia aporema (Ferrelli et al., 2012), has an Asp-Cys-Gly (DCG) motif replacing the D[Y/F]G motif of the magnesium binding loop within the activation segment. As Phe plays a key role in positioning the activation segment, its mutation in the DFG motif can affect the function of the kinase (Nimchuk et al., 2011; Nolen et al., 2004). Also, Cys162 of the DCG motif may form a disulfide bond with either Cys160 or Cys173, or with a Cys in a host protein for nuclear localization (Koutroumani et al., 2017).

3.1.4 Cluster 4 - US3 protein kinases encoded in Alphaherpesvirinae

Cluster 4 consists of 60 US3 kinases of Alphaherpesvirinae, a subfamily of double-stranded DNA viruses belonging to Herpesviridae that infect Aves and mammals. The US3 kinase is conserved throughout the alphaherpesvirus cluster but is not present in other herpesvirus genomes. To understand these kinases in detail, phylogenetic analysis was carried out after clustering the sequences at 70% sequence identity. Figure 5 represents the phylogenetic analysis of US3 kinases encoded in viruses that infect mammals such as human, Bos taurus, feline (cat), equid (horse), phocid (seal), suid (pig), saimiriine (monkey), cercopithecine (macaque) and leporid (rabbit), and Aves such as Gallus gallus (chicken), anatid (duck), and psittacid (parrot). It is interesting to observe that grouping into clades is host-specific, i.e., the viruses that infect fishes, birds, and higher organisms formed three distinct clades. Feline herpesvirus and phocid herpesvirus, which infect cats and seals respectively, are closely related (Martina et al., 2001), and the kinases belonging to these two viruses formed a single clade. In a similar way, the kinases encoded in psittacid herpesvirus, which causes Pacheco's disease in parrots (Thureen and Keeler, 2006), and infectious laryngotracheitis virus, which infects chickens, formed a unique clade. US3 kinases retained the catalytic HRD motif in the catalytic loop and the DFG motif in the activation segment.
The first glycine in the GxGxxG motif of the glycine-rich loop is not conserved in US3, whereas the second and third glycines are conserved, except for two cases where the second glycine is replaced by serine in Gallid herpesvirus 3 (GaHV-3), a non-oncogenic Marek's disease virus serotype 2 (Spatz and Schat, 2011). Earlier kinetic and structural studies of the substitution of serine for the second glycine reported a drastic reduction of kcat and steric repulsion between serine and the phosphate group of ATP (Hemmer et al., 1997). The second glycine, which is conserved in more than 99% of the kinases, is found to be critical for catalysis. It is also interesting to observe that the "glycine-rich" loop in this sequence has a conserved motif PSSEG, with proline replacing the first glycine and serine replacing the second one. A PHI-BLAST search using "PSSEG" as the input pattern was performed against the non-redundant (nr) database to identify kinases having the same pattern. Based on the search results, it has been observed that no other kinase has the same pattern in the glycine-rich loop, indicating a specific functional role of this motif, which could be a reason for the virus being non-oncogenic. Multiple sequence alignment of US3 from Gallid herpesvirus 3 (GaHV-3), GaHV-1 (ILTV), and GaHV-2 shows that US3 from GaHV-3 and GaHV-2 are more than 90% similar, whereas US3 of ILTV has low sequence similarity with other herpesviruses and has many sequence insertions (Figure 6). US3 from Meleagrid herpesvirus 1 (MeHV-1) (turkey herpesvirus), also called Marek's disease virus type 3 (MDV-3), is also considered for sequence comparison, as GaHV-3 along with MeHV-1/MDV-3 is used in bivalent vaccines against MDV-1, i.e., Gallid herpesvirus 2, which is highly oncogenic and causes a contagious lymphoproliferative disorder in chickens (Spatz and Schat, 2011). As seen in Figure 6, US3 from MeHV-1 shares a high sequence similarity with other US3 sequences. The US3 of the porcine alphaherpesvirus PRV was found to be broadly similar to that of HSV-1, HSV-2 and VZV (Daikoku et al., 1993; Eisfeld et al., 2006; Purves and Roizman, 1992) and to the cellular protein kinase A (Benetti and Roizman, 2004; Kato et al., 2009). Lamin A/C and emerin, key elements of the nuclear lamina network, are predicted to be phosphorylated by HSV-1 US3 (Leach et al., 2007; Mou et al., 2007). Phosphorylation of the N terminus of UL31 by HSV-1 US3 (Mou et al., 2009) results in the relocalization of the envelopment machinery. The function of US3 kinase has been well studied (Jacob et al., 2011), and it is known to play a major role in viral replication and pathogenicity by undergoing autophosphorylation and phosphorylating both viral and host proteins. These kinases have conserved motifs similar to those of cyclic AMP (cAMP)-dependent protein kinase (PKA) and also mimic its function by phosphorylating PKA substrates (Erazo et al., 2011).
US3 kinases in psittacid herpesvirus (PSHV) and infectious laryngotracheitis virus (ILTV, Gallid herpesvirus 1), both avian herpesviruses, form a unique clade. Sequence alignment shows that the conserved kinase residues Glu172 and Trp-Lys-Asp (249-251) (UniProt ID Q6UDG0) are found only in kinases of these two viruses and are not present in Marek's disease virus (Gallid herpesvirus 2) and other alphaherpesviruses (Figure 7). It is likely that these residues, conserved only in PSHV and ILTV kinases, contribute to the functional specialization of these kinases. PSHV, which causes Pacheco's disease, has been grouped into the Iltovirus genus based on its similarity to ILTV (Thureen and Keeler, 2006).
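The motif comparisons made for the clusters above (canonical GxGxxG, HRD, DFG and APE versus variants such as PSSEG) can be sketched with regular expressions. The example below is only illustrative: the two sequences are invented toy strings, not viral kinases, and a plain pattern search ignores alignment position, so in practice a match would need to be confirmed against the aligned catalytic domain.

```python
import re

# Canonical catalytic-domain motifs named in the text ("x" = any residue).
CANONICAL_MOTIFS = {
    "glycine_rich_loop_GxGxxG": re.compile(r"G.G..G"),
    "catalytic_loop_HRD": re.compile(r"HRD"),
    "activation_segment_DFG": re.compile(r"DFG"),
    "activation_segment_APE": re.compile(r"APE"),
}


def scan_motifs(sequence):
    """Return {motif name: (start index, matched text)}, or None if absent.

    Only the first occurrence of each pattern is reported.
    """
    found = {}
    for name, pattern in CANONICAL_MOTIFS.items():
        match = pattern.search(sequence)
        found[name] = (match.start(), match.group()) if match else None
    return found


# Invented toy sequences (hypothetical, for illustration only).
toy_canonical = "AALGEGSFGKVHRDLKPDFGLAPE"
toy_variant = "AALPSSEGKVHRDLKPDFGLAPE"   # PSSEG in place of GxGxxG

canonical_report = scan_motifs(toy_canonical)
variant_report = scan_motifs(toy_variant)
```

In the toy variant the glycine-rich loop entry comes back as `None` while the other three motifs are still found, which is the kind of pattern the text describes for the GaHV-3 US3 sequence.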

3.1.5 Cluster 5 - Kinases encoded in retroviruses and giant viruses

Kinases within this cluster belong to viruses that fall into different taxonomic lineages and are also functionally different. This cluster is formed by kinases belonging to retroviruses (either alpha- or gammaretroviruses) and giant viruses. Tyrosine and serine/threonine protein kinases, which are virus transforming proteins encoded in retroviruses, are well studied as they are the original source of oncogenes (Vogt, 2012). A majority of such oncogenes have been found to be kinases, and they share a close similarity with the eukaryotic cellular kinases. However, they differ from their cellular counterparts by deletions or mutations which make them constitutively active. The Src kinases were first identified in Rous sarcoma virus, leading to the discovery of the cellular Src kinases and the coining of the term proto-oncogene (Duesberg, 1983; Vogt, 2012; Zabarovskiĭ, 1985). The C-terminal region in cellular Src that carries a key regulatory phosphorylation site is deleted in viral Src, and this differentiates the viral Src from its cellular counterpart (Vogt, 2012). A comparison of the two proteins showed that the cellular Src has lower kinase activity and negligible oncogenic activity compared with viral Src (Coussens et al., 1985; Iba et al., 1985, 1984). Since the retroviral kinases are very different from the kinases of dsDNA viruses, they have been categorized based on their homology with the eukaryotic cellular kinases. A large number of retroviral kinases share close similarity with the receptor tyrosine kinases. The other eukaryotic subfamilies include MOS, AKT and TKL kinases. Tyrosine kinases in eukaryotes are classified as receptor and non-receptor tyrosine kinases (Blume-Jensen and Hunter, 2001).
Receptor tyrosine kinases (RTKs) mediate cellular responses to a broad array of extracellular signals involved in the regulation of cell proliferation, migration, differentiation, and survival signaling. Phosphorylation of proteins by non-receptor tyrosine kinases is used to control intracellular signals to the nucleus, extracellular signaling, and other cellular processes. The Src family of protein tyrosine kinases plays key roles in regulating signal transduction in a variety of cellular environments. Tyr416 and Tyr527 are known to be two key phosphorylation sites in Src kinase; Tyr416 phosphorylation is stimulatory, and phosphorylation of Tyr527 is inhibitory (Porter et al., 2000). The equivalent residues at the C-terminal tail of v-Src are absent, and the absence of the inhibitory phosphotyrosine from this segment results in a constitutively active form of the enzyme. The viral Src kinase has been shown to phosphorylate various cytoskeletal proteins such as vinculin, glycolytic enzymes, and cytosolic proteins (Wyke and Stoker, 1987). v-Src also interacts with the RNA binding protein Sam68, which participates in signal transduction. V-ERBB, V-ABL, V-YES, V-SRC, V-FES, V-ROS, V-FGR and V-SEA are tyrosine kinases which are neoplastic transforming proteins (Erikson et al., 1978; Czernilofsky et al., 1980; Neil et al., 1981) encoded in avian and feline sarcoma viruses, avian erythroblastosis virus, avian leukemia virus and Abelson murine leukemia virus, whereas the serine/threonine-protein kinase transforming proteins (V-RAF, V-MIL, and V-RMIL genes) (Czernilofsky et al., 1980; Kan et al., 1984) are encoded in avian retrovirus and murine sarcoma virus. V-MOS and V-AKT are other serine/threonine-protein kinase transforming proteins that are grouped into a different cluster. Except for V-AKT, all other serine/threonine-protein kinase transforming proteins are single-domain proteins having only the kinase domain. Most of the tyrosine-protein kinase transforming proteins are multi-domain proteins having an SH2, SH3 or F-BAR domain tethered to the kinase domain. The domain architecture of these kinases is given in Figure 8. ABL1 was first discovered as the oncogene in the Abelson murine leukaemia virus and was later identified as an oncogene associated with chromosome translocations in human leukaemia (Ben-Neriah et al., 1986; Goff et al., 1980). The viral Abl kinase shows striking sequence similarity with its human homologue c-Abl, which is a tyrosine kinase (Devare et al., 1983; Ferguson et al., 1985). Viral Abl kinases are directly involved in the transformation of the host cell, since it has been shown that the oncogenic potential of the virus is decreased with deficiency in kinase activity (Rees-Jones and Goff, 1988; Witte et al., 1980). Abl kinases in Feline sarcoma virus (FSVHY), Abelson murine leukemia virus (MLVAB), Felis catus (FELCA) and Homo sapiens (human) align with more than 99% sequence identity (Figure 9).
Abl kinase in FSVHY (UniProt ID P10447) has a variation at position 347, where alanine is replaced with threonine. Mutation of alanine to threonine at position 337 (Ala337Thr) in human ABL kinase is known to enhance the level of phosphorylation (Wang et al., 2017). In the C-terminal region of FSVHY ABL kinase, there is a deletion of the QAF motif. Substrate binding residues are found to be conserved across ABL kinases from eukaryotic and viral genomes. V-FES, a tyrosine-protein kinase transforming protein Fes (EC 2.7.10.2) encoded in Feline sarcoma virus (strain Gardner-Arnstein) (Ga-FeSV) (Gardner-Arnstein feline leukemia oncovirus B), is compared against its cellular counterpart, FES from Felis catus (cat). In the alignment of the kinase domains, a single substitution of glutamic acid (E716 in the full-length sequence of FES) by alanine is observed in V-FES (Figure 10). This substitution is noted only in the Gardner-Arnstein (GA) strain and not in the Snyder-Theilen (ST) strain, indicating a strain-specific mutation (Roebroek et al., 1987). The substrate binding sites are well conserved across the kinases from both host and virus, indicating functional mimicry by the viral oncogene V-FES. This cluster also contains the viral protein v-erbB, which shares close similarity with the human epidermal growth factor receptor (EGFR). EGFR is composed of three domains: an extracellular epidermal growth factor binding domain, a transmembrane domain, and a cytoplasmic domain that possesses the kinase activity. The viral erbB protein lacks a large portion of the extracellular domain and 32 amino acid residues in the cytoplasmic domain when compared to the eukaryotic erbB (Paez et al., 2004; Riedel et al., 1987). The viral sequence involved in provirus integration shares close similarity with c-erbB and results in transformation of host cells. Studies on the somatic mutation profile of the EGFR protein from lung cancer cells throw light on the involvement of various amino acid residues in the transformation of cells. Protein kinase sequences from Avian erythroblastosis virus, which are categorized as tyrosine kinases, share very high sequence identity with the human EGFR protein kinase domain. Sequence alignment of the human EGFR protein tyrosine kinase domain with the viral protein sequence shows substitution of leucine 861 with glutamine in the activation loop; the same mutation in the EGFR protein tyrosine kinase is implicated in lung cancer (Shigematsu and Gazdar, 2006).

3.1.5.1 Marseilleviridae kinases resemble src kinases. Are these oncogenes?
From the phylogenetic analysis, we observe that kinases from giant viruses such as Mimiviridae and Marseilleviridae, giant viruses that have green algae as their natural hosts (Arantes et al., 2016), are grouped along with retroviral kinases in the same clade and are closer to the serine/threonine-protein kinase transforming proteins, as seen in Figure 4. Giant viruses represented in this cluster include Acanthamoeba polyphaga mimivirus, Marseillevirus, Insectomime virus, Cannes 8 virus, and Pandoravirus dulcis. Marseilleviruses have been found in human lymph nodes, and it has been hypothesized that they could cause Hodgkin's lymphoma (Aherfi et al., 2016). Viruses belonging to the Mimiviridae family have also been shown to be pathogens causing pneumonia (Kutikhin et al., 2014). The kinases encoded in giant viruses within this cluster have two protein kinase domains, and in the Mimiviridae kinase there is an additional domain, guanylate cyclase, tethered between the two kinase domains. Genomes of these viruses are known to evolve through gene loss or gene gain (Filée, 2015). Based on sequence similarity, an RP motif is found to be conserved across all the sequences in the C-terminal tail region of retroviruses and giant viruses (Figure 11). The V-YES, V-SRC, and V-FGR Src kinases, which have SH2 and SH3 domains tethered to the protein kinase (PK) domain, have a unique DPEERP[S/T] motif in the loop between αH and αI, suggesting a functional role for this motif in tethering these two functional domains and stabilizing Src kinases in the closed conformation. The Arg-Pro (RP) motif is a subsequence of this conserved motif. Src kinases have an SH2 domain that binds phosphorylated tyrosine residues, an SH3 domain that mediates intra- and intermolecular interactions by binding to proline-rich sequences, and a C-terminal tail containing a tyrosine that gets phosphorylated. Upon tyrosine phosphorylation, the protein attains the closed conformation, in which the C-terminal tail binds to the SH2 domain and the arrangement is stabilized by the SH3 domain. The impact of mutations of residues in various loop regions and the linker region of Src kinases has been studied extensively (Gonfloni et al., 1997), but the functional significance of the conserved motif identified in this study has not yet been reported. Since v-Src, an oncogene, is constitutively active owing to the lack of the C-terminal tail tyrosine required for negative regulation, it will be interesting to understand the functional role of the residues in the conserved motif.
In Table 2, unique residues conserved across this cluster and conserved residues specific to either retroviruses or giant viruses are listed. These residues are likely to contribute to a specialized function of this cluster of kinases.

3.1.6 Cluster 6 - Kinases encoded in Poxviridae - Shares similarity with eukaryotic kinases CK1 subfamily

Eukaryotic Vaccinia-related kinases (VRK), a family of cellular kinases belonging to the Casein kinase 1 group, have close similarity to the kinases encoded in Poxviridae and thus are well studied (Couñago et al., 2017; Nichols and Traktman, 2004; Olson et al., 2017). The structural and functional similarities between vaccinia virus B1 (vvB1) kinase and human VRK1 (hVRK1), hVRK2 and hVRK3 are well studied, and the regions that are well conserved in the kinase catalytic domain are highlighted in Nichols and Traktman (2004). Viruses that encode these protein kinases contain double-stranded DNA as their genetic material and fall in the Chordopoxvirinae subfamily of the Poxviridae family. Hosts of these viruses are generally mammals. The vaccinia virus B1R gene product has been shown to have kinase activity and is located in the virion particle (Banham and Smith, 1992; Lin et al., 1992). A member of this subfamily phosphorylates and interacts with H5R, a late transcription factor that plays an important role in virion morphogenesis (DeMasi and Traktman, 2000). The vaccinia viral kinase is essential for viral replication, and mutation studies suggest a role for the kinase in DNA replication and viral intermediate gene expression (Kovacs et al., 2001). It phosphorylates ribosomal proteins in vitro as well as during infection, which supports the hypothesis that the virus exploits the host protein synthesis machinery for the synthesis and selective translation of its own proteins. This family of kinases can be recognized by a conserved [S/T]RRGDLE motif within the hydrophobic F-helix (αC-4). The hydrophobic structures, the catalytic C-spine and the regulatory R-spine, anchor at the C-terminus and N-terminus of the F-helix, respectively, thus organizing the kinase core (Taylor and Kornev, 2011). The highly conserved Asp220 (residue numbering with reference to PDB code: 1ATP) in the above motif forms an electrostatic link with the backbone of the HRD motif (Scheeff and Bourne, 2005). These kinases form a unique αC-4 helix between αC and β4, which brings the two lobes of the kinase close to each other and thus helps to maintain the activated state in the closed conformation (Couñago et al., 2017).
The Ser/Thr residue within the [S/T]RRGDLE motif of the F-helix is a known putative regulatory phosphorylation site in the non-catalytic C-terminal region and may aid in the dissociation of the C-terminal domain from the ATP-binding pocket and in the activation of VRK1.

3.1.7 Cluster 7 - UL13 Kinases encoded in Alphaherpesvirus 3

UL13 kinases of Mardivirus, Simplex virus, and Varicella virus, which belong to the Alphaherpesvirinae subfamily, formed a separate cluster. These are sequences from HHV-1, HHV-2, HHV-3, HHV-4, HHV-5, and alphaherpesviruses that have other organisms, such as Meleagrid (turkey), Anatid, Equid, Feline, and Leporid species, as hosts. UL13 kinases from the Human Simplex virus are the best characterized kinases in this cluster (Ng et al., 1998; Purves et al., 1993). These kinases, along with US3, phosphorylate lamins and help in nuclear egress (Cano-Monreal et al., 2009). Like their homologues in the UL97 and U69 subfamilies, they phosphorylate several cellular and viral proteins in addition to undergoing autophosphorylation. UL13 from HHV-1 and HHV-2 hyperphosphorylates EF-1δ and is likely to regulate the translation process in infected cells (Long et al., 1999). Cano-Monreal et al. (2008) performed substrate specificity studies of UL13 kinases in HHV-2 and reported a conserved Ser-Pro (SP) motif essential for autophosphorylation. In our study, analysis of the sequences within this cluster revealed a conserved Phenylalanine-Serine (FS) motif in all the UL13 kinases (Figure 12), indicating a putative functional role in phosphorylation and viral infection. To recognize eukaryotic homologs of these kinases, BLASTP was used to search against the nr database; no significant hits were found, indicating a possible unique role for these kinases in viral infection.

3.1.8 Cluster 8 - Kinases from Mimiviridae and Unclassified viruses

This cluster is formed by putative serine/threonine kinases from unclassified viruses belonging to the genera Mimivirus, Megavirus, Moumouvirus, and Mamavirus. These are related giant viruses belonging to the Mimiviridae family and have the largest viral genomes, of more than 1 megabase and with more than 1000 genes (Colson et al., 2011; Yoosuf et al., 2012). These kinases may have a different functional role compared to those grouped in Cluster 5, though both are encoded in giant viruses. Genomes of Acanthamoeba polyphaga, A. castellanii and Megavirus chilensis belonging to Mimiviridae are well characterized. The hosts of these viruses inhabit marine, freshwater, or soil environments. These viruses are named mimivirus as they mimic microbes (Raoult et al., 2007) in their genome size and dimensions. A study (Raoult et al., 2004) reported that 14 ORFs of mimiviruses are similar to eukaryotic protein kinases and resemble cell-division related kinases.
3.1.9 Cluster 9 - UL97 Kinases encoded in Human cytomegalovirus (HCMV)

This cluster contains kinases from members of the genera Cytomegalovirus and Muromegalovirus and from Tupaiid herpesvirus 1, grouped under the Betaherpesvirinae subfamily, notably Human herpesvirus 5 (HHV-5). These viral kinases do not share close similarity with any eukaryotic or prokaryotic sequences, indicating a functional niche for these viral kinases. Host-specific sequence differences are identified within this cluster. The UL97 protein from HCMV has been well characterized, as it acts as an antiviral target for the prevention of infection in humans (Krosky et al., 2003). The UL97 kinase has been found to phosphorylate a number of viral and cellular proteins. The cellular proteins that are targeted by the UL97 kinase are also targeted by cellular CDK1/cdc2. The UL97 kinase is also required during nuclear egress, which is accomplished by the phosphorylation of p32, as well as lamins A and C (Kawaguchi and Kato, 2003). Another important target appears to be the retinoblastoma protein Rb, which is normally hyperphosphorylated in infected cells (Prichard, 2009). Rb is a tumor suppressor, and its phosphorylation results in the modulation of the cell cycle.

3.1.10 Cluster 10 - Uncharacterized proteins encoded in Phycodnaviridae and Unclassified viruses

Kinases from Micromonas pusilla virus and Ostreococcus lucimarinus virus, belonging to the Prasinovirus genus of Phycodnaviridae, formed a discrete cluster. Though not much is known about these viruses, a recent study on prasinovirus attack of Ostreococcus reported that these viruses are dormant, with low viral replication, during the daytime and become more aggressive as the night progresses (Derelle et al., 2018). The authors reported the expression levels of host and viral genes at the time of infection. From their study, it can be seen that the expression levels of genes encoding serine/threonine kinases are high in phase II, i.e., 11 to 27 hours post infection, when host replication stops and viral transcription is high.

3.1.11 Cluster 11 - Kinases encoded in Iridoviridae

This cluster is formed by kinases from the genus Ranavirus, which belongs to the Alphairidovirinae subfamily of the Iridoviridae family. They are known to infect Ambystoma (mole salamanders) and Hoplobatrachus tigerinus (Indian bullfrog, Rana tigrina). This cluster also contains many uncharacterized proteins from iridoviruses that infect invertebrates such as insects (İnce et al., 2018).
Kinases in Iridoviridae are known to induce apoptosis and cause host protein shutoff, and thus can be used as biopesticides (Chitnis et al., 2011; İnce et al., 2018). In BLASTP searches of these kinase sequences against the nr database, the kinases from Iridoviridae show nearly 95% sequence overlap with Iridoviridae Lymphocystis disease virus 1, which causes tumour-like growths on the skin of fish (Zhang et al., 2004), but no significant sequence similarity with kinases from other viral families or from eukaryotes. Surprisingly, a sequence similarity of 67% over a 100% overlap region was found against a hypothetical protein of Flavobacterium sp. JRM (WP_039120762). Both these kinases share a common feature, namely the presence of a 2-cysteine adaptor domain. Flavobacterium is a pathogenic bacterium known to cause diseases in fish, leading to high mortality (Loch and Faisal, 2015), and is also known to be pathogenic to amphibians (Densmore and Green, 2007). Considering the similarity of kinases from Iridoviridae with the kinases encoded in lymphocystivirus and Flavobacterium, it can be hypothesized that these kinases have a similar functional role and may play a crucial role in pathogenesis. The catalytic loop of Iridoviridae kinases has an HND motif, and the activation loop has DYG and G[T/K]E motifs. These kinases also have a conserved signature motif, PGGYWKVTDS, between the catalytic loop and the activation loop. This motif is unique to Iridoviridae kinases and the hypothetical protein of Flavobacterium sp. JRM (WP_039120762) and may act as a marker.

3.2 Putative substrate binding regions in viral kinases

In one of our previous studies, we developed a method to classify kinases into different subfamilies based on their conservation of both sequence and substrate binding regions using three-dimensional structural alignment (Janaki et al., 2016). Crystal structures of substrate-bound kinases belonging to AGC (PKA, PKB/AKT, PKC), CAMK (PHK, PIM), CMGC (CDK2), tyrosine kinases (TK) and others (Haspin) were aligned using PROMALS3D (Pei et al., 2008), and a 3D structural alignment was generated. In our earlier study, the substrate binding residues in kinases were identified by considering all side-chain-side-chain interactions between kinases and their respective substrates. The substrate binding residues were mapped onto the profile generated by aligning 3D structures of kinases belonging to different groups. From the profile, we observed that the binding residues can be mapped to five different subdomains of kinases, which we define as blocks in this study. To identify putative substrate binding regions in viral kinases, the multiple sequence alignment profile of each viral kinase cluster was aligned to the 3D alignment, and the regions aligned to the five block regions were analyzed.
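To make the idea of block-wise conservation concrete, the following Python sketch computes, for a chosen range of alignment columns, the most frequent residue and the per-column information content in bits (log2 20 minus the Shannon entropy, without small-sample correction), which is the quantity usually displayed in sequence logos. The block coordinates, function names, and toy alignment are hypothetical and do not correspond to the five blocks or the alignments of this study.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20, gap="-"):
    """Information content (bits) of one alignment column, ignoring gaps."""
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total)
                   for n in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


def block_profile(alignment, start, end, gap="-"):
    """Return (most frequent residue, bits) for columns start..end-1 (0-based)."""
    profile = []
    for col in range(start, end):
        column = "".join(seq[col] for seq in alignment)
        residues = [c for c in column if c != gap]
        consensus = Counter(residues).most_common(1)[0][0] if residues else gap
        profile.append((consensus, round(column_information(column, gap=gap), 2)))
    return profile


# Toy alignment of a glycine-rich-loop-like fragment (invented for illustration).
toy_alignment = ["GTGSFG", "GEGAYG", "GQGTFG"]
for residue, bits in block_profile(toy_alignment, 0, 3):
    print(residue, bits)
```

Fully conserved columns reach the maximum of log2 20 (about 4.32 bits), whereas variable columns score lower; with only a few sequences per cluster the uncorrected values overestimate conservation, so they are best read as relative rather than absolute.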
These five block regions are represented as sequence logos to highlight the conserved residues. Sequence logos of these regions in viral kinases belonging to the first six clusters are given in Supplementary Table 4. The conservation pattern was not found to be significant in the other five clusters (data not shown). From our analysis, we observe that in some of the viral kinases, the residues known to be involved in substrate binding in eukaryotic kinases are conserved. This could be due to the same substrate being phosphorylated by host and viral kinases. For example, the glycine-rich loop in many eukaryotic kinases has a conserved GX[F/Y] motif, and GXF is a Cdc37-interacting motif (Terasawa et al., 2006). The GX[F/Y] motif is found in PK1 kinases of Baculoviridae and GXF is found in retroviral kinases, indicating a significant role in substrate binding (Block 1 of Supplementary Table 4). Similarly, Glu230 in cAMP-dependent protein kinase (PKA) (PDB code: 1ATP) plays a key role in substrate recognition by interacting with the P-2 arginine in the PKA substrate (Moore et al., 2003). Glu230 is found to be well conserved in PK1 of baculovirus, retroviral kinases and UL13 of Alphaherpesvirus. Gly200, which interacts with the backbone of the phosphorylation site residue in the substrate (Hsu and Traugh, 2010), is well conserved in Ser/Thr protein kinases of Phycodnaviridae, PK1 of Baculoviridae, UL13 of Alphaherpesvirus and STPK of Poxviridae (Block 4 of Supplementary Table 4). Profiles of some viral kinases, such as UL13 and UL97 from Herpesviridae, did not align well with the profile of the structural alignment in a few subdomain regions owing to high sequence divergence. The signature motifs identified in each block of the substrate binding region in viral kinases encoded in different genomes can be further used to classify uncharacterized viral protein kinases. Viruses are known to have conserved short linear motifs for substrate recognition, host cell interaction, viral replication and immune response (Becerra et al., 2017; Sobhy, 2016). From our study, it is observed that, in each cluster, there are conserved residues within each substrate binding block, and in a few of the kinases there are conserved motifs that can be used as signatures for characterizing kinases of unknown function. For example, among the PBCV-specific basic domain-containing proteins, a WYNPDYK motif is conserved within block 5 of most of these proteins, indicating a functional role in substrate binding. Similarly, in protein kinases encoded in Poxviridae, there is a WLxGxLPW motif conserved in block 5 of the substrate binding region, indicating a significant role in substrate recognition.
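Signature motifs of the kind noted in this work lend themselves to simple pattern searches. The sketch below, given in Python purely as an illustration, converts the bracket and x notation used here into regular expressions and reports 1-based match positions in a protein sequence. The helper names and the toy sequence are hypothetical; this is not the search procedure used in this study and it does not replace profile-based methods.

```python
import re

# Motifs quoted in the text, written in the notation used in this paper.
MOTIFS = ["DPEERP[S/T]", "[S/T]RRGDLE", "WLxGxLPW", "WYNPDYK"]


def motif_to_regex(motif):
    """Convert a motif such as '[S/T]RRGDLE' or 'WLxGxLPW' into a compiled regex.

    '[A/B]' means either residue; 'x' or 'X' means any residue.
    """
    parts = []
    i = 0
    while i < len(motif):
        ch = motif[i]
        if ch == "[":
            end = motif.index("]", i)
            parts.append("[" + motif[i + 1:end].replace("/", "") + "]")
            i = end + 1
        elif ch in "xX":
            parts.append("[A-Z]")  # wildcard position
            i += 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts))


def find_motifs(sequence, motifs=MOTIFS):
    """Map each motif to the list of 1-based start positions where it occurs."""
    seq = sequence.upper()
    return {m: [hit.start() + 1 for hit in motif_to_regex(m).finditer(seq)]
            for m in motifs}


# Toy sequence, invented for illustration (not a real viral kinase).
print(find_motifs("MKTDPEERPSAAWLAGALPWQQ"))
```

An exact pattern match of this kind only flags candidates; whether a hit lies in the expected structural region still has to be checked against an alignment.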
A PHI-BLAST search against the nr database using the WLxGxLPW pattern reported hits to VRK kinases in eukaryotes. The structural similarity studies of VRK kinases and vvB1 kinases (Nichols and Traktman, 2004) also reported this motif to be conserved. This analysis can be extended to characterize and annotate kinases of unknown function (uncharacterized) from viral genomes.

4. Conclusions

Phosphorylation by viral kinases plays significant roles in the pathogenesis of many viruses. The present analysis recognizes putative Ser/Thr and Tyr kinases encoded in the genomes of viruses. The majority of the identified protein kinases are encoded in herpesviruses, which infect animals, and in Phycodnaviridae, which infect marine and freshwater algae. The kinases of viruses belonging to the same taxonomic lineage form discrete clusters, and each cluster forms a viral kinase subfamily. Kinases of poxviruses and retroviruses are found to have high similarity with their eukaryotic counterparts, whereas those encoded in chloroviruses, herpesviruses, and iridoviruses are found to be virus-specific. Paramecium bursaria chlorovirus (PBCV)-specific basic adaptor domain-containing proteins formed two different clusters. The Phycodnaviridae family of double-stranded DNA viruses contains more than 30 species, but this dataset is represented mainly by PBCV species. Another cluster is formed by PK1 kinases encoded in Alpha- and Betabaculoviruses, which have arthropods as their natural hosts. US3 protein kinases encoded in the Alphaherpesvirinae subfamily form a discrete cluster with host specificity. One of the interesting observations in this study is that some of the putative protein kinases of giant viruses such as Marseilleviridae, Mimiviridae, Cannes 8 virus and Insectomime virus cluster along with retroviral kinases. High sequence similarity between retroviral oncogenic proteins and the putative protein kinases of giant viruses may indicate a probable role of the latter in pathogenicity. The kinases encoded in Iridoviridae are found to have no close eukaryotic homolog but have more than 30% sequence similarity to the hypothetical protein encoded in Flavobacterium. Both Flavobacterium and Iridoviridae Lymphocystis disease virus 1 are highly pathogenic and are known to cause diseases in fish, leading to high mortality. Viral protein kinases that do not share substantial sequence similarity with kinases from other sources are likely to have virus-specific roles and hence can be considered attractive drug targets.
There are substrate binding residues that are conserved across viral and eukaryotic protein kinases, for example in VRK of Poxviridae, PK1 of Baculovirus and in retroviral kinases. However, there are also residues that are specific to a particular cluster of viral genomes and distinct from the others. Such regions can be used as signatures to classify uncharacterized viral kinases, and a similar study can be extended to eukaryotic and bacterial kinases.

Acknowledgement: This research was supported by the Indian Institute of Science-Department of Biotechnology partnership program as well as by the Mathematical Biology project sponsored by the Department of Science and Technology (DST) and the Indo-French Centre for the Promotion of Advanced Research (IFCPAR / CEFIPRA) grant (5203-2). Support for infrastructural facilities from the Fund for Improvement of Science and Technology infrastructure (FIST), DST, the Ministry of Human Resource Development (MHRD) and the Centre for Advanced Study (CAS), University Grants Commission (UGC) is also acknowledged. NS is a J C Bose National Fellow supported by DST. MM was supported by the Kothari Fellowship. CJ would like to acknowledge C-DAC for providing computational resources of PARAM Yuva to carry out this work.


References Aherfi, S., Colson, P., Audoly, G., Nappez, C., Xerri, L., Valensi, A., Million, M., Lepidi, H., Costello, R., Raoult, D., 2016. Marseillevirus in lymphoma: a giant in the lymph node. Lancet Infect. Dis. 16, e225–e234. https://doi.org/10.1016/S1473-3099(16)30051-2 Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., 1990. Basic local alignment search tool. J. Mol. Biol. 215, 403–410. https://doi.org/10.1016/S0022-2836(05)80360-2 Anamika, Srinivasan, N., Krupa, A., 2005. A genomic perspective of protein kinases in Plasmodium falciparum. Proteins Struct. Funct. Bioinforma. 58, 180–189. https://doi.org/10.1002/prot.20278 Arantes, T.S., Rodrigues, R.A.L., dos Santos Silva, L.K., Oliveira, G.P., de Souza, H.L., Khalil, J.Y., de Oliveira, D.B., Torres, A.A., da Silva, L.L., Colson, P., others, 2016. The large Marseillevirus explores different entry pathways by forming giant infectious vesicles. J. Virol. 90, 5246–5255. Artim, S.C., Mendrola, J.M., Lemmon, M.A., 2012. Assessing the range of kinase autoinhibition mechanisms in the insulin receptor family. Biochem. J. 448, 213–220. https://doi.org/10.1042/BJ20121365 Ashkenazy, H., Abadi, S., Martz, E., Chay, O., Mayrose, I., Pupko, T., Ben-Tal, N., 2016. ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules. Nucleic Acids Res. 44, W344–W350. https://doi.org/10.1093/nar/gkw408 Banham, A.H., Smith, G.L., 1992. Vaccinia virus gene B1R encodes a 34-kDa serine/threonine protein kinase that localizes in cytoplasmic factories and is packaged into virions. Virology 191, 803–812. Becerra, A., Bucheli, V.A., Moreno, P.A., 2017. Prediction of virus-host protein-protein interactions mediated by short linear motifs. BMC Bioinformatics 18, 163. https://doi.org/10.1186/s12859-0171570-7 Benetti, L., Roizman, B., 2004. Herpes simplex virus protein kinase US3 activates and functionally overlaps protein kinase A to block apoptosis. Proc. Natl. Acad. Sci. U. S. A. 
101, 9411–9416. https://doi.org/10.1073/pnas.0403160101 Ben-Neriah, Y., Daley, G., Mes-Masson, A., Witte, O., Baltimore, D., 1986. The chronic myelogenous leukemia-specific P210 protein is the product of the bcr/abl hybrid gene. Science 233, 212. https://doi.org/10.1126/science.3460176 Blume-Jensen, P., Hunter, T., 2001. Oncogenic kinase signalling. Nature 411, 355. Cano-Monreal, G.L., Wylie, K.M., Cao, F., Tavis, J.E., Morrison, L.A., 2009. Herpes simplex virus 2 UL13 protein kinase disrupts nuclear lamins. Virology 392, 137–147. https://doi.org/10.1016/j.virol.2009.06.051


Chitnis, N.S., Paul, E.R., Lawrence, P.K., Henderson, C.W., Ganapathy, S., Taylor, P.V., Virdi, K.S., D’Costa, S.M., May, A.R., Bilimoria, S.L., 2011. A Virion-Associated Protein Kinase Induces Apoptosis. J. Virol. 85, 13144–13152. https://doi.org/10.1128/JVI.05294-11 Cohen, P., 2002. Protein kinases - The major drug targets of the twenty-first century? https://doi.org/10.1038/nrd773 Collett, M.S., Erikson, R.L., 1978. Protein kinase activity associated with the avian sarcoma virus src gene product. Proc. Natl. Acad. Sci. U. S. A. 75, 2021–2024. Colson, P., De Lamballerie, X., Yutin, N., Asgari, S., Bigot, Y., Bideshi, D.K., Cheng, X.-W., Federici, B.A., Van Etten, J.L., Koonin, E.V., La Scola, B., Raoult, D., 2013. “Megavirales”, a proposed new order for eukaryotic nucleocytoplasmic large DNA viruses. Arch. Virol. 158, 2517–2521. https://doi.org/10.1007/s00705-013-1768-6 Colson, P., Yutin, N., Shabalina, S.A., Robert, C., Fournous, G., La Scola, B., Raoult, D., Koonin, E.V., 2011. Viruses with More Than 1,000 Genes: Mamavirus, a New Acanthamoeba polyphaga mimivirus Strain, and Reannotation of Mimivirus Genes. Genome Biol. Evol. 3, 737–742. https://doi.org/10.1093/gbe/evr048 Couñago, R.M., Allerston, C.K., Savitsky, P., Azevedo, H., Godoi, P.H., Wells, C.I., Mascarello, A., de Souza Gama, F.H., Massirer, K.B., Zuercher, W.J., Guimarães, C.R.W., Gileadi, O., 2017. Structural characterization of human Vaccinia-Related Kinases (VRK) bound to small-molecule inhibitors identifies different P-loop conformations. Sci. Rep. 7, 7501. https://doi.org/10.1038/s41598-017-07755y Coussens, P.M., Cooper, J.A., Hunter, T., Shalloway, D., 1985. Restriction of the in vitro and in vivo tyrosine protein kinase activities of pp60c-src relative to pp60v-src. Mol. Cell. Biol. 5, 2753–2763. Czernilofsky, A.P., Levinson, A.D., Varmus, H.E., Bishop, J.M., Tischer, E., Goodman, H.M., 1980. 
Nucleotide sequence of an avian sarcoma virus oncogene (src) and proposed amino acid sequence for gene product. Nature 287, 198–203. Daikoku, T., Yamashita, Y., Tsurumi, T., Maeno, K., Nishiyama, Y., 1993. Purification and biochemical characterization of the protein kinase encoded by the US3 gene of herpes simplex virus type 2. Virology 197, 685–694. https://doi.org/10.1006/viro.1993.1644 DeMasi, J., Traktman, P., 2000. Clustered charge-to-alanine mutagenesis of the vaccinia virus H5 gene: isolation of a dominant, temperature-sensitive mutant with a profound defect in morphogenesis. J. Virol. 74, 2393–2405. Densmore, C.L., Green, D.E., 2007. Diseases of Amphibians. ILAR J. 48, 235–254. https://doi.org/10.1093/ilar.48.3.235 Derelle, E., Yau, S., Moreau, H., Grimsley, N.H., 2018. Prasinovirus Attack of Ostreococcus Is Furtive by Day but Savage by Night. J. Virol. 92, e01703-17. https://doi.org/10.1128/JVI.01703-17

Devare, S.G., Reddy, E.P., Law, J.D., Robbins, K.C., Aaronson, S.A., 1983. Nucleotide sequence of the simian sarcoma virus genome: demonstration that its acquired cellular sequences encode the transforming gene product p28sis. Proc. Natl. Acad. Sci. U. S. A. 80, 731–735. Dixit, A., Verkhivker, G.M., 2014. Structure-Functional Prediction and Analysis of Cancer Mutation Effects in Protein Kinases. Comput. Math. Methods Med. 2014, 653487. https://doi.org/10.1155/2014/653487 Duesberg, P.H., 1983. Retroviral transforming genes in normal cells? Nature 304, 219–226. Dunigan, D.D., Cerny, R.L., Bauman, A.T., Roach, J.C., Lane, L.C., Agarkova, I.V., Wulser, K., YanaiBalser, G.M., Gurnon, J.R., Vitek, J.C., Kronschnabel, B.J., Jeanniard, A., Blanc, G., Upton, C., Duncan, G.A., McClung, O.W., Ma, F., Van Etten, J.L., 2012. Paramecium bursaria Chlorella Virus 1 Proteome Reveals Novel Architectural and Regulatory Features of a Giant Virus. J. Virol. 86, 8821– 8834. https://doi.org/10.1128/JVI.00907-12 Dunigan, D.D., Fitzgerald, L.A., Van Etten, J.L., 2006. Phycodnaviruses: a peek at genetic diversity. Virus Res. 117, 119—132. https://doi.org/10.1016/j.virusres.2006.01.024 Eddy, S.R., 2011. Accelerated Profile HMM Searches. PLOS Comput. Biol. 7, e1002195. https://doi.org/10.1371/journal.pcbi.1002195 Eisfeld, A.J., Turse, S.E., Jackson, S.A., Lerner, E.C., Kinchington, P.R., 2006. Phosphorylation of the varicella-zoster virus (VZV) major transcriptional regulatory protein IE62 by the VZV open reading frame 66 protein kinase. J. Virol. 80, 1710–1723. https://doi.org/10.1128/JVI.80.4.1710-1723.2006 Erazo, A., Yee, M.B., Banfield, B.W., Kinchington, P.R., 2011. The Alphaherpesvirus US3/ORF66 Protein Kinases Direct Phosphorylation of the Nuclear Matrix Protein Matrin 3. J. Virol. 85, 568–581. https://doi.org/10.1128/JVI.01611-10 Erikson, E., Collett, M.S., Erikson, R.L., 1978. In vitro synthesis of a functional avian sarcoma virus transforming-gene product. Nature 274, 919–921. 
Fauquet, C., Fargette, D., 2005. International Committee on Taxonomy of Viruses and the 3,142 unassigned species. Virol. J. 2, 64. https://doi.org/10.1186/1743-422X-2-64 Ferguson, B., Pritchard, M.L., Feild, J., Rieman, D., Greig, R.G., Poste, G., Rosenberg, M., 1985. Isolation and analysis of an Abelson murine leukemia virus-encoded tyrosine-specific kinase produced in Escherichia coli. J. Biol. Chem. 260, 3652–3657. Ferrelli, M.L., Salvador, R., Biedma, M.E., Berretta, M.F., Haase, S., Sciocco-Cap, A., Ghiringhelli, P.D., Romanowski, V., 2012. Genome of Epinotia aporema granulovirus (EpapGV), a polyorganotropic fast killing betabaculovirus with a novel thymidylate kinase gene. BMC Genomics 13, 548. https://doi.org/10.1186/1471-2164-13-548 Filée, J., 2015. Genomic comparison of closely related Giant Viruses supports an accordion-like model of evolution. Front. Microbiol. 6. https://doi.org/10.3389/fmicb.2015.00593

Gershburg, S., Geltz, J., Peterson, K.E., Halford, W.P., Gershburg, E., 2015. The UL13 and US3 Protein Kinases of Herpes Simplex Virus 1 Cooperate to Promote the Assembly and Release of Mature, Infectious Virions. PLOS ONE 10, e0131420. https://doi.org/10.1371/journal.pone.0131420 Goff, S.P., Gilboa, E., Witte, O.N., Baltimore, D., 1980. Structure of the Abelson murine leukemia virus genome and the homologous cellular gene: studies with cloned viral DNA. Cell 22, 777–785. Gonfloni, S., Williams, J.C., Hattula, K., Weijland, A., Wierenga, R.K., Superti-Furga, G., 1997. The role of the linker between the SH2 domain and catalytic domain in the regulation and function of Src. EMBO J. 16, 7261–7271. https://doi.org/10.1093/emboj/16.24.7261 Gooding, A.J., Schiemann, W.P., 2016. Harnessing protein kinase A activation to induce mesenchymalepithelial programs to eliminate chemoresistant, tumor-initiating breast cancer cells. Transl. Cancer Res. 5, S226–S232. https://doi.org/10.21037/tcr.2016.08.09 Hanks, S.K., Hunter, T., 1995. Protein kinases 6. The eukaryotic protein kinase superfamily: kinase (catalytic) domain structure and classification. FASEB J. 9, 576–596. Hardie, D.G., 1990. Roles of protein kinases and phosphatases in signal transduction. Symp. Soc. Exp. Biol. 44, 241–255. Hemmer, W., McGlone, M., Tsigelny, I., Taylor, S.S., 1997. Role of the glycine triad in the ATPbinding site of cAMP-dependent protein kinase. J. Biol. Chem. 272, 16946–16954. Henneke, G., Koundrioukoff, S., Hübscher, U., 2003. Multiple roles for kinases in DNA replication. EMBO Rep. 4, 252–256. https://doi.org/10.1038/sj.embor.embor774 Hsu, Y.-H., Traugh, J.A., 2010. Reciprocally Coupled Residues Crucial for Protein Kinase Pak2 Activity Calculated by Statistical Coupling Analysis. PLOS ONE 5, e9455. https://doi.org/10.1371/journal.pone.0009455 Iba, H., Cross, F.R., Garber, E.A., Hanafusa, H., 1985. Low level of cellular protein phosphorylation by nontransforming overproduced p60c-src. Mol. Cell. 
Biol. 5, 1058–1066. Iba, H., Takeya, T., Cross, F.R., Hanafusa, T., Hanafusa, H., 1984. Rous sarcoma virus variants that carry the cellular src gene instead of the viral src gene cannot transform chicken embryo fibroblasts. Proc. Natl. Acad. Sci. U. S. A. 81, 4424–4428. İnce, İ.A., Özcan, O., Ilter-Akulke, A.Z., Scully, E.D., Özgen, A., 2018. Invertebrate Iridoviruses: A Glance over the Last Decade. Viruses 10, 161. https://doi.org/10.3390/v10040161 Iyer, L.M., Balaji, S., Koonin, E.V., Aravind, L., 2006. Evolutionary genomics of nucleo-cytoplasmic large DNA viruses. Comp. Genomics Evol. Complex Viruses 117, 156–184. https://doi.org/10.1016/j.virusres.2006.01.009 Jacob, T., Van den Broeke, C., Favoreel, H.W., 2011. Viral Serine/Threonine Protein Kinases. J. Virol. 85, 1158–1173. https://doi.org/10.1128/JVI.01369-10 27

Janaki, C., Srinivasan, N., Manoharan, M., 2016. Classification of Protein Kinases Influenced by Conservation of Substrate Binding Residues. Methods Mol. Biol. Clifton NJ 1415, 301–313. https://doi.org/10.1007/978-1-4939-3572-7_15 Jeanniard, A., Dunigan, D.D., Gurnon, J.R., Agarkova, I.V., Kang, M., Vitek, J., Duncan, G., McClung, O.W., Larsen, M., Claverie, J.-M., Van Etten, J.L., Blanc, G., 2013. Towards defining the chloroviruses: a genomic journey through a genus of large DNA viruses. BMC Genomics 14, 158. https://doi.org/10.1186/1471-2164-14-158 Jehle, J.A., Blissard, G.W., Bonning, B.C., Cory, J.S., Herniou, E.A., Rohrmann, G.F., Theilmann, D.A., Thiem, S.M., Vlak, J.M., 2006. On the classification and nomenclature of baculoviruses: a proposal for revision. Arch. Virol. 151, 1257–1266. https://doi.org/10.1007/s00705-006-0763-6 Kan, N., Flordellis, C., Mark, G., Duesberg, P., Papas, T., 1984. A common onc gene sequence transduced by avian carcinoma virus MH2 and by murine sarcoma virus 3611. Science 223, 813—816. https://doi.org/10.1126/science.6320371 Kato, A., Arii, J., Shiratori, I., Akashi, H., Arase, H., Kawaguchi, Y., 2009. Herpes simplex virus 1 protein kinase Us3 phosphorylates viral envelope glycoprotein B and regulates its expression on the cell surface. J. Virol. 83, 250–261. https://doi.org/10.1128/JVI.01451-08 Keating, J.A., Striker, R., 2012. Phosphorylation events during viral infections provide potential therapeutic targets. Rev. Med. Virol. 22, 166–181. https://doi.org/10.1002/rmv.722 Kishino, H., Miyata, T., Hasegawa, M., 1990. Maximum likelihood inference of protein phylogeny and the origin of chloroplasts. J. Mol. Evol. 31, 151–160. https://doi.org/10.1007/BF02109483 Kornev, A.P., Taylor, S.S., 2010. Defining the Conserved Internal Architecture of a Protein Kinase. Biochim. Biophys. Acta 1804, 440–444. 
https://doi.org/10.1016/j.bbapap.2009.10.017 Kostich, M., English, J., Madison, V., Gheyas, F., Wang, L., Qiu, P., Greene, J., Laz, T.M., 2002. Human members of the eukaryotic protein kinase family. Genome Biol. 3, research0043.1research0043.12. Koutroumani, M., Papadopoulos, G.E., Vlassi, M., Nikolakaki, E., Giannakouros, T., 2017. Evidence for disulfide bonds in SR Protein Kinase 1 (SRPK1) that are required for activity and nuclear localization. PLOS ONE 12, e0171328. https://doi.org/10.1371/journal.pone.0171328 Kovacs, G.R., Vasilakis, N., Moss, B., 2001. Regulation of Viral Intermediate Gene Expression by the Vaccinia Virus B1 Protein Kinase. J. Virol. 75, 4048–4055. https://doi.org/10.1128/JVI.75.9.40484055.2001 Krosky, P.M., Baek, M.-C., Coen, D.M., 2003. The Human Cytomegalovirus UL97 Protein Kinase, an Antiviral Drug Target, Is Required at the Stage of Nuclear Egress. J. Virol. 77, 905–914. https://doi.org/10.1128/JVI.77.2.905-914.2003

28

Krupa, A., R Abhinandan, K., Srinivasan, N., 2004. KinG: A database of protein kinases in genomes. https://doi.org/10.1093/nar/gkh019 Krupa, A., Srinivasan, N., 2005. Diversity in domain architectures of Ser/Thr kinases and their homologues in prokaryotes. BMC Genomics 6, 129. https://doi.org/10.1186/1471-2164-6-129 Krupa, A., Srinivasan, N., 2002. The repertoire of protein kinases encoded in the draft version of the human genome: atypical variations and uncommon domain combinations. Genome Biol. 3, research0066.1. https://doi.org/10.1186/gb-2002-3-12-research0066 Kutikhin, A.G., Yuzhalin, A.E., Brusina, E.B., 2014. Mimiviridae, Marseilleviridae, and virophages as emerging human pathogens causing healthcare-associated infections. GMS Hyg. Infect. Control 9. https://doi.org/10.3205/dgkh000236 Leach, N., Bjerke, S.L., Christensen, D.K., Bouchard, J.M., Mou, F., Park, R., Baines, J., Haraguchi, T., Roller, R.J., 2007. Emerin is hyperphosphorylated and redistributed in herpes simplex virus type 1infected cells in a manner dependent on both UL34 and US3. J. Virol. 81, 10792–10803. https://doi.org/10.1128/JVI.00196-07 Leader, D.P., 1993. Viral protein kinases and protein phosphatases. Pharmacol. Ther. 59, 343–389. https://doi.org/10.1016/0163-7258(93)90075-O Leader, D.P., Katan, M., 1988. Viral Aspects of Protein Phosphorylation. J. Gen. Virol. 69, 1441–1464. https://doi.org/10.1099/0022-1317-69-7-1441 Lee, J.-Y., Lucas, W.J., 2001. Phosphorylation of viral movement proteins – regulation of cell-to-cell trafficking. Trends Microbiol. 9, 5–8. https://doi.org/10.1016/S0966-842X(00)01901-6 Li, J.J., Cao, C., Fixsen, S.M., Young, J.M., Ono, C., Bando, H., Elde, N.C., Katsuma, S., Dever, T.E., Sicheri, F., 2015. Baculovirus protein PK2 subverts eIF2α kinase function by mimicry of its kinase domain C-lobe. Proc. Natl. Acad. Sci. U. S. A. 112, E4364–E4373. 
https://doi.org/10.1073/pnas.1505481112 Liang, C., Li, M., Dai, X., Zhao, S., Hou, Y., Zhang, Y., Lan, D., Wang, Y., Chen, X., 2013. Autographa californica multiple nucleopolyhedrovirus PK-1 is essential for nucleocapsid assembly. Virology 443, 349–357. https://doi.org/10.1016/j.virol.2013.05.025 Lin, S., Chen, W., Broyles, S.S., 1992. The vaccinia virus B1R gene product is a serine/threonine protein kinase. J. Virol. 66, 2717–2723. Loch, T.P., Faisal, M., 2015. Emerging flavobacterial infections in fish: A review. J. Adv. Res. 6, 283– 300. https://doi.org/10.1016/j.jare.2014.10.009 Long, M.C., Leong, V., Schaffer, P.A., Spencer, C.A., Rice, S.A., 1999. ICP22 and the UL13 protein kinase are both required for herpes simplex virus-induced modification of the large subunit of RNA polymerase II. J. Virol. 73, 5593–5604. 29

Manning, G., Whyte, D.B., Martinez, R., Hunter, T., Sudarsanam, S., 2002. The Protein Kinase Complement of the Human Genome. Science 298, 1912–1934. https://doi.org/10.1126/science.1075762 Marchler-Bauer, A., Bryant, S.H., 2004. CD-Search: protein domain annotations on the fly. Nucleic Acids Res. 32, W327–W331. https://doi.org/10.1093/nar/gkh454 Martina, B.E.E., Airikkala, M.I., Harder, T.C., Amerongen, G. van, Osterhaus, A.D.M.E., 2001. A candidate phocid herpesvirus vaccine that provides protection against feline herpesvirus infection. Vaccine 20, 943–948. https://doi.org/10.1016/S0264-410X(01)00378-4 Mellon, P.L., Clegg, C.H., Correll, L.A., McKnight, G.S., 1989. Regulation of transcription by cyclic AMP-dependent protein kinase. Proc. Natl. Acad. Sci. U. S. A. 86, 4887–4891. Min, X., Lee, B.-H., Cobb, M.H., Goldsmith, E.J., 2004. Crystal Structure of the Kinase Domain of WNK1, a Kinase that Causes a Hereditary Form of Hypertension. Structure 12, 1303–1311. https://doi.org/10.1016/j.str.2004.04.014 Mistry, J., Finn, R.D., Eddy, S.R., Bateman, A., Punta, M., 2013. Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions. Nucleic Acids Res. 41, e121–e121. https://doi.org/10.1093/nar/gkt263 Moore, M.J., Adams, J.A., Taylor, S.S., 2003. Structural basis for peptide binding in protein kinase A. Role of glutamic acid 203 and tyrosine 204 in the peptide-positioning loop. J. Biol. Chem. 278, 10613– 10618. https://doi.org/10.1074/jbc.M210807200 Morariu, V.I., Srinivasan, B.V., Raykar, V.C., Duraiswami, R., Davis, L.S., 2008. Automatic online tuning for fast Gaussian summation, in: Advances in Neural Information Processing Systems (NIPS). Mou, F., Forest, T., Baines, J.D., 2007. US3 of herpes simplex virus type 1 encodes a promiscuous protein kinase that phosphorylates and alters localization of lamin A/C in infected cells. J. Virol. 81, 6459–6470. https://doi.org/10.1128/JVI.00380-07 Mou, F., Wills, E., Baines, J.D., 2009. 
Phosphorylation of the UL31 Protein of Herpes Simplex Virus 1 by the US3-Encoded Kinase Regulates Localization of the Nuclear Envelopment Complex and Egress of Nucleocapsids. J. Virol. 83, 5181–5191. https://doi.org/10.1128/JVI.00090-09 Neil, J.C., Ghysdael, J., Vogt, P.K., Smart, J.E., 1981. Homologous tyrosine phosphorylation sites in transformation-specific gene products of distinct avian sarcoma viruses. Nature 291, 675–677. Ng, T.I., Ogle, W.O., Roizman, B., 1998. UL13 protein kinase of herpes simplex virus 1 complexes with glycoprotein E and mediates the phosphorylation of the viral Fc receptor: glycoproteins E and I. Virology 241, 37–48. Nichols, R.J., Traktman, P., 2004. Characterization of three paralogous members of the Mammalian vaccinia related kinase family. J. Biol. Chem. 279, 7934–7946. https://doi.org/10.1074/jbc.M310813200

30

Nimchuk, Z.L., Tarr, P.T., Meyerowitz, E.M., 2011. An Evolutionarily Conserved Pseudokinase Mediates Stem Cell Production in Plants. Plant Cell 23, 851–854. https://doi.org/10.1105/tpc.110.075622 Nolen, B., Taylor, S., Ghosh, G., 2004. Regulation of protein kinases; controlling activity through activation segment conformation. Mol. Cell 15, 661–675. https://doi.org/10.1016/j.molcel.2004.08.024 Olson, A.T., Rico, A.B., Wang, Z., Delhon, G., Wiebe, M.S., 2017. Deletion of the Vaccinia Virus B1 Kinase Reveals Essential Functions of This Enzyme Complemented Partly by the Homologous Cellular Kinase VRK2. J. Virol. 91. https://doi.org/10.1128/JVI.00635-17 Paez, J.G., Jänne, P.A., Lee, J.C., Tracy, S., Greulich, H., Gabriel, S., Herman, P., Kaye, F.J., Lindeman, N., Boggon, T.J., Naoki, K., Sasaki, H., Fujii, Y., Eck, M.J., Sellers, W.R., Johnson, B.E., Meyerson, M., 2004. EGFR mutations in lung cancer: correlation with clinical response to gefitinib therapy. Science 304, 1497–1500. https://doi.org/10.1126/science.1099314 Pearce, L.R., Komander, D., Alessi, D.R., 2010. The nuts and bolts of AGC protein kinases. Nat. Rev. Mol. Cell Biol. 11, 9. Pei, J., Kim, B.-H., Grishin, N.V., 2008. PROMALS3D: a tool for multiple protein sequence and structure alignments. Nucleic Acids Res. 36, 2295–2300. https://doi.org/10.1093/nar/gkn072 Pereira, S.F.F., Goss, L., Dworkin, J., 2011. Eukaryote-like serine/threonine kinases and phosphatases in bacteria. Microbiol. Mol. Biol. Rev. MMBR 75, 192–212. https://doi.org/10.1128/MMBR.00042-10 Pines, J., 1994. Protein kinases and cell cycle control. Semin. Cell Biol. 5, 399–408. Porter, M., Schindler, T., Kuriyan, J., T Miller, W., 2000. Reciprocal regulation of Hck activity by phosphorylation of Tyr(527) and Tyr(416). Effect of introducing a high affinity intramolecular SH2 ligand. https://doi.org/10.1074/jbc.275.4.2721 Purves, F.C., Ogle, W.O., Roizman, B., 1993. 
Processing of the herpes simplex virus regulatory protein alpha 22 mediated by the UL13 protein kinase determines the accumulation of a subset of alpha and gamma mRNAs and proteins in infected cells. Proc. Natl. Acad. Sci. U. S. A. 90, 6701–6705. Purves, F.C., Roizman, B., 1992. The UL13 gene of herpes simplex virus 1 encodes the functions for posttranslational processing associated with phosphorylation of the regulatory protein alpha 22. Proc. Natl. Acad. Sci. U. S. A. 89, 7310–7314. Quintaje, S.B., Orchard, S., 2008. The Annotation of Both Human and Mouse Kinomes in UniProtKB/Swiss-Prot: One Small Step in Manual Annotation, One Giant Leap for Full Comprehension of Genomes. Mol. Cell. Proteomics MCP 7, 1409–1419. https://doi.org/10.1074/mcp.R700001-MCP200 Raoult, D., Audic, S., Robert, C., Abergel, C., Renesto, P., Ogata, H., La Scola, B., Suzan, M., Claverie, J.-M., 2004. The 1.2-Megabase Genome Sequence of Mimivirus. Science 306, 1344. https://doi.org/10.1126/science.1101485 31

Raoult, D., Scola, B.L., Birtles, R., 2007. The Discovery and Characterization of Mimivirus, the Largest Known Virus and Putative Pneumonia Agent. Clin. Infect. Dis. 45, 95–102. https://doi.org/10.1086/518608 Rees-Jones, R.W., Goff, S.P., 1988. Insertional mutagenesis of the Abelson murine leukemia virus genome: identification of mutants with altered kinase activity and defective transformation ability. J. Virol. 62, 978–986. Riedel, H., Schlessinger, J., Ullrich, A., 1987. A chimeric, ligand-binding v-erbB/EGF receptor retains transforming potential. Science 236, 197. https://doi.org/10.1126/science.3494307 Roebroek, A.J., Schalken, J.A., Onnekink, C., Bloemers, H.P., Van de Ven, W.J., 1987. Structure of the feline c-fes/fps proto-oncogene: genesis of a retroviral oncogene. J. Virol. 61, 2009–2016. Roskoski, R., 2015. A historical overview of protein kinases and their targeted small molecule inhibitors. Pharmacol. Res. 100, 1–23. https://doi.org/10.1016/j.phrs.2015.07.010 Šali, A., Blundell, T.L., 1993. Comparative Protein Modelling by Satisfaction of Spatial Restraints. J. Mol. Biol. 234, 779–815. https://doi.org/10.1006/jmbi.1993.1626 Scheeff, E.D., Bourne, P.E., 2005. Structural Evolution of the Protein Kinase–Like Superfamily. PLoS Comput. Biol. 1, e49. https://doi.org/10.1371/journal.pcbi.0010049 Scheeff, E.D., Eswaran, J., Bunkoczi, G., Knapp, S., Manning, G., 2009. Structure of the Pseudokinase VRK3 Reveals a Degraded Catalytic Site, a Highly Conserved Kinase Fold, and a Putative Regulatory Binding Site. Struct. England1993 17, 128–138. https://doi.org/10.1016/j.str.2008.10.018 Shigematsu, H., Gazdar, A.F., 2006. Somatic mutations of epidermal growth factor receptor signaling pathway in lung cancers. Int. J. Cancer 118, 257–262. https://doi.org/10.1002/ijc.21496 Sobhy, H., 2016. A Review of Functional Motifs Utilized by Viruses. Proteomes 4, 3. https://doi.org/10.3390/proteomes4010003 Spatz, S.J., Schat, K.A., 2011. 
Comparative genomic sequence analysis of the Marek’s disease vaccine strain SB-1. Virus Genes 42, 331–338. https://doi.org/10.1007/s11262-011-0573-0 Tamura, K., Dudley, J., Nei, M., Kumar, S., 2007. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol 24, 1596–9. Tan, K.B., 1975. Comparative study of the protein kinase associated with animal viruses. Virology 64, 566–570. Taylor, S.S., 2012. 11. PKA: Prototype for Dynamic Signaling in Time and Space. Quant. Biol. Mol. Cell. Syst. 267. Taylor, S.S., Kornev, A.P., 2011. Protein Kinases: Evolution of Dynamic Regulatory Proteins. Trends Biochem. Sci. 36, 65–77. https://doi.org/10.1016/j.tibs.2010.09.006 32

Terasawa, K., Yoshimatsu, K., Iemura, S., Natsume, T., Tanaka, K., Minami, Y., 2006. Cdc37 Interacts with the Glycine-Rich Loop of Hsp90 Client Kinases. Mol. Cell. Biol. 26, 3378. https://doi.org/10.1128/MCB.26.9.3378-3389.2006 Thompson, J.D., Higgins, D.G., Gibson, T.J., 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22, 4673–80. Thureen, D.R., Keeler, C.L., 2006. Psittacid Herpesvirus 1 and Infectious Laryngotracheitis Virus: Comparative Genome Sequence Analysis of Two Avian Alphaherpesviruses. J. Virol. 80, 7863–7872. https://doi.org/10.1128/JVI.00134-06 Treiber, D.K., Shah, N.P., 2013. Ins and Outs of Kinase DFG Motifs. Chem. Biol. 20, 745–746. https://doi.org/10.1016/j.chembiol.2013.06.001 Van Etten, J.L., Meints, R.H., 1999. Giant Viruses Infecting Algae. Annu. Rev. Microbiol. 53, 447–494. https://doi.org/10.1146/annurev.micro.53.1.447 Vijayan, R., He, P., Modi, V., Duong-Ly, K.C., Ma, H., Peterson, J.R., Dunbrack, R.L., Levy, R.M., 2015. Conformational Analysis of the DFG-out Kinase Motif and Biochemical Profiling of Structurally Validated Type II Inhibitors. J. Med. Chem. 58, 466–479. https://doi.org/10.1021/jm501603h Vogt, P.K., 2012. Retroviral Oncogenes: A Historical Primer. Nat. Rev. Cancer 12, 639–648. https://doi.org/10.1038/nrc3320 Wang, X., Charng, W.-L., Chen, C.-A., Rosenfeld, J.A., Shamsi, A.A., Al-Gazali, L., McGuire, M., Mew, N.A., Arnold, G.L., Qu, C., Ding, Y., Muzny, D.M., Gibbs, R.A., Eng, C.M., Walkiewicz, M., Xia, F., Plon, S.E., Lupski, J.R., Schaaf, C.P., Yang, Y., 2017. Germline mutations in ABL1 cause an autosomal dominant syndrome characterized by congenital heart defects and skeletal malformations. Nat. Genet. 49, 613–617. https://doi.org/10.1038/ng.3815 Wilson, W.H., Van Etten, J.L., Allen, M.J., 2009. The Phycodnaviridae: The Story of How Tiny Giants Rule the World. Curr. Top. Microbiol. 
Immunol. 328, 1–42. Witte, O.N., Dasgupta, A., Baltimore, D., 1980. Abelson murine leukaemia virus protein is phosphorylated in vitro to form phosphotyrosine. Nature 283, 826. https://doi.org/10.1038/283826a0 Wyke, J.A., Stoker, A.W., 1987. Genetic analysis of the form and function of the viral src oncogene product. Biochim. Biophys. Acta BBA - Rev. Cancer 907, 47–69. https://doi.org/10.1016/0304419X(87)90018-7 Yoosuf, N., Yutin, N., Colson, P., Shabalina, S.A., Pagnier, I., Robert, C., Azza, S., Klose, T., Wong, J., Rossmann, M.G., La Scola, B., Raoult, D., Koonin, E.V., 2012. Related Giant Viruses in Distant Locations and Different Habitats: Acanthamoeba polyphaga moumouvirus Represents a Third Lineage of the Mimiviridae That Is Close to the Megavirus Lineage. Genome Biol. Evol. 4, 1324–1330. https://doi.org/10.1093/gbe/evs109 33

Zabarovskiĭ, E.R., 1985. [Retroviral oncogenes and their cellular proto-oncogenes]. Mol. Biol. (Mosk.) 19, 9–35. Zhang, Q.-Y., Xiao, F., Xie, J., Li, Z.-Q., Gui, J.-F., 2004. Complete Genome Sequence of Lymphocystis Disease Virus Isolated from China. J. Virol. 78, 6982–6994. https://doi.org/10.1128/JVI.78.13.6982-6994.2004

34

Table 1: Presence of key functional residues of the glycine-rich loop, catalytic loop, and activation loop in different viral kinases

| Virus family | Protein name | GxGXXG (Glycine-rich loop) | HRD (Catalytic motif) | DFG (Activation loop) | APE (Activation loop) |
|---|---|---|---|---|---|
| Phycodnaviridae (Chlorovirus) | Ser/Thr kinases | xxGxxx | H[R/L]D | D[F/L/W]G | [A/S]PE |
| | PBCV-specific basic adaptor domain containing protein | GxGxxG | H[G/A/L]D | DFG | |
| Phycodnaviridae (Prasinovirus) | 2-cysteine adaptor domain containing protein | GxGXX[G/x] | HHD | DFG | X[S][H/D] |
| AlphaBaculovirus | Protein kinase 1 (PK1) | xxGxxG | HND | DYG | SPE |
| BetaBaculovirus | ORF3 | -- | HND | D[Y/F/C][G/D] | SPE |
| Alphaherpesvirus | US3 | xxGXXG | HRD | D[F/L]G | [S/A]PE |
| Herpesviridae | UL13 | [F/Y/L]xGxxG | HLD | D[F/Y/L][S/N] | N[N/R/G/K][E/x] |
| | UL97 | - | HLD | D[F/Y]S | PSE |
| Poxviridae | Ser/Thr kinases and B1 protein | GxGxxG | H[G/S/A]D | D[Y/F]G | [P/S]xD |
| Iridoviridae | Phosphotransferase | GxGXXG | H[N/Y]D | DYG | G[T/K]E |
| Retrovirus | Tyrosine-protein kinase transforming protein and Serine/threonine-protein kinase transforming protein | GxGxxG | HRD | DFG | [A][P/L]E |
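The bracket notation used for the motifs in Table 1 maps naturally onto regular expressions. The following is a minimal sketch, not the authors' procedure, of how such a pattern could be translated and searched for in a protein sequence. It assumes that `x` or `X` stands for any residue and that `[A/S]` means one of the listed residues; the function names `motif_to_regex` and `find_motif` and the toy sequence are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical toy sequence for illustration only; it is not a real viral kinase.
TOY_SEQUENCE = "MGEGSFGKVHRDLKPENDFGLSAPE"


def motif_to_regex(motif: str) -> str:
    """Translate Table 1 style notation (e.g. 'D[F/L/W]G') into a regex string.

    Assumptions: 'x'/'X' outside brackets is a wildcard; inside brackets,
    '/' separates alternatives, and an 'x' alternative makes the whole
    position a wildcard. Stray spaces left by text extraction are ignored.
    """
    out = []
    i = 0
    while i < len(motif):
        ch = motif[i]
        if ch == "[":
            end = motif.index("]", i)
            options = [o.strip() for o in motif[i + 1:end].split("/")]
            if any(o.lower() == "x" for o in options):
                out.append(".")
            else:
                out.append("[" + "".join(options) + "]")
            i = end + 1
        elif ch in "xX":
            out.append(".")
            i += 1
        elif ch.isspace():
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def find_motif(motif: str, sequence: str) -> list[int]:
    """Return 1-based start positions of every match, including overlapping ones."""
    # A zero-width lookahead lets overlapping occurrences be reported.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() + 1 for m in pattern.finditer(sequence)]


# Report where each canonical motif occurs in the toy sequence.
for motif in ("GxGxxG", "H[R/L]D", "D[F/L/W]G", "[A/S]PE"):
    print(motif, motif_to_regex(motif), find_motif(motif, TOY_SEQUENCE))
```

Positions are reported 1-based to match the residue numbering convention used in the tables. A pattern match alone does not establish that a motif lies in the expected structural element, so in practice the hits would still need to be checked against an alignment.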


Table 2: Residues conserved in Cluster 5: kinases encoded in retroviruses and giant viruses

| Viral genomes encoding kinases | Reference sequence (UniProt ID) | Conserved residue/motif | Position | Within known substrate-binding region (Yes/No) | Region |
|---|---|---|---|---|---|
| All Retroviruses | O92809 | GxGxxG | 554 | Yes | Glycine-rich loop |
| V-FES in Feline Sarcoma virus and Fujinami Sarcoma virus | P00542 | RLRADN | 366-368 | No | Between β1 and β-2 |
| V-ERBB in Avian Erythroblastosis and Avian leukosis virus (Alpharetrovirus) | P00535 | EGEKV[K/T]I | 154-160 | No | β-2 |
| | | RDPPRYL | 397-399 | No | C-terminal tail |
| Serine/threonine protein kinase transforming protein (Avian Retrovirus MIL) | Q67624 | SNP (Pattern: [S/K/D]NP) | 174-176 | No | αF helix |
| | | INNRDQIIFMVGRGYAS | 203-219 | Yes | Block5 (αF helix to αG-helix) |

Legend to Figures

Figure 1: Methodology used to identify kinases encoded in viral genomes.

Figure 2: Viruses belonging to the double-stranded DNA (dsDNA) and single-stranded RNA (ssRNA) categories that code for kinases.

Figure 3: a. Structure of the eukaryotic protein kinase (EPK) catalytic domain (PDB code 1ATP). The small N-terminal lobe is highlighted in light blue and the C-terminal lobe in red. b. Key residues/motifs driving catalytic function in the N-terminal and C-terminal regions are highlighted.

Figure 4: Phylogenetic analysis of kinases in viral genomes belonging to the eleven most populated clusters. Color codes are given in the panel. US3 and UL13, the protein kinases of viruses belonging to the Alphaherpesvirinae subfamily, formed two discrete clusters.

Figure 5: Phylogenetic analysis of sequences in Cluster-4 (Alphaherpesvirus US3 homologs) after clustering sequences at 70% identity and 70% query coverage. Kinases encoded in viruses infecting birds form a clade (orange), and the other clades are formed by viruses infecting mammals. These include kinases encoded in viruses infecting humans (red), rabbits and monkeys (purple), seals and cats (blue), horses (green), and pigs and buffalo (light green).

Figure 6: Multiple sequence alignment of US3 encoded in Gallid herpesvirus 1 (GaHV-1), also known as infectious laryngotracheitis virus (ILTV), GaHV-2, GaHV-3, and Meleagrid herpesvirus 1 (MeHV-1; turkey herpesvirus), using ClustalW 2.1. Amino acids unique to ILTV are highlighted in yellow, those unique to GaHV-3 in cyan, and those unique to MeHV-1 in green. Amino acids that are conserved in MeHV-1 and GaHV-3 but not in the others are highlighted in gray. The glycine-rich motif is boxed, and the serine conserved only in GaHV-3 is highlighted in cyan.

Figure 7: Multiple sequence alignment of kinases from Alphaherpesvirinae. Glutamic acid (E172) and the Trp-Lys-Asp (WKD) motif at positions 249-251, highlighted in red, are conserved residues unique to infectious laryngotracheitis virus (ILTV) and Psittacid herpesvirus (PSHV).

Figure 8: Domain architecture of retroviral kinases. Tyrosine-protein kinase transforming proteins and serine/threonine-protein kinase transforming proteins contain Tyr kinase and Ser/Thr kinase domains, respectively. V-ROS, V-RYK, V-ERBB, Gag, V-Erb, V-Erb-B, V-SEA, V-FES, V-FES Fa-FeSV, V-YES, V-FPS, V-FGR SRC-2, V-RAF, V-MIL, V-RMIL, V-MOS, and V-AKT are retroviral oncogenes. ROS - Reactive oxygen species, RYK - receptor-like tyrosine kinase, ERBB - Epidermal growth factor receptor, SEA - S13 erythroblastosis oncogene homolog, FES - Feline sarcoma oncogene, SRC - v-src sarcoma (Schmidt–Ruppin A-2) viral oncogene homolog (avian), YES - v-yes-1 Yamaguchi sarcoma viral related oncogene homolog, FPS - Fujinami sarcoma virus oncogene, FGR - Gardner–Rasheed feline sarcoma viral (v-FGR) oncogene homolog.

Figure 9: Alignment of Abl kinases in Feline sarcoma virus (ABL_FSVHY), Abelson murine leukemia virus (ABL_MLVAB), Felis catus (ABL1_FELCA), and human (ABL1_HUMAN). Conserved residues and point mutations are highlighted in yellow and with an arrow.

Figure 10: Alignment of the V-FES kinase domain in Feline sarcoma virus (P00542) and FES/FPS encoded in the host genome, Felis catus (P14238). The single mutation observed is highlighted in yellow.

Figure 11: Multiple sequence alignment of kinases within cluster 5, formed by retroviruses and giant viruses (Marseilleviridae and Mimiviridae). The RP motif is conserved in the C-terminal tail region of all the sequences. The motif [D/E]P[E/x][E/D/x]RP[T/S/x] conserved in these kinases is highlighted in green. The kinases encoded in giant viruses are highlighted in a light yellow box. D1 and D2 stand for domains 1 and 2 in giant viruses.

Figure 12: Multiple sequence alignment of UL13 kinases from Mardivirus, Simplexvirus, and Varicellovirus of the Alphaherpesvirinae subfamily. The kinases are encoded in Bovine, Gallid, Anatid, Meleagrid, Equine, Human, Cercopithecine, Chimpanzee, Leporid, and Saimiriine herpesviruses (HV). The conserved Phe-Ser (FS) motif in UL13 kinases, which could have a putative functional role in phosphorylation and viral infection, is boxed.
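The Figure 5 legend refers to clustering sequences at 70% identity and 70% query coverage. The source does not say which tool or exact definitions were used, so the following is only a minimal sketch of how such a dual threshold could be applied. It assumes the sequences are rows of an existing alignment (equal length, `-` for gaps), defines identity over columns where both sequences have a residue, defines query coverage as those columns divided by the query's ungapped length, and uses a simple greedy scheme. The names `identity_and_coverage`, `greedy_cluster`, and `TOY_ALIGNMENT` are hypothetical.

```python
# Hypothetical toy alignment for illustration only (rows have equal length).
TOY_ALIGNMENT = {
    "seqA": "MKVLGEGSFGKV----",
    "seqB": "MKVLGEGAFGKV----",
    "seqC": "----GEGSFGKVWWWW",
}


def identity_and_coverage(query: str, subject: str) -> tuple[float, float]:
    """Return (percent identity, percent query coverage) for two aligned rows."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    aligned = 0  # columns where both rows carry a residue
    matches = 0  # aligned columns with identical residues
    for q, s in zip(query, subject):
        if q != "-" and s != "-":
            aligned += 1
            if q == s:
                matches += 1
    query_len = sum(1 for q in query if q != "-")
    if aligned == 0 or query_len == 0:
        return 0.0, 0.0
    return 100.0 * matches / aligned, 100.0 * aligned / query_len


def greedy_cluster(rows: dict[str, str],
                   min_identity: float = 70.0,
                   min_coverage: float = 70.0) -> dict[str, list[str]]:
    """Assign each sequence to the first representative passing both thresholds.

    Sequences are visited from longest to shortest (ties broken by name);
    a sequence that matches no existing representative starts a new cluster.
    """
    def ungapped_length(name: str) -> int:
        return len(rows[name].replace("-", ""))

    order = sorted(rows, key=lambda name: (-ungapped_length(name), name))
    clusters: dict[str, list[str]] = {}
    for name in order:
        for representative in clusters:
            identity, coverage = identity_and_coverage(rows[name], rows[representative])
            if identity >= min_identity and coverage >= min_coverage:
                clusters[representative].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters


print(greedy_cluster(TOY_ALIGNMENT))
```

In this toy case `seqC` is identical to `seqA` over the shared columns but only two thirds of its residues are aligned, so the coverage threshold keeps it in a separate cluster. Dedicated clustering programs compute their own pairwise alignments and may define identity and coverage differently, so the numbers from this sketch are not directly comparable to theirs.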


Abbreviations:
HMM - Hidden Markov Model
FSVHY - Feline sarcoma virus
MLVAB - Abelson murine leukemia virus
FELCA - Felis catus
EGFR - Epidermal Growth Factor Receptor
CK1 - Casein kinase 1
VRK - Vaccinia-related kinases
CDK - Cyclin Dependent Kinase
TK - Tyrosyl kinase

Conflict of Interest: None declared

Title: Unity and diversity among viral kinases Authors: Chintalapati Janaki, Manoharan Malini, Nidhi Tyagi and Narayanaswamy Srinivasan

Highlights

• Ser/Thr and Tyr kinases encoded in viral genomes are investigated.
• 716 putative protein kinases encoded in 390 viral genomes are analyzed.
• Kinases from Alphaherpesviruses formed host-specific clusters.
• Giant viral kinases are found to be similar to oncogenic retroviral kinases.
• Substrate binding residues in certain viral kinases are similar to their cellular counterparts.
