Evolutionary analysis of the global landscape of protein domain types and domain architectures associated with family 14 carbohydrate-binding modules


FEBS Letters xxx (2015) xxx–xxx

journal homepage: www.FEBSLetters.org

Ti-Cheng Chang, Ioannis Stergiopoulos ⇑
Department of Plant Pathology, University of California Davis, Davis, CA, USA

Article info

Article history:
Received 15 April 2015
Revised 11 May 2015
Accepted 20 May 2015
Available online xxxx

Edited by Takashi Gojobori

Keywords: Chitin; Family 14 carbohydrate-binding module; Modularity; Promiscuity; Versatility; Supra-domain

Abstract

Domain promiscuity is a powerful evolutionary force that promotes functional innovation in proteins, thus increasing proteome and organismal complexity. Carbohydrate-binding modules (CBMs), in particular, are known to partake in complex modular architectures that play crucial roles in numerous biochemical and molecular processes. However, the extent of promiscuity, as well as its functional and evolutionary significance, remains poorly understood for most CBM families. Here, we analyzed the global promiscuity of family 14 carbohydrate-binding modules (CBM14s) and show that fusion, fission, and reorganization events with numerous other domain types interplayed incessantly in a lineage-dependent manner, likely facilitating species adaptation and functional innovation in the family.

© 2015 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.

Abbreviations: CBM, carbohydrate-binding module; CBM14, family 14 carbohydrate-binding module; CAZymes, carbohydrate-active enzymes; HMM, Hidden Markov Model; SP, signal peptide; GH18, glycoside hydrolase family 18 domain; GH19, glycoside hydrolase family 19 domain; LDLa, lipoprotein receptor class A domain; IGv, Immunoglobulin variable-set domain; GlcNAc, N-acetylglucosamine (or N-acetyl-D-glucosamine); Ig, immunoglobulin; VCBPs, variable region-containing chitin-binding proteins; SRCR, Scavenger receptor cysteine-rich protein domain; Sp, serine protease (e.g. as in Sp22D gene from Anopheles gambiae); Pdi, polysaccharide deacetylase domain

Author contributions: T.C. performed analyses, analyzed data, and wrote the manuscript. I.S. conceived and supervised the study, analyzed data, and wrote the manuscript.

⇑ Corresponding author at: University of California Davis, Department of Plant Pathology, One Shield Avenue, Davis, CA 95616-8751, USA. Fax: +1 530 752 5674. E-mail address: [email protected] (I. Stergiopoulos).

1. Introduction

Carbohydrate-binding modules (CBMs) are ubiquitous molecules in nature that are frequently found as discrete non-catalytic components of carbohydrate-active enzymes (CAZymes), in which they promote the avidity of the enzyme for the target saccharide substrate. As a result, the general architecture of modular CAZymes is frequently composed of the catalytic module linked to one or several CBMs, while additional discrete domains might be present as well, thereby increasing the complexity of these architectures [1,2]. Overall, the high propensity of CBMs to fuse with other domains suggests that these modules are part of an organism’s proteome backbone that can be readily recruited for service in a milieu of enzymatic and biochemical processes [3]. However, despite the functional and evolutionary significance of acknowledging modular arrangements in CBM-containing proteins, the number and diversity of combinations in which a particular CBM can be involved, often referred to as versatility or promiscuity [4,5], remain largely unknown for most CBM families. Recently, we reported on the molecular evolutionary analysis of family 14 carbohydrate-binding modules (CBM14s) across all domains of life [6]. Members of this family show specific affinity for chitin, a β(1→4)-linked N-acetyl-D-glucosamine (GlcNAc) polysaccharide, and thus, not surprisingly, CBM14s are frequently connected to catalytic domains with chitinolytic activity. Our previous analysis indicated that the evolution of this family was largely shaped by horizontal gene transfer, multiple lineage-specific expansions and contractions, and positive selection, which together were suggestive of functional diversification [6]. Here, we expand these studies to further examine the dynamics of the CBM14 family evolution from the perspective of versatility or promiscuity, by surveying the diversity of domain types and complexity of domain

http://dx.doi.org/10.1016/j.febslet.2015.05.048
0014-5793/© 2015 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.

Please cite this article in press as: Chang, T.-C. and Stergiopoulos, I. Evolutionary analysis of the global landscape of protein domain types and domain architectures associated with family 14 carbohydrate-binding modules. FEBS Lett. (2015), http://dx.doi.org/10.1016/j.febslet.2015.05.048


T.-C. Chang, I. Stergiopoulos / FEBS Letters xxx (2015) xxx–xxx

architectures exhibited by modular CBM14-containing proteins. Our analysis revealed an impressive repertoire of domains and consequently domain architectures associated with the CBM14, indicating that the ligand-binding properties of this module have been exploited several times in nature, possibly as a means to facilitate functional innovation in higher eukaryotes.

2. Materials and methods

2.1. Identification of domains in multimeric CBM14-containing proteins and subsequent domain–domain interaction network analysis

To precisely identify the diversity of domain types that are associated with the CBM14, we used InterProScan [7] and the Hidden Markov Model (HMM) of each domain deposited in the Pfam database [8] to search against the 3432 CBM14-containing proteins (e-value <1E-5) that we have previously identified [6,9]. The signal peptide (SP) was considered as one type of domain, although proteins consisting only of CBM14s fused to an SP were regarded as homomultimeric, as the mature polypeptide chain would consist only of CBM14s. Data on the topological arrangement of the various domains in different protein architectures were retrieved by constructing a directed network graph, using Cytoscape [10] and custom Perl scripts. The basic unit in the network analysis was a domain pair, defined as two domains located directly adjacent to each other in a polypeptide chain, with the order of the domains taken into account. For instance, in a single protein with three consecutive domains, A–B–C, two domain pairs were defined (i.e. A–B and B–C). Also, combinations such as A–B and B–A were classified as two discrete domain pairs. In the network, each node represented a domain and each edge connected the two partners of a domain pair. The direction of the edge represented the order of each domain pair, with the source of the arrow as the head domain (i.e. the domain facing the N-terminus of the protein) and the target as the tail domain (i.e. the domain facing the C-terminus of the protein). The number of occurrences of each domain pair was used as the weight of the corresponding edge. The degree of a node was defined as the number of edges connected to it and was divided into in-degree (inward edges) and out-degree (outward edges). Finally, the frequency of each type of domain pair was counted at three taxonomic levels (i.e. Phylum, Kingdom and Domain), and the taxonomic distribution of the domain pairs as well as the overall network properties were further analyzed using NetworkAnalyzer [11] implemented in Cytoscape. The sub-network was constructed based on the nodes with a non-zero clustering coefficient.

3. Results and discussion

3.1. Diversity of domain types co-occurring with the CBM14 in modular proteins

A total of 11 633 domains, corresponding to 94 unique Pfam domains, were recovered from CBM14-containing proteins, of which 3973 were domains other than the CBM14 (Table S1). This indicates that the CBM14 has the propensity to associate with a variety of other domain types, and thus can be regarded as a promiscuous domain [5]. Since chitin is mostly found in nature as an extracellular matrix polysaccharide [12–14], it is perhaps not surprising that, next to CBM14s (7660/11 633), the SP is the domain most commonly recovered from CBM14-containing proteins (2236/11 633). However, 1196 CBM14-containing proteins that lack an SP were also identified, and these might be involved in intermediate steps of chitin biosynthesis, metabolism, and transport to the extracellular matrix [12]. Other Pfam domains most frequently recovered

were the glycoside hydrolase family 18 domain (GH18; PF00704) (866/11 633), which catalyzes the enzymatic degradation of chitin [15]; the low-density lipoprotein receptor class A domain (LDLa; PF00057) (175/11 633), which binds and transports ligand lipoproteins into cells, thus playing a key role in lipid metabolism [16]; and the Immunoglobulin variable-set domain (IGv; PF07686) (129/11 633), which belongs to a class of Ig-like domains that are involved in a variety of functions, including cell–cell recognition and cell-surface receptor functions, among others [17]. The remaining 89 Pfam domain types that were recovered from CBM14-containing proteins each occurred fewer than one hundred times, including 48 domains that were found only once (Table S1). Overall, the diversity of domains in CBM14-containing proteins indicates that these proteins can partake in a multitude of biological processes and pathways.

3.2. Diversity of domain architectures in modular CBM14-containing proteins

The large number of domain types that can be combined with the CBM14 in modular proteins translates into an increase in protein versatility. Of the 3432 CBM14-containing proteins identified previously [6], 2646 (77.1%) are modular proteins that emerged through the combination of CBM14(s) with one or more of the other 93 Pfam domain types that were recovered from the CBM14-containing proteins. Analysis of all the domain arrangements identified a total of 224 unique domain architectures, 208 of which refer to modular proteins (Fig. 1, Tables S2 and S3). After excluding proteins in which one or more CBM14s are fused only to a signal peptide (SP) and to no other domain type, and which therefore contain only CBM14 domains in their mature form (1572 proteins), the number of “true” modular proteins still remains high (1074 proteins).
These modular proteins represent 185 unique domain architectures, implying that in many cases an organism’s repertoire of CBM14s can be seen as a collection of CBM14-containing proteins with different domain architectures. The majority of the modular CBM14-containing proteins had only one (949 proteins) or two (74 proteins) additional domain types; however, proteins with three or more extra domain types (51 proteins) were identified as well (Table S4), indicating that CBM14s can partake in the formation of proteins with complex domain architectures. To further elucidate the complexity of the domain architectures, we performed a directed domain combination network analysis that traced the linear pattern of pairwise domain combinations in all 3432 CBM14-containing proteins [18] (Fig. 2). The analysis revealed a total of 8201 domain pairs, corresponding to 172 unique pairs. We next examined the degree of each node in the network, i.e. the number of edges connected to a single node. Overall, the degree distributions of the network followed a power law ($p(k) \propto k^{-\gamma}$, with $k$ the node degree) with a $\gamma$ value of 0.934 (correlation R = 0.957) for the in-degree distribution and 1.008 (R = 0.991) for the out-degree distribution (Fig. S1), suggesting that the network shares the properties of a scale-free network. One major feature of such a network is the presence of central hub nodes with relatively high degrees, which correspond to a high number of connected partners [19]. In the constructed network, the CBM14 was one of the central hubs, with an in-degree of 51 and an out-degree of 38, which is considerably higher than the total degree of any of the remaining domains (<21) (Fig. 2). In addition, the clustering coefficient of the CBM14 node was relatively low (0.0067), compared to the highest value of 1.0 in the network (Table S5).
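The power-law exponent of a degree distribution can be estimated in several ways. The sketch below uses ordinary least squares on log-transformed data, which is one common approach and is an assumption here; it is not necessarily the procedure applied by the Cytoscape plugin used in this study. The function name `fit_power_law` and the toy degree counts are hypothetical.

```python
import math

def fit_power_law(degree_counts):
    """Fit log(count) = log(a) - gamma * log(k) by least squares.

    degree_counts maps a node degree k (> 0) to the number of nodes
    that have this degree. Returns (gamma, r), where r is the Pearson
    correlation between log(k) and log(count).
    """
    points = [(math.log(k), math.log(n))
              for k, n in degree_counts.items() if k > 0 and n > 0]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    return -slope, r

# Hypothetical toy distribution that follows count = 100 / k exactly.
toy_counts = {1: 100, 2: 50, 4: 25, 5: 20}
gamma, r = fit_power_law(toy_counts)
```

For data that follow a power law exactly, the fitted slope is negative, so `gamma` is positive and `r` is close to -1; the correlation values quoted in the text are presumably reported as magnitudes.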
Combined, these results indicate that the CBM14 has a significantly higher number of domain partners (61 in total) as compared to any other domain in modular


Fig. 1. Schematic illustration of the different domain types accrued in the CBM14-containing proteins of each species, mapped onto their phylogeny. Each domain type is represented by a geometrical shape and color. A blue hexagon, for example, represents a CBM14, a red oval a signal peptide (SP), a green pentagon a glycoside hydrolase family 18 (GH18) domain, and a yellow hexagon the low-density lipoprotein receptor class A domain (LDLa). The full list of the schematic representations and annotations for each domain is provided in Tables S1 and S2. To simplify the image, the order of the domains in each protein as well as intra-protein domain duplications have not been taken into account. The domain architectures present in CBM14-containing proteins are also mapped on the species’ phylogenetic tree, with tree branches colored based on the species’ taxonomic classification at the phylum level. Track a: phylogenetic tree of the species with CBM14-containing proteins as deduced from the Tree Of Life (sTOL) and the NCBI taxonomy databases. Each terminal branch represents a unique domain organization in one species. The corresponding phylum of each species is colored in accordance with the legend. Track b: the domain types accrued in the CBM14-containing proteins of each species. The order of the domains plotted in this track is based on the abundance of the domains rather than their real order or arrangement in the multi-domain proteins.

CBM14-containing proteins, thus establishing that the CBM14 is a highly promiscuous module. In addition, the network contained a high number (4163) of self-loops of the CBM14, indicating extensive tandem duplications of this module in CBM14-containing proteins (Fig. 2 and Table S6). Further examination of the network connectivity revealed that only 51% (88 of 172) of the unique domain pairs present in modular CBM14-containing proteins included the CBM14 domain, while the remaining 49% (84 of 172) were pairs between other domain types. Perhaps not surprisingly given their higher frequency of occurrence, the network connectivity analysis revealed that CBM14s were most frequently directly connected to the SP (1716 pairs), GH18 (794 pairs), LDLa (107 pairs), and IGv domains (67 pairs) (Table S6). In many organisms, including insects and fungi, GlcNAc-lipid linked intermediates or acceptors are thought to play a role in chitin biosynthesis, which might partially explain the abundant association of CBM14s with lipid-associated domains, such as the LDLa [12]. Moreover, the association of chitin-binding domains with immunoglobulin (Ig) type molecules has been well described in the so-called variable (V) region-containing chitin-binding proteins (VCBPs), which consist of two diversified IGv domains and a chitin-binding domain. VCBPs are predominantly found in protochordates and are thought to function as bifunctional molecules, in which the diversified V regions provide recognition of various foreign antigens, while the chitin-binding domain has likely been adapted for immune recognition of


Fig. 2. Directed network of the domain architectures present in CBM14-containing proteins. The network is built based upon the different ways that domains can be organized in CBM14-containing proteins. Ovals (or nodes in the graph) represent the different domain types and arrows (or edges in the graph) represent the order of the domains in different domain architectures. Edges are colored and sized based on the abundance of occurrence of each type of domain architecture (color scale: black, 1; blue to red, 2–4163). The five nodes with the highest degree numbers are highlighted in different colors.
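The construction of such a directed, weighted domain-pair network, as described in Section 2.1, can be sketched in a few lines. This is a minimal illustration and not the authors' Perl and Cytoscape pipeline: the function names and the toy architectures are hypothetical, and counting a self-loop towards both the in-degree and the out-degree of a node is an assumption (it would, however, be consistent with the reported values, since 51 + 38 − 1 = 88 unique pairs involve the CBM14).

```python
from collections import Counter

def domain_pairs(architecture):
    """Ordered pairs of directly adjacent domains, N- to C-terminus."""
    return list(zip(architecture, architecture[1:]))

def build_network(architectures):
    """Edge weights: how often each ordered (head, tail) pair occurs."""
    edges = Counter()
    for architecture in architectures:
        edges.update(domain_pairs(architecture))
    return edges

def degrees(edges):
    """In- and out-degree per node, counted over unique directed edges."""
    in_degree, out_degree = Counter(), Counter()
    for head, tail in edges:
        out_degree[head] += 1   # edge leaves the head (N-terminal) domain
        in_degree[tail] += 1    # edge enters the tail (C-terminal) domain
    return in_degree, out_degree

# Hypothetical toy architectures, listed from N- to C-terminus.
toy_architectures = [
    ["SP", "CBM14", "CBM14", "CBM14"],
    ["SP", "GH18", "CBM14"],
    ["CBM14", "LDLa", "Pdi"],
]
edges = build_network(toy_architectures)
in_degree, out_degree = degrees(edges)
```

In this toy example the CBM14 self-loop has a weight of 2, reflecting the tandem repeats in the first architecture, which mirrors how tandem duplications show up as heavily weighted self-loops in the network.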

chitinous organisms [20]. In contrast to VCBPs, the role of CBM14 in GH18 chitinases is clearly defined. The biodegradation of chitin in nature is accomplished by different types of enzymes that include chitin deacetylases and two families of glycoside hydrolases, namely family 18 (GH18) and family 19 (GH19) [15,21,22]. GH18 chitinases are widely distributed in nature and produced by many organisms including insects, nematodes, fungi, and bacteria, while GH19 chitinases are present predominantly in plants and only rarely in bacteria [15,21,22]. Chitin deacetylases catalyze the deacetylation of chitin into chitosan and are mainly present in marine bacteria, fungi, and a few insect species [23]. CBM14 domains were predominantly fused to GH18 chitinases and, with a lower frequency, to a family of putative chitin deacetylases (PF01522) (42 of the 8201 pairs), but not to GH19 chitinases. Given the taxonomic distribution of the CBM14 family and the fact that chitin is currently the only reported target ligand for members of this family, it is not surprising that CBM14s are most frequently connected to GH18 chitinase domains, in which they most likely promote the avidity of the enzyme for the target chitin substrate [12,24]. Taken together, although the three domains most frequently associated with CBM14s in modular proteins can partake in various biochemical and molecular processes, these can still be linked to chitin, which would argue against changes in ligand-specificity of the CBM14s found in these proteins. It is then likely that the presence of multiple CBM14s in these modular proteins increases the overall avidity of the protein for chitin, thus compensating for the rather weak interactions between CBM14s and chitin oligosaccharides [25]. Subtle changes, for example in the amino acid composition of CBM14s, can have a dramatic effect on the binding kinetics and affinity of this module for chito-oligosaccharides of different lengths, a process that can be accelerated through multiple rounds of domain duplication and subsequent sequence diversification and selection [25]. In addition to connectivity, examination of the network directionality revealed that 61 of the 88 domain pairs that involved the CBM14 had a fixed N-to-C-terminal order of the paired domains, and could thus represent supra-domains that became joined once in evolution and assumed a specific evolutionary and functional relationship [26,27]. A putative supra-domain that was overrepresented in our data set (32 occurrences) was the fusion between CBM14 and the Scavenger receptor cysteine-rich protein domain (SRCR; PF00530), which has been characterized in Sp22D, a serine protease in Anopheles gambiae [28] (Table S6). The presence of CBM14 in Sp22D presumably allows the protease to bind to exposed chitin and elicit downstream developmental responses during tissue remodeling or damage. The conservation of the domain order in the majority of the pairs could further suggest that there are functional and/or


Fig. 3. Directed domain graph illustrating the evolutionary paths that led to the domain architectures found in present-day CBM14-containing proteins. Ovals (or nodes in the graph) represent the different domain types, with border colors denoting the variation in their clustering coefficient (i.e. 0: blue; 1: red). The plot is constructed based on the nodes with a clustering coefficient >0. Arrows (or edges in the graph) represent the order and organization of the domains within multi-domain proteins, with the edges colored based on the taxonomic distribution of these architectures. The network shows that most domain architectures are specific to Arthropoda and Cephalochordata and must therefore have formed within these lineages.
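The selection of nodes with a clustering coefficient above zero can be illustrated with a small sketch. It assumes the standard local clustering coefficient computed on an undirected view of the network with self-loops ignored; whether the Cytoscape analysis treated direction and self-loops in the same way is not stated in the text. The function names and the toy edge list are hypothetical.

```python
from itertools import combinations

def clustering_coefficients(edges):
    """Local clustering coefficient per node (undirected, no self-loops)."""
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # self-loops do not contribute to clustering
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    coefficients = {}
    for node, adjacent in neighbors.items():
        k = len(adjacent)
        if k < 2:
            coefficients[node] = 0.0
            continue
        # Number of neighbor pairs that are themselves connected.
        linked = sum(1 for u, v in combinations(adjacent, 2)
                     if v in neighbors[u])
        coefficients[node] = 2 * linked / (k * (k - 1))
    return coefficients

def subnetwork(edges):
    """Keep only edges whose two nodes have a non-zero coefficient."""
    coefficients = clustering_coefficients(edges)
    keep = {node for node, c in coefficients.items() if c > 0}
    return [(a, b) for a, b in edges if a in keep and b in keep]

# Hypothetical toy network of ordered domain pairs.
toy_edges = [("SP", "CBM14"), ("SP", "GH18"), ("GH18", "CBM14"),
             ("CBM14", "LDLa"), ("CBM14", "CBM14")]
coefficients = clustering_coefficients(toy_edges)
filtered = subnetwork(toy_edges)
```

In the toy network the hub node receives the lowest non-zero coefficient because most of its partners are not connected to each other, which is the same pattern reported for the CBM14 node.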

structural constraints on domain organization in modular CBM14-containing proteins and that formation of new domain architectures is not governed by random recombination but rather occurs through linear addition [27,29]. For example, excluding pairs with the SP, an analysis for any preference in the location of the CBM14 in the domain pairs indicated that CBM14s strongly favored the C-terminal position (chi-square test, P < 0.001). Furthermore, given the high total number (8201) of domain pairs and the significantly lower number (172) of unique pairs present within the dataset, as compared to the number that would be expected (94² = 8836) under random domain shuffling, it can be assumed that formation and propagation of supra-domains has played a more significant role in the evolution and diversification of the CBM14 family than domain shuffling. Also, while some domain pairs are highly abundant, most are relatively rare (median number of occurrences is 1), indicating differential rates of duplication among the various domain pairs. Taken together, the above observations suggest that although duplication of domain pairs in CBM14-containing proteins seems to be more common than invention of new domain architectures, both processes are under selection and thus do not take place randomly.

3.3. The distribution of domain types and domain architectures is mostly lineage-specific

We previously established that the CBM14 family experienced multiple expansions and contractions in a lineage-specific manner [6]. The phylogenetic distribution of the different domain types, as well as of the domain pairs formed by them, showed equally large variations among the extant lineages and taxa (Table S7), which could reflect adaptations to the specific organismal biology. Higher eukaryotes are known to exhibit elevated rates of domain rearrangements and duplications, mainly as a result of their larger genomes and more complex biology [30].
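The comparison with random domain shuffling made at the end of Section 3.2 rests on simple counting, which the sketch below reproduces from the numbers given in the text. The positional preference test is additionally sketched as a chi-square goodness-of-fit test against an equal N-terminal/C-terminal split; this test design and the head/tail counts are assumptions for illustration only, since the underlying counts are not given in the text.

```python
# Ordered pairs possible among 94 domain types: A-B and B-A count
# separately, and self-pairs such as CBM14-CBM14 are allowed.
N_DOMAIN_TYPES = 94
possible_pairs = N_DOMAIN_TYPES ** 2
observed_unique_pairs = 172
fraction_observed = observed_unique_pairs / possible_pairs

def chi_square_equal_split(n_head, n_tail):
    """Chi-square statistic (1 d.f.) against a 50:50 head/tail split."""
    expected = (n_head + n_tail) / 2
    return ((n_head - expected) ** 2 + (n_tail - expected) ** 2) / expected

# Critical value of the chi-square distribution for 1 d.f. at P = 0.001.
CRITICAL_VALUE_P001 = 10.828

# Hypothetical counts: CBM14 as head (N-terminal) vs. tail (C-terminal).
statistic = chi_square_equal_split(30, 70)
is_significant = statistic > CRITICAL_VALUE_P001
```

With these numbers, only about 2% of the possible ordered pairs are actually observed, which is the basis for the argument that propagation of existing pairs outweighs random shuffling.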
In our dataset, the Arthropoda and Cephalochordata clades exhibited the highest numbers of unique domain types (58 and 25, respectively) and consequently of unique domain pairs (98 and 44, respectively) in modular CBM14-containing proteins. This is perhaps not surprising given that CBM14s are highly abundant in these two clades (2539 and 64 proteins, respectively) [6]. However, regression analyses between the number of CBM14-containing proteins and the number of unique domain types or domain pairs present in each species indicated a weak association (R² = 0.04 and R² = 0.57, respectively). In contrast, a strong correlation (R² = 0.97) was found between the number of CBM14-containing proteins and the total number of domains or domain pairs present in each species (Fig. S2). These results corroborate the lineage-specific acquisition rates of new domain types and domain pairs by modular CBM14-containing proteins, as well as the fact that, once formed, duplication of domain pairs is more common than the invention of new combinations. We next investigated key points in the evolutionary history of the major domain pairs present in modular CBM14-containing proteins in order to infer their potential origin and distribution. For this purpose, a sub-network was re-constructed from the previously constructed network based on nodes with a clustering coefficient >0, in which the phylogenetic distributions of the domain pairs were mapped at the phylum and subphylum level (Fig. 3). The resulting network was clearly dominated by domain pairs present in Arthropoda and Cephalochordata, which included 52 Arthropoda-specific and 15 Cephalochordata-specific pairs. An analysis of the origin of the major domain pairs present in modular CBM14-containing proteins indicated that the earliest domain fusion event took place between the CBM14 and GH18 domain, most likely before the divergence of the metazoans from the rest of the eukaryotic lineages. Another significant domain fusion event is the fusion of CBM14 with LDLa, which probably took place before the divergence of Ecdysozoa within Metazoa.
Notably, in Arthropoda the LDLa domain was also frequently fused with the polysaccharide deacetylase domain (Pdi; Pfam ID: PF01522), indicating a further extension of the original CBM14–LDLa architecture before the divergence of insects. In these organisms, proteins with the CBM14–LDLa–Pdi architecture are members of the chitin deacetylase gene family, which catalyzes the N-deacetylation of chitin to chitosan during morphogenesis and molting [31]. Finally, the third major fusion event took place between CBM14 and IGv, most likely before the divergence of


Chordata, as it is only present in the Cephalochordata and Tunicata subphyla of Chordata. The remaining domain pairs were mainly specific to each phylum, including 115 species-specific domain pairs, which may have arisen more recently than the three most abundant pairs described above. Overall, the network analyses indicated that the domain pairs and evolutionary constraints have interplayed incessantly to shape the sequence, structure, and function of the CBM14-containing proteins.

Acknowledgments

This work used resources of the UC Davis College of Agricultural and Environmental Sciences Computing Center Farm cluster and was financially supported by UC Davis faculty start-up funds.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.febslet.2015.05.048.

References

[1] Boraston, A., Bolam, D., Gilbert, H. and Davies, G. (2004) Carbohydrate-binding modules: fine-tuning polysaccharide recognition. Biochem. J. 382, 769–781.
[2] Shoseyov, O., Shani, Z. and Levy, I. (2006) Carbohydrate binding modules: biochemical properties and novel applications. Microbiol. Mol. Biol. Rev. 70 (2), 283–295.
[3] Guillen, D., Sanchez, S. and Rodriguez-Sanoja, R. (2010) Carbohydrate-binding domains: multiplicity of biological roles. Appl. Microbiol. Biotechnol. 85 (5), 1241–1249.
[4] Weiner, J., Moore, A.D. and Bornberg-Bauer, E. (2008) Just how versatile are domains? BMC Evol. Biol. 8 (1), 285.
[5] Basu, M.K., Carmel, L., Rogozin, I.B. and Koonin, E.V. (2008) Evolution of protein domain promiscuity in eukaryotes. Genome Res. 18 (3), 449–461.
[6] Chang, T.C. and Stergiopoulos, I. (2015) Inter- and intra-domain horizontal gene transfer, gain-loss asymmetry and positive selection mark the evolutionary history of the CBM14 family. FEBS J. 282 (10), 2014–2028.
[7] Quevillon, E., Silventoinen, V., Pillai, S., Harte, N., Mulder, N., Apweiler, R. and Lopez, R. (2005) InterProScan: protein domains identifier. Nucleic Acids Res. 33, W116–W120.
[8] Finn, R.D., Bateman, A., Clements, J., Coggill, P., Eberhardt, R.Y., Eddy, S.R., Heger, A., Hetherington, K., Holm, L., Mistry, J., et al. (2014) Pfam: the protein families database. Nucleic Acids Res. 42 (D1), D222–D230.
[9] Finn, R.D., Bateman, A., Clements, J., Coggill, P., Eberhardt, R.Y., Eddy, S.R., Heger, A., Hetherington, K., Holm, L. and Mistry, J. (2013) Pfam: the protein families database. Nucleic Acids Res., gkt1223.
[10] Smoot, M.E., Ono, K., Ruscheinski, J., Wang, P.L. and Ideker, T. (2011) Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 27 (3), 431–432.
[11] Saito, R., Smoot, M.E., Ono, K., Ruscheinski, J., Wang, P.L., Lotia, S., Pico, A.R., Bader, G.D. and Ideker, T. (2012) A travel guide to Cytoscape plugins. Nat. Methods 9 (11), 1069–1076.
[12] Merzendorfer, H. and Zimoch, L. (2003) Chitin metabolism in insects: structure, function and regulation of chitin synthases and chitinases. J. Exp. Biol. 206 (24), 4393–4412.
[13] Ruiz-Herrera, J. (1991) Fungal Cell Wall: Structure, Synthesis, and Assembly, CRC Press.
[14] Martinez, J.P., Falomir, M.P. and Gozalbo, D. (2009) Chitin: a structural biopolysaccharide. eLS.
[15] Karlsson, M. and Stenlid, J. (2008) Evolution of family 18 glycoside hydrolases: diversity, domain structures and phylogenetic relationships. J. Mol. Microbiol. Biotechnol. 16 (3–4), 208–223.
[16] Dieckmann, M., Dietrich, M.F. and Herz, J. (2010) Lipoprotein receptors: an evolutionarily ancient multifunctional receptor family. Biol. Chem. 391 (11), 1341–1363.
[17] Kim, E., Lee, Y., Kim, J.-S., Song, B.-S., Kim, S.-U., Huh, J.-W., Lee, S.-R., Kim, S.-H., Hong, Y. and Chang, K.-T. (2010) Extracellular domain of V-set and immunoglobulin domain containing 1 (VSIG1) interacts with sertoli cell membrane protein, while its PDZ-binding motif forms a complex with ZO-1. Mol. Cells 30 (5), 443–448.
[18] Itoh, M., Nacher, J.C., Kuma, K.-I., Goto, S. and Kanehisa, M. (2007) Evolutionary history and functional implications of protein domains and their combinations in eukaryotes. Genome Biol. 8 (6), R121.
[19] Pavlopoulos, G.A., Secrier, M., Moschopoulos, C.N., Soldatos, T.G., Kossida, S., Aerts, J., Schneider, R. and Bagos, P.G. (2011) Using graph theory to analyze biological networks. Biodata Min. 4.
[20] Cannon, J., Haire, R. and Litman, G. (2002) Identification of diversified immunoglobulin-like variable region-containing genes in a protochordate. Nat. Immunol. 3, 1200–1207.
[21] Hamid, R., Khan, M.A., Ahmad, M., Ahmad, M.M., Abdin, M.Z., Musarrat, J. and Javed, S. (2013) Chitinases: an update. J. Pharm. Biol. Sci. 5 (1), 21.
[22] Ubhayasekera, W. (2011) Structure and function of chitinases from glycoside hydrolase family 19. Polym. Int. 60 (6), 890–896.
[23] Zhao, Y., Park, R.-D. and Muzzarelli, R.A. (2010) Chitin deacetylases: properties and applications. Mar. Drugs 8 (1), 24–46.
[24] Merzendorfer, H. (2013) Insect-derived chitinases. In: Yellow Biotechnology II, pp. 19–50, Springer.
[25] Kohler, A., Chen, L.-H., Hurlburt, N., Salvucci, A., Schwessinger, B., Fisher, A. and Stergiopoulos, I. Structural analysis of an Avr4 effector ortholog provides mechanistic insight into chitin-binding and recognition by the Cf-4 receptor. Submitted for publication.
[26] Vogel, C., Berzuini, C., Bashton, M., Gough, J. and Teichmann, S.A. (2004) Supra-domains: evolutionary units larger than single protein domains. J. Mol. Biol. 336 (3), 809–823.
[27] Vogel, C., Bashton, M., Kerrison, N.D., Chothia, C. and Teichmann, S.A. (2004) Structure, function and evolution of multidomain proteins. Curr. Opin. Struct. Biol. 14 (2), 208–216.
[28] Gorman, M., Andreeva, O. and Paskewitz, S. (2000) Sp22D: a multidomain serine protease with a putative role in insect immunity. Gene 251 (1), 9–17.
[29] Vogel, C., Teichmann, S.A. and Pereira-Leal, J. (2005) The relationship between domain duplication and recombination. J. Mol. Biol. 346 (1), 355–365.
[30] Nasir, A., Kim, K.M. and Caetano-Anolles, G. (2014) Global patterns of protein domain gain and loss in superkingdoms. PLoS Comput. Biol. 10 (1), e1003452.
[31] Dixit, R., Arakane, Y., Specht, C.A., Richard, C., Kramer, K.J., Beeman, R.W. and Muthukrishnan, S. (2008) Domain organization and phylogenetic analysis of proteins from the chitin deacetylase gene family of Tribolium castaneum and three other species of insects. Insect Biochem. Mol. Biol. 38 (4), 440–451.
