Structures composing protein domains


Biochimie 95 (2013) 1511-1524

Contents lists available at SciVerse ScienceDirect

Biochimie journal homepage: www.elsevier.com/locate/biochi

Review

Jaroslav Kubrycht a,*, Karel Sigler b, Pavel Souček c, Jiří Hudeček d

a Department of Physiology, Second Faculty of Medicine, Charles University, Plzenska 221, 150 00 Prague, Czech Republic
b Laboratory of Cellular Biology, Institute of Microbiology, Academy of Sciences of the Czech Republic, Videnska 1083, 142 20 Prague, Czech Republic
c Toxicogenomics Unit, National Institute of Public Health, Srobarova 48, 100 42 Prague, Czech Republic
d Department of Biochemistry, Charles University, Hlavova 2030, 128 40 Prague, Czech Republic

Article info

Article history: Received 22 January 2013; accepted 2 April 2013; available online 10 April 2013

Keywords: Catalytic; Disordered; Fold; Intradomain; Motif; Repeat

Abstract

This review summarizes available data on intradomain structures (IS) such as functionally important amino acid residues, short linear motifs, conserved or disordered regions, peptide repeats, and broadly occurring secondary structures or folds. IS form structural features (units or elements) necessary for interactions with proteins or non-peptidic ligands, for enzyme reactions and for some structural properties of proteins. These features have often been related to a single structural level (e.g. primary structure), although they mostly require a certain structural context at other levels (e.g. secondary structures or supersecondary folds), as also follows from some examples reported or demonstrated here. In addition, we deal with some functionally important dynamic properties of IS (e.g. flexibility and different forms of accessibility) and with more specific dynamic changes of IS during enzyme reactions and allosteric regulation. Selected notes also concern some experimental methods, the increasingly necessary tools of bioinformatic processing, and clinically interesting relationships. © 2013 Elsevier Masson SAS. All rights reserved.

Abbreviations: 3D, three-dimensional; aa, amino acid residue(s) in peptide; CDR, hypervariable complementarity determining region(s) of IgV; CDRn, CDR of n-th chain order (n = 1-3); FR, framework regions of IgV; FRn, FR of n-th chain order (n = 1-4); IS, intradomain structure(s); Ig, immunoglobulin(s); IgV, variable domain(s) of Ig; igvds, conserved variable Ig domain sequence(s) (derived by conserved domain search of BLAST); MSA, multiple sequence alignment(s); MSA-R, MSA-derived record(s); PKSI, peptidic protein kinase substrates and inhibitors; SLM, short linear motif(s); TCR, T-cell receptor(s); TR, protein tandem repeat(s).

* Corresponding author. Tel.: +420 233 32 32 72. E-mail address: [email protected] (J. Kubrycht).

0300-9084/$ - see front matter © 2013 Elsevier Masson SAS. All rights reserved. http://dx.doi.org/10.1016/j.biochi.2013.04.001

1. Several notes on structural diversity, assembly and hierarchy

Domains are protein chain segments representing structural and functional units (building blocks) that have retained a certain overall structural conservation during protein evolution. Binding interactions or enzyme reactions of domains and of smaller structures are involved in multiple processes endowing proteins with new functions (cf. [1]). In fact, two main types of structures smaller than domains exist: (i) interdomain connections and (ii) intradomain structures (IS). The interdomain connections (also called domain extensions) comprise both conserved regions and structurally unstable disordered regions [2,3] that link or terminate domains. IS (in other words, structures composing domains), which are dealt with in this review, are more diverse than the interdomain connections with respect to their extent, composition and stereo-chemical properties. It remains an open question whether this classification is complete. For instance, we do not know whether some mostly intradomain structures, such as protein pockets (cf. Table 1) and the still poorly described three-dimensional (3D) motifs (for related consideration see Ref. [4]; cf. also bi-domain catalytic sites [5,6]), can be formed by local spatial arrangements of segments present in two or more domains. Such structures would be interesting, because they would represent new types of spatially united oligochain and oligodomain substructures subject to a unifying phylogenetic pressure, in contrast to the currently considered evolution of single-chain structural units such as the domains and interdomain connections mentioned above.

Multiple important IS are shown in Table 1 (Refs. [7-86]). As Table 1 shows, each functional site is in fact formed by a superposition (coincidence) of the various structural levels listed in the upper part of the table. This means that physically interacting or chemically reacting specific (or sometimes alternative) amino acid residues (aa), as well as alternative, simultaneously acting aa of short linear motifs (SLM) or catalytic motifs, represent only the first, "contact"-related structural level of IS important for triggering functional events (cf. Chapters 3 and 4). Secondary structures then constitute spatial carriers of these "contact" primary structures (cf. e.g. Ref. [87]). Moreover, cooperating "contact" structures can be present in different ways in single or several secondary structures of the same fold, or of other folds located in the same or even in other domains (cf. Refs. [5,6]), thus forming the hierarchy of structural levels necessary for assembly of an individual functional site (cf. Table 1). Beyond this "contact" simplification, variously distant and extended protein structures can undergo charge-induced changes and distinct forms of energy exchange, which are in part observable by different types of spectroscopic and NMR methods. All these changes and their consequences participate in a process yielding a phylogenetically optimized functional response of reacting/interacting proteins. The process should correspond to molecular orbital rearrangement, perhaps including some changes in structural fluctuations (cf. Ref. [88]). The response is then mediated by structural changes that usually occur in effector segments not containing "contact" structures. Changes in effector segments are thus involved, for instance, in the well-known sentinel-like recycling of enzyme 3D structures that accompanies repeated substrate-product conversion, in signal transmission through membrane molecules (cf. allosteric regulation in Chapter 4), and in temporary functional agglomerations of intracellular signaling molecules or membrane receptors together with other membrane molecules (cf. Ref. [89]).

Table 1 Broadly occurring and some more special intradomain structures (IS). For each structure, the short description is followed by experimental and database indications and predictions.

Sequence-related IS

Short linear motif(s) (SLM)
Description: * Short sequence structures (frequently 3-10 aa) of biological importance (proved in published papers and/or database-annotated experiments), often occurring more frequently than expected. / SLM are investigated above all in interaction studies (cf. ch3). In addition, SLM can determine other properties of proteins (e.g. flexibility or elasticity). / SLM can interact only when they are sufficiently exposed in space. Such exposure follows from superposition of the SLM with suitable conformation-related structures, or from new accessibility after proteolysis.
Indications and predictions: Discriminating annotated, biologically relevant SLM from stochastically occurring non-functional instances requires database agreement (e.g. Refs. [7,8]) or experiments confirming the proposed biological function (SDM, binding assays, enzyme kinetics); interaction interfaces of SLM have been identified across the human proteome [9].

Catalytic motifs
Description: * Motifs composed of 2-10 aa, including at least one catalytic aa (see ch4).
Indications and predictions: SDM; molecular docking; MD; see also ch5 and Ref. [10].

Sequence pattern(s) (SP)
Description: * Short sequence structures derived by searching accessible or private sequence databases and occurring substantially more frequently than expected. / SP are candidates for SLM, short TR and catalytic motifs, or directly form these structures.
Indications and predictions: MSA columns with a monotonous or almost monotonous occurrence of a single aa indicate SP, whose frequencies and specificity can be further re-evaluated using e.g. ScanProsite or PHI BLAST.

Peptide signatures
Description: * Sequence patterns generalized with respect to varying numbers of undetermined inserted aa (UIaa; recorded as X in usual SP) between specific pattern aa. / The increased distance variability follows from insertion-deletion changes at the aa level, which are frequent in some protein families or family groups (see e.g. variable domains of immunoglobulins [11,12]).
Indications and predictions: Patterns reflecting variation in gap numbers in MSA; ScanProsite searches with SP containing more diverse numbers of UIaa than the original MSA-generated SP; CoPS, a comprehensive peptide signature database [13].

Tandem repeat(s) of proteins (TR)
Description: * Sequence structures repeating within a single protein molecule and usually also in the majority of molecules of a family or superfamily. / TR unit lengths range from a single aa (aa repetitions) to more than 100 residues (i.e. the domain range), and the number of repeats sometimes exceeds 100 [14]. TR carry fundamental functions frequently related to human diseases [15,16]. For protein repeats other than TR see ch2.
Indications and predictions: TR can be identified using the recent Protein Repeat Database [17]. For other tools see ch5.

Conserved sequence regions (CSR)
Description: * Sequence segments of highest domain/molecular similarity, observable when a group of related representative sequences is compared. / CSR are necessary for different functions or structural properties of proteins. / Structure-based analogs of CSR (i.e. structurally conserved regions) also exist [18,19].
Indications and predictions: Databases of CSR or conserved domain sequences [20-23]; PHI BLAST with (i) MSA-derived SP and (ii) the corresponding probabilistically restricted consensus sequences [24].

IS derived from the conformation of peptide chains

Secondary structure(s) of proteins (SSP)
Description: * Minimal protein segments of uniform spatial arrangement, at least five aa long. / Only some types of secondary structures frequently participate in interactions (cf. ch3). / Knowledge-based structural classes based on predicted SSP play an important role in understanding protein folding (see below; [25]).
Indications and predictions: Web server related to eight experimental methods of SSP assignment [26]; PTGL, a database for SSP [27]; SSP prediction using multistep learning (SPINE X [28]) or a metapredictor (SymPsiPred [29]).

Folds
Description: * Folds are building blocks of domains. These blocks (at least about twenty aa long) are composed of several SSP (as elements) that together form a supersecondary motif. For instance, the supersecondary motifs ABAB, BABA, BABABB and BLBABA have been described, where A, B and L denote alpha helix, beta sheet and loop, respectively [30-33]. / Folds are chain-related segments with a certain structural autonomy. As is well known, folds frequently retain a phylogenetic relationship even when the superfamily relationship (usually indicated by conserved domain similarity) is lost. The fold repertoire has not been enriched in the last two billion years [34].
Indications and predictions: NMR techniques (e.g. Refs. [35-39]); X-ray methods SAXS and WAXS [40,41]; image reconstruction based on electron density evaluation [42,43]; predictions by an SVM combining four descriptors including profile-profile alignment (DescFold [44]), template-based modeling [45], a knowledge-based approach (CONTSOR [46]), fold-specific PSSM libraries [47], and evaluation of multiple physicochemical aa properties [48].

(Intrinsically) disordered regions (DR)
Description: * Segments without a unique, well-defined 3D structure. / DR are mostly investigated in interactions, because their interactions can evolve more rapidly than those of conventional segments. This follows from the fact that DR are highly flexible and, in many cases, polyvalent owing to alternative conformations (see ch3).
Indications and predictions: NMR and SAXS [49]; ANCHOR [50] estimates energy by combining general disorder tendency with sensitivity to the structural environment; MetaDisorder, an accurate meta-prediction method (based on 13 disorder predictors; [51]).

Protein pockets (PP)
Description: * PP are formed by encapsulated protein surface separated from the outside space and appear to be important for protein interactions, including those with drugs [52,53]. / Inhibitors targeting the gp41 pocket have been developed for AIDS therapy [54].
Indications and predictions: The SiMMap server statistically derives a site-moiety map to recognize interaction preferences between protein pockets and compound (e.g. drug) moieties [55].

3D motifs
Description: * A 3D motif can be defined as a locally conserved, contiguous structural segment with (i) a certain recurrence in non-redundant proteins (more than 3 non-redundant proteins) and (ii) limited sequence similarity (less than 30% sequence identity) [56].
Indications and predictions: Interatomic distances and root mean square deviation (web server RASMOT-3D PRO [4]; structural-alphabet-based strategy [56]).

Functionally and structurally important aa
Description: * Contact aa responsible for intra- or interchain contacts; aa mediating interactions of binding sites; catalytic aa directly involved in the formation of temporary reaction intermediates during an enzyme reaction (see also ch1 and ch3-5).
Indications and predictions: Effects of specific aa modification, SDM, molecular docking, MD; see also Figs. 1 and 3, or our previous review [10].

Intradomain superposition of IS

Intradomain functional sites (IFS)
Description: * Sites initiating, mediating or regulating protein functions (for special forms see below). / Specificity is determined by the presence of (i) a linear motif (catalytic motif or SLM; cf. Fig. 2) or (ii) certain (sometimes alternative) site residues arranged in a suitable conformation (cf. ch1).
Indications and predictions: The possible versatility of NMR methods has been considered in many reviews (e.g. Refs. [57-65]). For combined approaches to IFS prediction see Refs. [66-68], ch5 and our review [10]. For conserved-domain-associated annotations of IFS see Ref. [23].

Binding sites as IFS
Description: * Sites interacting physically with different affinities and specificities. / Mimotopes (i.e. peptides derived from phage display libraries) and minimized recognition units (prepared by protein engineering) are important structures for effective investigation of binding sites (see ch3).
Indications and predictions: NMR combined with other methods [69-73]; classical binding studies of interaction; co-precipitation; interactive microarrays; SDM; protein-protein interaction networks; molecular docking (see also ch5 and Ref. [10]).

IFS involved in enzyme catalysis
Description: * Chemically reacting catalytic sites and physically interacting recognition sites (see ch4). / Although a single specific catalytic domain usually occurs in an enzyme molecule, some active sites of enzymes are composed of two different domains within the same molecule [5,6].
Indications and predictions: Techniques of enzyme kinetics; SDM; molecular docking [10]; MD (ch5); Catalytic Site Atlas [74]; prediction based on 3D structure [66] and local closeness [75].

Allosteric sites (AS) as IS
Description: * Regulatory sites located distantly from the corresponding regulated functional site (see ch4). / AS were originally found in some enzymes; nevertheless, they can also regulate receptor signaling and membrane transport. Whether altered acidity of the environment can also act via allosteric sites, at least in some cases, remains an open question (cf. Refs. [14,76]).
Indications and predictions: Recent attempts to investigate allostery using NMR [77,78]; methods of enzyme kinetics; SDM; MD; prediction based on local closeness (geometry-based generic predictor [75]).

Target sites for proteolytic activation
Description: * Besides allosteric activation, limited proteolysis can also cause changes in conformation or stereo-chemical site accessibility in other enzymes or regulatory molecules, activating their action (e.g. complement or coagulation systems, generation of matrikines by metalloproteinases). / Some sterically accessible SLM in proteins can also act as targets for proteolytic cleavage [79].
Indications and predictions: Current techniques such as N-terminal sequencing and SDM; mass spectrometry of proteolytic fragments [80,81]; databases such as BIOPEP, DegraBase and TOPPR [82-84].

SLM/SP with/within 3D context
Description: * Evaluation of 3D environments enables selection of stereo-chemically well accessible SLM/SP. / Had the SLM not been known previously, the clustering would have suggested their existence [85].
Indications and predictions: In silico clustering of protein microenvironments [85]; SLM on DIET (topical SLM 3D structures) [86].

MSA, multiple sequence alignment; ch1-ch5, Chapters 1-5, respectively; *, term explanation/explanatory note; /, selected comments; DIET, domain interface extraction; MD, molecular dynamics; NMR, nuclear magnetic resonance; SDM, site-directed mutagenesis; SVM, support vector machines. For other abbreviations see the first line of each entry in this table or the Abbreviations list.
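Table 1 notes that MSA columns with a monotonous residue distribution indicate sequence patterns (SP), and that peptide signatures generalize SP by allowing varying numbers of undetermined inserted aa (UIaa). The Python sketch below illustrates both ideas under simplifying assumptions. The 0.9 conservation threshold, the small supported subset of PROSITE-like elements (a residue, a [..] alternative, x, x(n), x(m,n)) and all function names (msa_to_pattern, prosite_to_regex, count_matches, generalize) are hypothetical choices for illustration. They do not reproduce the behavior of ScanProsite or PHI BLAST. Matches are counted at every start position, so overlapping hits, such as the mostly overlapping collagen GPXGP repeats discussed in Chapter 2, are all counted.

```python
import re
from collections import Counter

# Simplified PROSITE-like element (an assumption): residue (A), alternatives ([ST])
# or wildcard x/X, optionally followed by a count (n) or a range (m,n).
TOKEN = re.compile(r"^([xX]|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?$")


def gap_element(msa, columns):
    """Describe a run of non-conserved columns as x, x(n) or x(m,n), where m..n is
    the range of non-gap residues the aligned sequences place in that run."""
    counts = [sum(seq[i] != "-" for i in columns) for seq in msa]
    lo, hi = min(counts), max(counts)
    if hi == 0:
        return None  # all sequences have gaps (or the run is empty): no element
    if lo == hi:
        return "x" if lo == 1 else f"x({lo})"
    return f"x({lo},{hi})"


def msa_to_pattern(msa, min_fraction=0.9):
    """Derive an SP from an MSA: a column whose most common symbol is a residue
    present in >= min_fraction of the rows becomes that residue; runs of other
    columns between two conserved columns become undetermined positions."""
    conserved = {}
    for i, column in enumerate(zip(*msa)):
        aa, count = Counter(column).most_common(1)[0]
        if aa != "-" and count / len(column) >= min_fraction:
            conserved[i] = aa
    positions = sorted(conserved)
    if not positions:
        return ""
    elements = [conserved[positions[0]]]
    for prev, cur in zip(positions, positions[1:]):
        gap = gap_element(msa, range(prev + 1, cur))
        if gap:
            elements.append(gap)
        elements.append(conserved[cur])
    return "-".join(elements)


def prosite_to_regex(pattern):
    """Translate the supported PROSITE-like subset into a Python regex string."""
    parts = []
    for token in pattern.split("-"):
        m = TOKEN.match(token)
        if m is None:
            raise ValueError(f"unsupported pattern element: {token!r}")
        symbol, lo, hi = m.groups()
        parts.append("." if symbol in ("x", "X") else symbol)
        if hi is not None:
            parts.append(f"{{{lo},{hi}}}")
        elif lo is not None:
            parts.append(f"{{{lo}}}")
    return "".join(parts)


def count_matches(pattern, sequence):
    """Count start positions where the pattern matches; overlapping hits all count."""
    lookahead = re.compile(f"(?=({prosite_to_regex(pattern)}))")
    return sum(1 for _ in lookahead.finditer(sequence))


def generalize(pattern, extra=1):
    """Widen every undetermined run by `extra` on both sides (lower bound >= 0),
    giving a peptide-signature-like pattern with more diverse numbers of UIaa."""
    widened = []
    for token in pattern.split("-"):
        m = TOKEN.match(token)
        if m is None or m.group(1) not in ("x", "X"):
            widened.append(token)
            continue
        lo = int(m.group(2) or 1)
        hi = int(m.group(3) or lo)
        widened.append(f"x({max(0, lo - extra)},{hi + extra})")
    return "-".join(widened)


if __name__ == "__main__":
    toy_msa = ["GKCAPGHC", "GKCSSGHC", "GRC--GHC"]  # hypothetical alignment
    sp = msa_to_pattern(toy_msa)
    print(sp)                                         # G-x-C-x(0,2)-G-H-C
    print(prosite_to_regex(sp))                       # G.C.{0,2}GHC
    print(count_matches(sp, "MGKCGHCAAGRCTTGHC"))     # 2
    print(generalize(sp))                             # G-x(0,2)-C-x(0,3)-G-H-C
    print(count_matches("G-P-x-G-P", "GPAGPPGPAGP"))  # 3 (overlapping hits)
```

The gap column in the toy alignment becomes x(0,2) because one sequence has no residues there. This is the kind of gap-number variation that Table 1 associates with peptide signatures. generalize then widens such ranges further, which stands in for a search with more diverse numbers of UIaa.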

Given the multilevel structural composition of (usually intradomain) functional sites discussed above, new strategies have been developed to improve structure-based site prediction. These strategies combine, with appropriate weighting, multiple bioinformatic tools that assess different structural similarities derived from comparison of 3D structures, computed electrostatic properties, fold recognition and, sometimes, parameters derived from the corresponding multiple sequence alignment (MSA; see Table 1, Refs. [66-68], and our previous review [10]). Since 3D structure turns out to be very important for functional site prediction, better visualization of 3D records is also useful in many kinds of structure-based research. In addition to on-line re-evaluation of stereo-chemical accessibility with programs that display the location of an SLM or epitope in protein 3D structures [90-93], contact maps offer a clear transformed representation ("abstract visualization") of protein structures. Contact maps make it possible to search for and evaluate 3D similarities or to indicate dynamic changes in 3D (Fig. 1; [94,95]). They are derived from matrices containing either reciprocal distances of aa (static maps; [94,96-99]) or correlated relative changes of such distances (dynamic maps; [95,100]).

Fig. 1. Two examples of contact maps. Part A illustrates the corresponding aa distances in ribosomal protein L30 from Thermus thermophilus, whereas part B shows the related static contact map demonstrating the occurrence of predicted intrachain (black squares near the diagonal) or interchain (other black squares) contacts (adapted from Vehlow [94]). Both types of contacts are always intramolecular. The dynamic contact maps in this figure indicate changes during the reaction of aspartokinase III (E. coli) in the presence (section C) or absence (section E) of the allosteric inhibitor lysine (adapted from Chen [95]). The structures of the regulatory domain, C-lobe and catalytic domain are shown in section D (sectors A, B and C, respectively). The changes in the uninfluenced (non-inhibited) catalytic site appear to be more contrasting with respect to the recorded correlated (red elements) and anticorrelated (blue elements) local aa movements.

In optimal cases, functionally important 3D structural agreement can be indicated by both sequence and fold similarities [101]. In other cases, at least five different effects complicate the usual prediction of 3D similarities between different proteins. Some of these effects can also influence the structural stability and interactivity of the investigated proteins. First, some changes in amino acid content accompanying evolution markedly influence the 3D structure. This concerns mainly the occurrence of cysteines, whose disulfide bridges are sometimes able to deform the 3D structure, and some reciprocal transitions between hydrophilic and hydrophobic aa. In line with the latter, a prevailing local content of hydrophobic aa usually predetermines strands, whereas segments containing hydrophilic residues usually form loops, which are frequently exposed regions [101,102]. Second, some fold-related secondary structures (constituting a supersecondary motif) can change due to allostery (see also Chapter 4). For instance, lanthanide binding to helix-turn-helix proteins increases the alpha-helix content from 20% in the absence of the metal to 38% or 35% in the presence of Eu3+ or La3+ ions, respectively [103]. Consistent with this and other observations, supersecondary motifs with different degrees of autonomy with respect to their substructural stability have been described [33]. Third, both primary and secondary structures of protein segments are influenced by important hydrophobic or hydrophilic intramolecular chain contacts that frequently modify 3D structures [104,105]. Fourth, elimination of domain segments can be critical if one of two preferentially packing secondary structures is lost. For instance, the remarkable tendency of antiparallel helices to align with parallel beta-sheet strands is independent of topological constraints or of the prevalence of beta-alpha-beta motifs in the proteins [106]. Fifth, interaction with water can also affect the final structures of proteins. Adding a physically motivated, non-pairwise-additive model of water-mediated interactions to a protein structure prediction Hamiltonian markedly improved the quality of structure prediction for larger proteins [107]. Specific water-mediated interactions were shown to be a universal feature of biomolecular recognition landscapes in both folding and binding [107]. A more detailed investigation of solvent effects on protein structures has also used the Fragment Molecular Orbital method, which represents the state of the art of current quantum biochemistry [108,109]. Interestingly, reflecting the structural/functional importance of water molecules located in and interacting with protein structures, some authors use terms such as catalytic water, conserved water and disordered water [110-114].

Many examples of different structures, their relationships, superpositions and local coincidences are dealt with in the following text or demonstrated in the accompanying illustrations, thus complementing the facts displayed in Table 1. We also mention here some structurally important properties of proteins, although the full complexity of such problems exceeds the scope of this text (cf. e.g. the 531 physico-chemical properties evaluated in the paper by Tung [115]).

2. Peptide repeats and amino acid repetitions

We usually distinguish between tandem repeats (TR) with a characteristic family-related domain relationship (domain-related TR), short bi-, tri- and tetrapeptide TR, and aa repetitions as minimal TR. Domain-related TR occur in 14% of all proteins [14].
Many of these TR possess regular secondary structures and form multirepeat assemblies of diverse sizes and functions in three dimensions [14,116]. Such assemblies appear to be functionally important. For instance, protein families containing repeats with similar primary and three-dimensional structures are mainly involved in protein-protein interactions and in binding to other types of ligand molecules [117]. Fibronectin 3-related TR occur in 2% of animal protein sequences [118]. Shorter epidermal growth factor (EGF)-related TR are often present in cell adhesion molecules, which are frequently involved in the regulation of cell proliferation, differentiation and apoptosis. EGF-related TR include SLM similar to the fragment LDSYQCT of human alpha-fetoprotein [119]. ScanProsite searches suggest a large number of short TR in collagen sequences. For instance, the sequence GPP forms 68 repeats in collagen IV alpha 5, whereas 65 mostly overlapping repeats of the sequence GPXGP are present in the collagen I alpha 1 molecule.

Two types of aa repetitions can be distinguished: (i) aa-rich segments/regions with a dense, diffuse occurrence of a single aa, and (ii) segments with monotonous aa sequences, including frequent groups of hydrophobic zippers and hydrophilic or charged poly-aa regions that adopt beta-hairpin-based or alpha-helical conformations and sometimes possibly also polar zippers or nanotube-like structures (cf. Ref. [120]). Since aa repetitions are covered less often in reviews, we deal with them here in more detail. Besides the most extensive segments enriched in a single aa species, i.e. proline-rich and cysteine-rich regions, studies of arginine-, glutamine-, glycine-, histidine- and serine-rich regions are also frequent. The well-known proline-rich regions are important for protein flexibility, which follows from the low energy difference between the trans and cis configurations of prolyl residues [121]. Among other things, proline-rich regions form segments recognized by autoantibodies [122,123] and are important for the function of some cytochromes P450 [124]. Cysteine-rich regions act as important parts of functional domains of enzymes, i.e. binding or catalytic sites [125-127]. Besides the extensively studied proline- and cysteine-rich regions, leucine and isoleucine zippers are the most frequently investigated. Leucine zippers are characteristic of transcription factors of the basic leucine zipper (bZIP) family, e.g. cAMP response element binding protein (CREB; [128]). Leucine/isoleucine zippers participate in interactions regulating the activities of several ion channels dependent on beta-adrenergic receptors and calcium pumps in heart muscle cells (e.g. KCNQ1; [129]). Even more frequently investigated than the widely known leucine zippers are polyglutamine protein segments (PolyQ). PolyQ are involved in the pathogenesis of polyglutamine diseases, including Huntington disease (caused in part by a gain-of-function mechanism of neuronal toxicity [130]). The pathogenic mechanism includes protein conformational changes, possibly accompanied by selection of PolyQ structures. Such changes result in the formation of the widely known beta-sheet-rich aggregates [120,130].

Although only some aa repetitions are located within domains, and others frequently form interdomain connections (leucine zippers of transmembrane segments) and closely related domain relicts (hinge regions of Ig), we believe that a clear overview of aa repetitions is useful to demonstrate the tendencies of different aa to coincide in different protein segments. Such illustrative model relationships for human protein sequences are presented in Table 2. Among other things, this table offers an interesting comparison between aa that differ only by a single CH2 group in their side chains. For instance, model occurrences of poly-glutamate or PolyQ are much more frequent than the corresponding occurrences of poly-aspartate or poly-asparagine chains. This suggests that better exposure of charged or polar structures is linked to more frequent segments with monotonous sequences. Our tentative explanation of the opposite difference between the occurrences of model poly-glycine and poly-alanine is the lower steric hindrance of poly-glycine chains. Since leucine zippers are often functionally restricted by the thickness of the phospholipid membrane, their occurrence in longer segments with monotonous sequences is, as expected, low (see Table 2). Model serine- and proline-rich decapeptide patterns reach the highest frequencies (higher than 1600) with the highest aa insertion limit in Table 2. This suggests some general importance of this type of segment (e.g. participation in protein chain flexibility in the case of proline). If we assume that negative selection of potential autoepitope structures occurred during evolution, the low frequencies of the shortest model peptide patterns in Table 2 would indicate a possible autoepitope relationship. Consistent with this possibility, both epitope-related aromatic aa [131,132] and the exposed methylated sulfide groups of methionine [132,133] form low-frequency model patterns (Table 2). It remains an open question whether the low frequencies of isoleucine- or valine-rich model patterns can also indicate potential autoepitope aa, in agreement with the fact that aliphatic autoepitopes different from usual epitopes do exist (cf. Ref. [134]).

In addition to the conventional TR mentioned above, two groups of less similar protein repeat sequences exist: (i) quasi-repeats and (ii) heptad repeats. Quasi-repeats are less strict, poorly conserved repeats that contain few or no common aa (i.e. positionally conserved aa), but can still be determined using a well-defined (probabilistically restricted) consensus sequence. They are, for instance, effectively linked to the beta-sheet secondary structure of keratin [135]. Quasi-repeats are also the main repeating structures of the variable chains of immunoglobulins (Ig; cf. Ref. [136]). If the repeats mentioned above are compared to castles, the even less precisely defined heptad repeats resemble ancient castle ruins: their sequences are formed by characters representing groups of aa (hydrophobic, charged, polar, etc.) rather than individual residues.

Table 2 Frequencies of human protein molecules containing regions with model amino acid repetitions. Column headings give mvi/nra.

aa  0/10  0/20  0/30  1/10  1/20  1/30  2/10  2/20  2/30  3/10  3/20  3/30
A    142     1     0   277     8     0   412    21     3   723    39     8
C      0     0     0     3     0     0     8     0     0    70    26    10
D      9     0     0    33     2     0    60     3     1    98     4     1
E    185    13     1   488    39     9   794    83    14  1179   144    59
F      0     0     0     1     0     0     3     0     0     3     0     0
G     95     3     0   363    47     3   808   241   142  1109   293   151
H     39     0     0    82     0     0   105     2     1   120     7     1
I      0     0     0     0     0     0     0     0     0     4     0     0
K      1     0     0   107     3     0   221    11     0   436    23     4
L     28     0     0   138     0     0   322     0     0   704     3     0
M      0     0     0     0     0     0     0     0     0     1     0     0
N      0     0     0     0     0     0     3     0     0     7     0     0
P    125    12     0   399    21     0   869    74     6  1627   158    38
Q    144    50     9   205    68    21   279    85    32   408    99    38
R      2     0     0    69     6     0   123    12     1   273    60    14
S    106    28     0   307    55    14   868    94    25  1677   135    36
T      4     0     0    28     0     0   113     8     1   200    22     5
V      0     0     0     2     0     0     9     0     0    34     0     0
W      0     0     0     0     0     0     0     0     0     0     0     0
Y      0     0     0     0     0     0     9     0     0    15     0     0

aa, amino acids forming the model repetitions, in single-letter code; mvi, maximum number of model variable aa insertions among repeating aa; nra, model number of repeating aa. The necessary ScanProsite searches were performed repeatedly in April and August 2012. For additional comments see Chapter 2.
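Table 2 is built from ScanProsite searches with model repetitions defined by nra and mvi, but the exact query strings are not given in the text. The sketch below therefore uses one plausible reading, which is an assumption: a region consists of nra copies of one aa, with 0 to mvi arbitrary residues allowed between consecutive copies, and a molecule is counted once if it contains at least one such region. The helpers model_repetition_pattern, model_repetition_regex and count_molecules are hypothetical, and the toy sequences are invented rather than taken from human proteins.

```python
import re


def model_repetition_pattern(aa, nra, mvi):
    """PROSITE-like model repetition: nra copies of aa with 0..mvi arbitrary
    residues between neighbouring copies (hypothetical reading of Table 2)."""
    if mvi == 0:
        return f"{aa}({nra})"
    return "-".join([aa] + [f"x(0,{mvi})-{aa}"] * (nra - 1))


def model_repetition_regex(aa, nra, mvi):
    """Compiled regex equivalent to model_repetition_pattern(aa, nra, mvi)."""
    if mvi == 0:
        return re.compile(f"{aa}{{{nra}}}")
    return re.compile(f"{aa}(?:.{{0,{mvi}}}{aa}){{{nra - 1}}}")


def count_molecules(sequences, aa, nra, mvi):
    """Number of sequences containing at least one model repetition region."""
    regex = model_repetition_regex(aa, nra, mvi)
    return sum(1 for seq in sequences if regex.search(seq))


if __name__ == "__main__":
    # Invented toy sequences: an uninterrupted Q10 run, ten Q in pairs split by A,
    # and a serine-rich stretch with only six S.
    toy = ["M" + "Q" * 10 + "A", "M" + "QQA" * 4 + "QQK", "MSTSSTSSTS"]
    print(model_repetition_pattern("Q", 3, 2))  # Q-x(0,2)-Q-x(0,2)-Q
    print(count_molecules(toy, "Q", 10, 0))     # 1
    print(count_molecules(toy, "Q", 10, 1))     # 2
```

Under this reading, any region that matches with mvi = k also matches with mvi = k + 1, and a region of 20 repeats contains one of 10. Counts therefore cannot decrease as mvi grows or increase as nra grows. Every row of Table 2 shows both trends, which is at least consistent with this interpretation.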
Heptad repeats have been frequently identified in virology, because viruses rapidly alter their sequences due to their extremely fast multiplication and low-fidelity replications [137,138]. Less frequently, heptad repeats were also used as topological markers during structural analysis of vertebrate proteins [139]. 3. Interacting sites within domains Some recent studies of interactive networks indicate differences in predicted levels of interactions, distinguishing between motife motif, motifedomain and domainedomain interactions [140]. In addition to conventional models of proteineprotein interaction based on complementarity (i.e “lock-and-key” and “hand-in-glove” models) [141], three new models have been recently developed for docking studies of proteineprotein interactions: (i) conformer selection model using a novel ensemble docking algorithm, (ii) induced fit model employing energy-gradient-based backbone minimization, and (iii) combined conformer selection/induced fit model [142]. Typically interacting protein domains generally recognize a core determinant, with flanking or noncontiguous residues providing additional contacts and an element of selectivity [1]. Interacting SLM (cf. also Chapter 1 and Table 1) are dealt with in many reviews which include general lists of motifs [1,89], immunotyrosine-based inhibitory motifs participating in negative cell signaling within many organisms [143], SLM mediating interactions within phospholipid membranes [144], SLM involved in the endocytosis of membrane receptors [145], endosomal sorting [146] and post-translational modification [147]. Many examples of domains containing multiple interaction sites of SLM extent were described in the review of Pawson and Nash [1] and in the book of Alberts [148]. The RGD motif represents a good example for demonstration of SLM properties. 
In the fibronectin molecule, the critical RGD motif lies within the loop between two elements forming a sandwich supersecondary structure of anti-parallel beta strands (Fig. 2; [149]).

Fig. 2. Conformation-dependent spatial accessibility of the short linear motif RGD. An unusually small rotation between fibronectin domains 9 and 10 creates two distinct alternative conformations (cf. space-filling models A and B). In the selected case (space-filling model A and the corresponding ribbon diagram D), the RGD loop of domain 10 and the “synergy” region of domain 9 (SYN) lie on the same face of the modeled domains 7-10 and are thus easily accessible for interaction with integrin molecules. As shown for domain 10 in schematic diagram C, each fibronectin domain consists of two beta sheets, one of four strands (G, F, C, and C′) and one of three strands (A, B, and E), arranged as a beta sandwich. Adapted from Leahy [149].

This loop makes RGD accessible when the fibronectin molecule adopts a favorable conformation. Phage display library studies show that RGD motifs interact with various integrins with different affinities. These affinities depend, among other factors, on the aa neighboring the RGD (e.g. GRGDSP belongs to the optimized structures interacting with integrins; [150]). In other phage experiments, RGD-containing peptides could inhibit the interaction of an unrelated peptide (RETAWACGA) with the cell surface nonreciprocally, i.e. the unrelated peptide did not inhibit in return. This indicates that the aa of integrin binding sites responsible for interactions with the respective peptide ligands may partially differ [150]. Such differences would then explain the observed inhibition hierarchy of interacting SLM. Two types of secondary structures frequently participate in interactions. Loops often determine the functional specificity of a given protein framework, contributing to active and binding sites such as antibody complementarity determining regions (CDR), ligand binding sites (ATP, calcium and NAD(P) binding sites), and DNA binding or enzyme active sites (e.g. Ser/Thr kinases or serine proteases; [151]). Consistent with this, functional differences between members of the same protein family usually result from structural differences on the protein surface, which frequently correspond to exposed loop regions [151]. Like loops, alpha-helical coiled-coil structures frequently participate in highly specific homomeric or heteromeric protein-protein interactions [152]. Coiled coils are among the most ubiquitous folding structures in proteins: besides occurring in structural proteins, they play an important role in various intracellular regulatory processes and in membrane fusion [153].
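Because the affinity of RGD motifs depends on the neighboring aa, a sequence scan for an SLM is most informative when it reports the motif together with its context. The Python sketch below uses the standard `re` module to list every occurrence of a motif with a few flanking residues; the toy sequence and the helper name `find_motif_with_flanks` are hypothetical, and a plain sequence scan cannot tell whether the motif is actually exposed in the folded protein (cf. Fig. 2).

```python
import re


def find_motif_with_flanks(seq: str, pattern: str, flank: int = 2) -> list:
    """Return (start, match, context) for every occurrence of a regex pattern.

    start is 0-based; context adds up to `flank` residues on each side.
    The zero-width lookahead wrapper also reports overlapping occurrences.
    """
    hits = []
    for m in re.finditer(f"(?=({pattern}))", seq):
        start = m.start()
        match = m.group(1)
        context = seq[max(0, start - flank): start + len(match) + flank]
        hits.append((start, match, context))
    return hits


toy = "AAGRGDSPAARGDA"  # hypothetical sequence, not fibronectin
for start, match, context in find_motif_with_flanks(toy, "RGD"):
    print(start, match, context)
# 3 RGD AGRGDSP
# 10 RGD AARGDA
```

Passing an extended pattern such as `GRGDSP` to the same function distinguishes occurrences embedded in an optimized context from bare RGD tripeptides.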
Recently accumulated evidence has indicated that native proteins do not necessarily require a unique 3D structure to be biologically active, and in some cases structural disorder or intrinsic flexibility is a prerequisite for their function [154-157]. The interactions of the corresponding disordered proteins or disordered regions (cf. Table 1) are accompanied by conformational selection, after which only one of the fluctuating conformations
can interact [156]. Such conformational variability broadens the repertoire of interactions and facilitates the evolution of specificities, but complicates molecular docking analysis (cf. Ref. [10]). Disordered binding regions can be less than 30 residues long [156], which most likely corresponds to the size of supersecondary structures. Conformational selection of disordered regions is important, for instance, in the regulatory networks of p53, a protein very frequently involved in carcinogenesis [158]. The short N-terminal domain of p53 is devoid of tertiary structure and largely lacks secondary structure elements [159]. This domain shows only weak helical preferences in the unbound form, whereas binding converts it into a well-defined alpha-helix. The interactions of p53 are largely mediated by three hydrophobic residues, Phe19, Trp23 and Leu26, which also fit into the cleft on the surface of the p53 inhibitor (oncoprotein) MDM2 [160]. Consistent with the flexibility of disordered regions, proline-mediated flexibility based on trans/cis isomerization of proline residues was described many years ago ([121]; cf. Chapter 2), perhaps defining a special type of disordered region. This flexibility appears to be important for many different interactions. For instance, the structural alteration of the flexible C-terminal domain (CTD) of RNA polymerase II catalyzed by the prolyl isomerase Ess1 enables dephosphorylation of the CTD and thus influences the repertoire of genes transcribed by this polymerase [161]. SH3 domains are small protein modules of 60-85 amino acids that bind various short proline-rich sequences with differing, moderate-to-low affinity and specificity [162,163]. Interactions with SH3 domains play a crucial role in the regulation of many cellular processes (some, e.g.
SH3 domains of Abl, Grb2, Yes, p41 and myelin basic protein, are related to human or model mouse and avian cancers, AIDS and autoimmunity) and have therefore been attractive targets for drug design [163-166]. A more detailed analysis of protein structures revealed flexibility of both SH3 binding sites and proline regions [163,165]. This flexibility was reduced upon interaction, indicating the conformational selection typical of disordered regions [162,165]. Proline flexibility possibly also broadens the repertoire of antibody interactions: prolyl and diprolyl residues frequently occur in the framework regions (FR) of Ig, which are inserted between the CDR regions that mostly interact with antigens [89]. Interestingly, certain Ig segments and some phosphopeptides can exhibit consensus-derived peptide similarity ([24,89,167,168]; Table 3). In principle, as found in our previous papers [89,167,168], the probabilistically restricted consensus of artificially prepared PKSI (i.e. peptidic protein kinase substrates and inhibitors) was similar to the boundary of the FR1- and CDR1-related segments of the most common aa sequences of different Ig domains. The proposed explanation was based on: (i) horizontal transfer of the corresponding ancestral structural elements [24,167] and/or (ii) convergent optimization of short sequences toward broad structural variability (cf. [167]). Several facts support this sequence relationship. (i) Four of the six codons of the most frequently predicted target substrate serine sites can form hypermutation motifs, which are frequent in CDR1 and CDR2 and functionally important for specific recognition by variable domains of Ig (IgV). (ii) Structural traces of possible insertion/deletion-related relationships at the amino acid level (including “positionally dispersed” aa relationships) were observed in both structural groups, i.e. in PKSI and in IgV (see specific papers [11,12,167]).
(iii) The sequence similarities of the most similar PKSI peptides related to protein kinases A and C were located at the same boundary of the FR1- and CDR1-related segments [167,168] as the on-line predicted endogenous substrate regions of the same specificity found in the sequences of the first MSA-derived record (MSA-R) of Table 3.
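The probabilistically restricted consensus used in Refs. [89,167,168] is not specified in this section. As a simplified stand-in only, the sketch below derives a frequency-threshold consensus from aligned peptides (a column keeps its most frequent residue when that residue reaches a chosen frequency, otherwise it becomes `x`) and scores a segment against the restricted positions. The toy peptides loosely follow the R-R-x-S pattern of protein kinase A substrates; the threshold, the peptides and the function names are assumptions.

```python
from collections import Counter


def threshold_consensus(aligned: list, min_freq: float = 0.5, gap: str = "-") -> str:
    """Frequency-threshold consensus of equal-length aligned peptides.

    A column keeps its most frequent residue if that residue occurs in at least
    `min_freq` of all sequences; otherwise it becomes 'x' (unrestricted).
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("peptides must be aligned to equal length")
    consensus = []
    for col in range(length):
        counts = Counter(s[col] for s in aligned if s[col] != gap)
        if counts:
            residue, n = counts.most_common(1)[0]
            consensus.append(residue if n / len(aligned) >= min_freq else "x")
        else:
            consensus.append("x")
    return "".join(consensus)


def consensus_identity(consensus: str, segment: str) -> float:
    """Fraction of restricted (non-'x') consensus positions matched by segment."""
    pairs = [(c, s) for c, s in zip(consensus, segment) if c != "x"]
    if not pairs:
        return 0.0
    return sum(c == s for c, s in pairs) / len(pairs)


toy_peptides = ["RRASV", "RKASI", "RRATV", "GRASL"]  # hypothetical PKA-like substrates
cons = threshold_consensus(toy_peptides)
print(cons)                                    # RRASV
print(threshold_consensus(toy_peptides, 0.8))  # xxAxx
print(consensus_identity(cons, "RRASL"))       # 0.8
```

Raising the threshold restricts fewer positions, so a stricter consensus is matched by more segments; this trade-off is one reason why a consensus-derived similarity needs independent support such as the facts listed above.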

Duplicity (mostly superior ATM and slightly less valid Aurora-related phosphorylation sites were observed) and multiplicity of predictions were found in searches for endogenous phosphorylated sites in conserved variable Ig domain sequences (igvds) and in pre-selected shark (Carcharhinus plumbeus) IgW as a representative ([23,24]; Table 3), when a cut-off of 0.8 was used (data not shown). This duplicity/multiplicity suggests that the selected maxima may be sensitive to discrete aa sequence changes such as mutations or insertion-deletion changes at the aa level (cf. Refs. [11,12]). In addition, new or better-scoring unclassified sites can be seen in the pre-selected representative shark IgW segment (cf. Ref. [24]), which is the only authentic (non-constructed) Ig sequence in Table 3. Consequently, both the observed duplicity/multiplicity and the displayed IgW relationship of predicted sites raise the question whether the best-evaluated ATM can indeed functionally phosphorylate the predicted sites, or whether another, perhaps unknown or less known, kinase would react specifically. The two strategies for selecting columns dense in predicted sites are described in Table 3. These strategies are based on (i) two independent methods applied to all igvds (strategy 1) and (ii) an evaluation restricted, in agreement with the results of previous studies, to IgV domains, excluding related T-cell receptor (TCR) derived igvds (strategy 2). Used together, these two strategies select only two columns in the MSA-R of Table 3. Interestingly, one of these columns is located at the C-terminus of FR1, i.e. inside the previously described and differently derived PKSI-related region ([24,167,168]; see above).
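The column-selection logic can be sketched as follows, assuming that each predictor returns (sequence, alignment column, score) triples. Sites below the 0.8 cut-off are ignored, and a column counts as dense when a chosen fraction of the evaluated sequences carries a site there. Strategy 1 is read here as agreement of two predictors over all igvds and strategy 2 as restriction to IgV sequences; the density threshold, the data layout, the toy identifiers and this reading of both strategies are hypothetical, and Table 3 gives the actual criteria.

```python
def dense_columns(predictions, seq_ids, n_columns, cutoff=0.8, min_density=0.5):
    """Columns where at least `min_density` of the evaluated sequences carry a
    predicted site scoring >= cutoff.

    predictions: list of (seq_id, column, score); seq_ids outside `seq_ids`
    are ignored.
    """
    evaluated = set(seq_ids)
    if not evaluated:
        return set()
    carriers = [set() for _ in range(n_columns)]  # sequences with a site, per column
    for seq_id, col, score in predictions:
        if seq_id in evaluated and score >= cutoff:
            carriers[col].add(seq_id)
    return {col for col, ids in enumerate(carriers)
            if len(ids) / len(evaluated) >= min_density}


def select_columns(pred_a, pred_b, all_ids, igv_ids, n_columns, **kw):
    """Hypothetical reading of the two column-selection strategies."""
    # Strategy 1: both predictors agree on dense columns over all sequences.
    s1 = (dense_columns(pred_a, all_ids, n_columns, **kw)
          & dense_columns(pred_b, all_ids, n_columns, **kw))
    # Strategy 2: density over IgV sequences only (TCR-derived ones excluded);
    # using predictor A here is an arbitrary choice for the sketch.
    s2 = dense_columns(pred_a, igv_ids, n_columns, **kw)
    return s1 & s2


pa = [("igv1", 3, 0.9), ("igv2", 3, 0.85), ("tcr1", 3, 0.95), ("igv1", 7, 0.9), ("tcr1", 7, 0.5)]
pb = [("igv1", 3, 0.82), ("igv2", 3, 0.9), ("tcr1", 7, 0.99), ("igv1", 7, 0.81)]
print(select_columns(pa, pb, ["igv1", "igv2", "tcr1"], ["igv1", "igv2"], 10))  # {3}
```

Intersecting the strategies keeps only columns that survive both a change of predictor and a change of the evaluated sequence set, which is why so few columns remain.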
Given the use of highly conserved sequences in Table 3, the finally selected predicted phosphorylation sites may correspond at least to: (i) ancestors of antigen receptors or (ii) perhaps some products of “living fossils” among mammalian IgV and TCR genes. We propose three possible effects of the corresponding phosphorylation: (i) alteration of the specificity of given Ig/Ig-related molecules, (ii) functional dissociation of binding sites and epitopes and (iii) antigen recognition or presentation in the context of a phosphate group or an IgV-related phosphopeptide. In line with the last alternative, recognition of phosphopeptides by gamma delta TCR has recently been described [169]. The dense columns in Table 3 also raise another question, i.e. whether the frequencies of the mutations or insertion/deletion changes mentioned above can increase at least in some IgV-related phosphorylated segments of Ig-unrelated proteins. This concerns primarily the predicted kinase-related sites densely occurring in the columns located at positions corresponding to the hypervariable CDR1 and CDR2 regions of the displayed MSA-R. From a biotechnological point of view, two terms appear important when considering domain interactions, i.e. mimotopes and recognition units. The term mimotope denotes various peptides or peptidyls of about 5-20 aa that mimic different protein structures or some organic compounds. Mimotopes have been generated as random, partially random or gene fragment-derived peptides, mostly prepared using phage display libraries selected for the required interaction [170-173]. Sequences and structures of mimotopes have recently been cataloged, together with annotations of their mimicking abilities, in various databases and some program tools. This enables us to predict interacting structures, e.g. alternative epitopes, peptide ligands or small binding sites [174-178], and other structures that often belong to important IS.
Intradomain recognition units (or, more precisely, minimized recognition units) are usually segments ranging from slightly truncated folds to truncated domains. These units can substitute for the native interaction of proteins while keeping sufficient affinity and specificity [179-181].

Table 3 Predicted phosphorylated sites in conserved variable domains of Ig and TCR and similar variable region sequence of representative IgW.

4. IS necessary for enzyme reactions

The usual structural description of an enzyme active site includes catalytic dyads, triads, tetrads or pentads and catalytic motifs. Catalytic dyads, triads, tetrads and pentads represent configurations of two, three, four or five catalytic aa, respectively, participating directly in the enzyme reaction [182-189] (cf. also Chapter 1 and Table 1). These aa are usually located at variously distant sites of a single domain, of several domains or even of several different subunits. Aspartate, histidine, lysine, serine and tyrosine are the most frequent catalytic aa. In agreement with Pearson’s acid/base concept, medium hard Lewis acids (e.g. Mn2+, Mg2+, Co2+, Ni2+, Cu2+, Zn2+) and bases (RNH2, OH−, imidazole) constitute the active sites of enzymes [190-193]. From a more specific point of view, enol forms (e.g. the widely known aldol condensation of serine with carbonyls in the case of serine proteinases), both the electrophilic and the nucleophilic properties of aromatic histidine, low-barrier hydrogen bonds, pre-orientation of general acid residues during non-perfect synchronization in the active site, and metal coordination frequently participate in enzyme reactions [193-195]. An example of the organic reactions underlying an individual enzyme activity is shown in Fig. 3 [196]. Catalytic motifs usually contain (i) a single catalytic aa, (ii) one to nine additional aa in restricted positions, and (iii) sometimes also uncertain/variable positions [197-199]. Interestingly, seven active-site aa (i.e. more than the five aa of the largest, pentad, group) were selected when comparing dispersion interaction energies during a simulated reaction of influenza virus neuraminidase-1 [200]. In fact, these seven aa comprise an aa tetrad, and the three remaining aa (close neighbors of the catalytic aa of the tetrad) perhaps form additional aa of catalytic motifs.
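Catalytic motifs with restricted and variable positions are commonly written in PROSITE-style pattern syntax (cf. Ref. [21]). The sketch below converts a subset of that syntax (single residues, `x`, `[..]`, `{..}` and repetition counts such as `x(2,4)`) into a Python regular expression and reports the position of the designated catalytic residue. The G-x-S-x-G pentapeptide surrounding the nucleophilic serine of many lipases and esterases serves as the example; the toy sequence and the function names are hypothetical, and PROSITE features such as terminal anchors are not handled.

```python
import re

# One PROSITE-style element: residue, x, [allowed] or {forbidden}, optional (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a subset of PROSITE pattern syntax to a regex, one group per element."""
    groups = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                            # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"        # forbidden residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        groups.append("(" + core + ")")
    return "".join(groups)


def catalytic_positions(seq: str, pattern: str, catalytic_element: int) -> list:
    """0-based positions of the residue matched by the given pattern element
    (0 = first element) in every, possibly overlapping, motif occurrence."""
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start(catalytic_element + 1) for m in regex.finditer(seq)]


print(prosite_to_regex("G-x-S-x-G."))                    # (G)(.)(S)(.)(G)
print(catalytic_positions("AAGHSQGAA", "G-x-S-x-G", 2))  # [4]
```

Wrapping every element in its own group keeps the catalytic residue addressable even when a variable-length element such as `x(2,4)` precedes it.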
As demonstrated by enzyme engineering, additional sites of mini-enzymes can sometimes interact with distinct regions of substrates, thus yielding mini-enzymes with improved specificity [201]. In such a case we speak about an additional interaction context of recognized substrate structures. In the more differentiated terminology of larger interacting molecules, we then distinguish several functional entities: (i) structures recognizing the donor or acceptor, sometimes including coenzymes or prosthetic groups [202-204], (ii) sites responsible for subunit assembly [205-207], (iii) flexible segments [208-210] and (iv) allosteric sites (see below). When the structures recognizing the donor or acceptor are located farther from the active site, we identify separate recognition sites, e.g. (i) sites of DNA and RNA polymerases interacting with specific origin oligonucleotides of DNA, (ii) matrix metalloproteinase (MMP) substrate binding sites, termed exosites, on domains located outside the catalytic MMP domains, and possibly also (iii) the G-loop interacting site of the enzyme AID (activation-induced cytidine deaminase) involved in hypermutation [147,211-214]. Sites interacting with identical coenzymes or cofactors can sometimes exhibit convergent similarities in supersecondary structures even in unrelated enzymes. This concerns, among others, five pyridoxal phosphate-dependent enzymes (aspartate aminotransferase, alanine racemase, the beta subunit of tryptophan synthase, D-amino acid aminotransferase and glycogen phosphorylase), whose pyridoxal phosphate binding domains share seven common structural segments [215]. Mechanisms of allosteric regulation positively or negatively influence numerous enzyme activities as well as the less investigated related actions of various membrane receptors, acceptors and transporters. The corresponding process, the allosteric transition, is in fact an enthalpy-related change of a protein molecule triggered by interaction with, or modification of, an allosteric (regulatory) site. Since different thermodynamic flows can participate, allosteric transitions are not always accompanied by a marked change of shape, as was originally assumed (cf. Ref. [216]).
This means that an initial allosteric event triggers a cascade of structural and/or less visible energetic changes passing through the molecule to a distant functional (regulated) site (the active site in the case of enzymes; [147,216,217]). Protein-protein interactions via allosteric sites, or interactions of these sites with small organic ligands and metal ions, frequently trigger allosteric transitions [3]. Other allosteric sites are modified by phosphorylation or by disulfide formation or breaking in specific segments [3]. Allosteric transitions are widespread, i.e. they can be observed in all taxa of organisms, including Archaea and vertebrates. Many allosteric changes are accompanied either by quite dramatic structural changes, such as hinge motion at the boundary of two domains, or by more subtle changes, such as the rotation of a single side chain controlling the entrance to the functional binding pocket [3]. In addition, a more detailed structural analysis of allosteric transitions revealed many different special events resulting from triggering an allosteric site (cf. also Fig. 1). These include, for instance, the transition from alpha- to pi-helix and pi deformations tightly related to other local structural motifs in c-erbB-2 kinase [218], two-step conformational changes accompanied by the formation of salt bridges in PKC-theta [219], and the assembly of a regulatory spine, proposed as the crucial step in the activation of multiple protein kinases (including Tec kinases) or at least in the propagation of their allosteric transition [220-222]. The regulatory spine consists of disparate residues that span the N- and C-terminal lobes of the corresponding kinase domain [220,221]. Since allostery offers a highly specific way to modulate protein function, understanding its mechanism is of increasing interest for protein science and drug discovery [217].

Fig. 3. Mechanism of the reaction catalyzed by protein arginine deiminase 4 (PAD4). The organic chemistry-based scheme agrees with the results of a QM/MM (quantum mechanics/molecular mechanics) simulation. Two steps of the reaction can be distinguished, i.e. deimination (structures colored in black) and hydrolysis (structures colored in gray). Though the scheme does not precisely depict the modeled stereochemical conditions, the main features of the chemical changes are clearly visible, in contrast to usual spatial molecular models. PAD4 also plays a critical role in rheumatoid arthritis (RA) and gene regulation. Adapted from Ke [196].

5. Brief comment on in silico tools important for IS investigation

Some repeat-related databases comprise general information [17,223,224], whereas other tools are more specific and concern, for instance, repeats present in lower eukaryotic pathogens [225], leucine-rich repeats found, among others, in functionally important Toll-like receptors [226,227], or PolyQ repeats related to neuronal toxicities (cf. Chapter 2; [129]).
Many bioinformatic tools important for structural-functional analysis of IS were described in our previous paper dealing with interactomics [10]. These include, among others: (i) SLM, (ii) combined searches using different structural alphabets (primary, secondary, supersecondary and corresponding encoding structures of DNA), sometimes together with knowledge-based functional and pathogenetic annotations (e.g. Refs. [228,229]), (iii) searches for 3D motifs, (iv) (static) contact maps (Fig. 1; [94,96-99]), (v) multiple structural alignments [230-233] and (vi) docking studies. Molecular docking also appears to be an important method for investigating polymorphisms in translated parts of genes. For instance, docking studies of pharmacologically important alleles of cytochromes P450 CYP2D6 and CYP2C9 have shown that alleles with distinct aa in their protein sequences may differ in interaction with the cooperating P450 reductase or in the interaction between heme and substrate (due to structural changes in a small cytochrome segment), respectively [234-236]. Last but not least, the group of methods denoted as molecular dynamics simulations (MD) contributes to IS research. MD are sophisticated methods of 3D modeling of the structural dynamics of proteins (including highly sophisticated hybrid methods such as Born-Oppenheimer ab initio quantum mechanics/molecular mechanics with the umbrella sampling method, or methods comprising evaluation of Fragment Molecular Orbitals; [113,196,237]). MD are mainly important for enzyme reactions and allosteric modulations [3,113,196,237,238]. MD help us to propose or re-evaluate enzyme reaction mechanisms with respect to: (i) assembly and dissociation of transition intermediates formed by substrate and catalytic aa [113,237,238], (ii) reciprocal cooperation and energy changes of these aa [113,200,237,238], (iii) roles of catalytic motifs or folds [239,240], (iv) selection or prediction of specifically acting drugs [241-243]
and (v) the network of conserved interactions regulating the allosteric signal [244]. Dynamic contact maps derived by MD not only clearly show different local effects, such as changes in active or allosteric sites and in regions mediating the transmission of allosteric effects (see Fig. 1), but also make it possible to predict functionally important aa sensitive to mutation when evolutionary correlation based on aa conservation is used simultaneously [95]. Some additional bioinformatic tools are also mentioned in Table 1, which deals with different types of IS.

6. Instead of conclusion

Recent bioinformatic tools as well as structural and structural-functional studies have improved markedly, substantially refining our knowledge about IS over the last twenty years. Nevertheless, the molecular anatomy of domains is still considerably incomplete, considering (i) the structural multiplicity of both pathogenic molecular defects and possible sites of therapeutically important interactions, (ii) the multi-level structural rules of evolutionary changes of domains and (iii) the common features of dynamic structural behavior of proteins. Since some structural studies concern increasingly complicated aspects of protein structure, new structural representations, including those of IS, are increasingly necessary (cf. “classical” examples such as MSA, conserved domain similarities, gradual visualization of different IS locations in 3D protein structures, integrated organic reactions of catalytic aa during enzyme reactions and static or dynamic contact maps). Such new representations should reveal less observable or less recognized structural-functional relationships and can thus contribute to a better understanding of the investigated processes.

References

[1] T. Pawson, P. Nash, Assembly of cell regulatory systems through protein interaction domains, Science 300 (2003) 445-452.
[2] C.K. Wang, L. Pan, J. Chen, M. Zhang, Extensions of PDZ domains as important structural and functional elements, Protein Cell 1 (2010) 737-751.
[3] R.A. Laskowski, F. Gerick, J.M. Thornton, The structural basis of allosteric regulation in proteins, FEBS Lett. 583 (2009) 1692-1698.
[4] G. Debret, A. Martel, P. Cuniasse, RASMOT-3D PRO: a 3D motif search webserver, Nucleic Acids Res. 37 (2009) W459-W464.
[5] J.J. Lacapère, N. Bennett, Y. Dupont, F. Guillain, pH and magnesium dependence of ATP binding to sarcoplasmic reticulum ATPase. Evidence that the catalytic ATP-binding site consists of two domains, J. Biol. Chem. 265 (1990) 348-353.
[6] X. Ji, W.W. Johnson, M.A. Sesay, L. Dickert, S.M. Prasad, H.L. Ammon, R.N. Armstrong, G.L. Gilliland, Structure and function of the xenobiotic substrate binding site of a glutathione S-transferase as revealed by X-ray crystallographic analysis of product complexes with the diastereomers of 9-(S-glutathionyl)-10-hydroxy-9,10-dihydrophenanthrene, Biochemistry 33 (1994) 1043-1052.
[7] A. Marsico, K. Scheubert, A. Tuukkanen, A. Henschel, C. Winter, R. Winnenburg, M. Schroeder, MeMotif: a database of linear motifs in alpha-helical transmembrane proteins, Nucleic Acids Res. 38 (2010) D181-D189.
[8] H. Dinkel, S. Michael, R.J. Weatheritt, N.E. Davey, K. Van Roey, B. Altenberg, G. Toedt, B. Uyar, M. Seiler, A. Budd, L. Jödicke, M.A. Dammert, C. Schroeter, M. Hammer, T. Schmidt, P. Jehl, C. McGuigan, M. Dymecka, C. Chica, K. Luck, A. Via, A. Chatr-Aryamontri, N. Haslam, G. Grebnev, R.J. Edwards, M.O. Steinmetz, H. Meiselbach, F. Diella, T.J. Gibson, ELM - the database of eukaryotic linear motifs, Nucleic Acids Res. 40 (2012) D242-D251.
[9] R.J. Weatheritt, K. Luck, E. Petsalaki, N.E. Davey, T.J. Gibson, The identification of short linear motif-mediated interfaces within the human interactome, Bioinformatics 28 (2012) 976-982.
[10] J. Kubrycht, K. Sigler, P. Souček, Virtual interactomics of proteins from biochemical standpoint, Mol. Biol. Int. 2012 (2012). Article ID 976385.
[11] M. Ohlin, C.A. Borrebaeck, Insertions and deletions in hypervariable loops of antibody heavy chains contribute to molecular diversity, Mol. Immunol. 35 (1998) 233-238.
[12] P. Wilson, Y.J. Liu, J. Banchereau, J.D. Capra, V. Pascual, Amino acid insertions and deletions contribute to diversify the human Ig repertoire, Immunol. Rev. 162 (1998) 143-151.
[13] T. Prakash, M. Khandelwal, D. Dasgupta, D. Dash, S.K. Brahmachari, CoPS: comprehensive peptide signature database, Bioinformatics 20 (2004) 2886-2888.
[14] N. Matsushima, H. Yoshida, Y. Kumaki, M. Kamiya, T. Tanaka, Y. Izumi, R.H. Kretsinger, Flexible structures and ligand interactions of tandem repeats consisting of proline, glycine, asparagine, serine, and/or threonine rich oligopeptides in proteins, Curr. Protein Pept. Sci. 9 (2008) 591-610.
[15] A.V. Kajava, A.C. Steven, Beta-rolls, beta-helices, and other beta-solenoid proteins, Adv. Protein Chem. 73 (2006) 55-96.
[16] H.T. Orr, H.Y. Zoghbi, Trinucleotide repeat disorders, Annu. Rev. Neurosci. 30 (2007) 575-621.
[17] J. Jorda, T. Baudrand, A.V. Kajava, PRDB: Protein Repeat DataBase, Proteomics 12 (2012) 1333-1336.
[18] D. Sirim, M. Widmann, F. Wagner, J. Pleiss, Prediction and analysis of the modular structure of cytochrome P450 monooxygenases, BMC Struct. Biol. 10 (2010). Article ID 34.
[19] I.K. Huang, J. Pei, N.V. Grishin, Defining and predicting structurally conserved regions in protein superfamilies, Bioinformatics 29 (2013) 175-181.
[20] A. Marchler-Bauer, A.R. Panchenko, B.A. Shoemaker, P.A. Thiessen, L.Y. Geer, S.H. Bryant, CDD: a database of conserved domain alignments with links to domain three-dimensional structure, Nucleic Acids Res. 30 (2002) 281-283.
[21] N. Hulo, C.J. Sigrist, V. Le Saux, P.S. Langendijk-Genevaux, L. Bordoli, A. Gattiker, E. De Castro, P. Bucher, A. Bairoch, Recent improvements to the PROSITE database, Nucleic Acids Res. 32 (2004) D134-D137.
[22] Q.J. Su, L. Lu, S. Saxonov, D.L. Brutlag, eBLOCKs: enumerating conserved protein blocks to achieve maximal sensitivity and specificity, Nucleic Acids Res. 33 (2005) D178-D182.
[23] A. Marchler-Bauer, S. Lu, J.B. Anderson, F. Chitsaz, M.K. Derbyshire, C. DeWeese-Scott, J.H. Fong, L.Y. Geer, R.C. Geer, N.R. Gonzales, M. Gwadz, D.I. Hurwitz, J.D. Jackson, Z. Ke, C.J. Lanczycki, F. Lu, G.H. Marchler, M. Mullokandov, M.V. Omelchenko, C.L. Robertson, J.S. Song, N. Thanki, R.A. Yamashita, D. Zhang, N. Zhang, C. Zheng, S.H. Bryant, CDD: a Conserved Domain Database for the functional annotation of proteins, Nucleic Acids Res. 39 (2011) D225-D229.
[24] J. Kubrycht, K. Sigler, M. Ruzicka, P. Soucek, J. Borecky, P. Jezek, Ancient phylogenetic beginnings of immunoglobulin hypermutation, J. Mol. Evol. 63 (2006) 691-706.
[25] S. Ding, S. Zhang, Y. Li, T. Wang, A novel protein structural classes prediction method based on predicted secondary structure, Biochimie 94 (2012) 1166-1171.
[26] D.P. Klose, B.A. Wallace, R.W. Janes, 2Struc: the secondary structure server, Bioinformatics 26 (2010) 2624-2625.
[27] P. May, A. Kreuchwig, T. Steinke, I. Koch, PTGL: a database for secondary structure-based protein topologies, Nucleic Acids Res. 38 (2010) D326-D330.
[28] E. Faraggi, T. Zhang, Y. Yang, L. Kurgan, Y. Zhou, SPINE X: improving protein secondary structure prediction by multistep learning coupled with prediction of solvent accessible surface area and backbone torsion angles, J. Comput. Chem. 33 (2012) 259-267.
[29] H.N. Lin, T.Y. Sung, S.Y. Ho, W.L. Hsu, Improving protein secondary structure prediction based on short subsequences with local structure similarity, BMC Genomics 11 (Suppl. 4) (2010). Article ID S4.
[30] S. Kobs-Conrad, H. Lee, A.M. DiGeorge, P.T.P. Kaumaya, Engineered topographic determinants with αβ, βαβ, βαβα topologies show high affinity binding to native protein antigen (lactate dehydrogenase-C4), J. Biol. Chem. 268 (1993) 25285-25295.
[31] D. van der Spoel, H.J. Vogel, H.J. Berendsen, Molecular dynamics simulations of N-terminal peptides from a nucleotide binding protein, Proteins 24 (1996) 450-466.
[32] L.G. Laajoki, E. Le Breton, G.K. Shooter, J.C. Wallace, G.L. Francis, J.A. Carver, M.A. Keniry, Secondary structure determination of 15N-labelled human long-[Arg-3]-insulin-like growth factor 1 by multidimensional NMR spectroscopy, FEBS Lett. 420 (1997) 97-102.
[33] J.C. Horng, V. Moroz, D.J. Rigotti, R. Fairman, D.P. Raleigh, Characterization of large peptide fragments derived from the N-terminal domain of the ribosomal protein L9: definition of the minimum folding motif and characterization of local electrostatic interactions, Biochemistry 41 (2002) 13360-13369.
[34] J. Soding, A.N. Lupas, More than the sum of their parts: on the evolution of proteins from peptides, BioEssays 25 (2003) 837-846.
[35] A.F. Angyán, A. Perczel, S. Pongor, Z. Gáspári, Fast protein fold estimation from NMR-derived distance restraints, Bioinformatics 24 (2008) 272-275.
[36] D.S. Wishart, D. Arndt, M. Berjanskii, P. Tang, J. Zhou, G. Lin, CS23D: a web server for rapid protein structure generation using NMR chemical shifts and sequence data, Nucleic Acids Res. 36 (2008) W496-W502.
[37] R.M. Rasia, E. Lescop, J.F. Palatnik, J. Boisbouvier, B. Brutscher, Rapid measurement of residual dipolar couplings for fast fold elucidation of proteins, J. Biomol. NMR 51 (2011) 369-378.
[38] B.E. Coggins, J.W. Werner-Allen, A. Yan, P. Zhou, Rapid protein global fold determination using ultrasparse sampling, high-dynamic range artifact suppression, and time-shared NOESY, J. Am. Chem. Soc. 134 (2012) 18619-18630.
[39] I. Sengupta, P.S. Nadaud, J.J. Helmus, C.D. Schwieters, C.P. Jaroniec, Protein fold determined by paramagnetic magic-angle spinning solid-state NMR spectroscopy, Nat. Chem. 4 (2012) 410-417.
[40] W. Zheng, S. Doniach, Fold recognition aided by constraints from small angle X-ray scattering data, Protein Eng. Des. Sel. 18 (2005) 209-219.
[41] L. Makowski, D.J. Rodi, S. Mandava, S. Devarapalli, R.F. Fischetti, Characterization of protein fold by wide-angle X-ray solution scattering, J. Mol. Biol. 383 (2008) 731-744.
[42] R. Khayat, G.C. Lander, J.E. Johnson, An automated procedure for detecting protein folds from sub-nanometer resolution electron density, J. Struct. Biol. 170 (2010) 513–521.
[43] M. Saha, M.C. Morais, FOLD-EM: automated fold recognition in medium- and low-resolution (4–15 Å) electron density maps, Bioinformatics 28 (2012) 3265–3273.
[44] R.X. Yan, J.N. Si, C. Wang, Z. Zhang, DescFold: a web server for protein fold recognition, BMC Bioinform. 10 (2006). Article ID 416.
[45] Y. Yang, E. Faraggi, H. Zhao, Y. Zhou, Improving protein fold recognition and template-based modeling by employing probabilistic-based matching between predicted one-dimensional structural properties of query and corresponding native properties of templates, Bioinformatics 27 (2011) 2076–2082.
[46] B. Vishnepolsky, M. Pirtskhalava, CONTSOR – a new knowledge-based fold recognition potential, based on side chain orientation and contacts between residue terminal groups, Protein Sci. 21 (2012) 134–141.
[47] Y. Hong, S.V. Chintapalli, K.D. Ko, G. Bhardwaj, Z. Zhang, D. van Rossum, R.L. Patterson, Predicting protein folds with fold-specific PSSM libraries, PLoS One 6 (2011) e20557.
[48] A. Dehzangi, S. Phon-Amnuaisuk, Fold prediction problem: the application of new physical and physicochemical-based features, Protein Pept. Lett. 18 (2011) 174–185.
[49] N. Sibille, P. Bernadó, Structural characterization of intrinsically disordered proteins by the combined use of NMR and SAXS, Biochem. Soc. Trans. 40 (2012) 955–962.
[50] Z. Dosztányi, B. Mészáros, I. Simon, ANCHOR: web server for predicting protein binding regions in disordered proteins, Bioinformatics 25 (2009) 2745–2746.
[51] L.P. Kozlowski, J.M. Bujnicki, MetaDisorder: a meta-server for the prediction of intrinsic disorder in proteins, BMC Bioinform. 13 (2012). Article ID 111.
[52] W.M. Brown, D.L. Van der Jagt, Creating artificial binding pocket boundaries to improve the efficiency of flexible ligand docking, J. Chem. Inf. Comput. Sci. 44 (2004) 1412–1422.
[53] C. Hetényi, D. van der Spoel, Toward prediction of functional protein pockets using blind docking and pocket search algorithms, Protein Sci. 20 (2011) 880–893.
[54] F. Yu, L. Lu, L. Du, X. Zhu, A.K. Debnath, S. Jiang, Approaches for identification of HIV-1 entry inhibitors targeting gp41 pocket, Viruses 5 (2013) 127–149.
[55] Y.F. Chen, K.C. Hsu, S.R. Lin, W.C. Wang, Y.C. Huang, J.M. Yang, SiMMap: a web server for inferring site-moiety map to recognize interaction preferences between protein pockets and compound moieties, Nucleic Acids Res. 38 (2010) W424–W430.
[56] C.Y. Wu, Y.C. Chen, C. Lim, A structural-alphabet-based strategy for finding structural motifs across protein families, Nucleic Acids Res. 38 (2010) e150.
[57] A.J. Baldwin, L.E. Kay, NMR spectroscopy brings invisible protein states into focus, Nat. Chem. Biol. 5 (2009) 808–814.
[58] S.C. Gay, A.G. Roberts, J.R. Halpert, Structural features of cytochromes P450 and ligands that affect drug metabolism as revealed by X-ray crystallography and NMR, Future Med. Chem. 2 (2010) 1451–1468.
[59] L.P. Calle, F.J. Cañada, J. Jiménez-Barbero, Application of NMR methods to the study of the interaction of natural products with biomolecular receptors, Nat. Prod. Rep. 28 (2011) 1118–1125.
[60] W.T. Franks, A.H. Linden, B. Kunert, B.J. van Rossum, H. Oschkinat, Solid-state magic-angle spinning NMR of membrane proteins and protein-ligand interactions, Eur. J. Cell. Biol. 91 (2012) 340–348.
[61] M. Hong, Y. Zhang, F. Hu, Membrane protein structure and dynamics from NMR spectroscopy, Annu. Rev. Phys. Chem. 63 (2012) 1–24.
[62] M. Osawa, K. Takeuchi, T. Ueda, N. Nishida, I. Shimada, Functional dynamics of proteins revealed by solution NMR, Curr. Opin. Struct. Biol. 22 (2012) 660–669.
[63] R.P. Venkitakrishnan, O. Benard, M. Max, J.L. Markley, F.M. Assadi-Porter, Use of NMR saturation transfer difference spectroscopy to study ligand binding to membrane proteins, Methods Mol. Biol. 914 (2012) 47–63.
[64] O. Vinogradova, J. Qin, NMR as a unique tool in assessment and complex determination of weak protein-protein interactions, Top. Curr. Chem. 326 (2012) 35–45.
[65] X. Ding, X. Zhao, A. Watts, G-protein-coupled receptor structure, ligand binding and activation as studied by solid-state NMR spectroscopy, Biochem. J. 450 (2013) 443–457.
[66] L.F. Murga, Y. Wei, M.J. Ondrechen, Computed protonation properties: unique capabilities for protein functional site prediction, Genome Inform. 19 (2007) 107–118.
[67] S. Sankararaman, K. Sjölander, INTREPID – INformation-theoretic TREe traversal for Protein functional site Identification, Bioinformatics 24 (2008) 2445–2452.
[68] S. Somarowthu, M.J. Ondrechen, POOL server: machine learning application for functional site prediction in proteins, Bioinformatics 28 (2012) 2078–2079.
[69] W. Feng, L. Pan, M. Zhang, Combination of NMR spectroscopy and X-ray crystallography offers unique advantages for elucidation of the structural basis of protein complex assembly, Sci. China Life Sci. 54 (2011) 101–111.
[70] O. Fisette, P. Lagüe, S. Gagné, S. Morin, Synergistic applications of MD and NMR for the study of biological systems, J. Biomed. Biotechnol. 2012 (2012). Article ID 254208.
[71] D.R. Hall, D. Kozakov, S. Vajda, Analysis of protein binding sites by computational solvent mapping, Methods Mol. Biol. 819 (2012) 13–27.

[72] D. Schneidman-Duhovny, A. Rossi, A. Avila-Sakar, S.J. Kim, J. Velázquez-Muriel, P. Strop, H. Liang, K.A. Krukenberg, M. Liao, H.M. Kim, S. Sobhanifar, V. Dötsch, A. Rajpal, J. Pons, D.A. Agard, Y. Cheng, A. Sali, A method for integrative structure determination of protein-protein complexes, Bioinformatics 28 (2012) 3282–3289.
[73] J.L. Stark, R. Powers, Application of NMR and molecular docking in structure-based drug discovery, Top. Curr. Chem. 326 (2012) 1–34.
[74] C.T. Porter, G.J. Bartlett, J.M. Thornton, The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data, Nucleic Acids Res. 32 (2004) D129–D133.
[75] S. Mitternacht, I.N. Berezovsky, A geometry-based generic predictor for catalytic and allosteric sites, Protein Eng. Des. Sel. 24 (2011) 405–409.
[76] D.-B. Borza, W.T. Morgan, Histidine-proline-rich glycoprotein as a plasma pH sensor, J. Biol. Chem. 273 (1998) 5493–5499.
[77] S.R. Tzeng, C.G. Kalodimos, Protein dynamics and allostery: an NMR view, Curr. Opin. Struct. Biol. 21 (2011) 62–67.
[78] G. Manley, J.P. Loria, NMR insights into protein allostery, Arch. Biochem. Biophys. 519 (2012) 223–231.
[79] N.E. Davey, R.J. Edwards, D.C. Shields, Computational identification and analysis of protein short linear motifs, Front. Biosci. 15 (2010) 801–825.
[80] O. Schilling, C.M. Overall, Proteome-derived, database searchable peptide libraries for identifying protease cleavage sites, Nat. Biotechnol. 26 (2008) 685–694.
[81] S.A. Shiryaev, A.Y. Savinov, P. Cieplak, B.I. Ratnikov, K. Motamedchaboki, J.W. Smith, A.Y. Strongin, Matrix metalloproteinase proteolysis of the myelin basic protein isoforms is a source of immunogenic peptides in autoimmune multiple sclerosis, PLoS One 4 (2009) e4952.
[82] P. Minkiewicz, J. Dziuba, A. Iwaniak, M. Dziuba, M. Darewicz, BIOPEP database and other programs for processing bioactive peptide sequences, J. AOAC Int. 91 (2008) 965–980.
[83] E.D. Crawford, J.E. Seaman, N. Agard, G.W. Hsu, O. Julien, S. Mahrus, H. Nguyen, K. Shimbo, H.A. Yoshihara, M. Zhuang, R.J. Chalkley, J.A. Wells, The DegraBase: a database of proteolysis in healthy and apoptotic human cells, Mol. Cell. Proteomics 12 (2013) 813–824.
[84] N. Colaert, D. Maddelein, F. Impens, P. Van Damme, K. Plasman, K. Helsens, N. Hulstaert, J. Vandekerckhove, K. Gevaert, L. Martens, The Online Protein Processing Resource (TOPPR): a database and analysis platform for protein processing events, Nucleic Acids Res. 41 (2013) D333–D337.
[85] S. Yoon, J.C. Ebert, E.Y. Chung, G. De Micheli, R.B. Altman, Clustering protein environments for function prediction: finding PROSITE motifs in 3D, BMC Bioinform. 8 (Suppl. 4) (2007) S10.
[86] W. Hugo, F. Song, Z. Aung, S.K. Ng, W.K. Sung, SLiM on Diet: finding short linear motifs on domain interaction interfaces in Protein Data Bank, Bioinformatics 26 (2010) 1036–1042.
[87] D.P. Sargeant, M.R. Gryk, M.W. Maciejewski, V. Thapar, V. Kundeti, S. Rajasekaran, P. Romero, K. Dunker, S.C. Li, T. Kaneko, M.R. Schiller, Secondary structure, a missing component of sequence-based minimotif definitions, PLoS One 7 (2012) e49957.
[88] M. Akke, Conformational dynamics and thermodynamics of protein-ligand binding studied by NMR relaxation, Biochem. Soc. Trans. 40 (2012) 419–423.
[89] J. Kubrycht, K. Sigler, Animal membrane receptors and adhesive molecules, Crit. Rev. Biotechnol. 17 (1997) 123–147.
[90] D.T. Chang, T.Y. Chien, C.Y. Chen, seeMotif: exploring and visualizing sequence motifs in 3D structures, Nucleic Acids Res. 37 (2009) W552–W558.
[91] J. Ponomarenko, N. Papangelopoulos, D.M. Zajonc, B. Peters, A. Sette, P.E. Bourne, IEDB-3D: structural data within the immune epitope database, Nucleic Acids Res. 39 (2011) D1164–D1170.
[92] A. Venkataraman, T.H. Chew, Y.A. Hussein, M.S. Shamsir, A protein short motif search tool using amino acid sequence and their secondary structure assignment, Bioinformation 7 (2011) 304–306.
[93] M.S. Nawaz, Q.U. Ain, U. Seemab, S. Rashid, MotViz: a tool for sequence motif prediction in parallel to structural visualization and analyses, Genomics Proteomics Bioinform. 10 (2012) 35–43.
[94] C. Vehlow, H. Stehr, M. Winkelmann, J.M. Duarte, L. Petzold, J. Dinse, M. Lappe, CMView: interactive contact map visualization and analysis, Bioinformatics 27 (2011) 1573–1574.
[95] Z. Chen, S. Rappert, J. Sun, A.P. Zeng, Integrating molecular dynamics and coevolutionary analysis for reliable target prediction and deregulation of the allosteric inhibition of aspartokinase for amino acid production, J. Biotechnol. 154 (2011) 248–254.
[96] A. Godzik, J. Skolnick, A. Kolinski, Regularities in interaction patterns of globular proteins, Protein Eng. 6 (1993) 801–810.
[97] A. Caprara, R. Carr, S. Istrail, G. Lancia, B. Walenz, 1001 optimal PDB structure alignments: integer programming methods for finding the maximum contact map overlap, J. Comput. Biol. 11 (2004) 27–52.
[98] B. Xue, E. Faraggi, Y. Zhou, Predicting residue-residue contact maps by a two-layer, integrated neural-network method, Proteins 76 (2009) 176–183.
[99] P. Di Lena, P. Fariselli, L. Margara, M. Vassura, R. Casadio, Fast overlapping of protein contact maps by alignment of eigenvectors, Bioinformatics 26 (2010) 2250–2258.
[100] B. VanSchouwen, R. Selvaratnam, F. Fogolari, G. Melacini, Role of dynamics in the autoinhibition and activation of the exchange protein directly activated by cyclic AMP (EPAC), J. Biol. Chem. 286 (2011) 42655–42669.
[101] A.E. Kister, I. Gefald, Finding of residues crucial for supersecondary structure formation, Proc. Natl. Acad. Sci. U. S. A. 106 (2009) 18996–19000.

[102] C. Blouin, D. Butt, A.J. Roger, Rapid evolution in conformational space: a study of loop regions in a ubiquitous GTP binding domain, Protein Sci. 13 (2004) 608–616.
[103] J.T. Welch, W.R. Kearney, S.J. Franklin, Lanthanide-binding helix-turn-helix peptides: solution structure of a designed metallonuclease, Proc. Natl. Acad. Sci. U. S. A. 100 (2003) 3725–3730.
[104] L.G. Presta, G.D. Rose, Helix signals in proteins, Science 240 (1988) 1632–1641.
[105] R. Aurora, G.D. Rose, Helix capping, Protein Sci. 7 (1998) 721–738.
[106] B.M. Hespenheide, L.A. Kuhn, Discovery of a significant, nontopological preference for antiparallel alignment of helices with parallel regions in sheets, Protein Sci. 12 (2003) 1119–1125.
[107] G.A. Papoian, J. Ulander, M.P. Eastwood, Z. Luthey-Schulten, P.G. Wolynes, Water in protein structure prediction, Proc. Natl. Acad. Sci. U. S. A. 101 (2004) 3352–3357.
[108] D.G. Fedorov, K. Kitaura, H. Li, J.H. Jensen, M.S. Gordon, The polarizable continuum model (PCM) interfaced with the fragment molecular orbital method (FMO), J. Comput. Chem. 27 (2006) 976–985.
[109] H. Li, D.G. Fedorov, T. Nagata, K. Kitaura, J.H. Jensen, M.S. Gordon, Energy gradients in combined fragment molecular orbital and polarizable continuum model (FMO/PCM) calculation, J. Comput. Chem. 31 (2010) 778–790.
[110] A. Cavalli, P. Carloni, Enzymatic GTP hydrolysis: insights from an ab initio molecular dynamics study, J. Am. Chem. Soc. 124 (2002) 3763–3768.
[111] E. Krieger, T. Darden, S.B. Nabuurs, A. Finkelstein, G. Vriend, Making optimal use of empirical energy functions: force-field parameterization in crystal space, Proteins 57 (2004) 678–683.
[112] B.P. Mukhopadhyay, B. Ghosh, H.R. Bairagya, A.K. Bera, R.K. Roy, Conserved water molecular dynamics of the different X-ray structures of rusticyanin: an unique aquation potentiality of the ligand bonded Cu++ center, J. Biomol. Struct. Dyn. 24 (2007) 369–378.
[113] T. Nakamura, A. Yamaguchi, H. Kondo, H. Watanabe, T. Kurihara, N. Esaki, S. Hirono, S. Tanaka, Roles of K151 and D180 in L-2-haloacid dehalogenase from Pseudomonas sp. YL: analysis by molecular dynamics and ab initio fragment molecular orbital calculations, J. Comput. Chem. 30 (2009) 2625–2634.
[114] S.B. de Beer, N.P. Vermeulen, C. Oostenbrink, The role of water molecules in computational drug design, Curr. Top. Med. Chem. 10 (2010) 55–66.
[115] C.-W. Tung, M. Ziehm, A. Kamper, O. Kohlbacher, S.-Y. Ho, POPISK: T-cell reactivity prediction using support vector machines and string kernels, BMC Bioinform. 12 (2011). Article ID 446.
[116] M.A. Andrade, C. Perez-Iratxeta, C.P. Ponting, Protein repeats: structures, functions, and evolution, J. Struct. Biol. 134 (2001) 117–131.
[117] R. Sabarinathan, R. Basu, K. Sekar, ProSTRIP: a method to find similar structural repeats in three-dimensional protein structures, Comput. Biol. Chem. 34 (2010) 126–130.
[118] P. Bork, R.F. Doolittle, Proposed acquisition of an animal protein domain by bacteria, Proc. Natl. Acad. Sci. U. S. A. 89 (1992) 8990–8994.
[119] A.A. Terentiev, N.T. Moldogazieva, Cell adhesion proteins and alpha-fetoprotein. Similar structural motifs as prerequisites for common functions, Biochemistry (Mosc) 72 (2007) 920–935.
[120] M.S. Miettinen, V. Knecht, L. Monticelli, Z. Ignatova, Assessing polyglutamine conformation in the nucleating event by molecular dynamics simulations, J. Phys. Chem. B 116 (2012) 10259–10265.
[121] F.X. Schmid, Prolyl isomerase: enzymatic catalysis of slow protein-folding reactions, Annu. Rev. Biophys. Biomol. Struct. 22 (1993) 123–142.
[122] J.H. Morse, S. Antohi, K. Kasturi, S. Saito, M. Fotino, M. Humbert, G. Simonneau, R.J. Basst, C.A. Bona, Fine specificity of anti-fibrillin-1 autoantibodies in primary pulmonary hypertension syndrome, Scand. J. Immunol. 51 (2000) 607–611.
[123] M. Yamazaki, R. Kitamura, S. Kusano, H. Eda, S. Sato, M. Okawa-Takatsuji, S. Aotsuka, K. Yanagi, Elevated immunoglobulin G antibodies to the proline-rich amino-terminal region of Epstein-Barr virus nuclear antigen-2 in sera from patients with systemic connective tissue diseases and from a subgroup of Sjögren’s syndrome patients with pulmonary involvements, Clin. Exp. Immunol. 139 (2005) 558–568.
[124] B. Kemper, Structural basis for the role in protein folding of conserved proline-rich regions in cytochromes P450, Toxicol. Appl. Pharmacol. 199 (2004) 305–315.
[125] K. Löster, K. Zeilinger, D. Schuppan, W. Reutter, The cysteine-rich region of dipeptidyl peptidase IV (CD 26) is the collagen-binding site, Biochem. Biophys. Res. Commun. 217 (1995) 341–348.
[126] J. Denault, L. Bissonnette, J. Longpré, G. Charest, P. Lavigne, R. Leduc, Ectodomain shedding of furin: kinetics and role of the cysteine-rich region, FEBS Lett. 527 (2002) 309–314.
[127] L. Wang, G. Yang, X. Wu, Identification of the role of a cysteine-rich region of PC6B by determining the enzymatic characteristics of its mutants, Mol. Biotechnol. 27 (2004) 15–22.
[128] G. Thiel, J.A. Sarraj, C. Vinson, L. Stefano, K. Bach, Role of basic region leucine zipper transcription factors cyclic AMP response element binding protein (CREB), CREB2, activating transcription factor 2 and CAAT/enhancer binding protein α in cyclic AMP response element-mediated transcription, J. Neurochem. 92 (2005) 321–336.
[129] R.S. Kass, J. Kurokawa, S.O. Marx, A.R. Marks, Leucine/isoleucine zipper coordination of ion channel macromolecular signaling complexes in the heart. Roles in inherited arrhythmias, Trends Cardiovasc. Med. 13 (2003) 52–56.

[130] A.L. Robertson, M.A. Bate, S.G. Androulakis, S.P. Bottomley, A.M. Buckle, PolyQ: a database describing the sequence and domain context of polyglutamine repeats in proteins, Nucleic Acids Res. 39 (2011) D272–D276.
[131] C.A. Janeway, P. Travers, Immunobiology. The Immune System in Health and Disease, Garland Publishing, New York, 1994.
[132] E.A. James, A.K. Moustakas, D. Berger, L. Huston, G.K. Papadopoulos, W.W. Kwok, Definition of the peptide binding motif within DRB1*1401 restricted epitopes by peptide competition and structural modeling, Mol. Immunol. 45 (2008) 2651–2659.
[133] H.M. Geysen, T.J. Mason, S.J. Rodda, Cognitive features of continuous antigenic determinants, J. Mol. Recognit. 1 (1998) 32–41.
[134] J. Van de Water, M.E. Gershwin, P. Leung, A. Ansari, R.L. Coppel, The autoepitope of the 74-kD mitochondrial autoantigen of primary biliary cirrhosis corresponds to the functional site of dihydrolipoamide acetyltransferase, J. Exp. Med. 167 (1988) 1791–1799.
[135] D.A.D. Parry, L.N. Marekov, P.M. Steinert, T.A. Smith, A role for the 1A and L1 rod domain segments in head domain organization and function of intermediate filaments: structural analysis of trichocyte keratin, J. Struct. Biol. 137 (2002) 97–108.
[136] E.A. Kabat, T.T. Wu, H.M. Perry, K.S. Gottesman, C. Foeller, Sequences of Proteins of Immunological Interest. NIH publication No. 91-3242, Bethesda (1991).
[137] X.-N. Dong, Y. Xiao, M.P. Dierich, Y.-H. Chen, N- and C-domains of HIV-1 gp41: mutation, structure and functions, Immunol. Lett. 75 (2001) 215–220.
[138] V. Sivaraman, L. Zhang, E.G. Meissner, J.L. Jeffrey, L. Su, The heptad repeat 2 domain is a major determinant for enhanced human immunodeficiency virus type 1 (HIV-1) fusion and pathogenicity of a highly pathogenic HIV-1 Env, J. Virol. 83 (2009) 11715–11725.
[139] K. Beck, I. Hunter, J. Engel, Structure and function of laminin: anatomy of a multidomain glycoprotein, FASEB J. 4 (1990) 148–160.
[140] E. Pang, K. Lin, Yeast protein-protein interaction binding sites: prediction from the motif-motif, motif-domain and domain-domain levels, Mol. Biosyst. 6 (2010) 2164–2173.
[141] W.L. Jorgensen, Rusting of the lock and key model for protein-ligand binding, Science 254 (1991) 954–955.
[142] S. Chaudhury, J.J. Gray, Conformer selection and induced fit in flexible backbone protein-protein docking using computational and NMR ensembles, J. Mol. Biol. 381 (2008) 1068–1087.
[143] M. Daëron, S. Jaeger, L. Du Pasquier, E. Vivier, Immunoreceptor tyrosine-based inhibition motifs: a quest in the past and future, Immunol. Rev. 224 (2008) 11–43.
[144] D.T. Moore, B.W. Berger, W.F. DeGrado, Protein-protein interactions in the membrane: sequence, structural, and biological motifs, Structure 16 (2008) 991–1001.
[145] K.N. Pandey, Functional roles of short sequence motifs in the endocytosis of membrane receptors, Front. Biosci. 14 (2009) 5339–5360.
[146] X. Ren, J.H. Hurley, Proline-rich regions and motifs in trafficking: from ESCRT interaction to viral exploitation, Traffic 12 (2011) 1282–1290.
[147] B. Eisenhaber, F. Eisenhaber, Prediction of posttranslational modification of proteins from their amino acid sequence, Methods Mol. Biol. 609 (2010) 365–384.
[148] B. Alberts, A. Johnson, J. Lewis, M. Raff, K. Roberts, P. Walter, Molecular Biology of the Cell, fifth ed., Garland Science, New York, 2008.
[149] D.J. Leahy, I. Aukhil, H.P. Erickson, 2.0 Å crystal structure of a four-domain segment of human fibronectin encompassing the RGD loop and synergy region, Cell 84 (1996) 155–164.
[150] E. Koivunen, B. Wang, E. Ruoslahti, Isolation of a highly specific ligand for the alpha 5 beta 1 integrin from a phage display library, J. Cell. Biol. 124 (1994) 373–380.
[151] N. Fernandez-Fuentes, B. Oliva, A. Fiser, A supersecondary structure library and search algorithm for modeling loops in protein structures, Nucleic Acids Res. 34 (2006) 2085–2097.
[152] H.M. Strauss, S. Keller, Pharmacological interference with protein-protein interactions mediated by coiled-coil motifs, Handb. Exp. Pharmacol. 186 (2008) 461–482.
[153] B. Apostolovic, M. Danial, H.A. Klok, Coiled coils: attractive protein folding motifs for the fabrication of self-assembled, responsive and bioactive materials, Chem. Soc. Rev. 39 (2010) 3541–3575.
[154] B. Mészáros, P. Tompa, I. Simon, Z. Dosztányi, Molecular principles of the interactions of disordered proteins, J. Mol. Biol. 372 (2007) 549–561.
[155] B. Mészáros, I. Simon, Z. Dosztányi, Prediction of protein binding regions in disordered proteins, PLoS Comput. Biol. 5 (2009) e1000376.
[156] B. Mészáros, I. Simon, Z. Dosztányi, The expanding view of protein-protein interactions: complexes involving intrinsically disordered proteins, Phys. Biol. 8 (2011). Article ID 035003.
[157] R. Nussinov, A.R. Panchenko, T. Przytycka, Physics approaches to protein interactions and gene regulation, Phys. Biol. 8 (2011). Article ID 030301.
[158] C.J. Oldfield, J. Meng, J.Y. Yang, M.Q. Yang, V.N. Uversky, A.K. Dunker, Flexible nets: disorder and induced fit in the associations of p53 and 14-3-3 with their partners, BMC Genomics 9 (Suppl. 1) (2008) S1.
[159] R. Dawson, L. Müller, A. Dehner, C. Klein, H. Kessler, J. Buchner, The N-terminal domain of p53 is natively unfolded, J. Mol. Biol. 332 (2003) 1131–1141.
[160] P.H. Kussie, S. Gorina, V. Marechal, B. Elenbaas, J. Moreau, A.J. Levine, N.P. Pavletich, Structure of the MDM2 oncoprotein bound to the p53 tumor suppressor transactivation domain, Science 274 (1996) 948–953.

[161] S. Krishnamurthy, M.A. Ghazy, C. Moore, M. Hampsey, Functional interaction of the Ess1 prolyl isomerase with components of the RNA polymerase II initiation and termination machineries, Mol. Cell. Biol. 29 (2009) 2925–2934.
[162] S. Yuzawa, M. Yokochi, H. Hatanaka, K. Ogura, M. Kataoka, K. Miura, V. Mandiyan, J. Schlessinger, F. Inagaki, Solution structure of Grb2 reveals extensive flexibility necessary for target recognition, J. Mol. Biol. 306 (2001) 527–537.
[163] S. Casares, E. Ab, H. Eshuis, O. Lopez-Mayorga, N.A. van Nuland, F. Conejero-Lara, The high-resolution NMR structure of the R21A Spc-SH3:P41 complex: understanding the determinants of binding affinity by comparison with Abl-SH3, BMC Struct. Biol. 7 (2007). Article ID 22.
[164] J.M. Martín-García, I. Luque, P.L. Mateo, J. Ruiz-Sanz, A. Cámara-Artigas, Crystallographic structure of the SH3 domain of the human c-Yes tyrosine kinase: loop flexibility and amyloid aggregation, FEBS Lett. 581 (2007) 1701–1706.
[165] E. Polverini, G. Rangaraj, D.S. Libich, J.M. Boggs, G. Harauz, Binding of the proline-rich segment of myelin basic protein to SH3 domains: spectroscopic, microarray, and modeling studies of ligand conformation and effects of posttranslational modifications, Biochemistry 47 (2008) 267–282.
[166] A.M. Candel, N.A. van Nuland, F.M. Martin-Sierra, J.C. Martinez, F. Conejero-Lara, Analysis of the thermodynamics of binding of an SH3 domain to proline-rich peptides using a chimeric fusion protein, J. Mol. Biol. 377 (2008) 117–135.
[167] J. Kubrycht, J. Borecký, K. Sigler, Sequence similarities of protein kinase peptide substrates and inhibitors: comparison of their primary structures with immunoglobulin repeats, Folia Microbiol. 47 (2002) 319–358.
[168] J. Kubrycht, J. Borecký, P. Soucek, P. Jezek, Sequence similarities of protein kinase substrates and inhibitors with immunoglobulins and model immunoglobulin homologue: cell adhesion molecule from the living fossil sponge Geodia cydonium. Mapping of coherent database similarities and implications for evolution of CDR1 and hypermutation, Folia Microbiol. 49 (2004) 219–246.
[169] N. Page, N. Schall, J.-M. Strub, M. Quinternet, O. Chaloin, M. Décossas, M.T. Cung, A. Van Dorsselaer, J.-P. Briand, S. Muller, The spliceosomal phosphopeptide P140 controls the lupus disease by interacting with the HSC70 protein and via a mechanism mediated by gamma delta T cells, PLoS One 4 (2009) e5273.
[170] J.J. Devlin, L.C. Panganiban, P.E. Devlin, Random peptide libraries: a source of specific protein binding molecules, Science 249 (1990) 404–406.
[171] R. Schmitz, G. Baumann, H. Gram, Catalytic specificity of phosphotyrosine kinases Blk, Lyn, c-Src and Syk as assessed by phage display, J. Mol. Biol. 260 (1996) 664–677.
[172] F. Fack, B. Hügle-Dörr, D. Song, I. Queitsch, G. Petersen, E.K. Bautz, Epitope mapping by phage display: random versus gene-fragment libraries, J. Immunol. Methods 206 (1997) 43–52.
[173] M. Blüthner, C. Schäfer, C. Schneider, F.A. Bautz, Identification of major linear epitopes on the sp100 nuclear PBC autoantigen by the gene-fragment phage-display technology, Autoimmunity 29 (1999) 33–42.
[174] S. Mandava, L. Makowski, S. Devarapalli, J. Uzubell, D.J. Rodi, RELIC – a bioinformatics server for combinatorial peptide analysis and identification of protein-ligand interaction sites, Proteomics 4 (2004) 1439–1460.
[175] Y.X. Huang, Y.L. Bao, S.Y. Guo, Y. Wang, C.G. Zhou, Y.X. Li, Pep-3D-Search: a method for B-cell epitope prediction based on mimotope analysis, BMC Bioinform. 9 (2008). Article ID 538.
[176] W.H. Chen, P.P. Sun, L. Lu, W.W. Guo, Y.X. Huang, Z.Q. Ma, MimoPro: a more efficient Web-based tool for epitope prediction using phage display libraries, BMC Bioinform. 12 (2011). Article ID 199.
[177] J. Huang, B. Ru, P. Dai, Bioinformatics resources and tools for phage display, Molecules 16 (2011) 694–709.
[178] J. Huang, B. Ru, P. Zhu, F. Nie, J. Yang, X. Wang, P. Dai, H. Lin, F.-B. Guo, N. Rao, MimoDB 2.0: a mimotope database and beyond, Nucleic Acids Res. 40 (Database issue) (2012) D271–D277.
[179] R.A. Johanson, A.R. Shaw, M. Schlamowitz, Evidence that the CH2 domain of IgG contains the recognition unit for binding by the fetal rabbit yolk sac membrane receptor, J. Immunol. 126 (1981) 194–199.
[180] T. Thomsen, J.B. Moeller, A. Schlosser, G.L. Sorensen, S.K. Moestrup, N. Palaniyar, R. Wallis, J. Mollenhauer, U. Holmskov, The recognition unit of FIBCD1 organizes into a noncovalently linked tetrameric structure and uses a hydrophobic funnel (S1) for acetyl group recognition, J. Biol. Chem. 285 (2010) 1229–1238.
[181] C. Gaboriaud, L. Gregory-Pauron, F. Teillet, N.M. Thielens, I. Bally, G.J. Arlaud, Structure and properties of the Ca(2+)-binding CUB domain, a widespread ligand-recognition unit involved in major biological functions, Biochem. J. 439 (2011) 185–193.
[182] X. Qiu, J.S. Culp, A.G. DiLella, B. Hellmig, S.S. Hoog, C.A. Janson, W.W. Smith, S.S. Abdel-Meguid, Unique fold and active site in cytomegalovirus protease, Nature 383 (1996) 275–279.
[183] T.M. Penning, 3 Alpha-hydroxysteroid dehydrogenase: three dimensional structure and gene regulation, J. Endocrinol. 150 (Suppl.) (1996) S175–S187.
[184] C. Cheng, P. Kussie, N. Pavletich, S. Shuman, Conservation of structure and mechanism between eukaryotic topoisomerase I and site-specific recombinases, Cell 92 (1998) 841–850.
[185] P. Iengar, C. Ramakrishnan, Knowledge-based modeling of the serine protease triad into non-proteases, Protein Eng. 12 (1999) 649–656.
[186] D. Hyndman, D.R. Bauman, V.V. Heredia, T.M. Penning, The aldo-keto reductase superfamily homepage, Chem. Biol. Interact. 143/144 (2003) 621–631.
[187] U. Oppermann, C. Filling, M. Hult, N. Shafqat, X. Wua, M. Lindh, J. Shafqat, E. Nordling, Y. Kallberg, B. Persson, H. Jornvall, Short-chain dehydrogenases/reductases (SDR): the 2002 update, Chem. Biol. Interact. 143/144 (2003) 247–253.

[188] K. Vanommeslaeghe, F. De Proft, S. Loverix, D. Tourwé, P. Geerlings, Theoretical study revealing the functioning of a novel combination of catalytic motifs in histone deacetylase, Bioorg. Med. Chem. 13 (2005) 3987–3992.
[189] E. Chovancová, J. Kosinski, J.M. Bujnicki, J. Damborský, Phylogenetic analysis of haloalkane dehalogenases, Proteins 67 (2007) 305–316.
[190] R.G. Pearson, Hard and soft acids and bases, J. Am. Chem. Soc. 85 (1963) 3533–3539.
[191] R.G. Pearson, Hard and soft acids and bases, HSAB, part 1: fundamental principles, J. Chem. Educ. 45 (1968) 581–586.
[192] R.G. Pearson, Hard and soft acids and bases, HSAB, part II: underlying theories, J. Chem. Educ. 45 (1968) 643–648.
[193] A. Kotyk, J. Horák, Enzyme Kinetics, Academia, Prague, 1977.
[194] M. Meyer, G. Wohlfahrt, J. Knäblein, D. Schomburg, Aspects of the mechanism of catalysis of glucose oxidase: a docking, molecular mechanics and quantum chemical study, J. Comput. Aided Mol. Des. 12 (1998) 425–440.
[195] R.J.T. Houk, A. Monzingo, E.V. Anslyn, Electrophilic coordination catalysis: a summary of previous thought and a new angle of analysis, Acc. Chem. Res. 41 (2008) 401–410.
[196] Z. Ke, S. Wang, D. Xie, Y. Zhang, Born-Oppenheimer ab initio QM/MM molecular dynamics simulations of the hydrolysis reaction catalyzed by protein arginine deiminase 4, J. Phys. Chem. B 113 (2009) 16705–16710.
[197] Z.Y. Zhang, Y. Wang, L. Wu, E.B. Fauman, J.A. Stuckey, H.L. Schubert, M.A. Saper, J.E. Dixon, The Cys(X)5Arg catalytic motif in phosphoester hydrolysis, Biochemistry 33 (1994) 15266–15270.
[198] C. Zhang, D. Zhou, S. Zheng, L. Liu, S. Tao, L. Yang, S. Hu, Q. Feng, A chymotrypsin-like serine protease cDNA involved in food protein digestion in the common cutworm, Spodoptera litura: cloning, characterization, developmental and induced expression patterns, and localization, J. Insect Physiol. 56 (2010) 788–799.
[199] B. He, G. Cai, Y. Ni, Y. Li, H. Zong, L. He, Characterization and expression of a novel cystatin gene from Schistosoma japonicum, Mol. Cell. Probes 25 (2011) 86–193.
[200] S. Hitaoka, M. Harada, T. Yoshida, H. Chuman, Correlation analyses on binding affinity of sialic acid analogues with influenza virus neuraminidase1 using ab initio MO calculations on their complex structures, J. Chem. Inf. Model. 50 (2010) 1796–1805.
[201] F. Tanaka, C.F. Barbas, A modular assembly strategy for improving the substrate specificity of small catalytic peptides, J. Am. Chem. Soc. 124 (2002) 3510–3511.
[202] S. Ha, D. Walker, Y. Shi, S. Walker, The 1.9 Å crystal structure of Escherichia coli MurG, a membrane-associated glycosyltransferase involved in peptidoglycan biosynthesis, Protein Sci. 9 (2000) 1045–1052.
[203] G. Sanli, J.I. Dudley, M. Blaber, Structural biology of the aldo-keto reductase family of enzymes: catalysis and cofactor binding, Cell. Biochem. Biophys. 38 (2003) 79–101.
[204] D. Koesling, M. Russwurm, E. Mergia, F. Mullershausen, A. Friebe, Nitric oxide-sensitive guanylyl cyclase: structure and regulation, Neurochem. Int. 45 (2004) 813–819.
[205] A. Roeben, J.M. Plitzko, R. Körner, U.M. Böttcher, K. Siegers, M. Hayer-Hartl, A. Bracher, Structural basis for subunit assembly in UDP-glucose pyrophosphorylase from Saccharomyces cerevisiae, J. Mol. Biol. 364 (2006) 551–560.
[206] H.M. Jackson, T. Kawahara, Y. Nisimoto, S.M. Smith, J.D. Lambeth, Nox4 B-loop creates an interface between the transmembrane and dehydrogenase domains, J. Biol. Chem. 285 (2010) 10281–10290.
[207] V. Mishra, A. Kumar, V. Ali, T. Nozaki, K.Y. Zhang, V. Bhakuni, Glu-108 is essential for subunit assembly and dimer stability of D-phosphoglycerate dehydrogenase from Entamoeba histolytica, Mol. Biochem. Parasitol. 181 (2012) 117–124.
[208] B.K. Ho, D.A. Agard, Probing the flexibility of large conformational changes in protein structures through local perturbations, PLoS Comput. Biol. 5 (2009) e1000343.
[209] N. Selevsek, S. Rival, A. Tholey, E. Heinzle, U. Heinz, L. Hemmingsen, H.W. Adolph, Zinc ion-induced domain organization in metallo-beta-lactamases: a flexible "zinc arm" for rapid metal ion transfer? J. Biol. Chem. 284 (2009) 16419–16431.
[210] A.G. Roberts, M.J. Cheesman, A. Primak, M.K. Bowman, W.M. Atkins, A.E. Rettie, Intramolecular heme ligation of the cytochrome P450 2C9 R108H mutant demonstrates pronounced conformational flexibility of the B-C loop region: implications for substrate binding, Biochemistry 49 (2010) 8700–8708.
[211] H. Cha, E. Kopetzki, R. Huber, M. Lanzendörfer, H. Brandstetter, Structural basis of the adaptive molecular recognition by MMP9, J. Mol. Biol. 320 (2002) 1065–1079.
[212] C.M. Overall, Molecular determinants of metalloproteinase substrate specificity: matrix metalloproteinase substrate binding domains, modules, and exosites, Mol. Biotechnol. 22 (2002) 51–86.
[213] I.B. Rogozin, M. Diaz, Cutting edge: DGYW/WRCH is a better predictor of mutability at G:C bases in Ig hypermutation than the widely accepted RGYW/WRCY motif and probably reflects a two-step activation-induced cytidine deaminase-triggered process, J. Immunol. 172 (2004) 3382–3384.
[214] M.L. Duquette, P. Pham, M.F. Goodman, N. Maizels, AID binds to transcription-induced structures in c-MYC that map to regions associated with translocation and hypermutation, Oncogene 24 (2005) 5791–5798.
[215] K.A. Denessiouk, A.I. Denesyuk, J.V. Lehtonen, T. Korpela, M.S. Johnson, Common structural elements in the architecture of the cofactor-binding domains in unrelated families of pyridoxal phosphate-dependent enzymes, Proteins 35 (1999) 250–261.

[216] C.-J. Tsai, A. del Sol, R. Nussinov, Allostery: absence of a change in shape does not imply that allostery is not at play, J. Mol. Biol. 378 (2008) 1–11.
[217] A. Pandini, A. Fornili, F. Fraternali, J. Kleinjung, Detection of allosteric signal transmission by information-theoretic analysis of protein dynamics, FASEB J. 26 (2012) 868–881.
[218] J.P. Duneau, N. Garnier, M. Genest, Insight into signal transduction: structural alterations in transmembrane helices probed by multi-1 ns molecular dynamics simulations, J. Biomol. Struct. Dyn. 15 (1997) 555–572.
[219] J. Seco, C. Ferrer-Costa, J.M. Campanera, R. Soliva, X. Barril, Allosteric regulation of PKCθ: understanding multistep phosphorylation and priming by ligands in AGC kinases, Proteins 80 (2012) 269–280.
[220] A.P. Kornev, N.M. Haste, S.S. Taylor, L.F. Ten Eyck, Surface comparison of active and inactive protein kinases identifies a conserved activation mechanism, Proc. Natl. Acad. Sci. U. S. A. 103 (2006) 17783–17788.
[221] A.P. Kornev, S.S. Taylor, L.F. Ten Eyck, A helix scaffold for the assembly of active protein kinases, Proc. Natl. Acad. Sci. U. S. A. 105 (2008) 14377–14382.
[222] R.E. Joseph, Q. Xie, A.H. Andreotti, Identification of an allosteric signaling network within Tec family kinases, J. Mol. Biol. 403 (2010) 231–242.
[223] A. Bahr, J.D. Thompson, J.C. Thierry, O. Poch, BAliBASE (Benchmark Alignment dataBASE): enhancements for repeats, transmembrane sequences and circular permutations, Nucleic Acids Res. 29 (2001) 323–326.
[224] M.K. Kalita, G. Ramasamy, S. Duraisamy, V.S. Chauhan, D. Gupta, ProtRepeatsDB: a database of amino acid repeats in genomes, BMC Bioinform. 7 (2006). Article ID 336.
[225] D.P. Depledge, R.P. Lower, D.F. Smith, RepSeq – a database of amino acid repeats present in lower eukaryotic pathogens, BMC Bioinform. 8 (2007). Article ID 122.
[226] T. Wei, J. Gong, F. Jamitzky, W.M. Heckl, R.W. Stark, S.C. Rössle, LRRML: a conformational database and an XML description of leucine-rich repeats (LRRs), BMC Struct. Biol. 8 (2008). Article ID 47.
[227] V. Offord, T.J. Coffey, D. Werling, LRRfinder: a web application for the identification of leucine-rich repeats and an integrative Toll-like receptor database, Dev. Comp. Immunol. 34 (2010) 1035–1041.
[228] P.P. Kuksa, V. Pavlovic, Efficient motif finding algorithms for large-alphabet inputs, BMC Bioinform. 11 (2010). Article ID S1.
[229] D.M. Standley, R. Yamashita, A.R. Kinjo, H. Toh, H. Nakamura, SeSAW: balancing sequence and structural information in protein functional mapping, Bioinformatics 26 (2010) 1258–1259.
[230] A.S. Konagurthu, J.C. Whisstock, P.J. Stuckey, A.M. Lesk, MUSTANG: a multiple structural alignment algorithm, Proteins 64 (2006) 559–574.
[231] C. Micheletti, H. Orland, MISTRAL: a tool for energy-based multiple structural alignment of proteins, Bioinformatics 25 (2009) 2663–2669.
[232] A.S. Konagurthu, C.F. Reboul, J.W. Schmidberger, J.A. Irving, A.M. Lesk, P.J. Stuckey, J.C. Whisstock, A.M. Buckle, MUSTANG-MR structural sieving server: applications in protein structural analysis and crystallography, PLoS One 5 (2010) e10048.
[233] W.Y. Siu, N. Mamoulis, S.M. Yiu, H.L. Chan, A data-mining approach for multiple structural alignment of proteins, Bioinformation 4 (2010) 366–370.
[234] D. Allorge, D. Bréant, J. Harlow, J. Chowdry, J.M. Lo-Guidice, D. Chevalier, C. Cauffiez, M. Lhermitte, F.E. Blaney, G.T. Tucker, F. Broly, S.W. Ellis, Functional analysis of CYP2D6.31 variant: homology modeling suggests possible disruption of redox partner interaction by Arg440His substitution, Proteins 59 (2005) 339–346.
[235] Y.H. Zhou, Q.C. Zheng, Z.S. Li, Y. Zhang, M. Sun, C.C. Sun, D. Si, L. Cai, Y. Guo, H. Zhou, On the human CYP2C9*13 variant activity reduction: a molecular dynamics simulation and docking study, Biochimie 88 (2006) 1457–1465.
[236] H. Banu, N. Renuka, G. Vasanthakumar, Reduced catalytic activity of human CYP2C9 natural alleles for gliclazide: molecular dynamics simulation and docking studies, Biochimie 93 (2011) 1028–1036.
[237] Y. Zhou, S. Wang, Y. Zhang, Catalytic reaction mechanism of acetylcholinesterase determined by Born-Oppenheimer ab initio QM/MM molecular dynamics simulations, J. Phys. Chem. B 114 (2010) 8817–8825.
[238] P.R. Markwick, J.A. McCammon, Studying functional dynamics in biomolecules using accelerated molecular dynamics, Phys. Chem. Chem. Phys. 13 (2011) 20053–20065.
[239] J. Yang, Molecular modeling of human hepatocyte PKA (cAMP-dependent protein kinase type-II) and its structure analysis, Protein Pept. Lett. 17 (2010) 646–659.
[240] A.P. Eichenberger, L.J. Smith, V.F. van Gunsteren, Ester-linked hen egg white lysozyme shows a compact fold in a molecular dynamics simulation – possible causes and sensitivity of experimentally observable quantities to structural changes maintaining this compact fold, FEBS J. 279 (2012) 299–315.
[241] H. Vankayalapati, D.J. Bearss, J.W. Saldanha, R.M. Muñoz, S. Rojanala, D.D. Von Hoff, D. Mahadevan, Targeting aurora2 kinase in oncogenesis: a structural bioinformatics approach to target validation and rational drug design, Mol. Cancer Ther. 2 (2003) 283–294.
[242] J. Caballero, J.H. Alzate-Morales, A. Vergara-Jaque, Investigation of the differences in activity between hydroxycycloalkyl N1 substituted pyrazole derivatives as inhibitors of B-Raf kinase by using docking, molecular dynamics, QM/MM, and fragment-based de novo design: study of binding mode of diastereomer compounds, J. Chem. Inf. Model. 51 (2011) 2920–2931.
[243] J.D. Durrant, J.A. McCammon, Molecular dynamics simulations and drug discovery, BMC Biol. 9 (2011). Article ID 71.
[244] R.E. Amaro, A. Sethi, R.S. Myers, V.J. Davisson, Z.A. Luthey-Schulten, A network of conserved interactions regulates the allosteric signal in a glutamine amidotransferase, Biochemistry 46 (2007) 2156–2173.