The Relation between α-Helical Conformation and Amyloidogenicity

The Relation between α-Helical Conformation and Amyloidogenicity

Please cite this article in press as: Haimov and Srebnik, The Relation between a-Helical Conformation and Amyloidogenicity, Biophysical Journal (2018)...

1MB Sizes 0 Downloads 23 Views

Please cite this article in press as: Haimov and Srebnik, The Relation between a-Helical Conformation and Amyloidogenicity, Biophysical Journal (2018), https://doi.org/10.1016/j.bpj.2018.03.019

Article

The Relation between a-Helical Conformation and Amyloidogenicity Boris Haimov1 and Simcha Srebnik1,2,* 1

Russell Berrie Nanotechnology Institute and 2Department of Chemical Engineering, Technion – Israel Institute of Technology, Haifa, Israel

ABSTRACT Amyloid fibrils are stable aggregates of misfolded proteins and polypeptides that are insoluble and resistant to protease activity. Abnormal formation of amyloid fibrils in vivo may lead to neurodegenerative disorders and other systemic amyloidosis, such as Alzheimer’s, Parkinson’s, and atherosclerosis. Because of their clinical importance, amyloids are under intense scientific research. It is believed that short polypeptide segments within proteins are responsible for the transformation of correctly folded proteins into parts of larger amyloid fibrils and that this transition is modulated by environmental factors, such as pH, salt concentration, interaction with the cell membrane, and interaction with metal ions. Most studies on amyloids focus on the amyloidogenic sequences. The focus of this study is on the structure of the amyloidogenic a-helical segments because the a-helical secondary structure has been recognized to be a key player in different stages of the amyloidogenesis process. We have previously shown that the a-helical conformation may be expressed by two parameters (q and r) that form orthogonal coordinates based on the Ramachandran dihedrals (4 and j) and provide an illuminating interpretation of the a-helical conformation. By performing statistical analysis on a-helical conformations found in the Protein Data Bank, an apparent relation between a-helical conformation, as expressed by q and r, and amyloidogenicity is revealed. Remarkably, random amino acid sequences, whose helical structures were obtained from the most probable dihedral angles, revealed the same dependency of amyloidogenicity, suggesting the importance of a-helical structure as opposed to sequence.

INTRODUCTION Alzheimer’s, Parkinson’s, Creutzfeldt-Jakob’s, type II diabetes, atherosclerosis, and prolactinomas are merely a few examples of diseases caused by the abnormal formation of amyloid fibrils in vivo (1). These fibrils are stable aggregates of misfolded proteins and polypeptides that are insoluble and resistant to protease activity. Many different amyloid fibril structures have been empirically determined (2) with typical diameters ranging from 6 to 12 nm (3) and a cross-b architecture (4) in most cases. The causes of amyloidogenesis within living organisms vary and include protein synthesis errors, genetic mutations, environmental changes, incorporation of amyloidogenic agents, protein cleavage products, and maturation. Because of their clinical importance, amyloidogenic polypeptides are under intense scientific research (1,5–7) with a special emphasis on the prediction of amyloidogenic sequences (8–10). However, the relation between amyloidogenicity and polypeptide sequence is controversial.

The focus on amyloidogenic sequences is shifting toward the role of environmental circumstances in amyloidogenesis, as various factors have been found to modulate the conformational transition of the polypeptide. Changes in pH, salt concentration, interaction with lipids or the cell membrane, and the presence of metal ions are factors known to promote amyloidogenic aggregation in some cases and slow down the kinetics of aggregation in others (7). The a-helical secondary structure is believed to be a key player in the process of amyloidogenesis in the early self-assembly stages (11) as an intermediate structure (12) and because amyloidogenic sequences are often found within helices (13). In this work, we analyze the structure of amyloidogenic sequences that have been isolated from the Protein Data Bank (PDB), focusing on the relation between amyloidogenicity and the a-helical conformation. METHODS

Submitted November 30, 2017, and accepted for publication March 20, 2018. *Correspondence: [email protected] Editor: Michael Brown. https://doi.org/10.1016/j.bpj.2018.03.019

Analysis was carried out on data sets obtained from the PDB. Five data sets were prepared, as further described below. All the data sets are encoded as text files with a single field per row and may be found within the Supporting Material.

Ó 2018 Biophysical Society.

Biophysical Journal 114, 1–9, April 24, 2018 1

Please cite this article in press as: Haimov and Srebnik, The Relation between a-Helical Conformation and Amyloidogenicity, Biophysical Journal (2018), https://doi.org/10.1016/j.bpj.2018.03.019

Haimov and Srebnik

Data set 0: PDB a-helices All structures in the PDB were scanned, and a-helical conformations were extracted (including redundancies). The determination of a-helical conformations was made according to the predefined structural information included within PDB file headers. The following specific information was extracted: the PDB entry of the structure (d0_pname.txt), the amino acid (AA) sequence of the corresponding a-helix (d0_aaseq.txt), the length of the a-helix in AA units (d0_hlen.txt), and the corresponding Cartesian coordinates. The dihedrals of the a-helix backbone (4, j, and u) were calculated. Whereas 4 and j were required for the purpose of this study, u was used as an indicator for correct calculations with an expected value of 5180 (data given in d0_phi.txt, d0_psi.txt, and d0_omega.txt, respectively). For an a-helix with N AAs, we obtain N – 1 dihedrals. Data set 0u is a subset of data set 0 containing unique sequences of PDB helices. Because there are multiple values of 4 and j dihedrals for any given sequence that appears in the PDB more than once, an average a-helical conformation was calculated as described below. The information for this data set contains an AA sequence of the corresponding a-helix (d0u_aaseq.txt) and the length of the a-helix in AA units (d0u_hlen.txt).

Data set 1: hexapeptide a-helices from PDB Data set 1 consists of hexapeptide a-helical segments that are found in the PDB, including their redundant occurrences. This data set was built by breaking down the AA sequences from data set 0 into hexapeptides with their original conformations. Overlapping hexapeptide sequences were considered, i.e., N – 5 hexapeptide sequences for a polypeptide with N AA residues. Sequences shorter than hexapeptides were ignored. The following information is available for this data set: AA sequence of the corresponding a-helix (d1_aaseq.txt), length of the a-helix in AA units (d0_hlen.txt), set of five dihedrals of the corresponding a-helical hexapeptide backbone (d1_phi.txt, d1_psi.txt, and d1_omega.txt, respectively), and the amyloidogenicity score (d1_ascore.txt) calculated using the Meta-Predictor for Amyloid Proteins (METAMYL) (14) web server (described below).

server for empirically determined amyloidogenic polypeptide segments that is publicly available worldwide. The information for this data set contains the AA sequence of the corresponding polypeptide (d4_aaseq.txt), whether the polypeptide is amyloidogenic (d4_amyloidogenic.txt), and the estimated helical conformational pair ðq; rÞ (d4_theta.txt and d4_rho.txt).

Calculation of ðq; rÞ from ðf; cÞ dihedrals The conformational pair ðq; rÞ was calculated using the following estimated linear transformation that was developed elsewhere (16): q ¼ ðj  4 þ 11:4Þ=2:76 and r ¼  ðj þ 4Þ=29:5, where q, f, and j are in degrees and the units of r are [residues/turn]. q is the angle of backbone carbonyls relative to the helix direction vector, and r is the number of residues per single a-helix turn. Because the transformation is correct only for values inside or near the a-helix basin (the region on the Ramachandran map where the a-helical conformations reside), structures that are strongly twisted and deformed were ignored. The filtration was made by ensuring that 0 < q < 30 and 1½Res=Turn < r < 5½Res=Turn.

Estimation of a-helical conformation Every transition from one AA to another along the protein backbone had a prominent ð4; jÞ peak within the a-helix basin (16), which corresponds to a ðq; rÞ peak representing the mean a-helical conformation of the given transition, totaling 400 AA-pair transitions (presented in Tables S1 and S2). The prominent ðq; rÞ peak values are used in this study as estimations of the a-helical conformation for the given transition. Thus, by decomposing some given a-helical sequence with N AAs, we get N – 1 transitions from which we use the prominent ðq; rÞ values to estimate the a-helical conformation. Representative helical conformations were calculated as the averages of the prominent ðq; rÞ values.

Determination of amyloidogenicity score

Data set 2 consists of unique hexapeptide a-helical segments that are found in the PDB. This data set was derived from data set 1 by keeping only single AA sequences and filtering out all the redundant occurrences. Because there are multiple values of 4 and j dihedrals for any given hexapeptide sequence that appears in the PDB more than once, it was necessary to estimate an average a-helical conformation based on earlier study (described below). The information for this data set contains an AA sequence of the corresponding a-helix (d2_aaseq.txt) and the amyloidogenicity score (d2_ascore.txt) calculated using the METAMYL (14) web server (described below).

METAMYL (14) is an online, publicly available web tool that allows the calculation of amyloidogenicity for arbitrary AA sequences. METAMYL is a metapredictor based on amyloidogenicity subscores of four predictors (PAFIG, SALSA, Waltz, and FoldAmyloid) that were ‘‘tested on their ability to find amyloid-forming regions defined on the basis of in vitro experiments’’ (14). The METAMYL predictor is based on a large number of experimentally validated amyloid regions and has been shown to have 84% accuracy. METAMYL breaks every AA sequence into hexapeptides and calculates the amyloidogenicity of hexapeptide sequences only. Tool command language (TCL) together with Curl (command line tool to transfer data between web clients and web servers) were used to automate the process of uploading the desired AA sequences to METAMYL and to download the resulting amyloidogenicity scores for the uploaded sequences.

Data set 3: random hexapeptide a-helices

RESULTS AND DISCUSSION

Data set 3 consists of hexapeptide a-helical segments that were randomly generated. The information for this data set contains an AA sequence of the corresponding a-helix (d3_aaseq.txt) and the amyloidogenicity score (d3_ascore.txt) as calculated by the METAMYL (14) web server (described below).

Amyloidogenicity is a probabilistic term describing the tendency of a polypeptide sequence or region within a protein toward nucleating the formation of amyloid fibril. Research on amyloidogenic sequences has focused on short polypeptide segments within proteins. Several hundreds of such sequences have been identified (15). A large portion of these sequences comprise the a-helical secondary structure, which has been recognized to be a key player in different stages of the amyloidogenesis process. The focus of this study is on the relation between the structure of the a-helical sequence and its amyloidogenic tendency. Our analysis was

Data set 2: PDB unique hexapeptide a-helices

Data set 4: empirical amyloidogenic and nonamyloidogenic polypeptides Data set 4 consists of polypeptide segments with a priori information regarding its amyloidogenicity that was empirically determined. This data set was built using AMYLOAD web server (15)—a dedicated web

2 Biophysical Journal 114, 1–9, April 24, 2018

Please cite this article in press as: Haimov and Srebnik, The Relation between a-Helical Conformation and Amyloidogenicity, Biophysical Journal (2018), https://doi.org/10.1016/j.bpj.2018.03.019

Conformation of Amyloidogenic Helices

performed on all available a-helical conformations at the PDB (17). Fig. 1 a presents the distribution of the length of a-helices sampled from the PDB in AA units, whereas Fig. 1 b presents the distribution of the length of a-helices with unique AA sequences. Despite the large percentage of redundant sequences (only 22.4% have a unique AA sequence), the distributions of the set containing redundant sequences and of the nonredundant set are very similar. In both cases, we observe that the two most abundant lengths are 6 and 12 AAs, so that the cause for the most abundant a-helical lengths (6 and 12 AAs) is not the redundancy. The peak around 6 AAs could be due to the method used to identify helices, which might tend to assign shorter sequences as other structural elements, such as loops or turns. However, we have previously shown (16) that the PDB definition of a-helices follows a relatively weak alignment criterion for hydrogen bonds between the carbonyl group and the amide along the helical backbone, thereby allowing for significant kinks and deformations of the helix. Studies performed on amyloidogenic sequences tend to focus on short peptide sequences, or hot spots, which have been shown to be important players in the nucleation of amyloids (18). These hot spots constitute a minimal amyloidogenic polypeptide length that allows the formation of cross-b structure (1,6,7). It was previously shown that peptide sequences as short as pentapeptides (19–21) and even tetrapeptides (21,22) are sufficient for the formation of amyloid fibrils. However, most of the studies performed on amyloidogenic sequences consider hexapeptides because the number of known amyloidogenic tetrapeptide and pentapeptide sequences is relatively small (14,18,23,24). Because tetrapeptides and pentapeptides (22) also allow for the formation of cross-b structures, it is not clear from these studies whether the hexapeptide length is more significant than that of tetrapeptides and pentapeptides or if it is simply the most probable. The distribution of helical lengths presented in Fig. 1, a and b, suggest that the significance of amyloidogenic hexapeptides over tetrapeptides and pentapeptides might be the result of their abundance.

The conformation of a polypeptide backbone may be represented using the dihedrals 4 and j plotted on the Ramachandran map (25). a-Helices are found in the lower-left quadrant of the map. Fig. 2 b presents the distribution of the a-helical (4,j) conformational pairs making up each of the a-helices that were sampled from the PDB. A total of 15.8e6 conformational pairs were sampled (corresponding to over nearly 2e6 helices), including redundant structures. The reason for leaving redundant phases in this case is because a-helices with similar AA sequences may adopt different conformations (26). The observed diagonal distribution of (4,j) pairs is well known and is primarily due to steric constraints (27) and hydrogen bonding (28). In our analysis of the a-helical structure, we utilize an alternative representation (16) using (q,r). In this orthogonal coordinate system, r is the number of residues per turn, and q is the angle of backbone carbonyls relative to the helix direction, thus providing a physical interpretation of the helical conformation. The calculation of (q,r) pairs is done through an approximate linear transformation from the Ramachandran dihedrals. Fig. 2 c presents the distribution of a-helical (q,r) pairs and clearly demonstrates that the mean a-helical conformation contains 3.6 res/turn and a q angle of 12 , which is in good agreement with previous reports (29,30). To study the dependency of amyloidogenicity on the a-helical conformation, we prepared three different data sets: 1) a-helices with sequences and conformations taken directly from the PDB, which is the same data set used for Fig. 1; 2) helical hexapeptide sequences taken from the PDB with estimated a-helical conformations (overlapping hexapeptide sequences were considered, i.e., N – 5 hexapeptide sequences for a polypeptide with N AA residues); and 3) a-helices with randomly generated hexapeptide sequences and with estimated a-helical conformations. The a-helical conformation for a given polypeptide is estimated from the (q,r) pair that corresponds to the (4,j) peak for each AA pair (16). For a a-helical sequence with N AAs, we get N – 1 transitions, from which we use the average of

FIGURE 1 (a) Distribution of the lengths of a-helices in the PDB (a total of 1.96e6). (b) A distribution of a-helices with unique AA sequences (0.44e6) is shown. From the ratio 0.44/ 1.96, we find that only 22.4% of PDB a-helical sequences are unique or that 77.6% are redundant. The figures demonstrate that despite the strong redundancy, the distributions remain similar, with hexapeptide (6 AA-long) helices as the most abundant. To see this figure in color, go online.

Biophysical Journal 114, 1–9, April 24, 2018 3

Please cite this article in press as: Haimov and Srebnik, The Relation between a-Helical Conformation and Amyloidogenicity, Biophysical Journal (2018), https://doi.org/10.1016/j.bpj.2018.03.019

Haimov and Srebnik FIGURE 2 (a) Illustration of (4,j) Ramachandran dihedrals for a given AA transition (pair). Black circles represent carbon atoms, blue circles represent nitrogen atoms, red circles represent oxygen atoms, and purple circles represent side chains. (b) A distribution of PDB a-helical conformations is shown using Ramachandran dihedrals. (c) A distribution of PDB a-helical conformations is shown using (q,r) pairs. The shades represent the abundance of the a-helical conformations: the darkest shade for least abundant and the brightest shade for most abundant conformations. The dark-blue shade in the background represents zero abundance. A total of 15.9e6 conformational pairs were sampled from the PDB with 15.8e6 pairs within the presented (4,j) and (q,r) ranges. To see this figure in color, go online.

peak values to obtain an estimate of the a-helical conformation. For all three data sets, a single representative conformational pair for each polypeptide was calculated as the average (q,r) value of all a-helices with the same AA sequence. The first data set is used to study the dependency of a-helical conformation on amyloidogenicity for conformations that appear as is in the PDB, including redundancy and different a-helical lengths. Averaging over hexamers as opposed to the entire polypeptide chain gave similar results. The purpose of the second data set is to study the dependency of a-helical conformation on amyloidogenicity for nonredundant hexapeptide sequences based on the mean (q,r) for each AA pair in the sequence. Thus, conformational fluctuations in redundant hexapeptide sequences are not considered. The randomly generated sequences introduced in the third data set test for the dependence of amyloidogenicity on sequence (versus structure) and compensate for the potential sequential sparsity of the first two data sets, allowing better stochasticity of the studied sequences. Previous studies have found a dependency of amyloidogenicity on AA sequence (10,31,32). Based on such correlations, amyloidogenic regions have been predicted using machines based on predictor algorithms that have been trained with sequences probed in in vivo and in vitro studies

(33). The amyloidogenicity of the helices used in this study was calculated using the online METAMYL web server (14), which predicts an amyloidogenic score based on the AA sequence. METAMYL relies on different contributions to the aggregation propensity of amyloid hot spots using a weighted combination of a number of amyloid predictors and hence is expected to result in greater accuracy and a lower rate of false positives than each of the individual predictors (33). For a-helices that contained more than 6 AAs in the first data set, the server was used to calculate the amyloidogenicity subscore of every hexapeptide subsequence, and the total amyloidogenicity of the helix was calculated as the average of these subscores. Thus, for every peptide sequence, we obtain the set (q,r,A) (A for amyloidogenicity). The resulting dependencies of q and r on the amyloidogenicity for the three data sets are shown as symbols in Fig. 3. The corresponding linear regression models (that apply strictly within the plotted bins) are given in Table 1. The inherent flexibility of the helices and thermal fluctuations, seen in Fig. 2, leads to large deviations. Therefore, we binned the data into 10 equally sized bins along the amyloidogenicity axis to focus the analysis on conformations with similar amyloidogenicity index, thereby reducing the SE of

FIGURE 3 Dependency of amyloidogenicity on q and r for the three data sets: a-helices with conformations sampled directly from PDB (filled circles), PDB helices broken into hexapeptide a-helices with a-helical conformations estimated from the mean of the strongly peaked (q, r) distribution (open circles), and randomly generated hexapeptide a-helices with estimated a-helical conformations (pentacles). The SEM of every bin is smaller than the marked symbol. To see this figure in color, go online.

4 Biophysical Journal 114, 1–9, April 24, 2018

Please cite this article in press as: Haimov and Srebnik, The Relation between a-Helical Conformation and Amyloidogenicity, Biophysical Journal (2018), https://doi.org/10.1016/j.bpj.2018.03.019

Conformation of Amyloidogenic Helices TABLE 1 Linear Regression Equations for the Dependency of Amyloidogenicity on q and r A(q) Regression Line

A(r) R2

Regression Line

R2

PDB full helices A ¼ 2.82 – 0.17 q 0.97 A ¼ 4.35 þ 1.41 r 0.96 PDB hexapeptide A ¼ 4.45 – 0.3 q 0.99 A ¼ 33.65 þ 9.48 r 0.92 helicesa Random A ¼ 4.77 – 0.33 q 0.99 A ¼ 26.67 þ 7.54 r 0.92 hexapeptide helicesa a

a-Helical conformations were estimated from AA transition data obtained from the PDB (16).

every bin. Further details and a statistical summary of the binned data sets may be found in Tables S3–S5. Further information about the SDs and SEs of the studied data sets may be found in Figs. S1 and S2. The findings presented in Fig. 3 demonstrate a clear dependency of amyloidogenicity on q and evidently a dubious dependency on r, further discussed below. In all cases, amyloidogenicity decreases with q and increases with r. Conformational fluctuations of redundant hexamer sequences lead naturally to a reduction in the slope of the first data set relative to the second. In addition, heavily redundant sequences that may appear in data set 1 will dominate and bias the mean conformation and its overall mean amyloidogenicity score. On the other hand, for the nonredundant data sets (second and third), the biasing of the overall mean conformation and the overall mean amyloidogenicity was also removed. Regardless, the trend in each case is consistent. The strong similarity between the slopes of the PDB hexapeptides and random hexapeptides is striking and emphasizes the good stochasticity of a-helical hexapeptide sequences found in the PDB and, more importantly, the role of helical structure as opposed to sequence. Though the two are clearly linked, i.e., sequence defines the helical structure (under the given environmental conditions), the opposite is not necessarily the case, as different sequences may share similar structures. In Fig. 4, we tested for the dependency of the amyloidogenicity score on polypeptide length for the PDB sequences in data set 0u. This is of particular interest, as METAMYL performs the calculation of amyloidogenicity only for hexapeptide segments and does not cover calculations of amyloidogenicity for shorter peptides. As can be seen, the calculated amyloidogenicity score is independent of polypeptide length, and hence, we can assume that shorter sequences would follow similar trends, as seen in Fig. 3. Conformational transitions in polypeptides have been found to occur in vivo near cellular membranes and depend on the charge and hydrophobicity of the membrane (7). Partial unfolding or misfolding of AA sequences initiates the aggregation process. Portions of these unfolded or misfolded sequences need to be exposed and available for contact with analogous sequences of a neighboring protein to

FIGURE 4 Dependence of the average amyloidogenicity on peptide length for PDB helices. The mean amyloidogenicity value A ¼ 0.55 represents the 50% threshold of hexapeptide sequences. To see this figure in color, go online.

allow for dimerization and the initiation of irreversible amyloidogenic aggregation under the physiological conditions that they form. Many other factors contribute, such as the solution environment (e.g., pH and temperature), the interaction with membrane and metal ions, and the accessibility of charged and hydrogen-bonding groups. For example, Fezoui and Teplow (34) showed that by systematically increasing the concentration of trifluoroethanol cosolvent, the a-helical phase of amyloid b-protein fibrils increased from almost 0% to almost 80%. Trifluoroethanol displaces water from the vicinity of the a-helix and destabilizes it by preventing the formation of favorable interactions, such as hydrogen bonding, between the water and helix backbone or residues (35). Bifurcated hydrogen bonds (BHBs) provide additional steric shielding of the peptide backbone (36) through hydrogen bonding with water and surrounding molecules that assist in stabilizing the a-helical conformation, presumably reducing the probability of amyloid formation. Fig. 5 a presents an illustration of an exposed a-helical backbone with non-BHBs, whereas Fig. 5 b presents an illustration of shielded a-helical backbone with BHBs. To analyze the relation between a-helical structure and amyloidogenicity, we assume that amyloidogenicity (A) is negatively proportional to the amount of BHBs along the a-helical backbone (NBHB) as follows: Af  NBHB ;

(1)

where the BHB is established between the ith carbonyl oxygen and the i þ 4th amide nitrogen (the main part of the BHB that is required to maintain the a-helical conformation) and between the ith carbonyl oxygen and one of the surrounding

Biophysical Journal 114, 1–9, April 24, 2018 5

Please cite this article in press as: Haimov and Srebnik, The Relation between a-Helical Conformation and Amyloidogenicity, Biophysical Journal (2018), https://doi.org/10.1016/j.bpj.2018.03.019

Haimov and Srebnik FIGURE 5 (a) a-Helical backbone with nonbifurcated hydrogen bonds and with low q results in exposed backbone and higher amyloidogenicity. (b) a-Helical backbone with bifurcated hydrogen bonds and with high q results in a shielded backbone and lower amyloidogenicity. Dashed lines represent hydrogen bonds between carbonyl oxygen and amide hydrogen. Enlarged black spheres represent surrounding water molecules or neighboring side-chains that can establish hydrogen bonds with the backbone carbonyls. Residues are not shown, for clarity. Illustrations were prepared with visual molecular dynamics (48). (c) A schematic view of the a-helix basin on the Ramachandran map is shown. The a-helix basin is found near the Oi1-Niþ1 sterically inaccessible region which is due to interactions between i þ 1 amide nitrogen atoms (N) and i  1 carbonyl oxygen atoms (O) along the a-helix backbone, where i represents the index of backbone amino acid residues. The tangent line between the sterically inaccessible region and the a-helix basin dictates the negative proportionality between r and q for q > 0. To see this figure in color, go online.

molecules (usually water) or to a neighboring side chain residue. Careful examination of the findings presented by Barlow and Thornton (37) reveal that when BHBs are established along the a-helical backbone, q indeed increases. Therefore, it is plausible to assume that NBHB is also proportional to the angle q as follows: NBHB fq;

(2)

as illustrated in Fig. 5 b, because q (which is the average angle of backbone carbonyls relative to the helix direction normal) must increase when BHBs are established along the a-helical backbone. Combining relations (1) and (2) results in the following relation: Af  q:

(3)

The resulting negative proportionality between the amyloidogenicity A and the angle q, as observed in Fig. 3, suggests that a contribution to the decrease of amyloidogenicity is the establishment of BHBs with the surrounding molecules that apparently lead to the increase of q and to the formation of steric shield around the a-helical backbone. Previous efforts have shown that hydrophobicity and amyloidogenicity are tightly related (13) and that hydrophobic sequences of three or more amino acids occur less frequently than expected assuming natural selection (38). Other efforts demonstrated that amyloidogenicity reduces with the increase of the net charge of protein molecules (39–41). Our findings further suggest that hydrophobic residues that are found near the a-helix backbone will prevent the establishment of BHBs with the a-helical polypeptide backbone that will reduce q and increase the amyloidogenicity. Similarly, proteins with increased net charge will encourage the establishment of BHBs with the a-helical polypeptide backbone that will eventually increase q and decrease the amyloidogenicity.

6 Biophysical Journal 114, 1–9, April 24, 2018

The linear dependency between A and r seen in Fig. 3 is dubious because r is expected to be invariant, as there is minimal change in the mechanical forces applied on the helical backbone for the different values of A. It is known that the a-helix basin is found near the Oi1.Niþ1 sterically inaccessible region that is due to steric interference between the i  1st carbonyl oxygen and the i þ 1st amide nitrogen atoms (where i represents the index of backbone AA residues). As a consequence, some of the a-helical conformations are constrained (28). Assuming an approximate oval shape of the inaccessible Oi1.Niþ1 region, as depicted in Fig. 5 c, the tangent line between the a-helix basin and the Oi1-Niþ1 sterically inaccessible region dictates the negative proportion between r and q for q > 0. Because q is always positive for a-helical conformations (16) and is usually greater than 10 , we deduce that the increase of r with increasing amyloidogenicity is due to steric constraints between the a-helix basin and the Oi1-Niþ1 sterically inaccessible region (the slope of the tangent line between the a-helix basin and the Oi1-Niþ1 line corresponds with the findings presented in previous reports (27,28)). Hence, q > 0/qf  r/Afr:

(4)

Thus, we conclude that amyloidogenic sequences possess, on average, lower w values and higher r values than nonamyloidogenic sequences. These tendencies were tested on real amyloidogenic and nonamyloidogenic sequences, using the AMYLOAD (15) web server that contains experimentally verified sequences of amyloidogenic and nonamyloidogenic polypeptides. Data set 4 (described in detail in Methods) contains 1480 such sequences, of which 1037 are nonamyloidogenic and 443 amyloidogenic. The distribution of estimated a-helical conformational pairs (q,r) for these sequences is shown in Fig. 6. Because the sequences were not broken down into hexapeptides, the distribution also contains (q,r) pairs of pentapeptides and tetrapeptides that were not considered in the initial analysis based on the METAMYL (14)

Please cite this article in press as: Haimov and Srebnik, The Relation between a-Helical Conformation and Amyloidogenicity, Biophysical Journal (2018), https://doi.org/10.1016/j.bpj.2018.03.019

Conformation of Amyloidogenic Helices

FIGURE 6 Distribution of estimated (q,r) pairs of experimentally verified nonamyloidogenic sequences (triangles) and experimentally verified amyloidogenic sequences (squares). The average conformations of the nonamyloidogenic sequences and amyloidogenic sequences are shown by ‘‘‘‘ and ‘‘þ,’’ respectively. A total of 1480 sequences was acquired from the AMYLOAD web server (15), with 1037 nonamyloidogenic and 443 amyloidogenic sequences. The distribution confirms that, on average, amyloidogenic sequences have lower q and higher r than nonamyloidogenic sequences. To see this figure in color, go online.

predictor. We note that although the sequences used in this part of our study are not necessarily helical, the projected helices may represent intermediate helical conformation adopted during amyloid formation (12). Fig. 6 confirms that amyloidogenic sequences (red squares) possess, on average, both lower w values and higher r values than nonamyloidogenic sequences (green triangles). These results suggest that the a-helical conformation of any amyloidogenic intermediate will possess the estimated (q,r) values presented in the distribution. Our previous study on the a-helix (16) suggests that favorable interactions between the residues and the surrounding environment would lead to an increase in q (decreasing A), whereas factors that lead to stretching of the a-helix would increase r (increasing A) and vice versa. Further experimental studies, such as those carried out by Fezoui and Teplow (34), should be carried out to shed light on these findings. Structure and sequence are inherently related, and hence, both correlate with amyloidogenicity. To examine this relation, we plot the conformational pair distribution of the different AAs on q-r space in Fig. 7 and consider it in view of the relations found above for the amyloidogenicty as a function of these two variables. The median amyloidogenicity score of PDB hexapeptide helices is 0.55. By assigning this value to the equations relating w, r, and amyloidogenicity, presented in Table 1, we get the corresponding conformational threshold values of qT ¼12.8 and rT ¼3.61 [res/turn]. qT and rT allow division of the

FIGURE 7 Distribution of the homogeneous transitions of a-helical conformations. Single-letter amino acid (AA) codes are placed on the meancalculated conformation. Letters correspond to AAs: uncharged polar (NQST), acidic (ED), basic (KRH), hydrophobic (AFILMVWY), and special (CGP). Dashed horizontal and vertical lines divide the conformational space into four quadrants, where the upper-left quadrant is the least amyloidogenic whereas the lower-right quadrant is the most amyloidogenic. Threshold values of qT z 12.8 and rT z 3.61 [res/turn] correspond to an amyloidogenicity score of 0.55. To see this figure in color, go online.

a-helix basin to four quadrants, as depicted in Fig. 7. According to Eqs. 3 and 4, the upper-left quadrant is the least amyloidogenic quadrant, whereas the lower right is the most amyloidogenic quadrant. A conformational pair requires a sequence of two AAs to be defined (see Fig. 2 a). For 20 AAs, 400 possible transitions (sequence of two AAs) exist, from which 20 are homogeneous and the other 380 are heterogeneous (i.e., are made up of two different AAs). The figure shows the distribution of homogeneous conformational pairs for helical sequences as defined by PDB entries. For clarity, the heterogeneous transitions are not shown (the distribution of all 400 possible transitions may be found in Fig. S3). Most of the conformations of the heterogeneous transitions may be estimated from a linear combination of the homogeneous transitions (16). Fig. 7 clarifies the relation between polypeptide sequence and structure. Based on the structural analysis, sequences rich in D, G, S, T, and W are expected to be nonamyloidogenic, whereas those rich in F, I, and V are expected to be amyloidogenic (P presents an exception, as discussed below). Clearly, the process of amyloid formation is complex, and there is evidence for sequence-related correlations (33). These correlations may be explained by the chemistry of the specific AA sequence (42), which in turn affects their conformation (16). Hydrophobic residues primarily make up the most amyloidogenic quadrant, in agreement with experimental analysis (43) and recent reports (13,38). Methionine (M), alanine (A), leucine (L), glutamine (E), and lysine (K) (MALEK) are grouped together and define

Biophysical Journal 114, 1–9, April 24, 2018 7

Please cite this article in press as: Haimov and Srebnik, The Relation between a-Helical Conformation and Amyloidogenicity, Biophysical Journal (2018), https://doi.org/10.1016/j.bpj.2018.03.019

Haimov and Srebnik

the favorable helical conformation because of their highest a-helical propensities (44). It is therefore expected that MALEK would not reside inside the most amyloidogenic quadrant, as these AAs will prefer to remain in the a-helical phase over amyloidogenic transformations. Interestingly, proline also resides in the most amyloidogenic quadrant despite its rare occurrence in amyloidogenic sequences. However, this is an artifact because the d-carbon of proline is covalently bonded to its nitrogen, resulting in strong confinement of its backbone dihedrals. Proline is found far from the mean a-helical conformations (MALEK) because it is known as a helix breaker, as its nitrogen cannot establish hydrogen bonds to maintain the a-helical backbone. Hence, proline tends to occur in unstructured domains, which are known to participate with much lower frequency in amyloid nucleation (45). Glycine (G), serine (S), aspartic acid (D), threonine (T), and tryptophan (W) reside in the least amyloidogenic quadrant. Although it is reasonable that the polar residues will prefer to decrease amyloidogenicity, the relation is less evident for the bulky and hydrophobic tryptophan and alanine, which reside on the border of the amyloidogenic quadrant. Clearly, further studies and greater statistical data bases are needed to explain the contribution of tryptophan to amyloidogenicity. Cysteine is a special AA residue and has the potential to form disulfide bridges with other cysteine residues. It is striking that cysteine has the highest w and has been recently reported to be antiamyloidogenic (46,47). CONCLUSIONS The relation between amyloidogenicity and the a-helical conformation was studied. Amyloidogenicity was found to increase with an increase of r (number of residues per helical turn) and decreased with an increase of q (the angle of backbone carbonyls relative to the helix direction). The fact that these trends are retained for random sequences stresses the role of structure of the a-helix (which is defined by the sequence as well as the surrounding environment). In practice, it is difficult to separate the environmental factors (such as pH and confinement) from intrinsic factors (such as sequence or secondary structure) that affect amyloidogenicity. Hence, studies on these effects often point to statistically observable trends, as these factors likely work in concert. Improved understanding of potentially amyloidogenic regions of proteins would benefit from consideration of each of these effects (such as sequence, charged groups, hydrophobicity, and structure) in conjunction with environmental factors that are known to affect potentially amyloidogenic regions of proteins to narrow down the distribution. Previous efforts explained that hydrophobicity (13,38) and the net charge of the protein (39–41) are tightly related to amyloidogenicity. Our study provides a complementary explanation of the amyloidogenicity puzzle by relating both hydrophobicity and the net charge to the

8 Biophysical Journal 114, 1–9, April 24, 2018

apparent angle q, which serves as an indicator for the amount of the established BHBs. BHBs are important, as they indicate how much the polypeptide backbone is sterically shielded and resistant to conformational changes. An increase in the number of BHBs leads to an increase in q and results in a decrease of amyloidogenicity. The trends observed are independent of polypeptide length, thus allowing the estimation of amyloidogenic trends for peptides of arbitrary length based on estimation of the helical structure. SUPPORTING MATERIAL Three figures, five tables, and five data files are available at http://www. biophysj.org/biophysj/supplemental/S0006-3495(18)30386-2.

AUTHOR CONTRIBUTIONS B.H. and S.S. designed the research and analysis. B.H. developed the computer codes, performed the research, and carried out the analysis. B.H. and S.S. wrote the manuscript.

ACKNOWLEDGMENTS This work was funded in part by the Israel Science Foundation grant number 265/16. B.H. acknowledges the generous financial support of The Irwin and Joan Jacobs Fellowship and the generous financial support of The Miriam and Aaron Gutwirth Memorial Fellowship.

REFERENCES 1. Chiti, F., and C. M. Dobson. 2006. Protein misfolding, functional amyloid, and human disease. Annu. Rev. Biochem. 75:333–366. 2. Sipe, J. D., M. D. Benson, ., P. Westermark. 2014. Nomenclature 2014: amyloid fibril proteins and clinical classification of the amyloidosis. Amyloid. 21:221–224. 3. Serpell, L. C., M. Sunde, ., P. E. Fraser. 2000. The protofilament substructure of amyloid fibrils. J. Mol. Biol. 300:1033–1039. 4. Eanes, E. D., and G. G. Glenner. 1968. X-ray diffraction studies on amyloid filaments. J. Histochem. Cytochem. 16:673–677. 5. Armen, R. S., D. O. Alonso, and V. Daggett. 2004. Anatomy of an amyloidogenic intermediate: conversion of b-sheet to a-sheet structure in transthyretin at acidic pH. Structure. 12:1847–1863. 6. Bellesia, G., and J. E. Shea. 2009. Diversity of kinetic pathways in amyloid fibril formation. J. Chem. Phys. 131:111102. 7. Ke, P. C., M. A. Sani, ., R. Mezzenga. 2017. Implications of peptide assemblies in amyloid diseases. Chem. Soc. Rev. 46:6492–6531. 8. Bryan, A. W., Jr., M. Menke, ., B. Berger. 2009. BETASCAN: probable b-amyloids identified by pairwise probabilistic analysis. PLoS Comput. Biol. 5:e1000333. 9. Garbuzynskiy, S. O., M. Y. Lobanov, and O. V. Galzitskaya. 2010. FoldAmyloid: a method of prediction of amyloidogenic regions from protein sequence. Bioinformatics. 26:326–332. 10. Tian, J., N. Wu, ., Y. Fan. 2009. Prediction of amyloid fibril-forming segments based on a support vector machine. BMC Bioinformatics. 10 (Suppl 1):S45. 11. Pannuzzo, M., A. Raudino, ., M. Karttunen. 2013. a-helical structures drive early stages of self-assembly of amyloidogenic amyloid polypeptide aggregate formation in membranes. Sci. Rep. 3:2781.

Please cite this article in press as: Haimov and Srebnik, The Relation between a-Helical Conformation and Amyloidogenicity, Biophysical Journal (2018), https://doi.org/10.1016/j.bpj.2018.03.019

Conformation of Amyloidogenic Helices 12. Abedini, A., and D. P. Raleigh. 2009. A critical assessment of the role of helical intermediates in amyloid formation by natively unfolded proteins and polypeptides. Protein Eng. Des. Sel. 22:453–459.

31. Kim, C., J. Choi, ., S. Yoon. 2009. NetCSSP: web application for predicting chameleon sequences and amyloid fibril formation. Nucleic Acids Res. 37:W469–W473.

13. Tzotzos, S., and A. J. Doig. 2010. Amyloidogenic sequences in native protein structures. Protein Sci. 19:327–348.

32. Maurer-Stroh, S., M. Debulpaep, ., F. Rousseau. 2010. Exploring the sequence determinants of amyloid structure using position-specific scoring matrices. Nat. Methods. 7:237–242.

14. Emily, M., A. Talvas, and C. Delamarche. 2013. MetAmyl: a METapredictor for AMYLoid proteins. PLoS One. 8:e79722. 15. Wozniak, P. P., and M. Kotulska. 2015. AmyLoad: website dedicated to amyloidogenic protein fragments. Bioinformatics. 31:3395–3397. 16. Haimov, B., and S. Srebnik. 2016. A closer look into the a-helix basin. Sci. Rep. 6:38341. 17. Berman, H. M., J. Westbrook, ., P. E. Bourne. 2000. The protein data bank. Nucleic Acids Res. 28:235–242. 18. Ventura, S., J. Zurdo, ., L. Serrano. 2004. Short amino acid stretches can mediate amyloid formation in globular proteins: the Src homology 3 (SH3) case. Proc. Natl. Acad. Sci. USA. 101:7258–7263. 19. Tenidis, K., M. Waldner, ., A. Kapurniotu. 2000. Identification of a penta- and hexapeptide of islet amyloid polypeptide (IAPP) with amyloidogenic and cytotoxic properties. J. Mol. Biol. 295:1055–1071. 20. Mazor, Y., S. Gilead, ., E. Gazit. 2002. Identification and characterization of a novel molecular-recognition and self-assembly domain within the islet amyloid polypeptide. J. Mol. Biol. 322:1013–1024. 21. Reches, M., Y. Porat, and E. Gazit. 2002. Amyloid fibril formation by pentapeptide and tetrapeptide fragments of human calcitonin. J. Biol. Chem. 277:35475–35480. 22. Tjernberg, L., W. Hosia, ., J. Johansson. 2002. Charge attraction and b propensity are necessary for amyloid fibril formation from tetrapeptides. J. Biol. Chem. 277:43243–43246. 23. Gazit, E. 2005. Mechanisms of amyloid fibril self-assembly and inhibition. Model short peptides as a key research tool. FEBS J. 272:5971– 5978. 24. Meng, S. R., Y. Z. Zhu, ., Y. Liang. 2012. Fibril-forming motifs are essential and sufficient for the fibrillization of human Tau. PLoS One. 7:e38903. 25. Ramachandran, G. N., C. Ramakrishnan, and V. Sasisekharan. 1963. Stereochemistry of polypeptide chain configurations. J. Mol. Biol. 7:95–99. 26. Kosloff, M., and R. Kolodny. 2008. Sequence-similar, structure-dissimilar protein pairs in the PDB. Proteins. 71:891–902.

33. Meric, G., A. S. Robinson, and C. J. Roberts. 2017. Driving forces for nonnative protein aggregation and approaches to predict aggregationprone regions. Annu. Rev. Chem. Biomol. Eng. 8:139–159. 34. Fezoui, Y., and D. B. Teplow. 2002. Kinetic studies of amyloid b-protein fibril assembly. Differential effects of a-helix stabilization. J. Biol. Chem. 277:36948–36954. 35. Roccatano, D., G. Colombo, ., A. E. Mark. 2002. Mechanism by which 2,2,2-trifluoroethanol/water mixtures stabilize secondary-structure formation in peptides: a molecular dynamics study. Proc. Natl. Acad. Sci. USA. 99:12179–12184. 36. Garcı´a, A. E., and K. Y. Sanbonmatsu. 2002. a-helical stabilization by side chain shielding of backbone hydrogen bonds. Proc. Natl. Acad. Sci. USA. 99:2782–2787. 37. Barlow, D. J., and J. M. Thornton. 1988. Helix geometry in proteins. J. Mol. Biol. 201:601–619. 38. Schwartz, R., and J. King. 2006. Frequencies of hydrophobic and hydrophilic runs and alternations in proteins of known structure. Protein Sci. 15:102–112. 39. Chiti, F., M. Calamai, ., C. M. Dobson. 2002. Studies of the aggregation of mutant proteins in vitro provide insights into the genetics of amyloid diseases. Proc. Natl. Acad. Sci. USA. 99 (Suppl 4):16419–16426. 40. Schmittschmitt, J. P., and J. M. Scholtz. 2003. The role of protein stability, solubility, and net charge in amyloid fibril formation. Protein Sci. 12:2374–2378. 41. Calloni, G., S. Zoffoli, ., F. Chiti. 2005. Investigating the effects of mutations on protein aggregation in the cell. J. Biol. Chem. 280:10607–10613. 42. Caflisch, A. 2006. Computational models for the prediction of polypeptide aggregation propensity. Curr. Opin. Chem. Biol. 10:437–444. 43. Lo´pez de la Paz, M., and L. Serrano. 2004. Sequence determinants of amyloid fibril formation. Proc. Natl. Acad. Sci. USA. 101:87–92. 44. Pace, C. N., and J. M. Scholtz. 1998. A helix propensity scale based on experimental studies of peptides and proteins. Biophys. J. 75:422–427.

27. Lovell, S. C., I. W. Davis, ., D. C. Richardson. 2003. Structure validation by Calpha geometry: phi,psi and Cbeta deviation. Proteins. 50:437–450.

45. Linding, R., J. Schymkowitz, ., L. Serrano. 2004. A comparative study of the relationship between protein structure and b-aggregation in globular and intrinsically disordered proteins. J. Mol. Biol. 342:345–353.

28. Ho, B. K., A. Thomas, and R. Brasseur. 2003. Revisiting the Ramachandran plot: hard-sphere repulsion, electrostatics, and H-bonding in the a-helix. Protein Sci. 12:2508–2522.

46. Zaman, M., S. M. Zakariya, ., R. H. Khan. 2017. Cysteine as a potential anti-amyloidogenic agent with protective ability against amyloid induced cytotoxicity. Int. J. Biol. Macromol. 105:556–565.

29. Fabiola, F., R. Bertram, ., M. S. Chapman. 2002. An improved hydrogen bond potential: impact on medium resolution protein structures. Protein Sci. 11:1415–1423.

47. Jia, L. G., X. M. Wang, ., J. W. Fox. 2000. Inhibition of platelet aggregation by the recombinant cysteine-rich domain of the hemorrhagic snake venom metalloproteinase, atrolysin A. Arch. Biochem. Biophys. 373:281–286.

30. Walsh, S. T., R. P. Cheng, ., W. F. DeGrado. 2003. The hydration of amides in helices; a comprehensive picture from molecular dynamics, IR, and NMR. Protein Sci. 12:520–531.

48. Humphrey, W., A. Dalke, and K. Schulten. 1996. VMD: visual molecular dynamics. J. Mol. Graph. 14:33–38, 27–28.

Biophysical Journal 114, 1–9, April 24, 2018 9