Peptides 23 (2002) 2085–2090
Estimation of amino acid pairs sensitive to variants in human phenylalanine hydroxylase protein by means of a random approach Guang Wu a,∗ , Shaomin Yan b a
Laboratoire de Toxicocinétique et Pharmacocinétique, Faculté de Pharmacie, Université de la Méditerranée Aix-Marseille II, Marseille, France b Cattedra di Anatomia Patologica, Dipartimento di Ricerche Mediche e Morfologiche, Facoltà di Medicina e Chirurgia, Università degli Studi di Udine, Udine, Italy Received 6 June 2002; accepted 25 July 2002
Abstract
In this data-based theoretical analysis, we use a random approach to estimate amino acid pairs in the human phenylalanine 4-hydroxylase (PAH) protein in order to determine which amino acid pairs are more sensitive to 187 variants of the protein. The rationale of this study is based on our hypothesis and previous findings that harmful variants are more likely to occur at randomly unpredictable amino acid pairs than at randomly predictable pairs. This is a reasonable argument because randomly predictable amino acid pairs are less likely to be deliberately evolved, whereas randomly unpredictable amino acid pairs are probably deliberately evolved in connection with protein function. Of the 187 variants, 94.12% occurred at randomly unpredictable amino acid pairs, which accounted for 71.84% of the 451 amino acid pairs in human PAH protein. The chance of a variant occurring is five times higher in randomly unpredictable amino acid pairs than in predictable pairs. Thus, randomly unpredictable amino acid pairs are more sensitive to variants in human PAH protein. The results also suggest that the human PAH protein has a natural tendency towards variants. © 2002 Elsevier Science Inc. All rights reserved.
Keywords: Phenylalanine hydroxylase; Probability; Randomness; Variants
1. Introduction
Phenylalanine 4-hydroxylase (PAH) is a liver enzyme that catalyses the conversion of phenylalanine (F) to tyrosine (Y) [12]. Deficient PAH activity results in hyperphenylalaninemia, with an incidence of approximately 1:10,000 in most Caucasian populations [20]. A complete absence or profound deficiency of the enzyme activity brings about very high elevations of blood ‘F’ and the accumulation of phenylketones (phenylketonuria). A partial deficiency of PAH leads to lower elevations of blood ‘F’ without phenylketone accumulation (non-phenylketonuria hyperphenylalaninemia). In the United States, the incidence varies widely: about one per 15,000 newborns for phenylketonuria and one per 48,000 newborns for non-phenylketonuria hyperphenylalaninemia [16]. The neurologic disease appears to be secondary to the increase in brain free ‘F’ and the decrease in other large neutral amino acids [22]. Abnormalities of untreated phenylketonuria individuals manifest
Corresponding author. E-mail address:
[email protected] (G. Wu).
tremor, clumsiness, epilepsy, spastic paraparesis and occasionally extrapyramidal features [4]. Pathologic changes in the brain of these patients include (1) hypomyelination and gliosis of the systems which normally myelinate late, (2) the infrequent occurrence of progressive white matter degeneration, and (3) developmental delay or arrest in cerebral cortex [11]. The vast majority of phenylketonuria and non-phenylketonuria hyperphenylalaninemia are autosomal recessive disorders caused by mutations in the PAH gene, which is a single locus with more than 400 identified mutations, including deletions, insertions, missense mutations, nonsense mutations, and splicing defects [17]. The observed spectra of PAH mutations closely reflect the impact of population history on genetic variation [9]. Certain PAH alleles are associated with phenylketonuria and others with non-phenylketonuria hyperphenylalaninemia. Also genes at other loci may influence ‘F’ transport within the brain. Therefore, the genetic heterogeneity contributes to the clinical and biological heterogeneity [8,13]. Missense mutations in the PAH gene compose 60% of the alleles impairing PAH function [21]. The mutations in the catalytic site affect
0196-9781/02/$ – see front matter © 2002 Elsevier Science Inc. All rights reserved. PII: S 0 1 9 6 - 9 7 8 1 ( 0 2 ) 0 0 2 4 9 - 8
enzyme activity [3], while the mutations distant from the enzyme active site cause misfolding and altered oligomerization of PAH protein, promoting accelerated cellular proteolytic degradation [23,24]. Despite the large number of known variants of the enzyme, however, little is known about which amino acid sub-sequences in PAH protein are more sensitive to variants, and it is still difficult to draw a general rule on this point. If such a general rule could be drawn, we would gain more insight into the relationship between PAH protein and its related diseases and, more importantly, we could pay more attention to these sensitive sub-sequences in order to protect them from variants. Moreover, the sub-sequences sensitive to currently unknown variants could be predicted. This problem can be approached in different ways, such as empirical (regression analysis), experimental (artificial and natural mutations), and computational (multiple sequence comparisons and alignments) methods. Currently, two explanations are commonly proposed for why some amino acids are mutated more frequently than others. The first is targeted mutagenesis, which defines the ‘hotspot’ sites sensitive to endogenous and exogenous mutagens [10,15,19]. The second is functional selection, which indicates that the disruption of protein function may depend upon the position of the mutation/variant in the protein [1,7,18]. However, these explanations still do not answer why some amino acid sub-sequences are sensitive to variants. A probabilistic approach may contribute to the understanding of this problem. Chou and Liu introduced the frequency of amino acid pairs to predict protein secondary structure content [5,14].
In the past, we have used probabilistic approaches to analyze the primary structure of different proteins which have added probabilistic information regarding protein constructions and their related diseases. In general, our first approach can predict the present and absent amino acid sub-sequences in a protein primary structure. We argue that the randomly predictable present and absent sub-sequences were probably not deliberately evolved, whereas the randomly unpredictable present and absent sub-sequences were more likely to be deliberately evolved. Accordingly our approach can classify the present amino acid sub-sequences as randomly predictable and unpredictable sub-sequences. We suggest that the randomly unpredictable amino acid sub-sequences are more related with protein function and the variants in these sub-sequences may lead to the dysfunction of protein. More recently we found that a mutation which led to the dysfunction of rat monoamine oxidase B was located in a randomly unpredictable amino acid pair. In contrast, another mutation which did not affect rat monoamine oxidase B function was located in randomly predictable amino acid pairs [25]. In this study, we attempt to use a random approach to analyze amino acid pairs in human PAH protein with its 187
variants in order to determine which amino acid pairs are more sensitive to the variants.
2. Materials and methods
The amino acid sequence of the human PAH protein and its variants was obtained from the SWISS-PROT data bank (accession number P00439; due to the limitation of space, we will not cite the numerous references related to human PAH protein and its variants) [2]. Among 196 variants, 8 were small deletions that resulted in one amino acid being absent from the mutant protein, and another one brought about two amino acids being substituted. The remaining 187 variants, all missense point mutations, were selected for the current study. The detailed calculations and rationales have already been published in a number of our previous studies (for the details, see our review article [26]). Briefly, the calculation procedure with its examples is as follows.
2.1. Amino acid pairs in human PAH protein
The human PAH protein consists of 452 amino acids. The first and second amino acids are counted as an amino acid pair, the second and third as another amino acid pair, the third and fourth, and so on until the 451st and 452nd; thus, there is a total of 451 amino acid pairs. There are 20 types of amino acids and any amino acid pair can be composed from any of them so, theoretically, there are 400 (20²) possible types of amino acid pairs. As the 451 amino acid pairs in human PAH protein outnumber the 400 theoretical types, clearly some of these types must appear more than once. Meanwhile, we may expect that some of the 400 theoretical types are absent from human PAH protein.
2.2. Randomly predicted frequency (PF) and actual frequency (AF) in human PAH protein
The randomly predicted frequency is calculated according to a simple permutation principle [6]. For example, there are 28 alanines (A) and 50 leucines (L) in human PAH protein, so the predicted frequency of amino acid pair ‘AL’ would be 3 (28/452 × 50/451 × 451 = 3.097).
Actually, we can find three ‘AL’s in this protein, so the actual frequency of ‘AL’ is 3. Hence there are three possible relationships between actual and predicted frequencies, i.e. the actual frequency is smaller than, equal to, or larger than the predicted frequency.
2.3. Randomly predictable present amino acid pairs
As described in the last section, the predicted frequency of amino acid pair ‘AL’ would be 3 and ‘AL’ does appear three times in human PAH protein, so the presence of ‘AL’ is randomly predictable.
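As an illustration only, the calculation described in Sections 2.1 to 2.3 can be sketched in Python. The function names (round_half_up, predicted_frequency, actual_frequencies) are hypothetical labels and not part of the published method. Two points are assumptions: that the predicted frequency is rounded to the nearest integer with halves going up, and that a pair of two identical residues uses a second count reduced by one; the worked examples in the text do not cover either case.

```python
import math
from collections import Counter


def round_half_up(x):
    """Round to the nearest integer, halves going up (assumed rounding rule)."""
    return math.floor(x + 0.5)


def predicted_frequency(n_first, n_second, length, same_residue=False):
    """Randomly predicted frequency of an ordered amino acid pair.

    Mirrors the worked example in the text:
    n_first/length * n_second/(length - 1) * (length - 1).
    For two identical residues the second count is taken as n_first - 1,
    which is an assumption not spelled out in the text.
    """
    if same_residue:
        n_second = n_first - 1
    n_pairs = length - 1  # 451 adjacent pairs in a 452-residue protein
    return n_first / length * n_second / n_pairs * n_pairs


def actual_frequencies(sequence):
    """Count overlapping adjacent pairs: residues 1-2, 2-3, ..., (n-1)-n."""
    return Counter(sequence[i:i + 2] for i in range(len(sequence) - 1))


# 'AL' in human PAH: 28 alanines, 50 leucines, 452 residues.
pf_al = predicted_frequency(28, 50, 452)
print(round(pf_al, 3), round_half_up(pf_al))  # 3.097 3

# Toy sequence (not PAH) to show the overlapping pair count.
print(actual_frequencies("ALAL"))  # Counter({'AL': 2, 'LA': 1})
```

With the full P00439 sequence as input, comparing `actual_frequencies(sequence)[pair]` with the rounded predicted value would give the three relationships (smaller, equal, larger) used throughout the following sections.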
2.4. Randomly unpredictable present amino acid pairs
There are 24 arginines (R) and 23 prolines (P) in human PAH protein, so the frequency of random presence of amino acid pair ‘RP’ would be 1 (24/452 × 23/451 × 451 = 1.221), i.e. there would be one ‘RP’ in human PAH protein. But in fact ‘RP’ appears three times, so the presence of ‘RP’ is randomly unpredictable. In this case, the actual frequency of ‘RP’ is larger than its predicted frequency. In other cases, the actual frequency is smaller than the predicted frequency. For example, there are 24 threonines (T) in human PAH protein and the predicted frequency of ‘TL’ is 3 (24/452 × 50/451 × 451 = 2.655), whereas the actual frequency is only 1.
2.5. Randomly predictable absent amino acid pairs
There are three tryptophans (W) in human PAH protein, so the frequency of random presence of ‘AW’ would be 0 (28/452 × 3/451 × 451 = 0.186), i.e. the amino acid pair ‘AW’ would not appear in human PAH protein, which is true in the real situation. Thus, the absence of ‘AW’ is randomly predictable.
2.6. Randomly unpredictable absent amino acid pairs
There are 36 glutamic acids (E) in human PAH protein and the frequency of random presence of ‘RE’ would be 2 (24/452 × 36/451 × 451 = 1.912), i.e. there would be two ‘RE’s in human PAH protein. However, no ‘RE’ appears in the protein, therefore the absence of ‘RE’ from human PAH protein is randomly unpredictable.
2.7. Variants in randomly predictable and unpredictable amino acid pairs
Our rationale for determination of variants in randomly predictable and unpredictable present amino acid pairs is based on our previous findings [26]. There were two mutations in rat monoamine oxidase B. The first mutation occurred at position 139, changing L to histidine (H). The amino acids were P at position 138 and A at position 140, thus this mutation led to two amino acid pairs being replaced by two others, i.e. ‘PL’ → ‘PH’ and ‘LA’ → ‘HA’.
As ‘PL’ and ‘LA’ were randomly predictable amino acid pairs according to our random analysis, we would not expect the first mutation to result in a substantial change in enzymic activity, which was true in the real situation. The second mutation occurred at position 199 and substituted ‘I’ by ‘F’, which changed amino acid pairs as ‘II’ → ‘IF’ and ‘IS’ → ‘FS’. As ‘IS’ belonged to the randomly unpredictable amino acid pairs, we would expect the second mutation to bring about a substantial change in enzymic activity; this expectation was also true in the real situation. In this manner, we can determine whether a variant occurs at randomly predictable or unpredictable amino acid pairs in human PAH protein.
2.8. Difference between actual and randomly predicted frequencies
For the numerical analysis, we calculate the difference between the actual frequency and the predicted frequency of the affected amino acid pairs, i.e. (AF − PF). For instance, a variant at position 333 substitutes ‘L’ by phenylalanine (F), which results in two amino acid pairs ‘GL’ and ‘LC’ being changed to ‘GF’ and ‘FC’, because the amino acid is ‘G’ at position 332 and ‘C’ at position 334. The actual and predicted frequencies are 6 and 3 for ‘GL’, 2 and 1 for ‘LC’, 3 and 1 for ‘GF’, and 0 and 1 for ‘FC’, respectively. Thus, the difference between actual and predicted frequencies is 4 for the old substituted amino acid pairs, i.e. (6 − 3) + (2 − 1), and 1 for the new substituting amino acid pairs, i.e. (3 − 1) + (0 − 1). In this way, we can compare the frequency difference in the amino acid pairs affected by variants.
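The four categories of Sections 2.3 to 2.6 and the (AF − PF) calculation of Section 2.8 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' software: the function names are hypothetical, the three-residue fragment "GLC" stands in for positions 332 to 334, and the AF and PF values are copied from the example in Section 2.8 rather than recomputed from the full sequence.

```python
def classify(actual, predicted):
    """Label a pair type with one of the four categories of Sections 2.3-2.6."""
    if actual == predicted:
        return "predictable present" if actual > 0 else "predictable absent"
    return "unpredictable present" if actual > 0 else "unpredictable absent"


def affected_pairs(sequence, position, new_residue):
    """Return (substituted, substituting) pairs for a point substitution.

    `position` is 1-based. Only interior positions are handled (assumption),
    because a terminal residue belongs to a single pair.
    """
    i = position - 1  # convert to a 0-based index
    if i < 1 or i > len(sequence) - 2:
        raise ValueError("position must have a neighbour on both sides")
    left, old, right = sequence[i - 1], sequence[i], sequence[i + 1]
    substituted = [left + old, old + right]
    substituting = [left + new_residue, new_residue + right]
    return substituted, substituting


def frequency_difference(pairs, actual, predicted):
    """Sum of (AF - PF) over the given pairs."""
    return sum(actual[p] - predicted[p] for p in pairs)


# Values quoted in Section 2.8 for the variant L -> F at position 333.
AF = {"GL": 6, "LC": 2, "GF": 3, "FC": 0}
PF = {"GL": 3, "LC": 1, "GF": 1, "FC": 1}

old_pairs, new_pairs = affected_pairs("GLC", 2, "F")
print(old_pairs, frequency_difference(old_pairs, AF, PF))  # ['GL', 'LC'] 4
print(new_pairs, frequency_difference(new_pairs, AF, PF))  # ['GF', 'FC'] 1
print(classify(3, 3), "|", classify(0, 2))  # predictable present | unpredictable absent
```

The last line reproduces the ‘AL’ (AF = PF = 3) and ‘RE’ (AF = 0, PF = 2) examples from Sections 2.3 and 2.6.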
3. Results
3.1. General information on amino acid pairs in human PAH protein
Of the 400 theoretical types of amino acid pairs, 166 are absent from human PAH protein, including 80 randomly predictable and 86 randomly unpredictable. Consequently, the 451 amino acid pairs in human PAH protein include only 234 types (400 − 166 = 234), i.e. some types must appear more than once. In fact, 109 types appear once, 71 twice, 31 three times, 13 four times, 5 five times, and 5 six times. Of the 234 types present in human PAH protein, 94 are randomly predictable and 140 are randomly unpredictable. As some types appear more than once, of the 451 amino acid pairs in human PAH protein, 127 pairs are randomly predictable and 324 pairs are randomly unpredictable. Therefore, the number of variants occurring with respect to these amino acid pairs in human PAH protein can be examined by probability (Table 1).
3.2. Variants of human PAH protein in randomly predictable and unpredictable present amino acid pairs
As mentioned in Section 2, a point mutation leads to two amino acid pairs being substituted by another two, and their actual frequency can be smaller than, equal to or larger than the predicted frequency. Tables 2 and 3 detail the situations related to substituted and substituting amino acid pairs, respectively, and the relationship between their actual and predicted frequencies. In both tables the affected amino
acid pairs are presented as amino acid pairs I and II. Again the example is the variant mentioned in the last paragraph of Section 2. ‘GL’ and ‘LC’ are amino acid pairs I and II, respectively, in Table 2. Correspondingly, ‘GF’ and ‘FC’ are amino acid pairs I and II in Table 3. Table 2 can be read as follows. The first column classifies the substituted amino acid pairs into randomly predictable and unpredictable. The second and third columns show in which type of amino acid pairs the variant occurs; for example, the first two cells in columns two and three indicate that the actual frequencies are equal to the predicted frequencies in both amino acid pairs I and II. The fourth and fifth columns give the number and percentage of variants occurring in each combination of amino acid pairs I and II. Eleven of 187 variants (5.88%) occur at amino acid pairs whose actual frequencies are equal to predicted frequencies. The sixth column indicates the percentage of the 187 variants occurring at predictable and unpredictable amino acid pairs. Tables 1 and 2 show that 94.12% of variants occur at randomly unpredictable amino acid pairs and only 5.88% of variants occur at randomly predictable amino acid pairs. These results mean that the 140 types of randomly unpredictable present amino acid pairs account for 94.12% of variants in human PAH protein, whereas the 94 types of randomly predictable present amino acid pairs account for 5.88%. These results strongly support our rationale that harmful variants are more likely to occur at randomly unpredictable amino acid pairs than at randomly predictable ones. Thus, the randomly unpredictable amino acid pair positions are more sensitive to the variants. When looking at the unpredictable pairs in Table 2, it can be seen that the vast majority of these pairs are characterized

Table 1
Occurrence of variants with respect to randomly predictable and unpredictable amino acid pairs in human PAH protein

PAH protein      Types             Pairs             Variants          Ratio
                 Number   %        Number   %        Number   %        Variants/types     Variants/pairs
Predictable      94       40.17    127      28.16    11       5.88     11/94 = 0.12       11/127 = 0.09
Unpredictable    140      59.83    324      71.84    176      94.12    176/140 = 1.26     176/324 = 0.54
Total            234      100      451      100      187      100      187/234 = 0.80     187/451 = 0.41

Table 2
Classification of substituted amino acid pairs induced by variants in human PAH protein

Amino acid pairs                          Variants           Total (%)
                 I          II            Number   %
Predictable      AF = PF    AF = PF       11       5.88      5.88
Unpredictable    AF > PF    AF > PF       79       42.25     94.12
                 AF > PF    AF = PF       72       38.50
                 AF > PF    AF < PF       17       9.09
                 AF < PF    AF = PF       5        2.67
                 AF < PF    AF < PF       3        1.60

AF: actual frequency; PF: predicted frequency.
Table 3
Classification of substituting amino acid pairs induced by variants in human PAH protein

Amino acid pairs                            Variants           Total (%)
I                    II                     Number   %
AF = 0, PF > 0       AF = 0, PF > 0         10†      5.35      49.72
AF = 0, PF > 0       AF = PF = 0            3†       1.60
AF = 0, PF > 0       AF = PF > 0            30†      16.04
AF = 0, PF > 0       AF < PF, AF ≠ 0        9†       4.81
AF = 0, PF > 0       AF > PF                26†      13.90
AF = PF = 0          AF = PF = 0            4        2.14
AF = PF = 0          AF = PF > 0            4        2.14
AF = PF = 0          AF < PF, AF ≠ 0        0†       0
AF = PF = 0          AF > PF                7        3.74
AF < PF, AF ≠ 0      AF < PF, AF ≠ 0        4†       2.14      50.28
AF < PF, AF ≠ 0      AF = PF > 0            12†      6.42
AF < PF, AF ≠ 0      AF > PF                16†      8.56
AF = PF > 0          AF = PF > 0            14       7.49
AF > PF              AF > PF                17       9.09
AF = PF > 0          AF > PF                31       16.58

The dagger (†) indicates the variants which target one or both substituting amino acid pairs whose actual frequency is smaller than the predicted one (58.82% in total).
Fig. 1. Frequency difference between substituted and substituting amino acid pairs induced by variants.
by one or both substituted amino acid pairs whose actual frequency is larger than their predicted frequency (the first three rows in unpredictable pairs). Compared with the normal human PAH protein, the impact of variants is to diminish the difference between actual and predicted frequencies by reducing the actual frequency, which implies that the variants drive the construction of amino acid pairs towards being randomly predictable. In other words, the variants result in the construction of amino acid pairs which are more likely to be naturally evolved. Also, only three variants occur in amino acid pairs whose actual frequency is smaller than the predicted frequency in both pairs. This interesting phenomenon suggests that it is difficult for variants to narrow the difference between actual and predicted frequencies by increasing the actual frequency. Commonly, reduction of actual frequency would lead to the construction of amino acid pairs against natural direction. Table 3 can be read as follows. The first and second columns indicate the actual and predicted situations in amino acid pairs I and II, the third and fourth columns indicate the number of variants occurring at amino acid pairs I and II and their percentages, and the fifth column shows the totals of the classifications. Table 3 shows that 49.72% of variants bring about one or both substituting amino acid pairs being absent in normal human PAH protein (AF = 0). Also, 58.82% of variants target one or both substituting amino acid pairs with their actual frequency smaller than the predicted frequency (†). These phenomena indicate that the amino acid pairs in mutant PAH proteins are more randomly constructed.
3.3. Frequency difference of amino acid pairs affected by variants
The difference between actual and predicted frequencies represents a measure of randomness of construction of amino acid pairs, i.e. the smaller the difference, the more random
the construction of amino acid pairs. In particular (i) the larger the positive difference, the more randomly unpredictable the amino acid pairs are present and (ii) the larger the negative difference, the more randomly unpredictable the amino acid pairs are absent. Considering all 187 variants, the difference between actual and predicted frequencies is 1.96 ± 0.11 (mean ± S.E., ranging from −2 to 6) in substituted amino acid pairs. This means that the variants occur in the amino acid pairs which appear more than their predicted frequency. Meanwhile, the difference between actual and predicted frequencies is −0.03 ± 0.11 (mean ± S.E., ranging from −3 to 5) in substituting amino acid pairs. This implies that the substituting amino acid pairs are more randomly constructed in the mutant PAH proteins, as their actual and predicted frequencies are about the same. Striking statistical difference is found between the substituted and substituting amino acid pairs (P < 0.0001). Fig. 1 shows the distribution of difference between actual and predicted frequencies.
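The bookkeeping behind Section 3.1 and the ratio columns of Table 1 can be re-derived from the reported counts. The short Python sketch below is only a consistency check; the dictionary layout and variable names are hypothetical, and every number is taken from the text and Table 1.

```python
# Number of pair types by how many times each type appears (Section 3.1).
occurrences = {1: 109, 2: 71, 3: 31, 4: 13, 5: 5, 6: 5}
n_types = sum(occurrences.values())
n_pairs = sum(times * count for times, count in occurrences.items())
print(n_types, n_pairs)  # 234 451

# Counts reported in Table 1.
table1 = {
    "predictable": {"types": 94, "pairs": 127, "variants": 11},
    "unpredictable": {"types": 140, "pairs": 324, "variants": 176},
}

# Variants per type and variants per pair, rounded as in Table 1.
ratios = {
    name: (
        round(row["variants"] / row["types"], 2),
        round(row["variants"] / row["pairs"], 2),
    )
    for name, row in table1.items()
}
print(ratios)  # {'predictable': (0.12, 0.09), 'unpredictable': (1.26, 0.54)}
```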
4. Discussion In this study we use a random approach to analyze the amino acid pairs in human PAH protein to determine which amino acid pairs are more sensitive to variants. The results confirm our hypothesis that the randomly unpredictable amino acid pairs are more sensitive to variants. This data-based theoretical analysis may provide a clue for protecting human PAH protein from variants and highlight the nature of some PAH variants. Based on our previous studies (due to the limitation of space, see our review article [26] for detail of reference), our argument is that the functional amino acid pairs are more likely to be deliberately evolved and thus the actual frequency should be different from the predicted frequency.
As the predicted frequency is the highest potential for construction of amino acid pairs, it is important to find whether a variant causes the actual frequency to approach the predicted frequency. If so, the protein has a natural trend towards variants; if not, it does not. The present study demonstrates that 90% of variants bring about one or both substituted amino acid pairs whose actual frequency is larger than the predicted frequency, that half of the variants result in one or both substituting amino acid pairs which are absent in normal human PAH protein (AF = 0), and that about 60% of variants lead to one or both substituting amino acid pairs with their actual frequency smaller than the predicted frequency. All of these results reveal that the human PAH protein has a natural trend towards variants. With respect to randomly unpredictable absent and present amino acid pairs, the difference between actual and predicted frequencies is interesting, because the predictable absent and present frequencies represent the more likely naturally occurring events, i.e. the construction of these amino acid pairs should be the least energy- and time-consuming. Thus, the difference between actual and predicted frequencies should be engineered by the evolutionary process, i.e. the larger the difference, the larger the impact of the evolutionary process. A diminishing of the difference between actual and predicted frequencies has been shown in this study (Fig. 1), thus the variants are in fact a degeneration process inducing the various phenotypes related to PAH deficiency. The current study highlights the changes in the frequencies of amino acid pairs in mutant human PAH protein. In general, missense point mutations modify the configuration of amino acid pairs in a protein, which could change the secondary structure content and consequently affect the biologic functions of the protein.
Therefore, our approach may provide useful insight into the molecular mechanisms of PAH deficient diseases.

Acknowledgments
The authors wish to thank the anonymous reviewers for their insightful comments, which sharpened the current version of the manuscript.

References
[1] Aas T, Borresen AL, Geisler S, Smith-Sorensen B, Johnsen H, Varhaug JE, et al. Specific p53 mutations are associated with de novo resistance to doxorubicin in breast cancer patients. Nat Med 1996;2:811–4.
[2] Bairoch A, Apweiler R. The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 2000. Nucleic Acids Res 2000;28:45–8.
[3] Bjorgo E, Knappskog PM, Martinez A, Stevens RC, Flatmark T. Partial characterization and three-dimensional-structural localization of eight mutations in exon 7 of the human phenylalanine hydroxylase gene associated with phenylketonuria. Eur J Biochem 1998;257:1–10.
[4] Brenton DP, Pietz J. Adult care in phenylketonuria and hyperphenylalaninemia: the relevance of neurological abnormalities. Eur J Pediatr 2000;159(Suppl 2):S114–20.
[5] Chou KC. Using pair-coupled amino acid composition to predict protein secondary structure content. Protein Engin 1999;12:1041–50.
[6] Feller W. An introduction to probability theory and its applications. Vol. I, 3rd ed. New York: Wiley; 1968.
[7] Forrester K, Lupold SE, Ott VL, Chay CH, Band V, Wang XW, et al. Effects of p53 mutants on wild-type p53-mediated transactivation are cell type dependent. Oncogene 1995;10:2103–11.
[8] Guldberg P, Rey F, Zschocke J, Romano V, Francois B, Michiels L, et al. A European multicenter study of phenylalanine hydroxylase deficiency: classification of 105 mutations and a general system for genotype-based prediction of metabolic phenotype. Am J Hum Genet 1998;63:71–9.
[9] Guttler F, Guldberg P. Mutation analysis anticipates dietary requirements in phenylketonuria. Eur J Pediatr 2000;159(Suppl 2):S150–3.
[10] Hainaut P, Pfeifer GP. Patterns of p53 G → T transversions in lung cancers reflect the primary mutagenic signature of DNA-damage by tobacco smoke. Carcinogenesis 2001;22:367–74.
[11] Huttenlocher PR. The neuropathology of phenylketonuria: human and animal studies. Eur J Pediatr 2000;159(Suppl 2):S102–6.
[12] Kaufman S. Phenylalanine 4-monooxygenase from rat liver. Methods Enzymol 1987;142:3–17.
[13] Kayaalp E, Treacy E, Waters PJ, Byck S, Nowacki P, Scriver CR. Human PAH mutation and hyperphenylalaninemia phenotypes: a metanalysis of genotype-phenotype correlations. Am J Hum Genet 1997;61:1309–71.
[14] Liu W, Chou KC. Prediction of protein secondary structure content. J Protein Chem 1999;18:473–80.
[15] Montesano R, Hainaut P, Wild CP. Hepatocellular carcinoma: from gene to public health. J Natl Cancer Inst 1997;89:1844–51.
[16] National Institutes of Health Consensus Development Panel. In: Proceedings of the National Institutes of Health Consensus Development Conference Statement: Phenylketonuria: Screening and Management, 2000 October 16–18. Pediatrics 2001;108:972–82.
[17] Nowacki PM, Byck S, Prevost L, Scriver CR. PAH mutation analysis consortium database: 1997. Prototype for relational locus-specific mutation databases. Nucleic Acids Res 1998;26:222–7.
[18] Ory K, Legros Y, Auguin C, Soussi T. Analysis of the most representative tumour-derived p53 mutants reveals that changes in protein conformation are not correlated with loss of transactivation or inhibition of cell proliferation. EMBO J 1994;13:3496–504.
[19] Rideout WM, Coetzee GA, Olumi AF, Jones PA. 5-Methylcytosine as an endogenous mutagen in human LDL receptor and p53 genes. Science 1990;249:1288–90.
[20] Scriver CR, Kaufman S, Eisensmith RC, Woo SLC. The hyperphenylalaninemias. In: Scriver CR, Beaudet AL, Sly WS, Valle D, editors. The metabolic and molecular bases of inherited disease. New York: McGraw-Hill; 1995. p. 1015–75.
[21] Scriver CR, Waters PJ, Sarkissian C, Ryan S, Prevost L, Côté D, et al. PAHdb: a locus-specific knowledge base. Hum Mutat 2000;15:99–104.
[22] Surtees R, Blau N. The neurochemistry of phenylketonuria. Eur J Pediatr 2000;159(Suppl 2):S109–13.
[23] Waters PJ, Parniak MA, Akermen BR, Scriver CR. Characterization of phenylketonuria missense substitutions, distant from the phenylalanine hydroxylase active site, illustrates a paradigm for mechanism and potential modulation of phenotype. Mol Genet Metab 2000;69:101–10.
[24] Waters PJ, Parniak MA, Hewson AS, Scriver CR. Alterations in protein aggregation and degradation due to mild and severe missense mutations (A104D, R157N) in the human phenylalanine hydroxylase gene (PAH). Hum Mutat 1998;12:344–54.
[25] Wu G, Yan SM. Prediction of presence and absence of two- and three-amino-acid sequence of human monoamine oxidase B from its amino acid composition according to the random mechanism. Biomol Engineer 2001;18:23–7.
[26] Wu G, Yan SM. Randomness in the primary structure of protein: methods and implications. Mol Biol Today 2002;3:55–69.