doi:10.1016/j.jmb.2003.09.017
J. Mol. Biol. (2003) 333, 1025–1043
“Acceptor–Donor–Acceptor” Motifs Recognize the Watson–Crick, Hoogsteen and Sugar “Donor–Acceptor–Donor” Edges of Adenine and Adenosine-containing Ligands
Konstantin A. Denessiouk and Mark S. Johnson*
Department of Biochemistry and Pharmacy, Åbo Akademi University, Artillerigatan 6, P.O. Box 66, FIN-20521 Turku, Finland
Nucleotides are among the most extensively exploited chemical moieties in nature and, as a part of a handful of different protein ligands, nucleotides play key roles in energy transduction, enzymatic catalysis and regulation of protein function. We have previously reported that in many proteins with different folds and functions a distinctive adenine-binding motif is involved in the recognition of the Watson–Crick edge of adenine. Here, we show that many proteins do have clear structural motifs that recognize adenosine (and some other nucleotides and nucleotide analogs) not only through the Watson–Crick edge, but also through the sugar and Hoogsteen edges. Each of the three edges of adenosine has a donor–acceptor–donor (DAD) pattern that is often recognized by proteins via a complementary acceptor–donor–acceptor (ADA) motif, whereby three distinct hydrogen bonds are formed: two conventional N–H···O and N–H···N hydrogen bonds, and one weak C–H···O hydrogen bond. The local conformation of the adenine-binding loop is βββ or ββα and reflects the mode of nucleotide binding. Additionally, we report 21 proteins from five different folds that simultaneously recognize both the sugar edge and the Watson–Crick edge of adenine. In these proteins a unique β-loop-β supersecondary structure grasps an adenine-containing ligand between two identical adenine-binding motifs as part of the βαβ-loop-β fold. © 2003 Published by Elsevier Ltd.
*Corresponding author
Keywords: nucleotide recognition; structural motif; binding site; ligand; protein
Introduction
Adenine is one of the most extensively exploited chemical moieties in nature. As a part of the nucleotide adenosine monophosphate (AMP), adenine interacts with other bases in various nucleic acids. On the other hand, as a part of a handful of different protein ligands, adenine interacts with amino acids located within protein binding sites, playing key roles in energy transduction, enzymatic catalysis and regulation of protein function. The structure of a nucleotide base has three different “edges” that can participate in interactions with other molecules: (1) the Watson–Crick edge; (2) the Hoogsteen edge, also known as the “major groove edge”; and (3) the sugar edge, previously called the “shallow groove edge” (Figure 1(a)).1,2 In nucleic acids, these three edges of adenine (and those of the other nucleotides as well) can base-pair in both the cis and the trans configuration, leading to 12 modes of interaction,3,4 observed in high-resolution crystal structures of RNA.2 A database of non-canonical base-pairs from known RNA structures can be found online.†5
† http://prion.bchs.uh.edu/bp_type/
Abbreviations used: ADA, acceptor–donor–acceptor; DAD, donor–acceptor–donor; aaRS, aminoacyl-tRNA synthetase; SAM, S-adenosylmethionine.
0022-2836/$ - see front matter © 2003 Published by Elsevier Ltd.
Figure 1 (legend opposite)
Recognising the W–C, Hoogsteen and Sugar Edges
The first account of a distinct structural motif for adenine recognition was reported by Remington et al., who obtained high-resolution crystal structures of pig heart and chicken heart citrate synthases co-crystallized with coenzyme A.6 These authors observed that the adenine moiety of coenzyme A is stabilized by three hydrogen bonds involving the N6 and N1 nitrogen atoms of adenine and the main-chain oxygen and nitrogen atoms of a six-residue adenine-binding loop. In 1997 Kobayashi & Go reported a structural motif for adenine binding,7 overlapping with that reported by Remington et al.,6 that is shared between two ATP-binding proteins with different folds, the cAMP-dependent protein kinase (cAPK) and D-Ala:D-Ala ligase (DD-ligase). Although statistical studies of the protein environments surrounding adenine have been carried out on several occasions,8–10 no other distinct adenine-recognition motifs were known at that time. This is possibly because adenine is nearly always part of a larger ligand, such as ATP, NAD or FAD, where the primary interacting, functional or otherwise “interesting” group is not the adenine moiety but the phosphate groups, the flavin or some other group.
Our previous studies focused mainly on interactions between proteins and the Watson–Crick edge of adenine. These interactions are mediated by a motif whose common features, a three-residue adenine-binding loop and a hydrophobic residue, are shared among proteins having different folds.11,12 The loop forms two conventional hydrogen bonds with the Watson–Crick edge of adenine: with the N1 ring nitrogen and with the NH2 group at the N6 position (Figure 1(b) and (c)), corresponding to the interactions described by Remington et al.6 for citrate synthase. Furthermore, hydrophobic interactions take place between the adenine ring system and the hydrophobic residue positioned perpendicular to the plane of the ring. This motif occurred in 86 different ATP, CoA, NAD, NADP and FAD-dependent proteins present at that time in the Protein Data Bank (PDB).13
Figure 1. (a) The three edges of adenosine (shown for ADP): the Watson–Crick edge (N6, N1, C2 positions on adenine), the sugar edge (N3, C2 positions on adenine and O2′ position on ribose), and the Hoogsteen edge (N6, N7, C8 positions on adenine). (b) The direct and (c) the reverse motif, binding the A-face and B-face, respectively, along the Watson–Crick edge of adenine.12 Residues at positions II and III of the tripeptide often have a hydrophobic or an aliphatic side-chain (Q). (d) The Asp modification of the reverse adenine-binding motif. One of the side-chain oxygen atoms of aspartate at position II forms a hydrogen bond with the N6 nitrogen atom of adenine. (e)–(j) Structural motifs that interact with the Watson–Crick edge of adenine in different protein structures: (e) the ADA motif of the adenine-binding loop binding the A-face of adenine by means of two conventional hydrogen bonds and one weak C–H···O hydrogen bond; (f) the ADA motif binding the B-face of adenine (the B-face of adenine is shown bound in (f)–(j)); (g) the ADABZ motif, where BZ stands for the direct participation of an aspartate/asparagine or glutamate/glutamine side-chain; (h) the BZADA motif; (i) the ADABZ motif, variant I; and (j) the ADABZ motif, variant II. Q, a hydrophobic or aliphatic side-chain.
The adenine-binding loop that we have described can bind the adenine moiety in either of two orientations. These differ in that the base is rotated about an axis that lies in the plane of the adenine ring and bisects its Watson–Crick edge. Thus, a “direct” (Figure 1(b)) and a “reverse” (Figure 1(c)) adenine-binding motif were reported.12 A unique variation of the reverse motif (the “Asp” motif) was also observed. In it, the second residue in the loop is typically a negatively charged amino acid, mainly aspartate, and the main-chain carbonyl oxygen atom of the residue at position III is replaced by the OD1 (or OD2) side-chain oxygen atom of the aspartate (Figure 1(d)).12
In addition to the N1 and N6 positions on adenine, the C2 carbon position is the third component that defines the Watson–Crick edge (Figure 1(a)). It has been shown on numerous occasions in nucleic acids and proteins that carbon atoms can act as hydrogen bond donors, similar to nitrogen and oxygen atoms, forming low-energy “weak” hydrogen bonds.14–18 In proteins, four different categories of C–H···O=C weak hydrogen bonds have been described.14 The most common is formed between Cα–H groups and main-chain carbonyl oxygen atoms, often found in adjacent strands of β-sheets. At interfaces between proteins and RNAs, the C–H···O hydrogen bond has been observed in 33% of all contacts involving hydrogen bonds and ionic interactions, which amounts to 8% of all interactions when those with water molecules are included.19 The C–H···O hydrogen bond has also been observed in RNA–RNA interactions, often in non-canonical (non-Watson–Crick) base-pairs. These are believed to play an important role in RNA–RNA recognition rather than in stabilizing the RNA structure.1 For example, in high-resolution crystal structures of RNA, 152 types of intramolecular interactions were observed between bases (including both canonical and non-canonical base-pairing), of which 26 types involve a C–H···O hydrogen bond.2 Formation of low-energy C–H···O hydrogen bonds is possible because the electronegativity of a carbon atom (2.55 Pauling units) is greater than that of a hydrogen atom (2.20 Pauling units), even though it is significantly less than that of an oxygen atom (3.44 Pauling units) or a nitrogen atom (3.04 Pauling units). Thus, if a hydrogen atom is shared between a carbon atom and an oxygen atom, it will be shared as part of two dipoles, C–H and H···O, having the characteristics of conventional hydrogen bonds. The energy of a C–H···O hydrogen bond is small, about 1 kcal/mol. Even so, weak hydrogen bonds have been found to be important stabilizing factors for proteins and nucleic acids, and their role should not be disregarded when we consider interactions between molecules.20–23
In the current study, we take into account the weak C–H···O=C hydrogen bond formed by the C2 carbon atom. The direct and reverse motifs then look like two different motifs only from the perspective of the adenine ring system. From the point of view of the protein, they can be described as a single “unifying” adenine-binding motif that binds the adenine ring in either of two orientations. We have observed in many proteins with different folds and functions that the adenine-binding motif is involved in the recognition of the Watson–Crick edge. It is not yet clear, however, to what degree proteins recognize the sugar and Hoogsteen edges of the adenine ring. It is also unclear whether a similar or a different motif is involved in recognition of these two edges. It would be logical to suppose that the motif is the same or similar, because the distribution of donor and acceptor groups along each of the three edges is the same. Moreover, as we show below, some proteins interact with two or three edges simultaneously. Here, we address these issues. We show that many adenylate-dependent proteins do have clear structural motifs that recognize the adenine ring not only through the Watson–Crick edge (similar to base-pairing in RNA and DNA), but also through the sugar and Hoogsteen edges. The recognition mechanism is indeed similar for all three edges of the adenine ring system, involving three distinct hydrogen bonds: two conventional N–H···O and N–H···N hydrogen bonds, and one weak C–H···O hydrogen bond. This similarity is partially explained by the conserved secondary structure of the adenine-binding loop.
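The claim that one symmetrical motif can read all three edges, in either orientation, can be checked mechanically. The sketch below rests on two assumptions that are ours, not the paper's procedure:
- Each edge is written as the sequence of adenosine atoms met along the N→C direction of the binding loop when the A-face is presented. This ordering is our reading of the atom assignments in Tables 1–3.
- A B-face presentation is modelled as the same edge read in reverse, i.e. the base flipped about an in-plane axis that bisects the edge.

`EDGES`, `pairings` and the string encoding of motifs are illustrative.

```python
# Donor/acceptor roles of the three adenosine edges. Atoms are listed in the
# order met along the N->C direction of the binding loop when the A-face is
# presented (our reading of the atom assignments in Tables 1-3).
# "D" = hydrogen-bond donor (N-H, O-H or weak C-H), "A" = acceptor.
EDGES = {
    "Watson-Crick": [("N6", "D"), ("N1", "A"), ("C2", "D")],
    "sugar": [("C2", "D"), ("N3", "A"), ("O2'", "D")],
    "Hoogsteen": [("C8", "D"), ("N7", "A"), ("N6", "D")],
}

COMPLEMENT = {"A": "D", "D": "A"}


def edge_as_seen(edge, face):
    """Edge atoms in loop N->C order for an A-face or B-face presentation."""
    atoms = EDGES[edge]
    # Flipping the base about an in-plane axis that bisects the edge
    # (A-face -> B-face) keeps the edge facing the loop but reverses the
    # order in which its atoms are met.
    return atoms if face == "A" else atoms[::-1]


def pairings(motif, edge, face):
    """Pair each protein group of `motif` (e.g. "ADA", written N->C) with the
    facing adenosine atom; return None if any pair is not donor-acceptor."""
    seen = edge_as_seen(edge, face)
    if len(motif) != len(seen):
        return None
    if all(COMPLEMENT[group] == role for group, (_, role) in zip(motif, seen)):
        return [(group, atom) for group, (atom, _) in zip(motif, seen)]
    return None


if __name__ == "__main__":
    for edge in EDGES:
        for face in ("A", "B"):
            print(f"{edge:12} {face}-face:", pairings("ADA", edge, face))
```

Because ADA reads the same in both directions and every edge is DAD, all six edge/face combinations pair. An asymmetric pattern such as a hypothetical "AAD" fails on every edge.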
In many FAD, NAD, S-adenosylhomocysteine (SAH) and S-adenosylmethionine (SAM)-dependent proteins with different folds, we have observed a novel feature of adenine recognition: a β-loop-β supersecondary structure that grasps an adenine-containing ligand between two adenine-binding motifs.
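The sketch below shows roughly how a contact might be accepted as a weak C–H···O hydrogen bond. It applies only the cut-offs that the Results and Discussion quotes for the 1hwy example: C···O ≤ 4.0 Å, H···O ≤ 3.0 Å, ζ ≥ 90° and |θ| ≤ 45°. The complete criteria are defined in the paper's Materials and Methods, which is not part of this excerpt, so this list may be incomplete (for example, no condition on ξ is applied). The `Contact` type and the function names are hypothetical.

```python
from dataclasses import dataclass

# Cut-offs quoted in the text for the 1hwy example. The paper's full definition
# (Materials and Methods) is not reproduced here; no condition on xi is applied.
MAX_C_O = 4.0         # C...O distance, in angstroms
MAX_H_O = 3.0         # H...O distance, in angstroms
MIN_ZETA = 90.0       # zeta, in degrees
MAX_ABS_THETA = 45.0  # |theta|, in degrees


@dataclass
class Contact:
    """Geometry of one candidate C-H...O contact."""
    c_o: float    # C...O distance (angstroms)
    h_o: float    # H...O distance (angstroms)
    zeta: float   # degrees
    theta: float  # degrees, signed


def failed_criteria(contact: Contact) -> list[str]:
    """Names of the cut-offs that the contact violates (empty if none)."""
    failures = []
    if contact.c_o > MAX_C_O:
        failures.append("C...O")
    if contact.h_o > MAX_H_O:
        failures.append("H...O")
    if contact.zeta < MIN_ZETA:
        failures.append("zeta")
    if abs(contact.theta) > MAX_ABS_THETA:
        failures.append("|theta|")
    return failures


def is_weak_hbond(contact: Contact) -> bool:
    """True if the contact passes every cut-off listed above."""
    return not failed_criteria(contact)
```

Consider the 1hwy values quoted in the text (C2···O 4.04 Å, H···O 3.46 Å, ζ = 114.73°, |θ| = 53°). For these, `failed_criteria` returns `["C...O", "H...O", "|theta|"]`, the same three parameters the text reports as out of range. The 3grs contact in Table 1 (3.4 Å, 2.6 Å, ζ = 160°, θ = 0°) passes.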
Results and Discussion
Low-energy short C–H···O contacts are an integral part of adenine recognition
In the adenine structure, two carbon atoms can form low-energy short C–H···O interactions if the geometry permits. These are the C2 carbon atom, which belongs to both the Watson–Crick and sugar edges, and the C8 atom, which belongs to the Hoogsteen edge. Both the C2 and C8 atoms have adjacent nitrogen atoms, N1 and N3 for the C2 ring position and N7 and N9 for the C8 ring position. These nitrogens increase the acidity of the hydrogen atoms on the two carbon atoms. Thus, both the C2 and C8 carbon atoms can potentially participate in weak but important hydrogen bonds with oxygen atoms. Interactions with the C2 position of the adenine ring system appear clearly as a “main-chain oxygen cluster” within hydrogen bonding distance of the C2 carbon atom in Figure 4 of Denessiouk et al.12 That figure displays the carbon, nitrogen and oxygen atoms surrounding the adenine ring system, extracted from 540 protein–adenine ligand complexes (part of the protein–ligand interaction library of Rantanen et al.24). The same observation was made by Cappello et al.10 Furthermore, we highlighted12 an example in which the Watson–Crick edge of AMP (A-face orientation) and that of adenine (B-face orientation) bound to the same adenine-binding loop in the solved structures of phosphoribosyltransferase (1qb7, with bound adenine; 1qb8, with bound AMP). This observation suggested to us that the adenine-binding motif of proteins recognizing the Watson–Crick edge has symmetry. Such symmetry would be possible if the C2 carbon atom of adenine does indeed participate in a hydrogen bond with the binding loop. Consider the complementary pattern of hydrogen bond acceptor and donor groups on both adenine and its protein-binding site, together with the ability of the carbon atom to form a hydrogen bond, albeit weak. Taken together, these allow a unifying global motif to be described. Moreover, this motif can be extended to both the Hoogsteen and sugar edges of adenosine.
Interactions with the Watson–Crick edge (N6, N1, C2 positions on adenine)
Figure 1(b) and (c) show seemingly different patterns of hydrogen bonding, representing observed interactions of adenine in many proteins.11,12 These coalesce into a single symmetrical motif (Figure 1(e) and (f)) when formulated as a pattern of hydrogen bond donor (one) and hydrogen bond acceptor (two) groups along the tripeptide of the motif.
The hydrogen bond donor is the main-chain nitrogen atom of the adenine-binding loop; the hydrogen bond acceptors are two main-chain oxygen atoms. This motif is symmetrical, and we refer to the structural pattern as the acceptor–donor–acceptor, or adenine-binding (ADA), motif. The Watson–Crick edge of adenine includes the N6 (NH2), N1 (N) and C2 (CH) atoms. These three positions of the adenine ring also form a symmetrical pattern of hydrogen bond donor and acceptor groups. We refer to this donor–acceptor–donor pattern as the protein recognition DAD pattern of adenine. It is complementary to the ADA motif of proteins that bind adenine along the Watson–Crick edge (Figure 1(e) and (f)). We have observed that in many proteins the ADA motif is bound to the DAD pattern along the Watson–Crick edge of adenine (representative folds are listed in Table 1). The ADA–DAD interactions are formed by three hydrogen bonds:
- two conventional hydrogen bonds, between the N1 and N6 nitrogen atoms of adenine and the main-chain amide and carbonyl groups of the adenine-binding loop, respectively;
- one weak hydrogen bond, between the C2 carbon atom of adenine and the second carbonyl group of the adenine-binding loop.

Because of this mutual symmetry, the Watson–Crick edge of the adenine ring system can bind to a protein having the ADA motif by presenting either the A-face or the B-face with respect to the N→C main-chain direction of the tripeptide fragment (Figure 1(e) and (f)).
The DAD pattern of the Watson–Crick edge of adenine was found to interact with the ADA motif and its variants in a total of 21 different family folds representing 112 proteins (a representative structure of each fold is listed in Table 1). In four fold families the A-face of adenine is bound to the ADA motif, while in nine families the B-face of adenine binds the ADA motif. Each of these complexes was solved by X-ray crystallography at high resolution. In each, the geometry (angles and distances) of the interaction between the C2 carbon atom of adenine and the main-chain oxygen atom of the protein fully satisfies the criteria (see Materials and Methods) defining a C–H···O hydrogen bond (Table 1).
Table 1. The Watson–Crick edge (N6, N1, C2 adenine moiety)
Fold | PDB code | Cofactor | C2 partner | N1 partner | N6 partner | C2–O | H–O | C2–H–O | ξ | ζ | θ | Other structures
Protein ADA motif; ligand A-face, DAD pattern
ATP-grasp | 1ez1_A | ANP400 | O_Val198 | N_Val198 | O_Gly196 | 3.4 | 2.4 | 153 | 18 | 128 | 14 | 1c3o_A, 1iow, 2hgs_A
P-loop containing NTP hydrolases | 1f48_A | ADP590 | O_Gln278 | N_Gln278 | O_Phe276 | 3.3 | 2.2 | 154 | 330 | 119 | −26 |
Protein kinase-like (PK-like) | 1cdk_A | ANP400A | O_Val123 | N_Val123 | O_Glu121 | 3.7 | 2.9 | 129 | 53 | 122 | 43 | 1cja_A, 1e8x_A, 1ia9_A
PRTase-like | 1qb8_A | AMP300 | O_Ala43 | N_Ala43 | O_Arg41 | 3.4 | 2.4 | 151 | 38 | 126 | 30 |
Protein ADA motif; ligand B-face, DAD pattern
PRTase-like | 1qb7_A | ADE300 | O_Arg41 | N_Ala43 | O_Ala43 | 3.3 | 2.6 | 122 | 120 | 151 | 25 |
Adenine nucleotide α hydrolase-like | 1f9a_A | ATP700 | O_Glu117 | N_Phe119 | O_Phe119 | 3.1 | 2.1 | 148 | 172 | 140 | 5 | 1ct9_A, 1gax_A, 1gpm_A
ETFP adenine nucleotide-binding domain-like | 1efv_B | AMP600 | O_Val64 | N_Cys66 | O_Cys66 | 3.9 | 2.8 | 168 | 214 | 119 | −29 |
FAD-binding domain | 1i19_A | FAD700A | O_Cys261 | N_Leu263 | O_Leu263 | 2.9 | 2.1 | 130 | 245 | 148 | −28 | 1f0x_A, 1qlt_A, 2mbr
FAD/NAD(P)-binding domain | 3grsᵃ | FAD479 | O_Gly128 | N_Ala130 | O_Ala130 | 3.4 | 2.6 | 127 | −179 | 160 | 0 | 1b37_Bᵃ, 1b8s_Aᵃ, 1d7y_A, 1f8r_Aᵃ, 1hyu_A, 1lvlᵃ, 1nhpᵃ, 1qjd_Aᵃ, 1qla_A
Nucleotide-binding domain | 1h7w_Aᵃ | FAD1031A | O_Lys259 | N_Leu261 | O_Leu261 | 3.7 | 3.0 | 123 | 200 | 140 | −13 | 1an9_A
P-loop containing nucleotide triphosphate hydrolases | 1e32_A | ADP800A | O_Asp205 | N_Gly207 | O_Gly207 | 3.3 | 2.8 | 108 | 245 | 136 | −39 |
Ribokinase-like | 1gc5_A | ADP470 | O_Lys438 | N_Val440 | O_Val440 | 3.4 | 2.3 | 163 | 142 | 131 | 28 |
Ribosome inactivating proteins (RIP) | 1mrj | ADN300 | O_Val69 | N_Ile71 | O_Ile71 | 3.5 | 2.5 | 149 | 146 | 126 | 27 |
Protein ADABZ motif; ligand B-face, DAD pattern
DHS-like NAD/FAD-binding domain | 1d4o_A | NAP201 | O_Gly165 | N_Ala167 | OD1_Asp166 | 3.8 | 2.9 | 133 | 167 | 144 | 8 | 1dhs, 1efv_Aᵃ
GroEL-like chaperones, ATPase domain | 1der_A | ATP1A | O_Tyr478 | N_Ala480 | OD1_Asn479 | 3.7 | 2.7 | 140 | 184 | 157 | 22 |
NAD(P)-binding Rossmann-fold domains | 1oaa | NAP800 | O_Ala69 | N_Leu71 | OD1_Asp70 | 3.4 | 2.6 | 123 | 228 | 146 | −25 | 1b16_A, 1bdbᵃ, 1bsv_A, 1bxk_Aᵃ, 1cyd_A, 1e6w_A, 1e7w_A, 1eny, 1hu4_A, 1qrr_Aᵃ, 1xelᵃ
Nucleotide-binding domain | 1i8t_Aᵃ | FAD450 | O_Ile211 | N_Phe213 | OD1_Asp212 | 3.3 | 2.6 | 126 | 174 | 165 | 2 |
SAM-dependent methyltransferases | 1ej0_Aᵃ | SAM301 | O_Gly98 | N_Phe100 | OD1_Asp99 | 3.7 | 3.0 | 118 | 193 | 160 | 24 | 1af7ᵃ, 1boo_Aᵃ, 1dl5_Aᵇ, 1kr5_Aᵇ, 2adm_Aᵃ
Protein ADABZ motif, variant I; ligand B-face, DAD pattern
SAM-dependent methyltransferases | 1g55_Aᵃ | SAH392 | O_Lys55 | N_Ile57 | OE1_Glu58 | 3.5 | 2.6 | 142 | 147 | 150 | 16 | 1vidᵃ
Protein ADABZ motif, variant II; ligand B-face, DAD pattern
Ferredoxin-like | 1dqa_A | NAP1 | O_Ser651 | N_Asp653 | OD2_Asp653 | 3.3 | 2.2 | 159 | 230 | 136 | −32 |
Protein BZADA motif; ligand B-face, DAD pattern
Dihydrofolate reductases | 1d1g_A | NDP170A | OD1_Asn79 | N_Gly80 | O_Gly80 | 3.8 | 2.9 | 144 | 27 | 131 | 20 |
C2–O and H–O in Å; C2–H–O, ξ, ζ and θ in degrees. Distances and angles are shown for the C–H···O hydrogen bond.
ᵃ Double edge recognition, in which both the sugar edge and the Watson–Crick edge of adenosine are bound to a protein structure.
ᵇ All three edges of adenosine are bound to a protein structure.
Table 2. The sugar edge (C2, N3, O2′ adenosine moiety)
Fold | PDB code | Cofactor | C2 partner | N3 partner | O2′ partner | Other structures
Protein ADA motif; ligand B-face, DAD pattern
Methionine synthase (activation domain) | 1msk | SAM1301 | O_Ala1136 (4.2 Å) | N_Ala1136 | O_Arg1134 |
Protein ADABZ motif; ligand A-face, DAD pattern
DHS-like NAD/FAD-binding domain | 1efv_Aᵃ | FAD599 | – | N_Lys301 | OD1_Asn300 | 1pox_A
FAD/NAD(P)-binding domain | 3grsᵃ | FAD479 | – | N_Ser51 | OE2_Glu50 | 1b37_Bᵃ, 1b3m_A, 1b8s_Aᵃ, 1f8r_Aᵃ, 1foh_A, 1gpe_A, 1lvlᵃ, 1nhpᵃ, 1pbe_A, 1qjd_Aᵃ
HIT-like | 1kpf | AMP200 | O_His42 | N_Ile44 | OD1_Asp43 |
NAD(P)-binding Rossmann-fold domains | 1qrr_Aᵃ | NAD401 | – | N_Asn33 | OD1_Asp32 | 1bdbᵃ, 1bxk_Aᵃ, 1hwy_A, 1hyh_A, 1psd_A, 1xelᵃ, 2nad_A
Nucleotide-binding domain | 1cjc_A | FAD1058 | – | N_Lys39 | OE2_Glu38 | 1h7w_Aᵃ, 1i8t_Aᵃ, 2tmd_A
S-Adenosyl-L-methionine-dependent methyltransferases | 1ej0_Aᵃ | SAM301 | – | N_Leu84 | OD1_Asp83 | 1af7ᵃ, 1boo_Aᵃ, 1dl5_Aᵇ, 1eg2_A, 1fpq_A, 1g55_Aᵃ, 1khh_A, 1kr5_Aᵇ, 1vidᵃ, 2adm_Aᵃ, 2dpm_A, 6mht_A
ᵃ Double edge recognition, in which both the sugar edge and the Watson–Crick edge of adenosine are bound to a protein structure.
ᵇ All three edges of adenosine are bound to a protein structure.
Figure 2. (a) A rare example of B-face recognition by the ADA motif along the sugar edge of adenosine is seen in methionine synthase (1msk); (b) many FAD, NAD and SAM-dependent proteins use the ADABZ motif to bind the A-face of adenine along the sugar edge of adenosine (e.g. HIT-like protein, 1kpf).
The “Asp” variation of the adenine-binding motif was described by Denessiouk et al.12 In it, a side-chain oxygen atom of aspartic acid substitutes for the main-chain carbonyl oxygen in the binding motif, and this variation also conforms to the symmetrical ADA motif (Figure 1(g) and (h)). Aspartate predominates among PDB structures having the Asp variation of the adenine-binding motif (Table 1). However, asparagine has been found to fulfil the same role (e.g. in GroEL-like chaperones having the ATPase domain fold). As we shall show below (Table 2), glutamate plays an equivalent role in binding the sugar edge of adenosine; so far, glutamine has been observed only in a rare variant of the motif (see below). We refer to the Asx/Glx variation of the ADA motif as the ADABZ configuration, where B stands for aspartate/asparagine and Z stands for glutamate/glutamine. Interestingly, 23 PDB entries from five family folds have the ADABZ motif interacting with the Watson–Crick edge of adenine, and in all of them the adenine ring is present in the B-face orientation (Figure 1(g)). The BZADA motif (Figure 1(h)) has so far been identified in only one structure, dihydrofolate reductase (1d1g), which binds the B-face of adenine (Table 1).
Two unique variations of the ADABZ motif are found in three complexes with adenine ligands (Table 1). DNA cytosine methyltransferase DNMT2 (1g55) and catechol O-methyltransferase (1vid) share the same fold. In these structures, Glu58 in 1g55 and Gln120 in 1vid fulfil the same role as the Asx/Glx acceptor (ABZ) does in the ADABZ motif. In each case, however, these residues lie adjacent to the tripeptide adenine-binding loop rather than within it, as they would in the typical ADABZ motif (compare Figure 1(i) with Figure 1(g)). As in all structures with the ADABZ motif, the N6 nitrogen atom of adenine is hydrogen bonded to the side-chain oxygen of Glu58 in 1g55 and of Gln120 in 1vid. Thus, the chemical character of the interaction is retained. However, the side-chain oxygen atom of glutamate/glutamine, which has a longer side-chain than aspartate/asparagine, approaches the N6 nitrogen of adenine from the carboxyl-terminal end of the loop rather than from the amino-terminal end. A second variant of the ADABZ motif is present in the structure of human HMG-CoA reductase (1dqa). In 1dqa, the N6 nitrogen atom of adenine forms a hydrogen bond with the side-chain OD2 oxygen of Asp653. Unlike the ADABZ motif, where this residue is at position II, Asp653 is located at position III of the tripeptide adenine-binding loop (ADABZ variant II, Figure 1(j)). In all other proteins with the ADABZ motif (Table 1), the residue whose side-chain oxygen atom interacts with the N6 nitrogen atom of adenine is located at position II of the loop (Figure 1(g)).
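The motif names used in Table 1 can be tied to the atom labels listed there. The hypothetical helper below classifies a Watson–Crick-edge site from the protein atoms paired with adenine C2, N1 and N6. Its rules are our reading of Table 1, not a procedure from the paper:
- Side-chain acceptors are recognised by PDB atom names beginning with OD or OE.
- The ADABZ variants are separated by the offset between the Asx/Glx residue and the residue whose carbonyl binds C2 (position I). In the Table 1 examples this offset is 1 for the standard motif, 3 for variant I and 2 for variant II.
- The face is inferred from whether the N6 partner precedes (A-face) or follows (B-face) the C2 partner in sequence.

```python
import re

# Atom labels as written in Table 1, e.g. "O_Val198" or "OD1_Asp166".
LABEL = re.compile(r"^(?P<atom>[A-Z][A-Z0-9]*)_(?P<res>[A-Z][a-z]{2})(?P<num>\d+)$")

# Hypothetical rule read off Table 1: residue offset of the Asx/Glx side chain
# from position I (the residue whose carbonyl binds adenine C2).
ADABZ_BY_OFFSET = {1: "ADABZ", 2: "ADABZ variant II", 3: "ADABZ variant I"}


def parse(label):
    """Split a label into (atom name, residue name, residue number)."""
    m = LABEL.match(label)
    if m is None:
        raise ValueError(f"unrecognised atom label: {label!r}")
    return m["atom"], m["res"], int(m["num"])


def is_side_chain_oxygen(atom):
    # Carboxylate/amide oxygens of Asp/Asn (OD1, OD2) and Glu/Gln (OE1, OE2).
    return atom.startswith(("OD", "OE"))


def classify(c2_partner, n1_partner, n6_partner):
    """Return (motif, face) for a Watson-Crick-edge binding site."""
    a2, _, r2 = parse(c2_partner)
    a1, _, _ = parse(n1_partner)
    a6, _, r6 = parse(n6_partner)
    if a1 != "N":
        raise ValueError("N1 is expected to pair with a main-chain N")
    # Heuristic: A-face when the N6 partner precedes the C2 partner.
    face = "A" if r6 < r2 else "B"
    if a2 == "O" and a6 == "O":
        return "ADA", face
    if a2 == "O" and is_side_chain_oxygen(a6):
        return ADABZ_BY_OFFSET.get(r6 - r2, "ADABZ (unassigned offset)"), face
    if is_side_chain_oxygen(a2) and a6 == "O":
        return "BZADA", face
    return "unclassified", face
```

For example, `classify("O_Lys55", "N_Ile57", "OE1_Glu58")` (1g55) gives variant I, and `classify("OD1_Asn79", "N_Gly80", "O_Gly80")` (1d1g) gives the B-face BZADA motif. The residue-offset rule covers only the cases tabulated here and is not a general definition.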
The sugar edge (C2 and N3 positions on adenine, O2′ of ribose)
The sugar edge of adenosine (adenine plus ribose) also has the DAD pattern of alternating hydrogen bond donor and acceptor groups (Figure 1(a)). The hydrogen bond acceptor group is the N3 position of adenine. The hydrogen bond donor groups are the C2 position of adenine and the O2′ hydroxyl group of ribose (denoted O2* in PDB files). Consequently, we might expect proteins to interact with this DAD edge by means of the ADA, ADABZ and BZADA motifs. We identified only one protein, methionine synthase (1msk), where the B-face of adenosine along the sugar edge is recognized by the ADA motif (Figure 2(a), Table 2). There the C–H···O bond is a long 4.2 Å, and the interaction with the C2 carbon atom of adenine is correspondingly weak. Many more FAD, NAD and SAM-dependent proteins interact with the sugar edge of adenine by means of the ADABZ motif, in the A-face orientation only (Figure 2(b), Table 2). Only one of them, the human protein kinase inhibitor (1kpf), has the main-chain carbonyl oxygen atom of His42 in the appropriate geometry to form a weak hydrogen bond with the C2 carbon of bound AMP. In the remaining five folds, two strong conventional hydrogen bonds are present, but the interaction involving the C2 position of adenine is absent. This again reflects the difficulty of achieving the geometry needed to recognize the entire sugar edge of the adenine ring system. In general, the main-chain oxygen atom analogous to O_His42 in 1kpf would be a good candidate to form a weak hydrogen bond, but its geometry is distorted: one or more parameters, such as the C2···O and H···O distances and the ζ and θ angles, are out of range. For example, in 1hwy (Table 2):
- the C2···O distance is 4.04 Å, slightly larger than the 4.0 Å cut-off;
- the H···O distance is 3.46 Å, not satisfied (larger than the 3.0 Å cut-off);
- ζ = 114.73°, acceptable (≥ 90°);
- |θ| = 53°, not satisfied (larger than 45°).

This lack of interaction with the C2 carbon can, however, be compensated in structures that recognize two edges, the sugar edge and the Watson–Crick edge, simultaneously. Many proteins that bind the sugar edge also bind the Watson–Crick edge, by means of either the ADA motif (mostly FAD-dependent proteins) or the ADABZ motif (FAD, NAD(P) and SAM-dependent proteins). Structures that interact with two edges of adenine simultaneously are listed in both Tables 1 and 2. There are 21 examples of such double edge recognition, representing five folds. In all of them the C2 carbon of adenine interacts with the main-chain oxygen atom of the adenine-binding loop that binds the Watson–Crick edge, so that all possible hydrogen bonds along these two edges are formed. Figure 3 shows this for glutathione reductase (FAD/NAD(P)-binding fold, 3grs; ADA motif) and UDP-galactopyranose mutase (nucleotide-binding fold, 1i8t; ADABZ motif). The ADA and ADABZ motifs both recognize the Watson–Crick edge of adenine (Table 1) by forming spatially equivalent hydrogen bonds (Figure 3). However, interactions with the sugar edge are almost exclusively mediated by the ADABZ motif (Table 2), whereby side-chain (Asx/Glx) interactions are made with the O2′ hydroxyl group of ribose. In proteins that recognize the sugar edge, the distance between the O2′ oxygen atom of ribose and the N3 nitrogen atom of adenine is larger than the average spacing between the other donor and acceptor groups along the adenine ring. Furthermore, ribose is joined to the adenine ring by a single, rotatable bond, providing flexibility to the adenosine moiety. As a result, it is not surprising that the ADABZ motif dominates in the recognition of the sugar edge, employing aspartate or asparagine, but especially the longer side-chain of glutamate.
Figure 3. Double edge recognition, in which both the sugar edge and the Watson–Crick edge of adenosine are bound to a protein structure. Glutathione reductase (the FAD/NAD(P)-binding fold, 3grs) is shown in yellow and UDP-galactopyranose mutase (the nucleotide-binding fold, 1i8t) is shown in white. In both proteins, all possible hydrogen-bonding positions along the two edges are occupied. The sugar edge of adenine is recognized similarly by ADABZ motifs in both proteins, while the Watson–Crick edge is recognized by the ADA motif in glutathione reductase and by the ADABZ motif in UDP-galactopyranose mutase. Because the C2 carbon atom interacts with the Watson–Crick edge adenine-binding loop, the geometrical criteria for a second weak interaction of this atom with the sugar edge adenine-binding loop are not fully satisfied (see Table 2).
Table 3. The Hoogsteen edge (N6, N7, C8 adenine moiety)
Fold | PDB code | Cofactor | C8 partner | N7 partner | N6 partner | Other structures
Protein ADA motif; ligand A-face, DAD pattern
S-Adenosyl-L-methionine-dependent methyltransferases | 1dl5_Aᵃ | SAH699 | – | N_Gly209 | O_Gly209 | 1kr5_Aᵃ
Protein ADA motif; ligand B-face, DAD pattern
ATP-grasp | 1fvi_A | AMP501 | O_Ile28 | N_Ile28 | O_Pro26 | 1a0i, 1dgs_A
NAD(P)-binding Rossmann-fold domains | 1ofg_A | NDP500D | – | N_Ala15 | O_Thr13 |
Protein ADABZ motif; ligand A-face, DAD pattern
Ribosomal protein S5 domain 2-like | 1fwk_A | ADP600 | O_Lys61 | N_Val63 | OD1_Asn62 |
ᵃ All three edges of adenosine are bound to a protein structure.
The Hoogsteen edge (N6, N7 and C8 positions on adenine)
Like the Watson–Crick and sugar edges of adenine, the Hoogsteen edge also has the DAD pattern of alternating hydrogen bond donor and acceptor groups. The hydrogen bond acceptor group is the N7 position of adenine, while the hydrogen bond donor groups are the N6 and C8 positions of adenine. We have identified only six examples from the PDB (four folds) where the Hoogsteen edge of adenine interacts with the ADA or ADABZ motif (Table 3). The ADA motif was found to bind both the A-face and the B-face of adenine (Figure 4(a)), while binding to the ADABZ motif was restricted to the A-face (Figure 4(b)). Two proteins recognize all three edges of the adenine-containing ligand SAH simultaneously: the S-adenosyl-L-methionine-dependent methyltransferases from bacteria and human (1dl5 and 1kr5). In them, the Watson–Crick edge, the sugar edge and the Hoogsteen edge are bound by the ADABZ, ADABZ and ADA motifs, respectively.
Figure 4. Interactions with the Hoogsteen edge. In (a), the ADA motif interacts with the B-face of adenine along the Hoogsteen edge (e.g. an ATP-grasp fold protein, 1fvi); and in (b), the ADABZ motif interacts with the A-face of adenine along the Hoogsteen edge (e.g. the ribosomal S5 domain 2-like protein, 1fwk).
The adenine-binding loops with the ADA and ADABZ motifs have distinct local conformations
Many structures have the full or partial complement of acceptor–donor–acceptor interactions with the Watson–Crick edge and with the sugar edge (Table 1). For these interactions, the local main chain of the three residues contributing to the ADA, ADABZ and BZADA motifs appears to display distinctive local conformations. The ADA motif uses three main-chain atoms of two amino acid residues to recognize adenine (positions I and III, Figure 1(e) and (f)). The ADABZ motif instead uses atoms from each of the three consecutive amino acid residues of the adenine-binding loop (positions I, II and III), with one of the main-chain oxygen atoms substituted by a side-chain oxygen atom from an Asx or Glx residue (Figure 1(g)). To identify similarities and differences in the local main-chain conformations of the various ADA motifs, we plotted the distributions of the main-chain dihedral angles φ and ψ (Figure 5). The plots cover the three amino acid residues of the adenine-binding loop (positions I–III) in the structures listed in Tables 1–3. These plots25 show discrete clusters of φ, ψ values that are characteristic of:
- the different ADA motifs;
- the mode of interaction with the adenine ring, i.e. whether the interaction involves the A-face or the B-face;
- the edge of adenine that is recognized.
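The φ/ψ analysis summarised above requires backbone torsion angles for residues I–III of each adenine-binding loop. A minimal sketch of that calculation follows, using the standard four-atom torsion formula:
- φ(i) from C(i−1), N(i), Cα(i), C(i);
- ψ(i) from N(i), Cα(i), C(i), N(i+1).

The sketch assumes coordinates have already been read from the PDB entries into per-residue dictionaries keyed "N", "CA" and "C". That data layout and the function names are illustrative, not the authors' own procedure.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees (IUPAC sign convention)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit central bond
    # Project the two outer bonds onto the plane perpendicular to b1.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(residues):
    """(phi, psi) for every residue that has both neighbours in `residues`.

    `residues` is a list of dicts with "N", "CA" and "C" coordinate tuples,
    in chain order (a hypothetical layout for this sketch).
    """
    angles = []
    for i in range(1, len(residues) - 1):
        prev, cur, nxt = residues[i - 1], residues[i], residues[i + 1]
        phi = dihedral(prev["C"], cur["N"], cur["CA"], cur["C"])
        psi = dihedral(cur["N"], cur["CA"], cur["C"], nxt["N"])
        angles.append((phi, psi))
    return angles
```

To obtain φ1–ψ3 for a loop, pass the loop residues together with one flanking residue on each side, so that every position I–III has the C(i−1) and N(i+1) atoms it needs.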
The conformation of the residue at position I may reflect differences between A-face and B-face binding to the ADA motif
Many proteins use the ADA motif to bind either the A-face or the B-face of the Watson–Crick edge of adenine, using the full set of acceptor–donor–acceptor interactions (Table 1). With few exceptions, the dihedral angles of position I of the ADA motif, φ1 and ψ1, partition into two distinct clusters (Figure 5(a)): interactions with the A-face of adenine lie within cluster 1 and have larger φ1 angles than the interactions involving the B-face, all of which are found in cluster 2. For both A-face and B-face interactions, the dihedral angles φ2/ψ2 and φ3/ψ3 are similar. Regardless of the face involved, the local conformation of the adenine-binding loop for the ADA motif is βββ. With the exception of two structures, 1kpf (A-face, sugar edge) and 1fwk (A-face, Hoogsteen edge), all of the ADABZ motifs and the one example of the BZADA motif correspond to B-face interactions with the Watson–Crick edge (Figure 5(b)), while along the sugar edge the ADABZ motif binds the A-face (Figure 5(c)). As with the ADA motif (Figure 5(a)), A-face interactions have a larger φ1 angle than B-face interactions. For both the ADA and ADABZ (and BZADA) motifs, the φ1 and ψ1 angles lie within the β-strand region of the Ramachandran plot (Figure 5(a)–(c)). Of the six proteins that lie outside the β-strand region in Figure 5(a) (ADA motif), four have the flexible glycine at position I (1lvl, 3grs, 1qlt and 1gax) and two have non-glycine residues: aspartic acid (1e32) and valine (1mrj). The local conformations of five proteins (1d4o, 1kr5, 1xel, 1boo and 1dl5; Figure 5(b)), all having glycine at position I, lie close to the β-strand region of the Ramachandran plot.

Figure 5. The distribution of main-chain dihedral angles φ and ψ for the three amino acids of the adenine-binding loop of the structures listed in Tables 1–3. (a) The distribution of angles for the ADA motif, structures from Tables 1–3; (b) the distribution of angles for the ADABZ and BZADA motifs along the Watson–Crick edge; and (c) the distribution of angles for the ADABZ motif along the sugar edge (Table 2). PDB codes are shown for outliers of the main clusters.

The conformation of the residue at position II separates motifs with the full and partial complement of acceptor–donor–acceptor interactions
The φ2/ψ2 torsion angles of the amino acid at position II lie in the β-strand region of the Ramachandran plot for both the ADA motif (Figure 5(a)) and the ADABZ and BZADA motifs (Figure 5(b) and (c)). The motifs in Figure 5(a) and (b) have the full complement of acceptor–donor–acceptor interactions, involving three hydrogen bonds. In the structures with incomplete motifs (Figure 5(c)), typically involving the sugar edge of adenine (Table 2), the weak hydrogen bond is not formed between the C2 carbon atom of adenine and
the main-chain oxygen atom at position I of the motif. The lack of constraints on position I affects the φ2 angle of position II (compare Figure 5(c) with Figure 5(a) and (b)).

The conformation of the residue at position III discriminates between the ADA and ADABZ motifs
The φ3/ψ3 angles of the residue at position III discriminate the ADA motif from the ADABZ and BZADA motifs: there is a clear shift in main-chain conformation from the β-strand region (ADA; Figure 5(a)) to the right-handed α-helix region (ADABZ and BZADA; Figure 5(b) and (c)). Thus, considering the local main-chain conformation at positions I–III, the ADA motif corresponds to βββ, while the ADABZ and BZADA motifs are ββα.

Figure 6. (a) A unique β-loop-β supersecondary structure grasps an adenine-containing ligand between two adenine-binding motifs (double edge recognition), observed in seven proteins from five different folds (see Table 4). GR, glutathione reductase (3grs); UDP-GLM, UDP-galactopyranose mutase (1i8t); FTSJ, FTSJ RNA methyltransferase (1ej0); ETF, electron transfer flavoprotein (1efv); SQD1, SQD1 protein (1qrr); DNMT2, enigmatic DNA methyltransferase homolog (1g55); DPD, dihydropyrimidine dehydrogenase (1h7w). (b) The βαβ supersecondary structure (gold), described by Wierenga et al.,26 is followed directly by the β-loop-β supersecondary structure (the adenine-binding motifs for the sugar edge and the Watson–Crick edge are shown in blue), creating a comprehensive ADP-binding βαβ-loop-β fold. Asp83 (shown for FTSJ) is the ribose-interacting aspartate of the ADABZ motif along the sugar edge of adenosine.

A unique β-loop-β supersecondary structure grasps an adenine-containing ligand between two identical adenine-binding motifs as part of the βαβ-loop-β fold
Twenty-one proteins from five different folds simultaneously recognize both the sugar edge (A-face) and the Watson–Crick edge (B-face) of adenine (Tables 1 and 2). Each of these proteins uses the same supersecondary structure, β-loop-β
(Figure 6(a)), to grasp the adenine ring along these two edges. The N-terminal end of the β-loop-β structure interacts with the C2 carbon of adenine, binding the sugar edge; the chain then extends into a loop of variable length and variable secondary structure and finally returns to the adenine-binding site, where the second β-strand crosses over the top of the first strand and binds the Watson–Crick edge of adenine (Figure 6(a)). This β-loop-β topology is seen in at least five different folds classified by SCOP: the DHS-like NAD/FAD-binding domain, the FAD/NAD(P)-binding domain, the NAD(P)-binding Rossmann-fold domain, the nucleotide-binding domain and the S-adenosyl-L-methionine-dependent methyltransferase fold. The loop length varies from seven residues (in FTSJ) to 171 residues (in UDP-GLM); however, the loop always begins and ends with a similarly constructed adenine-binding motif that binds the adenine moiety along two edges (Figure 6(a), Table 4). The motif along the sugar edge is of the ADABZ type (Table 4) and, consequently, the side-chain of Asx/Glx is involved in binding adenosine; the motif along the Watson–Crick edge is of the ADA or the ADABZ type. The β-loop-β structure is distinct from the βαβ fold reported by Wierenga et al.26 that binds the ADP moiety of NAD and FAD in many proteins with the Rossmann fold.27 Aspartic acid, corresponding to the Asx/Glx present in the ADABZ motif
recognizing the sugar edge of adenosine in FAD/NAD-binding domains (Table 2), was described as one of the 11 attributes defining the ADP-binding βαβ-fold.26 The βαβ supersecondary structure anchors the pyrophosphate at the positively charged N-terminus of the α-helix,26 mediated by a conserved water molecule,28 while adenine–ribose recognition is performed by the ADABZ tripeptide alone or together with Watson–Crick edge recognition (double edge recognition). The βαβ supersecondary structure is located adjacent to the β-loop-β structure (Figure 6(b)). It consists of two parallel β-strands from the same sheet and an α-helix positioned parallel with them, ending with the ribose-binding glutamate or aspartate. This glutamate or aspartate is the first amino acid residue of the β-loop-β structure, where the Asx/Glx-based adenine-binding motif recognizes the sugar edge of adenine; it is followed by the long variable loop and then by the second adenine-binding motif, which binds the Watson–Crick edge of adenine. We have previously identified a non-polar residue ("hydrophobic site IV")11,12 that is often found at the end of the first β-strand of the βαβ supersecondary structure; its side-chain lies perpendicular to the plane of the adenine ring and interacts with it hydrophobically (Table 4). The adenine-binding β-loop-β supersecondary structure reported here, together with the classic βαβ supersecondary structure26 that binds ribose phosphate, forms a comprehensive ADP-binding βαβ-loop-β fold seen in proteins from five different domain folds. Interestingly, the HIT-like fold, which does not have the ADP-binding βαβ-fold, does have an aspartate as part of the ADABZ motif recognizing the sugar edge of both adenosine and guanosine (see below).

Table 4. ADP-binding βαβ-loop-β fold in seven representative structures of the five different protein folds

| Fold | PDB code | Cofactor | Hydrophobic site IV (first β-strand) | α-Helix | The sugar edge | Loop | The Watson–Crick edge |
|---|---|---|---|---|---|---|---|
| FAD/NAD(P)-binding domain | 3grs | FAD479 | 22-DYLVIG-27 (ββββββ) | 18 aa | 46-AAVVESH-52 (βββββ) | 70 aa | 123-IEIIR-GHAA-131 (ββββ-β) |
| Nucleotide-binding domain | 1h7w_A | FAD1031A | 189-KIALLG-194 (ββββ) | 19 aa | 214-ITIFEKQ-220 (βββββ) | 32 aa | 253-VKIICGKSLS-262 (ββββ) |
| S-Adenosyl-L-methionine-dependent methyltransferases | 1g55_A | SAH392 | 5-RVLELY-10 (βββββ) | 19 aa | 30-VAAIDVN-36 (ββββ) | 13 aa | 50-TQLLA-KTIE-58 (ββ) |
| DHS-like NAD/FAD-binding domain | 1efv_A | FAD599 | 274-LYIAVG-279 (Bββββ) | 16 aa | 296-IVAINKD-302 (βββββ) | 9 aa | 312-DYGIV-ADLF-320 (Bβββ) |
| NAD(P)-binding Rossmann-fold domains | 1qrr_A | NAD401 | 3-RVMVIG-8 (βββββ) | 19 aa | 28-VCIVDNL-34 (βββββ) | 34 aa | 69-IELYV-GDIC-77 (ββββ) |
| Nucleotide-binding domain | 1i8t_A | FAD450 | 3-DYIIVG-8 (βββββ) | 18 aa | 27-VLVIEKR-33 (ββββ) | 171 aa | 205-VDVKLGIDFL-214 (βββ) |
| S-Adenosyl-L-methionine-dependent methyltransferases | 1ej0_A | SAM301 | 54-TVVDLG-59 (βββββ) | 19 aa | 79-IIACDLL-85 (βββββ) | 7 aa | 93-VDFLQ-GDFR-101 (βββββ) |
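The α-helix and loop lengths in Table 4 follow from the residue numbering of the flanking segments: the number of residues strictly between a segment ending at residue e and the next one starting at residue s is s − e − 1. The short check below recomputes both lengths for all seven rows; the tuple layout and function names are our own, and the residue numbers are copied from Table 4.

```python
# (first strand, sugar-edge segment, Watson-Crick segment) as (start, end)
# residue numbers, copied from Table 4.
SEGMENTS = {
    "3grs": ((22, 27), (46, 52), (123, 131)),
    "1h7w_A": ((189, 194), (214, 220), (253, 262)),
    "1g55_A": ((5, 10), (30, 36), (50, 58)),
    "1efv_A": ((274, 279), (296, 302), (312, 320)),
    "1qrr_A": ((3, 8), (28, 34), (69, 77)),
    "1i8t_A": ((3, 8), (27, 33), (205, 214)),
    "1ej0_A": ((54, 59), (79, 85), (93, 101)),
}

def gap(prev_end: int, next_start: int) -> int:
    """Number of residues strictly between two segments."""
    return next_start - prev_end - 1

def helix_and_loop(pdb: str) -> tuple[int, int]:
    """Length of the alpha-helix segment and of the variable loop."""
    strand, sugar, wc = SEGMENTS[pdb]
    return gap(strand[1], sugar[0]), gap(sugar[1], wc[0])

for pdb in SEGMENTS:
    print(pdb, helix_and_loop(pdb))
```

The recomputed values reproduce the table, including the extremes of the loop length discussed in the text: 7 residues for FTSJ (1ej0) and 171 residues for UDP-GLM (1i8t).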
Recognition of the other nucleotides may come through their donor–acceptor–donor edges
All five nucleotides present in DNA and/or RNA (containing adenine, guanine, cytosine, thymine or uracil) also play other roles in molecular biology, functioning as high-energy compounds, substrates, cofactors and coenzymes. Each nucleotide base consists of non-polar atoms alternating with polar groups capable of hydrogen bonding. While adenine has a DAD pattern along all three edges of the ring, in guanine the DAD pattern occurs only along the sugar edge, while the DDA (donor–donor–acceptor) pattern occurs along the Watson–Crick edge and the AAD pattern along the Hoogsteen edge. Uracil, thymine and cytosine have only the DA portion of the complete DAD pattern along the sugar edge. Interestingly, in the family of histidine triad (HIT) proteins, each of the three known structures was solved with a different bound nucleotide, but the binding is similar in each complex. The human HIT-like protein (1kpt) with bound AMP and the rabbit ortholog with bound GMP are 95% identical in sequence and bind their nucleotides at the same site, using the same ADABZ motif for adenosine and guanosine recognition (Figure 7). Likewise, UDP is recognized along its sugar edge by an ADABZ motif (Figure 7) in the related nucleotidyl transferase GalT (1hxp) from Escherichia coli,29 which is 17% identical in sequence to the HIT-like proteins. Similarly, inhibitors with pyrimidine rings have been shown to bind to cyclin-dependent kinase 2 (CDK2),30 whose natural substrate, ATP, contains a purine ring. Two inhibitors, CYC1 and CYC2, mimic A-face binding of adenine ("mode 1" binding) to the ADA motif, while CYC3 and CYC4 mimic B-face binding ("mode 2" binding) to the ADA motif.30

Figure 7. An ADABZ motif binds the sugar edge of AMP, GMP and UDP in three homologous HIT proteins. Stereo view of the superposition of the human HIT-like protein with bound AMP (1kpt), the rabbit HIT-like protein with bound GMP and the nucleotidyl transferase GalT from E. coli with bound UDP (1hxp).

Systems where adenine is recognized as part of a larger polynucleotide
Many proteins function through the specific recognition of DNA or RNA. We therefore examined the 3D structures of aminoacyl-tRNA synthetases (aaRSs) in complex with tRNA, the adenine-specific methyltransferase M.Taq I in complex with DNA, elongation factor Tu in complex with tRNA, and other known protein–DNA and protein–RNA complexes from the PDB (listed by Treger & Westhof19 and Jones et al.31). From the analysis of these RNA/DNA-binding proteins, it is clear that proteins do use the ADA motif to recognize functionally and structurally important adenosine residues in DNA or RNA.

Interactions between aminoacyl-tRNA synthetases and tRNA
There are 34 crystal structures of complexes between tRNA and class I aaRSs (GluRS, GlnRS, ArgRS, ValRS, IleRS and TyrRS) and class II aaRSs (AspRS, PheRS, ThrRS, ProRS and SerRS); 31 of these complexes also contain the ATP/AMP cofactor (we have previously shown that interactions between aaRSs and bound ATP involve ADA motif recognition of the Watson–Crick edge of the adenine moiety of ATP11,12). Among the interactions between aaRSs and tRNA, ADA:DAD interactions were seen only in ValRS (1gax)32 and AspRS (1asy)33 for the terminal 3′ adenosine of the bound tRNA (Table 5), the position where activated amino acids are attached
to the tRNA. Coordinates for the terminal adenosine are missing from the structures of IleRS, ProRS and SerRS, presumably because this position is highly mobile. Indeed, the temperature factors of the terminal adenosine in each of the remaining complexes, including ValRS and AspRS, are high (about 100 Å²), suggesting that the adenosine is not tightly anchored to the synthetase in the conformation present in the crystal structure. It is also possible that interactions between adenine and the ADA motif of a synthetase occur only transiently during catalysis. AspRS is particularly interesting because in a second structure, where ATP is also present (1asz),34 the terminal 3′ adenosine of the tRNA binds with the A-face of its Watson–Crick edge to an alternative site having the ADA motif, but with distorted geometry. ATP in 1asz, however, is bound at the same location and in an identical way as the terminal adenosine of tRNA in 1asy, where the B-face of the Watson–Crick edge is recognized (Table 5).

Adenine-specific DNA methyltransferase M.Taq I
Adenine-specific DNA methyltransferase M.Taq I (1g38) is a DNA base-flipping enzyme in which adenine is specifically methylated.35 The active site of M.Taq I contains two adenosine-binding subsites, both of which involve ADA:DAD interactions: the ADABZ motif recognizes the B-face of the Watson–Crick edge of adenine in the ligand NEA500 (a SAM analog), and the ADA motif recognizes the B-face of the Hoogsteen edge of the flipped adenosine residue (A606) of the DNA (Table 5). Moreover, because M.Taq I belongs to the S-adenosyl-L-methionine-dependent methyltransferase fold, both the sugar edge (A-face) and the Watson–Crick edge (B-face) of NEA500 are recognized by the unique β-loop-β supersecondary structure described above.

Table 5. Adenine recognition in protein–DNA/RNA complexes

| Protein | PDB code | RNA/DNA | Ligand | Ligand atom : protein O | Ligand atom : protein N | Ligand atom : protein O or OD/OE |
|---|---|---|---|---|---|---|
| aaRS, protein ADA motif; ligand B-face, W–C edge(a), DAD pattern | | | | | | |
| Valyl-tRNA synthetase | 1gax_A | tRNA | A975_C | C2 : –(b) | N1 : N_Glu261 | N6 : O_Glu261 |
| Aspartyl-tRNA synthetase | 1asy_A | tRNA | A676_R | C2 : – | N1 : N_Met335 | N6 : O_Met335 |
| Aspartyl-tRNA synthetase | 1asz_A | tRNA | ATP701A | C2 : – | N1 : N_Met335 | N6 : O_Met335 |
| aaRS, protein ADA motif; ligand A-face, W–C edge, DAD pattern | | | | | | |
| Aspartyl-tRNA synthetase | 1asz_A | tRNA | A676_R | N6 : O_Ser280 | N1 : N_Gly282 | C2 : – |
| Methyltransferase, protein ADABZ motif; ligand B-face, W–C edge, DAD pattern | | | | | | |
| Adenine-specific DNA methyltransferase M.Taq I | 1g38_A | DNA | NEA500 | C2 : O_Ala88 | N1 : N_Phe90 | N6 : OD1_Asp89 |
| Methyltransferase, protein ADABZ motif; ligand A-face, sugar edge, DAD pattern | | | | | | |
| Adenine-specific DNA methyltransferase M.Taq I | 1g38_A | DNA | NEA500 | C2 : – | N3 : N_Ile72 | O2′ : OE2_Glu71 |
| Methyltransferase, protein ADA motif; ligand B-face, Hoogsteen edge, DAD pattern | | | | | | |
| Adenine-specific DNA methyltransferase M.Taq I | 1g38_A | DNA | A606_B | N6 : O_Pro106 | N7 : N_Tyr108 | C8 : O_Tyr108 |
| Other examples, protein ADA motif; ligand B-face, sugar edge, DAD pattern | | | | | | |
| Methionyl-tRNAfMet formyltransferase | 2fmt_A | tRNA | A73_C | O2′ : O_Pro39 | N3 : N_Gly41 | C2 : – |
| Other examples, protein ADA motif; ligand B-face, W–C edge, DAD pattern | | | | | | |
| Methionyl-tRNAfMet formyltransferase | 2fmt_A | tRNA | A76_C | C2 : – | N1 : N_Leu207 | N6 : O_Leu207 |
| trp RNA-binding attenuation protein | 1c9s_L | RNA | A105_W | C2 : O_Ser35 | N1 : N_Lys37 | N6 : O_Lys37 |
| Nova-2 KH RNA-binding domain | 1ec6_A | RNA | A14_D | C2 : O_Ile39 | N1 : N_Ile41 | N6 : O_Ile41 |
| KH-QUA2 region of the splicing factor 1 | 1k1g_A | RNA | A508_B | C2 : – | N1 : N_Ile177 | N6 : O_Ile177 |
| Pseudouridine synthase TruB | 1k8w_A | RNA | A413_B | C2 : – | N1 : N_His43 | N6 : O_His43 |
| Other examples, protein ADA motif; ligand A-face, W–C edge, DAD pattern | | | | | | |
| Polyadenylate-binding protein 1 | 1cvj_A | RNA | A6_M | N6 : O_Trp86 | N1 : N_Gln88 | C2 : – |
| Other examples, protein ADABZ motif; ligand B-face, W–C edge, DAD pattern | | | | | | |
| Polyadenylate-binding protein 1 | 1cvj_A | RNA | A7_M | C2 : – | N1 : N_Met46 | N6 : OD1_Asp45 |

(a) W–C edge, the Watson–Crick edge of adenine.
(b) –, one or several parameters for the weak hydrogen bond are not satisfied (distorted geometry).

Other protein–DNA/RNA complexes
We have identified ADA:DAD interactions for adenine binding in six other proteins with five different folds (Table 5). As for the aaRSs and the M.Taq I methyltransferase above, each of these six structures uses the ADA recognition motif to bind nucleotides that appear to be functionally or structurally "important", since the base of the nucleotide is exposed and can participate in interactions with the protein. In methionyl-tRNAfMet formyltransferase (2fmt)36 complexed with formyl-methionyl-tRNAfMet, adenosine A73, the "discriminator base", is bound via its sugar edge to one ADA motif, while at the same time the 3′ adenosine (A76) of fMet-tRNAfMet is bound via its Watson–Crick edge to a second ADA motif. In the crystal structure of trp RNA-binding attenuation protein (TRAP) in complex with single-stranded RNA (1c9s), the 53 bases of the RNA contain 11 GAG triplets separated by AU dinucleotides.37 The Watson–Crick edge of the adenine in each of the 11 triplets is bound to one of the 11 identical subunits of TRAP via the ADA motif. In the crystal structure of the Nova KH RNA-binding domain bound to an RNA hairpin (1ec6), the 5′-U-C-A-C-3′ core RNA recognition sequence interacts directly with the Gly-X-X-Gly sequence motif.38 The Watson–Crick edge of the adenine (A14) of the core RNA recognition sequence interacts with the ADA motif of the protein. Similarly, ADA recognition occurs in the KH-QUA2 region of splicing factor 1 in complex with RNA (1k1g), where adenine A508 (Ade8 in Liu et al.39), the second adenosine in the conserved 5′-U-A-A-C-3′ RNA recognition region, interacts with the Gly-X-X-Gly motif. Pseudouridine synthase (TruB) from E. coli is an RNA base-flipping enzyme, but uridine, not adenine, is the flipped base.40 In the structure of the complex with the bound T stem–loop (TSL) of tRNA (1k8w), adenine A413 (A58 in Hoang & Ferré-D'Amaré40) is directly responsible for the ability of TruB to recognize its substrate and modify uridine U410 to pseudouridine Ψ410 (U55 → Ψ55 in Hoang & Ferré-D'Amaré40). Adenine A413 (A58), whose Watson–Crick edge is recognized by the ADA motif of TruB, simultaneously participates in the reverse Hoogsteen base pair U409:A413 (U54:A58), which is invariant in tRNAs. It is thus an interesting example of an essential functional nucleotide that interacts with another nucleotide along its Hoogsteen edge and with a protein along its Watson–Crick edge. In one instance, however, adenine recognition
clearly plays a structural role. In human poly(A)-binding protein 1 (PABP1) in complex with polyadenylate RNA (PDB code 1cvj; Deo et al.41), eight adenine residues are bound to the surface of the protein.41 Adenine bases A1 to A4 are bound to RNA recognition motif domain 1 (RRM1), adenine bases A6 to A9 are bound to RRM2, and adenine A5 is bound to a short linker connecting RRM1 and RRM2. Two adjacent adenine bases, A6 and A7, are sandwiched between residues of the poly(A)-binding protein and interact with two different motifs, ADA and ADABZ, respectively (Table 5). Adenine bases A4 and A3 are placed against the RRM1 domain, symmetrically to A6 and A7. Although A3 and A4 also participate in donor/acceptor interactions with PABP1, these interactions do not mimic those made by A6 and A7 and, strictly speaking, are not of the ADA type: the N1 nitrogen of A4 interacts with the side-chain oxygen atom of Ser127, while its N6 nitrogen atom interacts with the main-chain oxygen atom of the same Ser127 residue (hydrogen bonds of 2.7 Å and 2.8 Å, respectively). A3 would make a good candidate for an ADA interaction along its sugar edge, but the polypeptide chain around the sugar edge is in the cis conformation, such that the main-chain oxygen atoms of Arg172, Phe173, Lys174 and Ser175 all point towards the sugar edge.

In each of these examples (Table 5), because of functional or structural necessity, "key" adenine nucleotides in DNA and RNA are exposed at the protein interface. Often, these nucleotides are recognized by proteins using the ADA motif, as seen for proteins that bind ATP, FAD, NAD and other mono- and dinucleotides and their analogs.

Table 6. Representative protein structures that satisfy all the criteria of adenine binding except the angle θ, which is unrestricted

| Fold | PDB code | Cofactor | Ligand atom : protein O | Ligand atom : protein N | Ligand atom : protein O or OD/OE | C2···O (Å) | H···O (Å) | C2–H···O (deg.) | χ (deg.) | ξ (deg.) | θ (deg.) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Protein ADA motif; ligand A-face, DAD pattern | | | | | | | | | | | |
| ATP-grasp | 1gsa | ADP317 | N6 : O_Asn199 | N1 : N_Leu201 | C2 : O_Leu201 | 3.8 | 3.0 | 132 | 53 | 113 | 47 |
| Protein kinase-like (PK-like) | 1csn | ATP299 | N6 : O_Asp86 | N1 : N_Leu88 | C2 : O_Leu88 | 3.3 | 2.3 | 143 | 289 | 111 | −62 |
| Protein ADA motif; ligand B-face, DAD pattern | | | | | | | | | | | |
| ETFP adenine nucleotide-binding domain-like | 1mjh_A | ATP2001 | C2 : O_Leu39 | N1 : N_Val41 | N6 : O_Val41 | 3.6 | 2.6 | 155 | 241 | 125 | −46 |
| Protein ADABZ motif; ligand A-face, DAD pattern | | | | | | | | | | | |
| P-loop containing nucleotide triphosphate hydrolases | 1nks_F | ADP4 | N6 : O_Val178 | N1 : N_Gly180 | C2 : OE1_Glu179 | 3.1 | 3.1 | 82 | 250 | 79 | −67 |
| Protein ADABZ motif; ligand B-face, DAD pattern | | | | | | | | | | | |
| NAD(P)-binding Rossmann-fold domains | 1enp | NAP501 | C2 : O_Leu88 | N1 : N_Ala90 | N6 : OD1_Asp89 | 3.3 | 2.8 | 110 | 255 | 105 | −69 |
| Protein ADABZ motif; ligand A-face, DAD pattern (sugar edge) | | | | | | | | | | | |
| FAD/NAD(P)-binding domain | 1h7w_A | FAD1031A | C2 : O_Phe217 | N3 : N_Lys219 | O2′ : OE2_Glu218 | 3.8 | 3.0 | 132 | 247 | 119 | −53 |
| NAD(P)-binding Rossmann-fold domains | 2ohx_B | NAD403B | C2 : – | N3 : N_Ile224 | O2′ : OD1_Asp223 | – | – | – | – | – | – |
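The TRAP complex (1c9s) discussed above is described as 53 RNA bases comprising 11 GAG triplets separated by AU dinucleotides. Assuming the simplest arrangement consistent with that description, GAG repeats joined by AU linkers (the deposited 1c9s sequence is not reproduced here), the arithmetic 11 × 3 + 10 × 2 = 53 and the count of triplet adenines, one per TRAP subunit, can be checked directly:

```python
# Hypothetical reconstruction: 11 GAG triplets joined by 10 AU linkers.
triplets, linker = 11, "AU"
rna = linker.join(["GAG"] * triplets)

print(len(rna))  # 11*3 + 10*2 = 53 bases

# Each triplet starts every 5 bases; its central A is the bound adenine.
triplet_adenines = [i + 1 for i in range(0, len(rna), 5) if rna[i:i + 3] == "GAG"]
print(len(triplet_adenines))  # 11, one per TRAP subunit
```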
Concluding Remarks
Here, we have shown that the DAD pattern of each of the three edges of adenine can be recognized by proteins via a complementary ADA motif, along with its two variations, ADABZ and BZADA. Each of these motifs has been observed in known protein structures. The ADA motif binds either the A-face or the B-face of adenine, while along the Watson–Crick edge the ADABZ and BZADA motifs were found to bind only the B-face although, theoretically, they should be able to bind both. In addition to the DAD pattern presented by each of the three edges of adenine and by the sugar edge of guanine, alternative patterns exist for other nucleotides. The DDA, AAD and DA patterns are recognized by proteins using complementary AAD, DDA and AD motifs, respectively.

Hydrogen bonds have often been defined without regard to geometrical parameters, considering only distances between heavy atoms and ignoring the hydrogen atom. Here, we have included hydrogen atoms (added using SYBYL) and, with the exception of a long distance in the case of 1msk and of some cases of double-edge recognition of adenosine where the elevation angle |θ| > 45°, all results reported in Tables 1–3 satisfy all geometrical criteria; this is especially important for establishing the participation of the weak C–H···O hydrogen bond in adenine recognition. Relaxing just one of the six parameters, θ, for C–H···O hydrogen bonds identifies additional protein–adenosine complexes (Table 6) that otherwise fulfil the geometric criteria (distances and angles) and belong to the interaction types reported in Tables 1–3, including a single instance of the ADABZ motif binding the A-face of adenine, observed in a nucleotide triphosphate hydrolase (Table 6). We have shown that many proteins recognize adenosine (and even some other nucleotides and nucleotide analogs) in a similar way: the adenine-binding loop, anticipated 20 years ago by Remington et al.,6 binds not only the Watson–Crick edge but also the sugar edge and the Hoogsteen edge. Comparison of the large number of solved structures of protein–nucleotide complexes demonstrates how similar nucleotide recognition is in many proteins, despite their differences in fold and function.
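The complementarity rule stated above is a role swap at each position: every donor on the base edge faces an acceptor on the protein and vice versa, read in the same direction as the edge pattern. The sketch below (function and dictionary names are hypothetical) derives the protein motif for each edge pattern named in the text.

```python
SWAP = {"D": "A", "A": "D"}

def complementary_motif(edge_pattern: str) -> str:
    """Protein motif that pairs acceptor-for-donor with a base-edge pattern."""
    return "".join(SWAP[c] for c in edge_pattern)

# Edge patterns as described in the text.
EDGE_PATTERNS = {
    "adenine, all three edges": "DAD",
    "guanine, sugar edge": "DAD",
    "guanine, Watson-Crick edge": "DDA",
    "guanine, Hoogsteen edge": "AAD",
    "uracil/thymine/cytosine, sugar edge": "DA",
}

for edge, pattern in EDGE_PATTERNS.items():
    print(f"{edge}: {pattern} -> {complementary_motif(pattern)}")
```

The output reproduces the pairings DAD → ADA, DDA → AAD, AAD → DDA and DA → AD.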
Materials and Methods
All structures of protein–ligand complexes in which adenine is part of the ligand, including those considered in our previous studies,11,12 were extracted from the PDB.13 Folds were classified according to the FSSP fold classification database.42 Protein atoms in direct contact with ligand atoms were identified using the Ligand–Protein Contacts (LPC) software.43

Definitions
We have revised our previous descriptions of direct and reverse adenine-binding motifs so that the positions of residues in the nucleotide-binding loop are always numbered consecutively along the polypeptide chain from the amino-terminal end (displayed on the left) to the carboxyl-terminal end (displayed on the right), i.e. in the N→C direction. The orientation of the adenine ring system in which the N1, C2 and N3 atoms are viewed to run clockwise is referred to as the A-face of the adenine ring (as in Figure 1(a)), while the counterclockwise run of these atoms defines the B-face, where the ring has been flipped by 180°. Thus, a protein binds the A-face when the A-face of the ligand is bound to the adenine-binding loop in the N→C direction, and a protein binds the B-face when the B-face of the ligand is bound to the adenine-binding loop in the N→C direction.

SYBYL (Tripos Inc., St. Louis, MO, USA) was used to model hydrogen atoms and to calculate all distances and angles. For conventional hydrogen bonds, the heavy-atom distance criteria d(N···O) ≤ 3.7 Å and d(N···N) ≤ 3.7 Å were used for N–H···O and N–H···N bonds, respectively; the angular criteria described below for C–H···O hydrogen bonds were also imposed. Because the energy of a C–H···O bond is small, three sets of criteria were imposed to confirm the likely presence of a weak hydrogen bond.14,15,20,44 Firstly, the C–H···O angle ζ must be greater than 90°, and the C–H···O bond must have a small elevation angle θ, with |θ| ≤ 45°. θ is the angle between the H···acceptor vector and the plane of the acceptor group (defined by the main-chain carbonyl oxygen, the main-chain carbonyl carbon and the Cα atom of the same residue) and is calculated from sin θ = sin ξ sin χ, where ξ and χ are angles obtained from the atomic geometry as shown in Figure 8. Secondly, an electronegative atom (nitrogen or oxygen) must be located adjacent to the carbon atom, which increases the acidity of the hydrogen atoms attached to that carbon, so that the carbon atom can act as a hydrogen bond donor analogous to nitrogen and oxygen atoms.20 This occurs at the C2 and C8 positions of the adenine ring system. Thirdly, the C···O distance must be ≤ 4.0 Å and the H···O distance ≤ 3.0 Å. Thus, the generalized mean distance of the C–H···O=C hydrogen bond (a dipole–induced dipole interaction) is intermediate between that of a hydrogen bond (2.8 Å, a dipole–dipole interaction) and that of a C···C interaction (> 4.0 Å, an induced dipole–induced dipole interaction).14

Figure 8. To confirm the presence of a weak C–H···O hydrogen bond, two distance criteria, H···O ≤ 3.0 Å and C···O ≤ 4.0 Å, and two angular criteria, ζ > 90° and |θ| ≤ 45°, need to be satisfied.14 χ, the H···O=C–Cα dihedral angle; ξ, the C=O···H angle; θ, the elevation angle of the hydrogen atom of the C–H hydrogen bond donor above the plane of the hydrogen bond acceptor group; ζ, the C–H···O angle. The elevation angle θ is calculated from the formula sin θ = sin ξ sin χ. Angles ξ and χ and the H···O and C···O distances are measured directly from the structures; hydrogen atoms were added to the structures using SYBYL.

Figures
Figures were produced with ACD/ChemSketch v5.0 (Advanced Chemistry Development Inc., Toronto, Canada†), MOLSCRIPT V2.1,45 and Raster3D V2.4b.46
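The C–H···O criteria above can be collected into a single check. The sketch below is our own restatement, not the SYBYL-based procedure used in this study: it takes the H···O and C···O distances and the angles ζ, ξ and χ (measured as in Figure 8), derives θ from sin θ = sin ξ sin χ, and applies H···O ≤ 3.0 Å, C···O ≤ 4.0 Å, ζ > 90° and |θ| ≤ 45°. The `relax_theta` flag mimics the relaxation used for Table 6. Function and parameter names are hypothetical, and the requirement for an electronegative neighbour of the donor carbon is assumed to be checked separately (it holds for C2 and C8 of adenine).

```python
import math

def elevation_angle(xi_deg: float, chi_deg: float) -> float:
    """Elevation angle theta (degrees) from sin(theta) = sin(xi) * sin(chi)."""
    s = math.sin(math.radians(xi_deg)) * math.sin(math.radians(chi_deg))
    return math.degrees(math.asin(s))

def is_weak_ch_o_bond(h_o: float, c_o: float, zeta: float, xi: float,
                      chi: float, relax_theta: bool = False) -> bool:
    """Apply the distance and angle criteria for a C-H···O hydrogen bond.

    Distances in angstroms, angles in degrees. With relax_theta=True the
    elevation-angle test is skipped, as for the complexes in Table 6.
    """
    if h_o > 3.0 or c_o > 4.0:
        return False
    if zeta <= 90.0:
        return False
    if not relax_theta and abs(elevation_angle(xi, chi)) > 45.0:
        return False
    return True

# 1gsa row of Table 6: H···O 3.0, C2···O 3.8, zeta 132, xi 113, chi 53.
print(round(elevation_angle(113, 53)))                            # 47
print(is_weak_ch_o_bond(3.0, 3.8, 132, 113, 53))                  # False
print(is_weak_ch_o_bond(3.0, 3.8, 132, 113, 53, relax_theta=True))  # True
```

The recomputed θ agrees with the tabulated value for 1gsa (47°), which is why that complex passes only once θ is unrestricted.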
Acknowledgements
This work was supported by grants from the Academy of Finland (no. 204530), the Technology Development Centre for Finland (TEKES), the Sigrid Juselius Foundation and the National Graduate School in Informational and Structural Biology. We thank our anonymous referees for their critical comments on the manuscript.
References 1. Westhof, E. & Fritsch, V. (2000). RNA folding: beyond Watson– Crick pairs. Structure, 8, R55– R65. 2. Leontis, N. B., Stombaugh, J. & Westhof, E. (2002). The non-Watson– Crick base pairs and their associated isostericity matrices. Nucl. Acids Res. 30, 3497– 3531. 3. Leontis, N. B. & Westhof, E. (1998). Conserved geometrical base-pairing patterns in RNA. Quart. Rev. Biophys. 31, 399– 455. 4. Leontis, N. B. & Westhof, E. (2001). Geometric nomenclature and classification of RNA base pairs. RNA, 7, 499– 512. 5. Nagaswamy, U., Voss, N., Zhang, Z. & Fox, G. E. (2000). Database of non-canonical base pairs found in known RNA structures. Nucl. Acids Res. 28, 375– 376. 6. Remington, S., Wiegand, G. & Huber, R. (1982). Crystallographic refinement and atomic models of two different forms of citrate synthase at 2.7 and ˚ resolution. J. Mol. Biol. 158, 111 – 152. 1.7 A 7. Kobayashi, N. & Go, N. (1997). ATP binding proteins with different folds share a common ATP-binding structural motif. Nature Struct. Biol. 4, 6 – 7. 8. Moodie, S. L., Mitchell, J. B. O. & Thornton, J. M. (1996). Protein recognition of adenylate: an example of a fuzzy recognition template. J. Mol. Biol. 263, 486– 500. 9. Nobeli, I., Laskowski, R. A., Valdar, W. S. & Thornton, J. M. (2001). On the molecular discrimination between adenine and guanine by proteins. Nucl. Acids Res. 29, 4294–4309. 10. Cappello, V., Tramontano, A. & Koch, U. (2002). Classification of proteins based on the properties of the ligand-binding site: the case of adenine-binding proteins. Proteins: Struct. Funct. Genet. 47, 106– 115. 11. Denessiouk, K. A. & Johnson, M. S. (2000). When fold is not important: a common structural framework for adenine and AMP binding in 12 unrelated protein families. Proteins: Struct. Funct. Genet. 38, 310– 326. 12. Denessiouk, K. A., Rantanen, V. V. & Johnson, M. S. (2001). Adenine recognition: a motif present in ATP, CoA-, NAD-, NADP-, and FAD-dependent proteins. Proteins: Struct. Funct. Genet. 
44, 282– 291. 13. Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H. et al. (2000). The Protein Data Bank. Nucl. Acids Res. 28, 235– 242. † www.acdlabs.com
Recognising the W–C, Hoogsteen and Sugar Edges
14. Derewenda, Z. S., Lee, L. & Derewenda, U. (1995). The occurrence of C–H···O hydrogen bonds in proteins. J. Mol. Biol. 252, 248–262.
15. Wahl, M. C. & Sundaralingam, M. (1997). C–H···O hydrogen bonding in biology. Trends Biochem. Sci. 22, 97–102.
16. Mandel-Gutfreund, Y., Margalit, H., Jernigan, R. L. & Zhurkin, V. B. (1998). A role for C–H···O interactions in protein–DNA recognition. J. Mol. Biol. 277, 1129–1140.
17. Chu, P. Y. & Hwang, M. J. (1998). New insights for dinucleotide backbone binding in conserved C5′–H···O hydrogen bonds. J. Mol. Biol. 279, 695–701.
18. Chakrabarti, P. & Chakrabarti, S. (1998). C–H···O hydrogen bond involving proline residues in α-helices. J. Mol. Biol. 284, 867–873.
19. Treger, M. & Westhof, E. (2001). Statistical analysis of atomic contacts at RNA–protein interfaces. J. Mol. Recognit. 14, 199–214.
20. Taylor, R. & Kennard, O. (1982). Crystallographic evidence for the existence of C–H···O, C–H···N, C–H···Cl hydrogen bonds. J. Am. Chem. Soc. 104, 5063–5070.
21. Levitt, M. & Perutz, M. F. (1988). Aromatic rings act as hydrogen bond acceptors. J. Mol. Biol. 201, 751–754.
22. Pedireddi, V. R. & Desiraju, G. R. (1992). A crystallographic scale of carbon acidity. J. Chem. Soc., Chem. Commun., 988–990.
23. Weiss, M. S., Brandl, M., Sühnel, J., Pal, D. & Hilgenfeld, R. (2001). More hydrogen bonds for the (structural) biologist. Trends Biochem. Sci. 26, 521–523.
24. Rantanen, V. V., Denessiouk, K. A., Gyllenberg, M., Koski, T. & Johnson, M. S. (2001). A fragment library based on Gaussian mixtures predicting favorable molecular interactions. J. Mol. Biol. 313, 197–214.
25. Ramachandran, G. N. & Sasisekharan, V. (1968). Conformation of polypeptides and proteins. Advan. Protein Chem. 28, 283–437.
26. Wierenga, R. K., Terpstra, P. & Hol, W. G. (1986). Prediction of the occurrence of the ADP-binding βαβ-fold in proteins, using an amino acid sequence fingerprint. J. Mol. Biol. 187, 101–107.
27. Rossmann, M. G., Moras, D. & Olsen, K. W. (1974). Chemical and biological evolution of nucleotide-binding protein. Nature, 250, 194–199.
28. Bottoms, C. A., Smith, P. E. & Tanner, J. J. (2002). A structurally conserved water molecule in Rossmann dinucleotide-binding domains. Protein Sci. 11, 2125–2137.
29. Brenner, C. (2002). Hint, Fhit, and GalT: function, structure, evolution, and mechanism of three branches of the histidine triad superfamily of nucleotide hydrolases and transferases. Biochemistry, 41, 9003–9014.
30. Wu, S. Y., McNae, I., Kontopidis, G., McClue, S. J., McInnes, C., Stewart, K. J. et al. (2003). Discovery of a novel family of CDK inhibitors with the program LIDAEUS: structural basis for ligand-induced disordering of the activation loop. Structure, 11, 399–410.
31. Jones, S., Daley, D. T., Luscombe, N. M., Berman, H. M. & Thornton, J. M. (2001). Protein–RNA interactions: a structural analysis. Nucl. Acids Res. 29, 943–954.
32. Fukai, S., Nureki, O., Sekine, S., Shimada, A., Tao, J., Vassylyev, D. G. & Yokoyama, S. (2000). Structural basis for double-sieve discrimination of L-valine from L-isoleucine and L-threonine by the complex of tRNA(Val) and valyl-tRNA synthetase. Cell, 103, 793–803.
33. Ruff, M., Krishnaswamy, S., Boeglin, M., Poterszman, A., Mitschler, A., Podjarny, A. et al. (1991). Class II aminoacyl transfer RNA synthetases: crystal structure of yeast aspartyl-tRNA synthetase complexed with tRNA(Asp). Science, 252, 1682–1689.
34. Cavarelli, J., Eriani, G., Rees, B., Ruff, M., Boeglin, M., Mitschler, A. et al. (1994). The active site of yeast aspartyl-tRNA synthetase: structural and functional aspects of the aminoacylation reaction. EMBO J. 13, 327–337.
35. Goedecke, K., Pignot, M., Goody, R. S., Scheidig, A. J. & Weinhold, E. (2001). Structure of the N6-adenine DNA methyltransferase M.TaqI in complex with DNA and a cofactor analog. Nature Struct. Biol. 8, 121–125.
36. Schmitt, E., Panvert, M., Blanquet, S. & Mechulam, Y. (1998). Crystal structure of methionyl-tRNAfMet transformylase complexed with the initiator formyl-methionyl-tRNAfMet. EMBO J. 17, 6819–6826.
37. Antson, A. A., Dodson, E. J., Dodson, G., Greaves, R. B., Chen, X. & Gollnick, P. (1999). Structure of the trp RNA-binding attenuation protein, TRAP, bound to RNA. Nature, 401, 235–242.
38. Lewis, H. A., Musunuru, K., Jensen, K. B., Edo, C., Chen, H., Darnell, R. B. & Burley, S. K. (2000). Sequence-specific RNA binding by a Nova KH domain: implications for paraneoplastic disease and the fragile X syndrome. Cell, 100, 323–332.
39. Liu, Z., Luyten, I., Bottomley, M. J., Messias, A. C., Houngninou-Molango, S., Sprangers, R. et al. (2001). Structural basis for recognition of the intron branch site RNA by splicing factor 1. Science, 294, 1098–1102.
40. Hoang, C. & Ferré-D'Amaré, A. R. (2001). Cocrystal structure of a tRNA Ψ55 pseudouridine synthase: nucleotide flipping by an RNA-modifying enzyme. Cell, 107, 929–939.
41. Deo, R. C., Bonanno, J. B., Sonenberg, N. & Burley, S. K. (1999). Recognition of polyadenylate RNA by the poly(A)-binding protein. Cell, 98, 835–845.
42. Holm, L. & Sander, C. (1996). Mapping the protein universe. Science, 273, 595–602.
43. Sobolev, V., Wade, R. C., Vriend, G. & Edelman, M. (1996). Molecular docking using surface complementarity. Proteins: Struct. Funct. Genet. 25, 120–129.
44. Leonard, G. A., McAuley-Hecht, K., Brown, T. & Hunter, W. N. (1995). Do C–H···O hydrogen bonds contribute to the stability of nucleic acid base pairs? Acta Crystallog. sect. D, 51, 136–139.
45. Kraulis, P. J. (1991). MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures. J. Appl. Crystallog. 24, 945–949.
46. Merritt, E. A. & Bacon, D. J. (1997). Raster3D: photorealistic molecular graphics. Methods Enzymol. 277, 505–524.
Edited by P. J. Hagerman (Received 16 June 2003; received in revised form 4 September 2003; accepted 12 September 2003)