Radial distributions of water-water distances in protein crystals

Anthony C. T. North and Jeremy C. Smith*

Astbury Department of Biophysics, University of Leeds, Leeds LS2 9JT, UK

(Received 19 March 1984; revised 26 February 1985)

An accurate protein crystallographic structure determination requires a knowledge of the solvent contribution to the diffraction pattern. As resolutions improve, research groups are reporting coordinates of large numbers of water molecules. We examine the accuracy of these coordinates by presenting radial distributions of water-water distances from refinements at different stages and interpreting them in terms of preferred hydrogen-bonding distances and problems in solvent electron density map interpretation. Marked differences between the distributions suggest that wide variations exist in the water molecule selection and refinement criteria employed by different research groups, and that these variations mask possible real differences in solvent structure.

Keywords: Protein crystallography; water; radial distributions; coordinate accuracy

* Present address: Institut Laue-Langevin, 156X, 38042 Grenoble, France

0141-8130/85/040223-03$03.00 © 1985 Butterworth & Co. (Publishers) Ltd

Introduction

In recent years a number of protein crystallographic analyses have attained resolutions high enough to reveal distinct features in the solvent region of electron density maps. This region is of interest not only because of its functional significance in vivo but also because the solvent contribution to the diffraction pattern must be calculated in order to refine the protein structure itself to the highest accuracy. In the low-angle region of the diffraction pattern, Bragg reflections are affected by the contrast in average electron density between the solvent regions and the protein regions in the unit cell, and an adequate allowance for the solvent contribution in this region can be obtained by modelling the solvent with a continuous electron density¹,².

Several techniques have shown that solvent molecules close to the protein surface are more ordered than those further away from the surface³. Those solvent molecules that are fixed in the same position in different unit cells will diffract coherently and contribute significantly to the higher order reflections. An accurate calculation of the high-angle solvent contribution therefore requires a more detailed model of the solvent region.

There are, however, several interpretational problems in trying to locate molecules in the solvent region of electron-density maps, and the criteria used for water molecule selection vary considerably between different research groups. Near to the protein surface, peaks of density are often well-defined and can be readily identified with water molecules coordinated to groups on the protein; further away from the surface the peaks generally become broader and lower until it is difficult to separate them from noise. Indeed, artefactual peaks higher than the noise level may appear because of experimental inaccuracies or errors in phase assignment. Therefore, while it is possible to assign broad, low peaks until the noise level is reached, the significance of their identification as real solvent molecules becomes increasingly doubtful. This conclusion is supported by the fact that some recent refinements (including a careful 0.9 Å refinement of vitamin B12⁴) have reported no clear correlation between occupancy and B factor in the solvent region, whereas one might expect the B factor to rise and the occupancy to fall farther from the protein surface. However, some papers explicitly specify minimum values of occupancies and maximum values of B for features to be acceptable as solvent molecules. In addition, as artefactual peaks in the solvent region may arise from incorrect modelling of parts of the protein structure, some workers refrain from assigning as water any peaks that are close to parts of the protein structure not yet satisfactorily accounted for⁵.

Interpretation of the solvent region thus necessarily involves a degree of subjective judgment based on the apparent stereochemical reasonableness of the peak positions and on the behaviour of site occupancies and peak shapes (thermal parameters, B) as refinement proceeds. However, the application of stereochemical constraints such as the maintenance of good hydrogen-bonding distances between water molecules⁵ may be of doubtful value because alternative sites, each with partial occupancy, can result in real peaks in the electron-density map being separated by less than the minimum permitted instantaneous distance⁶, as the map is necessarily both time-averaged and spatially averaged through all the unit cells of the crystal.

A number of recent protein crystallographic structure analyses have resulted in the coordinates of many 'water molecules' being deposited along with the protein atom coordinates in the Protein Data Bank⁷. In this paper we present histograms of the radial distributions of water-water distances in three protein crystallographic analyses which are at different stages of refinement, and we try to interpret the reliability of the deposited coordinates in terms of conventional preferred nearest-neighbour distances and of the problems in solvent electron-density map analysis that we have discussed above.
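The point made above about alternative sites can be illustrated with a small sketch. It is not part of the original analysis: the coordinates, the 2.5 Å cutoff and the function name `short_contacts` are all hypothetical, chosen only to show the reasoning. Two peaks closer together than an instantaneous O···O contact allows can still both be real if they are never occupied at the same time, which requires their occupancies to sum to no more than 1.

```python
import math
from itertools import combinations

# Hypothetical water sites: (x, y, z) in angstroms plus fractional occupancy.
waters = [
    ((0.0, 0.0, 0.0), 0.5),
    ((1.5, 0.0, 0.0), 0.5),  # 1.5 A from site 0, half occupancy
    ((0.0, 2.9, 0.0), 1.0),  # ordinary hydrogen-bonding distance from site 0
    ((0.0, 4.4, 0.0), 1.0),  # 1.5 A from site 2, but both fully occupied
]

MIN_CONTACT = 2.5  # angstroms; an assumed lower limit, not a value from the paper


def short_contacts(sites, min_contact=MIN_CONTACT):
    """Return (i, j, distance, can_alternate) for each pair closer than min_contact."""
    flagged = []
    for (i, (pi, qi)), (j, (pj, qj)) in combinations(enumerate(sites), 2):
        d = math.dist(pi, pj)
        if d < min_contact:
            # Sites that are never occupied simultaneously can have
            # occupancies summing to at most 1.
            flagged.append((i, j, d, qi + qj <= 1.0))
    return flagged


for i, j, d, can_alternate in short_contacts(waters):
    label = "possible alternative sites" if can_alternate else "incompatible occupancies"
    print(f"sites {i} and {j}: {d:.2f} A ({label})")
```

Under these assumptions the first short pair could be a pair of alternative sites, whereas the second could not, because both of its members are assigned full occupancy.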

Int. J. Biol. Macromol., 1985, Vol 7, August


Water-water distances: A. C. T. North and J. C. Smith

Method

For each of the crystal structures considered, all water molecule oxygen-oxygen distances (including distances between symmetry-related water molecules) were calculated and corrected for the fact that in an isotropic system the number of molecules in a thin shell at any given radius from a specified molecule is proportional to the volume of the shell. Although some of this volume may be occupied by protein (thus leading to an over-correction at larger distances), the uneven and curved shape of the protein surface suggests that this correction is more valid than an area correction; the choice is not, however, crucial to our conclusions.

Results and conclusions

Our calculated radial distributions are shown in Figures 1-3. In interpreting the histograms, it is necessary to bear in mind that in liquid water a typical O···O hydrogen-bonding distance is about 2.9 Å and second- and third-nearest neighbours are around 4-5 Å and 7-8 Å respectively from the origin molecule⁸.

In the 2 Å lysozyme refinement⁹, Protein Data Bank file 6LYZ (Figure 1), the water molecules had been refined allowing positions, occupancies and radii to vary. The histogram shows a broad distribution.

In the 1.7 Å actinidin refinement⁵, Protein Data Bank file 2ACT (Figure 2), water molecules were initially selected only when they were within approximate hydrogen-bonding distances from each other. The molecules listed fell into two groups. The first consisted of the 153 best ordered ones, which had B factors ranging from 7 to 40 Å²; although initially selected only when they were within approximate hydrogen-bonding distance of each other, they were subsequently allowed to vary without restraint on their positions during refinement. A second set of 109 molecules was then selected from 'low peaks' (0.3 to 0.4 e Å⁻³) in order to 'make the water structure more complete'. The resulting radial distribution in Figure 2 has a clear peak at around 2.9 Å and subsidiary peaks around 4-5 Å and 7-8 Å, indicating second- and third-neighbour preferred distances.

In the trypsin:inhibitor refinement¹⁰ (Figure 3), Protein Data Bank file 1PTC, solvent molecules were located by using difference Fourier maps and chosen only if they lay in electron density higher than 0.25 e Å⁻³ and occupied 'stereochemically reasonable positions'. All water molecules were given full occupancy and a temperature factor of 20 Å². Despite these criteria, Figure 3 shows a broad distribution and many short distances (Table 1). The assignment of full occupancy to the water molecules indicates that the short distances are not due to peaks that had been identified as representing alternative sites.

Figure 1 Radial distribution of water-water distances in the 2 Å lysozyme coordinate file (horizontal axis: radial distance, 0.0-5.0 Å)

Figure 2 Radial distribution of water-water distances in the 1.7 Å actinidin coordinate file (horizontal axis: radial distance, 0-10 Å)

Figure 3 Radial distribution of water-water distances in the 1.9 Å pancreatic trypsin:inhibitor coordinate file (horizontal axis: radial distance, 0.0-5.0 Å). Distances below 1 Å not included (see Table 1)

Table 1 Numbers of short water-water distances for each structure
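The calculation described in the Method section can be sketched roughly as follows. This is a minimal illustration and not the authors' program: the function name `radial_histogram`, the bin width and the maximum radius are hypothetical, and the sketch assumes that the input list already contains the symmetry-related water oxygen positions that the Method section says were included. Each bin count is divided by the volume of the spherical shell that the bin represents.

```python
import math
from collections import Counter


def radial_histogram(coords, r_max=10.0, bin_width=0.25):
    """Histogram of pair distances, each bin divided by its shell volume.

    coords: list of (x, y, z) oxygen positions in angstroms (assumed to
    include any symmetry-related copies that should be counted).
    Returns a list of (r_inner, r_outer, raw_count, count_per_volume).
    """
    n_bins = int(round(r_max / bin_width))
    counts = Counter()
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            if d < r_max:
                counts[int(d / bin_width)] += 1

    hist = []
    for k in range(n_bins):
        r_in, r_out = k * bin_width, (k + 1) * bin_width
        # Volume of the shell between r_in and r_out.
        shell_volume = 4.0 / 3.0 * math.pi * (r_out**3 - r_in**3)
        hist.append((r_in, r_out, counts[k], counts[k] / shell_volume))
    return hist


# Hypothetical three-site example: two contacts of 2.9 A and one of about 4.1 A.
example = [(0.0, 0.0, 0.0), (2.9, 0.0, 0.0), (0.0, 2.9, 0.0)]
for r_in, r_out, raw, corrected in radial_histogram(example, r_max=5.0, bin_width=0.5):
    if raw:
        print(f"{r_in:.1f}-{r_out:.1f} A: {raw} pairs, {corrected:.4f} per cubic angstrom")
```

Dividing by shell volume rather than shell area is the choice the Method section describes; replacing `shell_volume` with an area term would give the alternative correction that the authors mention.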

Discussion

It is clear that a complete representation of the water structure surrounding a protein molecule would include all possible coordinate networks, with each water molecule characterized both by the probability of its occurrence at a given site per unit time and by an appropriate thermal factor. A recent molecular dynamics simulation of a full unit cell of bovine pancreatic trypsin inhibitor has been performed by van Gunsteren et al.¹¹. This simulation included 560 water molecules. When the positions of the 47 observed X-ray water molecules were compared with the simulated ones, only nine were sited within 1 Å. These results indicate that our understanding of the nature of the solvent region is still primitive.

In this study we have derived the radial distributions of water-water distances in three different coordinate sets deposited in the Protein Data Bank. Table 2 indicates that there is some correlation between the resolution, the R-factor and the number of water molecules included in the refinement. It is of course well known that the mere inclusion of water with any structure will reduce the R-factor. More importantly, we draw attention to the marked differences between the three radial distributions. While the distribution for actinidin closely resembles that of bulk water, that for lysozyme has little structure and that for the trypsin:inhibitor has many too-close distances. We believe that the differences between the three distributions arise from a combination of causes, including differences between the criteria used for initial identification of peaks as representing water molecules, differences in the use of stereochemical constraints in the course of the refinement, and differences between the resolutions of the X-ray diffraction data, all of which may mask possible real differences between the solvent order in the three crystals.

We wish particularly to emphasize the dangers that are inherent in the application to the solvent region of a protein crystal of the types of stereochemical or energy restraints that are included in several commonly adopted refinement procedures¹²⁻¹⁴. Such restraints force the solvent region into an unrealistic, static single-network configuration. Conclusions concerning coordinate accuracy, and hence solvent organization, must therefore be made cautiously, as the stereochemical assumptions and selection criteria used necessarily bias the results.

At the outset of this work, we had hoped to derive information about the solvent structure within protein crystals. In the event, our calculations point rather to the need for very careful interpretation of the published water positions in protein crystals, in the light of the methods used to derive them and of the resolution of the data against which they have been refined.

Acknowledgements

We thank Drs J. L. Finney, H. F. J. Savage, E. N. Baker and R. Huber for helpful comments.
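The comparison with the molecular dynamics simulation mentioned in the Discussion (counting observed water sites that have a simulated site within 1 Å) can be sketched as below. The paper does not say how that comparison was actually carried out, so this is only one plausible reading: the function name `count_matched` and the example coordinates are hypothetical, and the simulated positions are assumed to be a single set of representative sites.

```python
import math


def count_matched(observed, simulated, cutoff=1.0):
    """Count observed sites with at least one simulated site within cutoff (angstroms)."""
    return sum(
        1
        for obs in observed
        if any(math.dist(obs, sim) <= cutoff for sim in simulated)
    )


# Hypothetical coordinates, in angstroms.
observed_sites = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
simulated_sites = [(0.4, 0.0, 0.0), (5.0, 2.0, 0.0), (9.0, 9.0, 9.0)]

n = count_matched(observed_sites, simulated_sites)
print(f"{n} of {len(observed_sites)} observed sites matched within 1 A")
```

A one-to-one assignment between observed and simulated sites would be a stricter criterion than the nearest-neighbour test used here.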

Table 1

Distance (Å)          Lysozyme   Actinidin   Trypsin:inhibitor
0.0-0.5, 0.5-1.0      -          -           10, 6
1.0-1.5, 1.5-2.0      2          -           9, 8
2.0-2.25, 2.25-2.5    8, 5       4, 27       5, 7

Table 2 The number of water molecules listed in the Protein Data Bank, the resolution and the R-factor for each structure

                                    Lysozyme   Actinidin   Trypsin:inhibitor
Number of water molecules listed    101        145         272
Resolution of refinement (Å)        2.0        1.7         1.9
R-factor of refinement              0.28       0.165       0.23
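For reference, the R-factor quoted in Table 2 is conventionally defined as below. The paper itself does not state the formula, so this assumes the standard crystallographic definition, with F_obs and F_calc denoting the observed and calculated structure factor amplitudes and the sums running over all measured reflections hkl.

```latex
R = \frac{\sum_{hkl} \bigl|\, |F_{\mathrm{obs}}(hkl)| - |F_{\mathrm{calc}}(hkl)| \,\bigr|}
         {\sum_{hkl} |F_{\mathrm{obs}}(hkl)|}
```

Because each added water site introduces further adjustable parameters into F_calc, R can fall on adding waters even when some of the sites are not real, which is the caution expressed in the Discussion.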

References

1 Fraser, R. D. B. and MacRae, T. P. J. Appl. Crystallogr. 1978, 11, 693
2 Phillips, S. E. V. J. Mol. Biol. 1980, 142, 531
3 Finney, J. L. in 'Water-A Comprehensive Treatise', (Ed. F. Franks), Plenum Press, New York, 1979, Vol. 6
4 Savage, H. F. J., PhD Thesis, University of London, 1983
5 Baker, E. N. J. Mol. Biol. 1980, 141, 441
6 Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. and Tasumi, M. J. Mol. Biol. 1977, 112, 535
7 Karle, I. and Duesler, E. Proc. Natl Acad. Sci. USA 1977, 74, 2602
8 Narten, A. H. and Levy, H. L. in 'Water-A Comprehensive Treatise', (Ed. F. Franks), Plenum Press, New York, 1972, Vol. 1
9 Diamond, R. J. Mol. Biol. 1974, 82, 371
10 Huber, R., Kukla, D., Bode, W., Schwager, P., Bartels, K., Deisenhofer, J. and Steigemann, W. J. Mol. Biol. 1974, 89, 73
11 van Gunsteren, W. F., Berendsen, H. J. C., Hermans, J., Hol, W. G. J. and Postma, J. P. M. Proc. Natl Acad. Sci. USA 1983, 80, 5315
12 Wlodawer, A. and Hendrickson, W. A. Acta Crystallogr. 1982, A38, 239
13 Konnert, J. Acta Crystallogr. 1976, A32, 614
14 Jack, A. and Levitt, M. Acta Crystallogr. 1978, A34, 931
