Computational Biochemistry of Antibodies and T-Cell Receptors


COMPUTATIONAL BIOCHEMISTRY OF ANTIBODIES AND T-CELL RECEPTORS

By JIRI NOVOTNY and JURGEN BAJORATH

Departments of Macromolecular Modeling and Molecular Structure, Bristol-Myers Squibb Research Institute, Princeton, New Jersey 08540 and Seattle, Washington 98121

I. Background
A. Gross Immunoglobulin Structure
A. VL-VH Interface β Barrel
D. Conformati...
...d Monte Carlo Methods
A. Thermodynamics of Binding
B. Lock-and-Key or Induced Fit
C. Empirical Gibbs Functions
B. What Is a Protein Epitope?

ADVANCES IN PROTEIN CHEMISTRY, Vol. 49


Copyright © 1996 by Academic Press, Inc. All rights of reproduction in any form reserved.

IX. Antibody Engineering
E. Heterospecific Polyvalent Constructs, “Miniantibodies”
X. T-Cell Receptor Modeling and Engineering
References

In vertebrates, the immune system has evolved to specifically recognize any foreign antigen (a macromolecule, virus, bacterium, or cancer cell) and to destroy it. The key attributes of immunity (its precise specificity, the seemingly endless diversity of the immune repertoire, and the ensuing unique molecular “self”-identity of each vertebrate organism) are all mediated by antibodies and T-cell receptors. In the past two decades it has become possible to relate immune functions to immunoglobulin and T-cell receptor structures. However, with protein molecules ranging in size from 50,000 to 1,000,000 Da, the challenges of structural analysis are enormous. Aids such as molecular models, calculations, simulations, and computer graphics are indispensable. Their technical and conceptual complexities have led to the emergence of a specialized field called computer modeling.

The term computer modeling is used profusely, if vaguely, to describe different types of activities. It can mean anything from just looking at three-dimensional structures on a computer screen to the application of complex and involved mathematical concepts and manipulations to obtain new insights into the structure-function relationship. “Computational immunology” became a necessity once the X-ray crystallographic structures of immune molecules emerged. Antibody-antigen complexes immediately presented us with a challenge: Given all the exquisite atomic detail of the two molecules interacting in three dimensions, can we understand the chemical origin of affinity and specificity to a point where, e.g., accurate predictions of the effects of point mutations will be possible? Such an understanding, it soon turns out, goes beyond a mere detailed description. It requires biophysical interpretation of crystallographic reality. Necessarily, one has to stop and ponder many questions, and search (and perhaps even err in the course of the search) for a conceptual framework that allows understanding.


A. A Touch of Metascience: Description and Understanding, Rigor and Empiricism

It seems obvious that a mere collection of all the experimental facts assembled by molecular immunologists (the specificity and binding data, the structures of antibodies and their antigenic complexes, the plethora of amino acid and nucleic acid sequences) cannot, per se, answer the fundamental questions of the field. Abstract concepts are needed to connect all the functional and structural data, and to develop formalisms capable of extrapolation to novel situations and predictions. It may be useful to spend a little time on relevant metascientific thoughts. Questions are omnipresent in our work and thinking: Does a single unifying (bio)physical principle govern molecular phenomena? Is there a single correct way to proceed toward understanding these phenomena? In an abstract sense, yes; practically, however, no.

Ultimately, antibodies are chemicals (macromolecules) described by the same physical principles as, e.g., hydrogen gas. A rigorous, deductive approach to their structure from the very first principles (quantum mechanics) would be hopeless, not only because of the practical impossibility of solving the Schrödinger equation for tens of thousands of atoms, but also because of the essential and unavoidable limitation of any logical system of axioms. In 1931 Kurt Gödel (Fig. 1) showed that mathematical number theory is deeply and incurably incomplete, in that any consistent finite set of axioms can be proven to be incomplete. That is, there is always at least one statement whose veracity cannot be deduced from the initial set of postulates (Gödel, 1931; Nagel and Newman, 1958). If this is true for the central discipline of mathematics, it is even more true for all the less formal scientific disciplines: physics, chemistry, and biology. Thus, attempts to deduce the ultimate truth from a set of first principles will always be limited to uncertain outcomes (see, e.g., Yun-yu et al., 1993).
On the other hand, induction from experimental data and abstraction from observables have long been an efficient and fertile way of doing science (e.g., Darwin, 1859, 1871). Induction processes usually lead to an emergence of alternative scientific disciplines, each with its own specific, truthful description of reality. This does not necessarily mean that we cannot see the most general relations, or that we are not able to guess at an overarching relationship between anything and everything else in nature. However, alternative descriptions of phenomena (“paradigms,” Kuhn, 1970), either exact or approximate, may coexist for long periods of time. A case in point is quantum mechanics. Equivalent formulations exist, based on matrix calculus (Heisenberg, 1925; Born and Jordan, 1925), wave mechanics (Schrödinger, 1926), and path integrals (Feynman and Hibbs, 1962; Feynman, 1972); group theory (Weyl, 1928) and the conjecture of “hidden variables” (Bohm, 1952) provide additional, conceptually independent perspectives.

FIG. 1. Kurt Gödel. (Courtesy of the Institute for Advanced Studies, Princeton, NJ.)

What message does this leave us regarding molecular immunology? Overall, we think, it justifies an empirical and pragmatic approach to immune phenomena and structural science, as opposed to a rigorous but limiting purism. Approaching very complex phenomena and observables (e.g., structural determinants of antigenic specificity), one needs to propose simple and approximate explanations first. Only afterward, if proven useful, can the approximate concepts be refined into something more exact. Richard Feynman (1967) observed:

The only utility of science is to go on and try to make guesses. We always must make statements about regions we have not seen, or the whole business is of no use. In order to avoid simply describing experiments that have been done, we have to propose laws beyond their observed range. There is nothing wrong with that, despite the fact that it makes science uncertain. If you thought before that science was certain, well, that is just an error on your part.

Similarly, Max Planck (1933) wrote:

In every science it occasionally happens that there arises a conflict between two classes of people whom I may designate respectively as purists and pragmatists. The former strive always after a perfect coordination of the accepted axioms, submitting them to even more and more rigid analysis, for the purpose of eliminating every contingent and foreign element. On the other hand, the pragmatists try to amplify the accepted first principles by the introduction of new ideas and thus send out feelers in all directions for the purpose of making progress. They do not mind if the mongrel be mated with a pure-bred, provided something can be achieved through the combination, which otherwise could not be achieved. The purist sticks to his logical weapons. He takes his stand on logical deductions from the accepted principles of science, whereas the pragmatist scientist is striking out into new ground; and in order to open that up he must break away from the logical line of the old ideas. The pragmatist must face failure again and again, and is always open to jibes of the orthodox “I told you so.” What the puritan objects to is the introduction of new ideas from outer sources. Now, no theorem or working hypothesis can arise ready-made. Every hypothesis which eventually has proved to be useful and to have led to valuable discoveries at first occurred only vaguely to the mind of its inventor.

B. Scope of the Article, Nature of Structural Data

In this article, we discuss the diverse computer-aided techniques that have been applied to analysis of the immune structure-function relationship, and put the main results obtained with these methods in a broader perspective. As such, our article is necessarily a personal account of these analyses and another scientist would have written a different overview (some of them, in fact, have: Padlan, 1994; Webster et al., 1994). However, we have striven to make the presentation balanced enough so that the reader can form his or her own opinion on the matter.

Structural data are the bedrock on which computational immunology stands and we start with general comments on the nature of the data. The three-dimensional structures come to us as Cartesian coordinates of molecules determined by X-ray crystallography or, more recently, multidimensional (hetero)nuclear magnetic resonance (NMR) spectroscopy [e.g., the solution structure of an isolated VL domain by Constantine et al. (1994)]. These protein and DNA structures (available through the Brookhaven Protein Data Bank; see Bernstein et al., 1977) represent atomic models best fit to electron density (X-ray) or interatomic distance (NMR) data. These models can be gauged as to their accuracy and precision. Accuracy is a measure of the closeness with which a calculation reproduces the true structure. Precision, on the other hand, is a measure of the reproducibility of measurements (e.g., distance constraints) or calculations (e.g., simulated annealing). Nominal resolution, the crystallographic residual factor (R factor), and the B factor¹ (Ladd and Palmer, 1985) are values describing crystallographic accuracy and precision, which may approach fractions of angstroms for well-resolved structures (less than 2.0 Å nominal resolution). Estimates of the precision and accuracy of NMR-determined protein structures (Zhao and Jardetzky, 1994) put the accuracy of a family of simulated annealing-derived coordinates at about 1 Å even if the errors in distance constraints are smaller than 1 Å. The precision of the structures appears to be nearly insensitive to the quality of NMR data and is, at best, on the order of 1-2 Å, although a precision of 0.4-0.7 Å is technically attainable. The X-ray and NMR structures represent thermodynamic equilibrium average structures, or what Gregorio Weber (1975) would call a “bulk” protein:

The time-average structure observed by X-ray diffraction is something of an abstraction, since it is not itself widely, or even sparsely, represented at any given time in the population of molecules. I mean by this that if we were to take an instantaneous picture of the molecular population showing us all the coordinates of the atoms for each individual molecule we would have difficulty in finding one that will match the average in all respects, although most of the molecules will have most features in common with it. Indeed the protein molecule model resulting from the X-ray crystallographic observations is a “platonic” protein, well removed in its perfection from the kicking and screaming “stochastic” molecule that we infer must exist in solution. The great importance of the former lies in that it has permitted us to see the origin of the “bulk properties” of the protein, which result from averaging over the whole population.

As will become apparent (Sections II,C and VI,B), the results of certain computer experiments (least-squares superpositions in particular) may vary according to the nominal resolution of the structures under study. Common sense dictates that greater credence is due the results obtained with high-resolution structures (less than 2.0 Å resolution).

¹ Nominal resolution gives the shell radius, in reciprocal space, to which the diffraction data (reflections) were collected and used to calculate the electron density. The R factor gives the fraction of electron density unaccounted for by the final structural model. The (isotropic) B factor (Debye-Waller, temperature factor) gives the apparent size of a spherical atom and becomes larger with increasing fuzziness of the diffraction image of an atom.

II. TOOLS OF COMPUTER ANALYSIS

Protein structures being sets of three-dimensional Cartesian coordinates (x, y, z) of thousands of N, C, S, O, and H atoms, any convenient and meaningful manipulation of these structures can be achieved only on a computer. This is true both for simple translations, rotations, and zooming in real space, and for more abstract, involved operations such as generation and examination of molecular surfaces, simulation of atomic movements, least-squares superposition of molecules, and other molecular modeling procedures per se.

One can speak of three levels of computer modeling approaches. On the most basic level, computers provide a convenient viewing interface for scientists contemplating atomic details of a set of least-squares-superposed molecules as, e.g., in the works of Lesk and Chothia (1982). The next level consists of software programs that allow structure manipulations “by hand” (e.g., building up immune molecules from homology, as Pumphrey did in 1986). The third level provides for computational evaluation of protein energetics and stereochemistry via a potential function or a force field, i.e., a set of equations that define the optimal state of a protein polymer (e.g., Bruccoleri and Karplus, 1990). Although calculations per se are better defined and more objective than hand manipulation of molecules, it does not necessarily mean that the less precise methods are less useful. Even the highest precision means nothing if the physical concepts embodied in the software are not sound, or are not appropriate for the situation at hand. All physical models are approximations and the user should well understand their limitations. “Garbage in, garbage out,” is the notorious adage of computer scientists. One of the goals of this article is to make the inherent conceptual limitations of our methods more explicit.

A. Protein Potential Energy

In order to build and manipulate protein structures in a computer, the chemical and geometric aspects of the structure, such as bonds, angles, torsions, and atomic radii, have to be mathematically expressed and encoded in a program. The field dealing with the development and usage of such programs has become known as molecular mechanics. Molecular mechanics has its origin in X-ray diffraction of proteins. The Fourier-transformed diffraction data yield an electron density map that needs to be fitted with an amino acid sequence. The crude three-dimensional protein model is then refined, its stereochemistry regularized, poor atomic overlaps corrected, etc. For the purpose of model refinement, Levitt and Lifson (1969) developed a set of formulas that explicitly describe the potential energy of an atom in a protein. The total energy of the system can be obtained by summing over all the atoms. First and second derivatives of this empirical energy potential can be calculated from the formulas, and the total energy of the molecule minimized by moving all the atoms along the potential energy gradient.

It is important to realize that the potential energy function of Levitt and Lifson, and the other early force fields [ECEPP (Warme and Scheraga, 1974), Hagler’s formulations (Hagler et al., 1974) embodied in DISCOVER, Hermans and McQueen (1974), MM2 (Allinger, 1977), CHARMM (Brooks et al., 1983), AMBER (Weiner et al., 1986), GROMOS (van Gunsteren and Berendsen, 1987), and OPLS potentials (Jorgensen and Tirado-Rives, 1988)], were merely a means of building an abstract wire model of a molecule. The potential specifies how long ($d_0$) the wires representing atomic bonds are, how strong they are ($K_{\mathrm{bond}}$), and how they stretch with Brownian thermal motions (harmonically, for mathematical convenience):

$$E_{\mathrm{bond}} = K_{\mathrm{bond}}\,(d - d_0)^2 \tag{1}$$

Also specified are the correct values ($\theta_0$) of angles connecting three atoms and how soft these angles are ($K_{\mathrm{angle}}$):

$$E_{\mathrm{angle}} = K_{\mathrm{angle}}\,(\theta - \theta_0)^2 \tag{2}$$

as well as how easy it is to turn the torsions ($\phi$) and where the torsional minima lie:

$$E_{\mathrm{torsion}} = K_{\phi}\,(1 - \cos\phi) \tag{3}$$

The molecular mechanics force field also makes sure that double bonds are planar and that the correct values for the other, “improper” torsions, $\omega$ (e.g., chiral atom stereochemistry), are enforced:

$$E_{\mathrm{improper}} = K_{\omega}\,(\omega - \omega_0)^2 \tag{4}$$

It specifies how big the “balls” (defined by radii, $r$) representing atoms are and how hard they are:

$$E_{\mathrm{vdW}} = \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \tag{5}$$
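As a rough numerical illustration, the terms of Eqs. (1)-(5) can be written as small Python functions. This is only a sketch under stated assumptions: it takes the harmonic, cosine, and 12-6 forms described in the text at face value, the function names are hypothetical, and all force constants and distances are arbitrary illustrative numbers rather than parameters of any published force field. The last function mimics, in one dimension, the idea of lowering the energy by moving along the potential energy gradient.

```python
import math

def bond_energy(d, d0, k_bond):
    """Eq. (1): harmonic stretching of a bond of length d around d0."""
    return k_bond * (d - d0) ** 2

def angle_energy(theta, theta0, k_angle):
    """Eq. (2): harmonic bending of a three-atom angle (radians)."""
    return k_angle * (theta - theta0) ** 2

def torsion_energy(phi, k_phi):
    """Eq. (3): cosine torsional term (phi in radians)."""
    return k_phi * (1.0 - math.cos(phi))

def improper_energy(omega, omega0, k_omega):
    """Eq. (4): harmonic restraint on an improper torsion (radians)."""
    return k_omega * (omega - omega0) ** 2

def vdw_energy(r, a, b):
    """Eq. (5): 12-6 Lennard-Jones term for one atom pair at distance r."""
    return a / r ** 12 - b / r ** 6

def lj_minimum(a, b):
    """Distance where Eq. (5) is lowest; dE/dr = 0 gives r = (2A/B)^(1/6)."""
    return (2.0 * a / b) ** (1.0 / 6.0)

def minimize_1d(energy, x, step, n_steps, h=1e-5):
    """Steepest descent in one variable using a central-difference gradient."""
    for _ in range(n_steps):
        gradient = (energy(x + h) - energy(x - h)) / (2.0 * h)
        x -= step * gradient
    return x

if __name__ == "__main__":
    # Illustrative numbers only (not taken from any real parameter set).
    k_bond, d0 = 300.0, 1.5
    print("stretched bond energy:", bond_energy(1.6, d0, k_bond))
    print("relaxed bond length:",
          minimize_1d(lambda d: bond_energy(d, d0, k_bond), 1.8, 0.001, 200))
    a, b = 1.0e6, 1.0e3
    r_min = lj_minimum(a, b)
    print("Lennard-Jones minimum at r =", r_min, "energy =", vdw_energy(r_min, a, b))
```

In a real molecular mechanics program these terms are summed over all bonds, angles, torsions, and nonbonded atom pairs, and the gradient is taken with respect to every atomic coordinate rather than a single variable.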

Equation (5) implies that nonbonded atoms interact by pairwise Lennard-Jones potentials. In Eq. (5), $r_{ij}$ stands for the distance separating the $i$th and the $j$th atoms. Their interaction consists of a weak attraction [London dispersion forces falling off with the sixth power of distance (London, 1930)] and a steep repulsive van der Waals (vdW) barrier (the $r_{ij}^{-12}$ term). Pairwise interactions are also used to describe electrostatic interactions between electrically charged atoms and formally neutral but dipolar groups with measurable partial atomic charges, $q_i$ and $q_j$:

$$E_{\mathrm{elec}} = \frac{q_i\,q_j}{\varepsilon\,r_{ij}}$$

where $\varepsilon$ is the dielectric constant.

It was quickly realized that the utility of the potential energy function was much broader than a mere real-space refinement. Via energy minimization and, better still, Monte Carlo conformational searches (Tanaka and Scheraga, 1975), researchers hoped to arrive at the natural structure of protein models; after all, the natural structure is the lowest energy structure. Taken at face value, an exact potential energy function and powerful computers could calculate correct structures and, implicitly, the biological properties of proteins (Levitt and Warshel, 1975). After more than 20 years, the challenge and excitement of this proposition is still with us and it remains essentially unsolved.

What is missing from the molecular mechanics approach? First, the potential energy of a protein in vacuo is not a very good approximation of the free energy of the biological system: the protein in physiological solution. The molecular mechanics force field was really meant to describe a mechanical ball-and-stick model built by crystallographers: impenetrable spheres of atoms with rigid bonds and somewhat more flexible angles, and with rotatable torsional degrees of freedom. No account of protein-solvent interactions is given. Second, even if the potential energy function is correct, the problem of finding the global energy minimum is an enormous one. Much activity has focused on the development and testing of free energy potentials that would be more realistic measures of the stability of a folded protein in an aqueous environment (Novotny et al., 1984; Eisenberg and McLachlan, 1986; Sippl, 1990; see Section VII,C). Another approach is to include water molecules in the system explicitly and simulate properties of the complete solvent-solute ensemble. This approach, however, is very demanding on computing power, requiring the manipulation of tens of thousands of atoms, and always faces the fundamental question of reaching a good equilibration and accomplishing a sampling of states sufficient to represent a rigorous thermodynamic average (Chandler, 1987).

B. Surfaces and Volumes

Computer-based analyses of protein surfaces and volumes (Richards, 1977) greatly contributed to our understanding of protein structure and function in general. Specific immunochemical applications included the identification of effector sites (see Section V,A), the determination of molecular correlates of antigenicity (Section VIII), and binding energy estimates (Section VII,C). In a landmark paper, Lee and Richards (1971) introduced the concept of solvent-accessible and contact surfaces, as defined by a spherical probe that rolls over the protein surface (Fig. 2). Thus, the accessible and contact surfaces of the same protein, effectively defined by differently sized probes, may vary and may have somewhat different properties. Large probes (r = 10 Å) that can sample, by direct contact, only the most protruding parts of a protein surface “see” a surface with an atomic composition rather different from that seen by a smaller, water-sized probe (r = 1.4 Å) (Table I).

FIG. 2. Accessible surface algorithm. A planar section of protein surface is shown, with atoms exposed to the outside solvent numbered from 1 to 12. Two spherical probes with different radii, R2 > R1, are rolled over the surface and define the accessible surfaces as (continuous) lines traced by their centers. Their contact surfaces are the (discontinuous) lines of contact with the individual atoms. Their reentrant surfaces are the (discontinuous) lines that fail to make contact with the probe. Together, the contact and reentrant surfaces define the (continuous) molecular surface. Note that, as the probe radius goes to infinity, so does the accessible surface while the contact surface converges to a small value. (Reproduced with permission from Richards, 1977. Annu. Rev. Biophys. Bioeng. Vol. 6. © 1977 by Annual Reviews, Inc.)

The concept of solvent accessibility inspired a large body of work. The studies by Chothia (1974) and others established proportionality between the Lee and Richards solvent-accessible surfaces and the magnitude of the hydrophobic effect of the solute. The Lee and Richards (1971) algorithm also became the basis of various programs for the display of molecular surfaces (Fig. 3), most notably the dot surface diagrams due to Connolly (1983) and the “plaster” surface approximations of the program GRASP (Nicholls, 1992). Richmond (1984) presented an analytical surface representation employing differential geometry and allowing for some measure of surface minimization. More recently, Pascual-Ahuir and Silla (1990), in their program GEPOL, gave us a tool for the generation of a molecular surface (see Fig. 2) and proposed that it is a more direct correlate of hydrophobicity than the solvent-accessible surface is (Tuñón et al., 1992).

TABLE I
Accessibility of Amino Acid Residues to Probes of Different Radii (a)

Probe radius = 1.4 Å               Probe radius = 11.4 Å
Amino acid   Accessibility (Å²)    Amino acid   Accessibility (Å²)
Arg          101                   Lys          223
Lys          103                   Arg          219
Gln           79                   Gln          152
Glu           66                   Asn          115
Asn           65                   Glu          115
Tyr           61                   Asp          104
Asp           59                   Pro           92
Pro           53                   Ser           79
Thr           46                   Tyr           77
Ser           43                   Thr           74
His           41                   His           40
Trp           32                   Ala           36
Ala           28                   Gly           33
Gly           25                   Val           22
Val           25                   Leu           20
Leu           22                   Met           14
Met           22                   Ile           14
Ile           20                   Trp           14
Phe           18                   Cys            5
Cys           13                   Phe            5

(a) Adapted from Novotny et al. (1987). The values represent averages of calculations performed on 11 single-domain proteins.

Richards (1974) and Finney (1975) described computer algorithms that partitioned the space occupied by a protein into atomic volumes based on Voronoi polyhedra (Fig. 4). This allowed the local packing density of proteins to be determined as ~0.75, a very high density indeed, approaching that of closely packed ideal spheres. Volume (Connolly, 1985; Stouch and Jurs, 1986) and surface partitioning is an important component of various computerized docking procedures (Kuntz and Crippen, 1979; Connolly, 1986; Gregoret and Cohen, 1990, 1991; Jiang and Kim, 1991; Cherfils et al., 1991) that determine the best shape complementarity in protein-ligand pairs.

FIG. 4. Volumes occupied by protein atoms (Voronoi polyhedra). (A) A two-dimensional sketch of the Voronoi algorithm. Points show centers of atoms. Vectors are drawn from an atom to all its neighbors within a sphere of 8 Å radius. Planes perpendicular to the vectors, and normal to the van der Waals radii of the contacting atoms, are constructed, defining the smallest polygon (a polyhedron in three dimensions, the Voronoi polyhedron) that encloses the central atom. The ratio of the atomic van der Waals volume to that of the Voronoi polyhedron is the packing density of the atom. (B) The Voronoi polyhedron of the Oγ atom in a serine residue. Atoms are shown as spheres with radii one-quarter of their van der Waals radii. [Reprinted with permission from Harpaz et al. (1994).]

The importance of inter- and intraprotein packing to protein function and stability has been hotly debated, with the focus on the following issue: Is tight packing of complementary side chains the decisive determinant of protein folds [i.e., the “jigsaw puzzle” or “watchmaker” paradigm of Harpaz et al. (1994)], or could tight packing be readily achieved with any nonspecific assemblage of side chains [e.g., the “nuts-and-bolts-in-a-bag” paradigm of Bromberg and Dill (1994)]? Evolutionary evidence seems to favor the latter view. Side-chain-side-chain contacts in proteins of similar three-dimensional structure show high variability (Russell and Barton, 1994), often with as few as 12% common contacts and virtually no conservation of energetically favorable side-chain-side-chain interactions. Gerstein et al. (1994) analyzed volume changes in side chains occurring in protein evolution and found that only about half of the protein cores strongly conserved their volume (to within 10% variation). The rest of the positions showed various degrees of variation that, in some core sites and in nearly all surface sites, approached random variation (28% and more). An inspection of interfaces in protein-protein (Krystek et al., 1993) and, particularly, in antibody-antigen (Lawrence and Colman, 1993) complexes revealed relatively loose complementarity and enough empty space to suggest that, on the whole, the main role of (approximate) shape complementarity may be expulsion of water from hydrophobic interfaces and/or modulation of interprotein interactions by surface-prebound water molecules (see Section VII,C,3), rather than precisely engineered atomic interactions and steric repulsions.
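The rolling-probe definition of the accessible surface introduced earlier in this section lends itself to a compact numerical approximation. The sketch below is not the original Lee and Richards slicing algorithm; it assumes the later point-sampling idea of Shrake and Rupley (1973), in which test points are spread over each atom's probe-expanded sphere and the fraction not buried inside any neighboring expanded sphere is counted. The function names, the number of test points, and the example radii are illustrative assumptions.

```python
import math

def sphere_points(n):
    """Roughly uniform points on the unit sphere (golden-spiral construction)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        ring = math.sqrt(1.0 - z * z)
        points.append((ring * math.cos(golden_angle * k),
                       ring * math.sin(golden_angle * k),
                       z))
    return points

def accessible_surface(atoms, probe=1.4, n_points=960):
    """Per-atom accessible surface area.

    atoms: list of (x, y, z, vdw_radius) tuples, all lengths in angstroms.
    The surface is the one traced by the probe center, so each atom is
    treated as a sphere of radius vdw_radius + probe.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        expanded = ri + probe
        exposed = 0
        for ux, uy, uz in unit:
            px = xi + expanded * ux
            py = yi + expanded * uy
            pz = zi + expanded * uz
            buried = False
            for j, (xj, yj, zj, rj) in enumerate(atoms):
                if j == i:
                    continue
                cutoff = rj + probe
                if (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < cutoff ** 2:
                    buried = True
                    break
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * expanded ** 2 * exposed / n_points)
    return areas

if __name__ == "__main__":
    # Two overlapping atoms with an illustrative radius of 1.8 angstroms.
    pair = [(0.0, 0.0, 0.0, 1.8), (1.5, 0.0, 0.0, 1.8)]
    for probe in (1.4, 10.0):
        print("probe radius", probe, "areas", accessible_surface(pair, probe=probe))
```

Running the example with a water-sized and a large probe shows how the same atoms present different accessible areas to differently sized probes, which is the effect summarized in Table I. The double loop over atoms is quadratic in the number of atoms; practical programs restrict the inner loop to spatial neighbors.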

C. Structural Superpositions

Statistical analysis of the available Brookhaven Protein Data Bank entries has indicated that there may be no more than about 1000 distinct protein-fold families (Chothia, 1992), and that the many millions of protein amino acid sequences may merely repeat the same folding motifs over and over again. Although the three-dimensional dissimilarity of two proteins parallels their sequence diversity (Chothia and Lesk, 1986), proteins with no sequence similarity may still share the same fold. Three-dimensional similarities that persist in proteins despite sequence differences accumulated in evolution can be quantitatively analyzed by pairwise structural superpositions. The measure of overall correspondence of the two compared sets of atoms is the root-mean-square (rms) deviation, expressed in Å.
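For N equivalenced atom pairs, the rms deviation is conventionally the square root of the mean squared distance between paired atoms after the best rigid-body fit. The Python sketch below illustrates one way to obtain that fit. It is an assumption-laden example rather than any of the specific formalisms cited in this section: it uses a quaternion eigenvector formulation, chosen because it needs no linear-algebra library, finds the leading eigenvector by simple power iteration, and gives every atom pair unit weight. The function names and test coordinates are hypothetical.

```python
import math

def centroid(points):
    """Arithmetic mean of a list of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))

def superpose(source, target, iterations=500):
    """Best rigid-body fit of source onto target for pre-equivalenced pairs.

    Returns (rotation, translation, rms) such that
    rotation * source_atom + translation approximates the paired target atom.
    """
    cs, ct = centroid(source), centroid(target)
    p = [tuple(a[k] - cs[k] for k in range(3)) for a in source]
    q = [tuple(b[k] - ct[k] for k in range(3)) for b in target]

    # Correlation matrix: s[a][b] = sum over atom pairs of p_a * q_b.
    s = [[sum(pi[a] * qi[b] for pi, qi in zip(p, q)) for b in range(3)]
         for a in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s

    # Symmetric 4x4 matrix whose leading eigenvector is the optimal quaternion.
    n4 = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]

    # Power iteration on the shifted matrix; the shift makes every eigenvalue
    # non-negative, so the iteration converges to the largest one.
    shift = max(sum(abs(v) for v in row) for row in n4) + 1e-12
    quat = [1.0, 0.5, 0.25, 0.125]
    for _ in range(iterations):
        nxt = [sum(n4[r][c] * quat[c] for c in range(4)) + shift * quat[r]
               for r in range(4)]
        norm = math.sqrt(sum(v * v for v in nxt))
        quat = [v / norm for v in nxt]
    w, x, y, z = quat

    rotation = [
        [w * w + x * x - y * y - z * z, 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), w * w - x * x + y * y - z * z, 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), w * w - x * x - y * y + z * z],
    ]
    translation = tuple(ct[r] - sum(rotation[r][c] * cs[c] for c in range(3))
                        for r in range(3))

    squared = 0.0
    for a, b in zip(source, target):
        for r in range(3):
            fitted = sum(rotation[r][c] * a[c] for c in range(3)) + translation[r]
            squared += (fitted - b[r]) ** 2
    return rotation, translation, math.sqrt(squared / len(source))

if __name__ == "__main__":
    atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.0, 0.0),
             (0.0, 0.0, 2.5), (1.0, 1.0, 1.0)]
    # Same atoms rotated by 90 degrees about z and shifted: rms should be ~0.
    moved = [(-y + 3.0, x - 1.0, z + 0.5) for x, y, z in atoms]
    print(superpose(atoms, moved)[2])
```

The atom pairing is supplied by the caller, so the result depends entirely on which atoms are declared equivalent. Power iteration is used only to keep the sketch dependency-free; a library eigenvalue or singular value routine would be the usual choice in practice.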

McLachlan (1972), Diamond (1976), Kabsch (1976), and Rossmann and Argos (1975, 1976) introduced formalisms for finding the best rotation to fit a given set of atoms to a target set of coordinates. Structural motifs (sets of backbone and/or side-chain atoms) are first identified in the two structures being compared, for example, residues participating in heme binding in globins and in cytochromes. These are then used to carry out the best rigid-body rotation and translation that matches the Cartesian coordinates of one atomic set (the source) to those of the other atomic set (the target) such that the (weighted) sum of squared deviations is minimized. In the McLachlan (1972) formulation, given the two sets of N coordinate vectors $\mathbf{a}_a$ and $\mathbf{b}_a$ ($a = 1, \ldots, N$), we seek an orthogonal rotation matrix $R$ and a translation $\mathbf{t}$ that, applied to the coordinates $a_{ia}$ ($i = 1, 2, 3$), minimize the residual, i.e., the sum over atoms of the squared deviations from $b_{ia}$, each multiplied by an arbitrary weight $w_a$. Note that this formulation implies equivalence of predefined pairs of atoms in the two structures. Several powerful algorithms were developed that find the best fit for two sets of atoms (structures) through systematic application of rotations and translations in the three-dimensional Cartesian space (Rossmann and Argos, 1975; Diamond, 1976; Lesk, 1991).

It is important to realize that there is no a priori correct way of equivalencing protein structures and parts thereof. In current practice, the selection and definition of groups of superposed atoms implicitly contain a hypothesis of the cause of structural similarity; at the same time, the hypothesis is being tested by the superposition and the ensuing analysis of the results. For example, in order to measure differences in relative orientation of the VL and VH domains in different antibodies, it is possible to equivalence Cα atoms of one domain (e.g., VL) by least-squares superpositions, and to calculate the rotational-translational matrix required to best superpose the other domain pair (e.g., VH). However, given the existence of the invariant motifs at the VL-VH interface (Section IV,A), we may argue that a more meaningful way of equivalencing pairs of dimeric VL-VH modules would be to least-squares superpose the six aromatic rings of the conserved “herringbone” cluster and the pair of Gln residues forming interdomain side-chain hydrogen bonds, rather than the complete domain backbones (Novotny and Sharp, 1992; Bajorath et al., 1995). It is reasonable to assume that the solvent-exposed β sheets are unimportant in formation of the interface and that by their excessive sequence (mass) variations, they may obscure the best alignment of the key interface-forming segments.

III. STRUCTURES, SEQUENCES, AND SUPERFAMILIES

The relationship between amino acid sequence and three-dimensional structure is often described as a stereochemical code that transposes primary protein structures into tertiary protein structures (Epstein, 1964, 1966). The code is degenerate: Although the chemical structure (amino acid sequence) of the protein is known to determine the three-dimensional structure (Anfinsen and Haber, 1961; White, 1961), many different sequences can adopt the same fold. From the point of view of the stereochemical code, structural variability of immunoglobulin polypeptide chains offers an interesting, well-studied example of structural degeneracy exploited by nature for generation of the immune repertoire. It is generally accepted that each immunoglobulin sequence represents a unique antigenic specificity. Indeed, by refolding a denatured immune Fab and


demonstrating that it had regained its original specificity, Haber (1964) showed for the first time that the information of immune specificity resides in the amino acid (and, implicitly, in the DNA) sequence alone. In this section, we will consider sequence similarity from the general perspective of polymer solubility and from the point of view of an immunoglobulin superfamily of structures.

A. Gross Immunoglobulin Structure

It is of functional significance that immunoglobulins have a modular design (Hill et al., 1966; Singer and Doolittle, 1966; Edelman, 1970; Fig. 5). Both the light (25,000 Da) and the heavy chains (50,000 Da) are composed of several domains, each of which consists of approximately 110 amino acid residues. The antigen binding function is concentrated in the amino terminal, the so-called variable domains of both the light and heavy chains (Hilschmann and Craig, 1965; Titani et al., 1965). It has been demonstrated experimentally that the isolated variable domains can refold without changing the rest of the polypeptide chain (Hochman et al., 1973). Antibody modules share the same architectural motif as noncovalently formed domain dimers. The domains themselves are formed by two antiparallel β sheets closely packed face to face (Richardson, 1981; Lesk and Chothia, 1982; Novotny et al., 1983; see Section III,C). The basic building blocks of immunoglobulin domains are therefore β-strand segments and reverse loops connecting these strands. The domains are linked together by rather extended “hinge” or “switch” peptides long enough to permit the movement of domains with respect to each other. Comparison of currently available X-ray structures of Fab fragments shows clearly that the “elbow” angle between the long axes of the VL-VH and CL-CH1 dimeric modules can vary from essentially extended (180°) to quite sharp (100°). Flexibility of the hinge between the Fab and the Fc is well documented, e.g., by the electron microscopic studies of Valentine and Green (1967) and the hydrodynamic studies of Tanford and co-workers (Noelken et al., 1965; Fig. 5).
Some immunoglobulin subclasses (e.g., the mouse IgG2b) are more flexible than others (e.g., the relatively rigid mouse IgG1), and the polypeptide segments crucial to flexibility have been mapped by domain swap experiments (see Section IX,A) to the hinge and the CH1 loop 131-139 (Schneider et al., 1988). Contrast-matching, small-angle scattering of neutrons determined the mean antigen-antigen distance in bivalent mouse IgG1, IgG2a, and IgG2b molecules as 11.7-12.4 nm, with a large variance (~4 nm; Sosnick et al., 1992). For all three subclasses, the scattering data could be fitted only with a distribution of distances rather than with a single distance. Thus, antibody molecules in solution sample a large selection of hinge and elbow angles at any given moment, a structural trait undoubtedly important for efficient antigen engagement.

FIG. 5. Various models of a complete IgG molecule. (Left) A sketch of covalent structure showing the light (L) and heavy (H) chains, interchain disulfide bonds (SS), and approximate dimensions of the Fab and Fc fragments. Variable domains are hatched. (Middle) Model derived from hydrodynamic studies by Noelken et al. (1965). A and B chains refer to the H and L chains, respectively; fragments I and II are the Fab fragments, and fragment III is the Fc fragment. (Right) α-Carbon tracing of a complete IgG molecule (Harris et al., 1992). Light line, L chain; heavy line, H chain.

B. Polymer Solubility and Amino Acid Sequence Variability

The conformational states of a polymer in solution (random coil or folded, compact or extended) depend on a relative balance of polymer-polymer and polymer-solvent interactions (Flory, 1969). The fact that in proteins many different sequences adopt the same compact fold strongly suggests that there is one gross (solubility) property of natural amino acids that overwhelmingly determines the structure of the folded state (Dill, 1990). Polypeptide chains that may have only 1 or 2 amino acids out of 10 in common, but conserve the sequence distribution of hydrophobic and hydrophilic residues, retain the same fold (Bashford et al., 1987), suggesting that the hydrophobicity pattern of an amino acid sequence is the most likely determinant of protein structure. The computer experiments of Chan and Dill (1990), enumerating the complete conformational spaces of two-dimensional polymers composed of two types of residues (black and white, hydrophobic and hydrophilic), reproduced many of the features of folded proteins, in particular the existence of unique compact structures attainable by different sequences, the families of related sequences, and the secondary structure patterns. It is often asked whether the native protein structure is the one of global free energy minimum, and whether kinetic aspects of protein folding are not equally important (Baker and Agard, 1994). Although no definitive answer is currently available, some recent results emphasize the importance of a thermodynamic minimum in the determination of native folds. Sippl (1990), Bryant and Lawrence (1991), Bowie et al.
(1991), and others developed pseudo potentials that capture, in numerical form, important features of the known protein structures, e.g., the averages and distributions of pairwise residue-residue distances, solvent exposures of all 20 amino acids, distances (and implicitly, electrostatic interactions) of formally charged residues, and the character of the immediate neighborhood of side chains in various folds. Native amino acid sequences examined with these pseudo potentials virtually always show a distinct energy minimum associated with only one, the native, fold. Misfolded structures (i.e., fits of the sequence into alternative folds of other proteins) yield energies that are distinctly less favorable. Likewise, the computer folding experiments of Sali et al. (1994), with proteinlike simplified polymers on a three-dimensional lattice, seemed to imply that the sole criterion for a fast folding step toward compact, nativelike structures (molten globules) was the existence of a deep global minimum in the free energy landscape.
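
The kind of exhaustive lattice enumeration described above can be illustrated with a toy calculation. The sketch below is not the procedure of Chan and Dill (1990); it is a minimal illustration that assumes a two-letter (H/P) alphabet on the square lattice, an energy of -1 per nonbonded H-H contact, and a chain of only eight residues. All function names are introduced here for illustration.

```python
from itertools import product

# Unit steps on the two-dimensional square lattice.
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def conformations(n):
    """Yield self-avoiding walks of n beads (n >= 2), one per symmetry class.

    The first bond is fixed along +x (removes rotations) and the first
    step off the x axis must point to +y (removes mirror images).
    """
    if n < 2:
        raise ValueError("need at least two beads")

    def extend(path, turned):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            if not turned and dy == -1:
                continue  # mirror image of a walk that turns to +y
            nxt = (x + dx, y + dy)
            if nxt in path:
                continue  # lattice site already occupied
            path.append(nxt)
            yield from extend(path, turned or dy != 0)
            path.pop()

    yield from extend([(0, 0), (1, 0)], False)


def hh_contacts(seq, conf):
    """Count H-H pairs that are lattice neighbors but not chain neighbors."""
    index = {site: i for i, site in enumerate(conf)}
    count = 0
    for i, (x, y) in enumerate(conf):
        if seq[i] != "H":
            continue
        # Looking only toward +x and +y counts every pair exactly once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index.get((x + dx, y + dy))
            if j is not None and seq[j] == "H" and abs(i - j) > 1:
                count += 1
    return count


def ground_state(seq, confs):
    """Return (maximum H-H contacts, number of conformations reaching it)."""
    best, degeneracy = -1, 0
    for conf in confs:
        k = hh_contacts(seq, conf)
        if k > best:
            best, degeneracy = k, 1
        elif k == best:
            degeneracy += 1
    return best, degeneracy


if __name__ == "__main__":
    n = 8  # assumed chain length, small enough for full enumeration
    confs = list(conformations(n))
    unique = [
        "".join(s)
        for s in product("HP", repeat=n)
        if ground_state("".join(s), confs)[1] == 1
    ]
    print(len(confs), "conformations,", len(unique),
          "sequences with a unique lowest-energy conformation")
```

Sequences whose lowest-energy state is reached by exactly one conformation are the lattice analogues of sequences with a unique native fold; grouping them by that conformation would show several sequences sharing one structure.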


C. Immunoglobulin Superfamily

Lesk and Chothia (1982) were the first to define the core of immunoglobulin domains (Vκ, Vλ, Vγ, Cλ, Cγ1, Cγ2, Cγ3) on the basis of least-squares comparisons of available X-ray structures. Although the core of each of the domains consisted of about 35-36 homologous residues, forming two β sheets packed face to face, only 3 residues were common to all the domains: 2 cysteines that formed a disulfide bridge between the β sheets, and a tryptophan that packed against them (Fig. 6). The other interior residues tended to retain hydrophobic character but varied greatly in size. Williams and Barclay (1988), Harpaz and Chothia (1994), and others defined groups of amino acid sequences designated the immunoglobulin superfamily (IgSF). In the superfamily, sequence similarity ranges from a clear homology, virtually guaranteeing an identical fold (about 40% identical residues or more), to weak similarities (10-15%), and a putative three-dimensional structure for the more controversial members of the family was debated for some time. For example, a folding motif radically different from that of immunoglobulins was proposed for the CD2 antigen: the α/β fold of this molecule (Clayton et al., 1987) was seemingly favored by circular dichroism measurements showing a significant proportion of α helix in the CD2 extracellular domain(s), whereas Williams et al. (1987) argued CD2 similarity with immunoglobulins on the basis of sparse, conserved sequence motifs. X-ray (Jones et al., 1992) and NMR studies (Driscoll et al., 1991; Withka et al., 1993) confirmed the presence of an immunoglobulin fold in the CD2 leukocyte antigen. Given the marginal sequence similarity of CD2 to immunoglobulins, elucidation of this structure constituted strong support for the concept of an immunoglobulin superfamily. It appears that, within the immunoglobulin family, the number of constituent β strands may vary but the general folding topology, i.e., the Greek key motif (Richardson, 1981), remains conserved.
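
The percent-identity figures quoted above for superfamily members can be made concrete with a short calculation. The sketch below assumes that the two sequences have already been aligned (gaps written as "-") and that identity is counted over columns in which neither sequence has a gap; other denominators are in use, so this convention is an assumption. The two aligned fragments are hypothetical and serve only as input.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identical residues over gap-free columns of an alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    first = "CQASGYTFTW"
    second = "CKASG-SFSW"
    print(round(percent_identity(first, second), 1), "% identity")
```

A value in the 10-15% range obtained in this way does not by itself establish or exclude a common fold, which is why the CD2 assignment had to be settled by structure determination.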
According to Harpaz and Chothia (1994), the current superfamily tree can be subdivided into sets, i.e., clusters of domains that are structurally more similar to each other than to a member of any other set. Sets are distinguished by the number of β strands in the two individual sheets and their length (Fig. 6). An antiparallel β sandwich is one of the commonest protein folds observed, and Bork et al. (1994) used the computer program DALI to carry out a complete three-dimensional classification of the known β-sheeted structures. The common structural core of four β strands, B, C, E, and F, was found to be shared by nine distinct families. Subclass distinctions arose as additional β strands were appended to the core. Disulfide bridges were not necessarily invariant in number and location within the subclasses. The four major topological subtypes were described as (1) the c-type, i.e., classical seven-stranded, Ig constant-domain topology; (2) the s-type, a seven-stranded “switched” type with strand topology E-B-A and G-F-C-C’; (3) the h-type, a hybrid between (1) and (2) where the strand C’/D is kinked and hydrogen-bonds to the strand C; and (4) the v-type, a nine-stranded type typified by variable domains. In this broad definition, the immunoglobulin fold comprised proteins as diverse as the growth hormone receptor domain 2, fibronectin, neuroglian domains 1 and 2, fungal galactose oxidase domain 3, the PapD protein domain 1, and cyclodextrin glycosyltransferase domain D. In all these folds, the core motif of four antiparallel β strands conserved its tight packing, strand curvature, and sheet-sheet angle.

FIG. 6. Folding motifs of the immunoglobulin superfamily (Williams and Barclay, 1988; Harpaz and Chothia, 1994). (Top left) V domain fold (the VH domain of the McPC 603 Fab fragment) with two four-stranded β sheets. (Top right) C1 domain fold (the CL domain of the McPC 603 Fab fragment), a β sandwich with a four-stranded and a three-stranded β sheet with the strand topology A-B-D-E and C-F-G. The core of the immunoglobulin domain (Lesk and Chothia, 1982), i.e., the intradomain disulfide and a tryptophan side chain packed against it, is also shown. (Bottom) C2 fold (the fourth extracellular domain of CD4) is essentially the C1 fold with the D strand missing. See Bork et al. (1994) for a more encompassing definition of the immunoglobulin fold. [Graphics by Molscript (Kraulis, 1991).]

IV. MOLECULAR ANATOMY OF ANTIBODY BINDING SITE

Primary structures of different light and heavy chain variable domains (VL and VH) contain segments conspicuously variable both in length and amino acid composition (Wu and Kabat, 1970). There are three such segments in both VL and VH, and their positions within the domain sequences are homologous. It has been hypothesized that these hypervariable loops, designated L1, L2, L3, H1, H2, and H3, are responsible for antibody specificity and form the surface of the antigen combining site. Accordingly, they are also known as complementarity-determining regions (CDRs). X-ray crystallographic structures of many Fab fragments indeed confirmed that, by noncovalent association of the VL and VH domains, all six loops come into close contact and form a contiguous area on the surface of the VL-VH dimer from which the binding site (the paratope in serological parlance) is constructed (Fig. 7).
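
Hypervariable segments of the kind described above are commonly located with a per-position variability index. The sketch below assumes the usual form of the Wu and Kabat (1970) index, i.e., the number of different amino acids observed at a position divided by the frequency of the most common one; the aligned fragments are hypothetical and far too few for a real analysis.

```python
from collections import Counter


def wu_kabat_variability(column, gap="-"):
    """Variability of one alignment column.

    variability = (number of different residues)
                  / (frequency of the most common residue)
    Gap characters are ignored; an all-gap column scores 0.0.
    """
    residues = [r for r in column if r != gap]
    if not residues:
        return 0.0
    counts = Counter(residues)
    top_frequency = counts.most_common(1)[0][1] / len(residues)
    return len(counts) / top_frequency


def variability_profile(aligned_sequences):
    """Variability for every column of a set of equal-length sequences."""
    return [wu_kabat_variability(col) for col in zip(*aligned_sequences)]


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    toy_alignment = ["QVQLAS", "QVKLTS", "QVQLGS", "QVELDS"]
    for position, value in enumerate(variability_profile(toy_alignment), 1):
        print(position, value)
```

An invariant position scores 1, whereas a position at which all 20 amino acids occur with equal frequency would score 400; peaks in such a profile mark candidate hypervariable segments.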

A. VL-VH Interface β Barrel

Domains from different chains associate by noncovalent forces to form domain heterodimers. Each domain can be viewed as having two β-sheet surfaces, and in the complete immunoglobulin molecule one of the β sheets faces the solvent, whereas the other mediates contact with the other domain of the pair. The principal difference between a VL-VH type of dimerization and constant-constant domain pairs lies in the fact that the β sheets involved in the domain interface are different in the two domain types (Edmundson et al., 1975). It has been noticed that β sheets are not flat, as originally suggested by Pauling and Corey (1951), but twisted (Chothia, 1973). The VL-VH contact β sheets are not only twisted but also strongly curved, giving an impression of being wrapped counterclockwise around an elliptical-hyperboloidal surface (Novotny and Haber, 1985). Thus, the VL-VH interface seems to


give rise to a third β barrel formed by walls of the two interacting β barrels of the VL-VH domains. Structural fingerprints of this unique fold are (1) the two β bulges in the edge β strands G of the VL and VH domains, with sequences Phe-Gly-Gly-Gly and Trp-Gly-X-Gly, respectively; (2) an aromatic ring cluster of at least six Tyr/Phe residues forming a second layer of side-chain interactions in the VL-VH interface; and (3) a pair of Gln residues forming a pair of hydrogen bonds across the VL-VH interface (see Fig. 7; Novotny and Haber, 1985). The conserved features of antibody binding sites are worth recounting in greater detail. The VL-VH interface forms a close-packed, twisted β barrel characterized by cross-sectional dimensions 1.04 × 0.66 nm (10.4 × 6.6 Å) and a top-to-bottom twist angle of 212°. The geometry of the interface is preserved via the invariance of about 12 side chains, both inside the domains and on their surface. Buried polar residues form a conserved hydrogen-bonded network that has a similar topological connectivity in the two domain types. The two hydrogen bonds contributed by invariant Gln side chains extend across the interface and anchor the β sheets in their relative orientation. Invariant aromatic residues close-pack at the bottom of the binding site β barrel with their ring planes oriented perpendicularly in the characteristic herringbone packing mode. About 18 nm² of protein surface is buried between the domains and about 30-40% of this contact surface is contributed by the hypervariable regions (Novotny and Haber, 1985). The β sheets that form the interface have edge strands that are strongly coiled by β bulges. As a result, the edge strands fold back over their own β sheet at two diagonally opposite corners.
In the VL-VH dimer, residues from these edge strands form the central part of the interface, resulting in what we call three-layer packing; i.e., there is a third layer composed of side chains inserted between the two backbone side-chain layers that are usually in contact. This three-layer packing (Chothia et al., 1986) is different from the common aligned or orthogonal β-sheet packing found in other β-sheeted proteins (Chothia and Janin, 1981, 1982). Conservation of the geometry of the VL-VH interface strongly contrasts with the variability of the CDR loops that provide connections between the β strands of the interface barrel. A schematic diagram of the binding region is shown in Fig. 7. It is interesting to note that one of the β strands in each of the domains corresponds to the J gene segment and is encoded separately from the rest of the variable-region gene (Early et al., 1980; Bernard and Gough, 1980; Newell et al., 1980). The region of the V gene-J gene junction maps into the L3 and H3 hypervariable loops, with additional amino acid sequence variability generated by frame shifting the splice point between the two gene segments. In the heavy chain segments,


there is a third short gene segment, D, that becomes inserted in the V-J junction (Early et al., 1980; Siebenlist et al., 1981); the D segment corresponds exactly to the H3 loop. The overall picture that emerged from structural analysis is that of a conserved structural scaffold or framework (the VL-VH interface β barrel) to which the various hypervariable regions can be attached. This structural concept has been supported by the loop swap experiment of Jones et al. (1986) whereby a transfer of antigenic specificities, via gene splicing and protein expression, was accomplished between two different antibody V region scaffolds (see Section IX,D). It is obvious that amino acid sequence variations in the hypervariable loops change the shape of the outer parts of the binding site. However, many examples of antibody specificity modulation can be mapped to amino acids participating in formation of the bottom of the binding site and inaccessible to solvent or antigen (Stevens et al., 1980; Horne et al., 1982). Side chains of residues Leu L96, Glu H35, and Asp H101, known to be important to the specificity of phosphorylcholine-binding myelomas (Rudikoff et al., 1981; Cook et al., 1982; Rudikoff et al., 1982; Chothia et al., 1992), are either totally or partially buried in the VL-VH interface. Other hypervariable residues, such as C-terminal parts of L1 and H1 and N-terminal parts of J gene segments, also form part of the VL-VH contact surface. The calculated residue free energy contributions to the stability of antibody-antigen complexes (ΔG_residue; see Section VII,C) also emphasized the importance of the bottom part of the binding site. For example, Tulip et al. (1994) and Rauffer et al. (1994) found that, as a rule, the highly destabilizing mutations of the N9 neuraminidase-NC41 antibody complex occurred at rigid residues, just as protein folding is more destabilized by mutations in the core than by mutations at mobile residues (Alber et al., 1987).

B. Binding Potential of Surface Cavities

In virtually all known protein structures, binding sites have the shape of cavities or grooves. In antibodies, the binding site is located at the generally concave interface between the light and heavy chain variable domains. In one case, a small hapten (phosphorylcholine) enters deeply into the interdomain pocket (Satow et al., 1986); in another case, the principal residue of a protein antigen (glutamine-121 of hen egg white lysozyme) is buried in the interface while the rest of the antigen-antibody contact area is rather flat, if irregularly undulated (Amit et al., 1986). It is natural for the curvature of binding site cavities to match that of the antigens: Binding sites that accommodate small ligands have high curvatures and appear as pockets; those directed toward large protein antigens


have a low curvature, being more akin to valleys and grooves than to deep crevices (de la Paz et al., 1986). Why should the concave surface be favored as a specific combining site? Although no definitive answer is available at present, at least four good reasons can be cited. First, diffusion away from cavities is significantly slower than from flat surfaces or through the solvent. Primitive binding properties of simple concave organic molecules such as crown ethers and cavitands demonstrate this very clearly (Cram, 1983). Second, electrostatic fields are enhanced or focused in cavities, even though solvent quenches the field at other, flat or convex parts of the protein surface (Zauhar and Morgan, 1985; Klapper et al., 1986). Third, the hydrophobicity of a surface is a function of its curvature relative to the size and curvature of a water molecule, and concave surfaces are more hydrophobic than flat or convex surfaces (Sharp et al., 1991; Nicholls et al., 1991). Fourth, a hydrodynamic drag that develops at concave sites as a ligand molecule approaches may give rise to a steering torque that forces the ligand into the pocket. The magnitude of this torque is estimated to be significant, even larger than the electrostatic torque (Brune and Kim, 1994). The dielectric enhancement of an electrostatic field at cavities is illustrated in Fig. 3A. There the negative field generated by a pair of carboxyl groups embedded in a low-dielectric (ε = 2) protein (with no other charges present) is compared to that generated by the same groups in water (i.e., a dielectric constant continuum of 80). The low dielectric of the protein not only allows the field to extend into a larger region of space but also modifies the field according to the shape of the dielectric boundary. The field enhancement inside the binding site cavity is striking (Novotny and Sharp, 1992).
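
The scale of the dielectric effect in the comparison above can be seen from Coulomb's law alone. The sketch below is a deliberately crude illustration and not the calculation behind Fig. 3A: it assumes a uniform dielectric throughout space, so it shows only the 80/2 scaling of the potential and cannot reproduce the focusing caused by the shape of the protein-solvent boundary, which requires a numerical solution of the Poisson equation. The geometry (two unit negative charges 5 Å apart and a probe point 6 Å above their midpoint) is hypothetical.

```python
import math

# Converts e^2/Å to kcal/mol (commonly used value of the Coulomb constant).
COULOMB_KCAL = 332.06


def potential(charges, point, dielectric):
    """Coulomb potential at `point` in a uniform dielectric.

    charges: iterable of (q, (x, y, z)), q in units of e, coordinates in Å.
    Returns kcal/mol per unit test charge.
    """
    total = 0.0
    for q, position in charges:
        r = math.dist(position, point)
        total += COULOMB_KCAL * q / (dielectric * r)
    return total


if __name__ == "__main__":
    # Hypothetical pair of carboxylate charges and probe point.
    carboxyls = [(-1.0, (0.0, 0.0, 0.0)), (-1.0, (5.0, 0.0, 0.0))]
    probe = (2.5, 0.0, 6.0)
    in_protein = potential(carboxyls, probe, dielectric=2.0)
    in_water = potential(carboxyls, probe, dielectric=80.0)
    print("potential at probe, dielectric 2: ", round(in_protein, 2))
    print("potential at probe, dielectric 80:", round(in_water, 2))
    print("ratio:", round(in_protein / in_water, 1))
```

By construction the ratio of the two potentials is 80/2 = 40; in a real protein the potential at any point lies between these two limits and depends on how much low-dielectric material surrounds the charges, which is the origin of the enhancement inside cavities.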
Computer simulations of antigen diffusion toward the antibody HyHEL-5 (Kozack and Subramaniam, 1993) suggested that the electrostatic field generated by charged amino acids at the binding site increases reaction rates, and effectively steers the antigen molecule into the site. The precise matching of charged groups with charged groups of the opposite sign is an example of chemical complementarity between antibody and antigen which, in addition to shape complementarity, determines the specificity of interaction. Complex specificity imposed by this charge distribution can, however, be achieved only by paying a price in free energy of desolvation which decreases complex stability (see Section VII,B).

C. Side-Chain Compositional Bias

It has been noted that antigen combining sites of antibodies have an unusual amino acid composition (Padlan, 1990). There is a statistically significant preference for aromatic rings and dipolar side chains, in particular Tyr and Asn. A similar side-chain bias emerges from empirical free energy calculations that make it possible to rank, in a semiquantitative manner, the combining site residues according to their binding significance (ΔG_residue; Novotny et al., 1989). Is the side-chain bias seen in antibody combining sites shared by recognition sites of other proteins? A comparison with enzymes (Krystek et al., 1993) reveals substrate binding sites of neutral proteases populated by amino acids very different from those of antibody binding sites: small, rigid side chains that incur no conformational-entropic penalty on complex formation (Pro, Ala, Cys) or Gly, which has no side chain. Hence enzymes and their inhibitors on the one hand, and antibodies and their antigens on the other hand, employ different strategies for harnessing the free energy of binding in their respective complexes. Neutral proteases bind uncharged substrate molecules, and the nonentropic side chains in their binding sites (1) serve a better purpose than charged and dipolar amino acids, and (2) contribute favorably to binding energies by minimizing the stiff conformational entropy penalty. By analogy, the aromatic-dipolar residue bias in antibody combining sites is likely to reflect the nature of antigenic surfaces. The majority of antigenic epitopes consist of loops where charged and dipolar amino acids are very common (Section VII). It seems that in the antibody combining sites the best binding surfaces are constructed with formally charged side chains carrying charges opposite those displayed by the antigens. Indeed, charge matching is frequently seen in antigen-antibody complexes, but it may be difficult to generate a perfect charge-charge match to sufficiently stabilize the complex. Further, the free energy needed for charge desolvation on complex formation represents a large energetic penalty, so large that it may actually destabilize the complex (Section IV,D).
From these points of view, aromatic rings stabilizing buried charges via “aromatic hydrogen bonds,” i.e., charge-ring π electron interactions (Levitt and Perutz, 1988), may represent a good alternative to charge neutralization and burial. Dipolar (amide) side chains represent, next to the formally charged and aromatic residues, another possible tool of charge neutralization.

D. Electrostatic Perspective

Novotny and Sharp (1992) analyzed electrostatic fields generated by antibodies and their antigens, and discussed their importance in binding. For convenience, their conclusions are briefly summarized here. (1) The calculated field contours corresponded closely to the distribution of formally charged side chains on the surface. (2) By and large, the sign of the field at the binding site was found to be opposite that of the hapten. For dipolar (zwitterionic) molecules, the field dipole helped to


orient the ligand. Binding sites to neutral molecules may have measurable fields (menadione, galactan), or may be neutral (digoxin). The fields did not extend very far (i.e., beyond ~4 Å) into the solvent. Outside the binding site region, the electrostatic potentials were complex and seemingly uncorrelated with the antigen binding. (3) Although there were local regions of electrostatic complementarity for large antigens, absolute complementarity was not a prerequisite for complex formation (e.g., D1.3-lysozyme). (4) Fields at the 4-4-20 binding site specific for fluorescein suggested an electrostatically guided, lateral, two-dimensional diffusion of the ligand along the antibody surface into the site. (5) The HyHEL-5 and HyHEL-10 antibodies against lysozyme had large negative fields complementing large positive fields of the antigen. In HyHEL-10, side chains that did not contact the antigen acted through space to augment the field and increase antibody affinity for the antigen.

V. IN SEARCH OF EFFECTOR SITES

An antibody is a dual-purpose molecule: The antigen binding site allows for recognition of a virtually unlimited range of antigenic structures, and the constant domains mediate interactions with molecules belonging to the effector systems participating in antigen elimination. The γ chain CH2 domain in particular plays an important role by engaging the C1q component of complement. The triggering event in the classical complement pathway is the binding of the first component of complement, C1, to the Fc region of immunoglobulins aggregated in immune complexes. The structure of C1q, proposed by Reid and Porter (1975), has the appearance of a bunch of tulips: 18 chains, each about 200 amino acid residues in length, are linked in threes in the base and stalk regions to form collagen-like triple helices. Each of the six superhelices terminates with a globular head region thought to contain the Fc binding site.
Apparently, the single Fc binding sites are of relatively low affinity (K ≈ 100-1000 M⁻¹) and only the aggregation of several Fc fragments on immune complexes brings about tight C1q binding, owing to the multivalency mediated by the six C1q heads. Identification of the C1q binding site on IgG antibodies represents an early example of successful computer-aided structural analysis.


A. C1q Binding Site on Fc Fragment

Burton et al. (1980) argued that the residues of the CH2 domain involved in interaction with C1q should fulfil two criteria. First, they should be accessible for C1q binding, that is, not buried in the interior of the


domain. Second, they should be highly conserved in those immunoglobulin molecules that bind C1q; this followed from the observation of cross-species reactivity of C1q and IgG antibodies. Starting with the crystallographic structure of the Fc fragment (Deisenhofer, 1981), residues of the CH2 domain in contact with solvent were identified using the Lee and Richards (1971) algorithm and a water-sized probe (radius 1.4 Å). Of the 104 residues of the CH2 domain, 72 were found to be accessible to water. An inspection of contiguous, solvent-exposed hydrophobic patches identified 27 residues clustered into four large patches (>100 Å²), and 8 other residues found in two small patches or as isolated side chains. The location of three of the large patches made it unlikely that they were involved in C1q binding. Analysis of interspecies residue conservation then helped to eliminate all the patches but one. The sequence conservation analysis employed human and mouse myeloma proteins, and rabbit and guinea pig heavy chains, some of which did and some of which did not activate complement. Some invariant positions could be readily eliminated from being involved in C1q binding by being buried, covered with the carbohydrate, or affecting sugar attachment. Of the remaining positions, the invariance of some could be understood on structural grounds without invoking functional significance. All these considerations led to emergence of the continuous residues on the C-F-G β-sheet face as a potential binding region composed of exposed, highly conserved, and mostly charged (five Lys, two Glu) residues. These residues were proposed to form the C1q binding site (Fig. 8). In 1988, Duncan and Winter systematically altered surface residues in the mouse IgG2b isotype and localized the binding site for C1q to essentially three side chains, Glu-318, Lys-320, and Lys-322, contained within the originally proposed seven residues of the CH2 C-F-G β-sheet face.

B. Fc Receptor Binding Sites and Rheumatoid Factor-Reactive Sites
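
The solvent-accessibility screen on which the preceding argument rests, and which is reused in this subsection, can be sketched numerically. The code below is not the Lee and Richards (1971) procedure itself; it substitutes a simpler point-sampling scheme of the Shrake-Rupley type, in which test points are spread over each atom's probe-expanded sphere and those not inside any neighboring expanded sphere are counted. The 1.4 Å probe radius comes from the text; the function names, the number of test points, and the toy coordinates are assumptions made for illustration.

```python
import math


def sphere_points(n):
    """Return n roughly evenly spaced points on the unit sphere (golden spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        phi = k * golden_angle
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points


def accessible_areas(atoms, probe=1.4, n_points=960):
    """Solvent-accessible surface area (Å^2) of each atom.

    atoms: list of (x, y, z, van_der_waals_radius), all in Å.
    A test point counts as accessible if it lies outside the
    probe-expanded sphere of every other atom.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        expanded = ri + probe
        exposed = 0
        for ux, uy, uz in unit:
            px = xi + expanded * ux
            py = yi + expanded * uy
            pz = zi + expanded * uz
            for j, (xj, yj, zj, rj) in enumerate(atoms):
                if j == i:
                    continue
                d2 = (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2
                if d2 < (rj + probe) ** 2:
                    break  # point is buried by atom j
            else:
                exposed += 1
        areas.append(4.0 * math.pi * expanded ** 2 * exposed / n_points)
    return areas


if __name__ == "__main__":
    # Toy coordinates (not a real structure): two overlapping carbon-sized atoms.
    toy = [(0.0, 0.0, 0.0, 1.7), (1.5, 0.0, 0.0, 1.7)]
    for index, area in enumerate(accessible_areas(toy)):
        print("atom", index, "accessible area", round(area, 1), "Å^2")
```

Summing the atomic areas per residue and comparing the totals with a chosen cutoff would give an exposed/buried classification of the kind used for the CH2 domain; the cutoff is a modeling choice and is not specified in the text.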

FIG. 8. Effector sites of the Fc fragment. (Top left) Fc fragment (Deisenhofer, 1981). β Strands are shown as ribbons, and the oligosaccharide at the CH2/CH2 interface as ball-and-stick. The β hairpin predicted by Burton et al. (1980) to be the C1q binding site is shown in heavy lines. (Top right) Detail of the CH2 domain. Residues mutated in the F β strand by Duncan and Winter (1988) are also shown. (Lower left) Fc fragment with the residue Pro-238 highlighted. The Fc receptor binding site was localized to the polypeptide N-proximal to residue 238. This peptide, however, is not visible in the crystallographic structure of the human Fc fragment. (Lower right) Detail of the CH2 domain with the residue Pro-238 highlighted. [Graphics by Molscript (Kraulis, 1991).]

The solvent accessibility-sequence conservation argument previously outlined was also used by Woof et al. (1986) to propose location of the Fcγ receptor binding site to the hinge-link region of the CH2 domain (Woof
et al., 1986; Burton, 1985). Duncan et al. (1988) engineered a single amino acid change in a mouse IgG2b antibody, E235L, which enabled the antibody to bind the high-affinity FcγRI receptor with a 100-fold improvement in affinity. Finally, Chappel et al. (1991), through the use of recombinant IgG1/IgG2 hybrids and site-directed mutagenesis, localized the essential receptor binding activity to the CH2 sequence Glu-Leu-Leu-Gly-Gly-Pro (residues 233-238) (Fig. 8). A rather complex effector system is that of immunoglobulin E and its various cellular receptors: the high-affinity FcεRI, the low-affinity FcεRII (also known as CD23), FcεRIII, and complement receptor 2 (CR2, also known as CD21; see Sutton and Gould, 1993). The three-dimensional structure of IgE is unknown, but the amino acid sequence of the ε chain suggests an overall structure similar to that of other classes of antibodies, in particular, a μ chain with an additional constant domain in place of the hinge region. The two C-terminal domains of IgE were computer-modeled on the basis of the crystallographic structure of the human IgG Fc fragment (Padlan and Davies, 1986). Highlights of the IgE model were (1) a disulfide bridge linking CL to Cε1, and (2) two SS bridges linking the two Cε2 domains of the pseudo symmetry-related heavy chains. The predicted fold of the Cε3 and Cε4 domains assumed Asn-394-linked oligosaccharide chains to lie in between the Cε3 domains. With the use of recombinant peptides and domain exchange techniques, a segment of ε chain sequence at the N terminus of the Cε3 domain (residues 330-335 of the ε sequence) has been implicated in FcεRI receptor binding. In order to localize rheumatoid factor (RF)-reactive sites on human IgG molecules, Peterson et al. (1995) used overlapping heptapeptides derived from the CH3 domain sequence as competitors in RF-binding assays.
To design the heptapeptides, they relied on the calculated, large-probe accessible “antigenic epitopes” postulated in the IgG CH3domain by Novotny et al. (1986a; see Section VIILA). About 10 residues were identified as critical for rheumatoid factor binding, clustered in the two segments between amino acids 343-354 and 401-439 of the human H chain sequence. C. Conserved Elbow Joint in Fab Fragments

Although not an effector site per se, the elbow joint described by Lesk and Chothia (1988) is an intriguing molecular device that is localized in a small region of the antibody molecule and mediates segmental domain flexibility (see Section III,A). Based on computer graphics inspection of several Fab fragments with different elbow angles, Lesk and Chothia (1988) concluded that movement of the VL-VH dimer relative to the CL-CH1 dimer involved interactions of three VH and two CH1 residues that formed the molecular equivalent of a ball-and-socket joint, plus a few additional contacts that varied from structure to structure. Residues 11, 110, and 112 in VH packed against residues 149 and 150 in CH1, and the three VH residues belonging to the β-sheet framework contacted the two adjacent, turn-located CH1 residues. The extent of the shift between members of any pair of Fab structures was estimated by superposing the CH1 residues in different Fab fragments and calculating the mean position differences of the VH residues. In one case, the VH residues were displaced by a translation of 4.4 Å and a rotation of 36°. Inspection of space-filling models showed that the relative movement of the residues resembled that occurring in a ball-and-socket joint, with the CH1 residues forming the ball and the VH β-sheet residues forming the socket (Fig. 9).

FIG. 9. Ball-and-socket joint at the elbow of the Fab fragment McPC 603. VH domain is at the upper left, and CH1 domain is at the lower right. The VH side chains 11, 110, and 112 are shown in space-filling representation (heavy lines), and the CH1 space-filling side chains 149 and 150 are shown as dashed lines. These side chains form a mechanical equivalent of a ball-and-socket joint, facilitating Fab elbow movements. [Graphics by Molscript (Kraulis, 1991).]

VI. PROTOCOLS FOR THREE-DIMENSIONAL MODELING OF BINDING SITES

Basic information about the nature and dimensions of antigen combining sites was derived in the precomputer age by solution chemistry and serology. Combining sites were probed by series of chemically similar haptens in the early works of Landsteiner (1962), Pressman and Grossberg (1968), Karush (1962), Kabat (1970), Haber (Haber et al., 1967, 1976), and Schechter (1971), to name but a few. The approximate dimensions of binding sites were such as to accommodate about six monosaccharide units or about three or four peptide units. In favorable cases, ingenious resonance transfer and chemical modification studies provided more detailed atom-atom distance information (e.g., Rosenstein and Richards, 1976).

A. NMR Spectroscopy Combined with Computer Model Building

In chemistry, it is common to derive structural formulas of compounds from spectroscopic methods such as nuclear magnetic resonance, spin resonance, circular dichroism, and infrared. The same methodology has been applied to proteins, and particularly to antibodies. If an atomic detail-level model is sought, the complexity of the task also calls for the use of computational methods. Some of the first atomic details of antibody-ligand interactions emerged from a combination of NMR spectroscopy and computer-aided model building in the work of Dwek's group (Dwek et al., 1975, 1976; Sutton et al., 1977; Willan et al., 1977; Dower et al., 1977; Wain-Hobson et al., 1977). These studies paved the way for similar analyses on anti-spin label monoclonal antibodies (Anglister et al., 1984, 1987; Levy et al., 1989; Zilber et al., 1990) and on the anti-2-phenyloxazolone Fv fragments (McManus and Riechmann, 1991).

Dwek and co-workers used then state-of-the-art magnets (270 MHz) and various spin-labeled haptens to deduce that the combining site of the MOPC 315 myeloma protein for dinitrophenyl was a (hydrophobic) cleft with overall dimensions 11 × 9 × 6 Å. The use of lanthanide ions allowed the exact equilibrium binding constants for dinitrophenyl haptens to be measured (~1 μM). Dinitrophenyl was buried in the site to about 11 Å depth, and interacted closely with the Arg side chain at VL 95 and an "aromatic box" of residues Trp L93, Phe H34, and Tyr L34. Asn L36 contributed hydrogen bonds to the nitro groups of the hapten. Reference to the computer-built model of Padlan et al. (1977) allowed the assignment of three nearby histidine residues, H102, L97, and L44. Eventually, inclusion of chemical modification data (Dwek et al., 1977) and NMR spectroscopy on selectively nitrated tyrosine residues (Leatherbarrow et al., 1982) led to a refined model that accounted for the experimental binding data from a large range of cross-reacting haptens (Fig. 10).

B. General Stratagems

Computer-aided analyses of antibody combining sites (Section IV,A) formulated the concept of a conserved framework with hypervariable loops implanted on it. Since 1986, it has become possible, via gene cloning (Jones et al., 1986), to create chimeric proteins with CDR loops grafted onto foreign frameworks that carry antigen-binding capacity essentially indistinguishable from that of the parent antibody. If indeed it is possible to manipulate antigenic specificities at will, then accurate modeling of structures of arbitrary loops will become an important practical goal. To derive the structure of an antigen binding site from its amino acid sequence, one is faced with the problems of (1) finding the most appropriate three-dimensional Fv framework (the template) among the existing X-ray crystallographic structures, and (2) replacing the six CDR loops of the crystal structure with loops in conformations corresponding to those of the amino acid sequences at hand. A successful loop swap experiment requires (a) a modeling protocol generating natural loop conformations from amino acid sequences, (b) an exact definition of the boundaries between the framework and the N and C termini of each loop, (c) knowledge of the best possible succession in which the loops are built back onto the framework, and (d) identification of any framework residue that influences CDR conformations. These critical residues must be carried over from the template to the framework of the newly constructed antibody. Different approaches have been developed to accomplish these tasks.

C. Knowledge-Based Methods

Knowledge-based (homology) modeling methods (see, e.g., Greer, 1991; Bajorath et al., 1993; Bajorath and Aruffo, 1994; and Sali, 1995, for concise overviews) allow one to construct an approximate three-dimensional model of a protein from its amino acid sequence and the atomic coordinates of a similar protein structure. Jones and Thirup (1986), in their analysis of protein structures from the Brookhaven Protein Data Bank, pointed out that loops with similar N- and C-terminal end points and identical length often had similar conformations. The conjecture that CDR loops with identical length and different sequences may adopt similar conformations formed the basis of early antibody modeling protocols (Kabat and Wu, 1972; Padlan et al., 1977; Feldmann et al., 1984; de la Paz et al., 1986; Roberts et al., 1987; Smith-Gill et al., 1987). Independently, structural classification of loops in proteins (Sibanda and Thornton, 1985; Ring et al., 1992) also facilitated modeling, particularly of the β-hairpin loops L2, L3, H2, and H3. This type of structure-based modeling was automated by Levitt (1992) with his segment matching algorithm: Given a loop sequence, the Protein Data Bank can be searched for short, homologous backbone fragments (e.g., tripeptides) which are then assembled and computationally refined into a new combining site model.

FIG. 10. Three-dimensional structure of the MOPC 315 antigen binding site, as derived by NMR spectroscopy and computer modeling (see Section VI,A for details). (Reproduced with permission from Dwek et al., 1977. Nature 266, 31-37. © 1977 Macmillan Magazines Ltd.)

FIGURE 3 Computer graphics of protein surfaces. (A) A cross-sectional view of mesh surfaces (programs DELPHI and INSIGHT, Biosym Technologies, Inc.). The solvent-excluded surface of the HyHEL-5 anti-lysozyme antibody Fv fragment is shown in green. The red and magenta contours compare electrostatic potentials generated by a pair of glutamate residues H35 and H50 of the Fv fragment, in free solution (dielectric constant ε = 78 throughout) and in the protein (ε = 78 outside of the protein, ε = 2 inside the protein). It can be seen that the protein low dielectric boundary enhances the magnitude of the field and focuses the field inside the cavity (see Section IV,B). Reprinted with permission from Novotny and Sharp (1992). (B) A dot surface representation (Connolly, 1985) of a putative T-cell receptor antigen combining site (Ganju et al., 1992). Residues forming the immediate surface of the binding cavity are shown in yellow and cyan, respectively; the rest of the molecule is color-coded red. The antigen, fluorescein, is also shown above the putative binding site (see Section X,B for more details). (C) A "plaster" surface representation (program GRASP, Nicholls, 1992) of the influenza N9 neuraminidase and its NC41 and NC10 antibody epitopes (Tulip et al., 1994). Relative free energy contributions of the N9 neuraminidase residues toward stability of the NC10 and NC41 complexes were calculated and are displayed with use of a color spectrum green-red ("attractive"-"repulsive"). Surprisingly, different residues appear to stabilize the NC10 and NC41 complexes despite the fact that the NC10 and NC41 epitopes overlap by ~80%, and the two antibodies have identical H1 hypervariable loops (see Sections VII,C and VIII,A for details).

FIGURE 7 Various representations of the antibody binding region, i.e., the interdomain β barrel with the adjacent hypervariable loops. (A) Diagram of the interdomain β barrel. Individual strands from the VL (front) and the VH domain (back) are shown. Hypervariable loops are labeled L1 through H3. The G β strands in each domain correspond to the J gene segments. [Reprinted with permission from Novotny et al. (1983).] (B) A stereo backbone tracing of β strands forming the interdomain barrel (blue) with hypervariable loops color-coded red and green. The plot is based on the X-ray crystallographic structure of the McPC 603 Fab fragment. [Reprinted with permission from Bruccoleri et al. (1988).] (C) Least-squares superposition of H3 loops and the supporting F and G β strands from different antibodies. Note the conservation of the β-sheet framework, and the variability of the loops attached to it. (D) Stereo ribbon diagram of the McPC 603 interface β barrel showing the conserved aromatic cluster and additional aromatic residues contributed by the CDR loops. Green and yellow, VH and VL domain backbones, respectively; magenta, β bulges; blue, hypervariable loops. [Graphics by INSIGHT (Biosym Technologies, Inc.).]

FIGURE 11 Canonical loop motifs. (A) L3 loop, canonical type 1. The loop is six residues long (L91-L96) and its conformation is mainly determined by the conserved cis Pro L95 (yellow) and the polar residue L90 (Chothia and Lesk, 1987). The L90 side chain, typically a Gln (green), forms hydrogen bonds to the backbone N and O atoms within the loop. (B) H1 loop, canonical type 1. The loop (H26-H32) starts with a sharp turn which requires a Gly residue (shown in yellow) at position H26. The loop conformation is stabilized by packing interactions among large hydrophobic residues at positions H27 and H29 (Phe, in yellow) within the loop, and at the framework positions H34 (brown, Met) and H94 (blue, Arg). The residue H29, pointing inside the loop, is fully buried and interacts with the framework. The last residue of the loop, H32, was not regarded as a structural determinant of this canonical type (Chothia and Lesk, 1987; Chothia et al., 1989); however, it typically carries aromatic side chains (Phe or Tyr) that are in contact with the Arg H94. These two examples illustrate the types of specific interactions that determine canonical loop types, i.e., side-chain-main-chain hydrogen bonds, conformationally constrained Pro and Gly residues, and large hydrophobic side chain interactions involving both the loops and the framework. [Graphics by INSIGHT (Biosym Technologies, Inc.).]

FIGURE 12 Spatial proximity of canonical CDR loops and rigid-body shifts seen among loops on framework superpositions. (A) Least-squares superpositions of the seven H1 canonical type 1 loops. (Left) A direct least-squares superposition of the isolated loops (rms deviation 0.5 Å). (Right) Loop superposition obtained after the least-squares superposition of the most conserved backbone motifs in the VL and VH frameworks (s-rms deviation 1.6 Å). See Table II and Section VI,C,2 for details. (B) Least-squares superpositions of the three H2 canonical loops. (Left) A direct least-squares superposition of the isolated loops (rms deviation 0.2 Å). (Right) Loop superposition obtained after the least-squares superposition of the most conserved backbone motifs in the VH and VL frameworks (average pairwise s-rms deviation 1.8 Å).

FIGURE 13 CONGEN analysis of the Ser→Arg mutation in the antidigoxin antibody 40-150, position H94. Although distant from the antigen binding site, this mutation affects antibody affinity and fine specificity. Both the Arg and Ser side chains in position H94 were constructed by CONGEN, based on the X-ray structure of a related immunoglobulin, McPC 603. The side-chain conformational space of the position H94 and its immediate vicinity was exhaustively sampled and results of the searches analyzed. Shown here is the ribbon tracing of the 40-150 H3 loop (magenta) with the side chains Tyr H32, Ser/Arg H94, Asp H101, Tyr H103, Asp L55, and Arg L46. In the H94 mutant (side-chain carbons colored green) the preferred conformation shows Arg H94 and Asp H101 engaging in electrostatic interaction and hydrogen bonding. In the Ser H94 wild type (side-chain carbons colored white), the Asp H101 side chain swings to the right and interacts with a charge-H bond network of Asp L55 and Arg L46; as a result, conformations of the two latter residues are changed. Both the light chain residues 46 and 55 are close to the antigen binding site surface, and their altered conformations may impact on the shape of the binding site. See Novotny et al. (1990) and Ping et al. (1994) for details of calculations and their experimental testing. [Graphics by INSIGHT (Biosym Technologies, Inc.).]

A milestone event in knowledge-based modeling was the introduction of the canonical CDR structure concept by Chothia, Lesk, and colleagues (Chothia and Lesk, 1987; Chothia et al., 1989). They found that five of the six CDR loops (all except the H3) adopted only a limited repertoire of backbone conformations that were readily predictable from the sequence. These canonical conformations were determined by specific packing, hydrogen bonding interactions, and stereochemical constraints of only a few key residues (structural determinants). Examples of canonical motifs are presented in Fig. 11. In its most general form, the canonical structure concept assumes that (1) sequence variation at other than canonical positions is irrelevant for loop conformations, (2) canonical loop conformations are essentially independent of loop-loop interactions, and (3) only a limited number of canonical motifs exist and these are well represented in the database of currently known antibody crystal structures. Every one of these assumptions may fail in isolated practical instances.
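The key-residue logic behind canonical structures can be illustrated with a small rule check. The sketch below is a toy and not the Chothia-Lesk rule set: it encodes only the determinants named in the Fig. 11 caption (L3 type 1 and H1 type 1). The dictionary layout, the function name `matches_canonical`, the residue set treated as "large hydrophobic", and the example sequence positions are all assumptions made for illustration.

```python
# Toy canonical-class check using only the determinants listed in the
# Fig. 11 caption. Data layout and the HYDROPHOBIC set are assumptions.
HYDROPHOBIC = set("FILMVWY")  # assumed reading of "large hydrophobic"

CANONICAL_RULES = {
    "L3 type 1": {
        "length": 6,  # loop residues L91-L96
        # Sequence alone cannot confirm that Pro L95 is cis.
        "key_residues": {"L90": {"Q"}, "L95": {"P"}},
    },
    "H1 type 1": {
        "length": 7,  # loop residues H26-H32
        "key_residues": {
            "H26": {"G"},
            "H27": HYDROPHOBIC,
            "H29": HYDROPHOBIC,
            "H34": {"M"},
            "H94": {"R"},
        },
    },
}


def matches_canonical(rule_name, loop_length, residues):
    """Return True if the loop length and every key residue fit the rule.

    `residues` maps position labels (e.g. "H26") to one-letter codes.
    Positions that are not key residues are ignored, mirroring
    assumption (1) of the canonical structure concept.
    """
    rule = CANONICAL_RULES[rule_name]
    if loop_length != rule["length"]:
        return False
    return all(
        residues.get(position) in allowed
        for position, allowed in rule["key_residues"].items()
    )


# Hypothetical heavy chain positions, for illustration only.
example_h1 = {"H26": "G", "H27": "F", "H28": "T", "H29": "F",
              "H32": "Y", "H34": "M", "H94": "R"}
print(matches_canonical("H1 type 1", 7, example_h1))
```

A lookup of this kind can only report that a sequence is compatible with a canonical class; as noted above, any of the three underlying assumptions may fail in a particular antibody.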
In their definition of the five canonical CDR loops, Chothia and Lesk (1987) and Chothia et al. (1989) described the CDR loops as overlapping, but not identical to, the hypervariable regions of Kabat et al. (1977). When a large number of antibodies were scanned for the presence of canonical sequence motifs in their CDR loops and frameworks, 50%-95% (depending on the particular loop) of murine and human sequences were found to contain one. Chothia et al. (1992) also showed that the vast majority of VH domains in antibodies display one of the seven different canonical H1 and H2 conformations. The canonical repertoire of CDR loop L1 was enriched by Wu and Cygler (1993) who analyzed the crystal structure of a murine antibody with a λ light chain (Cygler et al., 1991). Canonical features are also thought to determine conformations of loops in proteins other than immunoglobulins (Tramontano et al., 1989; Tramontano and Lesk, 1992).

Structural determinants of canonical CDR loops include residues in the framework regions. The side chain of the heavy chain residue 71, in the β-strand E, determines the conformation of the CDR loop H2 (Tramontano et al., 1990). A critical role for other framework residues in preserving productive antibody-antigen interactions was demonstrated in the mutagenesis experiments of Foote and Winter (1992).

1. Results: Canonical Loop Modeling of Binding Sites

The canonical loop concept has met with success in structure prediction of several antibody binding sites. Chothia and colleagues predicted all six CDR loop conformations in the lysozyme-binding antibody D1.3 (Chothia et al., 1986) and five canonical loop conformations in four other antibodies (HyHEL-5, HyHEL-10, NC41, and NC10) before their crystal structures were known (Chothia et al., 1989). In the early D1.3 prediction, conformations of four CDR loops (L1, L2, H2, and H3) were accurately described, with backbone rms deviations (including β carbons) of 0.5-0.9 Å from the X-ray structure. Subsequently, backbones of 14 out of 19 canonical loops were predicted with an accuracy of better than 1.0 Å rms (but in no case worse than 1.4 Å rms). A level of accuracy of ~0.7 Å was achieved by Eigenbrot et al. (1993) and Essen and Skerra (1994) in single canonical loop predictions. Several papers reported the construction of models for which crystallographic structures do not exist and whose accuracy cannot be evaluated yet. The study of Roberts et al. (1994) on model building of the catalytic (esterolytic-amidolytic) NPN43C9 antibody included a careful selection of the framework based on an inspection of various VL-VH interfaces. The model correctly predicted previously unknown binding and catalytic functions of the Arg residue L96, and suggested a mechanism by which the antibody stabilized high-energy transition states during catalysis. Chothia et al. (1989) observed that even correctly predicted CDR loops may be spatially misplaced (rigid-body shifts) by up to 3 Å relative to their crystallographic counterparts.
Such effects are critical for accurate modeling of antibody binding sites, as they may substantially alter the shapes of binding sites. In this sense, assessment of the accuracy achieved in individual CDR loop predictions may be misleading. The often reported backbone, or even all-atom, rms values obtained by direct least-squares fit of isolated individual loops do not take into account the rigid-body misplacements of whole loops, and the actual accuracy of the complete binding site model may be lower (Bajorath et al., 1995).

As the database of high quality crystal structures has grown, some canonical motifs have become well established. In other cases (e.g., the H2 loop), the ensemble of observed and classified canonical conformations may still be incomplete. Unusually long L1 loops, whose tip portions are not very well defined conformationally and probably sample several different backbone configurations, also may not be accurately predictable. Furthermore, an assignment of sequence motifs to canonical conformations may remain ambiguous in a few instances, e.g., when conformationally "odd" amino acids, such as multiple prolines or glycines, are found in antibody sequences. In general, however, the canonical structure concept has provided the model building of combining sites with an excellent tool.

For the many newly determined antibody sequences, it is highly likely that templates (frameworks) with very high sequence similarity can be found in the Brookhaven Protein Data Bank; more than 80% identity is not unusual. Such templates often include one or more CDR loops of the same canonical class as the loops to be modeled. In this case loop splicing is not necessary and the template loop can be directly copied into the model. At the current rate of Brookhaven Data Bank growth (close to a new antibody X-ray structure every month), and the number of antibody Fab structures currently available (more than 50), knowledge-based modeling will only become easier and even more reliable. Automatic loop search procedures, such as the distance matrix-based methods of Jones and Thirup (1986), will help to identify loops with sequences identical, or nearly identical, to those of loops with unknown conformations. Fast fetching algorithms for structure comparison (Holm and Sander, 1994) and relational databases (Bryant, 1989) will allow efficient handling of the rapidly growing numbers of antibody structures and sequences.

2. Framework-Loop Relationship

Having determined canonical backbone motifs of selected CDR loops, one is faced with remaining problems such as (1) how to model conformations of noncanonical loops, (2) how to place side chains on CDR loop backbones, and (3) how to combine CDR loops with the best framework model. Routinely, it is assumed that side-chain conformations of homologous residues are similar (Summers et al., 1987). Thus, the side-chain conformations most frequently observed at the corresponding positions in other CDR loops of the same canonical conformation are selected in the model. Libraries of preferred side-chain rotamer conformations assembled from databases can also be consulted (Ponder and Richards, 1987). The generality of these procedures, however, has frequently been questioned (see, e.g., Schrauber et al., 1993). As the best way of splicing loops onto the framework, Tramontano and Lesk (1992) suggested placing loop backbones on the end points of the framework after a weighted, least-squares superposition of four framework residues N- and C-terminal to the loop. This suggestion, however, does not eliminate differences in framework structures, or the resulting rigid-body shifts in relative loop positions.

An example will illustrate this point. Seven crystal structures of antibody combining sites, refined to better than 2.0 Å resolution, were superimposed with average rms shifts of 0.4-0.6 Å on the backbones, a very good agreement indeed (see Fig. 12 and Table II for details). In order to assess how well the individual loops of the same canonical class agreed in the context of the frameworks, two different rms deviations were calculated (Bajorath et al., 1995). The conventional value (rms) was calculated after direct least-squares superposition of only the loop backbone atoms and directly reflects the differences in the backbone conformation of the loops. The other value, called here spatial rms (s-rms), was calculated for the loops after superposition of the conserved Fv framework segments as described in Table II. The s-rms values reflected not only the conformational differences in various loops, but also their different orientations with respect to the conserved framework (the "takeoff" angles resulting in rigid-body shifts). A comparison of two CDR loops that have a similar conformation but are spatially displaced via rigid-body shifts would therefore result in a low rms value and a high s-rms value.

As an example, we limited our comparison to the H1 and L2 loops (Fig. 12), all of which are of the same respective canonical structure type. The average backbone rms deviation for the pairwise H1 loop comparisons (seven residues long; see Table II) was 0.5 Å; however, the s-rms deviations were much greater, 1.6 Å. For the L2 loop, the average backbone rms shift of the seven L2 loops (three residues long) was 0.2 Å, but their average s-rms deviation was 1.2 Å. These observations are generally valid for all canonical CDR loops (Novotny and Bajorath, unpublished observations). That is, isolated loop conformations always superpose much better than loops in the context of their individual frameworks. For example, three of the seven L3 loops shared canonical structure type 3, and the backbone conformations of these loops were remarkably similar. In contrast, their relative spatial positions were quite different (see Fig. 12).

Examination of the superimposed β strands supporting the CDR loops shows that the positions of the framework termini differ and that their average pairwise difference is typically larger than 1.0 Å. The central strands of the β sheets diverge essentially only at their very ends, but edge strands (such as C″) show greater structural differences throughout. To a large extent, CDR loop displacements correspond to differences at the CDR-framework junctions. The definition of the framework-loop junctions is, to some extent, subjective and arbitrary, and a judicious selection of loop splice points may have a major impact on the quality of the final model. For example, the CDR loop H2 of the J539 Fab fragment, when implanted on the HIL framework, would result in an s-rms deviation of about 2.5 Å, despite the fact that the H2 loop conformations in J539 and HIL are essentially identical (0.3 Å rms).
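The difference between the two measures is easy to reproduce numerically. The following is a minimal sketch, assuming NumPy is available and using the standard Kabsch (SVD-based) least-squares fit, which is not necessarily the superposition routine used for Table II. The function names and the synthetic coordinates are illustrative assumptions, not data from any crystal structure.

```python
import numpy as np


def superpose(mobile, target):
    """Kabsch least-squares fit: return R, t with mobile @ R.T + t ~ target."""
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    covariance = (mobile - mobile_center).T @ (target - target_center)
    u, _, vt = np.linalg.svd(covariance)
    # Flip the last axis if needed so that R is a proper rotation.
    sign = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    translation = target_center - rotation @ mobile_center
    return rotation, translation


def rmsd(a, b):
    """Root-mean-square distance between matched coordinate sets."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


def loop_rms(loop_a, loop_b):
    """Conventional rms: fit the isolated loops directly onto each other."""
    rotation, translation = superpose(loop_a, loop_b)
    return rmsd(loop_a @ rotation.T + translation, loop_b)


def spatial_rms(framework_a, loop_a, framework_b, loop_b):
    """s-rms: fit the frameworks, then compare the loops where they land."""
    rotation, translation = superpose(framework_a, framework_b)
    return rmsd(loop_a @ rotation.T + translation, loop_b)


# Synthetic example: identical frameworks, and a loop that keeps its
# conformation but is rotated by 20 degrees and shifted as a rigid body.
rng = np.random.default_rng(0)
framework = rng.normal(scale=5.0, size=(12, 3))
loop = rng.normal(size=(7, 3)) + np.array([10.0, 0.0, 0.0])
angle = np.radians(20.0)
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
shifted_loop = loop @ rot_z.T + np.array([1.0, 0.5, 0.0])

rms_direct = loop_rms(loop, shifted_loop)
s_rms = spatial_rms(framework, loop, framework, shifted_loop)
print(rms_direct, s_rms)
```

In this synthetic case the conventional rms is essentially zero because the loop conformation is unchanged, while the s-rms reports the full rigid-body displacement relative to the framework.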

TABLE II
Pairwise rms Comparisons of Seven H1 CDR Loops of Same Canonical Type

                                   Root-mean-square deviation
Fv fragments superimposed (a)      rms (b) (Å)    s-rms (c) (Å)
4-4-20 to KOL                      0.6            0.9
H52 to KOL                         0.4            0.8
D1.3 to KOL                        0.9            1.4
D1.3 to J539                       0.9            1.4
D1.3 to 4-4-20                     1.2            1.7
H52 to 4-4-20                      0.6            1.1
H52 to D1.3                        0.8            1.4
H52 to J539                        0.5            1.2
KOL to J539                        0.2            1.2
Se155-4 to 4-4-20                  0.7            1.9
HIL to D1.3                        0.8            2.0
J539 to 4-4-20                     0.6            1.8
HIL to KOL                         0.2            1.4
HIL to J539                        0.3            1.6
Se155-4 to KOL                     0.4            1.8
Se155-4 to HIL                     0.3            1.a
4-4-20 to HIL                      0.6            2.1
H52 to HIL                         0.3            1.8
Se155-4 to D1.3                    0.6            2.3
Se155-4 to H52                     0.3            2.1
Se155-4 to J539                    0.5            2.7

(a) 4-4-20, mouse monoclonal antifluorescein Fab (Herron et al., 1989), PDB code 4FAB; KOL, human myeloma Fab (Marquart et al., 1980), PDB code 2IG2; D1.3, mouse monoclonal antilysozyme Fab (Fischmann et al., 1991), PDB code 1FDL; J539, mouse monoclonal antigalactan Fab (Suh et al., 1986), PDB code 2FBJ; H52, humanized mouse myeloma anti-CD18 Fv (Eigenbrot et al., 1994), PDB code 1FGV; HIL, human Fab (Saul and Poljak, to be published), PDB code 8FAB; Se155-4, mouse monoclonal antioligosaccharide Fab (Cygler et al., 1991), PDB code 1MFE.
(b) To obtain the rms values listed, all the atoms of the two H1 loop backbones (residues H26-H32) were least-squares superimposed.
(c) To obtain the s-rms values, backbone atoms of the conserved VL-VH interface-forming residues (H36-H40, H94-H96, L35-38, and L86-88; see Novotny and Sharp, 1992) in each of the two Fv fragments were least-squares superimposed. The operation brought the complete antigen combining site regions (i.e., parts of the inner β barrel and all the CDR loops) of the two Fv fragments into equivalent positions. Thus superimposed, the root-mean-square deviation between the pair of the H1 loop backbones was calculated as the s-rms value.

Changes in the spatial relation of CDR loops may provide, in addition to sequence and length variability, a means of increasing the repertoire of shapes available for antigen recognition. From this point of view, antibody sequences must include determinants not only of CDR loop conformation but also of CDR loop position. The canonical CDR loop library consists of an overlapping and, in general, smaller set of residues than that for hypervariable regions (Kabat et al., 1977). Backbone segments, which on the basis of canonical loop definitions are classified as frameworks, include sequence variability which may modulate interactions among adjacent loops and may account for some or all of the effects just described. One way of addressing this framework bias in practical modeling may be to select frameworks not only on the basis of overall sequence similarity (currently the most common procedure) but also on the basis of conserved patterns of residues in the regions adjacent to the CDR loops (see Section IX,D).

D. Conformational Searches and Monte Carlo Methods

Protein conformation is determined by values of backbone and side-chain torsional angles φ (C-N-Cα-C), ψ (N-Cα-C-N), and χ (N-Cα-Cβ-Cγ, Cα-Cβ-Cγ-Cδ, and so on). Three-dimensional modeling involves finding specific values for all these torsional degrees of freedom. A particularly promising approach is therefore to use automatic computer algorithms that uniformly sample the complete conformational space of a polypeptide chain segment (Bruccoleri and Karplus, 1987; Shih et al., 1985; Snow and Amzel, 1986; Moult and James, 1986; Fine et al., 1986; Amzel, 1992). Ideally, all the backbone and side-chain conformations compatible with the rest of the protein structure can be generated. The lowest energy conformation should correspond to the naturally occurring one. In practice, technical problems associated with an exhaustive sampling of conformational space restrict searches to short polypeptide segments only. The CONGEN program (Bruccoleri and Karplus, 1987) represents an efficient realization of this modeling stratagem. Given an accurate Gibbs function and a short loop sequence, reliable structural prediction using CONGEN appears to be remarkably easy: Generate all the stereochemically acceptable structures of the loop, calculate their energies, and take the one with the lowest energy. In practice, however, two major problems must be overcome first: Loops are not always short, and accurate energy functions do not yet exist. In the case of antibodies, most loops are sufficiently short that a complete search is feasible given modern computer workstations. The second problem presents a more fundamental hurdle (see Section VII,C).
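The generate, score, and select recipe can be written in a few lines. The sketch below is not CONGEN: the torsion grid, the function names, and in particular `toy_energy` (a made-up function with a single minimum) are assumptions standing in for a real conformation generator and Gibbs function. It also counts grid points, which shows why exhaustive sampling is limited to short segments.

```python
import math
from itertools import product


def torsion_grid(step_deg):
    """Torsion values in degrees covering -180 <= angle < 180."""
    return list(range(-180, 180, step_deg))


def grid_size(n_torsions, step_deg):
    """Number of grid conformations: (360 / step) ** n_torsions."""
    return len(torsion_grid(step_deg)) ** n_torsions


def toy_energy(torsions, preferred=(-60.0, -30.0)):
    """Hypothetical energy, zero when each (phi, psi) pair equals `preferred`."""
    energy = 0.0
    for index, angle in enumerate(torsions):
        target = preferred[index % 2]  # even index: phi, odd index: psi
        energy += 1.0 - math.cos(math.radians(angle - target))
    return energy


def lowest_energy_conformation(n_torsions, step_deg, energy=toy_energy):
    """Enumerate every grid conformation and keep the minimum-energy one."""
    grid = torsion_grid(step_deg)
    return min(product(grid, repeat=n_torsions), key=energy)


# Two residues (phi, psi each) on a 30-degree grid.
best = lowest_energy_conformation(n_torsions=4, step_deg=30)
print(best)
print(grid_size(4, 30), grid_size(20, 30))
```

The grid count grows as a power of the number of torsions, so going from two residues to ten on the same grid multiplies the work by many orders of magnitude. The selection step is only as good as the energy function passed in, which is the second problem named above.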


In its most general form, a conformational search is a set of nested iterations of the degrees of freedom (e.g., torsional angle values on a preselected grid, say, 30°) in the system. In CONGEN, the generation of backbone coordinates depends heavily on the Go and Scheraga (1970) chain closure algorithm as modified by Bruccoleri and Karplus (1986). Given stereochemical parameters for construction of the polymer and six adjustable torsion angles between the two fixed end points, the algorithm calculates values for the six torsional angles needed to perfectly connect the (rigid-angle) polymer from one end point to another. To generate conformations of loops with more than three residues, the backbone torsion angles of all but three residues are systematically sampled, and the Go and Scheraga procedure is used to close the backbone. Bruccoleri and Karplus (1986) allowed for small variations in the peptide bond angles, which greatly improved the efficiency of chain closure. CONGEN sampling of backbone torsion angles is done with the aid of a conformational energy map. A set of maps giving energies as a function of discrete values of φ, ψ, and ω corresponding to grids of 60° down to 5° has been precalculated. Typically, a 30° sampling is sufficient to include the natural torsion and to guarantee good results. The backbone can be searched either forward from the N terminus or backward from the C terminus, and both cis and trans peptide bond angles can be sampled. After the backbone has been constructed, side chains can be placed by a variety of methods, each allowing complete freedom in sampling the complete side-chain conformational space. In 1987 a seven- to eight-residue loop was often too long to be searched exhaustively, particularly if it contained amino acids with many torsional degrees of freedom (glycine backbones, lysine side chains).
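The nested-iteration search with a precalculated map can be sketched as a depth-first enumeration that abandons a partial backbone as soon as a test rejects it. This is a simplified illustration and not CONGEN's algorithm: chain closure and energies are omitted, and `toy_map_filter` and `toy_partial_filter` are hypothetical placeholders for a real conformational energy map and a real steric check.

```python
def build_allowed_map(step_deg, is_allowed):
    """Precompute the (phi, psi) grid points accepted by `is_allowed`."""
    grid = range(-180, 180, step_deg)
    return [(phi, psi) for phi in grid for psi in grid if is_allowed(phi, psi)]


def search_backbone(n_residues, allowed_pairs, keep_partial, prefix=()):
    """Depth-first enumeration of backbone torsions, one residue per level."""
    if len(prefix) == n_residues:
        yield prefix
        return
    for pair in allowed_pairs:
        candidate = prefix + (pair,)
        # Rejecting a partial chain here prunes its whole subtree.
        if keep_partial(candidate):
            yield from search_backbone(
                n_residues, allowed_pairs, keep_partial, candidate
            )


def toy_map_filter(phi, psi):
    # Hypothetical placeholder for an energy map: accept negative phi only.
    return phi < 0


def toy_partial_filter(chain):
    # Hypothetical placeholder for a steric test between neighbors:
    # reject two consecutive residues that both have psi >= 120.
    if len(chain) < 2:
        return True
    return not (chain[-1][1] >= 120 and chain[-2][1] >= 120)


allowed = build_allowed_map(30, toy_map_filter)
conformations = list(search_backbone(2, allowed, toy_partial_filter))
print(len(allowed), len(conformations))
```

Filtering each residue against the map before the search starts, and pruning partial chains during it, keeps the number of visited nodes well below the full grid count. The same idea is what makes a 30° grid practical for loops of several residues.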
With present-day computers, and their increasing speed of computation, it has become possible to search loops with 10 or more amino acids exhaustively, and the limit is steadily increasing (Table III). A random search of backbone conformations is an alternative way of producing a library of possible CDR loop structures. The random tweak method of Fine et al. (1986) consists of a Monte Carlo simulation in loop torsion angles followed by energy minimization of the randomly generated starting conformations. Random torsion angles are forced to satisfy the geometric constraints of the framework and to splice into the remainder of the molecule. The constraints are applied as a set of Lagrange multipliers in a computationally fast iteration scheme [one inversion of a 4 x 4 matrix is required per iteration (Bajorath and Fine, 1992)]. The tweak method has been implemented in two programs, Levinthal's PAKGRAF and the HOMOLOGY/INSIGHT program commercially available from Molecular Simulations Inc. (San Diego, CA).

TABLE III
Computer Workstation Speed in CONGEN Conformational Searches

Year   Computer                       CPU speed (MHz)   CPU speed (nsec)   Search time per residue(c) (sec)
1987   MicroVAX II(a)                 1.5               667                200
1990   IRIS 4D/70(b)                  16                62                 10
1994   ONYX, 1 processor(b)           150               6.7                1
1994   ONYX, 10 processors(b)         ~1300             ~0.8               0.1
1995   CHALLENGE, 10 processors(b)    ~2600             ~0.4               0.05

(a) Digital Equipment Corporation, Maynard, MA.
(b) Silicon Graphics, Inc., Mountain View, CA.
(c) Averages obtained from constructions of six CDR loops of various lengths and amino acid sequences; see, e.g., Bruccoleri et al. (1988), Novotny et al. (1991), Bassolino-Klimas et al. (1992).

Higo et al. (1992; see also Gibrat et al., 1992) described a different Monte Carlo protocol for CDR loop modeling. It combines the Metropolis et al. (1953) search algorithm, and weights applied to potential energy terms (forcing potentials), in a simulated annealing scheme (Kirkpatrick et al., 1983) to generate loops that satisfy the framework constraints. In order to identify the native conformation among the (often vast) number of all those that are stereochemically possible (including side-chain rotamers), one has to rely on a free energy criterion. Indeed, it is the free energy of the complete system, rather than its potential energy in vacuo, that determines the native fold. In the past, the calculated in vacuo potential energies (Brooks et al., 1983) were unable to distinguish between correctly and incorrectly folded protein structures, whereas modified potentials with nonbonded interactions including solvent-exposed (surface-dependent) terms could discriminate between the two types of structures (Novotny et al., 1984). An approximate representation of solvent effects was incorporated into the loop selection algorithm by Bruccoleri et al. (1988). The generated loops were first ranked by the in vacuo potential, and then solvent-accessible surfaces of the loops in the lowest potential energy window of 1 kcal (4.2 kJ) were calculated; the loop with the least accessible surface was then selected. Alternative approximations to free energy developed and reported over time were (1) a complete exclusion of intramolecular van der Waals interactions from the potential (Martin et al., 1989), and (2) ranking on the basis of the electrostatic energy obtained from a finite difference solution of the Poisson-Boltzmann equation (including the solvation term), and atomic solvation parameters (Smith and Honig, 1994). None has been completely satisfactory. If indeed loop-loop contacts are important for correct binding site structure prediction, then both the knowledge-based methods and automatic computer algorithms will eventually need a robust free energy functional to gauge the nativeness of the completed assemblies of all six loops. In CONGEN construction, the lowest free energy partial solutions may continuously be selected as the buildup of the loops progresses. In knowledge-based procedures, once the correct backbones are found, the Gibbs function may be used to guide side-chain construction.

1. Results Obtained with Conformational Searches/Monte Carlo Methods

The first use of the CONGEN program involved an automated protocol that constructed two antibody binding sites, one for the small hapten phosphorylcholine (McPC 603) and the other for a protein antigen (antilysozyme HyHEL-5). Reasonable accuracy was achieved in both cases (1.4-1.6 Å s-rms for all CDR backbones, 2.4-2.6 Å total), with the best results obtained for the McPC 603 L3 loop (six residues, 0.8 Å backbone rms, 1.4 Å total), the McPC 603 H1 loop (five residues, 0.7 Å backbone rms, 1.7 Å total), and the HyHEL-5 L2 loop (six residues, 0.8 Å backbone rms, 1.7 Å total). Significant side-chain misplacements occurred on isolated aromatic rings and formally charged residues in the L3 and H3 of HyHEL-5 (total rms 14.1 Å and 2.7 Å, respectively; backbone rms 1.1 Å in both cases) and the H3 in McPC 603 (total rms 2.9 Å, 1.1 Å on the backbone). The exceptionally long L1 loop in McPC 603 had to be built in several successive runs, each constructing only part of the loop, and also showed a large rms deviation from the X-ray structure (2.6 Å on the backbone, 3.0 Å total). However, the tip of this long loop showed relatively high backbone B factors in the crystal and its electron density was difficult to interpret unequivocally (Satow et al., 1986; see also comments in Brookhaven Protein Data Bank entry 1MCP). Several CONGEN computer modeling experiments were carried out using antibody sequences whose three-dimensional structure became available only later. Nell et al. (1992) reported construction of the binding site of an anti-insulin antibody 123, based on its homology with the HyHEL-5 antilysozyme antibody. In the case of the antidinitrophenyl antibody AN02 (Bassolino-Klimas et al., 1992), the backbone rms shift of the six loops (43 amino acid residues) was 2.6 Å. The shortest loop was 6 residues long, and the longest 9 residues long. Due to the length of the loops, several side-chain misplacements occurred (total rms 3.9 Å). 
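The rms figures quoted above are root-mean-square distances between equivalent atoms of the model and the X-ray structure. A minimal Python sketch of that calculation follows; it assumes that the two coordinate sets are already expressed in a common frame (no superposition is performed here) and that atoms are paired by list position. The function name and the coordinates in the example are hypothetical.

```python
import math


def rms_deviation(model, reference):
    """Root-mean-square distance between paired atoms.

    `model` and `reference` are equal-length sequences of (x, y, z) tuples in
    the same coordinate frame; no fitting or superposition is attempted.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for atom_m, atom_r in zip(model, reference)
        for a, b in zip(atom_m, atom_r)
    )
    return math.sqrt(squared / len(model))


if __name__ == "__main__":
    # Hypothetical three-atom fragment, each atom displaced by 1 unit along z.
    model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
    xray = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
    print(rms_deviation(model, xray))
```

Restricting the atom lists to N, Cα, and C atoms would give a backbone rms, and using all atoms a total rms; how the coordinates are brought into a common frame is a separate choice that this sketch leaves open.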
The antidigoxin antibody 26-10 (Bruccoleri and Novotny, 1992) was modeled from the McPC 603 framework and only five loops were built by conformational searches. The H1 loop backbone was copied directly from the crystallographic template because the only sequence difference in the definition of the loop was Glu H35 to Asn. This turned out to be a mistake: When the X-ray structure of the 26-10 antibody became known (Jeffrey et al., 1993), it was found that the H1 loop conformation was substantially different from the template for residues 28-30, outside the range of the loop as initially defined for the CONGEN run. As a result of this, and of the sequential construction of the loops, the H3 and H2 loops were constructed incorrectly. The fact that polypeptide segments with essentially identical sequences may differ substantially in conformation (Kabsch and Sander, 1984) presents a major challenge to all protein modeling protocols. In the first application of the random tweak method, Fine et al. (1986) generated conformations for four loops, H1, H3, L2, and L1, from the binding site of McPC 603. Starting from random structures, a large number of conformations for the loop backbones were generated, followed by either minimization or molecular dynamics to find minimum energy conformations for both the backbone and loop side chains. The same method was used to construct the entire antigen combining site of the CEA antibody specific for a carcinoembryonic antigen, a known colon cancer cell marker (Mas et al., 1992). Gibrat et al. (1992) reported backbone rms agreements in the range 0.9-2.6 Å on reconstruction of the complete HyHEL-5 binding site with the Metropolis-simulated annealing Monte Carlo method.

2. Single-Residue Mutants, Indirect Effects on Binding

CONGEN calculations were also used to supply a structural rationale for experimentally produced single-residue mutations and their effects on antigen binding. 
In the antidigoxin antibody 40-150 (Novotny et al., 1990; Ping et al., 1993), a spontaneous mutant 40-150 A2.4 carried a replacement Ser→Arg in its heavy chain (H94) and had altered specificity. A second-order mutant, 40-150 A2.4P.10, lacked two residues at the N terminus of its H chain and had a specificity profile approaching that of the 40-150 antibody. The N terminus and the position H94 were distant from the antigen binding site of the antibody, and the structural basis of the specificity changes was not immediately apparent. Approximate structures of the 40-150 antibody and its mutants were constructed by computer, based on atomic coordinates of the homologous mouse antibody McPC 603. The torsional spaces of the polypeptide backbone and side chains around position H94 were uniformly sampled and the lowest energy conformations were analyzed in detail. The results indicated that, when Arg H94 is substituted for Ser, Arg H94 can hydrogen-bond to side chains of Asp H101, Arg L46, and Asp L55 (Fig. 13). This resulted in a change in the surface of the combining site which may account for the affinity changes. Deletion of the two N-terminal residues increased solvent accessibility of Arg H94.


The solvation may have caused a hydrogen bond between Arg H94 and Asp H101 to be lost, restoring the structure to one similar to that of 40-150. The preceding structural hypothesis was tested by Ping et al. (1993), who prepared genes coding for N-terminally deleted heavy chains with either the Ser or the Arg side chain in position H94. When antibody activity of the engineered and reconstituted antibodies was measured, N-terminal deletion had little effect on binding of the Ser H94 structure, whereas two-residue truncations increased affinity approximately 40-fold for the Arg H94 mutant, consistent with the hypothesis. Chien et al. (1989) described computational analysis of a single-residue mutation similar to that in 40-150. In their work Asp H101, when replaced with Ala, totally abolished the phosphorylcholine-binding capability of the antibody S107. Computer-aided analysis of this effect highlighted the importance of Arg H94, and the salt bridge between the residues Arg H94 and Asp H101, in maintaining proper conformation of the H3 loop. Ruff-Jamison and Glenney (1993) used the CONGEN program to build a model of another phosphorylcholine binding Fv fragment, Py20, and to simulate binding site mutants on a computer. In the course of their work, they discovered two mutations (V, Tyr-105→Ala and VH Tyr-106→Ala) with a moderately increased affinity for the antigen. In mutagenesis experiments on the antidigoxin antibody 26-10, CONGEN computer modeling of mutated side chains helped to rationalize diverse mutagenesis data, e.g., at hapten-contacting residues Asn H35 and Tyr-50 (Schildbach et al., 1991, 1993a, 1994). Modulation of antibody affinity by side chains in the hapten-distal position H52 was also analyzed by Schildbach and co-workers (1993b). Several mutant side chains were modeled using the crystal structure as a starting point. The results suggested that diverse residues could be accommodated within the antibody without substantial structural rearrangement, and that none of the substituted side chains were able to contact hapten. At the same time, the modeled H52 mutant conformations suggested plausible ways in which noncontact residues could modulate affinity indirectly through their impact on the orientation of the hapten-contacting side chains.

E. Issues, Combined Protocols, Future Improvements

Several relatively accurate methods are currently available for modeling loop structure from its sequence. An inherent accuracy limit of the canonical concept is a fraction of an angstrom if backbones are considered in isolation, and ~1.5 Å for all atoms. Automatic loop construction via uniform conformational sampling or random tweaking essentially always contains the native loop structure in the set of the generated loops but, as mentioned earlier, the key problem here is to have a reliable Gibbs function to identify the native loop correctly. In many cases, the calculated free energy of the loop will be correct only if all the (potentially manifold) interloop interactions are correctly modeled, too. This raises the question of accurate modeling of the complete binding site, i.e., an ensemble of six interacting loops. Even if knowledge-based methods successfully find every single loop, one will still need a reliable procedure for correctly assembling the backbones in the context of the site, and adding side chains in their correct native conformations. Reflection on all these complexities of construction seems to favor a hybrid knowledge-based/conformational search protocol, perhaps in the form of an automatic construction and database selection of every single loop, and an iterative loop combination and side-chain placement algorithm that would then completely assemble the site. Such a general protocol has not been developed yet, although the Martin et al. (1989) procedure (now commercially available from Oxford Molecular, Ltd.) represents the best attempt so far to combine CONGEN conformational searches with database loop selection methods. Pedersen et al. (1992) described an assembly of the complete D1.3 antilysozyme binding site using a combination of database searches for the corresponding canonical templates for five loops, side-chain construction by CONGEN, and CONGEN construction of the complete H3 loop, for which no canonical template was available. Very good rms values were reported for all the individual loops. One of the successful applications of a combined knowledge-based/conformational search protocol is construction of the binding site of the anticancer antibody BR96 (Bajorath, 1994). In this work, framework regions were combined from VL and VH domains of two different X-ray structures, 4-4-20 and 17/9 (Rini et al., 1992). 
This allowed the conformations of CDR loops L2, L3, and H1 to be directly included in the model. The splicing of the remaining CDR loop backbones on the framework was accomplished by superposing five residues N- and C-terminal to the loop and the framework end points, respectively. The unusually long CDR loop L1 in BR96 (12 residues) shared length and canonical determinants, but only limited sequence similarity, with the corresponding loop in 4-4-20 and was modeled based on the 4-4-20 structure. CDR loop H2 in BR96 represented another ambiguous canonical motif. Its sequence was consistent with the H2 canonical structure type 3 characterized by a glycine and exceptional torsion angles at position 54, but it had two glycine residues at positions 53 and 54, making it difficult to predict its conformation. The canonical conformation, with the glycine at position 53 in regular torsions, and the glycine at position 54 in exceptional torsions, was retained in the model as one possibility, and an alternative conformation was generated using CONGEN conformational searches. The best CONGEN-generated conformation, in fact, contained exceptional backbone torsions at position 53 and the glycine at position 54 with φ and ψ angles in the allowed region of the torsional space. To model the H3 loop, the most common conformation at the base of the loop was assumed, and the residual portion of the loop was obtained from complete CONGEN conformational searches in the presence of the other loops. The crystal structure of the BR96 Fab fragment, determined and refined to 2.5 Å resolution, was reported by Jeffrey et al. (1995). Table IV summarizes the rms and s-rms deviations between the BR96 X-ray structure and the model, as originally reported by Bajorath and Sheriff (1995). Overall, a good agreement exists for the backbone rms of all the loops (see Table IV). The largest rms was obtained for the L1 loop, which also shows rather high B factors in the crystal. Also, as in other models, the s-rms values for all the loops were higher than the rms values. In the future, as computers become faster (approximately doubling the speed of the central processing unit every 10 months), computer time will become less of an issue in modeling, and all the stereochemically acceptable conformations of long loops can be fully sampled. In automatic modeling methods (CONGEN, random tweak), it is important to avoid situations where a small imprecision in, say, positioning a side chain of one loop, is amplified into a gross error in modeling the neighboring loop (see examples in Section IV,D,1). For a complete combining site buildup, Bruccoleri et al. (1988) developed a sequence of loop constructions based on the relative positions of the loops in the local frame of reference provided by the β sheets of the VL-VH domain interface. The three shorter loops, L2, L3, and H1, are placed lower (“low,” i.e., closer to the midpoint of the β-barrel interface) than the others (Fig. 7B). They do not interact with each other and provide a natural basis for construction of the remaining “high” loops, H2, H3, and L1. 
Thus, the low loops are built first, e.g., in the order L2, H1, L3, followed by H2, H3, and L1.

TABLE IV
CDR Loop Comparison: BR96 Model vs BR96 X-ray(a)

        rms deviation (Å)         s-rms deviation (Å)
Loop    Backbone    All atoms     Backbone    All atoms
L1      1.6         2.9           2.0         3.2
L2      0.2         0.4           0.3         0.5
L3      0.3         0.9           1.0         1.3
H1      0.3         0.6           0.9         1.0
H2      0.3         0.9           1.4         1.8
H3      1.5         3.3           2.9         3.8

(a) At 2.5 Å resolution.


VII. BINDING AFFINITY AND SPECIFICITY

The question of antibody specificity is central to molecular immunology. Even problems seemingly unrelated to specificity, such as the immune repertoire and maturation of the antigenic response, require, for their full understanding, a reference to antigen binding and affinity changes in the process of clonal selection. What are the main physical components of binding and specificity? Do antibody and antigen structures change on complexation, and, if so, by how much? What is the atomic origin of affinity and, implicitly, of specificity? Only recently, with the emergence of dozens of X-ray structures of antigen-antibody complexes including both low- (~1 μM) and high-affinity (1 nM-10 pM) ones, have we had an opportunity to explore these questions in precise atomic terms. Our explorations have been met with a stiff challenge. In the structures of the complexes, there are hundreds of atoms in intimate interactions: van der Waals, hydrogen bonding, salt bridging. It is not a simple task to relate this molecular jungle to experimental observables (equilibrium binding constants, K_AS). The most basic biophysical concepts necessary for formulation of any binding theory are still being debated, and there are those who doubt whether atomic origins of binding specificity will ever be fully understood (Mark and van Gunsteren, 1994; Janin, 1995). Rigorous calculations of binding constants may currently be unattainable; however, the situation is not so hopeless that any meaningful insight into the attribution of binding energies would be denied us. As we hope to demonstrate in this section, qualitative and semiquantitative estimates of binding energy attribution are possible in many cases, and can be directly compared to experimental perturbations of binding (ΔΔG values obtained from site-directed mutants of antibodies and/or antigens). Gradually, a conceptual framework for the rationalization of binding data is emerging and will be refined, with time, into a truly quantitative tool. The current situation may remind us of the empirical and semiempirical quantum-mechanical calculations used in chemistry. They, too, provide a valuable insight into structures of compounds by approximating, rather than rigorously solving, the Schrödinger equation for many-electron molecules.

A. Thermodynamics of Binding

Concepts such as binding specificity and complex stability (affinity) have their origin in the thermodynamics of bimolecular reactions. For example, in the reaction

Antigen + Antibody = Complex

molar concentrations of the three molecular species are measured at equilibrium, and the strength (specificity) of the complex is estimated from its relative concentration, i.e., the ratio K_AS = [complex]/([antibody][antigen]). The experimentally measured K_AS relates to the Gibbs free energy of complex formation, ΔG, as ΔG = -RT ln K_AS (where R, the gas constant, is the product of the Boltzmann constant, k, and the Avogadro constant, L, and is equal to 8.314 J mol⁻¹ K⁻¹, and T is the temperature in kelvins). Changes in Gibbs free energy, a thermodynamic quantity, can in principle be obtained from atomic structures of all the molecules involved in the reaction, provided all the physical forces responsible for the complex formation are accurately known. The specificity of protein-ligand interactions comes from a large difference between binding constants characterizing the binding of specific and nonspecific ligands. Thermodynamically, the higher binding constants for specific ligands arise from stronger overall forces (Gibbs free energy differences) between the antibody and the antigen. Most of the forces responsible for binding are distinctly short-range: Van der Waals-London dispersion forces (London, 1930), the hydrophobic force [i.e., a difference in surface solvation] (Kauzmann, 1959), and hydrogen bonds (Pauling, 1960) have vanishingly small values at distances greater than 3-4 Å, a local distance even on an atomic scale. While the first two types of forces can be said to be isotropic (spherically symmetric), hydrogen bonds are strictly directional and require orientation of the participating groups. Electrostatic forces act at a distance but the high dielectric constant of water effectively limits their reach. Thus, the burial of charged atoms at a protein-protein interface distinctly increases their effective electrostatic field, but only at the expense of the energetically unfavorable desolvation of charged groups. 
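As a worked illustration of the standard relation ΔG = -RT ln K_AS, the Python sketch below converts an association constant into a Gibbs free energy of complex formation. The function names are hypothetical, the temperature of 298.15 K is an assumption, and K_AS is taken as a dimensionless number relative to a 1 M standard state.

```python
import math

R = 8.314             # gas constant, J mol^-1 K^-1
JOULES_PER_CAL = 4.184


def binding_free_energy_kj(k_as, temperature=298.15):
    """Gibbs free energy of complex formation (kJ/mol) from the association
    constant k_as (in M^-1, relative to a 1 M standard state)."""
    return -R * temperature * math.log(k_as) / 1000.0


def kj_to_kcal(value_kj):
    """Convert kJ/mol to kcal/mol."""
    return value_kj / JOULES_PER_CAL


if __name__ == "__main__":
    for k_as in (1e6, 1e9, 1e11):   # roughly 1 uM, 1 nM, and 10 pM complexes
        dg = binding_free_energy_kj(k_as)
        print(f"K_AS = {k_as:.0e} M^-1: dG = {dg:.1f} kJ/mol = {kj_to_kcal(dg):.1f} kcal/mol")
```

For a 1 nM complex (K_AS = 10^9 M⁻¹) this gives about -51.4 kJ/mol (-12.3 kcal/mol); each tenfold change in K_AS shifts ΔG by RT ln 10, roughly 5.7 kJ/mol at this temperature.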
A consequence of all this is that the two molecular surfaces, the antibody paratope (i.e., the binding site) and the antigenic epitope, will enter into a stable complex only if they have complementary molecular shapes over a large area, and their surface charge distribution is such that the interaction of opposite charges on the epitope and the paratope provides sufficient Coulombic attraction to stabilize the complex (Novotny et al., 1987). These attractive contributions are counterbalanced by conformational entropy losses of side chains immobilized at the antigen-antibody interface. In the omnipresent thermal motion at room temperature, torsions of free side chains tend to rotate randomly and distribute themselves equally in the three lowest energy conformational states, trans (180°) and ± gauche (±60°). On complex formation, only one of these states becomes locked in the tightly packed interface. Enforcement of this torsional preference over the natural equalizing tendency of the Brownian motion costs energy, and some of the Gibbs free energy of complex formation must be expended on it.
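The cost of locking one torsion can be estimated from the picture just described. If a free torsion populates three states equally and the complex allows only one, the entropy loss is R ln 3, so the free energy penalty is RT ln 3. The Python sketch below evaluates this under that idealized equal-population assumption; the function name and the default temperature of 298.15 K are illustrative choices.

```python
import math

R_KCAL = 1.987e-3   # gas constant, kcal mol^-1 K^-1


def torsion_freezing_penalty(n_states=3, temperature=298.15):
    """Free energy cost (kcal/mol) of restricting one torsion from n_states
    equally populated states to a single state: -T*dS = R*T*ln(n_states)."""
    return R_KCAL * temperature * math.log(n_states)


if __name__ == "__main__":
    per_torsion = torsion_freezing_penalty()
    print(f"per torsion: {per_torsion:.2f} kcal/mol")
    # Hypothetical interface immobilizing 15 side-chain torsions.
    print(f"15 torsions: {15 * per_torsion:.1f} kcal/mol")
```

The result, about 0.65 kcal/mol per torsion at room temperature, shows how quickly the penalty accumulates when many side chains are immobilized at an interface.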


Although the quantitative aspects of these biophysical components of binding are still debated, estimates have been summarized as follows (Novotny and Sharp, 1992).

(1) Van der Waals or dispersion-repulsion (London) interactions probably constitute most of what is termed shape complementarity in binding, in that they penalize intermolecular contacts that produce either overlap of atoms or cavities, either directly or through induced strain. Well-packed contact regions at antigen-antibody interfaces, though, are probably not much more favorable than the antibody-solvent and antigen-solvent contacts experienced by the free molecules. Surface tension data from organic liquid-water systems show that the work of adhesion between hydrocarbon and water interfaces is essentially the same as that between two hydrocarbon interfaces (Nicholls et al., 1991; Adamson, 1976). Furthermore, dispersion forces are relatively small in magnitude, and their possible differences amount to even smaller magnitudes.

(2) The hydrophobic effect is the major stabilization factor of complex formation, contributing, according to various estimates, between 25 and 72 cal (105 and 301 J) of Gibbs free energy stabilization per 1 Å² of protein-protein contact area (Chothia, 1974; Chothia and Janin, 1975; Sharp et al., 1991b).

(3) Electrostatic effects accompanying complex formation involve the following. (a) Creation of new clusters of charged atoms in the low dielectric constant (ε = 2) environment at the protein-protein interface of the complex; these charge-charge interactions can stabilize the complex through Coulombic attraction, the net effect being proportional to the atomic charges. (b) Desolvation of charged groups; this is proportional to the square of the atomic charge, and so the desolvation process is always unfavorable, and often stronger than the net Coulombic attraction (see Section VII,A).

(4) On complex formation, immobilization of a freely rotatable torsional degree of freedom (a side chain exposed to solvent) carries a free energy penalty of about 0.6-0.7 kcal (2.5-2.9 kJ; Privalov, 1979; Novotny et al., 1989; Nicholls et al., 1991; Pickett and Sternberg, 1993).
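Two of the estimates listed above lend themselves to simple arithmetic: the hydrophobic stabilization of 25-72 cal per Å² of contact area and the penalty of 0.6-0.7 kcal per immobilized torsion. The Python sketch below turns those quoted ranges into bounds for a given interface. It deliberately omits the van der Waals and electrostatic terms, so it is not a binding energy estimate; the function names and the example interface (700 Å², 20 torsions) are hypothetical.

```python
def hydrophobic_range_kcal(contact_area_A2, low_cal=25.0, high_cal=72.0):
    """(most favorable, least favorable) hydrophobic stabilization in kcal/mol
    for a contact area in square angstroms, using the quoted 25-72 cal/A^2."""
    return (-high_cal * contact_area_A2 / 1000.0, -low_cal * contact_area_A2 / 1000.0)


def entropy_penalty_range_kcal(n_torsions, low_kcal=0.6, high_kcal=0.7):
    """(smallest, largest) conformational entropy penalty in kcal/mol for
    n_torsions immobilized torsions, using the quoted 0.6-0.7 kcal each."""
    return (low_kcal * n_torsions, high_kcal * n_torsions)


if __name__ == "__main__":
    area, torsions = 700.0, 20   # hypothetical interface
    hyd = hydrophobic_range_kcal(area)
    ent = entropy_penalty_range_kcal(torsions)
    print(f"hydrophobic term: {hyd[0]:.1f} to {hyd[1]:.1f} kcal/mol")
    print(f"entropy penalty:  +{ent[0]:.1f} to +{ent[1]:.1f} kcal/mol")
```

The width of the resulting ranges makes plain why the scaling constants themselves, rather than the arithmetic, dominate the uncertainty of such estimates.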

B. Lock and Key or Induced Fit

Do antigens and/or antibodies change structure on binding [the induced fit hypothesis, first explicitly proposed for antibodies by Pauling (1940)], or is the binding essentially an association of two rigid bodies [the lock-and-key paradigm of Fischer (1894)]? The X-ray crystallographic structures, in fact, reveal both binding modes. Examples of lock-and-key complexes, currently much more numerous, include the McPC 603 antiphosphorylcholine myeloma (Satow et al., 1986); D1.3 antilysozyme (Amit et al., 1986; Bhat et al., 1994); NC41 and NC10 antineuraminidase (Tulip et al., 1994; Malby et al., 1994); 4-4-20 antifluorescein (Herron et al., 1989); and 26-10 antidigoxin (Jeffrey et al., 1993). The best examples of large induced fits on complexation are the antipeptide antibodies 17/9 and 50.1, where the whole H3 loop pivots as if on two anchor points (Stanfield et al., 1990, 1993; Rini et al., 1992; Schulze-Gahmen et al., 1993). On complex formation, either lock-and-key or induced fit, one of the possible surface side-chain conformers is always selected and “rigidified” at the antigen-antibody interface. Because of this generality, and because its energetic cost can be well estimated, this shift from a preformed conformational equilibrium need not be considered an induced fit. The simplest case of a backbone-induced fit may also involve stabilization of a disordered loop segment in one of the thermally available conformational states, introducing a shift in the preexisting equilibrium (Fersht, 1984). Given the relatively high cost of conformational entropy (~0.6 kcal per aliphatic side-chain torsion or 0.4 kcal per backbone torsion; see Section VII,A), it may well be that those antigen-antibody systems where tight binding is required evolved toward rigid (lock-and-key) binding. Can induced fits be operational in cross-reactivity? The only structurally well-documented case of antibody cross-reactivity, that of the high-affinity (~1 nM) mouse monoclonal antiprogesterone DB3, which binds four different progesterone-like steroids (Arevalo et al., 1993, 1994), involves no structural changes in the antibody on engaging the various ligands. Cross-reactivity was accompanied by two distinct modes of hapten orientation (i.e., the steroid β ring with two methyl groups) with respect to the Trp side chain H50, one of the major hapten-contacting residues. Incomplete binding surface complementarity between the haptens and the antibody seemed to be the key structural feature promoting cross-reactivity. Bruccoleri and Karplus (1990), Hoffren et al. 
(1992), and de la Cruz et al. (1994) reported molecular dynamics simulations of hypervariable loops that may provide us some idea of the shape variations the binding site undergoes with thermal fluctuations. The Bruccoleri and Karplus (1990) simulations were carried out at different temperatures in vacuo, and with the goal of sampling the complete conformational space of each loop. Comparison with the results of CONGEN searches revealed that molecular dynamics sampled the loop conformational space less completely, and less efficiently, than the CONGEN uniform searches. De la Cruz et al. (1994) simulated dynamics of a free and a complexed antibody binding site (Fv fragment of the antirhinovirus serotype 2 heptadecapeptide) at room temperature and with an explicit solvent. They reported average rms deviations of up to 1.4 Å for the free hypervariable loops of the Fv fragment, and smaller deviations for those complexed with antigen. A systematic drift from the X-ray positions was also observed that, e.g., for the CDR H2 loop approached 3 Å after a 100-psec dynamics run (in the presence of an explicit solvent). This may have been an artificial by-product of the particular technical conditions of the simulations (the GROMOS potential, nonbonded interactions evaluated with a cutoff of 8 Å, and a longer cutoff of 15 Å applied every fifth step). By comparison, other work (Kitson et al., 1993) showed that, under different conditions (class II DISCOVER potential, cutoff 25 Å with no switching function or employing Ewald summation),² the average structure simulated by molecular dynamics did not differ from the X-ray structure by more than a fraction of an angstrom. This is indeed to be expected if the solvated protein structure represents a natural system at Boltzmann equilibrium with itself.

C. Empirical Gibbs Functions As early as 1975, Chothia and Janin used the rule of proportionality between hydrophobic effect and solvent-accessible surface area (Chothia, 1974) to estimate hydrophobic stabilizations of bovine pancreatic trypsin inhibitor (BPTI), hemoglobin a/3 dimer, and insulin dimer. After a correction for conformational entropy loss on complex formation, a good correlation between the solvent-accessible protein-protein contact areas and the measured K M was achieved. Nevertheless, the concept of accessible surface area as the only measure of the Gibbs free energy change in complex formation was an oversimplification. For example, the areas of contact between trypsin and BPTI (Huber et al., 1974) and between the immunoglobulin Fc fragment and the fragment B of staphylococcal protein A (Deisenhofer, 1981) are identical (13.9 nm2), while the affinities of these two complexes differ by at least six orders of magnitude (0.1 pM in the case of the trypsin inhibitor and -1pM or less in the case of Fc fragmentfragment B). By now, approximations of binding constants have been refined by including empirical estimates of all the atomic thermodynamic contributions deemed to be important in the attribution of binding energy (Section VI1,A). Empirical Gibbs (free energy) functionals, in the form of simple

'

Evaluation of nonhonded (van der Waals and electrostatic) interactions is timeconsuming during molecular dynamics simulations; often the painvise interactions are neglected beyond an arbitrary cutoff distance. Switching functions are sometimes added to the potential around the cutoff distance to smooth the abrupt truncation of interactions, hut this practice was shown to generate large artificial forces and unwanted perturbations to the system. Ewald (1921) summation allows calculation of all the electrostatic interactions of a set of charges to inifinity faster than simple summation of all the painvise interactions can.


formulas, were proposed by Novotny et al. (1989), Williams et al. (1991), Nicholls et al. (1991), Wilson et al. (1991), Horton and Lewis (1992), Murphy et al. (1993), and Abagyan and Totrov (1994). The many simplifications necessarily adopted in these estimates (see Section VII,C,1) require that the results not be interpreted too finely, and various improvements are being explored (Pickett and Sternberg, 1993; Jackson and Sternberg, 1994; Vajda et al., 1994). Nevertheless, the present forms yielded reasonable estimates in many cases and showed predictive power where site-directed mutagenesis results distinguished between residues important and unimportant for binding. Research now focuses on (1) elimination of technical errors in electrostatic calculations, e.g., poor treatment of charge desolvation effects on complex formation (see Bruccoleri et al., 1996), (2) refinement of surface scaling constants (Novotny, Bruccoleri, Davis, Sharp, manuscript in preparation), and, where applicable, (3) a proper account of interface-bound water and its impact on the energetics of complexation (see Section VII,D).

1. Functional Forms and Limits of Approximation

The treatments of Novotny et al. (1989), Murphy et al. (1993), and Abagyan and Totrov (1994) are the only ones systematically applied to antigen-antibody interactions and are discussed here in detail. In the Novotny et al. (1989) method, atomic coordinates of an antibody-antigen complex are the only data on which the calculation is based. By using the Gibbs functional as detailed in the following discussion, the atomic free energies are calculated for each molecule, first for the uncomplexed form and then for the complex. The difference between these two values gives the atomic Gibbs free energy differences on complex formation, the ΔG_atom values. These are the sums of the individual contributions due to hydrophobic, electrostatic, and side-chain conformational entropy effects. From the atomic values, the ΔG_residue contributions and the complete ΔG of the reaction are obtained by further summations. In this empirical scheme, and assuming rigid macromolecules, (1) the hydrophobic effect, ΔG_HB, is directly proportional to the contact solvent-accessible area (in square angstroms) (Lee and Richards, 1971) between the two molecules (Chothia, 1974):

$$\Delta G_{\mathrm{HB}} = (\text{contact area}) \times 25\ \text{cal} \tag{10}$$

(1 cal = 4.2 J), and (2) electrostatic (Coulombic) interaction between the two molecules, ΔG_EL, is empirically “screened” by an effective dielectric constant:

$$\Delta G_{\mathrm{EL}} = \sum_{i}\sum_{j} \frac{Q_i Q_j}{\varepsilon\, r_{ij}} \tag{11}$$


where Q_i is the partial atomic charge, r_ij is the distance between the ith and jth atoms, and ε is the effective dielectric constant. Hydrogen bonding is treated as an electrostatic phenomenon, included in Eq. (11). Equation (11) approximately reproduces the differential strength of protein-protein hydrogen bonds compared to protein-solvent hydrogen bonds, as estimated by Fersht et al. (1985) in site-directed mutagenesis experiments. These usually attractive interactions are counteracted by a loss of conformational entropy of surface side chains immobilized at the contact surface (Privalov, 1979; Rashin, 1984):

$$-T\Delta S_{\mathrm{CF}} = NRT \ln 3 = 0.6N\ \text{kcal} \tag{12}$$

where R is the gas constant, T, the temperature, equals 300 K, and N is the number of side-chain torsional degrees of freedom lost. Equation (12) implies that each torsion has approximately three equienergetic states available in free solution (i.e., trans and two gauche) but becomes locked in one conformation on the formation of a complex (hence the ln(1/3) = -ln 3 term). Finally, for the absolute ΔG value calculations, estimates are made for the cratic and translational-rotational entropy changes (ΔS_CR and ΔS_TR, respectively) that accompany complex formation (see Novotny et al., 1989):

$$T\Delta S_{\mathrm{CR}} = 2\ \text{kcal} \tag{13}$$

$$T\Delta S_{\mathrm{TR}} = 9\ \text{kcal} \tag{14}$$
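The per-contact terms of Eqs. (10)-(12) amount to a few lines of arithmetic, sketched below in Python. This is an illustration only, not the authors' implementation: the function names, the Coulomb conversion factor of 332 kcal Å mol⁻¹ e⁻², and the constant default dielectric are assumptions that do not come from the text. The hydrophobic term is returned with a negative (stabilizing) sign to follow the sign convention of Tables VI and VII, and the logarithm in Eq. (12) is taken to be the natural logarithm.

```python
import math

R_KCAL = 1.987e-3   # gas constant, kcal mol^-1 K^-1
COULOMB = 332.0     # assumed unit conversion factor, kcal A mol^-1 e^-2

def dg_hydrophobic(contact_area_A2, cal_per_A2=25.0):
    """Eq. (10): 25 cal per square angstrom of contact area, in kcal.
    The negative sign follows the convention used in Tables VI and VII."""
    return -contact_area_A2 * cal_per_A2 / 1000.0

def dg_electrostatic(charges_a, charges_b, distances, eps=4.0):
    """Eq. (11): screened Coulomb sum over atom pairs (i in A, j in B).
    distances[i][j] is r_ij in angstroms; eps is a placeholder constant."""
    total = 0.0
    for i, qi in enumerate(charges_a):
        for j, qj in enumerate(charges_b):
            total += COULOMB * qi * qj / (eps * distances[i][j])
    return total

def minus_t_ds_conf(n_torsions, temperature=300.0):
    """Eq. (12): -T*dS = N*R*T*ln 3, three equienergetic states per torsion."""
    return n_torsions * R_KCAL * temperature * math.log(3.0)

# Tyr H50 of Table VI: 32.4 A^2 of contact surface, 1.5 torsions lost.
print(round(dg_hydrophobic(32.4), 2), round(minus_t_ds_conf(1.5), 2))
```

With these constants RT ln 3 comes to about 0.65 kcal per torsion at 300 K, which Eq. (12) rounds to 0.6 kcal.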

Murphy et al. (1994) and Finkelstein and Janin (1989) offered somewhat different treatments and values for both the cratic and translational-rotational entropy terms. Assumptions implicit in Eqs. (9)-(14) are that (1) the solute-solute van der Waals interactions are essentially of the same magnitude as the solute-solvent van der Waals interactions and effectively cancel out (see Section VII,A), and (2) changes in vibrational entropy between the two free proteins and the protein-protein complex are likewise unimportant. Indeed, the largest changes are expected to occur in low-frequency, collective modes of vibrations [e.g., lobe opening and closing, as suggested by Tidor and Karplus (1994)], which are known to be effectively damped in aqueous solutions (Cusack et al., 1988).

2. Calculations on Individual Antigen-Antibody Complexes

Calculations carried out to date include those for about 10 antibody-antigen complexes and are summarized in Tables V-VII. Overall, the absolute ΔG values are in the range of experimentally observed values, with two exceptions (HyHEL-5 and NC41) due to a poor electrostatics approximation [Eq. (11)]. Experimentally determined ΔΔG values characterizing the effects of single-residue mutations on binding are available for the NC10

TABLE V
Empirical Free Energy Calculations for X-ray Structures of Antibody-Antigen Complexes

Complex                      ΔG^a (kcal)   ΔG^b (kcal)   Refs.
D1.3-lysozyme^c              -11.4         -9            Novotny et al. (1989)
HyHEL-5-lysozyme             -14.2         -32           Novotny et al. (1989)
HyHEL-10-lysozyme            -13.0         -13           Novotny (1991)
NC41-N9 neuraminidase         -9.7         -22           Tulip et al. (1994)
NC10-N9 neuraminidase        -10.8         -13           Tulip et al. (1994)
McPC603-phosphorylcholine     -6.6                       Novotny et al. (1989); Novotny (1991)
36-71-phenyl arsonate         -4           -7            See Table VI
4-4-20-fluorescein           -12.3         -16           See Table VII

a The experimentally determined ΔG value.

b The calculated ΔG value.

c The original calculation reported by Novotny et al. (1989) was carried out on the Amit et al. (1986) X-ray coordinates. The crystal structure has since been refined and several water molecules were found at the antigen-antibody interface; however, empirical ΔG calculations have not yet been repeated on the refined coordinates.
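The comparison made in the text (calculated ΔG close to experiment except for HyHEL-5 and NC41) can be reproduced with a small Python sketch. The values are the first five rows of Table V, with the calculated HyHEL-5 entry read as -32 kcal; the 5 kcal tolerance and the function name are arbitrary illustrative choices, not criteria used by the authors.

```python
# (complex, experimental dG, calculated dG) in kcal, first five rows of Table V
TABLE_V = [
    ("D1.3-lysozyme", -11.4, -9.0),
    ("HyHEL-5-lysozyme", -14.2, -32.0),
    ("HyHEL-10-lysozyme", -13.0, -13.0),
    ("NC41-N9 neuraminidase", -9.7, -22.0),
    ("NC10-N9 neuraminidase", -10.8, -13.0),
]

def outliers(rows, tolerance_kcal=5.0):
    """Names of complexes whose calculated dG deviates from experiment by
    more than tolerance_kcal (the tolerance is an illustrative assumption)."""
    return [name for name, exp, calc in rows if abs(calc - exp) > tolerance_kcal]

flagged = outliers(TABLE_V)
print(flagged)
```

Under this tolerance only the two complexes named in the text as exceptions are flagged.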

TABLE VI
Phenyl Arsonate-36-71 Fab Complex: Calculated ΔG_residue and Experimental ΔΔG Values^a

Residue     Contact surface (Å²)   ΔG_HB^b   ΔG_EL^c   N, torsion   -TΔS_CF^d   Total
Asn H35      0.8                    0        -0.7      0            0           -0.7
Ser H99      0.8                    0        -0.7      0            0           -0.7
Trp H47      0.6                    0        -0.4      0            0           -0.4
Tyr H50     32.4                   -0.8      -0.2      1.5          0.9         -0.4
Ala H59      3.4                   -0.1       0        0            0           -0.1
Phe H108     1.6                    0         0        0            0            0
Tyr H106    25.9                   -0.6      -0.1      2.0          1.2         +0.2
Arg L96     17.0                   -0.4      -4.1      1            0.6         -3.9
Leu L94     15.9                   -0.4       0        1            0.6         +0.2

ΔΔG_EXP column entries (five residues): >-3.5, -3.4, -2.3, -3.5, -3.1

a Values in kcal/mol. The experimental ΔΔG_EXP values (Sompuram and Sharon, 1993) represent the measured differences between the ΔG of the wild type and that of the Ala mutant at the same position. Phenyl arsonate partial atomic charges were derived from STO-3G Gaussian90 calculations (T. Stouch and J. Novotny, 1991, unpublished results); the arsonate group was assumed to be monoionic (Pressman and Grossberg, 1968). Residue numbers are consecutive through the polypeptide chain.

b The hydrophobic term; see Eq. (10).

c The electrostatic term; see Eq. (11).

d The -TΔS_CF (conformational entropy) term; see Eq. (12).
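Each Total entry is the sum of the three per-residue terms. A minimal Python check on three rows of Table VI, copied as listed, is sketched below; the function name is hypothetical, and other rows of the table may differ from the plain sum in the last digit.

```python
# (residue, dG_HB, dG_EL, -T*dS_CF) in kcal/mol, copied from Table VI
ROWS = [
    ("Asn H35", 0.0, -0.7, 0.0),
    ("Arg L96", -0.4, -4.1, 0.6),
    ("Leu L94", -0.4, 0.0, 0.6),
]

def residue_total(dg_hb, dg_el, minus_t_ds):
    """Sum of hydrophobic, electrostatic, and conformational entropy terms."""
    return dg_hb + dg_el + minus_t_ds

totals = {name: round(residue_total(hb, el, ts), 1) for name, hb, el, ts in ROWS}
print(totals)
```

For Arg L96 the large electrostatic term dominates, giving the -3.9 kcal/mol listed in the Total column.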

TABLE VII
Fluorescein-4-4-20 Fab Complex: Calculated ΔG_residue and Experimental ΔΔG Values^a

Residue     Contact surface (Å²)   ΔG_HB^b   ΔG_EL^c   N, torsion   -TΔS_CF^d   Total
Gly H104     6.5                   -0.2      -0.8      0            0           -1.0
Trp H33     48.1                   -1.2      -0.1      1            0.6         -0.7
Arg H52      0                      0        -0.7      0            0           -0.7
Arg H74      0                      0         0        0            0           -0.5
Tyr H103    29.6                   -0.7      -0.9      2            1.2         -0.4
Lys H54      0                      0         0        0            0           -0.3
Lys H67      0                      0         0        0            0           -0.2
Tyr H56     39.9                   -1.0       0        2            1.2         +0.2
Tyr H102    16.2                   -0.4      -0.2      2            1.2         +0.6
Ser H101     5.5                   -0.1      -0.2      2            1.2         +0.9
Asp H31      5.4                   -0.1       0.8      1            0.6         +1.3
Arg L39      4.2                   -0.1      -3.3      0            0           -3.4
Lys H55      0                      0        -2.2      0            0           -2.2
His L31     32.1                   -0.8       0        1            0.6         -0.2
Gln L33      6.9                   -0.2       0        0            0           -0.2
Ser L96      8.6                   -0.2      -1.2      2            1.2         -0.2
Ser L94      5.0                   -0.1       0        0            0           -0.1
Trp L101    17.9                   -0.4      -0.1      1            0.6         +0.1
Phe L103     5.7                   -0.1       0        1            0.6         +0.5
Tyr L37     36.5                   -0.9       0.5      2            1.2         +0.8

ΔΔG_EXP column entries (three residues): -2.2, 0, -2.4

a Values in kcal/mol. The experimental ΔΔG_EXP values (Denzin et al., 1993) represent the measured differences between the ΔG of the wild type and that of the Ala mutant at the same position. Fluorescein partial atomic charges were derived from STO-3G Gaussian90 calculations (T. Stouch and J. Novotny, 1991, unpublished results). Residue numbers are consecutive through the polypeptide chain. In this crystal structure of the 4-4-20 complex, one molecule of the solvent 2-methylpentane-2,4-diol is found in a cavity inside the Fv fragment beneath fluorescein but makes no contact with the hapten. The K_0 of complex formation in methylpentanediol is different (lower) from that in physiological solution (Herron et al., 1989). The methylpentanediol molecule was present in the empirical ΔG calculation.

b The hydrophobic term; see Eq. (10).

c The electrostatic term; see Eq. (11).

d The -TΔS_CF (conformational entropy) term; see Eq. (12).

and NC41 antineuraminidase complexes, antiphenylarsonate 36-71 (Table VI), antiphosphorylcholine McPC603, and the antifluorescein 4-4-20 (Table VII) complexes. Often, the relative ranking of residues agrees in the experiment and the calculations but the absolute values differ, sometimes by as much as 3 kcal (e.g., Table VI). The calculated ΔG_residue of Tyr H106 in the 36-71 antibody, Tyr H33 in McPC603, and Trp L101 in 4-4-20 are all in error due to an overestimation of the side-chain conformational entropy term. These side chains were probably constrained to a limited set of χ1 torsions even prior to complex formation. On the other hand, some agreements of the calculated residue contributions with experimental ΔΔG data are notable. Kam-Morgan et al. (1993), working on the HyHEL-10 complex, measured ΔΔG values for several mutants in the Arg-21 and Asp-101 positions and compared them with the ΔG_residue contributions calculated by Novotny (1991). Mutagenesis of Asp-101 was well rationalized by the calculated ΔG_HB, ΔG_EL, and TΔS_CF terms. The Arg-21 ΔG_residue contribution, originally calculated as -11 kcal/mol, was measured to be only about -2 kcal/mol. Tulip et al. (1994) compared binding energy attributions in complexes of the influenza N9 neuraminidase with two different antibodies, NC41 and NC10. In this work, neuraminidase mutants with side-chain substitutions at the energetically important positions (i.e., at sites for which the calculated ΔG_residue < -1 kcal/mol) did not bind the antibody, while neutral mutations (at sites where ΔG_residue > -1 kcal/mol) had no effect on binding, as tested against the binding data reported by Nuss et al. (1993) and Webster et al. (1987). The trend was valid for NC41 in 19 out of 27 cases at 13 neuraminidase sites, or 24 out of 27 if steric clashes and backbone hydrogen bonds, immutable by side-chain replacements, are taken into account. It was valid for NC10 in five out of seven neuraminidase sites [corrected experimental data of Gruen et al. (1994) as opposed to the seven out of seven correlation originally reported by Tulip et al. (1994)]. Describing the effect of mutations made in the D1.3 antilysozyme combining site, Hawkins et al. (1993) noted that “much of the energetics of interaction seems to be driven by contacts from . . . the segment G117 to Q121 of lysozyme,” and that, in the antibody, VH residues T30, Y32, R99 and VL residues Y50, T53, and S93 were less important.
These rankings are in a good overall accord with those given in Novotny et al. (1989). In the most recent coordinates of the D1.3 complex (Bhat et al., 1994) water molecules were found at the interface that were not considered in the earlier calculations. The question of water-mediated binding is discussed separately in Section VII,D. Hawkins et al. (1993) also remarked that “the number of contacts appears to be at least as reliable a guide to predicting the energetics of the interaction of the D1.3 antibody and lysozyme as semi-empirical calculations.” However, in the Tulip et al. (1994) calculations, retention of binding in the I368R mutant, at the spatial center of the epitope, was successfully predicted while, e.g., the K432N mutation at the edge of the interface markedly reduced binding, an effect expected from the calculations. Perhaps the most important effects suggested by the calculations and subsequently borne out by experimental evidence were those involving (1)


the existence of an “energetic epitope” (Novotny et al., 1989; Jin et al., 1992), more fully discussed in Section VIII, and (2) the very different attribution of binding affinities in the two overlapping neuraminidase epitopes (also see Section VIII). Later calculations employ a significantly improved formula for electrostatics (Bruccoleri et al., 1996), a hydrophobicity term based on scaling contact molecular surfaces, and conformational entropy estimates enumerated by uniform conformational sampling of all the side-chain torsional degrees of freedom in CONGEN. A blind test of the method, attempting to reproduce ΔΔG values measured on 10 lysozyme single-chain mutants that affected binding of the HyHEL-10 antibody (J. Kirsch, University of California, Berkeley, personal communication; Novotny, Bruccoleri, Davis, and Sharp, manuscript in preparation), yielded encouraging results, as shown in Fig. 14.


3. Binding Energies from Calorimetric Data

Murphy et al. (1993) carried out a calorimetric study of complex formation between the Fab fragment of the antibody 131 and its antigen, angiotensin II. Association of the two molecules was accompanied by an enthalpy change (ΔH) of -8.9 ± 0.7 kcal mol⁻¹ and a heat capacity change (ΔC_p) of -240 ± 20 cal K⁻¹ mol⁻¹. From these values, the free energy change of the reaction, ΔG, at 30°C was estimated as -11 kcal mol⁻¹ with a ΔS component of 6.9 cal K⁻¹ mol⁻¹ (TΔS ≈ 2.3 kcal). Thus, complex formation was favored both enthalpically and entropically. Structural interpretations of ΔC_p and ΔH changes invoked proportionality between accessible polar (ΔA_pol) and apolar (ΔA_ap) contact surface areas calculated from the structure of the complex, and the ΔC_p and ΔH values (Murphy and Freire, 1992; Privalov and Makhatadze, 1990; Spolar et al., 1992). Thus,

$$\Delta C_p = 0.45\,\Delta A_{\mathrm{ap}} - 0.26\,\Delta A_{\mathrm{pol}} \tag{15}$$

The enthalpy change was related to the temperature at which the apolar contribution was assumed to be zero, T*_H ≈ 100°C:

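$$\Delta H = \Delta H^{*} + (\Delta C_{p,\mathrm{ap}} + \Delta C_{p,\mathrm{pol}})(T - T^{*}_{H}) \tag{16}$$

$$\Delta H^{*} = 35\,\Delta A_{\mathrm{pol}} \tag{17}$$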

To derive these estimates from the X-ray structure of the complex, no structural changes were assumed in the antibody binding site, and an arbitrary extended conformation of angiotensin was used to estimate the intramolecular surface changes accompanying the assumed change in angiotensin conformation on complex formation.

[Figure 14: plot of calculated ΔΔG values against experimental ΔΔG values (kcal) for the lysozyme mutants.]

FIG. 14. Empirical Gibbs free energy estimates on 10 single-residue hen egg lysozyme mutants complexed with the HyHEL-10 antibody. (The experimental data are courtesy of J. Kirsch, University of California, Berkeley.) The calculated ΔΔG values (mutant - wild type) are based on CONGEN-generated coordinates of the respective mutants starting from the X-ray crystallographic coordinates of the wild-type lysozyme-HyHEL-10 complex [e.g., in the W63Y mutant the lysozyme Trp-63 was replaced by Tyr, etc. (Novotny, Bruccoleri, Davis, and Sharp, manuscript in preparation)]. The calculations employed a scaled molecular surface ([contact area] × 70) as the hydrophobic term, a finite difference Poisson-Boltzmann algorithm with dielectric boundary and charge smoothing-antialiasing as the electrostatic term, and conformational entropy estimates carried out by the exhaustive CONGEN enumeration of the trans and gauche torsional degrees of freedom. The correlation coefficient for all the data points is ~0.6; it is ~0.8 if the K96M mutant is ignored (the lysozyme wild-type Lys-96 is exceptional in that it participates in buried intramolecular hydrogen bonds and its substitution for Met is likely to lead to global structural changes). The average error for the latter nine ΔΔG values, comparing the experiment and the calculation, is ±2.8 kcal.

Based on Eq. (15), a ΔC_p value of -250 cal K⁻¹ mol⁻¹ was obtained, in close agreement with the experimental results, and the ΔH estimate, with the use of Eqs. (16) and (17), yielded -8.4 kcal mol⁻¹, compared to the experimentally determined ΔH of -8.9 kcal mol⁻¹, a remarkable result considering that the angiotensin conformation in free solution was not well known and its total surface areas, A_ap and A_pol, had to be approximated.
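The surface-area relations of Eqs. (15)-(17) can be evaluated as in the following Python sketch. The function names and the two buried-area values are hypothetical: the areas were picked only so that the output falls in the range of the reported numbers and are not the values used by Murphy et al. (1993). The sketch also assumes that the apolar and polar heat capacity terms of Eq. (16) sum to the ΔC_p of Eq. (15), and that the coefficients are in cal K⁻¹ mol⁻¹ Å⁻² and cal mol⁻¹ Å⁻².

```python
T_STAR_H = 100.0  # deg C, temperature at which the apolar enthalpy term vanishes

def delta_cp(dA_ap, dA_pol):
    """Eq. (15): heat capacity change (cal K^-1 mol^-1) from area changes in A^2."""
    return 0.45 * dA_ap - 0.26 * dA_pol

def delta_h(dA_ap, dA_pol, temp_c):
    """Eqs. (16)-(17): enthalpy change (cal mol^-1) at temp_c (deg C)."""
    dh_star = 35.0 * dA_pol
    return dh_star + delta_cp(dA_ap, dA_pol) * (temp_c - T_STAR_H)

# Hypothetical buried areas in A^2 (negative = surface removed from solvent).
dA_ap, dA_pol = -980.0, -730.0
print(delta_cp(dA_ap, dA_pol), delta_h(dA_ap, dA_pol, 30.0) / 1000.0)
```

Because T - T*_H is negative at 30°C and ΔC_p is negative, the heat capacity term partly offsets the large negative ΔH*, which is why the net enthalpy is sensitive to the estimated polar area.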


The entropy change in the complexation was assumed to be related to the heat capacity change as

$$\Delta S = \Delta S^{*} + (\Delta C_{p,\mathrm{ap}} + \Delta C_{p,\mathrm{pol}}) \ln(T/T^{*}_{S}) \tag{18}$$

where T*_S, the temperature at which the apolar contribution to the entropy change is zero, equals 112°C, and ΔS*, the residual entropy change, is interpreted as consisting of configurational and “other statistical” contributions to ΔS:

$$\Delta S^{*} = \Delta S_{\mathrm{bb}} + \Delta S_{\mathrm{sc}} + \Delta S_{\mathrm{mix}} \tag{19}$$

where ΔS_bb is the change in backbone torsional degrees of freedom, ΔS_sc is the change in side-chain torsional degrees of freedom, and ΔS_mix is the change in the number of particles in solution. The values of the entropy estimates, i.e., the terms of Eq. (19), were substituted from literature data and the overall ΔS estimate fell in the range 7-9 cal K⁻¹ mol⁻¹, in good agreement with the experimental value of 6.9 cal K⁻¹ mol⁻¹. These results were interpreted as showing both the loss of configurational entropy and a larger entropy gain from solvent release due to the hydrophobic effect on complexation. Enthalpically, binding was also favored by hydrogen bonding. It can be said that the main focus of the Murphy et al. (1993) work was on finding structural correlates of the extensive thermodynamic functions of state rather than approximating the absolute ΔG of the reaction from its atomic components. Structural interpretations of protein thermodynamics (Kauzmann, 1959) have had a long and fruitful tradition (Privalov, 1979); however, some important issues are still unresolved (Sturtevant, 1994; Naghibi et al., 1995). For example, the hydrophobic effect is considered to be a manifestation of water entropy changes induced by solutes but, depending on the reference state with respect to which the effect is measured (aliphatic alcohol, vacuum) and the theoretical framework used (classical or Flory-Huggins theory; Sharp et al., 1991; Sitkoff et al., 1994; Chan and Dill, 1994), differences in solute-solvent interactions vary in magnitude, contain varying amounts of enthalpic and entropic (mixing volume) contributions, and may enter into the free energy balance with different significance. Electrostatic interactions, on the other hand, are mostly considered to be of enthalpic origin, yet they involve desolvation of charges on attainment of compact solute states (folded protein, protein-protein complex).
Solvation-desolvation events are accompanied by changes in the entropy of water (electrostriction) so large that they almost certainly overshadow the hydrophobic effect encountered at nonpolar surfaces. Efforts trying to relate any macroscopic theory (thermodynamics) to the microscopic, atomic description of matter are essential for an understanding of biological specificity; however, the phenomenological gap that must be overcome is very wide indeed. Thus, the development of empirical atomic rules of specificity may have a better chance of success when focusing on the interpretation of just one, the major, experimental observable (ΔG and the binding constant, K_D) rather than dissecting this observable into its individual macroscopic parts (calorimetric enthalpy and entropy). This is because every thermodynamic quantity of state represents an equally bewildering mix of microscopic events, each of them as complex in its atomic origins as the parent one and each involving mixtures of solute and solvent effects. Thus, by dividing the problem, one may paradoxically make it more and more complex, in proportion to the number of functions of state being considered. All the (unobservable) microscopic states and their transitions manifest themselves to us only indirectly, as macro observables. The interfacial tension, for example, is not only a direct measure of hydrophobicity, reflecting, in ensemble, all the events happening on solute-solvent mixing, but also a quantity proportional to the solute molecular surface. Molecular surface is a straightforward attribute of structure and, as such, is readily amenable to measurement and analysis. The advantage of simple empirical rules such as the correlation between the hydrophobic effect and solute surface is in circumventing the microscopic complexity of the solution and substituting an aspect of solute structure instead. In this way, quantitation of hydrophobicity can be easily carried out based on the structure of the solute alone.

4. Antigen-Antibody Docking

One way of probing intermolecular interactions is to carry out antigen-antibody docking simulations in a computer. A successful computer docking experiment requires (1) a rapid and efficient generator of the many stereochemically acceptable contact orientations of the two molecules, and (2) a robust Gibbs functional that can correctly identify the native complex among the many antigen-antibody pairs generated in the course of orientational sampling. Because of the important role of the Gibbs functional, the docking problem is similar in spirit to the theories of binding energy attribution discussed previously. However, the importance of shape complementarity has been highlighted by the work of Norel et al. (1994), who reported successful docking on the basis of surface shape matching alone. Building on the earlier work of Connolly (1986), they represented molecular surfaces by “critical points” describing prominent holes and knobs as minima and maxima of a shape function. Their automatic algorithm considered the entire molecular surfaces of 16 protein pairs from the known


complexes, but no additional information about the structure of the binding sites. Fifteen complexes were successfully docked, including the antibody-lysozyme complexes HyHEL-5 and HyHEL-10. The most accurate ab initio prediction of antibody-antigen association, to within 1.6 Å of the X-ray structure, was reported by Totrov and Abagyan (1994) for lysozyme and the HyHEL-5 antibody. Their docking algorithm used an original, “biased probability,” Monte Carlo procedure. The Metropolis et al. (1953) Monte Carlo algorithm is a succession of random steps in the generalized coordinate space followed by energy evaluation of the new state, E, and its acceptance in proportion to the Boltzmann factor,

$$\exp(-\Delta E / kT)$$

In the biased method, the random step and its acceptance are modified by a probability function that favors the energetically most preferred regions of the configurational (coordinate) space. The biased algorithm does not waste much time sampling the forbidden regions of the energetic landscape, the main problem associated with Metropolis Monte Carlo searches. Another important innovation introduced by Abagyan et al. (1993) was consistent use of the internal coordinate space, rather than the Cartesian coordinate space, for configurational sampling. The Gibbs functional used to evaluate the calculated antigen-antibody configurations was that of Abagyan et al. (1993) and Abagyan and Totrov (1994). It consisted of three terms: (1) surface energy, (2) electrostatic polarization free energy, and (3) side-chain entropy. Although conceptually similar to the Gibbs functional described in Section VII,C,1 [Eqs. (10)-(12)], its formal implementation was different. Thus, surface terms employed the Eisenberg and McLachlan (1986) solvation parameters to scale the solvent-exposed and/or contact surfaces. To this term a precomputed conformational entropy term was added that approximated side-chain conformational entropy changes in the individual side-chain types. Side-chain entropy estimates were made on the basis of preferred conformational zones, essentially rotamer libraries:

$$S = -R \sum_{v=1} P_v \ln P_v$$

where P_v is the probability of the vth state and R is the gas constant, corrected if necessary for an additional number of states:

$$S_{\mathrm{add}} = -R \ln(N_{\mathrm{add}}) \tag{21}$$

where N_add is the number of additional states. The electrostatic term, the


modified image electrostatics, makes use of (1) a rigorous analytical solution to dielectric boundary effects for an ideal spherical body (Kirkwood, 1934; Friedman, 1975), and (2) a fast approximation (a surface projection) of this solution to the irregular shape of a protein. The final electrostatic equations were of the Coulombic type and contained, in addition to the partial atomic charges, Q_i, the fictitious image charges, Q_i^im, created at the dielectric boundary:

where ε_w and ε_p are solvent and protein dielectric constants, R is the spherical protein radius, and x_i is the distance from the ith atom to the center of the sphere. Cherfils et al. (1991), in their docking of the lysozyme-HyHEL-5 complex, employed simplified protein models with one sphere per residue and simulated annealing algorithms driven by a pseudo energy function proportional to the protein interface area. Docked complexes were subjected to conformational energy refinement with full atomic detail. Although a near-native complex configuration was generated and identified as a low-energy one, some other nonnative complexes could not be rejected based on the criteria used. Jiang and Kim (1991) developed a “soft” docking algorithm utilizing a cubical grid and a full molecular mechanics potential. When applied to the lysozyme-HyHEL-5 antibody complex, the correct docking solution was found to be among the top 500 configurations out of about 20,000 generated. Independently, Walls and Sternberg (1992) developed a soft algorithm that allowed for structural changes during docking. Docked structures were evaluated quantitatively based on protein surface complementarity and a simple electrostatic model that screened out unfeasible interactions. When applied to the HyHEL-5, D1.3, and HyHEL-10 antibody-lysozyme complexes, the method identified between 15 and 40 possible docking orientations with the native structures being ranked 3rd, 5th, and 30th. Pellegrini and Doniach (1993) reported on computer docking experiments involving lysozyme complexes with D1.3, HyHEL-5, and HyHEL-10 that employed rigid structures and a two-step approach. First, a coarse-grained pairwise atomic potential of the Sippl (1990) type was used to bring the two molecules together. The configurations obtained were then refined with use of the all-atom OPLS potential of Jorgensen and Tirado-Rives (1988) and a distance-dependent dielectric function.
The native configuration was consistently found to be the preferred solution for all three complexes. Friedman et al. (1994) docked epitopic fragments (heptapeptides) to the binding site of the antipeptide antibody B13I2 (both the free and complexed X-ray structures), using the Metropolis Monte Carlo docking program of Goodsell and Olson (1990). The peptides Pro-His and Val-Pro-His, which contain residues experimentally identified as important for binding, docked correctly to both antibody structures, but all larger peptides docked correctly only to the complexed Fab, even when torsional flexibility was allowed in the ligand.

5. Hypothesis of Functional (Energetic) Epitopes

Immunochemists often observed that a small portion of an antigenic determinant was of crucial importance in defining its specificity. The term immunodominant (Sela, 1969, quoting Heidelberger) has commonly been used to describe this phenomenon. Does immunodominance have an identifiable molecular basis? The empirical Gibbs free energy calculations for antibody-antigen (Novotny et al., 1989; Novotny, 1991; Tulip et al., 1994) and enzyme-inhibitor (Krystek et al., 1993) complexes consistently indicated that only a small number of amino acids, approximately 30% of the total contact surface area, contributed actively to binding energetics. In the antibodies, the bottom part of the antigen binding cavity often dominated the energetics of binding, whereas in lysozyme, the energetically most important residues defined small (2.5 to 3 nm²) energetic epitopes. Thus, a concept of protein antigenicity emerged that invoked the active, attractive contributions mediated by the energetic antigenic epitopes and the passive surface complementarity contributed by the surrounding contact area (see also Section VIII). The concept offered resolution of an apparent paradox: on the one hand, a multitude of side-chain-side-chain interactions at the interface [~16 side chains in the antibody as well as in the antigen; see, e.g., Amit et al. (1986)] and, on the other hand, the experimentally derived size of the binding site as four to six amino acids (Haber et al., 1967; Sela, 1969; Schechter, 1971; Kabat, 1970).
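As a rough illustration of how a set of energetically dominant residues might be read off a table of per-residue contributions, the Python sketch below applies a -1 kcal/mol cutoff to the Table VI values and reports the share of the contact surface belonging to the selected residues. The cutoff is borrowed from the Tulip et al. (1994) criterion quoted earlier and its use here is an assumption; the function names are hypothetical. Table VI lists antibody residues contacting a small hapten, so the resulting fraction is not expected to reproduce the 30% figure quoted for protein antigens.

```python
# (residue, contact surface in A^2, total dG_residue in kcal/mol) from Table VI
TABLE_VI = [
    ("Asn H35", 0.8, -0.7), ("Ser H99", 0.8, -0.7), ("Trp H47", 0.6, -0.4),
    ("Tyr H50", 32.4, -0.4), ("Ala H59", 3.4, -0.1), ("Phe H108", 1.6, 0.0),
    ("Tyr H106", 25.9, 0.2), ("Arg L96", 17.0, -3.9), ("Leu L94", 15.9, 0.2),
]

def energetic_residues(rows, cutoff=-1.0):
    """Rows whose total contribution is more favorable than the cutoff."""
    return [row for row in rows if row[2] < cutoff]

def area_fraction(rows, cutoff=-1.0):
    """Share of the total contact surface contributed by energetic residues."""
    total = sum(area for _, area, _ in rows)
    hot = sum(area for _, area, _ in energetic_residues(rows, cutoff))
    return hot / total

names = [row[0] for row in energetic_residues(TABLE_VI)]
print(names, round(area_fraction(TABLE_VI), 2))
```

Here a single residue, Arg L96, passes the cutoff while accounting for under a fifth of the tabulated contact surface.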
A number of experimental data consistent with the hypothesis of an energetic epitope have accumulated, including those obtained on antibodies to human growth hormone (Jin et al., 1992; Cunningham and Wells, 1993),³ human placental lactogen (Lowman et al., 1991), the λ repressor (Breyer and Sauer, 1989), the antilysozymes HyHEL-10 (Lavoie et al., 1992) and D1.3 (Hawkins et al., 1993), the anti-N9 neuraminidase NC41 (Nuss et al., 1993), anticytochrome c (Mylvaganam et al., 1991), and anticyclosporin (Rauffer et al., 1994) antibodies. Smythe and von Itzstein (1994) accepted the concept of a functional epitope as a starting point for their synthesis of a biologically active, constrained cyclic peptide that mimicked the NC41 antineuraminidase antibody. The size of the functional epitope varies somewhat according to the criterion used for its definition. Jin et al. (1992) found that, per epitope, the number of alanine substitutions causing a >2- or >20-fold effect on binding affinity was on average eight and three, respectively, which is more in accordance with Nuss et al. (1993) than with the CONGEN calculations (Tulip et al., 1994; Novotny et al., 1989). Part of the discrepancy may arise because the large negative ΔG_residue values of some residues in the CONGEN functional epitopes are due to favorable contacts made by main-chain atoms only (Arg-327 in N9-NC41) and can be “invisible” to mutagenesis. In the utmost limit of low molecular weight haptens with no formal charges and only one or two functional groups (e.g., digoxin; see Near et al., 1993), the functional epitope concept may not be applicable (Webster et al., 1994). In the calculations reported by Novotny et al. (1989) and Tulip et al. (1994), charged residues were high on the list of energetic residues. Jin et al. (1992), Kelley and O’Connell (1993), and Cunningham and Wells (1993) also reported that charged residues played a prominent role in most functional epitopes and that out of the average number of 8 residues causing a >2-fold reduction in affinity, on the average 2.8 are charged. The functional epitopes are mostly discontinuous (Jin et al., 1992; Tulip et al., 1994). As for predictions that some of the contact residues act in a repulsive manner and destabilize the complex, further experimental data are required to confirm that this is the case. The alanine scan of Jin et al. (1992) sometimes identified side chains that hindered antibody binding. Getzoff et al. (1988) reviewed the evidence indicating that, in heteroclitic antibodies, the contributions of some residues to binding affinity can be increased.

³ Jin et al. (1992) suggested the term functional epitope instead of energetic epitope.

D. Water-Mediated Binding

The complex of D1.3 antibody with hen egg white lysozyme is currently one of the best resolved antigen-antibody complexes, at 1.8 Å (Bhat et al., 1994). At this resolution, about 50 water molecules were reported at or near the interface, with few of them actually trapped at the interface. The structure of the lysozyme complexed with the D1.3 mutant W92D (VL domain) was also solved (Ysern et al., 1994). Titration calorimetry of W92D mutant complex formation showed that the ΔΔG of the reaction (approximately -4 kcal, wild type minus mutant) could be attributed to a smaller negative binding enthalpy (3.8 kcal) with few net changes in binding entropy. In the structure of the mutant, two water molecules occupied the space created by the smaller size of the Asp side chain compared to Trp.
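The magnitude of this effect can be put in perspective by converting the free energy difference into a ratio of association constants through ΔΔG = -RT ln(K_wt/K_mut). The following Python sketch does so for the values quoted above. The temperature of 298.15 K, the sign convention (wild type minus mutant, both differences negative), and the function names are assumptions made here for illustration; they are not taken from Ysern et al. (1994).

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def affinity_ratio(ddg_kcal, temp_k=298.15):
    """Ratio of association constants implied by a free energy difference."""
    return math.exp(abs(ddg_kcal) / (R_KCAL * temp_k))

def entropic_remainder(ddg_kcal, ddh_kcal):
    """The -T*ddS term left after subtracting ddH from ddG (ddG = ddH - T*ddS)."""
    return ddg_kcal - ddh_kcal

# Values quoted in the text; the temperature is an assumption.
ratio = affinity_ratio(-4.0)
remainder = entropic_remainder(-4.0, -3.8)
print(f"affinity ratio (wild type / mutant): about {ratio:.0f}")
print(f"entropic remainder, -T*ddS: {remainder:+.1f} kcal/mol")
```

Under these assumptions a difference of 4 kcal/mol corresponds to an affinity ratio of several hundred-fold, and the enthalpy term accounts for all but about 0.2 kcal/mol of it.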

The phenomenon of water-mediated binding has been most extensively discussed for the trp repressor-operator system (e.g., Shakked et al., 1994). There, the specificity of DNA-protein interaction could be explained not only by direct hydrogen bonding but also by water-mediated hydrogen bonds. Comparison of the X-ray structures of the free and bound states of the DNA operator regulatory sequence made it clear that “the three hydration sites used to mediate protein contacts to the three critical bases of the operator half-site sequence are already fully occupied in the free DNA. The water molecules can thus be regarded as non-covalent extensions of the DNA bases which may be used as stereospecific recognition elements of the DNA target sequence” (Shakked et al., 1994).

A full account of water-mediated interactions in antigen-antibody complexes requires knowledge of whether any given water molecule was bound to a protein prior to complex formation, or whether it became passively trapped at the interface. In the first instance, a prebound water molecule can be looked at as another protein side chain, and its interactions in the complex can be evaluated in a straightforward manner based on its protein contact and its partial atomic charge (see Section VII,C). In the second instance, the protein-water-protein complex becomes a ternary complex of a “solute” water molecule and two protein molecules. In this case, it is necessary to estimate correctly the entropy decrease of the system due to imprisonment of the water molecule at the interface. A water molecule bound to protein may acquire new, productive water-protein interactions (hydrogen bonds, van der Waals contacts) at the expense of those existing in bulk water prior to complex formation. Any one of the four possible water-protein hydrogen bonds would stabilize the complex by a varying amount, depending on the quality of the bond and the partial charge of the participating protein atom.
In bulk water at room temperature, there are ~3.5 hydrogen bonds per molecule (Lemberg and Stillinger, 1975) and the average hydrogen bond energy of liquid water is probably in the range 2-3 kcal/mol (Eisenberg and Kauzmann, 1969). Thus, to recover the bulk interaction energy, a protein-entrapped water molecule should gain at least ~8 kcal in hydrogen bonding to protein groups, i.e., nearly 3 kcal per H bond if, as reported by Williams et al. (1994), three H bonds per molecule is the most common form of water-protein interaction. The entropic cost of the transfer and complete immobilization of a water molecule from the liquid to the protein has been estimated to be ~2 kcal/mol (Dunitz, 1994). In most situations, the free energy associated with entrapment of water in the complex is expected to carry a ΔG close to zero at best, and would probably be unfavorable in many cases. This may be the reason why, in tight protein-protein complexes, intersurface-bound water is a relatively rare phenomenon.
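The bookkeeping behind this estimate can be laid out explicitly. The short Python sketch below is only an illustration of the arithmetic: it takes the numbers quoted above at face value, treats the enthalpic and entropic terms as simply additive, and uses function names invented here.

```python
def bulk_hbond_energy(bonds_per_molecule=3.5, e_low=2.0, e_high=3.0):
    """Range (kcal/mol) of hydrogen-bond energy a water molecule has in bulk."""
    return bonds_per_molecule * e_low, bonds_per_molecule * e_high

def required_per_bond(bulk_energy=8.0, protein_bonds=3):
    """Energy each water-protein H bond must supply to match the bulk."""
    return bulk_energy / protein_bonds

def entrapment_free_energy(e_per_bond, protein_bonds=3,
                           bulk_energy=8.0, entropy_cost=2.0):
    """Net free energy of trapping one water (positive = unfavorable).

    Assumes simple additivity of the terms quoted in the text.
    """
    return -e_per_bond * protein_bonds + bulk_energy + entropy_cost

low, high = bulk_hbond_energy()
print(f"bulk H-bond energy: {low:.1f} to {high:.1f} kcal/mol")
print(f"needed per protein H bond: {required_per_bond():.2f} kcal/mol")
print(f"net with three 3-kcal bonds: {entrapment_free_energy(3.0):+.1f} kcal/mol")
```

With three water-protein hydrogen bonds of 3 kcal/mol each, the balance in this simple model comes out slightly unfavorable, in line with the expectation of a ΔG close to zero at best.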

VIII. MOLECULAR BASIS OF PROTEIN ANTIGENICITY

The early serological experiments of Landsteiner (1962) and others showed that virtually any chemical structure attached to a protein molecule can elicit an antigenic response. The early era of hapten research has given way to the current age of antigenicity research on peptides and proteins. The comparison of properties of antibodies elicited with protein antigens, on the one hand, and their mimics, short peptides, on the other hand, has formed most of our views (Sela, 1969; Benjamin et al., 1984; Tainer et al., 1985; Novotny et al., 1987; Getzoff et al., 1988; Colman, 1988) on the molecular origin of antigenicity. In discussing the molecular basis of antigenic response, it is important to distinguish between the terms antigenicity and immunogenicity (Benjamin et al., 1984). Antigenicity refers to the ability of a protein surface region to be potentially antigenic, while immunogenicity refers to the ability of any antigenic site to elicit such a response under particular experimental conditions such as the immunization protocol, and the genetic constellation of the organism [an “immunopotent” determinant according to Sela (1969)]. However, bona fide antigenic sites may not be immunogenic in certain experimental situations (“immunosilent,” Sela, 1969). Identification of antigenic epitopes can be based only on indirect experimental procedures, such as methods involving the comparative strength of binding of the same specific antibody to homologous proteins with a small number of amino acid replacements (Benjamin et al., 1984); NMR hydrogen-deuterium exchange experiments (Paterson et al., 1990); and X-ray crystallography of antibody-antigen complexes (Amit et al., 1986). Antiprotein antibodies sometimes specifically recognize short peptides (tetra- to hexapeptides), and such antibodies can be elicited by synthetic peptide antigens.
Often a single native conformation of a peptide is recognized, such as the disulfide-bonded loop peptide of lysozyme: antiloop antibodies do not react with peptides in which the disulfide bond has been reduced (Arnon et al., 1971). The majority of antigenic sites in proteins, however, seem to consist of amino acids that are not contiguous in the amino acid sequence (composite or discontinuous epitopes). This is simply a consequence of the large contact area between antibodies and antigens (~800 Å²), and the low probability that such a large surface would be contributed by a contiguous polypeptide segment (Barlow et al., 1986). Based on a long history of experimental work, some researchers concluded that several discrete antigenic sites exist on protein surfaces (e.g., Atassi, 1975, 1978), implying that certain surface regions are more antigenic than others. Other researchers have argued that many more mutually overlapping epitopes exist on protein surfaces and that the whole protein surface is antigenic (Benjamin et al., 1984).

A. Segmental Flexibility and Surface Exposure

Westhof et al. (1984) and Tainer et al. (1984) noticed a correlation between the average backbone crystallographic B factors and locations of antigenic sites and proposed that segmental flexibility (assumed to be associated with the cause of high B factor values) is an important component of antigenicity (Tainer et al., 1984, 1985). Implicit in this proposal was the notion that most antigen-antibody interactions are accompanied by an induced fit in the antigen, and that antigenic epitopes frequently rearrange their conformations to maximize productive noncovalent (electrostatic, van der Waals) interactions with binding sites (Geysen et al., 1987; Getzoff et al., 1987). An alternative antigenic theory was suggested by Padlan (1985), who proposed that the antigenic potential of a polypeptide segment was a simple additive function of atomic properties such as surface exposure and polarity.

The segmental flexibility theory of antigenicity was challenged by several groups (Novotny et al., 1986b; Fanning et al., 1986; Thornton et al., 1986) on the grounds that (1) B factors represented parameters combining the effects of thermal mobility and static crystalline disorder into one measure, often making it difficult to correlate them unequivocally with either the static or the dynamic aspects of the structure; and that (2) molecular properties other than segmental flexibility (in particular, surface protrusion) were also correlated with protein antigenic sites, and the average backbone B factors. For example, the prominent antigenic epitopes may simply be the most protruding parts of the surface, easily accessible to the large antibody molecules. In fact, correlation among surface protrusion (static accessibility), segmental flexibility, and antigenicity was so strong that it was difficult to design experiments that would isolate the relative importance of these various properties.
FIG. 15. Ribbon diagram of the scorpion neurotoxin fold. Disulfide bonds are shown as light sticks and large probe-exposed loops (antigenic regions) are highlighted in a dark color. The N terminus of the polypeptide chain is approximately in the middle of the figure and the C-terminal Cys is at the lower left. Two turns of an α helix are visible at the upper right, and beneath the α helix is a three-stranded β sheet.

In this context, analysis of flexibility and antigenicity properties in scorpion neurotoxins, small molecules of 46 amino acids containing four disulfide bridges (Fig. 15), was particularly illuminating. The experimental work of El Ayeb et al. (1983, 1984) and Bahraoui et al. (1986) established four antigenic epitopes in the Androctonus australis neurotoxin and localized them in the amino acid sequence. Novotny and Haber (1986) calculated large-probe (r = 10 Å, comparable in size to antibody domains) accessibility profiles of the Centruroides sculpturatus neurotoxin, a molecule closely similar to that of A. australis, using the X-ray coordinates of Almassy et al. (1983). Six prominently exposed regions were identified, clustered in four surface patches that were identical to, or overlapped with, the experimental antigenic epitopes. Next, Novotny and Haber (1986) carried out molecular dynamics simulations on the C. sculpturatus structure, computed average backbone B factors from the simulation, and compared results with the X-ray-derived B values. Most of the neurotoxin structure and, in particular, three out of the four antigenic sites, were found inflexible, as judged both by the computed and the crystallographic B factors (Fig. 16). The remaining flexible epitope was associated with only marginal above-average maxima of backbone B values, corresponding to rms displacements of 0.5 Å. It thus appeared that, at least in this molecule, antigenicity was determined by an exceptional surface exposure of relatively short loop segments, and that segmental flexibility was not an essential component of antigenicity. These conclusions were supported by a later study (Granier et al., 1989) based on the crystal structure of the A. australis toxin determined at 1.8 Å resolution (Fontecilla-Camps et al., 1988). On refinement to 1.3 Å resolution (Housset et al., 1994), the average backbone B factor of the A. australis toxin structure remained exceptionally low: 10 Å², with a maximum at 16.2 Å² and a minimum at 7.2 Å².
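For readers who wish to relate B factors to atomic displacements, the standard isotropic relation B = 8π²⟨u²⟩ can be used, where ⟨u²⟩ is the mean-square displacement along one direction. The Python sketch below applies it to the values quoted above. Which convention (one-dimensional or three-dimensional displacement) underlies the 0.5 Å figure in the text is not stated, so the one-dimensional form used here is an assumption, as are the function names.

```python
import math

EIGHT_PI_SQ = 8.0 * math.pi ** 2

def rms_from_b(b_factor):
    """rms displacement (Å) along one direction for an isotropic B factor (Å^2)."""
    return math.sqrt(b_factor / EIGHT_PI_SQ)

def b_from_rms(rms):
    """Isotropic B factor (Å^2) for a one-dimensional rms displacement (Å)."""
    return EIGHT_PI_SQ * rms ** 2

# B values quoted for the 1.3 Å A. australis toxin structure.
for label, b in (("minimum", 7.2), ("average", 10.0), ("maximum", 16.2)):
    print(f"{label:8s} B = {b:5.1f} Å^2  ->  rms = {rms_from_b(b):.2f} Å")

print(f"rms 0.5 Å corresponds to B = {b_from_rms(0.5):.1f} Å^2")
```

In this convention even the largest backbone B factor of the refined toxin structure corresponds to a displacement of less than half an ångström.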

FIG. 16. Large-probe accessibility contact surface and crystallographic B factor profiles for the A. australis scorpion neurotoxin molecule. Heavy line represents the smoothed accessible contact surface calculated with a spherical probe 10 Å in radius, light line represents B factors. Antigenic peptides (see Section VIII,A) are delineated by small squares at the top of the figure. (Horizontal axis: sequence number.)
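The idea of a large-probe accessibility calculation can be illustrated with a small point-counting routine in the style of Shrake and Rupley. This is a substituted technique chosen for brevity and is not the program or the contact-surface definition used by Novotny and Haber (1986). The pseudo-atom coordinates, the uniform 2.0 Å radii, and all function names below are hypothetical and serve only to show how a 10 Å probe singles out protruding atoms.

```python
import math

def sphere_points(n):
    """Return n roughly uniform unit vectors (golden-spiral lattice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        ring = math.sqrt(1.0 - z * z)
        phi = k * golden_angle
        points.append((ring * math.cos(phi), ring * math.sin(phi), z))
    return points

def probe_exposure(coords, radii, probe_radius, n_points=200):
    """Fraction of each atom's probe-centre sphere not blocked by other atoms."""
    unit = sphere_points(n_points)
    fractions = []
    for i, (ci, ri) in enumerate(zip(coords, radii)):
        reach = ri + probe_radius
        exposed = 0
        for ux, uy, uz in unit:
            px = ci[0] + reach * ux
            py = ci[1] + reach * uy
            pz = ci[2] + reach * uz
            blocked = False
            for j, (cj, rj) in enumerate(zip(coords, radii)):
                if j == i:
                    continue
                limit = rj + probe_radius
                d2 = (px - cj[0]) ** 2 + (py - cj[1]) ** 2 + (pz - cj[2]) ** 2
                if d2 < limit * limit:
                    blocked = True
                    break
            if not blocked:
                exposed += 1
        fractions.append(exposed / n_points)
    return fractions

# Hypothetical toy "protein": a 3 x 3 x 3 block of pseudo-atoms plus one bump.
SPACING = 3.5  # Å between neighbouring pseudo-atoms (assumed)
coords = [(x * SPACING, y * SPACING, z * SPACING)
          for x in (-1, 0, 1) for y in (-1, 0, 1) for z in (-1, 0, 1)]
coords.append((0.0, 0.0, 2 * SPACING))  # pseudo-atom protruding from one face
radii = [2.0] * len(coords)

exposure = probe_exposure(coords, radii, probe_radius=10.0)
core_index = coords.index((0.0, 0.0, 0.0))
bump_index = len(coords) - 1
print(f"core atom exposure to a 10 Å probe: {exposure[core_index]:.2f}")
print(f"bump atom exposure to a 10 Å probe: {exposure[bump_index]:.2f}")
```

In this toy model the buried core atom is completely inaccessible to the large probe, whereas the protruding atom retains a substantial exposed fraction. A per-residue profile such as that of Fig. 16 would be obtained by summing such values over the atoms of each residue and smoothing along the sequence.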

It seems significant that the calculated energetic epitopes cluster along the most exposed regions in the two proteins whose antigenicity has been studied most thoroughly: hen egg white lysozyme and influenza neuraminidase (Fig. 17). Similarly, the complete antigenic analysis of human growth hormone (Jin et al., 1992) showed the epitopes to correlate well with the most protruding regions of the molecule. It is interesting that both the experimental data of Jin et al. (1992), and the calculated frequency of side chains occurring in protein surfaces accessible to large spherical probes (Novotny et al., 1987), show a prominence of long, formally charged or dipolar side chains (Arg, Lys, Glu, Gln, Asp, Asn; see Table VI). Antibodies themselves can become antigens of other antibodies, and Novotny et al. (1986a) investigated a correlation between immunoglobulin antigenic epitopes (i.e., locations of the idiotypic, allotypic, and isotypic serological markers) and large-probe accessibility profiles of selected antibodies. The experimental epitopes always corresponded to convex parts of an antibody surface made by reverse turns. The computed protruding surfaces occurred in homologous positions in all the immunoglobulin chains for which the computations were carried out, and most of the β-sheet surfaces of the domains were found to be poorly antigenic. The CH2

FIG. 17. Large-probe accessibility contact surface profiles for (A) lysozyme and (B) influenza neuraminidase. Positions of the energetically most important residues (the functional epitopes) are highlighted by vertical bars and squares and circles at the top of the figure. (Horizontal axes: sequence number, 10-130 in panel A and 320-420 in panel B. Panel A carries the labels D1.3, Hy-5, and Hy-10; panel B carries the label NC10 epitope.)

and CH3 domains had many more calculated antigenic sites than the Fab fragment. Variable-domain epitopes (idiotopes) involved both hypervariable and framework residues, and only about 25% of the hypervariable residues were strongly antigenic. Pedersen et al. (1994) have carried out a statistical analysis of all the surface-accessible residues in human and murine Fv domains. They found that precise patterns of exposed residues were different in the two species, and that most surface positions had strong preferences for a small number of residue types. These observations have practical implications for the humanization of murine antibodies (see also Section IX,E).

A large body of indirect evidence seems to indicate that surface protrusion is an important characteristic of antigenic sites in proteins. Side-chain polarity and higher-than-average backbone B factors also correlate well with antigenicity, but a causal link between these properties and antigenicity is less straightforward than that between protrusion and antigenicity. The conjecture that antigenicity is mostly determined by surface protrusion provides a natural link between the two extreme antigenicity theories (“distinct antigenic epitopes exist” vs “the whole surface is antigenic”) by introducing the concept of antigenic probability, which varies along the surface.

B. What Is a Protein Epitope?

“An antigenic epitope” is an operational definition whose factual content differs depending on whether we emphasize energetics of complex formation, complementarity of antigen-antibody surfaces, or other phenomena. According to the method used to define the epitope, one may thus arrive at different conclusions about antigenicity (Geysen et al., 1987; Laver et al., 1990; Greenspan, 1992). In addition to crystallographic epitopes and functional (energetic) epitopes, we also have NMR epitopes defined by the extent to which bound antibody prevents deuterium-hydrogen exchange on the backbone segments of the antigen (Paterson et al., 1990; Benjamin et al., 1992) and mutational epitopes deduced from the effects of single-residue substitutions (either synthetic or natural) on the strength of binding (Smith-Gill et al., 1987; Smith and Benjamin, 1991; Smith et al., 1991; Prasad et al., 1993). The mutual relations of these various definitions are only now beginning to be delineated. Sheriff et al. (1987), Padlan et al. (1989), and Prasad et al. (1993) compared the X-ray structures and epitope mutational data on the HyHEL-5, HyHEL-10, and phosphocarrier protein HPr-Jell42 antibody complexes. By and large, the crystallographic and mutational epitopes overlapped well. The limitations of the mutagenesis approach became apparent when, of the 14 amino acid

residues of the HPr protein found in contact with the antibody binding site, 9 were correctly assigned by the mutagenesis studies, 1 could not be altered by mutations, 2 appeared to be critical for the protein fold, and 2 other peripheral side chains had a minimal effect on antibody binding. Interestingly, 4 amino acids adjacent to the epitopic residues were incorrectly assigned to the Jell42 epitope.

The concept of a functional (energetic) epitope (see Section VII,D) provides an additional perspective on antigenicity. The calculated energetic epitopes (Novotny et al., 1989; Novotny, 1991; Tulip et al., 1994) clustered along the most exposed regions in the two proteins whose antigenicity has been studied the most thoroughly, i.e., hen egg white lysozyme and influenza neuraminidase (Fig. 17). The complete antigenic analysis of human growth hormone (Jin et al., 1992) also showed the functional epitopes to correlate best with the most protruding regions of the molecule. Both the experimental data of Jin et al. (1992) and the calculated frequency of side chains occurring on protein surfaces accessible to large spherical probes (Novotny et al., 1987) showed a prominence of long, formally charged or dipolar side chains (Arg, Lys, Glu, Gln, Asp, Asn; see Table I) in the functional epitopes. At the same time, however, the size of the crystallographic epitope indicates that a surface area larger than the functional epitope must be complementary to the antibody surface.

C. Cross-Reactivity in Proteins: Influenza Neuraminidase

How cross-reactive are individual protein epitopes? How degenerate are proteins as antigens? The fact that a delimited patch of protein surface can support several overlapping, different epitopes has by now been well established (Darsley and Rees, 1985; Malby et al., 1994; Lescar et al., 1995; Bottger et al., 1995).
In the two cases where cross-reactive epitopes were studied in atomic detail, degenerate binding of the same antigenic motif (Malby et al., 1994), or two different antigenic motifs by the same antibody (Lescar et al., 1995), did not require any chemical similarities between the different epitopes or different binding sites. In the N9 neuraminidase complexes, ~80% of the NC41 and NC10 antibody epitopes overlap (Malby et al., 1994), and one might expect about two-thirds to three-quarters of the energetic neuraminidase residues to be identical, and to contribute comparable binding energies, in the two complexes.* This was clearly not the case, however, as established both by experiment (Malby et al., 1994; Nuss et al., 1993; Webster et al., 1987) and by calculations (Tulip et al., 1994). The two antibodies, NC41 and NC10, engaged different side chains of the same neuraminidase surface to create stable complexes (Table VIII).

* This was the situation invariably found with enzyme-inhibitor complexes such as, e.g., the eglin inhibitor in complex with chymotrypsin and with subtilisin. See Krystek et al. (1993) for more details.

Is an area of protein surface a single antigenic epitope, or is it a multitude of different overlapping epitopes? The term epitope being an operational one, the answer to this question may also be formulated operationally. About 50-60% of protein surfaces is made of polar atoms including those that are formally charged. The charged atoms occur mostly at prominent surface convexities where they are most efficiently solvated. The Asp, Glu, Lys, and Arg side chains are not only preferentially located in loops but are themselves the longest side chains. Naturally, protrusions and their polar atoms constitute the best anchor points for a firm attachment of antibodies but, depending on the topography of the larger, adjacent surface area and relative dispositions of the multitude of surrounding polar atoms (both capable of making, and when in the complex required to make, hydrogen bonds), many alternative solutions exist for approximately complementary binding site surfaces. Thus, the antigenic code, akin to the stereochemical code of protein structures (Section III,B), may be degenerate (Malby et al., 1994) in the sense that many different surface shapes can complement an antigenic determinant. The degeneracy of both the stereochemical and antigenic codes may not be accidental. Protein-protein interactions follow the same physical rules as protein folding events, and approximately similar phenomenology can be expected in both types of interactions.

IX. ANTIBODY ENGINEERING

The modular three-dimensional architecture of immunoglobulins (Section III,A) and T-cell receptors lends itself well to protein engineering schemes that shuffle, transpose, and reconnect the domains into chimeric proteins with hybrid structures and novel properties. The rapid development of antibody engineering has been stimulated by three important technological advances: (1) Rapid development of gene cloning technologies and the advent of the polymerase chain reaction (PCR), allowing subcloning of eukaryotic genes into bacterial plasmids (Boss et al., 1984; Cabilly et al., 1984). Nevertheless, the expression of chimeric immunoglobulins in transformed lymphoid cells such as myeloma or hybridoma (Rice and Baltimore, 1982; Oi et al., 1983; Ochi et al., 1983; Rusconi and Kohler, 1985) has remained a powerful experimental tool. (2) Progress in the controlled expression of proteins from plasmid-inserted genes in bacteria and other organisms, i.e., yeast (Wood et al., 1985) and baculovirus

TABLE VIII
ΔGresidue Calculated for N9 Neuraminidase Epitopes (~80% Overlapping) in NC41 and NC10 Antibody Complexes (a)

NC41 antibody complex

Residue        ΔGresidue
Attractive
Lys-463        -4.8
Lys-432        -4.0
Ala-369        -3.4
Asn-400        -2.6
Thr 401        -2.4
Arg 327        -1.9
Ile 368        -1.8
Pro 431        -1.4
Ser 367        -1.0
Lys 435        -1.0
Neutral
Asn 329        -0.8
Pro 328        -0.7
Leu 399        -0.6
Pro 326        -0.5
Gly 343        -0.5
Trp 403        -0.5
Ser 370        -0.3
Asn 345         0.0
Man 200D        0.0
Asn 344         0.1
Man 200F        0.2
Ile 149         0.3
Asn 347         0.3
Ile 366         0.5
Ser 372         0.7
Asp 434         0.7
Repulsive
Asp 402         1.5
Glu 433         2.0

NC10 antibody complex

Residue        ΔGresidue
Attractive
Lys-432        -3.6
Asn-329        -2.2
Man 200F       -2.0
Ala-369        -1.6
Thr 401        -1.5
Pro 328        -1.3
Gly 343        -1.0
Ser 370        -1.0
Neutral
Residues, in order of listing: Pro 331, Pro 342, Trp 403, Man 200E, Man 200D, Asn 400, Ile 368, Thr 332, Lys-336, Ile 366, Val 333, Ser 367, Asp 330, Asn 344
ΔGresidue, in order of listing: -0.6, -0.6, -0.6, -0.5, -0.3, -0.2, 0.0, 0.0, 0.0, 0.2, 0.2, 0.3, 0.9
Repulsive
Tyr 341         1.1
Ser 372         1.1

(a) Data in kcal/mol; see Tulip et al. (1994) for details. Amino acids set in boldface type are those mutated by Nuss et al. (1993) and Webster et al. (1987).
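The degree to which the two energetic epitopes coincide can be read off Table VIII directly. The Python sketch below does this for the residues listed as attractive in each complex. The values are transcribed from the table by pairing residues and ΔGresidue entries in the order printed, which is an assumption about the table layout, and the use of a Jaccard index as the overlap measure is a choice made here for illustration, not one made by Tulip et al. (1994).

```python
# ΔG_residue (kcal/mol) of the "attractive" residues, transcribed from Table VIII.
nc41 = {"Lys463": -4.8, "Lys432": -4.0, "Ala369": -3.4, "Asn400": -2.6,
        "Thr401": -2.4, "Arg327": -1.9, "Ile368": -1.8, "Pro431": -1.4,
        "Ser367": -1.0, "Lys435": -1.0}
nc10 = {"Lys432": -3.6, "Asn329": -2.2, "Man200F": -2.0, "Ala369": -1.6,
        "Thr401": -1.5, "Pro328": -1.3, "Gly343": -1.0, "Ser370": -1.0}

# Residues that are attractive in both complexes.
shared = sorted(set(nc41) & set(nc10))
union = set(nc41) | set(nc10)
jaccard = len(shared) / len(union)  # overlap measure chosen for illustration

print(f"attractive in both complexes: {', '.join(shared)}")
print(f"{len(shared)} of {len(nc41)} NC41 and {len(shared)} of {len(nc10)} NC10 residues")
print(f"Jaccard index of the two energetic epitopes: {jaccard:.2f}")
for name in shared:
    print(f"  {name}: NC41 {nc41[name]:+.1f}, NC10 {nc10[name]:+.1f} kcal/mol")
```

Only three residues (Lys 432, Ala 369, and Thr 401) are attractive in both complexes, far fewer than the two-thirds to three-quarters that the ~80% structural overlap might suggest.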

and plant cells (Hiatt et al., 1989). Proteins were obtained either as secreted, refolded, soluble species or as insoluble, denatured intracellular aggregates, inclusion bodies, that were easy to isolate but had to be solubilized and renatured before a functional protein was obtained. (3) Computer-aided structural design techniques, supported by the numerous X-ray crystallographic coordinates of diverse antibodies.

In this section, we briefly discuss chimeric antibodies and T-cell receptors, covalent Fv fragments (single-chain or disulfide-bonded), heterospecific bifunctional constructs, and various domain humanization schemes (Neuberger et al., 1984; Riechmann et al., 1988) with a focus on computer-aided structural design.

A. Chimeric Antibodies via Domain Interchange

Some of the first synthetic chimeras were mouse antigen binding domains (VL and VH) implanted on human constant-region domains (Boulianne et al., 1984; Morrison et al., 1984; Takeda et al., 1985); VH domains spliced onto CL domains giving rise to functional, antigen-specific, VH-CL/VL-CL L chain-like dimers (Sharon et al., 1984); recombinant mouse antibodies with novel effector functions engineered via H chain constant domain swaps (Neuberger et al., 1985; Schneider et al., 1988); and domain deletions or insertions to produce shortened antibody-like molecules (Igarashi et al., 1990) and antibodies with altered oligomerization states. As a rule, the structural design of these hybrids was straightforward (i.e., genes coding for complete domains were swapped, by subcloning, between molecules). More ambitious examples of antibody engineering included the replacement of selected immunoglobulin domains with foreign proteins [e.g., enzymatically active bacterial β-lactamase (Goshorn et al., 1993; De Sutter and Fiers, 1994)].
Highlights of these constructions include (1) CD4 and CTLA4 immunoadhesins [proteins composed of IgG constant domains, the Fv modules being replaced with two extracellular domains of the CD4 receptor (Capon et al., 1989) or the CTLA4 marker (Linsley et al., 1991)] for use in acquired immunodeficiency syndrome (AIDS) therapy; (2) immunoligands such as the interleukin-2 molecule fused to IgG constant regions (Landolfi, 1991); (3) an exogenous peptide epitope implanted in lieu of the H chain third hypervariable loop (Sollazzo et al., 1990); (4) metal coordination sites engineered into an antibody binding pocket (Roberts et al., 1990); and (5) divalent molecules combining the complete class I major histocompatibility complex (MHC) molecule with an immunoglobulin heavy chain (Dal Porto et al., 1993). Finally, various chimeric constructs consisting of T-cell receptor (TCR) C domains and antibody V domains, or TCR-antibody polypeptide chain αβ or γδ heterodimers, were also assembled and shown to carry functional traits characteristic of both parent molecules (Gascoigne et al., 1987; Gross et al., 1989; Mariuzza and Winter, 1989; Becker et al., 1989; Goverman et al., 1990; Schearman et al., 1991; Gregoire et al., 1991; Eilat et al., 1992).

B. Single-Chain Fv Fragments, “Diabodies,” Fv-Toxin Conjugates

The production of large quantities of stable Fv fragments in bacteria was seriously hindered by the fact that two polypeptide chain segments, VL and VH, had to be separately refolded and reformed into a noncovalent dimer. The problem has been circumvented by the design and successful construction of a single-chain Fv where the C terminus of a VH (alternatively, VL) domain was connected to the N terminus of the VL (VH) domain by a polypeptidic linker. The following issues had to be considered in the design of the linker: (1) connection polarity, i.e., the question of a possible preference for leading the linker from the C terminus of the VL domain to the N terminus of the VH domain, or vice versa; (2) the minimal, and possibly the maximal, length of the linker; (3) the nature, i.e., the detailed amino acid sequence, of the linker, with special reference to protein folding and proteolysis.

Two different single-chain Fv designs were described in 1988, both involving the use of computer modeling to design the molecule but approaching the preceding questions in different ways. In the work of the Harvard/Creative Biomolecules group (Huston et al., 1988), the order of domain connection was deemed unimportant in view of the pseudo symmetry of the Fv fragment, and the polarity chosen was VH→VL. In the work of the Genex group (Bird et al., 1988) the polarity was VL→VH. The success of both designs immediately proved the functional equivalence of both solutions. The Harvard/Creative Biomolecules group obtained the distance to be spanned by the linker from Fab crystal structures as ~3.5 nm which, considering the length of a peptide unit (3.8 Å), would require at least 11 amino acid residues. To interfere minimally with refolding of the two immunoglobulin domains, and also to maximize the flexibility of the linker, a sequence rich in glycines, (Gly-Gly-Gly-Gly-Ser)3, was chosen. Such a sequence should also be rather resistant to proteases.
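The length estimate can be reproduced with a few lines of arithmetic. In the Python sketch below, the number of peptide units is the gap divided by 3.8 Å, rounded up, and the residue count is taken as one more than the number of units (the residues at both ends of a chain of n units). This counting convention is an assumption made here because it reproduces the figure of 11 quoted above; the original authors' exact reasoning is not given in the text, and the function names are illustrative.

```python
import math

PEPTIDE_UNIT = 3.8  # Å, length of one peptide unit in an extended chain

def peptide_units(gap_angstrom, unit=PEPTIDE_UNIT):
    """Smallest number of fully extended peptide units that covers the gap."""
    return math.ceil(gap_angstrom / unit)

def residues_to_span(gap_angstrom, unit=PEPTIDE_UNIT):
    """Residues delimiting that many units (assumed convention: units + 1)."""
    return peptide_units(gap_angstrom, unit) + 1

gap = 35.0                    # Å, i.e., ~3.5 nm from the Fab crystal structures
gly4ser_x3 = "GGGGS" * 3      # the (Gly-Gly-Gly-Gly-Ser)3 linker

print(f"peptide units needed: {peptide_units(gap)}")
print(f"residues needed:      {residues_to_span(gap)}")
print(f"(GGGGS)3 length:      {len(gly4ser_x3)} residues, "
      f"{len(gly4ser_x3) - residues_to_span(gap)} beyond the minimum")
```

The 15-residue (Gly-Gly-Gly-Gly-Ser)3 linker therefore exceeds the fully extended minimum by a few residues, which leaves room for a chain that is not perfectly straight.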
The design philosophy of the Genex group relied on searches through the Brookhaven Protein Data Bank for peptides of proper molecular dimensions to bridge the interdomain distance and to introduce correct peptide bond angles at the N and C termini of the prospective linkers. Alternatively, linkers were designed by an incremental addition of short peptides from the C terminus of the VL to the N terminus of the VH. Some linkers were designed to minimize interactions with the Fv, whereas others were designed to fit into a groove on the back of the Fv structure, primarily with the use of alternating glycine and serine residues, with Glu and Lys included to enhance solubility. Thus, one of the successful linkers had the sequence EGKSSGSGSESKST. Since the original studies, dozens of papers have been published describing the use of single-chain Fv fragments for various diagnostic and

therapeutic purposes, and in investigating various aspects of the design (see, e.g., Huston et al., 1991, and Pluckthun, 1992, for reviews). It seems that many amino acid sequences can satisfy the purpose of the linker, although some were claimed to be superior from the point of view of bacterial secretion (Takkinen et al., 1991). Argos (1990) published a detailed survey of oligopeptide linkers in natural multidomain proteins and recommended candidates for general gene fusion work. He concluded that, for the linker to be optimally extended, it should contain small (Gly) and polar (Ser, Thr) amino acids. Thus, e.g., the improved linker described by Adams et al. (1993) and Hilyard et al. (1994) had the sequence (Ser-Ser-Ser-Ser-Gly)3, while that used by Whitlow et al. (1993) had the sequence GSTSGSGKPGSGEGSTKG. For single-chain Fv fragments, the critical length requirement for the linker seems to be 12-13 residues, as 10-residue linkers did not sterically allow the VL-VH domain dimer formation. This observation was cleverly exploited by Holliger et al. (1993) and Hudson et al. (1994) in the design of “diabodies,” i.e., small bivalent and bispecific antibody fragments. By using a linker too short to allow pairing between the two domains of the same chain (either 5 or 10 residues of linker length), the VL and VH domains were forced to pair with the complementary domains of another chain to create two different antigen binding sites. An efficient domain-domain packing at the two contacting ends of the dimeric Fv modules was investigated by computer graphics (Holliger et al., 1993), and it was found that it might be possible to join the C terminus of the VH domain directly to the N terminus of the VL domain and dispense with the linker polypeptide.
Indeed, fragments with no linker proved to be dimeric and bispecific when expressed in bacteria and the X-ray structure of the diabody L5MK16, specific for phosphatidylinositol, turned out to be very similar to that predicted by modeling (Perisic et al., 1994). According to Desplancq et al. (1994) and Whitlow et al. (1994), single-chain Fv fragments have a natural tendency to form heterobivalent dimers and perhaps even higher oligomers. Thus, heterodimers of antifluorescein and antitumor scFv fragments 4-4-30 and CC49, respectively, formed with a linker as long as 12 residues (Whitlow et al., 1994). Bivalent dimers of the antitumor antibody B72.3 were obtained even with a linker 30 residues long, and better activity was observed with the domain arrangement VL-linker-VH compared to VH-linker-VL (Desplancq et al., 1994). NMR data from the McPC 603 single-chain Fv fragment using the (GGGGS)3 linker (Freund et al., 1993) indicated relative independence of the linker from the rest of the structure, and confirmed its high flexibility. Two X-ray crystallographic structures have been reported for single-chain Fv fragments (Kortt et al., 1994; Zdanov et al., 1994). In both, including the

COMPUTATIONAL BIOCHEMISTRY OF ANTIBODIES

225

1.7 A resolution structure of the carbohydrate-binding antibody Se155-4 (Zdanov et al., 1994) single-chain Fv complexed with the antigen, the linker was largely invisible due to crystalline disorder. Antibody binding sites, with their unique specificities, can be used to target proteins (enzymes, toxins) to cells and tissues. An example of this approach to human therapy is antibody-targeted dissolution of infarctionrelated blood clots by proteolytic enzymes (Fab urokinase, Haber et al., 1989). The single-chain Fv technology has been particularly successful in supporting bacterial production of effective immunotoxins. Thus, singlechain Fv of antitumor specificity with a modified Pseudomonas exotoxin (PE40) attached to its C terminus is a powerful and specific anticancer agent (Chaudhary et al., 1989). An extensive literature, exploring various aspects of targeting, toxin structure, optimal expression, and other aspects of this topic exists by now, and is not reviewed here in detail. One practically relevant bifimctional construction was the antidigoxin single-chain Fv fragment with the -40-residue fragment B of the staphyloccocal protein A attached to its N terminus (Tai et al., 1990). The fragment B effector domain, through its affinity for the immunoglobulin Fc fragment, simplified purification (via affinity chromatography) of the Fv construct. The single-chain technology has also been applied successfully to T-cell receptors and MHC proteins. Fv-like, soluble single-chain T-cell receptor V P a fragments were expressed in bacteria and shown to possess essentially the same antigen specificity as the parent receptor (Novotny et al., 1991; Ward, 1992; So0 Hoo et al., 1992; Schodin and Kranz, 1993; Kurucz et al., 1993). Functional, soluble MHC-like proteins were prepared by tethering the three extracellular domains (al,a2and a3)of the mouse H-2 class I a chain to the p2 domain (Mottez et al., 1991, Mage et al., 1992). Finally, Eshhar et al. 
(1993) designed and constructed chimeric genes composed of a single-chain Fv domain linked with γ or ζ chains, the common transmembrane, signal-transducing subunits of the immunoglobulin and T-cell receptors. The chimeric genes were expressed as functional cell surface receptors in a cytolytic T-cell hybridoma, and they triggered interleukin-2 secretion from the cells on encountering the antigen to which the single-chain Fv specificity was directed (the hapten trinitrophenyl). Such chimeric receptors can provide T cells and other lymphocytes with antibody-type recognition coupled directly to cellular activation. A diagrammatic summary of Fv fragment-based single-chain constructs is given in Fig. 18.

FIG. 18. Protein engineering of antibody binding sites: a schematic diagram. VH and VL domains are shown as open circular sections, peptide linkers as thick lines, leucine zippers (α-helical dimers) as filled rectangles, and cell toxins, enzymes, etc., as filled circles. (Top) From left to right, a single-chain Fv fragment (scFv), a disulfide-bonded Fv fragment (SSFv), a bivalent diabody, and a bivalent, disulfide-bonded scFv dimer. (Bottom) From left to right, an scFv attached to an effector molecule (e.g., Pseudomonas PE40 cellular toxin), a bivalent scFv-leucine zipper construct, and a bivalent scFv chimera with three scFv fragments chained together by a flexible linker. Both homo- and heterodimeric leucine zipper sequences are known, allowing in principle a noncovalent assembly of homobivalent and heterobivalent antibody-like chimeras. Similarly, homo- and heterobivalent, linker-chained scFv constructs are possible. Bivalent scFv chimeras based on four-helix bundles (the ROP dimer of the α helix-turn-helix motif) were also described (see Section IX,F) but are not shown. Note also that antigen combining sites (i.e., the three hypervariable loops in the VH and VL domains) can be transferred from one Fv framework to another (Jones et al., 1986), and different pairs of VH and VL domains can be recombined to give novel antigen binding sites.

C. Disulfide-Bonded Fv Fragments

It is known that Fv fragments of different specificities, and different amino acid sequences, vary widely in stability (Padlan, 1994). Although the critical interdomain side-chain contacts accomplishing the “three-layer” β-sheet-β-sheet packing are conserved in the variable domains of antibodies and T-cell receptors, about 40-50% of the domain-domain interface is contributed by hypervariable loop residues, suggesting that the strength of domain-domain contacts may be modulated considerably. Quantitative information is scarce, but one of the most stable VL-VH dimers may be that of the antidigoxin antibody 26-10 (Anthony et al., 1992), approaching the VL-VH association constant of 1 nM. At the other extreme, the McPC 603 Fv fragment was found to be only marginally stable (Glockshuber et al., 1990), with an estimated domain-domain association constant of less than 1 μM. According to Glockshuber et al. (1990), three different strategies, namely, (1) chemical cross-linking, (2) introduction of disulfide bonds, and (3) generation of a single-chain protein, all stabilized the Fv fragment in a comparable way.

To introduce disulfide bonds into the Fv fragment, Glockshuber et al. (1990) used the computer program of Pabo and Suchanek (1986) that systematically scans intra- or intermolecular residue pairs and selects those with the Cβ-Cβ distance close enough to support an SS bridge (~4-5 Å). The two disulfide bridges Glockshuber et al. (1990) introduced into the McPC 603 VH and VL domains were relatively close to the third CDR loop: VL55 and VH108 (L chain and H chain, respectively), and G56C and T106C (L chain and H chain, respectively). Jung et al. (1994) used molecular graphics and computer model building tools to identify two possible interchain disulfide bond sites in the framework region of the Fv fragment, distal from the antigen combining site (Fig. 19). Of the two sites identified, i.e., VH44-VL105 and VH111-VL48, the former was tested by constructing a chimeric protein composed of a truncated form of Pseudomonas exotoxin and the Fv fragment of the monoclonal anticancer antibody B3 (Brinkmann et al., 1991; see also Webber et al., 1995). The chimeric toxin was found to be just as active as the corresponding single-chain counterpart and considerably more stable. Reiter et al. (1994a) then showed that the latter disulfide site, VH111-VL48, could also be used to generate a functional disulfide-bonded B3 Fv fragment. Reiter et al. (1994b) extended the disulfide-bonded constructs to two more Fv fragment-Pseudomonas toxin chimeras, generating cytotoxic proteins with full activity, improved stability, and a good yield in bacterial expression. Thus, SS-bridged Fv fragments may be more useful than single-chain Fv immunotoxins as therapeutic and diagnostic agents where inexpensive production and large quantities of refolded material are required. The single-chain Fv fragments may retain their usefulness in applications such as recombinant membrane-bound Fv receptors (Eshhar et al., 1993), phage surface display of complete binding sites, and heterospecific bifunctional miniantibodies.
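The distance criterion used in the disulfide scan described in this section (selecting residue pairs whose Cβ-Cβ separation is roughly 4-5 Å) can be illustrated with a minimal sketch. This is not the Pabo and Suchanek (1986) program; the function name, the dictionary input format, and the toy coordinates are assumptions made for illustration, and a realistic search would also test side-chain geometry rather than distance alone.

```python
import math
from itertools import product


def scan_disulfide_sites(chain_a, chain_b, d_min=4.0, d_max=5.0):
    """Return residue pairs whose C-beta atoms lie within [d_min, d_max] angstroms.

    chain_a, chain_b: dicts mapping a residue label to its (x, y, z)
    C-beta coordinates in angstroms (a hypothetical input format).
    Hits are returned sorted by increasing distance.
    """
    hits = []
    for (res_a, xyz_a), (res_b, xyz_b) in product(chain_a.items(), chain_b.items()):
        d = math.dist(xyz_a, xyz_b)  # Euclidean distance between the two C-beta atoms
        if d_min <= d <= d_max:
            hits.append((res_a, res_b, d))
    return sorted(hits, key=lambda hit: hit[2])


# Invented toy coordinates, NOT taken from any real Fv structure.
toy_vl = {"L1": (0.0, 0.0, 0.0), "L2": (10.0, 0.0, 0.0), "L3": (30.0, 0.0, 0.0)}
toy_vh = {"H1": (4.5, 0.0, 0.0), "H2": (10.0, 4.0, 0.0), "H3": (0.0, 20.0, 0.0)}

for res_l, res_h, dist in scan_disulfide_sites(toy_vl, toy_vh):
    print(f"{res_l}-{res_h}: {dist:.1f} A")
```

Candidate pairs found this way would still need to be inspected on the model, for instance to confirm that they lie in the framework and away from the combining site, as in the Jung et al. (1994) design.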

FIG. 19. Disulfide bond engineered into the Fv fragment. The Molscript ribbon diagram shows the VL (heavy lines) and the VH (light lines) domains. Carbons of side chains H44 and L100, which were replaced with cysteines by Jung et al. (1994), are shown as van der Waals radii spheres.

D. Humanization of Mouse Monoclonals

Monoclonal antibody therapy has become an attractive alternative in the cure of several disease states, such as allergy (Kolbinger et al., 1993) and cancer. Immunoglobulin-doxorubicin conjugates (Trail et al., 1993) and single-chain Fv-Pseudomonas toxin chimeras are currently in human clinical trials. One of the major potential drawbacks of prolonged administration of mouse hybridoma antibodies, or fragments thereof, is the onset of an immune response against the mouse framework antigenic determinants. To diminish this problem, murine monoclonals can be humanized by protein engineering. In therapy based on complete antibodies or Fab fragments, most foreign determinants can be eliminated by grafting the mouse VL and VH domains onto the human constant domains (Sahagan et al., 1986; Sun et al., 1987). More fundamentally, the six murine hybridoma hypervariable loops forming the antigen combining surface can be implanted onto a human VL-VH framework, producing functional chimeric Fvs that retain the original antigenic specificity (Jones et al., 1986; Riechmann et al., 1988; Verhoyen et al., 1988). This fine molecular surgery is not always successful. As discussed in Section VI,C,2, the relative orientations of the CDR loops and their conformations are supported by several framework residues. Experience has shown that these side chains must be transplanted together with the loops in order to retain the full antigenic activity of the chimera.

Computer-aided structural analysis of framework-CDR loop interactions helped to rationalize humanization protocols. Aspects of this analysis have been discussed in Section VI, but a few additional comments are given here. Difficulties encountered in CDR loop grafting are illustrated, e.g., by the work of Kao and Sharon (1993) on a hybrid antibody consisting of the anti-p-azophenyl arsonate framework (hybridoma 36-65) and antidextran CDR loops (hybridoma 26.4.1). Without the use of structural analysis or modeling, and with the Kabat et al. (1977) definition of CDR loops, all attempts failed to produce a functional chimera with VH and VL 36-65 frameworks and 26.4.1 CDR loops, although a partial chimera, constructed from a hybrid H chain and the native antidextran L chain, was fully active. The humanized anti-Tac antibody (with antiinterleukin-2 receptor activity) was prepared by Queen et al. (1989) with the use of human frameworks that maximized homology with the anti-Tac framework sequences.
In addition, a computer model of the anti-Tac antibody, built with the ENCAD program of Levitt (1983), was used to identify several framework positions (H chain 27, 30, 48, 67-68, 98, and 106; L chain 47 and 59) which were likely to interact with the CDR loops or antigen. The humanized antibody retained about one-third of its affinity for the antigen, compared with the wild-type antibody. In the trial-and-error approach of Gorman et al. (1991), a chimeric form of an anti-CD4 antibody, based on the framework of the human myeloma KOL, possessed essentially native antigen affinity, while a chimera based on the NEW antibody framework showed only poor affinity.

The most successful humanization CDR loop grafting protocols documented in the literature employed computer-built models of the chimeric Fv fragments. Typically, these studies report a judicious choice of frameworks, transfer of the critical loop-supporting residues from the wild-type antibody, and the canonical loop structure approach to model generation. In one case (Nakatani et al., 1994) the CDR loops were grafted on, and QUANTA/CHARMm homology models were built for, as many as nine different frameworks. Each construct was assessed for biological affinity. One of the humanized anti-IL-2 (B-B10) receptor antibody variants, M5, showed nearly the same activity as the mouse wild type.

Some of the best documented examples of the humanization of a group of antibodies were reported by the Genentech group (Carter et al., 1992; Kelley et al., 1992; Presta et al., 1993), who also reported crystal structures of three humanized fragments, one Fv and two Fab (Eigenbrot et al., 1993, 1994). The crystallographic structures represented different variants of the 4D5 antibody against the proto-oncogene HER2 gene product p185. The X-ray structures attested to the excellent accuracy of model building: the average rms deviation of the computer-built models from the X-ray structures was within the range of those observed among the X-ray structures themselves. The modeling protocol of the Genentech group relied on the generation of consensus coordinates based on the crystal structures of seven different Fab fragments. The consensus structure is believed to have eliminated inappropriate structural idiosyncrasies that may be present in a single structure.
Derivation of the consensus framework, with use of the program INSIGHT (Molecular Simulations, Inc.), involved (1) independent definition of the β-sheeted segments; (2) least-squares superposition of the consensus β strands from all the structures onto the same template structure; (3) redefinition (filtering) of consensus segments based on a Cα-Cα distance criterion (i.e., only α carbons closer than 1 Å to the template were retained in the template; generally, β strands passed this test, whereas many loops did not); (4) calculation of average Cartesian coordinates for all the consensus backbone atoms; (5) addition of conserved side chains to the consensus structure; and, finally, (6) addition of the modeled CDR loops to the consensus structure, based on the Chothia et al. (1989) classification of canonical loops. Often, no antibody template could be found for the H3 loop. In that case, loops of the same length were imported from nonimmunoglobulin structures and the resulting models energy-minimized in the DISCOVER (Molecular Simulations, Inc.) or CHARMm (Molecular Simulations, Inc.) program.

The humanization procedure reported by Hsiao et al. (1994) likewise relied on the comparison of several immunoglobulin frameworks. The most homologous sequences were selected as structural templates and the consensus framework was generated by superposition of invariant residues at the VL-VH interface (Novotny and Haber, 1985; Novotny and Sharp, 1992). Where possible, the Chothia et al. (1989) canonical loops were utilized; alternatively, the loop selection procedure of Jones and Thirup (1986) was employed to extract approximate loop conformations from the Brookhaven Protein Data Bank. Then, with the use of computer graphics, side chains were compared residue by residue to identify framework positions potentially critical for the structural integrity of the combining site. For such positions, the murine residues were retained in the final model.

In an attempt to develop a general Fv humanization algorithm, Studnicka et al. (1994) classified each amino acid position in the variable region according to the benefit of achieving a more humanlike antibody vs the risk of decreasing or abolishing specific binding affinity. With use of the Chothia et al. (1987, 1989) definitions of CDR loop end points, knowledge of Fv solvent accessibility (Novotny et al., 1986a; Padlan, 1991), VL-VH interface conservation (Novotny and Haber, 1985; Chothia et al., 1986), and the conserved β-sheeted motifs (Lesk and Chothia, 1982), a consensus table was developed to identify, in a semiquantitative manner, low-risk positions (exposed to solvent but not contributing to binding or antibody structure) and moderate- and high-risk positions (directly involved in antigen binding, CDR stabilization, or internal packing). The consensus table was tested experimentally by humanizing the anti-CD5 antibody H65, whose binding activity had been greatly reduced in two previous “blind” attempts at CDR grafting. The new humanized H65 antibody, with 20 low-risk human consensus substitutions, retained the full binding avidity of the wild type. Another engineered antibody with 14 more moderate-risk substitutions had, unexpectedly, three- to sevenfold enhanced avidity. The Studnicka et al. (1994) “position-risk scheme” is similar in spirit to the solvent accessibility analysis reported by Pedersen et al. (1994) (see Section VIII).
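Steps (3) and (4) of the consensus-framework derivation described earlier in this section (filtering positions by a 1 Å Cα-Cα criterion, then averaging coordinates) can be sketched as follows. This is only an illustration under stated assumptions: the structures are taken to be already superposed on the template and aligned position by position, only Cα atoms are handled, and the function name, input layout, and toy coordinates are hypothetical rather than part of the published protocol.

```python
import math


def consensus_ca(template, structures, cutoff=1.0):
    """Filter and average pre-superposed C-alpha coordinates.

    template:   list of (x, y, z) C-alpha coordinates of the template structure.
    structures: list of coordinate lists, each aligned position by position
                with the template and already superposed onto it.
    Returns {position index: averaged (x, y, z)} for every position at which
    all structures lie closer than `cutoff` angstroms to the template.
    """
    consensus = {}
    for i, ref in enumerate(template):
        column = [structure[i] for structure in structures]
        # Filtering step: keep the position only if every structure agrees with the template.
        if all(math.dist(ref, xyz) < cutoff for xyz in column):
            # Averaging step: mean Cartesian coordinates over all structures.
            consensus[i] = tuple(
                sum(xyz[k] for xyz in column) / len(column) for k in range(3)
            )
    return consensus


# Invented toy data: three "structures" of three positions each.
toy_template = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
toy_structures = [
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)],
    [(0.2, 0.0, 0.0), (3.8, 0.4, 0.0), (7.6, 2.0, 0.0)],  # last position deviates by 2 A
    [(0.4, 0.0, 0.0), (3.8, 0.2, 0.0), (7.6, 0.5, 0.0)],
]

print(sorted(consensus_ca(toy_template, toy_structures)))
```

In the toy data the third position is rejected because one structure lies 2 Å from the template, which mimics the loop-like positions that the text says often fail the filter.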

E. Heterospecific Polyvalent Constructs, “Miniantibodies”

The conceptually simplest bivalent, bispecific single-chain construct is the one where two single-chain Fv fragments of different specificities were connected by a C-terminal, disulfide bond-forming Gly4Cys or Cys5Hisj extension (Adams et al., 1993; Kipriyanov et al., 1994), or by means of designed Gly- and Ser-rich linkers (Mallender and Voss, 1994; Mallender et al., 1994; Hayden et al., 1994; Mack et al., 1995; Kurucz et al., 1995). A more complicated design of heterofunctional “miniantibodies,” consisting of a single-chain Fv fragment, a flexible IgG3 hinge, and an amphiphilic α-helical segment (leucine zipper), was reported by Pack and Pluckthun (1992) (Fig. 18; see also Pack et al., 1993). The dimer-forming propensity of the α helices drives spontaneous generation of a noncovalent, bifunctional, heterospecific chimera. When expressed in Escherichia coli, the bivalent fragments associated readily and were able to bind to surface-bound antigen under conditions in which bivalent but not monovalent antibody fragments bind. Pack et al. (1993) reported that two single-chain Fv fragments with a C-terminal hinge followed by a helix-turn-helix motif formed bivalent noncovalent dimers in vivo with significantly higher avidity than those based on the leucine zipper-containing constructs. The improved avidity may have resulted from the greater stability of the four-helix bundle formed on association of the helix-loop-helix motifs, from the antiparallel orientation of the Fv binding sites, or from both. Better still, tetravalent miniantibodies were assembled on a parallel, four-bundle α-helical scaffold (Pack et al., 1995; Fairman et al., 1996).

A similar dimerization motif, i.e., the Fos and Jun leucine zipper genes fused to VH and CH1 domain genes, was used to bring together Fab' fragments with two different specificities (Kostelny et al., 1992) in a mammalian expression system. Initially, bivalent, monospecific Fab' fragments were expressed individually in the myeloma cell line Sp2/0 from plasmids containing genes for the hybrid H chains and those encoding the complete L chains. When these homodimers were reduced at the hinge region and their mixture reoxidized, mostly Fab'-zipper heterodimers formed and could be readily isolated.

One of the exciting developments of antibody engineering has been the advent of combinatorial libraries of immunoglobulin polypeptides, and their expression on the surface of filamentous phage vectors (see, e.g., Winter and Milstein, 1991, or Marks et al., 1992, for review). One of the components of this design is the use of the single-chain Fv fragment for phage surface expression. In principle, the method may supersede hybridoma technology and facilitate mass production of antibodies with desired specificities. In practice, the recovery of rare “original” L and H chain pairs constituting a potent antibody depends on efficient screening of very large (>10^8) combinatorial libraries.
Screening of V gene repertoires becomes very efficient when single-chain combinations of VL and VH genes are cloned into a filamentous fd phage vector as fusions to its gene III or gene VI proteins. Modified phages display functional single-chain Fv modules on their surfaces and readily bind to antigen in a specific manner. Phage variants as rare as lo-‘ can be isolated in a single affinity chromatography step.

X. T-CELL RECEPTOR MODELING AND ENGINEERING

Before March 1995, when the X-ray crystallographic structure of the T-cell receptor β chain was published by Bentley et al., all our structural knowledge of the receptor had been derived indirectly, from analyses of T-cell receptor amino acid sequences, homology modeling, and site-directed mutagenesis of T-cell receptor binding sites.

A. Outline Structure Based on Sequence

The first nucleotide sequences of T-cell receptor α and β chain genes (Chien et al., 1984; Hedrick et al., 1984; Saito et al., 1984a,b; Yanagi et al., 1984) indicated that these polypeptides corresponded in size to immunoglobulin L chains and were distantly related to immunoglobulin chains. The initial comparisons of small numbers of β chain variable domains (Patten et al., 1984) emphasized either similarities or differences between immunoglobulins and T-cell receptors and led, accordingly, to different conclusions about their functions. For example, Patten et al. (1984) hypothesized, purely on the basis of a sequence variability index, that Vβ segments had more hypervariable regions than the three CDR segments of immunoglobulins. The three additional nonimmunoglobulin hypervariable segments might be involved in interactions of T-cell receptors with polymorphic MHC determinants. On the other hand, Arden et al. (1985) suggested, on the basis of sequence analysis of 19 Vα genes, that variable domains of T-cell receptors and immunoglobulins are similar in structure, and that it is unnecessary to postulate any special sites, apart from the classical antigen binding site, for binding properties of T-cell MHC-restricted antigen receptors. Amid these conflicting claims, Novotny et al. (1986c) analyzed sequence similarity of immunoglobulins and T-cell receptors from the point of view of structural fingerprints known to be conserved in antibody domains and in the antigen combining site (Lesk and Chothia, 1982; Novotny et al., 1983; Novotny and Haber, 1985; Chothia et al., 1986; see Section IV). Based on these conserved sequence patterns, the T-cell receptor α, β, γ, and CD8 chains were postulated to fold into immunoglobulin-like domains consisting of multistranded antiparallel β-sheet bilayers.
Since the invariant side-chain motifs mediating domain-domain interactions were also found to be conserved in T-cell receptor chains, it appeared that the binding site of the T-cell receptor was fundamentally no different from the conventional binding site of an antibody. (The crystallographic structure of the CD8 extracellular domain reported by Leahy et al. (1992) confirmed that CD8 associated into an Fv-like homodimer.) Thus, αβ receptors and immunoglobulins were likely to accommodate, in their respective binding sites, antigens in the same size range. If there was a single T-cell receptor binding site (as opposed to separate sites, one for the antigen and the other for the presenting MHC molecule), and if the binding sites of T-cell receptors and antibodies were fundamentally no different, what was the structural basis for the difference in antigen recognition between B and T cells? The answer to this question was likely to be found in the structure of the antigen recognized, rather than in the binding sites themselves. The epitopes recognized by binding sites of antibodies were derived entirely from antigen, while those recognized by T-cell receptors may have been derived in part from the nominal antigen and in part from the restricting MHC element. Fundamentally a restatement of the “altered self” hypothesis (Zinkernagel and Doherty, 1974), the preceding conjecture accommodated the experimental evidence for complex formation between an immunogenic peptide and an MHC class II molecule (Babbitt et al., 1985).

In the Cα and Cβ T-cell receptor domains, the segments corresponding to the A-B-D-E β sheets showed a higher degree of conservation than their putative solvent-facing C-F-G β sheets, consistent with dimeric Vα/Vβ (VL/VH-like) and Cα/Cβ (CL/CH1-like) domain modules. However, the interface of the C domains in the T-cell receptors resembled most the corresponding interface of the antibody CH3 domains, which is relatively rich in electrically charged residues. Stable β-sheet-β-sheet contacts involving buried charges required formation of neutralizing ion pairs. The net charges of the putative Cα/Cβ domain dimers suggested favorable domain-domain electrostatic interactions, whereas some other domain-domain pairs were electrostatically less favorable (Novotny et al., 1986c). This observation further justified the search for a missing chain compatible with the Cγ domain. Indeed, the δ chain was identified by Brenner et al. (1986) and Bank et al. (1986). Chothia et al. (1988) developed an outline structure of the T-cell αβ receptor, based on a large set of amino acid sequences and assuming that the VαVβ dimer has a framework structure very close to that of immunoglobulins.
The loops that formed the antigen binding site were found to be similar in size to those commonly found in immunoglobulins, although perhaps with different conformations, and only limited sequence variability was found in the α1 and β1 hypervariable loops, suggesting that they mainly interacted with the constant parts of the MHC proteins. Claverie et al. (1989) built a complete model of the VαVβ dimeric module, with use of the FRODO program and the Fab backbone as the starting point. The model was optimized by energy minimization with CHARMm and CONGEN, and was used to investigate various alternative arrangements of the receptor and the MHC molecules in putative antigenic complexes. The main conclusions were that the α and β chains were functionally equivalent, and that the third hypervariable loops of both chains may mainly interact with the antigen. The first and second regions were in positions favorable for making contacts with residues pointing up from the two α helices of the MHC structure. Similar suggestions about the relative MHC-peptide-receptor orientations were put forward by Davis and Bjorkman (1988). It was comforting to see many of the previously mentioned predictions (e.g., the immunoglobulin fold and the character of the Cβ interdomain surface) confirmed by the X-ray structure of the β chain (Bentley et al., 1995).

An interesting observation was made by Jores et al. (1990) in their studies on sequence variability in T-cell receptor β chains. In addition to the three hypervariable loops homologous to those of antibodies, the β chains possessed a distinct fourth hypervariable loop between residues 70 and 74, intermediate to the β strands D and E. In the antibody-like model of the T-cell receptor, the four hypervariable regions formed a contiguous surface area available for contacts with putative antigens.

B. Engineering and Mutagenesis of T-cell Receptor Binding Site

If indeed the T-cell receptor structure is very similar to that of an antibody, equivalent engineering designs, such as construction of a soluble Fv-like fragment, should be possible with the receptor. A strategy for the production of such small, soluble, single-chain T-cell receptor fragments was reported by Novotny et al. (1991). A gene encoding the RFL3.8 receptor, specific for the hapten fluorescein in the context of major histocompatibility complex class II and composed of the Vα and Vβ domains joined by a flexible peptide linker, was assembled in an E. coli plasmid. Subsequently, the protein was produced in a bacterial expression system, purified, refolded, and found to be poorly soluble at neutral pH in aqueous buffers. An inspection of the computer-generated Vα-Vβ model showed several surface-exposed hydrophobic residues. When these were replaced by polar side chains via site-directed mutagenesis of the corresponding gene, a soluble protein resulted and was shown to have antigen-binding properties equivalent to those of the intact receptor of the RFL3.8 T-cell clone (see Section IX,B for complete references on single-chain VαVβ constructs). The computer-generated model of the RFL3.8 antifluorescein receptor served as a starting point for mutagenesis aimed at identification of its antigen-contacting residues (Ganju et al., 1992). To localize the potential antigen-contacting residues in the model, advantage was taken of the fact that the crystallographic structure of an antifluorescein antibody, the murine monoclonal 4-4-20, has been solved (Herron et al., 1989). Using atomic coordinates of the 4-4-20 antibody, the most conserved parts of its VL-VH interface were superimposed on the corresponding parts of the RFL3.8 model. On this superposition, the fluorescein molecule bound to the 4-4-20 antibody was found close to a conspicuous cavity on the surface of the RFL3.8 model.
The cavity was surrounded by Vα and Vβ hypervariable loops and was therefore a promising candidate for the RFL3.8 antigen binding site (see Fig. 3B). Strikingly, side chains chemically similar to those comprising the most important hapten-contacting residues of the 4-4-20 antibody (Trp H33, Tyr L37, Ser L94, Arg L39) were found at the surface of the putative RFL3.8 fluorescein-contacting cavity (Tyr-31, Tyr-166, Ser-227, Arg-94) despite the fact that the amino acid sequences of the 4-4-20 and RFL3.8 hypervariable loops showed little similarity to each other. This chemical similarity extended to the layer of aromatic residues directly under the bottom of the binding site cavity. All these observations suggested that similar chemical motifs may be used by antibodies and T-cell receptors to engage the same antigen. Altogether, six potential amino acid contacts with the antigen were selected for alanine-scanning mutagenesis. The mutated single-chain T-cell receptors were expressed in E. coli, purified, refolded, and assayed for fluorescein binding (Ganju et al., 1992). Five out of six mutations resulted in a loss of detectable binding. These RFL3.8 antigen combining site residues were distributed among the β3, α1, and α2 CDR loops. Given that fluorescein is one of the smallest T-cell antigens available, it seems reasonable to expect that the majority of receptors use multiple CDR loops to contact antigen in combination with the MHC proteins.

REFERENCES

Abagyan, R., and Totrov, M. (1994). Biased probability Monte Carlo conformational searches and electrostatic calculations for peptides and proteins. J. Mol. Biol. 135, 983-1002.
Abagyan, R., Totrov, M., and Kuznetsov, D. (1993). ICM-a new method of protein modeling and design: Applications to docking and structure prediction from the distorted native conformation. J. Comput. Chem. 15, 488-506.
Adams, G. P., McCartney, J. E., Tai, M. S., Oppermann, H., Huston, J. S., Stafford, W. F., III, Bookman, M. A., Fand, I., Houston, L. L., and Weiner, L. M. (1993). Highly specific in vivo tumor targeting by monovalent and divalent forms of 741F8 anti-c-erbB-2 single chain Fv. Cancer Res. 53, 4026-4034.
Adamson, A. W. (1976). “Physical Chemistry of Surfaces.” Wiley, New York.
Alber, T., Dao-pin, S., Nye, J. A., Muchmore, D. C., and Matthews, B. W. (1987). Temperature-sensitive mutations of bacteriophage T4 lysozyme occur at sites with low mobility and low solvent accessibility in the folded protein. Biochemistry 26, 3754-3758.
Allinger, N. L. (1977). Conformational analysis 130: MM2, a hydrocarbon force field utilizing V1 and V2 torsional terms. J. Am. Chem. Soc. 99, 8127-8134.
Almassy, R. J., Fontecilla-Camps, J. C., Suddath, F. L., and Bugg, C. E. (1983). Structure of variant 3 scorpion neurotoxin from C. sculpturatus Ewing refined at 1.8 Å resolution. J. Mol. Biol. 170, 497-527.
Amit, A. G., Mariuzza, R. A., Phillips, S. E. V., and Poljak, R. (1986). Three-dimensional structure of an antibody-antigen complex at 2.8 Å resolution. Science 233, 747-752.
Amzel, M. (1992). Modeling the variable region of immunoglobulins. Immunomethods 1, 91-95.
Anfinsen, C. B., and Haber, E. (1961). Studies on the reduction and reformation of protein disulfide bonds. J. Biol. Chem. 236, 1361-1363.

Anglister, J., Frey, T., and McConnell, H. (1984). Magnetic resonance of a monoclonal anti-spin-label antibody. Biochemistry 23, 1138-1142.
Anglister, J., Bond, M. W., Frey, T., Leahy, D., Levitt, M., McConnell, H. M., Rule, G. S., Tomasello, J., and Whittaker, M. (1987). Contribution of tryptophan residues to the combining site of a monoclonal anti-dinitrophenyl spin label antibody. Biochemistry 26, 6058-6064.
Anthony, J., Near, R., Wong, S. L., Iida, E., Ernst, E., Wittekind, M., Haber, E., and Ng, S. C. (1992). Production of stable anti-digoxin Fv in Escherichia coli. Mol. Immunol. 29, 1237-1247.
Arden, B., Klotz, J. L., Siu, G., and Hood, L. E. (1985). Diversity and structure of genes of the α family of mouse T-cell antigen receptor. Nature 316, 783-787.
Arevalo, J. H., Taussig, M. J., and Wilson, I. A. (1993). Molecular basis of crossreactivity and the limits of antibody-antigen complementarity. Nature 365, 859-863.
Arevalo, J. H., Hassig, C. A., Stura, E. A., Sims, M. J., Taussig, M. J., and Wilson, I. A. (1994). Structural analysis of antibody specificity: Detailed comparison of five Fab'-steroid complexes. J. Mol. Biol. 241, 663-690.
Argos, P. (1990). An investigation of oligopeptides linking domains in protein tertiary structures and possible candidates for general gene fusion. J. Mol. Biol. 211, 943-958.
Arnon, R., Maron, E., Sela, M., and Anfinsen, C. B. (1971). Antibodies reactive with native lysozyme elicited by a completely synthetic antigen. Proc. Natl. Acad. Sci. U.S.A. 68, 1450-1455.
Atassi, M. Z. (1975). Antigenic structure of myoglobin: The complete immunochemical anatomy of a protein and conclusions relating to antigenic structure of lysozyme. Immunochemistry 12, 423-438.
Atassi, M. Z. (1978). Precise determination of the entire antigenic structure of lysozyme. Immunochemistry 15, 909-936.
Babbitt, B. P., Allen, P. M., Matsueda, G., Haber, E., and Unanue, E. R. (1985). Binding of immunogenic peptides to Ia histocompatibility molecules. Nature 317, 359-361.
Bahraoui, E., El Ayeb, M., van Rieschoten, J., Rochat, H., and Granier, C. (1986). Immunochemistry of scorpion α-toxins. Mol. Immunol. 23, 357-366.
Bajorath, J. (1994). Three-dimensional model of the BR96 monoclonal antibody variable fragment. Bioconjugate Chem. 5, 213-219.
Bajorath, J., and Aruffo, A. (1994). Three-dimensional protein models: Insights into structure, function, and molecular interactions. Bioconjugate Chem. 5, 173-181.
Bajorath, J., and Fine, R. M. (1992). On the use of minimization from many randomly generated loop structures in modeling antibody combining sites. Immunomethods 1, 137-146.
Bajorath, J., and Sheriff, S. (1996). Comparison of an antibody model with an X-ray structure: The variable fragment of BR96. Proteins 24, 152-157.
Bajorath, J., Stenkamp, R., and Aruffo, A. (1993). Knowledge-based model building of proteins: Concepts and examples. Protein Sci. 2, 1798-1810.
Bajorath, J., Harris, L., and Novotny, J. (1995). Conformational similarity and systematic displacement of CDR loops in high-resolution antibody X-ray structures. J. Biol. Chem. 270, 22081-22084.
Baker, D., and Agard, D. A. (1994). Kinetics versus thermodynamics in protein folding. Biochemistry 33, 7505-7509.
Bank, I., DePinho, R. A., Brenner, M. B., Cassimeris, J., Alt, F. W., and Chess, L. (1986). A functional T3 molecule associated with a novel heterodimer on the surface of immature human thymocytes. Nature 322, 179-181.
Barlow, D. W., Edwards, M. S., and Thornton, J. M. (1986). Continuous and discontinuous protein antigenic determinants. Nature 322, 747-748.

238

J I K I NOVOTNY AND JURCEN UAJORATH

Bashford, D., Chothia, C., and Lesk, A. M. (1987). Determinants of a protein fold: Unique features of the globin amino acid sequences.J . Mol. Biol. 196, 199-216. Bassolino-Khmas, D., Bruccoleri, R. E., and Subramaniam, S. (1992). Modeling the antigen combining site of an anti-dinitrophenyl antibody, AN02. Protein Sci. 1, 1465-1476. Becker, M. L. B., Near, R., Mudgett-Hunter, M., Margolies, M. N., Kubo, R. T., Kaye, J., and Hedrick, S. M. (1989). Expression of a hybrid immunoglobulin-T cell receptor protein in transgenic mice. Cell 58, 91 1-921. Benjamin, D. C., Berzofsky, J . A., East, I. J., Gurd, F. R. N., Hannum, C., Leach, S. J., Margoliash, E., Michael, J. G., Miller, A,, Prager, E. M., Reichlin, M., Sercarz, E. E., Smith-Gill, S. J., Todd, P. E., and Wilson, A. C. (1984). The antigenic structure of proteins: A reappraisal. Annu. Rev. Immunol. 2, 67-101. Benjamin, D. C., Williams, D. C., Smith-Gill, S. J., and Rule, G. S. (1992). Long range forces in a protein antigen due to antigen-antibody interaction. Biochemistry 31, 9539-9545. Bentley, G. A., Boulot, G., Karjalainen, K., and Mariuzza, R. A. (1995). Crystal structure of t h e p chain of a T-cell antigen receptor. Science 267, 1984-1987. Bernard, 0. V., and Gough, N. M. (1980). Nucleotide sequence of immunoglobulin heavy chain joining segments between translocated VH and mu constant region genes. Proc. Natl. Acad. Sci. U.S.A. 77, 3630-3634. Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T., and Tasumi, M. (1977). The protein data bank: A computer based archival file for macromolecular structures.J . Mol. Biol. 112, 535-542. Bhat, T. N., Bentley, G. A,, Greene, M. I., Tello, D., Dall’Acqua, W., Souchon, H., Schwarz, F. P., Mariuzza, R. A., and Poljak, R. J. (1994). Bound water molecules and conformational stabilization help mediate an antigen-antibody association. Proc. Natl. Acad. Sci. U.S.A. 91, 1089-1093. Bird, R. 
E., Hardman, K. D., Jacobson, J. W., Johnson, S., Kaufman, B. M., Lee, S. M., Lee, T., Pope, S., Riordan, G. S., and Whitlow, M. (1988). Single-chain antigen-binding proteins. Science 242, 423-426. Bohm, D. (1952). A suggested interpretation of the quantum theory in terms of “hidden” variables. Phys. Rev. 85, 166-1 93. Bork, P., Holm, L., and Sander, C. (1994). The immunoglobulin fold: Structural classification, sequence patterns and common core. J . Mol. Biol. 242, 309-320. Born, M., and Jordan, P. (1925). Zur Quantenmechanik. 2. Phys. 34, 858-869. Boss, M. A., Kentern, J. H., Wood, C. R., and Emtage, J. S. (1984). Assembly of functional antibodies from immunoglobulin heavy and light chains synthesized in E. coli. Nucleic Acid Res. 12,3791-3806. Bottger, V., Bottger, A,, Lane, E. B., and Spruce, B. A. (1995). Comprehensive epitope analysis of monoclonal anti-proenkephalin antibodies using phage display libraries and synthetic peptides: Revelation of antibody fine specificities caused by somatic mutations in the variable region genes. J . Mol. Biol. 247, 932-946. Boulianne, G. L., Hozumi, N., and Shulman, M. J . (1984). Production of functional chimeric mouseihuman antibody. Nature 312, 643-646). Bowie, J. U., Luthy, R., and Eisenberg, D. (1991). A method to identify protein sequences that fold into a known three-dimensional structure. Nature 253, 164-170. Brenner, M. B., McLean, J., Dialynas, D. P., Strominger, J. L., Smith, J. A,, Owen, F. L., Seidnran, J. G., Ip, S., Rosen, F., and Krangel, M. S. (1986). Identification of a putative second T-cell receptor Nature 322, 145-149. Breyer, R. M., and Sauer, R. T. (1989). Mutational analysis of the fine specificity of binding of monoclonal antibody 51P t o l repressor. J . Biol. Chem. 264, 13355-13360. Brinkmann, U., Pai, L. H., Fitzgerald, D. J., Willingham, M., and Pastan, I. (1991). Arecom-

COMPUTATIONAL BIOCHEMISTRY OF ANTIBODIES

239

binant immunotoxin containing a disulfide-stabilized Fv fragment. Proc. Nutl. Acad. Sci. U.S.A. 388, 8616-8620. Bromberg, S., and Dill, K. A. (1994). Side chain entropy and packing in proteins. Protein Sci. 3,997-1009. Brooks, B., Bruccoleri, R. E., Olafson, B. D., States, D. J., Swaminathan, S., and Karplus, M. (1983). CHARMM: Chemistry at Harvard molecular mechanics: A program for macromolecular energy minimization and dynamics calcu1ations.J. Comput. Chem. 18, 187-2 17. Bruccoleri, R. E. (1993). Application of systematic conformational search to protein modeling. Mol. Simulations 10, 151-174. Bruccoleri, R. E., and Karplus, M. (1986). Chain closure with bond angle variations. Macromolecules 18, 2767-2 773. Bruccoleri, R. E., and Karplus, M. (1987). Prediction of folding of short polypeptide segments by uniform conformational sampling. Biopolymers 26, 137-168. Bruccoleri, R. E., and Karplus, M. (1990). Conformational sampling using high-temperature molecular dynamics. Biopolymers 29, 1847-1862. Bruccoleri, R. E., and Novotny, J. (1992). Antibody modeling using the conformational search program CONGEN. Immunomethods 1,96-106. Bruccoleri, R. E., Haber, E., and Novotny, J. (1988). Structure of antibody hypervariable loops reproduced by a conformational search algorithm. Nuture 335, 564-568. Bruccoleri, R. E., Novotny, J., Sharp, K., and Davis, M. (1996). Finite difference PoissonBoltzmann electrostatic calculations: Increased accuracy achieved by harmonic dielectric smoothing and charge anti-a1iasing.J. Comput. Chem., in press. Brune, D., and Kim, S. (1994). Hydrodynamics steering effects in protein association. Proc. Natl. Acad. Sci. U.S.A. 91, 2930-2934. Bryant, S. (1989). PKB: A program system and data base for analysis of protein structure. Proteins 5, 233-247. Bryant, S., and Lawrence, C. E. (1991). The frequency of ion pair substructures is qualitatively related to electrostatic potential: A statistical model for non-bonded interactions. Proteins 9, 108-121. 
Burton, D. R. (1985). Immunoglobulin G: Functional sites. Mol. Immunol. 22, 161-206. Burton, D., Boyd, J., Brampton, A. D., Easterbrook-Smith, S. B., Emanuel, E. J., Novotny, J., Rademacher, T. W., van Schravendijk, M. R., Sternberg, M. J. E., and Dwek, R. (1980). The C l q receptor site on immunoglobulin G. Nuture 288, 338-344. Cabilly, S., Riggs, A. D., Pande, H., Shively,J. E., Holmes, W. E., Rey, M., Perry, J., Wetzel, R., and Heyneker, H. L. (1984). Generation of antibody activity from immunoglobulin polypeptide chains produced in E. coli. Proc. Natl. Acud. Sci. U.S.A. 81, 3273-3277. Capon, D. J., Chamow, S. M., Mordenti, J., Marsters, S. A,, Gregory, T., Mitsuya, H., Byrn, R. A,, Lucas, C., Wurm, F. M., Groopman, J. E., Broder, S., and Smith, D. H. (1989). Designing CD4 immunoadhesins for AIDS therapy. Nuture 337, 525-53 1. Carter, P., Presta, I.., Gorman, C. M., Ridgeway, J. B. B., Henner, D., Wong, W. L. T., Rowland, A. M., Kotts, C . , Carver, M. E., and Shepard, H. M. (1992). Humanization of an antip185HER2 antibody for human cancer therapy. Proc. Nutl. Acud. Sci. U.S.A. 89, 42854289. Chan, H. S., and Dill, K. A. (1990). Origins of structure in globular proteins. Proc. Nutl. Acud. Sci. U.S.A. 87, 6388-6392. Chan, H. S., and Dill, K. A. (1994). Solvation: Effects of molecular size and shape.J. Chem. Phys. 101, 7007-7026. Chandler, D. (1987). “Introduction to Modern Statistical Mechanics,” pp. 159-160. Oxford University Press, Oxford. Chappel, M. S., Isenman, D. E., Everett, M., Xu, Y. Y., Dorrington, K. J. and Klein, M. H.

240

JIRI NOVOTNY AND JURGEN BAJORATH

(1991). Identification of the Fcy receptor class 1 binding site in human IgG through the use of recombinant IgGl/IgG2 hybrid and point-mutated antibodies. Proc. Natl. Acud. Sci. U.S.A. 88, 90369040. Chaudhary, V. K., Queen, C., Junghans, R. P., Waldmann, T. A,, FitzGerald, D. J., and Pastan, I. (1989).A recombinant immunotoxin consisting of two antibody variable domains fused to Pseudomanas exotoxin. Nature 339, 394-397. Cherfils, J., Duquerroy, S., and Janin, J. (1991). Protein-protein recognition analyzed by docking simulation. Proteins 11, 271-280. Chien, Y. H., Gascoigne, N. R. J., Kavaler, J., Lee, N. E., and Davis, M. M. (1984). Somatic recombination in a murine T-cell receptor gene. Nature 309, 322-326. Chien, N. C., Roberts, V. A., Giusti, A. M., Scharff, M. D., and Getzoff, E. D. (1989). Significant structural and functional change of an antigen binding site by a distant amino acid substitution: Proposal of a structural mechanism. Proc. Natl. Acad. Sci. U.S.A. 86, 55325536. Chothia, C. (1973). Conformation of twisted/?-sheets in proteins.J . Mol. Biol. 75, 295-302. Chothia, C. (1974). Hydrophobic bonding and accessible surface area. Nature 248, 338-339. Chothia, C. (1992). One thousand families for the molecular biologist. Nature 357, 543-544. Chothia, C., and Janin, J. (1975). Principles of protein-protein recognition. Nature 256, 705-708. Chothia, C., and Janin, J. (1981). Relative orientation of close-packed /?-pleated sheets in proteins. Proc. Natl. Acad. Sci. U.S.A. 78, 4146-4150. Chothia, C., and Janin, J. (1982). Orthogonal packing of /?-pleated sheets in proteins. Biochemistly 21, 3955-3965. Chothia, C., and Lesk, A. M. (1986). The relation between the divergence of sequence and structure in proteins. EMBO J. 5, 823-826. Chothia, C., and Lesk, A. M. (1987). Canonical structures for the hypervariable regions in immunoglobulins. J. Mol. Biol. 196, 901-917. Chothia, C., Novotny, J., Bruccoleri, R. E., and Karplus, M. (1985). 
Domain association in immunoglobulin molecules: The packing of variable domains.J . Mol. B i d . 186, 65 1-663. Chothia, C., Lesk, A. M., Levitt, M., Amit, A. G., Mariuzza, R. A,, Phillips, S. E. V., and Poljak, R. J. (1986).The predicted structure of immunoglobulin D1.3 and its comparison with the crystal structure. Science 233, 755-758. Chothia, C., Boswell, D. R., and Lesk, A. (1988).The outline structure of the T-cell receptor. EMBO J. 7,3745-3755. Chothia, C., Lesk, A. M., Tramontano, A., Levitt, M., Smith-Gill, S., Air, G., Sheriff, S., Padlan, E. A., Davies, D. R., Tulip, W. R., Colman, P. M., Spinelli, S., Alzari, P. M., and Poljak, R. J. (1989). Conformations of immunoglobulin hypervariable regions. Nature 342, 877-883. Chothia, C., Lesk, A. M., Gherardi, E., Tomlinson, I. M., Walter, G., Marks, J. D., Llewelyn, M. B., and Winter, G. (1992). Structural repertoire of the human VH segments. J. Mol. Biol. 227, 799-817. Claverie, J. M., Prochnicka-Chalufour, A., and Bougueleret, L. (1989). Immunological implications of a Fab-like structure of the T-cell receptor. Immunol. Today 10, 10-14. Clayton, L. K., Sayre, P. H., Novotny, J., and Reinherz, E. L. (1987). Murine and human T11 (CD2) cDNA sequences suggest a common signal transduction mechanism. Eur. J. Immunol. 17, 1367-1370. Colman, P. M. (1988). Structure of antibody-antigen complexes: Implications for immune recognition. Adv. Immunol. 43, 99-1 32. Connolly, M. L. (1983). Solvent-accessible surfaces of proteins and nucleic acids. Science 221, 1177-1 183.

COMPUTATIONAL BIOCHEMISTRY OF ANTIBODIES

24 1

Connolly, M. L. (1985).Computation of molecular volume.]. Am. Chem. SOC.107, 1 1 18-1 124. Connolly, M. L. (1986). Shape complementarity at the hemoglobin a1 subunit interface. Biopolymers 25, 1229-1247. Constantine, K. L., Friedrichs, M. S., Metzler, W. J., Wittekind, M., Hensley, P., and Mueller, L. (1994). Solution structure of an isolated antibody VL domain.]. Mol. Biol. 236,310-327. Cook, W. D., Rudikoff, S., Giusti, A., and Scharff, M. D. (1982). Somatic mutation in cultured mouse myeloma cell affects antigen binding. Proc. Natl. Acad. Sci. U.S.A. 79, 1240-1244. Cram, D. J. (1983). Cavitands: Organic hosts with enforced cavities. Science 219, 1177-1 183. Cunningham, B. C., and Wells, J. A. (1993). Comparison of a structural and a functional epitope.1. Mol. Biol. 234,554-563. Cusack, S., Smith, J., Finney, J., Tidor, B., and Karlus, M. (1988). Inelastic neutron scattering analysis of picosecond internal protein dynamics: Comparison of harmonic theory with experiment.]. Mol. Bid. 101,903-908. Cygler, M., Rose, D. R., and Bundle, D. R. (1991). Recognition of a cell-surface oligosaccharide of pathogenic salmonella by an antibody Fab fragment. Science 253,4 4 2 4 4 5 . Dal Porto, J., Johansen, T.E., Catipovic, B., Parfiit, D. J., Tuveson, D., Gether, U., Kozlowski, S., Fearon, D. T., and Schneck, J. P. (1993). A soluble divalent class I major histocompatibility complex molecules inhibits alloreactive T cells at nanomolar concetrations. Proc. N ~ t lAcad. . Sci. U.S.A. 90,6671-6675. Darsley, M. J., and Rees, A. R. (1985). Three distinct epitopes within the loop region of the hen egg lysozyme defined with monoclonal antibodies. EMBO]. 2,383-392. Darwin, C. (1859). “The Origin of Species by Means of Natural Selection.” Random House, New York. Darwin, C. ( 1 87 1). “The Descent of Man and Selection in Relation to Sex.” Random House, New York. Davis, M. M., and Bjorkman, P. J. (1988). T-cell antigen receptor genes and T-cell recognition. Nature 334,395-402. 
Deisenhofer, J. (1981). Crystallographic refinement and atomic models of a human Fc fragment and its complex with fragment B of protein A from Staphylococcus aureus at 2.9 A and 2.8 A resolution. Biochemistry 20, 2361-2370. de la Cruz, X., Mark, A. E., Tormo, J., Fita, I., and van Gunsteren, W. F. (1994). Investigation of space variations in the antibody binding site by molecular dynamics computer simulation.]. Mol. Biol. 236, 1186-1 195. de la Paz, P., Sutton, B., Darsley, M. J,, and Rees, A. R. (1986). Modelling of the combining sites of three anti-lysozyme monoclonal antibodies and of the complex betweem one of the antibodies and its epitope. EMBOJ. 2,415-425. Denzin, L. K., Gulliver, G. A., andVoss, E. W. Jr. (1993). Mutational analysis ofactive site contact residues in anti-fluorescein monoclonal antibody 4-4-20. Mol. Zmmunol. 30, 133 1-1 345. Desplancq, D., King, D. J., Lawson, A. D. G . , and Mountain, A. (1994). Multimerization behaviour of single-chain Fv variants from the tumour-binding antibody B72.3. Protein Eng. 7, 1027-1033. De Sutter, K., and Fiers, W. (1994). A bifunctional murine-human chimeric antibody with one antigen-binding arm replaced by bacterial B-lactamase. Mol. Immunol. 31,261-267. Diamond, R. (1976). On the comparison of conformations using linear and quadratic transformations. Acta Crystallogr. A32 1-10, Dill, K.A. (1990). Dominant forces in protein folding. Biochemist? 29,7133-7155. Dower, S. K., Wain-Hobson, S., Gettins, P., Givol, D., Jackson, R. C., Perkins, S.J., Sunderland, C. A., Sutton, B. J., Wright, C. E., and Dwek, R. A. (1977) The combining site of the dinitrophenyl-binding immunoglobulin A myeloma protein MOPC 3 15. Biochem. J . 165, 207-225.

242

J I R I NOVOI‘NY AND J U R C E N BAJOKAI’H

Driscoll, P. C., Cyster, J. G., Campbell, 1. D., and Williams, A. F. (1991). Structure of domain 1 of rat T lymphocyte CD2 antigen. Nature 353, 762-765. Duncan, A. K., and Winter, G. (1988). The binding site for C l q on IgG. Nuture 332, 738-740. Duncan, A. K., Woof, J. M., Partridge, L. J., Burton, D. R., and Winter, G. (1988). Localization of the binding site for the human high-affinity Fc receptor on IgC. Nature 332, 563-564. Dunitz, J. D. (1994). The entropic cost of bound water in crystals and biomolecules. Science 264, 670. Dwek, K. A., Knott, J . C. A,, March, D., McLaughlin, A. C., Press, E. M., Price, N . C., and White, A. I. (1975).Structural studies on the combining site of the myeloma protein MoPC 315. E7~r.,/.Biorhem. 53, 25-39. Lhvek, R. A,, Givol, D., Jones, R., McLaughhn, A. C., Wain-Hobson, S., White, A. I., and Wright, C. (1976).Biorhern.]. 155, 37-153. Dwek, K. A., Wain-Hobson, S., Dower, S., Gettins, P., Perkins, S. J., and Givol, D. (1977). Structure of an antibody combining site by magnetic resonance. Nature 266, 31-37. Early, P., Huang, H., Davis, M., Calame, K., and Mood (1980). An immunoglobulin heavy chain variable region gene is generated from three segments of DNA VHI DI and J F ~ .Cell 19, 981-992‘. Edelman, G. M. (1970). The covalent structure of human y G-immunoglobulin. XI: Functional implications. Riochernktry 9, 3 197-3205. Edmundson, A. B., Ely, K. K., Abola, E. E., Schiff‘er, M., and Panagiotopoulos, N . (1975). Rotational allonierism and divergent evolution of domains in immunoglobulin light chains. Biochemistry 14, 3953-3961, Eigenbrot, C., Randal, M., Presta, L., Carter, P., and Kossiakoff, A. A. (1993). X-ray structures of the antigen-binding domains from three variants of humanized anti-p 1I(supHER2 antibody 4D5 and comparison with molecular modeling.]. Mol. Rid. 229, 969-995. Eigenbrot, C., GonLalez, T., Mayeda, J., Carter, P., Werther, W., Hotaling, T., Fox, J . , and Kessler, J . (1994). 
X-ray structures of fragments from binding and nonbinding versions of a humanized anti-CD18 antibody: Structural indications of the key role o f V residues ~ 59 to 65. PTU~&ZS 18, 49-62. Eilat, I)., Kikuchi, G. E., Coligan, J. E., and Shevach, E. M. (1992). Secretion of a soluble, chimeric yS T-cell receptor-immunoglobulin heterodimer. Prof. Nutl. h a d . Sci. U.S.A. 89, 6871-6875. Eisenberg, D., and Kauzmann, W. (1969). “Structure and Properties of Water,” p. 179. Oxford University Press, Oxford. Eisenberg, D., and McLachlan, A. (1986). Solvation energy in protein folding and binding. Nuture 319, 199-203. El Ayeb, M., Martin, F., Delori, P., Bechis, G., and Rochat, H. (1983). Immunochemistry of scorpion a-neurotoxin: Determination of antigenic site number and isolation of a highly enriched antibody specific to a single antigenic site of toxin 11 of Androctornus australis Hector. Mol. Imrnunol. 20, 697-708. El Ayeb, M., Bahraoui, E. M., Granier, C., Delori, P., van Rieschoten, J , , Rochat, H. (1984). Immunochemistry of scorpion a-neurotoxins. Determination of the antigenic site number and isolation of a highly enriched antibody specific to a single antigenic site of toxin I1 of A. australis Hector. Mol. Imrnunol. 21, 223-232. Epstein, C. J. (1964). Kelation of protein evolution to tertiary structure. Nature 203, 13501352.

Epstein, C . J . (1966). Role of the amino acid “code” and of selection for conformation in the evolution of proteins. Nature 210, 25-28. Eshhar, Z., Waks, T., Gross, G., and Schindler, D. G. (1993). Specific activation and targeting of cytotoxic lymphocytes through chimeric single chains consisting of antibody-binding

COMPUTATIONAL BIOCHEMISTRY OF ANTIBODIES

243

domains and the y or 5 subunits of the immunoglobulin and T-cell receptors. Proc. Natl. Amd. Sci. U.S.A. 90, 720-724. Essen, L.-O., and Skerra, A. (1994). The de novo design of an antibody combining site. Crystallographic analysis of the V I ~domain confirms the structural mode1.J. Mol. Biol. 238, 226-244. Ewald P. (I92 1). Die Berechnung optischer und electrostatischer Gitterpotenziale. Z . Phys. 64,253-287. Fairman, R., Chao, H. G., Lavoie, T. B., Villafranca, J. J., Matsueda, G., and Novotny, J. (1996). Design of heterotetrameric coiled-coils: Evidence for increased stabilization by Glu Lys ion pair interactions. Biochemistry, 35, 2824-2829. Fanning, D. W., Smith, J. A,, Rose, G. D. (1986). Molecular cartography of globular proteins with application to antigenic sites. Biopolymers 25, 863-883. Feldmann, K. J., Potter, M., and Glaudemanns, C. P. J . (1984). A hypothetical space-filling model of the V-regions of the galactan-binding myeloma immunoglobulin 5539. Mol. Immunol. 18, 683-68-6899. Fersht, A. R. (1984). “Enzyme structure and mechanism.” W. H. Freedman & Co., New York. Fersht, A. R., Shi, J. P., Knill-Jones, J., Lowe, D. M., Wilkinson, J., Blow, D. M., Brick, P., Carter, P., Waye, M. Y., and Winter, G. (1985). Hydrogen bonding and biological specificity analyzed by protein engineering. Nature 314, 235-238. Feynman, R. P. (1967). “The Character of Physical Law.” MIT Press, Cambridge, MA. Feynman, R. P. (1972). “Statistical Mechanics: A Set of Lectures,” pp. 72-96. Benjamin/ Cummings, Reading, MA. Feynnian, R. P., and Hibbs, A. R. (1962). “Path Integrals and Quantum Mechanics.” McGraw Hill, New York. Fidelis, K., Stern, P. S., Bacon, D., and Moult, J . (1994). Comparison of systematic search and database methods for constructing segments of protein structure. Protein Eng. 7, 953-960. Fine, R. M., Wang, H., Shenkin, P. S., Yarmush, D. L., and Levinthal, C. (1986). Predicting antibody hypervariable loop conformations. 
11: Minimization and molecular dynamics studies of McPC 603 from many randomly generated loop conformations. Proteins 1, 342-362. Finkelstein, A. V., and Janin, J. ( I 989). The price of lost freedom: Entropy of himolecular complex formation. Protein Eng. 3, 1-3. Finney, J. L. (1975). Volume occupation, environment and accessibility in proteins: The problem of the protein surface.]. Mol. Rial. 96, 721-732. Fischer, E. (1894). Einfluss der Configuration auf die Wirkung der Enzyme. Chem. Ber. 27, 2985-2993. Fischmann, T. O., Bentley, G. A., Bhat, N., Boulot, G., Mariuzza, R. A,. Phillips, S. E. V., Tello, D., and Poljak, R. J . (1991). Crystallographit refinement of the three-dimensional structure of the Fab D1.3-lysozyme complex at 2.5 A reso1ution.J. Biol. Chem. 266, 12915. Flory, P. J. (1 969). “Statistical Mechanics of Chain Molecules.” Oxford University Press, New York. Fontecilla-Camps, J . C., Habersetzer-Rochat, C., and Rochat, H. (1988). Orthorhombic crystals and three-dimensional structure of the potent toxin I1 from the scorpion Androctonus australis Hector. Proc. Natl. Acud. Sci. U.S.A. 85, 7443-7447. Foote, J . , and Winter, G. (1992).Antibody framework residues affecting the conformation of the hypervariable loops.J . Mol. Biol. 224, 487499. Freund, C., Ross, A,, Guth, B., Pluckthun A,, and Holak, T. A. (1993). Characterization of the linker peptide of the single-chain Fv fragment of an antibody by NMR spectroscopy. FEBS Lett. 320, 97-1 00. Friedman, A. R., Roberts, V. A., and Tainer, J . A. (1994). Predicting molecular interactions

244

JIRI NOVOTNY AND JURCEN BAJORATH

and inducible complementarity: Fragment docking of Fab-peptide complexes. Proteins 20, 15-24. Friedman, M. L. (1975). Image approximation to the reaction field. Mol. Phys. 29, 1533-1543. Ganju, R. K., Smiley, S., Bajorath, J., Novotny, J., and Reinherz, E. L. (1992). Similarity between fluorescein-specific T-cell receptor and antibody in chemical details of antigen recognition. Proc. Natl. Acad. Sci. U.S.A. 89, 11552-1 1556. Gascoigne, N. R. J,, Goodnow, C. C., Dudzik, K. I., Oi, V. T., and Davis, M. M. (1987). Secretion of a chimeric T-cell receptor-immunoglobulin protein. Proc. Natl. Acad. Sci. U.S.A. 84, 2936-2940. Gerstein, M., Sonnhammer, E. L. L., and Chothia, C. (1994). Volume changes in protein evo1ution.J. Mol. Biol. 236, 1067-1078. Getzoff, E. D., Geysen, H. M., Rodda, S. J., Alexander, H., Tainer, J. A., and Lerner, R. A. (1987). Mechanisms of antibody binding to a protein. Science 235, 1191-1 196. Getzoff, E. D., Tainer, J. A,, Lerner, R. A., and Geysen, H. M. (1988). The chemistry and mechanism of antibody binding to protein antigens. Adu. Immunol. 43, 1-98. Geysen, H. M., Tainer, J. A,, Rodda, S. J., Mason, T. J., Alexander, H., Getzoff, E. D., and Lerner, R. A. (1987). Chemistry of antibody binding to a protein. Science 235, 1184-1 190. Gibrat, J. F., Higo, J., Collura, V., and Garnier, J. (1992). A simulated annealing method for modeling the antigen combining site of antibodies. Immunomethods 1, 107-125. Glockshuber, R., Malia, M., Pfitzinger, I., and Pluckthun, A. (1990). A comparison of strategies to stabilize immunoglobulin Fv fragments. Biochemistry 29, 1362-1367. Godel, K. (1931). Uber formal unentscheidbare Satze der Principia Mathematica und verwandter Systeme I. Monatsh. Mat. Phys. 38, 173-198. Go, N., and Scheraga, H. A. (1970). Ring closure and local conformational deformations of chain molecules. Macromolecules 3, 178-187. Goodsell, D. S., and Olson, A. J. (1990). Automated docking of substrates to proteins by simulated annealing. 
Proteins 8, 195-202. Gorman, S. D., Clark, M. R., Routledge, E. G., Cobbold, S. P., and Waldmann, H. (1991). Reshaping a therapeutic CD4 antibody. Proc. Natl. Acad. Sci. U.S.A. 88,41814185. Goshorn, S. C., Svensson, H. P., Kerr, D. E., Somerville, J . E., Senter, P. D., and Fell, H . P. (1993). Genetic construction, expression and characterization of a single chain anticarcinoma antibody fused to p-lactamase. Cancer Res. 53, 2 123-2 127. Goverman, J., Gomez, S. M., Segesman, K. D., Hunkapiller, T., Laug, W. E., and Hood, L. (1990). Chimeric immunoglobulin-T cell receptor proteins form functional receptors: Implications for T cell receptor complex formation and activation. Cell 60, 929-939. Granier, C., Novotny, J., Fontecilla-Camps, J. C., Fourquet, P., El Ayeb, M., and Bahraoui, E. (1989). The antigenic structure of a scorpion toxin. Mol. Immunol. 26, 503-513. Greenspan, N. S. (1992). Epitopes, paratopes and other topes: Do immunologists know what they are talking about? Bull. Inst. Pasteur 90, 267-279. Greer, J. (1991). Comparative modeling of homologous proteins. Meth. Entymol. 202, 239252. Gregoire, C., Rebai, N., Schweisguth, F., Necker, A., Mazza, G., Auphan, N., Millward, A., Schmitt-Verhulst, A. M., and Malissen, B. (1991). Engineered secreted T-cell receptor ap heterodimers. Proc. Natl. Acad. Sci. U.S.A. 88, 8077-8081. Gregoret, L. M., and Cohen, F. E. (1990). Novel method for the rapid evaluation of packing in protein structures.J. Mol. B i d . 21 1, 959-974. Gregoret, L. M., and Cohen, F. E. (1991). Protein folding: Effect of packing density on chain conformation.J. Mol. Biol. 219, 109-122. Gross, G., Waks, T., and Eshhar, Z. (1989). Expression of immunoglobulin-T-cell receptor chimeric molecules as functional receptors with antibody-type specificity. Proc. Natl. Acad. Sci. U.S.A. 86, 10024-10028.

COMPUTATIONAL BIOCHEMISTRY OF ANTIBODIES

245

Gruen, L. C., McKimm-Breschkin, J. L., Caldwell, J. B., Nice, E. C. (1994). Affinity ranking of imfluenza neuraminidase mutants with monoclonal antibodies using an optical biosensor: Comparison with ELISA and slot blot assays.]. Immunol. Meth. 168, 91-100. Haber, E. (1964). Recovery of antigenic specificity after denaturation and complete reduction of disulfides in a papain fragment of antibody. Proc. Natl. Acad. Sci. U.S.A. 52, 10991106. Haber, E., Richards, F. F., Spragg, J., Austen, K. F., Vallotton, M., and Page, L. B. (1967). Modifications in the heterogeneity of antibody response. Cold Spring Harbor Symp. Quant. Riol. 32, 299-310. Haber, E., Margolies, M. N., Cannon, L. E. (1976). Origins of antibody diversity: Insights gained from amino acid sequence studies of elicited antibodies. Cold Spring Harbor Symp. Quant. Biol. 41, 647-659. Haber, E.,Quertermous, T., Matsueda, G., and Runge, M. (1989). Innovative approaches to plasminogen activator therapy. Science 243, 5 1-56, Hagler, A. T., Huler, E., Lifson, S. (1974). Energy functions for peptides and proteins. I: Derivation of a consistent force field including the hydrogen bond from amide crystals.]. Am. Clieni. Sac. 96, 5319-5327. Harpaz, Y., and Chothia, C. (1994). Many of the immunoglobulin superfamily domains in cell adhesion molecules and surface receptors belong to a new structural set which is close to that containing variable domains.]. Mol. Biol. 238, 528-539. Harpaz, Y., Gernstein, M., and Chothia, C. (1994). Volume changes on protein folding. Structure 2, 641-650. Harris, L. J . , Larson, S. B., Hasel, K.W., Day, J., Greenwood, A,, and McPherson, A. (1992). The three-dimensional structure of an intact monoclonal antibody for canine lymphoma. Nature 360, 369-372. Hawkins, R. E., Russell, S. J., Baier, M., and Winter, G. (1993). The contribution of contact and non-contact residues of antibody in the affinity of binding to antigen: The interaction of mutant D1.3 antibodies with 1ysozyme.J. Mol. Biol. 
234, 958-964. Hayden, M.S., Linsley, P. S., Gayle, M. A., Bajorath, J . , Brady, W. A,, Norris, N. A., Fell, H. P., Ledbetter, J. A., and Gilliland, L. K. (1994). Single-chain mono- and bispecific antibody derivatives with novel biological properties and anti-tumor activity from a COS cell transient expression system. Ther. Immunol. 1, 3-15. Hedrick, S. M., Nielsen, E. A., Kavaler, J., Cohen, D. I., and Davis, M. M. (1984). Sequence relationship between putative T-cell receptor polypeptides and immunoglobulins. Nature 308, 153-158. Heisenberg, W. ( 1 925). Uber quantentheoretische Umdeutung kinematischer und mechanischer Beziehungen. Z. Phys. 33, 879-891. Hermans, J., and McQueen, J. E. (1974). Computer simulation of (macro) molecules with the method of local change. A . Crystallogr. A30, 730-739. Herron, J. N., He, X. M., Mason, M. L., Voss, E. W., and Edmundson, A. B. (1989). Threedimensional structure of a fluorescein Fab complex crystallized in 2-methyl-2,4pentanediol. Protezns 5, 271-280. Hiatt, A,, Cafferkey, R., and Bowdish, K. (1989). Production of antibodies in transgenic plants. Nature 342, 76-78. Higo, J., Collura, V., and Garnier, J. (1992). Development of an extended simulated annealing method: Application to the modeling of complementarity-determining regions of immunoglobulins. Biopolymers 32, 33-43. Hill, R. L., Delaney, R., Fellows, R. E., and Lebovitz, H. E. (1966). The evolutionary origins of the immunoglobulins. Proc. Natl. Acad. Sci. U.S.A. 56, 1762-1769. Hilschmann, N., and Craig, I,. C. (1965). Amino acid sequence studies with Bence-Jones proteins. Proc. Natl. Acad. Sci. U.S.A. 53, 1403-1409.

246

JIRI NOVOTNY AND JURCEN BAJORATH

Hilyard, K. L., Reyburn, H., Chung, S., Bell, J. I., and Strominger, J. L. (1994). Binding of soluble natural ligands to a soluble human T-cell receptor fragment in Escherichiu coli. Proc. Natl. Acad. Sci. U.S.A. 91, 9057-9061. Hochman, J., Inbar, D. and Givol, D. (1973). An active antibody fragment (Fv) composed of the variable portions of the heavy and light chains. Biochemistry 12, 1130-1 135. Hoffren, A. M., Holm, L., Laaksonen, L., Teeri, T., and Teleman, 0. (1992). Molecular dynamics simulations of hapten binding to structural models of 2-phenyloxazolone antibodies. Immunomethods 1, 80-90. Holliger, P., Prospero, T., and Winer, G. (1993). “Diabodies”: Small bivalent and bispecific antibody fragments. Proc. Natl. Acad. Sci. U.S.A. 90, 644445448. Holm, L., and Sander, C. (1994). Searching protein structure databases has come of age. Proteins 19, 165-173. Horne, C., Klein, M., Polidoulis, I., and Dorrington, K. J. (1982). Noncovalent association of heavy and light chains of human immunoglobulins. 111. Specific interactions between VH and VL.J . Immunol. 129,660-664. Housset, D., Habersetzer-Rochat, C., Astier, J. P., and Fontecilla-Camps, J. C. (1994). Crystal structure of toxin I1 from the scorpion Androctonus australis Hector refined at 1.4 A resolution.]. Mol. Biol. 238, 88-103. Horton, N., and Lewis, M. (1992). Calculation of the free energy of association for protein complexes. Protein Sci. 1, 169-181. Hsiao, K. C., Bajorath, J., and Harris, L. J. (1994). Humanization of 60.3, and anti-CD18 antibody: Importance of the L2 loop. Protein Eng. 7, 815-822. Huber, R., Kukla, D., Bode, W., Schwager, P., Barterls, K., Deisenhofer, J., and Steigemann, W. (1974).J. Mol. Biol. 89, 73-101. Huston, J. S., Levinson, D., Mudgett-Hunter, M., Tai, M. S., Novotny, J., Margolies, M. N., Ridge, R., Bruccoleri, R. E., Haber, E., Crea, R., and Oppermann, H. (1988). 
Protein engineering of antibody binding sites: Recovery of specific activity in an anti-digoxin single-chain Fv analogue produced in Escherichiu coli. Proc. Natl. Acad. Sci. U.S.A. 85, 58795883. Huston, J. S., Mudgett-Hunter, M., Tai, M. S., McCarthy, J., Warren, F., Haber, E., and Oppermann, H. (1991). Protein engineering of single-chain Fv analogs and fusion proteins. Methods Enzymol. 203, 46-88. Igarashi, T., Sato, M., Katsube, Y., Takio, K., Tanaka, T., Nakanishi, M., and Arata, Y. (1990). Structure of a mouse immnuglobulin G that lacks the entire CH1 domain: Protein sequencing and small-angle X-ray scattering studies. Biochemistry 29, 5727-5733. Jackson, R. M., and Sternberg, M. J. E. (1994). Application of scaled particle theory to model the hydrophobic effect: Implications for molecular association and protein stability. Protein Eng., 3, 371-383. Janin, J. (1995). Elusive affinities. Proteins 21, 30-39. Jeffrey, P. D., Strong, R. K., Sieker, L. C., Chang, C. Y., Campbell, R. L., Petsko, G. A,, Haber, E., Margolies, M. N., and Sheriff, S. (1993). 26-10 Fab-digoxin complex: Affinity and specificity due to surface complementarity. Proc. Natl. Acad. Sci. U.S.A. 90, 10310-103 14. Jeffrey, P. D., Bajorath, J., Chang, C. Y., Yelton, D., Hellstrom, I., Hellstrom, K. E., and Sheriff S. (1995). The X-ray structure of an anti-tumor antibody in complex with antigen. Nat. Struct. Biol. 2, 466-47 1. Jiang, F., and Kim, S. H. (1991). “Soft docking”: Matching of molecular surface cubes.j. Mol. Baal. 219, 851-865. Jin, L., Fendly, B. M., and Wells, J. A. (1992). High resolution functional analysis of antibody-antigen interacti0ns.J. Mol. Biol. 226, 85 1-865. Jones, P. T., Dear, P. H., Foote, J., Neuberger, M. S., and Winter, G. (1986). Replacing the

COMPUTATIONAL BIOCHEMISTRY OF ANTIBODIES

247

complementarity-determining regions in a human antibody with those from a mouse. Nature 321, 522-525. Jones, T. A., and Thirup, S. (1986). Using known structures in protein model building and crystallography. EMBOJ. 5 , 819-822. Jones, E. Y., Davis, S. J., Williams, A. F., Harlos, K., and Stuart, D. I. (1 992). Crystal structure at 2.8 A resolution of a soluble form of the cell adhesion molecule CD2. Nature 360, 232-2 39. Jores, R., Alzari, P. M., and Meo, T. (1990). Resolution of hypervariable regions in T-cell receptor /3 chains by a modified Wu-Kabat index of amino acid diversity. Proc. Natl. Acad. Sci. U.S.A. 87, 9138-9142. Jorgensen, W. L., and Tirado-Rives, M. (1988). The OPLS potential functions for proteins: Energy minimizations for crystals of cyclic peptides and crambin.J. Am. Chem. Sac. 110, 1657- 1666. Jung, S. El., Pastan, I., and B. K. Lee (1994). Design of interchain disulfide bonds in the framework region of the Fv fragment of the monoclonal antibody B3. Proteins 19, 35-47. Kabat, E. A. (1970). Heterogeneity and structure of antibody combining sites. Ann. N. Y. Acad. Sci. 169, 43-54. Kabat, E. A., and Wu, T. T. (1972). Construction of a three-dimensional model of the polypeptide backbone of the variable region of kappa immunoglobulin light chains. Proc. Natl. Acad. Sci. U.S.A. 69, 960-964. Kabat, E. A., Wu, T. T., Reid-Miller, M., Perry, H. M., and Gottesman, K. S. (1977). “Sequences of Proteins of Immunological Interest,” 4th ed. U. S. Public Health Service, NIH, Washington, DC. Kabsch, W. (1976). A solution for the best rotation to relate two sets of vectors. Acta Crystallop. A32,922-923. Kabsch, W., and Sander, C. (1984). On the use of sequence homologies to predict protein structure: Identical pentapeptides have completely different conformations. Proc. Natl. Acad. Sci. U.S.A. 81, 1075-1078. Kam-Morgan, L. N. W., Smith-Gill, S. J., Taylor, M. G., Zhang, L., Wilson, A. C., and Lirsch, J. F. (1993). 
High-resolution mapping of the HyHEL-I0 epitope of chicken lysozyme by site-directed mutagenesis. Proc. Natl. Acud. Sci. U.S.A. 90, 3958-3962. Kao, C. Y. Y., and Sharon, J. (1993). Chimeric antibodies with anti-dextran-derived complementarity-determining regions and anti-p-azophenylarsonate-derived framework regions. J . Immunol. 151, 1968-1979. Karush, F. (1962). Immunologic specificity and molecular structure. Adv. Immunol. 2, 1 4 0 . Kauzmann, W. (1959). Some factors in the interpretation of protein denaturation. Adv. Protein Chem. 14, 1-63. Kelley, R. F., O’Connell, M. P. (1993). Thermodynamic analysis of an antibody functional epitope. Biochemistry 32, 6828-6835. Kelley, R. F., O’Connell, M. P., Carter, P., Presta, L., Eigenbrot, C., Covarrubias, M., Snedecor, B., Bourell, J. H., and Vetterlein, D. (1992). Antigen binding thermodynamics and antiproliferative effects of chimeric and humanized anti-pl8supHERP antibody Fab fragments. Biochemistry 31, 5434-5441. Kipriyanov, S. M., Dubel, S., Breitling, F., Kontermann, R. E., and Little, M. (1994). Recombinant single-chain Fv fragments carrying C-terminal cysteine residues: Production of bivalent and biotinylated miniantibodies. MoZ. Immun,ol. 31, 1047-1058. Kirkpatrick, S., Gelatt, C. D. Jr., and Vecchi, M. P. (1953). Optimization by simulated annealing. Science 220, 671-680. Kirkwood, J. G . (1934). Theory of solutions of molecules containing widely separated charges with special applications to zwitterions.]. Chem. Phys. 2, 351-461.

248

JIRI NOVOTNY AND J U R C E N BAJORA‘I‘H

Kitson, D. H., Avbelj, F., Moult, J., Nguyen, D. T., Mertz, J. E., Hadzi, D., and Hagler, A. T. (1993). On achieving better than I-A accuracy in a simulation of a large protein: Streptomyces griseus protease A. Proc. Natl. Acad. Sci. U.S.A., 90, 8920-8924. Klapper, I., Hagstrom, R., Fine, R., Sharp, K. A,, and Honig, B. (1986). Focusing of electric fields in the active site ofCu,Zn superoxide dismutase. Proteins 1, 47-79. Kolbinger, F., Saldanha, J., Hardman, N., and Bendig, M. M. (1993). Humanization of a mouse anti-human IgE antibody: A potential therapeutic for IgE-mediated allergies. Protein Eng. 6, 971-980. Kortt, A. A,, Malby, R. L., Caldwell, J. B., Gruen, L. C., Ivancic, N., Lawrence, M. C., Howlett, G. J., Webster, R. G., Hudson, P. J.. and Colman, P. M. (1994). Recombinant anti-sialidase single-chain variable fragment antibody: Characterization, formation of dimer and higher-molecular-mass multimers and the solution of the crystal structure of the singlechain variable fragmentisialidase complex. FEBS I&//. 221, 15 1-157. Kostelny, S. A., Cole, M. S., and Tso, J. Y. (1992). Formation o f a bispecific antibody by the use of leucine zippers.]. Immunol. 148, 1547-1553. Kozack, R. E., and Subramaniam, S. (1993). Brownian dynamics simulations of molecular recognition in an antibody-antigen system. Protein Sci. 2, 915-926. Krdulis, P.J. (1991). MOLSCRIPT: A program to produce both detailed and schematic plots of protein structures.]. Appl. CTstallogr. 14, 946-950. Krystek, S., Stouch, T., and Novotny, J . (1993). Affinity and specificity of serine endopeptidase-protein inhibitor enzymes: Empirical free energy calculations based on X-ray crystallographic structures.]. Mol. B i ~ l 234, . 661-679. Kuhn, T. S. (1970). “The Structure of Scientific Revolutions,” pp. 43-51. University of Chicago Press, Chicago. Kuntz, I. D., and Crippen, G. M. (1979). Protein densities. Int. J . Peptide Protein Res. 13, 223-228. Kurucz, I., Jost, C. R., George, A. J , T.,Andrew, S. 
M., and Segal, D. M. (1993). Abacterially expressed single-chain Fv construct from the 2B4 T-cell receptor. Proc. Natl. Acad. Sci. U.S.A.90, 3830-3834. Kurucz, I., Titus, J . A., Jost, C. R., Jacobus, C. M., and Segal, D. M. (1995). Retargeting of CTL by an efficiently refolded bispecific single-chain Fv dimer produced in bacteria. ]. Immunol. 154,45764582, Ladd, M. F. C., and Palmer, R. A. (1985). “Structure Determination by X-ray Crystallography,” 2nd ed. Plenum Press, New York. Landolfi, N. F. (1991). A chimeric IL-Bilg molecule possesses the functional activity of both proteins.,/. Immunol. 146, 915-919. Landsteiner, K. (1962). “Specificity of Serological Reactions.” Dover, New York. Laver, W. G., Air, G. M., Webster, R. G., and Smith-Gill, S. J . (1990). Epitopes on protein antigens: Misconceptions and realities. Cell 61, 553-556. Lavoie, T. B., Drohan, W. N., and Smith-Gill, S. (1992). Experimental analysis by sitedirected mutagenesis of somatic mutation effects on affinity and fine specificity in antibodies specific for 1ysozyme.J. Immunol. 148, 503-5 13. Lawrence, M. C., and Colman, P. M. (1993). Shape complementarity at proteiniprotein interfaces./. Mol. Bid. 234, 946-950. Leahy, D. J., Axel, R., and Hendrickson, W. A. (1992). Crystal structure of a soluble form of the human T-cell receptor CD8 at 2.6 resolution. Cell 68, 1145-1 162. Leatherbarrow, R. J., Jackson, W. R. C., and Dwek, R. A. (1982). Role of tyrosines in the combining site of the dinitrophenyl-binding IgA myeloma M3 15: Specific nitration and high-resolution hydrogen-1 nuclear magnetic resonance studies. Biochemistry 21, 5 1245 129.

COMPUTATIONAL BIOCHEMISTRY OF ANTIBODIES

249

Lee, B. K., and Richards, F. M. (1971). The interpretation of protein structures: Estimation of static accessibility.J . Mol. Biol. 55,379400. Lemberg, L., and Stillinger, F. H. (1975). Central force model for liquid water.]. Chem. Phys. 62,1677-1690. Lescar, J., Pellegrini, M., Souchon, H., Tello, D., Poljak, R. J., Peterson, N., Greene, M., and Alzari, P. M. (1995). Crystal structure of a cross-reaction complex between Fab F9.13.7 and guinea fowl lysozyme.J . Biol. Chem. 270, 18067-18076. Lesk, A. M. (1991). “Protein Architecture: A Practical Approach,” pp. 130-133. Oxford University Press, Oxford. Lesk, A. M., and Chothia, C. (1982). Evolution of proteins formed by /I-sheets. 11: The core of the immunoglobulin d0mains.J. Mol. Biol. 160,325-342. Lesk, A. M., and Chothia, C. (1988). Elbow motion in the immunoglobulins involves a molecular ball-and-socket joint. Nature 335, 188-1 90. Levitt, M. (1983). Protein folding by restrained energy minimization and molecular dynamics.J. Mol. Biol. 168,595-617. Levitt, M. (1992). Accurate modeling of protein conformation by automatic segment matching.J . Mol. Biol. 226,507-533. Levitt, M.,and Lifson, S. (1969). Refinement of protein conformations using a macromolecular energy minimization procedure.J. Mol. Bid. 46,269-279. Levitt, M., and Perutz, M. F. (1988). Aromatic rings as hydrogen bond acceptors. J . Mol. Biol. 201,751-754. Levitt, M., and Warshel, A. (1975). Computer simulation of protein folding. Nature 253, 694-698. Levy, R.,Assulin, O., Scherf, T., Levitt, M., and Anglister, J. (1989). Probing antibody diversity by 2D NMR: Comparison of amino acid sequences, predicted structures, and observed antibody-antigen interactions in complexes of two antipeptide antibodies. Biochemistry 28, 7168-7 175. Linsley, P. S.,Brady, W., Grosmaire, L., Ledbetter, J. A,, and Damle, N. (1991). CTLA-4 is a second receptor for the B cell activation antigen B7.J . Exp. Med. 174,561-569. London, F. (1930). 
Uber einige Eigenschaften und Anwendungen der Molekularkrafte. Z. Phys. Chem. Bll,222-251. Lowman, H. B., Bass, S. H., Simpson, N., and Wells, J. A. (1991). Selecting high-affinity binding proteins by monovalent phage display. Biochemistry 30, 10832-1 0838. Luthy, R., Bowie, J. U., and Eisenberg, D. (1992). Assessment of protein models with threedimensional profiles. Nature 356,83-85. Mack, M., Riethmuller, G., and Kufer, P. (1995). A small bispecific antibody construct expressed as a functional single-chain molecule with high tumor cell cytotoxicity. Proc. Nutl. Acad. Sci. U.S.A. 92,7021-7025. Mage, M. G., Lee, L., Ribado, R. K., Corr, M., Kozlowski, S., McHugh, L., and Margulies, D. H. (1992). A recombinant, soluble, single-chain class I major histocompatibility complex molecule with biological activity. Proc. Natl. Acad. Sci. U.S.A. 89, 10658-10662. Malby, R. I>.,Tulip, W. R., Harley, V. R., McKimm-Breschkin, J. L., Laver, W. G., Webster, R. G., and Colman, P. M. (1994). The structure of a complex between the NClO antibody and influenza virus neuraminidase and comparison with the overlapping binding site of the NC41 antibody. Structure 2,733-746. Mallender, W. D., and Voss, E. W. (1994). Construction, expression and activity of a bivalent bispecific single-chain antibody./. Biol. Chem. 269, 199-206. Mallender, W. D., Ferreira, S. T., Voss, E. W., and Coelho-Sampaio, T. (1994). Interactivesite distance and solution dynamics of a bivalent-bispecific single-chain antibody molecule. Biochemistry 33, 10100-10108.

250

JIRI NOVOTNY AND JURCEN BAJORATH

Mariuzza, R. A., and Winter, G. (1989). Secretion of a homodimeric VacK T-cell receptorimmunoglobulin chimeric protein.]. Biol. Chem. 264, 73 10-7316. Mark, A. E., and van Gunsteren, W. F. (1994). Decomposition of the free energy of a system in terms of specific interactions.]. Mol. Bid. 240, 167-176. Marks, J. D., Hoogenboom, H. R., Grifiths, A. D., and Winter, G. (1992). Molecular evolution of proteins on filamentous phage.]. Mol. Biol. 267, 16007-16010. Marquart, M., Deisenhofer, J., Huber, R., and Palm, W. (1980). Cyrstallographic refinement and atomic models of the intact immunoglobulin molecule kol and its antigen-binding fragment at 3.0 A and 1.9 8, resolution.]. Mol. Bid. 141, 369-391. Martin, A. C. R., Cheetham, J . C., and Rees, A. R. (1989). Modeling antibody hypervariable loops: A combined algorithm. Proc. Nutl. Acud. Sci. U.S.A. 86, 9268-9272. Mas, M. T., Smith, K. C., Yarmush, D. L., Aisaka, K., and Fine, R. M. (1992). Modeling the anti-CEA antibody combining site by homology and conformational search. Proteins 14, 483-498. McLachlan, A. D. (1972). Repeated sequences and gene duplication in proteins.]. Mol. Bid. 64,417-437. McManus, S., and Riechmann, L. (1991). Use of 2D NMR, protein engineering, and molecular modeling to study the hapten binding site of an antibody Fv fragment against 2phenyloxazolone. Bzochemistly 30, 585 1-5857. Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, E. A. H., Teller, E. (1953). Equation of state calculations by fast computing machines.j. Chem. Phys. 21, 1087-1092. Miyazawa, S., and Jernigan, R. L. (1985). Estimation of effective interresidue contact energies from protein crystal structures: Quasi-chemical approximation. Macromolecules 18, 534-552. Morrison, S. L., Johnson, M. J., Herzenberg, L. A., and Oi, V. T. (1984). Chimeric human antibody molecules: Mouse antigen-binding domains with human constant regions domains. Proc. Nutl. Acud. Sci. U.S.A. 81, 6851-6855. Mottez, E., Jaulin, C . 
, Godeau, F., Choppin, J., Levy, J. P., and Kourilsky, P. (1991). A single-chain murine class I major transplantation antigen. Eur.]. Zmmunol. 21, 4 6 7 4 7 1 . Moult, J., and James, M. N. G . (1986). An algorithm for determining the conformation in proteins by systematic search. Proteins 1, 146-163. Murphy, K., and Freire, E. (1992). Thermodynamics of structural stability and cooperative folding behavior in proteins. Adu. Protein Chem. 43, 313-361. Murphy, K. P., Xie, D., Garcia, C., Amzel, L. M., and Freire, E. (1993). Structural energetics of peptide recognition: Angiotensin IIiantibody binding. Proteins 15, 1 1 3-120. Murphy, K. P., Xie, D., Thompson, K. S., Amzel, L. M., and Freire, E. (1994). Entropy in biological binding processes: Estimation of translational entropy loss. Proteins 18, 63-67. Mylvaganam, S. E., Paterson, Y., Kaiser, K., Bowdish, K., and Getzoff, E. D. (1991). Biochemical implications form the variable gene sequences of an anti-cytochrome c antibody and crystallographic characterization of its antigen-binding fragment in free and antigencomplexed f0rrns.J. Mol. Biol. 221, 455-462. Nagel, E., and Newman, J. R. (1958). “Godel’s Proof.” New York University Press, New York. Naghibi, H., Tamura, A., and Sturtevant, J. M. (1995). Significant discrepancies between van’t Hoff and calorimetric enthalpies. Proc. Nutl. Acud. Sci. U.S.A. 92, 5597-5599. Nakatani, T., Lone, Y. C., Yarnakawa,J., Kanaoka, M., Gomi, H., Wijdenes, J., and Noguchi, H. (1994). Humanization of mouse anti-human IL-2 receptor antibody B-BlO. Protein Eng. 7,435-443. Near, R., Mudgett-Hunter, M., Novotny, J., Bruccoleri, R.E., and Ng, S. C. (1993). Characterization of an anti-digoxin antibody binding site by site-directed in uitro mutagenesis. Mol. Immunol. 30, 369-377. Nell, L. J., McCammon, J. A., and Subramaniam, S. (1992). Anti-insulin antibody structure

COMPUTATIONAL BIOCHEMISTRY OF ANTIBODIES

25 1

and conformation. I: Molecular modeling and mechanics of an insulin antibody. Biopolymers 32, 1 1-21. Neuberger, M. S., Willimas, G. T., and Fox, R. 0 . (1984). Recombinant antibodies possessing novel effector functions. Nature 312, 604-608. Neuberger, M. S., Willimas, G . T., Mitchell, E. B., Jouhal, S. S., Flanagan, J. G., and Rabbitts, T. H. (1985). A hapten-specific chimaeric IgE antibody with human physiological effector functions. Nature 314, 268-270. Newell, N., Richards, J. E., Tucker, P. W., and Blattner, F. R. (1980).J. genes for heavy chain immunoglobulins of mouse. Science 209, 1128-1 132. Nicholls, A. (1992). “GRASP: Graphical Representation and Analysis of Surface Properties.” Columbia University, New York. Nicholls, A., Sharp, K. A., and Honig, B. (1991).Protein folding and association: Insights from the interfacial and thermodynamic properties of hydrocarbons. Proteins 11,28 1-296. Noelken, M. E., Nelson, C. A,, Buckley, C. E., and Tanford, C. (1965). Gross conformation of rabbit 7 S y-immunoglobulin and its papain-cleaved fragments.J. Biol. Chem. 240, 2 18-224. Norel, R., Lin, L., Wolfson, C. A,, and Nussinov, R. (1994). Shape complementarity at proteinprotein interfaces. Binpolymers 34, 933-940. Novotny, J. (1991). Protein antigenicity: A thermodynamics approach. Mol. Immunol. 28, 201-207. Novotny, J., Bruccoleri, R. E., Newell, J., Murphy, D., Haber, E., and Karplus, M. (1983). Molecular anatomy of the antibody binding site./. Biol. Chem. 258, 14433-14437. Novotny, .J., and Haber, E. (1985). Structural invariants of antigen binding: Comparison of VL-VH and VL-VI. domain dimers. Proc. Natl. Acad. Sci. U.S.A. 82, 4592-4596. Novotny, J., and Haber, E. (1986). Static accessibility model of protein antigenicity: The case of scorpion neurotoxin. Biochemistry 25,6748-6754. Novotny, J., and Sharp, K. A. (1992). Electrostatic fields in antibodies and antibody/antigen complexes. Prog. Biophys. Mnl. Biol. 58, 203-224. Novotny, J., Bruccoleri, R. 
E., and Karplus, M. (1984). An analysis of incorrectly folded protein models.]. Mol. Biol. 177, 787-818. Novotny, J., Handschumacher, M., and Haber, E. (1986a). Location of antigenic epitopes on antibody mo1ecules.J. Mol. Biol. 189, 175-72 1 . Novotny, J., Handschumacher, M., Haber, E., Bruccoleri, R. E., Carlson, W. B., Fanning, D. W., Smith, J. A., and Rose, G . D. (1986b). Antigenic determinants in proteins coincide with surface regions accessible to large probes (antibody domains). Proc. Natl. Acad. Sci. U.S.A. 83, 226-230. Novotny, J., Tonegawa, S., Saito, H., Kranz, D. M., and Eisen, H. N. (1986~).Secondary, tertiary and quaternary structure of T-cell-specific immunoglobulin-like polypeptide chains. Proc. Natl. Acad. Sci. U.S.A. 83, 742-746. Novotny, J., Handschumacher, M., and Bruccoleri, R. E. (1987). Protein antigenicity: A static surface property. Immunol. Today 8, 26-3 1 . Novotny, J., Bruccoleri, R. E., and Saul, F. A. (1989). On the attribution of binding energy in antigen-antibody complexes McPC 603, D1.3 and HyHEL-5. Biochemistry 28, 4753-4749. Novotny, J., Bruccoleri, R. E., and Haber, E. (1990). Computer analysis of mutations that affect antibody specificity. Proteins 7, 93-98. Novotny, J., Ganju, R. K., Smiley, S. T., Hussey, R. E., Luther, M. A., Recny, M. A,, Siciliano, R. F., and Reinherz, E. L. (1991). A soluble, single-chain T-cell receptor fragment endowed with antigen-combining properties. Proc. Natl. Acad. Sci. U.S.A. 88, 8684686850. Nuss, J. M., Bossart, P. J., and Air, G. M. (1993). Indentification of critical contact residues in the NC41 epitope of the subtype N9 influenza neuraminidase. Proteins 15, 121-132. Ochi, A,, Hawley, R. G., Shulman, M. J., and Hozumi, N. (1983). Transfer of a cloned

252

JIKI NOVOTNY AND JUKGEN BAJORATH

immunoglobulin light chain gene to mutant hybridoma cells restores specific antibody production. Nuturf 302, 340-343. Oi, V. T,, Morrison, S. L., Herzenberg, L. A., and Berg, P. (1983). Immunoglobulin gene expression in transformed lymphoid cells. Proc. Natl. Acad. Sci. U.S.A. 80, 825-829. Pdbo, C. O., and Suchanek, E. G. (1986). Computer-aided model building strategies for protein design. Biochemist? 35, 5987-5991. Pack, P., and Pluckthun, A. (1992). Miniantibodies: Use of amphipathic helices to produce functional, flexibly linked dimeric Fv fragments with high avidity in Escherichia coli. Biochemistry 31, 1579-1584. Pack, P., Kujau, M., Schroeckh, V., Knupfer, U., Wenderoth, R., Riesenberg, D., and Pluckthun, A. (1993). Improved bivalent miniantibodies, with identical avidity as whole antibodies, produced by high cell density fermentation of Escherichia coli. Biotechnology 11, 127 1-1277. Pack, P., Muller, K., Zahn, R., Pluckthun, A. (1995). Tetravalent miniantibodies with high avidity assembled in Escherichia coli. J . Mol. Biol. 246, 28-34. Padlan, E. A. (1985). Quantitation of immunogenic potential of protein antigens. Mol. Immunol. 22, 1243-1254. Padlan (1990). On the nature of antibody combining sites: Unusual structural features that may confer on these sites an enhanced capacity for binding ligands. Proteins 7, 112-124. Padlan, E. (1991). A possible procedure for reducing the imniunogenicity of antibody variable domains while preserving their ligand-binding properties. Mol. Immunol. 28, 489-498. Padlan, E. A. (1994).Anatomy of the antibody molecule. Mol. Ivnmunol. 31, 169-218. Padlan, E. A,, and Davies, D. R. (1986). A model of the Fc of Immunoglobulin E. Mol. Immunol. 23, 1063-1075. Padlan, E., Davies, D. R., Pecht, I., Givol, D., and Wright, C. (1977). Model-building studies of antigen binding sites: The hapten-binding site of MOPC 315. Cold Spring Harbor Symp. Quant. Biol. Padlan, E. A,, Silverton, E. W., Sheriff, S., Cohen, G. H., Smith-Gill, S. 
J., and Davies, D. R. (1989). Structure of an antibody-antigen complex: Crystal structure of the HyHEL-10 Fab-lysozyme complex. Proc. Natl. Acad. Sci. U.S.A. 86, 5938-5942. Pascual-Ahuir, J. L., and Silla, E. (1990). GEPOL: An improved description of molecular surfaces. I: Building the spherical surface set.J . Comput. Chem. 11, 1047-1060. Paterson, Y., Englander, S. W., and Roder, H. (1990). An antibody binding site on cytochrome c defined by hydrogen exchange and two-dimensional NMR. Science 249,755-76 1. Patten, P., Yokota, T., Rothbard, J., Chien, Y. H., Arai, K. I., and Davis, M. M. (1984). Structure, expression and divergence of T-cell receptor p-chain variable regions. Nature 312,40-46. Pauling, I,. (1940). A theory of the structure and process of formation of antibodies. J. Am. Chem. Sue. 62, 2643-2657. Pauling, L. (1960). “The Nature of the Chemical Bond,” 3rd ed. Cornell University Press, Ithaca, NY. Pauling, L., and Corey, K. B. (1951). The pleated sheet, a new layer of configuration of polypeptide chains. Proc. Natl. Acad. Sci. U.S.A. 37, 251-256. Pedersen, J., Searle, S., Henry, A., and Rees, A. R. (1992). Antibody modeling: Beyond homology. Immunomethods 1, 126-1 36. Pedersen, J. T., Henry, A. H., Searle, S. J., Guild, B. C . , Roguska, M., and Rees, A. R. (1994). Comparison of surface accessible residues in human and murine immunoglobulin Fv domains.]. Mol. Bid. 235, 959-973. Pellegrini, M., and Doniach, S. (1993). Computer simulation of antibody binding specificity. Proteins 15. 436444.

COMPUTATIONAL BIOCHEMISTRY OF ANTIBODIES

253

Perisic, O.,Webb, P. A., Holliger, P., Winter, G., and Williams, R. L. (1994). Crystal structure of a diabody, a bivalent antibody fragment. Structure 2, 12 17-1 226. Peterson, C., Malone, C. C., and Williams, R. C., Jr. (1995). Rheumatoid-factor reactive sites on C H established ~ by overlapping 7-mer peptide epitope analysis. Mol. Immunol. 32, 57-75. Pickett, S. D., and Sternberg, M. J. E. (1993). Empirical scale of side chain conformational entropy in protein folding.]. Mol. Biol. 231, 825-839. Ping, J., Schildbach, J. F., Shaw, S. Y., Quertermous, T., Novotny, J., Bruccoleri, R. E., and Margolies, M. N. (1993). Effect of heavy chain signal peptide mutations and NH2-terminal chain length on binding of anti-digoxin antib0dies.J. Biol. Chem. 268, 23000-23007. Planck, M. (1933). “Where Is Science Going?” p. 176. Ox Bow Press. Pliickthun, A. (1992). Mono- and bivalent antibody fragments in Escherichzu coli: Engineering, folding and antigen binding. Immunol. Rev. 130, 151-188. Ponder, J. W. and Richards, F. M. (1987). Tertiary templates for proteins: Use of packing criteria in the enumeration of allowed sequences for different structural classes. J . Mol. Biol. 193, 775-791. Prasad, L., Sharma, S., Vandonselaar, M., Quail,J. W., Lee, J. S.,Waygood, E. B., Wilson, K. S., Dauter, Z., and Delbaere, L. T. J. (1993). Evaluation of mutagenesis of epitope mapping. ]. Biol. Chem. 268, 10705-10708. Pressman, D., and Grossberg, A. L. (1968). “The Structural Basis of Antibody Specificity.” Benjamin, New York. Presta, L. G., Lahr, S. J., Shields, R. L., Porter, J. P., Gorman, C. M., Fendly, B. M., and Jardieu, P. M. (1993). Humanization of an antibody directed against 1gE.J. Immunol. 151, 2623-2632. Privalov, P. L. (1979). Stability of proteins. Adv. Protein Chem. 33, 167-241. Privalov, P. L., and Makhatadze, G. I. (1990). Heat capacity ofproteins. 11: Partial molar heat capacity of the unfolded polypeptide chain in proteins: Protein unfolding effects.]. Mol. Bzol. 213,385-391. 
Pumphrey, R. (1986). Computer models of the human immunoglobulins. Shape and segmental flexibility. Immunol. Toduy 7, 174-1 78. Queen, C., Schneider, W. P., Selick, H. E., Payne, P. W., Landolfi, N. F., Duncan, J. F., Avdalovic, N. M., Levitt, M., Junghans, R. P., and Waldmann, T. A. (1989). A humanized antibody that binds to the interleukin-2 receptor. Proc. Nutl. Acad. Scz. U.S.A. 86, 1002910033. Rashin, A. A. (1984). Buried surface area, conformational entropy, and protein stability. Biopolymers 23, 1605-1620. Rauffer, N., Zeder-Lutz, G., Wenger, R., Van Regenmortel, M. H. V., and Altschuh, D. (1994). Structure-activity relationship for the interaction between cyclosporin A derivatives and the Fab fragment of a monoclonal antibody. Mol. Immunol. 31, 913-922. Reid, K. €5. M., and Porter, R. R. (1975). Subunit composition and structure of subcomponent C l q of the first component of human complement. Biochem.]. 155, 19-23. Reiter, Y., Brinkmann, U., Webber, K. O., Jung, S. H., Lee, B. K., and Pastan, I. (1994a). Engineering interchain disulfide bonds into conserved framework regions of Fv fragments: Improved biochemical characteristics of recombinant immunotoxins containing disulfide-stabilized Fv. I’mtein Eng. 7, 697-704. Reiter, Y., Brinkmann, U., Kreitman, Jung, S. H., Lee, B. K., and Pastan, I. (1994b). Stabilization of the Fv fragments in recombinant immunotoxins by disulfide bonds engineered into conserved framework regions. Biochemislly 33, 545 1-5459. Richards, F. M. (1974).The interpretation of protein structures: Total volume, group volume distributions and packing density.J. Mol. Biol. 82, 1-14.

254

JIRI NOVOTNY AND JURGEN BAJORATH

Richards, F. M. (1977). Areas, volumes, packing and protein structure. Annu. Rev. Biofhys. Bioeng. 6, 151-176. Richardson, J. (1981). The anatomy and taxonomy of protein structure. Adv. Protein Chem. 34, 167-339. Richmond, T. J . (1984). Solvent accessible surface area and excluded volume in proteins./. Mol. Biol. 178, 63-89. Rice, D., and Baltimore, D. (1982). Regulated expression of an immunoglobulin K gene introduced into a mouse lymphoid cell line. Proc. Natl. Acad. Sci. U.S.A. 79, 7862-7865. Riechmann, L., Clark, M., Waldmann, H., and Winter, G. (1988). Reshaping human antibodies for therapy. Nature 332, 323-327. Ring, C. S., Kneller, D. G., Langridge, R., and Cohen, F. E. (1992). Taxonomy and conformational analysis of loops in pr0teins.J. Mol. Biol. 224, 685-699. Rini, J. M., Schulze-Gahmen, U., and Wilson, I. A. (1992). Structural evidence for induced fit as a mechanism for antibody-antigen recognition. Science 255, 959-965. Roberts, S., Cheetham, J. C., and Rees, A. R. (1987). Generation of an antibody with enhanced affinity and specificity for its antigen by protein engineering. Nature 328, 731-734. Roberts, V. A,, Iverson, B. L., Iverson, S. A,, Benkovic, S. J., Lerber, R. A., Getzoff, E. D., and Tainer, J. A. (1990). Antibody remodeling: A general solution to the design of a metalcoordination site in an antibody binding pocket. Proc. Natl. Acad. Sci. U.S.A. 87, 66546658. Roberts, V. A., Stewart, J., Benkovic, S. J., and Getzoff, E. D. (1994). Catalytic antibody model and mutagenesis implicate arginine in transition-state stabilization. J . Mol. Biol. 235, 1098-1 116. Rosenstein, R. W., and Richards, F. F. (1976). The distance between the contact sites for DNP and menadione ligands in the combining region of myeloma proteins binding both haptens. Immunochemistry 13,939-943. Rossmann, M. G., and Argos, P. (1975). Acomparison of the heme binding packet in globins and cytochrome bs. J . Biol. Chem. 250, 7525-7532. Rossmann, M. G., and Argos, P. (1976). 
Exploring structural homology of pr0teins.J. Mol. Biol. 105, 75-95. Rudikoff, S., Satow, Y., Padlan, E. A., Davies, R. D., and Potter, M. (1981). Kappa chain structure from a crystallized murine Fab’: Role ofjoining segment in hapten binding. Mol. Immunol. 18,705-7 1 1. Rudikoff, S., Giusti, A. M., Cook, W. D., and Scharff, M. D. (1982). Single amino acid substitution altering antigen-binding specificity. Proc. Natl. Acad. Sci. U.S.A. 79, 1979-1983. Ruff-Jamison, and Glenney, J. R., Jr. (1993). Molecular modeling and site-directed mutagenesis of an anti-phosphorylcholine antibody predicts the combining site and allows detection of higher affinity interactions. Protein Eng. 6, 661-668. Rusconi, S., and Kohler, G . (1985). Transmission and expression of a specific pair of rearranged immunoglobulinp and K genes in a transgenic mouse line. Nature 314, 330-334. Russell, R. B., and Barton, G. J. (1994). Structural features can be unconserved in proteins with similar folds: An analysis of side-chain to side-chain contacts, secondary structure and accessibi1ity.J. Mol. Bid. 244, 332-350. Sahagan, B. G., Dorai, H., Saltzgaber-Muller, J., Toneguzzo, F., Guindon, C . A,, Lilly, S. P., McDonald, K. W., Morissey, D. V., Stone, B. A., Davis, G. L., McIntosh, P. K., and Moore, G. P. (1986). A genetically engineered muririeihuman chimeric antibody retains specificity for human tumor-associated antigen.]. Immunol. 137, 1066-1074. Saito, H., Kranz, D. M., Takagaki, Y., Hayday, A. D., Eisen, H. N., and Tonegawa S. (1984). Complete primary structure of a heterodimeric T-cell receptor deduced from cDNA sequences. Nature 309, 757-762.

COMPUTATIONAL BIOCHEMISTRY OF ANTIBODIES

255

Saito, H., Kranz, D. M., Takagaki, Y., Hayday, A. C., Eisen, H. N., and Tonegawa, S. (1984). A third rearranged and expressed gene in a clone of cytotoxic T lymphocytes. Nature 312, 36-40.
Sali, A., Shakhnovich, E., and Karplus, M. (1994). How does a protein fold? Nature 369, 248-251.
Sali, A. (1995). Modeling mutations and homologous proteins. Curr. Opin. Biotech. 6, 437-451.
Satow, Y., Cohen, G. H., Padlan, E. A., and Davies, D. R. (1986). Phosphorylcholine binding immunoglobulin McPC 603: An X-ray diffraction study at 2.7 Å. J. Mol. Biol. 190, 593-604.
Schearman, C. W., Kanzy, E. J., Lawrie, D. K., Li, Y. W., Thammana, P., Moore, G. P., and Kurrle, R. (1991). Construction, expression and biologic activity of murine/human chimeric antibodies with specificity for the human α/β T-cell receptor. J. Immunol. 146, 928-935.
Schechter, I. (1971). Mapping of the combining sites of antibodies specific for polyalanine chains. Ann. N.Y. Acad. Sci. 190, 394-419.
Schildbach, J. F., Panka, D. J., Parks, D. R., Jager, G. C., Novotny, J., Herzenberg, L. A., Mudgett-Hunter, M., Bruccoleri, R. E., Haber, E., and Margolies, M. N. (1991). Altered hapten recognition by two anti-digoxin hybridoma variants due to variable region point mutations. J. Biol. Chem. 266, 4640-4647.
Schildbach, J. F., Near, R. I., Bruccoleri, R. E., Haber, E., Jeffrey, P. D., Ng, S. C., Novotny, J., Sheriff, S., and Margolies, M. N. (1993). Heavy chain position 50 is a determinant of affinity and specificity for the anti-digoxin antibody 26-10. J. Biol. Chem. 268, 21739-21747.
Schildbach, J. F., Near, R. I., Bruccoleri, R. E., Haber, E., Jeffrey, P. D., Novotny, J., Sheriff, S., and Margolies, M. N. (1993). Modulation of antibody affinity by a non-contact residue. Protein Sci. 2, 206-214.
Schildbach, J. F., Shaw, S. Y., Bruccoleri, R., Haber, E., Herzenberg, L. A., Jager, G. C., Jeffrey, P. D., Panka, D. J., Parks, D. R., Near, R. I., Novotny, J., Sheriff, S., and Margolies, M. N. (1994). Contribution of a single heavy chain residue to specificity of an anti-digoxin monoclonal antibody. Protein Sci. 3, 737-749.
Schneider, W., Wensel, T. G., Stryer, L., and Oi, V. T. (1988). Genetically engineered immunoglobulins reveal structural features controlling segmental flexibility. Proc. Natl. Acad. Sci. U.S.A. 85, 2509-2513.
Schodin, B. A., and Kranz, D. M. (1993). Binding affinity and inhibitory properties of a single-chain anti-T cell receptor antibody. J. Biol. Chem. 268, 25722-25727.
Schrauber, H., Eisenhaber, F., and Argos, P. (1993). Rotamers: To be or not to be? An analysis of amino acid side chain conformations in globular proteins. J. Mol. Biol. 130, 592-612.
Schrodinger, E. (1926). Quantisierung als Eigenwertproblem: Erste Mitteilung. Ann. Phys. 79, 361-376.
Schrodinger, E. (1926). Quantisierung als Eigenwertproblem: Zweite Mitteilung. Ann. Phys. 79, 489-527.
Schrodinger, E. (1926). Uber das Verhaltniss der Heisenberg-Born-Jordanschen Quantenmechanik zu der meinen. Ann. Phys. 79, 734-756.
Schrodinger, E. (1926). Quantisierung als Eigenwertproblem: Dritte Mitteilung. Ann. Phys. 80, 437-490.
Schrodinger, E. (1926). Quantisierung als Eigenwertproblem: Vierte Mitteilung. Ann. Phys. 81, 109-139.
Schulze-Gahmen, U., Rini, J. M., and Wilson, I. A. (1993). Detailed analysis of the free and bound conformations of an antibody: X-ray structures of Fab 17/9 and three different Fab-peptide complexes. J. Mol. Biol. 234, 1098-1118.
Sela, M. (1969). Antigenicity: Some molecular aspects. Science 166, 1365-1374.
Shakked, Z., Guzikevich-Guerstein, G., Frolow, F., Rabinovich, D., Joachimiak, A., and Sigler, P. B. (1994). Determinants of repressor/operator recognition from the structure of the trp operator binding site. Nature 368, 469-473.
Sharon, J., Gefter, M. L., Manser, T., Morrison, S. L., and Ptashne, M. (1984). Expression of a VHCκ chimaeric protein in mouse myeloma cells. Nature 309, 364-367.
Sharp, K. A., Nicholls, A., Friedman, R., and Honig, B. (1991). Extracting hydrophobic free energies from experimental data: Relationship to protein folding and theoretical models. Biochemistry 30, 106-109.
Sheriff, S., Silverton, E. W., Padlan, E. A., Cohen, G. H., Smith-Gill, S. J., Finzel, B. C., and Davies, D. R. (1987). Three-dimensional structure of an antibody-antigen complex. Proc. Natl. Acad. Sci. U.S.A. 84, 8075-8079.
Shih, H. L., Brady, J., and Karplus, M. (1985). Structure of proteins with single mutations: A minimum perturbation approach. Proc. Natl. Acad. Sci. U.S.A. 82, 1697-1700.
Sibanda, B. L., and Thornton, J. M. (1985). β-Hairpin families in globular proteins. Nature 316, 170-174.
Siebenlist, U., Ravetch, J. V., Korsmeyer, S., Waldmann, T., and Leder, P. (1981). Human immunoglobulin D segments encoded in tandem multigenic families. Nature 294, 631-635.
Singer, S. J., and Doolittle, R. F. (1966). Antibody active sites and immunoglobulin chains. Science 153, 13-25.
Sippl, M. (1990). Calculation of conformational ensembles from potentials of mean force. An approach to the knowledge-based prediction of local structures in globular proteins. J. Mol. Biol. 213, 859-883.
Sitkoff, D., Sharp, K. A., and Honig, B. (1994). Accurate calculation of hydration free energies using macroscopic solvent models. J. Phys. Chem. 98, 1978-1988.
Smith, K. C., and Honig, B. (1994). Evaluation of conformational free energies of loops in proteins. Proteins 18, 119-132.
Smith, A. M., and Benjamin, D. C. (1991). The antigenic structure of staphylococcal nuclease. II: Analysis of the N-1 epitope by site-directed mutagenesis. J. Immunol. 146, 1259-1264.
Smith, A. M., Woodward, M. P., Hershey, C. W., Hershey, E. D., and Benjamin, D. C. (1991). The antigenic structure of staphylococcal nuclease. I: Mapping epitopes by site-directed mutagenesis. J. Immunol. 146, 1254-1258.
Smith-Gill, S., Mainhart, C., Lavoie, T. B., Feldmann, R. J., Drohan, W., and Brooks, B. R. (1987). A three-dimensional model of an anti-lysozyme antibody. J. Mol. Biol. 194, 713-724.
Smythe, M. L., and von Itzstein, M. (1994). Design and synthesis of a biologically active antibody mimic based on an antibody-antigen crystal structure. J. Am. Chem. Soc. 116, 2725-2733.
Snow, M. E., and Amzel, M. (1986). Calculating three-dimensional changes in protein structure due to amino acid substitutions: The variable region of immunoglobulins. Proteins 1, 276-279.
Sollazzo, M., Billetta, R., and Zanetti, M. (1990). Expression of an exogenous peptide epitope genetically engineered in the variable region of an immunoglobulin: Implications for antibody and peptide folding. Protein Eng. 4, 215-220.
Sompuram, S. R., and Sharon, J. (1993). Verification of a model of a Fab complex with phenylarsonate by oligonucleotide-directed mutagenesis. J. Immunol. 150, 1822-1828.
Soo Hoo, W. F., Lacy, M. J., Denzin, L. K., Voss, E. W., Hardman, K. D., and Kranz, D. M. (1992). Characterization of a single-chain T-cell receptor expressed in Escherichia coli. Proc. Natl. Acad. Sci. U.S.A. 89, 4759-4763.
Sosnick, T. R., Benjamin, D. C., Novotny, J., Seeger, P. A., and Trewhella, J. (1992). Distance between the antigen-binding sites of three murine antibody subclasses measured using neutron and X-ray scattering. Biochemistry 31, 1779-1786.
Spolar, R. S., Livingstone, J. R., and Record, M. T., Jr. (1992). Use of liquid hydrocarbon and amide transfer data to estimate contributions to thermodynamic functions of protein folding from removal of nonpolar and polar surface from water. Biochemistry 31, 3947-3955.
Stanfield, R. L., Fieser, T. M., Lerner, R. A., and Wilson, I. A. (1990). Crystal structures of an antibody to a peptide and its complex with peptide antigen at 2.8 Å. Science 248, 712-719.
Stanfield, R. L., Takimoto-Kakimura, M., Rini, J. M., Profy, A. T., and Wilson, I. A. (1993). Major antigen-induced domain rearrangements in an antibody. Structure 1, 83-93.
Stevens, F. J., Westholm, F. A., Solomon, A., and Schiffer, M. (1980). Self-association of human immunoglobulin kappa I light chains: Role of the third hypervariable region. Proc. Natl. Acad. Sci. U.S.A. 77, 1144-1148.
Stouch, T. R., and Jurs, P. C. (1986). A simple method for the representation, quantification, and comparison of the volumes and shapes of chemical compounds. J. Chem. Inf. Comput. Sci. 26, 4-12.
Studnicka, G. M., Soares, S., Better, M., Williams, R. E., Nadell, R., and Horwitz, A. H. (1994). Human-engineered monoclonal antibodies retain full specific binding activity by preserving non-CDR complementarity-modulating residues. Protein Eng. 7, 805-814.
Sturtevant, J. M. (1994). The thermodynamic effects of protein mutations. Curr. Opin. Struct. Biol. 4, 69-78.
Suh, S. W., Bhat, T. N., Navia, M. A., Cohen, G. H., Rao, D. N., Rudikoff, S., and Davies, D. R. (1986). The galactan-binding immunoglobulin Fab J539: An X-ray diffraction study at 2.6 Å resolution. Proteins 1, 74.
Summers, N. L., Carlson, W. D., and Karplus, M. (1987). Analysis of side-chain orientations in homologous proteins. J. Mol. Biol. 196, 175-198.
Sun, L. K., Curtis, P., Rakowicz-Szulczynska, E., Ghrayeb, J., Chang, N., Morrison, S. L., and Koprowski, H. (1987). Chimeric antibody with human constant regions and mouse variable regions directed against carcinoma-associated antigen 17-1A. Proc. Natl. Acad. Sci. U.S.A. 84, 214-218.
Sutton, B. J., and Gould, H. J. (1993). The human IgE network. Nature 366, 421-428.
Sutton, B. J., Gettins, P., Givol, D., Marsh, D., Wain-Hobson, S., Willan, K. J., and Dwek, R. A. (1977). The gross architecture of an antibody binding site as determined by spin-label mapping. Biochem. J. 165, 177-197.
Tai, M. S., Mudgett-Hunter, M., Levinson, D., Wu, G. M., Haber, E., Oppermann, H., and Huston, J. S. (1990). A bifunctional fusion protein containing Fc-binding fragment B of staphylococcal protein A amino terminal to antidigoxin single-chain Fv. Biochemistry 29, 8024-8030.
Tainer, J. A., Getzoff, E. D., Alexander, H., Houghten, R. A., Olson, A. J., Lerner, R. A., and Hendrickson, W. A. (1984). The reactivity of anti-peptide antibodies is a function of the atomic mobility of sites in a protein. Nature 312, 127-134.
Tainer, J. A., Getzoff, E. D., Paterson, A., Olson, A. J., and Lerner, R. A. (1985). The atomic mobility component of protein antigenicity. Annu. Rev. Immunol. 3, 423-438.
Takeda, S. I., Naito, T., Hama, K., Noma, T., and Honjo, T. (1985). Construction of chimaeric processed immunoglobulin genes containing mouse variable and human constant region sequences. Nature 314, 452-454.
Takkinen, K., Laukkanen, M. L., Sizmann, D., Alfthan, K., Immonen, T., Vanne, L., Kaartinen, M., Knowles, J. K. C., and Teeri, T. T. (1991). An active single-chain antibody containing a cellulase linker domain is secreted by Escherichia coli. Protein Eng. 4, 837-841.
Tanaka, S., and Scheraga, H. A. (1975). Model of protein folding: Inclusion of short-, medium- and long-range interactions. Proc. Natl. Acad. Sci. U.S.A. 72, 3802-3806.
Thornton, J. M., Edwards, M. S., Taylor, W. R., and Barlow, D. J. (1986). Location of “continuous” antigenic determinants in the protruding regions of proteins. EMBO J. 5, 409-413.
Tidor, B., and Karplus, M. (1994). The contribution of vibrational entropy to molecular association: The dimerization of insulin. J. Mol. Biol. 238, 405-414.
Titani, K., Whitley, R., Avogadro, L., and Putnam, F. M. (1965). Immunoglobulin structure: Partial amino acid sequence of a Bence-Jones protein. Science 149, 1090-1092.
Totrov, M., and Abagyan, R. (1994). Detailed ab initio prediction of lysozyme-antibody complex with 1.6 Å accuracy. Nat. Struct. Biol. 1, 259-263.
Trail, P. A., Willner, D., Lasch, S. J., Henderson, A. J., Hofstead, S., Cassaza, A. M., Firestone, R. A., Hellstrom, I., and Hellstrom, K. E. (1993). Cure of xenografted human carcinomas by BR96-doxorubicin immunoconjugates. Science 261, 212-215.
Tramontano, A., and Lesk, A. M. (1992). Common features of the conformations of antigen-binding loops in immunoglobulins and application to modeling loop conformations. Proteins 13, 231-245.
Tramontano, A., Chothia, C., and Lesk, A. M. (1989). Structural determinants of the conformations of medium-sized loops in proteins. Proteins 6, 382-394.
Tramontano, A., Chothia, C., and Lesk, A. M. (1990). Framework residue 71 is a major determinant of the position and conformation of the second hypervariable region in the VH domains of immunoglobulins. J. Mol. Biol. 215, 175-182.
Tulip, W. R., Harley, V. R., Webster, R. G., and Novotny, J. (1994). N9 neuraminidase complexes with antibodies NC41 and NC10: Empirical free energy calculations capture specificity trends observed with mutant binding data. Biochemistry 33, 7986-7997.
Tunon, I., Silla, E., and Pascual-Ahuir, J. L. (1992). Molecular surface area and hydrophobic effect. Protein Eng. 5, 715-716.
Vajda, S., Weng, Z., Rosenfeld, R., and DeLisi, C. (1994). Effect of conformational flexibility and solvation on receptor-ligand binding free energies. Biochemistry 33, 13977-13988.
Valentine, R. C., and Green, N. M. (1967). Electron microscopy of antibody-antigen complexes. J. Mol. Biol. 27, 615-617.
van Gunsteren, W. F., and Berendsen, H. J. C. (1987). “GROMOS, Groningen Molecular Simulation Computer Program Package.” University of Groningen, Groningen, Netherlands.
Verhoeyen, M., Milstein, C., and Winter, G. (1988). Reshaping human antibodies: Grafting an antilysozyme activity. Science 239, 1534-1536.
Wain-Hobson, S., Dower, S. K., Gettins, P., Givol, D., McLaughlin, A. C., Pecht, I., Sunderland, C. A., and Dwek, R. A. (1977). Specificity of interactions of hapten side chains with the combining site of the myeloma protein MOPC 315. Biochem. J. 165, 227-235.
Walls, P. H., and Sternberg, M. J. E. (1992). New algorithm to model protein-protein recognition based on surface complementarity: Applications to antibody-antigen docking. J. Mol. Biol. 228, 277-297.
Ward, E. S. (1992). Secretion of T-cell receptor fragments from recombinant Escherichia coli cells. J. Mol. Biol. 224, 885-890.
Warme, H. K., and Scheraga, H. A. (1974). Refinement of X-ray structure of lysozyme by complete energy minimization. Biochemistry 13, 757-767.
Webber, K. O., Reiter, Y., Brinkmann, U., Kreitman, R., and Pastan, I. (1995). Preparation and characterization of a disulfide-stabilized Fv fragment of the anti-Tac antibody: Comparison with its single-chain analog. Mol. Immunol. 32, 249-258.
Weber, G. (1975). Energetics of ligand binding in proteins. Adv. Protein Chem. 29, 2-84.

Webster, R. G., Air, G. M., Metzger, D. W., Colman, P. M., Varghese, J. N., Baker, A. T., and Laver, W. G. (1987). Antigenic structure and variation in an influenza virus N9 neuraminidase. J. Virol. 61, 2910-2916.
Webster, D. M., Henry, A. H., and Rees, A. R. (1994). Antibody-antigen interactions. Curr. Opin. Struct. Biol. 4, 123-129.
Weiner, S. J., Kollman, P. A., Nguyen, D. T., and Case, D. A. (1986). An all atom force field for simulations of proteins and nucleic acids. J. Comput. Chem. 7, 230-252.
Westhof, E., Altschuh, D., Moras, D., Bloomer, A. C., Mondragon, A., Klug, A., and van Regenmortel, M. H. V. (1984). Correlation between segmental mobility and the location of antigenic determinants in proteins. Nature 311, 123-126.
Weyl, H. (1928). “Gruppentheorie und Quantenmechanik,” English ed., “Theory of Groups and Quantum Mechanics.” Dover, New York, 1950.
White, F. H. (1961). Regeneration of native secondary and tertiary structure by air oxidation of reduced ribonuclease. J. Biol. Chem. 236, 1353-1360.
Whitlow, M., Bell, B. A., Feng, S. L., Filpula, D., Hardman, K. D., Hubert, S. L., Rollence, M. L., Wood, J. F., Schott, M. E., Milenic, D., Yokota, T., and Schlom, J. (1993). An improved linker for single-chain Fv with reduced aggregation and enhanced proteolytic stability. Protein Eng. 6, 989-995.
Whitlow, M., Filpula, D., Rollence, M. L., Feng, S. L., and Wood, J. F. (1994). Multivalent Fvs: Characterization of single-chain Fv oligomers and preparation of a bispecific Fv. Protein Eng. 7, 1017-1026.
Willan, K. J., Marsh, D., Sunderland, C. A., Sutton, B. J., Wain-Hobson, S., Dwek, R. A., and Givol, D. Comparison of dimensions of the combining sites of the dinitrophenyl-binding immunoglobulin A myeloma proteins MOPC 315, MOPC 460 and XRPC 25 by spin-label mapping. Biochem. J. 165, 199-206.
Williams, A. F., and Barclay, A. N. (1988). The immunoglobulin superfamily-Domains for cell surface recognition. Ann. Rev. Immunol. 6, 381-405.
Williams, A. F., Barclay, N., Clark, S. J., Paterson, D. J., and Willis, A. C. (1987). Similarities in sequences and cellular expression between rat CD2 and CD4 antigens. J. Exptl. Med. 165, 368-380.
Williams, D. H., Cox, J. P. L., Doig, A. J., Gardner, M., Gerhard, U., Perry, T., Kaye, P. T., Lal, A. R., Nicholls, I. A., Salter, C. J., and Mitchell, R. C. (1991). Toward the semiquantitative estimation of binding constants. Guides for peptide-peptide binding in aqueous solution. J. Am. Chem. Soc. 113, 7020-7030.
Williams, M. A., Goodfellow, J. M., and Thornton, J. M. (1994). Buried waters and internal cavities in monomeric proteins. Protein Sci. 3, 1224-1235.
Wilson, C., Mace, J. E., and Agard, D. A. (1991). Computational method for the design of enzymes with altered substrate specificity. J. Mol. Biol. 220, 495-506.
Winter, G., and Milstein, C. (1991). Man-made antibodies. Nature 349, 293-299.
Withka, J. M., Wyss, D. F., Wagner, G., Arulanandam, A. R. N., Reinherz, E. L., and Recny, M. A. (1993). Structure of the glycosylated adhesion domain of human T lymphocyte glycoprotein CD2. Structure 1, 69-81.
Wood, C. R., Boss, M. A., Kenten, J. H., Calvert, J. E., Roberts, N. A., and Emtage, J. S. (1985). The synthesis and in vivo assembly of functional antibodies in yeast. Nature 314, 446-449.
Woof, J. M., Partridge, L. J., Jefferis, R., and Burton, D. R. (1986). Localization of the monocyte-binding region on human immunoglobulin G. Mol. Immunol. 23, 319-330.
Wu, S., and Cygler, M. (1993). Conformation of complementarity determining region L1 loop in murine IgG lambda light chain extends the repertoire of canonical forms. J. Mol. Biol. 229, 597-601.

Wu, T. T., and Kabat, E. A. (1970). An analysis of the sequences of the variable regions of Bence-Jones proteins and myeloma light chains and their implications for antibody complementarity. J. Exptl. Med. 132, 211-250.
Wulfing, C., and Pluckthun, A. (1994). Correctly folded T-cell receptor fragments in the periplasm of Escherichia coli. J. Mol. Biol. 242, 655-669.
Yanagi, Y., Yoshikai, Y., Leggett, K., Clark, S. P., Aleksander, I., and Mak, T. W. (1984). Nature 308, 145-149.
Yun-yu, S., Mark, A. E., Cun-xin, W., Fuhua, H., Berendsen, H. J. C., and van Gunsteren, W. F. (1993). Can stability of protein mutants be predicted by free energy calculations? Protein Eng. 6, 289-295.
Ysern, X., Fields, B. A., Bhat, T. N., Goldbaum, F. A., Dall’Acqua, W., Schwarz, F. P., Poljak, R. J., and Mariuzza, R. A. (1994). Solvent rearrangement in an antigen-antibody interface introduced by site-directed mutagenesis of the antibody combining site. J. Mol. Biol. 238, 496-500.
Zauhar, R. J., and Morgan, R. S. (1985). A new method for computing the macromolecular electrostatic potential. J. Mol. Biol. 186, 815-820.
Zdanov, A., Li, Y., Bundle, D. R., Deng, S. J., MacKenzie, R., Narang, S. A., Young, N. M., and Cygler, M. (1994). Structure of a single-chain antibody variable domain (Fv) fragment complexed with a carbohydrate antigen at 1.7 Å resolution. Proc. Natl. Acad. Sci. U.S.A. 91, 6423-6427.
Zhao, D., and Jardetzky, O. (1994). An assessment of the precision and accuracy of protein structures determined by NMR. J. Mol. Biol. 239, 601-607.
Zilber, B., Scherf, T., Levitt, M., and Anglister, J. (1990). NMR-derived model for a peptide-antibody complex. Biochemistry 29, 10032-10041.
Zinkernagel, R. M., and Doherty, P. C. (1974). Immunological surveillance against altered self components by sensitized T lymphocytes in lymphocytic choriomeningitis. Nature 251, 547-548.