Journal Pre-proof

Engineering Stability, Viscosity, and Immunogenicity of Antibodies by Computational Design

Daisuke Kuroda, Kouhei Tsumoto

PII: S0022-3549(20)30016-2
DOI: https://doi.org/10.1016/j.xphs.2020.01.011
Reference: XPHS 1848
To appear in: Journal of Pharmaceutical Sciences
Received Date: 30 September 2019
Revised Date: 25 December 2019
Accepted Date: 10 January 2020
Please cite this article as: Kuroda D, Tsumoto K, Engineering Stability, Viscosity, and Immunogenicity of Antibodies by Computational Design, Journal of Pharmaceutical Sciences (2020), doi: https://doi.org/10.1016/j.xphs.2020.01.011.

© 2020 Published by Elsevier Inc. on behalf of the American Pharmacists Association.
Review

Engineering Stability, Viscosity, and Immunogenicity of Antibodies by Computational Design

DAISUKE KURODA,1,2 KOUHEI TSUMOTO1,2,3

1 Medical Device Development and Regulation Research Center, School of Engineering, The University of Tokyo, Tokyo 108-8639, Japan
2 Department of Bioengineering, School of Engineering, The University of Tokyo, Tokyo 108-8639, Japan
3 Laboratory of Medical Proteomics, Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan
CORRESPONDENCE SHOULD BE ADDRESSED: KT ([email protected]) and DK ([email protected])

Running title: Computer-aided antibody design
ABSTRACT:

In recent years, computational methods have garnered much attention in protein engineering. A large number of computational methods have been developed to analyze the sequences and structures of proteins and have been used to predict their various properties. Antibodies are one of the emergent protein therapeutics, and thus methods to control their physicochemical properties are highly desirable. However, despite the tremendous efforts of past decades, computational methods to predict the physicochemical properties of antibodies are still in their infancy. Experimental validations are certainly required for real-world applications, and the results should be interpreted with caution. Among the various properties of antibodies, we focus in this review on stability, viscosity, and immunogenicity, and we present the current status of computational methods to engineer such properties.
Keywords: Antibody engineering; Computer-aided design; Molecular simulations; Machine learning; Conformational stability; Colloidal stability; Viscosity; Immunogenicity
INTRODUCTION
In recent years, computational methods have become essential tools for antibody engineering as well as for drug discovery, helping with tasks such as screening candidate molecules, evaluating drug likeness, and optimizing physicochemical and pharmacokinetic properties. In this context, a great many computational methods have been developed to analyze quantitative structure-activity relationships of target molecules. In antibody engineering particularly, the growing number of antibody crystal structures has enabled us to analyze sequence-structure relationships, leading to the development of methods for high-resolution antibody modeling.1 Computational methods have also been applied to predict various properties of antibodies from either their sequences or their structures. However, despite the tremendous efforts of past decades, those methods are still in their infancy. Experimental validations are certainly required for real-world applications, and the results should be interpreted with caution.
Antibodies are important molecules in various fields. Depending on their purposes, there are several properties of antibodies that need to be engineered (Figure 1). Perhaps the most important physicochemical property is binding affinity, which is a quantitative metric of the natural function of antibodies. Antibodies bind to foreign molecules or antigens through six complementarity-determining regions (CDRs). Although five of the CDRs (L1, L2, L3, H1, and H2) assume limited conformations,2–5 the high variability of CDR-H3 in terms of both sequence and structure6–9 enables it to recognize a virtually unlimited number of antigens.
Because antibodies function through the six CDRs, affinity maturation by engineering has focused on changing the amino acid sequences of the CDRs.10 The trade-off between binding affinity and other properties is known to be an issue of antibody engineering.11 Improving other properties such as stability, viscosity, and immunogenicity can therefore be realized by changing the amino acids of the non-CDR parts of antibodies. Careful inspection of mutational sites is often necessary to generate better antibodies. To overcome such tedious processes, one of the most popular methods for improving antibody properties is random mutagenesis based on in vitro libraries. It is possible to engineer not only binding affinities but also other properties through such an in vitro, library-based approach by elevating temperature and controlling solution conditions during the selection process. However, because of recent advances in computational power and algorithms, computational design is becoming an alternative method in antibody engineering.12–17 One of the advantages of computational methods is that they offer a rational approach when combined with a structure. Antibodies and their structures are governed by physical laws, and based on physical principles, we should be able to predict the behavior of antibodies in solution and in our body. However, such predictions are not entirely satisfactory because the accuracy of computational algorithms is not as good as that of library-based methods, owing to our incomplete understanding of the biophysical principles of biomolecules and the difficulty of defining conformational dynamics in silico.
In this review, we present the current status of computational design of physicochemical properties of antibodies, which has not been covered in previous reviews of computer-aided antibody design.10,12 Among various properties, we focus on the stability, viscosity, and immunogenicity of antibodies, all of which have garnered much attention in computer-aided antibody design (Figure 1).
PHYSICOCHEMICAL AND BIOLOGICAL PROPERTIES OF ANTIBODIES

Before we present various examples of the application of computer-aided antibody design, we briefly describe experimental metrics of the physicochemical and biological properties of antibodies.
Stability
There are two types of protein stability of concern in antibody drug discovery: physical stability and chemical stability. Generally, protein physical stability can be classified further into conformational and colloidal stabilities. Proteins are only marginally stable, and proteins in solution are in dynamic equilibrium between folded and unfolded conformations (Figure 2). The conformational stability of proteins is defined as the free energy difference (∆G) between the folded and the unfolded states, i.e., ∆G = G_unfolded – G_folded. To stabilize a protein, one must therefore either stabilize the folded state or destabilize the unfolded state of the protein, thereby shifting the folded–unfolded balance toward folded (Figure 2).18
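As a minimal illustration of how ∆G relates to the folded–unfolded balance, the sketch below assumes an idealized two-state equilibrium, which is a simplification for multidomain antibodies. With ∆G = G_unfolded – G_folded, the unfolding equilibrium constant is exp(–∆G/RT), so a larger ∆G gives a larger folded fraction. The function name and the default temperature are illustrative choices, not taken from the cited work.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def fraction_folded(delta_g_kcal, temperature_k=298.15):
    """Folded fraction in a two-state model.

    delta_g_kcal is G_unfolded - G_folded in kcal/mol, so positive
    values mean the folded state is favored.
    """
    # Equilibrium constant for unfolding: [unfolded] / [folded]
    k_unfold = math.exp(-delta_g_kcal / (R_KCAL * temperature_k))
    return 1.0 / (1.0 + k_unfold)

if __name__ == "__main__":
    for dg in (0.0, 1.0, 5.0):
        print(f"dG = {dg:4.1f} kcal/mol -> folded fraction {fraction_folded(dg):.4f}")
```

In this model, ∆G = 0 corresponds to equal folded and unfolded populations, and a protein with a ∆G of only a few kcal/mol is already almost entirely folded, which is consistent with proteins being only marginally stable.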
The most straightforward approach to stabilizing folded-state proteins when starting from the crystal structure is to strengthen the interactions between amino acids in the folded state. On the other hand, destabilizing an unfolded-state protein is not trivial because its structure is not visible, and it is unlikely to be a single state, but rather an ensemble of many invisible states.18,19
Experimentally, in addition to the free energy differences between folded and unfolded proteins, the conformational stability is often assessed from the melting temperature (Tm), which is the temperature at which 50% of proteins are folded and 50% are unfolded. Melting temperatures can be measured by differential scanning calorimetry (DSC), differential scanning fluorometry (DSF), a thermal shift assay, and measurement of circular dichroism (CD).
Furthermore, through the mutual attraction of hydrophobic patches exposed in a partially or fully unfolded state, proteins in a folded–unfolded equilibrium may assemble into another oligomeric state, which is often called a protein aggregate (Figure 2); in stark contrast to the folding–unfolding events, this process of aggregation has been thought to be irreversible.20 Such short hydrophobic segments of protein sequences or patches on protein surfaces, termed aggregation-prone regions (APRs), have been considered to govern the aggregation propensities of proteins,21 and single point mutations in such regions can dramatically impact protein aggregation rates. Aggregation of protein therapeutics often hampers the development process, and preventing aggregation has been a long-standing challenge in drug discovery because doing so could lead to higher yield, suppression of unwanted immunogenic responses in patients, and maintenance of binding affinity toward antigens.20 Protein aggregation can also be observed when genes are expressed as recombinant proteins. These aggregates are called insoluble inclusion bodies, which hamper further experiments. Insoluble aggregates can be solubilized and refolded into an active conformation by adding small-molecule additives, such as arginine.22,23 The resistance of proteins to such aggregation is referred to as colloidal stability.
Colloidal stability can be assessed based on binding assays or the sizes of particles in solution after long-term storage or chemical/heat exposure. It can be measured by several experimental assays that quantify particle size distributions, such as size exclusion chromatography (SEC) and dynamic light scattering (DLS).
In addition to physical stability, another challenge in stability engineering is improvement of chemical stability. Antibodies can be degraded by chemical modification of amino acids, such as Asn deamidation, Asp isomerization, Met oxidation, and Lys glycation, in the formulation and manufacturing processes.24 In principle, these chemical degradations exhibit preferences for certain amino acids, and the degradations can be predicted, to some extent, based on amino acid sequences. For example, the common sequence motifs for Asn deamidation are NG, NS, NN, NT, and NH, whereas those for Asp isomerization are DG, DS, DD, DT, and DH.25 However, applying only sequence information could lead to overestimation of potential degradation sites. Although antibodies have many residues that could be chemically modified, many of them are buried inside structures, where chemical reactions would not proceed (Figure 3). In fact, high solvent exposure of a residue is correlated with a high propensity for chemical degradation. Therefore, structural information is often desirable to assess such degradation events. Forced degradation studies have been widely employed during protein therapeutics development,26 and the best experimental technique to characterize such chemical degradation is analysis based on liquid chromatography tandem mass spectrometry (LC-MS/MS).
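The sequence-only screen described above can be sketched in a few lines. The example below simply scans a sequence for the dipeptide motifs listed in the text; the function name and the example sequence are hypothetical, and, as the text cautions, such a scan is expected to overestimate liabilities because it ignores solvent exposure.

```python
# Motifs as listed in the text for the two degradation routes
DEAMIDATION_MOTIFS = ("NG", "NS", "NN", "NT", "NH")
ISOMERIZATION_MOTIFS = ("DG", "DS", "DD", "DT", "DH")

def find_liability_motifs(sequence):
    """Return (1-based position, motif, liability) for every motif hit."""
    seq = sequence.upper()
    hits = []
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in DEAMIDATION_MOTIFS:
            hits.append((i + 1, pair, "Asn deamidation"))
        elif pair in ISOMERIZATION_MOTIFS:
            hits.append((i + 1, pair, "Asp isomerization"))
    return hits

if __name__ == "__main__":
    # Hypothetical fragment, not a real antibody sequence
    for position, motif, liability in find_liability_motifs("ANGDDK"):
        print(position, motif, liability)
```

A structure-aware refinement would keep only those hits whose Asn or Asp residue is solvent exposed, which requires a structure or model in addition to the sequence.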
Viscosity

The viscosity of antibodies is important because of its practical implications with respect to formulation and administration. It has been suggested that the behavior of antibodies in solution can vary as a function of the concentration used during the formulation process. A commonly encountered challenge in formulation is thus the high viscosity of concentrated antibody solutions, which makes the bioprocessing time longer, the formulated antibodies unstable, and the processing cost higher. For administration to patients, the concentration of an antibody in solution needs to be high (>100 mg/mL), but the viscosity of the solution should be low so that high doses can be delivered through a small volume (1.0–1.5 mL) into the subcutaneous space.27
The concentration-dependent viscosity of antibodies depends on pairwise interactions or self-association, which further leads to higher-order intermolecular interactions (Figure 2). Experimentally measurable parameters related to the pairwise intermolecular interactions include the osmotic second virial coefficient (B22) and the diffusion interaction parameter (kD), which can be obtained by static light scattering (SLS) and DLS measurements, respectively. These experimental parameters have been relied on as target parameters to be computationally or theoretically predicted.28–30 In addition to B22 and kD, the viscosity of a protein solution has also been experimentally evaluated based on parameters such as the solution viscosity (η) measured with rheometers, diffusion coefficients (D) from DLS profiles, and the retention time (RT) in a chromatographic column determined via hydrophobic interaction chromatography (HIC), standup monolayer adsorption chromatography (SMAC), or cross-interaction chromatography (CIC).
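To make the role of kD concrete, the sketch below assumes the commonly used dilute-solution relation D = D0 (1 + kD c), in which D0 is the diffusion coefficient extrapolated to zero concentration, and extracts kD by an ordinary least-squares line fit. The function name, units, and data values are illustrative assumptions and do not come from the cited studies.

```python
def fit_kd(concentrations, diffusion_coeffs):
    """Fit D = D0 * (1 + kD * c) by linear least squares.

    Returns (D0, kD). The units of kD are the inverse of the
    concentration units supplied by the caller.
    """
    n = len(concentrations)
    if n != len(diffusion_coeffs) or n < 2:
        raise ValueError("need at least two (c, D) pairs of equal length")
    mean_c = sum(concentrations) / n
    mean_d = sum(diffusion_coeffs) / n
    sxx = sum((c - mean_c) ** 2 for c in concentrations)
    sxy = sum((c - mean_c) * (d - mean_d)
              for c, d in zip(concentrations, diffusion_coeffs))
    slope = sxy / sxx                 # equals D0 * kD
    d0 = mean_d - slope * mean_c      # intercept at c = 0
    return d0, slope / d0

if __name__ == "__main__":
    # Synthetic data generated with D0 = 4.0 and kD = -0.02 (arbitrary units)
    conc = [1.0, 2.0, 5.0, 10.0]
    diff = [3.92, 3.84, 3.60, 3.20]
    print(fit_kd(conc, diff))
```

A negative fitted kD is usually read as a sign of net attractive protein–protein interactions, although that interpretation depends on the solution conditions.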
Immunogenicity

The immunogenicity of therapeutic proteins refers to the immune response of patients against the proteins. The immunological discrimination of self and non-self governs the mechanism of the immune response. In patients, drugs that are recognized as non-self may initiate an immune response, which is often characterized experimentally by the detection of anti-drug antibodies. The limited efficacy and negative impact on safety caused by the development of such anti-drug antibodies hamper the clinical utility of therapeutic antibodies.
Direct assessment of immunogenicity in preclinical studies requires animal testing, which is highly time consuming and costly. Therefore, during the development stage, immunogenicity is often assessed by the sequence similarity between the amino acid sequences of target therapeutics and those of human antibodies. There are several traditional formats with increased human content, such as chimeric and humanized antibodies, and indeed correlations have been reported between the fraction of human content and immunogenicity, which has been quantified in terms of the number of patients exhibiting anti-antibody responses.31
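One simple way to put a number on the human content of a sequence is its percent identity to the closest human reference sequence. The sketch below assumes that the query and the references have already been aligned to equal length with "-" as the gap character; the function names and the short sequences are placeholders for illustration, not real germline data.

```python
def percent_identity(aligned_query, aligned_reference):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_query, aligned_reference):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

def closest_human_reference(aligned_query, aligned_references):
    """Return (name, identity) of the most similar reference sequence."""
    scored = ((name, percent_identity(aligned_query, ref))
              for name, ref in aligned_references.items())
    return max(scored, key=lambda item: item[1])

if __name__ == "__main__":
    # Placeholder fragments, not real germline sequences
    references = {"ref_A": "EVQLV", "ref_B": "QVQLQ"}
    print(closest_human_reference("QVQLV", references))
```

Such an identity score is only a proxy: it measures similarity to human sequences, not the immune response itself.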
PREDICTION AND ENGINEERING OF PHYSICOCHEMICAL PROPERTIES OF ANTIBODIES

In response to antigen invasions, antibodies can evolve in our body to reshape their sequences and structures so that they can bind to antigens with higher specificity and affinity. The natural process of reshaping the antigen-binding site involves chemical and structural changes that may enhance binding affinity of the antibody at the cost of thermodynamic stability or other properties. The processes of somatic maturation are thus sensitive to many unknown factors and are tightly controlled with a delicate balance by the immune system.
These physicochemical properties of antibodies could be engineered in a manner similar to in vivo evolution through mutations in vitro or in silico. Library-based in vitro approaches are perhaps still the most accurate method at this moment.32 As described below, however, recent advances in computational power and algorithms have suggested that computational approaches could be alternative methods to engineer antibodies at lower cost and higher speed.
Overview of computational prediction and engineering

Numerous computational methods to predict physicochemical and biological properties such as protein stability, viscosity, and immunogenicity have been developed to facilitate predictive protein engineering. In many cases, the input is a sequence or a structure (Figure 4). Computational models can then predict, in either qualitative or quantitative terms, whether the input is stable or unstable, has low or high viscosity, and is immunogenic or nonimmunogenic. In general, these prediction models can be classified into two categories: 1) statistical predictions and 2) physics-based predictions. Statistical methods rely on statistical information derived from experimental data, and the accuracy of the predictions depends heavily on the amount and quality of the data used to train the prediction methods. As described below, several large-scale experimental datasets have become publicly available, and one can use them to train new prediction models. On the other hand, physics-based methods predict properties based on physical laws, and hence the methods do not require any prior experimental data to perform predictions. These statistical and physics-based methods are not mutually exclusive, and many of the prediction algorithms take advantage of prior knowledge of experimental data as well as physical laws to varying extents.
In protein engineering, however, one has to optimize a wild-type sequence so that the properties of the protein can improve. Technically, one can generate a pool of random sequences, feed them into a prediction algorithm, and then obtain a potentially improved sequence. However, these processes are cumbersome, and it is not trivial to cover a whole sequence space; the Fv region of antibodies usually consists of 200 amino acids or even more, and hence there are 20^200 (≈ 10^260) sequences to be considered. More practical methods have therefore been developed to design amino acid sequences. These methods, which couple prediction of the properties of a protein with sequence sampling, are called computational design calculations (Figure 4).33,34
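The size of this search space can be checked with simple arithmetic. The sketch below computes log10 of the number of possible sequences, which for 20 amino acids and 200 positions is about 260, and contrasts it with the far smaller number of single and double point mutants of one wild-type sequence. The function names are illustrative.

```python
import math

def log10_sequence_space(length, alphabet_size=20):
    """Return log10 of the number of sequences of the given length."""
    return length * math.log10(alphabet_size)

def count_point_mutants(length, n_mutations, alphabet_size=20):
    """Count variants carrying exactly n_mutations substitutions."""
    positions = math.comb(length, n_mutations)           # which sites mutate
    substitutions = (alphabet_size - 1) ** n_mutations   # what they become
    return positions * substitutions

if __name__ == "__main__":
    print(f"log10(20^200) = {log10_sequence_space(200):.1f}")
    print("single mutants:", count_point_mutants(200, 1))
    print("double mutants:", count_point_mutants(200, 2))
```

The contrast between the full space and the few thousand single mutants is one reason why design calculations sample sequences selectively instead of enumerating them.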
Prediction of conformational stability

There are a large number of methods for predicting changes in conformational stability upon mutation (i.e., ∆∆G = ∆G_Mut – ∆G_WT). The accuracy of each method is often evaluated based on the correlation coefficient (r) between predicted and experimental ∆∆G values.
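The evaluation described above can be written out explicitly. The sketch below computes ∆∆G from wild-type and mutant ∆G values and then the Pearson correlation coefficient between predicted and experimental ∆∆G values. The numbers in the usage example are made up for illustration. Sign conventions for ∆∆G differ between tools, so the convention should be checked before values from different sources are compared.

```python
import math

def ddg(dg_mutant, dg_wild_type):
    """Stability change upon mutation: ddG = dG_Mut - dG_WT."""
    return dg_mutant - dg_wild_type

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

if __name__ == "__main__":
    # Made-up values in kcal/mol, for illustration only
    experimental = [-1.2, 0.3, -2.5, 0.8, -0.4]
    predicted = [-0.9, 0.1, -1.8, 1.1, 0.2]
    print(f"r = {pearson_r(predicted, experimental):.2f}")
```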
Machine learning has been used to predict ∆∆G due to mutations from protein sequences and structures. One such recent method is DeepDDG,35 which relies on a deep neural network trained on 5444 experimental data points. Benchmarking the method against eight other methods has demonstrated that DeepDDG performs the best; correlation coefficients (r) between experimental and predicted ∆∆G were 0.66, 0.62, 0.60, 0.59, 0.57, 0.54, 0.52, 0.31, and 0.18 for DeepDDG, PopMusic,36 SDM,37 EASE-MM,38 mCSM,39 I-Mutant3.0,40 STRUM,41 MUpro1.1,42 and SCooP,43 respectively. Sequence and structural features exploited in DeepDDG are listed in Table I. Among them, the solvent accessible surface area of the mutated residue contributed most to the prediction, suggesting the importance of residue packing for protein stability.35
In addition to machine learning, structure-based simulations can also be used to predict the ∆∆G associated with a mutation. For instance, Gapsys et al. have employed a free energy perturbation method44 to propose a consensus force field approach, where ∆∆G values calculated from molecular dynamics (MD) simulations with six different force fields (Amber99sb, Amber99sb*ILDN, OPLS, Charmm22*, Charmm36, and Charmm36H) were averaged to minimize the force field bias;45 benchmarking the consensus method with 119 mutations of the barnase protein led to a correlation coefficient (r) between experimental and computed ∆∆Gs of 0.74. Furthermore, Steinbrecher et al. have demonstrated that FEP+, which is a free energy perturbation method based on MD simulations with a single force field,46 can predict ∆∆G values of single point mutations;47 with 712 mutations of 10 different proteins, the correlation coefficient (r) between experimental and computed ∆∆G values was ~0.74.
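The averaging step of a consensus approach can be sketched as follows. This is only the final bookkeeping, under the assumption of an unweighted mean over force fields; it does not reproduce the free energy calculations themselves, and the force field labels, mutation names, and values are placeholders.

```python
def consensus_ddg(per_force_field_ddg):
    """Average ddG predictions across force fields for each mutation.

    per_force_field_ddg maps a force field label to a dict of
    {mutation: ddG}. Only mutations present for every force field
    are averaged.
    """
    if not per_force_field_ddg:
        return {}
    tables = list(per_force_field_ddg.values())
    shared = set(tables[0])
    for table in tables[1:]:
        shared &= set(table)
    return {mutation: sum(table[mutation] for table in tables) / len(tables)
            for mutation in sorted(shared)}

if __name__ == "__main__":
    # Placeholder labels and values (kcal/mol), for illustration only
    predictions = {
        "force_field_A": {"A10G": 1.0, "L20A": 2.0},
        "force_field_B": {"A10G": 3.0, "L20A": 4.0},
    }
    print(consensus_ddg(predictions))
```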
In contrast, based on a set of mutations for which experimental data were reported at least twice, Potapov et al. have reported that the correlation coefficient (r) of ∆∆G between independent experiments was 0.86.48 This result is consistent with a recent theoretical estimate of a natural upper bound of the accuracy of ∆∆G predictions.49 These results suggest that there may still be room for improvement of computational ∆∆G predictions.
The melting temperature (Tm) is a direct indication of the conformational stability of a protein. Thus, Tm is more commonly used than ∆∆G to experimentally represent conformational stability in the literature. Bekker et al. have proposed a computational strategy to assess the conformational stability of single-domain antibodies and have demonstrated that the fraction of native contacts (Q-value)50 computed from high-temperature (400 K) MD simulations was correlated with the Tm of single-domain antibodies;51 they employed seven single-domain antibodies that exhibited a range of Tm from 47 °C to 85 °C, and they observed a reasonable correlation (r = 0.79) between the Q-values and Tm values reported in the literature. When the calculations were based on Q-values of hydrophilic residues, the correlation coefficient (r) became 0.84. The implication is that favorable interactions of hydrophilic residues lead to stabilization of single-domain antibodies. Based on these observations, the authors proposed a few mutations that were predicted to enhance the conformational stability of a single-domain antibody. Recently, Zabetakis et al. have experimentally tested the mutations proposed by Bekker et al. and have demonstrated that the mutations indeed improved conformational stability of the antibody;52 however, it also turned out that those stability-enhancing mutations in turn led to reductions of the binding affinities to the antigen, suggesting the difficulty in simultaneously improving conformational stability and other properties.
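The idea behind a fraction of native contacts can be illustrated with a deliberately simplified version. The sketch below uses one coordinate per residue and a hard distance cutoff, whereas published Q-value definitions typically use heavy-atom contacts and a smooth switching function. The cutoff, the minimum sequence separation, and the toy coordinates are assumptions made for illustration.

```python
import math

def contact_pairs(coords, cutoff=8.0, min_separation=3):
    """Residue pairs (i, j) closer than cutoff and separated in sequence."""
    pairs = set()
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                pairs.add((i, j))
    return pairs

def q_value(native_coords, frame_coords, cutoff=8.0, min_separation=3):
    """Fraction of native contacts that are still formed in a frame."""
    native = contact_pairs(native_coords, cutoff, min_separation)
    if not native:
        return 0.0
    kept = sum(1 for i, j in native
               if math.dist(frame_coords[i], frame_coords[j]) <= cutoff)
    return kept / len(native)

if __name__ == "__main__":
    # Toy coordinates in angstroms, one point per residue
    native = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
    stretched = [(0, 0, 0), (5, 0, 0), (10, 0, 0), (15, 0, 0)]
    print(q_value(native, native), q_value(native, stretched))
```

Averaging such a value over the frames of a high-temperature trajectory gives a single number per antibody that can then be compared with measured Tm values.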
Prediction of colloidal stability and solubility

Aggregation of proteins has been an active area of research, especially in the context of the development of protein therapeutics. In principle, aggregation and solubility are distinct phenomena because aggregation is an irreversible process while solubility is typically considered a reversible process. In practice, however, methods to predict aggregation rates have also been exploited quite effectively to predict the solubility of proteins.53,54 Therefore, the terms “aggregation propensity” and “solubility” have sometimes been used interchangeably in computational method development.55 There have been several comprehensive reviews of computational studies of colloidal stability and solubility of therapeutic proteins.20,55–57
Currently, several computational methods are available to predict APRs and rates of aggregation. Those methods are based mainly on sequence composition and on propensities such as hydrophobicity, charge, and secondary structure propensity. β-Strands tend to aggregate more than α-helices.58 Thus, antibodies, which consist of multiple β-strands, are likely to have an intrinsic aggregation propensity. In fact, light chains of antibodies have been known to be a cause of amyloidosis.59 In this context, David et al. have developed an algorithm for predicting amyloidogenesis of antibody light chains based on a Bayesian classifier and a decision tree.60 Furthermore, using the same dataset of antibody light chains, Liaw et al. have developed an algorithm called AbAmyloid based on a random forest classifier with dipeptide composition information.61
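Dipeptide composition is a simple fixed-length sequence encoding. The sketch below shows one common way to compute it, as the frequency of each of the 400 ordered amino acid pairs among adjacent residues. It is a generic illustration and is not claimed to match the exact feature definition used in AbAmyloid; the function name is hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def dipeptide_composition(sequence):
    """Return {dipeptide: frequency} over all 400 ordered residue pairs."""
    seq = sequence.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # ignore pairs with non-standard letters
            counts[pair] += 1
            total += 1
    return {pair: (count / total if total else 0.0)
            for pair, count in counts.items()}

if __name__ == "__main__":
    features = dipeptide_composition("ACAC")
    print(len(features), features["AC"], features["CA"])
```

The resulting 400 values can be passed as a feature vector to any standard classifier.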
More general methods to predict APRs and aggregation rates include TANGO,62 PAGE,63 Waltz,64 PASTA,65 Zyggregator,66 and Aggrescan3D (A3D),67 just to name a few. For a more comprehensive list, we refer readers to previous review articles.20,55 Like ∆∆G predictions, the majority of the methods have been based on machine learning, wherein many of the features used in the model constructions have been identical to those used in the prediction of conformational stability, although there have been some notable exceptions.68–74 One such example is the SOLart protein solubility predictor,75 which exploited solubility-dependent distance potentials derived from crystal structures.76 To develop SOLart, a random forest model was trained on proteins that had been expressed with the cell-free expression system called PURE and whose solubilities had been experimentally measured.77 Features used in SOLart are listed in Table I. A benchmark test showed that the solubility values predicted by SOLart were correlated with experimental solubility scores (r = 0.65) better than those of the other methods used in the benchmark.75
To correlate protein features with protein solubility or aggregation propensity, Warwicker and coworkers analyzed protein surface features and found that the most important feature associated with solubility was the amount of positively charged residues on the surfaces; the more positive protein surfaces are, the less soluble the proteins are.78 The fact that this apparent correlation was not observed for negatively charged residues suggested that interactions between expressed proteins and nucleic acids might lead to insolubility. In another study, the same group also suggested that a feature that commonly accompanies proteins with high solubility and that occurs at relatively high expression and abundance levels is an increased ratio of lysine content to arginine content.79 Based on these observations, the authors employed a linear model of 35 features, including 20 amino acid compositions, seven compositions of charged and hydrophobic residues, and several other features. This model led to the Protein-Sol application, which can predict protein solubility from amino acid sequences.80 A recent experimental study has also demonstrated that the arginine/lysine ratio is an important determinant of colloidal stability of an antibody.81 In a recent study, the Protein-Sol application has been updated to incorporate structural information and thereby enable structure-based assessments of protein solubility by additional electrostatic potential calculations as well as the visualization of surface patches on protein structures.82
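Two of the sequence features mentioned above, amino acid composition and the balance between lysine and arginine, are straightforward to compute. The sketch below is a generic illustration and does not reproduce the Protein-Sol model or its weights; the function names are hypothetical, and the lysine fraction K/(K+R) is used here in place of a K/R ratio only to avoid division by zero when arginine is absent.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def amino_acid_composition(sequence):
    """Return the fraction of each of the 20 standard amino acids."""
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    n = len(residues)
    return {aa: (residues.count(aa) / n if n else 0.0) for aa in AMINO_ACIDS}

def lysine_fraction_of_basic(sequence):
    """Return K / (K + R), or None when neither residue is present."""
    seq = sequence.upper()
    n_lys = seq.count("K")
    n_arg = seq.count("R")
    total = n_lys + n_arg
    return n_lys / total if total else None

if __name__ == "__main__":
    example = "KKRA"  # toy sequence for illustration
    print(amino_acid_composition(example)["K"], lysine_fraction_of_basic(example))
```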
These prediction methods can extract insightful statistics from large sets of sequences or structures. They are usually fast enough to compute APRs in a high-throughput way, and thus they could potentially be exploited to analyze antibody sequences from an antibody library. For instance, TANGO62 and PAGE63 have been exploited to identify potential APRs of commercial antibodies.83 Such high-throughput assessments would not be possible via experiments, and prediction-based analyses could therefore provide valuable insights into relationships between protein sequences, structures, and aggregation propensity. However, the colloidal stability of proteins still involves unknown mechanisms of aggregation, and considering the current accuracies of prediction algorithms, experimental verification is still required to draw final conclusions.
A more rational way to characterize the colloidal stability of proteins is patch analysis of protein surfaces. The assumption is that exposed hydrophobic patches on protein surfaces would lead to self-oligomerization. Based on this idea, Trout and coworkers proposed a novel measure called spatial aggregation propensity (SAP), which quantifies the exposure of hydrophobic residues derived from a crystal structure or averaged over snapshots from MD simulations.84 Antibodies are expected to have better colloidal stability if point mutations are introduced into the predicted hydrophobic patches, such that the patches become more hydrophilic. To achieve rapid in silico screening of antibodies, the same group subsequently developed another metric called the Developability Index (DI), which is calculated from a combination of the SAP score and the net charge of target proteins.85 The rationale is that, in addition to the obvious importance of hydrophobicity on the protein surface, electrostatics is also a quite important factor for solution-phase reactions and for protein aggregation.
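The patch idea can be illustrated with a residue-level score in the spirit of SAP. The sketch below is a simplification and not the published atom-level definition: for each residue it sums, over all residues within a chosen radius, the product of relative solvent exposure and a hydrophobicity value. The radius, the one-point-per-residue representation, and all input values are assumptions supplied by the caller.

```python
import math

def sap_like_scores(coords, rel_exposure, hydrophobicity, radius=10.0):
    """Per-residue patch score: sum of exposure * hydrophobicity nearby.

    coords         one (x, y, z) point per residue, in angstroms
    rel_exposure   relative side-chain exposure per residue (0 to 1)
    hydrophobicity per-residue hydrophobicity (positive = hydrophobic)
    """
    if not (len(coords) == len(rel_exposure) == len(hydrophobicity)):
        raise ValueError("all inputs must have one entry per residue")
    scores = []
    for center in coords:
        total = 0.0
        for j, other in enumerate(coords):
            if math.dist(center, other) <= radius:
                total += rel_exposure[j] * hydrophobicity[j]
        scores.append(total)
    return scores

if __name__ == "__main__":
    # Toy inputs for three residues, for illustration only
    coords = [(0, 0, 0), (5, 0, 0), (30, 0, 0)]
    exposure = [1.0, 0.5, 1.0]
    hydro = [0.5, -0.2, 0.3]
    print(sap_like_scores(coords, exposure, hydro))
```

Residues with high positive scores mark candidate patches where a hydrophobic-to-hydrophilic substitution might be tested; for MD-based estimates, the same score would be averaged over trajectory snapshots.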
Another computational method to design aggregation-resistant proteins is the CamSol method developed by Vendruscolo and coworkers,54 which is an extension of the sequence-based aggregation predictor Zyggregator.66 The CamSol method first calculates a sequence-based, residue-wise solubility profile based on a score represented by a linear combination of physicochemical properties of amino acids. The score is smoothed over a window of seven residues, and the sequence-based profile is further modified based on structural information. Designable positions are then identified based on the structure-based profile, and all possible variants are screened with the sequence-based solubility score to identify the most soluble mutations. The authors benchmarked the accuracy of the CamSol method using 56 previously published protein variants (including 34 antibodies) to see if CamSol could classify the proteins as soluble or insoluble. Fifty-four of the 56 proteins were correctly classified with the CamSol method. In another study, the same group used saturation concentration analysis, DLS, and analytical SEC to demonstrate that the CamSol method could identify mutations that improved the solubility of a single-domain antibody.54
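The smoothing step can be sketched as a centered moving average over seven residues. The per-residue input scores would come from a linear combination of physicochemical properties; here they are simply supplied by the caller. The handling of the sequence ends, by averaging over the residues that exist, is an assumption of this sketch and may differ from the CamSol implementation.

```python
def smooth_profile(scores, window=7):
    """Centered moving average over an odd-sized window of residues."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)                 # truncate at the N-terminus
        hi = min(len(scores), i + half + 1)   # truncate at the C-terminus
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed

if __name__ == "__main__":
    # Toy per-residue scores, for illustration only
    raw = [0.0, 0.0, 0.0, 7.0, 0.0, 0.0, 0.0]
    print(smooth_profile(raw))
```

Smoothing spreads the contribution of a single residue over its neighbors, so that poorly soluble stretches, not isolated residues, stand out in the profile.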

Sankar et al. have proposed the use of AggScore as a method to predict and evaluate APRs
from protein structures.86 Based on an input structure, the method quantifies the energetic
contribution of each residue to respective hydrophobic and electrostatic surface patches. The
AggScore function has been parameterized based on a previously published dataset of mutants
made from engineered immunoglobulin-like domains.87 Use of the optimized function
produced a better correlation coefficient (r = 0.85) with the percentage of inclusion body
formation than use of Zyggregator66 and Aggrescan88 (r = 0.81 and 0.84, respectively).
AggScore can also discriminate between amyloidogenic and non-amyloidogenic hexapeptides;
it produced a better AUC value (0.81) than Zyggregator66 and WALTZ89 (AUC =
0.77 and 0.78, respectively) in an ROC curve analysis. In another benchmarking application,
the authors compared results to retention times measured from HIC, SMAC, and CIC assays of
137 antibodies in the clinical stage;90 AggScore produced better AUC values (0.75, 0.76, and
0.70 for the retention times of HIC, SMAC, and CIC, respectively) than Zyggregator66 (0.50,
0.58, and 0.54) and Aggrescan91 (0.54, 0.69, and 0.61). AggScore has been
implemented in the BioLuminate package of Schrödinger.
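
As a minimal illustration of how AUC values such as those quoted above can be obtained, the following Python sketch computes the area under an ROC curve from predicted scores and binary experimental labels, using the equivalent rank (Mann–Whitney) formulation. The function name and the toy data are hypothetical; they are not taken from the AggScore study, and the original authors may have used a different implementation.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    The AUC equals the probability that a randomly chosen positive
    example receives a higher score than a randomly chosen negative
    example; ties count as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical aggregation scores and labels (1 = aggregation-prone).
    toy_scores = [0.9, 0.2, 0.8, 0.1]
    toy_labels = [1, 1, 0, 0]
    print(roc_auc(toy_scores, toy_labels))
```

An AUC of 0.5 corresponds to random ranking, which puts the reported value of 0.50 for one of the compared tools into context.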

Based on previously published experimental results of the production yields of nanobodies
and the ∆∆G upon point mutations,92,93 Soler et al. employed a homology model of the
nanobodies to see whether conformational and colloidal stabilities correlated with production
yields; they found that the ∆∆G values obtained through experiments and the predicted scores
generated by FoldX, CamSol, and A3D were unable to predict production yields.94
Subsequently, the authors proposed a computational protocol to predict production yields; MD
simulations were first performed to identify regions that were affected by mutations based on
the differences of residue contact area maps in the SPACE suite95 between mutants and
reference crystal structures. Together with the exposed hydrophobic residues identified by
InterProSurf,96 those identified regions were assumed to be aggregation hotspots and therefore
potential binding sites in docking simulations by HADDOCK.97 Nanobodies that exhibited
high production yields consistently generated poor docking scores, whereas nanobodies that
exhibited lower production yields scored better in the docking simulations. The implication
was that the nanobodies with lower yields tended to form dimers more favorably via the
predicted aggregation hotspots. The better performance of the MD/docking-based protocol was
explained by the fact that most of the conventional methods, which often relied on a single
static structure, did not take into account long-distance effects of mutations on the whole
molecular structure.94 Among computational methods, MD simulations are distinct in that,
given static structures, they can predict time-course motions of proteins and evaluate not only
the local dynamics of proteins, but also their global dynamics, such as large-scale domain
motions and allosteric communications.98

The methods discussed above were trained and validated on a limited dataset from a
variety of experimental sources. For model construction, a larger dataset analyzed by fewer
facilities would be desirable to avoid bias and minimize the noise caused by different
institutions doing the experiments. With the goal of achieving high-throughput screening of
antibodies, Obrezanova et al. have measured aggregation data of 576 antibodies via the
Oligomer Detection Assay and SE-HPLC methods.99 They developed a qualitative prediction
algorithm that classified antibodies based on their aggregation propensity (low or high) using
antibody sequences as input. The prediction method was based on the Adaptive Boosting
algorithm for building ensembles of classification trees to bridge experimental data with
numerical parameters derived from principal component analysis of physicochemical
properties of amino acids, such as hydrophobicity, electrostatics, polarity, size, steric
hindrance, and hydrogen bond properties. Using 49 antibodies distinct from those used
for training and validation above, the authors benchmarked their method with the DI tool85 and
showed that their method was able to correctly classify 84% of antibodies that exhibited low or
high aggregation risks, whereas the DI tool correctly classified only 53% of the same
antibodies.

Prediction of chemical stability
The analysis via LC-MS/MS to identify chemical modifications of proteins is often
labor-intensive and time-consuming. Therefore, several computational methods have been
proposed to rapidly assess the chemical stability of therapeutic proteins.24,100–110 One of the
most common degradation events is the chemical modification of Asn and Asp residues,
which share a degradation pathway.24 Many of the methods to predict such degradation are
statistical methods, and experimental data to derive such prediction models are either
from in-house experiments100,102,106,107,109 or from literature.103,105,110 For example, to
understand origins of Asn deamidation and Asp isomerization, Sydow et al. employed mass
spectrometry to experimentally characterize 37 antibodies that were subjected to forced
degradation.100 These experimental data, together with homology modeling of the antibodies,
suggested that degradation hotspots could be characterized by their conformational flexibility,
the size of the C-terminal flanking residue, and secondary structures. In the same study,
several machine learning algorithms were trained based on the experimental results, and a
decision tree model was proposed as the best prediction method for Asn and Asp
degradations. In another study, Yan et al. used 10 antibodies under both normal and stressed
conditions, and experimentally characterized the Asn deamidation, leading to a decision tree
model to predict the Asn deamidation probability from antibody structures.106 More recently,
based on in-house LC-MS/MS experiments and literature information, Delmar et al.
employed machine learning to predict Asn deamidation probability and rate.107 The training
set consisted of 776 Asn residues from 67 antibodies. Based on the reasoning of chemical
reactions of the degradation pathway, a total of 12 features were considered to train a random
forest prediction model (Table I). Among all features, the C-terminal flanking residue and
pentapeptide deamidation half-life had the greatest impacts for the categorical prediction of
Asn deamidation. On the independent validation set that included only 68 Asn residues of
antibodies, the authors compared the prediction accuracy of their method with those of other
prediction algorithms; the proposed random forest model achieved 95.6% prediction accuracy,
whereas the methods by Yan et al.106 and Lorenzo et al.103 showed prediction accuracies
of 83.8% and 91.2%, respectively.

In contrast to statistical methods, physics-based methods are often low-throughput, but
can provide a rationale behind predictions. In this context, a study by Plotnikov et al.
demonstrated that molecular dynamics and quantum mechanical calculations could help
predict Asn deamidation and Asp isomerization in antibodies by quantifying free energy
barriers along the conformational and chemical reaction pathway.104 A clear advantage of the
method is that it does not require any prior experimental data for parameterization.
Considering the fact that it is becoming possible to perform high-throughput experimental
assays to assess therapeutic antibodies, combining both statistical and physics-based methods
would be a promising direction for method development.

Databases that store experimental information for predictive model construction
Databases that store protein variants and binding affinities measured by a variety of
experiments have been developed for studying protein–protein interactions in general111 and
specifically for antibodies.112 Crystal structures of protein–protein complexes and their
corresponding unbound-state structures are also available for benchmarking docking
simulations.113 Similarly, databases of conformational114 and colloidal89,115–118 stabilities as
well as solubility77 have been developed and exploited for construction of predictive models
and statistical potentials. ProTherm is probably the most widely used database that stores
thermodynamic stability data of proteins. It contains more than 10,000 data points generated
by thermodynamic experiments.114 Table II summarizes the contents and URLs of those
databases as of the writing of this review. The datasets used for training and testing are also
often provided in the associated supplementary materials of method papers.35,37 It is also
worth noting that scientists at Adimab have published results of experimental
characterizations of ~140 clinical stage antibodies based on a series of biophysical and forced
degradation assays (Table III).25,90,102 These data resources will enhance our understanding of
antibody therapeutics as well as method development for computer-aided antibody design.

Computer-aided stability engineering of antibodies
Although a large number of computational techniques have been developed in a quest for
predictive protein engineering, only a few methods have been tested to experimentally
improve protein stability by using point mutations suggested by computational predictions.
Regardless of the target properties, to engineer a protein, one needs to choose both the
location of the mutations and the replacement amino acid. The outcomes of computational
designs are designed amino acid sequences with predicted values or scores (Figure 4).
Because the predicted values are used as references, one must experimentally evaluate the
physicochemical properties of the proteins. A score is often represented in the form of a
linear combination of specific physicochemical or structural properties.

Rosetta119 and FoldX120 are widely used automated methods for computational
assessments of effects of point mutations on protein structures. An advantage of these
methods over the other ∆∆G prediction tools is that Rosetta and FoldX can simultaneously
sample the type of side chains or amino acids (i.e., sequence design) as well as the
conformations (i.e., structure prediction), whereas, in most of the other methods, the type of
replacement residue needs to be specified before computing the ∆∆G, and the designed
structures are often not explicitly generated. Computational methods reviewed below are
summarized in Table IV.

Predicted ∆∆G as a selection criterion. The goal of the majority of the computational studies
in antibody engineering has been to improve colloidal stability because of the recent focus in
formulation of protein therapeutics on the colloidal stability of antibodies. In contrast, many
of the experimental efforts to improve conformational stability by computations have
involved enzymes.121

Compared to improving binding affinity, improving conformational stability is much
more straightforward; in the case of interface designs for improvement of affinity, the relative
orientation between two components in a protein–protein complex needs to be considered in
addition to the intrinsic dynamics of each component. In contrast, when the only concern is
conformational stability, the design object can be a single component, and there are fewer
degrees of freedom. However, in real-world applications, not only conformational stability,
but also other properties, such as colloidal stability, have to be considered, and trade-offs
between properties have been reported.11 In one such example, Broom et al. built a
meta-predictor that combined 11 freely available ∆∆G prediction tools.122 They showed that
the accuracy of the predictions was better than that of individual tools against 605
experimentally verified mutations. By exploiting the meta-predictor, the authors predicted the
∆∆G values for all point mutations to each of ThreeFoil’s 120 residues that were not
involved in the function. The 10 variants predicted to be stabilizing were chosen for further
experimental characterization: four out of the 10 variants were indeed found to have better
conformational stability based on the experimental ∆∆G values obtained via kinetic unfolding
and folding measurements. However, in contrast to the improved conformational stability, all
the designed variants exhibited decreased colloidal stability. The implication is that it is
difficult to simultaneously improve all the physicochemical properties of proteins.

With the advent of next-generation sequencing technology, a high-throughput method
of saturation mutagenesis for entire sequences, called deep mutational scanning, has emerged
as a powerful tool in next-generation protein science,123 and deep mutational scanning has
also been applied to engineer antibodies.124 Likewise, with increasing
computational speed and accuracy, analogous high-throughput saturation mutagenesis methods
in silico are now becoming possible. Based on such a computational protocol, Wang et al.
have been able to improve the conformational stability of an anti-hVEGF antibody;125 they
assessed the conformational stability based on the T50 value, which was the temperature at
which half of the antibody was inactivated in an enzyme-linked immunosorbent assay
(ELISA) after heat exposure. In their protocol, the authors first build a homology model of
the antibody with RosettaAntibody126 and then dock the model to a crystal structure of the
antigen with the ZDOCK program,127 which is followed by refinement with SnugDock.128
After the model building of the antibody-antigen complex, the authors perform a virtual
scanning mutagenesis with the FoldX program120 to obtain the ∆∆G (= ∆GMut – ∆GWT) of each
position of non-interface residues. The resultant designed mutations are then filtered by the
computed ∆∆G, local structure entropy,129 and residue frequency statistics of human
antibodies to generate an antibody with 10 mutations having better conformational stability
(∆T50 ~7℃, compared to the wild type). Retrospectively, the authors also analyzed the
unfolding pathways of the designed mutants based on a Gaussian network model.130 That
assessment suggested that analysis of unfolding pathways of proteins prior to design could
help to improve design accuracy.

Zhang et al. performed a computational design calculation to explore relationships
between conformational and colloidal stabilities of the Fab region of an antibody, the Tm of
which was 71.8℃, based on measurements made with the UNit instrument (Unchained
Laboratories, UK).131 In their strategy, potentially flexible regions were first identified based
on MD simulations of a homology model of the antibody and B-factors derived from crystal
structures of the homologous antibodies (53–90% sequence identities). The authors then
applied Rosetta ∆∆G scanning mutagenesis to the entire sequence (442 residues) of the
antibody. This analysis resulted in 8398 model structures (442 × 19 non-native amino acids)
in total. Based on the prediction of the flexible regions and the in silico ∆∆G calculations, 17
variants were selected for further experimental validations: 11 stabilizing variants with
predicted ∆∆G values ranging from −8.8 to −2.6, whose designed positions were predicted to
be more flexible than other regions, and six destabilizing mutations with predicted ∆∆G
values ranging from 39.1 to 235.7. As expected, although 6 out of 11 stabilizing variants had
slightly higher Tm values compared to the wild type, the magnitude of the improvements was
not significant (∆Tm < 1.0℃). This is most likely because the wild type antibody already had
a high Tm value (71.8℃), and the wild type sequence might have been highly optimized for
conformational stability. Overall, the stable variants tended to show cooperativity of
unfolding and a lower aggregation rate. In addition, variants with decreased Tm values,
that is, decreased conformational stability, aggregated more rapidly.
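
The size of such an in silico saturation scan follows directly from the sequence length. The Python sketch below enumerates every single-point substitution of a sequence and confirms that a 442-residue sequence yields 442 × 19 = 8398 variants. The function name and the placeholder sequence are hypothetical; the sketch only generates the mutation list and does not perform any ∆∆G calculation.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_point_variants(sequence):
    """Yield (wild_type, position, mutant) for every single substitution.

    Positions are 1-based. Each position contributes 19 variants,
    one for every non-native standard amino acid.
    """
    for position, wild_type in enumerate(sequence, start=1):
        if wild_type not in AMINO_ACIDS:
            raise ValueError(f"non-standard residue {wild_type!r} at {position}")
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:
                yield wild_type, position, mutant


if __name__ == "__main__":
    # Placeholder sequence of the same length as the Fab discussed above.
    placeholder = "A" * 442
    print(sum(1 for _ in single_point_variants(placeholder)))  # 8398
```

Each generated tuple would then be passed to a ∆∆G predictor of choice, which is where essentially all of the computational cost lies.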

More recently, in a more sophisticated approach, Lee et al. employed homology models
built by RosettaAntibody to engineer thermostabilized antibodies.132 Based on visual
inspection of the homology models, the authors identified a small number of amino acids (2–5
residues) that interacted with each other to form “clusters”. Subsequently, a Rosetta-based
fixed-backbone design protocol was used to mutate these residues in each small cluster to other
amino acids, so that the designed positions became more tightly packed; five out of 13
variants experimentally tested showed small increases in the Tm values, and two of the
combinations of the designs resulted in two thermostabilized variants whose Tm values were
4.4℃ and 4.5℃ higher than the wild type, respectively. Notably, in the same study, a crystal
structure of a thermostabilized variant was determined, and, retrospectively, the homology
models used for the computational design were in excellent agreement with the crystal
structure (backbone RMSDs were 0.56 Å and 0.84 Å for the FV region and CDR-H3,
respectively), highlighting the utility of homology modeling for stability engineering.

In all three antibody cases above, the designed positions were limited to framework
regions since mutations in CDRs could have deleterious effects on the binding capability of the
designed antibodies. In agreement with this reasoning, in our experience, although mutations
in CDRs could improve conformational stability, such mutations often diminish binding
affinity toward antigens.

Although Rosetta119 is one of the most widely used computational methods for biomolecular
design, it has been less well explored for design of aggregation-resistant proteins. Based
on the previous observation that Asp substitutions at specific positions in human antibodies
could decrease the aggregation propensity,133 Sakhnini et al. have designed a combinatorial
antibody library in which 393 Fab variants with single, double, and triple Asp substitutions
have been prepared.134 Subsequently, the authors screened these variants with ∆∆G
calculations by Rosetta. Single and double/triple substitutions that caused increases of ∆∆G
by more than 5 and 1.5 Rosetta Energy Units, respectively, were eliminated. Twenty-six
antibodies remained for further experimental characterization. As expected from the lenient
∆∆G criterion, the Tm of the 26 variants measured by DSF showed some variations (57.2℃–
63.3℃, compared to 61.5℃ for the wild type), but all the variants fully retained
binding affinity, and half of them showed aggregation resistance. Retrospectively, the authors
computed SAP scores for each variant and compared them with experimental metrics that
suggest aggregation propensity; SAP values were not correlated with the percentage of
high-molecular-weight proteins formed after incubation at 45℃ for six days, whereas they
were remarkably correlated with the retention time of the size exclusion ultra-performance
liquid chromatography (Spearman rank correlation coefficient = 0.94). The authors also
found that a decreased aggregation propensity, that is, improved colloidal stability, was well
correlated with conformational stability, i.e., a decreased aggregation propensity led to an
increased conformational stability of the Fab variants (Spearman rank correlation coefficient
= −0.87).
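
The two-tier ∆∆G criterion described above can be written as a simple filter. The following Python sketch applies a cutoff of 5 Rosetta Energy Units to single substitutions and 1.5 to double or triple substitutions, as stated in the text. The function name, the variant labels, and the ∆∆G values are hypothetical and serve only to illustrate the logic; they are not data from the study.

```python
def passes_ddg_filter(n_substitutions, ddg_reu,
                      single_cutoff=5.0, multi_cutoff=1.5):
    """Return True if a variant survives the two-tier ddG screen.

    Variants whose predicted ddG (in Rosetta Energy Units) exceeds the
    cutoff for their substitution count are eliminated.
    """
    if n_substitutions < 1:
        raise ValueError("a variant needs at least one substitution")
    cutoff = single_cutoff if n_substitutions == 1 else multi_cutoff
    return ddg_reu <= cutoff


if __name__ == "__main__":
    # Hypothetical variants: label -> (number of Asp substitutions, ddG).
    library = {
        "single_a": (1, 4.2),
        "single_b": (1, 5.6),
        "double_a": (2, 1.2),
        "triple_a": (3, 2.0),
    }
    kept = [name for name, (n, ddg) in library.items()
            if passes_ddg_filter(n, ddg)]
    print(kept)
```

The stricter cutoff for multiple substitutions reflects that destabilizing contributions can accumulate when several positions are changed at once.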

Supercharging. Although scanning mutagenesis is quite useful, alternative methods have been
developed in which the screening is performed by design in a more rational manner. For
instance, Lawrence et al. have proposed a supercharging method in which several surface
residues, as assessed by the average number of neighboring atoms (within 10 Å) per side-chain
atom, are replaced with charged amino acids to increase the thermal resistance of
proteins.135 Later, Miklos et al. also designed an antibody based on another supercharging
strategy, wherein the positions of mutations were chosen on the basis of Rosetta
energetics.136,137 They demonstrated that some of the designed antibodies with ~14 mutations
had better refolding capability, which was assessed by ELISA binding assays after thermal
inactivation following incubation at 70℃ for 1 hour. Some of the designed antibodies also
had better conformational and colloidal stabilities based on assessments with DSC and DLS,
respectively.137 Interestingly, a stabilized antibody (∆Tm = ~2℃) showed a 30-fold better
binding affinity (assessed by surface plasmon resonance (SPR)) than the parent antibodies,
even though the altered positions were not in the CDRs but instead in the framework regions
(FRs). Bruce et al. have also designed supercharged single-domain antibodies with ~11
mutations based on buried surface areas; they demonstrated that supercharging strategies
could endow small proteins with the ability to penetrate a cell without altering their structure
and function.138 Although immunogenicity may be a problem in the case of therapeutic
applications, supercharging strategies seem to be a powerful approach to design stable
antibodies. However, the effective net charges and positions of mutations are not universal in
antibodies and, despite the fact that sequences and structures of the framework regions of
antibodies are well conserved, these properties need to be determined case-by-case;
mutations at the framework regions that are tolerable to an antibody are unlikely to be equally
acceptable to another antibody because of the subtle balance between the conserved
framework regions and the highly diverse CDRs that varies between antibodies.
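
The burial measure mentioned at the start of this section, the average number of neighboring atoms within 10 Å per side-chain atom, can be sketched in a few lines of Python. The function name, the coordinate representation, and the choice to count every other atom in the structure as a potential neighbor are assumptions made for illustration; the published method may define neighbors differently (for example, by excluding atoms of the same residue).

```python
import math


def avg_neighbors_per_side_chain_atom(coords, side_chain_idx, cutoff=10.0):
    """Average count of atoms within `cutoff` angstroms of each side-chain atom.

    coords         : list of (x, y, z) tuples for all atoms in the structure
    side_chain_idx : indices into `coords` for one residue's side-chain atoms
    A low value suggests a solvent-exposed residue.
    """
    if not side_chain_idx:
        raise ValueError("residue has no side-chain atoms")
    total = 0
    for i in side_chain_idx:
        for j, other in enumerate(coords):
            if j != i and math.dist(coords[i], other) <= cutoff:
                total += 1
    return total / len(side_chain_idx)


if __name__ == "__main__":
    # Toy coordinates; atoms 0 and 1 play the role of a side chain.
    atoms = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0), (20.0, 0.0, 0.0)]
    print(avg_neighbors_per_side_chain_atom(atoms, [0, 1]))
```

Residues with the lowest values would be the candidates for replacement with charged amino acids; the double loop is quadratic in the number of atoms, so a spatial grid or k-d tree would be preferable for full-size structures.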

Spatial aggregation propensity (SAP). Protein self-association can lead to aggregation.
Explicitly considering mutagenesis of the interacting region would thus be a practical
approach to designing aggregation-resistant antibodies. In this context, the SAP method has
also been employed in structure-based antibody design. Starting from two therapeutic
antibodies, rituximab and bevacizumab, Trout and coworkers have employed the SAP
calculations to design biobetters with enhanced colloidal stability.139,140 In the case of the
bevacizumab design, in addition to the simple point mutations, the authors also incorporated
a glycosylation motif near the high-SAP regions of the antibody. They have shown that
masking APRs with a carbohydrate moiety can be an effective approach to prevent
aggregation.

Clark et al. have also used homology modeling and SAP calculations to design variants
of a highly aggregation-prone IgG2;141 mutational positions were chosen based on the SAP
scores whereas the selection of the types of substitutions was based on the sequence
comparison to an aggregation-resistant homologue, which resulted in 74 variants with as
many as 9 mutations. The resultant variants showed enhanced conformational and colloidal
stability in 32 cases, out of which 11 variants could still bind to the antigen, as confirmed by
SPR, and 9 variants showed biological activity, as confirmed by an assay employing natural
killer cells.

Exposure of antibodies to an acidic environment is often necessary during the
formulation and manufacturing processes.142 Skamris et al. have employed size-exclusion
high-performance liquid chromatography, small-angle X-ray scattering (SAXS), and DLS to
characterize the oligomerization kinetics at pH 3.3 and the reversibility upon neutralization of
three antibodies with identical FV regions that are representative of IgG1, IgG2, and IgG4,
respectively.143 These experimental techniques have revealed that, under acidic pH conditions,
IgG1 remains monomeric, whereas the other two undergo a two-phase oligomerization
process. After neutralization, IgG2 oligomers partially revert to the monomeric state, whereas
IgG4 oligomers tend to aggregate. SAP calculations based on crystal structures of the
Fc fragments identified subclass-specific, aggregation-prone motifs, indicating
that these motifs could explain the two distinct pathways of reversible and irreversible
aggregation observed in their experiments.

A variety of excipients have been suggested to be protein stabilizers.144–148 In this context,
SAP calculations have also been used to examine interactions between antibodies and
formulation excipients. To gain insights into how formulation excipients of protein
therapeutics affect aggregation and viscosity, Trout and coworkers conducted MD
simulations using three different IgG1 antibodies and several carbohydrates.149 They found that sucrose
and trehalose reduced antibody aggregation more than sorbitol because of their larger size
and stronger interactions with high-SAP regions of the antibodies.

CamSol. In rational antibody design and screening, the CamSol method has also been
employed in combination with experiments. Using nine full-length antibodies with a
PEG-precipitation assay, DSC, and DSF measurements, Vendruscolo and coworkers have
shown that selection of soluble lead antibodies is possible with the improved sequence-based
CamSol solubility score just after sequencing of the screened antibody library.150 Remarkably,
the correlation coefficient (r) between the predicted CamSol scores and experimentally
measured solubilities of nine antibodies was 0.97. Furthermore, Vendruscolo and coworkers
have designed 16 variants of an antibody with the CamSol method; they produced antibodies
with a diverse range of solubilities and other physicochemical properties.151 The authors
employed several experimental techniques (cross-interaction chromatography, standup
monolayer adsorption chromatography, affinity-capture self-interaction nanoparticle
spectroscopy, hydrophobic-interaction chromatography, and ammonium sulfate
precipitation) to assess the developability of the series of antibodies, and they compared the
experimental results with the results obtained by several in silico tools (CamSol, SAP, DI,
SolPro, and Protein-Sol). They found that CamSol, SAP, and DI were highly correlated with
the experimental measurements. The fact that the Pearson correlation coefficients were as
high as 0.91 demonstrated the utility of the computational methods for high-throughput
antibody screening.

In another study, based on a crystal structure of an antibody and a homology model of
the IgG format, Shan et al.152 designed 15 variants targeting the potential hotspots
(CDR-L2, CDR-H3, and the CH3 domain) for self-association previously suggested by
hydrogen–deuterium exchange mass spectrometry (HDX-MS).153 This design resulted in
antibodies that had a diverse range of solubilities and other physicochemical properties. The
authors assessed the self-association using several experimental techniques (affinity capture
self-interaction nanospectroscopy, DLS, and PEG-precipitation assays) and compared the
results with the results obtained by the CamSol and SAP calculations. This comparison
revealed correlation coefficients (r) between computed scores and experimentally
measured solubility as high as 0.93 and −0.84 for CamSol and SAP, respectively.

Solubis. Most computational methods to predict APRs from an amino acid sequence do not
take into account the conformational stability of proteins. On the one hand, APRs should be a
cluster of hydrophobic residues, and inside a protein they are most likely to contribute
favorably to conformational stability in the native, folded state and could be a trigger of
aggregation only upon denaturation; on the other hand, APRs on protein surfaces could
trigger aggregation under native conditions via hydrophobic intermolecular interactions. To
distinguish these differences in mechanisms, Van Durme et al. have developed a method
termed Solubis that combines TANGO and FoldX.154 This combination results in a
structure-based method to design aggregation-resistant proteins by identifying mutations that
reduce the intrinsic aggregation propensity assessed by TANGO while respecting
conformational stability computed by FoldX.154 With 11 previously published antibodies, the
same group also demonstrated that Solubis was able to filter the TANGO-predicted APRs by
simultaneously considering structural information and conformational stability.155 The
authors further exploited Solubis to design antibodies and experimentally verified that one of
the designed antibodies exhibited better conformational and colloidal stabilities while
preserving the binding capability to the antigen.155
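
The selection logic described for Solubis, reducing aggregation propensity while respecting conformational stability, can be sketched as a two-criterion filter. The Python example below assumes that a change in aggregation score and a ∆∆G value have already been computed for each candidate mutation by external tools; it does not call TANGO or FoldX, and the class name, field names, the 1.0 ∆∆G tolerance, and the candidate values are all hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    delta_aggregation: float  # mutant minus wild type; negative = less aggregation-prone
    ddg: float                # predicted ddG; positive = destabilizing


def select_mutations(candidates, max_ddg=1.0):
    """Keep mutations that lower the aggregation score without
    destabilizing the fold beyond `max_ddg`, best reduction first."""
    kept = [c for c in candidates
            if c.delta_aggregation < 0 and c.ddg <= max_ddg]
    return sorted(kept, key=lambda c: c.delta_aggregation)


if __name__ == "__main__":
    # Hypothetical precomputed scores for four candidate mutations.
    pool = [
        Candidate("A", -1.2, 0.3),
        Candidate("B", -2.0, 2.5),   # reduces aggregation but destabilizes
        Candidate("C", 0.4, -0.5),   # stabilizes but raises aggregation
        Candidate("D", -0.6, -0.2),
    ]
    for c in select_mutations(pool):
        print(c.name, c.delta_aggregation, c.ddg)
```

Candidate B illustrates the case the combined approach is meant to catch: a sequence-only predictor would rank it first, whereas the stability check removes it.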

Brownian dynamics simulations. The examples above centered around mutational studies
aimed at enhancing the physical stability of antibodies. A somewhat different
approach to improving physicochemical properties is to attach a fusion tag to the terminal
regions of proteins. For instance, with the guidance of Brownian dynamics (BD)
simulations,156 Nautiyal et al. designed an antibody to enhance solubility by attaching a
solubility-enhancing peptide (SEP) tag.157 By eliminating the degrees of freedom of the
solvent and using rigid-body treatments of protein structures, BD simulations can not only
reduce the computational cost, but also enable sampling many protein encounters to provide
reliable statistics on association kinetics. In the simulations of Nautiyal et al.,157 110
single-chain FV structures, each derived from conventional MD simulations of a homology
model of the antibody, were randomly placed, and the solubility of the antibody was
estimated by computing the numbers and sizes of clusters during the BD simulations, with a
cluster defined as two antibodies approaching one another to within 3.6 Å. The BD simulations suggested
that the antibody with the SEP tag tended to be more often in the monomer form and was
associated with smaller cluster sizes than the wild type. The experimental verification of this
prediction showed that the designed antibody expressed in the soluble fraction, whereas the
wild type expressed in the insoluble fraction. Further characterization by DLS and CD
measurements also demonstrated that the designed antibody showed better solubility and
even better conformational stability. The conservation of the binding capability in the
designed antibody, which was confirmed by an SPR measurement, suggested that the SEP tag
could be useful in antibody engineering.
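
Counting clusters in a simulation snapshot amounts to finding connected components among molecules in contact. The Python sketch below uses a union-find structure and treats two molecules as being in contact when any pair of their atoms lies within 3.6 Å. The function names, the minimum atom-atom distance criterion, and the toy coordinates are assumptions for illustration; the original study may have measured the approach distance differently.

```python
import math


def in_contact(mol_a, mol_b, cutoff=3.6):
    """True if any atom of mol_a is within `cutoff` angstroms of any atom of mol_b."""
    return any(math.dist(a, b) <= cutoff for a in mol_a for b in mol_b)


def cluster_sizes(molecules, cutoff=3.6):
    """Return cluster sizes (largest first) for one snapshot.

    molecules : list of molecules, each a list of (x, y, z) atom coordinates
    Molecules linked directly or through a chain of contacts share a cluster.
    """
    n = len(molecules)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if in_contact(molecules[i], molecules[j], cutoff):
                parent[find(i)] = find(j)

    sizes = {}
    for i in range(n):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)


if __name__ == "__main__":
    # Four toy one-atom "molecules": three form a chain, one is isolated.
    snapshot = [[(0.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)],
                [(6.0, 0.0, 0.0)], [(50.0, 0.0, 0.0)]]
    print(cluster_sizes(snapshot))
```

Averaging the fraction of size-one clusters over many snapshots would give a monomer population of the kind compared between the tagged and wild-type antibodies.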

Prediction of viscosity
Viscosity has garnered much attention as a target engineering property in computer-aided
antibody design. Numerous researchers have devoted considerable effort to understanding the
molecular origins of the concentration-dependent viscosity behavior of antibodies.158 Similar
to cognate protein–protein interactions, the driving forces of self-interactions are
hydrophobicity and electrostatics. Under identical formulation conditions, some antibodies
tend to show peculiar viscosity behavior that leads to aggregation, whereas others do not
exhibit such behavior. These observations have suggested that the viscosity behavior of
antibodies is determined by their amino acid sequences. Considering that the constant
domains of antibodies are well conserved in terms of both sequence and structure, the
different behaviors are likely due to differences in the variable regions. Indeed, Li et al. have
experimentally measured the viscosity of 11 antibodies under the same conditions and have
investigated relationships between concentration-dependent viscosities and several sequence-
and structure-based parameters.159 The authors found that the net charge, pI, zeta-potential,
and the aggregation property of FV regions should be important determinants of the
concentration-dependent viscosity behavior observed in antibody solutions. Likewise, Sharma et
al. employed 14 antibodies to correlate the experimentally measured viscosity values with
predicted properties of antibodies, including properties obtained from MD simulations.160 In
agreement with the observations made by Li et al.,159 the authors concluded
that the viscosities of antibodies increase with hydrophobicity and charge dipole distribution,
whereas they decrease with net charge.160 The authors also found that 1) fast clearance is
correlated with high hydrophobicity of CDRs and high positive or high negative net charge,
2) chemical degradation from Trp oxidation is correlated with the average time of solvent
exposure of Trp residues, and 3) Asp isomerization rates can be predicted from the solvent
exposure and residue flexibility of Asp residues.
Furthermore, Trout and coworkers have proposed a high-throughput in silico tool,
termed spatial charge map (SCM), to identify highly viscous antibodies from their
structures.99 Conceptually similar to SAP calculations, where APRs are identified by spatial
summation of residue hydrophobicity, SCM calculations are based on spatial summation of
residue charges, with more emphasis on negative charge; several previous studies have
demonstrated that high antibody viscosities are better correlated with negative than positive
charges161 on FV regions. Benchmarking of the SCM calculations with 19 antibodies provided
by three pharmaceutical and biotech companies showed clear separations between antibodies
possessing high and low viscosities.
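The idea of a spatial summation of residue charges can be illustrated with a minimal sketch. It is not the published SCM parameterization: the residue-level coordinates, the 10 Å radius, the exposure flags, and the function names are all assumptions made for illustration, and the emphasis on negative charge is mimicked simply by summing only the negative local values.

```python
import numpy as np

def spatial_charge_scores(coords, charges, exposed, radius=10.0):
    """Sum the charges of exposed residues lying within `radius` of each exposed residue."""
    coords = np.asarray(coords, dtype=float)
    charges = np.asarray(charges, dtype=float)
    exposed = np.asarray(exposed, dtype=bool)
    # Pairwise distances between residue centers (N x N matrix).
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Neighbor mask: within the radius and solvent exposed (self is included).
    neighbor = (dist <= radius) & exposed[None, :]
    local = neighbor.astype(float) @ charges
    # Buried residues do not contribute a surface score.
    return np.where(exposed, local, 0.0)

def negative_patch_score(local_scores):
    """Magnitude of the summed negative local charge; larger means more negative patches."""
    local_scores = np.asarray(local_scores, dtype=float)
    return float(-local_scores[local_scores < 0].sum())
```

In this toy form, two nearby acidic residues reinforce each other's local score, whereas an isolated basic residue contributes nothing to the negative patch score.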
With the goal of directly assessing the viscosity of antibody solutions, Kumar and
coworkers have employed experimentally measured viscosity data from 16 different
antibodies in the same formulation to derive mathematical models that aim to predict
concentration-dependent viscosity curves162 and a diffusion interaction parameter (kD)30 for
each antibody. An equation that was obtained from a stepwise linear regression for prediction
of viscosity curves included as independent variables the hydrophobicity of full-length
antibodies and charges on FV regions and hinge regions. The correlation coefficient (r)
between the experimental and predicted parameters of solution behavior was 0.54 with
leave-one-out cross-validation, and the equation was able to predict the
concentration-dependent viscosity curves of the antibody solutions reasonably well.162
During the course of derivation of predictive models, the authors also found that the diffusion
interaction parameter, kD, was correlated well with several other parameters, such as
conformational stability, solubility and electrostatic properties of antibodies.30 To predict kD
from either experimentally measured or computationally predicted parameters, several
equations have been derived based on linear regressions on the parameters. The kD values
predicted by an equation derived purely from predicted parameters (estimated total charge on
FV and structure-based calculated hydrophobicity) were highly correlated with
experimental kD values (r = 0.92).
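Leave-one-out cross-validation of a linear regression, as used to report the correlation coefficients above, can be sketched as follows. This is a generic illustration rather than the authors' procedure: the stepwise variable selection is omitted, the descriptors in `X` are placeholders, and the function names are hypothetical.

```python
import numpy as np

def loo_predictions(X, y):
    """Leave-one-out predictions from an ordinary least-squares linear model."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    preds = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # hold out sample i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        preds[i] = A[i] @ coef                 # predict the held-out sample
    return preds

def pearson_r(a, b):
    """Pearson correlation coefficient between two 1D arrays."""
    return float(np.corrcoef(a, b)[0, 1])
```

Calling `pearson_r(y, loo_predictions(X, y))` gives the cross-validated r; with only 16 antibodies, each held-out prediction is made by a model fitted to the remaining 15.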
Machine learning models are often referred to as black boxes because they describe
correlations rather than causation. To mitigate this “feature” of machine learning
algorithms, Gentiluomo et al. have proposed an interpretable predictive model based on
neural networks to predict melting temperature (Tm), aggregation onset temperature (Tagg)
and diffusion interaction parameters (kD) as a function of pH and salt concentration from
amino acid composition of antibodies.163 Five IgGs provided by the PIPPI
consortium (http://www.pippi.kemi.dtu.dk) served as the dataset. After the training and testing of
their method, the authors applied a knowledge transfer process by evaluating the weights of
the parameters used in the trained networks, helping to understand how the prediction
algorithm arrives at its conclusions.
Engineering viscosity of antibody solutions
The foregoing studies were intended to screen and select antibodies with relatively low
viscosity from a large pool of antibodies that exhibit a variety of properties. In a situation
where only highly viscous antibodies are available, the sequences of those antibodies need to
be optimized by design. Here, we review studies that employ such computations at the stage
of antibody engineering.
In an example of computer-aided viscosity engineering that exploits a homology model
of an antibody, Nichols et al. have performed two types of mutagenesis studies: disruption of
1) an APR predicted by TANGO and PAGE, and 2) a negatively charged region.164 The
authors compared the results obtained by the two strategies and found on the one hand that
disrupting computationally predicted APRs could reduce the viscosity, but it also destabilized
antibodies and abolished antigen binding. On the other hand, a charge-neutralizing mutation
of a negative surface residue was able to reduce viscosity while simultaneously maintaining
conformational stability and antigen-binding capability. In another study, Kumar et al. have
also employed a homology model of the same antibody as Nichols et al. and have designed
seven variants based on free energy change upon mutation, as assessed by the residue-scan
module in MOE2014.09, to improve the physicochemical properties.165 The actual
improvements were experimentally verified in five out of seven cases. In particular, a variant
exhibited better solution behavior, lower viscosity, a reduced diffusion interaction parameter
(kD), better solubility, and even better binding activity toward the antigen.
Chow et al. have employed a crystal structure of an antibody with high viscosity and
phase separation at a high concentration to improve the properties of the antibody.166 Based
on the observation that the charge distribution on the molecular surface computed by the
AMBER99 force field in MOE2013 was unbalanced and there were several contacts
between neighboring molecules in the crystallographic lattice, the authors identified four
point mutations that could mitigate such phenomena. Among the mutations that ELISA and
SPR indicated did not affect the binding affinity of the antibody–antigen interaction, two
mutations, R33G and N35E in CDR-L1, showed a reduction in viscosity and a lower
propensity to form phase separation compared to the wild type. In addition, the mutation
S28K in CDR-H1 showed an increased propensity to form phase separation, and F102H in
CDR-H3 did not change either viscosity or phase separation behavior. Put together, these
results highlighted the importance of negative charges on viscous behavior. The authors
further sought to examine the relationships between several experimental parameters
measured at a low concentration (4–15 mg/ml) and the viscous behavior of the antibody at
high concentration (>50 mg/ml). They found that the diffusion interaction parameter (kD)
measured by DLS, the weight-averaged molecular weight, and the hydrodynamic diameters
measured by SLS at a low concentration in solution exhibited good correlations with the
behavior of the antibody in solution.
Geoghegan et al. have employed a homology model of an antibody and information from
HDX-MS to identify designable positions to reduce viscosity.167 Because the region
suggested by HDX-MS was still large, the authors further exploited the AggScore
implemented in the BioLuminate package of Schrödinger and the empirical reasoning that
hydrophobic and aromatic residues showed a tendency to contribute to self-association, which
resulted in four positions located on CDR-H1 (H35), H2 (W50), FR2 (Y49), and CDR-L2
(L54). The experimental mutagenesis results indeed showed that the designed variants
exhibited reduced self-association tendencies and lower viscosity.
Based on the assumption that the tendency of a protein to self-associate is closely linked to
the hydration free energy of the protein in its monomeric state, Kuhn et al. have exploited MD
simulations and 3D-RISM theory to identify point mutations that could optimize the hydration
free energies of two antibodies that exhibited high viscosity at high concentrations.168 For those
two antibodies, 10 and 18 variants possessing mutations at framework regions were
computationally generated based on a crystal structure and a homology model built with
MOE, respectively. These variants were further filtered based on the hydration free energies
computed by 3D-RISM theory and averaged over the MD snapshots. As a result, two variants
were experimentally characterized, one including both H:E10G/D73N/A76K and L:D60S/E80Q
and the other including H:Q13K/D73N/Q115K. Compared to the wild type, both designed
variants exhibited an improvement in solubility and a reduction in viscosity at
high concentration based on the dynamic viscosity and second virial coefficients obtained from
a rheometer and multiple-angle light scattering, respectively.
Jetha et al. have employed a homology model of an antibody, and they designed a series
of 97 variants based on the surface hydrophobicity determined by the Protein Surface
Analyzer application in MOE2016.0802, which was followed by hydrophobic interaction
chromatography (HIC) to experimentally estimate their viscosities;169 the reduced HIC
retention time of 67 variants implied lower viscosity. In addition, 93 variants showed binding
ability comparable to or better than that of the wild type. Overall, 29 variants exhibited both
reduced HIC retention times and comparable or better binding abilities than that of the wild
type. Retrospectively, the authors also performed a regression analysis to derive equations to
predict HIC retention time from sequence and structural descriptors of antibodies toward
high-throughput, in silico screening. The resultant equations exhibited a correlation
coefficient (r) between experimental HIC retention times and predicted values of 0.69 for the
97 variants above and a correlation coefficient (r) of 0.62 for 137 clinical stage antibodies
whose previously published HIC RT values were available.90
The studies described above have demonstrated that statistical methods are quite useful in
screening and engineering viscous antibodies during drug discovery processes. In parallel
with such statistical and empirical predictions, thorough understanding of the molecular basis
for high viscosity in concentrated antibody solutions is still desirable; molecular simulations
can complement empirical predictions to achieve more rational antibody screening and
engineering.
Coarse-grained modeling of the behavior of antibodies in solution
In principle, if our understanding of the physics behind antibody structures and dynamics
was precise, and computational resources were infinitely available, atomistic molecular
simulations should reproduce the solution behavior of antibodies in a crowded, physiological
environment. However, such an ideal situation is still far from being a reality; the size of
conformational spaces explored by antibodies and the simulation timescales to reproduce
solution behavior are still too large to be studied in atomistic detail. Although these fields are
steadily improving, studies based on traditional molecular simulations assume that a single
molecule exists in a water box170 or single interactions happen between cognate pairs.171
However, soluble proteins can self-associate in a crowded environment, and such
self-association has been suggested to form transient and dynamic clusters in concentrated
solutions.27,172,173 Under these circumstances, a simplified coarse-grained (CG) representation
of antibody molecules and their simulations can prove useful in the study of the viscosity
behavior of concentrated antibody solutions.
Using 5-µs CGMD simulations with different resolutions, i.e., one bead per domain model
(12 CG sites in an IgG format) and the same model with a bead in each CDR and in each hinge
region, respectively (26 CG sites), Chaudhri et al.174,175 have studied the solution behavior of
the IgG format of two antibodies that differed from each other by only a few mutations in the
CDRs but showed very different viscosity behavior with an increase in concentration.174 Based
on the radial distribution function and potential of mean force computed from the simulation
trajectories with six different concentrations (20, 40, 60, 80, 100, and 120 mg/ml), the
quantification of the concentration dependency of the solution behavior of the antibodies
suggested that inter-domain interactions involving both Fab and the constant regions lead to
the formation of transient intermolecular networks and that these interactions contribute
toward increased viscosity of antibody solutions at high concentrations. The CGMD
simulations also suggested that a higher-resolution CG model (26-site model) did not offer
much more than the lower resolution model (12-site model); in both models, electrostatic
interactions at the domain level played a dominant role in determining the self-association of
the antibodies, in qualitative agreement with previous experimental studies, wherein adding
NaCl decreased the solution viscosity.176,177 The results obtained by the additional CG
simulations on the charge swap mutants175 of the two antibodies were also consistent with
previous experimental results.178,179 Subsequently, Buck et al. extended this approach to four
different antibodies, and they arrived at similar conclusions: electrostatic complementarity at
the domain level was the most vital factor that governed transient network formation in a
highly concentrated antibody solution.180
These results were also supported by a computational study employing an all-atom model of
IgG structures. Lapelosa et al. employed
the same antibodies as Chaudhri et al.174,175 to perform all-atom MD simulations of the single
IgG structures, and representative solution structures from the MD trajectories were supplied
to the subsequent grid-based conformational search to generate plausible dimer model
structures.181 Electrostatic interactions were calculated by solving the Poisson-Boltzmann
equation. Their results also suggested that electrostatics played a role in self-association.
More recently, using the same antibodies used by Chaudhri et al.,174 Wang et al. performed
CG Brownian dynamics (BD) simulations to quantitatively reproduce the previous
experimental results of bulk transport properties.182 Unlike the previous studies that used
CGMD simulations,174 wherein a dielectric constant of 1 was used for assessing electrostatic
interactions, Wang et al. used a dielectric constant of 80 and thus implicitly
considered electrostatic screening. As a result, no dense cluster or strong network was
observed, but instead loosely connected clusters emerged in the antibody solutions. The bulk
transport properties of the antibody solutions such as structure factors, self-diffusivity, and
viscosity computed from the CGBD simulations with microscopic parameters were in
quantitative agreement with previous experimental values.178,179,183
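The effect of the dielectric constant on pairwise electrostatics can be made concrete with plain Coulomb's law. The sketch below is only an illustration of the scaling, not the interaction potential of either CG model; the function name and the example charges and distance are assumptions.

```python
# Conversion factor giving kcal/mol for charges in units of e and distances in Å.
COULOMB_KCAL = 332.06

def coulomb_energy(q1, q2, r_angstrom, dielectric):
    """Coulomb interaction energy (kcal/mol) of two point charges in a uniform dielectric."""
    if r_angstrom <= 0 or dielectric <= 0:
        raise ValueError("distance and dielectric constant must be positive")
    return COULOMB_KCAL * q1 * q2 / (dielectric * r_angstrom)
```

For two opposite unit charges 10 Å apart, the attraction is about 33 kcal/mol with a dielectric constant of 1 but about 0.4 kcal/mol with a dielectric constant of 80. This 80-fold weakening is consistent with the loosely connected clusters, rather than dense networks, reported for the screened model.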
Small-angle X-ray scattering (SAXS) has been used to study self-association of antibody
molecules in solution, and the resulting scattering profiles have been interpreted based on
simple spherical models interacting through potentials comprised of long-range repulsion and
short-range attraction.183–185 Corbett et al. have gone one step further by using CGMD
simulations with a three-bead model, which was able to reproduce features of SAXS profiles
that were not captured by spherical models.186
PREDICTION AND ENGINEERING OF IMMUNOGENICITY OF ANTIBODIES
For therapeutic antibodies, poor physicochemical properties such as low stability that lead to
(partial) unfolding and aggregations are significant risk factors for deleterious immune
responses in patients. Assessing and predicting immunogenicity are therefore also among the
important steps in antibody drug discovery.
Prediction of humanness and immunogenicity
Table V summarizes the computational methods used to assess, predict, and reduce protein
immunogenicity. There are several factors that may contribute to immunogenicity of
antibodies. Based on how immune systems work, an obvious factor would be a sequence
identity to human antibodies. To address this concern, Abhinandan and Martin have compared
the amino acid sequences of antibodies of humans and mice to determine the degree of
humanness of mouse antibody sequences.187 Based on 3097 light chains and 3409 heavy chains
in the Kabat database, the authors derived Z-scores calculated from means and standard
deviations of pair-wise sequence identities within human sequences and between human and
mouse sequences, respectively. The Z-scores represent how typical a sequence is of the human
repertoire. However, when the Z-scores were applied to 12 therapeutic antibodies whose
anti-antibody response data had been reported, the very poor correlation between the
anti-antibody response data and the Z-scores (r = −0.12) suggested that there were no direct
relationships between the humanness score and immunogenicity. A web server, SHAB, was
developed to compute the Z-scores from an amino acid sequence so that everyone could assess
the degree of humanness of their antibodies (Table V). However, antibodies evolve in ways
that cause them to have diverse mature sequences derived from sequences of a limited germline
origin, and use of the germline gene is not evenly distributed in antibody populations. To avoid
any influence of the biased germline use on the assessment of humanness of antibodies,
Thullier et al. have proposed another Z-score-based metric that incorporates human germline
gene information. This metric is called the G-score.188
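A Z-score of this kind can be sketched in a few lines. The version below is one simplified reading of the idea, not the SHAB implementation: it assumes pre-aligned, equal-length sequences, uses the mean identity of each human sequence to the other human sequences as the reference distribution, and all function names are hypothetical.

```python
from statistics import mean, stdev

def percent_identity(a, b):
    """Percent identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)

def mean_identity(seq, references):
    """Mean percent identity of `seq` to every sequence in `references`."""
    return mean(percent_identity(seq, ref) for ref in references)

def humanness_z_score(query, human_refs):
    """Z-score of the query's mean identity relative to the human reference set."""
    human_refs = list(human_refs)
    # Reference distribution: each human sequence scored against the others.
    within = [
        mean_identity(h, human_refs[:i] + human_refs[i + 1:])
        for i, h in enumerate(human_refs)
    ]
    mu, sigma = mean(within), stdev(within)
    return (mean_identity(query, human_refs) - mu) / sigma
```

A positive value indicates a sequence that is more similar to the human set than a typical human sequence is; strongly negative values indicate atypical sequences.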
Germline sequences of antibodies can be attractive references to assess humanness
because they originate 100% from humans. Pelat et al. have therefore developed a
germinality index (GI) that has been defined as the percentage of residue identities in
framework regions between a given antibody sequence and the closest human germline
sequence in the IMGT database.189 The GI has been employed to humanize an antibody
derived from a non-human primate. The resultant humanized antibody exhibited a higher GI
score than a fully human antibody while preserving the binding capability to the antigen.189
Gao et al. have developed yet another sequence identity-based method termed the T20
score analyzer to quantify the humanness of antibodies.190 The authors first constructed a
database of human antibodies that stores 38,708 human antibody–variable region sequences
derived from the NCBI IgBLAST.191 A BLAST search of an input antibody sequence is then
performed against the database. Averaging the percent sequence identities between a given
antibody sequence and the top 20 matched sequences, rather than the entire population, in the
database leads to the T20 score. The authors demonstrated that the T20 score was able to
distinguish human antibody sequences and non-human antibody sequences. Although
conceptually similar to the methods of Martin and coworkers,187,188 a clear distinction
between the T20 score and the Z-score is the size of the reference databases of human
antibodies; there are 38,708 and 6506 human antibody sequences in the databases of the T20
score analyzer and SHAB, respectively. Furthermore, the T20 score was applied to 65
therapeutic antibodies whose immunogenicity data were available; a weak correlation
between the T20 scores and immunogenicity emerged with a correlation coefficient (r) of
~0.46. Comparison of antibodies before and after humanization of the antibodies revealed a
clear trend: the immunogenicity decreased while the T20 scores increased.
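The averaging step behind a top-N identity score can be sketched as follows. The published analyzer obtains its matches from a BLAST search; here, as a simplifying assumption, the database holds pre-aligned sequences of equal length and identities are computed position by position. The function names are hypothetical.

```python
def percent_identity(a, b):
    """Percent identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)

def top_n_identity_score(query, human_db, n=20):
    """Average percent identity of the query to its n best-matching database sequences."""
    if not human_db:
        raise ValueError("the reference database is empty")
    identities = sorted(
        (percent_identity(query, seq) for seq in human_db), reverse=True
    )
    top = identities[:n]  # keep only the closest matches, not the whole population
    return sum(top) / len(top)
```

Restricting the average to the closest matches means that a sequence is not penalized for being dissimilar to unrelated human germline families.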
Seeliger192 has expanded on the use of simple pairwise sequence comparisons to derive
sequence-based statistical potentials using 11,849 antibody sequences of humans and mice
obtained from the abYsis database.193 Instead of simply computing sequence identities
between a given sequence and sequences of human antibodies, the author incorporated
position-specific probabilities of individual amino acids derived from a multiple-sequence
alignment of each chain type (i.e., heavy, κ-light, or λ-light chains of humans and mice); the
resulting potentials were able to distinguish between human and mouse antibodies. Based on
Monte Carlo sampling coupled with the potentials derived from the human antibody
sequences, the author computationally demonstrated that the sequences of Rituximab can
evolve into lower immunogenic sequences, as predicted by an Epivax score.194 The
sequence-based potentials were later used to design antibody sequences that were predicted
to have better physicochemical properties compared to the wild type.195 The series of the
designed antibodies was experimentally characterized; DSC suggested an improvement of the
Tm (68.0℃ and 83.5℃ for the wild type and the most improved design, respectively); SEC
showed an improvement in the long-term stability of the variants, as represented by the
monomer content of the samples under conditions that were relevant to the biopharmaceutical
development process over time; and SPR revealed the preserved binding affinities to the
antigen among the variants.
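How position-specific probabilities can separate two sequence populations is illustrated by the sketch below. It uses a simple log-odds score with pseudocounts, which is an assumed stand-in and not the statistical potential of the cited work; it also assumes gap-free alignments over the 20 standard amino acids, and the function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def position_probabilities(alignment, pseudocount=1.0):
    """Per-position amino acid probabilities from a gap-free alignment."""
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    probs = []
    for pos in range(len(alignment[0])):
        counts = Counter(seq[pos] for seq in alignment)
        # Pseudocounts keep unseen residues from getting zero probability.
        probs.append(
            {aa: (counts.get(aa, 0) + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return probs

def log_odds_score(seq, human_probs, mouse_probs):
    """Sum of per-position log ratios; positive values favor the human profile."""
    return sum(
        math.log(h[aa] / m[aa]) for aa, h, m in zip(seq, human_probs, mouse_probs)
    )
```

A score of this form could serve as the energy function in a Monte Carlo search over mutations, in the spirit of the sampling described above.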
Similarly, based on a training set of 26,912 antibody sequences derived from humans and
mice in the IMGT database, Clavero-Alvarez et al. have developed a multivariate Gaussian
(MG) model that takes into account the correlations between mutations at different positions
both within a chain and across two chains (i.e., H and L).196 The authors sought to distinguish
human and mouse sequences under various conditions and found that 1) CDRs did not carry
any relevant species-specific information that was necessary to distinguish two sequence
populations and 2) light chains carried a greater amount of such information than heavy
chains. Furthermore, based on another 1388 and 1379 sequences of human and mouse
antibodies, respectively, the MG model showed slightly better ability to distinguish
sequences from the two populations than sequence identity-based methods190 (the prediction
accuracies were 94% and 91% for the MG model and the best sequence identity-based
method, respectively). The MG score was further compared to experimental immunogenicity,
which was defined as the fraction of observed immunogenic responses (appearance of
anti-drug antibodies) reported in the literature; the Pearson correlation coefficient (r) between
the MG score and the immunogenicity was −0.43. Coupled with Steepest Descent and
Simulated Annealing MC simulations, the MG score was exploited to guide sequence
optimizations of seven mouse sequences whose experimentally humanized sequences were
also available; the designed sequences starting from the mouse sequences differed from the
experimentally humanized sequences, although many of the mutations were shared. The
implication was that the experimentally humanized sequences would not be the only
solutions in humanization and that the computational algorithm was able to capture some
essential aspects of humanization procedures currently used in the field.
Adaptive immune responses begin with the presentation of antigenic peptides by HLA molecules
to T-cell receptors. On the one hand, short stretches of peptide that form such T-cell
epitopes on antibody structures may therefore lead to immunogenicity; on the other hand,
germline sequences of human antibodies may not be recognized as “foreign” by HLAs
because their origin is 100% human. Based on this assumption, Lazar et al. have proposed
another metric to assess immunogenicity of antibodies that they have called the Human
String Content (HSC).197 The HSC can be computed for each peptide in a target sequence
based on the number of residues identical to their counterparts in the most similar aligned
peptide from a human germline antibody. Computational prediction of HLA-binding peptides
has been studied for decades, and, to keep abreast of recent trends, we refer readers to a
review by Song and coworkers and the references therein.198
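The per-peptide definition of the HSC lends itself to a short sketch. The window length of 9 residues, the averaging of the per-peptide values into a single percentage, and the requirement that germline sequences be pre-aligned to the target at equal length are assumptions made for this illustration; the function name is hypothetical.

```python
def human_string_content(target, germlines, window=9):
    """Average, over all peptides, of the best per-peptide identity to aligned germline peptides (%)."""
    if any(len(g) != len(target) for g in germlines):
        raise ValueError("germline sequences must be aligned to the target length")
    n_windows = len(target) - window + 1
    if n_windows < 1 or not germlines:
        raise ValueError("need at least one germline and one full window")
    total = 0
    for start in range(n_windows):
        peptide = target[start:start + window]
        # Best match for this peptide among the aligned germline peptides.
        total += max(
            sum(a == b for a, b in zip(peptide, g[start:start + window]))
            for g in germlines
        )
    return 100.0 * total / (n_windows * window)
```

Because the best-matching germline is chosen separately for each peptide, a sequence can score highly even when no single germline matches it along its whole length.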
In addition to T-cell epitopes, B-cell epitopes on antibody structures can be immunogenic,
and interactions between two antibodies or anti-antibody responses occur through such
immunogenic regions on antibody structures.199 However, such experimental data (i.e.,
sequences and structures of anti-antibody antibodies) are not readily available. For instance,
using 44 antibody-antibody complexes in the Protein Data Bank, Qiu et al. have tried to
examine whether B-cell epitopes on antibodies possess propensities similar to those on
generic protein antigens.199 However, their dataset consisted of not
only immunogenic antibody–antibody complexes but also antibody–antibody complexes that
may have been formed merely by crystal-packing contacts, which made it difficult to draw a
conclusion regarding differences between B-cell epitopes on antibodies and on generic
protein antigens. The nature of cognate protein–protein interactions is considered to differ
phenomenologically from crystal-packing contacts.200
As shown in Table V, only two of the six methods for immunogenicity assessment
(Z-score and T20 score analyzer) have been implemented as web servers and are publicly
available as of the writing of this review article. The concepts behind the other methods are
quite simple, and in-house implementation as web servers or command line tools would be
straightforward. Considering the fact that even fully human antibodies could be
immunogenic,201 there would be no perfect single method to assess immunogenicity of
antibodies in silico. Looking at antibodies from a variety of angles with different techniques
will therefore be highly desirable.
Antibody humanization
Antibody humanization was one of the earliest attempts at computer-aided antibody
design. The initial attempt for humanization used CDR grafting.202 The assumption was that
the more similar an antibody sequence was to a human antibody sequence, the lower the
immunogenicity it would have. Humanization by CDR grafting often requires back-mutation
in framework regions, wherein visualization of three-dimensional structures of antibodies
helps to identify important residues such as residues that structurally support CDRs.
Framework templates can be obtained by a simple similarity search of the entire sequences as
well as a similarity search of CDRs. The latter is referred to as super-humanization.203,204 The
rationale is that the more similar a CDR sequence is to a CDR sequence of human antibodies,
the more conserved the framework would be because of the conservation of canonical
structures; important framework residues needed to maintain CDR conformations are
assumed to be conserved when CDR sequences are similar between two antibodies, and
hence there would be no need for the back-mutation in framework regions. In addition,
Roguska et al. have proposed an alternative technique called resurfacing, wherein residues
exposed to solvents in the FV region of mouse antibodies are replaced with corresponding
residues observed in human antibodies.205 The statistics of residue frequency at each position
in antibodies of humans and other species can now be readily obtained via the abYsis
database developed by Swindells et al.193 A web server termed Tabhu206 has enabled easy
access to a large number of annotated human antibody sequences and thereby made
automated template searches and CDR grafting much easier for non-experts.
There are several reviews that have surveyed past examples of antibody
humanization.207–209 Table V also summarizes the representative methods for antibody
humanization. In the following paragraphs, we present recent examples of computer-aided
humanization that go beyond simple sequence comparisons and visualization of a static
structure.
1071
Historically, a crystal structure or a homology model has been exploited in humanization
1072
procedures. However, proteins are dynamic molecules in solution,19 and a conformational
1073
ensemble is probably a better representation of a protein. An obstacle in traditional
1074
humanization work has been reducing or even diminishing the binding affinity after CDR
1075
grafting; this binding affinity has been interpreted as structural distortion of CDRs caused by
1076
incompatibility between the grafted CDRs and framework regions.210 MD simulations are 31
1077
among the best methods for assessing such dynamical effects on protein structures. For
1078
example, Zhang et al. employed MD simulations to assess mutational effects on
antibody structures during a humanization procedure.211 After in silico epitope scanning
based on sequence (6-residue) and spatial (2-residue pair in space) local similarities to human
antibodies, several residues were computationally identified as immunogenic. After replacing
those residues with the corresponding residues observed in human antibodies, they performed
5-ns MD simulations of the series of homology models of the variants (30 variants in total)
in explicit solvent. By using RMSD as a metric to assess CDR flexibilities, the authors
were able to design humanized variants that possessed binding affinity comparable to that of
the original rat antibody, as assessed experimentally by SPR and flow cytometry.
In another example, Kunert, Oostenbrink, and coworkers also sought to use
MD simulations to predict the effects of back-mutations on antibody structures.212,213 On the
assumption that variants with structures and dynamics comparable to those of the original
mouse antibody would show significant binding, a similarity score was developed based on
the RMSD of all atoms in CDR-H3 to quantify conformational differences between the
mouse antibody and humanized variants during the simulations. Starting from a crystal
structure of the mouse antibody or the variant models, MD simulations were performed for
~100 ns in explicit solvent. The correlation, albeit weak, between the similarity scores and
binding affinities experimentally measured by bio-layer interferometry (BLI) supported the
view that humanization procedures need to identify mutations that restore the
conformations of CDRs. The MD simulations further suggested a few mutations that seemed
to structurally support the conformation of CDR-H3 and thereby affected the binding
capability. These observations were experimentally verified via Ala scanning and BLI
measurements. As a result, starting from a humanized variant that had completely lost its
binding affinity for the antigen, the authors were able to restore the affinity to the level of the
original mouse antibody with some back-mutations selected by the MD simulations.
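
As a rough illustration of an RMSD-based comparison of this kind, the following Python sketch averages the RMSD of each simulation frame to a reference structure. It is not the similarity score of the cited studies, whose exact definition is not reproduced in this review; the function names and the toy coordinates are hypothetical, and the frames are assumed to have been superimposed on the reference beforehand.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal size")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def mean_rmsd_to_reference(trajectory, reference):
    """Average RMSD of every frame in a trajectory to one reference structure."""
    return sum(rmsd(frame, reference) for frame in trajectory) / len(trajectory)


# Toy data (hypothetical): three CDR-H3 atoms of a reference antibody and
# two pre-aligned frames taken from a simulation of a humanized variant.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
variant_traj = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0)],
]
score = mean_rmsd_to_reference(variant_traj, reference)
print(f"mean CDR-H3 RMSD to reference: {score:.2f}")
```

A lower value indicates that the loop of the variant stays closer to the reference conformation during the simulation. In practice, the coordinates would be extracted from an MD trajectory with a dedicated analysis package rather than typed in by hand.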

All the foregoing examples have focused on designs or back-mutations in framework
regions of the antibodies to restore binding. Another strategy to restore binding affinity is
to design CDRs so that they retain their conformations, i.e., to incorporate residues
observed in CDRs of human antibodies that are compatible with the human framework.
However, compared to framework regions, CDRs are hypervariable and are expected to
contribute directly to binding of the cognate antigens. It is therefore not straightforward to
identify such mutations empirically. In such a situation, computational protein design can be
a solution. In a study by Hanf et al.,214 protein design calculations were performed using two
programs: DEEK, which exploits dead-end elimination (DEE) and A* search algorithms, and
Dezymer, which also exploits the DEE algorithm.215 The calculations were used to
re-design sequences of CDRs of an antibody; the top recommendations from both programs
were merged to make the final list of designed variants. The initial structure for
computational design was a model structure of the CDR-grafted antibody that possessed
human germline framework regions. The structure was built from a crystal structure of the
original mouse antibody; the binding affinity of the CDR-grafted antibody measured by
ELISA was 100-fold worse than that of the original mouse antibody. For validation of the
computational design, eight suggested variants were experimentally synthesized, and two of
them exhibited binding affinities comparable to that of the wild type.
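
To make the idea of dead-end elimination concrete, the sketch below applies the widely used Goldstein criterion to a toy problem: a rotamer (or residue choice) r at position i is discarded if some alternative t at the same position is better no matter what the other positions adopt. This is a generic illustration under stated assumptions, not the implementation used in DEEK or Dezymer; the energy tables and the function name `goldstein_dee` are hypothetical.

```python
def goldstein_dee(self_energy, pair_energy):
    """Return the candidates per position that survive Goldstein DEE pruning.

    self_energy[i][r]         : energy of candidate r at position i
    pair_energy[(i, r, j, s)] : interaction energy, stored once with i < j
    """
    alive = {i: set(range(len(e))) for i, e in enumerate(self_energy)}

    def pair(i, r, j, s):
        # Pair energies are keyed with the lower position index first.
        return pair_energy[(i, r, j, s)] if i < j else pair_energy[(j, s, i, r)]

    changed = True
    while changed:  # repeat until a full pass eliminates nothing
        changed = False
        for i in alive:
            for r in sorted(alive[i]):
                for t in sorted(alive[i] - {r}):
                    # Best case for r relative to t over all surviving neighbors.
                    gap = self_energy[i][r] - self_energy[i][t]
                    for j in alive:
                        if j != i:
                            gap += min(pair(i, r, j, s) - pair(i, t, j, s)
                                       for s in alive[j])
                    if gap > 0:  # r can never beat t, so r is a dead end
                        alive[i].discard(r)
                        changed = True
                        break
    return alive


# Toy energies (hypothetical): two positions with two candidates each.
self_energy = [[0.0, 5.0], [0.0, 1.0]]
pair_energy = {
    (0, 0, 1, 0): -1.0, (0, 0, 1, 1): 0.0,
    (0, 1, 1, 0): 0.0, (0, 1, 1, 1): -2.0,
}
survivors = goldstein_dee(self_energy, pair_energy)
print(survivors)
```

In this toy case the pruning alone leaves a single candidate per position, which is the global minimum. For realistic design problems several candidates usually survive, and a search such as A* is then run over the reduced space.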

An advantage of incorporating protein design calculations into humanization procedures
is that, in addition to immunogenicity, one can take account of other properties, such as
stability. In accord with this line of reasoning, Bailey-Kellogg, Griswold, and coworkers have
been developing computational de-immunization methods for protein therapeutics.216–223
Instead of CDR grafting, their method relies on the identification of short stretches of
potential T-cell epitopes on protein structures and replaces the amino acids that form those
epitopes, so that the stretches have lower T-cell epitope propensities. For instance,
using a homology model of a mouse IgG1 antibody as a design target, Choi et al.221
employed 1) the HSC scores197 to assess the immunogenic regions and 2) the OSPREY
protein redesign software224 to replace some of the amino acids with amino acids observed in
germline sequences of human antibodies.225 The authors further demonstrated experimentally
that four of the eight humanized variants tested exhibited binding
affinities within an order of magnitude of that of the original mouse antibody, based on
BLI measurements.221 However, a variant designed by traditional CDR grafting could
not be expressed, probably because the CDR grafting introduced five mutations into the
Vernier zone,226 whereas their computational procedure, which takes energetics into
account, was able to retain all the Vernier zone residues. The importance of Vernier zone
residues in humanization has been indicated by previous experimental studies, in which the
reduction of binding affinities was attributed to displacements of VL/VH domains as well
as distortions of the canonical structures.227,228
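
The notion of scanning short sequence stretches for human-likeness can be sketched as follows. The code slides a 9-residue window along a query sequence and records, for each window, the best fractional identity to any 9-mer found in a set of human reference sequences. This is a deliberately simplified stand-in, not the published HSC score and not a T-cell epitope predictor; the window length, the cutoff, the function names, and the toy sequences are all assumptions made for illustration.

```python
def kmers(seq, k=9):
    """All overlapping windows of length k in a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def best_identity(peptide, references):
    """Highest fraction of identical positions against any reference window."""
    k = len(peptide)
    best = 0
    for ref in references:
        for window in kmers(ref, k):
            matches = sum(a == b for a, b in zip(peptide, window))
            best = max(best, matches)
    return best / k


def humanness_profile(seq, human_refs, k=9):
    """Per-window human-likeness (1.0 means an exact human 9-mer exists)."""
    return [best_identity(p, human_refs) for p in kmers(seq, k)]


def least_human_windows(seq, human_refs, k=9, cutoff=0.9):
    """Windows whose best identity falls below the (arbitrary) cutoff."""
    profile = humanness_profile(seq, human_refs, k)
    return [(i, seq[i:i + k]) for i, s in enumerate(profile) if s < cutoff]


# Toy example (hypothetical sequences): one substitution relative to the reference.
human_refs = ["EVQLVESGGGLVQPGG"]
query = "EVQLVESGGGLVKPGG"
profile = humanness_profile(query, human_refs)
flagged = least_human_windows(query, human_refs)
print(flagged)
```

Windows flagged in this way would be candidates for substitution, and a design program could then be asked to choose replacements that also preserve stability, which is the combination of objectives described above.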

In another instance, using a crystal structure of cetuximab, Choi et al. used DSF
measurements to show that their computational humanization method is able to increase
the HSC score and simultaneously improve the conformational stability of the antibody (∆Tm
= ~6.3 °C) while preserving the binding affinity to the antigen.222 Because these T-cell
epitope-based deimmunization methods are independent of CDR grafting, they can be
applied to other protein therapeutics, such as enzymes and peptides, as demonstrated in other
studies.229–233 The integrated methods developed by Bailey-Kellogg, Griswold, and
coworkers are in the public domain as the EpiSweep package.234
PERSPECTIVES
The predictive tools described here, together with accumulated knowledge of antibody
sequences, structures, and properties, should now enable rapid screening and selection of
antibodies during the early stages of antibody drug discovery. For drug discovery of small
compounds, various rapid screening approaches have been effectively employed, such as the
estimation of “druggability,” for rational design and evaluation of more potent compounds.235
For antibody drug discovery, to our knowledge, the first such metric is one proposed by
Kuroda et al.;8 based on 12 amino acid sequences of antibody therapeutics reported in the
DrugBank236 at that time, the authors pointed out that the antibody therapeutics tended to
have shorter lengths and more rigid conformations of CDR-H3. Furthermore, recent
advancements in experimental techniques have enabled high-throughput analyses for
physicochemical characterizations of antibody therapeutics; Wittrup and coworkers collected
amino acid sequences of 137 antibodies in the clinical stages (phase 2 and 3) and
experimentally characterized their physicochemical properties.90 Their dataset should guide
computational biologists toward the development of druggability metrics for antibody
therapeutics. For instance, Raybould et al. implemented the Therapeutic Antibody Profiler
(TAP) webserver to assess the druggability of antibodies from their amino acid sequences.237
Based on a statistical analysis of the 137 clinical-stage antibody therapeutics90 and the
comparison to human antibody repertoires,238 the authors found that the total length of CDRs,
surface hydrophobicity, charges in CDRs, and asymmetry of the surface net charges of FV
domains could be guidelines to assess the druggability of antibodies.
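
Two of the descriptors listed above, total CDR length and CDR charge, can be approximated directly from sequence, as in the minimal sketch below. It is not the TAP implementation, which works on structural models with its own definitions; here CDR boundaries are assumed to be given, histidine is treated as neutral, and the CDR sequences and function names are hypothetical.

```python
# Formal side-chain charges near neutral pH (simplifying assumption: His = 0).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(seq):
    """Sum of formal side-chain charges in a sequence."""
    return sum(CHARGE.get(aa, 0) for aa in seq)


def cdr_descriptors(cdrs):
    """Total CDR length and summed CDR net charge for a dict of CDR sequences."""
    return {
        "total_cdr_length": sum(len(s) for s in cdrs.values()),
        "cdr_net_charge": sum(net_charge(s) for s in cdrs.values()),
    }


# Toy heavy-chain CDRs (hypothetical sequences, boundaries assumed known).
cdrs = {"H1": "GFTFSDY", "H2": "SGSGGS", "H3": "ARDRGYFDY"}
descriptors = cdr_descriptors(cdrs)
print(descriptors)
```

Comparing such values against the distribution observed for clinical-stage antibodies is what turns a raw descriptor into a guideline; the reference distribution itself is not reproduced here.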

It is worth noting that the implementation of the TAP server and many of the other
studies described above have employed homology modeling of antibodies to derive various
parameters and to engineer better antibodies. Except for CDR-H3, antibody structures are
well conserved;2,4,239 these studies have therefore strongly suggested that current antibody
modeling techniques are reliable enough to be employed in high-throughput, sequence-based
in silico screening. However, structure prediction of CDR-H3 is still very challenging.8,9
Because the function of antibodies centers on the diversity of CDR-H3, to design a
functional antibody with better developability, methods for structure prediction of CDR-H3
as well as antibody–antigen complexes need to be improved. In particular, conformational
changes can occur upon antigen binding. In addition to CDR conformations, the relative
orientation of VL/VH domains can also change,228 making it even more challenging to predict
structures of antibodies and antibody–antigen complexes.240,241

Modeling conformational change or flexibility of proteins in silico is still an unsolved
problem; Kuroda and Gray previously demonstrated that the accuracy of current computational
methods for modeling protein backbone flexibility is not satisfactory and that subtle backbone
displacements can deteriorate the energy landscape of proteins in computational
modeling.242 Many studies have addressed the inter-relationships of protein flexibility,
aggregation, and chemical stability.243–245 Therefore, method development for modeling protein
flexibility is another important area in computer-aided antibody design.

Changes in binding affinity of antibody–antigen interactions, or in the binding free energy,
can be described by a thermodynamic equation relating enthalpy and entropy; favorable
enthalpic contributions are often attributed to the formation of new salt bridges or hydrogen
bonds, whereas favorable entropic contributions can be interpreted in terms of rigidification of
the antibodies themselves or changes in water dynamics at the interfaces.246–249 Thus, it is most
likely that there are multiple routes to improve binding affinity of antibody–antigen
interactions.250 An interesting strategy to improve binding affinity has also been suggested
based on mutations of framework regions that do not directly contact antigens yet affect
on-rates of antigen binding.251,252 Elucidating the molecular mechanism of such long-range
mutational effects will lead to a novel maturation strategy that has not been employed in our
immune systems.
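
The thermodynamic relation referred to in this paragraph is the standard one, ΔG = ΔH − TΔS, with ΔG = RT ln(K_D/c°) for a dissociation constant K_D and a standard concentration c° of 1 M. The short sketch below converts dissociation constants into free energies and splits a free energy into enthalpic and entropic terms. The numerical inputs are illustrative only and are not taken from any of the cited studies; the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature=298.15):
    """Binding free energy dG = RT ln(KD / 1 M), in kcal/mol."""
    return R_KCAL * temperature * math.log(kd_molar)


def ddg_from_kd(kd_variant, kd_reference, temperature=298.15):
    """Change in binding free energy of a variant relative to a reference."""
    return R_KCAL * temperature * math.log(kd_variant / kd_reference)


def entropic_term(delta_g, delta_h):
    """Entropic contribution -T*dS obtained from dG = dH - T*dS."""
    return delta_g - delta_h


# Illustrative numbers: a 1 nM binder and a variant with 100-fold weaker binding.
dg_reference = binding_free_energy(1e-9)
ddg = ddg_from_kd(1e-7, 1e-9)
minus_t_ds = entropic_term(dg_reference, delta_h=-15.0)  # assumed dH, kcal/mol
print(f"dG = {dg_reference:.1f}, ddG = {ddg:.1f}, -TdS = {minus_t_ds:.1f} kcal/mol")
```

On this scale, a 100-fold loss in affinity corresponds to roughly 2.7 kcal/mol at room temperature, which gives a sense of how small the energetic differences are that design calculations must resolve.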

De novo design of functional proteins is now becoming possible with guidance from
experimental procedures.253–256 For antibody design, the use of profile-based constraints
to design antibody sequences in a few successful studies254,257 has suggested that antibody
sequences have already been highly optimized in the evolutionary process (i.e., both in
mammalian evolution and somatic hypermutation). The imposition of selection pressure on
sequence design calculations by these profile-based constraints has forced the designed
sequences to mimic natural variations of antibody sequences. However, some properties such
as viscosity are specific to biopharmaceutical and biotechnological applications and would
not be selected for or against by evolution. Further studies on concentrated antibody solutions
and methods to optimize such properties are therefore highly desirable for understanding the
molecular basis of such properties.
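
One simple way to picture a profile-based constraint is as a per-position list of residues that occur with appreciable frequency in an alignment of natural sequences; a design is then restricted to those residues. The sketch below builds such a profile and checks a candidate against it. The exact form of the constraints in the cited studies is not described in this review, so the frequency threshold, the function names, and the toy alignment are assumptions.

```python
from collections import Counter


def build_profile(aligned_seqs):
    """Per-position residue frequencies from equal-length aligned sequences."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for pos in range(length):
        counts = Counter(s[pos] for s in aligned_seqs)
        profile.append({aa: n / len(aligned_seqs) for aa, n in counts.items()})
    return profile


def allowed_residues(profile, min_freq=0.2):
    """Residues permitted at each position under a frequency threshold."""
    return [{aa for aa, f in col.items() if f >= min_freq} for col in profile]


def violations(design, allowed):
    """Positions where a designed sequence leaves the allowed set."""
    return [i for i, aa in enumerate(design) if aa not in allowed[i]]


# Toy alignment (hypothetical four-residue fragments).
alignment = ["EVQL", "EVKL", "QVQL", "EVQL"]
allowed = allowed_residues(build_profile(alignment))
print(violations("EVWL", allowed), violations("QVKL", allowed))
```

Because such a constraint only encodes what natural repertoires already contain, it cannot by itself steer a design toward properties such as low viscosity, which is the limitation noted above.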

In recent decades, various properties of proteins have been predicted from amino acid
sequences or structures by machine learning, in which explicit design processes have not been
effectively implemented. Considering that de novo design of small drug compounds is still
not an easy task,258 de novo creation of functional amino acid sequences by machine learning
may not yet be feasible. However, for antibodies in particular, artificial antibodies have been
created using in vitro libraries.259 Considering recent advances in computational modeling
algorithms and our knowledge of antibody sequences and structures, de novo creation of
antibodies in silico could thus be achieved in the near future.

In this review, we outlined various approaches to engineer physicochemical and
biological properties of antibodies, in which amino acid sequences were modified mainly
through computational design. In addition to such mutation-based design approaches, another
way to control an antibody’s properties is to use chemical additives during manufacturing
processes.260,261 Computations could also play a role in understanding the molecular details
of interactions between proteins and such additives.

Taken together, a more precise and quantitative understanding of antibody properties would
make it possible to simultaneously optimize binding affinity, specificity, stability, viscosity,
and immunogenicity of the amino acid sequences by computational design.
ACKNOWLEDGMENTS
This work was funded in part by the Japan Society for the Promotion of Science (grant
numbers JP17K18113 and JP19H03522 to D.K., and JP16H02420 and JP19H05766 to K.T.)
and by the Japan Agency for Medical Research and Development (grant numbers
JP19fm0208022h, JP18ak0101100h, and JP19ak0101117h to D.K., and JP18am0101094j,
JP18dm0107064h, JP18mk0101081h, JP18fm0208030h, JP18fk0108073h, and
JP18ak0101100h to K.T.).
REFERENCES
1. Almagro JC, Teplyakov A, Luo J, et al. Second Antibody Modeling Assessment (AMA-II). Proteins Struct Funct Bioinforma. 2014;82(8):1553-1562. doi:10.1002/prot.24567
2. Al-Lazikani B, Lesk AM, Chothia C. Standard conformations for the canonical structures of immunoglobulins. J Mol Biol. 1997;273(4):927-948. doi:10.1006/jmbi.1997.1354
3. Chothia C, Lesk AM, Tramontano A, et al. Conformations of immunoglobulin hypervariable regions. Nature. 1989;342(6252):877-883. doi:10.1038/342877a0
4. Kuroda D, Shirai H, Kobori M, Nakamura H. Systematic classification of CDR-L3 in antibodies: Implications of the light chain subtypes and the VL-VH interface. Proteins Struct Funct Bioinforma. 2009;75(1):139-146. doi:10.1002/prot.22230
5. North B, Lehmann A, Dunbrack RL. A new clustering of antibody CDR loop conformations. J Mol Biol. 2011;406(2):228-256. doi:10.1016/j.jmb.2010.10.030
6. Shirai H, Kidera A, Nakamura H. Structural classification of CDR-H3 in antibodies. FEBS Lett. 1996;399(1-2):1-8. doi:10.1016/S0014-5793(96)01252-5
7. Shirai H, Kidera A, Nakamura H. H3-rules: identification of CDR-H3 structures in antibodies. FEBS Lett. 1999;455(1-2):188-197. doi:10.1016/S0014-5793(99)00821-2
8. Kuroda D, Shirai H, Kobori M, Nakamura H. Structural classification of CDR-H3 revisited: A lesson in antibody modeling. Proteins Struct Funct Bioinforma. 2008;73(3):608-620. doi:10.1002/prot.22087
9. Weitzner BD, Dunbrack RL, Gray JJ. The origin of CDR H3 structural diversity. Structure. 2015;23(2):302-311. doi:10.1016/j.str.2014.11.010
10. Kuroda D, Tsumoto K. Antibody Affinity Maturation by Computational Design. In: Methods in Molecular Biology. 2018:15-34. doi:10.1007/978-1-4939-8648-4_2
11. Rabia LA, Desai AA, Jhajj HS, Tessier PM. Understanding and overcoming trade-offs between antibody affinity, specificity, stability and solubility. Biochem Eng J. 2018;137:365-374. doi:10.1016/j.bej.2018.06.003
12. Kuroda D, Shirai H, Jacobson MP, Nakamura H. Computer-aided antibody design. Protein Eng Des Sel. 2012;25(10):507-521. doi:10.1093/protein/gzs024
13. Sevy AM, Meiler J. Antibodies: Computer-Aided Prediction of Structure and Design of Function. Microbiol Spectr. 2014;2(6):1-14. doi:10.1128/microbiolspec.AID-0024-2014
14. Fischman S, Ofran Y. Computational design of antibodies. Curr Opin Struct Biol. 2018;51:156-162. doi:10.1016/j.sbi.2018.04.007
15. Roy A, Nair S, Sen N, Soni N, Madhusudhan MS. In silico methods for design of biological therapeutics. Methods. 2017;131:33-65. doi:10.1016/j.ymeth.2017.09.008
16. Norman RA, Ambrosetti F, Bonvin AMJJ, et al. Computational approaches to therapeutic antibody design: established methods and emerging trends. Brief Bioinform. October 2019. doi:10.1093/bib/bbz095
17. Zhao J, Nussinov R, Wu W-J, Ma B. In Silico Methods in Antibody Design. Antibodies. 2018;7(3):22. doi:10.3390/antib7030022
18. Kazlauskas R. Engineering more stable proteins. Chem Soc Rev. 2018;47(24):9026-9045. doi:10.1039/C8CS00014J
19. Boehr DD, Nussinov R, Wright PE. The role of dynamic conformational ensembles in biomolecular recognition. Nat Chem Biol. 2009;5(11):789-796. doi:10.1038/nchembio.232
20. Meric G, Robinson AS, Roberts CJ. Driving Forces for Nonnative Protein Aggregation and Approaches to Predict Aggregation-Prone Regions. Annu Rev Chem Biomol Eng. 2017;8(1):139-159. doi:10.1146/annurev-chembioeng-060816-101404
21. Ventura S, Zurdo J, Narayanan S, et al. Short amino acid stretches can mediate amyloid formation in globular proteins: The Src homology 3 (SH3) case. Proc Natl Acad Sci. 2004;101(19):7258-7263. doi:10.1073/pnas.0308249101
22. Tsumoto K, Ejima D, Kumagai I, Arakawa T. Practical considerations in refolding proteins from inclusion bodies. Protein Expr Purif. 2003;28(1):1-8. doi:10.1016/S1046-5928(02)00641-1
23. Tsumoto K, Umetsu M, Kumagai I, Ejima D, Philo JS, Arakawa T. Role of arginine in protein refolding, solubilization, and purification. Biotechnol Prog. 2004;20(5):1301-1308. doi:10.1021/bp0498793
24. Kumar S, Plotnikov NV, Rouse JC, Singh SK. Biopharmaceutical Informatics: supporting biologic drug development via molecular modelling and informatics. J Pharm Pharmacol. 2018;70(5):595-608. doi:10.1111/jphp.12700
25. Lu X, Nobrega RP, Lynaugh H, et al. Deamidation and isomerization liability analysis of 131 clinical-stage antibodies. MAbs. 2019;11(1):45-57. doi:10.1080/19420862.2018.1548233
26. Nowak C, Cheung JK, Dellatore SM, et al. Forced degradation of recombinant monoclonal antibodies: A practical guide. MAbs. 2017;9(8):1217-1230. doi:10.1080/19420862.2017.1368602
27. Tomar DS, Kumar S, Singh SK, Goswami S, Li L. Molecular basis of high viscosity in concentrated antibody solutions: Strategies for high concentration drug product development. MAbs. 2016;8(2):216-228. doi:10.1080/19420862.2015.1128606
28. Grünberger A, Lai P-K, Blanco MA, Roberts CJ. Coarse-Grained Modeling of Protein Second Osmotic Virial Coefficients: Sterics and Short-Ranged Attractions. J Phys Chem B. 2013;117(3):763-770. doi:10.1021/jp308234j
29. Blanco MA, Sahin E, Robinson AS, Roberts CJ. Coarse-Grained Model for Colloidal Protein Interactions, B22, and Protein Cluster Formation. J Phys Chem B. 2013;117(50):16013-16028. doi:10.1021/jp409300j
30. Tomar DS, Singh SK, Li L, Broulidakis MP, Kumar S. In Silico Prediction of Diffusion Interaction Parameter (kD), a Key Indicator of Antibody Solution Behaviors. Pharm Res. 2018;35(10):193. doi:10.1007/s11095-018-2466-6
31. Hwang WYK, Foote J. Immunogenicity of engineered antibodies. Methods. 2005;36(1):3-10. doi:10.1016/j.ymeth.2005.01.001
32. Rouet R, Lowe D, Christ D. Stability engineering of the human antibody repertoire. FEBS Lett. 2014;588(2):269-277. doi:10.1016/j.febslet.2013.11.029
33. Dahiyat BI, Mayo SL. De novo protein design: fully automated sequence selection. Science. 1997;278(5335):82-87. doi:10.1126/science.278.5335.82
34. Kuhlman B, Baker D. Native protein sequences are close to optimal for their structures. Proc Natl Acad Sci U S A. 2000;97(19):10383-10388. doi:97/19/10383 [pii]
35. Cao H, Wang J, He L, Qi Y, Zhang JZ. DeepDDG: Predicting the Stability Change of Protein Point Mutations Using Neural Networks. J Chem Inf Model. 2019. doi:10.1021/acs.jcim.8b00697
36. Dehouck Y, Kwasigroch JM, Gilis D, Rooman M. PoPMuSiC 2.1: a web server for the estimation of protein stability changes upon mutation and sequence optimality. BMC Bioinformatics. 2011;12(1):151. doi:10.1186/1471-2105-12-151
37. Pandurangan AP, Ochoa-Montaño B, Ascher DB, Blundell TL. SDM: a server for predicting effects of mutations on protein stability. Nucleic Acids Res. 2017;45(W1):W229-W235. doi:10.1093/nar/gkx439
38. Folkman L, Stantic B, Sattar A, Zhou Y. EASE-MM: Sequence-Based Prediction of Mutation-Induced Stability Changes with Feature-Based Multiple Models. J Mol Biol. 2016;428(6):1394-1405. doi:10.1016/j.jmb.2016.01.012
39. Pires DEV, Ascher DB, Blundell TL. mCSM: predicting the effects of mutations in proteins using graph-based signatures. Bioinformatics. 2014;30(3):335-342. doi:10.1093/bioinformatics/btt691
40. Capriotti E, Fariselli P, Rossi I, Casadio R. A three-state prediction of single point mutations on protein stability changes. BMC Bioinformatics. 2008;9(Suppl 2):S6. doi:10.1186/1471-2105-9-S2-S6
41. Quan L, Lv Q, Zhang Y. STRUM: structure-based prediction of protein stability changes upon single-point mutation. Bioinformatics. 2016;32(19):2936-2946. doi:10.1093/bioinformatics/btw361
42. Cheng J, Randall A, Baldi P. Prediction of protein stability changes for single-site mutations using support vector machines. Proteins Struct Funct Bioinforma. 2005;62(4):1125-1132. doi:10.1002/prot.20810
43. Pucci F, Kwasigroch JM, Rooman M. SCooP: an accurate and fast predictor of protein stability curves as a function of temperature. Valencia A, ed. Bioinformatics. 2017;33(21):3415-3422. doi:10.1093/bioinformatics/btx417
44. Gapsys V, Michielssens S, Seeliger D, de Groot BL. pmx: Automated protein structure and topology generation for alchemical perturbations. J Comput Chem. 2015;36(5):348-354. doi:10.1002/jcc.23804
45. Gapsys V, Michielssens S, Seeliger D, de Groot BL. Accurate and Rigorous Prediction of the Changes in Protein Free Energies in a Large-Scale Mutation Scan. Angew Chemie - Int Ed. 2016;55(26):7364-7368. doi:10.1002/anie.201510054
46. Wang L, Wu Y, Deng Y, et al. Accurate and Reliable Prediction of Relative Ligand Binding Potency in Prospective Drug Discovery by Way of a Modern Free-Energy Calculation Protocol and Force Field. J Am Chem Soc. 2015;137(7):2695-2703. doi:10.1021/ja512751q
47. Steinbrecher T, Zhu C, Wang L, et al. Predicting the Effect of Amino Acid Single-Point Mutations on Protein Stability—Large-Scale Validation of MD-Based Relative Free Energy Calculations. J Mol Biol. 2017;429(7):948-963. doi:10.1016/j.jmb.2016.12.007
48. Potapov V, Cohen M, Schreiber G. Assessing computational methods for predicting protein stability upon mutation: good on average but not in the details. Protein Eng Des Sel. 2009;22(9):553-560. doi:10.1093/protein/gzp030
49. Montanucci L, Martelli PL, Ben-Tal N, Fariselli P. A natural upper bound to the accuracy of predicting protein stability changes upon mutations. Valencia A, ed. Bioinformatics. 2019;35(9):1513-1517. doi:10.1093/bioinformatics/bty880
50. Best RB, Hummer G, Eaton WA. Native contacts determine protein folding mechanisms in atomistic simulations. Proc Natl Acad Sci. 2013;110(44):17874-17879. doi:10.1073/pnas.1311599110
51. Bekker G-J, Ma B, Kamiya N. Thermal stability of single-domain antibodies estimated by molecular dynamics simulations. Protein Sci. 2019;28(2):429-438. doi:10.1002/pro.3546
52. Zabetakis D, Shriver-Lake LC, Olson MA, Goldman ER, Anderson GP. Experimental evaluation of single-domain antibodies predicted by molecular dynamics simulations to have elevated thermal stability. Protein Sci. July 2019:pro.3692. doi:10.1002/pro.3692
53. Agostini F, Vendruscolo M, Tartaglia GG. Sequence-Based Prediction of Protein Solubility. J Mol Biol. 2012;421(2-3):237-241. doi:10.1016/j.jmb.2011.12.005
54. Sormanni P, Aprile FA, Vendruscolo M. The CamSol Method of Rational Design of Protein Mutants with Enhanced Solubility. J Mol Biol. 2015;427(2):478-490. doi:10.1016/j.jmb.2014.09.026
55. Navarro S, Ventura S. Computational re-design of protein structures to improve solubility. Expert Opin Drug Discov. 2019;14(10):1077-1088. doi:10.1080/17460441.2019.1637413
56. Agrawal NJ, Kumar S, Wang X, Helk B, Singh SK, Trout BL. Aggregation in Protein-Based Biotherapeutics: Computational Studies and Tools to Identify Aggregation-Prone Regions. J Pharm Sci. 2011;100(12):5081-5095. doi:10.1002/jps.22705
57. Buck PM, Kumar S, Wang X, Agrawal NJ, Trout BL, Singh SK. Computational Methods to Predict Therapeutic Protein Aggregation. In: Methods in Molecular Biology. 2012:425-451. doi:10.1007/978-1-61779-921-1_26
58. Fink AL. Protein aggregation: folding aggregates, inclusion bodies and amyloid. Fold Des. 1998;3(1):R9-R23. doi:10.1016/S1359-0278(98)00002-9
59. Blancas-Mejia LM, Misra P, Dick CJ, et al. Immunoglobulin light chain amyloid aggregation. Chem Commun. 2018;54(76):10664-10674. doi:10.1039/C8CC04396E
60. David MPC, Concepcion GP, Padlan EA. Using simple artificial intelligence methods for predicting amyloidogenesis in antibodies. BMC Bioinformatics. 2010;11(1):79. doi:10.1186/1471-2105-11-79
61. Liaw C, Tung C-W, Ho S-Y. Prediction and Analysis of Antibody Amyloidogenesis from Sequences. Isalan M, ed. PLoS One. 2013;8(1):e53235. doi:10.1371/journal.pone.0053235
62. Fernandez-Escamilla A-M, Rousseau F, Schymkowitz J, Serrano L. Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins. Nat Biotechnol. 2004;22(10):1302-1306. doi:10.1038/nbt1012
63. Tartaglia GG, Cavalli A, Pellarin R, Caflisch A. Prediction of aggregation rate and aggregation-prone segments in polypeptide sequences. Protein Sci. 2005;14(10):2723-2734. doi:10.1110/ps.051471205
64. Maurer-Stroh S, Debulpaep M, Kuemmerer N, et al. Exploring the sequence determinants of amyloid structure using position-specific scoring matrices. Nat Methods. 2010;7(3):237-242. doi:10.1038/nmeth.1432
65. Walsh I, Seno F, Tosatto SCE, Trovato A. PASTA 2.0: an improved server for protein aggregation prediction. Nucleic Acids Res. 2014;42(W1):W301-W307. doi:10.1093/nar/gku399
66. Tartaglia GG, Vendruscolo M. The Zyggregator method for predicting protein aggregation propensities. Chem Soc Rev. 2008;37(7):1395-1401. doi:10.1039/b706784b
67. Kuriata A, Iglesias V, Pujols J, Kurcinski M, Kmiecik S, Ventura S. Aggrescan3D (A3D) 2.0: prediction and engineering of protein solubility. Nucleic Acids Res. 2019;47(W1):W300-W307. doi:10.1093/nar/gkz321
68. Stanislawski J, Kotulska M, Unold O. Machine learning methods can replace 3D profile method in classification of amyloidogenic hexapeptides. BMC Bioinformatics. 2013;14(1):21. doi:10.1186/1471-2105-14-21
69. Fang Y, Gao S, Tai D, Middaugh CR, Fang J. Identification of properties important to protein aggregation using feature selection. BMC Bioinformatics. 2013;14(1):314. doi:10.1186/1471-2105-14-314
70. Gasior P, Kotulska M. FISH Amyloid – a new method for finding amyloidogenic segments in proteins based on site specific co-occurence of aminoacids. BMC Bioinformatics. 2014;15(1):54. doi:10.1186/1471-2105-15-54
71. Thangakani AM, Kumar S, Nagarajan R, Velmurugan D, Gromiha MM. GAP: towards almost 100 percent prediction for β-strand-mediated aggregating peptides with distinct morphologies. Bioinformatics. 2014;30(14):1983-1990. doi:10.1093/bioinformatics/btu167
72. Família C, Dennison SR, Quintas A, Phoenix DA. Prediction of Peptide and Protein Propensity for Amyloid Formation. Permyakov EA, ed. PLoS One. 2015;10(8):e0134679. doi:10.1371/journal.pone.0134679
73. Niu M, Li Y, Wang C, Han K. RFAmyloid: A Web Server for Predicting Amyloid Proteins. Int J Mol Sci. 2018;19(7):2071. doi:10.3390/ijms19072071
74. Han X, Wang X, Zhou K. Develop machine learning based regression predictive models for engineering protein solubility. Valencia A, ed. Bioinformatics. 2019;in press. doi:10.1093/bioinformatics/btz294
75. Hou Q, Kwasigroch JM, Rooman M, Pucci F. SOLart: a structure-based method to predict protein solubility and aggregation. Valencia A, ed. Bioinformatics. October 2019. doi:10.1093/bioinformatics/btz773
76. Hou Q, Bourgeas R, Pucci F, Rooman M. Computational analysis of the amino acid interactions that promote or decrease protein solubility. Sci Rep. 2018;8(1):14661. doi:10.1038/s41598-018-32988-w
77. Niwa T, Ying B-W, Saito K, et al. Bimodal protein solubility distribution revealed by an aggregation analysis of the entire ensemble of Escherichia coli proteins. Proc Natl Acad Sci. 2009;106(11):4201-4206. doi:10.1073/pnas.0811922106
78. Chan P, Curtis RA, Warwicker J. Soluble expression of proteins correlates with a lack of positively-charged surface. Sci Rep. 2013;3(1):3333. doi:10.1038/srep03333
79. Warwicker J, Charonis S, Curtis RA. Lysine and Arginine Content of Proteins: Computational Analysis Suggests a New Tool for Solubility Design. Mol Pharm. 2014;11(1):294-303. doi:10.1021/mp4004749
80. Hebditch M, Carballo-Amador MA, Charonis S, Curtis R, Warwicker J. Protein–Sol: a web tool for predicting protein solubility from sequence. Valencia A, ed. Bioinformatics. 2017;33(19):3098-3100. doi:10.1093/bioinformatics/btx345
81. Austerberry JI, Thistlethwaite A, Fisher K, et al. Arginine to Lysine Mutations Increase the Aggregation Stability of a Single-Chain Variable Fragment through Unfolded-State Interactions. Biochemistry. 2019;58(32):3413-3421. doi:10.1021/acs.biochem.9b00367
82. Hebditch M, Warwicker J. Web-based display of protein surface and pH-dependent properties for assessing the developability of biotherapeutics. Sci Rep. 2019;9(1):1969. doi:10.1038/s41598-018-36950-8
83. Wang X, Singh SK, Kumar S. Potential Aggregation-Prone Regions in Complementarity-Determining Regions of Antibodies and Their Contribution Towards Antigen Recognition: A Computational Analysis. Pharm Res. 2010;27(8):1512-1529. doi:10.1007/s11095-010-0143-5
84. Chennamsetty N, Voynov V, Kayser V, Helk B, Trout BL. Design of therapeutic proteins with enhanced stability. Proc Natl Acad Sci U S A. 2009;106(29):11937-11942. doi:10.1073/pnas.0904191106
85. Lauer TM, Agrawal NJ, Chennamsetty N, Egodage K, Helk B, Trout BL. Developability index: A rapid in silico tool for the screening of antibody aggregation propensity. J Pharm Sci. 2012;101(1):102-115. doi:10.1002/jps.22758
86. Sankar K, Krystek SR, Carl SM, Day T, Maier JKX. AggScore: Prediction of aggregation-prone regions in proteins based on the distribution of surface patches. Proteins Struct Funct Bioinforma. 2018;86(11):1147-1156. doi:10.1002/prot.25594
87. Trainor K, Gingras Z, Shillingford C, et al. Ensemble Modeling and Intracellular Aggregation of an Engineered Immunoglobulin-Like Domain. J Mol Biol. 2016;428(6):1365-1374. doi:10.1016/j.jmb.2016.02.016
88. Conchillo-Solé O, de Groot NS, Avilés FX, Vendrell J, Daura X, Ventura S. AGGRESCAN: a server for the prediction and evaluation of “hot spots” of aggregation in polypeptides. BMC Bioinformatics. 2007;8(1):65. doi:10.1186/1471-2105-8-65
89. Beerten J, Van Durme J, Gallardo R, et al. WALTZ-DB: a benchmark database of amyloidogenic hexapeptides. Bioinformatics. 2015;31(10):1698-1700. doi:10.1093/bioinformatics/btv027
90. Jain T, Sun T, Durand S, et al. Biophysical properties of the clinical-stage antibody landscape. Proc Natl Acad Sci. 2017;114(5):944-949. doi:10.1073/pnas.1616408114
91. de Groot NS, Castillo V, Graña-Montes R, Ventura S. AGGRESCAN: Method, Application, and Perspectives for Drug Design. In: Methods in Molecular Biology. 2012:199-220. doi:10.1007/978-1-61779-465-0_14
92. Saerens D, Pellis M, Loris R, et al. Identification of a Universal VHH Framework to Graft Non-canonical Antigen-binding Loops of Camel Single-domain Antibodies. J Mol Biol. 2005;352(3):597-607. doi:10.1016/j.jmb.2005.07.038
93. Vincke C, Loris R, Saerens D, Martinez-Rodriguez S, Muyldermans S, Conrath K. General Strategy to Humanize a Camelid Single-domain Antibody and Identification of a Universal Humanized Nanobody Scaffold. J Biol Chem. 2009;284(5):3273-3284. doi:10.1074/jbc.M806889200
94. Soler MA, de Marco A, Fortuna S. Molecular dynamics simulations and docking enable to explore the biophysical factors controlling the yields of engineered nanobodies. Sci Rep. 2016;6(1):34869. doi:10.1038/srep34869
95. Sobolev V, Eyal E, Gerzon S, et al. SPACE: a suite of tools for protein structure prediction and analysis based on complementarity and environment. Nucleic Acids Res. 2005;33(Web Server):W39-W43. doi:10.1093/nar/gki398
96. Negi SS, Schein CH, Oezguen N, Power TD, Braun W. InterProSurf: a web server for predicting interacting sites on protein surfaces. Bioinformatics. 2007;23(24):3397-3399. doi:10.1093/bioinformatics/btm474
97. van Zundert GCP, Rodrigues JPGLM, Trellet M, et al. The HADDOCK2.2 Web Server: User-Friendly Integrative Modeling of Biomolecular Complexes. J Mol Biol. 2016;428(4):720-725. doi:10.1016/j.jmb.2015.09.014
98. Hertig S, Latorraca NR, Dror RO. Revealing Atomic-Level Mechanisms of Protein Allostery with Molecular Dynamics Simulations. Liu J, ed. PLOS Comput Biol. 2016;12(6):e1004746. doi:10.1371/journal.pcbi.1004746
99. Obrezanova O, Arnell A, de la Cuesta RG, et al. Aggregation risk prediction for antibodies and its application to biotherapeutic development. MAbs. 2015;7(2):352-363. doi:10.1080/19420862.2015.1007828
1524
100. Sydow JF, Lipsmeier F, Larraillet V, et al. Structure-Based Prediction of Asparagine and Aspartate Degradation Sites in Antibody Variable Regions. Dübel S, ed. PLoS One. 2014;9(6):e100736. doi:10.1371/journal.pone.0100736
101. Agrawal NJ, Dykstra A, Yang J, et al. Prediction of the Hydrogen Peroxide–Induced Methionine Oxidation Propensity in Monoclonal Antibodies. J Pharm Sci. 2018;107(5):1282-1289. doi:10.1016/j.xphs.2018.01.002
102. Yang R, Jain T, Lynaugh H, et al. Rapid assessment of oxidation via middle-down LCMS correlates with methionine side-chain solvent-accessible surface area for 121 clinical stage monoclonal antibodies. MAbs. 2017;9(4):646-653. doi:10.1080/19420862.2017.1290753
103. Lorenzo JR, Alonso LG, Sánchez IE. Prediction of Spontaneous Protein Deamidation from Sequence-Derived Secondary Structure and Intrinsic Disorder. Lisacek F, ed. PLoS One. 2015;10(12):e0145186. doi:10.1371/journal.pone.0145186
104. Plotnikov NV, Singh SK, Rouse JC, Kumar S. Quantifying the Risks of Asparagine Deamidation and Aspartate Isomerization in Biopharmaceuticals by Computing Reaction Free-Energy Surfaces. J Phys Chem B. 2017;121(4):719-730. doi:10.1021/acs.jpcb.6b11614
105. Jia L, Sun Y. Protein asparagine deamidation prediction based on structures with machine learning methods. de Brevern AG, ed. PLoS One. 2017;12(7):e0181347. doi:10.1371/journal.pone.0181347
106. Yan Q, Huang M, Lewis MJ, Hu P. Structure Based Prediction of Asparagine Deamidation Propensity in Monoclonal Antibodies. MAbs. 2018;10(6):901-912. doi:10.1080/19420862.2018.1478646
107. Delmar JA, Wang J, Choi SW, Martins JA, Mikhail JP. Machine Learning Enables Accurate Prediction of Asparagine Deamidation Probability and Rate. Mol Ther - Methods Clin Dev. 2019;15:264-274. doi:10.1016/j.omtm.2019.09.008
108. Chennamsetty N, Quan Y, Nashine V, Sadineni I, Lyngberg O, Krystek S. Modeling the Oxidation of Methionine Residues by Peroxides in Proteins. J Pharm Sci. 2015;104(4):1246-1255. doi:10.1002/jps.24340
109. Sankar K, Hoi KH, Yin Y, et al. Prediction of methionine oxidation risk in monoclonal antibodies using a machine learning method. MAbs. 2018;10(8):1281-1290. doi:10.1080/19420862.2018.1518887
110. Aledo JC, Cantón FR, Veredas FJ. A machine learning approach for predicting methionine oxidation sites. BMC Bioinformatics. 2017;18(1):430. doi:10.1186/s12859-017-1848-9
111. Moal IH, Fernández-Recio J. SKEMPI: A Structural Kinetic and Energetic database of Mutant Protein Interactions and its use in empirical models. Bioinformatics. 2012;28(20):2600-2607. doi:10.1093/bioinformatics/bts489
112. Sirin S, Apgar JR, Bennett EM, Keating AE. AB-Bind: Antibody binding mutational database for computational affinity predictions. Protein Sci. 2016;25(2):393-409. doi:10.1002/pro.2829
113. Vreven T, Moal IH, Vangone A, et al. Updates to the Integrated Protein–Protein Interaction Benchmarks: Docking Benchmark Version 5 and Affinity Benchmark Version 2. J Mol Biol. 2015;427(19):3031-3041. doi:10.1016/j.jmb.2015.07.016
114. Bava KA, Gromiha MM, Uedaira H, Kitajima K, Sarai A. ProTherm, version 4.0: thermodynamic database for proteins and mutants. Nucleic Acids Res. 2004;32(Database issue):D120-1. doi:10.1093/nar/gkh082
115. Thompson MJ, Sievers SA, Karanicolas J, Ivanova MI, Baker D, Eisenberg D. The 3D profile method for identifying fibril-forming segments of proteins. Proc Natl Acad Sci. 2006;103(11):4074-4078. doi:10.1073/pnas.0511295103
116. Pawlicki S, Le Béchec A, Delamarche C. AMYPdb: A database dedicated to amyloid precursor proteins. BMC Bioinformatics. 2008;9(1):273. doi:10.1186/1471-2105-9-273
117. Varadi M, De Baets G, Vranken WF, Tompa P, Pancsa R. AmyPro: a database of proteins with validated amyloidogenic regions. Nucleic Acids Res. 2018;46(D1):D387-D392. doi:10.1093/nar/gkx950
118. Thangakani AM, Nagarajan R, Kumar S, Sakthivel R, Velmurugan D, Gromiha MM. CPAD, Curated Protein Aggregation Database: A Repository of Manually Curated Experimental Data on Protein and Peptide Aggregation. Zheng J, ed. PLoS One. 2016;11(4):e0152949. doi:10.1371/journal.pone.0152949
119. Leman JK, Weitzner BD, Lewis SM, Consortium R, Bonneau R. Macromolecular modeling and design in Rosetta: new methods and frameworks. Preprints. 2019. doi:10.20944/preprints201904.0263.v1
120. Guerois R, Nielsen JE, Serrano L. Predicting Changes in the Stability of Proteins and Protein Complexes: A Study of More Than 1000 Mutations. J Mol Biol. 2002;320(2):369-387. doi:10.1016/S0022-2836(02)00442-4
121. Wijma HJ, Floor RJ, Janssen DB. Structure- and sequence-analysis inspired engineering of proteins for enhanced thermostability. Curr Opin Struct Biol. 2013;23(4):588-594. doi:10.1016/j.sbi.2013.04.008
122. Broom A, Jacobi Z, Trainor K, Meiering EM. Computational tools help improve protein stability but with a solubility tradeoff. J Biol Chem. 2017;292(35):14349-14361. doi:10.1074/jbc.M117.784165
123. Fowler DM, Fields S. Deep mutational scanning: a new style of protein science. Nat Methods. 2014;11(8):801-807. doi:10.1038/nmeth.3027
124. Koenig P, Lee CV, Sanowar S, et al. Deep Sequencing-guided Design of a High Affinity Dual Specificity Antibody to Target Two Angiogenic Factors in Neovascular Age-related Macular Degeneration. J Biol Chem. 2015;290(36):21773-21786. doi:10.1074/jbc.M115.662783
125. Wang S, Liu M, Zeng D, et al. Increasing stability of antibody via antibody engineering: Stability engineering on an anti-hVEGF. Proteins Struct Funct Bioinforma. 2014;82(10):2620-2630. doi:10.1002/prot.24626
126. Sivasubramanian A, Sircar A, Chaudhury S, Gray JJ. Toward high-resolution homology modeling of antibody Fv regions and application to antibody-antigen docking. Proteins Struct Funct Bioinforma. 2009;74(2):497-514. doi:10.1002/prot.22309
127. Chen R, Li L, Weng Z. ZDOCK: An initial-stage protein-docking algorithm. Proteins Struct Funct Genet. 2003;52(1):80-87. doi:10.1002/prot.10389
128. Sircar A, Gray JJ. SnugDock: Paratope Structural Optimization during Antibody-Antigen Docking Compensates for Errors in Antibody Homology Models. Kortemme T, ed. PLoS Comput Biol. 2010;6(1):e1000644. doi:10.1371/journal.pcbi.1000644
129. Chan C-H, Liang H-K, Hsiao N-W, Ko M-T, Lyu P-C, Hwang J-K. Relationship between local structural entropy and protein thermostability. Proteins Struct Funct Bioinforma. 2004;57(4):684-691. doi:10.1002/prot.20263
130. Su JG, Li CH, Hao R, Chen WZ, Xin Wang C. Protein Unfolding Behavior Studied by Elastic Network Model. Biophys J. 2008;94(12):4586-4596. doi:10.1529/biophysj.107.121665
131. Zhang C, Samad M, Yu H, Chakroun N, Hilton D, Dalby PA. Computational Design To Reduce Conformational Flexibility and Aggregation Rates of an Antibody Fab Fragment. Mol Pharm. 2018;15(8):3079-3092. doi:10.1021/acs.molpharmaceut.8b00186
132. Lee J, Der BS, Karamitros CS, et al. Computer‐based engineering of thermostabilized antibody fragments. AIChE J. November 2019:in press. doi:10.1002/aic.16864
133. Dudgeon K, Rouet R, Kokmeijer I, et al. General strategy for the generation of human antibody variable domains with increased aggregation resistance. Proc Natl Acad Sci. 2012;109(27):10879-10884. doi:10.1073/pnas.1202866109
134. Sakhnini LI, Greisen PJ, Wiberg C, et al. Improving the Developability of an Antigen Binding Fragment by Aspartate Substitutions. Biochemistry. 2019;58(24):2750-2759. doi:10.1021/acs.biochem.9b00251
135. Lawrence MS, Phillips KJ, Liu DR. Supercharging proteins can impart unusual resilience. J Am Chem Soc. 2007;129(33):10110-10112. doi:10.1021/ja071641y
136. Der BS, Kluwe C, Miklos AE, et al. Alternative Computational Protocols for Supercharging Protein Surfaces for Reversible Unfolding and Retention of Stability. Salsbury Jr F, ed. PLoS One. 2013;8(5):e64363. doi:10.1371/journal.pone.0064363
137. Miklos AE, Kluwe C, Der BS, et al. Structure-based design of supercharged, highly thermoresistant antibodies. Chem Biol. 2012;19(4):449-455. doi:10.1016/j.chembiol.2012.01.018
138. Bruce VJ, Lopez-Islas M, McNaughton BR. Resurfaced cell-penetrating nanobodies: A potentially general scaffold for intracellularly targeted protein discovery. Protein Sci. 2016;25(6):1129-1137. doi:10.1002/pro.2926
139. Courtois F, Schneider CP, Agrawal NJ, Trout BL. Rational Design of Biobetters with Enhanced Stability. J Pharm Sci. 2015;104(8):2433-2440. doi:10.1002/jps.24520
140. Courtois F, Agrawal NJ, Lauer TM, Trout BL. Rational design of therapeutic mAbs against aggregation through protein engineering and incorporation of glycosylation motifs applied to bevacizumab. MAbs. 2016;8(1):99-112. doi:10.1080/19420862.2015.1112477
141. Clark RH, Latypov RF, De Imus C, et al. Remediating agitation-induced antibody aggregation by eradicating exposed hydrophobic motifs. MAbs. 2014;6(6):1540-1550. doi:10.4161/mabs.36252
142. Ejima D, Tsumoto K, Fukada H, et al. Effects of acid exposure on the conformation, stability, and aggregation of monoclonal antibodies. Proteins Struct Funct Bioinforma. 2006;66(4):954-962. doi:10.1002/prot.21243
143. Skamris T, Tian X, Thorolfsson M, et al. Monoclonal Antibodies Follow Distinct Aggregation Pathways During Production-Relevant Acidic Incubation and Neutralization. Pharm Res. 2016;33(3):716-728. doi:10.1007/s11095-015-1821-0
144. Arakawa T, Kita Y, Carpenter JF. Protein-solvent interactions in pharmaceutical formulations. Pharm Res. 1991;8(3):285-291. doi:10.1023/a:1015825027737
145. Arakawa T, Kita Y. Protection of Bovine Serum Albumin from Aggregation by Tween 80. J Pharm Sci. 2000;89(5):646-651. doi:10.1002/(SICI)1520-6017(200005)89:5<646::AID-JPS10>3.0.CO;2-J
146. Arakawa T, Kita Y. Stabilizing effects of caprylate and acetyltryptophanate on heat-induced aggregation of bovine serum albumin. Biochim Biophys Acta - Protein Struct Mol Enzymol. 2000;1479(1-2):32-36. doi:10.1016/S0167-4838(00)00061-3
147. Arakawa T, Kita Y, Timasheff SN. Protein precipitation and denaturation by dimethyl sulfoxide. Biophys Chem. 2007;131(1-3):62-70. doi:10.1016/j.bpc.2007.09.004
148. Arakawa T, Ejima D, Tsumoto K, et al. Suppression of protein interactions by arginine: A proposed mechanism of the arginine effects. Biophys Chem. 2007;127(1-2):1-8. doi:10.1016/j.bpc.2006.12.007
149. Cloutier T, Sudrik C, Mody N, Sathish HA, Trout BL. Molecular Computations of Preferential Interaction Coefficients of IgG1 Monoclonal Antibodies with Sorbitol, Sucrose, and Trehalose and the Impact of These Excipients on Aggregation and Viscosity. Mol Pharm. 2019;16(8):3657-3664. doi:10.1021/acs.molpharmaceut.9b00545
150. Sormanni P, Amery L, Ekizoglou S, Vendruscolo M, Popovic B. Rapid and accurate in silico solubility screening of a monoclonal antibody library. Sci Rep. 2017;7(1):8200. doi:10.1038/s41598-017-07800-w
151. Wolf Pérez A-M, Sormanni P, Andersen JS, et al. In vitro and in silico assessment of the developability of a designed monoclonal antibody library. MAbs. 2019;11(2):388-400. doi:10.1080/19420862.2018.1556082
152. Shan L, Mody N, Sormanni P, Rosenthal KL, Damschroder MM, Esfandiary R. Developability Assessment of Engineered Monoclonal Antibody Variants with a Complex Self-Association Behavior Using Complementary Analytical and in Silico Tools. Mol Pharm. 2018;15(12):5697-5710. doi:10.1021/acs.molpharmaceut.8b00867
153. Arora J, Hu Y, Esfandiary R, et al. Charge-mediated Fab-Fc interactions in an IgG1 antibody induce reversible self-association, cluster formation, and elevated viscosity. MAbs. 2016;8(8):1561-1574. doi:10.1080/19420862.2016.1222342
154. Van Durme J, De Baets G, Van Der Kant R, et al. Solubis: a webserver to reduce protein aggregation through mutation. Protein Eng Des Sel. 2016;29(8):285-289. doi:10.1093/protein/gzw019
155. van der Kant R, Karow-Zwick AR, Van Durme J, et al. Prediction and Reduction of the Aggregation of Monoclonal Antibodies. J Mol Biol. 2017;429(8):1244-1261. doi:10.1016/j.jmb.2017.03.014
156. Martinez M, Bruce NJ, Romanowska J, et al. SDA 7: A modular and parallel implementation of the simulation of diffusional association software. J Comput Chem. 2015;36(21):1631-1645. doi:10.1002/jcc.23971
157. Nautiyal K, Kibria MG, Akazawa-Ogawa Y, Hagihara Y, Kuroda Y. Design and assessment of an active anti-epidermal growth factor receptor (EGFR) single chain variable fragment (ScFv) with improved solubility. Biochem Biophys Res Commun. 2019;508(4):1043-1049. doi:10.1016/j.bbrc.2018.11.170
158. Tomar DS, Kumar S, Singh SK, Goswami S, Li L. Molecular basis of high viscosity in concentrated antibody solutions: Strategies for high concentration drug product development. MAbs. 2016;8(2):216-228. doi:10.1080/19420862.2015.1128606
159. Li L, Kumar S, Buck PM, et al. Concentration Dependent Viscosity of Monoclonal Antibody Solutions: Explaining Experimental Behavior in Terms of Molecular Properties. Pharm Res. 2014;31(11):3161-3178. doi:10.1007/s11095-014-1409-0
160. Sharma VK, Patapoff TW, Kabakoff B, et al. In silico selection of therapeutic antibodies for development: Viscosity, clearance, and chemical stability. Proc Natl Acad Sci. 2014;111(52):18601-18606. doi:10.1073/pnas.1421779112
161. Kramer RM, Shende VR, Motl N, Pace CN, Scholtz JM. Toward a Molecular Understanding of Protein Solubility: Increased Negative Surface Charge Correlates with Increased Solubility. Biophys J. 2012;102(8):1907-1915. doi:10.1016/j.bpj.2012.01.060
162. Tomar DS, Li L, Broulidakis MP, et al. In-silico prediction of concentration-dependent viscosity curves for monoclonal antibody solutions. MAbs. 2017;9(3):476-489. doi:10.1080/19420862.2017.1285479
163. Gentiluomo L, Roessner D, Augustijn D, et al. Application of interpretable artificial neural networks to early monoclonal antibodies development. Eur J Pharm Biopharm. 2019;141:81-89. doi:10.1016/j.ejpb.2019.05.017
164. Nichols P, Li L, Kumar S, et al. Rational design of viscosity reducing mutants of a monoclonal antibody: Hydrophobic versus electrostatic inter-molecular interactions. MAbs. 2015;7(1):212-230. doi:10.4161/19420862.2014.985504
165. Kumar S, Roffi K, Tomar DS, et al. Rational optimization of a monoclonal antibody for simultaneous improvements in its solution properties and biological activity. Berghuis A, ed. Protein Eng Des Sel. 2018;31(7-8):313-325. doi:10.1093/protein/gzy020
166. Chow C-K, Allan BW, Chai Q, Atwell S, Lu J. Therapeutic Antibody Engineering To Improve Viscosity and Phase Separation Guided by Crystal Structure. Mol Pharm. 2016;13(3):915-923. doi:10.1021/acs.molpharmaceut.5b00817
167. Geoghegan JC, Fleming R, Damschroder M, Bishop SM, Sathish HA, Esfandiary R. Mitigation of reversible self-association and viscosity in a human IgG1 monoclonal antibody by rational, structure-guided Fv engineering. MAbs. 2016;8(5):941-950. doi:10.1080/19420862.2016.1171444
168. Kuhn AB, Kube S, Karow-Zwick AR, et al. Improved Solution-State Properties of Monoclonal Antibodies by Targeted Mutations. J Phys Chem B. 2017;121(48):10818-10827. doi:10.1021/acs.jpcb.7b09126
169. Jetha A, Thorsteinson N, Jmeian Y, Jeganathan A, Giblin P, Fransson J. Homology modeling and structure-based design improve hydrophobic interaction chromatography behavior of integrin binding antibodies. MAbs. 2018;10(6):890-900. doi:10.1080/19420862.2018.1475871
170. Lemkul J. From Proteins to Perturbed Hamiltonians: A Suite of Tutorials for the GROMACS-2018 Molecular Simulation Package [Article v1.0]. Living J Comput Mol Sci. 2019;1(1). doi:10.33011/livecoms.1.1.5068
171. Kastritis PL, Bonvin AMJJ. Are Scoring Functions in Protein−Protein Docking Ready To Predict Interactomes? Clues from a Novel Binding Affinity Benchmark. J Proteome Res. 2010;9(5):2216-2225. doi:10.1021/pr9009854
172. Ando T, Skolnick J. Crowding and hydrodynamic interactions likely dominate in vivo macromolecular motion. Proc Natl Acad Sci. 2010;107(43):18457-18462. doi:10.1073/pnas.1011354107
173. von Bülow S, Siggel M, Linke M, Hummer G. Dynamic cluster formation determines viscosity and diffusion in dense protein solutions. Proc Natl Acad Sci U S A. 2019;116(20):9843-9852. doi:10.1073/pnas.1817564116
174. Chaudhri A, Zarraga IE, Kamerzell TJ, et al. Coarse-Grained Modeling of the Self-Association of Therapeutic Monoclonal Antibodies. J Phys Chem B. 2012;116(28):8045-8057. doi:10.1021/jp301140u
175. Chaudhri A, Zarraga IE, Yadav S, Patapoff TW, Shire SJ, Voth GA. The Role of Amino Acid Sequence in the Self-Association of Therapeutic Monoclonal Antibodies: Insights from Coarse-Grained Modeling. J Phys Chem B. 2013;117(5):1269-1279. doi:10.1021/jp3108396
176. Yadav S, Shire SJ, Kalonia DS. Factors Affecting the Viscosity in High Concentration Solutions of Different Monoclonal Antibodies. J Pharm Sci. 2010;99(12):4812-4829. doi:10.1002/jps.22190
177. Yadav S, Liu J, Shire SJ, Kalonia DS. Specific interactions in high concentration antibody solutions resulting in high viscosity. J Pharm Sci. 2010;99(3):1152-1168. doi:10.1002/jps.21898
178. Yadav S, Sreedhara A, Kanai S, et al. Establishing a Link Between Amino Acid Sequences and Self-Associating and Viscoelastic Behavior of Two Closely Related Monoclonal Antibodies. Pharm Res. 2011;28(7):1750-1764. doi:10.1007/s11095-011-0410-0
179. Yadav S, Laue TM, Kalonia DS, Singh SN, Shire SJ. The Influence of Charge Distribution on Self-Association and Viscosity Behavior of Monoclonal Antibody Solutions. Mol Pharm. 2012;9(4):791-802. doi:10.1021/mp200566k
180. Buck PM, Chaudhri A, Kumar S, Singh SK. Highly Viscous Antibody Solutions Are a Consequence of Network Formation Caused by Domain–Domain Electrostatic Complementarities: Insights from Coarse-Grained Simulations. Mol Pharm. 2015;12(1):127-139. doi:10.1021/mp500485w
181. Lapelosa M, Patapoff TW, Zarraga IE. Molecular Simulations of the Pairwise Interaction of Monoclonal Antibodies. J Phys Chem B. 2014;118(46):13132-13141. doi:10.1021/jp508729z
182. Wang G, Varga Z, Hofmann J, Zarraga IE, Swan JW. Structure and Relaxation in Solutions of Monoclonal Antibodies. J Phys Chem B. 2018;122(11):2867-2880. doi:10.1021/acs.jpcb.7b11053
183. Yearley EJ, Zarraga IE, Shire SJ, et al. Small-Angle Neutron Scattering Characterization of Monoclonal Antibody Conformations and Interactions at High Concentrations. Biophys J. 2013;105(3):720-731. doi:10.1016/j.bpj.2013.06.043
184. Lilyestrom WG, Yadav S, Shire SJ, Scherer TM. Monoclonal Antibody Self-Association, Cluster Formation, and Rheology at High Concentrations. J Phys Chem B. 2013;117(21):6373-6384. doi:10.1021/jp4008152
185. Castellanos MM, Clark NJ, Watson MC, Krueger S, McAuley A, Curtis JE. Role of Molecular Flexibility and Colloidal Descriptions of Proteins in Crowded Environments from Small-Angle Scattering. J Phys Chem B. 2016;120(49):12511-12518. doi:10.1021/acs.jpcb.6b10637
186. Corbett D, Hebditch M, Keeling R, et al. Coarse-Grained Modeling of Antibodies from Small-Angle Scattering Profiles. J Phys Chem B. 2017;121(35):8276-8290. doi:10.1021/acs.jpcb.7b04621
187. Abhinandan KR, Martin ACR. Analyzing the “Degree of Humanness” of Antibody Sequences. J Mol Biol. 2007;369(3):852-862. doi:10.1016/j.jmb.2007.02.100
188. Thullier P, Huish O, Pelat T, Martin ACR. The Humanness of Macaque Antibody Sequences. J Mol Biol. 2010;396(5):1439-1450. doi:10.1016/j.jmb.2009.12.041
189. Pelat T, Bedouelle H, Rees AR, Crennell SJ, Lefranc M-P, Thullier P. Germline Humanization of a Non-human Primate Antibody that Neutralizes the Anthrax Toxin, by in Vitro and in Silico Engineering. J Mol Biol. 2008;384(5):1400-1407. doi:10.1016/j.jmb.2008.10.033
190. Gao SH, Huang K, Tu H, Adler AS. Monoclonal antibody humanness score and its applications. BMC Biotechnol. 2013;13(1):55. doi:10.1186/1472-6750-13-55
191. Ye J, Ma N, Madden TL, Ostell JM. IgBLAST: an immunoglobulin variable domain sequence analysis tool. Nucleic Acids Res. 2013;41(W1):W34-W40. doi:10.1093/nar/gkt382
192. Seeliger D. Development of Scoring Functions for Antibody Sequence Assessment and Optimization. PLoS One. 2013;8(10):e76909. doi:10.1371/journal.pone.0076909
193. Swindells MB, Porter CT, Couch M, et al. abYsis: Integrated Antibody Sequence and Structure—Management, Analysis, and Prediction. J Mol Biol. 2017;429(3):356-364. doi:10.1016/j.jmb.2016.08.019
194. Koren E, De Groot AS, Jawa V, et al. Clinical validation of the “in silico” prediction of immunogenicity of a human recombinant therapeutic protein. Clin Immunol. 2007;124(1):26-32. doi:10.1016/j.clim.2007.03.544
195. Seeliger D, Schulz P, Litzenburger T, et al. Boosting antibody developability through rational sequence optimization. MAbs. 2015;7(3):505-515. doi:10.1080/19420862.2015.1017695
196. Clavero-Álvarez A, Di Mambro T, Perez-Gaviro S, Magnani M, Bruscolini P. Humanization of Antibodies using a Statistical Inference Approach. Sci Rep. 2018;8(1):14820. doi:10.1038/s41598-018-32986-y
197. Lazar GA, Desjarlais JR, Jacinto J, Karki S, Hammond PW. A molecular immunology approach to antibody humanization and functional optimization. Mol Immunol. 2007;44(8):1986-1998. doi:10.1016/j.molimm.2006.09.029
198. Mei S, Li F, Leier A, et al. A comprehensive review and performance evaluation of bioinformatics tools for HLA class I peptide-binding prediction. Brief Bioinform. 2019;in press. doi:10.1093/bib/bbz051
199. Qiu J, Qiu T, Huang Y, Cao Z. Identifying the Epitope Regions of Therapeutic Antibodies Based on Structure Descriptors. Int J Mol Sci. 2017;18(12):2457. doi:10.3390/ijms18122457
200. Kobe B, Guncar G, Buchholz R, et al. Crystallography and protein–protein interactions: biological interfaces and crystal contacts. Biochem Soc Trans. 2008;36(6):1438-1441. doi:10.1042/BST0361438
201. Harding FA, Stickler MM, Razo J, DuBridge R. The immunogenicity of humanized and fully human antibodies. MAbs. 2010;2(3):256-265. doi:10.4161/mabs.2.3.11641
202. Jones PT, Dear PH, Foote J, Neuberger MS, Winter G. Replacing the complementarity-determining regions in a human antibody with those from a mouse. Nature. 1986;321(6069):522-525. doi:10.1038/321522a0
203. Tan P, Mitchell DA, Buss TN, Holmes MA, Anasetti C, Foote J. “Superhumanized” Antibodies: Reduction of Immunogenic Potential by Complementarity-Determining Region Grafting with Human Germline Sequences: Application to an Anti-CD28. J Immunol. 2002;169(2):1119-1125. doi:10.4049/jimmunol.169.2.1119
204. Khee Hwang WY, Almagro JC, Buss TN, Tan P, Foote J. Use of human germline genes in a CDR homology-based approach to antibody humanization. Methods. 2005;36(1):35-42. doi:10.1016/j.ymeth.2005.01.004
205. Roguska MA, Pedersen JT, Keddy CA, et al. Humanization of murine monoclonal antibodies through variable domain resurfacing. Proc Natl Acad Sci. 1994;91(3):969-973. doi:10.1073/pnas.91.3.969
206. Olimpieri PP, Marcatili P, Tramontano A. Tabhu: Tools for antibody humanization. Bioinformatics. 2015;31(3):434-435. doi:10.1093/bioinformatics/btu667
207. Almagro JC, Fransson J. Humanization of antibodies. Front Biosci. 2008;13(7):1619-1633. doi:10.1093/toxsci/kft065
208. Safdari Y, Farajnia S, Asgharzadeh M, Khalili M. Antibody humanization methods – a review and update. Biotechnol Genet Eng Rev. 2013;29(2):175-186. doi:10.1080/02648725.2013.801235
209. Mayrhofer P, Kunert R. Nomenclature of humanized mAbs: Early concepts, current challenges and future perspectives. Hum Antibodies. 2018;27(1):37-51. doi:10.3233/HAB-180347
210. Lo BKC. Antibody humanization by CDR grafting. Methods Mol Biol. 2004;248:135-159. http://www.ncbi.nlm.nih.gov/pubmed/14970494.
211. Zhang D, Chen CF, Zhao B Bin, et al. A novel antibody humanization method based on epitopes scanning and molecular dynamics simulation. PLoS One. 2013;8(11):e80636. doi:10.1371/journal.pone.0080636
212. Margreitter C, Mayrhofer P, Kunert R, Oostenbrink C. Antibody humanization by molecular dynamics simulations - In-silico guided selection of critical backmutations. J Mol Recognit. 2016;29(6):266-275. doi:10.1002/jmr.2527
213. Schwaigerlehner L, Pechlaner M, Mayrhofer P, Oostenbrink C, Kunert R. Lessons learned from merging wet lab experiments with molecular simulation to improve mAb humanization. Rees A, ed. Protein Eng Des Sel. 2018;31(7-8):257-265. doi:10.1093/protein/gzy009
214. Hanf KJM, Arndt JW, Chen LL, et al. Antibody humanization by redesign of complementarity-determining region residues proximate to the acceptor framework. Methods. 2014;65(1):68-76. doi:10.1016/j.ymeth.2013.06.024
215. Looger LL, Hellinga HW. Generalized dead-end elimination algorithms make large-scale protein side-chain structure prediction tractable: implications for protein design and structural genomics. J Mol Biol. 2001;307(1):429-445. doi:10.1006/jmbi.2000.4424
216. Parker AS, Zheng W, Griswold KE, Bailey-Kellogg C. Optimization algorithms for functional deimmunization of therapeutic proteins. BMC Bioinformatics. 2010;11(1):180. doi:10.1186/1471-2105-11-180
217. Parker AS, Griswold KE, Bailey-Kellogg C. Optimization of Therapeutic Proteins to Delete T-Cell Epitopes While Maintaining Beneficial Residue Interactions. J Bioinform Comput Biol. 2011;09(02):207-229. doi:10.1142/S0219720011005471
218. He L, Friedman AM, Bailey-Kellogg C. A divide-and-conquer approach to determine the Pareto frontier for optimization of protein engineering experiments. Proteins Struct Funct Bioinforma. 2012;80(3):790-806. doi:10.1002/prot.23237
219. Parker AS, Choi Y, Griswold KE, Bailey-Kellogg C. Structure-Guided Deimmunization of Therapeutic Proteins. J Comput Biol. 2013;20(2):152-165. doi:10.1089/cmb.2012.0251
220. Choi Y, Griswold KE, Bailey-Kellogg C. Structure-based redesign of proteins for minimal T-cell epitope content. J Comput Chem. 2013;34(10):879-891. doi:10.1002/jcc.23213
221. Choi Y, Hua C, Sentman CL, Ackerman ME, Bailey-Kellogg C. Antibody humanization by structure-based computational protein design. MAbs. 2015;7(6):1045-1057. doi:10.1080/19420862.2015.1076600
222. Choi Y, Ndong C, Griswold KE, Bailey-Kellogg C. Computationally driven antibody engineering enables simultaneous humanization and thermostabilization. Protein Eng Des Sel. 2016;29(10):419-426. doi:10.1093/protein/gzw024
223. Parker AS, Griswold KE, Bailey-Kellogg C. Optimization of Combinatorial Mutagenesis. J Comput Biol. 2011;18(11):1743-1756. doi:10.1089/cmb.2011.0152
224. Gainza P, Roberts KE, Georgiev I, et al. OSPREY: protein design with ensembles, flexibility, and provable algorithms. Methods Enzymol. 2013;523:87-107. doi:10.1016/B978-0-12-394292-0.00005-9
225. Ponder JW, Case DA. Force Fields for Protein Simulations. In: Advances in Protein Chemistry. 2003:27-85. doi:10.1016/S0065-3233(03)66002-X
226. Foote J, Winter G. Antibody framework residues affecting the conformation of the hypervariable loops. J Mol Biol. 1992;224(2):487-499. doi:10.1016/0022-2836(92)91010-M
227. Makabe K, Nakanishi T, Tsumoto K, et al. Thermodynamic Consequences of Mutations in Vernier Zone Residues of a Humanized Anti-human Epidermal Growth Factor Receptor Murine Antibody, 528. J Biol Chem. 2008;283(2):1156-1166. doi:10.1074/jbc.M706190200
228. Nakanishi T, Tsumoto K, Yokota A, Kondo H, Kumagai I. Critical contribution of VH-VL interaction to reshaping of an antibody: the case of humanization of anti-lysozyme antibody, HyHEL-10. Protein Sci. 2008;17(2):261-270. doi:10.1110/ps.073156708
229. Onda M, Beers R, Xiang L, Nagata S, Wang Q-C, Pastan I. An immunotoxin with greatly reduced immunogenicity by identification and removal of B cell epitopes. Proc Natl Acad Sci. 2008;105(32):11311-11316. doi:10.1073/pnas.0804851105
230. Cantor JR, Yoo TH, Dixit A, Iverson BL, Forsthuber TG, Georgiou G. Therapeutic enzyme deimmunization by combinatorial T-cell epitope removal using neutral drift. Proc Natl Acad Sci. 2011;108(4):1272-1277. doi:10.1073/pnas.1014739108
231. Mazor R, Eberle JA, Hu X, et al. Recombinant immunotoxin for cancer treatment with low immunogenicity by identification and silencing of human T-cell epitopes. Proc Natl Acad Sci. 2014;111(23):8571-8576. doi:10.1073/pnas.1405153111
232. King C, Garza EN, Mazor R, et al. Removing T-cell epitopes with computational protein design. Proc Natl Acad Sci U S A. 2014. doi:10.1073/pnas.1321126111
233. Schubert B, Schärfe C, Dönnes P, Hopf T, Marks D, Kohlbacher O.
1932
Population-specific design of de-immunized protein biotherapeutics. Dunbrack RL, ed.
1933
PLOS Comput Biol. 2018;14(3):e1005983. doi:10.1371/journal.pcbi.1005983
1934
234. Choi Y, Verma D, Griswold KE, Bailey-Kellogg C. EpiSweep: Computationally
1935
Driven Reengineering of Therapeutic Proteins to Reduce Immunogenicity While
1936
Maintaining Function. In: Methods in Molecular Biology. ; 2017:375-398.
1937
doi:10.1007/978-1-4939-6637-0_20
1938
235. Lipinski CA, Lombardo F, Dominy BW, Feeney PJ. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev. 2012;64:4-17. doi:10.1016/j.addr.2012.09.019
236. Wishart DS. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 2006;34(90001):D668-D672. doi:10.1093/nar/gkj067
237. Raybould MIJ, Marks C, Krawczyk K, et al. Five computational developability guidelines for therapeutic antibody profiling. Proc Natl Acad Sci. 2019;116(10):4025-4030. doi:10.1073/pnas.1810576116
238. Kovaltsuk A, Leem J, Kelm S, Snowden J, Deane CM, Krawczyk K. Observed Antibody Space: A Resource for Data Mining Next-Generation Sequencing of Antibody Repertoires. J Immunol. 2018;201(8):2502-2509. doi:10.4049/jimmunol.1800708
239. Chothia C, Lesk AM. Canonical structures for the hypervariable regions of immunoglobulins. J Mol Biol. 1987;196(4):901-917. doi:10.1016/0022-2836(87)90412-8
240. Weitzner BD, Kuroda D, Marze N, Xu J, Gray JJ. Blind prediction performance of RosettaAntibody 3.0: Grafting, relaxation, kinematic loop modeling, and full CDR optimization. Proteins Struct Funct Bioinforma. 2014;82(8):1611-1623. doi:10.1002/prot.24534
241. Weitzner BD, Jeliazkov JR, Lyskov S, et al. Modeling and docking of antibody structures with Rosetta. Nat Protoc. 2017;12(2):401-416. doi:10.1038/nprot.2016.180
242. Kuroda D, Gray JJ. Pushing the Backbone in Protein-Protein Docking. Structure. 2016;24(10):1821-1829. doi:10.1016/j.str.2016.06.025
243. Kamerzell TJ, Middaugh CR. The Complex Inter-Relationships Between Protein Flexibility and Stability. J Pharm Sci. 2008;97(9):3494-3517. doi:10.1002/jps.21269
244. Galm L, Amrhein S, Hubbuch J. Predictive approach for protein aggregation: Correlation of protein surface characteristics and conformational flexibility to protein aggregation propensity. Biotechnol Bioeng. 2017;114(6):1170-1183. doi:10.1002/bit.25949
245. Schrag JD, Picard M-È, Gaudreault F, et al. Binding symmetry and surface flexibility mediate antibody self-association. MAbs. 2019;11(7):1300-1318. doi:10.1080/19420862.2019.1632114
246. Kiyoshi M, Caaveiro JMM, Miura E, et al. Affinity improvement of a therapeutic antibody by structure-based computational design: Generation of electrostatic interactions in the transition state stabilizes the antibody-antigen complex. PLoS One. 2014;9(1):e87099. doi:10.1371/journal.pone.0087099
247. Yamashita T, Mizohata E, Nagatoishi S, et al. Affinity Improvement of a Cancer-Targeted Antibody through Alanine-Induced Adjustment of Antigen-Antibody Interface. Structure. 2019;27(3):519-527.e5. doi:10.1016/j.str.2018.11.002
248. Wong SE, Sellers BD, Jacobson MP. Effects of somatic mutations on CDR loop flexibility during affinity maturation. Proteins Struct Funct Bioinforma. 2011;79(3):821-829. doi:10.1002/prot.22920
249. Bostrom J, Haber L, Koenig P, Kelley RF, Fuh G. High Affinity Antigen Recognition of the Dual Specific Variants of Herceptin Is Entropy-Driven in Spite of Structural Plasticity. Romesberg F, ed. PLoS One. 2011;6(4):e17887. doi:10.1371/journal.pone.0017887
250. Jeliazkov JR, Sljoka A, Kuroda D, et al. Repertoire Analysis of Antibody CDR-H3 Loops Suggests Affinity Maturation Does Not Typically Result in Rigidification. Front Immunol. 2018;9. doi:10.3389/fimmu.2018.00413
251. Fukunaga A, Tsumoto K. Improving the affinity of an antibody for its antigen via long-range electrostatic interactions. Protein Eng Des Sel. 2013;26(12):773-780. doi:10.1093/protein/gzt053
252. Fukunaga A, Maeta S, Reema B, Nakakido M, Tsumoto K. Improvement of antibody affinity by introduction of basic amino acid residues into the framework region. Biochem Biophys Reports. 2018;15:81-85. doi:10.1016/j.bbrep.2018.07.005
253. Fleishman SJ, Whitehead TA, Ekiert DC, et al. Computational Design of Proteins Targeting the Conserved Stem Region of Influenza Hemagglutinin. Science. 2011;332(6031):816-821. doi:10.1126/science.1202617
254. Baran D, Pszolla MG, Lapidoth GD, et al. Principles for computational design of binding antibodies. Proc Natl Acad Sci. 2017;114(41):10900-10905. doi:10.1073/pnas.1707171114
255. Jiang L, Althoff EA, Clemente FR, et al. De Novo Computational Design of Retro-Aldol Enzymes. Science. 2008;319(5868):1387-1391. doi:10.1126/science.1152692
256. Siegel JB, Zanghellini A, Lovick HM, et al. Computational Design of an Enzyme Catalyst for a Stereoselective Bimolecular Diels-Alder Reaction. Science. 2010;329(5989):309-313. doi:10.1126/science.1190239
257. Adolf-Bryfogle J, Kalyuzhniy O, Kubitz M, et al. RosettaAntibodyDesign (RAbD): A general framework for computational antibody design. PLoS Comput Biol. 2018;14(4):e1006112. doi:10.1371/journal.pcbi.1006112
258. Schneider G, Clark DE. Automated De Novo Drug Design: Are We Nearly There Yet? Angew Chemie Int Ed. 2019;58(32):10792-10803. doi:10.1002/anie.201814681
259. Bradbury ARM, Sidhu S, Dübel S, McCafferty J. Beyond natural antibodies: the power of in vitro display technologies. Nat Biotechnol. 2011;29(3):245-254. doi:10.1038/nbt.1791
260. Shukla D, Schneider CP, Trout BL. Molecular level insight into intra-solvent interaction effects on protein stability and aggregation. Adv Drug Deliv Rev. 2011;63(13):1074-1085. doi:10.1016/j.addr.2011.06.014
261. Ohtake S, Kita Y, Arakawa T. Interactions of formulation excipients with proteins in solution and in the dried state. Adv Drug Deliv Rev. 2011;63(13):1053-1073. doi:10.1016/j.addr.2011.06.011
262. Pettersen EF, Goddard TD, Huang CC, et al. UCSF Chimera - A visualization system for exploratory research and analysis. J Comput Chem. 2004;25(13):1605-1612. doi:10.1002/jcc.20084
Tables

Table I. Features used in machine learning models. (A) DeepDDG for predicting thermostability [35]. (B) SOLart for predicting solubility [75]. (C) A decision tree-based method for prediction of Asn deamidation probability [107].

Table II. Databases that store experimental information for predictive model construction in computer-aided antibody design.

Table III. Large-scale experimental data of clinical stage antibodies.

Table IV. Summary of the methods for computer-aided stability engineering.

Table V. Computational methods to assess, predict, and reduce the immunogenicity of antibodies.
Figures

Figure 1. Development processes of protein therapeutics and the roles of computations. All protein pictures in this work were drawn with UCSF Chimera [262].

Figure 2. Equilibrium between folded and unfolded proteins and its relation to protein stability.

Figure 3. Potential chemical degradation sites (deamidation and isomerization) in an antibody (PDB: 1IGT). Sequence motifs for Asn deamidation (NG, NS, NN, NT, and NH) are colored in blue. Those for Asp isomerization (DG, DS, DD, DT, and DH) are colored in green.

Figure 4. General workflow and of computational methods.
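The motif lists in the Figure 3 caption lend themselves to a simple sequence scan. The following is a minimal sketch, assuming a plain one-letter amino acid string as input; the function name `find_motifs` and the example fragment are hypothetical and are not taken from PDB 1IGT. A motif hit only flags a candidate site; as Table I (C) indicates, structural features are also needed to estimate actual deamidation probability.

```python
import re

# Two-residue motifs listed in the Figure 3 caption.
DEAMIDATION_MOTIFS = ("NG", "NS", "NN", "NT", "NH")
ISOMERIZATION_MOTIFS = ("DG", "DS", "DD", "DT", "DH")


def find_motifs(sequence, motifs):
    """Return sorted (1-based position, motif) pairs, including overlapping hits."""
    seq = sequence.upper()
    hits = []
    for motif in motifs:
        # A zero-width lookahead reports overlapping matches, e.g. "NN" twice in "NNN".
        for match in re.finditer(f"(?={motif})", seq):
            hits.append((match.start() + 1, motif))
    return sorted(hits)


# Hypothetical fragment used only for illustration.
example = "QVNGSDTANNHDS"
deamidation_sites = find_motifs(example, DEAMIDATION_MOTIFS)
isomerization_sites = find_motifs(example, ISOMERIZATION_MOTIFS)
print(deamidation_sites)    # [(3, 'NG'), (9, 'NN'), (10, 'NH')]
print(isomerization_sites)  # [(6, 'DT'), (12, 'DS')]
```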
Table I. Features used in machine learning models. (A) DeepDDG for predicting thermostability [35]. (B) SOLart for predicting solubility [75]. (C) A decision tree-based method for prediction of Asn deamidation probability [107].

(A) Prediction of ∆∆G upon mutations

| Category | Features |
|---|---|
| Sequence-based features | Amino acid types (wild type, mutant, neighbor); Position specific scoring matrix; Fitness score derived from a multiple sequence alignment; Protein design probability |
| Structure-based features | Backbone dihedral angles (Phi, Psi, Omega); Secondary structures; Solvent accessible surface area; Number of hydrogen bonds (backbone-backbone, backbone-side chain, side chain-side chain); Distance and orientation between the mutated residues and the neighboring residues |

(B) Prediction of solubility and aggregation

| Category | Features |
|---|---|
| Sequence-based features | Amino acid compositions; Protein length |
| Structure-based features | Solubility-dependent statistical potentials; Secondary structures; Solvent accessible surface area |

(C) Prediction of Asn deamidation probability

| Category | Features |
|---|---|
| Sequence-based features | Pentapeptide deamidation half-life; C-terminal flanking residue |
| Structure-based features | Backbone dihedral angle (Phi, Psi); Side chain dihedral angle (Chi1, Chi2); Asn local secondary structure; Percent solvent accessibility; Solvent accessible surface area; Side chain hydrogen bonds; Nucleophilic C-N attack distance |
Table II. Databases that store experimental information for predictive model construction in computer-aided antibody design.

| Category | Database | Contents | URL |
|---|---|---|---|
| Conformational stability | ProTherm [114] | Various parameters, such as ∆∆G, structures and experimental information of wild type and mutant proteins. | https://www.iitm.ac.in/bioinfo/ProTherm/ |
| Colloidal stability | ALBase [115] | Sequences of antibody light chains (4364^a), including 808 amyloidogenic sequences in AL patients. | http://albase.bumc.bu.edu/aldb/ |
| Colloidal stability | ZipperDB [116] | Predictions of fibril-forming peptides within proteins identified by the 3D Profile Method | https://services.mbi.ucla.edu/zipperdb/ |
| Colloidal stability | AMYPdb | Amyloid precursor proteins, results of sequence analysis | http://amypdb.genouest.org/e107_plugins/amypdb_project/project.php |
| Colloidal stability | Waltz-DB [89] | Hexapeptides (243 amyloid/836 non-amyloid), experimental information (electron microscopy, FT-IR, Thioflavin), computed scores (WALTZ, TANGO, PASTA) | http://waltzdb.switchlab.org/ |
| Colloidal stability | AmyPro [117] | Validated amyloid precursor proteins and prions together with information on their amyloidogenic regions/domains and a broad functional classification of their amyloid state (162^a) | http://amypro.net/ |
| Colloidal stability | CPAD [118] | Peptides (amyloid/non-amyloid), APRs on proteins, aggregation rates upon mutations | https://www.iitm.ac.in/bioinfo/CPAD/ |
| Solubility | eSol [77] | Solubility of Escherichia coli proteins synthesized by the PURE system (4,133^a) | http://www.tanpaku.org/tp-esol/index.php?lang=en |

^a The number of data points as of writing of this review.
Table III. Large-scale experimental data of clinical stage antibodies.

| Purpose | Contents | Reference |
|---|---|---|
| Developability | Experimental results from 12 assays (Tm by DSF, SGAC-SINS AS100, HIC Retention Time, SMAC Retention Time, Slope for Accelerated Stability, Poly-Specificity Reagent SMP Score, Affinity-Capture Self-Interaction Nanoparticle Spectroscopy, CIC Retention Time, CSI-BLI Delta Response, ELISA, BVP ELISA). | [90] |
| Asn deamidation and Asp isomerization | Experimental results on chemical modifications identified by the analysis via LC-MS/MS. | [25] |
| Met oxidation | Experimental results on a chemical modification identified by the analysis via LC-MS/MS. | [102] |
Table IV. Summary of the methods for computer-aided stability engineering.

| Computational methods | Brief description | Reference |
|---|---|---|
| ∆∆G prediction (e.g. Rosetta, FoldX) | Assessing fitness landscape of amino acid sequences to a given structure based on the folding stability. ∆∆G = ∆G_Mut − ∆G_WT. | [125,131,132,130,134] |
| Supercharging | Replacing amino acids on protein surface with charged residues. Often leads to better refolding properties. | [135,136,137,138] |
| Spatial Aggregation Propensity | Quantifying solvent exposures of hydrophobic amino acids in a given structure or during molecular simulations. | [139,140,141,143,149,152] |
| CamSol | Predicting aggregation-prone regions based on computed solubility profile. | [150,141,152] |
| Solubis | Identifying mutations that simultaneously improve conformational and colloidal stabilities. | [154,155] |
| Brownian dynamics simulations | Estimating solubility of input molecules based on computed association kinetics. | [157] |
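The ∆∆G definition in Table IV can be illustrated with a short worked example. This is a minimal sketch, assuming that ∆G denotes a folding free energy in kcal/mol (so that a negative ∆∆G suggests a stabilizing mutation); the function names, the mutation labels, and the energy values are hypothetical and do not come from Rosetta, FoldX, or any cited study.

```python
def ddg(dg_mut, dg_wt):
    """Return ∆∆G = ∆G_Mut − ∆G_WT (same units as the inputs)."""
    return dg_mut - dg_wt


def rank_mutations(dg_wt, dg_by_mutation):
    """Return (mutation, ∆∆G) pairs sorted from most negative to most positive ∆∆G."""
    scored = {name: ddg(dg, dg_wt) for name, dg in dg_by_mutation.items()}
    return sorted(scored.items(), key=lambda item: item[1])


# Hypothetical folding free energies (kcal/mol), for illustration only.
wild_type_dg = -10.0
mutant_dg = {"S30K": -11.5, "L45A": -9.0, "T28D": -10.0}

for mutation, value in rank_mutations(wild_type_dg, mutant_dg):
    print(f"{mutation}: ddG = {value:+.1f} kcal/mol")
```

Under the stated sign convention, the first entry of the ranked list is the predicted most stabilizing candidate; with an unfolding free energy convention the interpretation of the sign would be reversed.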
Table V. Computational methods to assess, predict, and reduce the immunogenicity of antibodies.

| Assessment of immunogenicity | Input | Description | Availability in the public domain | Reference |
|---|---|---|---|---|
| SHAB | Sequence | Humanness score based on sequence identity to human antibody sequences | http://www.bioinf.org.uk/abs/shab/ | [187] |
| T20 score analyzer | Sequence | Humanness score based on sequence identity to the top 20 matched human antibody sequences | https://dm.lakepharma.com/bioinformatics/ | [190] |
| Germinality Index (GI) | Sequence | Humanness score based on sequence identity to the closest human germline sequence | N/A | [189] |
| Seeliger potential | Sequence | Species-specific statistical potentials based on residue frequency | N/A | [192] |
| MG score | Sequence | Species-specific statistical potentials based on residue frequency | N/A | [196] |
| Human String Content (HSC) | Sequence | Humanness score based on potential short stretch of T-cell epitopes | N/A | [197] |

| Humanization of antibodies (CDR grafting and back-mutations) | Input | Description | Availability in the public domain | Reference |
|---|---|---|---|---|
| Tabhu | Sequence | Framework template search followed by CDR grafting with back-mutations | http://www.biocomputing.it/tabhu | [206] |
| Superhumanization | Sequence | Germline framework template search based on CDR homology. | N/A | [203,204] |
| Resurfacing | Structure | Replacement of residues exposed to solvents with corresponding residues observed in human antibodies | N/A | [205] |
| EpiSweep | Sequence/Structure | Deimmunization based on T-cell epitope prediction while preserving other properties such as affinity and stability. | Available upon request through the corresponding author | [234] |
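The identity-based humanness scores in Table V share a common idea: compare a query sequence with human antibody sequences and summarize the best matches. The sketch below illustrates that idea in the spirit of the T20 description (mean identity over the top-matched human sequences). It is not the published T20 implementation: it assumes the sequences are already aligned and of equal length, and the function names, the parameter `n`, and the toy sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def top_n_humanness(query, human_sequences, n=20):
    """Mean percent identity of the query to its n best-matching human sequences."""
    if not human_sequences:
        raise ValueError("at least one human reference sequence is required")
    identities = sorted(
        (percent_identity(query, ref) for ref in human_sequences), reverse=True
    )
    top = identities[:n]  # fewer than n references are tolerated
    return sum(top) / len(top)


# Toy, hypothetical sequences for illustration only.
query = "EVQL"
references = ["EVQL", "EVQV", "QVQL", "AAAA"]
print(top_n_humanness(query, references, n=2))  # 87.5
```

A production workflow would instead search a curated human antibody database and handle insertions and deletions with a proper alignment, which this sketch deliberately leaves out.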