Engineering Stability, Viscosity, and Immunogenicity of Antibodies by Computational Design



Journal Pre-proof: Engineering Stability, Viscosity, and Immunogenicity of Antibodies by Computational Design
Daisuke Kuroda, Kouhei Tsumoto

PII: S0022-3549(20)30016-2
DOI: https://doi.org/10.1016/j.xphs.2020.01.011
Reference: XPHS 1848
To appear in: Journal of Pharmaceutical Sciences

Received Date: 30 September 2019
Revised Date: 25 December 2019
Accepted Date: 10 January 2020

Please cite this article as: Kuroda D, Tsumoto K, Engineering Stability, Viscosity, and Immunogenicity of Antibodies by Computational Design, Journal of Pharmaceutical Sciences (2020), doi: https://doi.org/10.1016/j.xphs.2020.01.011. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2020 Published by Elsevier Inc. on behalf of the American Pharmacists Association.

Review

Engineering Stability, Viscosity, and Immunogenicity of Antibodies by Computational Design

DAISUKE KURODA,1,2 KOUHEI TSUMOTO1,2,3

1 Medical Device Development and Regulation Research Center, School of Engineering, The University of Tokyo, Tokyo 108-8639, Japan

2 Department of Bioengineering, School of Engineering, The University of Tokyo, Tokyo 108-8639, Japan

3 Laboratory of Medical Proteomics, Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan

CORRESPONDENCE SHOULD BE ADDRESSED TO: KT ([email protected]) and DK ([email protected])

Running title: Computer-aided antibody design


ABSTRACT:

In recent years, computational methods have garnered much attention in protein engineering. A large number of computational methods have been developed to analyze the sequences and structures of proteins and have been used to predict various protein properties. Antibodies are one of the emerging classes of protein therapeutics, and thus methods to control their physicochemical properties are highly desirable. However, despite tremendous efforts over past decades, computational methods to predict the physicochemical properties of antibodies are still in their infancy. Experimental validation is still required for real-world applications, and the results should be interpreted with caution. Among the various properties of antibodies, we focus in this review on stability, viscosity, and immunogenicity, and we present the current status of computational methods to engineer these properties.

Keywords:

Antibody engineering; Computer-aided design; Molecular simulations; Machine learning; Conformational stability; Colloidal stability; Viscosity; Immunogenicity

INTRODUCTION

In recent years, computational methods have become essential tools for antibody engineering as well as for drug discovery, helping with tasks such as screening candidate molecules, evaluating drug-likeness, and optimizing physicochemical and pharmacokinetic properties. In this context, a great many computational methods have been developed to analyze quantitative structure-activity relationships of target molecules. In antibody engineering in particular, the growing number of antibody crystal structures has enabled analyses of sequence-structure relationships, leading to the development of methods for high-resolution antibody modeling.1 Computational methods have also been applied to predict various properties of antibodies from either their sequences or their structures. However, despite tremendous efforts over past decades, these methods are still in their infancy. Experimental validation is still required for real-world applications, and the results should be interpreted with caution.

Antibodies are important molecules in various fields. Depending on their intended purpose, several properties of antibodies need to be engineered (Figure 1). Perhaps the most important physicochemical property is binding affinity, which is a quantitative metric of the natural function of antibodies. Antibodies bind to foreign molecules, or antigens, through six complementarity-determining regions (CDRs). Although five of the CDRs (L1, L2, L3, H1, and H2) adopt a limited number of conformations,2–5 the high variability of CDR-H3 in terms of both sequence and structure6–9 enables it to recognize a virtually unlimited range of antigens.

Because antibodies function through the six CDRs, affinity maturation by engineering has focused on changing the amino acid sequences of the CDRs.10 The trade-off between binding affinity and other properties is a known issue in antibody engineering.11 Improving other properties, such as stability, viscosity, and immunogenicity, can therefore be achieved by changing amino acids in the non-CDR parts of antibodies, and careful inspection of mutational sites is often necessary to generate better antibodies. To avoid such tedious processes, one of the most popular methods for improving antibody properties is random mutagenesis based on in vitro libraries. With such library-based in vitro approaches, not only binding affinity but also other properties can be engineered by elevating the temperature and controlling solution conditions during the selection process. However, owing to recent advances in computational power and algorithms, computational design is becoming an alternative method in antibody engineering.12–17 One advantage of computational methods is that, when combined with a structure, they allow a rational approach. Antibodies and their structures are governed by physical laws, and based on physical principles, we should be able to predict the behavior of antibodies in solution and in the body. However, such predictions are not yet entirely satisfactory: the accuracy of computational algorithms is not as good as that of library-based methods, owing to our incomplete understanding of the biophysical principles of biomolecules and the difficulty of describing conformational dynamics in silico.

In this review, we present the current status of the computational design of physicochemical properties of antibodies, which has not been covered in previous reviews of computer-aided antibody design.10,12 Among various properties, we focus on the stability, viscosity, and immunogenicity of antibodies, all of which have garnered much attention in computer-aided antibody design (Figure 1).

PHYSICOCHEMICAL AND BIOLOGICAL PROPERTIES OF ANTIBODIES

Before we present examples of the application of computer-aided antibody design, we briefly describe experimental metrics of the physicochemical and biological properties of antibodies.

Stability

There are two types of protein stability of concern in antibody drug discovery: physical stability and chemical stability. Protein physical stability can be further classified into conformational and colloidal stability. Proteins are only marginally stable, and proteins in solution are in dynamic equilibrium between folded and unfolded conformations (Figure 2). The conformational stability of a protein is defined as the free energy difference (∆G) between the folded and unfolded states, i.e., $\Delta G = G_{\mathrm{unfolded}} - G_{\mathrm{folded}}$. To stabilize a protein, one must therefore either stabilize the folded state or destabilize the unfolded state, thereby shifting the folded–unfolded balance toward the folded state (Figure 2).18
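As a minimal illustration of this definition, assume a simple two-state model in which the equilibrium constant for unfolding is K = [U]/[F] = exp(−∆G/RT), so that the folded fraction is 1/(1 + K). The Python sketch below (the function name and the use of kcal/mol are our own choices, not taken from the cited work) shows how strongly the folded population depends on ∆G.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def fraction_folded(delta_g_kcal: float, temp_k: float = 298.15) -> float:
    """Folded fraction of a two-state protein.

    delta_g_kcal uses the convention dG = G_unfolded - G_folded,
    so positive values favor the folded state.
    """
    k_unfold = math.exp(-delta_g_kcal / (R_KCAL * temp_k))  # K = [U]/[F]
    return 1.0 / (1.0 + k_unfold)


for dg in (0.0, 1.0, 3.0, 5.0):
    print(f"dG = {dg:.1f} kcal/mol -> folded fraction = {fraction_folded(dg):.4f}")
```

With ∆G = 3 kcal/mol at 25 ℃ the sketch gives a folded fraction above 99%, yet the unfolded population never vanishes, in line with the statement that proteins are only marginally stable.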

The most straightforward approach to stabilizing the folded state of a protein, starting from its crystal structure, is to strengthen the interactions between amino acids in the folded state. On the other hand, destabilizing the unfolded state is not trivial because its structure cannot be observed directly; it is unlikely to be a single state but rather an ensemble of many experimentally invisible conformations.18,19

Experimentally, in addition to the free energy difference between the folded and unfolded states, conformational stability is often assessed from the melting temperature (Tm), which is the temperature at which 50% of the protein molecules are folded and 50% are unfolded. Melting temperatures can be measured by differential scanning calorimetry (DSC), differential scanning fluorometry (DSF; a thermal shift assay), and circular dichroism (CD) spectroscopy.
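The 50% definition of Tm can be made concrete with a synthetic two-state melting curve. The sketch below assumes a temperature-independent unfolding enthalpy (∆Cp = 0), so that ∆G(T) = ∆Hm(1 − T/Tm), and recovers Tm by linear interpolation where the unfolded fraction crosses 0.5. Real DSF, CD, or DSC data additionally require baseline correction and curve fitting, which are omitted here; the parameter values and function names are hypothetical.

```python
import numpy as np

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def unfolded_fraction(temps_k, tm_k, dh_m_kcal):
    """Two-state melting curve assuming dCp = 0: dG(T) = dHm * (1 - T/Tm)."""
    dg = dh_m_kcal * (1.0 - temps_k / tm_k)      # G_unfolded - G_folded
    k_unfold = np.exp(-dg / (R_KCAL * temps_k))  # [U]/[F]
    return k_unfold / (1.0 + k_unfold)


def estimate_tm(temps_k, frac_unfolded):
    """Temperature at which the unfolded fraction first reaches 0.5 (linear interpolation)."""
    above = np.nonzero(frac_unfolded >= 0.5)[0]
    if above.size == 0 or above[0] == 0:
        raise ValueError("the curve must cross 50% unfolded inside the measured range")
    i = above[0]
    t0, t1 = temps_k[i - 1], temps_k[i]
    f0, f1 = frac_unfolded[i - 1], frac_unfolded[i]
    return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)


temps = np.arange(300.0, 380.0, 1.0)  # K
curve = unfolded_fraction(temps, tm_k=340.0, dh_m_kcal=100.0)
print(f"estimated Tm = {estimate_tm(temps, curve):.2f} K")
```

Because ∆G(Tm) = 0 in this model, the interpolated value coincides with the Tm used to generate the curve.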

Furthermore, through interactions between exposed hydrophobic patches in partially or fully unfolded states, proteins in a folded–unfolded equilibrium may assemble into other oligomeric states, often called protein aggregates (Figure 2); in stark contrast to folding–unfolding events, aggregation has been thought to be irreversible.20 Short hydrophobic segments of protein sequences or patches on protein surfaces, termed aggregation-prone regions (APRs), have been considered to govern the aggregation propensities of proteins,21 and single point mutations in such regions can dramatically affect protein aggregation rates. Aggregation of protein therapeutics often hampers development processes, and preventing aggregation has been a long-standing challenge in drug discovery because doing so could lead to higher yields, suppression of unwanted immunogenic responses in patients, and maintenance of binding affinity toward antigens.20 Protein aggregation can also be observed when genes are expressed as recombinant proteins. Such aggregates, called insoluble inclusion bodies, hamper further experiments. Insoluble aggregates can be solubilized and refolded into an active conformation by adding small-molecule additives, such as arginine.22,23 The resistance of proteins to such aggregation is referred to as colloidal stability.
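The notion of a short hydrophobic segment can be illustrated with a crude sliding-window hydropathy scan. The sketch below uses the Kyte-Doolittle hydropathy scale; the window length, the threshold, the function name, and the example sequence are arbitrary assumptions, and this is not the algorithm of any of the APR predictors discussed later in this review.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(seq, window=7, threshold=1.6):
    """Return merged 0-based, half-open (start, end) spans whose
    window-averaged hydropathy is at or above `threshold`."""
    seq = seq.upper()
    segments = []
    for i in range(len(seq) - window + 1):
        avg = sum(KD[aa] for aa in seq[i:i + window]) / window
        if avg >= threshold:
            if segments and i <= segments[-1][1]:
                # overlapping or adjacent window: extend the current segment
                segments[-1] = (segments[-1][0], i + window)
            else:
                segments.append((i, i + window))
    return segments


example = "GSKDEVILLAVFIGSKDE"  # hypothetical sequence
print(hydrophobic_segments(example))
```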

Colloidal stability can be assessed based on binding assays or on the sizes of particles in solution after long-term storage or chemical/heat exposure. Several experimental assays can quantify particle size distributions, such as size exclusion chromatography (SEC) and dynamic light scattering (DLS).

In addition to physical stability, another challenge in stability engineering is the improvement of chemical stability. Antibodies can be degraded by chemical modifications of amino acids, such as Asn deamidation, Asp isomerization, Met oxidation, and Lys glycation, during formulation and manufacturing processes.24 In principle, these chemical degradations show specific preferences for certain amino acids, and degradation sites can be predicted, to some extent, from amino acid sequences. For example, the common sequence motifs for Asn deamidation are NG, NS, NN, NT, and NH, whereas those for Asp isomerization are DG, DS, DD, DT, and DH.25 However, using sequence information alone can lead to overestimation of potential degradation sites. Although antibodies have many residues that could be chemically modified, many of them are buried inside the structure, where chemical reactions would not proceed (Figure 3). In fact, high solvent exposure of a residue is correlated with a high propensity for chemical degradation. Therefore, structural information is often desirable to assess such degradation events.
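The combination of sequence motifs and solvent exposure described above can be sketched as a two-step filter: find every NG/NS/NN/NT/NH and DG/DS/DD/DT/DH occurrence, then keep only sites whose Asn or Asp residue is exposed. In this Python sketch the relative solvent accessibility values are assumed to be precomputed from a structure or model, and the 0.25 exposure cutoff, the function names, and the example data are hypothetical.

```python
import re

DEAMIDATION_MOTIFS = ("NG", "NS", "NN", "NT", "NH")
ISOMERIZATION_MOTIFS = ("DG", "DS", "DD", "DT", "DH")


def find_motif_sites(seq, motifs):
    """0-based index of the first residue of every motif match, overlaps included."""
    # A zero-width lookahead lets overlapping matches such as "NNG" report both NN and NG.
    pattern = re.compile("(?=(?:" + "|".join(motifs) + "))")
    return [m.start() for m in pattern.finditer(seq.upper())]


def flag_exposed_sites(seq, rel_sasa, exposure_cutoff=0.25):
    """Keep motif sites whose Asn/Asp residue has relative SASA >= cutoff."""
    if len(rel_sasa) != len(seq):
        raise ValueError("one relative SASA value per residue is required")
    flagged = {}
    for label, motifs in (("Asn deamidation", DEAMIDATION_MOTIFS),
                          ("Asp isomerization", ISOMERIZATION_MOTIFS)):
        flagged[label] = [i for i in find_motif_sites(seq, motifs)
                          if rel_sasa[i] >= exposure_cutoff]
    return flagged


seq = "SNGKDDSNNT"  # hypothetical sequence
rel_sasa = [0.5, 0.6, 0.4, 0.3, 0.05, 0.7, 0.5, 0.1, 0.45, 0.3]  # hypothetical values
print(flag_exposed_sites(seq, rel_sasa))
```

Here the buried Asn at position 7 and the buried Asp at position 4 (0-based) are dropped even though they sit in motifs, which is the overestimation that structural information helps avoid.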

Forced degradation studies have been widely employed during the development of protein therapeutics,26 and the best experimental technique to characterize such chemical degradation is analysis based on liquid chromatography tandem mass spectrometry (LC-MS/MS).

Viscosity

The viscosity of antibody solutions is important because of its practical implications for formulation and administration. It has been suggested that the behavior of antibodies in solution can vary as a function of the concentration used during the formulation process. A commonly encountered challenge in formulation is thus the high viscosity of concentrated antibody solutions, which lengthens bioprocessing time, makes the formulated antibodies unstable, and raises processing costs. For administration to patients, the concentration of an antibody in solution needs to be high (>100 mg/mL), but the viscosity of the solution should be low so that high doses can be delivered through a small volume (1.0–1.5 mL) into the subcutaneous space.27

The concentration-dependent viscosity of antibody solutions depends on pairwise interactions or self-association, which further lead to higher-order intermolecular interactions (Figure 2). Experimentally measurable parameters related to pairwise intermolecular interactions include the osmotic second virial coefficient (B22) and the diffusion interaction parameter (kD), which can be obtained from static light scattering (SLS) and DLS measurements, respectively. These experimental parameters have been used as targets for computational or theoretical prediction.28–30 In addition to B22 and kD, the viscosity of a protein solution has also been evaluated experimentally based on parameters such as the solution viscosity (η) measured with rheometers, diffusion coefficients (D) from DLS profiles, and the retention time (RT) in a chromatographic column determined via hydrophobic interaction chromatography (HIC), standup monolayer adsorption chromatography (SMAC), or cross-interaction chromatography (CIC).
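As an example of how kD is obtained from DLS data, the concentration dependence of the diffusion coefficient is commonly described by the linear relation D = D0(1 + kD·c), so kD is the slope divided by the intercept of a straight-line fit of D against c. The sketch below fits synthetic, hypothetical data with NumPy's `polyfit`; measurement error and the concentration range over which the linear relation holds are not considered.

```python
import numpy as np


def fit_kd(conc_mg_ml, diff_coeff):
    """Fit D = D0 * (1 + kD * c) by linear least squares; returns (D0, kD).

    kD is in mL/mg when c is in mg/mL. A positive kD is commonly read as net
    repulsive interactions and a negative kD as net attraction.
    """
    slope, intercept = np.polyfit(conc_mg_ml, diff_coeff, 1)
    return intercept, slope / intercept


# Synthetic data generated with D0 = 4.0e-7 cm^2/s and kD = -0.010 mL/mg (hypothetical)
conc = np.array([1.0, 2.5, 5.0, 7.5, 10.0])
diff = 4.0e-7 * (1.0 - 0.010 * conc)
d0, kd = fit_kd(conc, diff)
print(f"D0 = {d0:.3e} cm^2/s, kD = {kd:.4f} mL/mg")
```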

Immunogenicity

The immunogenicity of therapeutic proteins refers to the immune response of patients against those proteins. The immunological discrimination between self and non-self governs this immune response. In patients, drugs that are recognized as non-self may initiate an immune response, which is often characterized experimentally by the detection of anti-drug antibodies. The loss of efficacy and the negative impact on safety caused by the development of such anti-drug antibodies limit the clinical utility of therapeutic antibodies.

Direct assessment of immunogenicity in preclinical studies requires animal testing, which is highly time-consuming and costly. Therefore, during the development stage, immunogenicity is often assessed by the similarity between the amino acid sequences of the target therapeutics and those of human antibodies. There are several traditional engineered formats, such as chimeric and humanized antibodies, and indeed correlations have been reported between the fraction of human content and immunogenicity, quantified in terms of the number of patients exhibiting anti-antibody responses.31
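The sequence-similarity assessment described above can be reduced, in its simplest form, to percent identity against a set of human reference sequences. The sketch below assumes the query and reference sequences are already aligned to the same length (for example, after antibody numbering) and uses short toy fragments with hypothetical names; practical tools compare against curated human germline databases.

```python
def percent_identity(query: str, reference: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences ('-' = gap)."""
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore alignment columns where either sequence has a gap
    columns = [(q, r) for q, r in zip(query, reference) if q != "-" and r != "-"]
    if not columns:
        return 0.0
    matches = sum(q == r for q, r in columns)
    return 100.0 * matches / len(columns)


def closest_human_reference(query, references):
    """Return (name, identity) of the most similar reference sequence."""
    return max(((name, percent_identity(query, seq)) for name, seq in references.items()),
               key=lambda item: item[1])


refs = {"ref_A": "QVQLVQSGAE", "ref_B": "EVQLVESGGG"}  # hypothetical aligned fragments
query = "EVKLVESGGG"                                   # hypothetical query fragment
print(closest_human_reference(query, refs))
```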

PREDICTION AND ENGINEERING OF PHYSICOCHEMICAL PROPERTIES OF ANTIBODIES

In response to invading antigens, antibodies can evolve in the body, reshaping their sequences and structures so that they bind to antigens with higher specificity and affinity. This natural process of reshaping the antigen-binding site involves chemical and structural changes that may enhance the binding affinity of the antibody at the cost of thermodynamic stability or other properties. The processes of somatic maturation are thus sensitive to many unknown factors and are tightly controlled, in a delicate balance, by the immune system.

These physicochemical properties of antibodies can be engineered in a manner similar to in vivo evolution, through mutations in vitro or in silico. Library-based in vitro approaches are perhaps still the most accurate methods at present.32 As described below, however, recent advances in computational power and algorithms suggest that computational approaches could be alternative methods to engineer antibodies at lower cost and higher speed.

Overview of computational prediction and engineering

Numerous computational methods to predict physicochemical and biological properties, such as protein stability, viscosity, and immunogenicity, have been developed to facilitate predictive protein engineering. In many cases, the input is a sequence or a structure (Figure 4). Computational models then predict, in qualitative or quantitative terms, whether the input is stable or unstable, has low or high viscosity, and is immunogenic or nonimmunogenic. In general, these prediction models can be classified into two categories: 1) statistical predictions and 2) physics-based predictions. Statistical methods rely on statistical information derived from experimental data, and the accuracy of their predictions depends heavily on the amount and quality of the data used to train them. As described below, several large-scale experimental datasets have become publicly available, and they can be used to train new prediction models. On the other hand, physics-based methods predict properties based on physical laws and hence do not, in principle, require prior experimental data to make predictions. Statistical and physics-based methods are not mutually exclusive, and many prediction algorithms take advantage of prior knowledge from experimental data as well as physical laws to varying extents.

In protein engineering, however, one has to optimize a wild-type sequence so that the properties of the protein improve. Technically, one can generate a pool of random sequences, feed them into a prediction algorithm, and then obtain a potentially improved sequence. However, this process is cumbersome, and it is not feasible to cover the whole sequence space: the Fv region of antibodies usually consists of 200 or more amino acids, and hence there are $20^{200}$ ($\approx 10^{260}$) sequences to be considered. More practical methods have therefore been developed to design amino acid sequences. These methods, which couple prediction of the properties of a protein with sequence sampling, are called computational design calculations (Figure 4).33,34
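The size of the sequence space quoted above follows from simple arithmetic, and contrasting it with the number of single and double mutants shows why design methods restrict sampling. The Python sketch below only counts sequences; the function names are our own.

```python
import math


def log10_sequence_space(length: int, alphabet: int = 20) -> float:
    """log10 of the number of possible sequences of the given length."""
    return length * math.log10(alphabet)


def n_single_mutants(length: int) -> int:
    """19 alternative residues at each position."""
    return 19 * length


def n_double_mutants(length: int) -> int:
    """Two distinct positions, 19 alternatives at each."""
    return math.comb(length, 2) * 19 ** 2


length = 200
print(f"20^{length} ~ 10^{log10_sequence_space(length):.1f}")
print(f"single mutants: {n_single_mutants(length):,}")
print(f"double mutants: {n_double_mutants(length):,}")
```

For a 200-residue region the sketch reports about 10^260.2 sequences, compared with 3,800 single mutants and roughly 7.2 million double mutants.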


Prediction of conformational stability

There are a large number of methods for predicting changes in conformational stability upon mutation (i.e., $\Delta\Delta G = \Delta G_{\mathrm{Mut}} - \Delta G_{\mathrm{WT}}$). The accuracy of each method is often evaluated based on the correlation coefficient (r) between predicted and experimental ∆∆G values.
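The evaluation metric used above can be written out explicitly. The sketch below assumes that the correlation coefficient is the Pearson coefficient and keeps the sign convention of the text (∆G = G_unfolded − G_folded, so a destabilizing mutation gives ∆∆G < 0); predictors differ in their sign conventions, so signs should be harmonized before computing r. The ∆∆G values are hypothetical.

```python
import math


def ddg(dg_mut: float, dg_wt: float) -> float:
    """ddG = dG_Mut - dG_WT, with dG = G_unfolded - G_folded (negative = destabilizing)."""
    return dg_mut - dg_wt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ddG values (kcal/mol) for five mutations
experimental = [-1.2, -0.4, 0.3, -2.5, -0.9]
predicted = [-1.0, -0.7, 0.1, -1.8, -1.1]
print(f"r = {pearson_r(experimental, predicted):.2f}")
```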

Machine learning has been used to predict ∆∆G upon mutation from protein sequences and structures. One recent such method is DeepDDG,35 which relies on a deep neural network trained on 5444 experimental data points. Benchmarking against eight other methods demonstrated that DeepDDG performed best; the correlation coefficients (r) between experimental and predicted ∆∆G were 0.66, 0.62, 0.60, 0.59, 0.57, 0.54, 0.52, 0.31, and 0.18 for DeepDDG, PoPMuSiC,36 SDM,37 EASE-MM,38 mCSM,39 I-Mutant3.0,40 STRUM,41 MUpro1.1,42 and SCooP,43 respectively. The sequence and structural features used by DeepDDG are listed in Table I. Among them, the solvent accessible surface area of the mutated residue contributed most to the prediction, suggesting the importance of residue packing for protein stability.35

In addition to machine learning, structure-based simulations can also be used to predict the ∆∆G associated with a mutation. For instance, Gapsys et al. employed free energy perturbation44 in a consensus force field approach, in which ∆∆G values calculated from molecular dynamics simulations with six different force fields (Amber99sb, Amber99sb*ILDN, OPLS, Charmm22*, Charmm36, and Charmm36H) were averaged to minimize force field bias;45 benchmarking the consensus method on 119 mutations of the protein barnase yielded a correlation coefficient (r) of 0.74 between experimental and computed ∆∆G values. Furthermore, Steinbrecher et al. demonstrated that FEP+, a free energy perturbation method based on molecular dynamics (MD) simulations with a single force field,46 can predict ∆∆G values of single point mutations;47 for 712 mutations in 10 different proteins, the correlation coefficient (r) between experimental and computed ∆∆G values was ~0.74.

In contrast, based on a set of mutations for which experimental data had been reported at least twice, Potapov et al. reported that the correlation coefficient (r) of ∆∆G between independent experiments was 0.86.48 This result is consistent with a recent theoretical estimate of the natural upper bound on the accuracy of ∆∆G predictions.49 These results suggest that there may still be room for improvement in computational ∆∆G predictions.

The melting temperature (Tm) is a direct indicator of the conformational stability of a protein, and Tm is more commonly used than ∆∆G in the literature to represent conformational stability experimentally. Bekker et al. proposed a computational strategy to assess the conformational stability of single-domain antibodies and demonstrated that the fraction of native contacts (Q-value)50 computed from high-temperature (400 K) MD simulations was correlated with the Tm of single-domain antibodies;51 using seven single-domain antibodies with Tm values ranging from 47℃ to 85℃, they observed a reasonable correlation (r = 0.79) between the Q-values and the Tm values reported in the literature. When the Q-values were computed from hydrophilic residues only, the correlation coefficient (r) increased to 0.84, implying that favorable interactions of hydrophilic residues lead to stabilization of single-domain antibodies. Based on these observations, the authors proposed a few mutations predicted to enhance the conformational stability of a single-domain antibody. Recently, Zabetakis et al. experimentally tested the mutations proposed by Bekker et al. and demonstrated that the mutations indeed improved the conformational stability of the antibody;52 however, it also turned out that these stability-enhancing mutations reduced the binding affinity for the antigen, illustrating the difficulty of simultaneously improving conformational stability and other properties.
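The Q-value used by Bekker et al. is the fraction of native contacts retained in a simulation snapshot. The following sketch uses a simple hard-cutoff definition on Cα coordinates (8 Å cutoff, sequence separation of at least four residues); these parameters, the function names, and the toy hairpin coordinates are assumptions, and published definitions may use heavy atoms or a smooth switching function instead.

```python
import numpy as np


def native_contacts(coords, cutoff=8.0, min_separation=4):
    """Residue pairs (i, j) with j - i >= min_separation closer than cutoff (in Å)."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    i, j = np.triu_indices(len(coords), k=min_separation)
    keep = dist[i, j] < cutoff
    return i[keep], j[keep]


def q_value(native_coords, frame_coords, cutoff=8.0, min_separation=4):
    """Fraction of native contacts that are still within cutoff in one frame."""
    i, j = native_contacts(native_coords, cutoff, min_separation)
    if i.size == 0:
        raise ValueError("the native structure has no contacts under these settings")
    frame = np.asarray(frame_coords, dtype=float)
    d = np.linalg.norm(frame[i] - frame[j], axis=-1)
    return float(np.mean(d < cutoff))


# Toy Calpha coordinates (Å): a two-stranded hairpin and a fully extended chain
strand1 = [[3.8 * k, 0.0, 0.0] for k in range(10)]
strand2 = [[3.8 * (9 - k), 5.0, 0.0] for k in range(10)]
native = np.array(strand1 + strand2)
extended = np.array([[3.8 * k, 0.0, 0.0] for k in range(20)])
print(q_value(native, native), q_value(native, extended))
```

In a protocol like the one described, Q would be averaged over the frames of a high-temperature trajectory and compared with experimental Tm values; restricting the contact list to pairs involving hydrophilic residues would correspond to the hydrophilic-residue variant mentioned above.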


Prediction of colloidal stability and solubility

Protein aggregation has been an active area of research, especially in the context of developing protein therapeutics. In principle, aggregation and solubility are distinct phenomena because aggregation is an irreversible process, whereas solubility is typically considered a reversible one. In practice, however, methods to predict aggregation rates have also been exploited quite effectively to predict the solubility of proteins.53,54 Therefore, the terms “aggregation propensity” and “solubility” have sometimes been used interchangeably in computational method development.55 There have been several comprehensive reviews of computational studies of the colloidal stability and solubility of therapeutic proteins.20,55–57

Currently, several computational methods are available to predict APRs and aggregation rates. These methods are based mainly on sequence composition and on propensities such as hydrophobicity, charge, and secondary structure propensity. β-Strands tend to aggregate more than α-helices.58 Thus, antibodies, which consist of multiple β-strands, are likely to have an intrinsic aggregation propensity. In fact, antibody light chains are known to be a cause of amyloidosis.59 In this context, David et al. developed an algorithm for predicting the amyloidogenesis of antibody light chains based on a Bayesian classifier and a decision tree.60 Furthermore, using the same dataset of antibody light chains, Liaw et al. developed an algorithm called AbAmyloid based on a random forest classifier with dipeptide composition information.61
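Dipeptide composition, the type of feature used by AbAmyloid, is a 400-dimensional vector of the frequencies of overlapping residue pairs. The sketch below computes such a vector for a toy input; how AbAmyloid normalizes or selects these features, and the random forest settings it uses, are not reproduced here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def dipeptide_composition(seq):
    """Frequencies of the 400 overlapping dipeptides in `seq` (sums to 1)."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    seq = seq.upper()
    for a, b in zip(seq, seq[1:]):
        pair = a + b
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


features = dipeptide_composition("ACAC")  # toy input
print(len(features), features[DIPEPTIDES.index("AC")])
```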

More general methods to predict APRs and aggregation rates include TANGO,62 PAGE,63 Waltz,64 PASTA,65 Zyggregator,66 and Aggrescan3D (A3D),67 to name just a few. For a more comprehensive list, we refer readers to previous review articles.20,55 As with ∆∆G predictions, the majority of these methods are based on machine learning, and many of the features used in model construction are identical to those used in the prediction of conformational stability, although there are some notable exceptions.68–74 One such example is the SOLart protein solubility predictor,75 which exploits solubility-dependent distance potentials derived from crystal structures.76 To develop SOLart, a random forest model was trained on proteins that had been expressed with the cell-free expression system PURE and whose solubilities had been experimentally measured.77 The features used in SOLart are listed in Table I. A benchmark test showed that the solubility values predicted by SOLart correlated with experimental solubility scores (r = 0.65) better than those of the nine other methods used in the benchmark.75

To relate protein features to protein solubility or aggregation propensity, Warwicker and coworkers analyzed protein surface features and found that the feature most strongly associated with solubility was the amount of positively charged residues on the surface: the more positive the protein surface, the less soluble the protein.78 The fact that this apparent correlation was not observed for negatively charged residues suggested that interactions between expressed proteins and nucleic acids might lead to insolubility. In another study, the same group suggested that an increased ratio of lysine to arginine content is a feature that commonly accompanies proteins with high solubility and relatively high expression and abundance levels.79 Based on these observations, the authors employed a linear model of 35 features, including 20 amino acid compositions, seven compositions of charged and hydrophobic residues, and several other features. This model led to the Protein-Sol application, which predicts protein solubility from amino acid sequences.80 A recent experimental study has also demonstrated that the arginine/lysine ratio is an important determinant of the colloidal stability of an antibody.81 More recently, the Protein-Sol application has been updated to incorporate structural information, enabling structure-based assessment of protein solubility through additional electrostatic potential calculations as well as the visualization of surface patches on protein structures.82
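The composition-based features discussed above (amino acid fractions, charge content, and the lysine/arginine balance) are straightforward to compute from a sequence. The sketch below is illustrative only: the feature names are hypothetical, the charge estimate ignores histidine and the termini, and it does not reproduce the 35-feature Protein-Sol model or its weights.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_features(seq):
    """Illustrative composition features; not the Protein-Sol feature set or weights."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    feats = {f"frac_{aa}": seq.count(aa) / n for aa in AMINO_ACIDS}
    n_lys, n_arg = seq.count("K"), seq.count("R")
    n_acid = seq.count("D") + seq.count("E")
    feats["frac_K_minus_R"] = feats["frac_K"] - feats["frac_R"]
    # Lys/Arg ratio is undefined without Arg; report infinity in that case
    feats["lys_arg_ratio"] = n_lys / n_arg if n_arg else float("inf")
    # Crude net charge near neutral pH: ignores His and the termini
    feats["approx_net_charge"] = (n_lys + n_arg) - n_acid
    return feats


f = composition_features("KKRDEAVLS")  # hypothetical sequence
print(f["lys_arg_ratio"], f["approx_net_charge"], round(f["frac_K_minus_R"], 3))
```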

These prediction methods can extract insightful statistics from large-scale sequence or structure data. They are usually fast enough to compute APRs in a high-throughput manner and could therefore be exploited to analyze antibody sequences from an antibody library. For instance, TANGO62 and PAGE63 have been used to identify potential APRs in commercial antibodies.83 Such high-throughput assessments would not be possible experimentally, and prediction-based analyses could therefore provide valuable insights into the relationships between protein sequence, structure, and aggregation propensity. However, the colloidal stability of proteins still involves unknown aggregation mechanisms, and given the current accuracy of prediction algorithms, experimental verification is still required to draw final conclusions.

A more rational way to characterize the colloidal stability of proteins is the analysis of protein surface patches. The underlying assumption is that exposed hydrophobic patches on protein surfaces lead to self-oligomerization. Based on this idea, Trout and coworkers proposed a measure called spatial aggregation propensity (SAP), which quantifies the exposure of hydrophobic residues, computed from a crystal structure or averaged over snapshots from MD simulations.84 Antibodies are expected to gain colloidal stability if point mutations are introduced into the predicted hydrophobic patches so that the patches become more hydrophilic. To enable rapid in silico screening of antibodies, the same group subsequently developed another metric called the Developability Index (DI), which is calculated from a combination of the SAP score and the net charge of the target protein.85 The rationale is that, in addition to the obvious importance of hydrophobicity on the protein surface, electrostatics is also an important factor for solution-phase reactions and for protein aggregation.
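A simplified, residue-level version of the patch idea behind SAP, together with a DI-like combination with net charge, might look as follows. The published SAP is defined at the atomic level with a specific hydrophobicity scale, whereas here hydrophobicity values, relative exposures, and residue centroids are user-supplied inputs. The DI is assumed to take the form SAP − β·(net charge)², with β treated as a free parameter and the protein-level SAP taken as the sum of positive residue scores. All of these choices, and the function names, are assumptions for illustration.

```python
import numpy as np


def residue_sap(centroids, rel_sasa, hydrophobicity, radius=10.0):
    """Simplified residue-level SAP-like score.

    For each residue i, sum rel_sasa[j] * hydrophobicity[j] over all residues j
    (including i) whose centroids lie within `radius` Å of residue i.
    """
    xyz = np.asarray(centroids, dtype=float)
    contrib = np.asarray(rel_sasa, dtype=float) * np.asarray(hydrophobicity, dtype=float)
    dist = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    return (dist <= radius).astype(float) @ contrib


def developability_index(sap_values, net_charge, beta):
    """Assumed DI form: sum of positive SAP scores minus beta * net_charge**2."""
    sap_total = float(np.sum(np.clip(sap_values, 0.0, None)))
    return sap_total - beta * net_charge ** 2


# Hypothetical three-residue example (hydrophobicity > 0 means hydrophobic)
centroids = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [30.0, 0.0, 0.0]]
sap = residue_sap(centroids, rel_sasa=[1.0, 0.5, 1.0], hydrophobicity=[0.4, 0.2, -0.5])
print(sap, developability_index(sap, net_charge=2, beta=0.1))
```

In this toy case the two neighboring exposed hydrophobic residues form a patch with a positive score, and a larger net charge lowers the DI-like value, mirroring the rationale that electrostatic repulsion counteracts hydrophobic self-association.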

Another computational method to design aggregation-resistant proteins is the CamSol method developed by Vendruscolo and coworkers,54 which is an extension of the sequence-based aggregation predictor Zyggregator.66 CamSol first calculates a sequence-based, residue-wise solubility profile using a score given by a linear combination of physicochemical properties of amino acids. The score is smoothed over a window of seven residues, and the sequence-based profile is further modified based on structural information. Designable positions are then identified from the structure-based profile, and all possible variants are screened with the sequence-based solubility score to identify the most solubilizing mutations. The authors benchmarked the accuracy of CamSol using 56 previously published protein variants (including 34 antibodies) to test whether CamSol could classify the proteins as soluble or insoluble; 54 of the 56 proteins were correctly classified. In another study, the same group used saturation concentration analysis, DLS, and analytical SEC to demonstrate that CamSol could identify mutations that improved the solubility of a single-domain antibody.54
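The CamSol workflow described above (per-residue score, seven-residue smoothing, exhaustive screening of substitutions at designable positions) can be sketched as follows. The intrinsic score used here, the negative Kyte-Doolittle hydropathy plus a fixed bonus for charged residues, and the ranking criterion (change in the mean smoothed score) are hypothetical stand-ins rather than CamSol's parameters, and the structure-based correction step is omitted.

```python
import numpy as np

# Kyte-Doolittle hydropathy values (positive = hydrophobic)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
CHARGED = set("DEKR")


def intrinsic_score(aa, charge_bonus=1.0):
    """Hypothetical per-residue solubility score: -hydropathy plus a bonus for charge."""
    return -KD[aa] + (charge_bonus if aa in CHARGED else 0.0)


def smoothed_profile(seq, window=7):
    """Per-residue scores averaged over a centered window (zero-padded at the ends)."""
    if len(seq) < window:
        raise ValueError("sequence shorter than the smoothing window")
    raw = np.array([intrinsic_score(aa) for aa in seq.upper()])
    return np.convolve(raw, np.ones(window) / window, mode="same")


def rank_single_mutants(seq, designable, window=7):
    """Score every substitution at the designable (0-based) positions.

    Ranking by the change in the mean smoothed score is an assumption of this sketch.
    """
    seq = seq.upper()
    base = smoothed_profile(seq, window).mean()
    ranked = []
    for pos in designable:
        for aa in KD:
            if aa == seq[pos]:
                continue
            mutant = seq[:pos] + aa + seq[pos + 1:]
            gain = smoothed_profile(mutant, window).mean() - base
            ranked.append((gain, f"{seq[pos]}{pos + 1}{aa}"))
    ranked.sort(reverse=True)
    return ranked


print(rank_single_mutants("SGVILLAVGS", designable=[4])[:3])  # hypothetical input
```

With a single interior designable position, this toy ranking simply favors the residue with the highest intrinsic score; in the actual method, the structural correction determines which positions are worth mutating.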

Sankar et al. proposed AggScore as a method to predict and evaluate APRs from protein structures.86 Based on an input structure, the method quantifies the energetic contribution of each residue to hydrophobic and electrostatic surface patches. The AggScore function was parameterized on a previously published dataset of mutants of engineered immunoglobulin-like domains.87 The optimized function produced a better correlation coefficient (r = 0.85) with the percentage of inclusion body formation than Zyggregator66 and Aggrescan88 (r = 0.81 and 0.84, respectively). AggScore can also discriminate between amyloidogenic and non-amyloidogenic hexapeptides; in a receiver operating characteristic (ROC) curve analysis, it produced a higher area under the curve (AUC = 0.81) than Zyggregator66 and WALTZ89 (AUC = 0.77 and 0.78, respectively). In another benchmark, the authors compared the results with retention times measured in HIC, SMAC, and CIC assays of 137 clinical-stage antibodies;90 AggScore produced higher AUC values (0.75, 0.76, and 0.70 for the HIC, SMAC, and CIC retention times, respectively) than Zyggregator66 (0.50, 0.58, and 0.54) and Aggrescan91 (0.54, 0.69, and 0.61). AggScore has been implemented in the BioLuminate package of Schrödinger.

373
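The AUC values quoted above can be read as the probability that a randomly chosen positive example (for instance, an amyloidogenic hexapeptide) receives a higher score than a randomly chosen negative one. An AUC of 0.5 therefore corresponds to chance. A small sketch of that rank-based definition, with hypothetical scores:

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney formulation). O(P*N), fine for small sets.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predictor scores for four hexapeptides (True = amyloidogenic).
print(roc_auc([0.9, 0.4, 0.6, 0.1], [True, True, False, False]))  # 0.75
```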

Based on previously published experimental production yields of nanobodies and ∆∆G values upon point mutation,92,93 Soler et al. used homology models of the nanobodies to test whether conformational and colloidal stabilities correlated with production yields; they found that neither the experimental ∆∆G values nor the scores predicted by FoldX, CamSol, and A3D could predict production yields.94 The authors subsequently proposed a computational protocol to predict production yields. MD simulations were first performed to identify regions affected by mutations, based on differences in residue contact area maps computed with the SPACE suite95 between the mutants and the reference crystal structures. Together with the exposed hydrophobic residues identified by InterProSurf,96 these regions were assumed to be aggregation hotspots and were therefore used as potential binding sites in docking simulations with HADDOCK.97 Nanobodies with high production yields consistently generated poor docking scores, whereas nanobodies with lower production yields scored better, implying that the low-yield nanobodies formed dimers more favorably via the predicted aggregation hotspots. The better performance of the MD/docking-based protocol was attributed to the fact that most conventional methods, which often rely on a single static structure, do not take into account the long-range effects of mutations on the whole molecular structure.94 Among computational methods, MD simulations are distinctive in that, starting from static structures, they can predict the time course of protein motions and evaluate not only local dynamics but also global dynamics, such as large-scale domain motions and allosteric communication.98
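The idea of locating mutation-affected regions by comparing contact maps can be sketched as below. Two simplifications are assumptions: a plain distance cutoff between one representative atom per residue (8 Å here) replaces the SPACE contact areas, and one structure per state replaces an MD ensemble. Residues that gain or lose any contact are flagged.

```python
import math
from typing import List, Set, Tuple

Coord = Tuple[float, float, float]


def contact_set(coords: List[Coord], cutoff: float = 8.0) -> Set[Tuple[int, int]]:
    """Residue pairs (i < j) whose representative atoms lie within `cutoff` angstroms."""
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                contacts.add((i, j))
    return contacts


def perturbed_residues(reference: List[Coord], mutant: List[Coord], cutoff: float = 8.0) -> List[int]:
    """Residues involved in any contact gained or lost between the two structures."""
    changed = contact_set(reference, cutoff) ^ contact_set(mutant, cutoff)  # symmetric difference
    return sorted({i for pair in changed for i in pair})


# Toy three-residue example: residue 2 moves closer to residue 1 in the mutant.
ref = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
mut = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
print(perturbed_residues(ref, mut))  # [1, 2]
```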

The methods discussed above were trained and validated on limited datasets drawn from a variety of experimental sources. For model construction, a larger dataset generated by fewer facilities would be desirable to avoid bias and minimize the noise introduced when different institutions perform the experiments. With the goal of high-throughput antibody screening, Obrezanova et al. measured aggregation data for 576 antibodies using the Oligomer Detection Assay and SE-HPLC.99 They developed a qualitative prediction algorithm that classifies antibodies by aggregation propensity (low or high) using antibody sequences as input. The method is based on the Adaptive Boosting algorithm, which builds ensembles of classification trees, to bridge the experimental data with numerical parameters derived from principal component analysis of physicochemical properties of amino acids, such as hydrophobicity, electrostatics, polarity, size, steric hindrance, and hydrogen-bonding properties. Using 49 antibodies distinct from those used for training and validation, the authors benchmarked their method against the DI tool85 and showed that it correctly classified 84% of the antibodies as having low or high aggregation risk, whereas the DI tool correctly classified only 53% of the same antibodies.
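To make the boosting idea concrete, here is a minimal stdlib-only sketch of discrete AdaBoost with one-level trees (decision stumps). Each round fits the stump with the lowest weighted error and then up-weights the samples it misclassified. The two "descriptor" columns and the labels are hypothetical toy data standing in for PCA-derived amino acid descriptors; this is not the authors' model.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    """One-level decision tree: +polarity above the threshold, -polarity otherwise."""
    feature, threshold, polarity = stump
    return polarity if x[feature] > threshold else -polarity


def fit_adaboost(X: List[List[float]], y: List[int], n_rounds: int = 10) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost with decision stumps; labels must be -1 (low risk) or +1 (high risk)."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(n_rounds):
        best_err, best_stump = float("inf"), None
        for feature in range(len(X[0])):
            for threshold in sorted({row[feature] for row in X}):
                for polarity in (1, -1):
                    stump = (feature, threshold, polarity)
                    err = sum(w for w, row, label in zip(weights, X, y)
                              if stump_predict(stump, row) != label)
                    if err < best_err:
                        best_err, best_stump = err, stump
        if best_err >= 0.5:  # no weak learner better than chance
            break
        err = max(best_err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, best_stump))
        # Up-weight misclassified samples, down-weight correct ones, then renormalize.
        weights = [w * math.exp(-alpha * label * stump_predict(best_stump, row))
                   for w, row, label in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble: List[Tuple[float, Stump]], x: Sequence[float]) -> int:
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical two-descriptor toy data: +1 = high aggregation risk.
X = [[0.1, 0.3], [0.2, 0.9], [0.7, 0.2], [0.9, 0.8], [0.4, 0.5], [0.8, 0.1]]
y = [-1, -1, 1, 1, -1, 1]
model = fit_adaboost(X, y)
```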

Prediction of chemical stability


Identifying chemical modifications of proteins by LC-MS/MS is often labor-intensive and time-consuming. Several computational methods have therefore been proposed to rapidly assess the chemical stability of therapeutic proteins.24,100–110 Among the most common degradation events are chemical modifications of Asn and Asp residues, which share a degradation pathway.24 Many of the methods for predicting such degradation are statistical, and the experimental data used to derive the prediction models come either from in-house experiments100,102,106,107,109 or from the literature.103,105,110 For example, to understand the origins of Asn deamidation and Asp isomerization, Sydow et al. used mass spectrometry to experimentally characterize 37 antibodies subjected to forced degradation.100 These experimental data, together with homology models of the antibodies, suggested that degradation hotspots could be characterized by their conformational flexibility, the size of the C-terminal flanking residue, and secondary structure. In the same study, several machine learning algorithms were trained on the experimental results, and a decision tree model was proposed as the best method for predicting Asn and Asp degradation. In another study, Yan et al. experimentally characterized Asn deamidation in 10 antibodies under both normal and stressed conditions, leading to a decision tree model that predicts Asn deamidation probability from antibody structures.106 More recently, based on in-house LC-MS/MS experiments and literature data, Delmar et al. used machine learning to predict Asn deamidation probability and rate.107 The training set consisted of 776 Asn residues from 67 antibodies. Based on the chemistry of the degradation pathway, a total of 12 features were considered to train a random forest prediction model (Table I). Among all features, the C-terminal flanking residue and the pentapeptide deamidation half-life had the greatest impact on the categorical prediction of Asn deamidation. On an independent validation set of only 68 Asn residues from antibodies, the authors compared the prediction accuracy of their method with those of other algorithms: the proposed random forest model achieved 95.6% accuracy, whereas the methods of Yan et al.106 and Lorenzo et al.103 achieved 83.8% and 91.2%, respectively.
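Because the C-terminal flanking residue is among the most informative features, even a sequence-only scan gives a first-pass list of candidate sites. The sketch below flags Asn and Asp residues by their dipeptide motif. The risk categories in `MOTIF_RISK` are coarse, illustrative labels (an assumption, not fitted rates), and unlike the cited models the scan ignores structure, flexibility, and solvent exposure.

```python
from typing import Dict, List, Tuple

# Illustrative risk table keyed on (residue, C-terminal neighbour).
# Categories are coarse, hypothetical labels, not fitted rates.
MOTIF_RISK: Dict[str, str] = {
    "NG": "high", "NS": "high", "NT": "medium", "NH": "medium",
    "DG": "high", "DS": "medium", "DT": "medium", "DD": "medium",
}


def scan_hotspots(seq: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, motif, risk) for each Asn/Asp with a listed neighbour."""
    hits = []
    for i in range(len(seq) - 1):
        motif = seq[i:i + 2]
        if motif in MOTIF_RISK:
            hits.append((i + 1, motif, MOTIF_RISK[motif]))
    return hits


print(scan_hotspots("QVNGKDSLN"))  # [(3, 'NG', 'high'), (6, 'DS', 'medium')]
```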

In contrast to statistical methods, physics-based methods are often low-throughput but can provide a rationale for their predictions. In this context, Plotnikov et al. demonstrated that molecular dynamics and quantum mechanical calculations could help predict Asn deamidation and Asp isomerization in antibodies by quantifying free energy barriers along the conformational and chemical reaction pathways.104 A clear advantage of this method is that it does not require any prior experimental data for parameterization. Given that high-throughput experimental assays for assessing therapeutic antibodies are becoming feasible, combining statistical and physics-based methods would be a promising direction for method development.
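One standard way to translate a computed free energy barrier into an observable rate is transition-state theory (the Eyring equation), k = (k_B·T/h)·exp(−∆G‡/RT). The sketch below applies it under that assumption; whether the cited study used exactly this conversion is not stated here, and the 37℃ default temperature and barrier values are illustrative.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J*s
R = 8.314462618      # gas constant, J/(mol*K)


def eyring_rate(dg_barrier_j_per_mol: float, temperature_k: float = 310.15) -> float:
    """First-order rate constant (1/s) from a free energy barrier via the Eyring equation."""
    return (K_B * temperature_k / H) * math.exp(-dg_barrier_j_per_mol / (R * temperature_k))


def half_life_hours(dg_barrier_j_per_mol: float, temperature_k: float = 310.15) -> float:
    """Half-life of a first-order reaction, ln(2)/k, expressed in hours."""
    return math.log(2) / eyring_rate(dg_barrier_j_per_mol, temperature_k) / 3600.0


for barrier_kj in (95.0, 100.0, 105.0):
    print(f"{barrier_kj:.0f} kJ/mol -> t1/2 = {half_life_hours(barrier_kj * 1e3):.2f} h")
```

Because of the exponential, raising the barrier by RT·ln 10 (about 5.9 kJ/mol at 37℃) slows the reaction tenfold, which is why modest errors in computed barriers translate into large uncertainties in predicted half-lives.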

Databases that store experimental information for predictive model construction

Databases that store protein variants and binding affinities measured by a variety of experiments have been developed for studying protein–protein interactions in general111 and antibodies specifically.112 Crystal structures of protein–protein complexes and their corresponding unbound structures are also available for benchmarking docking simulations.113 Similarly, databases of conformational114 and colloidal89,115–118 stabilities as well as solubility77 have been developed and exploited to construct predictive models and statistical potentials. ProTherm is probably the most widely used database of protein thermodynamic stability data; it contains more than 10,000 data points from thermodynamic experiments.114 Table II summarizes the contents and URLs of these databases as of the writing of this review. The datasets used for training and testing are also often provided in the supplementary materials of method papers.35,37 It is also worth noting that scientists at Adimab have published experimental characterizations of ~140 clinical-stage antibodies based on a series of biophysical and forced degradation assays (Table III).25,90,102 These data resources will enhance our understanding of antibody therapeutics and support method development for computer-aided antibody design.

Computer-aided stability engineering of antibodies


Although a large number of computational techniques have been developed in the quest for predictive protein engineering, only a few have been tested experimentally for improving protein stability through point mutations suggested by computational predictions. Regardless of the target property, engineering a protein requires choosing both the location of each mutation and the replacement amino acid. The outputs of computational design are designed amino acid sequences with predicted values or scores (Figure 4). Because the predicted values serve only as guides, the physicochemical properties of the designed proteins must be evaluated experimentally. A score is often expressed as a linear combination of specific physicochemical or structural properties.

Rosetta119 and FoldX120 are widely used automated methods for computationally assessing the effects of point mutations on protein structures. An advantage of these methods over other ∆∆G prediction tools is that Rosetta and FoldX can simultaneously sample the type of side chain or amino acid (i.e., sequence design) and the conformation (i.e., structure prediction), whereas in most other methods the replacement residue must be specified before the ∆∆G is computed, and the designed structures are often not explicitly generated. The computational methods reviewed below are summarized in Table IV.

Predicted ∆∆G as a selection criterion. Most computational studies in antibody engineering have aimed to improve colloidal stability, reflecting the recent focus on the colloidal stability of antibodies in the formulation of protein therapeutics. In contrast, many of the experimental efforts to improve conformational stability by computation have involved enzymes.121

Improving conformational stability is much more straightforward than improving binding affinity. In interface design for affinity improvement, the relative orientation of the two components of a protein–protein complex must be considered in addition to the intrinsic dynamics of each component. In contrast, when conformational stability is the only concern, the design object can be a single component with fewer degrees of freedom. In real-world applications, however, other properties, such as colloidal stability, must be considered alongside conformational stability, and trade-offs between properties have been reported.11 In one such example, Broom et al. built a meta-predictor that combined 11 freely available ∆∆G prediction tools.122 They showed that its predictions were more accurate than those of the individual tools on 605 experimentally verified mutations. Using the meta-predictor, the authors predicted ∆∆G values for all point mutations at each of the 120 ThreeFoil residues not involved in its function. The 10 variants predicted to be stabilizing were selected for experimental characterization, and four of them were indeed found to be more conformationally stable based on experimental ∆∆G values obtained from kinetic unfolding and folding measurements. In contrast to the improved conformational stability, however, all the designed variants exhibited decreased colloidal stability, implying that it is difficult to improve all the physicochemical properties of a protein simultaneously.
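The general shape of a consensus ∆∆G workflow (pool per-tool predictions, then pick the most stabilizing mutations outside functional sites) can be sketched as follows. Plain averaging is an assumed stand-in for whatever combination scheme the meta-predictor actually uses. The tool names, mutation labels, and values are hypothetical, and negative ∆∆G is taken to mean stabilizing.

```python
from statistics import mean
from typing import Dict, List, Set


def consensus_ddg(predictions: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Average the ddG each tool predicts for each mutation (tool -> mutation -> ddG)."""
    pooled: Dict[str, List[float]] = {}
    for tool_predictions in predictions.values():
        for mutation, ddg in tool_predictions.items():
            pooled.setdefault(mutation, []).append(ddg)
    return {mutation: mean(values) for mutation, values in pooled.items()}


def pick_stabilizing(consensus: Dict[str, float], excluded_positions: Set[int], k: int) -> List[str]:
    """Return up to k mutations with the most negative consensus ddG, skipping excluded positions.

    Mutations are labelled like "A12V" (wild-type residue, position, mutant residue).
    """
    eligible = [(ddg, mutation) for mutation, ddg in consensus.items()
                if ddg < 0 and int(mutation[1:-1]) not in excluded_positions]
    return [mutation for _, mutation in sorted(eligible)[:k]]


# Hypothetical predictions from two tools; negative ddG = stabilizing.
preds = {
    "tool_a": {"A12V": -1.0, "G30D": 0.5, "L45F": -2.0},
    "tool_b": {"A12V": -0.6, "G30D": 0.1, "L45F": -1.0},
}
cons = consensus_ddg(preds)
print(pick_stabilizing(cons, excluded_positions={45}, k=10))  # ['A12V']
```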

With the advent of next-generation sequencing technology, deep mutational scanning, a high-throughput method of saturation mutagenesis over entire sequences, has emerged as a powerful tool in protein science,123 and it has also been applied to engineer antibodies.124 Likewise, with increasing computational speed and accuracy, analogous high-throughput saturation mutagenesis in silico is now becoming possible. Using such a computational protocol, Wang et al. improved the conformational stability of an anti-hVEGF antibody;125 they assessed conformational stability by the T50 value, the temperature at which half of the antibody was inactivated in an enzyme-linked immunosorbent assay (ELISA) after heat exposure. In their protocol, the authors first built a homology model of the antibody with RosettaAntibody126 and then docked the model to a crystal structure of the antigen with ZDOCK,127 followed by refinement with SnugDock.128 After building the antibody–antigen complex model, the authors performed virtual scanning mutagenesis with FoldX120 to obtain the ∆∆G (= ∆GMut − ∆GWT) at each non-interface position. The resulting designed mutations were then filtered by computed ∆∆G, local structure entropy,129 and residue frequency statistics of human antibodies, yielding an antibody with 10 mutations and better conformational stability (∆T50 ~7℃ relative to the wild type). Retrospectively, the authors also analyzed the unfolding pathways of the designed mutants using a Gaussian network model.130 This assessment suggested that analyzing the unfolding pathways of proteins before design could help improve design accuracy.
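The multi-criterion filtering step can be sketched as a simple conjunction of thresholds applied to each scanned mutation. The field names, the sign convention for the local structure entropy change, and all three cut-offs (`ddg_max`, `lse_max`, `freq_min`) are hypothetical choices for illustration, not the values used by Wang et al.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    mutation: str       # hypothetical label, e.g. "S30T"
    ddg: float          # FoldX-style ddG = dG_mut - dG_WT (kcal/mol); negative = stabilizing
    lse_change: float   # change in local structure entropy; negative assumed favourable
    human_freq: float   # frequency of the new residue at this position in human antibodies


def filter_candidates(cands: List[Candidate], ddg_max: float = -0.5,
                      lse_max: float = 0.0, freq_min: float = 0.05) -> List[Candidate]:
    """Keep mutations passing all three (hypothetical) thresholds, most stabilizing first."""
    kept = [c for c in cands
            if c.ddg <= ddg_max and c.lse_change <= lse_max and c.human_freq >= freq_min]
    return sorted(kept, key=lambda c: c.ddg)


pool = [
    Candidate("S30T", -1.2, -0.1, 0.30),
    Candidate("A40W", -2.0, 0.2, 0.40),   # rejected: entropy criterion
    Candidate("K60Q", -0.8, -0.3, 0.01),  # rejected: rare in human antibodies
    Candidate("T70V", -0.9, -0.2, 0.15),
    Candidate("G80A", 0.4, -0.1, 0.50),   # rejected: predicted destabilizing
]
print([c.mutation for c in filter_candidates(pool)])  # ['S30T', 'T70V']
```

The human-frequency filter is the part most directly tied to therapeutic use: it discards substitutions that are rarely seen at a position in human antibodies, even when they are predicted to be strongly stabilizing.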

Zhang et al. performed computational design calculations to explore the relationship between the conformational and colloidal stabilities of the Fab region of an antibody whose Tm was 71.8℃, as measured with the UNit instrument (Unchained Laboratories, UK).131 In their strategy, potentially flexible regions were first identified from MD simulations of a homology model of the antibody and from B-factors of crystal structures of homologous antibodies (53–90% sequence identity). The authors then applied Rosetta ∆∆G scanning mutagenesis to the entire sequence (442 residues) of the antibody, producing 8398 model structures (442 × 19 non-native amino acids) in total. Based on the predicted flexible regions and the in silico ∆∆G calculations, 17 variants were selected for experimental validation: 11 stabilizing variants with predicted ∆∆G values ranging from −8.8 to −2.6, whose positions were predicted to be more flexible than other regions, and six destabilizing variants with predicted ∆∆G values ranging from 39.1 to 235.7. As might be expected, although 6 of the 11 stabilizing variants had slightly higher Tm values than the wild type, the improvements were marginal (∆Tm < 1.0℃). This is most likely because the wild-type antibody already had a high Tm (71.8℃), and its sequence might already have been highly optimized for conformational stability. Overall, the stabilized variants tended to show cooperative unfolding and lower aggregation rates, whereas variants with decreased Tm values, that is, decreased conformational stability, aggregated more rapidly.

More recently, in a more sophisticated approach, Lee et al. used homology models built by RosettaAntibody to engineer thermostabilized antibodies.132 Based on visual inspection of the homology models, the authors identified small groups of amino acids (2–5 residues) that interacted with each other to form “clusters”. A Rosetta-based fixed-backbone design protocol was then used to mutate the residues in each small cluster to other amino acids so that the designed positions became more tightly packed. Five of the 13 variants tested experimentally showed small increases in Tm, and two combinations of these designs yielded thermostabilized variants whose Tm values were 4.4℃ and 4.5℃ higher than that of the wild type, respectively. Notably, the same study determined a crystal structure of a thermostabilized variant, and, retrospectively, the homology models used for the computational design were in excellent agreement with it (backbone RMSDs of 0.56 Å and 0.84 Å for the FV region and CDR-H3, respectively), highlighting the utility of homology modeling for stability engineering.

In all three antibody cases above, the designed positions were limited to the framework regions because mutations in the CDRs could have deleterious effects on the binding capability of the designed antibodies. Consistent with this reasoning, in our experience, although mutations in the CDRs can improve conformational stability, such mutations often diminish binding affinity toward antigens.

Although Rosetta119 is one of the most widely used computational methods for biomolecular design, it has been less explored for the design of aggregation-resistant proteins. Based on the previous observation that Asp substitutions at specific positions in human antibodies could decrease aggregation propensity,133 Sakhnini et al. designed and prepared a combinatorial antibody library of 393 Fab variants with single, double, and triple Asp substitutions.134 The authors then screened these variants with Rosetta ∆∆G calculations, eliminating single substitutions that increased ∆∆G by more than 5 Rosetta Energy Units (REU) and double/triple substitutions that increased it by more than 1.5 REU. Twenty-six antibodies remained for experimental characterization. As expected from the lenient ∆∆G criterion, the Tm values of the 26 variants measured by DSF varied somewhat (57.2℃–63.3℃, compared with 61.5℃ for the wild type), but all the variants fully retained binding affinity, and half of them showed aggregation resistance. Retrospectively, the authors computed SAP scores for each variant and compared them with experimental metrics of aggregation propensity: SAP values were not correlated with the percentage of high-molecular-weight species formed after incubation at 45℃ for six days, whereas they were remarkably well correlated with retention times in size-exclusion ultra-performance liquid chromatography (Spearman rank correlation coefficient = 0.94). The authors also found that decreased aggregation propensity, i.e., improved colloidal stability, correlated well with conformational stability: a decreased aggregation propensity was associated with increased conformational stability of the Fab variants (Spearman rank correlation coefficient = −0.87).
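The two quantitative steps in this study, a substitution-count-dependent ∆∆G cut-off and a rank-based correlation between computed and measured quantities, can be sketched as below. The thresholds follow the text. The Spearman implementation averages the ranks of tied values and then takes the Pearson correlation of the ranks; the inputs are toy numbers.

```python
from typing import List, Sequence


def passes_rosetta_filter(n_substitutions: int, ddg_reu: float) -> bool:
    """Thresholds as described: +5 REU for single, +1.5 REU for double/triple substitutions."""
    limit = 5.0 if n_substitutions == 1 else 1.5
    return ddg_reu <= limit


def _average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rho as the Pearson correlation of the average ranks."""
    rx, ry = _average_ranks(x), _average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```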

Supercharging. Whereas scanning mutagenesis is quite useful, alternative, more rational methods have been developed in which candidates are designed rather than exhaustively screened. For instance, Lawrence et al. proposed a supercharging method in which surface residues, identified by the average number of neighboring atoms (within 10 Å) per side-chain atom, are replaced with charged amino acids to increase the thermal resistance of proteins.135 Later, Miklos et al. designed antibodies using another supercharging strategy in which mutation positions were chosen on the basis of Rosetta energetics.136,137 They demonstrated that some of the designed antibodies with ~14 mutations had better refolding capability, as assessed by ELISA binding assays after thermal inactivation by incubation at 70℃ for 1 hour. Some of the designed antibodies also had better conformational and colloidal stabilities, as assessed by DSC and DLS, respectively.137 Interestingly, a stabilized antibody (∆Tm ≈ 2℃) showed 30-fold better binding affinity, as assessed by surface plasmon resonance (SPR), than the parent antibody, even though the altered positions were in the framework regions (FRs) rather than the CDRs. Bruce et al. also designed supercharged single-domain antibodies with ~11 mutations based on buried surface areas; they demonstrated that supercharging strategies could endow small proteins with the ability to penetrate cells without altering their structure and function.138 Although immunogenicity may be a problem in therapeutic applications, supercharging appears to be a powerful approach to designing stable antibodies. However, the effective net charge and the positions of mutations are not universal across antibodies and, despite the high conservation of framework sequences and structures, must be determined case by case: framework mutations tolerated by one antibody are unlikely to be equally acceptable to another because of the subtle, antibody-specific balance between the conserved framework regions and the highly diverse CDRs.
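The exposure criterion of the Lawrence et al. approach (fewer atoms within 10 Å of a residue's side-chain atoms means a more exposed residue) can be sketched as follows. The atom tuple layout, the toy coordinates, and the idea of taking the `n` lowest-scoring residues as supercharging sites are illustrative assumptions. Residues without side-chain atoms (glycine) simply drop out.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# (residue index, is side-chain atom, xyz) -- hypothetical minimal atom record.
Atom = Tuple[int, bool, Tuple[float, float, float]]


def avg_neighbors_per_sidechain_atom(atoms: List[Atom], radius: float = 10.0) -> Dict[int, float]:
    """Average number of other atoms within `radius` of each side-chain atom, per residue."""
    counts: Dict[int, List[int]] = defaultdict(list)
    for k, (res, is_sidechain, xyz) in enumerate(atoms):
        if not is_sidechain:
            continue
        n = sum(1 for m, (_, _, other) in enumerate(atoms)
                if m != k and math.dist(xyz, other) <= radius)
        counts[res].append(n)
    return {res: sum(v) / len(v) for res, v in counts.items()}


def most_exposed(atoms: List[Atom], n: int, radius: float = 10.0) -> List[int]:
    """Residues with the fewest neighbours, i.e. candidate sites for charged substitutions."""
    scores = avg_neighbors_per_sidechain_atom(atoms, radius)
    return sorted(scores, key=scores.get)[:n]


# Toy structure: residues 0 and 1 pack together, residue 2 sticks out.
toy = [
    (0, True, (0.0, 0.0, 0.0)), (0, True, (1.0, 0.0, 0.0)),
    (1, True, (2.0, 0.0, 0.0)), (1, True, (0.0, 2.0, 0.0)),
    (2, True, (30.0, 0.0, 0.0)),
]
print(most_exposed(toy, 1))  # [2]
```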

Spatial aggregation propensity (SAP). Protein self-association can lead to aggregation, so explicitly targeting the interacting region for mutagenesis is a practical approach to designing aggregation-resistant antibodies. In this context, the SAP method has also been employed in structure-based antibody design. Starting from two therapeutic antibodies, rituximab and bevacizumab, Trout and coworkers used SAP calculations to design biobetters with enhanced colloidal stability.139,140 In the bevacizumab design, in addition to simple point mutations, the authors incorporated a glycosylation motif near the high-SAP regions of the antibody, showing that masking APRs with a carbohydrate moiety can be an effective approach to preventing aggregation.
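The per-atom SAP value sums, over residues with side-chain atoms within a radius R of a given atom, the exposed fraction of those side-chain atoms weighted by the residue's hydrophobicity. The sketch below follows that commonly described form but assumes precomputed inputs: per-atom side-chain solvent-accessible areas, fully exposed reference areas, and a hydrophobicity scale are passed in rather than calculated. R = 5 Å is used here only as an illustrative default.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def sap_per_atom(coords: List[Tuple[float, float, float]],
                 residue_of: List[int],
                 sidechain_saa: List[float],
                 exposed_saa: Dict[int, float],
                 hydrophobicity: Dict[int, float],
                 radius: float = 5.0) -> List[float]:
    """SAP value for each atom (all inputs assumed precomputed).

    sidechain_saa[j] is the solvent-accessible area of atom j if it is a
    side-chain atom, else 0. exposed_saa[r] is the side-chain SAA of residue r
    when fully exposed. hydrophobicity[r] is the residue's scale value.
    """
    sap = []
    for xi in coords:
        saa_by_residue: Dict[int, float] = defaultdict(float)
        for j, xj in enumerate(coords):
            if sidechain_saa[j] > 0 and math.dist(xi, xj) <= radius:
                saa_by_residue[residue_of[j]] += sidechain_saa[j]
        sap.append(sum(saa / exposed_saa[r] * hydrophobicity[r]
                       for r, saa in saa_by_residue.items()))
    return sap


# Toy input: two nearby side-chain atoms and one distant backbone atom.
scores = sap_per_atom(
    coords=[(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0)],
    residue_of=[0, 1, 1],
    sidechain_saa=[50.0, 20.0, 0.0],
    exposed_saa={0: 100.0, 1: 80.0},
    hydrophobicity={0: 0.5, 1: -0.2},
)
```

Positive values mark atoms surrounded by exposed hydrophobic side chains; mapping these onto the surface is what identifies the high-SAP patches targeted for mutation or glycan masking.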

Clark et al. also used homology modeling and SAP calculations to design variants of a highly aggregation-prone IgG2;141 mutation positions were chosen based on SAP scores, whereas the substituting residues were chosen by sequence comparison with an aggregation-resistant homologue, resulting in 74 variants with up to 9 mutations. Of these, 32 variants showed enhanced conformational and colloidal stability; 11 of these could still bind the antigen, as confirmed by SPR, and 9 showed biological activity, as confirmed by an assay using natural killer cells.

Antibodies often need to be exposed to acidic environments during formulation and manufacturing.142 Skamris et al. used size-exclusion high-performance liquid chromatography, small-angle X-ray scattering (SAXS), and DLS to characterize the oligomerization kinetics at pH 3.3, and the reversibility upon neutralization, of three antibodies with identical FV regions representing IgG1, IgG2, and IgG4.143 These experiments revealed that, under acidic conditions, IgG1 remains monomeric, whereas the other two undergo a two-phase oligomerization process. After neutralization, IgG2 oligomers partially revert to the monomeric state, whereas IgG4 oligomers tend to aggregate. SAP calculations based on crystal structures of the Fc fragments identified subclass-specific aggregation-prone motifs, indicating that these motifs could explain the two distinct pathways of reversible and irreversible aggregation observed experimentally.

A variety of excipients have been proposed as protein stabilizers.144–148 In this context, SAP calculations have also been used to examine interactions between antibodies and formulation excipients. To gain insight into how formulation excipients affect the aggregation and viscosity of protein therapeutics, Trout and coworkers conducted MD simulations of three different IgG1 antibodies with several carbohydrates.149 They found that sucrose and trehalose reduced antibody aggregation more than sorbitol did because of their larger size and stronger interactions with the high-SAP regions of the antibodies.

CamSol. In rational antibody design and screening, the CamSol method has also been used in combination with experiments. Using nine full-length antibodies characterized by PEG-precipitation assays, DSC, and DSF, Vendruscolo and coworkers showed that soluble lead antibodies can be selected with the improved sequence-based CamSol solubility score immediately after sequencing the screened antibody library.150 Remarkably, the correlation coefficient (r) between the predicted CamSol scores and the experimentally measured solubilities of the nine antibodies was 0.97. Furthermore, Vendruscolo and coworkers designed 16 variants of an antibody with the CamSol method, producing antibodies with a diverse range of solubilities and other physicochemical properties.151 The authors used several experimental techniques (cross-interaction chromatography, standup monolayer adsorption chromatography, affinity-capture self-interaction nanoparticle spectroscopy, hydrophobic-interaction chromatography, and ammonium sulfate precipitation) to assess the developability of the antibody series and compared the experimental results with those of several in silico tools (CamSol, SAP, DI, SolPro, and Protein-Sol). They found that CamSol, SAP, and DI were highly correlated with the experimental measurements; Pearson correlation coefficients as high as 0.91 demonstrated the utility of these computational methods for high-throughput antibody screening.

In another study, based on a crystal structure of an antibody and a homology model of its IgG format, Shan et al.152 designed 15 variants targeting potential self-association hotspots (CDR-L2, CDR-H3, and the CH3 domain) previously suggested by hydrogen–deuterium exchange mass spectrometry (HDX-MS).153 This design produced antibodies with a diverse range of solubilities and other physicochemical properties. The authors assessed self-association using several experimental techniques (affinity-capture self-interaction nanoparticle spectroscopy, DLS, and PEG-precipitation assays) and compared the results with CamSol and SAP calculations. This comparison revealed correlation coefficients (r) between computed scores and experimentally measured solubility as high as 0.93 and −0.84 for CamSol and SAP, respectively.

Solubis. Most computational methods that predict APRs from amino acid sequences do not take into account the conformational stability of proteins. On the one hand, APRs tend to be clusters of hydrophobic residues; buried inside a protein, they most likely contribute favorably to conformational stability in the native, folded state and could trigger aggregation only upon denaturation. On the other hand, APRs on protein surfaces could trigger aggregation under native conditions via hydrophobic intermolecular interactions. To distinguish between these mechanisms, Van Durme et al. developed a method termed Solubis that combines TANGO and FoldX.154 The result is a structure-based method for designing aggregation-resistant proteins that identifies mutations reducing the intrinsic aggregation propensity assessed by TANGO while respecting the conformational stability computed by FoldX.154 Using 11 previously published antibodies, the same group also demonstrated that Solubis could filter TANGO-predicted APRs by simultaneously considering structural information and conformational stability.155 The authors further used Solubis to design antibodies and experimentally verified that one of the designed antibodies exhibited better conformational and colloidal stabilities while preserving its binding capability to the antigen.155
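The core selection logic (lower the sequence-based aggregation score while keeping the structure-based stability penalty small) can be sketched as a two-criterion filter. The sketch assumes that the TANGO score change and the FoldX ∆∆G have already been computed for each mutation. The `max_ddg` tolerance and all example values are hypothetical, and Solubis's use of structural context to decide which APRs matter is not modeled.

```python
from typing import List, NamedTuple


class Mutation(NamedTuple):
    label: str          # hypothetical mutation label
    delta_tango: float  # TANGO score change (mutant - WT); negative = less aggregation-prone
    ddg_foldx: float    # FoldX ddG (kcal/mol); positive = destabilizing


def solubis_like_filter(muts: List[Mutation], max_ddg: float = 0.5) -> List[Mutation]:
    """Keep mutations that lower aggregation propensity without (much) destabilization."""
    kept = [m for m in muts if m.delta_tango < 0 and m.ddg_foldx <= max_ddg]
    return sorted(kept, key=lambda m: (m.delta_tango, m.ddg_foldx))


candidates = [
    Mutation("V33K", -40.0, 0.2),
    Mutation("L50R", -65.0, 2.8),   # strongly reduces APR but predicted destabilizing
    Mutation("I52D", -25.0, -0.3),
    Mutation("S90A", 5.0, -0.4),    # stabilizing but raises aggregation propensity
]
print([m.label for m in solubis_like_filter(candidates)])  # ['V33K', 'I52D']
```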

Brownian dynamics simulations. The examples above centered on mutational studies aimed at enhancing the physical stability of antibodies. A somewhat different approach to improving physicochemical properties is to attach a fusion tag to the terminal regions of proteins. For instance, guided by Brownian dynamics (BD) simulations,156 Nautiyal et al. designed an antibody with enhanced solubility by attaching a solubility-enhancing peptide (SEP) tag.157 By eliminating the solvent degrees of freedom and treating protein structures as rigid bodies, BD simulations not only reduce the computational cost but also allow many protein encounters to be sampled, providing reliable statistics on association kinetics. In the simulations of Nautiyal et al.,157 110 single-chain FV structures, each derived from conventional MD simulations of a homology model of the antibody, were randomly placed, and the solubility of the antibody was estimated from the numbers and sizes of clusters formed during the BD simulations, where two antibodies were considered clustered when they approached within 3.6 Å of each other. The BD simulations suggested that the SEP-tagged antibody remained monomeric more often and formed smaller clusters than the wild type. Experimental verification of this prediction showed that the designed antibody was expressed in the soluble fraction, whereas the wild type was expressed in the insoluble fraction. Further characterization by DLS and CD measurements also demonstrated that the designed antibody had better solubility and even improved conformational stability. The preservation of binding capability in the designed antibody, confirmed by SPR, suggested that the SEP tag could be useful in antibody engineering.

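The cluster analysis described above can be sketched for a single simulation frame: two molecules are linked when any pair of their atoms lies within 3.6 Å, and clusters are the connected components of that contact graph. This is a minimal sketch assuming each molecule is given as a list of (x, y, z) coordinates in Å and ignoring periodic boundaries; it is not the analysis code used by Nautiyal et al., and the function names are illustrative.

```python
import math
from collections import Counter

def min_distance(mol_a, mol_b):
    """Smallest atom-atom distance (Å) between two molecules given as lists of (x, y, z)."""
    return min(math.dist(p, q) for p in mol_a for q in mol_b)

def cluster_sizes(molecules, cutoff=3.6):
    """Connected components of the 'within cutoff' contact graph, largest first."""
    parent = list(range(len(molecules)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the union-find trees shallow
            i = parent[i]
        return i

    for i in range(len(molecules)):
        for j in range(i + 1, len(molecules)):
            if min_distance(molecules[i], molecules[j]) <= cutoff:
                parent[find(i)] = find(j)  # merge the two clusters

    sizes = Counter(find(i) for i in range(len(molecules)))
    return sorted(sizes.values(), reverse=True)

def monomer_fraction(sizes):
    """Fraction of molecules that are not in contact with any other molecule."""
    return sum(1 for s in sizes if s == 1) / sum(sizes)

# Toy frame: two molecules in contact and one isolated molecule
frame = [[(0.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)], [(20.0, 0.0, 0.0)]]
sizes = cluster_sizes(frame)
print(sizes, monomer_fraction(sizes))  # [2, 1] 0.3333333333333333
```

Averaging the cluster-size distribution and the monomer fraction over many frames gives the kind of statistics used to compare the tagged and wild-type antibodies. The all-atom pairwise search is quadratic and is only meant for illustration.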

Prediction of viscosity

Viscosity has garnered much attention as a target property in computer-aided antibody design. Numerous researchers have devoted considerable effort to understanding the molecular origins of the concentration-dependent viscosity behavior of antibodies.158 As with cognate protein–protein interactions, the driving forces of self-interactions are hydrophobicity and electrostatics. Under identical formulation conditions, some antibodies show peculiar viscosity behavior that leads to aggregation, whereas others do not. These observations suggest that the viscosity behavior of antibodies is determined by their amino acid sequences. Considering that the constant domains of antibodies are well conserved in both sequence and structure, the different behaviors are likely due to differences in the variable regions. Indeed, Li et al. experimentally measured the viscosity of 11 antibodies under the same conditions and investigated the relationships between concentration-dependent viscosity and several sequence- and structure-based parameters.159 The authors found that the net charge, pI, zeta potential, and aggregation propensity of the FV regions are likely to be important determinants of the concentration-dependent viscosity behavior observed in antibody solutions. Likewise, Sharma et al. used 14 antibodies to correlate experimentally measured viscosity values with predicted properties of the antibodies, including properties obtained from MD simulations.160 In agreement with the observations of Li et al.,159 the authors concluded that antibody viscosity increases with hydrophobicity and charge dipole distribution, whereas it decreases with net charge.160 The authors also found that 1) fast clearance is correlated with high hydrophobicity of the CDRs and with a highly positive or highly negative net charge, 2) chemical degradation by Trp oxidation is correlated with the average solvent exposure time of Trp residues, and 3) Asp isomerization rates can be predicted from the solvent exposure and flexibility of Asp residues.

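Net charge and pI, two of the sequence-based descriptors mentioned above, can be estimated from a sequence with the Henderson–Hasselbalch equation. This is a minimal sketch: the pKa values below are approximate placeholders (published sets differ, and none of the cited studies necessarily used them), and the calculation ignores structural effects on ionization.

```python
# Approximate pKa values; published sets differ, so treat these as placeholders.
PKA_BASIC = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_NTERM, PKA_CTERM = 8.6, 3.6

def net_charge(seq, ph=7.0):
    """Henderson-Hasselbalch estimate of the net charge of a protein sequence at a given pH."""
    basic = [PKA_NTERM] + [PKA_BASIC[aa] for aa in seq if aa in PKA_BASIC]
    acidic = [PKA_CTERM] + [PKA_ACIDIC[aa] for aa in seq if aa in PKA_ACIDIC]
    positive = sum(1.0 / (1.0 + 10 ** (ph - pka)) for pka in basic)   # protonated fraction
    negative = sum(1.0 / (1.0 + 10 ** (pka - ph)) for pka in acidic)  # deprotonated fraction
    return positive - negative

def isoelectric_point(seq, lo=0.0, hi=14.0, tol=1e-4):
    """Bisection for the pH at which the net charge crosses zero (charge decreases with pH)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

fragment = "EVQLVESGGGLVQPGGSLRLSCAAS"  # illustrative VH framework-like fragment
print(net_charge(fragment, ph=6.0), isoelectric_point(fragment))
```

In practice the sequence would be the full FV (or the full-length antibody), and the formulation pH would be used in place of the default.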

Furthermore, Trout and coworkers have proposed a high-throughput in silico tool, termed the spatial charge map (SCM), to identify highly viscous antibodies from their structures.99 Conceptually similar to SAP calculations, in which APRs are identified by spatial summation of residue hydrophobicity, SCM calculations are based on spatial summation of residue charges, with more emphasis on negative charge; several previous studies have demonstrated that high antibody viscosity correlates better with negative than with positive charges161 on FV regions. Benchmarking of the SCM calculations on 19 antibodies provided by three pharmaceutical and biotech companies showed a clear separation between antibodies with high and low viscosity.

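The spatial-summation idea behind SCM can be sketched as follows: for each side-chain atom, sum the exposure-weighted charges of all side-chain atoms within a radius, then report the magnitude of the summed negative values so that negatively charged surface patches dominate the score. This is a simplified sketch in the spirit of SCM, not the published implementation; the 10 Å radius, the atom tuple layout, and the exposure weighting are assumptions.

```python
import math

def spatial_charge_map(atoms, radius=10.0):
    """
    Simplified SCM-like score.
    atoms: list of (x, y, z, charge, rel_exposure) for side-chain atoms of the FV,
    where rel_exposure = SASA in the structure / SASA of the fully exposed atom.
    Returns per-atom local charge sums and the magnitude of the summed negative values.
    """
    local = []
    for xi, yi, zi, _, _ in atoms:
        s = 0.0
        for xj, yj, zj, q, exposure in atoms:
            if math.dist((xi, yi, zi), (xj, yj, zj)) <= radius:
                s += q * exposure  # exposed charges count fully, buried ones not at all
        local.append(s)
    score = -sum(v for v in local if v < 0)  # only negative patches contribute
    return local, score

# Two exposed acidic atoms close together and one distant basic atom (toy coordinates)
atoms = [(0.0, 0.0, 0.0, -0.5, 1.0), (2.0, 0.0, 0.0, -0.5, 1.0), (30.0, 0.0, 0.0, 0.5, 1.0)]
local, score = spatial_charge_map(atoms)
print(local, score)  # [-1.0, -1.0, 0.5] 2.0
```

Partial charges would normally come from a force field and exposures from a SASA calculation on the FV structure or homology model.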

With the goal of directly assessing the viscosity of antibody solutions, Kumar and coworkers used experimentally measured viscosity data for 16 different antibodies in the same formulation to derive mathematical models that predict concentration-dependent viscosity curves162 and the diffusion interaction parameter (kD)30 of each antibody. The equation obtained by stepwise linear regression for predicting viscosity curves included as independent variables the hydrophobicity of the full-length antibodies and the charges on the FV and hinge regions. The correlation coefficient (r) between the experimental and predicted parameters of solution behavior was 0.54 with leave-one-out cross validation, and the equation was able to predict the concentration-dependent viscosity curves of the antibody solutions reasonably well.162

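The leave-one-out validation used to report the correlation above can be sketched for an ordinary least-squares model: each antibody is held out in turn, the model is refit on the rest, and the held-out predictions are correlated with the measurements. This is a minimal sketch; it omits the stepwise descriptor selection, and the descriptor matrix is a hypothetical placeholder for quantities such as hydrophobicity and FV or hinge charges.

```python
import numpy as np

def loo_pearson_r(X, y):
    """
    Leave-one-out cross validation for an ordinary least-squares model with intercept.
    X: (n_samples, n_features) descriptors; y: (n_samples,) measured response.
    Returns the Pearson r between the held-out predictions and the measurements.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])      # prepend an intercept column
    preds = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i             # hold out sample i
        coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        preds[i] = A[i] @ coef                # predict the held-out sample
    return float(np.corrcoef(preds, y)[0, 1])

# Hypothetical descriptors for 16 antibodies: hydrophobicity, FV charge, hinge charge
rng = np.random.default_rng(1)
X = rng.normal(size=(16, 3))
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(scale=0.5, size=16)
print(loo_pearson_r(X, y))
```

With few samples, leave-one-out r is a more honest estimate than the fit on all data, which is why it is commonly reported for small antibody panels.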

While deriving these predictive models, the authors also found that the diffusion interaction parameter, kD, correlated well with several other parameters, such as the conformational stability, solubility, and electrostatic properties of antibodies.30 To predict kD from either experimentally measured or computationally predicted parameters, several equations were derived by linear regression on these parameters. The kD values predicted by an equation derived purely from predicted parameters (the estimated total charge on the FV and the structure-based calculated hydrophobicity) were highly correlated with the experimental kD values (r = 0.92).

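Experimentally, kD is obtained from the concentration dependence of the diffusion coefficient measured by DLS, commonly written as D(c) = D0(1 + kD·c), where positive kD indicates net repulsive interactions. The sketch below recovers D0 and kD from a straight-line fit; the concentrations and diffusion coefficients are synthetic values chosen for illustration, and units (mg/ml for c, mL/g for kD) are an assumption.

```python
import numpy as np

def diffusion_interaction_parameter(conc_mg_per_ml, diff_coeffs):
    """
    Fit D(c) = D0 * (1 + kD * c) to DLS diffusion coefficients measured at several
    low concentrations. Returns (D0, kD) with kD in mL/g.
    """
    c = np.asarray(conc_mg_per_ml, dtype=float)
    d = np.asarray(diff_coeffs, dtype=float)
    slope, intercept = np.polyfit(c, d, 1)  # straight line D = intercept + slope * c
    d0 = intercept
    kd_ml_per_mg = slope / d0               # slope = D0 * kD
    return d0, kd_ml_per_mg * 1000.0        # mL/mg -> mL/g

# Synthetic data: D0 = 4.0e-7 cm^2/s and kD = 20 mL/g
conc = [2.0, 4.0, 6.0, 8.0, 10.0]
diff = [4.0e-7 * (1 + 0.02 * c) for c in conc]
print(diffusion_interaction_parameter(conc, diff))
```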

Machine learning models are often described as black boxes because they capture correlations rather than causation. To mitigate this limitation of machine learning algorithms, Gentiluomo et al. have proposed an interpretable predictive model based on neural networks to predict the melting temperature (Tm), aggregation onset temperature (Tagg), and diffusion interaction parameter (kD) as a function of pH and salt concentration from the amino acid composition of antibodies.163 Five IgGs provided by the PIPPI consortium (http://www.pippi.kemi.dtu.dk) served as the dataset. After training and testing their method, the authors applied a knowledge transfer process that evaluated the weights of the parameters in the trained networks, helping to explain how the prediction algorithm arrives at its conclusions.

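One simple way to read input importance from the weights of a trained network is a connection-weight heuristic: for a network with one hidden layer and one output, multiply each input-to-hidden weight by the corresponding hidden-to-output weight and sum over hidden units. This is a swapped-in illustration of weight-based interpretation, not necessarily the knowledge transfer procedure of Gentiluomo et al.; the feature names and weights are hypothetical.

```python
import numpy as np

def connection_weight_importance(w_input_hidden, w_hidden_output):
    """
    Signed importance of each input for a network with one hidden layer and one output:
    the sum over hidden units of (input->hidden weight) * (hidden->output weight).
    w_input_hidden: (n_inputs, n_hidden); w_hidden_output: (n_hidden,).
    """
    w_ih = np.asarray(w_input_hidden, dtype=float)
    w_ho = np.asarray(w_hidden_output, dtype=float)
    return w_ih @ w_ho

features = ["pH", "NaCl", "frac_K"]  # hypothetical input features
importance = connection_weight_importance([[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]], [1.0, -1.0])
for name, value in sorted(zip(features, importance), key=lambda t: -abs(t[1])):
    print(name, value)
```

The sign indicates the direction in which an input pushes the prediction, and the magnitude gives a rough ranking; nonlinearities in the hidden layer make this an approximation.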

Engineering viscosity of antibody solutions

The foregoing studies were intended to screen and select antibodies with relatively low viscosity from a large pool of antibodies exhibiting a variety of properties. When only highly viscous antibodies are available, their sequences need to be optimized by design. Here, we review studies that apply such computations at the antibody engineering stage.

In an example of computer-aided viscosity engineering that exploits a homology model of an antibody, Nichols et al. performed two types of mutagenesis: disruption of 1) an APR predicted by TANGO and PAGE, and 2) a negatively charged region.164 Comparing the two strategies, the authors found that disrupting computationally predicted APRs could reduce viscosity but also destabilized the antibody and abolished antigen binding, whereas a charge-neutralizing mutation of a negatively charged surface residue reduced viscosity while maintaining conformational stability and antigen-binding capability. In another study, Kumar et al. also used a homology model of the same antibody as Nichols et al. and designed seven variants to improve its physicochemical properties, based on the free energy change upon mutation as assessed by the residue-scan module in MOE2014.09.165 Improvements were experimentally verified in five of the seven cases. In particular, one variant exhibited better solution behavior, lower viscosity, a reduced diffusion interaction parameter (kD), better solubility, and even improved binding activity toward the antigen.

Chow et al. used a crystal structure of an antibody exhibiting high viscosity and phase separation at high concentration to improve the properties of the antibody.166 Based on the observations that the surface charge distribution computed with the AMBER99 force field in MOE2013 was unbalanced and that there were several contacts between neighboring molecules in the crystal lattice, the authors identified four point mutations that could mitigate these phenomena. Among the mutations that, according to ELISA and SPR, did not affect the antigen-binding affinity, two mutations in CDR-L1, R33G and N35E, reduced viscosity and the propensity for phase separation compared with the wild type. In addition, the mutation S28K in CDR-H1 increased the propensity for phase separation, and F102H in CDR-H3 changed neither viscosity nor phase separation behavior. Taken together, these results highlighted the importance of negative charges in viscous behavior. The authors further examined the relationships between several experimental parameters measured at low concentration (4–15 mg/ml) and the viscous behavior of the antibody at high concentration (>50 mg/ml). They found that the diffusion interaction parameter (kD) measured by DLS, the weight-averaged molecular weight, and the hydrodynamic diameter measured by SLS at low concentration correlated well with the behavior of the antibody in solution.

Geoghegan et al. used a homology model of an antibody together with HDX-MS data to identify designable positions for reducing viscosity.167 Because the region suggested by HDX-MS was still large, the authors further exploited the AggScore implemented in Schrödinger's BioLuminate package, together with the empirical observation that hydrophobic and aromatic residues tend to contribute to self-association, which narrowed the candidates to four positions located in CDR-H1 (H35), CDR-H2 (W50), FR2 (Y49), and CDR-L2 (L54). The experimental mutagenesis results showed that the designed variants indeed exhibited reduced self-association and lower viscosity.

Based on the assumption that the tendency of a protein to self-associate is closely linked to the hydration free energy of the protein in its monomeric state, Kuhn et al. combined MD simulations and 3D-RISM theory to identify point mutations that could optimize the hydration free energies of two antibodies that exhibited high viscosity at high concentrations.168 For these two antibodies, 10 and 18 variants carrying mutations in the framework regions were computationally generated from a crystal structure and a homology model built with MOE, respectively. These variants were then filtered based on hydration free energies computed by 3D-RISM theory and averaged over the MD snapshots. Two of the designed variants were characterized experimentally: one carrying H:E10G/D73N/A76K and L:D60S/E80Q, and the other carrying H:Q13K/D73N/Q115K. Compared with the wild type, both showed improved solubility and reduced viscosity at high concentration, as judged by the dynamic viscosity measured with a rheometer and the second virial coefficients obtained by multi-angle light scattering, respectively.

Jetha et al. used a homology model of an antibody to design a series of 97 variants based on the surface hydrophobicity determined by the Protein Surface Analyzer application in MOE2016.0802, followed by hydrophobic interaction chromatography (HIC) to estimate their viscosities experimentally;169 reduced HIC retention times for 67 variants implied lower viscosity. In addition, 93 variants showed binding comparable to or better than that of the wild type. Overall, 29 variants exhibited both reduced HIC retention times and binding comparable to or better than that of the wild type. Retrospectively, the authors also performed a regression analysis to derive equations that predict HIC retention time from sequence and structural descriptors of antibodies, with a view to high-throughput in silico screening. The resulting equations gave correlation coefficients (r) between experimental and predicted HIC retention times of 0.69 for the 97 variants above and 0.62 for 137 clinical-stage antibodies whose HIC retention times had been published previously.90

The studies described above demonstrate that statistical methods are quite useful for screening and engineering viscous antibodies during drug discovery. In parallel with such statistical and empirical predictions, a thorough understanding of the molecular basis of high viscosity in concentrated antibody solutions remains desirable; molecular simulations can complement empirical predictions to achieve more rational antibody screening and engineering.

Coarse-grained modeling of the behavior of antibodies in solution

In principle, if our understanding of the physics behind antibody structure and dynamics were precise and computational resources were unlimited, atomistic molecular simulations should reproduce the solution behavior of antibodies in a crowded, physiological environment. However, such an ideal situation is still far from reality; the conformational space explored by antibodies and the simulation timescales needed to reproduce solution behavior are still too large to be studied in atomistic detail. Although these fields are steadily improving, studies based on traditional molecular simulations assume that a single molecule exists in a water box170 or that single interactions occur between cognate pairs.171 However, soluble proteins can self-associate in a crowded environment, and such self-association has been suggested to form transient and dynamic clusters in concentrated solutions.27,172,173 Under these circumstances, simplified coarse-grained (CG) representations of antibody molecules and their simulations can prove useful in studying the viscosity behavior of concentrated antibody solutions.

Using 5-µs CGMD simulations at two resolutions, i.e., a one-bead-per-domain model (12 CG sites in an IgG format) and the same model with an additional bead in each CDR and each hinge region (26 CG sites), Chaudhri et al.174,175 studied the solution behavior of the IgG format of two antibodies that differed from each other by only a few mutations in the CDRs but showed very different viscosity behavior with increasing concentration.174 Based on the radial distribution function and potential of mean force computed from simulation trajectories at six concentrations (20, 40, 60, 80, 100, and 120 mg/ml), quantification of the concentration dependence of the solution behavior suggested that inter-domain interactions involving both the Fab and the constant regions lead to the formation of transient intermolecular networks and that these interactions contribute to the increased viscosity of antibody solutions at high concentrations. The CGMD simulations also suggested that the higher-resolution CG model (26-site model) offered little beyond the lower-resolution model (12-site model); in both models, electrostatic interactions at the domain level played a dominant role in determining the self-association of the antibodies, in qualitative agreement with previous experimental studies in which adding NaCl decreased the solution viscosity.176,177 The results of additional CG simulations on charge-swap mutants175 of the two antibodies were also consistent with previous experimental results.178,179 Subsequently, Buck et al. extended this approach to four different antibodies and arrived at similar conclusions: electrostatic complementarity at the domain level was the most important factor governing transient network formation in a highly concentrated antibody solution.180 These results were also supported by a computational study employing an all-atom model of IgG structures. Lapelosa et al. used the same antibodies as Chaudhri et al.174,175 to perform all-atom MD simulations of single IgG structures, and representative solution structures from the MD trajectories were supplied to a subsequent grid-based conformational search to generate plausible dimer model structures.181 Electrostatic interactions were calculated by solving the Poisson–Boltzmann equation. Their results also suggested that electrostatics played a role in self-association.

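The radial distribution function g(r) and the potential of mean force w(r) = −kT ln g(r), used above to quantify concentration-dependent self-association, can be computed from CG coordinates as sketched below. This is a minimal single-frame sketch assuming one site per molecule in a cubic periodic box; it is not the analysis code of Chaudhri et al., and in practice g(r) would be averaged over many frames and computed between specific domain sites.

```python
import numpy as np

def radial_distribution(positions, box_length, r_max, n_bins=50):
    """
    g(r) for N particles (e.g. one CG site per antibody) in a cubic periodic box.
    positions: (N, 3) array in the same length unit as box_length; r_max <= box_length / 2.
    Returns the bin centres and g(r).
    """
    pos = np.asarray(positions, dtype=float)
    n = len(pos)
    diff = pos[:, None, :] - pos[None, :, :]
    diff -= box_length * np.round(diff / box_length)                   # minimum-image convention
    dist = np.sqrt((diff ** 2).sum(axis=-1))[np.triu_indices(n, k=1)]  # unique pairs only
    counts, edges = np.histogram(dist, bins=n_bins, range=(0.0, r_max))
    centres = 0.5 * (edges[1:] + edges[:-1])
    shell_volume = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    ideal_pairs = 0.5 * n * (n - 1) / box_length ** 3 * shell_volume   # uniform (ideal-gas) expectation
    return centres, counts / ideal_pairs

def potential_of_mean_force(g, kT=1.0):
    """w(r) = -kT ln g(r); empty bins (g = 0) give +inf."""
    g = np.asarray(g, dtype=float)
    with np.errstate(divide="ignore"):
        return -kT * np.log(g)

# Uniformly random positions should give g(r) close to 1 at all distances
rng = np.random.default_rng(0)
r, g = radial_distribution(rng.uniform(0.0, 10.0, size=(800, 3)), 10.0, 5.0, n_bins=20)
```

A peak of g(r) above 1 at short range (a well in w(r)) signals attractive self-association; comparing these curves across concentrations is the kind of analysis the CG studies above report.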

More recently, using the same antibodies as Chaudhri et al.,174 Wang et al. performed CG Brownian dynamics (BD) simulations to quantitatively reproduce previous experimental measurements of bulk transport properties.182 Unlike the previous CGMD simulations,174 in which a dielectric constant of 1 was used to assess electrostatic interactions, Wang et al. used a dielectric constant of 80 and thus implicitly accounted for electrostatic screening. As a result, no dense clusters or strong networks were observed; instead, loosely connected clusters emerged in the antibody solutions. Bulk properties of the antibody solutions, such as structure factors, self-diffusivity, and viscosity, computed from the CGBD simulations with microscopic parameters were in quantitative agreement with previous experimental values.178,179,183

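The effect of the dielectric constant discussed above follows directly from Coulomb's law: the pairwise electrostatic energy scales as 1/ε, so moving from ε = 1 to ε = 80 weakens every charge–charge interaction 80-fold. The sketch below shows this for two unit charges; the conversion constant is the commonly used approximate value in kcal·Å/(mol·e²), and the example charges and distance are illustrative.

```python
COULOMB_CONSTANT = 332.0637  # kcal·Å/(mol·e²), approximate

def coulomb_energy(q1, q2, r, dielectric=1.0):
    """Pairwise Coulomb energy (kcal/mol) for charges q1, q2 (units of e) separated by r (Å)."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)

vacuum_like = coulomb_energy(+1, -1, 10.0, dielectric=1.0)
water_like = coulomb_energy(+1, -1, 10.0, dielectric=80.0)
print(round(vacuum_like, 2), round(water_like, 2))  # -33.21 -0.42
```

An attraction of roughly 33 kcal/mol between oppositely charged CG sites at 10 Å would drive dense networks, whereas about 0.4 kcal/mol is below thermal energy (about 0.6 kcal/mol at room temperature), which is consistent with the loosely connected clusters reported when ε = 80.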

Small-angle X-ray scattering (SAXS) has been used to study the self-association of antibody molecules in solution, and the resulting scattering profiles have been interpreted with simple spherical models interacting through potentials comprising long-range repulsion and short-range attraction.183–185 Corbett et al. went one step further by using CGMD simulations with a three-bead model, which reproduced features of the SAXS profiles that spherical models did not capture.186

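The spherical models mentioned above start from the form factor of a homogeneous sphere, P(q) = [3(sin qR − qR cos qR)/(qR)^3]^2; in the usual decoupling approximation, the measured intensity is proportional to P(q) multiplied by a structure factor S(q) that encodes the interparticle potential. The sketch below computes only P(q), assuming q in 1/Å and radius R in Å; deriving S(q) from a repulsion-plus-attraction potential is beyond this illustration.

```python
import math

def sphere_form_factor(q, radius):
    """
    Normalized scattering form factor P(q) of a homogeneous sphere:
    P(q) = [3 (sin(qR) - qR cos(qR)) / (qR)^3]^2, with P(0) = 1.
    q in 1/Å, radius in Å.
    """
    x = q * radius
    if x < 1e-6:
        return 1.0  # small-angle limit
    amp = 3.0 * (math.sin(x) - x * math.cos(x)) / x ** 3
    return amp * amp

for q in (0.0, 0.02, 0.05, 0.09):
    print(q, sphere_form_factor(q, 50.0))
```

The minima of P(q) (the first at qR ≈ 4.49) are a signature of the idealized sphere that a multi-domain antibody does not show, which is one reason bead models with Fab and Fc sites can capture features that spherical models miss.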

PREDICTION AND ENGINEERING OF IMMUNOGENICITY OF ANTIBODIES

For therapeutic antibodies, poor physicochemical properties, such as low stability leading to (partial) unfolding and aggregation, are significant risk factors for deleterious immune responses in patients. Assessing and predicting immunogenicity are therefore also important steps in antibody drug discovery.

Prediction of humanness and immunogenicity

Table V summarizes the computational methods used to assess, predict, and reduce protein immunogenicity. Several factors may contribute to the immunogenicity of antibodies. Given how the immune system works, an obvious factor would be sequence identity to human antibodies. To address this, Abhinandan and Martin compared the amino acid sequences of human and mouse antibodies to determine the degree of humanness of mouse antibody sequences.187 Based on 3097 light chains and 3409 heavy chains in the Kabat database, the authors derived Z-scores calculated from the means and standard deviations of pairwise sequence identities within human sequences and between human and mouse sequences, respectively. The Z-scores represent how typical a sequence is of the human repertoire. However, when the Z-scores were applied to 12 therapeutic antibodies whose anti-antibody response data had been reported, the very poor correlation between the anti-antibody response data and the Z-scores (r = −0.12) suggested that there was no direct relationship between the humanness score and immunogenicity. A web server, SHAB, was developed to compute the Z-scores from an amino acid sequence so that anyone can assess the degree of humanness of their antibodies (Table V). However, antibodies evolve diverse mature sequences from a limited set of germline sequences, and germline gene usage is not evenly distributed in antibody populations. To avoid any influence of this biased germline usage on the assessment of antibody humanness, Thullier et al. proposed another Z-score-based metric that incorporates human germline gene information. This metric is called the G-score.188

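The Z-score idea can be sketched as follows: compute the query's mean identity to a human reference set, and express it relative to the distribution of mean identities that human sequences have to the rest of the set. This is a simplified sketch, not the SHAB implementation; it assumes the sequences are already aligned to equal length (for example, by a numbering scheme), and the short toy sequences are hypothetical.

```python
import statistics

def identity(a, b):
    """Fractional identity of two pre-aligned, equal-length sequences; columns gapped in both are ignored."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    return sum(x == y for x, y in columns) / len(columns)

def mean_identity(seq, references):
    return statistics.mean(identity(seq, r) for r in references)

def humanness_z(query, human_seqs):
    """
    Z-score of the query's mean identity to a human reference set, relative to the
    distribution of each human sequence's mean identity to the rest of the set.
    """
    per_human = [mean_identity(s, human_seqs[:i] + human_seqs[i + 1:])
                 for i, s in enumerate(human_seqs)]
    mu, sd = statistics.mean(per_human), statistics.stdev(per_human)
    return (mean_identity(query, human_seqs) - mu) / sd

humans = ["ACDEFG", "ACDEFH", "ACDEYG", "ACQEFG"]  # toy "human" reference set
print(humanness_z("ACDEFG", humans), humanness_z("WWWWWW", humans))
```

Positive values indicate a sequence more typical of the reference repertoire than an average member; strongly negative values flag atypical sequences. A germline-aware variant in the spirit of the G-score would restrict or reweight the reference set by germline family.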

Germline sequences of antibodies are attractive references for assessing humanness because they are 100% human in origin. Pelat et al. therefore developed a germinality index (GI), defined as the percentage of residue identities in framework regions between a given antibody sequence and the closest human germline sequence in the IMGT database.189 The GI was used to humanize an antibody derived from a non-human primate; the resulting humanized antibody had a higher GI than a fully human antibody while preserving its antigen-binding capability.189

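Following the definition above, the GI is the framework percent identity to the closest germline. The sketch below assumes that the query and the germlines are already aligned to a common numbering and that framework positions are supplied as 0-based indices; the germline names, sequences, and positions are hypothetical, not IMGT entries.

```python
def germinality_index(seq, germlines, framework_positions):
    """
    Percentage of framework positions identical to the closest germline.
    seq and each germline are assumed pre-aligned (same numbering and length);
    framework_positions is a list of 0-based indices (hypothetical numbering).
    Returns (best_germline_name, GI in percent).
    """
    best_name, best_gi = None, -1.0
    for name, germline in germlines.items():
        same = sum(seq[i] == germline[i] for i in framework_positions)
        gi = 100.0 * same / len(framework_positions)
        if gi > best_gi:
            best_name, best_gi = name, gi
    return best_name, best_gi

germlines = {"g1": "ACDQFGHIR", "g2": "WCDEFGHIK"}  # hypothetical germlines
print(germinality_index("ACDEFGHIK", germlines, [0, 1, 2, 3, 7, 8]))
```

Restricting the comparison to framework positions reflects the GI definition: CDR residues are expected to differ and are excluded from the count.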

Gao et al. developed yet another sequence identity-based method, termed the T20 score analyzer, to quantify the humanness of antibodies.190 The authors first constructed a database of 38,708 human antibody variable-region sequences derived from NCBI IgBLAST.191 A BLAST search of an input antibody sequence is then performed against the database, and the T20 score is obtained by averaging the percent sequence identities between the input sequence and its top 20 matches in the database, rather than the entire population. The authors demonstrated that the T20 score can distinguish human from non-human antibody sequences. Although the T20 score is conceptually similar to the methods of Martin and coworkers,187,188 it differs clearly from the Z-score in the size of its reference database of human antibodies: 38,708 versus 6506 human antibody sequences for the T20 score analyzer and SHAB, respectively. Furthermore, the T20 score was applied to 65 therapeutic antibodies with available immunogenicity data; a weak correlation emerged between the T20 scores and immunogenicity, with a correlation coefficient (r) of ~0.46. Comparing antibodies before and after humanization revealed a clear trend: immunogenicity decreased as the T20 scores increased.

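The T20 scoring step, averaging the identities of the top 20 matches, can be sketched as below. This is a minimal sketch in which a simple percent identity over pre-aligned, equal-length sequences stands in for the BLAST hit identities used by Gao et al.; the toy database is hypothetical.

```python
def percent_identity(a, b):
    """Percent identity of two pre-aligned, equal-length sequences (stand-in for a BLAST hit identity)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)

def t20_score(query, human_db, top_n=20):
    """Mean percent identity of the query to its top_n most similar human sequences."""
    identities = sorted((percent_identity(query, h) for h in human_db), reverse=True)
    top = identities[:top_n]
    return sum(top) / len(top)

human_db = ["AAAA"] * 20 + ["CCCC"] * 5  # toy database
print(t20_score("AAAA", human_db), t20_score("AAAC", human_db))  # 100.0 75.0
```

Averaging only the closest matches makes the score reflect similarity to the most relevant part of the repertoire rather than to all human sequences, most of which derive from unrelated germlines.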

Seeliger192 went beyond simple pairwise sequence comparisons to derive sequence-based statistical potentials from 11,849 human and mouse antibody sequences obtained from the abYsis database.193 Instead of simply computing sequence identities between a given sequence and human antibody sequences, the author incorporated position-specific probabilities of individual amino acids derived from a multiple sequence alignment of each chain type (i.e., heavy, κ-light, or λ-light chains of humans and mice); the resulting potentials were able to distinguish human from mouse antibodies. Using Monte Carlo sampling coupled with the potentials derived from human antibody sequences, the author computationally demonstrated that the sequence of rituximab can evolve into less immunogenic sequences, as predicted by an EpiVax score.194 The sequence-based potentials were later used to design antibody sequences predicted to have better physicochemical properties than the wild type.195 The designed antibodies were characterized experimentally: DSC indicated an improved Tm (68.0 °C and 83.5 °C for the wild type and the most improved design, respectively); SEC showed improved long-term stability of the variants, as reflected by the monomer content of the samples over time under conditions relevant to the biopharmaceutical development process; and SPR showed that binding affinities to the antigen were preserved among the variants.

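A position-specific statistical potential of this kind, and Monte Carlo sampling guided by it, can be sketched as follows: amino acid probabilities are estimated at each aligned position of a human reference alignment, a sequence's energy is the negative sum of log-probabilities, and Metropolis sampling over single substitutions drives a sequence toward lower energy. This is a minimal sketch, not Seeliger's implementation; it assumes an ungapped alignment of standard residues, and the pseudocount, temperature, and toy alignment are hypothetical.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def position_probabilities(alignment, pseudocount=1.0):
    """Per-position amino acid probabilities from an ungapped alignment, with pseudocounts."""
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    probs = []
    for i in range(len(alignment[0])):
        counts = Counter(seq[i] for seq in alignment)
        probs.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return probs

def energy(seq, probs):
    """Statistical potential E = -sum_i ln p_i(a_i); lower values mean a more human-typical sequence."""
    return -sum(math.log(p[aa]) for p, aa in zip(probs, seq))

def metropolis(seq, probs, steps=2000, temperature=1.0, seed=0):
    """Metropolis Monte Carlo over single-point substitutions guided by the potential."""
    rng = random.Random(seed)
    current = list(seq)
    e = energy(current, probs)
    for _ in range(steps):
        i = rng.randrange(len(current))
        old = current[i]
        current[i] = rng.choice(AMINO_ACIDS)
        e_new = energy(current, probs)
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temperature):
            e = e_new            # accept the substitution
        else:
            current[i] = old     # reject and restore the previous residue
    return "".join(current), e

human_alignment = ["ACDE", "ACDE", "ACDQ"]  # toy human reference alignment
probs = position_probabilities(human_alignment)
print(metropolis("WWWW", probs, temperature=0.05))
```

In a real design setting, positions that must not change (for example, CDR residues critical for binding) would be excluded from the moves, and the temperature controls how far the sampled sequences drift from the starting point.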

Similarly, based on a training set of 26,912 human and mouse antibody sequences in the IMGT database, Clavero-Alvarez et al. developed a multivariate Gaussian (MG) model that accounts for correlations between mutations at different positions both within a chain and across the two chains (i.e., H and L).196 The authors sought to distinguish human from mouse sequences under various conditions and found that 1) the CDRs did not carry any relevant species-specific information needed to distinguish the two sequence populations and 2) light chains carried more such information than heavy chains. Furthermore, on another 1388 human and 1379 mouse antibody sequences, the MG model distinguished the two populations slightly better than sequence identity-based methods190 (prediction accuracies of 94% and 91% for the MG model and the best sequence identity-based method, respectively). The MG score was further compared with experimental immunogenicity, defined as the fraction of observed immunogenic responses (appearance of anti-drug antibodies) reported in the literature; the Pearson correlation coefficient (r) between the MG score and immunogenicity was −0.43. Coupled with steepest-descent and simulated-annealing MC simulations, the MG score was used to guide the sequence optimization of seven mouse sequences whose experimentally humanized sequences were also available; the designed sequences differed from the experimentally humanized sequences, although many of the mutations were shared. The implication was that the experimentally humanized sequences are not the only solutions to humanization and that the computational algorithm captured some essential aspects of the humanization procedures currently used in the field.

Adaptive immune responses begin with the presentation of antigenic peptides by HLA molecules to T-cell receptors. On the one hand, short peptide stretches in antibodies that form such T-cell epitopes may therefore lead to immunogenicity; on the other hand, germline sequences of human antibodies may not be recognized as “foreign” because they are 100% human in origin. Based on this assumption, Lazar et al. proposed another metric for assessing the immunogenicity of antibodies, which they called Human String Content (HSC).197 The HSC is computed for each peptide in a target sequence from the number of residues identical to their counterparts in the most similar aligned peptide from a human germline antibody. Computational prediction of HLA-binding peptides has been studied for decades; for recent trends, we refer readers to a review by Song and coworkers and the references therein.198

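The HSC idea can be sketched as a sliding-window comparison: for every 9-residue window (a length typical of HLA class II binding cores), take the best number of identical residues in the same aligned window of any human germline, then report the mean as a percentage. This is a simplified sketch, not the exact formula of Lazar et al.; the window length, the assumption that all sequences are pre-aligned to the same length, and the toy sequences are hypothetical.

```python
def human_string_content(seq, germlines, window=9):
    """
    Simplified HSC: for every aligned window of `window` residues, take the best identity
    to the same window in any germline sequence, and report the mean as a percentage.
    All sequences are assumed pre-aligned to the same length.
    """
    n_windows = len(seq) - window + 1
    best_matches = []
    for start in range(n_windows):
        fragment = seq[start:start + window]
        best_matches.append(max(
            sum(a == b for a, b in zip(fragment, g[start:start + window]))
            for g in germlines))
    return 100.0 * sum(best_matches) / (window * n_windows)

germlines = ["ACDEFGHIKL"]  # toy germline
print(human_string_content("ACDEFGHIKL", germlines), human_string_content("ACDEFGHIKW", germlines))
```

Because each window is scored independently, a single non-germline residue lowers the score only for the windows that contain it, which mirrors the peptide-level view of T-cell epitopes described above.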

In addition to T-cell epitopes, B-cell epitopes on antibody structures can be immunogenic, and interactions between two antibodies, or anti-antibody responses, occur through such immunogenic regions.199 However, the relevant experimental data (i.e., sequences and structures of anti-antibody antibodies) are not readily available. For instance, using 44 antibody–antibody complexes in the Protein Data Bank, Qiu et al. examined whether B-cell epitopes on antibodies have propensities similar to those on generic protein antigens.199 However, because their dataset contained not only immunogenic antibody–antibody complexes but also complexes that may have formed merely through crystal-packing contacts, it was difficult to draw conclusions about differences between B-cell epitopes on antibodies and those on generic protein antigens. The nature of cognate protein–protein interactions is considered to differ phenomenologically from that of crystal-packing contacts.200

As shown in Table V, only two of the six methods for immunogenicity assessment (Z-score and T20 score analyzer) had been implemented as publicly available web servers at the time of writing. The concepts behind the other methods are quite simple, and in-house implementation as web servers or command-line tools would be straightforward. Considering that even fully human antibodies can be immunogenic,201 there is probably no perfect single method for assessing the immunogenicity of antibodies in silico. Examining antibodies from a variety of angles with different techniques will therefore be highly desirable.

Antibody humanization

Antibody humanization was one of the earliest attempts at computer-aided antibody design. The initial approach to humanization was CDR grafting.202 The assumption was that the more similar an antibody sequence is to a human antibody sequence, the lower its immunogenicity. Humanization by CDR grafting often requires back-mutations in the framework regions, and visualizing the three-dimensional structures of antibodies helps identify important residues, such as those that structurally support the CDRs. Framework templates can be obtained by a simple similarity search over the entire sequence or by a similarity search over the CDRs only; the latter is referred to as super-humanization.203,204 The rationale is that the more similar a CDR sequence is to the CDR sequences of human antibodies, the more conserved the framework should be, owing to the conservation of canonical structures; framework residues needed to maintain CDR conformations are assumed to be conserved when CDR sequences are similar between two antibodies, so back-mutations in the framework regions should be unnecessary. In addition, Roguska et al. proposed an alternative technique called resurfacing, in which solvent-exposed residues in the FV region of mouse antibodies are replaced with the corresponding residues observed in human antibodies.205 Statistics of residue frequency at each position in antibodies of humans and other species can now be readily obtained from the abYsis database developed by Swindells et al.193 A web server termed Tabhu206 provides easy access to a large number of annotated human antibody sequences and thereby makes automated template searches and CDR grafting much easier for non-experts.

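At the sequence level, CDR grafting with back-mutations reduces to copying the donor CDRs into a human framework template and then restoring selected donor framework residues. The sketch below assumes that donor and acceptor are already aligned to a common numbering and that CDR boundaries and back-mutation positions are given as 0-based indices; the toy sequences are hypothetical, and choosing which residues to back-mutate (the structural part of the problem) is not addressed.

```python
def graft_cdrs(donor, acceptor, cdr_ranges, back_mutations=()):
    """
    Build a CDR-grafted sequence from pre-aligned donor (e.g. mouse) and acceptor
    (human framework template) sequences of equal length.
    cdr_ranges: list of (start, end) 0-based, end-exclusive index pairs for the CDRs.
    back_mutations: framework indices where the donor residue is restored.
    """
    if len(donor) != len(acceptor):
        raise ValueError("donor and acceptor must be aligned to the same length")
    grafted = list(acceptor)
    for start, end in cdr_ranges:
        grafted[start:end] = donor[start:end]   # transplant the donor CDR
    for i in back_mutations:
        grafted[i] = donor[i]                   # restore a donor framework residue
    return "".join(grafted)

donor = "MMMMCCCMMM"     # toy mouse sequence (CDR at positions 4-6)
acceptor = "HHHHXXXHHH"  # toy human framework template
print(graft_cdrs(donor, acceptor, [(4, 7)]))                       # HHHHCCCHHH
print(graft_cdrs(donor, acceptor, [(4, 7)], back_mutations=(0,)))  # MHHHCCCHHH
```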

Several reviews have surveyed past examples of antibody humanization.207–209 Table V also summarizes representative methods for antibody humanization. In the following paragraphs, we present recent examples of computer-aided humanization that go beyond simple sequence comparisons and visualization of a static structure.

Historically, a crystal structure or a homology model has been used in humanization procedures. However, proteins are dynamic molecules in solution,19 and a conformational ensemble is probably a better representation of a protein. An obstacle in traditional humanization has been the reduction or even loss of binding affinity after CDR grafting; this loss of affinity has been interpreted as structural distortion of the CDRs caused by incompatibility between the grafted CDRs and the framework regions.210 MD simulations are among the best methods for assessing such dynamic effects on protein structures. For example, Zhang et al. used MD simulations to assess mutational effects on antibody structures during a humanization procedure.211 After in silico epitope scanning based on local sequence (6-residue) and spatial (2-residue pairs in space) similarities to human antibodies, several residues were computationally identified as immunogenic. After replacing those residues with the corresponding residues observed in human antibodies, the authors performed 5-ns MD simulations with explicit solvent on homology models of the variants (30 variants in total). Using RMSD as a metric of CDR flexibility, they were able to design humanized variants with binding affinity comparable to that of the original rat antibody, as assessed experimentally by SPR and flow cytometry. In another example, Kunert, Oostenbrink, and coworkers also used MD simulations to predict the effects of back-mutations on antibody structures.212,213 On the assumption that variants with structures and dynamics comparable to those of the original mouse antibody would retain significant binding, they developed a similarity score based on the all-atom RMSD of CDR-H3 to quantify conformational differences between the mouse antibody and the humanized variants during the simulations. Starting from a crystal structure of the mouse antibody or the variant models, MD simulations were performed for

1094

~100 ns in an explicit solvent. The weak correlation between the similarity scores and

1095

binding affinities experimentally measured by bio-layer interferometry (BLI) confirmed that

1096

a requirement of humanization procedures was to identify mutations that could restore

1097

conformations of CDRs. The MD simulations further suggested a few mutations that seemed

1098

to structurally support the conformation of CDR-H3 and thereby affected the binding

1099

capability. These observations were experimentally verified via Ala scanning and BLI

1100

measurements. As a result, starting from a humanized variant that had completely lost its

1101

binding affinity for the antigen, the authors were able to restore the affinity to the level of the

1102

original mouse antibody with some back mutations selected by the MD simulations.
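The RMSD-based comparisons described above can be sketched in NumPy as follows. This is a minimal illustration under our own assumptions, not the protocol of Zhang et al. or of Kunert, Oostenbrink, and coworkers: coordinates are (N, 3) arrays of matched CDR-H3 atoms, each pair of structures is optimally superimposed with the Kabsch algorithm before the RMSD is computed, and the hypothetical `ensemble_similarity` function scores a variant trajectory by the mean, over its frames, of the smallest RMSD to any frame of the parent trajectory (lower means more parent-like).

```python
import numpy as np

def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between matched (N, 3) coordinate sets after optimal rigid superposition."""
    P = P - P.mean(axis=0)                    # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                               # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against improper rotations
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T   # rotation mapping P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def ensemble_similarity(variant_frames, parent_frames) -> float:
    """Hypothetical score: mean over variant frames of the closest-parent-frame RMSD."""
    return float(np.mean([min(kabsch_rmsd(v, p) for p in parent_frames)
                          for v in variant_frames]))

# Toy check: a rotated and translated copy of a structure has RMSD ~0.
rng = np.random.default_rng(0)
loop = rng.normal(size=(12, 3))
theta = np.pi / 5
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0,            0.0,           1.0]])
moved = loop @ rot.T + np.array([3.0, -1.0, 2.0])
print(round(kabsch_rmsd(moved, loop), 6))

# A strongly perturbed "variant" scores worse than the parent ensemble itself.
parent = [loop, loop + rng.normal(scale=0.1, size=loop.shape)]
variant = [loop + rng.normal(scale=1.0, size=loop.shape)]
print(ensemble_similarity(variant, parent) > ensemble_similarity(parent, parent))
```

Superimposing on framework atoms instead of on the loop itself would also capture rigid-body displacement of CDR-H3 relative to the framework; which choice is appropriate depends on the question being asked.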

All the foregoing examples focused on designs or back-mutations in the framework regions to restore binding. Another strategy to restore binding affinity is to redesign the CDRs so that they retain their conformations, i.e., to incorporate residues observed in the CDRs of human antibodies that are compatible with the human framework. However, compared with framework regions, CDRs are hypervariable and are expected to contribute to binding to the cognate antigens, so it is not straightforward to identify such mutations empirically. In such situations, computational protein design can provide a solution. In a study by Hanf et al.,214 protein design calculations were performed with two programs: DEEK, which uses the dead-end elimination (DEE) and A* search algorithms, and Dezymer, which also uses the DEE algorithm.215 The calculations were used to redesign the CDR sequences of an antibody, and the top recommendations from both programs were merged to make the final list of designed variants. The initial structure for computational design was a model of the CDR-grafted antibody with human germline framework regions, built from a crystal structure of the original mouse antibody; the binding affinity of the CDR-grafted antibody measured by ELISA was 100-fold worse than that of the original mouse antibody. To validate the computational design, eight suggested variants were produced experimentally, and two of them exhibited binding affinities comparable to that of the wild type.
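To make the DEE idea concrete, the sketch below applies the classical Goldstein singles criterion to a toy pairwise energy model and then enumerates the surviving combinations exhaustively. It is not DEEK or Dezymer, it replaces A* with brute-force enumeration for brevity, and all energy values are made up. A choice r at position i is eliminated if some alternative t at the same position satisfies E(i_r) − E(i_t) + Σ_j min_s [E(i_r, j_s) − E(i_t, j_s)] > 0, i.e., t beats r whatever the other positions choose.

```python
import itertools
from typing import Dict, List, Tuple

# self_e[i][r]: energy of choice r at position i.
# pair[(i, j)][r][s]: interaction of choice r at i with choice s at j (i < j).
SelfE = List[List[float]]
PairE = Dict[Tuple[int, int], List[List[float]]]

def pair_e(pair: PairE, i: int, r: int, j: int, s: int) -> float:
    """Interaction energy of choice r at position i with choice s at position j."""
    return pair[(i, j)][r][s] if i < j else pair[(j, i)][s][r]

def goldstein_dee(self_e: SelfE, pair: PairE) -> List[List[int]]:
    """Iteratively remove choices that cannot appear in the global minimum."""
    alive = [list(range(len(opts))) for opts in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(len(alive)):
            for r in list(alive[i]):
                for t in alive[i]:
                    if t == r:
                        continue
                    gap = self_e[i][r] - self_e[i][t]
                    for j in range(len(alive)):
                        if j != i:
                            gap += min(pair_e(pair, i, r, j, s) - pair_e(pair, i, t, j, s)
                                       for s in alive[j])
                    if gap > 0:           # t is better than r in every context
                        alive[i].remove(r)
                        changed = True
                        break
    return alive

def total_energy(choice, self_e: SelfE, pair: PairE) -> float:
    e = sum(self_e[i][c] for i, c in enumerate(choice))
    for i, j in itertools.combinations(range(len(choice)), 2):
        e += pair_e(pair, i, choice[i], j, choice[j])
    return e

def best_sequence(self_e: SelfE, pair: PairE):
    alive = goldstein_dee(self_e, pair)
    # Exhaustive search over the survivors stands in for A* here.
    best = min(itertools.product(*alive), key=lambda c: total_energy(c, self_e, pair))
    return best, alive

self_e = [[0.0, 5.0], [1.0, 0.0], [0.0, 0.5]]
pair = {
    (0, 1): [[0.0, 0.0], [0.0, 0.0]],
    (0, 2): [[0.0, 0.0], [0.0, 0.0]],
    (1, 2): [[0.0, 2.0], [1.5, -1.0]],
}
best, alive = best_sequence(self_e, pair)
print(alive)                                   # [[0], [0, 1], [0, 1]]
print(best, total_energy(best, self_e, pair))  # (0, 1, 1) -0.5
```

DEE only discards choices that are provably absent from the global minimum of the given energy function, so the result is exact for that model; the quality of a design still rests on the energy function itself.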

An advantage of incorporating protein design calculations into humanization procedures is that, in addition to immunogenicity, other properties such as stability can be taken into account. Along these lines, Bailey-Kellogg, Griswold, and coworkers have been developing computational deimmunization methods for protein therapeutics.216–223 Instead of CDR grafting, their method identifies short stretches on protein structures that are potential T-cell epitopes and replaces the amino acids that form them, so that these stretches become less likely to act as T-cell epitopes. For instance, using a homology model of a mouse IgG1 antibody as a design target, Choi et al.221 employed 1) the HSC scores197 to assess the immunogenic regions and 2) the OSPREY protein redesign software224 to replace some of the amino acids with amino acids observed in human germline antibody sequences.225 The authors further demonstrated experimentally, by BLI measurements, that four of the eight humanized variants tested exhibited binding affinities within an order of magnitude of that of the original mouse antibody.221 By contrast, a variant designed by traditional CDR grafting could not be expressed, probably because grafting introduced five mutations into the Vernier zone;226 because their computational procedure took energetics into account, it retained all the Vernier zone residues. The importance of Vernier zone residues in humanization has been highlighted by previous experimental studies, in which reduced binding affinity has been attributed to displacement of the VL/VH domains as well as distortion of the canonical structures.227,228
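The epitope-centered view can be illustrated with a simplified, HSC-like score: slide a 9-residue window (the typical core length for MHC class II binding) along the query and record how closely each window matches any 9-mer in a set of human germline sequences. This is our own minimal sketch, inspired by but not identical to the published HSC definition,197 and the "germline" and query strings are made-up placeholders. Low per-window values flag stretches that look non-human and are therefore candidate T-cell epitopes.

```python
from typing import List

def kmers(seq: str, k: int) -> List[str]:
    """All contiguous windows of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def window_identity(a: str, b: str) -> float:
    return sum(x == y for x, y in zip(a, b)) / len(a)

def hsc_like_profile(query: str, germlines: List[str], k: int = 9) -> List[float]:
    """For each query k-mer, the best fractional identity to any germline k-mer."""
    pool = {w for g in germlines for w in kmers(g, k)}
    return [max(window_identity(q, g) for g in pool) for q in kmers(query, k)]

def hsc_like_score(query: str, germlines: List[str], k: int = 9) -> float:
    """Mean of the profile: 1.0 means every window occurs verbatim in the germlines."""
    prof = hsc_like_profile(query, germlines, k)
    return sum(prof) / len(prof)

germline = ["EVQLVESGGGLVQPGGSLRLSCAAS"]   # placeholder "human" stretch
humanlike = "EVQLVESGGGLVQPGGS"
mouselike = "EVKLQESGPSLVKPSQT"            # made-up, deliberately divergent
print(hsc_like_score(humanlike, germline))        # 1.0
print(hsc_like_score(mouselike, germline) < 1.0)  # True
```

In a design setting such a profile would be only one objective; the procedure described above also evaluates energetics so that structurally important residues are kept.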

In another instance, using a crystal structure of cetuximab, Choi et al. also used DSF measurements to show that their computational humanization method can increase the HSC score and simultaneously improve the conformational stability of the antibody (ΔTm ≈ 6.3 °C) while preserving the binding affinity to the antigen.222 Because these T-cell epitope-based deimmunization methods are independent of CDR grafting, they can be applied to other protein therapeutics, such as enzymes and peptides, as demonstrated in other studies.229–233 The integrated methods developed by Bailey-Kellogg, Griswold, and coworkers are publicly available as the EpiSweep package.234
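Improving immunogenicity and stability together is a two-objective problem. A common, generic way to expose the trade-off is to keep only Pareto-optimal designs, i.e., those that no other design beats on both objectives at once. The sketch below is such a filter with invented design names and scores (a predicted epitope count and a computed energy in arbitrary units, both lower-is-better); it is not EpiSweep code.

```python
from typing import Dict, List, Tuple

def pareto_front(designs: Dict[str, Tuple[float, float]]) -> List[str]:
    """Names of designs not dominated on (epitope score, energy); lower is better for both."""
    front = []
    for name, (ep, en) in designs.items():
        dominated = any(
            (ep2 <= ep and en2 <= en) and (ep2 < ep or en2 < en)
            for other, (ep2, en2) in designs.items() if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front, key=lambda n: designs[n][0])

# Hypothetical designs: (predicted epitope count, computed energy, arbitrary units).
designs = {
    "wild_type": (12.0, -50.0),
    "d1": (8.0, -49.0),
    "d2": (5.0, -45.0),
    "d3": (9.0, -44.0),   # worse than d1 on both objectives
    "d4": (3.0, -38.0),
}
print(pareto_front(designs))   # ['d4', 'd2', 'd1', 'wild_type']
```

Choosing among the surviving designs still requires a judgment about how much stability to trade for fewer predicted epitopes, which is where experimental testing of a few points along the front comes in.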


PERSPECTIVES

The predictive tools described here, together with accumulated knowledge of antibody sequences, structures, and properties, should now enable rapid screening and selection of antibodies during the early stages of antibody drug discovery. In drug discovery of small compounds, various rapid screening approaches, such as the estimation of “druggability,” have been employed effectively for the rational design and evaluation of more potent compounds.235 For antibody drug discovery, to our knowledge, the first such metric was the one proposed by Kuroda et al.;8 based on the amino acid sequences of the 12 antibody therapeutics reported in DrugBank236 at that time, the authors pointed out that antibody therapeutics tended to have shorter and more rigid CDR-H3 loops. Furthermore, recent advances in experimental techniques have enabled high-throughput physicochemical characterization of antibody therapeutics; Wittrup and coworkers collected the amino acid sequences of 137 antibodies in clinical stages (phase 2 and 3) and experimentally characterized their physicochemical properties.90 Their dataset should guide computational biologists toward the development of druggability metrics for antibody therapeutics. For instance, Raybould et al. implemented the Therapeutic Antibody Profiler (TAP) webserver to assess the druggability of antibodies from their amino acid sequences.237 Based on a statistical analysis of the 137 clinical-stage antibody therapeutics90 and a comparison with human antibody repertoires,238 the authors found that the total length of the CDRs, surface hydrophobicity, charges in the CDRs, and asymmetry of the surface net charges of the FV domains could serve as guidelines for assessing the druggability of antibodies.
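Some of these guidelines can be approximated directly from sequence, as in the minimal sketch below: total CDR length, CDR net charge, and an FV charge-asymmetry term computed as the product of the VH and VL net charges (a negative product means oppositely charged domains). This is not the TAP implementation, which works from structural models and surface patches. Here net charge is crudely counted as (K + R) − (D + E) at neutral pH with histidine ignored, and the flag thresholds and toy sequences are hypothetical placeholders, not published cut-offs.

```python
from dataclasses import dataclass, field
from typing import Dict, List

def net_charge(seq: str) -> int:
    """Crude net charge at neutral pH: Lys/Arg count +1, Asp/Glu count -1 (His ignored)."""
    return sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")

@dataclass
class Profile:
    total_cdr_length: int
    cdr_net_charge: int
    fv_charge_product: int
    flags: List[str] = field(default_factory=list)

def tap_like_profile(vh: str, vl: str, cdrs: Dict[str, str],
                     max_len: int = 60, max_abs_cdr_charge: int = 4) -> Profile:
    """Hypothetical thresholds; TAP derives its own cut-offs from clinical-stage antibodies."""
    length = sum(len(c) for c in cdrs.values())
    cdr_charge = sum(net_charge(c) for c in cdrs.values())
    product = net_charge(vh) * net_charge(vl)
    flags = []
    if length > max_len:
        flags.append("long CDRs")
    if abs(cdr_charge) > max_abs_cdr_charge:
        flags.append("highly charged CDRs")
    if product < 0:
        flags.append("VH/VL charge asymmetry")
    return Profile(length, cdr_charge, product, flags)

# Toy fragments (made up; far shorter than real variable domains).
vh = "EVQLKRSGGKAS"     # net +2
vl = "DIQMTQSPDSLSE"    # net -3
cdrs = {"H1": "GFTFSDY", "H2": "ISGSGGST", "H3": "ARDRYYFDY",
        "L1": "QSISSY", "L2": "AAS", "L3": "QQSYSTPLT"}
p = tap_like_profile(vh, vl, cdrs)
print(p)   # 42 CDR residues, CDR charge -1, product -6, one asymmetry flag
```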

It is worth noting that the TAP server and many of the other studies described above have employed homology models of antibodies to derive various parameters and to engineer better antibodies. Except for CDR-H3, antibody structures are well conserved;2,4,239 these studies therefore strongly suggest that current antibody modeling techniques are reliable enough to be employed in high-throughput, sequence-based in silico screening. However, structure prediction of CDR-H3 is still very challenging.8,9 Because antibody function depends heavily on the diversity of CDR-H3, methods for predicting the structures of CDR-H3 and of antibody–antigen complexes need to be improved in order to design functional antibodies with better developability. In particular, conformational changes can occur upon antigen binding: in addition to the CDR conformations, the relative orientation of the VL/VH domains can also change,228 making it even more challenging to predict the structures of antibodies and antibody–antigen complexes.240,241

Modeling conformational change or flexibility of proteins in silico is still an unsolved problem; Kuroda and Gray previously demonstrated that the accuracy of current computational methods for modeling protein backbone flexibility is not satisfactory and that subtle backbone displacements can degrade the energy landscapes of proteins in computational modeling.242 Many studies have examined the interrelationships among protein flexibility, aggregation, and chemical stability.243–245 Therefore, the development of methods for modeling protein flexibility is another important area in computer-aided antibody design.

Changes in the binding affinity of antibody–antigen interactions, i.e., in the binding free energy, can be decomposed into enthalpic and entropic contributions (ΔG = ΔH − TΔS); favorable enthalpic contributions are often attributed to the formation of new salt bridges or hydrogen bonds, whereas favorable entropic contributions can be interpreted in terms of rigidification of the antibodies themselves or changes in water dynamics at the interfaces.246–249 Thus, there are most likely multiple routes to improving the binding affinity of antibody–antigen interactions.250 An interesting strategy to improve binding affinity has also been suggested on the basis of framework mutations that, although they do not directly contact the antigen, affect the on-rate of antigen binding.251,252 Elucidating the molecular mechanism of such long-range mutational effects could lead to novel maturation strategies that are not employed by our immune systems.
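The relations behind this paragraph are ΔG° = RT ln(K_D/c°) with c° = 1 M, ΔG = ΔH − TΔS, and K_D = k_off/k_on; the last one is why a framework mutation that only speeds up association can still tighten binding. The sketch below simply evaluates these standard relations; the rate constants and the enthalpy value are invented examples, not data from the cited studies.

```python
import math

R = 8.314462618e-3      # gas constant, kJ/(mol*K)

def kd_from_rates(k_on: float, k_off: float) -> float:
    """Equilibrium dissociation constant (M) from k_on (1/(M*s)) and k_off (1/s)."""
    return k_off / k_on

def dg_from_kd(kd: float, temp: float = 298.15) -> float:
    """Standard binding free energy in kJ/mol (1 M standard state); negative = favorable."""
    return R * temp * math.log(kd)

def minus_t_ds(dg: float, dh: float) -> float:
    """Entropic term -TΔS = ΔG - ΔH in kJ/mol, e.g., with ΔH measured by ITC."""
    return dg - dh

# Hypothetical case: a framework mutation raises k_on 10-fold, k_off unchanged.
kd_parent = kd_from_rates(k_on=1e5, k_off=1e-4)   # 1 nM
kd_mutant = kd_from_rates(k_on=1e6, k_off=1e-4)   # 0.1 nM
ddg = dg_from_kd(kd_mutant) - dg_from_kd(kd_parent)
print(f"dG(parent) = {dg_from_kd(kd_parent):.1f} kJ/mol")   # about -51.4
print(f"ddG = {ddg:.2f} kJ/mol")                             # about -5.71
print(f"-TdS = {minus_t_ds(dg_from_kd(kd_parent), dh=-70.0):.1f} kJ/mol")
```

A 10-fold change in K_D corresponds to about 5.7 kJ/mol (about 1.4 kcal/mol) at 25 °C, whichever of k_on or k_off produces it.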

De novo design of functional proteins is now becoming possible with the guidance of experimental procedures.253–256 For antibody design, the success of a few studies that used profile-based constraints to design antibody sequences254,257 suggests that antibody sequences have already been highly optimized in the evolutionary process (i.e., both in mammalian evolution and in somatic hypermutation). By imposing selection pressure on the sequence design calculations, these profile-based constraints forced the designed sequences to mimic natural variations of antibody sequences. However, some properties, such as viscosity, matter specifically for biopharmaceutical and biotechnological applications and would not have been selected for or against during evolution. Further studies on concentrated antibody solutions, and on methods to optimize such properties, are therefore highly desirable for understanding their molecular basis.
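A profile-based constraint of this kind can be sketched as a position-specific scoring term: from an alignment of natural antibody sequences, estimate per-position amino acid frequencies with pseudocounts, then penalize a designed sequence by its negative log-likelihood under that profile. The sketch assumes a tiny, made-up, gap-free alignment and a hypothetical pseudocount of 1; in an actual design run such a term would be added, with some weight, to a physics-based energy rather than used alone.

```python
import math
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Per-position amino acid probabilities from a gap-free alignment, with pseudocounts."""
    profile = []
    for pos in range(len(alignment[0])):
        counts = Counter(seq[pos] for seq in alignment)
        total = len(alignment) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile

def profile_penalty(seq: str, profile: List[Dict[str, float]]) -> float:
    """Negative log-likelihood of a sequence under the profile (lower = more natural-like)."""
    return -sum(math.log(profile[i][aa]) for i, aa in enumerate(seq))

# Made-up mini alignment of a 6-residue stretch.
alignment = ["ARDYWG", "ARDFWG", "AKDYWG", "ARGYWG"]
prof = build_profile(alignment)
print(profile_penalty("ARDYWG", prof) < profile_penalty("PPPPPP", prof))   # True
```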

In recent decades, various properties of proteins have been predicted from amino acid sequences or structures by machine learning, although such predictions have not yet been effectively incorporated into explicit design processes. Considering that de novo design of small-molecule drugs is still not an easy task,258 de novo creation of functional amino acid sequences by machine learning may not yet be feasible. However, for antibodies in particular, artificial antibodies have been created using in vitro libraries.259 Considering recent advances in computational modeling algorithms and in our knowledge of antibody sequences and structures, de novo creation of antibodies in silico could thus be achieved in the near future.

In this review, we outlined various approaches to engineering the physicochemical and biological properties of antibodies, in which amino acid sequences were modified mainly through computational design. In addition to such mutation-based design approaches, another way to control antibody properties is to use chemical additives during manufacturing processes.260,261 Computation could also play a role in understanding the molecular details of interactions between proteins and such additives.

Taken together, a more precise and quantitative understanding of antibody properties would make it possible to simultaneously optimize the binding affinity, specificity, stability, viscosity, and immunogenicity of amino acid sequences by computational design.


ACKNOWLEDGMENTS

This work was funded in part by the Japan Society for the Promotion of Science (grant numbers JP17K18113 and JP19H03522 to D.K., and JP16H02420 and JP19H05766 to K.T.) and by the Japan Agency for Medical Research and Development (grant numbers JP19fm0208022h, JP18ak0101100h, and JP19ak0101117h to D.K., and JP18am0101094j, JP18dm0107064h, JP18mk0101081h, JP18fm0208030h, JP18fk0108073h, and JP18ak0101100h to K.T.).


REFERENCES

1. Almagro JC, Teplyakov A, Luo J, et al. Second Antibody Modeling Assessment (AMA-II). Proteins Struct Funct Bioinforma. 2014;82(8):1553-1562. doi:10.1002/prot.24567

2. Al-Lazikani B, Lesk AM, Chothia C. Standard conformations for the canonical structures of immunoglobulins. J Mol Biol. 1997;273(4):927-948. doi:10.1006/jmbi.1997.1354

3. Chothia C, Lesk AM, Tramontano A, et al. Conformations of immunoglobulin hypervariable regions. Nature. 1989;342(6252):877-883. doi:10.1038/342877a0

4. Kuroda D, Shirai H, Kobori M, Nakamura H. Systematic classification of CDR-L3 in antibodies: Implications of the light chain subtypes and the VL-VH interface. Proteins Struct Funct Bioinforma. 2009;75(1):139-146. doi:10.1002/prot.22230

5. North B, Lehmann A, Dunbrack RL. A new clustering of antibody CDR loop conformations. J Mol Biol. 2011;406(2):228-256. doi:10.1016/j.jmb.2010.10.030

6. Shirai H, Kidera A, Nakamura H. Structural classification of CDR-H3 in antibodies. FEBS Lett. 1996;399(1-2):1-8. doi:10.1016/S0014-5793(96)01252-5

7. Shirai H, Kidera A, Nakamura H. H3-rules: identification of CDR-H3 structures in antibodies. FEBS Lett. 1999;455(1-2):188-197. doi:10.1016/S0014-5793(99)00821-2

8. Kuroda D, Shirai H, Kobori M, Nakamura H. Structural classification of CDR-H3 revisited: A lesson in antibody modeling. Proteins Struct Funct Bioinforma. 2008;73(3):608-620. doi:10.1002/prot.22087

9. Weitzner BD, Dunbrack RL, Gray JJ. The origin of CDR H3 structural diversity. Structure. 2015;23(2):302-311. doi:10.1016/j.str.2014.11.010

10. Kuroda D, Tsumoto K. Antibody Affinity Maturation by Computational Design. In: Methods in Molecular Biology. 2018:15-34. doi:10.1007/978-1-4939-8648-4_2

11. Rabia LA, Desai AA, Jhajj HS, Tessier PM. Understanding and overcoming trade-offs between antibody affinity, specificity, stability and solubility. Biochem Eng J. 2018;137:365-374. doi:10.1016/j.bej.2018.06.003

12. Kuroda D, Shirai H, Jacobson MP, Nakamura H. Computer-aided antibody design. Protein Eng Des Sel. 2012;25(10):507-521. doi:10.1093/protein/gzs024

13. Sevy AM, Meiler J. Antibodies: Computer-Aided Prediction of Structure and Design of Function. Microbiol Spectr. 2014;2(6):1-14. doi:10.1128/microbiolspec.AID-0024-2014

14. Fischman S, Ofran Y. Computational design of antibodies. Curr Opin Struct Biol. 2018;51:156-162. doi:10.1016/j.sbi.2018.04.007

15. Roy A, Nair S, Sen N, Soni N, Madhusudhan MS. In silico methods for design of biological therapeutics. Methods. 2017;131:33-65. doi:10.1016/j.ymeth.2017.09.008

16. Norman RA, Ambrosetti F, Bonvin AMJJ, et al. Computational approaches to therapeutic antibody design: established methods and emerging trends. Brief Bioinform. October 2019. doi:10.1093/bib/bbz095

17. Zhao J, Nussinov R, Wu W-J, Ma B. In Silico Methods in Antibody Design. Antibodies. 2018;7(3):22. doi:10.3390/antib7030022

18. Kazlauskas R. Engineering more stable proteins. Chem Soc Rev. 2018;47(24):9026-9045. doi:10.1039/C8CS00014J

19. Boehr DD, Nussinov R, Wright PE. The role of dynamic conformational ensembles in biomolecular recognition. Nat Chem Biol. 2009;5(11):789-796. doi:10.1038/nchembio.232

20. Meric G, Robinson AS, Roberts CJ. Driving Forces for Nonnative Protein Aggregation and Approaches to Predict Aggregation-Prone Regions. Annu Rev Chem Biomol Eng. 2017;8(1):139-159. doi:10.1146/annurev-chembioeng-060816-101404

21. Ventura S, Zurdo J, Narayanan S, et al. Short amino acid stretches can mediate amyloid formation in globular proteins: The Src homology 3 (SH3) case. Proc Natl Acad Sci. 2004;101(19):7258-7263. doi:10.1073/pnas.0308249101

22. Tsumoto K, Ejima D, Kumagai I, Arakawa T. Practical considerations in refolding proteins from inclusion bodies. Protein Expr Purif. 2003;28(1):1-8. doi:10.1016/S1046-5928(02)00641-1

23. Tsumoto K, Umetsu M, Kumagai I, Ejima D, Philo JS, Arakawa T. Role of arginine in protein refolding, solubilization, and purification. Biotechnol Prog. 2004;20(5):1301-1308. doi:10.1021/bp0498793

24. Kumar S, Plotnikov NV, Rouse JC, Singh SK. Biopharmaceutical Informatics: supporting biologic drug development via molecular modelling and informatics. J Pharm Pharmacol. 2018;70(5):595-608. doi:10.1111/jphp.12700

25. Lu X, Nobrega RP, Lynaugh H, et al. Deamidation and isomerization liability analysis of 131 clinical-stage antibodies. MAbs. 2019;11(1):45-57. doi:10.1080/19420862.2018.1548233

26. Nowak C, Cheung JK, Dellatore SM, et al. Forced degradation of recombinant monoclonal antibodies: A practical guide. MAbs. 2017;9(8):1217-1230. doi:10.1080/19420862.2017.1368602

27. Tomar DS, Kumar S, Singh SK, Goswami S, Li L. Molecular basis of high viscosity in concentrated antibody solutions: Strategies for high concentration drug product development. MAbs. 2016;8(2):216-228. doi:10.1080/19420862.2015.1128606

28. Grünberger A, Lai P-K, Blanco MA, Roberts CJ. Coarse-Grained Modeling of Protein Second Osmotic Virial Coefficients: Sterics and Short-Ranged Attractions. J Phys Chem B. 2013;117(3):763-770. doi:10.1021/jp308234j

29. Blanco MA, Sahin E, Robinson AS, Roberts CJ. Coarse-Grained Model for Colloidal Protein Interactions, B22, and Protein Cluster Formation. J Phys Chem B. 2013;117(50):16013-16028. doi:10.1021/jp409300j

30. Tomar DS, Singh SK, Li L, Broulidakis MP, Kumar S. In Silico Prediction of Diffusion Interaction Parameter (kD), a Key Indicator of Antibody Solution Behaviors. Pharm Res. 2018;35(10):193. doi:10.1007/s11095-018-2466-6

31. Hwang WYK, Foote J. Immunogenicity of engineered antibodies. Methods. 2005;36(1):3-10. doi:10.1016/j.ymeth.2005.01.001

32. Rouet R, Lowe D, Christ D. Stability engineering of the human antibody repertoire. FEBS Lett. 2014;588(2):269-277. doi:10.1016/j.febslet.2013.11.029

33. Dahiyat BI, Mayo SL. De novo protein design: fully automated sequence selection. Science. 1997;278(5335):82-87. doi:10.1126/science.278.5335.82

34. Kuhlman B, Baker D. Native protein sequences are close to optimal for their structures. Proc Natl Acad Sci U S A. 2000;97(19):10383-10388. doi:10.1073/pnas.97.19.10383

35. Cao H, Wang J, He L, Qi Y, Zhang JZ. DeepDDG: Predicting the Stability Change of Protein Point Mutations Using Neural Networks. J Chem Inf Model. 2019. doi:10.1021/acs.jcim.8b00697

36. Dehouck Y, Kwasigroch JM, Gilis D, Rooman M. PoPMuSiC 2.1: a web server for the estimation of protein stability changes upon mutation and sequence optimality. BMC Bioinformatics. 2011;12(1):151. doi:10.1186/1471-2105-12-151

37. Pandurangan AP, Ochoa-Montaño B, Ascher DB, Blundell TL. SDM: a server for predicting effects of mutations on protein stability. Nucleic Acids Res. 2017;45(W1):W229-W235. doi:10.1093/nar/gkx439

38. Folkman L, Stantic B, Sattar A, Zhou Y. EASE-MM: Sequence-Based Prediction of Mutation-Induced Stability Changes with Feature-Based Multiple Models. J Mol Biol. 2016;428(6):1394-1405. doi:10.1016/j.jmb.2016.01.012

39. Pires DEV, Ascher DB, Blundell TL. mCSM: predicting the effects of mutations in proteins using graph-based signatures. Bioinformatics. 2014;30(3):335-342. doi:10.1093/bioinformatics/btt691

40. Capriotti E, Fariselli P, Rossi I, Casadio R. A three-state prediction of single point mutations on protein stability changes. BMC Bioinformatics. 2008;9(Suppl 2):S6. doi:10.1186/1471-2105-9-S2-S6

41. Quan L, Lv Q, Zhang Y. STRUM: structure-based prediction of protein stability changes upon single-point mutation. Bioinformatics. 2016;32(19):2936-2946. doi:10.1093/bioinformatics/btw361

42. Cheng J, Randall A, Baldi P. Prediction of protein stability changes for single-site mutations using support vector machines. Proteins Struct Funct Bioinforma. 2005;62(4):1125-1132. doi:10.1002/prot.20810

43. Pucci F, Kwasigroch JM, Rooman M. SCooP: an accurate and fast predictor of protein stability curves as a function of temperature. Bioinformatics. 2017;33(21):3415-3422. doi:10.1093/bioinformatics/btx417

44. Gapsys V, Michielssens S, Seeliger D, de Groot BL. pmx: Automated protein structure and topology generation for alchemical perturbations. J Comput Chem. 2015;36(5):348-354. doi:10.1002/jcc.23804

45. Gapsys V, Michielssens S, Seeliger D, de Groot BL. Accurate and Rigorous Prediction of the Changes in Protein Free Energies in a Large-Scale Mutation Scan. Angew Chemie - Int Ed. 2016;55(26):7364-7368. doi:10.1002/anie.201510054

46. Wang L, Wu Y, Deng Y, et al. Accurate and Reliable Prediction of Relative Ligand Binding Potency in Prospective Drug Discovery by Way of a Modern Free-Energy Calculation Protocol and Force Field. J Am Chem Soc. 2015;137(7):2695-2703. doi:10.1021/ja512751q

47. Steinbrecher T, Zhu C, Wang L, et al. Predicting the Effect of Amino Acid Single-Point Mutations on Protein Stability—Large-Scale Validation of MD-Based Relative Free Energy Calculations. J Mol Biol. 2017;429(7):948-963. doi:10.1016/j.jmb.2016.12.007

48. Potapov V, Cohen M, Schreiber G. Assessing computational methods for predicting protein stability upon mutation: good on average but not in the details. Protein Eng Des Sel. 2009;22(9):553-560. doi:10.1093/protein/gzp030

49. Montanucci L, Martelli PL, Ben-Tal N, Fariselli P. A natural upper bound to the accuracy of predicting protein stability changes upon mutations. Bioinformatics. 2019;35(9):1513-1517. doi:10.1093/bioinformatics/bty880

50. Best RB, Hummer G, Eaton WA. Native contacts determine protein folding mechanisms in atomistic simulations. Proc Natl Acad Sci. 2013;110(44):17874-17879. doi:10.1073/pnas.1311599110

51. Bekker G-J, Ma B, Kamiya N. Thermal stability of single-domain antibodies estimated by molecular dynamics simulations. Protein Sci. 2019;28(2):429-438. doi:10.1002/pro.3546

52. Zabetakis D, Shriver-Lake LC, Olson MA, Goldman ER, Anderson GP. Experimental evaluation of single-domain antibodies predicted by molecular dynamics simulations to have elevated thermal stability. Protein Sci. July 2019:pro.3692. doi:10.1002/pro.3692

53. Agostini F, Vendruscolo M, Tartaglia GG. Sequence-Based Prediction of Protein Solubility. J Mol Biol. 2012;421(2-3):237-241. doi:10.1016/j.jmb.2011.12.005

54. Sormanni P, Aprile FA, Vendruscolo M. The CamSol Method of Rational Design of Protein Mutants with Enhanced Solubility. J Mol Biol. 2015;427(2):478-490. doi:10.1016/j.jmb.2014.09.026

55. Navarro S, Ventura S. Computational re-design of protein structures to improve solubility. Expert Opin Drug Discov. 2019;14(10):1077-1088. doi:10.1080/17460441.2019.1637413

56. Agrawal NJ, Kumar S, Wang X, Helk B, Singh SK, Trout BL. Aggregation in Protein-Based Biotherapeutics: Computational Studies and Tools to Identify Aggregation-Prone Regions. J Pharm Sci. 2011;100(12):5081-5095. doi:10.1002/jps.22705

57. Buck PM, Kumar S, Wang X, Agrawal NJ, Trout BL, Singh SK. Computational Methods to Predict Therapeutic Protein Aggregation. In: Methods in Molecular Biology. 2012:425-451. doi:10.1007/978-1-61779-921-1_26

58. Fink AL. Protein aggregation: folding aggregates, inclusion bodies and amyloid. Fold Des. 1998;3(1):R9-R23. doi:10.1016/S1359-0278(98)00002-9

59. Blancas-Mejia LM, Misra P, Dick CJ, et al. Immunoglobulin light chain amyloid aggregation. Chem Commun. 2018;54(76):10664-10674. doi:10.1039/C8CC04396E

60. David MPC, Concepcion GP, Padlan EA. Using simple artificial intelligence methods for predicting amyloidogenesis in antibodies. BMC Bioinformatics. 2010;11(1):79. doi:10.1186/1471-2105-11-79

61. Liaw C, Tung C-W, Ho S-Y. Prediction and Analysis of Antibody Amyloidogenesis from Sequences. PLoS One. 2013;8(1):e53235. doi:10.1371/journal.pone.0053235

62. Fernandez-Escamilla A-M, Rousseau F, Schymkowitz J, Serrano L. Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins. Nat Biotechnol. 2004;22(10):1302-1306. doi:10.1038/nbt1012

63. Tartaglia GG, Cavalli A, Pellarin R, Caflisch A. Prediction of aggregation rate and aggregation-prone segments in polypeptide sequences. Protein Sci. 2005;14(10):2723-2734. doi:10.1110/ps.051471205

64. Maurer-Stroh S, Debulpaep M, Kuemmerer N, et al. Exploring the sequence determinants of amyloid structure using position-specific scoring matrices. Nat Methods. 2010;7(3):237-242. doi:10.1038/nmeth.1432

65. Walsh I, Seno F, Tosatto SCE, Trovato A. PASTA 2.0: an improved server for protein aggregation prediction. Nucleic Acids Res. 2014;42(W1):W301-W307. doi:10.1093/nar/gku399

66. Tartaglia GG, Vendruscolo M. The Zyggregator method for predicting protein aggregation propensities. Chem Soc Rev. 2008;37(7):1395-1401. doi:10.1039/b706784b

67. Kuriata A, Iglesias V, Pujols J, Kurcinski M, Kmiecik S, Ventura S. Aggrescan3D (A3D) 2.0: prediction and engineering of protein solubility. Nucleic Acids Res. 2019;47(W1):W300-W307. doi:10.1093/nar/gkz321

68. Stanislawski J, Kotulska M, Unold O. Machine learning methods can replace 3D profile method in classification of amyloidogenic hexapeptides. BMC Bioinformatics. 2013;14(1):21. doi:10.1186/1471-2105-14-21

69. Fang Y, Gao S, Tai D, Middaugh CR, Fang J. Identification of properties important to protein aggregation using feature selection. BMC Bioinformatics. 2013;14(1):314. doi:10.1186/1471-2105-14-314

70. Gasior P, Kotulska M. FISH Amyloid – a new method for finding amyloidogenic segments in proteins based on site specific co-occurence of aminoacids. BMC Bioinformatics. 2014;15(1):54. doi:10.1186/1471-2105-15-54

71. Thangakani AM, Kumar S, Nagarajan R, Velmurugan D, Gromiha MM. GAP: towards almost 100 percent prediction for β-strand-mediated aggregating peptides with distinct morphologies. Bioinformatics. 2014;30(14):1983-1990. doi:10.1093/bioinformatics/btu167

72. Família C, Dennison SR, Quintas A, Phoenix DA. Prediction of Peptide and Protein Propensity for Amyloid Formation. PLoS One. 2015;10(8):e0134679. doi:10.1371/journal.pone.0134679

73. Niu M, Li Y, Wang C, Han K. RFAmyloid: A Web Server for Predicting Amyloid Proteins. Int J Mol Sci. 2018;19(7):2071. doi:10.3390/ijms19072071

74. Han X, Wang X, Zhou K. Develop machine learning based regression predictive models for engineering protein solubility. Bioinformatics. 2019;in press. doi:10.1093/bioinformatics/btz294

75. Hou Q, Kwasigroch JM, Rooman M, Pucci F. SOLart: a structure-based method to predict protein solubility and aggregation. Bioinformatics. October 2019. doi:10.1093/bioinformatics/btz773

76. Hou Q, Bourgeas R, Pucci F, Rooman M. Computational analysis of the amino acid interactions that promote or decrease protein solubility. Sci Rep. 2018;8(1):14661. doi:10.1038/s41598-018-32988-w

77. Niwa T, Ying B-W, Saito K, et al. Bimodal protein solubility distribution revealed by an aggregation analysis of the entire ensemble of Escherichia coli proteins. Proc Natl Acad Sci. 2009;106(11):4201-4206. doi:10.1073/pnas.0811922106

78. Chan P, Curtis RA, Warwicker J. Soluble expression of proteins correlates with a lack of positively-charged surface. Sci Rep. 2013;3(1):3333. doi:10.1038/srep03333

79.

Warwicker J, Charonis S, Curtis RA. Lysine and Arginine Content of Proteins:

1459

Computational Analysis Suggests a New Tool for Solubility Design. Mol Pharm.

1460

2014;11(1):294-303. doi:10.1021/mp4004749

80. Hebditch M, Carballo-Amador MA, Charonis S, Curtis R, Warwicker J. Protein–Sol: a web tool for predicting protein solubility from sequence. Valencia A, ed. Bioinformatics. 2017;33(19):3098-3100. doi:10.1093/bioinformatics/btx345

81. Austerberry JI, Thistlethwaite A, Fisher K, et al. Arginine to Lysine Mutations Increase the Aggregation Stability of a Single-Chain Variable Fragment through Unfolded-State Interactions. Biochemistry. 2019;58(32):3413-3421. doi:10.1021/acs.biochem.9b00367

82. Hebditch M, Warwicker J. Web-based display of protein surface and pH-dependent properties for assessing the developability of biotherapeutics. Sci Rep. 2019;9(1):1969. doi:10.1038/s41598-018-36950-8

83. Wang X, Singh SK, Kumar S. Potential Aggregation-Prone Regions in Complementarity-Determining Regions of Antibodies and Their Contribution Towards Antigen Recognition: A Computational Analysis. Pharm Res. 2010;27(8):1512-1529. doi:10.1007/s11095-010-0143-5

84. Chennamsetty N, Voynov V, Kayser V, Helk B, Trout BL. Design of therapeutic proteins with enhanced stability. Proc Natl Acad Sci U S A. 2009;106(29):11937-11942. doi:10.1073/pnas.0904191106

85. Lauer TM, Agrawal NJ, Chennamsetty N, Egodage K, Helk B, Trout BL. Developability index: A rapid in silico tool for the screening of antibody aggregation propensity. J Pharm Sci. 2012;101(1):102-115. doi:10.1002/jps.22758

86. Sankar K, Krystek SR, Carl SM, Day T, Maier JKX. AggScore: Prediction of aggregation-prone regions in proteins based on the distribution of surface patches. Proteins Struct Funct Bioinforma. 2018;86(11):1147-1156. doi:10.1002/prot.25594

87. Trainor K, Gingras Z, Shillingford C, et al. Ensemble Modeling and Intracellular Aggregation of an Engineered Immunoglobulin-Like Domain. J Mol Biol. 2016;428(6):1365-1374. doi:10.1016/j.jmb.2016.02.016

88. Conchillo-Solé O, de Groot NS, Avilés FX, Vendrell J, Daura X, Ventura S. AGGRESCAN: a server for the prediction and evaluation of “hot spots” of aggregation in polypeptides. BMC Bioinformatics. 2007;8(1):65. doi:10.1186/1471-2105-8-65

89. Beerten J, Van Durme J, Gallardo R, et al. WALTZ-DB: a benchmark database of amyloidogenic hexapeptides. Bioinformatics. 2015;31(10):1698-1700. doi:10.1093/bioinformatics/btv027

90. Jain T, Sun T, Durand S, et al. Biophysical properties of the clinical-stage antibody landscape. Proc Natl Acad Sci. 2017;114(5):944-949. doi:10.1073/pnas.1616408114

91. de Groot NS, Castillo V, Graña-Montes R, Ventura S. AGGRESCAN: Method, Application, and Perspectives for Drug Design. In: Methods in Molecular Biology. 2012:199-220. doi:10.1007/978-1-61779-465-0_14

92. Saerens D, Pellis M, Loris R, et al. Identification of a Universal VHH Framework to Graft Non-canonical Antigen-binding Loops of Camel Single-domain Antibodies. J Mol Biol. 2005;352(3):597-607. doi:10.1016/j.jmb.2005.07.038

93. Vincke C, Loris R, Saerens D, Martinez-Rodriguez S, Muyldermans S, Conrath K. General Strategy to Humanize a Camelid Single-domain Antibody and Identification of a Universal Humanized Nanobody Scaffold. J Biol Chem. 2009;284(5):3273-3284. doi:10.1074/jbc.M806889200

94. Soler MA, de Marco A, Fortuna S. Molecular dynamics simulations and docking enable to explore the biophysical factors controlling the yields of engineered nanobodies. Sci Rep. 2016;6(1):34869. doi:10.1038/srep34869

95. Sobolev V, Eyal E, Gerzon S, et al. SPACE: a suite of tools for protein structure prediction and analysis based on complementarity and environment. Nucleic Acids Res. 2005;33(Web Server):W39-W43. doi:10.1093/nar/gki398

96. Negi SS, Schein CH, Oezguen N, Power TD, Braun W. InterProSurf: a web server for predicting interacting sites on protein surfaces. Bioinformatics. 2007;23(24):3397-3399. doi:10.1093/bioinformatics/btm474

97. van Zundert GCP, Rodrigues JPGLM, Trellet M, et al. The HADDOCK2.2 Web Server: User-Friendly Integrative Modeling of Biomolecular Complexes. J Mol Biol. 2016;428(4):720-725. doi:10.1016/j.jmb.2015.09.014

98. Hertig S, Latorraca NR, Dror RO. Revealing Atomic-Level Mechanisms of Protein Allostery with Molecular Dynamics Simulations. Liu J, ed. PLOS Comput Biol. 2016;12(6):e1004746. doi:10.1371/journal.pcbi.1004746

99. Obrezanova O, Arnell A, de la Cuesta RG, et al. Aggregation risk prediction for antibodies and its application to biotherapeutic development. MAbs. 2015;7(2):352-363. doi:10.1080/19420862.2015.1007828

100. Sydow JF, Lipsmeier F, Larraillet V, et al. Structure-Based Prediction of Asparagine and Aspartate Degradation Sites in Antibody Variable Regions. Dübel S, ed. PLoS One. 2014;9(6):e100736. doi:10.1371/journal.pone.0100736

101. Agrawal NJ, Dykstra A, Yang J, et al. Prediction of the Hydrogen Peroxide–Induced Methionine Oxidation Propensity in Monoclonal Antibodies. J Pharm Sci. 2018;107(5):1282-1289. doi:10.1016/j.xphs.2018.01.002

102. Yang R, Jain T, Lynaugh H, et al. Rapid assessment of oxidation via middle-down LCMS correlates with methionine side-chain solvent-accessible surface area for 121 clinical stage monoclonal antibodies. MAbs. 2017;9(4):646-653. doi:10.1080/19420862.2017.1290753

103. Lorenzo JR, Alonso LG, Sánchez IE. Prediction of Spontaneous Protein Deamidation from Sequence-Derived Secondary Structure and Intrinsic Disorder. Lisacek F, ed. PLoS One. 2015;10(12):e0145186. doi:10.1371/journal.pone.0145186

104. Plotnikov N V., Singh SK, Rouse JC, Kumar S. Quantifying the Risks of Asparagine Deamidation and Aspartate Isomerization in Biopharmaceuticals by Computing Reaction Free-Energy Surfaces. J Phys Chem B. 2017;121(4):719-730. doi:10.1021/acs.jpcb.6b11614

105. Jia L, Sun Y. Protein asparagine deamidation prediction based on structures with machine learning methods. de Brevern AG, ed. PLoS One. 2017;12(7):e0181347. doi:10.1371/journal.pone.0181347

106. Yan Q, Huang M, Lewis MJ, Hu P. Structure Based Prediction of Asparagine Deamidation Propensity in Monoclonal Antibodies. MAbs. 2018;10(6):901-912. doi:10.1080/19420862.2018.1478646

107. Delmar JA, Wang J, Choi SW, Martins JA, Mikhail JP. Machine Learning Enables Accurate Prediction of Asparagine Deamidation Probability and Rate. Mol Ther - Methods Clin Dev. 2019;15:264-274. doi:10.1016/j.omtm.2019.09.008

108. Chennamsetty N, Quan Y, Nashine V, Sadineni I, Lyngberg O, Krystek S. Modeling the Oxidation of Methionine Residues by Peroxides in Proteins. J Pharm Sci. 2015;104(4):1246-1255. doi:10.1002/jps.24340

109. Sankar K, Hoi KH, Yin Y, et al. Prediction of methionine oxidation risk in monoclonal antibodies using a machine learning method. MAbs. 2018;10(8):1281-1290. doi:10.1080/19420862.2018.1518887

110. Aledo JC, Cantón FR, Veredas FJ. A machine learning approach for predicting methionine oxidation sites. BMC Bioinformatics. 2017;18(1):430. doi:10.1186/s12859-017-1848-9

111. Moal IH, Fernández-Recio J. SKEMPI: A Structural Kinetic and Energetic database of Mutant Protein Interactions and its use in empirical models. Bioinformatics. 2012;28(20):2600-2607. doi:10.1093/bioinformatics/bts489

112. Sirin S, Apgar JR, Bennett EM, Keating AE. AB-Bind: Antibody binding mutational database for computational affinity predictions. Protein Sci. 2016;25(2):393-409. doi:10.1002/pro.2829

113. Vreven T, Moal IH, Vangone A, et al. Updates to the Integrated Protein–Protein Interaction Benchmarks: Docking Benchmark Version 5 and Affinity Benchmark Version 2. J Mol Biol. 2015;427(19):3031-3041. doi:10.1016/j.jmb.2015.07.016

114. Bava KA, Gromiha MM, Uedaira H, Kitajima K, Sarai A. ProTherm, version 4.0: thermodynamic database for proteins and mutants. Nucleic Acids Res. 2004;32(Database issue):D120-1. doi:10.1093/nar/gkh082

115. Thompson MJ, Sievers SA, Karanicolas J, Ivanova MI, Baker D, Eisenberg D. The 3D profile method for identifying fibril-forming segments of proteins. Proc Natl Acad Sci. 2006;103(11):4074-4078. doi:10.1073/pnas.0511295103

116. Pawlicki S, Le Béchec A, Delamarche C. AMYPdb: A database dedicated to amyloid precursor proteins. BMC Bioinformatics. 2008;9(1):273. doi:10.1186/1471-2105-9-273

117. Varadi M, De Baets G, Vranken WF, Tompa P, Pancsa R. AmyPro: a database of proteins with validated amyloidogenic regions. Nucleic Acids Res. 2018;46(D1):D387-D392. doi:10.1093/nar/gkx950

118. Thangakani AM, Nagarajan R, Kumar S, Sakthivel R, Velmurugan D, Gromiha MM. CPAD, Curated Protein Aggregation Database: A Repository of Manually Curated Experimental Data on Protein and Peptide Aggregation. Zheng J, ed. PLoS One. 2016;11(4):e0152949. doi:10.1371/journal.pone.0152949

119. Leman JK, Weitzner BD, Lewis SM, Consortium R, Bonneau R. Macromolecular modeling and design in Rosetta: new methods and frameworks. Preprints. 2019. doi:10.20944/preprints201904.0263.v1

120. Guerois R, Nielsen JE, Serrano L. Predicting Changes in the Stability of Proteins and Protein Complexes: A Study of More Than 1000 Mutations. J Mol Biol. 2002;320(2):369-387. doi:10.1016/S0022-2836(02)00442-4

121. Wijma HJ, Floor RJ, Janssen DB. Structure- and sequence-analysis inspired engineering of proteins for enhanced thermostability. Curr Opin Struct Biol. 2013;23(4):588-594. doi:10.1016/j.sbi.2013.04.008

122. Broom A, Jacobi Z, Trainor K, Meiering EM. Computational tools help improve protein stability but with a solubility tradeoff. J Biol Chem. 2017;292(35):14349-14361. doi:10.1074/jbc.M117.784165

123. Fowler DM, Fields S. Deep mutational scanning: a new style of protein science. Nat Methods. 2014;11(8):801-807. doi:10.1038/nmeth.3027

124. Koenig P, Lee C V., Sanowar S, et al. Deep Sequencing-guided Design of a High Affinity Dual Specificity Antibody to Target Two Angiogenic Factors in Neovascular Age-related Macular Degeneration. J Biol Chem. 2015;290(36):21773-21786. doi:10.1074/jbc.M115.662783

125. Wang S, Liu M, Zeng D, et al. Increasing stability of antibody via antibody engineering: Stability engineering on an anti-hVEGF. Proteins Struct Funct Bioinforma. 2014;82(10):2620-2630. doi:10.1002/prot.24626

126. Sivasubramanian A, Sircar A, Chaudhury S, Gray JJ. Toward high-resolution homology modeling of antibody Fv regions and application to antibody-antigen docking. Proteins Struct Funct Bioinforma. 2009;74(2):497-514. doi:10.1002/prot.22309

127. Chen R, Li L, Weng Z. ZDOCK: An initial-stage protein-docking algorithm. Proteins Struct Funct Genet. 2003;52(1):80-87. doi:10.1002/prot.10389

128. Sircar A, Gray JJ. SnugDock: Paratope Structural Optimization during Antibody-Antigen Docking Compensates for Errors in Antibody Homology Models. Kortemme T, ed. PLoS Comput Biol. 2010;6(1):e1000644. doi:10.1371/journal.pcbi.1000644

129. Chan C-H, Liang H-K, Hsiao N-W, Ko M-T, Lyu P-C, Hwang J-K. Relationship between local structural entropy and protein thermostabilty. Proteins Struct Funct Bioinforma. 2004;57(4):684-691. doi:10.1002/prot.20263

130. Su JG, Li CH, Hao R, Chen WZ, Xin Wang C. Protein Unfolding Behavior Studied by Elastic Network Model. Biophys J. 2008;94(12):4586-4596. doi:10.1529/biophysj.107.121665

131. Zhang C, Samad M, Yu H, Chakroun N, Hilton D, Dalby PA. Computational Design To Reduce Conformational Flexibility and Aggregation Rates of an Antibody Fab Fragment. Mol Pharm. 2018;15(8):3079-3092. doi:10.1021/acs.molpharmaceut.8b00186

132. Lee J, Der BS, Karamitros CS, et al. Computer‐based engineering of thermostabilized antibody fragments. AIChE J. November 2019:in press. doi:10.1002/aic.16864

133. Dudgeon K, Rouet R, Kokmeijer I, et al. General strategy for the generation of human antibody variable domains with increased aggregation resistance. Proc Natl Acad Sci. 2012;109(27):10879-10884. doi:10.1073/pnas.1202866109

134. Sakhnini LI, Greisen PJ, Wiberg C, et al. Improving the Developability of an Antigen Binding Fragment by Aspartate Substitutions. Biochemistry. 2019;58(24):2750-2759. doi:10.1021/acs.biochem.9b00251

135. Lawrence MS, Phillips KJ, Liu DR. Supercharging proteins can impart unusual resilience. J Am Chem Soc. 2007;129(33):10110-10112. doi:10.1021/ja071641y

136. Der BS, Kluwe C, Miklos AE, et al. Alternative Computational Protocols for Supercharging Protein Surfaces for Reversible Unfolding and Retention of Stability. Salsbury Jr F, ed. PLoS One. 2013;8(5):e64363. doi:10.1371/journal.pone.0064363

137. Miklos AE, Kluwe C, Der BS, et al. Structure-based design of supercharged, highly thermoresistant antibodies. Chem Biol. 2012;19(4):449-455. doi:10.1016/j.chembiol.2012.01.018

138. Bruce VJ, Lopez-Islas M, McNaughton BR. Resurfaced cell-penetrating nanobodies: A potentially general scaffold for intracellularly targeted protein discovery. Protein Sci. 2016;25(6):1129-1137. doi:10.1002/pro.2926

139. Courtois F, Schneider CP, Agrawal NJ, Trout BL. Rational Design of Biobetters with Enhanced Stability. J Pharm Sci. 2015;104(8):2433-2440. doi:10.1002/jps.24520

140. Courtois F, Agrawal NJ, Lauer TM, Trout BL. Rational design of therapeutic mAbs against aggregation through protein engineering and incorporation of glycosylation motifs applied to bevacizumab. MAbs. 2016;8(1):99-112. doi:10.1080/19420862.2015.1112477

141. Clark RH, Latypov RF, De Imus C, et al. Remediating agitation-induced antibody aggregation by eradicating exposed hydrophobic motifs. MAbs. 2014;6(6):1540-1550. doi:10.4161/mabs.36252

142. Ejima D, Tsumoto K, Fukada H, et al. Effects of acid exposure on the conformation, stability, and aggregation of monoclonal antibodies. Proteins Struct Funct Bioinforma. 2006;66(4):954-962. doi:10.1002/prot.21243

143. Skamris T, Tian X, Thorolfsson M, et al. Monoclonal Antibodies Follow Distinct Aggregation Pathways During Production-Relevant Acidic Incubation and Neutralization. Pharm Res. 2016;33(3):716-728. doi:10.1007/s11095-015-1821-0

144. Arakawa T, Kita Y, Carpenter JF. Protein–solvent interactions in pharmaceutical formulations. Pharm Res. 1991;8(3):285-291. doi:10.1023/a:1015825027737

145. Arakawa T, Kita Y. Protection of Bovine Serum Albumin from Aggregation by Tween 80. J Pharm Sci. 2000;89(5):646-651. doi:10.1002/(SICI)1520-6017(200005)89:5<646::AID-JPS10>3.0.CO;2-J

146. Arakawa T, Kita Y. Stabilizing effects of caprylate and acetyltryptophanate on heat-induced aggregation of bovine serum albumin. Biochim Biophys Acta - Protein Struct Mol Enzymol. 2000;1479(1-2):32-36. doi:10.1016/S0167-4838(00)00061-3

147. Arakawa T, Kita Y, Timasheff SN. Protein precipitation and denaturation by dimethyl sulfoxide. Biophys Chem. 2007;131(1-3):62-70. doi:10.1016/j.bpc.2007.09.004

148. Arakawa T, Ejima D, Tsumoto K, et al. Suppression of protein interactions by arginine: A proposed mechanism of the arginine effects. Biophys Chem. 2007;127(1-2):1-8. doi:10.1016/j.bpc.2006.12.007

149. Cloutier T, Sudrik C, Mody N, Sathish HA, Trout BL. Molecular Computations of Preferential Interaction Coefficients of IgG1 Monoclonal Antibodies with Sorbitol, Sucrose, and Trehalose and the Impact of These Excipients on Aggregation and Viscosity. Mol Pharm. 2019;16(8):3657-3664. doi:10.1021/acs.molpharmaceut.9b00545

150. Sormanni P, Amery L, Ekizoglou S, Vendruscolo M, Popovic B. Rapid and accurate in silico solubility screening of a monoclonal antibody library. Sci Rep. 2017;7(1):8200. doi:10.1038/s41598-017-07800-w

151. Wolf Pérez A-M, Sormanni P, Andersen JS, et al. In vitro and in silico assessment of the developability of a designed monoclonal antibody library. MAbs. 2019;11(2):388-400. doi:10.1080/19420862.2018.1556082

152. Shan L, Mody N, Sormani P, Rosenthal KL, Damschroder MM, Esfandiary R. Developability Assessment of Engineered Monoclonal Antibody Variants with a Complex Self-Association Behavior Using Complementary Analytical and in Silico Tools. Mol Pharm. 2018;15(12):5697-5710. doi:10.1021/acs.molpharmaceut.8b00867

153. Arora J, Hu Y, Esfandiary R, et al. Charge-mediated Fab-Fc interactions in an IgG1 antibody induce reversible self-association, cluster formation, and elevated viscosity. MAbs. 2016;8(8):1561-1574. doi:10.1080/19420862.2016.1222342

154. Van Durme J, De Baets G, Van Der Kant R, et al. Solubis: a webserver to reduce protein aggregation through mutation. Protein Eng Des Sel. 2016;29(8):285-289. doi:10.1093/protein/gzw019

155. van der Kant R, Karow-Zwick AR, Van Durme J, et al. Prediction and Reduction of the Aggregation of Monoclonal Antibodies. J Mol Biol. 2017;429(8):1244-1261. doi:10.1016/j.jmb.2017.03.014

156. Martinez M, Bruce NJ, Romanowska J, et al. SDA 7: A modular and parallel implementation of the simulation of diffusional association software. J Comput Chem. 2015;36(21):1631-1645. doi:10.1002/jcc.23971

157. Nautiyal K, Kibria MG, Akazawa-Ogawa Y, Hagihara Y, Kuroda Y. Design and assessment of an active anti-epidermal growth factor receptor (EGFR) single chain variable fragment (ScFv) with improved solubility. Biochem Biophys Res Commun. 2019;508(4):1043-1049. doi:10.1016/j.bbrc.2018.11.170

158. Tomar DS, Kumar S, Singh SK, Goswami S, Li L. Molecular basis of high viscosity in concentrated antibody solutions: Strategies for high concentration drug product development. MAbs. 2016;8(2):216-228. doi:10.1080/19420862.2015.1128606

159. Li L, Kumar S, Buck PM, et al. Concentration Dependent Viscosity of Monoclonal Antibody Solutions: Explaining Experimental Behavior in Terms of Molecular Properties. Pharm Res. 2014;31(11):3161-3178. doi:10.1007/s11095-014-1409-0

160. Sharma VK, Patapoff TW, Kabakoff B, et al. In silico selection of therapeutic antibodies for development: Viscosity, clearance, and chemical stability. Proc Natl Acad Sci. 2014;111(52):18601-18606. doi:10.1073/pnas.1421779112

161. Kramer RM, Shende VR, Motl N, Pace CN, Scholtz JM. Toward a Molecular Understanding of Protein Solubility: Increased Negative Surface Charge Correlates with Increased Solubility. Biophys J. 2012;102(8):1907-1915. doi:10.1016/j.bpj.2012.01.060

162. Tomar DS, Li L, Broulidakis MP, et al. In-silico prediction of concentration-dependent viscosity curves for monoclonal antibody solutions. MAbs. 2017;9(3):476-489. doi:10.1080/19420862.2017.1285479

163. Gentiluomo L, Roessner D, Augustijn D, et al. Application of interpretable artificial neural networks to early monoclonal antibodies development. Eur J Pharm Biopharm. 2019;141:81-89. doi:10.1016/j.ejpb.2019.05.017

164. Nichols P, Li L, Kumar S, et al. Rational design of viscosity reducing mutants of a monoclonal antibody: Hydrophobic versus electrostatic inter-molecular interactions. MAbs. 2015;7(1):212-230. doi:10.4161/19420862.2014.985504

165. Kumar S, Roffi K, Tomar DS, et al. Rational optimization of a monoclonal antibody for simultaneous improvements in its solution properties and biological activity. Berghuis A, ed. Protein Eng Des Sel. 2018;31(7-8):313-325. doi:10.1093/protein/gzy020

166. Chow C-K, Allan BW, Chai Q, Atwell S, Lu J. Therapeutic Antibody Engineering To Improve Viscosity and Phase Separation Guided by Crystal Structure. Mol Pharm. 2016;13(3):915-923. doi:10.1021/acs.molpharmaceut.5b00817

167. Geoghegan JC, Fleming R, Damschroder M, Bishop SM, Sathish HA, Esfandiary R. Mitigation of reversible self-association and viscosity in a human IgG1 monoclonal antibody by rational, structure-guided Fv engineering. MAbs. 2016;8(5):941-950. doi:10.1080/19420862.2016.1171444

168. Kuhn AB, Kube S, Karow-Zwick AR, et al. Improved Solution-State Properties of Monoclonal Antibodies by Targeted Mutations. J Phys Chem B. 2017;121(48):10818-10827. doi:10.1021/acs.jpcb.7b09126

169. Jetha A, Thorsteinson N, Jmeian Y, Jeganathan A, Giblin P, Fransson J. Homology modeling and structure-based design improve hydrophobic interaction chromatography behavior of integrin binding antibodies. MAbs. 2018;10(6):890-900. doi:10.1080/19420862.2018.1475871

170. Lemkul J. From Proteins to Perturbed Hamiltonians: A Suite of Tutorials for the GROMACS-2018 Molecular Simulation Package [Article v1.0]. Living J Comput Mol Sci. 2019;1(1). doi:10.33011/livecoms.1.1.5068

171. Kastritis PL, Bonvin AMJJ. Are Scoring Functions in Protein–Protein Docking Ready To Predict Interactomes? Clues from a Novel Binding Affinity Benchmark. J Proteome Res. 2010;9(5):2216-2225. doi:10.1021/pr9009854

172. Ando T, Skolnick J. Crowding and hydrodynamic interactions likely dominate in vivo macromolecular motion. Proc Natl Acad Sci. 2010;107(43):18457-18462. doi:10.1073/pnas.1011354107

173. von Bülow S, Siggel M, Linke M, Hummer G. Dynamic cluster formation determines viscosity and diffusion in dense protein solutions. Proc Natl Acad Sci U S A. 2019;116(20):9843-9852. doi:10.1073/pnas.1817564116

174. Chaudhri A, Zarraga IE, Kamerzell TJ, et al. Coarse-Grained Modeling of the Self-Association of Therapeutic Monoclonal Antibodies. J Phys Chem B. 2012;116(28):8045-8057. doi:10.1021/jp301140u

175. Chaudhri A, Zarraga IE, Yadav S, Patapoff TW, Shire SJ, Voth GA. The Role of Amino Acid Sequence in the Self-Association of Therapeutic Monoclonal Antibodies: Insights from Coarse-Grained Modeling. J Phys Chem B. 2013;117(5):1269-1279. doi:10.1021/jp3108396

176. Yadav S, Shire SJ, Kalonia DS. Factors Affecting the Viscosity in High Concentration Solutions of Different Monoclonal Antibodies. J Pharm Sci. 2010;99(12):4812-4829. doi:10.1002/jps.22190

177. Yadav S, Liu J, Shire SJ, Kalonia DS. Specific interactions in high concentration antibody solutions resulting in high viscosity. J Pharm Sci. 2010;99(3):1152-1168. doi:10.1002/jps.21898

178. Yadav S, Sreedhara A, Kanai S, et al. Establishing a Link Between Amino Acid Sequences and Self-Associating and Viscoelastic Behavior of Two Closely Related Monoclonal Antibodies. Pharm Res. 2011;28(7):1750-1764. doi:10.1007/s11095-011-0410-0

179. Yadav S, Laue TM, Kalonia DS, Singh SN, Shire SJ. The Influence of Charge Distribution on Self-Association and Viscosity Behavior of Monoclonal Antibody Solutions. Mol Pharm. 2012;9(4):791-802. doi:10.1021/mp200566k

180. Buck PM, Chaudhri A, Kumar S, Singh SK. Highly Viscous Antibody Solutions Are a Consequence of Network Formation Caused by Domain–Domain Electrostatic Complementarities: Insights from Coarse-Grained Simulations. Mol Pharm. 2015;12(1):127-139. doi:10.1021/mp500485w

181. Lapelosa M, Patapoff TW, Zarraga IE. Molecular Simulations of the Pairwise Interaction of Monoclonal Antibodies. J Phys Chem B. 2014;118(46):13132-13141. doi:10.1021/jp508729z

182. Wang G, Varga Z, Hofmann J, Zarraga IE, Swan JW. Structure and Relaxation in Solutions of Monoclonal Antibodies. J Phys Chem B. 2018;122(11):2867-2880. doi:10.1021/acs.jpcb.7b11053

183. Yearley EJ, Zarraga IE, Shire SJ, et al. Small-Angle Neutron Scattering Characterization of Monoclonal Antibody Conformations and Interactions at High Concentrations. Biophys J. 2013;105(3):720-731. doi:10.1016/j.bpj.2013.06.043

184. Lilyestrom WG, Yadav S, Shire SJ, Scherer TM. Monoclonal Antibody Self-Association, Cluster Formation, and Rheology at High Concentrations. J Phys Chem B. 2013;117(21):6373-6384. doi:10.1021/jp4008152

185. Castellanos MM, Clark NJ, Watson MC, Krueger S, McAuley A, Curtis JE. Role of Molecular Flexibility and Colloidal Descriptions of Proteins in Crowded Environments from Small-Angle Scattering. J Phys Chem B. 2016;120(49):12511-12518. doi:10.1021/acs.jpcb.6b10637

186. Corbett D, Hebditch M, Keeling R, et al. Coarse-Grained Modeling of Antibodies from Small-Angle Scattering Profiles. J Phys Chem B. 2017;121(35):8276-8290. doi:10.1021/acs.jpcb.7b04621

187. Abhinandan KR, Martin ACR. Analyzing the “Degree of Humanness” of Antibody Sequences. J Mol Biol. 2007;369(3):852-862. doi:10.1016/j.jmb.2007.02.100

188. Thullier P, Huish O, Pelat T, Martin ACR. The Humanness of Macaque Antibody Sequences. J Mol Biol. 2010;396(5):1439-1450. doi:10.1016/j.jmb.2009.12.041

189. Pelat T, Bedouelle H, Rees AR, Crennell SJ, Lefranc M-P, Thullier P. Germline Humanization of a Non-human Primate Antibody that Neutralizes the Anthrax Toxin, by in Vitro and in Silico Engineering. J Mol Biol. 2008;384(5):1400-1407. doi:10.1016/j.jmb.2008.10.033

190. Gao SH, Huang K, Tu H, Adler AS. Monoclonal antibody humanness score and its applications. BMC Biotechnol. 2013;13(1):55. doi:10.1186/1472-6750-13-55

191. Ye J, Ma N, Madden TL, Ostell JM. IgBLAST: an immunoglobulin variable domain sequence analysis tool. Nucleic Acids Res. 2013;41(W1):W34-W40. doi:10.1093/nar/gkt382

192. Seeliger D. Development of Scoring Functions for Antibody Sequence Assessment and Optimization. PLoS One. 2013;8(10):e76909. doi:10.1371/journal.pone.0076909

193. Swindells MB, Porter CT, Couch M, et al. abYsis: Integrated Antibody Sequence and Structure—Management, Analysis, and Prediction. J Mol Biol. 2017;429(3):356-364. doi:10.1016/j.jmb.2016.08.019

194. Koren E, De Groot AS, Jawa V, et al. Clinical validation of the “in silico” prediction of immunogenicity of a human recombinant therapeutic protein. Clin Immunol. 2007;124(1):26-32. doi:10.1016/j.clim.2007.03.544

195. Seeliger D, Schulz P, Litzenburger T, et al. Boosting antibody developability through rational sequence optimization. MAbs. 2015;7(3):505-515. doi:10.1080/19420862.2015.1017695

196. Clavero-Álvarez A, Di Mambro T, Perez-Gaviro S, Magnani M, Bruscolini P. Humanization of Antibodies using a Statistical Inference Approach. Sci Rep. 2018;8(1):14820. doi:10.1038/s41598-018-32986-y

197. Lazar GA, Desjarlais JR, Jacinto J, Karki S, Hammond PW. A molecular immunology approach to antibody humanization and functional optimization. Mol Immunol. 2007;44(8):1986-1998. doi:10.1016/j.molimm.2006.09.029

198. Mei S, Li F, Leier A, et al. A comprehensive review and performance evaluation of bioinformatics tools for HLA class I peptide-binding prediction. Brief Bioinform. 2019;in press. doi:10.1093/bib/bbz051

199. Qiu J, Qiu T, Huang Y, Cao Z. Identifying the Epitope Regions of Therapeutic Antibodies Based on Structure Descriptors. Int J Mol Sci. 2017;18(12):2457. doi:10.3390/ijms18122457

200. Kobe B, Guncar G, Buchholz R, et al. Crystallography and protein–protein interactions: biological interfaces and crystal contacts. Biochem Soc Trans. 2008;36(6):1438-1441. doi:10.1042/BST0361438

201. Harding FA, Stickler MM, Razo J, DuBridge R. The immunogenicity of humanized and fully human antibodies. MAbs. 2010;2(3):256-265. doi:10.4161/mabs.2.3.11641

202. Jones PT, Dear PH, Foote J, Neuberger MS, Winter G. Replacing the complementarity-determining regions in a human antibody with those from a mouse. Nature. 1986;321(6069):522-525. doi:10.1038/321522a0

203. Tan P, Mitchell DA, Buss TN, Holmes MA, Anasetti C, Foote J. “Superhumanized” Antibodies: Reduction of Immunogenic Potential by Complementarity-Determining Region Grafting with Human Germline Sequences: Application to an Anti-CD28. J Immunol. 2002;169(2):1119-1125. doi:10.4049/jimmunol.169.2.1119

204. Khee Hwang WY, Almagro JC, Buss TN, Tan P, Foote J. Use of human germline genes in a CDR homology-based approach to antibody humanization. Methods. 2005;36(1):35-42. doi:10.1016/j.ymeth.2005.01.004

205. Roguska MA, Pedersen JT, Keddy CA, et al. Humanization of murine monoclonal antibodies through variable domain resurfacing. Proc Natl Acad Sci. 1994;91(3):969-973. doi:10.1073/pnas.91.3.969

206. Olimpieri PP, Marcatili P, Tramontano A. Tabhu: Tools for antibody humanization. Bioinformatics. 2015;31(3):434-435. doi:10.1093/bioinformatics/btu667

207. Almagro JC, Fransson J. Humanization of antibodies. Front Biosci. 2008;13(7):1619-1633. doi:10.1093/toxsci/kft065

208. Safdari Y, Farajnia S, Asgharzadeh M, Khalili M. Antibody humanization methods – a review and update. Biotechnol Genet Eng Rev. 2013;29(2):175-186. doi:10.1080/02648725.2013.801235

209. Mayrhofer P, Kunert R. Nomenclature of humanized mAbs: Early concepts, current challenges and future perspectives. Hum Antibodies. 2018;27(1):37-51. doi:10.3233/HAB-180347

54

1861 1862 1863

210. Lo BKC. Antibody humanization by CDR grafting. Methods Mol Biol. 2004;248:135-159. http://www.ncbi.nlm.nih.gov/pubmed/14970494. 211. Zhang D, Chen CF, Zhao B Bin, et al. A novel antibody humanization method based

1864

on epitopes scanning and molecular dynamics simulation. PLoS One.

1865

2013;8(11):e80636. doi:10.1371/journal.pone.0080636

1866

212. Margreitter C, Mayrhofer P, Kunert R, Oostenbrink C. Antibody humanization by

1867

molecular dynamics simulations - In-silico guided selection of critical backmutations.

1868

J Mol Recognit. 2016;29(6):266-275. doi:10.1002/jmr.2527

1869

213. Schwaigerlehner L, Pechlaner M, Mayrhofer P, Oostenbrink C, Kunert R. Lessons

1870

learned from merging wet lab experiments with molecular simulation to improve mAb

1871

humanization. Rees A, ed. Protein Eng Des Sel. 2018;31(7-8):257-265.

1872

doi:10.1093/protein/gzy009

1873

214. Hanf KJM, Arndt JW, Chen LL, et al. Antibody humanization by redesign of

1874

complementarity-determining region residues proximate to the acceptor framework.

1875

Methods. 2014;65(1):68-76. doi:10.1016/j.ymeth.2013.06.024

215. Looger LL, Hellinga HW. Generalized dead-end elimination algorithms make large-scale protein side-chain structure prediction tractable: implications for protein design and structural genomics. J Mol Biol. 2001;307(1):429-445. doi:10.1006/jmbi.2000.4424

216. Parker AS, Zheng W, Griswold KE, Bailey-Kellogg C. Optimization algorithms for functional deimmunization of therapeutic proteins. BMC Bioinformatics. 2010;11(1):180. doi:10.1186/1471-2105-11-180

217. Parker AS, Griswold KE, Bailey-Kellogg C. Optimization of therapeutic proteins to delete T-cell epitopes while maintaining beneficial residue interactions. J Bioinform Comput Biol. 2011;9(2):207-229. doi:10.1142/S0219720011005471

218. He L, Friedman AM, Bailey-Kellogg C. A divide-and-conquer approach to determine the Pareto frontier for optimization of protein engineering experiments. Proteins Struct Funct Bioinforma. 2012;80(3):790-806. doi:10.1002/prot.23237

219. Parker AS, Choi Y, Griswold KE, Bailey-Kellogg C. Structure-Guided Deimmunization of Therapeutic Proteins. J Comput Biol. 2013;20(2):152-165. doi:10.1089/cmb.2012.0251

220. Choi Y, Griswold KE, Bailey-Kellogg C. Structure-based redesign of proteins for minimal T-cell epitope content. J Comput Chem. 2013;34(10):879-891. doi:10.1002/jcc.23213

221. Choi Y, Hua C, Sentman CL, Ackerman ME, Bailey-Kellogg C. Antibody humanization by structure-based computational protein design. MAbs. 2015;7(6):1045-1057. doi:10.1080/19420862.2015.1076600

222. Choi Y, Ndong C, Griswold KE, Bailey-Kellogg C. Computationally driven antibody engineering enables simultaneous humanization and thermostabilization. Protein Eng Des Sel. 2016;29(10):419-426. doi:10.1093/protein/gzw024

223. Parker AS, Griswold KE, Bailey-Kellogg C. Optimization of Combinatorial Mutagenesis. J Comput Biol. 2011;18(11):1743-1756. doi:10.1089/cmb.2011.0152

224. Gainza P, Roberts KE, Georgiev I, et al. OSPREY: protein design with ensembles, flexibility, and provable algorithms. Methods Enzymol. 2013;523:87-107. doi:10.1016/B978-0-12-394292-0.00005-9

225. Ponder JW, Case DA. Force Fields for Protein Simulations. In: Advances in Protein Chemistry. 2003:27-85. doi:10.1016/S0065-3233(03)66002-X

226. Foote J, Winter G. Antibody framework residues affecting the conformation of the hypervariable loops. J Mol Biol. 1992;224(2):487-499. doi:10.1016/0022-2836(92)91010-M

227. Makabe K, Nakanishi T, Tsumoto K, et al. Thermodynamic Consequences of Mutations in Vernier Zone Residues of a Humanized Anti-human Epidermal Growth Factor Receptor Murine Antibody, 528. J Biol Chem. 2008;283(2):1156-1166. doi:10.1074/jbc.M706190200

228. Nakanishi T, Tsumoto K, Yokota A, Kondo H, Kumagai I. Critical contribution of VH-VL interaction to reshaping of an antibody: the case of humanization of anti-lysozyme antibody, HyHEL-10. Protein Sci. 2008;17(2):261-270. doi:10.1110/ps.073156708

229. Onda M, Beers R, Xiang L, Nagata S, Wang QC, Pastan I. An immunotoxin with greatly reduced immunogenicity by identification and removal of B cell epitopes. Proc Natl Acad Sci. 2008;105(32):11311-11316. doi:10.1073/pnas.0804851105

230. Cantor JR, Yoo TH, Dixit A, Iverson BL, Forsthuber TG, Georgiou G. Therapeutic enzyme deimmunization by combinatorial T-cell epitope removal using neutral drift. Proc Natl Acad Sci. 2011;108(4):1272-1277. doi:10.1073/pnas.1014739108

231. Mazor R, Eberle JA, Hu X, et al. Recombinant immunotoxin for cancer treatment with low immunogenicity by identification and silencing of human T-cell epitopes. Proc Natl Acad Sci. 2014;111(23):8571-8576. doi:10.1073/pnas.1405153111

232. King C, Garza EN, Mazor R, et al. Removing T-cell epitopes with computational protein design. Proc Natl Acad Sci. 2014. doi:10.1073/pnas.1321126111

233. Schubert B, Schärfe C, Dönnes P, Hopf T, Marks D, Kohlbacher O. Population-specific design of de-immunized protein biotherapeutics. PLoS Comput Biol. 2018;14(3):e1005983. doi:10.1371/journal.pcbi.1005983

234. Choi Y, Verma D, Griswold KE, Bailey-Kellogg C. EpiSweep: Computationally Driven Reengineering of Therapeutic Proteins to Reduce Immunogenicity While Maintaining Function. In: Methods in Molecular Biology. 2017:375-398. doi:10.1007/978-1-4939-6637-0_20

235. Lipinski CA, Lombardo F, Dominy BW, Feeney PJ. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev. 2012;64:4-17. doi:10.1016/j.addr.2012.09.019

236. Wishart DS. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 2006;34(Database issue):D668-D672. doi:10.1093/nar/gkj067

237. Raybould MIJ, Marks C, Krawczyk K, et al. Five computational developability guidelines for therapeutic antibody profiling. Proc Natl Acad Sci. 2019;116(10):4025-4030. doi:10.1073/pnas.1810576116

238. Kovaltsuk A, Leem J, Kelm S, Snowden J, Deane CM, Krawczyk K. Observed Antibody Space: A Resource for Data Mining Next-Generation Sequencing of Antibody Repertoires. J Immunol. 2018;201(8):2502-2509. doi:10.4049/jimmunol.1800708

239. Chothia C, Lesk AM. Canonical structures for the hypervariable regions of immunoglobulins. J Mol Biol. 1987;196(4):901-917. doi:10.1016/0022-2836(87)90412-8

240. Weitzner BD, Kuroda D, Marze N, Xu J, Gray JJ. Blind prediction performance of RosettaAntibody 3.0: Grafting, relaxation, kinematic loop modeling, and full CDR optimization. Proteins Struct Funct Bioinforma. 2014;82(8):1611-1623. doi:10.1002/prot.24534

241. Weitzner BD, Jeliazkov JR, Lyskov S, et al. Modeling and docking of antibody structures with Rosetta. Nat Protoc. 2017;12(2):401-416. doi:10.1038/nprot.2016.180

242. Kuroda D, Gray JJ. Pushing the Backbone in Protein-Protein Docking. Structure. 2016;24(10):1821-1829. doi:10.1016/j.str.2016.06.025

243. Kamerzell TJ, Middaugh CR. The Complex Inter-Relationships Between Protein Flexibility and Stability. J Pharm Sci. 2008;97(9):3494-3517. doi:10.1002/jps.21269

244. Galm L, Amrhein S, Hubbuch J. Predictive approach for protein aggregation: Correlation of protein surface characteristics and conformational flexibility to protein aggregation propensity. Biotechnol Bioeng. 2017;114(6):1170-1183. doi:10.1002/bit.25949

245. Schrag JD, Picard M-È, Gaudreault F, et al. Binding symmetry and surface flexibility mediate antibody self-association. MAbs. 2019;11(7):1300-1318. doi:10.1080/19420862.2019.1632114

246. Kiyoshi M, Caaveiro JMM, Miura E, et al. Affinity improvement of a therapeutic antibody by structure-based computational design: Generation of electrostatic interactions in the transition state stabilizes the antibody-antigen complex. PLoS One. 2014;9(1):e87099. doi:10.1371/journal.pone.0087099

247. Yamashita T, Mizohata E, Nagatoishi S, et al. Affinity Improvement of a Cancer-Targeted Antibody through Alanine-Induced Adjustment of Antigen-Antibody Interface. Structure. 2019;27(3):519-527.e5. doi:10.1016/j.str.2018.11.002

248. Wong SE, Sellers BD, Jacobson MP. Effects of somatic mutations on CDR loop flexibility during affinity maturation. Proteins Struct Funct Bioinforma. 2011;79(3):821-829. doi:10.1002/prot.22920

249. Bostrom J, Haber L, Koenig P, Kelley RF, Fuh G. High Affinity Antigen Recognition of the Dual Specific Variants of Herceptin Is Entropy-Driven in Spite of Structural Plasticity. PLoS One. 2011;6(4):e17887. doi:10.1371/journal.pone.0017887

250. Jeliazkov JR, Sljoka A, Kuroda D, et al. Repertoire Analysis of Antibody CDR-H3 Loops Suggests Affinity Maturation Does Not Typically Result in Rigidification. Front Immunol. 2018;9:413. doi:10.3389/fimmu.2018.00413

251. Fukunaga A, Tsumoto K. Improving the affinity of an antibody for its antigen via long-range electrostatic interactions. Protein Eng Des Sel. 2013;26(12):773-780. doi:10.1093/protein/gzt053

252. Fukunaga A, Maeta S, Reema B, Nakakido M, Tsumoto K. Improvement of antibody affinity by introduction of basic amino acid residues into the framework region. Biochem Biophys Reports. 2018;15:81-85. doi:10.1016/j.bbrep.2018.07.005

253. Fleishman SJ, Whitehead TA, Ekiert DC, et al. Computational Design of Proteins Targeting the Conserved Stem Region of Influenza Hemagglutinin. Science. 2011;332(6031):816-821. doi:10.1126/science.1202617

254. Baran D, Pszolla MG, Lapidoth GD, et al. Principles for computational design of binding antibodies. Proc Natl Acad Sci. 2017;114(41):10900-10905. doi:10.1073/pnas.1707171114

255. Jiang L, Althoff EA, Clemente FR, et al. De Novo Computational Design of Retro-Aldol Enzymes. Science. 2008;319(5868):1387-1391. doi:10.1126/science.1152692

256. Siegel JB, Zanghellini A, Lovick HM, et al. Computational Design of an Enzyme Catalyst for a Stereoselective Bimolecular Diels-Alder Reaction. Science. 2010;329(5989):309-313. doi:10.1126/science.1190239

257. Adolf-Bryfogle J, Kalyuzhniy O, Kubitz M, et al. RosettaAntibodyDesign (RAbD): A general framework for computational antibody design. PLoS Comput Biol. 2018;14(4):e1006112. doi:10.1371/journal.pcbi.1006112

258. Schneider G, Clark DE. Automated De Novo Drug Design: Are We Nearly There Yet? Angew Chemie Int Ed. 2019;58(32):10792-10803. doi:10.1002/anie.201814681

259. Bradbury ARM, Sidhu S, Dübel S, McCafferty J. Beyond natural antibodies: the power of in vitro display technologies. Nat Biotechnol. 2011;29(3):245-254. doi:10.1038/nbt.1791

260. Shukla D, Schneider CP, Trout BL. Molecular level insight into intra-solvent interaction effects on protein stability and aggregation. Adv Drug Deliv Rev. 2011;63(13):1074-1085. doi:10.1016/j.addr.2011.06.014

261. Ohtake S, Kita Y, Arakawa T. Interactions of formulation excipients with proteins in solution and in the dried state. Adv Drug Deliv Rev. 2011;63(13):1053-1073. doi:10.1016/j.addr.2011.06.011

262. Pettersen EF, Goddard TD, Huang CC, et al. UCSF Chimera - A visualization system for exploratory research and analysis. J Comput Chem. 2004;25(13):1605-1612. doi:10.1002/jcc.20084

Tables

Table I. Features used in machine learning models. (A) DeepDDG for predicting thermostability.35 (B) SOLart for predicting solubility.75 (C) A decision tree-based method for prediction of Asn deamidation probability.107

Table II. Databases that store experimental information for predictive model construction in computer-aided antibody design.

Table III. Large-scale experimental data of clinical stage antibodies.

Table IV. Summary of the methods for computer-aided stability engineering.

Table V. Computational methods to assess, predict, and reduce the immunogenicity of antibodies.

Figures

Figure 1. Development processes of protein therapeutics and the roles of computation. All protein pictures in this work were drawn with UCSF Chimera.262

Figure 2. Equilibrium between folded and unfolded proteins and its relation to protein stability.

Figure 3. Potential chemical degradation sites (deamidation and isomerization) in an antibody (PDB: 1IGT). Sequence motifs for Asn deamidation (NG, NS, NN, NT, and NH) are colored in blue. Motifs for Asp isomerization (DG, DS, DD, DT, and DH) are colored in green.

Figure 4. General workflow of computational methods.
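The dipeptide motifs listed in the Figure 3 caption can be located with a simple pattern scan. The Python sketch below is a minimal illustration under stated assumptions: it flags only the ten motifs named in the caption and ignores structural context (backbone conformation, solvent accessibility, flanking residues), which, as the features in Table I(C) indicate, strongly modulates actual degradation rates. The function names are hypothetical and the toy sequence is not a real antibody.

```python
import re

# Dipeptide motifs named in the Figure 3 caption (sequence-level only).
DEAMIDATION_MOTIFS = ("NG", "NS", "NN", "NT", "NH")    # Asn deamidation
ISOMERIZATION_MOTIFS = ("DG", "DS", "DD", "DT", "DH")  # Asp isomerization


def find_motifs(sequence, motifs):
    """Return (1-based position, motif) for every occurrence of any motif.

    A zero-width lookahead is used so that overlapping hits, such as the
    two motifs in "NNG" (NN at position 1, NG at position 2), are both reported.
    """
    seq = sequence.upper()
    pattern = re.compile("(?=(" + "|".join(motifs) + "))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq)]


def scan_liabilities(sequence):
    """Flag potential deamidation and isomerization motifs in a sequence."""
    return {
        "deamidation": find_motifs(sequence, DEAMIDATION_MOTIFS),
        "isomerization": find_motifs(sequence, ISOMERIZATION_MOTIFS),
    }


if __name__ == "__main__":
    # Hypothetical toy fragment for illustration only.
    toy = "SNNGTDGKWDSA"
    for kind, hits in scan_liabilities(toy).items():
        print(kind, hits)
```

A motif hit is only a candidate; ranking candidates would require structure-based features such as those in Table I(C).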

Table I. Features used in machine learning models. (A) DeepDDG for predicting thermostability.35 (B) SOLart for predicting solubility.75 (C) A decision tree-based method for prediction of Asn deamidation probability.107

(A) Prediction of ∆∆G upon mutations
Sequence-based features: amino acid types (wild type, mutant, neighbor); position-specific scoring matrix; fitness score derived from a multiple sequence alignment; protein design probability.
Structure-based features: backbone dihedral angles (Phi, Psi, Omega); secondary structures; solvent accessible surface area; number of hydrogen bonds (backbone-backbone, backbone-side chain, side chain-side chain); distance and orientation between the mutated residues and the neighboring residues.

(B) Prediction of solubility and aggregation
Sequence-based features: amino acid compositions; protein length.
Structure-based features: solubility-dependent statistical potentials; secondary structures; solvent accessible surface area.

(C) Prediction of Asn deamidation probability
Sequence-based features: pentapeptide deamidation half-life; C-terminal flanking residue.
Structure-based features: backbone dihedral angles (Phi, Psi); side-chain dihedral angles (Chi1, Chi2); Asn local secondary structure; percent solvent accessibility; solvent accessible surface area; side-chain hydrogen bonds; nucleophilic C-N attack distance.
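Table I(B) lists amino acid composition and protein length among the sequence-based features used for solubility prediction. A minimal sketch of how such features might be computed from a sequence follows; it is not SOLart's implementation, and the encoding (20 composition fractions followed by the length) and the function name are assumptions for illustration.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors line up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_features(sequence):
    """Return a 21-element feature vector: 20 amino acid fractions + length.

    Non-standard letters (e.g. X) count toward the length but do not
    contribute to any composition bin.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    composition = [counts[aa] / len(seq) for aa in AMINO_ACIDS]
    return composition + [float(len(seq))]


if __name__ == "__main__":
    # Short toy fragment for illustration only.
    features = sequence_features("EVQLVESGGG")
    print(len(features), features[-1])
```

Structure-based features in Table I(B), such as statistical potentials or solvent accessible surface area, would need a 3D model and are outside the scope of this sketch.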

Table II. Databases that store experimental information for predictive model construction in computer-aided antibody design.

Conformational stability
ProTherm [114]: Various parameters, such as ∆∆G, structures, and experimental information of wild-type and mutant proteins. https://www.iitm.ac.in/bioinfo/ProTherm/

Colloidal stability
ALBase [115]: Sequences of antibody light chains (4364^a), including 808 amyloidogenic sequences from AL patients. http://albase.bumc.bu.edu/aldb/
ZipperDB [116]: Predictions of fibril-forming peptides within proteins identified by the 3D Profile Method. https://services.mbi.ucla.edu/zipperdb/
AMYPdb: Amyloid precursor proteins, results of sequence analysis. http://amypdb.genouest.org/e107_plugins/amypdb_project/project.php
Waltz-DB [89]: Hexapeptides (243 amyloid/836 non-amyloid), experimental information (electron microscopy, FT-IR, Thioflavin), computed scores (WALTZ, TANGO, PASTA). http://waltzdb.switchlab.org/
AmyPro [117]: Validated amyloid precursor proteins and prions together with information on their amyloidogenic regions/domains and a broad functional classification of their amyloid state (162^a). http://amypro.net/
CPAD [118]: Peptides (amyloid/non-amyloid), aggregation-prone regions (APRs) on proteins, aggregation rates upon mutations. https://www.iitm.ac.in/bioinfo/CPAD/

Solubility
eSol [77]: Solubility of Escherichia coli proteins synthesized by the PURE system (4,133^a). http://www.tanpaku.org/tp-esol/index.php?lang=en

^a The number of data points at the time this review was written.

Table III. Large-scale experimental data of clinical stage antibodies.

Developability [90]: Experimental results from 12 assays (Tm by DSF, SGAC-SINS AS100, HIC Retention Time, SMAC Retention Time, Slope for Accelerated Stability, Poly-Specificity Reagent SMP Score, Affinity-Capture Self-Interaction Nanoparticle Spectroscopy, CIC Retention Time, CSI-BLI Delta Response, ELISA, BVP ELISA).
Asn deamidation and Asp isomerization [25]: Experimental results on chemical modifications identified by LC-MS/MS analysis.
Met oxidation [102]: Experimental results on a chemical modification identified by LC-MS/MS analysis.

Table IV. Summary of the methods for computer-aided stability engineering.

∆∆G prediction (e.g., Rosetta, FoldX) [125,131,132,130,134]: Assessing the fitness landscape of amino acid sequences for a given structure based on folding stability, where ∆∆G = ∆G_Mut − ∆G_WT.
Supercharging [135,136,137,138]: Replacing amino acids on the protein surface with charged residues; this often leads to better refolding properties.
Spatial Aggregation Propensity [139,140,141,143,149,152]: Quantifying the solvent exposure of hydrophobic amino acids in a given structure or during molecular simulations.
CamSol [150,141,152]: Predicting aggregation-prone regions based on a computed solubility profile.
Solubis [154,155]: Identifying mutations that simultaneously improve conformational and colloidal stabilities.
Brownian dynamics simulations [157]: Estimating the solubility of input molecules based on computed association kinetics.
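The idea behind Spatial Aggregation Propensity in Table IV, weighting hydrophobicity by solvent exposure and summing over a spatial neighborhood, can be sketched at residue level as follows. This is a simplified variant, not the published atom-level SAP: it assumes one side-chain centroid and one precomputed relative side-chain exposure per residue, and it substitutes the Kyte-Doolittle hydropathy scale shifted so that glycine is zero. The names and the 5 Å default radius are assumptions.

```python
import math

# Kyte-Doolittle hydropathy values (assumed scale for illustration; not the
# scale used in the original SAP work), shifted so that glycine is zero.
_KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
HYDROPHOBICITY = {aa: value - _KD["G"] for aa, value in _KD.items()}


def sap_like_scores(residues, radius=5.0):
    """Simplified, residue-level SAP-style score.

    residues: list of (aa, (x, y, z), rel_exposure), where the coordinates are
    a side-chain centroid in angstroms and rel_exposure is the side chain's
    solvent accessible area divided by that of a fully exposed residue (0..1).
    Returns one score per residue: the exposure-weighted hydrophobicity summed
    over all residues (including itself) whose centroid lies within `radius`.
    """
    scores = []
    for _, center_i, _ in residues:
        total = 0.0
        for aa_j, center_j, exposure_j in residues:
            if math.dist(center_i, center_j) <= radius:
                total += exposure_j * HYDROPHOBICITY[aa_j]
        scores.append(total)
    return scores


if __name__ == "__main__":
    # Hypothetical three-residue toy input, not taken from a real structure.
    toy = [
        ("L", (0.0, 0.0, 0.0), 0.8),
        ("F", (3.0, 0.0, 0.0), 0.6),
        ("K", (20.0, 0.0, 0.0), 0.9),
    ]
    print(sap_like_scores(toy))
```

Residues with large positive scores indicate exposed hydrophobic patches. The all-pairs distance loop is O(n²), which is adequate for a single Fv; a spatial index would be preferable for larger systems or many simulation frames.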

Table V. Computational methods to assess, predict, and reduce the immunogenicity of antibodies.

Assessment of immunogenicity (method: input; description; availability in the public domain; reference)
SHAB: Sequence; humanness score based on sequence identity to human antibody sequences; http://www.bioinf.org.uk/abs/shab/; [187]
T20 score analyzer: Sequence; humanness score based on sequence identity to the top 20 matched human antibody sequences; https://dm.lakepharma.com/bioinformatics/; [190]
Germinality Index (GI): Sequence; humanness score based on sequence identity to the closest human germline sequence; N/A; [189]
Seeliger potential: Sequence; species-specific statistical potentials based on residue frequency; N/A; [192]
MG score: Sequence; species-specific statistical potentials based on residue frequency; N/A; [196]
Human String Content (HSC): Sequence; humanness score based on potential short stretches of T-cell epitopes; N/A; [197]

Humanization of antibodies (method: input; description; availability in the public domain; reference)
Tabhu (CDR grafting and back-mutations): Sequence; framework template search followed by CDR grafting with back-mutations; http://www.biocomputing.it/tabhu; [206]
Superhumanization: Sequence; germline framework template search based on CDR homology; N/A; [203,204]
Resurfacing: Structure; replacement of solvent-exposed residues with the corresponding residues observed in human antibodies; N/A; [205]
EpiSweep: Sequence/Structure; deimmunization based on T-cell epitope prediction while preserving other properties such as affinity and stability; available upon request through the corresponding author; [234]
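Several humanness scores in Table V (SHAB, T20, GI) are described as sequence identity to human antibody or germline sequences. The sketch below illustrates that idea under loud assumptions and is not any of those tools: it takes sequences already aligned to a common numbering (real tools handle alignment and numbering themselves), computes percent identity, and averages over the top N closest references. With top_n=20 it mirrors the idea behind the T20 score; with top_n=1 it resembles an identity-to-closest-germline score such as GI. Function names and sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity of two pre-aligned sequences of equal length.

    Gap characters ('-') are never counted as matches, but the
    denominator is the full aligned length.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not a:
        raise ValueError("empty alignment")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def humanness_score(query, human_refs, top_n=20):
    """Mean percent identity of the query to its top_n closest references."""
    if not human_refs:
        raise ValueError("need at least one reference sequence")
    identities = sorted(
        (percent_identity(query, ref) for ref in human_refs), reverse=True
    )
    top = identities[:top_n]
    return sum(top) / len(top)


if __name__ == "__main__":
    # Hypothetical, pre-aligned toy fragments; not real germline sequences.
    query = "QVQLVQSG"
    refs = ["QVQLVQSG", "EVQLVESG", "QVQLQESG"]
    print(humanness_score(query, refs, top_n=2))
```

Averaging over several top hits, as in the T20 idea, is less sensitive to a single unusually close reference than a top-1 score, at the cost of depending on how densely the reference set samples human sequence space.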