Industrial Scale Gene Synthesis


CHAPTER ELEVEN

Frank Notka,* Michael Liss,* and Ralf Wagner*,†

* Life Technologies Inc./GeneArt AG, Regensburg, Germany
† Institute of Medical Microbiology and Hygiene, Molecular Microbiology and Gene Therapy, University of Regensburg, Regensburg, Germany

Methods in Enzymology, Volume 498 ISSN 0076-6879, DOI: 10.1016/B978-0-12-385120-8.00011-5
© 2011 Elsevier Inc. All rights reserved.

Contents
1. Brief History of Gene Synthesis
2. Applications of Synthetic Genes
   2.1. Availability and safety
   2.2. Origin and reliability
   2.3. Expression efficiency
   2.4. Protein performance
   2.5. Cost, capacity, and speed
   2.6. Flexibility of design: artificial genes, operons, and genomes
3. State-of-the-Art Gene Synthesis
4. Gene Synthesis and Synthetic Biology: From Genes to Genomes
   4.1. Information
   4.2. Modularity
   4.3. Standardization
   4.4. Technological developments
5. Industrial Gene Synthesis: From Bench to Manufacturing
   5.1. Process features
   5.2. Biosafety/biosecurity
   5.3. Optimization rationale
   5.4. Optimizer software
6. Design Tool: GeneOptimizer
   6.1. Project design
   6.2. Sequence design
   6.3. Construction design
7. Production Processing: LIMS
   7.1. Steering process
   7.2. Process control
   7.3. Process expansion
   7.4. Order entry
   7.5. Order processing
   7.6. Oligonucleotide production
   7.7. Subfragment production
   7.8. Assembly
8. Case Study: Large-Scale Gene Production
9. Conclusion
References

Abstract

The most recent developments in deep DNA sequencing and downstream quantitative and functional analysis are rapidly adding a new dimension to our understanding of biochemical pathways and metabolic interdependencies. These growing insights pave the way to new strategies that address public needs, including environmental applications, therapeutic inventions, and novel cell factories as sustainable sources of energy or chemicals. Building upon nonnaturally occurring networks and pathways adds yet another level. Recent developments in synthetic biology have created economical and reliable options for designing and synthesizing genes, operons, and eventually complete genomes. Meanwhile, high-throughput design and synthesis of extremely comprehensive DNA sequences have evolved into an enabling technology that is already indispensable in various life science sectors.

Here, we describe the industrial perspective of modern gene synthesis and its relationship with synthetic biology. Gene synthesis contributed significantly to the emergence of synthetic biology, not only by providing genetic material in high quality and quantity but also by enabling its assembly, according to engineering design principles, in a standardized format. Synthetic biology, on the other hand, added the need to assemble complex circuits and large complexes, thus fostering the development of appropriate methods and expanding the scope of applications. Synthetic biology has also stimulated interdisciplinary collaboration and the integration of the broader public by addressing socioeconomic, philosophical, ethical, political, and legal opportunities and concerns. The demand-driven technological achievements of gene synthesis and the implemented processes are exemplified by an industrial setting of large-scale gene synthesis, describing production from order to delivery.

1. Brief History of Gene Synthesis

For about three decades, the top-down approach of manipulating living organisms by breeding and crossbreeding has been largely augmented by the novel bottom-up techniques of direct genetic manipulation. In 1978, the Nobel Prize in Physiology or Medicine was awarded to Werner Arber, Daniel Nathans, and Hamilton O. Smith for discovering restriction enzymes and their application in molecular genetics. At the time, an editorial comment in Gene stated “... The work on restriction nucleases not only permits us easily to construct recombinant DNA molecules and to analyze
individual genes but also has led us into the new era of synthetic biology where not only existing genes are described and analyzed but also new gene arrangements can be constructed and evaluated” (Szybalski and Skalka, 1978). This cornerstone in molecular biology gave birth to the success story of genetic engineering we have witnessed over the past 30 years. Other important milestones during this period were certainly the invention of the polymerase chain reaction (PCR) (Saiki et al., 1985), cheap automated production of oligonucleotides, and high-throughput DNA sequencing systems. The systematic genetic manipulation and redesign of novel strains and genetically modified organisms (GMOs) are based on the removal of cross-species boundaries, the rearrangement of natural genetic building blocks, and the introduction of minor modifications into natural DNA sequences. Still today, most attempts to generate organisms with novel phenotypes rely on a trial-and-error approach due to the fact that living systems are extremely complex by nature and far from being fully understood. This is somewhat unsatisfying, since true construction and genuine design of machines or other man-made items aim to be as flexible, yet as standardized and predictive as possible. The emerging field of synthetic biology aims to apply the standardized process of engineering disciplines to biological sciences: working with standardized parts, combining these elements according to given syntax rules, and finally, being able to predict the effect of an assembly as precisely as possible. The prime requirement for this task is the actual availability of genetic elements that do not exist in nature. As such, de novo gene synthesis is considered the key enabling technology for synthetic biology. In 1970, the first example of a synthetically produced gene was demonstrated by Khorana and coworkers (Agarwal et al., 1970). 
In an effort taking several years, they assembled a 77-bp gene encoding yeast alanine transfer RNA using short oligonucleotides obtained by organic chemistry methods. While in those days gene synthesis was still restricted by the limited availability of synthetic oligonucleotides, the development of automated oligo synthesizers and the subsequent decline in prices of related services motivated the emergence of novel gene synthesis methods, for example, using T4 DNA ligase (Edge et al., 1981), heat-stable ligases (Barany and Gelfand, 1991), and the ligase chain reaction (LCR) (Young and Dong, 2004). With the invention of PCR by Kary B. Mullis in 1985 (Saiki et al., 1985), de novo gene synthesis became accessible to a broad market. Several PCR oligonucleotide assembly methods emerged based on one or more primer extension steps with subsequent amplification. Their application crossed the 1000 bp size barrier in 1990 with the synthesis of a 2.1-kb fully synthetic plasmid by Young and colleagues (Mandecki et al., 1990). Since then, ever larger synthetic DNA molecules have been constructed, although usually put together from smaller de novo synthesized 1–2 kb modules by classical ligation and/or recombination, for example, an infectious
approximately 7.5 kb poliovirus cDNA (Cello et al., 2002), or a contiguous 32 kb polyketide synthase gene cluster (Kodumal et al., 2004; Menzella et al., 2006).

The current pinnacle of this advance is the compilation of an entirely synthetic bacterial genome. The group around J. Craig Venter designed, synthesized, and assembled the 1.08-Mbp Mycoplasma mycoides JCVI-syn1.0 genome starting from digitized genome sequence information. Synthetic building blocks of approximately 1 kb were first assembled from oligonucleotides and then recombined into approximately 10 kb fragments in yeast. In the next step, these were likewise recombined into approximately 100 kb intermediates, and then into the complete bacterial genome, which was subsequently transplanted into recipient Mycoplasma capricolum cells. This resulted in the first self-replicating organism derived from a fully synthetic genome (Gibson et al., 2010).
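The scale of the hierarchical assembly described above can be illustrated with a rough calculation. The following is a minimal sketch that only divides the genome length by the approximate fragment size at each stage; it ignores the overlaps required for recombination, so the actual fragment numbers used by Gibson et al. (2010) may differ slightly. The function name `stage_counts` is hypothetical and chosen for illustration.

```python
def stage_counts(genome_bp, stage_sizes_bp):
    """Approximate number of fragments per assembly stage.

    Assumption: fragments tile the genome without overlap, which
    underestimates real designs that need homologous overlaps.
    """
    # Integer ceiling division avoids floating-point rounding.
    return {size: -(-genome_bp // size) for size in stage_sizes_bp}


genome_bp = 1_080_000  # 1.08 Mbp, as stated in the text
counts = stage_counts(genome_bp, [1_000, 10_000, 100_000])
for size, n in counts.items():
    print(f"{size:>7} bp fragments: about {n}")
```

Under these assumptions, roughly a thousand 1 kb building blocks collapse into about a hundred 10 kb fragments and about ten 100 kb intermediates, which shows why each recombination stage reduces the number of parts by an order of magnitude.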

2. Applications of Synthetic Genes

The first examples of genes constructed from synthetic oligonucleotides were primarily motivated by the relative complexity of obtaining these molecules using alternative molecular techniques (Itakura et al., 1977; Koster et al., 1975). The ensuing rapid progress of genetic manipulation, in particular the invention of PCR, later offered much faster access to genetic material from natural sources. Thus, for some years, the potential of synthetic genes fell into oblivion, until the growing coverage of sequence databases and the limited flexibility and performance of natural genes stimulated a new need for synthetic genes.

2.1. Availability and safety

Today, the conversion of electronic sequence data into actual bioactive molecules is a vital tool in biotechnology. In many cases, the natural source material for isolating genes is simply not available, or the steps required to obtain a full-length gene are too labor intensive. Biosafety may also be a reason for choosing artificial genes, since working with isolated genes removed from the context of the complete organism is classified as level 1 (no risk) in most cases. Another protective feature of synthetic genes that use alternative codons is their decreased ability to recombine with otherwise homologous wild-type sequences, which may be an issue with viral sequences or human oncogenes.

2.2. Origin and reliability

Industrial projects in particular require most steps in research and production to be well documented and certified for regulatory reasons. This also includes the audit trail of the research reagents’ origin. It is sometimes
challenging to retrace a gene’s laboratory history, or the gene may derive from sources or collections that do not meet regulatory demands. Obtaining a physical gene from an ISO-certified provider circumvents this problem and is a straightforward strategy for gapless documentation. It also assures full sequence fidelity according to project design requirements, since experience shows that many constructs derived from in-house, public, and commercial gene collections are not identical to the documented sequence.

2.3. Expression efficiency

To date, most experiments in biotechnology include the recombinant expression of proteins, either to change the host’s phenotype or to directly obtain and purify the overproduced polypeptide. The dissimilar genetic and biochemical setup of different species usually causes nonoptimal transcription, processing, stability, and translation of the extrinsic gene or mRNA. Employing multiparameter optimization allows adaptation of a coding sequence to the requirements of the host, so that it performs like a native gene. Moreover, since most natural genes have not evolved for maximum expression, optimization can introduce this feature. With an overall effect on protein production yields ranging from a 10% increase to high expression of a previously undetectable gene product, optimization improves not only cross-species performance but also autologous expression, for example, the production of human genes in mammalian cells (Fath et al., 2011).
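The idea of adapting a coding sequence to a host can be illustrated with the simplest possible strategy: choosing, for each amino acid, the codon the host is assumed to prefer. The sketch below is not the multiparameter optimization referred to above, which weighs many sequence properties at once; it covers a single parameter (codon choice) and reports GC content as one example of a second property such an optimizer would have to balance. The preference weights in `PREFERRED` are hypothetical illustrative values, not measured codon usage data, and the table covers only a few amino acids.

```python
# Hypothetical host codon preferences (illustrative weights, NOT measured data).
PREFERRED = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.4, "AAG": 0.6},
    "L": {"CTG": 0.5, "CTC": 0.2, "TTA": 0.1, "TTG": 0.2},
    "E": {"GAA": 0.4, "GAG": 0.6},
    "*": {"TAA": 0.6, "TGA": 0.4},  # stop codons
}


def back_translate(protein):
    """Pick the highest-weighted codon for every residue (single-parameter sketch)."""
    return "".join(max(PREFERRED[aa], key=PREFERRED[aa].get) for aa in protein)


def gc_content(dna):
    """Fraction of G and C bases; one of several properties a real optimizer balances."""
    return sum(base in "GC" for base in dna) / len(dna)


gene = back_translate("MKLE*")
print(gene, f"GC = {gc_content(gene):.2f}")
```

A realistic design tool would score candidate sequences on several criteria simultaneously rather than greedily taking the top codon at every position, since always choosing the most frequent codon can introduce repeats or unfavorable local GC content.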

2.4. Protein performance

Not only genes but also their products are often suboptimal for technological and industrial purposes. Increasing numbers of recombinant proteins are being employed in healthcare, the chemical and food industries, agriculture, and everyday household products. Here, they must perform under conditions that are substantially different from their natural environment. Viral antigens for immunization ought to be highly immunogenic, humanized antibodies for cancer therapy must recognize distinct cellular targets, and enzymes in laundry detergents have to perform under the harsh conditions of a washing machine, to name just a few. Proteins need to be engineered in order to be of commercial use. However, rational computation and prediction of the necessary alterations is extremely difficult, and in most cases unachievable, since we still lack sufficient knowledge to deduce three-dimensional protein structures from the amino acid sequence. Here, it is common practice to use methods of directed evolution, that is, the generation and selection of many protein variants. While earlier methods to produce gene collections or gene libraries for this purpose involved tedious targeted or random mutagenesis, gene synthesis provides much faster access
to these collections and on a more rational basis. During gene fabrication, the use of oligonucleotides carrying controlled degeneracies at defined positions allows the production of libraries that result in proteins where only the relevant amino acids are subject to substitution. This confines the desired variability to the areas of interest and dramatically increases the success rate of protein improvement through directed evolution.
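The effect of placing degenerate positions in an oligonucleotide can be made concrete by counting the variants a degenerate codon produces. The following sketch expands a codon written in IUPAC ambiguity codes and translates the result with the standard genetic code. The choice of NNK as the example is an assumption for illustration (it is a commonly used randomization scheme, not one named in this chapter), and the helper names are hypothetical.

```python
from itertools import product
from math import prod

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons ordered by T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(degenerate_codon):
    """List every concrete codon encoded by a degenerate codon."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in degenerate_codon))]


def summarize(degenerate_codon):
    """Return (number of codons, number of amino acids, stop codon present?)."""
    residues = {CODON_TABLE[c] for c in expand(degenerate_codon)}
    return len(expand(degenerate_codon)), len(residues - {"*"}), "*" in residues


def library_size(degenerate_codons):
    """DNA-level diversity of an oligo with several randomized codons."""
    return prod(len(expand(c)) for c in degenerate_codons)


print(summarize("NNK"))            # codons, amino acids, stop present
print(library_size(["NNK"] * 3))   # three randomized positions
```

Restricting randomization to a few chosen codons keeps the library small enough to screen, whereas random mutagenesis over a whole gene spreads the same number of variants over mostly irrelevant positions.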

2.5. Cost, capacity, and speed

The considerable decline in prices for synthetic genes has created a source of biological DNA sequences that economically outcompetes classic genetic engineering methods. Molecular cloning steps, necessary in many projects as groundwork, can be outsourced, and internal resources can be focused on genuine research goals. Relocating manual DNA manipulation to an automated industrial manufacturing process also dramatically increases throughput: more genes can be obtained in a shorter time, which is a vital necessity in the competitive domains of commercial and scientific biotechnology.

2.6. Flexibility of design: artificial genes, operons, and genomes

The freedom to access any imaginable DNA sequence allows not only the modification and adaptation of naturally occurring molecules but also the realization of some very new visions in synthetic biology (Heinemann and Panke, 2006). A major goal within this field is to design and construct new metabolic pathways within a producer cell. This must address three major obstacles.

First, for a stable and efficient series of reactions, the enzymes involved must be expressed in a highly concerted manner. Very much like other engineering technologies, this demands the availability of standardized regulatory parts and elements. Ideally, promoters, ribosome binding sites, terminators, DNA-binding proteins, corresponding protein landing sites, etc. should be available with various well-characterized potencies and specificities. Together with sophisticated computer-aided design and simulation tools, these elements ought to be combined a priori to compile novel pathways.

Second, fast and efficient formation of new gene clusters or operons requires the simultaneous assembly of such parts in a robust, yet flexible way. Classical restriction sites do not allow for arbitrary combination of multiple elements simultaneously. Novel in vitro recombination technologies in conjunction with artificial modular junction sites can offer solutions in this direction.

Third, establishing an extrinsic biochemical pathway within a living cell must always be perceived in the context of its entire metabolism. The availability of only one diffusion space does not allow efficient spatial separation of distinct reaction steps, and participating
intermediates can always interfere with both the projected pathway and total cell fitness. Therefore, one aim is to construct simplified “chassis” strains with genomes reduced to the lowest number of genes necessary for cellular survival and growth (Gibson et al., 2008). Here again, knocking out dispensable genes one by one by conventional methods is likely to be a highly tedious strategy. More likely, the in vitro synthesis of complete genomes, designed from scratch, will provide a much faster and more flexible way to bring these organisms to life. It is reasonable to assume that the milestone of the complete synthesis and transplantation of a 1.08-Mbp M. mycoides genome will drive further developments toward modular gene cluster construction kits in conjunction with compatible host strains, allowing for true engineering strategies in biological sciences.

3. State-of-the-Art Gene Synthesis

Gene synthesis has emerged as a new application of genetic engineering, utilizing oligonucleotides and different methods of assembling these to generate stretches of double-stranded DNA, usually cloned into a plasmid vector. The numerous methods employed today vary widely based on the length and complexity of the DNA, and depending on other factors such as intellectual property rights or high throughput and automation capability. Although synthetic genes can readily be ordered via the internet, the methods applied are usually basic genetic engineering methods, and hence gene synthesis can be performed at the molecular biology bench using typical reagents and procedures.

In general, DNA synthesis relies on assembling individually presynthesized oligonucleotides, typically by PCR-based reactions (e.g., SCR: sequential chain reaction), or by ligation of predefined reusable oligos (Slonomics technology) (Van den Brulle et al., 2008), followed by standard cloning procedures and final quality control. Oligonucleotides are usually ordered from commercial providers, since the process of oligonucleotide synthesis has been automated and oligos can be produced very economically.

Standard chemical oligo synthesis is a cyclical process that elongates a chain of nucleotides from the 3′- to the 5′-end. The phosphoramidite four-step process, developed in the early 1980s, couples an acid-activated deoxynucleoside phosphoramidite to a deoxynucleoside on a solid support. Although this is the method of choice currently used by most commercial oligonucleotide synthesizers, the specifications for oligo usage in gene synthesis require adjustments to quantity and quality. Since PCR-based methods are intrinsically error-prone due to the high error rate associated with oligonucleotide synthesis and sequence mutations introduced during PCR amplification (Xiong et al., 2008), gene synthesis greatly depends on oligonucleotides with maximum sequence accuracy.
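Why oligonucleotide quality matters so much for a cyclical chemistry can be seen from a commonly used simplified model, in which every coupling cycle succeeds with the same probability. Under that assumption, the fraction of full-length product for an oligo of n nucleotides is the coupling efficiency raised to the power n - 1. The sketch below applies this model; the efficiencies and the oligo length are illustrative values, not figures from this chapter, and the model ignores other error sources such as depurination or incomplete deprotection.

```python
def full_length_fraction(length_nt, coupling_efficiency):
    """Fraction of chains that reach full length.

    Assumption: constant, independent per-cycle coupling efficiency.
    An oligo of `length_nt` nucleotides needs `length_nt - 1` couplings,
    because the first nucleoside is already attached to the solid support.
    """
    if length_nt < 1 or not 0.0 <= coupling_efficiency <= 1.0:
        raise ValueError("length must be >= 1 and efficiency within [0, 1]")
    return coupling_efficiency ** (length_nt - 1)


# Illustrative comparison of two assumed per-cycle efficiencies.
for eff in (0.99, 0.995):
    print(f"efficiency {eff}: {full_length_fraction(50, eff):.2f} full length")
```

Even a small difference in per-cycle efficiency compounds over dozens of cycles, which is one reason gene synthesis favors moderately short oligonucleotides and places such emphasis on synthesis accuracy.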


In addition, a balanced ratio of oligo quality and quantity is desired, because gene synthesis needs only low amounts compared to other conventional oligonucleotide-based applications. Scaling down oligonucleotide synthesis is one of the most efficient ways to reduce gene synthesis costs, although reductions in production volume and chemical consumption are limited by current phosphoramidite-based synthesis processes and the need to preserve a high quality level. Oligonucleotide synthesis specifically for use in gene synthesis continues to develop, both in terms of production scales and by adopting methods developed for other applications, for example, chip synthesis technology.

The first-step synthetic DNA fragment obtained by combining these basic components is usually limited to approximately 500–2000 bp, depending on the technology used. Accordingly, subsequent assembly steps are required for larger genes or higher order complexes. Again, the arsenal of assembly technologies is large and still evolving. Today, most users apply techniques that are PCR-based (e.g., PCA: polymerase chain assembly), ligase-based (LCR), mixtures of both, or homology-based (SLIC: sequence- and ligation-independent cloning; RED recombination, etc.). However, commercial gene synthesis performing large-scale DNA fabrication at base-level precision is transforming genetic engineering from a laborious art into an industrial, information- and technology-driven discipline (for a review see Czar et al., 2009).

It is expected that the demand for synthetic genes driven by synthetic biology will challenge existing gene synthesis capabilities. Thus it is not surprising that current chemical DNA synthesis and gene assembly methods are being supplemented with new engineering tools, technologies, and trends that aim to provide or extend gene synthesis capacities while cutting production costs.
Some of these developments include oligonucleotide synthesis from DNA microarrays or the use of microfluidics and multiplex gene synthesis technologies (reviewed by Tian et al., 2009). Recent developments target reliability and process stability as well as simplifying existing processes, for example, by introducing smart error correction methods, reducing and improving oligo assembly, or providing assembly devices (Cheong et al., 2010; Gordeeva et al., 2010; Huang et al., 2009; TerMaat et al., 2009).
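The first assembly step, in which a fragment is built from overlapping oligonucleotides, can be sketched as a simple tiling problem. The example below cuts a target sequence into fixed-length oligos that overlap their neighbors and alternates between the sense and antisense strands, as is typical for PCR-based assembly. It is a minimal sketch under simplifying assumptions: oligo length and overlap are fixed, hypothetical defaults, and no attempt is made to balance melting temperatures or avoid secondary structures, which practical design tools would consider.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_oligos(seq, oligo_len=40, overlap=20):
    """Split `seq` into overlapping oligos on alternating strands.

    Returns a list of (start, strand, oligo) tuples. The last oligo may be
    shorter than `oligo_len` if the sequence length is not a multiple of
    the step size (a simplification of this sketch).
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than the oligo")
    step = oligo_len - overlap
    oligos = []
    start = 0
    while True:
        piece = seq[start:start + oligo_len]
        strand = "+" if len(oligos) % 2 == 0 else "-"
        oligos.append((start, strand, piece if strand == "+" else reverse_complement(piece)))
        if start + oligo_len >= len(seq):
            break
        start += step
    return oligos


target = "ACGT" * 25  # 100-nt placeholder sequence for illustration
for start, strand, oligo in tile_oligos(target):
    print(start, strand, oligo)
```

Because each oligo shares its ends with both neighbors, the set can prime itself in an extension reaction; the number of oligos, and with it the chance of incorporating a synthesis error, grows linearly with fragment length, which is consistent with the practical limit on first-step fragment size mentioned above.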

4. Gene Synthesis and Synthetic Biology: From Genes to Genomes

Synthetic biology is a truly interdisciplinary development with many different scientific, commercial, social, and political aspects, interests, and implications. Much has been initiated and already accomplished; some aspects, however, are still at an infant or developmental stage. For example,
simple provision of synthetic DNA as elementary components is rather advanced and a complete industry has evolved within the past decade (Graf et al., 2009). Still, in order to efficiently exploit the potential that gene synthesis offers synthetic biology for developing new applications, some issues need specific attention.

4.1. Information

Synthetic biology comprises different layers of networks associated with information. Function, regulation, flux, and genetic information are just a selection of relevant categories where information input is needed to generate specific processes. The information needed gains complexity, and becomes cumulatively more difficult to provide, as one moves from (i) assignable single functions, such as specific catalytic activities or unique binding sites, to (ii) more complex but still defined regulatory interrelations, as found in operons or logical function devices, and on to (iii) complex higher order processes found in metabolic pathways, cells, or organisms. Nevertheless, or precisely for this reason, the first requirement in synthetic biology is knowledge and, more importantly, access to it.

A comprehensive database for synthetic biology, should it exist, must provide information beyond technological details, function, and how a product can be used. Depending on the goals and application field, it is mandatory to be informed about community standards associated with a component or device (for example, on assembly, quality control, quantification, and intellectual property rights) and about other legal regulations such as biosafety and biosecurity issues. At present, a comprehensive data collection is not available, although the information exists. Therefore, one central demand for synthetic biology to prosper is without a doubt generating and maintaining an information database. Since such a project reflects the broader public interest in addition to its scientific significance, and since it will require substantial resources, it will most likely require public funding.

One example pointing in the right direction has been developed by the Spanish National Cancer Research Center, supported by various national and international funding agencies. Bionemo (Biodegradation Network Molecular Biology Database) is an online data collection that stores manually curated information, extracted from published articles, about proteins and genes directly implicated in biodegradation metabolism (http://bionemo.bioinfo.cnio.es/Run.cgi).

4.2. Modularity

Information such as the properties associated with a part (protein) or a subpart (domain) is readily available and can be used to design and produce new functions by combining what is known. The engineering and
modification of individual proteins was traditionally dominated by directed evolution methods, providing appropriate pools of proteins with partly randomized sequence and methods to select the desired variation (Bershtein and Tawfik, 2008). More recently, computational protein design methods are becoming increasingly successful with structure-based engineering of protein folds, interactions, and activities (Van der Sloot et al., 2009). Apart from the design and engineering by manipulation of residues, combination and fusion of whole protein domains is gradually becoming more popular (Heyman et al., 2007; Parmeggiani et al., 2008) and might be further boosted by the concept of accessing standardized parts and subparts. The smallest design entity in the vocabulary of synthetic biology refers to a subpart that characterizes the discrete minimal sequence requirement associated with a function performing specific tasks independently of the other subparts (one defined segment of a more complex whole). The subpart comprises (i) a domain with respect to structural sequences (being translated into proteins) and (ii) a functional motif, for example, a transcription factor binding site, with respect to functional sequences (e.g., regulatory sequences as represented by a promoter). Starting with subparts and increasing the complexity via assembly into parts, devices, systems, and genomes, the basic requirements for a circuit diagram-based construction that can be characterized by interchangeable, functionally well-defined, and ultimately normalized components become obvious. Subparts and all subsequent constructions need to be freely linked and arranged in order to combine the intended functions. The connection sites also need to be flexible, providing the potential to introduce additional motifs, such as linkers, restriction sites, or protease cleavage sites. The assembly process thus needs to be highly flexible and the system requires a high degree of modularity. 
Scar-free assembly of sequences resulting in the exact input sequence depends on sophisticated bioinformatics tools for sequence modulation and optimization. These tools are available, and in addition to joining the exact sequences together, most of them provide optimization algorithms to improve expression characteristics in the selected host system (Raab et al., 2010). Apart from the technological feasibility of domain and circuit assembly, biological complexity appears to impede the rational design of sophisticated protein circuitry. However, progress in this direction is evident and fusion of individual domains to new functional entities seems possible (Grünberg and Serrano, 2010).

4.3. Standardization

Conceptual frameworks and related international collaboration opportunities are sparse. Standards underlie most aspects of the modern world, especially when it comes to engineering principles that rely on the exact description of individual elements used for a construction plan-based
design. Considering the complexity of biological systems, an adequate process of standardization seems inordinately more difficult in the science of biology (De Lorenzo and Danchin, 2008). Still, a number of useful standards have already been described, and the number is increasing partly in response to the development of widely practiced methods that generate significant amounts of data (exterior impulse), and partly due to initiatives aiming at transforming biology into an engineering discipline (interior impulse). Existing standards include information at different levels, ranging from one-dimensional descriptions (e.g., enzyme nomenclature, endonuclease activities, DNA sequence data, and genetic features) to complex data handling and description (e.g., microarray data, protein crystallographic data, and systems biology models) (Endy, 2005). These standards have to be supplemented by accurate technical standards for most classes of basic biological functions and experimental measurements, as well as by standards beyond technology that facilitate cooperation, sharing, and public acceptance (e.g., common language, IP regulation, and biosecurity guidance). This is essential for a prospering and responsible synthetic biology community.

In principle, two technological standardization categories have to be considered when designing a device or a system: (i) the physical assembly of parts within a construction plan, based on cloning/assembly rules, and (ii) function, based on consistent characterization and score classification of reusable standard biological parts. Fueling the synthetic biology idea, the Registry of Standard Biological Parts, started at MIT as a first practical example, now maintains and distributes thousands of BioBrick biological parts (Canton et al., 2008). However, BioBrick parts are only standardized in terms of how individual parts are physically assembled into multicomponent systems, and most parts remain uncharacterized.
Therefore, scientists have started to develop measurements and processes to characterize certain functions, within a defined environment, based on reference activities. In anticipation of global acceptance and use of synthetic biology standards, researchers started to assemble kits for lab use, as exemplified by the definition of the Relative Promoter Unit as a measurement of promoter activity (Kelly et al., 2009). The initial BioBrick limitations have also prompted scientists to develop their own standards, providing different avenues to overcome these shortcomings. The lack of compatibility between independently proposed standards has significantly increased the complexity of assembling constructs from standardized parts. These problems have recently also been recognized and addressed, especially by means of computer-aided design concepts. Computer tools have been developed to provide a framework for the precise description of part assembly in the context of a stimulated progression of physical construction methods and rules. In addition, these tools provide methods for assembly from large libraries of genetic parts, as well as simulation functions to model different biological systems and to test predicted functions in silico. In analogy to providing standardization kits, these programs are available online to be accessible to a large community of synthetic biologists (Cai et al., 2010; Cooling et al., 2010; Marchisio and Stelling, 2009).

4.4. Technological developments

The list of requirements can be continued and further depends on the perspective and individual position, situation, or objective. A public spokesperson has different concerns and needs than a government representative, a synthetic biology user, or a basic material provider, although overlaps are obvious. One major and common requirement is the ability to provide the raw material for developing environmental, energy, medical, material, and other applications, that is, the technological competence to produce devices, systems, and even genomes in a usable, economical, ethical, and legally justifiable manner. While gene synthesis technologies are rapidly advancing, the assembly of readily fabricated fragments for producing genetic metabolic networks or even genomes is at the moment practically a manual process. However, the potential of assembling genomes using recombination technologies in yeast has been acknowledged (Gibson et al., 2008) and technological progress is evident (Shao et al., 2009). Therefore, to be able to satisfy the anticipated demand for large gene constructs, the scales and costs of assembly technologies need further development. To a similar extent, it is absolutely essential to integrate the option of providing a defined amount of variation at a certain position, meaning that computer and wet-lab tools for the design and implementation of gene libraries in synthetic biology projects need to be advanced.

5. Industrial Gene Synthesis: From Bench to Manufacturing

Over the past three decades, the ability to amplify DNA by PCR dramatically boosted the availability of natural templates that were otherwise inaccessible in sufficient amounts for genetic manipulation. In conjunction with easy and cheap oligonucleotide synthesis, PCR also allowed direct and flexible manipulation of amplified DNA fragments, although larger mutations and/or rearrangements of DNA fragments could only be introduced through consecutive rounds of alterations, which was time consuming and expensive. Furthermore, automated fluorescence-based sequencing techniques significantly accelerated molecular cloning and made it easy to examine intermediate steps. High-throughput sequencing also led to exponential growth of the sequence information in publicly available databases, with a doubling time of approximately 18 months. This in turn motivated the development of sophisticated algorithms and web applications to manage and use this vast amount of data. By the mid-1990s, the records of DNA and protein sequences, structural data, protein interaction networks, expression profiles, etc. had become comprehensive enough to substitute for real-life experiments. Today, a BLAST analysis rarely involves a sequence that has not been identified previously, and it usually also finds numerous related sequences from many different species, alive or extinct. Moreover, modern next-generation high-throughput sequencing of complete genomes or even metagenomes predominantly stores the data electronically on hard drives, rather than in tangible genomic or cDNA libraries. Ideally, this could free the experimenter from genetic source material, which is often difficult, impossible, or sometimes dangerous to obtain. What remains is the fundamental difference between electronic sequence data and its physical counterpart preserved in a tangible gene. What was needed was a "translation" machine or process capable of quickly converting an ASCII input sequence into a cloned DNA molecule in a copy/paste manner.

5.1. Process features

These promises were so tempting that the first companies offering such services appeared on the market around the year 2000. The gene synthesis business started out with a relatively high price for artificial genes, amounting to US$12 per base pair or US$10,000–20,000 for an average-sized gene. Applying synthetic genes in scientific projects therefore required careful preparation and budgeting and was still far from widespread. During the following 10 years, however, the price declined exponentially. Today, gene synthesis costs are about 3% of their original figure and have reached a level that is highly competitive with any alternative cloning method. This remarkable price drop was due to intense competition between gene synthesis providers, not only in product prices but also in service coverage, quality, capacity, and delivery time. While pricing was at first the deciding factor for the customer, the falling market price for synthetic genes forced providers to drive technological and administrative developments toward cost effectiveness in a tight market and toward coping with an exponential increase in demand. Since cost is nowadays no longer the limiting factor in deciding to work with synthetic genes, providers concentrate more on total synthesis capacity and on shorter and more reliable delivery times. With the growing market and the beginning of the era of synthetic biology, the business model of gene synthesis companies changed from that of a high-priced, low-quantity niche provider to that of a high-throughput supplier of a common research reagent.

For the scientist, the order process needs to be easy and intuitive. Ideally, the electronic sequence can be submitted online within a web interface similar to current sequence manipulation software. The interface must provide straightforward tools for submitting bulk orders of many different sequences, and turnaround times for generating quotes need to be short. Accordingly, many of the data processing steps involved (order entry, sequence optimization, quote generation) must be automated to minimize human involvement and thus secure the scalability of the service.
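The price figures quoted in this section can be turned into a rough yearly rate. The following sketch uses only the numbers given in the text (US$12 per base pair, a drop to about 3% within 10 years) and assumes, as a simplification not stated in the text, that the decline followed a constant yearly factor.

```python
import math

initial_price_per_bp = 12.0   # US$ per base pair around the year 2000 (from the text)
remaining_fraction = 0.03     # "about 3% of their original figure" (from the text)
years = 10                    # "during the following 10 years" (from the text)

# Assumption: pure exponential decay with a constant yearly factor.
yearly_factor = remaining_fraction ** (1 / years)
halving_time = math.log(2) / -math.log(yearly_factor)
current_price_per_bp = initial_price_per_bp * remaining_fraction

print(f"price today: about US${current_price_per_bp:.2f} per bp")
print(f"yearly price factor: {yearly_factor:.2f}")
print(f"price halves roughly every {halving_time:.1f} years")
```

Under this assumption the price falls by roughly 30% per year, which corresponds to a halving time of about two years.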

5.2. Biosafety/biosecurity

Synthetic biology is generally believed to have beneficial environmental, biomedical, and commercial potential; at the same time, potential "high-risk" factors and applications cannot be neglected. Gene synthesis is a typical "dual-use" technology. It can be applied for the greater good, providing research material for therapeutics and vaccine development, but the very same genes can be misused for nefarious purposes to cause considerable harm. In particular, the possibility of synthesizing pathogens and using them as biological weapons is palpable. The successful synthesis of a poliovirus from oligonucleotides ordered online (Cello et al., 2002), the reconstruction of the 1918 "Spanish Flu" virus (Tumpey et al., 2005), and many more examples fuel this notion. At the moment, legally binding regulations for screening do not exist, but awareness of the dual-use problem amid the widely promoted potential of synthetic biology, and of the resulting need for appropriate directives, is high (Samuel et al., 2009). The gene synthesis industry, represented by five major companies joined within the International Gene Synthesis Consortium (IGSC), has taken on its responsibility for a secure and fair supply of genetic material. The IGSC has developed and presented a harmonized best practice screening protocol in compliance with draft guidelines released by the US government. The member companies committed themselves to comply with the protocol and the maxim behind it, and implemented or adjusted their screening processes accordingly.

The second risk factor in large-scale gene synthesis concerns the biosafety evaluation of hundreds of sequences. The possibility that GMOs escape from a research laboratory or containment facility and proliferate out of control, causing environmental damage or threatening public health, is a long-known concern that coincided with the very early advances in recombinant DNA technology. As early as 1975, a group of professionals attending the Asilomar conference defined voluntary guidelines to ensure the safe handling of recombinant DNA. These guidelines were in general adopted by the scientific community and brought forth stringent regulation in many biosafety (genetic engineering) laws. Accordingly, one integral component of the ordering process is the assessment of each sequence request, to ensure that the generated GMO has no potential to harm lab personnel, that the sequence cannot be misused for hostile or malicious purposes, and that the customer is ordering with the intention to promote legitimate research (Fig. 11.1). In accordance with the U.S. governmental guidelines (Screening Framework Guidance for Providers of Synthetic Double-Stranded DNA, effective on October 13, 2010), the biosecurity evaluation process is divided into two tasks. First, the identity and legitimacy of the customer are assessed. Second, the sequences of all ordered gene products are identified and screened against specific databases to determine whether they match a sequence related to an existing hazardous or controlled agent or toxin. Regulation protocols have been implemented in the order process to provide decision guidance for safety officers in case a sequence or a customer raises concerns. Problematic sequence requests are processed in absolute compliance with national and international regulations and laws. These include export control regulations and the guidelines and lists established by the Australia Group, an informal forum of member countries with the goal of strengthening global security through harmonization of export controls to prevent illegitimate supply of compounds for chemical or biological weapons (http://www.australiagroup.net/en/index.html). In addition, customers located in "Countries of Concern" as determined by official authorities are informed that, due to compliance with all export controls, sanctions, and related laws and regulations, their order cannot be accepted. The relevant information is condensed, and an internal summary export control document has to be completed before shipment of goods.

Figure 11.1 Schematic overview of Life Technologies' biosafety and biosecurity screening practice integrated into the ordering process. Elements shown: BioSafety (local biosafety classification; sequence host and sequence function), BioSecurity sequence check (sequence identification by NCBI BLAST against critical sequence lists such as the AG list and CDC list), BioSecurity customer evaluation (customer check, country check, legitimacy), approval of the summary export control document, and production.
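The two-task structure of the biosecurity evaluation (customer first, then sequence) can be illustrated with a small sketch. Everything in it is a hypothetical placeholder: the country list, the "sequence of concern", the `Order` fields, and the crude shared-window match that stands in for a real database search such as BLAST. It is meant to show the decision flow only, not the actual screening practice.

```python
from dataclasses import dataclass

# Hypothetical placeholder data; real screening relies on curated lists and BLAST.
RESTRICTED_COUNTRIES = {"Examplestan"}
SEQUENCES_OF_CONCERN = {"toy_toxin": "ATGGCTAAAGGTGAACTGCTGACC"}
KMER = 12  # toy window length used for matching (assumption)


@dataclass
class Order:
    customer: str
    country: str
    customer_verified: bool
    sequence: str


def customer_check(order):
    """Task 1: identity and legitimacy of the customer."""
    return order.customer_verified and order.country not in RESTRICTED_COUNTRIES


def sequence_hits(sequence):
    """Task 2: names of listed sequences sharing any KMER-long window with the order."""
    seq = sequence.upper()
    windows = {seq[i:i + KMER] for i in range(len(seq) - KMER + 1)}
    return [name for name, ref in SEQUENCES_OF_CONCERN.items()
            if any(ref[i:i + KMER] in windows for i in range(len(ref) - KMER + 1))]


def screen(order):
    """Combine both tasks into one decision for the order process."""
    if not customer_check(order):
        return "reject"
    if sequence_hits(order.sequence):
        return "refer to safety officer"
    return "release to production"
```

A hit does not reject the order automatically in this sketch; it hands the decision to a safety officer, mirroring the decision guidance described above.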

5.3. Optimization rationale

The first step in gene synthesis is specifying the sequence itself. Given the flexibility of synthesizing any conceivable string of nucleotides, it is reasonable to alter a natural gene to ensure its best performance in the required application or experiment. A second rationale for gene optimization is of a practical nature. Since the synthesis of genes relies on the correct assembly of short oligonucleotides, copious motif repeats and inverted repeats need to be avoided. Avoiding them is also beneficial for the genetic stability of the final sequence. The most commonly employed modification of protein-coding genes is the adaptation of codon usage. With the rapidly growing size of natural sequence databases, numerous sequenced genes are listed for many species, up to fully sequenced genomes of the most studied organisms. This information is compiled into codon usage databases, reflecting the relative frequency of alternative codons in each organism. Different schemes and algorithms have been developed to best adapt a coding gene to the codon usage of the host organism. The most common optimization strategy to date is to avoid rare codons completely and to aim for maximum saturation with the most frequent ones. It has been demonstrated that the most frequent codons correlate with the most abundant tRNA pools, while the relative tRNA levels do not change with expression or cellular growth and are available for the translational machinery (Emilsson et al., 1993; Ikemura, 1985). Codon choice, however, is not the only parameter of a well-designed gene. Other variables to consider are adjusting GC content and avoiding direct and reverse repeats, restriction sites, ribosomal entry sites, cryptic splice motifs, polyadenylation signals, sequences controlling mRNA half-life, RNA secondary structures, etc. (Fig. 11.2). Conversely, it may be desirable to introduce certain DNA motifs, or to avoid similarities to naturally occurring sequences.
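As a minimal sketch of three of the parameters named above, the following code back-translates a peptide using only the most frequent codon per amino acid, computes GC content, and checks for direct repeats. The codon usage table contains made-up frequencies for a handful of amino acids and does not represent any real organism; the function names are likewise illustrative.

```python
# Toy codon usage table (made-up relative frequencies, NOT real organism data).
CODON_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.40, "AAG": 0.60},
    "L": {"CTG": 0.45, "CTC": 0.20, "CTT": 0.10, "CTA": 0.10, "TTG": 0.10, "TTA": 0.05},
    "E": {"GAA": 0.45, "GAG": 0.55},
    "*": {"TAA": 0.30, "TGA": 0.50, "TAG": 0.20},
}


def most_frequent_codons(protein):
    """Back-translate using only the most frequent codon of each amino acid."""
    return "".join(max(CODON_USAGE[aa], key=CODON_USAGE[aa].get) for aa in protein)


def gc_content(dna):
    """Fraction of G and C nucleotides in the sequence."""
    return (dna.count("G") + dna.count("C")) / len(dna)


def has_direct_repeat(dna, length=8):
    """True if any window of `length` nucleotides occurs more than once."""
    seen = set()
    for i in range(len(dna) - length + 1):
        window = dna[i:i + length]
        if window in seen:
            return True
        seen.add(window)
    return False


gene = most_frequent_codons("MKLEL*")
print(gene, gc_content(gene), has_direct_repeat(gene, length=5))
```

Even in this tiny example, saturating the sequence with the most frequent codons creates a short direct repeat, which hints at why the individual parameters cannot be optimized in isolation.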

Figure 11.2 GeneOptimizer multiparameter gene optimization: parallel processing of performance-relevant sequence parameters. Elements shown: wild type gene sequence; codon usage; GC content; sequence repeats; RNA secondary structures; splice sites; poly(A) sites and killer motifs; GeneOptimizer®; optimized gene.

5.4. Optimizer software

Together, these requirements result in a multiparameter optimization. The challenge is to find the sequence that represents the best compromise between different and sometimes conflicting requirements. Without doubt, the best solution would be to generate all possible combinations of codons representing a given amino acid sequence, assess all of them with the help of a quality function, and finally choose the one with the highest quality score regarding all necessary parameters (Fig. 11.3A). Unfortunately, even for a rather small protein of 100 amino acids, the number of possible combinations is in the range of 3^100 ≈ 5 × 10^47 (assuming about three synonymous codons per amino acid), making the outlined approach impossible to perform in practice. The high-throughput processing of several hundred sequences per day calls for an algorithm capable of optimizing a gene in a matter of minutes. Another strategy is the serial optimization of each sequence feature. Here, a first round could optimize the codon usage, a second cycle would adapt GC content, a third iteration would eliminate repetitive sequences, and so on (Fig. 11.3B). With each iteration, however, the quality of the parameters optimized earlier decreases, and undesired motifs may occur. In order to still find an optimal solution, it is necessary to reduce the search space by performing an exhaustive search for the best solution only inside a small sequence window, which is moved along the whole sequence from the 5′ end to the 3′ end of the reading frame. In each iteration, all codon combinations of the current window are calculated and ranked according to the desired parameters, also taking the already optimized part of the sequence into account. The 5′-most codon of the best combination is fixed, and the window for the next calculation round is slid one codon toward the 3′ end (Fig. 11.3C). This strategy considers both local and global sequence traits and can find an optimized sequence without human interaction in a matter of 1–3 min on a standard personal computer. This logic has been implemented in the GeneOptimizer® sequence suite, developed by GeneArt (Raab et al., 2010).

Figure 11.3 Schematic overview of potential optimization strategies: (A) test all possible combinations, (B) iterative, (C) sliding window.

The following passages describe the developments and processes that have been implemented in Life Technologies/GeneArt's technology platforms in order to realize the transition from bench-style gene synthesis to industrial-scale DNA manufacturing. The production process chain is illustrated along the data content and the information flow embodied within the GeneOptimizer® and the laboratory information and management system (LIMS), which represent the company's most fundamental IT groundwork.
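The sliding-window strategy of Section 5.4 can be sketched in a few lines. This is a toy reconstruction from the description in the text, not the GeneOptimizer® implementation: the codon table uses made-up frequencies, and the quality function (`score`) simply sums codon frequencies and subtracts a penalty for a forbidden motif (here the BsaI recognition site GGTCTC, chosen only as an example).

```python
from itertools import product

# Toy codon table with made-up relative frequencies (not real organism data).
CODONS = {
    "M": {"ATG": 1.0},
    "G": {"GGT": 0.5, "GGC": 0.3, "GGA": 0.2},
    "L": {"CTC": 0.5, "CTG": 0.4, "TTA": 0.1},
    "K": {"AAA": 0.6, "AAG": 0.4},
}
FREQ = {codon: f for table in CODONS.values() for codon, f in table.items()}
FORBIDDEN = ["GGTCTC"]  # example motif to avoid (BsaI recognition site)


def score(dna):
    """Hypothetical quality function: codon frequencies minus motif penalties."""
    usage = sum(FREQ[dna[i:i + 3]] for i in range(0, len(dna), 3))
    penalty = sum(10.0 for motif in FORBIDDEN if motif in dna)
    return usage - penalty


def sliding_window_optimize(protein, window=3):
    """Exhaustive search inside a small window that slides from 5' to 3'."""
    fixed = ""  # already optimized 5' part of the sequence
    for pos in range(len(protein)):
        options = [CODONS[aa] for aa in protein[pos:pos + window]]
        # Rank every codon combination of the window in the context of `fixed`.
        best = max(product(*options), key=lambda combo: score(fixed + "".join(combo)))
        fixed += best[0]  # fix only the 5'-most codon, then slide by one codon
    return fixed


naive = "".join(max(CODONS[aa], key=CODONS[aa].get) for aa in "MGLK")
optimized = sliding_window_optimize("MGLK")
print(naive, optimized)
```

In this example, picking the most frequent codon at every position creates the forbidden motif across the G and L codons, whereas the windowed search trades a slightly less frequent leucine codon for a motif-free sequence. The window size bounds the cost: each step evaluates at most a few dozen combinations instead of the full search space.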

6. Design Tool: GeneOptimizer

6.1. Project design

Gene optimization is an optional process usually applied in biotechnological applications that involve protein expression in a specific host system. There is strong evidence that optimization in general has a beneficial influence on production rates in different expression systems (Gustafsson et al., 2004; Maertens et al., 2010) as well as on expression level and duration in vivo (Kosovac et al., 2010). However, optimization can have advantages other than influencing expression. Whenever it seems favorable to avoid sequence homology, this can be achieved by gene optimization. Potential applications include (i) prevention of homology to host chromosomal sequences for enhanced plasmid stability and reduced integration events, (ii) reduction of homologous recombination events to enhance safety in gene therapy or genetic vaccination approaches (Wagner et al., 2000), (iii) rescue experiments with modified genes that are, in contrast to the natural gene, not affected by siRNA-mediated silencing targeting the wild-type gene (Fath et al., 2011), and (iv) introduction of silent mutations to eliminate specific DNA motifs (e.g., restriction endonuclease recognition sites).

The GeneOptimizer® offers solutions for individual requests. Gene sequences are initially subjected to a multiparameter analysis. Subsequent modifications usually span the following options: (i) change only a specific parameter, (ii) perform complete optimization or optimization of defined sequence stretches, or (iii) process sequences in their original wild-type form. Thus, the GeneOptimizer® represents a valuable tool for project design. For example, it has been used to convert the RNA of a commonly used reporter gene (gfp; green fluorescent protein) into a quasi-lentiviral message that strictly follows complex lentiviral regulation, by adapting the gfp reporter gene to HIV codon bias (Graf et al., 2006).

Gene synthesis in general contributes significantly to project design, independently of any optimization process. Since natural templates are not required for gene synthesis, there is a high degree of freedom in sequence design. For example, any fusion or chimeric gene construct can be freely designed. There is no restriction on designing higher order complexes with alternating coding and noncoding regions, up to the in silico design of a complete plasmid or even a genome (Gibson et al., 2008).

6.2. Sequence design

The GeneOptimizer® tool has two independent but nevertheless integrated optimization functions. A given sequence is first optimized, mainly at the RNA level, to improve expression characteristics as described above. In a second optimization process, the defined sequence is processed for production, with computational segmentation and refinement cycles generating optimal production parameters. Depending on its length and complexity (e.g., sequence repeats, motif stretches, GC content), the sequence can be divided into subfragments of variable length (usually between 200 and 1800 nucleotides (nt)). Each subfragment is divided into overlapping oligonucleotides following a defined pattern: the sense strand sequence is split into sequential L-oligos of 50–60 nt in length. The antisense strand is split into shorter M-oligos of approximately 40 nt in length, each partially overlapping the corresponding complementary L-oligos. This process is automated so that a given subfragment length is divided into a calculated number of L-oligos and corresponding M-oligos matching a predefined oligo length interval. The complementary terminal overlap sequences are evaluated for potential mismatches (alternative pairing, self-assembly), and if a certain threshold is exceeded, the process reenters the cycle with a slightly changed starting parameter. The whole cycle can be repeated until a predefined limit is reached. In an analogous process step, additional terminal amplification primers (providing cloning sites) and sequencing primers are calculated automatically.
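The oligo breakdown described above can be sketched as follows. The exact geometry is an assumption: the text only states that L-oligos are sequential sense-strand pieces of 50–60 nt and that M-oligos of about 40 nt partially overlap them, so this sketch centres each M-oligo on the junction between two neighbouring L-oligos. The mismatch evaluation and the re-entry cycle are omitted, and the function names are illustrative.

```python
def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def split_into_oligos(fragment, l_min=50, l_max=60, m_len=40):
    """Split a subfragment into sense L-oligos and bridging antisense M-oligos."""
    # Choose the number of L-oligos so that their length lies within [l_min, l_max].
    n = -(-len(fragment) // l_max)  # ceiling division
    if len(fragment) / n < l_min:
        raise ValueError("no L-oligo count fits the length interval")
    bounds = [round(i * len(fragment) / n) for i in range(n + 1)]
    l_oligos = [fragment[a:b] for a, b in zip(bounds, bounds[1:])]
    # Assumption: each M-oligo is centred on a junction between two L-oligos.
    half = m_len // 2
    m_oligos = [reverse_complement(fragment[j - half:j + half]) for j in bounds[1:-1]]
    return l_oligos, m_oligos


fragment = "ACGT" * 55  # 220 nt toy subfragment
l_oligos, m_oligos = split_into_oligos(fragment)
print(len(l_oligos), "L-oligos,", len(m_oligos), "M-oligos")
```

With this layout, a subfragment split into n L-oligos needs n − 1 bridging M-oligos, and every M-oligo anneals to about 20 nt of each of its two neighbouring L-oligos.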

266

Frank Notka et al.

6.3. Construction design

If a sequence requires breakdown into subfragments, the program calculates all necessary subcloning steps and provides a cloning strategy. This process applies top-down assembly tree computation. Starting from the final specifications (e.g., a 10-kb gene cloned into a plasmid harboring kanamycin resistance), the program defines (i) the cloning steps (e.g., step 1: 10 subfragments; step 2: combining five fragments at a time; and step 3: fusion of the two resulting fragments) and (ii) the cloning strategy, including the choice of vector for each subfragment and, more importantly, the respective antibiotic resistance provided (e.g., a kanamycin vector for the subfragments, an ampicillin vector for the first cloning step, and a kanamycin vector for the last cloning step). The alternating markers facilitate convenient cloning by resistance switch.
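The example in this section (10 subfragments, five combined at a time, alternating kanamycin and ampicillin vectors) can be reproduced with a short top-down computation. This is a sketch under the assumption that a fixed number of constructs is combined per step and that only two selection markers are used; `plan_assembly` is a hypothetical name, not part of the described software.

```python
def plan_assembly(n_subfragments, group_size, final_resistance="kanamycin"):
    """Return the cloning levels (bottom-up) with alternating selection markers."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    other = {"kanamycin": "ampicillin", "ampicillin": "kanamycin"}
    # Count how many constructs exist at each level, from subfragments upward.
    counts = [n_subfragments]
    while counts[-1] > 1:
        counts.append(-(-counts[-1] // group_size))  # ceiling division
    # Assign markers top-down: the final construct gets the requested marker,
    # and every cloning step below it switches resistance.
    levels, marker = [], final_resistance
    for count in reversed(counts):
        levels.append({"constructs": count, "resistance": marker})
        marker = other[marker]
    return list(reversed(levels))


for level in plan_assembly(10, 5):
    print(level)
```

Because the markers are assigned starting from the final construct, the vector for the subfragments follows automatically from the number of cloning steps.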

7. Production Processing: LIMS

7.1. Steering process

The LIMS has been developed to virtually mirror and very specifically steer the gene synthesis process from ordering to shipment. It contains all production-relevant operational tasks, rules, and information. The workflow engine provides the basis for steering and tracking the production status of any order started within the system. Fundamental tools such as bioinformatics sequence design or analysis tools are integrated, and the system can be extended further with plug-ins. The specific functions include (i) informational sequence processing; (ii) support of production logistics by generating work lists, linking the lab staff to automated pipetting stations, or barcode-aided sample tracking; (iii) an information database for accurate production monitoring, statistical process evaluation, and customer feedback; (iv) control of and data acquisition from individual automated lab instruments, such as liquid handling robots, oligo synthesizers, or analytical instruments; and (v) control of integrated and fully automated assembly modules.

7.2. Process control

One of the most prominent functions of the LIMS is the provision and monitoring of in-process controls. Each production step requires a release entry in the system. For some steps, quality control requires visual inspection of a process product (appearance of colonies on selection plates, PCR band(s) in gel electrophoresis, restriction analysis, etc.); for others, analytical measurements have to be evaluated (e.g., optical density, high-performance liquid chromatography [HPLC] results). Results are reported back to the system, and depending on whether the result is positive or negative, the next task is generated and displayed (for positive results: the next step; for negative results: repeating the step or applying an alternative route). This system is well suited to handling a large number of parallel operations. Providing task lists that contain all orders designated for a specific operation (ligation, transformation, PCR1, PCR2, etc.) guarantees that each order is automatically shifted to the next operation step.
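The release logic described above (positive result: next step; negative result: repeat or alternative route) amounts to a small state machine. The sketch below is illustrative only: the step names, their order, and the repeat limit are assumptions and not taken from the actual LIMS.

```python
# Hypothetical step sequence and repeat limit, for illustration only.
STEPS = ["oligo synthesis", "fragment assembly", "ligation",
         "transformation", "colony PCR", "sequencing", "final QC"]
MAX_REPEATS = 2


def next_task(step, passed, repeats):
    """Decide the next task after the release entry for `step` has been made."""
    if passed:
        i = STEPS.index(step)
        return STEPS[i + 1] if i + 1 < len(STEPS) else "shipment"
    if repeats < MAX_REPEATS:
        return step              # negative result: repeat the same step
    return "alternative route"   # repeated failure: switch to another route


print(next_task("ligation", passed=True, repeats=0))
print(next_task("ligation", passed=False, repeats=2))
```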

7.3. Process expansion

Understanding how the LIMS operates highlights certain prerequisites that need to be addressed in order to enable the high degree of parallelization that the LIMS can theoretically achieve, the most important ones being standardization and automation. The capacity of the LIMS, like the capacity for carrying out manual or automated operations, is limited. Therefore, it is mandatory to restrict the number of potential operations by introducing standards. Standards can be defined and applied for protocols, cloning/assembly methods, reagents, or operating conditions. These are defined in SOPs (standard operating procedures) and can be implemented in parallel within the LIMS. By defining operational standards, the production process can be dissected into defined, manageable, self-contained, and self-controlled operation steps. Dividing the production process into small and defined operation steps additionally provides optimal conditions for automation. Automated process solutions not only require precisely described standard protocols but are themselves needed to handle a large number of parallel operations. Thus, the automation of manual tasks embedded in an LIMS environment is a logical step toward, and at the same time a necessary prerequisite for, high-throughput gene synthesis. Automation in gene synthesis is employed from oligonucleotide synthesis to fragment production, gene assembly, and sequencing, as well as in operational tasks such as sequence analysis and optimization and the evaluation of sequence results. The implementation of process automation modules allows automatable operations to be targeted specifically and enables fast progress. The process chain from customer request to delivery comprises two interdependent strands: the information flow mapped inside the LIMS and the material flow controlled by the LIMS.

7.4. Order entry

The process starts with entering customer and project information into the customer portal. These data are directed to different registries, for example, the customer data into a customer relationship management (CRM) system, the project cost and sales figures into an enterprise resource planning (ERP) system, and the project data into a production monitoring system. The production monitoring system contains all relevant data on sequence, optimization, source organism, target organism, biosafety, biosecurity, required documents, etc. and feeds the production-relevant information into the LIMS, while the sequence is loaded into the GeneOptimizer® and amended according to the project specifications.

7.5. Order processing

The GeneOptimizer® defines the final sequence, the cloning strategy, and the fragment and oligo breakdown as described above and supplements the information already contained within the LIMS. The LIMS dissects the provided information into process tasks, clusters the tasks of all contained projects, and creates task lists. The majority of the process steps are automated (e.g., preparation of the sequencing reaction) or semiautomated (e.g., evaluation of sequence results), and all of the processes are managed within the LIMS. In order to match the information flow with the material flow, the LIMS contains additional modules (e.g., material or plasmid registries), and the physical containers are identified using barcodes, which enable accurate assignment of any sample to the correct order, the actual status, and the next task.
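Clustering the tasks of all projects into task lists can be pictured as grouping barcoded samples by their next operation. The records and field layout below are invented for illustration and do not reflect the actual LIMS data model.

```python
from collections import defaultdict

# Hypothetical records: (barcode, order ID, next operation).
samples = [
    ("BC0001", "order-17", "ligation"),
    ("BC0002", "order-18", "PCR1"),
    ("BC0003", "order-19", "ligation"),
    ("BC0004", "order-17", "sequencing"),
]


def build_task_lists(samples):
    """Cluster the pending samples of all orders by their next operation."""
    task_lists = defaultdict(list)
    for barcode, order_id, operation in samples:
        task_lists[operation].append((barcode, order_id))
    return dict(task_lists)


for operation, entries in build_task_lists(samples).items():
    print(operation, entries)
```

Each task list then serves one workstation or robot, while the barcode keeps the link between a physical container and its order.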

7.6. Oligonucleotide production

The first task is ordering the oligonucleotides defined by the GeneOptimizer®. The oligo sequences are transferred to a task list and allocated to an oligo synthesizer. Oligonucleotide synthesis at Life Technologies/GeneArt is based on a technology platform called Cerberus. This platform was developed to provide a synthesis format customized for large-scale gene synthesis specifications, which are mainly (i) parallel synthesis (operation in a 4 × 96-well format), (ii) low consumption of consumables (production in 96-well format), and (iii) high-quality output (error rate <0.1%). In addition to the actual oligo synthesizer, this platform comprises devices for the preparation of the synthesis plates, oligo deprotection and cleavage, central supply and waste management, exhaust air cleaning, and technical monitoring and alarm systems. A standard synthesis run can be completed within 10 h, and the daily production capacity of 10,000 oligos is sufficient to produce approximately 5 Mbp of dsDNA per month. Automated photometric evaluation of the oligo concentration is implemented, and random-sampling HPLC is used for quality control of each synthesis run.

The next task within the process chain concerns the first fragment assembly step: all oligonucleotides belonging to a specific fragment (L- and M-oligos) are mixed using liquid handling robots, and the concentrations are adjusted to a predefined range. The plates containing the oligo mixes are transferred to a robot-based production module that performs the initial gene synthesis steps.
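The relation between 10,000 oligos per day and roughly 5 Mbp of dsDNA per month can be checked with a back-of-the-envelope calculation. Only the daily oligo number comes from the text; the number of production days and the assumption that one L-oligo plus one M-oligo are needed per ~55 bp of product are illustrative guesses based on the oligo lengths given in Section 6.2.

```python
oligos_per_day = 10_000       # from the text
working_days_per_month = 22   # assumption
l_oligo_length = 55           # nt, midpoint of the 50-60 nt range (Section 6.2)
oligos_per_l_segment = 2      # assumption: one L-oligo plus one M-oligo per segment

oligos_per_month = oligos_per_day * working_days_per_month
bp_per_month = oligos_per_month / oligos_per_l_segment * l_oligo_length
print(f"upper bound: about {bp_per_month / 1e6:.1f} Mbp of dsDNA per month")
```

Under these assumptions the upper bound is about 6 Mbp per month, which is of the same order as the quoted 5 Mbp once failed syntheses, repeats, and primers for amplification and sequencing are taken into account.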

7.7. Subfragment production

The first post-chemistry step in gene synthesis is assembling the oligonucleotides to yield longer contiguous sequences. The maximal final length of these constructs must be considered carefully to limit the likelihood of errors in the product as well as the number of transformants to screen. Currently, the most cost-effective size of these synthetic building blocks is between 1 and 2 kb. The assembly process is basically a multiplex primer extension reaction, taking place under controlled temperature cycling conditions. In the first cycling round, overlapping primers anneal to each other and are filled in by polymerase to form short double strands. These can again anneal to each other in the subsequent cycle and are extended to fragments bridging four oligonucleotides. This progression continues until fragments arise that contain the complete length of the intended product. Once this is achieved, terminal primers, present in excess, take effect and amplify the full-length product exponentially. In the next step, the linear DNA molecule is ligated into a minimal cloning vector using classical restriction endonuclease techniques. After transformation into E. coli and bacterial cultivation, some colonies are selected for evaluation by colony PCR sequencing. Again, the results are reported back to the LIMS, positive clones are further processed by plasmid DNA preparation, and the accuracy of the synthesized DNA construct is verified by sequencing (in-process QC).
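The progression described above (two oligonucleotides bridged after the first cycle, four after the second) suggests an idealized estimate of how many cycles are needed before full-length product can appear. The sketch assumes perfect doubling of the bridged span in every cycle, which is a simplification of the real reaction.

```python
def min_cycles_for_full_length(n_oligos):
    """Idealized: the number of bridged oligos doubles in every cycle."""
    if n_oligos < 1:
        raise ValueError("need at least one oligonucleotide")
    # Smallest c with 2**c >= n_oligos, computed with integer arithmetic.
    return (n_oligos - 1).bit_length()


for n in (2, 4, 16, 36):
    print(n, "oligos ->", min_cycles_for_full_length(n), "cycles")
```

Since the required number of cycles grows only logarithmically with the number of oligonucleotides, a handful of cycles is enough in this idealized picture before the terminal primers can start amplifying full-length product.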

7.8. Assembly

Altogether, conditions for mass production are chosen to give a more than 95% chance of picking at least one correct fragment in a single screen. This sets a limit on the total size of the initial product, since the fraction of error-free molecules decreases exponentially with length, so that longer molecules require increased screening efforts to find correct fragments. Therefore, synthetic gene constructs exceeding 1–2 kb are assembled from the sequence-verified first building blocks. Since assembling DNA elements is the nuts and bolts of biotechnology, several techniques exist to do so efficiently, although not all of them are equally suited to sequence-independent gene synthesis. The straightforward approach for DNA fragment linkage is classical manipulation with restriction enzymes and ligase (Maniatis et al., 1982). This method, however, is quite inflexible in terms of junction sequence design and involves well-known problems regarding the availability and uniqueness of appropriate restriction sites. Type IIS restriction enzymes can eliminate scar sequences at the boundaries. These enzymes produce sticky ends outside their recognition sequence, so that the nucleotides of the adjacent cohesive stretch can be chosen freely and represent a common part of the intended compound product for ligation (Padgett and Sorge, 1996). Designing this common part to have a length of approximately 20 bp allows flexible and specific attachment of two or more DNA fragments by fusion PCR, but this approach is limited to moderate overall size and introduces an additional source of sequence errors (Mullinax et al., 1992). The DISEC-TRISEC and LIC-PCR methods employ the exonuclease activity of Klenow or T4 DNA polymerase to generate compatible single-stranded overhangs, which are then combined with or without ligase, respectively (Aslanidis and de Jong 1990; Dietmaier et al., 1993). In vitro recombination extends this technology by annealing the overhangs under more stringent conditions at elevated temperatures and then filling and closing gaps with a heat-stable polymerase and ligase.
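The link between fragment length and screening effort mentioned at the start of this section can be made explicit with a simple model. The sketch assumes independent errors with a constant per-base error rate; the value of 0.1% is used only because it is the upper limit quoted for the oligonucleotides in Section 7.6, and the real process may behave differently.

```python
import math


def fraction_correct(length_bp, error_rate_per_bp):
    """Probability that a fragment of the given length carries no error."""
    return (1 - error_rate_per_bp) ** length_bp


def clones_to_screen(length_bp, error_rate_per_bp, confidence=0.95):
    """Smallest number of clones giving at least `confidence` chance of one correct clone."""
    p = fraction_correct(length_bp, error_rate_per_bp)
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


for length in (500, 1000, 2000):
    print(length, "bp:", clones_to_screen(length, 0.001), "clones")
```

Under these assumptions, doubling the fragment length from 1 to 2 kb roughly triples the number of clones that must be screened, which illustrates why the first building blocks are kept short and larger constructs are built from sequence-verified parts.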

8. Case Study: Large-Scale Gene Production

Directed evolution strategies aim to improve protein or enzyme functions toward novel nonnatural properties. Because the potential sequence space of molecular variants is so vast, a common strategy is to limit variation to those positions of a protein that are known to be related to function. In many cases, however, it turns out that substitutions of unexpected residues are responsible for advancing the molecule. A straightforward approach to obtaining a complete data matrix of all beneficial, adverse, and neutral single amino acid substitutions is to actually generate all these mutants and test them. For a regular 300 amino acid protein, this involves screening 300 × 19 = 5700 variants, which is manageable even with low-throughput screening assays. The functional analysis of these mutants generates a data matrix containing information about the importance of each protein position for overall function, as well as which non-wild-type amino acids contribute to adapting the protein to the technical demands (Geddie and Matsumura, 2004; Tan et al., 2008). While establishing a good screening system is the step that demands the real expertise, the actual production of the necessary DNA constructs is tedious, and it is an excellent example of where high-throughput gene synthesis can be of great support. The described automated gene synthesis workflow, together with the associated LIMS-guided data management (see Fig. 11.4), allows for the systematic replacement of single oligonucleotides during the assembly of synthetic fragments. In practice, for a given amino acid position, 19 oligonucleotides are synthesized, each containing a non-wild-type codon. Instead of the one oligonucleotide mix necessary for the construction of one gene, 19 parallel reactions are set up that differ in only one particular oligonucleotide. This is an ideal prerequisite for automation and parallel processing in the production of large quantities of similar DNA constructs, and it makes directed evolution projects accessible and feasible.
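The arithmetic behind the 300 × 19 = 5700 variants can be sketched in a few lines. The example below is an illustration under stated assumptions, not the production workflow described in this chapter: it enumerates every single amino acid substitution of a coding sequence by swapping one codon at a time. The names `single_substitution_variants` and `PREFERRED_CODON` are hypothetical, and the choice of one replacement codon per amino acid (simply the first one in the standard genetic code table) is an arbitrary placeholder; a real design would select codons according to the expression host and other sequence constraints.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumption: one replacement codon per amino acid, here simply the first
# codon found in the table. Stop codons ("*") are excluded.
PREFERRED_CODON = {}
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        PREFERRED_CODON.setdefault(aa, codon)


def single_substitution_variants(cds):
    """Yield (position, wild_type_aa, new_aa, variant_cds) for each substitution.

    `cds` is a coding sequence without a stop codon; positions are 1-based.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    for index, codon in enumerate(codons):
        wild_type = CODON_TABLE[codon]
        for new_aa, new_codon in PREFERRED_CODON.items():
            if new_aa == wild_type:
                continue  # keep only the 19 non-wild-type amino acids
            variant = codons[:index] + [new_codon] + codons[index + 1:]
            yield index + 1, wild_type, new_aa, "".join(variant)


if __name__ == "__main__":
    demo_cds = "ATGGCTAAA"  # hypothetical 3-codon sequence (Met-Ala-Lys)
    variants = list(single_substitution_variants(demo_cds))
    print(len(variants))  # 3 positions x 19 substitutions = 57
```

Each position yields exactly 19 variants that differ from the wild type in a single codon, which corresponds to the 19 parallel assembly reactions per position described above.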

Figure 11.4 Schematic overview of gene synthesis production flow and interconnections to controlling LIMS. [Figure not reproduced. Labels recoverable from the figure: CRM, biosecurity, status, order entry, design and optimization (oligo sequences, assembly rules, cloning strategy), oligonucleotide synthesis, oligonucleotide assembly and amplification, subfragment assembly, cloning, identification of correct clone (sequence data, clone ID), plasmid preparation, final QC, export (export data), controlling, statistics, LIMS.]

9. Conclusion

The complete process of gene synthesis, from sequence submission to shipping the final plasmid, involves many different disciplines. Sales, bioinformatics, organic chemistry, molecular biology, export, and logistics must all work hand in hand to shift the entire workflow from small scale to an industrial high-throughput operation. The LIMS is essential for tracking every intermediate in the multistep production when dealing with hundreds or thousands of syntheses in parallel. Equally, an increasing degree of automation is mandatory so that exponential growth in production volume does not necessitate an equivalent increase in manpower. Pipetting robots communicate flawlessly with the LIMS network, and vice versa. Some steps are also simply no longer manageable by humans, such as the move from 96- to 384-well plates, or the decrease of reaction volumes below 1 µl. It is the interplay between LIMS, automation, and miniaturization that creates the prerequisites for a smooth and robust production platform enabling cheap and fast production of synthetic genes. The differences between lab-scale and industry-scale gene synthesis are thus not based upon novel technologies or innovative synthesizers, as one might expect. The state-of-the-art technology has proven to be sufficient to satisfy the current gene synthesis demand. This does not imply that we can abstain from novel developments and even technological leaps forward in order to satisfy future demands in scale and cost. However, today's technology, if employed in a reasonable and dedicated way, can provide the output demanded by the scientific community. The differences, therefore, originate rather from a very straightforward adaptation of each single operational step to the process specifications, which are mainly (i) low consumption of consumables, (ii) a high degree of parallel sample processing, and (iii) low error rates in connection with reliability and reproducibility. Implementation of these specifications resulted in concrete measures related to method adaptation, standardization, in-process QC, information handling and flow, automation and machine development, and quality management compliance, which in combination enable industrial scale gene synthesis.

REFERENCES

Agarwal, K. L., Buchi, H., Caruthers, M. H., Gupta, N., Khorana, H. G., Kleppe, K., Kumar, A., Ohtsuka, E., Rajbhandary, U. L., Van de Sande, J. H., Sgaramella, V., Weber, H., et al. (1970). Total synthesis of the gene for an alanine transfer ribonucleic acid from yeast. Nature 227, 27–34.
Aslanidis, C., and de Jong, P. J. (1990). Ligation-independent cloning of PCR products (LIC-PCR). Nucleic Acids Res. 18(20), 6069–6074.

Barany, F., and Gelfand, D. H. (1991). Cloning, overexpression and nucleotide sequence of a thermostable DNA ligase-encoding gene. Gene 109, 1–11.
Bershtein, S., and Tawfik, D. S. (2008). Advances in laboratory evolution of enzymes. Curr. Opin. Chem. Biol. 12, 151–158.
Cai, Y., Wilson, M. L., and Peccoud, J. (2010). GenoCAD for iGEM: A grammatical approach to the design of standard-compliant constructs. Nucleic Acids Res. 38(8), 2637–2644.
Canton, B., Labno, A., and Endy, D. (2008). Refinement and standardization of synthetic biological parts and devices. Nat. Biotechnol. 26, 787–793.
Cello, J., Paul, A. V., and Wimmer, E. (2002). Chemical synthesis of poliovirus cDNA: Generation of infectious virus in the absence of natural template. Science 297(5583), 1016–1018.
Cheong, W. C., Lim, L. S., Huang, M. C., Bode, M., and Li, M. H. (2010). New insights into the de novo gene synthesis using the automatic kinetics switch approach. Anal. Biochem. 406(1), 51–60.
Cooling, M. T., Rouilly, V., Misirli, G., Lawson, J., Yu, T., Hallinan, J., and Wipat, A. (2010). Standard virtual biological parts: A repository of modular modeling components for synthetic biology. Bioinformatics 26(7), 925–931.
Czar, M. J., Anderson, J. C., Bader, J. S., and Peccoud, J. (2009). Gene synthesis demystified. Trends Biotechnol. 27(2), 63–72.
De Lorenzo, V., and Danchin, A. (2008). Synthetic biology: Discovering new worlds and new words. EMBO Rep. 9(9), 822–827.
Dietmaier, W., Fabry, S., and Schmitt, R. (1993). DISEC-TRISEC: Di- and trinucleotide-sticky-end cloning of PCR-amplified DNA. Nucleic Acids Res. 21(15), 3603–3604.
Edge, M. D., Green, A. R., Heathcliffe, G. R., Meacock, P. A., Schuch, W., Scanlon, D. B., Atkinson, T. C., Newton, C. R., and Markham, A. F. (1981). Total synthesis of a human leukocyte interferon gene. Nature 292, 756–762.
Emilsson, V., Naslund, A. K., and Kurland, C. G. (1993). Growth-rate-dependent accumulation of twelve tRNA species in Escherichia coli. J. Mol. Biol. 230, 483–491.
Endy, D. (2005). Foundations for engineering biology. Nature 438, 449–453.
Fath, S., Bauer, A. P., Liss, M., Spriestersbach, A., Maertens, B., Hahn, P., Ludwig, C., Schäfer, F., Graf, M., and Wagner, R. (2011). Multiparameter RNA and codon optimization: A standardized tool to assess and enhance autologous mammalian gene expression. PLoS ONE 6(3), e17596, 1–14.
Geddie, M. L., and Matsumura, I. (2004). Rapid evolution of beta-glucuronidase specificity by saturation mutagenesis of an active site loop. J. Biol. Chem. 279(25), 26462–26468.
Gibson, D. G., Benders, G. A., Andrews-Pfannkoch, C., Denisova, E. A., Baden-Tillson, H., Zaveri, J., Stockwell, T. B., Brownley, A., Thomas, D. W., Algire, M. A., Merryman, C., Young, L., et al. (2008). Complete chemical synthesis, assembly, and cloning of a Mycoplasma genitalium genome. Science 319, 1215–1220.
Gibson, D. G., Glass, J. I., Lartigue, C., Noskov, V. N., Chuang, R. Y., Algire, M. A., Benders, G. A., Montague, M. G., Ma, L., Moodie, M. M., Merryman, C., Vashee, S., et al. (2010). Creation of a bacterial cell controlled by a chemically synthesized genome. Science 329(5987), 52–56.
Gordeeva, T. L., Borschevskaya, L. N., and Sineoky, S. P. (2010). Improved PCR-based gene synthesis method and its application to the Citrobacter freundii phytase gene codon modification. J. Microbiol. Methods 81(2), 147–152.
Graf, M., Ludwig, C., Kehlenbeck, S., Jungert, K., and Wagner, R. (2006). A quasilentiviral green fluorescent protein reporter exhibits nuclear export features of late human immunodeficiency virus type 1 transcripts. Virology 352, 295–305.

Graf, M., Schoedl, T., and Wagner, R. (2009). Rationales of gene design and de novo gene construction. In "Systems Biology and Synthetic Biology," (P. Fu and S. Panke, eds.), John Wiley & Sons, Inc., Hoboken, NJ. doi:10.1002/9780470437988.ch12.
Grünberg, R., and Serrano, L. (2010). Strategies for protein synthetic biology. Nucleic Acids Res. 38(8), 2663–2675.
Gustafsson, C., Govindarajan, S., and Minshull, J. (2004). Codon bias and heterologous protein expression. Trends Biotechnol. 22, 346–353.
Heinemann, M., and Panke, S. (2006). Synthetic biology-putting engineering into biology. Bioinformatics 22(22), 2790–2799.
Heyman, A., Barak, Y., Caspi, J., Wilson, D. B., Altman, A., Bayer, E. A., and Shoseyov, O. (2007). Multiple display of catalytic modules on a protein scaffold: Nano-fabrication of enzyme particles. J. Biotechnol. 131, 433–439.
Huang, M. C., Ye, H., Kuan, Y. K., Li, M. H., and Ying, J. Y. (2009). Integrated two-step gene synthesis in a microfluidic device. Lab Chip 9(2), 276–285.
Ikemura, T. (1985). Codon usage and tRNA content in unicellular and multicellular organisms. Mol. Biol. Evol. 2, 13–34.
Itakura, K., Hirose, T., Crea, R., Riggs, A. D., Heyneker, H. L., Bolivar, F., and Boyer, H. W. (1977). Expression in Escherichia coli of a chemically synthesized gene for the hormone somatostatin. Science 198, 1056–1063.
Kelly, J. R., Rubin, A. J., Davis, J. H., Ajo-Franklin, C. M., Cumbers, J., Czar, M. J., de Mora, K., Glieberman, A. L., Monie, D. D., and Endy, D. (2009). Measuring the activity of BioBrick promoters using an in vivo reference standard. J. Biol. Eng. 20, 3–4.
Kodumal, S. J., Patel, K. G., Reid, R., Menzella, H. G., Welch, M., and Santi, D. V. (2004). Total synthesis of long DNA sequences: Synthesis of a contiguous 32-kb polyketide synthase gene cluster. Proc. Natl. Acad. Sci. USA 101, 15573–15578.
Kosovac, D., Wild, J., Ludwig, C., Meissner, S., Bauer, A. P., and Wagner, R. (2010). Minimal doses of a sequence-optimized transgene mediate high-level and long-term EPO expression in vivo: Challenging CpG-free gene design. Gene Ther. 18(2), 189–198. doi:10.1038/gt.2010.134.
Koster, H., Blocker, H., Frank, R., Geussenhainer, S., and Kaiser, W. (1975). Total synthesis of a structural gene for the human peptide hormone angiotensin II. Hoppe Seylers Z. Physiol. Chem. 356, 1585–1593.
Maertens, B., Spriestersbach, A., von Groll, U., Roth, U., Kubicek, J., Gerrits, M., Graf, M., Liss, M., Daubert, D., Wagner, R., and Schafer, F. (2010). Gene optimization mechanisms: A multi-gene study reveals a high success rate of full-length human proteins expressed in Escherichia coli. Protein Sci. 19(7), 1312–1326.
Mandecki, W., Hayden, M. A., Shallcross, M. A., and Stotland, E. (1990). A totally synthetic plasmid for general cloning, gene expression and mutagenesis in Escherichia coli. Gene 94, 103–107.
Maniatis, T., Fritsch, E. F., and Sambrook, J. (1982). Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.
Marchisio, M. A., and Stelling, J. (2009). Computational design tools for synthetic biology. Curr. Opin. Biotechnol. 20(4), 479–485.
Menzella, H. G., Reisinger, S. J., Welch, M., Kealey, J. T., Kennedy, J., Reid, R., Tran, C. Q., and Santi, D. V. (2006). Redesign, synthesis and functional expression of the 6-deoxyerythronolide B polyketide synthase gene cluster. J. Ind. Microbiol. Biotechnol. 33, 22–28.
Mullinax, R. L., Gross, E. A., Hay, B. N., Amberg, J. R., Kubitz, M. M., and Sorge, J. A. (1992). Expression of a heterodimeric Fab antibody protein in one cloning step. Biotechniques 12(6), 864–869.
Padgett, K. A., and Sorge, J. A. (1996). Creating seamless junctions independent of restriction sites in PCR cloning. Gene 168(1), 31–35.

Parmeggiani, F., Pellarin, R., Larsen, A. P., Varadamsetty, G., Stumpp, M. T., Zerbe, O., Caflisch, A., and Plückthun, A. (2008). Designed armadillo repeat proteins as general peptide-binding scaffolds: Consensus design and computational optimization of the hydrophobic core. J. Mol. Biol. 376, 1282–1304.
Raab, D., Graf, M., Notka, F., Schoedl, T., and Wagner, R. (2010). The GeneOptimizer Algorithm: Using a sliding window approach to cope with the vast sequence space in multiparameter DNA sequence optimization. Syst. Synth. Biol. 4(3), 215–225.
Saiki, R. K., Scharf, S., Faloona, F., Mullis, K. B., Horn, G. T., Erlich, H. A., and Arnheim, N. (1985). Enzymatic amplification of beta-globin genomic sequences and restriction site analysis for diagnosis of sickle cell anemia. Science 230(4732), 1350–1354.
Samuel, G. N., Selgelid, M. J., and Kerridge, I. (2009). Managing the unimaginable. Regulatory responses to the challenges posed by synthetic biology and synthetic genomics. EMBO Rep. 10(1), 7–11.
Shao, Z., Zhao, H., and Zhao, H. (2009). DNA assembler, an in vivo genetic method for rapid construction of biochemical pathways. Nucleic Acids Res. 37(2), e16.
Szybalski, W., and Skalka, A. (1978). Nobel prizes and restriction enzymes. Gene 4, 181–182.
Tan, L., Wiesler, S., Trzaska, D., Carney, H. C., and Weinzierl, R. O. (2008). Bridge helix and trigger loop perturbations generate superactive RNA polymerases. J. Biol. 7(10), 40.
TerMaat, J. R., Pienaar, E., Whitney, S. E., Mamedov, T. G., and Subramanian, A. (2009). Gene synthesis by integrated polymerase chain assembly and PCR amplification using a high-speed thermocycler. J. Microbiol. Methods 79(3), 295–300.
Tian, J., Ma, K., and Saaem, I. (2009). Advancing high-throughput gene synthesis technology. Mol. Biosyst. 5(7), 714–722.
Tumpey, T. M., Basler, C. F., Aguilar, P. V., Zeng, H., Solórzano, A., Swayne, D. E., Cox, N. J., Katz, J. M., Taubenberger, J. K., Palese, P., and García-Sastre, A. (2005). Characterization of the reconstructed 1918 Spanish influenza pandemic virus. Science 310(5745), 77–80.
Van den Brulle, J., Fischer, M., Langmann, T., Horn, G., Waldmann, T., Arnold, S., Fuhrmann, M., Schatz, O., O'Connell, T., O'Connell, D., Auckenthaler, A., and Schwer, H. (2008). A novel solid phase technology for high-throughput gene synthesis. Biotechniques 45(3), 340–343.
Van der Sloot, A. M., Kiel, C., Serrano, L., and Stricher, F. (2009). Protein design in biological networks: from manipulating the input to modifying the output. Protein Eng. Des. Sel. 22, 537–542.
Wagner, R., Graf, M., Bieler, K., Wolf, H., Grunwald, T., Foley, P., and Uberla, K. (2000). Rev-independent expression of synthetic gag-pol genes of human immunodeficiency virus type 1 and simian immunodeficiency virus: implications for the safety of lentiviral vectors. Hum. Gene Ther. 11(17), 2403–2413.
Xiong, A. S., Peng, R. H., Zhuang, J., Liu, J. G., Gao, F., Chen, J. M., Cheng, Z. M., and Yao, Q. H. (2008). Non-polymerase-cycling-assembly-based chemical gene synthesis: Strategies, methods, and progress. Biotechnol. Adv. 26, 121–134.
Young, L., and Dong, Q. (2004). Two-step total gene synthesis method. Nucleic Acids Res. 32, e59.