The Genetic Basis of Behavior


CHAPTER 2

INTRODUCTION

“We wish to suggest a structure for the salt of deoxyribose nucleic acid (D.N.A.)” is the first sentence of the famous paper by James Watson and Francis Crick (Watson and Crick, 1953). The paper was the culmination of a race between several groups to identify the structure of the molecule responsible for transferring genetic information from one generation to the next. The most important players were, in addition to Watson and Crick, Linus Pauling at Caltech and Maurice Wilkins and Rosalind Franklin in London. The basic ingredients of DNA had been known since the beginning of the century: its acidic nature, phosphate, the sugars (deoxyribose in DNA and ribose in RNA, see Fig. 2.1) and the four bases: the purines adenine [A] and guanine [G] and the pyrimidines cytosine [C] and thymine [T], the latter being replaced by the pyrimidine uracil [U] in RNA (Fig. 2.1).

Perhaps the most important characteristic of DNA was reported in 1952 by Erwin Chargaff, an Austrian biochemist who had fled to the USA during the Nazi era. Although it had long been known that different DNA molecules contain different amounts of the four bases, Chargaff found that the ratio of A to T and of C to G was almost always close to 1. This is now known as Chargaff’s rule, and it suggested that A and T (as well as C and G) were somehow closely related (Chargaff et al., 1952). The final piece of the puzzle came when Maurice Wilkins showed James Watson the (now famous) X-ray crystallographic photo 51. The photo was taken by Raymond Gosling while working as a PhD student for Rosalind Franklin. Apparently Franklin was not aware that the photo had been shown, but to Watson and Crick the diffraction pattern provided valuable evidence that DNA has a helical structure.

THE MOLECULAR STRUCTURE OF DNA

DNA molecules are large polymers (Fig. 2.2) consisting of sugar moieties linked together by covalent phosphodiester bonds. The sugar is a pentose (a five-carbon sugar forming a five-membered ring) called deoxyribose (because the hydroxyl group at the 2′ position of ribose is absent). Each sugar is linked to phosphate groups at its 3′ and 5′ positions, thus forming long linear chains.

Gene-Environment Interactions in Psychiatry. http://dx.doi.org/10.1016/B978-0-12-801657-2.00002-1 Copyright © 2016 Elsevier Inc. All rights reserved.


FIGURE 2.1  Basic building blocks of DNA and RNA. The backbone of DNA consists of a phosphate coupled to a deoxyribose sugar, while in RNA the backbone consists of a phosphate coupled to a ribose sugar. In addition, both DNA and RNA contain the purines adenine and guanine, while DNA also contains the pyrimidines cytosine and thymine. In RNA, thymine is replaced by uracil.

Due to this configuration, each DNA strand is said to have a 5′ to 3′ polarity, the relevance of which will be discussed later. In addition to the phosphates bound at the 3′ and 5′ positions, deoxyribose is substituted at the 1′ position with one of the nucleotide bases. The beauty of DNA lies in the fact that the chemical structure of each nucleotide base allows for binding to one (and only one) of the other nucleotide bases through hydrogen bonds. Hydrogen bonds are (fairly weak) electrostatic bonds between polarized parts of a molecule, such as between a hydrogen attached to nitrogen (N) and an oxygen (O). As is illustrated in Fig. 2.2, adenine and thymine can form two hydrogen bonds with each other, while cytosine and guanine can form three hydrogen bonds. This results in a double-stranded form of DNA (which is much more stable than the single-stranded form) and at the same time explains Chargaff’s rule. The stability of DNA is determined by the strength of the covalent binding between the different components, as well as the hydrogen bonds between the nucleotide bases on opposite

I.  General Introduction




FIGURE 2.2  The basic structure of DNA. DNA consists of a backbone of phosphate groups connecting deoxyribose sugars at the 3′ and 5′ positions. In addition, one of the four nucleobases is attached to the 1′ position of the deoxyribose. Due to the chemical structure of the nucleobases, adenine can only pair with thymine, with which it forms two hydrogen bonds. Likewise, cytosine and guanine pair with each other through three hydrogen bonds. Since the backbones run in opposite directions they are said to be antiparallel.

strands. Although individual hydrogen bonds are relatively weak, as mentioned previously, DNA molecules generally contain millions of nucleotide bases, whose hydrogen bonds together form a formidable force. Due to the large number of phosphate groups, DNA is a highly negatively charged molecule and therefore dissolves well in water. DNA is further stabilized by globular proteins called histones. These proteins are now known to play a very important role in determining the overall functioning of DNA (which will be discussed further in Chapter 5).

The double-stranded DNA generally takes the form of a double helix (Fig. 2.3A) and several different forms have been identified: A-DNA, B-DNA, and Z-DNA. A- and B-DNA are right-handed helices (in which the helix spirals clockwise) while Z-DNA is a left-handed (counterclockwise) spiral. A- and B-DNA differ in the number of base pairs (bp) per turn (about 11 for A-DNA vs about 10 for B-DNA). Under physiological conditions, the vast majority of DNA adopts the B-form with a pitch (ie, the distance occupied by a single turn of the helix) of 3.4 nm (Ghosh and Bansal, 2003). The helical structure of DNA is further characterized by a major and a minor groove.
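The pitch quoted above translates directly into a rise per base pair and a stretched-out length for a given molecule. The following is a minimal sketch of that arithmetic, assuming roughly 10 bp per turn for B-DNA (a rounded value) and an illustrative molecule size; the function names are hypothetical and chosen only for this example.

```python
# Rounded B-DNA figures; both constants are assumptions of this sketch.
PITCH_NM = 3.4      # length of one full helical turn, in nanometres
BP_PER_TURN = 10    # approximate number of base pairs per turn


def rise_per_bp_nm(pitch_nm=PITCH_NM, bp_per_turn=BP_PER_TURN):
    """Axial distance covered by a single base pair, in nanometres."""
    return pitch_nm / bp_per_turn


def contour_length_m(n_bp, pitch_nm=PITCH_NM, bp_per_turn=BP_PER_TURN):
    """Length in metres of a fully extended helix of n_bp base pairs."""
    return n_bp * rise_per_bp_nm(pitch_nm, bp_per_turn) * 1e-9


if __name__ == "__main__":
    print(f"rise per bp: {rise_per_bp_nm():.2f} nm")
    # 250 million bp is used purely as an illustrative molecule size.
    print(f"250 Mbp molecule: {contour_length_m(250_000_000) * 100:.1f} cm")
```

The point of the sketch is only scale: a molecule a few nanometres wide can be centimetres long when extended, which is why packaging by histones matters.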


FIGURE 2.3  DNA replication. A: DNA forms a double helix, characterized by a minor and a major groove. The most common form of DNA has a pitch of 3.4 nm (length of a single turn of the double helix). B: DNA replication starts with the unwinding of the double helix after which each of the two strands is replicated. The process involves a number of different steps and different proteins. See the text and Box 2.1.

As discussed before, each DNA strand has one terminal sugar in which the 3′ carbon is not linked to a further nucleotide (the 3′ end), and similarly one terminal sugar at the other end in which the 5′ carbon is not linked to a further nucleotide (the 5′ end). The two strands of the double helix are said to be antiparallel because they always associate (anneal) in such a way that the 5′ → 3′ direction of one DNA strand is opposite to that of its partner. As both strands are complementary and thus essentially contain the same information, it is customary to write the DNA sequence of only a single strand. For this, we normally use the 5′ → 3′ direction, as this is the direction in which both DNA and RNA molecules are synthesized. When a dinucleotide is described, it is customary to include a p (to indicate the phosphate bond). Thus CpG refers to a cytosine covalently linked to a guanine on the same strand.
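Because the strands are complementary and antiparallel, the partner of any strand written 5′ → 3′ can be derived mechanically, and Chargaff’s rule follows for the duplex as a whole. The sketch below illustrates this under simple assumptions (plain strings of A, C, G, and T only); the function names are hypothetical.

```python
# Watson-Crick pairing for DNA: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the partner strand, also written in the 5'->3' direction.

    The partner runs antiparallel, so we complement each base and
    reverse the order.
    """
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def duplex_base_ratios(seq):
    """Return (A/T, G/C) ratios counted over both strands of the duplex.

    Assumes the sequence contains at least one A or T and at least one
    C or G; otherwise a count would be zero and the ratio undefined.
    """
    both = seq.upper() + reverse_complement(seq)
    counts = {base: both.count(base) for base in "ACGT"}
    return counts["A"] / counts["T"], counts["G"] / counts["C"]


if __name__ == "__main__":
    strand = "AACGT"
    print(strand, "pairs with", reverse_complement(strand))
    print("A/T and G/C ratios in the duplex:", duplex_base_ratios(strand))
```

Note that the single strand in the example is not balanced (two A but one T); the ratios only become 1 when both strands are counted, which is what Chargaff measured.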


DNA SYNTHESIS AND REPLICATION

The unique structure of DNA with its complementary strands also explains one of the great mysteries of genetics, namely how information can be faithfully transferred from one cell and one generation to the next. Indeed, Watson and Crick already stated at the end of their paper “It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material” (Watson and Crick, 1953). Box 2.1 summarizes the main proteins involved in DNA replication. The process of DNA replication starts with the unwinding of the double helix (Fig. 2.3) by so-called DNA helicase proteins (a class of motor proteins) as they travel along the phosphodiester backbone. There are a large number of helicases: 95 different nonredundant proteins have been identified in the human genome (Umate et al., 2011), of which 64 are RNA helicases (which unwind RNA) and 31 are DNA helicases (which unwind DNA). The existence of so many different helicases can be explained by the many different processes in which helicases are involved, such as DNA replication, transcription, translation, and mismatch repair (see later). In addition, helicases are involved in processes such as recombination and ribosome biogenesis.

BOX 2.1  DNA REPLICATION

DNA replication is a tightly controlled process following several distinct steps:

1. DNA replication is initiated at many different sites within the molecule (origins of replication) by the binding of the origin recognition complex.
2. This multi-protein complex leads to the recruitment of several additional proteins and ultimately to the binding of DNA helicase to the leading strand of DNA.
3. DNA helicase functions to unwind the DNA and separate the two strands.
4. To prevent overwinding of the DNA in front of the separation, the enzyme topoisomerase functions to relax the DNA.
5. To initiate the synthesis of the daughter DNA strands, DNA primase attaches a short RNA primer sequence to the original DNA strands.
6. In the leading strand, DNA polymerase δ attaches new nucleotides to the growing daughter strand.
7. In the lagging strand, short Okazaki fragments are formed. They are initiated by the DNA primase induced attachment of an RNA primer.
8. To this primer, DNA polymerase α binds a short sequence of nucleotides, after which DNA polymerase δ attaches the remaining nucleotides.
9. RPA and Fen-1 then attach to the RNA/DNA stretch to allow the exact removal of the RNA primer plus the DNA stretch synthesized by polymerase α.
10. DNA ligase then connects the 5′ phosphate group to the free 3′-OH group of the deoxyribose sugar.


DNA unwinding takes place at the replication fork within the so-called replisome complex, with hexameric helicases forming a ring around the phosphodiester backbone of DNA (McGlynn, 2013). This is thought to be important to ensure tight binding of the helicase to the DNA molecule. Although the principle of helicase-directed unwinding is similar in all organisms, subtle differences exist. In bacteria, the hexamer consists of six identical helicase subunits and travels along its strand in the 5′ → 3′ direction, while in other organisms the hexamer consists of different helicases and travels in the 3′ → 5′ direction. While moving along the phosphodiester backbone of DNA, the helicase complex uses ATP to break the hydrogen bonds that keep the two strands of the DNA helix together. The number of base pairs unwound per ATP molecule is not precisely known, but in bacteria there is evidence that two base pairs are split for every molecule of ATP.

In addition to helicases, another group of enzymes is essential for the unwinding of DNA, the so-called topoisomerases. As the DNA unwinds and opens up due to the actions of DNA helicases, the DNA tends to become overwound ahead of the replication fork, which would severely inhibit (and ultimately stop) further unwinding and separation of the strands. Topoisomerases release this tension by temporarily cutting the phosphodiester backbone, allowing the DNA strand to uncoil. Two different families of topoisomerases can be delineated: type I topoisomerases cut only a single strand of DNA, attaching either to the 5′ phosphate (type IA) or to the 3′ phosphate (type IB), while type II topoisomerases temporarily cut both strands (Champoux, 2001). Unfortunately, the terminology of topoisomerases is slightly confusing, as several different forms of topoisomerase III exist as well, which actually belong to the type IA subfamily.
After the combined action of topoisomerases and helicases, the two strands of DNA are separated, allowing each strand to be replicated and leading to two daughter molecules of DNA. As each of these consists of one of the original strands and one newly synthesized strand, the process is referred to as semiconservative. The synthesis of the new strands of DNA is critically dependent on the enzyme DNA polymerase. This enzyme uses the four different deoxynucleoside triphosphates to add one nucleotide at a time to the growing DNA strand (the two additional phosphates are released as pyrophosphate). As the base-pair rule strongly favors the binding of A to T and of C to G, each daughter molecule is an identical copy of the original (although some replication errors can occur, see later). There are over 120 different DNA polymerases, generally subdivided into three groups, of which the classical DNA-directed DNA polymerases are the most important ones. However, all polymerases share two fundamental properties: [1] they can only add nucleotides to an existing chain; [2] they can only attach nucleotides to the 3′-OH group of deoxyribose. Both of these properties pose a serious problem in DNA replication.

First of all, if polymerases can only add nucleotides to an existing chain, how does replication start? This problem is solved by attaching a short sequence to the single-stranded DNA. This short stretch is attached by a primase enzyme and is in fact a short stretch of RNA rather than DNA, generally referred to as an RNA primer. The RNA primer provides the free 3′-OH group to which DNA polymerase can attach nucleotides. The RNA primer is eventually removed by a so-called ribonuclease enzyme, after which DNA polymerase fills in the gap. The remaining nick is then closed by DNA ligase enzymes, which catalyze the formation of a phosphodiester bond between adjacent 3′-OH and 5′-phosphate groups. The second problem in DNA replication is illustrated in Fig. 2.3.
Since the DNA strands are antiparallel, only one of the new strands will have a free 3′-OH group, while the other


must have a free 5′ end. If DNA polymerase can only attach new nucleotides to a free 3′-OH group, how can the second strand be copied? The answer is provided by the pioneering work of Reiji Okazaki and his colleagues. They showed that DNA replication is in fact a discontinuous process, with short stretches of DNA being formed, now referred to as Okazaki fragments. Each of these stretches starts with an RNA primer attached by a primase, to which DNA polymerase can then attach nucleotides. In bacteria only one DNA polymerase (polymerase III) is active in the synthesis, while a second polymerase (polymerase I) removes the RNA primers and allows ligases to close the gap between the individual fragments. In contrast, in eukaryotic cells two different polymerases are involved in the synthesis of Okazaki fragments. After the RNA primer (around 12 bases) is attached by the primase PriSL, the first stretch of DNA (around 25 bases) is synthesized by polymerase α, while the remainder is synthesized by polymerase δ. In addition, as neither of these DNA polymerases can remove the RNA primer by itself, several nucleases and the single-stranded DNA binding protein RPA are required to remove the primer.

There are further differences between bacteria and eukaryotic cells, including the speed of Okazaki fragment formation (about 1000 bases per second in bacteria versus only about 50 bases per second in eukaryotic cells) and the length of the fragments (about 1000–2000 bases in bacteria versus 150–250 bases in eukaryotic cells). As a result, DNA replication is much faster in bacteria than in eukarya, although the reasons for this are still unclear (Forterre, 2013). Although it has been suggested that the involvement of a second polymerase might represent a further stage in evolution, this seems unlikely.
In fact, apart from slowing down DNA replication, the introduction of polymerase α has the major drawback that it lacks the exonuclease activity that is essential for proofreading. Proofreading is important to ensure that the DNA molecule is replicated faithfully. Because polymerase δ (and its bacterial counterpart polymerase III) can excise incorrect nucleotides (through its exonuclease activity) and insert the correct nucleotides, whereas polymerase α cannot, the error rate of polymerase α is up to 100 times higher. The mechanism of Okazaki fragment maturation appears to compensate for this. The polymerase δ of the previous Okazaki fragment synthesizes exactly the number of bases needed to allow the Fen-1 endonuclease to remove not only the RNA primer, but also the part of the DNA synthesized by polymerase α, after which the ligase closes the gap between the two fragments.

Given the complexity of the formation and maturation of the Okazaki fragments, and the slow action of the primases (compared to DNA polymerases), the two daughter strands are synthesized at different rates and are therefore referred to as the leading strand (ie, the strand synthesized continuously in the 5′ → 3′ direction) and the lagging strand (ie, the strand assembled from Okazaki fragments, which as a whole grows in the 3′ → 5′ direction although each fragment is synthesized 5′ → 3′). However, as the speed of synthesis is so different between the two strands, the primases also act as a stopping signal, to prevent the lagging strand from falling too far behind the leading strand.

A final important aspect of DNA replication that we have not yet discussed is the origin of replication. Given the length of DNA molecules (in humans up to 250 million base pairs), replication of a single DNA molecule from a single starting point would be extremely slow. Thus in most organisms, DNA replication starts at multiple sites within a single DNA molecule. These sites are referred to as origins of replication and there can be hundreds if not thousands of these sites per DNA molecule.
The exact nature (ie, base pair composition) of these sites varies enormously between species (and even within one species), but they all can bind to the so-called origin recognition complex. This is a six-subunit complex which is remarkably conserved in


eukaryotic cells and acts as the initiator of replication. The binding of the origin recognition complex triggers the subsequent binding of several other proteins, including Cdc6 and Mcm2-7, leading to the so-called prereplicative complex. There is evidence to support the idea that Mcm2-7 (which is a hexameric protein complex) functions as a helicase, unwinding the DNA (Bell and Dutta, 2002).
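The need for multiple origins can be made concrete with a back-of-the-envelope calculation. The sketch below is only an idealization: it assumes evenly spaced origins that all fire at the same moment, forks that move at a constant speed, and it reuses the eukaryotic figure of about 50 bases per second quoted above for Okazaki fragment formation as the fork speed. The function name and parameters are hypothetical.

```python
def replication_time_hours(genome_bp, fork_speed_bp_s, n_origins=1,
                           bidirectional=True):
    """Idealized time to copy one DNA molecule, in hours.

    Assumes evenly spaced origins firing simultaneously and a constant
    fork speed. Each origin launches two forks when replication is
    bidirectional, one otherwise.
    """
    forks = n_origins * (2 if bidirectional else 1)
    seconds = genome_bp / (fork_speed_bp_s * forks)
    return seconds / 3600


if __name__ == "__main__":
    molecule = 250_000_000   # bp, the upper figure quoted for human DNA
    speed = 50               # bases per second (assumed fork speed)
    for origins in (1, 100, 1000):
        hours = replication_time_hours(molecule, speed, origins)
        print(f"{origins:>5} origin(s): {hours:8.1f} h")
```

Under these assumptions a single origin would need several weeks, whereas a thousand origins bring the time below one hour, which is the order-of-magnitude argument made in the text.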

THE STRUCTURE OF THE GENOME

The genome is defined as the total amount of genetic material of a species (see Box 2.2 for a summary of the most important terms used in this chapter). As can be expected, this differs dramatically between different species. Although more complex organisms generally have

BOX 2.2  DEFINITION OF THE IMPORTANT TERMS

Allele: An alternative version of a gene. Diploid organisms carry two alleles of each gene (although they may be identical)
Alternative splicing: The process whereby one gene (and one mRNA molecule) can produce multiple different proteins (isoforms)
Autosome: All chromosomes with the exception of the sex chromosomes
Chromatin: A complex of macromolecules consisting of DNA, RNA and proteins
Codon: A sequence of three nucleotides coding for a single amino acid
Diploid: Cells (or organisms) with two copies of all autosomes
DNA: Deoxyribonucleic acid, the chemical substance that contains the genetic material
Dominance: An allele that produces a certain phenotype even in a heterozygous individual
Exon: A part of the DNA that codes for a specific (part of) a protein
F1, F2: The offspring in first and second generation after mating of the F0
Gamete: Mature reproductive cell (sperm in males, egg or ovum in females)
Gene: The basic unit of inheritance that codes for a specific product. Through the process of alternative splicing, more than one isoform of a protein can be formed from a single gene
Genome: The total DNA sequence in an organism
Genotype: The genetic composition of an organism or at a specific gene location
Heterozygosity: The presence of different alleles for the same gene on the two chromosomes
Histones: Globular proteins involved in packaging DNA (see Chapter 5)
Homozygosity: The presence of two identical alleles for the same gene on the two chromosomes
Intron: A part of the DNA that does not code for a protein
mRNA: Messenger RNA, a single strand molecule involved in DNA transcription
Nucleobase: The basic building blocks of DNA and RNA: adenine, guanine, thymine, cytosine and uracil
Nucleoside: Organic molecules consisting of a nucleobase connected to a sugar: ribose to form ribonucleosides or deoxyribose to form deoxyribonucleosides
Nucleosome: A segment of DNA (about 147 bp) wrapped around a central core of 8 histone molecules
Nucleotide: Organic molecules consisting of a nucleoside connected to a phosphate group. They form the basic unit of DNA and RNA.
Phenotype: A functional characteristic of an organism that results from genetic factors (often in combination with specific environmental factors)
RNA: Ribonucleic acid, a group of chemical substances involved in transcription, translation and regulation of genetic information
rRNA: Ribosomal RNA, critical for the formation of ribosomes which assist in DNA translation
tRNA: Transfer RNA, a macromolecule involved in protein formation in ribosomes

larger genomes, there is not a simple linear relationship. For instance, the genome of mammals is about 10^9–10^10 bases, while some amphibians and birds have genomes close to and above 10^11 bases. In all species the genomic information is stored in the chromosomes, with the number of chromosomes again being very different between different species. Table 2.1 lists a number of different species and their chromosome numbers, again illustrating that there is no linear relationship between the complexity of the organism and the number of chromosomes. It is important to realize that whereas in eukaryotic cells the vast majority of chromosomes are located in the nucleus, some chromosomes are also present in the organelles, most notably the mitochondria in animals and chloroplasts in plants. These organellar chromosomes are mostly circular and are much more tightly packed with genes (see later). As such, the organellar chromosomes resemble bacterial DNA and it has been suggested that mitochondria (and chloroplasts) were originally prokaryotic cells that were encapsulated by eukaryotic cells, forming a so-called endosymbiotic relationship (Thiergart et al., 2012).

Whereas the genome is the sum of all genetic material, a gene can be defined as the molecular unit of heredity. It generally corresponds to a stretch of DNA that codes for a single polypeptide (protein) chain. In most organisms, genes include not only coding stretches of DNA (so-called exons) but also intervening noncoding areas (so-called introns) as well as several regulatory stretches of DNA. In the next section we will describe the structure of genes in more detail. As Table 2.1 shows, the human genes are encoded in 23 pairs of chromosomes: 22 pairs of autosomes plus the X and Y sex chromosomes. The two animal species most often used in biomedical research, mice and rats, have 20 and 21 pairs respectively. This implies that the same


TABLE 2.1  Chromosomal Composition of Several Interesting Species

Species            Official name               No. of chromosomes
Adder’s tongue     Ophioglossum reticulatum    1260
Shrimp             Penaeus semisulcatus        90
Pigeon             Columbidae                  80
Dog                Canis lupus                 78
Cow                Bos primigenius             60
Elephant           Loxodonta africana          56
Zebrafish          Danio rerio                 50
Chimpanzee         Pan troglodytes             48
Human              Homo sapiens                46
Rat                Rattus norvegicus           42
Mouse              Mus musculus                40
Cat                Felis catus                 38
Fruit fly          Drosophila melanogaster     8
Jack jumper ant    Myrmecia pilosula           2

genes are located on different chromosomes in different species. For instance, the gene for the serotonin transporter [a protein that is crucially involved in regulating the extracellular concentration of the neurotransmitter serotonin and which has been implicated in several psychiatric disorders (see Section 2 of the book)] is located on chromosome 17 in humans, on chromosome 11 in mice and chromosome 10 in rats. Conserved synteny refers to the colocalization of genes on chromosomes in different species. It is generally accepted that during evolution the genome has become rearranged due to, for instance, chromosomal translocation. It follows that the larger the conserved synteny, the more closely related the two species are (in evolutionary terms).

DNA TRANSCRIPTION

The expression of the genetic information stored in DNA involves two more steps, namely transcription (during which information from DNA is passed on to RNA) and translation (during which information from RNA is passed on to proteins). Since virtually all organisms use exactly the same flow of information, the principle is commonly known as the “central dogma” of molecular biology (Fig. 2.4). It was first proposed by Francis Crick in a lecture in 1956 and later reiterated in a paper in Nature in 1970 (Crick, 1970). In this paper Crick already acknowledged that the simplest model DNA → RNA → protein (Fig. 2.4A) is likely not correct, as it does not account for RNA replication (which occurs in many viruses) or for the flow of information from RNA to DNA (Fig. 2.4B). This latter process occurs in so-called retroviruses but also in so-called retrotransposons. DNA transcription starts with the binding of the enzyme RNA polymerase to a specific sequence of DNA called the promoter region. However, in order to be able to bind to DNA and


FIGURE 2.4  The central dogma of molecular biology. (A) In its traditional form the central dogma states that the flow of information is from DNA to RNA to proteins, while acknowledging that DNA replicates itself. (B) In 1970, Crick adjusted the central dogma slightly, acknowledging that in certain circumstances the flow of information goes from RNA to DNA and RNA can replicate itself too. There is even some evidence, albeit in very rare circumstances, that information can flow directly from DNA to proteins.

initiate transcription, the RNA polymerase requires several other specific proteins to bind [so-called transcription factors (TFs)], which allows for a differential expression of genes per cell. We will discuss this in the next section. There are several different RNA polymerases, the most important for gene transcription being RNA polymerase II (Table 2.2). As RNA polymerases contain intrinsic helicase activity, the binding to DNA automatically unwinds the double helix. Another crucial difference is that whereas DNA polymerase requires a primer, RNA polymerase can start from scratch. Normally, the first RNA nucleotide binds to the 3′ → 5′ template strand of DNA in antiparallel fashion (Fig. 2.5). Chain elongation then occurs by adding new nucleotides to the free 3′-OH group of the ribose sugar. As with DNA, this implies that RNA is also polarized, although as generally only a single strand of RNA is synthesized, only the 5′ → 3′ RNA molecule commonly exists, with the 5′ end containing a free triphosphate group and the 3′ end containing a free ribose hydroxyl group. Since chain elongation follows the same basic hydrogen-bond rule that governs DNA replication, the RNA strand is a copy of the 5′ → 3′ strand of DNA. For that reason, this strand of DNA is usually referred to as the sense strand, and the template 3′ → 5′ strand is referred to as the antisense strand. However, it is important to note that the RNA is not an exact copy of the sense


TABLE 2.2  The Different RNA Polymerases and their Function

Polymerase            Function
RNA polymerase I      Involved in the synthesis of ribosomal RNA
RNA polymerase II     Involved in the synthesis of messenger RNA, small nuclear and micro RNA
RNA polymerase III    Involved in the synthesis of transfer RNA, ribosomal RNA and small nuclear RNA
RNA polymerase IV     Involved in the synthesis of small interfering RNA in plants
RNA polymerase V      Involved in the synthesis of small interfering RNA in plants

DNA strand. In addition to having a ribose-phosphate (rather than a deoxyribose-phosphate) backbone, RNA contains the pyrimidine uracil rather than thymine (opposing adenine). The reason why RNA contains uracil rather than thymine is not known. However, as thymine and deoxyribose are more stable than uracil and ribose, one theory proposes that over the course of evolution the primary carrier of genetic information changed from RNA to DNA. In line with this, many viruses still contain only RNA and, as mentioned earlier, many rely on RNA replication.
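The relationship between the template (antisense) strand, the sense strand, and the resulting RNA can be illustrated with a short sketch. It assumes plain strings of A, C, G, and T and ignores everything except base pairing; the function names are hypothetical.

```python
# Pairing of a DNA template base with the RNA base added opposite it.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe_template(template_3to5):
    """Build the RNA (written 5'->3') from a template written 3'->5'.

    Reading the template 3'->5' and pairing base by base yields the RNA
    in its direction of synthesis, so no reversal is needed here.
    """
    return "".join(DNA_TO_RNA[base] for base in template_3to5.upper())


def sense_to_rna(sense_5to3):
    """RNA equals the sense strand with every T replaced by U."""
    return sense_5to3.upper().replace("T", "U")


if __name__ == "__main__":
    sense = "ATGGCC"      # 5'->3' sense strand
    template = "TACCGG"   # its partner, written 3'->5'
    print(transcribe_template(template))  # RNA built from the template
    print(sense_to_rna(sense))            # same RNA, derived from the sense strand
```

Both routes give the same RNA, which is why published gene sequences are normally those of the sense strand even though the polymerase reads the antisense strand.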

FIGURE 2.5  DNA transcription I. Like DNA replication, DNA transcription starts with the unwinding of the double helix. This occurs when RNA polymerase II, together with specific TFs, binds to the promoter region of the gene. In contrast to DNA replication, however, the RNA resulting from DNA transcription is a single strand, formed only from the antisense (3′ to 5′) strand of DNA. For more details, see the text.


THE STRUCTURE OF GENES

While the genetic information is more or less the same in each cell of an organism (there are exceptions, which will be discussed later), there are large differences in which genes are actually expressed and which are not. For instance, whereas all cells contain the genetic code for voltage-gated Na+ ion channels, only neurons express them, and therefore only these cells can actively initiate and propagate action potentials. But even among neuronal cells, genes (and their protein products) are differentially expressed. For example, the enzyme dopamine-β-hydroxylase converts dopamine into noradrenaline, so in cells that use noradrenaline as a neurotransmitter this enzyme is highly expressed. However, dopamine is itself also a neurotransmitter (involved in, among others, Parkinson’s disease, schizophrenia and drug addiction). Therefore, in dopaminergic cells, dopamine-β-hydroxylase must not be expressed. Thus, each cell must have tightly controlled mechanisms in place to determine which genes are and which are not expressed.

This differential gene expression is governed by the interaction between RNA polymerase, general TFs, and additional regulatory proteins. As mentioned earlier, although RNA polymerase II is responsible for the actual initiation and elongation of the RNA chain, a variable number of proteins called TFs need to bind to the promoter region of DNA in order for the transcription process to start. Moreover, additional proteins can enhance or inhibit the transcriptional process. The combination of RNA polymerase II, the TFs and the regulatory proteins is referred to as the RNA polymerase II holoenzyme. These transcription and regulatory proteins are generally referred to as trans-acting elements, as they are encoded by genes located elsewhere in the genome, often on chromosomes different from that of the gene they regulate.
They not only bind to the promoter region of the gene they regulate but also to other parts of the chromosome such as enhancers or silencers, both short DNA sequences that are usually located upstream of the gene they regulate. When regulatory proteins bind to enhancers, they alter the three-dimensional structure of the DNA, enhancing the binding of the TFs or RNA polymerase to the promoter region and resulting in enhanced transcription. Silencers are similarly short stretches of DNA upstream of the promoter region. However, binding of specific regulatory proteins to silencers inhibits the transcription of the gene. Since the promoter region, enhancers, and silencers are all located on the same DNA molecule as the gene they regulate, they are referred to as cis-acting elements. It is important to realize that in contrast to the promoter region, which is mostly located directly upstream of the gene, enhancers and silencers may be located a considerable distance from the gene they regulate.

Transcription starts with TFs binding to the promoter region of the gene. A large number of different promoter regions have been identified, which in vertebrates were originally subdivided into two classes, TATA and CpG types, and in mammals into TATA box enriched and CpG-rich promoters. TATA box enriched promoters are characterized by a DNA sequence consisting of TATAAA, while the CpG type promoters contain a variable number of CG repeats. Analysis of mammalian DNA has found that the classical TATA box promoters actually represent a minority of the mammalian promoters (Carninci et al., 2006). A more recent, more detailed investigation of promoters in several different species (Homo sapiens, Drosophila melanogaster, Arabidopsis thaliana, and Oryza sativa) identified 10 different clusters (Gagniuc and Ionescu-Tirgoviste, 2012), with CG-based promoters again being the most common in humans, while AT-based promoters were more common in the other three species.


During the first step of gene transcription (Fig. 2.6), the TATA box (about 25–30 base pairs upstream of the start of the gene) is recognized by the transcription factor TFIID, which consists of the TATA binding protein (TBP) and several TBP-associated factors (TAFs). This complex then recruits TFIIB, allowing RNA polymerase and TFIIF to bind, after which several additional TFs bind and transcription starts. It is important to realize that these TFs are also involved in the transcription of genes that do not contain a TATA box; hence they are usually referred to as general TFs.
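The positional rule described above (a TATAAA motif roughly 25–30 base pairs upstream of the start of the gene) can be illustrated with a small sketch. The function name, the input sequence, and the idea of passing the transcription start site as an index are all hypothetical choices made for illustration; real promoter prediction is considerably more involved, and most mammalian promoters lack a TATA box altogether.

```python
def find_tata_boxes(sequence, tss_index, window=(25, 30), motif="TATAAA"):
    """Return the upstream distances (in bp) at which the motif starts.

    sequence  : coding-strand DNA written 5' to 3'
    tss_index : 0-based index of the start of the gene (illustrative input)
    window    : nearest and farthest upstream distance that is accepted
    """
    nearest, farthest = window
    hits = []
    for distance in range(nearest, farthest + 1):
        start = tss_index - distance
        if start >= 0 and sequence[start:start + len(motif)] == motif:
            hits.append(distance)
    return hits


# Toy sequence: 20 bp of filler, a TATA box, 22 bp of spacer, then the gene.
upstream = "GC" * 10 + "TATAAA" + "G" * 22
toy_sequence = upstream + "ATGGCC"
toy_tss = len(upstream)

print(find_tata_boxes(toy_sequence, toy_tss))
```

In this toy sequence the motif begins 28 bp upstream of the gene, which falls inside the accepted window.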

FIGURE 2.6  DNA transcription II. DNA transcription critically depends on the binding of a series of TFs to the promoter region of a gene.


mRNA PROCESSING

The result of RNA transcription with RNA polymerase II is a single-stranded RNA sequence complementary to the entire length of the gene, usually referred to as the primary (RNA) transcript or precursor messenger RNA (pre-mRNA). This transcript, in most cases, contains both protein-coding and noncoding segments (Fig. 2.7), so-called exons and introns, respectively. The primary RNA transcript therefore has to undergo a series of splicing reactions in which the RNA strand is cut at the boundaries between exons and introns (splice junctions), after which the separate exons are fused together again. In the vast majority of cases, an intron starts with a GT and ends with an AG. We therefore often speak of the GT–AG (or, in the case of RNA, GU–AG) rule. However, although GT and AG are essential for splicing, they are not sufficient by themselves. In addition, introns generally have a branch site, consisting of an adenosine located approximately 18–40 nucleotides upstream of the 3′ splice site (ss), usually followed by a polypyrimidine tract.

The introns are removed in a two-step transesterification process. In the first step, the 2′-OH of the adenosine in the branch site carries out a nucleophilic attack on the 5′ ss, resulting in cleavage at this site. The freed 5′ end of the intron is thereby ligated to the branch-site adenosine, forming the so-called lariat structure. In the second step, the 3′ ss is attacked by the free 3′-OH of exon 1, leading to the ligation of the 3′ end of exon 1 to the 5′ end of exon 2 and thereby releasing the intron.

Splicing is catalyzed by a large complex of proteins and small RNAs called the spliceosome. In fact, in eukaryotic cells there are two different spliceosomes: the more ubiquitous U2-dependent and the much less abundant U12-dependent spliceosome. The conformation and composition of the spliceosome can change rapidly, allowing it to be both accurate and flexible (Will and Luhrmann, 2011; Matera and Wang, 2014).
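As a rough illustration of the GU–AG rule and of the net result of splicing (introns out, exons joined), the following sketch removes introns from a toy pre-mRNA. The function name and the toy sequences are invented for this example, and the sketch deliberately ignores the branch site, the polypyrimidine tract, and the spliceosome itself, all of which are required in the cell.

```python
def splice(pre_mrna, introns):
    """Join the exons of a pre-mRNA after removing the given introns.

    introns : list of (start, end) index pairs, end exclusive.
    Every intron must begin with GU and end with AG (the GU-AG rule);
    this is a necessary condition only, not a sufficient one.
    """
    mature = []
    cursor = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} violates the GU-AG rule")
        mature.append(pre_mrna[cursor:start])  # exon preceding this intron
        cursor = end
    mature.append(pre_mrna[cursor:])  # final exon
    return "".join(mature)


exon_1 = "AUGGCC"
intron_1 = "GUAAGUCCUUACUUUUUCAG"  # starts with GU, ends with AG
exon_2 = "AAAUGA"
toy_pre_mrna = exon_1 + intron_1 + exon_2
intron_position = (len(exon_1), len(exon_1) + len(intron_1))

print(splice(toy_pre_mrna, [intron_position]))
```

The printed sequence is simply exon 1 followed directly by exon 2.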
The U2-dependent spliceosome consists of 5 different so-called small nuclear ribonucleoproteins (snRNPs): U1, U2, U5, and U4/U6 (the latter counting as two), as well as numerous other non-snRNP proteins. Each snRNP consists of a small nuclear RNA (or two in the case of U4/U6), a common set of seven Sm proteins, and a variable number of additional proteins. This complexity is essential to ensure a perfect orientation of the splicing. Similar to the DNA double helix, the spliceosome complex forms many relatively weak interactions with the splice junction, resulting in a strong, highly precise splicing event. For instance, the RNA part of U1 binds to the conserved 5′ ss via the normal base-pairing rule, while the RNA part of U2 recognizes the branch point (again binding via the base-pairing rule).

After the splicing out of the introns, two additional changes usually occur in the mRNA (Fig. 2.7). First, a 7-methylguanosine (m7G) is linked to the first nucleotide at the 5′ end. It is thought that this capping process serves several important functions, including protecting the mRNA from exonuclease attack, facilitating the transfer of mRNA from the nucleus to the cytoplasm, and enhancing the attachment of the mRNA to the 40S subunit of the ribosomes, which is essential for the final translation process (see later). Second, at the 3′ end of the mRNA about 200 adenosine monophosphate (AMP) residues are added by the enzyme poly(A) polymerase. This poly(A) tail is thought to serve functions similar to those of the 5′ cap: facilitating transport from the nucleus to the cytoplasm, stabilizing the mRNA molecule in the cytoplasm, and enhancing binding to the ribosomes.

An important aspect of mRNA processing is alternative splicing, which refers to the inclusion (and exclusion) of different exons in the final mRNA. It has been suggested that about 95% of all mammalian genes undergo alternative splicing. As a result, about 20,000 human


FIGURE 2.7  mRNA processing. DNA transcription leads to pre-mRNA, which contains all the exons and the introns. Subsequently, through recruitment of the spliceosome, the introns are excised, leading to the mature mRNA. However, in most cases mRNA is further processed by adding a 7-methylguanosine to the 5′ end and a long poly(A) tail to the 3′ end.




protein coding genes can lead to between 250,000 and 1 million different proteins (de Klerk and 't Hoen, 2015). An analysis of 15 different human cell lines showed that a single gene can lead to 25 different mRNAs, with up to 12 expressed in a single cell (Djebali et al., 2012). It is important to realize that isoforms are not always equally expressed. Some isoforms may indeed be very rare (although this does not necessarily mean they are less important: a small change in a crucial pathway may lead to big effects).

Several different mechanisms can underlie alternative splicing. For example, the BDNF gene undergoes significant alternative splicing due to alternative transcription initiation. Thus, at least 11 different BDNF transcripts have been identified as a result of different promoters (Aid et al., 2007). In addition, alternative splicing can involve an alternative order of exons or the exclusion of specific exons. Alternative splicing involves the recruitment of specific RNA-binding proteins that are not part of the normal spliceosome but can enhance or suppress the use of specific splice sites (Witten and Ule, 2011).
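To give a feeling for how quickly the number of possible transcripts grows, the sketch below enumerates the mRNAs that exon exclusion alone could produce from a single gene. The exon names and the set of skippable exons are made up, and the model is a deliberate simplification: it ignores alternative promoters and splice-site choice, and in a real cell only a regulated subset of these combinations is actually produced.

```python
from itertools import product


def exon_skipping_isoforms(exons, optional):
    """List every exon combination obtained by keeping or skipping
    each optional exon; constitutive exons are always kept.

    exons    : exon names in their genomic order
    optional : set of exon names that may be excluded
    """
    optional_in_order = [exon for exon in exons if exon in optional]
    isoforms = []
    for choice in product([True, False], repeat=len(optional_in_order)):
        keep = dict(zip(optional_in_order, choice))
        isoforms.append(tuple(e for e in exons if keep.get(e, True)))
    return isoforms


toy_exons = ["E1", "E2", "E3", "E4"]
toy_isoforms = exon_skipping_isoforms(toy_exons, optional={"E2", "E3"})
for isoform in toy_isoforms:
    print("-".join(isoform))
```

With n skippable exons this simple model yields 2^n combinations, so even a handful of optional exons gives a sizeable upper bound on the number of isoforms.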

mRNA TRANSLATION

The final step in gene expression is the translation of mRNA, leading to the formation of proteins, molecules that consist of sequences of amino acids. Although there are 23 proteinogenic (protein-forming) amino acids, only 20 are found in all living species and are therefore referred to as standard amino acids (Fig. 2.8). Of the remaining three, pyrrolysine only occurs in some archaea and one bacterium, and N-formylmethionine occurs in bacteria, mitochondria, and chloroplasts. The last of the proteinogenic amino acids is selenocysteine, which has been found in many noneukaryotes as well as most eukaryotes. However, in contrast to the 20 standard amino acids, selenocysteine only occurs in about 25 human proteins.

In order for the translation process to take place, the mature mRNA is transported out of the cell nucleus into the cytoplasm, where it binds to ribosomes. These are large complex organelles, consisting of both proteins and a special form of RNA called ribosomal RNA (transcribed by RNA polymerase I and III, rather than II). Ribosomes consist of two different subunits, a large subunit (50S in prokaryotes and 60S in eukaryotes) and a small subunit (30S and 40S, respectively). The term S refers to the Svedberg unit, an indication of the rate of sedimentation rather than actual size. Ribosomes can be found freely in the cytoplasm or bound to the endoplasmic reticulum (forming the rough endoplasmic reticulum). These ribosomes are involved in different types of protein synthesis.

The process of mRNA translation is illustrated in Fig. 2.9 and starts with the formation of the ribosome around the mRNA (Fig. 2.9A). In most cases this initiation starts around the 5′ m7G cap and the first part of exon 1. Since this part is normally not translated into amino acids, it is usually referred to as the 5′ untranslated region (5′-UTR; a similar region occurs at the 3′ end).
Once the ribosomes are bound to the mRNA, a special type of RNA binds to the start of the mRNA. This start sequence is referred to as the initiation codon recognition sequence and often has the sequence GCCPuCCAUGG (known as the Kozak sequence, with translation starting at the AUG; Pu stands for either of the two purines, adenine or guanine). This special type of RNA is referred to as tRNA (or transfer RNA, transcribed by RNA polymerase III) and has a very characteristic cross-like structure. An amino acid is attached to the 3′ end of the tRNA by the enzyme aminoacyl tRNA synthetase, leading to a covalent ester bond between the amino acid and the 2′- or 3′-OH group of the terminal adenosine


FIGURE 2.8  The major protein forming amino acids. There are 20 major protein forming amino acids that can be subdivided into 4 groups: neutral nonpolar amino acids (yellow box), neutral polar amino acids (green box), positively charged amino acids (blue box), and negatively charged amino acids (orange box).

nucleotide. If the amino acid is attached to the 2′-OH, it will ultimately be transferred to the 3′-OH through a process called transesterification. The result of this process is a so-called aminoacyl-tRNA. However, given that there are multiple amino acids (and hence multiple aminoacyl-tRNAs), the question is how the ribosome determines which amino acids should be attached to each other. This is determined by the nucleotide sequence of the RNA (and ultimately the DNA). Given that we have 20 standard amino acids, only a sequence of 3 (or more) nucleotides would have enough information to code for all amino acids. As there are 4 different nucleotides, a “sequence” of one would only code for 4 different amino acids, and a sequence of 2 for 4² = 16. A sequence of 3 could code for 4³ = 64 different combinations, more than required for the 20 standard amino acids. It was again Francis Crick and his colleagues who first proposed that amino acids are coded by a triplet (codon) of nucleotides (Crick et al., 1961). As is evident from Fig. 2.10, there is considerable redundancy in the system, and many amino acids are coded for by a number of different triplets. It was long thought that the nucleotide codes for the different amino acids were identical in all life forms, and it therefore became known as the “universal code”. However, we now know that although highly




FIGURE 2.9  RNA translation. Through RNA translation, the nucleotide sequence of the mRNA (and thus ultimately of the DNA) is translated into amino acids. Translation starts with the binding of ribosomes to mRNA (A). Subsequently, tRNA binds to the triplet codons of mRNA. Each tRNA is covalently bound to one specific amino acid (based on the triplet codon), and thus each subsequent tRNA adds one amino acid to the growing amino acid chain until a stop codon is reached.

conserved, there are some differences between species and also between cytoplasmic and mitochondrial codes. For instance, the codon AUA (ATA in DNA) codes for Isoleucine in cytoplasmic mRNA, but for Methionine in mitochondrial mRNA. The process of translation therefore requires the aminoacyl-tRNA to bind to the RNA strand based on the triplet (codon) nucleotide sequence. Hence the tRNA holds what is known as an anticodon sequence. Although based on the triplet code one would expect 64 different aminoacyl-tRNAs, in reality there are only about 30 types of cytoplasmic and 22 mitochondrial aminoacyl-tRNA. The explanation for this is that the general base-pair rule for binding of tRNA to mRNA is strict for the first two nucleotides of the triplet codon, but relaxed for the last one. For instance, a G in the anticodon position that pairs with the third nucleotide of the codon


FIGURE 2.10  The triplet code. The triplet code of DNA (and thus of RNA) determines the amino acid sequence. Illustrated here is the DNA triplet code, with the first nucleotide of the codon (5′) in the center, the second nucleotide in the middle ring, and the third and final nucleotide in the outer ring (3′).

can bind to both C and U on the mRNA strand. Likewise, a U can bind to both A and G on the mRNA strand. This is generally referred to as the wobble hypothesis.

As the aminoacyl-tRNA binds to the mRNA, the amino acid carried at its 3′ end is linked to the growing chain of amino acids by the formation of a peptide bond. The ribosome then repositions itself to allow the next aminoacyl-tRNA to bind to the next codon, and the process repeats itself. The end of protein synthesis is signaled by one of the so-called stop codons (UAA, UAG, or UGA for RNA; TAA, TAG, or TGA for DNA; see Fig. 2.10). These codons do not have a corresponding tRNA but are recognized by the release factor eRF1. This protein, together with the ribosome-dependent GTPase eRF3, leads to the release of the peptide chain from the ribosome. The released polypeptide chain can then undergo a series of post-translational modifications (such as phosphorylation, methylation, glycosylation, etc.) and adopt its final three-dimensional structure.
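The triplet code and the start and stop signals described in this section can be summarized in a short sketch. It builds the standard cytoplasmic codon table (64 codons, three of them stop codons) and reads an mRNA codon by codon. The function name and the toy mRNA are invented for illustration, and starting at the first AUG encountered is a simplifying assumption: the sketch does not model the Kozak context, tRNA wobble, or the mitochondrial code.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
# One-letter amino acid symbols; "*" marks the stop codons UAA, UAG, UGA.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: release the peptide
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Toy mRNA: short 5'-UTR, AUG, three sense codons, a UGA stop, short 3' tail.
toy_mrna = "GCCACC" + "AUG" + "GCA" + "AAA" + "UGC" + "UGA" + "GG"
print(translate(toy_mrna))                   # Met-Ala-Lys-Cys in one-letter code
print(len(CODON_TABLE))                      # number of possible triplets
print(list(CODON_TABLE.values()).count("L")) # codons coding for Leucine
```

Counting the table entries makes the redundancy mentioned in the text concrete: 61 sense codons are shared among 20 standard amino acids.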

GENETIC ALTERATIONS The correct replication, transcription and translation of DNA is so fundamental to the survival of the species that it is accompanied by numerous quality control and repair mechanisms. It would be beyond the scope of this book to describe this in detail, but some of the mechanisms are being exploited in the creation of genetically altered animals (see Chapter 3). Nonetheless, in spite of these extensive quality control and repair mechanisms, incorrect DNA replication can occur and can lead to more or less serious problems, depending on the nature of the replication error. In addition, environmental toxins and infections can cause genetic alterations. In the remainder of the chapter we will discuss the most important genetic


FIGURE 2.11  The most important genetic alterations. The most important genetic alterations include single nucleotide variants (SNV), in which a single nucleotide is changed. Single nucleotide insertions (SNI) and deletions (SND) involve the insertion or deletion of a single nucleotide and generally lead to frameshift alterations if they occur in exons (upper part of the figure). Tandem repeats can either take the form of a triplet repeat, in which one specific triplet is repeated, or of a variable number tandem repeat (VNTR), which involves repetition of larger nucleotide sequences (lower part of the figure). Finally, in the middle part of the figure, insertions (ins) and deletions (del) involve the insertion or deletion of large nucleotide sequences.

alterations. They are illustrated in Fig. 2.11. There is no clear consensus about how genetic variations should be classified and subdivided (Ku et al., 2010); however, we have chosen to subdivide them based on the length of the nucleotide change, cognizant of the fact that this, too, is quite arbitrary.

SINGLE NUCLEOTIDE VARIATIONS

The simplest variations occur at the single nucleotide level (illustrated in the top part of Fig. 2.11) and include single nucleotide variants (SNV), insertions (SNI), and deletions (SND). A single nucleotide variant (SNV) is defined as a change at the nucleotide level where only one single nucleotide is exchanged for another (Fig. 2.11). SNVs are very common and are usually subdivided into three different types:

• Silent or synonymous mutations: These are mutations in which the single nucleotide change either occurs in a noncoding segment of DNA, or does not lead to a different amino acid. Remember that the genetic code contains a large degree of redundancy. As an example (Fig. 2.11), if the SNV changes the original triplet AAA to AAG, the resulting amino acid is still Lysine.


• Missense or nonsynonymous mutations: These mutations, on the other hand, lead to a change in the amino acid of the peptide. The functional consequence of such an amino acid change is difficult to predict. It can lead to a loss of function, a gain of function, or no change in function, depending on the amino acid change itself as well as on its position. For instance, most neurotransmitter receptors belong to the family of the so-called G-protein coupled receptors. These receptors all have the same basic structure with 7 transmembrane domains. As the membrane of cells is a hydrophobic environment, the amino acids that make up these transmembrane domains all belong to the nonpolar amino acids, such as Leucine or Valine (Fig. 2.8). An SNV that changes AUC to CUC leads to an exchange of Isoleucine for Leucine in a transmembrane domain and would likely have very little impact on the overall functioning of the receptor. However, an SNV that changes the same AUC to AGC would change the neutral hydrophobic Isoleucine to a polar hydrophilic Serine, which would likely have a much greater effect on the functioning. Indeed, we induced exactly such an SNV in the third transmembrane domain of the dopamine D1 receptor in Wistar rats, which led to a virtually complete loss of function (Smits et al., 2006).

• Nonsense mutations: These mutations also involve a single change in the triplet code. However, rather than changing one amino acid for another, the mutation leads to a premature stop codon. As a result, the protein either appears in a truncated form or is never formed because the mRNA is degraded through a process known as nonsense-mediated mRNA decay. This process is part of the quality control system of the cell and degrades nonsense RNA in order to prevent potentially dangerous truncated proteins from being formed (Baker and Parker, 2004).
Nonsense mutations almost invariably lead to a (complete) loss of function, basically creating what is known as a knock-out (see Chapter 3). As an example, we induced a nonsense mutation in the serotonin transporter by changing the codon from UGC (cysteine) to UGA (STOP). Detailed analysis showed that, through nonsense-mediated mRNA decay, no mRNA and no protein was formed, creating a serotonin transporter knock-out rat (Homberg et al., 2007).

In addition to these SNVs, single nucleotide insertions (SNI) and deletions (SND) can occur anywhere within the DNA. When they occur in a coding region, these changes lead to frameshift mutations (as does any insertion or deletion whose length is not a multiple of three), as the reading frame (the codons) shifts toward either the 3′ or the 5′ end of the mRNA. As a result, from the mutation onwards virtually all the amino acids change (Fig. 2.11), and premature stop codons may also occur. As can be imagined, such a frameshift mutation almost invariably leads to a loss of function (unless the mutation is found close to the last amino acid of the protein, in which case the consequences of the frameshift might not be too dramatic).
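The three SNV types and the effect of a frameshift can be made concrete with the codon examples used in the text. The sketch below is self-contained and uses the standard cytoplasmic code; the function names and the toy reading frame are hypothetical, and the classification only looks at a single coding codon (it does not cover SNVs in noncoding DNA, and it would label a change from a stop codon to a sense codon as missense).

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snv(ref_codon, alt_codon):
    """Classify a single nucleotide variant inside one coding codon."""
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must be three nucleotides long")
    if sum(a != b for a, b in zip(ref_codon, alt_codon)) != 1:
        raise ValueError("an SNV changes exactly one nucleotide")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


def read_frame(mrna):
    """Translate from position 0, codon by codon, until a stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


for ref, alt in [("AAA", "AAG"), ("AUC", "CUC"), ("AUC", "AGC"), ("UGC", "UGA")]:
    print(ref, "->", alt, classify_snv(ref, alt))

# Frameshift: delete the fourth nucleotide of a toy reading frame.
toy_frame = "AUGAAAGCUUGGUAA"
shifted = toy_frame[:3] + toy_frame[4:]
print(read_frame(toy_frame), read_frame(shifted))
```

In the toy frame, every codon after the deletion is read differently and the original stop codon is lost, which is the pattern Fig. 2.11 describes.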

TANDEM REPEAT SEQUENCES

A second class of genetic variations are the so-called tandem repeat sequences. This refers to the insertion (or, in rarer cases, deletion) of a repeat sequence within a gene. We generally distinguish between two different types of tandem repeats, namely trinucleotide (triplet) repeats and variable number tandem repeats. Both are illustrated in the lower part of Fig. 2.11.


TRIPLET REPEATS

These genetic alterations involve the insertion of triplet codes (as opposed to single nucleotides). Although this does not lead to a frameshift, it can have detrimental effects, especially when large numbers are inserted. Triplet repeats can occur in both protein-coding and noncoding (including the 5′-UTR) regions and have been implicated in at least 22 hereditary disorders (La Spada and Taylor, 2010), including Fragile X (where a CGG repeat occurs in the 5′-UTR) and Huntington’s disease (where a CAG repeat in exon 1 leads to the insertion of additional glutamine residues). In half of these disorders (11 out of 22) the dysfunction involves a CAG repeat in the coding region of the gene, leading to the insertion of glutamine (Fig. 2.10); these are therefore known as poly-glutamine or poly(Q) disorders. However, other sequences such as CTG, CGG, and GCC can also be repeated.

The most accepted mechanism thought to underlie the development and elongation of triplet repeats involves the development of semistable hairpins and subsequent slippage of the DNA (Mirkin, 2007). As illustrated in Fig. 2.12, semistable hairpins can spontaneously form during DNA duplication, especially with triplets of the form CXG or GXC (where X can be any nucleotide), as the C from the first triplet can form a stable bond with the G of the second triplet and vice versa. Although in the case of CAG the A of the first triplet only weakly binds to the A of the second triplet, the hairpin is still stable enough, especially when there are multiple triplets. A characteristic of such semistable hairpin structures is that they often lead to a similar, out-of-register realignment of the complementary repetitive strands, leading to so-called “slip-outs”, which can also fold into hairpin structures (Mirkin, 2007; Galka-Marciniak et al., 2012).
Such hairpin formations can lead to a stalling of DNA polymerases and can ultimately lead to a reduction or an expansion of the repeats (depending on whether the stable hairpin is formed on the nascent or the template strand). An important characteristic of triplet repeat disorders is genetic anticipation, meaning that the disorder starts progressively earlier and is progressively more severe in subsequent generations. This is due to the increase in the number of repeats with each generation. Studies with several triplet repeat disorders have shown that a threshold exists, below which triplets are fairly stably transmitted. However, once the triplet has crossed the threshold, the triplet can increase massively from generation to generation. For example, in Huntington’s disease, CAG repeats below 35 are fairly normal, generally stable and do not lead to any symptoms. However, CAG repeats above 35 are inherently unstable and lead to the characteristic symptoms of Huntington’s disease. A major challenge in triplet repeat disorders is understanding the switch between below-threshold stable and above threshold unstable triplet repeats (Lee and McMurray, 2014).
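The threshold idea can be illustrated by counting the longest uninterrupted CAG run in a sequence and comparing it with the value of 35 given above for Huntington’s disease. The function names and toy sequences are invented for this sketch, and because the text only specifies "below 35" and "above 35", treating exactly 35 repeats as not expanded is an assumption made here; this is in no way a diagnostic procedure.

```python
import re


def longest_repeat_run(dna, unit="CAG"):
    """Return the largest number of uninterrupted tandem copies of unit."""
    runs = re.finditer(f"(?:{re.escape(unit)})+", dna)
    return max((len(run.group(0)) // len(unit) for run in runs), default=0)


def repeat_status(dna, threshold=35):
    """Compare the longest CAG run with the threshold named in the text."""
    count = longest_repeat_run(dna, "CAG")
    if count > threshold:
        return count, "above threshold (unstable)"
    return count, "not above threshold (generally stable)"


toy_normal = "ATG" + "CAG" * 20 + "CCGCCA"
toy_expanded = "ATG" + "CAG" * 42 + "CCGCCA"

print(repeat_status(toy_normal))
print(repeat_status(toy_expanded))
```

The same counting function can be pointed at other repeat units such as CGG by changing the `unit` argument.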

VARIABLE NUMBER TANDEM REPEATS (VNTRs)

VNTRs are structural regions of the DNA where a short sequence of nucleotides (longer than 3) is repeated a variable number of times in tandem. VNTRs are commonly subdivided into microsatellites (repeat sequences shorter than 5 nucleotides) and minisatellites (repeat sequences larger than 5 nucleotides) and, like triplet repeats, are thought to be due to DNA slippage errors during DNA replication. As the number of repeats is very individually


FIGURE 2.12  The most likely mechanism underlying triplet repeats. Although the development of triplet repeats is not yet fully understood, the most likely mechanism involves the formation of stable hairpins within the DNA. This leads to an out-of-register realignment and so-called “slip-outs”, which in turn result in hairpin structures in the complementary DNA strand and in triplet replication.

determined (and generally differs between the maternal and paternal copy), analysis of VNTRs is often used in forensic research and paternity testing. In neuroscience, VNTRs occur in several important genes. Often studied examples are the 40 nucleotide sequence in the 3′ UTR of the dopamine transporter (Vandenbergh et al., 1992), which can be repeated between 3 and 11 times, with the 9- and 10-repeat being the most common, and the 17 nucleotide sequence in intron 2 of the serotonin transporter, which is repeated 9, 10, or 12 times (Lesch et al., 1994). In several studies, the 12-repeat has been linked to increased aggression and


mood disorders (Haddley et al., 2012). Finally, an often studied VNTR is within exon 3 of the dopamine D4 receptor, with 2 to 10 repeats of a 48 nucleotide stretch (Van Tol et al., 1992). This VNTR has been associated with drug addiction and ADHD (see Chapters 6 and 8).

SHORT INSERTIONS AND DELETIONS (INDELS)

As the words imply, these genetic variations refer to the insertion or deletion of short nucleotide sequences, as indicated in the middle of Fig. 2.11. A general, albeit arbitrary, rule of thumb is that insertions or deletions have sizes between 100 and 1000 nucleotides, while everything above 1000 is referred to as a copy number variation. However, this does not encompass the large number of indels with sizes below 100 nucleotides. It has been estimated that the human genome contains about 1.6 to 2.5 million indels, although a full-scale analysis has yet to be undertaken. Indeed, one study identified almost 2,000,000 nonredundant indels with about equal numbers of insertions and deletions (Mills et al., 2011); however, the sizes of the indels were between 1 and 10,000 nucleotides, thus including SNI, SND, and CNVs, although the majority were <100 nucleotides. These indels were found on all chromosomes with a frequency of 1 indel per almost 1600 nucleotides. More than 800,000 indels were found in known genes, although only 2123 indels actually affect exons. However, more than 39,000 indels were found in promoter regions of known genes and could thus affect gene expression.

In this respect, probably the best studied genetic variation in neuroscience is an indel within the promoter region of the serotonin transporter (SERT), the so-called 5-hydroxytryptamine transporter linked polymorphic region (5-HTTLPR). This region contains a 44 nucleotide insertion/deletion, leading to a long (l) and a short (s) variant (Lesch et al., 1996). Although it was originally thought that only the s-allele conferred lower SERT activity, further detailed analysis has identified an additional SNV within the long variant (LA and LG), with the LG also having low SERT activity (Murphy et al., 2008). Hence, it is now thought that only the presence of the LA genotype confers high SERT activity.
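The length-based subdivision used in this chapter can be written down as a simple rule. The function name and category labels below are illustrative only; the cut-off of 1000 nucleotides is the arbitrary rule of thumb from the text, and, as noted there, indels shorter than 100 nucleotides are grouped with the other indels here.

```python
def classify_length_variant(n_nucleotides):
    """Label an insertion or deletion by the number of nucleotides involved."""
    if n_nucleotides < 1:
        raise ValueError("length must be at least one nucleotide")
    if n_nucleotides == 1:
        return "single nucleotide insertion/deletion (SNI/SND)"
    if n_nucleotides <= 1000:
        return "indel"
    return "copy number variation (CNV)"


for length in (1, 44, 1000, 1_500_000):
    print(length, classify_length_variant(length))

# Rough consistency check on the reported density: about 2,000,000 indels at
# one per roughly 1600 nucleotides corresponds to about 3.2 billion
# nucleotides, which is the approximate size of the human genome (a general
# figure, not one taken from the text).
print(2_000_000 * 1600)
```

The 44 nucleotide 5-HTTLPR insertion/deletion falls in the indel category under this rule, whereas the multi-megabase deletions discussed for CNVs fall well above the cut-off.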

COPY NUMBER VARIATIONS (CNVs)

CNVs also represent insertion/deletion alterations. However, they are generally defined as insertions or deletions that are larger than 1000 nucleotides. It has been estimated that up to 12% of the human genome is subject to CNVs, and they can occur during meiosis as well as during mitosis, the latter being evidenced by the fact that monozygotic twins can have different CNVs (Bruder et al., 2008). It has been suggested that CNVs are as important as SNVs in determining individual differences. Although it was originally thought that CNVs can be advantageous, as they induce redundancy and can therefore protect against mutations, much of the variation in CNVs is now considered to be detrimental, as CNVs are involved in a multitude of diseases, including various cancers (Valsesia et al., 2013) and CNS disorders (Morrow, 2010).

A change in copy number requires a change in chromosomal structure, joining previously separate strands of DNA. Detailed analysis of these junctions shows that most CNVs are located in areas with high homology within the genome (Hastings et al., 2009). This suggests that CNVs are the result of abnormal homologous recombination. In addition, CNVs can occur in areas


with only a limited degree of homology (often referred to as microhomology). In such cases, CNVs are unlikely to be due to aberrant homologous recombination.

Homologous recombination (HR) was already discussed in Chapter 1 as the mechanism underlying the crossing-over event that occurs during meiosis. It requires extensive DNA sequence identity (up to 300 nucleotides in mammalian cells (Liskay et al., 1987)), as well as a specific strand exchange protein (Rad51 in eukaryotic cells). HR underlies many DNA repair processes by using identical sequences to repair a double-stranded DNA break. When the process involves exchange with the homologous chromosomal position in the sister chromatid, no structural change will take place. However, if the exchange involves a homologous segment at a different position or on another chromosome (a process referred to as nonallelic homologous recombination, NAHR), CNVs can occur. NAHR induces an unequal crossing over, leading to a duplication of a DNA segment in one chromosome and a deletion of that segment in the other. Homologous recombination is also important for repairing broken replication forks and, if not performed accurately, can lead not only to duplications or deletions, but also to translocations (ie, exchange of DNA sequences between different chromosomes) or inversions (in which a stretch of DNA is reinserted in reverse orientation in the same chromosome). These latter two structural changes (translocations and inversions) are generally known as copy neutral variations, in contrast to deletions and duplications, as the total number of nucleotides does not change.

In the field of psychiatry, CNVs have received increased attention, especially in autism and schizophrenia (see Chapters 8 and 9), and it has been suggested that both disorders are prone to de novo CNVs. In this respect, one of the most studied CNVs is the 22q11.2 CNV, involving a deletion of 1.5 to 3 million base pairs and more than 25 different genes.
Individuals with this deletion show symptoms reminiscent of schizophrenia, autism, anxiety, and ADHD (Jonas et al., 2014). Another often studied CNV is a deletion of 15q11-13. The interesting aspect of this CNV is that its functional consequences depend on whether the deletion is on the maternal or the paternal copy. If the deletion comes from the mother, the child will suffer from Angelman syndrome, whereas if the deletion is inherited from the father, the child will suffer from Prader-Willi syndrome. The reason for this difference lies in the fact that the genes of the paternal and maternal copies are differentially expressed, a phenomenon known as genomic imprinting (Mabb et al., 2011).
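The unequal crossing over attributed to NAHR in this section can be sketched with chromosomes represented as lists of segments. The segment names, the function name, and the choice of break points are all hypothetical; the sketch only shows why a misaligned exchange between two copies of a repeat produces one recombinant with a deletion and one with a duplication.

```python
def unequal_crossover(chrom_a, chrom_b, break_a, break_b):
    """Exchange everything distal to the chosen repeat copy on each chromosome.

    break_a and break_b are the list indices of the repeat copies that have
    (mis)aligned with each other.
    """
    recombinant_1 = chrom_a[:break_a + 1] + chrom_b[break_b + 1:]
    recombinant_2 = chrom_b[:break_b + 1] + chrom_a[break_a + 1:]
    return recombinant_1, recombinant_2


# Two homologous chromosomes carrying the same repeat ("rep") on both sides
# of segment B.
toy_chromosome = ["A", "rep", "B", "rep", "C"]

# Allelic exchange (same repeat copy on both): no structural change.
same_1, same_2 = unequal_crossover(toy_chromosome, toy_chromosome, 1, 1)

# Nonallelic exchange: first repeat copy pairs with the second repeat copy.
with_deletion, with_duplication = unequal_crossover(
    toy_chromosome, toy_chromosome, 1, 3
)
print(with_deletion)
print(with_duplication)
```

The total number of segments across the two recombinants is unchanged; the copy of segment B lost from one chromosome is the extra copy gained by the other.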

CHROMOSOMAL VARIATIONS

The last group of genetic variations we will discuss are alterations in the number of chromosomes. Under normal circumstances, an individual obtains a single set of chromosomes from each parent, thus leading to a so-called diploid genome. As discussed more extensively in Chapter 1, this results from the separation of the chromosomes during meiosis, when sperm and egg cells become haploid (ie, carry only one set of chromosomes), and the subsequent fusion of the gamete cells during fertilization. However, in certain circumstances the chromosomes fail to separate during meiosis, leading to gametes with two copies of a given chromosome or none at all. When such a gamete is subsequently fertilized, the resulting cells will contain an abnormal number of chromosomes (generally referred to as aneuploidy, leading to either monosomy or trisomy). In the vast majority of cases, such embryos are not viable, and in fact they are likely to constitute the majority of miscarriages. However, a few exceptions exist, especially when the smaller autosomal


chromosomes or the sex chromosomes are involved. The most common autosomal trisomies are trisomy 21 (Down syndrome, occurring in about 1 in 1000 live births) and trisomy 18 (Edwards syndrome, about 1 in 6000 live births), although cases of trisomy 13, 9, 8, and 22 have also been described. Trisomies can also occur in the sex chromosomes, leading to disorders such as triple X (XXX), XXY (Klinefelter syndrome), or XYY. These sex chromosome trisomies each occur in roughly 1 in 1000 live births and, compared to the autosomal trisomies, present a much more subtle phenotype. In fact, many females with XXX or males with XYY are not diagnosed at all.

Monosomy, that is, missing one copy of a chromosome, appears much more detrimental than having an additional copy. As a result, with the exception of Turner’s syndrome (X0), no monosomies appear viable. However, partial monosomies (perhaps more accurately referred to as CNVs) such as “cri du chat” (deletion of the end of the short arm of chromosome 5) and 1p36 syndrome (deletion of the end of the short arm of chromosome 1) have been described. Turner’s syndrome occurs when only an X chromosome is present (in the absence of another X or Y) and is found in about 1 in every 2000 live births (although about 99% of all cases result in spontaneous termination in the first trimester). The symptoms vary considerably from patient to patient and often include short stature, gonadal dysfunction, and sterility. However, additional health problems, including congenital heart disease, diabetes, and cognitive deficits, are also frequently observed.
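The arithmetic behind trisomy and monosomy can be spelled out in a few lines. The sketch assumes the human haploid number of 23 and a single nondisjunction event affecting one chromosome; the function name and labels are hypothetical and are meant only to show how a gamete with one chromosome too many or too few leads to 47 or 45 chromosomes after fertilization.

```python
HAPLOID_HUMAN = 23  # chromosomes in a normal human gamete (assumed here)


def zygote_karyotype(egg_count, sperm_count):
    """Return the chromosome total of the zygote and a descriptive label."""
    total = egg_count + sperm_count
    diploid = 2 * HAPLOID_HUMAN
    if total == diploid:
        label = "diploid"
    elif total == diploid + 1:
        label = "aneuploid: trisomy"
    elif total == diploid - 1:
        label = "aneuploid: monosomy"
    else:
        label = "other aneuploidy"
    return total, label


print(zygote_karyotype(HAPLOID_HUMAN, HAPLOID_HUMAN))      # normal gametes
print(zygote_karyotype(HAPLOID_HUMAN + 1, HAPLOID_HUMAN))  # gamete with an extra copy
print(zygote_karyotype(HAPLOID_HUMAN - 1, HAPLOID_HUMAN))  # gamete missing a copy
```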
