Appendix to chapter 6 (Contributed by Rodger Staden, M.R.C. Laboratory of Molecular Biology, Cambridge, U.K.)
This appendix contains notes on how to start a database for the DB system and instructions on how to run some of the programs described in Chapter 6. When typing lines of commands for programs each line must be completed by a carriage return and in the descriptions below this is often referred to by cr.
Getting started with the DB system DBSTART This program is used to start a new database. It writes each of the files and sets the number of gels and contigs to zero. The files it creates are described below but note that the version number will be set to 0. Running DBSTART 1. Start the program. 2. Type project name (only six characters for our file naming conventions).
Reserved file names Each sequencing project is given a name for its files. Local file naming conventions will determine how names are set up. In our system file names are of the form ABCDEF.GHI We give each project a 6-character name, for example, LAMBDA; DBSTART will then automatically name our file of relationships LAMBDA.RL0, the file containing the archived gel 349
350
DNA SEQUENCING
reading names LAh4BDA.ARO and the working versions of gel readings LAh4BDA.SQO.
Running the programs BATIN Running the program 1. Start the program. 3. Type name for file of file names-‘FILE OF FILE NAMES’ ‘PLEASE TYPE NAME OF FILE 1’ 3. Type name for next gel reading-‘FILE NAME FOR NEXT GEL READING’ 4. Type in data on lines of <80 characters. Finish data with an c? character. ‘TYPE DATA IN NOW’ 5. If you wish to enter another gel reading type Y or caniage return only. ‘TO ENTER ANOTHER GEL READING TYPE Y’ If you t y p e Y control goes back t o step 3. 6. When all the gels have been entered and step 5 is answered by carriage return only the program asks if you wish t o have the gels printed. If you do, type Y and all the gels will be listed one after the other headed by their file names, or else type camage return only and the program stops. ‘TO GET GELS PRINTED TYPE Y 7. The program stops. DBCOMP Running the program 1. Start the program. 2. Type name of file of consensus and/or vector sequence file names: ‘FILE OF MASTER FILE NAMES =’ This must be written beforehand using a system editor. 3. Type name of file of gel reading names: ‘FILE O F G E L READING NAMES =’ This is written by BATJN. 4. Type a value for the minimum number of identical consecutive characters to define a match: ‘MIN =’ 5. Searching starts now.
Ch. 6
APPENDIX
351
DBUTIL
Running the program 1. Start the program. 2. Type name of database ‘PROJECT NAME =’ 3. Type version number ‘VERSION NUMBER =’ 4. Select 60 or 120 character lines for display: ‘FOR 120 CHARACTER LINES TYPE Y’ (This is to allow the use of both wide and narrow keyboard output). 5. Select option by number: ‘SELECT OPTION BY NUMBER STOP = 0, ENTER = 1, PRINT = 2, DISPLAY = 3, JOIN = 4, COMPLEMENT = 5, EDIT = 6, SEARCH = 7, FIX = 8, COPY = 9, CHECK = 10, SCAN = 11, CONSENSUS = 12 ‘OPTION NUMBER =’ ENTER ENTER allows entry of a new gel into the database system. It is necessary to know beforehand if the new gel overlaps with any sequences already in the database. (This is determined using DBCOMP.) It automatically numbers and writes the working version of the new gel and then offers the following options: (a) Complementing and reversing of the new gel (to add it to existing data it may need its sense changed); (b) Adding to an existing contig either at a left or right end or internally; (c) Movement of overlap; (d) Display of overlap; (e) Editing of the new gel. (It is only during entry into the database that gels can be edited individually. Once entry is completed all gels covering any position are edited together to maintain the alignments of the sequences.) (€) Editing of the contig. A basic requirement is that the data must be correctly aligned by the user when he enters it into the database. If the gels do not line up the user can use the editing
352
DNA SEQUENCING
functions to improve their alignment and this can always be achieved by making only insertions in the sequences. In this way n o data need ever be deleted until suficient agreement is found between the gels covering every sequence position. We use X or spaces as padding characters to achieve alignment so that problem areas stand out clearly in the lined-up sequences. 1. Type name of gel to enter ‘NAME OF ARCHIVE =’. If the gel has already been entered the program will type ‘GEL ALREADY IN LINE X, ENTRY STOPPED’ and return to DBUTIL. The program types the name of the working version of this gel. 2. Tell program if gel overlaps data already in the database: ‘IF THIS G E L OVERLAPS TYPE Y’ else t y p e camage return. If the gel does not overlap there is nothing else to do for entry and control passes back to DBUTIL. 3. Tell the program if the gel needs to be reversed and complemented t o be in the same sense as the contig it overlaps (reported by DBCOMP) ‘TO REVERSE A N D COMPLEMENT THIS GEL TYPE Y’ else carriage return only. 4. Tell program if the gel extends the contig leftwards: ’IF THIS NEW G E L EXTENDS THE CONTIG LEFTWARDS TYPE Y’. 5. Next prompt depends on step 4 and defines the position of the overlap. IN (a) For a gel that extends the contig leftwards-‘POSN NEW GEL OF LEFT CHARACXER IN O L D CONTIG =’ (b) For other overlaps-‘POSN IN CONTIG OF LEFT END -
If these numbers are out of range the program will reprompt. 6. The overlap is now displayed for its first 120 characters. The display consists of: on the top line the relative positions for the contig characters and then on succeeding lines the sequences from each of the gels that are in this part of the contig. At the left end of each gel is shown its gel number with the sign indicating its sense relative t o its archive. On the next line down is shown the consensus sequence for the gels lined up above and then below that is the sequence of the new gel. Asterisks below this line show
Ch. 6
APPENDIX
353
mismatches between the new gel and the consensus and finally on the next line are the positions for the new gel. 7. The program then needs to know if the position of the left end of the overlap is correct: ‘IF JOINT CORRECT TYPE Y If it is, type Y, if not, carriage return, and the program goes back to step 5. 8. The program now offers a number of options to allow the user to align the new gel correctly over its whole length with the data already in the contig. It is important that sufficient edits are made to the new gel or the gels in the contig at this stage to get the alignment correct because once entry is completed the alignment is fixed and cannot easily be changed (see DBFIX). Alignment can be achieved by making insertions or deletions but deletion of data requires the original gels to be checked. For this reason at entry we usually make only insertions to achieve alignment. We use X or spaces as padding characters to achieve alignment and so can distinguish padding characters from characters assigned from reading gels. Note: ‘EDIT NEW GEL‘ will now allow the insertion of space characters so an X character must be used for this function. The options offered are: ‘SELECT OPTION BY NUMBER’, ‘EDIT NEW GEL= 1, DISPLAY = 2, COMPLETE ENTRY = 3, GIVE UP = 4, EDIT CONTIG = 5’, ‘OPTION NUMBER =’ 9. EDIT NEW GEL is identical to SEQEDT except that it is limited to 200 characters per edit command string. Note: if one deletes or inserts at the leftmost character in the gel the position of the gel relative to the contig changes and the whole gel will move left or right. 10. DISPLAY allows display of the region of overlap only. This is defined by the relative positions in the contig. 11. COMPLETE ENTRY adds all of the information about the new gel to the database and returns to DBUTIL. 12. GIVE UP returns to DBUTIL with no information about the new gel being added to the database. N.B.: Both ‘complete entry’ and ‘give up’ options request the
354
DNA SEQUENCING
user to confirm these options are what he wants by typing ‘if you are sure type Y’. If the user does not t y p e Y the program goes back to step 8. 13. EDIT CONTIG allows editing of the contig, i.e., all of the gels covering any edit position are edited simultaneously so as to maintain the alignment of the gels. The program protects the user by allowing him to edit only within the region of overlap. See EDIT CONTIG described for the main options of DBUTIL.
PRINT This will print the contents of the database in three ways: (a) All the contig descriptor lines followed by all the gel descriptor lines in the order they appear in the file. (b) All the contigs are printed one after the other with their gel descriptor lines in the order they have in the contig. Each contig is headed by its contig descriptor line. (c) Selected contigs only. Each contig is headed by its contig descriptor line and followed by the gel descriptor lines that cover the region defined by the user. Steps 1. If you want option (c) type Y, else carriage return: ‘TO SELECT CONTIGS TYPE Y’ l(a). Define contig by left gel number: ‘NUMBER OF LEFT G E L THIS CONTIG =’ l(b). Define region of contig to print over: ‘RELATIVE POSITION OF LEFT E N D =’, ‘RELATIVE POSITION OF RIGHT E N D =’. Carriage return defaults to whole contig. l(c). To select another contig type Y: ‘TO SELECT ANOTHER CONTIG TYPE Y’. 2. For option (b) type Y, option (a) results if you type only carriage return ‘FOR SORTED DATA TYPE Y’. DISPLAY DISPLAY will display on the printer or screen all the gzls covering any region of the sequence, lined up in register showing their gel numbers, their strandedness and, underneath, their consensus.
Ch. 6
APPENDIX
355
(1) Define the contig you wish to display by the number of its left gel -‘NUMBER OF LEFT GEL THIS CONTIG = ’. (2) Define the region of the contig you wish to have displayed. Carriage return only defaults to whole of contig. ‘RELATIVE POSITION OF LEFT END =’, ‘RELATIVE POSITION OF RIGHT END =’. Printing will then start and when finished control passes back to the main options.
JOIN This function allows the joining of two contigs. It allows the user to correctly align the ends of the two contigs by editing each contig separately. It is important that the alignment achieved is correct because once the join is completed the alignment is fixed. (1) Define the left contig by the number of its leftmost gel‘LElT CONTIG, ‘NUMBER OF LEFT GEL THIS CONTIG
-
- 9
(2) Define the right contig by the number of its leftmost gel‘RIGHT CONTIG, ‘NUMBER OF LEFT GEL THIS CONTIG
- 7
(3) Define the position of overlap-‘RELATIVE POSITION IN LEFT CONTIG OF LEFT CHAR OF RIGHT CONTIG =’. The overlap must be of at least one character. If this criteria is not fulfilled the program prints ‘ILLEGAL JOJN. TO RETURN TYPE -9’ and then returns to step 3. If the user then types -9 the program returns to the main options in DBUTIL without making the join. (4) The program then displays the join showing all the gels overlapping the join from the left contig, their consensus, all the gels from the right contig that overlap the join, their consensus and then asterisks to denote mismatches between the two consensuses. (5) The program then offers the user options to enable him to align the two contigs. These options are: ‘SELECT OPTION BY NUMBER’, ‘MOVE JOIN = 1, EDIT LEFT CONTIG = 2, EDIT RIGHT CONTIG = 3, DISPLAY = 4, COMPLETE JOIN = 5, GIVE UP = 6’.
356
DNA SEQUENCING
a. Move join sends control back t o step 3. b. Edit left contig allows editing of the gels in the left contig (see EDIT CONTIG). c. Edit right contig as above. N.B.: The program protects the user by only allowing edits t o be made in the region of overlap. d. Display allows display of the overlapping region. e. Complete join sets up all the values in the database and returns control t o DBUTIL main options. After selecting this option the user must confirm it by typing Y when the program asks ‘if you are sure type Y’. If the user does not confirm control passes back to step 5. f. Give up returns control t o the main options of DBUTIL without any changes being made t o the database. COMPLEMENT This function will complement and reverse all of the gels in a contig. It automatically reverses and complements each gel sequence, reorders left and right neighbours, recalculates relative positions and changes each strandedness. The only user input required is to define the contig t o complement by the number of its leftmost gel. It will take a few moments for the program to complete this action. Control then passes back to the main options of DBUTIL. EDIT CONTIG EDIT CONTIG allows edits t o be made t o all of the gels covering any position in the contig. It maintains the alignments of the gels by always making the same number of insertions or deletions in all the gels. Note that these edits are immediately carried out and the ‘give-up’ options of ENTER and JOIN do not undo them. The user must undo them himself. Note that if this option has been entered from either ‘enter’ or ‘join’ options the program will restrict edits to the region of overlap. If the user tries to edit outside, the program reprompts for the position. To escape from
Ch. 6
APPENDIX
357
the insert or delete options the user can type a negative number for the position. To escape from the ‘number of chars’ stage the user can again type a negative number. The options offered by EDIT CONTIG are: ‘INSERT = 1, DELETE = 2, CHANGE = 3, RETURN = 4, OPTION NUMBER =’. 1. INSERT (a) Type position to perform insertion: ‘POSITION =’. (b) Type number of characters to insert: ‘NUMBER OF CHARS
-9
The maximum number allowed at any insertion point is 80. (c) For each gel in turn the program then prompts for the characters to insert at the edit position: ‘CHARS TO INSERT INTO GEL X =’. If the user types only carriage return the program will automatally insert space characters. When all the gels have been done control passes back to the main options.
2. DELETE (a) Type position to perform deletion: ‘POSITION =’. (b) Type number of characters to delete: ‘NUMBER OF CHARS -7 -
The program will then delete the given number of characters from all the gels covering the edit position and control will pass back to the main options. 3. CHANGE This function allows characters in individual gels to be replaced. (a) Type gel number to edit: ‘GEL NUMBER =’. The program responds with the relative position and length of the selected gel in case the user only knows the edit position relative t o the gel. (The edit position must be relative to the contig.) (b) Type position of first character to change: ‘POSITION =’. (c) Type the number of characters t o change: “UMBER OF CHARS =’. (d) Type of new characters: ‘NEWCHARS =’.
358
DNA SEQUENCING
The program will then replace the characters at the edit position and control passes back t o the main options. 3. RETURN This function simply passes control back to the part of the program from which the edit contig routine was entered.
SEARCH This function allows a search to be made for a gel by its archive name. If the gel is found its gel descriptor line is printed, if the gel is not found the program prints ‘NOT IN DB’. 1. Type name of archive: ‘ARCHIVE NAME =’. 2. To search for another type Y: ‘TO SEARCH FOR A GEL TYPE Y , else type only carriage return t o get back t o the main program.
DBFIX This program exists to cope with any problems that occur with the database. Its use requires a good understanding of the file structure of the system. With imagination a skilled user can correct any mistakes and misalignments in his data. It has the following options: 1 . LINE CHANGE allows the user to change the contents of any line in the file of relationships. The line is selected by number, the program prints the current line and prompts for the new line. The user should type the new values, each followed by a comma. The program then asks the user t o t y p e ‘Y’ if he is sure he has made the changes he wanted. If the user types anything else the line is left unchanged. 2. SHIFT allows the user to change all the relative positions of a set of neighbouring gels by some fixed value, i.e., it will shift related gels either left o r right. It can therefore be used to change the alignment of the gels in a contig. It prompts for the number of the first gel to shift and then for the distance t o move them. It then moves rightwards (i.e., follows right neighbours) and shifts each gel, in turn, up to the end of the contig. (This means that only
Ch. 6
APPENDIX
359
those gels from, the first to shift, to the rightmost, are moved.) It then reminds the user to change the length of the contig accordingly (use line change). 3. EDIT allows the user to edit an individual gel independently of any others it may be related to. The commands are as for SEQEDT or EDIT NEW GEL in ENTER. The effect of this editing on the length of this gel or its relationship to others must be accounted for (if necessary) by use of the line change function. Maximum number of edit string characters is 200. 4. SQUASH is a function that deletes a contig line by moving down all the contig lines above by one position. It prompts only for the line to delete. It does not delete any of the gels or gel lines for the deleted contig but it does reduce the number of contigs on line 1000 by 1. 5. RENAME GEL is a function that is used to rename the archive names of gels in the database; it only changes the name in the . A R N file of the database: it does not change anything else.
COPY COPY is a function that makes a copy of all of the data in all of the files in the database. It prompts for a version number for the new copy. CHECK There is no user input required once this option has been selected. SCAN
The user is requested to select the contig and the section of it he wishes to scan. He is then asked to supply a percentage (see the scan algorithm). The program prints out a summary for the region in terms of the percentage number of positions that have been given each of the 5 codes and then asks the user if he wants to have the codes printed out in the form of a sequence. The user is then asked if he wants to store this sequence of codes in a disk file for further processing; if so he is asked to give a name for the file.
360
DNA SEQUENCING
CONSENSUS 1. Type name for consensus file: ‘NAME FOR CONSENSUS FILE =’. 2. If you wish to select contigs type Y or carriage return: ‘TO SELECT CONTIGS TYPE Y’. 3. If selecting contigs supply number of left gel: ‘NUMBER OF LEFT GEL THIS CONTIG =’. 4. If selecting contigs define region of current contig: ‘RELATIVE POSITION OF LEFT END =’, ‘RELATIVE POSITION OF RIGHT END =’, (zeroes give whole contig). 5. If selecting contigs the user can cycle round steps 3 and 4 by typing Y: ‘TO SELECT ANOTHER CONTIG TYPE Y . 6. To write another separate consensus file type Y: ‘TO CALC ANOTHER CONSENSUS TYPE Y . If so, go back to step 3, or return. SEQEDT Running the program Reference should be made to the examples especially when preparing the edit commands. These are best prepared before running the program and written out on paper so that they can be copied at the keyboard. It is easiest to use a keyboard listing of the file to be edited t o write the edit commands on. There are two modes of program operation. A. Creation of a completely new file: 1. Start the program.
2. You do not wish to edit an old file, so type cr on receipt of the message: ‘TO EDIT A N OLD FILE TYPE Y’. 3. Supply name of output file when asked: ‘PLEASE TYPE NAME OF FILE 2’. 4. Type in edits. A few seconds after you have completed the edits with an @ character, keyboard output will start giving a copy of the data just written to disk.
Ch. 6
APPENDIX
36 1
B. Editing an old file 1. Start the program. 2. You do wish to edit an old file, so type Y cr on receipt of the message: 'TO EDIT AN OLD FILE TYPE Y'. 3. Supply names of input (FILE 1) and output (FILE 2) files when asked. 4. Type in edit commands. N.B.: The delete and insert commands may be used in reverse order and still produce the same result, e.g. D/3/I/abc/ is equivalent to I/abc/D/3/. N.B.: To insert data at position 1in a file do not find position 1 by command/F/l/I--, etc. The pointer is already at position 1 so just type /I/--. SEQLST Running the program 1. Start the program. 2. Supply name of the file containing sequence. 3. Define region to be listed (printed). 4. Select single- or double-stranded printing by typing 1 or 2. Typing cr only is equivalent to typing 1. If double-stranded output is selected there will be a short pause while the program creates the complementary strand. 5. Define number of gaps between lines (for 120 character line output). 6. Select 60 or 120 character line output. When printing has finished the program requests the next start position for any more printing required. A negative start position will stop the program.
TRNTFW, TRANMT Running the program 1. Start the program. 2. Supply the name of the file containing the sequence.
362
DNA SEOUENCING
3. Select code. 4. Supply printing start and stop positions. These are sequence numbers to define which region of the sequence file is to be
printed. The start position is always in phase 1 and determines the phase of all the genes. If zero values are supplied by the user typing only cr the whole file will be listed. 5. Supply first and last sequence numbers to define each gene or region t o be translated. After each pair of numbers has been supplied there will be a slight pause while the program calculates the translation. Then the program requests the next pair of numbers. This will continue until the user supplies zero start and end positions by typing only cr. Printing will then start. 6. When printing has finished the program is ready for further instructions and user input commences at step 3. A negative printer start position will stop the program.
TRANDK Running the program 1. Start the program. 2. Type name of input file containing D N A sequence. 3. Type name of output file to write the translation to. 4. Select code. 5. Type start and stop sequence numbers for each region you wish to translate. A zero start position for a region will signal that you have finished defining regions and the program will write the translation t o disk and stop.
SEARCH, SRCHMU Layout of the names and strings file This file must be laid out as shown in the example with names and strings separated by slashes. The end of any set of strings is terminated by an extra slash. The user may supply names in any
Ch. 6
363
APPENDLX
order whilst running the program. Names such as HaeII and HaeIII must appear in the file in this order. Lines of input may be up to 80 characters in length.
Keyboard string layout Individual strings are followed by a slash character and sets of strings are terminated by an extra slash, Lines of input may be up to 80 characters in length. Input is terminated by an (2 character. Keyboard name layout Names are followed by a slash character. Lines of input may be up to 80 characters in length. Input is terminated by an @ character. Example DDEl/CT-AG// ALUl/AGCT// HPAl/GTTACIC// HPAll/CCGG// RSAl/GTAC// THAl/CGCG// MBOl/GATC// HHAl/GCGC// HCIEl/AGGCCT/bGGCCFI/TGGCCl HAEll/OGCGCC/GGCGCT/AGCGC H .A-E-l l-l /-G G C // - -C . TAQl/TCGW/ ECOPl/AGACC/GGTCT// ECORl/GAATTC// ECORll/CCCIGG/CCTGG// HGAl/GACGC/GCGTC// HINFl/GA-TC// MROll/GACIGA/TCTTC//
-
#'TGGCCA// 1'/CIGCGCC//
Running the program (1) Start the program. (2) Supply the name of the file you wish to search through. (3) Select option by typing A, N or S. Option S is the one for which the user supplies strings from the keyboard as described in 'Keyboard String Layout'. Options A and N require the operator to supply the name of the file containing names and strings. This file is arranged as shown in 'Layout of Names and Strings File'. All three options are described above. (4) Change the search area if required by typing sequence character numbers to define the beginning and end of the region to be
364
DNA !EQUENC!ING
searched. To leave the area unchanged simply type cr. If the region has not previously been defined and the user types only cr the region will be defined as the whole of the file. (5) If option S is chosen supply the strings as described. If option N is chosen supply the names when the program prints the message ‘TYPE R. ENZYME NAMES N O W . The layout of this input is described in ‘Layout of Keyboard Name Input’. ( 6 ) When output is complete for all strings the program cycles round and requests the user to select his next option. If no option is selected, i.e., if the user types only cr, the program will stop.
SEQFIT Running the program Strings can be supplied to the program in one of two ways: they can either be typed in from the keyboard 0; extracted from a disk file. The program asks the user to select the string input mode at the beginning of the program run. This decision is binding for this run of the program, i.e., to change the string input mode requires the program to be restarted. 1. Start the program. 2. Define the string input mode. When the program types ‘TO TYPE IN STRINGS TYPE Y’, for string input from keyboard type Y, cr. For input from a disk file type only cr. 3. Supply name of disk file containing sequence. (a). For string input from disk supply name of disk file containing strings. 4. Supply strings (a) by typing them in; (b) by defining a region of the second disk file. Strings input from keyboard are terminated by an @ character. Strings must be <200 characters in length. 5. Define region of file 1 to be searched. If only cr is typed the program will search the whole file. 6. Define quality of fit as a percentage. Once this has been typed the search starts. When this has been done the program prints out the total scoring positions. These are then sorted into
Ch. 6
APPENDIX
365
descending order and then the top ten scores and their corresponding positions are written out on the keyboard. 7. Define the number of these scoring positions to be printed out on the keyboard. These positions will then be printed out starting with the highest score and working down. When printing has finished the user is offered, in turn, four options. If he wants any of them he types Y cr, if not, only cr. The options are to: (a) search for the complement of the last string. (b) supply a new string. (c) change the region to be searched. (d) change the percentage. (Options a and b are mutually exclusive.) If no option is selected the program will stop. 8. When option selection is completed the program prompts the user to supply the relevant input for the options selected (as above) and then searching commences as before.
OVRLAP
Running the program 1. Start the program. 2. Type name of file 1. 3. Type name of file 2. 4. If required, restrict REGION of file 1 to search through. Carriage return for both will means the whole file is searched. 5. If required, restrict STRING from file 1 to compare with file 2. Carriage return will mean the whole file is compared. 6. Type MIN the minimum overlap. Comparison will now begin.
SQRVCM Running the program 1. Start the program. 2. Type name of input file (FILE 1). 3. Type name of output file (FILE 2).
366
D N A SEQUENCING
HAIRPN, HAIRGU Running
the program
1. Start the program. 2. Type name of sequence file. 3. Type start and stop positions t o define region for program to search through. 4. Type MIN and MAX to define loop size, i.e., range of numbers of bases t o loop out. 5. Type minimum number of pairings required for any loops, i.e., a minimum value for p. The search now starts and takes a few seconds. When it is complete the program asks the user to select for output sorting: 6. Type S for sorting on size of stem i.e., highest p first; or type C for sorting on centres of loop or type B for both forms of sorting printed one after the other. Output then starts. 7. When output is finished the program is ready for a new region definition. A negative start position will stop the program.
Running
the program
1. Start the program. 2. Supply name of file containing protein sequence in one letter amino acid codes. 3. Define the region of the file to count over. Zero start and stop positions result in the whole sequence being counted, a negative start position stops the program.
TRNA Running the program
1. Start the program. 2. Type name of file containing sequence: ‘PLEASE TYPE N A M E OF FILE 1’.
Ch. 6
367
APPENDIX
3. Define region to search: ‘FIRST SEQUENCE NUMBER =*, ‘LAST SEQUENCE NUMBER =’ (default is whole sequence, a negative first sequence number will stop the program). 4. Define a maximum length for the tRNA: ‘MAX TRNA LENGTH =’ (default = 92). 5. Define minimum scores for each of the stems: ‘MIN SCORE AMINO ACYL STEM =’, ‘MIN SCORE TU STEM =’, ‘MIN SCORE ANTICODON STEM =’, ‘MIN SCORE D STEM =’. Lowest base-pairing in loops from Gauss paper (reference 8 in Chapter 6 ) Aminoacyl TU Anticodon D-loop
G-C, A-T 5 4 3 1
G-T 1 0 1 1
Score 11 8 7 3
6. Define a range of intron lengths: ‘MIN INTRON LENGTH =’, ‘MAX INTRON LENGTH =’ (defaults zero). 7. Define range of TU loop lengths: ‘MIN LENGTH TU LOOP =’ ‘MAX LENGTH TU LOOP = (defaults 6, 9). 8. Select conserved bases filtering or not: ‘TO FILTER ON CONSERVED BASES TYPE Y’. If you choose to filter then you will be prompted to supply a score for each of the conserved bases.
Conserved bases used in filtering
Base number 88 10 11 14 15 21 32
Base assignment T G C or T A A or G A C or T
368
DNA SEQIJK"G
Base number 33
37 48 53 54 55 56 57 58
60 61
Base assignment
T A
T or C G T T C A or G A C or T C
9. Define minimum total score for conserved bases: 'MIN TOTAL SCORE ON CONSERVED BASES ='. Searching then starts and any cloverleaf conforming to the above criteria will be displayed in both 1D and 2D.