
Please download Bioinfo89-11.exe. After decompression it contains the following files: Bioinfo89-11 (lecture slides)


robbin




Presentation Transcript


  1. Fragment Assembly System (FAS)
     Please download Bioinfo89-11.exe. After decompression it contains the following files:
     • Bioinfo89-11.ppt (lecture slides)
     • Exercise89-11.doc (in-class exercise)
     • Gelassemble commands.doc & SeqED commands.doc (command reference)
     • Seq01.txt - seq10.txt (sequences for the exercise)

  2. Fragment Assembly System (FAS)
     Assemble overlapping fragment sequences from a sequencing project.
     (1) Store fragment sequences;
     (2) Recognize overlapping sequences and create aligned assemblies, called contigs;
     (3) Display, edit and output the contigs for further analysis.
     [Figure: fragments 1 to 5 arranged into Contig 1 and Contig 2, with a consensus]
     A contig may not contain more than 1,650 fragments and may not be longer than 200,000 bases. No single fragment may be longer than 2,500 bases.
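The size limits quoted on this slide can be checked before loading data into a project. The following is only a minimal sketch: the function name `check_fas_limits` and its arguments are hypothetical helpers for illustration and are not part of GCG; only the three numeric limits come from the slide.

```python
# Limits quoted on the slide.
MAX_FRAGMENTS_PER_CONTIG = 1650
MAX_CONTIG_LENGTH = 200_000
MAX_FRAGMENT_LENGTH = 2_500


def check_fas_limits(fragment_lengths, contig_length):
    """Return a list of limit violations (empty if everything fits).

    fragment_lengths: lengths in bases of the fragments in one contig.
    contig_length: length in bases of the assembled contig.
    (Hypothetical helper, not a GCG command.)
    """
    problems = []
    if len(fragment_lengths) > MAX_FRAGMENTS_PER_CONTIG:
        problems.append(f"too many fragments: {len(fragment_lengths)}")
    if contig_length > MAX_CONTIG_LENGTH:
        problems.append(f"contig too long: {contig_length} bases")
    for index, length in enumerate(fragment_lengths, start=1):
        if length > MAX_FRAGMENT_LENGTH:
            problems.append(f"fragment {index} too long: {length} bases")
    return problems


if __name__ == "__main__":
    print(check_fas_limits([593, 540, 2600], 1200))
```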

  3. Programs in the Fragment Assembly System
     • GelStart - Begins a fragment assembly session by creating a new fragment assembly project or by identifying an existing project.
     • GelEnter - Enters fragment sequences into a fragment assembly project from your terminal keyboard, a digitizer, or existing sequence files.
     • GelMerge - Aligns the sequences in a fragment assembly project into assemblies called contigs.
     • GelAssemble - A multiple sequence editor for viewing and editing contigs assembled by GelMerge.
     • GelView - Displays the structure of the contigs in a fragment assembly project.
     • GelDisassemble - Breaks up the contigs in a fragment assembly project into single fragments.

  4. GelStart
     Use GelStart to create a new project database for each sequencing project. For each new project, GelStart creates a new directory, named after the project, as a subdirectory of your current working directory.
     gcg% gelstart -check
     Minimal Syntax: % gelstart [-NAME=]MyProject -Default
     Prompted Parameters:
       -NEWproject                         begins a new sequencing project
       -VECtors=GB:M13mp18,GB:SynpBR322    highlights specified sequences in GELENTER
       -SITes=GAATTC,GGATCC                highlights specified patterns in GELENTER
     Local Data Files: None
     Optional Parameters:
       -DELete                             deletes a whole project!
       -NOMONitor                          suppresses the screen monitor

  5. SeqED
     • SeqEd is an interactive editor for entering and modifying sequences and for assembling parts of existing sequences into new genetic constructs. You can enter sequences from the keyboard or from a digitizer.
     [Figure: <Ctrl>D switches from screen mode to command mode; <Return> switches from command mode back to screen mode]
     Sample screen:
     AGTCTTAGTCGATCGTAcTGCATRCGA
     ....|:.......:|.........i.......:.|.........|.........|.........|.........|..
     0 10 20 30 40 50 60 70
     "sample.seq" 27 nucleotides

  6. Screen Mode
     G, A, T, . . .     - insert a sequence character
     <Delete>           - delete a sequence character
     <Ctrl>H            - delete a sequence character
     /TAACG<Return>     - find the next occurrence of TAACG (last pattern entered is the default)
     1<Return>          - move to start of the sequence
     <Ctrl>E            - move to end of the sequence
     [n]<Right-arrow>   - go ahead n characters
     [n]<Left-arrow>    - go back n characters
     <Up-arrow>         - go up to check sequence
     <Down-arrow>       - go down to original sequence
     'markcharacter     - go to marked position
     37<Return>         - go to position 37 (any positive integer)
     <                  - go back 50 characters
     >                  - go ahead 50 characters
     <Ctrl>R            - redraw the screen
     <Ctrl>D            - enter command mode
     [n] is an optional numeric parameter.

  7. Command Mode
     • EDit seqname - get a new sequence file to edit
     • [n] Include [seqname] - insert another sequence [at position n] (SeqEd prompts for range and strand)
     • s,f Delete - delete a range of bases
     • [s] Check [/Blind] - check a range of bases [beginning at s]
     • 37 - go to base 37
     • REDraw - redraw the screen
     • [n] COmment comment - insert a comment [at position n]
     • [n] COmment - enter comment editing mode [at position n]
     • [n] HEAding - edit documentary heading [at line n]
     • change - enter screen mode (<Return> is sufficient)
     • screen - enter screen mode (<Return> is sufficient)
     • OVERstrike - enter overstrike mode
     • INSert - enter insert mode
     • [n] Mark markcharacter - mark the sequence [at position n]
     • PERFect - require finds to be perfect matches
     • PROtein - set sequence type to PROTEIN
     • NUCleotide - set sequence type to NUCLEOTIDE
     • [s,f] Write [seqname] - write [a part of] the sequence to a file
     • DIGitizer - enter digitizer mode
     • RELoad - enter reload mode
     • ACCept - terminate reload mode
     • Help - show commands in screen and command modes
     • [s,f] EXit [seqname] - write [a part of] the sequence and quit
     • Quit - quit the editor without writing the sequence
     [n] indicates an optional parameter. s and f are numbers for the start and finish of a range of interest.

  8. GelStart
     %gcg1 gelstart
     GelStart begins a fragment assembly session by creating a new fragment assembly project or by identifying an existing project.
     What is the name of your fragment assembly project? bio
     GELSTART cannot find this project. Is it a new one (* No *) ? y
     You have a new project named "bio".
     Which vector sequence(s) would you like highlighted? gb:m13mp18
     Which restriction site(s) would you like highlighted ? GAATTC
     Project BIO has 0 fragments in 0 contigs.
     You are ready to run the other fragment assembly programs.
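GelStart asks which restriction sites to highlight, for example GAATTC. To illustrate what locating such sites in a fragment involves, here is a small sketch outside GCG. The function `find_sites` is a hypothetical helper, and exact case-insensitive string matching is an assumption, not a description of how GelEnter matches patterns.

```python
def find_sites(sequence, sites):
    """Return {site: [1-based start positions]} for each pattern.

    Matching is exact and case-insensitive; overlapping hits are reported.
    (Hypothetical helper, not part of GCG.)
    """
    sequence = sequence.upper()
    hits = {}
    for site in sites:
        pattern = site.upper()
        positions = []
        start = sequence.find(pattern)
        while start != -1:
            positions.append(start + 1)  # convert to 1-based position
            start = sequence.find(pattern, start + 1)
        hits[site] = positions
    return hits


if __name__ == "__main__":
    # EcoRI (GAATTC) and BamHI (GGATCC) sites in a toy sequence.
    print(find_sites("AAGAATTCGGATCC", ["GAATTC", "GGATCC"]))
```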

  9. GelEnter
     GelEnter is a sequence editor that accepts sequence data.
     gcg% gelenter -check
     Minimal Syntax: % gelenter [-INfile1=]mu*.seq
     Prompted Parameters: None
     Local Data Files: set.keys (must be in your current working directory to be used)
     Optional Parameters:
       -ENTER=mu*.seq          enters existing files into the database
       -STAden                 enters existing Staden format files into the database
       -FASTA                  enters existing FASTA format files into the database
       -SINGlecommand          automatically returns to screen mode after each command
       -PERFect                sets find to search for perfect symbol matches
       -VECtors=gb:synpbr322   highlights sequences from pBR322
       -SITes=gaattc           highlights GAATTC patterns
       -LANes=g,A,T,C          sets lane order for digitizer
       -MINOverlap=10          sets minimum overlap length for Reload command
       -PCTOverlap=95          sets stringency for the Reload command
       -TOLerance=0.4          sets tolerance for digitizing ambiguity (0 to 1), with 1 being the most tolerant

  10. GelEnter
     GelEnter accepts any valid GCG sequence character. Once you enter sequences into a project database, you can no longer edit them with GelEnter.
     gcg2 21% gelenter seq02.dat
     GelEnter adds fragment sequences to a fragment assembly project. It accepts sequence data from your terminal keyboard, a digitizer, or existing sequence files.
     "seq02" 593 nucleotides
     IUB/GCG   Meaning
     A         A
     C         C
     G         G
     T/U       T
     M         A or C
     R         A or G
     W         A or T
     S         C or G
     Y         C or T
     K         G or T
     V         A or C or G
     H         A or C or T
     D         A or G or T
     B         C or G or T
     X/N       G or A or T or C
     ./~       gap character
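The symbol table above maps directly onto a lookup table. The sketch below encodes exactly the rows listed on the slide; the helper names `possible_bases` and `symbols_compatible` are hypothetical and only illustrate how ambiguity codes might be compared outside GCG.

```python
# Symbol table as listed on the slide (U is treated as T).
IUB_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT",
    "X": "ACGT", "N": "ACGT",
}
GAP_CHARACTERS = {".", "~"}


def possible_bases(symbol):
    """Return the bases a symbol can stand for ('' for a gap character)."""
    symbol = symbol.upper()
    if symbol in GAP_CHARACTERS:
        return ""
    return IUB_CODES[symbol]


def symbols_compatible(first, second):
    """True if the two symbols share at least one possible base."""
    return bool(set(possible_bases(first)) & set(possible_bases(second)))


if __name__ == "__main__":
    print(possible_bases("R"))           # bases covered by R
    print(symbols_compatible("R", "C"))  # R is A or G, so no overlap with C
```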

  11. GelMerge
     GelMerge automatically recognizes overlaps among all of the sequences in a project database and creates aligned assemblies, called contigs, from the overlapping sequences. These contigs are stored in the project database. As you add new sequences that connect separate contigs to the project database, GelMerge aligns the contigs into larger assemblies.
     % GelMerge
     What word size (* 7 *) ?
     What fraction of the words in an overlap must match (* 0.80 *) ?
     What is the minimum overlap length (* 14 *) ?
     Reading ............
     Comparing ............
     Aligning .........
     Writing ...
     Input Contigs: 12
     Output Contigs: 3
     CPU time: 02.29 (seconds)
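The three prompts (word size, fraction of matching words, minimum overlap length) can be made concrete with a toy overlap test. This is a simplified sketch and not GelMerge's actual algorithm: it only compares an ungapped suffix of one fragment with a prefix of another, whereas GelMerge also handles gaps and both strands. The function names are hypothetical; the default values are the ones shown in the prompts.

```python
def word_match_fraction(left, right, word_size=7):
    """Fraction of same-position words of length word_size that are identical."""
    length = min(len(left), len(right))
    total = length - word_size + 1
    if total <= 0:
        return 0.0
    matches = sum(
        left[i:i + word_size] == right[i:i + word_size] for i in range(total)
    )
    return matches / total


def best_overlap(first, second, word_size=7, stringency=0.80, min_overlap=14):
    """Length of the longest suffix(first)/prefix(second) overlap that passes.

    Returns 0 when no overlap of at least min_overlap bases reaches the
    required fraction of matching words. Ungapped comparison only.
    """
    longest = min(len(first), len(second))
    for length in range(longest, min_overlap - 1, -1):
        fraction = word_match_fraction(first[-length:], second[:length], word_size)
        if fraction >= stringency:
            return length
    return 0


if __name__ == "__main__":
    a = "GGGGCCCCAAAATTTTACGTACGTGACT"
    b = "TTTTACGTACGTGACTCCGGAATT"   # starts with the last 16 bases of a
    print(best_overlap(a, b))
```

Raising the word size or the stringency makes the test stricter, so fewer spurious overlaps pass but noisy reads are more likely to be left unmerged.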

  12. GelMerge parameters
     Minimal Syntax: % gelmerge -Default
     Prompted Parameters:
       -WORdsize=7                      sets word size for overlap determination
       -STRIngency=0.8                  sets minimum fraction of matching words in overlap
       -MINOverlap=14                   sets minimum length of overlap
     Local Data Files:
       -MATRix1=gelmergedna.cmp         assigns the scoring matrix for contig assembly
       -MATRix2=gelmergelocaldna.cmp    assigns the scoring matrix for vector recognition
     Optional Parameters:
       -MINIdentity=14                  sets minimum run of identical bases found at least once in an overlap between two contigs
       -MAXGap=10                       sets maximum gap size for overlap determination
       -GAPweight=8                     sets gap creation penalty in contig assembly
       -LENgthweight=2                  sets gap extension penalty in contig assembly
       -ARChive                         creates contigs from the original gel readings
       -WORKing                         creates contigs from individual working fragment (with gaps removed)
       -REPortfile[=Filename]           writes report of recognized vector sequences
       -EXCise                          removes vector sequences from single-fragment contigs
       -VECTORSTrigency=0.8             sets minimum fraction of matches in vector recognition
       -VECTORMINIdentity=12            sets minimum run of identical bases found at least once in a match between vector and fragment
       -VECTORMAXGap=5                  sets maximum gap size in first step of vector recognition
       -VECTORGAPweight=30              sets gap creation penalty in vector recognition
       -VECTORLENgthweight=3            sets gap extension penalty in vector recognition
       -NOMERge                         suppresses contig assembly
       -NOMONitor                       suppresses screen trace of program progress
       -NOSUMmary                       suppresses screen summary at the end of the program
       -BATch                           submits program to the batch queue

  13. GelAssemble
     [Figure: <Ctrl>D switches from screen mode to command mode; <Return> switches from command mode back to screen mode]
     After assembling contigs with GelMerge, use the contig editor, GelAssemble, to review and modify the alignments. After choosing a contig for review, GelAssemble lets you edit the individual sequences in that contig to resolve inconsistencies. GelAssemble creates a consensus sequence that uses the IUB nucleotide ambiguity codes. You can modify a sequence and change the alignment in the same way you edit text with a text editor.
     Although GelMerge assembles and aligns contigs automatically, you can assemble contigs manually using GelAssemble. For example, you could manually assemble separate contigs that do not share sufficient overlap for GelMerge to assemble automatically. You can also separate fragments from a contig if you believe they should not be included. Once you are satisfied with a contig, you can store it in the sequencing project database.
     seq03     > GTTCATCAGTCTTGGTGGAGAAGTTCGACAGATGCCATTGGCAGATTTCACCGATGGTTC 220
     seq01     > GTTCATCAGTCTTGGTGGAGAAGTTCGACAGATGCCATTGGCAGATTTCACCGATGGTTC 540
     CONSENSUS > GTTCATCAGTCTTGGTGGAGAAGTTCGACAGATGCCATTGGCAGATTTCACCGATGGTTC 540
                 .........+.........+.........+.........+.........+.........+
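A consensus built with IUB ambiguity codes can be sketched as follows. The rule used here, taking the union of bases seen in each alignment column and ignoring gaps, is an assumption chosen for illustration; the slides do not state the rule GelAssemble actually applies. The function names are hypothetical.

```python
# Map each set of observed bases to its IUB ambiguity code.
BASES_TO_IUB = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AC"): "M", frozenset("AG"): "R", frozenset("AT"): "W",
    frozenset("CG"): "S", frozenset("CT"): "Y", frozenset("GT"): "K",
    frozenset("ACG"): "V", frozenset("ACT"): "H",
    frozenset("AGT"): "D", frozenset("CGT"): "B",
    frozenset("ACGT"): "N",
}
PLAIN_BASES = {"A", "C", "G", "T"}


def consensus_symbol(column):
    """IUB code for the union of plain bases in one alignment column.

    Gap characters and ambiguity codes in the input are ignored
    (assumed rule for this sketch); an all-gap column yields '.'.
    """
    bases = {symbol.upper() for symbol in column} & PLAIN_BASES
    if not bases:
        return "."
    return BASES_TO_IUB[frozenset(bases)]


def consensus(aligned_fragments):
    """Consensus of already aligned fragments (shorter ones are gap-padded)."""
    width = max(len(fragment) for fragment in aligned_fragments)
    padded = [fragment.ljust(width, ".") for fragment in aligned_fragments]
    return "".join(consensus_symbol(column) for column in zip(*padded))


if __name__ == "__main__":
    print(consensus(["GTTCATCAG", "GTTCACCAG"]))
```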

  14. Gelassemble Screen Mode
     Keys Pressed        Action
     [n]<Right-arrow>    move ahead [n bases]
     [n]<Left-arrow>     move back [n bases]
     [n]<Up-arrow>       move up [to row n]
     [n]<Down-arrow>     move down [to row n]
     >                   scroll one screen to the right
     <                   scroll one screen to the left
     1<Return>           move to start of the sequence
     <Ctrl>E             move to end of the sequence
     165<Return>         move to base 165 in sequence
     /GATTC<Return>      find next occurrence of GATTC
     <Ctrl>A             move to next ambiguity in alignment
     <Ctrl>R             move to next ambiguity in sequence
     <Ctrl>V             move to next gap in consensus
     <Ctrl>D             enter Command Mode
     <Ctrl>L             toggle alignment display enlargement
     <Ctrl>W             redraw the screen
     <Ctrl>O             toggle INSERT/OVERSTRIKE mode
     !                   summary of current sequence
     ?                   display these help screens
     <Ctrl>G             recalculate the consensus
     G A T C ....        add base at the cursor
     <Delete>            delete a base, or move sequence left
     <Ctrl>H             delete a base, or move sequence left
     <Space bar>         move the sequence to the right
     <Ctrl>X             delete alignment column
     <Ctrl>I             restore alignment column
     <Ctrl>B             begin selecting a range for removal
     <Ctrl>N             remove the selected range
     <Ctrl>P             insert the removed range
     -                   reject current fragment

  15. Gelassemble Command Mode
     [a,b] specifies a range of fragments. [x,y] specifies a range of bases. [n] is an optional numeric parameter.
     EDit [ContigName]            replace current contig with a new contig
     CONTIGs                      select another contig for editing
     WRite                        write a contig to the database
     EXit                         write the contig and quit
     QUIT                         quit without writing
     ERASE                        delete current contig from the database
     238                          move to position 238 in the current fragment
     [x,y] PRETTYout [FileName]   write the sequence alignment [position x - y]
     [a,b] SEQOUT                 write fragments [a - b] to sequence files
     BIGPICture [FileName]        write bar schematic to an output file
     OVERstrike                   select OVERSTRIKE sequence edit mode
     NOOVERstrike                 select INSERT sequence edit mode
     [x,y] CONSensus              recalculate the consensus sequence
     [a,b] LOCk                   lock strands [a through b]
     [a,b] Unlock                 unlock strands [a through b]
     [x,y] SELect                 select bases [x through y]
     REMove                       remove the selected bases
     [n] INSert                   insert the removed bases [at position n]
     CAncel                       cancel the selection

  16. Gelassemble Command Mode (continued)
     [x,y] DElete                 delete bases [x through y]
     GOTo [FragmentName]          move to strand by name
     FInd GAATC                   find the next occurrence of GAATC
     DIfferences                  show differences from the consensus
     MAtches                      show matches with the consensus
     Neither                      show neither matches nor differences
     REDraw                       redraw the screen
     Help                         display these help screens
     SORt [DEScending]            sorts strands by their offsets in alignment
     [a,b] MOve                   moves a strand [from line a to line b]
     OPen                         opens a blank line at the cursor position
     [a,b] ANChor                 anchors strands [a through b]
     [a,b] NOANchor               unanchors strands [a through b]
     LOad [ContigName]            loads another contig into the Edit Screen
     REVerse                      reverse-complement the (anchored) strand(s)
     [n] Offset                   shifts the current fragment [to begin at n]
     REJect                       removes the current fragment from the screen
     NODUPlicate                  removes a duplicated fragment from the screen
     SPAWN                        renames a duplicated fragment
     SEParate                     makes two contigs from anchored and unanchored strands
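The REVerse command reverse-complements a strand. As an illustration of the operation itself, not of GelAssemble's implementation, the sketch below reverse-complements a sequence while keeping IUB ambiguity codes and gap characters meaningful. The function name `reverse_complement` is a hypothetical helper.

```python
# Complement of each IUB symbol (U is complemented as if it were T).
_SOURCE = "ACGTUMRWSYKVHDBXN"
_TARGET = "TGCAAKYWSRMBDHVXN"
_COMPLEMENT = str.maketrans(_SOURCE + _SOURCE.lower(), _TARGET + _TARGET.lower())


def reverse_complement(sequence):
    """Return the reverse complement, preserving case and gap characters."""
    return sequence[::-1].translate(_COMPLEMENT)


if __name__ == "__main__":
    print(reverse_complement("GAATTC"))  # the EcoRI site is its own reverse complement
    print(reverse_complement("AACG"))
```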

  17. GelView
     GelView displays bar diagrams that show the overlaps among the fragments in each contig, providing a schematic view of the whole sequencing project.
     Gelview -> filename.view, then cat/more filename.view
     GELVIEW Fragment Assembly contig display of Project: bio
     May 4, 2000 17:42
     Contig: seq01
       3 seq03                     +------------------->
       2 seq01       +----------------------------->
       C CONSENSUS   +------------------------------------>
                     |----------|----------|----------|---------|---------|
                     0          200        400        600       800
     Contig: seq04
       3 seq02                  <---------------+
       2 seq04       +------------>
       C CONSENSUS   +--------------------------->
                     |----------|----------|----------|---------|---------|
                     0          400        800        1200      1600
     Contig: seq05
       2 seq05       +---------------------------->
       C CONSENSUS   +---------------------------->
                     |----------|----------|----------|---------|---------|
                     0          200        400        600       800
     5 Fragments in 3 Contigs
     (Horizontal positions of the bars are approximate; the original slide layout was not preserved.)
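The bar notation in the GelView display ('+' at the start of a fragment, '>' or '<' showing its direction) can be mimicked with a few lines of code. This is only a sketch of the drawing idea: the function `bar`, its arguments, and the scale of 20 bases per character are all hypothetical and do not reproduce GelView's actual layout rules.

```python
def bar(offset, length, forward=True, scale=20):
    """Draw one fragment as a text bar.

    offset and length are in bases; scale is bases per character
    (an arbitrary value chosen for this sketch).
    """
    indent = " " * (offset // scale)
    width = max(2, length // scale)       # always room for both end markers
    middle = "-" * (width - 2)
    if forward:
        return indent + "+" + middle + ">"
    return indent + "<" + middle + "+"


if __name__ == "__main__":
    # Two overlapping fragments, the second one on the reverse strand.
    print(bar(0, 300))
    print(bar(200, 400, forward=False))
```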

  18. GelDisassemble
     GelDisassemble breaks up the contigs in a sequencing project, thus recreating the database as a collection of single fragments.
     % geldisassemble
     Are you sure you want to disassemble your project (* No *) ? Yes
     1) Emptying "relation" directory....
     2) Emptying "consensus" directory....
     3) Copying "working" to "consensus"....
     4) Creating "relation"....
     Gel Project Disassembled

  19. Exercise 89-12
     • Download Bioinfo89-12.exe from the online lecture notes (講義上網/Lecture12)
     • Decompress the file
     • Start WSFTP and transfer the files seq01.txt-seq10.txt to GCG
     • Start GCG FAS
     • Questions:
       (1) What is the correct order of the assembled sequence?
       (2) Which putative protein does this sequence encode?
