1 / 88

Lecture 5: Searching Sequence Databases Eric C. Rouchka, D.Sc. eric.rouchka@uofl kbrin.a-bldg.louisville/~rouchka/CECS6

Lecture 5: Searching Sequence Databases Eric C. Rouchka, D.Sc. eric.rouchka@uofl.edu http://kbrin.a-bldg.louisville.edu/~rouchka/CECS694/. Multiple Alignment Formats. Formats for storing multiple alignments are specified FASTA, GCG MSF, ALN, etc. FASTA Format.

shirin
Download Presentation

Lecture 5: Searching Sequence Databases Eric C. Rouchka, D.Sc. eric.rouchka@uofl kbrin.a-bldg.louisville/~rouchka/CECS6

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 5: Searching Sequence Databases Eric C. Rouchka, D.Sc. eric.rouchka@uofl.edu http://kbrin.a-bldg.louisville.edu/~rouchka/CECS694/ CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  2. Multiple Alignment Formats • Formats for storing multiple alignments are specified • FASTA, GCG MSF, ALN, etc CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  3. FASTA Format • Each sequence begins with a description line ‘>’ • Sequence data follows, with gap character ‘-’ CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  4. Example Fasta sequence >JC2395 NVSDVNLNK---YIWRTAEKMK---ICDAKKFARQHKIPESKIDEIEHNSPQDAAE---- -------------------------QKIQLLQCWYQSHGKT—GACQALIQGLRKANRCDI AEEIQAM >KPEL_DROME MAIRLLPLPVRAQLCAHLDAL-----DVWQQLATAVKLYPDQVEQISSQKQRGRS----- -------------------------ASNEFLNIWGGQYN----HTVQTLFALFKKLKLHN AMRLIKDY >FASA_MOUSE NASNLSLSK---YIPRIAEDMT---IQEAKKFARENNIKEGKIDEIMHDSIQDTAE---- -------------------------QKVQLLLCWYQSHGKS--DAYQDLIKGLKKAECRR TLDKFQDM CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  5. Stockholm Format • Features in a multiple alignment are annotated using a ‘magic’ label • GF: Generic per-File annotation • GC: Generic per-Column annotation • GS: Generic per-Sequence annotation • GR: Generic per sequence and per column markup • Used by PFAM, HMMER, Belvu CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  6. Example Stockholm Sequence • http://www.cgr.ki.se/cgr/groups/sonnhammer/Stockholm.html CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  7. GCG Multiple Sequence Format (MSF) • The beginning of the file is a header describing the sequences • Header may be formatted to contain specific information • Header ends with a “//” CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  8. !!AA_MULTIPLE_ALIGNMENT 1.0  msf MSF: 131 Type: P 22/01/02 CompCheck: 3003 ..  Name: IXI_234 Len: 131 Check: 6808 Weight: 1.00 Name: IXI_235 Len: 131 Check: 4032 Weight: 1.00 Name: IXI_236 Len: 131 Check: 2744 Weight: 1.00 Name: IXI_237 Len: 131 Check: 9419 Weight: 1.00  //   1 50 IXI_234 TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQAT IXI_235 TSPASIRPPAGPSSR.........RPSPPGPRRPTGRPCCSAAPRRPQAT IXI_236 TSPASIRPPAGPSSRPAMVSSR..RPSPPPPRRPPGRPCCSAAPPRPQAT IXI_237 TSPASLRPPAGPSSRPAMVSSRR.RPSPPGPRRPT....CSAAPRRPQAT   51 100 IXI_234 GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSRSAG IXI_235 GGWKTCSGTCTTSTSTRHRGRSGW..........RASRKSMRAACSRSAG IXI_236 GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSR..G IXI_237 GGYKTCSGTCTTSTSTRHRGRSGYSARTTTAACLRASRKSMRAACSR..G   101 131 IXI_234 SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE IXI_235 SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE IXI_236 SRPPRFAPPLMSSCITSTTGPPPPAGDRSHE IXI_237 SRPNRFAPTLMSSCLTSTTGPPAYAGDRSHE CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  9. ClustalW ALN Format • First non-blank line contains the word “CLUSTAL” • Each sequence starts with sequence name • Lines containing conservation symbols (* or :) are ignored CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  10. ClustalW ALN Format CLUSTAL W (1.82) multiple sequence alignment   JC2395 -NVSDVNLNKYIWRTAEKMKICDAKKFARQHKIPESKIDEIEHNSPQDAAEQKIQLLQCW 59 FASA_MOUSE -NASNLSLSKYIPRIAEDMTIQEAKKFARENNIKEGKIDEIMHDSIQDTAEQKVQLLLCW 59 KPEL_DROME MAIRLLPLPVRAQLCAHLDALDVWQQLATAVKLYPDQVEQISSQKQRGRSASN-EFLNIW 59 : * *. : :::* :: .::::* :. :. : .: ::* *  JC2395 YQSHGKTGACQALIQGLRKANRCDIAEEIQAM 91 FASA_MOUSE YQSHGKSDAYQDLIKGLKKAECRRTLDKFQDM 91 KPEL_DROME GGQYNHT--VQTLFALFKKLKLHNAMRLIKDY 89 .:.:: * *: ::* : :: CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  11. ClustalW ALN Format CLUSTAL W(1.4) multiple sequence alignment  IXI_234 TSPASIRPPA GPSSRPAMVS SRRTRPSPPG PRRPTGRPCC SAAPRRPQAT IXI_235 TSPASIRPPA GPSSR----- ----RPSPPG PRRPTGRPCC SAAPRRPQAT IXI_236 TSPASIRPPA GPSSRPAMVS SR--RPSPPP PRRPPGRPCC SAAPPRPQAT IXI_237 TSPASLRPPA GPSSRPAMVS SRR-RPSPPG PRRPT----C SAAPRRPQAT IXI_234 GGWKTCSGTC TTSTSTRHRG RSGWSARTTT AACLRASRKS MRAACSRSAG IXI_235 GGWKTCSGTC TTSTSTRHRG RSGW------ ----RASRKS MRAACSRSAG IXI_236 GGWKTCSGTC TTSTSTRHRG RSGWSARTTT AACLRASRKS MRAACSR--G IXI_237 GGYKTCSGTC TTSTSTRHRG RSGYSARTTT AACLRASRKS MRAACSR--G IXI_234 SRPNRFAPTL MSSCITSTTG PPAWAGDRSH E IXI_235 SRPNRFAPTL MSSCITSTTG PPAWAGDRSH E IXI_236 SRPPRFAPPL MSSCITSTTG PPPPAGDRSH E IXI_237 SRPNRFAPTL MSSCLTSTTG PPAYAGDRSH E CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  12. Phylip • The first line is two numbers • First indicates number of sequences • Second indicates length of alignment CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  13. Example Phylip File Phylip 3 92 JC2395 -NVSDVNLNK YIWRTAEKMK ICDAKKFARQ HKIPESKIDE IEHNSPQDAA FASA_MOUSE -NASNLSLSK YIPRIAEDMT IQEAKKFARE NNIKEGKIDE IMHDSIQDTA KPEL_DROME MAIRLLPLPV RAQLCAHLDA LDVWQQLATA VKLYPDQVEQ ISSQKQRGRS   EQKIQLLQCW YQSHGKTGAC QALIQGLRKA NRCDIAEEIQ AM EQKVQLLLCW YQSHGKSDAY QDLIKGLKKA ECRRTLDKFQ DM ASN-EFLNIW GGQYNHT--V QTLFALFKKL KLHNAMRLIK DY CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  14. PIR Format >P1;JC2395  -NVSDVNLNKYIWRTAEKMKICDAKKFARQHKIPESKIDEIEHNSPQDAAEQKIQLLQCW YQHGKTGACQALIQGLRKANRCDIAEEIQAM * >P1;FASA_MOUSE  NASNLSLSKYIPRIAEDMTIQEAKKFARENNIKEGKIDEIMHDSIQDTAEQKVQLLLCW YQHGKSDAYQDLIKGLKKAECRRTLDKFQDM * >P1;KPEL_DROME  MAIRLLPLPVRAQLCAHLDALDVWQQLATAVKLYPDQVEQISSQKQGRSASN-EFLNIW GGQYNH--VQTLFALFKKLKLHNAMRLIKDY * CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  15. GDE %JC2395 nvsdvnlnkyiwrtaekmkicdakkfarqhkipeskideiehnspqdaaeqkiqllqcwy qshgktgacqaliqglrkanrcdiaeeiqam %FASA_MOUSE Nasnlslskyipriaedmtiqeakkfarennikegkideimhdsiqdtaeqkvqlllcwy qshgksdayqdlikglkkaecrrtldkfqdm %KPEL_DROME --mairllplpvraqlcahldaldvwqqlatavklypdqveqissqkqrgrsasneflni wggqynhtvqtlfalfkklklhnamrlikdy CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  16. NEXUS #NEXUS BEGIN DATA; dimensions ntax=3 nchar=91; format missing=? symbols="ABCDEFGHIKLMNPQRSTUVWXYZ“ interleave datatype=PROTEIN gap= -; matrix JC2395 NVSDVNLNKYIWRTAEKMKICDAKKFARQHKIPESKIDEIEHNSPQDAAE FASA_MOUSE NASNLSLSKYIPRIAEDMTIQEAKKFARENNIKEGKIDEIMHDSIQDTAE KPEL_DROME --MAIRLLPLPVRAQLCAHLDALDVWQQLATAVKLYPDQVEQISSQKQRG JC2395 QKIQLLQCWYQSHGKTGACQALIQGLRKANRCDIAEEIQAM FASA_MOUSE QKVQLLLCWYQSHGKSDAYQDLIKGLKKAECRRTLDKFQDM KPEL_DROME RSASNEFLNIWGGQYNHTVQTLFALFKKLKLHNAMRLIKDY ; end; CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  17. General Feature Format (GFF) • Developed for easy parsing of features • Used to: • Read annotations into ACE format • Print out images of sequences and annotations • http://www.sanger.ac.uk/Software/formats/GFF/ CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  18. Sequence Formats • Numerous other formats • Descriptions: • BLOCKS Server • Http://www.blocks.fhcrc.org/blocks/help/blocks_format.html • EMBL accepted formats • http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Themes/SequenceFormats.html CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  19. Sequence Conversion Programs • SEQIO • http://bioweb.pasteur.fr/docs/seqio/seqio.html • READSEQ • http://bimas.dcrt.nih.gov/molbio/readseq/ CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  20. Searching Sequence Databases • Compare a query sequence against a target database • Return significant results • Possible Homolgous sequences • Yields insight into structure and function CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  21. DNA vs. Protein Searches • Easier to determine similarity in protein sequences • 4 base of DNA means more random sequences • Consider alignment of length 4 • DNA: 1/44 = 1/256 chance at random • AA: 1/204 = 1/160,000 chance at random CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  22. DNA vs. Protein Searches • Redundancy in Genetic code • Multiple codons code for same amino acid • A.A. sequence could be identical • DNA sequence could be different CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  23. DNA vs. Protein Searches • Consider the two sequences: AUGGAATTAGTTATTAGTGCTTTAATTGTTGAATAA AUGGAGCTGGTGATCTCAGCGCTGATCGTCGAGTGA • Ungapped DNA alignment: AUGGAATTAGTTATTAGTGCTTTAATTGTTGAATAA ||||| | || || || | || || || | | AUGGAGCTGGTGATCTCAGCGCTGATCGTCGAGTGA • 21 identical resides (out of 36) 58% identity CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  24. DNA vs. Protein Searches • Translate each to protein first: ELVISISALIVE ELVISISALIVE • 100% identical at amino acid level CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  25. DNA vs. Protein Searches • If nucleotide region contains a gene, beneficial to translate first • Target and query translated into all six reading frames • 3 in forward, 3 in reverse CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  26. DNA vs. Protein Searches • Number of comparisons needed grows • 4 comparisons: 2 in each direction • 36 comparisons: 6 in each direction • More sensitive, but slower CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  27. Scoring Matrices • Defaults for major database searches • PAM250 (original) • BLOSUM62 CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  28. FASTA • First rapid database search utility • 50 times faster than Dynamic Programming • Based on a heuristic – not guaranteed to locate optimal solution CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  29. FASTA Algorithm • Hashing approach: • Construct a table showing each word of length k (k-tuple) for query and target • 1 or 2 for proteins • 4 or 6 for DNA • Relative positions calculated by subtracting positions • Matches in same phase are strung together CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  30. FASTA Algorithm • Identify 10 regions with highest density of hits • Trim regions to include only residues contributing to high scores • Associate init1 score to each region Each region is partial alignment without gaps CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  31. FASTA Algorithm • Join initial regions to form approximate alignments with gaps • Assign score • Sum of init1 scores for initial regions • Subtract gap penalty CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  32. FASTA Algorithm • Construct Needleman-Wunsch optimal alignment of the query and database • Consider only a band 32 residues wide • Centered on best initial region CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  33. FASTA Algorithm Step 1: Locate k-tuples Step 1: Locate k-tuples Step 2: locate 10 highest density regions (init1) Step 3: Join initial regions with gaps (initn) Step 1: Locate k-tuples Step 2: locate 10 highest density regions (init1) Step 1: Locate k-tuples Step 2: locate 10 highest density regions (init1) Step 3: Join initial regions with gaps (initn) Step 4: Align query and database around best region CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  34. FASTA Scores • FASTA calculates the u and  parameters for the extreme value distribution • Results reported as normalized z-scores and E-values CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  35. Steps to calculate z-score • Average score for databases sequences of same length range determined • Average score plotted against log of average sequence length CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  36. Steps to calculate z-score • Points fitted to a straight line • A z score (number of standard deviations from fitted line) calculated for each score • High scoring and low scoring alignments removed • Steps repeated one or more times to refine CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  37. Steps to calculate z-score • Z score normalized • Z’ = 50 + 10 * z • Alignment score with std dev of 5 has normalized z score of 100 • Significance can be refined by shuffling sequences in database CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  38. Probability of z-score • Pearson, 2000 (ISMB): CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  39. Expected Value • In a database of D sequences: CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  40. Example Fasta Output • FASTA reports a histogram • Indicates distribution of normalized scores • Expected to fall in normal distribution • Outliers are significant matches CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  41. Histogram Output • normalized z’ score • Number of optimized scores found • Number of expected scores • “=“: approximate curve for observed • ‘*”: approximate curve for expected • Z’ score > 120 considered high-scoring CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  42. opt E() • < 20 188 0:== • 22 0 0: one = represents 109 library sequences • 24 0 0: • 26 2 1:* • 28 7 15:* • 30 28 91:* • 32 200 353:== * • 34 841 958:========* • 36 2217 1968:==================*== • 38 3746 3253:=============================*===== • 40 5360 4538:=========================================*======== • 42 6055 5547:==================================================*===== • 44 6496 6119:========================================================*=== • 46 5820 6232:====================================================== * • 48 5469 5966:=================================================== * • 50 4820 5444:============================================= * • 52 4202 4787:======================================= * • 54 3815 4089:=================================== * • 56 3271 3415:===============================* • 58 2755 2804:=========================* • 60 2268 2271:====================* • 62 1813 1821:================* • 64 1500 1448:=============* • 66 1233 1145:==========*= • 68 951 900:========* • 70 746 706:======* • 72 699 551:=====*= • 74 460 430:===*= • 76 337 335:===* • 78 287 260:==* • 80 244 202:=*= • 82 185 154:=* • 84 115 122:=* • 86 114 95:*= • 88 75 73:* inset = represents 1 library sequences • 90 70 57:* • 92 48 44:* :=======================================* • 94 26 34:* :========================== * • 96 33 26:* :=========================*======= • 98 14 20:* :============== * • 100 10 16:* :========== * • 102 7 12:* :======= * • 104 6 9:* :====== * • 106 5 7:* :===== * • 108 2 6:* :== * • 110 2 4:* :== * • 112 1 3:* := * • 114 0 3:* : * • 116 0 2:* : * • 118 0 2:* : * • >120 27 1:* :*========================== CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  43. Fasta Best scoring hits • At most, one hit per sequence • Description, z’ score, init1 score, initn score, opt score, E value CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  44. Example FASTA output The best scores are: initn init1 opt z-sc E(66345) MERR_PSEAE mercuric resistance operon regu ( 144) 928 928 928 1129.8 0 MERR_SHIFL mercuric resistance operon regu ( 144) 871 871 871 1061.3 0 MERR_SERMA mercuric resistance operon regu ( 144) 810 810 810 988.1 0 MERR_STAAU mercuric resistance operon regu ( 135) 292 172 298 373.6 3.5e-14 MERR_BACSR (strain rc607). mercuric resist ( 132) 241 198 289 363.0 1.4e-13 YHDM_ECOLI hypothetical transcriptional re ( 141) 175 175 276 347.0 1.1e-12 CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  45. FASTA Alignment Output • Smith-Waterman type alignment • ‘:’ denotes conserved residue • ‘.’ denotes conservative substitution (ie a substitution with a positive score) CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  46. FASTA Alignment Output >>MERR_STAAU mercuric resistance operon regulatory protei (135 aa) initn: 292 init1: 172 opt: 298 Z-score: 373.6 expect() 3.5e-14 Smith-Waterman score: 298; 36.923% identity in 130 aa overlap 10 20 30 40 50 60 MerR MENNLENLTIGVFAKAAGVNVETIRFYQRKGLLLEPDKPYGSIRRYGEADVTRVRFVKSA . :. .::: :: ::.:.:.::::. : . .. : :.: . ::::.: MERR_S MGMKISELAKACDVNKETVRYYERKGLIAGPPRNESGYRIYSEETADRVRFIKRM 10 20 30 40 50 70 80 90 100 110 MerR QRLGFSLDEIAELLRL--EDGTHCEEASSLAEHKLKDVREKMADLARMEAVLSELVCACH ..: ::: :: :. . .:: .:.. ... .: :....:. : :.. .: :: : MERR_S KELDFSLKEIHLLFGVVDQDGERCKDMYAFTVQKTKEIERKVQGLLRIQRLLEELKEKCP 60 70 80 90 100 110 120 130 140 MerR ARRGNVSCPLIASLQGGASLAGSAMP ... .::.: .:.:: MERR_S DEKAMYTCPIIETLMGGPDK 120 130 CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  47. FASTA Programs • FASTA – protein to protein OR DNA to DNA • TFASTA –query protein to DNA database • the DNA database is first translated in all six reading frames CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  48. FASTA Programs • FASTF – compares a set of ordered peptide fragments, obtained from analysis of a protein by cleavage and sequencing of protein bands resolved by electrophoresis, against a protein database • TFASTF – compares a set of ordered peptide fragments, against a DNA database CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  49. FASTA Programs • FASTS – compares a set of ordered peptide fragments, obtained from mass-spectometry analysis of a protein, against a protein database. • TFASTS – compares a set of ordered peptide fragments,, against a DNA database. >mgstm1 MGCEN,MIDYP,MLLAY,MLLGY CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

  50. FASTA Programs • FASTX, FASTY – compares a query DNA sequence to a protein sequence database • DNA sequence translated in all six reading frames • frameshifts allowed CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka

More Related