DNA Sequence Analysis. David Shiuan, Department of Life Science, Institute of Biotechnology and Interdisciplinary Program of Bioinformatics, National Dong Hwa University.
DNA Sequence Analysis (I) • BLAST comparison • ORF (open reading frame) Finder • Promoter Search - Promoter Prediction (BCM) - EPD (Eukaryotic Promoter Database) - NNPP prokaryote promoter prediction (BCM) - ProtScan (BIMAS)
DNA Sequence Analysis (II) • Sequence Alignment (ClustalW) • Tree Analysis (MEGA, PAUP, UPGMA) • Motif Prediction • Restriction Analysis (TCGA) • RNAFOLD (GCG)
Basic Local Alignment Search Tool (BLAST) • A sequence comparison algorithm, optimized for speed, used to search sequence databases for high-scoring local alignments to a query. Because it is heuristic, it approximates rather than guarantees the optimal local alignment. • Algorithm: a fixed procedure embodied in a computer program.
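For comparison, the exact method for finding an optimal local alignment is Smith-Waterman dynamic programming, which BLAST trades away for speed. The sketch below is a minimal, score-only version under stated assumptions: the match/mismatch/gap values are hypothetical, gaps use a simple linear penalty, and it returns the best score with its end coordinates but no traceback.

```python
def smith_waterman(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Return (best_score, end_i, end_j) of the optimal local alignment of a and b.

    Scores are hypothetical; end_i and end_j are 1-based positions in a and b.
    Only two rows of the DP table are kept, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = (0, 0, 0)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below 0 (start a fresh alignment).
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            if cur[j] > best[0]:
                best = (cur[j], i, j)
        prev = cur
    return best


print(smith_waterman("ACGT", "TTACGTTT"))  # (4, 4, 6)
```

Filling the whole table costs time proportional to the product of the two lengths for every database sequence, which is why BLAST first looks for short word hits and only extends around them.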
Basic Local Alignment Search Tool • The initial search looks for words of length "W" that score at least "T" when compared to the query using a substitution matrix. Word hits are then extended in both directions in an attempt to generate an alignment with a score exceeding the threshold "S". The "T" parameter dictates the speed and sensitivity of the search: a higher T yields fewer word hits, so the search runs faster but may miss weaker similarities.
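The seed-and-extend steps above can be sketched as follows. This is a simplified illustration, not NCBI's implementation: it compares every pair of length-w words directly instead of using BLAST's precomputed neighborhood lookup table, extends without gaps, and stops an extension once the running score falls more than a hypothetical `x_drop` below the best score seen. The +1/-1 DNA scoring and all default parameter values are assumptions chosen for the example; `w`, `t`, and `s` play the roles of W, T, and S.

```python
from typing import Callable, List, Tuple

# A scoring function maps a pair of residues to a score (a substitution matrix lookup).
Score = Callable[[str, str], int]
Hsp = Tuple[int, int, int, int]  # (query_start, subject_start, length, score)


def simple_dna_score(a: str, b: str) -> int:
    """Hypothetical scoring scheme: +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1


def word_hits(query: str, subject: str, w: int, t: int, score: Score) -> List[Tuple[int, int]]:
    """Return every (i, j) where query[i:i+w] and subject[j:j+w] score at least t."""
    hits = []
    for i in range(len(query) - w + 1):
        for j in range(len(subject) - w + 1):
            if sum(score(query[i + k], subject[j + k]) for k in range(w)) >= t:
                hits.append((i, j))
    return hits


def extend_hit(query: str, subject: str, i: int, j: int, w: int,
               score: Score, x_drop: int) -> Hsp:
    """Extend a word hit in both directions without gaps, stopping once the
    running score falls more than x_drop below the best score seen so far."""
    seed = sum(score(query[i + k], subject[j + k]) for k in range(w))

    # Rightward extension, starting just past the end of the word.
    best_r = run = right_len = k = 0
    while i + w + k < len(query) and j + w + k < len(subject):
        run += score(query[i + w + k], subject[j + w + k])
        k += 1
        if run > best_r:
            best_r, right_len = run, k
        elif best_r - run > x_drop:
            break

    # Leftward extension, starting just before the word.
    best_l = run = left_len = k = 0
    while i - k - 1 >= 0 and j - k - 1 >= 0:
        run += score(query[i - k - 1], subject[j - k - 1])
        k += 1
        if run > best_l:
            best_l, left_len = run, k
        elif best_l - run > x_drop:
            break

    return (i - left_len, j - left_len, left_len + w + right_len, seed + best_l + best_r)


def ungapped_hsps(query: str, subject: str, w: int = 4, t: int = 4, s: int = 8,
                  x_drop: int = 3, score: Score = simple_dna_score) -> List[Hsp]:
    """Seed with word hits scoring >= t, extend them, keep extensions scoring >= s."""
    found = set()
    for i, j in word_hits(query, subject, w, t, score):
        hsp = extend_hit(query, subject, i, j, w, score, x_drop)
        if hsp[3] >= s:
            found.add(hsp)  # hits on the same diagonal often extend to the same HSP
    return sorted(found)


print(ungapped_hsps("ACGTACGTTTGCA", "GGACGTACGTTTGCAGG"))  # [(0, 2, 13, 13)]
```

Lowering `t` admits more seeds (for example, words with one mismatch), which raises sensitivity at the cost of more extension work, matching the speed/sensitivity trade-off described for T.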
BLOSUM62 Substitution Scoring Matrix • The BLOSUM62 matrix shown here is a 20 x 20 matrix in which every possible identity and substitution is assigned a score based on the observed frequencies of such occurrences in alignments of related proteins. • Identities are assigned the most positive scores.
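To make the matrix lookup concrete, the sketch below scores a gap-free aligned segment pair. It hard-codes only an excerpt of BLOSUM62 (the 20 identity scores and a few conservative substitutions), so any other pair raises an error; for real work the full matrix could be loaded, for instance with Biopython's `substitution_matrices.load("BLOSUM62")`.

```python
# Excerpt of BLOSUM62 (not the full 20 x 20 table): the 20 identity scores
# plus a few conservative substitutions.
BLOSUM62_IDENTITY = {
    "A": 4, "R": 5, "N": 6, "D": 6, "C": 9, "Q": 5, "E": 5, "G": 6, "H": 8, "I": 4,
    "L": 4, "K": 5, "M": 5, "F": 6, "P": 7, "S": 4, "T": 5, "W": 11, "Y": 7, "V": 4,
}
BLOSUM62_SUBSTITUTIONS = {
    frozenset("IV"): 3, frozenset("IL"): 2, frozenset("LM"): 2, frozenset("KR"): 2,
    frozenset("DE"): 2, frozenset("EQ"): 2, frozenset("FY"): 3, frozenset("ST"): 1,
}


def blosum62(a: str, b: str) -> int:
    """Look up a score; the matrix is symmetric, so (a, b) scores like (b, a)."""
    if a == b:
        return BLOSUM62_IDENTITY[a]
    pair = frozenset((a, b))
    if pair not in BLOSUM62_SUBSTITUTIONS:
        raise KeyError(f"{a}/{b} is not in this BLOSUM62 excerpt")
    return BLOSUM62_SUBSTITUTIONS[pair]


def ungapped_score(seq1: str, seq2: str) -> int:
    """Sum the substitution scores of two equal-length, gap-free aligned segments."""
    if len(seq1) != len(seq2):
        raise ValueError("segments must have equal length")
    return sum(blosum62(a, b) for a, b in zip(seq1, seq2))


print(ungapped_score("KIDF", "KIDF"))  # 21: identities only
print(ungapped_score("KIDF", "RVEY"))  # 10: conservative substitutions only
```

The identical segment outscores the conservatively substituted one, reflecting that identities receive the most positive scores.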
The NCBI BLAST family of programs • blastp compares an amino acid query sequence against a protein sequence database • blastn compares a nucleotide query sequence against a nucleotide sequence database • blastx compares a nucleotide query sequence translated in all reading frames against a protein sequence database • tblastn compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames • tblastx compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
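blastx and tblastx depend on translating nucleotide sequences in all six reading frames (three offsets on each strand). A minimal sketch using the standard genetic code (NCBI translation table 1) follows; the "+1..+3 / -1..-3" frame labels, mapping unknown codons to "X", and dropping a trailing partial codon are choices made for this illustration.

```python
# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse (non-ACGT characters are left as-is)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate whole codons: '*' marks a stop, unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict:
    """Frames +1..+3 start at offsets 0..2 of the sequence; -1..-3 at offsets
    0..2 of its reverse complement."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


print(six_frames("ATGGCCTAA"))  # '+1' is 'MA*' (Met-Ala-stop); '-1' is 'LGH'
```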
Peptide Sequence Databases for BLAST search • nr • All non-redundant GenBank CDS translations + PDB + SwissProt + PIR + PRF • month • All new or revised GenBank CDS translations + PDB + SwissProt + PIR + PRF released in the last 30 days. • swissprot • Last major release of the SWISS-PROT protein sequence database (no updates)
E-value for the score S • The expected number of HSPs with score at least S is given by the formula $E = K m n e^{-\lambda S}$ • HSP: high-scoring segment pair • m and n: the lengths of the query and database sequences • K and $\lambda$: statistical parameters that depend on the scoring system
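The formula can be evaluated directly. In the sketch below, K = 0.1 and λ = 0.3 are hypothetical round numbers chosen only for illustration; real values depend on the substitution matrix and gap costs. It also computes the normalized (bit) score $S' = (\lambda S - \ln K)/\ln 2$, which gives the equivalent form $E = m n 2^{-S'}$.

```python
import math


def e_value(score: float, m: int, n: int, k: float, lam: float) -> float:
    """Expected number of HSPs with score >= `score`: E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * score)


def bit_score(score: float, k: float, lam: float) -> float:
    """Normalized score S' = (lambda * S - ln K) / ln 2, so that E = m * n * 2**(-S')."""
    return (lam * score - math.log(k)) / math.log(2)


# Hypothetical parameters for illustration only (not values for any real matrix).
K, LAM = 0.1, 0.3
m, n = 300, 1_000_000  # query length and database length

for s in (40, 60, 80):
    print(s, e_value(s, m, n, K, LAM), bit_score(s, K, LAM))
# At S = 60 the E-value is about 0.46.
```

Because E falls exponentially in S, each 10-point rise in S here divides E by $e^{3} \approx 20$, while doubling the database length n doubles E.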
Promoter Search • ProtScan (at BIMAS) • EPD (Eukaryotic Promoter Database) • Promoter Prediction (BCM) • NNPP (Prokaryote Promoter Prediction at BCM)