Home Work
I. Running BLAST with BioPerl
Input:
1) Sequence or accession number.
2) Threshold (E-value cutoff).
Output:
1) BLAST results: sequence names, alignment score, E-value.
2) Next to each result, provide a link that redirects to Pairwise Alignment (from the previous exercise). The Pairwise Alignment page should be pre-filled with the two sequences: first the original sequence, second the sequence selected from the BLAST run.
* You should also submit a data-flow diagram with BioPerl class names.
Home Work (continued)
• Doc: BioPerl tutorial, section III.4.1, "Running BLAST remotely (using RemoteBlast.pm)".
• Use the sleep function between attempts to retrieve the results.
• Data-flow diagram example for retrieving a sequence:
  $gb = new Bio::DB::GenBank();
  $seq = $gb->get_Seq_by_acc('AF303112');
  print $seq->seq();
  GenBank -> get_Seq_by_acc('AF303112') -> Seq -> seq() -> string
Home Work (continued)
II. Translate a PROSITE pattern into a Perl regular expression.
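One way to approach the translation is sketched below. It assumes the usual PROSITE conventions: elements separated by '-', 'x' for any residue, [..] for allowed residues, {..} for forbidden residues, (n) or (n,m) for repetition, '<' and '>' for the sequence ends, and a terminating period. The function name prosite_to_regex is hypothetical, and rarer constructs (such as an anchor inside square brackets) are not handled.

```perl
use strict;
use warnings;

# Translate a PROSITE pattern into a Perl regular expression string.
# Sketch only: covers the common syntax elements listed in the lead-in.
sub prosite_to_regex {
    my ($pattern) = @_;
    $pattern =~ s/\s+//g;      # remove any whitespace
    $pattern =~ s/\.\z//;      # drop the terminating period
    my $regex = '';
    for my $element (split /-/, $pattern) {
        my $e = $element;
        $e =~ s/\A</^/;                        # '<' : N-terminal anchor
        $e =~ s/>\z/\$/;                       # '>' : C-terminal anchor
        $e =~ s/\{([A-Z]+)\}/[^$1]/g;          # {ED}  -> [^ED]
        $e =~ s/x/./g;                         # x     -> .
        $e =~ s/\((\d+(?:,\d+)?)\)/{$1}/g;     # (4)   -> {4}, (2,4) -> {2,4}
        $regex .= $e;
    }
    return $regex;
}

my $regex = prosite_to_regex('[AC]-x-V-x(4)-{ED}.');
print "$regex\n";                               # prints [AC].V.{4}[^ED]
print "match\n" if 'AAVAAAAK' =~ /$regex/;      # prints match
```

The forbidden-residue braces are rewritten before the repetition counts, so the braces produced for {4} are never mistaken for a PROSITE exclusion set.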
Profile Analysis
M. Gribskov, D. Eisenberg. Profile analysis: detection of distantly related proteins by sequence comparison.
The information in a group of aligned sequences is expressed in a position-specific scoring table (profile).
Profiles
[Figure: rows Seq1, Seq2, Seq3, Seq4 of a multiple alignment]
Profile alignment
• Sequence-Profile Alignment.
• Profile-Profile Alignment.
Both use Dynamic Programming (the same idea as in Pairwise Sequence Alignment).
Reminder: Pairwise Sequence Alignment
• The position-specific gap coefficients penalize gaps in conserved regions more heavily than gaps in more variable regions.
Sequence-Profile alignment:
• S(x,j): score for aligning character 'x' with column 'j':
  S(x,j) = Σ_y σ(x,y) p(y,j) / p(y)
• σ(x,y): any regular score for Pairwise Alignment (PAM-k, BLOSUM-k, …)
• p(y,j): frequency with which character y appears in column 'j' of the multiple alignment
• p(y): frequency with which character y appears anywhere in the sequences of the multiple alignment
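The score S(x,j) can be computed directly from a multiple alignment, as in the following minimal sketch. The function names (sigma, frequencies, profile_score) are hypothetical, σ is a toy +1/-1 score standing in for a PAM or BLOSUM matrix, and gap characters are simply skipped when counting; all of these are assumptions made for illustration.

```perl
use strict;
use warnings;

# Toy substitution score (assumption): +1 for a match, -1 for a mismatch.
# In practice sigma(x,y) would come from a PAM or BLOSUM matrix.
sub sigma {
    my ($x, $y) = @_;
    return $x eq $y ? 1 : -1;
}

# Compute p(y,j) for every column and p(y) over the whole alignment.
# Gap characters ('-') are skipped (an assumption of this sketch).
sub frequencies {
    my @alignment = @_;
    my (@column, %background);
    my $total = 0;
    for my $row (@alignment) {
        my @chars = split //, $row;
        for my $j (0 .. $#chars) {
            next if $chars[$j] eq '-';
            $column[$j]{ $chars[$j] }++;
            $background{ $chars[$j] }++;
            $total++;
        }
    }
    # Turn counts into frequencies.
    for my $counts (grep { defined $_ } @column) {
        my $n = 0;
        $n += $_ for values %$counts;
        $counts->{$_} /= $n for keys %$counts;
    }
    $background{$_} /= $total for keys %background;
    return (\@column, \%background);
}

# S(x,j) = sum over y of sigma(x,y) * p(y,j) / p(y)
sub profile_score {
    my ($x, $j, $column, $background) = @_;
    my $counts = $column->[$j] || {};
    my $score  = 0;
    for my $y (keys %$counts) {
        $score += sigma($x, $y) * $counts->{$y} / $background->{$y};
    }
    return $score;
}

my ($column, $background) = frequencies(qw(AC AC AC CA));
printf "S(A,0) = %.2f\n", profile_score('A', 0, $column, $background);   # S(A,0) = 1.00
printf "S(C,0) = %.2f\n", profile_score('C', 0, $column, $background);   # S(C,0) = -1.00
```

Only characters that actually occur in column j contribute to the sum, since p(y,j) is zero for all others.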
Profiles in GCG
• PileUp creates a multiple sequence alignment from a group of related sequences.
• ProfileMake makes a profile from a multiple sequence alignment.
• ProfileSearch uses the profile to search a database for sequences with similarity to the group of aligned sequences.
• ProfileSegments displays optimal alignments between each sequence in the ProfileSearch output list and the group of aligned sequences (represented by the profile consensus).
• ProfileGap makes optimal alignments between one or more sequences and a group of aligned sequences represented as a profile.
• ProfileScan uses a database of profiles to find structural and sequence motifs in protein sequences.
Iterative profile pairwise alignment
1. Align some pair.
2. While (not done):
   (a) Pick an unaligned string which is "near" some aligned one(s).
   (b) Align it with the profile of the previously aligned group.
   Resulting new spaces are inserted into all strings in the group.
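The final step, inserting the resulting new spaces into all strings of the group, can be sketched as below. The function name insert_gap_columns and its input convention (a list of column indexes, in the coordinates of the group before insertion, at which the profile alignment opened a gap) are hypothetical; the dynamic-programming alignment that would produce those indexes is not shown.

```perl
use strict;
use warnings;

# Open a gap column before each given column index in every string of an
# already-aligned group. Indexes refer to the group before insertion.
sub insert_gap_columns {
    my ($group, $positions) = @_;
    my @updated;
    for my $row (@$group) {
        my $copy = $row;
        # Work from right to left so that earlier indexes stay valid.
        for my $pos (sort { $b <=> $a } @$positions) {
            substr($copy, $pos, 0, '-');
        }
        push @updated, $copy;
    }
    return @updated;
}

my @group = insert_gap_columns([ 'ACGT', 'A-GT' ], [2]);
print "$_\n" for @group;    # prints AC-GT then A--GT
```

Because every string receives the same gap columns, the group stays aligned and existing gaps are never removed.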
Progressive Profile Alignment
ClustalW (algorithm of Thompson, Higgins, Gibson 1994). The idea is close to Feng-Doolittle 1987, implemented in PileUp (GCG package).
1. Calculate the pairwise alignment scores and convert them to distances.
2. Use a neighbor-joining algorithm to build a tree from the distances.
3. Align sequence to sequence, sequence to profile, and profile to profile in order of decreasing similarity.
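Step 1 and the ordering idea behind step 3 can be illustrated with a small sketch. It assumes the pairwise alignments are already available and takes the distance to be 1 minus the fraction of identical positions, which is one common choice and not necessarily the exact conversion used by ClustalW. The function names are hypothetical, and the neighbor-joining tree of step 2 is not built here: the pairs are simply sorted by distance.

```perl
use strict;
use warnings;

# Distance between two already-aligned sequences of equal length:
# 1 - (identical positions / positions where neither sequence has a gap).
sub identity_distance {
    my ($s, $t) = @_;
    my ($same, $compared) = (0, 0);
    for my $i (0 .. length($s) - 1) {
        my $c = substr($s, $i, 1);
        my $d = substr($t, $i, 1);
        next if $c eq '-' or $d eq '-';
        $compared++;
        $same++ if $c eq $d;
    }
    return $compared ? 1 - $same / $compared : 1;
}

# Input items: [name1, name2, aligned1, aligned2].
# Returns [name1, name2, distance] triples, most similar pair first.
sub pairs_by_similarity {
    my @pairwise = @_;
    my @scored;
    for my $p (@pairwise) {
        my ($name1, $name2, $aln1, $aln2) = @$p;
        push @scored, [ $name1, $name2, identity_distance($aln1, $aln2) ];
    }
    return sort { $a->[2] <=> $b->[2] } @scored;
}

my @order = pairs_by_similarity(
    [ 'seq1', 'seq2', 'ACGTACGT', 'ACGTACGA' ],
    [ 'seq1', 'seq3', 'ACGTACGT', 'AC-TTCGA' ],
    [ 'seq2', 'seq3', 'ACGTACGA', 'AC-TTCGA' ],
);
printf "%s - %s: %.3f\n", @$_ for @order;
# seq1 - seq2: 0.125
# seq2 - seq3: 0.143
# seq1 - seq3: 0.286
```

In a full progressive alignment the join order would follow the guide tree rather than a flat sorted list; the sorted list only shows which pair would be aligned first.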