
Group Feature Extraction Based on Multiple Indexing Sequence Alignment

Dr. Tun-Wen Pai, Dept. of Computer Science and Engineering, National Taiwan Ocean University, 2006.10.30


Presentation Transcript


  1. Group Feature Extraction Based on Multiple Indexing Sequence Alignment (Chinese title: 多重索引序列排比應用於群組特徵擷取) • Dr. Tun-Wen Pai, Dept. of Computer Science and Engineering, National Taiwan Ocean University • 2006.10.30

  2. Central idea: finding short approximate patterns • Motivation: finding ordered combinatorial features • Objectives: • constructing evolutionary relationships • providing key features for structural alignment

  3. System Architecture

  4. Motif finding • short consensus motifs that allow two kinds of tolerance • variable-site tolerance: the positions of the tolerated sites within a pattern can vary • substitutable tolerance: a residue in a pattern can be substituted by a residue with similar chemical properties

  5. Variable-site tolerance • exploits the uniqueness and efficient lookup of hashing techniques • each original pattern maps to a unique numeric value • patterns are compared using a hash table structure
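The hashing idea on this slide can be sketched roughly as follows. This is only an illustration, not the MISA implementation: the function names, the window length `k`, the tolerance `max_wild`, the use of `*` for a tolerated site, and the use of a Python dict as the hash table are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def masked_variants(kmer, max_wild):
    """Yield kmer itself plus every variant with up to max_wild sites replaced by '*'."""
    for n_wild in range(max_wild + 1):
        for sites in combinations(range(len(kmer)), n_wild):
            chars = list(kmer)
            for i in sites:
                chars[i] = "*"  # tolerated (variable) site
            yield "".join(chars)


def index_patterns(sequences, k=5, max_wild=1):
    """Hash table: masked pattern -> set of indices of sequences containing it."""
    table = defaultdict(set)
    for idx, seq in enumerate(sequences):
        for start in range(len(seq) - k + 1):
            for pattern in masked_variants(seq[start:start + k], max_wild):
                table[pattern].add(idx)
    return table


def consensus_patterns(sequences, k=5, max_wild=1):
    """Patterns that occur in every sequence, found by hash lookup alone."""
    table = index_patterns(sequences, k, max_wild)
    return sorted(p for p, hits in table.items() if len(hits) == len(sequences))


if __name__ == "__main__":
    # Toy sequences invented for the example.
    print(consensus_patterns(["AACKNGQAA", "TTCKNGKTT"]))
```

Because each masked pattern is a dictionary key, two windows that differ only at tolerated sites fall into the same bucket, so no pairwise window comparison is needed.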

  6. Substitutable tolerance • depends on the chemical properties of residues • substitution matrix: BLOSUM62 • bitwise clustering avoids misjudging two dissimilar residues as substitutable
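One way to picture bitwise clustering is to give each residue a bitmask of the property groups it belongs to and to treat two residues as substitutable only when their masks share a bit. The sketch below assumes a hypothetical grouping chosen for illustration; it is not the clustering derived from BLOSUM62 in the presentation, and the names `GROUPS`, `MASK`, `substitutable`, and `patterns_match` are invented here.

```python
# Hypothetical residue groups (an assumption for this example only).
GROUPS = ["AVLIM", "FWY", "KRH", "DE", "STNQ", "C", "G", "P"]

# One bit per group; OR-ing lets a residue belong to several groups if desired.
MASK = {}
for bit, group in enumerate(GROUPS):
    for res in group:
        MASK[res] = MASK.get(res, 0) | (1 << bit)


def substitutable(a, b):
    """True if residues a and b share at least one property group."""
    return bool(MASK[a] & MASK[b])


def patterns_match(p, q):
    """True if p and q have equal length and every aligned residue pair is substitutable."""
    return len(p) == len(q) and all(substitutable(a, b) for a, b in zip(p, q))


if __name__ == "__main__":
    print(patterns_match("IVACE", "VVACD"))    # I/V and E/D share a group
    print(patterns_match("VPVHFD", "VPVHLD"))  # F and L are in different groups here
```

A single bitwise AND per position decides substitutability, and residues from different groups can never be merged by accident.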

  7. Hierarchical clustering • reveals phylogenetic relationships • the more consensus motifs two sequences share, the more similar they are • a scoring matrix records the pairwise similarities
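As a rough illustration of this step, the sketch below scores each pair of sequences by the number of motifs they share and then merges the most similar clusters first. The slides do not state the linkage rule or the exact score, so average linkage and a raw shared-motif count are assumptions, as are the function names and the toy data.

```python
from itertools import combinations


def similarity_matrix(motif_sets):
    """Pairwise score = number of motifs two sequences have in common."""
    names = sorted(motif_sets)
    return {(a, b): len(motif_sets[a] & motif_sets[b])
            for a, b in combinations(names, 2)}


def cluster(motif_sets):
    """Average-linkage agglomerative clustering; returns a nested-tuple tree."""
    sim = similarity_matrix(motif_sets)
    # Each cluster is (tree, members).
    clusters = [(name, (name,)) for name in sorted(motif_sets)]

    def score(m1, m2):
        total = sum(sim[tuple(sorted((a, b)))] for a in m1 for b in m2)
        return total / (len(m1) * len(m2))

    while len(clusters) > 1:
        # Pick the pair of clusters with the highest average similarity.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: score(clusters[ij[0]][1], clusters[ij[1]][1]))
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


if __name__ == "__main__":
    # Toy motif sets invented for the example.
    toy = {"s1": {"m1", "m2", "m3"}, "s2": {"m1", "m2", "m4"}, "s3": {"m3", "m5"}}
    print(cluster(toy))
```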

  8. Exclusive Group Feature Extraction • Removing common motifs occurring in other subgroups • CP: combinatorial patterns • ECP: exclusive combinatorial patterns
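Read as set operations, the exclusive patterns of a subgroup are its combinatorial patterns minus everything that also occurs in any other subgroup. The following is a minimal sketch under that reading; the function name, group labels, and pattern strings are invented for the example.

```python
def exclusive_patterns(cp_by_group):
    """For each group, keep only patterns that appear in no other group."""
    ecp = {}
    for group, patterns in cp_by_group.items():
        # Union of the patterns of all other groups.
        others = set().union(
            *(p for g, p in cp_by_group.items() if g != group)
        )
        ecp[group] = set(patterns) - others
    return ecp


if __name__ == "__main__":
    # Toy combinatorial patterns (CP), not taken from the presentation.
    cp = {
        "G1": {"AB*C", "K**R", "GG*P"},
        "G2": {"K**R", "TT*S"},
        "G3": {"GG*P", "WW*Y"},
    }
    print(exclusive_patterns(cp))
```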

  9. Background Model Analysis • Verifies how conspicuous a feature is • Hit ratio close to 0: the feature is unique • Hit ratio relatively large: the feature is insignificant
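The slides do not define the hit ratio, so the sketch below assumes it is the fraction of background sequences in which a pattern occurs, with `*` matching exactly one residue. The function names and the toy background set are hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Translate a motif such as 'CK*G' into a regex where '*' matches one residue."""
    return re.compile("".join("." if c == "*" else re.escape(c) for c in pattern))


def hit_ratio(pattern, background):
    """Fraction of background sequences that contain the pattern at least once."""
    if not background:
        raise ValueError("background set must not be empty")
    rx = pattern_to_regex(pattern)
    hits = sum(1 for seq in background if rx.search(seq))
    return hits / len(background)


if __name__ == "__main__":
    # Toy background sequences invented for the example.
    background = ["ACKNGQ", "TTTT", "GGKNG", "CKAGA"]
    print(hit_ratio("CK*G", background))
    print(hit_ratio("WWW", background))
```

Under this assumption, a feature with a ratio near 0 rarely appears outside its group, while a large ratio suggests the feature is too common to be informative.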

  10. The combinatorial features of RNase A-like superfamily extracted by MISA

  11. The combinatorial features of RNase A-like superfamily extracted by MISA (cont.) • The known H-K-H active sites are identified exactly

  12. The combinatorial features of RNase A-like superfamily extracted by ClustalW • The first H was misaligned

  13. The combinatorial features of RNase A-like superfamily extracted by ClustalW • The first H was misaligned

  14. The combinatorial features of RNase A-like superfamily extracted by ClustalW (cont.)

  15. The combinatorial features of RNase A-like superfamily extracted by MEME

  16. The combinatorial features of RNase A-like superfamily extracted by MEME (cont.) • The first ‘H’ was not successfully detected

  17. The combinatorial features of RNase A-like superfamily extracted by Gibbs Sampler

      Seq, Site  Type  Left  Motif element       Right  Prob.  F/R  Sequence
      1, 1       1     65    qekvt CKNGQ gncyk   69     1.00   F    1E21:A
      1, 2       0     107   kerhi IVACE gspyv   111    1.00   F    1E21:A
      1, 3       2     116   egspy VPVHFD asved  121    1.00   F    1E21:A
      2, 1       1     38    nyqrr CKNQN tfllt   42     1.00   F    1GQV:A
      2, 2       0     109   anmfy IVACD nrdqr   113    1.00   F    1GQV:A
      2, 3       2     127   pqypv VPVHLD rii    132    1.00   F    1GQV:A
      3, 1       1     37    nyrwr CKNQN tflrt   41     1.00   F    1DYT:A
      3, 2       0     108   grrfy VVACD nrdpr   112    1.00   F    1DYT:A
      3, 3       2     125   prypv VPVHLD tti    130    1.00   F    1DYT:A
      4, 1       1     65    ttniq CKNGK mnche   69     1.00   F    1RNF:A
      4, 2       0     105   strrv VIACE gnpqv   109    1.00   F    1RNF:A
      4, 3       2     114   egnpq VPVHFD g      119    1.00   F    1RNF:A
      5, 1       1     59    kaice NKNGN phren   63     1.00   F    1B1I:A
      5, 2       0     104   gfrnv VVACE nglpv   108    1.00   F    1B1I:A
      5, 3       2     111   aceng LPVHLD qsifr  116    1.00   F    1B1I:A

      15 motifs

      Column 1: Sequence Number, Site Number • Column 2: Motif type • Column 3: Left End Location • Column 4: Motif Element • Column 5: Right End Location • Column 6: Probability of Element • Column 7: Forward Motif (F) or Reverse Complement (R) • Column 8: Sequence Description from FASTA input

  18. The combinatorial features of RNase A-like superfamily extracted by Gibbs Sampler (cont.) • The first ‘H’ was not successfully detected • The motif colored in red is wrong

  19. Comparison of Average RMSD and Aligned Residues (using a straightforward structure alignment) • The lowest average RMSD • The highest average number of aligned residues

  20. MISA for primate map1b upstream sequences • Ref: D. Liu and I. Fischer, “Structural analysis of the proximal region of the microtubule-associated protein 1B promoter”, J Neurochem, 1997, 69: pp. 910-919

  21. MISA for primate hspa2

  22. Hierarchical clustering for p450 family 1 • It can be clustered into three subfamilies

  23. Combinatorial features for subfamily 1A

  24. Combinatorial features for subfamily 1B

  25. Combinatorial features for subfamily 1C

  26. Exclusive group features for p450 family 1
      • cytochrome P450 subfamily 1A: ^ E*L*A ^ *PK*L* ^ *W*ARR*LA* ^ L**FS ^ *SC*LEEH*S*E ^ G*F*P ^ *V*SV*NVI ^ *DF*P*LR*LP* ^ **EHY**F ^ **DIT**L ^ **ELD** ^ R*P*LS
      • cytochrome P450 subfamily 1B: ^ F*R*A ^ WK**R ^ R*F*T ^ **RYP**Q*R*Q ^ DQ**LP ^ G**NK*L* ^ **HQC** ^ **LLD**
      • cytochrome P450 subfamily 1C: ^ SI**EWSG**QPAL*A*F ^ **EAC*W* ^ F**YSKQW**HRK*AQS**RAFS*AN*QT* ^ EA**LV**FL ^ F*P*HE*T ^ N**FF**V**KV**HR ^ W**LL ^ *AK*RG*
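A feature list like the ones on this slide could be checked against a new sequence along the following lines. The notation is not explained in the transcript, so this sketch assumes that `^` separates consecutive motifs, that the motifs must occur in the listed order without overlapping, and that `*` matches exactly one residue. The function names and the test sequence are invented for the example.

```python
import re


def parse_feature(line):
    """Split a feature line such as '^ F*R*A ^ WK**R' into its motifs."""
    return [m.strip() for m in line.split("^") if m.strip()]


def matches_in_order(motifs, sequence):
    """True if every motif occurs in the sequence, in order and without overlap."""
    pos = 0
    for motif in motifs:
        rx = re.compile("".join("." if c == "*" else re.escape(c) for c in motif))
        m = rx.search(sequence, pos)  # leftmost match at or after pos
        if m is None:
            return False
        pos = m.end()
    return True


if __name__ == "__main__":
    motifs = parse_feature("^ F*R*A ^ WK**R")
    print(motifs)
    # Hypothetical sequence, not a real P450 sequence.
    print(matches_in_order(motifs, "MFARGAGGWKLLRTT"))
```

Since every motif has a fixed length, taking the leftmost match at each step is enough to decide whether an ordered, non-overlapping placement exists.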
