Group Feature Extraction Based on Multiple Indexing Sequence Alignment. Dr. Tun-Wen Pai, Dept. of Computer Science and Engineering, National Taiwan Ocean University. 2006.10.30.
Central idea: finding short approximate patterns
• Motivation: finding ordered combinatorial features
• Objectives:
  • constructing evolutionary relationships
  • providing key features for structural alignment
Motif finding
• short consensus motifs that allow two kinds of tolerance
• variable-site tolerance: the tolerated sites in a pattern can vary
• substitutable tolerance: residues with similar chemical properties can substitute for one another in a pattern
Variable-site tolerance
• exploiting the uniqueness and efficient lookup of hashing techniques
• each original pattern is mapped to a unique digital value
• patterns are compared using a hash table structure
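
The hashing idea above can be sketched as follows. This is a minimal illustration, not the MISA implementation: the base-21 positional code, the use of `*` for a tolerated site, and the names `encode`, `masked_variants`, and `index_sequences` are all assumptions made for this example. It also assumes fixed-length patterns over the 20 standard amino acids.

```python
from collections import defaultdict
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: digit 0 is reserved for the wildcard '*'; residues map to 1..20.
DIGIT = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
DIGIT["*"] = 0
BASE = len(DIGIT)  # 21 symbols


def encode(pattern):
    """Map a fixed-length pattern to an integer (base-21 positional code).

    For a fixed pattern length the mapping is one-to-one.
    """
    value = 0
    for ch in pattern:
        value = value * BASE + DIGIT[ch]
    return value


def masked_variants(window, max_wildcards):
    """Yield the window with every choice of up to max_wildcards tolerated sites."""
    for n in range(max_wildcards + 1):
        for sites in combinations(range(len(window)), n):
            yield "".join("*" if i in sites else ch for i, ch in enumerate(window))


def index_sequences(sequences, length, max_wildcards):
    """Build a hash table: encoded pattern -> set of sequence ids containing it."""
    table = defaultdict(set)
    for seq_id, seq in sequences.items():
        for start in range(len(seq) - length + 1):
            window = seq[start:start + length]
            for variant in masked_variants(window, max_wildcards):
                table[encode(variant)].add(seq_id)
    return table


# Toy input: two short fragments, patterns of length 4 with at most one tolerated site.
sequences = {"a": "CKNGQ", "b": "CKNQN"}
table = index_sequences(sequences, length=4, max_wildcards=1)
shared = table[encode("CKN*")]  # ids of sequences that contain C-K-N-any
```

Patterns shared by several sequences then show up as hash-table entries whose id set has more than one member, so no pairwise string comparison is needed.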
Substitutable tolerance
• depends on the chemical properties of residues
• substitution matrix: BLOSUM62
• bitwise clustering avoids misjudging two dissimilar residues
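
One way to picture bitwise clustering is to give every residue a bitmask of the groups it belongs to and to treat two residues as substitutable only when their masks share a bit. The sketch below makes that concrete. The residue groups in `GROUPS` are a common textbook-style grouping chosen for illustration; they are an assumption, not the clusters derived from BLOSUM62 in the talk, and the function names are hypothetical.

```python
# Assumption: illustrative residue groups (aliphatic, aromatic, basic,
# acidic, polar, small, cysteine, proline). Not taken from the slides.
GROUPS = ["ILVM", "FWY", "KRH", "DE", "STNQ", "AG", "C", "P"]

# Each residue gets one bit per group it belongs to.
MASK = {}
for bit, group in enumerate(GROUPS):
    for aa in group:
        MASK[aa] = MASK.get(aa, 0) | (1 << bit)


def substitutable(a, b):
    """Two residues are substitutable if their masks share at least one bit."""
    return (MASK[a] & MASK[b]) != 0


def patterns_match(p, q):
    """Equal-length patterns match if every aligned residue pair is substitutable."""
    return len(p) == len(q) and all(substitutable(a, b) for a, b in zip(p, q))


print(patterns_match("IVACE", "VVACD"))  # I/V and E/D fall in the same groups
print(patterns_match("CKNGQ", "NKNGN"))  # C and N share no group
```

A single bitwise AND decides each residue pair, and two residues from different groups can never be judged similar by accident.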
Hierarchical clustering
• reveals phylogenetic relationships
• two sequences that share more consensus motifs are more similar
• a scoring matrix holds the pairwise similarities
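
A small sketch of how shared motifs can drive a hierarchical clustering. The slide only states that more shared motifs means more similar; the Jaccard score, the average-linkage rule, and the names `jaccard` and `cluster` are assumptions made for this example.

```python
from itertools import combinations


def jaccard(a, b):
    """Similarity of two motif sets: shared motifs divided by all motifs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(motifs_by_seq):
    """Average-linkage agglomerative clustering.

    Returns the merges in order as (cluster_a, cluster_b, similarity) tuples.
    """
    names = sorted(motifs_by_seq)
    # Scoring matrix of pairwise similarities, keyed by unordered name pair.
    sim = {
        frozenset(pair): jaccard(motifs_by_seq[pair[0]], motifs_by_seq[pair[1]])
        for pair in combinations(names, 2)
    }

    def linkage(c1, c2):
        return sum(sim[frozenset((x, y))] for x in c1 for y in c2) / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Toy motif sets (hypothetical labels, not data from the talk).
motifs = {
    "s1": {"m1", "m2", "m3"},
    "s2": {"m1", "m2", "m3", "m4"},
    "s3": {"m5", "m6"},
    "s4": {"m5", "m6", "m7"},
}
for a, b, score in cluster(motifs):
    print(a, b, round(score, 3))
```

Reading the merges from first to last gives the tree from the leaves upward: the most similar sequences join first, and unrelated groups join last with a low score.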
Exclusive Group Feature Extraction
• Removing common motifs that also occur in other subgroups
• CP: combinatorial patterns
• ECP: exclusive combinatorial patterns
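
Read literally, the exclusive patterns of a subgroup are its combinatorial patterns minus every pattern that also occurs in another subgroup. The sketch below expresses that as a set difference. The function name, the group labels, and the toy patterns are hypothetical and not taken from the talk.

```python
def exclusive_patterns(cp_by_group):
    """For each group, keep only the patterns found in no other group."""
    ecp = {}
    for group, patterns in cp_by_group.items():
        # Union of the patterns of all other groups.
        others = set().union(
            *(p for g, p in cp_by_group.items() if g != group)
        )
        ecp[group] = set(patterns) - others
    return ecp


# Toy combinatorial patterns (CP) for three hypothetical subgroups.
cp = {
    "group_a": {"A*C", "G*K", "W*L"},
    "group_b": {"A*C", "D*E"},
    "group_c": {"W*L", "F*P"},
}
ecp = exclusive_patterns(cp)
# group_a keeps only "G*K": "A*C" also occurs in group_b, "W*L" in group_c.
```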
Background Model Analysis
• Verifying conspicuousness
• Hit ratio close to 0: the pattern is unique
• Hit ratio relatively large: the pattern is insignificant
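
A hedged sketch of a hit-ratio check. The slides do not define the hit ratio, so this example assumes it is the fraction of background sequences that contain the pattern at least once, and that `*` in a pattern stands for exactly one arbitrary residue. The function names and the background sequences are made up for illustration.

```python
import re


def pattern_to_regex(pattern):
    """Translate a pattern such as 'E*L*A' into a regex ('*' = any one residue)."""
    return re.compile("".join("." if ch == "*" else re.escape(ch) for ch in pattern))


def hit_ratio(pattern, background):
    """Fraction of background sequences containing the pattern at least once."""
    if not background:
        return 0.0
    regex = pattern_to_regex(pattern)
    hits = sum(1 for seq in background if regex.search(seq))
    return hits / len(background)


# Toy background set (hypothetical sequences).
background = ["AEKLMA", "GGGG", "EALAA", "ELLLA"]
print(hit_ratio("E*L*A", background))  # 3 of the 4 sequences match
print(hit_ratio("W*W", background))    # no sequence matches
```

Under these assumptions, a pattern that hits most of the background is not specific to the group, while a ratio near 0 marks it as a candidate group feature.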
The combinatorial features of the RNase A-like superfamily extracted by MISA
The combinatorial features of the RNase A-like superfamily extracted by MISA (cont.)
• The known H-K-H active sites are identified exactly
The combinatorial features of the RNase A-like superfamily extracted by ClustalW
• The first H was misaligned
The combinatorial features of the RNase A-like superfamily extracted by ClustalW (cont.)
The combinatorial features of the RNase A-like superfamily extracted by MEME
The combinatorial features of the RNase A-like superfamily extracted by MEME (cont.)
• The first ‘H’ was not successfully detected
The combinatorial features of the RNase A-like superfamily extracted by Gibbs Sampler
1, 1   1    65  qekvt CKNGQ  gncyk    69  1.00  F  1E21:A
1, 2   0   107  kerhi IVACE  gspyv   111  1.00  F  1E21:A
1, 3   2   116  egspy VPVHFD asved   121  1.00  F  1E21:A
2, 1   1    38  nyqrr CKNQN  tfllt    42  1.00  F  1GQV:A
2, 2   0   109  anmfy IVACD  nrdqr   113  1.00  F  1GQV:A
2, 3   2   127  pqypv VPVHLD rii     132  1.00  F  1GQV:A
3, 1   1    37  nyrwr CKNQN  tflrt    41  1.00  F  1DYT:A
3, 2   0   108  grrfy VVACD  nrdpr   112  1.00  F  1DYT:A
3, 3   2   125  prypv VPVHLD tti     130  1.00  F  1DYT:A
4, 1   1    65  ttniq CKNGK  mnche    69  1.00  F  1RNF:A
4, 2   0   105  strrv VIACE  gnpqv   109  1.00  F  1RNF:A
4, 3   2   114  egnpq VPVHFD g       119  1.00  F  1RNF:A
5, 1   1    59  kaice NKNGN  phren    63  1.00  F  1B1I:A
5, 2   0   104  gfrnv VVACE  nglpv   108  1.00  F  1B1I:A
5, 3   2   111  aceng LPVHLD qsifr   116  1.00  F  1B1I:A
15 motifs
Column 1 : Sequence Number, Site Number
Column 2 : Motif type
Column 3 : Left End Location
Column 4 : Motif Element
Column 5 : Right End Location
Column 6 : Probability of Element
Column 7 : Forward Motif (F) or Reverse Complement (R)
Column 8 : Sequence Description from Fast A input
The combinatorial features of the RNase A-like superfamily extracted by Gibbs Sampler (cont.)
• The first ‘H’ was not successfully detected
• The motif colored in red is wrong
Comparison of Average RMSD and Aligned Residues (using a straightforward structure alignment)
• The lowest average RMSD
• The highest average number of aligned residues
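
For reference, RMSD over a set of aligned residue pairs is the square root of the mean squared distance between paired coordinates. The sketch below computes it for coordinates that are assumed to be already superposed. The function name and the toy coordinates are illustrative and are not values from the comparison above.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired 3D coordinates.

    Assumes the two structures are already superposed and that
    coords_a[i] is aligned with coords_b[i].
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


# Two aligned residue pairs, displaced by 3 and 4 length units respectively.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
print(rmsd(a, b))  # sqrt((9 + 16) / 2)
```

RMSD and the number of aligned residues have to be read together: a low RMSD over very few aligned residues says little, which is presumably why the slide reports both.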
MISA for primate map1b upstream sequences
Ref: D. Liu and I. Fischer, “Structural analysis of the proximal region of the microtubule-associated protein 1B promoter”, J Neurochem, 1997, 69: pp. 910-919
Hierarchical clustering for P450 family 1
• The family can be clustered into three subfamilies
Exclusive group features for P450 family 1
• cytochrome P450 subfamily 1A
  ^ E*L*A ^ *PK*L* ^ *W*ARR*LA* ^ L**FS ^ *SC*LEEH*S*E ^ G*F*P ^ *V*SV*NVI ^ *DF*P*LR*LP* ^ **EHY**F ^ **DIT**L ^ **ELD** ^ R*P*LS
• cytochrome P450 subfamily 1B
  ^ F*R*A ^ WK**R ^ R*F*T ^ **RYP**Q*R*Q ^ DQ**LP ^ G**NK*L* ^ **HQC** ^ **LLD**
• cytochrome P450 subfamily 1C
  ^ SI**EWSG**QPAL*A*F ^ **EAC*W* ^ F**YSKQW**HRK*AQS**RAFS*AN*QT* ^ EA**LV**FL ^ F*P*HE*T ^ N**FF**V**KV**HR ^ W**LL ^ *AK*RG*