Patterns and Profiles Lisa Mullan, HGMP-RC
Terminology
Homologs
• Two proteins that share a common ancestor
• Usually similar functions
• Orthologs: different species
• Paralogs: same genome
Analogs
• Two sequences that have NO common ancestor, but have similar functions. Protein analogs may have the same fold.
Multiple sequence alignments
7 10
CHERRIES
CLEMENTIN-ES
P-EAR--S
GRE-ENAPPLES
Most programs use “clustal”, a clustering algorithm

Multiple sequence alignments
4 24
P-EARS-----
GREENAPPLES
CLEMENTINES
CHERR--I-ES

Multiple sequence alignments
0 24
GREENAPPLES
CHERR---IES
P-EARS-----
CLEMENTINES

Multiple sequence alignments (cont.)
Unaligned:
GREENAPPLES
CLEMENTINES
CHERRIES
PEARS
Aligned:
GREENAPPLES
CLEMENTINES
CHERR---IES
P-EARS-----
Multiple sequence alignments (cont.)
CLUSTAL W (1.7) multiple sequence alignment

Q40236/1-193   GTF-DQLQLVLRWPTSFCNGKNCKRTPKDFTIHGLWPDSEAGELNFCNPRASYTIVRHGTF
Q40241/1-189   -----QLQLVLRWPTSFCNGKNCKRTPKDFTIHGLWPDSEAGELNFCNPRASYTIVRHGTF
Q42513/1-193   GTF-NQLQLVLRWPASFCKGKKCERTPNNFTIHGLWPDIKGTILNNCNPDAKYASVTGGKF
G255586/1-194  GAF-EYMQLVLQWPTAFCHTTPCKNIPSNFTIHGLWPDNVSTTLNFCGKEDDYNIIMDGP-
Q40379/1-194   GAF-EYMQLVLQWPTTFCHTTPCKNIPSNFTIHGLWPDNVSTTLNFCGKEDDYNIIMDGP-
               :****:**::**: . *:. *.:********* . ** *. .* : *

Q40236/1-193   EKRN---KHWPDLMRSKDNSMDNQEFWKHEYIKHGSCCTDLFNETQYFDLALVLKDRFDLLT
Q40241/1-189   EKRN---KHWPDLMRSKDNSMDNQEFWKHEYIKHGSCCTDLFNETQYFDLALVLKDRFDLLT
Q42513/1-193   VKRN---KHWPDLILTEAASLNSQGFWAYQFKKHGTCCSDLFNQEKYFDLALILKDKFDLLT
G255586/1-194  EK-NGLYVRWPDLIREKADCMKTQNFWRREYIKHGTCCSEIYNQVQYFRLAMALKDKFDLLT
Q40379/1-194   EK-NGLYVRWPDLIREKADCMKTQNFWRREYIKHGTCCSEIYNQVQYFRLAMALKDKFDLLT
               :** :****: : .:..* ** :: ***:**::::*: :** **: ***:*****

Q40236/1-193   TFRIHGIVPRSSHTVDKIKKTIRSVTGVLPNLSCTKNMDLLEIGICFNREASKMIDCTRP
Q40241/1-189   TFRIHGIVPRSSHTVDKIKKTIRSVTGVLPNLSCTKNMDLLEIGICFNREASKMIDCTRP
Q42513/1-193   TFRNKGIIPKSTCTINKIQKTIRTVTGVVPNLSCTPTMELLEVGICFNRDASKLIDCDQP
G255586/1-194  SLKNHGIIRGYKYTVQKINNTIKTVTKGYPNLSCTKGQELWEVGICFDSTAKNVIDCPNP
Q40379/1-194   SLKNHGIIRGYKYTVQKINNTIKTVTKGYPNLSCTKGQELWEVGICFDSTAKNVIDCPNP
               ::: :**: . *::**::**::** ****** :* *:****: *.::*** .*

Q40236/1-193   KTCNPGEDNLIGFP
Q40241/1-189   KTCNPGEDNLIGFP
Q42513/1-193   KTCDTSGNTEIFFP
G255586/1-194  KTCKTASNQGIMFP
Q40379/1-194   KTCKTASNQGIMFP
               ***... : * **
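In CLUSTAL W output the line under each block summarises conservation: "*" marks a column in which every sequence has the same residue, while ":" and "." mark conservative and semi-conservative substitutions. As a minimal sketch, the "*" marks alone can be recomputed from the aligned rows. The function name conserved_columns is hypothetical, and the ":" and "." classes are deliberately not attempted because they depend on residue similarity groups.

```python
def conserved_columns(rows, gap="-"):
    """Return a string with '*' under every fully conserved column.

    Only identity is checked; the ':' and '.' similarity marks that
    CLUSTAL W also prints are not reproduced here.
    """
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    marks = []
    for column in zip(*rows):
        # A column is conserved when it holds one residue type and no gap.
        identical = len(set(column)) == 1 and column[0] != gap
        marks.append("*" if identical else " ")
    return "".join(marks)


# Final block of the alignment shown on the slide.
last_block = [
    "KTCNPGEDNLIGFP",
    "KTCNPGEDNLIGFP",
    "KTCDTSGNTEIFFP",
    "KTCKTASNQGIMFP",
    "KTCKTASNQGIMFP",
]
print(conserved_columns(last_block))
```

The stars land on K, T, C, I, F and P, the same columns the slide marks with "*".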
Multiple sequence alignments (cont.)
(
(
Q40236/1-193:-0.00066,
Q40241/1-189:0.00066)
:0.18460,
Q42513/1-193:0.17928,
(
G255586/1-194:0.00258,
Q40379/1-194:0.00258)
:0.32591);
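The tree on this slide is written in Newick (nested parentheses) format, where each name or group may be followed by ":" and a branch length. The sketch below is one possible way to read such a string into nested dictionaries. It is a small hand-written parser, not the code any alignment program uses, and the names parse_newick and leaf_names are hypothetical. It assumes unquoted names and no comments.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts (name, length, children)."""
    text = "".join(text.split()).rstrip(";")  # drop whitespace and final ';'
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # skip '('
            children.append(parse_node())
            while pos < len(text) and text[pos] == ",":
                pos += 1  # skip ','
                children.append(parse_node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1  # skip ')'
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos]
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    root = parse_node()
    if pos != len(text):
        raise ValueError("unexpected trailing text")
    return root


def leaf_names(node):
    """Collect leaf names from left to right."""
    if not node["children"]:
        return [node["name"]]
    return [name for child in node["children"] for name in leaf_names(child)]


tree = parse_newick(
    "((Q40236/1-193:-0.00066,Q40241/1-189:0.00066):0.18460,"
    "Q42513/1-193:0.17928,"
    "(G255586/1-194:0.00258,Q40379/1-194:0.00258):0.32591);"
)
print(leaf_names(tree))
```

Read this way, the tree has three branches at the root: two pairs of near-identical sequences (branch lengths close to zero) and Q42513 on its own.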
Motifs - assigned to the secondary structure of a protein
E. coli trp repressor
Leucine zipper motif L-X(6)-L-X(6)-L-X(6)-L
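A pattern like this maps directly onto a regular expression: X(6) becomes "any six residues". The sketch below searches a sequence for the motif. The function name find_leucine_zippers and the test sequence are made up for illustration, and a match only means the spacing of leucines fits, not that the region really forms a zipper.

```python
import re

# L-X(6)-L-X(6)-L-X(6)-L written as a regular expression.
# The lookahead (?=...) lets overlapping matches be reported too.
LEUCINE_ZIPPER = re.compile(r"(?=L.{6}L.{6}L.{6}L)")


def find_leucine_zippers(sequence):
    """Return the 0-based start positions of every match in the sequence."""
    return [match.start() for match in LEUCINE_ZIPPER.finditer(sequence)]


# Invented sequence: leucines at positions 2, 9, 16 and 23.
example = "AA" + "LAAAAAA" * 3 + "L" + "AA"
print(find_leucine_zippers(example))
```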
http://bioinf.man.ac.uk/dbbrowser/PRINTS/ “A fingerprint is a group of conserved motifs used to characterise a protein family”
Domains
Many definitions – depends who you speak to!
• Domains are discrete structural units
• Defined by structure
• Domain boundaries can be inferred from careful sequence analysis
• Domains are the common currency of protein function
EFGHIVW
EYAHMIW
DYAHSLW
EFGHPLW
[ED]-[FY]-[GA]-H-X-[VIL]-W
But there are more glutamates than aspartates in the alignment (three E, one D), and the pattern cannot express that.
And could X be represented more accurately by {FYW}, that is, any residue except F, Y or W?
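To make the pattern notation concrete, here is a small sketch that turns a pattern of this style into a regular expression and checks it against the four sequences. It covers only the elements used on these slides: single residues, X, [..] for allowed residues, {..} for forbidden residues, and a repeat count in parentheses. The function name pattern_to_regex is hypothetical.

```python
import re

ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Za-z])(?:\((\d+(?:,\d+)?)\))?")


def pattern_to_regex(pattern):
    """Translate e.g. '[ED]-[FY]-H-X(2)-{FYW}' into a Python regex string."""
    parts = []
    for element in pattern.replace(" ", "").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, repeat = match.groups()
        if core.upper() == "X":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except these
        elif len(core) == 1:
            core = core.upper()               # one fixed residue
        parts.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)


sequences = ["EFGHIVW", "EYAHMIW", "DYAHSLW", "EFGHPLW"]
loose = pattern_to_regex("[ED]- [FY]- [GA]- H- X- [VIL]- W")
strict = pattern_to_regex("[ED]-[FY]-[GA]-H-{FYW}-[VIL]-W")

for seq in sequences:
    print(seq, bool(re.fullmatch(loose, seq)), bool(re.fullmatch(strict, seq)))
```

Both versions accept all four sequences. They differ only on sequences with F, Y or W at position five, which the X version accepts and the {FYW} version rejects.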
EFGHIVW
EYAHMIW
DYAHSLW
EFGHPLW
So, let’s add some numbers to the problem!

Position    E   D   F   Y   H   G   A   I   M   S   P   L   V   W
One        15   5   0   0   0   0   0   0   0   0   0   0   0   0
Two         0   0  10  10   0   0   0   0   0   0   0   0   0   0
Three       0   0   0   0   0  10  10   0   0   0   0   0   0   0
Four        0   0   0   0  20   0   0   0   0   0   0   0   0   0
Five        2   2  -2  -2   2   2   2   2   2   2   2   2   2  -2
Six         0   0   0   0   0   0   0   5   0   0   0  10   5   0
Seven       0   0   0   0   0   0   0   0   0   0   0   0   0  20
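With numbers attached to each position, a candidate sequence is scored by adding up the value for the residue it carries at each of the seven positions. The sketch below uses the scores from the slide. The function name profile_score is hypothetical, and giving 0 to residues the slide does not list is an assumption made here, not something the slide states.

```python
RESIDUES = "EDFYHGAIMSPLVW"  # column order used on the slide

# One row per position (One .. Seven), one value per residue column.
SCORES = [
    [15, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 10, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 10, 10, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 20, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [2, 2, -2, -2, 2, 2, 2, 2, 2, 2, 2, 2, 2, -2],
    [0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 10, 5, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 20],
]


def profile_score(sequence):
    """Sum the position-specific scores for a 7-residue sequence."""
    if len(sequence) != len(SCORES):
        raise ValueError("sequence must have exactly seven residues")
    total = 0
    for row, residue in zip(SCORES, sequence):
        column = RESIDUES.find(residue)
        # Assumption: residues missing from the slide's table score 0.
        total += row[column] if column >= 0 else 0
    return total


for seq in ["EFGHIVW", "EYAHMIW", "DYAHSLW", "EFGHPLW"]:
    print(seq, profile_score(seq))
```

Unlike the pattern, the scores separate the sequences: EFGHIVW scores 82 while DYAHSLW scores 77, because E is worth more than D at position one and L more than V at position six.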
But… profiles do not support gaps…
EFH-IIVW
EYH--MIW
DYHSISLW
EFH-IPLW
Hidden Markov Models introduce statistics into profiles
[Figure: hidden Markov model diagram for this alignment. Labelled probabilities: M 1.0; I .50; E .75, D .25; F .50, Y .50; S 1.0; V .25, I .25, L .50; X 1.0; H 1.0; W 1.0. Other values shown on the diagram: 0.75, 0.25 and 1.0.]
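Probabilities such as E .75 and D .25 are simply the residue frequencies in a column of the gapped alignment. The sketch below computes them per column, ignoring gap characters. It is only the counting step: a real profile HMM also needs transition probabilities between states and usually pseudocounts, neither of which is attempted here. The function name column_frequencies is hypothetical.

```python
from collections import Counter

ALIGNMENT = [
    "EFH-IIVW",
    "EYH--MIW",
    "DYHSISLW",
    "EFH-IPLW",
]


def column_frequencies(rows, gap="-"):
    """Return one {residue: frequency} dict per alignment column."""
    frequencies = []
    for column in zip(*rows):
        counts = Counter(residue for residue in column if residue != gap)
        total = sum(counts.values())
        # Frequencies are taken over the non-gap residues in the column.
        frequencies.append(
            {residue: n / total for residue, n in counts.items()} if total else {}
        )
    return frequencies


for index, freqs in enumerate(column_frequencies(ALIGNMENT), start=1):
    print(index, freqs)
```

The first column gives E 0.75 and D 0.25, the fourth (mostly gaps) gives S 1.0, and the seventh gives V 0.25, I 0.25, L 0.5.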
Pfam-A
• 2,216 curated families with annotation.
Pfam-B
• 40,000 families derived from ProDom.