Signals in Sequences The number of sequences available for analysis is rapidly approaching infinity. We need new ways to look at all this information.
Rule 1 First rule of sequence analysis: If a residue is conserved, it is important.
Rule 2 Second rule of sequence analysis: If a residue is very conserved, it is very important.
GPCR Project GPCRs are THE drug target. Lots of data available. You have ~630 GPCRs. Little structural data. 2000 sequences known. ‘Easy’ to align.
Laerte about modelling: “Use the sequence, Luke”
Conserved, CMA, variable QWERTYASDFGRGH QWERTYASDTHRPM QWERTNMKDFGRKC QWERTNMKDTHRVW Black = conserved White = variable Green = correlated mutations (CMA)
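The three residue classes in the toy alignment above can be found mechanically. The sketch below is only an illustration, not the method behind the slides. It assumes that a column is "conserved" when it holds a single residue type, "correlated" when it splits the sequences into the same groups as at least one other column, and "variable" otherwise. The function names `column_pattern` and `classify_columns` are made up for this example.

```python
from collections import defaultdict

def column_pattern(column):
    """Turn a column into a grouping pattern, e.g. 'YYNN' -> (0, 0, 1, 1)."""
    first_seen = {}
    return tuple(first_seen.setdefault(res, len(first_seen)) for res in column)

def classify_columns(alignment):
    """Label every alignment column as conserved, correlated, or variable.

    Assumption for this sketch: two columns are 'correlated' when they
    split the sequences into exactly the same groups. A column in which
    every sequence has its own residue type carries no grouping and is
    treated as variable.
    """
    columns = ["".join(col) for col in zip(*alignment)]
    patterns = [column_pattern(col) for col in columns]

    # Collect the columns that share each grouping pattern.
    by_pattern = defaultdict(list)
    for idx, pat in enumerate(patterns):
        by_pattern[pat].append(idx)

    n_seq = len(alignment)
    labels = []
    for pat in patterns:
        n_groups = len(set(pat))
        if n_groups == 1:
            labels.append("conserved")
        elif n_groups < n_seq and len(by_pattern[pat]) > 1:
            labels.append("correlated")
        else:
            labels.append("variable")
    return labels

alignment = [
    "QWERTYASDFGRGH",
    "QWERTYASDTHRPM",
    "QWERTNMKDFGRKC",
    "QWERTNMKDTHRVW",
]
for position, label in enumerate(classify_columns(alignment), start=1):
    print(position, label)
```

On the four sequences of the slide this labels QWERT, D and R as conserved, the YAS/NMK and FG/TH columns as correlated, and the last two columns as variable. Real correlated mutation analysis works on much larger alignments and tolerates exceptions, which this exact-match sketch does not.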
CMA and tree 1 ASASDFDFGHKM 2 ASASDFDFRRRL 3 ASLPDFLPGHSI 4 ASLPDFLPRRRV
CMA versus tree 1 ASASDFDFGHKMGHS 2 ASASDFDFRRRLRHS 3 ASLPDFLPGHSIGHS 4 ASLPDFLPRRRVRIT 5 ASASDFDFRRRLRIT 6 ASLPDFLPGHSIGIT Red : 1,2,5 vs 3,4,6 Black : 1,3,6 vs 2,4,5 Yellow: 1,2,3 vs 4,5,6
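The three colour groups on this slide can be recovered by asking, for every column, how it splits the six sequences. The sketch below keeps only columns with exactly two residue types and reports which positions share a split. The helper names `split_of` and `columns_by_split` are hypothetical, and the two-way restriction is an assumption made to keep the example small.

```python
from collections import defaultdict

def split_of(column):
    """Return the groups of sequence numbers (1-based) that share a residue."""
    groups = defaultdict(list)
    for seq_no, res in enumerate(column, start=1):
        groups[res].append(seq_no)
    return tuple(sorted(tuple(members) for members in groups.values()))

def columns_by_split(alignment):
    """Map each two-way split of the sequences to the positions showing it."""
    result = defaultdict(list)
    for pos, col in enumerate(zip(*alignment), start=1):
        split = split_of(col)
        if len(split) == 2:  # ignore conserved and multi-way columns
            result[split].append(pos)
    return dict(result)

alignment = [
    "ASASDFDFGHKMGHS",
    "ASASDFDFRRRLRHS",
    "ASLPDFLPGHSIGHS",
    "ASLPDFLPRRRVRIT",
    "ASASDFDFRRRLRIT",
    "ASLPDFLPGHSIGIT",
]
for split, positions in columns_by_split(alignment).items():
    print(split, positions)
```

The three splits found are 1,2,5 vs 3,4,6; 1,3,6 vs 2,4,5; and 1,2,3 vs 4,5,6. Each pair of splits cuts across the other, so no single tree groups the sequences in a way that agrees with all three.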
Sequence Signals • Three classes of residues • Conserved • CMA • Variable
Conservation Artefacts Conservation can result from: • Not enough sequences • Too conserved sequences • Over-alignment
Variability Artefacts Variability can result from: • Wrong sequence choice • Variable loops • Alignment errors
CMA Artefacts CMA can result from: • Wrong sequence choice • Poor sequence homogeneity • Over-fitting
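Sequence Entropy $E = -\sum_{i=1}^{20} p_i \ln(p_i)$, where $p_i$ is the fraction of sequences that have residue type $i$ at the alignment position.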
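A minimal sketch of the entropy calculation for one alignment column follows. It assumes the standard Shannon form with a leading minus sign, so that a fully conserved column scores 0 and the score grows with the spread of residue types. It also assumes the column contains only the 20 amino acid letters, with no gap characters. The function name `column_entropy` is made up for this example.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (natural log) of the residue types in one column."""
    counts = Counter(column)
    total = len(column)
    # Residue types that do not occur contribute nothing to the sum.
    return -sum((n / total) * math.log(n / total) for n in counts.values())

print(column_entropy("QQQQ"))  # fully conserved column
print(column_entropy("YYNN"))  # two residue types, equally frequent
print(column_entropy("GPKV"))  # four residue types, equally frequent
```

With 20 residue types the largest possible value is ln(20), about 3.0, reached when all types are equally frequent.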
Sequence Variability Sequence variability is the number of residue types that are each present in more than 0.5% of all sequences at an alignment position.
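The variability count can be sketched the same way. The 0.5% cutoff comes from the slide; the function name `column_variability`, the parameter name `min_fraction`, and the absence of gap characters are assumptions of this example.

```python
from collections import Counter

def column_variability(column, min_fraction=0.005):
    """Count residue types seen in more than min_fraction of the sequences."""
    counts = Counter(column)
    total = len(column)
    return sum(1 for n in counts.values() if n / total > min_fraction)

# 1000 sequences: A in 99.0%, G in 0.6%, C in 0.4% of them.
column = "A" * 990 + "G" * 6 + "C" * 4
print(column_variability(column))
```

Here A and G pass the cutoff and C does not, so the column has variability 2. The cutoff keeps rare residue types, which may be sequencing or alignment errors, from inflating the count.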
Entropy - Variability Entropy = Information Variability = Chaos
Entropy - Variability Variability is the result of evolution. Entropy is the protein’s brake on evolutionary speed.
GPCR Entropy - Variability 11 Red 12 Orange 22 Yellow 23 Green 33 Blue
GPCR Location 11 Red 12 Orange 22 Yellow 23 Green 33 Blue
Ras Location 11 Red 12 Orange 22 Yellow 23 Green 33 Blue
Protease Location 11 Red 12 Orange 22 Yellow 23 Green 33 Blue
Globin Location 11 Red 12 Orange 22 Yellow 23 Green 33 Blue
GPCR Location (Again) 11 Red 12 Orange 22 Yellow 23 Green 33 Blue
GPCR signaling 11 Purple 12 Red 22 ‘Yellow’ 23 Green 33 Blue
Summary Given infinitely many sequences: Every residue’s role is known. Signaling paths are detectable. So, sequences contain many signals.
Thanks to: Laerte Oliveira (Sao Paulo), Wilma Kuipers (Weesp), Florence Horn (San Francisco), Bob Bywater (Copenhagen), Nora vd Wenden (The Hague), Mike Singer (New Haven), Ad IJzerman (Leiden), Margot Beukers (Leiden), Amos Bairoch (Geneva), Fabien Campagne (New York)