Statistics of small peptides This tour guides you through a computational experiment that you can perform within BioBIKE. To get to BioBIKE, go to: http://ixion.csbc.vcu.edu:8003/biologin Enter a login name (letters only, no spaces). No password is necessary. This demonstration is best viewed as a slide show, which lets you simulate a session and makes changes in cursor position more obvious. To do this, click Slide Show on the top tool bar, then View Show. Click anywhere to go on to the next slide.
Statistics of small peptides How many types of peptides are there of each size class? How many peptides are there with a single amino acid? In other words, how many ways can you fill the box below with a different amino acid? Amino acid goes here (how many different amino acids are there?)
Statistics of small peptides How many types of peptides are there of each size class? How about peptides with two amino acids? How many ways can you fill the boxes below with different amino acids? Amino acids go here (If you don't see the answer, then simplify the problem and count by hand)
To verify your answer in BioBIKE…(though you should be so certain of your answer that if BioBIKE were to disagree, you'd think that BioBIKE is wrong, not you!) Strategy: Generate all possible proteins of a given length, then count them.
That gives you all the peptide sequences of length 1. Is the list correct? How many are there? With this list you can count by hand, but later this won't be possible. To automate the process, wrap the function in COUNT-OF.
That gives you the number of all the peptide sequences of length 1. Now for something more interesting. Change the length from 1 to 5 (remembering to close the entry by pressing Enter).
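The same generate-then-count strategy can be sketched outside BioBIKE. The Python below is only an illustration, not the BioBIKE implementation; the names `AMINO_ACIDS` and `all_peptides` are hypothetical, and the alphabet is assumed to be the 20 standard amino acids in one-letter code.

```python
from itertools import product

# Assumption: the 20 standard amino acids, one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def all_peptides(length):
    """Return every peptide sequence of the given length as a list of strings."""
    return ["".join(letters) for letters in product(AMINO_ACIDS, repeat=length)]

peptides_1 = all_peptides(1)
peptides_2 = all_peptides(2)

print(peptides_1)        # short enough to check by eye
print(len(peptides_1))   # counting automated, analogous to wrapping in COUNT-OF
print(len(peptides_2))
```

Listing the sequences first and counting second mirrors the tour: the list is checkable by hand at length 1, while the count is what scales to longer peptides.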
Whoops! A problem. BioBIKE is attempting to save you from doing something potentially stupid by accident. You could easily use this command to ask for more sequences than there are electrons in the universe. But read the advice carefully and note that there is a way out.
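To see why generating every sequence quickly becomes impractical, it may help to compute the count without building the list. This is a minimal sketch assuming 20 amino acids and a free choice at every position; `count_peptides` is a hypothetical helper, not a BioBIKE function.

```python
NUM_AMINO_ACIDS = 20  # assumption: the 20 standard amino acids

def count_peptides(length):
    """Number of distinct peptide sequences of the given length."""
    return NUM_AMINO_ACIDS ** length

# Print how fast the count grows with length.
for length in range(1, 11):
    print(length, count_peptides(length))
```

Each extra position multiplies the count by 20, so the list that was easy to inspect at length 1 is already in the millions at length 5.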
Statistics of small peptides Identification of a protein from a peptide sequence If you were given a peptide sequence, say "QWER" (glutamine-tryptophan-glutamate-arginine), is this enough information to identify the protein it came from? This is sort of like a variation on the birthday problem: How likely is it that someone in the room has the same birthday as you do? It depends on how many people there are in the room and how many birthdays there are to choose from. With 365 people in the room, what would be your chances? (ignore leap years)
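One way to check your estimate for the birthday question is a short calculation. The sketch below assumes birthdays are uniformly distributed and independent, and that the 365 people are in addition to you; the function name is hypothetical.

```python
def chance_someone_shares(people, possible_birthdays=365):
    """Probability that at least one of `people` others has your birthday.

    Assumes every birthday is equally likely and people are independent.
    """
    chance_one_person_differs = 1 - 1 / possible_birthdays
    return 1 - chance_one_person_differs ** people

print(chance_someone_shares(365))             # as many people as birthdays
print(chance_someone_shares(365, 20 ** 4))    # far more "birthdays" than people
```

The second call swaps in a much larger number of possible birthdays, which is the situation the peptide analogy is probing.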
Statistics of small peptides Identification of a protein from a peptide sequence Even without doing the calculation, you can see that only if the number of birthdays is much greater than the number of people do you stand a good chance of having a unique birthday. So how many possible peptides (analogous to birthdays) are there? You did this already. And how many 4-aa peptides are in the proteins of, say, ss120 (analogous to the number of people in the room)? Simplify: How many 4-aa peptides are there in a single protein? Suppose the protein has 100 amino acids.
Statistics of small peptides Identification of a protein from a peptide sequence Imagine that protein, with 100 amino acids: aa1- aa2- aa3- aa4- aa5- aa6- …aa95- aa96- aa97- aa98- aa99- aa100 How many 4-aa sequences are there in this protein? You might want to simplify. Suppose the protein were only 4 amino acids in length. How many would there be? Suppose it were 5 amino acids in length? 10? What's the rule? If I tell you the length of the protein, can you tell me the number of 4-aa peptides?
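If you want to test your rule against a brute-force count, a sliding window does the job. This is a sketch with hypothetical helper names; it counts overlapping windows, so the same 4-aa sequence occurring twice is counted twice.

```python
def windows(sequence, k=4):
    """Return every overlapping k-residue stretch of a protein sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def count_windows(protein_length, k=4):
    """Rule for the number of k-residue windows in a protein of a given length."""
    return max(0, protein_length - k + 1)

print(windows("QWERT"))                    # enumerate by hand-sized example
for length in (4, 5, 10, 100):
    print(length, count_windows(length))   # compare with your own rule
```

The enumeration and the formula should agree for any sequence, including ones shorter than the window.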
Statistics of small peptides Identification of a protein from a peptide sequence Now imagine that there are many hundreds of proteins in an organism (say ss120), with different lengths. What do you need to know to calculate the total number of 4-aa sequences in the proteins of ss120? You can get all the information you need in BioBIKE using the functions illustrated on the following slides.
Assembling these functions should get you the number of 4-amino acid peptides there are in ss120 proteins. How does this number compare with the number of possible 4-amino acid peptide sequences you calculated earlier?
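The comparison can be sketched as follows. The protein lengths here are made-up placeholders, not ss120 data; in practice they would come from the BioBIKE functions on the preceding slides. The function name is hypothetical.

```python
def total_k_peptides(protein_lengths, k=4):
    """Total number of overlapping k-residue windows over a set of proteins."""
    return sum(max(0, length - k + 1) for length in protein_lengths)

# Placeholder lengths for illustration only (NOT real ss120 values).
hypothetical_lengths = [100, 250, 30]

in_proteins = total_k_peptides(hypothetical_lengths)
possible = 20 ** 4  # number of possible 4-aa sequences

print(in_proteins, possible, in_proteins / possible)
```

The ratio in the last line is the quantity to think about: it plays the role of people per available birthday.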
Statistics of small peptides How much overlap is there in the molecular weights of different peptides? There are several problems in attempting to identify a protein from a single small peptide. Let's examine one of them. Mass spectrometry directly gives you not the sequence of a peptide but rather its molecular weight. If every peptide has a different molecular weight, then one can go directly from molecular weight to sequence. Is this the case? Consider the set of 3-amino acid peptides as an example.
Statistics of small peptides How much overlap is there in the molecular weights of different peptides?
Strategy:
• Calculate the molecular weights of all 3-amino-acid peptides
• Bin (count) each size class
• Write the results to a file
• Download the file
• Upload the file into Excel
• Make a histogram of the results
You'll want to consider the BioBIKE functions on the following slides.
MW-OF (from the GENES-PROTEIN menu; Translation submenu) Use it to get the molecular weights of all protein sequences of length 3. Use the SEQUENCE option so that the function knows enough to interpret a sequence like "PHE" as "proline-histidine-glutamate", using the one-letter code, rather than the abbreviation of phenylalanine, using the three-letter code.
BIN-DATA-OF (you used this in the previous tour) Use it to count the instances of each molecular weight. The interval should be set to 1 so each size class is counted individually. The max should be set to at least the biggest molecular weight a 3-amino acid peptide can have. A safe upper bound is 3 times the molecular weight of the biggest amino acid (the true maximum is slightly lower, since each peptide bond releases one water molecule). What's that?
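For readers who want to reproduce the calculation outside BioBIKE, here is a rough Python sketch of the two steps (molecular weights, then binning). The residue masses are commonly tabulated average values and are an assumption; MW-OF may use slightly different numbers. `peptide_mw` and `bin_weights` are hypothetical names, not BioBIKE functions.

```python
from collections import Counter
from itertools import product
from math import floor

# Assumption: commonly tabulated average residue masses (amino acid minus water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the free peptide ends

def peptide_mw(sequence):
    """Molecular weight of a peptide given in one-letter code."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER

def bin_weights(weights, interval=1):
    """Count how many weights fall into each bin of width `interval`."""
    return Counter(floor(w / interval) * interval for w in weights)

tripeptides = ["".join(p) for p in product(RESIDUE_MASS, repeat=3)]
weights = [peptide_mw(p) for p in tripeptides]
bins = bin_weights(weights)

print(len(tripeptides), "peptides occupy", len(bins), "one-unit bins")
print("heaviest tripeptide weight:", max(weights))
```

Reading sequences letter by letter corresponds to the SEQUENCE option: "PHE" is treated as three residues, not as the three-letter code for phenylalanine.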
WRITE (you used this in the previous tour) Use it to write the counts of the binned molecular weights, i.e. the previous result. (PREVIOUS-RESULT from the OTHER-FUNCTIONS menu may be of use here) Make up any file name you want, so long as you put it in quotes. Select TAB-DELIMITED from the Options menu, since the file will be uploaded into Excel.
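Outside BioBIKE, a tab-delimited file that Excel can open might be produced as sketched below. The two counts are illustrative stand-ins for the binned result, the output file name is arbitrary, and `write_bins_tsv` is a hypothetical helper rather than an equivalent of WRITE.

```python
import csv
import os
import tempfile

def write_bins_tsv(bin_counts, path):
    """Write (bin, count) rows as tab-delimited text, sorted by bin."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["molecular_weight_bin", "count"])
        for weight_bin in sorted(bin_counts):
            writer.writerow([weight_bin, bin_counts[weight_bin]])

# Illustrative stand-in for the binned molecular weights.
example_counts = {203: 3, 189: 1}

# Any file name works; a temporary directory keeps the sketch side-effect free.
output_path = os.path.join(tempfile.gettempdir(), "tripeptide_mw_bins.tsv")
write_bins_tsv(example_counts, output_path)
print("wrote", output_path)
```

Tabs are used as the delimiter for the same reason the tour selects TAB-DELIMITED: spreadsheet programs split such files into columns without extra configuration.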
Statistics of small peptides How much overlap is there in the molecular weights of different peptides? You should now be in a position to create a histogram within Excel. If you do, you'll see something remarkable, like…
Statistics of small peptides How much overlap is there in the molecular weights of different peptides? (Part of the histogram, blown up to show detail.) This is peculiar… The number of instances in each molecular weight class appears to jump in discrete steps. Why is that? Let's examine the peptides and their molecular weights more closely.
Repeat the molecular weight calculation, but this time label the result (you'll see what labeling does in a moment). Execute the resulting function.
Note that each molecular weight now comes with the peptide that is associated with it. To compare this result with the histogram, we need to sort the result by molecular weight.
We want to sort by the molecular weight (the second position), not the peptide (the first position).
Execute the function and compare the results closely with your histogram in Excel. What accounts for the numbers? Why are molecular weights with only one peptide so rare? How many are there?
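The labeled, sorted view can also be sketched in Python for comparison with your BioBIKE output. As before, the residue masses are commonly tabulated average values and are an assumption, and all names are hypothetical. Weights are rounded to four decimals so that the same residues summed in a different order compare as equal.

```python
from collections import defaultdict
from itertools import product

# Assumption: commonly tabulated average residue masses (amino acid minus water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528

def peptide_mw(sequence):
    """Molecular weight of a peptide given in one-letter code."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER

# Label each weight with its peptide: (peptide, weight) pairs.
labeled = [("".join(p), round(peptide_mw(p), 4))
           for p in product(RESIDUE_MASS, repeat=3)]

# Sort by the second position (the weight), not the first (the peptide).
labeled.sort(key=lambda pair: pair[1])

# Group peptides that share exactly the same weight.
by_weight = defaultdict(list)
for peptide, mw in labeled:
    by_weight[mw].append(peptide)

for mw in sorted(by_weight)[:5]:
    print(mw, by_weight[mw])

unique = [peptides[0] for peptides in by_weight.values() if len(peptides) == 1]
print(len(unique), "weights belong to exactly one peptide")
```

Looking at which peptides end up grouped together, and at which ones remain alone, is a direct way to approach the questions on this slide.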
Statistics of small peptides
In this tour, you've seen:
• How to determine the number of peptides in each size class.
• Problems related to the identification of proteins from their peptides.
• The degeneracy of molecular weights in peptides.
• Some causes of this degeneracy.