
Single Motif

Single Motif. Charles Yan Spring 2006. Single Motif. Similar Sequence Similar Function.



  1. Single Motif Charles Yan Spring 2006

  2. Single Motif

  3. Similar Sequence Similar Function • In some cases the sequence of an unknown protein is too distantly related to any protein of known structure for the resemblance to be detected by sequence alignment, but the protein can still be identified by the occurrence in its sequence of a particular cluster of residue types, variously known as a pattern, motif, signature, or fingerprint.

  4. Single Motif Protein function prediction using a single motif • Each protein family is characterized by one motif. • If a protein contains a motif, it probably belongs to the family that the motif corresponds to. • A pertinent analogy is the use of fingerprints by the police for identification purposes. A fingerprint is generally sufficient to identify a given individual. Similarly, a motif can be used to assign a newly sequenced protein to a specific family of proteins and thus to formulate hypotheses about its function.

  5. Single Motif This approach is based on the observation that • While there is a huge number of different proteins, most of them can be grouped, on the basis of similarities in their sequences, into a limited number of families. • Proteins belonging to a particular family generally share sequence and/or structural attributes. • In a protein family, some regions have been better conserved than others during evolution. These regions are generally important for the function of a protein and/or for the maintenance of its three-dimensional structure. • Thus, by analyzing the constant and variable properties of such groups of similar sequences, it is possible to derive a signature for a protein family.

  6. Single Motif • A motif is a conserved element corresponding to a region whose function or structure is known. It is likely to be predictive of any subsequent occurrence of such a structural/functional region in any other protein sequence. • Motifs are usually represented using an alignment or a regular expression.
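
The link between the two representations mentioned above (an alignment and a regular expression) can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used by any particular database: the function name `pattern_from_alignment`, the `max_ambiguity` parameter, and the toy aligned segments are all assumptions made for the example, and the input is assumed to be gap-free and of equal length.

```python
def pattern_from_alignment(aligned, max_ambiguity=2):
    """Summarize gap-free aligned segments as a PROSITE-style pattern.

    Hypothetical helper: a fully conserved column becomes its residue,
    a column with at most `max_ambiguity` residues becomes a bracketed
    set, and anything more variable becomes the wildcard 'x'.
    """
    length = len(aligned[0])
    if any(len(segment) != length for segment in aligned):
        raise ValueError("all aligned segments must have the same length")

    elements = []
    for column in zip(*aligned):          # walk the alignment column by column
        residues = sorted(set(column))
        if len(residues) == 1:            # constant position
            elements.append(residues[0])
        elif len(residues) <= max_ambiguity:
            elements.append("[" + "".join(residues) + "]")
        else:                             # variable position
            elements.append("x")
    return "-".join(elements) + "."


# Toy aligned segments (invented for illustration only).
segments = ["ACVLG", "CKVIG", "ATVMG"]
print(pattern_from_alignment(segments))   # [AC]-x-V-x-G.
```

The sketch does not merge consecutive wildcards into a repeat such as x(2), and it treats every sequence as equally informative; a real signature is usually refined by hand.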

  7. Single Motif

  8. PROSITE • PROSITE (http://ca.expasy.org/prosite/) is a database of protein families and domains, started in 1988. • PROSITE currently contains patterns (motifs) and profiles specific for more than a thousand protein families or domains (release 19.18 of 10-Jan-2006 contains 1398 documentation entries). • Each of these signatures comes with documentation providing background information on the structure and function of these proteins.

  9. PROSITE Steps in the development of a new motif • Select a set of sequences that belong to a functional family. Make a multiple alignment. • Find a short (not more than four or five residues long) conserved sequence (core motif) which is part of a region known to be important or which includes biologically significant residue(s).

  10. PROSITE Steps in the development of a new motif (cont.) • The most recent version of the Swiss-Prot knowledgebase is then scanned with these core pattern(s). If a core motif detects all the proteins in the family and none (or very few) of the other proteins, we can stop at this stage. • In most cases we are not so lucky and we pick up a lot of extra sequences which clearly do not belong to the group of proteins under consideration. A further series of scans, involving a gradual increase in the size of the motif, is then necessary. In some cases we never manage to find a good motif.
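
The scan-and-extend loop described in the steps above can be illustrated with a small Python sketch. It is only a toy under stated assumptions: the sequences, the core motif `CGH`, the extended motif `CGH[ST]`, and the helper `evaluate_motif` are invented for the example, and a short in-memory list stands in for a scan of Swiss-Prot.

```python
import re


def evaluate_motif(regex, family, background):
    """Count how a candidate motif behaves on known members and non-members.

    Hypothetical helper: `family` holds sequences known to belong to the
    family, `background` holds sequences known not to belong to it.
    """
    compiled = re.compile(regex)
    true_pos = sum(1 for seq in family if compiled.search(seq))
    false_pos = sum(1 for seq in background if compiled.search(seq))
    return {
        "true_pos": true_pos,                  # family members detected
        "false_neg": len(family) - true_pos,   # family members missed
        "false_pos": false_pos,                # unrelated sequences picked up
    }


# Toy data, invented for illustration only.
family = ["MKCGHSAAT", "TTCGHSLLV", "AACGHTPLV"]
background = ["MKLVCGHPP", "GGSTTPLLA", "CGHQQQQAA", "PLMNRRTWY"]

# The short core motif finds every member but also two unrelated sequences.
print(evaluate_motif("CGH", family, background))
# Extending the motif by one position removes the extra hits in this toy set.
print(evaluate_motif("CGH[ST]", family, background))
```

In this toy set the core motif gives 3 true positives and 2 false positives, while the extended motif keeps the 3 true positives with no false positives, which is the stopping condition the slide describes.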

  11. PROSITE The motifs are described using the following conventions: • The standard IUPAC one-letter codes for the amino acids are used. • The symbol 'x' is used for a position where any amino acid is accepted. • Ambiguities are indicated by listing the acceptable amino acids for a given position, between square brackets '[ ]'. For example: [ALT] stands for Ala or Leu or Thr. • Ambiguities are also indicated by listing between a pair of curly brackets '{ }' the amino acids that are not accepted at a given position. For example: {AM} stands for any amino acid except Ala and Met. • Each element in a pattern is separated from its neighbor by a '-'.

  12. PROSITE The motifs are described using the following conventions (cont.): • Repetition of an element of the pattern can be indicated by following that element with a numerical value or a numerical range in parentheses. Examples: x(3) corresponds to x-x-x, x(2,4) corresponds to x-x or x-x-x or x-x-x-x. • When a pattern is restricted to either the N- or C-terminus of a sequence, that pattern either starts with a '<' symbol or, respectively, ends with a '>' symbol. In some rare cases (e.g. PS00267 or PS00539), '>' can also occur inside square brackets for the C-terminal element. 'F-[GSTV]-P-R-L-[G>]' means that either 'F-[GSTV]-P-R-L-G' or 'F-[GSTV]-P-R-L>' is considered. • A period ends the pattern. Example: [AC]-x-V-x(4)-{ED}. This pattern is translated as: [Ala or Cys]-any-Val-any-any-any-any-{any but Glu or Asp}
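
The conventions listed on the two slides above map almost one-to-one onto regular-expression syntax. The following Python sketch shows one possible translation; the function name `prosite_to_regex` is an assumption for this example, it covers only the conventions stated above, and it is not the code used by PROSITE or its scanning tools.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression.

    Hypothetical helper covering only the conventions listed above:
    x, [..], {..}, repeats (n) and (n,m), '<' and '>' anchors, final '.'.
    """
    pattern = pattern.strip().rstrip(".")        # a period ends the pattern
    parts = []
    for element in pattern.split("-"):           # elements are '-' separated
        prefix = suffix = ""
        if element.startswith("<"):              # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):                # C-terminal anchor
            suffix, element = "$", element[:-1]

        # Split off an optional repeat such as (3) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError("cannot parse element: %r" % element)
        core, low, high = match.groups()

        if core == "x":                          # any amino acid
            core = "."
        elif core.startswith("{"):               # excluded residues
            core = "[^" + core[1:-1] + "]"
        elif core.startswith("[") and ">" in core:
            # '>' inside brackets: one of the residues OR the C-terminus
            core = "(?:[" + core[1:-1].replace(">", "") + "]|$)"

        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


regex = prosite_to_regex("[AC]-x-V-x(4)-{ED}.")
print(regex)                                     # [AC].V.{4}[^ED]
print(re.search(regex, "MKAGVLLLLKTT").group())  # AGVLLLLK
```

The toy sequence `MKAGVLLLLKTT` is invented for the example. Note that `re.search` reports only the first, non-overlapping occurrence; finding every overlapping match would need a different scanning loop.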

  13. PROSITE

  14. PROSITE

  15. PROSITE

  16. PROSITE

  17. PROSITE

  18. PROSITE

  19. PROSITE • There are a number of protein families as well as functional or structural domains that cannot be detected using patterns due to their extreme sequence divergence; the use of techniques based on weight matrices (also known as profiles) allows the detection of such proteins or domains. • Three types of entry in PROSITE: • 1327 patterns/motifs • 591 profiles/matrices • 4 rules

  20. PROSITE A profile or weight matrix is a table of position-specific amino acid weights and gap costs. These numbers (also referred to as scores) are used to calculate a similarity score for any alignment between a profile and a sequence, or parts of a profile and a sequence. An alignment with a similarity score higher than or equal to a given cut-off value constitutes a motif occurrence.
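
The scoring idea behind a profile can be sketched in Python as follows. This is a deliberately simplified, ungapped version: it slides a table of position-specific weights along the sequence and reports every window whose summed score reaches the cut-off. The gap costs mentioned above are left out, and the function name `scan_profile`, the `default` weight for unlisted residues, the weights themselves, and the toy sequence are all assumptions made for the example.

```python
def scan_profile(profile, sequence, cutoff, default=-1):
    """Report ungapped windows whose profile score reaches the cut-off.

    Hypothetical helper: `profile` is a list with one dict per position,
    mapping residues to weights; residues not listed score `default`.
    Gap costs, which real profiles include, are ignored in this sketch.
    """
    width = len(profile)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Sum the position-specific weight of each residue in the window.
        score = sum(column.get(residue, default)
                    for column, residue in zip(profile, window))
        if score >= cutoff:                  # window counts as an occurrence
            hits.append((start, window, score))
    return hits


# Toy three-position profile (weights invented for illustration only).
profile = [
    {"A": 2, "C": 2},
    {"G": 1},
    {"V": 3, "I": 1},
]

print(scan_profile(profile, "MKAGVLCTIQ", cutoff=5))  # [(2, 'AGV', 6)]
print(scan_profile(profile, "MKAGVLCTIQ", cutoff=2))  # [(2, 'AGV', 6), (6, 'CTI', 2)]
```

Lowering the cut-off admits the weaker window `CTI`, which shows how a profile can still score divergent sequences that an exact pattern would reject outright.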

  21. PROSITE

  22. PROSITE The rule is described in ordinary English and is free-format.
