Protein Structure & Analysis
Biology 224, Dr. Tom Peavy
Sept 27 & 29
<Images from Bioinformatics and Functional Genomics by Jonathan Pevsner>
Protein
• Protein families
• Protein localization
• Protein function
• Gene ontology (GO):
  -- cellular component
  -- biological process
  -- molecular function
• Physical properties
The Human Proteome Organisation (HUPO) Proteomics Standards Initiative (PSI)
• Work groups
  -- Protein Separation
  -- Mass Spectrometry
  -- Molecular Interactions
  -- Protein Modifications
  -- Proteomics Informatics
• Themes
  -- Controlled vocabularies
  -- MIAPE: Minimum information about a proteomics experiment
Protein domains, motifs & signatures
Definitions
• Signature:
  -- a protein category such as a domain or motif
  -- (a defining property of the protein or family)
• Domain:
  -- a region of a protein that can adopt a 3D structure
  -- a fold
  -- a family is a group of proteins that share a domain
  -- examples: zinc finger domain, immunoglobulin domain
• Motif (or fingerprint):
  -- a short, conserved region of a protein
  -- typically 10 to 20 contiguous amino acid residues
Definition of a domain
According to InterPro at EBI (http://www.ebi.ac.uk/interpro/): A domain is an independent structural unit, found alone or in conjunction with other domains or repeats. Domains are evolutionarily related.
According to SMART (http://smart.embl-heidelberg.de): A domain is a conserved structural entity with distinctive secondary structure content and a hydrophobic core. Homologous domains with common functions usually show sequence similarities.
15 most common domains (human)
Domain                          Number of proteins
Zn finger, C2H2 type            1093
Immunoglobulin                  1032
EGF-like                        471
Zn-finger, RING                 458
Homeobox                        417
Pleckstrin-like                 405
RNA-binding region RNP-1        400
SH3                             394
Calcium-binding EF-hand         392
Fibronectin, type III           300
PDZ/DHR/GLGF                    280
Small GTP-binding protein       261
BTB/POZ                         236
bHLH                            226
Cadherin                        226
Varieties of protein domains
• Extending along the length of a protein
• Occupying a subset of a protein sequence
• Occurring one or more times
Example of a protein with domains: Methyl CpG binding protein 2 (MeCP2)
[Figure: MeCP2 with the MBD and TRD regions labeled]
The protein includes a methylated DNA binding domain (MBD) and a transcriptional repression domain (TRD). MeCP2 is a transcriptional repressor. Mutations in the gene encoding MeCP2 cause Rett syndrome, a neurological disorder that primarily affects girls.
Result of an MeCP2 blastp search: A methyl-binding domain shared by several proteins
Proteins can have both domains and patterns (motifs)
[Figure labels: two patterns (several residues each); Domain (aspartyl protease); Domain (reverse transcriptase)]
The UniProt accession number can be found within a GenBank entry
Example: human hemoglobin subunit beta, NP_000509
The SwissProt entry for any protein provides highly useful information…
Definition of a motif
A motif (or fingerprint) is a short, conserved region of a protein. Its size is often 10 to 20 amino acids.
Simple motifs include transmembrane domains and phosphorylation sites. These do not imply homology when found in a group of proteins.
PROSITE (www.expasy.org/prosite) is a dictionary of motifs (there are currently 1600 entries). In PROSITE, a pattern is a qualitative motif description (a protein either matches a pattern, or not). In contrast, a profile is a quantitative motif description. Profiles are found in Pfam, ProDom, SMART, and other databases.
Pages 231-233
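The difference between a qualitative pattern and a quantitative profile can be sketched in a few lines of Python. This is only an illustration: the pattern is a hand-written regular expression for the N-glycosylation site motif N-{P}-[ST]-{P}, while `TOY_PROFILE`, its scores, and the function names are invented for this example and are a much-simplified stand-in for the profiles used by Pfam, SMART, or PROSITE.

```python
import re

# Qualitative description: a sequence either matches the pattern or it does not.
# N, then anything but P, then S or T, then anything but P.
PATTERN = re.compile(r"N[^P][ST][^P]")

def matches_pattern(seq):
    """Return True if the pattern occurs anywhere in the sequence."""
    return PATTERN.search(seq) is not None

# Quantitative description: a toy position-specific scoring table.
# Each position holds (scores for listed residues, score for any other residue).
# All values are hypothetical and chosen only for illustration.
TOY_PROFILE = [
    ({"N": 3.0}, -3.0),
    ({"P": -4.0}, 0.5),
    ({"S": 2.0, "T": 2.0}, -2.0),
    ({"P": -4.0}, 0.5),
]

def best_profile_score(seq):
    """Slide the profile along the sequence; return (best score, start index)."""
    width = len(TOY_PROFILE)
    best = None
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(
            scores.get(residue, default)
            for residue, (scores, default) in zip(window, TOY_PROFILE)
        )
        if best is None or score > best[0]:
            best = (score, start)
    return best
```

With these toy values, a sequence that fails the pattern outright (for example, one with P in the second position) still receives a score from the profile, only a lower one. That is the sense in which a profile is quantitative: the user decides where to set the score threshold.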
Pattern syntax
• The symbol 'x' is used for a position where any amino acid is accepted.
• Ambiguities are indicated by listing the acceptable amino acids for a given position between square brackets '[ ]'. For example: [ALT] stands for Ala or Leu or Thr.
• Ambiguities are also indicated by listing between a pair of curly brackets '{ }' the amino acids that are not accepted at a given position. For example: {AM} stands for any amino acid except Ala and Met.
• Each element in a pattern is separated from its neighbor by a '-'.
• Repetition of an element of the pattern can be indicated by following that element with a numerical value or, if it is a gap ('x'), by a numerical range between parentheses. Examples:
  -- x(3) corresponds to x-x-x
  -- x(2,4) corresponds to x-x or x-x-x or x-x-x-x
  -- A(3) corresponds to A-A-A
  -- Note: You can only use a range with 'x', i.e. A(2,4) is not a valid pattern element.
• When a pattern is restricted to either the N- or C-terminal of a sequence, that pattern either starts with a '<' symbol or respectively ends with a '>' symbol.
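The syntax rules above map closely onto regular expressions, so one way to try them out is a small translator. The Python sketch below is a minimal illustration that covers only the rules listed on this slide; the function name `prosite_to_regex` is made up for this example, and it is not the code PROSITE itself uses.

```python
import re

# One pattern element: x, a single residue, [allowed] or {forbidden},
# optionally followed by (n) or (n,m).
ELEMENT = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.strip()
    prefix = suffix = ""
    if pattern.startswith("<"):      # anchored at the N-terminal
        prefix, pattern = "^", pattern[1:]
    if pattern.endswith(">"):        # anchored at the C-terminal
        suffix, pattern = "$", pattern[:-1]

    parts = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"invalid pattern element: {element!r}")
        core, low, high = match.groups()
        if high is not None and core != "x":
            raise ValueError(f"a range is only allowed with 'x': {element!r}")

        if core == "x":
            regex = "."                      # any amino acid
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"  # any amino acid except these
        else:
            regex = core                     # single residue or [..] group

        if high is not None:
            regex += "{" + low + "," + high + "}"
        elif low is not None:
            regex += "{" + low + "}"
        parts.append(regex)

    return prefix + "".join(parts) + suffix
```

For example, `prosite_to_regex("N-{P}-[ST]-{P}")` returns `N[^P][ST][^P]`, which can then be passed to `re.search` together with a protein sequence written in one-letter code.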