29th Feb, 2012 Bioinformatics Ayesha Masrur Khan
Protein Family and Domains
Once a protein sequence is obtained, many questions can be asked, such as:
- What is the protein’s overall identity?
- What putative functions does it have?
- What biological motifs are present?
Different computational tools are needed to determine possible functional domains from primary sequence data. Lec-5
Protein Family and Domains (contd.)
• Therefore, family and domain databases are used to address questions such as ‘What domains are contained within this sequence?’ or ‘What family does this protein belong to?’
But first: what are families and domains?
Protein Family and Domains (contd.)
Family ---> A family of proteins was originally defined by Dayhoff et al. (1978) as a group of sequences with similar functions that share more than 50% identity when aligned. Families are often also characterized by the presence of one or more domains with high sequence similarity.
Domains ---> Traditionally known as structurally independent folding units, domains are conserved functional units that may contain one or more motifs.
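The 50% identity criterion can be made concrete with a small calculation. The following is only a sketch: the function names, the toy sequences, and the choice to ignore gapped columns are assumptions made for illustration (other conventions divide by the full alignment length or by the shorter sequence), and the check covers only the identity part of the definition, not the "similar functions" part.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # assumption: gapped columns are skipped entirely
        compared += 1
        if res_a == res_b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 50.0) -> bool:
    """True if the pair exceeds the identity threshold (sequence criterion only)."""
    return percent_identity(aligned_a, aligned_b) > threshold


# Hypothetical pre-aligned toy sequences, not real proteins.
seq_a = "MKT-AYIAKQR"
seq_b = "MKTLAYLAKHR"

print(percent_identity(seq_a, seq_b))          # 8 identical out of 10 compared columns
print(meets_identity_threshold(seq_a, seq_b))
```

Note that the input must already be aligned; the function does not perform the alignment itself.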
Protein Family and Domains (contd.)
Motifs ---> These include both short stretches of fixed residue length that act as sites for post-translational modifications and longer sequences that form secondary structures for protein-DNA, protein-ion or protein-lipid interactions.
Domain Example: Pyruvate kinase
Quaternary structure: 4 subunits, 3 domains
Zinc finger motif: A sequence motif
Sequence motif: a particular amino-acid sequence that is characteristic of a specific biochemical function.
Figure: Three zinc fingers bound spirally in the major groove of a DNA molecule.
Figure: The coordination of a zinc atom by characteristically spaced cysteine and histidine residues in a single zinc finger motif.
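Because a sequence motif is defined by characteristically spaced residues, it can be searched for with a pattern. The sketch below uses a regular expression with the classic C2H2-style spacing (Cys, 2 to 4 residues, Cys, 12 residues, His, 3 to 5 residues, His). This is a deliberately simplified pattern chosen for illustration, not a curated database entry; the function name and the toy sequence are hypothetical.

```python
import re

# Simplified C2H2-like spacing: C-x(2,4)-C-x(12)-H-x(3,5)-H
# (illustrative only; curated motif databases use more restrictive patterns)
C2H2_LIKE = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_motifs(sequence: str, pattern: re.Pattern = C2H2_LIKE):
    """Return (start, end, matched_text) tuples with 1-based inclusive positions."""
    return [(m.start() + 1, m.end(), m.group()) for m in pattern.finditer(sequence)]


# Hypothetical toy sequence built to contain exactly one match.
toy = "MA" + "CAAC" + "A" * 12 + "HAAAH" + "GG"

for start, end, text in find_motifs(toy):
    print(start, end, text)
```

A match only says that the spacing is present; as with any short pattern, it can occur by chance in unrelated sequences.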
Other examples: structural motifs
Another type is the functional motif, a sequence or structural motif that is always associated with a particular biochemical function.
Protein families
• Protein families are related to one another by sequence similarity, domain composition, or structure.
• These include proteins found across species (orthologs) or within the same species (paralogs).
• Family descriptors are derived from MSAs (multiple sequence alignments) that enable us to define traits that encompass all member sequences.
• Family descriptors have been based on sequence identity (>50% identical), common domains (e.g. catalytic binding domains, calcium binding motifs etc.), structure, or a combination of these characteristics.
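One simple way to see how a descriptor can be derived from an MSA is to count residues column by column and keep the residue that dominates each column. The sketch below is an assumption-laden toy: the alignment is invented, it contains no gaps, and the majority threshold `min_fraction` is a hypothetical parameter. Real family databases use richer descriptors than a plain consensus string.

```python
from collections import Counter


def column_profile(msa):
    """Per-column residue counts for an alignment given as equal-length strings."""
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    return [Counter(column) for column in zip(*msa)]


def consensus(msa, min_fraction=0.5, unknown="x"):
    """Consensus string: the top residue per column if it exceeds min_fraction."""
    result = []
    for counts in column_profile(msa):
        residue, count = counts.most_common(1)[0]
        result.append(residue if count / len(msa) > min_fraction else unknown)
    return "".join(result)


# Hypothetical gap-free toy alignment of four family members.
toy_msa = [
    "GKSTL",
    "GKTTL",
    "GRSAL",
    "GKSAV",
]

print(consensus(toy_msa))  # column 4 is split 2/2, so it is reported as "x"
```

Columns with a clear majority become part of the descriptor, while variable columns are left open, which is the basic idea behind defining traits that encompass all member sequences.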
Protein Domains
• Domains represent discrete stretches within the protein, unlike protein families, which are commonly defined over the length of the sequence.
• These units are conserved at the level of sequence and structure.
• They can be described by:
  • combinations of short regions of highly conserved amino acids within a domain
  • all amino acids
  • structural features
• Domain description is developed in the same way as the family descriptors.
Family-Domain Databases
• Because of the reuse of motifs and domains, similarities can be found within sequences that are otherwise unrelated evolutionarily.
• Therefore, methods are needed to distinguish between similarities due to random variation and those of common origin or function.
Family-domain databases provide the following benefits:
• Increased sensitivity, i.e. true matches are detected through MSA
• Increased specificity, i.e. detect only related proteins
• Classification of protein sequences to appropriate families
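Sensitivity and specificity can be quantified when the true family membership is known. The sketch below uses the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)); the identifiers, the toy database, and the search hits are all invented for illustration.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of true family members that the search detects."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of unrelated sequences that the search correctly rejects."""
    return true_neg / (true_neg + false_pos)


# Hypothetical toy database of ten sequences.
database = {f"P{i}" for i in range(1, 11)}
family_members = {"P1", "P2", "P3", "P4"}   # assumed ground truth
search_hits = {"P1", "P2", "P3", "P7"}      # assumed search result

tp = len(search_hits & family_members)
fp = len(search_hits - family_members)
fn = len(family_members - search_hits)
tn = len(database - family_members - search_hits)

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy case one family member is missed (lowering sensitivity) and one unrelated sequence is reported (lowering specificity).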
Family-Domain Databases
Some database references
Searching sequence databases
• Search methods engage in a series of sequence alignments to determine degrees of similarity between sequences and then return a list of matched sequences to the user.
Alignment Algorithms
• Manually, we examine two or more sequences for similar residue patterns, match up identical residues, decide qualitatively whether they are aligned well, and determine statistically how identical or similar the sequences are.
• The automation of this process requires a computer-based method to line sequences up against one another and a scoring method for evaluating the success of the alignment in terms of similarity or identity.
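As one possible instance of such a method, the sketch below computes a global alignment score by dynamic programming in the style of Needleman-Wunsch. The slides do not name a specific algorithm, so this choice is an assumption, as are the scoring values (match +1, mismatch -1, gap -2). Practical protein searches typically use substitution matrices and more refined gap penalties. Only the optimal score is returned, not the alignment itself.

```python
def global_alignment_score(seq_a: str, seq_b: str,
                           match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score with a linear gap penalty."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align the two residues
                score[i - 1][j] + gap,       # gap in seq_b
                score[i][j - 1] + gap,       # gap in seq_a
            )
    return score[-1][-1]


# Toy sequences chosen for illustration.
print(global_alignment_score("ACGT", "ACT"))  # three matches and one gap
```

A database search in this spirit would score the query against every stored sequence and return the best-scoring entries, although real tools rely on faster heuristics than full dynamic programming for every pair.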