D2DBT9 - Genetic Analysis and Bioinformatics
Collecting Public Information on Proteins – Self-guided practical session
Dr. Jaume Bacardit
jaume.bacardit@nottingham.ac.uk
Protein we are going to use today…
• In most examples we are going to use the AXR4 protein from Arabidopsis thaliana:
MAIITEEEEDPKTLNPPKNKPKDSDFTKSESTMKNPKPQSQNPFPFWFYFTVVVSLATII
FISLSLFSSQNDPRSWFLSLPPALRQHYSNGRTIKVQVNSNESPIEVFVAESGSIHTETV
VIVHGLGLSSFAFKEMIQSLGSKGIHSVAIDLPGNGFSDKSMVVIGGDREIGFVARVKEV
YGLIQEKGVFWAFDQMIETGDLPYEEIIKLQNSKRRSFKAIELGSEETARVLGQVIDTLG
LAPVHLVLHDSALGLASNWVSENWQSVRSVTLIDSSISPALPLWVLNVPGIREILLAFSF
GFEKLVSFRCSKEMTLSDIDAHRILLKGRNGREAVVASLNKLNHSFDIAQWGNSDGINGI
PMQVIWSSEASKEWSDEGQRVAKALPKAKFVTHSGSRWPQESKSGELADYISEFVSLLPK
SIRRVAEEPIPEEVQKVLEEAKAGDDHDHHHGHGHAHAGYSDAYGLGEEWTTT
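Most of the tools in this session expect the sequence as one clean string or as FASTA text. A minimal sketch, assuming the sequence was copied from the slide with spaces or line breaks between blocks; the function names and the FASTA header text are illustrative choices, not taken from any database record:

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Remove all whitespace, upper-case, and reject non-standard letters."""
    seq = "".join(raw.split()).upper()
    unknown = set(seq) - STANDARD_AMINO_ACIDS
    if unknown:
        raise ValueError(f"unexpected characters: {sorted(unknown)}")
    return seq


def to_fasta(header: str, seq: str, width: int = 60) -> str:
    """Format a sequence as FASTA text with fixed-width lines."""
    chunks = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + header + "\n" + "\n".join(chunks) + "\n"


def composition(seq: str) -> Counter:
    """Count how often each amino acid occurs."""
    return Counter(seq)
```

Pasting the full AXR4 sequence into `clean_sequence` and passing the result to `to_fasta("AXR4 Arabidopsis thaliana", seq)` gives text that can be submitted to the search forms used later in the session.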
Biological databases
• UniProt
• NCBI Entrez
• Pfam
UniProt
• UniProt is a collaboration between the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR).
• It is the main protein database
• http://www.uniprot.org/
Querying UniProt with a protein name
• Name: AXR4
• UniProt ID
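The same name search can also be run from a script instead of the web form. A minimal sketch, assuming the UniProt REST search endpoint at `rest.uniprot.org` and its `query`, `format`, `fields` and `size` parameters; the function names are hypothetical, and 3702 is the NCBI taxonomy ID of Arabidopsis thaliana:

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the UniProt REST search service.
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def build_uniprot_query(gene: str, taxon_id: int,
                        fields=("accession", "id", "protein_name", "length"),
                        size: int = 5) -> str:
    """Build a search URL for a gene name restricted to one organism."""
    params = {
        "query": f"gene:{gene} AND organism_id:{taxon_id}",
        "format": "tsv",               # tab-separated text, one hit per line
        "fields": ",".join(fields),    # columns to return
        "size": size,                  # maximum number of hits
    }
    return UNIPROT_SEARCH + "?" + urlencode(params)


def fetch_text(url: str) -> str:
    """Download the response body as text (needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_text(build_uniprot_query("AXR4", 3702))` should return a small table whose first column holds the UniProt ID shown on the search results page.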
Included in the AXR4 page…
• Annotation of the protein: function, location, specificity, disruption phenotype
• Gene Ontology
• Sequence
• Transmembrane potential
• Bibliographic references
• Cross-references to other databases: GenBank, PIR, KEGG, TAIR (Arabidopsis-specific)
• Exercise: identify each of these on the page
The BLAST search returns these results
• The top hit is the most likely protein for the query sequence, especially if the alignment score is high
Identifying homologs and aligning them
• The BLAST section of UniProt is also useful for identifying homologs. These are the hits that follow the first one in the list.
• Now let’s select a few of them and click ‘Align’ on the green bar
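The reading of the hit list described above (top hit first, homologs after it) can be sketched in a few lines. This is only an illustration: the `Hit` record, the function name and the E-value cutoff of 1e-5 are assumptions made for the example, not settings taken from UniProt, and real hit lists would come from the BLAST results page.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    accession: str   # identifier of the matched protein
    score: float     # alignment score (higher is better)
    evalue: float    # expected number of chance matches (lower is better)


def split_hits(hits: List[Hit], max_evalue: float = 1e-5) -> Tuple[Hit, List[Hit]]:
    """Return the best-scoring hit and the remaining hits that pass the cutoff."""
    if not hits:
        raise ValueError("no hits to rank")
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    top, rest = ranked[0], ranked[1:]
    # Hits after the first one are candidate homologs if they are significant.
    homologs = [h for h in rest if h.evalue <= max_evalue]
    return top, homologs
```

The candidate homologs returned here correspond to the entries one would tick before clicking ‘Align’.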
ClustalW also generates phylogenetic trees (we will discuss them in a future lecture)
UniProt is not the only source of protein information…
• The NCBI’s Entrez system returns this for AXR4
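Entrez can also be queried from a script through NCBI's E-utilities. A minimal sketch, assuming the `esearch.fcgi` endpoint with its `db`, `term`, `retmax` and `retmode` parameters and the `[Gene Name]` and `[Organism]` field tags; the function name is hypothetical:

```python
from urllib.parse import urlencode

# Assumed endpoint of the NCBI E-utilities search service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_entrez_search(gene: str, organism: str,
                        db: str = "protein", retmax: int = 5) -> str:
    """Build an ESearch URL for a gene name in one organism."""
    term = f"{gene}[Gene Name] AND {organism}[Organism]"
    params = {
        "db": db,            # Entrez database to search
        "term": term,        # query with field tags
        "retmax": retmax,    # maximum number of record IDs to return
        "retmode": "json",   # ask for a JSON response
    }
    return EUTILS_ESEARCH + "?" + urlencode(params)
```

Opening the URL from `build_entrez_search("AXR4", "Arabidopsis thaliana")` in a browser returns a list of record IDs rather than the formatted page shown on the slide.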
Pfam returns three possible sequence motifs (but no significant results)
Protein Data Bank (PDB)
• Screenshot from 2009; the database is larger now
• Put your PDB ID here
• Each entry in PDB is identified by a 4-character code (for example 2p31)
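Before pasting an ID into the search box it can be checked for the right shape. A minimal sketch, assuming the classic PDB ID format of one digit (1-9) followed by three letters or digits; this pattern is an assumption, since the slide only states the length of the code, and the function name is hypothetical:

```python
import re

# Assumed classic PDB ID pattern: a digit 1-9, then three alphanumerics.
PDB_ID_PATTERN = re.compile(r"[1-9][A-Za-z0-9]{3}")


def looks_like_pdb_id(text: str) -> bool:
    """Return True if the text has the shape of a classic PDB ID."""
    return PDB_ID_PATTERN.fullmatch(text.strip()) is not None
```

With this pattern `2p31` is accepted, while a gene name such as `AXR4` is rejected because it starts with a letter.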
Entry 2p31
• Let’s click on ‘Display PDB file’
PDB file for 2p31
• Sequence
• Atomic coordinates of the amino acids
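The two parts of the file pointed out on this slide can be read with a short script. A minimal sketch, assuming the fixed-column legacy PDB text format, in which `SEQRES` records list residue names per chain and `ATOM` records hold x, y, z coordinates in columns 31-54; the function names are hypothetical, and non-standard residues are reported as `X`:

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def read_seqres(lines):
    """Collect the sequence of each chain from SEQRES records."""
    chains = {}
    for line in lines:
        if line.startswith("SEQRES"):
            chain_id = line[11]            # column 12: chain identifier
            residues = line[19:].split()   # column 20 onwards: residue names
            chains.setdefault(chain_id, []).extend(residues)
    return {chain: "".join(THREE_TO_ONE.get(r, "X") for r in names)
            for chain, names in chains.items()}


def read_atoms(lines):
    """Extract atom name, residue, chain and coordinates from ATOM records."""
    atoms = []
    for line in lines:
        if line.startswith("ATOM  "):
            atoms.append({
                "atom": line[12:16].strip(),      # columns 13-16
                "residue": line[17:20],           # columns 18-20
                "chain": line[21],                # column 22
                "resseq": int(line[22:26]),       # columns 23-26
                "xyz": (float(line[30:38]),       # columns 31-38
                        float(line[38:46]),       # columns 39-46
                        float(line[46:54])),      # columns 47-54
            })
    return atoms
```

After saving the displayed file locally, both functions can be applied to the lines of that file, for example `read_seqres(open("2p31.pdb"))`, where the file name is whatever was chosen when saving.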
Prediction sites
• Not all public information about proteins is experimentally established
• There are also sites that make predictions based on a protein’s primary sequence
• These services sometimes take a long time to run, and ask for an email address so they can notify the user when the prediction is finished