Lecture 08 PROTEIN SEQUENCE ANALYSIS
PROTEIN DATABASES
PROTEIN SEQUENCE
PROPERTIES
TOOLS
MOTIF/DOMAIN
FOLDING
Protein Sequence/Motif/Domain Databases
http://www.uniprot.org/
http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi
http://www.ebi.ac.uk/Tools/pfa/iprscan/
Protein Analysis Tools
http://tw.expasy.org/
Example: cellular tumor antigen p53 isoform a [Homo sapiens]
>gi|120407068|ref|NP_000537.3| cellular tumor antigen p53 isoform a [Homo sapiens]
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEAA
PPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKT
CPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRN
TFRHSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGR
DRRTEEENLRKKGEPHHELPPGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELNEALEL
KDAQAGKEPGGSRAHSSHLKSKKGQSTSRHKKLMFKTEGPDSD
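A record like the p53 entry above is in FASTA format: one header line starting with ">", followed by sequence lines that are simply concatenated. The lecture itself works with web tools, so the following is only a minimal Python sketch (the choice of Python and the helper names `parse_fasta` and `composition` are assumptions) showing how such a record can be read and summarized by length and amino-acid composition.

```python
from collections import Counter


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:  # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def composition(seq):
    """Return {residue: percentage of the sequence}, sorted by residue letter."""
    counts = Counter(seq.upper())
    total = len(seq)
    return {aa: 100.0 * n / total for aa, n in sorted(counts.items())}
```

For the p53 isoform a record above, `header, seq = parse_fasta(text)[0]` gives a sequence of 393 residues, and `composition(seq)` shows, for example, how proline-rich the protein is.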
http://www.ebi.ac.uk/Tools/pfa/iprscan/
InterPro is an integrated resource of protein families, domains and sites. It combines the commonly used signature databases and offers an intuitive interface for text- and sequence-based searches.
Bioinformatics infrastructure is crucial to modern biological research: complete, up-to-date databases of biological knowledge are vital for increasingly information-dependent biological and biotechnological research. Secondary protein databases of functional sites and domains, such as PROSITE, PRINTS, SMART, Pfam and ProDom, are vital resources for detecting distant relationships in novel sequences and hence for predicting protein function and structure. Unfortunately, these signature databases do not share the same formats and nomenclature, and each has its own strengths and weaknesses. To capitalise on these strengths, EBI, SIB, the University of Manchester, the Sanger Institute, GENE-IT, CNRS/INRA, LION bioscience AG and the University of Bergen unified PROSITE, PRINTS, ProDom and Pfam into InterPro (Integrated resource of Protein Families, Domains and Sites). The latest databases to join the project were SMART and, more recently, TIGRFAMs.
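PROSITE, one of the member databases named above, describes many sites as short patterns such as the N-glycosylation pattern N-{P}-[ST]-{P} (Asn, then anything but Pro, then Ser or Thr, then anything but Pro). A minimal Python sketch of how a pattern-type signature is matched is given below. It covers only a simplified subset of the pattern syntax (x, [..], {..}, repeats like x(2,4), and the < and > terminal anchors); the function names are hypothetical, and profile- or HMM-based signatures such as Pfam are not regular expressions and work differently.

```python
import re

# One PROSITE pattern element: x, [allowed], {forbidden} or a single residue,
# optionally followed by a repeat count (n) or (n,m).
ELEMENT = re.compile(r"(x|\[[A-Z]+\]|\{[A-Z]+\}|[A-Z])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert a simple PROSITE pattern (e.g. 'N-{P}-[ST]-{P}') to a Python regex.

    Simplified subset only: '>' inside brackets (e.g. [G>]) is not supported.
    """
    pattern = pattern.strip().rstrip(".")
    prefix = suffix = ""
    if pattern.startswith("<"):  # anchored at the N-terminus
        prefix, pattern = "^", pattern[1:]
    if pattern.endswith(">"):  # anchored at the C-terminus
        suffix, pattern = "$", pattern[:-1]
    parts = []
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return prefix + "".join(parts) + suffix


def find_sites(pattern, protein):
    """1-based start positions and matched text of all (possibly overlapping) hits."""
    # The lookahead lets overlapping sites such as NNTS/NTSS both be reported.
    rx = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in rx.finditer(protein.upper())]
```

For example, `find_sites("N-{P}-[ST]-{P}", seq)` lists candidate N-glycosylation sites in a sequence. Pattern hits like these are only potential sites, which is why databases such as InterPro combine them with other signature types.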
Protein Sequence Analysis Tools
http://www.expasy.org/
http://tw.expasy.org/
ExPASy Molecular Biology Server: ExPASy (Expert Protein Analysis System) is the new SIB Bioinformatics Resource Portal, which provides access to scientific databases and software tools in different areas of the life sciences, including proteomics, genomics, phylogeny, systems biology, population genetics and transcriptomics.
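ExPASy tools such as ProtParam report a protein's molecular weight. The following hedged Python sketch illustrates what that calculation involves: it sums average residue masses and adds one water molecule for the free termini. The mass table is typed in here as an assumption (values in Da, rounded), and ExPASy may use slightly different values, so small differences from its output are expected. `average_mw` is a hypothetical name.

```python
# Approximate average residue masses in Da (free amino acid minus H2O).
# Assumed values for illustration; tools may use slightly different tables.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # added once for the free N- and C-termini


def average_mw(seq):
    """Average molecular weight (Da) of an unmodified linear polypeptide."""
    seq = seq.upper()
    unknown = set(seq) - set(AVG_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER
```

The result ignores post-translational modifications and disulfide bonds, which is also why a predicted MW can differ from what is observed on a gel.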
Practical:Gene; RNA; Protein U62639 (Gene) aaaaatgtatgtctgattttgaaatgctcatttcctttgaggtttccatttttgagttgc 61 ccgtaatttgtatttttctgaagatgagcaattcaatttttaaattgcccgcacctctac 121 cgtttccatcgtgtattttgttaaaatattcacagattaacccatttaccgtttcatcca 181 cctgtttttcctcgaaaagattccaatgttctataattctacaaaacttcccacgcgaga 241 aacaactgtaataaactgaatatattatctatcgcatcgttttcaaccagaattaagcaa 301 gaggttccacaactttaaacaccaacaacgcaatcctaaatcatttgcaagattttattt 361 cagatgctacactttctgcctgaaaaaaattctgaaaagccgaacaataattcatggtaa 421 caatgaatggcagatacatcaaagttttagatgaacaatttttatgtattaaatgtacat 481 ttaaaaacaaattgcacaacgattctactactgtcgcactaattttacgtatgtctgtac 541 ttgaagatttcgaattaatttgttcaatattgtgttaaaatgtttgatttatacactcaa 601 atctttaaaagatttattggaaaagataaatggttaatttaaaccaaaaatttccatcaa 661 gccttttctgaaaacactaaaattattttcgtggtgggaccaggcgcgcgcgtcccatga 721 tgttcctttaatcaaaatgcatttctgtcccggcgggagaaattgaattttgattttaag 781 gcgcgaatttttgcctaaaaacgatgccattctttcattcttttcataatctcactcacc 841 atgagaaccatgcgccttgcttggttgctcccactttttattcacatactaatcaaggta 901 atttccccgtttttctagttttttcaatgtattttcatgtttcagaacacagctcaagct 961 ccggctgtcaacaactcgacatgcgatcaagcaaaggaatttgattgcgggaacgggaga 1021 ctccgatgcattcccgcggagtggcaatgcgacaacgtagcggactgcgacaaaggaaga 1081 gacgaatcgggctgctcatatgcgcatcattgttcgacaagcttcatgttatgcaagaat 1141 ggactgtgtgtcgcaaatgagttcaaatgcgacggcgaagacgactgccgcgatggaagc 1201 gatgagcagcattgcgagtacaatatcctgaagtctcgcttcgatggttccaatccttcg 1261 gctcctaccactttcgttggtcacaatggcccagaatgccatcctcctcgtttacgatgc 1321 cgatcaggacaatgtattcaaccagatctcgtttgtgatggacatcaggattgttctgga 1381 ggagatgatgaggtcaactgcaccagaaggggacatgaaaatatgcagtcctcgactgat 1441 tttcacgatgatgttcatcttgtcgatccaacctttttcgctaatgaagacaataaggta 1501 attgtttaatgtttattaatccgttttaacttttatttttcagtgtcggagtggatacac 1561 aatgtgccatagcggagacgtctgcatacctgacagttttctttgtgacggcgatctaga 1621 ttgtgatgatgcttcggacgagaaaaactgccaaactaatgctccaagcgaagaagaata 1681 tctttctgggcaagccgatcacatgcattcgtgctcagcagcaggaatgtattcttgtgg 1741 
aacaaaaggatccgaaattggcgtttgtattccgatgaatgccacgtgtaatgggatcaa 1801 ggagtgtccactaggagatgacgagtcaaaacattgctccgaatgtgccagaaagcgatg 1861 tgaccacacatgtatgaacactccacacggggctcgctgcatttgtcaagaaggatataa 1921 gcttgccgatgacggactcacttgcgaggatgaagatgagtgtgcaactcatgggcactt 1981 gtgccagcatttctgtgaagatcgtttgggttcctttgcatgcaaatgtgccaacggtta 2041 tgagcttgaaacggatgggcattcttgtaaatacgaggcaaccactacgccagaaggata 2101 tttgttcatcagtcttggtggagaagttcgacagatgccattggcagatttcaccgatgg 2161 ttcaaattactcggcgattcaaaagtttgctggccacggaaccatcagatcgatcgactt 2221 catgcatcgcaacaacaaaatgttcatgtcaatttctgatgagcacggtgatccaactgg 2281 cgaattgtcagtgtccgacaatggattgatgagagttcttcgagaaaatgtcattggagt 2341 gagcaacgtggcagtcgactggattggtggaaacgttttcttcacacaaaaatgtatgtt 2401 tatctaatgtttaaatttttcatttgtgattcttacagctccatctccaagcgctgggat 2461 ttccatctgcacaatgagcggaatgttctgtcgccgagttatcgaaggcaaagaacaagg 2521 acaatcctatcgtggtcttgttgttcacccgatgcgcggtctcatcatctggatcgattc 2581 ttatcagaaatatcatcgcatcatgatggctaatatggatgggtctcaggtgagtcgatc 2641 gagtcgatctgatttagttcatttctaaataaatttcaggtcagaatccttctcgacaac 2701 aagttggaagttccatcagctcttgccatcgactacatccgccacgatgtctattttgga 2761 gatgttgaacgtcagttgatcgaaagagtcaatatcgacacgaaagagcgccgcgtagtg 2821 atttcgaacggagttcatcatccgtatgacatggcttacttcaatggtttcctatactgg 2881 gcagattggtaagacatcttatctaatttatattttcaaatttatttttcaggggaagcg 2941 agtcattaaaggttcaagagatgacccatcatcattcgagtcctcaagtcatccatactt 3001 tcaatcgttatccatatggtattgctgtcaatcactcactctaccagactggtcctccat 3061 caaacccatgccttgaactcgagtgcccatggctctgcgttattgtgccaaagagcgatt 3121 tcattatgactgccaagtgtgtctgcccagacggatacactcattccgtcactgaaaact 3181 cttgcatcccgcctgtgacgattgaggacgaggagaaccttgagaagctttcccacattg 3241 gatctgctttgatggccgaatactgcgaagctggtgtcgcgtgtatgaatggaggagcct 3301 gccgtgaactacaaaatgagcacggaagagctcatcgcatcgtttgtgattgtgagggtc 3361 catatgacgggcaatactgcgaacggctcaatccagagaagttctccgcaatggaagagg 3421 aagattcgtccttatggcttatcgttctgcttctcatttttctcatcatcgttgcggtag 3481 tcggaattattgccttcctttggttttctcaacaagagcatatgaaagatgtgatttcca 3541 
ctgcccgtgtccgtgttgataacatggctagaaaagcggaagatgctgcagctccaattg 3601 tcgagaagttccgcaaggtcactgataagcagaggagcacgcctcctagagaaggttgtc 3661 aaacggcaacaaacgttgacttcgtttcctacgagacaaatgctgagaaaagaattcgga 3721 tggactcttcgccgacgtcatacggaaaccccatgtacgatgaagttcctgaatcgtcaa 3781 ctggtttcgtcagatcggcttccgcaccattcgctggagtcattcgatttgagaacgaca 3841 gcttgttgtgaattctactacaaaattactaaatcagatgtctgtaaagtatatctattt 3901 ttgcctatttattgcatgaaagttgataatgtcta
Practical:Gene; RNA; Protein U62639 (mRNA) atgagaaccatgcgccttgcttggttgctcccactttttattcacatactaatcaagaac 61 acagctcaagctccggctgtcaacaactcgacatgcgatcaagcaaaggaatttgattgc 121 gggaacgggagactccgatgcattcccgcggagtggcaatgcgacaacgtagcggactgc 181 gacaaaggaagagacgaatcgggctgctcatatgcgcatcattgttcgacaagcttcatg 241 ttatgcaagaatggactgtgtgtcgcaaatgagttcaaatgcgacggcgaagacgactgc 301 cgcgatggaagcgatgagcagcattgcgagtacaatatcctgaagtctcgcttcgatggt 361 tccaatccttcggctcctaccactttcgttggtcacaatggcccagaatgccatcctcct 421 cgtttacgatgccgatcaggacaatgtattcaaccagatctcgtttgtgatggacatcag 481 gattgttctggaggagatgatgaggtcaactgcaccagaaggggacatgaaaatatgcag 541 tcctcgactgattttcacgatgatgttcatcttgtcgatccaacctttttcgctaatgaa 601 gacaataagtgtcggagtggatacacaatgtgccatagcggagacgtctgcatacctgac 661 agttttctttgtgacggcgatctagattgtgatgatgcttcggacgagaaaaactgccaa 721 actaatgctccaagcgaagaagaatatctttctgggcaagccgatcacatgcattcgtgc 781 tcagcagcaggaatgtattcttgtggaacaaaaggatccgaaattggcgtttgtattccg 841 atgaatgccacgtgtaatgggatcaaggagtgtccactaggagatgacgagtcaaaacat 901 tgctccgaatgtgccagaaagcgatgtgaccacacatgtatgaacactccacacggggct 961 cgctgcatttgtcaagaaggatataagcttgccgatgacggactcacttgcgaggatgaa 1021 gatgagtgtgcaactcatgggcacttgtgccagcatttctgtgaagatcgtttgggttcc 1081 tttgcatgcaaatgtgccaacggttatgagcttgaaacggatgggcattcttgtaaatac 1141 gaggcaaccactacgccagaaggatatttgttcatcagtcttggtggagaagttcgacag 1201 atgccattggcagatttcaccgatggttcaaattactcggcgattcaaaagtttgctggc 1261 cacggaaccatcagatcgatcgacttcatgcatcgcaacaacaaaatgttcatgtcaatt 1321 tctgatgagcacggtgatccaactggcgaattgtcagtgtccgacaatggattgatgaga 1381 gttcttcgagaaaatgtcattggagtgagcaacgtggcagtcgactggattggtggaaac 1441 gttttcttcacacaaaaatctccatctccaagcgctgggatttccatctgcacaatgagc 1501 ggaatgttctgtcgccgagttatcgaaggcaaagaacaaggacaatcctatcgtggtctt 1561 gttgttcacccgatgcgcggtctcatcatctggatcgattcttatcagaaatatcatcgc 1621 atcatgatggctaatatggatgggtctcaggtcagaatccttctcgacaacaagttggaa 1681 gttccatcagctcttgccatcgactacatccgccacgatgtctattttggagatgttgaa 1741 
cgtcagttgatcgaaagagtcaatatcgacacgaaagagcgccgcgtagtgatttcgaac 1801 ggagttcatcatccgtatgacatggcttacttcaatggtttcctatactgggcagattgg 1861 ggaagcgagtcattaaaggttcaagagatgacccatcatcattcgagtcctcaagtcatc 1921 catactttcaatcgttatccatatggtattgctgtcaatcactcactctaccagactggt 1981 cctccatcaaacccatgccttgaactcgagtgcccatggctctgcgttattgtgccaaag 2041 agcgatttcattatgactgccaagtgtgtctgcccagacggatacactcattccgtcact 2101 gaaaactcttgcatcccgcctgtgacgattgaggacgaggagaaccttgagaagctttcc 2161 cacattggatctgctttgatggccgaatactgcgaagctggtgtcgcgtgtatgaatgga 2221 ggagcctgccgtgaactacaaaatgagcacggaagagctcatcgcatcgtttgtgattgt 2281 gagggtccatatgacgggcaatactgcgaacggctcaatccagagaagttctccgcaatg 2341 gaagaggaagattcgtccttatggcttatcgttctgcttctcatttttctcatcatcgtt 2401 gcggtagtcggaattattgccttcctttggttttctcaacaagagcatatgaaagatgtg 2461 atttccactgcccgtgtccgtgttgataacatggctagaaaagcggaagatgctgcagct 2521 ccaattgtcgagaagttccgcaaggtcactgataagcagaggagcacgcctcctagagaa 2581 ggttgtcaaacggcaacaaacgttgacttcgtttcctacgagacaaatgctgagaaaaga 2641 attcggatggactcttcgccgacgtcatacggaaaccccatgtacgatgaagttcctgaa 2701 tcgtcaactggtttcgtcagatcggcttccgcaccattcgctggagtcattcgatttgag 2761 aacgacagcttgttgtga
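This practical compares the gene and mRNA with BESTFIT/GAP and BLASTN. As a simplified stand-in that uses exact matching rather than alignment, and is not how those programs work, the Python sketch below maps an mRNA onto its gene and reports the gaps (introns) between matched blocks. It assumes that the pasted sequences have been stripped of position numbers and spaces, that the exons occur exactly and in order in the gene, and that each exon is at least `seed` nucleotides long. The names `clean_sequence`, `map_exons` and `introns` are hypothetical.

```python
def clean_sequence(text):
    """Drop position numbers, spaces and line breaks from a pasted sequence."""
    return "".join(ch for ch in text if ch.isalpha()).lower()


def map_exons(gene, mrna, seed=20):
    """Greedy exact-match mapping of an mRNA onto its gene.

    Returns (gene_start, gene_end, mrna_start, mrna_end) blocks, 1-based inclusive.
    """
    gene, mrna = gene.lower(), mrna.lower()
    blocks = []
    i, g = 0, 0  # current position in the mRNA and search start in the gene
    while i < len(mrna):
        hit = gene.find(mrna[i:i + seed], g)
        if hit == -1:
            raise ValueError(f"no exact seed match for mRNA position {i + 1}")
        # Extend the match base by base until the sequences diverge.
        n = 0
        while (i + n < len(mrna) and hit + n < len(gene)
               and mrna[i + n] == gene[hit + n]):
            n += 1
        blocks.append((hit + 1, hit + n, i + 1, i + n))
        i, g = i + n, hit + n
    return blocks


def introns(blocks):
    """Gene coordinates (1-based, inclusive) of the gaps between mapped blocks."""
    return [(a[1] + 1, b[0] - 1) for a, b in zip(blocks, blocks[1:]) if b[0] - a[1] > 1]
```

Checking that each reported intron starts with GT and ends with AG is a quick sanity check. Greedy extension can shift a boundary by a few bases when the sequence at the junction is ambiguous; real spliced aligners resolve this by scoring splice sites.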
Practical:Gene; RNA; Protein AAD09364 (Protein) 1 MRTMRLAWLL PLFIHILIKN TAQAPAVNNS TCDQAKEFDC GNGRLRCIPA EWQCDNVADC 61 DKGRDESGCS YAHHCSTSFM LCKNGLCVAN EFKCDGEDDC RDGSDEQHCE YNILKSRFDG 121 SNPSAPTTFV GHNGPECHPP RLRCRSGQCI QPDLVCDGHQ DCSGGDDEVN CTRRGHENMQ 181 SSTDFHDDVH LVDPTFFANE DNKCRSGYTM CHSGDVCIPD SFLCDGDLDC DDASDEKNCQ 241 TNAPSEEEYL SGQADHMHSC SAAGMYSCGT KGSEIGVCIP MNATCNGIKE CPLGDDESKH 301 CSECARKRCD HTCMNTPHGA RCICQEGYKL ADDGLTCEDE DECATHGHLC QHFCEDRLGS 361 FACKCANGYE LETDGHSCKY EATTTPEGYL FISLGGEVRQ MPLADFTDGS NYSAIQKFAG 421 HGTIRSIDFM HRNNKMFMSI SDEHGDPTGE LSVSDNGLMR VLRENVIGVS NVAVDWIGGN 481 VFFTQKSPSP SAGISICTMS GMFCRRVIEG KEQGQSYRGL VVHPMRGLII WIDSYQKYHR 541 IMMANMDGSQ VRILLDNKLE VPSALAIDYI RHDVYFGDVE RQLIERVNID TKERRVVISN 601 GVHHPYDMAY FNGFLYWADW GSESLKVQEM THHHSSPQVI HTFNRYPYGI AVNHSLYQTG 661 PPSNPCLELE CPWLCVIVPK SDFIMTAKCV CPDGYTHSVT ENSCIPPVTI EDEENLEKLS 721 HIGSALMAEY CEAGVACMNG GACRELQNEH GRAHRIVCDC EGPYDGQYCE RLNPEKFSAM 781 EEEDSSLWLI VLLLIFLIIV AVVGIIAFLW FSQQEHMKDV ISTARVRVDN MARKAEDAAA 841 PIVEKFRKVT DKQRSTPPRE GCQTATNVDF VSYETNAEKR IRMDSSPTSY GNPMYDEVPE 901 SSTGFVRSAS APFAGVIRFE NDSLL
Practical: Gene; RNA; Protein
• Download the sequences Gene, RNA and Protein.
• Upload them to SeqWEB.
• ANALYSIS:
  1. Exon/intron organization
     - Use (1) BESTFIT & GAP ("gene" vs "rna")
     - (2) Genome BLASTN
  2. Open Reading Frame
     - Use MAP to find the ORF
     - Use TRANSLATE to write out the ORF
     - Compare your ORF with "protein"
  3. Protein Domain Search (NCBI CD Search, InterPro)
  4. Protein Sequence Analysis (see next page)
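MAP and TRANSLATE are programs in the GCG package behind SeqWEB. The same idea can be sketched in Python: translate with the standard genetic code and report ATG-to-stop open reading frames in all six frames. This is not GCG's implementation. The helper names, the default 30-codon minimum, and the rule that an ORF starts at the first ATG after the previous stop are all assumptions made for illustration.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate codon by codon; '*' marks a stop, 'X' an unknown codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[p:p + 3], "X") for p in range(0, len(dna) - 2, 3))


def find_orfs(dna, min_aa=30):
    """ATG-to-stop ORFs in all six frames, longest first.

    Each hit is (strand, frame, start, end, protein). Coordinates are 1-based on
    the strand searched ('+' = input, '-' = its reverse complement), end includes
    the stop codon, and the protein excludes it. ORFs lacking a stop are skipped.
    """
    dna = dna.upper()
    results = []
    for strand, s in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(s) - 2, 3):
                codon = s[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos  # first ATG after a stop opens the ORF
                elif start is not None and CODON_TABLE.get(codon) == "*":
                    protein = translate(s[start:pos])
                    if len(protein) >= min_aa:
                        results.append((strand, frame + 1, start + 1, pos + 3, protein))
                    start = None
    return sorted(results, key=lambda r: len(r[4]), reverse=True)
```

Applied to the cleaned U62639 mRNA, the first entry of `find_orfs(...)` should be the longest ORF. Its protein can then be compared with AAD09364, as the practical asks.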
ASSIGNMENT 03
Download the file ex.fasta.
1. Assemble the fragments.
2. How many potential reading frames are there?
3. Give the names of these genes.
4. What are the identity and similarity of the last gene with H. sapiens?
   - nucleotide and amino acid sequence
5. Give the MW, pI and potential post-translational modification sites of any ONE protein.
E-mail the ANSWER as attached files to petang@mail.cgu.edu.tw before ****
Email subject: ASS03 bioinfo – (student ID)
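Question 5 asks for a protein's pI. ExPASy's Compute pI/Mw tool estimates pI from tabulated pKa values of the ionizable groups. The Python sketch below shows the same idea with a bisection search for the pH at which the Henderson-Hasselbalch net charge is zero. The pKa set is an illustrative assumption; ExPASy uses a different set (Bjellqvist et al.), so its numbers will differ somewhat. `net_charge` and `isoelectric_point` are hypothetical names.

```python
# Illustrative pKa values (an assumption; different tools use different sets).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq, ph):
    """Henderson-Hasselbalch net charge of a linear polypeptide at a given pH."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1  # one free amino and carboxyl terminus
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq, low=0.0, high=14.0, tol=1e-4):
    """Bisection for the pH of zero net charge (the charge falls as pH rises)."""
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid  # still positive: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2
```

Bisection works here because the net charge decreases monotonically with pH, so exactly one zero crossing lies between pH 0 and 14.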