Introducing Database Mining to Molecular Genetics Students (Juniors & Seniors)

Presentation Transcript


  1. Introducing Database Mining to Molecular Genetics Students (Juniors & Seniors). Karl Wilson

  2. Objectives:
     • Introduce students to online protein and nucleotide databases (via GenBank at the NCBI website).
     • Specific operations:
       • Use of BLAST to find similar sequences (protein & nucleotide)
       • Downloading and saving sequences
       • Comparison of sequences and alignment with ClustalW
       • Interpretation of phylogenetic data

  3. The “test” protein sequence:

         AAA92063. cysteinyl endopep...[gi:1223922]

         LOCUS       AAA92063                 362 aa            linear   PLN 22-AUG-2002
         DEFINITION  cysteinyl endopeptidase [Vigna radiata].
         ACCESSION   AAA92063
         VERSION     AAA92063.1  GI:1223922
         DBSOURCE    locus VRU49445 accession U49445.1
         KEYWORDS    .
         SOURCE      Vigna radiata
           ORGANISM  Vigna radiata
                     Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
                     Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; rosids;
                     eurosids I; Fabales; Fabaceae; Papilionoideae; Phaseoleae; Vigna.
         REFERENCE   1  (residues 1 to 362)
           AUTHORS   Lee,K., Tan-Wilson,A.L. and Wilson,K.A.
           TITLE     Direct Submission
           JOURNAL   Submitted (16-FEB-1996) K. Lee, Department of Biological Sciences,
                     State University of New York at Binghamton, P.O. Box 6000,
                     Binghamton, NY 13902-6000, USA

  4. Student given VRU49445 sequence (only) via e-mail or Blackboard → Find sequence via Entrez, download in FASTA format → VRU49445 sequence → Submit to Protein-Protein BLAST (BLASTP) → BLASTP results: related sequences

  5. BLASTP hit list:

                                                                        Score    E
         Sequences producing significant alignments:                    (bits)  Value

         gi|1223922|gb|AAA92063.1| cysteinyl endopeptidase [Vigna ra...   705   0.0
         gi|118158|sp|P12412|CYSP_VIGMU Vignain precursor (Bean endo...   686   0.0
         gi|445927|prf||1910332A Cys endopeptidase                        684   0.0
         gi|7435774|pir||S22502 cysteine proteinase (EC 3.4.22.-) - ...   677   0.0
         gi|544129|sp|P25803|CYSP_PHAVU Vignain precursor (Bean endo...   674   0.0
         gi|1345573|emb|CAA40073.1| endopeptidase (EP-C1) [Phaseolus...   673   0.0
         gi|31559530|dbj|BAC77523.1| cysteine proteinase [Glycine ma...   657   0.0
         gi|31559526|dbj|BAC77521.1| cysteine proteinase [Glycine ma...   653   0.0
         gi|7435817|pir||T08122 cysteine endopeptidase (EC 3.4.22.-)...   580   e-164
         gi|600111|emb|CAA84378.1| cysteine proteinase [Vicia sativa]     540   e-152
         gi|3688528|emb|CAA06243.1| pre-pro-TPE4A protein [Pisum sat...   539   e-152
         gi|18423124|ref|NP_568722.1| cysteine proteinase [Arabidops...   521   e-147
         gi|30141021|dbj|BAC75924.1| cysteine protease-2 [Helianthus...   516   e-145
         gi|1076552|pir||S49166 cysteine proteinase (EC 3.4.22.-) pr...   510   e-143
         gi|7435811|pir||T06708 cysteine proteinase (EC 3.4.22.-) T2...   490   e-137
         gi|1169186|sp|P43156|CYSP_HEMSP Thiol protease SEN102 precu...   490   e-137
         gi|25289998|pir||JC7787 carrot seed cysteine proteinase (EC...   485   e-136
         gi|18408616|ref|NP_566901.1| cysteine proteinase, putative ...   483   e-135
         gi|1173630|gb|AAB37233.1| cysteine proteinase                    470   e-131
         gi|4731374|gb|AAD28477.1|AF133839_1 papain-like cysteine pr...   462   e-129
         gi|22331686|ref|NP_680113.1| cysteine proteinase, putative ...   462   e-129

  6. BLASTP results: related sequences → Copy most similar cDNA sequences (in FASTA format) → cDNA sequences from P. vulgaris, V. mungo, G. max, V. sativa, etc. → Submit sequences to ClustalW at the Biology Workbench website.

  7. Alignment of the Cysteine Proteases from Vigna, Phaseolus, Glycine, and Vicia.

         gi_118158_sp_P12412_CYSP_VIG  MAMKKLLWVVLSLSLVLGVANSFDFHEKDLESEESLWDLYERWRSHHTVS
         gi_1223922_gb_AAA92063.1__cy  MAMKKLLWVVLSLSLVLGVANSFDFHEKDLASEESLWDLYERWRSHHTVS
         gi_31559526_dbj_BAC77521.1__  MAMKKLLWVVLSLSLVLGSANSFDFHDKDLASEESFWDLYERWRSHHTVS
         gi_31559530_dbj_BAC77523.1__  MAMKKFLWVVLSLSLVLGVANSFDFHDKDLESEESLWDLYERWRSHHTVS
         gi_600111_emb_CAA84378.1__cy  MEMKKLLFISLSLALIFTVANTFDFNEHDLESEKSLWNLYERWRSHHTVT

         gi_118158_sp_P12412_CYSP_VIG  RSLGEKHKRFNVFKANVMHVHNTNKMDKPYKLKLNKFADMTNHEFRSTYA
         gi_1223922_gb_AAA92063.1__cy  RSLTEKHKRFNVFKENVMHVHNTNKMDKPYKLKLNKFADMTNHEFRSTYA
         gi_31559526_dbj_BAC77521.1__  RSLGDKHKRFNVFKANVMHVHNTNKMDKPYKLKLNKFADMTNHEFRSTYA
         gi_31559530_dbj_BAC77523.1__  RSLGDKHKRFNVFKANMMHVHNTNKMDKPYKLKLNKFADMTNHEFRSTYA
         gi_600111_emb_CAA84378.1__cy  RNLDEKHNRFNVFKANVMHVHNTNKLDKPYKLKLNKFGDMTNYEFRRIYA

         gi_118158_sp_P12412_CYSP_VIG  GSKVNHHKMFRGSQHGSGTFMYEKVGSVPASVDWRKKGAVTDVKDQGQCG
         gi_1223922_gb_AAA92063.1__cy  GSKVNHHKMFRGTQHGNGTFMYEKVGSVPASVDWRKKGAVTDVKDQGQCG
         gi_31559526_dbj_BAC77521.1__  GSKVNHHRMFQGTPRGNGTFMYEKVGSVPPSVDWRKNGAVTGVKDQGQCG
         gi_31559530_dbj_BAC77523.1__  GSKVNHHRMFRDMPRGNGTFMYEKVGSVPASVDWRKKGAVTDVKDQGHCG
         gi_600111_emb_CAA84378.1__cy  DSKISHHRMFRGMSHENGTFMYENAVDVPSSIDWRNKGAVTGVKDQGQCG

  8. Unrooted Phylogenetic Tree

  9. • Add more sequences (e.g. from non-legumes) and see how the tree changes.
     • Repeat all of the above, this time with the nucleotide (cDNA) sequences of the same proteins.
     • Compare results.

  10. Possible Additions:
      • Add more sequences (e.g. from non-legumes) and see how the tree changes.
      • Repeat all of the above, this time with the nucleotide (cDNA) sequences of the same proteins. Compare results with those from the protein sequences.

  11. Compare the nucleotide sequences of the cDNA and gene pairs where available: exons/introns?

          ACGTGTGACGAATCAAAGGTGCATGTTAGGCCAAACATATTTTCCAATGA
          ACGTGTGACGAATCAAAGGTG-----------------------------
          ACCTGTGATGCATCAAAGGTGCATGTTCGGCCAAACTTTTTTTTTTTT--
          ACCTGTGATGCATCAAAGGTG-----------------------------

          AACCACTATAATTAATAGATAACTTGAGAAACT--AAAGTGCCAAAAATC
          --------------------------------------------------
          -TTTAATGAAACCAATA--TAACTTGAGAAATCTAAAATTGCCAAAAATC
          --------------------------------------------------

          TTTCATGTGGTAGGTGAATGACCTAGCTGTGTCAATTGATGGTCATGAAA
          ----------------AATGACCTAGCTGTGTCAATTGATGGTCATGAAA
          TTGCATGTGGTAGGTGAATGACCTAGCTGTGTCAATTGATGGCCATGAGA
          ----------------AATGACCTAGCTGTGTCAATTGATGGCCATGAGA
                          ************************** ***** *

  12. Examine targeting of cysteine protease, e.g. with TargetP or PSORT.
      PSORT: http://psort.ims.u-tokyo.ac.jp/
      With AAA92063 (Vigna radiata cysteine protease):

          endoplasmic reticulum (lumen)     --- Certainty= 0.910(Affirmative)
          outside                           --- Certainty= 0.719(Affirmative)
          lysosome (lumen)                  --- Certainty= 0.190(Affirmative)
          endoplasmic reticulum (membrane)  --- Certainty= 0.100(Affirmative)
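
For the "interpretation of phylogenetic data" objective, it may help students to see the kind of pairwise comparison that sits behind a tree. The following is a minimal Python sketch, not part of the original presentation. It computes percent identity between the aligned sequences, using only the first 50 alignment columns shown on slide 7. The names `ALIGNED`, `percent_identity`, and `identity_table` are illustrative assumptions, and simple percent identity stands in for the distance measure that ClustalW itself uses.

```python
from itertools import combinations

# First 50 alignment columns from slide 7, keyed by accession.
# Only this excerpt is used, so the values describe the N-terminal region only.
ALIGNED = {
    "P12412":   "MAMKKLLWVVLSLSLVLGVANSFDFHEKDLESEESLWDLYERWRSHHTVS",
    "AAA92063": "MAMKKLLWVVLSLSLVLGVANSFDFHEKDLASEESLWDLYERWRSHHTVS",
    "BAC77521": "MAMKKLLWVVLSLSLVLGSANSFDFHDKDLASEESFWDLYERWRSHHTVS",
    "BAC77523": "MAMKKFLWVVLSLSLVLGVANSFDFHDKDLESEESLWDLYERWRSHHTVS",
    "CAA84378": "MEMKKLLFISLSLALIFTVANTFDFNEHDLESEKSLWNLYERWRSHHTVT",
}


def percent_identity(a: str, b: str) -> float:
    """Percent of compared columns holding the same residue.

    Columns where both sequences have a gap ('-') are skipped;
    a gap opposite a residue counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def identity_table(aligned: dict) -> dict:
    """Percent identity for every unordered pair of sequences."""
    return {
        (p, q): percent_identity(aligned[p], aligned[q])
        for p, q in combinations(aligned, 2)
    }


if __name__ == "__main__":
    # Print pairs from most to least similar.
    table = identity_table(ALIGNED)
    for (p, q), pid in sorted(table.items(), key=lambda kv: -kv[1]):
        print(f"{p:>9} vs {q:<9} {pid:5.1f}%")
```

In this excerpt the two Vigna sequences (P12412 and AAA92063) differ at a single column, while the Vicia sativa sequence (CAA84378) differs from AAA92063 at fifteen. That ordering is what one would expect the unrooted tree to reflect, although a tree built from the full-length alignment may differ in detail.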
