How to use the web for bioinformatics
Ethan Strauss
ethan.strauss@promega.com
274-4330 X 1171
http://www.q7.com/~ethan
Objectives • At the end of this session you should be able to do all of the following using freely available tools on the World Wide Web: • Use GenBank or a similar database to find nucleic acid sequences of interest • Understand the parts of a GenBank entry • Use a BLAST server (e.g. ) to find related sequences • Perform an alignment of several nucleic acid sequences • Obtain the protein sequence that corresponds to a specific nucleic acid sequence
How to find all those dang URLs! • http://q7.com/~ethan/molbio/
Outline • Sequence Databases • What does a Genbank Entry look like? • Translation and other Utilities • BLAST • Multiple Sequence Alignment • PCR Primer Design
Sequence Databases • NCBI databases – nucleic acids, proteins, literature, genomes, taxonomy, SNPs and more! • EMBL – nucleic acid, protein, structure, microarray data and more. • DDBJ – nucleic acid, protein. • Swiss-Prot – very well annotated protein database. • Many other general and specialized databases exist.
Sequence Databases: NCBI/GenBank • National Center for Biotechnology Information (NCBI) • Sponsored and run by the US government. • Contains many different databases and huge amounts of information. • Most or all data is freely downloadable. • This one site is probably sufficient for all your nucleic acid and protein database needs!
Sequence Databases: Entrez • Lets you search, and retrieve records from, the NCBI databases.
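Entrez can also be queried from a script through NCBI's E-utilities. The minimal sketch below builds ESearch and EFetch request URLs. The `esearch.fcgi`/`efetch.fcgi` endpoints and the `db`, `term`, `id`, `rettype` and `retmode` parameters belong to E-utilities. The helper names `esearch_url`, `efetch_url` and `search_ids` are hypothetical. Only `search_ids` touches the network. NCBI asks scripts without an API key to stay under about 3 requests per second.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of NCBI's E-utilities, the scriptable interface to Entrez.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def esearch_url(term: str, db: str = "nucleotide", retmax: int = 20) -> str:
    """URL for an ESearch query that returns matching record IDs as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)

def efetch_url(record_id: str, db: str = "nucleotide", rettype: str = "gb") -> str:
    """URL for an EFetch request returning one record as text (GenBank flat file when rettype='gb')."""
    params = {"db": db, "id": record_id, "rettype": rettype, "retmode": "text"}
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)

def search_ids(term: str, db: str = "nucleotide", retmax: int = 20) -> list:
    """Run the search over the network and return the list of matching IDs."""
    with urlopen(esearch_url(term, db, retmax)) as response:
        return json.load(response)["esearchresult"]["idlist"]

if __name__ == "__main__":
    # Entrez field tags such as [Gene] and [Organism] work here just as in the web form.
    print(esearch_url("BRCA1[Gene] AND human[Organism]"))
    print(efetch_url("58533118"))
```

Calling `efetch_url("58533118")` targets the record ID used in the viewer URL elsewhere in this deck. Opening that URL should return the same GenBank entry as plain text.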
Sequence Databases: Sequence Records • LOCUS – locus name, sequence length, molecule type, topology, GenBank division, date • DEFINITION – name (description) of the sequence • ACCESSION – unique ID number • VERSION – accession.version and other associated numbers • KEYWORDS • SOURCE – the organism (or other source) the sequence was isolated from • ORGANISM – scientific name and taxonomic lineage • REFERENCE – paper or papers about the sequence • AUTHORS • TITLE • JOURNAL • FEATURES – a complete list of the annotated features of the sequence; can be very extensive and useful! • ORIGIN – the actual sequence! • http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=58533118
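To see how these fields map onto the raw record, here is a minimal parser sketch. The helper `parse_genbank_fields` is hypothetical and not part of any NCBI tool. It assumes the usual flat-file layout:
- top-level keywords start in column 1;
- continuation lines are indented;
- `//` ends the record.

Sub-keywords such as ORGANISM or AUTHORS are indented, so this sketch folds them into their parent field. The toy record is made up for illustration. For real work, a maintained parser such as Biopython's `SeqIO` is the safer choice.

```python
import re

# Hypothetical toy record (not a real GenBank entry), trimmed to a few fields.
TOY_RECORD = """\
LOCUS       TOY0001                   24 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical toy record for illustration.
ACCESSION   TOY0001
ORIGIN
        1 atggccattg taatgggccg ctga
//
"""

def parse_genbank_fields(record: str) -> dict:
    """Split a GenBank flat-file record into {KEYWORD: text}.

    Indented lines are appended to the most recent top-level keyword.
    ORIGIN is returned as the bare sequence (numbering and spaces removed).
    """
    fields = {}
    key = None
    for line in record.splitlines():
        if line.startswith("//"):
            break  # end-of-record marker
        if line[:1].isalpha():
            # A new top-level keyword starts in column 1.
            key, _, value = line.partition(" ")
            fields.setdefault(key, []).append(value.strip())
        elif key is not None:
            # Indented continuation line belongs to the current keyword.
            fields[key].append(line.strip())
    parsed = {k: " ".join(v for v in vals if v) for k, vals in fields.items()}
    if "ORIGIN" in fields:
        # Keep letters only, which drops the position numbers and spacing.
        parsed["ORIGIN"] = re.sub(r"[^A-Za-z]", "", "".join(fields["ORIGIN"])).upper()
    return parsed

if __name__ == "__main__":
    print(parse_genbank_fields(TOY_RECORD)["ORIGIN"])
```

The ORIGIN value is exactly the "sequence (only)" you are asked to save in the next exercise.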
Hands on • Find a gene of interest using the Entrez interface. • We will be working with this sequence throughout class, so you may want to open a word processing program and paste just the sequence there for future reference.
General Utilities • http://searchlauncher.bcm.tmc.edu/seq-util/seq-util.html • Translation • Restriction digestion • Reformatting (alternatively, FASTA Formatter) • Complement/reverse complement • Etc. • http://www.promega.com/biomath/calc11.htm • Melting temperature of an oligo.
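Two of the simplest of these utilities are easy to reproduce locally. This is a sketch that assumes plain A/C/G/T input, with N as the only ambiguity code. The function names are hypothetical. Restriction sites are matched as exact strings. Palindromic sites such as EcoRI (GAATTC) read the same on both strands. Non-palindromic sites also need a search for their reverse complement.

```python
# Complement table for DNA; N (any base) pairs with N. Other IUPAC codes are not handled.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse, so the result reads 5'->3' on the other strand."""
    return seq.translate(COMPLEMENT)[::-1]

def find_sites(seq: str, site: str) -> list[int]:
    """1-based start positions of every exact match of `site` in `seq` (top strand, overlaps allowed)."""
    seq, site = seq.upper(), site.upper()
    k = len(site)
    return [i + 1 for i in range(len(seq) - k + 1) if seq[i:i + k] == site]

def find_sites_both_strands(seq: str, site: str) -> list[int]:
    """Also report matches of the site's reverse complement (only matters for non-palindromic sites)."""
    hits = set(find_sites(seq, site))
    rc_site = reverse_complement(site.upper())
    if rc_site != site.upper():
        hits.update(find_sites(seq, rc_site))
    return sorted(hits)

if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(find_sites("ccGAATTCaaGAATTC", "GAATTC"))  # EcoRI sites
```

Tools such as the Search Launcher utilities also handle ambiguity codes, cut positions and enzyme lists. This sketch only locates recognition sequences.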
Hands on • Translate your sequence in all 6 reading frames.
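For reference, six-frame translation just applies the genetic code to each of the three offsets on the given strand and on its reverse complement. This is a sketch assuming the standard genetic code (NCBI translation table 1). Codons containing anything other than A/C/G/T become X, and stops are shown as `*`. Tools differ in how they number the reverse frames. Here "-1" starts at the last base of the top strand. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (non-ACGT characters are kept as-is)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq: str, frame: int = 0) -> str:
    """Translate complete codons starting at 0-based offset `frame`; unknown codons give 'X'."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def six_frames(seq: str) -> dict:
    """Return {'+1','+2','+3','-1','-2','-3': protein string}."""
    seq = seq.upper().replace("U", "T")
    rc = reverse_complement(seq)
    frames = {}
    for f in range(3):
        frames[f"+{f + 1}"] = translate(seq, f)
        frames[f"-{f + 1}"] = translate(rc, f)
    return frames

if __name__ == "__main__":
    for name, protein in six_frames("ATGGCCTAA").items():
        print(name, protein)
```

Long stretches without `*` in one frame are candidate open reading frames (ORFs). These are worth noting for the homework report.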
BLAST • Basic Local Alignment Search Tool • Compares a query sequence against all sequences in a database. • Very powerful for finding biologically significant relationships, and for finding full gene sequences in the database when you only have a fragment, etc. • Different types: • Nucleic acid – nucleic acid (blastn) • Protein – protein (blastp) • Translated nucleic acid – protein (blastx) • Protein – translated nucleic acid (tblastn) • Translated – translated (tblastx)
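BLAST is fast because it starts from short exact "word" matches (seeds) instead of aligning the query against every full sequence. The toy sketch below shows only that seeding step. It is not BLAST itself: real BLAST extends seeds into scored alignments (HSPs) and reports E-values. Protein BLAST also seeds on similar-scoring "neighborhood" words rather than exact matches only. The default k = 11 mirrors the classic blastn word size. The helper names are hypothetical.

```python
from collections import defaultdict

def word_index(seq: str, k: int) -> dict:
    """Map every length-k word in `seq` to the 0-based positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index

def seed_hits(query: str, subject: str, k: int = 11) -> list:
    """Return (query_pos, subject_pos) pairs where both sequences share an exact k-letter word."""
    index = word_index(subject.upper(), k)
    query = query.upper()
    hits = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            hits.append((q, s))
    return hits

if __name__ == "__main__":
    print(seed_hits("ACGTACGT", "TTACGTAA", k=4))
```

Hits lying on the same diagonal (same `subject_pos - query_pos`) are the ones an aligner would try to extend into a longer match.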
Hands on • Use ~120 bases (2 lines) from your sequence to find at least two other sequences related to it. • Note that if we all hit NCBI BLAST at once, it will be slow. We may not have time to wait. • Get all 3 sequences (your original and two others) into FASTA format using READSEQ.
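If READSEQ is unavailable, FASTA itself is simple: a `>` header line followed by the sequence, conventionally wrapped at 60 to 80 letters per line. This is a minimal sketch, not a replacement for READSEQ's many input formats. The function names are hypothetical. Non-letters are stripped, so sequence pasted from a GenBank ORIGIN section (with its numbers and spaces) comes out clean.

```python
import textwrap

def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """One FASTA record: '>' + name, then the sequence wrapped at `width` letters per line.
    Digits, spaces and other non-letters (e.g. GenBank ORIGIN numbering) are dropped."""
    clean = "".join(ch for ch in seq if ch.isalpha()).upper()
    return f">{name}\n" + "\n".join(textwrap.wrap(clean, width)) + "\n"

def parse_fasta(text: str) -> dict:
    """Read FASTA text back into {header: sequence}; the header is everything after '>'."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None and line:
            records[name].append(line)
    return {k: "".join(v) for k, v in records.items()}

if __name__ == "__main__":
    print(to_fasta("seq1", "1 atgccattga ccgatgatcc"), end="")
```

Concatenating several `to_fasta(...)` records gives a multi-sequence file. That is the usual input for alignment programs. A ~120-base BLAST query is simply a slice such as `seq[:120]` of the cleaned sequence.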
Multiple Sequence Alignment • Many programs can align several sequences with each other to find the best overall fit. • This is generally more biologically meaningful for protein sequences, since proteins tend to be more conserved than the DNA that encodes them (synonymous codon changes do not alter the protein). • Clustal is the most common.
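Clustal-style programs build a multiple alignment progressively. They first compare sequences pairwise, then merge them guided by a tree. The core building block is pairwise global alignment by dynamic programming (Needleman-Wunsch). The sketch below uses deliberately simple, assumed scores: match +1, mismatch -1, and a linear gap penalty of -2. Real tools use substitution matrices (e.g. BLOSUM for proteins) and affine gap penalties.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment with a linear gap penalty.
    Returns (score, aligned_a, aligned_b), using '-' for gaps."""
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

if __name__ == "__main__":
    s, x, y = global_align("GATTACA", "GATACA")
    print(s)
    print(x)
    print(y)
```

The table costs O(n·m) time and memory per pair. That is one reason multiple aligners use heuristics rather than aligning all sequences at once exactly.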
Multiple Sequence Alignment
MEAGAYLNAIIFVLVATIIAVISRGLTRTEPCTIRITGESITVHACHIDSX  ETIKALA
MEAGAYLNAIIFVLVATIIAVISRGLTRTEPCTIRITGESITVHACHIDS...ETIKALA
MEA..YLNAII.VLV.TIIAVIS..L.RTEPC.IkITGESITV.ACklDa.....I..L.
MEAgaYLNAIIfVLVaTIIAVISrgLtRTEPCtIrITGESITVhAChiDsx  etIkaLa

LK PLSLERLFQ
LK.PLSLERLFQ
......L.....
lk plsLerlfq
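The alignment output mixes uppercase letters, lowercase letters and dots in consensus-style rows. The exact rule that produced them isn't stated here. The sketch below therefore uses one simple, assumed convention:
- uppercase where every row has the same residue;
- lowercase where the most common residue appears in more than `threshold` of the rows;
- `.` otherwise.

Gap characters (`-`, `.`, space) never count as residues. The function name is hypothetical.

```python
from collections import Counter

GAP_CHARS = "-. "

def consensus(rows: list, threshold: float = 0.5) -> str:
    """Consensus line for equal-length aligned rows, under the assumed convention described above."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("aligned rows must all have the same length")
    out = []
    for column in zip(*rows):
        residues = [c.upper() for c in column if c not in GAP_CHARS]
        if not residues:
            out.append(".")
            continue
        top, count = Counter(residues).most_common(1)[0]
        if count == len(column):
            out.append(top.upper())          # fully conserved column
        elif count / len(column) > threshold:
            out.append(top.lower())          # majority residue
        else:
            out.append(".")                  # no clear consensus
    return "".join(out)

if __name__ == "__main__":
    print(consensus(["MEAGA", "MEA-A", "MKAGA"]))
```

Runs of uppercase columns mark the conserved regions. Those are the natural places to look for primers that should amplify all of the related genes.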
Hands on • Use your FASTA-formatted sequences to perform a multiple sequence alignment. • Transfer the alignment to a word processing program and see if you can make it look decent: • Change the font to Courier or Courier New • Reduce the font size • Change to landscape view
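The font changes matter because alignments only line up in a monospaced font. Long alignments also read better when split into fixed-width blocks, one line per sequence, as Clustal-style output does. This is a minimal layout sketch. The function name and the 60-column default are assumptions, not any program's output format.

```python
def format_blocks(names: list, rows: list, width: int = 60) -> str:
    """Lay out aligned rows in blocks of `width` columns, names padded to a common column."""
    pad = max(len(n) for n in names) + 2
    blocks = []
    for start in range(0, max(len(r) for r in rows), width):
        lines = [name.ljust(pad) + row[start:start + width] for name, row in zip(names, rows)]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)

if __name__ == "__main__":
    print(format_blocks(["seq1", "seq2"], ["MEAGAYLNAIIFVLVA", "MEA--YLNAIIFVLVA"], width=8))
```

Pasting the result into the word processor and switching to Courier keeps every column aligned.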
PCR Primer Design • There are many PCR primer design programs, both online and offline. • I recommend Primer3. It is complex, but powerful. • You can leave most parameters at their default values.
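Before or after running Primer3, you can sanity-check a candidate primer against common rules of thumb. The limits used here are widely quoted guidelines, assumed for illustration and not Primer3's defaults:
- 18 to 25 nt long;
- 40 to 60% GC;
- a G or C at the 3' end;
- no more than 3 G/C in the last 5 bases.

The Tm uses two rough textbook formulas: the Wallace rule for short oligos and a GC-based formula for longer ones. Both ignore salt and primer concentration. They are not necessarily what the Promega calculator or Primer3 compute (Primer3 uses a nearest-neighbor model). All function names are hypothetical.

```python
def gc_content(primer: str) -> float:
    """Fraction of G + C in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)

def tm_basic(primer: str) -> float:
    """Rough Tm in deg C: Wallace rule 2(A+T) + 4(G+C) below 14 nt,
    otherwise 64.9 + 41 * (G+C - 16.4) / N."""
    p = primer.upper()
    gc = p.count("G") + p.count("C")
    at = p.count("A") + p.count("T")
    if len(p) < 14:
        return 2 * at + 4 * gc
    return 64.9 + 41 * (gc - 16.4) / len(p)

def check_primer(primer: str, min_len: int = 18, max_len: int = 25,
                 gc_min: float = 0.40, gc_max: float = 0.60) -> list:
    """Return a list of rule-of-thumb warnings (an empty list means no obvious problems)."""
    p = primer.upper()
    warnings = []
    if not min_len <= len(p) <= max_len:
        warnings.append(f"length {len(p)} outside {min_len}-{max_len} nt")
    gc = gc_content(p)
    if not gc_min <= gc <= gc_max:
        warnings.append(f"GC content {gc:.0%} outside {gc_min:.0%}-{gc_max:.0%}")
    if p[-1] not in "GC":
        warnings.append("no G or C at the 3' end (GC clamp)")
    if sum(base in "GC" for base in p[-5:]) > 3:
        warnings.append("more than 3 G/C in the last 5 bases")
    return warnings

if __name__ == "__main__":
    candidate = "ATGCATGCATGCATGCATGC"
    print(round(tm_basic(candidate), 1), check_primer(candidate))
```

A reverse primer is written 5'->3' on the opposite strand. Check the reverse complement of the template region you picked, not the region itself.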
Hands on • Design primers for the sequence you have been working with.
Homework • Report: Please turn in a report that includes the following: • Information about your initial sequence, including: • GenBank accession number • Species • Description • Location of the ORF and any other important features • Information about the 4 other sequences, including the same items: • GenBank accession number • Species • Description • Location of the ORF and any other important features • E value from your BLAST results • The sequences of the PCR primers you chose, or a short explanation of why you could not find primers to amplify all of these genes • The multiple sequence alignment with the locations of the primers clearly marked