
Bioinformatics

Bioinformatics. Overview School of B&I TCD May 2010. Who, me?. Andrew Lloyd atlloyd@tcd.ie 087-225-9850, 053-9255717, 01-896-2450 Director INCBI 1993-2000 Population genetics, evolution Whole genome analysis Immunology, chickens, FIRM. Definition/scope.


Presentation Transcript


  1. Bioinformatics Overview School of B&I TCD May 2010

  2. Who, me? • Andrew Lloyd • atlloyd@tcd.ie • 087-225-9850, 053-9255717, 01-896-2450 • Director INCBI 1993-2000 • Population genetics, evolution • Whole genome analysis • Immunology, chickens, FIRM

  3. Definition/scope • Storage, retrieval and analysis of biological (sequence) information. • Insert better definition here • Case can be made for microarray analysis • NOT • ecoinformatics (ecology) • Image analysis • Bar-coding hospital sheets

  4. Philosophy “Nothing that is worth knowing can be taught” Oscar Wilde

  5. Getting bioinformation • Type it in: A,T,C,C,G,T,C,A (1991) • Access databases • Literature (Pubmed) • Medical (OMIM) • DNA sequence (EMBL/GenBank) • Protein sequence (UniProt, SwissProt, PIR) • 3-D structure (PDB)

  6. Annotation • In any DB, half is data and half context. • Gene ontology (language) • Parsing sequence (ORF, RBS, Intron, α-helix) • Recognising similar sequences (evolution!) • Complementary info: DB cross-referencing • (DNA -> Protein -> 3D structure -> motifs)

  7. Secondary databases • Protein motifs, domains, families • RNA structures (16S ribosomal RNA…) • Taxonomy/classification • Metabolic pathways (KEGG) • Enzymes (Brenda, TCD, Ireland) • SNPs: mutations and variants • Disease DBs (OMIM) • Immuno, epitope DBs

  8. Complete genomes • Ensembl (complex, basically vertebrate) • Uniform look-and-feel; cross-refs • UCSC GoldenPath browser • Plants • Bacterial genomes • Including mitochondrial, chloroplast • Eubacteria vs Archaea vs Eukaryotes

  9. Annotated/known genes • What does my gene do? • Blast (fasta) against the DB • SRS/Entrez to access databases • Neighboring (similar things in same DB) • DB cross-references • full picture of attributes • What biochemical pathway?

  10. The territory (diagram of linked resources) • Full-text journals • PubMed • OMIM • GenBank/EMBL (DNA sequence) • UniProt (protein sequence) • Maps & Genomes • Prosite, Pfam, PSSM • PDB (3-D structure) • Taxonomy

  11. Databases • BIG • EMBL/GenBank 200Gbp, 100m entries, 2500 complete genomes, 200K species • Encycl. Britannica 180m letters. 40m words • EMBL 1km of Britannica Volumes • Doubling every 14-18 mo • Human genome is X bp?
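The scale comparison on slide 11 can be checked with a little arithmetic. The sketch below uses only the figures quoted on the slide (200 Gbp, 180m letters, doubling every 14-18 months); the function name `projected_size` and the five-year horizon are illustrative assumptions, and steady exponential growth is itself an assumption rather than a guarantee.

```python
EMBL_BASES = 200e9          # 200 Gbp, figure from the slide
BRITANNICA_LETTERS = 180e6  # 180m letters, figure from the slide

def projected_size(current, months_ahead, doubling_months):
    """Size after months_ahead, assuming steady exponential doubling."""
    return current * 2 ** (months_ahead / doubling_months)

# How many Britannicas' worth of letters is the database?
britannica_equivalents = EMBL_BASES / BRITANNICA_LETTERS
print(f"EMBL holds about {britannica_equivalents:.0f} Britannicas of letters")

# Project five years ahead (hypothetical horizon) for both doubling times.
for doubling in (14, 18):
    size = projected_size(EMBL_BASES, 60, doubling)
    print(f"doubling every {doubling} months: {size / 1e9:.0f} Gbp in 5 years")
```

The two doubling times bracket the projection, which shows how sensitive any forecast is to that one parameter.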

  12. Intrinsic vs Context: Internal • DNA, protein sequence • DNA: Purine/Pyrimidine • AAs: small, hydrophobic, aromatic, polar • Variants: SNPs, Indels, Alt Splicing • Secondary structure • DNA: stem/loops • Protein: helix, sheet, turn, loop
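Intrinsic properties like the purine/pyrimidine split can be read straight off the sequence. A minimal sketch, using the sequence typed in on slide 5 as input; the function names are illustrative and U is included so that RNA works too.

```python
from collections import Counter

PURINES = frozenset("AG")       # two-ring bases
PYRIMIDINES = frozenset("CTU")  # one-ring bases (U replaces T in RNA)

def base_class(base):
    """Classify a single nucleotide letter as purine or pyrimidine."""
    b = base.upper()
    if b in PURINES:
        return "purine"
    if b in PYRIMIDINES:
        return "pyrimidine"
    raise ValueError(f"not a standard base: {base!r}")

def class_composition(seq):
    """Count purines and pyrimidines in a sequence."""
    return Counter(base_class(b) for b in seq)

print(class_composition("ATCCGTCA"))
```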

  13. Intrinsic vs Context: External, context for your molecule • In other species (homologs, phylog trees) • In which cell • In which cellular location (GO) • Molecular complex (dimers) • Which pathway (KEGG) • Where in genome (neighbors, synteny)

  14. New Unknown Gene • Blast homology searching • Genomic location/neighboring genes • Where is it expressed? • How regulated (control sequences) • Intron/exon structure • Domain structure • Restriction sites etc. • Primer design

  15. DNA/gene structure • Four bases A T C G (U replaces T in RNA) • 2 pyrimidine, 2 purine • LOTS of them: how many? • Open reading frame • 5’ signals, 3’ signals • Introns/exons • Neighbours (operons)

  16. Two sequences • Alignment • Local • Global • Dotplot • Threading
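The difference between global and local alignment is easiest to see in the scoring recurrence. The sketch below is a plain dynamic-programming scorer in the style of Needleman-Wunsch (global) and Smith-Waterman (local); the scoring values (match +1, mismatch -1, gap -1) are arbitrary assumptions, it uses a linear gap penalty, and it returns only the score, not the alignment itself.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-1, local=False):
    """Best alignment score of a vs b; global by default, local if asked."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps are penalised along the first row and column.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # Local: a bad prefix can be dropped.
            score[i][j] = cell
            best = max(best, cell)
    # Global score sits in the bottom-right cell; local is the best cell anywhere.
    return best if local else score[-1][-1]

print(align_score("ACGT", "ACT"))                        # global
print(align_score("TTTACGTTTT", "GGACGGG", local=True))  # local
```

Global alignment forces both sequences to be aligned end to end, while local alignment reports only the best-matching region, which is why database searches build on the local form.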

  17. One seq vs many • Homology search vs database • Special case of 2-seq alignment • Blast vs fasta • Limit by species/taxon • Substitution matrices • Low complexity masking

  18. Multiple sequence alignment • MSA • Progressive alignment • ClustalW or (better) T-Coffee

  19. Phylogenetic trees • Computationally intensive • Distance matrix methods • Neighbor-joining (NJ) • UPGMA • Minimum evolution • Maximum parsimony • Maximum likelihood • Bayesian methods
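Of the distance matrix methods listed, UPGMA is the simplest to write down. A minimal sketch, assuming pre-aligned sequences of equal length and the uncorrected p-distance; the three toy sequences are invented for illustration, and the tree is returned as nested tuples without branch lengths.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of aligned sites that differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def upgma(seqs):
    """Cluster a dict of name -> aligned sequence into a nested-tuple tree."""
    size = {name: 1 for name in seqs}  # cluster label -> number of leaves
    dist = {a: {b: p_distance(seqs[a], seqs[b]) for b in seqs if b != a}
            for a in seqs}
    while len(size) > 1:
        # Join the closest pair of current clusters.
        a, b = min(combinations(size, 2), key=lambda p: dist[p[0]][p[1]])
        na, nb = size.pop(a), size.pop(b)
        merged = (a, b)
        dist[merged] = {}
        for c in size:
            # New distance is the leaf-count-weighted average of the old ones.
            d = (na * dist[a][c] + nb * dist[b][c]) / (na + nb)
            dist[merged][c] = d
            dist[c][merged] = d
        size[merged] = na + nb
    return next(iter(size))

toy = {"human": "ACGTACGT", "chimp": "ACGTACGA", "mouse": "ACTTTCGA"}  # invented data
print(upgma(toy))
```

UPGMA assumes a constant rate of change along all lineages; neighbor-joining relaxes that assumption, which is one reason it is usually preferred.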

  20. Genefinding • Special case of DNA analysis • How to annotate a genome • Bacterial • Find open reading frames (ORFs) • With start/stop codons • With promoter, RBS, CAAT, TATA • Eukaryotic • As above PLUS • Introns/exons • Alternative splicing
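The bacterial case (find ORFs with start/stop codons) can be sketched in a few lines. This is a deliberately naive scanner, not a real genefinder: it looks only at the three forward-strand frames, uses ATG as the only start codon, and ignores promoters and ribosome binding sites. The `min_codons` threshold is a hypothetical parameter.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return (start, end) slices of forward-strand ORFs, ATG through stop.

    min_codons counts the codons before the stop, including the ATG.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # close it and keep scanning
    return sorted(orfs)

dna = "CCATGAAATTTTAGGG"
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])
```

A fuller version would also scan the reverse complement, giving six frames in total; eukaryotic genes need more than this because introns interrupt the reading frame.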

  21. Typical mammalian gene structure (diagram) • DNA, 5’ to 3’: Control Region, Start (ATG), Exon 1, Exon 2, Exon 3, Exon 4, Stop • Introns run from gt.. to …ag • miRNAs? • RNA: introns “spliced out” and discarded • Stop: TAG, TGA, TAA • ATGCCCAGGAGATTTGGA . . . • MetProArgArgPheGly . . . PROTEIN
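The DNA-to-protein line on this slide can be reproduced with the standard genetic code. A minimal sketch: the codon table is built from the conventional TCAG ordering, and the function and variable names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

THREE_LETTER = {"A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
                "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
                "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
                "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val"}

def translate(dna):
    """Translate coding DNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3].upper()]
        if residue == "*":  # TAA, TAG or TGA
            break
        protein.append(residue)
    return "".join(protein)

peptide = translate("ATGCCCAGGAGATTTGGA")
print(peptide)
print("".join(THREE_LETTER[aa] for aa in peptide))
```

Note that this works on the spliced coding sequence; applied to genomic DNA it would read straight through the introns.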

  22. Protein substructure • DNA makes protein and protein (enzymes) make everything else. • 20 Amino acids • Amino acid properties • Motifs • Domains • Biological units

  23. Amino acid properties again … and again and again

  24. Protein 3-D structure • Relationship between sequence & structure • Secondary structure • Alpha helix • Beta sheet • Coil • Turn • Threading sequence to homologous structure

  25. Gene Expression • EST • SAGE • MicroArray • Clustering of same expressed genes

  26. Genomics • Complete DNA seq for a species • Gene order • Gene clusters/operons • Missing operons • Gene duplication • Whole genome duplication (WGD)

  27. SNPs • Key issue in genetics is that two organisms are both the same and different: • Humans vs chimps vs mouse • Parent vs offspring vs co-national vs human • Single nucleotide polymorphisms • Variation between individuals • Pharmacogenetics • Personal tailored medicine
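Once two sequences are aligned, finding single nucleotide differences is a position-by-position comparison. A minimal sketch, assuming equal-length, gap-free aligned sequences; the two example sequences are invented and the function name is illustrative.

```python
def find_snps(reference, sample):
    """List (position, reference_base, sample_base) for each mismatch.

    Positions are 1-based, as is usual when reporting variants.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [(pos, r, s)
            for pos, (r, s) in enumerate(zip(reference, sample), start=1)
            if r != s]

# Invented example: one difference between two individuals.
print(find_snps("ACGTACGT", "ACGAACGT"))
```

Strictly, a difference between two individuals is only called a polymorphism once it is seen at appreciable frequency in a population, so a real pipeline compares many samples rather than two.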

  28. Summary/take home • Course designed to give you access to databases, software tools • …and ways of thinking about data
