CIS 4930/6930 – Recent Advances in Bioinformatics Spring 2014 Tamer Kahveci CISE Department University of Florida
Vital Information • Instructor: Tamer Kahveci • Office: E566 • Time: Mon/Wed/Fri 9:35 - 10:25 AM • Office hours: Mon/Thu 2:00-2:50 PM • Course page: http://www.cise.ufl.edu/~tamer/teaching/spring2014
Goals • This course discusses cutting-edge developments in bioinformatics and computational biology. We will discuss recent publications in depth, with emphasis on computer science challenges and contributions, particularly on biological networks.
Bioinformatics & Systems Biology • Bioinformatics is the science in which computational and information sciences are used to understand biological data. • Systems biology studies the interactions between the components of a biological system, and how these interactions give rise to the function and behavior of that system.
This Course will • Give you exposure to research topics in bioinformatics. • Strongly encourage you to explore research problems and make contributions.
This Course will not • Teach you biology or the fundamentals of bioinformatics. • Teach you programming. • Teach you how to be an expert user of off-the-shelf molecular biology computer packages.
Course Outline • Introduction to terminology • Biological networks • Comparison of biological networks • Network motifs • Essentiality in networks • Network reconstruction
How can I get an A? • Grading • Paper presentations • Project • HW & Quizzes • Bonus • 2.5% attendance • 2.5% project contribution • 90+ = A- & above • 80+ = B & above • 70+ = C & above
Expectations • Require • Data structures and algorithms. • Coding (C, Java) • Encourage • actively participate in discussions in the classroom • read bioinformatics literature in general • attend colloquiums on campus • Academic honesty
Text Book • Not required, but recommended. • Class notes + papers.
Where to Look? • Journals • Bioinformatics • Genome Research • PLOS Computational Biology • Journal of Computational Biology • IEEE/ACM Transactions on Computational Biology and Bioinformatics • Conferences • RECOMB • ISMB • ECCB • PSB • BCB
Goals • Understand major components of biological data • DNA, protein sequences, expression arrays, protein structures • Get familiar with basic terminology • Learn commonly used data formats
Genetic Material: DNA • Deoxyribonucleic Acid, 1950s • Basis of inheritance • Eye color, hair color, … • 4 nucleotides • A, C, G, T
Chemical Structure of Nucleotides • Pyrimidines (C, T) • Purines (A, G)
Making of Long Chains • 5’ -> 3’
DNA structure • Double stranded, helix (Watson & Crick) • Complementary • A-T • G-C • Antiparallel • 5’ -> 3’ (downstream) • 3’ -> 5’ (upstream) • Animation (ch3.1)
Question • 5’ - GTTACA – 3’ • 5’ – XXXXXX – 3’ ? • 5’ – TGTAAC – 3’ • Reverse complements.
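The question above can be checked mechanically. The following is a minimal sketch, assuming the sequence contains only A, C, G and T; the function name `reverse_complement` is an illustrative choice, not part of the course material.

```python
# Watson-Crick pairing from the DNA structure slide: A-T and G-C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string written 5' -> 3'.

    The opposite strand is antiparallel, so we complement every base
    and reverse the order to read the result 5' -> 3' as well.
    """
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


if __name__ == "__main__":
    print(reverse_complement("GTTACA"))  # TGTAAC
```

Applying the function twice returns the original sequence, which is a quick sanity check for any implementation.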
Repetitive DNA • Tandem repeats: highly repetitive • Satellites (100 k – 1 Gbp) / (a few hundred bp) • Mini satellites (1 k – 20 kbp) / (9 – 80 bp) • Micro satellites (< 150 bp) / (1 – 6 bp) • DNA fingerprinting • Interspersed repeats: moderately repetitive • LINE • SINE • Proteins contain repetitive patterns too
Genetic Material: an Analogy • Nucleotide => letter • Gene => sentence • Contig => chapter • Chromosome => book • Traits: Gender, hair/eye color, … • Disorders: Down syndrome, Turner syndrome, … • Chromosome number varies between species • We have 46 (23 + 23) chromosomes • Complete genome => volumes of encyclopedia • The Hershey & Chase experiment showed that DNA is the genetic material. (ch14)
Functions of Genes 1/2 • Signal transduction: sensing a physical signal and turning it into a chemical signal • Enzymatic catalysis: accelerating chemical transformations that are otherwise too slow • Transport: getting things into and out of separated compartments • Animation (ch 5.2)
Functions of Genes 2/2 • Movement: contracting in order to pull things together or push things apart. • Transcription control: deciding when other genes should be turned ON/OFF • Animation (ch7) • Structural support: creating the shape and pliability of a cell or set of cells
Introns and Exons 2/2 • Humans have about 25,000 genes ≈ 40,000,000 DNA bases, which is < 3% of the total DNA in the genome. • The remaining 2,960,000,000 bases are for control information (e.g., when, where, how long, etc.).
Gene expression (figure) • DNA (Genotype) -> Protein -> Phenotype
Gene Expression • Building proteins from DNA • Promoter sequence: start of a gene • 13 nucleotides. • Positive regulation: proteins that bind to DNA near promoter sequences increase transcription. • Negative regulation
Microarray • Animation on creating microarrays
Amino Acids • 20 different amino acids • ACDEFGHIKLMNPQRSTVWY but not BJOUXZ • ~300 amino acids in an average protein, hundreds of thousands of known protein sequences • How many nucleotides can encode one amino acid? • 4^2 < 20 < 4^3 • E.g., Q (glutamine) = CAG • degeneracy • Triplet code (codon)
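The counting argument behind the triplet code can be sketched in a few lines. This is an illustration only: the helper names are hypothetical, and the glutamine entry is a deliberately partial codon table (CAA and CAG both encode glutamine in the standard genetic code), included just to show degeneracy.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
NUM_AMINO_ACIDS = 20


def min_codon_length(alphabet_size, targets):
    """Smallest k such that alphabet_size**k >= targets."""
    k = 1
    while alphabet_size ** k < targets:
        k += 1
    return k


def all_codons(k):
    """Enumerate every nucleotide word of length k."""
    return ["".join(word) for word in product(NUCLEOTIDES, repeat=k)]


# Partial table (assumption for illustration): the two glutamine codons.
GLUTAMINE_CODONS = {"CAA", "CAG"}

if __name__ == "__main__":
    print(len(all_codons(2)), len(all_codons(3)))     # 16 64
    print(min_codon_length(len(NUCLEOTIDES), NUM_AMINO_ACIDS))  # 3
```

Since 64 codons cover only 20 amino acids plus stop signals, several codons must map to the same amino acid, which is what the slide calls degeneracy.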
Molecular Structure of Amino Acid • Side chain attached to the central carbon (C) • Non-polar, Hydrophobic (G, A, V, L, I, M, F, W, P) • Polar, Hydrophilic (S, T, C, Y, N, Q) • Electrically charged (D, E, K, R, H)
Direction of Protein Sequence • Animation on protein synthesis (ch15)
Data Format • GenBank • EMBL (European Mol. Biol. Lab.) • SwissProt • FASTA • NBRF (Nat. Biomedical Res. Foundation) • Others • IG, GCG, Codata, ASN, GDE, Plain ASCII
Primary Structure of Proteins
>2IC8:A|PDBID|CHAIN|SEQUENCE
ERAGPVTWVMMIACVVVFIAMQILGDQEVMLWLAWPFDPTLKFEFWRYFTHALMHFSLMHILFNLLWWWYLGGAVEKRLGSGKLIVITLISALLSGYVQQKFSGPWFGGLSGVVYALMGYVWLRGERDPQSGIYLQRGLIIFALIWIVAGWFDLFGMSMANGAHIAGLAVGLAMAFVDSLNA
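The entry above is in FASTA format: a header line starting with ">" followed by one or more sequence lines. The sketch below shows one simple way to read such records; `parse_fasta` is a hypothetical helper, and the example text uses the slide's header with only the first 18 residues of the sequence, split over two lines to show how wrapped lines are joined.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over many lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Shortened example: header from the slide, first 18 residues only.
EXAMPLE = """>2IC8:A|PDBID|CHAIN|SEQUENCE
ERAGPVTWVM
MIACVVVF
"""

if __name__ == "__main__":
    for name, seq in parse_fasta(EXAMPLE):
        print(name, len(seq), seq)
```

For real work, an established parser is usually preferable to a hand-written one; this version only illustrates the structure of the format.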
Secondary Structure: Alpha Helix • 1.5 Å translation per residue • 100 degree rotation per residue • Phi = -60 • Psi = -60
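Reading the translation and rotation above as per-residue values, two familiar helix parameters follow by simple arithmetic: the number of residues per turn and the pitch (rise per full turn). The sketch below works this out; the constant and function names are illustrative.

```python
RISE_PER_RESIDUE_ANGSTROM = 1.5   # translation along the helix axis
ROTATION_PER_RESIDUE_DEG = 100.0  # rotation about the helix axis

# A full turn is 360 degrees, so 360 / 100 residues complete one turn.
residues_per_turn = 360.0 / ROTATION_PER_RESIDUE_DEG

# Pitch = axial distance covered in one full turn.
pitch_angstrom = residues_per_turn * RISE_PER_RESIDUE_ANGSTROM


def helix_length_angstrom(num_residues):
    """Approximate axial length of an ideal helix with num_residues residues."""
    return num_residues * RISE_PER_RESIDUE_ANGSTROM


if __name__ == "__main__":
    print(round(residues_per_turn, 2), round(pitch_angstrom, 2))
```

Under these numbers an ideal helix has about 3.6 residues per turn and a pitch of about 5.4 Å.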
Secondary Structure: Beta sheet • Anti-parallel • Parallel • Phi = -135 • Psi = 135
Tertiary Structure • 2N angles (phi1, psi1, phi2, …)
Tertiary Structure • 3-d structure of a polypeptide sequence • Interactions between non-local atoms • Figure: tertiary structure of myoglobin
Ramachandran Plot • Sample PDB entry (http://www.rcsb.org/pdb/)
Quaternary Structure • Arrangement of protein subunits • Figures: quaternary structure of Cro; human hemoglobin tetramer
Structure Summary • 3-d structure determined by protein sequence • Prediction remains a challenge • Diseases caused by misfolded proteins • Mad cow disease • Classification of protein structure
Systems biology • A biological system is made up of components (e.g., proteins, genes, compounds) that interact with and affect one another. Together they carry out the functions of that system. • Internal factors can alter the networks. • E.g., gene expression and regulation. • External factors can alter the networks. • E.g., drugs, radiation, food, temperature, bacteria and viruses. • We develop quantitative mathematical models that can explain how the interactions take place. • E.g., Boolean, stochastic, ordinary differential equations, probabilistic, etc. • We develop algorithmic methods to analyze the networks under these models.
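As an illustration of the simplest model family listed above, here is a toy Boolean network with synchronous updates. The three nodes A, B, C and their update rules are invented for this sketch and do not describe any real pathway: A is a fixed external input, A activates B, C represses B, and B activates C.

```python
def step(state):
    """Compute the next state of the toy network (synchronous update)."""
    return {
        "A": state["A"],                     # external input, held constant
        "B": state["A"] and not state["C"],  # A activates B, C represses B
        "C": state["B"],                     # B activates C
    }


def simulate(state, n_steps):
    """Return the trajectory [s0, s1, ..., s_n] starting from state."""
    trajectory = [state]
    for _ in range(n_steps):
        state = step(state)
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    start = {"A": True, "B": False, "C": False}
    for s in simulate(start, 4):
        print(s)
```

With the input A switched on, the negative feedback from C onto B makes the network oscillate and return to its starting state after four steps; with A off, B and C settle to off. An external factor such as a drug could be modeled by changing an input or an update rule.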
Signal Transduction Networks • Vertices are proteins. • A directed edge from vertex X to vertex Y if X changes the activity level of Y under certain conditions
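A directed network like this can be stored as an adjacency list, and a common first question is which proteins lie downstream of a given one. The sketch below uses breadth-first search; the protein names (R, K1, K2, TF) are placeholders, not real pathway members.

```python
from collections import deque

# Hypothetical signaling network: key -> proteins whose activity it changes.
SIGNALING = {
    "R": ["K1"],
    "K1": ["K2", "TF"],
    "K2": ["TF"],
}


def downstream(graph, source):
    """Return every vertex reachable from source along directed edges."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {source}


if __name__ == "__main__":
    print(sorted(downstream(SIGNALING, "R")))
```

Because edges are directed, the answer is not symmetric: everything is downstream of the receptor R, but nothing is downstream of TF.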
Transcription regulation networks • Two types of vertices: proteins (transcription factors, or TFs) and genes • Edges are directed from TFs to genes. • An edge from TF X to gene Y if X regulates the transcription of Y
Post-transcription regulation • Two types of vertices • RNA binding proteins • RNA • Directed edges from proteins to RNA
Metabolic networks 1/2 • Various representations • Vertices are compounds and directed edges are biochemical reactions • Two types of vertices, one for compounds and one for reactions. Directed edges go from one type to the other.
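The two representations can be built from the same reaction list. In this sketch the reactions R1 and R2 and the compounds A to D are made-up placeholders, and the function names are illustrative.

```python
# Hypothetical reactions: name -> (substrates, products).
REACTIONS = {
    "R1": (["A", "B"], ["C"]),
    "R2": (["C"], ["D"]),
}


def compound_graph(reactions):
    """Vertices are compounds; each substrate -> product edge is labeled
    with the reaction that performs the conversion."""
    edges = set()
    for rxn, (substrates, products) in reactions.items():
        for s in substrates:
            for p in products:
                edges.add((s, p, rxn))
    return edges


def bipartite_graph(reactions):
    """Two vertex types; edges go substrate -> reaction and reaction -> product."""
    edges = set()
    for rxn, (substrates, products) in reactions.items():
        for s in substrates:
            edges.add((s, rxn))
        for p in products:
            edges.add((rxn, p))
    return edges


if __name__ == "__main__":
    print(sorted(compound_graph(REACTIONS)))
    print(sorted(bipartite_graph(REACTIONS)))
```

The compound-only graph is smaller, but it loses the fact that R1 needs both A and B at once; the bipartite form keeps that information because each reaction is a vertex with all its substrates pointing into it.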
Metabolic networks 2/2 • Reactions • Catabolism: breaking down large molecules, for example to harvest energy in cellular respiration • Anabolism: using energy to construct components of cells, such as proteins and nucleic acids
Protein-protein interaction (PPI) network • Vertices are proteins. • An edge between two vertices if the two proteins interact (i.e., form a protein complex). • Undirected edges.
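Since PPI edges are undirected, each interaction has to be recorded in both directions when an adjacency structure is used. A minimal sketch, with placeholder protein names P1 to P4:

```python
from collections import defaultdict

# Hypothetical interaction list; each pair is an unordered interaction.
INTERACTIONS = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]


def build_ppi(interactions):
    """Build an undirected adjacency map: protein -> set of partners."""
    adjacency = defaultdict(set)
    for u, v in interactions:
        adjacency[u].add(v)
        adjacency[v].add(u)  # undirected: store the edge both ways
    return adjacency


def degree(adjacency, protein):
    """Number of interaction partners of a protein."""
    return len(adjacency.get(protein, ()))


if __name__ == "__main__":
    ppi = build_ppi(INTERACTIONS)
    for protein in sorted(ppi):
        print(protein, degree(ppi, protein), sorted(ppi[protein]))
```

The degree of a vertex, the number of partners, is one of the simplest quantities read off such a network.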