Biological Sequence Analysis 140.638.01
The materials used in this class are made possible by: • Zhiping Weng, http://zlab.bu.edu • Wenyi Wang • Zhijin Wu • Garland Publishing, Alberts’s The Cell • And the wealth of internet resources
Who are we? • Sining Chen • Carlo Colantuoni • Giovanni Parmigiani
Who are you? • Field of research • Stats & computing background • Register or audit • Why are you taking this course • Specific topics you are interested in
Administrative Details http://astor.som.jhmi.edu/~sining/BSA/syllabus.htm
The MHS Program in Bioinformatics • Jointly offered by the Departments of Biostatistics and of Molecular Microbiology and Immunology • An intensive one-year program that emphasizes biology, statistical methods, and computing
Goal of the class • Learn to look at biological sequences from a probabilistic point of view • Understand the algorithms behind routine operations, e.g. BLAST • Be able to build statistical models to solve problems involving sequences
Biological Sequence Analysis: Basic Biological Concepts Carlo Colantuoni Clinical Brain Disorders Branch, NIMH, NIH Dept. Biostatistics, JHSPH ccolantu@jhsph.edu colantuc@intra.nimh.nih.gov
Molecular Cell Biology: Central Dogma • DNA → (Transcription) → RNA → (Translation) → Protein; DNA itself is copied by Replication • Sequence analysis is important at all 3 levels
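As a minimal sketch of the DNA-level steps, assuming sequences are plain uppercase strings written 5'→3': replication can be modelled as taking the reverse complement of a strand, and transcription as copying the coding strand with U in place of T. The function names `replicate` and `transcribe` are hypothetical, chosen for illustration.

```python
# Base-pairing rules for DNA: A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def replicate(strand: str) -> str:
    """Return the complementary strand, also written 5'->3' (the reverse complement)."""
    return strand.translate(COMPLEMENT)[::-1]


def transcribe(coding_strand: str) -> str:
    """RNA polymerase reads the template strand, so the mRNA matches the
    coding strand except that uracil (U) replaces thymine (T)."""
    return coding_strand.replace("T", "U")


print(replicate("ATGGCC"))   # GGCCAT
print(transcribe("ATGGCC"))  # AUGGCC
```

Replicating twice returns the original strand, which is a quick sanity check on the pairing table.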
The Human Genome • Genomic content: 3.3 billion bases, ~30K genes, 23 chromosomes (22 + X/Y), millions of variants • 2 copies in every cell (46 chromosomes): one copy from each parent • Each parent passes on a “mixed copy” (figure: DAD, MOM → YOU)
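The “mixed copy” a parent passes on can be pictured with a deliberately simplified model, assuming exactly one crossover between the parent's two copies of a chromosome (real meiosis may produce zero or several crossovers). `mixed_copy` is a hypothetical helper for illustration only.

```python
import random


def mixed_copy(copy_a: str, copy_b: str, rng: random.Random) -> str:
    """Simplified recombination: splice one parental copy to the other at a
    single random crossover point (an assumption, not a model of real meiosis)."""
    if len(copy_a) != len(copy_b):
        raise ValueError("parental copies must have equal length")
    point = rng.randrange(len(copy_a) + 1)
    # Randomly choose which parental copy contributes the left-hand segment.
    first, second = (copy_a, copy_b) if rng.random() < 0.5 else (copy_b, copy_a)
    return first[:point] + second[point:]


rng = random.Random(0)
print(mixed_copy("AAAAAAAA", "CCCCCCCC", rng))
```

Using contrasting letters for the two parental copies makes the crossover point visible in the output.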
Nucleotides are the chemical building blocks of nucleic acids: DNA and RNA
From Genomic DNA to mRNA Transcripts • Protein-coding genes are not easy to find: gene density is low, and exons are interrupted by introns • (figure labels: exons, introns, promoters, alternative splicing, polyadenylation; ~30K genes, >30K transcripts)
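Splicing can be sketched as keeping the exon segments of a pre-mRNA and joining them in order, assuming exon coordinates are already known as 0-based, half-open `(start, end)` pairs. The toy gene below is hypothetical (its introns start with GU and end with AG, the usual splice-site dinucleotides); choosing a different exon subset mimics alternative splicing.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments in genomic order; everything between them (introns) is dropped."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


# Hypothetical gene: exon1 | intron1 | exon2 | intron2 | exon3
gene = "AUGGCC" + "GUAAAG" + "GAAUUU" + "GUCCAG" + "CGAUAA"

full = splice(gene, [(0, 6), (12, 18), (24, 30)])   # all three exons
skipped = splice(gene, [(0, 6), (24, 30)])          # exon 2 skipped
print(full)     # AUGGCCGAAUUUCGAUAA
print(skipped)  # AUGGCCCGAUAA
```

One gene yielding two transcripts is why the transcript count can exceed the gene count.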
Molecular Cell Biology: Components of the Central Dogma • Genomic DNA (3.3 Gb) → Transcription → mRNA: 5’ UTR, protein-coding region (START to STOP), 3’ UTR, poly-A tail (AAAAA) → Translation → Protein
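The mRNA layout above can be parsed with a rough sketch, assuming the coding region starts at the first AUG and ends at the first in-frame stop codon, and that the poly-A tail is the run of trailing A's. These are simplifications: real start-codon choice depends on sequence context, and a 3' UTR that ends in A would lose those bases to the tail. `split_mrna` is a hypothetical name.

```python
from typing import Optional, Tuple

STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_mrna(mrna: str) -> Optional[Tuple[str, str, str, str]]:
    """Return (5' UTR, coding region, 3' UTR, poly-A tail), or None if no
    complete START...STOP region is found. Simplifying assumptions apply."""
    start = mrna.find("AUG")  # assumption: translation starts at the first AUG
    if start == -1:
        return None
    for i in range(start, len(mrna) - 2, 3):  # step through codons in frame
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3
            rest = mrna[end:]
            utr3 = rest.rstrip("A")  # assumption: trailing A's form the poly-A tail
            return mrna[:start], mrna[start:end], utr3, rest[len(utr3):]
    return None


print(split_mrna("GGCACC" "AUGGCCGAAUAA" "CUGCUU" "AAAAAA"))
# ('GGCACC', 'AUGGCCGAAUAA', 'CUGCUU', 'AAAAAA')
```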
Translation - Protein Synthesis: every 3 nucleotides (a codon) are translated into one amino acid • DNA (A, T, G, C) → Transcription (1:1) → RNA (A, U, G, C) → Translation (3:1) → Protein (20 amino acids) • DNA is copied by Replication
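The 3:1 mapping can be made concrete with the standard genetic code. This sketch builds the 64-entry codon table from the conventional UCAG ordering (`*` marks a stop codon) and translates an RNA coding sequence 5'→3', stopping at the first stop codon; the first residue produced is the protein's N-terminus. It covers only the standard code, not alternative codes such as the mitochondrial one.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code: amino acids for codons enumerated in UCAG x UCAG x UCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate codon by codon (3 nt -> 1 amino acid), reading 5'->3'.
    The first amino acid is the N-terminus; translation halts at a stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("AUGGCCGAAUUUCGAUAA"))  # MAEFR
```

The table has 64 codons but only 20 amino acids plus stop, which is why the code is called degenerate.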
Translation - Protein Synthesis • RNA is read 5’ → 3’, and the protein is built from N-terminus → C-terminus
The Human Genome • Genomic content: 3.3 billion bases, ~30K genes, 23 chromosomes (22 + X/Y) • 2 copies in every cell: one copy from each parent • Each parent passes on a “mixed copy” • Sources of variation: deletions, insertions, mutations, accumulating on an evolutionary scale (figure: DAD, MOM → YOU)
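Once two sequences have been aligned, the kinds of variation listed above can be counted column by column. This sketch assumes the alignment is already given as two equal-length strings with `-` marking gaps, treats "mutations" narrowly as point substitutions, and counts each gapped column separately (so a 3-base deletion counts as 3). `classify_variants` is a hypothetical helper.

```python
def classify_variants(ref: str, alt: str) -> dict[str, int]:
    """Count substitutions, insertions and deletions between two pre-aligned
    sequences of equal length, where '-' marks a gap."""
    if len(ref) != len(alt):
        raise ValueError("aligned sequences must have equal length")
    counts = {"substitution": 0, "insertion": 0, "deletion": 0}
    for r, a in zip(ref, alt):
        if r == "-" and a != "-":
            counts["insertion"] += 1      # base present only in alt
        elif a == "-" and r != "-":
            counts["deletion"] += 1       # base present only in ref
        elif r != a:
            counts["substitution"] += 1   # both present but different
    return counts


print(classify_variants("ACGT-ACGT", "ACCTAAC-T"))
# {'substitution': 1, 'insertion': 1, 'deletion': 1}
```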
Biological Sequence Analysis: Primary Concepts • Homolog • Paralog • Ortholog • Identity & Similarity
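Identity and similarity can be illustrated on two already-aligned protein sequences. In this sketch, identity is the fraction of alignment columns with the same residue, and similarity also counts pairs from the same physicochemical group. The grouping `SIMILAR_GROUPS` is a hypothetical simplification (practical tools usually score similarity with a substitution matrix such as BLOSUM62), and dividing by the full alignment length, gaps included, is only one of several conventions.

```python
# Hypothetical residue groups: hydrophobic, aromatic, hydroxyl, basic, acidic, amide, singletons.
SIMILAR_GROUPS = ["AVLIM", "FWY", "ST", "KRH", "DE", "NQ", "G", "P", "C"]
GROUP_OF = {aa: i for i, group in enumerate(SIMILAR_GROUPS) for aa in group}


def identity_and_similarity(seq1: str, seq2: str) -> tuple[float, float]:
    """Return (identity, similarity) as fractions of the alignment length."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("need two non-empty aligned sequences of equal length")
    identical = similar = 0
    for a, b in zip(seq1, seq2):
        if a == "-" or b == "-":
            continue  # gapped columns count toward length but never match
        if a == b:
            identical += 1
            similar += 1
        elif GROUP_OF.get(a) is not None and GROUP_OF.get(a) == GROUP_OF.get(b):
            similar += 1
    n = len(seq1)
    return identical / n, similar / n


print(identity_and_similarity("MKV-LE", "MRVALD"))  # 3/6 identical, 5/6 similar
```

Similarity is always at least as high as identity, since every identical pair is also counted as similar.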