BF528 - Applications in Translational Bioinformatics 1/23/2019
Instructor Introductions • Instructor: Adam Labadorf • TAs: • Emma Briars • Dakota Hawkins • Zhe Wang
Course Overview • Survey course in bioinformatics • Focus on high-throughput sequencing data, tools, and techniques • Focus on practical skills • Group work simulates real-world collaborative environment
Course Goals • Survey current bioinformatics techniques in translational studies • Give you hands-on experience working with high-throughput biological data and tools • Read and understand papers that use bioinformatics in translational studies • Develop shared vocabulary between biology and computation
Prerequisites • Molecular and cell biology • BF527, BE505/605, or equivalent • Good-to-haves: • Basic statistics knowledge • Programming/Linux cluster experience • But don’t panic...
Course Organization • http://bf528.readthedocs.io • Wed/Fri 2:30-4:15 STH 318 • Some online content early in semester • Online content limited to ~1 hr/class • Class period split into two segments: • Lecture or discussion of online material • Project group meeting and discussion
Course Organization cont’d • Students assigned into groups of 4 • Each group has a primary TA • 4 projects over the course of the semester • The last project is an individual project • No homeworks • No exams
Projects • Assigned into groups based on experience • Groups are for the entire semester • You will reproduce findings from published manuscripts • Each project has a full writeup
Project Groups • Group members will play one of four roles: • Data Curator - find, download, and organize data • Programmer - process data into analyzable form • Analyst - transform processed data into interpretable form • Biologist - understand paper and biological context, help interpret results • Roles rotate for each project • Structured class time facilitates group work and lets you help each other!
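One way to picture the role rotation is as a cyclic shift. This is only a sketch: the member names are placeholders, and the shift-by-one rule and the count of three group projects (four projects, the last one individual) are assumptions, not the course's stated assignment scheme.

```python
# The four roles named on the slide.
ROLES = ["Data Curator", "Programmer", "Analyst", "Biologist"]


def rotate_roles(members, project_index):
    """Return {member: role}, shifting every member one role per project.

    The cyclic shift is an assumed rule used for illustration only.
    """
    n = len(ROLES)
    return {
        member: ROLES[(i + project_index) % n]
        for i, member in enumerate(members)
    }


# Hypothetical group of four.
members = ["member_a", "member_b", "member_c", "member_d"]

# Assumed: three group projects, since the last project is individual.
for project in range(3):
    print(f"Project {project + 1}: {rotate_roles(members, project)}")
```

With this rule, every role is filled in every project and no member repeats a role across the group projects.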
Project Group Meeting : Wednesdays • Time allotted for groups to meet and discuss progress • “Stand-up” meeting structure: • “What did I work on since our last meeting?” • “What challenges did I encounter?” • “Are there any obstacles to completing my work?” • “What will I be working on for next meeting?” • Each group will make a brief status report at the end of class
Project Group Meeting : Fridays • Time allotted for roles to meet and discuss progress • Similar structure to Wednesdays • Share challenges and solutions among roles • Each role group will make a brief status report at the end of class
Project Report • Organized like a published study • Sections (primary role): • Intro - background and motivation (Biologist) • Data - data description (Data Curator) • Methods - processing and tools (Programmer) • Results - findings (Analyst) • Discussion - interpret findings (Biologist) • Conclusion (all)
Assessment • Each project is 25% of your total grade • Broken down: • Intro, Conclusion - 2.5% • Data, Methods, Results, Discussion - 20% • Stand-up participation - 15%
Biology as Data Science • 1953 - DNA structure published in Nature • 1972 - first genetic sequence determined, Protein Data Bank • 1977 - Sanger sequencing, first genome sequenced • 1983 - PCR technique invented • 1990 - Human Genome Project begun • 1995 - first bacterial genome sequenced, microarray technology first described • 1997 - yeast genome on a microarray, sequencing by synthesis concept established • 1998 - first multicellular eukaryote sequenced • 2001 - first draft of human genome • 2006 - Solexa Genome Analyzer released
“Big” Data • Single Microarray dataset: ~500Mb • Single short read dataset: ~2Gb-300Gb • Human genome reference sequence: ~2Gb • One run of Illumina instruments: • HiSeq 2500: ~1Tb • NovaSeq 6000: ~6Tb • Gene Expression Omnibus (GEO): • 2014: 1,237,138 samples, ~28 Tb • 2018: 2,335,694 samples, ?? Tb
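To get a feel for these numbers, the 2014 GEO figures can be turned into an average size per sample. The sketch below assumes that "Tb" on the slide means terabytes (10^12 bytes) and that the total is spread evenly over samples; both are simplifications for illustration.

```python
# 2014 GEO figures from the slide.
geo_2014_samples = 1_237_138
geo_2014_bytes = 28 * 10**12  # assumption: "Tb" read as terabytes

# Average storage per sample, in megabytes (10**6 bytes).
avg_mb_per_sample = geo_2014_bytes / geo_2014_samples / 10**6

print(f"GEO 2014: ~{avg_mb_per_sample:.1f} MB per sample on average")
```

Under these assumptions the average comes out to roughly 22.6 MB per sample, far smaller than a single short read dataset, which suggests that much of GEO at that time consisted of smaller records such as microarray data.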
What is Bioinformatics? “Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biological data. As an interdisciplinary field of science, bioinformatics combines computer science, statistics, mathematics, and engineering to study and process biological data.” Wikipedia
Conceptual History of Bioinformatics • Biological sequences digitized • Biological databases needed to store sequences • Search tools needed for databases • Tools for analyzing data from searches • Computational tools required to analyze human genome • Sophisticated sequence analysis tools enable analysis of large amounts of sequencing data • Sequencing data volume explodes, requiring new tools • And here we are
The Biologist’s Tools Wet lab biologists: Bioinformaticians:
Sequence: The Fundamental Datatype Sequence • Computer Science • genome assembly, homology, phylogeny • Physics • DNA/RNA/protein structure, drug prediction • Statistics • gene expression, population genetics, biomarkers • Mathematics • metabolic modeling, synthetic biology, systems biology
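Because a sequence is, computationally, just a string over a small alphabet, many basic operations are a few lines of code. The sketch below is illustrative only: the example sequence is made up, and the functions assume plain DNA with the letters A, C, G, and T.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Return the fraction of bases that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


example = "ATGCGTTA"  # made-up sequence for illustration
print(reverse_complement(example))  # TAACGCAT
print(gc_content(example))          # 0.375
```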
Translational Bioinformatics “Translational Bioinformatics is an emerging field in the study of health informatics, focused on the convergence of molecular bioinformatics, biostatistics, statistical genetics, and clinical informatics.” Wikipedia
For Next Time Assignment: familiarize yourself with the material on basic command line usage found here: Workshop 0. Basic Linux and Command Line Usage
SSH and SCC • SCC - Shared Compute Cluster • You all have accounts on SCC • You will need an ssh client program to connect: • Mac, Linux: Terminal (included) • Windows: MobaXTerm • Connect to: scc1.bu.edu with your BU username/password Demonstration
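On Mac or Linux, connecting comes down to running `ssh` with your username and the host from the slide. The sketch below only builds and prints that command; the helper name `build_ssh_command` and the placeholder username are hypothetical, and MobaXTerm users would enter the same host and username in its session dialog instead.

```python
import shlex

SCC_HOST = "scc1.bu.edu"  # login host named on the slide


def build_ssh_command(username, host=SCC_HOST):
    """Return the argument list for an ssh login, e.g. ['ssh', 'user@host'].

    Hypothetical helper for illustration; it does not open a connection.
    """
    if not username or any(ch.isspace() for ch in username):
        raise ValueError("username must be non-empty and contain no whitespace")
    return ["ssh", f"{username}@{host}"]


# Placeholder username: substitute your own BU login name.
cmd = build_ssh_command("your_bu_username")
print(shlex.join(cmd))  # ssh your_bu_username@scc1.bu.edu
```

Typing the printed command into a terminal prompts for your BU password.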
Rank the following roles that you might play in a project in order of preference
Rank the following roles that you might play in a project in order of preference (2019 vs. 2018)
How comfortable are you with the following programming languages/concepts?
How comfortable are you with the following statistics concepts?
How comfortable are you with the following biology concepts?
How comfortable are you with the following bioinformatics concepts?