3rd Summer School in Computational Biology, September 8, 2014. Frank Emmert-Streib, Computational Biology and Machine Learning Laboratory, Center for Cancer Research and Cell Biology, Queen’s University Belfast, UK
Organizers of the summer school. General questions: Frank Emmert-Streib (f.emmert-streib@qub.ac.uk), Shu-Dong Zhang (s.zhang@qub.ac.uk)
Lecturers of the summer school: Shailesh Tripathi, Alexey Stupnikov & Kevin Keenan, David Simpson, Caroline Meharg, Myrto Kostadima, Bori Mifsud
History of the summer school (figure: number of participants)
Organizational notes
• Coffee breaks (short, in the foyer)
• Lunch (1 hour)
• Sign-in sheets
• Internet access:
  • Students from QUB: use your QUB account
  • External students: guest account (Shailesh Tripathi)
What will we learn?
• Different high-throughput data types:
  • Microarray data
  • Sequencing data (DNA-seq, RNA-seq, ChIP-seq)
• Basic statistics and machine learning methods:
  • Hypothesis testing
  • Supervised & unsupervised learning
• Basic data visualization
• Importance of large-scale data in modern biology (systems biology)
Vision of the VC: Universities require interdisciplinary engagement in the educational and research effort. Professor Patrick Johnston, President and Vice-Chancellor (VC) of Queen’s University
What will we not learn? (Adjusting expectations)
• Example: When learning a foreign language, how much can you learn in 3 days?
• Analogy:
  • programming language
  • statistics/machine learning
  • biology
The time it takes to become proficient in computational biology is comparable to the time it takes to learn a language.
Good news!
• The summer school in computational biology provides you with a guided start.
• If you are from Belfast:
  • Journal club: computational biology and biostatistics (every Monday in the HSB, 3pm)
  • Degree: MSc in Computational Genomics & Bioinformatics
• General problems/questions: Frank Emmert-Streib
Central Dogma of Molecular Biology Francis Crick, 1956
What is reproducible research? Reproducible research means that an entire study can be reproduced, either by the same researcher or by an independent researcher. In this context is important.
Example: To understand the meaning of reproducible research, let’s consider the following example. Task: Produce the figure. Approach: Adobe Illustrator, GIMP, CorelDRAW, PowerPoint
Example: To understand the meaning of reproducible research, let’s consider the following example. Task: Produce the figure. Summary:
• How long did it take? t = 30 min
• How did you do it? Describe it in a report.
Example: When you publish results (e.g., the figure) and someone wants to repeat the same or a similar analysis:
• How long does a re-analysis take? 30 min
• How is a re-analysis done? That depends on the report you provided and on the availability of the software.
Alternative way to generate results: Create the figure by writing a program.
• LaTeX
• freely available
Back to data analysis: The same line of argumentation holds for the analysis of data.
• Create a figure -> conduct a data analysis
• Adobe Illustrator -> Partek, GenomeStudio, etc.
In order to obtain reproducible results in ‘genomics’ we use R.
Reproducible research
• Analyze data by writing programs in R.
• Share your data & your programs with others.
Other groups can then reproduce your results. For this reason we use R in this summer school.
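The idea of analyzing data by writing a program can be sketched in a few lines of R. This is only an illustrative sketch, not material from the summer school: the data are simulated, and the names `control`, `treated`, `result` and `out_file` are hypothetical. It assumes a plain R installation with the base `stats` and `grDevices` packages.

```r
# Minimal sketch of a scripted, re-runnable analysis.
# All data and variable names here are made up for illustration.

set.seed(1)  # fixed seed: the simulated data are identical on every run

# Simulated expression values of one gene in two groups of 20 samples.
control <- rnorm(20, mean = 5, sd = 1)
treated <- rnorm(20, mean = 6, sd = 1)

# Hypothesis test: do the two group means differ?
# t.test() performs Welch's two-sample t-test by default.
result <- t.test(treated, control)

# Write the figure to a file from code instead of drawing it by hand.
out_file <- file.path(tempdir(), "expression_boxplot.pdf")
pdf(out_file)
boxplot(list(control = control, treated = treated),
        ylab = "Expression (arbitrary units)",
        main = sprintf("Welch t-test, p = %.3g", result$p.value))
dev.off()
```

Because every step, from the data to the figure, is in one script, sharing the script and the data is enough for another group to rerun the analysis and obtain the same figure.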
Data sharing: The US National Institutes of Health (NIH) requires that all genomics data generated with NIH funding be shared online. Nature, 4 September 2014. Mandatory!