Practically Genomic A hands-on bioinformatics IAP. Course Materials: http://rous.mit.edu/index.php/IAP_2012 Instructors : Paola Favaretto, Sebastian Hoersch, Charlie Whittaker and Courtney Crummett KI for Integrative Cancer Research at MIT and MIT Libraries
Practically Genomic A hands-on bioinformatics IAP • Course Materials: http://rous.mit.edu/index.php/IAP_2012 • Instructors: Paola Favaretto, Sebastian Hoersch, Charlie Whittaker and Courtney Crummett, KI for Integrative Cancer Research at MIT and MIT Libraries • Students - Wide range of experience levels • Unix account access information will be provided • Evaluations - Please send comments to charliew@mit.edu
Turning Biologists into Bioinformaticists - A practical approach Target audience: KI biologists. The teaching material should: • be modular and practical • have obvious contextual relevance • serve as readily accessible and easily used reference material The students should: • become aware of the contents of a basic bioinformatics toolkit • learn how to find instructions covering tools and methods • experiment with the different methods covered in class • gain familiarity and comfort with command-line computing
Turning Biologists into Bioinformaticists - A practical approach – the specifics • Theory - Core Bioinformatics Concepts • Important principles required to use bioinformatics • Tools - A Basic Bioinformatics Toolkit • The software of bioinformatics • Tasks - Bioinformatics Methods • Data analysis with bioinformatics • Under Development! http://rous.mit.edu/index.php/Teaching
IAP 2012 Agenda (subject to change) • 1-23-12 • Introduction • Getting more from Excel • Unix Introduction • 1-25-12 • Next Generation Sequence Analysis with Unix and Galaxy • 1-27-12 • Visualization and Analysis of Genomics Data rous.mit.edu
Theory – Genomic Data
• All kinds of genomics data are described using at least 4 pieces of information:
• The name of a DNA sequence
• A position on that sequence
• A feature that exists at that position
• The genome assembly version

Sequence1     Position   Feature
Chromosome1   1314       Mutation

BED, GFF, GTF formats
• Sequence1 is a long block of sequence arranged by a process called genome assembly.
• The assembly version is critical because the first 3 pieces of information are only meaningful for one specific assembly version. In a new version of the genome, this mutation will probably not be at position 1314; it will be located elsewhere.
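The four pieces of information above can be sketched as a small Python record. This is only an illustration, not part of the course materials: the class name `GenomicFeature`, the assembly label "hg19", and the reading of position 1314 as a 1-based coordinate are all assumptions. The BED conversion relies on BED using 0-based, half-open intervals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenomicFeature:
    """Hypothetical record holding the four pieces of information."""
    assembly: str   # genome assembly version ("hg19" below is an assumed example)
    sequence: str   # name of the DNA sequence
    position: int   # position on that sequence, assumed 1-based here
    feature: str    # what exists at that position

    def to_bed(self) -> str:
        # BED intervals are 0-based and half-open, so a single 1-based
        # position p becomes start = p - 1, end = p.
        start = self.position - 1
        end = self.position
        return f"{self.sequence}\t{start}\t{end}\t{self.feature}"


mutation = GenomicFeature("hg19", "Chromosome1", 1314, "Mutation")
bed_line = mutation.to_bed()
print(bed_line)

# The same name/position/feature on another assembly is a different record.
other_assembly = GenomicFeature("hg18", "Chromosome1", 1314, "Mutation")
print(mutation == other_assembly)
```

Keeping the assembly inside the record makes it harder to compare coordinates from different genome versions by accident.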
Theory – Microarray Data
Target features created on a surface. Labeled material hybridized. Image analysis.
• Used for:
• Gene expression analysis
• Polymorphism detection
• Copy number analysis
• DNA binding studies

ProbeID     Sample1   Sample2   Sample3   Sample4
1007_s_at   10.93     11.44     11.19     11.64
1053_at     8.28      7.54      8.06      7.32
117_at      3.31      3.41      3.13      3.13
121_at      4.42      4.32      4.46      4.63
1255_g_at   1.8       1.7       1.75      1.81

Data is gathered about the features present on the array.
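A probe-by-sample table like the one above can be read with a few lines of Python. The sketch below assumes the table is stored as whitespace-separated text; the names `TABLE`, `read_matrix`, and `probe_means` are made up for illustration, and the per-probe mean is just one simple summary, not a method prescribed by the course.

```python
from statistics import mean

# The table from the slide, as whitespace-separated text (assumed layout).
TABLE = """ProbeID Sample1 Sample2 Sample3 Sample4
1007_s_at 10.93 11.44 11.19 11.64
1053_at 8.28 7.54 8.06 7.32
117_at 3.31 3.41 3.13 3.13
121_at 4.42 4.32 4.46 4.63
1255_g_at 1.8 1.7 1.75 1.81
"""


def read_matrix(text):
    """Return (sample names, {probe_id: {sample: value}})."""
    lines = text.strip().splitlines()
    samples = lines[0].split()[1:]          # skip the "ProbeID" column header
    matrix = {}
    for line in lines[1:]:
        fields = line.split()
        values = [float(v) for v in fields[1:]]
        matrix[fields[0]] = dict(zip(samples, values))
    return samples, matrix


samples, matrix = read_matrix(TABLE)

# One value per probe: the mean signal across all samples.
probe_means = {probe: mean(row.values()) for probe, row in matrix.items()}
for probe, value in probe_means.items():
    print(f"{probe}\t{value:.2f}")
```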
Theory – Next Generation Sequencing (NGS) Generate DNA fragments. Attach to surface and amplify in situ. Subject surface to cycles of imaging/chemistry. Image analysis to call base sequences and qualities. • Used for: • Gene expression analysis • Polymorphism/Mutation detection • Copy number analysis • Mixture quantitation • DNA or RNA binding studies • others… 200+ million clusters per experiment. Data is gathered about everything in the input mixture.
Theory – NGS Alignment Files
SAM Format

Query          Flag  Reference  Position  MapQual  CIGAR  Sequence       Base Quality
2:75:1538:897  16    chr1       8291      0        60M    AGGCCAGGCCCTC  HHHHHGGH@HGHHHHH
4:31:101:1130  16    chr1       8328      1        60M    CACCTACTTGCCA  ################

• Each line has a lot of information (not all columns are shown)
• One experiment = millions of lines = many Gb of data
• Scale of the data causes problems with Excel etc.
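Because one experiment produces millions of lines, SAM files are usually processed one line at a time rather than loaded into a spreadsheet. The following is a minimal sketch of that idea, not a full SAM parser. `SamRecord` and `parse_sam` are hypothetical names, and the `*`, `0`, `0` values in the example lines are placeholders assumed for the three columns the slide leaves out. The sequence and quality strings are copied as shown on the slide and do not match the 60M CIGAR in length, so the sketch does no length validation.

```python
from typing import Iterable, Iterator, NamedTuple


class SamRecord(NamedTuple):
    """The eight SAM columns shown on the slide (hypothetical container)."""
    query: str
    flag: int
    reference: str
    position: int
    mapq: int
    cigar: str
    sequence: str
    base_quality: str

    @property
    def is_reverse(self) -> bool:
        # Bit 0x10 of the SAM flag marks a read aligned to the reverse strand.
        return bool(self.flag & 0x10)


def parse_sam(lines: Iterable[str]) -> Iterator[SamRecord]:
    """Yield one record per alignment line, without holding the file in memory."""
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        f = line.rstrip("\n").split("\t")
        # Columns 7-9 (mate reference, mate position, template length) are skipped.
        yield SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]), f[5], f[9], f[10])


# Example lines built from the slide; "*", "0", "0" are assumed placeholders.
example = [
    "@HD\tVN:1.0\n",
    "2:75:1538:897\t16\tchr1\t8291\t0\t60M\t*\t0\t0\tAGGCCAGGCCCTC\tHHHHHGGH@HGHHHHH\n",
    "4:31:101:1130\t16\tchr1\t8328\t1\t60M\t*\t0\t0\tCACCTACTTGCCA\t################\n",
]
records = list(parse_sam(example))
for rec in records:
    strand = "-" if rec.is_reverse else "+"
    print(rec.query, rec.reference, rec.position, strand, rec.mapq)
```

With a real file, the same generator can be fed an open file handle, for example `parse_sam(open(path))` where `path` is whatever SAM file is at hand, so memory use stays flat regardless of file size.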