
Practically Genomic A hands-on bioinformatics IAP

Course Materials: http://rous.mit.edu/index.php/IAP_2012. Instructors: Paola Favaretto, Sebastian Hoersch, Charlie Whittaker and Courtney Crummett, KI for Integrative Cancer Research at MIT and MIT Libraries.



  1. Practically Genomic: A hands-on bioinformatics IAP
     • Course Materials: http://rous.mit.edu/index.php/IAP_2012
     • Instructors: Paola Favaretto, Sebastian Hoersch, Charlie Whittaker and Courtney Crummett, KI for Integrative Cancer Research at MIT and MIT Libraries
     • Students: wide range of experience levels
     • Unix account access information will be provided
     • Evaluations: please send comments to charliew@mit.edu

  2. Turning Biologists into Bioinformaticists - A practical approach
     The target audience is KI biologists.
     The teaching material should:
     • be modular and practical
     • have obvious contextual relevance
     • serve as readily accessible, easy-to-use reference material
     The students should:
     • become aware of the contents of a basic bioinformatics toolkit
     • learn how to find instructions covering tools and methods
     • experiment with the different methods covered in classes
     • gain familiarity and comfort with command-line computing

  3. Turning Biologists into Bioinformaticists - A practical approach – the specifics
     • Theory - Core Bioinformatics Concepts
       • Important principles required to use bioinformatics
     • Tools - A Basic Bioinformatics Toolkit
       • The software of bioinformatics
     • Tasks - Bioinformatics Methods
       • Data analysis with bioinformatics
     • Under development! http://rous.mit.edu/index.php/Teaching

  4. IAP 2012 Agenda (subject to change)
     • 1-23-12
       • Introduction
       • Getting more from Excel
       • Unix Introduction
     • 1-25-12
       • Next Generation Sequence Analysis with Unix and Galaxy
     • 1-27-12
       • Visualization and Analysis of Genomics Data
     rous.mit.edu

  5. Theory – Genomic Data
     • All kinds of genomics data are described using at least 4 pieces of information:
       • The name of a DNA sequence
       • A position on that sequence
       • A feature that exists at that position
       • The genome assembly version
     • Example record (BED, GFF, GTF formats):

       Sequence      Position   Feature
       Chromosome1   1314       Mutation

     • The sequence (Chromosome1 here) is a long block of sequence arranged by a process called genome assembly.
     • The assembly version is critical because the first 3 pieces of information are only meaningful for one specific assembly version. In a new version of the genome, this mutation will probably not be at position 1314; it will be located elsewhere.
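A minimal Python sketch of the idea on this slide: a genomic record carries its assembly version along with the sequence name, position and feature, and coordinates are only compared within one assembly. The class name `GenomicFeature`, the helper `same_locus`, and the assembly labels `assembly_v1` / `assembly_v2` are hypothetical and not part of the course material. The BED conversion assumes the slide's position is 1-based; BED itself uses a 0-based start and an exclusive end.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenomicFeature:
    """One genomic record: the four pieces of information from the slide."""
    assembly: str   # genome assembly version (hypothetical labels below)
    sequence: str   # name of the DNA sequence, e.g. "Chromosome1"
    position: int   # assumed 1-based position on that sequence
    feature: str    # what exists at that position, e.g. "Mutation"

    def to_bed_fields(self):
        """Return (chrom, start, end, name) using BED's 0-based, half-open interval."""
        return (self.sequence, self.position - 1, self.position, self.feature)


def same_locus(a: GenomicFeature, b: GenomicFeature) -> bool:
    """Compare coordinates, refusing to compare across assembly versions."""
    if a.assembly != b.assembly:
        raise ValueError(
            f"cannot compare coordinates from {a.assembly} and {b.assembly}"
        )
    return a.sequence == b.sequence and a.position == b.position


if __name__ == "__main__":
    old = GenomicFeature("assembly_v1", "Chromosome1", 1314, "Mutation")
    new = GenomicFeature("assembly_v2", "Chromosome1", 1314, "Mutation")
    print(old.to_bed_fields())
    try:
        same_locus(old, new)
    except ValueError as err:
        print("refused:", err)
```

Raising an error rather than returning `False` is a design choice for this sketch: position 1314 in two assemblies is not "different", it is simply not comparable without converting the coordinates first.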

  6. Theory – Microarray Data
     • Steps:
       • Target features created on a surface
       • Labeled material hybridized
       • Image analysis
     • Used for:
       • Gene expression analysis
       • Polymorphism detection
       • Copy number analysis
       • DNA binding studies
     • Data is gathered about the features present on the array:

       ProbeID     Sample1   Sample2   Sample3   Sample4
       1007_s_at   10.93     11.44     11.19     11.64
       1053_at     8.28      7.54      8.06      7.32
       117_at      3.31      3.41      3.13      3.13
       121_at      4.42      4.32      4.46      4.63
       1255_g_at   1.8       1.7       1.75      1.81
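As a rough illustration of how a probe-by-sample table like the one on this slide can be handled outside a spreadsheet, the following Python sketch parses the slide's example values and computes a per-probe mean. The function names `parse_expression_table` and `probe_means` are hypothetical, and the slide does not say what scale the values are on, so the mean is only a demonstration of working with the table.

```python
from statistics import mean

# The example values shown on the slide, as whitespace-separated text.
TABLE = """\
ProbeID Sample1 Sample2 Sample3 Sample4
1007_s_at 10.93 11.44 11.19 11.64
1053_at 8.28 7.54 8.06 7.32
117_at 3.31 3.41 3.13 3.13
121_at 4.42 4.32 4.46 4.63
1255_g_at 1.8 1.7 1.75 1.81
"""


def parse_expression_table(text):
    """Return {probe_id: {sample_name: value}} from a whitespace-separated table."""
    lines = [line for line in text.splitlines() if line.strip()]
    samples = lines[0].split()[1:]  # skip the "ProbeID" column header
    table = {}
    for line in lines[1:]:
        probe_id, *values = line.split()
        if len(values) != len(samples):
            raise ValueError(f"wrong number of values for probe {probe_id}")
        table[probe_id] = dict(zip(samples, (float(v) for v in values)))
    return table


def probe_means(table):
    """Average each probe's values across all samples."""
    return {probe: mean(values.values()) for probe, values in table.items()}


if __name__ == "__main__":
    for probe, avg in probe_means(parse_expression_table(TABLE)).items():
        print(f"{probe}\t{avg:.2f}")
```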

  7. Theory – Next Generation Sequencing (NGS)
     • Steps:
       • Generate DNA fragments
       • Attach to a surface and amplify in situ
       • Subject the surface to cycles of imaging/chemistry
       • Image analysis to call base sequences and qualities
     • Used for:
       • Gene expression analysis
       • Polymorphism/mutation detection
       • Copy number analysis
       • Mixture quantification
       • DNA or RNA binding studies
       • others…
     • 200+ million clusters per experiment
     • Data is gathered about everything in the input mixture.

  8. Theory – NGS Alignment Files
     • SAM format (not all columns are shown):

       Query           Flag   Reference   Position   MapQual   CIGAR   Sequence        Base Quality
       2:75:1538:897   16     chr1        8291       0         60M     AGGCCAGGCCCTC   HHHHHGGH@HGHHHHH
       4:31:101:1130   16     chr1        8328       1         60M     CACCTACTTGCCA   ################

     • Each line has a lot of information
     • One experiment = millions of lines = many Gb of data
     • The scale of the data causes problems with Excel and similar tools
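Because alignment files are too large for a spreadsheet, they are usually processed one line at a time. The Python sketch below parses the abbreviated records shown on this slide and counts alignments per reference without holding the whole file in memory. It assumes the slide's 8-column layout; real SAM files have 11 mandatory tab-separated columns, so a real parser would need the extra fields. The names `SlideSamRecord`, `parse_slide_record`, `is_reverse_strand` and `count_by_reference` are hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class SlideSamRecord(NamedTuple):
    """The columns shown on the slide (a subset of the full SAM columns)."""
    query: str
    flag: int
    reference: str
    position: int
    mapq: int
    cigar: str
    sequence: str
    base_quality: str


def parse_slide_record(line: str) -> SlideSamRecord:
    """Split one abbreviated alignment line into named, typed fields."""
    query, flag, ref, pos, mapq, cigar, seq, qual = line.split()
    return SlideSamRecord(query, int(flag), ref, int(pos), int(mapq), cigar, seq, qual)


def is_reverse_strand(record: SlideSamRecord) -> bool:
    """In SAM, flag bit 0x10 (decimal 16) marks a read aligned to the reverse strand."""
    return bool(record.flag & 0x10)


def count_by_reference(lines: Iterable[str]) -> Counter:
    """Stream over lines, skipping blanks and '@' header lines, counting per reference."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue
        counts[parse_slide_record(line).reference] += 1
    return counts


# The two example records from the slide.
EXAMPLE_LINES = [
    "2:75:1538:897 16 chr1 8291 0 60M AGGCCAGGCCCTC HHHHHGGH@HGHHHHH",
    "4:31:101:1130 16 chr1 8328 1 60M CACCTACTTGCCA ################",
]

if __name__ == "__main__":
    print(count_by_reference(EXAMPLE_LINES))
    print(is_reverse_strand(parse_slide_record(EXAMPLE_LINES[0])))
```

Passing an open file object instead of `EXAMPLE_LINES` would work the same way, since the function only iterates over lines.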
