Analysis of NGS raw data with Galaxy
Cleaning, data control, alignment, polymorphism
CIBA courses – Brasil 2011
Alexis Dereeper
Alexis Dereeper, François Sabot
Aim of the Tutorial classes:
1- Galaxy vs Command line
2- Understand FASTQ files
3- Cleaning of Illumina data (FASTQ)
4- Perform an assembly
5- Perform a mapping of Illumina reads on a reference sequence
6- Cleaning of a multiple SAM file
1- Galaxy
CIRAD server: http://gohelle.cirad.fr/galaxy/
Main server: http://main.g2.bx.psu.edu/
[Screenshot: Galaxy interface, with labels TOOLS and DATA]
WEB APPLICATION
- “Click'n'Play” system
- Transparent for the user
MODULAR
- Numerous default bricks (already integrated)
- Customizable bricks can be added
MULTIPLE
- Based on a web server (Apache...)
- On a single machine, or a cluster...
BUT
- Simple support
- Much less powerful than the terminal
- Only for routine analysis
- Only for limited data
CONNECTION FOR THE TUTORIAL CLASSES: http://gohelle.cirad.fr/galaxy/
Connecting...
Add data...
Import data from Galaxy libraries
FASTQ file → TEXT file
STRUCTURE:
@HWUSI-EAS454_0006:1:112:14105:5498#CTTGTA
CGCCAAGAAGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGT
+
cfffcfeffdeefefffcffffffffcffeffffdffffafcfffffdffffdfefeddf^eececfffdfcbffb
@HWUSI-EAS454_0006:1:37:16314:3410#CTTGTA
AGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGTGGTGGCCG
+
`bTbbccccceeeeeceeeecccYeedded`ceec]dddde^a`deeeec\`dddcbaadadYd`]]Jc_^bc^^\
One FASTQ record, line by line:
SEQUENCE NAME: @HWUSI-EAS454_0006:1:112:14105:5498#CTTGTA
IUPAC SEQUENCE: CGCCAAGAAGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGT
Separator: +
Quality in ASCII: cfffcfeffdeefefffcffffffffcffeffffdffffafcfffffdffffdfefeddf^eececfffdfcbffb
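The four-line layout shown above can be read with a few lines of Python. This is a minimal sketch, assuming every record spans exactly four lines (name, sequence, separator, quality); the names FastqRecord and parse_fastq are hypothetical and are not part of Galaxy.

```python
from typing import Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str       # header line without the leading "@"
    sequence: str   # base calls (IUPAC letters)
    quality: str    # one ASCII character per base


def parse_fastq(text: str) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text, assuming exactly four lines per read."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected a multiple of four non-empty lines")
    for i in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield FastqRecord(header[1:], sequence, quality)


# First record from the slide.
EXAMPLE = """\
@HWUSI-EAS454_0006:1:112:14105:5498#CTTGTA
CGCCAAGAAGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGT
+
cfffcfeffdeefefffcffffffffcffeffffdffffafcfffffdffffdfefeddf^eececfffdfcbffb
"""

records = list(parse_fastq(EXAMPLE))
print(records[0].name)
print(len(records[0].sequence), len(records[0].quality))
```

The length check reflects the rule that each base has exactly one quality character.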
Decoding one quality character: f → Quality = 38 (the ASCII code of f is 102; 102 – 64 = 38)
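The same subtraction can be applied to a whole quality string. The sketch below assumes the offset of 64 used on the slide; other FASTQ files may use an offset of 33, so the offset is left as a parameter. The function name decode_qualities is hypothetical.

```python
PHRED_OFFSET = 64  # offset taken from the slide's arithmetic (102 - 64)


def decode_qualities(quality: str, offset: int = PHRED_OFFSET) -> list[int]:
    """Convert each ASCII quality character to its integer quality value."""
    return [ord(char) - offset for char in quality]


print(decode_qualities("f"))      # [38]
print(decode_qualities("cfffc"))  # [35, 38, 38, 38, 35]
```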
WHAT IS QUALITY? The quality value Q is an integer mapping of p, the probability that the corresponding base call is incorrect.
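The slide text does not give the mapping itself. A minimal sketch, assuming the standard Phred definition Q = -10 * log10(p), shows how Q and p relate; the function names are hypothetical.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Probability that a base call with quality q is wrong (Phred scale assumed)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> int:
    """Integer quality value for an error probability p (Phred scale assumed)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return round(-10 * math.log10(p))


for q in (10, 20, 30, 38):
    print(q, phred_to_error_probability(q))
```

Under this definition, every increase of 10 in Q divides the error probability by 10.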
FASTQC: quality control http://www.bioinformatics.bbsrc.ac.uk/projects/download.html#fastqc
Why do we need to clean? To remove remaining adapters/primers and low-quality sequences → CutAdapt
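To make the two cleaning goals concrete, here is a deliberately simplified sketch. It is not CutAdapt's algorithm (CutAdapt tolerates mismatches and uses its own quality-trimming method); it only cuts at an exact adapter match and removes low-quality bases from the 3' end. The function names, the thresholds, the adapter string, and the example read are all assumptions chosen for illustration.

```python
from typing import Optional, Tuple


def trim_adapter(sequence: str, quality: str, adapter: str) -> Tuple[str, str]:
    """Cut the read at the first exact occurrence of the adapter, if any."""
    pos = sequence.find(adapter)
    if pos == -1:
        return sequence, quality
    return sequence[:pos], quality[:pos]


def trim_low_quality_tail(sequence: str, quality: str,
                          min_quality: int = 20, offset: int = 64) -> Tuple[str, str]:
    """Drop bases from the 3' end while their quality is below min_quality."""
    end = len(sequence)
    while end > 0 and ord(quality[end - 1]) - offset < min_quality:
        end -= 1
    return sequence[:end], quality[:end]


def clean_read(sequence: str, quality: str, adapter: str,
               min_quality: int = 20, min_length: int = 35,
               offset: int = 64) -> Optional[Tuple[str, str]]:
    """Return the cleaned read, or None if it ends up shorter than min_length."""
    sequence, quality = trim_adapter(sequence, quality, adapter)
    sequence, quality = trim_low_quality_tail(sequence, quality, min_quality, offset)
    if len(sequence) < min_length:
        return None
    return sequence, quality


# Hypothetical read: 10 good bases, 4 poor bases (character "B"), then an adapter.
ADAPTER = "AGATCGGAAGAGC"
read_seq = "ACGTACGTACGGTT" + ADAPTER
read_qual = "f" * 10 + "B" * 4 + "f" * len(ADAPTER)
print(clean_read(read_seq, read_qual, ADAPTER, min_length=5))
```

With the default min_length of 35 the same read would be discarded, which is the usual fate of reads that become too short after cleaning.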
[Screenshot: cleaning tool form; values shown: 20, 70, 7]
Your data are now ready to be analyzed...
Concatenate files
Untested Tools → NGS → Assembly → Assemble with MIRA
BLAST of putative contigs against reference
Separate sequences by original individuals RC1, RC2...
Use of regular expressions via Galaxy:
→ RC[13456789] & remove matching reads => keep RC2
→ RC[123456789]_ & remove matching reads => keep RC10
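The effect of these two patterns can be checked outside Galaxy. The sketch below uses Python's re module on hypothetical read names (the real naming scheme of the course data is not shown on the slide) and removes every name that matches the pattern.

```python
import re
from typing import List


def keep_reads_not_matching(read_names: List[str], pattern: str) -> List[str]:
    """Remove every read whose name matches the pattern; keep the rest."""
    regex = re.compile(pattern)
    return [name for name in read_names if not regex.search(name)]


# Hypothetical read names, one per individual.
names = ["RC1_read1", "RC2_read1", "RC3_read1", "RC10_read1"]

print(keep_reads_not_matching(names, r"RC[13456789]"))    # ['RC2_read1']
print(keep_reads_not_matching(names, r"RC[123456789]_"))  # ['RC10_read1']
```

The first pattern also removes RC10, because RC10 begins with RC1. The second pattern requires an underscore directly after a single digit, so RC10 does not match and is the only name kept.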
Mapping: Map paired-end reads on a reference
1- Compute positions for each read
2- Associate the positions of each member of the pair
3- Select the most probable position respecting the conditions
4- Write a SAM output file
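The SAM file produced in step 4 is tab-separated text, so the result of the pairing can be inspected with a short script. This is a minimal sketch that reads the first nine mandatory SAM columns and tests three FLAG bits defined by the SAM format (0x1 paired, 0x2 properly paired, 0x4 unmapped). The read and reference names in the example line are hypothetical.

```python
from typing import NamedTuple


class SamAlignment(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise FLAG
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string
    rnext: str   # reference name of the mate ("=" means same reference)
    pnext: int   # position of the mate
    tlen: int    # observed template length


def parse_sam_line(line: str) -> SamAlignment:
    """Parse one alignment line (not a header line starting with '@')."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has at least 11 columns")
    return SamAlignment(fields[0], int(fields[1]), fields[2], int(fields[3]),
                        int(fields[4]), fields[5], fields[6],
                        int(fields[7]), int(fields[8]))


def is_paired(aln: SamAlignment) -> bool:
    return bool(aln.flag & 0x1)


def is_proper_pair(aln: SamAlignment) -> bool:
    return bool(aln.flag & 0x2)


def is_unmapped(aln: SamAlignment) -> bool:
    return bool(aln.flag & 0x4)


# Hypothetical alignment line; "*" stands for an omitted sequence and quality.
line = "read1\t99\tref\t100\t60\t76M\t=\t250\t226\t*\t*"
aln = parse_sam_line(line)
print(aln.qname, aln.pos, is_paired(aln), is_proper_pair(aln), is_unmapped(aln))
```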