Alexis Dereeper, François Sabot

Analysis of NGS raw data with Galaxy: cleaning, data control, alignment, polymorphism. CIBA courses – Brasil 2011. Alexis Dereeper, François Sabot. Aim of the tutorial classes: 1- Galaxy vs command line 2- Understand FASTQ files 3- Cleaning of Illumina data (FASTQ)


Presentation Transcript

  1. Analysis of NGS raw data with Galaxy: cleaning, data control, alignment, polymorphism. CIBA courses – Brasil 2011. Alexis Dereeper, François Sabot

  2. Aim of the tutorial classes:
     1- Galaxy vs command line
     2- Understand FASTQ files
     3- Cleaning of Illumina data (FASTQ)
     4- Perform an assembly
     5- Perform a mapping of Illumina reads on a reference sequence
     6- Cleaning of a multiple SAM file

  4. 1- Galaxy. CIRAD server: http://gohelle.cirad.fr/galaxy/ Main server: http://main.g2.bx.psu.edu/

  5. TOOLS DATA

  9. WEB APPLICATION
     - "Click'n'Play" system
     - transparent for user
     MODULAR
     - Numerous default bricks (already integrated)
     - Customizable bricks can be added
     MULTIPLE
     - Based on a web server (Apache...)
     - On a single machine, or a cluster...
     BUT
     - Simple support
     - Much less powerful than terminal
     - Only for routine analysis
     - Only for limited data

  10. CONNECTION FOR THE TUTORIAL CLASSES: http://gohelle.cirad.fr/galaxy/

  11. Connecting...

  13. Add data...

  14. Import data from Galaxy libraries

  16. FASTQ file → TEXT file. STRUCTURE:
     @HWUSI-EAS454_0006:1:112:14105:5498#CTTGTA
     CGCCAAGAAGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGT
     +
     cfffcfeffdeefefffcffffffffcffeffffdffffafcfffffdffffdfefeddf^eececfffdfcbffb
     @HWUSI-EAS454_0006:1:37:16314:3410#CTTGTA
     AGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGTGGTGGCCG
     +
     `bTbbccccceeeeeceeeecccYeedded`ceec]dddde^a`deeeec\`dddcbaadadYd`]]Jc_^bc^^\
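The four-line structure shown on this slide can be read with a few lines of Python. This is only a minimal sketch: the names `FastqRecord` and `read_fastq` are hypothetical, the reader assumes exactly four lines per record (no wrapped sequences), and the two example records are truncated to their first 10 bases for brevity.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str      # header line without the leading "@"
    sequence: str  # base calls
    quality: str   # one ASCII character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records, assuming exactly four lines per record."""
    while True:
        header = handle.readline()
        if header == "":          # end of file
            return
        header = header.strip()
        if not header:            # tolerate blank lines between records
            continue
        sequence = handle.readline().strip()
        separator = handle.readline().strip()
        quality = handle.readline().strip()
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# The two reads from the slide, truncated to 10 bases each (assumption for brevity).
EXAMPLE = """@HWUSI-EAS454_0006:1:112:14105:5498#CTTGTA
CGCCAAGAAG
+
cfffcfeffd
@HWUSI-EAS454_0006:1:37:16314:3410#CTTGTA
AGTGTAGCAA
+
`bTbbccccc
"""

records = list(read_fastq(io.StringIO(EXAMPLE)))
for record in records:
    print(record.name, len(record.sequence))
```

For real analyses an established parser is usually preferable to a hand-written reader like this one.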

  24. SEQUENCE NAME: @HWUSI-EAS454_0006:1:112:14105:5498#CTTGTA
     IUPAC SEQUENCE: CGCCAAGAAGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGT
     +
     Quality in ASCII: cfffcfeffdeefefffcffffffffcffeffffdffffafcfffffdffffdfefeddf^eececfffdfcbffb
     f → Quality = 38 (ASCII code 102 – offset 64)
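The conversion on this slide (f → 102 – 64 = 38) can be applied to a whole quality string. A minimal sketch, assuming the offset of 64 used on the slide (files in Sanger encoding use an offset of 33 instead); `decode_qualities` is a hypothetical helper name.

```python
from typing import List

ILLUMINA_OFFSET = 64  # offset used on the slide: ord("f") - 64 = 38


def decode_qualities(quality: str, offset: int = ILLUMINA_OFFSET) -> List[int]:
    """Turn each ASCII quality character into its integer quality value."""
    return [ord(char) - offset for char in quality]


# First four quality characters of the read shown on the slide.
print(decode_qualities("cfff"))  # [35, 38, 38, 38]
```

If decoded values come out negative or implausibly high, the file probably uses a different offset than the one assumed here.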

  25. WHAT IS QUALITY? The quality value Q is an integer mapping of p (i.e., the probability that the corresponding base call is incorrect).
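The slide does not spell out the mapping between Q and p. Assuming the standard Phred definition, Q = -10 · log10(p), the relation can be sketched as follows (function names are hypothetical; the older Solexa odds-based scale is not covered).

```python
import math


def error_probability(q: float) -> float:
    """Phred scale: probability that the base call is wrong, p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def phred_quality(p: float) -> int:
    """Inverse mapping, rounded to the nearest integer: Q = -10 * log10(p)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return round(-10 * math.log10(p))


for q in (10, 20, 30, 38):
    print(q, error_probability(q))
```

Under this assumption, Q = 20 means one expected error in 100 base calls and Q = 30 one in 1000.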

  26. FASTQC: quality control http://www.bioinformatics.bbsrc.ac.uk/projects/download.html#fastqc

  30. Why do we need to clean? To remove remaining adapters/primers and low-quality sequences → Cutadapt
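To make the idea of cleaning concrete, here is a deliberately simplified sketch in Python. It is not Cutadapt's algorithm (Cutadapt tolerates mismatches in the adapter and uses a different quality-trimming scheme). The adapter string, the thresholds, and all function names below are assumptions chosen for illustration only.

```python
from typing import Optional, Tuple


def trim_adapter(sequence: str, quality: str, adapter: str) -> Tuple[str, str]:
    """Cut the read at the first exact occurrence of the adapter, if any."""
    position = sequence.find(adapter)
    if position == -1:
        return sequence, quality
    return sequence[:position], quality[:position]


def trim_low_quality_tail(sequence: str, quality: str,
                          min_quality: int = 20, offset: int = 64) -> Tuple[str, str]:
    """Remove bases from the 3' end while their quality is below min_quality."""
    end = len(sequence)
    while end > 0 and ord(quality[end - 1]) - offset < min_quality:
        end -= 1
    return sequence[:end], quality[:end]


def clean_read(sequence: str, quality: str, adapter: str,
               min_quality: int = 20, min_length: int = 10,
               offset: int = 64) -> Optional[Tuple[str, str]]:
    """Apply both trimming steps; return None if the read becomes too short."""
    sequence, quality = trim_adapter(sequence, quality, adapter)
    sequence, quality = trim_low_quality_tail(sequence, quality, min_quality, offset)
    if len(sequence) < min_length:
        return None
    return sequence, quality


# Hypothetical read: 12 genomic bases followed by a 10-base adapter fragment.
read_sequence = "ACGTACGTACGT" + "AGATCGGAAG"
read_quality = "f" * 10 + "BB" + "f" * 10   # "B" decodes to Q2 with offset 64
print(clean_read(read_sequence, read_quality, adapter="AGATCGGAAG"))
```

In this example the adapter is removed first, then the two low-quality bases at the new 3' end, leaving a 10-base read that just passes the length filter.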

  31. 20 70 7

  32. Your data are now ready to be analyzed...

  34. Concatenate files

  35. Untested Tools → NGS → Assembly → Assemble with MIRA

  37. BLAST of putative contigs against reference

  41. Separate sequences by original individuals RC1, RC2...
     Use of regular expressions via Galaxy:
     → RC[13456789] & remove matching reads => keep RC2
     → RC[123456789]_ & remove matching reads => keep RC10
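The effect of the two patterns on this slide can be checked outside Galaxy with Python's `re` module. This is a sketch under assumptions: the read names below are hypothetical, individuals are taken to be RC1 to RC10, and each label is assumed to be followed by an underscore. Galaxy's own regular-expression tool may differ in details.

```python
import re
from typing import List


def remove_matching(read_names: List[str], pattern: str) -> List[str]:
    """Keep only the names in which the pattern does not occur."""
    regex = re.compile(pattern)
    return [name for name in read_names if not regex.search(name)]


# Hypothetical read names carrying the individual label as a prefix.
names = ["RC1_r1", "RC2_r1", "RC3_r1", "RC9_r1", "RC10_r1"]

# "RC1" also matches the start of "RC10", so RC10 reads are removed as well.
print(remove_matching(names, r"RC[13456789]"))     # ['RC2_r1']

# The trailing "_" requires a single digit, so "RC10_" is the only name left.
print(remove_matching(names, r"RC[123456789]_"))   # ['RC10_r1']
```

The trailing underscore in the second pattern is what distinguishes RC1 from RC10.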

  46. Mapping: map paired-end reads on a reference
     1- Compute positions for each read
     2- Associate the positions of the two members of each pair
     3- Select the most probable position satisfying the conditions
     4- Write a SAM output file
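The SAM file produced at the end of the mapping is tab-separated text with 11 mandatory fields per alignment, and the FLAG field records the pairing information. Below is a minimal sketch for inspecting one alignment line. The record is invented for illustration, and `SamAlignment`, `parse_sam_line` and `is_properly_paired` are hypothetical names; only the field order and flag bits come from the SAM specification.

```python
from typing import NamedTuple

# Flag bits defined by the SAM specification.
FLAG_PAIRED = 0x1        # read is part of a pair
FLAG_PROPER_PAIR = 0x2   # both mates aligned as expected by the mapper
FLAG_UNMAPPED = 0x4      # read itself is unmapped


class SamAlignment(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str


def parse_sam_line(line: str) -> SamAlignment:
    """Parse the 11 mandatory fields of one alignment line (optional tags ignored)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamAlignment(fields[0], int(fields[1]), fields[2], int(fields[3]),
                        int(fields[4]), fields[5], fields[6], int(fields[7]),
                        int(fields[8]), fields[9], fields[10])


def is_properly_paired(alignment: SamAlignment) -> bool:
    """True if the read is mapped, paired, and flagged as a proper pair."""
    flag = alignment.flag
    return (bool(flag & FLAG_PAIRED) and bool(flag & FLAG_PROPER_PAIR)
            and not flag & FLAG_UNMAPPED)


# Invented record: flag 99 = paired + proper pair + mate reverse + first in pair.
line = "read1\t99\tref\t100\t60\t10M\t=\t250\t160\tACGTACGTAC\tffffffffff"
alignment = parse_sam_line(line)
print(alignment.rname, alignment.pos, is_properly_paired(alignment))
```

Header lines (starting with "@") are not alignments and should be skipped before calling `parse_sam_line`.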
