Considerations for Analyzing Targeted NGS Data: Exome. Tim Hague, CTO
Exome Analysis • 3 sets of full exome sequences for the same individual, targeted by 3 different kits • One set had data problems because reads were from 2 different sequencers • Remaining 2 sets were analyzed both by the customer and by Omixon
Exome Targets • Illumina TruSeq ~62 Mbp • Nimblegen SeqCap EZ Exome ~64 Mbp • ~35 Mbp overlap between targets • Exons, ORFs and putative translated regions captured • 40M and 37M read pairs resp., 101bp length
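A figure such as the ~35 Mbp overlap can be computed from the two kits' target interval files. The following is a minimal sketch, assuming the targets are available as BED-style half-open intervals per chromosome; the function names and the toy coordinates are illustrative and are not taken from either kit.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_bp(targets_a, targets_b):
    """Count bases covered by both target sets.

    Each argument maps a chromosome name to a list of half-open
    (start, end) intervals, as in BED coordinates.
    """
    total = 0
    for chrom in targets_a.keys() & targets_b.keys():
        a = merge_intervals(targets_a[chrom])
        b = merge_intervals(targets_b[chrom])
        i = j = 0
        while i < len(a) and j < len(b):
            lo = max(a[i][0], b[j][0])
            hi = min(a[i][1], b[j][1])
            if lo < hi:
                total += hi - lo
            # Advance whichever interval ends first.
            if a[i][1] < b[j][1]:
                i += 1
            else:
                j += 1
    return total


# Toy coordinates for illustration only, not real kit targets.
kit_a = {"chr1": [(100, 200), (150, 300)], "chr2": [(0, 50)]}
kit_b = {"chr1": [(250, 400)], "chrX": [(10, 20)]}
shared = overlap_bp(kit_a, kit_b)
```

Merging each kit's intervals first avoids double counting bases that a single kit lists more than once.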
Full Analysis Pipelines • In this case we are comparing two full NGS analysis pipelines • Including the mapping/alignment and a multi-step variant call pipeline • The Omixon pipeline for this analysis uses two variant callers • The Omixon pipeline also uses recalibration and indel realignment
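One simple way to compare the output of two pipelines is to treat each call set as a set of (chromosome, position, ref, alt) keys and count shared and pipeline-specific calls. This is a hedged sketch of that idea, not the comparison method used in the study; the function name and the toy calls are assumptions for illustration.

```python
def compare_call_sets(calls_a, calls_b):
    """Summarize agreement between two variant call sets.

    Each call is a hashable key such as (chrom, pos, ref, alt).
    """
    a, b = set(calls_a), set(calls_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Jaccard index: shared calls over all distinct calls.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Toy calls for illustration only.
pipeline_a = [("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"), ("chr2", 50, "G", "A")]
pipeline_b = [("chr1", 1000, "A", "G"), ("chr2", 50, "G", "A"), ("chr3", 7, "T", "C")]
summary = compare_call_sets(pipeline_a, pipeline_b)
```

Exact key matching is a simplification: indels in particular can be represented differently by different callers, so a real comparison would normalize variants first.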
Indel Handling • If indels are important to an analysis then this needs to be taken into account, from the planning stage onwards • BWA does better when indel realignment is used, in combination with paired data
Quality and Coverage • Some of these low-quality variants can be removed by filtering, after variant call • Quality and coverage cut-offs have to be parameterized properly in the alignment and variant call • Quality recalibration can also help to reduce low-quality false positives
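Post-call filtering on quality and coverage can be sketched as a check on the QUAL column and the DP key of the INFO column of each VCF record. The thresholds below (`min_qual=30.0`, `min_depth=10`) are placeholder assumptions, not values from either pipeline, and would need tuning per data set.

```python
def passes_cutoffs(vcf_line, min_qual=30.0, min_depth=10):
    """Return True if a VCF data line meets both cut-offs.

    Columns follow the VCF layout:
    CHROM POS ID REF ALT QUAL FILTER INFO ...
    """
    fields = vcf_line.rstrip("\n").split("\t")
    qual_text, info_text = fields[5], fields[7]
    if qual_text == ".":
        return False  # a missing QUAL is treated as failing here
    info = dict(
        item.split("=", 1) for item in info_text.split(";") if "=" in item
    )
    depth = int(info.get("DP", 0))  # a missing DP counts as zero depth
    return float(qual_text) >= min_qual and depth >= min_depth


# Toy records for illustration only.
records = [
    "chr1\t1000\t.\tA\tG\t50.0\t.\tDP=25",
    "chr1\t2000\t.\tC\tT\t12.3\t.\tDP=40",
    "chr1\t3000\t.\tG\tA\t60.0\t.\tDP=4;AF=0.5",
]
kept = [r for r in records if passes_cutoffs(r)]
```

In this toy input only the first record survives: the second fails on quality and the third on depth.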
Splicing and Promoters • Most of the exon kits also provide variant calls close to the coding regions • These should be included in the analysis if possible
Fewer false positives in complex regions 4. Higher coverage.
Fewer false positives in complex regions 5. Lower coverage.
Complex regions • Mismappings due to pseudogenes or repeats – or just complex regions? • Sometimes more coverage can actually be bad • Need to watch out for non-specific read mappings (reads mapping to multiple places)
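Non-specific mappings can often be spotted through the mapping quality (MAPQ) column of the SAM record. The sketch below assumes the aligner reports a MAPQ of 0 for reads that map equally well to multiple places, which is how BWA behaves but is not guaranteed for every aligner; the `min_mapq` threshold and the toy reads are illustrative.

```python
def is_ambiguous(sam_line, min_mapq=1):
    """Return True for a mapped read whose MAPQ is below min_mapq.

    SAM columns: QNAME FLAG RNAME POS MAPQ CIGAR ...
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    mapq = int(fields[4])
    unmapped = bool(flag & 0x4)  # bit 0x4 marks an unmapped read
    return not unmapped and mapq < min_mapq


# Toy alignments for illustration only.
reads = [
    "r1\t99\tchr1\t100\t60\t10M\t=\t300\t210\tACGTACGTAC\tIIIIIIIIII",
    "r2\t83\tchr1\t500\t0\t10M\t=\t300\t-210\tACGTACGTAC\tIIIIIIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
ambiguous_names = [r.split("\t")[0] for r in reads if is_ambiguous(r)]
```

Unmapped reads are excluded on purpose, since their MAPQ of 0 says nothing about mapping specificity.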
Very Complex Regions • Some regions are extremely difficult to map with any technique • A different approach may be required for mapping/alignment • A different approach may be required for variant calling (local de novo, phasing etc.)
Problems with sex chromosomes • There are many heterozygous calls in the X and Y chromosomes that are certainly false positives or incorrect calls. • This is true for both pipelines; the read specificity and variant call procedure have to be improved for these chromosomes.
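Suspicious heterozygous calls on the sex chromosomes can be listed for review by reading the GT field of each VCF record. This is a rough sketch under strong assumptions: a single-sample VCF, a sample in which heterozygous X and Y calls are not expected (for example a male sample), and no special handling of pseudoautosomal regions, where heterozygous calls can be genuine. The helper names are hypothetical.

```python
SEX_CHROMS = {"X", "Y", "chrX", "chrY"}


def is_het(genotype):
    """Return True for a diploid genotype with two different alleles."""
    alleles = genotype.replace("|", "/").split("/")
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


def het_sex_chrom_calls(vcf_lines):
    """List (chrom, pos) for heterozygous calls on X or Y.

    Assumes a single-sample VCF: FORMAT is column 9, the sample column 10.
    Pseudoautosomal regions are NOT excluded in this sketch.
    """
    flagged = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        if chrom in SEX_CHROMS and is_het(sample.get("GT", ".")):
            flagged.append((chrom, pos))
    return flagged


# Toy records for illustration only.
vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
    "chrX\t100\t.\tA\tG\t50\t.\tDP=20\tGT:DP\t0/1:20",
    "chrX\t200\t.\tC\tT\t50\t.\tDP=20\tGT:DP\t1/1:20",
    "chr1\t300\t.\tG\tA\t50\t.\tDP=20\tGT:DP\t0/1:20",
    "chrY\t400\t.\tT\tC\t50\t.\tDP=20\tGT\t1|0",
]
flagged = het_sex_chrom_calls(vcf)
```

The flagged positions are candidates for inspection rather than confirmed errors; whether a call is truly wrong depends on the sample and the region.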
Summary • These kinds of comparative studies can be useful in analyzing the effectiveness of exome sequencing • Different exome kits can give different results • The data analysis and variant call tools chosen for the analysis can also have a big impact • There is some potential to improve the quality of the customer's exome analysis pipeline