Considerations for Analyzing Targeted NGS Data: BRCA
Tim Hague, CTO
Introduction
• BRCA1 and BRCA2 are best known as 'cancer susceptibility' genes
• The proteins they encode actually repair damage in DNA
• Large number of known deleterious mutations
• Disproportionate number of indels
History
• Mary-Claire King discovered BRCA1 and BRCA2 and published their function
• Myriad Genetics won the patent
[Figure: distribution of known BRCA1 deletions >3 bp, by indel size (nt)]
Dominique Stoppa-Lyonnet, Curie Institute: "Large-scale deletions could account for as many as one-third of all BRCA1 mutations in some populations"
• BRCA1 and BRCA2 are tumor suppressor genes
• 82% lifetime chance of developing breast/ovarian cancer
• Science 2004, 306:2187-2191
• >1,500 deleterious BRCA mutations
• 17 kbp coding region with a mutation rate of 1/2000
• NGS-based BRCA screening: Leeds (UK), Newgene (UK), Ghent (Belgium)
• DIY genetic test published by Salzberg
• 82% chance of cancer
• >90% chance of being false positive/negative
What kind of NGS data?
• False negatives must be avoided
• Precision of both the sequencing data and the data analysis is key
• Looking for indels, so indel detection ability is a key criterion
• Repeats are also an issue in the BRCA region
Homopolymer Errors
• Homopolymer errors look like small indels and can cause noise
• A problem for: Roche 454, Ion Torrent
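One way to keep homopolymer noise out of an indel list is to flag short indel calls that fall in or next to a long run of identical bases. The sketch below is only an illustration of that idea: the function names, the `min_run` and `max_indel` thresholds, and the example sequence are all assumptions, not part of the analysis described in these slides.

```python
def homopolymer_run_at(seq, pos):
    """Length of the run of identical bases that contains seq[pos] (0-based)."""
    base = seq[pos]
    start = pos
    while start > 0 and seq[start - 1] == base:
        start -= 1
    end = pos
    while end + 1 < len(seq) and seq[end + 1] == base:
        end += 1
    return end - start + 1


def is_suspect_indel(seq, pos, indel_len, min_run=5, max_indel=2):
    """Flag a short indel anchored at `pos` that touches a homopolymer run.

    min_run and max_indel are hypothetical thresholds chosen for illustration.
    """
    if indel_len > max_indel:
        return False  # longer indels are unlikely to be homopolymer miscounts
    # Check the anchor base and the base after it, where the indel begins.
    candidates = [p for p in (pos, pos + 1) if 0 <= p < len(seq)]
    return any(homopolymer_run_at(seq, p) >= min_run for p in candidates)


# Hypothetical reference fragment containing a run of six A's.
ref = "ACGTAAAAAAGCT"
flag_next_to_run = is_suspect_indel(ref, 3, 1)
flag_elsewhere = is_suspect_indel(ref, 1, 1)
```

A flagged call is not necessarily wrong; the flag only marks it for closer inspection, since false negatives must be avoided.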
Long Reads
• Read length is a limiting factor for insertion detection
• When searching for indels, long reads can help
• Long reads can also help with repeats
• Roche 454 has the longest reads
Paired Reads
• Paired reads can also help to increase the effective 'read length'
• Illumina MiSeq now has a 2x250 bp protocol
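To make the 'effective read length' idea concrete, the sketch below works out how much of the reference a single read pair touches. The fragment lengths used are hypothetical; only the 2x250 bp read length comes from the slide.

```python
def pair_footprint(read_len, fragment_len):
    """Describe what one read pair covers on the reference.

    read_len:     length of each mate (e.g. 250 for a 2x250 bp run)
    fragment_len: length of the sequenced fragment (hypothetical values below)
    """
    if fragment_len < read_len:
        raise ValueError("fragment shorter than a single read")
    overlap = max(0, 2 * read_len - fragment_len)  # mates read the same bases
    gap = max(0, fragment_len - 2 * read_len)      # unsequenced middle part
    return {
        "span": fragment_len,                # distance anchored by the pair
        "covered": 2 * read_len - overlap,   # distinct bases actually read
        "overlap": overlap,
        "unsequenced_gap": gap,
    }


long_fragment = pair_footprint(250, 600)
short_fragment = pair_footprint(250, 400)
```

With a 600 bp fragment the pair anchors a 600 bp span while reading 500 bases; with a 400 bp fragment the mates overlap by 100 bp and behave like one 400 bp read.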
Comparison of 9 open source and commercial NGS analysis software packages
• In silico test with a mutated reference BRCA gene
• 2211 known BRCA variants: 1341 SNPs, 320 insertions and 551 deletions
• Full GATK pipeline used for variant calling, including quality recalibration and indel realignment
BWA
                      Paired End   Single End
Overall sensitivity   99.2%        94.5%
SNPs found            99.5%        99.5%
Insertions found      99.4%        89.4%
Deletions found       98.5%        85.5%
BWA
                   Paired End   Single End
False negatives    17           121
False positives    23           168
The longest (60 bp+) deletions were not found with either PE or SE data.
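The overall sensitivity figures for BWA can be reproduced from the false negative counts and the 2211 variants in the test set, taking sensitivity as (variants found) / (variants present). A minimal sketch, with function and variable names chosen only for illustration:

```python
def sensitivity(total_true, false_negatives):
    """Fraction of the known variants that were found."""
    return (total_true - false_negatives) / total_true


TOTAL_VARIANTS = 2211  # known BRCA variants in the in silico test
false_negatives = {"paired_end": 17, "single_end": 121}

# Sensitivity per library type, as a percentage rounded to one decimal.
results = {
    name: round(100 * sensitivity(TOTAL_VARIANTS, fn), 1)
    for name, fn in false_negatives.items()
}

for name, pct in results.items():
    print(f"{name}: {pct}%")
```

This gives 99.2% for paired end and 94.5% for single end, matching the values on the BWA slide.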
Other Tools
• Most other alignment tools showed a similar trend: much better results overall with paired data
• Only two of the tools tested found the longest deletions, even with paired data
Paired Reads - Conclusions
• Much better for reliable variant detection than single reads of equivalent length
• Provided much better coverage in the BRCA region (spanning small repeats)
If available, paired reads should be preferred.
Indel Detection - Conclusions
• Not all tools are good at finding indels
• Burrows-Wheeler based aligners can't find indels beyond a few base pairs in single reads, but can make better use of paired data if indel realignment is also used
• They still can't detect the longest indels (there is just a gap in coverage)
If indel detection is required, an indel-sensitive tool should be used.
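Because a missed long deletion shows up only as a gap in coverage, one simple follow-up is to scan the per-base depth of the targeted region for zero-coverage stretches. The sketch below assumes depth is already available as a list of integers; the `min_len` cutoff and the example depths are hypothetical.

```python
def zero_coverage_gaps(depth, min_len=20):
    """Return half-open (start, end) intervals where depth is 0 for at least min_len bases."""
    gaps = []
    start = None
    for i, d in enumerate(depth):
        if d == 0:
            if start is None:
                start = i  # a new zero-coverage stretch begins here
        elif start is not None:
            if i - start >= min_len:
                gaps.append((start, i))
            start = None
    # Handle a stretch that runs to the end of the region.
    if start is not None and len(depth) - start >= min_len:
        gaps.append((start, len(depth)))
    return gaps


# Hypothetical depth profile: a 60 bp hole and a 5 bp dip.
depth = [30] * 100 + [0] * 60 + [28] * 100 + [0] * 5 + [31] * 50
candidate_gaps = zero_coverage_gaps(depth)
```

A gap found this way is only a candidate: it may be a long deletion the aligner could not place, or simply a region that was not captured, so it still has to be checked with an indel-sensitive tool.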
Overall - Conclusions
• None of the alignment tools found all the variants
• The same data will almost certainly need to be analyzed with more than one tool to get sufficiently accurate results
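Analyzing the same data with more than one tool implies merging the resulting call sets. A minimal sketch of one way to do that is shown below, assuming each tool's calls have already been reduced to (chromosome, position, ref, alt) tuples; the tool names and variants are made up for illustration.

```python
from collections import defaultdict


def merge_calls(calls_by_tool):
    """Map each variant to the set of tools that reported it."""
    support = defaultdict(set)
    for tool, calls in calls_by_tool.items():
        for variant in calls:
            support[variant].add(tool)
    return dict(support)


def split_by_support(support, min_tools=2):
    """Separate variants seen by at least min_tools tools from the rest."""
    consensus = {v for v, tools in support.items() if len(tools) >= min_tools}
    single = set(support) - consensus
    return consensus, single


# Hypothetical call sets from two hypothetical tools.
calls = {
    "tool_a": {("chr17", 100, "A", "G"), ("chr17", 200, "AT", "A")},
    "tool_b": {("chr17", 100, "A", "G")},
}
support = merge_calls(calls)
consensus, single = split_by_support(support)
```

Since false negatives must be avoided, the single-tool calls are kept for review instead of being discarded. Indels would also need to be normalized to the same representation (for example left-aligned) before comparison, otherwise the same event can appear under different keys.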