
Considerations for Analyzing Targeted NGS Data BRCA

Tim Hague, CTO


Presentation Transcript


  1. Considerations for Analyzing Targeted NGS Data: BRCA. Tim Hague, CTO

  2. Introduction • BRCA1 and BRCA2 are best known as 'cancer susceptibility' genes • The proteins they encode actually repair damage in DNA • Large number of known deleterious mutations • Disproportionate number of indels

  3. History • Mary-Claire King discovered BRCA1 and BRCA2 and published their function • Myriad Genetics won the patent

  4. Distribution of known BRCA1 deletions >3 bp (figure; x-axis: indel size, nt)

  5. Dominique Stoppa-Lyonnet at the Curie Institute: "Large scale deletions could account for as many as one-third of all BRCA1 mutations in some populations"

  6. • BRCA1 and BRCA2 are tumor suppressor genes • 82% lifetime chance of developing breast/ovarian cancer. Science 2004, 306:2187-2191 • >1,500 deleterious BRCA mutations • 17 kbp coding region with mutation rate of 1/2000 • NGS-based BRCA screening: Leeds UK, Newgene UK, Ghent Belgium • DIY genetic test published by Salzberg

  7. • 82% chance of cancer • >90% chance of being false positive/negative

  8. What kind of NGS data? • False negatives must be avoided • Precision of both the sequencing data and the data analysis is key • Looking for indels: indel detection ability is a key criterion • Repeats are also an issue in the BRCA region

  9. BRCA Repeats

  10. Homopolymer Errors • Homopolymer errors look like small indels and can cause noise • A problem for: Roche 454, Ion Torrent
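One way to triage this noise is to flag indel calls that fall in or next to a homopolymer run in the reference. The following is only a minimal sketch: the function names and the `min_len=4` threshold are assumptions chosen for illustration, not something taken from the presentation.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=4):
    """Return (start, base, length) for each run of a single base
    that is at least min_len long. Positions are 0-based.
    min_len=4 is an illustrative threshold, not a recommended value."""
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


def in_homopolymer(indel_pos, runs):
    """True if a 0-based indel position lies inside a run or
    directly next to one of its ends."""
    return any(start - 1 <= indel_pos <= start + length
               for start, _, length in runs)


example = "ACGTAAAAAAGCTTTTTCG"
runs = homopolymer_runs(example)
print(runs)
print(in_homopolymer(7, runs), in_homopolymer(1, runs))
```

An indel flagged this way is not necessarily an error; the flag only marks calls that deserve a closer look on platforms prone to homopolymer miscounts.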

  11. Long Reads • Read length is a limiting factor for insertion detection • When searching for indels, long reads can help • Long reads can also help with repeats • Roche 454 has the longest reads

  12. Real examples with Roche 454 data

  13. Real examples with Roche 454 data

  14. Paired Reads • Paired reads can also help to increase effective 'read length' • Illumina MiSeq now has 2x250bp protocol

  15. Comparison of 9 open source and commercial NGS analysis software packages • In silico test with mutated reference BRCA gene • 2211 known BRCA variants: 1341 SNPs, 320 insertions and 551 deletions • Full GATK pipeline used for variant calling, including quality recalibration and indel realignment
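The presentation does not say how the mutated reference was built. As a rough illustration of the idea, the sketch below applies a single SNP, insertion or deletion to a reference string. The function name `apply_variant`, the 0-based coordinates and the toy sequence are all assumptions made for this example.

```python
def apply_variant(ref, pos, ref_allele, alt_allele):
    """Return a copy of ref with ref_allele at 0-based pos replaced
    by alt_allele. Covers SNPs, insertions and deletions when alleles
    are written VCF-style with a shared leading base."""
    observed = ref[pos:pos + len(ref_allele)]
    if observed != ref_allele:
        raise ValueError(
            f"reference mismatch at {pos}: expected {ref_allele!r}, "
            f"found {observed!r}")
    return ref[:pos] + alt_allele + ref[pos + len(ref_allele):]


toy_ref = "ACGTACGTAC"  # made-up sequence, not real BRCA

print(apply_variant(toy_ref, 3, "T", "G"))     # SNP
print(apply_variant(toy_ref, 4, "A", "ATT"))   # 2-base insertion
print(apply_variant(toy_ref, 2, "GTAC", "G"))  # 3-base deletion
```

Simulated reads drawn from such a mutated sequence can then be aligned to the unmodified reference, so that every expected variant is known in advance.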

  16. BWA • Overall sensitivity: 99.2% paired end (PE), 94.5% single end (SE) • SNPs found: 99.5% PE, 99.5% SE • Insertions found: 99.4% PE, 89.4% SE • Deletions found: 98.5% PE, 85.5% SE

  17. BWA • False negatives: 17 paired end (PE), 121 single end (SE) • False positives: 23 PE, 168 SE • The longest (60bp+) deletions were not found, either with PE or SE data
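The false negative counts can be related to the overall sensitivity figures quoted for BWA, assuming sensitivity is defined as the share of the 2211 known variants that were recovered. That definition is an assumption about how the percentages were derived; the short check below uses only numbers from the slides.

```python
def sensitivity(total_known, false_negatives):
    """Fraction of known variants that were detected."""
    return (total_known - false_negatives) / total_known


TOTAL_KNOWN = 2211  # known BRCA variants in the in silico test

for label, fn in (("paired end", 17), ("single end", 121)):
    print(f"{label}: {sensitivity(TOTAL_KNOWN, fn):.1%}")
# prints:
# paired end: 99.2%
# single end: 94.5%
```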

  18. Indel sizes - BWA Single End

  19. Indel sizes - BWA Paired End

  20. Other Tools • Most other alignment tools showed a similar trend – much better results overall with Paired data • Only two of the tools tested found the longest deletions, even with Paired data

  21. Paired Reads - Conclusions • Much better for reliable variant detection than equivalent length single reads • Provided much better coverage in the BRCA region (spanning small repeats) • If available, paired reads should be preferred

  22. Indel Detection - Conclusions • Not all tools are good at finding indels • Burrows-Wheeler based aligners can't find indels beyond a few base pairs in single reads, but can make better use of paired data, provided indel realignment is also used • They still can't detect the longest indels (there is just a gap in coverage) • If indel detection is required, an indel sensitive tool should be used
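Because a missed long deletion shows up only as a gap in coverage, one simple follow-up is to scan per-base depth for zero-coverage stretches. This is a sketch under stated assumptions: the function name, the `min_gap` parameter and the toy depth values are invented for illustration, and a gap may equally reflect failed capture or amplification rather than a deletion.

```python
def coverage_gaps(depth, min_gap=1):
    """Return half-open (start, end) intervals where per-base depth
    is zero for at least min_gap consecutive positions."""
    gaps = []
    start = None
    for i, d in enumerate(depth):
        if d == 0:
            if start is None:
                start = i          # a new zero-depth stretch begins
        elif start is not None:
            if i - start >= min_gap:
                gaps.append((start, i))
            start = None
    # Close a stretch that runs to the end of the region.
    if start is not None and len(depth) - start >= min_gap:
        gaps.append((start, len(depth)))
    return gaps


toy_depth = [5, 6, 0, 0, 0, 7, 8, 0]  # made-up depth values
print(coverage_gaps(toy_depth, min_gap=2))
```

Gaps found this way are candidates for review with an indel sensitive tool, not variant calls in their own right.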

  23. Overall - Conclusions • None of the alignment tools found all the variants • Getting sufficiently accurate results will almost certainly require analyzing the same data with more than one tool
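Analyzing the same data with more than one tool implies merging the resulting call sets. A minimal sketch of one possible merge follows: variants are keyed as `(chrom, pos, ref, alt)` tuples, and the tool names and coordinates are placeholders invented for the example, not results from the presentation.

```python
def combine_calls(calls_by_tool):
    """Map each variant to the sorted list of tools that called it.
    calls_by_tool: dict of tool name -> set of (chrom, pos, ref, alt)."""
    all_calls = set().union(*calls_by_tool.values())
    return {
        variant: sorted(tool for tool, calls in calls_by_tool.items()
                        if variant in calls)
        for variant in all_calls
    }


# Placeholder tool names and made-up coordinates.
calls = {
    "tool_a": {("chr17", 100, "A", "G"), ("chr17", 250, "CT", "C")},
    "tool_b": {("chr17", 100, "A", "G"), ("chr13", 400, "G", "GAA")},
}

support = combine_calls(calls)
for variant in sorted(support):
    print(variant, support[variant])
```

Since false negatives are the main concern here, the union of calls is the natural starting point, with the per-variant tool list showing which calls are supported by only one tool and may need confirmation.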
