Reference mapping and variant detection

Reference mapping and variant detection. Peter Tsai, Bioinformatics Institute, University of Auckland. Reference mapping. Mapping is the process of comparing each read with the reference genome. Many different software packages are available to perform reference mapping.

Presentation Transcript


  1. Reference mapping and variant detection. Peter Tsai, Bioinformatics Institute, University of Auckland

  2. Reference mapping
     • Mapping is the process of comparing each read with the reference genome.
     • Many different software packages are available to perform reference mapping
        • Multiple placement of reads (multi-hits)
        • Allow gaps
        • Don't allow gaps at all
        • Limits on the number of mismatches
     • Assess your mapping results
        • % of total reads mapped
        • % of uniquely mapped reads
        • Coverage statistics, variance in depth
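The assessment metrics listed above can be sketched in a few lines. This is a minimal illustration, not the output of any particular mapper: the function name `mapping_summary` and its inputs (a per-read count of reference placements and a per-position depth list) are assumptions made for the example.

```python
from statistics import mean, pvariance


def mapping_summary(hit_counts, depths):
    """Summarise a mapping run.

    hit_counts: number of reference placements found for each read
                (0 = unmapped, 1 = uniquely mapped, >1 = multi-hit).
    depths:     read depth observed at each reference position.
    Both inputs are hypothetical structures chosen for this sketch.
    """
    total = len(hit_counts)
    mapped = sum(1 for h in hit_counts if h >= 1)
    unique = sum(1 for h in hit_counts if h == 1)
    return {
        "pct_mapped": 100.0 * mapped / total,
        "pct_unique": 100.0 * unique / total,
        "mean_depth": mean(depths),
        "depth_variance": pvariance(depths),  # population variance of depth
    }


# Five reads: three unique, one multi-hit, one unmapped.
summary = mapping_summary(hit_counts=[1, 1, 0, 3, 1], depths=[10, 12, 8, 10])
print(summary)
```

A large depth variance relative to the mean is one sign of the uneven coverage discussed later in the deck.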

  3. Mapped read depth

  4. Variant detection
     • Identification of point mutations and short insertions and deletions.
     • We go through every column of the alignment and see how many alleles are found and how many differ from the reference genome.

     Reference: ACGAAACGTAGTGAGGAC-GTA
     Sample:    ACCAAACGTAGAGAGGACCGTA
                  ^        ^      ^
                  SNP      SNP    indel
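The column-by-column scan can be illustrated on the slide's own example. The sketch below is a simplification: it compares a single sample row against the reference row, whereas a real caller counts alleles across all reads covering each column. The function name `find_variants` is hypothetical, and the reported numbers are alignment columns, not reference coordinates.

```python
def find_variants(ref_aln, sample_aln):
    """Compare two gapped alignment rows of equal length, column by column.

    Returns (column, kind, ref_char, sample_char) for every column that differs.
    Columns are 1-based positions in the alignment.
    """
    if len(ref_aln) != len(sample_aln):
        raise ValueError("aligned rows must have the same length")
    variants = []
    for col, (r, s) in enumerate(zip(ref_aln, sample_aln), start=1):
        if r == s:
            continue
        if r == "-":
            kind = "insertion"  # sample has a base the reference lacks
        elif s == "-":
            kind = "deletion"   # reference has a base the sample lacks
        else:
            kind = "SNP"
        variants.append((col, kind, r, s))
    return variants


variants = find_variants("ACGAAACGTAGTGAGGAC-GTA", "ACCAAACGTAGAGAGGACCGTA")
for v in variants:
    print(v)
```

On the slide's sequences this reports two SNPs (columns 3 and 12) and one insertion (column 19).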

  5. Complexity of variant detection
     • Second-generation sequencing is NOT single-molecule sequencing.
     • Because of PCR amplification, some DNA fragments are sequenced more often than others, which results in uneven coverage across the genome.
     • This can provide false support in variant detection, as we are usually more confident in variants that have higher coverage support.
     • Solution: mark or remove exact duplicate reads when doing variant detection.
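One way to picture the duplicate-marking step is shown below. It is a sketch under a stated assumption: two reads count as exact duplicates when they share the same mapped start position, strand, and sequence. The tuple layout and the function name `mark_duplicates` are invented for the example, and production tools may use more elaborate criteria.

```python
def mark_duplicates(reads):
    """Flag exact duplicate reads, keeping the first copy unflagged.

    reads: iterable of (name, start, strand, sequence) tuples
           (a hypothetical layout chosen for this sketch).
    Returns a list of (name, is_duplicate) in input order.
    """
    seen = set()
    marked = []
    for name, start, strand, seq in reads:
        key = (start, strand, seq)
        marked.append((name, key in seen))  # True if an identical read came earlier
        seen.add(key)
    return marked


reads = [
    ("r1", 100, "+", "ACGT"),
    ("r2", 100, "+", "ACGT"),  # exact copy of r1
    ("r3", 100, "-", "ACGT"),  # same start, opposite strand
    ("r4", 101, "+", "ACGT"),  # same sequence, different start
]
marked = mark_duplicates(reads)
print(marked)
```

Marking rather than deleting keeps the reads available for other analyses while letting the variant caller skip them.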

  6. Complexity of variant detection
     • Cloning process artifacts (e.g. PCR-induced mutations).
     • Error rate associated with the sequence reads.
     • Error rate associated with the mapping.
     • Reliability of the reference genome.

  7. Calling a variant
     • Apply a hard cut-off on the percentage of bases that differ from the reference base.
     • 75% is the minimum threshold for a variant to be called a homozygous variant.
     • A percentage-based cut-off assumes you have sufficient coverage.
     • Examples: A (ref): 0%, G: 100%; A (ref): 7%, T: 93%
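The percentage cut-off can be written as a small function. This is a sketch only: the 75% threshold comes from the slide, while the `min_depth` guard of 30, the label strings, and the function name `call_site` are assumptions added for illustration. Sites that do not reach the cut-off are left unresolved rather than guessed.

```python
def call_site(base_counts, ref_base, hom_cutoff=75.0, min_depth=30):
    """Apply a hard percentage cut-off at one alignment column.

    base_counts: dict mapping base -> number of reads showing that base.
    min_depth is an assumed guard, since a percentage is only meaningful
    with sufficient coverage.
    Returns (label, alt_base, alt_percent).
    """
    depth = sum(base_counts.values())
    if depth < min_depth:
        return ("insufficient_coverage", None, 0.0)
    # Most frequent non-reference base and its read count.
    alt, alt_count = max(
        ((b, n) for b, n in base_counts.items() if b != ref_base),
        key=lambda item: item[1],
        default=(None, 0),
    )
    if alt_count == 0:
        return ("matches_reference", None, 0.0)
    alt_pct = 100.0 * alt_count / depth
    if alt_pct >= hom_cutoff:
        return ("homozygous_variant", alt, alt_pct)
    return ("unresolved", alt, alt_pct)


# Slide example: A (ref) 7%, T 93%, here with 100 reads.
clear_call = call_site({"A": 7, "T": 93}, ref_base="A")
# Mixed column (18/0/55/27); treating A as the reference is an assumption.
mixed_call = call_site({"A": 18, "C": 0, "G": 55, "T": 27}, ref_base="A")
print(clear_call, mixed_call)
```

The mixed column shows why a single hard threshold is not enough on its own: 55% G falls below the homozygous cut-off yet is clearly not reference.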

  8. When to call a variant? A: 18%, C: 0%, G: 55%, T: 27%

  9. Alignment considerations
     • Perform local realignment and calculate a mapping score for each candidate alignment to determine which one is better.

  10. What depth do I need?

  11. Factors to consider
     • Read length
        • Longer reads are more likely to be mapped with high confidence
     • Sequencing depth
        • Require sufficient depth, ~30x
     • Base call quality for each supporting base
        • Use high quality bases, Q30
     • Mapping quality
        • Local realignment to improve variant calling
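The depth and quality factors can be combined into a simple per-site check. The sketch below relies on the standard Phred definition, Q = -10 log10(p), so Q30 corresponds to an error probability of about 1 in 1000. The Q30 and ~30x values come from the slide; the mapping-quality threshold of 20, the observation layout, and the function names are assumptions for illustration.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to an error probability: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def usable_depth(observations, min_base_q=30, min_mapq=20):
    """Count reads at one site that pass both quality filters.

    observations: (base_quality, mapping_quality) pairs, one per read
                  covering the site (hypothetical layout).
    min_mapq=20 is an assumed threshold, not a value from the slides.
    """
    return sum(1 for bq, mq in observations if bq >= min_base_q and mq >= min_mapq)


def enough_support(observations, min_depth=30, **thresholds):
    """True if the filtered depth reaches the required depth (~30x on the slide)."""
    return usable_depth(observations, **thresholds) >= min_depth


# 30 good reads, 5 with low base quality, 5 with low mapping quality.
site = [(35, 60)] * 30 + [(20, 60)] * 5 + [(35, 5)] * 5
print(phred_to_error_prob(30), usable_depth(site), enough_support(site))
```

Counting depth only after filtering means a site with 40 raw reads can still fall short if many of them are low quality.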
