Detecting Copy Number Variation With Short Paired Reads
Paul Medvedev, Marc Fiume, Misko Dzamba, Tim Smith, Adrian Dalca, Mike Brudno
Department of Computer Science, University of Toronto
Genome Informatics 2009
Copy Number Variants (CNVs)
• Large regions that appear a different number of times in different individuals
• CNVs are associated with a number of diseases
• Input
  • reference human genome
  • sequenced donor genome
• Output
  • CNV annotations in the reference
Previous Approach
• Using depth of coverage (DOC): Campbell et al. 2008, Chiang et al. 2009, Yoon et al. 2009
[Figure labels: Ref, DOC, CNV]
Our Approach
• Capture adjacency information about the donor genome in a graph.
• Use these adjacencies together with DOC.
Donor Graph
Step 1: represent reference adjacencies
Donor Graph
Step 2: represent donor adjacencies
[Figure labels: Donor, Ref]
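The two graph-building steps above can be sketched in a few lines. This is a simplified illustration and not the authors' data structure: it assumes the reference has already been cut into numbered segments, that each donor adjacency (for example, one supported by paired reads) arrives as a pair `(u, v)` meaning "segment `u` is followed by segment `v` in the donor", and that strand orientation can be ignored. The function name `build_donor_graph` and the edge labels `"ref"` and `"donor"` are hypothetical.

```python
from collections import defaultdict


def build_donor_graph(num_segments, donor_adjacencies):
    """Return {segment: sorted list of (next_segment, edge_kind)}.

    Assumption: segments are numbered 0..num_segments-1 in reference order,
    and orientation is ignored (a plain directed graph).
    """
    edges = defaultdict(set)

    # Step 1: reference adjacencies link each segment to its successor.
    for i in range(num_segments - 1):
        edges[i].add((i + 1, "ref"))

    # Step 2: donor adjacencies add the links observed in the donor.
    for u, v in donor_adjacencies:
        if 0 <= u < num_segments and 0 <= v < num_segments:
            edges[u].add((v, "donor"))

    return {i: sorted(edges[i]) for i in range(num_segments)}


# Hypothetical example: segment 2 is followed by segment 1 in the donor,
# which would allow a walk such as 0, 1, 2, 1, 2, 3 (a duplication of 1-2).
graph = build_donor_graph(4, [(2, 1)])
```

Any walk through this graph spells out a candidate donor genome; the number of times a segment is visited is its proposed copy count.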
Which walk is the donor?
[Figure labels: Path, Ref, DOC, CNV; values along the reference: 1 2 2 1 1 1 1]
• Use depth of coverage (DOC)
• We find a path that is “most faithful” to the DOC
  • use a probabilistic model to score “faithfulness”
  • use network flow to find the traversal counts of the walk with the maximum score
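To make the idea of scoring "faithfulness" concrete, here is a minimal sketch under stated assumptions. The slides do not specify the probabilistic model, so this example assumes a Poisson model in which a segment traversed `c` times is expected to receive `c` times its single-copy read count. The names `walk_score`, `observed_reads`, and `reads_per_copy`, and the small `epsilon` used for zero-copy segments, are all hypothetical. The network-flow optimization itself is not reproduced; the sketch only compares two candidate walks by score.

```python
import math
from collections import Counter


def poisson_log_pmf(k, mean):
    """log P(K = k) for K ~ Poisson(mean), with mean > 0."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def walk_score(walk, observed_reads, reads_per_copy, epsilon=0.01):
    """Sum of per-segment log-likelihoods of the observed read counts.

    walk            list of segment ids in the order they are traversed
    observed_reads  {segment: number of donor reads mapped to it}
    reads_per_copy  {segment: expected reads for a single copy}
    epsilon         assumed stand-in copy count for unvisited segments,
                    so that the Poisson mean stays positive
    """
    traversals = Counter(walk)
    total = 0.0
    for segment, reads in observed_reads.items():
        copies = traversals.get(segment, 0)
        mean = max(copies, epsilon) * reads_per_copy[segment]
        total += poisson_log_pmf(reads, mean)
    return total


# Hypothetical data: segment 1 has twice the expected single-copy coverage.
observed = {0: 100, 1: 200, 2: 100}
per_copy = {0: 100, 1: 100, 2: 100}

reference_walk = [0, 1, 2]
duplicated_walk = [0, 1, 1, 2]

best = max([reference_walk, duplicated_walk],
           key=lambda w: walk_score(w, observed, per_copy))
```

Under these made-up counts the walk that visits segment 1 twice scores higher, which is the kind of preference the flow formulation is meant to find globally rather than by enumerating walks.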
Preliminary Results
• NA18507 individual sampled with Illumina, hg18 reference
• Total of 3730 CNV calls
  • 2165 losses, 1565 gains
[Figure: Size Distribution]
Preliminary Results
Sensitivity: Kidd et al.’s (2008) LOSS calls (141 calls)
• Percentage of Kidd et al.’s calls that overlap one of ours:
• After randomly shuffling our calls:
Specificity: Database of Genomic Variants (DGV)
• Percentage of our calls that overlap with DGV:
• After randomly shuffling our calls:
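The overlap-versus-shuffle comparison on this slide can be illustrated as follows. This is a sketch under assumptions, not the authors' evaluation code: it treats calls as half-open `(start, end)` intervals on a single chromosome, and it shuffles by moving each call to a uniformly random position while keeping its length. The slides do not say how the shuffling was done, and all function names here are hypothetical.

```python
import random


def overlaps(a, b):
    """True if half-open intervals a = (start, end) and b share a base."""
    return a[0] < b[1] and b[0] < a[1]


def fraction_overlapping(queries, targets):
    """Fraction of query intervals that overlap at least one target."""
    if not queries:
        return 0.0
    hits = sum(1 for q in queries if any(overlaps(q, t) for t in targets))
    return hits / len(queries)


def shuffle_calls(calls, chrom_length, rng):
    """Move each call to a random position, preserving its length."""
    shuffled = []
    for start, end in calls:
        length = end - start
        new_start = rng.randrange(0, chrom_length - length + 1)
        shuffled.append((new_start, new_start + length))
    return shuffled


# Hypothetical toy data on a chromosome of length 1000.
our_calls = [(100, 200), (500, 600)]
validated = [(150, 160), (900, 950)]

rng = random.Random(0)  # fixed seed so the shuffle is repeatable
observed = fraction_overlapping(validated, our_calls)
baseline = fraction_overlapping(validated,
                                shuffle_calls(our_calls, 1000, rng))
```

Sensitivity corresponds to calling `fraction_overlapping(validated, our_calls)`, and specificity to swapping the two arguments with a database such as DGV as the target set. In practice the shuffle would be repeated many times to get a stable baseline.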
Conclusion
• Presented a method for detecting CNVs
• Combines
  • depth-of-coverage
  • paired-end mapping
• Improves
  • compared to paired-end mapping:
    • increased sensitivity in repeating regions
      • segmental duplications
  • compared to depth-of-coverage methods:
    • better resolution (1Kb vs. 30Kb)
• Global optimization approach