
Genomic Arrays: Tools for cancer gene discovery




  1. Ian Roberts, MRC Cancer Cell Unit, Hutchison MRC Research Centre, ir210@cam.ac.uk. Genomic Arrays: Tools for cancer gene discovery

  2. What’s a genomic array?
     • A platform of regularly spaced genomic sequences
       • All known genes or a subset of genes of interest
     • A tool for querying the genome about damage
       • Genomic gains (oncogenes)
       • Genomic losses (tumour suppressor genes)
     • Applications
       • Research: disease gene discovery
       • Clinical: diagnostic tests

  3. Comparative genomic hybridisation
     • Array platform + available probe: tumour DNA (test) and normal DNA (reference)
     • GAIN: more test probe than reference probe (oncogene)
     • LOSS: reference probe in excess of test (tumour suppressor)
     • The vast majority of the genome is normal
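The gain/loss rule on this slide can be sketched as a comparison of test and reference signal on a log2 scale. This is a minimal illustration only: the function name `call_spot` and the ±0.3 cut-off are assumptions, not values from the presentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify one array spot from its test (tumour) and
# reference (normal) signal intensities. The +/-0.3 log2 cut-off is an
# illustrative assumption, not a value taken from the presentation.
call_spot() {
    local test_signal=$1 ref_signal=$2 threshold=${3:-0.3}
    awk -v t="$test_signal" -v r="$ref_signal" -v th="$threshold" 'BEGIN {
        ratio = log(t / r) / log(2)          # log2(test / reference)
        if (ratio > th)       state = "GAIN"
        else if (ratio < -th) state = "LOSS"
        else                  state = "NORMAL"
        printf "%.2f %s\n", ratio, state
    }'
}

call_spot 2000 1000   # twice as much test signal as reference
call_spot 500 1000    # half as much test signal as reference
call_spot 1000 1000   # balanced signal
```

A fixed per-spot threshold is the simplest possible rule; the segmentation algorithms named later in the deck work on runs of neighbouring spots instead.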

  4. New generation arrays produce large amounts of data
     • Agilent 244K array: 243,504 defined spots
     • Raw data are foreground and background signal intensities in two channels
     • Median ratio of foreground is important

  5. aCGH data analysis ... using camgrid

  6. Genomic array analysis strategy using R
     • Array data is processed by the snapCGH R package
     • Correct array data for background noise and mean distribution
     • Order data by genomic location
     • Apply an aCGH segmentation algorithm
     • Draw some plots
     • Determine significant findings (in-house R functions)
     • Common and minimum genomic regions of gain and loss
     • Summarise output
     Links:
     • R: www.cran.r-project.org
     • snapCGH: www.bioconductor.org
     • parrot R on camgrid: http://www.bio.cam.ac.uk/local/condor-parrot.html
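The first two processing steps (background correction, then ordering by genomic location) can be sketched outside R as follows. This is not what snapCGH does internally: the six-column input layout, the function name `correct_and_order` and the use of plain subtraction for background correction are all assumptions made for illustration, and chromosome labels are assumed to be numeric.

```shell
#!/usr/bin/env bash
# Assumed input (tab-separated): chr  position  test_fg  test_bg  ref_fg  ref_bg
# Output: chr  position  log2ratio, ordered by chromosome, then position.
# Plain subtraction stands in for background correction in this sketch.
correct_and_order() {
    awk -F'\t' '{
        t = $3 - $4                          # background-corrected test signal
        r = $5 - $6                          # background-corrected reference signal
        if (t > 0 && r > 0)                  # skip spots with no usable signal
            printf "%s\t%s\t%.3f\n", $1, $2, log(t / r) / log(2)
    }' | sort -t "$(printf '\t')" -k1,1n -k2,2n
}

# Three spots supplied out of genomic order.
printf '2\t500\t900\t100\t500\t100\n1\t300\t300\t100\t900\t100\n1\t100\t500\t100\t500\t100\n' |
    correct_and_order
```

Ordering matters because segmentation algorithms look for runs of adjacent spots with a shared copy-number level, so the input has to follow chromosome position.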

  7. Old vs. new genomic array plots (Chromosome 7)

  8. Significant region detection is computationally intensive

  9. Distributed aCGH analysis
     • Input data to snapCGH (e.g. 3 chrs, 2 analysis methods)
     • Condor Job 1: preprocess data; generate genome ordered data and condor dagman analysis batch files
     • Condor Job 2: perform aCGH analysis + region detection (1 run per Chr per analysis method)
       • Chr 1: DNAcopy, GLAD
       • Chr 2: DNAcopy, GLAD
       • Chr 3: DNAcopy, GLAD
     • DNAcopy dagman description file
       • Segmentation
       • Dagman job 1 … n: Step 1 … n. Clone call scoring
       • Score combining
       • CRI MRI Detection
     • Consolidate output
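The "1 run per Chr per analysis method" layout maps naturally onto a DAGMan description file. The sketch below generates one from a chromosome list and a method list. The node names, the `*.submit` file names and the single `consolidate` node are hypothetical; only the `JOB` and `PARENT ... CHILD` keywords are standard DAGMan syntax.

```shell
#!/usr/bin/env bash
# Hypothetical generator for a DAGMan description file: one analysis node per
# chromosome per method, followed by a single consolidation node.
# Node names and *.submit file names are illustrative assumptions.
write_dag() {
    local chromosomes=$1 methods=$2 nodes="" chr method node
    for chr in $chromosomes; do
        for method in $methods; do
            node="${method}_chr${chr}"
            echo "JOB ${node} ${node}.submit"
            nodes="${nodes} ${node}"
        done
    done
    echo "JOB consolidate consolidate.submit"
    # The consolidation node runs only after every analysis node has finished.
    echo "PARENT${nodes} CHILD consolidate"
}

# 3 chromosomes x 2 methods gives 6 independent analysis nodes.
write_dag "1 2 3" "DNAcopy GLAD"
```

Redirecting the output to a file and passing that file to `condor_submit_dag` would hand the whole batch to Condor; the analysis nodes have no dependencies on each other, so they can run in parallel across the grid.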

  10. Condor job scripting in BASH & R
      • BASH function
        • Responsible for producing required condor files for discrete jobs
        • Default_submit has 2 positional parameters
          • R script name: $1
          • Data files: $2
        • Initiates aCGH analysis on grid
      • Condor dagman R function set
        • R-scripter: writes the appropriate R script for the current job
        • R-condor-submitter: writes the condor job submission file
        • R-condor-executer: writes the condor job executable file
        • R-job-descriptor: writes the condor dagman description file
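A function with the two positional parameters described above might look like the following. Only the parameter positions come from the slide. The body, the wrapper name `run_R.sh`, the comma-separated data file list and the output file names are assumptions, and this sketch prints the submit description rather than submitting it.

```shell
#!/usr/bin/env bash
# Sketch of a default_submit-style function. Only the two positional
# parameters come from the slide; the wrapper name run_R.sh and the
# output/error/log file names are assumptions made for illustration.
default_submit() {
    local r_script=$1        # $1: R script name
    local data_files=$2      # $2: data files (assumed comma-separated)
    local job=${r_script%.R} # job name derived from the script name
    cat <<EOF
universe                = vanilla
executable              = run_R.sh
arguments               = ${r_script}
transfer_input_files    = ${r_script},${data_files}
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
output                  = ${job}.out
error                   = ${job}.err
log                     = ${job}.log
queue
EOF
}

default_submit segment_chr1.R chr1_data.txt
```

To start a job on the grid, the printed description would be saved to a file and passed to `condor_submit`.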

  11. End user abstraction – start_aCGH.sh
      • aCGH analysis undertaken by a single shell command
      • Manages array data input
      • Collects user specified parameters
        • Chromosome range
        • Segmentation algorithms
        • Significance thresholds
      • Links condor R job scripting

  12. start_aCGH.sh session on mole

  13. … continued … 1 hr – 6 hr later! aCGH region information and plots

  14. Summary findings (38 arrays)
      • Plot panels: BioHMM and DNAcopy (axes: sample percentage, region size)
      • Rapid identification of regions of interest
      • Easy comparison of aCGH analysis via different algorithms

  15. Real life application
      • Plot: OSMR (axes: sample percentage, region size)
      • Retrospective analysis confirms initial findings! (summary of 38 samples)

  16. Future development
      • Tailor output for specific user requirements
      • Produce overall summary plot
      • Apply approach to expression arrays

  17. www.bio.cam.ac.uk/~ir210
      • Grace Ng
      • Steph Carter
      • Konstantina Karagavriliidou
      • Jenny Barna
      • Mark Calleja
      • Nick Coleman
