Glue Grant H1 Analysis Tutorial Weihong Xu 11/12/2008 Boston, MA
Outline
• Introduction to array design and library files
• Image quantification (DAT->CEL)
• CEL reduction (CEL->exprCEL, remove SNP)
• Low level analysis (CEL->Expression Index)
• Practice session #1
• Expression Console
• High level analysis (Expression Index -> Gene List)
• Practice session #2
If time permits:
• Visualization
• Glue Grant Exon Array Tools (beta-testing)
• Practice session #3
Introduction to array design
• Significant change over Affymetrix exon array ST1.0
• More focused on known transcripts
• Higher coverage
• More comprehensive probe selection method
• More content:
  • exon probes 3.2M (0.32M targets)
  • junction probes 1M (0.25M targets)
  • coding SNP 1M (85K targets)
  • untranslated regions (UTR) 0.5M (50K targets)
  • tiling of un-annotated units 0.5M (50K targets)
  • …
• http://gluegrant1.stanford.edu/wiki/
Some definitions (TC, EC, PSR, Juc, …)
Potential Analysis Questions
• Gene expression
• Alternative splicing
• Transcript isoform deconvolution
• Allele-specific expression
• Antisense expression
• …
Introduction to Library files
• Supports multiple tools:
  • Quality control
  • Low level analysis and expression analysis using APT and Expression Console
  • High level analysis using dChip
  • Glue Grant Exon Analysis Tools
  • Visualization using cisGenomeBrowser or UCSC Genome Browser
• Library and annotation database
  • http://gluegrant1.stanford.edu/phpMyAdmin/
  • username: ??? password: ???
  • hglue – all tables are read-only
  • GlueArraySandBox – for users to generate personalized library files and annotations
Major types of library files
• CLF – mapping of probe IDs to x/y positions in the CEL file
• PGF – groups probes (by probe ID) into probe sets
• PS – a list of probe set IDs
• MPS – a list of meta probe set IDs with a corresponding list of probe set IDs
• BGP – a list of probe IDs to be used in background correction
• QCC – a table of probe IDs for quality control and their corresponding type
• KIL – a list of probe IDs to be ignored in DABG (probes with GC < 3)
• http://www.affymetrix.com/support/developer/powertools/changelog/FILE-FORMATS.html
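To make the MPS description concrete, here is a minimal sketch of reading one into a lookup table. It assumes the tab-separated layout that APT documents for MPS files (header lines starting with `#`, then columns `probeset_id`, `transcript_cluster_id`, `probeset_list`, `probe_count`, with `probeset_list` space-separated). The function name `read_mps` and all IDs in the demo are hypothetical.

```python
import io

def read_mps(handle):
    """Map each meta probe set ID to its list of probe set IDs."""
    meta_to_probesets = {}
    columns = None
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#%' header metadata
        fields = line.split("\t")
        if columns is None:
            columns = fields  # first non-comment line names the columns
            continue
        row = dict(zip(columns, fields))
        meta_to_probesets[row["probeset_id"]] = row["probeset_list"].split()
    return meta_to_probesets

# Made-up content for illustration only (not taken from hGlue1_0.r3.TC.mps).
mps_text = io.StringIO(
    "#%chip_type=hypothetical\n"
    "probeset_id\ttranscript_cluster_id\tprobeset_list\tprobe_count\n"
    "TC0001\tTC0001\tPSR01 PSR02 PSR03\t12\n"
    "TC0002\tTC0002\tPSR04\t4\n"
)
mps = read_mps(mps_text)
print(mps["TC0001"])
```

For a real library file, the same function can be called on `open(path)` instead of the in-memory demo text.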
Image quantification (DAT->CEL)
• Function: convert pixel image to probe intensity file
  • Gridding
  • Quantification
• Software:
  • GeneChip Operating Software (GCOS)
  • Affymetrix GeneChip Command Console (AGCC)
  • http://www.affymetrix.com/products_services/software/specific/command_console_software.affx
CEL file reduction (CEL->exprCEL)
• Function: remove SNPs to address the IRB concern
• Script:
  • Mac/Unix: modCEL.unix.pl --xymap=mapping_file --CEL=path/*.CEL --OUTDIR=path --Prefix=expr
  • PC: modCEL.pc.pl --xymap=mapping_file --CEL=filename.CEL --OUTDIR=path --Prefix=expr
• Parameters:
  • xymap – mapping file, hGlue1_0.r3.CEL2exprCEL.xymap
  • Prefix – a string that will be added to the CEL file name
Low Level Analysis (CEL->Expression Index)
• APT/Expression Console and QC
• Quality control
• Extracting specific features
• Background correction/Normalization/Summarization
• Practice session (~30 minutes to 1 hr)
APT/Expression Console
• APT: Affymetrix Power Tools
• Supports both 3’ expression arrays and exon arrays
• Supports both expression and genotype analysis
  • apt-probeset-summarize -- S(N(B))
  • apt-cel-extract -- extract features
  • apt-dump-pgf -- extract probe/probe set information
  • apt-summary-vis -- generate visualization track files
  • apt-midas -- alternative splicing
• Memory management
• http://www.affymetrix.com/partners_programs/programs/developer/tools/powertools.affx#1_1
Overview of Quality Control
• Function: ensure the quality and reproducibility of array results
• What to assess?
  • Probe level
    • Per array: signal distribution of different probe types
    • Across arrays: overall signal distribution, PM-mean, BG-mean
  • Probe set level (PSR, TC)
    • Per array: Pos_vs_Neg_AUC, presence call
    • Across arrays: correlation plot (median correlation to other arrays in the same batch)
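The across-array correlation check can be illustrated with a small sketch. This is not GlueQC.R; it only shows the idea of scoring each array by its median Pearson correlation to the other arrays in the batch, using made-up signal values and hypothetical array names.

```python
import math
import statistics

def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def median_correlation(signals):
    """signals: dict of array name -> probe set signals, all in the same order."""
    scores = {}
    for name, values in signals.items():
        others = [pearson(values, v) for n, v in signals.items() if n != name]
        scores[name] = statistics.median(others)
    return scores

# Made-up log-scale signals for four arrays; a4 is deliberately discordant.
signals = {
    "a1.CEL": [5.0, 7.0, 9.0, 11.0],
    "a2.CEL": [5.1, 7.2, 8.9, 11.3],
    "a3.CEL": [5.2, 6.9, 9.1, 10.8],
    "a4.CEL": [11.0, 5.0, 9.5, 6.0],
}
scores = median_correlation(signals)
worst = min(scores, key=scores.get)
print(worst, round(scores[worst], 3))
```

An array whose median correlation sits well below the rest of its batch is a candidate outlier, subject to the same caution as any other QC flag when the sample size is small.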
Quality Control Tool – GlueQC.R
• Requires R and APT
• Syntax: Rscript GlueQC.R celpath outpath libpath
• Libraries:
  • hGlue1_0.r3.clf
  • hGlue1_0.r3.pgf
  • hGlue1_0.r3.PSR.ps
  • hGlue1_0.r3.TC.mps
  • hGlue1_0.r3.KIL
  • hGlue1_0.r3.qc.clfpgf
Density distribution plot
• Overall intensity range
• Separation between different probe types
All array density plot
• Check the similarity of intensity distributions across arrays
QC summary plot
• Check for outliers in each plot
• Flags should be treated only as caution signs, especially when the sample size is small
QC summary table
Extract features
• Function: extract a subset of probe signals from CEL files
• Tool: apt-cel-extract
• Syntax: apt-cel-extract -o out.txt [-c chip.clf -p chip.pgf] [-d chip.cdf] [--probeset-ids=norm-exon.txt] *.cel
• Parameters:
  • If using --probeset-ids, the CLF and PGF files must be supplied
Examples: lowlevelanalysis/extractfeatures.bat
• Extract all raw probe signal
  >apt-cel-extract -o raw_probe_signal.txt --cel-files CELlist.txt
• Extract quantile-normalized and GC-background-corrected probe signal
  >apt-cel-extract -c hGlue1_0.r3.clf -p hGlue1_0.r3.pgf -b hGlue1_0.r3.antigenomic.bgp -a quant-norm,pm-gcbg -o bgc_probe_signal.txt --cel-files CELlist.txt
• Extract probe signal of a specific content: “main->junction”
  >apt-dump-pgf -c hGlue1_0.r3.clf -p hGlue1_0.r3.pgf --probeset-type main --probeset-type junction -o juc.pgf
  >apt-cel-extract -c hGlue1_0.r3.clf -p hGlue1_0.r3.pgf --probe-ids juc.pgf -o juc_raw_probe_signal.txt --cel-files CELlist.txt
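These examples pass the arrays through `--cel-files CELlist.txt`. A minimal sketch of generating such a list follows, assuming the APT convention (not shown in this tutorial) of a one-column text file whose first line is the header `cel_files`. The function name `write_cel_list` and the demo file names are hypothetical.

```python
import os
import tempfile

def write_cel_list(cel_dir, list_path):
    """Write a CEL list file: header 'cel_files', then one CEL path per line."""
    cel_files = sorted(
        os.path.join(cel_dir, name)
        for name in os.listdir(cel_dir)
        if name.lower().endswith(".cel")
    )
    with open(list_path, "w") as out:
        out.write("cel_files\n")
        for path in cel_files:
            out.write(path + "\n")
    return cel_files

# Demonstration with empty stand-in files in a temporary directory.
demo_dir = tempfile.mkdtemp()
for name in ("expr_a.CEL", "expr_b.cel", "notes.txt"):
    open(os.path.join(demo_dir, name), "w").close()
demo_list = os.path.join(demo_dir, "CELlist.txt")
listed = write_cel_list(demo_dir, demo_list)
with open(demo_list) as fh:
    demo_lines = fh.read().splitlines()
print(demo_lines[0], len(listed))
```

Listing files explicitly this way also sidesteps the `*.CEL` wildcard, which the Windows command prompt does not expand.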
Background correction, normalization and summarization
• Goal: transform probe signal into a biologically meaningful expression measure
• Background correction -- remove non-target signal
• Normalization -- remove non-biological variance
• Summarization -- summarize probe signal into probe set signal
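As an illustration of the normalization step, here is a simplified sketch of quantile normalization, which forces every array to share the same intensity distribution. It is not APT's implementation: ties are broken by position rather than averaged, there is no sketch subsampling, and the input values are made up.

```python
def quantile_normalize(arrays):
    """arrays: list of equal-length intensity lists, one per array."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [
        sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)
    ]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices by rank
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # replace value by reference quantile
        normalized.append(out)
    return normalized

# Two made-up arrays on very different scales.
raw = [[2.0, 4.0, 6.0], [30.0, 10.0, 20.0]]
norm = quantile_normalize(raw)
print(norm)
```

After normalization each array contains the same set of values, and only the rank of each probe within its own array is preserved.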
apt-probeset-summarize
• Syntax
  • apt-probeset-summarize -a rma-sketch [-a dabg] -c chip.clf -p chip.pgf -b chip.bgp -o outpath -m chip.mps [--kill-list chip.kil] *.CEL
• Parameters
  • -a: analysis method
    • Chipstream format: a comma-separated list of transformations, with specific parameters passed as key=value pairs, e.g. rma-bg,quant-norm.sketch=-1.usepm=true.bioc=true,pm-only,med-polish
    • Predefined methods: rma-sketch, dabg, rma, plier, etc.
  • --kill-list: needed when the analysis involves gc-bg
• Windows: use '--cel-files filename' instead of *.CEL
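The chipstream string can be read as a pipeline: commas separate the steps, and dots attach key=value options to a step. The sketch below splits the example string accordingly. It is only an illustration of the notation, not APT's parser, and it assumes option values contain no dots (a decimal value such as 0.5 would need extra handling). The function name `parse_chipstream` is hypothetical.

```python
def parse_chipstream(spec):
    """Split a chipstream spec into (step name, options dict) pairs."""
    steps = []
    for part in spec.split(","):
        name, *options = part.split(".")
        # Each option has the form key=value; steps may have none.
        params = dict(opt.split("=", 1) for opt in options)
        steps.append((name, params))
    return steps

spec = "rma-bg,quant-norm.sketch=-1.usepm=true.bioc=true,pm-only,med-polish"
steps = parse_chipstream(spec)
for name, params in steps:
    print(name, params)
```

Read this way, the example is background correction (rma-bg), then quantile normalization with three options, then PM-only adjustment, then median polish summarization.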
apt-probeset-summarize (2)
• Background correction
  • gc-bg
  • rma-bg
  • mas5-bg
  • pm-gcbg
  • pm-mm
• Normalization
  • quant-norm
  • med-norm
• Summarization
  • plier/iter-plier
  • Median polish (RMA)
  • DABG
  • Median
  • No Li-Wong yet
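To show what the median polish summarization does, here is a small sketch that fits an additive probe-plus-array model to one probe set by alternately removing row and column medians. It is a textbook version with made-up log2 values, not the APT code, and the function name `median_polish` is hypothetical.

```python
import statistics

def median_polish(matrix, max_iter=10, tol=1e-6):
    """matrix: rows are probes, columns are arrays (log2 scale).
    Returns one summarized expression value per array."""
    resid = [row[:] for row in matrix]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        # Sweep out row (probe) medians.
        row_med = [statistics.median(r) for r in resid]
        for i in range(n_rows):
            resid[i] = [v - row_med[i] for v in resid[i]]
            row_eff[i] += row_med[i]
        shift = statistics.median(col_eff)
        col_eff = [c - shift for c in col_eff]
        overall += shift
        # Sweep out column (array) medians.
        col_med = [
            statistics.median(resid[i][j] for i in range(n_rows))
            for j in range(n_cols)
        ]
        for j in range(n_cols):
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
            col_eff[j] += col_med[j]
        shift = statistics.median(row_eff)
        row_eff = [r - shift for r in row_eff]
        overall += shift
        if max(abs(m) for m in row_med + col_med) < tol:
            break  # nothing left to sweep out
    return [overall + c for c in col_eff]

# Three probes on three arrays; the value 20.0 is a deliberate outlier.
probe_set = [
    [8.0, 9.0, 10.0],
    [9.0, 10.0, 20.0],
    [7.0, 8.0, 9.0],
]
estimates = median_polish(probe_set)
print(estimates)
```

Because medians are used at every step, the single outlying probe value does not pull the estimate for the third array away from the level supported by the other probes.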
Examples: LowLevelAnalysis/bns.bat
• PSR rma-sketch and dabg analysis
  apt-probeset-summarize -a rma-sketch -a dabg -c hGlue1_0.r3.clf -p hGlue1_0.r3.pgf -b hGlue1_0.r3.antigenomic.bgp --qc-probesets hGlue1_0.r3.qcc -s hGlue1_0.r3.PSR.ps -o BNS/PSR --cel-files CELlist.txt --kill-list hGlue1_0.r3.kil
• TC (transcript cluster) meta probe set, rma-sketch or chipstream
  apt-probeset-summarize -a rma-sketch -a quant-norm.sketch=50000,pm-gcbg,iter-plier -c hGlue1_0.r3.clf -p hGlue1_0.r3.pgf -b hGlue1_0.r3.antigenomic.bgp --qc-probesets hGlue1_0.r3.qcc -m hGlue1_0.r3.TC.mps -o BNS/TC --cel-files CELlist.txt --kill-list hGlue1_0.r3.kil
• Compute U133Plus2 probe sets
  apt-probeset-summarize -a rma-sketch -c hGlue1_0.r3.clf -p hGlue1_0.r3.pgf -b hGlue1_0.r3.antigenomic.bgp --qc-probesets hGlue1_0.r3.qcc -m hGlue1_0.r3.U133plus2.mps -o BNS/u133plus2 --cel-files CELlist.txt
• Compute Human Exon ST1.0 transcript clusters
  apt-probeset-summarize -a rma-sketch -c hGlue1_0.r3.clf -p hGlue1_0.r3.pgf -b hGlue1_0.r3.antigenomic.bgp --qc-probesets hGlue1_0.r3.qcc -m hGlue1_0.r3.HuEX_TC.mps -o BNS/huex --cel-files CELlist.txt
apt-probeset-summarize output
• [method].summary.txt – expression index matrix
• [method].report.txt – quality control measures
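A minimal sketch of loading a [method].summary.txt matrix for downstream work follows. It assumes the usual APT text layout: metadata lines starting with `#`, then a tab-separated header row whose first column is `probeset_id` and whose remaining columns are the CEL file names. The function name `read_summary` and the demo values are hypothetical.

```python
import io

def read_summary(handle):
    """Return (array names, {probe set ID: [signal per array]})."""
    arrays, table = None, {}
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#%' metadata
        fields = line.split("\t")
        if arrays is None:
            arrays = fields[1:]  # header row: probeset_id, then CEL names
            continue
        table[fields[0]] = [float(v) for v in fields[1:]]
    return arrays, table

# Made-up content standing in for rma-sketch.summary.txt.
summary_text = io.StringIO(
    "#%cmd=hypothetical header line\n"
    "probeset_id\ta1.CEL\ta2.CEL\n"
    "PSR01\t7.25\t7.5\n"
    "PSR02\t3.0\t2.75\n"
)
arrays, table = read_summary(summary_text)
print(arrays, table["PSR01"])
```

Skipping the `#` lines is also the kind of header clean-up needed before importing the matrix into other tools.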
Expression Console
• Improvements over last tutorial
  • More summary options: EC, TC, JUC, EX, TX
  • Define probes into core, extended (multi probes)
  • Convert to U133plus2, HuEx format
• Walk through an example
  • Summary
  • QC metrics
  • Link with annotation
• Refer to doc/EC_Tutorial.doc (recycled from last tutorial)
Practice session #1
• CEL reduction (SNPremover)
• GlueQC
  • GlueQC on data/07-20-08/CELlist_test.txt (15 arrays)
• Low level analysis
  • Feature extraction
    • Extract raw probe intensity of 15 arrays
    • Extract quantile-normalized and GC-background-corrected probe intensity of “main->junction” from 15 arrays
  • B.N.S
    • rma-sketch summary of PSR for 15 arrays
    • rma-sketch summary of TC for 15 arrays (use mps file from lib/GenBase)
High level analysis (Expression Index -> Gene List)
• Array annotation and annotation files
• Import APT results to dChip for high level analysis
• A practice session
Array annotation (r3)
• Updates over r2 version
  • Corrected a bug caused by a MySQL end-of-line problem
  • Added annotation for Transcript, Junction and other contents
  • Added annotation files for dChip and GenBase
  • Added BED files and REFFLAT files for Genome Browser
• Refer to lib/readme.doc for details
• Customization: http://gluegrant1.stanford.edu/phpMyAdmin/
hGlue1_0.r3.TC_annot.csv
hGlue1_0.r3.PSR_annot.csv
hGlue1_0.r3.Junction_annot.csv
dChip
• Improvements over last tutorial
  • Added Gene Ontology, KEGG pathway and chromosome band analysis
• Walk through an example
  • Remove extra header and extra tail
  • Import external data into dChip
  • Differential expression analysis
  • Clustering/Enrichment
  • Chromosome/Genome enrichment
Practice session #2
• dChip
Visualization - cisGenomeBrowser
• Light version of UCSC Genome Browser (Hui Jiang)
• CEL image
• Genome region
• http://biogibbs.stanford.edu/~jiangh/browser/index.html
cisGenomeBrowser - CEL Image
cisGenomeBrowser - Genomic Region
cisGenomeBrowser
• Annotation track
  • hGlue1_0.r3.TC.refflat
  • hGlue1_0.r3.TX.refflat
  • Hg18.genefile (refseq track only)
• Signal track (visualization/genCisGenomeBrowserTrack.bat)
  • Probe raw signal barfile
    >genbar.pl --coord=hGlue1_0.r3.Probe.BED --signal=raw_probe_signal.txt --outdir=Probe_barfile
  • PSR barfile
    >genbar.pl --coord=hGlue1_0.r3.PSR.BED --signal=PSR/rma-sketch.summary.txt --outdir=PSR_barfile
  • Gene barfile
    >genbar.pl --coord=hGlue1_0.r3.TC.BED --signal=TC/rma-sketch.summary.txt --outdir=TC_barfile
• Demo
Other Browsers
• UCSC Genome Browser (visualization/genUCSCBrowsreTrack.bat)
  • apt-summary-vis -g hGlue1_0.r3.PSR.BED PSR/rma-sketch.summary.txt --wiggle-col-index 1 -o CEL1.PSR.wig
  • The BED file must be tweaked so that PSRs do not overlap in order to work with the UCSC browser
• Affymetrix Genome Browser
  • apt-summary-vis -g hGlue1_0.r3.PSR.BED PSR/rma-sketch.summary.txt -o PSR.egr
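The tutorial does not say how the BED file was adjusted to remove PSR overlaps. One possible approach, sketched here purely as an assumption, is to sort the intervals per chromosome and clip each start to the end of the previous interval, dropping any interval that becomes empty. Strand is ignored, and the function name `make_non_overlapping` and all coordinates are hypothetical.

```python
def make_non_overlapping(records):
    """records: list of (chrom, start, end, name) BED-style tuples.
    Returns sorted records with overlaps clipped away."""
    result = []
    last_end = {}  # chrom -> end of the last interval kept
    for chrom, start, end, name in sorted(
        records, key=lambda r: (r[0], r[1], r[2])
    ):
        start = max(start, last_end.get(chrom, 0))
        if start >= end:
            continue  # fully covered by earlier intervals, so dropped
        result.append((chrom, start, end, name))
        last_end[chrom] = end
    return result

# Made-up PSR intervals; B overlaps A, and C lies inside A and B.
bed = [
    ("chr1", 100, 200, "A"),
    ("chr1", 150, 250, "B"),
    ("chr1", 160, 190, "C"),
    ("chr2", 10, 20, "D"),
]
clipped = make_non_overlapping(bed)
print(clipped)
```

Clipping changes PSR boundaries and can drop nested PSRs entirely, so a track built this way is suitable for display only, not for analysis.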
Glue Grant Exon Array Tool
• Highlights
  • Specially tailored for exon arrays
  • Command line with R interface
  • Probe-sequence-specific background model (MAT)
  • Summarization: probe-selection (GenBase), Li-Wong model (dChip) and median-polish (RMA)
  • Integrated alternative splicing analysis (MADS)
• Run analysis (GlueGrantExonArrayTool/runEAT.bat)
  • ../../bin/GlueGrantExonArrayTool/eat.win32.exe EXPR_param.conf -l ../../data/07-20-08/CELlist.txt
  • ../../bin/GlueGrantExonArrayTool/eat.win32.exe MADS_param.conf -l ../../data/07-20-08/CELlist.txt
Param.conf
• Specify analysis parameters
  • Analysis type
  • Library files
  • Background correction method
  • Normalization method
  • Summarization method
  • MADS parameters
• Example: /GlueGrantExonArrayTool/Expr_param.conf
Practice session #3
• cisGenomeBrowser
  • Generate bar files for PSR and TC of 15 arrays in CELlist_test.txt from practice session #1
  • Search for genes of interest
• Glue Grant Analysis Tool
  • Repeat steps in runEAT.bat
Thank you