270 likes | 402 Views
Intro to Biocoductor and GeneGraph. by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012. Bioconductor repository. Is the repository with extensions and libraries for R-language found at http://www.bioconductor.org/ Bioconductor libraries cover Micorarray analysis
E N D
Intro toBiocoductor and GeneGraph by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012
Bioconductor repository • Is the repository with extensions and libraries for R-language found at http://www.bioconductor.org/ • Bioconductor libraries cover • Micorarray analysis • Genetic variants analysis (SNPs) • Sequence analysis (FASTA, RNA-seq) • Annotation (pathways, genes) • High-troughput assays (Mass Spec) • All libraries are free to use and contain documentation (i.e. vignettes) • Vignettes are short and require some previous knowledge of R and/or other defendant libraries
Configuring R for Bioconductor • To install Bioconductor libraries the R environment needs to be configured source("http://bioconductor.org/biocLite.R") • To download anyBioconductor library use biocLite("package_name")function biocLite("GenomeGraphs") • Load the downloaded library functions via the library("lib_name")function library("GenomeGraphs") • Read help file(s) via the ?? or browseVignettes() ??library_name (e.g. ??GenomeGraphs) browseVignettes(package = "GenomeGraphs")
Genome Data Display and Plotting The GenomeGraphs library Authors: Steen Durinck and James Bullard (Bioconductor library)
Intro to GenomeGraphs library • Allows to retrieve data from • Ensembl comprehensive DB on genomes • intron/exon locations • sequence genetic variation data • protein properties (pI, domains, motifs) • GenomeGraphsBioc library allows to display data on: • Gene Expression (expression / location) • Comparative genomic hybridization (CGH) • Sequencing (e.g. variations/introns/exons)
GenomeGraphs: Plotting with gdPlot • gdPlotmain plotting function gdPlot(gdObjects, minBase, maxBase ...) • gdObjects = any objects created by GenomeGraphs a) BaseTrack; b) Gene; c) GenericArray; d) RectangleOverlay • minBasethe lowest nt position to be plotted (optional) • maxBasethe largest nt to display (optional) • Objects are data structures that hold/organize a set of variables 1) Create an object to plot using makeBaseTrack() function ObjBaseT = makeBaseTrack(1:100, rnorm(1:100),strand = "+") 2) Plot the newly created object gdPlot(ObjBaseT) • Display object structure / properties (e.g. of BaseTrack) attributes(ObjBaseT) • $ sign next to the name represents object variables • Change or view individual object variables attr(ObjBaseT, “strand")
Manipulating GenomeGraphs classgraphical properties • Changing values of variables (classical way) attr(Object, "variable/parameter") = value attr(ObjBaseT, "strand") = "-“ • If an array, to access individual elements use [element number 1..n] attr(ObjBaseT, "variable/parameter")[number] = new_value attr(ObjBaseT, "base")[1] = 0 • Changing graphical parameters such as color showDisplayOptions(ObjBaseT) alpha = 1 lty = solid color= orange lwd= 1 size = 5 type = p getPar(ObjBaseT, "color") [1] "orange" setPar(ObjBaseT, "color", "blue")
gdPlot composite/group plots • gdPlot(…) can take any number of gdObjects to plot • Let’s plot twoBaseTrack objects a and b a = makeBaseTrack(1:100, rlnorm(100), strand = "+") b = makeBaseTrack(1:100, rnorm(100), strand = "-") ab=list(a,b) gdplot(ab)
Manipulating values of the groupedgdObjects • ab is list of objects a and b • To access individual object within the list use [ ] ab[1] will display object a ab[2] will display object b • To modify the grouped object use double [[ ]] • To change color of b to red getPar(ab[[2]], "color") setPar(ab[[2]], "color", "red") -OR- (re-create object) b = makeBaseTrack(1:100, rnorm(100), strand = "-", dp=DisplayPars(color="red")) ab=list(a,b) gdPlot(ab)
Plotting with labels and legend • gdPlot does not provide functions to label axis • Trick = “use labeled / tagged” objects "label" = GenomeGraph object "+ strand"= makeBaseTrack(1:100, rlnorm(100), strand = "+") • To display legend use makeLegend("text","color") ab=list("+"=a, "-"=b, makeGenomeAxis(), makeLegend(c("+", "-"),c("orange", "red")) )
Retrieving and Displaying Data from Public Database Combining capabilities of biomaRt and GenomeGraph libraries
Welcome to biomaRt • biomaRt library allows to retrieve data from public DBs ensembl ENSEMBL GENES 68 (SANGER UK) snp ENSEMBL VARIATION 68 (SANGER UK) unimart UNIPROT (EBI UK) bacteria_mart_14 ENSEMBL BACTERIA 14 (EBI UK) ***Use listMarts() to see all available databases*** • Let’s retrieve gene data of the Bacillus subtilisstrain • useMart(database,dataset)allows to connect to specified databaseand dataset within this database db=useMart("bacteria_mart_14") listDatasets(db) Dataset Description version … … … bac_6_gene Bacillus subtilis genes (EB 2 b_subtilis) EB 2 b_subtilis db=useMart("bacteria_mart_14", "bac_6_gene")
Exploring biomaRt object • listAttributes()shows all prop. of the biomaRtobj. • The db object has total of 4175 genes • Use getBM(attribute, filter, value, biomaRt_obj)toextract values belonging to specified attributes • attribute: general term such as gene name/chromosome # / strand (+ or - ) • filter: parameter applied on attribute such as genomic region to consider (i.e. start and end in nt) • value: actual values of the applied filter(s) getBM(c("external_gene_id", "description","start_position", "end_position", "strand"), filters = c("start", "end"), values = list(1,10000), db) ID description start(nt) end(nt) strand metSMethionyl-tRNAsynthetase 45633 47627 1 ftsH Cell division protease ftsH homolog 76984 78897 1 hslO 33 kDachaperonin 79880 80755 1 DgkDeoxyguanosine kinase 23146 23769 -1
Plotting the selected “Genome Region” • Create an object with makeGeneRegion()function makeGeneRegion(start,end,chromosome name, strand, biomaRt object, plotting options) • Find notation used for the chromosome naming getBM("chromosome_name","","",db) chromosome_name 1 Chromosome gRegion = makeGeneRegion(1, 10000, chr = "Chromosome", strand = "+", biomart = db, dp = DisplayPars(plotId = TRUE, idRotation = 90, cex = 0.8, idColor = "black")) gdPlot(list(gRegion, makeGenomeAxis(), makeTitle("Position(nt)",cex=3,"black",0.1)))
Bacillus subtilis genome region (intron / exon) ensembl_gene_id
Mapping Expression data RNAseq GenomeArray()
Intro into RNA-seq data 1Ugrappa Nagalakshmi et. al. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science, 2008 • HT sequencing technologies allow to sequence mRNA in a series of short contigs of 50-200 bp • In addition to gene expression analysis it possible to • Map location of introns (UTRs) / exons • Principal: one searches for a rapid changes in abundance of the RNA-Seq signal (contigs) • Integration of sequence + expression information • Ugrappa Nagalakshmi et. al. 2008 had used this strategy to accurately map yeast genome1 • Task: Display part of the seqDataEx dataset having both • abundance of mRNA (cDNA) transcripts and • annotated yeast genome
Plotting RNA-seq data with GenomeArray() • Need to get mRNA contig abundance data data("seqDataEx") rnaSeqAb=seqDataEx$david • Create GeneticArray object with makeGenericArray(intensity, probeStart, probeEnd, trackOverlay, dp = NULL) • intensity either microaray or RNAseq transcript abundance signal • probStart start position of the probe (location in nt) • Plot contig abundance w.r.t. to genomic location gdPlot(list("abun"=makeGenericArray(rnaSeqAb[, "expr", drop = FALSE], rnaSeqAb[, "location"]) , makeGenomeAxis() ) )
Adding genomic annotation to the prev plot • Need to get annotated yeast genome data annotGen = useMart("ensembl", "scerevisiae_gene_ensembl") • Create mRNA contig abundance mRNAabun = makeGenericArray(rnaSeqAb[, "expr", drop = FALSE], rnaSeqAb [, "location"]) • Create annotated seq. covering the mRNA contigs location annotSeq= makeGeneRegion(start = min(rnaSeqAb[, "location"]), end = max(rnaSeqAb[, "location"]), chr = "IV", strand = "+", biomart = annotGen, dp = DisplayPars(plotId = TRUE, idRotation = 0, cex = 0.85, idColor = "black", size=0.5)) • Combine objects and plot them gdPlot( list("abund" = mRNAabun,makeGenomeAxis(), "+" = annotSeq), 1299000,1312000)
Overlays of Basic Shapes and Custom Text • To overlay rectangle use makeRectangleOverlay() makeRectangleOverlay(start, end, region = NULL, coords = c("genomic", "absolute"), dp) rectOver = makeRectangleOverlay( 1301500, 1302200, region=c(1,2), "genomic", DisplayPars(alpha = 0.5)) • To overlay text use makeTextOverlay() tOver = makeTextOverlay("Ribosomal Large subunit", 1302000, 0.95, region = c(1,1), dp= DisplayPars(color = "red")) • Combine all overlay objects into one vector with c(v1,v2) gdPlot( list("abund" = mRNAabun,makeGenomeAxis(), "+" = annotSeq), overlays=c(rectOver, tOver) )
Alternative splicing of transcript makeTranscript(id, type, biomart, dp = NULL)
Displaying alternative splicing of a gene • mRNA coming from the same ORF could be spliced in many ways • E.g. case of VDR genes of IgG • Given biomaRt object, the makeTranscript()will extract splicing information for given id (i.e. gene) • Download human genome databse hGenome <- useMart("ensembl", "hsapiens_gene_ensembl") • Select Ensembl ID to look at head(getBM(c("ensembl_gene_id", "description"),"","", hGenome)) • Get splicing data from biomaRt object (hGenome) spliceObj = makeTranscript("ENSG00000168309", "ensembl_gene_id" ,hGenome) • Plot the object with gdPlot gdPlot(list(makeTitle("Transcript ID: ENSG00000168309"),splicingObj, makeGenomeAxis()) )
Conclusion • GeneGraphs provides a wide range to plot genomic data • Can use external databases through biomaRt • Main useful features • identifies exons/introns • allows to cross-reference expression / genome data • flexible albeit complex plotting capabilities • allows to overlay graphical objects and text • ability to create custom legends • annotation capabilities provided by powerful biomaRt
Thank you for your patience!&Happy Bioconductor/R Exploration!