1 / 27

Intro to Biocoductor and GeneGraph

Intro to Biocoductor and GeneGraph. by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012. Bioconductor repository. Is the repository with extensions and libraries for R-language found at http://www.bioconductor.org/ Bioconductor libraries cover Micorarray analysis

lonato
Download Presentation

Intro to Biocoductor and GeneGraph

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Intro toBiocoductor and GeneGraph by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012

  2. Bioconductor repository • Is the repository with extensions and libraries for R-language found at http://www.bioconductor.org/ • Bioconductor libraries cover • Micorarray analysis • Genetic variants analysis (SNPs) • Sequence analysis (FASTA, RNA-seq) • Annotation (pathways, genes) • High-troughput assays (Mass Spec) • All libraries are free to use and contain documentation (i.e. vignettes) • Vignettes are short and require some previous knowledge of R and/or other defendant libraries

  3. Configuring R for Bioconductor • To install Bioconductor libraries the R environment needs to be configured source("http://bioconductor.org/biocLite.R") • To download anyBioconductor library use biocLite("package_name")function biocLite("GenomeGraphs") • Load the downloaded library functions via the library("lib_name")function library("GenomeGraphs") • Read help file(s) via the ?? or browseVignettes() ??library_name (e.g. ??GenomeGraphs) browseVignettes(package = "GenomeGraphs")

  4. Genome Data Display and Plotting The GenomeGraphs library Authors: Steen Durinck and James Bullard (Bioconductor library)

  5. Intro to GenomeGraphs library • Allows to retrieve data from • Ensembl comprehensive DB on genomes • intron/exon locations • sequence genetic variation data • protein properties (pI, domains, motifs) • GenomeGraphsBioc library allows to display data on: • Gene Expression (expression / location) • Comparative genomic hybridization (CGH) • Sequencing (e.g. variations/introns/exons)

  6. GenomeGraphs: Plotting with gdPlot • gdPlotmain plotting function gdPlot(gdObjects, minBase, maxBase ...) • gdObjects = any objects created by GenomeGraphs a) BaseTrack; b) Gene; c) GenericArray; d) RectangleOverlay • minBasethe lowest nt position to be plotted (optional) • maxBasethe largest nt to display (optional) • Objects are data structures that hold/organize a set of variables 1) Create an object to plot using makeBaseTrack() function ObjBaseT = makeBaseTrack(1:100, rnorm(1:100),strand = "+") 2) Plot the newly created object gdPlot(ObjBaseT) • Display object structure / properties (e.g. of BaseTrack) attributes(ObjBaseT) • $ sign next to the name represents object variables • Change or view individual object variables attr(ObjBaseT, “strand")

  7. Manipulating GenomeGraphs classgraphical properties • Changing values of variables (classical way) attr(Object, "variable/parameter") = value attr(ObjBaseT, "strand") = "-“ • If an array, to access individual elements use [element number 1..n] attr(ObjBaseT, "variable/parameter")[number] = new_value attr(ObjBaseT, "base")[1] = 0 • Changing graphical parameters such as color showDisplayOptions(ObjBaseT) alpha = 1 lty = solid color= orange lwd= 1 size = 5 type = p getPar(ObjBaseT, "color") [1] "orange" setPar(ObjBaseT, "color", "blue")

  8. gdPlot composite/group plots • gdPlot(…) can take any number of gdObjects to plot • Let’s plot twoBaseTrack objects a and b a = makeBaseTrack(1:100, rlnorm(100), strand = "+") b = makeBaseTrack(1:100, rnorm(100), strand = "-") ab=list(a,b) gdplot(ab)

  9. Manipulating values of the groupedgdObjects • ab is list of objects a and b • To access individual object within the list use [ ] ab[1] will display object a ab[2] will display object b • To modify the grouped object use double [[ ]] • To change color of b to red getPar(ab[[2]], "color") setPar(ab[[2]], "color", "red") -OR- (re-create object) b = makeBaseTrack(1:100, rnorm(100), strand = "-", dp=DisplayPars(color="red")) ab=list(a,b) gdPlot(ab)

  10. Plotting with labels and legend • gdPlot does not provide functions to label axis • Trick = “use labeled / tagged” objects "label" = GenomeGraph object "+ strand"= makeBaseTrack(1:100, rlnorm(100), strand = "+") • To display legend use makeLegend("text","color") ab=list("+"=a, "-"=b, makeGenomeAxis(), makeLegend(c("+", "-"),c("orange", "red")) )

  11. Retrieving and Displaying Data from Public Database Combining capabilities of biomaRt and GenomeGraph libraries

  12. Welcome to biomaRt • biomaRt library allows to retrieve data from public DBs ensembl ENSEMBL GENES 68 (SANGER UK) snp ENSEMBL VARIATION 68 (SANGER UK) unimart UNIPROT (EBI UK) bacteria_mart_14 ENSEMBL BACTERIA 14 (EBI UK) ***Use listMarts() to see all available databases*** • Let’s retrieve gene data of the Bacillus subtilisstrain • useMart(database,dataset)allows to connect to specified databaseand dataset within this database db=useMart("bacteria_mart_14") listDatasets(db) Dataset Description version … … … bac_6_gene Bacillus subtilis genes (EB 2 b_subtilis) EB 2 b_subtilis db=useMart("bacteria_mart_14", "bac_6_gene")

  13. Exploring biomaRt object • listAttributes()shows all prop. of the biomaRtobj. • The db object has total of 4175 genes • Use getBM(attribute, filter, value, biomaRt_obj)toextract values belonging to specified attributes • attribute: general term such as gene name/chromosome # / strand (+ or - ) • filter: parameter applied on attribute such as genomic region to consider (i.e. start and end in nt) • value: actual values of the applied filter(s) getBM(c("external_gene_id", "description","start_position", "end_position", "strand"), filters = c("start", "end"), values = list(1,10000), db) ID description start(nt) end(nt) strand metSMethionyl-tRNAsynthetase 45633 47627 1 ftsH Cell division protease ftsH homolog 76984 78897 1 hslO 33 kDachaperonin 79880 80755 1 DgkDeoxyguanosine kinase 23146 23769 -1

  14. Plotting the selected “Genome Region” • Create an object with makeGeneRegion()function makeGeneRegion(start,end,chromosome name, strand, biomaRt object, plotting options) • Find notation used for the chromosome naming getBM("chromosome_name","","",db) chromosome_name 1 Chromosome gRegion = makeGeneRegion(1, 10000, chr = "Chromosome", strand = "+", biomart = db, dp = DisplayPars(plotId = TRUE, idRotation = 90, cex = 0.8, idColor = "black")) gdPlot(list(gRegion, makeGenomeAxis(), makeTitle("Position(nt)",cex=3,"black",0.1)))

  15. Bacillus subtilis genome region (intron / exon) ensembl_gene_id

  16. Mapping Expression data RNAseq GenomeArray()

  17. Intro into RNA-seq data 1Ugrappa Nagalakshmi et. al. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science, 2008 • HT sequencing technologies allow to sequence mRNA in a series of short contigs of 50-200 bp • In addition to gene expression analysis it possible to • Map location of introns (UTRs) / exons • Principal:  one searches for a rapid changes in abundance of the RNA-Seq signal (contigs) • Integration of sequence + expression information • Ugrappa Nagalakshmi et. al. 2008 had used this strategy to accurately map yeast genome1 • Task: Display part of the seqDataEx dataset having both • abundance of mRNA (cDNA) transcripts and • annotated yeast genome

  18. Plotting RNA-seq data with GenomeArray() • Need to get mRNA contig abundance data data("seqDataEx") rnaSeqAb=seqDataEx$david • Create GeneticArray object with makeGenericArray(intensity, probeStart, probeEnd, trackOverlay, dp = NULL) • intensity either microaray or RNAseq transcript abundance signal • probStart start position of the probe (location in nt) • Plot contig abundance w.r.t. to genomic location gdPlot(list("abun"=makeGenericArray(rnaSeqAb[, "expr", drop = FALSE], rnaSeqAb[, "location"]) , makeGenomeAxis() ) )

  19. Adding genomic annotation to the prev plot • Need to get annotated yeast genome data annotGen = useMart("ensembl", "scerevisiae_gene_ensembl") • Create mRNA contig abundance mRNAabun = makeGenericArray(rnaSeqAb[, "expr", drop = FALSE], rnaSeqAb [, "location"]) • Create annotated seq. covering the mRNA contigs location annotSeq= makeGeneRegion(start = min(rnaSeqAb[, "location"]), end = max(rnaSeqAb[, "location"]), chr = "IV", strand = "+", biomart = annotGen, dp = DisplayPars(plotId = TRUE, idRotation = 0, cex = 0.85, idColor = "black", size=0.5)) • Combine objects and plot them gdPlot( list("abund" = mRNAabun,makeGenomeAxis(), "+" = annotSeq), 1299000,1312000)

  20. Resulting plot:Transcript abundance w.r.t. to location

  21. Overlays of Basic Shapes and Custom Text • To overlay rectangle use makeRectangleOverlay() makeRectangleOverlay(start, end, region = NULL, coords = c("genomic", "absolute"), dp) rectOver = makeRectangleOverlay( 1301500, 1302200, region=c(1,2), "genomic", DisplayPars(alpha = 0.5)) • To overlay text use makeTextOverlay() tOver = makeTextOverlay("Ribosomal Large subunit", 1302000, 0.95, region = c(1,1), dp= DisplayPars(color = "red")) • Combine all overlay objects into one vector with c(v1,v2) gdPlot( list("abund" = mRNAabun,makeGenomeAxis(), "+" = annotSeq), overlays=c(rectOver, tOver) )

  22. Overlay of rectangle and text

  23. Alternative splicing of transcript makeTranscript(id, type, biomart, dp = NULL)

  24. Displaying alternative splicing of a gene • mRNA coming from the same ORF could be spliced in many ways • E.g. case of VDR genes of IgG • Given biomaRt object, the makeTranscript()will extract splicing information for given id (i.e. gene) • Download human genome databse hGenome <- useMart("ensembl", "hsapiens_gene_ensembl") • Select Ensembl ID to look at head(getBM(c("ensembl_gene_id", "description"),"","", hGenome)) • Get splicing data from biomaRt object (hGenome) spliceObj = makeTranscript("ENSG00000168309", "ensembl_gene_id" ,hGenome) • Plot the object with gdPlot gdPlot(list(makeTitle("Transcript ID: ENSG00000168309"),splicingObj, makeGenomeAxis()) )

  25. The final result

  26. Conclusion • GeneGraphs provides a wide range to plot genomic data • Can use external databases through biomaRt • Main useful features • identifies exons/introns • allows to cross-reference expression / genome data • flexible albeit complex plotting capabilities • allows to overlay graphical objects and text • ability to create custom legends • annotation capabilities provided by powerful biomaRt

  27. Thank you for your patience!&Happy Bioconductor/R Exploration!

More Related