1 / 35

Data Mining with BioMart

Data Mining with BioMart. www.ensembl.org/biomart/martview www.biomart.org/biomart/martview. What is BioMart?. A data export tool A quick table generator A web interface to mine Ensembl data. BioMart- Data mining.

brigid
Download Presentation

Data Mining with BioMart

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Mining with BioMart www.ensembl.org/biomart/martview www.biomart.org/biomart/martview

  2. What is BioMart? • A data export tool • A quick table generator • A web interface to mine Ensembl data

  3. BioMart- Data mining • BioMart is a search engine that can find multiple terms and put them into a table format. • Such as: mouse gene (IDs), chromosome and base pair position • No programming required!

  4. General or Specific Data-Tables • All the genes for one species • Or… only genes on one specific region of a chromosome • Or… make BioMart select genes (I.e. all transcripts that match a microarry probe set, GO term, or InterPro domain).

  5. Results Tables or sequences

  6. The First Step: Choose the Dataset Dataset: Current Ensembl, Human genes

  7. The Second Step: Filters Filters: Define a gene set

  8. Attributes attach information Attributes: Determine output columns

  9. Query For the human CFTR gene, export the Entrez Gene ID(s) and matching Affy HG U133-PLUS-2 probeset(s)

  10. Query: For the human CFTR gene, export the Entrez Gene ID(s) and matching Affy HG U133-PLUS-2 probeset(s) • In the query: Filters: what we know Attributes: what we want to know.

  11. Query: For the human CFTR gene, export the Entrez Gene ID(s) and matching Affy HG U133-PLUS-2 probeset(s) • In the query: Filters: what we know Attributes: what we want to know.

  12. Query: For the human CFTR gene, export the Entrez Gene ID(s) and matching Affy HG U133-PLUS-2 probeset(s) • In the query: Filters: what we know Attributes: what we want to know

  13. A Brief Example Use the current Ensembl (archives are also available) Select Homo sapiens genes

  14. Select the Genes with Filters Click Filters Expand the ‘GENE’ panel. Expand the GENE panel to enter in the gene ID(s).

  15. Filters (and Count) Change this to HGNC curated name. Enter “CFTR” in the box. Click “Count” to see if genes passed through your filters.

  16. Attributes (Output Options) Click on ‘Attributes’ ‘Attributes’ allows you to output information.

  17. Attributes (Output Options) Select ‘EntrezGene ID’

  18. Attributes (Output Options) Select the Affy Platform ‘HG U133-PLUS-2’ in the ‘Microarray’ section

  19. The Results Table - Preview For the full result table: click “Go” or View “ALL” rows.

  20. Full Result Table Ensembl Transcript IDs Affy HG probeset EntrezGene ID Ensembl Gene ID for CFTR

  21. Other Export Options (Attributes) • Sequences: UTRs, flanking sequences, cDNA and peptides, etc • Gene IDs from Ensembl and external sources (MGI, Entrez, etc) • Microarray data • Protein Functions/descriptions (Interpro, GO) • Orthologous gene sets • SNP/ Variation Data

  22. BioMart around the world… BioMart started at Ensembl…To where has it travelled?

  23. Central Portal www.biomart.org

  24. WormBase

  25. Population frequencies Inter- population comparisons Gene annotation HapMap

  26. DictyBase

  27. GRAMENE www.gramene.org

  28. The Potato Center

  29. How to Get There http://www.biomart.org/biomart/martview http://www.ensembl.org/biomart/martview • Or click on ‘BioMart’ from Ensembl

  30. Worked Example • Follow the worked example on pg 26 • Then, do the exercises on pg 34 (answers on pg 37) This module should do the following: • Show you how to export multiple data types from Ensembl for gene IDs or chromosomal regions.

  31. Ensembl Core Databases Relational Database Normalised Each data point stored only once Therefore: Quick updates Minimal storage requirements But: Many tables Many joins for complicated queries Slow for data mining applications

  32. Normalised Schema

  33. BioMart Database Data warehouse De-normalised Query-optimised Therefore: Fast and flexible Ideal for data mining But: Tables with apparent “redundancy” Needs rebuilding from scratch for every release from normalised core databases

  34. De-Normalised Schema

  35. Information Flow REGION REGION GENE GENE EXPRESSION EXPRESSION HOMOLOGY HOMOLOGY PROTEIN PROTEIN SNP SNP SPECIES FOCUS SWISSPROT FASTA EMBL GTF REFSEQ HTML GO TEXT INTERPRO EXCEL AFFYMETRIX FILE DATASET FILTER ATTRIBUTES

More Related