1 / 39

Data Mining in Ensembl with BioMart

Data Mining in Ensembl with BioMart. Simple Text-based Search Engine. ‘Mouse Gene’ Gives Us Results. A More Complex Query is Not as Useful. BioMart- Data mining. BioMart is a search engine that can find multiple terms and put them into a table format.

apu
Download Presentation

Data Mining in Ensembl with BioMart

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Mining in Ensembl with BioMart

  2. Simple Text-based Search Engine

  3. ‘Mouse Gene’ Gives Us Results

  4. A More Complex Query is Not as Useful

  5. BioMart- Data mining • BioMart is a search engine that can find multiple terms and put them into a table format. • Such as: human gene (IDs), chromosome and base pair position • No programming required!

  6. General or Specific Data-Tables • All the genes for one species • Or… only genes on one specific region of a chromosome • Or… genes on one region of a chromosome associated with a disease

  7. BioMart Data Sets • Ensembl genes • Vega genes • SNPs • Markers • Phenotypes • Gene expression information • Gene ontology • Homology predictions • Protein annotation

  8. Web Interface WithBioMart, quickly extract gene-associated information from the Ensembl databases.

  9. Information Flow • Choose the species of interest (Dataset) • Decide what you would like to know about the genes (Attributes) (sequences, IDs, description…) • Decide on a smaller geneset using Filters. (enter IDs, choose a region …)

  10. Web Interface Choose the species of interest Choose what information to view. Choose the gene set using what we know. Three main stages: Dataset, Attributes and Filters.

  11. The First Step: Choose the Dataset Homo sapiens genes are the default.

  12. The Second Step: Attributes Four output pages. Attributes are what we want to know about the genes.

  13. TheSNP Attribute Page Output variation information such as SNP reference ID and alleles.

  14. Filters Allow Gene Selection Choose the gene set by region, gene ID(s), protein/domain type.

  15. Export Sequence or Tables Genes and attributes are exported as sequence (Fasta format) or tables.

  16. Query: • For all mouse genes on chromosome 10 that are protein coding, I would like to know the IDs in both Ensembl and MGI. • In the query: Attributes: what we want to know. Filters: what we know

  17. Query: • For all mouse genes on chromosome 10 that are protein coding, I would like to know the IDs in both EnsemblandMGI. • In the query: Attributes: what we want to know. Filters: what we know

  18. Query: • For all mouse genes on chromosome10 that are protein coding, I would like to know the IDs in both Ensembl and MGI. • In the query: Attributes: what we want to know. Filters: what we know

  19. A Brief Example Change dataset to mouse Mus musculus

  20. A Brief Example Dataset has changed.

  21. Attributes (Output Options) Click Attributes. Click on ‘GENE’. Attributes allow us to choose what we wish to know. IDs are found in the ‘Features’ page.

  22. Attributes (Output Options) Ensembl Gene ID is selected Default options selected: Ensembl Gene ID and Transcript ID

  23. Attributes (Output Options) ‘Markersymbol ID’ will give us the MGI ID Scroll down to select MGI symbol. Also select the accession number.

  24. The Results Table ‘Results’ give us Gene IDs for all mouse genes in the Ensembl database.

  25. Select a Smaller Gene Set Expand the REGION panel Select ‘Filters’ Instead of all mouse genes, select protein coding genes on chromosome 10.

  26. Select Genes on Chromosome 10 Select chromosome 10 Instead of all mouse genes, select protein coding genes on chromosome 10.

  27. Select Protein Coding Genes Gene type: protein coding Filters are set to chromosome 10 and protein-coding genes. Genes must meet BOTH criteria to be in the result table.

  28. Results (Preview) For the full result table: Go This is a preview- if you are happy with the table, click ‘Go’.

  29. Full Result Table Transcript ID MGI symbol MGI Accession Number Ensembl Gene ID

  30. Original Query: • For all mouse genes on chromosome 10 that are protein coding, I would like to know the IDs in both Ensembl and MGI. • In the query: Attributes: columns in the ResultTable Filters: what we know

  31. Other Export Options (Attributes) • Sequences: UTRs, flanking sequences, cDNA and peptides, etc • Gene IDs from Ensembl and external sources (MGI, Entrez, etc.) • Microarray data • Protein Functions/descriptions (Interpro, GO) • Orthologous gene sets • SNP/ Variation Data

  32. Central Server www.biomart.org

  33. WormBase

  34. Population frequencies Inter- population comparisons Gene annotation HapMap

  35. DictyBase

  36. Uniprot, MSD

  37. GRAMENE Rice, Maize, Arabidopsis genomes…

  38. How to Get There • Eitherwww.biomart.org/biomart/martview • Or click on ‘BioMart’ from Ensembl

  39. Arek Kasprzyk Benoît Ballester Syed Haider Richard Holland Damian Smedley Thanks Q & A

More Related