
Data Science and Visualization

2014 Summer Internship - Tetherless World Constellation. Sumithra Gnanasekar, Lakshmi Chenicheri.



Presentation Transcript


  1. Data Science and Visualization
     2014 Summer Internship - Tetherless World Constellation
     Sumithra Gnanasekar, Lakshmi Chenicheri

  2. Objective
     • Visualize datasets compliant with Minimum Information about a Marker Gene Sequence (MiMarks)
     • A dark data exercise

  3. MiMarks
     • A standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences
     • Describes the environment from which the sample was taken
     • Ensures that contextual data is collected and submitted

  4. MiMarks Checklist

  5. Datasets
     • Two datasets from a bacterial diversity study in the Western English Channel
     • The study focused on the seasonal structure of microbial communities
     • Dataset 1 was converted from Excel to CSV
     • Dataset 2 was converted from SRA to CSV
     • The data was cleaned to retain only the relevant fields
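
The cleaning step is not described in detail on the slide. As a minimal sketch of what "retaining relevant fields" could look like, the Python below keeps a fixed set of columns from a CSV and drops rows where any of them is empty. The column names and sample rows are hypothetical, since the slides do not list the actual headers.

```python
import csv
import io

# Hypothetical column names; the real dataset headers are not given in the slides.
RELEVANT_FIELDS = ["sample_id", "collection_date", "depth", "nitrate", "phosphate"]


def clean_rows(csv_text, fields=RELEVANT_FIELDS):
    """Keep only the listed columns and drop rows missing any of them."""
    reader = csv.DictReader(io.StringIO(csv_text))
    cleaned = []
    for row in reader:
        kept = {name: (row.get(name) or "").strip() for name in fields}
        if all(kept.values()):  # every relevant field must be non-empty
            cleaned.append(kept)
    return cleaned


# Invented sample data, for illustration only.
SAMPLE = """sample_id,collection_date,depth,nitrate,phosphate,notes
S1,2007-01-15,10,7.2,0.45,ok
S2,2007-02-12,10,,0.40,missing nitrate
S3,2007-03-19,10,5.1,0.33,
"""

rows = clean_rows(SAMPLE)
```

Here row S2 is dropped because its nitrate value is missing, while S3 is kept because the empty `notes` column is not among the relevant fields.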

  6. Tools for Visualization
     • R
     • Google Charts integrated with R
     • Shiny (RStudio)
     • D3.js
     D3.js was ultimately chosen for its flexibility and the range of visualizations it offers.

  7. Scatter Plot Dataset 1
     • Allows the user to filter fields
     • Drill down and expand
     • Group based on fields
     • Useful for identifying correlations between variables

  8. Analysis of Scatter Plot Dataset 1
     • Depth, density, total_Depth of water column, longitude and latitude were found to be independent of the other environmental variables
     • Near-linear correlation between nitrate and silicate, and between nitrate and phosphate
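
The slides report these relationships from visual inspection of the scatter plots. One way to quantify a "near-linear" relationship is the Pearson correlation coefficient; the sketch below computes it in plain Python. The nitrate and silicate values are invented for illustration and are not taken from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented values, for illustration only.
nitrate = [1.0, 2.0, 3.0, 4.0, 5.0]
silicate = [2.1, 3.9, 6.2, 8.0, 9.8]

r = pearson(nitrate, silicate)
```

A coefficient close to 1 or -1 indicates a near-linear relationship; values near 0 are consistent with the variables being linearly unrelated.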

  9. Scatter Plot Dataset 2
     • Allows the user to filter fields
     • Drill down and expand

  10. Analysis of Scatter Plot Dataset 2
     A linear trend is seen in the scatter plots of:
     • Spots vs Bases
     • Nitrate vs Phosphate
     • Org_nitro vs Org_carb
     • Temperature vs Density

  11. Temporal Visualization
     Lets the user filter values by time and examine the effect on the other variables.
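
The original visualization performs this filtering interactively in the browser. As a rough illustration of the underlying idea only, the Python sketch below selects the records that fall inside a date window, so that the remaining variables can be summarized for that window. The record fields and values are hypothetical.

```python
from datetime import date


def filter_by_date(records, start, end):
    """Return the records whose 'date' lies in the inclusive range [start, end]."""
    return [r for r in records if start <= r["date"] <= end]


# Invented records, for illustration only.
records = [
    {"id": "S1", "date": date(2007, 1, 15), "temperature": 9.8},
    {"id": "S2", "date": date(2007, 4, 10), "temperature": 10.6},
    {"id": "S3", "date": date(2007, 7, 23), "temperature": 15.9},
    {"id": "S4", "date": date(2007, 10, 8), "temperature": 14.2},
]

summer = filter_by_date(records, date(2007, 6, 1), date(2007, 8, 31))
first_half = filter_by_date(records, date(2007, 1, 1), date(2007, 6, 30))
```

Each time the window changes, any dependent chart would be redrawn from the filtered list.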

  12. DOI Visualization
     • Visually represents the DOIs associated with data points
     • On clicking a bubble, the metadata for that DOI is fetched and displayed
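
The slides do not say how the metadata is fetched. One common approach, assumed here and not confirmed by the source, is DOI content negotiation: requesting `https://doi.org/<DOI>` with an `Accept` header that asks for CSL JSON. The sketch below builds such a request with Python's standard library; the DOI shown is a placeholder, not one from the datasets.

```python
import json
import urllib.parse
import urllib.request

CSL_JSON = "application/vnd.citationstyles.csl+json"


def build_doi_request(doi):
    """Build a content-negotiation request for a DOI (assumed approach)."""
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": CSL_JSON})


def fetch_doi_metadata(doi, timeout=10):
    """Fetch and parse the metadata; requires network access."""
    with urllib.request.urlopen(build_doi_request(doi), timeout=timeout) as resp:
        return json.load(resp)


# Placeholder DOI, for illustration only; no network call is made here.
req = build_doi_request("10.1234/example")
```

Calling `fetch_doi_metadata` with a registered DOI would return a dictionary whose available fields depend on the registration agency.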

  13. Bubble Chart
     • Visually represents the environment data associated with each sample
     • Bubble size corresponds to organism count
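
The slide does not specify how counts map to bubble size. A common convention, assumed here, is to make the bubble's area rather than its radius proportional to the value, which means scaling the radius by the square root of the count. A minimal Python sketch with invented counts:

```python
import math


def bubble_radii(counts, max_radius=40.0):
    """Scale radii so that bubble area is proportional to count (assumed convention)."""
    largest = max(counts, default=0)
    if largest <= 0:
        return [0.0 for _ in counts]
    return [max_radius * math.sqrt(c / largest) for c in counts]


# Invented organism counts, for illustration only.
radii = bubble_radii([0, 25, 100])
```

With this scaling a sample with a quarter of the largest count gets half the largest radius, so its drawn area is a quarter of the largest bubble's.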

  14. RDF Conversion
     The RDF conversion of MiMarks-compliant datasets involves two steps:
     • Construct an ontology or use an existing one
     • Convert the dataset into RDF triples using a CSV-to-RDF conversion tool
     csv2rdf4lod is an open source tool that can be used to convert the data in a CSV file into RDF.
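
To make the second step concrete, the sketch below turns CSV rows into N-Triples lines by hand. It does not use csv2rdf4lod and is not its output format in detail; the base namespace, the vocabulary IRIs and the sample row are all hypothetical, and every value is emitted as a plain string literal. It also assumes that identifiers and column names are already safe to embed in an IRI.

```python
import csv
import io

# Hypothetical namespace; a real conversion would use terms from the chosen ontology.
BASE = "http://example.org/mimarks/"


def literal(value):
    """Format a value as an N-Triples string literal with basic escaping."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def csv_to_ntriples(csv_text, id_column):
    """Emit one triple per non-empty cell, keyed by the row's identifier."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        subject = f"<{BASE}sample/{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # skip the identifier itself and empty cells
            predicate = f"<{BASE}vocab/{column}>"
            lines.append(f"{subject} {predicate} {literal(value)} .")
    return lines


# Invented sample row, for illustration only.
triples = csv_to_ntriples("sample_id,depth,env_biome\nS1,10,marine\n", "sample_id")
```

Each row becomes a subject, each column a predicate and each cell an object; mapping columns to ontology terms and typed literals is what the first step (choosing an ontology) would supply.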

  15. Spatio-temporal features of MiMarks, VAMPS and CoDL datasets
     Some tools and techniques that can be used to visualize the MiMarks, VAMPS and CoDL datasets:
     • Planetary.js, an open source tool, can represent the spatial features interactively
     • Motion charts can show change over time, with the quantity of interest represented by the size of each bubble
     • A calendar-based representation of values is another option if the data is continuous

  16. Links to Visualizations
     • Timeline crossfiltering visualization: http://dco.tw.rpi.edu/viz/timeline/index.html
     • DOI visualization: http://dco.tw.rpi.edu/viz/doiVis/index.html
     • Scatterplot visualization for Dataset 1: http://dco.tw.rpi.edu/viz/scatterPlot/demo/demo.html
     • Bubble chart visualization: http://dco.tw.rpi.edu/viz/Bubblechart/bubble_dataset2/index.html
     • Scatterplot visualization for Dataset 2: http://dco.tw.rpi.edu/viz/scatterplot_dataset2/demo/demo.html
