Data Science and Visualization
2014 Summer Internship - Tetherless World Constellation
Sumithra Gnanasekar
Lakshmi Chenicheri

Objective
• Visualize datasets that comply with the Minimum Information about a Marker Gene Sequence (MiMarks) standard
• A dark data exercise

MiMarks
• A standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences
• Describes the environment from which the sample was taken
• Ensures contextual data is collected and submitted

Datasets
• Two datasets from a bacterial diversity study from the Western English Channel
• Focused on the seasonal structure of microbial communities
• Dataset 1 was converted from Excel to CSV
• Dataset 2 was converted from SRA to CSV
• Data cleaning was undertaken to retrieve relevant fields

Tools for Visualization
• R
• Google Charts integrated with R
• Shiny (RStudio)
• D3.js
D3.js was ultimately chosen for its flexibility and the range of visualizations it supports.

Scatter Plot Dataset 1
• Allows the user to filter fields
• Drill and expand
• Group based on fields
• Handy in determining correlations between variables

Analysis of Scatter Plot Dataset 1
• Depth, density, total_Depth of water column, longitude and latitude were found to be independent of the other environmental variables
• Near-linear correlation between nitrate and silicate, and between nitrate and phosphate

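A near-linear relationship like the ones noted for Dataset 1 can also be checked numerically with a Pearson correlation coefficient. The following is a minimal sketch, not the method used in the internship: it assumes a cleaned CSV with numeric columns, and the column names and sample values are made up for illustration.

```python
import csv
import io
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


def column_correlation(csv_text, col_x, col_y):
    """Correlate two numeric columns, skipping rows where either value is missing."""
    xs, ys = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            x, y = float(row[col_x]), float(row[col_y])
        except (KeyError, ValueError, TypeError):
            continue  # missing or non-numeric cell: leave the row out
        xs.append(x)
        ys.append(y)
    return pearson(xs, ys)


# Hypothetical sample rows (NOT values from the study); the last row has a
# missing phosphate value and is skipped.
SAMPLE = """nitrate,phosphate
1,0.1
2,0.2
3,0.35
4,0.4
5,
"""

if __name__ == "__main__":
    r = column_correlation(SAMPLE, "nitrate", "phosphate")
    print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 or -1 indicates a near-linear trend; values near 0 suggest the two variables are not linearly related.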
Scatter Plot Dataset 2
• Allows the user to filter fields
• Drill and expand

Analysis of Scatter Plot Dataset 2
Linear trend seen in the scatter plots of:
• Spots vs Bases
• Nitrate vs Phosphate
• Org_nitro vs Org_carb
• Temperature vs Density

Temporal Visualization
Lets the user filter values by time and analyze the effect on the other variables

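The idea behind the time filter can be sketched outside the browser as a date-range selection followed by a summary of another variable. This is an illustrative Python sketch only, not the code behind the visualization; the field names `collection_date` and `temp` and the sample records are assumptions.

```python
from datetime import date


def filter_by_date(records, start, end, date_field="collection_date"):
    """Keep records whose ISO-formatted date falls within [start, end]."""
    selected = []
    for record in records:
        when = date.fromisoformat(record[date_field])
        if start <= when <= end:
            selected.append(record)
    return selected


def mean_of(records, field):
    """Average a numeric field over the given records."""
    values = [float(record[field]) for record in records]
    if not values:
        raise ValueError("no records in the selected window")
    return sum(values) / len(values)


# Hypothetical records (NOT values from the study).
RECORDS = [
    {"collection_date": "2008-01-15", "temp": "9.0"},
    {"collection_date": "2008-04-10", "temp": "10.0"},
    {"collection_date": "2008-07-20", "temp": "16.0"},
    {"collection_date": "2008-08-05", "temp": "17.0"},
]

if __name__ == "__main__":
    summer = filter_by_date(RECORDS, date(2008, 6, 1), date(2008, 8, 31))
    print(len(summer), "records, mean temp", mean_of(summer, "temp"))
```

Narrowing or moving the date window and recomputing the summary mirrors what the interactive filter does visually.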
DOI Visualization
• Visually represents DOIs associated with data points
• On clicking a bubble, the metadata for that DOI is fetched and displayed

Bubble Chart
• Visually represents the environment data associated with each sample
• Bubble size corresponds to organism count

RDF Conversion
The RDF conversion for MiMarks-compliant datasets involves two steps:
• Construct an ontology or use an existing one
• Convert the dataset into RDF triples using CSV-to-RDF conversion tools
csv2rdf4lod is an open source tool that can be used to convert the data in a CSV file into RDF-encoded data

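To make the second step concrete, the sketch below turns CSV rows into N-Triples using only the Python standard library. It does not use or imitate csv2rdf4lod; the `http://example.org/mimarks/` namespace, the column names and the sample rows are hypothetical, and a real conversion would map columns to terms from the chosen ontology instead of generating property URIs from header names.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/mimarks/"  # hypothetical namespace, not from the source


def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(csv_text, id_column):
    """Emit one triple per non-empty cell: <sample> <property> "value" ."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}sample/{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            # Skip the identifier itself, empty cells, and unnamed extra cells.
            if column is None or column == id_column or not value:
                continue
            predicate = f"<{BASE}property/{quote(column, safe='')}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return "\n".join(lines)


# Hypothetical sample rows (NOT values from the study).
SAMPLE = "sample_id,depth,temp\nS1,10,12.5\nS2,,13.1\n"

if __name__ == "__main__":
    print(csv_to_ntriples(SAMPLE, "sample_id"))
```

Each row becomes one subject and each non-empty cell one triple, which is the basic shape that CSV-to-RDF tools refine with datatypes and ontology mappings.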
Spatio-temporal features of MiMarks, VAMPS and CoDL datasets
Some tools and visualizations that could be used for the MiMarks, VAMPS and CoDL datasets:
• Planetary.js, an open source tool, could represent the spatial features interactively
• Motion charts could show change over a period of time, with the changing quantity represented as the size of the bubble
• A calendar-based representation of values is another option if the data is continuous

Links to Visualizations
• Timeline crossfiltering visualization: http://dco.tw.rpi.edu/viz/timeline/index.html
• DOI visualization: http://dco.tw.rpi.edu/viz/doiVis/index.html
• Scatterplot visualization for Dataset 1: http://dco.tw.rpi.edu/viz/scatterPlot/demo/demo.html
• Bubble chart visualization: http://dco.tw.rpi.edu/viz/Bubblechart/bubble_dataset2/index.html
• Scatterplot visualization for Dataset 2: http://dco.tw.rpi.edu/viz/scatterplot_dataset2/demo/demo.html