A computational tool for depth-based Statistical analysis

Eynat Rafalin, Tufts University Computer Science Department

The tool
• An easy-to-use, efficient, and extensible interface for statistical research based on the notion of data depth
• Designed for scientists with no computer science background

Our goal
• Present the tool to the community
• Code/software available on request
• Run the tool on real data
• Get feedback:
  ◦ Is such a tool needed?
  ◦ What additions/improvements are needed?

General
• C++-based software (no additional tools/software needed)
• Simple interface that allows users to:
  ◦ enter data files, sort the data points, and filter unwanted data
  ◦ perform calculations
  ◦ present the results in an easy-to-understand graphical interface
  ◦ save and output data for future use
• Fast
• Portable code

General description
[Diagram: input txt and Excel files; data filter; statistical modules; output; Geomview; contours display and selection]

Data filter
• Graphical user interface developed in C++
• Used to crop/manipulate a data set before it is fed into the statistical modules
• Fast and lightweight
• Convenient, easy-to-use user interface
• Portable code (UNIX, Solaris, Linux, Windows)

Data filter
[Figure: the data filter interface]

Statistical modules
Depth contours (2D)
• Half-space (location) depth contours
  ◦ Optimal O(n²) time
  ◦ Supports two approaches for defining contours
  ◦ Includes the Tukey median and the bagplot
  ◦ Includes contour parameters (size, etc.)
• Convex hull peeling depth contours
• Simplicial depth contours
• Tukey median computation (O(n log³ n) time)
• Locating a new point in a set of depth contours (O(log n) query time)
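To make the notion of half-space (location) depth concrete, here is a minimal brute-force sketch in C++. It is not the tool's code and not the optimal algorithm listed above: the depth of a query point q is taken as the smallest number of sample points in a closed half-plane whose boundary passes through q, and the sketch tests one normal direction between each pair of consecutive critical directions, for O(n²) time per query. The names `Point` and `halfspaceDepth` are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y; };

// Half-space (Tukey) depth of q with respect to pts: the minimum number of
// sample points in a closed half-plane whose boundary line passes through q.
// Brute force: O(n) candidate directions, O(n) counting each, O(n^2) overall.
int halfspaceDepth(const Point& q, const std::vector<Point>& pts) {
    const double kTwoPi = 2.0 * std::acos(-1.0);
    int atQ = 0;                   // points equal to q lie in every such half-plane
    std::vector<double> critical;  // normal directions at which a point lies on the boundary
    for (const Point& p : pts) {
        double dx = p.x - q.x, dy = p.y - q.y;
        if (dx == 0.0 && dy == 0.0) { ++atQ; continue; }
        double a = std::atan2(dy, dx);
        // The boundary passes through p when the normal is perpendicular to p - q.
        for (double s : {kTwoPi / 4, -kTwoPi / 4})
            critical.push_back(std::fmod(a + s + kTwoPi, kTwoPi));
    }
    if (critical.empty()) return atQ;  // every sample point coincides with q
    std::sort(critical.begin(), critical.end());
    int best = static_cast<int>(pts.size());
    for (std::size_t i = 0; i < critical.size(); ++i) {
        double next = (i + 1 < critical.size()) ? critical[i + 1] : critical[0] + kTwoPi;
        if (next - critical[i] < 1e-12) continue;  // skip duplicate directions
        // Strictly between two critical directions no point other than q lies
        // on the boundary, and the count there is never larger than at the ends.
        double mid = 0.5 * (critical[i] + next);
        double ux = std::cos(mid), uy = std::sin(mid);
        int count = atQ;
        for (const Point& p : pts)
            if (ux * (p.x - q.x) + uy * (p.y - q.y) > 0.0) ++count;
        best = std::min(best, count);
    }
    return best;
}
```

For the four corners of the unit square, the center has depth 2, each corner has depth 1, and a far-away point has depth 0.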

Approaches for defining depth contours
• P. Rousseeuw et al.
  ◦ The k-th depth contour is the boundary of the set of points in the plane with depth at least k.
• R. Liu et al. (based on order statistics)
  ◦ The sample p-th central hull is the convex hull of the most central fraction p of the sample points.
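Liu et al.'s definition translates fairly directly into code: rank the sample by depth, keep the most central fraction p, and take the convex hull. The sketch below assumes the depths are already computed (for example, half-space depths) and breaks ties at the cut-off by input order, a simplification made here; `centralHull` and `convexHull` are hypothetical names, and the hull uses Andrew's monotone chain algorithm. Rousseeuw et al.'s contours cannot be obtained this way, since the vertices of a depth-k region are in general not sample points.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

struct Point { double x, y; };

// Cross product of (a - o) and (b - o); > 0 means a counter-clockwise turn.
double cross(const Point& o, const Point& a, const Point& b) {
    return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
}

// Andrew's monotone chain: convex hull vertices in counter-clockwise order.
std::vector<Point> convexHull(std::vector<Point> p) {
    if (p.size() < 3) return p;
    std::sort(p.begin(), p.end(), [](const Point& a, const Point& b) {
        return a.x < b.x || (a.x == b.x && a.y < b.y);
    });
    std::vector<Point> h(2 * p.size());
    std::size_t k = 0;
    for (std::size_t i = 0; i < p.size(); ++i) {                // lower hull
        while (k >= 2 && cross(h[k - 2], h[k - 1], p[i]) <= 0) --k;
        h[k++] = p[i];
    }
    for (std::size_t i = p.size() - 1, t = k + 1; i-- > 0;) {   // upper hull
        while (k >= t && cross(h[k - 2], h[k - 1], p[i]) <= 0) --k;
        h[k++] = p[i];
    }
    h.resize(k - 1);  // the last point repeats the first
    return h;
}

// Sample p-th central hull: convex hull of the ceil(fraction * n) deepest points.
// depth[i] is assumed precomputed for pts[i]; ties are broken by input order.
std::vector<Point> centralHull(const std::vector<Point>& pts,
                               const std::vector<int>& depth, double fraction) {
    std::vector<std::size_t> idx(pts.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::stable_sort(idx.begin(), idx.end(),
                     [&](std::size_t a, std::size_t b) { return depth[a] > depth[b]; });
    std::size_t m = static_cast<std::size_t>(std::ceil(fraction * pts.size()));
    m = std::min(m, pts.size());
    std::vector<Point> central;
    for (std::size_t i = 0; i < m; ++i) central.push_back(pts[idx[i]]);
    return convexHull(central);
}
```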

Half-space (location) depth contours module
[Figures: depth contours for a sample set of 8 data points; depth contours for a data set describing diabetic patients]

Statistical modules (cont.)
Plots
• DD (depth vs. depth) plots
  ◦ O(n²) time
• Shrinkage plots
• Fan plots

DD (depth vs. depth) plots module
[Figure: DD plot; axes: depth according to set A, depth according to set B]
Two 2D data sets of 50 points each, drawn from normal distributions centered at (0,0) with different covariance matrices (the identity and 4 times the identity).
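A DD plot places each point x of the pooled sample A ∪ B at (depth of x with respect to A, depth of x with respect to B); when both samples come from the same distribution, the points lie close to the diagonal. The sketch below takes the depth function as a parameter and divides each depth by the sample size so that samples of different sizes are comparable. The type `DepthFn`, the function `ddPlot`, and the normalization are assumptions, not the module's interface.

```cpp
#include <functional>
#include <utility>
#include <vector>

struct Point { double x, y; };

// Any depth function D(q, sample) returning a count, e.g. half-space depth.
using DepthFn = std::function<int(const Point&, const std::vector<Point>&)>;

// DD plot coordinates for every point of the pooled sample A ∪ B (A first,
// then B). A and B are assumed non-empty.
std::vector<std::pair<double, double>> ddPlot(const std::vector<Point>& A,
                                              const std::vector<Point>& B,
                                              const DepthFn& depth) {
    std::vector<Point> pooled(A);
    pooled.insert(pooled.end(), B.begin(), B.end());
    std::vector<std::pair<double, double>> coords;
    coords.reserve(pooled.size());
    for (const Point& x : pooled) {
        double dA = static_cast<double>(depth(x, A)) / A.size();  // relative depth in A
        double dB = static_cast<double>(depth(x, B)) / B.size();  // relative depth in B
        coords.emplace_back(dA, dB);
    }
    return coords;
}
```

The running time of this sketch is dominated by the 2n calls to the depth function.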

Fan plots
[Figure: fan plots; axes: percentile of points, relative area (area of the convex hull (CH) of p% of the points divided by the area of the CH)]
50 data points drawn from a random distribution with covariance matrix 4 times the identity. Fans are created for the data sets containing the 1/6, 2/6, … central regions. For each region, the area of the CH of 2, 4, 6, …% of the points is computed.
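One fan of such a plot can be sketched as a ratio of convex-hull areas. Given the points of one central region ordered from most to least central (the ordering is assumed precomputed, e.g. by decreasing depth), the sketch computes the area of the hull of the top p% of the points divided by the area of the hull of the whole region, for p = step, 2·step, …, 100. The hull uses Andrew's monotone chain and the area the shoelace formula; `hullArea` and `fan` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct Point { double x, y; };

double cross(const Point& o, const Point& a, const Point& b) {
    return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
}

// Area of the convex hull of p (monotone chain, then the shoelace formula).
double hullArea(std::vector<Point> p) {
    if (p.size() < 3) return 0.0;
    std::sort(p.begin(), p.end(), [](const Point& a, const Point& b) {
        return a.x < b.x || (a.x == b.x && a.y < b.y);
    });
    std::vector<Point> h(2 * p.size());
    std::size_t k = 0;
    for (std::size_t i = 0; i < p.size(); ++i) {
        while (k >= 2 && cross(h[k - 2], h[k - 1], p[i]) <= 0) --k;
        h[k++] = p[i];
    }
    for (std::size_t i = p.size() - 1, t = k + 1; i-- > 0;) {
        while (k >= t && cross(h[k - 2], h[k - 1], p[i]) <= 0) --k;
        h[k++] = p[i];
    }
    double twice = 0.0;  // h[0..k-1] is closed: h[k-1] equals h[0]
    for (std::size_t i = 0; i + 1 < k; ++i)
        twice += h[i].x * h[i + 1].y - h[i + 1].x * h[i].y;
    return std::fabs(twice) / 2.0;
}

// One fan: pairs (p, area of CH of the most central p% / area of CH of all).
std::vector<std::pair<double, double>> fan(const std::vector<Point>& ranked,
                                           int stepPercent) {
    std::vector<std::pair<double, double>> curve;
    double total = hullArea(ranked);
    if (total == 0.0 || stepPercent <= 0) return curve;
    for (int pct = stepPercent; pct <= 100; pct += stepPercent) {
        auto m = static_cast<std::size_t>(std::ceil(pct / 100.0 * ranked.size()));
        m = std::min(m, ranked.size());
        std::vector<Point> top(ranked.begin(),
                               ranked.begin() + static_cast<std::ptrdiff_t>(m));
        curve.emplace_back(pct, hullArea(top) / total);
    }
    return curve;
}
```

With a step of 2, as on the slide, each fan has 50 points.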

Graphical contour selection tool
• Plots depth contours and selects data ranges
• Actions:
  ◦ Import/export
  ◦ Select points
  ◦ Depth slider
  ◦ Filter
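Behind a depth slider, selecting a data range can be as simple as filtering precomputed depths. This hypothetical helper (`selectDepthRange` is not part of the tool) returns the indices of the points whose depth lies in a closed range chosen on the slider.

```cpp
#include <cstddef>
#include <vector>

// Indices of the points whose precomputed depth lies in [minDepth, maxDepth].
std::vector<std::size_t> selectDepthRange(const std::vector<int>& depth,
                                          int minDepth, int maxDepth) {
    std::vector<std::size_t> selected;
    for (std::size_t i = 0; i < depth.size(); ++i)
        if (depth[i] >= minDepth && depth[i] <= maxDepth) selected.push_back(i);
    return selected;
}
```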

Future work
• Run the tool on existing data sets
• Distribute preliminary versions and get user feedback
• Data filter:
  ◦ Group by row/column
  ◦ Filter by row/column
  ◦ Interactions between rows/columns (addition, substitution, logical operations)
• Statistical modules:
  ◦ Implement additional modules
  ◦ Improve running times

Contributors
• Prof. Diane Souvaine
• Prof. Alva Couch
• Eynat Rafalin
• Michael Burr
• Joe Handelman
• James Hayes
• Ori Taka
• Alok Lal
• Janet Luan
• Kim Miller
• Tim Mitchell
• Nikolai Shvertner
