Parallel Analysis of EEG Data with Hadoop on FutureGrid

Project Member: Rewati Ovalekar
Project Guides: Gregor von Laszewski, Lizhe Wang

BACKGROUND

Importance of EEG Data:
• Used in detecting and diagnosing brain-related disorders
• The Ensemble Empirical Mode Decomposition (EEMD) algorithm was developed to analyze the signals

Drawbacks of EEG Data:
• EEG signals are complex in nature
• Analysis of EEG signals is highly data-intensive and compute-intensive
• The basic EEMD algorithm is not time-efficient

Parallel EEMD for EEG analysis
• The EEMD algorithm was modified to analyze data points in parallel
• Parallelization is possible at multiple levels:
  • Epoch Level
  • Trial Level
  • Data Channel Level

Epoch Level: A single data point (one epoch) is considered and processed independently. The output from one instance is not consumed by another.

Trial Level: Each epoch can be split into a number of trials. Decomposition of each trial is performed independently. The results of all trials for a particular epoch are then combined to produce the output for that epoch.

Data Channel Level: Data is parallelized at each channel, then the output is combined for its corresponding trial. The grain of parallelization is coarse at this level.
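The three levels above can be pictured as progressively finer lists of independent work units. The following is a minimal sketch only: the function names, the tuple representation, and the example sizes are hypothetical and do not come from the original project.

```python
from itertools import product


def work_units(n_epochs, n_trials, n_channels, level):
    """List the independent work units at a given parallelization level.

    Units are tuples of indices: (epoch,), (epoch, trial) or
    (epoch, trial, channel). This representation is an assumption
    made for illustration.
    """
    if level == "epoch":
        return [(e,) for e in range(n_epochs)]
    if level == "trial":
        return list(product(range(n_epochs), range(n_trials)))
    if level == "channel":
        return list(product(range(n_epochs), range(n_trials), range(n_channels)))
    raise ValueError(f"unknown level: {level}")


def combine_key(unit):
    """Key under which a unit's output is combined, one level up.

    Channel outputs combine into their trial; trial outputs combine
    into their epoch.
    """
    return unit[:-1]


# Hypothetical sizes: 2 epochs, 3 trials per epoch, 4 channels per trial.
epoch_units = work_units(2, 3, 4, "epoch")
trial_units = work_units(2, 3, 4, "trial")
channel_units = work_units(2, 3, 4, "channel")
```

With these sizes there are 2 epoch units, 6 trial units and 24 channel units, and `combine_key((1, 2, 3))` gives `(1, 2)`, the trial that channel belongs to.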

Multi-thread design
• Each thread processes the EEG data point for a particular epoch.
• Local extrema are calculated at each level.
• The local maxima and the local minima are each connected using a cubic spline
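A minimal sketch of this design, assuming each epoch is available as a plain list of samples. The names `local_extrema` and `process_epochs`, the worker count, and the sample data are hypothetical; the spline step is left out here to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor


def local_extrema(signal):
    """Return (maxima_indices, minima_indices) for the interior samples."""
    maxima, minima = [], []
    for i in range(1, len(signal) - 1):
        if signal[i - 1] < signal[i] > signal[i + 1]:
            maxima.append(i)
        elif signal[i - 1] > signal[i] < signal[i + 1]:
            minima.append(i)
    return maxima, minima


def process_epochs(epochs, max_workers=4):
    """Process every epoch in its own task; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(local_extrema, epochs))


# Two made-up epochs of five samples each.
epochs = [[0, 2, 1, 3, 0], [1, 0, 2, 0, 1]]
results = process_epochs(epochs)
```

Because no epoch depends on another, the tasks need no synchronization. Note that in CPython, threads running pure-Python numeric code mostly illustrate the structure rather than deliver a speedup, since the interpreter lock serializes them.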

Multi-thread design

Limitations of multi-threaded design
• Cannot process very large data sets because a local machine has limited resources

Solution:
• Develop a parallel EEMD algorithm using MapReduce on Hadoop

Why Hadoop?
Hadoop provides a distributed framework for running applications on large clusters
• MapReduce is used to implement the parallel EEMD algorithm

MapReduce design (epoch-level parallelization)
• Epoch Mapper:
  • Each map function takes a single data point as input
  • Calculates local extrema at the epoch level
  • Connects minima and maxima with cubic splines
  • Generates points that are combined in the Epoch Reducer
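The mapper steps can be sketched in plain Python as a generator that emits key/value pairs. This is an illustration under stated assumptions, not the project's Hadoop code: it does not use the Hadoop API, the names `epoch_mapper`, `envelope` and `natural_cubic_spline` are hypothetical, the spline is a natural cubic spline (the slides do not say which boundary condition was used), and both end samples are added as knots so that no extrapolation is needed.

```python
import bisect


def local_extrema(signal):
    """Return (maxima_indices, minima_indices) for the interior samples."""
    maxima, minima = [], []
    for i in range(1, len(signal) - 1):
        if signal[i - 1] < signal[i] > signal[i + 1]:
            maxima.append(i)
        elif signal[i - 1] > signal[i] < signal[i + 1]:
            minima.append(i)
    return maxima, minima


def natural_cubic_spline(xs, ys):
    """Build a natural cubic spline through (xs, ys); xs strictly increasing."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two knots")
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    # Tridiagonal system for the second derivatives m; end rows force m = 0.
    a, b, c, d = [0.0] * n, [1.0] * n, [0.0] * n, [0.0] * n
    for i in range(1, n - 1):
        a[i], b[i], c[i] = h[i - 1], 2.0 * (h[i - 1] + h[i]), h[i]
        d[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    for i in range(1, n):  # forward elimination (Thomas algorithm)
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        d[i] -= w * d[i - 1]
    m = [0.0] * n
    m[n - 1] = d[n - 1] / b[n - 1]
    for i in range(n - 2, -1, -1):  # back substitution
        m[i] = (d[i] - c[i] * m[i + 1]) / b[i]

    def evaluate(x):
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 2)
        left, right = xs[i + 1] - x, x - xs[i]
        return ((m[i] * left ** 3 + m[i + 1] * right ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * left
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * right)

    return evaluate


def envelope(signal, extrema_idx):
    """Spline through the given extrema, sampled at every index."""
    knots = sorted(set([0, len(signal) - 1] + extrema_idx))  # assumption: ends are knots
    spline = natural_cubic_spline(knots, [signal[k] for k in knots])
    return [spline(t) for t in range(len(signal))]


def epoch_mapper(epoch_id, signal):
    """Emit (epoch_id, (t, upper, lower)) for each sample of one epoch."""
    maxima, minima = local_extrema(signal)
    upper, lower = envelope(signal, maxima), envelope(signal, minima)
    for t, (u, l) in enumerate(zip(upper, lower)):
        yield epoch_id, (t, u, l)


signal = [0, 2, 1, 3, 0, 2, 1]  # made-up epoch
pairs = list(epoch_mapper("epoch-0", signal))
```

Keying every emitted point by the epoch identifier is what lets a later reduce step gather all points of one epoch together.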

MapReduce design (epoch-level parallelization)
• Epoch Reducer:
  • Each reduce function combines the points belonging to the same EEG data point
  • Generates the output for an individual EEG data point: 8 IMFs (intrinsic mode functions) and one residual ("left" data)
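The grouping behaviour of the reducer can be sketched as follows. The slides do not specify how the points are combined, so this sketch makes an assumption: the mapper is taken to emit hypothetical `(epoch_id, (t, upper, lower))` pairs, and the reducer computes the mean of the upper and lower envelopes, which is the quantity that EMD sifting subtracts from the signal. The full production of 8 IMFs and a residual is not reproduced here.

```python
from collections import defaultdict


def shuffle(pairs):
    """Group mapper output by key, as the MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def epoch_reducer(epoch_id, values):
    """Combine all points of one epoch into a mean-envelope series.

    Values may arrive in any order, so they are sorted by sample index t.
    """
    ordered = sorted(values)
    return epoch_id, [(upper + lower) / 2 for _, upper, lower in ordered]


# Made-up mapper output for two epochs, deliberately out of order.
pairs = [
    ("e0", (1, 4.0, 0.0)),
    ("e1", (0, 2.0, 2.0)),
    ("e0", (0, 3.0, 1.0)),
]
reduced = dict(epoch_reducer(key, values) for key, values in shuffle(pairs).items())
```

Each key is reduced independently of the others, so reducers for different epochs can run on different nodes.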

MapReduce design (epoch-level parallelization)

Performance analysis of original algorithm

Performance analysis of EEMD algorithm on Hadoop
• Analyzed the same data set while varying the number of nodes in the cluster

Performance analysis of EEMD algorithm on Hadoop
• Analyzed a large data set with the number of nodes held constant
• Varied the number of epochs processed at a time

CONCLUSION:
• The new Hadoop EEMD performs better than the original algorithm when analyzing large data sets
• For better results on large data sets, set the number of mappers (i.e. the number of epochs processed at a time) to approximately double the number of nodes available in the cluster
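As a small worked example of the rule of thumb above, the sketch below splits a list of epochs into batches of about twice the node count. The function name, the `factor` parameter, and the sizes are hypothetical; the factor of 2 is the only value taken from the conclusion.

```python
def epoch_batches(epoch_ids, n_nodes, factor=2):
    """Split epochs into batches of about factor * n_nodes epochs each.

    One batch corresponds to the epochs processed at a time, i.e. the
    number of mappers running together.
    """
    size = max(1, factor * n_nodes)
    return [epoch_ids[i:i + size] for i in range(0, len(epoch_ids), size)]


# Hypothetical cluster of 2 nodes and 10 epochs: batches of 4 epochs.
batches = epoch_batches(list(range(10)), n_nodes=2)
```

Here the 10 epochs are processed in three rounds of 4, 4 and 2 epochs.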

Thank you!!!!
