Energy Issues in Data Analytics

Domenico Talia, Carmela Comito, Università della Calabria & CNR-ICAR, Italy, talia@deis.unical.it

Motivations for Taking Care of Data • Data is everywhere (Big, complex, real-time, unstructured) • Putting data at the center of research work on energy issues may bring some benefits. (Today the focus is on algorithms). • Cost metrics of data management techniques (communication, storing, access, query, analysis) will help professionals and users to save energy in data-intensive apps. • Energy-scalable data management is important for sustainable data science.

Data Availability or Data Deluge? • Every life process today is data intensive. • The information stored in digital data archives is enormous and its size is still growing very rapidly.

Data Availability or Data Deluge? • Some decades ago the main problem was the shortage of information; now the challenge is • the very large volume of information to deal with and • the associated complexity of processing it and extracting significant and useful parts or summaries.

Complex Big Problems… • Bigger and more complex problems must be solved by using large-scale distributed computing systems. • DATA SOURCES are larger and larger and ubiquitous (Web, sensor networks, mobile devices, telescopes, …).

…and Big Data • Even where accessible, much data in many fields cannot be read by humans, so • The huge amount of data available today requires smart data analysis techniques to help people deal with it and • Scalable algorithms, techniques, and systems are needed (time and energy scalability).

Data: From Storing to Analysis • Storing data is not the only main problem. • A key issue is to analyse, mine, and process data to make it useful. Source: The Economist

Towards Models for Energy-aware Data Management • The main focus today is on energy-aware algorithms, tasks, applications. • The other side of the coin is data and the costs of operating on it. • Abstract energy-cost models for exchanging, accessing and transforming data are primary elements for energy-aware data management at large scale. • They are useful for sustainable data science.
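One way to picture an abstract energy-cost model of the kind mentioned on this slide is a simple linear estimate over the three operation classes it names (exchanging, accessing, transforming). The slides do not give a model form, so the linear shape, the class name `EnergyCostModel`, and every coefficient below are assumptions made only for illustration; they are not measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyCostModel:
    """Hypothetical linear model: each operation class costs a fixed number of joules per byte."""
    joules_per_byte_exchange: float
    joules_per_byte_access: float
    joules_per_byte_transform: float

    def estimate(self, bytes_exchanged: int, bytes_accessed: int, bytes_transformed: int) -> float:
        """Return the estimated energy in joules for the given data volumes."""
        return (
            self.joules_per_byte_exchange * bytes_exchanged
            + self.joules_per_byte_access * bytes_accessed
            + self.joules_per_byte_transform * bytes_transformed
        )


# Placeholder coefficients, invented for this sketch (not taken from the slides).
model = EnergyCostModel(
    joules_per_byte_exchange=2e-6,
    joules_per_byte_access=1e-7,
    joules_per_byte_transform=5e-7,
)

# Estimated energy for moving, reading, and transforming 1 MB each.
estimated_joules = model.estimate(10**6, 10**6, 10**6)
```

In practice the coefficients would have to be fitted per device from measurements, and a real model might not be linear at all.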

An Example: Energy-aware Mining of Data • We evaluated the energy cost of analyzing data by using some well-known data mining techniques on mobile devices. • Our interest was mainly in how the energy consumed by the same technique changes when the dimensions of the data change. • Tests with different • Data set dimensions, • Attribute number, • Class number.

Data Mining Techniques • Energy characterization of data mining techniques running on mobile devices • k-means (data clustering) • J48 (data classification) • Apriori (association rules) • Common performance parameters • Number of instances (data set size) • Number of attributes • Algorithm-specific performance parameters • k-means: number of clusters • J48: decision tree size • Apriori: Number of rules, minimum support and minimum confidence
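The parameter sweep described above (varying the number of instances and the number of clusters for k-means) can be sketched as a small harness. This is not the authors' setup: the data is synthetic, the k-means is a plain pure-Python Lloyd's iteration, and CPU time from `time.process_time` is used only as a stand-in for energy, since the slides do not say how energy was measured on the devices. All function names are hypothetical.

```python
import random
import time


def kmeans(points, k, iterations=10, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples; returns the k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean distance).
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster; keep the old one if the cluster is empty.
        centroids = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


def synthetic_data(n_instances, n_attributes, seed=0):
    """Uniform random points; a stand-in for a real data set."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(n_attributes)) for _ in range(n_instances)]


def sweep(instance_counts, cluster_counts, n_attributes=4):
    """Run k-means for every (instances, clusters) pair and record CPU time as a cost proxy."""
    results = []
    for n in instance_counts:
        data = synthetic_data(n, n_attributes)
        for k in cluster_counts:
            start = time.process_time()
            kmeans(data, k)
            elapsed = time.process_time() - start
            results.append({"instances": n, "clusters": k, "cpu_seconds": elapsed})
    return results


results = sweep(instance_counts=[200, 400], cluster_counts=[2, 4])
```

On a real device the `cpu_seconds` field would be replaced by a reading from whatever energy counter or external meter is available.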

k-means (1) • Increasing the number of instances, with different numbers of produced clusters

k-means (2) • Increasing the number of attributes, with different numbers of produced clusters

Apriori (1) • Increasing the number of instances with different numbers of attributes

Apriori (2) • Increasing the data set size with different numbers of rules

Apriori (3) • Increasing the data set size with different minimum confidence
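For readers unfamiliar with the Apriori parameters varied in these tests, minimum support and minimum confidence are thresholds on two standard measures of an association rule. The sketch below computes both on a toy transaction list invented for illustration; it is not the Census_disc data and it is not a full Apriori implementation.

```python
# Toy transactions, made up for this example.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of (antecedent and consequent together) divided by support of the antecedent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


# Rule "bread -> milk": both items appear together in 2 of 4 transactions,
# and bread alone appears in 3 of 4.
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)


def passes(min_support, min_confidence):
    """True if the rule meets both thresholds."""
    return rule_support >= min_support and rule_confidence >= min_confidence
```

Raising the minimum confidence from 0.6 to 0.7 would drop this rule, which is why the threshold affects how many rules the algorithm keeps and, plausibly, how much work it does.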

J48 • Increasing the number of instances with different numbers of attributes

Results on different devices • Results obtained with different smart phones • Sony Xperia P: 1 GHz Dual Core ARM processor and 1 GB RAM • HTC Hero: 528 MHz Qualcomm processor and 288 MB RAM

Results on different devices • Results obtained with different smart phones • Sony Xperia P: 1 GHz Dual Core ARM processor and 1 GB RAM • HTC Hero: 528 MHz Qualcomm processor and 288 MB RAM • Samsung Galaxy ACE: 800 MHz Qualcomm processor and 512 MB RAM

Concluding Remarks • Data-intensive applications demand energy cost models based on data characteristics. • This should be done for sensors, smart phones, HPC servers, and clouds; in general, for large-scale computing systems. • Sustainable data center services and applications may benefit from these models. • Preliminary experiments show useful data.

Data Sets • Census (http://archive.ics.uci.edu/ml/datasets/Census+Income): used with k-means; data set size: 14 MB; number of instances: 244348; number of attributes: 11 • Census_disc (http://archive.ics.uci.edu/ml/datasets/Census+Income): used with Apriori; data set size: 19 MB; number of instances: 333011; number of attributes: 11 • Covertype (http://archive.ics.uci.edu/ml/datasets/Covertype): used with J48; data set size: 14.5 MB; number of instances: 114556; number of attributes: 55
