Energy Issues in Data Analytics • Domenico Talia, Carmela Comito • Università della Calabria & CNR-ICAR, Italy • talia@deis.unical.it
Motivations for Taking Care of Data • Data is everywhere (big, complex, real-time, unstructured). • Putting data at the center of research on energy issues may bring benefits (today the focus is on algorithms). • Cost metrics for data management techniques (communication, storage, access, query, analysis) will help professionals and users save energy in data-intensive applications. • Energy-scalable data management is important for sustainable data science.
Data Availability or Data Deluge? • Every life process today is data intensive. • The information stored in digital data archives is enormous and its size is still growing very rapidly.
Data Availability or Data Deluge? • Some decades ago the main problem was the shortage of information; now the challenge is • the very large volume of information to deal with, and • the associated complexity of processing it and extracting significant and useful parts or summaries.
Complex Big Problems… • Bigger and more complex problems must be solved by using large-scale distributed computing systems. • DATA SOURCES are increasingly large and ubiquitous (Web, sensor networks, mobile devices, telescopes, …).
…and Big Data • Even where data is accessible, much of it in many fields cannot be read by humans, so • the huge amount of data available today requires smart data analysis techniques to help people deal with it, and • scalable algorithms, techniques, and systems are needed (time and energy scalability).
Data: From Storing to Analysis • Storing data is not the only major problem. • A key issue is how to analyse, mine, and process data to make it useful. Source: The Economist
Towards Models for Energy-aware Data Management • The main focus today is on energy-aware algorithms, tasks, and applications. • The other side of the coin is data and the cost of operating on it. • Abstract energy-cost models for exchanging, accessing, and transforming data are primary elements of energy-aware data management at large scale. • They are useful for sustainable data science.
An Example: Energy-aware Mining of Data • We evaluated the energy cost of analyzing data with some well-known data mining techniques on mobile devices. • Our main interest was how the energy consumed by the same technique changes as the dimensions of the data change. • Tests with different • data set sizes, • numbers of attributes, • numbers of classes.
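The slides do not state how energy was measured on the phones. One common approach, assumed here purely for illustration, is to sample the device's power draw at fixed intervals while an algorithm runs and integrate over time, since energy is the time integral of power (E = ∫ P dt). The sketch below applies the trapezoidal rule to power samples; the function name `energy_joules` and the readings are hypothetical, not the authors' data.

```python
def energy_joules(times_s, power_w):
    """Estimate energy (J) from power samples (W) taken at times (s).

    Hypothetical helper: integrates power over time with the trapezoidal rule.
    """
    if len(times_s) != len(power_w):
        raise ValueError("times_s and power_w must have the same length")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Area of the trapezoid between two consecutive samples.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


# Hypothetical readings: one run sampled every 0.5 s.
times = [0.0, 0.5, 1.0, 1.5, 2.0]
power = [1.2, 1.8, 1.8, 1.6, 1.2]
print(f"{energy_joules(times, power):.2f} J")  # prints "3.20 J"
```

Repeating such a measurement while varying only the number of instances, attributes, or classes is one way to obtain curves like those on the following slides.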
Data Mining Techniques • Energy characterization of data mining techniques running on mobile devices • k-means (data clustering) • J48 (data classification) • Apriori (association rules) • Common performance parameters • Number of instances (data set size) • Number of attributes • Algorithm-specific performance parameters • k-means: number of clusters • J48: decision tree size • Apriori: Number of rules, minimum support and minimum confidence
k-means (1) • Increasing the number of instances, with different numbers of produced clusters
k-means (2) • Increasing the number of attributes, with different numbers of produced clusters
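The charts behind these two slides are not reproduced here. As a rough intuition for why cost grows with instances, attributes, and clusters: each assignment pass of standard k-means (Lloyd's algorithm) compares every one of n points with each of k centroids over d attributes, about n·k·d operations per iteration. The minimal sketch below counts those per-coordinate operations; it is the textbook algorithm, not the implementation used in the experiments, and the toy points are made up.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means on a list of equal-length numeric tuples.

    Returns (centroids, labels, ops). `ops` counts per-coordinate distance
    operations: each assignment pass costs n * k * d of them.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    d = len(points[0])
    labels = None
    ops = 0
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = []
        for p in points:
            best, best_dist = 0, float("inf")
            for j, c in enumerate(centroids):
                dist = sum((p[i] - c[i]) ** 2 for i in range(d))
                ops += d
                if dist < best_dist:
                    best, best_dist = j, dist
            new_labels.append(best)
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(m[i] for m in members) / len(members) for i in range(d)]
    return centroids, labels, ops


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels, ops = kmeans(points, k=2)
print(labels, ops)
```

Doubling n, k, or d doubles the work of each assignment pass; measured energy also depends on the number of iterations, memory traffic, and the device, so this count is only a proxy.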
Apriori (1) • Increasing the number of instances with different numbers of attributes
Apriori (2) • Increasing the data set size with different numbers of rules
Apriori (3) • Increasing the data set size with different minimum confidence values
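To make the Apriori parameters concrete, here is a minimal sketch of the classic algorithm: frequent itemsets are found level by level (counting support with one pass over the data per level and pruning candidates that have an infrequent subset), and rules are then kept only if their confidence reaches the minimum. The slides list the number of rules as a parameter, which suggests an implementation that targets a rule count; this sketch instead fixes minimum support and minimum confidence and simply reports the rules that survive. The toy transactions are illustrative only.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets.

    Support is the fraction of transactions containing the itemset.
    """
    n = len(transactions)
    tsets = [frozenset(t) for t in transactions]
    current = {frozenset([i]) for t in tsets for i in t}  # 1-item candidates
    frequent = {}
    k = 1
    while current:
        # One pass over the data counts every candidate of size k.
        counts = {c: sum(1 for t in tsets if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, pruning any candidate
        # with an infrequent k-subset (the Apriori property).
        k += 1
        current = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                current.add(union)
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) triples meeting min_confidence."""
    out = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, r)):
                conf = supp / frequent[ante]  # subsets of frequent sets are frequent
                if conf >= min_confidence:
                    out.append((ante, itemset - ante, conf))
    return out


transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
freq = apriori(transactions, min_support=0.6)
print(len(freq), len(rules(freq, 0.7)), len(rules(freq, 0.8)))  # prints "8 8 1"
```

Raising the minimum confidence from 0.7 to 0.8 shrinks the rule set from eight rules to one, while the support-counting passes over the data are unchanged; the cost of those passes grows with the number of transactions.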
J48 • Increasing the number of instances with different numbers of attributes
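J48 is Weka's Java implementation of the C4.5 decision-tree learner. Much of the cost of growing such a tree comes from evaluating candidate splits: at each node, C4.5 scores attributes over the instances reaching that node, so more instances and attributes roughly mean more work. A minimal sketch of C4.5's gain-ratio criterion for nominal attributes follows; it omits numeric thresholds, pruning, and missing-value handling, and it is not the J48 code. The toy rows and labels are made up.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """C4.5-style gain ratio for splitting on nominal attribute index `attr`."""
    n = len(labels)
    groups = {}
    for row, lab in zip(rows, labels):
        groups.setdefault(row[attr], []).append(lab)
    # Information gain: class entropy minus weighted entropy after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


rows = [("sunny", "a"), ("sunny", "b"), ("rain", "a"), ("rain", "b")]
labels = ["no", "no", "yes", "yes"]
print(gain_ratio(rows, labels, 0), gain_ratio(rows, labels, 1))  # prints "1.0 0.0"
```

Attribute 0 separates the classes perfectly, while attribute 1 carries no information about them, so the tree would split on attribute 0.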
Results on different devices • Results obtained with different smart phones • Sony Xperia P: 1 GHz Dual Core ARM processor and 1 GB RAM • HTC Hero: 528 MHz Qualcomm processor and 288 MB RAM • Samsung Galaxy ACE: 800 MHz Qualcomm processor and 512 MB RAM
Concluding Remarks • Data-intensive applications demand energy cost models based on data characteristics. • This should be done for sensors, smartphones, HPC servers, and clouds; in general, for large-scale computing systems. • Sustainable data center services and applications may benefit from these models. • Preliminary experiments yield useful data.
Data Sets • Census (http://archive.ics.uci.edu/ml/datasets/Census+Income) • Used with k-means • Data set size: 14 MB • Number of instances: 244348 • Number of attributes: 11 • Census_disc (http://archive.ics.uci.edu/ml/datasets/Census+Income) • Used with Apriori • Data set size: 19 MB • Number of instances: 333011 • Number of attributes: 11 • Covertype (http://archive.ics.uci.edu/ml/datasets/Covertype) • Used with J48 • Data set size: 14.5 MB • Number of instances: 114556 • Number of attributes: 55