Ethics of Big Data
Eduardo Felipe Zecca da Cruz
What is Big Data?
• Stamford, Conn.-based IT research firm Gartner Inc. defines "big data" as "high-volume, velocity and/or variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making and process automation."
• Other experts point out that big data might include unstructured textual information from social media sites, machine-generated log data and a host of other information collected by cloud applications, on-premises applications and websites.
Numbers
• Stored information totaled 0.8 zettabytes, the equivalent of 800 billion gigabytes, in 2009.
• IDC predicts that by 2020, 35 zettabytes of information will be stored globally.
• The average price of an untargeted online ad was $1.98 per thousand views.
• The average price of a targeted ad was $4.12 per thousand views.
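The figures above imply a few simple ratios, worked out in the short Python sketch below. The variable names are illustrative and are not from the original slides; the only inputs are the four numbers quoted above, and decimal (SI) units are assumed for the zettabyte-to-gigabyte conversion.

```python
# Figures quoted on the "Numbers" slide; variable names are illustrative.
STORED_2009_ZB = 0.8       # zettabytes stored in 2009
PREDICTED_2020_ZB = 35.0   # IDC prediction for 2020
GB_PER_ZB = 10**12         # assumes decimal units: 1 ZB = 10^21 bytes = 10^12 GB

UNTARGETED_CPM = 1.98      # USD per thousand views
TARGETED_CPM = 4.12        # USD per thousand views

# Convert the 2009 figure to gigabytes to check the "800 billion GB" claim.
stored_2009_gb = STORED_2009_ZB * GB_PER_ZB

# How much larger the 2020 prediction is than the 2009 total.
growth_factor = PREDICTED_2020_ZB / STORED_2009_ZB

# How much more a targeted ad costs than an untargeted one.
targeting_premium = TARGETED_CPM / UNTARGETED_CPM

print(f"2009 storage: {stored_2009_gb:.3g} GB")
print(f"Predicted growth 2009-2020: {growth_factor:.2f}x")
print(f"Targeted-ad price premium: {targeting_premium:.2f}x")
```

Under these assumptions the prediction amounts to a 43.75-fold increase in stored data, and a targeted ad sells for roughly 2.08 times the price of an untargeted one, which is the economic incentive behind the data collection discussed in the following slides.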
Concerns
• Privacy Issues
  • Users could feel monitored in every single action they take on the Internet
• Examples
  • Target Case
    • Sent baby-product coupons to a teenager after analyzing her shopping patterns
    • Revealed to her family that she was pregnant
  • Travel Agency Orbitz Case
    • Began showing Apple users pricier travel options after data analysis revealed that they are generally willing to pay more for travel
Data Brokers
• A data broker is a business that collects personal information about consumers and sells it to other organizations
• Google, Facebook, Amazon and Microsoft collect the most private information
Hadoop
• Open-source software framework for distributed storage and processing of large data sets across clusters of computers
• Created in 2005
• Licensed under the Apache License 2.0
• Inspired by Google's MapReduce and Google File System papers
Hadoop Architecture
• It consists of:
  • Hadoop Common package
    • Provides filesystem and OS-level abstractions
    • Contains the Java Archive (JAR) files and scripts needed to start Hadoop
  • Hadoop Distributed File System (HDFS)
  • Hadoop YARN
  • Hadoop MapReduce
    • Programming model for processing large data sets with a parallel, distributed algorithm on a cluster
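To make the MapReduce programming model more concrete, here is a minimal single-process sketch of the classic word-count example in Python. It does not use Hadoop or any Hadoop API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and only illustrate the three stages that a real cluster would run in parallel across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence (a cluster would parallelize them)."""
    mapped = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data needs ethics", "big data needs rules"]
    print(word_count(sample))
```

Because each call to map_phase depends only on its own record and each call to reduce_phase depends only on one key, both stages can be distributed across cluster nodes without coordination beyond the shuffle step, which is the property the model relies on.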
Prominent Users
• Yahoo!
  • Yahoo Search Webmap
• Facebook
  • Claimed to have the largest Hadoop cluster in the world
• As of 2013, more than half of the Fortune 50 used Hadoop
Proposals
• Commercial Privacy Bill of Rights Act of 2011
  • A bill to establish a regulatory framework for the comprehensive protection of personal data for individuals under the aegis of the Federal Trade Commission, and for other purposes.
• Do-Not-Track Online Act of 2011
  • A bill to require the Federal Trade Commission to prescribe regulations regarding the collection and use of personal information obtained by tracking the online activity of an individual, and for other purposes.
Proposals
• Clarity on Practices
  • Let users know when data is being collected
• Simplicity of Settings
• Privacy by Design
• Exchange of Value