Big Data Privacy Issues in Public Social Media • Reporter: Ximeng Liu • Supervisor: Rongxing Lu • School of EEE, NTU • http://www.ntu.edu.sg/home/rxlu/seminars.htm
References • SOURCE: Big Data Privacy Issues in Public Social Media
Outline • Big data • Big data and the social web • Big data: privacy concerns
Big data • Any task that is comparatively easy to execute on a small but relevant set of data, but becomes unmanageable when the same problem involves a large dataset, can be classified as a Big Data problem. • Problems encountered when dealing with Big Data include capture, storage, dissemination, search, analytics and visualisation.
Big data • Traditional Big Data applications such as astronomy and other e-sciences usually operate on non-personal information and, as such, do not have significant privacy issues. • Big Data research is being used to create and analyse profiles of us, for example for market research, targeted advertising, workflow improvement or national security.
Big data • In the social web, there is increasing awareness of the value, potential and risk of the personal data that we voluntarily upload to the web. • So far, the Big Data privacy debate has focused on the controllers of Big Data sets: what they do with this information, and whether the information gleaned is used for nefarious purposes or not, is entirely up to them.
Big data meets the social web • Personal data that we voluntarily upload to the web. • Problem: how can users control who has access to what they post themselves? • The effect other people's data has on us. Example: • If a friend takes a picture of me during a volleyball game and shares it with other friends, and one of them uploads it to the web, my insurance company can find that picture and use it against me.
Big data • The number of photos uploaded to Facebook per month has risen from 2 billion to over 6 billion. • Current social networks and photo-sharing sites do little to address the privacy implications created by other users' media.
Location information • Modern devices embed geo-data and other metadata into the content they create. • The privacy issues of location information embedded in uploaded media have not yet received much attention.
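To make "embedded geo-data" concrete: in the Exif standard, a photo's position is stored as degrees/minutes/seconds rationals plus a hemisphere reference (the GPSLatitude, GPSLatitudeRef, GPSLongitude and GPSLongitudeRef tags). A minimal sketch, assuming those tags have already been read into Python as (numerator, denominator) pairs; the helper name `dms_to_decimal` and the sample values are hypothetical:

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert Exif-style (degrees, minutes, seconds) rationals to decimal degrees.

    dms: three (numerator, denominator) pairs, as stored in GPSLatitude/GPSLongitude.
    ref: 'N', 'S', 'E' or 'W', as stored in GPSLatitudeRef/GPSLongitudeRef.
    """
    degrees, minutes, seconds = (Fraction(num, den) for num, den in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # Southern latitudes and western longitudes are negative in decimal notation.
    if ref in ("S", "W"):
        value = -value
    return float(value)


# Hypothetical tag values (not taken from any real photo).
lat = dms_to_decimal([(1, 1), (20, 1), (4500, 100)], "N")
lon = dms_to_decimal([(103, 1), (40, 1), (3000, 100)], "E")
print(f"{lat:.5f}, {lon:.5f}")
```

Five decimal places of a degree correspond to roughly one metre on the ground, which is why such tags can pinpoint where a picture was taken.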
ENVIRONMENT & PROBLEM STATEMENT • A large number of privacy-preserving techniques exist to protect a user's own privacy, ranging from solutions installed locally on the user's mobile device to solutions that use online services relying on group-based anonymisation algorithms, such as mix zones or k-anonymity. • The location and other metadata contained in pictures and videos can also affect people other than the uploader.
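k-anonymity, one of the group-based techniques named above, requires every released record to be indistinguishable from at least k - 1 others on its quasi-identifiers. A minimal sketch, assuming location is the only quasi-identifier and using simple grid-based cloaking as the generalisation step; the function names, the 0.01-degree cell size and the sample photos are illustrative, not taken from any specific service:

```python
import math
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values is shared by at least k rows."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(size >= k for size in groups.values())


def cloak_location(row, cell_deg):
    """Generalise an exact (lat, lon) to the index of the grid cell that contains it."""
    lat, lon = row["location"]
    cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
    return {**row, "location": cell}


# Hypothetical geo-tagged photos (uploader identities already removed).
rows = [
    {"photo": "p1", "location": (1.3483, 103.6831)},
    {"photo": "p2", "location": (1.3490, 103.6820)},
    {"photo": "p3", "location": (1.3001, 103.8002)},
    {"photo": "p4", "location": (1.3010, 103.8010)},
]
cloaked = [cloak_location(r, cell_deg=0.01) for r in rows]
print(is_k_anonymous(rows, ["location"], k=2))     # exact positions
print(is_k_anonymous(cloaked, ["location"], k=2))  # 0.01-degree grid cells
```

Coarser cells produce larger groups, and therefore a higher achievable k, at the cost of location precision.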
Privacy issues • We categorise privacy issues into two classes. • Firstly, homegrown problems: a user uploads a piece of compromising media of himself with insufficient protection or forethought, which damages his own privacy. • This is a small data problem.
Privacy issues • Secondly, there are the Big Data problems created by others: an emerging threat to users' online privacy comes from other users' media. • The amount of data being uploaded is so vast that it cannot be inspected manually. • Also, there are currently no countermeasures to prevent others from uploading potentially damaging content about someone.
Privacy issues • There are two requirements for this form of privacy threat to have an effect. Firstly, to cause harm to a person, a piece of media must be associated or linked with that person in some way, e.g. the person is recognisable in a photo or is (hyper-)linked to it.
Privacy issues • Secondly, the piece of media in question must contain content that is harmful to the person linked to it. • The harm can also come from metadata or associated data. For instance, time and location data can indicate that a person has been at an embarrassing location, took part in a political event, or was not where he said he was.
Awareness of Damaging Media in Big Datasets • Direct linking of profiles to pictures: finding information about people. • Non-linked tagging of photos: there is no automated mechanism to inform a user that he was named in or near a piece of media.
ANALYSIS OF SERVICE PRIVACY • Flickr provides the most fine-grained privacy/access control settings of all analysed services. • A distinctive Flickr feature is the geo-fence: a predefined set of boundaries within which the visibility of a photo's location can be restricted.
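A geo-fence can be modelled as a rule that hides a photo's location from some viewers whenever the photo falls inside a boundary. A minimal sketch, assuming circular fences defined by a centre and a radius, and a single viewer group per request; the fence format, the group names and the coordinates are hypothetical and do not describe Flickr's actual API:

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in metres (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def visible_location(photo_location, fences, viewer_group):
    """Return the photo's (lat, lon) if this viewer may see it, otherwise None.

    fences: list of (centre_lat, centre_lon, radius_m, allowed_groups) tuples.
    """
    lat, lon = photo_location
    for c_lat, c_lon, radius_m, allowed in fences:
        inside = haversine_m(lat, lon, c_lat, c_lon) <= radius_m
        if inside and viewer_group not in allowed:
            return None  # photo lies inside a fence this viewer is not allowed through
    return photo_location


# Hypothetical fence around a home: only the "family" group sees locations inside it.
home_fences = [(1.3483, 103.6831, 500, {"family"})]
print(visible_location((1.3485, 103.6833), home_fences, "public"))
print(visible_location((1.3485, 103.6833), home_fences, "family"))
```

The fence only hides the location field; the image itself, and anything recognisable in it, stays visible to whoever can already see the photo.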
ANALYSIS OF SERVICE PRIVACY • Facebook uses face recognition to suggest friend tags, based on friends who have already been tagged. • Picasa Web & Google+ store metadata that is accessible to everyone who can access the image. • Locr is a geo-tagging-focused photo-sharing site: anybody who can see an image can also see its metadata.
ANALYSIS OF SERVICE PRIVACY • Instagram and PicPlz are services/mobile apps that allow posting images in a Twitter-like way. Resized images, stripped of metadata but with optional location data, are stored by the services.
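The storage policy described for these services (strip the metadata, optionally keep location) can be sketched as a whitelist filter. This assumes the metadata has already been parsed into a dict keyed by Exif tag names; the function name, the tag whitelist and the sample values are illustrative, not the services' actual code:

```python
# Hypothetical whitelist: the Exif tags that together encode a GPS position.
GPS_TAGS = {"GPSLatitude", "GPSLatitudeRef", "GPSLongitude", "GPSLongitudeRef"}


def sanitise_metadata(metadata, keep_location=False):
    """Strip every metadata field, optionally keeping only the GPS position tags."""
    if not keep_location:
        return {}
    return {tag: value for tag, value in metadata.items() if tag in GPS_TAGS}


# Hypothetical metadata parsed from a camera-phone photo.
sample = {
    "Make": "Apple",
    "Model": "iPhone 4",
    "DateTimeOriginal": "2012:05:01 10:05:00",
    "GPSLatitude": ((1, 1), (20, 1), (4500, 100)),
    "GPSLatitudeRef": "N",
    "GPSLongitude": ((103, 1), (40, 1), (3000, 100)),
    "GPSLongitudeRef": "E",
}
print(sanitise_metadata(sample))
print(sanitise_metadata(sample, keep_location=True))
```

When the location option is enabled, the one field that is kept is precisely the one that links a photo to a place.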
SURVEY OF METADATA IN SOCIAL MEDIA • A set of 20,000 publicly available Flickr images and their metadata was analysed; 23% of the 20k users denied access to their extracted EXIF data in the Flickr database. • A second set of 3,000 camera-phone images from 3k random mobile Flickr users was analysed: 46.8% of these mobile users were Pro users, and only 2% denied access to EXIF data in the Flickr database.
SURVEY OF METADATA IN SOCIAL MEDIA • GPS location data was present in 19% of the 20k dataset and in 34% of the 3k mobile-phone dataset. • The iPhone 4 is currently the most common camera on Flickr. • Reverse geocoding is becoming more common in client applications.
SURVEY OF METADATA IN SOCIAL MEDIA • Potential privacy impact: images that could contain people who are unaware that the photo exists.
SURVEY OF METADATA IN SOCIAL MEDIA • The role of mobile devices in publishing GPS metadata.
SURVEY OF METADATA IN SOCIAL MEDIA • One third of the pictures taken by the dominant camera devices contain GPS information, and about one third of these images depict people. Thus, about 10% of all the photos could harm other people's privacy without them knowing about it.
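The 10% figure is the product of the two fractions on this slide. Because the second "one third" refers only to the GPS-tagged images, it is a conditional fraction and no independence assumption is needed:

```latex
P(\text{GPS} \wedge \text{people})
  = P(\text{GPS}) \cdot P(\text{people} \mid \text{GPS})
  \approx \frac{1}{3} \cdot \frac{1}{3}
  = \frac{1}{9} \approx 11\%
```

Rounded, this is the slide's "about 10%" of all photos taken by the dominant camera devices.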
SURVEY OF METADATA IN SOCIAL MEDIA • If a user's phone keeps a GPS record of where the person was at which time, these two pieces of information (time and location) can be combined with the location data stored in the media to significantly reduce the amount of data that could be relevant to that individual.
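A minimal sketch of this filtering idea, assuming that both the user's phone log and the uploaded media provide timestamps and decimal coordinates. The function names, the 100 m and 10-minute thresholds and the sample data are hypothetical choices, not taken from the source:

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in metres (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def relevant_media(track, media, max_dist_m=100, max_gap=timedelta(minutes=10)):
    """Return ids of media taken close, in both space and time, to any point on the track.

    track: list of (time, lat, lon) points from the user's own phone.
    media: list of dicts with "id", "time", "lat" and "lon" taken from uploaded media.
    """
    hits = []
    for item in media:
        for t, lat, lon in track:
            close_in_time = abs(item["time"] - t) <= max_gap
            if close_in_time and haversine_m(lat, lon, item["lat"], item["lon"]) <= max_dist_m:
                hits.append(item["id"])
                break  # one matching track point is enough
    return hits


# Hypothetical data: a two-point GPS track and three geo-tagged photos uploaded by others.
track = [
    (datetime(2012, 5, 1, 10, 0), 1.3483, 103.6831),
    (datetime(2012, 5, 1, 10, 30), 1.3001, 103.8002),
]
media = [
    {"id": "p1", "time": datetime(2012, 5, 1, 10, 5), "lat": 1.3484, "lon": 103.6832},
    {"id": "p2", "time": datetime(2012, 5, 1, 15, 0), "lat": 1.3484, "lon": 103.6832},
    {"id": "p3", "time": datetime(2012, 5, 1, 10, 28), "lat": 1.4000, "lon": 103.9000},
]
print(relevant_media(track, media))
```

This brute-force comparison checks every photo against every track point; at the scale these slides describe, the media would first have to be bucketed by time window and spatial cell so that only nearby candidates are compared.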
SURVEY OF METADATA IN SOCIAL MEDIA • All three types of service are mainly focused on detecting relevant media events and breaking down the Big Data problem to humanly manageable sizes. • The concept is mainly focused on bringing possibly relevant media to the attention of the user without overburdening him.
Discussion • Existing protection of users' privacy addresses the dangers created by users themselves while sharing media. • BUT how can users be protected from other people's media?
Thank you • Rongxing's Homepage: http://www.ntu.edu.sg/home/rxlu/index.htm • Slides available at: http://www.ntu.edu.sg/home/rxlu/seminars.htm • Ximeng's Homepage: http://www.liuximeng.cn/