
Mobile Social Network Analysis: Trust and Privacy Optimization

Explore the impact of online mobile social networks on real-life interactions. Evaluate Dunbar's number, trust metrics, and data-sharing protocols in social networks. Investigate the co-evolution of society and technology. Ongoing studies examine privacy implications and social inclusion. Discuss how advances in technology could help detect dodgy behaviour.

percyr

Presentation Transcript


  1. China Mobile Leader’s Programme: Mobile Technology • Jon Crowcroft • http://www.cl.cam.ac.uk/~jac22 • Jon.crowcroft@cl.cam.ac.uk (+ gmail, hotmail) • +44 1223 763633 • +44 7733 231822 • (+ LinkedIn, Facebook, MySpace)

  2. 4 Areas • Mobile Social Networks • Data Collection • Energy • Programming

  3. 1. Online Mobile Social Nets & Real Life

  4. We meet, we connect, we communicate • We meet in real life in the real world • We use text messages, phones, IM • We make friends on Facebook, Second Life • How are these related? • How do they affect each other? • How do they change with new technology?

  5. “Thank you, but you are going in the opposite direction!” • “I have 100M bytes of data, who can carry it for me?” • “Give it to me, I have 1G bytes of phone flash. I can also carry it for you!” • “Don’t give it to me! I am running out of storage.” • “Reach an access point.” • “There is one in my pocket…” • Internet • “Search La Bonheme.mp3 for me” (repeated from phone to phone) • “Finally, it arrives…”

  6. My Facebook friends wheel

  7. My email statistics!

  8. Cliques and Communities

  9. Dunbar’s Number & Trust • Dunbar’s number: ~150 (for humans) • Size of simple communities of humans • Reflects ability to cope with a group • Humans gossip rather than physically groom • Language lets us abstract • We can reason up to 5 levels of intentionality • (Shakespeare does 6 :-) • $T = 1/(3.x)^N$ • T is the trust metric • 3.x is a number between 3 and 4 • N is distance in the social net
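
A minimal Python sketch of the trust metric above, assuming the social net is an adjacency dict and taking N as the shortest hop count found by breadth-first search. The base 3.5 is an arbitrary pick from the slide's 3 to 4 range, and the names `social_distance` and `trust` are hypothetical.

```python
from collections import deque


def social_distance(graph, source, target):
    """Hop count N between two people (breadth-first search); None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr == target:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None


def trust(n, base=3.5):
    """T = 1 / base**N; base=3.5 is an ASSUMED value in the slide's 3-4 range."""
    if not 3.0 <= base <= 4.0:
        raise ValueError("base should lie between 3 and 4")
    return 1.0 / base ** n


# Hypothetical three-person social net: alice - bob - carol
graph = {"alice": ["bob"], "bob": ["alice", "carol"], "carol": ["bob"]}
n = social_distance(graph, "alice", "carol")
print(n, trust(n))
```

With N = 0 (kin) the metric is 1, and each extra hop divides trust by the base, so an acquaintance at N = 2 gets roughly 1/12 of kin-level trust under this assumed base.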

  10. Conjecture on N? • N = 0: kin (sex) • N = 1: friends (beer/drugs) • N = 2 or more: acquaintances (dancing/music/laughing at the same jokes) • How does this help in Facebook?

  11. Conjecture on Online v. Real • We’re looking at co-location networks • cf. Haggle, Cityware: Bluetooth etc. • AND online social networks • Friendship graph on Orkut, LinkedIn, Facebook • AND communication networks • Email address book, SMS, phone calls • Can use these to infer the real relationship • i.e. the type of edge in the graph (and value of N)

  12. Conjectures on Trust • Trust in terms of revelation/disclosure • Or carrying data (in ferry net) • Or simple automated/default grouping for ACLs • Need to do some experiments • Figure out how ties are broken • Forgetting • How new tools/technology affect • Size and dynamics of social net…

  13. EU Social Net Project Questions • Which net/edge type is more likely to cause an edge in another net? • Does meeting someone dominate over online contact, or vice versa? • i.e. how does new tech affect x (size of immediate gang) and • N (scope of gang/level of intentionality reasoning)? • Can you use this to detect dodgy behaviour (spam, bullying, etc.)?

  14. Ongoing studies • Data? • We have large datasets for a single edge type/modality • (6M phone call time/location, 1M social net) • But only very small datasets for 2 or 3 modalities • 30 army base people -> retirement • 100 school leavers -> University • Very heavy lifting • Not only lots of data processing, but worse: • Interview each user for context • Privacy? • Correlating (data mining) the different nets is a massive breach of trust • Usefulness?

  15. Usefulness? • Improve privacy • As mentioned, could auto-default Facebook settings and relate them to phone/location • Could also use as an interest-based filter • Fundamental understanding of social groups • How society/technology co-evolve • Social inclusion and accessibility (!) • Epidemiology (*) • Buzztraq • Use currency of local interest to • Fetch content…

  16. Epidemiology • Two projects: • Emulation (ESRC) • Run s/w on a smart phone that mimics a disease • Has a “vector” and SIR(!) parameters per person • Run on a “real society” based on meeting duration/proximity/frequency • Flubook (Horizon) • Panic button (“Not well”/“Feelin better”) • Uploads list of contacts in the last week via free SMS • Puts anonymised data on Google Maps • Alerts trusted friendship group on Facebook

  17. SIR • Susceptibility, Infectiousness, Recovery • Given the contact distribution, • Can compute the progress of an epidemic • Whether it collapses (S, I low; R high) • Or goes pandemic (S, I high) • As with the relationship between online and RL behaviour for socialising, • Flubook might alter the contact rate… • …systematically for a subset of the population • …(social or geographic) with high S/I • Help prevent/collapse the epidemic
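
A minimal discrete-time SIR sketch in Python, assuming a well-mixed population rather than the measured contact distribution the slide refers to. `beta` (assumed here to be contact rate times per-contact transmission probability) and `gamma` (fraction of infected recovering per step) are illustrative parameters; lowering `beta` is only a crude stand-in for Flubook altering the contact rate.

```python
def simulate_sir(s0, i0, r0, beta, gamma, steps):
    """Discrete-time SIR on a well-mixed population (an ASSUMPTION, not the
    measured contact network). beta: assumed contacts per step times
    transmission probability; gamma: fraction of infected recovering per step."""
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i / n
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Same hypothetical population, two contact rates
pandemic = simulate_sir(990, 10, 0, beta=0.5, gamma=0.1, steps=100)
collapse = simulate_sir(990, 10, 0, beta=0.1, gamma=0.5, steps=50)
print(max(state[1] for state in pandemic), collapse[-1])
```

When `beta / gamma` is below 1 the infected count shrinks from the first step (collapse); when it is well above 1 the outbreak grows before burning out, which is why systematically cutting contacts for a high-S/I subset could tip the balance.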

  18. Thank you… • Questions? …

  19. And another thing • Virtualising the online social self • Floating it in the “cloud” • Encrypt content, but allow cloud/Facebook to match interests (for advertising) • Migrate it to track the user (and handset) • Performance gain • Handset can have meagre CPU/memory • Latency reduced • Synchronisation/persistence assured • Don’t care if handset lost/stolen :-)

  20. Snakes (and Ladders) on a Plane • Human • Node • World

  21. Threads of your life • Human level is activities & relationships • Nodal level is processing and storage • World level is location and context

  22. Idea is… • To allow a mobile (compact/portable) representation of your activities and relationships (owned by you) • Roam across arbitrary nodes in the environment (embedded, or a handset owned by anyone) • While recording where you are and context (= other people)

  23. 2. Data Collection for Modelling Contact Networks Eiko Yoneki and Jon Crowcroft eiko.yoneki@cl.cam.ac.uk Systems Research Group University of Cambridge Computer Laboratory

  24. Outline • Purposes of Data Collection: Modelling Human Contact Networks • Proximity Data Collection Methodology • Issues for Data Collection • Examples of Data Analysis • Extending to Collect/Correlate Online Data • Conclusion

  25. Purpose of Data Collection • Building communication protocols based on proximity • EU FP6 Haggle Project • Inferring social interaction and opinion dynamics, and applying the results to networking and computer systems • EU FP7 Socialnets, EU FP7 Recognition • Network modelling for epidemiology • EPSRC Data Driven Network Modelling for Epidemiology • Understanding behavioural responses to infectious disease outbreaks: social and economic influences • ESRC FluPhone Project

  26. Haggle: Pocket Switched Networks • Networked distributed database over opportunistically connected devices (e.g. mobile phones) • Legacy network (e.g. the Internet) • Example: Haggle Twitter • EU FP6 Haggle: http://www.haggleproject.org

  27. FluPhone Project • Understanding behavioural responses to infectious disease outbreaks • Extending data collection to the general public • https://www.fluphone.org

  28. Purpose of Data Collection • Modelling Contact Networks: Empirical Approach • Robust data collection from the real world • Post-facto analysis and modelling yield insight into human interactions • Data is useful for everything from building communication protocols to understanding disease spread

  29. Proximity Data Collection • Sensor board (iMote), mobile phone • Proximity detection by Bluetooth and/or GPS • Environmental information (e.g. in a train, on a road) • (Pictured: AroundYou, iMote, FluPhone)

  30. Proximity Detection by Bluetooth • Only ~15% of devices have Bluetooth switched on • Scanning interval: 2 mins for iMote (one week battery life), 5 mins for phone (one day battery life), or continuous scanning by station nodes • A Bluetooth inquiry (e.g. 5.12 seconds) gives a >90% chance of finding a device • Complex discovery protocol • Two modes: discovering and being discovered • 5 to 10 m discovery range • Can it produce reliable data (negligible noise)?
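
Back-of-the-envelope arithmetic for the scanning trade-off above, as a Python sketch. It assumes each scan is a single 5.12 s inquiry and treats the >90% per-inquiry discovery chance as independent across consecutive scans, which is an optimistic simplification; `scan_budget` and `miss_probability` are hypothetical names.

```python
def scan_budget(interval_s, inquiry_s=5.12, hours=24):
    """Scans per period and fraction of time spent inquiring (radio duty cycle)."""
    scans = int(hours * 3600 // interval_s)
    duty_cycle = inquiry_s / interval_s
    return scans, duty_cycle


def miss_probability(consecutive_scans, p_find=0.9):
    """Chance of missing an in-range device in every one of k scans,
    ASSUMING scans are independent (optimistic) and p_find = 0.9."""
    return (1 - p_find) ** consecutive_scans


for label, interval in [("iMote", 120), ("phone", 300)]:
    scans, duty = scan_budget(interval)
    print(f"{label}: {scans} scans/day, radio inquiring {duty:.1%} of the time")
print(f"missed in 2 consecutive scans: {miss_probability(2):.1%}")
```

At a 2-minute interval this gives 720 scans a day with the radio inquiring about 4.3% of the time; at 5 minutes, 288 scans and about 1.7%. Under the independence assumption, missing an in-range device twice in a row would be at most about 1%.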

  31. Sensor Board or Phone or ... • iMote needs a disposable battery • Expensive • Third world experiment • Mobile phone • Rechargeable • Additional functions (messaging, tracing) • Smart phone: location assist applications • Provide device or software • Combine with online information (e.g. Twitter)

  32. Phone Price vs Functionality • Under ~20 GBP • Single task (no phone calls while the application is running) • Over ~100 GBP • GPS capability • Multiple tasks: run the application as a background job • Challenge: providing software for every mobile phone operating system

  33. Location Data • Is location data necessary? • Ethics approval gets tougher • Use of WiFi access points or cell towers • Use of GPS, but not inside buildings • Infer location using various information • Online data (social network services, Google) • Use of limited location information: post-localisation • (Figure: Scanner Location in Bath)

  34. Target Population • Provide devices to a limited population or target the general public • For an epidemiology study, ~100% coverage may be required • FluPhone project: participants will be the general public • Or schools as mixing centres

  35. Experiment Parameters vs Data Quality • Battery life vs granularity of detection interval • Duration of experiments • Day, week, month, or year? • Data rate • Data storage • Contact/GPS data <50K per device per day (in compressed format) • Server data storage for receiving data from devices • Extend storage with a larger memory card • Can data collected using different parameters or methods be aggregated?
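
A rough Python sizing sketch for the storage bullets above. It assumes "<50K" means 50,000 bytes of compressed contact/GPS data per device per day and that every device uploads every day; treat the results as upper bounds for compressed data only. The function names and the study size are hypothetical.

```python
def server_storage_bytes(devices, days, per_device_day=50_000):
    """Upper bound on compressed contact/GPS storage at the server.
    ASSUMPTION: '<50K' is read as 50,000 bytes per device per day."""
    return devices * days * per_device_day


def days_on_card(card_bytes, per_device_day=50_000):
    """How many days of one device's compressed data fit on its memory card."""
    return card_bytes // per_device_day


# Hypothetical study: 100 devices for 30 days, 2 GB memory cards
print(server_storage_bytes(100, 30))
print(days_on_card(2_000_000_000))
```

A month with 100 devices comes to 150,000,000 bytes (about 150 MB), and a 2 GB card holds 40,000 days of one device's data, so under these assumptions raw storage is far less of a constraint than battery life or data retrieval.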

  36. Data Retrieval Methods • Retrieving collected data: • Tracking station • Online (3G, SMS) • Uploading via Web • Via memory card • Incentives for participating in experiments • Collection cycle: real-time, day, or week?

  37. Data Transformation for Analysis • Transform to a discrete version of contact data • Deal with noise and missing data • e.g. transitive closure • Data analysis requires high performance computing and storage • Low volume: raw data in compact format • Transformation of raw data for analysis increases data volume
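
A minimal Python sketch of the two transformation steps above, assuming raw sightings arrive as `(timestamp_s, scanner_id, seen_id)` tuples. Discretising into 120 s slots is an assumed choice matching a 2-minute scan interval, and the closure rule (devices in the same connected group within a slot all count as in contact) is one hypothetical reading of "transitive closure", not necessarily the authors' exact method.

```python
from itertools import combinations


def discretise(sightings, slot_len=120):
    """Group raw (timestamp_s, scanner_id, seen_id) sightings into undirected
    contacts per time slot. slot_len=120 s is an ASSUMED value."""
    slots = {}
    for t, a, b in sightings:
        if a == b:
            continue
        pair = (a, b) if a < b else (b, a)  # A sees B and B sees A -> one contact
        slots.setdefault(int(t // slot_len), set()).add(pair)
    return slots


def transitive_closure(pairs):
    """ASSUMED noise fix: within one slot, devices in the same connected group
    are treated as all in contact with each other."""
    adj = {}
    for a, b in pairs:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    closed, seen = set(), set()
    for start in adj:
        if start in seen:
            continue
        group, stack = [], [start]
        seen.add(start)
        while stack:  # depth-first walk of one connected group
            node = stack.pop()
            group.append(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        closed.update(combinations(sorted(group), 2))
    return closed


# Hypothetical sightings: A and C never see each other directly in slot 0
sightings = [(10, "A", "B"), (50, "C", "B"), (300, "A", "C")]
slots = discretise(sightings)
print(slots)
print(transitive_closure(slots[0]))
```

Closing over whole groups can over-connect long chains of devices that sit more than one radio range apart, so a real pipeline might restrict the closure to two hops or to short slots. Either way, the pairwise output is larger than the raw sightings, which matches the slide's point that transformation increases data volume.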

  38. Security and Privacy • Current method: basic anonymisation of identities (MAC addresses) • FluPhone Project: use of HTTPS for data transmission via 3G • Anonymising identities may not be enough • Simple anonymisation does not prevent the social graph from being recovered • Ethics approval is tough! • 40-page study protocol document for the FluPhone project; approval took several months
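
A minimal Python sketch of "basic anonymisation of identities", assuming MAC addresses are replaced with keyed-hash pseudonyms (HMAC-SHA256 from the standard library). This is a swapped-in illustration, not necessarily the method used in the projects above; the key value and the 16-hex-character truncation are hypothetical choices.

```python
import hashlib
import hmac
import secrets


def make_pseudonymiser(key=None):
    """Return a function mapping MAC addresses to stable pseudonyms.
    A secret key (HMAC-SHA256) stops anyone without the key from hashing
    every possible MAC address to reverse the mapping."""
    key = key or secrets.token_bytes(32)

    def pseudonymise(mac):
        normalised = mac.replace(":", "").replace("-", "").lower()
        digest = hmac.new(key, normalised.encode("ascii"), hashlib.sha256)
        return digest.hexdigest()[:16]  # 64-bit pseudonym, truncated for compactness

    return pseudonymise


pseudo = make_pseudonymiser(b"study-secret-key")  # hypothetical key; keep it off the server
print(pseudo("00:25:D0:E1:13:DA"), pseudo("0025d0e113da"))
```

This only hides the identifiers: who met whom, and how often, is unchanged, which is exactly why simple anonymisation does not prevent the social graph from being recovered.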

  39. Consent

  40. Human Connectivity Traces • Capture human interactions, thus far not at large scale • Crawdad DB: http://crawdad.cs.dartmouth.edu/ • (Example HAGGLE trace records: Contact: 025d04b2b3f 4650000025d0 5416492246711621549 5416492246711644527; Location: 0025d0e113da [lon: -3.384610278596745E125; lat: 1.3168305280597862E182] 5066619950170431763)

  41. Regularity of Network Activity • Size of the largest connected component shows network dynamics • (Plot labels: 5 Days, Tuesday)
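
A minimal Python sketch of the quantity plotted above, assuming contacts have already been discretised into `{slot_index: set_of_pairs}`. It uses union-find to get the largest connected component per slot; the slot layout and function names are hypothetical.

```python
def largest_component_size(contacts):
    """Size of the largest connected component among (a, b) contact pairs,
    using union-find with path halving."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in contacts:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    sizes = {}
    for node in list(parent):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values(), default=0)


def component_series(slots):
    """Largest component per slot, filling slots with no contacts with 0."""
    if not slots:
        return []
    return [largest_component_size(slots.get(s, ()))
            for s in range(min(slots), max(slots) + 1)]


# Hypothetical slots: {slot index: set of contact pairs}
slots = {0: {("A", "B"), ("B", "C")}, 1: {("A", "B")}, 3: {("C", "D")}}
print(component_series(slots))
```

Plotting this series over several days is one way to make daily regularity visible: the component grows during working hours and drops towards zero overnight.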

  42. Inter-Contact Time of Node Pairs • Power-law distribution with an exponential decay cutoff • (Plot axis: time)
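
A minimal Python sketch of how inter-contact times can be extracted, assuming contacts are given as `(a, b, start_s, end_s)` intervals. The inter-contact time is the gap between the end of one contact of a pair and the start of its next; the empirical CCDF helper is a common way to eyeball a power law with cutoff, and both function names are hypothetical.

```python
def inter_contact_times(contacts):
    """Gaps between the end of one contact and the start of the next, per pair.
    contacts: iterable of (a, b, start_s, end_s); overlapping contacts are skipped."""
    by_pair = {}
    for a, b, start, end in contacts:
        pair = (a, b) if a < b else (b, a)
        by_pair.setdefault(pair, []).append((start, end))
    gaps = {}
    for pair, intervals in by_pair.items():
        intervals.sort()
        gaps[pair] = [nxt[0] - prev[1]
                      for prev, nxt in zip(intervals, intervals[1:])
                      if nxt[0] > prev[1]]
    return gaps


def ccdf(values):
    """Empirical tail fraction for each sorted sample; on log-log axes a power
    law is roughly straight and an exponential cutoff bends it downwards."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (n - i) / n) for i, x in enumerate(xs)]


contacts = [("A", "B", 0, 10), ("B", "A", 40, 50), ("A", "B", 100, 120)]
gaps = inter_contact_times(contacts)
print(gaps, ccdf(gaps[("A", "B")]))
```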

  43. Classification of Node Pairs (axes: number of contacts vs contact duration) • I: Community: high frequency, long duration • II: Familiar Stranger: high frequency, short duration • III: Stranger: low frequency, short duration • IV: Friend: low frequency, long duration
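
A minimal Python sketch of the four-quadrant classification above. It assumes each pair is summarised by its number of contacts and total contact duration, and uses the medians across all pairs as the frequency and duration thresholds; the slide does not say where the thresholds lie, so that choice is hypothetical.

```python
from statistics import median


def classify_pair(num_contacts, total_duration, freq_threshold, duration_threshold):
    """Map a node pair onto the four quadrants (I to IV)."""
    high_freq = num_contacts >= freq_threshold
    long_dur = total_duration >= duration_threshold
    if high_freq and long_dur:
        return "I: community"
    if high_freq:
        return "II: familiar stranger"
    if long_dur:
        return "IV: friend"
    return "III: stranger"


def classify_pairs(stats):
    """stats: {pair: (num_contacts, total_duration_s)}.
    ASSUMPTION: thresholds are the medians across all pairs."""
    freq_thr = median(n for n, _ in stats.values())
    dur_thr = median(d for _, d in stats.values())
    return {pair: classify_pair(n, d, freq_thr, dur_thr)
            for pair, (n, d) in stats.items()}


stats = {("A", "B"): (20, 5000), ("A", "C"): (20, 100),
         ("B", "C"): (2, 90), ("C", "D"): (2, 6000)}
for pair, label in classify_pairs(stats).items():
    print(pair, label)
```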

  44. Betweenness Centrality • How often a node falls on the shortest paths between pairs of other nodes • (Plots: MIT, Cambridge)
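
A minimal Python sketch of betweenness centrality using Brandes' algorithm, assuming an unweighted, undirected contact graph given as an adjacency dict in which every node appears as a key. Weighted contact graphs would need Dijkstra in place of breadth-first search; that variant is not shown.

```python
from collections import deque


def betweenness(graph):
    """Brandes' algorithm for unweighted, undirected graphs (adjacency dict)."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(graph, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # each undirected pair is counted once from each endpoint
    return {v: c / 2 for v, c in bc.items()}


# Hypothetical star: a hub in contact with three people who never meet
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
print(betweenness(star))
```

In the star, every one of the three leaf-to-leaf shortest paths passes through the hub, so its betweenness is 3 and the leaves score 0; in contact networks such hubs are natural candidates for carrying data (or infection) between groups.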

  45. Uncovering Community • Contact trace in the form of weighted (multi)graphs • Edge weights: contact frequency and duration • Use community detection algorithms from complex network studies • k-clique, weighted network analysis, betweenness, modularity, Fiedler clustering, etc. • (Figure: Fiedler clustering)
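
A minimal Python sketch of weighted modularity, one of the measures listed above. It scores a given partition rather than finding one; detection algorithms typically search for the partition that maximises it. The edge-weight input (for example, total contact duration per pair) and the example graph are assumptions.

```python
def modularity(edges, community):
    """Weighted modularity Q = sum_c [L_c / m - (d_c / 2m)^2].
    edges: {(u, v): weight}, each undirected pair listed once;
    community: {node: label}. L_c is the internal weight of community c,
    d_c the total strength of its nodes, m the total edge weight."""
    m = sum(edges.values())
    if m == 0:
        return 0.0
    strength, internal = {}, {}
    for (u, v), w in edges.items():
        strength[u] = strength.get(u, 0) + w
        strength[v] = strength.get(v, 0) + w
        if community[u] == community[v]:
            internal[community[u]] = internal.get(community[u], 0) + w
    total_strength = {}
    for node, k in strength.items():
        c = community[node]
        total_strength[c] = total_strength.get(c, 0) + k
    return sum(internal.get(c, 0) / m - (tot / (2 * m)) ** 2
               for c, tot in total_strength.items())


# Hypothetical contact graph: two triangles joined by one weak tie (c-d)
edges = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
         ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1, ("c", "d"): 1}
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(modularity(edges, split), modularity(edges, dict.fromkeys(split, 0)))
```

Splitting the two triangles gives Q = 6/7 - 1/2 (about 0.36), while lumping everyone into one community gives 0, so the split is the better community structure by this measure.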

  46. Visualisation of Community Dynamics

  47. Extending Data Collection to OSN • Online Social Networks (e.g. Facebook, Twitter) • Potential to obtain data on dynamic behaviour • High volume of data • Does Facebook matter? • Over 190M users • Growth rates for 2008 around the world • Italy: 2900%, Argentina: 2000%, Indonesia: 600%

  48. Power Law Degree Distribution • Crawled the original Stanford (15,043 nodes) and Harvard (18,273 nodes) networks • From the era when UIDs were assigned sequentially • Obtained the friends of each user, and their affiliations • 2.1 million links, maximum degree 911
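
A minimal Python sketch for checking a power-law degree distribution, assuming the crawl is stored as an adjacency dict `{user: set_of_friends}`. It uses the approximate discrete maximum-likelihood estimator popularised by Clauset, Shalizi and Newman; this is a swapped-in technique, not necessarily how the crawl above was analysed, and `xmin` must be chosen by the analyst.

```python
import math


def degree_sequence(adj):
    """Degrees from an adjacency dict {user: set_of_friends}."""
    return [len(friends) for friends in adj.values()]


def powerlaw_alpha(degrees, xmin=1):
    """Approximate discrete MLE of alpha in P(k) ~ k**-alpha for k >= xmin:
    alpha = 1 + n / sum(ln(k / (xmin - 0.5)))."""
    if xmin < 1:
        raise ValueError("xmin must be at least 1")
    tail = [k for k in degrees if k >= xmin]
    if not tail:
        raise ValueError("no degrees at or above xmin")
    return 1 + len(tail) / sum(math.log(k / (xmin - 0.5)) for k in tail)


# Hypothetical tiny friendship graph (real crawls have thousands of nodes)
adj = {"a": {"b", "c", "d"}, "b": {"a"}, "c": {"a"}, "d": {"a"}}
degrees = degree_sequence(adj)
print(degrees, powerlaw_alpha(degrees))
```

Four nodes are far too few for a meaningful estimate; the example only shows the mechanics. The approximation is generally considered reliable only when `xmin` is not too small (a value of roughly 6 or more is often suggested), and a full analysis would also fit `xmin` and test the power law against alternatives such as a power law with cutoff.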

  49. Information Cascade through Social Networks • Use the Google Geocoding API to predict geographical access patterns • (Figure: cascade from T0 to Tk across Texas, Illinois, Florida)

  50. Conclusions • Real World Data is Powerful! • Analyse the Network Structure of Social Systems to Model Dynamics: an Emerging Research Area • Weighted networks • Modularity • Centrality (e.g. Degree) • Community evolution and dynamics • Network measurement metrics • Patterns of interactions • Plan the purpose of data collection first; it determines the data collection method • Resolve ethics issues/approval in advance • Combine device-based data collection with available online data for efficiency and accuracy
