Miniprojects
Peeking into massive Online Social Networks (aka “Walking on Facebook”)
Maciej Kurant
Miniprojects • Essentially, measure.
LastFM • www.last.fm/user/rj
LastFM API • www.last.fm/api
LastFM API • http://ws.audioscrobbler.com/2.0/?method=user.getfriends&user=rj&limit=10&page=1&api_key=1b4218629b50c1159e15a6b8285b90ba
LastFM API in Python:

    import re
    import urllib.parse
    import urllib.request

    api_key = "1b4218629b50c1159e15a6b8285b90ba"
    user = "rj"
    params = urllib.parse.urlencode({"method": "user.getfriends", "user": user,
                                     "limit": 10, "page": 1, "api_key": api_key})
    command = "http://ws.audioscrobbler.com/2.0/?" + params
    data = urllib.request.urlopen(command).read().decode("utf-8")  # XML format
    degree = int(re.search(r'total="(\d+)"', data).group(1))
    friends = re.findall(r"<name>(.*?)</name>", data)  # non-greedy: one name per match
    print(degree)   # number of friends of "rj"
    print(friends)  # first 10 friends (because page=1 and limit=10)

For BFS, you need all friends: set “limit=500” and pull multiple pages if necessary. For random walks, you need only the degree and one neighbor: set “limit=1”, then 1) learn the degree, 2) select the index i of the neighbor, and 3) get its name by setting “page=i”.
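The two recipes above (all friends for BFS, one random friend for random walks) can be sketched as follows. This is a minimal sketch under stated assumptions, not the course's reference code: the helper names `fetch_xml`, `parse_total`, `parse_names`, `all_friends` and `random_friend` are hypothetical, the XML is parsed with the same regular expressions as the snippet above, and the number of pages is derived from the `total` attribute and the chosen `limit`. The fetch function is a parameter, so the logic can be tried offline against a fake response.

```python
import math
import random
import re
import urllib.parse
import urllib.request

API_ROOT = "http://ws.audioscrobbler.com/2.0/"
API_KEY = "1b4218629b50c1159e15a6b8285b90ba"  # key from the slides; use your own


def fetch_xml(user, limit, page):
    """Download one page of user.getfriends for `user` as an XML string."""
    params = urllib.parse.urlencode({"method": "user.getfriends", "user": user,
                                     "limit": limit, "page": page,
                                     "api_key": API_KEY})
    with urllib.request.urlopen(API_ROOT + "?" + params) as resp:
        return resp.read().decode("utf-8")


def parse_total(xml):
    """Total number of friends, read from the total="..." attribute."""
    return int(re.search(r'total="(\d+)"', xml).group(1))


def parse_names(xml):
    """All <name> values on one page of the response."""
    return re.findall(r"<name>(.*?)</name>", xml)


def all_friends(user, fetch=fetch_xml, limit=500):
    """For BFS: every friend of `user`, pulling as many pages as needed."""
    first = fetch(user, limit, 1)
    names = parse_names(first)
    pages = math.ceil(parse_total(first) / limit)
    for page in range(2, pages + 1):
        names += parse_names(fetch(user, limit, page))
    return names


def random_friend(user, fetch=fetch_xml, rng=random):
    """For random walks: (degree, uniformly chosen friend) using limit=1."""
    first = fetch(user, 1, 1)           # 1) learn the degree
    degree = parse_total(first)
    if degree == 0:                     # e.g. a banned user
        return 0, None
    i = rng.randint(1, degree)          # 2) pick the neighbor index i
    page = first if i == 1 else fetch(user, 1, i)  # 3) page=i holds friend i
    return degree, parse_names(page)[0]
```

Calling `all_friends("rj")` or `random_friend("rj")` with the default `fetch` hits the live API. Reusing the first page when i == 1 saves one request per step.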
Surprises
• Banned users (once reached, they appear to have 0 friends)
• Server not responding
• Friendship graph not connected (solution: consider only the component connected to user 'rj')
• Case sensitivity: is 'rj' the same user as 'RJ'?
• …
• Your program has to deal with all of these!
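One way to cope with an unresponsive server is to wrap every API call in a retry loop with exponential backoff. The sketch below is an illustrative assumption, not part of the original assignment: the name `with_retries`, the number of attempts and the delays are arbitrary choices. It retries only on network-level errors (`urllib.error.URLError`, which includes `HTTPError`, plus timeouts and connection errors); banned users and disconnected parts of the graph still have to be handled by the sampling code itself.

```python
import time
import urllib.error

# Network-level failures worth retrying; HTTPError is a subclass of URLError.
RETRYABLE = (urllib.error.URLError, TimeoutError, ConnectionError)


def with_retries(fetch, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Return a version of `fetch` that retries failed calls.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last error once `attempts` calls have failed.
    """
    def wrapped(*args, **kwargs):
        for attempt in range(attempts):
            try:
                return fetch(*args, **kwargs)
            except RETRYABLE:
                if attempt == attempts - 1:
                    raise
                sleep(base_delay * 2 ** attempt)
    return wrapped
```

For example, `with_retries(urllib.request.urlopen)` behaves like `urlopen` but survives short outages. You may want to exclude permanent errors (such as an invalid API key) from `RETRYABLE` so they fail fast.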
Miniprojects
Data: LastFM, the component connected to user 'rj'.
1) Random node. Use MHRW of length L=50 to select a node uniformly at random from LastFM. Repeat this 100 times. Report the average degree of the selected nodes and of their neighbors. What changes if L counts only unique nodes in the MHRW? Why? What happens if you use RW instead of MHRW?
2) RW vs RWRW. Run RW in LastFM. What are the average <playcount>, <playlists>, <age>, <id>, and number of friends observed in the sample? How do they change after correcting for the degree bias (RWRW)?
3) Component size. Based on RW, estimate the size of the component connected to user 'rj'. Use two approaches: [Katzir’11] and [Kurant’13?].
4) BFS. Collect a BFS sample starting from user 'rj' in LastFM. What node degrees, <playcount>, <playlists>, <age>, and <id> do you observe as you collect more nodes? How about implementing it on multiple threads?
5) Barbarian sampling. Try to download the entire component connected to user 'rj'. You will probably need a cluster of machines, multiple threads, etc. Please use your own API key. Once you have the component, report its basic properties: size, average degree, degree distribution, etc. (e.g., the average <age>). Compare with others.
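Task 1 can be prototyped offline before touching the API. The sketch below runs a Metropolis-Hastings random walk (MHRW) on a graph stored as an adjacency dict: from node v it proposes a uniformly chosen neighbor w and moves there with probability min(1, deg(v)/deg(w)); otherwise it stays at v. This acceptance rule makes the uniform distribution over nodes stationary. The adjacency-dict representation and the names `mhrw_node` and `avg_degree_report` are assumptions for illustration. With the live API, `graph[v]` would be replaced by calls returning the degree and one random neighbor, and every node must have at least one friend (watch out for banned users).

```python
import random


def mhrw_node(graph, start, length, rng):
    """Run an MHRW of `length` steps from `start` and return the final node.

    graph: dict mapping each node to a non-empty list of its neighbors.
    """
    v = start
    for _ in range(length):
        w = rng.choice(graph[v])                       # propose a neighbor
        if rng.random() < len(graph[v]) / len(graph[w]):
            v = w                                      # accept the move
        # otherwise reject: stay at v (this still counts as a step)
    return v


def avg_degree_report(graph, start, length=50, repeats=100, seed=1):
    """Average degree of MHRW-selected nodes and of their neighbors."""
    rng = random.Random(seed)
    picked = [mhrw_node(graph, start, length, rng) for _ in range(repeats)]
    node_deg = sum(len(graph[v]) for v in picked) / len(picked)
    nbrs = [w for v in picked for w in graph[v]]
    nbr_deg = sum(len(graph[w]) for w in nbrs) / len(nbrs)
    return node_deg, nbr_deg
```

With the defaults, `avg_degree_report(graph, start)` mirrors the task: 100 independent walks of length L=50, each contributing its final node.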
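For task 2, a simple random walk (RW) visits nodes with probability proportional to their degree, so plain sample averages are biased toward high-degree nodes. The re-weighted estimator (RWRW) sketched below weights each sampled node v by 1/deg(v), i.e. mean(f) ≈ Σ f(v)/deg(v) / Σ 1/deg(v). The adjacency-dict graph and the function names are illustrative assumptions; `f` stands for any node attribute, such as <playcount> or the number of friends.

```python
import random


def random_walk(graph, start, n_steps, rng):
    """Simple RW: move to a uniformly chosen neighbor at every step."""
    v, sample = start, []
    for _ in range(n_steps):
        v = rng.choice(graph[v])
        sample.append(v)
    return sample


def naive_mean(sample, f):
    """Plain average over the RW sample (biased toward high degree)."""
    return sum(f(v) for v in sample) / len(sample)


def rwrw_mean(sample, graph, f):
    """Degree-corrected (RWRW) average: weight each visit by 1/deg(v)."""
    num = sum(f(v) / len(graph[v]) for v in sample)
    den = sum(1 / len(graph[v]) for v in sample)
    return num / den
```

On a star with one hub and four leaves, an RW of even length alternates hub and leaf, so the plain average degree is 2.5, while the RWRW estimate is 1.6, the true average degree (8/5).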
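For the first approach in task 3, the idea behind [Katzir’11] is to count repeated visits ("collisions") in a degree-biased sample. The sketch below is our reading of that idea, not a transcription of the paper. With Ψ1 = Σ deg(x_i), Ψ−1 = Σ 1/deg(x_i) and C the number of unordered pairs i < j with x_i = x_j, it returns Ψ1·Ψ−1 / (2C). The factor 2 comes from counting unordered pairs: for n independent draws with probability deg(v)/D, Ψ1·Ψ−1 ≈ n²·N·Σdeg²/D² while C ≈ (n²/2)·Σdeg²/D². RW samples are correlated, so, as an assumption, it is fed only every g-th node of a long walk; the gap g and the names `collision_size_estimate` and `thin` are illustrative choices.

```python
from collections import Counter


def collision_size_estimate(sample, degree):
    """Estimate the number of nodes from a degree-biased sample.

    sample: nodes drawn (approximately) independently with probability
            proportional to degree, e.g. every g-th node of a long RW.
    degree: function returning the degree of a node.
    """
    psi1 = sum(degree(v) for v in sample)
    psi_inv = sum(1 / degree(v) for v in sample)
    # Unordered colliding pairs: a node seen k times contributes k*(k-1)/2.
    collisions = sum(k * (k - 1) // 2 for k in Counter(sample).values())
    if collisions == 0:
        raise ValueError("no repeated nodes yet; collect a longer sample")
    return psi1 * psi_inv / (2 * collisions)


def thin(walk, gap):
    """Keep every `gap`-th node of a random walk to reduce correlation."""
    return walk[::gap]
```

The estimate needs collisions, so the sample has to grow roughly like the square root of the component size before it becomes usable.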
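For task 4, the sketch below collects nodes in BFS order from a start node and computes the running average of an attribute, which shows how the averages drift as the sample grows. It works on an adjacency dict for illustration; against LastFM, `graph[v]` would be replaced by a call that pulls all friends of v (limit=500, several pages). The names `bfs_order` and `running_mean` are hypothetical.

```python
from collections import deque


def bfs_order(graph, start, max_nodes):
    """Return up to `max_nodes` nodes in the order BFS visits them."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue and len(order) < max_nodes:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in seen:      # enqueue each node only once
                seen.add(w)
                queue.append(w)
    return order


def running_mean(values):
    """Average of the first 1, 2, 3, ... values."""
    means, total = [], 0
    for i, x in enumerate(values, start=1):
        total += x
        means.append(total / i)
    return means
```

Plotting `running_mean([degree(v) for v in bfs_order(graph, "rj", n)])` against the number of collected nodes is one way to answer the question for degrees; the same applies to <playcount>, <age>, etc.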
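Once the whole component has been downloaded (task 5), its basic properties follow directly from the adjacency lists. A minimal sketch, assuming the crawl is stored as a dict from user name to the list of that user's friends and that friendship is symmetric, so each edge appears at both endpoints; the name `component_summary` is hypothetical.

```python
from collections import Counter


def component_summary(graph):
    """Size, number of edges, average degree and degree distribution."""
    degrees = [len(friends) for friends in graph.values()]
    n = len(degrees)
    m = sum(degrees) // 2          # each undirected edge is counted twice
    return {
        "size": n,
        "edges": m,
        "average_degree": sum(degrees) / n,
        "degree_distribution": dict(sorted(Counter(degrees).items())),
    }
```

These exact values are the ground truth against which the MHRW, RWRW and collision-based estimates from the earlier tasks can be compared.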