Eddie Aronovich, eddiea@cs.tau.ac.il. Tools presentation. Once upon a time. “Evolution of the input”: “command line” input, files, web crawling (pull), web sensors (using API - push). LinkedIn MAP, Gapminder.
Tools presentation
Eddie Aronovich, eddiea@cs.tau.ac.il
“Evolution of the input”
• “command line” input
• Files
• Web crawling (pull)
• Web sensors (using API - push)
Evolution of the output (multiple dimensions)
• LinkedIn MAP
• Gapminder
- http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html
- http://www.ted.com/talks/nicholas_christakis_the_hidden_influence_of_social_networks.html
API examples
• Twitter
• http://api.twitter.com/1/users/show.json?screen_name=TheMarker
• Format the output (json) https://dev.twitter.com/docs/api/1/get/search
• FB
• /usr/bin/python fbconsole.py fql("SELECT uid FROM user WHERE username='ariel.bardavid.5'")
• https://developers.facebook.com/docs/reference/apis/
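A minimal sketch of the Twitter example above: build the request URL, then pick fields out of the JSON that comes back. The base URL is the one on the slide and may no longer respond, so the sketch parses a hypothetical response body instead of calling the network. The helper names `build_user_url` and `parse_user`, the sample body, and the `followers_count` field are assumptions made for illustration.

```python
import json
from urllib.parse import urlencode

# Base URL copied from the slide; the v1 endpoint may no longer respond.
BASE_URL = "http://api.twitter.com/1/users/show.json"


def build_user_url(screen_name):
    """Return the users/show URL for one screen name (hypothetical helper)."""
    return BASE_URL + "?" + urlencode({"screen_name": screen_name})


def parse_user(body):
    """Pick a few fields out of a JSON response body (field names are assumed)."""
    record = json.loads(body)
    return {
        "screen_name": record.get("screen_name"),
        "followers": record.get("followers_count"),
    }


# Hypothetical response body, so the sketch runs without network access.
sample_body = '{"screen_name": "TheMarker", "followers_count": 1234}'

print(build_user_url("TheMarker"))
print(parse_user(sample_body))
```

To fetch a live response, the URL could be passed to `urllib.request.urlopen` and the returned bytes decoded before calling `parse_user`.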
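Python code for json format
import json
from pprint import pprint

# Load a JSON file (here named 'json_data') and pretty-print its contents.
# The with-statement closes the file even if parsing fails.
with open('json_data') as json_file:
    data = json.load(json_file)
pprint(data)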
Web crawling
• wget + parser (html2txt)
• ETL (Extract, Transform, Load)
• Structured vs. Unstructured data
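The crawling steps above can be sketched as a small ETL pass, assuming the page has already been downloaded (for example with wget). The HTML-to-text step here uses Python's standard `html.parser` rather than the html2txt tool named on the slide; the sample page, the `word_counts` table, and all function names are hypothetical. Extract turns unstructured HTML into text, Transform counts words, and Load writes structured rows into SQLite.

```python
import sqlite3
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Extract: unstructured HTML in, plain text out."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def transform(text):
    """Transform: count lower-cased words, ignoring edge punctuation."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return Counter(w for w in words if w)


def load(counts, conn):
    """Load: write the counts into a structured table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_counts (word TEXT PRIMARY KEY, n INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO word_counts VALUES (?, ?)", counts.items()
    )
    conn.commit()


# Hypothetical page standing in for a file fetched by wget.
page = (
    "<html><body><h1>Data tools</h1><script>var x = 1;</script>"
    "<p>Collect data, analyze data.</p></body></html>"
)

text = extract_text(page)
conn = sqlite3.connect(":memory:")
load(transform(text), conn)
print(text)
print(conn.execute("SELECT n FROM word_counts WHERE word = 'data'").fetchone())
```

An in-memory database keeps the sketch self-contained; a file path passed to `sqlite3.connect` would persist the table between runs.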
Some general tools
• Scripting
• bash
• sed
• awk
• cron (and scratch space)
• Hadoop
• Condor
Overview
• Collect Data (and extract it)
• Analyze Data
• Build a model
• Run the model
• Collect more data