Thank you, Prof. Dr. Gerhard Boerner!
Stephen, Thomas, Houjun, Me, Robert Jing
Large Scale Statistics in Internet Behaviors
Hongguang Bi
Greetingland, LLC
Los Angeles, CA
Chapter 1: Internet and WWW
• History, how it works
Chapter 2: Internet User Behaviors & Privacy
Chapter 3: Online Advertising
• Geo, contextual and behavior targeting, real-time bidding, yield management
Chapter 4: Collecting User Information
• What and how
Chapter 1: Internet and WWW
• Cosmology: Nature defines the physical laws
  Internet: Humans define the laws (specifically, protocols)
• Cosmology: Real world
  Internet: Information world, or virtual world
• Cosmology: photons, electrons, neutrinos… (monad? Leibniz)
  Internet: bit
• Cosmology: particles => stars => galaxies => clusters etc.
  Internet: bits => bytes or integers => words => pages & emails
• Cosmology: millions of galaxies detected => billions
  Internet: millions to billions of users
• Cosmology: goal => structures, statistics of galaxies
  Internet: goal => behaviors, statistics of users
Open Systems Interconnection (OSI) Model: 7 layers
[Diagram labels: HTTP; Encrypt; TCP, UDP; IP]
Information Age: Web and Email
• WWW: March 1989, Tim Berners-Lee
• HTTP 0.9: 1991; HTTP 1.0: 1996; HTTP 1.1: June 1999, RFC 2616
• Mailbox Protocol: 1971
• SMTP: 1982, RFC 821
• Later developments: UUCP, sendmail
HTTP: how the web works
In plain HTTP, the cookie is the standard way for a server to store data in the user's browser. How does it work?
• User sends a request
  • URL address
  • Browser (Firefox, IE, mobile, etc.)
  • Language, referrer (who referred you), etc.
  • Cookies
• Web server responds
  • Message body
  • Message size, modified time, etc.
  • Server information
  • Set cookies
Cookie exchange:
• Client: sends a request without a cookie
• Server: responds with a “Set-Cookie” header containing some information
• Client: sends the next request with a “Cookie” header containing the SAME information
A cookie is bound to a specific server, and one server can set multiple cookies.
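The cookie exchange described above can be sketched without any network traffic. This is a minimal illustration using Python's standard `http.cookies` module; the function names, the cookie name `uid`, and the value `abc123` are assumptions made for the example, not part of the original slides.

```python
from http.cookies import SimpleCookie


def server_response(request_headers, new_id="abc123"):
    """Hypothetical server: return (response_headers, recognised_id)."""
    jar = SimpleCookie()
    jar.load(request_headers.get("Cookie", ""))
    if "uid" in jar:
        # The browser sent the cookie back, so the server recognises the user.
        return {}, jar["uid"].value
    # First visit: ask the browser to store an ID via Set-Cookie.
    out = SimpleCookie()
    out["uid"] = new_id
    out["uid"]["path"] = "/"
    return {"Set-Cookie": out["uid"].OutputString()}, None


def client_store(response_headers, cookie_store):
    """Hypothetical browser: remember whatever the server set."""
    if "Set-Cookie" in response_headers:
        received = SimpleCookie()
        received.load(response_headers["Set-Cookie"])
        for name, morsel in received.items():
            cookie_store[name] = morsel.value


def client_request_headers(cookie_store):
    """Hypothetical browser: attach stored cookies to the next request."""
    if not cookie_store:
        return {}
    pairs = "; ".join(f"{name}={value}" for name, value in cookie_store.items())
    return {"Cookie": pairs}


store = {}
first_headers, first_seen = server_response(client_request_headers(store))
client_store(first_headers, store)
second_headers, second_seen = server_response(client_request_headers(store))
```

On the first round trip the server does not recognise the client and sets the cookie; on the second, the same value comes back in the “Cookie” header.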
Chapter 2: User Behaviors & Privacy
• 1 billion Internet users: a few hundred million in Europe, 100M in US, China
• The IPv4 address space is full: 2^32 ≈ 4.3 billion addresses
• Google gets 80 billion views every day, e.g. one Internet user visits about 1 Google page every day (e.g. search, email, ad)
• The Internet brings new economics, lifestyles, and social phenomena, e.g. online shopping, social networks (Facebook), newspapers and publications, US elections
• For the first time in history, human beings might lose their privacy; their social activities can be tracked, studied, and finally manipulated by powerful players such as the US government or Google.
Cases:
• Currently: “Tracking case”, Apple & Google
  “Information is transmitted securely to the Apple iAd server via a cellular network connection or Wi-Fi Internet connection,” explained a letter Apple sent to US Rep. Edward Markey, D-Mass., on July 12 in response to his request for information. “The latitude/longitude coordinates are converted immediately by the server to a five-digit ZIP code.”
• 2008: “Suicide” case, MySpace
• On the technical side, the credit card industry has successfully built tracking tools that have tracked user behaviors for 20 years!
What kind of private information?
• May lose, unprotected
  • Demographic information, e.g. age, gender, income, household
  • Via ISP or cellular service provider, social network sites, other free services
• You definitely expose
  • Geographic information (via IP)
  • OS and browser, such as PC, Linux, iPhone
  • Language
• May lose, protected by laws
  • Your name, identity cards (credit card, SSN, driver license, etc.)
  • Via online shopping sites, government/university service sites, credit report sites, dating sites, etc.
  • In practice, it can still be stolen => virus, spyware, break-in
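To make the "you definitely expose" items concrete, here is a rough sketch of what a server can read from a single request. The function name, the substring-based OS guess, and the sample header values are illustrative assumptions; mapping the IP address to a location would additionally need a geolocation database, which is not shown.

```python
def exposed_info(remote_ip, headers):
    """Hypothetical helper: summarise what one HTTP request reveals."""
    user_agent = headers.get("User-Agent", "")
    # Crude guess by substring; "iPhone" is checked first because iPhone
    # user-agent strings also contain "Mac OS X".
    os_hint = next(
        (name for name in ("iPhone", "Linux", "Windows", "Mac OS X")
         if name in user_agent),
        "unknown",
    )
    # Accept-Language looks like "de-DE,de;q=0.9,en;q=0.8".
    languages = [
        part.split(";")[0].strip()
        for part in headers.get("Accept-Language", "").split(",")
        if part.strip()
    ]
    return {"ip": remote_ip, "os": os_hint, "languages": languages}


sample = exposed_info(
    "192.0.2.1",  # address from the range reserved for documentation
    {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
        "Accept-Language": "de-DE,de;q=0.9,en;q=0.8",
    },
)
```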
Chapter 4: Collect User Information
• Existing techniques
  • Relational database
  • Moving averages
  • Artificial neural network
• User profile
  • Uniquely identified by an anonymous ID
  • The ID is tracked with a cookie and saved persistently on disk
  • Every ID has a profile consisting of geographic information, demographic information, interests, shopping histories, recent behavior types (or audiences) => any information valuable to advertisers
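A user profile keyed by an anonymous ID might look roughly like the sketch below. The field names, the in-memory `profiles` dictionary, and the use of a random UUID as the ID are assumptions for illustration; the slides do not say how the real system stores profiles.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Hypothetical profile record; one per anonymous ID."""
    anon_id: str
    geo: dict = field(default_factory=dict)
    demographics: dict = field(default_factory=dict)
    interests: set = field(default_factory=set)
    shopping_history: list = field(default_factory=list)
    audiences: list = field(default_factory=list)  # recent behavior types


# Stand-in for persistent storage, keyed by the ID carried in the cookie.
profiles = {}


def get_or_create_profile(cookie_id=None):
    """Look up the profile for a cookie ID, creating one if needed."""
    if cookie_id is None:
        cookie_id = uuid.uuid4().hex  # new anonymous visitor
    if cookie_id not in profiles:
        profiles[cookie_id] = UserProfile(anon_id=cookie_id)
    return profiles[cookie_id]


visitor = get_or_create_profile()
visitor.interests.add("travel")
same_visitor = get_or_create_profile(visitor.anon_id)
```

The ID itself carries no personal data; everything of interest to an advertiser accumulates in the record it points to.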
Relational Database
• A database consists of many “normalized” tables
• A table consists of a primary key and multiple values
• A table can have several keys to search by
Example tables:
• ResearchGroup: group_id, name, description, head
• Member: member_id, group_id, name, type (professor, postdoc, student), status (current, left)
• Left: left_id, member_id, when, where
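The three example tables can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the table `Left` and its columns `when` and `where` are renamed to `left_record`, `left_when` and `left_where` because LEFT, WHEN and WHERE are SQL keywords, and all inserted rows are made-up placeholder data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE research_group (
    group_id INTEGER PRIMARY KEY, name TEXT, description TEXT, head TEXT);
CREATE TABLE member (
    member_id INTEGER PRIMARY KEY,
    group_id INTEGER REFERENCES research_group(group_id),
    name TEXT, type TEXT, status TEXT);
CREATE TABLE left_record (
    left_id INTEGER PRIMARY KEY,
    member_id INTEGER REFERENCES member(member_id),
    left_when TEXT, left_where TEXT);
-- A second search key on member, besides its primary key.
CREATE INDEX idx_member_group ON member(group_id);
""")

# Placeholder rows, invented for the example.
conn.execute(
    "INSERT INTO research_group VALUES (1, 'Group A', 'placeholder', 'Head A')")
conn.executemany(
    "INSERT INTO member VALUES (?, ?, ?, ?, ?)",
    [(1, 1, "Member X", "postdoc", "current"),
     (2, 1, "Member Y", "student", "left")])
conn.execute("INSERT INTO left_record VALUES (1, 2, '2009', 'City Z')")


def members_who_left(connection, group_name):
    """Join the three normalized tables through their keys."""
    return connection.execute(
        """SELECT m.name, l.left_when, l.left_where
           FROM member AS m
           JOIN research_group AS g ON g.group_id = m.group_id
           JOIN left_record AS l ON l.member_id = m.member_id
           WHERE g.name = ?""",
        (group_name,),
    ).fetchall()


departed = members_who_left(conn, "Group A")
```

Normalization shows up in the join: the group name and the departure details each live in exactly one table and are linked only through `group_id` and `member_id`.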
Moving Average
A simplified time-series analysis tool
• A new value is an average of the last N detections, with weights that decay with time.
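One possible reading of this definition is sketched below. The slides do not say how the weights decay, so the exponential form (weight `decay ** age`, with the newest detection at age 0) and the class name are assumptions.

```python
from collections import deque


class DecayingMovingAverage:
    """Weighted average of the last n values; older values count less."""

    def __init__(self, n, decay=0.5):
        self.values = deque(maxlen=n)  # keeps only the last n detections
        self.decay = decay

    def update(self, x):
        self.values.append(x)
        # Newest value has age 0 and weight 1; each step back multiplies
        # the weight by `decay`.
        weights = [self.decay ** age for age in range(len(self.values))]
        newest_first = reversed(self.values)
        weighted_sum = sum(w * v for w, v in zip(weights, newest_first))
        return weighted_sum / sum(weights)


average = DecayingMovingAverage(n=3, decay=0.5)
history = [average.update(x) for x in (4.0, 2.0, 1.0, 8.0)]
```

With `n=3`, the fourth update no longer sees the first value at all, and the most recent detection dominates the result.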
Artificial Neural Network
• Machine learning
• Training:
  3,5 => 15
  4,6 => 24
  9,8 => 72
  …
  6,7 => 41
• Neurons work in parallel => very fast
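The training examples above are products of two numbers. As a deliberately tiny stand-in for a real network, the sketch below trains a single linear neuron by gradient descent on log-transformed inputs (since log a + log b = log ab) and then queries it on a held-out pair. The log transform, the training set (the 1..9 multiplication table without 6 and 7 together), the learning rate and the epoch count are all assumptions; a real multi-layer network would be trained differently and would typically give an approximate answer.

```python
import math

# Training pairs: multiplication table for 1..9, holding out 6 x 7.
train = [
    (a, b, a * b)
    for a in range(1, 10)
    for b in range(1, 10)
    if {a, b} != {6, 7}
]


def features(a, b):
    """Log-transformed inputs plus a constant bias input."""
    return (math.log(a), math.log(b), 1.0)


def train_neuron(samples, lr=0.1, epochs=2000):
    """Full-batch gradient descent on squared error in log space."""
    weights = [0.0, 0.0, 0.0]
    data = [(features(a, b), math.log(y)) for a, b, y in samples]
    n = len(data)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for x, target in data:
            err = sum(w * xi for w, xi in zip(weights, x)) - target
            for i in range(3):
                grad[i] += err * x[i] / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


def predict(weights, a, b):
    """Neuron output, mapped back from log space."""
    return math.exp(sum(w * xi for w, xi in zip(weights, features(a, b))))


weights = train_neuron(train)
held_out = predict(weights, 6, 7)
```

Because this toy problem is exactly linear in log space, the neuron recovers the held-out product almost exactly.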
Chapter 5: Online Advertising
The good side of tracking
The system we are developing
The good side of user tracking
Current challenges
• The server processes 10,000 requests per second
• For each request, update the user profile with 100 attributes
• Pick one from 100 possible advertiser candidates
• 10^8 decisions per second
• 100 million impressions per day
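The figures in this list can be related with a quick calculation. Reading the 10^8 decisions per second as requests × attributes × candidates is an assumption, as is treating 10,000 requests per second as a peak rate rather than the daily average.

```python
requests_per_second = 10_000
attributes_per_profile = 100
candidates_per_request = 100

# Assumed interpretation: every candidate is scored against every attribute.
decisions_per_second = (
    requests_per_second * attributes_per_profile * candidates_per_request
)

impressions_per_day = 100_000_000
seconds_per_day = 24 * 60 * 60
average_requests_per_second = impressions_per_day / seconds_per_day
```

Under these assumptions the product is 10^8, and 100 million impressions per day averages out to roughly 1,157 requests per second, well below the 10,000 per second the server is sized for.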
In the Future
• Statistics => dynamic, finding rules, clustering analysis, time-series analysis
• Instant change of behaviors, e.g. shopping intention
• How behaviors are affected by the environment: social effect, “friend-recommendation” effect
THANKS!