Thank you Prof. Dr. Gerhard Boerner!

Large Scale Statistics in Internet Behaviors. Hongguang Bi, Greetingland, LLC, Los Angeles, CA.


Presentation Transcript


  1. Thank you Prof. Dr. Gerhard Boerner! Stephen, Thomas, Houjun, Me, Robert Jing

  2. Large Scale Statistics in Internet Behaviors
     Hongguang Bi
     Greetingland, LLC, Los Angeles, CA

  3. Outline
     • Chapter 1: Internet and WWW. History, how it works
     • Chapter 2: Internet User Behaviors & Privacy
     • Chapter 3: Online Advertising. Geo, contextual and behavioral targeting, real-time bidding, yield management
     • Chapter 4: Collecting User Information, what and how

  4. Chapter 1: Internet and WWW
     • Cosmology: nature defines the physical laws. Internet: humans define the laws (specifically, protocols).
     • Cosmology: the real world. Internet: the information world, or virtual world.
     • Cosmology: photons, electrons, neutrinos… (monad? Leibniz). Internet: the bit.
     • Cosmology: particles => stars => galaxies => clusters etc. Internet: bits => bytes or integers => words => pages & emails.
     • Cosmology: millions of galaxies detected => billions. Internet: millions to billions of users.
     • Cosmology: goal => structures, statistics of galaxies. Internet: goal => behaviors, statistics of users.

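  5. Open Systems Interconnection (OSI) Model: 7 layers
     Protocols labeled on the diagram: HTTP (application layer), encryption (presentation layer), TCP and UDP (transport layer), IP (network layer)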

  6. Information Age: Web and Email
     • WWW: March 1989, Tim Berners-Lee
     • HTTP 0.9: 1991; HTTP 1.0: 1996; HTTP 1.1: June 1999, RFC 2616
     • Mailbox Protocol: 1971
     • SMTP: 1982, RFC 821
     • Later developments: UUCP, sendmail

  7. HTTP: how the web works
     A cookie is the only way a server can store data in the user's browser. How does it work?
     • The user sends a request containing:
       • URL address
       • Browser (Firefox, IE, mobile, etc.)
       • Language, referrer, etc.
       • Cookies
     • The web server responds with:
       • Message body
       • Message size, modification time, etc.
       • Server information
       • Cookie setup
     Client: sends a request without a cookie. Server: responds with a "Set-Cookie" header containing some information. Client: sends later requests with a "Cookie" header containing the SAME information.
     A cookie is bound to a specific server, and one server can set multiple cookies.
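The Set-Cookie / Cookie round trip described on this slide can be sketched with Python's standard `http.cookies` module. This is a minimal illustration, not the presenter's code: the cookie names `uid` and `lang`, their values, and both helper functions are hypothetical.

```python
from http.cookies import SimpleCookie


def server_set_cookie_headers(values):
    """Server side: build one Set-Cookie header value per cookie."""
    cookie = SimpleCookie()
    for name, value in values.items():
        cookie[name] = value
        cookie[name]["path"] = "/"  # attribute sent to the browser only
    return [morsel.OutputString() for morsel in cookie.values()]


def client_cookie_header(set_cookie_values):
    """Client side: store the cookies, then echo back only name=value pairs."""
    jar = SimpleCookie()
    for header_value in set_cookie_values:
        jar.load(header_value)
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


# First response: the server sets two cookies (hypothetical names and values).
set_cookie = server_set_cookie_headers({"uid": "abc123", "lang": "en"})
# Next request: the browser sends the same information back.
cookie_header = client_cookie_header(set_cookie)
```

Attributes such as `Path` travel only from server to browser; the browser returns just the name and value, which is what lets the server recognize the same visitor again.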

  8. Chapter 2: User Behaviors & Privacy
     • 1 billion internet users: a few hundred million in Europe, 100M in the US, China
     • IPv4 is full: its address space is 2^32 ≈ 4.3 billion addresses
     • Google gets 80 billion views every day, e.g. one internet user visits about 1 Google page every day (e.g. search, email, ads)
     • The internet brings new economics, lifestyles, and social phenomena, e.g. online shopping, social networks (Facebook), newspapers and publications, US elections
     • For the first time in history, human beings might lose their privacy; their social activities can be tracked, studied, and finally manipulated by powerful players such as the US government or Google.

  9. Cases:
     • Currently: "Tracking case", Apple & Google. "Information is transmitted securely to the Apple iAd server via a cellular network connection or Wi-Fi Internet connection," explained a letter Apple sent to US Rep. Edward Markey, D-Mass., on July 12 in response to his request for information. "The latitude/longitude coordinates are converted immediately by the server to a five-digit ZIP code."
     • 2008: "Suicide" case, MySpace
     • On the technical side, the credit card industry has successfully built tracking tools that have tracked user behaviors for 20 years!

  10. What kind of private information?
      • May lose, unprotected
        • Demographic information, e.g. age, gender, income, household
        • Via ISP or cellular service provider, social network sites, other free services
      • You definitely expose
        • Geographic information (via IP)
        • OS and browser, such as PC, Linux, iPhone
        • Language
      • May lose, protected by law
        • Your name, identity cards (credit card, SSN, driver's license, etc.)
        • Via online shopping sites, government/university service sites, credit report sites, dating sites, etc.
        • In practice, can still be stolen => virus, spyware, break-in

  11. Chapter 4: Collect User Information
      • Existing techniques
        • Relational database
        • Moving averages
        • Artificial neural network
      • User profile
        • Uniquely identified by an anonymous ID
        • The ID is tracked with a cookie and saved permanently on disk
        • Every ID has a profile consisting of geographic information, demographic information, interests, shopping histories, recent behavior types (or audiences) => any information valuable to advertisers
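A profile keyed by an anonymous cookie ID might be modeled roughly as follows. This is a sketch under stated assumptions: the class names, field names, and the in-memory dictionary (standing in for permanent disk storage) are all hypothetical and not taken from the presented system.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    anon_id: str                      # anonymous ID carried in the cookie
    geo: dict = field(default_factory=dict)
    demographics: dict = field(default_factory=dict)
    interests: set = field(default_factory=set)
    shopping_history: list = field(default_factory=list)
    audiences: set = field(default_factory=set)   # recent behavior types


class ProfileStore:
    """In-memory stand-in for the persistent profile storage."""

    def __init__(self):
        self._profiles = {}

    def get_or_create(self, cookie_id=None):
        # No cookie, or an unknown one: issue a fresh anonymous ID.
        if cookie_id is None or cookie_id not in self._profiles:
            cookie_id = uuid.uuid4().hex
            self._profiles[cookie_id] = UserProfile(anon_id=cookie_id)
        return self._profiles[cookie_id]


store = ProfileStore()
first_visit = store.get_or_create()            # browser has no cookie yet
first_visit.interests.add("travel")
return_visit = store.get_or_create(first_visit.anon_id)  # cookie sent back
```

The ID itself carries no personal data; everything of value to an advertiser lives in the profile that the ID points to.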

  12. Relational Database
      • A database consists of many "normalized" tables
      • A table consists of a primary key and multiple values
      • One table can have many keys to search
      Example tables:
      ResearchGroup: group_id, name, description, head
      Member: member_id, group_id, name, type (professor, postdoc, student), status (current, left)
      Left: left_id, member_id, when, where
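The three tables on this slide can be tried out with Python's built-in `sqlite3` module. The sketch below assumes SQLite column types and foreign keys that the slide does not specify, and the sample rows are invented for illustration. `Left`, `when`, and `where` are SQL keywords, so they are written as quoted identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ResearchGroup (
    group_id INTEGER PRIMARY KEY, name TEXT, description TEXT, head TEXT);
CREATE TABLE Member (
    member_id INTEGER PRIMARY KEY,
    group_id INTEGER REFERENCES ResearchGroup(group_id),
    name TEXT, type TEXT, status TEXT);
CREATE TABLE "Left" (
    left_id INTEGER PRIMARY KEY,
    member_id INTEGER REFERENCES Member(member_id),
    "when" TEXT, "where" TEXT);
""")

# Invented sample data, for illustration only.
conn.execute("INSERT INTO ResearchGroup VALUES (1, 'Cosmology', 'Example group', 'Head A')")
conn.executemany("INSERT INTO Member VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 'Alice', 'student', 'current'),
    (2, 1, 'Bob', 'postdoc', 'left'),
])
conn.execute("""INSERT INTO "Left" VALUES (1, 2, '2010', 'Industry')""")


def where_former_members_went(group_name):
    """Join all three tables on their keys to follow members who left."""
    rows = conn.execute("""
        SELECT m.name, l."where"
        FROM Member AS m
        JOIN "Left" AS l ON l.member_id = m.member_id
        JOIN ResearchGroup AS g ON g.group_id = m.group_id
        WHERE g.name = ?
        ORDER BY m.name
    """, (group_name,))
    return rows.fetchall()
```

Each fact is stored once (normalization), and the keys `group_id` and `member_id` are what allow the tables to be recombined in a query.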

  13. Moving Average
      A simplified time-series analysis tool
      • A new value is an average of the last N detections, with weights that decay over time.
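One way to read this definition is a weighted average over a window of N values, where the weight of a value shrinks geometrically with its age. The slide does not give the weighting scheme, so the geometric decay, the class name, and the parameter names below are assumptions.

```python
from collections import deque


class DecayingMovingAverage:
    """Weighted average of the last `window` values; weight = decay ** age."""

    def __init__(self, window, decay):
        self.values = deque(maxlen=window)  # oldest values fall off the left
        self.decay = decay                  # 0 < decay <= 1

    def update(self, x):
        self.values.append(x)
        # The newest value has age 0 and therefore weight 1.
        weights = [self.decay ** age for age in range(len(self.values))]
        newest_first = reversed(self.values)
        weighted_sum = sum(w * v for w, v in zip(weights, newest_first))
        return weighted_sum / sum(weights)


avg = DecayingMovingAverage(window=3, decay=0.5)
readings = [avg.update(x) for x in (10, 20, 30, 40)]
```

With `decay=1.0` this reduces to a plain moving average; smaller values make the estimate react faster to recent detections.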

  14. Artificial Neural Network
      Machine learning
      Training: 3,5 => 15; 4,6 => 24; 9,8 => 72; … ; 6,7 => 41
      Neurons work in parallel => very fast
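The training pairs on this slide look like products of two digits, so a toy version of the idea is a small network trained on (a, b) => a*b. The sketch below is an assumption-laden illustration in pure Python: one hidden layer of tanh units, full-batch gradient descent, and all sizes and rates are arbitrary choices, not the presenter's setup. It runs sequentially, whereas real implementations evaluate neurons in parallel.

```python
import math
import random


def train_multiplier(epochs=1000, hidden=8, lr=0.1, seed=0):
    """Train a tiny 2-input network on a*b for digits 1..9."""
    rng = random.Random(seed)
    # Inputs and targets are scaled into [0, 1] to keep training stable.
    data = [((a / 10, b / 10), a * b / 100)
            for a in range(1, 10) for b in range(1, 10)]
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = [0.0]  # one-element list so the nested functions see updates

    def forward(x):
        h = [math.tanh(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j])
             for j in range(hidden)]
        return h, sum(w2[j] * h[j] for j in range(hidden)) + b2[0]

    def loss():  # half the mean squared error over the training set
        return sum((forward(x)[1] - t) ** 2 for x, t in data) / (2 * len(data))

    initial_loss = loss()
    for _ in range(epochs):
        g_w1 = [[0.0, 0.0] for _ in range(hidden)]
        g_b1 = [0.0] * hidden
        g_w2 = [0.0] * hidden
        g_b2 = 0.0
        for x, t in data:  # accumulate gradients by backpropagation
            h, y = forward(x)
            err = (y - t) / len(data)
            g_b2 += err
            for j in range(hidden):
                g_w2[j] += err * h[j]
                dh = err * w2[j] * (1.0 - h[j] ** 2)
                g_w1[j][0] += dh * x[0]
                g_w1[j][1] += dh * x[1]
                g_b1[j] += dh
        for j in range(hidden):  # gradient descent step
            w2[j] -= lr * g_w2[j]
            b1[j] -= lr * g_b1[j]
            w1[j][0] -= lr * g_w1[j][0]
            w1[j][1] -= lr * g_w1[j][1]
        b2[0] -= lr * g_b2

    def predict(a, b):
        return 100 * forward((a / 10, b / 10))[1]

    return predict, initial_loss, loss()
```

Calling `predict, before, after = train_multiplier()` returns a function whose output for a pair such as (6, 7) is an approximation learned from examples, not an exact product.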

  15. Chapter 5: Online Advertising
      The good side of tracking

  16. The system we are developing
      The good side of user tracking
      Current challenges
      • The server processes 10,000 requests per second
      • For each request, update the user profile with 100 attributes
      • Pick one from 100 possible advertiser candidates
      • 10^8 decisions per second
      • 100 million impressions per day
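The 10^8 figure appears to be the product of the three numbers listed before it. That reading is an assumption (for instance, each candidate being evaluated against each profile attribute on every request); the slide does not spell it out. The arithmetic, with hypothetical constant names:

```python
REQUESTS_PER_SECOND = 10_000
ATTRIBUTES_PER_PROFILE = 100
AD_CANDIDATES = 100
IMPRESSIONS_PER_DAY = 100_000_000
SECONDS_PER_DAY = 24 * 60 * 60

# Assumed reading: every request touches every attribute for every candidate.
decisions_per_second = (
    REQUESTS_PER_SECOND * ATTRIBUTES_PER_PROFILE * AD_CANDIDATES
)

# Daily impression volume expressed as an average per-second rate.
avg_impressions_per_second = IMPRESSIONS_PER_DAY / SECONDS_PER_DAY

print(f"{decisions_per_second:.0e} decisions per second")
print(f"about {avg_impressions_per_second:.0f} impressions per second on average")
```

On these numbers, 100 million impressions per day averages to roughly 1,157 per second, well below the 10,000 requests per second listed as the processing target.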

  17. In the Future
      • Statistics => dynamic, finding rules, clustering analysis, time-series analysis
      • Instant changes of behavior, e.g. shopping intention
      • How behaviors are affected by the environment: social effects, the "friend-recommendation" effect
      THANKS!
