
Inferring Demographic Attributes of Anonymous Internet Users



Presentation Transcript


  1. Topic • Inferring Demographic Attributes of Anonymous Internet Users • Web Mining Seminar • Qian, Jun

  2. Structure • 1 Abstract • 2 Introduction • 3 Approach • 4 Conclusion

  3. Abstract • Anonymous Internet users • Advertisement & demographic attributes • Usage information • Latent Semantic Analysis • Neural Model

  4. Structure • 1 Abstract • 2 Introduction • 3 Approach • 4 Conclusion

  5. The Problem • Web advertisers want to target customers with certain demographic attributes • Most Internet users are anonymous

  6. The Solution • 1 Post the ad on relevant websites • 2 Wait for the users' search terms • 3 Conduct surveys • Anyway…

  7. Target Of This Research • Build a high-quality database to establish that up to 6 demographic factors can be inferred for users whose demographic information is not otherwise available

  8. Methodology • Collect usage information • Prepare usage information with LSA • Create a neural model

  9. LSA Overview • An information retrieval technique that represents text as vectors • Here: create a single vector representing an Internet user of interest • The user vector is a combination of several vectors into one vector

  10. Vector-space Information Retrieval • Documents are vectors of terms, d = (t1, t2, …, tn) • A query is a vector of terms as well, q = (t1, t2, …, tn) • Term-by-document matrix: rows are terms, columns are documents
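
The vector-space idea on slide 10 can be illustrated with a minimal sketch. The three toy documents, the raw term counts, and the cosine similarity used for ranking are assumptions made for illustration; the slides do not say how terms are weighted or how vectors are compared.

```python
from collections import Counter
import math

# Toy documents (hypothetical examples, not from the presentation).
docs = {
    "d1": "cheap flights and hotel deals",
    "d2": "football scores and football news",
    "d3": "hotel reviews and travel news",
}

# Vocabulary: one row per distinct term, in sorted order.
terms = sorted({w for text in docs.values() for w in text.split()})


def to_vector(text):
    """Map a text to raw term counts over the shared vocabulary."""
    counts = Counter(text.split())
    return [counts[t] for t in terms]


# Term-by-document matrix: rows are terms, columns are documents.
doc_ids = list(docs)
columns = [to_vector(docs[d]) for d in doc_ids]
matrix = [[col[i] for col in columns] for i in range(len(terms))]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# A query is a vector over the same terms, so it can be compared to documents.
query = to_vector("football news")
scores = {d: cosine(query, col) for d, col in zip(doc_ids, columns)}
best = max(scores, key=scores.get)
```

Because the query and the documents share one vocabulary, ranking reduces to comparing columns of the matrix against the query vector.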

  11. The Singular Value Decomposition (SVD) • Decompose the t × d term-by-document matrix A as A = T S D^T, where • T is a t × k matrix of term vectors • D^T is the transpose of a d × k matrix D of document vectors • S is a k × k diagonal matrix of singular values, with k chosen so that 100 < k < 300
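
The decomposition on slide 11 can be sketched with NumPy's `numpy.linalg.svd`. The 6 × 4 matrix and `k = 2` are toy values chosen for illustration; with k smaller than the rank of A, the product T S D^T is a rank-k approximation of A rather than an exact reconstruction.

```python
import numpy as np

# Toy t x d term-by-document matrix (6 terms, 4 documents); values are made up.
A = np.array([
    [2, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 3, 0, 1],
    [0, 1, 0, 2],
    [1, 0, 2, 0],
    [0, 0, 1, 1],
], dtype=float)

k = 2  # toy rank; the slides suggest 100 < k < 300 for a real collection

# svd returns U (t x r), singular values s (length r), Vt (r x d), r = min(t, d).
U, s, Vt = np.linalg.svd(A, full_matrices=False)

T = U[:, :k]         # t x k matrix of term vectors
S = np.diag(s[:k])   # k x k diagonal matrix of the k largest singular values
D = Vt[:k, :].T      # d x k matrix of document vectors

# Rank-k approximation A_k = T S D^T.
A_k = T @ S @ D.T

# The Frobenius error equals the norm of the discarded singular values.
error = np.linalg.norm(A - A_k)
```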

  12. Structure • 1 Abstract • 2 Introduction • 3 Approach • 4 Conclusion

  13. Collect Background Information • Target: a collection of documents consisting of popular web pages accessed by Internet users • Procedure: a web crawler was used; web pages of less than 4k bytes in size were accessed

  14. Create A Term By Document Matrix • Target: create a term-by-document matrix, using the document collection as input • Procedure: SMART software from Cornell University

  15. Perform An SVD On The Term-document Matrix • Target: an LSA vector representing all the usage data associated with each Internet user of interest • Procedure: (1) compute the sum of the vectors in the matrix T, (2) scale the resulting vector by the inverse of the matrix S, which gives a pseudo-document vector, (3) add the document vectors representing the web pages accessed by the Internet user to that pseudo-document vector
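
The three-step procedure on slide 15 might look like the sketch below. The slide does not say which rows of T are summed, so the code assumes they are the rows for terms tied to the user (for example search terms); `user_term_rows`, `user_doc_cols`, the toy matrix, and `k = 2` are all hypothetical.

```python
import numpy as np

# Toy term-by-document matrix (6 terms x 4 documents); values are made up.
A = np.array([
    [2, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 3, 0, 1],
    [0, 1, 0, 2],
    [1, 0, 2, 0],
    [0, 0, 1, 1],
], dtype=float)

k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
T, S, D = U[:, :k], np.diag(s[:k]), Vt[:k, :].T

# Hypothetical usage data for one user, given as indices into A.
user_term_rows = [0, 4]  # ASSUMPTION: terms tied to the user, e.g. search terms
user_doc_cols = [0, 2]   # pages the user accessed

# Step 1: sum the selected term vectors from T.
term_sum = T[user_term_rows].sum(axis=0)

# Step 2: scale by the inverse of S to obtain a pseudo-document vector.
pseudo_doc = term_sum @ np.linalg.inv(S)

# Step 3: add the document vectors of the accessed pages.
user_vector = pseudo_doc + D[user_doc_cols].sum(axis=0)
```

One sanity check on the scaling step: applying the same term-sum-then-scale operation to an existing column of A, `A[:, 0] @ T @ np.linalg.inv(S)`, recovers that document's own vector `D[0]`, which is why the scaled sum can be treated as a document-like vector.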

  16. Create A Neural Model To Test The Hypothesis • Model: 3-layer neural model • Training: independent & dependent variables • Number: 40000 observations for training, 20000 observations for validation
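
A 3-layer model of the kind named on slide 16 could be sketched as follows. Everything specific here is an assumption: the synthetic data stands in for user vectors (independent variables) and one binary demographic attribute (dependent variable), and the layer sizes, activations, learning rate, and number of steps are illustrative choices, not the presentation's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: 200 "user vectors" of dimension 5, one binary label.
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)

# Three layers: input (5 units) -> hidden (8 units) -> output (1 unit).
W1 = rng.normal(scale=0.5, size=(5, 8))
b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1))
b2 = np.zeros(1)


def forward(inputs):
    """Return hidden activations and predicted probabilities."""
    hidden = np.tanh(inputs @ W1 + b1)
    prob = 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))
    return hidden, prob


def cross_entropy(prob, target):
    """Mean binary cross-entropy, with a small epsilon for stability."""
    eps = 1e-12
    return float(-np.mean(target * np.log(prob + eps)
                          + (1 - target) * np.log(1 - prob + eps)))


initial_loss = cross_entropy(forward(X)[1], y)

learning_rate = 0.1
for _ in range(1000):
    hidden, prob = forward(X)
    # Gradient of the mean cross-entropy w.r.t. the output pre-activation.
    grad_out = (prob - y) / len(X)
    grad_W2 = hidden.T @ grad_out
    grad_b2 = grad_out.sum(axis=0)
    # Backpropagate through tanh: d tanh(z)/dz = 1 - tanh(z)^2.
    grad_hidden = (grad_out @ W2.T) * (1 - hidden ** 2)
    grad_W1 = X.T @ grad_hidden
    grad_b1 = grad_hidden.sum(axis=0)
    # In-place updates so that forward() sees the new weights.
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1
    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2

final_loss = cross_entropy(forward(X)[1], y)
accuracy = float(np.mean((forward(X)[1] > 0.5) == (y > 0.5)))
```

In the setup the slides describe, a model like this would be trained separately for each dependent variable, with the accuracy measured on held-out validation observations rather than on the training data as in this toy example.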

  17. Variables (variable: possible values) • Gender: male, female • Age Under 18: true, false • Age 55+: true, false • Income Under $50000: true, false • Marital Status: single, married • Some College Education: true, false • Children in the Home: true, false

  18. Training • Training data: contains equal proportions of the values of the dependent variable under consideration • Validation data: contains the true proportions of the values of the dependent variable under consideration
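
The split on slide 18 can be illustrated with a small sketch. The slides do not say how the equal proportions were obtained; downsampling the majority class is an assumption here, and the observations, the 30% positive rate, and the set sizes are made up.

```python
import random

random.seed(0)

# Hypothetical labeled observations: (user_id, label), with 30% True overall.
observations = [(i, i % 10 < 3) for i in range(1000)]
random.shuffle(observations)

# Validation set: a plain random sample, so it keeps (approximately)
# the true class proportions of the population.
validation = observations[:300]
pool = observations[300:]

# Training set: equal proportions, obtained here by downsampling the
# majority class (an assumed balancing method).
positives = [o for o in pool if o[1]]
negatives = [o for o in pool if not o[1]]
n = min(len(positives), len(negatives))
training = random.sample(positives, n) + random.sample(negatives, n)
random.shuffle(training)


def positive_share(rows):
    """Fraction of rows whose label is True."""
    return sum(1 for _, label in rows if label) / len(rows)
```

Balancing the training data keeps a model from simply predicting the majority value, while validating on the true proportions shows how it would behave on the real population.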

  19. Structure • 1 Abstract • 2 Introduction • 3 Approach • 4 Conclusion

  20. Conclusion • It is possible to make demographic inferences about Internet users for whom such information is not otherwise available • Privacy concerns
