Topic • Inferring Demographic Attributes of Anonymous Internet Users • Web Mining Seminar • Qian, Jun
Structure • 1 Abstract • 2 Introduction • 3 Approach • 4 Conclusion
Abstract • Anonymous Internet users • Advertisement & demographic attributes • Usage information • Latent Semantic Analysis (LSA) • Neural Model
The Problem • Web advertisers want to target customers with certain demographic attributes • Most Internet users are anonymous
The Solution • 1 Post the ad on relevant websites • 2 Wait for the users' search terms • 3 Conduct a survey • Anyway…...
Target Of This Research • Build a high-quality database to establish the possibility of inferring up to 6 demographic factors for users whose demographic information is not otherwise available
Methodology • Collect usage information • Prepare the usage information with LSA • Create a neural model
LSA Overview • An information retrieval technique that represents text as vectors • Here: create a single vector representing an Internet user of interest • The user vector combines the document vectors of the web pages accessed with a pseudo-document vector
Vector-space Information Retrieval • Documents are vectors of terms: d = (t1, t2, …, tn) • A query is a vector of terms as well: q = (t1, t2, …, tn) • Term-by-document matrix: rows are terms, columns are documents
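The vector-space idea above can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used in the study: the three toy documents, the whitespace tokenizer, and the helper names `term_document_matrix` and `query_vector` are all assumptions made for the example.

```python
from collections import Counter


def term_document_matrix(docs):
    """Return (terms, matrix); matrix[i][j] counts term i in document j."""
    tokenized = [doc.lower().split() for doc in docs]
    # Rows are terms (sorted for a stable order), columns are documents.
    terms = sorted({tok for toks in tokenized for tok in toks})
    counts = [Counter(toks) for toks in tokenized]
    matrix = [[c[term] for c in counts] for term in terms]
    return terms, matrix


def query_vector(terms, query):
    """Represent a query in the same term space as the documents."""
    counts = Counter(query.lower().split())
    return [counts[term] for term in terms]


# Hypothetical toy documents standing in for crawled web pages.
docs = ["cheap flights to rome", "rome hotels cheap", "football scores"]
terms, A = term_document_matrix(docs)
q = query_vector(terms, "cheap rome")
```

Each row of `A` is one term and each column one document, so a query such as `q` lives in the same space as the document columns.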
The Singular Value Decomposition (SVD) • Decompose the t×d term-by-document matrix A as A = T S D^T, into • a t×k matrix T of term vectors • the transpose D^T of a d×k matrix D of document vectors • a k×k diagonal matrix S of singular values, with k chosen so that 100 < k < 300
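As a rough sketch of this decomposition, the NumPy snippet below splits a toy term-by-document matrix into T, S and D and keeps only the k largest singular values. The matrix values and the helper name `truncated_svd` are assumptions for illustration; the toy uses k = 2 only because the example matrix is tiny, whereas the slides call for 100 < k < 300.

```python
import numpy as np


def truncated_svd(A, k):
    """Return T (t x k), S (k x k diagonal), D (d x k) with A ~= T @ S @ D.T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    T = U[:, :k]          # term vectors, one row per term
    S = np.diag(s[:k])    # k largest singular values on the diagonal
    D = Vt[:k, :].T       # document vectors, one row per document
    return T, S, D


# Toy 7-term x 3-document count matrix (hypothetical data).
A = np.array(
    [[1, 1, 0],
     [1, 0, 0],
     [0, 0, 1],
     [0, 1, 0],
     [1, 1, 0],
     [0, 0, 1],
     [1, 0, 0]],
    dtype=float,
)

T, S, D = truncated_svd(A, k=2)
A_k = T @ S @ D.T  # rank-2 approximation of A
```

With k smaller than the rank of A the product is only an approximation; keeping all singular values reproduces A up to floating-point error.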
Collect Background Information • Target: a collection of documents consisting of popular web pages accessed by Internet users • Procedure: a web crawler was used; web pages smaller than 4k bytes were accessed
Create A Term-By-Document Matrix • Target: create a term-by-document matrix from the document collection, to be used as input • Procedure: SMART software from Cornell University
Perform An SVD On The Term-Document Matrix • Target: an LSA vector representing all the usage data associated with each Internet user of interest • Procedure: (1) compute the sum of the term vectors in the matrix T, (2) scale the resulting vector by the inverse of the matrix S to obtain a pseudo-document vector, (3) add the document vectors representing the web pages accessed by the Internet user to that pseudo-document vector
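The procedure above might look roughly like the following sketch. It assumes that the term vectors being summed are the rows of T for terms associated with the user, which the slide does not state explicitly; the toy matrix, the index lists, and the function name `user_vector` are hypothetical.

```python
import numpy as np


def user_vector(T, S, D, term_ids, page_ids):
    """Combine a user's terms and accessed pages into one k-dimensional vector."""
    term_sum = T[term_ids].sum(axis=0)          # sum of the selected term vectors
    pseudo_doc = term_sum @ np.linalg.inv(S)    # scale by the inverse of S
    return pseudo_doc + D[page_ids].sum(axis=0) # add vectors of accessed pages


# Toy 7-term x 3-document matrix (hypothetical data) and its rank-2 SVD.
A = np.array(
    [[1, 1, 0],
     [1, 0, 0],
     [0, 0, 1],
     [0, 1, 0],
     [1, 1, 0],
     [0, 0, 1],
     [1, 0, 0]],
    dtype=float,
)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
T, S, D = U[:, :k], np.diag(s[:k]), Vt[:k, :].T

# Assumed usage data: the user is tied to terms 0 and 4 and visited page 1.
u = user_vector(T, S, D, term_ids=[0, 4], page_ids=[1])
```

Because S is diagonal, multiplying by its inverse simply divides each component by the matching singular value, so the pseudo-document lands in the same space as the rows of D and the two can be added.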
Create A Neural Model To Test The Hypothesis • Model: 3-layer neural model • Training: independent & dependent variables • Number: 40000 observations for training, 20000 observations for validation
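For readers unfamiliar with a 3-layer model (input, hidden, output), here is a minimal NumPy sketch of one for a single binary dependent variable. The slides do not give the architecture details, so the hidden-layer size, tanh and sigmoid activations, cross-entropy loss, learning rate, and the synthetic data are all assumptions; the inputs stand in for LSA user vectors.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class ThreeLayerNet:
    """Input layer -> tanh hidden layer -> sigmoid output unit."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, size=(n_hidden, 1))
        self.b2 = np.zeros(1)

    def forward(self, X):
        H = np.tanh(X @ self.W1 + self.b1)
        p = sigmoid(H @ self.W2 + self.b2).ravel()
        return H, p

    def loss(self, X, y):
        _, p = self.forward(X)
        eps = 1e-12  # avoids log(0)
        return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

    def train_step(self, X, y, lr=0.1):
        H, p = self.forward(X)
        d_out = (p - y)[:, None] / len(y)               # grad at output pre-activation
        d_hidden = (d_out @ self.W2.T) * (1.0 - H ** 2)  # back through tanh
        self.W2 -= lr * (H.T @ d_out)
        self.b2 -= lr * d_out.sum(axis=0)
        self.W1 -= lr * (X.T @ d_hidden)
        self.b1 -= lr * d_hidden.sum(axis=0)


# Synthetic stand-in data: 200 "users" with 5-dimensional vectors.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # hypothetical binary attribute

net = ThreeLayerNet(n_in=5, n_hidden=4)
initial_loss = net.loss(X, y)
for _ in range(300):
    net.train_step(X, y, lr=0.1)
final_loss = net.loss(X, y)
_, probs = net.forward(X)
```

One such network per dependent variable is one plausible reading of the setup; the slides do not say whether the variables were modeled separately or jointly.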
Variables (Variable: Possible Values) • Gender: male, female • Age Under 18: true, false • Age 55+: true, false • Income Under $50000: true, false • Marital Status: single, married • Some College Education: true, false • Children in the Home: true, false
Training • Training Data: contains equal proportions of the values of the dependent variable under consideration • Validation Data: contains the true proportions of the values of the dependent variable under consideration
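One way to build such a split is sketched below, assuming a simple random hold-out for validation (which keeps the true proportions only approximately) and equal-sized per-class sampling for training. The function name `split_balanced`, the label list, and the sizes are hypothetical and far smaller than the 40000/20000 observations mentioned in the slides.

```python
import random
from collections import Counter


def split_balanced(labels, n_valid, per_class, seed=0):
    """Return (train_idx, valid_idx) as lists of positions in `labels`."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)

    # Validation: a plain random sample, so class proportions stay natural.
    valid_idx = indices[:n_valid]
    remaining = indices[n_valid:]

    # Training: the same number of observations for every value.
    by_value = {}
    for i in remaining:
        by_value.setdefault(labels[i], []).append(i)
    train_idx = []
    for value in sorted(by_value):
        # rng.sample raises ValueError if a class has fewer than per_class items.
        train_idx.extend(rng.sample(by_value[value], per_class))
    return train_idx, valid_idx


# Hypothetical population with a 70/30 imbalance in the dependent variable.
labels = ["male"] * 70 + ["female"] * 30
train_idx, valid_idx = split_balanced(labels, n_valid=20, per_class=10)
train_counts = Counter(labels[i] for i in train_idx)
valid_counts = Counter(labels[i] for i in valid_idx)
```

Balancing the training data keeps the model from simply predicting the majority value, while validating on natural proportions gives a more realistic picture of accuracy on real traffic.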
Conclusion • It is possible to make demographic inferences about Internet users for whom such information is not otherwise available • This raises privacy concerns