An Automatic Advertisement/Topic Modeling and Recommending System • Yi Hou, Center for Clinical Investigation (CCI), EECS Department, Case Western Reserve University
Review • Word matters! • Motivation • No systematic and automatic AD/topic categorizing system exists --> no place to specify a category • Social network platform popularity: revenue from Facebook advertising shot up 191 percent year-over-year in the first quarter of 2014
Tasks • 1) Given all the ADs/topics, establish a word network in which two words share an edge iff they co-occur in at least one AD/topic, and the edge weight is the number of times they have occurred together in an AD/topic. • Small world, power law distribution • 2) Given a word network, build a taxonomy T • Modularity-based clustering • Top 20 TF-IDF keywords (due to vocabulary issue) • Empirical network analysis • 3) Given a user's current text, e.g. the most recent few Tweets/Posts (we start with a value of 10 here), build a ranking model R; each AD is ranked by R and the top 10 ADs are returned to the user.
Data Source • Data Crawling • Twitter Streaming APIs • Ruby gem “twitterstream” • Acquired application-only authentication tokens • Set up a listening point recording global Tweets • Selected only 5 categories of ADs/Topics by keyword filtering: Car/Dating/Education/Grocery/Hiring • Manually collected data (experimented on) • Selected only 5 categories of ADs/Topics: Car/Dating/Education/Grocery/Hiring.
Method • Data Preprocessing • Build word network • Build topic taxonomy • ADs/topic ranking
Data Preprocessing • Remove Stop Words • Such as “is”, “are”, “when” … • List from Stanford NLP lab. • Stemming • Reducing inflected words to their stem, base or root form • Used the Porter stemmer at http://text-processing.com/demo/stem/ • e.g. “stemming” --> “stem” • Result • Original: “I like data mining. It is awesome.” • New: “I like data mine It awesom”
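The two preprocessing steps above can be sketched as follows. This is a minimal illustration, not the pipeline used in the slides: the three-word `STOP_WORDS` set and the `toy_stems` lookup are hypothetical stand-ins for the Stanford stop-word list and a real Porter stemmer, and the `stem` argument is where an actual stemmer would be plugged in.

```python
import re

# Hypothetical mini stop-word list; the slides use the Stanford NLP list.
STOP_WORDS = {"is", "are", "when"}


def preprocess(text, stop_words=STOP_WORDS, stem=lambda word: word):
    """Tokenize, drop stop words (case-insensitive), then stem each token."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return [stem(t) for t in tokens if t.lower() not in stop_words]


# Toy lookup standing in for a real Porter stemmer (illustration only).
toy_stems = {"mining": "mine", "awesome": "awesom"}
stemmed = preprocess("I like data mining. It is awesome.",
                     stem=lambda w: toy_stems.get(w, w))
print(" ".join(stemmed))
```

With the toy lookup, the output matches the “New” string on the slide; with the default identity `stem`, only the stop-word removal is applied.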
Data Visualization • In total, 1104 unique words, with word cloud representation.
Build Word Network • Co-occurrence Matrix of Words • Co-occurrence count serves as the similarity measure for word pairs • Co-occurrence matrix serves as our adjacency matrix • Co-occurrence count serves as the edge weight • Coded in C++. • # of nodes: 1104 • # of edges: 18972
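A minimal sketch of the co-occurrence counting, written in Python rather than the C++ of the original implementation. It assumes each AD/topic is already a list of preprocessed tokens and that a word pair counts once per AD/topic; the function name and the three sample documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_word_network(documents):
    """Return {(word_a, word_b): weight} with word_a < word_b (undirected)."""
    edges = Counter()
    for tokens in documents:
        # Use the set of words so a pair counts once per AD/topic.
        for a, b in combinations(sorted(set(tokens)), 2):
            edges[(a, b)] += 1
    return edges


docs = [["ford", "car", "sale"], ["chevy", "car", "sale"], ["date", "single"]]
network = build_word_network(docs)
nodes = {word for pair in network for word in pair}
print(len(nodes), "nodes,", len(network), "edges")
```

The sparse dictionary holds the same information as the adjacency matrix: a missing key means a weight of zero.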
Build Topic Taxonomy • Modularity-based community finding • The algorithm greedily searches the graph to maximize the modularity measure • Heavily connected components signify the topic models • Each cluster/topic is described by its top-K highest TF-IDF keywords
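One way to pick the top-K TF-IDF keywords per cluster is sketched below. The slides do not say what counts as a "document" for the IDF term; this sketch assumes each cluster's pooled tokens form one document, and the function name and sample data are hypothetical.

```python
import math
from collections import Counter


def top_k_tfidf(cluster_docs, k=20):
    """cluster_docs: {cluster_id: list of tokens}. Returns {cluster_id: [words]}."""
    n = len(cluster_docs)
    doc_freq = Counter()
    for tokens in cluster_docs.values():
        doc_freq.update(set(tokens))  # number of clusters containing each word
    result = {}
    for cid, tokens in cluster_docs.items():
        tf = Counter(tokens)
        scores = {w: (c / len(tokens)) * math.log(n / doc_freq[w])
                  for w, c in tf.items()}
        # Sort by descending score, breaking ties alphabetically.
        result[cid] = sorted(scores, key=lambda w: (-scores[w], w))[:k]
    return result


clusters = {"cars": ["ford", "ford", "chevy", "deal"],
            "dating": ["date", "single", "date", "deal"]}
keywords = top_k_tfidf(clusters, k=2)
```

A word such as "deal" that appears in every cluster gets an IDF of zero, so it never outranks cluster-specific words.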
Modularity-based finding • Modularity • One measure of the structure of networks or graphs • A measure of the goodness of a division of a network into sub-clusters • Q represents the measure of goodness • C represents the set of clusters • e_ij stands for the number of edges between clusters i and j • m represents the total number of edges • Reference: • 1. Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, Etienne Lefebvre, Fast unfolding of communities in large networks, in Journal of Statistical Mechanics: Theory and Experiment 2008 (10), P10008
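The formula itself did not survive in the slide text. One standard form of modularity that is consistent with the symbols listed above is given here as an assumption, not as the slide's exact expression; it additionally uses e_ii for the number of edges inside cluster i.

```latex
% Modularity of a partition C of a graph with m edges (assumed standard form).
Q = \sum_{i \in C} \left[ \frac{e_{ii}}{m}
    - \left( \frac{2 e_{ii} + \sum_{j \in C,\, j \neq i} e_{ij}}{2m} \right)^{2} \right]

% Change in Q when clusters i and j are merged, with
% d_i = 2 e_{ii} + \sum_{k \neq i} e_{ik} the total degree of cluster i.
\Delta Q_{ij} = \frac{e_{ij}}{m} - \frac{d_i \, d_j}{2 m^{2}}
```

The first term rewards edges that fall inside a cluster; the squared term subtracts the share expected if edges were placed at random with the same degrees.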
How the algorithm works • Start with every vertex as its own isolated cluster; • Successively join the pair of clusters that gives the greatest increase ∆Q in modularity; • Stop when joining any two clusters would result in ∆Q ≤ 0.
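The three steps above can be sketched as a plain greedy agglomeration. This is an unoptimized illustration under stated assumptions, not the implementation used for the slides: the graph is undirected, weighted, and has no self-loops; the merge gain is the standard ∆Q = e_ab/m − d_a·d_b/(2m²), with d the total weighted degree of a cluster; all names are hypothetical.

```python
def greedy_modularity_clusters(edges):
    """edges: {(u, v): weight} for an undirected graph without self-loops."""
    m = sum(edges.values())
    clusters, tot, between = {}, {}, {}
    for (u, v), w in edges.items():
        for node in (u, v):
            clusters.setdefault(node, {node})   # every vertex starts alone
            tot[node] = tot.get(node, 0) + w    # weighted degree
        key = frozenset((u, v))
        between[key] = between.get(key, 0) + w  # weight between two clusters
    while True:
        best_gain, best_pair = 0.0, None
        # Only connected cluster pairs can have a positive gain.
        for pair, w in between.items():
            a, b = tuple(pair)
            gain = w / m - tot[a] * tot[b] / (2 * m * m)
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:                   # every merge has gain <= 0
            break
        a, b = best_pair
        clusters[a] |= clusters.pop(b)          # merge b into a
        tot[a] += tot.pop(b)
        merged = {}
        for pair, w in between.items():
            if pair == frozenset((a, b)):
                continue                        # now internal to the merged cluster
            renamed = frozenset(a if c == b else c for c in pair)
            merged[renamed] = merged.get(renamed, 0) + w
        between = merged
    return list(clusters.values())


# Two triangles joined by a single bridge edge.
sample = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
          ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1, ("c", "d"): 1}
found = greedy_modularity_clusters(sample)
```

On the sample graph the procedure stops with the two triangles as separate clusters, because merging them across the bridge would lower the modularity.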
Clustering • We found 13 clusters: • Visualized: different clusters with different colors.
Clustering • We found 13 clusters: • Why not 5? • If we zoom in on 2 clusters, yellow and blue, we can see that both actually belong to grocery. • So modularity-based clustering categorizes words at a finer granularity (it divided grocery into food/electronics…).
Clustering • Percentage distribution of 13 clusters:
Clustering • Top 20 TF-IDF keywords in each cluster: • Intuitively: • Cluster 1: chevy, ford --> cars • Cluster 2: date, single --> dating • Cluster 3: lunch, friend --> social (new) • Cluster 4: hire, join --> hiring • … • We observed well-defined clusters. • We observed new categories.
Empirical Network Analysis • Property definitions: • Diameter d: the largest geodesic distance in the (connected) network. • Shortest path l_{u,v}: the length of the shortest path between two nodes u and v in the network. • Average shortest path l_network: the average of l_{u,v} over all pairs of nodes in the network. • Power law distribution: the node degree distribution follows a power law, at least asymptotically. • Small-world property: two conditions hold: 1) a high clustering coefficient (compared to the Erdős-Rényi model) and 2) a low average shortest path, meaning the typical distance l between two random nodes grows proportionally to the logarithm of the number of nodes N in the network, i.e. l_network ∝ ln(N).
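The diameter and average shortest path defined above can be computed with one breadth-first search per node. The sketch below assumes an unweighted, undirected, connected graph stored as an adjacency dictionary, so distances are hop counts; the function names and the sample path graph are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter_and_average_path(adj):
    """adj: {node: set of neighbours}; the graph is assumed connected."""
    longest, total, pairs = 0, 0, 0
    for u in adj:
        for v, d in bfs_distances(adj, u).items():
            if v != u:
                longest = max(longest, d)
                total += d
                pairs += 1
    return longest, total / pairs


# Path graph a - b - c - d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
d, l_network = diameter_and_average_path(path)
```

Each unordered pair is visited twice (once from each endpoint), which leaves the average unchanged.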
Empirical Network Analysis • Clustering coefficient definitions: • 1) Global clustering coefficient: • N_t: number of triangles formed in the graph • N_c: number of connected triples of nodes in the graph • 2) Local clustering coefficient: • Directed graph: Undirected graph: • n_i: number of direct neighbors of node i • n_c: number of direct connections between i’s direct neighbors • Averaged over all nodes: • Reference: “Social network analysis” by Lada Adamic, University of Michigan
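The formulas on this slide were lost in extraction. The standard definitions that match the symbols listed above are reproduced here as an assumption about what the slide showed; N denotes the number of nodes.

```latex
% 1) Global clustering coefficient
C_{\text{global}} = \frac{3 N_t}{N_c}

% 2) Local clustering coefficient of node i
C_i^{\text{directed}} = \frac{n_c}{n_i (n_i - 1)}, \qquad
C_i^{\text{undirected}} = \frac{2 n_c}{n_i (n_i - 1)}

% Averaged over all N nodes
\bar{C} = \frac{1}{N} \sum_{i=1}^{N} C_i
```

The factor 3 in the global coefficient accounts for each triangle containing three connected triples.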
Empirical Network Analysis • In our experiment, we use the local clustering coefficient definition (for undirected graphs); here are the statistics of the experiments. • The network satisfies the small-world property! • Let’s recall: • Small-world property: two conditions hold: 1) a high clustering coefficient (compared to the Erdős-Rényi model) and 2) a low average shortest path, meaning the typical distance l between two random nodes grows proportionally to the logarithm of the number of nodes N in the network, i.e. l_network ∝ ln(N).
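A minimal sketch of the undirected local clustering coefficient and its average over all nodes. It assumes the standard definition C_i = 2·n_c / (n_i·(n_i − 1)) and assigns 0 to nodes with fewer than two neighbors (a common convention, not stated in the slides); function names and the sample graph are hypothetical.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of the node's neighbours that are themselves linked."""
    neighbours = adj[node]
    n_i = len(neighbours)
    if n_i < 2:
        return 0.0  # convention for nodes with fewer than two neighbours
    n_c = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2 * n_c / (n_i * (n_i - 1))


def average_clustering(adj):
    """Mean of the local coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# Triangle a-b-c with a pendant node d attached to c.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
avg_c = average_clustering(graph)
```

To check the first small-world condition, this average would be compared against the value for an Erdős-Rényi random graph with the same number of nodes and edges.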
Network Diameter • Figure panels: Betweenness Centrality, Closeness Centrality, Eccentricity • Reference: • 1. Ulrik Brandes, A Faster Algorithm for Betweenness Centrality, in Journal of Mathematical Sociology 25(2):163-177 (2001)
Power Law Distribution • Degree Distribution • Figure panels: In-degree, Out-degree
Power Law Distribution • The nodes with high degrees follow a power law distribution. • The nodes with low degrees don’t. • This is because the data is limited: 1104 words in total.
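One rough way to check the high-degree tail is to fit a straight line to the degree histogram on log-log axes and read off the slope. The sketch below is a crude diagnostic, not the analysis behind the slides; the cutoff `k_min` that separates the "high-degree" tail is a hypothetical parameter, and degrees must be at least 1.

```python
import math
from collections import Counter


def fit_power_law_exponent(degrees, k_min=1):
    """Least-squares slope of log(count) vs log(degree) for degrees >= k_min.

    Returns alpha such that count(k) is roughly proportional to k ** -alpha.
    Needs at least two distinct degree values at or above k_min.
    """
    counts = Counter(d for d in degrees if d >= k_min)
    xs = [math.log(k) for k in counts]
    ys = [math.log(c) for c in counts.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Synthetic degrees whose counts are exactly 3600 / k**2 for k = 1..6.
synthetic = [k for k in range(1, 7) for _ in range(3600 // (k * k))]
alpha = fit_power_law_exponent(synthetic, k_min=3)
```

Raising `k_min` restricts the fit to the tail, which mirrors the observation that only the high-degree nodes follow the power law.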
Continuing work: ranking • FB ranking: assigns a weight to each feature. • YouTube, in contrast, adds randomness to increase recall at the cost of precision. • Reference: • 1. James Davidson, Benjamin Liebald, Junning Liu, Palash Nandy, Taylor Van Vleet, Ullas Gargi, Sujoy Gupta, Yu He, Mike Lambert, Blake Livingston, Dasarathi Sampath: The YouTube video recommendation system. RecSys 2010: 293-296
Continuing work: ranking • Our ranking: a combination of FB news ranking and YouTube ranking: • We use cosine similarity to measure which topic cluster the user is most interested in. • We generate the top 8 ADs/Topics with the FB ranking algorithm. • We add two more ADs/Topics at random. • This broadens the predictions (increases recall) at the cost of precision.
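The combined ranking can be sketched as follows. Several details are assumptions, since the slides leave them open: words are compared as bag-of-words count vectors, each AD already carries a precomputed weighted-feature score, and the two random ADs are drawn from all ADs not already selected. All names and the sample data are hypothetical.

```python
import math
import random
from collections import Counter


def cosine_similarity(words_a, words_b):
    """Cosine of the angle between two bag-of-words count vectors."""
    a, b = Counter(words_a), Counter(words_b)
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def recommend(user_words, cluster_keywords, ads, n_ranked=8, n_random=2, rng=random):
    """ads: {ad_id: (cluster_id, score)}; scores are assumed precomputed."""
    # Pick the topic cluster most similar to the user's recent text.
    best = max(cluster_keywords,
               key=lambda c: cosine_similarity(user_words, cluster_keywords[c]))
    in_cluster = [ad for ad, (cid, _) in ads.items() if cid == best]
    top = sorted(in_cluster, key=lambda ad: ads[ad][1], reverse=True)[:n_ranked]
    # Add a few random ADs from everything not already selected.
    chosen = set(top)
    rest = [ad for ad in ads if ad not in chosen]
    extras = rng.sample(rest, min(n_random, len(rest)))
    return best, top + extras


cluster_keywords = {"cars": ["ford", "chevy", "car"], "dating": ["date", "single"]}
ads = {f"car{i}": ("cars", i) for i in range(12)}
ads.update({f"date{i}": ("dating", i) for i in range(3)})
best, picks = recommend(["ford", "car", "deal"], cluster_keywords, ads,
                        rng=random.Random(0))
```

Passing a seeded `random.Random` makes the two random slots reproducible, which helps when comparing runs.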
Limitation and Future Work • We will run the system on a larger-scale dataset. • Since we don’t have real data, e.g. the performance (CTR) of each AD/topic, we need to generate it from a Gaussian model.
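Generating placeholder CTR values from a Gaussian model could look like the sketch below. The mean and standard deviation are hypothetical values chosen only for illustration, and clipping to [0, 1] is an added assumption so that every draw is a valid rate.

```python
import random


def synthetic_ctr(ad_ids, mean=0.02, std=0.01, rng=random):
    """Draw one CTR per AD from N(mean, std^2), clipped to [0, 1]."""
    return {ad: min(1.0, max(0.0, rng.gauss(mean, std))) for ad in ad_ids}


ctr = synthetic_ctr(["ad1", "ad2", "ad3"], rng=random.Random(1))
```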
Thank you! Questions?