Collective Collaborative Tagging System

Jong Y. Choi, Joshua Rosen, Siddharth Maini, Marlon E. Pierce, and Geoffrey C. Fox, Community Grids Laboratory, Indiana University

  1. Collective Collaborative Tagging System
     Jong Y. Choi, Joshua Rosen, Siddharth Maini, Marlon E. Pierce, and Geoffrey C. Fox
     Community Grids Laboratory, Indiana University

  2. People-Powered Knowledge
     • Delicious example
       [Figure annotated with: Bookmark, Tags, Social Networks, People-generated]

  3. People-Powered Knowledge
     • Collaborative tagging
       - Online bookmarking with annotations
       - Creates social networks
       - Utilizes the power of people’s knowledge
     • Pros and cons
       - High-quality classifier built on human intelligence
       - But lack of control or authority

  4. Motivations
     • Distributed and fragmented knowledge
       - Need a unified data set
       - More accurate and richer information
     • No flexibility in choosing different information retrieval (IR) algorithms
       - Need a playground to experiment with various IR techniques
     • Help discover hidden knowledge

  5. Proposed System: Collective Collaborative Tagging (CCT) System
     [Architecture diagram of the CCT system; components as labeled:]
     • Data Importer: reads distributed tagging data (RDF, RSS, Atom, HTML)
     • Data Coordinator: populates the bookmarks/tags repository
     • User Service: accepts queries with various options and returns search results (SOAP, REST, …)

  6. Development Plan and Progress
     • 1st: service and algorithm development
       - Identify services and algorithms
     • 2nd: interface development
       - Web 2.0-style interface
       - REST, SOAP, …
     • 3rd: export/import service development
       - Merge distributed data sets
       - Export data to build mash-up sites
     • So far, we are mainly in the 1st stage, with some experiments in the 2nd stage

  7. Prototype
     [Figure: prototype interface, annotated with: different data sources, various IR algorithms, flexible options, result comparison]

  8. Service Types and Algorithms
     Type | Service         | Description                                                               | Algorithm
     I    | Searching       | Given input tags, return the most relevant X (X = URLs, tags, or users)  | Latent Semantic Indexing (LSI), FolkRank
     II   | Recommendation  | Given indirect input tags, return undiscovered X                         |
     III  | Clustering      | Community discovery: find a group or community with similar interests    | K-Means, Deterministic Annealing Clustering
     IV   | Trend detection | Analyze tagging activity as a time series and detect abnormalities       | Time Series Analysis
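Service type II (recommendation) does not name an algorithm on the slide. As a hypothetical illustration only, not the CCT system's method, the sketch below ranks "undiscovered" tags by how often they co-occur with the user's input tags on the same bookmark. The bookmark data, the example.org URLs, and the function name `recommend_tags` are assumptions.

```python
from collections import Counter

# Hypothetical bookmark data: URL -> set of tags applied to it.
BOOKMARKS = {
    "http://example.org/a": {"python", "numpy", "science"},
    "http://example.org/b": {"python", "web", "django"},
    "http://example.org/c": {"numpy", "science", "matlab"},
    "http://example.org/d": {"web", "javascript"},
}


def recommend_tags(input_tags, bookmarks, top_k=3):
    """Rank tags that co-occur with the input tags but were not given as input.

    Co-occurrence counting is an assumed technique chosen for illustration,
    not the algorithm used by the CCT system.
    """
    input_tags = set(input_tags)
    scores = Counter()
    for tags in bookmarks.values():
        overlap = len(tags & input_tags)
        if overlap == 0:
            continue  # bookmark shares no tag with the query
        for tag in tags - input_tags:
            scores[tag] += overlap  # weight by how many input tags co-occur
    # Highest score first; ties broken alphabetically so results are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:top_k]]


print(recommend_tags({"numpy"}, BOOKMARKS))  # -> ['science', 'matlab', 'python']
```

"science" ranks first because it appears with "numpy" on two bookmarks; the other tags co-occur only once.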

  9. Data Models (I)
     • Vector-space model (bag-of-words model)
       - Assume n URLs and q tags
       - A URL can be represented by a q-dimensional vector, $d_i = (t_1, t_2, \ldots, t_q)$
       - The whole data set can be represented by an n-by-q matrix
     • Pairwise dissimilarity matrix
       - n-by-n symmetric matrix
       - Distances (Euclidean, Manhattan, …)
       - Angles (cosine, sine, …)
       - $O(n^2)$ complexity
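A minimal sketch of the vector-space model: it builds the n-by-q URL-tag count matrix and then the n-by-n pairwise dissimilarity matrix, using 1 minus cosine similarity as one of the angle-based choices the slide lists. The toy data and all function names are hypothetical.

```python
import math

# Hypothetical toy data (n = 3 URLs, q = 4 tags); not from the CCT data set.
TAGS = ["python", "numpy", "web", "music"]
URL_TAGS = {
    "url1": ["python", "numpy", "numpy"],  # a tag may be applied by several users
    "url2": ["python", "web"],
    "url3": ["music"],
}


def url_tag_matrix(url_tags, tags):
    """Build the n-by-q count matrix; row i is the q-dimensional vector d_i."""
    index = {t: j for j, t in enumerate(tags)}
    rows = []
    for url in sorted(url_tags):
        vec = [0] * len(tags)
        for t in url_tags[url]:
            vec[index[t]] += 1
        rows.append(vec)
    return rows


def cosine_dissimilarity(u, v):
    """1 - cos(angle between u and v); 0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def pairwise_dissimilarity(rows):
    """n-by-n symmetric matrix; every pair of rows is compared, so O(n^2) pairs."""
    n = len(rows)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = cosine_dissimilarity(rows[i], rows[j])
    return D


X = url_tag_matrix(URL_TAGS, TAGS)
D = pairwise_dissimilarity(X)
```

Only the upper triangle is computed and mirrored, which halves the work but keeps the quadratic growth in n.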

  10. Data Models (II)
      • Graph model
        - Build a graph of nodes and edges
        - Edges indicate relationships
        - The result is a complex network (tag graph)
      • Dissimilarity
        - Related to path distance
        - Finding paths is important (shortest path problem)
        - Naive approach: $O(n^3)$ complexity
      (Source: MSI-CIEC)
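One standard way to obtain every pairwise path distance in O(n^3) time is the Floyd-Warshall algorithm; whether this is the "naive approach" the slide refers to is an assumption. The sketch below treats each edge of a hypothetical tag graph as one hop.

```python
import math

INF = math.inf


def floyd_warshall(n, edges):
    """All-pairs shortest path lengths via Floyd-Warshall in O(n^3) time.

    n: number of nodes, labeled 0..n-1.
    edges: (u, v, weight) triples of an undirected graph.
    """
    dist = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    for k in range(n):  # allow paths that pass through intermediate node k
        for i in range(n):
            dik = dist[i][k]
            if dik == INF:
                continue
            for j in range(n):
                if dik + dist[k][j] < dist[i][j]:
                    dist[i][j] = dik + dist[k][j]
    return dist


# Hypothetical tag graph: an edge of weight 1 links tags that co-occur.
TAGS = ["python", "numpy", "science", "web"]
EDGES = [(0, 1, 1), (1, 2, 1), (0, 3, 1)]
D = floyd_warshall(len(TAGS), EDGES)
```

The resulting hop counts (for example, "science" is three hops from "web") can serve directly as a pairwise dissimilarity matrix.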

  11. Searching
      • Latent Semantic Indexing (LSI)
        - Using the vector-space model, find the URLs most similar to the user’s query tags
        - Dimension reduction from a high q to a low d ($q \gg d$)
        - Removes noisy terms and extracts latent concepts
      [Figure: precision vs. recall; curves: ideal line, 2 terms, 4 terms, 8 terms, 20% dim. reduction, none]
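A small LSI sketch with NumPy, assuming an n-by-q URL-tag matrix as in the vector-space model: a truncated SVD keeps d latent dimensions, and the query tag vector is folded into the same space before URLs are ranked by cosine similarity. The data, `lsi_search`, and this particular folding convention are illustrative assumptions, not the CCT implementation.

```python
import numpy as np

# Hypothetical n-by-q URL-tag count matrix (n = 4 URLs, q = 5 tags).
TAGS = ["python", "numpy", "science", "web", "html"]
A = np.array([
    [2, 1, 1, 0, 0],  # url0
    [1, 2, 0, 0, 0],  # url1: no "science" tag
    [0, 0, 0, 2, 1],  # url2
    [0, 0, 0, 1, 2],  # url3
], dtype=float)


def lsi_search(A, query, d):
    """Rank URLs by cosine similarity to a query in a d-dimensional latent space.

    A rank-d truncated SVD gives A ~ U_d S_d V_d^T. Row i of U_d equals
    A[i] V_d S_d^{-1}, so the query is folded in the same way: q V_d S_d^{-1}.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_d, s_d, V_d = U[:, :d], s[:d], Vt[:d, :].T
    q_hat = query @ V_d / s_d  # folded-in query, length d
    norms = np.linalg.norm(U_d, axis=1) * np.linalg.norm(q_hat) + 1e-12
    sims = U_d @ q_hat / norms
    return np.argsort(-sims), sims


query = np.array([0, 0, 1, 0, 0], dtype=float)  # user's query tag: "science"
order, sims = lsi_search(A, query, d=2)
print(order, np.round(sims, 3))
```

With d = 2, url1 has no "science" tag yet scores about as high as url0, because both load on the same python/numpy/science concept; in the raw tag space its cosine similarity to the query would be 0.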

  12. Clustering
      • Discover the group structure of URLs
      • Non-parametric learning algorithm
      • Non-trivial optimization problem
      • Should avoid local minimum/maximum solutions
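The local-minimum problem shows up even in plain K-Means (Lloyd's algorithm), which only moves to nearby improvements from its starting centers. In this hypothetical 1-D example, two initializations on the same six points converge to different solutions; `kmeans_1d` and the data are illustrative only.

```python
def kmeans_1d(points, centers, max_iter=100):
    """Lloyd's K-Means on 1-D data, starting from the given initial centers.

    Returns (final centers, sum of squared errors).
    """
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for x in points:
            # Assign each point to its nearest center.
            best = min(range(len(centers)), key=lambda j: (x - centers[j]) ** 2)
            clusters[best].append(x)
        # Move each center to its cluster mean; an empty cluster keeps its center.
        new_centers = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    sse = sum(min((x - c) ** 2 for c in centers) for x in points)
    return centers, sse


POINTS = [0, 1, 10, 11, 20, 21]
good = kmeans_1d(POINTS, [0, 10, 20])
stuck = kmeans_1d(POINTS, [10, 20, 21])
```

Starting from [0, 10, 20] gives centers [0.5, 10.5, 20.5] with SSE 1.5; starting from [10, 20, 21] stops at [5.5, 20.0, 21.0] with SSE 101.0, a local minimum that no single reassignment step can escape.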

  13. Deterministic Annealing Clustering
      • Deterministically avoids local minima
      • Traces the global solution by changing the energy level
      • Analogous to the physical annealing process (high → low)
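A minimal 1-D sketch of deterministic annealing clustering: each point is softly assigned to every center with Gibbs probabilities p(j|x) proportional to exp(-(x - y_j)^2 / T), centers move to the weighted means, and the temperature T is lowered geometrically. The starting temperature, cooling factor, perturbation size, and stopping temperature are assumed values, not the authors' settings.

```python
import math


def deterministic_annealing_1d(points, k, t_start=500.0, t_min=0.01,
                               cooling=0.9, eps=1e-3, inner_iters=100):
    """Cluster 1-D points into k centers by deterministic annealing.

    At high T all centers sit at the data mean; as T drops below critical
    values the centers split, so no random initialization is needed.
    """
    mean = sum(points) / len(points)
    centers = [mean] * k
    t = t_start
    while t > t_min:
        # Small deterministic perturbation lets coincident centers separate
        # once T falls below their critical temperature.
        centers = [c + eps * (j - (k - 1) / 2) for j, c in enumerate(centers)]
        for _ in range(inner_iters):
            num = [0.0] * k
            den = [0.0] * k
            for x in points:
                d = [(x - c) ** 2 for c in centers]
                dmin = min(d)
                w = [math.exp(-(dj - dmin) / t) for dj in d]  # shifted for stability
                z = sum(w)
                for j in range(k):
                    p = w[j] / z  # soft assignment p(j|x)
                    num[j] += p * x
                    den[j] += p
            new = [num[j] / den[j] if den[j] > 0 else centers[j] for j in range(k)]
            shift = max(abs(a - b) for a, b in zip(new, centers))
            centers = new
            if shift < 1e-10:
                break  # converged at this temperature
        t *= cooling
    return sorted(centers)


centers = deterministic_annealing_1d([0, 1, 10, 11, 20, 21], k=3)
```

On the points [0, 1, 10, 11, 20, 21] with k = 3, the centers end up near [0.5, 10.5, 20.5], one per natural group, without choosing any initial centers.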

  14. More Machine Learning Algorithms
      • Classification
        - To respond more quickly to users’ requests
        - Train on users’ input and answer queries based on the training results
        - Artificial Neural Network, Support Vector Machine, …
      • Trend detection
        - Can be used for prediction/forecasting
        - Time-series analysis of tagging activities
        - Markov chain model, Fourier transform, …
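The slide names Markov chain models and Fourier transforms for trend detection; the sketch below swaps in a simpler technique, a trailing-window z-score, to flag abnormal bursts in one tag's daily usage counts. The counts, window size, threshold, and `detect_bursts` are hypothetical.

```python
import statistics


def detect_bursts(counts, window=7, threshold=3.0):
    """Flag days whose count is far above the trailing window.

    Returns the indices i where counts[i] exceeds the mean of the previous
    `window` values by more than `threshold` population standard deviations.
    """
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = statistics.mean(history)
        sigma = statistics.pstdev(history)
        if sigma > 0 and (counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical daily usage counts for one tag; day 10 has a burst.
DAILY = [5, 6, 5, 7, 6, 5, 6, 5, 6, 7, 30, 8, 6]
print(detect_bursts(DAILY))  # -> [10]
```

The burst itself inflates the trailing statistics for the next few days, which is one reason heavier models such as those on the slide may be preferred for real tagging streams.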

  15. Conclusion
      • The goal of our Collective Collaborative Tagging (CCT) system:
        - Utilize various data sets
        - Provide various information retrieval (IR) algorithms
        - Help utilize people-powered knowledge
      • Various models and algorithms are currently being investigated
      • Service interfaces and import/export functions will be added soon

  16. Thank you!! Questions? jychoi@cs.indiana.edu

  17. Vector-space vs. Graph
                       | Vector-space model                                           | Graph model
      Representation   | q-dimensional vector; n-by-q matrix                          | G(V, E), V = {URLs, tags, users}
      Dissimilarity    | Distances, cosine, …; $O(n^2)$ complexity                    | Paths, hops, connectivity, …; $O(n^3)$ complexity
      Algorithms       | Latent Semantic Indexing; dimension reduction schemes (PCA)  | PageRank, FolkRank, …; pairwise clustering; MDS
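A sketch in the spirit of FolkRank on the graph model: PageRank-style weight spreading, w ← d·A·w + (1 − d)·p, over an undirected user/tag/URL graph, run once with a uniform preference vector p and once with extra preference on a query tag; the difference of the two weight vectors ranks nodes related to the query. The triples, the damping d = 0.7, the boost value, and all function names are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical folksonomy triples (user, tag, url); not from the CCT data set.
TRIPLES = [
    ("alice", "python", "url1"), ("alice", "numpy", "url1"),
    ("bob", "python", "url2"), ("bob", "web", "url2"),
    ("carol", "web", "url3"), ("carol", "html", "url3"),
]


def build_graph(triples):
    """Undirected weighted tripartite graph of user/tag/url co-occurrence counts."""
    g = defaultdict(lambda: defaultdict(float))
    for user, tag, url in triples:
        u, t, r = ("user", user), ("tag", tag), ("url", url)
        for a, b in ((u, t), (u, r), (t, r)):
            g[a][b] += 1.0
            g[b][a] += 1.0
    return g


def spread(g, pref, d=0.7, iters=100):
    """Iterate w <- d * A w + (1 - d) * p, with A column-normalized by edge weight."""
    nodes = list(g)
    total = sum(pref[v] for v in nodes)
    p = {v: pref[v] / total for v in nodes}
    w = {v: 1.0 / len(nodes) for v in nodes}
    out_weight = {v: sum(g[v].values()) for v in nodes}
    for _ in range(iters):
        new = {v: (1 - d) * p[v] for v in nodes}
        for u in nodes:
            share = d * w[u] / out_weight[u]  # u passes weight along its edges
            for v, wt in g[u].items():
                new[v] += share * wt
        w = new
    return w


def folkrank(g, query_tag, boost=10.0):
    """Preference-biased weights minus unbiased weights."""
    base = spread(g, {v: 1.0 for v in g})
    pref = {v: 1.0 for v in g}
    pref[("tag", query_tag)] += boost
    biased = spread(g, pref)
    return {v: biased[v] - base[v] for v in g}


G = build_graph(TRIPLES)
scores = folkrank(G, "python")
tags = sorted((v[1] for v in scores if v[0] == "tag"), key=lambda t: -scores[("tag", t)])
print(tags)
```

Subtracting the unbiased run removes the advantage that well-connected nodes have regardless of the query, so "numpy" (two hops from "python") scores above "html" (four hops away).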

  18. Pairwise Dissimilarity
      • Pairwise clustering
        - Input from the vector-space model vs. the graph model
        - How to avoid local minima/maxima? (e.g., K-Means)
      [Figure panels: vector-space model; graph model]
