
1 st DATA SCIENCE MEETUP in SEOUL



Presentation Transcript


  1. 1st DATA SCIENCE MEETUP in SEOUL JINYOUNG KIM & HEEWON JEON

  2. Data Science? • Organizations use their data for decision support and to build data-intensive products and services. • The collection of skills required by organizations to support these functions has been grouped under the term "Data Science". - J. Hammerbacher

  3. Taxonomy of Data Science • What

  4. Taxonomy of Data Science • How

  5. Data Scientist? • A data scientist is someone who can obtain, scrub, explore, model and interpret data, blending hacking, statistics and machine learning – H. Mason

  6. Data Science Meet-up • Let’s learn from each other! • Foster collaboration among participants • Beginning of a long and fruitful journey!

  7. In What Follows… • Presentation • Discussion • Who should care about Big Data? Everyone? • Developing a career path as a Data Scientist

  8. FROM DATA SCIENCE TO INFORMATION RETRIEVAL

  9. Information Retrieval? • Definition • The study and practice of how an automated system can enable its users to access, interact with, and make sense of information. • Characteristics • More than the ten blue links of search results • Algorithmic solutions for information problems • Related areas: UX / HCI / Info. Vis.; Information Retrieval / RecSys; Large-scale System Infra.; Large-scale (Text) Analytics

  10. IR in the Taxonomy of Data Science • What • How

  11. Major Problems in IR & RecSys • Matching • (Keyword) Search: query – document • Personalized Search: (user + query) – document • Item Recommendation: user – item • Contextual Advertising: (user + context) – advertisement • Quality • PageRank / spam filtering / freshness • Relevance • Combination of matching and quality features • Evaluation is critical for optimal performance
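The relevance formulation above — a combination of query-dependent matching features and query-independent quality features — can be sketched as a simple linear scorer. The feature names and weights below are illustrative stand-ins, not values from the talk.

```python
# Toy linear relevance score: matching features (query vs. document)
# combined with quality features (PageRank, freshness) via learned weights.
# All names and numbers here are hypothetical.
def relevance(matching_features, quality_features, weights):
    features = {**matching_features, **quality_features}
    return sum(weights.get(name, 0.0) * value
               for name, value in features.items())

score = relevance(
    {"bm25": 12.3, "title_match": 1.0},    # matching: query-document
    {"pagerank": 0.4, "freshness": 0.8},   # quality: query-independent
    {"bm25": 0.5, "title_match": 2.0, "pagerank": 3.0, "freshness": 1.0},
)
print(score)  # 0.5*12.3 + 2.0*1.0 + 3.0*0.4 + 1.0*0.8 ≈ 10.15
```

In practice such weights are learned (learning to rank), which is why the slide stresses that evaluation is critical.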

  12. The Great Divide: IR vs. RecSys • IR: Query / Document • Provides relevant info. • Reactive (given a query) • Venues: SIGIR / CIKM / WSDM • RecSys: User / Item • Supports decision making • Proactive (pushes items) • Venues: RecSys / KDD / UMAP • Both require a similarity / matching score • Personalized search involves user modeling • Most RecSys also involve keyword search • Both are parts of the user's info-seeking process

  13. Improved Query Modeling for Structured Documents: A Sneak Peek of Information Retrieval Research

  14. Matching for Structured Document Retrieval [ECIR09, 12, CIKM09] • Field Relevance • A different field is important for a different query term • e.g., 'registration' is relevant when it occurs in <subject>; 'james' is relevant when it occurs in <to> • Why don't we provide a field operator or an advanced UI?

  15. Estimating the Field Relevance • If the user provides feedback • Relevant documents provide sufficient information • If no feedback is available • Combine field-level term statistics from multiple sources [figure: per-field (from/to, title, content) statistics from the collection and the top-k docs combine to approximate those of the relevant docs]

  16. Retrieval Using the Field Relevance • Comparison with previous work • Ranking in the Field Relevance Model [figure: for each query term q1…qm, per-term field scores f1…fn are multiplied by per-term field weights P(F1|q)…P(Fn|q) and summed, replacing the fixed per-field weights w1…wn of previous work]
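A minimal sketch of this ranking idea — my reading of the slide, not the authors' code: each query term carries its own distribution of field weights P(F|q), which scale a per-field term score before summation. The smoothing and the toy email document below are assumptions for illustration.

```python
import math

def field_relevance_score(doc_fields, query, field_weights):
    """doc_fields: {field: token list}; field_weights: {term: {field: P(F|q)}}."""
    score = 0.0
    for q in query:
        for field, tokens in doc_fields.items():
            p_f_given_q = field_weights.get(q, {}).get(field, 0.0)
            # Laplace-smoothed in-field term probability (toy language model)
            p_q_in_field = (tokens.count(q) + 1) / (len(tokens) + 1000)
            score += p_f_given_q * math.log(p_q_in_field)
    return score

email = {"to": ["james"],
         "subject": ["registration", "deadline"],
         "content": ["please", "register", "by", "friday"]}
# Per the slide's example: 'james' matters in <to>, 'registration' in <subject>.
weights = {"james": {"to": 0.8, "subject": 0.1, "content": 0.1},
           "registration": {"to": 0.1, "subject": 0.7, "content": 0.2}}
print(field_relevance_score(email, ["james", "registration"], weights))
```

With fixed field weights, both terms would be forced to prefer the same fields; the per-term weights let each term be scored where it is most likely to signal relevance.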

  17. Evaluating the Field Relevance Model • Retrieval Effectiveness (metric: Mean Reciprocal Rank) [chart: per-term field weights vs. fixed field weights]
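The evaluation metric named above, Mean Reciprocal Rank, averages the reciprocal rank of the first relevant result over a set of queries; the relevance labels below are invented for illustration.

```python
# Mean Reciprocal Rank: mean of 1/rank of the first relevant hit per query.
def mean_reciprocal_rank(results):
    """results: one ranked list of 0/1 relevance labels per query."""
    total = 0.0
    for ranking in results:
        for rank, relevant in enumerate(ranking, start=1):
            if relevant:
                total += 1.0 / rank
                break  # only the first relevant result counts
    return total / len(results)

# Three queries with the first relevant hit at ranks 1, 2, and 4:
print(mean_reciprocal_rank([[1, 0, 0], [0, 1, 0], [0, 0, 0, 1]]))
# (1/1 + 1/2 + 1/4) / 3 ≈ 0.583
```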

  18. Lessons from Data Science Perspective • Understanding user behavior provides key insights • The notion of field relevance • Choice of estimation technique relies on many things • Availability of data and labels (e.g., can we use CRF?) • Efficiency concerns (possibility of pre-computation) • Evaluation is critical for continuous improvement • IR people are very serious about dataset and metrics

  19. Data-driven Pursuit of Happiness

  20. LiFiDeA (= Life + Idea) Project • Goal • Improved personal info mgmt. => self-improvement • Collect behavioral data (schedules and tasks) • Correlate them with subjective judgments of happiness • Workflow • Write task-centric journals on Evernote • Migrate data weekly into a spreadsheet • Statistical analysis using Excel charts and R • Findings • Tracking itself helps, but not for long • Keeping the right amount of tension is critical [image caption: My Source of Inspiration]

  21. My Self-tracking Efforts • Life-Optimization Project (2002–2006) • A software dev. project for myself, over 4 years • Covers all aspects of personal info mgmt. • A core component of my Ph.D. application

  22. My Self-tracking Efforts • LiFiDeA Project (2011–2012) [figures: raw data on Evernote; data moved onto an Excel sheet; happiness by place; happiness by wake-up time]

  23. Lessons Learned • Combine existing solutions whenever possible • “Done is better than perfect” applies here • *You* should own your data, not the app you use • Apps can come and go, but the data should stay • Minimize data collection efforts for sustainability • Integrate self-tracking into your daily routine • “Effort << Benefit” should be kept all the time • Communicating regularly helps you make progress • Writing has been the best way to learn about the subject

  24. OPTIONAL SLIDES

  25. Criteria for Choosing IR vs. RecSys • Favoring RecSys: • Confidence in predicting the user's preference • Availability of matching items to recommend • Favoring IR: • The user's willingness to express information needs • Lack of evidence about the user himself

  26. The IR Way vs. the HCIR Way: from Query to Session • The IR way: rich user modeling • The system maintains a user model (profile, context, behavior) and an interaction history, and responds to each user action • The HCIR way: rich user interaction • The user refines results via filtering conditions, related items, filtering / browsing, relevance feedback, … • Providing personalized results vs. rich interactions are complementary, yet both are needed in most scenarios • No real distinction between IR vs. HCI, and IR vs. RecSys

  27. The Great Divide: IR vs. CHI • IR: Query / Document • Relevant results • Ranking / suggestions • Feature engineering • Batch evaluation (TREC) • Venues: SIGIR / CIKM / WSDM • CHI: User / System • User value / satisfaction • Interface / visualization • Human-centered design • User studies • Venues: CHI / UIST / CSCW • Can we learn from each other?
