1st DATA SCIENCE MEETUP in SEOUL JINYOUNG KIM & HEEWON JEON
Data Science? • Organizations use their data for decision support and to build data-intensive products and services. • The collection of skills required by organizations to support these functions has been grouped under the term "Data Science". - J. Hammerbacher
Taxonomy of Data Science • What
Taxonomy of Data Science • How
Data Scientist? • A data scientist is someone who can obtain, scrub, explore, model and interpret data, blending hacking, statistics and machine learning – H. Mason
Data Science Meet-up • Let’s learn from each other! • Foster collaboration among participants • Beginning of a long and fruitful journey!
In What Follows… • Presentation • Discussion • Who should care about Big Data? Everyone? • Developing a Career Path as a Data Scientist
Information Retrieval? • Definition • The study and the practice of how an automated system can enable its users to access, interact with, and make sense of information. • Characteristics • More than the ten blue links of search results • Algorithmic solutions for information problems • [Diagram: Information Retrieval / RecSys at the intersection of UX / HCI / Info. Vis., Large-scale System Infra., and Large-scale (Text) Analytics]
IR in the Taxonomy of Data Science • What • How
Major Problems in IR & RecSys • Matching • (Keyword) Search : query – document • Personalized Search : (user+query) – document • Item Recommendation : user – item • Contextual Advertising : (user+context) – advertisement • Quality • PageRank / Spam filtering / Freshness • Relevance • Combination of matching and quality features • Evaluation is critical for optimal performance
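To make the "relevance = combination of matching and quality features" framing above concrete, here is a minimal sketch; the feature names, weights, and toy matching function are all invented for illustration, not taken from the talk:

```python
# Minimal sketch: relevance as a weighted combination of matching and
# quality features. All feature names and weights are illustrative.

def matching_score(query_terms, doc_terms):
    """Toy keyword-overlap match between a query and a document."""
    overlap = len(set(query_terms) & set(doc_terms))
    return overlap / max(len(query_terms), 1)

def relevance(query_terms, doc):
    """Combine the matching score with quality features
    (PageRank-style authority, freshness, spam score)."""
    match = matching_score(query_terms, doc["terms"])
    quality = (0.5 * doc["pagerank"]       # link-based authority
               + 0.3 * doc["freshness"]    # recency signal
               - 0.8 * doc["spam_score"])  # penalize likely spam
    return 0.7 * match + 0.3 * quality

doc = {"terms": ["data", "science", "meetup"],
       "pagerank": 0.6, "freshness": 0.9, "spam_score": 0.05}
print(relevance(["data", "science"], doc))  # higher is more relevant
```

In a production system the weights would be learned rather than hand-set, which is why the slide stresses that evaluation is critical.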
The Great Divide: IR vs. RecSys
• IR: Query / Document; Provide relevant info.; Reactive (given query); SIGIR / CIKM / WSDM
• RecSys: User / Item; Support decision making; Proactive (push item); RecSys / KDD / UMAP
• Both require a similarity / matching score
• Personalized search involves user modeling
• Most RecSys also involve keyword search
• Both are parts of the user's info-seeking process
Improved Query Modeling for Structured Documents: A Sneak Peek at Information Retrieval Research
Matching for Structured Document Retrieval [ECIR09, 12; CIKM09] • Field Relevance • Different fields are important for different query terms • e.g., ‘registration’ is relevant when it occurs in <subject>, while ‘james’ is relevant when it occurs in <to> • Why not simply provide a field operator or an advanced UI?
Estimating the Field Relevance • If User Provides Feedback • Relevant documents provide sufficient information • If No Feedback is Available • Combine field-level term statistics from multiple sources (see the sketch below) • [Diagram: per-field term statistics (from/to, title, content) of the Collection plus the Top-k Docs ≅ those of the Relevant Docs]
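A minimal sketch of the no-feedback case, assuming simple count-based field statistics and a hand-set 50/50 interpolation; the actual estimator and smoothing in the cited papers differ:

```python
# Sketch: estimate per-term field relevance P(F|q) by mixing field-level
# term statistics from two sources, the whole collection and the top-k
# retrieved documents. The 0.5/0.5 mixture weight is illustrative.

def field_term_dist(docs, term):
    """P(F|term) from one source: the share of the term's occurrences
    that fall in each field."""
    counts = {}
    for doc in docs:
        for field, text in doc.items():
            counts[field] = counts.get(field, 0) + text.split().count(term)
    total = sum(counts.values()) or 1
    return {f: c / total for f, c in counts.items()}

def field_relevance(collection, topk_docs, term, lam=0.5):
    """Interpolate the two sources to approximate the field distribution
    that truly relevant documents would exhibit."""
    p_coll = field_term_dist(collection, term)
    p_topk = field_term_dist(topk_docs, term)
    fields = set(p_coll) | set(p_topk)
    return {f: lam * p_coll.get(f, 0) + (1 - lam) * p_topk.get(f, 0)
            for f in fields}

emails = [{"to": "james", "subject": "registration open", "content": "hi"}]
print(field_relevance(emails, emails, "registration"))  # mass on 'subject'
```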
Retrieval Using the Field Relevance • Comparison with Previous Work • Prior models mix per-field scores with fixed field weights $w_j$; the field relevance model replaces these with per-term field weights $P(F_j \mid q_i)$ • Ranking in the Field Relevance Model • $\mathrm{score}(Q,D) = \prod_{i=1}^{m} \sum_{j=1}^{n} P(F_j \mid q_i)\, P(q_i \mid D_{F_j})$ • Per-term field weights multiply per-term field scores, summed over fields and combined across query terms (see the sketch below)
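A sketch of the ranking formula above, assuming add-one smoothing as a crude stand-in for the smoothed field language models in the papers; `fr` is the per-term field weight table, e.g. as produced by the estimation step:

```python
# Sketch of ranking in the field relevance model: for each query term,
# sum the per-field term score weighted by P(F|q), then multiply the
# per-term results across the whole query.

def p_term_given_field(term, field_text):
    """Crudely smoothed P(q | D_F): term frequency within one field."""
    tokens = field_text.split()
    return (tokens.count(term) + 1) / (len(tokens) + 100)

def score(query_terms, doc, fr):
    """fr[term][field] holds the per-term field weight P(F|q)."""
    total = 1.0
    for term in query_terms:
        total *= sum(fr[term].get(field, 0.0)
                     * p_term_given_field(term, text)
                     for field, text in doc.items())
    return total

doc = {"to": "james", "subject": "registration open", "content": "see you"}
fr = {"registration": {"subject": 0.8, "content": 0.15, "to": 0.05}}
print(score(["registration"], doc, fr))
```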
Evaluating the Field Relevance Model • Retrieval Effectiveness (Metric: Mean Reciprocal Rank; computed as sketched below) • [Chart: MRR of per-term field weights vs. fixed field weights]
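Mean Reciprocal Rank itself is a standard metric; a small sketch of how it is computed (the doc ids and judgments are made up):

```python
# Mean Reciprocal Rank: average over queries of 1/rank of the first
# relevant result; a query with no relevant result contributes 0.

def mean_reciprocal_rank(rankings, relevant):
    """rankings: ranked doc ids per query; relevant: relevant set per query."""
    rr = [next((1 / (i + 1) for i, d in enumerate(ranked) if d in rel), 0.0)
          for ranked, rel in zip(rankings, relevant)]
    return sum(rr) / len(rr)

print(mean_reciprocal_rank([["d2", "d1"], ["d3", "d4"]],
                           [{"d1"}, {"d9"}]))  # (1/2 + 0) / 2 = 0.25
```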
Lessons from Data Science Perspective • Understanding user behavior provides key insights • The notion of field relevance • Choice of estimation technique depends on many factors • Availability of data and labels (e.g., can we use a CRF?) • Efficiency concerns (possibility of pre-computation) • Evaluation is critical for continuous improvement • IR people are very serious about datasets and metrics
LiFiDeA (= Life + Idea) Project • Goal • Improved personal info mgmt. => self-improvement • Collect behavioral data (schedule and tasks) • Correlate them with subjective judgments of happiness • Workflow • Write task-centric journals on Evernote • Weekly data migration into a spreadsheet • Statistical analysis using Excel charts and R (see the sketch below) • Findings • Tracking itself helps, but not for long • Keeping the right amount of tension is critical • [Image: my source of inspiration]
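The analysis itself was done with Excel charts and R; purely as an illustration (the column names and data here are invented), the same correlation step looks like this in Python/pandas:

```python
# Illustrative only: correlate tracked behavior with daily happiness
# ratings, mirroring the spreadsheet/R analysis step of the workflow.
import pandas as pd

log = pd.DataFrame({
    "wakeup_hour": [6.5, 7.0, 8.5, 6.0, 9.0],
    "place":       ["home", "office", "cafe", "home", "cafe"],
    "happiness":   [8, 7, 5, 9, 4],   # subjective daily rating, 1-10
})

print(log["wakeup_hour"].corr(log["happiness"]))  # linear association
print(log.groupby("place")["happiness"].mean())   # happiness by place
```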
My Self-tracking Efforts • Life-optimization Project (2002~2006) • A software dev. project for myself, running for 4 years • Covered all aspects of personal info mgmt. • A core component of my Ph.D. application
My Self-tracking Efforts • LiFiDeA Project (2011-2012) • [Screenshots: raw data on Evernote; data moved onto an Excel sheet; charts of happiness by place and happiness by wake-up time]
Lessons Learned • Combine existing solutions whenever possible • “Done is better than perfect” applies here • *You* should own your data, not the app you use • Apps can come and go, but the data should stay • Minimize data-collection effort for sustainability • Integrate self-tracking into your daily routine • “Effort << Benefit” should hold at all times • Communicating regularly helps you make progress • Writing has been the best way to learn about the subject
Criteria for Choosing IR vs. RecSys
• Favor RecSys when: • You are confident in predicting the user’s preference • Matching items to recommend are available
• Favor IR when: • The user is willing to express information needs • Evidence about the user himself is lacking
From Query to Session: the IR Way vs. the HCIR Way
• The IR way: rich user modeling • The SYSTEM keeps a user model (profile, context, behavior) and an interaction history across response/action cycles
• The HCIR way: rich user interaction • The USER steers the session via filtering conditions, related items, filtering / browsing, relevance feedback, …
• Personalized results and rich interactions are complementary, and both are needed in most scenarios
• No real distinction between IR vs. HCI, and IR vs. RecSys
The Great Divide: IR vs. CHI
• IR: Query / Document; Relevant Results; Ranking / Suggestions; Feature Engineering; Batch Evaluation (TREC); SIGIR / CIKM / WSDM
• CHI: User / System; User Value / Satisfaction; Interface / Visualization; Human-centered Design; User Study; CHI / UIST / CSCW
• Can we learn from each other?