
COMP 4332 Tutorial 9 April 8 CHEN Zhao

Introducing Project 2/Assignment 3: KDDCUP 2012 (Track 1). COMP 4332 Tutorial 9, April 8, CHEN Zhao. KDDCup 2012 was hosted by Tencent, with the competition website run by Kaggle: https://www.kddcup2012.org/. Tencent is the 5th largest Internet company in the world by market cap (after Google, Amazon, eBay, and Facebook).


Presentation Transcript


  1. Introducing Project 2/Assignment 3: KDDCUP 2012 (Track 1) • COMP 4332 Tutorial 9 • April 8 • CHEN Zhao

  2. KDDCup 2012 hosted by Tencent
     • Website by Kaggle: https://www.kddcup2012.org/
     • Tencent: 5th largest Internet company in the world by market cap (after Google, Amazon, eBay, Facebook)
       • QQ (online messenger)
       • Weibo (micro-blog system): the KDDCup 2012 data source
       • QZone (blog system)
       • Games, Mail, WeChat, etc.
     • Two tasks
       • User recommendation
       • Ad click-through rate prediction
     • US$16,000 prize

  3. Timelines

  4. KDDCUP 2012 tasks
     • Task 1: Micro-blog user recommendation
       • Recommend a popular person, organization, or group to a user
     • Task 2: Ad click-through rate prediction
       • How often will an ad be clicked by a user?
     • Project 2 only includes Task 1.

  5. User recommendation UI
     • [Screenshot: "Popular user recommendation" panel]
     • You don't need this website to do the task.

  6. Data
     • Download: http://www.kddcup2012.org/c/kddcup2012-track1/data
     • Description: http://www.kddcup2012.org/c/kddcup2012-track1
     • File list:
       • rec_log_train.txt (1.99 GB)
       • rec_log_test.txt (1 GB)
       • user_profile.txt (55.8 MB)
       • item.txt (1.18 MB)
       • user_action.txt (217 MB)
       • user_sns.txt (740 MB)
       • user_key_word.txt (182 MB)

  7. Data – item
     • item.txt
     • An item is a specific user in Tencent Weibo (a person, an organization, or a group) that was selected and recommended to other users.
     • Total: 6096
     • Columns: ID  category  keywords
     • Category is a four-field string representing a hierarchical category.
     • One example:
       2335869  8.1.4.2  412042;974;85658;...;6183;974
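The item format above can be read with a few lines of Python. This is a minimal sketch, not the official loader: the names `Item` and `parse_item_line` are hypothetical, the fields are assumed to be whitespace-separated and the keywords semicolon-separated (as in the example row), and the sample line in the usage note is shortened for illustration.

```python
from typing import List, NamedTuple, Tuple


class Item(NamedTuple):
    """One row of item.txt (hypothetical container, not from the slides)."""
    item_id: int
    category: Tuple[int, ...]  # hierarchical path, e.g. (8, 1, 4, 2)
    keywords: List[int]


def parse_item_line(line: str) -> Item:
    # Assumption: three whitespace-separated fields per line.
    item_id, category, keywords = line.split()
    return Item(
        item_id=int(item_id),
        # "8.1.4.2" -> (8, 1, 4, 2); a prefix of the tuple is a coarser category.
        category=tuple(int(c) for c in category.split(".")),
        keywords=[int(k) for k in keywords.split(";")],
    )
```

For instance, `parse_item_line("2335869 8.1.4.2 412042;974;85658").category[:2]` gives the two top category levels, which may be useful as a coarse item feature.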

  8. Data – user profile
     • user_profile.txt
     • Columns: ID  birth-year  gender  #tweet  tags
     • Tag 0: this user has not marked any tag.
     • Example rows:
       100044  1899  1  5   831;55;198;8;450;7;39;5;111
       100054  1987  2  6   0
       100065  1989  1  57  0
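A minimal sketch of reading one user_profile.txt row, including the "tag 0 means no tags" rule from the slide. The function name `parse_user_profile_line` and the dictionary keys are hypothetical, fields are assumed to be whitespace-separated, and falling back to `None` for a non-numeric birth year is a defensive assumption rather than something the slides state.

```python
from typing import Any, Dict


def parse_user_profile_line(line: str) -> Dict[str, Any]:
    # Assumption: five whitespace-separated fields per line.
    user_id, birth_year, gender, n_tweets, tags = line.split()
    try:
        year = int(birth_year)
    except ValueError:
        year = None  # defensive: tolerate a malformed year field
    return {
        "user_id": int(user_id),
        "birth_year": year,
        "gender": int(gender),
        "n_tweets": int(n_tweets),
        # A single "0" means the user has not marked any tag.
        "tags": [] if tags == "0" else [int(t) for t in tags.split(";")],
    }
```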

  9. Data – user-item matrix
     • rec_log_train.txt / rec_log_test.txt
     • Columns: UserID  ItemID  ?followed  TimeStamp
     • ?followed: 1 if the user accepted the recommendation, -1 if not
     • In the test data it is filled with 0, to be predicted as -1/1.
     • TimeStamp: Unix timestamp, i.e. seconds since 1970-01-01 00:00:00 UTC
     • Example rows:
       2088948  1760350  -1  1318348785
       2088948  1774722  -1  1318348785
       2088948  786313   -1  1318348785
       601635   1775029  -1  1318348785
       601635   1902321  -1  1318348785
       601635   462104   -1  1318348785
       1529353  1774509  -1  1318348786
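The recommendation log rows can be parsed, and their Unix timestamps converted to UTC datetimes, roughly as follows. This is a sketch under assumptions: `RecLogEntry` and `parse_rec_log_line` are hypothetical names, and fields are assumed to be whitespace-separated.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class RecLogEntry(NamedTuple):
    """One row of rec_log_train.txt / rec_log_test.txt (hypothetical container)."""
    user_id: int
    item_id: int
    followed: int       # 1 = accepted, -1 = rejected, 0 = unknown (test data)
    timestamp: datetime  # timezone-aware, UTC


def parse_rec_log_line(line: str) -> RecLogEntry:
    # Assumption: four whitespace-separated fields per line.
    user_id, item_id, followed, ts = line.split()
    return RecLogEntry(
        user_id=int(user_id),
        item_id=int(item_id),
        followed=int(followed),
        # Seconds since 1970-01-01 00:00:00 UTC -> aware datetime in UTC.
        timestamp=datetime.fromtimestamp(int(ts), tz=timezone.utc),
    )
```

Passing `tz=timezone.utc` keeps the result independent of the machine's local time zone, which matters when splitting the log by date.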

  10. Data – other information on users
     • user_action.txt
       • If user A has retweeted user B 5 times, has "at"-mentioned B 3 times, and has commented on user B 6 times, then there is one line "A   B     3     5     6" in user_action.txt (counts in the order: at, retweet, comment).
     • user_sns.txt
       • (Follower-userid)\t(Followee-userid)
     • user_key_word.txt
       • Contains the keywords extracted from each user's tweets, retweets, and comments.
     • For details: http://www.kddcup2012.org/c/kddcup2012-track1
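A minimal sketch of reading the two social files described above. The column order for user_action.txt (at, retweet, comment) follows the "A B 3 5 6" example on the slide; the names `UserAction`, `parse_user_action_line`, and `build_followees` are hypothetical, and whitespace splitting is assumed to cover the tab-separated layout.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set


class UserAction(NamedTuple):
    """One row of user_action.txt (hypothetical container)."""
    src: int        # user A, who performs the actions
    dst: int        # user B, the target of the actions
    n_at: int       # times A has "at"-mentioned B
    n_retweet: int  # times A has retweeted B
    n_comment: int  # times A has commented on B


def parse_user_action_line(line: str) -> UserAction:
    src, dst, n_at, n_retweet, n_comment = (int(x) for x in line.split())
    return UserAction(src, dst, n_at, n_retweet, n_comment)


def build_followees(lines: Iterable[str]) -> Dict[int, Set[int]]:
    """Map each follower id to the set of user ids it follows (user_sns.txt)."""
    followees: Dict[int, Set[int]] = defaultdict(set)
    for line in lines:
        follower, followee = (int(x) for x in line.split())
        followees[follower].add(followee)
    return followees
```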

  11. Collaborative Filtering (CF) in one slide
     • Will a user follow an item (popular person/organization/group)?
     • [Figure: users vs. items, with unknown entries marked "?"]
     • More CF in the next tutorial.

  12. Evaluation metric, AP@n
     • http://www.kddcup2012.org/c/kddcup2012-track1/details/Evaluation
     • Page 7 of http://sas.uwaterloo.ca/stats_navigation/techreports/04WorkingPapers/2004-09.pdf
     • ap@n = (Σ_{k=1..n} P(k)) / (number of items clicked in m items), where P(k) is the precision at cut-off k and counts as 0 when the k-th item was not clicked
     • (1) If among the 5 items recommended to the user, the user clicked #1, #3, #4, then ap@3 = (1/1 + 2/3)/3 ≈ 0.56
     • (2) If among the 4 items recommended to the user, the user clicked #1, #2, #4, then ap@3 = (1/1 + 2/2)/3 ≈ 0.67
     • (3) If among the 3 items recommended to the user, the user clicked #1, #3, then ap@3 = (1/1 + 2/3)/2 ≈ 0.83
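The AP@n definition and the three worked examples on this slide can be checked with a short function. This is a sketch, not the official scorer: the name `average_precision_at_n` is hypothetical, the input is assumed to be a list of booleans in recommended order (True = clicked), and returning 0.0 when nothing was clicked is an assumption the slide does not cover.

```python
from typing import Sequence


def average_precision_at_n(clicked: Sequence[bool], n: int = 3) -> float:
    """AP@n for one user; clicked[i] says whether recommended item i+1 was clicked."""
    hits = 0
    precision_sum = 0.0
    for k, was_clicked in enumerate(clicked[:n], start=1):
        if was_clicked:
            hits += 1
            precision_sum += hits / k  # P(k); unclicked positions contribute 0
    # Denominator: number of items clicked among all m recommended items.
    total_clicked = sum(1 for c in clicked if c)
    if total_clicked == 0:
        return 0.0  # assumption: no clicks means a score of 0
    return precision_sum / total_clicked


# The three examples from the slide:
example_1 = average_precision_at_n([True, False, True, True, False])  # (1/1 + 2/3)/3
example_2 = average_precision_at_n([True, True, False, True])         # (1/1 + 2/2)/3
example_3 = average_precision_at_n([True, False, True])               # (1/1 + 2/3)/2
```

Note that the denominator counts clicks over all m items, not just the top n, which is why example (1) divides by 3 even though only two clicks fall within the first three positions.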

  13. Leaderboard • http://www.kddcup2012.org/c/kddcup2012-track1/leaderboard

  14. An open challenge • Little literature exists on this application. • Generally, methods from CF/RecSys can be used. • Try simple heuristics first. • Basic target: beat the "Example Submission".

  15. Simple heuristics • Recommend the most popular items to every user. • Recommend the items with a high acceptance ratio.
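Both heuristics can be sketched from the training log alone. The code below is one possible reading, not a reference baseline: "popularity" is taken to mean the number of times an item was accepted, "acceptance ratio" is accepts divided by times recommended, and the names `item_stats`, `rank_by_popularity`, and `rank_by_acceptance_ratio` are hypothetical. Records are assumed to be `(user_id, item_id, followed)` tuples with `followed` equal to 1 or -1.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def item_stats(records: Iterable[Tuple[int, int, int]]) -> Tuple[Counter, Counter]:
    """Count how often each item was recommended and how often it was accepted."""
    shown: Counter = Counter()
    accepted: Counter = Counter()
    for _user_id, item_id, followed in records:
        shown[item_id] += 1
        if followed == 1:
            accepted[item_id] += 1
    return shown, accepted


def rank_by_popularity(candidates: Sequence[int], accepted: Counter,
                       top_n: int = 3) -> List[int]:
    # Most-accepted items first; ties keep the candidates' original order.
    return sorted(candidates, key=lambda i: accepted[i], reverse=True)[:top_n]


def rank_by_acceptance_ratio(candidates: Sequence[int], shown: Counter,
                             accepted: Counter, top_n: int = 3) -> List[int]:
    def ratio(item_id: int) -> float:
        # Items never seen in training get ratio 0 (an assumption).
        return accepted[item_id] / shown[item_id] if shown[item_id] else 0.0
    return sorted(candidates, key=ratio, reverse=True)[:top_n]
```

The raw ratio favors items that were shown only a few times; smoothing the ratio with a prior count is a common refinement, but it is outside what the slide describes.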

  16. Assignment 3 • A survey of solutions to KDDCUP 2012 Track 1 • Cover at least 2 of the top 3 teams (http://www.kddcup2012.org/workshop) • Compare their pros and cons • Try to propose some improvements • Deadline: 17th April 2014 • Get prepared: the requirements of Project 2 will be released next week.
