1 / 27

USERS SLEEPING TIME ANALYSIS BASED ON MICRO-BLOGGING DATA

USERS SLEEPING TIME ANALYSIS BASED ON MICRO-BLOGGING DATA. Haoran Yu, Guangzhong Sun, Min Lv. Outline. Introduction Data Methods Find the longest inactive time span to discover sleeping time from dense data Statistics method in discovering sleeping time pattern from sparse data

una
Download Presentation

USERS SLEEPING TIME ANALYSIS BASED ON MICRO-BLOGGING DATA

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. USERS SLEEPING TIME ANALYSIS BASED ON MICRO-BLOGGING DATA Haoran Yu, Guangzhong Sun, Min Lv

  2. Outline • Introduction • Data • Methods • Find the longest inactive time span to discover sleeping time from dense data • Statistics method in discovering sleeping time pattern from sparse data • United time series by month • Subsequence statistics • Experimental Results • General result of sleeping time • Further analysis • Detect time zone of Sina Weibo users • Detection on movement between different time zones • Related Applications • Conclusion

  3. Introduction(1) • Goals • Inferring the sleeping time of Sina Weibo users • Enable general location(time zone) prediction

  4. Introduction(1) • Goals • Inferring the sleeping time of Sina Weibo users • Enable general location(time zone) prediction • Motivation • The increasing availability of user-generated contents. • User-generated posts • Time series of activities

  5. Introduction(1) • Goals • Inferring the sleeping time of Sina Weibo users • Enable general location(time zone) prediction • Motivation • The increasing availability of user-generated contents. • User-generated posts • Time series of activities • Users’ activities reflect their sleeping time • Posts: Goodnight,Sleep, or Guess How Much I Love You, etc • Time series: Active time & Inactive time Mid-night Sleep 13:42 Beijing Time – 00:42 Central Time Sleep 23:51 Beijing Time Sleep 20:05 Beijing Time

  6. Introduction(1) • Goals • Inferring the sleeping time of Sina Weibo users • Enable general location(time zone) prediction • Motivation • The increasing availability of user-generated contents. • User-generated posts • Time series of activities • Users’ activities reflect their sleeping time • Posts: Goodnight,Sleep, or Guess How Much I Love You • Time series: Active time & Inactive time • Sleeping time is tightly related to location (Gender, Age, job, etc may have influence as well) (Sleep at night)

  7. Counting Sheep with Path Data Science – Aug 23, 2012 By Path Data Team http://blog.path.com/post/30041197400/counting-sheep-with-path-data-science

  8. Counting Sheep with Path Data Science – Aug 23, 2012 By Path Data Team http://blog.path.com/post/30041197400/counting-sheep-with-path-data-science

  9. Art related works Economy & finance Different time to go to bed Different time to wake up Different active degree

  10. User live in western China User live in western Europe

  11. Introduction(1) • Goals • Inferring the sleeping time of Sina Weibo users • Enable general location(time zone) prediction • Motivation • The increasing availability of user-generated contents. • User-generated posts • Time series of activities • Users’ activities reflect their sleeping time • Posts: Goodnight,Sleep, or Guess How Much I Love You • Time series: Active time & Inactive time • Sleeping time is tightly related to location (Gender, Age, job, etc may have influence as well) (Sleep at night)

  12. Introduction(1) • Goals • Inferring the sleeping time of Sina Weibo users • Enable general location(time zone) prediction • Motivation • The increasing availability of user-generated contents. • User-generated posts • Time series of activities • Users’ activities reflect their sleeping time • Posts: Goodnight,Sleep, or Guess How Much I Love You • Time series: Active time & Inactive time • Sleeping time is tightly related to location (Gender, Age, job, etc may have influence as well) (Sleep at night) • Significance • Detect users with fake location (Distinguish real person and robots) • The change of location => advertisement

  13. Introduction(2) • Difficulty & Challenges • User are not always active => data are sparse • Aspects other then location => influence the sleeping time • Contribution and insights • A new open problem related to the relationship between sleeping time and micro-blogging activities is presented • Preliminary simple solutions are presented in this paper. Based on experiment results on real data set. • A data set of users’ time series of activities on Weibo is collected and open to the public.

  14. Data • Source: Sina Weibo • Sina Weibo (http://weibo.com/), as a micro-blogging platform, akin to a hybrid of Twitter and Facebook; it is used by well over 30% of 513 million users (Up to the end of year 2011) in China. It has a similar market penetration that Twitter has established out of China. • User type: • Verified users with daytime work jobs • > 500 fans and >100 posts • Total number of users: 844 • Total number of posts: 1, 579, 623 • Form of Data • => each corresponds to an action of user (temporarily only posts)

  15. t1 t2 t3 t4 tn-2 tn-1 tn Methods: Find the longest inactive time span to discover sleeping time from dense data • IDEAL CONDITIONS: people sleep for a long time once a day and keep active in the rest of time. • -> can't find out two or more long inactive span • -> D = {d1, d2,… , dn-1} determined by every two points can be defined • Therefore, a final span (d') ̅ representing the average sleeping time of a user can be figured out by calculating the average lower bound and upper bound of all result spans.

  16. Day 1 Day 2 … Day n Methods: Statistics method in discovering sleeping time pattern from sparse data • IDEAL CONDITIONS -> Less than 1% of Weibo users can satisfy • Statistics • 1440-minutes-period (60 X 24) • Daily data of time series are united together by a section(week/month/quarter…)

  17. Find the longest inactive time span to discover sleeping time from dense data- United time series by month • In order to solve this problem in this case, daily data of the time series are united by month for example, as new sequences Tm = {tm1, tm2 … tmn}. • In each of the time series data of month m, the lengths of spans Dm = {dm1, dm2, … ,dmn} determined by every two points can be defined as: • long inactive spans can be easily obtained from the time series data of the corresponding month. dmn dm1 dm2 dm3 … dm(n-2) dm(n-1) dmn tm1 tm2 tm3 tm4 … tm(n-2) t m(n-1) tmn

  18. Find the longest inactive time span to discover sleeping time from dense data- Subsequence statistics • ! The sparsity of time series data is different from different users. ->The chosen length of the section for doing statistics cannot match all the circumstances. • Subsequence statistics • Figure out all inactive spans of time • Move start point & end point of subsequence • Make every subsequence a ideal condition • Accuracy+ Efficiency – ->Cost too much time

  19. Further analysis: User with fake or unclear location label • Sina Weibo users label their location in the profile without verification. Some of them label the location as 'unknown' or as fake locations. • 12 users are judged to have obvious fake location labeled in their profiles. • Fake location within the real time zone can’t been detected (Beijing & Shanghai will not viewed as different location here) • The List are as follows: 9775521, 1025882825, 1092620107, 1191220232, 1194431627, 1215914717, 1222524197, 1228151340, 1686658697, 1686719832, 1697717391, 1713138055

  20. Verified User. XiaosongGao , a famous musician of mainland China Labels his location as 'The United States, Oversea' (GMT-8~GMT-5)

  21. The result shows that he sleeps at about 0 a.m. (GMT+8) and wakes up at about 8 a.m. (GMT+8). The corresponding time zone is GMT+7. Therefore, the location Mr. Gao labels himself is fake according to the difference between the location in his profile and the detected time zone.

  22. Further analysis: Detection on movement between different time zones • There are users who study abroad or have business overseas. They travel far, but they won't change the label of location in the profile page frequently. • 33 users are likely to have obvious movement (move from locations in different time zones) • Movement within a time zone can’t been detected • The List are as follows: 1025722364, 1025882825, 1043898503, 1059500987, 1061252391, 1094551220, 1120630263, 1152571540, 1159282133, 1191220232, 1193715355, 1214305597, 1216517652, 1217907573, 1218288163, 1221281087, 1222524197, 1222838993, 1229162931, 1682771764, 1689243304, 1689647122, 1693141730, 1693775877, 1695248214, 1697033804, 1697370827, 1702062500, 1703580823, 1706608200, 1711754322, 1711962552, 1712058193

  23. Verified User. Xiaoqian Liu, a reporter of CCTV Brazil Labels his location as ‘Brazil, Oversea‘ (GMT-4~GMT-2)

  24. The result indicates that, before Sept. 2011, he goes to bed at about 0 a.m. (GMT+8) and wakes up at about 6 a.m. (GMT+8). The corresponding time zone is GMT+8~GMT+9. After Sept. 2011, the time he starts sleeping change to about 9 a.m. (GMT+8) and the sleep period end at 4 p.m. (GMT+8). The corresponding time zone is GMT-3.5 ~ GMT-2.5. Thus, the obvious movement is detected and the time of this change can be estimated.

  25. Related  Applications • The study can be extended to other data sets. • Web visiting logs on the web server • Users or robots? • IP address of users or IP address of proxy/VPN • Data of user login activates • ‘Don’t lock my account when I use a proxy please…’ • May be useful to the recommendation • Users who share a same sleeping time is more likely to meet on social network. Why not recommendation their feeds first? • Watch your health!

  26. Conclusion • A new open problem related to the relationship between sleeping time and micro-blogging activities is presented. • This study associates the pattern of sleep and the time zone users live in. • For such a problem above, preliminary simple solutions are presented in this paper. Based on experiment results on real data set, the proposed solutions are proved both efficient and effective.

  27. Thanks! A data set of is open to the public. The URL is: http://www.haoranyu.com/research/weibosleepQuestions Welcome

More Related