Weather and Tweets UCML 2013 Members: Vinh Dang, Wai I Iong, Matthew Dudley, Jiyuan Li
Background • Analyzing weather-related tweets to determine: • whether the sentiment is positive, negative, or neutral • whether the weather described is in the past, present, or future • what kind of weather the tweet refers to
The data • Training set: http://www.kaggle.com/c/crowdflower-weather-twitter • contains the tweets, their locations, and a confidence score for each of 24 possible labels • about 78,000 tweets (rows)
The data • Labels (confidence scores): • sentiment: s1 + s2 + s3 + s4 + s5 = 1 • when: w1 + w2 + w3 + w4 = 1 • kind: k1 + k2 + … + k15 may be greater than 1 (a tweet can mention several kinds of weather)
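The label constraints above can be checked mechanically. A minimal sketch, assuming each row holds the 24 confidence scores in the order s1-s5, w1-w4, k1-k15 and that every score lies in [0, 1]; the column order, the tolerance, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical column layout: 5 sentiment, 4 "when", 15 "kind" confidence scores.
N_S, N_W, N_K = 5, 4, 15


def split_labels(row):
    """Split one 24-value label row into its sentiment, when, and kind groups."""
    row = np.asarray(row, dtype=float)
    if row.shape != (N_S + N_W + N_K,):
        raise ValueError("expected 24 label values")
    return row[:N_S], row[N_S:N_S + N_W], row[N_S + N_W:]


def satisfies_constraints(row, tol=1e-6):
    """s-scores and w-scores must each sum to 1; k-scores only need to lie in [0, 1]."""
    s, w, k = split_labels(row)
    return bool(
        abs(s.sum() - 1.0) <= tol
        and abs(w.sum() - 1.0) <= tol
        and np.all((k >= 0.0) & (k <= 1.0))
    )


row = ([0.0, 0.2, 0.8, 0.0, 0.0]      # s1..s5 sum to 1
       + [1.0, 0.0, 0.0, 0.0]         # w1..w4 sum to 1
       + [0.6, 0.8] + [0.0] * 13)     # k1..k15 sum to 1.4, which is allowed
print(satisfies_constraints(row))
```

The "kind" group is deliberately not required to sum to 1, since one tweet can be about, say, both rain and wind.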
The data • Testing set: • contains the id, tweet, state, and location • has no "sentiment", "when", or "kind" labels; predicting these is our goal • about 42,000 tweets (rows)
Data Preprocessing • Data "normalizing": • convert HTML entities into characters (e.g., &gt; → >) • convert all hyperlinks in the testing set into "{link}" • Tokenizing, for example: "What a bright sunny!" → [what, a, bright, sunny, !] • SQLite (for storing the data)
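The preprocessing steps above can be sketched with the Python standard library. This is a minimal illustration, not the authors' code: the URL pattern, the tokenizer regex, and the SQLite table layout are all assumptions chosen to reproduce the slide's example.

```python
import html
import re
import sqlite3

# Assumed definition of a "hyperlink": any http(s) URL up to the next whitespace.
URL_RE = re.compile(r"https?://\S+")
# The {link} placeholder, a word (letters/digits/underscore), or a single punctuation mark.
TOKEN_RE = re.compile(r"\{link\}|\w+|[^\w\s]")


def normalize(tweet):
    """Decode HTML entities and replace hyperlinks with the {link} placeholder."""
    text = html.unescape(tweet)  # e.g. "&gt;" -> ">"
    return URL_RE.sub("{link}", text)


def tokenize(tweet):
    """Normalize, lowercase, and split a tweet into word and punctuation tokens."""
    return TOKEN_RE.findall(normalize(tweet).lower())


# Hypothetical schema: keep raw and tokenized tweets side by side in SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, tweet TEXT, tokens TEXT)")
samples = ["What a bright sunny!", "Rain &gt; snow http://t.co/abc"]
for i, text in enumerate(samples, start=1):
    conn.execute("INSERT INTO tweets VALUES (?, ?, ?)", (i, text, " ".join(tokenize(text))))
conn.commit()

print(tokenize("What a bright sunny!"))  # ['what', 'a', 'bright', 'sunny', '!']
```

Storing the space-joined tokens makes it cheap to rebuild the bag-of-words features later without re-running the normalization.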
Methodology • Bag of words • TF-IDF weighting • Approaches: 1) Support vector regression (SVR) 2) Ridge regression
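A minimal from-scratch sketch of the bag-of-words + TF-IDF + ridge regression pipeline, using only NumPy. Assumptions, not taken from the slides: the regex tokenizer, the smoothed idf ln((1 + n) / (1 + df)) + 1 (the form scikit-learn's TfidfVectorizer uses by default), L2 row normalization, alpha = 0.1, clipping predictions to [0, 1], and the toy data. SVR is not shown; in practice scikit-learn's TfidfVectorizer, Ridge, and SVR (built on LIBSVM) would replace this hand-written code.

```python
import math
import re
from collections import Counter

import numpy as np

TOKEN_RE = re.compile(r"\w+|[^\w\s]")  # assumed tokenizer: words and single punctuation marks


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def fit_tfidf(docs):
    """Build a vocabulary index and smoothed idf weights from training documents."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    n = len(docs)
    df = Counter(tok for toks in tokenized for tok in set(toks))  # document frequency
    idf = np.array([math.log((1 + n) / (1 + df[tok])) + 1 for tok in vocab])
    return index, idf


def transform_tfidf(docs, index, idf):
    """Map documents to L2-normalized tf-idf rows; words outside the vocabulary are ignored."""
    X = np.zeros((len(docs), len(index)))
    for row, doc in enumerate(docs):
        for tok, count in Counter(tokenize(doc)).items():
            if tok in index:
                X[row, index[tok]] = count * idf[index[tok]]
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows unchanged
    return X / norms


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form multi-output ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ Yc)
    b = y_mean - x_mean @ W
    return W, b


def predict_ridge(X, W, b):
    # Confidence scores live in [0, 1], so clip the raw regression output.
    return np.clip(X @ W + b, 0.0, 1.0)


# Toy example with two made-up label columns (not competition data).
train_docs = ["So hot and sunny today!", "Rain tomorrow, ugh", "Snow storm last night"]
Y_train = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
index, idf = fit_tfidf(train_docs)
X_train = transform_tfidf(train_docs, index, idf)
W, b = fit_ridge(X_train, Y_train, alpha=0.1)
pred = predict_ridge(transform_tfidf(["sunny and hot"], index, idf), W, b)
print(pred)
```

Because ridge regression has a closed-form solution, a single linear solve fits every label column at once; standard SVR, by contrast, is single-output, so it needs one model per label.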
Result • Our results: • SVR RMSE = 0.26149 • Ridge RMSE = 0.16997 • For comparison: • Winner: 0.14314 • Baseline (all zeros): 0.31957
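All scores above are root mean squared errors (lower is better). A minimal sketch, assuming the error is averaged over every (tweet, label) cell of the prediction matrix; the "actual" values below are made up for illustration and are not competition data.

```python
import numpy as np


def rmse(predicted, actual):
    """Root mean squared error over every (tweet, label) cell."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))


# Toy "actual" confidence scores: 2 tweets x 3 labels.
actual = np.array([[0.0, 1.0, 0.0],
                   [0.2, 0.8, 0.0]])
# The all-zeros baseline simply predicts 0 for every label.
zeros = np.zeros_like(actual)
print(rmse(zeros, actual))
```

The all-zeros baseline needs no model at all, which is why it serves as the starting line that any useful approach has to beat.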
Result • A better approach (test data vs. actual results) • Review of the labels
References • CrowdFlower (2013). "Partly Sunny with a Chance of Hashtags." Kaggle. Retrieved from http://www.kaggle.com/c/crowdflower-weather-twitter • Chang, C.-C., and Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2, 27:1-27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm • Pedregosa, F., et al. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825-2830.
Questions? The End