
Examining Hurricane Irma with Twitter Data and Machine Learning

This study uses Twitter data and machine learning to analyze the content and purpose of tweets about Hurricane Irma as a way to gauge response effectiveness and social issues. It investigates how tweets changed over time, how people felt about the government, and which content categories drew the most complaints, messages of appreciation, or requests for help, using text cleaning, Bag-of-Words, and Doc2Vec modeling. The results suggest that a larger training set could improve the models and help inform future hurricane response strategies.

jhebb

Presentation Transcript


  1. Examining Hurricane Irma with Twitter Data and Machine Learning Gbemisola Oladosu, Dylan Johnson (Mentors: Dr. Chien-fei Chen, Dr. Xiaojing Xu, Zach McMichael, Julian Ball)

  2. Outline • Introduction: Purpose, Research Questions • Methods: Data Collection, Text Cleaning, Descriptives, Bag-of-words, Doc2Vec, Doc2Vec Models • Results • Conclusion: Conclusion, Future Work

  3. Introduction • Proportion of Category 4 and 5 hurricanes in the US has increased in the past two decades[1] • Cost of hurricane damage is increasing[2] • Larger responses are necessary • Higher risk for social issues • Measuring response effectiveness and social issues is important

  4. Introduction (Cont.) • How did tweet content and purpose change over time? • Which tweet content categories had the most complaints, messages of appreciation, or requests for help? • How did people feel about the government?

  5. Methods • 10,784 Irma-related tweets from September 2017 • Tweets were labeled for content (Code 1) and purpose (Code 2) • Removed: • Duplicates • Labeling mistakes

  6. Methods (Cont.) • Text was cleaned by removing: • Non-English tweets • Retweets • Punctuation • Articles • URLs • Remaining text was set to lowercase
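
The cleaning steps on this slide could be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the `RT @` retweet convention, the URL pattern, and the article list are all assumptions, and the non-English filter is left out because the slides do not say how language was detected.

```python
import re
import string

# Assumed article list and URL pattern; the slides do not specify either.
ARTICLES = {"a", "an", "the"}
URL_RE = re.compile(r"https?://\S+|www\.\S+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def is_retweet(text):
    """Assume retweets follow the classic 'RT @user' convention."""
    return text.lstrip().startswith("RT @")


def clean_tweet(text):
    """Strip URLs, lowercase, drop ASCII punctuation, then drop articles."""
    text = URL_RE.sub(" ", text)
    text = text.lower()
    text = text.translate(PUNCT_TABLE)
    tokens = [tok for tok in text.split() if tok not in ARTICLES]
    return " ".join(tokens)


def clean_corpus(tweets):
    """Skip retweets and return the cleaned text of the remaining tweets."""
    return [clean_tweet(t) for t in tweets if not is_retweet(t)]
```

URLs are removed before punctuation so that a link is dropped whole instead of being left behind as a run of letters.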

  7. Methods (Cont.) • Oversampled to 10,000 [Figures: Composition of Tweet Purpose (Code 2); Composition of Tweet Content (Code 1)]
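
The slides do not say how the oversampling was done. One common approach is random oversampling with replacement, sketched below under that assumption; the `oversample` function, its `per_class` parameter, and the idea of balancing every label to the same count are hypothetical choices for illustration.

```python
import random
from collections import defaultdict


def oversample(rows, per_class, seed=0):
    """Randomly duplicate examples until each label has at least per_class rows.

    rows is a list of (text, label) pairs. Labels that already have
    per_class rows or more are left unchanged.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append(text)

    balanced = []
    for label, texts in by_label.items():
        extra = max(0, per_class - len(texts))
        # Sample with replacement from the existing examples of this label.
        picks = texts + rng.choices(texts, k=extra)
        balanced.extend((text, label) for text in picks)

    rng.shuffle(balanced)
    return balanced
```

Because the added rows are exact copies, oversampling changes the class balance but adds no new information, which is worth keeping in mind when reading the accuracy figures later in the deck.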

  8. Methods (Cont.) • Neural networks require numeric inputs • Bag of Words feature vector • Does not represent relationships between words: • Word similarity • Word order • (‘Large’ is as far from ‘fat’ as it is from ‘cat’) Example sentence: The fat cat sat on the mat.
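
A small sketch of a Bag-of-Words vector for the slide's example sentence may help here. The helper names and the fixed vocabulary are assumptions made for illustration; the code shows that reordering the words leaves the vector unchanged and that one-hot word vectors place every pair of distinct words at the same distance.

```python
from collections import Counter


def tokenize(sentence):
    """Lowercase and strip basic punctuation from each whitespace-split word."""
    return [word.strip(".,!?").lower() for word in sentence.split()]


def bag_of_words(sentence, vocabulary):
    """Count how often each vocabulary word occurs in the sentence."""
    counts = Counter(tokenize(sentence))
    return [counts[word] for word in vocabulary]


def one_hot(word, vocabulary):
    """Vector with a 1 at the word's vocabulary position and 0 elsewhere."""
    return [1 if entry == word else 0 for entry in vocabulary]


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


# Hypothetical vocabulary, sorted alphabetically.
vocabulary = ["cat", "fat", "large", "mat", "on", "sat", "the"]

original = bag_of_words("The fat cat sat on the mat.", vocabulary)
reordered = bag_of_words("The mat sat on the fat cat.", vocabulary)

# Word order is lost: both sentences map to the same count vector.
same_vector = original == reordered

# Word similarity is lost: 'large' is equally far from 'fat' and 'cat'.
large_to_fat = squared_distance(one_hot("large", vocabulary), one_hot("fat", vocabulary))
large_to_cat = squared_distance(one_hot("large", vocabulary), one_hot("cat", vocabulary))
```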

  9. Doc2Vec • Uses word embeddings (high-dimensional vectors) to represent: • Word similarity • Word order • Relationships between words [3]
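
To show why dense vectors can capture word similarity, here is a toy sketch using cosine similarity. The three-dimensional vectors are hand-picked for illustration only; they are not trained embeddings and do not come from the authors' Doc2Vec models, which would learn such vectors from the tweet corpus.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical embeddings: 'large' and 'fat' are placed close together,
# 'cat' points in a different direction.
toy_embeddings = {
    "large": [0.9, 0.8, 0.1],
    "fat": [0.8, 0.9, 0.2],
    "cat": [0.1, 0.2, 0.9],
}

large_vs_fat = cosine_similarity(toy_embeddings["large"], toy_embeddings["fat"])
large_vs_cat = cosine_similarity(toy_embeddings["large"], toy_embeddings["cat"])
```

With vectors like these, `large_vs_fat` is higher than `large_vs_cat`, so related words can sit closer together than unrelated ones.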

  10. Methods (Cont.) • Eight Doc2Vec models were created to predict Code 1 and Code 2

  11. Results (Cont.) • Tweets aimed at the government were only complaints • Mostly regarding infrastructure and animals • Tweets of appreciation were related to health, infrastructure, and social issues/crimes • Requests for help were related to animals [Figure: Code 1 vs Code 2]

  12. Results (Cont.) • Tweets dramatically increased after landfall • Tweet composition remained constant [Figure: Code 1 vs Time]

  13. Results (Cont.) • Model predictions were not very accurate • ‘Relief efforts’ was predicted frequently before the hurricane made landfall • ‘Recovery info’ and ‘relief efforts’ were predicted disproportionately often

  14. Conclusion • Predictions were inaccurate due to a small training set of fewer than 1,400 tweets • The Code 1 model’s accuracy of 43% and the Code 2 model’s accuracy of 31% suggest more data could produce better results
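
As a rough sketch of how figures like these can be checked, the code below computes accuracy and the share of each predicted label. The function names and the four-tweet toy example are assumptions for illustration; they do not reproduce the study's data or its reported 43% and 31% accuracies.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def label_shares(labels):
    """Share of the total taken by each label, as a dict of fractions."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


# Toy labels only, chosen to mimic a model that over-predicts one category.
true_labels = ["relief efforts", "recovery info", "animals", "infrastructure"]
predicted = ["relief efforts", "relief efforts", "relief efforts", "infrastructure"]

toy_accuracy = accuracy(true_labels, predicted)
predicted_shares = label_shares(predicted)
true_shares = label_shares(true_labels)
```

Comparing `predicted_shares` with `true_shares` makes it easy to see when one label is predicted far more often than it actually occurs.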

  15. Conclusion (Cont.) • Analyzing the results of a more accurate model can allow future researchers to determine the most impactful hurricane problems • This can inform hurricane response groups on how to address these problems

  16. References 1. Holland, G., & Bruyère, C. L. (2013). Recent intense hurricane response to global climate change. Climate Dynamics, 42(3–4), 617–627. https://doi.org/10.1007/s00382-013-1713-0 2. Hurricane Costs. (2017). Retrieved July 23, 2019, from Noaa.gov website: https://coast.noaa.gov/states/fast-facts/hurricane-costs.html 3. Le, Q., & Mikolov, T. (2014). Distributed Representations of Sentences and Documents. Retrieved from https://arxiv.org/pdf/1405.4053.pdf

  17. Acknowledgements Special thanks to our mentors: Dr. Chien-fei Chen, Dr. Xiaojing Xu, Zach McMichael, and Julian Ball

  18. Acknowledgements This work was supported primarily by the ERC Program of the National Science Foundation and DOE under NSF Award Number EEC-1041877 and the CURENT Industry Partnership Program. Other US government and industrial sponsors of CURENT research are also gratefully acknowledged.
