MS defense: Social Media Data Analytics Applied to Hurricane Sandy, Han Dong, 7/29
MS Defense
Computer Science and Electrical Engineering
Social Media Data Analytics Applied to Hurricane Sandy
Han Dong
12:30-2:30 Monday, 29 July 2013, ITE 325b
Social media websites are an integral part of many people’s lives, delivering news and other emergency information. This is especially true during natural disasters. Furthermore, the role of social media websites is becoming more important due to the cost of recent natural disasters. Because of their very large number of registered users, these online platforms are usually the first to deliver emergency news to a wide variety of people. During disasters, extracting useful information from this pool of social media data can help in understanding the sentiment of the public; this information can then be used to improve decision making. In this work, I present a system that automates the process of collecting and analyzing social media data from Twitter. I also explore a variety of visualizations that the system can generate in order to understand public sentiment. I demonstrate the system on the Hurricane Sandy disaster from October 26, 2012 to October 30, 2012. Finally, a statistical analysis is performed to explore the causal correlation between an approaching hurricane and the sentiment of the public.
Because of the large amount of data collected by this system, scalable machine learning algorithms are needed for analysis. Boosting is a popular and powerful ensemble method in supervised machine learning due to its theoretical convergence guarantees, simple implementation, and ability to use different learning algorithms to produce a classifier with high accuracy. A novel parallel implementation of the multiclass version of Boosting (AdaBoost.MH) is proposed, and our experimental results show that the parallel implementation achieves classification error percentages similar to the serial implementation with fewer execution iterations. When the tasks were distributed, the number of Boosting iterations decreased linearly up to at least 16 computational threads.
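For readers unfamiliar with Boosting, the general idea can be illustrated with a minimal sketch of serial, binary AdaBoost using threshold "decision stumps" as weak learners. This is not the parallel AdaBoost.MH implementation proposed in the thesis, and it handles only two classes (labels +1 and -1); the function names (`stump_predict`, `train_stump`, `adaboost`, `predict`) are hypothetical and chosen for illustration only.

```python
import math


def stump_predict(stump, x):
    """Predict +1 or -1 with a one-feature threshold rule."""
    feature, threshold, polarity = stump
    return polarity if x[feature] >= threshold else -polarity


def train_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best_err, best_stump = None, None
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                # Weighted error: total weight of misclassified examples.
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if best_err is None or err < best_err:
                    best_err, best_stump = err, stump
    return best_err, best_stump


def adaboost(X, y, rounds):
    """Train an ensemble of (alpha, stump) pairs; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform example weights
    ensemble = []
    for _ in range(rounds):
        err, stump = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this weak learner
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified examples, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Weighted majority vote of all weak learners."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Toy 1-D data that no single threshold can separate.
    X = [[1], [2], [3], [4], [5], [6]]
    y = [1, 1, -1, -1, 1, 1]
    model = adaboost(X, y, rounds=3)
    print([predict(model, x) for x in X])
```

Each round reweights the training examples so that the next weak learner concentrates on the ones misclassified so far; the final classifier is a weighted vote. The number of rounds in this loop corresponds to the "Boosting iterations" mentioned above.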
Committee: Professors Milton Halem (chair), Yelena Yesha, John Dorband and Shujia Zhou
Posted: July 18, 2013, 11:27 AM