Analyzing tweets from three metropolitan areas to study large temporal trends for a city as a whole

Understanding when, where, and how increasingly diverse and dynamic subpopulations interact in urban environments is critical to the integrity of the city as a whole. This knowledge can facilitate the development of communication strategies, urban planning methodologies, and resource allocation to best serve citizens.

Previous research either focused on large temporal trends for the city as a whole or mapped citizens in social or geographic space using broad categories (such as shared interests on social media or racial designation). Whereas each of these studies focused on a single geographic area, this research, conducted by MSDS students Lander Basterra, Tyler Worthington and James Rogol, includes Twitter data from three culturally distinct metropolitan areas over the same 92-day period: Los Angeles, Chicago and Istanbul.

The global adoption of the Twitter social media platform makes it the primary data source for a methodology applicable to any urban area. For tweets originating within each city, a bag-of-words approach to the messages’ textual content produces topical clusters within the most frequently occurring languages. These classifications transcend traditional racial designations.
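As a rough illustration of the bag-of-words step described above, the sketch below turns each tweet into a vector of word counts and groups the vectors with a simple k-means-style loop based on cosine similarity. The function names, the whitespace tokenizer, the seed-based initialization and the choice of clustering algorithm are all assumptions made for illustration; the summary does not say which clustering method or preprocessing the researchers used.

```python
from collections import Counter
import math


def bag_of_words(text):
    """Lowercase, split on whitespace, and count tokens (word order is discarded)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(tweets, seeds, iterations=10):
    """Assign each tweet to a topical cluster.

    `seeds` is a hypothetical list of tweet indices used as initial centroids.
    Centroids are summed count vectors; cosine similarity ignores their scale.
    """
    vectors = [bag_of_words(t) for t in tweets]
    centroids = [Counter(vectors[i]) for i in seeds]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: pick the most similar centroid for each tweet.
        labels = [
            max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
            for v in vectors
        ]
        # Update step: rebuild each centroid from its current members.
        for k in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:  # keep the old centroid if a cluster is empty
                total = Counter()
                for v in members:
                    total.update(v)
                centroids[k] = total
    return labels
```

Calling `cluster(tweets, seeds=[0, 2])` on a short list of tweets returns one cluster index per tweet. A real pipeline would also need language detection, stop-word removal and a principled way of choosing the number of clusters, none of which is shown here.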

Time series analysis of tweet timestamps reveals that the volume of tweets across topics is significantly correlated with major regional events. Furthermore, posting volumes for certain subpopulations rise and fall in sync with those of others. Examining the trends among strongly correlated topic groups provides an indication of how these groups might interact.
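One way to check whether two topic groups rise and fall together is to bin each group's tweet timestamps into daily counts and correlate the two series. The sketch below assumes daily bins and the Pearson correlation coefficient; both are illustrative choices, as are the function names, since the summary does not state which bin width, correlation measure or significance test was used.

```python
from collections import Counter
from datetime import datetime, date
import math


def daily_counts(timestamps, start, days):
    """Count tweets per calendar day over a window of `days` days from `start`."""
    counts = Counter((ts.date() - start).days for ts in timestamps)
    return [counts.get(offset, 0) for offset in range(days)]


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y) if spread_x and spread_y else 0.0


def topic_correlation(timestamps_a, timestamps_b, start, days):
    """Correlate the daily tweet volumes of two topic groups."""
    series_a = daily_counts(timestamps_a, start, days)
    series_b = daily_counts(timestamps_b, start, days)
    return pearson(series_a, series_b)
```

For the 92-day window described above, `days` would be 92 and `start` the first day of collection. A coefficient near 1 suggests two groups post in sync, but judging significance would require a separate test that this sketch omits.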