Pandemics, Protests, and Publics

Demographic Activity and Engagement on Twitter in 2020

Sarah Shugars, Adina Gitomer, Stefan McCabe, Ryan J. Gallagher, Kenneth Joseph, Nir Grinberg, Larissa Doroshenko, Brooke Foucault Welles, David Lazer

Forthcoming in Journal of Quantitative Description: Digital Media, 2021


When researchers collect and aggregate social media data, they are making explicit decisions about the populations and behaviors under study. However, there is little available guidance to ensure that these methodological choices are conceptually and empirically grounded. For example, how should researchers conceptualize a topical sample of social media content? Can it be understood as a self-contained world? Can we interpret individual accounts as participating in the same discourse, or do we need to consider the ways in which different subpopulations engage? Should we disaggregate the specific mechanisms of user activity and engagement? In short: when do researchers need to consider the breadth of variation in user experience and behavior, and when can they meaningfully aggregate over such behavior? Leveraging a panel of 1.6 million Twitter accounts matched to U.S. voting records, we provide empirical guidance on these questions through the conceptual lens of public sphere theory. We focus on the first nine months of 2020, giving particular attention to the Black Lives Matter movement and the COVID-19 pandemic. By examining the demographics, activity, and engagement of 800,000 American adults who collectively posted nearly 300 million tweets during this time, we offer practical and empirical guidance for researchers aiming to establish meaningful bounds around the populations and behaviors they study. Specifically, we find that topics are imperfect but useful bounds, though topically selected tweets must be understood as capturing segments of numerous, overlapping, and disconnected conversations. We further find that researchers should always conduct a disaggregated analysis of tweet activity, separately examining behavior around authored tweets, retweets, quote tweets, and replies.
Additionally, we find retweets and quote tweets appear to be used in distinctly different ways, potentially reflecting that retweets amplify content while quote tweets modify that content. Finally, we find that while temporal bias is inherent to social media data, its effects are manageable within our period of study. Overall, this work paints a picture of Twitter as a fluid, contextual environment best conceptualized as networked publics and characterized by enormous variety in user identity, activity, and engagement. While there are no self-contained “Twitter publics” around which perfect boundaries can be drawn, our findings provide valuable empirical guidance to researchers grappling with the conceptual implications of their methodological choices.
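The recommendation to disaggregate tweet activity can be made concrete with a small sketch. The Python below is illustrative only and is not the authors' pipeline: it assumes tweets are Twitter API v1.1-style JSON objects parsed into dictionaries, where a retweet carries a `retweeted_status` field, a quote tweet carries `quoted_status` (or `is_quote_status`), a reply carries a non-null `in_reply_to_status_id`, and the author is identified by `user.id_str`. The helper names `tweet_type` and `activity_by_type` are hypothetical.

```python
from collections import Counter


def tweet_type(tweet: dict) -> str:
    """Classify a v1.1-style tweet dict (assumed layout) as one of four activity types."""
    # Retweets are checked first, so a retweet of a quote tweet counts as a retweet.
    if tweet.get("retweeted_status") is not None:
        return "retweet"
    # Quote tweets: the quoted object may be missing (e.g. deleted), so also check the flag.
    if tweet.get("quoted_status") is not None or tweet.get("is_quote_status"):
        return "quote"
    # Replies reference the status they answer.
    if tweet.get("in_reply_to_status_id") is not None:
        return "reply"
    # Anything else is treated as an originally authored tweet.
    return "authored"


def activity_by_type(tweets) -> dict:
    """Count each account's tweets separately by activity type instead of one pooled total."""
    counts: dict = {}
    for tweet in tweets:
        user_id = tweet["user"]["id_str"]
        counts.setdefault(user_id, Counter())[tweet_type(tweet)] += 1
    return counts


if __name__ == "__main__":
    # Toy records (hypothetical), not data from the study.
    sample = [
        {"user": {"id_str": "1"}, "in_reply_to_status_id": None},
        {"user": {"id_str": "1"}, "retweeted_status": {"id": 10}},
        {"user": {"id_str": "2"}, "quoted_status": {"id": 11}, "is_quote_status": True},
        {"user": {"id_str": "2"}, "in_reply_to_status_id": 12},
    ]
    for user, by_type in activity_by_type(sample).items():
        print(user, dict(by_type))
```

The precedence order is itself an assumption: under it, a reply that also quotes another tweet is counted as a quote, and other orderings are equally defensible depending on the research question.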