Identification of Victims, Types, and Risk Factors of Intimate Partner Violence during COVID-19 on Social Media

Principal Investigators:  

Sangmi Kim, PhD, MPH, RN | Emory University, Nell Hodgson Woodruff School of Nursing   

Abeed Sarker, PhD | Emory University School of Medicine, Department of Biomedical Informatics

Research Gap:

This study aims to help narrow two research gaps in the domain of intimate partner violence (IPV). There is little research on how events such as a pandemic affect IPV, or on using social media data to monitor and analyze chatter in ways that may lead to protective interventions.


Intimate partner violence (IPV), defined as physical violence, sexual violence, stalking, or psychological harm by a current or former partner or spouse, is a global public health problem; in the US, about 1 in 4 women and 1 in 10 men have reported experiencing IPV. During the current COVID-19 crisis, IPV has been on the rise. Pandemic response measures such as social distancing, shelter-in-place orders, school and business closures, and travel restrictions increased IPV victims’ susceptibility to violence through social isolation; longer time spent at home with their perpetrator; stress over economic instability, health, and security; relationship strain; and limited access to resources (e.g., shelter, legal aid). Thus, it is imperative to monitor population-level patterns of IPV incidents and to provide contact-free interventions to IPV victims.

One major challenge in doing so, however, is the difficulty of collecting reliable and actionable IPV-related data during the pandemic, particularly from conventional sources (e.g., surveys or medical/police reports). As an alternative, we propose to use social media (SM) as a non-conventional data source. With advanced analytic technologies, including natural language processing (NLP) and machine learning, SM holds tremendous potential for assessing the population-level characteristics of IPV and IPV victims during the pandemic and for proactively reaching out to IPV victims with the support they need in a time-sensitive manner. The volume of IPV-related discussion on SM (e.g., Reddit) has increased rapidly over the last two years, and more so during the pandemic. With in-person interaction and communication restricted, SM can be a key channel for collecting streaming IPV-related data accurately, unobtrusively, anonymously, and at scale. However, the role of SM in monitoring IPV and helping IPV victims remains underexplored, although recent feasibility studies have produced promising outcomes.

Our long-term goal is to establish a framework for IPV surveillance and support of IPV victims during the pandemic and onward by utilizing SM. This study has three aims:


  1. To determine the characteristics of IPV and IPV victims in the US during COVID-19 on Twitter and Reddit via social media data analytics and expert annotation.   
  2. To develop an automated, SM-based system to detect, categorize, and store streaming IPV-related big data during COVID-19 on Twitter and Reddit via NLP and machine learning. 
  3. To develop a lexicon of IPV’s typologies, risk/protective factors, and IPV victims’ needs during COVID-19 by content analysis of the collected SM data on IPV. 

Why is this study important?

This preliminary study will lay the groundwork for a future NIH R01 grant application to develop an SM-based IPV monitoring resource and an app/web-based intervention platform that provides personalized safety plans and social support to IPV victims identified by the proposed methods.

Updates and Results

Reddit Annotation: The Reddit manual annotations were accompanied by a thorough qualitative analysis. Out of a final set of 2,924 original posts (OPs), we randomly selected 300 OPs for quantitative content analysis.

  • Self-report status, timing of IPV, and help-seeking: Of 300 OPs, 36% (n = 108) were self-reports of IPV (i.e., reports by survivors themselves), while 3.3% (n = 10) were reports by bystanders about family, friends, or neighbors who had experienced IPV. Of the 108 OPs written by survivors, about 40% concerned current/ongoing IPV, and about 14% contained help-seeking messages. Of the 10 OPs written by bystanders, 90% reported current/ongoing IPV that the bystander had witnessed, and 80% contained help-seeking messages, such as asking how to help the victim or how to intervene effectively in the situation.
  • IPV typologies:  A majority of the posts (from both survivors and bystanders) reflected psychological aggression, followed by physical and sexual violence.
  • Tactics for psychological aggression:  For survivors, 61.4% of the psychological aggression involved expressive aggression, followed by gaslighting (54.3%), coercive control (44.3%), and a threat of physical or sexual violence (21.4%). For bystanders, 83.4% of the reported psychological aggression reflected expressive aggression.
  • Survivors’ needs during the pandemic:  The top five needs were asking about similar experiences, wanting legal advice, asking if their feelings/reactions/thoughts/actions were valid, wanting to know if they were being abused, and asking for ways to grieve.

Twitter Annotation: For Twitter, the final annotated data consisted of 6,348 posts, of which 5,680 (~89%) were non-self-reports and 668 (~11%) were self-reports.

  • The average pairwise inter-annotator agreement (IAA) was 0.86 (Cohen’s kappa), which represents substantial agreement.
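
For readers unfamiliar with the metric, the following is a minimal sketch of how an average pairwise Cohen’s kappa can be computed. It is not the project’s annotation tooling: the function names, the annotator names, and the toy labels are all hypothetical, and the sketch assumes every annotator labeled the same items in the same order.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two annotators' labels over the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(a), Counter(b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


def average_pairwise_kappa(annotations):
    """Mean kappa over all annotator pairs; maps annotator name -> labels."""
    pairs = list(combinations(annotations.values(), 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy labels (1 = self-report, 0 = non-self-report).
toy = {
    "annotator_1": [1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    "annotator_2": [1, 0, 0, 0, 1, 0, 0, 1, 0, 0],
    "annotator_3": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}
print(round(average_pairwise_kappa(toy), 3))
```

Because kappa subtracts chance agreement, it is a stricter measure than raw percent agreement when one label dominates, as non-self-reports do in the Twitter data.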

Classification: The RoBERTa model showed the best performance on the classification task for both datasets. Among traditional models, SVMs generally performed best, but they were outperformed by RoBERTa.

  • Accuracy on the non-IPV class is generally higher than on the IPV class, which is not surprising since the data are imbalanced.
  • SVMs perform particularly poorly in terms of F1-score when the imbalance is high (e.g., Twitter data).
  • The classifiers perform better for subreddits with a greater proportion of IPV posts.
  • There is likely to be room for improvement in terms of F1-score on both datasets.  
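
The interaction between class imbalance and F1-score noted above can be illustrated with a small sketch. The data below are hypothetical toy values, not the study’s results, and `per_class_metrics` is an illustrative helper rather than part of the project’s code.

```python
def per_class_metrics(y_true, y_pred, label):
    """Precision, recall, and F1 for one class, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced toy set: 90 non-IPV posts (0) and 10 IPV posts (1).
y_true = [0] * 90 + [1] * 10
# A hypothetical classifier labels 88 of the 90 non-IPV posts correctly
# but recovers only 4 of the 10 IPV posts.
y_pred = [0] * 88 + [1] * 2 + [1] * 4 + [0] * 6

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1_non_ipv = per_class_metrics(y_true, y_pred, label=0)[2]
f1_ipv = per_class_metrics(y_true, y_pred, label=1)[2]
print(f"accuracy={accuracy:.2f}  F1(non-IPV)={f1_non_ipv:.2f}  F1(IPV)={f1_ipv:.2f}")
```

In this toy case, overall accuracy is high because the majority class dominates, while the F1-score for the minority IPV class is much lower. This is why per-class F1 is the more informative measure on data like these.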

Post-classification analyses: For the best-performing classifier (RoBERTa), we performed an analysis of the errors to identify common reasons behind misclassifications.

  • A common reason for misclassification was that domestic violence can be perpetrated by either an intimate partner or another family member, and the model could not distinguish between the two scenarios.
  • In addition, we calculated the average length of the mistakenly classified posts, the correctly classified posts, and all posts, which were 545, 365, and 404, respectively. We observed that the error cases tended to be longer than the correct cases.
  • For Twitter, we found that lack of context typically resulted in classification errors.
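
The length comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used in the study: the helper name and toy posts are hypothetical, and because the summary does not state the unit of length, the sketch defaults to character count and accepts any other length function.

```python
from statistics import mean


def mean_length_by_outcome(posts, y_true, y_pred, length=len):
    """Average post length for misclassified, correct, and all posts.

    `length` defaults to character count (an assumption); pass e.g.
    `lambda p: len(p.split())` to count whitespace-separated tokens.
    """
    wrong, right = [], []
    for post, truth, pred in zip(posts, y_true, y_pred):
        (right if truth == pred else wrong).append(length(post))
    return {
        "misclassified": mean(wrong) if wrong else None,
        "correct": mean(right) if right else None,
        "all": mean(wrong + right) if wrong or right else None,
    }


# Hypothetical toy posts with gold labels and model predictions.
posts = [
    "short post",
    "a much longer post with many more words in it",
    "mid length post here",
]
y_true = [1, 1, 0]
y_pred = [1, 0, 0]

summary = mean_length_by_outcome(posts, y_true, y_pred)
print(summary)
```

If misclassified posts are consistently longer, as reported above, one avenue worth investigating is whether long posts are being truncated or diluted before they reach the classifier.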

While the results we obtained are promising, further effort and improvements are needed to effectively utilize the opportunity presented by social media.