Medical Needles in the Social Media Haystack

Gwenyth Mercep ’22

Figure 1: Social media use allows unique insight into the health of our world

In today’s digital age, a great deal of potentially useful information is in circulation. With many of us compelled to share our thoughts on social media platforms like Facebook, the landscape of data research is being revolutionized. Researchers from the University of Pennsylvania explored this change by testing whether the language people post on Facebook can indicate their health. Their research asks whether Facebook data could provide a medical lens into the typically elusive daily lives of individuals. The questions posed were, “Can we predict individuals’ medical diagnoses from language posted on social media?” and, “Can we identify specific markers of disease from social media posts?” [1].

Adult patients seeking care at a health organization were invited to share their past social media activity and electronic medical record (EMR) data [1]. From the EMRs, age, sex, race, and prior diagnoses were recorded [1]. Medical conditions from the EMRs, grouped into 21 broad categories, were evaluated for how well they could be predicted from social media content, which comprised approximately 20 million words written by 999 consenting individuals [1]. Merchant et al. analyzed language patterns within clusters of related words, such as “stomach,” “head,” and “hurt,” along with their statistical association with the diagnosis categories in the medical records [1]. For each medical condition, three predictive models associating Facebook posts with EMR-based diagnoses were used [1]. Of the 21 categories, 18 were successfully predicted using Facebook language with statistical significance [1]. The medical conditions with the highest prediction accuracy were diabetes, pregnancy, anxiety, psychoses, and depression [1]. Certain language was associated with particular categories; religious language, for example, was associated with diabetes. Not everyone who used these words had the condition, but the likelihood of having diabetes was higher among those who did.
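To make the idea of a word cluster being associated with a diagnosis concrete, here is a minimal sketch in Python. It is not the study's method: the study used predictive models, while this only compares how often a diagnosis appears among people who use a cluster's words versus people who do not. The cluster, the function names, and the six toy records are all invented for illustration.

```python
# Hypothetical word cluster, borrowed from the example words in the text.
# The study derived its clusters from the data; this set is hand-picked.
TOPIC = {"stomach", "head", "hurt"}


def topic_rate(posts):
    """Return the fraction of a person's words that fall in TOPIC."""
    words = [w.strip(".,!?").lower() for post in posts for w in post.split()]
    if not words:
        return 0.0
    return sum(w in TOPIC for w in words) / len(words)


def diagnosis_rate_by_usage(people, threshold=0.0):
    """Share of people with the diagnosis, split by whether their
    topic rate exceeds the threshold."""
    groups = {True: [0, 0], False: [0, 0]}  # uses_topic -> [diagnosed, total]
    for posts, diagnosed in people:
        key = topic_rate(posts) > threshold
        groups[key][0] += int(diagnosed)
        groups[key][1] += 1
    return {k: (d / n if n else None) for k, (d, n) in groups.items()}


# Invented toy records: (posts, has_diagnosis). Not data from the study.
people = [
    (["my head hurt all day"], True),
    (["stomach ache again"], True),
    (["my head is fine"], False),
    (["great game last night"], False),
    (["new job, so excited"], False),
    (["lunch with friends"], True),
]

rates = diagnosis_rate_by_usage(people)
print(rates)
```

In this toy data the diagnosis is more common among people who use the cluster's words, yet some users of those words are undiagnosed and some non-users are diagnosed. That mirrors the caveat above: the language shifts the odds without settling any individual case.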

This use of social media data reveals elements of disease in patients’ daily lives that would otherwise be invisible to clinicians and medical researchers. The analysis encourages medicine to look up and out, beyond the walls of the doctor’s office. Much like genetic information, patterns of language can serve as markers of disease risk and as a screening tool [1]. In the case of diabetes, the association with religious language gives public health professionals a reason to explore the role of religion in diabetes management. More broadly, the findings suggest that social media use, like genetic information, can be a source of health markers, which points toward future applications of AI in medicine.

[1] Merchant RM, et al., “Evaluating the predictability of medical conditions from social media posts”, PLOS ONE 14, 1-8 (2019). DOI:

[2] Image retrieved from:

