Oncology/Endocrine
Association for Academic Surgery
A pattern-matched Twitter analysis of US cancer-patient sentiments

https://doi.org/10.1016/j.jss.2016.06.050

Abstract

Background

Twitter has been recognized as an important source of organic sentiment and opinion. This study aimed to (1) characterize the content of tweets authored by cancer patients in the United States; and (2) use patient tweets to compute the average happiness of cancer patients for each cancer diagnosis.

Methods

A large sample of English tweets from March 2014 through December 2014 was obtained from Twitter. Using regular-expression pattern matching, the tweets were filtered by cancer diagnosis. For each cancer-specific tweet set, individual patients were extracted and the content of their tweets was categorized. The patients' Twitter identification numbers were used to gather all tweets for each patient, and happiness values for patient tweets were calculated using a quantitative hedonometric analysis.
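A common formulation of hedonometric averaging (assumed here; the exact procedure is not detailed in this section) is a frequency-weighted mean: if word w_i has happiness score h(w_i) and appears f_i times in a set of tweets, then h_avg = Σ h(w_i) f_i / Σ f_i. The Python sketch below illustrates that calculation. The lexicon `HAPPINESS` holds hypothetical placeholder scores on a 1 to 9 scale, and the helpers `tokenize` and `average_happiness`, as well as the optional neutral-word `lens`, are illustrative assumptions rather than the study's actual word list or settings.

```python
import re
from collections import Counter

# Hypothetical word-happiness scores on a 1-9 scale (5 = neutral). These are
# placeholder values for illustration, not entries from any published lexicon.
HAPPINESS = {
    "love": 8.4,
    "happy": 8.3,
    "hope": 7.4,
    "today": 5.0,
    "chemo": 3.0,
    "sick": 2.5,
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters/apostrophes, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def average_happiness(texts, scores, lens=None):
    """Frequency-weighted mean happiness over all scored words in `texts`.

    `lens` is an optional (low, high) band (an assumption for illustration);
    words scoring strictly inside it are ignored as near-neutral.
    """
    counts = Counter(tok for text in texts for tok in tokenize(text))
    weighted_sum = 0.0
    total_freq = 0
    for word, freq in counts.items():
        h = scores.get(word)
        if h is None:
            continue  # words missing from the lexicon contribute nothing
        if lens is not None and lens[0] < h < lens[1]:
            continue  # skip near-neutral words inside the lens
        weighted_sum += h * freq
        total_freq += freq
    if total_freq == 0:
        raise ValueError("no scored words found")
    return weighted_sum / total_freq


if __name__ == "__main__":
    tweets = ["Finished chemo today, so happy!", "I love my nurses"]
    print(round(average_happiness(tweets, HAPPINESS), 3))
    print(round(average_happiness(tweets, HAPPINESS, lens=(4.0, 6.0)), 3))
```

Weighting by frequency means a word repeated across many tweets pulls h_avg more strongly than a word used once; unscored words simply drop out of both sums.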

Results

The most frequently tweeted cancers were breast (n = 15,421, 11% of total cancer tweets), lung (n = 2928, 2.0%), prostate (n = 1036, 0.7%), and colorectal (n = 773, 0.5%). Patient tweets most often pertained to the treatment course (n = 73, 26%), followed by diagnosis (n = 65, 23%) and surgery and/or biopsy (n = 42, 15%). Computed happiness values for each cancer diagnosis revealed higher average happiness for thyroid (h_avg = 6.1625), breast (h_avg = 6.1485), and lymphoma (h_avg = 6.0977) and lower average happiness for pancreatic (h_avg = 5.8766), lung (h_avg = 5.8733), and kidney (h_avg = 5.8464) cancers.

Conclusions

The study confirms that patients are expressing themselves openly on social media about their illness and that different cancer diagnoses are correlated with varying degrees of happiness. Twitter can be employed as a tool to identify patient needs and as a means to gauge the cancer patient experience.

Introduction

Twitter (www.twitter.com) is a well-known online microblogging social media platform that currently has 320 million monthly active users. The service allows users to send short messages called “tweets” that are limited to 140 characters; approximately 500 million tweets are sent per day.1 The Pew Research Center, which tracks social media usage among US adult internet users, reported that Twitter usage increased significantly from 18% to 23% over the past year, including a significant increase from 5% to 10% among users older than 65 y.2 Indeed, there has been wide recognition that Twitter is a powerful gauge of public sentiment across a spectrum of current social and medical issues: the impact of socioeconomic factors on happiness,3 climate policies,4 election results,5, 6 opioid abuse,7 public perception of immunizations,8 enrollment in Affordable Care Act marketplaces,9 perceptions of e-cigarettes,10 trending infectious diseases,11, 12, 13 obesity, and allergies.14 Diez et al.15 sought to qualitatively characterize the content of breast cancer, colorectal cancer, and diabetes social groups on Facebook and Twitter.

Twitter has also been used in the cancer arena to better understand breast cancer awareness month16 and to qualitatively categorize patient dialog about cervical and breast cancer screening.17 Finally, researchers have begun to examine the interconnectedness of cancer patients on Twitter and to characterize those relationships.18 As patients increasingly turn to social media to express their health care concerns, we sought to test the Twittersphere as a means of collecting and describing the content of patient tweets and of analyzing patients' health sentiments with respect to the leading cancer diagnoses documented by the National Cancer Institute.19 We hypothesized that the most prevalent cancers would be the most frequently tweeted and that patient happiness values would vary by cancer diagnosis.

Methods

A large sample of English tweets from March 2014 to December 2014 with embedded location coordinates (“geotagged”) was obtained from Twitter's streaming application programming interface. Pattern matching using “cancer” as a keyword returned 186,406 tweets. Relevant cancer-related tweets were then filtered from the data stream using case-insensitive regular-expression pattern matching in Perl, together with tokenization algorithms that stripped punctuation. Tweets from countries other than the
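The filtering step described above can be sketched in Python (the study itself used Perl). The sketch assumes one case-insensitive regular expression per diagnosis, applied after punctuation has been stripped so that a hashtag such as "#BreastCancer" still matches. The dictionary `DIAGNOSIS_PATTERNS` and the helpers `normalize`, `classify`, and `group_by_diagnosis` are hypothetical; the study's actual expressions are not reported in this section.

```python
import re
import string

# Hypothetical diagnosis patterns for illustration only. \s* lets a pattern
# match both "breast cancer" and a de-punctuated hashtag like "BreastCancer".
DIAGNOSIS_PATTERNS = {
    "breast": re.compile(r"\bbreast\s*cancer\b", re.IGNORECASE),
    "lung": re.compile(r"\blung\s*cancer\b", re.IGNORECASE),
    "prostate": re.compile(r"\bprostate\s*cancer\b", re.IGNORECASE),
    "colorectal": re.compile(r"\b(?:colon|rectal|colorectal)\s*cancer\b", re.IGNORECASE),
}

# Translation table that deletes ASCII punctuation characters.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(tweet: str) -> str:
    """Remove punctuation and collapse runs of whitespace to single spaces."""
    return " ".join(tweet.translate(_STRIP_PUNCT).split())


def classify(tweet: str) -> set[str]:
    """Return every diagnosis whose pattern matches the normalized tweet."""
    text = normalize(tweet)
    return {dx for dx, pattern in DIAGNOSIS_PATTERNS.items() if pattern.search(text)}


def group_by_diagnosis(tweets):
    """Build one tweet set per diagnosis; a tweet may land in several sets."""
    groups = {dx: [] for dx in DIAGNOSIS_PATTERNS}
    for tweet in tweets:
        for dx in classify(tweet):
            groups[dx].append(tweet)
    return groups


if __name__ == "__main__":
    sample = ["Day 3 of chemo for #BreastCancer!", "Cancer research funding news"]
    for tweet in sample:
        print(tweet, "->", sorted(classify(tweet)))
```

A tweet that mentions "cancer" without a recognized diagnosis matches no pattern, so a keyword pre-filter like the one described above can be followed by this finer, per-diagnosis split.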

Results

The most frequently tweeted cancers were breast (n = 15,421), lung (n = 2928), prostate (n = 1036), and colon and/or rectal (n = 773; Table 1). Patients were manually extracted for each unique cancer diagnosis, yielding a total of 161 patients for breast cancer, although these represented only a small fraction of the total breast cancer tweets (1.0%). By contrast, for endometrial cancer, 10 patients were identified out of 43 total tweets (23.3%). After manually extracting patients for each cancer
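The patient proportions above are the number of identified patients divided by the diagnosis-specific tweet count:

```latex
\frac{161}{15{,}421} \approx 0.0104 \approx 1.0\%, \qquad
\frac{10}{43} \approx 0.233 = 23.3\%
```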

Discussion

This study investigated the most commonly tweeted cancers and identified patients with active disease as well as those in remission. That breast cancer was the top-tweeted cancer was not surprising, considering that breast cancer is one of the most prevalent cancer types,19 the extensive public awareness surrounding the disease, and the highly publicized and endorsed October breast cancer awareness month. The national incidence of lung, prostate, and colorectal cancer is most likely

Conclusions

The most frequently tweeted cancers are breast, lung, and prostate cancer, and the most common theme of the tweets is sharing about the treatment course. A hedonometric analysis of cancer-patient tweets demonstrated interdiagnosis variability, confirming that the inherent natural history of each disease affects patient sentiments in unique ways. This preliminary study shows that patients do broadcast their illness through social media and that Twitter can and should be used as a source to gauge patient

Acknowledgment

This work was supported in part by the National Institutes of Health (NIH) Research Awards R01DA014028 & R01HD075669 and by Center of Biomedical Research Award P20GM103644 from the National Institute of General Medical Sciences to C.J.

Authors' contributions: W.C.C. authored the manuscript, performed the pattern matching, patient extraction, and tweet categorization. E.C. collected the Twitter data, performed the hedonometric analysis, created the word-shift graphs, and provided technical

References (26)

  • C.A. Wong et al. Twitter sentiment predicts Affordable Care Act marketplace enrollment. J Med Internet Res (2015)
  • H. Cole-Lewis et al. Social listening: a content analysis of e-cigarette discussions on Twitter. J Med Internet Res (2015)
  • V. Lampos et al.
1 Present address: Oregon Health and Science University, 3181 SW Sam Jackson Park Rd, Portland, OR 97239.