Sentiment Analysis on Singaporean Twitter’s Reactions related to COVID-19

I recently became acquainted with the BSI Sentiment Analysis Pipeline created at BSI Bocconi (thanks Mathilde! 😉) and having failed before at scraping Twitter’s data, I decided to give this tool a shot. It turned out to be a gift that kept on giving.

My initial idea was simply to download tweets from Twitter and store them in a structured, tabular form. While that sounds straightforward, many of the tools currently available are outdated and no longer work, require a Twitter developer account, or involve other cumbersome steps. The BSI Sentiment Python library works after these 2 lines (run in a Jupyter notebook under Anaconda):

!pip install bsi-sentiment --upgrade

from bsi_sentiment.twitter import search_tweets_sn

Their GitHub page shows an example of downloading tweets from Twitter, which I’ve modified into the following for a Singaporean context:

tweets = search_tweets_sn(
  q="singapore",
  since="2020-01-01",
  until="2020-12-31",
  near="Singapore",
  radius="200km",
  lang="en",
  max_tweets=-1
)

tweets.get_sentiment(method="vader")
tweets.to_csv("./results.csv")

With this chunk of code, tweets made in 2020 (presumably from public accounts) within a 200km radius of Singapore and containing the word “singapore” are downloaded and stored in a .csv file named “results” on my local drive. Not only that, any keen eye would have spotted the get_sentiment() method. This allowed me to also obtain a “sentiment score” for every downloaded tweet! Exciting! 🤩

Sentiment Analysis

In a brief two-liner, sentiment analysis in this context is simply assigning a score to a tweet based on its content. Depending on the score assigned, the tweet is then classed as negative (score <= -0.05), neutral (score between -0.05 and 0.05) or positive (score >= 0.05). The scoring algorithm is based on VADER, “a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media”. More details available here.
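The thresholds above can be written down as a small helper. This is only a sketch of the labelling rule described in this post, not code taken from the BSI library: the function name label_polarity and its threshold parameter are my own, and I am assuming the score is VADER's compound score, which lies between -1 and 1.

```python
def label_polarity(score, threshold=0.05):
    """Map a sentiment score to a label using the cut-offs quoted above.

    Assumption: `score` is a VADER-style compound score in [-1, 1].
    """
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


# Made-up scores, purely for illustration.
for s in (-0.6, -0.05, 0.0, 0.049, 0.05, 0.8):
    print(s, label_polarity(s))
```

Scores exactly at -0.05 or 0.05 fall into the negative and positive classes respectively, so the neutral band is strictly between the two cut-offs.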

Chinese New Year Tweets

Since I’m now able to easily obtain a sentiment score for every tweet that I’ve downloaded, it quickly occurred to me to compare sentiments surrounding CNY over the years, especially in 2020 and 2021. CNY 2020 in Singapore happened when the COVID-19 pandemic was first picking up, while CNY 2021 happened with some restrictions imposed by the Singaporean government. Here’s some visualisation:

Sentiments surrounding CNY, 6 months before and 14 days after

Each dot represents a tweet. A dot with “polarity” at or below -0.05 is interpreted as a negative tweet, one between -0.05 and 0.05 as a neutral tweet, and one at or above 0.05 as a positive tweet. High intensity in colour is due to overlapping dots, indicating high tweet frequency during that time.

It’s quickly apparent that tweets about CNY are generally made about a month before CNY, peaking in frequency around CNY itself (when “Day from CNY” = 0) and are generally neutral to positive, seemingly regardless of the COVID-19 pandemic. The spread in polarity seems similar across all years. There are notably fewer positive tweets in 2021 (indicated by the lower intensity in colour) but it’s probably attributable to lower tweet counts in general and not a shift from positive to neutral or negative sentiments.
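For anyone wanting to rebuild the “Day from CNY” axis, here is a rough sketch of how the offset and the plotting window could be computed. The names CNY_DATES, day_from_cny and in_window are hypothetical, only the 2020 and 2021 dates are filled in (the first day of CNY fell on 25 January 2020 and 12 February 2021), and treating “6 months before” as 183 days is my own approximation.

```python
from datetime import date

# First day of Chinese New Year; extend with other years as needed.
CNY_DATES = {
    2020: date(2020, 1, 25),
    2021: date(2021, 2, 12),
}


def day_from_cny(tweet_date, cny_year):
    """Signed number of days between a tweet and that year's CNY (0 = CNY itself)."""
    return (tweet_date - CNY_DATES[cny_year]).days


def in_window(tweet_date, cny_year, days_before=183, days_after=14):
    """True if the tweet falls within the plotted window around CNY.

    Assumption: "6 months before" is approximated as 183 days.
    """
    offset = day_from_cny(tweet_date, cny_year)
    return -days_before <= offset <= days_after


print(day_from_cny(date(2021, 2, 1), 2021))   # a tweet shortly before CNY 2021
print(in_window(date(2020, 2, 9), 2020))      # 15 days after CNY 2020
```

Tweets that pass in_window can then be plotted with day_from_cny on the x-axis and the polarity score on the y-axis, one panel per year.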

Circuit Breaker

I wanted to investigate the scoring methodology a little further so I decided to see what people were saying about the circuit breaker measure Singapore had implemented back in April 2020.

Sentiments surrounding circuit breaker

In brief, strict restrictions, termed collectively as the circuit breaker (CB), were announced on 3rd April (first red line) and enacted on 7th April (second red line). Their extension was announced on 21st April (third red line).

As expected, many tweets containing the term “circuit breaker” started to show up after the first CB announcement and continued until some time after the announcement of its extension. What was curious to me, however, was the high number of positive tweets relative to negative ones regarding the CB, even after its extension. I did not expect Singaporeans to tweet favourably about such a measure, and even less so after it got extended.
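One way to put numbers on this impression is to bucket tweets by CB phase and count the sentiment labels in each bucket. The sketch below uses the three dates marked by the red lines; the phase names, the cb_phase helper and the sample (date, polarity) pairs are all made up for illustration and are not taken from my downloaded data.

```python
from bisect import bisect_right
from collections import Counter
from datetime import date

# The three red lines: announcement, start, and extension announcement.
CB_MILESTONES = [date(2020, 4, 3), date(2020, 4, 7), date(2020, 4, 21)]
PHASES = ["before announcement", "announced", "in force", "extension announced"]


def cb_phase(tweet_date):
    """Return the CB phase a tweet falls into (a milestone date starts its phase)."""
    return PHASES[bisect_right(CB_MILESTONES, tweet_date)]


def label(polarity):
    """Same cut-offs as in the Sentiment Analysis section."""
    if polarity >= 0.05:
        return "positive"
    if polarity <= -0.05:
        return "negative"
    return "neutral"


# Hypothetical (tweet date, polarity) pairs, NOT real results.
sample = [
    (date(2020, 4, 1), 0.0),
    (date(2020, 4, 4), 0.6),
    (date(2020, 4, 10), -0.4),
    (date(2020, 4, 25), 0.3),
]

counts = Counter((cb_phase(d), label(p)) for d, p in sample)
for (phase, sentiment), n in sorted(counts.items()):
    print(phase, sentiment, n)
```

Comparing the positive and negative counts within each phase would show whether the share of positive tweets really holds up after the extension was announced.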

Perhaps the VADER model doesn’t understand the Singaporean context?

As the positive sentiments didn’t make immediate sense to me, I created a word cloud to explore some of the most common words in these CB-related tweets.

Common words in CB-related tweets

The size of each word represents how frequently it appears in the CB-related tweets. This word cloud seems to lend a bit more credence to the positive sentiments picked up earlier, with frequent words including “love”, “thank”, “good” and “happy”. There is nothing jarringly negative except perhaps “lockdown”, though I’m not entirely sure how the VADER model would rate this word.
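The counts behind a word cloud like this can be produced with nothing more than the standard library. The following is a minimal sketch, not the exact code I used: the word_frequencies helper, the tiny stopword list and the sample tweets are all invented for illustration.

```python
import re
from collections import Counter

# A deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "for", "in", "this", "so", "my"}


def word_frequencies(tweets, stopwords=STOPWORDS):
    """Count lower-cased words across tweets, skipping stopwords and single letters."""
    counts = Counter()
    for text in tweets:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in stopwords and len(t) > 1)
    return counts


# Made-up tweets, NOT taken from the downloaded data.
sample_tweets = [
    "Thank you to all the frontliners during this circuit breaker",
    "Circuit breaker extended, stay happy and stay home",
    "Love the good food deliveries this lockdown",
]

freqs = word_frequencies(sample_tweets)
print(freqs.most_common(5))
```

The resulting counts are what a word cloud tool would scale the word sizes by; dropping the search terms themselves (“circuit”, “breaker”) would keep them from dominating the picture.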

In sum

I got disproportionately excited about the BSI Sentiment Python library. One idea led to another, and it all culminated in this very insightful exercise. Kudos to the BSI Bocconi team for putting this library together and I hope it remains functional for a long time to come! Please also feel free to comment if you spot any mistakes or have other interesting views to share. (Apparently it also pays to be a busybody on LinkedIn, so please keep sharing or liking useful tools on there! 😉)
