Red Dog Music | Oct 9, 2018
Sentiment analysis reveals the happiest and the saddest Christmas songs
What is it about Christmas songs? While many of us are filled with a complete hatred of them and pull one of a wide range of faces when we hear the first one of the season (by ‘the season’, it seems I’m referring to September these days), there’s something about a Christmas song that just seems to work with your festive shopping on the High Street.
Obviously, we at Red Dog Music have a wide and varied set of musical tastes, from the most esoteric of experimental electronica to the most poptastic of pop songs. Even so, there’s something about leafing through the Argos catalogue on Christmas Eve, furiously keying in stock-checks, that is better soundtracked by Slade and Johnny Mathis than it might be by Audion or Nils Frahm.
However, that’s not to say that all Christmas music is cut from the same saccharine cloth… Some of our most favourite Christmas songs take us all the way from the simple joy of walking in a winter wonderland where sleigh bells ring, to a place where someone gave away our heart the day after we gave it to them.
With that in mind, we put our data science hats on and set out to find which classic Christmas songs are the happiest, and which Christmas songs are the saddest using the wonderful world of sentiment analysis – a technique that compares text to a list of words associated with particular emotions, or with a general feeling of positivity or negativity.
Uncovering the happiest and saddest Christmas songs
Obviously, there are a lot of Christmas songs out there and, quite frankly, most of them probably aren’t what we’d call ‘popular’. In the end, we settled on a list of 47 Christmas songs, with a core selection taken from some of the usual suspects found on the Wikipedia page ‘List of Christmas hit singles in the United Kingdom’, supplemented with a few choice picks from various Christmas song lists and a few more to round out some decades and artist types so we could look for any hidden trends…
We furiously pasted song titles and lyrics into a spreadsheet, and fired the resulting csv file into R for some tidy analysis. Ready for the highlights? Let’s take a look at the charts:
The top 5 most positive Christmas song lyrics:
Peace on Earth / Little Drummer Boy – David Bowie and Bing Crosby
Mistletoe – Justin Bieber
Step Into Christmas – Elton John
Millennium Prayer – Cliff Richard
Mary’s Boy Child – Boney M.
The top 5 most negative Christmas song lyrics:
Lonely This Christmas – Mud
Christmas Lights – Coldplay
Christmas in Hollis – Run DMC
I Have Forgiven Jesus – Morrissey
Santa Claus is Coming to Town – Bruce Springsteen
Some obvious and some surprising findings in there. In the happy category, things were fairly tight, with Bowie and Crosby just edging it from the closely following crowd. In the negativity section, it’s perhaps unsurprising that Mud take the top spot – by a huge margin – and it makes sense that Mozza makes it into the group, but Santa Claus is Coming to Town? Surely some mistake?
While it’s a fantastic upbeat song – particularly the Springsteen version – let’s take a look at the lyrics a bit more closely:
You better watch out, you better not cry
Better not pout, I’m telling you why
What if I watch a heart-rending scene in a film? Am I not allowed to cry then? It all just seems a little bit unfair.
He sees you when you’re sleepin’
He knows when you’re awake
He knows if you’ve been bad or good
So be good for goodness sake
Sees you when you’re sleeping? Knows when you’re awake? That just sounds sinister.
So, there you go: while you thought it was the perfect Christmas soundtrack, it’s full of warnings about not doing things because you’re being watched. Sounds quite negative to me…
Delving deeper into the Christmas song data
Of course, now we’ve got a nice dataset, who says we have to stop there? Let’s see what else we can find…
First of all, if we take a general overview of the whole dataset, we can see just how miserable Lonely This Christmas is, but we can also see that our Christmas song lyrics are, overall, on the positive side of things.
Next up, let’s look and see if sentiment score correlates with peak chart position. It doesn’t really. The numbers show that you can have negative and positive UK number ones, and can have negative and positive songs that fail to make the top twenty.
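For anyone wanting to run a similar check on their own data, here is a minimal sketch using a rank correlation. The `chartData` values are invented placeholders, not our results, and Spearman's method is simply one sensible choice for ordinal data like chart positions, not necessarily the exact test behind the chart above.

```r
# invented placeholder values, NOT the results from this study
chartData <- data.frame(
  netSentiment = c(12, -30, 5, 8, -4, 15, 2, -9),
  peakPosition = c(1, 1, 4, 17, 2, 25, 3, 11)
)

# Spearman's rank correlation copes with ordinal data such as chart positions;
# exact = FALSE avoids warnings when values are tied
result <- cor.test(chartData$netSentiment, chartData$peakPosition,
                   method = "spearman", exact = FALSE)

result$estimate  # rho: values near 0 suggest little relationship
result$p.value   # large values mean no real evidence of a relationship
```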
What about if we look at sentiment scores over time? Have we become more bitter and cynical over time, or are we happier, cheerier souls?
Consolidating the data by decade suggests that the former may be the case: sentiment scores do seem to have been trending lower since the 1950s. Not only that, but once again you can see just how much of an outlier that Mud song is, all out on its own as a separate dot in the 1970s!
If we plot all the songs as individual points, we can see that the trend line is heading down. However, the numbers suggest that there is a 23% chance that this observation could be down to chance alone. A possible trend, perhaps, and an avenue for further investigation.
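If you want to test a trend like this yourself, a straight-line fit is one simple way to go about it. The sketch below uses invented placeholder values in a hypothetical `songs` data frame (not our dataset), and a plain linear model is an assumption on our part about a reasonable approach rather than a full account of the analysis.

```r
# invented placeholder values, NOT the results from this study
songs <- data.frame(
  year = c(1957, 1973, 1974, 1984, 1994, 2003, 2010, 2011),
  netSentiment = c(14, 9, -30, 6, 11, 3, -5, 8)
)

# fit a straight line: net sentiment as a function of release year
trend <- lm(netSentiment ~ year, data = songs)

# pull out the slope (change in sentiment per year) and its p-value
coefs <- summary(trend)$coefficients
slope <- coefs["year", "Estimate"]
pValue <- coefs["year", "Pr(>|t|)"]

slope
pValue
```

A negative slope means scores fall as the years go by; the p-value tells you how comfortably that could be put down to chance.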
Let’s just finish up by breaking the dataset down to as granular a level as we can get without running out of dimensions. The following plot puts sentiment score against year and shows peak chart position as the size of the point, but also breaks the data down by artist type, showing male and female solo performers, bands and groups (acts known more for singing than for playing their own instruments).
The caveats and disclaimers
In this study, we have analysed only a small proportion of the history of Christmas songs. As we were interested in studying popular Christmas songs, there was an inherent selection bias in that we chose famous, successful and charting songs for our analysis, rather than selecting songs at random.
Additionally, there will be differences in overall sentiment score between different versions of a song, or between different transcriptions of the lyrics, as each may contain different counts of the words used in the sentiment analysis. One method that has been used previously is to count only the presence of a word, not how many times it is used. The reason this was not done in this study is that, in my opinion, a song that says “I feel sad” a hundred times over 10 minutes is sadder than one that says it once in three minutes.
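To make that choice concrete, here is a minimal sketch of the two counting methods side by side. The two-song `toyLyrics` data frame is made up purely for illustration (it is not from our dataset); the rest uses tidytext and the ‘bing’ lexicon in the same way as the rest of this post.

```r
library(tidytext)
library(dplyr)

# made-up lyrics, purely for illustration
toyLyrics <- tibble(
  song = c("Long Sad Song", "Short Sad Song"),
  text = c("So sad, so sad, so sad", "So sad")
)

# one word per row, keeping only words found in the bing lexicon
scored <- toyLyrics %>%
  unnest_tokens(word, text) %>%
  inner_join(get_sentiments("bing"), by = "word")

# method used in this study: every occurrence of a word counts
byCount <- scored %>%
  count(song, sentiment)

# alternative: each word counts once per song, however often it is sung
byPresence <- scored %>%
  distinct(song, word, sentiment) %>%
  count(song, sentiment)

byCount
byPresence
```

Counting every occurrence separates the two songs, while counting presence alone scores them identically.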
Due to the small dataset used in this study, sentiment analysis was limited to an overall positive or negative sentiment using the ‘bing’ lexicon. Other, emotion-based lexicons could provide additional insight. Also, as the sentiment analysis looked at individual words in isolation rather than in the context of a sentence, some words could be mis-classified in some contexts.
Sentiment analysis of song lyrics using R and tidytext
If you fancy performing your own sentiment analyses, R and the tidytext package by Julia Silge and David Robinson make things fairly straightforward. Maybe you want to see whether Morrissey is more miserable now than heaven knows he was miserable then; perhaps you want to know if your own lyrics have become more cheery over time. Whatever insight you might hope to gain, R and tidytext can help you find it.
There is a great tutorial – on which this post is based – over on the tidytext documentation pages, so that’s perhaps where you should start. However, if you just want something that you should be able to copy, paste and run, create a .csv file called ‘lyrics’ with two columns, ‘song’ and ‘text’, import it into R as a data frame called lyrics and, fingers crossed, this code should spit out your sentiments:
# load required packages
library(tidytext)
library(dplyr)
library(tidyr)

# prepare sentiment lexicon
# (get_sentiments() is used because newer tidytext releases no longer
# include a 'lexicon' column in the sentiments dataset)
bing <- get_sentiments("bing")

# create tokenised lyrics of one word per row
tokenLyrics <- lyrics %>%
  unnest_tokens(word, text)

# perform sentiment analysis with an inner join
tokenLyrics <- tokenLyrics %>%
  inner_join(bing, by = "word")

# remove stop words
tokenLyrics <- tokenLyrics %>%
  anti_join(stop_words, by = "word")

# summarise sentiment counts
# (fill = 0 stops songs with no positive or no negative words scoring NA)
lyricSummary <- tokenLyrics %>%
  count(song, sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(netSentiment = positive - negative)

View(lyricSummary)
Wordclouds were created with the wordcloud2 package, and ggplot2 was used for all graphs and charts. Of course, rather than just copy, paste and run, time might be well spent looking at the top words in the analysis and making sure that they make sense in the context of the song.
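As a starting point for that check, here is a minimal sketch that lists the matched words by frequency so you can eyeball them against the lyrics. The `toyLyrics` data frame and its song title are made up for illustration; swap in your own lyrics data frame.

```r
library(tidytext)
library(dplyr)

# made-up lyrics, purely for illustration
toyLyrics <- tibble(
  song = "Toy Carol",
  text = "Happy, happy Christmas, don't be sad"
)

# count how often each sentiment word appears, most frequent first
topWords <- toyLyrics %>%
  unnest_tokens(word, text) %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(song, word, sentiment, sort = TRUE)

print(topWords)
```

Notice that ‘sad’ still counts as a negative word here even though the line says “don't be sad”, which is exactly the kind of context problem worth looking out for.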