

by Graham Cookson
in Big Data, Hadoop, Apache
on 07 May 2015

General Election 2015: social media report

Over the last year, our in-house team of developers has been refining and evolving our big data analytics engine and, as we draw closer to the launch of our first product, we thought it was about time to put it to the test. To put a twist on things, we decided not to test it on our own product (a financial platform); instead, we wanted to see what insights we could uncover in other industries.

With the buzz around the General Election hotting up over the last couple of months, we decided to point our big data analytics engine at the chatter that’s been happening around the televised Leaders’ Debates and Q&A sessions with the public.

Why look at the General Election?

Why did we choose this as a test bed for our platform? Put simply, we saw it as a great opportunity to validate our proprietary technology and platform.

You see, because our first product has a strong focus on the financial industry, we have spent most of our time directing our engine at that sector. We designed and developed our technology so that it can cater for multiple industries, but much of the data we gather relates to ‘unknown’ events, and we needed to ensure that the insights we were producing were both valuable and accurate.

With the General Election, we have a fixed period around a known event, which lets us test our platform and helps validate how it handles the unknown events it usually deals with on a daily basis.

It’s also a topic of popular interest right now, with a fixed shelf life (ending when voting closes on May 7, 2015) and a high volume of people talking about it on social media, which offered us a chance to collect and observe large spikes of chatter and sentiment around a single topic.

And because it’s a popular topic, we knew that the data gathered could be distilled into useful, bite-sized insights that would be of value to us and to others.

Finally, we saw it as a good assessment of our Hadoop infrastructure – gathering millions of social messages and testing the speed and efficiency of our cluster.

How did we do it?

Our team started by collecting data about the election from a wide range of social media platforms. We searched for an array of trending hashtags (e.g. #GE2015), as well as politicians’ names and social media IDs.

We also tracked mentions of political parties, which allowed us to compare how party leaders were doing against their parties. This was especially useful during the debates, when the leaders garnered even more mentions than their respective parties, as it helped show the public’s sentiment towards each individual leader, not just the party as a whole.

Crunching the volume of data generated during the election posed a huge number of challenges. One thing we realised early on is that we needed to make sure everything we analysed related to the election: we used a range of tools to sift through our data and found that around half of what we collected wasn’t relevant. We also needed a way to process the data in real time so we could produce useful insights during key moments such as the Leaders’ Debates.
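The exact filtering tools aren't described here. As a minimal, hypothetical sketch in Python, assuming a simple keyword heuristic (the term lists and the `is_relevant` and `filter_relevant` functions are illustrative only, not our production rules), a relevance filter could keep a message if it carries an election hashtag, or if it mentions a party or leader alongside an election-context word:

```python
import re

# Hypothetical term lists for illustration only, not the lists used in our analysis.
ELECTION_HASHTAGS = {"#ge2015", "#leadersdebate", "#bbcqt"}
ENTITY_TERMS = {"labour", "conservative", "conservatives", "tory", "tories",
                "snp", "ukip", "libdem", "libdems", "green", "greens",
                "cameron", "miliband", "clegg", "farage", "sturgeon"}
CONTEXT_TERMS = {"election", "vote", "voting", "debate", "poll", "polls",
                 "manifesto", "campaign", "mp", "mps", "party", "leader"}

TOKEN_RE = re.compile(r"#?\w+")


def tokens(text: str) -> set:
    """Lower-case the text and split it into words and hashtags."""
    return set(TOKEN_RE.findall(text.lower()))


def is_relevant(text: str) -> bool:
    """Keyword heuristic: is this message plausibly about the election?"""
    words = tokens(text)
    # An explicit election hashtag is treated as sufficient on its own.
    if words & ELECTION_HASHTAGS:
        return True
    # Otherwise require both a party/leader term and an election-context term,
    # so that off-topic uses such as "green tea" are discarded.
    return bool(words & ENTITY_TERMS) and bool(words & CONTEXT_TERMS)


def filter_relevant(messages):
    """Return only the messages that pass the relevance heuristic."""
    return [m for m in messages if is_relevant(m)]
```

Requiring two kinds of term stops ambiguous words such as "green" from pulling in unrelated chatter, at the cost of missing some genuinely relevant messages that use neither an election hashtag nor a context word.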

To achieve this, we used our Hadoop stack, which enabled us to distribute a single job across a large number of machines and so process all of our data in real time. We calculated that we could use it to process over 22 billion pieces of data per day (although we ‘only’ collected around 200 million!), which allowed us to monitor this election at an unprecedented level.
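To illustrate the idea of spreading one job over many machines, here is a hypothetical word-count-style MapReduce sketch in Python, written in the Hadoop Streaming style (tab-separated key/value lines on standard input and output). The keyword-to-entity mapping and the function names are assumptions for illustration, not our actual job. For scale, 22 billion items a day works out to roughly 255,000 items per second (22,000,000,000 ÷ 86,400 seconds).

```python
import re
import sys
from itertools import groupby

# Hypothetical keyword -> entity mapping, for illustration only.
ENTITIES = {"snp": "SNP", "sturgeon": "Nicola Sturgeon",
            "ukip": "UKIP", "farage": "Nigel Farage"}

WORD_RE = re.compile(r"\w+")


def mapper(lines):
    """Map step: emit 'entity<TAB>1' for each entity mentioned in a message."""
    for line in lines:
        words = set(WORD_RE.findall(line.lower()))
        for keyword, entity in ENTITIES.items():
            if keyword in words:
                yield f"{entity}\t1"


def reducer(sorted_lines):
    """Reduce step: sum the counts for each key.

    Input must be sorted by key, which Hadoop's shuffle phase guarantees.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


def run(stage, stream=sys.stdin, out=sys.stdout):
    """Entry point for a streaming script: stage is 'map' or 'reduce'."""
    step = mapper if stage == "map" else reducer
    for record in step(stream):
        print(record, file=out)
```

Locally, `list(reducer(sorted(mapper(messages))))` simulates the whole job on one machine; under Hadoop Streaming, a small script calling `run(sys.argv[1])` could be supplied as the `-mapper` and `-reducer` commands, with the framework handling the sort and the distribution across the cluster.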

Finally, we needed to work out how to use all this data. Having so much raw information allowed us to build a general picture of how things were going, but also to zoom in on important points such as the Leaders’ Debates and the Q&A TV spots. We decided to focus on how many mentions parties and leaders received on social media, and on whether the sentiment of those mentions was positive or negative.
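The engine's actual sentiment model isn't described here. As a deliberately simple stand-in, the sketch below is a hypothetical lexicon-based classifier in Python: it labels a message positive, negative or neutral by counting hits against two small word lists (assumptions, not our real lexicon), then tallies the labels for messages that mention a given party or leader.

```python
import re
from collections import Counter

# Tiny hypothetical word lists, for illustration only.
POSITIVE = {"great", "good", "strong", "brilliant", "impressive", "win", "won", "best"}
NEGATIVE = {"bad", "weak", "awful", "terrible", "lost", "worst", "lies", "boring"}

WORD_RE = re.compile(r"[a-z']+")


def sentiment(text: str) -> str:
    """Label a message by comparing positive and negative lexicon hits."""
    words = WORD_RE.findall(text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # no clear positive or negative bias


def tally(messages, entity_keyword: str) -> Counter:
    """Count mentions of one entity, broken down by sentiment label."""
    counts = Counter()
    for text in messages:
        if entity_keyword in WORD_RE.findall(text.lower()):
            counts[sentiment(text)] += 1
    return counts
```

Simple counting like this misses sarcasm and negation ("not great" scores as positive), so it is only a starting point for the kind of positive/negative split described above.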

Insights gained

During the first all-party Leaders’ Debate (broadcast on April 2, 2015), we looked at the number of mentions each party and its leader received on social media. The results showed that mentions for most parties remained steady, with a couple of exceptions. Mentions for UKIP jumped at around 20:51, when Nigel Farage talked about health tourism.

What proved even more interesting was how mentions of the SNP and Nicola Sturgeon rose steadily during the debate; the following graph highlights the upward trend in SNP mentions:

[Chart: mentions of the SNP and Nicola Sturgeon during the first Leaders’ Debate]

As the debate went on, more and more people were talking about the SNP and Nicola Sturgeon on social media. By the end of the debate, the SNP were getting more mentions than UKIP, which had been the most talked-about party for most of the debate.

The following chart shows each party ranked by number of mentions over five-minute periods during the debate – it shows the SNP moving through the ranks as the debate went on:

[Chart: parties ranked by number of mentions in each five-minute period of the debate]
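A ranking like the one described above can be produced by rounding each mention's timestamp down to its five-minute window, counting mentions per party within each window, and sorting. The Python sketch below is a minimal illustration under that assumption; `window_start` and `rank_by_window` are hypothetical names, and it expects a stream of (timestamp, party) pairs that have already been filtered and attributed to a party.

```python
from collections import Counter, defaultdict
from datetime import datetime

WINDOW_MINUTES = 5


def window_start(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its five-minute window."""
    return ts.replace(minute=ts.minute - ts.minute % WINDOW_MINUTES,
                      second=0, microsecond=0)


def rank_by_window(mentions):
    """Rank parties by mention count within each five-minute window.

    mentions: iterable of (timestamp, party) pairs.
    Returns {window_start: [party, ...]} ordered from most to least mentioned.
    """
    windows = defaultdict(Counter)
    for ts, party in mentions:
        windows[window_start(ts)][party] += 1
    return {start: [party for party, _ in counts.most_common()]
            for start, counts in sorted(windows.items())}
```

Ties within a window stay in first-seen order, because `Counter.most_common` orders elements with equal counts in the order they were first encountered.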

We could also look at sentiment towards parties and their leaders during the debate and, again, the SNP outperformed all the other parties. The following chart shows positive, negative and neutral mentions on social media for all parties and leaders (neutral mentions being those with no clear positive or negative bias). The outer ring shows mentions of parties and the inner ring shows mentions of party leaders:

[Chart: positive, negative and neutral mentions for all parties (outer ring) and party leaders (inner ring)]

Compare this with the same chart just looking at the SNP and Nicola Sturgeon:

[Chart: positive, negative and neutral mentions for the SNP (outer ring) and Nicola Sturgeon (inner ring)]

The overall sentiment was more strongly positive, with 40% of social media mentions of Nicola Sturgeon assessed as showing a positive bias.
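For clarity, a share like the 40% quoted above is the number of mentions in one sentiment class divided by all mentions of that leader or party. A minimal Python sketch (the function name and the example counts below are illustrative, not our data):

```python
from collections import Counter

LABELS = ("positive", "negative", "neutral")


def sentiment_shares(counts: Counter) -> dict:
    """Express positive/negative/neutral counts as percentages of all mentions."""
    total = sum(counts[label] for label in LABELS)
    if total == 0:
        # Avoid dividing by zero when an entity received no mentions.
        return {label: 0.0 for label in LABELS}
    return {label: 100 * counts[label] / total for label in LABELS}
```

With made-up counts of 8 positive, 5 negative and 7 neutral mentions, `sentiment_shares(Counter(positive=8, negative=5, neutral=7))` gives 40.0, 25.0 and 35.0 per cent respectively.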

Looking at the whole campaign, it’s clear how this debate raised awareness of the SNP on social media. This chart shows the number of daily mentions of each party during the election campaign (note: there is a small gap in the dates at the end of April, when we had some downtime while we upgraded our Hadoop stack in preparation for the final night of TV debates):

[Chart: daily mentions of each party during the election campaign]

After the debate on April 2, daily mentions of the SNP averaged more than double the pre-debate daily average. Social media analysis can clearly show the impact of specific events during a campaign, and whether those events have a lasting effect on the mood of the electorate.
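A before-and-after comparison like this can be computed by averaging daily mention counts on either side of the event. In the Python sketch below, the function name is hypothetical and we assume the event day itself is excluded from both averages. Days with no data (such as our downtime at the end of April) are simply absent from the input rather than counted as zero, so they don't drag the average down.

```python
from datetime import date
from statistics import mean


def before_after_ratio(daily_counts: dict, event_day: date) -> float:
    """Ratio of average daily mentions after an event to the average before it.

    daily_counts maps date -> mention count; missing days are skipped.
    The event day itself is excluded from both sides (an assumption).
    Raises statistics.StatisticsError if either side has no data.
    """
    before = [n for d, n in daily_counts.items() if d < event_day]
    after = [n for d, n in daily_counts.items() if d > event_day]
    return mean(after) / mean(before)
```

With made-up counts averaging 110 a day before the debate and 240 a day after, the function returns about 2.18, which is what "more than double" means here.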

Sharing our insights

After running our engine during the first Leaders’ Debate, broadcast live on ITV (April 2, 2015), we decided to share our findings with the world, via social media, to see who (if anyone) would find our data as interesting as we found it.

We are happy to say that we received a positive response, with several retweets and one of our charts even being picked up by the Daily Mail for use in their report on the night.


The second Leaders’ Debate brought several retweets and favourites from people interested in the data and, excitingly for us, on the strength of the insights we had been producing, the internet and DAB radio station Share Radio invited our CEO, Gareth Mann, in for an interview about our technology, what we’ve been doing around the General Election and the launch of our first product.


Possibly the best traction we saw came from the third set of insights, gathered from the BBC Question Time special and the audience Q&As broadcast live on the evening of April 30, 2015. With most of the main party leaders appearing in different programmes across the evening, the volume of online chatter as people watched live was incredible, and the results we posted on social media seemed to stir up even more discussion and buzz.

Within the first 20 minutes we had gathered enough findings to share insights with the public, and every 30 minutes after that we pushed out updated charts until midnight, when the TV programmes had finished and chatter around the topic had subsided.

Not only did we receive more retweets, comments and favourites on our coverage, but we were also picked up by some journalists and featured in an article by The Independent reporting on the night’s events.


As we found each time we published our results for a debate, there is a wide variety of insights to be found in this data. So we’ve collated all of our data from the full run of the campaign into an Excel spreadsheet and made it available for anyone to see here; feel free to download it, play around and see what insights you can find.

While we are pleased with the results and insights we have gathered from this test of our big data analytics engine, the ultimate test will be to see how our insights compare with the final results on election night.

Will the sentiment shared by the British public on social media reflect the actual outcome, or will we see a wide gap between what was said and what actually happens?
