On Social Media Analysis – and Understanding How to Use Big Data

This week, news outlets reported that the insurer Admiral was taking steps to use Facebook content to set individual premium discounts for users. Since the initial announcement, Admiral have almost entirely abandoned the system because of the negative response. But we thought we’d weigh in on the decision and offer a more research-based perspective on how big data should be collected and what it’s really useful for.
Since the start of the 21st century, technology has reached a point where the amount of information and data that can be collected for research has grown exponentially. This shift is referred to as ‘big data’, and it is a true golden egg of modern-day information gathering. The reason is two-fold. The first factor is sheer volume: through social media and device-based data collection, researchers can now reach sample sizes that were never imaginable previously. The second factor is accessibility: in the 2016 landscape of big data, most of this information is already being collected, so where in the past researchers had to hand out questionnaires and collect them manually, modern-day social media research gives access to data simply by requesting it. This is the digital tap that Admiral hoped to draw from, taking the posts and content that are already sitting in a server room somewhere, waiting to be analysed.

On the other hand…
But with such benefits come inevitable constraints. As in any research, it’s important to know what you can get out of the data and to understand the ethics of collecting it. Social Media Listening allows us to remove a confounding variable that can weaken research like surveys or focus groups: participants in those studies must be aware that they are being studied. And if you’ve ever published research that involves even the slightest omission of truth or deception, it’s likely you’ve had to adhere fairly rigorously to ethics guidelines such as those of the British Psychological Society. So how does Social Media Listening work ethically? Well, you know the terms of service that you clicked accept on when signing up to Twitter? When you clicked, you told Twitter Inc. that unless you set your profile to private, anything you post can be seen by anyone on the public internet. In their words: “What you share on Twitter may be viewed all around the world instantly. You are what you Tweet!” Social media listening tools simply collect this data through software interfaces (called APIs) that Twitter allows anyone to access. For example, this STV tweet that’s in my data set this month appears in whole, and the tool also tells me things like the fact that potentially 40,295 users have scrolled past it in their newsfeed.



These useful numbers are calculated by the Social Media Listening tool, in this case Brandwatch. And the tweet? Well, anyone can see that with a quick Google search.




So back to Admiral then: this means they’re fully ethical with this tool, right? Well, no, because Facebook works a bit differently. Facebook content is anonymised by a group called Datasift, who Facebook agreed to trust with this task back in March of 2015; you can read more about how they work here. Basically, researchers don’t get to see full posts on Facebook, only the topics that a large group of people have talked about. Admiral learnt this the hard way when they found that their tool was in fact in breach of Facebook’s terms of service. This speaks volumes about the difference in how people use Facebook vs. Twitter: users of Facebook share more personal information and, in turn, are more likely to set their profile to private. In fact, as far back as 2012 research has been pointing to users keeping their information more private on Facebook, while figures from the same year suggest that only 11.2% of users had set their Twitter profiles to private. This is why Twitter content is treated as public: the platform has been built from the ground up for users to shout into a web space, but on Facebook you’re more likely to want to talk only to friends and family directly.

Constraints of Social Listening
Most users of social listening, including DI, use social media for research: we use big data to answer big questions. What Admiral hoped to do, and what research shouldn’t be used for, is the diagnosis of individuals. At its core, this removes a safeguard that any social media research must account for: the data needs to be averaged. This isn’t unique to social research, but it is a constraint of the sheer amount of data collected. Any dataset has to be large enough to claim a strong effect. In conventional research this is done by increasing the sample size, but sample size isn’t the problem with big data. With social media research the problem is control over what’s collected because, as we said above, you don’t have control over the data collection. So we analyse effects by averaging.
There’s no such thing as a perfect participant in social media analysis: some people don’t always share, and some overshare. Say we want to understand how users talk about the Olympics (like we did here). Over the course of the event, we can’t depend on everyone who talks about Rio to weigh in on every news piece. But if a piece of news sparks immense pride in users, we will see the overall effect, and we’re not depending on a small group of people to comment. Instead we have a wide and diverse group of people expressing their sentiment.
This is the big problem that doomed Admiral’s tool from the start: it can’t average. They are trying to diagnose individual people and rank them (from what they have explained) on their sharing habits alone. Good social media research should work within these constraints so that big data yields great insights into the feelings of a whole group of people. This is where social media analysis is at its best, and knowing the restrictions of this invaluable data resource helps researchers make better use of all manner of tools and analyses to learn about online users.
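To make the averaging argument concrete, here is a minimal Python sketch. It is a toy simulation, not Admiral’s method or the workings of any real listening tool: the sentiment scale, the noise levels and the `simulate` function are all assumptions invented for illustration. Each simulated user posts only a handful of times, so a score for any one person is noisy, while the average over everyone lands close to the group’s true mood.

```python
import random
import statistics


def simulate(num_users=2000, seed=0):
    """Compare the error of a group average with the error of per-user scores.

    All numbers here are made up for illustration only.
    """
    rng = random.Random(seed)
    true_group_mood = 0.3  # hypothetical average sentiment on a -1..1 scale
    user_errors = []
    all_posts = []

    for _ in range(num_users):
        # Each user has their own underlying mood around the group mood.
        true_mood = rng.gauss(true_group_mood, 0.2)
        # People share unevenly: some post once, some a few times.
        n_posts = rng.randint(1, 3)
        # Any single post is a noisy reflection of how the user really feels.
        posts = [rng.gauss(true_mood, 0.5) for _ in range(n_posts)]

        # "Diagnosing" the individual: score them on their own posts alone.
        user_errors.append(abs(statistics.mean(posts) - true_mood))
        all_posts.extend(posts)

    # Averaging: estimate the mood of the whole group from every post.
    group_error = abs(statistics.mean(all_posts) - true_group_mood)
    return group_error, statistics.mean(user_errors)


group_error, typical_user_error = simulate()
print(f"Error of the group average:      {group_error:.3f}")
print(f"Typical error for an individual: {typical_user_error:.3f}")
```

Under these assumptions the group estimate should come out far closer to the truth than the typical individual score, which is the point of the argument above: the same data that supports a confident statement about a crowd says very little about any one person in it.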

The data used in this report was powered by Brandwatch.

© Disruptive Insight 2015.