4th June 2015

If polls are wrong, can a handful of stats tell us how people will vote?

The polls were wrong, but our demand for precision was the real problem. Polls shouldn’t be reported in round numbers anymore, and other data should be used.


If you hadn’t heard, the polls didn’t predict last month’s general election. Or rather, they very accurately predicted support for the SNP, Ukip, and Greens – and, at a national level, the Lib Dems – but failed to predict what really mattered: support for the Tories and Labour.

More than 500 national polls were published in the year before the election. Some 99 per cent of them suggested the Tories had no hope of winning a majority. Poll after poll implied both parties would win around 34 per cent. In the end Labour won 31 and the Tories 38.

Were the polls really wrong? Sure, but our approach to them was the real problem. We all treated polls as very specific when they were, and only ever can be, impressionistic. The magic of polling is that you can ask 1,000 people who they will vote for and, 19 times out of 20, their answers will be within 4 percentage points of representing actual British opinion.

The lesson of the 2015 election is to treat margins of error as real.

That is essentially what happened. Clearly the polls were at the far end of this ‘margin of error’, but the lesson of the 2015 election has to be that we should treat margins of error as real. The polls didn’t say Labour would poll 15 per cent; they came within 90 per cent of the right result. And this wasn’t unique. From 1979 to 2010, eve-of-election polls were only 80 per cent accurate.
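For readers who want the arithmetic, this is roughly where the ‘19 times out of 20’ figure comes from. A minimal sketch in Python, using the textbook margin-of-error formula for a simple random sample (real polls use quotas and weighting, so their effective margin is wider – which is why four points is the more honest practical figure):

```python
import math

def margin_of_error(share: float, n: int, z: float = 1.96) -> float:
    """Textbook 95 per cent margin of error for a simple random sample.

    share is a proportion (0.34 for 34 per cent). Real polls use quotas
    and weighting, so their effective margin is wider than this.
    """
    return z * math.sqrt(share * (1 - share) / n)

# A party on 34 per cent in a 1,000-person poll:
moe = margin_of_error(0.34, 1000)
print(f"34% +/- {moe * 100:.1f} points")  # about +/- 2.9 points
```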

Why didn’t we say this before May 7? We were given false confidence by the sheer amount of data being published. We not only had dozens of national polls saying similar things in the final weeks – especially after a string of pro-Tory phone polls all moved into line with online polls showing a tie – but more than 150 polls of individual seats by Michael Ashcroft, which were saying the same thing.


And, while many reasons have since been found to explain Miliband’s inevitable defeat, as many reasons would quickly have been sourced had Cameron lost. “The cuts were too severe”, “There was a presumption that to be competent you had to be callous”, “The economy didn’t grow for nearly three years”, “Real wages fell for four years”, “A bunch of Etonians and Bullingdon Clubbers were never going to hold onto power”, “Cameron never tried”, “Inequality is the major issue now”.

Hindsight comes easily. But rather than squabbling about who was the least inaccurate forecaster, as some overzealous pollsters have been determined to do, or finding reasons why Labour lost, could we ignore polls completely? Could we instead use a few stats to explain elections, and predict the future?

*

That is the promise of demographic analysis. The spoiler is that Labour had smart people doing demographic work and it did not rescue them. Some insiders have suggested they knew the party was stumbling as election day approached, specifically among young families and professionals who found little to inspire them in Labour’s message.

Too many of these voters were telling Labour’s canvassers that they were undecided. These may have been the latest iteration of shy Tories.

Labour’s operation, like any modern campaign, placed every voter in a demographic category, and used polling and historical data to predict who each type of voter would support. We haven’t attempted anything so complicated, mainly because we don’t have the detailed consumer data that campaigns buy from companies like Experian.

Could we ignore polls completely and use a few stats to explain elections?

Instead, we gathered seat-by-seat data on everything from the number of businesses in a constituency to the proportion of council housing, level of educational qualifications and number of older voters. We then looked for relationships between these things, or ‘variables’, and support for the Tories.

The great aim is to discover the variables that can together predict support for the parties. And we want to use as few as possible. If we found some measure was very closely linked to how a seat voted, then the next election would be determined by two things: how much that measure changes in each seat, and whether that relationship holds.

Say it was a measure of economic well-being, like the unemployment rate. And imagine that the relationship was exact, so that support for the Tories was higher wherever the rate was lower, and rose as the rate fell. Predicting the 2020 election would then just be a question of how much the unemployment rate changed in each seat – if the relationship held. But maybe the relationship would change, if Labour came to be seen as a party befitting places of low unemployment.
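For the technically minded, that thought experiment is easy to sketch in Python. All the numbers below are invented for illustration: fit the straight-line relationship seen at one election, then project it onto each seat’s expected unemployment rate.

```python
import numpy as np

# Entirely invented seat-level figures: unemployment rate (%) and
# Tory vote share (%) at the last election.
unemployment = np.array([3.1, 4.2, 5.0, 6.3, 7.8, 9.5])
tory_share = np.array([52.0, 47.5, 44.0, 38.2, 33.0, 27.5])

# Fit the straight-line relationship observed this time around.
slope, intercept = np.polyfit(unemployment, tory_share, 1)

# Predict 2020 from each seat's projected unemployment rate --
# valid only if the relationship itself still holds by then.
projected = np.array([2.8, 4.0, 4.5, 6.8, 8.0, 9.0])
print(np.round(slope * projected + intercept, 1))
```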

Photo: classic cars at a 1950s ‘weekender’ in Great Yarmouth, Norfolk (Peter Macdiarmid/Getty Images)

This is the point: to come up with a new, data-led way of thinking about things. Doing so in any depth is complicated. But our simple attempt found that a measure we created last year was the best predictor of Tory support. It is a measure we called ‘work quality’ and is based on data from the 2011 Census.

During the Census every job in the country was ranked, with professional and managerial jobs given a high rank (a 1 or 2), lower managerial and self-employed roles given a middle rank (3-5), semi-skilled and more routine work rated a 6 or 7, and the out-of-work categorised as an 8.

We then used this to calculate the average job in each seat. It ranged from a 2.9 in Richmond Park – the average job in Zac Goldsmith’s seat is just under a managerial/professional level – to a 5.5 in Birmingham Hodge Hill. In Liam Byrne’s seat, relatively routine work is the norm.
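For the curious, the calculation is just a weighted average. A sketch, with invented counts standing in for the Census tables:

```python
# Invented census-style counts: residents in each of the eight
# occupation ranks (1 = professional/managerial ... 8 = out of work).
seat_counts = {
    "Richmond Park": [14000, 12000, 9000, 5000, 4000, 3000, 2000, 1500],
    "Birmingham Hodge Hill": [2000, 3000, 5000, 6000, 8000, 9000, 9500, 8000],
}

def average_rank(counts):
    """Mean occupation rank, weighted by the number of people at each."""
    total = sum(counts)
    return sum(rank * n for rank, n in enumerate(counts, start=1)) / total

for seat, counts in seat_counts.items():
    print(f"{seat}: {average_rank(counts):.1f}")
```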

The average quality of jobs in each seat turned out to be closely linked to whether that seat voted Tory. A correlation of 0 means two things are not linked. A value of 1 means they are perfectly linked. The correlation was 0.65.
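Computing that correlation is a one-liner. A sketch with invented figures – note that because a higher rank number means more routine work, the raw coefficient comes out negative; 0.65 is its magnitude:

```python
import numpy as np

# Invented parallel arrays, one entry per constituency.
avg_job_rank = np.array([2.9, 3.4, 3.8, 4.1, 4.6, 5.0, 5.5])
tory_share = np.array([58.0, 45.0, 42.0, 36.0, 30.0, 27.0, 18.0])

# The off-diagonal entry of the correlation matrix is Pearson's r.
r = np.corrcoef(avg_job_rank, tory_share)[0, 1]
print(f"r = {r:.2f}")  # negative here: more routine work, lower Tory share
```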


This is greater than the link between the Tory vote and other economic variables: from the number of businesses in a seat (.60), to the 2014 unemployment rate (.53), proportion of public sector jobs (.46) or amount of self-employed work (.39).

It was rivalled only by the proportion of social housing in each place, which had the same link with Tory support (.65). Other variables, like the quality of educational qualifications (.40) or proportion of non-white residents (.26), had weaker relationships.

*

This isn’t surprising stuff. It all reinforces the basic truism of British politics: the better off a person or place is, the more likely they are to vote Tory. When we talk about economic standing or dependence on the state, we’re talking about essentially the same thing.

But the data is still interesting and we can do two things with it. First, it matters because of where this truism doesn’t hold. There are certain seats which are prosperous but don’t vote Tory, and there are ones that aren’t but do. Seats like Hornsey & Wood Green, Brighton Pavilion, Bristol West, Cambridge and Sheffield Hallam are among the former, and places like Peterborough, Great Yarmouth and Boston & Skegness epitomise the latter.


There is a strong relationship between prosperity and Tory support, but it isn’t conclusive. So what is going on in these exceptional places? Is there actually some third factor which would explain the Tory vote more emphatically, and show that the link between prosperity and Toryism is spurious?

If there is, we haven’t discovered it. The risk when talking about the link between two things is failing to realise that some unmentioned factor is really responsible. For instance, you would find a relationship between height and educational attainment if you measured an entire school of 3-18 year olds, but it would disappear once you accounted for age and measured each year individually.
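The schoolchildren example is easy to simulate, and makes the danger concrete. A sketch with invented numbers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a whole school: age drives both height and attainment.
age = rng.integers(3, 19, size=2000)            # ages 3-18
height = 75 + 6 * age + rng.normal(0, 8, 2000)  # cm
score = 10 + 5 * age + rng.normal(0, 10, 2000)  # test marks

# Across all pupils the two look strongly linked...
print(np.corrcoef(height, score)[0, 1])  # roughly 0.9

# ...but within a single year group the link all but vanishes.
ten = age == 10
print(np.corrcoef(height[ten], score[ten])[0, 1])  # near 0
```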

The basic truism holds, but what is going on in these exceptional places?

The basic link with prosperity makes sense. The next step is to understand why these two sets of seats don’t fit the pattern. It’s clear that in the prosperous places mentioned above, some kind of liberalism exists. These places vote Labour, Lib Dem and Green, and are all ‘cosmopolitan’; they are places where Ukip does badly.

So we searched for data that would measure this liberalism. We looked at the number of civil partnerships in each seat, but there is no statistically significant relationship between that and Tory support. We then considered the proportion of state school students in each, but that data hasn’t been aggregated by the DfE, and an initial analysis suggests no clear link: almost every child in Sheffield Hallam is state-educated while a minority are in Cambridge.

Equally, what of the seats that vote Tory but aren’t economically prosperous? There is some attitude that supersedes money in these seats. Again we looked for proxies for this attitude, but couldn’t find a fitting measure.

Our best analysis for all seats gathered together five variables – the quality of work, the proportion of public sector work, amount of social housing, and numbers of students and over-65s in each seat.


By using these five measures – we avoided adding other economic variables which seemed duplicative, like the unemployment rate – we can explain just over 60 per cent of the Tory vote in each seat. In other words, 60 per cent of the seat-to-seat variation in Tory support can be explained by our five pieces of data.
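That ‘60 per cent’ is the R-squared of a linear regression. A sketch of the calculation, with a small invented table standing in for our seat-level data (the column definitions are assumptions, and on a handful of made-up rows the fit will be flatteringly tight compared with the full set of seats):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented seat-level table. Columns: average job rank, % public
# sector workers, % social housing, % students, % over-65s.
X = np.array([
    [2.9, 18, 8, 12, 14],
    [3.3, 20, 12, 10, 17],
    [3.6, 25, 18, 30, 9],
    [3.9, 22, 16, 8, 20],
    [4.2, 27, 22, 25, 11],
    [4.5, 30, 26, 7, 16],
    [4.8, 24, 29, 6, 19],
    [5.0, 28, 31, 5, 21],
    [5.3, 33, 34, 4, 13],
    [5.5, 31, 36, 3, 15],
])
y = np.array([55, 50, 27, 42, 25, 28, 30, 24, 17, 15.0])  # Tory share (%)

model = LinearRegression().fit(X, y)
print(f"R^2 = {model.score(X, y):.2f}")  # share of variation explained
```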


That’s rather specific. Put more broadly, as these things should be, we are saying it’s fairly unlikely that a seat with a) low quality of work; b) at least a quarter of people in public sector work; c) a similar proportion of social housing; d) many students and e) few pensioners, will vote Tory.

And taking measures other than work quality into account is useful in explaining why student-heavy places like Brighton Pavilion, Bristol West and Cambridge don’t vote Tory, or why Dulwich & West Norwood and Islington North don’t, with their high proportion of public sector work and social housing. (We found it harder to uncover stats that explain why places like Peterborough, Yarmouth, Boston, Pendle or North East Cambridgeshire don’t vote Labour, as less prosperous places tend to.)

*

In the future, data like this – and far more of it – will surely become a greater part of political polling. Parties develop sophisticated internal models to understand every voter, while national polls are relatively abstract. If Michael Ashcroft maintains his interest in polling, he is one of the few pollsters with the limitless funds needed to try to bring polling together with other sources of data.

Many pundits are now likely to take little interest in numbers. That would be reductive.

More immediately, this kind of approach could have served sites like this one well. For instance, while polling showed the Tories winning in Ilford North, the seat wasn’t demographically well suited to them, and Labour won it – one of the very few gains we hadn’t been fairly certain they’d make.

And attempts to understand why some seats buck the truism of British politics will continue, often without much data to support them. Many pundits are now likely to take little interest in numbers; they can easily invoke May 7 whenever they find a stat they don’t like. That would be reductive.

There are clearly limits to data, and we can all be spared more breathless reporting about ‘Big Data’, but the inaccuracy of pre-election polls still doesn’t mean the man you met in a street one April morning was a focus group, or your gut feeling was much more than superstition.

The best solution to what happened is to create a new rule: no poll should ever again be reported in round numbers. Polls come with a margin of error, so report them with one.

Political data is very useful, but only when we realise it is rarely more than impressionistic. It is almost always like a sketch, not a blueprint. Every pre-election poll that said the parties were on 34 per cent should – in hindsight – have been reported as putting them each on 30-38 per cent. If we adopted that rule, never again could a campaign be so distorted by pre-election predictions.
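In code, the proposed rule is almost trivially simple – a sketch, using the four-point margin from the example above:

```python
def report_with_margin(share: float, margin: float = 4.0) -> str:
    """Format a poll figure as a range, not a point estimate.

    The four-point margin matches the example above; the right
    figure depends on a poll's sample size and design.
    """
    return f"{share - margin:.0f}-{share + margin:.0f} per cent"

print(report_with_margin(34))  # -> "30-38 per cent"
```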