It is the hot topic for data
journalists in this election. Some call it MRP, some
Mr P. But the full name is multi-level regression
with post-stratification. So what is it? In short, it’s a way of
using a big national poll to estimate how people will
vote at constituency level. National polls of
1,000 people are good at telling us what
share of the national vote each party will get, but not
so good at predicting who will win each of the 650 seats. And in the UK system it’s
seats, not votes, that count. For example, in 2015,
the Conservatives won 37 per cent of votes
and 51 per cent of seats. Ukip won 13 per cent of
the vote and less than 1 per cent of seats. Traditionally,
pollsters have tried to get around this
using a method like uniform national swing. If Labour was down 11 percentage
points on the last general election nationally, the
pollsters would subtract 11 percentage points from Labour's
vote share in every seat. But that can’t capture all the
electoral nuance: for example, the strength of the Leave or
Remain vote in particular areas, or big student votes
in university towns. So in comes MRP. Step one: a large poll
sample of tens of thousands of people across the country. That’s because you want dozens
of people in each constituency, the more the better,
to pick up on what makes that seat
different from the rest. Step two: don’t just ask them
who they’re voting for, but who they are: age, sex, ethnicity,
education level, housing, occupation, and how they
voted in the EU referendum. You’ll also have gathered
lots of local information about their constituency, from
which parties have historically done well or poorly
there, to what’s happened to house prices. So you have data at
the individual level, but also the context of the
wider geographical area. That’s why it’s
called multi-level. You then run a
regression on that data. That’s a statistical
technique that estimates the
probability of someone with a given combination
of personal and local characteristics (a)
voting at all and (b) voting for a particular party. So we’ve done MR. Then comes
P, post-stratification. This is where the
modellers use data from sources like the census and
the Annual Population Survey. They can tally up
the number of people with each combination of these
demographic and socio-economic characteristics in
every constituency and then apply the voting
probabilities from the MR step onto the population data. So you have an estimate for
how a white British male who left school at 16 is likely
to vote, for example. And that estimate will
differ between Great Grimsby, North West Durham,
and Glasgow Central. In fact, you have a
series of estimates for different
demographic combinations in different places. So you then combine
them to give you the total number of votes
each party is likely to secure in every one of the
650 constituencies, and that can be used to
calculate which party is most likely to win each
seat, which is most likely to be its closest challenger,
how big the margin between first and second place
is likely to be, and so on. This allows parties to better
target their campaigning resources on seats that
are going to be close. And it can help people like you
make a more informed decision about voting tactically. It gives the public a
more nuanced picture of how the election
is likely to play out. Now, it’s not a perfect system. This type of
modelling is complex, and there are many variables. And the choices the
modellers make mean one model will have different
outputs to the next. But MRP is the most
refined tool we have until the votes are
actually counted.
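
To make the P step a bit more concrete, here is a minimal Python sketch of post-stratification as described above. It assumes the regression has already been run and has handed us, for each demographic group in each seat, a probability of turning out and a probability of backing each party. Everything in it is invented for illustration: the seat names, party names, group labels, probabilities and head counts are all hypothetical, and the function names `poststratify` and `summarise` are our own, not taken from any pollster's model.

```python
from collections import defaultdict

# Hypothetical output of the regression (MR) step. For each (seat, group):
# the chance of voting at all, and the chance of backing each party if they do.
# All figures are made up for illustration.
predictions = {
    ("Seat A", "under 35, degree"): {"turnout": 0.5, "choice": {"Party X": 0.6, "Party Y": 0.4}},
    ("Seat A", "over 65, no degree"): {"turnout": 0.8, "choice": {"Party X": 0.25, "Party Y": 0.75}},
    ("Seat B", "under 35, degree"): {"turnout": 0.5, "choice": {"Party X": 0.7, "Party Y": 0.3}},
    ("Seat B", "over 65, no degree"): {"turnout": 0.8, "choice": {"Party X": 0.25, "Party Y": 0.75}},
}

# Hypothetical census-style head counts for the same (seat, group) cells.
cell_counts = {
    ("Seat A", "under 35, degree"): 20_000,
    ("Seat A", "over 65, no degree"): 30_000,
    ("Seat B", "under 35, degree"): 40_000,
    ("Seat B", "over 65, no degree"): 10_000,
}


def poststratify(predictions, cell_counts):
    """Apply the per-group probabilities to the population counts
    and add up the expected votes for each party in each seat."""
    votes = defaultdict(lambda: defaultdict(float))
    for (seat, group), n_people in cell_counts.items():
        p = predictions[(seat, group)]
        expected_voters = n_people * p["turnout"]
        for party, share in p["choice"].items():
            votes[seat][party] += expected_voters * share
    return {seat: dict(by_party) for seat, by_party in votes.items()}


def summarise(seat_votes):
    """Return the likely winner, the closest challenger, and the margin
    between them as a share of all expected votes in the seat."""
    ranked = sorted(seat_votes.items(), key=lambda kv: kv[1], reverse=True)
    (winner, top), (runner_up, second) = ranked[0], ranked[1]
    margin = (top - second) / sum(seat_votes.values())
    return winner, runner_up, margin


if __name__ == "__main__":
    for seat, seat_votes in poststratify(predictions, cell_counts).items():
        winner, runner_up, margin = summarise(seat_votes)
        print(f"{seat}: {winner} ahead of {runner_up} by {margin:.1%}")
```

Notice that the older group has the same voting probabilities in both made-up seats, yet the seats go different ways because the mix of people living there is different. A real model would have far more groups, more parties, and a measure of uncertainty around every estimate.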