Work the crowd

Get the right people together and predicting the future is a cinch, says Arran Frood
Every day after breakfast, Shannon Gifford would sit down at her computer for an hour and scour obscure corners of the internet for clues. The questions she was attempting to answer changed. Once she was trying to find out whether radioactive poison would be discovered in the body of former Palestinian leader Yasser Arafat. At other times she was working out whether the price of oil would rise above $60 a barrel that year, or predicting the outcome of a forthcoming presidential election in Ghana.

Gifford isn’t an investor, a spy or even an insatiably curious news junkie. Alongside hundreds like her, she was part of an extraordinary experiment to find out whether the wisdom of the crowd can predict the future. The answer surprised even the US intelligence officials behind the experiment. It turns out crowds really can make accurate predictions – so accurate, in fact, that they promise to permanently change how states analyse intelligence.
We have known some of the benefits of collective wisdom since Aristotle, but a slightly more recent example features in the 2004 book The Wisdom of Crowds by journalist James Surowiecki. The opening pages tell the story of the day Charles Darwin’s cousin Francis Galton went to a country fair. Galton, a formidable scientist himself, asked people to guess the weight of an enormous ox. Most got it absurdly wrong, but the average guess of the 800-strong crowd was just 1 pound off the true weight of the ox, which for the record was 1198 pounds, or 543 kilograms.

The wisdom of crowds is an integral part of life today. We try suspected criminals by jury. We use crowdfunding websites to back new products. We follow the throng to popular restaurants. Now it even seems it may be possible to predict the future using the masses.
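Nothing mystical is going on: as long as guesses scatter on both sides of the truth, individual errors largely cancel in the aggregate. A minimal simulation in Python makes the point (the error spread here is invented, not Galton’s data):

```python
import random

random.seed(1)
TRUE_WEIGHT = 1198  # pounds - the ox at Galton's fair

# 800 fairgoers, each guessing with a large but unbiased error.
guesses = [TRUE_WEIGHT + random.gauss(0, 75) for _ in range(800)]

median_guess = sorted(guesses)[len(guesses) // 2]
typical_miss = sum(abs(g - TRUE_WEIGHT) for g in guesses) / len(guesses)

print(f"typical individual miss: {typical_miss:.0f} lb")  # roughly 60 lb
print(f"crowd median miss: {abs(median_guess - TRUE_WEIGHT):.0f} lb")  # usually just a few lb
```

The one condition the trick depends on is that the errors are unbiased; a crowd that leans the same way together is no wiser than one of its members.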
One existing way of doing it is with prediction markets. This is a form of gambling in which people buy shares in the outcome of a future event, commonly sports fixtures and elections. Unlike a run-of-the-mill bet, the market owners put the share price up or down depending on demand from buyers. That means the shares with the highest price are a de facto prediction of the future.

So how accurate are they? In 2008, Joyce Berg at the University of Iowa and colleagues analysed the long-term performance of the Iowa Electronic Markets, a prediction market set up in 1988. Looking at the predictions on five US presidential elections, the researchers found they were more accurate than polls 74 per cent of the time. That’s not too shabby a record, although beating a poll might not seem that impressive in today’s febrile political climate.
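Why does a share price work as a forecast? In markets like Iowa’s, a common contract design pays $1 if the event happens and nothing otherwise, so anyone who thinks the event is more likely than the price implies expects to profit by buying – and that buying pushes the price toward the crowd’s consensus probability. A sketch with invented numbers:

```python
# A winner-takes-all contract pays $1 if the event happens, $0 otherwise,
# so a price of $0.62 is a de facto forecast of a 62 per cent chance.

def expected_profit(price, my_probability):
    """Expected profit per share of buying at `price` when you
    believe the event has probability `my_probability`."""
    return my_probability * 1.0 - price

print(expected_profit(0.62, 0.70))  # about +0.08: worth buying, price gets bid up
print(expected_profit(0.62, 0.55))  # about -0.07: worth selling, price gets pushed down
```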
Will the UK have a new prime minister before 1 October 2018? Yes / No
And prediction markets have a downside: they can be rigged. Rajiv Sethi, an economist at Columbia University in New York, showed that in one prediction market for the 2012 US presidential election, a single trader accounted for a third of all bets on Mitt Romney. They may well have been trying to manipulate public confidence in his chances of winning.

Despite this flaw, the US intelligence community became interested in prediction markets in the early 2000s. Stung by their failure to foresee the terrorist attacks of 11 September 2001, they wanted better ways to make geopolitical forecasts. The Pentagon’s research arm DARPA launched a prediction market that allowed real-money bets on assassinations, uprisings and future terrorist attacks. It proved rather unpalatable, however. One senator called it “ridiculous and grotesque”, and the programme was shut down before it had a chance to properly get going.

Then in 2005, a book called Expert Political Judgment brought crowd predictions to the fore again.
The author, Philip Tetlock, a psychologist now at the University of Pennsylvania, had studied expert predictions for two decades. In one experiment, he surveyed about 300 professional political and economic forecasters, asking them a series of questions about the future and getting them to pick answers from a range of options. He also asked them to assign a probability to each answer. He amassed tens of thousands of predictions and compared them with what really happened. The experts performed terribly: worse than if they had assigned equal odds to each outcome every time.

Surprised? Don’t be. We know that humans are routinely blinded by cognitive biases. For example, we tend to undervalue new information that contradicts our established beliefs. But Tetlock went further. He also asked ordinary people similar questions and found their predictions were more accurate than those of the experts – possibly because they didn’t have the same ingrained, subject-specific biases.

The work caught the eye of Jason Matheny, who has long worked at, and is now director of, IARPA, a US agency that funds intelligence-related research. He wrote to Tetlock suggesting they set up a forecasting trial. This time it would be a tournament, pitting teams of people against each other. Would Tetlock’s findings hold on a bigger scale, where anyone could take part?
First, IARPA set around 100 questions on geopolitics (the questions on these pages are illustrative examples). These were given to four teams of academics, one of which was the Good Judgment Project (GJP), with Tetlock and others at the helm. Each team then put out a call for ordinary people to volunteer to answer the questions. Once the responses were in, the academics used them to produce a final prediction, which was later compared with what really happened.
Before 1 January 2019, will any other EU member state schedule a referendum on leaving the EU or the eurozone? Yes / No
The competition was repeated with new questions each year between 2010 and 2015, by which time some 2800 volunteers had taken part across the four teams.

The GJP team quickly developed a strategy that would win over and over again. As the group received more and more answers, and looked at what actually came to pass, it used an adapted version of a metric called the Brier score, originally developed to quantify the accuracy of weather forecasts, to rank correct predictions. Those made far in advance, for instance, or of rare events, scored more highly than obvious short-term predictions. This enabled the group to work out who the best performers were.
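The vanilla Brier score – Glenn Brier’s 1950 weather-forecasting metric – is simply the mean squared gap between the probabilities a forecaster quoted and what actually happened. The GJP’s adapted, ranking-friendly version isn’t spelled out here, so treat this as a sketch of the base metric only:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and outcomes.

    forecasts: probabilities given to 'yes' on each binary question
    outcomes:  1 if the event happened, 0 if it did not
    Lower is better: 0.0 is perfect, and always hedging at 50:50
    scores 0.25 on binary questions.
    """
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

print(brier_score([0.9, 0.8, 0.1], [1, 1, 0]))  # about 0.02 - confident and right
print(brier_score([0.5, 0.5, 0.5], [1, 1, 0]))  # 0.25 - the perpetual hedger
```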
I PREDICT THE END OF FAKE NEWS

Prediction markets that essentially allow people to bet on particular events coming to pass have been around for decades (see main story). But a new breed is rearing its head – and promising the earth. These markets, with names such as Augur, Gnosis, Bitcoin Hivemind and Stox, are based on blockchain technology.

That gives these markets two potential advantages. First, because blockchains are decentralised, markets using them can sidestep regulations. In the US, prediction markets are considered gambling, so bets are taxed. The size of bets can also be capped. “Prediction markets have a lot of baggage,” says Paul Sztorc, creator of Bitcoin Hivemind. Remove that, and the potential market size increases, meaning that in theory the accumulated bets will be better predictors of the future. Second, blockchain-based markets are run by their users. This means anyone can create stocks in any future event they wish. Some see this as chaos compared with established prediction markets, which often focus on events like elections with well-defined outcomes.

But proponents claim these markets offer something revolutionary. “I think prediction markets in general can help us fight fake news – but particularly the blockchain ones,” says Sztorc. They provide a mechanism for crowds to verify the outcome of any event, with the best verifications earning cryptocurrency as a reward. Sztorc thinks people will consider that verification more trustworthy than, for example, a partisan media brand, political party or fact-checking blog. “You could have a place,” he says, “where people could see objective validity.”
Among the top 2 per cent was Gifford. From her home in Denver, Colorado, the 59-year-old would play with her predictions every day, enjoying the new websites that the stream of questions took her to. “It made me look at the news in a less passive way,” she says. It turned out she had just the right characteristics to predict the future effectively (see “Are you a superforecaster?”, below).

Once the GJP team had identified the cream of the crop, it put them in teams and began using their predictions. This brought barnstorming success. “To say the GJP exceeded our wildest expectations might even be underplaying it,” says Seth Goldstein at IARPA. The GJP superforecasters beat the other teams in the competition by a mile and were 50 per cent more accurate than a control crowd assembled by IARPA. Research comparing the GJP’s superforecasters against professional intelligence analysts is expected to be published soon. Gifford says the professionals only did better on questions where classified information was valuable. “In general, I think we beat them,” she says.

We knew that crowds could beat experts, so if you take the smartest slice of the crowd, it stands to reason that it will beat them by a wider margin. But now a new possibility has suggested itself to the intelligence services: if the secret to assembling a wise crowd is to avoid bias and use the smartest heads, then perhaps icy machine minds could do the job.

We already use machine learning to make predictions. IARPA’s Open Source Indicators project, which ran between 2012 and 2015, used computers to identify patterns in online searches and social media activity to predict significant societal events. The project successfully forecast the surges of civil unrest during protests in Brazil in June 2013. Now IARPA is launching a competition that will see teams of humans and machines work together to generate what the agency hopes will be the best crowd predictions yet: the Hybrid Forecasting Competition.

Silicon circuits are known for being coldly calculating, so you might think the wisest possible crowd would be formed exclusively of machines. But that isn’t necessarily true, for two reasons. First, machines are programmed by humans, and a touch of our bias often gets imparted. Second, we have all sorts of tacit knowledge that is hard to teach a machine: that mental alarm bell that sounds when you come across a fact or report that seems unreliable, for example.
ARE YOU A SUPERFORECASTER?

Crowds of ordinary people can be good at predicting the future (see main story). But the most accurate predictions come when you identify the best 2 or 3 per cent of a crowd and team them up. Nearly all these superforecasters have a university degree, a wide range of interests and a curious mind. They also tend to have a few other key characteristics.

INTELLIGENCE
Superforecasters are smart, particularly when it comes to fluid intelligence – the capacity to reason your way through novel problems.

SHREWDNESS
Making good predictions involves absorbing lots of generic information and working out its significance to particular questions. Superforecasters are great at making those calls.

MOTIVATION AND COMMITMENT
The best forecasters tend to think about their predictions every day and work to refine them.

TEAMWORK
It’s good to talk. The best forecasts come from teams that discuss questions, then arrive at a communal decision.
“We want to learn to what extent humans and machines can help each other,” says David Huber, a computer scientist at HRL Laboratories in California, who is leading one of the teams in the competition. Participants assigned to Huber’s team will log on to a website and watch the machine forecast running. They can also explore the data it is using and use it to inform their own predictions, discounting the machine if they feel they should.
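IARPA hasn’t published how the hybrid teams will fuse the two kinds of judgment, but one obvious scheme – purely illustrative, not HRL’s actual method – is a weighted pool in which a forecaster who distrusts the machine simply turns its weight down:

```python
def pool(machine_p, human_ps, machine_weight=0.5):
    """Blend the machine's probability with the average human one.
    machine_weight is the trust placed in the model (0 to 1)."""
    human_avg = sum(human_ps) / len(human_ps)
    return machine_weight * machine_p + (1 - machine_weight) * human_avg

humans = [0.60, 0.70, 0.55]
print(pool(0.80, humans))                      # trust the model: about 0.71
print(pool(0.80, humans, machine_weight=0.2))  # discount it: about 0.65
```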
So what happens if the competition replicates its predecessors and produces a step change in the accuracy of forecasts – and a system based on it then predicts a terrorist strike or a North Korean missile launch? Politicians already have to make fraught decisions on the basis of uncertain intelligence. It is possible that an accurate human-machine forecasting programme will take some of the uncertainty out of those decisions. The jury is out for the time being. But all three teams in the Hybrid Forecasting Competition say that getting people to understand how the machine has made its predictions is key to gaining trust.
Before 1 October 2018, will the US provide notice of intent to withdraw from the North American Free Trade Agreement? Yes / No
Crowd forecasting will probably bring even bigger gains in other fields. “Prediction markets have enormous promise for revolutionising business and corporate governance,” says Robin Hanson, an economist at George Mason University in Virginia. He thinks investors should be running such markets. You might set up two: one predicting a firm’s stock price if its CEO stays, the other if the CEO quits. The difference between the two prices is “the market estimate of whether the CEO should be dumped”, says Hanson. One complication could be the observer effect: if the CEO saw the data, would they up their game? But done right, Hanson thinks it could dramatically improve our ability to find the value of changing course in a host of arenas, from product prices to politics.
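What Hanson is describing is a conditional, or decision, market: each contract pays out only in the world where its condition holds, and the gap between the two prices is the crowd’s valuation of the decision itself. A sketch with invented prices:

```python
# Two conditional markets on the same firm's stock price: one settles
# only if the CEO stays, the other only if the CEO is replaced.
# (Prices are invented for illustration.)

price_if_ceo_stays = 41.20  # dollars per share, conditional on staying
price_if_ceo_goes = 47.80   # dollars per share, conditional on leaving

uplift = price_if_ceo_goes - price_if_ceo_stays
print(f"market estimate of the value of replacing the CEO: ${uplift:.2f}/share")
```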
In fact, it is already happening. The GJP now has a commercial spin-off that offers the services of the world’s superforecasters to corporate clients. Gifford was offered a place, but for now she has decided to take a break. “The commercial questions didn’t seem as much fun,” she says.

Arran Frood is a science journalist in Bristol, UK. Questions in this article are based on examples from the Good Judgment Open website