Mastering Forecasting

An applied Bayesian statistician and Thai Boxing aficionado, Ville Satopää is an associate professor of technology and operations management at INSEAD. His award-winning research explores different areas of forecasting: judgemental and statistical forecasting, modelling crowdsourced predictions, combining and evaluating different predictions, and information elicitation.

As well as developing general theory and methodology, Ville’s work also focuses on specific projects that analyse real-world data, such as hospital mortality rates, domestic tourism, and urban crime.

In this session broadcast on 24 April, Ville dives into the different types of forecasting including probabilistic and calibrated forecasting. He also shares his current work on different types of aggregation problems and biases, including herding, where people start sacrificing accuracy for the sake of being similar to everybody else.

Hear more from Ville on crowdsourcing, information diversity, the BIN model, and his predictions on the role of AI in future forecasting.

Transcript

Des Dearlove:

Hello, I’m Des Dearlove, co-founder of Thinkers50. Welcome to our weekly LinkedIn Live session, celebrating the brightest new voices and ideas in the world of management thinking. Back in January, we announced the Thinkers50 Radar List for 2024, 30 exciting rising stars in management thinking, and we’ll be hearing from one of them today. This year’s Radar List is brought to you in partnership with Deloitte, and we like to make these sessions as interactive as possible so please do let us know where you are joining us from and put your questions in the chat box when you get a chance. Our guest today is Ville Satopää. He’s an assistant professor of technology and operations management at INSEAD. His award-winning research explores different areas of forecasting including geopolitical events and herding and crowdsourcing, and a bunch of things that we’re going to be talking about. It involves developing general theory and methodology, but also specific projects that analyze real world data.

In addition to working on forecasting, he’s experienced in applying Bayesian statistics to rank hospitals in terms of disease specific mortality rates. So some of this stuff is very practical. He’s also, I’m told, a Thai boxing aficionado. Ville, welcome. Congratulations on making the Thinkers50 Radar List.

Ville Satopaa:

Thank you so much, Des. It’s an honor to be here. I’m really looking forward to our chat.

Des Dearlove:

Okay. Well, before we get on to talking about your work, tell us a little bit about your journey. I mean, how did you end up working in this area? Did you from an early age want to get into forecasting or is this something that evolved?

Ville Satopaa:

I guess I … eventually in college just encountered statistics and I thought it was a really nice way to draw insights more objectively about various different applications. And whenever I think back at that time, I always think of what Tukey, one of the giants in statistics, said back in the early 2000s when he was asked, why do you love statistics so much? And he simply replied, “Well, it allows me to play in everybody else’s backyard.” And to me this was just very appealing, in particular because I didn’t really know at the time what application exactly it would be … healthcare, or would it be, I don’t know, criminology, or would it be epidemiology? What would really excite me? So I thought I could just hedge it out, learn the methodology and be helpful to a lot of different parties.

Now from there, forecasting or prediction is just a major, major area. This now extends to AI as well, with machine learning techniques that are often viewed as some type of a prediction machine. Now I like forecasting or prediction in particular because it really gives you this instant feedback on how you’re doing. There are very objective, very clear metrics that we’re trying to improve. And this is not always so; in particular, when you’re trying to do, say, statistical inference, your accuracy or the reliability of your results could depend on so many assumptions about the actual structure of the model you’re using. Whereas in forecasting, look, it doesn’t really matter how you get there, you’re either better than others or you’re not, and that’s it.

Des Dearlove:

Okay. I mean obviously forecasting, it takes a slight … but you have to remind yourself that we all rely on forecasting, whether it’s the weather, whether it’s political elections, I mean forecasting is all around us and we all, as I say in our everyday lives, are reliant on it. I mean it’s interesting too, although you get the instant feedback, it’s not as if it’s black and white. There’s degrees of accuracy. It’s not like it’s necessarily right or wrong, is it?

Ville Satopaa:

Yeah. No, you’re absolutely right about that. There are, I guess, two things one could say here, and when you mentioned that, it actually reminded me of something from about six months ago, when I was teaching a class of MBA students and we started talking about forecasting. And there was a student there who said he used to work on forecasting in his company. He seemed very disheartened and he said, “Look, I mean it’s just a gamble. There’s no way that we’re ever going to get this right.” And that’s true. In many ways, we can never know the future exactly. But that’s really beside the point in forecasting. What you really want to be, in some sense, is less wrong. And if you want to think about gambling here, you can draw an analogy.

It’s almost like going to an old school blackjack table. You can go there naively and just play and have the odds tilted towards the house, or you could have a good strategy. You can learn how to count cards, which will give you more information about how to behave in the future and tilt the odds in your favour. Does this mean that you’re going to win every time? No. But in the long run you’re going to come out on top. And that’s what a lot of forecasting is also about. It really is about collecting information and then having the right processes in place to convert that into something that gives you an edge. Now that being said, I also work a lot on probabilistic forecasting, which is also interesting in this respect. Suppose I’m predicting some event, say whether this conflict in this part of the world is going to end by the end of the year. If I say today that my prediction is 35%, and suppose that event doesn’t happen, was I wrong? What if that event did happen? Was I wrong?

So in this probabilistic setting, there are really only two ways one can be absolutely wrong: if I made a prediction of zero or one and the opposite happened. But when I’m in this grey zone in the middle, I can never be entirely wrong. Now what we need to do then is to actually start looking at multiple predictions over time and see how well the probabilities that I allocated align with the frequencies that we observed in the real world. So say all those times when I did predict 35%, did 35% of those events happen? If so, then I’m what’s known as a calibrated forecaster. And this is very useful because I can actually make probability statements that we can interpret in the way we like to think about probabilities to begin with.
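A minimal sketch of this calibration check, with made-up data for illustration: bin past probability forecasts and compare each bin’s average predicted probability with the frequency at which the events actually happened. The function name and toy data are assumptions, not anything from the interview.

```python
import numpy as np

def calibration_table(probs, outcomes, n_bins=10):
    """Compare average predicted probability with observed frequency, bin by bin."""
    probs, outcomes = np.asarray(probs, float), np.asarray(outcomes, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (probs >= lo) & (probs < hi)
        if mask.any():
            # A calibrated forecaster's mean prediction in each bin should be
            # close to the observed frequency of the event in that bin.
            rows.append((probs[mask].mean(), outcomes[mask].mean(), int(mask.sum())))
    return rows

# Toy data: events predicted at ~35% should happen roughly 35% of the time.
rng = np.random.default_rng(0)
p = rng.uniform(0.05, 0.95, size=2000)   # hypothetical probability forecasts
y = rng.binomial(1, p)                   # outcomes consistent with those forecasts
for mean_pred, freq, n in calibration_table(p, y):
    print(f"predicted {mean_pred:.2f}   observed {freq:.2f}   (n={n})")
```

For a well-calibrated forecaster the two columns track each other closely across bins.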

Des Dearlove:

So I mentioned a couple of areas where you’ve applied your work. Let’s connect it to the world as we know it. I mean, I mentioned, I think, a mortality rate. But you’ve also done some research recently in terms of predictions for world politics. Tell us a little bit more about that and what that found.

Ville Satopaa:

Yes. So this actually takes me all the way back to graduate school. I did my PhD at the Wharton School in the statistics department. And I think it was during the second year of the program that the US intelligence community launched a new forecasting tournament called ACE. And there the idea was to pit different university teams against each other to forecast future political events, the kind that intelligence agents would typically be forecasting. And they wanted to see, well, let’s see what the researchers can do. Let’s put these teams against each other and see who can come up with the best method to predict these important events. So a lot of them were about election results. There were some about finance as well, economic crises and various sorts of diverse events.

Des Dearlove:

If I heard you correctly. So the US intelligence community is, or services are obviously  … they see how valuable forecasts could be. So they’re actually effectively sponsoring, like a competition?

Ville Satopaa:

Exactly. Yeah. Absolutely.

Des Dearlove:

Okay. I’m just checking I heard you correctly.

Ville Satopaa:

Yeah. I mean the thing here to realise is that forecasting is really only a means to an end. Typically, we need an accurate forecast, an understanding of what is about to happen in the future, so that we can take the right decisions right now. So it really is a form of intelligence itself. Now I was part of the group at the University of Pennsylvania. I was doing statistical research within the group that decided to take a judgmental forecasting approach to the problem. So they screened a lot of participants and came up with a large collection of individuals who would then make predictions of these events. The problem was that now we had 1,000 people and probably 1,000 different opinions in terms of what is going to happen. And we as a team could only submit one opinion, one prediction.

So the big statistical problem there was then about aggregation. How do I take all these thousand different opinions and combine them into a single prediction that would harness the full power of that crowd, all the information that is in it and that we could then submit to the competition. Now this is how I got started. And then really when you get into the weeds of it, you realise that there’s just so many problems to be solved and it’s a never-ending story, but I really enjoy it.

Des Dearlove:

So that line of research is something you’ve carried on, you’ve continued to, as it were, harvest?

Ville Satopaa:

Yes, absolutely. I’m still working on different types of aggregation problems. It sort of lives on. This is the beauty of starting off with an extremely difficult problem: you can keep working on it for a very long time. But I’ve also ventured from there into a few other areas, looking in particular at different types of biases, behavioural biases. I think earlier you mentioned herding. Herding is one type of bias that we oftentimes see, for instance, in financial analysts.

Des Dearlove:

We should probably explain what herding is. I mean it’s not that, obviously it’s not to do with livestock.

Ville Satopaa:

Yeah. So herding, or conforming, is the behaviour where you may have, say, several individuals making predictions, but they actually start having secondary personal objectives besides just accuracy. So the thinking goes as follows. I know what my most accurate prediction would be based on the information I have, but I also need to worry a little bit about my job, my own personal reputation. So what I start doing now is that I start thinking, well, what is everybody else about to do? What is going to be the consensus prediction in this group? And then I’m going to shrink or move my best prediction a little bit towards that. So this way I have almost like an insurance that ultimately I would not end up being very far from everybody else and, God forbid, wrong alone. Because that could get me sacked, that could get me in trouble. So I want to avoid that.

So people start sacrificing accuracy for the sake of being similar to everybody else, and that leads to this herding behaviour. We have studied this quite a bit from a theoretical perspective, in various types of dynamic settings and in probabilistic forecasting. And now we are actually writing a paper where we imagine a crowd with various types of strategies, not only herding but the opposite of that. You might also have some people who want to be controversial, who want to stand out. So what happens in a crowd where you have a mix of people like this? This could happen in a practical setting when you have, say, more junior members of the company in the same room with more senior ones, who tend to be more controversial. Also, younger people tend to be a little bit more conservative than older ones. So what happens in a dynamic like this? It turns out that it’s a tractable problem and we can theoretically model it and draw some insights out of it.
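A toy illustration of the shrinkage behind herding, not the specific model from the papers described here: each forecaster reports a weighted average of their private belief and the consensus they anticipate, with a hypothetical weight lam controlling how much accuracy they give up for safety.

```python
import numpy as np

rng = np.random.default_rng(1)
true_prob = 0.7          # the "right" forecast, unknown to the forecasters
n_forecasters = 50

# Private beliefs: noisy but centred on the truth.
private = np.clip(true_prob + rng.normal(0, 0.15, n_forecasters), 0.01, 0.99)

# Each forecaster shrinks toward the anticipated consensus.
# lam = 0 means reporting the honest belief, lam = 1 means pure conformity.
lam = 0.6
anticipated_consensus = private.mean()
reported = (1 - lam) * private + lam * anticipated_consensus

print("spread of honest beliefs :", round(float(private.std()), 3))
print("spread of reported views :", round(float(reported.std()), 3))  # herding shrinks disagreement
```

The visible effect is that reported forecasts cluster much more tightly than the honest beliefs behind them, which is exactly the loss of information an aggregator has to contend with.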

Des Dearlove:

Presumably too, herding is the phenomenon you see in stock markets and to some extent currency fluctuations. Sometimes, I mean much as, clearly, the herding instinct to not stand out may not get you fired. Being prepared to go against the herd could also make you rich if you pick the right stock and you are the only one or you are the first one to do it. So I guess there’s two sides to that.

Ville Satopaa:

Yeah. There really is. And I think overall, just taking that to a broader picture, there is huge potential in taking this behavioural modelling of forecasting to, say, the stock market, which is ultimately governed by a lot of these behavioural aspects. But yeah. It is interesting, though, because I said that we’re now modelling crowds that are a mixture of things. We arrived at this research problem because when we were writing some of the earlier papers, purely from the herding perspective, and doing the literature review, we realised that you could find empirical results going either way. Some papers claiming, on the same data set, that analysts are herding, and another paper saying that, no, they’re actually anti-herding, they’re being controversial. And we thought, how can this be? One simple explanation is that perhaps the data set holds both types of people. So then we went in this direction.

Des Dearlove:

And you mentioned earlier about the aggregation problem in terms of particularly what we would call crowdsourcing. I guess it’s crowdsourcing as a form of forecasting. And how do you manage that? Is that something that we can make sense of because I think a lot of people would understand what crowdsourcing is and if there’s ways of making it more… Because it is a challenge if you ask, I mean, we’ve done surveys at Thinkers 50. You get people polar opposites. Some people say this was really good, and some people say the same thing was really, they didn’t like it at all. So it’s very difficult sometimes to make sense of that information.

Ville Satopaa:

Yeah. Because this is … there are a few things we could talk about here. Let me go all the way back to the very first problem because I think that might be quite interesting. When I was introduced to this aggregation problem, this must have been the second year in graduate school, what we found empirically was that taking the simple average of these predictions actually was not performing very well. And this was puzzling to us because averaging, throughout the history of statistics, has been the way to reconcile data. And on top of that, what we noticed is that we could improve it systematically by always shifting it closer to whatever extreme was closest to the average. Remember these are probability predictions. So the predictions are somewhere between zero and one, zero meaning 0% and one meaning 100%.

So suppose your average of all these predictions came out to be 20%. Now what we realised was that we could systematically shift it a little bit closer to zero. On the other hand, if your average happened to be, I don’t know, 65%, we’d shift it closer to 100%. So we make it more extreme. And that seemed to improve it. Now we were puzzled. We were like, why is this happening? This seems to systematically be coming out of our data set. So we started really investigating this from the theoretical perspective, because clearly the fundamentals driving the good behaviour of averaging were just not right here. Something in the underlying model was just not in alignment with reality. And eventually what we converged on was this idea around information diversity, which says that in our setting, when we were predicting these difficult future geopolitical events, it’s likely that people did not disagree with each other simply because of some symmetric noise around some consensus that we could capture with averaging.

Instead, they were disagreeing also because they may know various different things. That’s information diversity. I could be extremely well-informed about, say, politics in Finland. So my prediction, because of that, might be different from yours. So this is another source of diversity. Now that’s not how averaging models diversity. Instead, averaging comes from the early days of statistics, when we would have an instrument trying to measure, I don’t know, say the position of the planets, and what we would then do is take multiple measurements. Eventually we’d get some beautiful bell curve around the true position and would average all these points to get an accurate estimate of where the planet actually is. And this works because we end up cancelling out the too-low estimates with the too-high estimates. Now translating this to judgmental predictions, this would almost mean that we are all interpreting a common set of information, some of us underestimate the evidence and some overestimate it, and averaging could work really well here.

The problem is that that’s not what was going on here. People knew various different things. So what you needed to do was to take that average, which would interpret all differences as noise, even those bits that were actually information, and make it more extreme. And what this ultimately does is bring back all the information that was lost due to being interpreted as noise, and hence as harmful. It brings it back into the aggregate and makes it more confident. Why this way? Because if you take any prediction and keep infusing it with more and more information, ultimately you become clairvoyant and the prediction should converge to zero if the event doesn’t happen, or one if it happens. So this way we bring it back. That’s what was going on.

Now this is a technique we then developed, and it worked extremely well in that competition I mentioned earlier that was hosted by the intelligence community. It was one of the reasons why our team was able to win the tournament. And it was also one of the reasons why we were able to use lay forecasters, aggregate their predictions, extremize them and arrive at predictions that were much more accurate than even predictions made by intelligence agents with supposedly classified internal information about the events. So it worked extremely well.
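One common way to implement the extremizing idea described here is to average the probability forecasts and then amplify the average in log-odds space by a factor greater than one. This is a minimal sketch under that assumption; the factor a=2.0 and the toy forecasts are illustrative, not the values the team actually used.

```python
import numpy as np

def extremize(probs, a=2.0):
    """Average probability forecasts, then push the average toward the nearest extreme."""
    p_bar = float(np.mean(probs))
    log_odds = np.log(p_bar / (1 - p_bar))
    return 1 / (1 + np.exp(-a * log_odds))   # a > 1 moves the average toward 0 or 1

crowd = [0.15, 0.25, 0.20, 0.30, 0.10]       # hypothetical forecasts, simple average 0.20
print(round(float(np.mean(crowd)), 3))        # 0.2
print(round(extremize(crowd), 3))             # roughly 0.06, shifted toward zero
```

Because the average was below 50%, amplifying it in log-odds space pushes it closer to zero, restoring some of the confidence that simple averaging washes out.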

Des Dearlove:

Fascinating, really. I mean I’m not asking you to break any secret codes or anything, but what events are we talking about? Are we talking about political elections? Are we talking about economic movements? I’m trying to think what the intelligence community might be interested in tracking and predicting.

Ville Satopaa:

Yeah. So this data set is publicly available, so anybody can really see it. But it was a whole range of different kinds of events. They were quite carefully crafted as well, because when you’re running a tournament like this, you don’t want to give questions that are impossible or too easy. So we always talked about this golden middle ground of having true probabilities, whatever that means, somewhere between 10 and 90%. So there’s always some uncertainty left. Now what kind of events would they be? Yes, a lot of election results, a lot of questions about the conflicts at the time. Also, different kinds of economic crises. I’m sure there must’ve been a question about Brexit. I remember one question asking what is the probability that a Japanese whaling ship is going to enter the Australian territories by a specific date? So you see it’s a very big range of topics.

Des Dearlove:

Okay. It’s interesting, the applications to gambling. I mean you mentioned the casino example, but think of the range of bets these days that you can make. We were talking earlier to some colleagues at MIT about the way that with sports events now, you can place a bet on almost anything. So the whole forecasting world has become much more diverse. But you mentioned earlier noise. I see there’s some good questions coming in as well. I will get to the questions, but I just want to cover off this. You mentioned noise. I’m going to ask you to define what noise is. I mean I know you’ve got this, the BIN model, which is Bias, Information, and Noise. How do those factors affect the quality of forecasting?

Ville Satopaa:

Yes. So maybe it’s helpful here if I explain how we got off the ground with this one. It came about when Barb Mellers, who was a co-author on this, showed me a graph of predictions made by people. It was just a simple histogram. And she said to me right away, look how much variability there is. Okay. There’s a huge amount of noise here. And I said right away, “Well, going off of our earlier discussion, how do you know all that disagreement is noise? It could also be that there is a bunch of disagreement that happens because they know various different things.” So then that led us to thinking, well, how could we model this? How could we capture or separate what portion of the variability we see is noise and what portion is actually due to information asymmetry? And then soon we realised that not everything is about random variability, but there’s also systematic variability, which is bias.

And this led to the BIN model. Now in short, let me explain it slightly differently. What really happens there is that we explain variability in predictions. So if I make a prediction for one event, maybe tomorrow make another prediction and so forth, there can be variability in that. Now that variability we can decompose into two parts. One part is helpful, it correlates with the outcome. So this will be what we call information. As an example, suppose I’m completely uninformed, I have no idea about the topic you’re going to ask me about. I barely understand the question you’re asking me. It’s about some event that I never heard of. If I’m rational, I probably should predict 50%, because what else would I do? But now suppose I go and study a little bit and I get more informed. Then what starts happening is that my prediction starts fluctuating around 50%. And on average it would start getting closer and closer to the true outcome.

Eventually I would, say, have some crystal ball, I’d be clairvoyant. Well, then I would more or less tell you whether the event happens or not. So that’s the variability that’s pure information. But the thing with humans is that we are very fickle. We are not very good to begin with at converting the information we have into an actual probability. And along with that we have emotions. We can fixate on something momentarily today, or be over-optimistic another day. So we are extremely fickle creatures and this creates noise. This is unexplained variability in your predictions that is not helpful. It doesn’t come from actual relevant information, yet it makes our predictions fluctuate as well.

So this is what I call noise. Then there’s bias. Now bias is a systematic overestimation of probabilities. For instance, in our study that we performed on a large collection of these geopolitical forecasts, we found that people seem to be a little bit overly enthusiastic to see change. So we systematically allocate too high a probability to a change of the status quo. That’s a systematic shift. So then we can apply this model to real-world predictions and actually estimate these parameters, and see what actually drives those predictions.
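A toy simulation of the intuition behind this decomposition, not the actual estimation procedure of the BIN model: forecasts are built from a signal that genuinely drives the outcome (information), a constant shift (bias), and fluctuation unrelated to the outcome (noise). All parameter values here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# "Information": a signal that actually drives the outcome.
signal = rng.normal(0, 1, n)
outcome = (signal + rng.normal(0, 1, n) > 0).astype(int)

bias = 0.4                        # systematic shift, e.g. over-predicting change
noise = rng.normal(0, 0.8, n)     # fickleness unrelated to the outcome

# Forecasts built on the log-odds scale from information + bias + noise.
forecast = 1 / (1 + np.exp(-(signal + bias + noise)))

print("mean forecast vs base rate:", round(float(forecast.mean()), 3), "vs", round(float(outcome.mean()), 3))
print("corr(forecast, outcome)   :", round(float(np.corrcoef(forecast, outcome)[0, 1]), 3))
print("variance from information :", round(float(np.var(signal)), 3))
print("variance from noise       :", round(float(np.var(noise)), 3))
```

Only the signal component correlates with what actually happens; the bias shows up as the gap between the mean forecast and the base rate, and the noise inflates variability without adding accuracy.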

Des Dearlove:

Okay. Interesting. I’m going to take a couple of questions, okay?

Ville Satopaa:

Yeah.

Des Dearlove:

So I was going to ask you about AI anyway, but I’m sure we will get to that but one of these questions here. What criteria do you find most effective for evaluating the accuracy and usefulness of different forecasting models, especially in complex fields like healthcare or urban planning?

Ville Satopaa:

Okay. So the metric depends on the type of output your model gives. For instance, if your prediction is a probability, then what we need to use are what are known as proper scoring rules. Now these have been motivated from various different angles, but for the sake of this discussion, they are the ones that really measure how calibrated your prediction is. Remember, earlier I explained that a calibrated prediction is one that we can interpret like a probability. And they also measure how much information there is in the prediction. Now those are probably the two most important components you want to have in a probabilistic forecast. Why? Because again, this is something that you will then plug into your decision-making process. And the decision-making process then involves calculating, say, expected profits or expected mortality rates or expected losses. And in order to calculate this, you need to have an understanding of how likely the different future scenarios are. And that’s what a probabilistic forecast gives you.

And then once you can calculate the expected, say, profits, which could be a function of, I don’t know, the amount of production you’re going to do in your factory, you can then start optimising that with respect to the decision-making parameters that you have. So it really depends. But I would say if it’s a point prediction, you usually don’t go wrong if you’re looking at out-of-sample performance in terms of squared error loss. So looking at the distance between the prediction and the outcome and squaring that – and always looking at that out-of-sample.
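A minimal example of the squared-error idea applied to probability forecasts, i.e. the Brier score, which is one of the standard proper scoring rules; lower is better, and it rewards forecasts that are both calibrated and informative. The forecasts and outcomes below are made up.

```python
import numpy as np

def brier_score(probs, outcomes):
    """Mean squared distance between probability forecasts and 0/1 outcomes (lower is better)."""
    probs, outcomes = np.asarray(probs, float), np.asarray(outcomes, float)
    return float(np.mean((probs - outcomes) ** 2))

outcomes = np.array([1, 0, 1, 1, 0])
informed = np.array([0.9, 0.1, 0.8, 0.7, 0.2])   # calibrated and informative
hedging  = np.full(5, 0.5)                        # always says 50%

print(brier_score(informed, outcomes))   # 0.038
print(brier_score(hedging, outcomes))    # 0.25
```

The informed forecaster scores much better than the one who always hedges at 50%, which is exactly the reward structure a proper scoring rule is meant to provide.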

Des Dearlove:

Okay. I’m going to take another question. This talks to the AI point. What emerging trends do you see shaping the future of forecasting in the next decade, particularly in relation to technology advancements like AI and machine learning?

Ville Satopaa:

No. This is interesting. Now obviously in the space of AI, we are moving into very interesting territory where we’re starting to build more and more information into these models. Take for instance these generative AI models, the large language models that people are talking a lot about on the internet. These are now trained on almost all the text ever generated by human beings. That seems like a lot, but it’s actually not that much information, because if you compare it to how much information humans process just in the early years of our lives, that could be many, many times all of it. And this happens because it turns out text is not a very compact format for information. We all have this saying that an image or a picture is worth a thousand words. So if that’s true, what about a video?

Imagine the amount of information that is still sitting out there that we’re not harnessing. Now, the world is moving in that direction. Now why am I making this comment? Because I want to tie it back a little bit to this judgmental forecasting of events that are almost unique. It’s interesting to think that, well, humans can make predictions like this. So clearly humans draw information from somewhere and then they are able to reconcile that into a reasonable judgement. Why can’t we train a machine to do this? What is so mysterious about it? And I think it really boils down to this argument that, look, we have a long life experience behind us. We understand cause and effect relationships in this world, and we have a long life experience in our back pocket that is our training data set.

But as the world moves more and more in this direction where we start harnessing more and more of basically everything that has been digitised into these machine learning models, could there be a day when something like a large language model can make predictions even of these unique events better than humans? Maybe. We’re not quite there yet. Some researchers have tried doing that and it didn’t really work all that well. But this could be something that we’re going to see. For now, in the short or near-term future, what I think is going to be very useful are these hybrid-type methods. Humans still have the advantage of drawing on all this contextual information, even from seemingly unrelated areas, bringing it together to make predictions about these kinds of unique events. Whereas machines are very good at crunching huge amounts of data very rigorously.

And they could be biased of course, but oftentimes in forecasting they’re a lot less biased than humans are. They’re a lot less noisy than humans are. So in another research project, what we are looking at, as an example… Let me even give you the context because I think it makes it more concrete. We’re teaming up with a major pharmaceutical company and we are helping them improve their predictions of clinical trial outcomes. Now they obviously want to have these predictions because then they know whether to go ahead with the testing, which could be very expensive, or whether they should pull the plug. So there, what we have done now, which I found very effective, is to have almost a machine-human-machine layered approach where we start with the machine prediction. This could be as simple as looking at all the past drugs that have gone through clinical trials and taking some base rate of outcomes of drugs that were similar to the one we’re looking at right now.

That prediction is given to a group of humans who are experts, and they will then update it in the light of contextual information about the very specific drug we’re looking at right now. They will update that. But then again, the issue now is that humans are very biased. They’re very noisy even when they’re updating something. They have all this information but they don’t necessarily know how to incorporate it exactly. So then we bring in another machine at the very end that irons out and fixes that human update, so that ultimately what you get is the best of both worlds, harnessed rigorously into a calibrated prediction of what’s going to happen in these clinical trials. So I think the hybrid is a good bet on where things are going.
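A sketch of what such a machine-human-machine layering could look like, with toy data throughout: a base rate from historically similar cases, a human expert’s update, and a final machine recalibration step fitted on past human forecasts and outcomes. The logistic recalibration step and all numbers are assumptions for illustration, not the method actually used with the pharmaceutical company.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

def logit(p):
    return np.log(p / (1 - p))

# Step 1 (machine): a base rate from historically similar cases.
base_rate = 0.30                              # hypothetical share of similar drugs that passed

# Step 2 (human): an expert revises the base rate with drug-specific context.
expert_forecast = 0.55                        # hypothetical update of the 0.30 base rate

# Step 3 (machine): recalibrate the human update using past human forecasts and outcomes.
past_human = rng.uniform(0.1, 0.9, 500)                                   # toy history
past_outcome = rng.binomial(1, np.clip(0.8 * past_human + 0.05, 0, 1))    # humans slightly off

recalibrator = LogisticRegression()
recalibrator.fit(logit(past_human).reshape(-1, 1), past_outcome)

final = recalibrator.predict_proba([[logit(expert_forecast)]])[0, 1]
print("machine base rate :", base_rate)
print("human update      :", expert_forecast)
print("recalibrated final:", round(float(final), 3))
```

The final layer only needs a record of past human forecasts and the eventual outcomes; it learns how the humans tend to distort their updates and irons that out before the number goes into decision-making.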

Des Dearlove:

It’s kind of a machine-human-machine sandwich really.

Ville Satopaa:

Yeah. I didn’t want to say, but that’s what I call it as well to my co-authors.

Des Dearlove:

Okay. All right. No, just checking.

Ville Satopaa:

Yes.

Des Dearlove:

Another question here, which I think relates to the one-off events, is: do you model for black swans? You know what a black swan is. A very low probability, but high-impact event.

Ville Satopaa:

Yeah. So oftentimes these are in there implicitly, because even when you’re forecasting… As an example, we wrote a paper where we were looking at large amounts of sales data coming out of Walmart, and here I think they had thousands of different SKUs. We had historical data on their sales and then the goal was to predict what the sales would look like in the next few months. And what we did here was that we made probabilistic predictions. So you have a full distribution, you can imagine it like a bell curve, that describes where we think it’s going to be, and then you can plug this into all your demand and production planning. Now in that statement there is of course already a prediction about black swans, but it’s in the tails. So when I make that prediction, I’m saying, well, however we define a black swan in that context, could it be that it’s in the one percentile, in either one of the tails?

That probabilistic prediction gives you an estimate of that. So that’s all integrated in, especially when you’re doing these Bayesian statistical models, which are basically just another way of saying probabilistic modelling of the world. Now it’s there, but I admit it is very difficult. Because oftentimes, just by definition, we observe these rare events very rarely, so we don’t actually have a whole lot of data to train our model to make those rare event predictions. So it is challenging. Even here, what could help is that you can bring a human into the loop, whether that is building more structure into your statistical model or whether you just have the human update the full prediction before it’s put into decision-making. But again, humans might be able to help here to compensate a little bit for the lack of training data for the statistical model.
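A small illustration of the point that a full probabilistic forecast already carries a statement about extreme outcomes in its tails; the normal predictive distribution for demand and the numbers here are purely hypothetical.

```python
from scipy.stats import norm

# Hypothetical probabilistic demand forecast: mean 1,000 units, standard deviation 150.
mean, sd = 1000, 150

# Probability the predictive distribution assigns to an extreme shortfall.
p_shortfall = norm.cdf(600, loc=mean, scale=sd)

# The 1st-percentile scenario implied by the same forecast.
q01 = norm.ppf(0.01, loc=mean, scale=sd)

print(f"P(sales < 600 units)    = {p_shortfall:.4f}")
print(f"1st-percentile scenario = {q01:.0f} units")
```

Whether those tail probabilities are trustworthy is exactly where the scarcity of rare-event data bites, which is why a human check on the tails can still be valuable.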

Des Dearlove:

I mean, another thing that occurs to me is that when we’re talking about AI, if you’re training the AI with data that already has bias built into it – I think we have to acknowledge that a lot of data, whether it’s to do with assumptions about people’s education, I’m thinking of a very practical application, for instance, making hiring decisions – what I’m saying is the AI is going to potentially have a lot of bias built in. It might be a more efficient processing of biased information. Does that make sense?

Ville Satopaa:

Yeah. No. It makes sense. So that’s why I was careful to choose my words and say that machine learning models can also be biased. I mean, ultimately they are reflections of the quality of the data that we feed in, and if that data involves human biases, they will seep through to the machine predictions as well. So that’s definitely there. I agree.

Des Dearlove:

And how can we therefore over time, is there a way to filter for bias? Is there a way to use some of the forecasting models to improve the quality of the data that we’re feeding to the machines which are then going to be learning? Can it have a cleansing purifying cycle? Is there a way to do that?

Ville Satopaa:

Yeah, so I guess it depends a little bit on what context we’re talking about. So suppose we’re back in the forecasting world. There it would just be a systematic shift. The beauty about bias is that it’s often seen as a systematic error. So if it was systematic in the past, hopefully that will continue to hold in the future. What we can do then is actually learn this bias, the systematic bias, from the past. And then whenever we make predictions, let the initial predictions be biased this way, but before we report them, we’re going-

Des Dearlove:

Copy.

Ville Satopaa:

… to take off the bias, so we correct that out. So that’s in a forecasting setting. It’s quite easy to correct. Actually, let me just tell you a quick story there, because it relates to the clinical outcomes research that we did. Initially, when we looked at this data, we had these humans who had made these predictions, and when we looked at them in aggregate they were extremely calibrated. Meaning that they were very close to matching the true frequencies in the real world. And I remember the desperation that my co-author and I had, because we realised that there was nothing systematic going wrong here, which meant, how can we improve them? How can we get a research paper out of this when we cannot do much to improve them? And then we started splitting the data a little bit. And what emerged was that they did have a systematic bias, but it only emerged towards the latter phases of the clinical trial, where the stakes are much, much higher.

As you progress through the four phases of a clinical trial, the trial gets bigger and bigger, more and more expensive. So obviously a human again starts to have these personal agenda-like ideas. Now suddenly it’s hundreds of millions at stake. Maybe I don’t update it so much. Maybe I’ll just keep the machine prediction as it is. And this is what we noticed: towards the latter phases, they were actually not updating as much as they should have been, which gave us a handle to correct that using a machine model, and hence we could ultimately improve them quite a bit.
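A minimal sketch of the debiasing idea described here: learn the systematic shift from past forecasts and remove it from new ones before they are reported. Applying the correction on the log-odds scale is one reasonable choice for illustration, not necessarily the approach used in the research, and the toy data are made up.

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + np.exp(-x))

# Toy history: forecasts that were systematically too high relative to outcomes.
past_forecasts = np.array([0.70, 0.60, 0.80, 0.65, 0.75])
past_outcomes  = np.array([1, 0, 1, 0, 1])

# Learn the systematic shift on the log-odds scale.
bias = logit(past_forecasts).mean() - logit(past_outcomes.mean())

def debias(p):
    """Remove the learned systematic shift from a new forecast before reporting it."""
    return float(inv_logit(logit(p) - bias))

print(round(debias(0.72), 3))   # corrected downward, to roughly 0.62
```

Because the shift is assumed to be systematic, the same correction learned from the past can be applied to every new forecast before it goes into decision-making.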

Des Dearlove:

No. Interesting. We’ve talked a little bit about noise, but it seems to me that social media must have had a big impact on the ability to forecast, say the outcome of political elections. Whether it’s bots introduced by foreign powers – let’s go back to the intelligence community – but clearly whether it’s Brexit, we now understand that perhaps there was a bit of interference in that referendum in terms of creating echo chambers, creating… I’m assuming this falls into the category of noise. I mean, how can we clean that up? I mean, what can we do about that, if anything?

Ville Satopaa:

Yeah. So I guess there are two things happening here. One is that elections are interesting because they’re like a self-fulfilling prophecy as well. It’s ultimately the people involved who will decide the outcome. So you can start influencing the future outcome. A lot of the things we were looking at were events that weren’t really in the hands of the forecasters; they were almost considered to be external. Now, if you wanted to forecast elections, we would need to learn to capture this manipulation of the human crowd as well and build that into our model, so we could start to capture it. Look, this is another factor that needs to go into the model, that this manipulation is happening. In the past it has had this kind of an effect on elections, so let’s build that in. Even though the polls are saying this, social media is doing that, so when I put it together, the real accurate prediction should be this, if that makes sense. So that should at least give us a little bit of a warning of what’s going to happen.

Des Dearlove:

I mean something that fascinates me too is the pandemic. We were completely taken by surprise, but we shouldn’t have been really. I mean, there must’ve been people who had forecast it. Well, I’m sure there were people that had forecast that there was going to… It was just a matter of time before we had a pandemic. How do you explain that? How do we manage to be so blindsided?

Ville Satopaa:

Yeah. No, it is a very good question. I think there were many different approaches applied to forecasting, in particular, the way the pandemic was going to progress. And there were a lot of epidemiologists applying different kinds of models, and as far as I know they didn’t work that well. And that might be saying something about the assumptions that these models make and how they actually align with the real world. One group that I know actually did quite well there was this group of superforecasters, which is a group of forecasters that actually goes all the way back to that story I mentioned from the second year of my graduate school – that tournament organised by the intelligence community. As we had thousands of forecasters making predictions every year, what we did was cherry-pick the top 2% of performers and designate them to this elite group called superforecasters. The group lived on even after the tournament was over. And I think even today they still get together in Boston and New York and they meet.

So these are people who are just extremely good at forecasting. And when I meet these people, I mean, oftentimes I’m impressed, clearly, by their logical thinking. And I always think, though this might be my personal view, that they are good at forecasting because they’re simply so good at taking information that is out there and knowing how to convert that into an actual probability. So they’re very good at this. There are some tricks one can use. Obviously you need to be open-minded, but you also always have to have this rigorous way of questioning yourself, asking why could I be wrong, to avoid these confirmation biases. But this group was doing quite well. So there were some success stories there as well. But it’s a tough problem. I mean, again, maybe it boils down to the fact that it was quite a new situation. We’d never seen anything quite like it.

Des Dearlove:

It was interesting to see that some of the Asian countries that had experienced pandemics in the recent past seemed, certainly in the early stages… I mean, perhaps some locked down earlier because they had a different forecasting process.

Anyway, I’m afraid we’ve run out of time. But I think we’ve absolutely amply demonstrated the importance of forecasting and predictions and improving forecasts. So a huge thank you to you Ville and everybody else. Please do join us on the 1st of May at the same time when our guest will be Sophie Bacq, professor of social entrepreneurship at IMD.

Ville Satopaa:

Thanks a lot, Des. This was a lot of fun.

Des Dearlove:

You’re welcome.

Ville Satopaa:

Time flew by.

Des Dearlove:

Wonderful.
