Algorithm to calculate the odds of a team winning a sports match given full history

John Shedletsky · Nov 6, 2014 · Viewed 26.2k times

Assumptions:

  • The teams never change
  • The teams don't improve in skill
  • The entire history of each team's performance against some subset of other teams is known
  • The number of games played between teams is large, but potentially sparse (not every team has played every other team)

For example:

I have a long list of match outcomes that look like this:

Team A beats Team B
Team B beats Team A
Team A beats Team B
Team C beats Team A
Team A beats Team C

Problem:

Predict the correct betting odds of any team beating any other team.

In the example above, maybe we conclude that A should beat B 66% of the time. That is based on direct observation and is pretty straightforward. However, finding the probability that C beats B seems harder. They've never played each other, yet it seems most likely that C > B, albeit with low confidence.
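The direct-observation part can be sketched in a few lines. This is just win counting over the example history above; `direct_estimate` is a hypothetical helper name, and it deliberately returns `None` for pairs like C vs. B that have no head-to-head games.

```python
from collections import Counter

# Match history from the example above: (winner, loser) pairs.
matches = [
    ("A", "B"),
    ("B", "A"),
    ("A", "B"),
    ("C", "A"),
    ("A", "C"),
]

wins = Counter(matches)

def direct_estimate(x, y):
    """Estimate P(x beats y) from direct head-to-head results only."""
    w, l = wins[(x, y)], wins[(y, x)]
    total = w + l
    if total == 0:
        return None  # no direct observations, as with C vs. B
    return w / total

print(direct_estimate("A", "B"))  # 2 wins in 3 games -> 0.666...
print(direct_estimate("C", "B"))  # None: they never played
```

This reproduces the 66% figure for A vs. B and makes the C-vs.-B gap explicit, which is exactly the hard part of the problem.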

Research I've Done:

I've read a fair bit about different ranking systems for games of skill, such as the Elo and Glicko rating systems for Chess. These fall short because they make assumptions about the probability distributions involved. For example, Elo's central assumption was that the chess performance of each player in each game is a normally distributed random variable. However, according to Wikipedia, there are other distributions that fit the existing data better.

I don't want to assume a distribution. It seems to me that with 10,000+ match results on hand that I should be able to either deduce the distribution from the evidence (I don't know how to do this), or use some sort of reinforcement learning scheme that doesn't care what the distribution is.

Answer

Julian · Nov 13, 2014

You want to make a best estimate of a probability (or multiple probabilities) and continuously update your estimate as more data become available. That calls for Bayesian inference! Bayesian reasoning is based on the observation that the probability (distribution) of two things, A and B, being the case at the same time is equal to the probability (distribution) of A being the case given that B is the case times the probability that B is the case. In formula form:

P(A,B) = P(A|B)P(B)

and also

P(A,B) = P(B|A)P(A)

and hence

P(A|B)P(B) = P(B|A)P(A)

Take P(B) to the other side and we get the Bayesian update rule:

P(A|B)' = P(B|A)P(A)/P(B)

Usually A stands for whatever variable you are trying to estimate (e.g. "team x beats team y") while B stands for your observations (e.g. the full history of matches won and lost between teams). I wrote the prime (i.e. the apostrophe in P(A|B)') to signify that the left hand side of the equation represents an update of your beliefs. To make it concrete: your new estimate of the probability that team x will beat team y, given all observations so far, is the probability of making those observations given your previous estimate, times your previous estimate, divided by the overall probability of seeing the observations you have seen (i.e. given no assumptions about relative strength between teams; one team winning most of the time is less likely than both teams winning about equally often).

The P(A|B)' from the left hand of the current update becomes the new P(A) on the right hand of the next update. You just keep repeating this as more data come in. Typically, in order to be as unbiased as possible, you start with a completely flat distribution for P(A). Over time P(A) will become more and more certain, although the algorithm is fairly well able to deal with sudden changes of the underlying probability that you're trying to estimate (e.g. if team x suddenly becomes much stronger because a new player joins the team).

The good news is that Bayesian inference works well with the beta distribution, which ElKamina also mentioned. In fact the two are often combined in artificial intelligence systems that are meant to learn a probability distribution. While the beta distribution in itself is still an assumption, it has the advantage that it can take many forms (including completely flat and extremely spiky), so there's relatively little reason to be concerned that your choice of distribution might be affecting your outcome.
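The beta-plus-Bayes combination described above has a very convenient closed form: with a Beta(α, β) belief over "team x beats team y" and win/loss observations, the update rule reduces to incrementing α on a win and β on a loss. Here is a minimal sketch, assuming a flat Beta(1, 1) prior and a hypothetical sequence of head-to-head results:

```python
# Sequential Bayesian update with a Beta prior (conjugate to win/loss data).
# Start flat, as the answer suggests, then fold in one match at a time.

alpha, beta = 1.0, 1.0  # Beta(1, 1) = completely flat prior over P(x beats y)

# Hypothetical match results between x and y: True means x won.
results = [True, False, True, True, False, True]

for x_won in results:
    if x_won:
        alpha += 1  # one more observed win for x
    else:
        beta += 1   # one more observed loss for x

# The posterior mean is the current best estimate of P(x beats y).
posterior_mean = alpha / (alpha + beta)
print(posterior_mean)  # (1 + 4) / (2 + 6) = 0.625
```

Note that each pass through the loop is exactly the P(A|B)' → P(A) hand-off from the update rule: the posterior after one match becomes the prior for the next.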

One piece of bad news is that you still need to make assumptions, apart from the beta distribution. For example, suppose you have the following variables:

A: team x beats team y

B: team y beats team z

C: team x beats team z

and you have observations from direct matches between x and y and from matches between y and z, but not from matches between x and z. A simple (though naive) way to estimate P(C) could be to assume transitivity:

P(C) = P(A)P(B)

Regardless of how sophisticated your approach is, you'll have to define some kind of structure of probabilities to deal with the gaps as well as the interdependencies in your data. Whatever structure you choose, it will always be an assumption.
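To make the transitivity assumption concrete, here is the formula above applied to hypothetical direct estimates (the 0.8 and 0.7 figures are made up for illustration). Note the built-in pessimism of this structure: the product is always smaller than either factor, so chained estimates get weaker with every hop.

```python
# Naive transitive estimate P(C) = P(A) * P(B), per the formula above.
# This encodes an assumption about structure, not a law of probability.

p_a = 0.8  # hypothetical direct estimate: P(x beats y)
p_b = 0.7  # hypothetical direct estimate: P(y beats z)

p_c = p_a * p_b  # inferred P(x beats z), never directly observed
print(round(p_c, 2))  # 0.56
```

A Bayesian network, as mentioned below, is one principled way to replace this ad-hoc product with an explicit dependency structure.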

Another piece of bad news is that this approach is plain complicated and I cannot give you a full account of how to apply it to your problem. Given that you need a structure of interdependent probabilities (probability of team x beating team y given other distributions involving teams x, y and z), you may want to use a Bayesian network or related analysis (for example a Markov random field or path analysis).

I hope this helps. In any case, feel free to ask for clarifications.