The Algorithm
There are two keys to consider when using an algorithm to give you an edge in a contest. It must be:
More accurate than the consensus.
Different from your competition.
Seems obvious, yes, but it’s difficult to put into practice. Think about it: you can’t win if your picks aren’t better than everyone else’s, and you can’t win if your picks match your competitors’. As we dive into the details of creating a model, I suggest you keep these two keys in the back of your mind. Remember, you are building a model to add value to your selections—not to match “expert” opinion.
To start, I want to describe a few different algorithm types:
Power ratings
Regression
Play-by-play
Aggregation
A power ratings algorithm is one of the most basic model types at your disposal. It works by comparing the relative power rating of one team against that of their opponent and adjusting for home-field advantage, among other factors. This difference yields the expected point spread. This is the model type I’ve used and refined over the years. Example: KenPom’s College Basketball Ratings
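The core of a power ratings model can be sketched in a few lines. The ratings and the 2.5-point home-field value below are illustrative assumptions, not calibrated numbers:

```python
# Minimal power-ratings spread: ratings are expressed in points relative
# to an average team, so the difference is itself a point spread.
def expected_spread(home_rating, away_rating, home_field_adv=2.5):
    """Return the expected margin of victory (positive = home team favored)."""
    return (home_rating - away_rating) + home_field_adv

# Example: home team rated +4.0, away team rated +1.5
spread = expected_spread(4.0, 1.5)  # 2.5-point rating gap + 2.5 HFA = 5.0
```

If the resulting number differs meaningfully from the posted line, that gap is the model's signal.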
The regression algorithm is another common model type. These models consider any number of key characteristic variables (e.g., points for (PF), points against (PA), yards against, turnovers, sacks allowed, etc.) to describe a specific team or matchup. Users of a regression model apply weights to the individual variables to calibrate the model as needed. These weights can be, and often are, adjusted to maximize predictive capability. Building this model is a step up in complexity from the power ratings model and can be more useful, depending on the quality of available data.
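Structurally, a regression model is just a weighted sum of team statistics. The variables and weights below are hypothetical, chosen only to show the shape of the model; in practice you would fit them against historical results:

```python
# Illustrative regression-style strength score. Weights are placeholders,
# not calibrated values -- fitting them is the real work.
def team_strength(stats, weights):
    """Linear combination of team statistics."""
    return sum(weights[k] * stats[k] for k in weights)

weights = {"points_for": 0.05, "points_against": -0.05,
           "turnover_margin": 1.2, "sacks_allowed": -0.3}

team_a = {"points_for": 28.0, "points_against": 20.0,
          "turnover_margin": 0.5, "sacks_allowed": 2.0}
team_b = {"points_for": 21.0, "points_against": 24.0,
          "turnover_margin": -0.3, "sacks_allowed": 3.5}

# A matchup spread is the difference in modeled strength.
spread = team_strength(team_a, weights) - team_strength(team_b, weights)
```

Swapping the hand-set weights for coefficients from an ordinary least-squares fit is the usual next step once you have a few seasons of data.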
As game data becomes richer and more readily available, the most sophisticated models leverage play-by-play data, contextualized by situation, to infer a team’s ability above that of an average team on a “per-play” basis. From this information, full-game point spreads can be derived. Typically, these models are complex and assumed to be highly proprietary. Example: Football Outsiders’ DVOA Team Efficiency Ratings
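A drastically simplified sketch of the per-play idea: measure each team's scoring efficiency relative to league average, then scale the per-play edge up to a full game. Real models like DVOA adjust every play for down, distance, field position, and opponent; the numbers here are purely illustrative:

```python
# Toy per-play efficiency model: a team's edge over an average team,
# per play, scaled by a typical number of offensive plays per game.
def efficiency_vs_average(team_points_per_play, league_points_per_play):
    """Points added above an average team, per play."""
    return team_points_per_play - league_points_per_play

def game_spread(team_a_eff, team_b_eff, plays_per_team=65):
    """Scale per-play edges up to a full-game point spread."""
    return (team_a_eff - team_b_eff) * plays_per_team

# Team A averages 0.42 pts/play, Team B 0.38, league average 0.40:
spread = game_spread(efficiency_vs_average(0.42, 0.40),
                     efficiency_vs_average(0.38, 0.40))
```

Even this toy version shows why per-play models are powerful: small, stable per-play edges compound over a full game's worth of snaps.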
The aggregation model is another basic, yet effective, method to handicap games. Using a variety of available sources, this model type aggregates, or combines, the different spread values into a single value. Low correlation between data sources has been shown to dramatically improve the prediction accuracy of the aggregated value. A warning though: if the data sources are moderately correlated, the aggregated result may show little variance from the consensus “expert” opinions, which jeopardizes the second key I mentioned at the beginning of the post. I think this model type is fairly straightforward, but allow me to explain how it works. If ESPN’s FPI thinks the Denver Broncos will win by 5 and Terry Bradshaw on Fox NFL Sunday likes DEN by 2 (assume these are the only two sources I’m using), this aggregate model will handicap the Broncos to win by 3.5.
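The Broncos example reduces to a simple average. A minimal sketch, using the two sources from the text:

```python
# Aggregate independent spread estimates into a single handicap.
def aggregate_spreads(spreads):
    """Simple average of spread estimates from multiple sources."""
    return sum(spreads) / len(spreads)

# ESPN FPI: DEN by 5; Terry Bradshaw: DEN by 2
consensus = aggregate_spreads([5.0, 2.0])  # -> 3.5
```

A natural refinement is a weighted average that gives more trust to historically accurate sources, but as the warning above notes, the more your inputs correlate with the public consensus, the less differentiation the output buys you.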
Now that we are aware of the different techniques used to forecast game performance, let’s discuss where to begin, what game-specific adjustments to make, and most importantly, how to adjust for week-to-week results.
It should be no surprise that the more precise your Week 1 rankings are, the better your algorithm will perform throughout the season. So given that, how do we start off on the right foot? Well, it’s a mix of art and science.
My process is as follows: I’ll start with an assessment of each team’s ability at the end of the prior season, adjust slightly for off-season player/coach transactions, modify team ratings based on the Pythagorean winning percentage from the prior season, and finally regress outlier ratings back to the mean.
The Pythagorean winning percentage can help to identify teams that have either overachieved or underachieved the previous season. It’s calculated by the following equation:
Pythagorean Win% = PF^2 / (PF^2 + PA^2)
Teams that won more than one game above their Pythagorean expectation tend to regress to the mean the subsequent year. The opposite is true for teams that were “unlucky,” winning fewer games than expected.
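The formula above translates directly to code. The points totals and the 16-game season length below are illustrative assumptions:

```python
# Pythagorean winning percentage with exponent 2, per the formula above.
def pythagorean_pct(points_for, points_against, exponent=2):
    pf, pa = points_for**exponent, points_against**exponent
    return pf / (pf + pa)

def expected_wins(points_for, points_against, games=16):
    """Expected season wins implied by points scored and allowed."""
    return pythagorean_pct(points_for, points_against) * games

# A team that scored 450 and allowed 350 over a 16-game season:
exp_wins = expected_wins(450, 350)
# If the team's actual win total beat exp_wins by more than ~1 game,
# treat it as an overachiever and regress its rating toward the mean.
```

Some analysts use exponents other than 2 (Pythagenport-style variants); the text's formula uses 2, so that is the default here.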
Ok, now that we’ve got a good Week 1 starting point, let’s turn to the game-specific adjustments to account for home field advantage, rest, travel, and motivation. Keep in mind, there is an art to this.
It’s widely accepted that home-field advantage is worth roughly 2.5 points to the home team. Some choose to blanket this adjustment across all teams/stadiums. I opt to assign stadium-specific adjustments: up to 4 points for teams with historically strong home-field performance, and down to zero for teams with historically poor performance at home.
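In code, stadium-specific home-field advantage is just a lookup with a league-wide fallback. The team abbreviations and point values below are illustrative, not the author's actual ratings:

```python
# Stadium-specific home-field adjustments; unlisted teams get the
# widely accepted league default of 2.5 points. Values are placeholders.
DEFAULT_HFA = 2.5

HOME_FIELD_ADV = {
    "SEA": 4.0,  # historically strong home performance (hypothetical value)
    "KC": 3.5,   # hypothetical value
    "JAX": 0.0,  # historically weak at home (hypothetical value)
}

def hfa(team):
    """Home-field adjustment in points for a given team."""
    return HOME_FIELD_ADV.get(team, DEFAULT_HFA)
```

This value then slots directly into the power-ratings spread calculation in place of a flat 2.5.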
Based on data I’ve analyzed since 2014, teams with more than a week’s rest (e.g., coming off a bye or having played the previous Thursday night) covered the spread 10% more often than teams with abbreviated rest (a Sunday night game after a Monday night game). I adjust teams coming off a long rest slightly upward, and if I’m debating between two teams, I tend to favor the one with more rest. Performance off a bye week is very interesting, though. Some NFL coaches seem to have a mythical ability to prepare their teams for victory given the extra time (Andy Reid and Bill Belichick come to mind), and simple trend data would seem to support this. But how applicable are those statistics, really? Be careful when applying a specific statistic to a general situation. From what sample and under what assumptions were those statistics calculated? It’s better to apply a generalized statistic to a specific situation than the opposite.
While I haven’t done extensive research in the area of travel, I’ve seen little evidence to suggest that traveling an extended distance has any effect on a team’s probability of covering the spread. Additionally, motivation can be highly overrated. I caution against substantial adjustments for motivation—just because a team “needs” to win doesn’t mean they’ll come through.
The point is, making game-specific line adjustments beyond home-field advantage can be perilous.
Handicap wisely!