Performance Results
The traditional surgeon model of teaching is colloquially referred to as the “see one, do one, teach one” model. That model is the spirit behind this blog. This is a work in progress. I hope you find my insights actionable. To that end, I’m committed to continuous improvement and to keeping the edge that’s proven successful over the last few years.
In order to improve performance, you must measure it.
In this post, I will:
Illustrate the inherent uncertainty in picking NFL games using data from 2010,
Define success using the historical results of contest winners, and
Share the performance results of my algorithm.
The U.S. Army War College described the post-Cold War geopolitical situation as volatile, uncertain, complex, and ambiguous. Strategic leaders across the business landscape use this VUCA concept to set the proper course of action depending on the environment ahead. I argue that almost any decision can benefit from an understanding of its VUCA environment.
Shifting the analogy back to picking NFL games: with so many moving parts and game plans, the game of football is clearly complex. With injuries and trades commonplace, it’s also volatile. Similar cases can be made for game results being ambiguous and uncertain. The dominant VUCA dimension will influence which type of algorithm you choose to create. If you think the complexity of football is the primary driver of variable outcomes, you may seek to develop a characteristic regression model. In contrast, if you feel the environment is well defined but uncertain, you may opt for an aggregation model that leverages disparate observations to broaden your understanding of the environment.
To illustrate the uncertainty in NFL game outcomes, let’s begin with the assumption that the Vegas closing line point spread is the clearest indicator of comparative team ability. Vegas undoubtedly has its own handicapping models, but it also aggregates information from the bettors it serves, particularly the sharp bettors. The histogram below illustrates the delta between the Vegas line and the final score margin. (N.B. The bin size is 3 points.)
This plot shows the inherent variability in picking winners based on the point spread alone. Only 25% of games finished within a field goal of the closing line. A little over half (~55%) ended within a touchdown of the closing line. And that doesn’t even account for which side won the game!
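If you’d like to reproduce a plot like this, a minimal sketch in Python follows. The file name and column names (games.csv, vegas_close, home_margin) are hypothetical placeholders for whatever line and score data you have on hand; the sign convention assumes the closing spread is quoted for the home team, negative when the home team is favored.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical inputs: one row per game, with the closing spread for the
# home team (negative = home favorite) and the final home margin of victory.
games = pd.read_csv("games.csv")  # assumed columns: vegas_close, home_margin

# Delta between the closing line and the actual result;
# 0 means the game landed exactly on the number.
games["delta"] = games["home_margin"] + games["vegas_close"]

# Histogram of the deltas in 3-point (field-goal-sized) bins.
plt.hist(games["delta"], bins=range(-33, 36, 3), edgecolor="black")
plt.xlabel("Final margin minus closing line (points)")
plt.ylabel("Number of games")
plt.show()

# Share of games that finished within a field goal / touchdown of the line.
within_fg = (games["delta"].abs() <= 3).mean()
within_td = (games["delta"].abs() <= 7).mean()
print(f"Within 3 points: {within_fg:.1%}, within 7 points: {within_td:.1%}")
```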
I wrote in an earlier post that if you get more than 52.4% of your ATS picks correct, you break even in Vegas (assuming -110 vig). If you consistently pick winners at a rate greater than 55%, you are world-class. If you can get more than 60% of your picks correct over the course of the season, you will likely win your contest.
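As a quick sanity check on that 52.4% figure, here is the standard break-even calculation at -110 odds (risk 110 units to win 100):

```python
# Break-even win rate at -110 odds: a winning pick returns +100 units,
# a losing pick costs -110. Solve p * 100 - (1 - p) * 110 = 0 for p.
risk, win = 110, 100
break_even = risk / (risk + win)
print(f"Break-even ATS rate: {break_even:.1%}")  # -> 52.4%
```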
Based on historical data, the winning team covers the spread roughly 75-80% of the time. Think about that for a second…just pick the winner of the game, it’s that simple. After all, the algorithm outputs an expected win probability for each matchup. It turns out that, for a given week, the average expected win probability of the favorites is near 65%. If we combine this knowledge with the rate at which the winning team covers, we can calculate the average expected cover rate: multiplying the expected win probability by the winning-team cover base rate gives 65% × 75% ≈ 49% at the low end of that range, and only 65% × 80% = 52% at the high end. Uh oh. Neither figure clears the 52.4% break-even!
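The same back-of-the-envelope arithmetic in code, spanning both ends of the historical cover range:

```python
# Expected cover rate = average favorite win probability
#                       x winning-team cover base rate.
p_win = 0.65
for cover_rate in (0.75, 0.80):
    print(f"base rate {cover_rate:.0%}: "
          f"expected cover rate {p_win * cover_rate:.1%}")
# -> base rate 75%: expected cover rate 48.8%
# -> base rate 80%: expected cover rate 52.0%
```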
Unless you can find value outside of the raw algorithm, you’re going to struggle to pick winners consistently. Empirically, this can be seen in the trend plot below, which compares my raw algorithm and my “official” contest plays against Colin Cowherd’s Blazin’ 5 on Fox Sports.
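The trend plot itself is just a cumulative win-rate comparison. Here is one way you might build such a plot, assuming you’ve logged each system’s picks as 1s (ATS wins) and 0s (ATS losses); the data below is made up purely for illustration:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical pick logs: 1 = ATS win, 0 = ATS loss, one row per pick.
picks = pd.DataFrame({
    "raw_algorithm":  [1, 0, 1, 1, 0, 0, 1, 1, 0, 1],
    "official_plays": [1, 1, 0, 1, 1, 0, 1, 1, 1, 0],
    "blazin_5":       [0, 1, 1, 0, 1, 0, 0, 1, 1, 1],
})

# Cumulative win rate after each successive pick, one line per system.
cumulative = picks.cumsum().div(np.arange(1, len(picks) + 1), axis=0)
ax = cumulative.plot()
ax.axhline(0.524, linestyle="--", color="gray", label="52.4% break-even")
ax.set_xlabel("Pick number")
ax.set_ylabel("Cumulative ATS win rate")
ax.legend()
plt.show()
```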
In summary, the VUCA environment aptly describes the NFL. Its volatile, uncertain, complex, and ambiguous nature makes handicapping individual games difficult. I’ve adopted a layering approach to picking games: I use the algorithm’s suggestion as a starting point to help identify differentiated picks and to avoid overreacting to recent unexpected outcomes. Then I layer in additional information, like the aggregation of other model forecasts and market information, before putting on an “official play.” Though performance for all three systems varies over time, I think the plot illustrates the added value of this layering approach over using the raw algorithm alone.
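To make the layering idea concrete, here is one way such a blend could look in code. To be clear, this is an illustrative sketch, not my actual system: the function, weights, inputs, and threshold are all hypothetical.

```python
def blended_cover_probability(algo_prob, consensus_prob, market_prob,
                              weights=(0.5, 0.3, 0.2)):
    """Blend the raw algorithm output with outside information.

    algo_prob      -- raw algorithm's cover probability for a side
    consensus_prob -- aggregated forecast from other models
    market_prob    -- probability implied by market / line movement
    All inputs and weights here are illustrative placeholders.
    """
    w_algo, w_consensus, w_market = weights
    return (w_algo * algo_prob
            + w_consensus * consensus_prob
            + w_market * market_prob)

# Only promote a pick to an "official play" when the blended probability
# clears the 52.4% break-even with some margin (threshold is hypothetical).
p = blended_cover_probability(0.58, 0.55, 0.53)
if p > 0.55:
    print(f"Official play: blended cover probability {p:.1%}")
else:
    print(f"Pass: blended cover probability {p:.1%} below threshold")
```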
I hope you took away some insights from this post. Comments and suggestions for new visualizations or content are welcome! All live Tableau visualizations can be found here. Thanks.