Past Performance Is No Guarantee Of Future Results

 

Past performance is no guarantee of future results.

— Mutual fund manager disclaimer

What’s a “good” model?

Well, that depends. What is your definition of success, the benchmark against which you want to be measured? Is it good enough that the model consistently identifies winners at a clip greater than 52.4%? Is it winning your various pick ‘em leagues? Or is it simply the humble satisfaction that you’re making better picks with the algorithm than without it?
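That 52.4% isn’t an arbitrary target; it’s the breakeven win rate implied by standard -110 point-spread pricing, where you risk $110 to win $100. A minimal sketch of the arithmetic:

```python
# Breakeven win rate against standard -110 point-spread pricing:
# you risk $110 to win $100, so you must win risk / (risk + to_win)
# of your bets just to cover the juice.

def breakeven_win_rate(risk: float = 110.0, to_win: float = 100.0) -> float:
    return risk / (risk + to_win)

print(f"Breakeven at -110: {breakeven_win_rate():.1%}")  # ~52.4%
```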

As I said before, I embarked on this journey to find my unique winning edge. I was not trying to pivot into a career as a professional sports gambler. To me, success was seeing my name at the top of the leaderboards. I’ve proven the algorithm’s value by doing just that, year in and year out. Lately, my focus has widened to improving my model’s predictive ability relative to the “gold standards” among publicly available models: ESPN’s Football Power Index (FPI), Football Outsiders’ Defense-adjusted Value Over Average (DVOA), and Jeff Sagarin’s ratings as published in USA Today.

Let’s compare the historical performance against the SuperContest lines since 2016, keeping in mind the 52.4% breakeven benchmark:

FPI - 51.5%

DVOA - 51.3%

Jeff Sagarin - 49.8%

My Algorithm - 51.9%
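To put those percentages in perspective, here is a rough conversion to expected wins over a season of picks. The 85-pick volume (five picks a week across a 17-week regular season) is my assumption for a SuperContest-style entry, not a figure from the text:

```python
# Rough conversion of win percentage to expected wins over a season of picks.
# The 85-pick volume (5 picks/week over a 17-week season) is an assumed
# SuperContest-style entry size, not a number taken from the text above.

picks_per_season = 85
records = {
    "Breakeven (52.4%)": 0.524,
    "My Algorithm": 0.519,
    "FPI": 0.515,
    "DVOA": 0.513,
    "Jeff Sagarin": 0.498,
}

for name, rate in records.items():
    print(f"{name:>18}: {rate * picks_per_season:5.1f} expected wins")
```

Under that assumption, the gap between the best and worst model here works out to only about two wins over an entire season.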

Here’s the performance broken down on an annual basis (*through the 2019 NFL season):

[Chart: annualized performance of each model against the SuperContest lines]

A couple of things stand out to me:

  1. All of the models are generally pretty close.

  2. None of the algorithms consistently perform above 52.4%.

It’s my feeling that with every passing year, the betting market gets more efficient. There are fewer and fewer arbitrage opportunities (grabbing +7.5 when the rest of the market is at +6.5). Blindly following any of the models here will yield better-than-average results, but you will likely need more than that if you want to be on the podium at the end of the season.
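The kind of opportunity described here, catching a +7.5 while the rest of the market sits at +6.5, can be framed as a simple off-market check. A minimal sketch, with hypothetical book names and numbers:

```python
# A sketch of spotting an off-market number: flag any book whose posted
# spread sits a full point or more away from the market median.
# Book names and lines here are hypothetical.

from statistics import median

posted_lines = {"Book A": 6.5, "Book B": 6.5, "Book C": 7.5}  # underdog points

def off_market(lines: dict[str, float], threshold: float = 1.0) -> dict[str, float]:
    """Return books whose number deviates from the market median by >= threshold."""
    mid = median(lines.values())
    return {book: pts for book, pts in lines.items() if abs(pts - mid) >= threshold}

print(off_market(posted_lines))  # {'Book C': 7.5} -- the +7.5 in a +6.5 market
```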

I’ve always said that the model is a starting point, a ballast against the high winds of public perception. I gauge my level of confidence in the algorithm’s raw suggestion and then compare it against the expert “consensus” from the other models I track. Any discrepancy triggers me to dig deeper and discern where the public (or my model) may be wrong.
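Here’s a rough sketch of that discrepancy check: compare my pick for each game against the picks from the other models I track and flag anything short of unanimous agreement for a closer look. The games, sides, and model picks below are made up for illustration:

```python
# Flag games where my pick disagrees with, or lacks unanimous support from,
# the other models I track. All games and picks below are hypothetical.

from collections import Counter

my_picks = {"DAL @ PHI": "PHI", "KC @ LV": "KC", "NYJ @ MIA": "NYJ"}
other_models = {
    "FPI":     {"DAL @ PHI": "PHI", "KC @ LV": "KC", "NYJ @ MIA": "MIA"},
    "DVOA":    {"DAL @ PHI": "PHI", "KC @ LV": "LV", "NYJ @ MIA": "MIA"},
    "Sagarin": {"DAL @ PHI": "PHI", "KC @ LV": "KC", "NYJ @ MIA": "MIA"},
}

for game, my_side in my_picks.items():
    votes = Counter(model[game] for model in other_models.values())
    consensus_side, count = votes.most_common(1)[0]
    if consensus_side != my_side or count < len(other_models):
        print(f"Dig deeper on {game}: I have {my_side}, consensus is "
              f"{consensus_side} ({count}/{len(other_models)})")
```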

Keep in mind that in order to win your contest, you need to outperform the consensus and differentiate yourself from your competition. Hopefully this helps you understand the slim margins available for your algorithm to exploit.