You may call James R. Ashburn and Paul M. Colvert football maniacs. Or researchers with too much time on their hands. Or statisticians looking to apply data analysis everywhere they look. Or all of this at the same time.
Nevertheless, they have written a paper called “A Bayesian Mean-Value Approach with a Self-Consistently Determined Prior Distribution for the Ranking of College Football Teams” (currently in preprint). Behind this long title hides a fairly classical application of statistics. In the authors' words:
We introduce a Bayesian mean-value approach for ranking all college football teams using only win-loss data. This approach is unique in that the prior distribution necessary to handle undefeated and winless teams is calculated self-consistently. Furthermore, we will show statistics supporting the validity of the prior distribution. Finally, a brief comparison with other football rankings will be presented.
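To see why undefeated and winless teams need a prior at all, here is a minimal Python sketch. It is not the method from the paper: it assumes a fixed Beta prior on each team's win probability and treats every game as an independent coin flip, without looking at who the opponent was. The parameter names `prior_a` and `prior_b` and the team records are made up for illustration. In the paper, by contrast, the prior is determined self-consistently rather than picked by hand.

```python
# Illustrative sketch only, NOT the paper's algorithm.
# Assumption: each team's win probability gets a Beta(prior_a, prior_b) prior,
# and games are treated as independent coin flips.

def raw_win_fraction(wins, losses):
    """Plain win fraction; hits exactly 1.0 or 0.0 for perfect records."""
    return wins / (wins + losses)

def posterior_mean_win_prob(wins, losses, prior_a=1.0, prior_b=1.0):
    """Posterior mean of the win probability under a Beta(prior_a, prior_b) prior."""
    return (wins + prior_a) / (wins + losses + prior_a + prior_b)

# Hypothetical season records: (wins, losses)
records = {
    "Undefeated U": (12, 0),
    "Middling Tech": (6, 6),
    "Winless State": (0, 12),
}

for team, (wins, losses) in records.items():
    raw = raw_win_fraction(wins, losses)
    smoothed = posterior_mean_win_prob(wins, losses)
    print(f"{team}: raw={raw:.3f}, posterior mean={smoothed:.3f}")
```

The raw fraction claims the undefeated team will never lose and the winless team will never win, which no one believes. The posterior mean pulls both extremes toward the middle, and how far it pulls depends entirely on the prior, which is why the way the prior is chosen matters so much.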
If you want to know how far a statistician can go on a semi-serious subject using a serious tone, read the article linked above. It can get pretty interesting, despite being 31 pages long!
NCAA football results are a dataset that statisticians frequently use to develop and test algorithms, and they have been the subject of many studies with such esoteric titles as “A Penalized Maximum Likelihood Approach for the Ranking of College Football Teams Independent of Victory Margins”, “Random Walker Ranking for NCAA Division I-A Football” and “Hybrid Paired Comparison Analysis, with Applications to the Ranking of College Football Teams”.
July 26th, 2006 | General Science
Paul and I appreciate the acknowledgment. As far as “too much time on our hands,” the reason we have a geeky hobby like this one is that it’s cheaper than golf and can be done from our homes after the kids have been put to bed. I do hope folks with an interest in the topic will take the time to read the paper, and I strongly encourage comparisons with other published methods.
Most “solutions” offered up for this problem seem to fall into one or more of three different categories — 1) they are unpublished, 2) their underlying model is weak or incomplete, 3) they have subjective inputs. Failing to publish a solution one is touting only raises suspicions — is there something to hide? Any good model will follow basic rules of statistics. An obvious one for this problem is the Central Limit Theorem (http://en.wikipedia.org/wiki/Central_limit_theorem). Any model with “knobs” that cannot be determined self-consistently is incomplete. Subjective inputs indicate that the solution has failed to fulfill its most fundamental purpose — removing all subjective elements from the rankings. We believe that overcoming this hurdle is our key contribution.
One final note: We don’t take this nearly as seriously as the tone of the paper might suggest. Except between September and January.
Comment by Jim Ashburn — August 28, 2006 @ 4:32 pm