The Three Kinds of Lies: Lies, Damned Lies, and Statistics
This was an article I posted back during the Dick Bennett era. I believe it was somewhere in the 1999-2000 season. At that time, fan criticism of Bennett's style of play was quite high in some quarters. Fans saw the scores in the 50's and low 60's and were convinced that the Badger offense was inept. What is telling about my article is that through today's lens, it really is quite UN-newsworthy. We have come to accept efficiency stats, points per possession, KenPom, tempo-free stats, and so forth, but back then, they were absolutely NOT widely accepted. Going on memory here, but the Badgers had just finished a year in which they were 3rd in the Big Ten in terms of points per possession despite a low raw scoring average due to exceptionally low tempo numbers. There were MANY heated message board debates on the topic and I very much became known as a "Bennett apologist". Oh, and Mike Kelley was my hero (thanks to a good shooting percentage, low turnovers, and decent efficiency numbers despite his low usage rate)...or so it goes. After digging back in my files, I got a kick out of reading my position back then, a position that seems quite obvious today. Anyway, here it is. Enjoy.
How can one statement be so true yet so false at the same time? The entire issue of statistics in relation to sports is one that is fundamental to even the most common fan’s appreciation of the particular sport they are viewing. From the most basic of statistics (the score of a game) to the most trivial or obscure, our perception of statistics varies along a wide continuum. Unfortunately, the problems in the ways many (if not most) fans perceive statistics often cloud our understanding of that sport when in fact statistics should be enhancing it.
Statistics can be broken down into two camps in my opinion: the common, well-known statistics and the uncommon, more detailed and much less utilized statistics. Examples of common statistics might be batting average, homers, and RBIs in baseball, scoring average in basketball, total rushing yards in football, etc. These are stats that we take for granted as being relevant in large part because they have entered the everyday lexicon of the baseball, basketball, or football fan. As each of these stats is commonly used by the media as a whole in addition to the general fan base, they are accepted as accurate reflections of ability. They are all easily understandable and user-friendly. Generally, they are whole numbers and do not require explanation. How many TD passes does Brett Favre have? 35. This doesn’t require any explaining and isn’t some extended decimal number that doesn’t make sense unto itself. How many points per game does Glenn Robinson score? 19. Next question. Stats like these speak to the least knowledgeable sports fan. They are used to show one specific ability or attribute in the simplest way to cater to those that do not have the time or inclination to actually learn the real story.
Which leads us to the problems with statistics, namely context, and misuse and generalization.
A.) Context
Every situation or statistic has a context. Last season, I hit over .650 for my softball team. Does that mean I would hit .650 if I were a major league baseball player? I could score at will on the basketball court if I were playing a third grader. Does that mean I would be able to light up Michael Jordan as well? Of course these situations evoke obvious responses...“Don’t be silly, of course you couldn’t do those things!” Why? Because the context is different. Accumulating stats depends on a large number of contexts: who I am competing against, where I am competing, what the conditions are where I am competing, etc. A hitter in baseball getting 300 at bats at Coors Field rather than Dodger Stadium has a huge advantage in accumulating stats. A player shooting the ball 30 times a game has a tremendous advantage in terms of scoring points over a player shooting the ball 10 times a game. If a wide receiver has a poor quarterback throwing him the ball, there is a pretty good likelihood that his stats are going to suffer because of this. And yet, how often is the context completely ignored? How often do people believe that Dante Bichette is actually an All-Star baseball player, despite overwhelming evidence that he accumulates his numbers in large part due to the context in which he plays?
How does this relate to Badger sports? Well, recently I have been making a case for the basketball team NOT being as poor an offensive team as most are led to believe, specifically because of the “points per possession” statistic I developed. In this particular case, there is ample evidence to show that in fact they were a decent (though not great) offensive team, though many refuse to believe it, pointing to their relatively low points per game output, the most commonly referred to stat to indicate offensive strength. Why? Context of course. Wisconsin simply does not shoot the ball as often as their opponents do. Whether you think this is the correct style of play or whether you despise it, this is the context in which we must evaluate the team’s ability. Throw in the fact that their defensive proficiency further slows down the tempo of the game, forcing opponents to use a significant portion of their possession just to get a shot, and the context issue is exacerbated. But if you look at a context-specific statistic such as “points per possession”, which takes tempo into account (as it rightly should), we see that Wisconsin’s offense is not in fact as dismal as the critics like to assume.
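To make the tempo argument concrete, here is a minimal Python sketch of the points-per-possession idea. The article does not give the exact formula used, so this simply divides points by possessions; the two teams and all of their numbers are hypothetical, chosen only to show how a lower-scoring team can still be the more efficient one.

```python
def points_per_possession(points: float, possessions: float) -> float:
    """Return points scored per possession (a tempo-free efficiency measure)."""
    if possessions <= 0:
        raise ValueError("possessions must be positive")
    return points / possessions


# Hypothetical single-game numbers, not real box scores.
slow_team = {"points": 60, "possessions": 58}  # deliberate, low-tempo style
fast_team = {"points": 75, "possessions": 78}  # up-tempo style

slow_ppp = points_per_possession(**slow_team)
fast_ppp = points_per_possession(**fast_team)

print(f"Slow team: {slow_team['points']} points, {slow_ppp:.3f} per possession")
print(f"Fast team: {fast_team['points']} points, {fast_ppp:.3f} per possession")
```

In this made-up example the slower team scores 15 fewer points yet gets more out of each possession, which is the distinction that raw points per game hides.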
Why the reluctance to accept the validity of such a relatively basic if not common statistic? My guess is comfort. We are creatures of habit. If we haven’t integrated it into our own realm of understanding, we are going to be suspicious of it. The fact that all of the “catering to the lowest common denominator” announcers and writers don’t use such detailed and context-specific stats furthers our distrust of such statistics. If it is so relevant, why aren’t Dick Vitale and Billy Packer using it? The reason of course is that they are on TV to entertain us and most general fans really aren’t concerned with authentic measures of ability. We are used to seeing graphics at the start of each game showing things like points per game, not “points per possession”.
If however we are interested in such measures (which I assume many of you are as demonstrated by the fact that you are reading this), then it becomes vital to search for statistics that do in fact put into proper context whatever it is you are trying to measure. Yet, those that are so ingrained in their beliefs will often dismiss context-specific stats entirely with remarks such as the title of this piece. It is quite ironic though that the statistics they do cling to are generally the ones that lie the most (in this case overall points per game).
Evaluating proper context CAN be difficult at times and there are always going to be variables which might affect a particular statistic, making nearly every statistic subject to some subjective analysis as to its worth. However, attempting to take context into account is infinitely better than ignoring context altogether.
Which leads us to problem number two with statistics:
B.) Misuse and Generalization
Much in the same way that a context can cloud the use of a statistic, many stats are completely and totally misused. For instance, using batting average in baseball as an overall indicator of offensive proficiency. Batting average measures one specific ability...the ability to get a hit per each time at bat. However, overall offensive proficiency is dependent not ONLY on whether you can get a hit, but on what kinds of hits they are (extra base hits...slugging percentage), how often you can get on base without the aid of a hit (on base percentage), how good of a baserunner you are, along with the context in which you play. Year after year there are players that hit for an “empty” .300. They only hit singles, have a low OBP despite the good average, hit into tons of double plays, are poor baserunners, and yet play in a stadium that is conducive to offense. They are in fact terrible players, but are viewed in a pretty decent light due to the acceptance of batting average as a valid indicator of offensive performance. Is batting average important? Of course it is, but it is only one piece of the puzzle. If misused, it gives a very misleading picture as to one’s true ability.
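The “empty” .300 idea can be illustrated with a small Python sketch that computes batting average, on base percentage, and slugging percentage from counting stats. Both hitters and all of their numbers are hypothetical, and the on base percentage here is simplified (it ignores hit-by-pitch and sacrifice flies), so treat this as an illustration rather than an official calculation.

```python
from dataclasses import dataclass


@dataclass
class Hitter:
    at_bats: int
    singles: int
    doubles: int
    triples: int
    home_runs: int
    walks: int

    @property
    def hits(self) -> int:
        return self.singles + self.doubles + self.triples + self.home_runs

    def batting_average(self) -> float:
        return self.hits / self.at_bats

    def on_base_percentage(self) -> float:
        # Simplified: ignores hit-by-pitch and sacrifice flies.
        return (self.hits + self.walks) / (self.at_bats + self.walks)

    def slugging_percentage(self) -> float:
        total_bases = (self.singles + 2 * self.doubles
                       + 3 * self.triples + 4 * self.home_runs)
        return total_bases / self.at_bats


# Hypothetical season lines, invented for illustration only.
empty_300 = Hitter(at_bats=500, singles=140, doubles=10, triples=0,
                   home_runs=0, walks=20)
slugger = Hitter(at_bats=500, singles=75, doubles=30, triples=0,
                 home_runs=30, walks=80)

for name, h in (("empty .300 hitter", empty_300),
                ("lower-average slugger", slugger)):
    print(f"{name}: AVG {h.batting_average():.3f}, "
          f"OBP {h.on_base_percentage():.3f}, "
          f"SLG {h.slugging_percentage():.3f}")
```

With these invented numbers the first hitter wins on batting average but trails in both on base percentage and slugging, so batting average alone would rank the two the wrong way around.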
In terms of basketball, a stat like individual scoring is much the same way. If a player ONLY contributes total points to a team without any other real offensive contributions, and in the meantime is gobbling up a large number of shots in the process, he really isn’t doing the team much good at all. And yet, how many folks (announcers and writers included) actually had the perception that guys like Mark Macon (ex of Temple a few years back) were good players due to their scoring average, despite the fact that his 20 PPG came with a 32% shooting percentage (or thereabouts), meaning he needed to take about 25 shots a game to get his points? (Really, this is simply just another form of context.) Points per game IS a relevant statistic, but instead of understanding that it is simply a measure of ONE particular attribute, we misuse it and come to believe it means something different.
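The shot-volume arithmetic can be sketched in a few lines of Python. The 20 points per game and 32% shooting come from the paragraph above; the split between field-goal points and free-throw points, the assumption that every made shot is worth two points, and the 50% comparison shooter are hypothetical simplifications, not actual figures.

```python
def shots_needed(points_per_game: float, free_throw_points: float,
                 fg_pct: float, points_per_make: float = 2.0) -> float:
    """Estimate field-goal attempts needed to reach a scoring average."""
    if not 0 < fg_pct <= 1:
        raise ValueError("fg_pct must be in (0, 1]")
    field_goal_points = points_per_game - free_throw_points
    return field_goal_points / (fg_pct * points_per_make)


# 20 PPG with an ASSUMED 4 points per game coming from free throws.
low_pct = shots_needed(20, free_throw_points=4, fg_pct=0.32)
high_pct = shots_needed(20, free_throw_points=4, fg_pct=0.50)

print(f"32% shooter: about {low_pct:.0f} shots per game")
print(f"50% shooter: about {high_pct:.0f} shots per game")
```

Under these assumptions the 32% shooter needs roughly nine more shots a game than the 50% shooter to post the same scoring average, and those are possessions his teammates never get to use.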
Back to the Badgers for a moment. We come to the Mike Kelley argument. Most common fans have the belief that Mike Kelley is a terrible offensive player. Well, nothing could be further from the truth. The problem is that we misuse the stat of points per game, thinking that it accurately measures total offensive ability. Obviously, Mike Kelley is NOT a strong scorer. Jumble the numbers any way you want and you cannot conclude that Mike Kelley is a good scorer. But does this mean he isn’t a decent OVERALL offensive player? Not in the least. His own scoring is just one contribution that a player can make on the offensive end of the court which leads to the team scoring points. In the case of Kelley, he does a large number of things on the offensive end quite well that are not measured in the statistic of points per game. I don’t think you would ever say he is a great offensive player or even above average, but if you take into account his entire contribution on the offensive end, you see that he is not nearly as bad as his points per game average might make it appear. Let’s not use stats for things they were not intended for.
CONCLUSION: Yes, stats do lie and can misrepresent. However, if we do the following, they do indeed illuminate and accurately analyze a given phenomenon.
1.) Decide specifically what it is you want to measure. Are you measuring net scoring output, or are you measuring efficiency? Are you measuring one specific skill or are you measuring a cumulative group of skills?
2.) Accurately determine and take into account the context of each situation.
3.) Use as much objective data as possible. Subjective reflection and analysis are always going to occur as sometimes there simply is not enough objective data available. However, label the subjective data as such. Also make sure your data set is large enough. This in itself is at times a subjective opinion, but it still is a relevant issue.
4.) Do not misuse the statistic. Make sure it reflects what you are saying it reflects.
If you follow these general guidelines, stats DO NOT LIE...they simply measure one particular facet of a particular game (in this case). They may not tell the entire story, but they certainly serve to illuminate the truth. Very often this is done by eliminating the wrong conclusions rather than providing the right ones, though the net result is often the same.
So, when you hear the “three kinds of lies: lies, damned lies, and statistics” line, realize that there is some element of truth to it, but it is generally used by those that do not adhere to these very commonsense guidelines. People are comfortable with what they already believe in subjectively. It is easy to believe that the Badgers were a terrible offensive team or an overall terrible rebounding team or that Mike Kelley is an inept offensive player if you are not willing to really examine the issue in depth but rather are content to misuse particular stats or refuse to consider context. But if you are clear and complete in your evaluation, you realize that these presumptions are in fact not true.
(For the record, they were a decent offensive team, only a poor OFFENSIVE rebounding team [which by the way hurts the offense as well because it leads to fewer possessions], and Mike Kelley is only a poor scorer, not a horrible offensive player as a whole...though he does need to improve.)