Bad Statistics in Sports Science Called Out

by Brian Rigby, MS, CISSN

I just read an article in FiveThirtyEight entitled “How Shoddy Statistics Found A Home In Sports Research” (by Christie Aschwanden and Mai Nguyen)—it’s an interesting read, though perhaps more so to readers interested in the bones of statistical analysis than to those more interested in sports nutrition or sports science. The article discusses a statistical technique known as magnitude-based inference (or MBI), a technique used almost exclusively by sports scientists and whose validity statisticians outside the field have questioned.

I don’t have a strong enough background in statistics to really add anything to the controversy, but I wanted to write a bit because I think the underlying issue here is important. The reason sports scientists use MBI at all isn’t because they think it’s a superior method, but rather because they’re limited by their own experimental design—specifically the number of participants they have. As I’ve written before, sports science has an atrocious tendency to use extremely small sample sizes (often fewer than ten people); by comparison, most scientists would consider 100 participants in a study to be an absolute minimum to derive any real value from the results, and would prefer many more.

MBI uses what we might call “statistical magic” to separate the signal from the noise, but the problem is it doesn’t necessarily do this with any accuracy. The method is flawed, yes, but the method isn’t the real problem—the real problem is that you will never, ever, be able to determine how likely a supplement (or training technique, etc.) is to work with a sample size of only ten. Any method that claims to improve accuracy with such a small sample is akin to those magic programs criminal investigators use on TV that manage to convert horrendously pixelated photos into high-res images; the data simply isn’t there, and while we can make some reasonable guesses, we can’t guess our way quite as far as promised.
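To make that concrete, here’s a minimal simulation sketch. The numbers are entirely my own invention (a genuine 5% average improvement with fairly typical person-to-person variability), not anything from the FiveThirtyEight article, but it shows how rarely a tiny study even detects an effect that really exists:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def detection_rate(n_per_group, true_effect=0.05, sd=0.10, trials=5000):
    """Fraction of simulated studies reaching p < 0.05 when the supplement
    genuinely improves performance by `true_effect` (assumed values)."""
    hits = 0
    for _ in range(trials):
        control = rng.normal(1.00, sd, n_per_group)                # baseline performance
        treated = rng.normal(1.00 + true_effect, sd, n_per_group)  #真 effect of +5%
        _, p = stats.ttest_ind(treated, control)
        hits += p < 0.05
    return hits / trials

for n in (5, 10, 50, 100):
    print(f"{n:>3} per group: genuine effect detected in {detection_rate(n):.0%} of studies")
```

With these made-up (but not unreasonable) numbers, a ten-person-per-group study catches the real effect only a small fraction of the time; no post-hoc statistical technique can recover the information that was never collected.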

There’s a solution, though. Or, there’s at least part of a solution. The article discusses the inherent difficulty in recruiting a large number of professional (or highly trained) athletes to take part in a study, but this is approaching sports science from the wrong end. Yes, we need data on the top performers because we’re interested in whether certain things affect performance when it’s already extremely high—but we also should be getting data from everyday athletes, individuals who aren’t beginners but who also don’t do sport for a living. That probably describes most of you (as it does me).

The biggest advantage here is that while there are only perhaps a handful of elite athletes available for a study, there are thousands of casual athletes available. We might not find out whether a supplement (or whatever) affects the top 1% of performers, but with a large study in a moderate population we can certainly determine whether a supplement is likely to have any effect at all, and if so, how large that effect tends to be. If we have reliable data from that broader population, then we don’t need statistical magic to make inferences about higher-performing athletes, even when the elite samples remain small: we can say, “hey, supplement X improved max reps by 5% on average in this large group, so it’s not implausible that we saw the same result in a smaller group of elite athletes.” We still can’t be certain (but then, we never are), but at least we’ve established scientific plausibility.
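As a rough sketch of that precision argument (again with invented numbers, purely illustrative), compare how tight the estimate of a 5% average improvement becomes when it comes from hundreds of everyday athletes rather than a handful:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def ci_for_mean_improvement(n, mean=0.05, sd=0.10):
    """95% confidence interval for the average improvement measured in a
    study of n athletes (mean and sd are assumed, illustrative values)."""
    improvements = rng.normal(mean, sd, n)   # each athlete's change in max reps
    est = improvements.mean()
    sem = stats.sem(improvements)
    lo, hi = stats.t.interval(0.95, df=n - 1, loc=est, scale=sem)
    return est, lo, hi

for n in (8, 500):
    est, lo, hi = ci_for_mean_improvement(n)
    print(f"n={n:<4} estimated improvement {est:+.1%}, 95% CI [{lo:+.1%}, {hi:+.1%}]")
```

The eight-person estimate swings wildly from run to run, while the 500-person estimate pins the effect down well enough to reason sensibly about whether a smaller elite group is likely seeing the same thing.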

Unfortunately, I don’t know if this quick post will be enough to change the methods of any sports scientists (haha), so we’re likely to see the same trend continue. Rest assured, though, that when I finally get the interest and money together to fund some sports science research on climbing, I’ll be drawing my first candidates from the “active but not elite climber” pool. It might not be as sexy as working with big names, but it sure as hell will give us better (and more applicable) data!

2 comments

  1. Jeremy

    I completely disagree with your claim that “most scientists would consider 100 participants in a study to be an absolute minimum to derive any real value from the results”. It’s totally normal in medical and biological sciences to accept studies with far fewer than 100 samples. Smaller sample sizes mean larger error bars, but they don’t mean less meaningful results. The statistical methods and experimental design are far more important than the sample size. If you conduct a double blind placebo controlled study with a total of 40 participants (20 treated and 20 in the control) and get a statistically significant result, there’s at least a 95% probability that this reflects a genuine effect and not random chance, and only those not trained in statistics would think this lacks credibility.

  2. Brian Rigby, MS, CISSN Post author

    You make a valid point, but in this case there’s a specific statistical method being called into question, and the reason it’s being used at all is that there simply aren’t enough participants to use a better method. We can quibble about whether 100 participants or 40 participants would be suitable to prevent this sort of problem, but in either case the presumption is that we have enough participants to adequately power the study.
