Here's the Nieuwenhuis et al. (2011) Nature Neuroscience paper that Sharon and I were talking about, which shows how surprisingly common a particular statistical error is: treating a difference in significance as if it were a significant difference.
2 thoughts on “Paper on interactions and null results”
James
One very compelling presentation I saw at the 2012 Psychonomics meeting made a great point I hadn't considered about relying on significance alone.
Gregory Francis makes the point that we're generally told that replication solidifies findings. But of course, we also face a lot of pressure to publish only the significant results of any study. Because we generally treat replication as a tally of significant results, we can end up rating some findings as strongly replicated when they aren't.
If an effect actually exists, we should only expect it to replicate at a rate consistent with the statistical power of the studies. So if power is low-ish and, lo and behold, you get 5 successful replications in a row, that's more suspicious than confirmatory.
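For concreteness, here is a minimal sketch of that arithmetic in Python. It is my own illustration rather than Francis's actual analysis, and it assumes the replications are independent and all run at the same power; the function name is just something I made up.

```python
# Sketch: if each independent study has the stated statistical power,
# the chance that all of them come out significant is power ** n_studies.

def prob_all_significant(power, n_studies):
    """Probability that every one of n_studies independent studies,
    each with the given power, yields a significant result."""
    return power ** n_studies

# With modest power, a clean sweep of five significant replications
# is itself an unlikely event.
print(prob_all_significant(0.5, 5))  # 0.03125, about a 3% chance
print(prob_all_significant(0.8, 5))  # ~0.33, still well under a coin flip
```

So five-for-five significant results from studies powered around 0.5 is itself an outcome with roughly a 3% probability, which is the sense in which too much success starts to look suspicious rather than confirmatory.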
I feel like I should be taking a stats seminar every day for the rest of my career to keep from making these sorts of mistakes.
While I think James makes good points, we psychologists here at UMass are given sufficient statistical training to recognize immediately how wrong the Nieuwenhuis et al. (2011) error is.
Frankly, the error being discussed (concluding that some kind of neural training has an effect over time because the 'after' scores differ significantly from zero at p < 0.05 while the 'before' scores do not) comes down to the researchers promoting that finding being unaware of basic experimental design. They should obviously be asking whether the before/after difference itself is greater than zero, not whether each score individually is. Testing each condition separately treats the measurements as independent, when they absolutely are not.
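To make that contrast concrete, here is a small Python sketch with purely made-up before/after numbers (the data, sample size, and effect are hypothetical); it runs the flawed "each condition against zero" comparison alongside a direct paired test of the difference.

```python
import numpy as np
from scipy import stats

# Purely hypothetical before/after scores for the same 20 participants.
rng = np.random.default_rng(0)
before = rng.normal(loc=0.1, scale=1.0, size=20)
after = before + rng.normal(loc=0.3, scale=1.0, size=20)

# The flawed reasoning: test each condition against zero separately,
# then compare the verdicts ("after is significant, before is not").
_, p_before = stats.ttest_1samp(before, 0.0)
_, p_after = stats.ttest_1samp(after, 0.0)

# The appropriate test: ask whether the before/after difference itself
# differs from zero, honoring the fact that the scores are paired.
_, p_diff = stats.ttest_rel(after, before)

print(f"before vs. 0:           p = {p_before:.3f}")
print(f"after vs. 0:            p = {p_after:.3f}")
print(f"paired difference test: p = {p_diff:.3f}")
```

With noisy data like this, the two separate tests can land on either side of p < .05 from run to run; the paired test on the difference is the one that actually answers the before/after question.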
At a certain point, statistics is really more of an art than a science (e.g., what counts as a 'family' when controlling family-wise error?), but anyone who makes the kind of error discussed in this paper should have their license to do statistics revoked.