The "Replication Crisis"
-
I think I posted a story in TOCR about the fact that many "peer reviewed" studies are non-reproducible. If I recall, about 20-30% of scientific (heh) studies can't be reproduced. There is a bias toward publication of positive results - IOW, if your study refutes another study, we don't want to publish it - and so, things that are assumed to be true are, in fact, not so. I've talked about how "peer-reviewed" studies frequently don't get the scrutiny they should, and they slip through the dam.
Science has been in a “replication crisis” for a decade. Have we learned anything?
Here at Vox, we’ve written about how the replication crisis can guide us to do better science. And yet blatantly shoddy work is still being published in peer-reviewed journals despite errors that a layperson can see.
In many cases, journals effectively aren’t held accountable for bad papers — many, like The Lancet, have retained their prestige even after a long string of embarrassing public incidents where they published research that turned out fraudulent or nonsensical. (The Lancet said recently that, after a study on Covid-19 and hydroxychloroquine this spring was retracted after questions were raised about the data source, the journal would change its data-sharing practices.)
Even outright frauds often take a very long time to be repudiated, with some universities and journals dragging their feet and declining to investigate widespread misconduct.
That’s discouraging and infuriating. It suggests that the replication crisis isn’t one specific methodological reevaluation, but a symptom of a scientific system that needs rethinking on many levels. We can’t just teach scientists how to write better papers. We also need to change the fact that those better papers aren’t cited more often than bad papers; that bad papers are almost never retracted even when their errors are visible to lay readers; and that there are no consequences for bad research.
In some ways, the culture of academia actively selects for bad research. Pressure to publish lots of papers favors those who can put them together quickly — and one way to be quick is to be willing to cut corners. “Over time, the most successful people will be those who can best exploit the system,” Paul Smaldino, a cognitive science professor at the University of California Merced, told my colleague Brian Resnick.
So we have a system whose incentives keep pushing bad research even as we understand more about what makes for good research.
Read it all.
-
@Klaus said in The "Replication Crisis":
There is no replication crisis in physics or biology or math. The replication crisis pertains to social sciences only.
Quelle surprise, as they say in Canada.
-
I think it is quite unfortunate that the "replication crisis" in some domains lowers the reputation of all sciences. It's important to specify the problem precisely.
Also, I see a lot of confusion in the mainstream press and public about the role of "peer review". Peer review is an indicator of quality, but peer review alone doesn't say anything at all. One needs to consider the reputation of the publication venue, too. I can get even the crappiest paper peer-reviewed and published somewhere. But that's completely different from a publication at the top journals in a field, which usually requires multiple years of work and even then often fails.
Social sciences are IMO particularly prone to non-reproducible experiments for several reasons: a) it's very hard or impossible to perform "clean" experiments in which all relevant variables are under control, b) social scientists often lack a proper background in statistics, c) social science experiments often have political ramifications and are therefore more likely to be influenced by bias.
But on the other hand I'm optimistic that things will get better. It's a very good sign that the problem has been identified and that people are working on fixing it. Good science is self-correcting.
-
@Klaus said in The "Replication Crisis":
The replication crisis pertains to social sciences only.
No, in medicine as well. I think it was in Levitin's book, "A Field Guide to Lies," or perhaps Frankfurt's "On Bullshit," that the point was made that there's a bias toward publishing positive results.
No one wants to read "We tried corn oil as a treatment for dementia and it didn't work." However, if you published, "Corn oil for dementia lowers mortality," you might get published. When the study is replicated, guess what? It doesn't lower mortality. Few studies are tested this way, and of those that are, a shocking number fail to be valid.
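To make the mechanism concrete, here is a minimal simulation sketch of that publication filter. Everything in it is an assumption for illustration, not data from any real trial: the function name `simulate_publication_bias`, the number of studies, the 30 patients per arm, a true effect of exactly zero, and a crude z > 1.96 cutoff standing in for "positive result gets published".

```python
import random
import statistics


def simulate_publication_bias(n_studies=2000, n_per_arm=30, true_effect=0.0, seed=0):
    """Run many small two-arm studies, 'publish' only the positive ones,
    then replicate each published study once with fresh patients.

    All parameters are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    # Standard error of a difference in means when both arms have unit variance.
    se = (2.0 / n_per_arm) ** 0.5

    def run_study():
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_arm)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        return statistics.fmean(treated) - statistics.fmean(control)

    # The publication filter: only clearly positive results make it into print.
    published = [d for d in (run_study() for _ in range(n_studies)) if d / se > 1.96]
    # Each published finding gets one independent replication attempt.
    replications = [run_study() for _ in published]
    confirmed = sum(1 for d in replications if d / se > 1.96)

    return {
        "published": len(published),
        "mean_published_effect": statistics.fmean(published),
        "mean_replication_effect": statistics.fmean(replications),
        "fraction_confirmed": confirmed / len(published),
    }


if __name__ == "__main__":
    for key, value in simulate_publication_bias().items():
        print(key, round(value, 3))
```

With a true effect of zero, every "published" study is a false positive by construction. The published effects look large only because the filter kept the lucky draws, and the replications drift back toward zero.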
Here's an evaluation from 2 years ago:
https://pubmed.ncbi.nlm.nih.gov/29463308/
Abstract
Background: The ability to reproduce experiments is a defining principle of science. Reproducibility of clinical research has received relatively little scientific attention. However, it is important as it may inform clinical practice, research agendas, and the design of future studies.
Methods: We used scoping review methods to examine reproducibility within a cohort of randomized trials examining clinical critical care research and published in the top general medical and critical care journals. To identify relevant clinical practices, we searched the New England Journal of Medicine, The Lancet, and JAMA for randomized trials published up to April 2016. To identify a comprehensive set of studies for these practices, included articles informed secondary searches within other high-impact medical and specialty journals. We included late-phase randomized controlled trials examining therapeutic clinical practices in adults admitted to general medical-surgical or specialty intensive care units (ICUs). Included articles were classified using a reproducibility framework. An original study was the first to evaluate a clinical practice. A reproduction attempt re-evaluated that practice in a new set of participants.
Results: Overall, 158 practices were examined in 275 included articles. A reproduction attempt was identified for 66 practices (42%, 95% CI 33-50%). Original studies reported larger effects than reproduction attempts (primary endpoint, risk difference 16.0%, 95% CI 11.6-20.5% vs. 8.4%, 95% CI 6.0-10.8%, P = 0.003). More than half of clinical practices with a reproduction attempt demonstrated effects that were inconsistent with the original study (56%, 95% CI 42-68%), among which a large number were reported to be efficacious in the original study and to lack efficacy in the reproduction attempt (34%, 95% CI 19-52%). Two practices reported to be efficacious in the original study were found to be harmful in the reproduction attempt.
Conclusions: A minority of critical care practices with research published in high-profile journals were evaluated for reproducibility; less than half had reproducible effects.
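For anyone who wants to sanity-check the headline figure in the Results (66 of 158 practices had a reproduction attempt, reported as 42%, 95% CI 33-50%), here is a small sketch. The abstract does not say which interval method the authors used, so the Wilson score interval below is my assumption, and `wilson_interval` is just an illustrative helper name. A different method can shift the bounds by a percentage point or so.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    attempts, practices = 66, 158  # counts taken from the abstract above
    low, high = wilson_interval(attempts, practices)
    print(f"point estimate: {attempts / practices:.1%}")
    print(f"approx. 95% interval: {low:.1%} to {high:.1%}")
```

The point estimate matches the abstract, and the interval lands close to the reported one.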
The Wiki article on replication says this:
Out of 49 medical studies from 1990–2003 with more than 1000 citations, 45 claimed that the studied therapy was effective. Out of these studies, 16% were contradicted by subsequent studies, 16% had found stronger effects than did subsequent studies, 44% were replicated, and 24% remained largely unchallenged.[59] The US Food and Drug Administration in 1977–1990 found flaws in 10–20% of medical studies.[60] In a paper published in 2012, Glenn Begley, a biotech consultant working at Amgen, and Lee Ellis, at the University of Texas, argued that only 11% of the pre-clinical cancer studies could be replicated.[61][62]
-
Even when replication is not hard, there is little incentive to replicate. I am under the impression that the prestige and economic incentive are nowhere near as high as those for doing original experiments.
Funding for replication is minuscule compared to funding for original experiments. Prizes for replication are virtually nonexistent.
I fear that replication is a lot like “affordable housing” in that unless you make it a requirement to fund it alongside every major investigation, it just won’t happen on its own on any appreciable scale. E.g., reserve x% of funding for a second independent team to replicate the experiments completed by the first team who won the original research grant.
-
@George-K said in The "Replication Crisis":
@Klaus said in The "Replication Crisis":
The replication crisis pertains to social sciences only.
No, in medicine as well.
OK, fair enough. I should have included medicine. But I think it is important to maintain a distinction from the "hard sciences".
For instance, if I look at my own field, in recent years it has become standard procedure in the most prestigious publication venues to actually reproduce experiments before the paper is published, if it is practically feasible at all. Also, the whole experimental setup, including all programs and all data that is used in the experiments, is published together with the paper, so that anybody who is willing to invest a small amount of work can reproduce the experiments. My domain is maybe special in that the experiments often merely involve running and observing programs, so it is sufficient to hire a group of graduate students and let them re-run the experiments done in the paper. But I think it is unfair to lump all sciences together.
-
@Axtremus said in The "Replication Crisis":
Even when replication is not hard, there is little incentive to replicate. I am under the impression that the prestige and economic incentive are nowhere near as high as those for doing original experiments.
Yes, and that's one of the points that was made in (dammit I wish I remembered which) that book.
-
@Klaus said in The "Replication Crisis":
@George-K said in The "Replication Crisis":
@Klaus said in The "Replication Crisis":
The replication crisis pertains to social sciences only.
No, in medicine as well.
OK, fair enough. I should have included medicine. But I think it is important to maintain a distinction from the "hard sciences".
A disappointingly large proportion of the population probably aren't aware there's a difference.
(Of course, this statement hasn't yet been peer-reviewed, so it could be misleading)
-
@Doctor-Phibes and @Klaus actually called them "Social 'Sciences'" in his post, LOL.
-
@Doctor-Phibes said in The "Replication Crisis":
A disappointingly large proportion of the population probably aren't aware there's a difference.
There are so many things in my field that were considered absolute FACTS when I was training, only, in subsequent decades, to be discarded as bullshit.
-
I'm also not sure I completely agree with your dismissal of medical studies, @George-K .
Sure, there are many small BS studies.
But everybody who can count up to three knows that not all studies are equal.
If there's a proper large-scale study involving many thousands of patients at different hospitals, then the result of such a study can immediately change the standard treatment because such studies are very reliable. If there's a single small study, a positive result merely means "Maybe there's something interesting here; it might be worth investigating further".
Hence I'd argue it is not so relevant if many results of small studies cannot be replicated. They are merely indicators of whether it's worth it to investigate the subject matter more. What counts is whether the big studies are reliable, and I haven't heard many complaints about that.
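A rough way to see why size matters so much: the statistical uncertainty of an observed event rate shrinks with the square root of the number of patients. The sketch below uses the simple normal-approximation margin of error. The 30% event rate and the two sample sizes are made-up numbers for illustration only, and `margin_of_error` is a hypothetical helper.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for an observed proportion p in n patients."""
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    event_rate = 0.30  # hypothetical event rate, for illustration only
    for n in (50, 5000):
        moe = margin_of_error(event_rate, n)
        print(f"n={n}: 30% plus or minus {moe:.1%}")
```

Going from 50 to 5000 patients is a hundredfold increase in size but only a tenfold reduction in the margin, which is still the difference between "maybe something is here" and a result precise enough to act on. This covers sampling noise only; it says nothing about bias or study design.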
-
@George-K said in The "Replication Crisis":
@Doctor-Phibes said in The "Replication Crisis":
A disappointingly large proportion of the population probably aren't aware there's a difference.
There are so many things in my field that were considered absolute FACTS when I was training, only, in subsequent decades, to be discarded as bullshit.
Well, look at, for instance, how median survival rates in oncology have developed in the last 40 years.
There's progress. What we know now is a better approximation of the truth than what we knew 40 years ago.
So I think the implicit message of your statement - today's FACTS are just as unreliable - is not appropriate. Surely some things we believe today will be invalidated in 40 years. But the current knowledge is the best guess we can make now.
-
@Klaus said in The "Replication Crisis":
So I think the implicit message of your statement - today's FACTS are just as unreliable - is not appropriate.
Well, facts are facts.
There's certainly progress in many things, like oncology as you point out.
But, my point is that many of the things that have caused the progress were, in fact, just guesses. We'll never know how many of the guesses fell by the wayside because they were not reproducible, or just plain wrong.
An example is the use of the Swan-Ganz catheter - gonna get nerdy here, so bear with me. The SG catheter is a device inserted into a patient's heart via a peripheral or central vein. The catheter measures pressures in various chambers of the right heart, and is supposed to tell you how the heart is functioning. It can determine blood flow (cardiac output), oxygen levels as well as just pressure. In the early 1980s it became standard practice to insert this in critically ill patients. I have inserted, literally, hundreds of these. Yes, hundreds.
Then...things started to change. Looking at mortality and morbidity in ICU patients, turns out that patients who had these things put in did worse than those without, probably because of poor decision making when looking at the data they provide. Toward the end of my career, it had become a rare thing.
Look at the use of beta-blockers during surgery. Same thing.
So, my point is that, basically, I agree. Medicine is not "hard science" as you put it. It's trial and error. Sometimes the trial works, but it takes a long time to reproduce and become standard.