Science empirically questions nature to uncover the mechanisms of life. Through the research discovery and communication processes, science drives society forward. Unfortunately, science has been plagued by a dissemination crisis, where research that could be published is blocked by publishers, and the quality and validity of research that is published is too often questionable or outright false. Science is romanticized as a flawless system, perfectly conducted with good intentions.
However, this glamorized image of science is an ideal rather than a reality. While publication rates increase each year (1), publication biases and time lag biases exist such that 50–60% of research is never published (2-6), and lengthy delays - sometimes spanning four to eight years - occur before some research is disseminated (7, 8). A preponderance of research with statistically significant and positive results is published, while studies with statistically nonsignificant and negative findings are suppressed (9-13). These biases are collectively known as dissemination biases (14, 15).
Additionally, the quality of a large amount of research is, at best, poor and, at worst, fraudulent and falsified. Retraction rates of biomedical and life science research have increased 10-fold since 1975 (16-18) due to errors and research misconduct, including fraud and plagiarism, and due to contradictory or controversial results found in molecular studies as well as in clinical trials and epidemiological studies (19-25). These issues underlie the reproduction and replication crisis, where between 35% and 90% of research cannot be reproduced across social and life science disciplines investigating major domains such as psychology, genetics, cancer, and cardiovascular disease, among others (25-30).
These dissemination biases represent scientific misconduct because research that is not shared with the research community for such reasons is not afforded the opportunity to contribute to the overall literature. Within all research fields, but particularly in biomedical domains, a biased literature base is detrimental. Incomplete or skewed outcome reporting weakens data synthesis, culminating in flawed estimations and conclusions in systematic reviews/meta-analyses and clinical guidelines. Consequently, researchers, healthcare professionals, and the public are prevented from obtaining reliable evidence of the efficacy and effectiveness of interventions, which in turn negatively impacts their ability to make appropriate healthcare decisions.
Science is hard and imperfect. While simple errors happen (31-33), they cannot explain the scale of the dissemination crisis. Rather, the crisis stems from the linear and restrictive publication paradigm established in the 17th century wherein publications are submitted to publishers, reviewed, accepted or rejected, and then - if accepted - eventually published, accessed, and applied to real world scenarios.
Within this modern-day linear paradigm, peer review is considered the safeguard of the research dissemination process. An emergent concern echoed through the scientific community, however, is that peer review is a black box with unpredictable procedures that negatively impact dissemination and is “based on faith in its effects, rather than on its facts” (34, 35). From the time a manuscript is submitted to a journal, dissemination biases are fueled because publishers value impressive results more than exceptional methodological rigor (36).
Among the biases, positive outcome bias, where studies reporting positive and statistically significant outcomes are more likely to be published by journals than those reporting statistically non-significant and negative outcomes, is one of the most consequential (37, 38). The bias has led to the “file-drawer effect,” where the number of published positive studies has increased while negative studies have decreased (12).
Between 1990 and 2007, the number of studies reporting positive results grew 22%, a statistically significant trend increasing by 6% per year and one that is consistent across countries and academic disciplines (9).
Between 1991 and 2008, there was a statistically significant decrease in the ratio of non-significant to significant results in published studies (13). The increasing positive outcome bias in the literature may be partly due to research with false-positive results (39).
Across 44 reviews published between 1992 and 2014, the mean statistical power in the studies was small at 0.24, and this power had not increased in six decades (11). These results indicate that studies are underpowered: at a power of 0.24, a study fails to detect a true effect roughly three out of four times. Low power also inflates effect sizes and raises the share of false-positive results within the literature base (40).
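The link between low power and false-positive findings can be made concrete with a positive predictive value (PPV) calculation, a standard application of Bayes' rule in the spirit of the argument in (39). The sketch below is illustrative only: the mean power of 0.24 comes from the text, whereas the 0.05 significance threshold, the comparison power of 0.80, and the assumption that 10% of tested hypotheses are true are hypothetical values chosen for the example.

```python
def positive_predictive_value(power, alpha, prior):
    """Share of statistically significant findings that reflect true effects.

    power: probability of detecting a true effect (1 - beta)
    alpha: significance threshold (false-positive rate when no effect exists)
    prior: assumed share of tested hypotheses that are actually true
    """
    true_positives = power * prior
    false_positives = alpha * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


# Power of 0.24 is the mean reported in the text (11).
# alpha = 0.05, prior = 0.10, and power = 0.80 are illustrative assumptions.
low_power_ppv = positive_predictive_value(power=0.24, alpha=0.05, prior=0.10)
high_power_ppv = positive_predictive_value(power=0.80, alpha=0.05, prior=0.10)

print(f"PPV at power 0.24: {low_power_ppv:.2f}")
print(f"PPV at power 0.80: {high_power_ppv:.2f}")
```

Under these assumed inputs, roughly 35% of significant findings would reflect true effects at a power of 0.24, compared with 64% at a power of 0.80. Different assumptions about the share of true hypotheses would shift both numbers, but not the direction of the comparison.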
Given the decreased odds that nonsignificant or negative results will be published, researchers are incentivized to produce statistically significant and positive results because they are publishable and marketable for journals (41-44). To achieve publishable results, a growing concern is that researchers utilize questionable, unethical, biased and fraudulent research practices to conduct studies and analyze results, practices that the Office of Research Integrity categorizes as FFP (Fabrication, Falsification, and Plagiarism) (45). Numerous types of misconduct occur, including utilizing inappropriate study designs, altering the trial protocol, fabricating data and results, misrepresenting or failing to pre-specify statistical analyses, failing to report post-hoc analyses, selective outcome reporting, and many others (46). Misconduct skews the overall literature base on a given topic and undermines the objectivity and integrity of science.
One of the most common types of misconduct is inflation bias, also known as P-hacking, where researchers misreport true effect sizes by manipulating data or analyses to derive statistically significant results. Text-mining studies and meta-analyses demonstrate that P-hacking is widespread throughout biomedical and life science research. Among all open access papers in the PubMed/MEDLINE database, there is strong evidence that researchers turn non-significant results into significant results, demonstrated by comparing the distribution of reported P-values to the expected distribution (47). This finding is paralleled in other research, where significant P-values (P < 0.05) are reported more frequently than expected, at a rate that increased between 1965 and 2005 (48, 49).
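One simple way to compare reported P-values against an expected distribution is to count how many fall in two adjacent bins just below 0.05. Without P-hacking, the bin closest to 0.05 should not hold more values than the bin below it, because the distribution of P-values is flat when there is no effect and skewed toward small values when there is one. The sketch below is a minimal illustration under stated assumptions, not a reconstruction of the method used in the cited studies: the bin edges, the function names, and the list of P-values are all hypothetical.

```python
from math import comb


def binomial_upper_tail(successes, trials, p=0.5):
    """Exact probability of at least `successes` successes in `trials` draws."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def just_significant_excess(p_values, low=0.04, mid=0.045, high=0.05):
    """Count P-values in [low, mid) and [mid, high) and test for an excess
    in the upper bin with a one-sided exact binomial test."""
    lower_bin = sum(1 for p in p_values if low <= p < mid)
    upper_bin = sum(1 for p in p_values if mid <= p < high)
    total = lower_bin + upper_bin
    if total == 0:
        return lower_bin, upper_bin, None
    return lower_bin, upper_bin, binomial_upper_tail(upper_bin, total)


# Hypothetical reported P-values, invented purely for illustration.
reported = [0.041, 0.043, 0.046, 0.047, 0.048, 0.049, 0.0495, 0.046, 0.012, 0.2]
lower_count, upper_count, tail_probability = just_significant_excess(reported)
print(lower_count, upper_count, round(tail_probability, 3))
```

With only eight values in the two bins, this toy sample is far too small to support any conclusion. The test becomes informative only with the thousands of P-values that text mining can extract.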
The consequences of research misconduct are reflected in the retraction rate, which has increased 10-fold since 1975. While more retractions could also reflect heightened efforts to identify and remove questionable research, their volume highlights the sheer abundance of research requiring revocation and warrants real concern for the magnitude of undetected and undetectable problems. Out of 2,074 retracted biomedical and life science articles, 67.4% were retracted due to misconduct - whether fraud (43.4%), duplicate publication (14.2%), or plagiarism (9.8%) - whereas only 21.3% were due to simple error (17). In contrast, research examining retraction rates between 2000 and 2010 demonstrates that error (73.5%) is more common than fraud (26.6%), but echoes the finding that retractions due to fraud increased statistically significantly over the decade. In 2013, 467 publications were retracted, and by 2015, the number had increased to 684, a rise of about 46% over the two years, or roughly 23% per year on average (18). Overall, reported estimations of misconduct range from 0.3% to 4.9%, which are thought to be underestimates of the total rate (46).
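The growth implied by the two retraction counts above (467 in 2013 and 684 in 2015) can be checked with a few lines of arithmetic. The sketch below is illustrative. The variable names are hypothetical, and the compound annual rate is included only as an alternative way of averaging the same two counts.

```python
retractions_2013 = 467
retractions_2015 = 684
years = 2015 - 2013

# Total relative growth across the whole period.
total_growth = retractions_2015 / retractions_2013 - 1

# Simple average: total growth split evenly across the years.
simple_annual_growth = total_growth / years

# Compound average: the constant yearly rate that turns 467 into 684.
compound_annual_growth = (retractions_2015 / retractions_2013) ** (1 / years) - 1

print(f"Total growth:           {total_growth:.1%}")
print(f"Simple annual average:  {simple_annual_growth:.1%}")
print(f"Compound annual growth: {compound_annual_growth:.1%}")
```

The simple average comes to about 23% per year, while the compound rate is closer to 21% per year. Either way, retractions rose by nearly half over the two-year span.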
With fraudulent research habits representing a leading and statistically significant cause of retracted articles, they have become a focus of investigative research. A pooled meta-analysis of 21 surveys asking researchers if they had falsified data demonstrated a 1.97% weighted average of researchers admitting to having fabricated or modified data at least once, and 33.7% admitting to using questionable research practices. These researchers also admitted to knowing colleagues who falsified data (14.12%) or used questionable research practices (72%) (50).
As a result of the biases, poor research methods, and research misconduct described above, the validity of the entire biomedical and life science field is called into question. Estimations indicate that false findings may constitute the majority of all research results (39). The utilization of poor quality, heterogeneous, or FFP research practices underlies discrepancies in the published literature. A meta-analysis of 370 studies testing 36 genetic associations for disease outcomes demonstrated that the first study examining an association found stronger genetic effects than subsequent studies in 69% (P = 0.011) of the studies, partly due to heterogeneous study designs that were found in 39% of the studies (P = 0.02) or bias (20). Strong initial associations, known as the first time effect, become less pronounced or fluctuate widely over time as more evidence on a topic accumulates (22). With increasing participant numbers in studies examining the same topic, estimated genetic associations predicting cancer outcomes decrease (19) and odds ratios of treatment effects fluctuate 0.6- to 1.7-fold across medical fields (23).
Discrepancies in the literature are evident when comparing randomized controlled trials to observational studies on the same topic. Studies examining the antioxidant properties of vitamins, for example, have demonstrated inverse associations with all-cause mortality, cancer, and cardiovascular disease in observational studies. However, when randomized controlled trials are conducted, testing the same association, the protective nature of antioxidant supplementation disappears (21). Inconsistencies also occur between studies utilizing similar trial designs. Re-analysis of 37 randomized controlled trials demonstrated that 62% of results were changed after re-examining the results, including reinterpretation of the patients that should be treated and changes in the direction, magnitude, and statistical significance of the treatment effect (24). Similarly, out of 49 highly cited original clinical trials, subsequent studies contradicted 16% of the original research, while 16% demonstrated weaker treatment effects (25). Discrepancies may be due to errors in the conduct, reporting, and analysis methods (51). Re-analysis of 250 controlled trials demonstrated that treatment effects were overestimated (P < 0.001) when trials utilized inadequate concealment methods or failed to adequately report the concealment methods (52).
Within the publish or perish culture, researchers seek novel discoveries because they are favored by publishers, at the expense of re-examining previously conducted studies. Yet, while an objective of science is to uncover new information, the examples of contradictory results demonstrate that it is equally, if not more, important to corroborate research to understand if results can be trusted. Research needs to be reproducible, through reanalysis of study data, and replicable, by re-conducting studies using the same methodology to confirm that similar findings can be obtained.
Attempts to confirm previous research have resulted in a reproduction and replication crisis where between 35% and 90% of research has been found to be irreproducible or irreplicable (25-29). Of 49 highly regarded biomedical studies covering topics such as hormone-replacement therapy for women, vitamin E for heart disease, coronary stents to prevent heart attacks, and daily low-dose aspirin to control blood pressure and protect against heart attacks and strokes, 45 reported discovering effective interventions. Yet, when 34 of the studies were repeated, 41% found contradictory results (25).
Bayer Pharmaceuticals attempted to replicate 67 studies testing drugs for cancer, women’s health, and cardiovascular disease. Over 75% of the studies could not be reproduced to attain similar outcomes (28). Efforts to reproduce 18 microarray-based gene expression studies were only able to fully reproduce two studies, partially reproduce six studies, and could not reproduce any part of 10 studies (53). In 2017, the Reproducibility Project: Cancer Biology was initiated to verify 50 influential cancer studies published in high impact journals such as Nature, Science, and Cell.
As of 2019, efforts to reproduce the results from 10 studies had been conducted. Four studies reproduced important parts of the original studies, four studies reported mixed reproducibility of results, and two studies could not be reproduced at all (27). The biotechnology company Amgen attempted to reproduce 53 hallmark studies examining novel cancer treatments. Approximately 89% of the studies could not be replicated with similar findings (54). Attempts to replicate eight epidemiology genome-wide association studies examining 1,151 gene associations demonstrated that only 1.2% of gene loci-phenotype associations could be replicated (30).
The reproduction and replication crisis extends to the social sciences. Out of 100 psychology studies published in three top psychology journals, 97% of the original studies demonstrated significant results. However, out of 100 replication attempts, only 36% obtained similar findings, 47% contained effect sizes that were within the original 95% confidence interval, and 39% of effects were rated to have successfully replicated original results.
A meta-analysis of the original plus replicated studies revealed that only 68% of the results were statistically significant (26). Replication attempts to verify results from 21 studies published in Nature and Science demonstrated that 38% of effect sizes were not in the same direction as the original results, replications were on average 50% of the original effect size, and the true-positive rate was 67% (29).
Lack of reproducible and replicable outcomes is extremely concerning because individual studies form the basis for systematic reviews/meta-analyses and guidelines that inform clinical and healthcare policy decisions. If results cannot be corroborated, decisions about the application of research may have a dubious basis. The reproduction and replication crisis is the culmination of the dissemination crisis and is representative of how the quantity and quality of published research are diminished. Publication industry leaders acknowledge the dissemination crisis. Dr. Marcia Angell, physician and longtime Editor-in-Chief at The New England Journal of Medicine, stated:
"It is simply no longer possible to believe much of the clinical research that is published, or to rely on the judgment of trusted physicians or authoritative medical guidelines. I take no pleasure in this conclusion, which I reached slowly and reluctantly over my two decades as an editor of the New England Journal of Medicine" (55).
Dr. Richard Horton, current Editor-in-Chief at The Lancet, echoed these sentiments:
"The case against science is straightforward: much of the scientific literature, perhaps half, may simply be untrue. Afflicted by studies with small sample sizes, tiny effects, invalid exploratory analyses, and flagrant conflicts of interest, together with an obsession for pursuing fashionable trends of dubious importance, science has taken a turn towards darkness" (56).
The breadth of the dissemination crisis implies that there is no simple fix to remediate the resulting issues. The structure and function of the science publication system must be called into question. In particular, the system’s paradigm must be examined, since it largely incentivizes novel studies demonstrating statistically significant and positive effects over quality research that reports negative and non-significant effects. In 1970, Thomas Kuhn, a renowned physicist and philosopher of science, proposed that scientists will switch allegiance between paradigms when evidence against the dominant paradigm becomes large enough to demonstrate that the paradigm is outdated and in need of reform (57). As demonstrated, evidence has accumulated that the products and actions resulting from the science publication system are problematic and require modification. Incentives fuel actions, particularly in science:
An incentive structure that rewards publication quantity will, in the absence of countervailing forces, select for methods that produce the greatest number of publishable results. This, in turn, will lead to the natural selection of poor methods and increasingly high false discovery rates. Institutional change is difficult to accomplish, because it requires coordination on a large scale, which is often costly to early adopters (11).
Restructuring and achieving adoption of a new paradigm will only be possible if the new paradigm contains the necessary incentives and if the infrastructure is in place to functionalize the paradigm. The publication system’s incentive structure needs to shift away from rewarding quantity to rewarding quality, while ensuring that the total quantity of research can be disseminated and evaluated by the scientific community.
Ultimately, the dissemination crisis can only be remediated through widespread and fundamental change, and the preprint model, coupled with post-publication review, is a vehicle helping to catalyze this change. The model provides a safeguard against dissemination biases hindering research from being shared with the scientific community, while allowing new incentive and validation programs to emerge.
References
1. Detailed Indexing Statistics: 1965-2017. U.S. National Library of Medicine. https://www.nlm.nih.gov/bsd/index_stats_comp.html. Published 2018. Accessed September 20, 2018.
2. Blumle A, Meerpohl JJ, Schumacher M, von Elm E. Fate of clinical research studies after ethical approval--follow-up of study protocols until publication. PloS one. 2014;9(2):e87184.
3. Riveros C, Dechartres A, Perrodeau E, Haneef R, Boutron I, Ravaud P. Timing and completeness of trial results posted at ClinicalTrials.gov and published in journals. PLoS medicine. 2013;10(12):e1001566.
4. Ross JS, Tse T, Zarin DA, Xu H, Zhou L, Krumholz HM. Publication of NIH funded trials registered in ClinicalTrials.gov: cross sectional analysis. Bmj. 2012;344:d7292.
5. Hallinan ZP, Getz KA, Bierer BE. Compliance with results reporting at ClinicalTrials.gov. The New England journal of medicine. 2015;372(24):2370.
6. Scherer RW, Langenberg P, von Elm E. Full publication of results initially presented in abstracts. Cochrane Database of Systematic Reviews. 2005(3).
7. Hopewell S, Clarke M, Stewart L, Tierney J. Time to publication for results of clinical trials. The Cochrane database of systematic reviews. 2007(2):Mr000011.
8. Hopewell S, Loudon K, Clarke MJ, Oxman AD, Dickersin K. Publication bias in clinical trials due to statistical significance or direction of trial results. The Cochrane database of systematic reviews. 2009(1):Mr000006.
9. Fanelli D. Negative results are disappearing from most disciplines and countries. Scientometrics. 2013;90(3):891-904.
10. Matosin N, Frank E, Engel M, Lum JS, Newell KA. Negativity towards negative results: a discussion of the disconnect between scientific worth and scientific culture. Disease models & mechanisms. 2014;7(2):171-173.
11. Smaldino PE, McElreath R. The natural selection of bad science. Royal Society open science. 2016;3(9):160384.
12. Rosenthal R. The file drawer problem and tolerance for null results. Psychological Bulletin. 1979;86(3):638-641.
13. Pautasso M. Worsening file-drawer problem in the abstracts of natural, medical and social science databases. Scientometrics. 2010;85(1):193-202.
14. Song F, Parekh S, Hooper L, et al. Dissemination and publication of research findings: an updated review of related biases. Health technology assessment (Winchester, England). 2010;14(8):iii, ix-xi, 1-193.
15. Bassler D, Mueller KF, Briel M, et al. Bias in dissemination of clinical research findings: structured OPEN framework of what, who and why, based on literature review and expert consensus. BMJ open. 2016;6(1):e010024.
16. Cokol M, Ozbay F, Rodriguez-Esteban R. Retraction rates are on the rise. EMBO reports. 2008;9(1):2.
17. Fang FC, Steen RG, Casadevall A. Misconduct accounts for the majority of retracted scientific publications. Proceedings of the National Academy of Sciences of the United States of America. 2012;109(42):17028-17033.
18. McCook A. Retractions rise to nearly 700 in fiscal year 2015. Retraction Watch. https://retractionwatch.com/2016/03/24/retractions-rise-to-nearly-700-in-fiscal-year-2015-and-psst-this-is-our-3000th-post/. Accessed September 20, 2018.
19. Michiels S, Koscielny S, Hill C. Prediction of cancer outcome with microarrays: a multiple random validation strategy. Lancet. 2005;365(9458):488-492.
20. Ioannidis JP, Ntzani EE, Trikalinos TA, Contopoulos-Ioannidis DG. Replication validity of genetic association studies. Nature genetics. 2001;29(3):306-309.
21. Lawlor DA, Davey Smith G, Kundu D, Bruckdorfer KR, Ebrahim S. Those confounded vitamins: what can we learn from the differences between observational versus randomised trial evidence? Lancet. 2004;363(9422):1724-1727.
22. Ioannidis JP, Contopoulos-Ioannidis DG, Lau J. Recursive cumulative meta-analysis: a diagnostic for the evolution of total randomized evidence from group and individual patient data. Journal of clinical epidemiology. 1999;52(4):281-291.
23. Ioannidis J, Lau J. Evolution of treatment effects over time: empirical insight from recursive cumulative metaanalyses. Proceedings of the National Academy of Sciences of the United States of America. 2001;98(3):831-836.
24. Ebrahim S, Sohani ZN, Montoya L, et al. Reanalyses of randomized clinical trial data. Jama. 2014;312(10):1024-1032.
25. Ioannidis JP. Contradicted and initially stronger effects in highly cited clinical research. Jama. 2005;294(2):218-228.
26. Open Science Collaboration. Estimating the reproducibility of psychological science. Science. 2015;349(6251):aac4716.
27. Davis R. Reproducibility Project: Cancer Biology. eLife. https://elifesciences.org/collections/9b1e83d1/reproducibility-project-cancer-biology. Published 2017. Accessed.
28. Prinz F, Schlange T, Asadullah K. Believe it or not: how much can we rely on published data on potential drug targets? Nature reviews Drug discovery. 2011;10(9):712.
29. Camerer CF, Dreber A, Holzmeister F, et al. Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour. 2018;2(9):637-644.
30. Ioannidis JP, Tarone R, McLaughlin JK. The false-positive to false-negative ratio in epidemiologic studies. Epidemiology (Cambridge, Mass). 2011;22(4):450-456.
31. Bakker M, Wicherts JM. The (mis)reporting of statistical results in psychology journals. Behav Res Methods. 2011;43(3):666-678.
32. Westra HJ, Jansen RC, Fehrmann RS, et al. MixupMapper: correcting sample mix-ups in genome-wide datasets increases power to detect small genetic effects. Bioinformatics (Oxford, England). 2011;27(15):2104-2111.
33. Ioannidis JP. Acknowledging and Overcoming Nonreproducibility in Basic and Preclinical Research. Jama. 2017;317(10):1019-1020.
34. Jefferson T, Alderson P, Wager E, Davidoff F. Effects of editorial peer review: a systematic review. Jama. 2002;287(21):2784-2786.
35. Smith R. Peer review: a flawed process at the heart of science and journals. Journal of the Royal Society of Medicine. 2006;99(4):178-182.
36. Button KS, Bal L, Clark A, Shipley T. Preventing the ends from justifying the means: withholding results to address publication bias in peer-review. BMC psychology. 2016;4(1):59.
37. Mahoney M. Publication prejudices: An experimental study of confirmatory bias in the peer review system. Cognitive Therapy and Research. 1977;1(2):161-175.
38. Emerson GB, Warme WJ, Wolf FM, Heckman JD, Brand RA, Leopold SS. Testing for the presence of positive-outcome bias in peer review: a randomized controlled trial. Archives of internal medicine. 2010;170(21):1934-1939.
39. Ioannidis JP. Why most published research findings are false. PLoS medicine. 2005;2(8):e124.
40. Ingre M. Why small low-powered studies are worse than large high-powered studies and how to protect against "trivial" findings in research: comment on Friston (2012). NeuroImage. 2013;81:496-498.
41. Head ML, Holman L, Lanfear R, Kahn AT, Jennions MD. The extent and consequences of p-hacking in science. PLoS biology. 2015;13(3):e1002106.
42. Fanelli D. Do pressures to publish increase scientists' bias? An empirical support from US States Data. PloS one. 2010;5(4):e10271.
43. John L, Loewenstein G, Prelec D. Measuring the prevalence of questionable research practices with incentives for truth-telling. Psychological Science. 2012;23:524-532.
44. Simmons JP, Nelson LD, Simonsohn U. False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol Sci. 2011;22(11):1359-1366.
45. Sarwar U, Nicolaou M. Fraud and deceit in medical research. Journal of research in medical sciences : the official journal of Isfahan University of Medical Sciences. 2012;17(11):1077-1081.
46. Thiese MS, Walker S, Lindsey J. Truths, lies, and statistics. Journal of thoracic disease. 2017;9(10):4117-4124.
47. Head ML, Holman L, Lanfear R, Kahn AT, Jennions MD. The Extent and Consequences of P-Hacking in Science. PLoS biology. 2015;13(3):e1002106.
48. Leggett NC, Thomas NA, Loetscher T, Nicholls ME. The life of p: "just significant" results are on the rise. Quarterly journal of experimental psychology (2006). 2013;66(12):2303-2309.
49. Masicampo EJ, Lalande DR. A peculiar prevalence of p values just below .05. Quarterly journal of experimental psychology (2006). 2012;65(11):2271-2279.
50. Fanelli D. How Many Scientists Fabricate and Falsify Research? A Systematic Review and Meta-Analysis of Survey Data. PloS one. 2009;4(5):e5738.
51. Ioannidis JP, Lau J. Can quality of clinical trials and meta-analyses be quantified? Lancet. 1998;352(9128):590-591.
52. Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. Jama. 1995;273(5):408-412.
53. Ioannidis JPA, Allison DB, Ball CA, et al. Repeatability of published microarray gene expression analyses. Nature genetics. 2009;41:149.
54. Begley CG, Ellis LM. Drug development: Raise standards for preclinical cancer research. Nature. 2012;483(7391):531-533.
55. Marcovitch H. Editors, publishers, impact factors, and reprint income. PLoS medicine. 2010;7(10):e1000355.
56. Horton R. Offline: What is medicine’s 5 sigma? The Lancet. 2015;385.
57. Kuhn TS. The structure of scientific revolutions. 2nd ed. Chicago: University of Chicago Press; 1970.