This may be a crisis that you have not heard about. Or maybe you have heard about some of the more egregious recent cases, such as those of Sanna, Smeesters, Stapel, or, possibly, Geraerts and Dijksterhuis.
Sanna resigned from the University of Michigan in May 2012 after a University of North Carolina investigation of concerns raised by University of Pennsylvania researcher Uri Simonsohn apparently found those concerns justified. (See here for the full story and here for the article describing the statistical methods that led Simonsohn to his concerns.) While the investigation results, as often in such cases, were not made public, as of today Sanna has had 8 of his publications retracted [see here for the most recent RetractionWatch update on Sanna], presumably because of the odd data patterns that Simonsohn had ferreted out. (The reasons for Sanna’s resignation were not made public.)

Smeesters was a marketing professor at the Rotterdam School of Management. The investigation committee of Erasmus University Rotterdam found enough problems in his studies for the university to ask in June 2012 for the retraction of two of his articles [see here for the most recent RetractionWatch update on Smeesters; see also this excellent ScienceInsider article on the case].

Also in 2012, after more than a decade of outright invention of data for literally dozens of papers, Diederik Stapel (recently profiled in a lengthy piece in the New York Times) finally had his overdue downfall. [See here for the most recent RetractionWatch update on Stapel.]

Geraerts (see here for a good discussion of that case) has denied that she faked data, but two of her co-authors have asked the editors of “Memory” to retract their names from the joint paper.

The Dijksterhuis situation (for a somewhat sensationalist write-up see this newsbit in “Nature”) is different from the Stapel and Sanna cases (and possibly the Smeesters and Geraerts situations) in that Dijksterhuis has not been accused of misconduct and none of his papers had to be withdrawn, although several have come under heavy attack [see a prominent example here: Shanks et al. on the priming of intelligent behaviour being an elusive phenomenon; that article triggered a sometimes vitriolic debate in the wake of a news bit in “Nature”].
Fraud and misconduct in the social sciences are hardly a novel phenomenon; just google names like Karen Ruggiero and Marc Hauser, formerly of Harvard University. As in other walks of life, it is hard to guard against the deviance evidenced in the behaviour of Ruggiero, Hauser, Sanna, Stapel, Lichtenthaler, and the like. It is good news that people like them get ferreted out, seemingly at an increasing rate. (Of course, we do not know how many similar cases are out there, or what the growth rate of new cases is.)
Fraud and misconduct, however, are just the tip of the proverbial iceberg that the social sciences ship seems destined to collide with. Hiding the true dimension of the underwater parts of the iceberg is what the committee investigating the Stapel affair called a culture of verification bias. To wit,
“One of the most fundamental rules of scientific research is that an investigation must be designed in such a way that facts that might refute the research hypotheses are given at least an equal chance of emerging as do facts that confirm the research hypotheses. Violations of this fundamental rule, such as continuing to repeat an experiment until it works as desired, or excluding unwelcome experimental subjects or results, inevitably tend to confirm the researcher’s research hypotheses, and essentially render the hypotheses immune to the facts. Procedures had been used in the great majority of the investigated publications that lead to what is referred to here as verification bias.” (p. 48)
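One practice the committee names, continuing (or extending) an experiment until it “works”, can be made concrete with a minimal simulation. The numbers below (batch sizes, a cap of 100 subjects, a z-test with known variance) are my own illustrative assumptions, not anything from the Stapel report; the point is only that peeking at the data after every added batch, under a true null effect, pushes the false-positive rate well above the nominal 5 percent.

```python
import random
import math

random.seed(1)

def z_significant(data):
    """Two-sided z-test of mean 0 with known unit variance (p < .05)."""
    z = sum(data) / math.sqrt(len(data))
    return abs(z) > 1.96

def run_honest(n=20):
    """Fixed sample size, one look at the data."""
    return z_significant([random.gauss(0, 1) for _ in range(n)])

def run_optional_stopping(start=20, step=10, max_n=100):
    """Test after every added batch; stop as soon as the result is 'significant'."""
    data = [random.gauss(0, 1) for _ in range(start)]
    while True:
        if z_significant(data):
            return True                       # "publish" the positive result
        if len(data) >= max_n:
            return False                      # give up (file-drawer)
        data += [random.gauss(0, 1) for _ in range(step)]

trials = 10_000
honest = sum(run_honest() for _ in range(trials)) / trials
peeking = sum(run_optional_stopping() for _ in range(trials)) / trials
print(f"fixed-n false-positive rate:   {honest:.3f}")   # close to the nominal .05
print(f"optional-stopping rate:        {peeking:.3f}")  # clearly larger
```

Both rates are computed under a true effect of exactly zero; the only difference is how many times the researcher looks at the data, which is precisely why this practice “renders the hypotheses immune to the facts”.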
The culture of verification bias leads to the publication of spurious positive results that often go unchallenged because editors and journal publishers are rarely open to replications (and negative results). Ed Yong’s excellent recent news feature for Nature reviews some of the relevant literature detailing this “pervasive bias” and quotes Ioannidis as saying that “most published research findings are false”, a claim that holds across the sciences, although the problem seems more severe in the softer ones. Fanelli (“PLoS ONE” 2010, abstract) concluded
“the odds of reporting a positive result were around 5 times higher among papers in the disciplines of Psychology and Psychiatry and Economics and Business compared to Space Science, 2.3 times higher in the domain of social sciences compared to the physical sciences, and 3.4 times higher in studies applying behavioural and social methodologies on people compared to physical and chemical studies on non-biological material.”
See also Fanelli’s related 2012 “Scientometrics” article in which he documents that negative results are disappearing from most disciplines and countries. (I will revisit this article in due course since Fanelli also has interesting things to say about the institutional drivers of these developments.)
Small studies with low power are one of the key issues feeding into these findings, as argued in a highly readable piece by Kate Button in “The Guardian” [the piece contains a link to an article by her and others, including Ioannidis], and in an important Ferguson & Heene contribution to “Perspectives on Psychological Science”. These authors argue that small sample sizes undermine the reliability of neuroscience and of psychology, respectively. The situation is likely to be similar in (experimental) economics, where the median power of dictator studies (according to work in progress by one of my Ph.D. students, Le Zhang) seems to be less than 25 percent, meaning that if there are 100 true effects to be discovered, studies with 25 percent power can on average be expected to discover only 25 of them. (The flip side is that those small, low-powered studies that do discover a true effect are likely to overstate it.) Of course there are many other practices, some more consciously engaged in than others, that lead to the irreproducibility of results. This is true for all laboratory social sciences.
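Both claims in the paragraph above, the low discovery rate and the overstatement of discovered effects, can be illustrated with a short simulation. The true effect size (0.25) and per-study sample size (25, which yields roughly 24 percent power for this test) are made-up numbers chosen only to land near the 25-percent-power figure in the text, not estimates from any of the studies discussed.

```python
import random
import math

random.seed(1)

TRUE_EFFECT = 0.25   # hypothetical standardized effect, assumed for illustration
N = 25               # per-study sample size -> roughly 24% power for this z-test

def study():
    """One study: one-sample z-test of mean 0, known unit variance."""
    data = [random.gauss(TRUE_EFFECT, 1) for _ in range(N)]
    est = sum(data) / N
    significant = abs(est) * math.sqrt(N) > 1.96
    return significant, est

trials = 10_000
results = [study() for _ in range(trials)]
hits = [est for sig, est in results if sig]

power = len(hits) / trials            # fraction of true effects "discovered"
mean_sig_est = sum(hits) / len(hits)  # average estimate among discoveries

print(f"empirical power: {power:.2f}")  # ~0.25: only ~25 of 100 true effects found
print(f"mean estimate among significant studies: {mean_sig_est:.2f} "
      f"(true effect: {TRUE_EFFECT})")
```

The second printed number is noticeably larger than the true effect: to clear the significance threshold with only 25 observations, the estimate must come out large, so the studies that “succeed” are exactly the ones that overshoot.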
One thing that is particularly troubling about the current controversy swirling around Dijksterhuis (and earlier the controversial studies by Bargh and Bem discussed in Yong’s already mentioned piece) is that experimental protocols are often severely underspecified. “The scientific enterprise rests on the ability of other researchers to replicate findings,” says Joel Cooper, reflecting on fraud, deceit and ethics in one of the victimized journals, and it is troublesome to find, in the discussion that followed Ed Yong’s news item in Nature, that many experimenters seem to have trouble with that concept.
The one good development is that these problems are now being discussed. Unfortunately, the fact that they are being discussed does not, as Gary Marcus seems to argue in his recent “New Yorker” piece, guarantee that effective solutions will be found; see also Bobbie Spellman’s optimistic assessment of the situation. The game called “stabilization of the evidence base” has many players with often diverging interests, and the institutional arrangements are not engineered by an omniscient social planner. It is quite questionable whether, in light of the ever-increasing competition among researchers for grant money, research-only positions, recognition, and even fame, science will be able to self-correct.
I shall summarize almost a dozen proposals in a sequel to this problem description.
For now, I am happy to hear comments on it.
Class, discuss!