Now you see it, now you don’t: On the deepening crisis in evidence production, and evaluation, in the social sciences (Part I: Problem description)

by

This may be a crisis that you have not heard about. Or maybe you have heard about some of the more egregious recent exhibits such as Sanna or Smeesters or Stapel or, possibly, Geraerts and Dijksterhuis.

Sanna resigned from the University of Michigan in May 2012 after a University of North Carolina investigation of concerns raised by University of Pennsylvania researcher Uri Simonsohn apparently found those concerns justified. (See here for the full story and here for the article describing the statistical methods that led Simonsohn to his concerns.) As is common in such cases, the results of the investigation were not made public, and neither were the reasons for Sanna’s resignation. As of today, however, 8 of Sanna’s publications have been retracted [see here for the most recent RetractionWatch update on Sanna], presumably because of the odd data patterns that Simonsohn had ferreted out.

Smeesters was a marketing professor at the Rotterdam School of Management. The investigation committee of Erasmus University Rotterdam found enough problems in his studies for the university to ask, in June 2012, for the retraction of two of his articles [see here for the most recent RetractionWatch update on Smeesters; see also this excellent ScienceInsider article on this case]. Also in 2012, after more than a decade of outright invention of data for literally dozens of papers, Diederik Stapel (recently profiled in a lengthy piece in the New York Times) finally had his overdue downfall [see here for the most recent RetractionWatch update on Stapel]. Geraerts (see here for a good discussion of that case) has denied that she faked data, but two of her co-authors have asked the editors of “Memory” to remove their names from the joint paper.

The Dijksterhuis situation (for a somewhat sensationalist write-up, see this news bit in “Nature”) is different from the Stapel and Sanna cases (and possibly the Smeesters and Geraerts situations) in that Dijksterhuis has not been accused of misconduct and none of his papers had to be withdrawn, although several have come under heavy attack [see a prominent example here: Shanks et al. on the priming of intelligent behaviour being an elusive phenomenon; that article triggered a sometimes vitriolic debate in the wake of a news bit in “Nature”].

Fraud and misconduct in the social sciences are hardly a novel phenomenon; just google names like Karen Ruggiero and Marc Hauser, formerly of Harvard University. As in other walks of life, it is hard to guard against the deviance evidenced in the behaviour of Ruggiero, Hauser, Sanna, Stapel, Lichtenthaler, and the like. It is good news that people like them seem to be ferreted out at an increasing rate. (Of course, we do not know how many similar cases are out there, nor at what rate new cases arise.)

Fraud and misconduct, however, are just the tip of the proverbial iceberg that the social sciences ship seems destined to collide with. Below the waterline, hiding the true size of the iceberg, lies what the committee investigating the Stapel affair called a culture of verification bias. To wit,

“One of the most fundamental rules of scientific research is that an investigation must be designed in such a way that facts that might refute the research hypotheses are given at least an equal chance of emerging as do facts that confirm the research hypotheses. Violations of this fundamental rule, such as continuing to repeat an experiment until it works as desired, or excluding unwelcome experimental subjects or results, inevitably tend to confirm the researcher’s research hypotheses, and essentially render the hypotheses immune to the facts. Procedures had been used in the great majority of the investigated publications that lead to what is referred to here as verification bias.” (p. 48)

The culture of verification bias leads to the publication of spurious positive results that often go unchallenged because editors and journal publishers are rarely open to replications (and negative results). Ed Yong’s excellent recent news feature for “Nature” reviews some of the relevant literature detailing this “pervasive bias” and quotes Ioannidis as saying that “most published research findings are false”, a problem that cuts across the sciences although it seems more severe in the softer ones. Fanelli (“PLoS ONE” 2010, abstract) concluded

“the odds of reporting a positive result were around 5 times higher among papers in the disciplines of Psychology and Psychiatry and Economics and Business compared to Space Science, 2.3 times higher in the domain of social sciences compared to the physical sciences, and 3.4 times higher in studies applying behavioural and social methodologies on people compared to physical and chemical studies on non-biological material.”

See also Fanelli’s related 2012 “Scientometrics” article in which he documents that negative results are disappearing from most disciplines and countries. (I will revisit this article in due course since Fanelli also has interesting things to say about the institutional drivers of these developments.)
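Ioannidis’s claim that most published findings are false rests on simple arithmetic. In the simplest version of his argument (ignoring the bias term he also models), the share of statistically significant findings that reflect true effects, the positive predictive value, is PPV = (1 − β)R / ((1 − β)R + α), where α is the significance level, 1 − β the power of the study, and R the prior odds that a tested hypothesis is true. The Python sketch below merely evaluates that formula; the scenarios in the usage lines are hypothetical values, not estimates for any field.

```python
def ppv(power: float, alpha: float, prior_odds: float) -> float:
    """Share of 'significant' findings that reflect true effects.

    Simplest case of Ioannidis's argument, with no bias term:
        PPV = power * R / (power * R + alpha)
    where R = prior_odds = (true hypotheses tested) / (false hypotheses tested).
    """
    if not (0 < alpha < 1 and 0 <= power <= 1 and prior_odds >= 0):
        raise ValueError("need 0 < alpha < 1, 0 <= power <= 1, prior_odds >= 0")
    true_positives = power * prior_odds  # expected true hits per false hypothesis tested
    false_positives = alpha              # expected false alarms per false hypothesis tested
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Hypothetical scenarios: (power, prior odds that a tested hypothesis is true)
    for power, prior_odds in [(0.8, 1.0), (0.25, 0.25), (0.25, 0.1)]:
        print(f"power={power}, R={prior_odds}: PPV={ppv(power, 0.05, prior_odds):.3f}")
```

With power 0.8 and even prior odds, about 94 percent of significant findings are true; with power 0.25 and prior odds of 1 to 10, only about a third are. Most significant findings are false whenever (1 − β)R falls below α, which is why low power matters so much.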

Small studies with low power are one of the key issues feeding into these findings, as argued in a highly readable piece by Kate Button in “The Guardian” [the piece contains a link to an article by her and others, including Ioannidis], and as argued in an important Ferguson & Heene contribution to “Perspectives on Psychological Science”. These authors argue that small sample size undermines the reliability of neuroscience and that of psychology. The situation is likely to be similar in (experimental) economics, where the median power of dictator studies (according to work in progress by one of my Ph.D. students, Le Zhang) seems to be less than 25 percent. This means that if there are 100 true effects to be discovered, studies with 25 percent power can on average be expected to discover only 25 of them. (The flip side of this is that those small, low-powered studies that do discover a true effect are likely to overstate its size.) Of course, there are many other practices, some more consciously adopted than others, that lead to irreproducible results. This is true for all laboratory social sciences.
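Both halves of the power argument, that low-powered studies miss most true effects and that the ones which do reach significance overstate them, can be checked with a small simulation. The Python sketch below assumes the simplest possible design: two groups, normally distributed outcomes with known unit standard deviation, and a two-sided z-test at α = 0.05. The effect size d = 0.4 and the 20 subjects per group are hypothetical numbers, chosen only because they give power of roughly one quarter; they are not taken from the dictator-game literature or from Le Zhang’s work.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def analytic_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Power of a two-sided, two-sample z-test with known unit SD for true effect d."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    ncp = d * math.sqrt(n_per_group / 2)  # mean of the z-statistic when the effect is real
    return STD_NORMAL.cdf(ncp - z_crit) + STD_NORMAL.cdf(-ncp - z_crit)


def simulate(d: float, n_per_group: int, n_studies: int = 10_000,
             alpha: float = 0.05, seed: int = 1):
    """Run many hypothetical two-group studies of a true effect d.

    Returns (share of studies that reach significance,
             average estimated effect among those significant studies).
    """
    rng = random.Random(seed)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    se = math.sqrt(2 / n_per_group)  # SE of a difference in means when SD = 1
    significant = []
    for _ in range(n_studies):
        control = sum(rng.gauss(0.0, 1.0) for _ in range(n_per_group)) / n_per_group
        treated = sum(rng.gauss(d, 1.0) for _ in range(n_per_group)) / n_per_group
        estimate = treated - control
        if abs(estimate / se) > z_crit:  # study "discovers" the effect
            significant.append(estimate)
    if not significant:
        return 0.0, float("nan")
    return len(significant) / n_studies, sum(significant) / len(significant)


if __name__ == "__main__":
    d, n = 0.4, 20  # hypothetical effect size and per-group sample size
    power = analytic_power(d, n)
    share, avg_sig = simulate(d, n)
    print(f"analytic power {power:.3f}, simulated {share:.3f}")
    print(f"expected discoveries among 100 true effects: {100 * power:.0f}")
    print(f"true effect {d}; mean estimate in significant studies {avg_sig:.3f}")
```

With these hypothetical numbers the power is roughly 0.24, so about 24 of 100 true effects would be detected, and the studies that do reach significance overstate the true effect on average by roughly a factor of two. A known-SD z-test is used instead of a t-test only to keep the sketch within the Python standard library; a t-test would give slightly lower power.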

One thing that is particularly troubling about the current controversy swirling around Dijksterhuis (and, earlier, about the controversial studies by Bargh and Bem discussed in Yong’s already mentioned piece) is that experimental protocols are often severely underspecified. “The scientific enterprise rests on the ability of other researchers to replicate findings,” says Joel Cooper, reflecting on fraud, deceit and ethics in one of the victimized journals. It is troubling to find, in the discussion that followed Ed Yong’s news item in “Nature”, that many experimenters seem to have trouble with that concept.

The one good development is that these problems are now being discussed. Unfortunately, the fact that they are being discussed does not, as Gary Marcus seems to argue in his recent “New Yorker” piece, guarantee that effective solutions will be found; see also Bobbie Spellman’s optimistic assessment of the situation. The game called “stabilization of the evidence base” has many players with often diverging interests, and the institutional arrangements are not engineered by an omniscient social planner. In light of the ever-increasing competition among researchers for grant money, research-only positions, recognition, and even fame, it is quite questionable whether science will be able to self-correct.

I shall summarize almost a dozen proposals in a sequel to this problem description.

For now, I am happy to hear comments on it.

Class, discuss!

11 Responses to "Now you see it, now you don’t: On the deepening crisis in evidence production, and evaluation, in the social sciences (Part I: Problem description)"
    • Hmmmh. Very incomplete (e.g., Lichtenthaler). RetractionWatch seems by far the more reliable site. Also, plagiarism and self-plagiarism are two different animals to my mind and not properly distinguishing them is a very problematic procedure.

      • Yes, very true. Do you know of any cases where the two different plagiarism types have interacted, e.g., copying someone else’s main idea (published previously) and then sending it to multiple journals at the same time? What would you label this? Thanks.

        • I am not aware of any such case. If they exist it is likely to be a very small number. I guess we could call this a self-plagiarizing plagiarism 😉

  1. I’d like to use this Comment space to shamelessly promote the Replications Section of the journal Public Finance Review, of which I am a Co-editor. More journals should encourage replications — with the caveat that they should publish both negative and positive confirmations of the original research. The discipline needs to incentivize the checking of published research.

    • I’m with you, Bob. And happy 2 hear about PFR. And, yes, of course both negative and positive confirmations should be published. That’s why pre-registered replics is such a good idea me thinkst.

  2. Very few people pay much attention to econometric methods and data in economics — this is apparent just from the published papers. Many researchers these days run regressions or non-parametric tests on point estimates from individuals, for instance. And OLS on anything that moves, despite that year of grad school econometrics warning you about this. So why worry so much about fraud when we have the deeper issue of incompetence staring us in the face and we do nothing about it?

    • I thought I made it clear that outright fraud and misconduct is really not the key issue. Apparently not. Of course I am with you on the need for better metrics. That there are problems of competence does not mean that one can, or should, not worry about those other issues that I try to address here … they are easily as troubling as the ones that you mentioned.

    • I think you’re reading the wrong journals. Editors – and readers- are extremely concerned about econometric methods though not so much data. OLS won’t get you very far in economics.
