Ideology begets shoddy workmanship and bad outcomes

I have previously, on these pages, commented on developments and issues concerning the regulatory framework for what some people call the third sector and others call the not-for-profit sector (see here and here and here and here).

Most recently, and in light of the Abbott Government’s resolve to abolish the Australian Charities and Not-for-profits Commission (ACNC), I reflected on the arguments for, and against, some such move, giving particular attention to the fact that the ACNC has been in operation for more than a year. I pointed out that there is a considerable scientific literature out there that speaks to the issue of appropriate regulatory frameworks for third sectors, that we can draw on considerable experience from other jurisdictions, and that there are plenty of insights to draw on from areas such as environmental or tax compliance. But the government, and the many journos and “policy advisors” who currently do its bidding, seem bent on ignoring that evidence.

Notwithstanding its promise to consult with stakeholders, and what looks like considerable support from the sector in favor of the ACNC, and notwithstanding emerging evidence that the ACNC really helps us understand the lay of the not-for-profit landscape and its deplorable lack of transparency and accountability, the Abbott Government recently introduced an ACNC repeal bill. Well, Part 1 of some such bill, with a second part, said to contain all the details, promised for later delivery. That in itself is a rather questionable set-up, obviously meant to prevent robust debate of the nuts and bolts of the arrangements that the Abbott Government might eventually come up with. Indeed, this Federal Government repeal omnibus was so obviously flawed that the Selection of Bills Committee referred it to the Senate Standing Committee on Economics for inquiry; the latter is not expected to report back to Parliament until 16 June 2014. The Senate Standing Committee on Economics subsequently asked for submissions to start the process “to thoroughly consider the impacts on Australia’s charitable sector of the repeal of the Australian Charities and Not-for-profits Commission and ensure adequate stakeholder consultation”.

May 2 was the deadline for submissions and as of this past weekend 16 submissions were shown on the Committee’s website; one would hope that there is some delay in the listing process, for such a low number of submissions would not exactly confirm the alleged support of the sector for the ACNC. (Update Monday early afternoon: There are currently more than 50 submissions; update Tuesday very early morning: more than 50 submissions added since the weekend. Stay tuned.)

Among the contributions currently available there are a couple of real gems.

The Queensland Law Society, for example, in an eminently readable six-page brief has identified as problematic the adoption of the two-stage legislative process; it points out that informed debate on Bill 1 is effectively impossible “as many of the issues necessarily raised cannot be considered in isolation, and cannot be adequately addressed without analysing the No.2 Bill.” The Society has also stressed that the current unclear situation, apart from generating considerable uncertainty for charities, “makes good administration by the current ACNC extremely difficult, which is surely an unnecessary outcome.” (p.2) Yes, an unnecessary outcome it is but, as I pointed out here, quite possibly one purpose of the exercise.

The Queensland Law Society rips bluntly into several factual misrepresentations in the regulatory impact statement (RIS), calling the RIS “less than rigorous, and not meeting the usually high standards and disciplines of Commonwealth legislative process.” (p. 2) It alleges, in other words, shoddy workmanship. Among other things, it identifies the failure to assess “the regulatory gaps and inconsistencies that were associated with regulation by the Australian Securities [and Investments Commission and] the Australian Taxation Office and makes no assessment of the consequences of returning regulation, piecemeal, to these agencies.” (p. 3) It recalls the ATO’s own earlier position on the need for something akin to an ACNC and documents the efficiency of the ACNC relative to the ATO (p. 4). Last but not least, the Society reiterates “the inherent conflict of having the arm of Government charged with maximizing the tax revenue as also determining the entitlements to tax concessions”. (p. 6) These are all important issues that a legislator who cares about what is good for the sector, and for Australia for that matter, should pay attention to.

Another important submission is that by Creating Australia which, as I did here, points out that building something like the ACNC takes time and that attempts to gauge its performance after one year are at best dishonest. It also points out that the ACNC has already dealt with more than 500 complaints and investigated more than 240 of them. It then goes on to ask:

“If the ACNC were abolished, would charities and not-for-profits revert to the previous disparate regimes with ASIC and state-based incorporation requirements? 10% of charities and not-for-profits are companies limited by guarantee, 40% are incorporated. The sector will lose the benefits of a regulator which understands the challenges and opportunities which a single regulator can provide.

In the absence of the ACNC:

– Who will investigate complaints against charities and not-for-profits?

– Who will support the additional costs required to unpick one regulatory format for another?

– Indeed what form of regulation will replace the ACNC? [it is a challenge for the sector to respond to the proposed repeal as there are no details of the arrangements to replace the ACNC.]

– Who will have powers to investigate issues relating to financial management, breaches of governance procedures, the risk of fraud?

– Where will the public go to make complaints?

– Isn’t it going to be expensive to develop yet another ‘agency which succeeds the Australian Charities and Not-for-profits Commission’? Why wouldn’t you then retain the ACNC?

– We understand that ASIC can deal with governance but what will happen to the 40% of incorporated charities and not-for-profits? The 10% which are cooperatives? We cannot speak for the 40% unincorporated organisations which are mainly community focused.

– How will the powers vested in the ATO support the credibility of the charitable and not-for-profit sector?”

Good questions all. The government seems blinded by its ideological anti-regulatory stance. It makes disingenuous arguments about wanting to get rid of red tape when in fact its actions undermine and slow down the process of reducing it. The government also seems not to understand that the red tape exists because the kinds of arrangements that were in place previously were not, and will not be, up to the specific task (as acknowledged by the ATO and documented in the Queensland Law Society’s brief). The fact is that the ATO and ASIC, pre-ACNC, were unable to produce the kind of transparency and accountability of the third sector that the third sector, and society at large, need and are entitled to, given the considerable tax concessions bestowed directly or indirectly on the sector. The resultant lack of transparency and accountability has cost society literally billions of dollars in outright fraud and regulatory compliance costs. In light of the important – and indisputable – deterrent function that a functioning ACNC has, $15 million seems a small price to pay.

It is unfortunate and deplorable that in an area where we can draw on considerable knowledge to engineer good policy, ideology and shoddy workmanship are likely to produce some bad, and very costly, outcomes.


Should the Government keep the ACNC? And if not, what should it put in its stead?

In the run-up to the election, Tony Abbott made it clear that a government led by him would abolish the Australian Charities and Not-for-profits Commission (ACNC) and thus undo one of the Gillard Government’s signature projects. Apparently that is about to happen in at least a couple of ways. In a recent speech to directors of not-for-profits, Federal Social Services Minister Kevin Andrews announced that the dismantling of the ACNC was imminent and that in its stead a National Centre for Excellence would be installed, designed to be “a fount of both innovation and advocacy”.

“We’ll abolish the Australian Charities and Not-for-Profits Commission which in the view of this Government imposes an unnecessary and ponderous compliance burden on the sector,” Andrews is reported to have said in his speech. In line with an earlier speech, he is reported to also have said, “We want to transfer the focus from coercive compliance and regulation to collaborative education, training and development.” How nice.

More recently, ACNC staff were apparently also offered voluntary redundancies, a not-too-subtle attempt to destabilize an entity that is still in the process of being built; an attempt, though, that might be necessary since abolishing the ACNC requires an Act of Parliament. In the most recent development, the Australian Tax Office is reported to be planning to take over the regulatory and classificatory functions of the ACNC, another not-too-subtle signal that the Abbott Government is determined to unwind the ACNC.

Predictably, Federal Labor and the Greens were not pleased. Shadow Assistant Treasurer Andrew Leigh, for example, argued that the Abbott Government should keep the ACNC, if only because, according to one survey, four out of every five “leaders” of the third sector were in favor of keeping it; see also here for another piece of his mind.

So what about compliance alternatives? What vision does the Abbott Government have for a regulatory framework that spans the federal and state levels and makes sure that the third sector is accountable and transparent and justifies the trust put into it through concessions and donations? In his speech to directors of not-for-profits, Andrews was reported as having left open the possibility of some sort of national registrar; importantly, he mentioned a charity watchdog modelled on the US-based Charity Navigator, an entity that specializes in drawing up top-ten lists: 10 super-sized charities, 10 charities expanding in a hurry, 10 consistently low-rated charities, 10 charities overpaying their for-profit fundraisers, 10 highly rated charities with low-paid CEOs, and so on. Charities are also rated on their financial health and their accountability & transparency, but it is not clear how that is done. Also not clear is how Charity Navigator intends to rate performance, as it has announced it will.

Andrews provided few details about the exact configuration of compliance activities that he has in mind. Or for that matter about the National Centre for Excellence.

According to an “analysis” – more a rant, really – on the Pro Bono Australia website, “policy analyst” John Butcher suggested that Andrews might find “the intellectual justification that has hitherto been conspicuously absent from the policy discussion” in an analysis that Helen Rittelmeyer wrote for the “right-wing think-tank” Centre for Independent Studies. Butcher credits Rittelmeyer with “unimpeachable conservative credentials”.

I am on record as having been skeptical of both the ACNC and the way regulatory reform was engineered (see here and here, and relatedly here and here; the first two CET contributions are on the insufficient way the accountability and transparency of the third sector are being addressed in Australia, the last two on the Mortensen / CAI case). So I read Rittelmeyer’s document (downloadable from the CIS website) wondering what this “credible poster child of libertarian conservatism” (Butcher) has to say, whether she really is as ideological as Butcher suggests, and what regulatory framework agenda we might have to expect from the Abbott Government.

While Rittelmeyer gets some facts and assessments right, she gets others remarkably wrong. Importantly, she simply does not know some of the relevant literature and does not understand subtle but consequential differences between charity watchdogs. In an environment where even a Federal minister can admit without consequence that he gets his facts from Wikipedia, that is not too surprising, I guess. After all, disdain for science seems par for the course for the Abbott Government (a debate about climate change, anyone?). But I digress. Back to Rittelmeyer.

As also highlighted by the two-page “snapshot” that accompanies her document, Rittelmeyer concludes that “even with an annual budget close to $15 million, it is unlikely that the ACNC will make significant progress on any of the three objectives it was created to address: improving public trust in the NFP sector; reducing the burden of red tape that charities now face; and policing fraud and wrongdoing in the sector. The commission’s record during its first year has only confirmed this scepticism.”

To repeat, I am by no means a fan of the ACNC and the way it was implemented. But while there are legitimate questions to be asked about the presumption of honesty that the ACNC applies, and the apparently related issue of compliance staff having left soon after they were hired, Rittelmeyer’s assessment strikes me as ignorant of what it takes to build organizations.

Throughout 2013 the ACNC was in building mode; during that one year it registered almost 60,000 charities (including almost 2,000 new ones). If these data – as problematic by their very nature and as incomplete as they are; see also here – are made available in full to interested parties and researchers (as is planned, and as the Charities Commission of NZ did), then there is considerable potential for ferreting out various kinds of misbehaviour, ranging from outright fraud to various forms of misconduct.

The ACNC received in excess of 200 complaints last year that provided leads to various problems; I understand that several dozen are still under investigation. It is simply too early to say whether the ACNC is able to ferret out fraud and various forms of misconduct. And contrary to Rittelmeyer’s claim that fraud and misbehaviour are nothing to worry about in Australia, there remains plenty of work to do, and a well-run ACNC – simply by harmonizing reporting standards and collecting facts about the lay of the not-for-profit landscape – could go some way towards solving blatant problems such as this (the high cost of being charitable) and this (charities banking on a lack of transparency).

Rittelmeyer takes the 2012 BDO report as an indication that fraud is “a minor and declining problem in Australia” (p. 6) because the percentage of charities that have experienced fraud is at 12% – a number similar to that reported by the National Fraud Authority in England – but less than in previous years. The NFA also reports that only about a fifth of respondents apparently attempted to measure their fraud loss. Even if the total losses amounted to only a couple of percent of the revenues (donations) flowing into the sector, they would amount to more than a billion dollars. And that’s before one takes into account that self-reported figures in this context are highly likely to be under-reported because of the reputational consequences.

And it’s before we start talking about various forms of misconduct. A minor problem?
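For what it’s worth, the back-of-the-envelope arithmetic behind the “more than a billion dollars” figure is easy to make explicit. The sketch below is illustrative only: the 2% rate stands in for “a couple of percent”, and the revenue threshold is what the text implies rather than an official statistic.

```python
# Illustrative arithmetic only: what sector revenue is implied if losses of
# "a couple of percent" (here 2%) exceed $1 billion? No official figures used.

def fraud_loss(sector_revenue: float, loss_rate: float) -> float:
    """Estimated total loss if loss_rate of sector revenue is lost to fraud."""
    return sector_revenue * loss_rate

# Revenue needed for a 2% loss rate to produce $1 billion in losses:
implied_revenue = 1e9 / 0.02  # $50 billion

print(f"${implied_revenue:,.0f}")  # $50,000,000,000
print(f"${fraud_loss(implied_revenue, 0.02):,.0f}")  # $1,000,000,000
```

In other words, the claim holds whenever sector revenues exceed roughly $50 billion; any under-reporting in the self-reported fraud figures only pushes the estimate higher.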

(There are other reasons, such as the deterrent function of an effective ACNC, as well as the fact that without some such agency tax status and breaks would by default have to be determined by the ATO, which seems an obvious conflict of interest – although, as we learned yesterday, a conflict of interest obviously lost on the Abbott Government.)

As regards the argument that the ACNC has failed to address red tape, it also seems way too early to tell. Certainly the intentions are good; and even those skeptical about what the ACNC can do given its current set-up and philosophy will have to acknowledge that it takes time to reduce red tape. That the ACNC so far has not reached agreement with NSW and Victoria to eliminate duplicate reporting is hardly surprising given the adversarial politics that have been on display in Australia for years. It would be interesting to learn from the South Australia and ACT precedents how much the harmonization of reporting has saved there. Surely it was more than the 1 hour per economically significant nonprofit that Rittelmeyer allows the national director of UnitingCare Australia to whine about (p. 4).

Rittelmeyer also argues, comparing “trust” data from England, NZ, and Australia, that the ACNC has not managed to instill trust in the sector. Again, it seems way too early to make that judgement. It also seems rather silly to compare trust scores across jurisdictions that are so different. Such numbers have little meaning.

(And while we are at it, the Charities Commission of NZ was not shut down because it did not deliver. There are conflicting reports about what happened in NZ, and some of them suggest just the opposite; surely it did not fail as miserably as the Charity Commission of England seems indeed to have done.)

Rittelmeyer’s conclusions become even more problematic and, I would argue, untenable when she thinks out loud about compliance alternatives. Like Andrews, she seems to favor organizational templates such as Charity Navigator and GuideStar.

As to GuideStar, I have yet to read a serious assessment that comes out in its favor and that factors in the huge amounts of funding that the project has received so far. Let’s also remember that its first, and so far most successful, incarnation (still operating at a loss as of last year) is built on data that are self-reported and, importantly, collected by the government. (Yes, those pesky IRS 990 forms.)

Both Andrews and Rittelmeyer favor a charity watchdog modelled on the US-based Charity Navigator. As mentioned, Charity Navigator specializes in drawing up various top-ten lists and also – somehow – rates charities’ financial health and their accountability & transparency. How exactly these assessments are made is unclear. Notably, Charity Navigator gave CAI / Mortensen a four-star rating years after questions about it were raised.

Szper & Prakash (Voluntas 2011), examining 90 nonprofits in the state of Washington for the period 2004–2007, find that ratings tend not to affect donor support for these nonprofits; based on interviews, the authors suggest that charities believe donors tend not to be affected by the Charity Navigator ratings, maybe because they realize that the ratings are not reliable. The authors review several other studies that come to similar conclusions. Relatedly, Szper (Voluntas 2013) finds that, between 2004 and 2008, the financial information reported by nonprofits reacted to ratings, although she makes clear that she does not intend to imply that ratings changed these nonprofits’ internal operations or how they perceive themselves. This “playing to the test” is of course what economic theory would predict and what numerous instances of US colleges and universities misreporting to U.S. News & World Report suggest.

A key problem is that the data Charity Navigator uses are typically provided by the charities themselves. Says Rittelmeyer (p. 9):

“Each charity evaluator has a different approach, but the general model is the same. The evaluator obtains information about charities from publicly available government documents (such as IRS forms in the United States), from the charity’s own website, or by asking the charity to voluntarily submit information. The evaluator then posts some or all of the information on its own website, usually with its own evaluation of the charity attached in the form of a letter grade or a star rating.”

Note the implications: in a market segment widely known for its diverse set of organizations – organizations that in addition produce mostly experience and credence goods – accountability and transparency are essentially left as a choice to those who might well have a vested interest in not being accountable and transparent.

The presumption of honesty may be a noble one, but it is one that is too often taken advantage of (politicians charging private travel to the public purse, anyone?). The literature on tax or environmental compliance, and the recent rash of (laboratory) studies on dishonesty (see also here or here), makes clear that this is a silly assumption at best. The truth is, we lie a lot and we lie frequently. Honestly. And we do so the more that is at stake. There is a lot at stake in the third sector, and taking advantage is made easier because people working in that sector often have a very high opinion of their work (and themselves) and hence find it easy to rationalize transgressions as being in the interest of some higher cause.

Rittelmeyer seems to argue that the proliferation of national and international evaluators will somehow – Competition! Competition! – take care of whatever imperfections the current watchdogs are afflicted with. As far as I can see, that is not likely given their structural problems. And I am in any case not aware of any empirical evidence that would support that conjecture.

I have elsewhere (see here and here and here) made the case for a two-pronged approach: a truly independent charity commission/registrar that provides elementary data to help us understand the lay of the third sector and harmonizes reporting standards, combined with the kind of certification model championed by the International Committee on Fundraising Organizations (ICFO) for the few hundred nationally operating charities. Some of the more successful templates are those provided by West European certification agencies. I have pointed to them previously (e.g., here in my comment on the Productivity Commission report).

As it is, the ICFO – a growing confederation of certification agencies in a wide number of countries – has just published a new and comprehensive description of the various models used by its members. It is an intriguing read – all 158 pages of it – that would serve well those who are seriously concerned about the accountability and transparency of the third sector (see here for the press release and here for the magnum opus itself).

The various certification models represented in the ICFO are not – wisely, in my opinion – based on the presumption of honesty and attempts at damage control once the damage has been done; most models require the few hundred nationally operating charities – if they apply – to submit to extensive self-reporting and follow-up onsite visits. Those that pass the test are given a seal of approval that has no gradations: a charity is considered either to walk its talk or not. Note that nationally operating charities have an incentive to apply for certification, since not applying could be interpreted as having failed the test. Indeed, the successful West European certification agencies typically give the seal of approval to hundreds of not-for-profits, which account for the major share of revenues of the third sector. These agencies make do with low operating costs (less than two million dollars for countries the size of Canada, Germany, the Netherlands, and Switzerland) and have been shown, theoretically and empirically, to work reasonably well.

If the present government is truly interested in the third sector becoming accountable and transparent, then looking into these models, rather than the untested Charity Navigator or GuideStar models, seems a much better way to go.

What to do with the ACNC? Even Kevin Andrews seems to admit that some sort of national registrar is needed. That is what the ACNC is already doing, and it seems to do it reasonably well. (It is certainly better than the ATO doing it since, as mentioned, the ATO would face an inherent conflict of interest.) Can, and should, the ACNC engage in collaborative education, training and development, as it already does? I am less sure of that, although especially smaller charities, which notoriously have difficulty dealing with the regulatory burden, might benefit from it.

In short, dismantling the ACNC strikes me as a silly undertaking at this point; redefining its purpose and challenging its performance in the basic provision of data that help us understand the third sector seems, in contrast, a reasonable way to proceed. Together with an effective and independent certification system for nationally operating charities and not-for-profits, fraud and various forms of misbehavior could be better controlled and trust in the third sector effectively increased. There is considerable scientific evidence out there in favor of some such solution, and it ought to be tapped.

There is a real danger that, instead of the not particularly persuasive current regulatory framework, one will be put in place that is even worse.

Now you see it, now you don’t: On the deepening crisis in evidence production, and evaluation, in the social sciences (Part II: Some proposals to address it)

Yesterday I stated my understanding of the problem.

So, what to do in light of the deepening crisis?

First, in a recent open letter published in “The Guardian”, more than 70 researchers have argued that scientific journals ought to allow pre-registered replications (and other studies). In fact, the journals “Attention, Perception & Psychophysics”, “Perspectives on Psychological Science”, and “Social Psychology” have already launched similar projects. The experiences so far seem promising.

Second, in the discussion of Ed Yong’s “Nature” news feature it was suggested (see the Lieberman and Hardwicke comments) that undergraduates ought to be enticed – maybe through a special journal for replication studies – to conduct replication studies. This seems an idea worth pursuing.

Third, all journals ought to insist that the data for the studies they publish be posted. This is the conclusion that Simonsohn has also come to, and it makes a lot of sense (“Just Post It: The Lesson from Two Cases of Fabricated Data Detected by Statistics Alone”). Specifically, data sets ought to be deposited with the journal in which the article is published. There is an interesting question of whether, and under what circumstances, the data ought to be made accessible to other researchers, especially if a study is ongoing, but that issue seems a minor and solvable one. Relying on the original authors to supply data, sometimes years after the fact, is bound to be a problem for a number of reasons (moves, deteriorating hardware and software, crashes, theft), all of which lead to availability attrition of data when journals do not create depositories for them.

Fourth, and relatedly, in his discussion of the Smeesters and Geraerts affairs (here), Richard Gill provides on two slides the “morals of the story”. He argues that data preparation and data analysis are an integral part of the experiment and that “keeping proper log-books of all steps of data preparation, manipulation, selection/exclusion of cases, makes the experiment reproducible.” Exploratory analyses and pilot studies ought to be fully reported, as should the complete data collection design (which, of course, should be written down in advance in detail and followed carefully). He also argues against the widespread division of labor in which younger co-authors do much of the data analysis. (I doubt that this latter point is implementable at this point; it is too entrenched a practice already. It seems better to identify who did what for a project.)

Fifth, replicability relies on detailed instructions and descriptions that allow everyone, everywhere, to try to replicate. That in some cases (such as Dijksterhuis’s) sufficient protocols do not exist and have to be generated years after a study was conducted seems highly problematic.

Sixth, Simmons, Nelson, and Simonsohn (following up on an earlier indictment of practices that facilitate false positives) have provided what they call a 21-word solution to the problem. Say they: “If you determined sample size in advance, say it. If you did not drop any variables, say it. If you did not drop any conditions, say it.” It is an interesting question whether some such statement would indeed lead to full transparency, but it seems a step in the right direction.

Seventh, meta-analyses ought to be conducted more often, in particular in economics, where they are still relatively rare. As Ferguson & Heene make clear (here), meta-studies are no panacea, but they force some discipline on evidence evaluation. To the extent that they would also be subjected to the “Just-Post-It” requirement, they are likely to help stabilize the evidence base.

Eighth, adversarial collaborations (e.g., http://www.youtube.com/watch?v=sW5sMgGo7dw; transcript provided under the video) are a way of getting away from the trench warfare that can be found in many areas of the social sciences these days. Rather than lobbing ever more confirming evidence for one’s own position at each other, the protagonists could agree on writing a joint article – possibly with a third, mutually agreed-on party moderating – that might help settle disputes. One of the nice aspects of some such way of collaborating is the much more balanced assessment of what the previous literature had to say.

Ninth, tournaments are a recent, and increasingly used, tool, as Leonidas Spiliopoulos and I have demonstrated here. In the most recent version of the paper (conditionally accepted at “Psychological Methods”), we argue that tournaments are much more widely applicable than we have so far seen.

Tenth, transparency indices. RetractionWatch has proposed some such index for journals (see here).

Eleventh, Deborah Mayo – intrigued by one of the final recommendations of the committee that investigated the Stapel affair – has argued on her blog that “the relevant basic principles of philosophy of science, methodology, ethics and statistics that enable the responsible practice of science” may well be taught by philosophy departments. Maybe so. It certainly seems desirable that a course addressing these issues be taught everywhere.

Any other ideas? Comments?

Class, discuss!


Now you see it, now you don’t: On the deepening crisis in evidence production, and evaluation, in the social sciences (Part I: Problem description)

This may be a crisis that you have not heard about. Or maybe you have heard about some of the more egregious recent exhibits such as Sanna or Smeesters or Stapel or, possibly, Geraerts and Dijksterhuis.

Sanna resigned from the University of Michigan in May 2012 after a University of North Carolina investigation of concerns raised by University of Pennsylvania researcher Uri Simonsohn apparently found those concerns justified. (See here for the full story and here for the article describing the statistical methods that led Simonsohn to his concerns.) While the investigation results, as often in such cases, were not made public, as of today Sanna has had 8 of his publications retracted [see here for the most recent RetractionWatch update on Sanna], presumably because of the odd data patterns that Simonsohn had ferreted out. (The reasons for Sanna’s resignation were not made public.) Smeesters was a marketing professor at the Rotterdam School of Management. The investigation committee of Erasmus University Rotterdam found enough problems in his studies for the university to ask in June 2012 for the retraction of two of his articles [see here for the most recent RetractionWatch update on Smeesters; see also this excellent ScienceInsider article on the case]. Also in 2012, after more than a decade of outright invention of data for literally dozens of papers, Diederik Stapel (recently profiled in a lengthy piece in the New York Times) finally had his overdue downfall [see here for the most recent RetractionWatch update on Stapel]. Geraerts (see here for a good discussion of that case) has denied that she faked data, but two of her co-authors have asked the editors of “Memory” to retract their names from the joint paper. The Dijksterhuis situation (for a somewhat sensationalist write-up see this newsbit in “Nature”) is different from the Stapel and Sanna cases (and possibly the Smeesters and Geraerts situations) in that Dijksterhuis has not been accused of misconduct and none of his papers had to be withdrawn, although several have come under heavy attack [see a prominent example here: Shanks et al. on the priming of intelligent behaviour being an elusive phenomenon; that article triggered a sometimes vitriolic debate in the wake of a news bit in “Nature”].

Fraud and misconduct in the social sciences are hardly novel phenomena; just google names like Karen Ruggiero and Marc Hauser, formerly of Harvard University. As in other walks of life, it is hard to guard against the deviance evidenced in the behaviour of Ruggiero, Hauser, Sanna, Stapel, Lichtenthaler, and their like. It is good news that people like them get ferreted out, seemingly at an increasing rate. (Of course, we do not know how many similar cases are out there, or what the growth rate of new cases is.)

Fraud and misconduct, however, are just the tip of the proverbial iceberg that the social sciences ship seems destined to collide with. Hiding the true dimension of the underwater parts of the iceberg is what the committee investigating the Stapel affair called a culture of verification bias. To wit,

“One of the most fundamental rules of scientific research is that an investigation must be designed in such a way that facts that might refute the research hypotheses are given at least an equal chance of emerging as do facts that confirm the research hypotheses. Violations of this fundamental rule, such as continuing to repeat an experiment until it works as desired, or excluding unwelcome experimental subjects or results, inevitably tend to confirm the researcher’s research hypotheses, and essentially render the hypotheses immune to the facts. Procedures had been used in the great majority of the investigated publications that lead to what is referred to here as verification bias.” (p. 48)
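The committee’s point about “continuing to repeat an experiment until it works” can be made concrete with a small simulation (a sketch in Python with numpy; the sample sizes and the number of “peeks” are invented for illustration). Even when there is no true effect at all, re-testing after each additional batch of subjects inflates the nominal 5 percent false-positive rate considerably:

```python
import math

import numpy as np

rng = np.random.default_rng(0)

def p_two_sample(a, b):
    """Two-sided z-test on the difference in means (sigma = 1 is known here,
    which keeps the illustration free of scipy)."""
    se = math.sqrt(1 / len(a) + 1 / len(b))
    z = (a.mean() - b.mean()) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def one_study(peek):
    """One 'study' under the null: there is NO true difference between groups.
    With peek=True, the experimenter re-tests after every extra batch of 10
    subjects per group and stops as soon as p < .05."""
    a, b = rng.normal(size=20), rng.normal(size=20)
    for _ in range(5):
        if peek and p_two_sample(a, b) < 0.05:
            return True
        a = np.concatenate([a, rng.normal(size=10)])
        b = np.concatenate([b, rng.normal(size=10)])
    return p_two_sample(a, b) < 0.05

n = 4000
honest = sum(one_study(peek=False) for _ in range(n)) / n
peeking = sum(one_study(peek=True) for _ in range(n)) / n
print(f"false-positive rate, single test at the end: {honest:.3f}")   # close to 0.05
print(f"false-positive rate, testing after each batch: {peeking:.3f}")  # much higher
```

The “peeking” experimenter reports a significant result in well over a tenth of all null studies while sincerely believing the 5 percent threshold was respected; that, in miniature, is how hypotheses become immune to the facts.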

The culture of verification bias leads to the publication of spurious positive results that often go unchallenged because editors and journal publishers are rarely open to replications (and negative results). Ed Yong’s excellent recent news feature for Nature reviews some of the relevant literature detailing this “pervasive bias” and quotes Ioannidis as saying that “most published research findings are false”, a claim he makes across the sciences, although the problem seems more severe in the softer ones. Fanelli (“PLoS ONE” 2010, abstract) concluded

“the odds of reporting a positive result were around 5 times higher among papers in the disciplines of Psychology and Psychiatry and Economics and Business compared to Space Science, 2.3 times higher in the domain of social sciences compared to the physical sciences, and 3.4 times higher in studies applying behavioural and social methodologies on people compared to physical and chemical studies on non-biological material.”

See also Fanelli’s related 2012 “Scientometrics” article in which he documents that negative results are disappearing from most disciplines and countries. (I will revisit this article in due course since Fanelli also has interesting things to say about the institutional drivers of these developments.)

Small studies with low power are one of the key issues feeding into these findings, as argued in a highly readable piece by Kate Button in “The Guardian” [the piece contains a link to an article by her and others, including Ioannidis], and as argued in an important Ferguson & Heene contribution to “Perspectives on Psychological Science”. These authors argue that small sample sizes undermine the reliability of neuroscience and of psychology, respectively. The situation is likely to be similar in (experimental) economics, where the median power of dictator studies (according to work in progress by one of my Ph.D. students, Le Zhang) seems to be less than 25 percent, meaning that if there are 100 true effects to be discovered, studies with 25 percent power can on average be expected to discover only 25 of them. (The flip side is that those small, low-powered studies that do discover a true effect are likely to overstate it.) Of course there are many other practices, some of them more consciously employed than others, that lead to the irreproducibility of results. This is true for all laboratory social sciences.
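The power arithmetic above, and its flip side, can be checked with a quick simulation (Python with numpy; the true effect size and sample size are hypothetical choices that yield roughly 25 percent power, not numbers from Le Zhang’s work in progress):

```python
import math

import numpy as np

rng = np.random.default_rng(1)

TRUE_EFFECT = 0.3   # hypothetical true mean difference (in SD units)
N_PER_GROUP = 37    # chosen so that power is roughly 25 percent
STUDIES = 10_000

se = math.sqrt(2 / N_PER_GROUP)  # standard error of the estimated difference
crit = 1.96 * se                 # two-sided 5 percent significance threshold

# Each study's estimate of the difference is Normal(TRUE_EFFECT, se).
estimates = rng.normal(TRUE_EFFECT, se, size=STUDIES)
significant = np.abs(estimates) > crit

power = significant.mean()
overstatement = estimates[significant].mean() / TRUE_EFFECT

print(f"share of true effects 'discovered': {power:.2f}")
print(f"mean significant estimate, relative to the true effect: {overstatement:.2f}")
```

Only about a quarter of the true effects get “discovered”, and the estimates that do clear the significance hurdle overstate the true effect substantially, by a factor approaching two in this parameterization: the winner’s curse of low power.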

One thing that is particularly troubling about the current controversy swirling around Dijksterhuis (and earlier the controversial studies by Bargh and Bem discussed in Yong’s already mentioned piece) is that experimental protocols are often severely underspecified. “The scientific enterprise rests on the ability of other researchers to replicate findings,” says Joel Cooper, reflecting on fraud, deceit and ethics in one of the victimized journals, and it is troublesome to find, in the discussion that followed Ed Yong’s news item in Nature, that many experimenters seem to have trouble with that concept.

The one good development is that these problems are now being discussed. Unfortunately, the fact that they are being discussed does not, as Gary Marcus seems to argue in his recent “New Yorker” piece, guarantee that effective solutions will be found; see also Bobbie Spellman’s optimistic assessment of the situation. The game called “stabilization of the evidence base” has many players with often diverging interests, and the institutional arrangements are not engineered by an omniscient social planner. It is quite questionable whether, in light of the ever-increasing competition among researchers for grant money, recognition, even fame, or research-only positions, science will be able to self-correct.

I shall summarize almost a dozen proposals in a sequel to this problem description.

For now, I am happy to hear comments on it.

Class, discuss!

Can’t We All Be More Like Scandinavians? (No, probably that’s not a good idea but wait … )

In a very recently published study, likely future Nobel Prize laureate Daron Acemoglu (who was awarded the John Bates Clark Medal, a leading indicator of Nobel Prize wins, in 2005) and a couple of well-known colleagues revisit the old question of the comparative advantages of cut-throat societies like the USA on the one hand and more European-style states (such as Norway, Sweden, Finland, Denmark, and presumably Germany) on the other, and provide an intriguing answer that ought to bear on what’s left of the public policy discourse in Australia, especially in the run-up to the federal election next year.

The basic story that Acemoglu and his colleagues formalize (based on stylized facts that I found persuasive, if somewhat incomplete) is this: inequality attracts more entrepreneurial activity, which leads to more innovation (as measured by patents); in essence this part of the story is illustrated by the USA. (And, yes, we can argue about causality here; can’t we always?)

Innovation, of course, moves the production frontier out. Importantly, innovation activity also creates considerable knowledge spillovers that allow others (“Scandinavians”) to free-ride on that knowledge creation and to design and implement “cuddlier” societies. It is, so Acemoglu et al. argue, quite possible that “Scandinavian” societies are better off by various measures.

Why then would not every country (want to) free-ride? Well, they might want to, but in order for others to free-ride some fool has to give them the opportunity. In other words, symmetric equilibria – where either every state is Scandinavian, or every state is non-Scandinavian – are dominated by an asymmetric equilibrium (or possibly several), and that is a good thing from a global point of view. It’s an intriguing idea, and I think it has a lot going for it. (I am also sure that those at the receiving end in the USA, arguably approximated by Romney’s 47 percent estimate, do not like the reality of it at all.)
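The equilibrium logic can be illustrated with a toy 2x2 game (the payoff numbers are invented, and this is merely a sketch of the free-riding intuition, not Acemoglu et al.’s model):

```python
# Payoffs (row, column) for a toy "innovate vs free-ride" game; the numbers are
# invented to match the story: free-riding on an innovator is best of all,
# innovating alongside a free-rider is still fine, and a world where nobody
# innovates is worst.
payoffs = {
    ("C", "C"): (3, 3),  # both cut-throat: duplicated effort, nobody free-rides
    ("C", "S"): (4, 5),  # innovator does fine; the "Scandinavian" does best
    ("S", "C"): (5, 4),
    ("S", "S"): (2, 2),  # nobody innovates: no spillovers to enjoy
}

def best_response(opponent_move, player):
    """Player 0 is the row player, player 1 the column player."""
    key = (lambda m: (m, opponent_move)) if player == 0 else (lambda m: (opponent_move, m))
    return max("CS", key=lambda m: payoffs[key(m)][player])

def nash_equilibria():
    return [(r, c) for r in "CS" for c in "CS"
            if best_response(c, 0) == r and best_response(r, 1) == c]

print(nash_equilibria())  # only the two asymmetric profiles survive
```

Neither symmetric profile is an equilibrium: if everyone is cut-throat, one country gains by switching to cuddly; if everyone is cuddly, some country gains by innovating. Both players, moreover, prefer either asymmetric outcome to both symmetric ones, which is the sense in which the asymmetric equilibria dominate.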

Clearly, the Acemoglu et al. paper has some bearing on the debate about growing inequality in Australia – since 1980, the top 1 percent has doubled, and the top 0.1 percent tripled, its share – as reflected in Swan’s recent song as well as the continuing political debate on inequality, evidenced very recently here (Joye in the AFR, on “Egalitarian distribution of income is destructive”) and here (Leigh, identified by Joye as Australia’s leading inequality expert, in response, also in the AFR, on “Take the test: which society do you prefer?”).

Leigh’s position, and Labor’s for that matter, is essentially that being more “Scandinavian” is a good thing because a) the spoils of the mineral resources boom ought to be spread more fairly, b) it is good for the social fabric, and c) if state revenues are invested properly (e.g., in education and in social services for those truly in need), they will enhance the growth and welfare prospects of the country more than cut-throat strategies are likely to do. The work by Acemoglu et al. seems to strengthen this latter argument considerably.

Of course, ultimately it is all about the right mix of equity and efficiency, although it is worth reminding ourselves that equity and efficiency are not necessarily substitutes. Acemoglu et al. argue that this insight does not apply only to organizational processes.

Adam Smith, revisited

The perception of Adam Smith, too often claimed by ignorant liberal market extremists as one of theirs (as the founder or father of modern capitalism, or laissez-faire economics, and so on), has undergone a significant change, at least among those who have actually read parts of his oeuvre, and especially among those who have read more than his classic “The Wealth of Nations” (1776). Behavioral economists, in particular, have taken to mining Smith’s earlier “Theory of Moral Sentiments” (1759). Unfortunately, many of them seem to have gotten out of it little more than some tantalizing quotation to decorate their articles on human foibles or social preferences. To the extent that Smith’s work on astronomy, methodology, rhetoric, languages, and jurisprudence continues to be ignored, it remains poorly understood and is likely to continue to serve as an inkblot test for political priors.

There is no excuse for that kind of cavalier appropriation of the history of economic thought. At least half a dozen biographies are out there, from Dugald Stewart (Smith’s first biographer of sorts, still extremely insightful), through Scott and Rae, to more recent ones such as the two editions of Ian Ross’s often celebrated attempt to write the ultimate (his)story of Smith’s life and work. I personally always found Stewart’s – as short, and short on biographic detail, as it is – the best primer. No longer.

Nicholas Phillipson recently published a brilliant attempt to reflect on Smith’s life and work as well as the circumstances in which they unfolded. It’s a veritable tour de force but one worth every minute it takes to read it. Indeed it is a must-read for everyone who is interested in what Smith really said.

What makes Phillipson’s book so truly outstanding, and indeed astounding, is his knowledge of the place and times in which Smith grew up. Phillipson had earlier written a similar biography of David Hume (first edition 1989; a new and revised edition was published recently), Smith’s best intellectual buddy, but it is Phillipson’s detailed knowledge of the culture and society of eighteenth-century Edinburgh and Scotland, and England for that matter, that contributes to the insights laid out in this book.

Says Phillipson, “I wanted to write about Smith’s life and works in a way which would throw light on the development of an extraordinary mind and an extraordinarily approachable philosophy at a remarkable moment in the history of Scotland and of the Enlightenment.” (xiii) He succeeds brilliantly in that undertaking, often relying on Smith’s own device of engaging in conjectural history, i.e., telling the most likely story based on the facts that he can muster (of which there are many; Phillipson has done his homework well).

Phillipson refers to the standard sources – e.g., Stewart, Scott, Rae, Ross (the first edition), and the editors of the Glasgow/Liberty editions of Smith’s oeuvre – but he goes significantly beyond (selective) reliance on standard sources. He connects, for example, the geography of the key places Smith lived in with the political and social circumstances in which he lived, and then with the emergence of Smith’s ideas about our capacity for sociability.

The Oxford period, generally neglected by commentators, is here given prominence in Smith’s intellectual development. Phillipson urges us to reconsider the way French literature might have shaped Smith’s moral philosophy. He also convincingly shows that Smith’s first philosophical investigations on the origin and evolution of language and jurisprudence, which together became the basis for his entire system, were attempts to become a “perfect Humean” (71) in providing more systematic accounts of topics Hume had neglected. Phillipson also highlights Smith’s effort in reconstructing his Lectures on Jurisprudence during his last year at Glasgow University “so as to bring questions about the duties of government to the fore”, as if he was already preparing the ground for The Wealth of Nations (172-173, 175).

The picture of Smith’s character that emerges in Phillipson’s thirteen chronologically ordered chapters is that of a precocious, doted-on only child and adolescent who was brought up by a pious mother whom he adored, who was extraordinarily deeply and widely read, who was inspired to a significant degree by Francis Hutcheson and David Hume, who wrote beautifully yet carefully (guided by his early thinking about Rhetoric and Belles Lettres), who early in his life lectured in rudimentary form about the big themes that he was to write about in his key published works, who spent thirteen years – during which he wrote his highly influential The Theory of Moral Sentiments (TMS) and which he called “the happiest and most honourable period of my life” (268) – as philosophy professor at Glasgow, who then resigned to accompany a young Duke as a tutor while traveling Europe (especially Paris), who – after a few months in London – spent a decade in Kirkcaldy preparing what was long perceived to be his opus magnum, The Wealth of Nations, who then spent another decade as influential Commissioner of Customs while revising his published works and trying to make progress on others, who was well-connected – indeed “born into the middling ranks of Scottish society” (8) – and socially savvy (in that he understood well what might be too offensive), and who throughout his life was noted for his extraordinary memory as well as his considerable absent-mindedness and social awkwardness (the latter being particularly contextual).

There is little new in the basic story line presented above, even though one might be surprised to learn that during his professorship in Glasgow Smith was a “cult figure” for students who could buy his portrait bust at local bookshops and a “guru” for merchants turned free traders by his influence (136).

What is new, and what to some extent leads to a revision of the caricature of Smith as a mere scholar, is evidence compiled by Phillipson that shows that Smith, notwithstanding his well-documented and frequently mentioned absent-mindedness and social awkwardness, was very much a man of the world. That was reflected in his membership in multiple clubs as well as his being

“a serious university librarian, acquiring stocks of classical literature, contemporary history, philosophy, law and, interestingly, commerce. … By 1754 Smith had also gained a reputation for property management. … By the late 1750s he was in charge of the university’s accounts and the university’s dealings with the town council on property matters and the students’ tax liability. … By the late 1750s seniority and competence had established him as one of the most powerful and heavily worked members of the College. He was Quaestor from 1758 to 1760, Dean of the Faculty – twice –  from 1760 to 1762 and Vice-Rector from 1762 to 1764. … by the end of his professorial career he had also been drawn into the thick of the complicated and often acrimonious political life of the College.” (131)

Similarly, Phillipson argues that Smith, after Townsend’s death, was importantly involved in reviving the Buccleuch estates (see 202-204), being quite possibly instrumental in devising an intriguing incentive-compatible scheme meant to encourage agricultural improvements (204), and quite possibly guiding Buccleuch through treacherous financial waters when the Ayr Bank (of which Buccleuch was one of the founders and capital guarantors) crashed in 1772 and left the Buccleuch estates seriously encumbered for seven decades (206-207). Moreover, as soon as Smith came back from France in 1766, his advice was sought by top political figures.

“He was able to move in political circles at a time when the future of Anglo-American relations, the role of the East India Company in the government of India and public finance and taxation were under discussion, all matters of importance to the Wealth of Nations.” (201)

When he returned to London in 1773 – “In the spring of 1773 Smith decided to end his Kirkcaldy retreat and to finish the Wealth of Nations (WN) in the capital. He needed company and American news.” (209) – things were not any different: “The three years Smith spent in London … were notably sociable … .” (210) It was, “[h]owever, the American question that appears to have absorbed most of his energies … .” (211)

Phillipson provides considerable detail about the immense workload that Smith was burdened with when he became Commissioner of Customs (255-268). In fact, Phillipson argues that this appointment was “a misjudgement of historic proportions” (209) on Buccleuch’s part: “The Commissionership of Customs was certainly honourable and lucrative, but it proved to be time consuming and wearisome and was to leave Smith constantly bewailing the lack of time for pursuing his many philosophical projects.” (209)

In sum, Phillipson’s book is a very fine read indeed. In my view it is the most insightful book yet on Smith’s life and work; it will be hard to match. It is a must-read for Smith scholars, and should be of considerable interest to others, not only economists. Even though it came out only a few months ago, it has been reviewed widely and – mostly – to overwhelming acclaim. Detractors tend to be people who found it too challenging and too academic (see some of the reviewers on Amazon).

I have written, together with Benoit Walraevens, a (long) review on which especially the second part of this blog entry draws heavily; the review has been published in History of Economic Ideas. If you are interested in reading our review in all its glory but have trouble accessing a copy, please send a request to a.ortmann ( at ) unsw (dot) edu (dot) au.

Ig(noble)Nobels 2012, and beyond …

Last week, in another fun-filled ceremony at Harvard University, this year’s IgNobels were awarded for research that seems particularly ignoble. This spoof has become a cult event of note; a report on the festivities and a succinct summary of this year’s ten award winners may be found here.

Some of the Ig(noble) Nobels – a prize for “discoveries that cannot, or should not, be reproduced” – seem richly deserved. But I do wonder about, and find undeserved, two of them.

First, the prize for neuroscience, which seems the one most meaningful to economists, went to C. Bennett, A. Baird, M. Miller, and G. Wolford [USA], for demonstrating that, by using complicated instruments and overly simplistic statistics, one can see meaningful brain activity everywhere – even in a dead salmon. The IgNobel seems undeserved because this research addresses – tongue in cheek but rather effectively – major problems in behavioural research (underpowered studies and other questionable statistics). These caveats are not new (e.g., here and here) but warrant repeated reminders.

Second, the prize for fluid dynamics went to H.C. Mayer and R. Krechetnikov for their systematic exploration of why it is so difficult to walk with a cup of coffee without spilling it. Apparently, particularities of common cup sizes, coffee properties (its viscosity), and the biomechanics of a walking individual combine to contribute to the occasionally ugly consequences of such mishaps: “While walking appears to be a periodic, regular process, closer examination reveals fluctuations in the gait pattern, even under steady conditions. Together with other natural factors – uneven floors, distractions during walking, etc. – this explains why the cup motion during the constant walking speed regime is composed of noise and smooth oscillations of constant amplitude.” (p. 3) The authors draw on lessons from sloshing engineering for preventive measures such as concentric rings (baffles) arranged around the inner wall of a mug, possibly – for better damping – perforated. While this strikes me as a good start, plenty of further explorations seem in order. (In work in progress, I am currently conducting controlled experiments on the effects of putting a lid on a coffee mug. I will make sure that CE readers will be the first to learn about my study’s results. Pilot sessions conducted so far have been promising.)

In closing I note that no IgNobel prize for economics has been awarded in years; I hence nominate for 2013 this recent study by Attema and colleagues: “Your Right Arm for a Publication in AER?”. The authors use the time tradeoff method popular in medical decision making to elicit economists’ preferences for publishing in top economics journals versus living without limbs. The American Economic Review (AER) turns out to be preferred to the QJE, which outranks the RES, which outranks the EER. The (relatively few) responses allegedly imply that respondents would sacrifice more than half a thumb for an AER publication.

I submit that what needs to be said about this study is succinctly summarized in the commentary reported in fn 1. A worthy contender for the 2013 IgNobel sweepstakes it is.

Unskilled and unaware of it?

In a widely cited, and provocatively titled, 1999 article, now approaching 1,400 citations on Google Scholar, Kruger and Dunning seemed to provide evidence that “difficulties in recognizing one’s own incompetence lead to inflated self-assessments”. In other words, the less skilled (that is what the authors really meant to say, alas a provocative title tends to sell) were argued, on average, to be more unaware of the absolute and relative quality of their performance. In fact, the less skilled people were, the more they seemed “miscalibrated”.

In an earlier article, Krajc and yours truly provided in response a simple model and some exploratory computational exercises suggesting that the less skilled may simply face a more complicated signal extraction problem. Our argument hinged on the distribution of skills in the environments that were typically studied being highly asymmetric, often resembling J-shaped distributions. Simply put, it is easy for the A++ student to figure out where s/he stands but much more difficult for those ranked towards the bottom of the class.
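A toy simulation conveys the signal-extraction intuition (Python with numpy; the Beta(5,1) skill distribution and the noise level are invented for illustration, and this is a sketch, not the Krajc & Ortmann model itself). Everyone processes a noisy signal of their own skill rationally, yet in a J-shaped environment the less skilled overestimate their standing and the most skilled underestimate theirs:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200_000
NOISE = 0.15  # sd of the error in each person's self-signal (invented)

# J-shaped skill distribution: most people bunched near the top, as in the
# classroom environments discussed above.
skill = rng.beta(5, 1, size=N)
true_pct = skill.argsort().argsort() / N       # true percentile rank
signal = skill + rng.normal(0, NOISE, size=N)  # noisy self-observation

# Rational self-assessment: your best guess of your percentile is the average
# true percentile of everyone whose signal looks like yours (100 signal bins).
edges = np.quantile(signal, np.linspace(0, 1, 101)[1:-1])
bins = np.digitize(signal, edges)
estimated_pct = np.empty(N)
for b in range(100):
    mask = bins == b
    estimated_pct[mask] = true_pct[mask].mean()

def quartile_bias(lo, hi):
    mask = (true_pct >= lo) & (true_pct < hi)
    return (estimated_pct[mask] - true_pct[mask]).mean()

bottom_bias = quartile_bias(0.00, 0.25)
top_bias = quartile_bias(0.75, 1.01)
print(f"bottom quartile, mean self-assessment bias: {bottom_bias:+.3f}")  # positive
print(f"top quartile,    mean self-assessment bias: {top_bias:+.3f}")     # negative
```

No self-deception is involved anywhere; the miscalibration pattern emerges from optimal inference under noisy signals alone, which is why miscalibration alone cannot establish that the unskilled are “unaware”.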

In an article just published, Ryvkin, Krajc and yours truly provide evidence in favor of a conjecture formulated in that earlier article: that with fairly little feedback, self-assessment biases can be overcome. There was certainly a distinct literature on calibration that suggested as much (e.g., Juslin, Winman, & Olsson 2000 or Koehler 1996 – see here and here). There was, however, also some evidence that suggested otherwise (reviewed in our just-published article).

We hence set out to study whether, and to what extent, the difficulties of the less skilled in recognizing their own incompetence (largely, overconfidence) can be reduced by feedback. We report the results of two studies, one in the natural setting of a two-month graduate orientation and screening semester (and there particularly the micro-economics course, taught by instructors not part of the research team), and another in the same environment but using tasks and stimulus materials that were better under our control. We initially document the same strong miscalibration that Kruger and Dunning documented, but we also show that over the course of the two months this miscalibration almost completely disappears – with the notable exception of those at the very bottom of the skills distribution, and there only for their relative self-assessment, which might provide support for the conjecture of Krajc & Ortmann (JoEP 2008; for reference see above) or for self-image concerns (e.g., Koeszegi 2006).

So, are the less skilled doomed to be unaware? It seems not, really. Learning goes a long way, which should make us happy. But, of course, it all depends (on the strength and type of feedback, for example).

MOOConomics

Inside Higher Ed, an online publication that features detailed analyses of developments in higher education in the USA (including a number of interesting reads on the developments at the University of Virginia – here and here – and the fundamental questions they prompt about the governance of higher education institutions there and elsewhere; and, yes, the introduction of MOOCs has to do with governance), published last week an essay by Carlo Salerno on “The real economics of massive online courses” (since retitled “Bitter Reality of MOOConomics”). Salerno, a director with Xerox Education Services Group, claims, as do others, that MOOCs (massive open online courses) are not sustainable.

Calling the Massive Open Online Course movement (instigated by MITx and by Stanford/Coursera) a craze, he makes two points.

First, “The overwhelming majority of college-goers today don’t enroll in higher education to get an education as much as they seek to earn a credential that they can successfully leverage in a labor market. Surely, the former is supposed to beget the latter, but it’s a hurdle that’s easily, and often, leaped.” In other words, the times when education was a costly signal in terms of effort are gone. This assessment seems to support the introduction of MOOCs, although it seems far from clear what the value of such a credential could possibly be.

Second, students – by way of well-documented peer effects – are an important input to education: “For individual institutions, obtaining high quality inputs works to optimize the school’s objective function, which is maximizing prestige.” So, if institutions care about their prestige, “colleges have a strong incentive to protect, or control, the quality of the degrees that they confer because successful graduates directly affect the institution’s prestige and the public’s perceptions about the value of its products.” Selectivity, in other words, matters – which essentially guts the promise of MOOCs all by itself.

Or does it?

In the end it is quite possibly all about the relative positioning of higher education institutions. With about one third of institutions already in dire financial straits and more headed for financial unsustainability – at least in the USA (see here and links therein) – and about the same percentage of institutions financially strained world-wide, we are in the middle of a world-wide race to the bottom that will intensify the pressure to offer online courses, or at least to make them part and parcel of the course offerings.

I doubt MOOCs will really be an option for those institutions that like to think of themselves as being at the top of the prestige pyramid, as Stephen King also seems to argue (see here, in particular the last paragraph). That’s because, in the end, even the most clever and accomplished teaching videos cannot compensate for the interactive learning that happens between teacher and student and among students. This form of interaction provides by far the most effective and efficient feedback, which is what ultimately drives high-quality learning.

The behavioral economics of teacher incentives. (Maybe, maybe not.)

Good teachers matter indeed.

Some teachers are born that way; they just have the natural ability that it takes. Others have to work, and possibly work hard, on becoming at least halfway decent ones. That often requires considerable effort that may, or may not, be elicited through teacher incentives. Not surprisingly, incentivizing teachers – and measuring teaching outcomes – has been on the agenda for a while but remains a bone of contention.

A recent working paper by Italian researchers suggests strongly that teaching evaluations, for example, are not a promising avenue, for the simple reason that they can be, and apparently are, gamed. The authors show persuasively that teaching evaluations are not only a poor measure of effectiveness but may in fact measure the opposite. No real surprise there, except maybe for administrators who are convinced that everything can be KPI’d. Well, the paper by Braga et al. should make them think again.

While it is difficult enough to determine good teachers ex post, it is even more problematic to do so ex ante. Say the authors of another new working paper:
“Observable characteristics such as college-entrance test scores, grade-point averages, or major choice are not highly correlated with teacher value-added on standardized test scores … . And, programs that aim to make teachers more effective have shown little impact on teacher quality … . To increase teacher productivity, there is growing enthusiasm among policy makers for initiatives that tie teacher incentives to the achievement of their students.” (Fryer et al. 2012, p. 1) Apparently, in the USA at least ten states and many school districts have implemented various teacher incentive programs.

In “Enhancing the efficacy of teacher incentives through loss aversion: A field experiment,” Fryer et al. (2012) ride an old workhorse of behavioral economics – loss aversion – for some additional mileage. Loss aversion is the idea that people – somewhat irrationally – cling to something that is theirs when on average they should not. The idea of loss aversion is closely tied to the various “endowment effects” figuring prominently in the behavioral economics literature and is also a key ingredient of prospect theory.
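To see how the two frames can differ behaviorally even when final pay is identical, here is the textbook prospect-theory calculation (Python; the parameters are Tversky & Kahneman’s 1992 estimates, and the dollar amount is invented for illustration, not Fryer et al.’s actual bonus schedule):

```python
# Prospect-theory value of the SAME bonus shortfall under the two frames, using
# Tversky & Kahneman's (1992) estimated parameters. The $4,000 figure is
# invented for illustration; it is not Fryer et al.'s actual bonus schedule.
ALPHA, LAM = 0.88, 2.25

def v(x):
    """Kahneman-Tversky value function: concave for gains, steeper for losses."""
    return x ** ALPHA if x >= 0 else -LAM * (-x) ** ALPHA

# "Gain" frame: missing the target means forgoing a $4,000 year-end bonus.
gain_frame = v(4000)
# "Loss" frame: missing the target means returning $4,000 of money already in
# hand (the reference point has shifted to owning the upfront payment).
loss_frame = v(-4000)

print(f"psychological stake, gain frame: {gain_frame:+.0f}")
print(f"psychological stake, loss frame: {loss_frame:+.0f}")
# The loss looms roughly 2.25 times larger: the mechanism Fryer et al. invoke.
```

The monetary consequences of hitting or missing the target are identical across frames; only the reference point differs, and that is what is supposed to do all the work.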

Arguing that there is “overwhelming laboratory evidence for loss aversion” (p. 18, see also p. 2 for a similar statement) but little from the field, the authors report the results of a field experiment that they undertook during the 2010-2011 school year in nine schools in Chicago Heights, IL, USA. They randomly picked a set of teachers for participation in a pay-for-performance program – 150 of 160 eligible teachers chose to participate – and then randomly assigned these to one of two treatments. In the “Gain” treatment, participants were given at the end of the school year bonuses linked to student achievement. In the “Loss” treatment, participants were given at the beginning of the school year a lump sum payment (parts of) which they had to return if their students did not meet performance targets. Teachers with the same performance received the same final bonus independent of the frame.

The result: those in the Loss treatment managed to increase student math test scores significantly indeed (“equivalent to increasing teacher quality by more than one standard deviation”) while those in the Gain treatment did not. The authors attribute this to the power of the “Loss” frame; essentially they argue that teachers who were paid upfront, and threatened with repossession if they failed to make the grade, were better incentivized because they were loss averse.

I am rather skeptical about these results and doubt that they will be confirmed in (large-scale) replications, or for that matter in field applications. What makes me skeptical is that, for starters, the alleged laboratory evidence in favor of loss aversion is much less overwhelming than the authors make it out to be. For example, work by Plott & Zeiler in The American Economic Review 2005 (here), 2007 (here), and 2011 (see here, and here for a user-friendly blog entry based on their 2005 article that led to the 2011 controversy) has seriously questioned the reality of the endowment effect, as well as the related asymmetry between willingness to accept and willingness to pay, and hence the underlying idea of loss aversion. Ironically, one of the co-authors (List) has also made his name with artefactual experiments that seem to demonstrate that only inexperienced consumers are likely to fall for endowment effects (e.g., this 2004 Econometrica piece).

I am also skeptical about these results because – while other explanations are argued to be unlikely (which, especially regarding cheating, seems debatable) – the most obvious explanation to my mind is not discussed: Hawthorne, Pygmalion, placebo, and other expectancy effects (e.g., here; see also a recent piece by two of the present authors). Rather than being loss averse, those in the Loss treatment may simply have been averse to the shame of not making the grade, an effect most likely enhanced significantly by their knowing that they were closely watched by scientists.

Last but not least there is, of course, the question whether the one-off effort, if indeed it exists to some extent, could be extracted year after year after year. Maybe, maybe not.
