Ethical failures: Where they come from and how to address them

A review of

Gentilin, Dennis. The Origins of Ethical Failures: Lessons for Leaders. A Gower Book. Routledge (2016). ISBN 978-1-138-69051-6

Ethical failures were in the press big-time in 2017. Prominently, creeps like Harvey Weinstein, James Toback, Bill Cosby, Larry Nassar, etc. were accused of sexual transgressions of various sorts (and in some cases admitted them to varying degrees). The sheer number of accusations leaves little doubt that, in their substance, they are correct. One thing that was truly shocking, on top of the specifics of many of the allegations, was that some of these transgressions went on for literally decades, that many people seem to have known about them for years (if not decades), and that the perpetrators got away with them for an unconscionably long time. It is clear that organizational failures must have played a major role. This was implicitly acknowledged in the name of the Royal Commission (RC) into Institutional Responses to Child Sexual Abuse, established under the Gillard government in 2013, which reported all 17 volumes of its findings on December 15, 2017. The RC also laid out recommendations.

It did not really come as a surprise that once again massive organizational failure, in particular of the Catholic Church, was identified as a major finding. It did not come as a surprise because for years there had been a never-ending stream of trials, not just in Australia, suggesting just that, and providing plenty of evidence that the Catholic Church – in its (continued) belief that it is a law and world unto itself — had engaged for decades in what might generously be called economy with the truth.

Two weeks earlier, after another year of numerous reports of questionable practices, and record profits of the four major banks, the Turnbull government saw itself forced – by its own backbenchers, no less – to announce that it would establish an RC into misconduct in the banking industry. It was a step that Labor and the Greens had urged for more than a year. (The recent draft report of the Productivity Commission has made clear that such an RC is indeed overdue.) The Turnbull government’s acceptance of something that it could not prevent, and its subsequent attempts to undermine the effectiveness of the RC by simultaneously widening its scope and imposing an essentially unrealistic timeline, demonstrate, at the minimum, the kind of myopic opportunism that Australian politics seems drenched in.

Having graduated in 2001, Gentilin became a member of the FX trading desk of the National Australia Bank (NAB), one of the four major banks. In 2004 that trading desk became involved in a trading scandal that rocked NAB and led, within a couple of weeks, to the resignation of both its chairman and CEO, the reconfiguration of the board of directors, and significant financial and reputational losses. Gentilin was the young trader who blew the whistle. Unlike many other whistleblowers (who are typically harassed out of the organizations on which they blew the whistle), he stayed with NAB for more than a decade – as head of the institutional sales team and a member of the corporate strategy team – before he resigned in January 2016 to found Human Systems Advisory, a name meant to be programmatic. The foreword of his book was written by the current chairman of NAB, who states: “There are no simple answers in this book. But there are answers. And there are important truths, supported by deep and rigorous analysis. These should be of interest to all corporate leaders, in both executive and non-executive roles.” (p. xvi). One such truth, says the chairman – apparently quoting Gentilin – is that “leaders must strive to articulate a meaningful social purpose for their organizations that is underpinned by a virtuous set of values.” That’s quite a mouthful, and the impending Royal Commission on the banking system suggests strongly that the major banks (which at first tried to fight off the RC until they realized that fight had been lost) have continuing trouble understanding that particular message, as does the recent draft of the related Productivity Commission report.

Below, I am interested in both the depth and rigor of the analysis and the truths that Gentilin establishes. I am also interested in the implementability of the measures that he proposes.

In his Introduction, Gentilin states that he draws his evidence from “behavioural business ethics” which he defines as the intersection of business ethics and psychology (p. 5). While he is credited on his website with a degree in psychology, Gentilin makes clear that he wrote this book as a “practitioner” rather than “an academic, a philosopher or an ethicist” (p. 4). He does so in four chapters that explore “The Power of Context”, “Group Dynamics”, “Our Flawed Humanity”, and “What We Fail to See”. A conclusion follows.

Gentilin relies heavily on summaries of articles from psychology that explore human nature and the circumstances under which nice behaviour might turn into, well, not so nice behaviour of various shades. While there is a brief perfunctory nod (p. 3) to the replicability crisis that has afflicted psychology, throughout the book there is little discussion of relevant laboratory design and implementation issues such as incentivisation, experimenter expectancy effects, external validity, and so on (Hertwig & Ortmann 2001; Ortmann 2005). Never mind the fact that much of the evidence on unethical behaviour paraded in this book has been produced with deceptive practices, arguably an unethical practice itself (Ortmann & Hertwig 2002; Hertwig & Ortmann 2008). There is no discussion of statistical issues such as (lack of) power computations, p-hacking, publication biases, and whatnot here either.

Claiming that “explanations of unethical conduct rarely give proper consideration to the system within which people operate … (and) tend to focus on identifying ‘bad apples’ or ‘rogues’” (p. 7), Gentilin explores in Chapter 1 how the environment can impact human (mis)behaviour and, on balance, concludes that “the ‘barrel’ within which the ‘bad apples’ operate must be given as much (if not more) attention as the ‘bad apples’ themselves.” (p. 8). Before he reviews the lessons to be learned from the Stanford Prison Experiment, Gentilin reviews the literature on social norms and how they affect behaviour. The well-known Cialdini et al. littering and Mazar et al. (dis)honesty studies are paraded, as is an interesting lab study by MacNeil & Sherif (1976) in which the authors demonstrate the generational transfer of (questionable) practices, and a related field study by Pierce & Snyder (2008). Distinguishing between descriptive (“derived from what is”) and injunctive (“derived from what ought to be”) norms, Gentilin documents cases where unethical descriptive norms tear injunctive ones to smithereens. He relates this to his reading of what led to the FX trading scandal at the NAB: “young people in particular are vulnerable to endorsing immoral social norms … In the FX trading scandal that engulfed the NAB, immoral social norms emerged that promoted excessive risk taking and misstating the true value of the currency options portfolio.” (pp. 18-19). This is hardly surprising, and indeed Gentilin mentions the LIBOR rate-fixing scandal and drug-taking in professional cycling as other high-visibility examples. He could also have mentioned the lending practices of major US banks before the housing and mortgage crises (e.g., Gjerstad & Smith 2014), the despicable transgressions at Abu Ghraib, or zillions of other real-world examples. After having reviewed the Stanford Prison Experiment in some detail, Gentilin identifies two important take-home lessons from it: first, a specific context “can cause people of sound character to behave in totally uncharacteristic and inappropriate ways.” (p. 24) and, second, the emergence of such contexts is possible only when leaders allow it. Drawing on more experimental evidence (such as Bandura’s experiments on children imitating adults’ behaviour), he suggests the obvious parallel for what happened at NAB: “Just as the adults were the role models in Bandura’s experiments, leaders that control the bases of power are the role models in large organizations. For these leaders there will inevitably appear some key moments where, through their actions, choices and decisions, they will send powerful messages that shape the ethical climate for their organizations and types of social norms that emerge. … how a leader responds in these ‘defining moments’ shapes the ‘character of their companies’.” (p. 30). Only leaders who are veritable role models will be able to prevent formal mechanisms from being eroded by the informal mechanisms that hammer away at them. Again, Gentilin suggests that such failure of leadership is what happened at NAB and at Barclays Bank during the LIBOR rate-fixing scandal, and for that matter in the phone-hacking scandal that led to the demise of the News of the World. Gentilin concludes the chapter with a list of “ten questions for senior leaders within any organization” (pp. 37-38). Presumably, these questions are unlikely to be answered in an honest manner where it matters. It is the evidence accumulated in this chapter but also elsewhere (Dana et al. 2007 comes to mind, or Miller & Ross 1976) that suggests as much.

Gentilin starts off Chapter 2 with a Nietzsche quotation that sets the stage: “Madness is the exception in individuals but the rule in groups.” (p. 45). The basic point made is that group membership can reinforce – cue social media echo chambers – the drifting away from injunctive norms towards descriptive ones. He writes: “In my experience at the NAB, dysfunctional group dynamics in the currency options business played a significant role in promoting the emergence and maintenance of immoral social norms and unethical behaviour [such as flagrant and persistent limit breaches or excessive risk taking, AO]”. To buttress the case, Gentilin presents Milgram’s 1974 obedience studies, as well as Gina Perry’s recent critique of them (Perry 2012) which, in light of considerable evidence supporting the original studies (e.g., Haslam et al. 2014), he dismisses in its substance. He then highlights what we learn from Milgram’s inclusion of a variation that drew on the group paradigm. That motivates a discussion of the conformity experiments through which Asch (1956) tried to identify the conditions under which participants would contradict a majority. In this context, Gentilin also briefly discusses a between-subjects study by Woodzicka & LaFrance (2001), who had a male interviewer ask female applicants inappropriate questions. The basic result was that 6 out of 10 subjects claimed (hypothetically) that they would object, but none of the subjects who faced the questions in the “real-life” scenario refused to answer. That seems the kind of pattern that allowed the Weinsteins of this world to get their way for too long. Only in the case of Weinstein and similar assholes (here used in the technical sense of Sutton 2007), the stakes were arguably considerably higher. People’s lack of willingness to stand up and be counted is, unfortunately, so widespread that it is well documented, and it is a recurrent theme of great movies such as Hidden Figures. Gentilin makes clear that, based on his experience at NAB, “facing the fork in the road in a hypothetical scenario is vastly different from facing it in reality.” (p. 67) He also states, “I am personally sceptical of other research into whistleblowing that focuses on ascertaining the types of personality or dispositional characteristics that may predict whether an observer of wrongdoing will take action and report it. … This line of enquiry fails to properly consider the power of the situation.” (p. 67). Gentilin concludes the chapter with another list of “ten questions for senior leaders (and followers) within any organization” (p. 73). I doubt that these questions will be answered in an honest manner where it matters, for essentially the very reasons that Gentilin identifies in the chapter.

In Chapter 3, Gentilin – notwithstanding his, in my considered opinion, sensible stand on the relative importance of context and dispositional characteristics – dives into “our flawed humanity”. Programmatically, he starts with an epigraph featuring a quotation from Kant: “Out of the crooked timber of humanity, no straight thing has ever been made.” (p. 80). Gentilin then tries to answer questions such as “Are Humans Self-Interested?”, cursorily sampling evidence from experimental economics, neuroscience, and evolutionary biology. Predictably, he concludes that this research shows “human nature (is) far different from the one suggested by the axiom of self-interest” (p. 86), though he qualifies the statement with the caveat that we are not always altruistic and cooperative. This alleged “paradigm shift” (p. 87) is, unfortunately, the major bone of contention between those marketing Behavioural Economics (and often shamelessly benefitting from it) and those doing Experimental Economics, and I believe that the social-preferences literature that has created it has as much merit as the intranasal oxytocin, ego depletion, and power-pose research that is now, for all I can see, thoroughly debunked. Better not to plan your life, or organization, on such flimsy evidence. From an evidence point of view, and also a theory point of view (e.g., the important insights stemming from repeated game situations), this chapter is the weakest. Gentilin’s sampling of the evidence strikes me as scattershot and unsystematic. After discussions of issues such as power and its corrupting influence, and fear and the awareness of our own mortality that feeds into it, Gentilin concludes the chapter with a list of “eleven questions for senior leaders within any organization” (p. 118). I fear these questions, again, are unlikely to be answered in an honest manner where it matters.

In Chapter 4, Gentilin starts with a quotation from Kahneman’s best-seller Thinking, Fast and Slow: “We can be blind to the obvious, and we can also be blind to our blindness.” This double-whammy – a variant of the Dunning-Kruger effect – is why questions to senior leaders are unlikely to be answered honestly and self-critically. After a brief mention of another persistent bone of contention – the System 1 / System 2 delineation – and our alleged propensity to rely too much on the automatic System 1, which makes us, presumably, liable to various biases (in this chapter: loss aversion, framing, overconfidence, moral disengagement, euphemistic labelling), Gentilin lays out the slippery-slope argument that in his view was at the heart of the events that led to the NAB trading scandal: “The FX trading incident at the NAB classically illustrated the slippery slope in action. Not only did ethical standards erode over time, but the seriousness of the ethical transgressions accelerated …” (p. 130). Laboratory evidence is provided to make that point (e.g., the interesting Gino & Bazerman 2008 study), along with field evidence from the NAB case (p. 131). An intervention discussed here is to give people more time and essentially get them to break out of their System 1 mode: “There are now numerous studies that illustrate how providing a person with more time whenever they are confronted with an ethical dilemma tends to lead to a more virtuous decision being made.” (pp. 146-7). I have serious doubts about the relevance of, say, the Good Samaritan study mentioned here for real-world decision making, and suspect that a theoretical grounding in organizational economics and repeated game theory would really help to address the challenges that organizations and their leaders face.

Gentilin concludes his book with a plea for more (business ethics) education, a call for the installation of Chief Ethics Officers, and more Lessons for Leaders. He wants business schools to challenge their students intellectually, emotionally, and spiritually. That sounds like something straight out of the high-gloss advertisements such schools produce. The reality of Australian business schools (and undoubtedly of business schools everywhere), however, is that they are rarely intellectually demanding. Their inability to challenge their students emotionally and spiritually is shown effectively by their treatment of casuals and staff. What business schools typically lack, in particular, are truly independent ethics officers and HR departments that could hold the feet of the currently widely unaccountable senior leadership to the fire. So, while the idea of a Chief Ethics Officer who has “a genuine ‘seat at the table’” (p. 161), is independent, is able to freely raise matters of concern, and is able to freely “speak truth to power” (p. 161) is conceptually on the money, realistically it is very unlikely to be implemented any time soon, as are truly independent HR departments. As to Lessons for Leaders, Gentilin wants leaders to be virtuous in the sense of having some community-oriented values. There is a lot of wishful thinking on display here (e.g., that others are willing to take the same risks that he took in 2004), but I think, after everything we learned through the flurry of recent examples mentioned at the beginning of this review, there is not much reason for hope. Even something that should have been uncontroversial, such as the Royal Commission on banking, and the way it came about, demonstrates that common ground is hard to find and cannot be relied on. I fear much harder thinking will be needed to address ethical failures, and I suspect some strategies will be of the innovative kind provided by the #MeToo campaign, which not only has brought down some true monsters but is likely to have changed power and gender relations in the working world irreversibly.

In summary then, Gentilin tackles arguably the most important issue of our times – ethical failures within organizations and, for that matter, ethical failures more generally. His book is strongest where he illustrates the emergence of his insights with examples from his own NAB 2004 experience. His illustration of the various arguments he makes with evidence from behavioural business ethics is wanting. As pointed out above, to his credit Gentilin himself – although unaware of important methodological debates among psychologists as well as between psychologists and economists – grasps intuitively the lack of external validity of some of the evidence that he presents, and it is clear that his NAB 2004 experience has been a good guide to identifying which laboratory evidence has some external validity and which does not. I think the book could be considerably improved with a more even-handed and complete assessment of the evidence from psychology and other social sciences (in particular economics), as well as an additional focus on incentive-compatible organizational design. To rely on business ethics education in business schools (whether in Australia or elsewhere) or on a sense of community-orientedness of business leaders is just not going to cut the mustard, as the widely perceived need for the Royal Commission into the banking system demonstrates.

Having recently interacted with NAB once again, over mortgage-related issues, I have no doubt that NAB culture is pervaded by everything but a meaningful social purpose underpinned by a virtuous set of values (e.g., the loan officer I dealt with did everything to prevent me from comparison shopping, and essentially gave me misleading information about the rates that I would be getting), and I have little doubt that the same applies to each of the other three major banks. There are reasons why the major banks in Australia have had outsized profits and some of the highest returns on equity in the world. The recent draft of the related Productivity Commission report spells them out.


I appreciate Dennis Gentilin’s comments on a draft of this review.


EU plans for VAT taxation are doomed to fail. Again.

Taxation is the potential downfall of the EU as an institution. The reason is that, within the EU, several member states are making money from tax evasion in other member states, a situation akin to a wife slowly murdering her husband with poison. Unless this stops, a divorce becomes inevitable.

Luxembourg, the Netherlands, Ireland, Liechtenstein, Austria, London, and several others are at it: they help large corporations avoid their taxation responsibilities. They make deals that allow companies to hide their tax obligations, maintain idiosyncratic definitions under which fewer tax obligations arise, provide re-labelling services such that head offices can be mere post-boxes, and so on.

These tax-avoidance enablers have also systematically frustrated all attempts over the last 30 years to harmonise taxation and reverse the damage they have done to the integrity of the other nation states in the EU. Whenever the issue of tax evasion was in the public eye, for instance during the GFC, they stalled by insisting tax evasion should be solved internationally and should include all other tax havens. Predictably, these were impossible demands. They have also made life difficult inside committees and government forums.

The EU bureaucracy has just put out a new set of proposals regarding VAT on large international corporations (like Google and Amazon), impact evaluated and all. I have read them and predict they will not be implemented, nor would they work anyway.

For one, the EU commission has no power to enforce new tax rules, and these proposals follow a long line of ignored prior ones. To become law they would need the unanimous backing of all EU members. They hence need the cooperation of about 5 countries that would lose billions if they complied. Fat chance, even with Brexit reducing the political clout of London.

Secondly, the proposals repeat the main mistake of the past: they advocate a rules-based administrative system of taxation which is cumbersome, highly complex, and easy to game. I explain how over the fold.

Adverse Action Lawyer wanted in Frijters versus UQ case

I am seeking a lawyer to run an Adverse Action case connected to the recent Fair Work Commission verdict that found systematic breaches of procedure and procedural fairness in the University of Queensland’s actions against me following my research on racial attitudes in Brisbane. I first raised these breaches in late 2013, but they were never addressed, and many new ones were added as the case dragged on. The VC of the university was personally informed of these breaches in April 2014, yet in February 2015 he publicly denied there was anything wrong with UQ’s actions. He was informed again in March 2015, and consistently failed to rectify the breaches of procedure brought to his attention. I wish to bring an Adverse Action case to claim back my considerable costs.

I expect the case to be worth at least a few hundred thousand dollars in damages (legal costs, the value of my time, etc.), and to be potentially one of many, because the Fair Work case uncovered widespread breaches of procedure in UQ’s handling of misconduct cases. So there might well be many others now looking to bring Adverse Action cases against UQ.

I offer a pay-for-success contract wherein the first part of any awarded damages would go to the lawyer; after a threshold payment, I want 50% to go to the successful lawyer and 50% to Vanavil, a school for children orphaned by the 2004 tsunami in India. I feel that helping the poorest Indians will go some way towards nullifying the damage that the managers of UQ did when they suppressed evidence of adverse treatment of Indians (and Indigenous peoples) in Brisbane and made it harder to research these things in general. And I want to feel that I haven’t wasted my time these last three years fighting mindless bureaucracies, but that my efforts ended up helping people in need.

Negotiations on the offered contract are possible. Please contact me by email if you are interested or have a good suggestion for an adverse action lawyer ( p dot frijters AT uq dot edu dot au).

[PS: The VC of UQ was still making inappropriate claims in the UQ media last week about his lack of involvement, and has refused to retract those claims this past week when I pointed his errors out to him.]

Best practice governance of top academic departments

Over the last 15-20 years, academic school meetings have gone from rambling and unstructured brawls to dull “executive infomercials”. The former led to marathon meetings. The current model has led to a middle-management culture that often does not take advantage of the very valuable specialist skills of the talented, highly trained (and experienced) scholars in the department. Nor does it allow for reasonable checks and balances on the powers of the executive, something that is vital for the management of any group of academics.

Our school at the ANU has adopted a system informed both by ANU’s academic board and by how the world’s top economics departments operate. The following is the template that we have adopted, which has replaced the executive committee.


School Meetings

1. Members of the School Meeting are all academic staff at level B and above who are engaged at 50% FTE or greater. Observer status is given to all other staff in the School.
2. School Meetings are chaired by the Speaker of the School Meeting, who is chosen by the Head of School. The Speaker is a Member of the School Meeting who is neither a Professor, nor a Chair of one of the Committees, nor the Head of School.
3. School Meetings shall be held at least twice every semester during the academic year. Additional meetings may be called by the Head of School or at the written request of twenty-five percent of the Members of the School Meeting. One-half of the Membership of the School Meeting constitutes a quorum.
4. Meetings shall be conducted in accordance with Robert’s Rules of Order. Minutes of the meeting and reports submitted to the Meeting shall be kept and made available to the Members.
5. The initial Agenda for regular School Meetings shall be prepared by the Speaker and circulated in written form at least five calendar days prior to the meeting. The initial Agenda includes the items: (1) Apologies; (2) Confirmation of Minutes of the Previous Meeting; (3) Reports of the Head of School and the Chairs of Committees; (4) Workplace Health and Safety; (5) Other Business; (6) Items added by the Head of School.
6. Additional items may be suggested by individual Members and, at the discretion of the Head of School, be added to the agenda for the forthcoming meeting. Alternatively, items may be placed on the agenda by written petition of twenty-five percent of the Members. All such additions to the agenda must occur at least three days prior to the meeting.
7. A simple majority of those present and those sending proxy votes and/or absentee ballots shall decide an issue arising from an item in the initial agenda or an item added at the discretion of the Head of School. (See the illustrative sketch after this list for how the thresholds in rules 7-9 work.)
8. A majority above 60% of those present and those sending proxy votes and/or absentee ballots shall decide an issue arising from the agenda item not described in 7.
9. A majority above 70% of those present and those sending proxy votes shall decide a motion not arising from an issue on the agenda.
10. Proxies may be given from one Member to another. Proxies are given in written form to the other Member, and a copy must be received by the Speaker three calendar days prior to the meeting. Proxies cannot be specific to particular items on the agenda, nor to a particular motion/amendment not arising from the agenda.
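
To make the voting thresholds in rules 7-9 concrete, here is a minimal sketch of how a motion would be decided under them. It is purely illustrative and not part of the adopted template; the function name and inputs are my own.

```python
# Illustrative only: a hypothetical helper encoding the thresholds of
# rules 7-9 above (the quorum requirement of rule 3 would be checked
# separately, before any vote).

def motion_passes(yes: int, no: int, rule: str) -> bool:
    """Decide a motion from the yes/no counts of those present plus
    proxy votes and/or absentee ballots.

    rule:
      'rule7' - simple majority: items on the initial agenda, or items
                added at the discretion of the Head of School
      'rule8' - majority above 60%: other agenda items (e.g. items
                added by petition of the Members)
      'rule9' - majority above 70%: motions not arising from the agenda
                (only proxies count here, not absentee ballots)
    """
    threshold = {"rule7": 0.5, "rule8": 0.6, "rule9": 0.7}[rule]
    total = yes + no
    return total > 0 and yes / total > threshold

# Example: 14 yes vs 9 no on a petitioned agenda item (rule 8):
# 14 / 23 = 60.9%, which clears the 60% bar, so the motion passes.
print(motion_passes(14, 9, "rule8"))  # True
```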


Did the University of Queensland suppress a study?

Possibly, and so I am putting the question out there in the hope that a journalist might investigate.

But first some context. In 2013, Redzo Mujcic and Paul Frijters (a frequent blogger here) published a study demonstrating unconscious discrimination on the part of bus drivers in Brisbane. Today, Ian Ayres took to the New York Times to promote the study’s findings.

As they describe in two working papers, Redzo Mujcic and Paul Frijters, economists at the University of Queensland, trained and assigned 29 young adult testers (from both genders and different ethnic groups) to board public buses in Brisbane and insert an empty fare card into the bus scanner. After the scanner made a loud sound informing the driver that the card did not have enough value, the testers said, “I do not have any money, but I need to get to” a station about 1.2 miles away. (The station varied according to where the testers boarded.)

With more than 1,500 observations, the study uncovered substantial, statistically significant race discrimination. Bus drivers were twice as willing to let white testers ride free as black testers (72 percent versus 36 percent of the time). Bus drivers showed some relative favoritism toward testers who shared their own race, but even black drivers still favored white testers over black testers (allowing free rides 83 percent versus 68 percent of the time).

The study also found that racial disparities persisted when the testers wore business attire or dressed in army uniforms. For example, testers wearing army uniforms were allowed to ride free 97 percent of the time if they were white, but only 77 percent of the time if they were black.

Wow. That’s quite a result and certainly the sort of thing we want our social scientists to be doing. No wonder Ayres raised it in the NYT. I did wonder, therefore, why I hadn’t heard much about it.

A possible answer came from Ian Ayres in a follow-up post at Forbes.

Professors Mujcic and Frijters deserve our thanks for authoring a study that is not only illuminating about what white privilege means. But their employer, the University of Queensland, has not thrown them a parade. After the City of Brisbane complained that the study encouraged fare evasion, the University initiated a complaint process against Professor Frijters and has ordered the authors to suppress this important paper. Blessed are those who are persecuted for righteousness sake. Instead of being persecuted, the authors should be praised for offering us a model for civil rights testing in the new millennium.

Now that is some allegation. If it is true, it is shocking to an incredible degree: not just that the City of Brisbane complained to the University, but that the University, my alma mater, actually went so far as to suppress a paper.

I did a quick (but hardly investigative) search to see what this might all be about, but didn’t come up with anything. I think, at the very least, a response from UQ is required.

An MYEFO mystery: what’s with the resource tax?

It’s time for the Mid-Year Economic and Fiscal Outlook (MYEFO), and we’re told that we’re about 11 billion deeper in the red this financial year than we thought, with the Treasurer blaming the dropping iron ore price and reduced wage growth. I have gone over the MYEFO documents (which are an exercise in obfuscation if ever I saw one), found that wage growth and the dropped iron ore price would ‘only’ cost us 2.3 billion each in this financial year (2014-15), noted that this was far short of the 11 billion headline, and thus went looking for the ‘real story’.

This threw up the mystery of the resource tax. Here is what it says in Table 3.2:

Table 3.2: Impact of Senate on the Budget (underlying cash balance)

                                                    Estimates            Projections
                                                 2014-15  2015-16    2016-17  2017-18     Total
                                                    $m       $m         $m       $m         $m
Repeal of the Minerals Resource Rent Tax
and related measures(a)                          -1,684   -2,334     -1,670     -947     -6,634

(a) Impact of decision taken as part of Senate negotiations.

which seems to mean that the repeal of the minerals resource rent tax (and related measures) is costing us around 2 billion per year. Yet, in the ‘Overview’ part, the MYEFO says: “The repeal of the Minerals Resource Rent Tax and other related measures will save the budget over $10 billion over the forward estimates and around $50 billion over the next decade.”

What is going on?

Update (thanks Chris Lloyd): it seems to be a language issue. Part of the story seems to be that the MYEFO is counting the repeal of the mining tax, which was an election promise, as something the Senate inflicted on the budget, so the 2 billion a year is ‘revenue foregone’. So the MYEFO is blaming the Senate for the outcome of an election promise, using an odd formulation to say that the repeal will save us 50 billion when it seems to imply it would cost us 50 billion. Weird.

How to lie with statistics: the case of female hurricanes.

I came across an article in PNAS (the Proceedings of the National Academy of Sciences) with the catchy title ‘Female hurricanes are deadlier than male hurricanes’. It is doing the rounds in the international media, with the explicit conclusion that our society suffers from gender bias because it does not sufficiently urge precautions when a hurricane gets a female name. Intrigued, and skeptical from the outset, I made the effort of looking up the article and taking a closer look at the statistical analysis. I can safely say that the editor and the referees were asleep for this one, as they let through a real shocker. The gist of the story is that female hurricanes are no deadlier than male ones. Below, I pick the statistics of this paper apart.

The authors support their pretty strong claims mainly on the basis of historical analyses of the death toll of 96 hurricanes in the US since 1950 and partially on the basis of hypotheticals asked of 109 respondents to an online survey. Let’s leave the hypotheticals aside, since the respondents for that one are neither representative nor facing a real situation, and look at the actual evidence on female versus male hurricanes.

One problem is that the hurricanes before 1979 were all given female names, as the naming conventions changed after 1978 so that we got alternating male and female names. Since hurricanes have become less deadly over time as people have become better at surviving them, this artificially makes the death toll of the female ones larger than that of the male ones. In their ‘statistical analyses’ the authors do not, however, control adequately for this, except in end-notes where they reveal that most of their results become insignificant when they split the sample into a before and after period. For the combined data, though, the raw correlation between the masculinity of the names and the death toll is of the same order as the raw correlation between the death toll and how long ago the hurricane occurred (ie, 0.1). Hence the effects of gender and years are indeed likely to come from the same underlying improvement in safety over time.

Using the data of the authors, I calculate that the average hurricane before 1979 killed 27 people, whilst the average one after 1978 killed 16, with the female ones killing 17 per hurricane and the male ones 15.3 per hurricane, a very small and completely insignificant difference. In fact, if I count ‘Frances’ as a male hurricane instead of a female one, because its ‘masculinity index’ is smack in the middle between male and female, then male and female hurricanes after 1978 are exactly equally deadly, with an average death toll of 16.

It gets worse. Even without taking account of the fact that the male hurricanes are the newer ones, the authors do not in fact find an unequivocal effect at all. They run two different specifications that allow for the naming of the hurricanes, and in neither do they actually find an effect unequivocally in the ‘right direction’ (their Table S3).

In their first, simple specification, the authors allow for effects of the severity of a hurricane in the form of the minimum air pressure (the lower, the more severe the hurricane) and the economic damage (the higher, the more severe the hurricane). Conditional on those two, they find an insignificant effect of the naming of the hurricanes!

Undeterred, and seemingly hell-bent on getting a strong result, the authors then add two interaction terms between the masculinity of the name of the hurricane and both the economic damage and the air pressure. The interaction term with the economic damage goes the way the authors want, ie hurricanes with both more economic damage and more feminine names have higher death tolls than hurricanes with less damage and male names. That is what their media release is based on, and their main text makes a ‘prediction graph’ out of that interaction term.

What is completely undiscussed in the main text of the article, however, is that the interaction with the minimum air pressure goes the opposite way: the lower the air pressure, the lower the death toll from a more feminine-named hurricane! So if the authors had made a ‘prediction graph’ showing the predicted death toll of more feminine hurricanes at lower or higher air pressures, they would have shown that the worse the hurricane, the lower the death toll if the hurricane had a female name!

The editors and the referees were thus completely asleep for this pretty blatant act of deception-by-statistics. Apparently, one can hoodwink the editors of PNAS by combining the following tricks: add correlated interaction terms to a regression, of which one discusses only the coefficients that fit the story one wants to sell; then make a separate graph out of the parameter one needs in the main text, whilst putting technically sounding information in parentheses to throw editors, reviewers, and readers off the scent.

And the hoodwinking in this case is not small either. In order to accentuate what really is a non-result, the authors in the main text claim that “changing a severe hurricane’s name from Charley (MFI=2.889, 14.87 deaths) to Eloise (MFI=8.944, 41.45 deaths) could nearly triple its death toll.” This, whilst in the years since 1979 the average death toll for their included hurricanes is 16 for ‘female hurricanes’ and 16 for ‘male hurricanes’ alike (own calculations)! The authors conveniently forgot to mention in their dramatic result that Charley would have had to be a hurricane that did immense economic damage but had a very high minimum air pressure, ie was actually a very weak hurricane. Only for such an ‘impossible hurricane’ would their own model predict the increase in deaths from a female name. Put differently, I could equally have claimed that, had the hurricane been very strong in terms of low air pressure, changing the name from Charley to Eloise would have halved the death toll!

The authors also quite willingly pretend to have found things they have not in fact researched. They write: “Feminine-named hurricanes (vs. masculine-named hurricanes) cause significantly more deaths, apparently because they lead to a lower perceived risk and consequently less preparedness”, and the conclusions even speak of “gender biases”! Where do they try to measure this supposed bias in actual preparations? You guessed it: nowhere. PNAS should really clean up its act and not allow this sort of article, with its fairly blatant statistical artefacts, to slip through the cracks.

Let me explain the trickery in a bit more depth for the interested reader: air pressure and economic damage are highly correlated (the correlation is apparently -0.56), which means that one gets a strongly significant interaction between femininity and economic damage only because one has simultaneously added the interaction with minimum air pressure. One then talks about the interaction that goes the way one wants and happily neglects to mention the other one. And one needs both interactions at the same time to get the desired result on the interaction between the names and economic damage: without the interaction with minimum air pressure, what you get is a whole upward shift of the male death prediction and a loss of significance on the interaction term with economic damage. You see this in the ‘additional analyses’ run by the authors, in very small font after the conclusions, wherein the whole thing becomes insignificant for the first period, and the reduced coefficient on the interaction with air pressure for the later period coincides with a halving of the coefficient on the interaction with economic damage as well. Hence, without including both interactions you would probably get that the female hurricanes are predicted to be less deadly than the male ones when the economic damage is small and more deadly when the damage is large (to an insignificant extent). So you need the interaction that is almost invisible in the main text and the conclusions to ‘get’ the result that the headlines are based on.
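
To see the mechanism, here is a minimal synthetic sketch (my own illustration, with made-up data and parameters, not the authors’ data or code) of what happens when a second, strongly correlated interaction term is added to a regression: the coefficient on the first interaction shifts and its standard error inflates, even when the variable of interest has no effect at all.

```python
# Synthetic illustration only: all variables and parameters are hypothetical.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 96  # roughly the number of hurricanes in the study

damage = rng.normal(size=n)
# Air pressure is strongly (negatively) correlated with damage,
# as in the paper (about -0.56 there).
pressure = -0.8 * damage + 0.6 * rng.normal(size=n)
femininity = rng.uniform(1, 11, size=n)  # a continuous name index

# True data-generating process: deaths depend on damage only;
# the femininity of the name plays no role whatsoever.
deaths = 2.0 * damage + rng.normal(size=n)

def ols(interactions):
    # Regress deaths on the main effects plus the given interaction terms.
    X = sm.add_constant(
        np.column_stack([damage, pressure, femininity] + interactions))
    return sm.OLS(deaths, X).fit()

one = ols([femininity * damage])
both = ols([femininity * damage, femininity * pressure])

print("fem x damage alone:            b = %+.3f, se = %.3f"
      % (one.params[-1], one.bse[-1]))
print("fem x damage with fem x press: b = %+.3f, se = %.3f"
      % (both.params[-2], both.bse[-2]))
print("correlation of the two interaction regressors: %.2f"
      % np.corrcoef(femininity * damage, femininity * pressure)[0, 1])
```

The two interaction regressors are nearly collinear, so the coefficient on femininity × damage shifts and its standard error inflates once the second interaction enters; with real data, this instability is exactly the degree of freedom that lets one report whichever coefficient fits the story.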

There is another, even more insidious trick played in this article. You see, with only 96 hurricanes to play with, of which really only 26 to 27 are ‘male’ hurricanes, the authors are asking rather a lot from their data in that they want to estimate 5 parameter coefficients, three of which are based on names. If you then only use a simple indicator for whether or not a hurricane has a male name, you have the problem that you don’t have enough variation to get significance on anything.

So what did the authors do? Ingeniously, they decided to increase the variation in their names by having people judge just how ‘masculine’ the names were. Hence many of the ‘female’ hurricanes were re-badged as ‘somewhat male’ hurricanes. So the female hurricanes of the pre-1979 era had an average “masculinity index” of 8.42, whilst those of the post-1979 era had an average of 9.01. Simply put, according to the authors, the female hurricanes ‘of old’, which were of course more deadly as they occurred earlier, were also more masculine, contributing to the headline ‘results’.

Supposedly masculine female names included “Ione”, “Beulah”, and “Babe”. And who judges whether these are masculine names? Why, apparently this was done by 9 ‘independent coders’, by which one presumes the authors meant colleagues sitting in the staff room of their university in 2013! Now, even supposing that they were independent, one cannot help but notice that the coders will have been relatively unaware of the naming conventions in the 1950s and 1960s. How is someone born in 1970 sitting in a staff room in 2013 supposed to judge how ‘masculine’ the name ‘Ione’ was perceived to be in 1950? These older names probably just sounded unusual and hence got rated as ‘more probably male’. Similarly, it is beyond me why ‘Hugo’ would be rated as less masculine than ‘Jerry’ or ‘Juan’.

The authors’ own end-notes, called ‘additional analyses’, indeed show that you get insignificant results without this additional variation begotten from making the names continuous. So the authors needed to fiddle with the names of the hurricanes, pool two eras whilst not controlling for era, and add two strongly correlated and opposing interaction terms in the same analyses to get the results they wanted. It is what economists refer to as ‘torturing the data until it confesses’.

Finally, for the observant, there is the following anomaly, which tells you something about the judgements made in this research: the masculinity of names is judged on a 1 to 11 scale (integers only) by 9 raters. Yet the averages reported in the authors’ appendices include such values as 1.9444444 (Isaac) and 9.1666666 (Ophelia). Note that if there were indeed always nine raters, then all values should be an exact multiple of one-ninth, ie 0.11111111. The discrepancy indicates either that there were not always nine raters, or else that not all coded values were integers (an impossibility according to the main text). The 9.1666666, for instance, is a multiple of one-sixth and thus suggests only 6 raters were used for ‘Ophelia’; the 1.9444444 is a multiple of one-eighteenth, suggesting that there were twice as many raters for ‘Isaac’. Alternatively, in both cases there were nine raters but one of them picked two values simultaneously (one even and one uneven) and thus added 0.055555 to a multiple of one-ninth in the displayed average. It is not a big thing, as this kind of judgement is made all the time, but I can’t find the footnote that owns up to this in the paper.
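
The arithmetic is easy to check. Here is a small script (my own, purely illustrative) that tests which rater counts are consistent with the reported averages:

```python
# Purely illustrative check: an average of k integer ratings must be
# an integer multiple of 1/k.

def consistent_with(avg: float, n_raters: int, tol: float = 1e-5) -> bool:
    """True if avg is (within rounding) an integer multiple of 1/n_raters."""
    scaled = avg * n_raters
    return abs(scaled - round(scaled)) < tol * n_raters

for name, avg in [("Isaac", 1.9444444), ("Ophelia", 9.1666666)]:
    fits = [n for n in (6, 9, 18) if consistent_with(avg, n)]
    print(f"{name} ({avg}): consistent with {fits} raters")

# Prints that Isaac's 1.9444444 (= 35/18) fits 18 raters but not 9,
# and Ophelia's 9.1666666 (= 55/6) fits 6 (and trivially 18) but not 9.
```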

Predictions versus outcomes in 2013?

In the last 5 years, I have made a point of giving clear predictions on complex socio-economic issues. I give predictions partially to improve my own understanding of humanity: nothing sharpens the thoughts as much as having to actually predict something. Another reason is as a means of helping my countries (Australia/the Netherlands) understand the world: predicting socio-economic events is what scientists are for!

Time to have a look at my predictive successes and failures over the last few years, as well as the outstanding predictions yet to be decided. Let us start with what I consider my main failure.

Failed predictions

The main area I feel I haven’t read quite right is the conflict in Syria, as part of the general change in the whole Middle East. I am still happy with my long-run predictions for that region, where I have predicted that urbanisation, more education, reduced fertility rates, and a running out of fossil fuels will lead to a normalisation of politics in a few decades’ time. But at the end of 2012 I was too quick in thinking the Syria conflict was done and dusted. To be fair, I was mainly following the Intrade political betting market, which was 90% certain that Assad would no longer be president by the end of this year, but the prophesied take-over of the country by the Sunni majority has not quite happened. The place has become another Lebanon, with lots of armed groups defending their own turf and making war on the turf of others. The regime no longer controls the whole country, but is still the biggest militia around.

What did I fail to see? I mainly over-estimated the degree to which the West would become involved. I expected the Americans and the Turks to put a lot of resources into the more secular militias, giving them training grounds and more modern equipment. As far as I can tell, this did happen a bit, but simply not to the degree I thought likely, and I don’t really know why. There were several attempts by the US and Turkey to identify an ‘opposition coalition’ to then support, so something hidden from view must have prevented actual support. Perhaps the US has decided it prefers Assad to the alternatives after all.

The willingness of the Iranians and Russians to support the regime has also been stronger than I thought, and the efforts of the Sunni neighbours to support the non-regime militias have been less cogent than I thought: instead of backing a clear group that had a real future in terms of leading the country (the more secular groups), foreign anti-regime support went mainly to the crazies who went along with the ideology of fanatics elsewhere. That suggests a lack of pragmatic involvement from the neighbours.

I wouldn’t call it a complete predictive failure, because Syria as a country no longer exists: it is now run by all kinds of regional power brokers, and so one could ‘claim’ the regime has indeed lost (most of) its power; but the conflict has gone on longer than the betting markets that I went along with predicted. So this also educates me about the lack of intellectual weight of that kind of political betting market: these are probably more feel-good markets with low turnover that simply don’t aggregate much hidden information. As a related failure, I can mention that I put a low probability on the event that the Muslim Brotherhood would overplay its hand when in government in Egypt. I did mention the possibility (see later), but didn’t think it would happen.


Successful predictions

A very recent prediction of mine was on bitcoin. A month ago, I said governments were going to intervene because of the money-laundering opportunities in the bitcoin network, and that it hence would not become a dominant trading currency. The next week, the Chinese came down with severe restrictions on bitcoin in their country: financial institutions were not allowed to trade in it, and individuals trading in it had to register with their real names, killing off most laundering opportunities. As a result, the value of bitcoin halved. I wouldn’t claim bitcoin is quite dead yet. It is when many other countries start to enact similar regulation (as some are doing) that it becomes an official curiosum.

Other predictions have been on various aspects of the GFC in Europe. I predicted such things as the Greek defaults when European governments were still pretending they would not occur, the survival of the Euro when there was lots of speculation on imminent euro exits, the inability of the ECB to actually meaningfully monitor banks, and the failure to get agreements on tax evasion (all of which have become painfully clear in 2013). My proudest moment was to predict, in December 2011, the overall trajectory of where the politics of the financial crisis was heading: support for weak new institutions in exchange for continued bailouts and forms of money printing, with national sovereignty as the sticking point preventing stronger institutions. We are still on that trajectory now, as a very recent report by the Bruegel Foundation argues; it dryly summarises recent events: “Five years of crisis have pushed Europe to take emergency financial measures to cushion the free fall of distressed countries. However, efforts to turn the crisis into a spur for “an ever closer union” have met with political resistance to the surrender of fiscal sovereignty. If such a union remains elusive, a perpetual muddling ahead risks generating economic and political dysfunction.” The latest banking deal fits this mould perfectly.

I am also proud of my predictions on the ill-fated Monti government in Italy of 2012. Before he was in power, I predicted he was unlikely to have the personality to change anything, and within weeks of him being in government (December 2011) I mentioned that the reforms he was talking about were dead in the water, months before The Economist stopped putting him up as a great reformer. Only in 2013 did mainstream media outside of Italy wake up to his failure. I am similarly looking good on my observations regarding the problems in Spain.

On the Middle East, in 2011 I picked the current Libyan chaos as coming from its resource curse. A few weeks into the Arab Spring, I predicted the ensuing 2012 grand coalition between islamists and the military in Egypt, whereby the islamists would form government but with a tacit agreement with the military not to interfere with the economic interests of that military. I also predicted that the torture machine of the Egyptian military would first deal with the urban youth and then become oriented towards the islamists should they step out of line, which they did.

The main prediction I have been making since 2007 (and the one which has gotten me into the most trouble!) is the uselessness of looking for a world coalition to reduce CO2 emissions, mainly because the temptation to free-ride is irresistible, both within countries and between them. I have thus consistently called for forgetting about emission strategies and instead thinking about technological advances, geo-engineering, and adaptation. In each year since 2007, developments have been accordingly: steady increases in actual emissions, with a growing number of scientists and research groups thinking more seriously about geo-engineering; previous agreements on emissions have not been kept and new ones are toothless, whilst many beautiful political speeches designed for consumption by the gullible are produced at each new conference on the issues.

In 2013, for instance, the Japanese reneged on their earlier Kyoto promises because they decided to switch from nuclear to fossil fuels, following on from a previous reneging by Canada. Similarly, the EU watered down its commitments in order not to upset the German car industry, whilst China, India, and others helped prevent emission agreements with any bite. A nice write-up of the recent Warsaw talk-fest can be found here. Conspicuous in that write-up is the increased awareness of the importance of adapting to climate change, and the degree to which hope lies with new technology, not with massive emission reductions under existing technologies. The Australian deal with the EU trading scheme, which was all smoke-and-mirrors anyway, has fallen through, essentially replaced with a policy of ‘business as usual till the bigger players come up with a plan’, which I see as a sensible policy for Australia at the moment.


Predictions on the ledger

In many ways, the ‘emission controls are hopeless’ prediction is a running prediction for decades to come, so that one is very much still on the ledger. And it is one on which I am quite willing to bet against those who say they believe serious emission reductions will come about via emission markets or other controls.

Another prediction that recently came out ‘half-good’ is the bet with Andrew Leigh on happiness and incomes in rich countries, where my prediction was that richer countries getting even richer would not get happier. For the data we agreed to look at, this indeed held, but more because I got lucky with the data available – other data showed different results. Read about it in my recent blog on the topic by following the link!

Another prediction ‘on the ledger’ is that there is going to be no real change in Chinese politics till several years after they run out of easy growth opportunities, say 20 years from now. After that, I predict stronger and stronger pressure from the Chinese business community to adopt a Western-style political system. I gave a possible trajectory for how it might happen (local experimentation growing into national systems), but that is not the only way change might happen, if it happens at all. The prediction is the consolidation of one-party rule till years after growth has levelled off. That consolidation has indeed been in full swing this last year: as a recent piece by the Institute of Peace and Conflict Studies argues, in 2013 we got more media control and more party control over the economy. Still, there are some embryonic signs of attempts to get some kind of separation of powers in that country, such as via a more independent judiciary and more independent financial institutions.

The prediction that the ‘behavioural genetics’ crowd is going nowhere soon is also a prediction ‘on the ledger’. The same goes for the prediction that Australia is not going to seriously improve its education-for-the-masses anytime soon, and for the unlikelihood of solar replacing fossil fuels for mass electricity generation anytime soon.

There is then a whole heap of predictions that I am quite happy to say have come true, but where it is also a certainty someone else would disagree. For instance, I predicted that the Melbourne Model, which is a change in how the University of Melbourne structures undergraduate education, would lead to dumbed-down degrees. Everything I hear about that place confirms it, but I would be astounded if the chancellery of the University of Melbourne would agree with that assessment! Similarly, my stated fears regarding the Gonski reforms (not quite predictions as I made it clear I had a hard time finding out what was actually going to happen) are looking all-too-true, but I am sure the ministries involved would disagree. One can trawl my archives for several more such ‘debatable’ prediction outcomes.

Finally, I have a bet on with Conrad Perry about what is going to happen in Egypt next. My prediction is that the next elected government will again be an islamist-led government, a kind of Brotherhood 2.0. They may change labels and be even more careful, but I thought it likely that they would be involved as a dominant player in the next elections, simply because of the high level of religiosity in that country. Conrad Perry bets on ‘all other outcomes’, with a bottle of red to the winner. Jim Rose also made an implicit prediction, which is that the new generation of the military is going to be successful in its bid to monopolise power in Egypt, but he didn’t bet anything. Still, Jim is looking rosy on that prediction.

The prediction-plus-bet with Conrad on Egypt was entered into around August/September, and things have moved on a bit since then. The Egyptian military has proven more popular and more bent on total control than I thought, but we are still looking at a situation in which one is likely to get democratic elections (though the military might well rig them). I will say I am less confident about my prediction now than 3 months ago, essentially because the military has been more brutal than I thought it would be, but there is still a chance for my prediction to come true, so I am not ready to concede defeat on that one yet!

The Xmas quiz answers and discussion

Last Monday I posted 4 questions to see who thought like a classic utilitarian and who adhered to a wider notion of ethics, suspecting that in the end we all subscribe to ‘more’ than classical utilitarianism. There are hence no ‘right’ answers, merely classic utilitarian ones and other ones.

The first question was to whom we should allocate a scarce supply of donor organs. Let us first briefly discuss the policy reality and then the classic utilitarian approach.

The policy reality is murky. Australia has guidelines on this that advocate taking various factors into account, including the expected benefit to the organ recipient (relevant to the utilitarian) but also the time spent on the waiting list (not so relevant). Because organs deteriorate quickly once removed, a lot of incidental factors also matter, such as which potential recipient answers the phone (relevant to a utilitarian too). In terms of priorities, though, the guidelines supposedly take no account of “race, religion, gender, social status, disability or age – unless age is relevant to the organ matching criteria.” To the utilitarian this form of equity is in fact inequity: the utilitarian does not care who receives an extra year of happy life, but by caring about the total number of additional happy years, the utilitarian would use any information that predicts those additional happy years, including race and gender.

Practices vary in other countries. In some, allocation is more or less on the basis of expected benefit; in others it is all about ‘medical criteria’, which in reality allow donor organs to go to people with a high probability of a successful transplant but a very low number of expected additional years. Some countries leave the decision entirely up to individual doctors and hospitals, granting huge discretion to the individual doctor, which raises the fear that allocations are not made purely on grounds of societal gain.

What would the classic utilitarian do? Allocate organs where there is the highest expected number of additional happy life-years. This involves a judgement on who is going to live long and who is going to live happily. Such things are not knowable with certainty, so a utilitarian would turn to statistical predictors of both, using whatever indicators could be administered.
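
For the technically inclined, here is a minimal sketch of that decision rule in code. The candidates, their numbers, and the predictor function are all invented for illustration; this shows the logic of the rule, not a real allocation algorithm or real transplant statistics.

```python
# A minimal sketch of the classic utilitarian allocation rule. The
# candidates, their numbers, and the predictor itself are invented for
# illustration; they are not real transplant statistics.

def predicted_happy_years(age, life_expectancy, avg_happiness_0_10):
    """Expected additional happy life-years: remaining years of life,
    scaled by average happiness (normalised from a 0-10 scale to 0-1)."""
    remaining_years = max(life_expectancy - age, 0)
    return remaining_years * (avg_happiness_0_10 / 10.0)

# Three hypothetical candidates on the waiting list.
candidates = {
    "A": predicted_happy_years(age=25, life_expectancy=85, avg_happiness_0_10=7.8),
    "B": predicted_happy_years(age=40, life_expectancy=80, avg_happiness_0_10=7.0),
    "C": predicted_happy_years(age=60, life_expectancy=78, avg_happiness_0_10=6.5),
}

# The classic utilitarian simply ranks by expected additional happy years
# and allocates the scarce organ to the top of the list.
for name, score in sorted(candidates.items(), key=lambda kv: -kv[1]):
    print(f"candidate {name}: {score:.1f} expected additional happy years")
```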

As to length of life, we generally know that rich young women have the highest life expectancy. And amongst rich young women in the West, white/Asian rich young women live even longer. According to some studies in the US, the difference with other ethnic groups (Black) can be up to 10 years (see the research links in this wikipedia page on the issue). As to who is happy, again the general finding is that rich women are amongst the happiest groups. Hence the classic utilitarian would want to allocate the organs to rich white/Asian young women.

I should note that the classic utilitarian would thus have no qualms about ending up with a policy that violates the anti-discrimination laws of many societies. Our societies shy away from using broad observable characteristics as information on which to base allocations, which implicitly means that the years of life of some groups are weighed more heavily than the years of life of others. The example thus points to a real tension between, on the one hand, classic utilitarianism and its acceptance of statistical discrimination on the basis of gender and perceived ethnicity and, on the other hand, the dominant moral positions within our society. Again, I have no wish to say which one is ‘right’ but merely note the discrepancy. As to myself, I have no problem with the idea that priority in donor organs should be given to young women, though I also see a utilitarian argument for a bit of positive discrimination in the form of a blind eye to ethnicity (ie, there is utilitarian value in maintaining the idea that allocations should not be made on the basis of perceived ethnicity, even though in this case that comes at a clear loss of expected life-years).

The second question concerned the willingness to pre-emptively kill off threats to the lives of others.

The policy reality here is, again, murky. In order to get a conviction on the basis of ‘attempted’ acts of terrorism or murder, the police would have to have pretty strong evidence of a high probability that the acts were truly going to happen. A 1-in-a-million chance of perpetrating an act that would cost a million lives would certainly not be enough. Likely, not even a 10% chance would be enough, even though the expected cost of a 10% chance would be 100,000 lives, far outweighing the life of the one person (and I know the example is somewhat artificial!).

When it concerns things like the drone program of the West though, under which the US, with help from its allies (including Australia), kills off potential terrorist threats and accepts the possibility of collateral damage, the implicitly accepted burden of proof seems much lower. I am not saying this as a form of endorsement, but simply stating what seems to go on. Given the lack of public scrutiny it is really hard to know just how much lower the burden of proof is, and where the information to identify targets in fact comes from, but being a member of a declared terrorist organisation seems to be cause enough, even if the person involved hasn’t yet harmed anybody. Now, it is easy to be holier-than-thou and dismissive about this kind of program, but the reality is that it is supported by our populations: the major political parties go along with it, both in the US and here (we are not abandoning our strategic alliance with the Americans over it, are we, nor denying them airspace?), implying that the drone program happens, de facto, with our society’s blessing, even if some of us as individuals have mixed feelings about it. So the drone program is a form of pre-emptively killing off potential enemies because of a perceived probability of harm. The cut-off point on the probability is not known, but it is clearly lower than that used in criminal cases inside our own countries.

To the classic utilitarian, if all one knew were the odds of damage and the extent of damage, then one would want to kill off anyone who represented a net expected loss of life. Hence the classic utilitarian would indeed accept any odds just above 1 in a million when the threat is to a million lives: at those odds the expected cost of the potential terrorist’s possible actions (one life) just exceeds the value of his own life. If one starts to include the notion that our societies derive benefit from the social norm that strong proof of intended harm is needed before killing anyone, then even the classic utilitarian would raise the threshold odds to reflect the disutility of being seen to harm that social norm, though the classic utilitarian would quickly reduce the threshold again if there were many threats and the usefulness of the social norm hence became less and less relevant. To some extent, this is exactly how our society functions: in a state of emergency or war, the burden of proof required to shoot a potential enemy drastically reduces as the regular rule of law and ‘innocent till proven guilty’ norms give way to a more radical ‘shoot now, agonise later’ mentality. If you like, we have recognised mechanisms for ridding ourselves of the social norm of a high burden of proof when the occasion calls for it.

As to personally pulling the trigger, the question to a utilitarian becomes entirely one of selfishness versus the public good, and thus depends on the personal pain of the person who would have to pull the trigger. To the utilitarian who is completely selfless but who experiences personal pain worth one life from pulling the trigger, the threshold probability becomes 2 in a million (ie, his own pain plus the life of the potential terrorist), but for a more selfish person the threshold could rise so high that even with certainty the person is not willing to kill someone else to save a million others. That might be noble under some moral codes, but to a utilitarian it would represent extreme selfishness.
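
To make the arithmetic of the last few paragraphs explicit, here is a toy calculation with lives as the common currency. All the cost figures are illustrative assumptions and nothing more.

```python
# Toy version of the pre-emptive-killing thresholds, with lives as the
# common currency. All figures are illustrative assumptions.

def threshold_probability(lives_at_risk, cost_of_killing_in_lives=1.0):
    """Kill only when p * lives_at_risk exceeds what the killing itself
    costs. The base cost is the target's one life; any further cost (the
    trigger-puller's pain, damage to social norms) is added in the same
    units."""
    return cost_of_killing_in_lives / lives_at_risk

MILLION = 1_000_000

# Base case: only the target's life counts, so the threshold is 1 in a million.
print(threshold_probability(MILLION))  # 1e-06

# A selfless trigger-puller whose pain is worth one life: 2 in a million.
print(threshold_probability(MILLION, cost_of_killing_in_lives=2.0))  # 2e-06

# Valuing the social norm against killing on weak evidence at, say,
# 100,000 lives pushes the threshold up to a 10% probability.
print(threshold_probability(MILLION, cost_of_killing_in_lives=100_000.0))  # 0.1
```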

So the example once again shows the gulf between how our societies normally function when it concerns small probabilities of large damages, and what the classic utilitarian would do. A utilitarian is happy to act on small probabilities, though of course eager to purchase more information when possible. Our societies are less trigger-happy: only when there is actual, experienced turmoil and damage do they gradually revert to a cost-benefit frame of mind and suspend other social norms. A classic utilitarian is thus much more pro-active and willing to act on imperfect information than is normal in our societies.

The third question was about divulging information that would cause hurt but would not change outcomes. In the hypothetical, the information was about the treatment of pets. To the classic utilitarian, this one is easy: information itself is not a final outcome and, since the hypothetical was set up that way, the choice was between a lower state of utility with more information and a higher state of utility with less information. The classic utilitarian would choose the higher utility and not make the information available.

The policy reality in this case is debatable. One might argue that the hypothetical, ie that more information would not lead to changes but merely to hurt, is so unrealistic that it does not resemble any real policy. Some commentators made that argument, saying they essentially had no idea what I was asking, and I am sympathetic to it.

The closest one comes to the hypothetical is the phenomenon of general flattery, such as where populations tell themselves they are God’s chosen people with a divine mission, or where whole populations buy into the idea that no-one is to blame for their individual bad choices (like their smoking choices). One might see the widespread phenomenon of keeping quiet while others enjoy flattery as a form of suppressing information that merely hurts and would have no effect. Hence one could say that ‘good manners’ and ‘tact’ are in essence about keeping hidden information that hurts others. Personally, though I hate condoning the suppression of truth for any cause, I have to concede the utilitarian case for it.

The fourth and final question is perhaps the most glaring example of a difference between policy reality and classic utilitarianism, as it is about the distinction between an identified saved life and a statistically saved life. As one commenter (Ken) already noted, politicians find it expedient to go for the identified life rather than the unidentified statistical life, and this relates to the lack of reflection amongst the population.

To the classic utilitarian, it should not matter whose life is saved: all saved lives are to the classic utilitarian ‘statistical’. Indeed, it is a key part of utilitarianism that there is no innate superiority of this person over that one. Hence, the classic utilitarian would value an identified life equally to a statistical one and would thus be willing to pour the same resources into preventing the loss of a life (via inoculations, safe road construction, etc.) as into saving a particular known individual.

The policy practice is miles apart from classic utilitarianism, not just in Australia but throughout the Western world. For statistical lives, the Australian government more or less uses the rule of thumb that it is willing to spend some 50,000 dollars per additional happy year. This is roughly the cut-off point for admitting new medicines onto the Pharmaceutical Benefits Scheme. It is also pretty much the cut-off point for medicines in other Western countries (as a rule of thumb, governments are willing to pay about a median income for another year of happy life for one of their citizens).

For identified lives, the willingness to pay is easily ten times this amount. Australia thus has a ‘Life Saving Drugs’ program for rare life-threatening conditions, covering diseases like Gaucher disease, Fabry disease, and Pompe disease. Openly available estimates of the implied cost of a life vary and it is hard to track down the exact prices, but each year of treatment for a Pompe patient was said, at a Canadian conference for instance, to cost about 500,000 dollars. In New Zealand, the same figure of 500,000 is cited in the media. Here in Australia, the treatment involved became available in 2008 and I understand it indeed costs about 500,000 per patient per year. There will be around 500 patients born with Pompe on this program in Australia (inferred from the prevalence statistics). Note that this treatment does not in fact mean the difference between life and death: rather, it means the difference between a shorter life and a longer one. Hence the cost per year of life saved is actually quite a bit higher than 500,000 for this disease.

What does this mean? It means, quite simply, that instead of saving one person with Pompe disease, one could save at least 10 others. In order for the person born with Pompe to live, 10 others in his society die. It is a brutal reality that is difficult to talk about, but that does not make it any less real. Why is the price so high? Because the pharmaceutical companies can successfully bargain with governments for an extremely high price on these visible lives saved. They hold politicians to ransom over it, successfully in the case of Australia.
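
The back-of-the-envelope arithmetic behind that claim is simple division, using only the two figures quoted above:

```python
# The two figures quoted above, and the division behind the 'one
# identified life versus ten statistical lives' claim.
statistical_cost_per_happy_year = 50_000   # rough PBS cut-off per extra happy year
identified_cost_per_year = 500_000         # reported yearly cost of Pompe treatment

ratio = identified_cost_per_year / statistical_cost_per_happy_year
print(f"Each treatment-year could instead buy about {ratio:.0f} statistical life-years")
```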

Saving one identified life rather than ten unidentified ones is not merely non-utilitarian. It also vastly distorts incentives. It steers researchers and pharmaceutical companies away from finding solutions to the illnesses suffered by the anonymous many and towards finding improvements in the lives of the identifiable few. It creates incentives to find distinctions between patients so that new ‘small niches’ of identified patients can be carved out, out of which to make a lot of money. Why bother trying to find cures for malaria and cancer when it is so much more lucrative to find a drug that saves a small but identifiable fraction of the population of a rich country?

So kudos to those willing to say they would go for the institution that saved the most lives. I agree with you, but your society, as witnessed by its actions, does not yet agree, which opens the question of what can be done to decide such matters more rationally.

Thanks to everyone who participated in the quiz and merry X-mas!

Rich countries and happiness: the story of a bet.

Do countries that are already rich become even happier when they become yet richer? This was the essential question over which I entered into a gentleman’s bet with Andrew Leigh in 2004, and which was recently settled.

The reason for the bet was a famous hypothesis in happiness research, the Easterlin hypothesis, which held that happiness does not increase when rich countries become even richer. When I was preparing a presentation on this matter in 2004, I used the following graph to illustrate the happiness-income relation across countries:

[Graph not reproduced here.]

This graph shows the relation between average income (GDP in purchasing-power terms) and average happiness on a 0-10 scale for many countries. As one can see, the relation between income and happiness is upward-sloping at low levels of income, but becomes somewhat flat after 15,000 dollars per person. I championed the idea that this was not just true if you looked across countries, but would also hold true over time.

Andrew Leigh’s thinking was influenced by other data, particularly a paper by Stevenson and Wolfers which, he thinks, debunks the Easterlin hypothesis. Here’s one of their graphs:

[Stevenson and Wolfers graph not reproduced here.]

What’s striking about this graph is that the dotted line slopes up in the top right corner. In other words, the relationship between happiness and income becomes stronger, not weaker, for countries with average incomes over $15,000. Andrew thinks that this is because they specify income in log terms (in other words, we’re looking at the effect on happiness of a percentage increase in income rather than a dollar increase in income). I think it’s because the Gallup poll isn’t measuring happiness, but is instead asking people to rank themselves on the Cantril ladder-of-life scale.

So our gentleman’s bet was in effect a bet on whether happiness in the World Value Surveys behaved differently from the ladder question of the Gallup polls, and on whether the short-run relation between income and happiness was strong enough to show up over periods of 5 to 10 years as well. Andrew thought it would; I thought 5-10 years would be long enough for the typical long-run no-effect findings to show up, and that happiness has a different relation with income than the Cantril question does. So we bet on whether one would get a significantly positive relation between GDP growth and happiness changes for the rich countries when one looked at the World Value data for 2005. We agreed to look at the relation between income and happiness using country-average variation. The winner would get 100 bucks.

Now, both of us forgot about the bet for a few years while the data was becoming available. Only recently did Andrew remind me of our bet and ask me to check what had happened.

When I (with research assistance from Debayan Pakrashi) started to look into this data again, it quickly became apparent that Andrew and I had been pretty sloppy in formulating the precise conditions of the bet. In many ways, our bet had been far too vague.

For one, the World Value Survey is not in fact held in particular years. Rather, some survey is run almost every year in some country, and these add to the collection of surveys known as the World Value Survey. Hence there was really no such thing as a ‘2005 wave’. Taken literally, only Australia, Finland, and Japan had a survey in 2005 and already had, in the previous wave, a GDP of 15,000 dollars per person. In all those countries, income had gone up a lot since their previous survey, with Australian happiness down and Japanese and Finnish happiness up. That is a bit meagre as ‘waves’ go.

So the first ‘addition’ was to allow a bandwidth of years for the ‘2005’ wave that included 2004, 2005, 2006, 2007, and 2008. That gave 12 countries that were rich enough in the previous wave to qualify. The raw data was:

[Table not reproduced here.]

The next ‘snag’ was of course that there is more than one way to specify the dependence on income: linear or logarithmic. With logarithmic income one normally gets stronger statistical significance on income, so we went for logarithms.

Then, of course, there are still many other things one can put into the regression. Does one account for the effects of particular years (in bands) and for the level of happiness at which a country starts? We decided to try it all. Hence the final ‘deciding’ set of regressions was as follows:

[Regression table not reproduced here.]

The table shows that the relation between income changes and happiness changes (the last two columns) was either insignificantly positive, or even negative once one entered year-bands.
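
For those who want to see the mechanics, here is a minimal sketch of the kind of first-difference regressions we ran. The file name and column names are hypothetical, and our actual data handling differed in its details, but the specification is the same: happiness changes on log-income changes, with optional year-bands and the starting level of happiness.

```python
# A sketch of the 'deciding' regressions: changes in average happiness
# regressed on changes in log GDP per person, with optional year-band
# dummies and the starting level of happiness. The CSV file and column
# names are hypothetical; the real analysis differed in its details.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel: one row per country-wave, rich countries only.
df = pd.read_csv("wvs_rich_countries.csv")
df = df.sort_values(["country", "year"])

# First differences between consecutive waves within each country.
df["d_happiness"] = df.groupby("country")["avg_happiness"].diff()
df["d_log_gdp"] = df.groupby("country")["log_gdp_pc"].diff()
df["initial_happiness"] = df.groupby("country")["avg_happiness"].shift()

# Year-bands for the '2005' wave (2004-2005 versus 2006-2008).
df["year_band"] = pd.cut(df["year"], bins=[2003, 2005, 2008],
                         labels=["2004-05", "2006-08"])

waves = df.dropna(subset=["d_happiness", "d_log_gdp"])

# Simplest specification: happiness changes on log-income changes.
print(smf.ols("d_happiness ~ d_log_gdp", data=waves).fit().summary())

# Adding year-bands and the level of happiness a country starts from.
print(smf.ols("d_happiness ~ d_log_gdp + C(year_band) + initial_happiness",
              data=waves).fit().summary())
```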

When one reflects on the list of countries used in the analysis, though, it is clear that the outcome of the bet will have had little to do with the true relation between income and happiness. It will have hinged on hidden aspects of the data. For instance, the Australian World Value survey in 1995 was run differently from the 2005 version. Hence the big drop in Australian happiness you see over this period in this data does not in fact show up in other Australian data (like the HILDA), so one suspects some change in the data-gathering to be responsible for it. Indeed, the level of Australian happiness in this data is markedly below the level found in the HILDA (where it is almost 8.0).

Similarly, the big increase in Japanese happiness over this period doesn’t show up in other Japanese data either, and so probably has something to do with changes in how the survey was run there. Such changes can relate to the months in which the surveys were held, the precise words used for the happiness question, the questions preceding it, the cities in which the survey was run, how the survey was administered (face-to-face or by telephone), etc.

So I may have gotten lucky and won the bet, but one cannot see the outcome as decisive evidence that income and happiness have no long-run relation within rich countries. The data for the 2010 post-GFC wave might well show the opposite!