Ethical failures: Where they come from and how to address them

A review of

Gentilin, Dennis. The Origins of Ethical Failures. Lessons for Leaders. A Gower Book. Routledge (2016). ISBN: 978-1-138-69051-6

Ethical failures were in the press big-time in 2017. Prominently, creeps like Harvey Weinstein, James Toback, Bill Cosby, Larry Nassar, etc. were accused of sexual transgressions of various sorts (and in some cases admitted them to varying degrees). The sheer number of accusations leaves little doubt that, in their substance, they are correct. What was truly shocking, on top of the specifics of many of the allegations, was that some of these transgressions went on for literally decades, that many people seem to have known about them for years (if not decades), and that the perpetrators got away with them for an unconscionably long time. It is clear that organizational failures must have played a major role. This was implicitly acknowledged in the name of the Royal Commission (RC) into Institutional Responses to Child Sexual Abuse, which was established under the Gillard government in 2013 and reported all 17 volumes of its findings on December 15, 2017. The RC also laid out recommendations.

It did not really come as a surprise that once again massive organizational failure, in particular of the Catholic Church, was identified as a major finding. It did not come as a surprise because for years there had been a never-ending stream of trials, not just in Australia, suggesting just that, and providing plenty of evidence that the Catholic Church, in its (continued) belief that it is a law and world unto itself, had engaged for decades in what might generously be called economy with the truth.

Two weeks earlier, after another year of numerous reports of questionable practices, and record profits of the four major banks, the Turnbull government saw itself forced (by its own backbenchers, no less) to announce that it would establish an RC into misconduct in the banking industry. It was a step that Labor and the Greens had urged for more than a year. (The recent draft report of the Productivity Commission has made clear that some such RC is indeed overdue.) The Turnbull government’s acceptance of something that it could not prevent, and its subsequent attempts to undermine the effectiveness of the RC by simultaneously widening its scope and imposing an essentially unrealistic timeline, demonstrate, at the minimum, the kind of myopic opportunism that Australian politics seems drenched in.

Having graduated in 2001, Gentilin became a member of the FX trading desk of the National Australia Bank (NAB), one of the four major banks. In 2004 that trading desk became involved in a trading scandal that rocked NAB and led, within a couple of weeks, to the resignation of both its chairman and CEO, the reconfiguration of the board of directors, and significant financial and reputational losses. Gentilin was the young trader who blew the whistle. Unlike many other whistleblowers (who are typically harassed out of the organizations on which they blew the whistle), he stayed with NAB for more than a decade, as head of the institutional sales team and a member of the corporate strategy team, before he resigned in January 2016 to found Human Systems Advisory, a name meant to be programmatic. The foreword of his book was written by the current chairman of NAB, who states: “There are no simple answers in this book. But there are answers. And there are important truths, supported by deep and rigorous analysis. These should be of interest to all corporate leaders, in both executive and non-executive roles.” (p. xvi). One such truth, says the chairman – apparently quoting Gentilin – is that “leaders must strive to articulate a meaningful social purpose for their organizations that is underpinned by a virtuous set of values.” That’s quite a mouthful, and the impending Royal Commission on the banking system, like the recent draft of the related Productivity Commission report, suggests strongly that the major banks (which tried at first to fight off the RC until they realized that the fight had been lost) continue to have trouble understanding that particular message.

Below, I am interested in both the depth and rigor of the analysis and the truths that Gentilin establishes.  I am also interested in the implementability of the measures that he proposes.

In his Introduction, Gentilin states that he draws his evidence from “behavioural business ethics” which he defines as the intersection of business ethics and psychology (p. 5). While he is credited on his website with a degree in psychology, Gentilin makes clear that he wrote this book as a “practitioner” rather than “an academic, a philosopher or an ethicist” (p. 4). He does so in four chapters that explore “The Power of Context”,  “Group Dynamics”,  “Our Flawed Humanity”, and “What We Fail to See”.  A conclusion follows.

Gentilin relies heavily on summaries of articles from psychology that explore human nature and the circumstances under which nice behaviour might turn into, well, not so nice behaviour of different shades. While there is a brief, perfunctory nod (p. 3) to the replicability crisis that has afflicted psychology, throughout the book there is little discussion of relevant laboratory design and implementation issues such as incentivisation, experimenter expectancy effects, external validity, and so on (Hertwig & Ortmann 2001; Ortmann 2005). Never mind the fact that much of the evidence on unethical behaviour paraded in this book has been produced with deceptive practices, arguably an unethical practice itself (Ortmann & Hertwig 2002; Hertwig & Ortmann 2008). Nor is there any discussion of statistical issues such as (lack of) power computations, p-hacking, publication biases, and what not.

Claiming that “explanations of unethical conduct rarely give proper consideration to the system within which people operate … (and) tend to focus on identifying ‘bad apples’ or ‘rogues’” (p. 7), in Chapter 1, Gentilin explores how the environment can impact human (mis)behaviour and, on balance, concludes that “the ‘barrel’ within which the ‘bad apples’ operate must be given as much (if not more) attention as the ‘bad apples’ themselves.” (p. 8). Before he reviews the lessons to be learned from the Stanford Prison Experiment, Gentilin reviews literature on social norms and how they affect behaviour.  The well-known Cialdini et al. littering and Mazar et al. (dis)honesty studies are paraded, as is an interesting lab study by MacNeil & Sherif (1976) in which the authors demonstrate generational transfer of (questionable) practices, and a related field study by Pierce & Snyder (2008). Distinguishing between descriptive (“derived from what is”) and injunctive (“derived from what ought to be”) norms, Gentilin documents cases where unethical descriptive norms tear to smithereens injunctive ones. He relates this to his reading of what led to the FX trading scandal at the NAB: “young people in particular are vulnerable and endorsing immoral social norms … In the FX trading scandal that engulfed the NAB, immoral social norms emerged that promoted excessive risk taking and misstating the true value of the currency options portfolio.” (pp. 18 – 19). This is hardly surprising, and indeed Gentilin mentions the LIBOR rate-fixing scandal and the professional cycling drug-taking as other high-visibility events. He could have also mentioned the lending practices of major US banks before the housing and mortgage crises (e.g., Gjerstad & Smith 2014), the despicable transgressions at Abu Ghraib, or zillions of other real-world examples.  
After having reviewed the Stanford Prison Experiment in some detail, Gentilin identifies two important take-home lessons from it: first, a specific context “can cause people of sound character to behave in totally uncharacteristic and inappropriate ways” (p. 24) and, second, the emergence of such contexts is possible only when leaders allow it. Drawing on more experimental evidence (such as Bandura’s experiments on children imitating adults’ behaviour), he suggests the obvious parallel for what happened at NAB: “Just as the adults were the role models in Bandura’s experiments, leaders that control the bases of power are the role models in large organizations. For these leaders there will inevitably appear some key moments where, through their actions, choices and decisions, they will send powerful messages that shape the ethical climate for their organizations and types of social norms that emerge. … how a leader responds in these ‘defining moments’ shapes the ‘character of their companies’.” (p. 30). Only leaders who are veritable role models will be able to prevent formal mechanisms from being eroded by the informal mechanisms that hammer away at them. Again, Gentilin suggests that such failure of leadership is what happened at NAB and at Barclays Bank during the LIBOR rate-fixing scandal, and for that matter in the phone-hacking scandal that led to the demise of News of the World. Gentilin concludes the chapter with a list of “ten questions for senior leaders within any organization” (pp. 37–38). Presumably, these questions are unlikely to be answered in an honest manner where it matters. It is the evidence accumulated in this chapter, but also elsewhere (Dana et al. 2007 comes to mind, or Miller & Ross 1976), that suggests as much.

Gentilin starts off Chapter 2 with a Nietzsche quotation that sets the stage: “Madness is the exception in individuals but the rule in groups.” (p. 45). The basic point made is that group membership can reinforce – cue social media echo chambers – the drifting away from injunctive norms to descriptive ones. Writes he: “In my experience at the NAB, dysfunctional group dynamics in the currency options business played a significant role in promoting the emergence and maintenance of immoral social norms and unethical behaviour [such as flagrant and persistent limit breaches or excessive risk taking, AO]”. To buttress the case, Gentilin presents Milgram’s 1974 obedience studies, as well as Gina Perry’s recent critique of them (Perry 2012) which, in light of considerable supporting evidence for the original studies (e.g., Haslam et al. 2014), he dismisses in its substance. He then highlights what we learn from Milgram’s inclusion of a variation that drew on the group paradigm. That motivates a discussion of the conformity experiments through which Asch (1956) tried to identify the conditions under which participants would contradict a majority. In this context, Gentilin also briefly discusses a between-subjects study by Woodzicka & LaFrance (2001), who had a male interviewer ask female applicants inappropriate questions. The basic result was that 6 out of 10 subjects claimed they would object (hypothetically) but none in the control group refused to answer in a “real-life” scenario. That seems the kind of pattern that allowed the Weinsteins of this world to get their way for too long. Only in the case of Weinstein and similar assholes (here used in the technical sense of Sutton 2007), the stakes were arguably considerably higher. People’s lack of willingness to stand up and be counted is, unfortunately, so widespread that it is well documented, and it is a recurrent theme of great movies such as Hidden Figures.
Gentilin makes clear that, based on his experience at NAB, “facing in the fork in the road in a hypothetical scenario is vastly different from facing it in reality.” (p. 67) He also states, “I am personally sceptical of other research into whistleblowing that focuses on ascertaining the types of personality or dispositional characteristics that may predict whether an observer of wrongdoing will take action and report it. …This line of enquiry fails to properly consider the power of the situation.” (p. 67). Gentilin concludes the chapter with another list of “ten questions for senior leaders (and followers) within any organization” (p. 73). I doubt that these questions will be answered in an honest manner where it matters, for essentially the exact reason that Gentilin has identified in the chapter.

In Chapter 3, Gentilin – notwithstanding his, in my considered opinion, sensible stand on the relative importance of context and dispositional characteristics – dives into “our flawed humanity”. Programmatically, he starts with an epigraph featuring a quotation from Kant, “Out of the crooked timber of humanity, no straight thing has ever been made.” (p. 80). Gentilin then tries to answer questions such as “Are Humans Self-Interested?”, cursorily sampling evidence from experimental economics, neuroscience, and evolutionary biology. Predictably he concludes that this research shows that “human nature (is) far different from the one suggested by the axiom of self-interest” (p. 86), though he qualifies the statement with the caveat that we are not always altruistic and cooperative. This alleged “paradigm shift” (p. 87) is, unfortunately, the major bone of contention between those marketing Behavioural Economics (and often shamelessly benefitting from it) and those doing Experimental Economics, and I believe that the social-preferences literature that has created it has as much merit as the intranasal (IN) oxytocin, ego depletion, and power-pose research that is now, for all I can see, thoroughly debunked. Better not plan your life, or organization, on such flimsy evidence. From an evidence point of view, and also a theory point of view (e.g., the important insights stemming from repeated game situations), this chapter is the weakest. Gentilin’s sampling of the evidence strikes me as scattershot and unsystematic. After discussions of issues such as power and its corrupting influence, and the fear and awareness of our own mortality that feed into it, Gentilin concludes the chapter with a list of “eleven questions for senior leaders within any organization” (p. 118). I fear these questions, again, are unlikely to be answered in an honest manner where it matters.

In Chapter 4, Gentilin starts with a quotation from Kahneman’s best-seller Thinking, Fast and Slow: “We can be blind to the obvious, and we can also be blind to our blindness.” This double-whammy, a variant of the Dunning–Kruger effect, is why questions to senior leaders are unlikely to be answered honestly and self-critically. After a brief mention of another persistent bone of contention – the System 1 / System 2 delineation – and our alleged propensity to rely too much on the automatic System 1, which presumably makes us liable to various biases (in this chapter loss aversion, framing, overconfidence, moral disengagement, euphemistic labelling), Gentilin lays out the slippery-slope argument that in his view was at the heart of the events that led to the NAB trading scandal: “The FX trading incident at the NAB classically illustrated the slippery slope in action. Not only did ethical standards erode over time, but the seriousness of the ethical transgressions accelerated …” (p. 130). Laboratory evidence is provided to make that point (e.g., the interesting Gino & Bazerman 2008 study) along with field evidence from the NAB case (p. 131). An intervention discussed here is to give people more time and essentially get them to break out of their System 1 mode: “There are now numerous studies that illustrate how providing a person with more time whenever they are confronted with an ethical dilemma tends to lead to a more virtuous decision being made.” (pp. 146-7). I have serious doubts about the relevance of, say, the Good Samaritan study mentioned here for real-world decision making, and I suspect that a theoretical grounding in organizational economics and repeated game theory would really help to address the challenges that organizations and their leaders face.

Gentilin concludes his book with a plea for more (business ethics) education, a call for the installation of Chief Ethics Officers, and more Lessons for Leaders. He wants business schools to challenge their students intellectually, emotionally, and spiritually. That sounds like something straight out of a high-gloss advertisement such schools produce. The reality, however, of Australian business schools (and undoubtedly business schools everywhere) is that they are rarely intellectually demanding. Their inability to challenge their students emotionally and spiritually is shown effectively by their treatment of casuals and staff. What business schools typically do not have are, in particular, truly independent ethics officers, and HR departments, that could hold the feet of currently widely unaccountable senior leadership to the fire. So, while the idea of a Chief Ethics Officer, who has “a genuine ‘seat at the table’” (p. 161), and is independent, able to freely raise matters of concern, and able to freely “speak truth to power” (p. 161), is conceptually on the money, realistically it is very unlikely to be implemented any time soon, as are truly independent HR departments. As to Lessons for Leaders, Gentilin wants them to be virtuous in the sense of having some community-oriented values.  There is a lot of wishful thinking on display here (e.g., that others are willing to take the same risks that he took in 2004) but I think, after everything we learned through the flurry of recent examples mentioned at the beginning of this review, there is not much reason for hope. Even something that should have been uncontroversial, such as the Royal Commission on banking, and the way it came about, demonstrates that common ground is hard to find and cannot be relied on. 
I fear much harder thinking will be needed to address ethical failures and I fear some strategies will be of the innovative kind provided by the #MeToo campaign that not only has brought down some true monsters but is likely to have changed power and gender relations in the working world irreversibly.

In summary then, Gentilin tackles arguably the most important issue of our times – ethical failures within organizations and for that matter ethical failures more generally. His book is strongest where he illustrates the emergence of his insights with examples from his own NAB 2004 experience. His illustration of the various arguments he makes with evidence from behavioural business ethics is wanting. As pointed out above, to his credit Gentilin himself – although unaware of important methodological debates among psychologists as well as between psychologists and economists – grasps intuitively the lack of external validity of some of the evidence that he presents, and it is clear that his NAB 2004 experience has been a good guide for identifying which laboratory evidence has some external validity, and which does not. I think the book could be considerably improved with a more even-handed and complete assessment of the evidence from psychology and other social sciences (and here in particular economics) as well as an additional focus on incentive-compatible organizational design. To rely on business ethics education in business schools (whether in Australia or elsewhere) or on the community-orientedness of business leaders is just not going to cut the mustard, as the widely perceived need for the Royal Commission into the banking system demonstrates.

Having recently interacted with NAB, once again, on mortgage-related issues, I have no doubt that NAB culture is pervaded with everything but a meaningful social purpose that is underpinned by a virtuous set of values (e.g., the loan officer I dealt with did everything to prevent me from comparison shopping, and essentially gave me misleading information about the rates that I would be getting), and I have little doubt that the same applies to each of the other three major banks. There are reasons why the major banks in Australia have had outsized profits and some of the highest returns on equity in the world. The recent draft of the related Productivity Commission report spells them out.

 

I appreciate Dennis Gentilin’s comments on a draft of this review.

 

EU plans for VAT taxation are doomed to fail. Again.

Taxation is the potential downfall of the EU as an institution. The reason is that within the EU, several member states are making money from the tax evasion in other member states, a situation akin to having a wife slowly murdering her husband with poison. Unless this stops, a divorce becomes inevitable.

Luxembourg, the Netherlands, Ireland, Liechtenstein, Austria, London, and several others are at it: they help large corporations avoid their taxation responsibilities. They either make deals that allow companies to hide their tax obligations, have idiosyncratic definitions under which there are fewer tax obligations, or provide re-labelling services such that head offices can be a mere post-box, etc.

These tax-avoidance enablers have also systematically frustrated all attempts over the last 30 years to harmonise taxation and reverse the damage they have done to the integrity of the other nation states in the EU. Whenever the issue of tax evasion was in the public eye, for instance during the GFC, they stalled by insisting tax evasion should be solved internationally and should include all other tax havens. Predictably, these were impossible demands. They have also made life difficult inside committees and government forums.

The EU bureaucracy has just put out a new set of proposals regarding VAT on large international corporations (like Google and Amazon), impact evaluated and all. I have read them and predict they will not be implemented, nor would they work anyway.

For one, the EU commission has no power to enforce new tax rules, and these proposals are in a long line of ignored prior proposals. To become law they would need the unanimous backing of all EU members. They hence need the cooperation of about 5 countries that would lose billions if they complied. Fat chance, even with Brexit reducing the political clout of London.

Secondly, the proposals repeat the main mistake of the past: they advocate a rules-based administrative system of taxation which is cumbersome, highly complex, and easy to game. I explain how over the fold.

Adverse Action Lawyer wanted in Frijters versus UQ case

I am seeking a lawyer to run an Adverse Action case connected to the recent Fair Work Commission verdict that found systematic breaches of procedures and procedural fairness in the University of Queensland’s actions against me following my research on racial attitudes in Brisbane. I first raised these breaches in late 2013, but they were never addressed, and lots of new ones were added to them as the case dragged on. The VC of the university was personally informed of these breaches in April 2014, yet publicly denied in February 2015 that there was anything wrong with UQ’s actions. He was informed again in March 2015 and has consistently failed to rectify the breaches of procedure brought to his attention. I wish to bring an Adverse Action case to claim back my considerable costs.


I expect the case to be worth at least a few hundred thousand dollars in terms of damages (legal cost, value of my time, etc.), and for it to be potentially one of many others because the FW case uncovered widespread breaches of procedures in UQ’s handling of misconduct cases. So there might well be many others who are now looking to bring Adverse Action cases against UQ.

I offer a pay-for-success contract wherein the first part of any awarded damages would go to the lawyer, but after a threshold payment I want 50% to go to the successful lawyer and 50% towards Vanavil, which is a school for orphaned victims of the 2004 Tsunami flood in India. I feel that helping the poorest Indians will go some way to nullify the damage that the managers of UQ did when they suppressed evidence of adverse treatments of Indians (and Indigenous peoples) in Brisbane and made it harder to research these things in general. And I want to feel that I haven’t wasted my time these last three years on fighting mindless bureaucracies, but that my efforts ended up helping people in need.

Negotiations on the offered contract are possible. Please contact me by email if you are interested or can suggest a good adverse action lawyer ( p dot frijters AT uq dot edu dot au).

[Ps. The VC of UQ was still making inappropriate claims last week on the UQ media about his lack of involvement and has refused to retract his claims this last week when I pointed his errors out to him.]

Best practice governance of top academic departments

Over the last 15-20 years academic school meetings have gone from rambling and unstructured brawls to dull “executive infomercials”. The former led to marathon meetings. The current model has led to a middle-management culture that often does not take advantage of the very valuable specialist skills of the talented, highly trained (and experienced) scholars in the department. Nor does it allow for reasonable checks and balances on the powers of the executive, something that is vital for the management of any group of academics.

Our school at the ANU has adopted a system informed both by ANU’s academic board and by how the world’s top economics departments operate. The following is the template that we have adopted and which has replaced the executive committee.

 

School Meetings

1. Members of the School Meeting are all academic staff at level B and above who are engaged equal to or greater than 50% FTE.  Observer status is given to all other staff in the School.
2. School Meetings are chaired by the Speaker of the School Meeting, who is chosen by the Head of School. The Speaker is a Member of the School Meeting who is not a Professor nor a Chair of one of the Committees nor the Head of School.
3. School meetings shall be held at least twice every semester during the academic year. Additional meetings may be called by the Head of School  or at the written request of twenty-five percent of the Members of the School Meeting. One-half the Membership of the School Meeting constitutes a quorum.
4. Meetings shall be conducted in accordance with Robert’s Rules of Order. Minutes of the meeting and reports submitted to the Meeting shall be kept and made available to the Members.
5. The initial Agenda for regular School Meetings shall be prepared by the Speaker and circulated in written form at least five calendar days prior to the meeting. The initial Agenda includes the items (i) Apologies; (ii) Confirmation of Minutes of Previous Meeting; (iii) Reports of Head of School and the Chairs of Committees; (iv) Workplace Health and Safety; (v) Other Business; (vi) Items added by the Head of School.
6. Additional items may be suggested by individual Members and, at the discretion of the Head of School, be added to the agenda for the forthcoming meeting. Alternatively, items may be placed on the agenda by written petition of twenty-five percent of the Members. All such additions to the agenda must occur at least three days prior to the meeting.
7. A simple majority of those present and those sending proxy votes and/or absentee ballots shall decide an issue arising from an item in the initial agenda or an item added at the discretion of the Head of School.
8. A majority above 60% of those present and those sending proxy votes and/or absentee ballots shall decide an issue arising from an agenda item not described in 7.
9. A majority above 70% of those present and those sending proxy votes shall decide a motion not arising from an issue on the agenda.
10. Proxies may be given from one member to another. Proxies are given in written form to the other member, and a copy must be received by the Speaker three calendar days prior to the meeting. Proxies cannot be specific to particular items on the agenda nor to a particular motion/amendment not arising from the agenda.
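
To make the three voting thresholds in rules 7 to 9 concrete, here is a minimal Python sketch. It is an illustration only and not part of the adopted template: the names `Origin`, `has_quorum` and `motion_passes` are hypothetical, and the sketch assumes that the denominator is the number of votes counted (those present plus valid proxies or absentee ballots), that a simple majority means strictly more than half, and that quorum is counted on Members present. The rules as written do not say how abstentions are treated, so that is left out.

```python
from enum import Enum
from fractions import Fraction


class Origin(Enum):
    """Where the issue being voted on came from (hypothetical labels)."""
    INITIAL_OR_HEAD = 7   # rule 7: initial agenda, or added by the Head of School
    PETITION = 8          # rule 8: any other agenda item, e.g. added by petition
    OFF_AGENDA = 9        # rule 9: motion not arising from the agenda


# Share of counted votes that must be EXCEEDED for the motion to carry.
THRESHOLDS = {
    Origin.INITIAL_OR_HEAD: Fraction(1, 2),
    Origin.PETITION: Fraction(3, 5),
    Origin.OFF_AGENDA: Fraction(7, 10),
}


def has_quorum(members_present: int, total_members: int) -> bool:
    """Rule 3: one-half of the Membership constitutes a quorum."""
    return 2 * members_present >= total_members


def motion_passes(origin: Origin, votes_for: int, votes_counted: int) -> bool:
    """True if votes_for is strictly above the threshold for this origin."""
    if votes_counted <= 0:
        return False
    # Fractions avoid floating-point trouble at exactly 60% or 70%.
    return Fraction(votes_for, votes_counted) > THRESHOLDS[origin]
```

With 20 votes counted, for example, 12 votes in favour (exactly 60%) would not carry a petition item under this reading, while 13 would.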

 

 

 

Did the University of Queensland suppress a study?

Possibly, and so I am putting the question out there in the hope that a journalist might investigate.

But first some context. In 2013, Redzo Mujcic and Paul Frijters (a frequent blogger here) published a study demonstrating unconscious discrimination on the part of bus drivers in Brisbane. Today, Ian Ayres took to the New York Times to promote the study’s findings.

As they describe in two working papers, Redzo Mujcic and Paul Frijters, economists at the University of Queensland, trained and assigned 29 young adult testers (from both genders and different ethnic groups) to board public buses in Brisbane and insert an empty fare card into the bus scanner. After the scanner made a loud sound informing the driver that the card did not have enough value, the testers said, “I do not have any money, but I need to get to” a station about 1.2 miles away. (The station varied according to where the testers boarded.)

With more than 1,500 observations, the study uncovered substantial, statistically significant race discrimination. Bus drivers were twice as willing to let white testers ride free as black testers (72 percent versus 36 percent of the time). Bus drivers showed some relative favoritism toward testers who shared their own race, but even black drivers still favored white testers over black testers (allowing free rides 83 percent versus 68 percent of the time).

The study also found that racial disparities persisted when the testers wore business attire or dressed in army uniforms. For example, testers wearing army uniforms were allowed to ride free 97 percent of the time if they were white, but only 77 percent of the time if they were black.
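
To put the quoted percentages on a common footing, here is a small Python sketch that computes the percentage-point gap and the white-to-black ratio for each comparison. It uses only the rounded figures from the passage above, not the study's data; the labels and the function name `gap_and_ratio` are mine and purely illustrative.

```python
# Free-ride rates in percent (white testers, black testers), as quoted above.
RATES = {
    "all drivers": (72, 36),
    "black drivers": (83, 68),
    "testers in army uniform": (97, 77),
}


def gap_and_ratio(white_pct, black_pct):
    """Return (percentage-point gap, white-to-black ratio)."""
    return white_pct - black_pct, white_pct / black_pct


for label, (white, black) in RATES.items():
    gap, ratio = gap_and_ratio(white, black)
    print(f"{label}: gap {gap} points, ratio {ratio:.2f}")
```

The first comparison gives the "twice as willing" ratio of exactly 2; in the other two the ratio is smaller, but the gaps are still 15 and 20 percentage points.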

Wow. That’s quite a result and certainly the sort of thing we want our social scientists to be doing. No wonder Ayres raised it in the NYT. I did wonder, therefore, why I hadn’t heard much about it.

A possible answer came from Ian Ayres in a follow-up post at Forbes.

Professors Mujcic and Frijters deserve our thanks for authoring a study that is not only illuminating about what white privilege means.  But their employer, the University of Queensland, has not thrown them a parade.  After the City of Brisbane complained that the study encouraged fare evasion, the University initiated a complaint process against Professor Frijters and has ordered the authors to suppress this important paper.  Blessed are those who are persecuted for righteousness sake.  Instead of being persecuted, the authors should be praised for offering us a model for civil rights testing in the new millennium.

Now that is some allegation. If it is true it is shocking to an incredible degree. Not just that the City of Brisbane complained to the University but that the University, my alma mater, actually went so far as to suppress a paper.

I did a quick, though hardly investigative, search to see what this might all be about but didn’t come up with anything. But I think a response from UQ, at the very least, is required.

An MYEFO mystery: what’s with the resource tax?

It’s the time of the Mid-Year Economic and Fiscal Outlook (MYEFO) and we’re told that we’re about 11 billion deeper in the red this financial year than we thought, with the treasurer blaming the falling iron ore price and reduced wage growth. I have gone over the MYEFO documents (which are an exercise in obfuscation if ever I saw one), found that lower wage growth and the lower iron ore price would ‘only’ cost us 2.3 billion each in this financial year (2014-2015), noted that this was far short of the 11 billion headline, and thus went looking for the ‘real story’.

This threw up the mystery of the resource tax. Here is what it says on table 3.2:

Table 3.2: Impact of Senate on the Budget (underlying cash balance)

                                             Estimates         Projections
                                         2014-15  2015-16   2016-17  2017-18    Total
                                              $m       $m        $m       $m       $m
Impact of decision taken as part of Senate negotiations(a)
  Repeal of the Minerals Resource Rent
  Tax and related measures                -1,684   -2,334    -1,670     -947   -6,634

which seems to mean that the repeal of the minerals resource rent tax (and related measures) is costing us around 2 billion per year. Yet, in the ‘Overview Part’, the MYEFO says “The repeal of the Minerals Resource Rent Tax and other related measures will save the budget over $10 billion over the forward estimates and around $50 billion over the next decade.”

What is going on?

Update (thanks Chris Lloyd): it seems to be a language issue. Part of the story seems to be that the MYEFO is counting the repeal of the mining tax, which was an election promise, as something the Senate inflicted on the budget, so the 2 billion a year is ‘revenue foregone’. So the MYEFO is blaming the Senate for the outcome of an election promise, using an odd formulation to say that the repeal will save us 50 billion when it seems to imply it would cost us 50 billion. Weird.

How to lie with statistics: the case of female hurricanes.

I came across an article in PNAS (the Proceedings of the National Academy of Sciences) with the catchy title ‘Female Hurricanes are deadlier than male hurricanes’. It is doing the rounds in the international media, with the explicit conclusion that our society suffers from gender bias because it does not sufficiently urge precautions when a hurricane gets a female name. Intrigued, and skeptical from the outset, I made the effort of looking up the article and took a closer look at the statistical analysis. I can safely say that the editor and the referees were asleep for this one as they let through a real shocker. The gist of the story is that female hurricanes are no deadlier than male ones. Below, I pick the statistics of this paper apart.

The authors support their pretty strong claims mainly on the basis of historical analyses of the death toll of 96 hurricanes in the US since 1950 and partially on the basis of hypotheticals asked of 109 respondents to an online survey. Let’s leave the hypotheticals aside, since the respondents for that one are neither representative nor facing a real situation, and look at the actual evidence on female versus male hurricanes.

One problem is that the hurricanes before 1979 were all given female names; the naming convention changed after 1978, so that male and female names have alternated since. Since hurricanes have become less deadly as people have become better at surviving them over time, this artificially makes the death toll of the female ones larger than that of the male ones. In their ‘statistical analyses’ the authors do not, however, control adequately for this, except in end-notes where they reveal that most of their results become insignificant when they split the sample into a before and an after period. For the combined data though, the raw correlation between the masculinity of the names and the death toll is of the same order as the raw correlation between the death toll and the number of years ago that the hurricane occurred (ie, 0.1). Hence the effects of gender and years are indeed likely to come from the same underlying improvement in safety over time.

Using the data of the authors, I calculate that the average hurricane before 1979 killed 27 people, whilst the average one after 1978 killed 16, with the female ones killing 17 per hurricane and the male ones killing 15.3 per hurricane, a very small and completely insignificant difference. In fact, if I count ‘Frances’ as a male hurricane instead of a female one, because its ‘masculinity index’ is smack in the middle between male and female, then male and female hurricanes after 1978 are exactly equally deadly with an average death toll of 16.
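To make the era problem concrete, here is a minimal sketch of the split comparison described above. The records are invented toy numbers, not the paper's data, and the names `TOY_HURRICANES` and `mean_deaths` are made up for this illustration. The toy numbers are chosen so that male and female names are equally deadly after 1978, yet the pooled female average still comes out higher simply because every early (and deadlier) hurricane carried a female name.

```python
from statistics import mean

# Invented toy records (year, name_gender, deaths). NOT the paper's data.
TOY_HURRICANES = [
    (1955, "F", 30),  # before 1979 every hurricane had a female name
    (1965, "F", 24),
    (1985, "F", 16),
    (1995, "M", 16),
    (2005, "F", 14),
    (2010, "M", 14),
]

def mean_deaths(records, gender=None, start_year=None, end_year=None):
    """Average death toll over the records matching the optional filters."""
    selected = [
        deaths
        for year, name_gender, deaths in records
        if (gender is None or name_gender == gender)
        and (start_year is None or year >= start_year)
        and (end_year is None or year <= end_year)
    ]
    return mean(selected)

# Pooled over both eras, female names look deadlier ...
pooled_female = mean_deaths(TOY_HURRICANES, gender="F")
pooled_male = mean_deaths(TOY_HURRICANES, gender="M")

# ... but within the era in which both genders occur, the gap is gone.
late_female = mean_deaths(TOY_HURRICANES, gender="F", start_year=1979)
late_male = mean_deaths(TOY_HURRICANES, gender="M", start_year=1979)

print(pooled_female, pooled_male)  # 21 versus 15 in this toy example
print(late_female, late_male)      # 15 versus 15 in this toy example
```

The pooled gap in the toy example is produced entirely by the two early records, which is the same mechanism as the one suspected in the real data.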

It gets worse. Even without taking account of the fact that the male hurricanes are new ones, the authors do not in fact find an unequivocal effect at all. They run 2 different specifications that allow for the naming of the hurricanes and in neither do they actually find an effect unequivocally in the ‘right direction’ (their Table $3).

In their first, simple specification, the authors allow for effects of the severity of a hurricane in the form of the minimum air pressure (the lower, the more severe the hurricane) and the economic damage (the higher, the more severe the hurricane). Conditional on those two, they find an insignificant effect of the naming of the hurricanes!

Undeterred and seemingly hell-bent to get a strong result, the authors then add two interaction terms between the masculinity of the name of the hurricane and both the economic damage and the air pressure. The interaction term with the economic damage goes the way the authors want, ie hurricanes with both more economic damage and more feminine names have higher death tolls than hurricanes with less damage and male names. That is what their media release is based on, and their main text makes a ‘prediction graph’ out of that interaction term.

What is completely undiscussed in the main text of the article however is that the interaction with the minimum air pressure goes the opposite way: the lower the air pressure, the lower the death toll from a more feminine-named hurricane! So if the authors had made a ‘prediction graph’ showing the predicted death toll for more feminine hurricanes when the hurricanes had lower or higher air pressures, they would have shown that the worse the hurricane, the lower the death toll if the hurricane had a female name!

The editors and the referees were thus completely asleep for this pretty blatant act of deception-by-statistics. Apparently, one can hoodwink the editors of PNAS by combining the following tricks: add correlated interaction terms to a regression of which one discusses only the coefficients that fit the story one wants to sell; then make a separate graph out of the parameter one needs in the main text, whilst putting technical-sounding information in parentheses to throw editors, reviewers, and readers off the scent.

And the hoodwinking in this case is not small either. In order to accentuate what really is a non-result, the authors in the main text claim that “changing a severe hurricane’s name from Charley (MFI=2.889, 14.87 deaths) to Eloise (MFI=8.944, 41.45 deaths) could nearly triple its death toll.” This, whilst in the years since 1979 the average death toll for their included hurricanes is 16 for both ‘female hurricanes’ and ‘male hurricanes’ (own calculations)! The authors conveniently forgot to mention in their dramatic result that Charley would have had to have been a hurricane that did immense economic damage but that had a very high minimum air pressure, ie was actually a very weak hurricane. Only for such an ‘impossible hurricane’ would their own model predict the increase in deaths from a female name. Put differently, I could have claimed that if the hurricane was very strong in terms of low air pressure, changing the name from Charley to Eloise would have halved the death toll!
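The opposing pull of the two interaction terms can be written out in a stylised form. This is only a sketch: it assumes a log-linear index of the kind used in count-data regressions, and the symbols are my own shorthand (F for the femininity of the name, P for minimum air pressure, D for economic damage), with generic coefficients rather than the paper's estimates.

```latex
\begin{align*}
\log \mathbb{E}[\text{deaths}]
  &= \beta_0 + \beta_1 P + \beta_2 D + \beta_3 F
     + \beta_4 (F \times P) + \beta_5 (F \times D), \\
\frac{\partial \log \mathbb{E}[\text{deaths}]}{\partial F}
  &= \beta_3 + \beta_4 P + \beta_5 D .
\end{align*}
```

With the signs described above (both interaction coefficients positive), the effect of a more feminine name grows with damage D but shrinks as the pressure P falls. A genuinely severe hurricane has high D and low P, so the two terms work against each other; the name effect is only large when D and P are both high, which is the ‘impossible hurricane’ case.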

The authors also quite willingly pretend to have found things they have not in fact researched. They thus write “Feminine-named hurricanes (vs. masculine-named hurricanes) cause significantly more deaths, apparently because they lead to a lower perceived risk and consequently less preparedness” and the conclusions even speak of “gender biases”! Where do they try and measure this supposed bias in actual preparations? You guessed it, nowhere. PNAS should really clean up its act and not allow this sort of article, with its fairly blatant statistical artefacts, to slip through the cracks.

Let me explain the trickery in a bit more depth for the interested reader: air pressure and economic damage are highly related (the correlation is apparently -0.56), which means that one gets a strongly significant interaction between femininity and economic damage only because one simultaneously has added the interaction with minimum air pressure. One then talks about the interaction that goes the way one wants and happily neglects to mention the other one. And one needs both interactions at the same time to get the desired result on the interaction between the names and economic damage: without the interaction with minimum air pressure, what you get is a whole shift upwards of the male death prediction and a loss of significance on the interaction term with economic damage. You see this in the ‘additional analyses’ run by the authors, in very small font after the conclusions, wherein the whole thing becomes insignificant for the first period and the reduced coefficient for the later period on the interaction with air pressure coincides with a halving of the coefficient on the interaction with economic damage as well. Hence, without including both interactions you would probably get that the female hurricanes are predicted to be less deadly than the male ones when the economic damage is small and more deadly when the damage is large (to an insignificant extent). So you need the interaction that is almost invisible in the main text and the conclusions to ‘get’ the result that the headlines are based on.

There is another, even more insidious trick played in this article. You see, with only 96 hurricanes to play with, which really only includes 26 to 27 ‘male’ hurricanes, the authors are asking rather a lot from their data in that they want to estimate 5 parameter coefficients, three of which are based on names. If you then only use a simple indicator for whether or not a hurricane has a male name, you have the problem that you don’t have enough variation to get significance on anything.

So what did the authors do? Ingeniously, they decided to increase the variation in their names by having people judge just how ‘masculine’ their names were. Hence many of the ‘female’ hurricanes were ‘re-badged’ as ‘somewhat male hurricanes’. So the female hurricanes of the pre-1979 era had an average “masculinity index” of 8.42, whilst those of the new post-1979 era had an average of 9.01 (on this scale a higher value means a more feminine name, as the Charley and Eloise values quoted above show). Simply put, according to the authors the female hurricanes ‘of old’, which were of course more deadly as they occurred earlier, were also more masculine, contributing to the headline ‘results’.

Supposedly masculine female names included “Ione”, “Beulah”, and “Babe”. And who judges whether these are masculine names? Why, apparently this was done by 9 ‘independent coders’, by which one presumes the authors meant colleagues sitting in the staff room of their university in 2013! Now, even supposing that they were independent, one cannot help but notice that the coders will have been relatively unaware of the naming conventions in the 1950s and 1960s. How is someone born in 1970 sitting in a staff room in 2013 supposed to judge how ‘masculine’ the name ‘Ione’ was perceived to be in 1950? These older names probably just sounded unusual and hence got rated as ‘more probably male’. Similarly, it is beyond me why ‘Hugo’ would be rated as less masculine than ‘Jerry’ or ‘Juan’.

The authors’ own end-notes called ‘additional analyses’ indeed show that you get insignificant results without this additional variation begotten from making the names continuous. So the authors need to fiddle with the names of the hurricanes, pool two eras together whilst not controlling for era, and add two strongly correlated and opposing interaction terms in the same analyses to get the results they want. It is what economists refer to as ‘torturing the data until it confesses’.

Finally, for the observant, there is the following anomaly telling you something about the judgements made in this research: the masculinity of names is judged on a 1 to 11 scale (only integers) by 9 raters. Yet the averages reported in the authors’ appendices include such values as 1.9444444 (Isaac) and 9.1666666 (Ophelia). Note that if it were true that there were indeed nine raters, then all values should be an exact multiple of one-ninth, ie 0.11111111. The discrepancy indicates that either there were not always nine raters, or else that not all coded values were integers (an impossibility according to the main text). The 9.1666666 for instance is a multiple of one-sixth and thus suggests only 6 raters were used for ‘Ophelia’. The 1.9444444 is a multiple of one-eighteenth, suggesting that there were twice as many raters for ‘Isaac’. Alternatively, in both cases, there were nine raters but one of the nine raters picked two values simultaneously (one even and one uneven) and thus added 0.055555 to a multiple of one-ninth in the displayed average. It is not a big thing as this kind of judgement is made all the time but I can’t find the footnote that owns up to this in the paper.
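The arithmetic behind this anomaly is easy to check. The sketch below is my own illustration (the function name and the tolerance are arbitrary choices, not anything from the paper): it lists the numbers of raters, up to 20, for which a reported average could be the mean of whole-number scores.

```python
def whole_number_rater_counts(reported_mean, max_raters=20, tol=1e-5):
    """Rater counts n for which reported_mean * n is (almost) a whole number.

    If n raters each give an integer score, the scores must sum to an
    integer, so mean * n has to be a whole number. The tolerance allows
    for the reported averages being cut off after seven decimals.
    """
    counts = []
    for n in range(1, max_raters + 1):
        total = reported_mean * n
        if abs(total - round(total)) < tol:
            counts.append(n)
    return counts

for name, value in [("Isaac", 1.9444444), ("Ophelia", 9.1666666)]:
    # Also show the implied sum of scores if there really were nine raters.
    print(name, whole_number_rater_counts(value), round(value * 9, 3))
```

This prints `[18]` for Isaac and `[6, 12, 18]` for Ophelia, so nine integer scores are ruled out for both names. The implied nine-rater sums are 17.5 and 82.5, each exactly half a point off a whole number, which is what one rater ticking two adjacent values would produce.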