How to lie with statistics: the case of female hurricanes.

I came across an article in PNAS (the Proceedings of the National Academy of Sciences) with the catchy title ‘Female Hurricanes are deadlier than male hurricanes’. It is doing the rounds in the international media, with the explicit conclusion that our society suffers from gender bias because it does not sufficiently urge precautions when a hurricane gets a female name. Intrigued, and skeptical from the outset, I made the effort of looking up the article and took a closer look at the statistical analysis. I can safely say that the editor and the referees were asleep for this one, as they let through a real shocker. The gist of the story is that female hurricanes are no deadlier than male ones. Below, I pick the statistics of this paper apart.

The authors support their pretty strong claims mainly on the basis of historical analyses of the death toll of 96 hurricanes in the US since 1950 and partially on the basis of hypotheticals asked of 109 respondents to an online survey. Let’s leave the hypotheticals aside, since the respondents for that one are neither representative nor facing a real situation, and look at the actual evidence on female versus male hurricanes.

One problem is that the hurricanes before 1979 were all given female names, as the naming conventions changed after 1978 so that we got alternating names. Since hurricanes have become less deadly as people have become better at surviving them over time, this artificially makes the death toll of the female ones larger than that of the male ones. In their ‘statistical analyses’ the authors do not, however, control adequately for this, except in end-notes where they reveal that most of their results become insignificant when they split the sample into a before and an after period. For the combined data though, the raw correlation between the masculinity of the names and the death toll is of the same order as the raw correlation between how long ago the hurricane occurred and the death toll (ie, 0.1). Hence the effects of gender and years are indeed likely to come from the same underlying improvement in safety over time.

Using the data of the authors, I calculate that the average hurricane before 1979 killed 27 people, whilst the average one after 1978 killed 16, with the female ones killing 17 per hurricane and the male ones killing 15.3 per hurricane, a very small and completely insignificant difference. In fact, if I count ‘Frances’ as a male hurricane instead of a female one, because its ‘masculinity index’ is smack in the middle between male and female, then male and female hurricanes after 1978 are exactly equally deadly with an average death toll of 16.

It gets worse. Even without taking account of the fact that the male hurricanes are new ones, the authors do not in fact find an unequivocal effect at all. They run 2 different specifications that allow for the naming of the hurricanes and in neither do they actually find an effect unequivocally in the ‘right direction’ (their Table S3).

In their first, simple specification, the authors allow for effects of the severity of a hurricane in the form of the minimum air pressure (the lower, the more severe the hurricane) and the economic damage (the higher, the more severe the hurricane). Conditional on those two, they find an insignificant effect of the naming of the hurricanes!

Undeterred and seemingly hell-bent to get a strong result, the authors then add two interaction terms between the masculinity of the name of the hurricane and both the economic damage and the air pressure. The interaction term with the economic damage goes the way the authors want, ie hurricanes with both more economic damage and more feminine names have higher death tolls than hurricanes with less damage and male names. That is what their media release is based on, and their main text makes a ‘prediction graph’ out of that interaction term.

What is completely undiscussed in the main text of the article however is that the interaction with the minimum air pressure goes the opposite way: the lower the air pressure, the lower the death toll from a more feminine-named hurricane! So if the authors had made a ‘prediction graph’ showing the predicted death toll for more feminine hurricanes when the hurricanes had lower or higher air pressures, they would have shown that the worse the hurricane, the lower the death toll if the hurricane had a female name!

The editors and the referees were thus completely asleep for this pretty blatant act of deception-by-statistics. Apparently, one can hoodwink the editors of PNAS by combining the following tricks: add correlated interaction terms to a regression of which one discusses only the coefficients that fit the story one wants to sell; then make a separate graph out of the parameter one needs in the main text, whilst putting technical-sounding information in parentheses to throw editors, reviewers, and readers off the scent.

And the hoodwinking in this case is not small either. In order to accentuate what really is a non-result, the authors in the main text claim that “changing a severe hurricane’s name from Charley (MFI=2.889, 14.87 deaths) to Eloise (MFI=8.944, 41.45 deaths) could nearly triple its death toll.” This, whilst in the years since 1979 the average death toll for their included hurricanes is 16 for both ‘female hurricanes’ and ‘male hurricanes’ (own calculations)! The authors conveniently forgot to mention in their dramatic result that Charley would have had to have been a hurricane that did immense economic damage but that had a very high minimum air pressure, ie was actually a very weak hurricane. Only for such an ‘impossible hurricane’ would their own model predict the increase in deaths from a female name. Put differently, I could have claimed that if the hurricane was very strong in terms of low air pressure, changing the name from Charley to Eloise would have halved the death toll!
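To make the point about the two interactions concrete, here is a minimal sketch in Python. The coefficients are made up for illustration and are not the paper's estimates; I only assume that both interaction coefficients are positive (as described above) and that damage and pressure are measured as z-scores. The function name `name_effect` is my own.

```python
# Hypothetical coefficients for illustration only; NOT the paper's estimates.
# All predictors are assumed to be standardised (z-scores).
B_NAME = 0.0               # main effect of name femininity
B_NAME_X_DAMAGE = 0.5      # femininity x economic damage
B_NAME_X_PRESSURE = 0.25   # femininity x minimum air pressure


def name_effect(damage_z, pressure_z):
    """Change in the model's linear predictor per unit of name femininity."""
    return B_NAME + B_NAME_X_DAMAGE * damage_z + B_NAME_X_PRESSURE * pressure_z


# A realistic severe hurricane: high damage AND low minimum pressure.
realistic_severe = name_effect(damage_z=2.0, pressure_z=-2.0)
# The 'impossible hurricane': high damage but high minimum pressure.
impossible = name_effect(damage_z=2.0, pressure_z=2.0)
# Judged on pressure alone: very low pressure, average damage.
low_pressure_only = name_effect(damage_z=0.0, pressure_z=-2.0)

print(realistic_severe, impossible, low_pressure_only)  # 0.5 1.5 -0.5
```

With these invented numbers the name effect is largest for the ‘impossible hurricane’, much smaller once low pressure accompanies high damage, and negative when one looks at low pressure alone. The sign and size of the ‘effect of a female name’ depend entirely on which combination of damage and pressure one chooses to plug in.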

The authors also quite willingly pretend to have found things they have not in fact researched. They thus write “Feminine-named hurricanes (vs. masculine-named hurricanes) cause significantly more deaths, apparently because they lead to a lower perceived risk and consequently less preparedness” and the conclusions even speak of “gender biases”! Where do they try and measure this supposed bias in actual preparations? You guessed it, nowhere. PNAS should really clean up its act and not allow this sort of article, with its fairly blatant statistical artefacts, to slip through the cracks.

Let me explain the trickery in a bit more depth for the interested reader: air pressure and economic damage are highly related (the correlation is apparently -0.56), which means that one gets a strongly significant interaction between femininity and economic damage only because one simultaneously has added the interaction with minimum air pressure. One then talks about the interaction that goes the way one wants and happily neglects to mention the other one. And one needs both interactions at the same time to get the desired result on the interaction between the names and economic damage: without this interaction with minimum air pressure, what you get is a whole shift upwards of the male death prediction and a loss of significance on the interaction term with economic damage. You see this in the ‘additional analyses’ run by the authors, in very small font after the conclusions, wherein the whole thing becomes insignificant for the first period and the reduced coefficient for the later period on the interaction with air pressure coincides with a halving of the coefficient on the interaction with economic damage as well. Hence, without including both interactions you would probably get that the female hurricanes are predicted to be less deadly than the male ones when the economic damage is small and more deadly when the damage is large (to an insignificant extent). So you need the interaction that is almost invisible in the main text and the conclusions to ‘get’ the result that the headlines are based on.

There is another, even more insidious trick played in this article. You see, with only 96 hurricanes to play with, which really only include 26 to 27 ‘male’ hurricanes, the authors are asking rather a lot from their data in that they want to estimate 5 parameter coefficients, three of which are based on names. If you then only use a simple indicator for whether or not a hurricane has a male name, you have the problem that you don’t have enough variation to get significance on anything.

So what did the authors do? Ingeniously, they decided to increase the variation in their names by having people judge just how ‘masculine’ their names were. Hence many of the ‘female’ hurricanes were ‘re-badged’ as ‘somewhat male hurricanes’. So the female hurricanes of the pre-1979 era had an average “masculinity index” of 8.42, whilst those of the new post-1979 era had an average of 9.01. Simply put, according to the authors the female hurricanes ‘of old’, which were of course more deadly as they occurred earlier, were also more masculine, contributing to the headline ‘results’.

Supposedly masculine female names included “Ione”, “Beulah”, and “Babe”. And who judges whether these are masculine names? Why, apparently this was done by 9 ‘independent coders’, by which one presumes the authors meant colleagues sitting in the staff room of their university in 2013! Now, even supposing that they were independent, one cannot help but notice that the coders will have been relatively unaware of the naming conventions in the 1950s and 1960s. How is someone born in 1970 sitting in a staff room in 2013 supposed to judge how ‘masculine’ the name ‘Ione’ was perceived to be in 1950? These older names probably just sounded unusual and hence got rated as ‘more probably male’. Similarly, it is beyond me why ‘Hugo’ would be rated as less masculine than ‘Jerry’ or ‘Juan’.

The authors’ own end-notes called ‘additional analyses’ indeed show that you get insignificant results without this additional variation begotten from making the names continuous. So the authors need to fiddle with the names of the hurricanes, pool two eras together whilst not controlling for era, and add two strongly correlated and opposing interaction terms in the same analyses to get the results they want. It is what economists refer to as ‘torturing the data until it confesses’.

Finally, for the observant, there is the following anomaly telling you something about the judgements made in this research: the masculinity of names is judged on a 1 to 11 scale (only integers) by 9 raters. Yet the averages reported in the authors’ appendices include such values as 1.9444444 (Isaac) and 9.1666666 (Ophelia). Note that if it were true that there were indeed nine raters, then all values should be an exact multiple of one-ninth, ie 0.11111111. The discrepancy indicates that either there were not always nine raters, or else that not all coded values were integers (an impossibility according to the main text). The 9.16666 for instance is a multiple of one-sixth and thus suggests only 6 raters were used for ‘Ophelia’. The 1.9444444 is a multiple of one-eighteenth, suggesting that there were twice as many raters for ‘Isaac’. Alternatively, in both cases, there were nine raters but one of the nine raters picked two values simultaneously (one even and one uneven) and thus added 0.055555 to a multiple of one-ninth in the displayed average. It is not a big thing as this kind of judgement is made all the time but I can’t find the footnote that owns up to this in the paper.
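The arithmetic behind this anomaly is easy to check. The sketch below assumes that the reported decimals are the repeating fractions 55/6 (for 9.1666…) and 35/18 (for 1.9444…); the helper `smallest_rater_count` is my own naming, not anything from the paper.

```python
from fractions import Fraction


def smallest_rater_count(average, max_raters=36):
    """Smallest n for which average * n is a whole number,
    ie the fewest raters that could produce this average from integer scores."""
    for n in range(1, max_raters + 1):
        if (average * n).denominator == 1:
            return n
    return None


ophelia = Fraction(55, 6)   # assumed exact value behind 9.1666666
isaac = Fraction(35, 18)    # assumed exact value behind 1.9444444

print(smallest_rater_count(ophelia))  # 6
print(smallest_rater_count(isaac))    # 18

# With nine raters, the implied sum of scores is not an integer:
print(ophelia * 9, isaac * 9)         # 165/2 35/2
```

The implied totals with nine raters are 82.5 and 17.5, so each is off by exactly half a point. That is what you would see if one rater had split a score between two adjacent values.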

Spend on windshield insurance instead of a better car alarm system

Crime Scene

Recently our car was broken into, despite being parked in what I thought was a relatively safe place. While getting the broken window replaced, I learned that a key consideration when designing car windows is that they shatter safely, so as not to injure passengers. Car windows are not primarily designed to keep the crooks out; in fact the police officer who inspected our car was surprised that the burglars apparently needed to hit our window more than once in order to break it.

A corollary is that car windows are easy to replace. If you design something to break, you might as well make it easy to swap. Many car windows are apparently held in place by just two little hinges (as is ours) and the entire replacement process takes 15 minutes.

What this means is that you are probably better off paying for optional “glass insurance” than you are for a better car alarm system. If someone wants to break into your car, the alarm is not going to stop them. It might even lead them to damage the door or other parts of the car that are more expensive to repair than the window 😉

In Australia, even “comprehensive” insurance packages do not usually cover damage to windows or windshields. In our case this is an optional extra that costs around $60/year. It turns out that each window costs around $200 to replace (the deductible is typically around $500). So at least to me, this seems a worthwhile extra to pay for.
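For those who like to see the sums, here is a rough break-even sketch using the figures above. It assumes the optional glass cover pays the full replacement cost with no excess of its own, which is my assumption and may not hold for every policy.

```python
from fractions import Fraction

premium_per_year = 60   # optional glass cover, dollars per year (from the text)
window_cost = 200       # cost of replacing one window, dollars (from the text)
deductible = 500        # typical deductible, dollars (from the text)

# A single window costs less than the deductible, so a claim subject to
# that deductible would pay out nothing.
below_deductible = window_cost < deductible

# Break-even: broken windows per year at which the premium equals the
# expected repair bill (assumes the glass cover has no excess).
break_even_rate = Fraction(premium_per_year, window_cost)
years_per_break = 1 / break_even_rate

print(below_deductible)   # True
print(break_even_rate)    # 3/10
print(years_per_break)    # 10/3
```

On these numbers the extra pays for itself on average if you expect to replace a window about once every three and a bit years; anyone who dislikes surprise bills may value it at a lower break rate than that.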

Bait and Switch

News from Singapore this week made it all over the internet. A group of five diners were charged S$1224 (AUD1039) for a steamed fish at the new Resorts World casino. They had earlier ordered a different fish (which was presumably less expensive) but the waiter suggested a substitute without identifying the price. The diners later complained and received a 15% discount. But there are lots of people complaining online that the fish usually costs $6 per 100 grams instead of the $60 per 100 grams charged by the restaurant.

It may sound a bit harsh, but in my opinion the diners exhibited a lack of bargaining skills. I would have refused to pay any more than the price of the fish it was substituted for; after all, it was the restaurant’s fault for running out of stock. It is also evident that the restaurant manager needs to do an MBA. It is a failure in marketing if people are complaining about your firm’s prices based on the cost of the raw materials used. My former MBA students would have learnt that in a well-run restaurant, customers would be happy to pay for the skills of its chefs, the quality of the dining experience, and the ambiance. After all, if you’ve dropped by a high-end restaurant in Japan and eaten fugu (poisonous pufferfish), a price like US$100-200 per head is not unreasonable.

Singapore, June 30, 2010- THEY feasted on a fish named sultan – and were made to pay a king’s ransom for it. Well, not quite a king’s ransom, but a whopping $1,224 for that single steamed fish dish. And the bill left a sour aftertaste. The diner, who only wanted to be known as Mr Liu, 35, had taken his four friends to Resorts World Sentosa’s (RWS) Feng Shui Inn restaurant on June 12. The group had initially wanted marble goby, better known locally as soon hock, but the waiter said there was no stock for the fish. The waiter suggested the white sultan fish instead. The group agreed, without asking how much the dish would cost. They were stunned when the bill arrived. The single sultan fish, which weighed 1.8kg, set them back by $1,224. Source: http://soshiok.com/article/12333

ps: remember to ask the price before ordering at a restaurant.

My Week with H1N1

I suppose it would have been wise to have taken the H1N1 vaccine. But from many accounts, the pig flu was supposed to have only a mild effect on adults. I had observed this to be the case when my wife and various friends had it earlier this year. A few sniffles, a sore throat, and all was well again. So I did not give serious thought to being inoculated.

Well, Nature had decided to disprove my assumptions in a big way. I just spent the past week in bed battling a high fever, diarrhoea, sore throat and various other symptoms. According to the doctor, my symptoms were in line with H1N1. But there was a catch: I had also caught a second, bacterial infection, and that made a huge difference. The interaction of the two wreaked havoc on my body. The fever was difficult to control even with strong medicine, often leading to shivers. For five days I was on a liquid diet. I felt a profound tiredness that I had never felt the previous times I had battled influenza.

Inadvertently this led to some time well-spent in introspection and meditation. In fact this was perhaps the first week in a long time that I have been quite completely offline. I hope you will share the lesson I’ve learnt: the germs are pretty innovative and combinatorial attacks can be very nasty. If you haven’t gotten inoculated for H1N1, now’s the time.

ps: This is an Economics blog and makes no pretense of offering medical advice

UPDATE: I’ve been asked if this is a low-probability random event that I’ve experienced. No. The multiple infections are not independent: a first infection weakens the body sufficiently and makes it easy for a second infection to set in. Search around the web and you will find quite a number of reports in which H1N1 appears with pneumonia and other infections.

Comet McNaught indeed

Well, we just spent an hour looking for Comet McNaught. It was a cloudless sky but for one bank of dark clouds. We used this diagram from Steve Quirk. It is also useful in identifying the one set of clouds in the Melbourne sky on the 14th January. They were right there at the 14 point and didn’t move. So much for the brightest comet since 1965.

Imagine there’s no …

I have just finished reading Richard Dawkins’ latest book, The God Delusion. I have long had an affinity with Dawkins’ work. The Selfish Gene stands out. There, Dawkins outlined how evolutionary science had evolved, in particular, to be solidly grounded in the roots of game theory. That book was, in fact, my first real introduction to game theory and what it could do.

World’s oldest person dies

News today that the world’s oldest person, Elizabeth Bolden, had died at the age of 116. My reaction was: what? Again!