Review: Tomer’s Advanced Introduction to Behavioral Economics

In the next couple of months I shall, in preparation for an invited longer review essay on recent books on BE, post reviews of individual books such as Tomer’s, Angner’s A Course in Behavioral Economics, Cartwright’s An Introduction to Behavioral Economics, and Dhami’s The Foundations of Behavioral Economic Analysis. Comments are welcome.

Here is the first review, for your entertainment:

Tomer, John F. Advanced Introduction to Behavioral Economics. Elgar (2017). ISBN: 978 1 78471 991 3 (cased), ISBN: 978 1 78471 993 7 (paperback)

Tomer, an Emeritus Professor of Economics at Manhattan College, covers much ground in a fairly superficial manner. We are lectured about the scientific practices of “mainstream economics” (narrow, rigid, intolerant, mechanical, separate, individualistic; see p. 10) and the emergence of behavioral economics (BE). In passing, we hear about different “strands” of BE (chapter 3: “The bounded rationality strand”, chapters 4 and 5: “the psychological economics strand”, chapter 6: “behavioral finance”), “BE, public policy, and nudging” (chapter 7), “law and BE” (chapter 8), “behavioral macroeconomics” (chapter 9), “the empirical methods of BE” (chapter 10), and neuroeconomics (chapter 12). We are also treated to an answer (I am sure you can guess it) to the question: “Are mainstream economists open-minded toward behavioral economics or do they resist it?” (chapter 11). In chapter 13 the author enlightens us about paths “Toward a more humanistic BE” and in chapter 14 we can read about “Behavioral economic trends”.

Each of these chapters is about 10 to 12 pages long. Along the way we hear about ENE’s (Early Neoclassical Economics) and NE’s (Neoclassical Economics) “lack of behavioral realism. NE’s lack of connection to other social sciences is particularly regrettable for those who place a high value on a unified social science or at least on having many viable linkages among the different social sciences.” (p. 9) Referring to a decade-old study of his that was published in an inconsequential journal, Tomer tells us that “The results for NE (also referred to as mainstream economics) are quite clear. NE is rated high on all six dimensions (narrowness, rigidity, intolerance, mechanicalness, separateness, and individualism,” (p. 12). After this paper tiger has been successfully constructed, we are told how it is being torn to smithereens: “In contrast, the eight strands of BE … are in general far less narrow, rigid, intolerant, mechanical, separate, and individualistic than NE. … Overall, there is clear evidence that BE is 1) less positivistic than NE … , 2) distinctively different from NE, and 3) much more integrated with other social science disciplines than NE. In other words, BE is arguably better than NE in the way it conducts its scientific practices.” (p. 12)

This tired rhetorical figure has been used by those marketing BE for a long time. It also shows up regularly in the press (e.g., Elliott 2017 but see Attanasio et al. 2017 or for that matter Ortmann 2012), the related blogosphere, and even literature (Schumacher 2014): while BE is much more realistic and useful, NE is the old staid economics (that has done little for us). In the words of the protagonist of Dear Committee Members, “ … sociology has gone the way of poli-sci and econ, now firmly in the clutches of rabid number crunchers who have abandoned or forgotten the link between their abstruse theoretical  musings and the presence of human beings on the planet’s surface; .. ” (p. 152)

That lack of behavioral realism is, so we learn, addressed by behavioral economists’ wholesale adoption of psychological insights, which inevitably “enrich” the dismal models of mainstream economists. Ignoring the interesting question of what trade-off these richer models come with (a trade-off this book never discusses), there are at least two issues here.

First, and to repeat a theme that I have belabored elsewhere (see also this comment here), there is no such thing as a monolithic body of evidence in psychology that economists could mine to inject more behavioral realism in their allegedly dismal models. The fact is, much of the evidence on heuristics and biases that is being appealed to has been questioned left and right. Every halfway knowledgeable (behavioral) economist will agree that the only interesting question about cognitive biases (such as reference dependence, endowment effects, availability, anchoring & adjustment, and representativeness) is when, and under what circumstances, they exist (if they exist at all).

Second, and more importantly, psychology as a field has, at least since Bem (2011), gone through what many people have called a replicability crisis (e.g., OSC 2015, Spellman 2015, Schimmack 2018) that played out at first in blogs and discussion groups such as the Facebook Psychological Methods Discussion Group, but increasingly also in journals and their practices. You would not know that some such upheaval is happening from reading Tomer’s book.

Take, for example, Tomer’s telling discussion of Zak’s oxytocin research in chapter 13. We learn that he is “a well-known economist who appreciates the softer, more intangible side of human behavior” (p. 145) and has shown through his research that “there is a direct link between the amount of oxytocin in humans’ blood and brains and humans’ concerns for each other. … Most importantly, oxytocin fosters trust. Oxytocin surges in a person’s bloodstream when an individual is shown a sign of trust and/or when something engages in a person’s sympathies and they experience empathy.” (loc. cit.) Unfortunately, these claims have been thoroughly debunked and even effectively ridiculed in one of John Oliver’s excellent shows. All the literature I know suggests strongly that intranasal oxytocin has no discernible effect, and claims to the contrary are about as much bogus science as claims of ego depletion and the empowering effects of power poses: what these alleged phenomena reflect is little but shoddy science that people got away with for too long, demonstrating a cavalier attitude to questionable research practices ranging from p-hacking and inadequate statistical power to hiding unsuccessful trials in drawers. You would not know about this crisis if you trusted Tomer, who seems completely unaware of these developments that are slowly also starting to be recognized in economics.

Yes, I am not impressed by Tomer’s book. The knowledge laid out in Tomer’s slim volume is severely out of date and unabashedly partisan. According to the December 2017 IDEAS/RePEc data, there are at least 50,000 research economists out there world-wide and they innovate every day in what is most likely one of the most brutally competitive industries the world has seen. The idea that somewhere someone (“mainstream economics”) has a monopoly on doctrinal truth and can enforce it shows a stunning cluelessness about the current state of the art (and science) of economics and its sociology. In his recent presidential address, Alvin E. Roth, an outsider of sorts himself, has argued that economics has been very open to various outsiders and their ideas and practices, and you have surely seen that in the emergence of experimental economics and also in some quarters of BE (although BE remains afflicted with many charlatans, often of the non-academic kind that sell BE as panacea to everyone who thinks they can get something for nothing).

I doubt that Tomer’s slim volume is “particularly useful for advanced undergraduate students, graduate students, policymakers, and other professionals who participate economic-related matters.”  (statement on the  back of the book)  In fact, I fear it will promote more sloppy science of the kind that is on display in this book. That kind of sloppy science is also too often on display when you speak with policy makers and Behavioral Insights architects and the like these days.

When all is said and done, it is this kind of sloppiness that undermines trust in the joint enterprise called science.

Lemonade and the question of (laboratory) evidence

Lemonade Inc., the New York based fintech startup that sells home and renters insurance, has been in the news recently. It has raised tens of millions in venture capital and also attracted considerable interest in the top echelons of corporate Australia. I know because I was asked to reflect on it as part of a workshop on behavioral economics/behavioral science that I conducted a couple of months ago. I have to admit that I did not know about Lemonade before that request.

Turns out that Lemonade uses “Behavioral Science (and Technology) To Onboard Customers and Keep Them Honest”, according to the title of a piece in Fast Company earlier this year. Lemonade bets that insights from Behavioral Economics (BE) will give it the edge over incumbent competitors. It bets specifically that the BE insights of Dan Ariely (he of Predictably Irrational and TED talk fame, and now Lemonade’s CBO = Chief Behavioral Officer) will provide that edge, important components being “trusting our customers” and “giving back” to charity all unused excess funds. On top of these components, or maybe undergirding them, is the promise that Lemonade commits to spending at most 20 percent of its income on administration and marketing, which presumably prevents it from profit maximizing at the expense of its customers. Lemonade also promises that it will process claims fast and relatively un-bureaucratically, at least by the standard of an industry that has a reputation for delaying tactics and for its persistent attempts to evade having to pay up. Examples of speedy processing are featured prominently on Lemonade’s website.

And not only that: A couple of months ago, Lemonade launched its Zero Everything policy which gets rid of deductibles and rate hikes after claims and is supposed to pay for itself through elimination of the paperwork that comes with relatively small claims.

BE principles are also appealed to when customers who make claims are asked to submit a brief video outlining their claim and to provide at the same time an honesty pledge, which supposedly induces more honesty.

In sum then, Lemonade builds its business allegedly on the trust(worthiness) of its customers, and of itself, and also honesty on the part of both parties.

Let’s start with the (laboratory) evidence for trust(worthiness). On its web page, Lemonade illustrates the advantages of trust(worthiness) with one of the workhorses of experimental economics, the trust, or investment, game. According to the web page, a person that invests (the trustor) will see her investment to a trustee of $100 quadruple and then see the trustee return half of that $400 to herself (the trustor), for an impressive ROI of one hundred percent. Trust pays off, we learn: “We are more trusting and reciprocating than what standard economic theory predicts.”

Ignoring the stab at economic theory (which shows little more than a lack of elementary knowledge of modern economic theory), there are at least three problems with the Lemonade narrative. First, it is not clear at all why this particular game, in this particular parameterization, captures the situation between customer and insurance company. Second, I am not aware of anyone ever having experimentally tested this game with that specific parameterization (specifically, a multiplication factor of 4), and, the multiplication factors typically used being 3 or 2, I am not aware of responders returning more than what was invested. In fact, the results of my own work (which are very much in line with the literature in this area) suggest that trustors invest about half of what they were given and trustees return slightly less than what was invested. It is noteworthy that there is much heterogeneous behavior to be found in these experiments, with many of those that trust (“invest”) being brutally exploited.
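To make the arithmetic behind the two narratives explicit, here is a minimal Python sketch of the trustor's return on investment in the trust game. The function name `trustor_roi` and the second set of numbers (5 sent out of an endowment of 10, a multiplier of 3, and 4.5 returned) are illustrative assumptions chosen to match the stylized pattern described above; they are not data from any particular experiment.

```python
def trustor_roi(invested, multiplier, share_returned):
    """Return on investment for the trustor in a one-shot trust game.

    invested:       amount the trustor sends to the trustee
    multiplier:     factor applied to the amount sent before it reaches the trustee
    share_returned: fraction of the multiplied amount the trustee sends back
    """
    returned = invested * multiplier * share_returned
    return (returned - invested) / invested


# Lemonade's web-page example: $100 sent, quadrupled, half sent back.
lemonade_roi = trustor_roi(invested=100, multiplier=4, share_returned=0.5)

# Hypothetical stylized lab pattern (assumed numbers): 5 sent, tripled to 15,
# and the trustee returns 4.5, i.e. slightly less than was invested.
typical_roi = trustor_roi(invested=5, multiplier=3, share_returned=4.5 / 15)

print(f"Lemonade example ROI: {lemonade_roi:.0%}")
print(f"Stylized lab ROI:     {typical_roi:.0%}")
```

Under these assumed numbers the sign of the return flips: the advertised example doubles the trustor's money, while the stylized lab pattern leaves her slightly worse off than if she had not invested at all.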

  “Everyone has a price, the important thing is to find out what it is.” (P. Escobar)

Which brings us to the question of honesty. There is indeed some evidence that the way in which people are being prompted makes a difference and, more generally, that context matters (see Various, JEBO 2016). Friesen & Gangadharan (Economics Letters 2012) use an individual performance task (“matrix task”) after which they ask their subjects to self-report the number of successes they had. While very few of their participants (only one out of 12) are dishonest to the maximal extent, about one out of 3 are dishonest to different degrees, with men (in particular those of Aussie and NZ provenance) being more dishonest, and more frequently so, than female participants. Rosenbaum, Billinger, & Stieglitz (Journal of Economic Psychology 2014) review experimental evidence on (dis)honesty from 63 experiments in economics and psychology (including Friesen and Gangadharan EL 2012) and find the robust presence of unconditional cheaters and non-cheaters, with the honesty of the remaining individuals being particularly susceptible to monitoring and intrinsic lying costs. Most of these experiments involve fairly low stakes, so those intrinsic lying costs are unlikely to be much of a constraint when stakes increase. The fraction of unconditional non-cheaters is almost certain to shrink towards the Escobar limit when stakes increase.

Interestingly, notwithstanding its public declarations of belief in the good of people, Lemonade tells itself that, while trust is good, control is better. It runs its claimants, on top of the honesty pledges, through 18 different fraud detection algorithms before it pays up. On top of this, Lemonade engages in blatant cream-skimming. For example, it did not offer a quote to half of the customers who wanted to insure their homes. And it reports that the customers that are joining, or allowed to join, are younger, educated, tech-savvy, above-average earners, and female. So much for trust, trustworthiness, and all that BE marketing horsemanure. Pretty cold-blooded standard economic theory if you ask me. Note that this screening takes care of a key problem with their advertised approach: the likely adverse selection of bad types that mere trusting would invite, a very likely whammy on top of the moral hazard problem that every insurer faces.

So is Lemonade a viable business model?

Time will tell.

In the State of New York, Lemonade claims to have overtaken Allstate, GEICO, Liberty Mutual, State Farm, etc. in what is probably the single most critical market-share metric of all in renters and home insurance: NY renters buying new insurance policies since 1 Jan 2017.

Lemonade, we are told, is growing “exponentially”, which is to say that “new bookings have doubled every ten weeks since launch, and show no sign of letting up.” According to its most recent Thanksgiving Transparency ‘17 report, Lemonade has now branched out into, and is selling in, Illinois, California and Nevada, Texas, New Jersey and Rhode Island, and has been licensed in 15 other states.
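For a sense of what that claim implies, here is a small Python sketch that converts a ten-week doubling time into weekly and annual growth figures. The doubling time comes from the quote above; treating a year as 52 weeks and assuming the doubling continues unchanged are simplifications of mine, and the function name is hypothetical.

```python
DOUBLING_TIME_WEEKS = 10  # from Lemonade's statement quoted above
WEEKS_PER_YEAR = 52       # simplifying assumption


def growth_factor(weeks, doubling_time=DOUBLING_TIME_WEEKS):
    """Multiplicative growth over `weeks` if bookings double every `doubling_time` weeks."""
    return 2 ** (weeks / doubling_time)


weekly_rate = growth_factor(1) - 1
annual_factor = growth_factor(WEEKS_PER_YEAR)

print(f"Implied weekly growth rate: {weekly_rate:.1%}")
print(f"Implied growth over one year: x{annual_factor:.1f}")
```

Growth of this kind from a small base cannot continue for long, which is one more reason to read the "no sign of letting up" part of the claim with caution.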

Of course, collecting insurance premia is one thing. Paying insurance claims and balancing the books is another thing altogether and the verdict on that one will be out for a while.

If Lemonade succeeds (and we all should hope it does), it will do so because it engages in cream-skimming, targeting of low-risk market segments, and massive control and surveillance of its clientele. It will not do so because of its invocation of the feel-good alleged BE findings so prominently displayed on its web page.

How to lie with statistics: the case of female hurricanes.

I came across an article in PNAS (the Proceedings of the National Academy of Sciences) with the catchy title ‘Female Hurricanes are deadlier than male hurricanes’. It is doing the rounds in the international media, with the explicit conclusion that our society suffers from gender bias because it does not sufficiently urge precautions when a hurricane gets a female name. Intrigued, and skeptical from the outset, I made the effort to look up the article and take a closer look at the statistical analysis. I can safely say that the editor and the referees were asleep for this one, as they let through a real shocker. The gist of the story is that female hurricanes are no deadlier than male ones. Below, I pick the statistics of this paper apart.

The authors support their pretty strong claims mainly on the basis of historical analyses of the death toll of 96 hurricanes in the US since 1950 and partially on the basis of hypotheticals asked of 109 respondents to an online survey. Let’s leave the hypotheticals aside, since the respondents for that one are neither representative nor facing a real situation, and look at the actual evidence on female versus male hurricanes.

One problem is that the hurricanes before 1979 were all given female names; the naming conventions changed after 1978, so that we got alternating male and female names. Since hurricanes have become less deadly as people have become better at surviving them over time, this artificially makes the death toll of the female ones larger than that of the male ones. In their ‘statistical analyses’ the authors do not, however, control adequately for this, except in end-notes where they reveal that most of their results become insignificant when they split the sample into a before and an after period. For the combined data though, the raw correlation between the masculinity of the names and the death toll is of the same order as the raw correlation between the number of years ago that the hurricane occurred and the death toll (ie, 0.1). Hence the effects of gender and years are indeed likely to come from the same underlying improvement in safety over time.

Using the data of the authors, I calculate that the average hurricane before 1979 killed 27 people, whilst the average one after 1978 killed 16, with the female ones killing 17 per hurricane and the male ones killing 15.3 per hurricane, a very small and completely insignificant difference. In fact, if I count ‘Frances’ as a male hurricane instead of a female one, because its ‘masculinity index’ is smack in the middle between male and female, then male and female hurricanes after 1978 are exactly equally deadly with an average death toll of 16.
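The mechanics of this era effect can be illustrated with a toy data set in Python. The counts per group (40, 29, and 27 hurricanes) and the constant death tolls within each group are hypothetical values of mine, chosen only to mirror the rounded averages quoted above; this is not the authors' data.

```python
from statistics import mean

# Hypothetical toy data (NOT the paper's data): every hurricane in a group
# is given the rounded group average quoted in the text.
hurricanes = (
    [{"era": "pre-1979", "name_sex": "female", "deaths": 27}] * 40
    + [{"era": "post-1978", "name_sex": "female", "deaths": 16}] * 29
    + [{"era": "post-1978", "name_sex": "male", "deaths": 16}] * 27
)


def mean_deaths(rows, **conditions):
    """Average death toll over the rows matching all key=value conditions."""
    selected = [r["deaths"] for r in rows
                if all(r[key] == value for key, value in conditions.items())]
    return mean(selected)


pooled_female = mean_deaths(hurricanes, name_sex="female")
pooled_male = mean_deaths(hurricanes, name_sex="male")
post_female = mean_deaths(hurricanes, era="post-1978", name_sex="female")
post_male = mean_deaths(hurricanes, era="post-1978", name_sex="male")

print(f"Pooled:    female {pooled_female:.1f} vs male {pooled_male:.1f}")
print(f"Post-1978: female {post_female:.1f} vs male {post_male:.1f}")
```

In this toy example the pooled comparison shows 'female' hurricanes as deadlier even though, within the only era that has both kinds of names, the two averages are identical by construction. The gap in the pooled data comes entirely from the era.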

It gets worse. Even without taking account of the fact that the male hurricanes are new ones, the authors do not in fact find an unequivocal effect at all. They run 2 different specifications that allow for the naming of the hurricanes and in neither do they actually find an effect unequivocally in the ‘right direction’ (their Table S3).

In their first, simple specification, the authors allow for effects of the severity of a hurricane in the form of the minimum air pressure (the lower, the more severe the hurricane) and the economic damage (the higher, the more severe the hurricane). Conditional on those two, they find an insignificant effect of the naming of the hurricanes!

Undeterred and seemingly hell-bent on getting a strong result, the authors then add two interaction terms between the masculinity of the name of the hurricane and both the economic damage and the air pressure. The interaction term with the economic damage goes the way the authors want, ie hurricanes with both more economic damage and more feminine names have higher death tolls than hurricanes with less damage and male names. That is what their media release is based on, and their main text makes a ‘prediction graph’ out of that interaction term.

What is completely undiscussed in the main text of the article however is that the interaction with the minimum air pressure goes the opposite way: the lower the air pressure, the lower the death toll from a more feminine-named hurricane! So if the authors had made a ‘prediction graph’ showing the predicted death toll for more feminine hurricanes when the hurricanes had lower or higher air pressures, they would have shown that the worse the hurricane, the lower the death toll if the hurricane had a female name!

The editors and the referees were thus completely asleep for this pretty blatant act of deception-by-statistics. Apparently, one can hoodwink the editors of PNAS by combining the following tricks: add correlated interaction terms to a regression of which one discusses only the coefficients that fit the story one wants to sell; then make a separate graph out of the parameter one needs in the main text, whilst putting technical-sounding information in parentheses to throw editors, reviewers, and readers off the scent.

And the hoodwinking in this case is not small either. In order to accentuate what really is a non-result, the authors in the main text claim that “changing a severe hurricane’s name from Charley (MFI=2.889, 14.87 deaths) to Eloise (MFI=8.944, 41.45 deaths) could nearly triple its death toll.” This, whilst in the years since 1979 the average death toll for their included hurricanes is 16 for both ‘female hurricanes’ and ‘male hurricanes’ (own calculations)! The authors conveniently forgot to mention in their dramatic result that Charley would have had to have been a hurricane that did immense economic damage but that had a very high minimum air pressure, ie was actually a very weak hurricane. Only for such an ‘impossible hurricane’ would their own model predict the increase in deaths from a female name. Put differently, I could have claimed that, if the hurricane was very strong in terms of low air pressure, changing the name from Charley to Eloise would have halved the death toll!
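The point about the two opposing interactions can be sketched in a few lines of Python. In a model with interaction terms, the effect of a name's femininity is not a single number; it depends on the damage and pressure values at which it is evaluated. The coefficients below are hypothetical values of mine on standardized variables, chosen only so that both interactions have the signs described above; they are not the paper's estimates, and I assume a log-linear model for the death toll.

```python
# Hypothetical coefficients on standardized variables (NOT the paper's estimates).
B_NAME = 0.0             # main effect of name femininity
B_NAME_X_DAMAGE = 0.5    # interaction: femininity x economic damage
B_NAME_X_PRESSURE = 0.6  # interaction: femininity x minimum air pressure


def name_effect(damage_z, pressure_z):
    """Change in predicted log deaths per unit increase in name femininity,
    evaluated at the given standardized damage and minimum pressure."""
    return B_NAME + B_NAME_X_DAMAGE * damage_z + B_NAME_X_PRESSURE * pressure_z


scenarios = {
    "high damage, high pressure (the 'impossible hurricane')": name_effect(2.0, 1.0),
    "high damage, low pressure (severe on both counts)": name_effect(2.0, -2.0),
    "average damage, low pressure": name_effect(0.0, -2.0),
}

for label, effect in scenarios.items():
    direction = "more" if effect > 0 else "fewer"
    print(f"{label}: {effect:+.1f} -> {direction} predicted deaths for feminine names")
```

With these assumed coefficients, a feminine name raises predicted deaths only when high damage is paired with high (weak) pressure. Once low pressure is plugged in, the same model predicts fewer deaths for feminine names, which is the half of the story a graph of the damage interaction alone leaves out.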

The authors also quite willingly pretend to have found things they have not in fact researched. They thus write “Feminine-named hurricanes (vs. masculine-named hurricanes) cause significantly more deaths, apparently because they lead to a lower perceived risk and consequently less preparedness” and the conclusions even speak of “gender biases”! Where do they try and measure this supposed bias in actual preparations? You guessed it, nowhere. PNAS should really clean up its act and not allow this sort of article, with its fairly blatant statistical artefacts, to slip through the cracks.

Let me explain the trickery in a bit more depth for the interested reader: air pressure and economic damage are highly correlated (the correlation is apparently -0.56), which means that one gets a strongly significant interaction between femininity and economic damage only because one simultaneously has added the interaction with minimum air pressure. One then talks about the interaction that goes the way one wants and happily neglects to mention the other one. And one needs both interactions at the same time to get the desired result on the interaction between the names and economic damage: without this interaction with minimum air pressure, what you get is a whole shift upwards of the male death prediction and a loss of significance on the interaction term with economic damage. You see this in the ‘additional analyses’ run by the authors, in very small font after the conclusions, wherein the whole thing becomes insignificant for the first period and the reduced coefficient for the later period on the interaction with air pressure coincides with a halving of the coefficient on the interaction with economic damage as well. Hence, without including both interactions you would probably get that the female hurricanes are predicted to be less deadly than the male ones when the economic damage is small and more deadly when the damage is large (to an insignificant extent). So you need the interaction that is almost invisible in the main text and the conclusions to ‘get’ the result that the headlines are based on.

There is another, even more insidious trick played in this article. You see, with only 96 hurricanes to play with, of which really only 26 to 27 are ‘male’ hurricanes, the authors are asking rather a lot from their data in that they want to estimate 5 coefficients, three of which are based on names. If you then only use a simple indicator for whether or not a hurricane has a male name, you have the problem that you don’t have enough variation to get significance on anything.

So what did the authors do? Ingeniously, they decided to increase the variation in their names by having people judge just how ‘masculine’ their names were. Hence many of the ‘female’ hurricanes were ‘re-badged’ as ‘somewhat male hurricanes’. So the female hurricanes of the pre-1979 era had an average “masculinity index” of 8.42, whilst those of the new post-1979 era had an average of 9.01 (higher values on this index being more feminine). Simply put, according to the authors the female hurricanes ‘of old’, which were of course more deadly as they occurred earlier, were also more masculine, contributing to the headline ‘results’.

Supposedly masculine female names included “Ione”, “Beulah”, and “Babe”. And who judges whether these are masculine names? Why, apparently this was done by 9 ‘independent coders’, by which one presumes the authors meant colleagues sitting in the staff room of their university in 2013! Now, even supposing that they were independent, one cannot help but notice that the coders will have been relatively unaware of the naming conventions in the 1950s and 1960s. How is someone born in 1970 sitting in a staff room in 2013 supposed to judge how ‘masculine’ the name ‘Ione’ was perceived to be in 1950? These older names probably just sounded unusual and hence got rated as ‘more probably male’. Similarly, it is beyond me why ‘Hugo’ would be rated as less masculine than ‘Jerry’ or ‘Juan’.

The authors’ own end-notes called ‘additional analyses’ indeed show that you get insignificant results without this additional variation begotten from making the names continuous. So the authors need to fiddle with the names of the hurricanes, pool two eras together whilst not controlling for era, and add two strongly correlated and opposing interaction terms in the same analyses to get the results they want. It is what economists refer to as ‘torturing the data until it confesses’.

Finally, for the observant, there is the following anomaly telling you something about the judgements made in this research: the masculinity of names is judged on a 1 to 11 scale (only integers) by 9 raters. Yet the averages reported in the authors’ appendices include such values as 1.9444444 (Isaac) and 9.1666666 (Ophelia). Note that if it were true that there were indeed nine raters, then all values should be an exact multiple of one-ninth, ie 0.11111111. The discrepancy indicates that either there were not always nine raters, or else that not all coded values were integers (an impossibility according to the main text). The 9.16666 for instance is a multiple of one-sixth and thus suggests only 6 raters were used for ‘Ophelia’. The 1.9444444 is a multiple of one-eighteenth, suggesting that there were twice as many raters for ‘Isaac’. Alternatively, in both cases, there were nine raters but one of the nine raters picked two values simultaneously (one even and one odd) and thus added 0.055555 to a multiple of one-ninth in the displayed average. It is not a big thing as this kind of judgement is made all the time but I can’t find the footnote that owns up to this in the paper.
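The divisibility argument is easy to check mechanically. The Python sketch below lists, for a reported average, every number of raters for which that average could be the mean of integer ratings. The function name, the upper limit of 20 raters, and the rounding tolerance are assumptions of mine; the two averages are the ones quoted above.

```python
def plausible_rater_counts(average, max_raters=20, tol=1e-4):
    """Rater counts n for which `average` could be a mean of n integer ratings.

    A mean of n integers equals k/n for some integer k, so average * n must be
    numerically close to a whole number. `tol` absorbs the truncation of the
    reported decimals.
    """
    counts = []
    for n in range(1, max_raters + 1):
        total = average * n
        if abs(total - round(total)) < tol:
            counts.append(n)
    return counts


for name, value in [("Isaac", 1.9444444), ("Ophelia", 9.1666666)]:
    print(name, plausible_rater_counts(value))
```

Nine does not appear in either list, so neither reported average can come from nine integer ratings.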

Predictions versus outcomes in 2013?

In the last 5 years, I have made a point of giving clear predictions on complex socio-economic issues. I give predictions partially to improve my own understanding of humanity: nothing sharpens the thoughts as much as having to actually predict something. Another reason is as a means of helping my countries (Australia/the Netherlands) understand the world: predicting socio-economic events is what scientists are for!

Time to have a look at my predictive successes and failures over the last few years, as well as the outstanding predictions yet to be decided. Let us start with what I consider my main failure.

Failed predictions

The main area I feel I haven’t read quite right is the conflict in Syria, as part of the general change in the whole Middle East. I am still happy with my long-run predictions for that region, where I have predicted that urbanisation, more education, reduced fertility rates, and a running out of fossil fuels will lead to a normalisation of politics in a few decades’ time. But at the end of 2012 I was too quick in thinking the Syria conflict was done and dusted. To be fair, I was mainly following the ‘intrade political betting markets’, which were 90% certain Assad would no longer be president by the end of this year, but the prophesied take-over of the country by the Sunni majority has not quite happened. The place has become another Lebanon, with lots of armed groups defending their own turf and making war on the turf of others. The regime no longer controls the whole country, but is still the biggest militia around.

What did I fail to see? I mainly over-estimated the degree to which the West would become involved. I expected the Americans and the Turks to put a lot of resources into the more secular militias, giving them training grounds and more modern equipment. As far as I can tell, this did happen a bit, but simply not to the degree I thought likely, and I don’t really know why. There were several attempts by the US and Turkey to identify an ‘opposition coalition’ to then support, so something hidden from view must have prevented actual support. Perhaps the US has decided it prefers Assad to the alternatives after all.

The willingness of the Iranians and Russians to support the regime has also been stronger than I thought, and the efforts of the Sunni neighbours to support the non-regime militias have been less cogent than I thought: instead of backing a clear group that had a real future in terms of leading the country (the more secular groups), foreign anti-regime support went mainly to the crazies who went along with the ideology of fanatics elsewhere. That suggests a lack of pragmatic involvement from the neighbours.

I wouldn’t call it a complete predictive failure because Syria as a country no longer exists: it now does have all kinds of regional power brokers and so one could ‘claim’ the regime indeed has lost (most of) its power, but the conflict has gone on longer than the betting markets that I went along with predicted. So this also educates me about the lack of intellectual weight of that kind of political betting market: these are probably more feel-good markets with low turnover that simply don’t aggregate much hidden information. As a related failure, I can mention that I put a low probability on the event that the Muslim Brotherhood would overplay its hand when in government in Egypt. I did mention the possibility (see later), but didn’t think it would happen.


Successful predictions

A very recent prediction of mine was on bitcoins. A month ago, I said governments were going to intervene because of the money laundering opportunities in the bitcoin network, and that it hence would not become a dominant trading currency. The next week, the Chinese came down with severe restrictions on bitcoins in their country: financial institutions were not allowed to trade in it and individuals trading in it had to register with their real names, killing off most laundering opportunities. As a result, the value of the bitcoins halved. I wouldn’t claim bitcoins are quite dead yet. It is when many other countries start to enact similar regulation (as some are doing) that it becomes an official curiosum.

Other predictions have been on various aspects of the GFC in Europe. I predicted such things as the Greek defaults when European governments were still pretending they would not occur, the survival of the Euro when there was lots of speculation on imminent euro exits, the inability of the ECB to actually meaningfully monitor banks, and the failure to get agreements on tax evasion (which have all been painfully clear in 2013). My proudest moment was to predict in December 2011 the overall trajectory of where the politics of the financial crisis was heading: support for weak new institutions in exchange for continued bailouts and forms of money printing, with national sovereignty as the sticking point preventing stronger institutions. We are still on that trajectory now, as argued in this very recent report by the Bruegel Foundation, which dryly summarises recent events: “Five years of crisis have pushed Europe to take emergency financial measures to cushion the free fall of distressed countries. However, efforts to turn the crisis into a spur for “an ever closer union” have met with political resistance to the surrender of fiscal sovereignty. If such a union remains elusive, a perpetual muddling ahead risks generating economic and political dysfunction.” The latest banking deal fits this mould perfectly.

I am also proud of my predictions on the ill-fated Monti government in Italy of 2012. Before he was in power, I predicted he was unlikely to have the personality to change anything, and within weeks of his entering government (December 2011) I said that the reforms he was talking about were dead in the water, whilst months later the magazine The Economist was still putting him up as a great reformer. Only in 2013 did mainstream media outside of Italy wake up to his failure. I am similarly looking good on my observations regarding the problems in Spain.

On the Middle East, in 2011 I picked the current Libyan chaos coming from its resource curse. A few weeks into the Arab Spring I predicted the ensuing grand coalition in 2012 between islamists and the military in Egypt, whereby the islamists would form government but with a tacit agreement with the military not to interfere with the economic interests of that military. I also predicted that the torture machine of the Egyptian military would first deal with the urban youth and then become oriented towards the islamists should they step out of line, which they did.

The main prediction I have been making since 2007 (and which has gotten me into the most trouble!) is the uselessness of looking for a world coalition to reduce CO2 emissions, mainly because the temptation to free-ride is irresistible both within countries and between them. I have thus consistently called to forget about emission strategies and to instead think of technological advances, geo-engineering and adaptation. In each year since 2007, developments have been in line with this: steady increases in actual emissions, with a growing number of scientists and research groups thinking more seriously about geo-engineering. Previous agreements on emissions have not been kept and new ones are toothless, whilst you get many beautiful political speeches designed for consumption by the gullible during each new conference on the issues.

In 2013 for instance, the Japanese reneged on their earlier Kyoto promises because they decided to switch from nuclear to fossil, following on from a previous reneging by Canada. Similarly, the EU watered down its commitments in order not to upset the German car industry, whilst China and India and others helped prevent emission agreements with any bite. A nice write-up of the recent Warsaw talk-fest can be found here. Conspicuous in that write-up is the increased awareness of the importance of adapting to climate change, and the degree to which hope lies with new technology, not massive emission reductions under existing technology. The Australian deal with the EU trading scheme, which was all smoke-and-mirrors anyway, has fallen through, essentially replaced with a policy of ‘business as usual till the bigger players come up with a plan’, which I see as a sensible policy for Australia at the moment.


Predictions on the ledger

In many ways, the ‘emission controls are hopeless’ prediction is a running prediction for decades, so that one is very much still on the ledger. It is also one on which I am quite willing to bet against those who say they believe serious emission reductions will come about via emission markets or other controls.

Another prediction coming ‘half-good’ recently is the bet with Andrew Leigh on happiness and incomes in rich countries, where my prediction was that richer countries getting even richer would not get happier. For the data we agreed to look at, this indeed held, but more because I got lucky with the data available; other data showed different results. Read about it in my recent blog on the topic by following the link!

Another prediction ‘on the ledger’ is that there is going to be no real change in Chinese politics till several years after they run out of easy growth opportunities, say 20 years from now. After that, I predict stronger and stronger pressure to adopt a Western-style political system from the Chinese business community. I gave a possible trajectory for how it might happen (local experimentation growing into national systems), but that is not the only way change might happen, if it happens at all. The prediction is the consolidation of the one-party rule till years after the growth has levelled off. That consolidation has indeed been in full swing this last year: as a recent piece by the Institute of Peace and Conflict Studies argues, in 2013 we got more media control and more control over the economy by the party. Still, there are some embryonic signs of attempts to get some kind of separation of powers in that country, such as via a more independent judiciary and more independent financial institutions.

The prediction that the ‘behavioural genetics’ crowd is going nowhere soon is also a prediction ‘on the ledger’. The same goes for the prediction that Australia is not going to seriously improve its education-for-the-masses anytime soon, and the unlikelihood of solar replacing fossil fuel for mass electricity-generation anytime soon.

There is then a whole heap of predictions that I am quite happy to say have come true, but where it is also a certainty someone else would disagree. For instance, I predicted that the Melbourne Model, which is a change in how the University of Melbourne structures undergraduate education, would lead to dumbed-down degrees. Everything I hear about that place confirms it, but I would be astounded if the chancellery of the University of Melbourne would agree with that assessment! Similarly, my stated fears regarding the Gonski reforms (not quite predictions as I made it clear I had a hard time finding out what was actually going to happen) are looking all-too-true, but I am sure the ministries involved would disagree. One can trawl my archives for several more such ‘debatable’ prediction outcomes.

Finally, I have a bet on with Conrad Perry for what is going to happen in Egypt next. My prediction is that the next elected government will again be an islamist-led government, a kind of Brotherhood 2.0. They may change labels and be even more careful, but I thought it likely that they would be involved as a dominant player in the next elections simply because of the high level of religiosity in that country. Conrad Perry bets on ‘all other outcomes’ with a bottle of red to the winner. Jim Rose also made an implicit prediction, which is that the new generation of military are going to be successful in their bid to monopolise power in Egypt, but he didn’t bet anything. Still, Jim is looking rosy on that prediction.

The prediction+bet with Conrad on Egypt was entered into around August/September and things have moved on a bit since then. The Egyptian military has proven more popular and bent on total control than I thought, but we are still looking at a situation in which one is likely to get democratic elections (though the military might well rig them). I will say I am less confident about my prediction now than 3 months ago, essentially because the military has been more brutal than I thought they would be, but there is still a chance for my prediction to happen so I am not ready to concede defeat on that one yet!

Rich countries and happiness: the story of a bet.

Do countries that are already rich become even happier when they become yet richer? This was the essential question on which I entered a gentleman’s bet in 2004 with Andrew Leigh and which just recently got settled.

The reason for the bet was a famous hypothesis in happiness research called the Easterlin hypothesis, which held that happiness did not increase when rich countries became even richer. When I was preparing a presentation on this matter in 2004 I used the following graph to illustrate the happiness-income relation across countries:

[Graph: average happiness (0-10 scale) plotted against average income (GDP in purchasing power terms), by country]

This graph shows you the relation between average income (GDP in purchasing power terms) and average happiness on a 0-10 scale for many countries. As one can see, the relation between income and happiness is upward sloping for low levels of income, but becomes somewhat flat after 15,000 dollars per person. I championed the idea that this was not just true if you looked across countries, but that this would also hold true over time.

Andrew Leigh’s thinking was influenced by other data, particularly a paper by Stevenson and Wolfers which, he thinks, debunks the Easterlin hypothesis. Here’s one of their graphs:



What’s striking about this graph is that the dotted line slopes up in the top right corner. In other words, the relationship between happiness and income becomes stronger, not weaker, for countries with average incomes over $15,000. Andrew thinks that this is because they specify income in log terms (in other words, we’re looking at the effect on happiness of a percentage increase in income rather than a dollar increase in income). I think it’s because the Gallup poll isn’t measuring happiness, but is instead asking people to rank themselves on the Cantril ladder of life scale.

So our gentleman’s bet was in effect a bet on whether happiness in the World Values Survey behaved differently from the ladder question of the Gallup polls, and on whether the short-run relation between income and happiness was strong enough to show up in periods of 5 to 10 years as well. Andrew thought it would; I thought 5-10 years would be long enough for the typical long-run no-effects findings to show up and that happiness has a different relation with income than the Cantril question. So we bet on whether one would get a significantly positive relation between GDP growth and happiness changes for the rich countries when one looked at the World Values Survey data for 2005. We agreed to look at the relation between income and happiness using country-average variation. The winner would get 100 bucks.

Now, both of us forgot about the bet for a few years while waiting for the data to become available. Only recently did Andrew remind me of our bet and ask me to check what had happened.

When I (with research assistance from Debayan Pakrashi) started to look into this data again, it quickly became apparent that Andrew and I had been pretty sloppy in formulating the precise conditions of the bet. In many ways, our bet had been far too vague.

For one, the World Values Survey is not in fact held in particular years. Rather, some survey is run almost every year in some country that adds to the collection of surveys known as the World Values Survey. Hence there was really no such thing as a ‘2005 wave’. Taken literally, only Australia, Finland, and Japan had a survey in 2005 and were countries that in the previous wave already had a GDP of 15,000 dollars per person. In all those countries, income had gone up a lot since their previous survey, with Australian happiness down and Japanese and Finnish happiness up. That is a bit meagre as ‘waves’ go.

So the first ‘addition’ was to have a bandwidth of years for the ‘2005’ waves that included 2004, 2005, 2006, 2007, and 2008. That gave 12 countries that were rich enough in the previous wave to qualify. The raw data was:


The next ‘snag’ was of course that there is more than one way to define the dependence on income, for instance linear or logarithmic. With logarithmic income one normally gets stronger statistical significance on income, so we went for logarithms.

Then, of course, there are still many other things one can put into the regression. Does one account for effects of particular years (in bands) and for the level of happiness that a country starts with? We decided to try it all. Hence the final ‘deciding’ set of regressions was as follows:



Which tells you that the relation between income changes and happiness changes (the last two columns) was either quite insignificantly positive or even negative if one entered year-bands.

When one reflects on the list of countries used in the analysis though, it is clear that the outcome of the bet will have had little to do with the true relation between income and happiness. It will have hinged on hidden aspects of the data. For instance, the Australian World Values Survey in 1995 was run differently from the 2005 version. Hence the big drop in Australian happiness you see in this period for this data does not in fact show up in other Australian data (like the HILDA). So one suspects some change in the data-gathering to be responsible for it. Indeed, the level of Australian happiness in this data is markedly below the level found for the HILDA (where it is almost 8.0).

Similarly, the big increase in Japanese happiness in this period doesn’t show up either in other Japanese data and so probably has something to do with changes in how the survey was run there. The changes can relate to the months in which the surveys were held, the precise words used for the happiness question, the questions preceding the happiness questions, the cities in which the survey was run, how the survey was run (face-to-face or via telephone), etc.

So I may have gotten lucky and won the bet, but one cannot see the outcome as decisive evidence that income and happiness have no long-run relation within rich countries. The data for the 2010 post-GFC wave might well show the opposite!

The water you drink has been piss at least 10 times already!

Last Thursday I posed the question of how often the water you drink has already been pissed out by a vertebrate. If the number is very small, then those who baulk at drinking recycled water have more cause to complain than if the number is very high.

As some commentators on that post pointed out, in reality we are all drinking water that includes some recycled piss: every dam from which we drink has ducks, lizards, and all sorts of animals pissing and shitting in it, so it is already a bit of a myth to think one can drink water that has not been recently mixed with piss. Still, as another comment revealed, many think the idea of copying Singapore and drinking water that is officially recycled sewage is ‘gross’. So the question of how often water has been piss in the past still matters for the ‘yuk factor’.

The answer comes from a very simple formula, which requires a few guesstimates as inputs:

Piss ratio = (total water pissed) / (total water)
= (total vertebrate biomass ever lived * piss rate) / (total water)
= (average biomass vertebrates * piss rate per year * years of vertebrates) / (total water)

This simple formula thus boils down to 4 inputs for which we can search for good guesstimates.

The amount of water on the planet (total water) is the easiest one because it is the sort of thing geologists and physicists are good at estimating. As this linked article computes, there are around 1.386 billion cubic kilometers of water on the planet. Whilst it is true that this water comes in various forms, that is not relevant for the calculation: since we are considering hundreds of millions of years, it doesn’t matter how much of that water is currently salt, fresh, stored in ice, or whatever. Compared to such long time horizons it all circulates pretty fast, so there is no problem in taking it all as one blob of water.

I can already say that my best guess for how much water we humans have pissed during our existence is around 800 cubic kilometers, meaning that the average water molecule has only about a one-in-2-million chance of ever having been pissed out by a human. So we might be drinking reconstituted piss, but not much is reconstituted human piss.

Now, onto the other three inputs into our crucial equation. What is the average wet biomass of vertebrates? If we take the present as a reasonable guess for how much vertebrate biomass the earth continuously houses, then the answer we can glean here is around 10% of total animal biomass (zoomass), or in the order of 5 billion tonnes of wet biomass (a lot more than dry biomass, which you will often see reported). This includes up to 2 billion tonnes of dry-biomass fish, a little under half a billion tonnes of human, close to a billion tonnes of things we might eat that walk on land (cattle and such), and 2 billion tonnes of other wet biomass. In turn, this is in the order of one thousandth of total biomass.

Admittedly, the estimate of 5 billion tonnes of wet vertebrate biomass may be out by a factor of 2 or so, but it may well be an under-estimate since I only found a dry biomass estimate for fish.

Then the next part of the equation: how much does a vertebrate piss per year? Again, this turns out to be a tricky question because only birds and mammals produce concentrated urine like we do. The rest pisses much weaker stuff, though things like fish still produce ammonia and the other normal elements of piss because the basic physiology is not that different between us and a fish. So the process and form of piss is not the same across species but the substances produced by our bodies and eventually excreted somehow are not that dissimilar.

So we need to slightly alter the definition of what we are looking for and think of piss as a ‘human-like’ substance. We can then again take a conservative approach and not count the watery piss that fish produce as ‘100% piss’ but rather as a much weaker variety of what we produce. We can then take ourselves as the measure of what a body produces and simply scale up, getting an easier question to start out with: how much do we humans piss in a year? The answer turns out to be that we piss around 1.5 liters per day, or a little over 500 liters per year. Another way to put this is that we piss out 8 to 9 times our weight in wet biomass per year.
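The step from liters per day to body weights per year can be checked with a few lines of arithmetic. This is only a rough sketch: the body mass of 65 kg and the density of 1 kg per liter are assumptions introduced here for illustration and do not come from the sources linked above.

```python
# Rough conversion of the daily human piss rate into body weights per year.
LITERS_PER_DAY = 1.5            # figure quoted in the text
DAYS_PER_YEAR = 365
ASSUMED_BODY_MASS_KG = 65.0     # assumption: rough average human body mass
ASSUMED_KG_PER_LITER = 1.0      # assumption: piss is about as dense as water

liters_per_year = LITERS_PER_DAY * DAYS_PER_YEAR
body_weights_per_year = (
    liters_per_year * ASSUMED_KG_PER_LITER / ASSUMED_BODY_MASS_KG
)

print(f"{liters_per_year:.1f} liters per year")
print(f"{body_weights_per_year:.1f} body weights per year")
```

With a lighter or heavier assumed body mass the multiple shifts accordingly, which is why a range of 8 to 9 is a reasonable way to state it.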

Then onto the last unknown, which is the number of years that vertebrates have been around in the abundant form of life we have now. Again, a tough one. The earth is now quite a bit cooler and probably less fertile than it was in the times of the dinosaurs, so the amount of biomass walking around now is probably quite a bit less than it was in the more productive phases of earth, but by the same token for much of the earth’s inhabited history the inhabitants were bacteria and not things with spines. If we concentrate on the period of the vertebrates, the best guess is that fish arose some 500 million years ago, whilst land was conquered by vertebrates some 380 million years ago. Taking a conservative guess for the total period of time that the volume of vertebrates we have now has been present, this means that the wet vertebrate biomass we have now has occupied earth for around 350 million years.

We can now put the pieces together to compute our piss ratio: 350 million years of 5 billion tonnes of wet vertebrates pissing 8 times their body weight per year equals 14,000 million cubic kilometers of piss. This means the atoms in your average water molecule will have been concentrated piss some 10 times already. And that is a conservative estimate. In the more likely scenario, there would have been more like 10 billion tonnes of vertebrate biomass on average, pissing 10 times their own body weight, living 400 million years, equating to water having been piss around 25 times already.
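For those who want to redo the sums, here is a minimal sketch of the piss ratio formula with the guesstimates used in this post. The function and variable names are made up for illustration, and the conversion assumes a tonne of piss takes up one cubic meter, as water does.

```python
# Back-of-envelope piss ratio, using the guesstimates from the text.
TOTAL_WATER_KM3 = 1.386e9   # total water on the planet, cubic kilometers
M3_PER_KM3 = 1e9            # cubic meters in one cubic kilometer


def piss_ratio(biomass_tonnes, body_weights_per_year, years,
               total_water_km3=TOTAL_WATER_KM3):
    """Return (piss volume in km^3, times all water has been piss).

    Assumes one tonne of piss occupies one cubic meter (density of water).
    """
    piss_tonnes = biomass_tonnes * body_weights_per_year * years
    piss_km3 = piss_tonnes / M3_PER_KM3   # tonnes -> m^3 -> km^3
    return piss_km3, piss_km3 / total_water_km3


# Conservative scenario: 5 billion tonnes, 8 body weights/year, 350 million years.
conservative_km3, conservative_ratio = piss_ratio(5e9, 8, 350e6)

# More generous scenario: 10 billion tonnes, 10 body weights/year, 400 million years.
generous_km3, generous_ratio = piss_ratio(10e9, 10, 400e6)

# Human share: roughly 800 km^3 of human piss out of all water.
human_share = 800 / TOTAL_WATER_KM3

print(f"conservative: {conservative_km3:.3g} km^3, ratio {conservative_ratio:.1f}")
print(f"generous:     {generous_km3:.3g} km^3, ratio {generous_ratio:.1f}")
print(f"human share:  about 1 in {1 / human_share:,.0f}")
```

The conservative inputs reproduce the 14,000 million cubic kilometers and a ratio of about 10. The more generous inputs give a ratio in the high twenties under this formula, the same ballpark as the rounded figure quoted above, and the human share comes out at a bit under one in 2 million.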

Perhaps equally interesting, I can give some idea of how often the water has been piss from a particular group of vertebrates. Starting from the best guess estimate, water has been fish piss some 10 times, mammal piss around twice, and other forms of piss 13 times. Only a trickle has been monkey piss.

As per usual, champagne to all those who thought the answer was ‘often’ (which is all commentators game to give a guess). Unflavoured recycled filtered desalinated naturalised piss for the rest!

Thoughts on “Thinking, fast and slow”

I couldn’t resist buying a copy of Daniel Kahneman’s best-seller when returning from holidays. Several friends and colleagues told me it was a great book; it got great reviews; and Kahneman’s journal articles are invariably a good read, so I was curious.

Its general message is simple and intuitively appealing: Kahneman argues that people use two distinct systems to make decisions, a fast one and a slow one. System 1, the fast one, is intuitive and essentially consists of heuristics, such as when we without much thought finish the nursery rhyme ‘Mary had a little…’. The answer ‘lamb’ is what occurs to us from our associative memory. The heuristic to follow that impulse gives the right answer in most cases but can be led astray by phrases like ‘Ork, ork, ork, soup is eaten with a …’. Less innocuous examples of these heuristics and how they can lead to sub-optimal outcomes are to distrust the unfamiliar, to remember mainly the most intense and the last aspect of an experience (the ‘peak-end rule’), to value something more after possessing it than before possessing it (the ‘endowment effect’) and to judge the probability of an event by how easily examples can come to mind.

System 2, the slow way to make decisions, is more deliberative and involves an individual understanding a situation, involving many different experiences and outside data. System 2 is what many economists would call ‘rational’ whilst System 1 is ‘not so rational’, though Kahneman wants to have his cake and eat it by saying that System 1 challenges the universality of the rational economic agent model whilst nevertheless not wanting to say that the rational model is wrong. ‘Sort of wrong sometimes’ seems to be his final verdict.

Let me below explore two issues that I have not seen in the reviews of this book. The first is whether or not his main dichotomy is going to be taken up by economics or social science in the longer run. The second, related point is where I think this kind of ‘rationality or not’ debate is leading. Both issues involve a more careful look at whether the distinction between System 1 and 2 really is all that valid and thus the question of what Kahneman ultimately has achieved, which in turn will center on the usefulness of the rational economic man paradigm.
