Do we know in social science what it is that we are measuring, or does any bit of data we look at reveal more complexity on closer inspection, no matter how closely we look, just like a fractal? Another way to put this is to ask whether anything we measure is solid, concrete and absolutely trustworthy: a true ‘fundamental’ on which one can build further theories.
Let us think about this by unpacking particularly important statistics at deeper and deeper levels so as to see whether we get anything concrete the closer we look. I will start with GDP and end up with brainwaves.
Let us start with Gross Domestic Product. It is now measured for nearly all countries in the world and GDP series for some countries go back for centuries. Angus Maddison even constructed series going back to the Roman Empire. GDP levels and changes in them are used millions of times per day in official documents, research reports, cross-country analyses, newspapers, blogs, etc.
You would then think that something as fundamental and widely used as GDP would be like a solid rock of data: something you can count on to be ‘correct’ and to measure something important. Let us see.
How is GDP constructed and what is it? In Australia, one can make the case that no-one really understands where the whole number comes from, even though the Australian Bureau of Statistics gives a quarterly number. You see, it is built up from statistics from many different sources involving thousands of data gatherers, as you would expect from an aggregate number about the whole economy: wage information from all the sectors of the economy goes into it, import and export information from the trading institutions goes into it, tax information regarding investments and capital write-offs goes into it, etc. Essentially all the sources of production and income that involve money get sent to a GDP data creation group via several detours involving separate units in other parts of government that compile such things as trade figures and capital accounts. The underlying statistics thus come from many groups, and I am confident in saying there is probably no single individual in Australia truly on top of how all that underlying data came about.
Still, the fact that there is probably no-one who really knows where every aspect of the number comes from does not necessarily invalidate it, though it does mean one has to trust a whole lot of people to have measured ‘the right thing’ in terms of data they sent on.
Once sent to a central place, a kind of magic is let loose on the bundled production and income information. All kinds of statistics ‘needed’ to get a GDP number, but for which there was no information, are imputed, i.e. they are made up. This is partially for good reasons: many statistics feeding into GDP are not in fact collected quarterly, such as income tax information. Many sources of information have missing data, such as on historical capital levels or levels of imports and exports. The underlying reasons for such gaps often have little to do with supposed laziness of the central statisticians: it is simply not so easy to know what the current value of container freight in the harbour of Sydney is, and delays in data processing are partially due to honest mistakes by companies or ships that run late.
The magic does not stop at guessing a few numbers. The GDP people ‘correct’ for how many Sundays there are in their data, what the timing of the holiday was, and many other seasonal influences. After all, the general feeling is that the accident of having, say, 2 more public holidays in a quarter in one year than the next year should not be seen to be the 2% drop in production that this in reality entails. Hence, a kind of ‘holiday smoothing’ takes place. So even conceptually the GDP figures you are fed in the media are not really the ‘level of production in a quarter’. Also, because a lot of information arrives late, is imputed, or is downright distrusted, the GDP people use information on GDP in both the previous quarter and in the subsequent quarter to inform them of what GDP in one particular quarter ‘should have been’.
Think of what this kind of forward-and-backward looking does: it automatically makes GDP look like a cycle. The series starts to look like a well-behaved wave simply because it is constructed as a complicated weighted average of forward and backward looking information, even if in the ‘raw data’ GDP is much more erratic. Moreover, it means that there are ‘vintages’ of GDP for any moment in time as more information becomes available. There is not just one number for GDP in the first quarter of 2013, but there will be many ‘updates’. So our guess in 2013 as to what GDP is in 2013 differs from the guess of 2013 GDP that will be made in 2014, 2015, etc. And these vintages can sometimes vary an awful lot (see for instance here), meaning that a year which seemed to be a boom year can later on be said to be a recession year, and then later again a boom year. 1948, for instance, is a year that only became a recession year decades later due to imputations of capital series.
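To make the smoothing point concrete, here is a minimal sketch in Python. It does not reproduce how any statistical agency actually builds GDP: the pure-noise ‘raw’ series, the five-quarter window and the function names are all assumptions chosen for illustration. It only shows that averaging each quarter with its neighbours on both sides makes a patternless series look persistent and wave-like.

```python
import random

def centered_moving_average(series, half_window):
    """Average each point with its neighbours on both sides.

    The window shrinks at the two ends of the series, where fewer
    neighbours are available.
    """
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - half_window)
        hi = min(len(series), i + half_window + 1)
        window = series[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed

def lag1_autocorrelation(series):
    """Correlation between each value and the value one period later."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1)
    )
    return covariance / variance

# Hypothetical 'raw' quarterly growth: pure noise, no cycle at all.
rng = random.Random(0)
raw_growth = [rng.gauss(0.0, 1.0) for _ in range(400)]

# Assumed smoothing rule: two quarters back, two quarters forward.
smoothed_growth = centered_moving_average(raw_growth, half_window=2)

raw_ac = lag1_autocorrelation(raw_growth)
smoothed_ac = lag1_autocorrelation(smoothed_growth)
print(f"lag-1 autocorrelation, raw:      {raw_ac:.2f}")
print(f"lag-1 autocorrelation, smoothed: {smoothed_ac:.2f}")
```

The raw series has essentially no memory from one quarter to the next, while the smoothed one is strongly persistent. A model fitted to the smoothed series would ‘find’ that persistence even though the smoothing rule put it there.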
Background documents telling you how GDP and its components are constructed run into the thousands of pages (glance here for instance). And even they do not have all the information you would need to get on top of it, for these handbooks tell you about data manipulations and definitions, not the construction of the ‘raw data on the ground’. Hence, if you look closely, the whole construct of GDP starts to look more and more shaky, quite apart from whether it measures ‘aggregate production’ (which is another discussion, usually answered with ‘no’ as GDP fails to measure all the goods and services that are outside of the tax system, such as the environment and home production). It is not a solid and unchanging number if we zoom in, but rather a moving target. We thus don’t really know what the number means. We just know we want it to be high.
You should thus realise that almost any seminar you go to where people make a big deal about changes in GDP from one quarter to the next is dependent on these statistical conventions and tricks: a lot of STAR, VAR, and other models ending in ‘AR’ (which are estimated on a daily basis at the RBA) that are used to analyse quarterly variation in GDP run the risk that they might merely re-discover how the data was constructed rather than uncovering something deep and meaningful about our economy.
Let us then look a step deeper and pick out a very small particular aspect of what ‘should be’ in GDP, say education production. Now, intuitively you might think that production in education should be measured by how much is learned and thus some measure of the increase in knowledge held by those educated. Alas, no. That is far too hard to measure. Why is it too hard to measure? Think about it: a kid does not just learn from teachers, but also from peers, parents, own discovery, tv, etc. How would you then assign any measured increase in knowledge to the teachers? Basically impossible, so we don’t even try.
So how do we then measure educational production? Simple: we count the costs of education. So we add up all the salaries of the teachers and administrators and all the costs of the building and all the costs we see of the libraries and books. And we then call it production.
What does this mean? Think about it: whether or not there is any learning, the salaries of the teachers count, even if they did not show up at school and had no pupils. It is not production in any sense except in the sense that it counts for GDP. Worse, increases in the measured costs turn up as increases in production. Hence a general increase in property prices will show up as increased value of school grounds and increased costs of building schools, and thus as an increase in education production. Note also how fragile and hard to measure this ‘property price’ aspect is: it is not easy to know what the current property prices are because not all buildings get sold every day and the price of any building can vary simply because different people showed up to buy based on fairly accidental circumstances. So not only does one have to guess the prices in reality but one has to be wary of measured prices too. Note also that of course it is not easy to say who is a teaching administrator and who administrates something else, such as, say, the ministry of education building, so there is a large amount of fudge in terms of what counts towards education versus other things.
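The arithmetic of counting education at cost can be sketched in a few lines of Python. All the figures below are hypothetical, the function name is made up for illustration, and the sketch works in nominal terms only, ignoring any price adjustment statisticians may apply afterwards.

```python
def education_output_at_cost(salaries, building_costs, materials):
    """Measured 'production' is simply the sum of what was spent."""
    return salaries + building_costs + materials

# Hypothetical year 1: same teachers, same pupils, same learning.
year1 = education_output_at_cost(
    salaries=100.0, building_costs=40.0, materials=10.0
)

# Hypothetical year 2: property-related costs rise by 25%,
# and nothing else changes.
year2 = education_output_at_cost(
    salaries=100.0, building_costs=40.0 * 1.25, materials=10.0
)

measured_growth = (year2 - year1) / year1
print(f"measured education 'production' growth: {measured_growth:.1%}")
```

Under these assumed numbers, measured education ‘production’ grows by about 6.7% although nothing about teaching or learning has changed.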
Once again therefore, educational production disintegrates as a solid measured concept if you look closely at how it is measured. Not that that prevents ‘net human capital stock’ from being compared over time and across countries for a century, but by now you should realise that lack of certainty in what data means does not stop it being used. You can guess what the underlying uncertainties mean for research into such things as the ‘cost-effectiveness of education’: if you don’t actually measure anything looking like real production then good luck with being very precise about cost-effectiveness of that production. Indeed, I hope from now on you look with a bit more scepticism any time you see an analysis of ‘GDP and education’, of which there are thousands.
Let us go to the next measurement layer and consider something as seemingly clear and fundamental as being female, which the vast majority of workers in education in Australia are. Surely here we have something solid: barring a few mistakes we can’t get gender wrong, can we?
Think again. If you zoom in to the concept of gender, it is suddenly not clear at all what one is measuring: is being a female about not having overt male genitalia? Is being a female about wearing a dress? Is being female about being less aggressive and looking after the kids?
Put this way, it should become clear to you that measuring gender on an all-or-nothing basis, which is what statisticians do, is a complete misnomer. Not all females are equally small, nor do they have equal levels of testosterone, nor do all have caring roles inside their families, nor do they perceive themselves in the same way as every other female, etc. Indeed, even if you zoom in on just the supposed genetic basis of ‘gender’ you do not get clarity: not all women are XX and not all men are XY. You do not just have all kinds of ‘inbetweens’ (XXY’s and the like), but even within the XX and the XY groups there is actually a huge amount of variation as to how many ‘gender relevant genes’ are ‘turned on or off’ depending on things as trivial as a good night’s sleep, and of course there are actual genetic differences within the genders: genes, turned off or on, differ between people.
Even the things that all ‘females’ truly have in common disintegrate when one looks closely: there are laws particular to ‘females’ (e.g. until recently they could not fight as frontline soldiers), there are toilets just for them, and as a quick label assumptions are made about their roles in life. These are solid things, no? Well, these things too differ in time and across space: the assumptions made about gender differ from year to year and street by street. Army duties and possibilities change over time and by army unit. Even toilets vary, as do the amenities in toilets. So on closer inspection even the things that seem truly the same for all females are not in fact the same over time and across space.
So what the statistician conveniently lumps into the all-or-nothing variable ‘female’ actually disintegrates as a solid concept if you zoom in. Note that this does not prevent nearly all empirical social scientists from happily putting ‘gender dummies’ in their empirical equations and tables as if it is a solid concept meaning the same thing for all entities labelled as ‘females’ in all years in all countries. Strictly speaking, as with all the higher-up concepts like GDP and educational production, this means that social scientists use variables with a high degree of non-random measurement error in their analyses. To a purist, this invalidates all of them, not that that has ever stopped us running these analyses. And yes, I run regressions with ‘female dummies’ all the time so I cannot claim to be holier than any others in this regard. But if we are truly anally scientific about this, using gender dummies instead of explicitly recognising that it is a variable that only with a great degree of non-random measurement error might measure some underlying fixed construct means we cannot be assured of the robustness of any estimation results involving a gender variable. The only good news to this damning reality is that the same problem occurs with any other variable we put into our analyses.
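A small simulation can illustrate why non-random measurement error in a dummy variable matters. Everything in this Python sketch is an assumption made up for the illustration: the ‘true’ underlying group, the size of the true gap, and the rule by which the recorded label departs from the underlying one. It is not a model of how gender is actually recorded anywhere.

```python
import random

def mean(values):
    return sum(values) / len(values)

def group_gap(labels, outcomes):
    """Difference in mean outcome between label 1 and label 0.

    With a single dummy regressor this equals the OLS dummy coefficient.
    """
    ones = [y for d, y in zip(labels, outcomes) if d == 1]
    zeros = [y for d, y in zip(labels, outcomes) if d == 0]
    return mean(ones) - mean(zeros)

rng = random.Random(1)
TRUE_GAP = 1.0  # assumed effect of the underlying construct
n = 40000

# Hypothetical underlying construct and an outcome that depends on it.
true_labels = [rng.randint(0, 1) for _ in range(n)]
outcomes = [TRUE_GAP * z + rng.gauss(0.0, 1.0) for z in true_labels]

# Assumed non-random recording error: members of group 1 with a low
# outcome are recorded as group 0 half of the time.
recorded_labels = []
for z, y in zip(true_labels, outcomes):
    if z == 1 and y < 0.5 and rng.random() < 0.5:
        recorded_labels.append(0)
    else:
        recorded_labels.append(z)

gap_true = group_gap(true_labels, outcomes)
gap_recorded = group_gap(recorded_labels, outcomes)
print(f"gap using the underlying construct: {gap_true:.2f}")
print(f"gap using the recorded dummy:       {gap_recorded:.2f}")
```

In this set-up the recorded dummy overstates the gap, because the recording error is related to the outcome itself. With a different error rule the bias could just as easily run the other way, which is precisely why such error cannot be waved away as harmless noise.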
Zooming another layer deeper, let us now think of the brain activity in a particular part of females, say the cerebellum. The uninitiated social scientist might think we have finally arrived at a level where we get precision in measurement and interpretation, and I regularly meet social scientists excited by the certainty they soon expect brain scans to give us that was not found at any other level.
Alas, not so. Not only does the cerebellum alone have more than 50,000,000,000 neurons, but each of these neurons has about 10,000 connections (dendrites) to other neurons, including neurons in the rest of the brain. Do we measure the individual electrical and chemical currents between all these individual neurons and connections? Of course not. All we manage to measure, and even then with great difficulty, is how much total activity there is in whole areas of the cerebellum. In these large areas, we are effectively summing activity over billions of neurons in thousands of functional groups, meaning that all we measure is the aggregate of thousands of more specialised groups.
Even the much smaller functional groups (whose individual activity we rarely measure) have all kinds of functional roles, including motor memory but also involvement in emotions and spatial awareness. A single group might have as small a basic role as calculating how to flex one of the twenty-odd muscles in the hand, yet the brain does not really work on the basis of a single clump of cells calculating something in isolation from other groups. This is basically because a functional group includes the various roles of all its individual neurons, which are connected all over the brain, making the functional role connected to the whole of the brain. You should thus not be surprised to know that activity in the cerebellum has some link to human emotion (movement IS sensual!). Hence, the cerebellum has hundreds of functional roles and is involved in literally millions of different pattern-recognition activities. Good luck truly unpicking something as highly aggregated as the activity in a large area of the cerebellum! Indeed, as the pattern of connections differs for any two humans based on their life experiences, no two brains are the same.
Reflect further on how integrated any individual mental activity is, such as a particular emotion (which the cerebellum is also involved in): emotions are connected to evaluations of circumstances against a historical database and all kinds of mental habits. In a sense, the displayed emotion thus depends on not just things like self-image and previous experience but also on all the immediate inputs of the whole body and sensory experiences. Hence, what the cerebellum does depends on the whole of the rest of the brain, what someone had to eat, the circumstances in the womb of the mother, and the intensity of light in the room.
I hope you can see that measuring all that to a degree that we would truly understand the cerebellum is as hopeless as correctly measuring and truly understanding GDP. Indeed, if that female is a teacher of economics her understanding of GDP will be involved in her cerebellum activities!
So not only do we not measure what ‘truly’ goes on in the cerebellum, but even conceptually it would require a being far smarter than us to interpret and use what really happens in the cerebellum.
So for the whole spectrum of social science, from what we measure at the grand aggregate level (such as GDP) down to the smallest measure of activity we have (brain patterns), we end up not truly knowing what we are measuring when we zoom in. Every time we look closer, the uncertainty is just as big as when we zoomed in at any higher level. Human life and human behaviour seem like a fractal: they do not get any clearer the more you zoom in. It is really very frustrating.
What does all this mean for how to do ‘proper’ social science and how to interpret all our claims of certainty? It means that all our stories based on ‘fundamentals’, ‘first principles’, ‘solid data’, ‘undoubted measurement’, ‘micro-foundations’, etc. are just that: stories. They help us limited beings muddle through a reality we have no hope of fully understanding in terms of some undoubted underlying truth that we will someday measure properly. Stories of certainty are then just particularly simple stories. Potentially useful, but never 100% true.
Whilst this fundamental lack of certainty in anything we do or measure in social science does not mean that we should give up on abstractions or measurement, it does mean a bit of humility is in order.