Iran Election Statistics

How do you detect election fraud? A recent article in the Washington Post describes a novel statistical idea. It is the kind of twist on viewing the data that any Freakonomics fan should have thought of.

[DDET Read More]

Most people believe the election results in Iran were rigged. They base this on a couple of arguments. The most obvious is that Ahmadinejad did unreasonably well, especially in areas where you would expect him to poll poorly – for instance, in the home seat of his main opponent Mousavi. This argument would sway most people but it is not scientific. Ahmadinejad could still say that he ran a great campaign and that the people decided he was a safer pair of hands.

Another more statistical argument is that the variability in Ahmadinejad’s vote is too small. By too small, I think they mean less than one would expect from the regional variation that one normally sees. So this does intersect with the first argument and Ahmadinejad’s high vote in some unlikely seats. But it is a distinct statistical view in that we focus on variation across seats rather than overall mean level. Certainly, if one could actually show that the variability of Ahmadinejad’s vote was less than binomial variation this would be pretty damning and suggestive that someone had just made the figures up.

Well, it turns out that the variability of the vote is about 100 times higher than binomial – there is plenty of regional variation overall. I suppose we could compare the variability with a different benchmark – such as the variability of the vote for winners of previous elections – but the argument starts to lose force, as there are other explanations of why variability might decline.
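The comparison with binomial variation can be sketched as a dispersion ratio: the Pearson chi-square statistic for a common vote share, divided by its degrees of freedom. A value near 1 means the regions differ no more than coin-tossing would produce; a value near 100 corresponds to the situation described above. This is only a minimal sketch. The function name and the regional figures are hypothetical, not the Iranian data.

```python
def dispersion_ratio(votes_for, totals):
    """Observed spread of regional counts relative to pure binomial spread."""
    if len(votes_for) != len(totals) or len(totals) < 2:
        raise ValueError("need matching lists with at least two regions")
    p = sum(votes_for) / sum(totals)  # pooled vote share across all regions
    # Under a single common share p, region i has variance n_i * p * (1 - p).
    chi2 = sum((x - n * p) ** 2 / (n * p * (1 - p))
               for x, n in zip(votes_for, totals))
    return chi2 / (len(totals) - 1)  # about 1 if variation is only binomial


# Hypothetical regions for illustration only (NOT the real election data).
votes_for = [6200, 4100, 7300, 5150]
totals = [10000, 8000, 11000, 9000]
print(dispersion_ratio(votes_for, totals))
```

A ratio well below 1 would be the "too good to be true" signal; a ratio far above 1 only says that regions genuinely differ.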

So, if we are interested in revealing whether the election count data has been concocted, let’s focus more on the process of human beings making figures up. Humans are pretty bad at making figures up. Human-generated data typically looks too good to be true and follows theory too well. It is well known, for instance, that Mendel’s famous pea data was probably concocted, perhaps by his assistants. Forged signatures can often be recognised by experts as being too consistent. Real signatures are not perfect and vary from day to day.

Humans are especially terrible at generating random numbers. And for a large voting count, for instance 325911, which was Ahmadinejad’s count in the region of Ardabil, the last few digits should be essentially random. On the other hand, if someone were making the numbers up and not concentrating too hard on the unimportant final digits, you might expect to see some tell-tale signs of non-randomness in those final digits.

This idea is due to Alexandra Scacco and Bernd Beber, who have suggested that there is indeed such evidence in the data. They claim that human-generated random numbers tend to have too many 7s and not enough 5s. And looking at pairs of digits, they claim that human-generated digits will have too many adjacent sequences such as 23 and 76.

The data for the 2009 Iranian presidential election are HERE and a graphic of the marginal distribution of the last digit is below. There are indeed too many 7s and not enough 5s. The overall goodness-of-fit of a uniform distribution has a P-value around 8% but this may underestimate the evidence. If we concentrate on the a priori hypothesis of too many 7s and not enough 5s then the chi-square statistic is much, much stronger.
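For readers who want to try the goodness-of-fit test themselves, here is a minimal sketch using only the Python standard library. It tallies last digits, computes the Pearson chi-square statistic against a uniform distribution, and estimates the P-value by simulation rather than from chi-square tables. The function names, the seed and the number of simulations are illustrative assumptions. Apart from the Ardabil figure quoted earlier, the example counts are made up.

```python
import random
from collections import Counter


def last_digit_counts(vote_counts):
    """Tally the final digit (0-9) of each vote count."""
    tally = Counter(n % 10 for n in vote_counts)
    return [tally.get(d, 0) for d in range(10)]


def chi_square_uniform(digit_counts):
    """Pearson chi-square statistic against equal expected counts."""
    expected = sum(digit_counts) / len(digit_counts)
    return sum((obs - expected) ** 2 / expected for obs in digit_counts)


def monte_carlo_p_value(digit_counts, n_sims=5000, seed=0):
    """Share of simulated uniform digit sets at least as extreme as observed."""
    rng = random.Random(seed)
    n = sum(digit_counts)
    observed = chi_square_uniform(digit_counts)
    hits = 0
    for _ in range(n_sims):
        sim = last_digit_counts(rng.randrange(10) for _ in range(n))
        if chi_square_uniform(sim) >= observed:
            hits += 1
    return hits / n_sims


# 325911 is the Ardabil count quoted above; the rest are hypothetical.
counts = [325911, 120457, 88317, 45027, 230995]
digits = last_digit_counts(counts)
print(digits, chi_square_uniform(digits), monte_carlo_p_value(digits))
```

With the full set of 116 counts in place of the toy list, the same three calls give the overall test of uniformity.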

But is the hypothesis of too many 7s and not enough 5s really a priori? I could not find any evidence on the web for the assertion of not enough 5s, but there is some prior reason to look for too many 7s. If we just look at the excess of 7s the P-value is around 0.4%. The excess of 20/116=17.2% sevens over the expected 10% is very suspicious indeed.
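The test for an excess of 7s reduces to a one-sided binomial question: if each last digit is a 7 with probability 0.1, how likely are 20 or more sevens in 116 counts? The sketch below, which assumes Python 3.8 or later for math.comb, computes that tail two ways. The normal approximation without a continuity correction lands close to the 0.4% quoted above, while the exact binomial tail comes out noticeably larger, at roughly 1%. The function names are illustrative.

```python
import math


def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


def normal_upper_tail(k, n, p):
    """Normal approximation to P(X >= k), with no continuity correction."""
    z = (k - n * p) / math.sqrt(n * p * (1 - p))
    return 0.5 * math.erfc(z / math.sqrt(2))


# 20 sevens among 116 last digits, each expected with probability 0.1.
print(binomial_upper_tail(20, 116, 0.1))
print(normal_upper_tail(20, 116, 0.1))
```

Either way the excess is unusual, but the gap between the two figures shows how sensitive a small P-value is to the approximation used.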

One might obtain even stronger results if we concentrate only on those electorates where Ahmadinejad’s vote was likely to be poor. One assumes that the fraudsters would not alter the counts in the electorates where he won. So the random digits in these true counts might dilute the non-randomness in the fraudulent counts.

I also had a look at the last pairs of digits hoping to find something even stronger, but I could not recover the results quoted in the Washington Post article. Moreover, I could not find any evidence for their claims about adjacent sequences, though I have heard that two-digit primes tend to get preferentially chosen. Anyway, I challenge readers to look at the digits and find some really damning evidence of non-randomness (which we would have to correct for the effort you put into the search!)
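Anyone taking up the challenge on adjacent digits first has to decide what "adjacent" means. The sketch below assumes a pair is adjacent when its two digits differ by exactly 1 (23, 76), with an optional flag to also treat 0 and 9 as neighbours. That definition is an assumption made here for illustration and may not match the one used in the Washington Post article. The function names and the example counts are hypothetical.

```python
def is_adjacent(a, b, wrap=False):
    """True if digits a and b differ by 1 (optionally treating 0 and 9 as neighbours)."""
    diff = abs(a - b)
    return diff == 1 or (wrap and diff == 9)


def adjacent_fraction(vote_counts, wrap=False):
    """Share of counts whose last two digits form an adjacent pair."""
    pairs = [((n // 10) % 10, n % 10) for n in vote_counts]
    return sum(is_adjacent(a, b, wrap) for a, b in pairs) / len(pairs)


def expected_adjacent_fraction(wrap=False):
    """Share of adjacent pairs if both digits were uniform and independent."""
    hits = sum(is_adjacent(a, b, wrap) for a in range(10) for b in range(10))
    return hits / 100


# Hypothetical counts for illustration only.
print(adjacent_fraction([123, 476, 350]))
print(expected_adjacent_fraction(), expected_adjacent_fraction(wrap=True))
```

The observed share can then be compared with the expected one using a binomial tail, remembering that every extra pattern searched for weakens the evidence from any single hit.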

If I were going to fudge some numbers, I would start by multiplying the real counts by some scaling factor, or leave the last digits alone, or even use a random number generator on my mobile phone! I guess criminals and fraudsters are not always very smart.

[/DDET]

Author: Chris J. Lloyd

Professor of Business Statistics, Melbourne Business School

8 thoughts on “Iran Election Statistics”

  1. This statistical study relies on “Benford’s Law” which has been shown by the Carter Center to be inapplicable to electoral results.

    There’s no actual evidence of election fraud in Iran. See IRANAFFAIRS.COM for a list of claims and counter-claims.

  2. I think Benford’s law applies to the FIRST digit, not the last.

    I am not aware that Benford’s law has been discredited, let alone by the Carter Centre.

    An analysis of the first digit and how it compares to this law is available at http://arxiv.org/PS_cache/arxiv/pdf/0906/0906.2789v3.pdf.

    The comment above was posted by someone called hassani from a Yahoo account.

  3. Here’s a good item about Benford’s Law:

    http://www.thefreelibrary.com/I’ve+got+your+number-a054636935

    It’s important to realise Benford’s analysis is only an indicator of human tampering with data sets – it doesn’t imply malicious intent. As to its relevance for democratic elections: it doesn’t really matter. Even if the vote tally is perfectly valid statistically, the entire system remains fundamentally idiotic.

  4. From the Carter Center study on the elections in Venezuela:

    “In short, Benford’s Law does not generally apply to electoral data and even in cases where we suspect that it might apply, we find that it does not. All in all, Benford’s Law seems like a very weak instrument for detecting voting fraud. There are many reasons to believe that it does not apply to electoral data, and empirical tests suggest that deviations from the law are not necessarily indicative of fraud.”
    SOURCE: 2020.pdf
