Note: I originally published this post on the
Social Science Statistics blog at Harvard in November 2006. (I am an Institute Fellow at
IQSS, the organization behind SSS.) It was attacked and denounced by, among others,
Tim Lambert at Deltoid and
Kieran Healy at Crooked Timber. I believe that the version below matches the one that was published (and then removed), though it may not be exact. Since academics should be responsible for their prose, I republish it here.
SSS is an interesting blog, which I occasionally contribute to (example
here). I think that it would be fun if they tackled more controversial topics, but I respect Gary King's
judgment that this is not their primary mission.
In retrospect, I should have followed Gary's advice to tone down the language a bit.
--------------
The latest Lancet survey of Iraqi mortality, Burnham et al (2006), has come in for criticism. (See the Wikipedia
entry for links. See
here,
here,
here and
here for criticism.) Daniel Davies is correct when he
writes:
This is the question to always keep at the front of your mind when arguments are being slung around (and it is the general question one should always be thinking of when people talk statistics). How Would One Get This Sample, If The Facts Were Not This Way? There is really only one answer - that the study was fraudulent. It really could not have happened by chance. If a Mori poll puts the Labour party on 40% support, then we know that there is some inaccuracy in the poll, but we also know that there is basically zero chance that the true level of support is 2% or 96%, and for the Lancet survey to have delivered the results it did if the true body count is 60,000 would be about as improbable as this. Anyone who wants to dispute the important conclusion of the study has to be prepared to accuse the authors of fraud, and presumably to accept the legal consequences of doing so.
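Davies's arithmetic can be checked with a back-of-the-envelope calculation. Here is a minimal sketch, assuming (purely for illustration; the figure is mine, not his) a simple random sample of 1,000 respondents:

```python
# If true Labour support were 2%, how likely is a simple random sample
# of 1,000 respondents to put Labour at 40% or higher?
# (n = 1,000 is an assumed, illustrative poll size.)
import numpy as np
from scipy.special import logsumexp
from scipy.stats import binom

n, p_true = 1000, 0.02   # assumed sample size; hypothetical true support
k = 400                  # 40% of the sample

# Sum the binomial tail in log space to avoid floating-point underflow.
log_tail = logsumexp(binom.logpmf(np.arange(k, n + 1), n, p_true))
print(f"log10 P(poll >= 40% | true = 2%) = {log_tail / np.log(10):.0f}")
# Prints roughly -394: zero for all practical purposes.
```

Real polls use weighting and clustering rather than simple random sampling, which widens the margin of error, but not by anything like enough to change the conclusion.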
Assume, for a moment, that fraud occurred. How is it most likely to have happened? We can be fairly certain that the editors and authors did not do anything so crude as to lie about the numbers. If there is fraud, it derives from the Iraqi survey teams themselves. Consider the issues that I
raised about Roberts et al (2004), the first Lancet study.
The central problem with the Lancet study was that it was conducted by people who, before the war started, were against the war, people who felt that the war was likely to increase civilian casualties and who, therefore, had an expectation/desire (unconscious or otherwise) to find the result that they found.
Consider the Iraqis who did the actual door-to-door surveying. Do you think that they appreciated having such a well-paying job? Do you think that they hoped for more such work? If you were them, would you be tempted to shade the results just a little so that the person paying you was happy?
We know very little about these Iraqi teams. Besides monetary incentives to give the Lancet authors the answers they wanted, the Iraqis may have had political reasons as well. The paper reports (page 2) that:
The two survey teams each consisted of two female and two male interviewers, with the field manager (RL) serving as supervisor. All were medical doctors with previous survey and community medicine experience and were fluent in English and Arabic.
The field manager (RL) is Riyadh Lafta, an author of both papers. Now Lafta could be the most honest and disinterested scientist in all the world. Or he could be a partisan hack. There is almost no way for outsiders to judge. But were all the interviewers Sunni? (None of them seemed to speak Kurdish.) Were any former members of the Baath Party? Among highly educated doctors, party membership was common, even somewhat compulsory. It is unseemly to even raise these sorts of questions, and I agree that the names of the interviewers should not be released for safety reasons. But the entire paper hangs on their credibility. How can anyone know that they are telling the truth? The paper goes on:
A 2-day training session was held. Decisions on sampling sites were made by the field manager. The interview team were given the responsibility and authority to change to an alternate location if they perceived the level of insecurity or risk to be unacceptable. In every cluster, the numbers of households where no-one was at home or where participation was refused were recorded.
This is key. The interviewers could, at their discretion, change the location of the sample. How many times did they do this? We are not told, and the authors refuse to release the underlying data or answer questions about their methodology. Again, as a matter of procedure, this may be a perfectly fine way to conduct the study. Safety concerns are paramount. But there is no way for any outsider to know how "random" the sampling actually was without access to more detailed information.
From page 4:
[A] final sample of 1849 households in 47 randomly selected clusters. In 16 (0·9%) dwellings, residents were absent; 15 (0·8%) households refused to participate.
Here, finally, is a hard number that we can use to evaluate the likelihood of fraud by the survey teams. If it is typical in such surveys to have such high (99%+) contact and response rates, then there is much less to worry about. But if such a level of cooperation is uncommon, if we can't find a single similar survey with anywhere near this level of compliance, then we should be suspicious. And, once we are suspicious of the underlying data, there is no reason to waste time on the arcana of calculating confidence intervals for cluster sampling. Unreliable data means useless results.
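For concreteness, here is where the 99%+ figure comes from. A minimal sketch, under my own reading (the paper is not explicit) that the 16 absent and 15 refusing households come in addition to the 1,849 completed interviews:

```python
# Contact, cooperation, and response rates for Burnham et al (2006),
# assuming absences and refusals are on top of the 1,849 completions.
completed = 1849
absent = 16
refused = 15

approached = completed + absent + refused             # 1,880 households
contact_rate = (completed + refused) / approached     # someone answered
cooperation_rate = completed / (completed + refused)  # agreed once contacted
response_rate = completed / approached                # completed overall

print(f"contact rate:     {contact_rate:.1%}")     # ~99.1%
print(f"cooperation rate: {cooperation_rate:.1%}") # ~99.2%
print(f"response rate:    {response_rate:.1%}")    # ~98.4%
```

However the denominators are read, the contact and cooperation rates both come out above 99%, and those are the figures at issue below.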
A commentator at Crooked Timber
writes:
I have a stats background, and I’ve made a living conducting market and social research surveys for more than 25 years.
...
I’m also very worried about the fieldwork itself. I believe the reported refusal rate was 0.8% (I can’t find this in the report itself, so feel free to correct me). This is simply not believable. I have never conducted a survey with anything like a refusal rate that low, and before anyone talks about cultural differences, there are many non-cultural reasons for people to refuse to participate. If my survey was in a war-zone, I would expect refusal rates to be higher than normal.
One anonymous blog commentator is hardly an authority, but the point he raises is a factual one. What is the typical response rate for surveys of this kind? What is the highest response rate that has ever been recorded in such a survey, in any country on any topic?
In the context of US opinion polling, Mark Blumenthal
reports:
The most comprehensive report on response and cooperation rates for news media polls I am aware of was compiled in 2003 by three academic survey methodologists: Jon Krosnick, Allyson Holbrook and Alison Pfent. In a paper presented at the 2003 AAPOR Conference, Krosnick and his colleagues analyzed the response rates from 20 national surveys contributed by major news media pollsters. They found response rates that varied from a low of 4% to a high of 51%, depending on the survey and method of calculation.
That the response rates in Burnham et al (2006) would be clear evidence of fraud if they were reported in the context of US polling is not dispositive, since Iraq is different from the US and face-to-face polling is different from telephone polling.
Wikipedia claims a 40%-50% response rate for household surveys, without providing a source.
I cannot find a single example of a survey with a 99%+ response rate in a large sample, for any survey topic, in any country, ever. (If you come across such an example, please post it below.) Assume, for a moment, that there are no such examples, that no survey anywhere has ever had such a high response rate. If so, there are three possibilities.
1) The survey teams provided fraudulent data.
2) There is something different about this survey team or about Iraq at this time which makes this situation different from any other survey ever undertaken.
3) The high response rate is a once-in-a-lifetime freak event. It would not be repeated even if the same survey team took another survey.
I do not think that either 2 or 3 is very likely. Fraud in surveys, on the other hand, is all too common.
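To put a rough number on possibility 3: suppose, generously, that each contacted household independently refuses with probability 5%, far below the refusal rates typical of the surveys cited above. (Both the 5% figure and the independence assumption are mine.)

```python
# Probability of seeing 15 or fewer refusals among ~1,864 contacted
# households if each independently refuses with probability 5%.
# (The 5% figure is an assumption, deliberately generous to the study.)
from scipy.stats import binom

contacted = 1849 + 15   # completions plus refusals, per the paper
refusals = 15
p_refuse = 0.05         # assumed per-household refusal probability

prob = binom.cdf(refusals, contacted, p_refuse)
print(f"P(<= 15 refusals out of {contacted}) = {prob:.1e}")
# On the order of 1e-24: 'freak event' is not a live possibility.
```

Even cutting the assumed refusal probability to 2% leaves the chance at only a few in a hundred thousand.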