Saturday, March 10, 2007

Response Rate

Imagine that we conduct 100 independent polls, each of 10,000 people, in the US. The polls are all supposed to use the exact same procedure. Will all the response rates be exactly the same?

Of course not. One poll might have a 60% response rate and another a 64% response rate, not because of fraud or malfeasance but just because of random chance. In fact, we would expect there to be a distribution of response rates, perhaps centered around 60% with a 2% standard deviation. Although a 64% response rate is higher than that of the vast majority of the other polls, at least one poll out of the 100 needs to be the highest, just as one will be the lowest. An extreme result is no proof of fraud. This is all the more so since there is no humanly possible way to ensure that all 100 polls use the exact same procedure. At the very least, different individuals will be conducting the polls, or the same individuals will be conducting the polls on different days.
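To make the thought experiment concrete, here is a minimal simulation sketch. It assumes, purely for illustration, that response rates are normally distributed with the 60% mean and 2% standard deviation suggested above; the function name and the seed are my own choices, not anything taken from a real poll.

```python
import random

def simulate_response_rates(n_polls=100, mean=60.0, sd=2.0, seed=0):
    """Draw one response rate (in percent) per poll.

    Assumption: rates follow Normal(mean, sd). This is the
    illustrative model from the text, not an empirical fact.
    """
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n_polls)]

rates = simulate_response_rates()
lowest, highest = min(rates), max(rates)

# Some poll has to be the highest and some poll the lowest,
# even though every poll came from the same honest process.
print(f"lowest: {lowest:.1f}%  highest: {highest:.1f}%")
```

Rerunning with different seeds changes which poll is the most extreme, but some poll always is.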

But what if the results of 99 of the polls produce a nice normal distribution centered on 60% with a 2% standard deviation but the 100th poll features a 99% response rate? What would be a reasonable conclusion?

First, this could just be random. Perhaps poll results are fat-tailed, and so extreme results are to be expected. Second, this could just be an honest mistake. Perhaps the interviewers in the 100th poll mismarked the forms. Perhaps the forms were marked correctly but there was an error in the automatic reader. Third, this excessively high response rate might be evidence of fraud: perhaps the interviewers for this poll did not bother to interview anyone and just filled out the forms themselves. Without more information, it is hard to know which of these three explanations is correct or if there is something else going on.

Readers can judge for themselves, but if anyone reports a 99% response rate for a US poll, I think that the second and third explanations (honest mistake or fraud) are the most likely. I can find no evidence that poll results are fat-tailed. For determining the usefulness of the poll results, it doesn't really matter whether the problem is a mistake or a fraud. In either case, the results of the poll are not reliable.
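For a rough sense of scale, the sketch below computes how far 64% and 99% sit from the center of the hypothetical distribution. It assumes the same illustrative normal model (60% mean, 2% standard deviation) used in the thought experiment; if response rates really were fat-tailed, the normal tail probability would understate the chance of an extreme value. The function names are mine.

```python
import math

def z_score(rate, mean=60.0, sd=2.0):
    """Number of standard deviations between rate and the mean."""
    return (rate - mean) / sd

def upper_tail(rate, mean=60.0, sd=2.0):
    """P(X >= rate) for X ~ Normal(mean, sd), an assumed model."""
    z = z_score(rate, mean, sd)
    return 0.5 * math.erfc(z / math.sqrt(2))

for rate in (64.0, 99.0):
    print(f"{rate:.0f}%: z = {z_score(rate):.1f}, "
          f"tail probability = {upper_tail(rate):.3g}")
```

Under these assumptions a 64% poll is 2 standard deviations out, which happens a little over 2% of the time, while a 99% poll is 19.5 standard deviations out, which a thin-tailed model treats as essentially impossible by chance.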

It should be obvious how this theoretical concern relates to the Lancet II. I argue that the 99% response rate is ludicrously high, way higher than the rate for almost all polls on almost all subjects in almost all countries in the world. Kieran Healy takes me to task and writes:

Kane says, “I can not find a single example of a survey with a 99%+ response rates in a large sample for any survey topic in any country ever.” I googled around a bit looking for information on previous Iraqi polls and their response rates. It took about two minutes. Here is the methodological statement for a poll conducted by Oxford Research International for ABC News (and others, including Time and the BBC) in November of 2005. The report says, “The survey had a contact rate of 98 percent and a cooperation rate of 84 percent for a total response rate of 82 percent.” Here is one from the International Republican Institute, done in July. The PowerPoint slides for that one say that “A total sample of 2,849 valid interviews were obtained from a total sample of 3,120 rendering a response rate of 91 percent.” And here is a report put out in 2003 by the former Coalition Provisional Authority, summarizing surveys conducted by the Office of Research and Gallup. In the former, “The overall response rate was 89 percent, ranging from 93% in Baghdad to 100% in Suleymania and Erbil.” In the latter, “Face-to-face interviews were conducted among 1,178 adults who resided in urban areas within the governorate of Baghdad … The response rate was 97 percent.” So much for Iraqi surveys with extraordinary response rates being hard to find.

See the original post for links. Now, it is hard to know what to make of this. Healy finds the results for 4 polls. Their response rates are 82%, 91%, 89% and 97%. The average here is 89.75%. Let's round up to 90%.

So, I claim not to be able to find any poll with a response rate of 99% or higher (the response rate in the Lancet II). Healy claims that I am wrong and, for evidence, cites 4 polls with response rates lower than 99%. Am I missing something? Isn't he just providing further evidence for my concerns? If, of the hundreds (?) of polls conducted in Iraq, Lancet II features the highest response rate, isn't that cause for concern? (Note that Healy, in an earlier portion of the same post, cites a poll with a 100% response rate. I hope to return to that specific example at a later date.)

Again, one poll will, by definition, have the highest response rate. A priori, there is no reason why Lancet II might not be that poll. But it is a bit worrying that the poll with the most controversial result of any poll conducted in Iraq in the last 2 years would also have the highest response rate of any poll. What are the odds of that? If response rate and controversy are independent, then this would be a surprising result. If they are correlated (perhaps people are more likely to want to participate in a poll about death rates than in a poll on less controversial topics), then this is to be expected.
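As a back-of-the-envelope answer to "what are the odds," the sketch below works through the independence case. It assumes that, absent any link between controversy and response rate, every poll is equally likely to end up with the highest rate (ignoring ties). The poll counts are hypothetical, since I do not know how many polls have actually been conducted in Iraq.

```python
from fractions import Fraction

def chance_designated_poll_is_highest(n_polls):
    """Probability that one poll picked in advance (say, the most
    controversial one) also has the highest response rate, assuming
    all n_polls are equally likely to come out on top."""
    return Fraction(1, n_polls)

# Hypothetical totals; the true number of Iraqi polls is unknown to me.
for n in (100, 200, 500):
    p = chance_designated_poll_is_highest(n)
    print(f"{n} polls: {p} = {float(p):.2%}")
```

With a few hundred polls, the chance is well under 1%, which is why the coincidence is surprising unless response rate and controversy are in fact correlated.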

In any event, the annoyance comes when someone like Henry Farrell on Crooked Timber writes:

I don’t have very much respect for David Kane (it isn’t me who was accused of fraud). What bugs me as much as the initial offensive accusation is that he never to my knowledge apologized afterwards or sought to retract his accusation (if I’d done something similar, god forbid, I hope that I’d have apologized abjectly to the offended parties; I’d likely have disappeared entirely from public debate immediately thereafter).

What accusation does Henry want me to retract? The main point of my initial post was a) that any problems with Lancet II are likely to lie with the interviewers, not with the specific clustering formulas and other arcana used by the authors in their statistical analysis, and b) that there is some evidence that the response rate for Lancet II is excessively high. Why such concerns make me untouchable is unclear.


