Saturday, March 17, 2007

Random Numbers

A central concern of critics is the relationship between the US authors of the paper and the Iraqi survey teams. What did the US authors tell the Iraqis to do in terms of the exact survey procedures? What did the Iraqis actually do? What did the Iraqis tell the US authors that they did? How can the US authors (and the rest of us) know for sure what the Iraqis did?

A small but revealing example of this involves the sampling procedures. The main paper reports:

Sampling followed the same approach used in 2004, except that selection of survey sites was by random numbers applied to streets or blocks rather than with global positioning units (GPS), since surveyors felt that being seen with a GPS unit could put their lives at risk.


As a first stage of sampling, 50 clusters were selected systematically by Governorate with a population proportional to size approach, on the basis of the 2004 UNDP/Iraqi Ministry of Planning population estimates (table 1). At the second stage of sampling, the Governorate’s constituent administrative units were listed by population or estimated population, and location(s) were selected randomly proportionate to population size. The third stage consisted of random selection of a main street within the administrative unit from a list of all main streets. A residential street was then randomly selected from a list of residential streets crossing the main street. On the residential street, houses were numbered and a start household was randomly selected.

The "Human Cost" paper reports the same.

Selection of households to be interviewed must be completely random to be sure the results are free of bias. For this survey, all households had an equal chance of being selected. A series of completely random choices were made. First the location of each of the 50 clusters was chosen according the geographic distribution of the population in Iraq. This is known as the first stage of sampling in which the governates (provinces) where the survey would be conducted were selected. This sampling process went on randomly to select the town (or section of the town), the neighborhood, and then the actual house where the survey would start. This was all done using random numbers. Once the start house was selected, an interview was conducted there and then in the next 39 nearest houses.
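
To make the quoted design concrete, here is a minimal Python sketch of the staged selection as I read it. It is an illustration under assumptions, not the authors' actual procedure: the governorate names, populations, street lists, and the function names pps_systematic and pick_start_house are all made up, and the real design used the 2004 UNDP/Iraqi Ministry of Planning estimates.

```python
import random

def pps_systematic(populations, n_clusters, rng):
    """Assign n_clusters to units by systematic sampling proportional to size.

    populations: dict of unit name -> integer population (hypothetical data).
    Returns a dict of unit name -> number of clusters assigned.
    """
    total = sum(populations.values())
    # Random start; sampling point i sits at (start + i * total) / n_clusters.
    start = rng.randrange(total)
    allocation = {name: 0 for name in populations}
    cumulative = 0
    i = 0
    for name, pop in populations.items():
        cumulative += pop
        # Integer comparison avoids floating-point edge cases.
        while i < n_clusters and start + i * total < cumulative * n_clusters:
            allocation[name] += 1
            i += 1
    return allocation

def pick_start_house(main_streets, residential_by_main, houses_on_street, rng):
    """Later stages: main street, then a crossing residential street, then a house."""
    main = rng.choice(main_streets)
    residential = rng.choice(residential_by_main[main])
    # Houses are numbered 1..N along the residential street.
    start_house = rng.randint(1, houses_on_street[residential])
    return main, residential, start_house

rng = random.Random(2006)  # seeded only so the sketch is repeatable

# Hypothetical populations, chosen so the shares are easy to check by eye.
populations = {"Governorate A": 600, "Governorate B": 300, "Governorate C": 100}
allocation = pps_systematic(populations, 50, rng)

# Hypothetical street lists for one administrative unit.
main_streets = ["Main 1", "Main 2"]
residential_by_main = {"Main 1": ["Res 1a", "Res 1b"], "Main 2": ["Res 2a"]}
houses_on_street = {"Res 1a": 60, "Res 1b": 45, "Res 2a": 80}
start = pick_start_house(main_streets, residential_by_main, houses_on_street, rng)
```

The point of the sketch is that every stage draws from a random number generator, which is what the phrase "random numbers" would lead a reader to expect.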

A perfectly sensible procedure. Random numbers are, indeed, widely used in surveys. But is this actually what happened? Consider Gilbert Burnham's recent speech at MIT. Watch from 1:07 to 1:10 in the video. Some crank (i.e., me) is trying to understand precisely what the procedure was.

According to Burnham, the team did not use random numbers! Instead, he mentioned two approaches. One is to write down the names of all the candidate streets on pieces of paper and then "randomly" select among them by hand. Also, for selecting the specific house, Burnham reports:

Once they selected the streets, then they numbered the houses on that street from one to whatever the end of that street was. And then they randomly, using serial numbers on money, they randomly selected a start number and started with that house, and from that they went to the nearest front door, the nearest front door, nearest front door, nearest front door, until they had a total of 40 houses.

"[S]erial numbers on money" is not the same thing as "random numbers," as any statistician will tell you. First, there is no guarantee that the serial numbers on currency are random. Who knows if there are more 1's than 7's on Iraqi (or US) currency? Second, using serial numbers makes it much easier for an unscrupulous interviewer to cheat.

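The first worry is at least checkable in principle. As a rough sketch, assuming someone had a list of serial numbers from the notes actually used (I have no such list, and the function name digit_chi_square is my own), one could compare the frequency of final digits against a uniform distribution with a chi-square statistic:

```python
from collections import Counter

# Chi-square critical value for 9 degrees of freedom at the 5% level.
CRITICAL_5PCT_DF9 = 16.919

def digit_chi_square(serials):
    """Chi-square statistic for last digits of serial numbers vs. uniform 0-9."""
    last_digits = [int(s[-1]) for s in serials if s and s[-1].isdigit()]
    n = len(last_digits)
    if n == 0:
        raise ValueError("no serial numbers ending in a digit")
    counts = Counter(last_digits)
    expected = n / 10  # each digit should appear about n/10 times
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))

# Made-up serial numbers, purely for illustration.
even_serials = ["AB%07d" % k for k in range(100)]  # last digits perfectly even
skewed_serials = ["X1"] * 100                      # every note ends in 1

even_stat = digit_chi_square(even_serials)
skewed_stat = digit_chi_square(skewed_serials)
```

A statistic above CRITICAL_5PCT_DF9 would suggest the final digits are not uniform. Note that a test like this speaks only to the first worry. It says nothing about the second, since an interviewer who is free to choose which note to pull out can get any digit he likes from perfectly uniform serial numbers.
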
Now, I don't actually worry that this caused a major problem. Putting street names on a piece of paper and picking one out of a hat is a fairly random process. Once you have picked a street, I wouldn't think that it matters much which house you start with. Even a malicious interviewer would have trouble, I would think, knowing which house to pick, whatever answer he might "want" to get.

But note that this might be a concern. The interviewers went to a neighborhood and, before starting the survey (and perhaps before picking the start house?), told the local kids about the survey. From those kids, one could learn which houses had suffered several deaths. One could check for this by seeing whether houses near the start of the survey in a given cluster tend to have higher death rates than houses near the end.
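
Here is a rough sketch of what that check could look like, assuming household-level data with the interview order recorded (which, as far as I know, has not been released). The record layout, the function names, and the toy data are all my own assumptions. For simplicity the permutation test shuffles deaths across all households pooled together, which ignores differences between clusters; a careful analysis would shuffle within each cluster.

```python
import random

def early_late_rates(records, cluster_size=40):
    """Mean deaths per household for the first and second half of each cluster.

    records: list of (position, deaths), where position is the 1-based
    interview order within the cluster (hypothetical layout).
    """
    half = cluster_size // 2
    early = [d for pos, d in records if pos <= half]
    late = [d for pos, d in records if pos > half]
    if not early or not late:
        raise ValueError("need households in both halves of the cluster")
    return sum(early) / len(early), sum(late) / len(late)

def permutation_p_value(records, n_perm=2000, seed=0, cluster_size=40):
    """One-sided p-value for 'early rate exceeds late rate' by shuffling deaths."""
    rng = random.Random(seed)
    positions = [pos for pos, _ in records]
    deaths = [d for _, d in records]
    early, late = early_late_rates(records, cluster_size)
    observed = early - late
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(deaths)  # break any link between order and deaths
        e, l = early_late_rates(list(zip(positions, deaths)), cluster_size)
        if e - l >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy data: one cluster where every early household reports a death.
suspicious = [(pos, 1 if pos <= 20 else 0) for pos in range(1, 41)]
p_suspicious = permutation_p_value(suspicious)

# Toy data: one cluster with no deaths at all, so order cannot matter.
flat = [(pos, 0) for pos in range(1, 41)]
p_flat = permutation_p_value(flat)
```

A small p-value would mean that deaths are concentrated near the start house more than chance ordering would explain. That would be worth a closer look, though it would not by itself prove that anyone cheated.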

Anyway, my complaint is that this is another example of the article describing its methodology inaccurately. Don't claim to use random numbers while actually using some other process. And, if the article is incorrect in its claim about the process used to select which houses to interview, what else is it incorrect about?

The Lancet ought to publish a correction.

