Wednesday, February 13, 2008

Recent Articles on L2

Several recent (to me) articles about the Lancet surveys are worth reading.

First, a tour de force (pdf) from Michael Spagat: "Ethical and Data-Integrity Problems in the Second Lancet Survey of Mortality in Iraq." This is the paper that I wanted to write last summer, but was not smart or organized enough to do. Michael will be presenting this paper at JSM in August.

Second, "Estimating mortality in civil conflicts: lessons from Iraq" by Debarati Guha-Sapir and Olivier Degomme (pdf) is critical of L2. The authors identify several "errors and methodological weaknesses" of L2 and write:

Our re-estimation of total war-related death toll for Iraq from the invasion until June 2006 is therefore around 125,000.

This is less than 1/4 of the comparable L2 estimate of 601,000.

Third, a 2006 working paper (pdf) by Mark J. van der Laan. He argues that the reported confidence interval for L2 is significantly too narrow.

Fourth, the IFHS study. They estimate about 151,000 violent deaths over the same period as L2. Again, this is around 1/4 the L2 estimate. M. Ali, one of the authors, will be presenting on my panel at JSM.
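A quick check of the "less than 1/4" and "around 1/4" ratios above. This uses only the point estimates quoted in this post, each compared with the L2 figure of 601,000; nothing else is assumed.

```python
# Point estimates quoted in this post, each compared with the
# comparable L2 estimate of 601,000.
L2_ESTIMATE = 601_000

alternatives = {
    "Guha-Sapir and Degomme re-estimate": 125_000,
    "IFHS violent deaths": 151_000,
}

ratios = {name: deaths / L2_ESTIMATE for name, deaths in alternatives.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1%} of the L2 estimate")
# prints:
# Guha-Sapir and Degomme re-estimate: 20.8% of the L2 estimate
# IFHS violent deaths: 25.1% of the L2 estimate
```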

Good stuff all.

Where does that leave my estimate of excess war deaths? Time to update with the information in these studies. Recall that I wrote two years ago:

If I had to bet, I would provide much wider confidence intervals than either the Lancet authors or most of their critics. Burnham et al. (2006) estimate 650,000 "excess deaths" since the start of the war with a 95% confidence interval of 400,000 to 950,000. My own estimate would center around 300,000 and range from 0 to 1.2 million.

Then, last month, I updated to:

So, my new estimate is 150,000 (at first glance, this new paper seems much better than L1 or L2) with a confidence interval of 0 to 500,000.

This still seems OK, but I think that the upper bound can start to come down. When experienced scholars like the folks at IFHS and CRED come up with independent (?) estimates whose upper confidence bounds are well below 500,000, I can be fairly sure that my bound is too conservative. So, now I go with 125,000 (shifting a little lower than IFHS because I am impressed with the argument that the IFHS estimate does too much adjustment for dangerous clusters that were not sampled) and a range of 0 to 300,000.

All of this, of course, depends on the assumption that the mortality rate in Iraq, in the absence of war, would have been similar to that of Iraq in 2002 to 2003. In other words, I assume that Saddam would not have attacked Iran, gassed the Kurds, taken revenge on the Shiites and so on. Whatever probability you assign to those events, you should decrease your excess death estimates accordingly.
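The baseline assumption can be written out as a simplified formula. The notation is mine, not the survey authors', and it treats population and period length as fixed. Let r_war be the crude mortality rate during the war period, r_0 the counterfactual rate without the war (here, the 2002 to 2003 rate), P the population and T the length of the period. If the counterfactual would have included extra deaths D_k from some event k (another war, a massacre) with probability p_k, the expected baseline rises and the excess estimate falls by exactly the expected number of those deaths:

```latex
\begin{aligned}
E    &= (r_{\text{war}} - r_0)\,P\,T \\
r_0' &= r_0 + \frac{\sum_k p_k D_k}{P\,T} \\
E'   &= (r_{\text{war}} - r_0')\,P\,T = E - \sum_k p_k D_k
\end{aligned}
```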

Sunday, February 10, 2008

Another False Statement

My general take (as here) is that Les Roberts is much more likely to say something false than Gilbert Burnham is. Perhaps I need to rethink this assumption. Consider Burnham's comment about/to the Wall Street Journal two years ago.

Mr. Moore did not question our methodology, but rather the number of clusters we used to develop a representative sample. Our study used 47 randomly selected clusters of 40 households each. In his critique, Mr. Moore did not note that our survey sample included 12,801 people living in 47 clusters, which is the equivalent to a survey of 3,700 randomly selected individuals. As a comparison, a 3,700-person survey is nearly 3 times larger than the average U.S. political survey that reports a margin of error of +/-3%.

By what law of political arithmetic is a survey with 12,801 people living in 47 clusters "equivalent" to a survey of 3,700 randomly selected individuals? It all depends on the design effect, on how clustered the response of interest is. And you can't know how large the design effect is until you do the survey. Again, I think that some of Moore's critique was weak, but this is confusing at best.
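To see why the "equivalent" sample size cannot be stated in advance, here is a minimal sketch using Kish's standard approximation for cluster samples, deff = 1 + (m - 1) * ICC. In that formula, m is the average number of people per cluster and ICC is the intra-cluster correlation of the outcome. The approximation assumes roughly equal cluster sizes. The ICC values in the loop are illustrative assumptions, not estimates from the survey. The only survey figures used are the 12,801 people and 47 clusters from Burnham's letter.

```python
# Kish's approximation for a cluster sample: deff = 1 + (m - 1) * icc,
# where m is the average cluster size and icc is the intra-cluster
# correlation of the outcome (assumed roughly equal cluster sizes).
def design_effect(avg_cluster_size: float, icc: float) -> float:
    return 1.0 + (avg_cluster_size - 1.0) * icc


def effective_sample_size(n: int, deff: float) -> float:
    return n / deff


N_PEOPLE = 12_801
N_CLUSTERS = 47
m = N_PEOPLE / N_CLUSTERS  # about 272 people per cluster

# Design effect and ICC implied by the claim "equivalent to 3,700 people"
implied_deff = N_PEOPLE / 3_700
implied_icc = (implied_deff - 1.0) / (m - 1.0)
print(f"implied deff = {implied_deff:.2f}, implied ICC = {implied_icc:.4f}")
# prints: implied deff = 3.46, implied ICC = 0.0091

# Illustrative ICC values only: the real one is unknown until the data are in
for icc in (0.001, 0.005, 0.01, 0.02, 0.05):
    deff = design_effect(m, icc)
    n_eff = effective_sample_size(N_PEOPLE, deff)
    print(f"ICC={icc:<6} deff={deff:5.2f} effective n={n_eff:7.0f}")
```

The "3,700" figure corresponds to an ICC of roughly 0.009. If deaths cluster more tightly than that, the effective sample shrinks quickly. At an ICC of 0.05 it falls below 900 people.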

In addition, Mr. Moore claimed that the Hopkins study did not include any demographic data. The survey did collect demographic data, such as age and sex, related to violence, although they are not the same details Mr. Moore’s company would have collected for public opinion polls. The characteristics of households in our study are similar to other accounts of households in Iraq and the region, though the household size for the 2006 study is smaller (6.9) than found in the 2004 survey (7.9).

The debate over demographic data is key. Moore did make that claim. Roberts then responded that the survey did collect that data, as everyone would have expected it to. This made Moore look like an idiot. See here for relevant links. It was only six months later, once the data had been made available to me and others, that it became clear that little meaningful demographic data had been collected. But, by that time, most people had forgotten the debate. But not us Lancet fanatics!

This debate came up on Deltoid here. Note my painstaking efforts to force Lambert to print a correction after he slandered Neil Munro. Probably one reason that Lambert was in such a rush to attack Munro on this is that Burnham and others continue to mislead on this point. Note the weasel wording:

The survey did collect demographic data, such as age and sex, related to violence, although they are not the same details Mr. Moore’s company would have collected for public opinion polls.

The survey collected age and sex for deaths, whether or not they were "related to violence." So, if someone died from a heart attack before the war started, his age and sex were recorded. But no age information was collected for the individual residents of each household. This means that it is impossible to know whether or not the sample matches up with other information about the population structure of Iraq. Although lots of information might be included under the heading "demographic," the minimum details are age and sex. If you are not collecting age/sex information, you do not get to claim that you are collecting "demographic" data.

Now, Burnham's sentence above is so convoluted that it is tough to be certain that it is false. But the next one is clearly untrue.

The characteristics of households in our study are similar to other accounts of households in Iraq and the region

That's just false. They did not collect ages, so they have no idea if the "characteristics of households" match "other accounts." Burnham is just making stuff up, allowing the reader to believe that they collected age/sex information (and that it matches other surveys) when, in fact, they did no such thing.

Again, it might be possible for a lawyer to argue (unlike in our last case) that nothing in Burnham's letter is literally false, but, put together, his statements are completely misleading. He should be embarrassed.

UPDATE: One more item. Tim Lambert (because he is a serious guy who gets the details correct) took the time to post a correction to this item from 2006. (It would be nice if Tim gave me a little credit and if he dated these corrections. The subsequent set of comments makes a lot more sense if you know that his correction was added in 2008.) But the best part is that, even in 2008, Roberts does not know what went on!

I was wrong! Shannon cleaned and analyzed the data. I never saw the raw forms. We collected age and gender on everyone in 2004. That was the plan in 2006. My understanding is that they did this for some houses in the start but as that was the most lengthy part of the interview they just started recording how many people were in the house and the age and gender of the dead.

Nice story. But that's not what they did! Or at least it is not what they claimed to have done. The data that has been distributed does list the number of males and females in each house but not their ages. But Roberts claims that they just recorded the number of household residents. Well, which is it? Did they just make up the genders of household residents or did they collect that data? If I were smart, I would know a quick way to test whether they were making stuff up. If they were, you would probably see too many houses with similar numbers of men and women and not enough extreme households of all men or all women. (Of course, this assumes a certain default model of household formation.)
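One candidate for such a quick test is a binomial dispersion test, sketched here under the same caveat about household formation. Suppose each resident's sex were an independent draw with a common probability p of being male. Then the number of males in a household of size n would be Binomial(n, p), and the sum of squared standardized deviations across H households would be roughly chi-square with H - 1 degrees of freedom. Made-up counts that hug an even split would produce a value far below H - 1 (underdispersion). The function name, the normal approximation to the chi-square tail, and all of the data below are illustrative assumptions. The households are simulated, not taken from the L2 release.

```python
import math
import random


def sex_dispersion_test(households):
    """Binomial dispersion test on (residents, males) pairs (hypothetical helper).

    Null model (an assumption): each resident is male independently with a
    common probability p, estimated from the pooled data. Returns the
    dispersion ratio (statistic / df), a z-score, and the lower-tail p-value.
    A ratio well below 1 means households are split more evenly than
    independent draws would produce.
    """
    total = sum(n for n, _ in households)
    males = sum(m for _, m in households)
    p = males / total
    stat = sum((m - n * p) ** 2 / (n * p * (1 - p)) for n, m in households)
    df = len(households) - 1
    # Normal approximation to chi-square(df): mean df, variance 2 * df
    z = (stat - df) / math.sqrt(2 * df)
    p_lower = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return stat / df, z, p_lower


# Simulated data only (hypothetical household sizes of 2 to 12 residents)
random.seed(2008)
sizes = [random.randint(2, 12) for _ in range(1800)]
honest = [(n, sum(random.random() < 0.5 for _ in range(n))) for n in sizes]
# "Too even": every household split as close to 50/50 as possible
too_even = [(n, n // 2 + (n % 2) * random.randint(0, 1)) for n in sizes]

honest_ratio, honest_z, honest_p = sex_dispersion_test(honest)
even_ratio, even_z, even_p = sex_dispersion_test(too_even)
print(f"simulated honest households:   ratio = {honest_ratio:.2f}, z = {honest_z:.1f}")
print(f"simulated too-even households: ratio = {even_ratio:.2f}, z = {even_z:.1f}, p = {even_p:.2g}")
```

On the simulated data, the honest households give a ratio near 1 and the too-even ones a ratio far below it. Real households are not independent draws (a married couple is one man and one woman), so a ratio somewhat below 1 would not by itself show fabrication. Only a dramatic shortfall would be suspicious.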

Another item to add to the long list of things to investigate . . .