Monday, January 14, 2008

False Statement on Missing Certificates

See UPDATE below.

Gilbert Burnham is an honest scientist doing his best to get to the truth, but he did not collect the data for L2; he did not travel to Iraq; he has only second-hand knowledge of what the interviewers did (as opposed to what they were supposed to do). Les Roberts is an ideologue and former candidate for Congress who will say most anything to advance the cause. Neither of them has, as far as I know, done any meaningful work with the actual data underlying L2; i.e., they rely on (the very smart) Shannon Doocy and Elizabeth Johnson to do the analysis. Put all this together and have them write a letter to the National Journal in response to Munro and Cannon's article on L2. The results are not pretty. The NJ article included this information on missing certificates:


Under pressure from critics, the authors did release a disk of the surveyors' collated data, including tables showing how often the survey teams said they requested to see, and saw, the death certificates. But those tables are suspicious, in part, because they show data-heaping, critics said. For example, the database reveals that 22 death certificates for victims of violence and 23 certificates for other deaths were declared by surveyors and households to be missing or lost. That similarity looks reasonable, but Spagat noticed that the 23 missing certificates for nonviolent deaths were distributed throughout eight of the 16 surveyed provinces, while all 22 missing certificates for violent deaths were inexplicably heaped in the single province of Nineveh. That means the surveyors reported zero missing or lost certificates for 180 violent deaths in 15 provinces outside Nineveh. The odds against such perfection are at least 10,000 to 1, Spagat told NJ. Also, surveyors recorded another 70 violent deaths and 13 nonviolent deaths without explaining the presence or absence of certificates in the database. In a subsequent MIT lecture, Burnham said that the surveyors sometimes forgot to ask for the certificates.


Having looked at the raw data, I believe the above analysis is 100% correct. Burnham and Roberts (BR) respond that:


The statement on missing certificates is wrong. Three clusters did not have the presence of certificates noted, and in all there were 120 deaths in which the interviewers neglected to note their presence.


First, it is a bad sign that BR do not specify which "statement" they are disagreeing with. Besides the section that I quoted (which I think is what they are referring to), there are few other references to missing certificates in the article, and none seems connected to BR's comment. Second, the article does not even present what I consider the most damning aspect of the missing certificate issue: interviewers were much more likely to "forget" to ask for certificates for violent deaths and for more recent deaths. Their forgetfulness was anything but random. Third, and most importantly, what BR claim in the above is false. Let's go to the data!

> library(lancet.iraqmortality)
> x <- prep.deaths()
> summary(x$certificate)
no yes forgot
45 501 83

There were 45 deaths in which the interviewers asked for certificates but for which no certificates were found. There were 83 deaths in which the interviewers forgot to ask. (In the raw data, these cases are marked as NA. I code them as forgot because this is what Gilbert Burnham claimed happened in these cases.)
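For those following along at home, the recoding is trivial. Here is a minimal sketch of it, using a stand-in vector since prep.deaths() handles the coding internally:

## raw.cert stands in for the uncoded certificate variable: "no", "yes", or NA.
raw.cert <- c("no", "yes", NA, "yes", NA)
cert <- factor(ifelse(is.na(raw.cert), "forgot", raw.cert),
               levels = c("no", "yes", "forgot"))
table(cert)   ## no = 1, yes = 2, forgot = 2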

> y <- subset(x, x$certificate == "forgot")
> dim(y)
[1] 83 14
> table(y$cluster)[table(y$cluster) != 0]

1 2 14 16 18 20 22 23 24 25 30 31 32 33 34 39 40 42 46 51
1 1 3 1 1 2 1 4 10 1 1 1 4 24 7 6 8 2 1 4
> length(table(y$cluster)[table(y$cluster) != 0])
[1] 20

So, there were 20 (not 3) clusters in which the interviewers forgot to ask for at least some death certificates. But maybe Burnham and Roberts are referring to clusters in which interviewers did ask but no certificates were available?

> y <- subset(x, x$certificate == "no")
> dim(y)
[1] 45 14
> table(y$cluster)[table(y$cluster) != 0]

2 4 5 11 12 13 25 26 34 35 36 37 41 45
2 2 1 4 1 6 1 2 20 1 1 2 1 1
> length(table(y$cluster)[table(y$cluster) != 0])
[1] 14
>

There were 14 such clusters. (And, minor note, there are 128 (not 120) cases in which interviewers either forgot to ask for the death certificate or did ask but did not get to see it.)
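That 128, for what it's worth, is just the sum of the two non-yes categories from the summary above:

sum(x$certificate %in% c("no", "forgot"))   ## 45 + 83 = 128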

So, what are BR talking about? Who knows? My guess is that Roberts wrote this (without really looking at the data) and then convinced Burnham to sign off. Burnham would never, I think, purposely misrepresent the data. Roberts will say whatever it takes to convince people that the L2 results are basically accurate.

As a rule of thumb, you should double-check every claim that Les Roberts makes about the raw data. Much of what he says is true. Yet a lot of really important claims are false.

I think that BR are trying to use one of their favorite tricks: looking at all the data together when the critics just want to make a point about the data from violent deaths, where we believe the real problems are. But since BR can't even provide a competent summary of the data, the whole effort makes no sense. They really should check all their empirical claims with Shannon Doocy before making them.

UPDATE 2008-04-10. I just noticed that Burnham/Roberts have changed the text of the letter without telling anyone. Classy! Now the letter is a lie because it is no longer what was "submitted to the editors of the National Journal on January 7, 2008." The offending passage now reads:


The statement on missing certificates is wrong. There were 83 deaths (13%) in which the interviewers neglected to note their presence and these deaths were distributed across 20 clusters.


1) The numerical claims are now correct, as I show above.

2) But now the claim that the "statement on missing certificates is wrong" makes no sense. Nothing in Munro's article is contradicted by these numbers.

3) It is sleazy for Burnham/Roberts to make this correction without giving me credit. (I e-mailed them about it and, after not getting satisfaction, brought the issue to the attention of the General Counsel of Johns Hopkins).

4) It is sleazy for Burnham/Roberts to pretend that this was the original version of the letter.

5) It is false to (still) claim that "The following letter was submitted to the editors of the National Journal on January 7, 2008." This was not the letter that they submitted to the National Journal.

Sunday, January 13, 2008

Tweaked

The always informed Tim Lambert provides an approving reference to Rebecca Goldin's post at stats.org on the National Journal article. Goldin is a serious statistician. She provides an excellent overview of the dispute over the accuracy of L2.


The National Journal team did its homework, interviewing many experts (rather than conservative pundits) and categorizing the potential flaws of the study into different headings. However suspicious some facts surrounding the Lancet study might be (such as the anti-war position held by the scientists conducting the study), only two criticisms cited by the Journal raise any alarms.

One is “main street bias,” the idea that the Lancet study authors over-sampled regions near main streets, which were in turn more likely to be home to victims of car-bombs or other violence. The other is fraud – not by those who wrote the Lancet article, but by those in the field, doing the interviews under minimal supervision.


Goldin is obviously not a knee-jerk Lancet defender or attacker. I agree with her that these are, far and away, the most important criticisms of L2. I also agree with some, but not all, of her criticisms of Munro.


The Journal made a convincing argument that the data may well have been tweaked, in part based on the theory that faked data has patterns that true data rarely fit into; for example, invented people reported as killed may be more likely to be 30 or 40 than 32 or 43. It doesn’t seem unusual if any individual is 30, but it’s awfully strange if all of the deaths consist of 30-year-olds. Apparently, those conducting the Lancet study did not put enough checks in place to ensure that interviewers didn’t pad the books. The data look like inventiveness may have played a role, based on which death certificates the survey conductors reported to have seen, and which they didn’t.


So, the data "may well have been tweaked" and "look like inventiveness may have played a role." In other words, there are good reasons for suspecting "fraud," as many of us have for more than a year. Goldin is correct to note that "we should be careful in reading too much into any particular statistical anomaly," and she is right to worry that "If those looking for fault in the Lancet study only considered a few possible ways in which the data didn’t look random, then unusually distributed data is far more damning than if they considered many, many ways and found one."

Luckily, I was among the first people to look closely at the data (and certainly the first to describe (pdf) the problems with it). I can confirm that just about the very first thing that I looked at was whether the rate of "forgetting" to ask for death certificates was correlated with the date or type of death. And, sure enough, it was! We can be sure that the interviewers did not just "forget" to ask for death certificates; they purposely asked in some cases and not in others.
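For anyone who wants to replicate this, the check is a simple cross-tabulation. A sketch follows; note that the column names violent and year are my stand-ins, since I do not have the exact names from the prep.deaths() output in front of me:

library(lancet.iraqmortality)
x <- prep.deaths()
## Rate of "forgetting" by type of death (rows sum to 1).
prop.table(table(x$violent, x$certificate), 1)
## Rate of "forgetting" by year of death.
prop.table(table(x$year, x$certificate), 1)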

Does that invalidate the whole study? No. Yet it provides further evidence that the US authors (like Gilbert Burnham) had only the foggiest idea of what the Iraqi interviewers were up to. And it makes any reasonable person suspicious of what else the interviewers were up to. If they felt comfortable picking and choosing which families to ask about death certificates then how can we be sure that they didn't similarly pick and choose which neighborhoods to place clusters in and which houses to visit?

At the end of the day, "tweaked" and "inventiveness" are just nice terms for "fraud." Neither Goldin nor I know, for a fact, that there was fraud in the Lancet data collection process, but much of the circumstantial evidence points in that direction.

Saturday, January 12, 2008

Roberts Quotes on IFHS

If I were Les Roberts, I would either keep quiet about IFHS or say nothing too critical. I would know that harsh comments, while perhaps helpful in the short term in convincing people that the true number of violent deaths is much higher than 151,000, are likely to cause the IFHS authors to write another, even more critical paper. In fact, if I were Roberts, I would be very worried about the IFHS authors getting so pissed off that they might climb onto the Scheuren bandwagon and demand more detailed data from L2 and/or demand data from L1.

Fortunately (for me!), Roberts does not think that way. Instead, he seems to be going out of his way to attack the IFHS results, just as he has done in the past with IBC. Here are a couple of quotes:


A paragraph in the published abstract of the report, blandly titled "Adjustment for Reporting Bias" contains an implicit confession of the subjectivity with which the authors reached their conclusions. As Sprey points out, "they say 'the level of completeness in reporting of death was 62%,' but they give no real explanation of how they arrive at that figure." Les Roberts, one of the principal authors of the Johns Hopkins studies, has commented: "We confirmed our deaths with death certificates, they did not. As the NEJM study's interviewers worked for one side in this conflict, [the U.S.- sponsored government] it is likely that people would be unwilling to admit violent deaths to the study workers."

...

If any further confirmation of the essential worthlessness of the NEJM effort, it comes in the bizarre conclusion that violent deaths in the Iraqi population have not increased over the course of the occupation. As Iraq has descended into a bloody civil war during that time, it should seem obvious to the meanest intelligence that violent deaths have to have increased. Indeed, even Iraq Body Count tracks the same rate of increase as the Hopkins survey, while NEJM settles for a mere 7% in recent years. As Roberts points out: "They roughly found a steady rate of violence from 2003 - 2006. Baghdad morgue data, Najaf burial data, Pentagon attack data, and our data all show a dramatic increase over 2005 and 2006."


See also here for similar material.


There are reasons to suspect that the NEJM data had an under-reporting of violent deaths.

The death rate they recorded for before the invasion (and after) was very low....lower than neighboring countries and 1/3 of what WHO said the death rate was for Iraq back in 2002.

The last time this group (COSIT) did a mortality survey like this they also found a very low crude death rate and when they revisited the exact same homes a second time and just asked about child deaths, they recorded almost twice as many. Thus, the past record suggests people do not want to report deaths to these government employees.

We confirmed our deaths with death certificates, they did not. As the NEJM study's interviewers worked for one side in this conflict, it is likely that people would be unwilling to admit violent deaths to the study workers.

They roughly found a steady rate of violence from 2003 - 2006. Baghdad morgue data, Najaf burial data, and our data all show a dramatic increase over 2005 and 2006.

Finally, their data suggests 1/4 of deaths over the occupation through 6/06 were from violence. Our data suggest a majority of deaths were from violence. All graveyard reports I have heard are consistent with our results.


The more that Roberts criticizes IFHS, the more likely they are to come after him. You go, guy!

Next Steps

Assume for a second that you think some of the data in L2 is either fraudulent or a product of a corrupted survey process. What data would you want to look at from IFHS to test that hypothesis?

The problem is that L2 supporters can reasonably quibble with how IFHS "adjusts" for clusters that were too dangerous to visit. IFHS reports that:


Of the 1086 originally selected clusters, 115 (10.6%) were not visited because of problems with security. These clusters were located in Anbar (61.7% of the unvisited clusters), Baghdad (26.9%), Nineveh (10.4%), and Wasit (0.8%). Since past mortality is likely to be higher in these clusters than in those that were visited during the IFHS, we imputed mortality figures for the missing clusters in Anbar and Baghdad with the use of information from the Iraq Body Count on the distribution of deaths among provinces to estimate the ratio of rates of death in these areas to those in other provinces with high death rates.


(It is not clear to me why the authors did not perform a similar adjustment for the missing clusters in Nineveh and Wasit. Perhaps the number of missing clusters was too small to matter? Perhaps the IBC data was not detailed enough to work with?)

Anyway, reasonable people will disagree over whether the IFHS adjustments for this problem were too small (L2 supporters) or too large (IBC supporters). On my list of to-dos is calculating what the violent death estimate would be without the adjustment. The trick to avoiding the whole morass is to ignore these governorates altogether: throw out these 4, plus the 3 in Kurdistan (where everyone agrees things have been peaceful), and focus on the 11 others. Or just the subset of these 11 in which L2 (implausibly) reports extremely high violent mortality.
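In code, the subsetting idea for the L2 side looks something like this. (A sketch only: the governorate column name and the spellings below are my assumptions.)

## Drop the 4 governorates that IFHS could not fully visit, plus Kurdistan.
drop <- c("Anbar", "Baghdad", "Nineveh", "Wasit",    ## clusters skipped for security
          "Dahuk", "Erbil", "Sulaymaniyah")          ## Kurdistan: peaceful by all accounts
complete <- subset(x, !(x$governorate %in% drop))    ## the 11 "complete" governorates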

The beauty of this approach is that the IFHS estimates for these "complete" governorates require no adjustments. No clusters were skipped. Every household was checked. There is no good reason for the IFHS and L2 estimates to be that different. The confidence intervals for the IFHS will be much narrower because there is no extra uncertainty associated with the adjustment for missing clusters. Also, because we are ignoring Kurdistan, we will get a much more focused look at the differences between IFHS and L2.

And, back of the envelope, those differences will be huge. As the IFHS authors note:


All three sources agreed on the low mortality in Kurdistan. Of all the violent deaths occurring in Iraq, the proportion in Baghdad was 54% in the IFHS, 60% in the Iraq Body Count, and only 26% in the study by Burnham et al.


In other words, the big differences between L2 and IFHS are not in Baghdad. (In fact, and also on the to-do list, it is not clear to me that L2 and IFHS disagree that much about Baghdad, especially if we throw out the deeply suspect results from cluster 33 in L2.) So, the disagreement in the remaining clusters will be large. Combine a large raw difference with narrow confidence intervals for the IFHS estimates, and you have a recipe for, as the Marxists say, heightening the contradictions.

Reconciliation

One issue that has come up in the Lancetosphere is how much the IBC estimates are undercounts and what light, if any, IFHS and other sources shed on the topic. This is a hard problem with no easy solution. Here is my preliminary take:

Assume for a second that the IFHS estimate of 150,000 total violent deaths for March 2003 through June 2006 is correct. How does that jibe with the 50,000 civilian violent deaths reported by IBC? (The actual numbers are 151,000 for IFHS and 48,000 for IBC, but I am rounding with abandon.) Both are, obviously, much lower than the 600,000 estimate from L2. (Since there were almost no pre-war violent deaths in L2, "violent deaths" and "excess violent deaths" are virtually identical in L2.) IFHS is 1/4 of L2, and IBC is 1/3 of IFHS.

First, the US military was killing, during the invasion, thousands of Iraqi troops and, after the invasion, thousands of insurgents. Those numbers are counted in the IFHS total but not by IBC. Donald Johnson points to this article, which reports that the US military claims it killed 19,000 insurgents from June 2003 through September 2007. Guesstimating from their table, the toll up to June 2006 might be 13,000. But this does not include Iraqi military deaths from the invasion itself. Skimming this Wikipedia article, 10,000 seems a reasonable estimate. So, rounding up, we might have about 25,000 Iraqi soldiers and insurgents killed by US forces during this period. I do not have a sense of whether this is more likely an overestimate or an underestimate.

Second, insurgents were killing many non-civilians in the Iraqi population. The deaths of Iraqi soldiers are not counted by IBC but are included in IFHS. I think that the same is true for police officers. Anyway, it is certainly the case that thousands of Iraqi combatants have died, presumably somewhere between the roughly 2,000 US combat deaths and the 13,000 insurgent deaths. Call it 5,000.

Third, there has been a great deal of insurgent/insurgent violence. That is what happens when a civil war starts. Some of these are clearly "civilian deaths." When the local Sadr militia picks up a Sunni man minding his own business and kills him, that is clearly a non-combatant death and should be counted by IBC. But when two armed groups are fighting, as in much of the intra-Shia violence, it is not clear if those deaths are counted (by IBC) as civilian or if they should be. Could there be another 20,000 such deaths? Sure.

So, we have roughly 30,000 Iraqi military/insurgent/police combat deaths (the 25,000 plus 5,000 above), plus 20,000 Iraqi deaths involving combatants in the civil war. (Again, I don't put any particular faith in these numbers and haven't looked closely for good data. The point is that there are certainly tens of thousands of deaths, at a minimum, that are included in IFHS but are, by definition, excluded from IBC.)

In summary, there are 50,000 IBC civilian deaths plus 50,000 soldier/insurgent/combatant deaths, yielding 100,000 total. But this is 50,000 less than IFHS. Fine. I don't see that discrepancy as a big one. Could IBC be off by a factor of 2? Sure! Could IBC be correct but my summary of Iraqi combatant deaths be off by a factor of 2? Sure!
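Here is the whole back-of-the-envelope tally in one place. Every number is one of the rough guesses from above, nothing more:

ibc.civilian  <- 50000   ## IBC civilian count, rounded up from 48,000
us.killed     <- 25000   ## Iraqi troops/insurgents killed by US forces (13,000 + 10,000, rounded up)
killed.by.ins <-  5000   ## Iraqi soldiers/police killed by insurgents
intra.combat  <- 20000   ## combatant deaths in intra-insurgent fighting

ibc.civilian + us.killed + killed.by.ins + intra.combat   ## 100,000, versus 151,000 for IFHS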

In general, the IBC and IFHS numbers are broadly consistent (as both the IFHS and IBC principal investigators would no doubt agree) because IBC is counting a subset of the deaths captured by IFHS.

Said and Unsaid

There is much fun discussion in the Lancetosphere about the IFHS study. Lancet defenders are doing everything they can to insinuate that the IFHS results support or validate or, at least, are not inconsistent with L1 and L2. And, to some degree, they are reasonable to do so. For example, both L1 and L2 report that Iraq has 18 governorates. IFHS agrees!

Perhaps that is a bit snarky, but it is important to distinguish between what the authors of a study actually say and what you conclude about the study's data, models, and results. For example, the IFHS authors say nothing about L1. They do not even cite it among their references. Perhaps they think that L1 is a great paper and their results support it 100%. Perhaps they think that L1 is completely wrong and should be retracted by the Lancet. They don't say, and so we don't know.

But the IFHS authors do make very specific claims about L2. From the abstract:


When underreporting was taken into account, the rate of violence-related death was estimated to be 1.67 [per 1,000 person-years] (95% uncertainty range, 1.24 to 2.30). This rate translates into an estimated number of violent deaths of 151,000 (95% uncertainty range, 104,000 to 223,000) from March 2003 through June 2006.

Conclusions: Violence is a leading cause of death for Iraqi adults and was the main cause of death in men between the ages of 15 and 59 years during the first 3 years after the 2003 invasion. Although the estimated range is substantially lower than a recent survey-based estimate, it nonetheless points to a massive death toll, only one of the many health and human consequences of an ongoing humanitarian crisis.


In other words, the number of violent deaths estimated by L2 is too high. They are making a claim that a specific number from L2, the 600,000 violent deaths, is an overestimate. (Note that "violent deaths" (IFHS) and "excess violent deaths" (L2) refer to almost exactly the same underlying number because there were almost no violent deaths before the war.)

To mention in the abstract that a different peer-reviewed article is wrong is a strong statement of the beliefs of the IFHS authors.


The most striking difference in rates of death was between those in the study by Burnham et al. and those in the two other data sources for the six high-mortality provinces, which accounted for 64% of all deaths in the study by Burnham et al.


What is most surprising, to me, is how critical the IFHS authors are of L2 (Burnham et al), how they go out of their way to imply that L2 is wrong. Note especially Figure 1. If your main point was that L2 was wrong, I do not think that you could have constructed this figure in a more accusatory fashion.

Of course, just because the IFHS authors think that L2 is wrong does not mean that it is. But Lancet supporters should not pretend that the IFHS authors think that L2 (or L1) is correct about anything when they provide almost nothing but criticisms of L2. Now, that criticism is couched in the polite language of an academic paper, but, given the constraints of acceptable dialog in the New England Journal of Medicine, could the IFHS authors have written anything more critical of L2? I don't see how.


There was greater agreement regarding mortality from nonviolent causes between the IFHS study (372 deaths per day) and the study by Burnham et al. (416 deaths per day)


This is one of the few (only?) places where IFHS offers support to L2. But this support is more damning than helpful because none of the serious critics of L2 have complained about their estimates of post-invasion non-violent mortality. After all, the increase that L2 found (50,000) was not even statistically significant. As Michael Spagat wrote:


The main problem with the comparison highlighted by the L2 authors [in asserting that L2 validates L1] is that it is of all excess deaths, not just violent deaths. All suggestions of possible bias in L2 that we know of, sampling or non-sampling, pertain to violent deaths. The available facts simply do not support a claim that L1 and L2 suggest very similar numbers of violent deaths. By persistently conflating non-violent deaths with violent deaths the L2 authors have obscured this essential point.


Correct. To say that the L2 estimates of non-violent death (not "excess non-violent death") are reasonable is to congratulate them on the fact that they got the number of governorates in Iraq correct. It is true, but faint praise.


The most striking difference in rates of death was between those in the study by Burnham et al. and those in the two other data sources for the six high-mortality provinces, which accounted for 64% of all deaths in the study by Burnham et al.


People like me think that the reason for this is that the L2 data from those provinces is fraudulent, either made up out of whole cloth or derived from a corrupted sampling scheme. One of my guesses is:


The L2 interviewers, for whatever reason, decide that they want to report more deaths. They go to a cluster and, following the procedure outlined in Burnham's discussion, gather all the neighborhood kids around and tell them about the survey. So far, so good. But then they ask the kids, "Has anyone in the neighborhood died in the last couple years, especially violently?" The kids know this and tell them. Then the interviewers preferentially select those houses, either picking and choosing around the neighborhood or just placing the 40-house cluster in the part of the neighborhood that, by chance, had the most deaths. In that scenario, all the data is "accurate" in the sense that no one is making anything up, but the mortality estimate will be much too high.

Access to the demographic data might allow us to catch that because the houses would have way more young men than a random sample should have, given what we know about the age/sex distribution in Iraq from sources like ILCS.
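To make the idea concrete, here is a purely hypothetical version of that demographic check. No household-level age/sex data was released, so the ages below are simulated and the reference shares are invented placeholders, not actual ILCS figures:

set.seed(1)
ages <- sample(0:80, 500, replace = TRUE)                  ## stand-in for surveyed ages
bins <- cut(ages, c(0, 15, 30, 45, 60, Inf), right = FALSE)
ref  <- c(0.40, 0.25, 0.17, 0.10, 0.08)                    ## invented reference shares
chisq.test(table(bins), p = ref)   ## an excess in the fighting-age bins would reject the reference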


Now, of course, I wasn't in Iraq for the surveys. But a corrupted interview process would help explain why the L2 results in these provinces are so wildly divergent from those of IFHS and IBC. Note also that this could have occurred without the knowledge of any of the L2 authors. Only Lafta was in Iraq for the survey and, since there were two teams operating independently, the fraud (if that is the cause) might have happened without his knowledge. This is why people like Fritz Scheuren want to see the underlying data classified by interviewer. This is why everyone would have liked to see demographic information. Remind me again why Les Roberts lied to Tim Lambert about whether or not age data was collected for individual households.

Back to IFHS. Table 4 is another example of the IFHS authors arguing, politely, that the L2 results are completely implausible. Could any critic of L2 have constructed this table in a more damning fashion? Not that I can see.

And the final statement comes in the Discussion.


The IFHS results for trends and distribution of deaths according to province are consistent with what has been reported from the scanning of press reports for civilian casualties through the Iraq Body Count project. The estimated number of deaths in the IFHS is about three times as high as that reported by the Iraq Body Count. Both sources indicate that the 2006 study by Burnham et al. considerably overestimated the number of violent deaths. For instance, to reach the 925 violent deaths per day reported by Burnham et al. for June 2005 through June 2006, as many as 87% of violent deaths would have been missed in the IFHS and more than 90% in the Iraq Body Count. This level of underreporting is highly improbable, given the internal and external consistency of the data and the much larger sample size and quality-control measures taken in the implementation of the IFHS.


"Highly improbable" is New England Journal of Medicinese for "total crap."

And this is where things get fun. Lancet redoubts like Crooked Timber and Deltoid are filled with hopeful attempts to rescue L2 from IFHS, to argue that, by some excess death calculation that the IFHS authors do not use, IFHS supports L2. Perhaps. All of us spend time trying to use the raw data and models of a given paper to answer questions that the paper's authors fail to address. Who knows? Perhaps the next paper from the IFHS team will be an excess death calculation that matches perfectly with L1 and L2. Hah! Anyone with experience reading scientific papers knows that the IFHS paper could not have been more critical of L2. Future work from these authors will almost certainly continue in the same vein. Anyone want to bet otherwise?

There is no way to spin the IFHS paper as anything less than a total rejection of the L2 violent death estimates. That doesn't mean that the IFHS authors are right, but we need to be clear on what they say.

IFHS Versus L1

Some Lancet supporters claim that the results from IFHS support L1. I think that this is wrong. Consider this comment of mine from Crooked Timber.


The issue before us is: How does the 151,000 estimate of violent deaths in all of Iraq from IFHS compare with L1? Now, since the surveys use different terminology over a different time scale, we will not be able to make an exact comparison. But L1 reports 73 violent deaths in all of Iraq post-invasion compared to 1 pre-invasion (from the phantom US bombing runs, no doubt). (See Table 2.) Speaking very roughly, each excess death in the sample corresponds to 3,000 or so deaths in the population. So, for all of Iraq, there were around 200,000 violent deaths in L1 through September 2004. (I am obviously skirting over the, in this context, unimportant distinction between violent deaths and excess violent deaths.)

IFHS estimates 151,000 violent deaths through June 2006. Relative to IFHS, the L1 estimate is ludicrously high.

Again, this is just back of the envelope, but I wanted to help Tim clean out his garage, following Kieran’s kind suggestion.

Now, Tim might argue that we need to exclude Falluja for this, that, or the other reason. Fine. If the L1 authors had just dropped Falluja from all of their analysis (or included it everywhere, or done both), I wouldn't have objected so much. But they picked and chose. Yet, in this context, you don't get to play that game. The IFHS authors estimate 151,000. That is their number. You can either try to get numbers out of L1 that are comparable to that (as I do above), or you can claim that such a comparison is impossible. But you can't just claim the comparison works for the subset of the IFHS data that you want to look at.
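For the record, the arithmetic behind the 200,000 figure in that comment is nothing fancier than:

73 * 3000   ## = 219,000; call it "around 200,000," rounding with abandon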

Crooked Timber Comment

A comment that I made at Crooked Timber is worth repeating here. I have also been active at Deltoid recently: here, here, here and here. There is no rest for the wicked. Of course, the master plan is to convince smart folks like Tim Lambert and Daniel Davies that the underlying data from L2 is suspect. Once they are on my side, things will go much faster.


----------
Did someone call for me? Hello Crooked Timber! I have been banned from Henry and dsquared's threads, but, since Kieran has not banned me, I assume he does not object to my contribution. Kieran writes:


A study like this gives us good reason to substantially revise our estimate of the total number of excess deaths downward. The Burnham et al estimate of excess deaths looks like it was too high, assuming that the new survey is basically reliable.


Exactly correct! Last year, my estimates were:


If I had to bet, I would provide much wider confidence intervals than either the Lancet authors or most of their critics. Burnham et al. (2006) estimate 650,000 "excess deaths" since the start of the war with a 95% confidence interval of 400,000 to 950,000. My own estimate would center around 300,000 and range from 0 to 1.2 million. Obviously, no one is really interested in my estimate --- derived as it is from reading the literature and associated debates --- but I thought it reasonable to be upfront about my prior beliefs.


In retrospect, I should have placed more weight on the informed estimates of people like Jon Pedersen. He estimated violent deaths at 100,000 (1/6th of the L2 estimate) and that sure matched up nicely with the 150,000 from IFHS. So, my new estimate is 150,000 (at first glance, this new paper seems much better than L1 or L2) with a confidence interval of 0 to 500,000.

It would be interesting to read the current estimates of folks like Kieran and dsquared. Note that I wrote a month ago:


Where is the debate going? I sometimes worry that, like so many other left/right disputes, this will never be resolved, that we will never be sure whether or not the Lancet articles were fraudulent. Will these estimates be the Chambers/Hiss debate of the 21st century? I hope not. Fortunately, other scientists are hard at work on the topic, reanalyzing the data produced in L2 and conducting new surveys. Both critics and supporters of the Lancet results should be prepared to update their estimates in the face of this new evidence. If independent scientists publish results that are similar to those of the Lancet authors, then I will recant my criticism. Will Lancet supporters like Lambert and Davies do the same when the results go against their beliefs? I have my doubts.


I should not have doubted Kieran's willingness to update his estimates. My apologies! dsquared, on the other hand, is acting about how I suspected. Is there any new information that would cause him to doubt the L2 results?

Because more stuff is coming! What's most interesting about IFHS is how its authors went out of their way to attack L2. They didn't need to do that. They could have been much nicer. They could have spun the story as Roberts and dsquared would like them to. Instead, they go for the jugular, as much as you can in the NEJM. They highlight how their confidence interval rejects the L2 range by much more than 100,000 deaths. They don't just argue that they are right. They argue that L2 is very, very wrong.

Who is right? Time will tell. Did everyone catch how Horton was shoving the L2 authors off the sled in the National Journal article?


Today, the journal's editor tacitly concedes discomfort with the Iraqi death estimates. "Anything [the authors] can do to strengthen the credibility of the Lancet paper," Horton told NJ, "would be very welcome." If clear evidence of misconduct is presented to The Lancet, "we would be happy to go ask the authors and the institution for an official inquiry, and we would then abide by the conclusion of that inquiry."


Hardly a ringing endorsement! Perhaps Richard Horton knows/suspects that something is not right with the L2 data . . .

Where is this going? The wheels of science grind slowly, but they grind very fine indeed. If the data underlying L1/L2 is fake, then the Lancet papers will be the most important scientific fraud of the decade. Think that is impossible? Think again.

What can the Crooked Timber community do? Act like scholars and scientists. (As Kieran does in this post.) Keep an open mind. Consider all the evidence. Look at the underlying data. Study the statistical models. Replicate the results. Make our findings public.

One small step would be for dsquared to allow me to publish comments in his threads on this topic. But perhaps open discussion and debate is not what he is looking for.

Friday, January 04, 2008

National Journal Article

Neil Munro's National Journal article is out. I haven't had a chance to read it closely, but my quotes are not as contextualized as I would like them to be.


Still, the authors have declined to provide the surveyors' reports and forms that might bolster confidence in their findings. Customary scientific practice holds that an experiment must be transparent -- and repeatable -- to win credence. Submitting to that scientific method, the authors would make the unvarnished data available for inspection by other researchers. Because they did not do this, citing concerns about the security of the questioners and respondents, critics have raised the most basic question about this research: Was it verifiably undertaken as described in the two Lancet articles?

"The authors refuse to provide anyone with the underlying data," said David Kane, a statistician and a fellow at the Institute for Quantitative Social Statistics at Harvard University.


That is correct, but it is important to note that the authors' behavior was much better in L2 than in L1. In L2, most researchers were provided with some of the data. I attribute this to goodwill and professionalism on the part of lead author Gilbert Burnham. But it is still pathetic that they refuse to share the data with Spagat et al. and that they have yet to (will never?) allow Scheuren and others to see if there are problems with different interviewers providing anomalous results. In L1, their behavior has been horrible, due mostly, I believe, to Les Roberts' attitude. No one has seen the underlying data for L1, other than cluster-level summaries. This is not the way that scientists ought to behave.

On this topic, I wish that Munro had quoted me about the fact that, as far as I know, no scientific team has ever granted data access, however incomplete, to some critics but not others. It is inexcusable for the Lancet authors to show data to me but not to Spagat.


To Kane, the study's reported response rate of more than 98 percent "makes no sense," if only because many male heads of households would be at work or elsewhere during the day and Iraqi women would likely refuse to participate. On the other hand, Kieran J. Healy, a sociologist at the University of Arizona, found that in four previous unrelated surveys, the polling response in Iraq was typically in the 90 percent range.


Again, this is an accurate quote, but the context is off. My key point is not about how much time Iraqi men are away or how likely Iraqi women are to participate in a survey. Who knows? My key point is that there has never been a single-contact survey with 98%+ participation, in any country, at any time, on any topic. Never. What are the odds that the most controversial survey of the decade would achieve an unprecedented response rate? More background here.


The authors should not have included the July data in their report because the survey was scheduled to end on June 30, according to Debarati Guha-Sapir, director of the World Health Organization's Collaborating Center for Research on the Epidemiology of Disasters at the University of Louvain in Belgium. Because of the study's methodology, those 24 deaths ultimately added 48,000 to the national death toll and tripled the authors' estimate for total car bomb deaths to 76,000. That figure is 15 times the 5,046 car bomb killings that Iraq Body Count recorded up to August 2006.

According to a data table reviewed by Spagat and Kane, the team recorded the violent deaths as taking place in early July and did not explain why they failed to see death certificates for any of the 24 victims. The surveyors did remember, however, to ask for the death certificate of the one person who had died peacefully in that cluster.


First, where is the documentation for the claim that the survey was supposed to end on June 30th? I have never heard of that. In fact, I doubt it. When you start fieldwork for a survey in some war-torn country, you certainly have a plan and a schedule that you hope to keep. I believe that they wanted to finish by June 30th. But why would the study protocol require that? It wouldn't. There is no reason to put on such a straitjacket. Instead, the protocol called for getting 50 clusters. However long that took is how long it would take.

Of course, I am still deeply suspicious of the results for that cluster. Finding a whole bunch of deaths at the end of the survey --- and in a category, car bombs, that you wanted/expected to be much larger than the data you had gathered so far --- is awfully convenient, just as finding scores of deaths in Falluja at the very end of L1 was convenient. But the June 30 date is, as far as I can tell, irrelevant.

Second, I agree that the authors did not explain why they did not ask for death certificates in that specific case. But a plausible explanation would be that the deaths happened a day or two before the survey and that, therefore, the interviewers knew that the families would not yet have death certificates available. So, why ask for them? As always, the authors should be a lot more transparent and willing to answer questions, but I think that they have plausible responses to these issues.

None of which means that I believe those answers. My guess continues to be that the/some interview teams went to a neighborhood, asked the kids who had died, and then interviewed those houses preferentially. I suspect that they went out looking in early July for a neighborhood with car-bomb deaths, perhaps even going to that specific neighborhood after they heard about the car bomb on the news. But suspicions are not proof. I would be happy to bet, however, that Lafta was a part of the team that did those interviews, just as he was the one to go to Falluja for L1.

I think that the issue about car-bomb deaths that is most damning is how the authors pretended in the paper that there was a gradual rise in such deaths, consistent with news reports and IBC, over the course of the time period when, in fact, car-bomb deaths were constant for the two years prior to July 2006. Alas, Munro does not make that point.

I think that the table associated with the article is fine as far as it goes. But I wish that Munro had used my tables, which show that "forgetting" to ask for death certificates was much more common for later deaths and for violent ones. That is the damning evidence that something more than forgetfulness was going on when the interviewers failed to even check for death certificates.

But all these are quibbles. Munro has done a fine job in gathering all sorts of evidence and arguments. I spoke with him several times and there is no doubt that he understands the ins and outs of the debate.

I hope to have more substantive comments on the article in due course.