Saturday, April 14, 2007

Infinity

As noted previously, Gilbert Burnham's presentation at MIT covered the results of Lancet I. In the passage below, Burnham describes the authors' estimate of excess deaths due to the war. Their estimate was 98,000 if the data from Falluja are ignored. The estimated death rate in Falluja was much higher than elsewhere, so removing Falluja was described as "conservative" by both the authors and other commentators. Recall the findings from the paper (pdf) itself.


The risk of death was estimated to be 2·5-fold (95% CI 1·6–4·2) higher after the invasion when compared with the preinvasion period. Two-thirds of all violent deaths were reported in one cluster in the city of Falluja. If we exclude the Falluja data, the risk of death is 1·5-fold (1·1–2·3) higher after the invasion. We estimate that 98 000 more deaths than expected (8000–194 000) happened after the invasion outside of Falluja and far more if the outlier Falluja cluster is included.


Any empirical researcher is vaguely suspicious of results which just barely reject the null hypothesis. That 8,000 figure is awfully close to zero. Would other reasonable parameterizations generate the same result? Perhaps. But, given that this excluded Falluja, there is nothing to worry about, right? The war must have increased the mortality rate.
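To put a rough number on "just barely," here is a minimal sketch that backs out an implied standard error and z-score from the reported interval. It assumes the interval behaves like a symmetric normal-theory 95% interval, which is my simplification and not something the paper states (98,000 is not even the exact midpoint of 8,000 and 194,000). The function name implied_z is mine.

```python
from statistics import NormalDist

def implied_z(point, lower, upper, level=0.95):
    """Back out a standard error and z-score from a reported interval.

    Assumes a symmetric normal-theory interval (an assumption made
    here for illustration, not the method used in the paper).
    """
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    se = (upper - lower) / (2 * crit)
    return point / se, se

# Figures quoted from the paper's summary (Falluja excluded).
z, se = implied_z(98_000, 8_000, 194_000)

# Two-sided p-value for the null hypothesis of zero excess deaths.
p_two_sided = 2 * (1 - NormalDist().cdf(z))

print(f"implied SE: {se:,.0f}  z: {z:.2f}  p: {p_two_sided:.3f}")
```

Under that assumption the z-score comes out only a little above 2, so the rejection of zero is not a comfortable one.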

But note how Burnham described things (quote starts around 19:30) during the talk.


We got a huge amount of criticisms for these confidence intervals, and I'll come to this confidence interval in just a bit. But we had a confidence interval at the low end of 8,000 and at the high end of 194,000.

...

Now this is what the confidence intervals would look like. There is a 10% probability that it was less than 44,000 and only a 2.5% chance that it was less than 8,000. If we put Falluja into it, the top end of the confidence interval would be infinity. It really skewed things so badly that we decided that we should just leave it out and be conservative.


Huh? He can't possibly mean that the top end would be infinity. I would think (albeit not as an expert on confidence intervals in cluster sampling) that a top end of infinity would imply a bottom end of the confidence interval at negative infinity as well. Wouldn't it have to? Note that excess deaths is a real-valued variable. If it were bounded below by zero, then, of course, you might have a confidence interval which was bounded below but not above.

For real-valued variables, I do not think that I have ever seen an applied situation with actual data in which the confidence interval was infinite above but not below. Has anyone else? Almost every confidence interval of a real-valued variable is symmetric. If it is infinite at the top, then it is infinite at the bottom.

Also, the fact that it is infinite at the top suggests that this was not a bootstrap confidence interval. I think that it is very hard (impossible?) to get infinity for bootstrap confidence intervals. A bootstrap always gives you back something. That would suggest that they were using an analytic calculation of some kind, one that failed to converge.
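Here is a minimal sketch of why I say that. It is a plain percentile bootstrap over made-up cluster-level values, one of which is a large outlier. The data, the function name percentile_bootstrap_ci, and the choice of the percentile method are all my assumptions for illustration; none of this is the procedure used in the paper.

```python
import math
import random

def percentile_bootstrap_ci(values, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of `values`.

    Resamples the observations with replacement and reads the interval
    endpoints off the sorted resampled means.
    """
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        sum(rng.choices(values, k=n)) / n
        for _ in range(n_boot)
    )
    alpha = (1 - level) / 2
    lo_idx = math.floor(alpha * (n_boot - 1))
    hi_idx = math.ceil((1 - alpha) * (n_boot - 1))
    return means[lo_idx], means[hi_idx]

# Hypothetical cluster-level values: 32 ordinary clusters and one outlier.
cluster_values = [1.0] * 16 + [2.0] * 16 + [50.0]

lower, upper = percentile_bootstrap_ci(cluster_values)
print(f"bootstrap 95% interval for the mean: ({lower:.2f}, {upper:.2f})")
```

Every resampled mean lies between the smallest and largest observed values, so both endpoints are finite no matter how extreme the outlier is. The outlier stretches the upper end, but it cannot push it to infinity.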

And that wouldn't be surprising. If the Falluja data is enough of an outlier, then the software has trouble calculating a confidence interval. It spits back infinity or NA or NaN or whatever.

Assume for now that the confidence interval was infinity on both sides. This would mean that only by ignoring the Falluja data were the authors able to reject the null hypothesis of zero excess deaths due to the war. It would be highly misleading to report a rejection of the null hypothesis which required throwing out some of the data without at least telling your readers how fragile this result is.

Perhaps Burnham just meant "really big" instead of "infinity." But, if so, then the same large range was probably present at the lower limit as well. That is, even if the upper confidence limit was just some huge number and not literally infinity, that would suggest that the interval extended similarly far below the estimate and, probably, covered zero. (This might depend on how much higher the mean estimate was, of course.) In either case, if getting a statistically significant result requires throwing out some of the data, even though you have no reason for thinking that data is less accurate than other collected data, you must report this fact to your readers.

Yet all of this discussion is in direct contradiction to the paper. The authors claim to have calculated a confidence interval (for the increase in risk) that includes the Falluja data.


The risk of death was estimated to be 2·5-fold (95% CI 1·6–4·2) higher after the invasion when compared with the preinvasion period.


A 4.2 upper confidence limit for the relative risk is neither infinite nor, I think, particularly high. Might not a raging war make things 4 times more dangerous in a country? Why would Burnham claim that the upper confidence limit is infinite while the paper reports a reasonable number? Is there some reason why one might have a reasonable upper limit for the relative risk but not for the total number of deaths? Not that I can think of.
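A minimal sketch of the reasoning, under a deliberately simple model of my own: if the baseline death rate, the population, and the length of the period are treated as fixed, excess deaths is just a linear function of the relative risk, so a finite interval for one maps to a finite interval for the other. The baseline rate, population, and duration below are round placeholders, not figures from the paper, and the function excess_deaths is hypothetical. In practice the baseline is itself estimated, which this sketch ignores.

```python
def excess_deaths(relative_risk, baseline_rate_per_1000, population, years):
    """Deaths beyond what the baseline rate alone would predict."""
    expected = baseline_rate_per_1000 / 1000 * population * years
    return (relative_risk - 1) * expected

# Placeholder inputs chosen for easy arithmetic, NOT taken from the paper.
BASELINE_PER_1000 = 1.0   # deaths per 1,000 people per year (hypothetical)
POPULATION = 1_000_000    # hypothetical
YEARS = 1.0               # hypothetical

# Relative risk and 95% interval quoted from the paper (Falluja included).
rr_interval = {"lower": 1.6, "point": 2.5, "upper": 4.2}

excess_interval = {
    name: excess_deaths(rr, BASELINE_PER_1000, POPULATION, YEARS)
    for name, rr in rr_interval.items()
}

for name, value in excess_interval.items():
    print(f"{name}: {value:,.0f} excess deaths (placeholder scale)")
```

With these placeholders the upper limit of 4.2 maps to a perfectly finite number. Whatever produced "infinity" in the talk, it does not seem to follow from the relative risk interval alone, at least not in a model this simple.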

No doubt I am just missing something obvious. Clarifications would be welcome.
