Monday, March 12, 2007

Data Validity

This article on Lancet II and the news reactions thereto is interesting.

As a biostatistician, the Bloomberg School's Zeger has thought a lot about the study. "I am so impressed by Gil because he was able to conduct a scientific survey on a shoestring budget under very difficult circumstances," he says. He does not dismiss all concerns about the methodology. "It was the best science that could be done under the circumstances. We're always making decisions absent scientific-quality data — that's public health practice." But he draws an important distinction between practice and science. "We tend to have a different standard for scientific research. This study was on the research end. It was published in a scientific journal. There are a lot of aspects that are below the reporting standards you would have if you were doing a U.S. clinical trial, for example: the documentation for each case, the ability to reproduce the results, detailed information about how everything was done. I think it would be useful for the school and the public health community to think through these kinds of issues.

"[But] it's absolutely appropriate, on very limited resources, to go into a place like Iraq and make an estimate of excess mortality to use in planning and making decisions. My own sense is I would rather err on the side of generating potentially useful data, with all of the caveats. I think noisy data is better than no data." Zeger notes that the tests of the data's validity, built into the second survey at his recommendation, all checked out. He admits the numbers are hard to grasp, especially the study's estimate that from June 2005 to June 2006, Iraqis were dying at a rate of 1,000 per day. "That's a lot of bodies," he says. "I have a hard time getting my mind around that. But as a scientist, what do you do? That's the number."

I certainly agree that noisy data is better than no data, but only if we have access to all the details of where that noisy data comes from. What "validity" checks is Zeger talking about?


Presumably the "validity checks" are the claims that 92% of those asked produced death certificates.

Though far from providing "validity" imv, it's a very dubious claim that itself raises more cause for doubt:

I have written Zeger to ask him.

From the context, I would have thought that "validity" refers to things like missing data, extreme values (too many violent deaths in one cluster) and so on.

