Tuesday, March 02, 2010

Dubious Polling

From the "2010 Top Ten 'Dubious Polling' Awards" by pollsters George Bishop and David Moore.

With this article, veteran pollsters, authors and political scientists George F. Bishop and David W. Moore issue their Second Annual Top Ten “Dubious Polling” Awards. These awards are intended to mark for posterity some of the most risible and outrageous pronouncements by polling organizations during the previous year.



WINNERS: The Johns Hopkins Bloomberg School of Public Health and one of its professors, Dr. Gilbert Burnham, for stonewalling in the face of serious questions about a flawed survey project, which reported more than 600,000 Iraqi deaths from 2003 to 2006. The head researcher was formally censured by the American Association for Public Opinion Research (AAPOR) for covering up his data collection efforts, but the Bloomberg School refuses to investigate the methodology. (Ah, the wisdom of the three monkeys: “See no evil, hear no evil, speak no evil!”).

BACKGROUND: In 2006, the British medical journal The Lancet published the results of a survey designed and supervised by Dr. Gilbert Burnham of the Johns Hopkins Bloomberg School of Public Health and his colleagues.* The survey purported to show that about 600,000 excess deaths had occurred in Iraq by July 2006 as a consequence of the invasion.

A lot of people were against the war, but jacking up the body count with bad studies is not a good tactic for anyone. According to economics professor Michael Spagat of Royal Holloway College, these results were anywhere from seven to 14 times as high as other credible estimates, including those made by the non-partisan Iraq Body Count, a consortium of U.S. and U.K. researchers also concerned about the human toll of the war.

Such large differences in estimates led other researchers to question the methodology of the study. But contrary to scientific norms, Burnham refused to provide details about how the survey was conducted. When a complaint was lodged with AAPOR, its standards committee also tried to obtain such details, but was rebuffed. That led to the censure.

What exactly were the Johns Hopkins Bloomberg School, and Burnham et al., hiding? AAPOR asked for the kind of information that any scientist doing this type of work should release: a copy of the questionnaire, the consent statement shown to interviewees, a full description of the selection process, a summary of the disposition of all sample cases, and an explanation of how the mortality rate was calculated.

The Johns Hopkins Bloomberg School initially stood behind the study, but then eventually concluded that Burnham had made some unauthorized changes in his methodology, and thus “the School has suspended Dr. Burnham’s privileges to serve as a principal investigator on projects involving human subjects research.”

But the Bloomberg School has not come clean with the problems of the research project. Their press release admitted that their internal review “did not evaluate aspects of the sampling methodology or statistical approach of the study.” Instead, Bloomberg asserts, “It is expected that the scientific community will continue to debate the best methods for estimating excess mortality in conflict situations in appropriate academic forums.”

Let’s see: The Bloomberg School will not attempt to evaluate what experts believe is almost certainly a faulty methodology, saying the scientific community should make the evaluation. But then the school advises Burnham not to release details about his methods, so the scientific community can’t have the information it needs for a definitive assessment.

Sounds like a cop-out and a Catch-22, all rolled into one!

And we thought Richard Nixon was tricky.

* Burnham G, Lafta R, Doocy S, Roberts L. 2006a. ‘Mortality after the 2003 invasion of Iraq: a cross-sectional cluster sample survey’. The Lancet 368:1421-1428. It can be accessed online at http://brusselstribunal.org/pdf/lancet111006.pdf.

Thursday, February 04, 2010

Response Rates

My hard-working research assistant Daniel Suo spent a bunch of time looking for surveys with a response rate as high as Lancet II's. We never found a single one. He also gathered some quotes from survey experts. Here they are:

"Any U.S. survey with contact rates higher than the figures you quoted from the Census Bureau would be suspicious. It is not easy to find people at home AND who will answer the door. Figures like 90% would indicate that we are NOT dealing with a random selection of households. Some market research surveys that use quota samples instruct interviewers to go to a certain block and then knock on every nth door until they find someone at home. That's how you get contact rates of 100%."
- Professor Tom Piazza, University of California at Berkeley

"[I]n today's survey environment, there is no such thing as an extremely high response rate on a door-to-door survey with just one contact attempt. The demographics work against it, along with increasing resistance to surveys."
- Professor John Goyder, University of Waterloo

"I think the best estimate is from Groves and Couper, page 82: 49% contacted on first attempt. Mind you, only some of those consented to the interview, so this is simply contactability."
- Professor John Goyder, University of Waterloo

"If I had to pick a number, I'd say that response rates above 90% on the first contact for door-to-door surveys of people are unusual these days."
- Professor Jerry Reiter, Duke University

"I think it is impossible to get even close to 100% on a first attempt, with people out or not answering doors. I would ask them to define what they mean by response rates; they may be playing with the definition to suit their needs."
- Randa Bell, CMRP, PRC, Vice President of Marketing and Client Services, ASDE Survey Sampler, Inc.

"[I]f you can get 25% or above I think you can be confident presenting your results to an academic crowd."
- Professor Phillip Howard, University of Washington

"I don't know of any study that gets even close to 100 percent, nor a systematic list of studies focusing on their response rate."
- Professor Sidney Verba, Harvard University

"In Mexico, single-contact-attempt door-to-door surveys have an average response rate of 70%. There is no such thing as a 90% or higher response rate (at least not that I know of)."
- Pablo Paras
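Piazza's point about quota sampling can be made concrete with a toy simulation. All numbers below are invented for illustration: under random selection of addresses, the contact rate simply mirrors how often people are home, while "knock until someone answers" mechanically produces a 100% contact rate and over-represents stay-at-home households.

```python
import random

random.seed(0)

# Toy population: half the households are usually home (at-home
# propensity 0.9), half are usually out (propensity 0.3).
# These propensities are invented for illustration only.
population = [0.9] * 5000 + [0.3] * 5000
random.shuffle(population)

# Random sample of fixed addresses: the contact rate mirrors availability.
sample = population[:500]
contacted = [p for p in sample if random.random() < p]
random_rate = len(contacted) / len(sample)   # roughly 0.6

# Quota-style fieldwork: keep knocking on successive doors until 500 answer.
# The "contact rate" is 100% by construction.
quota_contacted = []
i = 0
while len(quota_contacted) < 500:
    if random.random() < population[i]:
        quota_contacted.append(population[i])
    i += 1

print(f"random-address contact rate: {random_rate:.0%}")
print(f"mean at-home propensity, random sample: {sum(sample) / 500:.2f}")
print(f"mean at-home propensity, quota sample:  {sum(quota_contacted) / 500:.2f}")
```

The quota sample's mean at-home propensity lands well above the population average of 0.6, which is the bias Piazza is warning about: a perfect contact rate is purchased at the cost of a non-random sample.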

Thursday, November 19, 2009

From this article in the New York Times:

One big question hovering over McChrystal is whether his experience in Iraq truly prepares him for the multiheaded challenge that faces him now. For nearly five years, McChrystal served as chief of the Joint Special Operations Command, which oversees the military’s commando units, including the Army Delta Force and the Navy Seals. (Until recently, the Pentagon refused to acknowledge that the command even existed.)

As JSOC’s commander, McChrystal spent no time trying to win over the Iraqis or training Iraqi forces or building the governing capacity of the Iraqi state. In Iraq (and, for about a third of his time, in Afghanistan), McChrystal’s job, and that of the men under his command, was, almost exclusively, to kill and capture insurgents and terrorists.

The rescue of Iraq from the cataclysm that it had become by 2006 is an epic tale of grit and blood and luck. By February of that year, Iraq had descended into a full-blown civil war, with a thousand civilians dying every month. Its central actors were the gunmen of Al Qaeda, who, with their suicide bombers, carried out large-scale massacres of Shiite civilians; and the Shiite militias, some of them in Iraqi uniforms, who retaliated by massacring thousands of young Sunni men.

Emphasis added. Is this turning into the conventional wisdom? Instead of the 25,000 or so deaths per month that Lancet 2 would suggest for this period, the New York Times is now going with 1,000? That would be in the same neighborhood as (or even below?) IBC's figures.
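The back-of-the-envelope arithmetic here uses only the headline figures; L2's estimated rate escalated over time, so its implied per-month figure for 2006 sits well above the crude average over the whole study period:

```python
# Headline figures only. The ~601,000 number is L2's central estimate of
# violent deaths over the study period; the month count is approximate.
violent_deaths = 601_000   # L2 central estimate, March 2003 - June 2006
months = 40                # roughly March 2003 through June 2006

avg_per_month = violent_deaths / months
print(round(avg_per_month))            # about 15,000 per month on average
print(avg_per_month / 1_000)           # ~15x the 1,000/month the Times cites
```

Even the 40-month average is fifteen times the Times figure, and the late-period rate implied by L2's escalating trend is higher still, which is what the 25,000/month figure above refers to.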

Thursday, June 04, 2009

Old Posts

I am deleting some Lancet posts that I placed on EphBlog and putting them here.

October 21, 2006: Dust Up

Many readers will be perplexed about Billy's comment on the previous post.

Dude, don't you have anything to say about the dust-up at the SSS Blog?

Billy is referring to a post I wrote about the Burnham et al (2006) paper about the number of excess deaths in Iraq resulting from invasion and occupation. (That post was taken down and replaced with this.) For background reading, you can start with Dan Drezner '90. I have been involved in the debate since the first Lancet study, Roberts et al (2004). The post I wrote generated criticism here, here and here.

Now, were it not for Billy's comment, I probably wouldn't have mentioned any of this. This blog is about All Things Eph, after all, not All Things David Kane. But if it were a different Eph than I involved in the debate, I would certainly be following it here. The reason that I published the original post (and related ones) at the SSS Blog was that they seemed to belong there (or someplace like there) and not here.

Now, if this were all happening to some other Eph, I would invite him to start an Eph Diary devoted to the topic here, to post as often as he liked about his endless debates about the study. I would encourage this Eph to make most of his posts brief, with continuations below the break, so that the main page was not so cluttered with ramblings that were not of broad interest to our audience.

Should I follow the inverse of Kant's categorical imperative and treat myself the way I would treat any other Eph? Perhaps. For now, I'll leave it to the readers of EphBlog, especially my fellow authors. Should I start an Eph Diary about my criticisms of these Lancet articles?

August 5, 2007: Eph Diarist on Lancet/Iraq?

Regular readers will know that I am heavily involved (here, here and here) in work critical of estimates of mortality in Iraq published in The Lancet. I think more of this belongs in EphBlog. Do you agree?

Our Eph Diarist feature tries to bring the non-Williams related work of an Eph to a broader audience. If an Eph asked me if she could write regular entries about her efforts on this topic, I would gladly publish her, perhaps with a request that she put most of each entry "below the fold" (as I have done with this one) so that readers uninterested in the topic are not distracted. If people complained that this, like Derek's Red Sox Diary, did not belong on EphBlog, I would tell them to not read her stuff.

But, since I am the one interested in doing this, I can hardly judge my own case. So, regular readers, what do you think? I will probably post an entry or two just to give people a flavor. I like to think that this is great stuff. The substantive issue, mortality in Iraq, is important. The statistical details are subtle. Seeing how science works behind the scenes --- working papers, conference presentations, peer review --- is interesting.

And, if you find all this boring, don't read the entries!

August 6, 2007: "Not Credible"

The most damning comment about the Lancet estimates of mortality in Iraq came in the Wednesday afternoon session.

The presentation by Les Roberts focused on the second Lancet survey (termed L2 by us aficionados). One discussant was Fritz Scheuren of NORC at the University of Chicago. Scheuren, besides being a distinguished statistician, is a past president of the American Statistical Association (the organization in charge of the entire conference).

Scheuren termed the response rate in L2 "not credible." (That is an exact quote. He used the phrase several times and clearly meant it.) Comments:

1) It was amazing to see such an accusation at a professional gathering. Scheuren was not suggesting that Roberts himself was guilty of fraud. Instead, he seems to believe that there is no way that the Iraqi interviewers achieved a 98% response rate.

2) Scheuren is an expert on surveys in conflict zones.

Fritz Scheuren, Ph.D., is a statistical consultant for HRDAG [Human Rights Data Analysis Group]. Recently, Dr. Scheuren consulted on the methods of statistical analysis for Peru's Truth and Reconciliation Commission and reviewed the report of the analysis. In late August of 2003, he visited Peru, along with Dr. Ball and Jana Asher, to meet with representatives of political, military, and civil society groups to explain the technical basis of the findings detailed in the report. In recent years, Dr. Scheuren has advised on the overall direction and approach to the statistical analysis for several HRDAG projects, including Guatemala and Kosovo. As a top statistician in the field, he has also provided critical peer-review of HRDAG work.

If a statistician of Scheuren's caliber and experience says that you have a problem, you have a problem.

3) Note that this point (a 98%+ response rate is ipso facto evidence of fraud) was what first made me (in)famous in Lancet circles last November. Here is a copy of the original post and associated links. Scheuren seems to agree with me. Does ace Lancet defender Daniel Davies consider Scheuren guilty of "devious hack-work?"

4) During the Q&A, Scheuren and Roberts got into a back and forth on data sharing. Scheuren made clear that, while he did not view the response rate as "credible," he did not think that it was absolutely impossible either. He was ready to be convinced if Roberts were to share more data with him. Specifically, Scheuren wants to see the results broken down by interviewer/team. For example, if one of the two main teams reported 100% response rates while the other reported 96%, Scheuren would (I think) be a lot more skeptical of the first team's results. Roberts mumbled something about that data not being on the computer (which is plausible). But Roberts also expressed no interest in providing Scheuren (or anybody else) with more data. At one point, Roberts said that, if it were up to him, the Lancet authors would not have shared any of their data with anyone else.
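Scheuren's per-team request has a concrete statistical rationale. With hypothetical counts (nothing below comes from the actual study, which never released this breakdown), a standard two-proportion z-test shows how sharply a 100% team would stand out from a 96% team:

```python
from math import sqrt

# Hypothetical per-team counts -- the real breakdown was never released.
contacted_a, attempted_a = 940, 940   # Team A: 100% response
contacted_b, attempted_b = 873, 909   # Team B: ~96% response

p_a = contacted_a / attempted_a
p_b = contacted_b / attempted_b

# Pooled two-proportion z-test for the difference in response rates.
pooled = (contacted_a + contacted_b) / (attempted_a + attempted_b)
se = sqrt(pooled * (1 - pooled) * (1 / attempted_a + 1 / attempted_b))
z = (p_a - p_b) / se

print(f"Team A: {p_a:.1%}, Team B: {p_b:.1%}, z = {z:.1f}")
```

A z-score around 6 would flag the two teams as behaving very differently, which is exactly the kind of red flag a per-team breakdown could expose, and exactly why refusing to release it is so frustrating.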

I will have further updates on what happened in Salt Lake City over the next couple of days. When the history of this sad saga is written, the moment when a past president of the American Statistical Association termed the Lancet numbers "not credible" will mark the end of the beginning of the debunking of these flawed studies.

Let me end with the same challenge that I have been issuing for the last year: Is there another survey like the one conducted by L2 (nationwide sample with a single contact attempt) for which the response rate is as high or higher than 98%? I have never been able to find one. Can anyone? Any survey in any country on any topic at any time? And, if we can't, what are the odds that one of the most controversial surveys of the decade should just happen, by chance, to have the highest response rate ever?
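The "what are the odds" question can be made concrete. Assuming roughly 1,849 households (the sample size reported for L2) and granting a generous 70% single-contact success probability per household (the Mexican average quoted in an earlier post), the exact binomial tail probability of achieving a 98% rate under random sampling is effectively zero:

```python
from math import comb

n = 1849                 # approximate L2 sample size (households)
p = 0.70                 # assumed single-contact success probability
k_min = round(0.98 * n)  # households needed for a 98% response rate

# Exact binomial tail probability P(X >= k_min) under random sampling.
prob = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))
print(prob)   # astronomically small under these assumptions
```

The result is on the order of 10^-200. Independent households and a fixed 70% contact probability are simplifying assumptions, of course, but no plausible amount of clustering or local variation closes a gap that large.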

"Not credible," indeed.

[And from the comment thread to that post.]

Excellent catch on this article.

Statisticians grill surveyor of Iraqi casualties of war
By Paul Foy
The Associated Press

The courtroom-style questioning came in a packed ballroom in Salt Lake City at the world's largest gathering of statisticians.

I think that I asked the most "courtroom-style" question. More on that later.

On the hot seat: A globe-trotting researcher who says his team's surveys of Iraqi households projected nearly 655,000 had died in the war as of July 2006, a number still 10 times higher than conventional estimates.

Leslie F. Roberts and others from Johns Hopkins University took accounts of births and deaths in some Iraqi households to estimate that the country's death rate had more than doubled after the 2003 invasion.

Number crunchers this week quibbled with Roberts' survey methods and blasted his refusal to release all his raw data for scrutiny - or any data to his worst critics. Some discounted him as an advocate for world peace, although none could find a major flaw in his surveys or analysis.

I think that "blasted" is a fair description of the reaction to Roberts' refusal to share data with his critics. In terms of a "major flaw," too bad the reporter did not ask me! I give good quote.

However, Stephen Fienberg, a professor of statistics at Carnegie Mellon University in Pittsburgh, said: ''I thought the surveys were pretty good for non-statisticians.''

Harsh! More on what else Fienberg had to say soon.

Roberts, an epidemiologist, said he is opening a new front in the study of public health hazards: War. He has conducted about 30 mortality studies since 1990 in conflicts around the globe, including the Congo, where he was similarly accused of exaggerating war-related deaths.

Says who? I have never read of such an accusation. Has anyone? This sounds to me like something that Roberts spoon fed to the reporter. "See? I get this criticism all the time . . . "

Roberts organized two surveys of mortality in Iraqi households that were published last October in Britain's premier medical journal, The Lancet. He acknowledged that the timing was meant to influence midterm U.S. elections.

Really? Roberts admitted this (sort of) with regard to L1 but has since backed away. I have never seen him admit this for L2. Has he? Perhaps this came up at lunch because I did not see Roberts do this in the main presentation.

''It puts you in a position where you are going to get attacked,'' said Fritz Scheuren, a senior fellow at the University of Chicago's National Opinion Research Center, who is trying to organize another Iraqi survey to see if he can match Roberts' results.

Scheuren, the American Statistical Association's former president, said he couldn't find anything wrong with The Lancet surveys.

This is not inconsistent with what I wrote above. Scheuren is a careful guy. He cannot point to a flaw in the data that he has seen. He wants to get more detailed data. He still does not find the results "credible."

He complained, however, that he wasn't able to get Roberts to reveal which of his Iraqi surveyors conducted which surveys, information that could reveal any bias in workers who compile consistently implausible results.

Roberts said he won't release the researchers' identities for fear of exposing them to retaliation. The Iraqi government has strongly disputed the findings.

More heat (generated by Roberts?) than light. Scheuren does not want to know the "researchers' identities." None of us do. We just want to know that Researcher A reported this data; Researcher B reported that data; and so on. Roberts has an annoying habit of pretending that his critics are asking for names when he knows we just want to see if the results vary greatly by interviewer.

August 7, 2007: Holocaust Denial

Besides his formal presentation, Les Roberts also participated in a luncheon roundtable entitled "Media Coverage Regarding Data on Mortality in Iraq."

I was able to pull up a chair and listen in. Roberts is a charming and engaging speaker. Most of the ten or so people around the table were clearly fans. For me, the best part was when Roberts, perhaps playing a bit to his audience, compared people who disagree with the Lancet estimates to people who deny the Holocaust.


I have heard of Roberts using this comparison in the past, but this was the first time that I got a chance to witness it myself. He comes close to making that accusation in his speech at Brown, but shies away from making the comparison explicit, perhaps because he knew that the tape was running. He frames the issue by pointing out (correctly!) that Iranian President Mahmoud Ahmadinejad makes people angry when he describes the Holocaust as a "myth." Roberts thinks that President Bush dismissing the Lancet estimates amounts to the same thing.

It is not clear if Roberts thinks that Fritz Scheuren, past president of the American Statistical Association, is no better than famous Holocaust denier David Irving. Clarification welcome!

Thursday, February 26, 2009

Burnham Sanctioned

Two weeks ago, I claimed that a partial victory for us Lancet skeptics would come with the "Censure of Roberts/Burnham by Johns Hopkins." I declare victory!

Iraq Researcher Sanctioned

Associated Press
Tuesday, February 24, 2009; Page A04

The Johns Hopkins Bloomberg School of Public Health is sanctioning the lead author of a 2006 study that suggested massive civilian deaths in Iraq.

The school announced yesterday that it is barring Gilbert M. Burnham from serving as a principal investigator on projects involving human subjects, saying he violated school policies by collecting the names of those interviewed.

The school completed an internal review of the study, which estimated that nearly 655,000 Iraqis had died because of the U.S.-led invasion and war in Iraq. The review found that inclusion of identifiers did not affect the results of the study.

The school says the paper published in the Lancet medical journal incorrectly stated that identifying data were not collected. A correction will be submitted.

Now, "sanction" is not the same thing as "censure," but it is close enough for blog-work. The original Hopkins news release is here. (Hat tip to Tim Lambert.) Key sections:

The Bloomberg School of Public Health’s IRB acted properly in determining that the original study protocol was exempt from review by the full IRB under federal regulations. The original protocol explicitly stated that no names of study participants or living household members would be collected. The protocol also included an appropriate script to secure verbal consent from study participants, rather than a written consent process that would have included participants’ signatures.

Note that the interests of Hopkins and Gilbert Burnham are not necessarily aligned, although both wish the whole debate would go away. Hopkins wants to ensure that the Federal Government isn't angry with it since Hopkins is so dependent on federal money. Hopkins' key concern is to demonstrate that the university did nothing wrong. If that means throwing Burnham under the bus, so be it.

An examination was conducted of all the original data collection forms, numbering over 1,800 forms, which included review by a translator.


A review of the original data collection forms revealed that researchers in the field used data collection forms that were different from the form included in the original protocol. The forms included space for the names of respondents or householders, which were recorded on many of the records.

If the interviewers weren't even using the correct form (and did not visit the correct governorates), then why should we believe that they did anything else correctly?

Left unexplained is whose fault this is. Did Burnham hand Lafta the wrong form? On purpose? By mistake? Or did Burnham give Lafta the correct form and then Lafta just used the wrong one? On purpose or by mistake? Recall (pdf) that the process included two face-to-face meetings.

American and Iraqi team members met twice across the border in Jordan, first to plan the survey and later to analyze the findings.

My guess is that Burnham did everything correctly (he seems like a careful and professional guy to me) but then Lafta just screwed it all up, either out of sloppiness or because field work is hard, even for the most diligent of researchers. And then Burnham got in real trouble because, after the fact, he tried to cover for Lafta. It is always the cover-up, never the crime.

And that is the real issue for Burnham's credibility. If he had just fessed up immediately, then that would be one thing. But, instead, he made a series of misleading claims about what happened. Consider just six examples:

1) Burnham's February 2007 talk at MIT (video here).

There were limitations to record keeping; we had some criticism that we could not produce a record showing which households were visited and what the names of people were, and so forth.

We intentionally did not record that, because we felt that if the team were stopped at a checkpoint, of which there are lots of checkpoints, and the records were gone through, some of you may have had this experience, where you stop at a checkpoint, people go through all your papers, read everything, and they find certain neighborhoods. That might have increased risk, which we didn't want to do.

That's not true. The survey forms included names and Burnham knew that at the time he gave his talk to MIT. Note that these comments were part of the main presentation, something that Burnham had probably given on multiple occasions, not an off-the-cuff answer to a question from the audience. If Burnham said this at MIT, he probably said it elsewhere.

2) Burnham wrote to blogger Tim Lambert in February 2008:

As far as the survey forms, we have all the original field survey forms. Immediately following the study we met up with Riyadh (in this very hotel I am now in) and Shannon, Riyadh and I went through the data his team had computer entered, and verified each entry line-by-line against the original paper forms from the field. We rechecked each data item, and went through the whole survey process cluster-by-cluster. We considered each death, and what the circumstances were and how to classify it. Back in Baltimore as we were in the analysis we checked with Riyadh over any questions that came up subsequently. We have the details on the surveys carried at each of the clusters. We do not have the unique identifiers as we made it clear this information was not to be part of the database for ethical reasons to protect the participants and the interviewers.

Again, this statement is incorrect. Burnham did have the full names of at least some (or many? or most?) of the survey participants.

3) Burnham in an interview with the National Journal.

Hopkins's guidelines say that substantial modification of survey procedures without approval by the university's IRB can be deemed a serious violation. "All intentional noncompliance is considered to be serious," the guidelines state.

The survey "was carried out as we designed it," Burnham told National Journal.

Not true. The survey design called, explicitly, for the names of participants to not be collected. But the names were collected. Even if one tries to defend Burnham (see below) by insisting that he did not realize, until recently, that full names were collected, we still have the problem that Burnham has always known that Lafta used the wrong forms, forms with a space for the names of the participants. Using the wrong form is, obviously, not carrying out the survey "as we designed it."

4) A description in Johns Hopkins Magazine.

Concern for the safety of interviewers and respondents alike produced two more decisions. First, they would not record identifiers like the names and addresses of people interviewed. Burnham feared retribution if a hostile militia at a checkpoint found a record of households visited by the Iraqi survey teams.

Now, a Burnham defender might argue that a) Burnham is not quoted, so this misstatement is not his fault and b) there is no misstatement here since "would not" is not the same as "did not." Perhaps. But the article clearly featured extensive cooperation from Burnham (and Doocy). They have some responsibility to make sure that the author describes their work accurately. And, to the extent that a mistake is made, they have an obligation to correct it. If Johns Hopkins Magazine cannot trust professors from Johns Hopkins to clearly describe their research, then what is the world coming to?

The point of this example is not so much that it is damning in and of itself. It isn't. The point is that it is part of a pattern of Burnham giving his listeners a false impression of whether or not names were collected. He makes everyone think that names were not collected even though they were.

5) The official Q&A from the Hopkins page, since removed (along with other material) for obvious reasons.

Approval specified that no unique identifiers would be collected from households visited by researchers, including complete names, addresses, telephone numbers or other information which could potentially put the households at risk. While household demographics were collected in both Iraq studies, personal information such as the date, location and cause of death was collected only for deceased household members. Research regulations do not consider a dead person to be a human subject and informed consent is not required for uniquely identifying information on the characteristics and circumstances of death. Informed consent was obtained from the principal respondent in each household before interviews were conducted.

"Personal information" was, we now know, collected for some living household members. Although Burnham's name does not appear on this Q&A, he was and is the Director of the Center for Refugee and Disaster Response. It is hard to believe that he did not sign off on the document before it was posted.

6) "The Human Cost of the War in Iraq" (pdf), the official companion piece to Burnham et al (2006).

The survey was explained to the head of household or spouse, and their consent to participate was obtained. For ethical reasons, no names were written down, and no incentives were provided to participate.

Since names were written down, this statement is also not true. One might defend Burnham's misstatement at the MIT lecture as being an honest mistake, made while speaking off-the-cuff. One might argue that an informal e-mail to a blogger like Tim Lambert does not require perfect accuracy. One might believe that interviewers from the National Journal and Johns Hopkins Magazine were mistaken. One might assume that Burnham never saw/approved that Q&A on the homepage of the Center that he heads.

But here we have a formal report with a statement that Burnham knew was both untrue and important. (If it were a trivial issue, then Hopkins would not have sanctioned him and no correction to the Lancet would be necessary.) How many other untrue statements are there?

Even worse, all the other L2 authors are also authors of this paper. Now, it could be that some of those authors (like Les Roberts) never saw the actual forms and so might be guilty of nothing more than an honest mistake. But Shannon Doocy saw the forms. She knew that this statement was not true.

Is Burnham telling the truth even now? Good question! Consider this article from the Baltimore Sun:

The Johns Hopkins University has disciplined the lead author of a widely publicized study that reported widespread civilian deaths in Iraq as a result of the U.S. invasion.

Discipline, sanction, censure. Whatever. They all work for me!

Because of the difficulty of carrying out research in Iraq during the war, Burnham and his team partnered with Iraqi doctors at a university in Iraq. Burnham, working out of Jordan, said he made it clear to the doctors that they could collect the first names of children and adults, to help keep the information straight, but that last names could not be collected.

Huh? This seems really fishy to me. I assume that collecting first names was not a part of the protocol that Burnham submitted to Hopkins. I have certainly never heard of anything like this. (Counter examples welcome!) You either write down someone's name or you don't. So, Burnham knew, before the interviewers went out, that they were using a form with a spot for names? And he didn't say, "Wait! That's not the form you are supposed to use." The whole thing makes no sense. Unless Burnham is trying to cover for Lafta, trying his best to make Lafta's recording of names look less damning.

And that is key to defending the results of the study. If Lafta screwed this up, then there is every reason to think that he screwed up other stuff, either on purpose or by mistake. If he wrote down names when Burnham told him not to, then how does anyone know if he followed the assigned sampling plan? We don't.

But if Burnham can take the blame himself, can claim that he told/allowed Lafta to write down partial names, then, perhaps, the rest of Lafta's work can still be trusted. I do not believe that Burnham really told Lafta that he could write down partial names.

You disagree? Fine. Let's ask Shannon Doocy. She was with Burnham in Jordan. She was a co-author of the study. She knows Lafta. Presumably, she was in the room when Burnham/Lafta discussed the planning for the study. I bet that she can't/won't back up Burnham on this point. She's not 66 and tenured . . .

When the surveys came back to him in Jordan, it appeared that some had last names. Many were in Arabic. Burnham said he asked his Iraqi partners and was told that the names were not complete, which he accepted. But Hopkins, in its investigation, found that the data form used in the surveys was different from what was originally proposed, and included space for names of respondents. Hopkins found that full names were collected.

Oh, what a tangled web we weave . . .

If I were Baltimore Sun reporter Stephen Kiehl, I would ask some follow-up questions:

1) How many of the 1,800+ forms had names? How many of those names were full names (as opposed to just first names)? Were those names all in Arabic, all in English or a mixture? How many of the names were in English?

"Many" might be 20 or 100 or 500. But, given that the forms featured lots of English words (cause of death and so on), I would bet that many (hundreds?) names were in English. So, Burnham looked at a name like "Nouri al-Maliki" or "Iyad Allawi" and said, "Sure. I accept that these names are not complete." Unlikely! Again, it would be nice to ask Shannon Doocy some questions as well.

2) What is Burnham's take on the fact that his "Iraqi partners" misled him? (Was this just Lafta, or Lafta and some other interviewers? My understanding had been that it was just Lafta who brought the forms to Burnham/Doocy in Jordan. And note that Lafta is not just a "partner"; he is an official co-author on the study.) Here we have documented proof that one author of the study misled another author of the study. How much faith does Burnham think we should have in a study in which the authors are lying to each other?

3) Why did Burnham mislead so many news organizations? Consider his (and Roberts') letter (also here) to the National Journal:

In the ethical review process conducted with the Bloomberg School of Public Health's Institutional Review Board, we indicated that we would not record unique identifiers, such as full names, street addresses, or any data (including details from death certificates) that might identify the subjects and put them at risk.

Oh, snap! Did you catch that? I thought that this was just going to be one of many examples of Burnham lying, after the fact, about not collecting names. But note that he is not lying! Sneaky! He and Roberts do not claim (falsely) that they did not collect names. They claim (truthfully) to have told Hopkins that they "would not" record names. Hah! I did not catch that the first time around because I assumed a basic level of honesty from Burnham. My mistake! (Further discussion here.)

Comments:

1) This is circumstantial evidence that, at least by this point, Roberts was in on the lie.

2) I suspect that "full names" rather than just "names" is used to preserve wiggle room on various dimensions.

Tuesday, February 24, 2009

Likely Correction?

The Johns Hopkins statement includes:

The paper in The Lancet incorrectly stated that identifying data were not collected. An erratum will be submitted to The Lancet to correct the text of the 2006 paper on this point.

I am confused about how this is going to work. The only relevant statement that I could find in the paper was:

The survey purpose was explained to the head of household or spouse, and oral consent was obtained. Participants were assured that no unique identifiers would be gathered.

For all we know, this is a true statement. Perhaps this is what participants were told, albeit incorrectly. If that is so, then a hard core Lancet defender, like Les Roberts, might maintain that no correction is necessary. In fact, don't all the authors of a paper need to agree on any correction? I suppose that a single author, like Burnham, can say whatever he wants, but I wouldn't think that a single author has the right to make an official change unless the other authors agree. At most, a single author can just demand that his name be removed from the publication. And, say what you will about Les Roberts, but he is a head-strong fellow. Not that there is anything wrong with that! What if Burnham wants to make a correction but Roberts doesn't?

Perhaps there is some other reference in the paper that I have missed.

Moreover, wouldn't this also be a good time to correct the mistaken description of the sampling scheme in the paper? After all, even Roberts/Burnham admit that it is not accurate.

Anyway, I don't have a strong opinion on what will happen. I am just curious what others predict . . .

Misleading Statements on Unique Identifiers

I had assumed that it would be easy to find places where Roberts/Burnham claimed that names were not collected. It turns out that it isn't. It looks like they knew that this fact might eventually come out, so they tried to mislead without lying. Consider their letter to Science.

Those who work in conflict situations know that checkpoints often scrutinize written materials carried by those stopped, and their purpose may be questioned. Unique identifiers, such as neighborhoods, streets, and houses, would pose a risk not only to those in survey locations, but also to the survey teams. Protection of human subjects is always paramount in field research. Not including unique identifiers was specified in the approval the study received from the Johns Hopkins Bloomberg School of Public Health Committee on Human Research. At no time did the teams “destroy” details, as Bohannon contends. Not recording unique identifiers does not compromise the validity of our results.

Were Burnham/Roberts lying? Tough to say! I (and everyone else who read those words two years ago) assumed that the names of the participants were not collected. (We now know that at least some full names were recorded.) But the above passage is written in such a way as to give the impression that no names were collected --- "Not including unique identifiers was specified in the approval the study received" --- without explicitly stating that this had been done. Clever!

Recall how the paper itself handled the issue.

The survey purpose was explained to the head of household or spouse, and oral consent was obtained. Participants were assured that no unique identifiers would be gathered.

Again, this could be a truthful statement. Who knows what the interviewers said to the participants? But it completely misleads the reader into thinking that names had not been collected when, in fact, they had been collected.

So, hard core Lancet defenders will insist that no correction is necessary. Then why is Johns Hopkins insisting that one be made?

The paper in The Lancet incorrectly stated that identifying data were not collected. An erratum will be submitted to The Lancet to correct the text of the 2006 paper on this point.

If you agree with Hopkins that a correction is necessary, then you should also believe that Roberts/Burnham owe an apology/correction to, among others, the readers of Science. They were just as likely to have been misled.

But also consider this June 2007 letter to Science from Burnham, Roberts and Doocy.

At the time the study was published, Iraqi colleagues requested that we delay release, as they were very fearful that somehow streets, houses, and neighborhoods might be identified through the data with severe consequences. We agreed to wait for 6 months and have now made the data available.

From the beginning, we have taken the position that protecting participants is paramount, and thus we will not be releasing any identifiers below the level of governorate. The demand by some for street names seems to arise from the erroneous belief that only main streets were sampled, when in fact, where there were residential streets that did not intersect with the selected commercial street, these too were included in the sampling frame for identification of the start house. In any event, our interviewers reported that most, although not all, violent deaths occurred away from the location of residence.

Does this make much sense? To the extent that their "Iraqi colleagues" were "very fearful," it would not be because of specific neighborhoods being identified via statistical magic but because the full names of participants could be released. That's the real problem. By pretending otherwise, Burnham/Roberts/Doocy continued to mislead all of us into thinking that the worst conceivable data release (from the point of view of participant safety) might allow identification of a specific street when, in fact, the worst possible release would involve the actual names of participants. Again, there is nothing in the above that is literally a lie --- Who knows what their "Iraqi colleagues" told them? --- but there is no doubt that readers were misled, again, into thinking that names had not been recorded.

Thursday, February 19, 2009

L2 Sampling Details

This post serves to collect information from various sources about the details of the L2 sampling procedure.

Summary: L2 authors have given different (and conflicting) accounts of exactly what the interviewers did and/or were supposed to do. They have made no final statement about what sampling plan was followed. Anyone who claims to "know" what the sampling plan in L2 was is lying.

Let's begin with the description from the paper itself.

As a first stage of sampling, 50 clusters were selected systematically by Governorate with a population proportional to size approach, on the basis of the 2004 UNDP/Iraqi Ministry of Planning population estimates (table 1). At the second stage of sampling, the Governorate’s constituent administrative units were listed by population or estimated population, and location(s) were selected randomly proportionate to population size. The third stage consisted of random selection of a main street within the administrative unit from a list of all main streets. A residential street was then randomly selected from a list of residential streets crossing the main street. On the residential street, houses were numbered and a start household was randomly selected. From this start household, the team proceeded to the adjacent residence until 40 households were surveyed. For this study, a household was defined as a unit that ate together, and had a separate entrance from the street or a separate apartment entrance.

The problem with this description, as highlighted by the "main street bias" work in Johnson et al (2008), is that there are houses/streets that are not included in the sampling frame. Consider this figure from Johnson et al.

It is obvious that some houses are not within the sample frame.
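To make the point concrete, here is a toy simulation of the published two-stage street selection. The street names and layout are entirely hypothetical (invented for illustration, not taken from L2 or Johnson et al): a residential street that crosses no main street can never be chosen, so its households have inclusion probability zero.

```python
import random

# Hypothetical toy town, for illustration only (not L2 data).
# Each residential street lists the main streets it crosses.
crosses = {
    "Elm":   ["Main-1"],
    "Oak":   ["Main-1"],
    "Birch": ["Main-2"],
    "Cedar": [],  # crosses no main street at all
}
main_streets = ["Main-1", "Main-2"]

def select_street(rng):
    """One draw of the published procedure: pick a main street at
    random, then a residential street crossing it at random."""
    main = rng.choice(main_streets)
    candidates = [s for s, ms in crosses.items() if main in ms]
    return rng.choice(candidates)

rng = random.Random(0)
trials = 100_000
hits = {s: 0 for s in crosses}
for _ in range(trials):
    hits[select_street(rng)] += 1

for street in crosses:
    print(street, round(hits[street] / trials, 3))
# Cedar comes out at exactly 0: its households are simply outside
# the sampling frame, which is the "main street bias" complaint.
```

Note that the simulation also shows a second distortion: Birch, the only street crossing Main-2, is selected about twice as often as Elm or Oak, so even streets inside the frame do not have equal selection probabilities.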

Now, just because the stated procedure has this problem does not invalidate it. Given the time/resource constraints faced by L2, this approach is perfectly reasonable. The critical thing is not so much the exact approach used as it is transparency on the part of the L2 authors as to their procedure.

The best single source for an overview of L2 sampling issues is Mike Spagat's "Ethical and Data-Integrity Problems in the Second Lancet Survey of Mortality in Iraq" (pdf), forthcoming in Defence and Peace Economics. Below are some of the key sections:

The authors of L2 have still not fully disclosed their sample design (Bohannon, 2008, Spagat, 2007). Gilbert Burnham and Les Roberts have stated frequently that the L2 field teams did not follow the sampling methodology that was published in the Lancet but they have not supplied a viable alternative. Burnham and Roberts have also issued a series of contradictory statements about their sampling procedures and have either destroyed or not collected evidence necessary to evaluate these procedures.

See Spagat for citations.

These lists of main streets are at the core of the claimed sampling methodology. Yet, the L2 authors have refused to provide these lists or even clarify where they came from.

The footnote associated with this sentence is particularly damning.

For example, Seppo Laaksonen, a professor of survey methodology in Helsinki, requested and was denied any information on main streets, even the average number of main streets per cluster (Laaksonen, 2008).

This is the sort of behavior that makes me (and most statisticians) incredibly suspicious of L2. It is fine, perhaps, for the L2 authors to ignore me. I am just some random guy on the internet. It is fine, perhaps, for them to ignore Spagat et al since Spagat has been so critical of their work. But to refuse to answer a question from a professor of survey methodology with no particular axe to grind is just pathetic. If you are an academic, you answer questions from your fellow academics.

Back to Spagat:

Gilbert Burnham did make aspects of the sampling methodology fairly concrete in Biever (2007), an interview with the New Scientist.

“The interviewers wrote the principal streets in a cluster on pieces of paper and randomly selected one. They walked down that street, wrote down the surrounding residential streets and randomly picked one. Finally, they walked down the selected street, numbered the houses and used a random number table to pick one. That was our starting house, and the interviewers knocked on doors until they’d surveyed 40 households…. The team took care to destroy the pieces of paper which could have identified households if interviewers were searched at checkpoints.” (Biever, 2007, emphasis added.)

Whatever its strengths or weaknesses, this does seem to be a procedure that can be followed in the field. The L2 authors may no longer be able to specify their sample design since these pieces of paper have been destroyed. But they should be able to supply lists of principal streets or at least specify how many such streets there were per governorate.

Burnham explains that the sampling information was destroyed to protect the identities of respondents, but this explanation is inadequate. Pieces of paper with lists of principal streets and surrounding streets would be of no use for identifying households included in the survey. Even lists of all of the households on a street that was actually sampled would not be usable for identifying particular L2 respondents. On the other hand, the L2 data-entry form that Riyadh Lafta submitted to the WHO contains spaces for listing the name of each head of household in addition to names of people who died or were born during the L2 sampling period. If the field teams could travel around with pieces of paper containing the names of their respondents plus many of their family members then they did not have to destroy lists of streets. Finally, as noted above in section 2, the lists of L2’s respondents would have been widely known at the local level in any case.

The L2 authors have often dismissed the possibility of sampling bias by stating that they did not actually follow the sampling procedures that they claimed to have followed in their Lancet publication. For example, Burnham and Roberts (2006a) write that they had removed the following sentence from their description of their sampling methodology at the suggestion of peer reviewers and the editorial staff at the Lancet:

"As far as selection of the start houses, in areas where there were residential streets that did not cross the main avenues in the area selected, these were included in the random street selection process, in an effort to reduce the selection bias that more busy streets would have." (Burnham and Roberts, 2006a)

Combining this with Gilbert Burnham’s New Scientist interview already quoted (Biever, 2007) would imply that at each location:

A. Field teams wrote names of main streets on pieces of paper and selected one street at random.

B. The field teams then walked down this street writing down names of cross streets on pieces of paper and selected one of these at random.

C. The field teams then became aware of all other streets in the area that did not cross the main avenues and may have selected one of these instead of one of the cross streets written on pieces of paper. This wide selection was done according to an undisclosed procedure.

The Biever (2007) description of Burnham does outline a sampling procedure that could have been followed and is broadly consistent with the published methodology. If other types of streets, beyond those that would be covered by the published methodology, were included in the sampling procedures then the authors need to specify how these streets were included. More fundamentally, how did the field teams discover the existence of such streets that could not be seen by walking down principal streets as described by Burnham in Biever (2007)? The L2 field teams would not have brought detailed street maps with them into each selected area or else it would not have been necessary to walk down selected principal streets writing down names of surrounding streets on pieces of paper. We can also rule out the possibility that the teams completely canvassed entire neighborhoods and built up detailed street maps from scratch in each location. Developing such detailed street maps would have been very time consuming and the L2 field teams had to follow an extremely compressed schedule that required them to perform forty interviews in a day (Hicks, 2006).

In Giles (2007), an article in Nature, Burnham and Roberts suggested one possible explanation on how the field teams had managed to augment their street lists beyond streets that could be seen by walking down a main street but this suggestion was rejected by an L2 field-team member interviewed by Nature:

“But again, details are unclear. Roberts and Gilbert Burnham, also at Johns Hopkins, say local people were asked to identify pockets of homes away from the centre; the Iraqi interviewer says the team never worked with locals on this issue.” (Giles, 2007)

Even if locals had identified such “pockets of homes away from the centre” the authors still would have to specify how these were included in the randomization procedures. Indeed, involving local residents in selecting the streets to be sampled would seem to be at odds with random selection of households. Locals could, for example, lead the survey teams to particularly violent areas.

Burnham and Roberts have induced further confusion about their sample design by issuing a series of contradictory statements.

"The sites were selected entirely at random, so all households had an equal chance of being included." (Burnham et al, 2006b, emphasis added)

"Our study team worked very hard to ensure that our sample households were selected at random. We set up rigorous guidelines and methods so that any street block within our chosen village had an equal chance of being selected." (Burnham and Roberts, 2006b, emphasis added)

“… we had an equal chance of picking a main street as a back street.” (The National Interest, 2006).

These statements contradict each other and the methodology published in the Lancet. Some streets are much longer than others. Some streets are much more densely populated than others. Such varied units cannot all have equal probability of selection. If, for example, every street block had an equal chance of selection then households on densely populated street blocks would have lower selection probabilities than households on sparsely populated street block. If main streets are more densely populated on average than back streets are and main streets and back streets have equal selection probabilities then households on main streets would have lower selection probabilities than households on back streets.
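The arithmetic behind Spagat's point is simple enough to spell out. A minimal sketch with made-up numbers (the block sizes below are hypothetical, chosen only for illustration): if every street block has the same selection probability and one household is then interviewed on the selected block, a household's inclusion probability scales inversely with the size of its block.

```python
# Hypothetical block sizes, for illustration only.
blocks = {"dense": 40, "sparse": 4}   # households per street block

p_block = 1.0 / len(blocks)           # each block equally likely: 0.5

# With one interview per selected block, a given household's chance
# of inclusion is p_block divided by the number of households there.
p_household = {name: p_block / n for name, n in blocks.items()}

print(p_household)   # {'dense': 0.0125, 'sparse': 0.125}
# A household on the sparse block is 10 times as likely to be
# interviewed: "equal chance" for street blocks and "equal chance"
# for households cannot both hold when block sizes differ.
```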

All this is bad enough. Yet Lancet defenders assert that these contradictions are unimportant, that such discrepancies are inevitable when researchers try to explain their work to the public, and that much of the "conflict" is just caused by a reasonable effort on the part of the Lancet authors to simplify.

And that is a reasonable defense. But, to work, there must come a point at which we can point to a specific statement that describes the sampling plan used. What is the truth? Normally, the truth would be whatever is written in the published paper, but the Lancet authors now claim that the published work is not accurate. (And they have failed to publish a correction in the Lancet.) The truth could also be the official sampling plan that the authors created prior to starting the project. They must have written up a plan ahead of time. If they were to release that now, they could point to that.

So, if the truth is not in the paper, or in a correction to the paper or in an official document released by the authors, where is it? Good question! No one knows.

Burnham gave a presentation at MIT in February 2007. (See here for video, transcript and extensive commentary.) Let me quote my prior commentary.

Burnham claims that they did not restrict the sample to streets that crossed their main streets. Instead, they made a list of "all the residential streets that either crossed it or were in that immediate area." This is just gibberish.

First, if this was what they actually did, why didn't they describe it that way in the article? Second, given the time constraints, there was no way that the teams had enough time to list all such side streets. Third, even if the interviewers did do it this way, the problem of Main Street Bias would still exist, except it would be more of a Center Of Town Bias. Some side streets are in the "immediate area" of just one main street (or often in the area of none) and other side streets (especially those toward the center of a town or neighborhood) are near more than one. The latter are much more likely to be included in the sample.
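The Center Of Town Bias is easy to demonstrate numerically. In this hypothetical sketch (street layout invented purely for illustration), a side street "in the immediate area" of two main streets appears on two candidate lists and so is selected twice as often as a side street near only one.

```python
import random

# Hypothetical layout, for illustration only: which main streets
# each side street is "in the immediate area" of.
near = {
    "Edge-A":  ["Main-1"],
    "Edge-B":  ["Main-2"],
    "Central": ["Main-1", "Main-2"],  # near both main streets
}
main_streets = ["Main-1", "Main-2"]

rng = random.Random(1)
trials = 100_000
counts = {s: 0 for s in near}
for _ in range(trials):
    main = rng.choice(main_streets)
    # Only side streets listed for the chosen main street are eligible.
    candidates = [s for s, ms in near.items() if main in ms]
    counts[rng.choice(candidates)] += 1

for street, c in counts.items():
    print(street, round(c / trials, 3))
# Central sits on both candidate lists, so it is picked about half
# the time; each edge street is picked about a quarter of the time.
```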

Again, it is not unreasonable for there to be some confusion in the immediate aftermath of publication about exactly what the interviewers did. My point here is not to attack Burnham for an inartful phrase like "equal chance" as used in an interview. But, given that this was such an important matter of public concern, I would expect Burnham to have his story straight by the time that he spoke at MIT, 4 months after publication.

But things are even worse than that!

Consider the Q&A that the Lancet authors posted in early 2008 on the Johns Hopkins web page. Note the description of the sampling:

Sampling for the 2006 study was designed to give all households in Iraq an equal chance of being included.

This statement cannot possibly be true. How can the authors, more than a year after the study was published, assert such an obvious falsehood about the single most important criticism of the study?

But, even though this is a false statement, it is at least a clear one. Want to know what the official sample plan was? Don't check the paper. (It's wrong.) Don't look for corrections. (None were made.) Don't look for the original planning documents. (They were never released.) But, at least there is an official statement on the Johns Hopkins website. When Tim Lambert and other Lancet defenders want to claim that the sampling plan was such and such, they could link here as evidence.

But then the Lancet authors deleted that page! (This was in conjunction with various other false statements made on the web page, as well as other unexplained deletions.) So, the official word is that there is no official word. The last clear statement made by the Lancet team --- a statement that, one assumes, supersedes the published paper and other interviews given by the authors --- was removed from the web without any explanation.

Anyone who claims to know the sampling plan used in L2 is lying.