Sunday, March 09, 2008

Data for L1

Those interested can download (some of) the data from L1 here. Comments:

1) Thanks to Tim Lambert for posting that data. I wonder if Les Roberts has ever asked him to remove it? When we chatted briefly at JSM last summer, Roberts seemed to accuse me of posting his data on my blog. I had not done so, since he never gave me his permission. I only included the data in my R package after Lambert made the data public. I am not sure what I would do if Roberts asked me to remove the data from my package. On the one hand, I take proper scholarly behavior seriously. It is their data and I would not share it without permission. Indeed, several folks have asked me to provide a copy of the L2 data. I have never done so (even though I could have without getting caught) because I don't think that scientists ought to behave that way. On the other hand, the L1 data is a trickier case because Lambert has already placed the data in the public domain. (In fact, I made sure to download the data from his site rather than use the copy that Roberts gave to me.) Fortunately, Roberts has not asked me to remove the data from the R package, so I am covered. Also, I have discussed the issue with Burnham and Doocy, who have offered no objections.

2) The back story is that I was trying to get the data out of Roberts and was cc'ing Lambert, both out of politeness (since it was his blog, Deltoid, that got me involved in the debate) and out of a sense that Roberts was more likely to play nice if Lambert, a vigorous supporter of L1, was listening in on the conversation.

3) I have not looked closely at the cluster-level data for L1 that is available above. My main focus has been on what happens when you include and exclude Falluja from the analysis. But I am still somewhat suspicious of the raw data here, and not just the ludicrous outlier that is Falluja. For example, the range of average household size seems ridiculously large. Assume that each cluster included 30 households. Then the average household size in Karbala 1 is just 4.6, rising to 11.5 in Thaura (Baghdad). (Hat tip to Mike Spagat.) Does that seem reasonable? I guess that household size might be larger in the city than in the country (?), but more than twice as large? Is that consistent with data from other surveys?
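The arithmetic behind that comparison is easy to check. Here is a minimal Python sketch that uses only the two averages quoted above and the 30-households-per-cluster assumption. The function and variable names are mine, chosen for illustration. Since the averages are rounded to one decimal place, the head counts it backs out are approximate.

```python
# Assumption from the post: every cluster contains 30 households.
HOUSEHOLDS_PER_CLUSTER = 30

# Average household sizes quoted in the post (rounded to one decimal).
quoted_avg_size = {"Karbala 1": 4.6, "Thaura (Baghdad)": 11.5}


def implied_persons(avg_size, households=HOUSEHOLDS_PER_CLUSTER):
    """Back out the approximate cluster head count implied by an average size."""
    return round(avg_size * households)


def size_ratio(larger, smaller):
    """How many times larger one average household size is than another."""
    return larger / smaller


persons = {name: implied_persons(avg) for name, avg in quoted_avg_size.items()}
ratio = size_ratio(quoted_avg_size["Thaura (Baghdad)"], quoted_avg_size["Karbala 1"])

print(persons)
print(f"Thaura / Karbala 1 = {ratio:.2f}")
```

So the Thaura households would be about two and a half times the size of those in Karbala 1, if the 30-household assumption holds for both clusters.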

4) Recall that Roberts (can't find the link) only participated in the surveys for the first 8 clusters. At that point, there was some problem with the police and his interviewers argued that he (as an American) was putting them in danger. So, he spent the rest of the time in Baghdad. The obvious question is: Were the results very different between the clusters that Roberts supervised and the ones that he did not? A quick glance suggests that this is tough to know because, judging from the dates, he participated in all (?) the clusters in Baghdad, so any comparison of "his" clusters against the rest would need to adjust for that in some way. I am too busy to tackle this now.
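For anyone who wants to try, one simple way to "adjust for that" is to compare attended and unattended clusters only within the same stratum (Baghdad versus everywhere else). The Python below is a rough sketch of that idea, not an analysis I have run. The record keys (`stratum`, `attended`, `deaths`, `persons`) are hypothetical and would have to be mapped onto whatever columns the posted data actually has, and the toy numbers are made up purely to show the mechanics.

```python
from collections import defaultdict


def stratified_rates(clusters):
    """Crude death rate (deaths / persons) for each (stratum, attended) group.

    `clusters` is an iterable of dicts with hypothetical keys:
    'stratum' (str), 'attended' (bool), 'deaths' (int), 'persons' (int).
    """
    totals = defaultdict(lambda: [0, 0])  # (stratum, attended) -> [deaths, persons]
    for c in clusters:
        key = (c["stratum"], c["attended"])
        totals[key][0] += c["deaths"]
        totals[key][1] += c["persons"]
    return {key: d / p for key, (d, p) in totals.items() if p > 0}


def within_stratum_differences(rates):
    """Attended minus unattended rate, only for strata that contain both groups."""
    strata = {stratum for stratum, _ in rates}
    return {
        s: rates[(s, True)] - rates[(s, False)]
        for s in strata
        if (s, True) in rates and (s, False) in rates
    }


# Made-up numbers for illustration only; NOT the L1 data.
toy = [
    {"stratum": "Baghdad", "attended": True, "deaths": 4, "persons": 200},
    {"stratum": "elsewhere", "attended": True, "deaths": 2, "persons": 200},
    {"stratum": "elsewhere", "attended": False, "deaths": 6, "persons": 300},
]

rates = stratified_rates(toy)
diffs = within_stratum_differences(rates)
print(diffs)
```

In the toy data every Baghdad cluster is attended, so Baghdad drops out of the comparison entirely. That is exactly the difficulty: if Roberts really was present for all the Baghdad clusters, only the clusters outside Baghdad can tell us anything about his presence.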

5) Note that two of the strangest clusters were Falluja and, again, Thaura. Recall the stories about how these two (or at least Falluja) were saved for the end of the process because they were so dangerous. I have always been quite suspicious of results that come at the tail end of these surveys, as the famous cluster 33 does in L2. You can see that these were the last two because they have the latest "date finished" in the data.

6) Why don't the authors provide data from L1 at the same level of detail that they do for L2? Excellent question. I have pushed both Burnham and Roberts on this point. The official story is that this data is "no longer available." That is a direct quote from an e-mail to me. I have tried to find out what this means. Surely the data is available at least to them. They did not lose it? Or destroy it? Elizabeth Johnson, the statistical consultant, is a careful and serious scholar. I am virtually certain that she kept a back-up copy, along with documentation and computer code. Why won't Roberts share this with us? He may not have anything to hide but he sure acts like he does.