Here's what I know so far. My best projection was Obama 53.1% to McCain 45.9%. I got there with a straight-line projection from each candidate's polling average, which allocated more of the undecideds to Obama than to McCain. I guessed McCain would edge Obama among undecideds but did not take the time to adjust the model. Nonetheless, with 126M votes counted, the totals so far (Obama 52.7%, McCain 46.0%) are not far from my projection.
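Here is a minimal sketch of what a straight-line projection can look like. It assumes the projection scales both candidates' polling averages by the same factor until together they cover everyone except third-party voters. That reading, the function name `project_proportional`, and the input numbers are hypothetical illustrations, not the model's actual code or data. Because both averages get the same scale factor, the candidate who is already ahead automatically picks up more of the undecideds.

```python
def project_proportional(obama_avg: float, mccain_avg: float, other_share: float) -> tuple[float, float]:
    """Hypothetical straight-line projection: scale both poll averages by one
    common factor so that together they fill 100% minus the third-party share.
    Undecideds are therefore split in proportion to current support."""
    decided = obama_avg + mccain_avg
    scale = (100.0 - other_share) / decided
    return obama_avg * scale, mccain_avg * scale


# Hypothetical inputs: 50% Obama, 43% McCain, 1% other, so 6 points undecided.
obama, mccain = project_proportional(50.0, 43.0, 1.0)
print(f"Obama {obama:.1f}, McCain {mccain:.1f}")
print(f"Undecideds gained: Obama {obama - 50.0:.2f}, McCain {mccain - 43.0:.2f}")
```

Splitting undecideds evenly, or tilting them toward McCain as I suspected would happen, would mean replacing the single scale factor with an explicit allocation rule.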
One question of interest is how much of the error in my state-by-state predictions was tied to how much polling was done in each state. So I ran a simple regression of Absolute Error against the natural log of Polling Strength, which makes the fitted curve nonlinear in Polling Strength itself. Polling Strength is the sum of each poll's weight, and a poll's weight comes from its age and the reliability of its pollster.
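The post doesn't give the exact weighting formula, so the sketch below simply assumes one in order to show how Polling Strength adds up. Each poll's weight is taken to be its pollster's reliability rating times an exponential decay by age. The `Poll` class, `HALF_LIFE_DAYS`, and both functions are hypothetical illustrations, not the model's actual code.

```python
from dataclasses import dataclass


@dataclass
class Poll:
    age_days: float     # days between the poll and Election Day
    reliability: float  # hypothetical pollster rating in [0, 1]


HALF_LIFE_DAYS = 14.0  # hypothetical: a poll loses half its weight every two weeks


def poll_weight(poll: Poll) -> float:
    # Hypothetical weighting: pollster reliability times exponential decay by age.
    return poll.reliability * 0.5 ** (poll.age_days / HALF_LIFE_DAYS)


def polling_strength(polls: list[Poll]) -> float:
    # Polling Strength is the sum of the individual poll weights.
    return sum(poll_weight(p) for p in polls)


polls = [Poll(age_days=0, reliability=1.0),
         Poll(age_days=14, reliability=0.8),
         Poll(age_days=28, reliability=0.5)]
print(f"Polling Strength: {polling_strength(polls):.3f}")
```

Under a scheme like this, a state with a few fresh polls from well-rated pollsters can outscore a state with many stale ones, so Polling Strength is not just a poll count.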
The results fit pretty tightly. I'll try to keep this simple for the non-statistical folks.
Polling Strength (PS) ranged from 0.16 in DC, HI, and MD to 12.95 in MO, with a median of 2.0 and a mean of 3.16. (So Missouri is the rightmost blue dot.)
Absolute Error (AE) ranged from 0.2% in NH to 22.5% in HI, with a median of 2.25% and a mean of 3.8%. (So Hawaii is the highest blue dot.)
The regression equation is: AE = 4.65 - 1.66*ln(PS)
So MO's expected error is 0.4%, and DC's is 7.7%.
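Plugging numbers into the fitted equation is simple arithmetic. The sketch below just evaluates AE = 4.65 - 1.66*ln(PS) with the natural log at a few Polling Strength values from the post. The function name `expected_abs_error` is a hypothetical label.

```python
import math

INTERCEPT = 4.65  # fitted intercept from the regression above
SLOPE = -1.66     # fitted coefficient on ln(PS)


def expected_abs_error(ps: float) -> float:
    """Expected absolute error (percentage points) from AE = 4.65 - 1.66*ln(PS)."""
    return INTERCEPT + SLOPE * math.log(ps)


for label, ps in [("MO", 12.95), ("DC", 0.16), ("median state", 2.0)]:
    print(f"{label}: PS = {ps:.2f} -> expected AE {expected_abs_error(ps):.1f}%")
```

Two properties of the formula are worth keeping in mind. Each doubling of Polling Strength lowers the expected error by 1.66*ln(2), about 1.15 points. And the line reaches zero at PS = e^(4.65/1.66), roughly 16.5, beyond which it would predict a negative error, so it only makes sense inside the observed range of 0.16 to 12.95.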
The p-value on this result is 0.0005. For the non-stat crowd: statisticians never say they've proven anything. What they do say, however, is that if there were no real relationship between polling and error, a result this extreme would turn up only about 1 time in 2,000 by chance. The usual threshold for significance is 1 in 20 (p < 0.05), so this is a highly significant result. Finally, r^2 is 0.25, meaning that about one quarter of the state-to-state variation in my prediction error is accounted for by how much polling was done.
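For readers who want to reproduce this kind of fit, here is a minimal sketch in plain Python, assuming ordinary least squares on ln(PS). The function names `fit_log_model` and `one_in_n` are hypothetical, and the demo numbers are made up for illustration; they are not the actual state results. It shows where r^2 comes from (one minus the ratio of leftover squared error to total squared error) and how a p-value turns into "1 in N". Computing the p-value itself requires a t-distribution; `scipy.stats.linregress` reports one directly if SciPy is available.

```python
import math


def fit_log_model(ps: list[float], ae: list[float]) -> tuple[float, float, float]:
    """Ordinary least squares fit of ae = a + b*ln(ps).

    Returns (a, b, r_squared), where r_squared is the share of the
    variance in ae explained by the fitted line."""
    x = [math.log(v) for v in ps]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(ae) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, ae))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, ae))
    ss_tot = sum((yi - mean_y) ** 2 for yi in ae)
    return a, b, 1.0 - ss_res / ss_tot


def one_in_n(p_value: float) -> int:
    """Express a p-value as 'about 1 time in N'."""
    return round(1.0 / p_value)


# Made-up illustration data (NOT the real state results).
ps_demo = [0.2, 0.5, 1.0, 2.0, 4.0, 8.0]
ae_demo = [8.5, 5.0, 6.0, 2.5, 3.0, 1.0]
a, b, r2 = fit_log_model(ps_demo, ae_demo)
print(f"AE = {a:.2f} + ({b:.2f})*ln(PS), r^2 = {r2:.2f}")
print(f"p = 0.0005 is about 1 in {one_in_n(0.0005):,}")
print(f"p = 0.05 is about 1 in {one_in_n(0.05)}")
```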
*************************
Wow, I really made that more boring than I thought I could.
What is most striking to me is that graph, where I think the line fits pretty well. The takeaway is that when people say polls are useless, this is evidence to the contrary, provided you have a good set of polls and not just one or two. Hawaii, for example, had very little polling, and the last poll conducted there was way off the mark. My prediction error was 22.5% with that poll included and 4.4% with it excluded. In states with more polls, one bad poll would have been outweighed by the others, so this sort of problem could not have arisen to anything like the same degree.
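Why plenty of polls protects against one bad one can be seen with a toy calculation, assuming every poll gets equal weight. The margins and the function name `outlier_shift` are hypothetical and are not Hawaii's actual numbers. With equal weights, one outlier moves the average by (bad - good) / (number of polls), so its pull shrinks as more polls come in.

```python
def weighted_average(values: list[float], weights: list[float]) -> float:
    """Weighted mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def outlier_shift(n_good_polls: int, good_margin: float, bad_margin: float) -> float:
    """How far a single bad poll drags an equal-weight average away from
    the margin that all the other polls agree on."""
    values = [good_margin] * n_good_polls + [bad_margin]
    weights = [1.0] * len(values)
    return weighted_average(values, weights) - good_margin


# Hypothetical: the good polls show a 20-point margin, the bad poll shows 0.
for n in (0, 1, 4, 9):
    print(f"{n} good polls + 1 bad poll: average off by {outlier_shift(n, 20.0, 0.0):+.1f} points")
```

With age and pollster weights instead of equal ones, a very recent bad poll can carry extra weight, which is one way a thinly polled state can still end up far off.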
I'll have something actually interesting to share once more results come in.