February 04, 2009

Election 2008 Analysis

Ok, so I finally got around to analyzing my election predictions. Here's a table.

I correctly predicted the outcome of 50 of the 51 contests (the 50 states plus DC), missing Indiana.

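| State | EV | Poll | Proj | Win | Δp | Δa | Error |
|---|---|---|---|---|---|---|---|
| DC | 3 | 0.45 | | | 66.82 | 85.92 | 19.1 |
| VT | 3 | 1.17 | | | 28.13 | 37.01 | 8.9 |
| DE | 3 | 3.02 | | | 27.92 | 25.00 | -2.9 |
| NY | 31 | 4.38 | | | 27.74 | 26.69 | -1.1 |
| MD | 10 | 0.73 | | | 25.80 | 25.44 | -0.4 |
| HI | 4 | 0.66 | | | 22.64 | 45.26 | 22.6 |
| CA | 55 | 6.64 | | | 22.60 | 24.06 | 1.5 |
| IL | 21 | 4.14 | | | 22.15 | 25.13 | 3.0 |
| CT | 7 | 1.92 | | | 20.16 | 22.37 | 2.2 |
| RI | 4 | 0.76 | | | 19.83 | 27.92 | 8.1 |
| MA | 12 | 4.16 | | | 17.77 | 25.81 | 8.0 |
| OR | 7 | 9.62 | | | 15.74 | 16.35 | 0.6 |
| ME | 4 | 5.87 | | | 15.52 | 17.32 | 1.8 |
| WA | 11 | 7.03 | | | 15.36 | 17.18 | 1.8 |
| IA | 7 | 9.67 | | | 15.28 | 9.53 | -5.8 |
| NJ | 15 | 9.48 | | | 14.12 | 15.57 | 1.5 |
| MI | 17 | 9.43 | | | 13.89 | 16.47 | 2.6 |
| WI | 10 | 10.78 | | | 13.24 | 13.90 | 0.7 |
| MN | 10 | 13.62 | | | 9.91 | 10.24 | 0.3 |
| NH | 4 | 13.00 | | | 9.77 | 9.61 | -0.2 |
| NM | 5 | 7.14 | | | 9.49 | 15.13 | 5.6 |
| PA | 21 | 27.15 | | | 8.06 | 10.35 | 2.3 |
| CO | 9 | 14.57 | | | 6.42 | 8.95 | 2.5 |
| VA | 13 | 21.06 | | | 6.06 | 6.30 | 0.2 |
| NV | 5 | 13.12 | | | 5.79 | 12.49 | 6.7 |
| OH | 20 | 28.72 | | | 2.89 | 4.54 | 1.6 |
| FL | 27 | 29.05 | | | 2.03 | 2.82 | 0.8 |
| NC | 15 | 27.25 | | | 1.04 | 0.33 | -0.7 |
| MO | 11 | 24.40 | | | 0.23 | -0.13 | -0.4 |
| IN | 11 | 16.09 | | | -0.51 | 1.03 | 1.5 |
| ND | 3 | 2.67 | | | -2.67 | -8.63 | -6.0 |
| GA | 15 | 15.07 | | | -3.23 | -5.21 | -2.0 |
| AZ | 10 | 7.31 | | | -3.61 | -8.52 | -4.9 |
| MT | 3 | 5.28 | | | -4.29 | -2.26 | 2.0 |
| SC | 8 | 4.60 | | | -8.64 | -8.98 | -0.3 |
| SD | 3 | 2.88 | | | -9.13 | -8.41 | 0.7 |
| AR | 6 | 3.06 | | | -9.29 | -19.85 | -10.6 |
| MS | 6 | 3.53 | | | -9.43 | -13.17 | -3.7 |
| LA | 9 | 2.26 | | | -10.04 | -18.63 | -8.6 |
| WV | 5 | 6.24 | | | -10.60 | -13.12 | -2.5 |
| TX | 34 | 2.53 | | | -11.01 | -11.77 | -0.8 |
| TN | 11 | 2.31 | | | -12.60 | -15.07 | -2.5 |
| AK | 3 | 5.71 | | | -13.62 | -21.54 | -7.9 |
| KY | 8 | 8.52 | | | -14.04 | -16.23 | -2.2 |
| KS | 6 | 4.03 | | | -16.97 | -14.96 | 2.0 |
| NE | 5 | 0.78 | | | -19.05 | -14.93 | 4.1 |
| AL | 9 | 3.42 | | | -23.08 | -21.58 | 1.5 |
| ID | 4 | 0.71 | | | -23.49 | -25.43 | -1.9 |
| UT | 5 | 1.76 | | | -24.63 | -28.18 | -3.6 |
| WY | 3 | 3.38 | | | -26.34 | -32.24 | -5.9 |
| OK | 7 | 4.25 | | | -26.62 | -31.29 | -4.7 |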

Unless otherwise noted, I am omitting DC and HI from my analysis. These two were not only highly influential points, as indicated by extreme values of the Cook's D statistic, but also significant outliers based on their studentized residuals. Because of this and the dismal amount of polling in both, I have omitted them. (This, by the way, is how statisticians say "these data really screwed up my model so I took 'em out.")
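For anyone who wants to poke at those diagnostics, here is a minimal sketch. It assumes the model is a plain least-squares line of actual margin (Δa) on predicted margin (Δp) fit to all 51 rows of the table, and it uses internally studentized residuals; the names `influence` and `diag` are just for illustration and this is not necessarily the exact procedure I ran.

```python
import math

# state, predicted margin (Δp), actual margin (Δa), transcribed from the table
ROWS = """
DC 66.82 85.92  VT 28.13 37.01  DE 27.92 25.00  NY 27.74 26.69
MD 25.80 25.44  HI 22.64 45.26  CA 22.60 24.06  IL 22.15 25.13
CT 20.16 22.37  RI 19.83 27.92  MA 17.77 25.81  OR 15.74 16.35
ME 15.52 17.32  WA 15.36 17.18  IA 15.28 9.53   NJ 14.12 15.57
MI 13.89 16.47  WI 13.24 13.90  MN 9.91 10.24   NH 9.77 9.61
NM 9.49 15.13   PA 8.06 10.35   CO 6.42 8.95    VA 6.06 6.30
NV 5.79 12.49   OH 2.89 4.54    FL 2.03 2.82    NC 1.04 0.33
MO 0.23 -0.13   IN -0.51 1.03   ND -2.67 -8.63  GA -3.23 -5.21
AZ -3.61 -8.52  MT -4.29 -2.26  SC -8.64 -8.98  SD -9.13 -8.41
AR -9.29 -19.85 MS -9.43 -13.17 LA -10.04 -18.63 WV -10.60 -13.12
TX -11.01 -11.77 TN -12.60 -15.07 AK -13.62 -21.54 KY -14.04 -16.23
KS -16.97 -14.96 NE -19.05 -14.93 AL -23.08 -21.58 ID -23.49 -25.43
UT -24.63 -28.18 WY -26.34 -32.24 OK -26.62 -31.29
""".split()

states = ROWS[0::3]
x = [float(v) for v in ROWS[1::3]]  # predicted margins
y = [float(v) for v in ROWS[2::3]]  # actual margins


def influence(x, y):
    """Return (leverage, studentized residual, Cook's D) for y = a + b*x."""
    n, p = len(x), 2  # p = number of fitted coefficients
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance
    out = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - mx) ** 2 / sxx   # leverage (hat value)
        r = e / math.sqrt(s2 * (1 - h))    # internally studentized residual
        d = r * r * h / (p * (1 - h))      # Cook's distance
        out.append((h, r, d))
    return out


diag = dict(zip(states, influence(x, y)))
# Show the five most influential rows by Cook's D.
for s in sorted(diag, key=lambda k: diag[k][2], reverse=True)[:5]:
    h, r, d = diag[s]
    print(f"{s}: leverage={h:.3f} studentized={r:.2f} cooks_d={d:.3f}")
```

A common rule of thumb flags any point whose Cook's D exceeds 4/n; treat that cutoff as a convention rather than a test.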

On statewise margin of victory, I missed by 3.1 points on average (2.2 median). I was closest to correctly guessing the outcome in New Hampshire (which is just dumb luck; that state was crazy!). I predicted 54.5-44.7; the outcome was 54.1-44.5, a net error of 0.16 points. I was farthest from correctly guessing the outcome in Arkansas (damn PUMAs!): I predicted a 9.3 pt McCain victory; he won the state by 19.9 pts, an error of 10.56 points.
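The miss figures above are just absolute differences between the Δa and Δp columns, with DC and HI left out. A minimal sketch of that arithmetic, with variable names that are mine and purely illustrative:

```python
import statistics

# state, predicted margin (Δp), actual margin (Δa), transcribed from the table
ROWS = """
DC 66.82 85.92  VT 28.13 37.01  DE 27.92 25.00  NY 27.74 26.69
MD 25.80 25.44  HI 22.64 45.26  CA 22.60 24.06  IL 22.15 25.13
CT 20.16 22.37  RI 19.83 27.92  MA 17.77 25.81  OR 15.74 16.35
ME 15.52 17.32  WA 15.36 17.18  IA 15.28 9.53   NJ 14.12 15.57
MI 13.89 16.47  WI 13.24 13.90  MN 9.91 10.24   NH 9.77 9.61
NM 9.49 15.13   PA 8.06 10.35   CO 6.42 8.95    VA 6.06 6.30
NV 5.79 12.49   OH 2.89 4.54    FL 2.03 2.82    NC 1.04 0.33
MO 0.23 -0.13   IN -0.51 1.03   ND -2.67 -8.63  GA -3.23 -5.21
AZ -3.61 -8.52  MT -4.29 -2.26  SC -8.64 -8.98  SD -9.13 -8.41
AR -9.29 -19.85 MS -9.43 -13.17 LA -10.04 -18.63 WV -10.60 -13.12
TX -11.01 -11.77 TN -12.60 -15.07 AK -13.62 -21.54 KY -14.04 -16.23
KS -16.97 -14.96 NE -19.05 -14.93 AL -23.08 -21.58 ID -23.49 -25.43
UT -24.63 -28.18 WY -26.34 -32.24 OK -26.62 -31.29
""".split()

EXCLUDED = {"DC", "HI"}

# Absolute miss per state: |actual margin - predicted margin|.
abs_err = {
    ROWS[i]: abs(float(ROWS[i + 2]) - float(ROWS[i + 1]))
    for i in range(0, len(ROWS), 3)
    if ROWS[i] not in EXCLUDED
}

mean_miss = statistics.mean(list(abs_err.values()))
median_miss = statistics.median(list(abs_err.values()))
closest = min(abs_err, key=abs_err.get)
farthest = max(abs_err, key=abs_err.get)

print(f"mean {mean_miss:.1f}, median {median_miss:.1f}")
print(f"closest {closest} ({abs_err[closest]:.2f}), "
      f"farthest {farthest} ({abs_err[farthest]:.2f})")
```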

The regression model treated me nicely.

The first plot shows my prediction line (red) and where each state actually fell (blue).
The second plot shows my prediction (x-axis) with my prediction error (y-axis).
Highlighting +/- 4.00 was arbitrary.



My regression intercept has a p-value of 0.484, so there is no evidence that my average prediction differed significantly from the average actual outcome.

The model in total has a p-value of 0.000000000000000000000000000000000224, which besides being an awesomely small number, is the probability of getting a fit this strong by chance alone if my predictions had nothing to do with the outcomes.

On the down side, the regression model coefficient of 1.124 had a 95% CI of {1.057, 1.192}. Because 1.000 is not in this interval, this indicates that my model significantly underpredicted the size of the margin in each state (for either candidate; in other words, I gave Obama too much credit in red states and McCain too much credit in blue states).

Finally, my model scored an r-square value of 0.960. In other words, if each state's outcome was determined by the results of 100 coin flips, having my model in hand would be analogous to already knowing the outcome of 96 of the flips. Or to use another analogy, it would be like predicting the rankings of teams in baseball standings for a 162-game season while already knowing how the first 156 games turn out.
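The slope, its confidence interval, and the r-square above can be rechecked from the table with a few lines of arithmetic. This is a minimal sketch, assuming an ordinary least-squares fit of actual margin on predicted margin; the 95% critical value for 47 degrees of freedom is hard-coded as an assumption (about 2.0117) to keep the example dependency-free, and `fit` is a hypothetical helper name.

```python
import math

# state, predicted margin (Δp), actual margin (Δa), transcribed from the table
ROWS = """
DC 66.82 85.92  VT 28.13 37.01  DE 27.92 25.00  NY 27.74 26.69
MD 25.80 25.44  HI 22.64 45.26  CA 22.60 24.06  IL 22.15 25.13
CT 20.16 22.37  RI 19.83 27.92  MA 17.77 25.81  OR 15.74 16.35
ME 15.52 17.32  WA 15.36 17.18  IA 15.28 9.53   NJ 14.12 15.57
MI 13.89 16.47  WI 13.24 13.90  MN 9.91 10.24   NH 9.77 9.61
NM 9.49 15.13   PA 8.06 10.35   CO 6.42 8.95    VA 6.06 6.30
NV 5.79 12.49   OH 2.89 4.54    FL 2.03 2.82    NC 1.04 0.33
MO 0.23 -0.13   IN -0.51 1.03   ND -2.67 -8.63  GA -3.23 -5.21
AZ -3.61 -8.52  MT -4.29 -2.26  SC -8.64 -8.98  SD -9.13 -8.41
AR -9.29 -19.85 MS -9.43 -13.17 LA -10.04 -18.63 WV -10.60 -13.12
TX -11.01 -11.77 TN -12.60 -15.07 AK -13.62 -21.54 KY -14.04 -16.23
KS -16.97 -14.96 NE -19.05 -14.93 AL -23.08 -21.58 ID -23.49 -25.43
UT -24.63 -28.18 WY -26.34 -32.24 OK -26.62 -31.29
""".split()
POINTS = [(ROWS[i], float(ROWS[i + 1]), float(ROWS[i + 2]))
          for i in range(0, len(ROWS), 3)]

T_CRIT_47 = 2.0117  # assumed two-sided 95% t critical value, 47 d.f.


def fit(points, t_crit=T_CRIT_47):
    """Least-squares fit of actual = intercept + slope * predicted."""
    x = [pred for _, pred, _ in points]
    y = [act for _, _, act in points]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    sse = syy - slope * sxy                    # residual sum of squares
    se_slope = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return {
        "slope": slope,
        "intercept": my - slope * mx,
        "ci": (slope - t_crit * se_slope, slope + t_crit * se_slope),
        "r2": 1 - sse / syy,
    }


trimmed = fit([p for p in POINTS if p[0] not in {"DC", "HI"}])
full = fit(POINTS)  # only r2 is read here; the CI would need a 49-d.f. value
print({k: trimmed[k] for k in ("slope", "ci", "r2")})
print("r2 with DC and HI included:", round(full["r2"], 3))
```

Passing the full list instead of the trimmed one shows how much the two excluded points move the fit.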

And just for curiosity's sake, I was joking about being unjustified in removing HI and DC. While it's true I missed them both by large margins (especially HI), it's clear in the picture below why DC (off to the right) is overly influential in determining the line of best fit. (HI is the point at 23, 45).



Had I included the two states, r-squared would drop from 0.960 to 0.956, and the model's p-value would actually decrease (that's good) from 2E-34 to 8E-35.

So that's all I've got. I won't do anything for the Senate, but it looks as though I will be wrong about Coleman winning Minnesota (I can't believe that is still not settled. I don't feel bad about missing it, and to my credit I did predict that it would be the last race decided.)

Finally, I got the absolute percentages for Obama and McCain wrong - I said it would be 53.1-45.9; instead it was 52.9-45.7. But notice that I overshot both numbers (I just took the fact that Kerry and Bush added up to 99.0% and applied the same assumption to Obama and McCain). In terms of share of the 2-party vote, it broke down like this:

Prediction - 53.64 to 46.36
Actual - 53.68 to 46.32

Adding up the two errors, I missed the national two-party vote share by less than one-tenth of one percentage point (0.08 points). I'll take it.
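The two-party conversion is just each candidate's share divided by the sum of the two. A small sketch with a hypothetical helper, using the rounded prediction quoted above; the actual-result line in the post presumably came from unrounded totals, which aren't given here, so plugging in the rounded 52.9 and 45.7 lands slightly off from it.

```python
def two_party_share(dem, rep):
    """Rescale two vote percentages so they sum to 100."""
    total = dem + rep
    return 100 * dem / total, 100 * rep / total


pred_dem, pred_rep = two_party_share(53.1, 45.9)
print(f"Prediction - {pred_dem:.2f} to {pred_rep:.2f}")
```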
