Somebody Think A Little.: Stat Analysis


February 04, 2009

Election 2008 Analysis

Ok, so I finally got around to analyzing my election predictions. Here's a table.

I correctly predicted the outcome of 50 of the 51 contests (the 50 states plus DC), missing Indiana.

State	EV	Poll	Δp (predicted margin)	Δa (actual margin)	Error (Δa - Δp)
DC	3	0.45	66.82	85.92	19.1
VT	3	1.17	28.13	37.01	8.9
DE	3	3.02	27.92	25.00	-2.9
NY	31	4.38	27.74	26.69	-1.1
MD	10	0.73	25.80	25.44	-0.4
HI	4	0.66	22.64	45.26	22.6
CA	55	6.64	22.60	24.06	1.5
IL	21	4.14	22.15	25.13	3.0
CT	7	1.92	20.16	22.37	2.2
RI	4	0.76	19.83	27.92	8.1
MA	12	4.16	17.77	25.81	8.0
OR	7	9.62	15.74	16.35	0.6
ME	4	5.87	15.52	17.32	1.8
WA	11	7.03	15.36	17.18	1.8
IA	7	9.67	15.28	9.53	-5.8
NJ	15	9.48	14.12	15.57	1.5
MI	17	9.43	13.89	16.47	2.6
WI	10	10.78	13.24	13.90	0.7
MN	10	13.62	9.91	10.24	0.3
NH	4	13.00	9.77	9.61	-0.2
NM	5	7.14	9.49	15.13	5.6
PA	21	27.15	8.06	10.35	2.3
CO	9	14.57	6.42	8.95	2.5
VA	13	21.06	6.06	6.30	0.2
NV	5	13.12	5.79	12.49	6.7
OH	20	28.72	2.89	4.54	1.6
FL	27	29.05	2.03	2.82	0.8
NC	15	27.25	1.04	0.33	-0.7
MO	11	24.40	0.23	-0.13	-0.4
IN	11	16.09	-0.51	1.03	1.5
ND	3	2.67	-2.67	-8.63	-6.0
GA	15	15.07	-3.23	-5.21	-2.0
AZ	10	7.31	-3.61	-8.52	-4.9
MT	3	5.28	-4.29	-2.26	2.0
SC	8	4.60	-8.64	-8.98	-0.3
SD	3	2.88	-9.13	-8.41	0.7
AR	6	3.06	-9.29	-19.85	-10.6
MS	6	3.53	-9.43	-13.17	-3.7
LA	9	2.26	-10.04	-18.63	-8.6
WV	5	6.24	-10.60	-13.12	-2.5
TX	34	2.53	-11.01	-11.77	-0.8
TN	11	2.31	-12.60	-15.07	-2.5
AK	3	5.71	-13.62	-21.54	-7.9
KY	8	8.52	-14.04	-16.23	-2.2
KS	6	4.03	-16.97	-14.96	2.0
NE	5	0.78	-19.05	-14.93	4.1
AL	9	3.42	-23.08	-21.58	1.5
ID	4	0.71	-23.49	-25.43	-1.9
UT	5	1.76	-24.63	-28.18	-3.6
WY	3	3.38	-26.34	-32.24	-5.9
OK	7	4.25	-26.62	-31.29	-4.7

Unless otherwise noted, I am omitting DC and HI from my analysis. These were not only highly influential points, as indicated by extreme values of the Cook's D statistic, but also significant outliers based on their studentized residuals. Because of this and the dismal amount of polling in both states, I have omitted them. (This, by the way, is how statisticians say "these data really screwed up my model so I took 'em out.")

I missed the statewise margin of victory by 3.1 points on average (2.2 median). I was closest to correctly guessing the outcome in New Hampshire (which is just dumb luck; that state was crazy!). I predicted 54.5-44.7; the outcome was 54.1-44.5, a net error of 0.16 points. I was farthest from correctly guessing the outcome in Arkansas (damn PUMAs!) - I predicted a 9.3 pt McCain victory; he won the state by 19.9 pts, an error of 10.56 points.
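For anyone who wants to check the summary numbers, here is a minimal sketch that recomputes them from the rounded Error column of the table above, with DC and HI left out. The variable names are mine, and because the inputs are rounded to one decimal, the results only agree with the figures in the text to about that precision.

```python
import statistics

# Signed errors (actual margin minus predicted margin) copied from the
# Error column of the table, excluding DC and HI.
errors = {
    "VT": 8.9, "DE": -2.9, "NY": -1.1, "MD": -0.4, "CA": 1.5, "IL": 3.0,
    "CT": 2.2, "RI": 8.1, "MA": 8.0, "OR": 0.6, "ME": 1.8, "WA": 1.8,
    "IA": -5.8, "NJ": 1.5, "MI": 2.6, "WI": 0.7, "MN": 0.3, "NH": -0.2,
    "NM": 5.6, "PA": 2.3, "CO": 2.5, "VA": 0.2, "NV": 6.7, "OH": 1.6,
    "FL": 0.8, "NC": -0.7, "MO": -0.4, "IN": 1.5, "ND": -6.0, "GA": -2.0,
    "AZ": -4.9, "MT": 2.0, "SC": -0.3, "SD": 0.7, "AR": -10.6, "MS": -3.7,
    "LA": -8.6, "WV": -2.5, "TX": -0.8, "TN": -2.5, "AK": -7.9, "KY": -2.2,
    "KS": 2.0, "NE": 4.1, "AL": 1.5, "ID": -1.9, "UT": -3.6, "WY": -5.9,
    "OK": -4.7,
}

# Size of each miss, ignoring which candidate it favored.
abs_errors = [abs(e) for e in errors.values()]

mean_abs = statistics.mean(abs_errors)
median_abs = statistics.median(abs_errors)

# State with the largest miss.
worst = max(errors, key=lambda state: abs(errors[state]))

print(f"states: {len(errors)}")
print(f"mean absolute error: {mean_abs:.1f}")
print(f"median absolute error: {median_abs:.1f}")
print(f"largest miss: {worst} ({errors[worst]:+.1f})")
```

I left the "closest state" out of the sketch on purpose: at one decimal place New Hampshire and Virginia both round to 0.2, so the rounded column cannot separate them.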

The regression model treated me nicely.

The first plot shows my prediction line (red) and where each state actually fell (blue).
The second plot shows my prediction (x-axis) with my prediction error (y-axis).
Highlighting +/- 4.00 was arbitrary.

My regression intercept has a p-value of 0.484, so there is no statistically significant evidence that my predictions were biased toward either candidate on average.

The model in total has a p-value of 0.000000000000000000000000000000000224, which besides being an awesomely small number, is the probability of seeing a fit at least this strong by chance alone if my predictions had no real relationship to the outcomes.

On the down side, the regression model coefficient of 1.124 had a 95% CI of {1.057, 1.192}. Because 1.000 is not in this interval, this indicates that my model significantly underpredicted the size of the margin in each state (for either candidate; in other words, I gave Obama too much credit in red states and McCain too much credit in blue states, etc.).

Finally, my model scored an r-square value of 0.960, meaning my predictions account for 96% of the state-to-state variation in the actual margins. As a loose analogy, if each state's outcome were determined by the results of 100 coin flips, having my model in hand would be like already knowing the outcome of 96 of the flips. Or, to use another analogy, it would be like predicting the rankings of teams in baseball standings for a 162-game season while already knowing how the first 156 games turn out.
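Here is a rough sketch of how the slope, its confidence interval, and r-squared can be recomputed from the table. It assumes the regression is the actual margin regressed on the predicted margin with DC and HI dropped, which is my reading of the numbers rather than something the plots spell out. The critical value 2.012 (two-sided 95%, 47 degrees of freedom) is a table lookup, and all names in the code are my own.

```python
import math

# State, predicted margin, actual margin (Obama minus McCain, in points),
# copied from the table above with DC and HI removed.
DATA = """
VT 28.13 37.01
DE 27.92 25.00
NY 27.74 26.69
MD 25.80 25.44
CA 22.60 24.06
IL 22.15 25.13
CT 20.16 22.37
RI 19.83 27.92
MA 17.77 25.81
OR 15.74 16.35
ME 15.52 17.32
WA 15.36 17.18
IA 15.28 9.53
NJ 14.12 15.57
MI 13.89 16.47
WI 13.24 13.90
MN 9.91 10.24
NH 9.77 9.61
NM 9.49 15.13
PA 8.06 10.35
CO 6.42 8.95
VA 6.06 6.30
NV 5.79 12.49
OH 2.89 4.54
FL 2.03 2.82
NC 1.04 0.33
MO 0.23 -0.13
IN -0.51 1.03
ND -2.67 -8.63
GA -3.23 -5.21
AZ -3.61 -8.52
MT -4.29 -2.26
SC -8.64 -8.98
SD -9.13 -8.41
AR -9.29 -19.85
MS -9.43 -13.17
LA -10.04 -18.63
WV -10.60 -13.12
TX -11.01 -11.77
TN -12.60 -15.07
AK -13.62 -21.54
KY -14.04 -16.23
KS -16.97 -14.96
NE -19.05 -14.93
AL -23.08 -21.58
ID -23.49 -25.43
UT -24.63 -28.18
WY -26.34 -32.24
OK -26.62 -31.29
"""

rows = [line.split() for line in DATA.strip().splitlines()]
pred = [float(r[1]) for r in rows]
actual = [float(r[2]) for r in rows]
n = len(rows)

mean_x = sum(pred) / n
mean_y = sum(actual) / n
sxx = sum((x - mean_x) ** 2 for x in pred)
syy = sum((y - mean_y) ** 2 for y in actual)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pred, actual))

# Ordinary least squares: actual = intercept + slope * predicted.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
r2 = sxy ** 2 / (sxx * syy)

# Standard error of the slope from the residual variance.
resid_var = (syy - slope * sxy) / (n - 2)
se_slope = math.sqrt(resid_var / sxx)

T_CRIT = 2.012  # assumed table value: two-sided 95%, 47 degrees of freedom
ci = (slope - T_CRIT * se_slope, slope + T_CRIT * se_slope)

print(f"n = {n}, slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"r-squared = {r2:.3f}")
print(f"95% CI for slope: ({ci[0]:.3f}, {ci[1]:.3f})")
```

A slope above 1 is what "underpredicting the margin" looks like in this setup: every extra point of predicted margin came with a bit more than one point of actual margin.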

And just for curiosity's sake, I was joking about being unjustified in removing HI and DC. While it's true I missed them both by large margins (especially HI), it's clear in the picture below why DC (off to the right) is overly influential in determining the line of best fit. (HI is the point at 23, 45).

Had I included the two states, r-squared would have dropped from 0.960 to 0.956, while the model's p-value would actually have decreased (that's good, since a smaller p-value means stronger significance) from 2E-34 to 8E-35.

So that's all I've got. I won't do anything for the Senate, but it looks as though I will be wrong about Coleman winning Minnesota. (I can't believe that is still not settled. I don't feel bad about missing it, and to my credit I did predict that it would be the last race decided.)

Finally, I got the absolute percentages for Obama and McCain wrong - I said it would be 53.1-45.9; instead it was 52.9-45.7. But notice that I overshot both numbers (I just took the fact that Kerry and Bush added up to 99.0% and applied the same assumption to Obama and McCain). In terms of share of the 2-party vote, it broke down like this:

Prediction - 53.64 to 46.36
Actual - 53.68 to 46.32

Adding up the two errors, I missed the national vote share by less than one-tenth of one percentage point (0.08%). I'll take it.

February 02, 2009

Punxsutawney Phil: Groundhog Data

How reliable are Phil's predictions?

After a too-long internet search, the best historical weather data I was able to find (I'm happy to do something else if you can find better data) was monthly average temperatures for Portland, OR from 1941 on. Since Phil predicts "early spring" or "six more weeks of winter," I gave February a weight of 4 and March a weight of 2 (number of weeks of those six weeks) and took the average. Not the best method of course, but the best I have available. There were no predictions for 1941 and 1942 and no weather data for 2007 and 2008, so I had 63 data points.
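The weighting described above is a plain weighted average. A small sketch follows; the function name and the sample temperatures are made up for illustration and are not values from the Portland data.

```python
def six_week_temperature(feb_mean, mar_mean, feb_weight=4, mar_weight=2):
    """Weighted average of the February and March monthly mean temperatures.

    The default weights reflect roughly four of Phil's six weeks falling in
    February and two in March.
    """
    total_weight = feb_weight + mar_weight
    return (feb_weight * feb_mean + mar_weight * mar_mean) / total_weight


# Hypothetical year: February averaged 42.0 and March averaged 48.0.
example = six_week_temperature(42.0, 48.0)
print(example)
```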

Phil predicts more winter:
Number of times - 53
Mean temperature - 44.38
Variance - 4.94

Phil predicts early spring:
Number of times - 10
Mean temperature - 44.96
Variance - 4.90

Two-sample t-test:
T-statistic: -0.760 with 61 degrees of freedom
One-sided p-value: 0.225
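For the curious, here is a sketch of how the test can be rebuilt from the summary statistics alone. It assumes a pooled-variance (equal variance) two-sample t-test, which is what 61 degrees of freedom suggests. The p-value comes from numerically integrating the Student t density with Simpson's rule so that only the standard library is needed; that is my shortcut, not necessarily how the numbers above were produced. Because the means and variances are rounded, expect the results to land near -0.760 and 0.225 rather than match exactly.

```python
import math


def pooled_t(n1, mean1, var1, n2, mean2, var2):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry around zero."""
    upper = abs(x)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


# "More winter" years first, "early spring" years second.
t_stat, df = pooled_t(53, 44.38, 4.94, 10, 44.96, 4.90)

# One-sided: how often would "more winter" years look this much colder
# than "early spring" years if Phil's call carried no information?
p_one_sided = t_cdf(t_stat, df)

print(f"t = {t_stat:.3f} with {df} degrees of freedom")
print(f"one-sided p-value = {p_one_sided:.3f}")
```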

***************************************************

Translation: We say these results are so extreme they would "only" happen about one in four times by chance, which isn't sufficient evidence to claim that there's any relationship between Groundhog Day predictions and actual outcomes.

Sorry to eat away at your childhood :)

November 25, 2008

Quick Back-Pat Update

Source	Obama	McCain	Dem Error	Rep Error	Total Error
Election 2008	52.8	45.9	---	---	---
RCP	52.1	44.5	0.7	1.4	2.1
538	52.3	46.2	0.5	0.3	0.8
Chris	53.1	45.9	0.3	0.0	0.3
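The error columns are just absolute differences from the actual result, and the total is their sum. A quick sketch that rebuilds them from the first two columns (variable names are mine):

```python
# Actual result and each source's final forecast, as (Obama, McCain).
actual = (52.8, 45.9)
forecasts = {
    "RCP": (52.1, 44.5),
    "538": (52.3, 46.2),
    "Chris": (53.1, 45.9),
}


def forecast_errors(forecast, actual):
    """Absolute Dem error, Rep error, and their sum, rounded to one decimal."""
    dem = abs(forecast[0] - actual[0])
    rep = abs(forecast[1] - actual[1])
    return round(dem, 1), round(rep, 1), round(dem + rep, 1)


results = {name: forecast_errors(f, actual) for name, f in forecasts.items()}

for name, (dem, rep, total) in results.items():
    print(f"{name}\t{dem}\t{rep}\t{total}")
```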

November 21, 2008

Competitiveness and Blueness

OK, so I still haven't read the report that Warren posted about state competitiveness. Nonetheless, I opened it, saw the data, and instantly knew what my question was.

I've been saying that this whole idea about big shifts in the red-blue map is generally overblown, but when it comes to economic success, there may be some real change (like Colorado and Virginia going blue, and Missouri going red).

So I did a very simple regression of each state's blue shift (Dem margin of victory in '08 minus that in '04) on its Overall Index value from the report. (So losing a state by 20 in '04 and by 16 in '08 comes out to be the same as losing a state by 1 in '04 and winning it by 3 in '08, etc.) I first did all the states; then I did it again omitting AK, AR, DE, HI, IL, MA, and TX to eliminate home-state effects (I believe Hillary's one-time candidacy did significantly hurt Obama in Arkansas).

The results are astounding! r^2 of 60%! P-value of 0.000038. That is strong evidence that more competitive states saw bigger blue shifts in 2008. A score of 0 in the report corresponds to a red shift of 6.74 points. Then, for each add'l point in the index, the state shifts blue by a whopping 3.36 points. So all else equal, if Kerry lost a state by 10 points in 2004 but it has a competitiveness index of 5.00 (right at the median), then Barack Obama would have been ever-so-slightly favored to win the state in 2008, since -6.74 + 5 x 3.36 = 10.06 points of blue shift.
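To make the arithmetic concrete, here is a tiny sketch of the fitted line using the two coefficients quoted above. The function names are mine, and I am assuming both coefficients come from the same one of the two fits.

```python
INTERCEPT = -6.74  # predicted shift (points) for an index score of 0
SLOPE = 3.36       # extra blue shift (points) per index point


def predicted_shift(index):
    """Predicted change in Dem margin from 2004 to 2008, in points."""
    return INTERCEPT + SLOPE * index


def predicted_2008_margin(margin_2004, index):
    """2004 Dem margin plus the shift the line predicts for this index."""
    return margin_2004 + predicted_shift(index)


# Kerry lost by 10 in a state sitting at the median index of 5.00.
shift = predicted_shift(5.00)
margin = predicted_2008_margin(-10.0, 5.00)
print(f"predicted shift: {shift:+.2f}, predicted 2008 margin: {margin:+.2f}")
```

The break-even index for a state Kerry lost by 10 is (10 + 6.74) / 3.36, or just under 5, which is why the median state comes out ever so slightly blue.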

I'll do more with this in the coming weeks.

Oh, and yet another complaint about the electoral college - when the EVs are redistributed following the 2010 census, it's likely that only blue states will lose EVs and only red states will gain them. The same thing happened after the 2000 census - John Kerry played on a tougher map than Al Gore by about 5 EVs.
