Statistics for the win.
I won't go into too much explanation since Nate does, but check this out. Benford's Law says that, with remarkably few assumptions, an extremely wide range of data sets all exhibit similar patterns in their values' first digits.
So whether you're talking about the number of people living in each of America's counties or the distance between Earth and each of the Universe's objects, if you took your entire list and only kept the first digits, the proportion of 1s would be about the same on both lists, and the same for 2s, etc. (This sort of makes sense - there are probably more counties with 10-19 people than 90-99, more with 100-199 than 900-999, more with 100,000-199,999 than 900,000-999,999, etc.)
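For reference (my addition, not Nate's), the proportions Benford's Law actually predicts follow log10(1 + 1/d) for each leading digit d, and a few lines of Python will print them out:

```python
import math

# Benford's Law: the chance that a value's first significant digit is d
# is log10(1 + 1/d), regardless of the data's units or scale.
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
for d, p in benford.items():
    print(f"first digit {d}: {p:.1%}")
# digit 1 comes out to about 30.1%, while digit 9 is only about 4.6%
```

Note that the nine proportions sum to exactly 1, since every number has some leading digit.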
Long story short, Nate's analysis appears to show that the per-county(?) vote totals of one of Ahmadinejad's opponents behaved extremely well according to Benford's Law - except there were far too many 7s ...
June 18, 2009
May 19, 2009
(Twin) brother by another father.
I am not even going to bother moving beyond fact in this case: a woman had sex with her boyfriend, and then cheated on him soon after, leading to the one-in-a-million (seriously, 1/1,000,000) outcome of a set of twins with different fathers. The boyfriend has been told of the infidelity and has agreed to raise both boys as his own, despite being biological father to only one. Of course, since modern news stories only ever seem to hamper progress toward social equality, the family is black, the parents unmarried, and the lot of them live in good ol' Texas. America, fuck yeah!
December 24, 2008
Teams of the Decade
Continuing an earlier post on NFL history, I wanted to comment on the teams of this decade.
First the best:
It's actually unbelievably close. The New England Patriots thus far have the best record of this decade, at 101-42. However, the Indianapolis Colts are close behind at 100-43. A distant third is the Pittsburgh Steelers, with a record of 93-49-1. Philadelphia is also over 60% for the decade.
The race for the bottom is not nearly as close. At 40-103, the Detroit Lions need a perfect season next year to have even the slightest hope of catching the Houston Texans (39-72) at the bottom. Put it another way: Houston didn't enter the league until 2002, so their records indicate that Detroit could have lost every game in 2000 and 2001 and then essentially tied the Texans thereafter. Arizona, Cleveland, and Oakland are also below 40% for the decade.
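To make the comparison concrete, here's a quick sketch (mine, using only the records quoted above) that turns each record into a win percentage, counting a tie as half a win:

```python
# Win percentages implied by the records above (wins, losses, ties);
# a tie counts as half a win out of a full game played.
records = {
    "New England": (101, 42, 0),
    "Indianapolis": (100, 43, 0),
    "Pittsburgh": (93, 49, 1),
    "Detroit": (40, 103, 0),
    "Houston": (39, 72, 0),
}
pcts = {team: (w + 0.5 * t) / (w + l + t) for team, (w, l, t) in records.items()}
for team, p in sorted(pcts.items(), key=lambda kv: -kv[1]):
    print(f"{team}: {p:.3f}")
```

New England comes out at .706 to Indianapolis's .699, and Detroit's .280 against Houston's .351 shows why the "perfect season" remark holds: even a 16-0 year would lift Detroit to 56-103, about .352, barely past where Houston sits now.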
To punt or not to punt?
When I read that some whacko high school football coach in Arkansas decided to stop punting for the year, I was taken aback a little.
As I read down the article, however, I came to find he's got a mind for probability and a Class 5A State Championship.
November 12, 2008
Some Preliminary Analysis
Unfortunately, there is still not enough data to do the analysis I'd like to do - I mean, we've still got one state and two Senate races that are too close to call (Georgia's Senate race is not over because of the upcoming runoff - the first election, however, was undoubtedly won by Chambliss).
Here's what I know so far. My best projection was Obama 53.1 to McCain 45.9. This was done by a straight-line projection from their polling average, which means that I allocated more of the undecideds to Obama than McCain; I guessed McCain would edge Obama on undecideds but did not take the time to adjust the model. Nonetheless, with 126M votes recorded, the totals thus far of Obama 52.7 McCain 46.0 are not very far from my predictions.
One question of interest is how much of the error in my predictions was tied to how much polling was done in a state. So I did a simple (but nonlinear) regression of Absolute Error against Polling Strength (the sum of each poll's weight, which comes from its age and the reliability of its pollster).

The results fit pretty tightly. I'll try to keep this simple for the non-statistical folks.
Polling Strength (PS) ranged from 0.16 in DC, HI, and MD to 12.95 in MO, with a median of 2.0 and a mean of 3.16. (So Missouri is the farthest blue dot to the right.)
Absolute Error (AE) ranged from 0.2% in NH to 22.5% in HI, with a median of 2.25% and a mean of 3.8%. (So Hawaii is the highest blue dot from the bottom.)
The regression equation is: AE = 4.65 - 1.66*ln(PS)
So MO's expected error is 0.4% and DC's is 7.7%.
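As a sanity check (my sketch, using nothing but the fitted coefficients above), plugging the two PS extremes back into the equation reproduces those numbers:

```python
import math

# Fitted regression from the post: AE = 4.65 - 1.66 * ln(PS),
# where PS is Polling Strength and AE is Absolute Error in percentage points.
def expected_error(ps):
    return 4.65 - 1.66 * math.log(ps)

print(f"MO (PS = 12.95): {expected_error(12.95):.1f}%")  # -> 0.4%
print(f"DC (PS =  0.16): {expected_error(0.16):.1f}%")   # -> 7.7%
```

Since PS appears inside a natural log, each doubling of polling strength shaves the same fixed amount (about 1.15 points) off the expected error.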
The p-value on this result is 0.0005. For the non-stat crowd: statisticians never say they've proven anything. What they do say, however, is that a result this extreme would only arise by chance 1 time in 2,000. 1 in 20 is generally the threshold for significance, so this is a highly significant result. Finally, r^2 is 0.25, meaning one quarter of the variability in the model's error is attributable solely to how much polling was done.
*************************
Wow, I really made that more boring than I thought I could.
What is most striking to me is that graph, where I think the line fits pretty well. The takeaway: when people say polls are useless, this is evidence that that's not true, provided you have a good set of polls and not just one or two. Hawaii, for example, had very little polling, and the last poll conducted there was way off the mark. My prediction error was 22.5% including it and 4.4% excluding it. In states with more polling, this sort of problem could not have arisen.
I'll have data that's actually interesting once more data comes in.