I’ve made my first stab at predicting the electoral college outcome of the 2016 US Presidential race. I use a methodology roughly similar to the one I used to accurately predict most of the Democratic primaries. However, since primaries are different from a general election, the methodology had to be adapted.
For the primaries, I eventually used this methodology: I used results from prior primaries to estimate voter behavior by ethnicity, and then used those estimates to predict the final results. That worked because primaries are held a few states at a time, and because everyone being modeled was a Democrat.
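A minimal sketch of that primary-season idea, assuming a state's predicted result is simply the demographic-weighted average of per-group support rates estimated from earlier primaries. The function name, group names, and rates below are hypothetical placeholders, not figures from the actual model.

```python
from typing import Dict


def predict_share(demographics: Dict[str, float], support: Dict[str, float]) -> float:
    """Predict a candidate's vote share in one state (hypothetical helper).

    demographics: each group's share of the state's electorate.
    support: each group's support rate for the candidate, assumed to be
             estimated from earlier primaries.
    """
    total = sum(demographics.values())
    if total <= 0:
        raise ValueError("demographic shares must sum to a positive number")
    # Weighted average of group support rates; dividing by `total` rescales
    # the modeled groups so their shares sum to 1.
    return sum(share * support[group] for group, share in demographics.items()) / total


# Usage with made-up numbers: a 60% white, 30% Black, 10% Hispanic electorate.
example = predict_share(
    {"white": 0.6, "black": 0.3, "hispanic": 0.1},
    {"white": 0.4, "black": 0.8, "hispanic": 0.6},
)
```

With these invented inputs the prediction comes out at about 0.54. The approach only works when the per-group rates transfer well from state to state, which is exactly what breaks down for white voters in a general election.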
It turns out that white voters vary a lot across the country in how many per state are assholes. I think there is some variation among Hispanic voters as well, but African American voters are pretty consistent. So, here, I combined ethnicity with a “Romney Index” indicating what share of voters in a given state voted for Romney over Obama in 2012.
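One way to compute such an index, assuming it is Romney’s share of each state’s two-party 2012 vote (third-party votes ignored). The function name and the use of raw vote counts are illustrative choices; this is a sketch, not necessarily the exact definition behind the model.

```python
def romney_index(romney_votes: int, obama_votes: int) -> float:
    """Romney's share of a state's 2012 two-party vote, from 0.0 to 1.0.

    Assumption: third-party and write-in votes are excluded from the
    denominator, so the index measures Romney against Obama only.
    """
    two_party_total = romney_votes + obama_votes
    if two_party_total <= 0:
        raise ValueError("need a positive two-party vote total")
    return romney_votes / two_party_total
```

Feeding in each state’s certified 2012 totals gives a number between 0 and 1 that can sit alongside the ethnicity percentages as a predictor.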
I then entered the poll numbers (the averages of the last several polls, from RCP) where available, and ranked the results, which knocked out states with no polls. I then took out the middle of the ranking, which included swing states, close states, and the like, leaving only the 23 most distinct states for which there were data. From those, I produced a multivariable regression model using “white”, “black”, “hispanic”, and “romney_index” as independent variables. The dependent variable was the poll value. In future iterations, that is what will change, as I’ll use a more refined version of it.
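The selection and fitting steps can be sketched in Python with NumPy. This assumes the poll value is Trump’s percentage in each state, that “most distinct” means farthest from a 50% tie, and that the regression includes an intercept; the function and variable names are hypothetical.

```python
from typing import Dict, List, Optional

import numpy as np

FEATURES = ["white", "black", "hispanic", "romney_index"]


def select_extreme_states(polls: Dict[str, Optional[float]], n_keep: int = 23) -> List[str]:
    """Drop unpolled states, then keep the n_keep states farthest from 50%."""
    polled = {state: value for state, value in polls.items() if value is not None}
    # Rank by distance from 50% so the close "middle" states fall to the bottom.
    ranked = sorted(polled, key=lambda state: abs(polled[state] - 50.0), reverse=True)
    return ranked[:n_keep]


def fit_ols(
    states: List[str],
    features: Dict[str, Dict[str, float]],
    polls: Dict[str, Optional[float]],
) -> np.ndarray:
    """Ordinary least squares fit of poll value on FEATURES plus an intercept.

    Returns [intercept, b_white, b_black, b_hispanic, b_romney_index].
    """
    # Design matrix: a leading column of ones for the intercept.
    X = np.array([[1.0] + [features[s][f] for f in FEATURES] for s in states])
    y = np.array([polls[s] for s in states], dtype=float)
    coefs, _residuals, _rank, _singular_values = np.linalg.lstsq(X, y, rcond=None)
    return coefs
```

With 23 states and five coefficients, the fit has 18 residual degrees of freedom: thin, but enough for a fill-in-the-blank formula.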
I then applied this formula to predict the breakdown between Clinton and Trump in the remaining, more ambiguous states, roughly half of the total.
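Applying the fitted formula to an ambiguous state is then just the intercept plus a weighted sum of that state’s features. This sketch assumes the coefficient order [intercept, white, black, hispanic, romney_index]; the function names are hypothetical.

```python
from typing import Dict, Sequence

FEATURE_ORDER = ("white", "black", "hispanic", "romney_index")


def predict_poll(coefs: Sequence[float], state_features: Dict[str, float]) -> float:
    """Predicted poll value: intercept + sum of slope * feature for one state."""
    intercept, *slopes = coefs
    if len(slopes) != len(FEATURE_ORDER):
        raise ValueError("expected one slope per feature plus an intercept")
    return intercept + sum(b * state_features[f] for b, f in zip(slopes, FEATURE_ORDER))


def predict_ambiguous(
    coefs: Sequence[float],
    features: Dict[str, Dict[str, float]],
    ambiguous: Sequence[str],
) -> Dict[str, float]:
    """Fill in a predicted poll value for every state left out of the fit."""
    return {state: predict_poll(coefs, features[state]) for state in ambiguous}
```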
The multiple R-squared for this model was 0.952, so that’s great. But because I used only the values at the extremes, I violated the homoscedasticity assumption. I don’t care about no stinking homoscedasticity, though, because I have only one data set, am predicting only one election, and am basically using the regression model as a fancy fill-in-the-blank formula. The fact that the R-squared is so high is good news; were it low, I’d be in trouble. But its actual value is not important.
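For reference, R-squared is one minus the ratio of the residual sum of squares to the total sum of squares; when the predictions are fitted values from an OLS model with an intercept, this is the multiple R-squared. Fitting only the extreme states widens the spread of the poll values, which enlarges the total sum of squares and tends to push R-squared up, one more reason not to read much into its exact value. A minimal NumPy sketch (function name hypothetical):

```python
import numpy as np


def r_squared(y, y_hat) -> float:
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)     # variation the model leaves unexplained
    ss_tot = np.sum((y - y.mean()) ** 2)  # total variation around the mean
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y has no variance")
    return float(1.0 - ss_res / ss_tot)
```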
I then took all the states where Trump gets over 50% of the vote and gave them to him. I gave almost all the other states to Clinton, but left a few that were very close as unknown. Even if all those unknowns go to Trump, however, the outcome is the same: Clinton wins. Trump loses.
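The allocation rule can be written down explicitly. This sketch assumes the predicted value is Trump’s share of the two-candidate vote in percent, treats every state as winner-take-all (Maine and Nebraska actually split electoral votes by congressional district), and uses a hypothetical `toss_up_margin` parameter to define “very close”.

```python
from typing import Dict, Tuple

VOTES_TO_WIN = 270  # majority of the 538 electoral votes


def allocate(
    predicted_trump: Dict[str, float],
    electoral_votes: Dict[str, int],
    toss_up_margin: float = 1.0,
) -> Tuple[int, int, int]:
    """Return (trump_ev, clinton_ev, unknown_ev).

    States where Trump is predicted over 50% go to Trump; states within
    toss_up_margin points below 50% are left unknown; the rest go to Clinton.
    """
    trump = clinton = unknown = 0
    for state, share in predicted_trump.items():
        ev = electoral_votes[state]
        if share > 50.0:
            trump += ev
        elif 50.0 - share <= toss_up_margin:
            unknown += ev
        else:
            clinton += ev
    return trump, clinton, unknown


def clinton_wins_even_if_unknowns_go_to_trump(clinton_ev: int) -> bool:
    """True when Clinton's non-close states alone reach 270 electoral votes."""
    return clinton_ev >= VOTES_TO_WIN
```

Checking `clinton_ev` alone against 270 is the “even if all the unknowns go to Trump” test: the toss-ups cannot change the outcome once Clinton’s secure states clear the threshold.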
I’ll refine and revise this again, with more care given to the various parts of the model. I’d love to do this poll-free, but I’m not sure that is possible.
The final output data are spewed onto a 270toWin map.
from ScienceBlogs http://ift.tt/2e1VGMd