# Bio-index

The bio-index model, developed by PollyVote team members Scott Armstrong and Andreas Graefe, predicts U.S. presidential election winners based on information about candidates’ biographies. The model uses 58 biographical cues that are expected to influence a candidate’s chances of being elected. The bio-index is useful to political decision-makers because it can inform questions such as whether a candidate should run for office or which candidate a party should nominate.

## Variables

A large stream of research, particularly in psychology, analyzes what makes people emerge as leaders. For example, meta-analyses found intelligence and height to have a positive impact on both leader performance and leader emergence. Such findings from prior research were used to identify and code the majority of variables in the bio-index. In addition, some variables are based on common sense. For example, it was assumed that voters are more attracted to candidates who are married but not divorced. As shown in Table 1 at the end of this page, the model distinguishes two types of variables:

1. Yes / No variables (n=47): For this type of variable, candidates are assigned a score of 1 if they possess a certain attribute and 0 otherwise.
2. Comparative variables (n=11): For this type of variable, the candidates of the two major parties are compared on the underlying attribute. The candidate who scores better than the opponent is assigned a score of 1; the other candidate is assigned 0.
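The two coding rules above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the handling of ties and missing data (both coded as 0 here) is an assumption rather than something the model description specifies.

```python
from typing import Optional


def code_yes_no(has_attribute: bool) -> int:
    """Yes/No variable: 1 if the candidate possesses the attribute, else 0."""
    return 1 if has_attribute else 0


def code_comparative(own: Optional[float], opponent: Optional[float],
                     higher_is_better: bool = True) -> int:
    """Comparative variable: 1 for the candidate who scores better, else 0.

    Assumption: missing data and ties are coded as 0 for both candidates.
    """
    if own is None or opponent is None or own == opponent:
        return 0
    better = own > opponent if higher_is_better else own < opponent
    return 1 if better else 0


# Hypothetical values, e.g. height in centimetres for two candidates.
taller_a = code_comparative(188, 180)
taller_b = code_comparative(180, 188)
```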

## Forecast calculation

The model is based on the index method, which is useful in situations with many variables and good prior knowledge about the variables’ directional effect on the target criterion.

### Predicting the election winner

After all variables have been coded, the total index scores for each candidate are calculated by summing up the scores using equal weights. Then, the candidate who achieves the higher overall score is predicted as the election winner.
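As a minimal sketch of this step, the equal-weights sum and the winner rule might look as follows. The candidates and variable names are made up for illustration and are not taken from Table 1; returning `None` on a tie is likewise an assumption.

```python
from typing import Dict, Optional


def bio_index_score(coded_variables: Dict[str, int]) -> int:
    """Sum the 0/1 variable scores with equal weights."""
    return sum(coded_variables.values())


def predict_winner(scores: Dict[str, int]) -> Optional[str]:
    """Return the candidate with the higher total score, or None on a tie."""
    (name_a, score_a), (name_b, score_b) = scores.items()
    if score_a == score_b:
        return None
    return name_a if score_a > score_b else name_b


# Hypothetical coding for two made-up candidates.
candidate_a = {"married": 1, "military_service": 0, "taller": 1}
candidate_b = {"married": 0, "military_service": 1, "taller": 0}

scores = {"A": bio_index_score(candidate_a), "B": bio_index_score(candidate_b)}
winner = predict_winner(scores)
```

Because every variable carries the same weight, adding or dropping a cue changes a candidate's score by at most one point.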

### Predicting vote-shares

Simple linear regression is then used to relate the incumbent party candidate’s relative bio-index score (`bio`) to the dependent variable, which is the actual two-party popular vote share received by the candidate of the incumbent party (V). Using data from the 30 elections from 1896 to 2012 leads to the following vote equation:

V = 19.9 + 60.5 * `bio`,

where `bio` = `bio-score (incumbent)` / [`bio-score (incumbent)` + `bio-score (challenger)`]

That is, the bio-index model predicts that an incumbent would start with 19.9% of the vote, plus a share depending on `bio`. If the incumbent’s relative `bio` score went up by 10 percentage points, the incumbent’s vote share would go up by about 6 percentage points.
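The vote equation above can be written as a small Python function. The coefficients are the rounded values quoted in the text; the function and constant names are hypothetical.

```python
INTERCEPT = 19.9  # rounded intercept from the vote equation
SLOPE = 60.5      # rounded coefficient on the relative bio-index score


def relative_bio(incumbent_score: float, challenger_score: float) -> float:
    """Incumbent-party candidate's share of the combined bio-index score."""
    total = incumbent_score + challenger_score
    if total == 0:
        raise ValueError("at least one candidate needs a non-zero score")
    return incumbent_score / total


def vote_share_from_bio(bio: float) -> float:
    """Predicted two-party vote share (in percent) for the incumbent party."""
    return INTERCEPT + SLOPE * bio


def predicted_vote_share(incumbent_score: float, challenger_score: float) -> float:
    return vote_share_from_bio(relative_bio(incumbent_score, challenger_score))


# Effect of raising the relative score by 10 percentage points (0.5 -> 0.6).
effect = vote_share_from_bio(0.6) - vote_share_from_bio(0.5)
```

With equal scores, `bio` is 0.5 and the predicted share is close to an even split, so the equation is roughly centred on a tied race.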

## 2016 forecast

Table 1 at the end of this page shows the coding for Hillary Clinton and Donald Trump. For some comparative variables, such as intelligence, data are not available. Other comparative variables, such as weight or height, are not coded because the Clinton-Trump race is an inter-gender comparison.

Based on the coding, Clinton achieves a bio-index score of 19 points (compared to 11 points for Trump) and is thus predicted to win the election. Inserting these figures into the vote equation derived above leads to:

V = 19.9 + 60.5 * ( 19 / (19 + 11)) = 58.3

That is, the bio-index model predicts Clinton to achieve 58.3% of the vote, compared to 41.7% for Trump.
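Since the model works with the two-party vote, the opponent's share and the predicted lead follow directly from the forecast share. A minimal sketch, using a hypothetical helper name and the 58.3% figure stated above:

```python
from typing import Tuple


def two_party_split(share: float) -> Tuple[float, float]:
    """Return (opponent's share, lead in percentage points) for a two-party share."""
    opponent = round(100.0 - share, 1)
    lead = round(share - opponent, 1)
    return opponent, lead


# Forecast two-party share for Clinton as stated in the text.
opponent_share, lead = two_party_split(58.3)
```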

You can also compute your own bio-index model forecasts. This feature allows you to:

1. See how the model forecast would change for different variable coding.
2. Compare two hypothetical candidates against each other.
3. See how you would perform as a candidate.

To do so, scroll all the way down to Table 1 and adjust the variable values in the green highlighted cells. Whenever you change a variable value, the forecast will be updated at the bottom of the table.

### Forecasts for other match-ups

The following chart shows the bio-index model’s forecasts for Clinton vs. other candidates. Trump stands out as one of the worst possible candidates (only Boehner scores lower than him). In comparison, Clinton performs well against the vast majority of potential opponents. In fact, there are only two candidates among those we coded that the model would predict to defeat Clinton: Jeb Bush and Lindsey Graham, both of whom have already dropped out of the race.

## Past performance

The following chart shows the model’s predicted and actual percentage point lead in the two-party vote for the winners of the 30 elections from 1896 to 2012. The vote-share predictions are calculated in-sample. If both the grey and the orange bars are on the right-hand side of zero, the model correctly predicted the final election winner.

As can be seen, the model’s forecasts failed only two times: it wrongly predicted Ford to beat Carter in 1976 as well as Bush to defeat Clinton in 1992. For the remaining 28 elections, the model correctly predicted the winner. This record of 93% correct predictions compares favorably to other statistical models as well as to polls and prediction markets. The forecast for the 2012 election, published in August 2011, correctly predicted Obama to defeat Romney.
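The 93% figure can be checked with a short calculation. The sketch below assumes one election every four years from 1896 to 2012 and uses the two misses named above; the variable names are illustrative.

```python
# Presidential election years covered by the model: 1896, 1900, ..., 2012.
elections = list(range(1896, 2013, 4))

# Elections in which the model picked the wrong winner, as stated in the text.
misses = {1976, 1992}

hits = [year for year in elections if year not in misses]
hit_rate = 100 * len(hits) / len(elections)  # percentage of correct calls
```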

The chart also puts the model’s 2016 forecast in historical perspective. The predicted 16.6-point lead for Clinton over Trump is the third-largest margin ever (tied with the 1996 race between Clinton and Dole). The two elections in which the method predicted an even larger lead for a candidate were 1904 (Theodore Roosevelt vs. Alton Parker) and 1940 (Franklin D. Roosevelt vs. Wendell Willkie).

## Limitations

Like any model, the bio-index is subject to limitations.

1. The bio-index model ignores many factors that are also important for predicting election outcomes. Examples include information about the state of the economy, the time the incumbent party has held the White House, the perceived ability to handle issues, or the effectiveness of the advertising campaigns.
2. The wrong forecasts in 1976 (predicted Ford) and 1992 (predicted Bush) indicate a certain bias towards experience. This is unsurprising, given that the model is based on the assumption that prior experience is a predictor of leader emergence.
3. The 2016 election will most likely be the first male-female race for president in U.S. electoral history. In this situation, some variables (e.g., height, weight) are not coded since their predictive validity is unclear.

That said, the model’s major aim is not to produce the most accurate forecasts possible. If this is what you are looking for, check out the combined PollyVote forecast. Instead, the major goal of the bio-index is to provide decision-making implications by advising parties on whom they should nominate. Given the predicted 16.6-point lead for Clinton in a hypothetical race against Trump, the model’s implications are clear: Donald Trump would be the worst possible choice of the remaining candidates.