# Bio-index

The bio-index model, developed by PollyVote team members Scott Armstrong and Andreas Graefe, predicts U.S. presidential election winners based on information about candidates’ biographies. The model uses 58 biographical cues that are expected to influence a candidate’s chances of being elected. The bio-index is useful to political decision-makers, as it can provide advice on questions such as whether a candidate should run for office or which candidate a party should nominate.

## Variables

A large stream of research, particularly in psychology, examines what makes people emerge as leaders. For example, meta-analyses found intelligence and height to have a positive impact on both leader performance and leader emergence. Such findings from prior research were used to identify and code the majority of variables in the bio-index. In addition, some variables are based on common sense. For example, it was assumed that voters are more attracted to candidates who are married but not divorced. As shown in Table 1 at the end of this page, the model distinguishes two types of variables:

1. Yes / No variables (n=47): For this type of variable, candidates are assigned a score of 1 if they possess a certain attribute and 0 otherwise.
2. Comparative variables (n=11): For this type of variable, the candidates of the two major parties are compared on the underlying attribute. The candidate who scores better than the opponent is assigned a score of 1; the other candidate is assigned 0.
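The two coding rules can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function names and the `higher_is_better` parameter are hypothetical, and scoring both candidates 0 on a tie or on missing data is an assumption, since the text does not say how such cases are handled.

```python
from typing import Optional, Tuple


def code_yes_no(has_attribute: bool) -> int:
    """Yes / No variable: 1 if the candidate possesses the attribute, 0 otherwise."""
    return 1 if has_attribute else 0


def code_comparative(
    value_a: Optional[float],
    value_b: Optional[float],
    higher_is_better: bool = True,
) -> Tuple[int, int]:
    """Comparative variable: the better-scoring candidate gets 1, the other 0.

    Assumption (not stated in the text): ties and missing data score (0, 0).
    """
    if value_a is None or value_b is None or value_a == value_b:
        return (0, 0)
    a_is_better = (value_a > value_b) if higher_is_better else (value_a < value_b)
    return (1, 0) if a_is_better else (0, 1)
```

For example, `code_comparative(185, 178)` gives the first candidate the point on an attribute where a higher value counts as better.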

## Forecast calculation

### Predicting the election winner

After all variables have been coded, the total index scores for each candidate are calculated by summing up the scores. Then, the candidate who achieves the higher overall score is predicted as the election winner.
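As a minimal sketch of this step, the snippet below sums each candidate's coded variables and picks the higher total. The variable names are made up for illustration and are not the model's actual variable list; returning `None` for equal totals is also an assumption, because the text does not describe how a tie would be resolved.

```python
from typing import Dict, Optional


def index_score(codes: Dict[str, int]) -> int:
    """Total bio-index score: the sum of a candidate's 0/1 variable codes."""
    return sum(codes.values())


def predict_winner(
    name_a: str, codes_a: Dict[str, int], name_b: str, codes_b: Dict[str, int]
) -> Optional[str]:
    """Return the candidate with the higher total score, or None on a tie (assumed)."""
    score_a, score_b = index_score(codes_a), index_score(codes_b)
    if score_a == score_b:
        return None
    return name_a if score_a > score_b else name_b


# Hypothetical toy coding with three illustrative variables.
candidate_a = {"married": 1, "military_service": 0, "taller": 1}
candidate_b = {"married": 1, "military_service": 0, "taller": 0}
print(predict_winner("A", candidate_a, "B", candidate_b))  # A
```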

### Predicting vote-shares

Simple linear regression is then used to relate the incumbent party candidate’s relative bio-index score (`bio`) to the dependent variable, which is the actual two-party popular vote share received by the candidate of the incumbent party (V). Using data from the 30 elections from 1896 to 2012 leads to the following vote equation:

V = 19.9 + 60.5 * `bio`,

where `bio` = `bio-score (incumbent)` / [`bio-score (incumbent)` + `bio-score (challenger)`]

That is, the bio-index model predicts that an incumbent would start with 19.9% of the vote, plus a share depending on `bio`. If the incumbent’s relative `bio` score went up by 10 percentage points, the incumbent’s vote share would go up by about 6 percentage points (60.5 × 0.10 ≈ 6.05).
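The vote equation is easy to reproduce in code. The sketch below uses the intercept and slope reported above; the function and constant names are hypothetical, and the zero-total check is an added safeguard rather than part of the model.

```python
INTERCEPT = 19.9  # intercept of the vote equation reported above
SLOPE = 60.5      # coefficient on the relative bio-index score


def relative_bio(incumbent_score: float, challenger_score: float) -> float:
    """Incumbent's share of the combined bio-index scores (a value from 0 to 1)."""
    total = incumbent_score + challenger_score
    if total == 0:
        raise ValueError("at least one candidate needs a non-zero score")
    return incumbent_score / total


def predicted_vote_share(incumbent_score: float, challenger_score: float) -> float:
    """Predicted two-party vote share (in percent) of the incumbent party's candidate."""
    return INTERCEPT + SLOPE * relative_bio(incumbent_score, challenger_score)
```

With equal scores, `bio` is 0.5 and the predicted share is 19.9 + 60.5 × 0.5 = 50.15. Moving `bio` from 0.5 to 0.6 (for instance, scores of 6 and 4 instead of 5 and 5) raises the prediction by 6.05 points.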

## 2016 forecast

Table 1 shows the coding for the two major parties’ likely nominees (according to PredictWise), Hillary Clinton and Donald Trump. For some comparative variables, such as intelligence, data are not available. Other comparative variables, such as weight or height, are not coded because the Clinton-Trump race is a mixed-gender comparison.

Based on the coding, Clinton achieves a bio-index score of 18 points (compared to 10 points for Trump) and is thus predicted to win the election. Inserting these figures into the vote equation derived above leads to:

V = 19.9 + 60.5 * ( 18 / (18 + 10)) = 58.8

That is, in a hypothetical Clinton-Trump match-up, the bio-index model predicts Clinton to achieve 58.8% of the vote, compared to 41.2% for Trump.
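The arithmetic can be checked with a few lines of Python. The scores and coefficients are taken from the text; the variable names are illustrative, and Trump's share is computed here as the remainder of the two-party vote.

```python
clinton_score, trump_score = 18, 10  # bio-index scores from the coding above

# Clinton is the incumbent party's candidate, so her score goes in the numerator.
bio = clinton_score / (clinton_score + trump_score)
clinton_share = 19.9 + 60.5 * bio
trump_share = 100 - clinton_share  # remainder of the two-party vote

print(round(clinton_share, 1), round(trump_share, 1))  # 58.8 41.2
```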

You can also compute your own bio-index model forecasts. This feature allows you to:

1. See how the model forecast would change for different variable coding.
2. Compare two hypothetical candidates against each other.
3. See how you would perform as a candidate.

To do so, scroll all the way down to Table 1 and adjust the variable values in the green highlighted cells. Whenever you change a variable value, the forecast will be updated at the bottom of the table.

### Forecasts for other match-ups

We also coded other potential candidates (the complete codings are available here; let us know if you spot any mistakes). The following chart shows a selection of hypothetical match-ups between Republicans and Democrats. According to the bio-index model, Trump would be the worst possible candidate for the Republican party. However, the model also predicts Clinton to defeat all other remaining Republican candidates.

## Past performance

The model’s out-of-sample predictions of the 30 U.S. presidential elections from 1896 to 2012 were wrong only twice. The model wrongly predicted Ford to beat Carter in 1976 and Bush to defeat Clinton in 1992. For the remaining 28 elections, the model correctly predicted the winner. This record of 93% correct predictions (28 of 30) compares favorably to other statistical models as well as to polls and prediction markets (Armstrong & Graefe, 2011). The forecast for the 2012 election, published in August 2011, correctly predicted Obama to defeat Romney (Graefe & Armstrong, 2011).

## Limitations

The bio-index model ignores many factors that are also important for predicting election outcomes. Examples include the state of the economy, the time the incumbent party has held the White House, the candidates’ perceived ability to handle issues, and the effectiveness of advertising campaigns. Yet, by relying on a different set of information than other models, the bio-index model contributes to the accuracy of the combined PollyVote forecast.