Tuesday, August 16, 2016

Combining poll data from different sources using covariance intersection

In a previous post, we saw how to analyse the polls for a single state using poll data from KTNV/Rasmussen.
Here we are going to see how to combine polls from different sources.

Let us consider again Nevada polls.

   Poll               Date         Sample   MoE  Clinton (D)  Trump (R)  Johnson (L)  Spread
0  RCP Average        7/7 - 8/5    --       --   43           40.7       6.3          Clinton +2.3
1  CBS News/YouGov*   8/2 - 8/5    993 LV   4.6  43           41.0       4.0          Clinton +2
2  KTNV/Rasmussen     7/29 - 7/31  750 LV   4.0  41           40.0       10.0         Clinton +1
3  Monmouth           7/7 - 7/10   408 LV   4.9  45           41.0       5.0          Clinton +4

Instead of averaging the polls, as is done by RCP (RealClearPolitics), we use Covariance Intersection. Covariance intersection is an algorithm for combining two or more data sources when the correlation between them is unknown.

Let us denote with \(\hat{a}\) a vector of observations (e.g., 43,41,16 from CBS News/YouGov, that is the percentages for Clinton, Trump and all remaining answers) and with \(\hat{b}\) another vector of observations (e.g., 41,40,19 from KTNV/Rasmussen). \(A\) denotes the covariance of the poll \(\hat{a}\), which measures how unreliable it is and which we assume to be equal to \(1/\text{sample size}\) (e.g., 1/993 for CBS News/YouGov), and \(B\) denotes the covariance of the poll \(\hat{b}\) (e.g., 1/750 for KTNV/Rasmussen).

Given the weight \(\omega \in [0,1]\), Covariance Intersection provides a formula to combine them:

$$ C^{{-1}}=\omega A^{{-1}}+(1-\omega )B^{{-1}}\,, $$ $$ \hat{c} =C(\omega A^{{-1}}{\hat a}+(1-\omega )B^{{-1}}{\hat b})\,. $$

This formula can be extended to an arbitrary number of sources. For instance, for the previous table, using uniform weights \(\omega_1=1/3,\omega_2=1/3,\omega_3=1/3\), we get

$$ C^{{-1}}=\omega_1 993+\omega_2 750+\omega_3 408=717$$ $$ \hat{c} =C(\omega_1 993{[43,41,16]}+\omega_2 750[41,40,19]+\omega_3 408[45,41,14])\,. $$

The final result is

$$ C^{-1}=717, \qquad \hat{c}=[42.68, 40.65, 16.67] $$

It can be observed that by using \(\omega_1=1/3,\omega_2=1/3,\omega_3=1/3\) the combined poll \(\hat{c}\) reduces to the average of the input polls weighted by the sample size. However, it is possible to choose other values of the weights, see for instance here.
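The computation above can be sketched in a few lines of Python. This is only a minimal illustration for the scalar-covariance case used in this post (each covariance is 1/sample size); the function name `covariance_intersection` and its signature are my own choices for the example, not part of any library.

```python
import numpy as np

def covariance_intersection(estimates, covariances, weights):
    """Fuse k estimates with scalar covariances using covariance intersection.

    estimates:   array of shape (k, d), one row per poll
    covariances: array of shape (k,), here 1 / sample size
    weights:     array of shape (k,), non-negative and summing to 1
    """
    estimates = np.asarray(estimates, dtype=float)
    covariances = np.asarray(covariances, dtype=float)
    weights = np.asarray(weights, dtype=float)
    info = weights / covariances          # omega_i * A_i^{-1}
    c_inv = info.sum()                    # C^{-1}
    c_hat = (info[:, None] * estimates).sum(axis=0) / c_inv
    return c_inv, c_hat

# Clinton, Trump, other (in percent) for the three Nevada polls
polls = [[43, 41, 16], [41, 40, 19], [45, 41, 14]]
sizes = np.array([993, 750, 408])

c_inv, c_hat = covariance_intersection(polls, 1.0 / sizes, np.full(3, 1 / 3))
print(c_inv, np.round(c_hat, 2))
```

With uniform weights the weights cancel in \(\hat{c}\), which is why the result coincides with the sample-size weighted average.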

Tuesday, August 9, 2016

Nevada data poll with near-ignorance priors and Python

I will show how to apply the models described in http://idpstat.blogspot.ch/2016/08/a-description-of-bayesian-near_3.html to predict the USA 2016 election results in Nevada (as an example). The poll data are from www.realclearpolitics.com, in particular the KTNV/Rasmussen poll (see below). In a future post, I will discuss how to take all three polls into account. We start by importing the data.

import pandas as pd
data_df = pd.read_table('nevada.csv')
   Poll               Date         Sample   MoE  Clinton (D)  Trump (R)  Johnson (L)  Spread
0  RCP Average        7/7 - 8/5    --       --   43           40.7       6.3          Clinton +2.3
1  CBS News/YouGov*   8/2 - 8/5    993 LV   4.6  43           41.0       4.0          Clinton +2
2  KTNV/Rasmussen     7/29 - 7/31  750 LV   4.0  41           40.0       10.0         Clinton +1
3  Monmouth           7/7 - 7/10   408 LV   4.9  45           41.0       5.0          Clinton +4
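#KTNV/Rasmussen poll

N=750 #sample size
pt=0.40 # fraction of votes for Trump
pc=0.41 # fraction of votes for Clinton
pu=1-pt-pc  # fraction of votes for Other (Johnson, undecided, ...)

#compute the number of votes
nt=N*pt # votes for Trump
nc=N*pc # votes for Clinton
nu=N*pu # votes for Other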

We now define the function of interest for the Bayesian inference. We call this function $g$: it returns the difference between the chance of Trump $\theta_1$ and that of Clinton $\theta_2$.

#define the function of interest for Bayesian inference
def g(theta):
    #theta is a 2D numpy array with one sample per row:
    #column 0 is Trump, column 1 is Clinton
    return (theta[:,0]-theta[:,1])

We now write the function that draws samples from the posterior Dirichlet distribution.

import numpy as np

#function that computes the posterior samples
def compute_posterior_samples(ap,Np):
    #ap: posterior Dirichlet distribution vector parameters
    #Np: number of MC samples
    return np.random.dirichlet(ap,Np) #we use numpy

We can now compute the posterior expectation of interest. We use a uniform prior.

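import warnings

%matplotlib inline
import numpy as np
import seaborn as sns

#data from the poll: number of votes for Trump, Clinton and Other
datapoll = np.array([N*pt, N*pc, N*pu])
#uniform prior
a = np.array([1, 1, 1])
#number of MC samples (assumed value, the original setting is not shown)
Np = 100000
postsamples = compute_posterior_samples(datapoll+a,Np)

# Set up the matplotlib figure
# (sns.distplot is deprecated in recent seaborn; sns.histplot(..., kde=True) is its replacement)
sns.distplot(g(postsamples), axlabel="Trump-Clinton", 
             kde=True, hist=True) #, hist_kws={"range": [-1,1]}
#posterior probability that the chance of Trump exceeds that of Clinton
prob = np.mean(g(postsamples) > 0)
print('Posterior probability of Trump winning is', prob)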
Posterior probability of Trump winning is 0.38011

The posterior probability in favor of Trump is the area under the curve from 0 to 1, while the one in favor of Clinton is the area from -1 to 0. This is a standard Bayesian analysis using a uniform prior. We have already explained that this type of prior is not really noninformative. We now use a better model: a near-ignorance model (http://idpstat.blogspot.ch/2016/08/a-description-of-bayesian-near_3.html).

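#prior for a swing scenario in favor of Clinton
#(v=2 pseudo-votes move from Trump to Clinton; v=2 is inferred from the two-vote swing discussed in the text)
al = np.array([-2, 2, 0])
#prior for a swing scenario in favor of Trump
#(v=2 pseudo-votes move from Clinton to Trump)
au = np.array([2, -2, 0])

postsampleslower = compute_posterior_samples(datapoll+al,Np)
postsamplesupper = compute_posterior_samples(datapoll+au,Np)

# Set up the matplotlib figure
sns.distplot(g(postsampleslower), axlabel="Trump-Clinton", 
             kde=True, hist=True) #, hist_kws={"range": [-1,1]}
sns.distplot(g(postsamplesupper), axlabel="Trump-Clinton", 
             kde=True, hist=True,color='darkred') #, hist_kws={"range": [-1,1]}

#lower and upper posterior probabilities that the chance of Trump exceeds that of Clinton
problower = np.mean(g(postsampleslower) > 0)
probupper = np.mean(g(postsamplesupper) > 0)
print('Posterior probability of Trump winning is in [',problower,probupper,']')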
Posterior probability of Trump winning is in [ 0.31799 0.44406 ]

We can see that a change of only two votes from Clinton to Trump (or vice versa) changes the probability by about 12 percentage points (from 0.318 to 0.444). This means that the prior has a significant effect on the posterior. A near-ignorance model automatically provides a sensitivity analysis of the inferences to the prior strength. This explains why we should always use near-ignorance models. In the next post, we will discuss how to combine polls from different States and return the overall winning probability. The importance of using a near-ignorance model will then be even more evident.
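To make the sensitivity analysis explicit, the following sketch repeats the computation for several prior strengths. It is only an illustration: the vote counts are those of the KTNV/Rasmussen poll (750 times 0.40, 0.41 and 0.19), while the values of `v`, the number of Monte Carlo samples, the seed and the helper name `prob_trump_ahead` are my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)             # fixed seed, chosen only for reproducibility
counts = np.array([300.0, 307.5, 142.5])   # Trump, Clinton, Other
n_samples = 100_000                        # assumed number of MC samples

def prob_trump_ahead(alpha):
    """Monte Carlo estimate of P(theta_t > theta_c) under Dirichlet(alpha)."""
    samples = rng.dirichlet(alpha, n_samples)
    return np.mean(samples[:, 0] > samples[:, 1])

intervals = {}
for v in (1, 2, 4):
    # v pseudo-votes moved from Trump to Clinton (lower) or from Clinton to Trump (upper)
    lower = prob_trump_ahead(counts + np.array([-v, v, 0]))
    upper = prob_trump_ahead(counts + np.array([v, -v, 0]))
    intervals[v] = (lower, upper)
    print(f"v={v}: [{lower:.3f}, {upper:.3f}]")
```

The interval widens as `v` grows, which is the sense in which `v` controls the strength of the prior.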


Friday, August 5, 2016

A description of a Bayesian near-ignorance model for USA election polls

Election Poll for a single state

In this and the following posts, I'll present a way to compute Bayesian predictions for the result of the USA 2016 election based on election poll data and near-ignorance prior models. This model is described in detail here:
A. Benavoli and M. Zaffalon. "Prior near ignorance for inferences in the k-parameter exponential family". Statistics, 49:1104-1140, 2014. (http://www.idsia.ch/~alessio/benavoli2014b.pdf)
In the USA 2016 election, we have two main candidates, Trump and Clinton. We denote with $\theta_{t}$ Trump's winning probability, with $\theta_{c}$ Clinton's winning probability and with $\theta_{u}$ the undecided case (undecided voters, other candidates etc.). The goal of the inference is to estimate these parameters; in particular, the focus is on comparing the proportions of voters for $Trump$ and $Clinton$, i.e., $\theta_{t}-\theta_{c}$.

Likelihood model

In the election poll, a total of $n$ adults are polled to indicate their preference for the candidates $Trump$ and $Clinton$. Let $\hat{y}_{nt}$ denote the proportion of the sample that supports $Trump$, $\hat{y}_{nc}$ denote the proportion that supports $Clinton$ and $\hat{y}_{nu}=1-\hat{y}_{nt}-\hat{y}_{nc}$ denote the proportion that is either undecided or votes for someone else. The counts $n\hat{y}_{nt}$ (number of votes for Trump), $n\hat{y}_{nc}$ (number of votes for Clinton) and $n\hat{y}_{nu}$ (undecided) are assumed to have a multinomial distribution with sample size $n$ and parameters $\theta_{t}$ (Trump), $\theta_{c}$ (Clinton) and $\theta_{u}$ (undecided), respectively. Thus, the likelihood model is:
$$ p(data|\theta)\propto\theta_{t}^{n\hat{y}_{nt}} \theta_{c}^{n\hat{y}_{nc}} \theta_{u}^{n\hat{y}_{nu}}, $$where $\theta_{t}$, $\theta_{c}$ and $\theta_{u}$, with $\theta_{t}+\theta_{c}+\theta_{u}=1$, are the unknown non-negative chances to be estimated.
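As a small illustration of the multinomial likelihood, the sketch below evaluates its logarithm (up to the constant multinomial coefficient) for a given $\theta$. The function name `log_likelihood` and the numbers ($n=750$ and proportions 0.40, 0.41, 0.19) are assumptions chosen only for the example.

```python
import numpy as np

def log_likelihood(theta, n, y_hat):
    """Multinomial log-likelihood of theta, up to an additive constant.

    theta: chances (theta_t, theta_c, theta_u), positive and summing to 1
    n:     sample size of the poll
    y_hat: observed proportions (y_nt, y_nc, y_nu)
    """
    theta = np.asarray(theta, dtype=float)
    counts = n * np.asarray(y_hat, dtype=float)   # n*y_nt, n*y_nc, n*y_nu
    return float(np.sum(counts * np.log(theta)))

y_hat = [0.40, 0.41, 0.19]
ll_at_sample = log_likelihood(y_hat, 750, y_hat)                # theta equal to the observed proportions
ll_elsewhere = log_likelihood([0.45, 0.36, 0.19], 750, y_hat)   # a different theta
print(ll_at_sample, ll_elsewhere)
```

The likelihood is largest when $\theta$ equals the observed proportions, so the first value is larger than the second.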

Noninformative Prior model

In the standard Bayesian approach, we need to choose a prior on the unknown $\theta$ parameters. A Dirichlet conjugate prior is a natural prior for $\theta_{t}$, $\theta_{c}$ and $\theta_{u}$:
$$ p(\theta)\propto \theta_{t}^{\alpha_{t}-1} \theta_{c}^{\alpha_{c}-1} \theta_{u}^{\alpha_{u}-1}, $$where in the case of lack of prior information the prior parameters are commonly selected as follows: Haldane's prior $\alpha_{t}=\alpha_{c}=\alpha_{u}=0$ (that is an improper prior); Jeffreys' prior $\alpha_{t}=\alpha_{c}=\alpha_{u}=\tfrac{1}{2}$; uniform prior $\alpha_{t}=\alpha_{c}=\alpha_{u}=1$. The prior expected value $E[\theta_{t}-\theta_{c}]$ is equal to $0$ and the prior probability $P(\theta_{t}>\theta_{c})=0.5$ for both Jeffreys' and the uniform prior (for Haldane's prior they are not defined). See the paper for details on how to compute these lower and upper bounds.
These are commonly called noninformative priors, but they are not noninformative. These priors express indifference between $Trump$ and $Clinton$, but not prior ignorance.
To see that, consider $P[\theta_{t}+0.5 \theta_{u}>\theta_{c}+0.4\theta_{u}]$. This is the probability that the proportion of votes for $Trump$ exceeds the proportion of votes for $Clinton$, assuming a "swing" scenario in which $50\%$ of the undecideds vote for $Trump$ and $40\%$ of the undecideds for $Clinton$. This probability is equal to $0.76$ in the case of the uniform prior and $0.66$ in the case of Jeffreys' prior. It depends on the choice of the prior, and this shows that the uniform and Jeffreys' priors are not really uninformative for this kind of poll.
Combining likelihood and prior, the resulting posterior is
$$ p(\theta|n,\hat{y}_n)\propto \theta_{t}^{n\hat{y}_{nt}+\alpha_{t}-1} \theta_{c}^{n\hat{y}_{nc}+\alpha_{c}-1} \theta_{u}^{n\hat{y}_{nu}+\alpha_{u}-1}, $$which is always proper for Jeffreys' prior and the uniform prior, and is proper for Haldane's prior provided that $\hat{y}_{nt},\hat{y}_{nc},\hat{y}_{nu}>0$.
The posterior expected value of $\theta_{t}-\theta_{c}$ (Trump-Clinton difference) is: $$ E[\theta_{t}-\theta_{c}|n,\hat{y}_n]=\dfrac{n\hat{y}_{nt}+\alpha_{t}}{n+\alpha_{t}+\alpha_{c}+\alpha_{u}}-\dfrac{n\hat{y}_{nc}+\alpha_{c}}{n+\alpha_{t}+\alpha_{c}+\alpha_{u}}, $$ while the posterior probability of the event $\theta_{t}-\theta_{c}>0$ is $$ P[\theta_{t}>\theta_{c}|n,\hat{y}_n]=\dfrac{\int\limits_{\{\theta_{t}>\theta_{c}\}} \theta_{t}^{n\hat{y}_{nt}+\alpha_{t}-1} \theta_{c}^{n\hat{y}_{nc}+\alpha_{c}-1} \theta_{u}^{n\hat{y}_{nu}+\alpha_{u}-1} ~d\theta}{\int \theta_{t}^{n\hat{y}_{nt}+\alpha_{t}-1} \theta_{c}^{n\hat{y}_{nc}+\alpha_{c}-1} \theta_{u}^{n\hat{y}_{nu}+\alpha_{u}-1} ~d\theta}, $$ which can be computed numerically by sampling from the Dirichlet distribution.
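The two posterior quantities above can be computed as in the following sketch: the expectation in closed form and the probability by sampling from the posterior Dirichlet distribution. The function name `posterior_summary`, the number of samples, the seed and the illustrative poll ($n=750$, $\hat{y}_{nt}=0.40$, $\hat{y}_{nc}=0.41$, uniform prior) are assumptions made for the example.

```python
import numpy as np

def posterior_summary(n, y_t, y_c, alpha, n_samples=100_000, seed=0):
    """Posterior mean of theta_t - theta_c and Monte Carlo estimate of P(theta_t > theta_c).

    alpha: prior Dirichlet parameters (alpha_t, alpha_c, alpha_u)
    """
    alpha = np.asarray(alpha, dtype=float)
    counts = n * np.array([y_t, y_c, 1.0 - y_t - y_c])
    post = counts + alpha                          # posterior Dirichlet parameters
    mean_diff = (post[0] - post[1]) / post.sum()   # closed-form E[theta_t - theta_c | data]
    rng = np.random.default_rng(seed)
    theta = rng.dirichlet(post, n_samples)         # samples from the posterior
    prob = np.mean(theta[:, 0] > theta[:, 1])      # fraction of samples with theta_t > theta_c
    return mean_diff, prob

mean_diff, prob = posterior_summary(750, 0.40, 0.41, alpha=[1, 1, 1])
print(mean_diff, prob)
```

The Monte Carlo estimate carries sampling noise, so its last digits change with the seed and the number of samples.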

Near-ignorance Prior model

We now present a weaker prior model (a near-ignorance model) that automatically allows us to take possible swing scenarios into account.
A near-ignorance model is a set of priors; in particular, we consider this set:
$$ \mathcal{M}=\left\{\theta_{t}^{\ell_{t}-1}\theta_{c}^{\ell_{c}-1}\theta_{u}^{-\ell_{t}-\ell_{c}-1}, ~|\ell_i| \leq v, ~-\ell_{t}+\ell_{c}\in [-v,v] \right\}, $$where $v$ is a parameter that represents pseudo-votes and determines how strongly the prior influences the posterior inferences.
When we have a set of probability distributions, inference is obtained by computing lower and upper bounds of the expectations and probabilities of interest. From this model, a priori we have $\underline{E}[\theta_{t}-\theta_{c}]=-1$ (lower expectation) and $\overline{E}[\theta_{t}-\theta_{c}]=1$ (upper expectation). Moreover, we have that $\underline{P}(\theta_{t}>\theta_{c})=0$, $\overline{P}(\theta_{t}>\theta_{c})=1$ and that $\underline{P}[\theta_{t}+0.5 \theta_{u}>\theta_{c}+0.4\theta_{u}]=0$ and $\overline{P}[\theta_{t}+0.5 \theta_{u}>\theta_{c}+0.4\theta_{u}]=1$. This is really a model of prior ignorance. Before seeing the data, all the probabilities of interest can assume any value between 0 and 1, which means that we are biasing our inference neither towards Trump nor towards Clinton. This is a more correct expression of the lack of prior information on the election result.
The resulting set of posteriors is
$$ \mathcal{M}_p=\left\{p(\theta|data)\propto \theta_{t}^{n\hat{y}_{nt}+\ell_{t}-1} \theta_{c}^{n\hat{y}_{nc}+\ell_{c}-1} \theta_{u}^{n\hat{y}_{nu}+\ell_{u}-1}, ~|\ell_i| \leq v, ~-\ell_{t}+\ell_{c}\in [-v,v] \right\}, $$where $\ell_{u}=-\ell_{t}-\ell_{c}$. From this set we can extract posteriors that are useful for analysing possible swing scenarios:
  1. $v$ votes move from Trump to Clinton, the resulting posterior is $$ p(\theta|n,\hat{y}_n)\propto\theta_{t}^{n\hat{y}_{nt}-v-1} \theta_{c}^{n\hat{y}_{nc}+v-1} \theta_{u}^{n\hat{y}_{nu}-1} $$
  2. $v$ votes move from Clinton to Trump, the resulting posterior is $$ p(\theta|n,\hat{y}_n)\propto \theta_{t}^{n\hat{y}_{nt}+v-1} \theta_{c}^{n\hat{y}_{nc}-v-1} \theta_{u}^{n\hat{y}_{nu}-1} $$
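As a worked illustration (my own addition, obtained from the standard formula for the mean of a Dirichlet distribution), we can compute the posterior expectation of $\theta_{t}-\theta_{c}$ under these two posteriors. In both cases the Dirichlet parameters sum to $n$, because the $v$ pseudo-votes added to one candidate are removed from the other. For scenario 1 we get $$ E[\theta_{t}-\theta_{c}|n,\hat{y}_n]=\dfrac{(n\hat{y}_{nt}-v)-(n\hat{y}_{nc}+v)}{n}=\hat{y}_{nt}-\hat{y}_{nc}-\dfrac{2v}{n}, $$ and for scenario 2 $$ E[\theta_{t}-\theta_{c}|n,\hat{y}_n]=\dfrac{(n\hat{y}_{nt}+v)-(n\hat{y}_{nc}-v)}{n}=\hat{y}_{nt}-\hat{y}_{nc}+\dfrac{2v}{n}. $$ For example, assuming $n=750$, $\hat{y}_{nt}=0.40$, $\hat{y}_{nc}=0.41$ and $v=2$ (values chosen only for illustration), the two expectations are $-0.01-4/750\approx-0.0153$ and $-0.01+4/750\approx-0.0047$. The gap between them, $4v/n$, shrinks as the sample size grows.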
In the next post, we will use this model to make inferences on single state polls.