Introduction: Estimating Proabilties with Bayesian Inference

Adapted from WIll Koershen

Learning Purposes Only

Bayesian Approach

Problem:

Today's Bayesian problem of the week: Suppose we visit a wild animal preserve where we know that the only animals are lions and tigers and bears, but we don't know how many of each there are. During the tour, we see 3 lions, 2 tigers, and 1 bear. Assuming that every animal had an equal chance to appear in our sample, estimate the prevalence of each species. What is the probability that the next animal we see is a bear?

In a frequentist view, we would just the the observed animals to esimate the prevalance and let the data completetly speak for itself. In, contract, a Bayesian framework we need to incorpate priors. In the limit as n observations tends to infinity, the effect of the priors disappear, we and only use the data. We can alger the weigght of the priors depending on out confidence in them

Model

The overall sysm is as follows:

  1. The underlying model is multinomal with parmaters $p_{k}$
  2. The $prior$ distribution of $p_{i}$ is a Dirichlet distribution
  3. THe $\alpha$ is a parameter of the prior Dirichelt distribution,

THe model can be expressed as:

$$ \begin{align} \alpha = (\alpha_{1}...,\alpha_{K}) && = concentration hyperparameter \\ p | \alpha = (\alpha_{1}...,\alpha_{K})&& = (K,\alpha) \\ X | p = (x_{1}...,x_{K}) && = (K,p) \end{align} $$

Our goal is to fine $p_{lions},p_{tigers},p_{bears}$ given for the observation vector $c = [c_{lions},c_{tigers},c_{bears}]$

Multinomial distribution

The problem is a classic example of hte multi. dist, which desribes a situation where we have n independent trials, each with k possible outcomes.

Dirichelt Distribution

The prior for a multinomial distribution in Bayesian statistics is a Dirichlet distribution. (Together, a multinomial distribution with a dirichlet prior is called, not surprisingly, a Dirichlet-Multinomial Distribution. The Dirichlet distribution is characterized by $\alpha$, the concentration hyperparameter vector.

Hyperapameters

The $\alpha$ vector is a hyperparameter, a parameter of a prior distribution. This vector in turn could have its own prior distribution which is called a hyperprior. We won't use a hyperprior, but instead will only specify the hyperparameters.

The hyperparameter vector can be thought of as pseudo-counts, which we use to show our prior belief for the prevalence of each species. If want a uniform hyperparameter reflecting that we believe the chance of observing any species is the same we set each element of alpha equal such as $\alpha = [1, 1, 1]$. We can increase our decrease the effect of the priors by increasing or decreasing the values. This can be useful when we have more or less confidence in our prior beliefs. We'll see the effects of adjusting the hyperparameters in the notebook, but will mainly work with $\alpha = [1, 1, 1]$.

In [1]:
import pandas as pd
import numpy as np

# Visualizations
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('fivethirtyeight')
plt.rcParams['font.size'] = 22
%matplotlib inline

from matplotlib import MatplotlibDeprecationWarning

import warnings
warnings.filterwarnings('ignore', category=FutureWarning)
warnings.filterwarnings('ignore', category=MatplotlibDeprecationWarning)

import pymc3 as pm
import arviz
In [2]:
#helper functions
'''Functions for drawing contours of Dirichlet distributions.'''

# Author: Thomas Boggs

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.tri as tri
from functools import reduce

_corners = np.array([[0, 0], [1, 0], [0.5, 0.75**0.5]])
_triangle = tri.Triangulation(_corners[:, 0], _corners[:, 1])
_midpoints = [(_corners[(i + 1) % 3] + _corners[(i + 2) % 3]) / 2.0 \
              for i in range(3)]

def xy2bc(xy, tol=1.e-3):
    '''Converts 2D Cartesian coordinates to barycentric.
    Arguments:
        `xy`: A length-2 sequence containing the x and y value.
    '''
    s = [(_corners[i] - _midpoints[i]).dot(xy - _midpoints[i]) / 0.75 \
         for i in range(3)]
    return np.clip(s, tol, 1.0 - tol)

class Dirichlet(object):
    def __init__(self, alpha):
        '''Creates Dirichlet distribution with parameter `alpha`.'''
        from math import gamma
        from operator import mul
        self._alpha = np.array(alpha)
        self._coef = gamma(np.sum(self._alpha)) / \
                     reduce(mul, [gamma(a) for a in self._alpha])
    def pdf(self, x):
        '''Returns pdf value for `x`.'''
        from operator import mul
        return self._coef * reduce(mul, [xx ** (aa - 1)
                                         for (xx, aa)in zip(x, self._alpha)])
    def sample(self, N):
        '''Generates a random sample of size `N`.'''
        return np.random.dirichlet(self._alpha, N)

def draw_pdf_contours(dist, border=False, nlevels=200, subdiv=8, **kwargs):
    '''Draws pdf contours over an equilateral triangle (2-simplex).
    Arguments:
        `dist`: A distribution instance with a `pdf` method.
        `border` (bool): If True, the simplex border is drawn.
        `nlevels` (int): Number of contours to draw.
        `subdiv` (int): Number of recursive mesh subdivisions to create.
        kwargs: Keyword args passed on to `plt.triplot`.
    '''
    from matplotlib import ticker, cm
    import math

    refiner = tri.UniformTriRefiner(_triangle)
    trimesh = refiner.refine_triangulation(subdiv=subdiv)
    pvals = [dist.pdf(xy2bc(xy)) for xy in zip(trimesh.x, trimesh.y)]

    plt.tricontourf(trimesh, pvals, nlevels, **kwargs)
    plt.axis('equal')
    plt.xlim(0, 1)
    plt.ylim(0, 0.75**0.5)
    plt.axis('off')
    if border is True:
        plt.hold(1)
        plt.triplot(_triangle, linewidth=1)

def plot_points(X, barycentric=True, border=True, **kwargs):
    '''Plots a set of points in the simplex.
    Arguments:
        `X` (ndarray): A 2xN array (if in Cartesian coords) or 3xN array
                       (if in barycentric coords) of points to plot.
        `barycentric` (bool): Indicates if `X` is in barycentric coords.
        `border` (bool): If True, the simplex border is drawn.
        kwargs: Keyword args passed on to `plt.plot`.
    '''
    if barycentric is True:
        X = X.dot(_corners)
    plt.plot(X[:, 0], X[:, 1], 'k.', ms=1, **kwargs)
    plt.axis('equal')
    plt.xlim(0, 1)
    plt.ylim(0, 0.75**0.5)
    plt.axis('off')
    if border is True:
        plt.hold(1)
        plt.triplot(_triangle, linewidth=1)
    
def annotate_plot():
    cs = ['#008fd5', '#fc4f30', '#e5ae38']
    animals = ['lions', 'tigers', 'bears']
    for i, (xy, s) in enumerate(zip([(-0.15, 0.075), (0.38, 0.868), (0.98, 0.075)],
                         animals)):
        plt.annotate(s, xy, color = cs[i], size = 20)

def add_legend(ax):
    """Function to add legend to plots"""
    animals = ['lions', 'tigers', 'bears']
    for l, a in zip(ax.get_lines()[:3], animals):
        ax.plot(0, 0, label = a, c = l.get_color())
    ax.legend()
   
def display_probs(d):
    for key, value in d.items():
        print(f'Species: {key:8} Prevalence: {100*value:.2f}%.')
# if __name__ == '__main__':
#     f = plt.figure(figsize=(8, 6))
#     alphas = [[0.999] * 3,
#               [5] * 3,
#               [2, 5, 15]]
#     for (i, alpha) in enumerate(alphas):
#         plt.subplot(2, len(alphas), i + 1)
#         dist = Dirichlet(alpha)
#         draw_pdf_contours(dist)
#         title = r'$\alpha$ = (%.3f, %.3f, %.3f)' % tuple(alpha)
#         plt.title(title, fontdict={'fontsize': 8})
#         plt.subplot(2, len(alphas), i + 1 + len(alphas))
#         plot_points(dist.sample(5000))
#     plt.savefig('dirichlet_plots.png')
#     print 'Wrote plots to "dirichlet_plots.png".'

Specifics

Mainly use one verionsof the hyperparmeters where they are all 1

In [3]:
#observations
animals = ['lions','tigers','bears']
c = np.array([3,2,1])
#hyperparams
alphas = np.array([1,1,1])
alpha_list = [np.array([0.1, 0.1, 0.1]), np.array([1, 1, 1]),
                    np.array([5, 5, 5]), np.array([15, 15, 15])]

Expected Value

A way to get a sinlge point esimate of the prevalance is to use the expected value of the posterior for $p_{k}$. The expected value of the Dirchilet mult distr is:

$ \begin{align} [p_{i} | X,\alpha] = \frac{c_{i} + \alpha_{i}}{N + \sum_{k} \alpha_{k}} \end{align} $

Using $\alpha = [1, 1, 1]$, and the observation vector $c = [3, 2, 1]$ we get the expected prevalances:

$$p_{lions} = \frac{4}{9} = 44.4\%$$$$p_{tigers} = \frac{3}{9} = 33.3\%$$$$p_{bears} = \frac{2}{9} = 22.2\%$$

Does not extend to limited amount of data. For that we turn to Bayesian modeling, and sample fromt the posterior MCMC. The expected value represent our best guess for what a single trial. We can also oalter the expected value by adjusting the weight of the hyperparm vectro

In [4]:
display_probs(dict(zip(animals, (alphas + c) / (c.sum() + alphas.sum()))))
Species: lions    Prevalence: 44.44%.
Species: tigers   Prevalence: 33.33%.
Species: bears    Prevalence: 22.22%.
In [5]:
display_probs(dict(zip(animals, (4/9, 3/9, 2/9))))
Species: lions    Prevalence: 44.44%.
Species: tigers   Prevalence: 33.33%.
Species: bears    Prevalence: 22.22%.

Let's try a difference hyperparams value and see how expected value changes

In [6]:
values = []
for alpha_new in alpha_list:
    values.append((alpha_new + c) / (c.sum() + alpha_new.sum()))

value_df = pd.DataFrame(values, columns = animals)
value_df['alphas'] = [str(x) for x in alpha_list]
value_df
Out[6]:
lions tigers bears alphas
0 0.492063 0.333333 0.174603 [0.1 0.1 0.1]
1 0.444444 0.333333 0.222222 [1 1 1]
2 0.380952 0.333333 0.285714 [5 5 5]
3 0.352941 0.333333 0.313725 [15 15 15]
In [7]:
melted = pd.melt(value_df, id_vars = 'alphas', value_name='prevalence',
        var_name = 'species')

plt.figure(figsize = (8, 6))
sns.barplot(x = 'alphas', y = 'prevalence', hue = 'species', data = melted,
            edgecolor = 'k', linewidth = 1.5);
plt.xticks(size = 14); plt.yticks(size = 14)
plt.title('Expected Value');

Note, that as the weight of the hyperparams (these are the alpha values used from the posterior dirchilet dist) increases, the prevalcne conveges to around 1/3. Intutitvely, as we increase the number of pseudocounts, we express more strognly our inital belief of the equal prevalence scenario. With heavier priors, the data matters less.

On the hand, when we decrease the weight of the hyperparams, the data matters more and the expected value convered on the counts. The ultimate choise of the hyperparms depends on our confidence in out prior

Maximum Aposeteriori Estimation (MAP)

MAP is another point estimation that is equal to the mode of the posterior distribution of the posterior distribution: $${\displaystyle \operatorname {arg\,max} \limits _{\mathbf {p} }p(\mathbf {p} \mid \mathbb {X} )={\frac {\alpha _{i}+c_{i}-1}{\sum _{i}(\alpha _{i}+c_{i}-1)}},\qquad \forall i\;\alpha _{i}+c_{i}>1}$$ Note the lapacian smoothing:

In the case of $\alpha = [1, 1, 1]$ , this turns into the frequency of observed in the data

In [8]:
display_probs(dict(zip(animals, (alphas + c - 1) / sum(alphas + c - 1))))
Species: lions    Prevalence: 50.00%.
Species: tigers   Prevalence: 33.33%.
Species: bears    Prevalence: 16.67%.
In [9]:
display_probs(dict(zip(animals, c / c.sum())))
Species: lions    Prevalence: 50.00%.
Species: tigers   Prevalence: 33.33%.
Species: bears    Prevalence: 16.67%.
In [10]:
values = []
for alpha_new in alpha_list:
    values.append((alpha_new + c - 1) / sum(alpha_new + c - 1))

value_df = pd.DataFrame(values, columns = animals)
value_df['alphas'] = [str(x) for x in alpha_list]
value_df
Out[10]:
lions tigers bears alphas
0 0.636364 0.333333 0.030303 [0.1 0.1 0.1]
1 0.500000 0.333333 0.166667 [1 1 1]
2 0.388889 0.333333 0.277778 [5 5 5]
3 0.354167 0.333333 0.312500 [15 15 15]
In [11]:
melted = pd.melt(value_df, id_vars = 'alphas', value_name='prevalence',
        var_name = 'species')

plt.figure(figsize = (8, 6))
sns.barplot(x = 'alphas', y = 'prevalence', hue = 'species', data = melted,
            edgecolor = 'k', linewidth = 1.5);
plt.xticks(size = 14); plt.yticks(size = 14)
plt.title('Maximum A Posterior Value');

Same effect occurs! Both expected value and MAP only provide a point estimate with no uncertainty. Given the relatively few observations, there should only be a large amount of uncertaintiy...

Bayesian Model

The objective is to find the parameters of the multinomial dist, $p_{k}$ which are the probs of each species given the evidence. Recally, we are using a multinomial as our model, a Dirichilet distributino as the prior, and the specified hyperparam vector: $ \begin{align} (p|X,\alpha) \end{align} $

PyMC3 and MCMC

Build model using PyMC3, and then use a variant of Markov Chain Monte Carlo (the No-Uturn Sample specifically) to draw sample from the posterior. With enough samples, the estimate will convered on the true posterior. ALong with single point estimates (such as the mean of the sampled values), MCMC also give us the built in uncertaintiy because we get thousands of possible values from the posterior

In [12]:
with pm.Model() as model:
    # Parameters of the Multinomial are from a Dirichlet
    parameters = pm.Dirichlet('parameters', a=alphas, shape=3)
    # Observed data is from a Multinomial distribution
    observed_data = pm.Multinomial(
        'observed_data', n=6, p=parameters, shape=3, observed=c)
In [13]:
print(parameters)
print(alphas)
print(c)
parameters
[1 1 1]
[3 2 1]
In [14]:
model
Out[14]:
$$ \begin{array}{rcl} parameters &\sim & \text{Dirichlet}(\mathit{a}=array)\\observed_data &\sim & \text{Multinomial}(\mathit{n}=6, \mathit{p}=f(\text{parameters},~f(f(\text{parameters})))) \end{array} $$

Sampling the model

Draw 1000 from the posterior in 2 chains. WE use 500 smaples for tuning which are discarded. This meand that for each random variable in the model - the random variables are the parameters - we will have 2000 values drawn from the posterior distribution

In [15]:
with model:
    # Sample from the posterior
    trace = pm.sample(draws=1000, chains=2, tune=500, 
                      discard_tuned_samples=True)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 4 jobs)
NUTS: [parameters]
Sampling 2 chains: 100%|██████████| 3000/3000 [00:00<00:00, 3120.03draws/s]
In [16]:
summary = pm.summary(trace)
summary.index = animals
summary
Out[16]:
mean sd mc_error hpd_2.5 hpd_97.5 n_eff Rhat
lions 0.448075 0.162908 0.003424 0.137824 0.749909 1847.877113 0.999899
tigers 0.331740 0.152746 0.003644 0.054202 0.626152 1889.830372 0.999558
bears 0.220185 0.132979 0.003011 0.018361 0.488707 1825.819442 0.999748

We can see that the mean of the samples is very close to the expected value. However, instead of just getting one number, we get a rang of uncertaintity

In [17]:
# Samples
trace_df = pd.DataFrame(trace['parameters'], columns = animals)
trace_df.head()
Out[17]:
lions tigers bears
0 0.212186 0.384596 0.403217
1 0.255443 0.289665 0.454892
2 0.316250 0.400231 0.283519
3 0.341046 0.421098 0.237856
4 0.512615 0.112162 0.375223
In [18]:
trace_df.shape
Out[18]:
(2000, 3)

For a single point estimate, we can use the mean of the samples

In [19]:
# For probabilities use samples after burn in
pvals = trace_df.iloc[:, :3].mean(axis = 0)
display_probs(dict(zip(animals, pvals)))
Species: lions    Prevalence: 44.81%.
Species: tigers   Prevalence: 33.17%.
Species: bears    Prevalence: 22.02%.
In [20]:
summary.iloc[:, 3:5]
Out[20]:
hpd_2.5 hpd_97.5
lions 0.137824 0.749909
tigers 0.054202 0.626152
bears 0.018361 0.488707

The means after sample from the posterior appraoch the expected value, but sampling gives us a range of uncertaintity. In the limit, this uncertainity should become more narrow

Diagnostic plots

Posterior plot

In [24]:
ax = pm.plot_posterior(trace, varnames = ['parameters'], 
                       figsize = (20, 10),markeredgecolor = 'k');

plt.rcParams['font.size'] = 22
for i, a in enumerate(animals):
    ax[i].set_title(a);
/anaconda3/lib/python3.7/site-packages/pymc3/plots/__init__.py:40: UserWarning: Keyword argument `varnames` renamed to `var_names`, and will be removed in pymc3 3.8
  warnings.warn('Keyword argument `{old}` renamed to `{new}`, and will be removed in pymc3 3.8'.format(old=old, new=new))

Traceplot

The traceplot shows a kernel density estimate (a smoothed histogram) on the elt and all the samples that were drawn on the right. We collapse, the chains on the plots (combined = True) but in realtiy we drew 2 independent chains

In [33]:
prop_cycle = plt.rcParams['axes.prop_cycle']
cs = [x['color'] for x in list(prop_cycle)]

ax = pm.traceplot(trace, varnames = ['parameters'], figsize = (20, 12), combined = True);
ax[0][0].set_title('Posterior Probability Distribution'); ax[0][1].set_title('Trace Samples');
ax[0][0].set_xlabel('Probability'); ax[0][0].set_ylabel('Density');
ax[0][1].set_xlabel('Sample number');
add_legend(ax[0][0])
add_legend(ax[0][1])
/anaconda3/lib/python3.7/site-packages/pymc3/plots/__init__.py:40: UserWarning: Keyword argument `varnames` renamed to `var_names`, and will be removed in pymc3 3.8
  warnings.warn('Keyword argument `{old}` renamed to `{new}`, and will be removed in pymc3 3.8'.format(old=old, new=new))

MAPP results with PyMC3,

PyMC3 has a method for directly finidng the MAP

In [34]:
with model:
    #Find the MAP estimate
    map_ = pm.find_MAP()
    
display_probs(dict(zip(animals,map_['parameters'])))
/anaconda3/lib/python3.7/site-packages/pymc3/tuning/starting.py:61: UserWarning: find_MAP should not be used to initialize the NUTS sampler, simply call pymc3.sample() and it will automatically initialize NUTS in a better way.
  warnings.warn('find_MAP should not be used to initialize the NUTS sampler, simply call pymc3.sample() and it will automatically initialize NUTS in a better way.')
logp = -1.8042, ||grad|| = 1.118: 100%|██████████| 7/7 [00:00<00:00, 1338.51it/s]
Species: lions    Prevalence: 50.00%.
Species: tigers   Prevalence: 33.33%.
Species: bears    Prevalence: 16.67%.

Sample from Posterior

WE can now use the posterior to draw samples from the data.

In [35]:
with model:
    samples = pm.sample_ppc(trace, samples = 1000)
    
dict(zip(animals, samples['observed_data'].mean(axis = 0)))
/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:2: DeprecationWarning: sample_ppc() is deprecated.  Please use sample_posterior_predictive()
  
100%|██████████| 1000/1000 [00:00<00:00, 1124.03it/s]
Out[35]:
{'lions': 2.682, 'tigers': 1.938, 'bears': 1.38}

Show sampling distributions

In [37]:
sample_df = pd.DataFrame(samples['observed_data'],columns=animals)
plt.figure(figsize = (22, 8))
for i, animal in enumerate(sample_df):
    plt.subplot(1, 3, i+1)
    sample_df[animal].value_counts().sort_index().plot.bar(color = 'm');
    plt.xticks(range(7), range(7), rotation = 0);
    plt.xlabel('Number of Times Seen'); plt.ylabel('Occurences');
    plt.title(f'1000 Samples for {animal}');

Dirichlet Distribution

In [38]:
draw_pdf_contours(Dirichlet(6 * alphas))
annotate_plot()
In [39]:
draw_pdf_contours(Dirichlet(6 * pvals))
annotate_plot();
In [40]:
draw_pdf_contours(Dirichlet([0.1, 0.1, 0.1]))
annotate_plot();

Next Observation

In order to find what we can expect from the next observation, we draw a single sample of 10000 times form a multinomial distribution. The probaility of seeing each species is proportional to that obtained from the sampling.

In [44]:
#draw from the multinomial posterior
next_obs = np.random.multinomial(n=1,pvals=pvals,size=10000)
# Data manipulation
next_obs = pd.melt(pd.DataFrame(next_obs, columns = ['Lions', 'Tigers', 'Bears'])).\
            groupby('variable')['value'].\
            value_counts(normalize=True).to_frame().\
             rename(columns = {'value': 'total'}).reset_index()
next_obs = next_obs.loc[next_obs['value'] == 1]

# Bar plot
next_obs.set_index('variable')['total'].plot.bar(figsize = (8, 6));
plt.title('Next Observation Likelihood');
plt.ylabel('Likelihood'); plt.xlabel('');

Looks like a lion is next!

next_obs.iloc[:, [0, 2]]

More observations

If we have additional observations, more trips to the preserve, we can easily do the sampling taking into accound this data.

In [46]:
c = np.array([[3,2,1],
            [2,3,1],
            [3,2,1],
            [2,3,1]])
c
Out[46]:
array([[3, 2, 1],
       [2, 3, 1],
       [3, 2, 1],
       [2, 3, 1]])
In [47]:
c.shape
Out[47]:
(4, 3)
In [50]:
with pm.Model() as model:
    # Parameters are a dirichlet distribution
    parameters = pm.Dirichlet('parameters', a=alphas, shape=3)
    # Observed data is a multinomial distribution
    observed_data = pm.Multinomial(
        'observed_data', n=6, p=parameters, shape=3, observed=c)    
    
    trace = pm.sample(draws=1000, chains=2, tune=500, discard_tuned_samples=True)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 4 jobs)
NUTS: [parameters]
Sampling 2 chains: 100%|██████████| 3000/3000 [00:01<00:00, 2794.31draws/s]
In [51]:
summary = pm.summary(trace)
summary.index = animals
summary
Out[51]:
mean sd mc_error hpd_2.5 hpd_97.5 n_eff Rhat
lions 0.406921 0.091226 0.002188 0.226976 0.581177 1809.102595 0.999936
tigers 0.408330 0.092009 0.002176 0.232542 0.588023 2021.664384 0.999609
bears 0.184749 0.070314 0.001662 0.064785 0.321719 1611.667404 0.999680

The uncertaintity of the prevalence of bears has decreased and the prevalence of the lions and tigers is nearly identical as expected from the data. As we gather more data, we can incorporate it into the model to get more accurate estimates. A larger amount of data will mean less uncertainty in the sampling as well.

In [58]:
ax = pm.plot_posterior(trace, varnames = ['parameters'], 
                       figsize = (20, 10), markeredgecolor= 'k')

plt.rcParams['font.size'] = 22
for i, a in enumerate(animals):
    ax[i].set_title(a)
    
plt.suptitle('Posterior with More Observations', y = 1.05)
/anaconda3/lib/python3.7/site-packages/pymc3/plots/__init__.py:40: UserWarning: Keyword argument `varnames` renamed to `var_names`, and will be removed in pymc3 3.8
  warnings.warn('Keyword argument `{old}` renamed to `{new}`, and will be removed in pymc3 3.8'.format(old=old, new=new))
Out[58]:
Text(0.5, 1.05, 'Posterior with More Observations')

Increasing/Decreasing Confidence in Hyperparmeters

WE can also increase or decrease our confidence in out inital belief (our priors) by changing the weighting of the hyperparameter vecotor. Initially we use all 1's but we can reduce or increase this!

In [54]:
# observations
animals = ['lions', 'tigers', 'bears']
c = np.array([3, 2, 1])

def sample_with_priors(alphas):
    """Sample with specified hyperparameters"""
    with pm.Model() as model:
        # Probabilities for each species
        parameters = pm.Dirichlet('parameters', a=alphas, shape=3)
        # Observed data is a multinomial distribution with 6 trials
        observed_data = pm.Multinomial(
            'observed_data', n=6, p=parameters, shape=3, observed=c)    

        trace = pm.sample(draws=1000, chains=2, tune=500, discard_tuned_samples=True)
    return trace
In [55]:
trace_dict = {}
for alpha_array in [np.array([0.1, 0.1, 0.1]), np.array([1, 1, 1]),
                    np.array([5, 5, 5]), np.array([15, 15, 15])]:
    trace_dict[str(alpha_array[0])] = sample_with_priors(alpha_array)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 4 jobs)
NUTS: [parameters]
Sampling 2 chains: 100%|██████████| 3000/3000 [00:01<00:00, 2821.89draws/s]
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 4 jobs)
NUTS: [parameters]
Sampling 2 chains: 100%|██████████| 3000/3000 [00:00<00:00, 3054.93draws/s]
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 4 jobs)
NUTS: [parameters]
Sampling 2 chains: 100%|██████████| 3000/3000 [00:01<00:00, 2732.81draws/s]
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 4 jobs)
NUTS: [parameters]
Sampling 2 chains: 100%|██████████| 3000/3000 [00:01<00:00, 2705.66draws/s]
In [56]:
plt.figure(figsize = (20, 24))

for ii, (alpha, trace) in enumerate(trace_dict.items()):
    plt.subplot(4, 1, ii + 1)
    array = trace['parameters']
    for jj, animal in enumerate(animals):
        sns.kdeplot(array[:, jj], label = f'{animal}')
    plt.legend();
    plt.xlabel('Probability'); plt.ylabel('Density')
    plt.title(f'Alpha = {alpha}');
    plt.xlim((0, 1));
    
plt.tight_layout()
plt.show()

As the weight increases, the estimated prevalence tends to collapse to 1/3 for each species. These results are inuitive; We have a greater belief that the prevalences are equivalent, then the limited data we collected should not change our beliefs greatly

Comparison of Beliefs¶

Next we compare the hyperparameters $\alpha = [0.1, 0.1, 0.1]$ with $\alpha = [15, 15, 15]$

In [59]:
prior = '0.1'
trace = trace_dict[prior]

ax = pm.plot_posterior(trace, varnames = ['parameters'], 
                       figsize = (20, 10), markeredgecolor = 'k')

plt.rcParams['font.size'] = 22
for i, a in enumerate(animals):
    ax[i].set_title(a)
    
plt.suptitle(f'{prior} Prior', y = 1.05)
/anaconda3/lib/python3.7/site-packages/pymc3/plots/__init__.py:40: UserWarning: Keyword argument `varnames` renamed to `var_names`, and will be removed in pymc3 3.8
  warnings.warn('Keyword argument `{old}` renamed to `{new}`, and will be removed in pymc3 3.8'.format(old=old, new=new))
Out[59]:
Text(0.5, 1.05, '0.1 Prior')
In [60]:
summary = pm.summary(trace)
summary.index = animals
summary
Out[60]:
mean sd mc_error hpd_2.5 hpd_97.5 n_eff Rhat
lions 0.484894 0.176648 0.004198 0.169706 0.818812 1802.463685 0.999508
tigers 0.339925 0.174586 0.004291 0.028798 0.655781 1796.585232 1.000536
bears 0.175181 0.145737 0.003541 0.000211 0.472574 1519.119630 1.000730
In [62]:
prior = '15'
trace = trace_dict[prior]

ax = pm.plot_posterior(trace, varnames = ['parameters'], 
                       figsize = (20, 10), markeredgecolor = 'k')

plt.rcParams['font.size'] = 22
for i, a in enumerate(animals):
    ax[i].set_title(a)
    
plt.suptitle(f'{prior} Prior', y = 1.05)
/anaconda3/lib/python3.7/site-packages/pymc3/plots/__init__.py:40: UserWarning: Keyword argument `varnames` renamed to `var_names`, and will be removed in pymc3 3.8
  warnings.warn('Keyword argument `{old}` renamed to `{new}`, and will be removed in pymc3 3.8'.format(old=old, new=new))
Out[62]:
Text(0.5, 1.05, '15 Prior')
In [63]:
summary = pm.summary(trace)
summary.index = animals
summary
Out[63]:
mean sd mc_error hpd_2.5 hpd_97.5 n_eff Rhat
lions 0.352274 0.065472 0.001486 0.223575 0.477980 2065.347872 1.000749
tigers 0.334161 0.065777 0.001478 0.204646 0.459013 1860.216769 0.999806
bears 0.313564 0.062348 0.001444 0.194092 0.434725 2020.395211 0.999847

Overall, the choise of the hyperpriors depences on our confidence in out initial belief. If we have good reason to believe all the animals are represneted at the same frequency, then we should increase the weight of the priors!

Conclusions

In [64]:
prior = '1'
trace = trace_dict[prior]
summary = pm.summary(trace)
summary.index = animals
summary
Out[64]:
mean sd mc_error hpd_2.5 hpd_97.5 n_eff Rhat
lions 0.442426 0.155742 0.003330 0.155825 0.746230 2173.840246 1.000089
tigers 0.339080 0.151136 0.003457 0.070684 0.634002 1995.054732 0.999538
bears 0.218494 0.131369 0.003193 0.012217 0.473441 1714.450285 1.000788
In [65]:
ax = pm.plot_posterior(trace, varnames = ['parameters'], 
                       figsize = (20, 10), markeredgecolor = 'k');

plt.rcParams['font.size'] = 22
for i, a in enumerate(animals):
    ax[i].set_title(a)
    
plt.suptitle('Posterior Plot', y = 1.05)
/anaconda3/lib/python3.7/site-packages/pymc3/plots/__init__.py:40: UserWarning: Keyword argument `varnames` renamed to `var_names`, and will be removed in pymc3 3.8
  warnings.warn('Keyword argument `{old}` renamed to `{new}`, and will be removed in pymc3 3.8'.format(old=old, new=new))
Out[65]:
Text(0.5, 1.05, 'Posterior Plot')

Estimated Prealences

  • Lions: 44.5% (16.9% - 75.8%)
  • Tigers: 32.7% (6.7% - 60.5%)
  • Bears: 22.7% (1.7% - 50.0%)

Proba next obs is bear is:

22.9 bases on the sampling

The best parts about Bayesian Inference are the incorporation of priors and the uncertainty inherent in the methods. With the scant evidence, we can provide estimates, but only with a large amount of uncertainty. More visits to the wildlife preserve would certainly help to clear up the matter!

In this notebook, we were able to explore the basics of Bayesian modeling and probabilistic programming. Although this was only a simple problem, we can use the concepts here to build more complex models.

In [ ]: