Problem:
Today's Bayesian problem of the week: Suppose we visit a wild animal preserve where we know that the only animals are lions and tigers and bears, but we don't know how many of each there are. During the tour, we see 3 lions, 2 tigers, and 1 bear. Assuming that every animal had an equal chance to appear in our sample, estimate the prevalence of each species. What is the probability that the next animal we see is a bear?
In a frequentist view, we would just the the observed animals to esimate the prevalance and let the data completetly speak for itself. In, contract, a Bayesian framework we need to incorpate priors. In the limit as n observations tends to infinity, the effect of the priors disappear, we and only use the data. We can alger the weigght of the priors depending on out confidence in them
The overall sysm is as follows:
THe model can be expressed as:
$$ \begin{align} \alpha = (\alpha_{1}...,\alpha_{K}) && = concentration hyperparameter \\ p | \alpha = (\alpha_{1}...,\alpha_{K})&& = (K,\alpha) \\ X | p = (x_{1}...,x_{K}) && = (K,p) \end{align} $$Our goal is to fine $p_{lions},p_{tigers},p_{bears}$ given for the observation vector $c = [c_{lions},c_{tigers},c_{bears}]$
The problem is a classic example of hte multi. dist, which desribes a situation where we have n independent trials, each with k possible outcomes.
The prior for a multinomial distribution in Bayesian statistics is a Dirichlet distribution. (Together, a multinomial distribution with a dirichlet prior is called, not surprisingly, a Dirichlet-Multinomial Distribution. The Dirichlet distribution is characterized by $\alpha$, the concentration hyperparameter vector.
The $\alpha$ vector is a hyperparameter, a parameter of a prior distribution. This vector in turn could have its own prior distribution which is called a hyperprior. We won't use a hyperprior, but instead will only specify the hyperparameters.
The hyperparameter vector can be thought of as pseudo-counts, which we use to show our prior belief for the prevalence of each species. If want a uniform hyperparameter reflecting that we believe the chance of observing any species is the same we set each element of alpha equal such as $\alpha = [1, 1, 1]$. We can increase our decrease the effect of the priors by increasing or decreasing the values. This can be useful when we have more or less confidence in our prior beliefs. We'll see the effects of adjusting the hyperparameters in the notebook, but will mainly work with $\alpha = [1, 1, 1]$.
import pandas as pd
import numpy as np
# Visualizations
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('fivethirtyeight')
plt.rcParams['font.size'] = 22
%matplotlib inline
from matplotlib import MatplotlibDeprecationWarning
import warnings
warnings.filterwarnings('ignore', category=FutureWarning)
warnings.filterwarnings('ignore', category=MatplotlibDeprecationWarning)
import pymc3 as pm
import arviz
#helper functions
'''Functions for drawing contours of Dirichlet distributions.'''
# Author: Thomas Boggs
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.tri as tri
from functools import reduce
_corners = np.array([[0, 0], [1, 0], [0.5, 0.75**0.5]])
_triangle = tri.Triangulation(_corners[:, 0], _corners[:, 1])
_midpoints = [(_corners[(i + 1) % 3] + _corners[(i + 2) % 3]) / 2.0 \
for i in range(3)]
def xy2bc(xy, tol=1.e-3):
'''Converts 2D Cartesian coordinates to barycentric.
Arguments:
`xy`: A length-2 sequence containing the x and y value.
'''
s = [(_corners[i] - _midpoints[i]).dot(xy - _midpoints[i]) / 0.75 \
for i in range(3)]
return np.clip(s, tol, 1.0 - tol)
class Dirichlet(object):
def __init__(self, alpha):
'''Creates Dirichlet distribution with parameter `alpha`.'''
from math import gamma
from operator import mul
self._alpha = np.array(alpha)
self._coef = gamma(np.sum(self._alpha)) / \
reduce(mul, [gamma(a) for a in self._alpha])
def pdf(self, x):
'''Returns pdf value for `x`.'''
from operator import mul
return self._coef * reduce(mul, [xx ** (aa - 1)
for (xx, aa)in zip(x, self._alpha)])
def sample(self, N):
'''Generates a random sample of size `N`.'''
return np.random.dirichlet(self._alpha, N)
def draw_pdf_contours(dist, border=False, nlevels=200, subdiv=8, **kwargs):
'''Draws pdf contours over an equilateral triangle (2-simplex).
Arguments:
`dist`: A distribution instance with a `pdf` method.
`border` (bool): If True, the simplex border is drawn.
`nlevels` (int): Number of contours to draw.
`subdiv` (int): Number of recursive mesh subdivisions to create.
kwargs: Keyword args passed on to `plt.triplot`.
'''
from matplotlib import ticker, cm
import math
refiner = tri.UniformTriRefiner(_triangle)
trimesh = refiner.refine_triangulation(subdiv=subdiv)
pvals = [dist.pdf(xy2bc(xy)) for xy in zip(trimesh.x, trimesh.y)]
plt.tricontourf(trimesh, pvals, nlevels, **kwargs)
plt.axis('equal')
plt.xlim(0, 1)
plt.ylim(0, 0.75**0.5)
plt.axis('off')
if border is True:
plt.hold(1)
plt.triplot(_triangle, linewidth=1)
def plot_points(X, barycentric=True, border=True, **kwargs):
'''Plots a set of points in the simplex.
Arguments:
`X` (ndarray): A 2xN array (if in Cartesian coords) or 3xN array
(if in barycentric coords) of points to plot.
`barycentric` (bool): Indicates if `X` is in barycentric coords.
`border` (bool): If True, the simplex border is drawn.
kwargs: Keyword args passed on to `plt.plot`.
'''
if barycentric is True:
X = X.dot(_corners)
plt.plot(X[:, 0], X[:, 1], 'k.', ms=1, **kwargs)
plt.axis('equal')
plt.xlim(0, 1)
plt.ylim(0, 0.75**0.5)
plt.axis('off')
if border is True:
plt.hold(1)
plt.triplot(_triangle, linewidth=1)
def annotate_plot():
cs = ['#008fd5', '#fc4f30', '#e5ae38']
animals = ['lions', 'tigers', 'bears']
for i, (xy, s) in enumerate(zip([(-0.15, 0.075), (0.38, 0.868), (0.98, 0.075)],
animals)):
plt.annotate(s, xy, color = cs[i], size = 20)
def add_legend(ax):
"""Function to add legend to plots"""
animals = ['lions', 'tigers', 'bears']
for l, a in zip(ax.get_lines()[:3], animals):
ax.plot(0, 0, label = a, c = l.get_color())
ax.legend()
def display_probs(d):
for key, value in d.items():
print(f'Species: {key:8} Prevalence: {100*value:.2f}%.')
# if __name__ == '__main__':
# f = plt.figure(figsize=(8, 6))
# alphas = [[0.999] * 3,
# [5] * 3,
# [2, 5, 15]]
# for (i, alpha) in enumerate(alphas):
# plt.subplot(2, len(alphas), i + 1)
# dist = Dirichlet(alpha)
# draw_pdf_contours(dist)
# title = r'$\alpha$ = (%.3f, %.3f, %.3f)' % tuple(alpha)
# plt.title(title, fontdict={'fontsize': 8})
# plt.subplot(2, len(alphas), i + 1 + len(alphas))
# plot_points(dist.sample(5000))
# plt.savefig('dirichlet_plots.png')
# print 'Wrote plots to "dirichlet_plots.png".'
Mainly use one verionsof the hyperparmeters where they are all 1
#observations
animals = ['lions','tigers','bears']
c = np.array([3,2,1])
#hyperparams
alphas = np.array([1,1,1])
alpha_list = [np.array([0.1, 0.1, 0.1]), np.array([1, 1, 1]),
np.array([5, 5, 5]), np.array([15, 15, 15])]
A way to get a sinlge point esimate of the prevalance is to use the expected value of the posterior for $p_{k}$. The expected value of the Dirchilet mult distr is:
$ \begin{align} [p_{i} | X,\alpha] = \frac{c_{i} + \alpha_{i}}{N + \sum_{k} \alpha_{k}} \end{align} $
Using $\alpha = [1, 1, 1]$, and the observation vector $c = [3, 2, 1]$ we get the expected prevalances:
$$p_{lions} = \frac{4}{9} = 44.4\%$$$$p_{tigers} = \frac{3}{9} = 33.3\%$$$$p_{bears} = \frac{2}{9} = 22.2\%$$Does not extend to limited amount of data. For that we turn to Bayesian modeling, and sample fromt the posterior MCMC. The expected value represent our best guess for what a single trial. We can also oalter the expected value by adjusting the weight of the hyperparm vectro
display_probs(dict(zip(animals, (alphas + c) / (c.sum() + alphas.sum()))))
display_probs(dict(zip(animals, (4/9, 3/9, 2/9))))
Let's try a difference hyperparams value and see how expected value changes
values = []
for alpha_new in alpha_list:
values.append((alpha_new + c) / (c.sum() + alpha_new.sum()))
value_df = pd.DataFrame(values, columns = animals)
value_df['alphas'] = [str(x) for x in alpha_list]
value_df
melted = pd.melt(value_df, id_vars = 'alphas', value_name='prevalence',
var_name = 'species')
plt.figure(figsize = (8, 6))
sns.barplot(x = 'alphas', y = 'prevalence', hue = 'species', data = melted,
edgecolor = 'k', linewidth = 1.5);
plt.xticks(size = 14); plt.yticks(size = 14)
plt.title('Expected Value');
Note, that as the weight of the hyperparams (these are the alpha values used from the posterior dirchilet dist) increases, the prevalcne conveges to around 1/3. Intutitvely, as we increase the number of pseudocounts, we express more strognly our inital belief of the equal prevalence scenario. With heavier priors, the data matters less.
On the hand, when we decrease the weight of the hyperparams, the data matters more and the expected value convered on the counts. The ultimate choise of the hyperparms depends on our confidence in out prior
MAP is another point estimation that is equal to the mode of the posterior distribution of the posterior distribution: $${\displaystyle \operatorname {arg\,max} \limits _{\mathbf {p} }p(\mathbf {p} \mid \mathbb {X} )={\frac {\alpha _{i}+c_{i}-1}{\sum _{i}(\alpha _{i}+c_{i}-1)}},\qquad \forall i\;\alpha _{i}+c_{i}>1}$$ Note the lapacian smoothing:
In the case of $\alpha = [1, 1, 1]$ , this turns into the frequency of observed in the data
display_probs(dict(zip(animals, (alphas + c - 1) / sum(alphas + c - 1))))
display_probs(dict(zip(animals, c / c.sum())))
values = []
for alpha_new in alpha_list:
values.append((alpha_new + c - 1) / sum(alpha_new + c - 1))
value_df = pd.DataFrame(values, columns = animals)
value_df['alphas'] = [str(x) for x in alpha_list]
value_df
melted = pd.melt(value_df, id_vars = 'alphas', value_name='prevalence',
var_name = 'species')
plt.figure(figsize = (8, 6))
sns.barplot(x = 'alphas', y = 'prevalence', hue = 'species', data = melted,
edgecolor = 'k', linewidth = 1.5);
plt.xticks(size = 14); plt.yticks(size = 14)
plt.title('Maximum A Posterior Value');
Same effect occurs! Both expected value and MAP only provide a point estimate with no uncertainty. Given the relatively few observations, there should only be a large amount of uncertaintiy...
The objective is to find the parameters of the multinomial dist, $p_{k}$ which are the probs of each species given the evidence. Recally, we are using a multinomial as our model, a Dirichilet distributino as the prior, and the specified hyperparam vector: $ \begin{align} (p|X,\alpha) \end{align} $
Build model using PyMC3, and then use a variant of Markov Chain Monte Carlo (the No-Uturn Sample specifically) to draw sample from the posterior. With enough samples, the estimate will convered on the true posterior. ALong with single point estimates (such as the mean of the sampled values), MCMC also give us the built in uncertaintiy because we get thousands of possible values from the posterior
with pm.Model() as model:
# Parameters of the Multinomial are from a Dirichlet
parameters = pm.Dirichlet('parameters', a=alphas, shape=3)
# Observed data is from a Multinomial distribution
observed_data = pm.Multinomial(
'observed_data', n=6, p=parameters, shape=3, observed=c)
print(parameters)
print(alphas)
print(c)
model
Draw 1000 from the posterior in 2 chains. WE use 500 smaples for tuning which are discarded. This meand that for each random variable in the model - the random variables are the parameters - we will have 2000 values drawn from the posterior distribution
with model:
# Sample from the posterior
trace = pm.sample(draws=1000, chains=2, tune=500,
discard_tuned_samples=True)
summary = pm.summary(trace)
summary.index = animals
summary
We can see that the mean of the samples is very close to the expected value. However, instead of just getting one number, we get a rang of uncertaintity
# Samples
trace_df = pd.DataFrame(trace['parameters'], columns = animals)
trace_df.head()
trace_df.shape
For a single point estimate, we can use the mean of the samples
# For probabilities use samples after burn in
pvals = trace_df.iloc[:, :3].mean(axis = 0)
display_probs(dict(zip(animals, pvals)))
summary.iloc[:, 3:5]
The means after sample from the posterior appraoch the expected value, but sampling gives us a range of uncertaintity. In the limit, this uncertainity should become more narrow
ax = pm.plot_posterior(trace, varnames = ['parameters'],
figsize = (20, 10),markeredgecolor = 'k');
plt.rcParams['font.size'] = 22
for i, a in enumerate(animals):
ax[i].set_title(a);
The traceplot shows a kernel density estimate (a smoothed histogram) on the elt and all the samples that were drawn on the right. We collapse, the chains on the plots (combined = True) but in realtiy we drew 2 independent chains
prop_cycle = plt.rcParams['axes.prop_cycle']
cs = [x['color'] for x in list(prop_cycle)]
ax = pm.traceplot(trace, varnames = ['parameters'], figsize = (20, 12), combined = True);
ax[0][0].set_title('Posterior Probability Distribution'); ax[0][1].set_title('Trace Samples');
ax[0][0].set_xlabel('Probability'); ax[0][0].set_ylabel('Density');
ax[0][1].set_xlabel('Sample number');
add_legend(ax[0][0])
add_legend(ax[0][1])
PyMC3 has a method for directly finidng the MAP
with model:
#Find the MAP estimate
map_ = pm.find_MAP()
display_probs(dict(zip(animals,map_['parameters'])))
WE can now use the posterior to draw samples from the data.
with model:
samples = pm.sample_ppc(trace, samples = 1000)
dict(zip(animals, samples['observed_data'].mean(axis = 0)))
Show sampling distributions
sample_df = pd.DataFrame(samples['observed_data'],columns=animals)
plt.figure(figsize = (22, 8))
for i, animal in enumerate(sample_df):
plt.subplot(1, 3, i+1)
sample_df[animal].value_counts().sort_index().plot.bar(color = 'm');
plt.xticks(range(7), range(7), rotation = 0);
plt.xlabel('Number of Times Seen'); plt.ylabel('Occurences');
plt.title(f'1000 Samples for {animal}');
draw_pdf_contours(Dirichlet(6 * alphas))
annotate_plot()
draw_pdf_contours(Dirichlet(6 * pvals))
annotate_plot();
draw_pdf_contours(Dirichlet([0.1, 0.1, 0.1]))
annotate_plot();
In order to find what we can expect from the next observation, we draw a single sample of 10000 times form a multinomial distribution. The probaility of seeing each species is proportional to that obtained from the sampling.
#draw from the multinomial posterior
next_obs = np.random.multinomial(n=1,pvals=pvals,size=10000)
# Data manipulation
next_obs = pd.melt(pd.DataFrame(next_obs, columns = ['Lions', 'Tigers', 'Bears'])).\
groupby('variable')['value'].\
value_counts(normalize=True).to_frame().\
rename(columns = {'value': 'total'}).reset_index()
next_obs = next_obs.loc[next_obs['value'] == 1]
# Bar plot
next_obs.set_index('variable')['total'].plot.bar(figsize = (8, 6));
plt.title('Next Observation Likelihood');
plt.ylabel('Likelihood'); plt.xlabel('');
Looks like a lion is next!
next_obs.iloc[:, [0, 2]]
If we have additional observations, more trips to the preserve, we can easily do the sampling taking into accound this data.
c = np.array([[3,2,1],
[2,3,1],
[3,2,1],
[2,3,1]])
c
c.shape
with pm.Model() as model:
# Parameters are a dirichlet distribution
parameters = pm.Dirichlet('parameters', a=alphas, shape=3)
# Observed data is a multinomial distribution
observed_data = pm.Multinomial(
'observed_data', n=6, p=parameters, shape=3, observed=c)
trace = pm.sample(draws=1000, chains=2, tune=500, discard_tuned_samples=True)
summary = pm.summary(trace)
summary.index = animals
summary
The uncertaintity of the prevalence of bears has decreased and the prevalence of the lions and tigers is nearly identical as expected from the data. As we gather more data, we can incorporate it into the model to get more accurate estimates. A larger amount of data will mean less uncertainty in the sampling as well.
ax = pm.plot_posterior(trace, varnames = ['parameters'],
figsize = (20, 10), markeredgecolor= 'k')
plt.rcParams['font.size'] = 22
for i, a in enumerate(animals):
ax[i].set_title(a)
plt.suptitle('Posterior with More Observations', y = 1.05)
WE can also increase or decrease our confidence in out inital belief (our priors) by changing the weighting of the hyperparameter vecotor. Initially we use all 1's but we can reduce or increase this!
# observations
animals = ['lions', 'tigers', 'bears']
c = np.array([3, 2, 1])
def sample_with_priors(alphas):
"""Sample with specified hyperparameters"""
with pm.Model() as model:
# Probabilities for each species
parameters = pm.Dirichlet('parameters', a=alphas, shape=3)
# Observed data is a multinomial distribution with 6 trials
observed_data = pm.Multinomial(
'observed_data', n=6, p=parameters, shape=3, observed=c)
trace = pm.sample(draws=1000, chains=2, tune=500, discard_tuned_samples=True)
return trace
trace_dict = {}
for alpha_array in [np.array([0.1, 0.1, 0.1]), np.array([1, 1, 1]),
np.array([5, 5, 5]), np.array([15, 15, 15])]:
trace_dict[str(alpha_array[0])] = sample_with_priors(alpha_array)
plt.figure(figsize = (20, 24))
for ii, (alpha, trace) in enumerate(trace_dict.items()):
plt.subplot(4, 1, ii + 1)
array = trace['parameters']
for jj, animal in enumerate(animals):
sns.kdeplot(array[:, jj], label = f'{animal}')
plt.legend();
plt.xlabel('Probability'); plt.ylabel('Density')
plt.title(f'Alpha = {alpha}');
plt.xlim((0, 1));
plt.tight_layout()
plt.show()
As the weight increases, the estimated prevalence tends to collapse to 1/3 for each species. These results are inuitive; We have a greater belief that the prevalences are equivalent, then the limited data we collected should not change our beliefs greatly
Next we compare the hyperparameters $\alpha = [0.1, 0.1, 0.1]$ with $\alpha = [15, 15, 15]$
prior = '0.1'
trace = trace_dict[prior]
ax = pm.plot_posterior(trace, varnames = ['parameters'],
figsize = (20, 10), markeredgecolor = 'k')
plt.rcParams['font.size'] = 22
for i, a in enumerate(animals):
ax[i].set_title(a)
plt.suptitle(f'{prior} Prior', y = 1.05)
summary = pm.summary(trace)
summary.index = animals
summary
prior = '15'
trace = trace_dict[prior]
ax = pm.plot_posterior(trace, varnames = ['parameters'],
figsize = (20, 10), markeredgecolor = 'k')
plt.rcParams['font.size'] = 22
for i, a in enumerate(animals):
ax[i].set_title(a)
plt.suptitle(f'{prior} Prior', y = 1.05)
summary = pm.summary(trace)
summary.index = animals
summary
Overall, the choise of the hyperpriors depences on our confidence in out initial belief. If we have good reason to believe all the animals are represneted at the same frequency, then we should increase the weight of the priors!
prior = '1'
trace = trace_dict[prior]
summary = pm.summary(trace)
summary.index = animals
summary
ax = pm.plot_posterior(trace, varnames = ['parameters'],
figsize = (20, 10), markeredgecolor = 'k');
plt.rcParams['font.size'] = 22
for i, a in enumerate(animals):
ax[i].set_title(a)
plt.suptitle('Posterior Plot', y = 1.05)
22.9 bases on the sampling
The best parts about Bayesian Inference are the incorporation of priors and the uncertainty inherent in the methods. With the scant evidence, we can provide estimates, but only with a large amount of uncertainty. More visits to the wildlife preserve would certainly help to clear up the matter!
In this notebook, we were able to explore the basics of Bayesian modeling and probabilistic programming. Although this was only a simple problem, we can use the concepts here to build more complex models.