## Monday, 25 September 2017

### Small Workshop on JAGS and Bayesian Inference

Download JAGS files:

- for Windows: R_jags_tutorial_MSDOS.R

- for Mac/Linux: R_jags_tutorial.R

Current version of presentation (PDF)

- Bayesian JAGS tutorial presentation

## Wednesday, 6 September 2017

### Workshop on Bayesian Capture-Recapture: 2017 Conference of the Society for Marine Mammalogy, Halifax, NS, Canada

More information about registration and pricing can be found here. The workshop is only available to conference attendees.

**Description**. This workshop is run in response to recent advances in CR modelling showing that there are a number of advantages to implementing Bayesian CR models over other popular methods. These advantages will be highlighted and discussed throughout the workshop.

The workshop will include an introduction to Bayesian theory and beginner/intermediate tutorials on R and JAGS. JAGS is a popular and flexible Bayesian scripting language that can code CR models and much more! The workshop is suitable for students, researchers, professors, veteran Program MARK users, and anyone who is already familiar with CR and would like to learn how to implement Bayesian CR models. Intermediate familiarity with R and CR is expected.

Outline of the workshop: i) Bayesian philosophy; ii) introduction to the BUGS/JAGS language; iii-v) practical tutorials to implement common CR models for cetaceans (POPAN, PCRD, individual-covariate models); vi) Bayesian model-selection; vii) Hierarchical Bayesian models; viii) open-session to be determined by pre-workshop participant outreach (spatial capture-recapture, cloud-computing and high-performance computing, multi-event).

You can register for this workshop online at the conference website http://www.smmconference.org/.

We welcome participants to express their interest in any particular topics to be covered or discussed during the workshop and invite questions on and submissions of difficult data sets, which we may be able to work on and discuss in the workshop.

If you have any inquiries about the workshop or its content, please do not hesitate to contact Dr. Robert Rankin at robertw.rankin@gmail.com

## Tuesday, 10 January 2017

### A Primer on the Bias-Variance Trade-off in ecology

*This blog post is adapted from the Appendix of Rankin RW. 2016. EM and component-wise boosting for Hidden Markov Models: a machine-learning approach to capture-recapture. bioRxiv. doi: 10.1101/052266.*

I will use simulations to illustrate the "bias-variance trade-off" for ecologists, one of the most important ideas in statistical modelling of the past half-century. The goal is to show how the AICc and another model-selection/model-averaging technique called "boosting" each negotiate the trade-off in order to improve estimation: specifically, to minimize the **expected error** of estimating survival *phi* over T capture periods. The trade-off is fundamental to understanding the optimality of Frequentist shrinkage estimators and AIC model-selection. The illustrations are inspired by figure 6.5 in Murphy 2012, but are adapted to a type of capture-mark-recapture model called the Cormack-Jolly-Seber model.

The trade-off is an old idea without a citable origin (Geman et al. 1992 is often considered a definitive reference, but the phenomenon was clearly discussed as early as 1970 by Hoerl in the context of Ridge Regression, and likely earlier). Despite being an old and fundamental concept of statistical estimation, I have noticed that it is poorly understood among academics and government scientists. In particular, it is my experience that ecologists are unduly wedded to the idea of being unbiased (in estimation), such that when they are presented with visual and quantitative evidence about the optimality of biased shrinkage estimators, they recoil at the sight of systematic bias and ignore the crucial role of variance. Of course, bias is not desirable in and of itself, but so long as the bias goes to zero at a rate proportional to that of the variance, we may be able to improve our overall estimation performance by incurring a little bias.

In the following simulations, the goal is to minimize the **expected error** of estimating survival, as quantified by the Mean Square Error (MSE). The MSE is a population-level, abstract quantity that can only be measured in simulations, when we know the "true" process. It is Frequentist in the sense that we hope to minimize the error over all possible data-sets that one might sample from the true population **Y**. These multiple realizations are shown as grey lines in Figures 1 and 2. Of course, an analyst only has one dataset, and the goal is to get the estimates as close as possible to the truth.

The bias-variance trade-off arises from a classic decomposition of the expected error:

`MSE = (E[phi_hat] - phi_true)^2 + Var[phi_hat] + constant`

Figure 1 shows this decomposition. The first term is the squared difference between the expected estimate and the true value, i.e. the squared bias. This difference is visualized as the red polygon in Figure 1. In the same figure, the bias manifests as shrinkage from the true red line towards a flat global mean. Quantifying the bias requires knowledge of the truth *phi_true*, and is therefore inaccessible in real-life situations. The second term is the variance, which does not depend on knowledge of the truth. Rather, it arises from the vagaries of random sampling as well as the complexity of the estimation procedure: overly complex models which "over-fit" one dataset will vary wildly when fitted to a new dataset sampled from the same population. The variance can be visualized as the spread of the grey lines, or the green polygon in Figure 1.

The MSE decomposition has a naive reading: to optimize our estimation performance, we should reduce the bias and/or the variance. Most ecologists clearly see the value of tackling either of these two terms. But the nature of a trade-off has a more elusive importance: we cannot, in general, minimize both terms for a given sample size, and we may deliberately increase one term in order to decrease the other. Shrinkage estimators incur a little bias but have lower variance (i.e., the red polygon is bigger but the green polygon is smaller). This strategy results in much smaller MSE values than complex unbiased estimators. In contrast, the MLEs of the complex full-model are unbiased but typically have very high variance. This strategy is often worse at minimizing the MSE for small-to-moderate sample sizes.

The following simulations show how different statistical methods negotiate the bias-variance trade-off differently. Imagine an analyst who is confronted with four different methods to estimate survival. The first is estimation by Maximum Likelihood using the full-model *p(t)phi(t)*. The second method is AICc model-selection, and the third is AICc model-averaging; both use the following fixed-effects models: *p(.)phi(.)*, *p(t)phi(.)*, *p(.)phi(t)*, and *p(t)phi(t)*, with obvious constraints on the last terms. The fourth method is called CJSboost, from statistical boosting. It uses submodels called "base-learners" equivalent to the aforementioned fixed-effect models (but without the previous constraints). The AICc methods should theoretically do best, because they are fundamentally motivated by minimizing an objective function closely related to the MSE, called the KL-loss (see Akaike 1979 and Akaike 1998). Likewise, CJSboost tries to minimize a related generalization error, the negative Expected log-Likelihood, which is approximated by bootstrapping the capture-histories.

The fake data-sets were generated as follows. The time-varying survival values were `phi_t = cos((t-2.3)/1.2)/11 + 0.75`. The time-varying capture-probabilities *p_t* were drawn from a beta distribution with shape parameters A=12 and B=12, giving an average capture-probability of 0.5. The *p_t* values were the same for all simulations. MLE and AICc analyses were run in Program MARK. For CJSboost, a ten-times 70-fold bootstrap-validation exercise was run per dataset to tune the CJSboost regularization parameters. The CJSboost program is available on my Github site. The simulations and analyses were repeated 40 times for three scenarios pertaining to the number of capture-histories *n = {50, 200, 800}*.
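The data-generating process can be sketched as follows (a Python translation for illustration; the number of capture periods T and the single-release design are my simplifying assumptions, not details stated in the post):

```python
import numpy as np

rng = np.random.default_rng(42)
T = 10  # assumed number of capture periods (not stated in the post)

t = np.arange(1, T + 1)
phi_t = np.cos((t - 2.3) / 1.2) / 11 + 0.75  # true time-varying survival
p_t = rng.beta(12, 12, size=T)               # capture probabilities, mean 0.5

def simulate_histories(n, phi, p, rng):
    """Simplified CJS-type simulation: all n animals are released at t=1,
    survive between periods with probability phi[j], and, if alive,
    are detected with probability p[j]."""
    T = len(p)
    alive = np.ones(n, dtype=bool)
    hist = np.zeros((n, T), dtype=int)
    hist[:, 0] = 1
    for j in range(1, T):
        alive &= rng.random(n) < phi[j - 1]
        hist[:, j] = (alive & (rng.random(n) < p[j])).astype(int)
    return hist

hist = simulate_histories(50, phi_t, p_t, rng)  # smallest scenario, n = 50
```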

The results clearly show the trade-off (Figure 2). At a high sample size (*n=800*), the shrinkage estimator CJSboost has the lowest MSE and therefore wins at estimating survival. However, it has the highest bias. How can it be considered a better estimator than the other methods when it is biased? The answer is obvious when looking at the grey lines in Figure 2, where each line is an estimate of *phi* from an independent realization of the data: compared to the other methods, each grey line from CJSboost is much more likely to be close to the truth, despite systematic bias. In contrast, **using the MLEs, one can only claim to be unbiased over all possible realizations of the data**, as shown by the closeness of the dashed black line to the true red line. But for any one realization (a single grey line), the MLEs can be very far from the truth due to much higher variance.

At smaller sample sizes, the bias becomes much more extreme for both AICc methods and CJSboost. For the AICc methods, the model with the most support is often *phi(.)*, in which case the estimates are a single flat line. This is also the case in CJSboost, where shrinkage is so extreme as to force a flat line. Therefore, at low sample sizes, we are much better off, in terms of MSE, using the flat-lined *phi(.)* estimates than the full-model MLEs, which vary so wildly as to be useless.

This primer is meant to illustrate the role of bias and variance in estimation error. Simulations show how shrinkage estimators (CJSboost) and model-selection (by AICc) each negotiate the trade-off between bias and variance to try to minimize the expected error. CJSboost does particularly well by incurring a little bias.

The original article can be accessed at: http://dx.doi.org/10.1101/052266. Original blog post at mucru.org/a-primer-on-the-bias-variance-trade-off-in-ecology/.

## Tuesday, 10 May 2016

### New pre-print: EM and component-wise boosting for Hidden Markov Models: a machine-learning approach to capture-recapture

A new pre-print article is available online at http://dx.doi.org/10.1101/052266. The study proposes a new way to fit **capture-recapture models based on *boosting***: a machine-learning method which iteratively fits simple, non-parametric *base-learners* to build a strong prediction function. The ensemble method is a type of "multi-model inference" and "model-parsimony" technique, not unlike model-averaging by AICc. This new method, called CJSboost, is motivated by several deficiencies of the AICc model-selection approach, such as over-fitting the data and a tendency to produce ridiculous estimates, such as 100% survival. In contrast, boosting has the following benefits:

- **variable selection** and step-wise multimodel inference (without the sometimes-impossible task of fitting all possible fixed-effects models, as in AIC-based model averaging);

- **regularization** and sparse estimation, which deflate the influence of unimportant covariates (especially important during the current crisis of reproducibility);

- avoidance of **inadmissible values** (e.g., survival = 100%);

- an **extensible** framework (see the wide variety of base-learners available in the "mboost" R package);

- an emphasis on **predictive performance**.

One disadvantage of the machine-learning paradigm for capture-recapture is the computational burden of running 50 or 100 bootstrap-validation exercises to find the optimal regularization parameters that minimize the generalization error.

### The Prediction Perspective

Capture-recapture practitioners are perennially obsessed with model-selection and model-parsimony: the challenge of revealing important patterns in the data without being tricked by frivolous false discoveries, i.e., *not* over-fitting models. The machine-learning community has addressed this challenge through the idea of *generalization error*: minimizing the cost of bad predictions on new, unseen data. Models that are too simple (e.g., the classic phi(.)p(.) model) are unlikely to make good predictions, whereas overly complex models are likewise unlikely to generalize to new data.

The boosting framework controls the complexity of the model (called *regularization*) by trying to minimize the **generalization error** (aka expected loss), and hence avoid overfitting the data. Interestingly, **most capture-recapture practitioners have been implicitly using a prediction criterion, the *AIC*, to rank and select models.** Few ecologists seem to recognize the connections between AIC-selection and prediction, and this manuscript tries to make these connections more obvious. However, while the AIC was developed in the context of General Linear Models with Gaussian error, the CJSboost method proposed in this manuscript is crafted specifically for capture-recapture models.

### Sparsity and Prediction

One of the philosophical divisions between **prediction-optimal** regularizers, like boosting or the AICc, and other criteria, like the BIC/SBC, is belief in a **sparse** "true model": whether there is a small number of truly influential variables with large/medium effects, or whether there are an infinite number of influential variables, each with a decreasingly important influence on our data. This division is rooted in whether we want to **minimize our risk** of making bad numerical predictions, or whether we want to **recover the "truth"**. The AIC and CJSboost adopt the former perspective: they both reduce the influence of unimportant variables, but some positive weight is still placed on unimportant variables (this is the consequence of being prediction-optimal!).

If prediction is not one's focus, and one is interested in finding a sparse set of truly influential variables, then the CJSboosted coefficients may be "hard-thresholded": discard unimportant covariates and rerun the algorithm. But how to judge which covariates are important? In the manuscript, I describe the technique of "stability selection" (Meinshausen and Bühlmann, 2010) for estimating (approximate) inclusion probabilities, based on repeatedly resampling the data and re-running the CJSboost algorithm. Probabilities lead to straightforward inference: covariates with high inclusion probabilities are *probably* important; covariates with low probabilities are probably *not* important. An interesting property of l1-regularizers, like boosting algorithms, is that by hard-thresholding the covariates with stability-selection probabilities, we can transform a prediction-optimal procedure into a "model-selection consistent" procedure. This may not be exactly true for capture-recapture models, but the manuscript explores the issue with simulations.

The animation below shows 30 simulations and their stability-selection pathways. Each slide is a different simulated dataset with 3 truly influential covariates (in red) and 18 other frivolous covariates, included to try to trick the algorithm. The pathways show that as the regularization gets *weaker* (larger m, to the right), there is a larger probability that any covariate will be selected by the algorithm. At strong regularization (small m, left), only the 3 most influential covariates (in red) enter the algorithm. Visually, this is an interesting way to discriminate between important and unimportant covariates, and it can also help one achieve model-selection consistency.

Simulation: How to discriminate between important covariates (in red) and unimportant covariates (in grey) with stability-selection curves. By repeatedly re-sampling the capture-histories (bootstrapping) and re-training the CJSboost model, we can estimate the (approximate) probability that a covariate is included in the model.
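The mechanics of stability selection are simple enough to sketch generically. The toy below (Python, not the CJSboost code; a plain linear model and component-wise L2-boosting stand in for the capture-recapture setting, and all sizes and effect values are invented) bootstraps the data, re-runs a boosting-style selector, and tallies how often each covariate enters the model:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 21
X = rng.standard_normal((n, k))
beta = np.zeros(k)
beta[:3] = [1.5, -1.0, 0.8]    # 3 truly influential covariates, 18 frivolous
y = X @ beta + rng.standard_normal(n)

def boost_select(X, y, m_stop=15, nu=0.1):
    """Component-wise L2-boosting: at each of m_stop steps, fit the single
    covariate that most reduces the residual sum of squares; return the set
    of covariates ever selected. m_stop plays the role of the
    regularization parameter m in the pathways above."""
    resid = y - y.mean()
    denom = (X ** 2).sum(axis=0)
    selected = set()
    for _ in range(m_stop):
        b = X.T @ resid / denom             # univariate least-squares fits
        j = int(np.argmax(b ** 2 * denom))  # best drop in residual SS
        resid -= nu * b[j] * X[:, j]
        selected.add(j)
    return selected

# Stability selection: bootstrap the rows, re-run, tally inclusion frequency.
B = 50
freq = np.zeros(k)
for _ in range(B):
    idx = rng.integers(0, n, n)
    for j in boost_select(X[idx], y[idx]):
        freq[j] += 1
freq /= B
print(np.round(freq, 2))  # inclusion frequencies; true covariates dominate
```

With strong regularization (small `m_stop`) the frivolous covariates rarely enter, which is exactly the left-hand side of the pathways in the animation.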

### Online Tutorial

The manuscript includes an online tutorial in R at http://github.com/faraway1nspace/HMMboost. Interested readers can step through the analyses in the manuscript.

### Peer-review

Readers should note that the manuscript is only a preprint: it is currently in peer-review.

```
@article{Rankin052266,
author = {Rankin, Robert William},
title = {EM and component-wise boosting for Hidden Markov Models: a machine-learning approach to capture-recapture},
year = {2016},
doi = {10.1101/052266},
publisher = {Cold Spring Harbor Labs Journals},
abstract = {This study presents a new boosting method for capture-recapture models, rooted in predictive-performance and machine-learning. The regularization algorithm combines Expectation-Maximization and boosting to yield a type of multimodel inference, including automatic variable selection and control of model complexity. By analyzing simulations and a real dataset, this study shows the qualitatively similar estimates between AICc model-averaging and boosted capture-recapture for the CJS model. I discuss a number of benefits of boosting for capture-recapture, including: i) ability to fit non-linear patterns (regression-trees, splines); ii) sparser, simpler models that are less prone to over-fitting, singularities or boundary-value estimates than conventional methods; iii) an inference paradigm that is rooted in predictive-performance and free of p-values or 95\% confidence intervals; and v) estimates that are slightly biased, but are more stable over multiple realizations of the data. Finally, I discuss some philosophical considerations to help practitioners motivate the use of either prediction-optimal methods (AIC, boosting) or model-consistent methods. The boosted capture-recapture framework is highly extensible and could provide a rich, unified framework for addressing many topics in capture-recapture, such as spatial capture-recapture, individual heterogeneity, and non-linear effects.},
URL = {http://dx.doi.org/10.1101/052266},
eprint = {http://www.biorxiv.org/content/early/2016/05/09/052266.full.pdf},
journal = {bioRxiv}
}
```

## Monday, 21 March 2016

### New publication: A Bayesian Hierarchical Pollock's closed robust design

Rankin, R. W., Nicholson, K. E., Allen, S. J., Krützen, M., Bejder, L., & Pollock, K. H. (2016). A full-capture Hierarchical Bayesian model of Pollock's Closed Robust Design and application to dolphins. Frontiers in Marine Science, 3(25). doi:10.3389/fmars.2016.00025

We are pleased to announce our new publication on Shark Bay bottlenose dolphins, which benchmarks model-averaging in Program MARK against a Bayesian Hierarchical model for temporary-migration Robust Design mark-recapture models. See below for the link to the **free full-text PDF** and an **online R/JAGS tutorial for Bayesian mark-recapture**.

### Alternative to AIC Model-Averaging

The paper will be of interest to cetacean researchers who use **Program MARK** for temporary-migration Robust Design models. In particular, we show that a Hierarchical Bayesian model can **yield similar estimates as model-averaging by AICc**, the latter being the current best practice to deal with the vast number of 'fixed-effects' models that one typically considers. Model-averaging and Bayesian frameworks share some philosophical underpinnings, such as conditioning on the data (Burnham and Anderson 2014). However, the HB framework is also highly extensible and can deal with other challenges where the AIC is undefined, such as random-effects and individual-level heterogeneity in capture-probabilities.

### Mark-Recapture and low-sample sizes: the Bayesian Answer

Bayesian models are a solid answer to a perennial dilemma among cetacean researchers: photo-ID datasets are typically *sparse* or have *low sample sizes*. At the same time, researchers typically want complex, data-hungry models to increase ecological realism. For example, a simple temporary-migration model or individual-heterogeneity model will demand >30-70 variables for a mid-sized dataset. Frequentist and AICc-based inference will be overly confident in such situations and yield ridiculous estimates, such as 100% detection, 0% migration, or 100% survival, or just fail altogether. Alternatively, Hierarchical Bayesian models **provide exact inference under low sample sizes**: they just depend more on the prior distributions, which, if set up thoughtfully, are more *conservative*, make *better predictions*, and can automatically safeguard against over-parametrization (Berger 2006, Gelman 2013).

### Individual Heterogeneity

The Hierarchical Bayesian model can accommodate individual-level heterogeneity in detection probabilities by modelling them as draws from a shared *distribution*, without a large increase in complexity. In contrast, the comparable fixed-effect version in Program MARK, the 'two-point finite mixture model', typically yields over-parametrized models and unreliable capture-estimates (e.g., p=1).

### R and JAGS code

See our online R/JAGS tutorial at Github for code to run the Hierarchical Bayesian Pollock's Closed Robust Design. The tutorial includes an example photo-ID bottlenose dolphin dataset from Nicholson et al. 2012. We use the flexible BUGS-like Bayesian syntax called "JAGS", which makes Bayesian models accessible to almost anyone with rudimentary scripting skills.

### Key Findings

- the Hierarchical Bayesian model had **better estimation performance** than equivalent fixed-effects Maximum-Likelihood estimation (in MARK), mainly due to the latter's susceptibility to singularities (although there was no clear champion);

- hierarchical priors can **serve as a type of multi-model inference**;

- individual **heterogeneity in detection probabilities** can lead to an 8-24% increase in bottlenose dolphin abundance estimates, as compared to ML and Bayesian models that assume homogeneous detection probabilities.

### But Aren't Bayesians Biased?

Some MARK users who are new to Bayesian inference may worry about prior information and the inherent bias of subjective Bayesian models. A subjective **Prior** contains information. This information competes with the information in the data to influence the **Posterior** distribution (the probability distribution used to make conclusions about a variable of interest, like population abundance or survival). The more information in a Prior, the more a posterior expectation will be pulled away from the "objective" MLE and towards the prior expectation; conversely, the more information in our data (e.g., a *large* sample size), the less influential the prior becomes, and inferences are practically the same between ML-based and Bayesian paradigms.

But there is strong evidence from the machine-learning and predictive-analytics community that slightly *conservatively* biased models yield better predictions, especially in the face of low sample sizes and very complex models (Murphy KP, 2012). In the Learning community, this is called "**regularization**", as in the Lasso, Ridge Regression, or Boosting: these techniques impose a penalty on model complexity and favour simpler models than "objective" ML estimation. Interestingly, many of the Learning community's regularization techniques are actually a type of Bayesian model (Hooten and Hobbs 2015).

The phenomenon is nicely illustrated with a coin toss: imagine you want to estimate the fairness of a coin (H vs T), but you flip it only 3 times. Say you observe TTT (a rare, but not impossible, event). The Maximum Likelihood Estimate would be that the probability of a tail is 100%, and the coin is definitely not fair. Of course, we know this is a terrible estimate and a silly example, because we have solid prior information about the behaviour of coins. However, the sad fact is that many dolphin mark-recapture researchers are running similarly hopeless analyses, with small sample sizes and overly complex Frequentist models which estimate 100% detection probability, 100% survival, or 0% movement.

In contrast, a little bit of skepticism encoded in a Bayesian prior (like Beta(1,1)) can temper our estimates. For our coin example, the posterior expectation would say that the probability of a tail is just 80%, given the observed sequence TTT. That's still not the true 0.5, but it is a better prediction than the frequentist's 100%.
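The arithmetic behind that 80% figure is just the Beta-Binomial conjugate update, quick to check (in Python, for illustration; the numbers are exactly those of the coin example):

```python
from fractions import Fraction

tails, heads = 3, 0                      # observed sequence: TTT

# Maximum Likelihood Estimate: proportion of tails
mle = Fraction(tails, tails + heads)     # 3/3 = 1, i.e. "100% tails"

# Beta(1,1) prior + binomial likelihood -> Beta(1 + tails, 1 + heads) posterior
a, b = 1 + tails, 1 + heads
posterior_mean = Fraction(a, a + b)      # 4/5, i.e. the 80% in the text

print(float(mle), float(posterior_mean))  # 1.0 0.8
```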

### Abstract

We present a Hierarchical Bayesian version of Pollock's Closed Robust Design for studying the survival, temporary-migration, and abundance of marked animals. Through simulations and analyses of a bottlenose dolphin photo-identification dataset, we compare several estimation frameworks, including Maximum Likelihood estimation (ML), model-averaging by AICc, as well as Bayesian and Hierarchical Bayesian (HB) procedures. Our results demonstrate a number of advantages of the Bayesian framework over other popular methods. First, for simple fixed-effect models, we show the near-equivalence of Bayesian and ML point-estimates and confidence/credibility intervals. Second, we demonstrate how there is an inherent correlation among temporary-migration and survival parameter estimates in the PCRD, and while this can lead to serious convergence issues and singularities among MLEs, we show that the Bayesian estimates were more reliable. Third, we demonstrate that a Hierarchical Bayesian model with carefully thought-out hyperpriors, can lead to similar parameter estimates and conclusions as multi-model inference by AICc model-averaging. This latter point is especially interesting for mark-recapture practitioners, for whom model-uncertainty and multi-model inference have become a major preoccupation. Lastly, we extend the Hierarchical Bayesian PCRD to include full-capture histories (i.e., by modelling a recruitment process) and individual-level heterogeneity in detection probabilities, which can have important consequences for the range of phenomena studied by the PCRD, as well as lead to large differences in abundance estimates. For example, we estimate 8%-24% more bottlenose dolphins in the western gulf of Shark Bay than previously estimated by ML and AICc-based model-averaging. Other important extensions are discussed. 
Our Bayesian PCRD models are written in the BUGS-like JAGS language for easy dissemination and customization by the community of capture-mark-recapture practitioners.

### Import into Reference Manager

Copy the following Bibtex into your favourite Reference Manager.

```
@article{rankin_full-capture_2016,
title = {A full-capture {Hierarchical} {Bayesian} model of {Pollock}'s {Closed} {Robust} {Design} and application to dolphins},
volume = {3},
url = {http://journal.frontiersin.org/article/10.3389/fmars.2016.00025},
doi = {10.3389/fmars.2016.00025},
abstract = {We present a Hierarchical Bayesian version of Pollock's Closed Robust Design for studying the survival, temporary migration, and abundance of marked animals. Through simulations and analyses of a bottlenose dolphin photo-identification dataset, we compare several estimation frameworks, including Maximum Likelihood estimation (ML), model-averaging by AICc, as well as Bayesian and Hierarchical Bayesian (HB) procedures. Our results demonstrate a number of advantages of the Bayesian framework over other popular methods. First, for simple fixed-effect models, we show the near-equivalence of Bayesian and ML point-estimates and confidence/credibility intervals. Second, we demonstrate how there is an inherent correlation among temporary migration and survival parameter estimates in the PCRD, and while this can lead to serious convergence issues and singularities among MLEs, we show that the Bayesian estimates were more reliable. Third, we demonstrate that a Hierarchical Bayesian model with carefully thought-out hyperpriors, can lead to similar parameter estimates and conclusions as multi-model inference by AICc model-averaging. This latter point is especially interesting for mark-recapture practitioners, for whom model-uncertainty and multi-model inference have become a major preoccupation. Lastly, we extend the Hierarchical Bayesian PCRD to include full-capture histories (i.e., by modelling a recruitment process) and individual-level heterogeneity in detection probabilities, which can have important consequences for the range of phenomena studied by the PCRD, as well as lead to large differences in abundance estimates. For example, we estimate 8\%-24\% more bottlenose dolphins in the western gulf of Shark Bay than previously estimated by ML and AICc-based model-averaging. Other important extensions are discussed. 
Our Bayesian PCRD models are written in the BUGS-like JAGS language for easy dissemination and customization by the community of capture-mark-recapture practitioners.},
number = {25},
journal = {Frontiers in Marine Science},
author = {Rankin, Robert W. and Nicholson, Krista E. and Allen, Simon J. and Krützen, Michael and Bejder, Lars and Pollock, Kenneth H.},
year = {2016}
}
```

## Friday, 26 February 2016

### Hierarchical Bayesian: Automatic Occam's Razor

*norm*! This new 'digital dark age' has many culprits. One contributor is, I think, a misuse and misunderstanding of the dominant statistical paradigm called "Frequentism". Frequentism is a great intellectual achievement, with such lofty ideals as 'objectivity', a.k.a. letting the data speak for itself. Let's not inject our own hopes and desires into our scientific research. Sounds great, doesn't it? Why would we want otherwise?

Well, for one thing, researchers (*especially* dolphin researchers) rarely have enough data to use Frequentist tools. Thus, we violate the core principle of Frequentism, from which its name is derived:

In frequentism, it is the observed *data* that is the random variable. Had we repeated the experiment in a parallel universe, we would get different data. We don't care about the actual data values (like an individual dolphin's measurements); instead, we care about the true population-level *process* which gives rise to this data. We try to make inferences about this true process by *sampling* from the data distribution, and then try to generalize over the entire data distribution.

The core principle is that the Frequentist is trying to use a small *sample* to characterize the entire *data distribution*. This requires a lot of data. And when we lack such data (because, heck, dolphins are difficult to find!), Frequentist estimates start to take on ridiculous values. For example, I can't tell you how many times a dolphin researcher using a mark-recapture model has estimated a **survival of 100%**. I am serious! There is a big problem with our supposedly 'objective' statistics when they lead us to believe that dolphins are *immortal*. But here is the catch: my knowledge that dolphins age and die is a type of "prior information", and using this prior biological knowledge leads us to a rival paradigm called Bayesianism.

I won't get into the philosophy of Bayesianism (partially because I don't quite understand it). The best I can do is imagine a Bayesian as a poker player: they don't consider the data in their hand (e.g., their current cards) to be *random*; they consider it a *given*, and they want to make the best predictions possible about the next cards to be drawn, conditional on the evidence in their hand. Seems reasonable enough, but it turns out this is in direct opposition to Frequentism.

**Parsimony and Hierarchical Bayes**

But one nice feature of Bayesianism is that we can encode into our statistical models an old ideal from Occam: "parsimony". Basically, we say, "Don't tell me a dolphin is immortal unless you have *a lot* of data!" Technically, we can use "hyper-prior" distributions on parameters like survival to force simplicity in lieu of strong evidence, but allow for more complex models if there is a lot of data and a larger sample size. This is demonstrated nicely in the following animation.

Look at the red Maximum Likelihood Estimates on the right (a Frequentist tool). They are estimating a probability *p* from a series of 0-1 outcomes (like the number of dolphins captured out of N available), done in parallel for 10 different "groups" (like 10 independent experiments). Notice that the ML Estimates stay the same over a wide range of data/evidence (even though the "true" mean is around 0.5), and they vary from p=0.2 to p=1. However, the Hierarchical Bayesian posteriors (in purple) behave very differently. When the sample size is small (5 observations per group), each group is *shrunk* to the overall mean at 0.5. It is only as the sample size increases that each group is allowed to take on its own individual value; and eventually, with a lot of data, each group's posterior expectation becomes nearly the same as the MLE.

This is a type of "model parsimony": with only a little bit of data, our model is simple (all groups have the same estimate, the overall mean at 0.5). With more data, our model becomes more complex (all groups share a mean, but they can also have individual variation, each with its own parameter *p*).
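This shrinkage can be sketched numerically. The following toy Python example is my own illustration, not the animation's actual model: a fixed Beta prior centred on the overall mean stands in for the hyper-prior, whose strength a real hierarchical model would instead learn from the data.

```python
# Toy sketch of hierarchical shrinkage (illustrative only): a fixed Beta
# prior centred on the overall mean stands in for the learned hyper-prior.

def mle(successes, n):
    """Per-group maximum likelihood estimate of p."""
    return successes / n

def partial_pool(successes, n, prior_mean=0.5, prior_strength=10):
    """Posterior mean of p under a Beta prior centred on prior_mean.

    prior_strength acts like 'pseudo-counts': a + b = prior_strength,
    so its influence fades as the real sample size n grows.
    """
    a = prior_mean * prior_strength
    b = (1.0 - prior_mean) * prior_strength
    return (successes + a) / (n + a + b)

# A group observing 4 captures out of 5 trials:
print(mle(4, 5))               # 0.8 -- jumpy under small n
print(partial_pool(4, 5))      # 0.6 -- shrunk toward the overall mean 0.5
# The same capture rate with 100x the data:
print(partial_pool(400, 500))  # ~0.79 -- now nearly the MLE
```

With 5 trials the pseudo-counts dominate and the estimate hugs 0.5; with 500 trials the data dominate and the estimate approaches the MLE of 0.8, exactly the behaviour seen in the animation.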

The abuse of MLE is nicely illustrated with the following experiment: imagine you want to estimate the fairness of a coin, but flip it only 3 times. Let's say you observed TTT (a rare event, but not impossible). The Maximum Likelihood Estimate would be that the probability of a tail is 100%, and the coin is definitely not fair. Of course, we know this is a terrible estimate and a stupid example, because we have solid prior information about the behaviour of coins. However, the sad fact is that many dolphin Mark-Recapture researchers are running similarly flawed analyses, under small sample sizes and overly complex Frequentist models which estimate **100% detection probability** or **100% survival** or **0% movement**. These terrible estimates are polluting our scientific literature.

The MLEs are indeed the most 'objective', but they make bad predictions for new data and give us ridiculous results under low sample sizes. In contrast, a little bit of skepticism encoded in a Bayesian prior (like Beta(1,1)) can temper our estimates. Returning to the coin-flipping example, the posterior would say that the probability of a tail is just 80%, given the observed sequence TTT. That's still not the true 0.5, but it is a better prediction than the Frequentist's 100%.
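The 80% figure is easy to verify; here is a minimal Python sketch of both estimates:

```python
# Coin example from the text: three flips, all tails (TTT).
n_tails, n_flips = 3, 3

# Frequentist MLE: the observed frequency of tails.
p_mle = n_tails / n_flips                  # 1.0 -- "tails is certain"

# Bayesian: a flat Beta(1, 1) prior plus 3 tails and 0 heads gives the
# posterior Beta(1 + 3, 1 + 0) = Beta(4, 1).
a = 1 + n_tails                            # 4
b = 1 + (n_flips - n_tails)                # 1
p_post_mean = a / (a + b)                  # 4/5 = 0.8

print(p_mle, p_post_mean)  # 1.0 0.8
```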

**Unified Under Hierarchical Bayes**

The poor performance of MLE was noticed long ago by non-statisticians in the Machine Learning and predictive-analytics communities. They obsessed over "regularization": finding ways to constrain the complexity of models and deliberately *bias* the estimates towards simplicity. Techniques like the LASSO, Ridge Regression, the Elastic Net, Boosting, AIC optimization, Bayesian Model Averaging, etc., were devised to make better predictions on new data, rather than to honour philosophical notions of probability and objectivity.

The eerie thing is that most of these techniques have a Bayesian interpretation. The famous LASSO is just a Bayesian model with a Double Exponential (Laplace) prior on the regression coefficients. The idea is explored in an accessible article by Hooten and Hobbs (2015).
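To make the LASSO connection concrete: the L1 penalty λ|β| is, up to an additive constant, the negative log density of a Laplace prior with scale 1/λ, so penalized optimization and Bayesian MAP estimation coincide. A quick Python check (the λ value and test points below are arbitrary choices of mine):

```python
import math

def neg_log_laplace(beta, scale):
    """Negative log density of a Laplace(0, scale) prior."""
    return abs(beta) / scale + math.log(2 * scale)

lam = 3.0
scale = 1.0 / lam
# Difference between the lasso penalty and the negative log prior:
diffs = [lam * abs(b) - neg_log_laplace(b, scale) for b in (-3.0, 0.5, 7.0)]
print(diffs)  # the same constant at every beta -> identical optimizers
```

Since the two objectives differ only by a constant that does not depend on β, they are minimized by exactly the same coefficients.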

**Subjective Pseudo Bayesian and the Degradation of Science**

If ecologists are willing to inject a little bit of bias into their analyses in order to get better predictions and sensible results, then what of objectivity? What is to stop someone from 'cooking the books'? Writing about cavalier 'pseudo'-Bayesian analysis, the Bayesian heavyweight James Berger warned in 2004:

> Statistics owes its central presence in science and life to the facts that (i) it is enormously useful for prediction; (ii) it is viewed as providing an objective validation for science. Discarding the latter would lead to a considerable downgrading of the value placed on statistics.

At the time of his warning, few non-statisticians were using subjective Bayesian models, and even then, they seemed to want "objective" reference priors which conveyed no prior information and yielded nearly identical results to Frequentist models. It is a bit ironic that the opposite seems to have happened: the crisis of reproducibility was unveiled under the watch of the so-called objective school of statistics, not under subjectively biased Bayesian models. Now the tables have turned, and as more and more researchers adopt Bayesian models, Berger's 2004 warning sits, waiting to be realized. The solution is probably for everyone to adopt a prediction mentality. Dolphin ecologists have already sort of adopted this practice by using the AIC for model selection, but have yet to grapple with its philosophical underpinnings or with rival ideas.

### Hifime UH1 DAC under Linux and Jack

- reference-quality DAC chip: Texas Instruments PCM5102 (a flagship chip used in other >$1000 DACs)

- Sample rates of 44.1, 48, 88.2, 96, 176.4, 192, and 384 kHz, and bit depths of 16, 24, and 32 bits.

- optional headphone amplifier (the TPA6120, about which I couldn't find much information), as well as a direct line-out straight from the DAC itself. This is a great feature for someone who may have to drive high-impedance headphones, but still wants the flexibility of using other neutral amplifiers.

- Asynchronous USB transfer with two TCXOs as master clocks

- under $200 USD; ships from Hong Kong (not from the USA-based HifimeDIY)

What's really nice is that Hifime explicitly mentions support for Linux (without drivers!), a major selling point for me as I don't want to play around with kernel patches, etc.

**Using with GNU/Linux: JACK, PulseAudio, and ALSA.**

The DAC shows up as "HiFimeDIY SA9227 USB Audio". On most Linux distributions, one will have to contend with PulseAudio to get any sound out of the DAC. The easiest way is to use 'pavucontrol' (a lightweight PulseAudio GUI) to explicitly change the audio interface.

Unfortunately, PulseAudio automatically resamples audio. So, for bit-perfect audio, one needs to use a music player with ALSA support. I use **gmusicbrowser**, which is highly customizable (reminiscent of my only Win7 program, Foobar2000). Click Main (menu) -> Settings -> Audio (tab) -> Output device: set to "Alsa"; click "advanced options" and enter "hw:Audio" as the ALSA option. Voilà! Enjoy audio at 192 kHz and 24 bits!
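For players that only honour ALSA's default device, an alternative (an untested sketch, assuming the card enumerates under the same "Audio" name as in the "hw:Audio" string above; confirm yours with `aplay -l`) is a minimal `~/.asoundrc` that points the default PCM straight at the DAC:

```
# ~/.asoundrc -- route the default ALSA PCM directly to the DAC,
# bypassing PulseAudio's resampler ("Audio" is the card name from hw:Audio)
pcm.!default {
    type hw
    card Audio
}
ctl.!default {
    type hw
    card Audio
}
```

Note that this hands the hardware device to one application at a time, so PulseAudio clients may lose sound while the player holds it.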

- See more at: http://hifimediy.com/hifime-uh1