
Researchers in numerical cognition rely on hypothesis testing and parameter estimation to evaluate the evidential value of data. Though there has been increased interest in Bayesian statistics as an alternative to the classical, frequentist approach to hypothesis testing, many researchers remain hesitant to change their methods of inference. In this tutorial, we provide a concise introduction to Bayesian hypothesis testing and parameter estimation in the context of numerical cognition. Here, we focus on three examples of Bayesian inference: the t-test, linear regression, and analysis of variance. Using the free software package JASP, we provide the reader with a basic understanding of how Bayesian inference works “under the hood” as well as instructions detailing how to perform and interpret each Bayesian analysis.

As in any other discipline within the psychological and behavioral sciences, researchers in numerical cognition rely on statistical inference – hypothesis testing and parameter estimation – to verify theoretical claims against observed data. In recent years, Bayesian inference has become increasingly popular as an alternative to classical null hypothesis significance testing (NHST). Bayesian inference operates under a fundamentally different framework from NHST; instead of hard “accept/reject” decisions about a null hypothesis, Bayesian inference quantifies the evidence for competing hypotheses in light of observed data. Hypotheses that do a better job of predicting the data receive increased support, whereas hypotheses that predict the data poorly receive decreased support.

Despite its many advantages over NHST and increased calls for its use, the Bayesian approach is still relatively underutilized in the social sciences (

In this tutorial paper, we will introduce the reader to the basics of Bayesian inference through the lens of some classic, well-cited studies in numerical cognition. In three detailed examples illustrating

In this section, we will review and elaborate on the concept of hypothesis testing. Even though the reader of this paper is probably familiar with the general concept of hypothesis testing, we believe that it is nonetheless instructive to go over the basics.

Hypothesis testing can be viewed as a process of validating general claims based on a small sample of the general population. To illustrate, suppose that we are policy makers confronted with a new study program which claims to increase the mathematical abilities of sixth graders in the United States. To ease the exposition we use a (fictitious) scale called the Scale for Advancing Mathematical Ability (SAMA) to measure mathematical ability. As policy makers we know that the population mean SAMA score is

The superiority claim about the new study program can be formalized as the working hypothesis

Nonetheless, we would like to infer the validity of the working hypothesis

To deduce how hypotheses, i.e., claims about the population such as

Reasoning from the general population by deriving statements about potential data outcomes under the assumption that the parameters are known forms the basis of null hypothesis significance testing. By definition, the

Under the null model

A small
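To make the classical procedure concrete, the following Python sketch runs a one-sample t test on simulated SAMA scores. All numbers here – the null population mean of 100, the standard deviation of 15, and the sample itself – are invented for this illustration and are not values from the text.

```python
import numpy as np
from scipy import stats

# Hypothetical SAMA data: 40 sixth graders who completed the new program.
# The null population mean of 100 and the SD of 15 are assumptions made
# purely for this example.
rng = np.random.default_rng(1)
scores = rng.normal(loc=103, scale=15, size=40)

# One-sided test of the superiority claim (H1: population mean > 100)
t_stat, p_value = stats.ttest_1samp(scores, popmean=100, alternative="greater")
print(f"t = {t_stat:.2f}, one-sided p = {p_value:.3f}")
```

A small p value would then be taken as grounds to reject the null hypothesis that the program has no effect.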

We conclude our brief review of NHST^{i} with the following quotation:

The null-hypothesis significance test treats “acceptance” or “rejection” of a hypothesis as though these were decisions one makes. But a hypothesis is not something, like a piece of pie offered for dessert, which can be accepted or rejected by a voluntary physical action. Acceptance or rejection of a hypothesis is a cognitive process, a degree of believing or disbelieving which, if rational, is not a matter of choice but determined solely by how likely it is, given the evidence, that the hypothesis is true (pp. 422-423).

In a similar vein,

Modern statisticians have developed extensive mathematical techniques, but for the most part have rejected the notion of the probability of a hypothesis, and thereby deprived themselves of any way of saying precisely what they mean when they decide between hypotheses (p. v).

Though there are many other problems with

To address these problems, we can use Bayesian hypothesis testing, which gives us the ability to quantify evidence by comparing the

which indexes the relative adequacy of both hypotheses as predictors of our observed data. One immediate benefit of the Bayes factor is ease of interpretation. For example, if

Typically, the Bayes factor is denoted as

Unlike the p value, the Bayes factor can be interpreted directly as a graded measure of evidence. A commonly used descriptive classification is the following^{ii}:

| Bayes factor | Evidence |
|---|---|
| 1 – 3 | anecdotal |
| 3 – 10 | moderate |
| 10 – 30 | strong |
| 30 – 100 | very strong |
| > 100 | extreme |
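These labels are easy to apply programmatically. The small Python helper below is our own construction (not part of JASP) and simply maps a Bayes factor in favor of H1 onto the verbal categories in the table above; as the text emphasizes, these labels are descriptive conventions, not decision rules.

```python
def evidence_label(bf10):
    """Map a Bayes factor in favor of H1 onto rough verbal labels;
    values below 1 favor H0 instead (relabel 1/BF in that case)."""
    if bf10 < 1:
        return "favors H0 (relabel 1/BF instead)"
    if bf10 <= 3:
        return "anecdotal"
    if bf10 <= 10:
        return "moderate"
    if bf10 <= 30:
        return "strong"
    if bf10 <= 100:
        return "very strong"
    return "extreme"

print(evidence_label(5.2))   # moderate
print(evidence_label(250))   # extreme
```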

In principle, the Bayes factor does not come with a specified decision rule. We believe that context, made explicit through prior model probabilities specified by field experts, is much more important than overly generalized “rules” such

Bayes’ rule mathematically expresses the most basic idea in science; namely, that our prior belief in a hypothesis –

Since our goal is to directly compare the predictive adequacy of two hypotheses

or equivalently,

This gives us a lovely way to remember another fundamental fact of Bayesian hypothesis testing; that is,
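In standard notation, the updating rule described here can be written as:

```latex
\underbrace{\frac{p(\mathcal{H}_1 \mid \text{data})}{p(\mathcal{H}_0 \mid \text{data})}}_{\text{posterior odds}}
= \underbrace{\frac{p(\text{data} \mid \mathcal{H}_1)}{p(\text{data} \mid \mathcal{H}_0)}}_{\text{Bayes factor } \mathit{BF}_{10}}
\times \underbrace{\frac{p(\mathcal{H}_1)}{p(\mathcal{H}_0)}}_{\text{prior odds}}
```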

Now we come back to our earlier example of the SAMA test for mathematical ability and demonstrate these ideas in context. Working as policy makers, we might have seen many claims of improved study programs that did not work out well. For instance, if four out of five past claims of improved study programs led to null results, we might place 4-to-1 odds

For illustration, suppose that we found
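To make the arithmetic concrete, the sketch below combines the 4-to-1 prior odds against the new program with a purely hypothetical Bayes factor of 10 in its favor; the posterior odds then follow from posterior odds = Bayes factor × prior odds.

```python
# Posterior odds = Bayes factor × prior odds. The Bayes factor of 10 is
# assumed for illustration only, not a value reported in the text.
prior_odds_h1 = 1 / 4          # 4-to-1 odds against the new program working
bf_10 = 10.0                   # hypothetical evidence for H1 over H0
posterior_odds_h1 = bf_10 * prior_odds_h1
posterior_prob_h1 = posterior_odds_h1 / (1 + posterior_odds_h1)
print(posterior_odds_h1, round(posterior_prob_h1, 3))  # 2.5 0.714
```

Even fairly strong evidence can thus leave the posterior probability of H1 well short of certainty when the prior odds are stacked against it.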

While the goal of this paper is to provide some concrete examples of Bayesian hypothesis testing in numerical cognition, before doing so we will discuss

As a general method, Bayesian hypothesis testing gives us several pragmatic benefits, each of which will be highlighted in our examples below. The first of these is the ability to quantify evidence on a continuous scale. A second benefit is the ability to differentiate between

Given these benefits, what does a Bayesian hypothesis test require of the researcher? Like any hypothesis testing situation, the researcher must specify a null hypothesis

Given this specification, we can use JASP to perform all necessary calculations “under the hood”. Note that the specification of the prior distribution of effect sizes under

In the remainder of the paper, we describe a number of practical examples of using Bayesian inference in the context of numerical cognition. Each example is based on a well-cited paper in numerical cognition and demonstrates a number of JASP’s features using synthetic data sets generated in R under model assumptions that closely match the patterns of observed data described in the original papers. The data sets, as well as the R script used to generate them, are available for download from

To demonstrate the Bayesian

In their original paper,

We will now describe a Bayesian analysis inspired by the original methods of

JASP screenshot showing the ttest.csv data set, modeled after

Similar to

JASP screenshot showing descriptive statistics, including mean and standard deviation, and boxplots for the distSame and distDifferent variables.

We are now ready to perform a Bayesian analysis of the simulated data. Before doing so, we must filter the data set so that only the trials of interest remain.^{iii} First, click the header of the condition column. To remove all of the larger trials from the analysis and retain only the smaller trials, we simply “turn off” the larger value by clicking on the check mark next to it. The check mark will be replaced by a cross, and you will see that the larger trials have been greyed out. The filter dialog can be closed by clicking on the X on the right side of the dialog box. Notice that a funnel icon will appear next to the condition column header to indicate that a filter is currently being applied. Any filter can be removed by re-opening the filter dialog and clicking the eraser icon.

To perform a Bayesian one-sample

JASP screenshot showing the Bayesian one-sample

As mentioned above, the Bayes factor quantifies the relative predictive adequacy of the two models under comparison.^{iv}

The bottom plot shows this in more detail. The dashed line depicts the aforementioned Cauchy prior (with a default scale/prior width of

Another question to consider in our analysis is how sensitive the Bayes factor is to changes in the specification of the Cauchy prior on the population-level effect size
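For readers curious about what happens “under the hood”, the following Python sketch computes a default one-sample Bayes factor by numerical integration, in the style of the JZS Bayes factor of Rouder and colleagues (2009) with a Cauchy(0, r) prior on effect size; JASP’s internals may differ in detail. The t value, sample size, and the set of scales in the loop are invented for illustration.

```python
import numpy as np
from scipy import integrate, special

def jzs_bf10(t, n, r=0.707):
    """One-sample JZS-style Bayes factor: Cauchy(0, r) prior on effect
    size under H1, point null under H0, computed by integrating out the
    mixing variable g numerically."""
    nu = n - 1
    # Marginal likelihood under H0 (constants shared with H1 cancel)
    m0 = (1 + t**2 / nu) ** (-(nu + 1) / 2)

    def integrand(g):
        # delta | g ~ N(0, g) with g ~ InverseGamma(1/2, r^2/2)
        # implies delta ~ Cauchy(0, r).
        inv_gamma_pdf = ((r**2 / 2) ** 0.5 / special.gamma(0.5)
                         * g ** (-1.5) * np.exp(-r**2 / (2 * g)))
        return ((1 + n * g) ** (-0.5)
                * (1 + t**2 / ((1 + n * g) * nu)) ** (-(nu + 1) / 2)
                * inv_gamma_pdf)

    m1, _ = integrate.quad(integrand, 0, np.inf)
    return m1 / m0

# Hand-rolled robustness check: recompute the Bayes factor under
# several prior scales (all numbers invented for illustration).
for r in (0.5, 0.707, 1.0):
    print(f"r = {r}: BF10 = {jzs_bf10(t=3.0, n=30, r=r):.2f}")
```

Looping over a range of prior scales, as above, is a bare-bones version of the robustness check that JASP produces automatically.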

JASP screenshot showing a Bayes factor robustness check for the one-sample

Now we are ready to perform a Bayesian

JASP screenshot showing a Bayesian one-sample

It is again important to note that the 95% credible interval that is displayed in the figure is conditional on
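To give a feel for where such a conditional credible interval comes from, the sketch below approximates the posterior of the effect size δ on a grid, under the simplifying assumption (ours, not JASP’s) that the population standard deviation is known to equal 1; the data are simulated and all numbers are illustrative.

```python
import numpy as np

# Grid approximation to the posterior of effect size delta under H1,
# assuming sigma = 1 is known (a simplification for illustration only).
rng = np.random.default_rng(7)
y = rng.normal(loc=0.4, scale=1.0, size=40)   # invented sample
n, ybar = len(y), y.mean()

delta = np.linspace(-2, 2, 4001)
d = delta[1] - delta[0]
prior = 1 / (np.pi * 0.707 * (1 + (delta / 0.707) ** 2))  # Cauchy(0, 0.707)
likelihood = np.exp(-0.5 * n * (ybar - delta) ** 2)       # ybar ~ N(delta, 1/n)
post = prior * likelihood
post /= post.sum() * d                                     # normalize on the grid

cdf = np.cumsum(post) * d
lo = delta[np.searchsorted(cdf, 0.025)]
hi = delta[np.searchsorted(cdf, 0.975)]
print(f"95% credible interval for delta: [{lo:.2f}, {hi:.2f}]")
```

The central 95% of this (conditional) posterior is exactly the kind of credible interval displayed in the figure.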

It is also important to realize that a test addresses a different question than an estimation does. The Bayes factor focuses on the question “Does the effect exist?”, whereas the posterior focuses on the question “If it exists, how large is it?”. This is why the Bayesian testing problem compares the model in which the population effect is fixed at zero against one in which it is free to vary. Once the data have convinced us that the population effect

Additionally, one might want to see how evidence changes over time as data are collected. JASP gives us the ability to monitor this evidential flow with the “Sequential analysis” plot. The resulting plot shows the progression of the Bayes factor over time; see

JASP screenshot showing a sequential analysis; this provides a visualization of the “evidential flow” of incoming data.

We can follow the same steps described above on the participants who were assigned to the “choose larger” condition. The inference is largely the same. For same decade trials, we obtain

Let us now recap what we have done here. Importantly, we have described how to conduct the Bayesian

Though the discussion throughout this example is designed to be instructive, most of the narrative we have used is also good for the reporting of Bayesian analyses. Practitioners who desire more guidance on the reporting of such analyses are encouraged to consult

To demonstrate Bayesian linear regression, we use an example inspired by

The questions we would like to address here are (1) “Which out of these two covariates (

The Bayesian linear regression procedure begins by considering the following four models:
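In our own notation (with the predictor names taken from the data set used below), the four candidate models can be written as:

```latex
\begin{aligned}
\mathcal{M}_0 &: \text{fluency} = \beta_0 + \varepsilon\\
\mathcal{M}_1 &: \text{fluency} = \beta_0 + \beta_1 \cdot \text{symbolicDE} + \varepsilon\\
\mathcal{M}_2 &: \text{fluency} = \beta_0 + \beta_2 \cdot \text{nonsymbolicDE} + \varepsilon\\
\mathcal{M}_3 &: \text{fluency} = \beta_0 + \beta_1 \cdot \text{symbolicDE} + \beta_2 \cdot \text{nonsymbolicDE} + \varepsilon
\end{aligned}
```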

Here,

After downloading the file regression.csv from the Github repository (

First, we will construct scatterplots to visualize the relationship between fluency and the two proposed predictors. An easy way to do this uses the “Bayesian Correlation” option, which can be selected by clicking the “Regression” button. Once here, move all three variables into the analysis box, then open the “Plot Individual Pairs” menu. The two scatter plots will be revealed by moving the variables symbolicDE and fluency (in that order) as a pair into the box, and then doing the same with nonsymbolicDE and fluency.

JASP screenshot showing how to construct scatterplots before performing a Bayesian linear regression.

To perform the Bayesian linear regression, we click the “Regression” button and select “Bayesian Linear Regression”. We then move the fluency variable to the “Dependent Variable” box and move both the symbolicDE and nonsymbolicDE variables to the “Covariates” box. We will leave the defaults in place except for two options. First, under “Output”, select “Posterior summary”. Second, in the “Advanced Options” section, under “Model Prior”, select “Uniform”; this choice will actually simplify our discussion for this tutorial. The output can be seen in

JASP screenshot showing a Bayesian linear regression.

First, we will walk through the basic output given by JASP. Note that the basic functionality of JASP here is based on the R package BAS (

The first column,

Various Bayes factors are contained in the third and fourth columns. Each column gives a slightly different type of Bayes factor, however, and must be interpreted appropriately. The third column,

Finally, the fourth column, ^{v} This is a very useful column for doing specific model comparisons. For example, we can use this column to directly compare the two most probable models,
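Because Bayes factors are transitive, a column of Bayes factors against the null can be re-expressed against any other reference model by simple division. The sketch below does this with invented numbers, re-normalizing so that the most probable model serves as the reference.

```python
# Converting Bayes factors against the null model into Bayes factors
# against the best model. All values here are invented for illustration,
# not taken from the JASP output.
bf_vs_null = {"M0 (null)": 1.0, "M1": 42.0, "M2": 3.5, "M3": 28.0}

best = max(bf_vs_null, key=bf_vs_null.get)
bf_vs_best = {m: bf / bf_vs_null[best] for m, bf in bf_vs_null.items()}
print(best, bf_vs_best)
```

The best model then has a Bayes factor of exactly 1 against itself, and every other entry reads directly as evidence relative to that model.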

Once we have positive evidence for

JASP screenshot showing the marginal posterior distribution for the symbolicDE (

Given this background, let us first interpret the Posterior Summary table in

After observing data, we can calculate the posterior inclusion probabilities

Using these posterior inclusion probabilities, we can calculate an “inclusion Bayes factor”, which is defined as the factor by which the prior odds for including a specific predictor are increased after observing data. For example, the prior odds for including symbolicDE is equal to 0.5/(1-0.5) = 1, and the posterior odds is equal to
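With a uniform prior over the four models, each predictor’s prior inclusion probability is 0.5, so the prior inclusion odds equal 1. The sketch below computes an inclusion Bayes factor from a hypothetical posterior inclusion probability (0.95 is invented, not the value reported in the output).

```python
# Inclusion Bayes factor: posterior inclusion odds divided by prior
# inclusion odds for a single predictor.
prior_incl = 0.5      # uniform prior over the four models
post_incl = 0.95      # hypothetical P(include predictor | data)

prior_odds = prior_incl / (1 - prior_incl)   # = 1
post_odds = post_incl / (1 - post_incl)
bf_inclusion = post_odds / prior_odds
print(round(bf_inclusion, 1))  # 19.0
```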

Finally, let us consider the marginal posterior distribution for nonsymbolicDE, and in particular its unconditional^{vi} 95% credible interval, which states that the coefficient for nonsymbolicDE ranges from -0.764 to 29.736.

To summarize, the Bayesian linear regression procedure offers many advantages over and above what we get from traditional null hypothesis testing. Rather than simply making black-and-white decisions about whether symbolic and/or nonsymbolic distance is a significant predictor of mathematical fluency, we are able to engage in a much deeper analysis. With only a few clicks in JASP, we can easily perform model comparison, allowing us to compare the adequacy of four different models to predict the data at hand. Further, with Bayesian model averaging, we are able to simultaneously engage in testing and estimation, factoring our uncertainty about model choice into our estimation of the effects of symbolic and nonsymbolic distance on mathematical fluency. One should note that model averaging becomes quite beneficial when we have many predictors in our regression model, as the space of possible models on these predictors becomes large very quickly. The reader is encouraged to consult

In our final example, we demonstrate a Bayesian factorial analysis of variance (

The theory of Bayesian ANOVA is analogous to that of Bayesian linear regression elaborated on above, because an ANOVA model can be expressed as a linear regression with dummy variables (

Notice that in this case, the null model

The simulated data is contained in the file anova.csv, which can be downloaded from (

To perform a Bayesian analysis of variance, we click the button for the “ANOVA” module and select “Bayesian Repeated Measures ANOVA” from the menu. This procedure requires us to specify the names and levels of the repeated measures factors; clicking on the terms “RM Factor 1”, “Level 1”, etc. in the “Repeated Measures Factors” box makes these names editable. In

JASP screenshot of the Bayesian Repeated Measures ANOVA menus.

At this point, the output will look similar to the linear regression example we did previously. But before attempting to interpret the output, let us construct a plot of the condition means. In

JASP screenshot of the Descriptive Plots output, showing the condition means formed by crossing the two factors of problem size and format.

Based on the repeated measures factors that are entered, JASP constructs a set of models to reflect the possible additive combinations and interactions between these factors. Since we entered two factors (size and format), JASP builds 5 competing models:
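Writing RT for mean response time, and using notation of our own, these five models are:

```latex
\begin{aligned}
\mathcal{M}_0 &: \text{RT} = \mu + \varepsilon \quad (\text{null model})\\
\mathcal{M}_1 &: \text{RT} = \mu + \text{size} + \varepsilon\\
\mathcal{M}_2 &: \text{RT} = \mu + \text{format} + \varepsilon\\
\mathcal{M}_3 &: \text{RT} = \mu + \text{size} + \text{format} + \varepsilon\\
\mathcal{M}_4 &: \text{RT} = \mu + \text{size} + \text{format} + \text{size} \times \text{format} + \varepsilon
\end{aligned}
```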

Note that JASP does not include interactions without also including the relevant main effects; thus, the model which includes the interaction between problem size and format (i.e.,

Though we have 5 models to consider, our goal is to directly compare the predictive adequacy of the additive model

Adding the factors of problem size and format to the null model in a Bayesian Repeated Measures ANOVA.

At this point, the output will change considerably.

Output tables for the Bayesian Repeated Measures ANOVA.

The first column,

As in the regression example, the third column,

In summary, our observed data provide strong evidence for an interaction model, in which problem size and format jointly determine mean response times in a mental arithmetic task. This supports the narrative of

We note that there are many other options that a user might want to explore when computing a Bayesian analysis of variance in JASP. The methods described here also work when the user is interested in more complex designs, including those with 3 or more factors. In these cases, the number of potential models to consider increases quickly. However, JASP displays these models in order of decreasing posterior probability, so the user can quickly identify the best-fitting model from the potentially large list of candidates (see also

In this tutorial, we have presented the basics of Bayesian inference in the context of three well-cited studies from the numerical cognition literature. The examples provided in this tutorial represent a broad class of techniques (e.g.,

Further, we have demonstrated how Bayesian analyses afford the practitioner a different perspective on statistical inference. Importantly, this perspective has many pragmatic advantages over the current practice of reporting

From the Fisherian point of view,

We advise readers not to get too hung up on the specific labels and breakpoints in this table; see also

For a demonstration of the different methods of filtering in JASP, see

To simplify matters, we identify the hypotheses with the models and use the notation

It is easy to compute the Bayes factor with respect to the most probable model, once we have the Bayes factor with respect to the null model. Let

Here, unconditional means that we do not have to condition on a specific model to interpret the credible interval. That is, the credible interval factors in both our uncertainty about the model as well as our uncertainty about the effect size under the model.

The authors have declared that no competing interests exist.

The authors have no support to report.