While several studies have shown that performance on numerosity comparison tasks is related to individual differences in math abilities, others have failed to find such a link. These inconsistencies could be due to variations in which math skills were assessed, different stimulus generation protocols for the numerosity comparison task, or differences in inhibitory control. This within-subject study is a conceptual replication tapping into the relation between numerosity comparison, math, and inhibition in adults (N = 122). Three aspects of math ability were measured using standardized assessments: arithmetic fluency, calculation, and applied problem solving skills. Participants’ inhibitory skills were measured using Stroop and Go/No-Go tasks with numerical and non-numerical stimuli. Finally, non-symbolic number sense was measured using two different versions of a numerosity comparison task that differed in their stimulus generation protocols (Panamath; Halberda, Mazzocco & Feigenson, 2008, https://doi.org/10.1038/nature07246; G&R, Gebuis & Reynvoet, 2011, https://doi.org/10.3758/s13428-011-0097-5). We find that performance on the Panamath task, but not the G&R task, was related to measures of calculation and applied problem solving but not arithmetic fluency, even when controlling for inhibitory control. One possible explanation is that, depending on the characteristics of the stimuli in the numerosity comparison task, the reliance on numerical and non-numerical information may vary, and a relation with math achievement is found only when performance relies more on numerical representations. Our findings help to explain prior mixed findings regarding the link between non-symbolic number sense and math and highlight the need to carefully consider variations in numerosity comparison tasks and math measures.

Even after such a manipulation, a numerical ratio effect is present and this is considered evidence that individuals are basing their decisions on numerical information (

Developmental studies have shown that the interference from non-numerical features is stronger in young children than in older children and adults, possibly due to the fact that their inhibitory functions are not yet as mature (e.g.,

Finally, the contradictory findings of

In this study, we attempt to disentangle seemingly contradictory findings from prior research using a direct and conceptual replication of studies that examined the relation between non-symbolic number sense and symbolic math skills in adults. We opted for an adult sample because we wanted to untangle the complex processes first before exploring how these processes operate through development. Using a within-subject design in which all participants completed a large number of tasks, we were able to examine the possible impact of (1) the type of math abilities that were assessed, (2) inhibition on the relation between number sense and math, and (3) the stimulus generation algorithm used to construct the dot arrays for the numerosity comparison task. More concretely, in a sample of 122 adults we compared two non-symbolic number comparison tasks (stimuli were generated with either the Panamath or the G&R software)^{1}

To determine sample size, an _{age} = 20.23 years; _{age} = 19.70 years;

All assessments took place in a single laboratory visit, and tasks were presented in a pseudo-randomized order. All participants first completed a series of paper-and-pencil questionnaires and math tests (first Arithmetic Fluency, then Arithmetic Skills). Then, both numerosity comparison tasks were administered in counterbalanced order (half of the participants started with Panamath, the other half with G&R). Finally, the four inhibition tasks were administered in an order counterbalanced according to a Latin square design. The numerosity comparison and inhibition tasks were computerized and presented on a 15.4-inch laptop computer. In total, testing lasted about one hour.
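As an illustration of Latin square counterbalancing for four tasks, the following minimal Python sketch builds a cyclic Latin square and assigns each participant one row. The task labels, the cyclic construction, and the assignment by participant index are assumptions made for this example; the text does not specify which Latin square or assignment rule was actually used.

```python
from typing import List

# Hypothetical labels for the four inhibition tasks (not taken from the study scripts).
TASKS = ["number_stroop", "animal_stroop", "number_go_nogo", "animal_go_nogo"]


def latin_square(items: List[str]) -> List[List[str]]:
    """Return a cyclic Latin square: row i is the item list rotated left by i."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def order_for_participant(participant_index: int, items: List[str]) -> List[str]:
    """Assumed assignment rule: cycle through the rows by participant index."""
    square = latin_square(items)
    return square[participant_index % len(square)]


if __name__ == "__main__":
    for row in latin_square(TASKS):
        print(row)
```

In such a square every task appears exactly once in each serial position across the four orders, which is the property that balances position effects across participants.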

To measure arithmetic fluency, the “Tempo Test Arithmetic” was used (“

Four subtests of the “Cognitive Skills Arithmetic, 5th grade” test (“

Participants completed four computer-based tasks designed to test inhibitory control: two Stroop tasks and two Go/No-Go tasks. For each kind of task, one version used numerically relevant stimuli and the other used numerically irrelevant stimuli. The four tasks were presented in a pseudo-random order counterbalanced across participants.

In the Stroop tasks, participants were instructed to respond to one piece of relevant information about a stimulus while ignoring the other, often more salient, piece of information. Stroop tasks consisted of 16 practice trials with feedback^{2}

In the Go/No-Go tasks, participants were instructed to respond as quickly as possible to all stimuli except for a specified stimulus, thus requiring them to develop and subsequently inhibit a prepotent response. The task consisted of 10 practice trials followed by 100 test trials, in which the “Go” stimulus was presented for 75% of trials and the “No-Go” stimulus was presented for the remaining 25%. A fixation cross was presented for 500 ms, the stimulus was presented for 90 ms, and participants had 750 ms to respond before the start of the next trial. In the

Participants completed two numerosity comparison tasks for which stimuli were generated with different software (Panamath and G&R). Both tasks started with six practice trials with auditory feedback, and a test phase of 144 trials administered in a single block. Each trial began with the presentation of a fixation cross on an otherwise blank screen, after which dot arrays (yellow and blue dots) were simultaneously presented on the left and right sides of the screen for 500 ms. Participants were instructed to indicate which side contained more dots by pressing the “f” key (left hand) or the “j” key (right hand). Half of the trials featured the correct answer on the left side of the screen, the other half on the right. Numerosities ranged from 10 to 40 and six ratios were used: 1.11, 1.14, 1.2, 1.25, 1.5 and 2. The two tasks were completed sequentially, and the order was counterbalanced across participants. Accuracy scores were used for all analyses (see also
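To make the ratio manipulation concrete, the sketch below enumerates the integer dot-count pairs between 10 and 40 that match each reported ratio. It assumes the rounded ratios correspond to the exact fractions 10/9, 8/7, 6/5, 5/4, 3/2, and 2/1; this mapping is an assumption for illustration and is not stated in the text, and the pairs actually used in the two tasks may differ.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

# Assumed exact fractions behind the rounded ratios reported in the text.
RATIOS: Dict[str, Fraction] = {
    "1.11": Fraction(10, 9),
    "1.14": Fraction(8, 7),
    "1.2": Fraction(6, 5),
    "1.25": Fraction(5, 4),
    "1.5": Fraction(3, 2),
    "2": Fraction(2, 1),
}
MIN_DOTS, MAX_DOTS = 10, 40  # numerosity range given in the text


def pairs_for_ratio(ratio: Fraction) -> List[Tuple[int, int]]:
    """All (smaller, larger) dot counts in range whose quotient equals ratio exactly."""
    pairs = []
    for smaller in range(MIN_DOTS, MAX_DOTS + 1):
        larger = smaller * ratio  # exact rational arithmetic, no rounding error
        if larger.denominator == 1 and larger <= MAX_DOTS:
            pairs.append((smaller, int(larger)))
    return pairs


if __name__ == "__main__":
    for label, ratio in RATIOS.items():
        print(label, pairs_for_ratio(ratio))
```

Under these assumptions the hardest ratio (1.11) admits only a few integer pairs in the 10 to 40 range, whereas the easiest ratio (2) admits many.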

One numerosity comparison task was created and administered using the Panamath Software (downloaded from the Panamath website;

The other numerosity comparison task was an adapted version of the one used by

Research questions, data collection, and a series of analyses for this investigation were preregistered on Open Science Framework on February 19, 2018 (see

We first conducted a series of repeated-measures analyses of variance (ANOVAs) to test for the effects of ratio and trial type in both numerosity comparison tasks and for the presence of interference effects in the Stroop tasks. Second, because performance in both numerosity comparison tasks was affected by trial type, an Exploratory Factor Analysis (EFA) was conducted to uncover the underlying structure of the different trial types. This EFA was not preregistered but seemed a logical step in the data analysis given the strong effects of trial type. Finally, after computing descriptive statistics and bivariate relations for the variables of interest, we ran two path models with all math assessments simultaneously regressed on each of the two number comparison tasks to address the first research question. To address our second research question, we ran a series of path models to test whether inhibitory control measures were independently related to performance on assessments of mathematical skill, and whether the relation between numerosity comparison and mathematical skills operates indirectly through inhibitory control. All path analyses were run in Mplus 8 (

Two repeated-measures ANOVAs modeled six within-subject levels (one for each ratio). Additionally, each test modeled separate trial types (two levels for G&R: congruent and incongruent; three levels for Panamath: area-congruent, area-neutral, and area-incongruent). Results of both ANOVAs are shown in

In the G&R numerosity comparison task, there was a main effect of trial type,

For Panamath, there was also a main effect of trial type,

Error rates were 15.35% and 3.17% for the numerical Stroop and the animal Stroop tasks, respectively. Repeated-measures ANOVAs were conducted for both Stroop tasks to examine the expected presence of an interference effect. Results are presented in

Because both analyses of the numerosity comparison tasks revealed an effect of trial type, an EFA was conducted to determine the underlying factor structure of the different trials. Trial accuracy was aggregated to the ratio bin level for each trial type in the two tasks. That is, a total of 30 variables were included, 12 representing accuracy on the G&R incongruent or congruent trials in each of the six ratio bins and 18 representing accuracy on the three Panamath trial types in each of the six ratio bins. These variables were submitted to an EFA with ROTATION = OBLIMIN. A three-factor model fit the data significantly better than did a two-factor model, χ^{2}(28) = 50.399, ^{2}(27) = 30.441, ^{2}(348) = 513.587,

Condition | % Correct (M) | % Correct (SD) | Factor 1 | Factor 2 | Factor 3 |
---|---|---|---|---|---|
Panamath area correlated 1.11 | 70.33 | 13.87 | 0.087 | −0.178 | −0.048 |
Panamath area correlated 1.14 | 74.66 | 13.34 | 0.328 | −0.150 | 0.055 |
Panamath area correlated 1.20 | 83.62 | 12.95 | 0.358* | 0.013 | −0.206 |
Panamath area correlated 1.25 | 77.99 | 12.55 | 0.361 | 0.245 | 0.269 |
Panamath area correlated 1.50 | 91.07 | 11.04 | 0.438* | −0.064 | 0.204 |
Panamath area correlated 2.00 | 98.41 | 5.20 | 0.422 | −0.019 | 0.079 |
Panamath area equal 1.11 | 68.77 | 17.71 | 0.099 | 0.070 | 0.128 |
Panamath area equal 1.14 | 71.43 | 16.12 | 0.249 | −0.038 | −0.129 |
Panamath area equal 1.20 | 71.50 | 15.75 | 0.299 | 0.199 | 0.129 |
Panamath area equal 1.25 | 84.55 | 12.77 | 0.367 | −0.027 | 0.169 |
Panamath area equal 1.50 | 89.90 | 10.64 | 0.451* | 0.102 | 0.135 |
Panamath area equal 2.00 | 96.52 | 6.58 | 0.416* | 0.035 | 0.090 |
Panamath area anticorrelated 1.11 | 56.02 | 15.90 | 0.084 | 0.043 | 0.166 |
Panamath area anticorrelated 1.14 | 70.20 | 16.89 | 0.444* | −0.029 | −0.103 |
Panamath area anticorrelated 1.20 | 76.37 | 14.46 | 0.356* | 0.106 | −0.026 |
Panamath area anticorrelated 1.25 | 84.87 | 12.08 | 0.329 | −0.018 | −0.019 |
Panamath area anticorrelated 1.50 | 86.27 | 14.85 | 0.533* | 0.002 | −0.235* |
Panamath area anticorrelated 2.00 | 93.14 | 12.14 | 0.299 | 0.155 | −0.179 |
G&R congruent 1.11 | 82.98 | 14.13 | −0.072 | −0.263 | 0.550* |
G&R congruent 1.14 | 83.83 | 13.94 | 0.012 | −0.305 | 0.566* |
G&R congruent 1.20 | 84.37 | 16.14 | 0.024 | −0.229 | 0.533* |
G&R congruent 1.25 | 87.32 | 12.05 | −0.036 | 0.018 | 0.762* |
G&R congruent 1.50 | 91.06 | 10.19 | 0.034 | 0.090 | 0.685* |
G&R congruent 2.00 | 95.52 | 7.22 | 0.083 | 0.069 | 0.624* |
G&R incongruent 1.11 | 31.68 | 20.02 | 0.038 | 0.640* | −0.232 |
G&R incongruent 1.14 | 31.48 | 18.32 | −0.181 | 0.840* | 0.044 |
G&R incongruent 1.20 | 36.78 | 19.88 | 0.002 | 0.512* | −0.360* |
G&R incongruent 1.25 | 45.26 | 21.66 | 0.063 | 0.709* | −0.054 |
G&R incongruent 1.50 | 58.57 | 22.80 | 0.048 | 0.829* | −0.030 |
G&R incongruent 2.00 | 73.86 | 22.11 | 0.174 | 0.777* | 0.080 |

*Factor loading significant at

Basic descriptive statistics of the variables of interest for the path analyses are presented in

Variable | M | SD | Minimum | Maximum |
---|---|---|---|---|
Age | 20.23 | 2.05 | 17.89 | 29.40 |
G&R congruent accuracy | 0.88 | 0.09 | 0.63 | 1.00 |
G&R incongruent accuracy | 0.46 | 0.17 | 0.10 | 0.85 |
Panamath total accuracy | 0.80 | 0.05 | 0.67 | 0.91 |
Number go/no-go commission errors | 6.87 | 3.60 | 1.00 | 16.00 |
Animal go/no-go commission errors | 5.05 | 3.43 | 0.00 | 16.00 |
Number stroop interference RT score | 42.14 | 26.62 | −14.00 | 123.00 |
Animal stroop interference RT score | 28.93 | 23.89 | −13.00 | 108.50 |
Arithmetic fluency | 104.86 | 17.45 | 66 | 151 |
Procedural calculation sum score | 6.69 | 1.91 | 2 | 10 |
Applied mathematics sum score | 5.89 | 1.91 | 1 | 10 |

Variable | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
---|---|---|---|---|---|---|---|---|---|---|---|
1. Age | — | | | | | | | | | | |
2. G&R congruent accuracy | .06 | — | | | | | | | | | |
3. G&R incongruent accuracy | −.04 | −.60** | — | | | | | | | | |
4. Panamath total accuracy | .05 | .01 | .25** | — | | | | | | | |
5. Number go/no-go commission errors | −.04 | −.11 | −.09 | −.21* | — | | | | | | |
6. Animal go/no-go commission errors | −.04 | −.02 | −.18 | −.24** | .60** | — | | | | | |
7. Number stroop interference RT score | .12 | −.07 | −.00 | .19* | −.15 | −.11 | — | | | | |
8. Animal stroop interference RT score | −.01 | .07 | −.16 | −.07 | −.03 | .03 | −.15 | — | | | |
9. Arithmetic fluency | .11 | −.13 | .02 | −.05 | −.09 | .00 | .07 | .19* | — | | |
10. Procedural calculation sum score | .11 | −.09 | .10 | .17 | −.07 | −.09 | .08 | .01 | .46** | — | |
11. Applied mathematics sum score | .14 | .01 | .08 | .19* | −.07 | −.01 | .11 | −.01 | .25** | .24** | — |

*

To test whether numerosity comparison skills are related to mathematical skills, two separate path models were conducted, one per task (i.e., Panamath in one model; G&R in the other), in which the three math skills (arithmetic fluency, procedural calculation, and applied word problems) were simultaneously regressed on the numerosity comparison measures from that task. Covariances among all outcome variables were modeled such that all models were fully identified. This allowed us to assess the unique contribution of numerosity comparison skills to each outcome measure, net of any similarity between the different measures of mathematical skills. Results of both path models are shown in

†

To test whether associations between numerosity comparison and various mathematical skills differed depending on the numerosity comparison task utilized (e.g., whether accuracy on the Panamath task was more strongly related to arithmetic fluency than was accuracy on the G&R incongruent trials), we then conducted a multi-group path model with each numerosity measure (i.e., Panamath, G&R incongruent, G&R congruent) representing a different group. Parameters were tested for equality across the three groups; a Wald test confirmed parameters were not significantly different across numerosity measures in predicting any of the three math outcomes, arithmetic fluency: χ^{2}(2) = 1.87, ^{2}(2) = 5.71, ^{2}(2) = 2.85, ^{2}(1) = 1.68, ^{2}(1) = 2.74, ^{2}(1) = 3.22,

Lastly, to test whether there was a unique contribution of accuracy on the Panamath numerosity comparison task net of any covariance shared with either congruent or incongruent trials from the G&R numerosity comparison task, a single path model was conducted in which the three math assessments were simultaneously regressed on the three numerosity comparison measures. Panamath continued to predict procedural calculation (β = 0.17,

Because of the strength of the correlation (

*

Another path model (

Finally, a path model was run with all parameters simultaneously estimated (i.e., both inhibition scores and all three ANS scores predicting all three math outcome measures) to test the specificity and robustness of effects. With this more saturated model, we found only one significant association, between RT on the Animal Stroop task and the test of arithmetic fluency, β = 0.20,

The relation between numerosity comparison and math achievement has been the topic of intense debate. The discussion has been complicated by the fact that studies have used different math assessments and numerosity comparison tasks with different stimulus generation algorithms. These tasks may pose greater or lesser demands on inhibition, which has itself been measured in a variety of ways in past work. Here, we examined the influence of the stimulus generation algorithm used in different numerosity comparison tasks, the type of math assessment, and the role that inhibition plays in the relation between numerosity comparison and math. More specifically, we compared two of the most commonly utilized algorithms (Panamath;

First, correlations between performance on the two numerosity comparison tasks are dependent on trial type. There is a weak but significant correlation between Panamath and the G&R incongruent trials (

To arrive at a possible explanation for these observations, we first want to reiterate some details of both stimulus generation algorithms. As mentioned before, we used three different trial types in the Panamath numerosity comparison task: trials in which the cumulative surface area of all dots in an array and number were positively correlated, trials in which cumulative surface area was equated, and trials in which they were negatively correlated and cumulative perimeter was in turn equated. While most studies using Panamath only use two trial types (i.e., area-congruent and area-neutral trials), we also included area-incongruent trials because previous work has found that some participants may use total perimeter as a non-numerical cue to solve non-symbolic number comparison tasks (

A possible explanation as to why participants rely more on non-numerical cues in some cases than others is offered by the signal clarity account (

Our second main finding was that performance on the two numerosity comparison tasks was differentially related to our math assessments. Performance on the G&R numerosity comparison task was unrelated to any aspect of math, whereas performance on the Panamath numerosity comparison task related to measures of procedural calculations and applied word problem solving but not arithmetic fluency, supporting prior findings (e.g.,

Our findings may explain the contradicting findings from previous studies that examined the relation between numerosity comparison and mathematics achievement (e.g.,

Our third main finding was that the two numerosity comparison tasks related differently to our four inhibition measures. Performance on the Panamath task was significantly correlated with the commission errors made in both the Number and Animal Go/No-Go tasks and with the interference effect observed in the Number Stroop task. In contrast, performance on the G&R numerosity comparison task was not correlated with any of the inhibition measures. One possible explanation is that participants were not actively trying to inhibit non-numerical information in the G&R task, while the Panamath task may have triggered a greater reliance on inhibitory control to extract the numerical information needed to complete the task. Importantly, inhibition was not responsible for the correlation between the Panamath numerosity comparison task and math achievement, as the associations between these two variables remained significant when controlling for inhibition. This result is in line with the findings of

In conclusion, this study replicates and extends much of the prior research regarding the relations between non-symbolic number sense and symbolic number understanding and provides some further clarity on seemingly inconsistent findings. This study is the first to systematically examine the impact of variations in math assessment, stimulus generation algorithms used to assess numerosity comparison skills, and inhibition on the link between numerosity comparison and math achievement. Our results suggest that variations in the stimulus generation protocols for numerosity comparison tasks result in different reliance on non-numerical cues and non-symbolic number representations. Possibly, non-symbolic number is processed more prominently in the Panamath task because of the larger variance in trial types that are presented during its administration, while the large variation in non-numerical cues in the G&R algorithm results in decisions heavily influenced by non-numerical cues. This explanation fits with our findings that performance on the Panamath task but not the G&R task was correlated with math achievement and that these results held even when controlling for inhibition. However, alternative explanations for the relation between Panamath and math tests, such as cognitive flexibility, need to be examined further. In sum, our findings with adults help to explain prior mixed findings regarding the link between non-symbolic number sense and math and highlight the need to carefully consider the stimulus generation protocols used for numerosity comparison tasks and the choice of math measures. Future studies should explore how these factors impact the link between non-symbolic number sense and math through development.

The supplemental materials contain the preregistration protocol, the materials and script used in the numerosity comparison tasks and the cleaned dataset (for access see

This research was supported by a grant from the Fund for Scientific Research-Flanders awarded to Bert Reynvoet and Delphine Sasanguie.

There are several ways in which the stimuli can be created with each software. The details of the version that was used here can be found in the methods section and the experimental scripts are available on OSF (see

Nine participants received only five practice trials due to a programming error; however, they were not excluded from analyses.

The authors have declared that no competing interests exist.

The authors have no additional (i.e., non-financial) support to report.