
Numeracy, as measured by performance on the non-symbolic numerical comparison task, is a key construct in numerical and mathematical cognition. The current study examines individual variation in performance on the numerical comparison task. We contrast the hypothesis that performance on the numerical comparison task is primarily due to more accurate representations of numbers with the hypothesis that performance depends primarily on decision-making factors. We present data from two behavioral experiments and a mathematical model. In both behavioral experiments we measure the precision of participants’ numerical value representation using a free response estimation task. Taken together, the results suggest that individual variation in numerical comparison performance is not predicted by variation in the precision of participants’ numerical value representation.

Learners’ performance on non-symbolic numerical comparison tasks is used to define the learner’s

Characterization of the cognitive processes involved in completing the numerical comparison task is essential for theory and application and will contribute to an explanation of variation in learners’ numerical comparison ability. Additionally, it will help researchers to determine

Cognitive processes and representations that may contribute to performance include executive functioning and inhibition (

The current study includes two behavioral experiments and a mathematical model of the non-symbolic numerical comparison and non-symbolic numerical estimation tasks. Our approach assumes that participants have a mechanism for representing relative values that can be used in completing these numerical tasks. We focus on value representation that may be constructed from a combination of numerical and non-numerical information in order to make accurate and ecologically valid conclusions. Participants’ use of non-numerical information on numerical tasks is supported by prior work (e.g.,

We evaluate evidence for two cognitive processes, precision of number representation and decision-making threshold, which may contribute to completing the numerical comparison task. Both are characterized at an algorithmic level of analysis (

Learners with a more precise representation of number values are better able to make distinctions between numerical stimuli and answer correctly on a number comparison task (e.g.,

We also consider the hypothesis that individual differences in numerical comparison task performance are primarily due to variation in

We do not assume the Precision and Decision processes to be mutually exclusive. The current study evaluates the degree to which these two processes account for behavioral data across two numerical tasks. The behavioral experiments examine how participants’ accuracy and precision of number estimation relate to numerical comparison (e.g.,

Across two experiments we combine behavioral and modeling data to examine the possibility that variation in numerical comparison performance is driven primarily by individual differences in the precision of numerical representations. We also consider the possibility that variation in numerical comparison is primarily driven by variation in the decision-making processes and not representations of number value. There is mixed evidence in prior work regarding the relationship between numerical comparison and estimation task performance. In some cases, no relationship between numerical comparison accuracy and estimation accuracy is found (

In addition to estimation accuracy, we calculated the variation of participants’ estimates, i.e., the precision (

If the variation in participants’ performance is primarily due to variation in the precision of their numerical representations, we expect a strong correlation between accuracy on the comparison task and estimation precision in the estimation task. If individual differences are due to variation in decision-making thresholds, we would not expect a significant correlation between accuracy on the comparison task and estimation precision. Numerical representation precision may not be all that there is to the estimation task or comparison task. The participant must map their internal representation of the stimuli to an output. In the case of the estimation task, the output is a specific cardinal value. For each participant, the model will fit their performance on the comparison and estimation tasks simultaneously. The question to be examined is how well adjustments to the neural representation precision fit participants’ data relative to the fit when decision-making parameters are also adjusted. Does including a decision-making evidence parameter significantly improve model fit to participants’ data on one or both tasks?

Participants (

Stimuli were 96 visually presented pairs of square arrays with a midline separator. Shape arrays ranged in number from 23 to 111 (see the

Stimuli were 64 visually presented shape arrays. The number of objects ranged from 23 to 111 (see

Participants’ performance was calculated as the number of correct responses on the task. Performance ranged from 32% to 84% correct. For the remaining analysis, we only consider participants with performance statistically above chance (58%) on the numerical comparison task (
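The above-chance cutoff can be checked with a one-sided binomial test against chance (p = .5). The sketch below assumes the 96 comparison trials reported for this task and a one-sided α of .05; the exact criterion behind the paper's 58% cutoff is not stated here, so these are assumptions for illustration.

```python
from math import comb

def min_above_chance(n_trials, alpha=0.05):
    """Smallest number of correct responses whose one-sided binomial
    tail probability under chance (p = .5) falls below alpha."""
    for k in range(n_trials + 1):
        # P(X >= k) under Binomial(n_trials, 0.5)
        tail = sum(comb(n_trials, i) for i in range(k, n_trials + 1)) / 2 ** n_trials
        if tail < alpha:
            return k
    return n_trials

k = min_above_chance(96)          # Experiment 1 used 96 comparison pairs
print(k, round(k / 96, 3))        # threshold in counts and as a proportion
```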

Participants’ performance was calculated using the deviations between the participants' response and the actual number of shapes displayed. Participants' mean deviation ranged from 7.78 to 44.51 with a median of 17.33. Deviations can also be calculated in terms of proportions (e.g., a response of 50 when 40 items were displayed would be a deviation of 0.25). Participants' mean deviation in terms of proportion difference ranged from 0.125 to 0.811 with a median of 0.255.
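The proportional-deviation score described above is straightforward to compute; a minimal sketch (the function name is illustrative, not from the original analysis code):

```python
def mean_proportional_deviation(responses, targets):
    """Mean absolute deviation of estimates from the displayed number,
    expressed as a proportion of the displayed number."""
    devs = [abs(r - t) / t for r, t in zip(responses, targets)]
    return sum(devs) / len(devs)

# The paper's own example: responding 50 to a 40-item array -> 0.25
print(mean_proportional_deviation([50], [40]))
```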

We also calculated the variation in participants’ responses, separately from accuracy (

We evaluated the relationship between participants' behavior on the two tasks using a linear regression with estimation deviation (i.e., accuracy), estimation precision, and participant age as predictors of numerical comparison (

Independent Variable | B | 95% CI | t | p | |
---|---|---|---|---|---|
Estimation Accuracy | -0.04 | [-0.03, 0.04] | 0.32 | .753 | 0.15 |
Estimation Precision | 0.14 | [-0.85, 0.53] | 0.44 | .647 | 0.06 |
Age | 0.58 | [-0.21, 1.38] | 1.47 | .147 | 0.11 |
Accuracy*Precision | -0.01 | [-0.03, 0.01] | 0.85 | .397 | 0.28 |
Precision*Age | -0.82 | [-2.09, 0.45] | 1.29 | .201 | 0.48 |
Accuracy*Age | -0.03 | [-0.08, 0.01] | 1.67 | .101 | 0.80 |
Accuracy*Precision*Age | 0.02 | [-0.04, 0.09] | 0.77 | .440 | 0.23 |

The data suggest that there is not a strong relationship between non-symbolic numerical comparison performance and the precision of participants' numerical representations. A Bayes factor analysis suggests evidence for the null hypothesis, BF = 0.126 for the regression model with Estimation Accuracy and Precision as predictors.

Participants (

Participants completed three tasks during the experimental session: the Numerical Comparison task, the Free Response Estimation task, and the Test of Early Mathematics Ability, 3rd Edition (TEMA-3)

Stimuli were 90 visually presented pairs of shape arrays with a midline separator. Shape arrays ranged in number from 23 to 111 (see

Stimuli were 40 visually presented shape arrays. Arrays included randomly placed black squares of varying sizes. The number of objects ranged from 23 to 111 (see

Participants completed the

Participants’ performance on the numerical comparison task ranged from 60% to 94% correct, with a median of 80%.

We eliminated any trial for which participants did not make a response, representing 4% of trials. We also eliminated responses in the top 5% of estimates, as many of these appeared to be typos, e.g., ‘500000'. Participants’ performance on the estimation task was calculated in the same manner as in Experiment 1. Accuracy was calculated by taking the absolute value of the difference between the target value and the participant’s estimate and dividing by the target value. This gives a ratio-difference score, e.g., an estimate of 13 for the target value 10 produces a score of 0.3. Participant accuracy ranged from 0.30 to 1.06 with a median of 0.60.

We also calculated the variation in participants’ responses, separately from accuracy. Participants’ precision scores were calculated as described in Experiment 1. Participants’ precision scores ranged from 0.29 to 0.84 with a median of 0.56. Precision and accuracy scores were significantly correlated, such that greater precision was associated with higher accuracy,

Participants’ performance on the TEMA was calculated using the scoring instructions. Participants’ scores ranged from 85 to 132 with a median of 114.

We conducted a linear regression to predict participants’ numerical comparison score (arcsine transformation of the proportion of correct responses) using estimation accuracy, estimation precision, age, and TEMA score as predictors. Given the regression analysis to be performed, 30 participants are sufficient to detect a large effect size (

Independent Variable | B | 95% CI | t | p | |
---|---|---|---|---|---|
Estimation Accuracy | -0.02 | [-0.35, 0.29] | 0.17 | .86 | 0.06 |
Estimation Precision | 0.06 | [-0.32, 0.46] | 0.35 | .73 | 0.04 |
TEMA score | 0.0002 | [-0.004, 0.005] | 0.10 | .91 | 0.04 |
Age | -0.04 | [-0.13, 0.05] | 0.86 | .39 | 0.34 |

We found that adults’ performance on the estimation task was significantly different in terms of accuracy (

For both Experiments 1 and 2, we find no statistically significant relationship between participants’ performance on the non-symbolic numerical comparison task and estimation task scores. Because the behavioral results for Experiments 1 and 2 rely on the interpretation of a null effect, we used a Bayes factor approach. While the two experiments had a range of participant ages, we found the same pattern across the analyses for adults and children. In both cases, Bayes factor values suggest evidence for the null effect when compared to the tested regression models. We also do not find that Experiment 2 participants’ TEMA scores significantly predicted numerical comparison scores, despite prior evidence of a connection (

Current study results differ from prior work, which reported a significant relationship between estimation variability and number comparison but not between estimation accuracy and number comparison (

We interpret the results of this experiment as inconsistent with the Precision hypothesis.

The lack of significant correlation between estimation precision and numerical comparison suggests that numerical representation precision is not the

The purpose of the modeling experiment is to demonstrate how well the processes proposed by the Precision and Decision hypotheses fit the behavioral data from both Experiments 1 and 2. We evaluated the Precision hypothesis and the alternative Decision hypotheses using a dynamic neural field model (e.g.,

An important point here is that the current model is much stricter than prior models of numerical comparison (

The model was implemented using MATLAB (MathWorks). The architecture was a multilayered dynamic systems model (e.g.,
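The paper's model was implemented in MATLAB; as a rough illustration of the general class of model, a single Euler step of a one-dimensional dynamic neural field with Gaussian local excitation and global inhibition might look like the following Python sketch. All parameter values and the single-field simplification are assumptions for illustration, not the paper's specification.

```python
import math

def dnf_step(u, inputs, dt=1.0, tau=10.0, h=-5.0,
             c_exc=15.0, c_inh=5.0, sigma=3.0):
    """One Euler step of a 1-D dynamic neural field:
    tau * du/dt = -u + h + input + local excitation - global inhibition,
    where lateral interactions act on the sigmoided field output.
    (Illustrative parameters only; not the paper's specification.)"""
    n = len(u)
    f = [1.0 / (1.0 + math.exp(-ui)) for ui in u]  # gated (sigmoided) output
    total_f = sum(f)                               # drives global inhibition
    new_u = []
    for i in range(n):
        # Gaussian-weighted excitation from every field site
        exc = sum(c_exc * math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2)) * f[j]
                  for j in range(n))
        du = -u[i] + h + inputs[i] + exc - c_inh * total_f
        new_u.append(u[i] + (dt / tau) * du)
    return new_u
```

Iterating `dnf_step` with a localized input bump drives a peak at the input location; a multilayer architecture like the one described here couples several such fields.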

Each trial comprised 600 time-steps, a number selected to be large enough for activity from the input layers to produce a decision in the output layer. A decision was defined as the decision layer producing a steady peak (activity with a peak value at the same layer index for 10 consecutive time-steps). The time-step of the decision was converted to the predicted reaction time of the decision. Thus, on trials in which the model predicted a fast decision, the steady peak was reached at a relatively low time-step; on trials in which the model predicted a slower decision, the steady peak was reached at a higher time-step.
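The steady-peak criterion can be sketched as follows (a Python illustration rather than the paper's MATLAB; the list-of-activation-vectors input format is an assumption):

```python
def decision_time(activity_by_step, hold=10):
    """Return the time-step at which the decision layer first holds a
    steady peak (same argmax index for `hold` consecutive steps), or
    None if no decision is reached within the trial.

    activity_by_step: one activation vector per time-step.
    """
    last_peak, run = None, 0
    for t, activity in enumerate(activity_by_step):
        peak = max(range(len(activity)), key=activity.__getitem__)
        run = run + 1 if peak == last_peak else 1
        last_peak = peak
        if run >= hold:
            return t  # time-step of the decision; convertible to a predicted RT
    return None
```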

Model instantiations were fit to behavioral data from Experiment 2 using an evolutionary optimization algorithm. Model instantiations were run in batches of 10, each corresponding to a generation. For each generation, the model instantiations were ranked based on their deviation from the behavioral data; instantiations with smaller deviations (smaller error) were ranked higher. For each generation, instantiations ranked 1–2 were moved forward as-is to the next generation. Instantiations ranked 3–5 were ‘mutated' by adjusting their specifications by a small random amount. Instantiations ranked 6–10 were discarded. Thus each generation included 5 new instantiations with randomly generated specifications, 3 ‘mutated’ instantiations, and 2 instantiations carried over from the previous generation. The specifications of the evolutionary algorithm were selected to maximize its efficiency and keep the number of generations needed relatively low.
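The generation scheme described above can be sketched generically as follows (an illustration only; the function names and the toy one-parameter objective are not from the paper):

```python
import random

def evolve(random_spec, mutate, error, generations=50):
    """Rank 10 instantiations by error each generation; ranks 1-2 carry
    over as-is, ranks 3-5 are mutated, ranks 6-10 are replaced by
    freshly randomized specifications."""
    population = [random_spec() for _ in range(10)]
    for _ in range(generations):
        population.sort(key=error)                      # smaller error ranks higher
        keep = population[:2]                           # ranks 1-2, unchanged
        mutated = [mutate(s) for s in population[2:5]]  # ranks 3-5, perturbed
        fresh = [random_spec() for _ in range(5)]       # ranks 6-10, replaced
        population = keep + mutated + fresh
    return min(population, key=error)

# Toy usage: recover x = 3 by minimizing squared error
best = evolve(lambda: random.uniform(-10, 10),
              lambda x: x + random.gauss(0, 0.1),
              lambda x: (x - 3) ** 2,
              generations=200)
```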

The same process was employed for modeling behavioral data from Experiment 1 (adult participants) and Experiment 2 (child participants).

The Precision condition model instantiations’ performance on the numerical comparison task ranged from 0.46 to 0.72 with a median of 0.62. Performance on the estimation task, in terms of average proportional deviation from the target, ranged from 0.08 to 0.27 with a median of 0.15. We found that performance on the comparison task correlated with the neural tuning curve width

The Decision condition model instantiations’ performance on the numerical comparison task ranged from 0.50 to 0.86 with a median of 0.65. Performance on the estimation task, in terms of average proportional deviation from the target, ranged from 0.08 to 0.23 with a median of 0.14. We found that performance on the comparison task correlated with the neural tuning curve width

Model data was evaluated using a similar analysis to the behavioral data. Each model version produced independent simulations of the numerical comparison and estimation task. We compared results for the Precision condition models (

To compare model fit for Precision and Decision conditions we compared the deviation from human data for both tasks. The model error for the numerical comparison task was significantly lower for the Decision condition models (median = 0.01) than for the Precision condition models (median = 0.2),

We calculated overall model error by combining the numerical comparison and estimation error amounts. The overall model error was calculated as Error_{Comparison} + Error_{Estimation} / 100

These results show that the Decision condition models are better able to fit adult participants’ data for the numerical comparison task. Fit to participant data for the estimation task was equivalent. This suggests that the additional decision layer specification was only relevant to model fit for the numerical comparison task, and that it leads to superior model fit compared to the use of neural tuning curve precision.

The Precision condition model instantiations’ performance on the numerical comparison task ranged from 0.58 to 0.72 with a median of 0.67. Performance on the estimation task, in terms of average proportional deviation from the target, ranged from 0.08 to 0.25 with a median of 0.13. We found that performance on the comparison task did not significantly correlate with the neural tuning curve width,

The Decision condition model instantiations’ performance on the numerical comparison task ranged from 0.57 to 0.93 with a median of 0.77. Performance on the estimation task, in terms of average proportional deviation from the target, ranged from 0.08 to 0.22 with a median of 0.12. We found that performance on the comparison task did not significantly correlate with the neural tuning curve width

Model data was evaluated using a similar analysis to the behavioral data. Each model version produced independent simulations of the numerical comparison and estimation task. We compared results for the Precision condition models (

To compare model fit for Precision and Decision conditions we compared the deviation from human data for both tasks. The model error for the numerical comparison task was significantly lower for the Decision condition models (median = 0.03) than for the Precision condition models (median = 0.15),

We calculated the overall model error by combining the numerical comparison and estimation error amounts. The overall model error was calculated as Error_{Comparison} + Error_{Estimation} / 100

These results show that the Decision condition models are better able to fit participants’ data for the numerical comparison task. Fit to participant data for the estimation task was equivalent. This suggests that the additional decision layer specification was only relevant to model fit for the numerical comparison task, and that it leads to superior model fit compared to the use of neural tuning curve precision.

The mathematical modeling results demonstrate that the Decision model instantiations fit both adults’ and children’s data significantly more closely than the Precision model instantiations. Put more generally, a mathematical model that includes specifications for both numerical representation and decision-making is a better fit to human data than a model that only includes numerical representation. The modeling results suggest that the behavioral data reported in Experiments 1 and 2 cannot be well characterized using only neural tuning curve precision. This is in contrast with the apparent success of using neural tuning curves to model the numerical comparison task (

The current study evaluated two models of the processes involved in comparing non-symbolic numbers. Results from both the empirical and mathematical experiments are inconsistent with the hypothesis that numerical comparison performance is primarily characterized by variation in neural tuning curve precision. We find that participants’ performance on free response estimation, used as an estimate of tuning curve precision, does not correlate with numerical comparison performance. Mathematical modeling results demonstrate that variation in the decision-making process can better account for participants’ numerical comparison scores above and beyond variation in neural tuning curve precision. We interpret these results as inconsistent with the Precision hypothesis. Individual variation in performance on the numerical comparison task is not primarily due to variation in tuning curve precision.

The current results provide important evidence regarding the processes involved in non-symbolic numerical comparison. The current and recent results suggest that numerical representation precision does not play the primary role in the numerical comparison task. This contradicts some previous speculation about the role of neural tuning curve precision in numerical tasks (

If the individual variation in numerical comparison accuracy is due to decision-making more so than number representation, what does that tell us? The importance of decision-making in numerical comparison may inform the design of interventions to improve learners’ performance on numerical tasks. Individual variation in numerical decision-making may contribute to the association between numerical comparison skill and general mathematical skill. Learners’ skill at numerical decision-making may contribute to performance on a wide range of numerical and arithmetic tasks.

If learners’ performance on the numerical comparison task can be characterized without invoking their representations of number values, it calls into question the source of the correlation between numerical comparison skill and later arithmetic skills. Recent meta-analyses show mixed evidence that numerical comparison skill, in and of itself, predicts later performance (

How do the numerical comparison measures used here relate to other work? The non-symbolic numerical comparison task has varying relationships to other measures depending on the details of the task (e.g.,

The smaller sample size of the child participant data may contribute to a possible Type I error. It is also possible that adult and child participant results vary because of developmental changes in the relationship between estimation accuracy and precision. Given the scope of the current data, we suggest caution in interpreting differences between the adult and child participants.

Reliability of measures was calculated using a split-half Spearman correlation. For the data in Experiment 1, we calculated the Spearman coefficient using split half as
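As an illustration of the split-half procedure (the exact split used here is cut off above, so the odd/even-half framing and the Spearman-Brown step-up are assumptions), a dependency-free sketch:

```python
def average_ranks(xs):
    """Ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(xs)), key=xs.__getitem__)
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def split_half_reliability(half_a, half_b):
    """Correlate per-participant scores from the two halves, then apply
    the Spearman-Brown step-up for full test length."""
    r = spearman(half_a, half_b)
    return 2.0 * r / (1.0 + r)
```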

Other models of decision-making, such as drift diffusion models, are fairly successful at weighing evidence in two-alternative decision-making (

23, 29, 33, 37, 69, 87, 99, 111

The author has no funding to report.

The author has declared that no competing interests exist.

The author has no support to report.