The Relationship Between Children’s Approximate Number Certainty and Symbolic Mathematics

Why do some children excel in mathematics while others struggle? A large body of work has shown positive correlations between children’s Approximate Number System (ANS) and school-taught symbolic mathematical skills, but the mechanism explaining this link remains unknown. One potential mediator of this relationship might be children’s numerical metacognition: children’s ability to evaluate how sure or unsure they are in understanding and manipulating numbers. While previous work has shown that children’s math abilities are uniquely predicted by symbolic numerical metacognition, we focus on the extent to which children’s non-symbolic/ANS numerical metacognition, in particular sensitivity to certainty, might be predictive of math ability, and might mediate the relationship between the ANS and symbolic math. A total of 72 children aged 4–6 years completed measures of ANS precision, ANS metacognition sensitivity, and the Test of Early Mathematical Ability (TEMA-3). Our results replicate many established findings in the literature, including the correlation between ANS precision and the TEMA-3, particularly on the Informal subtype questions. However, we did not find that ANS metacognition sensitivity was related to TEMA-3 performance, nor that it mediated the relationship between the ANS and the TEMA-3. These findings suggest either that metacognitive calibration may play a larger role than metacognitive sensitivity, or that metacognitive differences in non-symbolic number perception do not robustly contribute to symbolic math performance.

beyond the contributions of IQ, executive function, and domain-general metacognitive abilities (e.g., knowledge of optimal memory strategies).
If individual differences in symbolic mathematics are uniquely predicted by numerical metacognition, and if the ANS generates an approximate numerical confidence signal, then one possible explanation for the correlation between these two abilities may be that individual differences in metacognitive sensitivity mediate the relationship between the ANS and symbolic math. Indeed, one previous finding suggests that individual differences in ANS metacognition correlate with a standardized symbolic math test (the Test of Early Mathematical Ability), which has also previously been shown to robustly correlate with ANS precision (Vo et al., 2014).
However, this previous work did not concurrently measure individual differences in ANS precision, leaving open the possibility that ANS metacognition contributes to symbolic mathematics independently of the ANS itself, or that the observed correlation between the ANS and symbolic math may be entirely explained by the correlation between ANS precision and ANS metacognition.
The existing work relating symbolic and non-symbolic numerical metacognition to math to date has also relied on measures of metacognitive calibration: how well children's internal certainty states match their actual accuracy. However, it is well known that measures of calibration combine metacognitive sensitivity (how well children can discriminate between their own certainty states, also referred to as metacognitive resolution or precision) and metacognitive bias (a general tendency to over/under-estimate certainty), two distinct components of metacognitive ability (for discussion, see Nelson, 1984). One possibility not yet considered in the literature is that these components of metacognition may have different relationships with numerical and/or mathematical reasoning (see Winman, Juslin, Lindskog, Nilsson, & Kerimi, 2014). For instance, it may be more important for children to be well-attuned to their own certainty (sensitivity) than it is for children to be able to accurately report their certainty (bias), or vice versa.
One strategy to separate metacognitive sensitivity from metacognitive bias is to use statistics largely drawn from Signal Detection Theory (e.g., gamma, Nelson, 1984; A'ROC, Vo et al., 2014; d', Maniscalco & Lau, 2012; Salles, Ais, Semelman, Sigman, & Calero, 2016), though these measures are not as effective if children do not report both high and low certainty with relatively equal frequency (a common occurrence given children's documented overconfidence in many tasks, including number reasoning; Destan & Roebers, 2015; Nelson, 1984; Vo et al., 2014). Another strategy, which we adopt here, is to experimentally isolate metacognitive sensitivity by asking children to compare their certainty between two trials, rather than reporting their certainty in a single trial (see Nelson, 1984; Nelson & Narens, 1980). These "relative" metacognitive assessments have been used in memory and perception research for several decades (Barthelmé & Mamassian, 2009; Butterfield, Nelson, & Peck, 1988; De Gardelle & Mamassian, 2014; Lipowski, Merriman, & Dunlosky, 2013), and have recently been shown to be appropriate for measuring children's ANS metacognitive sensitivity separately from their calibration abilities (Baer et al., 2018; Baer & Odic, 2019).
Here, we sought to replicate and extend the existing work on the relationship between ANS metacognition and symbolic mathematics by concurrently measuring individual differences in three abilities (Figure 1): 1) ANS precision, measured through a classic assessment asking children to report the more numerous of two arrays of dots; 2) ANS metacognitive sensitivity, measured through a relative confidence task in which children decide which of two available trials they are more certain of getting right (Baer et al., 2018; Baer & Odic, 2019); and 3) the Test of Early Mathematics Ability, 3rd Edition (TEMA-3; Ginsburg & Baroody, 2003), an age-standardized measure of symbolic mathematics ability appropriate for preschool-aged children.
Beyond seeking to replicate the previously documented relationship between ANS precision and the TEMA-3, we also tested whether this relationship might be mediated by individual differences in ANS metacognition sensitivity. Furthermore, because previous work has shown that ANS precision selectively predicts the informal scale of the TEMA-3 (e.g., children's counting and basic arithmetic skills) over the formal scale (e.g., more advanced abilities such as word problems), our secondary goal was to test whether ANS metacognition contributes separately to formal vs. informal components of the TEMA-3.
such as Law or Medicine, data unavailable for the remaining 13 children). We did not ask parents to report the grade level of their child, but we estimate based on their birthdays that 10 were in grade 1, 27 were in kindergarten, and the remaining 35 were in preschool or in-home care at the time of testing.

Materials and Procedures
The study consisted of three tasks in a set order: 1) the ANS Discrimination Task; 2) the Relative ANS Metacognition Task; and 3) the TEMA-3. The ANS Discrimination Task and the Relative ANS Metacognition Task were both presented on an 11.3" Apple Air laptop computer using custom-made scripts in Psychtoolbox-3 (Brainard, 1997). These scripts are freely available online; see the Supplementary Materials section. The TEMA-3 was subsequently administered by the researcher in the same room and followed the standardized testing guidelines set by the test. After completing the TEMA-3, children would "trade in" the coins they earned during the study in exchange for a small prize (a book, stuffed animal, or shirt) and a diploma as a present for their participation (all children were given the same choice of prizes no matter how many coins they accrued).

ANS Discrimination Task
To measure the precision of each child's intuitive sense of number, we used a 30-trial dot comparison task that is widely used in the literature on the ANS (Odic & Starr, 2018). Each trial consisted of a group of yellow dots on the left side of the screen and a group of blue dots on the right (see Figure 1) presented for 1,200 ms, and children were subsequently asked to select the group that had more dots without counting.
Dot area varied randomly between 0.35° and 1.82° visual angle, measured at a viewing distance of 40 cm (though we did not restrict children's viewing distance, we estimate that most children sat about this distance from the monitor). The rectangle within which the dots were drawn subtended approximately 10.1° × 20.9° visual angle. The cumulative area and density of the dots were controlled for across trials; on half the trials the cumulative area and density were Congruent with the side that had more dots (e.g., if the left side had twice as many dots, it also had twice the cumulative area), and on the other half the cumulative area and density were Incongruent with the side that had more dots (e.g., if the left side had twice as many dots, it also had half the cumulative area). To vary difficulty and draw out individual variability associated with the precision of children's ANS, we presented 5 numerical ratios: 1.13 (e.g., 8 yellow dots and 9 blue dots), 1.33, 1.5, 2.0, and 3.0. Children received pre-recorded accurate positive or negative feedback from the computer (e.g., "That's right!" or "Oh, that's not right!") to motivate them to complete as many trials as possible, and were rewarded with a coin in their cup for each correct answer. Occasionally, the experimenter would give feedback to encourage the child to stay engaged in the task (e.g., "That's okay, let's do another one!"). To avoid the influence of motor development on the results, children either verbally identified or pointed to the set with more dots, while the experimenter pushed a corresponding button to indicate their answer. Spearman-Brown split-half corrected reliability for this task was .65.
Note. In the Number Discrimination Task (Section a), children indicated which color had more dots. In the Relative Approximate Number System Metacognition Task (Section b), children selected the question they expected to answer correctly, then answered only that question.
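The split-half reliability reported for this task can be illustrated with a minimal sketch. It assumes an odd-even split of each child's trials (the source does not state how the halves were formed), and the function names and demo data are hypothetical, not the authors' analysis scripts.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to the reliability of the full-length test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(trials_by_child):
    """trials_by_child: one list of 0/1 trial outcomes per child.

    Assumption: an odd-even split; the source does not say how trials were halved.
    """
    first = [sum(t[0::2]) / len(t[0::2]) for t in trials_by_child]
    second = [sum(t[1::2]) / len(t[1::2]) for t in trials_by_child]
    return spearman_brown(pearson_r(first, second))


# Hypothetical outcomes for four children, six trials each (1 = correct).
demo = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
]
print(round(split_half_reliability(demo), 2))
```

Inverting the correction, a corrected reliability of .65 corresponds to a correlation of roughly .48 between the two halves (.65 / (2 − .65)).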

Relative ANS Metacognition Task
This task was modeled after existing work in adults (e.g., Barthelmé & Mamassian, 2009; De Gardelle et al., 2016; De Gardelle & Mamassian, 2014) and children (e.g., Baer et al., 2018; Butterfield et al., 1988; Lipowski et al., 2013), which measures individual differences in metacognitive sensitivity. To experimentally isolate metacognitive sensitivity from other components of metacognition, including calibration/bias, children are not asked to provide a report of their confidence in a single item, but rather select one of two items that they feel more confident in answering (for discussion, see Nelson, 1984). This task logic stems from Signal Detection Theory, and is also the origin of the ANS Discrimination Task we use (we can isolate numerical reasoning from under-estimation biases or a lack of knowledge of number words by asking children to indicate which of two sets is more numerous). Previous work has shown that this paradigm reliably measures individual differences in certainty sensitivity (the smallest difference in certainty that children can identify) while alleviating issues with over- or under-confidence; since children are always selecting the trial they are more certain in, their general tendency to be over- or under-confident across the board does not impact performance on the task (see Baer et al., 2018).
In each of 30 trials, children were shown two screenshots of dot comparison trials (as in the ANS Discrimination Task described above, except that each screenshot subtended approximately 5.87° by 9.38° visual angle) that varied in numerical ratio, and were asked to select one question to answer (see Figure 1). Children would verbally identify or point toward the side containing the trial that they wished to attempt. Given previous work (Baer et al., 2018; Baer & Odic, 2019) and the coin reward structure, we expected children to be motivated to choose the higher certainty (i.e., easier) of the two dot comparison questions. Afterwards, the trial children identified would zoom in and cover the entire screen, and children would be asked to identify which set (the blue or the yellow) had more dots, in the identical manner to the ANS Discrimination Task above. Children received no feedback for their trial choice (i.e., the metacognitive part of the task), but did receive pre-recorded positive and negative feedback from the computer for the dot comparison trial they chose and would receive a coin if they answered the selected trial correctly. As in the ANS Discrimination Task, the experimenter would occasionally give feedback to encourage the child to stay engaged in the task (e.g., "That's okay, let's do another one!").
To measure which children were better able to distinguish smaller and smaller differences in their certainty, we varied the difference between the two presented trials by controlling their "metaratio" (i.e., the ratio of the ratios for the two presented dot comparison trials). On each trial, children were presented with one of five metaratios: 1.1 (e.g., ratio 1.33 vs. ratio 1.2), 1.25, 1.5, 2.0, and 4.0. While the large metaratios, such as 4.0, present children with an extremely easy versus an extremely difficult trial, the smaller metaratios, such as 1.1, present children with two trials that are both intermediately difficult and that could only be differentiated if children are representing certainty at a very narrow grain. Spearman-Brown split-half corrected reliability for this task was .55. The dependent variable in this task was accuracy in choosing the easier (larger) ratio.
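The metaratio and the dependent variable described above can be made concrete with a short sketch. The function names, the trial tuple layout, and the demo trials are hypothetical and chosen only for illustration; they are not taken from the authors' scripts.

```python
def metaratio(ratio_a, ratio_b):
    """Ratio of the two trials' numerical ratios, always >= 1."""
    return max(ratio_a, ratio_b) / min(ratio_a, ratio_b)


def chose_easier(ratio_left, ratio_right, choice):
    """True if the child picked the trial with the larger (easier) ratio.

    choice is 'left' or 'right'.
    """
    easier_side = "left" if ratio_left > ratio_right else "right"
    return choice == easier_side


def metacognitive_accuracy(trials):
    """Proportion of trials on which the easier ratio was chosen.

    trials: iterable of (ratio_left, ratio_right, choice) tuples.
    """
    trials = list(trials)
    return sum(chose_easier(*t) for t in trials) / len(trials)


# The example from the text: ratio 1.33 vs. ratio 1.2 gives a metaratio near 1.1.
print(round(metaratio(1.33, 1.2), 2))

# Hypothetical choices on four trials.
demo_trials = [
    (2.0, 1.5, "left"),
    (1.2, 1.33, "left"),
    (3.0, 1.13, "left"),
    (1.5, 2.0, "right"),
]
print(metacognitive_accuracy(demo_trials))
```

Because the score depends only on which of the two trials is chosen, a child who is uniformly over- or under-confident is not penalized, which is the property the relative design relies on.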

TEMA-3
To assess children's math skills, all children received Form A of the TEMA-3 (Ginsburg & Baroody, 2003), which has been commonly used to examine the relationship between the ANS and math performance (e.g., Libertus et al., 2013; Vo et al., 2014). The TEMA-3 is a standardized test of mathematics ability normed for children aged 3–8 years that is administered verbally by the experimenter with simple materials like tokens or flashcards. Following the standardized guidelines for the TEMA-3, the experimenter would begin the test at an age-normed question and proceed to ask harder and harder questions until the child hit a ceiling level of performance; subsequently, the experimenter would ask easier questions until the child hit the base level.
Raw scores on the test were converted to an age-normalized score centered at 100. In addition, we calculated two subscale scores: Informal Score, which includes basic counting skills, number comparisons using spoken words or objects, and informal calculation using objects; and the Formal Score, which relies on concepts that are typically only learned through schooling, such as reading and writing numbers, memorized facts about common calculations, and solving written equations. Because no age norms exist for these subscales, we report the average number of questions correctly solved by the child; all analyses involving Informal and Formal scores instead statistically control for age.

ANS Discrimination Task
Consistent with past work on children's developing sense of number Odic, 2018),
Note. Ages are binned for presentation, but treated continuously in our analyses. Error bars represent 1 SE.

Relative ANS Metacognition Task
In the Relative ANS Metacognition Task, children chose the easier question 56% of the time (SD = 14%), a rate significantly higher than chance of 50%, t(71) = 3.88, p < .001, d = 0.46. This is consistent with past work using this paradigm and suggests that children chose the trial they were more confident in answering correctly.
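The comparison against chance reported above is a one-sample t-test with Cohen's d as the effect size. A minimal sketch, assuming per-child proportions of "easier" choices as input; the demo values are hypothetical and the function name is ours.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(scores, chance=0.5):
    """One-sample t-test of per-child scores against a chance level.

    Returns (t, cohens_d, degrees_of_freedom). No p-value is computed here.
    """
    n = len(scores)
    d = (mean(scores) - chance) / stdev(scores)  # Cohen's d for one sample
    t = d * sqrt(n)                              # same as (mean - chance) / (sd / sqrt(n))
    return t, d, n - 1


# Hypothetical proportions of "easier" choices for six children.
demo_scores = [0.47, 0.60, 0.53, 0.70, 0.50, 0.63]
t, d, df = one_sample_t(demo_scores)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")
```

The identity t = d × √n links the two reported statistics: with 72 children, t = 3.88 implies d ≈ 0.46, matching the values in the text.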

Also consistent with past work, we find large individual differences in this task: while some children could only distinguish very large differences between the ratios (i.e., only between trials they were "very sure" vs. "not at all sure"), other children could reliably distinguish even small differences (i.e., trials they were "very sure" vs. "somewhat sure"; Baer et al., 2018; Baer & Odic, 2019; see Note ii). A Greenhouse-Geisser corrected repeated-measures ANCOVA with Metaratio (i.e., the ratio between ratios) as the IV and Age as a covariate showed that accuracy on this task was ratio-dependent, F(3.48, 243.70) = 33.21, p < .001, ηp² = .32 (see Figure 2b), but that accuracy did not improve with Age, F(1, 70) = 2.86, p = .095, ηp² = .04; there was also no significant Metaratio by Age interaction, F(3.48, 243.70) = 0.80, p = .511, ηp² = .01. We also found no correlation between children's choice of trials on the ANS Metacognition Task and ANS accuracy when controlling for age, r(69) = .06, p = .598, consistent with previous reports that children's metacognition on ANS tasks taps into more domain-general metacognitive components (Baer et al., 2018). These results broadly replicate previous work showing that as the difference in certainty between two trials grows, children are increasingly sensitive to this difference.

TEMA-3
Children's TEMA-3 age-normed scores averaged 107.75 (SD = 13.14), reflecting a slightly above-norm sample. Note that the TEMA-3 does not provide age norms for the Informal and Formal subscales, and hence we report only the raw scores for these two subscales.

Regression Analyses
Next, we examined whether ANS accuracy and Relative ANS Metacognition account for variability in TEMA-3 performance, first for the age-normed TEMA score, and then for the Informal and Formal subscales. We replicate the previously documented relationship between ANS precision and children's TEMA-3 scores (see Table 1): a hierarchical linear regression entering Age and ANS accuracy in Step 1 significantly predicted children's standardized TEMA-3 scores, with ANS accuracy as the only significant predictor. However, we found that the addition of ANS Metacognitive accuracy in Step 2 did not add any further predictive power, and that the beta coefficient for this factor is not significantly different from zero (Table 1). In other words, when controlling for individual differences in Age and ANS, non-symbolic ANS Metacognition does not contribute additional explanatory power to the TEMA-3.
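The two-step logic of the hierarchical regression can be sketched as follows: fit Step 1 (Age and ANS accuracy), fit Step 2 (adding Metacognitive accuracy), and compare R². This is an illustrative standard-library implementation under our own assumptions; the helper names and the eight data points are hypothetical, and it does not reproduce the significance tests or the values in Table 1.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(predictors, y):
    """R^2 of an ordinary least squares fit of y on the predictor columns plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + [list(p) for p in predictors]
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(bk * col[i] for bk, col in zip(beta, cols)) for i in range(n)]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def delta_r_squared(step1, added, y):
    """Return (R^2 at Step 1, R^2 at Step 2, change in R^2) for nested models."""
    r2_step1 = r_squared(step1, y)
    r2_step2 = r_squared(step1 + added, y)
    return r2_step1, r2_step2, r2_step2 - r2_step1


# Hypothetical data for eight children (not the study's data).
age = [4.1, 4.5, 5.0, 5.2, 5.6, 6.0, 6.3, 6.8]
ans = [0.70, 0.65, 0.80, 0.75, 0.85, 0.78, 0.90, 0.88]
meta = [0.50, 0.60, 0.55, 0.45, 0.65, 0.52, 0.58, 0.62]
tema = [95, 92, 104, 100, 110, 103, 115, 112]

r2_1, r2_2, change = delta_r_squared([age, ans], [meta], tema)
print(f"Step 1 R2 = {r2_1:.3f}, Step 2 R2 = {r2_2:.3f}, delta R2 = {change:.3f}")
```

Because the models are nested, ΔR² can never be negative; the question the analysis asks is whether the increase at Step 2 is larger than would be expected by chance.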
Previous work has shown that the relationship between the ANS and formal mathematics may be selective for only the Informal subscale of the TEMA-3. To confirm this, we conducted a mixed-level linear regression over the raw formal and informal TEMA scores with Age and ANS accuracy as fixed effects and subjects as random slopes; we dummy-coded the TEMA scores into a Type variable (i.e., "Formal" or "Informal"), and added it as an interaction term for ANS accuracy. Likelihood ratio tests were used for model significance testing: Step 1 was compared to an intercept-only model, Step 2 was compared to the model from Step 1. In Step 1 (ANS accuracy, Type, and Age), the fixed effects accounted for 83% of the variance, with no significant effect of Type or ANS, but a significant effect of Age and a significant ANS × Type interaction (see Table 1 when controlling for age. Furthermore, adding Metacognitive accuracy in Step 2 did not add any additional explanatory power (see Table 1). Therefore, even when examining the two subscales separately, we find that Metacognitive accuracy does not contribute any explanatory power beyond the contributions of ANS accuracy and Age.

Mediation Analyses
Although our initial analytic plan was to conduct a full mediation analysis between ANS accuracy, Metacognitive accuracy, and the TEMA-3, we did not do so as we found that Metacognitive accuracy was not predicted by the ANS, r(69) = .06, p = .598, nor did it predict TEMA-3 scores when considered alone, r(69) = .13, p = .271, or when additionally controlling for ANS accuracy, β = 0.11, p = .333. As a result, Metacognitive accuracy cannot be entered into a formal mediation analysis with these variables (Baron & Kenny, 1986).
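The age-controlled correlations used as preconditions for mediation can be illustrated with a first-order partial correlation. This sketch assumes that the reported r(69) values are partial correlations with Age as the single covariate, which is consistent with the degrees of freedom (72 − 3 = 69) but is not stated explicitly in the text; the function names and demo values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation between x and y after removing the linear effect of z from both."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_r_df(n, n_covariates=1):
    """Degrees of freedom for testing a partial correlation."""
    return n - 2 - n_covariates


# Hypothetical scores for six children: metacognitive accuracy, ANS accuracy, age.
meta = [0.50, 0.62, 0.55, 0.48, 0.66, 0.58]
ans = [0.70, 0.74, 0.81, 0.77, 0.85, 0.90]
age = [4.2, 4.8, 5.1, 5.5, 6.0, 6.6]
print(round(partial_r(meta, ans, age), 2), partial_r_df(len(age)))
```

Under the Baron and Kenny (1986) logic cited above, a candidate mediator must be related to both the predictor and the outcome before a mediation model is informative, which is why near-zero values at this step end the analysis.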

Additional Analyses of the Relative ANS Metacognition Task
Given that the interpretation of our null result is contingent on the Metacognitive Task accurately measuring variability in ANS metacognitive sensitivity, we conducted additional analyses for this measure.
First, to check that children were paying attention during the Metacognitive Task and not responding randomly on every trial, we examined whether the ANS portion of the task (i.e., when children would answer their chosen question) followed similar patterns to the ANS Discrimination task. We found that children chose the side with more dots 83% of the time (SD = 10%), well above chance of 50%, t(71) = 28.74, p < .001, d = 3.92. In fact, a linear regression with the accuracy on the chosen ANS trials from the Metacognitive Task and
Next, because children in this sample were only slightly more likely than chance to select the larger metaratio, we also re-ran our regression analyses with the subset of participants whose performance was significantly above chance (see Inglis, Attridge, Batchelor, & Gilmore, 2011 for a similar approach). We removed all children who chose the larger metaratio 50% or less of the time, leaving a sample of 42 children. Nevertheless, we once again did not find that adding Metacognitive accuracy into a regression with ANS and Age significantly changed R², including for formal vs. informal scores (all ΔR² < .05, all p's > .143). We also again failed to find that Metacognitive accuracy correlated with the ANS, r(39) = −.09, p = .565, or any of the TEMA-3 scores (all r's < .19, all p's > .240).
Finally, as can be seen in Figure 2b, children's performance on the Metacognitive Task varied with metaratio (just like their ANS performance varied by numerical ratio). Because the lowest metaratios may have been too difficult for children and might have been introducing noise into the measure, we also decided to examine children's performance on the Metacognitive task by only taking the largest three metaratios (1.5, 2.0, and 4.0) into consideration. When looking at these metaratios together, children chose the larger (more certain) ratio on 65% of trials (SD = 18%), above chance of 50%, one-sample t (71)  Overall, then, even when looking at these subsamples of our data that maximize children's performance on the metacognitive task, we replicate the main findings that children chose trials to complete using a sense of certainty and answered those trials with their ANS; while the latter component was predictive of symbolic math ability, the former was not.

Discussion
What factors explain the observed relationship between individual differences in children's intuitive number sense and symbolic mathematics? Here, we tested whether non-symbolic/ANS numerical metacognition (that is, the sensitivity of children's ability to decide when they can easily discriminate between ANS trials) might mediate the reported relationship between ANS precision and performance on a standardized test of symbolic mathematics (TEMA-3). While we replicated the previously documented relationship between ANS precision and the TEMA-3, we failed to find any reliable relationship between ANS metacognitive sensitivity and the TEMA-3. We therefore conclude that individual differences in ANS metacognitive sensitivity may not contribute to the relationship between ANS precision and the TEMA-3.
If, as noted in the Introduction, the ANS generates a metacognitive signal (Baer & Odic, 2019; Halberda & Odic, 2014), and if symbolic math performance is at least partly predicted by more general numerical metacognition (Bellon et al., 2019; Rinne & Mazzocco, 2014; Vo et al., 2014), why did we fail to observe a relationship between these variables? We see at least two possible explanations for this failure. One is that our relative metacognition task failed to measure children's metacognitive sensitivity, or at least failed to measure it with sufficient power to detect an existing relationship. This explanation, however, is inconsistent with the fact that children performed above chance at choosing the more certain trial, particularly on the largest metaratios where the difference in certainty was larger. If children were answering randomly, or succeeding on the task by computing a quantity other than confidence, we would not expect either of these to hold. Moreover, when we looked at only children who performed above chance, and when we removed the two lowest-performing metaratios, the relationship with TEMA-3 scores did not change.
An alternative explanation is that perceptual metacognition is not directly relevant for symbolic mathematics, even if symbolic number metacognition is. In previous work using this relative certainty measure, children's performance on a relative confidence ANS task strongly correlated with performance on relative confidence surface area and emotion tasks, even though these three dimensions are perceptually independent of each other (Baer et al., 2018). In other words, and consistent with similar suggestions in the adult literature (De Gardelle et al., 2016; De Gardelle & Mamassian, 2014), perceptual confidence may be represented domain-generally, and may not be tied to specific representations that instantiate it. Individual differences in the relative ANS metacognition task may therefore broadly tap into general perceptual metacognition, rather than specific numerical metacognition. And, since previous work has demonstrated that number-specific metacognitive abilities are more predictive of symbolic math performance than domain-general metacognitive ones (Bellon et al., 2019), one potential explanation for our finding may be that the ANS Metacognition Task only taps into the more general differences in perceptual metacognition that are not as relevant for early math abilities. Furthermore, it is also possible that while symbolic number metacognition is relevant for mathematics, non-symbolic number metacognition may not be, whether tapped into by our task or not.
One challenge for this explanation is that a positive correlation between ANS metacognition and the TEMA-3 has been previously reported by Vo et al. (2014). One key difference between their work and ours is the nature of the metacognitive task. In Vo and colleagues' work, children's metacognition was assessed through an "absolute" confidence task: children were asked to rate their confidence as either high or low by betting coins depending on whether they believed they answered the dot comparison trial correctly or incorrectly. Beyond potentially tapping into individual differences in impulse control, risk aversion, and other executive function differences, absolute confidence tasks cannot always separate metacognitive sensitivity from metacognitive bias: a general tendency children have to constantly indicate high or low confidence (Baer & Odic, 2019; Butterfield et al., 1988; Lipowski et al., 2013; Nelson, 1984). Therefore, one possible explanation that accounts for all the available evidence is that children's metacognitive sensitivity alone (what we experimentally isolate in our measure) is not related to their TEMA-3 performance, but their metacognitive calibration is related (as found in past work, Bellon et al., 2019; Rinne & Mazzocco, 2014; Vo et al., 2014; and see Winman et al., 2014 for further discussion of components of metacognition). Future work can test this possibility by simultaneously measuring children's metacognitive sensitivity and metacognitive calibration, ideally through a mixture of both relative and absolute certainty tasks.
Although we did not find a significant correlation between ANS metacognitive sensitivity and symbolic mathematics, we did successfully replicate the correlation between ANS precision and the TEMA-3 (Starr et al., 2013; for review, see Szkudlarek & Brannon, 2017), including the stronger relationship between the ANS and the informal questions on the TEMA-3 compared to formal ones.

We found that this relationship held even when controlling for age and individual differences in ANS metacognitive sensitivity. While there has been a significant amount of debate about the validity of the correlation between the ANS and mathematical abilities (e.g., Gilmore et al., 2013; Szűcs, Nobes, Devine, Gabriel, & Gebuis, 2013), two recent meta-analyses have shown that there is a small but reliable association between the ANS and mathematical abilities (Chen & Li, 2014; Schneider et al., 2017). Our data contribute to this literature, and generally align with the size of the effect that has been documented by these meta-analyses.
In conclusion, the primary aim of this work was to test whether ANS metacognitive sensitivity correlates with formal mathematics, and if so, whether it acts as a mediating variable between ANS precision and the TEMA-3. We failed to find any significant relationship between ANS metacognitive sensitivity and the TEMA-3, suggesting either that metacognitive calibration may play a larger role or that metacognitive differences in the ANS do not robustly contribute to number-specific metacognitive abilities that are thought to be critical for symbolic math performance.
Notes
i) Thirteen children in the sample were not rewarded with coins because this change was added after testing began to try and boost children's motivation to complete the task. Results do not change in direction or magnitude with these children removed.
ii) Ten children in our sample consistently chose the smaller ratio, a similar rate to other reported results using this paradigm (Baer et al., 2018; Baer & Odic, 2019). Results reported here do not differ in direction or magnitude if these children are removed from the sample.

Funding
This work was supported by an Insight Grant from the Social Sciences and Humanities Research Council of Canada (SSHRC) to DO, and a SSHRC Joseph-Armand Bombardier Canada Graduate Scholarship to CB.