Can You Trust Your Number Sense: Distinct Processing of Numbers and Quantities in Elementary School Children

Theories of number development have traditionally argued that the acquisition and discrimination of symbolic numbers (i.e., number words and digits) are grounded in and are continuously supported by the Approximate Number System (ANS), an evolutionarily ancient system for number. In the current study, we challenge this claim by investigating whether the ANS continues to support symbolic number processing throughout development. To this end, we tested 87 first- (Age M = 6.54 years, SD = 0.58), third- (Age M = 8.55 years, SD = 0.60) and fifth-graders (Age M = 10.63 years, SD = 0.67) on four audio-visual comparison tasks: (1) Number Words–Digits, (2) Tones–Dots, (3) Number Words–Dots, and (4) Tones–Digits, while varying the Number Range (Small and Large) and the Numerical Ratio (Easy, Medium, and Hard). Results showed larger and faster developmental growth in performance in the Number Words–Digits task, whereas the tasks containing at least one non-symbolic quantity showed smaller and slower developmental change. In addition, the Ratio effect (i.e., the signature of ANS being addressed) was present in the Tones–Dots, Tones–Digits, and Number Words–Dots tasks, but was absent in the Number Words–Digits task. These findings suggest that it is unlikely that the ANS continuously underlies the acquisition and the discrimination of symbolic numbers. Rather, our results indicate that non-symbolic quantities and symbolic numbers follow qualitatively distinct developmental paths, and we argue that the latter are processed in a semantic network which starts to emerge from an early age.

Taken together, these and other similar findings have, for many years, strengthened the idea that the ANS and its ratio acuity underlie the acquisition of symbolic numbers (e.g., Dehaene, 2007; Dehaene & Cohen, 1995; Piazza, 2010; Matthews et al., 2016; Von Aster & Shalev, 2007; see also Dehaene & Changeux, 1993).
An increasing amount of developmental and experimental behavioural research, however, has recently challenged the foundational role of the ANS in the processing and acquisition of symbolic numbers. Concretely, it has been suggested that symbolic numbers and non-symbolic quantities are acquired independently and follow distinct developmental trajectories (see Bialystok, 1992; Carey, 2009a, 2009b; Leibovich & Ansari, 2016; Noël & Rousselle, 2011; Núñez, 2017; Reynvoet & Sasanguie, 2016; Wilkey & Ansari, 2020). First, studies investigating children's abilities to map between non-symbolic quantities (i.e., sets of dots) and symbolic numbers (i.e., digits and number words) showed that symbolic number mappings (i.e., number words-digits) are acquired earlier than digits-dots mappings (Hurst, Anderson, & Cordes, 2017; Jiménez Lira, Carver, Douglas, & LeFevre, 2017). These latter findings are difficult to reconcile with the traditional ANS view, according to which the mappings between symbolic numbers and non-symbolic quantities develop before the number words-digits mappings, because the non-symbolic quantity representations are readily available (e.g., Benoit et al., 2013). Instead, these results suggest that symbolic number representations develop independently of the ANS (e.g., Hurst et al., 2017). Second, recent studies (e.g., Goffin & Ansari, 2019; Lyons, Bugden, Zheng, De Jesus, & Ansari, 2018; Sasanguie, Defever, Maertens, & Reynvoet, 2014; Sasanguie, Göbel, Moll, Smets, & Reynvoet, 2013) and meta-analyses (Schneider et al., 2017) suggest that a small but reliable link between ANS acuity and (later) mathematics achievement is indeed present, but that symbolic number skills are more strongly related to, and contribute more to, (later) mathematics achievement than non-symbolic quantity processing skills. Specifically, in their meta-analysis, Schneider et al. (2017) found that the relation between symbolic comparison tasks (i.e., a measurement of symbolic number processing) and math achievement tests is significantly higher, r = .302, 95% CI [.243, .361], than the relation between numerosity comparison tasks (i.e., a measurement of ANS acuity) and math achievement, r = .241, 95% CI [.198, .284].
Finally, previous research has shown that the ratio effect in symbolic numerical tasks is not always present, challenging the claim that numerical ratio is an essential factor in symbolic number discrimination. For example, in a series of audio-visual studies, Reynvoet (2018, 2020) and Sasanguie, De Smedt, and Reynvoet (2017) report that in adult participants, a ratio effect is always present when the task involves non-symbolic quantities (i.e., tones-dots, tones-digits, number words-dots), but no ratio effect is observed in the purely symbolic task (i.e., number words-digits; see also van Hoogmoed & Kroesbergen, 2018).
According to the authors, these results indicate that there are distinct mental representations for symbolic numbers and non-symbolic quantities. More specifically, they suggest that, in contrast to non-symbolic quantities, symbolic numbers are represented precisely in a semantic network, where numbers are represented in terms of their associative relations (e.g., Krajcsi, Lengyel, & Kojouharova, 2016; Reynvoet & Sasanguie, 2016; Vos, Sasanguie, Gevers, & Reynvoet, 2017).
Overall, these studies provide evidence incompatible with the traditional ANS view and suggest that the acquisition and the discrimination of symbolic numbers are not ratio-dependent and that the processing of symbolic numbers may follow a separate developmental path from an early age. However, a systematic investigation of the developmental trajectory of the ratio effect in symbolic number and non-symbolic quantity discrimination is currently lacking in the literature. Moreover, the lack of a ratio effect in the symbolic number processing tasks in adults (e.g., Marinova et al., 2018; Sasanguie et al., 2017; van Hoogmoed & Kroesbergen, 2018) does not rule out the possibility that discrimination of symbolic numbers is based on the ANS at earlier stages of development. Therefore, developmental data are crucial for providing insights into the relation between symbolic numbers and non-symbolic quantities.
In the current cross-sectional study, we aim to address this latter question by examining whether the ANS, indexed by the presence of a ratio effect, is continuously engaged in the processing of symbolic numbers in children.
To this end, we tested 87 first- (M age = 6.54 years, SD = 0.58), third- (M age = 8.55 years, SD = 0.60) and fifth-graders (M age = 10.63 years, SD = 0.67) on four audio-visual comparison tasks: (1) Number Words-Digits, (2) Tones-Dots, (3) Number Words-Dots, and (4) Tones-Digits. The Ratio (Easy, Medium, Hard) and the Number Range (Small (4-9) and Large (13-28)) of the number pairs were varied. We used an audio-visual paradigm instead of purely visual presentation because of the following advantages. First, participants cannot base their judgements on the perceptual similarities between the stimuli (e.g., Barth et al., 2003; Marinova et al., 2018; Sasanguie et al., 2017). Second, due to the inclusion of the large numbers, it is possible that participants can decompose the numbers and base their decisions on the decades or the units only (e.g., Nuerk & Willmes, 2005). However, as we have previously shown, decomposition is unlikely to occur when using audio-visual tasks in languages such as Dutch, where an inversion of the double-digit numbers exists (i.e., "vijf-en-twintig", literally "five and twenty", instead of "twenty-five"). That is, because the position of the units and the decades differs between two consecutive stimuli (e.g., spoken number word "vijf-en-twintig" vs visually presented digit "21"), a decomposition strategy would be inefficient. Finally, the audio-visual paradigm is very suitable for testing younger children because proficient reading is not required. Furthermore, we included both small and large numbers in order to directly address potential developmental differences between the acquisition of single-digit numbers and number words without a compound structure (e.g., "vijf" in Dutch or "five") on the one hand, and the acquisition of double-digit numbers and number words with a compound structure (e.g., "vijf-en-twintig" in Dutch or "twenty-five") on the other.
Given that we were interested in the contribution of the ANS to the acquisition of symbolic numbers we avoided including numbers within the subitising range (i.e., 1-4), whenever it was all tasks. Furthermore, six other participants were excluded (i.e., three first-graders and three third-graders), because visual inspection of their data showed that they pressed the same response key throughout the whole audio-visual task. Consequently, the final sample consisted of 67 children: 26 first-graders (M age = 6.54 years, SD = 0.58, 12 males), 22 third-graders (M age = 8.55 years, SD = 0.60, 15 males), and 19 fifth-graders (M age = 10.63 years, SD = 0.67, 9 males). To determine our sample size per age group, we performed an a priori power analysis using the G*Power software version 3.1 (Faul, Erdfelder, Lang, & Buchner, 2007). To detect an effect of ratio of size ηp² = .19 (i.e., the smallest effect of ratio reported in the Appendix of Marinova et al., 2018), with α = .05 and power set at 95%, the required sample was 15 participants. Consequently, our current sample of more than 15 children per age group provides sufficient power to detect an effect of this size. The data are freely available (see Supplementary Materials).
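As a worked illustration of the effect size entering the power analysis: power software commonly expresses ANOVA effect sizes as Cohen's f, which is related to partial eta squared by f = sqrt(ηp² / (1 − ηp²)). The sketch below only performs this conversion for the reported ηp² = .19; the function name is illustrative, and the block does not attempt to reproduce the G*Power settings that led to the required sample of 15.

```python
import math


def partial_eta_squared_to_cohens_f(eta_p2):
    """Convert partial eta squared to Cohen's f via f = sqrt(eta / (1 - eta))."""
    if not 0.0 <= eta_p2 < 1.0:
        raise ValueError("partial eta squared must lie in [0, 1)")
    return math.sqrt(eta_p2 / (1.0 - eta_p2))


# Smallest ratio effect cited in the text (partial eta squared = .19).
cohens_f = partial_eta_squared_to_cohens_f(0.19)
print(f"Cohen's f = {cohens_f:.3f}")
```

By Cohen's conventional benchmarks (f = .40 for a large effect), the resulting value of roughly 0.48 is a large effect, which is consistent with the small required sample size.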

Procedure, Tasks, and Stimuli
The procedure, tasks, and stimuli were similar to those used in a previous study. Concretely, all children were presented with four audio-visual comparison tasks: (1) a Number Words-Digits task, (2) a Tones-Dots task, (3) a Number Words-Dots task and (4) a Tones-Digits task (see Figure 1). Numbers were presented auditorily as number words or sequences of beeps, and visually as Arabic numerals or dot configurations. Both small (4 to 9) and large (13 to 28) number pairs were used, presented in three Ratio conditions: "Easy", "Medium", and "Hard" (see Table 1). Within the large number range, decade numbers and numbers without compound structure (i.e., 11 and 12) were excluded from the stimulus set. The number words were presented in Dutch by a native female speaker. The beep sequences were generated and controlled with a custom Python 2.7 script. Each individual tone lasted 40 ms. To ensure that the presentation of the beeps was fast enough to encourage participants to rely on approximation instead of counting (Barth et al., 2003; Philippi, van Erp, & Werkhoven, 2008; Tokita, Ashitani, & Ishiguchi, 2013; Tokita & Ishiguchi, 2012), the duration of the intertone interval was randomly varied (minimal duration was set to 10 ms). The dot configurations were generated with the MATLAB script of Gebuis and Reynvoet (2011), controlling for non-numerical cues (i.e., total surface, convex hull, density, dot size and circumference). The Arabic numerals were written in Arial font, size 40. The auditory stimuli (i.e., number words and tones) were presented binaurally through headphones (≈ 65 dB SPL). Participants were tested simultaneously in small groups of 3 to 5 children, in a quiet room at school equipped with individual Dell Latitude 5580 laptops with 15-inch HD displays (unmodified factory model) and individual active noise control headphone sets.
E-prime 3.0 software (Psychology Software Tools, http://pstnet.com) was used for controlling the stimulus presentation and recording of the data.
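The timing of a tone sequence as described above can be sketched as follows. This is not the custom script used in the study (which is not reproduced here) and is written in Python 3 rather than 2.7. The 40 ms tone duration and the 10 ms minimal intertone interval come from the text; the 50 ms upper bound is a hypothetical value, chosen here only so that the longest sequence (28 tones) fits within the 2500 ms auditory presentation window.

```python
import random

TONE_MS = 40      # duration of each tone (from the text)
MIN_GAP_MS = 10   # minimal intertone interval (from the text)
MAX_GAP_MS = 50   # hypothetical upper bound, NOT reported in the text


def tone_onsets(n_tones, rng):
    """Return onset times (ms) of n_tones beeps separated by random gaps."""
    onsets = []
    t = 0
    for _ in range(n_tones):
        onsets.append(t)
        # Next onset = end of this tone + a randomly drawn silent interval.
        t += TONE_MS + rng.randint(MIN_GAP_MS, MAX_GAP_MS)
    return onsets


def sequence_duration(onsets):
    """Time from the first onset to the end of the last tone (ms)."""
    return onsets[-1] + TONE_MS


rng = random.Random(7)  # fixed seed only to make the sketch reproducible
onsets = tone_onsets(28, rng)
print(onsets[:5], sequence_duration(onsets))
```

Because the gaps vary randomly, neither the total duration nor the rhythm of a sequence is a fixed function of the number of tones, which is the property the random intertone interval is meant to provide.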
Each trial began with a 600 ms white fixation cross, presented in the centre of a black screen. Then, the auditory stimulus was presented for 2500 ms. Immediately after, the visual stimulus was presented for 1000 ms. Afterwards, a blank screen appeared. Children were instructed to judge which stimulus (the auditory or the visual) was larger by pressing the "a" or "p" buttons on an AZERTY keyboard. Participants could respond either during the presentation of the visual stimulus or during the blank screen. After the response was given, there was an intertrial interval of 1500 ms before the next trial began. Before each audio-visual task, each child received 10 practice trials (with feedback). The practice trials were followed by 36 randomly presented trials for the first-graders, and 72 trials for the third- and fifth-graders (without feedback). For half of the trials, the smaller number of the pair appeared first, followed by the larger number (e.g., 19-21); in the other half of the trials the order was reversed: first the larger number was presented, then the smaller one (e.g., 21-19). Each audio-visual task was presented in a separate block. The order of the tasks was fully counterbalanced across participants.
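The balancing of presentation order described above can be sketched as follows. The study itself used E-prime, so this Python block is only an illustration of the logic. The number pairs are hypothetical placeholders (only 19-21 appears in the text; the actual pairs are in Table 1), and the function names are illustrative.

```python
import random

# Hypothetical pairs within the small (4-9) and large (13-28) ranges.
# Only 19-21 is taken from the text; the real stimulus set is in Table 1.
EXAMPLE_PAIRS = [(4, 8), (6, 9), (7, 8), (13, 26), (18, 27), (19, 21)]


def build_trials(pairs, n_trials, rng):
    """Return n_trials (auditory, visual) number pairs in random order.

    Half of the trials present the smaller number first and the other
    half present the larger number first.
    """
    if n_trials % 2 != 0:
        raise ValueError("n_trials must be even to balance presentation order")
    half = n_trials // 2
    # Cycle through the pairs so that every pair occurs in both orders.
    cycled = [tuple(sorted(pairs[i % len(pairs)])) for i in range(half)]
    trials = [(small, large) for small, large in cycled]
    trials += [(large, small) for small, large in cycled]
    rng.shuffle(trials)
    return trials


def correct_answer(trial):
    """The auditory stimulus is presented first, the visual one second."""
    auditory, visual = trial
    return "auditory" if auditory > visual else "visual"


rng = random.Random(2020)  # fixed seed only to make the sketch reproducible
first_grade_trials = build_trials(EXAMPLE_PAIRS, 36, rng)
older_grade_trials = build_trials(EXAMPLE_PAIRS, 72, rng)
print(len(first_grade_trials), len(older_grade_trials))
```

Building both orders from the same cycled list means that each pair contributes equally to the smaller-first and larger-first halves, so presentation order is not confounded with the specific numbers shown.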

Results
The mean accuracies are depicted in Table 2. To make our results as informative and useful as possible, the data were further analysed in both classical and Bayesian statistical frameworks. Given that the Bayesian approach allows us to evaluate both the alternative and the null hypotheses, we preferred to base the interpretations of our results on the Bayesian analyses (see Wagenmakers et al., 2018a). We report the Bayes factors (BF), or log(BF) in case the BF values are too large to interpret (Jarosz & Wiley, 2014; Wagenmakers et al., 2018a, 2018b; see Note ii). To obtain both classical and Bayesian results, we used the JASP statistical package version 0.12 (https://jasp-stats.org).
First, we compared the performance for each condition to the chance level (.50) using (Bayesian) one-sample t-tests. Results showed that, in almost all conditions, first-graders did not perform significantly above chance.
Third- and fifth-graders performed above chance in all tasks (see Table 2). These results indicate that the audio-visual tasks were possibly too hard for the first-graders. Consequently, we urge the reader to approach the results from the first-graders with caution.

Marinova & Reynvoet 311

Overall, both Bayesian and classical ANOVAs yielded a similar pattern of results. Specifically, third- and fifth-graders performed better in the Number Words-Digits task and Number Words-Dots task, followed by the Tones-Digits and Tones-Dots tasks. The performance of these children was also ratio-dependent, indicating that the ANS has been addressed. Interestingly, in fifth-graders, there was also some evidence for an interaction between Task and Ratio, yielding results similar to the previous studies in adults: the effect of Ratio was present in all tasks containing quantities (Number Words-Dots, Tones-Dots, and Tones-Digits), but not in the purely symbolic task (Number Words-Digits; Marinova et al., 2018; Sasanguie et al., 2017). Taken together, these two observations possibly suggest that the performance in the Number Words-Digits task is underpinned by a distinct cognitive mechanism.
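The comparison against chance reported at the start of this section rests on a one-sample t statistic, t = (M − .50) / (SD / √n). The following sketch computes that statistic for a made-up set of accuracies; the data are hypothetical, the function name is illustrative, and the actual analyses (including the Bayesian versions and p-values) were run in JASP, not with this code.

```python
import math
import statistics


def one_sample_t(scores, mu=0.5):
    """Return (t, df) for a one-sample t-test of mean(scores) against mu."""
    n = len(scores)
    if n < 2:
        raise ValueError("at least two scores are needed")
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample SD (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("t is undefined when all scores are identical")
    t = (mean - mu) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical per-child accuracies in one condition (NOT data from the study).
accuracies = [0.55, 0.60, 0.65, 0.70, 0.75]
t_value, df = one_sample_t(accuracies)
print(f"t({df}) = {t_value:.2f}")
```

A t value near zero corresponds to performance at chance; the p-value or Bayes factor attached to it would additionally require the t distribution or a prior specification, which statistical packages such as JASP provide.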
Third, to investigate further whether the performance in the tasks shared a common cognitive mechanism, we computed bivariate (Bayesian) Pearson correlations between the four audio-visual tasks (see Sasanguie et al., 2017, p. 236). Given that first-graders performed at the chance level, the correlations were computed for the third- and fifth-graders only. The correlations are depicted in Table 3. For the third-graders, the performance on the Number Words-Digits task correlated only with the performance on the Number Words-Dots task, but not with the performance on the Tones-Dots and Tones-Digits tasks. However, all the tasks containing non-symbolic quantities (i.e., Number Words-Dots, Tones-Dots, and Tones-Digits) correlated with each other.
For the fifth-graders, the Number Words-Digits task did not correlate with any of the remaining tasks. Significant correlations were present between the Tones-Digits and Tones-Dots tasks, and between the Number Words-Dots and Tones-Dots tasks. Overall, these results were in line with the findings of Sasanguie et al. (2017) and showed that for the third- and fifth-graders, there was a tendency for the tasks containing quantities to be intercorrelated (i.e., Number Words-Dots, Tones-Dots, Tones-Digits), while the purely symbolic task (Number Words-Digits) tended not to correlate with these tasks. These results are also in line with the observations from the ANOVAs and suggest that numerical tasks involving non-symbolic quantity processing most likely share common cognitive mechanisms, while symbolic number processing is underpinned by distinct cognitive processes. Nevertheless, this does not refute the possibility that some pre-verbal number system, such as the Parallel Individuation system (PI; see Carey, 2009a), is involved in the early stages of symbolic number acquisition. We elaborate on this possibility in the Discussion section.
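The pairwise correlations between the four tasks can be sketched as follows. The accuracies below are invented purely to make the block runnable and carry no information about the study's results (those are in Table 3); the Bayesian correlations reported above were obtained in JASP and are not reproduced here.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    if denom == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / denom


# Hypothetical per-child mean accuracies (NOT data from the study).
accuracy = {
    "Number Words-Digits": [0.90, 0.85, 0.95, 0.80, 0.88],
    "Number Words-Dots": [0.70, 0.66, 0.75, 0.62, 0.71],
    "Tones-Dots": [0.60, 0.64, 0.58, 0.66, 0.61],
    "Tones-Digits": [0.62, 0.65, 0.59, 0.68, 0.60],
}

# One coefficient per unordered pair of tasks (6 pairs for 4 tasks).
correlations = {
    (a, b): pearson_r(accuracy[a], accuracy[b])
    for a, b in combinations(accuracy, 2)
}
for (a, b), r in correlations.items():
    print(f"{a} vs {b}: r = {r:.2f}")
```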

Discussion
Previous studies claimed that the processing and acquisition of symbolic numbers are deeply rooted in the ANS and are continuously supported by this system throughout development. Recent findings, however, suggest that symbolic numbers may be processed and acquired independently from the ANS. In light of this latter claim, the current study aimed to re-evaluate the role of the ANS in the acquisition of symbolic numbers. To this end, first-, third-, and fifth-graders performed four audio-visual tasks, testing their abilities to compare pairs of symbolic and non-symbolic quantities within the small and large range, and across easy, medium and hard ratios. Overall, our results suggest that it is unlikely that the ANS underlies the acquisition and the processing of symbolic numbers.
First, both classical and Bayesian results clearly illustrated that symbolic and non-symbolic quantity processing tasks exhibit different behavioural patterns. Specifically, our results showed that in third- and fifth-graders, the performance is much better in the Number Words-Digits task, and slightly better in the Number Words-Dots task, than in the other two tasks, i.e., the Tones-Dots and Tones-Digits tasks. A somewhat similar tendency for higher accuracy in the Number Words-Digits task was observed in first-graders too. However, because the first-graders hardly performed the audio-visual tasks above chance, the behavioural pattern for this age group remains inconclusive. Nevertheless, when taken together, these results suggest that in the Number Words-Digits and Number Words-Dots tasks the developmental growth is larger, while in the Tones-Dots and Tones-Digits tasks the growth seems to be less steep after the first grade. These observations are difficult to reconcile with previous claims that the increase in ANS acuity drives the growth of symbolic number knowledge (e.g., Halberda & Feigenson, 2008; Piazza, 2010; Starr, Libertus, & Brannon, 2013). Concretely, our results show that the increase in the performance on the purely non-symbolic task (i.e., Tones-Dots) is much slower and is not sufficient to support the growth in the Number Words-Digits task. Therefore, the results obtained in the current and some previous studies (e.g., Hurst et al., 2017; see also Hutchison et al., 2020; Lyons et al., 2018) suggest that numerical development may follow a different developmental path (e.g., Goffin & Ansari, 2019; Lyons et al., 2018).
Alternatively, the steeper developmental growth observed in the Number Words-Digits task, and to a lesser extent in the Number Words-Dots task, compared to the tasks containing tone sequences (i.e., Tones-Dots and Tones-Digits) could have the following explanation. The sequential presentation of the tones may have put an additional load on the children's working memory, possibly making the extraction of numerical information harder. However, previous audio-visual studies have demonstrated that five-year-old children successfully extract numerical information from tone sequences in order to compare them with visually presented dot patterns (e.g., Barth, La Mont, Lipton, & Spelke, 2005).
Second, in line with the claims above, in fifth-graders, there was also an interaction between the comparison tasks and the Numerical Ratio. Concretely, we observed ratio effects (i.e., the signature of ANS being addressed) in all tasks containing at least one non-symbolic quantity (i.e., Tones-Dots, Tones-Digits, Number Words-Dots), but the effect was absent in the Number Words-Digits task. These results corroborate previous studies in adults (e.g., Marinova et al., 2018; Sasanguie et al., 2017), where a similar pattern of results was obtained. These findings seem to suggest that the numerical ratio is not as crucial a factor for the processing of symbolic numbers as previously argued (e.g., Matthews et al., 2016; Piazza, 2010). The lack of a ratio effect in the purely symbolic task is surprising in light of previous findings (e.g., Mundy & Gilmore, 2009) and models according to which, in numerical comparison tasks, a ratio effect should always be present, because it is a result of a response-related strategy (Verguts & Fias, 2004; Verguts, Fias, & Stevens, 2005; see also Krajcsi et al., 2016). However, it might be due to the (sequential) audio-visual presentation technique with relatively long presentation times that was adopted in this study. In a recent study with audio-visual presentation, Lin and Göbel (2019) demonstrated that the size of the symbolic ratio effect decreased as the stimulus onset asynchrony (SOA) between both numbers increased. More specifically, the authors observed a larger distance effect when the digit and number words were presented simultaneously (i.e., SOA = 0 ms), and a smaller distance effect when longer SOAs were used (e.g., SOA = 500 ms). Although the precise effect of SOA on the numerical distance effect needs to be examined further, it has no repercussions for our conclusion, which is based on the interaction of the ratio effect with the task.
Third, correlational analyses in third- and fifth-graders showed that while tasks containing quantities (i.e., Tones-Dots, Number Words-Dots, Tones-Digits) tended to be related to each other, the symbolic number processing task (i.e., Number Words-Digits) was not related to them. These results are in line with previous findings and seem to suggest that the processing of quantities and the processing of symbolic numbers are supported by different cognitive mechanisms.
Overall, in light of these findings, it seems implausible that an imprecise pre-verbal system such as the ANS, showing a qualitatively distinct and "slower" developmental path than symbolic number processing, is capable of providing continuous support in the discrimination of symbolic numbers (Krajcsi et al., 2018; Núñez, 2017). These data are instead in line with recent approaches in numerical cognition, according to which the symbolic number system develops independently of the ANS (e.g., Carey, 2009a, 2009b; Noël & Rousselle, 2011; Núñez, 2017; Reynvoet & Sasanguie, 2016; Wilkey & Ansari, 2020). These models, however, do not rule out the possibility that at the early stages of symbolic number acquisition, children rely on some pre-verbal number system, such as the PI system (Carey, 2009a, 2009b). Concretely, it has been suggested that children possibly rely on the PI system to acquire the meaning of the small numerals (up to 4) by associating them with small sets of items (Carey, 2009a, 2009b; Carey & Barner, 2019; Carey et al., 2017; Hutchison et al., 2020). Later on, children acquire larger numerals by building associative relations between the symbols themselves. These relations grow increasingly stronger throughout development as a result of children's increasing experience with symbolic numbers through counting procedures and formal schooling (see also Reynvoet & Sasanguie, 2016). Consequently, a symbolic number network is formed where numbers are processed in terms of their mutual connections. These various connections become more numerous and sophisticated as a result of the semantic associations acquired throughout development. For example, numbers can be represented in terms of their order associations (e.g., 1-2-3), but also based on whether they are odd (e.g., 1-3-5), even (e.g., 2-4-6), multiples or powers of 10 (e.g., 10-20-30 or 10-100-1000), etc. (e.g., Krajcsi et al., 2016; Reynvoet & Sasanguie, 2016; Vos et al., 2017). Our results provide support for these latter findings by demonstrating that such a symbolic number network emerges independently from the ANS from an early age, as opposed to emerging only in adulthood as previously argued (Lyons, Ansari, & Beilock, 2012).
In conclusion, the current cross-sectional study examined the role the ANS plays in the acquisition of symbolic numbers. Overall, our results showed that symbolic number processing undergoes substantial and faster developmental growth in performance, while non-symbolic quantity processing performance changes to a lesser extent. Moreover, the ratio effect (the signature of ANS being addressed) was absent in the symbolic number task. In contrast, the effect was present in all tasks containing at least one non-symbolic quantity (i.e., Number Words-Dots, Tones-Dots, Tones-Digits). These latter tasks also tended to be correlated with each other, and not with the Number Words-Digits task. These results show that it is unlikely that the ANS provides continuous support in the processing of symbolic numbers, and are rather in line with studies suggesting distinct developmental trajectories for symbolic numbers and non-symbolic quantities (e.g., Carey, 2009a, 2009b; Reynvoet & Sasanguie, 2016; Wilkey & Ansari, 2020).

Notes
i) With respect to the credibility and the scientific integrity of our research, we report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study (Simmons, Nelson, & Simonsohn, 2012).
ii) The Bayes Factor (BF10) is the ratio of the likelihood of the data under the alternative hypothesis to their likelihood under the null hypothesis. For statistical analyses involving a larger number of factors, such as repeated-measures ANOVA, it is recommended to report the BFInclusion (see Wagenmakers et al., 2018b for the rationale). Conventionally, the evidence provided by the BF values is categorized as "anecdotal" (for values between 1 and 3), "moderate" (for values between 3 and 10), "strong" (for values between 10 and 30), "very strong" (for values between 30 and 100), and "extreme" (for values > 100) (Jeffreys, 1961).
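The evidence categories in Note ii can be expressed as a small lookup, sketched below. Two points are assumptions rather than statements from the text: values exactly on a boundary (3, 10, 30) are assigned to the higher category, and BF10 values below 1 are read as evidence for the null hypothesis by taking the reciprocal (BF01 = 1 / BF10). The function name is illustrative, and the example treats a reported log(BF) as a natural logarithm, which is also an assumption.

```python
import math


def classify_bayes_factor(bf10):
    """Return (evidence label, favoured hypothesis) for a Bayes factor BF10."""
    if not bf10 > 0:
        raise ValueError("a Bayes factor must be a positive number")
    if bf10 == 1:
        return "no evidence", None
    favoured = "H1" if bf10 > 1 else "H0"
    # For BF10 < 1, grade the evidence for H0 via BF01 = 1 / BF10.
    strength = bf10 if bf10 > 1 else 1.0 / bf10
    if strength < 3:
        label = "anecdotal"
    elif strength < 10:
        label = "moderate"
    elif strength < 30:
        label = "strong"
    elif strength <= 100:
        label = "very strong"
    else:
        label = "extreme"  # values > 100, as stated in the note
    return label, favoured


print(classify_bayes_factor(5.0))
print(classify_bayes_factor(0.05))
# A value reported as log(BF) = 5, assuming a natural logarithm:
print(classify_bayes_factor(math.exp(5.0)))
```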

Funding
This research was supported by the KU Leuven research funds (grant number C14/16/029).