Number Line Tasks and Their Relation to Arithmetics in Second to Fourth Graders

Considering the importance of mathematical knowledge for STEM careers, we aimed to better understand the cognitive mechanisms underlying the commonly observed relation between number line estimations (NLEs) and arithmetics. We used a within-subject design to model NLEs in an unbounded and bounded task and to assess their relations to arithmetics in second to fourth grades. Our results mostly agree with previous findings, indicating that unbounded and bounded NLEs likely index different cognitive constructs at this age. Bounded NLEs were best described by cyclic power models including the subtraction bias model, likely indicating proportional reasoning. Conversely, mixed log-linear and single scalloped power models provided better fits for unbounded NLEs, suggesting direct estimation. Moreover, only bounded but not unbounded NLEs related to addition and subtraction skills. This thus suggests that proportional reasoning probably accounts for the relation between NLEs and arithmetics, at least in second to fourth graders. This was further confirmed by moderation analysis, showing that relations between bounded NLEs and subtraction skills were only observed in children whose estimates were best described by the cyclic power models. Depending on the aim of future studies, our results suggest measuring estimations on unbounded number lines if one is interested in directly assessing numerical magnitude representations. Conversely, if one aims to predict arithmetic skills, one should assess bounded NLEs, probably indexing proportional reasoning, at least in second to fourth graders. The present outcomes also further highlight the potential usefulness of training the positioning of target numbers on bounded number lines for arithmetic development.

In NLE tasks, participants need to place a target number on a visually presented number line (Siegler & Opfer, 2003). In the classical "bounded" version of the task, number lines are labelled with two endpoints (Cohen & Blanc-Goldhammer, 2011; Cohen & Sarnecka, 2014). More recently, an "unbounded" task has been introduced, where only the origin of the number line and a scaling unit (e.g., 0-1) are given (Cohen & Blanc-Goldhammer, 2011).
However, recent studies assessing both the classical bounded NLE task and the relatively new unbounded version found that associations between NLEs and arithmetics were observed only for the former but not the latter task (Jung et al., 2020; but see Kim & Opfer, 2017). This finding also agrees with the meta-analysis of Schneider and colleagues (2018), where correlations with mathematical skills were moderated by the variant of the NLE task. More concretely, relations with mathematical competence were significantly positive for the standard, bounded version of the NLE task but not significantly different from zero for unbounded number lines. This not only suggests that the two tasks likely reflect different cognitive constructs, but also questions whether numerical magnitude representations can truly account for the relation between bounded NLEs and arithmetics.
More concretely, since bounded number lines have a clearly defined origin and endpoint, individuals can scale the line length to a proportion by estimating the distances of the target number from the left and right boundaries of the line. For example, if individuals need to place the number 60 on a line from 0 to 100, they pick a position on the line and then look back and forth between that position and the boundaries of the line, adjusting the position until its distances to the lower and upper boundaries seem to be 60 and 40, respectively. When participants use this "proportional judgement" strategy to estimate the position of numbers on bounded lines, their estimates form an S-shaped ogival curve around the accuracy line, which can be described by Spence's power model (Spence, 1990; Spence & Krizel, 1994). Hollands and Dyre (2000) adapted the power model to include the possibility of using multiple reference points. This so-called "cyclic power model (CPM)" predicts multiple cycles of the S-shaped ogival pattern, depending on the number of reference points. Some individuals might only rely on the origin and endpoint, reflected by a single cycle (i.e., one-cycle power model, 1CPM), while others might use the middle of the line as an additional reference point, yielding two cycles of the S-shaped ogival pattern (i.e., two-cycle power model, 2CPM). Cohen and Sarnecka (2014) later introduced a variation of the CPM, the subtraction bias cyclic model (SBCM). This model is identical to the 1CPM but includes a subtraction bias, which indexes difficulties in the "subtract-and-compare" process. This model was shown to describe the estimates of younger children, suggesting that poor mensuration skills underlie the negatively accelerating (logarithmic-like) pattern of data usually produced by younger children on the bounded task (Cohen & Sarnecka, 2014).
However, it should be noted that although CPMs can be indicative of the application of proportional reasoning, the latter calculation strategy is not necessarily exclusively captured by the superior fits of those power functions. Namely, Yuan, Prather, Mix, and Smith (2020) have recently shown that power models provided relatively poor fits to performances on a clustered dot array task that is readily solvable by basic proportional reasoning skills. Thus, although power models likely index the reliance on proportional judgement, poor fits do not necessarily suggest the absence of such a calculation strategy.
Even though the question of whether proportional reasoning is adequately modelled by CPMs remains a matter of debate, the application of such a calculation strategy on bounded tasks can be further supported by studies focussing on participants' error rates (e.g., Ashcraft & Moore, 2012). Namely, smaller error variability is typically observed close to the mid- and endpoints of the number line, resulting in a characteristic M-shaped distribution, thereby suggesting the reliance on reference points as a guidance for placing estimates on bounded lines (Cohen & Blanc-Goldhammer, 2011; Link, Huber, Nuerk, & Moeller, 2014). The use of proportional judgement is further corroborated by participants' verbal reports and observations of their solution behavior (Peeters, Degrande, Ebersbach, Verschaffel, & Luwel, 2016; Peeters, Verschaffel, & Luwel, 2017; Petitto, 1990), as well as eye-movement data (Schneider et al., 2008; Sullivan, Juhasz, Slattery, & Barth, 2011).
As opposed to the bounded task, strategies on unbounded number lines were shown to consist of direct estimation and dead-reckoning (Cohen & Blanc-Goldhammer, 2011; Cohen & Sarnecka, 2014; Jung et al., 2020; Link, Huber, et al., 2014; Reinert, Hartmann, Huber, & Moeller, 2019; Reinert, Huber, Nuerk, & Moeller, 2017). Namely, participants use the given unit to either directly estimate target numbers by moving straight to their position on the number line or apply dead-reckoning by first estimating a working window of a specific size (e.g., 5 units) and then using multiples of this working window to estimate the location of larger numbers. Estimating the position of numbers by repeatedly counting to smaller numbers (i.e., estimating the line length of the working window) creates a scalloped pattern in participants' estimates. This pattern of estimation can be described by a simple variant of Stevens's power law, termed the scalloped power model (SPM; Cohen & Blanc-Goldhammer, 2011). When participants estimate numbers directly, their estimates are characterized by a single scallop (i.e., single-scalloped power model, 1SPM). Conversely, dual- (2SPM) and multi-scalloped power models (multi-SPM) reflect the estimates of individuals applying the working window once or multiple times, respectively, before counting towards the position of the estimate. Considering that the SPMs indicate the reliance on basic counting skills, such as repeated addition, rather than any advanced calculation strategy, NLEs best described by these functions are considered to provide a more accurate picture of numerical magnitude representations (see Cohen & Sarnecka, 2014).
Direct estimation on unbounded number lines is also further confirmed by participants' error data. Namely, error variability in the unbounded task linearly increases with target number and is not characterized by the M-shaped pattern reflecting proportional judgement on bounded number lines (Link, Huber, et al., 2014).
Considering that 1) mainly bounded but not unbounded estimates correlate with arithmetics and that 2) direct estimation is likely only indexed by unbounded NLEs, mechanisms other than numerical magnitude representations, such as proportional judgement, probably underlie the relation between NLEs and arithmetics. In other terms, NLEs correlate with mathematical skills because proportional reasoning is a key proficiency component in mathematics (cf., Boyer, Levine, & Huttenlocher, 2008), but not because numerical magnitude representations are an important determinant of mathematical development. The unbounded NLE task is thought to provide a better measure of the mental number line (MNL; Cohen & Sarnecka, 2014). However, the absence of a relation between arithmetics and unbounded NLEs questions the importance of numerical magnitude representations for mathematical learning.
In contrast, Kim and Opfer (2017) reported that a mixed log-linear model (MLLM) produced better fits to bounded estimates than a model integrating all variants of the CPMs, termed mixed cyclic power model (MCPM). Since the MLLM predicts estimates as a weighted sum of logarithmic and linear transforms of the target number, it reflects the idea of age-related log-to-linear shifts in numerical representations (Anobile, Cicchini, & Burr, 2012; Cicchini, Anobile, & Burr, 2014; Dehaene, Izard, Spelke, & Pica, 2008; Kim & Opfer, 2017; Opfer et al., 2016). Their findings therefore suggest that not only unbounded but also bounded estimates properly reflect numerical magnitude representations. Moreover, the logarithmicity component of the MLLM was found to significantly relate to addition and subtraction skills. This indicates that the degree of logarithmic compression in numerical magnitude representations probably accounts for the relation between bounded NLEs and arithmetics.
The importance of the MNL in explaining associations between NLEs and mathematical skills is further corroborated by their observation that even unbounded estimates (assumed to index numerical representations more directly) significantly correlated with arithmetics (Kim & Opfer, 2017). In sum, Kim and Opfer (2017) thus provide evidence in favour of the initial idea that the MNL underlies the commonly observed relation between NLEs and arithmetics.
How can the findings of Kim and Opfer (2017) be reconciled with evidence against the importance of the MNL in the relation between NLEs and arithmetics? One possible explanation for this discrepancy is that studies not finding a relation between unbounded NLEs and arithmetics were mainly conducted in relatively older elementary school children or adolescents (Jung et al., 2020), while Kim and Opfer (2017) tested younger children, attending preschool, first or second grade. Notably, the importance of the MNL for unbounded NLEs has been confirmed in older children (Link, Huber, et al., 2014) and adolescents (Jung et al., 2020), suggesting that the reliance on numerical magnitude representations might well account for the lack of relation to arithmetics at these later developmental stages. This assumption has, however, not yet been confirmed with intra-individual studies in older elementary school children. More concretely, only studies focussing on either younger children (Kim & Opfer, 2017) or adolescents (Jung et al., 2020) have assessed the constructs underlying NLEs as well as their relations to arithmetics in the same individuals. It is thus advisable to employ such a within-subject approach also in slightly older elementary school children to determine whether relations between arithmetics and bounded but not unbounded NLEs previously observed in fourth graders can actually be explained by the reliance on proportional judgement on bounded as opposed to the MNL on unbounded number lines in such older elementary school children (Link, Huber, et al., 2014).

Aim
In this paper, we generally aimed to further unravel the cognitive mechanisms underlying the commonly observed relation between NLEs and arithmetics. To shed further light on the aforementioned inconsistencies and to account for the potential shortcomings of previous designs, the present study used a within-subject approach to assess 1) model fits for unbounded and bounded NLEs (and as such their underlying cognitive determinants) as well as 2) their relations to arithmetics within the same individuals. We focussed on slightly older elementary school children attending second to fourth grade to complement previous studies using a similar within-subject approach in younger children (Kim & Opfer, 2017) or adolescents (Jung et al., 2020) and to get a better understanding of the mechanisms actually explaining the relation between bounded but not unbounded NLEs and arithmetics previously reported in fourth graders.
The current hybrid design (i.e., the use of a modelling procedure to better understand the constructs underlying bounded and unbounded NLEs and consequently individual differences in addition and subtraction skills, see e.g., Kim & Opfer, 2017, or Jung et al., 2020, for a similar design) enabled us to further assess the mechanisms explaining relations between NLEs and arithmetics without the hurdle of incorporating additional tests to determine the influence of potentially confounding domain-specific and domain-general variables previously shown to relate to NLEs and/or arithmetics (e.g., visuospatial abilities, see Sella, Sader, Lolliot, & Cohen Kadosh, 2016;Simms, Clayton, Cragg, Gilmore, & Johnson, 2016).
We also performed moderation analysis to determine whether potential associations between NLEs and arithmetics depend on the best-fit model, thereby further unravelling the cognitive construct actually accounting for their relation. More concretely, we hypothesized that if bounded NLEs reflect proportional judgement, which in turn explains their relation to mathematics, associations might be strongest in those children whose estimations are best described by functions capturing this kind of calculation strategy.
We intended to contrast the MNL with proportional reasoning as possible constructs underlying unbounded and bounded NLEs and their relations to arithmetics. Since the likelihood of strategy application and the reliable use of reference points was shown to be increased by employing a familiar number range (e.g., Slusser et al., 2013; White & Szűcs, 2012), we focussed on numbers in the range of 0-20 and 0-100 (depending on the task variant; see Link, Huber, et al., 2014, for comparable differences in number ranges between bounded and unbounded tasks). This covers a familiar interval also in the youngest children of the present sample (e.g., Dackermann et al., 2018). The linear-like estimation patterns usually produced by second to fourth graders on such familiar number lines (Booth & Siegler, 2006; Sella et al., 2015; Siegler & Booth, 2004) were then modelled using functions capturing either the MNL (i.e., the MLLM and SPMs) or calculation strategies, such as proportional reasoning (i.e., CPMs).
Considering the logarithmic-like (as opposed to linear-like) estimation patterns usually observed in younger children or with less familiar number ranges, we would like to point out that different conclusions regarding the constructs underlying (bounded and unbounded) NLEs and their relations to arithmetics might be drawn in those cases. It was, however, beyond the scope of the present study to examine whether the shift from seemingly logarithmic to more linear NLEs with age and experience reflects 1) changes in the disposition of numerical magnitude representations (e.g., Siegler & Opfer, 2003;Siegler & Booth, 2004), 2) the maturation of calculation strategies (e.g., the successful use of reference points, see Slusser et al., 2013), 3) the development of better mensuration skills (e.g., Cohen & Sarnecka, 2014) or 4) the reliance on different cognitive constructs (e.g., MNL and calculation strategies in younger and older individuals respectively, Dackermann et al., 2015) and how this shift might affect the relations to arithmetics across development. Given the linear-like NLEs of older children on familiar number lines, we did also not focus on any additional models, such as the bilinear account, suggested to provide good fits for logarithmic-like NLEs and thereby yet another alternative explanation for the logarithmic-to-linear shift hypothesis (see Ebersbach et al., 2008, or Moeller et al., 2009).

Method

Participants
A cross-sectional sample of 69 elementary school children (20 second graders [11 boys, mean age = 8.17 years, SD = 0.37], 18 third graders [9 boys, mean age = 9.32 years, SD = 0.55], 31 fourth graders [14 boys, mean age = 10.40 years, SD = 0.46]) was assessed on arithmetic and NLE tasks. All children participated voluntarily and were included in the sample only after their parents provided a signed informed consent form. The study was approved by the local Ethics Review Panel (ERP).
Post-hoc power analysis using the G*Power 3 software (Faul, Erdfelder, Buchner, & Lang, 2009) indicated that the present sample size of 69 participants provided a power of 78% to detect interaction effects at the small to medium level with three groups and two repeated measures.

Procedure and Tasks
All tasks were administered in group settings, starting with the arithmetic task.

Arithmetic
To assess children's arithmetic skills, the TTR (Tempo Test Rekenen; De Vos, 1992) was administered. It is a timed, standardized paper-and-pencil arithmetic test that consists of 200 arithmetic number fact problems. All items are presented across five DIN A4 sheets: one sheet each for addition, subtraction, multiplication, division, and a mixture of these operations. On each sheet, the problems increase in difficulty. Children need to solve as many items as possible within 1 min per sheet (i.e., 5 min in total). In line with the results of previous studies (Kim & Opfer, 2017), only the addition and subtraction problems were evaluated for the present study. Children received one point for every correctly solved item (i.e., maximum score = 40 per operation).

NLE
Stimuli and design - The study consisted of a bounded and an unbounded version of the NLE task, requiring participants to indicate the correct position of a presented target number on a number line. All number lines had a constant length of 20 cm and were presented on paper in landscape format with one item per DIN A4 sheet.
In the bounded task, number lines were labelled below the origin and endpoint with the numbers 0 and 100, respectively. Target numbers [2, 3, 6, 7, 9, 11, 13, 17, 18, 27, 35, 47, 53, 64, 75, 82, 95, 99] were placed above the origin with one target number per number line. Children were told that they would be presented with a number line that has only an origin and an endpoint. The task was then explained (in Luxembourgish) as follows: "Look at the number above the number line - where do you think this number goes between 0 and 100. Please mark your estimate on the number line." The number 50 was used as a practice trial.
In the unbounded task, number lines were only labelled below the origin with 0 but did not comprise a clearly defined endpoint. A unit indicating the distance between 0 and 1 was depicted below the origin. Target numbers were presented above the origin. The numerical length of the unbounded number line was 29. However, only items up to 20 [2, 3, 4, 6, 7, 8, 9, 12, 13, 14, 15, 16, 17, 18, 19] were used to keep sufficient space between the largest target number and the physical endpoint. Children were told that there is no end to the number line but that they can see how long the distance from 0 to 1 is. The number 10 was used as a practice trial.
All children first completed the unbounded task followed by the bounded one. This order was chosen to prevent the endpoint '100' given in the latter task from biasing estimations in the former. We particularly wanted to prevent the children from mistakenly assuming that unbounded number lines might also cover the range from 0 to 100. Participants were not informed about the number range covered by the unbounded task prior to completing it. In other words, the task that explicitly defined a number range (i.e., the bounded task) had to be administered last to prevent participants from building up expectations about the number range used in the unbounded task. This procedure has also been used previously in studies assessing unbounded and bounded NLEs with a within-subject design (e.g., Cohen & Sarnecka, 2014; Link, Huber, et al., 2014; Reinert et al., 2017; Reinert et al., 2019). No feedback was provided to the children regarding the correctness of their estimates for either practice or experimental trials in both tasks.
Analysis - Outlier Exclusion. Individual estimates that differed by more than ±2.5 SDs from the respective grade's mean estimate were excluded prior to data analyses (see also Cohen & Sarnecka, 2014). Overall, this resulted in the exclusion of 1.64% and 1.93% of estimates in the unbounded and bounded task, respectively. There was no effect of task or grade and no interaction on the percentage of excluded items. Importantly, this trimming procedure did not change any of the main findings.
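The trimming rule can be illustrated with a minimal sketch. It assumes that the rule is applied to the set of estimates given by one grade for one target number and that the sample standard deviation is used; neither detail is specified in the text, and the function name `trim_outliers` and the example values are hypothetical.

```python
from statistics import mean, stdev

def trim_outliers(estimates, n_sd=2.5):
    """Keep only the estimates lying within n_sd sample SDs of their mean.

    Assumption: `estimates` holds all estimates of one grade for one target.
    """
    if len(estimates) < 2:
        return list(estimates)  # an SD cannot be computed from one value
    centre = mean(estimates)
    spread = stdev(estimates)  # sample SD (n - 1 denominator), an assumption
    return [e for e in estimates if abs(e - centre) <= n_sd * spread]

# Hypothetical estimates for the target 50 on a 0-100 line; 95 is far off.
grade_estimates = [48, 49, 50, 50, 50, 50, 50, 50, 51, 52, 95]
kept = trim_outliers(grade_estimates)
excluded_percent = 100 * (len(grade_estimates) - len(kept)) / len(grade_estimates)
```

With small groups, a single extreme value inflates the SD it is compared against, so the rule can only flag values when enough estimates are available per cell.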
Percent Absolute Error. The percent absolute error (PAE = |estimate - target number| / number scale × 100) was averaged across all target numbers separately for each individual in each task to index overall task difficulty in every child.
We also calculated mean PAEs as a function of target number in both the unbounded and bounded tasks across all children separately for each grade.
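As a worked illustration of the PAE definition, the sketch below computes the error for single estimates and the per-child mean. The function names and the example estimates are hypothetical, and the value passed as `scale` is left to the caller because the text does not state which scale was used for the unbounded task.

```python
def percent_absolute_error(estimate, target, scale):
    """PAE = |estimate - target| / scale * 100, all in number-line units."""
    return abs(estimate - target) / scale * 100.0

def mean_pae(estimates, targets, scale):
    """Average PAE across all target numbers of one child in one task."""
    errors = [percent_absolute_error(e, t, scale)
              for e, t in zip(estimates, targets)]
    return sum(errors) / len(errors)

# Hypothetical child on the bounded 0-100 line: 60 placed at 64, 27 at 24.
single = percent_absolute_error(64, 60, 100)
child_mean = mean_pae([64, 24], [60, 27], 100)
```

Here the first placement is off by 4 units and the second by 3 units on a 100-unit scale, so the child's mean PAE is 3.5.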
Model Fittings. We assessed the goodness of fit of several models previously shown to mathematically reflect children's NLEs. Models were estimated using conventional ordinary least squares regression in R.
We distinguished between models that either index direct estimation and dead-reckoning or reflect calculation strategies, such as proportional judgement. Direct estimation is indicated by the superior fits of either the mixed log-linear model (MLLM) or the scalloped power models (SPMs), with the latter also reflecting some extent of dead-reckoning depending on the variant of the model. Conversely, proportional judgement can be revealed by the superior fits of the cyclic power models (CPMs), including the subtraction bias cyclic model (SBCM).
The MLLM predicts estimates as a weighted sum of logarithmic and linear transforms of the target number (Anobile et al., 2012; Cicchini et al., 2014; Kim & Opfer, 2017; Opfer et al., 2016). It reflects the assumption that logarithmic and linear representations co-exist to some extent in the same individual, since logarithmic-to-linear shifts occur at different times in development for different number ranges. In the MLLM, y indicates the estimate of a target number x on a 0-U number line, α denotes a scaling parameter, and λ is a logarithmicity index that measures the degree of logarithmic compression in estimates. If estimation is perfectly linear, λ converges to 0, whereas its value approaches 1 as estimation shows more logarithmic compression.
The SPMs reflect number estimation via dead-reckoning, whereby participants first estimate a unit and then estimate the position of the following unit based on their current position. Direct estimation is indexed by a single-scalloped power model (1SPM), which is identical to Stevens' power law without a scaling factor. Dual- (2SPM) and multi-scalloped power models (multi-SPM) reflect the estimation of a particular working window of numbers (e.g., 5) before using multiples of this working window to estimate the position of higher target numbers. The 2SPM identifies participants who estimate the working window once before positioning their estimates, while the multi-SPM indexes multiple applications of the working window. In these models, x is the target number, β is the characteristic exponent, describing the numerical bias and the shape of the power function, and d is the size of the working window.
Finally, CPMs index calculation strategies, most likely proportional judgement. These models suggest that participants use at least two reference points (i.e., the origin and the endpoint) to guide their estimations.
Number placement thus occurs by estimating a target number's distance from the lower and upper bounds of the number line until the two distances are consistent. The one-cycle power model (1CPM) reflects the use of two reference points (i.e., the origin and the endpoint), while the two-cycle power model (2CPM) indexes the reliance on an additional central reference point. 1CPM and 2CPM were fitted with one free parameter, the exponent β describing the numerical bias and the shape of the power function. U represents the upper bound of the number line.
In addition to these models, Cohen and Sarnecka (2014) developed the subtraction bias cyclic model (SBCM) to formally model poor mensuration. To scale the length of a number line to a proportion by estimating the distances from the origin and the endpoint of the line, participants need to subtract the target number from the value of the upper bound. To account for difficulties with this subtract-and-compare process, the 1CPM was modified to incorporate an exponent s, capturing this potential bias in subtraction.
Importantly, CPMs require definition of an upper bound U, which can be easily specified in the bounded task. However, for the unbounded task, such an upper bound does not exist per se. In theory, participants might have used the end of the physical line as an upper bound (i.e., 29) or the largest target number (i.e., 19). Since these strategies might vary between participants, a fixed upper bound for testing these models in the unbounded task cannot be used. The upper bound was therefore estimated by the fitting procedure for unbounded NLEs. The range of the parameter accounting for the upper bound was constrained to lie between 19 and 29. Unbounded estimates were fitted only with the modified versions of the SBCM and 1CPM, but not with the more complex 2CPM, which indexes the reliance on an additional central reference point.
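For orientation, the model families described in this section can be summarized in the forms in which they are usually written in the cited literature (Anobile et al., 2012; Cohen & Blanc-Goldhammer, 2011; Hollands & Dyre, 2000). The block below is a sketch under that assumption and may differ in detail from the parameterization fitted here. The symbols y, x, U, α, λ, β and d are those defined in the text, n is an auxiliary symbol introduced only for this summary, and the SBCM is not written out.

```latex
\begin{align*}
\text{MLLM:} \quad & y = \alpha\left[(1-\lambda)\,x + \lambda\,\frac{U}{\ln U}\,\ln x\right] \\
\text{1SPM:} \quad & y = x^{\beta} \\
\text{2SPM:} \quad & y = \begin{cases} x^{\beta}, & x \le d \\ d^{\beta} + (x-d)^{\beta}, & x > d \end{cases} \\
\text{multi-SPM:} \quad & y = n\,d^{\beta} + (x - n\,d)^{\beta}, \qquad n = \lfloor x/d \rfloor \\
\text{1CPM:} \quad & y = U\,\frac{x^{\beta}}{x^{\beta} + (U-x)^{\beta}} \\
\text{2CPM:} \quad & y = \begin{cases}
  \dfrac{U}{2}\,\dfrac{x^{\beta}}{x^{\beta} + (U/2 - x)^{\beta}}, & x \le U/2 \\[2ex]
  \dfrac{U}{2} + \dfrac{U}{2}\,\dfrac{(x - U/2)^{\beta}}{(x - U/2)^{\beta} + (U - x)^{\beta}}, & x > U/2
\end{cases}
\end{align*}
```

Two checks follow directly from these forms: with β = 1 the 1CPM reduces to y = x, and with λ = 0 the MLLM reduces to the linear function y = αx, in line with the description of λ as a logarithmicity index.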
The models were fitted both to individual estimates and to median estimates separately for each grade. Models were compared in terms of goodness of fit by calculating AICc¹ (Akaike information criterion with a correction for finite sample sizes) values (e.g., Burnham, Anderson, & Huyvaert, 2011), with lower AICc values indexing superior model fits (see e.g., Barth et al., 2016; Gross et al., 2018; Kim & Opfer, 2017; Link, Huber, et al., 2014; Luwel et al., 2018; Möhring et al., 2018; Reinert et al., 2017; Sasanguie et al., 2016; Slusser et al., 2013, for a comparable use of AIC). AICc is a penalized-likelihood criterion, meaning that it penalizes for the number of parameters in a model. In contrast to other measures (e.g., R²), it thus considers both goodness of fit and model complexity in terms of the number of parameters (Burnham & Anderson, 2004).
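The fitting and comparison steps can be sketched as follows, using the 1SPM (a power law without a scaling factor) as the example model. Several points are assumptions: the grid search stands in for whatever optimizer was used in R, the least-squares form of AICc is the common textbook version, the parameter count k is taken to include the residual variance, and the estimates are synthetic. All function names are hypothetical.

```python
import math

# Unbounded-task target numbers as listed in the Method section.
TARGETS = [2, 3, 4, 6, 7, 8, 9, 12, 13, 14, 15, 16, 17, 18, 19]

def spm1(x, beta):
    """Single-scalloped power model: a power law without a scaling factor."""
    return x ** beta

def residual_ss(xs, ys, beta):
    """Residual sum of squares of the 1SPM for a given exponent."""
    return sum((y - spm1(x, beta)) ** 2 for x, y in zip(xs, ys))

def fit_beta(xs, ys, lo=0.5, hi=1.5, step=0.001):
    """Least-squares estimate of beta by grid search (stand-in for an optimizer)."""
    n_steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    return min(grid, key=lambda b: residual_ss(xs, ys, b))

def aicc(rss, n, k):
    """AICc for a least-squares fit; k counts the residual variance too (assumption)."""
    aic = n * math.log(rss / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)

# Synthetic estimates (hypothetical): beta = 0.9 plus alternating +/-0.2 noise.
ESTIMATES = [x ** 0.9 + (0.2 if i % 2 == 0 else -0.2)
             for i, x in enumerate(TARGETS)]

beta_hat = fit_beta(TARGETS, ESTIMATES)
rss_hat = residual_ss(TARGETS, ESTIMATES, beta_hat)
aicc_1spm = aicc(rss_hat, len(TARGETS), k=2)  # beta + residual variance
```

For a fixed residual sum of squares, raising k raises AICc, which is the penalty for model complexity that distinguishes this criterion from R².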

Unbounded and Bounded NLEs Estimation Errors
Firstly, we determined whether estimation errors depended on grade and/or the version of the number line task. We therefore fitted a linear mixed-effects model using the lme4 package (Bates, Maechler, Bolker, & Walker, 2014) in the R environment (R Development Core Team, 2007). Task and grade with their interaction term were entered as fixed effects into the model. As random effect, we included intercepts for participants. We then compared this model to reduced models without the effects or interaction in question using chi-square tests on the log-likelihood values. The full model fitted better than the reduced model without the interaction term, χ²(1) = 8.53, p = .003. The full model was also significantly better than the reduced models excluding either grade, χ²(1) = 6.10, p = .01, or task, χ²(1) = 4.26, p = .04. Performance was better on the bounded (PAE = 7.57, SD = 3.43) than on the unbounded task (PAE = 13.02, SD = 5.18). Moreover, performance improved with grade, but mainly on the bounded task (2nd grade: PAE = 9.25, SD = 3.94, 3rd grade: PAE = 8.45, SD = 3.25, 4th grade: PAE = 5.97, SD = 2.43) compared to the unbounded task (2nd grade: PAE = 11.12, SD = 4.33, 3rd grade: PAE = 14.96, SD = 4.97, 4th grade: PAE = 13.12, SD = 5.51). In addition, there was no correlation between unbounded and bounded mean PAEs (r = -.04, p = .74) or SDs of mean PAEs (r = -.004, p = .97). This did not change when controlling for grade (mean PAE: r = .02, p = .89, SD of mean PAE: r = .04, p = .75). Mean PAEs did, however, relate to SDs of mean PAEs in both the unbounded (r = .81, p < .001) and bounded tasks (r = .86, p < .001).
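The p values attached to the model comparisons can be checked by hand, because the upper tail of a chi-square distribution with one degree of freedom equals the two-sided tail of a standard normal distribution. The sketch below uses this identity with the three test statistics reported above; the function name is hypothetical and the snippet is a plausibility check, not the lme4 procedure itself.

```python
import math

def chi2_df1_pvalue(statistic):
    """Upper-tail p value of a chi-square statistic with one degree of freedom.

    If Z is standard normal, Z**2 is chi-square(1), so
    P(chi2 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))

# Likelihood-ratio statistics reported in the text.
reported = {"interaction": 8.53, "grade": 6.10, "task": 4.26}
p_values = {name: chi2_df1_pvalue(stat) for name, stat in reported.items()}
```

The resulting values agree with the reported p = .003, p = .01 and p = .04 after rounding.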
1) The same results were obtained when using the Bayesian information criterion (BIC) instead of AICc as a measure of goodness of fit (see also Dackermann et al., 2015;Opfer et al., 2016).
Next, we correlated unbounded and bounded PAEs with the size of target numbers separately for each grade. Unbounded PAEs significantly increased with increasing target number in every grade (2nd grade: r = .93, p < .001, 3rd grade: r = .93, p < .001, 4th grade: r = .93, p < .001). Conversely, no relations between PAEs and the size of target numbers were observed in the bounded task, except for third grade, where estimation errors were smaller for larger target numbers (r = -.7, p < .001). Visual inspection of the plots indicated a pattern of bounded PAE distribution reflecting fewer errors in and around to-be-expected reference points, including the origin and endpoint (reference points 2, 3, 47, 53, 95, 99: mean PAE = 5.48, SD = 2.28 vs. no reference points 7, 11, 17, 27, 64, 75, 82: mean PAE = 8.00, SD = 3.76), which is characteristic of proportional judgement (Ashcraft & Moore, 2012). This thus strengthens the assumption that different constructs underlie unbounded and bounded NLEs.

Model Fits and Parameters
To further confirm the use of different strategies on unbounded and bounded tasks, we fitted a series of models to children's unbounded and bounded NLEs. Since relying only on mean or median estimates across individuals from each grade for each task might obscure individual differences in estimation patterns and trajectories (see also Sasanguie, Verschaffel, Reynvoet, & Luwel, 2016), all models were fit to individual estimates in a first step. However, for the sake of completeness and to be in accordance with some previous studies (Barth & Paladino, 2011;Cohen & Sarnecka, 2014;Slusser & Barth, 2017;Slusser et al., 2013), model fits were also assessed at the grade level by considering unbounded and bounded grade median estimates.
First, children were grouped based on the model that best described their individual estimates. Table 1 presents the percentage of children best fit by the different models separately for each task and grade. In the unbounded task, the estimates of approximately 50% of the children were best fit by the 1SPM in each grade, while the MLLM explained NLEs in about one third of all the children in every grade. Overall, unbounded NLEs of less than 10% of the children were best described by the multi-SPM or any of the CPMs. Conversely, in the bounded task, the CPMs provided the best fit for the estimates of about 90% of the children in each grade. While the bounded estimates of most of the second and third graders were best explained by the SBCM (i.e., 40%), the estimates of fourth graders were described to an equivalent extent by either the SBCM, 1CPM or 2CPM. The bounded estimates of only 10% of the children in every grade were best described by the MLLM, while the SPMs provided the best fit for the bounded estimates of almost none of the children².
To determine whether NLEs in terms of PAEs depended on the strategy used to position target numbers on either the unbounded or bounded number line (i.e., best-fit model), we performed two one-way ANOVAs on either unbounded or bounded PAEs including best-fit unbounded or bounded model respectively as between-subject factor. For these analyses, individuals whose unbounded NLEs were not described by either the MLLM or 1SPM were pooled to avoid very small sample sizes. For the same reason, those participants whose bounded NLEs were not fit by a variant of the CPMs were combined. Analysis revealed a main effect of best-fit model for bounded PAEs, in that children using three reference points outperformed children whose estimates were best described by either the MLLM or 1SPM (2CPM: PAE = 5.47, MLLM-1SPM: PAE = 9.32, F(3, 64) = 4.33, p = .008, ηp² = .17), even when controlling for grade. No other differences were observed for bounded NLEs. For unbounded PAEs, there was also a main effect of model, such that individuals whose estimates were described by the 1SPM performed better than children whose estimates were fit by the MLLM (MLLM: PAE = 14.71, 1SPM: PAE = 11.44, F(2, 65) = 3.58, p = .034, ηp² = .10).
2) It could be argued that the better fits of the CPMs compared to the MLLM in the bounded task might be explained by a decline in attentional processes on this task rather than by its boundedness. Namely, task order was fixed in the present study with the bounded task always being administered last, and attention was previously shown to generally increase the linearity of NLEs (see e.g., Anobile et al., 2012). Nonetheless, we believe that this is unlikely to be the case since some children's NLEs were best fit by a linear function in the bounded but not in the unbounded task, although the former task was administered last. Together with the aforementioned findings of the lack of correlation between unbounded and bounded PAEs and the qualitative differences in the effect of grade on the latter, we assume that any differences between unbounded and bounded NLEs (in terms of PAE and/or best-fit model) were unlikely to be confounded by task order and attentional processes.
Apart from this, we also considered the average goodness of fit, as indexed by AICc, across all participants per grade for each model used to fit either unbounded or bounded NLEs. Mean unbounded and bounded AICc are displayed in Table 2. According to Burnham and Anderson (2002), models having a ΔAICc within 0-2 of the best model have substantial support and should be taken into consideration when making inferences, while models with a ΔAICc within 4-7 have considerably less support and models with a ΔAICc > 10 have essentially no support. In the unbounded task, the 1SPM (AICc = 26.65) provided the best fit across all individuals, but the MLLM and the remaining SPMs also had substantial support (i.e., ΔAICc < 2, see Table 2). The worst fits were provided by the CPMs, notably the modified version of the 1CPM including an additional free parameter that represented the "upper bound" of the unbounded number line.
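The ΔAICc rule of thumb can be made concrete with a short sketch. It assumes the least-squares form of the criterion, AICc = n·ln(RSS/n) + 2k + 2k(k + 1)/(n − k − 1); whether the study computed AICc in exactly this way is not stated here. Only the 1SPM value of 26.65 comes from the text; the other scores and all function names are hypothetical.

```python
import math

def aicc(rss, n, k):
    """Small-sample corrected AIC for a least-squares fit (assumed form)."""
    aic = n * math.log(rss / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)

def delta_aicc(scores):
    """Difference of each model's AICc to the best (lowest) AICc."""
    best = min(scores.values())
    return {name: value - best for name, value in scores.items()}

def support(delta):
    """Support categories as summarised from Burnham and Anderson (2002)."""
    if delta <= 2:
        return "substantial"
    if 4 <= delta <= 7:
        return "considerably less"
    if delta > 10:
        return "essentially none"
    return "intermediate"  # ranges not named in the summary above

# Only the 1SPM value is taken from the text; the others are made up.
scores = {"1SPM": 26.65, "MLLM": 27.90, "1CPM": 40.10}
deltas = delta_aicc(scores)
labels = {model: support(d) for model, d in deltas.items()}
```

In this made-up example the second model would still count as substantially supported, while the third would have essentially no support.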
The numerical bias captured by the exponent β of the 1SPM did not depend on grade, F(2, 66) = 2.83, p = .07, ηp² = .08, and was significantly positively accelerating across all grades, β = 1.09, t(68) = 11.41, p < .001. Similarly, grade had no effect on λ, which approached 0, λ = 0.098, F(2, 66) = 0.21, p = .81, ηp² = .01. These findings thus provide no evidence for logarithmic-like estimation patterns on unbounded number lines in the current sample. In the bounded task, the SBCM provided the best fit across all participants (AICc = 67.04) as well as in each grade. According to Burnham and Anderson (2002), the bounded NLEs of fourth graders could also be explained by the 1CPM (i.e., ΔAICc < 2), while those of second and third graders were only best described by the SBCM (see Table 2). The 2CPM only provided moderate support for the children's estimates even in fourth graders, where the estimates of almost one third of the children were best fit by this model (see Table 2). Overall, the outcome suggests that children generally rely on two reference points (i.e., the origin and endpoint) with some bias in subtraction, when estimating the position of target numbers on bounded number lines. Interestingly, the exponent s of the SBCM, capturing this subtraction bias, did not differ between grades, F(2, 66) = 2.18, p = .12, ηp² = .06, and was relatively close to 1 in all grades (2nd grade: s = 0.96, 3rd grade: s = 0.96, 4th grade: s = 1.00). Conversely, the exponent β depended on grade for both the SBCM, F(2, 66) = 8.86, p < .001, ηp² = .21, and 1CPM, F(2, 66) = 8.96, p < .001, ηp² = .21. It was negatively accelerating in each grade, but approached 1 as children got older, reflecting progressively linear estimations (SBCM: 2nd grade = 0.69, 3rd grade = 0.71, 4th grade = 0.89; 1CPM: 2nd grade = 0.66, 3rd grade = 0.68, 4th grade = 0.89).
Importantly, unbounded and bounded model parameters did not correlate (see Table 3), further indicating the reliance on different cognitive constructs when estimating the position of target numbers on either unbounded or bounded number lines.
Note. ΔAICc represents the difference to the best model across all grades separately for each task. The best model is indicated by ΔAICc = 0.
Alongside these analyses at the individual level, we also fitted the different models to the children's median estimates separately for each task and grade (see Figure 1). Median estimates in the unbounded task were best described by the 1SPM in second graders and the 2SPM in third and fourth graders. In the bounded task, the SBCM, 1CPM and 2CPM provided the best fits for median estimations of second, third, and fourth graders respectively. However, since median estimates are computed across participants, possibly differing from each other in terms of the best-fit model, one should be cautious when interpreting the outcomes (Firebaugh, 2015).

Median Estimates by Target Number in the Unbounded and Bounded Tasks for Each Grade
Note. Solid lines represent best-fit models. Unbounded median estimates were best fit by the 1SPM in second graders and the 2SPM in third and fourth graders. Bounded median estimates were best fit by the SBCM, 1CPM, and 2CPM in second, third, and fourth graders, respectively.

Arithmetic Skills
To determine whether arithmetic performances depended on grade and/or task (addition vs. subtraction), we also fitted a linear mixed-effects model. Task and grade with interaction term were entered as fixed effects, while we included intercepts for participants as random effect. Model comparisons were done using chi-square tests on the log-likelihood values. Since the full model did not provide a better fit than the reduced model without the interaction term, χ²(1) = 2.53, p = .11, we proceeded with contrasting the model without interaction term to models that were further reduced by excluding the remaining effects in question. Model fits were significantly worse when excluding either grade, χ²(1) = 30.61, p < .001, or task, χ²(1) = 43.47, p < .001. Performances were weaker in the subtraction (17.67, SD = 4.52) than the addition task (21.17, SD = 5.14) and in younger children (2nd grade = 31.70, 3rd grade = 37.56, 4th grade = 44.19). Performances were significantly correlated in both tasks (r = .7, p < .001).
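The chi-square model comparisons can be checked with a minimal sketch. For a likelihood-ratio test with one degree of freedom, the upper-tail p-value equals erfc(√(χ²/2)); the function names are hypothetical, and the log-likelihoods of the fitted models are not available here, so only the reported test statistics are used.

```python
import math

def lrt_statistic(loglik_reduced, loglik_full):
    """Likelihood-ratio statistic: twice the gain in log-likelihood of the full model."""
    return 2.0 * (loglik_full - loglik_reduced)

def chi2_sf_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Reported statistics from the text: interaction term and grade.
p_interaction = chi2_sf_df1(2.53)
p_grade = chi2_sf_df1(30.61)
```

Under this formula, the statistic for the interaction term yields a p-value of roughly .11, in line with the reported value, and the statistic for grade yields a p-value far below .001.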

Relations Between NLEs and Arithmetic Skills
Better addition and subtraction skills were related to significantly fewer and less variable estimation errors, as indexed by individual mean PAEs and SDs of mean PAEs, in the bounded but not the unbounded task (see Table 4, above the diagonal). Controlling for grade did, however, affect the relation between bounded NLEs and subtraction skills (see Table  4, below the diagonal). Since most of the children relied on calculation strategies, such as probably proportional reasoning, when placing target numbers on bounded number lines, we additionally assessed correlations with arithmetic skills when focussing only on those individuals whose bounded NLEs were best described by one of the variants of the CPMs (n = 60). Correlations between arithmetic skills and bounded mean PAEs as well as SDs of mean PAEs remained stable compared to including all children (Fisher's z for comparison of correlations based on independent samples: addition: mean PAE: z = 0.14, p = .89, SD of mean PAE: z = -0.05, p = .96, subtraction: mean PAE: z = 0.41, p = .68, SD of mean PAE: z = 0.40, p = .69, see Table 5, above the diagonal). This suggests that relations with arithmetic skills were not driven by individuals that did not apply a calculation strategy, such as proportional judgement (n = 9). Moreover, controlling for grade had no effect on the significance of the relations between arithmetic skills and bounded NLEs, when including only those children likely using proportional reasoning (see Table 5, below the diagonal). To further assess the importance of calculation strategies, we determined whether the relation between arithmetic skills and bounded NLEs was conditional upon strategy use by performing moderation analysis using Hayes' PROCESS macro for SPSS. In two separate analyses, we assessed the effect of bounded mean PAEs on addition and subtraction skills respectively, including best-fit model as moderator. 
Those children whose estimates were best fit by either the MLLM or the 1SPM were categorized as not applying any calculation strategy and compared to those individuals classified as either SBCM, 1CPM or 2CPM. This resulted in a multi-categorical moderator of four levels. In this model, moderation is depicted by the significant effect of the interaction term between bounded mean PAEs and best-fit model on addition and/or subtraction skills, while controlling for the effects of the factors included in the interaction term. A bootstrapping approach with 5,000 bootstrap samples was used. Significance was determined using 95% bias-corrected confidence intervals. To avoid multicollinearity issues, all variables were mean centred prior to analyses.
While the interaction between bounded mean PAEs and best-fit model did not account for a significant proportion of the variance in addition skills (ΔR² = .06, p = .13), a tendency was observed for subtraction skills (ΔR² = .08, p = .08). When examining the conditional effect at the four different levels of the moderator, no significant relation between bounded mean PAEs and subtraction skills was observed in children that did not apply any calculation strategy, b = -0.03, t(61) = -0.10, p = .92. Conversely, as expected, fewer bounded estimation errors were associated with significantly better subtraction skills in children whose estimates were best described by either the SBCM, b = -0.88, t(61) = -4.00, p < .001, the 1CPM, b = -0.93, t(61) = -2.57, p = .01, or the 2CPM, b = -1.06, t(61) = -1.99, p = .05.
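The idea of a conditional effect at each level of a categorical moderator can be sketched as follows. In a regression that includes the predictor, the categorical moderator, and their full interaction, the conditional slope at each moderator level equals the ordinary least-squares slope within that group. The sketch below is not the PROCESS macro and omits the bootstrapped inference; the data, group labels, and function names are hypothetical.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def conditional_slopes(records):
    """records: (group, predictor, outcome) tuples.

    The predictor is mean centred on the grand mean, then a slope is
    estimated separately within each moderator level.
    """
    grand_mean = sum(x for _, x, _ in records) / len(records)
    by_group = {}
    for group, x, y in records:
        xs, ys = by_group.setdefault(group, ([], []))
        xs.append(x - grand_mean)
        ys.append(y)
    return {group: ols_slope(xs, ys) for group, (xs, ys) in by_group.items()}

# Hypothetical, noise-free data: no relation in one group, a negative one in the other.
records = []
for pae_value in [4.0, 6.0, 8.0, 10.0, 12.0]:
    records.append(("no_strategy", pae_value, 18.0))
    records.append(("SBCM", pae_value, 25.0 - 0.9 * pae_value))
slopes = conditional_slopes(records)
```

Mean centring changes the intercepts and the interpretation of the lower-order terms but leaves these conditional slopes unchanged.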
We also assessed correlations between addition and subtraction skills and best-fit model parameters (see Table 6). Model parameters were adjusted by taking the absolute values of s - 1 and β - 1. This allowed us to directly determine whether larger deviations from 1 were associated with poorer arithmetic skills. Parameters of the best-fit models in the unbounded task (i.e., λ of MLLM and β of 1SPM) did not correlate with either addition or subtraction skills. Conversely, significant correlations were observed between arithmetic skills and parameters of the models best describing bounded NLEs (i.e., s of SBCM and βs of SBCM, 1CPM and 2CPM). More concretely, smaller biases (i.e., parameter values closer to 1) were associated with better arithmetic skills. Interestingly, parameter s of SBCM, capturing the extent of subtraction bias, was related to addition but not subtraction skills. Finally, we fitted a linear mixed model to determine the effects of best-fit CPM (SBCM vs. 1CPM vs. 2CPM) and/or task (addition vs. subtraction) on arithmetic skills to assess whether the latter performances were affected by the specific type of proportional judgement strategy applied by the children. Best-fit CPM and task with interaction term were entered as fixed effects and intercepts for participants as random effect. Model comparisons were done using chi-square tests on the log-likelihood values. Since the full model did not provide a better fit than the reduced model without the interaction term, χ²(1) = 0.54, p = .46, we proceeded with contrasting the model without interaction term to models that were further reduced by excluding the remaining effects in question. Model fits were significantly worse when excluding task, χ²(1) = 40.24, p < .001, but not best-fit CPM, χ²(1) = 1.19, p = .28. This thus indicates no differences in overall arithmetic skills depending on the specific calculation strategy used, with additions always being easier than subtractions.
In sum, our findings suggest that calculation strategies, such as probably proportional reasoning, underlie the relation between NLEs and arithmetics. However, using a more sophisticated strategy with three instead of two reference points does not positively affect arithmetic skills or, vice versa, better arithmetic skills do not necessarily entail the reliance on more reference points when estimating the position of target numbers on bounded number lines. Conversely, how well a child applies a certain calculation strategy depends on arithmetic abilities, considering the relations of the different model parameters (i.e., βs) with addition and subtraction skills.

Discussion
In this paper, we aimed to shed further light on the cognitive mechanisms underlying the commonly observed relation between NLEs and arithmetics in older elementary school children. Below we will first discuss the cognitive constructs indexed by the two different NLE tasks and then consider how those mechanisms could explain the differential relations of unbounded and bounded NLEs to arithmetic performances.

Unbounded and Bounded NLEs Likely Index Different Cognitive Constructs
Unbounded and bounded NLEs were unrelated in terms of both estimation errors and model parameters. Moreover, only bounded NLEs improved with grade and were overall better than unbounded performances. In addition, while unbounded NLEs were best described by models reflecting direct estimation, functions supposedly capturing proportional judgment provided better fits for bounded NLEs. The two tasks thus probably elicit different estimation strategies with only the unbounded version providing an accurate measure of children's numerical magnitude representations in the current sample. More concretely, unbounded NLEs of about 85% of the children were best described by either the MLLM or the 1SPM, reflecting direct estimation. In addition, the goodness of fits (as indexed by AICc) provided by these models were better than those of the CPMs, probably indexing proportional reasoning strategies. Unbounded estimation errors also significantly increased as a function of target number. This pattern reflects a signature of the approximate number system (e.g., Cantlon, Cordes, Libertus, & Brannon, 2009;Cohen & Blanc-Goldhammer, 2011;Gallistel & Gelman, 2000) and is not characteristic of proportional judgment, indicated by an M-shaped distribution of error variability (e.g., Link, Huber, et al., 2014;Reinert et al., 2019). Our findings thus suggest that the unbounded task directly assesses the representation of numerical magnitudes. This agrees with previous studies, reporting the superior fits of models directly tapping into number representation (Cohen & Blanc-Goldhammer, 2011;Cohen & Sarnecka, 2014;Jung et al., 2020;Kim & Opfer, 2017;Link, Huber, et al., 2014;Reinert et al., 2017).
Interestingly, there were no grade-related differences in the degree of logarithmic compression in estimations, as indexed by λ of the MLLM, or the numerical bias in estimations, as reflected by β of 1SPM, suggesting that numerical magnitude representations remained fairly stable at that developmental stage. Moreover, λ was close to 0 and β was significantly positively accelerating across all grades. These findings thus provide no evidence for any logarithmic compression in unbounded estimations, even in the youngest children of the current sample, and thereby confirm the assumption that NLEs are no longer characterized by a logarithmic-like pattern in the current age group using a familiar number range (Booth & Siegler, 2006).
Conversely, bounded NLEs of the majority of the children were best fit by the CPMs, commonly reported to index proportional judgement (Barth & Paladino, 2011;Cohen & Blanc-Goldhammer, 2011;Cohen & Sarnecka, 2014;Jung et al., 2020;Rouder & Geary, 2014;Slusser et al., 2013). While the SBCM provided the best fit for estimates of most of the second and third graders, the estimates of the majority of fourth graders were described to an equivalent extent by either the SBCM, 1CPM or 2CPM. This pattern was confirmed when considering the average goodness of fits of all the different models per grade. Based on AICc, both the SBCM and the 1CPM can provide substantial support in favor of bounded NLEs (e.g., Burnham & Anderson, 2002). Model fits thus suggest that in the bounded task, the majority of the children relied on two reference points (i.e., the origin and endpoint) and used subtraction or division to scale target numbers to line length, with some children (mainly the younger) still lacking the mathematical skills to successfully do so. The use of calculation strategies as opposed to direct estimation in the bounded task is also corroborated by the absence of a significant relation between bounded estimation errors and target numbers. Moreover, fewer errors were made at to-be-expected reference points (i.e., the origin and endpoint), further confirming the reliance on proportional judgement in the bounded task. The application of calculation strategies might then also explain grade-related changes in bounded but not unbounded NLEs, considering that such strategies are fostered in school.

Proportional Reasoning Likely Explains the Relation Between NLEs and Arithmetic Skills
Bounded NLEs, likely indexing proportional reasoning, related to addition and subtraction skills. Importantly, these relations remained significant when controlling for grade in children whose estimates were best described by the CPMs. This generally agrees with previous findings, consistently reporting strong relations between bounded NLEs and arithmetics (e.g., Ashcraft & Moore, 2012;Booth & Siegler, 2006;Fazio et al., 2014;Geary, 2011;Jung et al., 2020;Sasanguie et al., 2013;Siegler & Booth, 2004;see Schneider et al., 2018, for a meta-analysis; see Siegler, 2016, for a review). Conversely, no relation was observed for unbounded NLEs, reflecting direct estimation in the majority of children. This is also in line with previous observations, indicating that correlations between NLEs and mathematical skills were moderated by the variant of the number line task (Schneider et al., 2018), with positive correlations being observed only for bounded but not unbounded NLEs in older elementary school children  and adolescents (Jung et al., 2020). Altogether, these outcomes suggest that the commonly observed association between NLEs and mathematical competences is likely explained by calculation strategies rather than the MNL, at least in second to fourth graders. This can be further supported by the moderation analysis, which revealed a tendency for a significant interaction of bounded NLEs and best-fit model on subtraction skills. More concretely, relations between bounded estimation errors and subtraction skills were only observed in children whose estimates were best described by the CPMs, likely capturing proportional judgement, but not the MLLM or 1SPM, indexing direct estimation. It should, however, be noted that the present moderation analysis was mostly exploratory. Namely, groups based on best-fit model were fairly unequal in size, since bounded NLEs of only 10% of the children were best described by models other than the CPMs.
Nonetheless, it strengthens the assumption that calculation strategies account for the correlation between NLEs and arithmetics in older elementary school children.

Practical Implications
It is important to also comment on the more practical implications of the present outcomes. What does the absence of a relation between arithmetics and unbounded NLEs, probably providing a purer measure of numerical magnitude representations, tell us about the importance of the latter for arithmetic development? The present findings suggest that the scaling of numerical magnitudes on the MNL does not relate to arithmetics in second to fourth graders. This agrees with studies reporting the absence of a relation between the SNARC effect, another important marker of the MNL, and arithmetic skills in older children attending fourth (Georges et al., 2017) or fifth grade (Schneider et al., 2009) and even adults (Cipora & Nuerk, 2013;but see Hoffmann, Mussolin, Martin, & Schiltz, 2014). In contrast, younger children might rely on the MNL when performing arithmetics. Accordingly, Kim and Opfer (2017) reported that not only bounded but also unbounded NLEs, both best described by the MLLM indexing numerical magnitude representations, significantly correlated with addition and subtraction skills in younger children attending preschool, first or second grade when assessed on relatively unfamiliar number lines. Moreover, some previous SNARC studies indicated strong relations between the SNARC effect and arithmetic skills in preschool (Hoffmann et al., 2013) and up to third grade (Georges et al., 2017). In contrast to the latter SNARC study, we did, however, not find a relation between unbounded NLEs and addition or subtraction skills even in the youngest children attending second grade. This might be explained by the possibility that relations to arithmetics depend on the specific properties of the MNL indexed by the numerical task. The spatial properties of the MNL include both directionality (i.e., the specific directional orientation) and scaling (i.e., the spatial intervals between numerical values, see Aulet & Lourenco, 2018). 
While the unbounded task likely reflects the scaling of numerical magnitudes, the SNARC task probably rather indexes the directionality of numerical magnitude representations (Cipora, Patro, & Nuerk, 2015). As such, MNL scaling could be more important for arithmetics at the initial stages of mathematical development, while directionality plays an important role also at later stages until none of the properties of the MNL are predictive of math achievement. Once that stage is reached, arithmetic performances might solely depend on calculation strategies, as captured by the bounded task. Overall, the present findings do not question the importance of the MNL, but only suggest that the scaling of numerical magnitude representations might be less important for arithmetic learning in older elementary school children.
Despite the outcome of the present study providing no evidence for the importance of numerical magnitude representations for arithmetic learning, the relation of bounded NLEs to addition and subtraction skills further suggests that bounded number lines are a valuable and robust tool for predicting mathematical competence. Since number line tasks are easily applicable, relatively short, and very cost-effective, they could also be used to assist the diagnosis of mathematical learning difficulties. The present study also further highlights the potential usefulness of training the positioning of target numbers on bounded number lines for arithmetic development.

Limitations and Future Directions
First of all, it should be emphasized that the present outcomes might not be generalizable to different age groups and/or number ranges. We tested older elementary school children to 1) complement previous within-subject designs assessing both model fits and relations to arithmetics in younger children (Kim & Opfer, 2017) or adolescents (Jung et al., 2020) and to 2) determine whether relations between arithmetics and bounded but not unbounded NLEs previously observed in fourth graders  can actually be explained by the reliance on proportional judgement on bounded as opposed to the MNL on unbounded number lines in such older elementary school children (Link, Huber, et al., 2014). To do so, we focussed on familiar numbers in the range of 0-20 and 0-100 on unbounded and bounded lines respectively (see Link, Huber, et al., 2014 and, for comparable differences in number ranges between bounded and unbounded tasks), since the likelihood of strategy application and the reliable use of reference points was shown to be increased by employing a familiar number range (e.g., Slusser et al., 2013;White & Szűcs, 2012). Furthermore, a familiar number range was also used in the study of , reporting a relation between bounded but not unbounded NLEs and arithmetics in fourth graders.
Since relatively older children (and adults) usually produce seemingly linear estimation patterns on familiar number lines, while negatively accelerating logarithmic-like responses are commonly observed in younger children or with less familiar number ranges, it is probable that different conclusions regarding the constructs underlying bounded and unbounded NLEs as well as their relations to arithmetics might be drawn depending on age and/or number range. Since the shift from seemingly logarithmic to more linear (unbounded and bounded) NLEs with age and experience was suggested to reflect 1) changes in the disposition of numerical magnitude representations (e.g., Siegler & Booth, 2004;Siegler & Opfer, 2003), 2) the maturation of calculation strategies (e.g., the successful use of reference points, see Slusser et al., 2013), 3) the development of better mensuration skills (e.g., Cohen & Sarnecka, 2014) or 4) the reliance on different cognitive constructs (e.g., MNL and calculation strategies in younger and older individuals respectively, Dackermann et al., 2015), all of these factors could potentially differentially influence relations to arithmetics across development.
The idea that different cognitive constructs might underlie NLEs at different developmental stages could be supported by the findings of Yuan and colleagues (2020), showing that performances on the bounded task were related to counting skills and did not correlate with a clustered dot array task that is readily solvable by proportional reasoning skills in 4- to 6-year-olds. This suggests that as opposed to older and more experienced children, bounded NLEs in younger individuals might not index calculation strategies, such as proportional reasoning, but reflect numerical magnitude representations. This would also agree with studies suggesting that strategy application might only develop in older children on familiar number lines (e.g., White & Szűcs, 2012). It could also provide an explanation for the findings of Kim and Opfer (2017), indicating that the negatively accelerating (logarithmic-like) estimation patterns of younger children attending preschool, first or second grade on less familiar bounded number lines were better fitted by the MLLM, indexing the MNL, than by alternative models reflecting calculation strategies. Likewise, Link, Huber, et al. (2014) reported similar results for bounded and unbounded number lines in first graders, suggesting that both tasks might assess the same construct, namely numerical magnitude representations, in these relatively younger children. On the other hand, Cohen and Sarnecka (2014) argued that the negatively accelerating pattern on bounded lines in younger children or with higher number ranges might still describe the application of a calculation strategy (rather than the MNL), yet result from relatively poorer mensuration skills. Likewise, Slusser et al. (2013) reported that younger children's bounded NLEs were best described by the proportion judgment account and that their logarithmic-like estimation pattern reflected the inability to correctly use the upper reference point.
In any case, these findings collectively suggest that bounded NLEs might not necessarily index the same advanced proportional reasoning strategy in younger, less experienced children as in older children on familiar number lines.
Apart from the constructs underlying NLEs, their relations to arithmetics might also vary depending on age and/or number range. For instance, Kim and Opfer (2017) reported that not only bounded but also unbounded NLEs significantly correlated with addition and subtraction skills in younger children attending preschool, first or second grade when assessed on relatively unfamiliar number lines (see Footnote 3). Interestingly, as already mentioned before, in that study both bounded and unbounded NLEs were best described by the MLLM, indexing numerical magnitude representations. This thus agrees with previous studies highlighting the importance of the MNL for arithmetic learning at earlier developmental stages (e.g., Hoffmann et al., 2013). λ of the MLLM also significantly differed from zero for both bounded and unbounded NLEs, confirming the negatively accelerating logarithmic-like estimation patterns usually observed at this developmental stage. It is therefore possible that as long as unbounded NLEs are characterized by a logarithmic-like pattern, possibly reflecting logarithmically compressed numerical magnitude representations (Kim & Opfer, 2017), they explain variance in arithmetics, with less compression indicating better performances. Since Kim and Opfer (2017) could draw similar conclusions for bounded as for unbounded NLEs at such earlier developmental stages, it is likely that proportional reasoning only underlies the relation between bounded NLEs and arithmetics in older children, as assessed in the current study.
Another important point worth mentioning is that due to the linear-like estimation patterns usually observed in older children on familiar number lines, the present study did not include any additional models, such as the bilinear (also known as decomposed linear or segmented linear) account. This model was suggested to provide good fits for logarithmic-like NLEs and is thereby yet another alternative explanation for the logarithmic-to-linear shift hypothesis (see Ebersbach et al., 2008, or Moeller et al., 2009). More concretely, this account suggests that the logarithmic-like shape of NLEs in younger children or on unfamiliar number lines results from a difference in numerical processing depending on the number interval, with each number segment yielding a different slope. While Ebersbach and colleagues (2008) suggested that the breakpoint at which the response function alters reflects changes in number familiarity, Moeller and colleagues (2009) argued that it might depend on the understanding of the place-value structure of the Arabic number system. Since such an estimation pattern originating from two linear intervals can also be fitted by a logarithmic curve, these authors argued that the underlying numerical magnitude representations might have been erroneously assumed to feature a logarithmic disposition in younger children. Instead, they argue for a decomposed representation of either single- and two-digit numbers or of familiar and unfamiliar numerical magnitudes. This decomposed representation is then eventually integrated into one holistic linear representation with age and experience, explaining mostly linear-like NLEs in older children.
Including this account as a potential model for negatively accelerating response patterns in younger children or with less familiar numbers could shed further light on whether the initially logarithmic-like response patterns on unbounded and bounded number lines actually index a logarithmically compressed MNL (e.g., Siegler & Booth, 2004;Siegler & Opfer, 2003), relatively poorer mensuration skills (e.g., Cohen & Sarnecka, 2014), the inability to use reference points (e.g., Slusser et al., 2013), or rather insufficient place-value coding (Moeller et al., 2009). Finally, apart from using different models with unfamiliar number ranges at earlier developmental stages, future studies should also complement the present findings by considering potential domain-specific and/or domain-general covariates. In the current study, we employed a hybrid design, where we used a modelling procedure to better understand the constructs underlying (bounded and unbounded) NLEs and consequently individual differences in addition and subtraction skills (see e.g., Kim & Opfer, 2017, or Jung et al., 2020, for a similar design). Nonetheless, further assessing the influences of basic numerical skills and/or visuospatial abilities on NLEs and arithmetics as well as their relations might strengthen the current conclusions. Based on the present outcomes, one would for instance envisage a relation between visuospatial abilities, likely required for efficient proportional reasoning, and bounded but not (or to a lesser extent) unbounded NLEs. Visuospatial abilities might then also act as a confounder in the relation between bounded NLEs and arithmetics. Support for this assumption has already been provided by some studies, showing that the relation between bounded NLEs and mathematical performances was fully explained by visuospatial abilities in 10-year-old children (Simms et al., 2016) and adults (Sella et al., 2016).
Moreover, since only unbounded but not bounded NLEs were found to provide a reliable index of the MNL in the current sample using a familiar number range, only the former might relate to other measures of symbolic and non-symbolic numerical magnitude representations, such as (non-)symbolic number comparison performance or the distance and/or SNARC effects. Evidence for this idea already comes from the study of Schneider and colleagues (2009), who found no relations between bounded NLEs and the SNARC or distance effect in fifth and sixth graders.
3) Considering that Kim and Opfer (2017) presented participants with unbounded number lines up to 1000, but likely did not provide enough space to validly measure a positively accelerating response function, unbounded NLEs could have been inadvertently biased in a similar way to bounded NLEs. Their conclusions based on unbounded lines should therefore be interpreted with caution (see also Cohen & Ray, 2020).

Conclusion
In second to fourth graders, unbounded and bounded NLEs index different cognitive constructs. While unbounded estimates reflect direct estimation, thereby providing an appropriate measure of the scaling of numerical magnitude representations, bounded estimates rather index calculation strategies, such as proportional reasoning. These calculation strategies then likely account for the relation of bounded but not unbounded NLEs to addition and subtraction skills. Although the present findings do not provide any evidence for the involvement of numerical magnitude representations for arithmetic learning, we cannot rule out their importance at earlier developmental stages. Depending on the aim of future studies, the present outcomes suggest measuring estimations on unbounded number lines if one is interested in directly assessing numerical magnitude representations in second to fourth graders. Conversely, if one aims to predict arithmetic skills at this age, one should rather assess estimations on bounded number lines, likely indexing proportional reasoning.
Funding: The current research was supported by the National Research Fund Luxembourg (FNR, www.fnr.lu) under Grant AFR PhD-2013-1/5558196.