<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article
  PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD with MathML3 v1.2 20190208//EN" "JATS-journalpublishing1-mathml3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ali="http://www.niso.org/schemas/ali/1.0/" article-type="research-article" dtd-version="1.2" xml:lang="en">
<front>
<journal-meta><journal-id journal-id-type="publisher-id">JNC</journal-id><journal-id journal-id-type="nlm-ta">J Numer Cogn</journal-id>
<journal-title-group>
<journal-title>Journal of Numerical Cognition</journal-title><abbrev-journal-title abbrev-type="pubmed">J. Numer. Cogn.</abbrev-journal-title>
</journal-title-group>
<issn pub-type="epub">2363-8761</issn>
<publisher><publisher-name>PsychOpen</publisher-name></publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">jnc.17459</article-id>
<article-id pub-id-type="doi">10.5964/jnc.17459</article-id>
<article-categories>
<subj-group subj-group-type="heading"><subject>Empirical Research</subject></subj-group>
</article-categories>
<title-group>
<article-title>Optimizing the 0-100 Number Line Estimation Task: Scale Reduction and Its Implications for Elementary Mathematical Cognition</article-title>
<alt-title alt-title-type="right-running">Optimizing 0-100 Number Line Estimation Task</alt-title>
<alt-title specific-use="APA-reference-style" xml:lang="en">Optimizing the 0-100 number line estimation task: Scale reduction and its implications for elementary mathematical cognition</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes"><name name-style="western"><surname>Chawla</surname><given-names>Kamal</given-names></name><xref ref-type="corresp" rid="cor1">*</xref><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib>
<contrib contrib-type="author"><name name-style="western"><surname>Booth</surname><given-names>Julie L.</given-names></name><xref ref-type="aff" rid="aff2"><sup>2</sup></xref></contrib>
<contrib contrib-type="author"><name name-style="western"><surname>Barbieri</surname><given-names>Christina Areizaga</given-names></name><xref ref-type="aff" rid="aff3"><sup>3</sup></xref></contrib>
<contrib contrib-type="editor">
<name>
<surname>Yeo</surname>
<given-names>Darren J.</given-names>
</name>
<xref ref-type="aff" rid="aff4"/>
</contrib>
<aff id="aff1"><label>1</label><institution content-type="dept">College of Education and Human Development</institution>, <institution>University of Maine</institution>, <addr-line><city>Orono</city>, <state>ME</state></addr-line>, <country country="US">USA</country></aff>
<aff id="aff2"><label>2</label><institution content-type="dept">College of Liberal Arts</institution>, <institution>Temple University</institution>, <addr-line><city>Philadelphia</city>, <state>PA</state></addr-line>, <country country="US">USA</country></aff>
<aff id="aff3"><label>3</label><institution content-type="dept">College of Education and Human Development</institution>, <institution>University of Delaware</institution>, <addr-line><city>Newark</city>, <state>DE</state></addr-line>, <country country="US">USA</country></aff>
<aff id="aff4">Nanyang Technological University, Singapore, <country>Singapore</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>*</label>326 Shibles Hall, College of Education and Human Development, University of Maine, Orono, United States. <email xlink:href="kamal.chawla@maine.edu">kamal.chawla@maine.edu</email></corresp>
</author-notes>
<pub-date date-type="pub" publication-format="electronic"><day>08</day><month>05</month><year>2026</year></pub-date>
<pub-date pub-type="collection" publication-format="electronic"><year>2026</year></pub-date>
<volume>12</volume>
<elocation-id>e17459</elocation-id>
<history>
<date date-type="received">
<day>26</day>
<month>03</month>
<year>2025</year>
</date>
<date date-type="accepted">
<day>31</day>
<month>12</month>
<year>2025</year>
</date>
</history>
<permissions><copyright-year>2026</copyright-year><copyright-holder>Chawla, Booth, &amp; Barbieri</copyright-holder><license license-type="open-access" specific-use="CC BY 4.0" xlink:href="https://creativecommons.org/licenses/by/4.0/"><ali:license_ref>https://creativecommons.org/licenses/by/4.0/</ali:license_ref><license-p>This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International License, CC BY 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p></license></permissions>
<abstract>
<p>We investigate the optimal number of items for the 0-100 number line estimation task used in research on children’s mathematical cognition and learning. We reanalyzed data from <italic>N</italic> = 234 students, applying an Item Response Theory Graded Response Model to identify items with high discrimination parameters (&gt; 1.0), iteratively reducing the 23-item scale by including items with discrimination values close to 1.0 until the reduced scale produced scores comparable to the original. Our analysis identified a reduced scale of 15 items that correlated strongly with the original scale and reproduced its patterns of developmental change and its predictive capability. Our findings demonstrate that a reduced 0-100 number line estimation task can effectively measure numerical magnitude understanding (accuracy and linearity of estimates) from kindergarten through third grade while saving time and resources.</p>
</abstract>
<kwd-group kwd-group-type="author"><kwd>mathematical cognition</kwd><kwd>mathematics</kwd><kwd>number line</kwd><kwd>estimation</kwd><kwd>cognitive development</kwd><kwd>reduced scale</kwd><kwd>item response theory</kwd></kwd-group>
</article-meta>
</front>
<body>
<sec sec-type="intro"><title></title>
<p>Over the past two decades, the number line estimation (NLE) task has arguably become one of the most well-known methods of measuring numerical competence in adults and children alike. A simple Google Scholar search using the term “Number Line Estimation” yielded over 3300 hits, and the seminal NLE papers have each been cited over 1100 times (<xref ref-type="bibr" rid="r9">Booth &amp; Siegler, 2006</xref>, <italic>1102 citations</italic>; <xref ref-type="bibr" rid="r54">Siegler &amp; Booth, 2004</xref>, <italic>1566 citations</italic>). Although not every citation indicates that the task itself was used in the citing study, it is clear, at a minimum, that results from NLE tasks have been widely used to shape our understanding of numerical cognition.</p>
<p>The core NLE task involves presenting a series of blank number lines with the endpoints of the line marked by specific numbers; for each item, a third number that is numerically somewhere between the two endpoint numbers is given, and the participant’s task is to estimate where that number would fall on the number line (e.g., <xref ref-type="bibr" rid="r54">Siegler &amp; Booth, 2004</xref>). For example, on a 0-100 number line task, the number “0” would be marked at the left endpoint and “100” marked at the right endpoint, and an individual item might ask participants to demonstrate where they believe individual numbers (e.g., 23, 47, 2, 96, 14, etc.) belong on the line. Scoring of this task typically involves either computing a participant’s estimation accuracy across all of their items</p>
<disp-formula>
  <mml:math>
    <mml:mrow>
      <mml:mtext>Percent of Absolute Error (PAE)</mml:mtext>
      <mml:mo>=</mml:mo>
      <mml:mrow>
        <mml:mo>|</mml:mo>
        <mml:mfrac>
          <mml:mrow>
            <mml:mtext mathvariant="italic">Target Number</mml:mtext>
            <mml:mo>&#x2212;</mml:mo>
            <mml:mtext mathvariant="italic">Estimated Number</mml:mtext>
          </mml:mrow>
          <mml:mrow>
            <mml:mtext mathvariant="italic">Scale of Estimates</mml:mtext>
          </mml:mrow>
        </mml:mfrac>
        <mml:mo>|</mml:mo>
      </mml:mrow>
      <mml:mo>&#x00D7;</mml:mo>
      <mml:mn>100</mml:mn>
    </mml:mrow>
  </mml:math>
</disp-formula>
<p>(<xref ref-type="bibr" rid="r54">Siegler &amp; Booth, 2004</xref>) or examining a participant’s pattern of estimates across all of their items by fitting different types of models to their estimates (e.g., linear, logarithmic, exponential (<xref ref-type="bibr" rid="r9">Booth &amp; Siegler, 2006</xref>); power models (<xref ref-type="bibr" rid="r5">Barth &amp; Paladino, 2011</xref>; <xref ref-type="bibr" rid="r46">Ruiz et al., 2023</xref>); mixed log-linear models (<xref ref-type="bibr" rid="r42">Qin et al., 2024</xref>)).</p>
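<p>As a rough illustration of the accuracy score defined above, the following Python sketch computes PAE for single items and averages it across items for one participant. The function names and the estimates are hypothetical and are not taken from the original studies.</p>
<code language="python"><![CDATA[
```python
def percent_absolute_error(target, estimate, scale=100.0):
    """PAE for one item: |target - estimate| / scale * 100."""
    return abs(target - estimate) / scale * 100.0


def mean_pae(targets, estimates, scale=100.0):
    """Average PAE across all items for one participant."""
    if len(targets) != len(estimates):
        raise ValueError("targets and estimates must have the same length")
    errors = [percent_absolute_error(t, e, scale)
              for t, e in zip(targets, estimates)]
    return sum(errors) / len(errors)


# Hypothetical participant: five targets on a 0-100 line, made-up estimates.
targets = [23, 47, 2, 96, 14]
estimates = [30, 45, 10, 90, 20]
participant_pae = mean_pae(targets, estimates)
```
]]></code>
<p>For these made-up estimates the absolute errors are 7, 2, 8, 6, and 6 points, so the mean PAE is about 5.8; lower values indicate more accurate estimates.</p>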
<p>The use of the NLE task has been widespread across both contexts and cultures. For example, the task has been administered in laboratory or one-on-one settings (<xref ref-type="bibr" rid="r41">Praet &amp; Desoete, 2014</xref>; <xref ref-type="bibr" rid="r54">Siegler &amp; Booth, 2004</xref>; <xref ref-type="bibr" rid="r64">Wall et al., 2016</xref>), online using platforms such as MTurk (<xref ref-type="bibr" rid="r34">Landy et al., 2017</xref>) and Qualtrics (<xref ref-type="bibr" rid="r24">Fitzsimmons &amp; Thompson, 2022</xref>), in an fMRI scanner (<xref ref-type="bibr" rid="r63">Vogel et al., 2013</xref>), and in both K-12 (<xref ref-type="bibr" rid="r4">Barbieri et al., 2021</xref>; <xref ref-type="bibr" rid="r33">Jung et al., 2020</xref>) and college classrooms (<xref ref-type="bibr" rid="r59">Steinke, 2017</xref>). NLE tasks have also been employed in studies across the globe, including in Israel (<xref ref-type="bibr" rid="r1">Ashkenazi &amp; Cohen, 2023</xref>), Singapore (<xref ref-type="bibr" rid="r47">Ruiz et al., 2024</xref>), Chile (<xref ref-type="bibr" rid="r65">Xu et al., 2023</xref>), Luxembourg (<xref ref-type="bibr" rid="r39">Nuraydin et al., 2023</xref>), Zambia (<xref ref-type="bibr" rid="r60">Sudo et al., 2022</xref>), China (<xref ref-type="bibr" rid="r35">Li et al., 2024</xref>), Germany (<xref ref-type="bibr" rid="r33">Jung et al., 2020</xref>), Turkey (<xref ref-type="bibr" rid="r49">Sarı &amp; Olkun, 2021</xref>), and with individuals from Indigenous tribes (<xref ref-type="bibr" rid="r17">Dehaene et al., 2008</xref>).</p>
<p>The characteristics of the number line task have varied widely across studies. For example, many studies use a paper-and-pencil administration (e.g., <xref ref-type="bibr" rid="r9">Booth &amp; Siegler, 2006</xref>; <xref ref-type="bibr" rid="r14">Chan &amp; Mazzocco, 2024</xref>), while others use computerized versions of the task (e.g., <xref ref-type="bibr" rid="r10">Booth &amp; Siegler, 2008</xref>; <xref ref-type="bibr" rid="r23">Fazio et al., 2014</xref>; <xref ref-type="bibr" rid="r28">Gunderson &amp; Hildebrand, 2021</xref>). Most studies use the number-to-position version of the task described above, though others have given a mark on a number line and asked the participant to estimate what number goes there (Position-to-Number, e.g., <xref ref-type="bibr" rid="r40">Peeters et al., 2017</xref>; <xref ref-type="bibr" rid="r54">Siegler &amp; Booth, 2004</xref>). And while most studies present each item individually as described above (i.e., participants only see one number line at a time), others have provided a page containing several number lines that are visible at once (<xref ref-type="bibr" rid="r3">Barbieri et al., 2023</xref>; <xref ref-type="bibr" rid="r66">Young &amp; Booth, 2015</xref>) or have asked participants to place multiple numbers on the same number line (<xref ref-type="bibr" rid="r54">Siegler &amp; Booth, 2004</xref>; <xref ref-type="bibr" rid="r59">Steinke, 2017</xref>).</p>
<p>Perhaps the greatest variability across studies using number line estimation tasks is the numerical scale and the type of numbers to be placed. While many studies have continued to employ the 0 – 100 or 0 – 1000 whole-number scales introduced in the original studies (e.g., <xref ref-type="bibr" rid="r31">Hoard et al., 2008</xref>; <xref ref-type="bibr" rid="r41">Praet &amp; Desoete, 2014</xref>; <xref ref-type="bibr" rid="r49">Sarı &amp; Olkun, 2021</xref>; <xref ref-type="bibr" rid="r61">Sullivan et al., 2011</xref>), others have used much smaller (e.g., 0 – 10, <xref ref-type="bibr" rid="r19">Dietrich et al., 2016</xref>; 0 – 20, <xref ref-type="bibr" rid="r16">Cornu et al., 2017</xref>; 0 – 8, <xref ref-type="bibr" rid="r67">Yu et al., 2022</xref>) or larger scales (e.g., 0 – 10,000, <xref ref-type="bibr" rid="r33">Jung et al., 2020</xref>; 0 – 1,000,000, <xref ref-type="bibr" rid="r58">Slusser et al., 2013</xref>), irregular scales such as 0 – 62,571 (<xref ref-type="bibr" rid="r8">Booth et al., 2014</xref>) or 1,000 – 1,000,000,000 (<xref ref-type="bibr" rid="r34">Landy et al., 2017</xref>), and scales involving negative numbers (-1000 – 0, <xref ref-type="bibr" rid="r11">Brez et al., 2016</xref>; -1,000 – 10,000, <xref ref-type="bibr" rid="r66">Young &amp; Booth, 2015</xref>); some have even explored the use of unbounded number lines (e.g., <xref ref-type="bibr" rid="r36">Link et al., 2014</xref>; <xref ref-type="bibr" rid="r43">Reinert et al., 2019</xref>). The number line task has also been used to measure participants’ understanding of different types of numbers, including fractions (e.g., <xref ref-type="bibr" rid="r7">Booth &amp; Newton, 2012</xref>; <xref ref-type="bibr" rid="r38">Namkung &amp; Fuchs, 2016</xref>), decimals (<xref ref-type="bibr" rid="r18">DeWolf et al., 2015</xref>; <xref ref-type="bibr" rid="r52">Schneider et al., 2009</xref>), and percentages (<xref ref-type="bibr" rid="r51">Schiller et al., 2024</xref>).</p>
<p>Despite the wide variability in the particulars of NLE task administration, there is a large consensus that individuals’ NLE performance is related to (and often predictive of) their general mathematical competence. For example, NLE has been shown to predict arithmetic performance (e.g., <xref ref-type="bibr" rid="r19">Dietrich et al., 2016</xref>; <xref ref-type="bibr" rid="r28">Gunderson &amp; Hildebrand, 2021</xref>), problem-solving skills (e.g., <xref ref-type="bibr" rid="r68">Zhu et al., 2017</xref>), and mathematical reasoning (e.g., <xref ref-type="bibr" rid="r47">Ruiz et al., 2024</xref>). In a meta-analysis of 41 papers, <xref ref-type="bibr" rid="r53">Schneider and colleagues (2018)</xref> reported strong associations between NLE and counting, computation skills, and school mathematics achievement. <xref ref-type="bibr" rid="r21">Ellis and colleagues (2021)</xref> replicated these overall findings, emphasizing the importance of performance on the NLE task, but found performance to be particularly predictive of mathematical competence in younger children. NLE performance even correlates with brain activation while solving arithmetic problems (<xref ref-type="bibr" rid="r6">Berteletti et al., 2015</xref>), and poor NLE performance has been linked to mathematical learning disabilities (<xref ref-type="bibr" rid="r26">Geary et al., 2012</xref>). In a recent systematic review of 33 manuscripts that examine predictors of algebra performance, NLE tasks were one of the most studied student-level factors, with fraction NLE tasks consistently demonstrating their predictive utility, but whole number line tasks were also often used (<xref ref-type="bibr" rid="r56">Silla et al., under review</xref>).</p>
<p>The task’s widespread use and its usefulness for predicting more complex mathematical competencies make it likely that the NLE task will continue to be an essential measure in psychology and education research. Given its prominence as a predictor of school mathematics achievement and the fact that NLE data collection often takes place in schools during precious classroom time (either through full-class administration or by pulling individual students out of class), it is necessary to ensure that the task is as efficient as possible. However, the number of target items has varied considerably from study to study, ranging from a minimum of 6 items to 44 or more items (<xref ref-type="bibr" rid="r53">Schneider et al., 2018</xref>). Additionally, although studies with older children and adolescents often use paper-based tasks that can be administered reasonably quickly in whole-group settings (e.g., <xref ref-type="bibr" rid="r4">Barbieri et al., 2021</xref>), this is not the case for younger children. In studies with young children, number line tasks are often administered one-on-one with an experimenter using an iPad (e.g., <xref ref-type="bibr" rid="r27">Geary et al., 2008</xref>). Given that the 0 – 100 scale, in particular, is among the most widely used scales (696 Google Scholar hits, compared with 465 for 0 – 1000 and 542 for 0 – 1) and is useful for young children (who likely do not yet have fully developed attentional control; <xref ref-type="bibr" rid="r62">Tremolada et al., 2019</xref>), we aim to determine how few target items could be used to get an accurate measure of children’s numerical magnitude understanding to ensure responsible use of teachers’ and students’ time.</p>
<sec sec-type="other1"><title>Scale Reduction Techniques</title>
<p>Scale reduction techniques are widely used in research to create more efficient versions of longer measurement scales while maintaining their essential psychometric properties. These methods are especially valuable when researchers seek to reduce participant burden, streamline data collection, and increase the feasibility of applying the scale in various settings.</p>
<p>One technique used in this study is the discrimination parameter method, derived from the Graded Response Model (GRM) within Item Response Theory (IRT). The GRM is particularly suited to items scored in ordered categories, such as the categorized number line estimation responses in the present study. Discrimination parameters reflect how well an item distinguishes between respondents at different levels of the measured latent trait. By prioritizing items with high discrimination parameters, researchers can retain the items that are most effective at differentiating between individuals whose trait levels differ only slightly (<xref ref-type="bibr" rid="r22">Embretson &amp; Reise, 2000</xref>). Items with low discrimination values are discarded because they contribute little to the overall precision of the scale.</p>
<p>The optimal range of discrimination parameters in the GRM typically falls between 0.5 and 2 (<xref ref-type="bibr" rid="r30">Hambleton et al., 1991</xref>). Items with discrimination values in this range are considered effective at differentiating between respondents, making them candidates for retention on a reduced scale. Lower values, particularly below 0.5, indicate weak discrimination, and such items are often removed from the scale due to their minimal contribution to measurement accuracy. Within the optimal range, items with high discrimination values (above 1.5) are particularly effective at distinguishing between respondents with different ability levels, while items with moderate discrimination values (between 0.5 and 1.5) still contribute meaningfully to the overall precision of the scale. Setting the retention cutoff at around 1.0 therefore keeps only items with comparatively strong discrimination on the reduced scale. This technique allows for a more concise instrument without compromising its ability to measure the intended construct.</p>
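<p>To illustrate why discrimination matters, the Python sketch below computes category probabilities under the standard form of Samejima’s graded response model, in which the probability of responding in category <italic>k</italic> or higher is a logistic function of <italic>a</italic>(θ − <italic>b</italic><sub><italic>k</italic></sub>). The function name and all parameter values are illustrative assumptions rather than estimates from the present data, and the parameterization used by a given software package may differ.</p>
<code language="python"><![CDATA[
```python
import math


def grm_category_probabilities(theta, discrimination, thresholds):
    """Category probabilities under Samejima's graded response model.

    thresholds must be in increasing order; K - 1 thresholds give K categories,
    returned from the lowest category to the highest.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be ordered")
    # Cumulative probability of responding in category k or higher.
    cumulative = [1.0]
    for b in thresholds:
        cumulative.append(1.0 / (1.0 + math.exp(-discrimination * (theta - b))))
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulatives.
    return [cumulative[k] - cumulative[k + 1]
            for k in range(len(thresholds) + 1)]


# Illustrative (not estimated) parameters: same thresholds, two discriminations.
thresholds = [-1.0, 0.5]
low_a = grm_category_probabilities(theta=1.0, discrimination=0.5,
                                   thresholds=thresholds)
high_a = grm_category_probabilities(theta=1.0, discrimination=2.0,
                                    thresholds=thresholds)
```
]]></code>
<p>With these made-up thresholds, a respondent at θ = 1.0 receives a higher probability of the top category under the larger discrimination value, which is the sense in which high-discrimination items separate respondents more sharply.</p>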
<p>Another common method for scale reduction focuses on assessing model fit indices, such as Root Mean Square Error (RMSE) and Standardized Root Mean Square Residual (SRMR). These indices are frequently used in confirmatory factor analysis (CFA) to evaluate how well a model (with a reduced number of items) fits the data. While RMSE and SRMR help provide an overall picture of model fit, they are often used in conjunction with discrimination-based approaches to ensure both psychometric robustness and conceptual clarity of the reduced scale (<xref ref-type="bibr" rid="r12">Brown, 2015</xref>). In these approaches, researchers typically retain items that contribute to lower error rates and higher overall fit indices, making them essential tools when developing shorter scales that maintain their integrity.</p>
<p>Lastly, some techniques focus on maximizing the amount of information provided by the scale at different points on the trait continuum. In these approaches, which rely on item and test information functions, items are selected based on their contribution to the total information at key points (e.g., low, moderate, and high levels of the trait). This method allows researchers to ensure that the reduced scale provides reliable measurements across the entire range of trait levels (<xref ref-type="bibr" rid="r2">Baker, 2001</xref>).</p>
<p>By employing discrimination parameters from the GRM and prioritizing items whose discrimination exceeds a cutoff value, the current study uses an efficient method to produce a reduced scale that remains effective and valid for measuring developmental changes in number line estimation. This technique is balanced against other reduction strategies to ensure that the final scale continues to perform comparably to the original instrument.</p></sec>
<sec sec-type="other2"><title>The Present Study</title>
<p>In the present study, we investigate the usefulness of scale reduction techniques for finding the optimal number of items for the 0 – 100 number line estimation task. By reanalyzing the original data from two seminal studies that employed this measure (<xref ref-type="bibr" rid="r9">Booth &amp; Siegler, 2006</xref>; <xref ref-type="bibr" rid="r54">Siegler &amp; Booth, 2004</xref>), we aim to determine the minimal set of items that could produce a scale with the same psychometric properties as the original, longer scale. These seminal studies employed the 0 – 100 number line task with students from kindergarten to second grade and kindergarten to third grade, respectively. We recognize that second and third graders are typically quite linear and accurate on the 0 – 100 number line scale, and thus number line estimates on the 0 – 1000 scale might be a more appropriate indicator of their numerical competence. However, evaluating a reduced version of this most widely used scale (0 – 100) requires demonstrating that it replicates the findings of the original papers, which in turn requires using all of the data included in those papers, not just the data from kindergartners and first graders, for whom this scale might be most relevant. We thus compare potentially reduced scales to the original scale on two measures of interest from the original studies to determine if a reduced scale yields sufficiently comparable performance scores for students. We also conduct exploratory factor analysis (EFA) to draw meaningful psychometric comparisons between the two scales. Finally, we aim to replicate key findings from the original studies to determine if a reduced scale would yield the same answers to the original studies’ research questions about developmental changes in number line estimates and the correlation of those estimates with mathematics achievement.</p></sec></sec>
<sec sec-type="methods"><title>Method</title>
<sec><title>Data Sources</title>
<p>The dataset comprises accumulated data from three previous experiments: <xref ref-type="bibr" rid="r54">Siegler and Booth (2004)</xref> Experiment 1; <xref ref-type="bibr" rid="r54">Siegler and Booth (2004)</xref> Experiment 2; and <xref ref-type="bibr" rid="r9">Booth and Siegler (2006)</xref> Experiment 1, each of which measured number line estimation on a 0-100 scale using 23 common target numbers: 3, 4, 6, 8, 12, 17, 21, 24, 25, 29, 33, 39, 48, 52, 57, 61, 64, 72, 79, 81, 84, 90, and 96. The <xref ref-type="bibr" rid="r54">Siegler and Booth (2004)</xref> study also used 43, while the <xref ref-type="bibr" rid="r9">Booth and Siegler (2006)</xref> study used 42; because these two items were not common to all experiments, they were excluded from the analysis. Estimates for the 23 common numbers, along with grade level, age, school, and achievement score, were compiled for <italic>N</italic> = 234 students across the three experiments: 61 kindergartners, 76 first-graders, 75 second-graders, and 22 third-graders. In this task, students were presented with sheets of paper, one at a time, each containing a 25 cm number line with 0 marked at the left endpoint and 100 marked at the right endpoint. A number between 0 and 100 was printed at the top center of the page, and the student’s task was to indicate where on the number line they believed that number would go by making a mark at the chosen spot. For each item, the original researchers measured the distance in millimeters from the left endpoint to the student’s mark. This distance was divided by the length of the whole line (250 mm) and multiplied by 100 to determine the number corresponding to the position of the mark. The data used for the present study consisted of those numbers for each student for each item and the student’s age, grade level, and mathematics achievement score.</p></sec>
<sec><title>Assessment Framework</title>
<p>This study uses number line estimation as a diagnostic tool for evaluating students' representations of numerical magnitude. Because these representations underlie foundational mathematical understanding, we examine patterns in students' placements of whole numbers on the number line. The following sections describe the item selection criteria, the recoding of responses into ordered categories, and the modeling of the probability of each response category as a function of item difficulty and discrimination.</p></sec>
<sec><title>Item Selection Criteria</title>
<p>The redevelopment of the number line estimation scale was guided by item discrimination parameters and a cutoff value of approximately 1.0, as recommended by <xref ref-type="bibr" rid="r22">Embretson and Reise (2000)</xref> and <xref ref-type="bibr" rid="r30">Hambleton and colleagues (1991)</xref>. This method prioritizes items according to their ability to distinguish between individuals of marginally differing trait levels, coupled with the spacing and meaningfulness of response categories. Using this method, we refined the scale by discarding items whose discrimination parameter fell clearly below the cutoff value of 1.0. This approach underpins the construction of a concise yet effective assessment tool, in contrast to alternative methodologies that emphasize model fit and error metrics, such as RMSE and SRMR values.</p></sec>
<sec><title>Grading System and Data Transformation</title>
<p>The grading system used in this study coded estimates within ±5 points of the target number as '0' (very close), overestimates of more than 5 and up to 10 points as '1' (far), and overestimates of more than 10 points as '2' (very far). Conversely, underestimates of more than 5 and up to 10 points were coded as '-1' (far), and underestimates of more than 10 points as '-2' (very far). The ±5/±10 cutoffs were selected for interpretability and because they yielded ordered thresholds and well-behaved category response curves without sparse categories; alternative nearby cutoffs (±3/±6, ±5/±15, ±3/±7) produced substantively similar item discriminations and test information, supporting robustness.</p>
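<p>A minimal Python sketch of this coding follows. It assumes that the error is the estimate minus the target and that a boundary value belongs to the closer category; the function names and example estimates are hypothetical. The second function drops the sign and reverses the order so that larger codes mean closer estimates, as an ordered-category model requires.</p>
<code language="python"><![CDATA[
```python
def signed_error(target, estimate):
    """Signed error: positive for overestimates, negative for underestimates."""
    return estimate - target


def signed_category(error, near=5.0, far=10.0):
    """Code an error as 0 (very close), +/-1 (far), or +/-2 (very far)."""
    size = abs(error)
    if size <= near:
        return 0
    level = 1 if size <= far else 2
    return level if error > 0 else -level


def magnitude_category(error, near=5.0, far=10.0):
    """Collapse the sign: 3 = very close, 2 = far, 1 = very far."""
    return 3 - abs(signed_category(error, near, far))


# Hypothetical estimates for three target numbers on a 0-100 line.
targets = [17, 48, 81]
estimates = [20, 40, 95]
errors = [signed_error(t, e) for t, e in zip(targets, estimates)]
signed = [signed_category(err) for err in errors]
ordered = [magnitude_category(err) for err in errors]
```
]]></code>
<p>For these made-up values the errors are 3, -8, and 14 points, giving signed codes of 0, -1, and 2 and magnitude-only codes of 3, 2, and 1.</p>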
<p>We computed signed estimation error on the 0 – 100 scale and then collapsed it into the magnitude-only ordered categories required by a unidimensional graded response model: 3 = very close (|error| ≤ 5), 2 = far (5 &lt; |error| ≤ 10), 1 = very far (|error| &gt; 10). The codes were transformed to this 1 – 3 scale to align with the GRM’s unidimensional, ordered-category assumptions; the numeric labels are ordinal, and the GRM operates on the latent trait via estimated discrimination and threshold parameters.</p></sec>
<sec><title>Probability of Endorsing a Response</title>
<p>Two factors determine a student's likelihood of falling into a specific response category, which reflects how far the estimate deviates from the target: item difficulty and item discrimination. Because item responses are polytomous, ranging from very close to very far from the correct answer, we applied the IRT-Graded Response Model (IRT-GRM) for parameter estimation. This model yields the probability of each response category and thereby provides insight into each item's difficulty and discriminatory power. The GRM requires an ordered set of response categories that reflect increasing deviations from the correct placement; the cutoffs here define an ordinal, severity-graded outcome (very close → very far) on each item, which the GRM models via item slopes (discrimination) and ordered threshold parameters that separate adjacent categories.</p></sec>
<sec><title>Data Preparation</title>
<p>To conduct the rest of the planned analyses, we first computed two types of performance scores for each individual student based on each iteration of the reduced scale.</p>
<sec><title>Accuracy of Estimates</title>
<p>We used Percent Absolute Error (PAE; e.g., <xref ref-type="bibr" rid="r54">Siegler &amp; Booth, 2004</xref>) to measure the overall accuracy of students' estimates on the number line</p>
<disp-formula>
  <mml:math>
    <mml:mrow>
      <mml:mi>PAE</mml:mi>
      <mml:mo>=</mml:mo>
      <mml:mrow>
        <mml:mo>|</mml:mo>
        <mml:mfrac>
          <mml:mrow>
            <mml:mtext mathvariant="italic">Target Number</mml:mtext>
            <mml:mo>&#x2212;</mml:mo>
            <mml:mtext mathvariant="italic">Estimated Number</mml:mtext>
          </mml:mrow>
          <mml:mrow>
            <mml:mtext mathvariant="italic">Scale of Estimates</mml:mtext>
          </mml:mrow>
        </mml:mfrac>
        <mml:mo>|</mml:mo>
      </mml:mrow>
      <mml:mo>&#x00D7;</mml:mo>
      <mml:mn>100</mml:mn>
      <mml:mtext>.</mml:mtext>  
    </mml:mrow>
  </mml:math>
</disp-formula>
<p>PAE is thus computed for a given child on a given item by subtracting the actual value of the to-be-estimated number from the numerical value that corresponds with the child’s placement on the number line for that number, taking the absolute value so that over- vs. underestimates are not distinguished, and dividing the value by the scale of the number line (in this case, 100 since it is a 0 – 100 number line); the resulting value is then multiplied by 100 to get a percentage. The average PAE is then computed across all the items for a given child to provide a measure of overall accuracy. For the present study, PAE scores were computed for each child for the entire scale and then, in turn, for each potential reduced scale.</p></sec>
<sec><title>Pattern of Estimates</title>
<p>As in <xref ref-type="bibr" rid="r54">Siegler and Booth (2004)</xref> and <xref ref-type="bibr" rid="r9">Booth and Siegler (2006)</xref>, we first computed the median estimate for each target number among the children in a particular grade and fit linear and logarithmic functions to the median estimates at each grade level. We then recorded the variance that could be explained by the best-fitting linear (<inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula>) and logarithmic (<inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">log</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula>) functions at that grade level. This process was undertaken for the entire scale and for each potential reduced scale.</p>
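<p>The comparison of linear and logarithmic fits can be sketched in Python as follows, assuming ordinary least squares and a logarithmic function of the form <italic>a</italic> + <italic>b</italic> ln(<italic>x</italic>). The function names and the simulated estimates are hypothetical; only the list of target numbers comes from the task described under Data Sources.</p>
<code language="python"><![CDATA[
```python
import math


def r_squared(x, y):
    """R^2 of the least-squares line y = a + b * x (squared correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def linear_and_log_fit(targets, estimates):
    """Return (R2_lin, R2_log): estimates regressed on targets and ln(targets)."""
    r2_lin = r_squared(targets, estimates)
    r2_log = r_squared([math.log(t) for t in targets], estimates)
    return r2_lin, r2_log


# The 23 common target numbers used in the task.
targets = [3, 4, 6, 8, 12, 17, 21, 24, 25, 29, 33, 39,
           48, 52, 57, 61, 64, 72, 79, 81, 84, 90, 96]
# Made-up estimates that follow a compressive (logarithmic) pattern exactly.
log_like = [100 * math.log(t) / math.log(100) for t in targets]
r2_lin, r2_log = linear_and_log_fit(targets, log_like)
```
]]></code>
<p>Because the simulated estimates are an exact logarithmic function of the targets, the logarithmic fit explains essentially all of the variance and the linear fit explains less; passing a subset of the target numbers and the matching estimates would give the corresponding values for a reduced scale.</p>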
<p>Then, to obtain a measure of the pattern of individual students’ estimates, we fit linear and logarithmic functions to each child’s estimates and recorded the amount of variance that could be explained by the best-fitting linear (<inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula>) and logarithmic function (<inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">log</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula>) for that child’s estimates. Again, these values were computed for each child for the entire scale and then for each potentially reduced scale.</p></sec></sec>
<sec><title>Exploratory Factor Analysis (EFA)</title>
<p>To further validate the reduced scale, we conducted Exploratory Factor Analysis (EFA) on both the original and reduced scales using the <italic>psych</italic> package in R (<xref ref-type="bibr" rid="r44">Revelle, 2021</xref>). We performed parallel analysis to determine the number of factors to retain (<xref ref-type="bibr" rid="r32">Horn, 1965</xref>). To assess unidimensionality, we calculated the eigenvalue ratio (first to second factor) and the percentage of variance explained by the first factor (<xref ref-type="bibr" rid="r57">Slocum-Gori &amp; Zumbo, 2011</xref>). We also assessed reliability using Cronbach's alpha and McDonald's omega (<xref ref-type="bibr" rid="r20">Dunn et al., 2014</xref>) for further comparisons.</p></sec></sec>
<sec sec-type="results"><title>Results</title>
<sec><title>Graded Response Model (GRM) Analysis</title>
<p>We fit the Item Response Theory (IRT) Graded Response Model to our polytomous dataset using the <italic>ltm</italic> package in R (<xref ref-type="bibr" rid="r45">Rizopoulos, 2018</xref>), which yielded the threshold (extremity) and discrimination values reported in <xref ref-type="table" rid="t1">Table 1</xref>. In the Graded Response Model, thresholds indicate where individuals are likely to transition between adjacent response categories, with higher thresholds corresponding to more difficult items or to response categories that require a higher level of the latent trait.</p>
<table-wrap id="t1" position="anchor" orientation="portrait">
<label>Table 1</label><caption><title>Threshold and Discrimination Values From Graded Response Model Analysis</title></caption>
<table frame="hsides" rules="groups" style="striped-#f3f3f3" width="75%">
<col width="25%"/>
<col width="25%"/>
<col width="25%"/>
<col width="25%"/>
<thead>
<tr>
<th>Target Number</th>
<th>Threshold 1</th>
<th>Threshold 2</th>
<th>Discrimination</th>
</tr>
</thead>
<tbody>
<tr>
<td>3</td>
<td align="char" char=".">-0.787</td>
<td align="char" char=".">0.125</td>
<td align="char" char=".">2.182</td>
</tr>
<tr>
<td>4</td>
<td align="char" char=".">-0.048</td>
<td align="char" char=".">0.771</td>
<td align="char" char=".">2.189</td>
</tr>
<tr>
<td>6</td>
<td align="char" char=".">0.284</td>
<td align="char" char=".">1.002</td>
<td align="char" char=".">2.421</td>
</tr>
<tr>
<td>8</td>
<td align="char" char=".">0.684</td>
<td align="char" char=".">1.174</td>
<td align="char" char=".">2.907</td>
</tr>
<tr>
<td>12</td>
<td align="char" char=".">0.610</td>
<td align="char" char=".">1.292</td>
<td align="char" char=".">2.499</td>
</tr>
<tr>
<td>17</td>
<td align="char" char=".">0.735</td>
<td align="char" char=".">1.475</td>
<td align="char" char=".">2.155</td>
</tr>
<tr>
<td>21</td>
<td align="char" char=".">0.650</td>
<td align="char" char=".">1.421</td>
<td align="char" char=".">1.788</td>
</tr>
<tr>
<td>24</td>
<td align="char" char=".">0.579</td>
<td align="char" char=".">1.454</td>
<td align="char" char=".">1.318</td>
</tr>
<tr>
<td>25</td>
<td align="char" char=".">0.405</td>
<td align="char" char=".">1.369</td>
<td align="char" char=".">1.837</td>
</tr>
<tr>
<td>29</td>
<td align="char" char=".">0.605</td>
<td align="char" char=".">1.446</td>
<td align="char" char=".">1.173</td>
</tr>
<tr>
<td>33</td>
<td align="char" char=".">0.302</td>
<td align="char" char=".">1.081</td>
<td align="char" char=".">1.246</td>
</tr>
<tr>
<td>39</td>
<td align="char" char=".">0.433</td>
<td align="char" char=".">1.635</td>
<td align="char" char=".">0.881</td>
</tr>
<tr>
<td>48</td>
<td align="char" char=".">0.987</td>
<td align="char" char=".">6.014</td>
<td align="char" char=".">0.341</td>
</tr>
<tr>
<td>52</td>
<td align="char" char=".">-0.393</td>
<td align="char" char=".">0.634</td>
<td align="char" char=".">0.997</td>
</tr>
<tr>
<td>57</td>
<td align="char" char=".">-0.131</td>
<td align="char" char=".">0.982</td>
<td align="char" char=".">0.880</td>
</tr>
<tr>
<td>61</td>
<td align="char" char=".">-0.630</td>
<td align="char" char=".">0.969</td>
<td align="char" char=".">0.554</td>
</tr>
<tr>
<td>64</td>
<td align="char" char=".">-0.399</td>
<td align="char" char=".">1.207</td>
<td align="char" char=".">0.598</td>
</tr>
<tr>
<td>72</td>
<td align="char" char=".">-0.791</td>
<td align="char" char=".">1.332</td>
<td align="char" char=".">0.517</td>
</tr>
<tr>
<td>79</td>
<td align="char" char=".">-0.047</td>
<td align="char" char=".">2.557</td>
<td align="char" char=".">0.472</td>
</tr>
<tr>
<td>81</td>
<td align="char" char=".">0.071</td>
<td align="char" char=".">1.687</td>
<td align="char" char=".">0.837</td>
</tr>
<tr>
<td>84</td>
<td align="char" char=".">0.702</td>
<td align="char" char=".">3.201</td>
<td align="char" char=".">0.384</td>
</tr>
<tr>
<td>90</td>
<td align="char" char=".">0.111</td>
<td align="char" char=".">1.747</td>
<td align="char" char=".">0.521</td>
</tr>
<tr>
<td>96</td>
<td align="char" char=".">1.999</td>
<td align="char" char=".">5.182</td>
<td align="char" char=".">0.341</td>
</tr>
</tbody>
</table>
</table-wrap>
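<p>To make the role of these parameters concrete, the following Python sketch computes category probabilities for two of the items in Table 1. It is an illustration only: it assumes the standard logistic form of Samejima's graded response model with three ordered response categories (two thresholds per item), and it assumes that the reported values follow this parameterization. The function name is hypothetical.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
import math


def grm_category_probs(theta, discrimination, thresholds):
    """Category probabilities under the graded response model.

    Assumes P(X >= k | theta) = 1 / (1 + exp(-a * (theta - b_k))).
    """
    cumulative = [1.0]
    for b in thresholds:
        cumulative.append(1.0 / (1.0 + math.exp(-discrimination * (theta - b))))
    cumulative.append(0.0)
    # Probability of each category is the difference of adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Parameters for target numbers 8 and 48, copied from Table 1.
item_8 = dict(discrimination=2.907, thresholds=[0.684, 1.174])
item_48 = dict(discrimination=0.341, thresholds=[0.987, 6.014])

for theta in (-1.0, 0.0, 1.0, 2.0):
    p8 = grm_category_probs(theta, **item_8)
    p48 = grm_category_probs(theta, **item_48)
    print(theta, [round(p, 2) for p in p8], [round(p, 2) for p in p48])
```
]]></code>
<p>Under these assumptions, the probabilities for item 8 change sharply over a narrow range of the latent trait, whereas those for item 48 change only gradually, which is what a low discrimination value implies.</p>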
<p>As mentioned previously, higher discrimination parameters indicate that an item is more effective at differentiating between individuals with slightly different trait levels. Therefore, for our first iteration, we retained all the items whose discrimination values were greater than 1 and then, in further iterations, added back the items whose values were very close to 1. We stopped at the iteration whose results were comparable to those obtained with the original dataset.</p>
<p>As indicated in <xref ref-type="table" rid="t1">Table 1</xref>, all target items above 33 have discrimination values below 1. Thus, for the first iteration, we shortened the test by removing every target item above 33 (the remaining items are 3, 4, 6, 8, 12, 17, 21, 24, 25, 29, and 33). We then successively added items back in, starting with the largest discrimination value: 52 for the second iteration (discrimination value 0.997), 39 and 57 for the third (discrimination values 0.881 and 0.880), and 81 for the fourth (discrimination value 0.837). Thus, the original combined scale had 23 items, Iteration 1 had 11 items, Iteration 2 had 12 items, Iteration 3 had 14 items, and Iteration 4 had 15 items. We did not proceed with Iteration 5 because no further items had discrimination values close to the threshold of 1.0.</p>
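<p>The selection procedure can be reproduced from the discrimination values in Table 1 with a short Python sketch. Two details are assumptions made for illustration rather than rules stated in the text: items are treated as "close to 1" if their discrimination is at least 0.8, and items whose values agree to two decimal places (39 and 57) are added in the same iteration. The function name is hypothetical.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
# Discrimination values copied from Table 1 (target number -> discrimination).
DISCRIMINATION = {
    3: 2.182, 4: 2.189, 6: 2.421, 8: 2.907, 12: 2.499, 17: 2.155,
    21: 1.788, 24: 1.318, 25: 1.837, 29: 1.173, 33: 1.246, 39: 0.881,
    48: 0.341, 52: 0.997, 57: 0.880, 61: 0.554, 64: 0.598, 72: 0.517,
    79: 0.472, 81: 0.837, 84: 0.384, 90: 0.521, 96: 0.341,
}


def build_iterations(discrimination, cutoff=1.0, floor=0.8):
    """Return the item set of each iteration, smallest scale first.

    `floor` (assumed, not from the paper) bounds what counts as close to the cutoff.
    """
    core = sorted(t for t, a in discrimination.items() if a > cutoff)
    candidates = sorted(
        (t for t, a in discrimination.items() if floor <= a <= cutoff),
        key=lambda t: -discrimination[t],
    )
    iterations = [core]
    current = list(core)
    i = 0
    while i < len(candidates):
        # Items sharing the same two-decimal value enter together (assumption).
        level = round(discrimination[candidates[i]], 2)
        group = [t for t in candidates[i:] if round(discrimination[t], 2) == level]
        current = sorted(current + group)
        iterations.append(current)
        i += len(group)
    return iterations


iterations = build_iterations(DISCRIMINATION)
for number, items in enumerate(iterations, start=1):
    print(number, len(items), items)
```
]]></code>
<p>With these assumptions the sketch yields scales of 11, 12, 14, and 15 items, matching Iterations 1 to 4 as described above.</p>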
<?table t1?>
</sec>
<sec><title>Student Performance</title>
<p>To test whether the reduced scale options yield sufficiently comparable performance scores for students, we first computed correlation coefficients for the PAE scores for the original scale with the PAE scores for each potential reduced scale and for the <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> values for the original scale with those for each potential reduced scale.</p>
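<p>As an illustration of this comparison, the Python sketch below computes each child's mean PAE on the full item set and on a reduced item set and then correlates the two sets of scores. The children are hypothetical and generated by a simple rule (a weighted blend of logarithmic and linear placements plus a constant bias); they are not data from the study, and the function names are likewise illustrative. The 15-item set from Iteration 4 is used only as an example of a reduced scale.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
import math

# Target numbers on the original scale (Table 1) and on the Iteration 4 scale.
ORIGINAL = [3, 4, 6, 8, 12, 17, 21, 24, 25, 29, 33, 39,
            48, 52, 57, 61, 64, 72, 79, 81, 84, 90, 96]
REDUCED = [3, 4, 6, 8, 12, 17, 21, 24, 25, 29, 33, 39, 52, 57, 81]


def mean_pae(estimates, items, scale=100):
    """Mean percent absolute error; `estimates` maps target number -> placement."""
    errors = [abs(estimates[t] - t) / scale * 100 for t in items]
    return sum(errors) / len(errors)


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def hypothetical_child(log_weight, bias):
    """Placements blending a logarithmic and a linear pattern (illustrative only)."""
    placements = {}
    for t in ORIGINAL:
        log_placement = 100 * math.log(t) / math.log(100)
        value = log_weight * log_placement + (1 - log_weight) * t + bias
        placements[t] = min(100.0, max(0.0, value))  # keep marks on the line
    return placements


children = [hypothetical_child(w, b)
            for w, b in [(0.0, 0), (0.2, 3), (0.5, -2), (0.8, 5), (1.0, 0)]]
pae_original = [mean_pae(child, ORIGINAL) for child in children]
pae_reduced = [mean_pae(child, REDUCED) for child in children]
print(round(pearson_r(pae_original, pae_reduced), 3))
```
]]></code>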
<p>PAE scores for the original scale were significantly correlated with PAE scores for each of the potential reduced scales, as shown in <xref ref-type="table" rid="t2">Table 2</xref>. To compare these correlations, we used Fisher’s <italic>r</italic>-to-<italic>z</italic> tests for dependent samples. Accuracy for Iteration 2 was more highly correlated with that for the original scale than was Iteration 1 accuracy, Iteration 3 accuracy was more highly correlated than Iteration 2 accuracy, and Iteration 4 accuracy was more highly correlated than Iteration 3 accuracy. Thus, of all the potential reduced scales, Iteration 4 produced the accuracy scores most highly correlated with those from the original scale. Similarly, <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> scores for the original scale were significantly correlated with <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> scores for each of the potential reduced scales, as presented in <xref ref-type="table" rid="t3">Table 3</xref>, and the correlations again strengthened with each successive iteration.</p>
<table-wrap id="t2" position="anchor" orientation="portrait">
<label>Table 2</label><caption><title>PAE Scores: Original vs Iterations</title></caption>
<table frame="hsides" rules="groups" style="striped-#f3f3f3">
<col width="24%" align="left"/>
<col width="18%"/>
<col width="18%"/>
<col width="10%"/>
<col width="11%"/>
<col width="9%"/>
<col width="10%"/>
<thead>
<tr>
<th valign="bottom">Comparison</th>
<th valign="bottom">Correlation Coefficient (<italic>r</italic>)</th>
<th valign="bottom">Degrees of Freedom (<italic>df</italic>)</th>
<th valign="bottom"><italic>p</italic></th>
<th valign="bottom">Fisher’s <italic>z</italic></th>
<th valign="bottom"><italic>z</italic></th>
<th valign="bottom"><italic>p</italic> (<italic>z</italic>-test)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Original vs. Iteration 1</td>
<td align="char" char=".">0.746</td>
<td>234</td>
<td align="char" char=".">&lt; .001</td>
<td>–</td>
<td>–</td>
<td>–</td>
</tr>
<tr>
<td>Original vs. Iteration 2</td>
<td align="char" char=".">0.756</td>
<td>234</td>
<td align="char" char=".">&lt; .001</td>
<td>–</td>
<td>–</td>
<td>–</td>
</tr>
<tr>
<td>Original vs. Iteration 3</td>
<td align="char" char=".">0.776</td>
<td>234</td>
<td align="char" char=".">&lt; .001</td>
<td>–</td>
<td>–</td>
<td>–</td>
</tr>
<tr>
<td>Original vs. Iteration 4</td>
<td align="char" char=".">0.797</td>
<td>234</td>
<td align="char" char=".">&lt; .001</td>
<td>–</td>
<td>–</td>
<td>–</td>
</tr>
<tr>
<td>Iteration 1 vs. Iteration 2</td>
<td>–</td>
<td>–</td>
<td>–</td>
<td align="char" char=".">-3.642</td>
<td>–</td>
<td align="char" char=".">&lt; .001</td>
</tr>
<tr>
<td>Iteration 2 vs. Iteration 3</td>
<td>–</td>
<td>–</td>
<td>–</td>
<td align="char" char=".">-4.741</td>
<td>–</td>
<td align="char" char=".">&lt; .001</td>
</tr>
<tr>
<td>Iteration 3 vs. Iteration 4</td>
<td>–</td>
<td>–</td>
<td>–</td>
<td align="char" char=".">-5.182</td>
<td>–</td>
<td align="char" char=".">&lt; .001</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="t3" position="anchor" orientation="portrait">
<label>Table 3</label><caption><title><inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> Scores Comparisons</title></caption>
<table frame="hsides" rules="groups" style="striped-#f3f3f3">
<col width="24%" align="left"/>
<col width="18%"/>
<col width="18%"/>
<col width="10%"/>
<col width="11%"/>
<col width="9%"/>
<col width="10%"/>
<thead>
<tr>
<th valign="bottom">Comparison</th>
<th valign="bottom">Correlation Coefficient (<italic>r</italic>)</th>
<th valign="bottom">Degrees of Freedom (<italic>df</italic>)</th>
<th valign="bottom"><italic>p</italic></th>
<th valign="bottom">Fisher’s <italic>z</italic></th>
<th valign="bottom"><italic>z</italic></th>
<th valign="bottom"><italic>p</italic> (<italic>z</italic>-test)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Original vs. Iteration 1</td>
<td align="char" char=".">0.462</td>
<td>234</td>
<td align="char" char=".">&lt; .001</td>
<td>–</td>
<td>–</td>
<td>–</td>
</tr>
<tr>
<td>Original vs. Iteration 2</td>
<td align="char" char=".">0.736</td>
<td>234</td>
<td align="char" char=".">&lt; .001</td>
<td>–</td>
<td>–</td>
<td>–</td>
</tr>
<tr>
<td>Original vs. Iteration 3</td>
<td align="char" char=".">0.814</td>
<td>234</td>
<td align="char" char=".">&lt; .001</td>
<td>–</td>
<td>–</td>
<td>–</td>
</tr>
<tr>
<td>Original vs. Iteration 4</td>
<td align="char" char=".">0.926</td>
<td>234</td>
<td align="char" char=".">&lt; .001</td>
<td>–</td>
<td>–</td>
<td>–</td>
</tr>
<tr>
<td>Iteration 1 vs. Iteration 2</td>
<td>–</td>
<td>–</td>
<td>–</td>
<td align="char" char=".">-8.771</td>
<td>–</td>
<td align="char" char=".">&lt; .001</td>
</tr>
<tr>
<td>Iteration 2 vs. Iteration 3</td>
<td>–</td>
<td>–</td>
<td>–</td>
<td align="char" char=".">-5.829</td>
<td>–</td>
<td align="char" char=".">&lt; .001</td>
</tr>
<tr>
<td>Iteration 3 vs. Iteration 4</td>
<td>–</td>
<td>–</td>
<td>–</td>
<td align="char" char=".">-13.858</td>
<td>–</td>
<td align="char" char=".">&lt; .001</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Next, we computed a series of paired-sample <italic>t</italic>-tests to compare the mean PAE and <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> scores for each potential reduced scale with the mean scores for the original scale, as shown in <xref ref-type="table" rid="t4">Table 4</xref>.</p>
<table-wrap id="t4" position="float" orientation="portrait">
<label>Table 4</label><caption><title>Differences Between Each Iteration and Original Scale in PAE and <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> Metrics</title></caption>
<table frame="hsides" rules="groups" style="striped-#f3f3f3">
<col width="18%" align="left"/>
<col width="17%"/>
<col width="12%"/>
<col width="12%"/>
<col width="12%"/>
<col width="12%"/>
<col width="17%"/>
<thead>
<tr>
<th>Measure</th>
<th>Iteration</th>
<th><italic>t</italic></th>
<th><italic>df</italic></th>
<th><italic>p</italic></th>
<th><italic>d</italic></th>
<th>Effect Size</th>
</tr>
</thead>
<tbody>
<tr>
<td>PAE</td>
<td>1</td>
<td align="char" char=".">-3.003</td>
<td>466</td>
<td align="char" char=".">.003*</td>
<td align="char" char=".">0.28</td>
<td>Small</td>
</tr>
<tr>
<td/>
<td>2</td>
<td align="char" char=".">-2.451</td>
<td>466</td>
<td align="char" char=".">.015</td>
<td align="char" char=".">0.23</td>
<td>Small</td>
</tr>
<tr>
<td/>
<td>3</td>
<td align="char" char=".">-1.906</td>
<td>466</td>
<td align="char" char=".">.057</td>
<td align="char" char=".">0.18</td>
<td>Small</td>
</tr>
<tr>
<td/>
<td>4</td>
<td align="char" char=".">-1.727</td>
<td>466</td>
<td align="char" char=".">.085</td>
<td align="char" char=".">0.16</td>
<td>Small</td>
</tr>
<tr style="grey-border-top">
<td><inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula></td>
<td>1</td>
<td align="char" char=".">2.287</td>
<td>466</td>
<td align="char" char=".">.023</td>
<td align="char" char=".">0.21</td>
<td>Small</td>
</tr>
<tr>
<td/>
<td>2</td>
<td align="char" char=".">2.296</td>
<td>466</td>
<td align="char" char=".">.022</td>
<td align="char" char=".">0.21</td>
<td>Small</td>
</tr>
<tr>
<td/>
<td>3</td>
<td align="char" char=".">2.068</td>
<td>466</td>
<td align="char" char=".">.039</td>
<td align="char" char=".">0.19</td>
<td>Small</td>
</tr>
<tr>
<td/>
<td>4</td>
<td align="char" char=".">1.956</td>
<td>466</td>
<td align="char" char=".">.051</td>
<td align="char" char=".">0.18</td>
<td>Small</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Note.</italic> To account for multiple comparisons, we applied a classic Bonferroni correction, adjusting the significance threshold to α = .05/8 = .00625.</p>
<p>*<italic>p</italic> &lt; .00625.</p>
</table-wrap-foot>
</table-wrap>
<p>In <xref ref-type="table" rid="t4">Table 4</xref>, we summarize the <italic>t</italic>-test results for both PAE and <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> measures across all four iterations of the reduced scale, including the <italic>t</italic>-values, degrees of freedom, <italic>p</italic>-values, Cohen's <italic>d</italic> effect sizes, and interpretation of effect sizes. To account for multiple comparisons, we applied the Bonferroni correction, adjusting the significance threshold to α = .05/8 = .00625. Only one comparison yielded a statistically significant difference after this correction (that for Iteration 1 PAE). PAE and <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> for Iterations 2, 3, and 4 did not significantly differ from the original scale.</p>
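<p>The Bonferroni decision rule can be checked directly against the <italic>p</italic>-values in Table 4. The short Python sketch below is illustrative only; the dictionary keys and the function name are hypothetical labels.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
# p-values copied from Table 4, keyed by (measure, iteration).
P_VALUES = {
    ("PAE", 1): 0.003, ("PAE", 2): 0.015, ("PAE", 3): 0.057, ("PAE", 4): 0.085,
    ("R2_lin", 1): 0.023, ("R2_lin", 2): 0.022,
    ("R2_lin", 3): 0.039, ("R2_lin", 4): 0.051,
}


def bonferroni_significant(p_values, alpha=0.05):
    """Return the corrected threshold and the comparisons that fall below it."""
    threshold = alpha / len(p_values)  # .05 / 8 comparisons
    significant = [key for key, p in p_values.items() if p < threshold]
    return threshold, significant


threshold, significant = bonferroni_significant(P_VALUES)
print(threshold, significant)
```
]]></code>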
<sec><title>Selecting a Preferable Reduced Scale</title>
<p>We used the findings from the preceding analyses to determine which iteration was preferable. Although Iteration 1 was the only reduced scale in which every item had a discrimination value greater than 1, all eleven of its target items fell in the lower range of the 0 – 100 scale, specifically between 3 and 33 (see <xref ref-type="table" rid="t1">Table 1</xref>). Thus, we were concerned that this scale would not be able to fully capture any developmental changes that occurred at the higher end of the scale. We decided that a reduced scale that also represented the higher end of the 0 – 100 range would be preferable, and we therefore considered Iterations 2 – 4. Iterations 2 and 3 added target items at around the midpoint of the scale (i.e., 52 for Iteration 2 and 39 and 57 for Iteration 3), and we saw these as improvements upon Iteration 1. However, Iteration 4 also included a target item at the higher end of the scale (i.e., 81), which we saw as a particular benefit. We also considered correlations between the original scale and reduced scales’ <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> and PAE scores for each iteration. As displayed in <xref ref-type="table" rid="t3">Table 3</xref>, we found significant correlations for all iterations, yet correlations for our <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> comparisons became significantly and dramatically stronger as we progressed from Iteration 1 (<italic>r</italic> = .462) to Iteration 4 (<italic>r</italic> = .926). As displayed in <xref ref-type="table" rid="t2">Table 2</xref>, correlations between original and reduced scale PAE scores also became progressively and significantly stronger, though the increases were not as drastic, going from Iteration 1 (<italic>r</italic> = .746) to Iteration 4 (<italic>r</italic> = .797).</p>
<p>Next, we considered differences between the original scale and each of the reduced scales in PAE and <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula>. As displayed in <xref ref-type="table" rid="t4">Table 4</xref>, considering the Bonferroni-corrected <italic>p</italic>-value threshold of .00625, only one significant difference was found: PAE for Iteration 1 was significantly different from PAE for the original scale. <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> for Iteration 1, as well as PAE and <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> for Iterations 2, 3, and 4, were not significantly different from those on the original scale.</p>
<p>Based on these analyses, we recommend Iteration 4 as the preferred reduced scale, as it produced the PAE and <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> values that are most highly correlated with the original scale, and because those PAE and <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> values were not significantly different from the original scale. We thus used Iteration 4 in the subsequent analyses to determine if key findings from the original studies are replicated with the reduced scale. This scale included the following 15 items: 3, 4, 6, 8, 12, 17, 21, 24, 25, 29, 33, 39, 52, 57, and 81.</p></sec></sec>
<sec><title>Exploratory Factor Analysis (EFA)</title>
<p>Before replicating key findings from the original studies, we conducted Exploratory Factor Analysis (EFA) to examine the underlying factor structure of both the original and reduced scales and to assess their dimensionality. We first examined the factor structure using principal axis factoring with oblimin rotation. Scree plots were generated to visually compare the eigenvalues of both scales, as shown in <xref ref-type="fig" rid="f1">Figure 1</xref>.</p>
<fig id="f1" specific-use="style(width:75%;)" position="anchor" fig-type="figure" orientation="portrait">
<label>Figure 1</label>
<caption>
<title>Scree Plot Comparison for Both Original Scale and Reduced Scale</title>
</caption>
<graphic xlink:href="jnc.17459-f1.svg" position="anchor" orientation="portrait"/></fig>
<p>Parallel analysis was then performed to determine the number of factors to retain (<xref ref-type="bibr" rid="r32">Horn, 1965</xref>), with results displayed in <xref ref-type="fig" rid="f2">Figure 2</xref>.</p>
<fig id="f2" position="anchor" fig-type="figure" orientation="portrait">
<label>Figure 2</label>
<caption>
<title>Parallel Analysis Comparisons</title>
</caption>
<graphic xlink:href="jnc.17459-f2.svg" position="anchor" orientation="portrait"/></fig>
<p>We compared the observed eigenvalues with those generated from random data to identify significant factors. To assess unidimensionality, we calculated the eigenvalue ratio (first to second factor) and the percentage of variance explained by the first factor (<xref ref-type="bibr" rid="r57">Slocum-Gori &amp; Zumbo, 2011</xref>). Reliability was assessed using Cronbach's alpha and McDonald's omega (<xref ref-type="bibr" rid="r20">Dunn et al., 2014</xref>). Additionally, we examined factor loadings to compare the clarity of factor structure between the original and reduced scales.</p>
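<p>Of the indices listed above, Cronbach's alpha has a simple closed form that can be sketched in a few lines of Python. This is a generic illustration of the standard formula rather than the <italic>psych</italic> routine used for the reported values; the function name and the toy scores are hypothetical.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for `scores`, a list of rows (one per child) of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy item scores for five hypothetical children on four items (not study data).
toy_scores = [
    [0, 1, 0, 1],
    [1, 1, 1, 2],
    [1, 2, 1, 1],
    [2, 2, 1, 2],
    [2, 2, 2, 2],
]
print(round(cronbach_alpha(toy_scores), 3))
```
]]></code>
<p>Applying the function to the columns for the 23 original items and again to the 15 retained items would give the two alpha values to compare.</p>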
<p>As shown in <xref ref-type="table" rid="t5">Table 5</xref>, the reduced scale maintained comparable psychometric properties to the original scale while improving some aspects of unidimensionality. The reduced scale (15 items) demonstrated a slightly lower eigenvalue ratio (3.64) compared to the original scale (3.80), but explained a higher percentage of variance through its first factor (56.49% vs. 54.85%). Both scales exhibited high internal consistency reliability, with the original scale showing marginally higher Cronbach's alpha (α = .86 vs. .85) and identical McDonald's omega values (ω = .89). Notably, parallel analysis suggested fewer factors for the reduced scale (2 factors) compared to the original scale (4 factors), indicating a potentially simpler factor structure. These results suggest that the reduced scale preserves the essential psychometric qualities of the original while potentially offering a more parsimonious measure of the construct.</p>
<table-wrap id="t5" position="anchor" orientation="portrait">
<label>Table 5</label><caption><title>Psychometric Comparisons</title></caption>
<table frame="hsides" rules="groups" width="80%" style="striped-#f3f3f3">
<col width="50%" align="left"/>
<col width="25%"/>
<col width="25%"/>
<thead>
<tr>
<th>Property</th>
<th>Original Scale</th>
<th>Reduced Scale</th>
</tr>
</thead>
<tbody>
<tr>
<td>Number of Items</td>
<td>23</td>
<td>15</td>
</tr>
<tr>
<td>Eigenvalue Ratio (Factor 1/Factor 2)</td>
<td align="char" char=".">3.82</td>
<td align="char" char=".">3.64</td>
</tr>
<tr>
<td>Variance Explained by First Factor (%)</td>
<td align="char" char=".">54.85</td>
<td align="char" char=".">56.49</td>
</tr>
<tr>
<td>Cronbach's Alpha</td>
<td align="char" char=".">0.86</td>
<td align="char" char=".">0.85</td>
</tr>
<tr>
<td>McDonald's Omega </td>
<td align="char" char=".">0.89</td>
<td align="char" char=".">0.89</td>
</tr>
<tr>
<td>Factors Suggested by Parallel Analysis</td>
<td>4</td>
<td>2</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Note.</italic> Reduced Scale: 3, 4, 6, 8, 12, 17, 21, 24, 25, 29, 33, 39, 52, 57, 81.</p>
</table-wrap-foot>
</table-wrap>
<sec><title>Developmental Changes in Accuracy of Number Line Estimates</title>
<p>As in the original studies, we first conducted a one-way ANOVA on PAE scores by grade. In the original publications, there were significant main effects of grade on PAE scores such that kindergartners’ estimates were less accurate than those of first or second graders (<xref ref-type="bibr" rid="r54">Siegler &amp; Booth, 2004</xref>) or first, second, and third graders (<xref ref-type="bibr" rid="r9">Booth &amp; Siegler, 2006</xref>). With the reduced scale, there was also a significant main effect of grade, <italic>F</italic>(3, 230) = 52.766, <italic>p</italic> &lt; .001, <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi mathvariant="normal">η</mml:mi><mml:mi mathvariant="normal">p</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> = .408. Follow-up paired-sample <italic>t</italic>-tests with Bonferroni correction showed that kindergartners had higher PAE (<italic>M</italic> = .28) than first (<italic>M</italic> = .16), second (<italic>M</italic> = .13), or third graders (<italic>M</italic> = .09); first graders also had significantly higher PAE than third graders (all <italic>p</italic> &lt; .05).</p></sec>
<sec><title>Developmental Changes in Patterns of Number Line Estimates</title>
<p>As in the original studies, we used a three-pronged approach to examining developmental change in patterns of number line estimates: (1) comparing the variance explained by linear vs. logarithmic functions fit to median estimates at each grade level, (2) comparing the distribution of the best-fitting type of function (linear or logarithmic) for individual students in each grade level, and (3) examining changes in the amount of variance accounted for by the best-fitting linear function (<inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> scores) for each child by grade level.</p>
<p>As previously mentioned, since the original studies, there have been a number of efforts to test additional functions beyond the linear and logarithmic to assess the nature of the pattern of students’ number line estimates. While it is not practical to test the effectiveness of the reduced scale using all of the types of models that have been introduced, there is one particular model, the mixed log linear model, which relies on a similar theoretical framework as the original studies and which has been shown to parsimoniously capture individual differences in most students’ number line estimates (<xref ref-type="bibr" rid="r42">Qin et al., 2024</xref>). As one final test of the effectiveness of the reduced scale, we thus conclude this section by assessing change in the fit of the mixed log linear model to students’ estimates on the original and the reduced scales.</p></sec>
<sec><title>Variance Explained by Linear vs. Logarithmic Functions Fitting Median Estimates</title>
<p>In the original studies, median estimates were computed for each number for students in a particular grade level (e.g., the median placement of 3 by kindergartners, the median placement of 3 by first graders, the median placement of 4 by kindergartners, etc.). Then, for each grade, the median estimates were plotted against the actual values of the numbers, and the best-fitting linear and logarithmic functions were identified. The absolute differences between the median estimates and the values predicted by the best-fitting linear and logarithmic functions were then computed and compared. Results from those studies generally indicated that kindergartners’ median estimates were better fit by the logarithmic function, first graders’ estimates were typically equally well fit by the two functions, and second and third graders’ estimates were better fit by the linear function (<xref ref-type="bibr" rid="r9">Booth &amp; Siegler, 2006</xref>; <xref ref-type="bibr" rid="r54">Siegler &amp; Booth, 2004</xref>)<xref ref-type="fn" rid="fn1"><sup>1</sup></xref><fn id="fn1"><label>1</label>
<p>As in the original studies, the best-fitting exponential function was also computed, but we do not include it here as it was not the best fit for any grade level.</p></fn>. Plots showing the best-fitting linear and logarithmic functions at each grade level for the original and reduced scales can be found in <xref ref-type="fig" rid="f3">Figure 3</xref>.</p>
<fig id="f3" position="anchor" fig-type="figure" orientation="portrait">
<label>Figure 3</label>
<caption>
<title>Best-Fitting Linear and Logarithmic Functions for Medians at Each Grade for the Original (a-d) and Reduced Scales (e-h)</title>
</caption>
<graphic xlink:href="jnc.17459-f3.svg" position="anchor" orientation="portrait"/></fig>
<p>To evaluate whether a linear or a logarithmic model better fit the median estimates at each grade level, we first computed, for each trial at each grade level, the absolute difference between the median estimate and the value predicted for that trial by the best-fitting linear model and by the best-fitting logarithmic model. We then computed paired-samples <italic>t</italic>-tests for each grade level to compare the average distance of the median estimates from the linear vs. the logarithmic model; each included trial was therefore a separate data point in this analysis. Results for the original scale containing 23 trials, consolidated across studies, were aligned with those described in the original studies (see <xref ref-type="fig" rid="f3">Figures 3a-d</xref>): Kindergartners’ estimates were better fit by the logarithmic (<italic>R</italic><sup>2</sup> = .93) than the linear function (<italic>R</italic><sup>2</sup> = .67; <italic>t</italic>(22) = 3.97, <italic>p</italic> &lt; .001, <italic>d</italic> = 0.80); first graders’ estimates were fit equally well by the logarithmic (<italic>R</italic><sup>2</sup> = .96) and linear functions (<italic>R</italic><sup>2</sup> = .95), with no significant difference between them (<italic>t</italic>(22) = -0.16, <italic>p</italic> = .88, <italic>d</italic> = 0.03); and the estimates of second (<inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> = .98; <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">log</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> = .89; <italic>t</italic>(22) = -6.94, <italic>p</italic> &lt; .001, <italic>d</italic> = 1.40) and third graders (<inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> = .98; <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">log</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> = .85; <italic>t</italic>(22) = -5.01, <italic>p</italic> &lt; .001, <italic>d</italic> = 1.01) were better fit by the linear function.</p>
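<p>The median-level comparison described above can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis code used in the study: the median values are hypothetical numbers generated from a compressive curve, the helper names (<monospace>fit_line</monospace>, <monospace>abs_residuals</monospace>, <monospace>paired_t</monospace>) are our own, and the sign convention (linear distance minus logarithmic distance) is an assumption. Only the 15 target values are taken from the reduced scale reported in this article.</p>
<code language="python"><![CDATA[
```python
import math
from statistics import mean, stdev


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def abs_residuals(xs, ys):
    """Absolute distance of each observed value from the fitted line."""
    a, b = fit_line(xs, ys)
    return [abs(y - (a + b * x)) for x, y in zip(xs, ys)]


def paired_t(first, second):
    """Paired-samples t statistic and degrees of freedom for first - second."""
    diffs = [u - v for u, v in zip(first, second)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


# The 15 reduced-scale targets reported in this article.
targets = [3, 4, 6, 8, 12, 17, 21, 24, 25, 29, 33, 39, 52, 57, 81]
# HYPOTHETICAL medians: a compressive curve invented for illustration only.
medians = [round(20 * math.log(t) + 5) for t in targets]

# Linear model: regress the medians on the targets themselves.
lin_dist = abs_residuals(targets, medians)
# Logarithmic model: regress the medians on ln(target).
log_dist = abs_residuals([math.log(t) for t in targets], medians)

# In this sketch, a positive t means the medians lie farther from the
# linear model than from the logarithmic model.
t_stat, df = paired_t(lin_dist, log_dist)
print(f"t({df}) = {t_stat:.2f}")
```
]]></code>
<p>Because each trial contributes one pair of distances, the degrees of freedom equal the number of trials minus one, which is why the tests reported here have 22 degrees of freedom for the 23-trial scale and 14 for the 15-trial scale.</p>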
<p>For the reduced scale with 15 items, as shown in <xref ref-type="fig" rid="f3">Figure 3e</xref>, kindergarten students’ estimates were better fit by the logarithmic (<italic>R</italic><sup>2</sup> = .93) than the linear function (<italic>R</italic><sup>2</sup> = .57; <italic>t</italic>(14) = 3.91, <italic>p</italic> &lt; .01, <italic>d</italic> = 0.95). Third graders’ estimates on the reduced scale (<xref ref-type="fig" rid="f3">Figure 3h</xref>) were better fit by the linear (<italic>R</italic><sup>2</sup> = .97) than the logarithmic function (<italic>R</italic><sup>2</sup> = .83; <italic>t</italic>(14) = -2.99, <italic>p</italic> &lt; .01, <italic>d</italic> = 0.73). Estimates on the reduced scale were equally well fit by the linear and logarithmic functions for first (<inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> = .92; <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">log</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> = .96; <italic>t</italic>(14) = 1.67, <italic>p</italic> = .12, <italic>d</italic> = 0.41) and second graders (<inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> = .95; <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">log</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> = .90; <italic>t</italic>(14) = -1.25, <italic>p</italic> = .23, <italic>d</italic> = 0.30). 
That is, findings for the reduced scale deviated from those for the original scale only for second graders: the reduced scale showed no significant difference in fit between the linear and logarithmic functions, whereas the original scale showed a better fit for the linear function. This difference is considered in the Discussion. We next examined the distribution of best-fitting functions at the student level.</p>
<?figure f3?>
</sec>
<sec><title>Distribution of the Best-Fitting Type of Function for Individual Students</title>
<p>In the original studies, chi-squared analyses of the distribution of children in each grade for whom the linear or the logarithmic function provided the better fit revealed that the percentage of children whose estimates were best fit by the logarithmic function decreased with grade, and the percentage best fit by the linear function increased with grade. In general, a higher proportion of kindergartners were best fit by the logarithmic function, a higher proportion of the older students (second and/or third graders) were best fit by the linear function, and students in the middle grades were equally likely to be best fit by either function. With the reduced scale, the best-fitting function for students’ estimates again varied by grade, χ<sup>2</sup>(3, <italic>N</italic> = 234) = 24.408, <italic>p</italic> &lt; .001. As shown in <xref ref-type="fig" rid="f4">Figure 4</xref>, the percentage of children best fit by the logarithmic function decreased with age, while the percentage best fit by the linear function increased.</p>
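<p>As an illustrative check, the chi-squared test of independence for the reduced scale can be recomputed from the reduced-scale classifications in <xref ref-type="table" rid="t6">Table 6</xref> (summing each Reduced Scale column within grade). The sketch below assumes a standard Pearson chi-squared statistic without continuity correction; the function name <monospace>chi_square</monospace> is our own, and the result should land close to the reported statistic.</p>
<code language="python"><![CDATA[
```python
# Reduced-scale classifications per grade, summed from the Table 6 columns:
# (students best fit by the logarithmic function, by the linear function).
counts = {
    "Kindergarten": (48 + 2, 2 + 9),
    "First Grade": (36 + 12, 2 + 26),
    "Second Grade": (24 + 12, 0 + 39),
    "Third Grade": (3 + 4, 0 + 15),
}


def chi_square(table):
    """Pearson chi-squared statistic, degrees of freedom, and N."""
    rows = list(table.values())
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(col_totals)
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            # Expected count under independence of grade and function type.
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, dof, n


stat, dof, n = chi_square(counts)
print(f"chi2({dof}, N = {n}) = {stat:.3f}")
```
]]></code>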
<fig id="f4" specific-use="style(width:75%;)" position="anchor" fig-type="figure" orientation="portrait">
<label>Figure 4</label>
<caption>
<title>Best-Fit Measures (Linear vs. Logarithmic)</title>
</caption>
<graphic xlink:href="jnc.17459-f4.svg" position="anchor" orientation="portrait"/></fig>
<p>Kindergartners’ and first graders’ estimates were more likely to be best fit by the logarithmic than the linear function. Second graders’ estimates were equally likely to be best fit by the linear and logarithmic functions, while third graders’ estimates were more likely to be best fit by the linear than the logarithmic function. <xref ref-type="bibr" rid="r9">Booth and Siegler (2006)</xref> also conducted additional paired-sample <italic>t</italic>-tests comparing the mean linear and logarithmic fit at each grade level, revealing that the linear function fit student estimates worse than the logarithmic one for kindergartners, equal to the logarithmic one for first graders, and better than the logarithmic one for second and third graders. The same analyses with the reduced scale across all of the participants revealed that the linear function fit student estimates worse than the logarithmic one for kindergartners [.28 vs. .44; <italic>t</italic>(60) = -8.544, <italic>p</italic> &lt; .001] and first graders [.64 vs. .70; <italic>t</italic>(75) = -2.816, <italic>p</italic> &lt; .01], and better than the logarithmic one for third graders [.85 vs. .76; <italic>t</italic>(21) = 2.742, <italic>p</italic> &lt; .01]. The fit of the linear and logarithmic functions for second graders was equal, with a non-significant trend towards a better fit for the linear function [.72 vs. .69; <italic>t</italic>(74) = 1.726, <italic>p</italic> = .09]. Thus, findings for the reduced scale matched those for the original scale for kindergartners and third graders. For first graders, the reduced scale yielded a better fit for the logarithmic than the linear function, whereas the two functions fit equally well on the original scale; for second graders, the reduced scale yielded an equal fit for the two functions, whereas the linear function fit better on the original scale.</p>
<p>A final way to evaluate the ability of the reduced scale to replicate findings from the original scale is to examine change in the function of best fit for individual students. As shown in <xref ref-type="table" rid="t6">Table 6</xref>, the best-fitting function from the reduced scale matched that for the original scale for 85.5% of individual students (93.4% of kindergartners, 81.6% of first graders, 84.0% of second graders, and 81.8% of third graders). Across all grades, when the best-fitting function did not match, the change was more likely to be towards a better fit by the logarithmic function; that is, most of these students were best fit by the linear function on the original scale but by the logarithmic function on the reduced scale (<italic>N</italic> = 30 students). Only a handful of students (<italic>N</italic> = 4) were better fit by the logarithmic function on the original scale but by the linear function on the reduced scale. In general, the reduced scale most accurately replicated the best-fitting function for individual students for kindergartners and was least accurate for first and third graders.</p>
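<p>The match percentages and change counts can be recomputed directly from the cell counts in <xref ref-type="table" rid="t6">Table 6</xref>. The following sketch is illustrative only; the data structure and the function name <monospace>match_rate</monospace> are our own choices, while the counts are copied from the table.</p>
<code language="python"><![CDATA[
```python
# Table 6 cell counts per grade, keyed by (original_fit, reduced_fit).
table6 = {
    "Kindergarten": {("log", "log"): 48, ("log", "lin"): 2,
                     ("lin", "log"): 2, ("lin", "lin"): 9},
    "First Grade": {("log", "log"): 36, ("log", "lin"): 2,
                    ("lin", "log"): 12, ("lin", "lin"): 26},
    "Second Grade": {("log", "log"): 24, ("log", "lin"): 0,
                     ("lin", "log"): 12, ("lin", "lin"): 39},
    "Third Grade": {("log", "log"): 3, ("log", "lin"): 0,
                    ("lin", "log"): 4, ("lin", "lin"): 15},
}


def match_rate(cells):
    """Percentage of students whose best-fitting function is unchanged."""
    matched = sum(n for (orig, reduced), n in cells.items() if orig == reduced)
    return 100 * matched / sum(cells.values())


rates = {grade: match_rate(cells) for grade, cells in table6.items()}

# Pool the four grades cell by cell for the whole-sample figures.
pooled = {}
for cells in table6.values():
    for key, n in cells.items():
        pooled[key] = pooled.get(key, 0) + n

overall = match_rate(pooled)
lin_to_log = pooled[("lin", "log")]  # linear on original, log on reduced
log_to_lin = pooled[("log", "lin")]  # log on original, linear on reduced

for grade, rate in rates.items():
    print(f"{grade}: {rate:.1f}%")
print(f"Overall: {overall:.1f}% (lin->log: {lin_to_log}, log->lin: {log_to_lin})")
```
]]></code>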
<table-wrap id="t6" position="anchor" orientation="portrait">
<label>Table 6</label><caption><title>Match in Best-Fitting Function for Individual Students With Original vs. Reduced Scales</title></caption>
<table frame="hsides" rules="groups">
<col width="20%" align="left"/>
<col width="20%"/>
<col width="20%"/>
<col width="20%"/>
<col width="20%"/>
<thead>
<tr>
<th rowspan="2" scope="rowgroup" valign="bottom" align="left">Grade</th>
<th rowspan="2" valign="bottom">Best fit on Original Scale</th>
<th colspan="2" scope="colgroup">Best fit on Reduced Scale<hr/></th>
<th rowspan="2" valign="bottom">% Match</th>
</tr>
<tr>
<th>Log</th>
<th>Lin</th>
</tr>
</thead>
<tbody>
<tr>
<th rowspan="2" scope="rowgroup">Kindergarten</th>
<td>Log</td>
<td>48</td>
<td>2</td>
<td rowspan="2" align="char" char=".">93.4%</td>
</tr>
<tr>
<td>Lin</td>
<td>2</td>
<td>9</td>
</tr>
<tr style="grey-border-top">
<th rowspan="2" scope="rowgroup">First Grade</th>
<td>Log</td>
<td>36</td>
<td>2</td>
<td rowspan="2" align="char" char=".">81.6%</td>
</tr>
<tr>
<td>Lin</td>
<td>12</td>
<td>26</td>
</tr>
<tr style="grey-border-top">
<th rowspan="2" scope="rowgroup">Second Grade</th>
<td>Log</td>
<td>24</td>
<td>0</td>
<td rowspan="2" align="char" char=".">84.0%</td>
</tr>
<tr>
<td>Lin</td>
<td>12</td>
<td>39</td>
</tr>
<tr style="grey-border-top">
<th rowspan="2" scope="rowgroup">Third Grade</th>
<td>Log</td>
<td>3</td>
<td>0</td>
<td rowspan="2" align="char" char=".">81.8%</td>
</tr>
<tr>
<td>Lin</td>
<td>4</td>
<td>15</td>
</tr>
</tbody>
</table>
</table-wrap></sec>
<sec><title>Variance Accounted for by the Best Fitting Linear Function (<inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> Scores) for Individual Students</title>
<p>As in the original studies, we first conducted a one-way ANOVA on <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> scores by grade. In the original publications, there were significant main effects of grade on <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> scores such that kindergartners’ estimates were less linear than those of first or second graders (<xref ref-type="bibr" rid="r54">Siegler &amp; Booth, 2004</xref>) or first, second, and third graders (<xref ref-type="bibr" rid="r9">Booth &amp; Siegler, 2006</xref>). With the reduced scale, there was also a significant main effect of grade, <italic>F</italic>(3,230) = 60.05, <italic>p</italic> &lt; .001, <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi mathvariant="normal">η</mml:mi><mml:mi mathvariant="normal">p</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> = .439. 
Follow-up pairwise <italic>t</italic>-tests with Bonferroni correction showed that kindergartners had lower <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> (<italic>M</italic> = .28) than first (<italic>M</italic> = .64), second (<italic>M</italic> = .72), or third graders (<italic>M</italic> = .85); first graders also had significantly lower <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> than third graders (all <italic>p</italic>s &lt; .001).</p></sec>
<sec><title>Variance Accounted for by the Mixed Log-Linear Model for Individual Students</title>
<p>We used the method employed by <xref ref-type="bibr" rid="r42">Qin and colleagues (2024)</xref> to compute the fit of the mixed log-linear model (MLLM) for each student’s estimates, separately for the original scale and for the reduced scale. These calculations yielded two variables of interest for each student for each scale: the amount of variance accounted for by the function (<inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">MLLM</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula>), and the weight of the logarithmic component of the function (λ). For the data in the present study, <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">MLLM</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> scores did not differ significantly between the original (<italic>M</italic> = .59) and reduced scales (<italic>M</italic> = .52; <italic>t</italic>(466) = 0.407, <italic>p</italic> = .68). There was also no significant difference in λ for the original (<italic>M</italic> = -.09) vs. reduced scales (<italic>M</italic> = .04; <italic>t</italic>(466) = -0.677, <italic>p</italic> = .50). Developmental change in patterns of estimates in <xref ref-type="bibr" rid="r42">Qin and colleagues (2024)</xref> was represented by a decrease in λ with student age, indicating that the relative degree of logarithmicity in estimates decreased as students aged. 
Similarly, in the present study, there were significant decreases in λ by grade for both the original (<italic>F</italic>(3,230) = 32.82, <italic>p</italic> &lt; .001, <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi mathvariant="normal">η</mml:mi><mml:mi mathvariant="normal">p</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> = .296) and reduced scales (<italic>F</italic>(3,230) = 32.25, <italic>p</italic> &lt; .001, <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi mathvariant="normal">η</mml:mi><mml:mi mathvariant="normal">p</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> = .300). Follow-up pairwise <italic>t</italic>-tests with Bonferroni correction for each scale showed that kindergartners had higher λ (<italic>M</italic>s = .91 and .94 for original and reduced scales, respectively) than first (<italic>M</italic>s = .59 and .61), second (<italic>M</italic>s = .48 and .50), or third graders (<italic>M</italic>s = .34 and .35; all <italic>p</italic>s &lt; .001); first graders also had significantly higher λ than third graders (<italic>p</italic>s &lt; .01). Descriptive statistics for PAE, <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula>, and <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">log</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> by grade are displayed in <xref ref-type="table" rid="t7">Table 7</xref>.</p>
<table-wrap id="t7" position="anchor" orientation="portrait">
<label>Table 7</label><caption><title>Descriptive Statistics for Original and Reduced Scale by Grade Level</title></caption>
<table frame="hsides" rules="groups" style="compact-1 striped-#f3f3f3">
<col width="22%" align="left"/>
<col width="6.5%"/>
<col width="6.5%"/>
<col width="6.5%"/>
<col width="6.5%"/>
<col width="6.5%"/>
<col width="6.5%"/>
<col width="6.5%"/>
<col width="6.5%"/>
<col width="6.5%"/>
<col width="6.5%"/>
<col width="6.5%"/>
<col width="6.5%"/>
<thead>
<tr>
<th rowspan="3" scope="rowgroup" valign="bottom" align="left">Grade</th>
<th colspan="4" scope="colgroup">PAE<hr/></th>
<th colspan="4" scope="colgroup"><inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula><hr/></th>
<th colspan="4" scope="colgroup"><inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">log</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula><hr/></th>
</tr>
<tr>
<th colspan="2" scope="colgroup">Original Scale<hr/></th>
<th colspan="2" scope="colgroup">Reduced Scale<hr/></th>
<th colspan="2" scope="colgroup">Original Scale<hr/></th>
<th colspan="2" scope="colgroup">Reduced Scale<hr/></th>
<th colspan="2" scope="colgroup">Original Scale<hr/></th>
<th colspan="2" scope="colgroup">Reduced Scale<hr/></th>
</tr>
<tr>
<th><italic>M</italic></th>
<th><italic>SD</italic></th>
<th><italic>M</italic></th>
<th><italic>SD</italic></th>
<th><italic>M</italic></th>
<th><italic>SD</italic></th>
<th><italic>M</italic></th>
<th><italic>SD</italic></th>
<th><italic>M</italic></th>
<th><italic>SD</italic></th>
<th><italic>M</italic></th>
<th><italic>SD</italic></th>
</tr>
</thead>
<tbody>
<tr>
<td>Kindergarten</td>
<td align="char" char=".">24.20%</td>
<td align="char" char=".">0.052</td>
<td align="char" char=".">28.42%</td>
<td align="char" char=".">0.081</td>
<td align="char" char=".">0.334</td>
<td align="char" char=".">0.227</td>
<td align="char" char=".">0.284</td>
<td align="char" char=".">0.207</td>
<td align="char" char=".">0.449</td>
<td align="char" char=".">0.248</td>
<td align="char" char=".">0.442</td>
<td align="char" char=".">0.247</td>
</tr>
<tr>
<td>First Grade</td>
<td align="char" char=".">15.57%</td>
<td align="char" char=".">0.057</td>
<td align="char" char=".">16.30%</td>
<td align="char" char=".">0.094</td>
<td align="char" char=".">0.689</td>
<td align="char" char=".">0.235</td>
<td align="char" char=".">0.643</td>
<td align="char" char=".">0.229</td>
<td align="char" char=".">0.714</td>
<td align="char" char=".">0.182</td>
<td align="char" char=".">0.699</td>
<td align="char" char=".">0.190</td>
</tr>
<tr>
<td>Second Grade</td>
<td align="char" char=".">12.65%</td>
<td align="char" char=".">0.045</td>
<td align="char" char=".">12.94%</td>
<td align="char" char=".">0.075</td>
<td align="char" char=".">0.786</td>
<td align="char" char=".">0.195</td>
<td align="char" char=".">0.721</td>
<td align="char" char=".">0.239</td>
<td align="char" char=".">0.737</td>
<td align="char" char=".">0.159</td>
<td align="char" char=".">0.689</td>
<td align="char" char=".">0.192</td>
</tr>
<tr>
<td>Third Grade</td>
<td align="char" char=".">8.88%</td>
<td align="char" char=".">0.027</td>
<td align="char" char=".">8.93%</td>
<td align="char" char=".">0.044</td>
<td align="char" char=".">0.879</td>
<td align="char" char=".">0.125</td>
<td align="char" char=".">0.847</td>
<td align="char" char=".">0.115</td>
<td align="char" char=".">0.796</td>
<td align="char" char=".">0.089</td>
<td align="char" char=".">0.758</td>
<td align="char" char=".">0.127</td>
</tr>
</tbody>
</table>
</table-wrap></sec>
<sec><title>Correlations Between Accuracy and Linearity of Estimates With Students’ Mathematics Achievement Scores</title>
<p>Because the original studies were conducted at different times and in different school districts, students took different achievement tests or different versions of the same tests. We therefore first <italic>z</italic>-standardized achievement test scores within grade within study so that scores could be compared across studies and across grade levels.</p>
<p>As in the original studies, we computed partial correlations of students’ PAE and <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> scores with their achievement test scores, controlling for age. Across the original studies, there was evidence of connections between both measures of number line performance and student achievement scores, though the particular grade levels and measures that yielded significant correlations varied by study. In general, though, the larger the amount of variance in estimates explained by a linear function (i.e., higher <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> scores), and the greater the accuracy of estimates (i.e., lower PAE), the higher the achievement scores. Partial correlations for <italic>z</italic>-standardized achievement test scores with PAE and <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> on the reduced scale, controlling for age, can be found in <xref ref-type="table" rid="t8">Table 8</xref>. In the original studies, all such analyses were conducted separately for each grade level; because we <italic>z</italic>-standardized the scores to account for the use of different tests that are not identically normed, here we also present the correlations for the whole sample. 
As with the original papers, with the reduced scale, <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> scores were consistently correlated with achievement test scores, such that higher <inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> scores were associated with higher achievement test scores; this was the case for the sample as a whole as well as for students within each individual grade level. Again, as in the original papers, PAE scores for the reduced scale were correlated with achievement test scores; this was the case across the whole sample, and for first and second graders in particular.</p>
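<p>The two steps described above, standardizing achievement scores within each study-by-grade group and then correlating them with a number line measure while controlling for age, can be sketched as follows. This is a minimal illustration under stated assumptions: all data values are hypothetical, the helper names (<monospace>zscore_within</monospace>, <monospace>pearson</monospace>, <monospace>partial_corr</monospace>) are our own, and the partial correlation uses the standard first-order formula for a single covariate rather than any particular statistics package.</p>
<code language="python"><![CDATA[
```python
import math
from statistics import mean, stdev


def zscore_within(values, groups):
    """z-standardize values separately within each group label."""
    out = [0.0] * len(values)
    for label in set(groups):
        idx = [i for i, g in enumerate(groups) if g == label]
        group_vals = [values[i] for i in idx]
        m, s = mean(group_vals), stdev(group_vals)
        for i in idx:
            out[i] = (values[i] - m) / s
    return out


def pearson(x, y):
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y with the linear effect of z removed."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# HYPOTHETICAL data: two study-by-grade groups with differently scaled tests.
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
achievement_raw = [410, 450, 430, 470, 52, 61, 48, 70]
r2_lin = [0.55, 0.80, 0.62, 0.91, 0.40, 0.66, 0.35, 0.78]
age = [6.1, 6.4, 6.0, 6.6, 7.2, 7.0, 7.5, 7.3]

achievement_z = zscore_within(achievement_raw, groups)
r_partial = partial_corr(r2_lin, achievement_z, age)

# With one covariate, the degrees of freedom are n - 3.
print(f"r({len(age) - 3}) = {r_partial:.3f}")
```
]]></code>
<p>Standardizing within group puts scores from differently normed tests on a common scale before pooling, which is what allows a whole-sample correlation to be reported alongside the grade-level ones.</p>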
<table-wrap id="t8" position="anchor" orientation="portrait">
<label>Table 8</label><caption><title>Partial Correlations Between Accuracy and Linearity of Number Line Estimates With Achievement Scores, Controlling for Age Using the Reduced Scale</title></caption>
<table frame="hsides" rules="groups" style="striped-#f3f3f3">
<col width="34%" align="left"/>
<col width="33%"/>
<col width="33%"/>
<thead>
<tr>
<th>Sample</th>
<th>PAE</th>
<th><inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula></th>
</tr>
</thead>
<tbody>
<tr>
<td>All Grades</td>
<td align="char" char="."><italic>r</italic>(227) = -.283, <italic>p</italic> &lt; .001</td>
<td align="char" char="."><italic>r</italic>(227) = .318, <italic>p</italic> &lt; .001</td>
</tr>
<tr>
<td>Kindergarten</td>
<td><italic>r</italic>(58) = -.154, <italic>ns</italic></td>
<td align="char" char="."><italic>r</italic>(58) = .264, <italic>p</italic> &lt; .05</td>
</tr>
<tr>
<td>First Grade</td>
<td align="char" char="."><italic>r</italic>(72) = -.411, <italic>p</italic> &lt; .001</td>
<td align="char" char="."><italic>r</italic>(72) = .406, <italic>p</italic> &lt; .001</td>
</tr>
<tr>
<td>Second Grade</td>
<td align="char" char="."><italic>r</italic>(71) = -.334, <italic>p</italic> &lt; .01</td>
<td align="char" char="."><italic>r</italic>(71) = .315, <italic>p</italic> &lt; .01</td>
</tr>
<tr>
<td>Third Grade</td>
<td><italic>r</italic>(17) = -.437, <italic>ns</italic></td>
<td align="char" char="."><italic>r</italic>(17) = .511, <italic>p</italic> &lt; .05</td>
</tr>
</tbody>
</table>
</table-wrap></sec></sec></sec>
<sec sec-type="discussion"><title>Discussion</title>
<p>Over the past two decades, the number line estimation task has been widely used as a measure of children’s numerical competence (e.g., <xref ref-type="bibr" rid="r35">Li et al., 2024</xref>; <xref ref-type="bibr" rid="r46">Ruiz et al., 2023</xref>). However, the number of trials given within the task has varied considerably, involving up to 44 or even more trials given to individual students. The present study used data from two previously published studies involving number line estimation on a 0 – 100 scale (<xref ref-type="bibr" rid="r9">Booth &amp; Siegler, 2006</xref>; <xref ref-type="bibr" rid="r54">Siegler &amp; Booth, 2004</xref>) to determine potential reduced sets of trials that could be used to produce similar results while taking up less time in data collection.</p>
<p>The current findings demonstrate that our reduced whole number line (0 – 100) scale successfully replicates the patterns of developmental findings from both prior foundational work (<xref ref-type="bibr" rid="r9">Booth &amp; Siegler, 2006</xref>; <xref ref-type="bibr" rid="r54">Siegler &amp; Booth, 2004</xref>) and recent assertions (<xref ref-type="bibr" rid="r42">Qin et al., 2024</xref>). That is, the reduced scale (15 target items) has psychometric properties comparable to those of the original scale (23 target items). We did, however, find inconsistencies in the pattern of results for second graders, discussed further below. Further, as in the seminal pieces by Siegler and Booth (<xref ref-type="bibr" rid="r9">Booth &amp; Siegler, 2006</xref>; <xref ref-type="bibr" rid="r54">Siegler &amp; Booth, 2004</xref>), the logarithmic-to-linear shift is supported with the reduced scale, with kindergartners’ estimates better fit by a logarithmic than a linear function and third graders’ estimates better fit by a linear than a logarithmic function. Analysis of individual student distributions also supports this claim. The previous work demonstrated developmental change in how children represent numerical magnitudes on a mental number line: young children (around age 5) hold a logarithmic representation of numbers in which smaller numbers (e.g., 1, 2, 3) are spaced farther apart on their mental number line, while larger numbers (e.g., 30, 40, 50) are compressed closer together. That is, they overestimate the size of smaller numbers relative to larger numbers. This reflects a non-linear representation of numerical magnitude in which equal intervals represent exponential rather than additive changes. Children’s numerical representations become more linear as they develop (~6 – 8 years old). 
That is, they shift towards holding a linear mental number line in which equal intervals represent the same magnitude as demonstrated by their placement of target numbers spaced equally and appropriately on a number line; a similar finding from <xref ref-type="bibr" rid="r42">Qin and colleagues (2024)</xref> regarding increased linearity in student estimates with age is also replicated with our reduced scale.</p>
<p>As in the prior work, first graders’ estimates were no more linear than logarithmic. However, the current findings deviated slightly from the prior work for second graders’ estimates. In the seminal work, second graders’ estimates were significantly more linear than logarithmic. In the current analysis with the reduced scale, the linear and logarithmic fits to second graders’ estimates did not differ significantly. An inspection of the <italic>R</italic><sup>2</sup> values shows that second graders’ estimates appeared more linear (<inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">lin</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> = .95) than logarithmic (<inline-formula><mml:math><mml:mstyle scriptminsize="0pt"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">log</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> = .90), but this difference was not significant. This discrepancy does not appear to be an issue of power, as the analysis included second graders across three original studies. However, it is possible that the sensitivity of the reduced scale for detecting pattern shifts in number line estimation varies by grade. That is, our reduced scale may be most accurate with respect to linear and logarithmic patterns for the grades chronologically furthest from the developmental transition (kindergartners and third graders), and the grades around the transition (e.g., second grade) may need a more extensive measure to capture the nuanced change from logarithmic to linear representations more precisely and fully.</p>
<p>Next we consider the finding that the best of the reduced 0 – 100 scales comprised 15 target items, with 12 of those items in the first half of the range of the scale. That is, the scale included the following targets: 3, 4, 6, 8, 12, 17, 21, 24, 25, 29, 33, 39, 52, 57, and 81. This is likely due to the composition of the sample in relation to their developmental stage of numerical magnitude understanding. That is, the majority of the sample held a logarithmic representation of numerical magnitude, so a larger number of target items in the first half of the scale is likely better able to capture the shape of the curve in that part of the scale. As mentioned previously, it is likely that the 0 – 100 scale employed in the present study is not the optimal way to capture the numerical competence of the older students in this study (second and third grade students), and that their performance on a 0 – 1,000 scale may be more useful. However, when it is desirable to assess the 0 – 100 number line estimation skills of these older students, who have more accurate or linear representations of the scale, researchers may need even fewer target items to capture their magnitude understanding in this range. Future work with older samples may wish to investigate further reductions. Following this logic, it is possible that reduction procedures for larger scales (e.g., 0 – 1,000, 0 – 10,000) may follow the same patterns with older children. For example, the logarithmic-to-linear shift that occurs within the 0 – 1,000 scale happens at the age of 8 – 10 (<xref ref-type="bibr" rid="r9">Booth &amp; Siegler, 2006</xref>; <xref ref-type="bibr" rid="r54">Siegler &amp; Booth, 2004</xref>). The logarithmic-to-linear shift that occurs within the 0 – 10,000 scale happens around the age of 10 – 12 (<xref ref-type="bibr" rid="r55">Siegler et al., 2011</xref>). 
It is possible that reduced scales for these ranges would rely more heavily on targets in the 0 – 500 range (for the 0 – 1,000 scale) and in the 0 – 5,000 range (for the 0 – 10,000 scale) if studying 8- to 10-year-olds and 10- to 12-year-olds, respectively. Alternatively, it is possible that there is already bias reflected in the design of the original scale due to the intentional oversampling of the first half of the scale. That is, because there are more target items in the first half of the number line, it may be easier to replicate a logarithmic curve with a reduced set of target items sampled from this same distribution. To determine whether this possibility is a better explanation for the perceived accuracy of the reduced scale, a set of targets that are equally representative of the entirety of the scale may be necessary. Though this is a possibility, and more research with a wider variety of targets may serve to support or challenge prior work, we note the findings of <xref ref-type="bibr" rid="r25">Gashaj et al. (2016)</xref>, who used an approximately equal number of target items above and below 50 on a 0 – 100 scale and found that kindergartners’ numerical magnitude estimates on a 0 – 100 number line were more logarithmic than linear. Still, further research using a more even distribution of target items across scales at different grades or ages may clarify whether potential biases exist in the original scale.</p>
<p>We also employed rigorous psychometric methods to evaluate and refine the scale, addressing limitations in the original scale's development. We utilized the graded response model (GRM), which is specifically designed for analyzing ordered polytomous data such as Likert-style items (<xref ref-type="bibr" rid="r48">Samejima, 2016</xref>). This approach allowed us to examine item-level properties and overall scale performance with greater precision than classical test theory methods alone.</p>
<p>Our analyses also included exploratory factor analysis (EFA) with scree plot examination, providing insights into the scale's dimensionality (<xref ref-type="bibr" rid="r13">Cattell, 1966</xref>). We complemented this with parallel analysis, which is considered one of the most accurate methods for determining factor retention (<xref ref-type="bibr" rid="r32">Horn, 1965</xref>). Using multiple criteria to assess dimensionality, including eigenvalue ratios and explained variance, strengthened our conclusions about the scale's structure. We assessed reliability with both Cronbach's alpha and McDonald's omega, which gave a more complete picture of internal consistency (<xref ref-type="bibr" rid="r20">Dunn et al., 2014</xref>). These methodological strengths, combined with our systematic approach to scale reduction, provide a solid foundation for confidence in our findings. The refined scale not only maintains the psychometric integrity of the original but also potentially improves its unidimensionality and efficiency, addressing the limitations of the original scale's atheoretical development.</p>
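<p>As a minimal sketch of one of the reliability indices named above, the Python code below computes Cronbach's alpha from a respondents-by-items score matrix using the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The scores are hypothetical toy values invented for illustration, not study data, and the function name <monospace>cronbach_alpha</monospace> is an assumption of this example. McDonald's omega requires fitting a factor model and is not shown.</p>
<code language="python"><![CDATA[
```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix with rows = respondents, columns = items."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical ordered-category scores (0-3) for five children on four
# items; made up for illustration, not taken from the study.
toy_scores = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [3, 2, 3, 3],
    [1, 0, 1, 1],
]

alpha = cronbach_alpha(toy_scores)
print(f"alpha = {alpha:.2f}")
```
]]></code>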
<p>Despite these strengths, the current study has limitations that future research can address. First, we note some methodological limitations. Our factor analysis was exploratory in nature. Future studies could use confirmatory factor analysis (CFA) to test the factor structure suggested by our exploratory analyses more rigorously, which could strengthen the evidence for the construct validity of the refined scale. Moreover, the assumptions of unidimensionality and local independence, which are fundamental to the graded response model used in this study, warrant further examination. Future research should investigate how well these assumptions hold across different contexts and populations, as violations of these assumptions could impact the accuracy of item parameter estimates and overall scale performance.</p>
<p>Further, while we believe that the reduced scale sufficiently captures variance in students’ number line estimation skills, it is not certain that students would have made the same estimations on the retained target items had they been given only those items. That is, their estimations on the retained items may have been influenced by their experience with the other, non-retained target items. Future work would need to directly compare children’s estimations when given only the reduced scale versus the full scale to determine whether the item context affects estimates. Additionally, our sample (<italic>N</italic> = 234) was drawn from three studies by the same group, which may limit external generalizability. We note, however, that the present work primarily targets scale reduction and item functioning (discrimination, thresholds) rather than population-level generalization; nonetheless, future validation in independent, diverse samples is warranted. We also had far fewer third graders in our analytic sample than students in earlier grades. Future work in this area should attempt to recruit a more even distribution of participants per grade.</p>
<p>We also consider the age of the data itself. The data used in the current study originate from studies conducted as early as 2004. Some of the trends we observed in the logarithmic-to-linear shift may now occur at a different age or grade level, as schools have taken steps to include more number line activities in their standard mathematical practices in second grade (e.g., <xref ref-type="bibr" rid="r69">National Governors Association Center for Best Practices &amp; Council of Chief State School Officers [NGA &amp; CCSSO], 2010</xref>). More recent number line work with the 0 – 100 scale has replicated the same pattern but has argued that the pattern is more reflective of skill at mapping numbers to space (e.g., <xref ref-type="bibr" rid="r15">Cohen &amp; Quinlan, 2018</xref>; <xref ref-type="bibr" rid="r29">Haman &amp; Patro, 2022</xref>; <xref ref-type="bibr" rid="r50">Sasanguie et al., 2016</xref>). Indeed, most recent work using the 0 – 100 NLE task either emphasizes accuracy (e.g., PAE) rather than linearity (e.g., <xref ref-type="bibr" rid="r47">Ruiz et al., 2024</xref>) or evaluates linearity without considering the logarithmic function (e.g., <xref ref-type="bibr" rid="r35">Li et al., 2024</xref>). Thus, using the NLE task to compare patterns of estimation may be a less central goal now than it once was. Researchers who maintain the goal of investigating transitions in estimation patterns may need to confirm that the reduced scale we derived in the current study yields the same or similar findings with children currently in third grade or below.</p>
<p>It is also important to note that although discrimination might differ by grade, our per-grade subsamples are too small to support stable multi-group GRM estimation; small group sizes in polytomous IRT can lead to imprecise discrimination and threshold estimates, so we report a single-group calibration and reserve formal grade-level invariance tests for future work with larger, grade-balanced samples.</p>
<p>Finally, it is worth noting that the reduced scale consists largely of items in the lower half of the numerical scale. When the original task was designed, numbers were oversampled from the lower third of the scale in order to capture a potential logarithmic-to-linear shift (e.g., <xref ref-type="bibr" rid="r54">Siegler &amp; Booth, 2004</xref>). Oversampling in the case of the 0 – 100 number line meant including 10 items between 0 – 30 and 14 items between 31 – 100. In contrast, our reduced scale retained 10 items between 0 – 30 and only 5 items between 31 – 100. Thus, oversampling of the low end of the scale is indeed important and may in fact be even more critical than originally anticipated. That is, accurately capturing variance in young students’ number line estimations appears to require an even greater proportion of numbers at the low end and many fewer items in the top two-thirds of the scale.</p>
<sec><title>Practical Implications</title>
<p>We consider the practical implications of this work. Magnitude understanding is one of the most studied mathematical competencies in both psychology and education, and the number line task is accordingly one of the most frequently administered assessments in this domain. Though some researchers use a paper-and-pencil task, as was done in the foundational work (e.g., <xref ref-type="bibr" rid="r9">Booth &amp; Siegler, 2006</xref>), much of the field is moving towards iPad or Chromebook administration. Administration is typically conducted one-on-one with a researcher. Math instructional time makes up approximately 12% of U.S. students’ school days (with variations across states and districts; <xref ref-type="bibr" rid="r37">Mullis et al., 2016</xref>). Many schools, and U.S. schools in particular, are hesitant to interrupt children’s school day for research activities that they see as taking away from instructional time. Although intervention-based work can be seen as instructional time and may be welcomed by many school administrators and practitioners, the assessments that accompany such classroom-based research are typically not viewed as “instructional practices”. Because schools actively safeguard instructional time, researchers conducting classroom-based assessments must be cognizant of the time their assessments take. The current scale reduction study takes important strides towards addressing these concerns in a way that is both practical and methodologically rigorous and that aligns with the developmental changes that occur in the children the scale is designed to assess.</p></sec></sec>
</body>
<back>
<ref-list><title>References</title>
<ref id="r1"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Ashkenazi</surname>, <given-names>S.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Cohen</surname>, <given-names>N.</given-names></string-name></person-group> (<year>2023</year>). <article-title>Developmental trajectories of number line estimations in math anxiety: Evidence from bounded and unbounded number line estimation.</article-title> <source>Applied Cognitive Psychology</source>, <volume>37</volume>(<issue>6</issue>), <fpage>1316</fpage>–<lpage>1327</lpage>. <pub-id pub-id-type="doi">10.1002/acp.4125</pub-id></mixed-citation></ref>
<ref id="r2"><mixed-citation publication-type="book">Baker, F. B. (2001). <italic>The basics of item response theory</italic>. ERIC Clearinghouse on Assessment and Evaluation.</mixed-citation></ref>
<ref id="r3"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Barbieri</surname>, <given-names>C. A.</given-names></string-name>, <string-name name-style="western"><surname>Miller-Cotto</surname>, <given-names>D.</given-names></string-name>, <string-name name-style="western"><surname>Clerjuste</surname>, <given-names>S. N.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Chawla</surname>, <given-names>K.</given-names></string-name></person-group> (<year>2023</year>). <article-title>A meta-analysis of the worked examples effect on mathematics performance.</article-title> <source>Educational Psychology Review</source>, <volume>35</volume>(<issue>1</issue>), <elocation-id>11</elocation-id>. <pub-id pub-id-type="doi">10.1007/s10648-023-09745-1</pub-id></mixed-citation></ref>
<ref id="r4"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Barbieri</surname>, <given-names>C. A.</given-names></string-name>, <string-name name-style="western"><surname>Young</surname>, <given-names>L. K.</given-names></string-name>, <string-name name-style="western"><surname>Newton</surname>, <given-names>K. J.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Booth</surname>, <given-names>J. L.</given-names></string-name></person-group> (<year>2021</year>). <article-title>Predicting middle school profiles of algebra performance using fraction knowledge.</article-title> <source>Child Development</source>, <volume>92</volume>(<issue>5</issue>), <fpage>1984</fpage>–<lpage>2005</lpage>. <pub-id pub-id-type="doi">10.1111/cdev.13568</pub-id><pub-id pub-id-type="pmid">33929044</pub-id></mixed-citation></ref>
<ref id="r5"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Barth</surname>, <given-names>H. C.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Paladino</surname>, <given-names>A. M.</given-names></string-name></person-group> (<year>2011</year>). <article-title>The development of numerical estimation: Evidence against a representational shift.</article-title> <source>Developmental Science</source>, <volume>14</volume>(<issue>1</issue>), <fpage>125</fpage>–<lpage>135</lpage>. <pub-id pub-id-type="doi">10.1111/j.1467-7687.2010.00962.x</pub-id><pub-id pub-id-type="pmid">21159094</pub-id></mixed-citation></ref>
<ref id="r6"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Berteletti</surname>, <given-names>I.</given-names></string-name>, <string-name name-style="western"><surname>Man</surname>, <given-names>G.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Booth</surname>, <given-names>J. R.</given-names></string-name></person-group> (<year>2015</year>). <article-title>How number line estimation skills relate to neural activations in single digit subtraction problems.</article-title> <source>NeuroImage</source>, <volume>107</volume>, <fpage>198</fpage>–<lpage>206</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2014.12.011</pub-id><pub-id pub-id-type="pmid">25497398</pub-id></mixed-citation></ref>
<ref id="r7"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Booth</surname>, <given-names>J. L.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Newton</surname>, <given-names>K. J.</given-names></string-name></person-group> (<year>2012</year>). <article-title>Fractions: Could they really be the gatekeeper’s doorman?</article-title> <source>Contemporary Educational Psychology</source>, <volume>37</volume>(<issue>4</issue>), <fpage>247</fpage>–<lpage>253</lpage>. <pub-id pub-id-type="doi">10.1016/j.cedpsych.2012.07.001</pub-id></mixed-citation></ref>
<ref id="r8"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Booth</surname>, <given-names>J. L.</given-names></string-name>, <string-name name-style="western"><surname>Newton</surname>, <given-names>K. J.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Twiss-Garrity</surname>, <given-names>L. K.</given-names></string-name></person-group> (<year>2014</year>). <article-title>The impact of fraction magnitude knowledge on algebra performance and learning.</article-title> <source>Journal of Experimental Child Psychology</source>, <volume>118</volume>, <fpage>110</fpage>–<lpage>118</lpage>. <pub-id pub-id-type="doi">10.1016/j.jecp.2013.09.001</pub-id><pub-id pub-id-type="pmid">24124868</pub-id></mixed-citation></ref>
<ref id="r9"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Booth</surname>, <given-names>J. L.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Siegler</surname>, <given-names>R. S.</given-names></string-name></person-group> (<year>2006</year>). <article-title>Developmental and individual differences in pure numerical estimation.</article-title> <source>Developmental Psychology</source>, <volume>42</volume>(<issue>1</issue>), <fpage>189</fpage>–<lpage>201</lpage>. <pub-id pub-id-type="doi">10.1037/0012-1649.41.6.189</pub-id><pub-id pub-id-type="pmid">16420128</pub-id></mixed-citation></ref>
<ref id="r10"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Booth</surname>, <given-names>J. L.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Siegler</surname>, <given-names>R. S.</given-names></string-name></person-group> (<year>2008</year>). <article-title>Numerical magnitude representations influence arithmetic learning.</article-title> <source>Child Development</source>, <volume>79</volume>(<issue>4</issue>), <fpage>1016</fpage>–<lpage>1031</lpage>. <pub-id pub-id-type="doi">10.1111/j.1467-8624.2008.01173.x</pub-id><pub-id pub-id-type="pmid">18717904</pub-id></mixed-citation></ref>
<ref id="r11"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Brez</surname>, <given-names>C. C.</given-names></string-name>, <string-name name-style="western"><surname>Miller</surname>, <given-names>A. D.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Ramirez</surname>, <given-names>E. M.</given-names></string-name></person-group> (<year>2016</year>). <article-title>Numerical estimation in children for both positive and negative numbers.</article-title> <source>Journal of Cognition and Development</source>, <volume>17</volume>(<issue>2</issue>), <fpage>341</fpage>–<lpage>358</lpage>. <pub-id pub-id-type="doi">10.1080/15248372.2015.1033525</pub-id></mixed-citation></ref>
<ref id="r12"><mixed-citation publication-type="book">Brown, T. A. (2015). <italic>Confirmatory factor analysis for applied research</italic> (2nd ed.). The Guilford Press.</mixed-citation></ref>
<ref id="r13"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Cattell</surname>, <given-names>R. B.</given-names></string-name></person-group> (<year>1966</year>). <article-title>The scree test for the number of factors.</article-title> <source>Multivariate Behavioral Research</source>, <volume>1</volume>(<issue>2</issue>), <fpage>245</fpage>–<lpage>276</lpage>. <pub-id pub-id-type="doi">10.1207/s15327906mbr0102_10</pub-id><pub-id pub-id-type="pmid">26828106</pub-id></mixed-citation></ref>
<ref id="r14"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Chan</surname>, <given-names>J. Y.-C.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Mazzocco</surname>, <given-names>M. M.</given-names></string-name></person-group> (<year>2024</year>). <article-title>New measures of number line estimation performance reveal children’s ordinal understanding of numbers.</article-title> <source>Journal of Experimental Child Psychology</source>, <volume>245</volume>, <elocation-id>105965</elocation-id>. <pub-id pub-id-type="doi">10.1016/j.jecp.2024.105965</pub-id><pub-id pub-id-type="pmid">38823358</pub-id></mixed-citation></ref>
<ref id="r15"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Cohen</surname>, <given-names>D. J.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Quinlan</surname>, <given-names>P. T.</given-names></string-name></person-group> (<year>2018</year>). <article-title>The log–linear response function of the bounded number-line task is unrelated to the psychological representation of quantity.</article-title> <source>Psychonomic Bulletin &amp; Review</source>, <volume>25</volume>(<issue>1</issue>), <fpage>447</fpage>–<lpage>454</lpage>. <pub-id pub-id-type="doi">10.3758/s13423-017-1290-z</pub-id><pub-id pub-id-type="pmid">28429176</pub-id></mixed-citation></ref>
<ref id="r16"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Cornu</surname>, <given-names>V.</given-names></string-name>, <string-name name-style="western"><surname>Hornung</surname>, <given-names>C.</given-names></string-name>, <string-name name-style="western"><surname>Schiltz</surname>, <given-names>C.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Martin</surname>, <given-names>R.</given-names></string-name></person-group> (<year>2017</year>). <article-title>How do different aspects of spatial skills relate to early arithmetic and number line estimation?</article-title> <source>Journal of Numerical Cognition</source>, <volume>3</volume>(<issue>2</issue>), <fpage>309</fpage>–<lpage>343</lpage>. <pub-id pub-id-type="doi">10.5964/jnc.v3i2.36</pub-id></mixed-citation></ref>
<ref id="r17"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Dehaene</surname>, <given-names>S.</given-names></string-name>, <string-name name-style="western"><surname>Izard</surname>, <given-names>V.</given-names></string-name>, <string-name name-style="western"><surname>Spelke</surname>, <given-names>E.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Pica</surname>, <given-names>P.</given-names></string-name></person-group> (<year>2008</year>). <article-title>Log or linear? Distinct intuitions of the number scale in Western and Amazonian indigene cultures.</article-title> <source>Science</source>, <volume>320</volume>(<issue>5880</issue>), <fpage>1217</fpage>–<lpage>1220</lpage>. <pub-id pub-id-type="doi">10.1126/science.1156540</pub-id><pub-id pub-id-type="pmid">18511690</pub-id></mixed-citation></ref>
<ref id="r18"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>DeWolf</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Bassok</surname>, <given-names>M.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Holyoak</surname>, <given-names>K. J.</given-names></string-name></person-group> (<year>2015</year>). <article-title>From rational numbers to algebra: Separable contributions of decimal magnitude and relational understanding of fractions.</article-title> <source>Journal of Experimental Child Psychology</source>, <volume>133</volume>, <fpage>72</fpage>–<lpage>84</lpage>. <pub-id pub-id-type="doi">10.1016/j.jecp.2015.01.013</pub-id><pub-id pub-id-type="pmid">25744594</pub-id></mixed-citation></ref>
<ref id="r19"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Dietrich</surname>, <given-names>J. F.</given-names></string-name>, <string-name name-style="western"><surname>Huber</surname>, <given-names>S.</given-names></string-name>, <string-name name-style="western"><surname>Dackermann</surname>, <given-names>T.</given-names></string-name>, <string-name name-style="western"><surname>Moeller</surname>, <given-names>K.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Fischer</surname>, <given-names>U.</given-names></string-name></person-group> (<year>2016</year>). <article-title>Place‐value understanding in number line estimation predicts future arithmetic performance.</article-title> <source>The British Journal of Developmental Psychology</source>, <volume>34</volume>(<issue>4</issue>), <fpage>502</fpage>–<lpage>517</lpage>. <pub-id pub-id-type="doi">10.1111/bjdp.12146</pub-id><pub-id pub-id-type="pmid">27136923</pub-id></mixed-citation></ref>
<ref id="r20"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Dunn</surname>, <given-names>T. J.</given-names></string-name>, <string-name name-style="western"><surname>Baguley</surname>, <given-names>T.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Brunsden</surname>, <given-names>V.</given-names></string-name></person-group> (<year>2014</year>). <article-title>From alpha to omega: A practical solution to the pervasive problem of internal consistency estimation.</article-title> <source>British Journal of Psychology</source>, <volume>105</volume>(<issue>3</issue>), <fpage>399</fpage>–<lpage>412</lpage>. <pub-id pub-id-type="doi">10.1111/bjop.12046</pub-id><pub-id pub-id-type="pmid">24844115</pub-id></mixed-citation></ref>
<ref id="r21"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Ellis</surname>, <given-names>A.</given-names></string-name>, <string-name name-style="western"><surname>Susperreguy</surname>, <given-names>M. I.</given-names></string-name>, <string-name name-style="western"><surname>Purpura</surname>, <given-names>D. J.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Davis-Kean</surname>, <given-names>P. E.</given-names></string-name></person-group> (<year>2021</year>). <article-title>Conceptual replication and extension of the relation between the number line estimation task and mathematical competence across seven studies.</article-title> <source>Journal of Numerical Cognition</source>, <volume>7</volume>(<issue>3</issue>), <fpage>435</fpage>–<lpage>452</lpage>. <pub-id pub-id-type="doi">10.5964/jnc.7033</pub-id></mixed-citation></ref>
<ref id="r22"><mixed-citation publication-type="book">Embretson, S. E., &amp; Reise, S. P. (2000). <italic>Item response theory for psychologists</italic>. Psychology Press.</mixed-citation></ref>
<ref id="r23"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Fazio</surname>, <given-names>L. K.</given-names></string-name>, <string-name name-style="western"><surname>Bailey</surname>, <given-names>D. H.</given-names></string-name>, <string-name name-style="western"><surname>Thompson</surname>, <given-names>C. A.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Siegler</surname>, <given-names>R. S.</given-names></string-name></person-group> (<year>2014</year>). <article-title>Relations of different types of numerical magnitude representations to each other and to mathematics achievement.</article-title> <source>Journal of Experimental Child Psychology</source>, <volume>123</volume>, <fpage>53</fpage>–<lpage>72</lpage>. <pub-id pub-id-type="doi">10.1016/j.jecp.2014.01.013</pub-id><pub-id pub-id-type="pmid">24699178</pub-id></mixed-citation></ref>
<ref id="r24"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Fitzsimmons</surname>, <given-names>C. J.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Thompson</surname>, <given-names>C. A.</given-names></string-name></person-group> (<year>2022</year>). <article-title>Developmental differences in monitoring accuracy and cue use when estimating whole-number and fraction magnitudes.</article-title> <source>Cognitive Development</source>, <volume>61</volume>, <elocation-id>101148</elocation-id>. <pub-id pub-id-type="doi">10.1016/j.cogdev.2021.101148</pub-id></mixed-citation></ref>
<ref id="r25"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Gashaj</surname>, <given-names>V.</given-names></string-name>, <string-name name-style="western"><surname>Uehlinger</surname>, <given-names>Y.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Roebers</surname>, <given-names>C. M.</given-names></string-name></person-group> (<year>2016</year>). <article-title>Numerical magnitude skills in 6-years-old children: Exploring specific associations with components of executive function.</article-title> <source>Journal of Educational and Developmental Psychology</source>, <volume>6</volume>(<issue>1</issue>), <fpage>157</fpage>–<lpage>172</lpage>. <pub-id pub-id-type="doi">10.5539/jedp.v6n1p157</pub-id></mixed-citation></ref>
<ref id="r26"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Geary</surname>, <given-names>D. C.</given-names></string-name>, <string-name name-style="western"><surname>Hoard</surname>, <given-names>M. K.</given-names></string-name>, <string-name name-style="western"><surname>Nugent</surname>, <given-names>L.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Bailey</surname>, <given-names>D. H.</given-names></string-name></person-group> (<year>2012</year>). <article-title>Mathematical cognition deficits in children with learning disabilities and persistent low achievement: A five-year prospective study.</article-title> <source>Journal of Educational Psychology</source>, <volume>104</volume>(<issue>1</issue>), <fpage>206</fpage>–<lpage>223</lpage>. <pub-id pub-id-type="doi">10.1037/a0025398</pub-id><pub-id pub-id-type="pmid">27158154</pub-id></mixed-citation></ref>
<ref id="r27"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Geary</surname>, <given-names>D. C.</given-names></string-name>, <string-name name-style="western"><surname>Hoard</surname>, <given-names>M. K.</given-names></string-name>, <string-name name-style="western"><surname>Nugent</surname>, <given-names>L.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Byrd-Craven</surname>, <given-names>J.</given-names></string-name></person-group> (<year>2008</year>). <article-title>Development of number line representations in children with mathematical learning disability.</article-title> <source>Developmental Neuropsychology</source>, <volume>33</volume>(<issue>3</issue>), <fpage>277</fpage>–<lpage>299</lpage>. <pub-id pub-id-type="doi">10.1080/87565640801982361</pub-id><pub-id pub-id-type="pmid">18473200</pub-id></mixed-citation></ref>
<ref id="r28"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Gunderson</surname>, <given-names>E. A.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Hildebrand</surname>, <given-names>L.</given-names></string-name></person-group> (<year>2021</year>). <article-title>Relations among spatial skills, number line estimation, and exact and approximate calculation in young children.</article-title> <source>Journal of Experimental Child Psychology</source>, <volume>212</volume>, <elocation-id>105251</elocation-id>. <pub-id pub-id-type="doi">10.1016/j.jecp.2021.105251</pub-id><pub-id pub-id-type="pmid">34333360</pub-id></mixed-citation></ref>
<ref id="r29"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Haman</surname>, <given-names>M.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Patro</surname>, <given-names>K.</given-names></string-name></person-group> (<year>2022</year>). <article-title>More linear than log? Non-symbolic number-line estimation in 3- to 5-year-old children.</article-title> <source>Frontiers in Psychology</source>, <volume>13</volume>, <elocation-id>1003696</elocation-id>. <pub-id pub-id-type="doi">10.3389/fpsyg.2022.1003696</pub-id><pub-id pub-id-type="pmid">36389566</pub-id></mixed-citation></ref>
<ref id="r30"><mixed-citation publication-type="book">Hambleton, R. K., Swaminathan, H., &amp; Rogers, H. J. (1991). <italic>Fundamentals of item response theory.</italic> SAGE.</mixed-citation></ref>
<ref id="r31"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Hoard</surname>, <given-names>M. K.</given-names></string-name>, <string-name name-style="western"><surname>Geary</surname>, <given-names>D. C.</given-names></string-name>, <string-name name-style="western"><surname>Byrd-Craven</surname>, <given-names>J.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Nugent</surname>, <given-names>L.</given-names></string-name></person-group> (<year>2008</year>). <article-title>Mathematical cognition in intellectually precocious first graders.</article-title> <source>Developmental Neuropsychology</source>, <volume>33</volume>(<issue>3</issue>), <fpage>251</fpage>–<lpage>276</lpage>. <pub-id pub-id-type="doi">10.1080/87565640801982338</pub-id><pub-id pub-id-type="pmid">18473199</pub-id></mixed-citation></ref>
<ref id="r32"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Horn</surname>, <given-names>J. L.</given-names></string-name></person-group> (<year>1965</year>). <article-title>A rationale and test for the number of factors in factor analysis.</article-title> <source>Psychometrika</source>, <volume>30</volume>(<issue>2</issue>), <fpage>179</fpage>–<lpage>185</lpage>. <pub-id pub-id-type="doi">10.1007/BF02289447</pub-id><pub-id pub-id-type="pmid">14306381</pub-id></mixed-citation></ref>
<ref id="r33"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Jung</surname>, <given-names>S.</given-names></string-name>, <string-name name-style="western"><surname>Roesch</surname>, <given-names>S.</given-names></string-name>, <string-name name-style="western"><surname>Klein</surname>, <given-names>E.</given-names></string-name>, <string-name name-style="western"><surname>Dackermann</surname>, <given-names>T.</given-names></string-name>, <string-name name-style="western"><surname>Heller</surname>, <given-names>J.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Moeller</surname>, <given-names>K.</given-names></string-name></person-group> (<year>2020</year>). <article-title>The strategy matters: Bounded and unbounded number line estimation in secondary school children.</article-title> <source>Cognitive Development</source>, <volume>53</volume>, <elocation-id>100839</elocation-id>. <pub-id pub-id-type="doi">10.1016/j.cogdev.2019.100839</pub-id></mixed-citation></ref>
<ref id="r34"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Landy</surname>, <given-names>D.</given-names></string-name>, <string-name name-style="western"><surname>Charlesworth</surname>, <given-names>A.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Ottmar</surname>, <given-names>E.</given-names></string-name></person-group> (<year>2017</year>). <article-title>Categories of large numbers in line estimation.</article-title> <source>Cognitive Science</source>, <volume>41</volume>(<issue>2</issue>), <fpage>326</fpage>–<lpage>353</lpage>. <pub-id pub-id-type="doi">10.1111/cogs.12342</pub-id><pub-id pub-id-type="pmid">26888051</pub-id></mixed-citation></ref>
<ref id="r35"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Li</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Yang</surname>, <given-names>J.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Ye</surname>, <given-names>X.</given-names></string-name></person-group> (<year>2024</year>). <article-title>Children’s number line estimation strategies: Evidence from bounded and unbounded number line estimation tasks.</article-title> <source>Frontiers in Psychology</source>, <volume>15</volume>, <elocation-id>1421821</elocation-id>. <pub-id pub-id-type="doi">10.3389/fpsyg.2024.1421821</pub-id><pub-id pub-id-type="pmid">39575331</pub-id></mixed-citation></ref>
<ref id="r36"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Link</surname>, <given-names>T.</given-names></string-name>, <string-name name-style="western"><surname>Huber</surname>, <given-names>S.</given-names></string-name>, <string-name name-style="western"><surname>Nuerk</surname>, <given-names>H.-C.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Moeller</surname>, <given-names>K.</given-names></string-name></person-group> (<year>2014</year>). <article-title>Unbounding the mental number line—New evidence on children’s spatial representation of numbers.</article-title> <source>Frontiers in Psychology</source>, <volume>4</volume>, <elocation-id>1021</elocation-id>. <pub-id pub-id-type="doi">10.3389/fpsyg.2013.01021</pub-id><pub-id pub-id-type="pmid">24478734</pub-id></mixed-citation></ref>
<ref id="r37"><mixed-citation publication-type="book">Mullis, I. V. S., Martin, M. O., Foy, P., &amp; Hooper, M. (2016). <italic>TIMSS 2015 International Results in Mathematics</italic>. Chestnut Hill, MA, USA: TIMSS &amp; PIRLS International Study Center.</mixed-citation></ref>
<ref id="r38"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Namkung</surname>, <given-names>J. M.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Fuchs</surname>, <given-names>L. S.</given-names></string-name></person-group> (<year>2016</year>). <article-title>Cognitive predictors of calculations and number line estimation with whole numbers and fractions among at-risk students.</article-title> <source>Journal of Educational Psychology</source>, <volume>108</volume>(<issue>2</issue>), <fpage>214</fpage>–<lpage>228</lpage>. <pub-id pub-id-type="doi">10.1037/edu0000055</pub-id><pub-id pub-id-type="pmid">26955188</pub-id></mixed-citation></ref>
<ref id="r69"><mixed-citation publication-type="web">National Governors Association Center for Best Practices &amp; Council of Chief State School Officers. (2010). <italic>Common Core State Standards.</italic> <ext-link ext-link-type="uri" xlink:href="https://corestandards.org/">https://corestandards.org</ext-link></mixed-citation></ref>
<ref id="r39"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Nuraydin</surname>, <given-names>S.</given-names></string-name>, <string-name name-style="western"><surname>Stricker</surname>, <given-names>J.</given-names></string-name>, <string-name name-style="western"><surname>Ugen</surname>, <given-names>S.</given-names></string-name>, <string-name name-style="western"><surname>Martin</surname>, <given-names>R.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Schneider</surname>, <given-names>M.</given-names></string-name></person-group> (<year>2023</year>). <article-title>The number line estimation task is a valid tool for assessing mathematical achievement: A population level study with 6484 Luxembourgish ninth-graders.</article-title> <source>Journal of Experimental Child Psychology</source>, <volume>225</volume>, <elocation-id>105521</elocation-id>. <pub-id pub-id-type="doi">10.1016/j.jecp.2022.105521</pub-id><pub-id pub-id-type="pmid">35973280</pub-id></mixed-citation></ref>
<ref id="r40"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Peeters</surname>, <given-names>D.</given-names></string-name>, <string-name name-style="western"><surname>Sekeris</surname>, <given-names>E.</given-names></string-name>, <string-name name-style="western"><surname>Verschaffel</surname>, <given-names>L.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Luwel</surname>, <given-names>K.</given-names></string-name></person-group> (<year>2017</year>). <article-title>Evaluating the effect of labeled benchmarks on children’s number line estimation performance and strategy use.</article-title> <source>Frontiers in Psychology</source>, <volume>8</volume>, <elocation-id>1082</elocation-id>. <pub-id pub-id-type="doi">10.3389/fpsyg.2017.01082</pub-id><pub-id pub-id-type="pmid">28713302</pub-id></mixed-citation></ref>
<ref id="r41"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Praet</surname>, <given-names>M.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Desoete</surname>, <given-names>A.</given-names></string-name></person-group> (<year>2014</year>). <article-title>Number line estimation from kindergarten to Grade 2: A longitudinal study.</article-title> <source>Learning and Instruction</source>, <volume>33</volume>, <fpage>19</fpage>–<lpage>28</lpage>. <pub-id pub-id-type="doi">10.1016/j.learninstruc.2014.02.003</pub-id></mixed-citation></ref>
<ref id="r42"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Qin</surname>, <given-names>J.</given-names></string-name>, <string-name name-style="western"><surname>Kim</surname>, <given-names>D.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Opfer</surname>, <given-names>J. E.</given-names></string-name></person-group> (<year>2024</year>). <article-title>Varieties of number-line estimation: Systematic review, models, and data.</article-title> <source>Developmental Review</source>, <volume>74</volume>, <elocation-id>101161</elocation-id>. <pub-id pub-id-type="doi">10.1016/j.dr.2024.101161</pub-id></mixed-citation></ref>
<ref id="r43"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Reinert</surname>, <given-names>R. M.</given-names></string-name>, <string-name name-style="western"><surname>Hartmann</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Huber</surname>, <given-names>S.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Moeller</surname>, <given-names>K.</given-names></string-name></person-group> (<year>2019</year>). <article-title>Unbounded number line estimation as a measure of numerical estimation.</article-title> <source>PLoS One</source>, <volume>14</volume>(<issue>3</issue>), <elocation-id>e0213102</elocation-id>. <pub-id pub-id-type="doi">10.1371/journal.pone.0213102</pub-id><pub-id pub-id-type="pmid">30870436</pub-id></mixed-citation></ref>
<ref id="r44"><mixed-citation publication-type="web">Revelle, W. (2021). <italic>psych: Procedures for psychological, psychometric, and personality research</italic> (R package version 1.9.12) [Computer software]. Northwestern University, Evanston, Illinois. <ext-link ext-link-type="uri" xlink:href="https://cran.r-project.org/web/packages/psych/index.html">https://cran.r-project.org/web/packages/psych/index.html</ext-link></mixed-citation></ref>
<ref id="r45"><mixed-citation publication-type="web">Rizopoulos, M. D. (2018). <italic>Package ‘ltm’</italic> [Computer software]. <ext-link ext-link-type="uri" xlink:href="http://wiki.r-project.org/rwiki/doku.php">http://wiki.r-project.org/rwiki/doku.php</ext-link></mixed-citation></ref>
<ref id="r46"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Ruiz</surname>, <given-names>C.</given-names></string-name>, <string-name name-style="western"><surname>Kohnen</surname>, <given-names>S.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Bull</surname>, <given-names>R.</given-names></string-name></person-group> (<year>2023</year>). <article-title>Number line estimation patterns and their relationship with mathematical performance.</article-title> <source>Journal of Numerical Cognition</source>, <volume>9</volume>(<issue>2</issue>), <fpage>285</fpage>–<lpage>301</lpage>. <pub-id pub-id-type="doi">10.5964/jnc.10557</pub-id></mixed-citation></ref>
<ref id="r47"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Ruiz</surname>, <given-names>C.</given-names></string-name>, <string-name name-style="western"><surname>Kohnen</surname>, <given-names>S.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Bull</surname>, <given-names>R.</given-names></string-name></person-group> (<year>2024</year>). <article-title>The relationship between number line estimation and mathematical reasoning: A quantile regression approach.</article-title> <source>European Journal of Psychology of Education</source>, <volume>39</volume>(<issue>2</issue>), <fpage>581</fpage>–<lpage>606</lpage>. <pub-id pub-id-type="doi">10.1007/s10212-023-00708-2</pub-id></mixed-citation></ref>
<ref id="r48"><mixed-citation publication-type="book">Samejima, F. (2016). Graded response models. In W. J. van der Linden (Ed.), <italic>Handbook of item response theory</italic> (pp. 95–107). Chapman and Hall/CRC.</mixed-citation></ref>
<ref id="r49"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Sarı</surname>, <given-names>M. H.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Olkun</surname>, <given-names>S.</given-names></string-name></person-group> (<year>2021</year>). <article-title>Number line estimations, place value understanding and mathematics achievement.</article-title> <source>Journal of Education and Future</source>, <volume>19</volume>, <fpage>37</fpage>–<lpage>47</lpage>. <pub-id pub-id-type="doi">10.30786/jef.729843</pub-id></mixed-citation></ref>
<ref id="r50"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Sasanguie</surname>, <given-names>D.</given-names></string-name>, <string-name name-style="western"><surname>Verschaffel</surname>, <given-names>L.</given-names></string-name>, <string-name name-style="western"><surname>Reynvoet</surname>, <given-names>B.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Luwel</surname>, <given-names>K.</given-names></string-name></person-group> (<year>2016</year>). <article-title>The development of symbolic and non-symbolic number line estimations: Three developmental accounts contrasted within cross-sectional and longitudinal data.</article-title> <source>Psychologica Belgica</source>, <volume>56</volume>(<issue>4</issue>), <fpage>382</fpage>–<lpage>405</lpage>. <pub-id pub-id-type="doi">10.5334/pb.276</pub-id><pub-id pub-id-type="pmid">30479447</pub-id></mixed-citation></ref>
<ref id="r51"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Schiller</surname>, <given-names>L. K.</given-names></string-name>, <string-name name-style="western"><surname>Abreu-Mendoza</surname>, <given-names>R. A.</given-names></string-name>, <string-name name-style="western"><surname>Thompson</surname>, <given-names>C. A.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Rosenberg-Lee</surname>, <given-names>M.</given-names></string-name></person-group> (<year>2024</year>). <article-title>Children’s estimates of equivalent rational number magnitudes are not equal: Evidence from whole numbers, percentages, decimals, and fractions.</article-title> <source>Journal of Experimental Child Psychology</source>, <volume>247</volume>, <elocation-id>106030</elocation-id>. <pub-id pub-id-type="doi">10.1016/j.jecp.2024.106030</pub-id><pub-id pub-id-type="pmid">39167859</pub-id></mixed-citation></ref>
<ref id="r52"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Schneider</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Grabner</surname>, <given-names>R. H.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Paetsch</surname>, <given-names>J.</given-names></string-name></person-group> (<year>2009</year>). <article-title>Mental number line, number line estimation, and mathematical achievement: Their interrelations in Grades 5 and 6.</article-title> <source>Journal of Educational Psychology</source>, <volume>101</volume>(<issue>2</issue>), <fpage>359</fpage>–<lpage>372</lpage>. <pub-id pub-id-type="doi">10.1037/a0013840</pub-id></mixed-citation></ref>
<ref id="r53"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Schneider</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Merz</surname>, <given-names>S.</given-names></string-name>, <string-name name-style="western"><surname>Stricker</surname>, <given-names>J.</given-names></string-name>, <string-name name-style="western"><surname>De Smedt</surname>, <given-names>B.</given-names></string-name>, <string-name name-style="western"><surname>Torbeyns</surname>, <given-names>J.</given-names></string-name>, <string-name name-style="western"><surname>Verschaffel</surname>, <given-names>L.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Luwel</surname>, <given-names>K.</given-names></string-name></person-group> (<year>2018</year>). <article-title>Associations of number line estimation with mathematical competence: A meta‐analysis.</article-title> <source>Child Development</source>, <volume>89</volume>(<issue>5</issue>), <fpage>1467</fpage>–<lpage>1484</lpage>. <pub-id pub-id-type="doi">10.1111/cdev.13068</pub-id><pub-id pub-id-type="pmid">29637540</pub-id></mixed-citation></ref>
<ref id="r54"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Siegler</surname>, <given-names>R. S.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Booth</surname>, <given-names>J. L.</given-names></string-name></person-group> (<year>2004</year>). <article-title>Development of numerical estimation in young children.</article-title> <source>Child Development</source>, <volume>75</volume>(<issue>2</issue>), <fpage>428</fpage>–<lpage>444</lpage>. <pub-id pub-id-type="doi">10.1111/j.1467-8624.2004.00684.x</pub-id><pub-id pub-id-type="pmid">15056197</pub-id></mixed-citation></ref>
<ref id="r55"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Siegler</surname>, <given-names>R. S.</given-names></string-name>, <string-name name-style="western"><surname>Thompson</surname>, <given-names>C. A.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Schneider</surname>, <given-names>M.</given-names></string-name></person-group> (<year>2011</year>). <article-title>An integrated theory of whole number and fractions development.</article-title> <source>Cognitive Psychology</source>, <volume>62</volume>(<issue>4</issue>), <fpage>273</fpage>–<lpage>296</lpage>. <pub-id pub-id-type="doi">10.1016/j.cogpsych.2011.03.001</pub-id><pub-id pub-id-type="pmid">21569877</pub-id></mixed-citation></ref>
<ref id="r56"><mixed-citation publication-type="other">Silla, E. M., Guba, T. P., Rodrigues, A., Anisiobi, O. C., Scanniello, A., &amp; Barbieri, C. A. A. (2026). <italic>Systematic review of mathematical and motivational relations with algebra performance</italic> [Manuscript under review].</mixed-citation></ref>
<ref id="r57"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Slocum-Gori</surname>, <given-names>S. L.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Zumbo</surname>, <given-names>B. D.</given-names></string-name></person-group> (<year>2011</year>). <article-title>Assessing the unidimensionality of psychological scales: Using multiple criteria from factor analysis.</article-title> <source>Social Indicators Research</source>, <volume>102</volume>, <fpage>443</fpage>–<lpage>461</lpage>. <pub-id pub-id-type="doi">10.1007/s11205-010-9682-8</pub-id></mixed-citation></ref>
<ref id="r58"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Slusser</surname>, <given-names>E. B.</given-names></string-name>, <string-name name-style="western"><surname>Santiago</surname>, <given-names>R. T.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Barth</surname>, <given-names>H. C.</given-names></string-name></person-group> (<year>2013</year>). <article-title>Developmental change in numerical estimation.</article-title> <source>Journal of Experimental Psychology: General</source>, <volume>142</volume>(<issue>1</issue>), <fpage>193</fpage>–<lpage>208</lpage>. <pub-id pub-id-type="doi">10.1037/a0028560</pub-id><pub-id pub-id-type="pmid">22612768</pub-id></mixed-citation></ref>
<ref id="r59"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Steinke</surname>, <given-names>D. A.</given-names></string-name></person-group> (<year>2017</year>). <article-title>Evaluating number sense in community college developmental math students.</article-title> <source>COABE Journal</source>, <volume>6</volume>(<issue>1</issue>), <fpage>5</fpage>–<lpage>19</lpage>.</mixed-citation></ref>
<ref id="r60"><mixed-citation publication-type="confproc">Sudo, S., Chileya, G., Kume, A., &amp; Fujino, Y. (2022). Estimating number line as a cause of low mathematics performance in Zambia. In <italic>Proceedings of the 7th International STEM Education Conference 2022 (iSTEM-Ed), Sukhothai, Thailand.</italic> IEEE.</mixed-citation></ref>
<ref id="r61"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Sullivan</surname>, <given-names>J. L.</given-names></string-name>, <string-name name-style="western"><surname>Juhasz</surname>, <given-names>B. J.</given-names></string-name>, <string-name name-style="western"><surname>Slattery</surname>, <given-names>T. J.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Barth</surname>, <given-names>H. C.</given-names></string-name></person-group> (<year>2011</year>). <article-title>Adults’ number-line estimation strategies: Evidence from eye movements.</article-title> <source>Psychonomic Bulletin &amp; Review</source>, <volume>18</volume>(<issue>3</issue>), <fpage>557</fpage>–<lpage>563</lpage>. <pub-id pub-id-type="doi">10.3758/s13423-011-0081-1</pub-id><pub-id pub-id-type="pmid">21409477</pub-id></mixed-citation></ref>
<ref id="r62"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Tremolada</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Taverna</surname>, <given-names>L.</given-names></string-name>, <string-name name-style="western"><surname>Bonichini</surname>, <given-names>S.</given-names></string-name>, <string-name name-style="western"><surname>Pillon</surname>, <given-names>M.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Biffi</surname>, <given-names>A.</given-names></string-name></person-group> (<year>2019</year>). <article-title>The developmental pathways of preschool children with acute lymphoblastic leukemia: Communicative and social sequelae one year after treatment.</article-title> <source>Children</source>, <volume>6</volume>(<issue>8</issue>), <elocation-id>92</elocation-id>. <pub-id pub-id-type="doi">10.3390/children6080092</pub-id><pub-id pub-id-type="pmid">31412554</pub-id></mixed-citation></ref>
<ref id="r63"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Vogel</surname>, <given-names>S. E.</given-names></string-name>, <string-name name-style="western"><surname>Grabner</surname>, <given-names>R. H.</given-names></string-name>, <string-name name-style="western"><surname>Schneider</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Siegler</surname>, <given-names>R. S.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Ansari</surname>, <given-names>D.</given-names></string-name></person-group> (<year>2013</year>). <article-title>Overlapping and distinct brain regions involved in estimating the spatial position of numerical and non-numerical magnitudes: An fMRI study.</article-title> <source>Neuropsychologia</source>, <volume>51</volume>(<issue>5</issue>), <fpage>979</fpage>–<lpage>989</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuropsychologia.2013.02.001</pub-id><pub-id pub-id-type="pmid">23416146</pub-id></mixed-citation></ref>
<ref id="r64"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Wall</surname>, <given-names>J. L.</given-names></string-name>, <string-name name-style="western"><surname>Thompson</surname>, <given-names>C. A.</given-names></string-name>, <string-name name-style="western"><surname>Dunlosky</surname>, <given-names>J.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Merriman</surname>, <given-names>W. E.</given-names></string-name></person-group> (<year>2016</year>). <article-title>Children can accurately monitor and control their number-line estimation performance.</article-title> <source>Developmental Psychology</source>, <volume>52</volume>(<issue>10</issue>), <fpage>1493</fpage>–<lpage>1502</lpage>. <pub-id pub-id-type="doi">10.1037/dev0000180</pub-id><pub-id pub-id-type="pmid">27548391</pub-id></mixed-citation></ref>
<ref id="r65"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Xu</surname>, <given-names>C.</given-names></string-name>, <string-name name-style="western"><surname>Burr</surname>, <given-names>S. D. L.</given-names></string-name>, <string-name name-style="western"><surname>LeFevre</surname>, <given-names>J.-A.</given-names></string-name>, <string-name name-style="western"><surname>Skwarchuk</surname>, <given-names>S.-L.</given-names></string-name>, <string-name name-style="western"><surname>Osana</surname>, <given-names>H. P.</given-names></string-name>, <string-name name-style="western"><surname>Maloney</surname>, <given-names>E. A.</given-names></string-name>, <string-name name-style="western"><surname>Wylie</surname>, <given-names>J.</given-names></string-name>, <string-name name-style="western"><surname>Simms</surname>, <given-names>V.</given-names></string-name>, <string-name name-style="western"><surname>Susperreguy</surname>, <given-names>M. I.</given-names></string-name>, <string-name name-style="western"><surname>Douglas</surname>, <given-names>H.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Lafay</surname>, <given-names>A.</given-names></string-name></person-group> (<year>2023</year>). <article-title>Development of children’s number line estimation in primary school: Regional and curricular influences.</article-title> <source>Cognitive Development</source>, <volume>67</volume>, <elocation-id>101355</elocation-id>. <pub-id pub-id-type="doi">10.1016/j.cogdev.2023.101355</pub-id></mixed-citation></ref>
<ref id="r66"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Young</surname>, <given-names>L. K.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Booth</surname>, <given-names>J. L.</given-names></string-name></person-group> (<year>2015</year>). <article-title>Student magnitude knowledge of negative numbers.</article-title> <source>Journal of Numerical Cognition</source>, <volume>1</volume>(<issue>1</issue>), <fpage>38</fpage>–<lpage>55</lpage>. <pub-id pub-id-type="doi">10.5964/jnc.v1i1.7</pub-id></mixed-citation></ref>
<ref id="r67"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Yu</surname>, <given-names>S.</given-names></string-name>, <string-name name-style="western"><surname>Kim</surname>, <given-names>D.</given-names></string-name>, <string-name name-style="western"><surname>Fitzsimmons</surname>, <given-names>C. J.</given-names></string-name>, <string-name name-style="western"><surname>Mielicki</surname>, <given-names>M. K.</given-names></string-name>, <string-name name-style="western"><surname>Thompson</surname>, <given-names>C. A.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Opfer</surname>, <given-names>J. E.</given-names></string-name></person-group> (<year>2022</year>). <article-title>From integers to fractions: The role of analogy in developing a coherent understanding of proportional magnitude.</article-title> <source>Developmental Psychology</source>, <volume>58</volume>(<issue>10</issue>), <fpage>1912</fpage>–<lpage>1930</lpage>. <pub-id pub-id-type="doi">10.1037/dev0001398</pub-id><pub-id pub-id-type="pmid">35666925</pub-id></mixed-citation></ref>
<ref id="r68"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Zhu</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Cai</surname>, <given-names>D.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Leung</surname>, <given-names>A. W.</given-names></string-name></person-group> (<year>2017</year>). <article-title>Number line estimation predicts mathematical skills: Difference in Grades 2 and 4.</article-title> <source>Frontiers in Psychology</source>, <volume>8</volume>, <elocation-id>1576</elocation-id>. <pub-id pub-id-type="doi">10.3389/fpsyg.2017.01576</pub-id><pub-id pub-id-type="pmid">28955282</pub-id></mixed-citation></ref>
</ref-list>
<sec sec-type="data-availability" id="das"><title>Data Availability</title>
<p>The data supporting the findings of this study are not publicly available because they are part of an ongoing larger longitudinal research project. To support transparency and reproducibility, metadata and analytic materials (e.g., study documentation and R code) are available from the corresponding author upon reasonable request.</p>
</sec>
<fn-group>
<fn fn-type="financial-disclosure"><p>The authors have no funding to report.</p></fn>
</fn-group>
<fn-group>
<fn fn-type="conflict"><p>The authors have declared that no competing interests exist.</p></fn>
</fn-group>
<fn-group>
<fn fn-type="presented-at"><p>An earlier version of this work was presented at the 2025 AERA Annual Meeting. The conference presentation introduced the study concept and preliminary results, whereas the present article reports the full analyses, complete results, and expanded implications.</p></fn>
</fn-group>
<ack>
<p>The authors have no additional (i.e., non-financial) support to report.</p>
</ack>
</back>
</article>