A recent meta-analysis demonstrated that the overall correlation between performance on the number line estimation (NLE) task and children’s mathematical competence was r = .44 (positively recoded) and that this relation increased with age. The goal of the current study was to conceptually replicate and extend these results by synthesizing this correlation in studies not included in the meta-analysis. Across seven studies with 954 participants ranging from 3 to 11 years old (mean age = 6.02 years, SD = 1.57), the overall estimation-competence correlations were similar to those of the meta-analysis, ranging from r = −.40 to −.35. This conceptual replication indicates that the meta-analysis captured a stable overall relation between performance on the NLE task and mathematical competence. However, the current study did not replicate the pattern of age-group moderation reported in the meta-analysis. The current study also extended these results by assessing the stability and predictive validity of the NLE task while controlling for covariates. Results suggested that the NLE task showed poor stability and predictive validity in the seven samples included in this study. Thus, although the concurrent relations replicated, the differing age moderation, lack of stability, and lack of predictive validity in these studies call for a more nuanced understanding of the utility of the NLE task. Future research should focus on the connection between children’s developmental progression and NLE measurement before further investigating the predictive and diagnostic importance of the task for broader mathematical competence.

The number line estimation (NLE) task has been widely used to assess children’s numerical magnitude abilities and mathematical cognition (e.g.,

Evidence for the association between performance on the NLE task and mathematical competence was strengthened by a recent meta-analysis (

The current study adds to the literature by leveraging data from seven independent and diverse studies not included in the previous meta-analysis (

Based on

Six of the studies in this conceptual replication were conducted in Midwestern cities in the United States, and one study was conducted in Chile. Studies were included in the analyses if they had administered an NLE task at least once. No data exclusion or outlier protocols were applied. All study samples were collected between 2012 and 2020, whereas _{age}

For full descriptive information, see

Variable | Study 1 n | M (SD) | Min | Max | Study 2 n | M (SD) | Min | Max | Study 3 n | M (SD) | Min | Max | Study 4 n | M (SD) | Min | Max | Study 5 n | M (SD) | Min | Max | Study 6 n | M (SD) | Min | Max | Study 7 n | M (SD) | Min | Max
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Age (years) | 275 | 5.71 (0.40) | 4.93 | 6.94 | 92 | 5.55 (0.34) | 4.92 | 6.67 | 35 | 5.65 (0.44) | 4.83 | 6.42 | 62 | 6.92 (2.09) | 3.42 | 11.50 | 101 | 4.31 (0.63) | 3.17 | 5.25 | 261 | 7.88 (0.93) | 5.75 | 10.41 | 124 | 4.18 (0.58) | 3.12 | 5.26
Sex | 277 | 0.47 (0.50) | 0 | 1 | 92 | 0.46 (0.50) | 0 | 1 | 35 | 0.31 (0.47) | 0 | 1 | 61 | 0.54 (0.50) | 0 | 1 | 101 | 0.51 (0.50) | 0 | 1 | 262 | 0.47 (0.50) | 0 | 1 | 124 | 0.46 (0.50) | 0 | 1

Mathematical Competence | ||||||||||||||||||||||||||||

WJAP Time 1 | 274 | 17.85 (4.17) | 1 | 30 | 92 | 21.82 (4.11) | 11 | 30 | 34 | 14.71 (4.25) | 1 | 22 | 262 | 27.26 (5.09) | 14 | 42 | ||||||||||||

PENS-B Time 1 | 100 | 6.06 (4.90) | 0 | 18 | 124 | 10.24 (5.82) | 0 | 23 | ||||||||||||||||||||

Panamath Time 1 | 250 | 80.84 (11.40) | 33.33 | 98.86 | 56 | 84.56 (15.11) | 25.00 | 100.00 | ||||||||||||||||||||

WJAP Time 2 | 175 | 24.25 (3.76) | 12 | 35 | 35 | 20.80 (4.75) | 12 | 32 | 159 | 29.97 (4.59) | 19 | 44 | ||||||||||||||||

PENS-B Time 2 | 114 | 13.55 (5.95) | 0 | 24 | ||||||||||||||||||||||||

Panamath Time 2 | 169 | 85.87 (7.81) | 46.59 | 98.86 | ||||||||||||||||||||||||

Number Line Estimation Task | ||||||||||||||||||||||||||||

PAE Time 1 | 252 | 17.73 (10.18) | 2.56 | 47.41 | 92 | 13.54 (7.63) | 3.60 | 46.30 | 35 | 9.24 (4.81) | 1.98 | 20.00 | 62 | 13.23 (10.06) | 2.25 | 47.85 | 101 | 41.55 (11.98) | 14.44 | 63.33 | 259 | 9.30 (9.13) | 0.32 | 49.76 | 124 | 22.70 (7.09) | 8.38 | 45.00 |

PAE Time 2 | 169 | 12.45 (7.11) | 2.94 | 40.47 | 155 | 12.63 (7.60) | 3.29 | 38.70 | 114 | 21.00 (8.90) | 4.75 | 46.75 | ||||||||||||||||

NLE number type | Symbolic | Symbolic | Symbolic | Symbolic | Symbolic | Symbolic | Nonsymbolic

NLE presentation medium | Paper | Paper | Paper | Paper | Computer | Computer | Paper

NLE number range | 0–20 | 0–20 | 0–20 | 0–20 | 0–10 | 0–100, 0–1,000 | 0–10

Sample | Pathways | School Instruction | LENA | Sharing Task | Storybook | Chile | ANS

Furthermore, all studies included at least one measure of mathematical competence beyond the NLE task. These measures included standardized mathematical achievement tests (Woodcock-Johnson Applied Problems subtest [WJAP;

This study (Pathways) is an extension of a larger longitudinal project exploring the effects of schooling on executive functions. Data were collected from children attending four local elementary schools in Midwestern U.S. cities. The total sample comprises three cohorts (waves) of children, each followed from kindergarten to second grade. Data from this dataset were previously published (

Schools included in this sample served children from a range of socioeconomic backgrounds, as indicated by each school's percentage of students receiving free or reduced-price lunch. Children were 4 to 6 years old (_{age}

All children completed a modified version of the NLE task. Children were given a pencil and a sheet of paper showing a line labeled 0 at the left end and 20 at the right end. Their task was to draw a hash mark on the line indicating their best guess of where a given integer between 0 and 20 fell. The experimenter held one of three flipbooks that presented the number the child was to place on the line. Each flipbook contained the same 10 randomly chosen integers (1, 4, 6, 8, 9, 10, 13, 16, 17, and 19) in different orders. A new, unmarked line was used on each trial so that only one hash mark was placed on each line. Before testing, participants completed three practice trials on a shorter line labeled 0–10 to ensure that they understood the instructions. The experimenter did not provide feedback on children's placement of the hash marks during the practice trials. Participants then completed 10 test trials on the 0–20 line, also without feedback. To assess children’s numerical magnitude accuracy, percent absolute error (PAE) scores were calculated following the procedure described by
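A minimal sketch of the PAE computation follows. It assumes the conventional definition used in the NLE literature (for each trial, |estimate − target| divided by the numerical range of the line, times 100, then averaged over trials); the exact procedure the authors followed is cited in the text and is not reproduced here. The helper `mark_to_estimate`, the millimetre measurement, and the line length are hypothetical illustrations, not reported details of these studies.

```python
from statistics import mean


def mark_to_estimate(mark_mm, line_length_mm, scale_max, scale_min=0.0):
    """Convert a hash mark's distance from the left end (mm) into number units.

    Hypothetical helper: assumes marks were measured in millimetres from the 0 end.
    """
    return scale_min + (mark_mm / line_length_mm) * (scale_max - scale_min)


def percent_absolute_error(estimates, targets, scale_max, scale_min=0.0):
    """Mean PAE: |estimate - target| / (scale range) * 100, averaged over trials.

    Assumes the conventional PAE definition, not necessarily the authors' exact procedure.
    """
    if len(estimates) != len(targets) or not targets:
        raise ValueError("need one estimate per target and at least one trial")
    scale_range = scale_max - scale_min
    return mean(abs(e - t) / scale_range * 100 for e, t in zip(estimates, targets))


# The 10 test integers reported for the 0-20 flipbooks
TARGETS = [1, 4, 6, 8, 9, 10, 13, 16, 17, 19]

# Made-up estimates for illustration: every mark lands 2 units to the right of its target
example_estimates = [t + 2 for t in TARGETS]
print(percent_absolute_error(example_estimates, TARGETS, scale_max=20))
```

The same function would apply to the 0–10 versions by passing `scale_max=10`.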

Children completed the standardized English version of the Applied Problems subtest from the Woodcock-Johnson III Tests of Achievement (

Children also completed a computerized nonsymbolic numerical discrimination task, titled Panamath (

This sample (School Instruction) is part of a larger study exploring the role of classroom instruction in early math skills. Data from this study have not been previously published. Kindergarten children (_{age}

Children completed the same version of the NLE task described in Study 1. Children’s PAE scores were calculated based on the 10 test trials.

Children completed the standardized mathematical achievement test (Woodcock-Johnson III Applied Problems subtest;

This sample (LENA) is part of a larger study exploring the role of the home environment in early numeracy skills. Data from the broader study were previously published at _{age}

Children were presented with a paper number line ranging from 0 to 20 and were asked to mark the position of a given number on the line. First, children were shown a horizontal line on a sheet of paper with "0" below the left end and "20" below the right end. Then all numbers from 1 to 19 were presented one at a time in random order, and children estimated the position of each number on a separate number line. To assess the accuracy of children's estimates, each child's PAE was calculated.

Children completed the standardized mathematical achievement test (Woodcock-Johnson III Applied Problems subtest;

This sample (Sharing Task) was part of a larger project examining the relation between social division and math achievement in children. Data from this study have not been previously published. Children’s ages ranged from 3 to 11 years old when they participated in the study (_{age}

Children completed the same version of the NLE task described in Study 1. Children’s PAE scores were calculated based on the 10 test trials.

Children completed the nonsymbolic numerical discrimination task (Panamath;

This study (Storybook) recruited participants from a larger longitudinal project evaluating a storybook intervention targeting children's mathematical skills. Data from this study have not been previously published. The current study includes only the pretest data, collected before random assignment to intervention conditions. Children were recruited from local preschools in Midwestern U.S. cities. Children were 3 to 5 years old (_{age}

Children completed an iPad version of the NLE task ranging from 0–10 (

Children completed the PENS-B (

This sample (Chile) is part of a larger cross-sequential study examining early predictors of math skills in Chilean children. Data from the broader study were previously published at _{age}

An iPad version of the NLE task was administered, similar to Study 5 (

Children completed the Spanish version of the Woodcock-Johnson III Applied Problems subtest (Batería III Woodcock-Muñoz;

This sample (ANS) is part of a larger study on children’s mathematical, executive function (EF), and literacy development during preschool. Data from the broader study were previously published at _{age}

Children completed a modified paper-and-pencil version of the NLE task designed to be nonverbal: rather than being presented with numerals, children were presented with sets of 1 to 10 dots. The task also included modifications according to

Children completed the nonsymbolic numerical discrimination task (Panamath) and the standardized numeracy skill assessment (PENS-B). The methods for administering these tests are described in Study 4 and Study 5, respectively.

Data analyses were run using the

Further, we tested whether different aspects of the NLE and mathematical competence measures moderated any correlations using a

Finally, exploratory multiple linear regressions were conducted to extend the results from
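The regression models in this study share one structure: a Time 2 outcome regressed on age, sex, PAE at Time 1, and, in some models, a Time 1 competence score. As a hedged illustration only, the sketch below fits such a model by ordinary least squares with the Python standard library and returns unstandardized coefficients (b), standardized coefficients (β = b · SD(x) / SD(y)), and adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1). It assumes complete cases; the function and predictor names are hypothetical, and the authors' actual software and estimation options are not reproduced here.

```python
import statistics


def solve_linear_system(a, b):
    """Solve a @ x = b with Gauss-Jordan elimination and partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix (collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(y, predictors):
    """Illustrative OLS fit of y = b0 + sum(b_j * x_j); not the authors' analysis code.

    predictors: dict mapping a (hypothetical) predictor name to a list of values.
    Returns (coefficients, adjusted_r2); coefficients maps each name to (b, beta),
    with the intercept stored under "Constant" and beta = None.
    """
    names = list(predictors)
    n, k = len(y), len(names)
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    rows = [[1.0] + [predictors[name][i] for name in names] for i in range(n)]
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k + 1)] for i in range(k + 1)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k + 1)]
    b = solve_linear_system(xtx, xty)

    fitted = [sum(bj * xj for bj, xj in zip(b, r)) for r in rows]
    y_mean = statistics.fmean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

    sd_y = statistics.stdev(y)
    coefficients = {"Constant": (b[0], None)}
    for name, bj in zip(names, b[1:]):
        # Standardized beta: b rescaled by SD(x) / SD(y)
        coefficients[name] = (bj, bj * statistics.stdev(predictors[name]) / sd_y)
    return coefficients, adjusted_r2


if __name__ == "__main__":
    # Toy values for illustration only (not study data); outcome = 1 + 2*pae_t1 - 0.5*age exactly
    coefs, adj_r2 = ols([2, 4.5, 5, 7.5, 8, 10.5],
                        {"pae_t1": [1, 2, 3, 4, 5, 6], "age": [2, 1, 4, 3, 6, 5]})
    print(coefs, round(adj_r2, 3))
```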

Because the seven studies used in this manuscript are existing datasets, a sensitivity power analysis was used to calculate the range of minimally detectable effect sizes (MDES) given the sample sizes available for the proposed correlations (
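Under the common Fisher z approximation, the smallest correlation detectable with a two-tailed significance level α and power P in a sample of size n is r_min = tanh((z_{1−α/2} + z_P) / √(n − 3)). The sketch below assumes α = .05 and power = .80, which the truncated text does not confirm; exact procedures, such as those in dedicated power-analysis software, can give slightly different values. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def minimal_detectable_r(n, alpha=0.05, power=0.80):
    """Smallest |r| detectable at two-tailed alpha with the given power.

    Fisher z approximation; alpha = .05 and power = .80 are assumed defaults.
    """
    if n <= 3:
        raise ValueError("n must exceed 3")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    z_power = std_normal.inv_cdf(power)          # about 0.84 for power = .80
    return math.tanh((z_alpha + z_power) / math.sqrt(n - 3))


# Illustrative sample sizes, roughly the smallest and largest per-study samples
for n in (35, 275):
    print(n, round(minimal_detectable_r(n), 3))
```

Because r_min shrinks with √(n − 3), the smallest samples set the upper end of the MDES range.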

The overall effect sizes and effect sizes by moderator variables are listed in

Moderator | r | 95% CI | N | Studies
---|---|---|---|---
Overall | | | | 7
WJAP | −.36 | [−0.42, −0.30] | 634 | 4
PENS-B | −.40 | [−0.51, −0.30] | 224 | 2
Panamath | −.35 | [−0.44, −0.26] | 408 | 3
PAE Time 2 | .24 | [0.13, 0.33] | 409 | 3
Age group | | | |
<6 years | | | | 7
WJAP | −.40 | [−0.50, −0.29] | 306 | 4
PENS-B | −.40 | [−0.51, −0.28] | 224 | 2
Panamath | −.29 | [−0.39, −0.29] | 320 | 3
PAE Time 2 | .23 | [0.11, 0.33] | 226 | 3
6–9 years | | | | 5
WJAP | −.11 | [−0.19, −0.01] | 291 | 4
PENS-B | | | |
Panamath | −.20 | [−0.42, 0.04] | 75 | 2
PAE Time 2 | −.05 | [−0.23, 0.14] | 180 | 2
Number Type | | | |
Symbolic | | | | 6
WJAP | −.36 | [−0.41, −0.30] | 634 | 4
PENS-B | −.26 | [−0.45, −0.06] | 100 | 1
Panamath | −.27 | [−0.40, −0.14] | 285 | 2
PAE Time 2 | .08 | [−0.05, 0.22] | 295 | 2
Non-Symbolic | | | | 1
WJAP | | | |
PENS-B | −.22 | [−0.38, −0.05] | 124 | 1
Panamath | −.26 | [−0.39, −0.10] | 123 | 1
PAE Time 2 | .06 | [−0.15, 0.23] | 114 | 1
Presentation Medium | | | |
Computer | | | | 2
WJAP | .06 | [−0.04, 0.18] | 259 | 1
PENS-B | −.26 | [−0.45, −0.08] | 100 | 1
Panamath | | | |
PAE Time 2 | −.01 | [−0.19, 0.16] | 151 | 1
Paper | | | | 5
WJAP | −.40 | [−0.48, −0.30] | 258 | 3
PENS-B | −.22 | [−0.38, −0.07] | 124 | 1
Panamath | −.35 | [−0.44, −0.25] | 408 | 3
PAE Time 2 | .23 | [0.11, 0.34] | 258 | 2
Number Range | | | |
0 to 10 | | | | 2
WJAP | | | |
PENS-B | −.40 | [−0.50, −0.27] | 224 | 2
Panamath | −.26 | [−0.40, −0.10] | 123 | 1
PAE Time 2 | .06 | [−0.13, 0.25] | 114 | 1
0 to 20 | | | | 4
WJAP | −.40 | [−0.49, −0.31] | 375 | 3
PENS-B | | | |
Panamath | −.27 | [−0.39, −0.14] | 285 | 2
PAE Time 2 | .19 | [−0.02, 0.38] | 144 | 1
0 to 100 | | | | 1
WJAP | −.51 | [−0.63, −0.39] | 95 | 1
PENS-B | | | |
Panamath | | | |
PAE Time 2 | .53 | [0.36, 0.68] | 80 | 1
0 to 1,000 | | | | 1
WJAP | −.55 | [−0.64, −0.46] | 164 | 1
PENS-B | | | |
Panamath | | | |
PAE Time 2 | .57 | [0.35, 0.73] | 71 | 1
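Correlations like these can be combined across samples in several ways. For orientation only, the sketch below shows one standard option, a fixed-effect (inverse-variance) average of Fisher z-transformed correlations weighted by n − 3, with a normal-theory confidence interval back-transformed to r. This is an assumption for illustration: the pooling method behind this table is not stated in this section, so the sketch is not expected to reproduce the tabled estimates or intervals, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def pooled_correlation(studies, confidence=0.95):
    """Fixed-effect pooled r from (r, n) pairs via Fisher's z, weighting each z by n - 3.

    Illustrative sketch only; returns (pooled_r, (lower, upper)).
    """
    weights = [n - 3 for _, n in studies]
    if not studies or any(w <= 0 for w in weights):
        raise ValueError("need at least one study, each with n > 3")
    z_bar = sum(w * math.atanh(r) for w, (r, _) in zip(weights, studies)) / sum(weights)
    se = 1 / math.sqrt(sum(weights))  # SE of the weighted mean z
    crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    lower, upper = z_bar - crit * se, z_bar + crit * se
    # Back-transform the estimate and limits to the correlation metric
    return math.tanh(z_bar), (math.tanh(lower), math.tanh(upper))


# Arbitrary illustrative inputs, not values from this study
print(pooled_correlation([(-0.30, 120), (-0.40, 90)]))
```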

The tests of moderation for the measures of mathematical competence were not found to be statistically significant for the NLE relation (see

Moderator comparison | z | p | df
---|---|---|---
Overall | | |
WJAP & PENS-B | 0.63 | .529 | 1,800
WJAP & Panamath | −0.27 | .787 | 1,040
Panamath & PENS-B | −0.79 | .427 | 630
Age Group | | |
<6 years & 6–9 years | | |
WJAP | −3.86 | <.001 | 595
Panamath | −0.74 | .457 | 393
Number Type | | |
Symbolic & Non-Symbolic | | |
PENS-B | −0.30 | .767 | 222
Panamath | −0.05 | .961 | 406
Presentation Medium | | |
Computer & Paper | | |
WJAP | 5.47 | <.001 | 515
PENS-B | −0.30 | .767 | 222
Number Range | | |
0 to 10 & 0 to 20 | | |
Panamath | 0.05 | .961 | 406
0 to 20 & 0 to 100 | | |
WJAP | 1.25 | .212 | 468
0 to 20 & 0 to 1,000 | | |
WJAP | 2.13 | .033 | 537
0 to 100 & 0 to 1,000 | | |
WJAP | 0.43 | .669 | 257
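The test statistics in this table are consistent with the standard test for a difference between two independent correlations based on Fisher's r-to-z transformation. For example, a minimal sketch of that test applied to the presentation-medium comparison for the Applied Problems subtest (r = .06, N = 259 for computer vs. r = −.40, N = 258 for paper) gives z ≈ 5.47, matching the tabled value; other rows agree only approximately, presumably because the tabled correlations are rounded. The function name is hypothetical, and this is not presented as the authors' code.

```python
import math


def compare_independent_correlations(r1, n1, r2, n2):
    """Two-tailed z test of H0: rho1 == rho2 for correlations from independent samples.

    Illustrative sketch of the Fisher r-to-z comparison; returns (z, p).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)      # Fisher r-to-z transformation
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # SE of the difference in z
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))         # two-tailed standard normal p value
    return z, p


# Presentation-medium comparison for WJAP: computer (r = .06, N = 259) vs. paper (r = -.40, N = 258)
z, p = compare_independent_correlations(0.06, 259, -0.40, 258)
print(f"z = {z:.2f}, p = {p:.2g}")
```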

Because of the small number of children older than 9 years (

Age group significantly moderated the correlation between performance on the NLE task and mathematical competence for the Applied Problems subtest (

Likewise, the type of numbers presented to children (symbolic vs. nonsymbolic) did not significantly moderate the correlation between performance on the NLE task and mathematical competence (see

Presentation medium significantly moderated the estimation-competence relation for children's performance on the Woodcock-Johnson III Applied Problems subtest (

The correlation between performance on the NLE task and the Applied Problems subtest was found to be significantly moderated by the number range that was presented (

To further assess the stability and validity of the NLE task, we also included extension analyses beyond the replication of the study by

Results from regression analyses assessing the stability of the NLE task are presented in

Variable | Model 1 b (SE) | β | p | Model 2 b (SE) | β | p | Model 3 b (SE) | β | p | Model 4 b (SE) | β | p
---|---|---|---|---|---|---|---|---|---|---|---|---
Constant | 26.95 (2.29) | | < .001 | 14.58 (2.92) | | < .001 | 30.81 (6.86) | | < .001 | 39.35 (3.82) | | < .001
Age | −2.25 (0.32) | −0.37 | < .001 | 2.16 (0.57) | 0.30 | < .001 | −2.56 (1.91) | −0.17 | .182 | −3.74 (0.69) | −0.38 | < .001
Sex | 0.99 (0.80) | 0.06 | .219 | 2.04 (0.81) | 0.14 | .013 | −0.52 (1.79) | −0.03 | .771 | 0.22 (1.03) | 0.01 | .835
PAE Time 1 | 0.06 (0.04) | 0.07 | .187 | −0.03 (0.04) | −0.05 | .428 | 0.05 (0.13) | 0.04 | .675 | 0.05 (0.06) | 0.05 | .374
Mathematical Competence | | | | | | | | | | | |
WJAP Time 1 | | | | −0.76 (0.11) | −0.58 | < .001 | | | | | |
PENS-B Time 1 | | | | | | | −0.01 (0.20) | −0.01 | .965 | | |
Panamath Time 1 | | | | | | | | | | −0.07 (0.04) | −0.13 | .066
Adjusted R² | .16 | | | .18 | | | .04 | | | .23 | |
N | 406 | | | 290 | | | 113 | | | 237 | |

When controlling for previous mathematical competence, performance on the NLE task at Time 1 was not a predictor of performance at Time 2 (Model 2 β(

Results from the multiple linear regressions are presented in

Variable | Model 1 (WJAP Time 2) b (SE) | β | p | Model 2 (PENS-B Time 2) b (SE) | β | p | Model 3 (Panamath Time 2) b (SE) | β | p
---|---|---|---|---|---|---|---|---|---
Constant | 10.61 (1.17) | | < .001 | 0.07 (3.27) | | .982 | 28.10 (4.78) | | < .001
Age | 0.08 (0.24) | 0.02 | .728 | 2.37 (0.91) | 0.24 | .010 | 5.44 (0.85) | 0.36 | < .001
Sex | −0.69 (0.33) | −0.06 | .039 | −0.65 (0.85) | −0.06 | .448 | −2.37 (1.29) | −0.09 | .067
PAE Time 1 | 0.03 (0.02) | 0.05 | .141 | −0.08 (0.06) | −0.09 | .206 | −0.08 (0.08) | −0.05 | .287
Mathematical Competence | | | | | | | | |
WJAP Time 1 | 0.73 (0.04) | 0.83 | < .001 | | | | | |
PENS-B Time 1 | | | | 0.54 (0.09) | 0.52 | < .001 | | |
Panamath Time 1 | | | | | | | 0.37 (0.05) | 0.42 | < .001
Adjusted R² | .69 | | | .51 | | | .49 | |
N | 334 | | | 113 | | | 237 | |

The multiple linear regressions suggest that Time 1 performance on the Applied Problems subtest of the Woodcock-Johnson III Tests of Achievement predicted Time 2 performance on the same subtest when controlling for other variables, β(

Regression results demonstrated that PENS-B scores at Time 1 predicted PENS-B scores at Time 2 when controlling for other variables, β(

Results revealed that Panamath performance at Time 1 was a statistically significant predictor of Panamath performance at Time 2 when controlling for other variables, β(

The goal of the current study was to conceptually replicate correlational results from a meta-analysis examining the relation between NLE and mathematical competence (

The new findings from the seven independent studies replicated the

The consistency of the estimation-competence relation across mathematical competence measures could have several explanations. First, the mathematical competence measures are highly correlated, although one of them, Panamath, is thought to measure more innate, nonsymbolic numerical processing (

Results also supported part of our second hypothesis, in that age group moderated the estimation-competence association. The age moderation replicated results from

Inconsistent with our third hypothesis and findings from

Although complex, these replication results highlight the need for future work to examine the importance of variation in the NLE task. In their meta-analysis,

In the current replication effort, we also extended the

Across analyses, however, our results suggested that the NLE task did not predict any of our three mathematical competence measures when controlling for prior achievement. In each case, the mathematical competence measure showed strong stability over time, even when controlling for children’s age at testing, sex, and performance on the number line task. These findings add to our previous results in suggesting that the NLE task did not demonstrate strong predictive validity for other mathematical competence measures. Taken together, the inconsistent evidence for stability, the lack of evidence for predictive validity, and the changing relation with mathematical competence across ages raise the question of what, specifically, the NLE task measures.

Although our analyses of the stability and predictive validity of the NLE task were exploratory, the results are consistent with other recent studies that have assessed the reliability of the task. For example, multiple studies have shown low internal reliability for various versions of the NLE task (
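Internal reliability of this kind is often summarized with Cronbach's alpha, α = k/(k − 1) · (1 − Σ var(item) / var(total)), treating each trial's score (e.g., its absolute error) as an item. A minimal sketch, assuming a complete children-by-trials matrix with no missing trials, is shown below; it is illustrative only and not necessarily the index used in the studies cited. The function name is hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for rows of per-trial scores (one row per child, k items each).

    Illustrative sketch; assumes no missing trials and uses sample variances throughout.
    """
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular matrix with at least two items")
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up absolute errors for three children on two trials (not study data)
print(cronbach_alpha([[1, 2], [2, 4], [3, 3]]))
```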

In sum, the current study successfully replicated the overall findings from the

Variable | Study 1 b (SE) | β | p | Study 6 b (SE) | β | p | Study 7 b (SE) | β | p
---|---|---|---|---|---|---|---|---|---
Constant | 3.89 (9.88) | | .694 | 49.42 (7.45) | | < .001 | 30.90 (6.54) | | < .001
Age | 1.06 (1.65) | 0.06 | .522 | −5.34 (1.04) | −0.48 | < .001 | −2.61 (1.43) | −0.17 | .072
Sex | 0.16 (1.24) | 0.01 | .896 | 2.68 (1.16) | 0.17 | .022 | −0.50 (1.73) | −0.03 | .772
PAE Time 1 | 0.15 (0.06) | 0.20 | .022 | 0.22 (0.08) | 0.26 | .007 | 0.05 (0.12) | 0.04 | .643
Adjusted R² | .02 | | | .18 | | | .01 | |
N | 139 | | | 146 | | | 110 | |

For this article, a dataset is freely available (

The Supplementary Materials contain the syntax and a simulated dataset for this study (for access see

