The debate about how to characterize performance on the number line estimation (NLE) task has yielded a diverse set of accuracy measures. These accuracy measures include characterizing performance by deviation from the correct score with percent absolute error (PAE), modeling the shape of responses via the logarithmic-to-linear shift, and modeling the strategy use via the cyclical power model (one and two cycle). In the present study, accuracy on a symbolic NLE task was examined using phenotypic and quantitative genetic analyses of all four measurements. Data were collected from a same-sex twin sample at ages 12 and 15 (N = 150 pairs) as part of the Western Reserve Reading and Math Project. Linear mixed-effect models were used to compare how well the four NLE accuracy measures predicted math achievement, as measured by the Woodcock Johnson-III Fluency, Calculation, and Applied Problems subtests, after cognitive ability was controlled. NLE accuracy measures were not related to Fluency or Calculation after cognitive ability was controlled, but all NLE accuracy measures were related to Applied Problems at 12 and 15 years old. Although theories about what the NLE task measures have been contested in the literature, the relationship between NLE accuracy and achievement did not differ regardless of the type of accuracy measure used. In addition, the estimates for genetic and environmental influences were proportionately similar across the NLE accuracy measures. Overall, all proposed measures of accuracy in the present sample appear appropriate for prediction of math achievement in adolescents.
In an era of increasing demand on schools to prepare students for success in STEM careers, the urgency for gaining a better understanding of the underlying components of math cognition that predict achievement in math is increasing. Pervasive core processes for mathematical reasoning include the manner and precision with which people conceptualize, represent, manipulate, and compare internal numerical magnitude representation (
More broadly, when symbolic numbers (e.g., “8”) were presented to both adults and children, regardless of modality, magnitude representation was automatically activated, even in cases where a task did not require a judgment of magnitude (
Internal magnitude representation has been measured using a variety of techniques (
Although variants of the procedure exist, the symbolic NLE task is arguably the most well studied, and characteristics of the symbolic NLE task have also been studied in relation to performance on other tasks. Students become more accurate on the NLE task over time, and better performance on the NLE task is associated with higher math achievement on curriculum-based tests in elementary school (
As with math achievement, there are several ways to evaluate performance on the NLE task, and the most common for the NLE task is average error. In addition to the average error, the pattern of response has also been identified as a relevant aspect of the task, with some children displaying a logarithmic response pattern and others displaying a linear response pattern (
With the ability to hold multiple representations at the same time, it appears that the logarithmic-to-linear shift in performance on the NLE task may be driven by a familiarity with numbers that arises from school instruction. Further evidence of the importance of school instruction comes from the fact that adults from cultures without formal mathematical instruction map number to space logarithmically, whereas the majority of adults from Western cultures map number to space linearly (
Due to the developmental logarithmic-to-linear shift, linearity has been used as a performance measure on the NLE task. Mathematics achievement has been correlated with linearity based on individual fit statistics to linear functions using R^{2} (
Although there is robust evidence to suggest that performance on the NLE task is better in older and more well-educated children, the connection between better performance on the NLE task and changes to internal magnitude representation is still under debate (
Not only does attention to the midpoint and endpoints change performance, but also the range of numbers used for the number line changes the pattern of responding (
If responses on the NLE task are driven by strategy use, specifically how subjects use the midpoint and endpoints, then performance on the task may be more accurately characterized by a function that predicts minimal errors around the markers in use, the cyclical power model (
Due to these theoretical disagreements, comparisons between the cyclical power model and the logarithmic-to-linear shift have been conducted (mostly with group level data) with mixed results. In these studies, the fit of a mixed logarithmic-linear model (MLLM) representing the logarithmic-to-linear shift is favored in some (
Given the theoretical disagreement in the literature, it is not clear how well each method of scoring can predict performance on math achievement tests. The purpose of this study is to examine how different ways of measuring accuracy on the number line task using methods from theoretically different origins predict math achievement.
Behavior genetics is a tool used to investigate the origins of individual differences in traits by calculating the proportion of the variation in a trait accounted for by genetics, shared environment, and nonshared environment or error. Behavior genetics has particular utility in the present study because the amount of variation in NLE task performance accounted for by genetics and by different aspects of the environment has not been directly estimated. In addition, examining the NLE data in a behavior genetics framework will allow us to validate the theoretical underpinnings of each approach. Theoretically, the shared environmental factor, which would account for shared experiences in school settings, would be significant for the logarithmic-to-linear shift because schooling is linked to more linear responding on the task. Training programs have also shown success in initiating the logarithmic-to-linear shift (
This study is unique in using an adolescent sample to evaluate individual differences in responses on an NLE task, which will not only fill a grade and age gap in the literature but will also describe individual differences at an older age range in which linear responding may be assumed but has not been demonstrated for all participants. Although the logarithmic-to-linear shift for numbers 0-1000 occurs in elementary school for most children, performance still varies in sixth grade. For example, at age 12, a logarithmic function was still the best-fitting function for 28% of participants (
Overall, each measurement style is attempting to measure different characteristics of performance on the NLE task. However, given that we are attempting to use the number line estimation task to understand individual differences and the predictive validity for achievement, a comparison of the accuracy measures, though they are theoretically different, is called for. The present study attempts to answer the following three questions. First, do the measures closely resemble one another? It was hypothesized that, given the theoretical differences between the measures, these accuracy measures would not be highly correlated. Second, are the accuracy measures distinct in their predictive value of different types of math achievement? It was hypothesized that NLE task performance would be most highly predictive of math achievement measures that involve complex math reasoning involving proportion judgment. Third, are the accuracy measures distinct in their genetic and environmental origins? It was hypothesized that variation in all measures would be significantly predicted by a genetic component due to the cognitive nature of the task. In addition, it was hypothesized that the shared environment component would be relevant for all measures due to past studies that have demonstrated the influences of schooling and intervention.
Data were drawn from the Western Reserve Reading and Math Project (WRRMP), a 10-wave longitudinal twin study in which same-sex twin pairs were recruited from school nominations and birth records in kindergarten or first grade. Data for the present study were drawn from the 8^{th} and 9^{th} measurement occasions. These waves of measurement were approximately 3.0 years apart (on average), and the participants averaged 12.2 years (
The NLE task was administered to participants at ages 12 and 15 in a paper-and-pencil format, with 0 and 1000 displayed at opposite ends of the number line (
Differences in the administration procedure (emphasizing and correcting the half mark vs. not drawing attention to the half mark) and item distribution (oversampling the left side of the distribution vs. evenly sampling the distribution) have been shown to change which model (a mixed log-linear model or a mixed cyclical power model) fits the responses better (
Percent absolute error (PAE) is the sum of the absolute errors divided by the product of the total length of the number line and the number of trials (
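The PAE definition above can be illustrated with a minimal sketch. The function name `percent_absolute_error`, its parameters, and the example responses are hypothetical and are not taken from the study; multiplying by 100 to express the ratio as a percentage is an assumption based on the measure's name.

```python
def percent_absolute_error(estimates, targets, line_length=1000.0, as_percent=True):
    """Sum of absolute errors divided by (line length x number of trials).

    estimates: marked positions, in number-line units (hypothetical data)
    targets:   correct positions for the same trials
    """
    if len(estimates) != len(targets) or len(targets) == 0:
        raise ValueError("estimates and targets must be non-empty and equal in length")
    total_abs_error = sum(abs(e - t) for e, t in zip(estimates, targets))
    pae = total_abs_error / (line_length * len(targets))
    # Assumption: the ratio is scaled by 100 to report a percentage.
    return 100.0 * pae if as_percent else pae


# Made-up responses on a 0-1000 line: absolute errors are 50, 50, and 100.
targets = [150, 500, 1000]
estimates = [100, 550, 900]
pae = percent_absolute_error(estimates, targets)
print(round(pae, 2))
```

Because no model is fitted, the function returns a value for any set of responses, however extreme.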
The mixed log-linear model characterizes the shape of response between a linear and logarithmic function (
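One parameterization of a mixed log-linear model that is commonly used in the NLE literature is sketched below. Whether the present study used exactly this form is an assumption; the symbols $a$ (a scaling parameter), $U$ (the upper endpoint of the line, here 1000), and $x$ (the target number) are introduced only for illustration.

```latex
\hat{y} = a\left[(1-\lambda)\,x + \lambda\,\frac{U}{\ln U}\,\ln x\right],
\qquad 0 \le \lambda \le 1
```

Under this form, $\lambda = 0$ corresponds to fully linear responding and $\lambda = 1$ to fully logarithmic responding, so smaller values of $\lambda$ indicate more linear performance.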
The one-cycle cyclical power model is a function that predicts the use of 0 and 1000 as anchors that assist in the proportion judgment (
Maturation of numerical ability would theoretically draw attention not only to the endpoints of the number line but also to its midpoint; subjects begin to use the midpoint as an anchor in the task, noting that values greater than 500 should be placed to the right of the midpoint, and values less than 500 should be placed to the left of the midpoint. The use of a midpoint strategy would lead to responses consistent with a two-cycle cyclical power model (
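The one-cycle and two-cycle models described above can be sketched as follows, using the standard proportion-judgment form of the cyclical power model with targets rescaled to the 0-1 range. The function names, the grid-search fit, and the example data are hypothetical; the study's actual fitting procedure is not specified here and may differ.

```python
def one_cycle(x, beta):
    """One-cycle model: endpoints (0 and 1) serve as the only anchors."""
    num = x ** beta
    return num / (num + (1.0 - x) ** beta)


def two_cycle(x, beta):
    """Two-cycle model: the midpoint (0.5) serves as an additional anchor."""
    if x <= 0.5:
        num = x ** beta
        return 0.5 * num / (num + (0.5 - x) ** beta)
    num = (x - 0.5) ** beta
    return 0.5 + 0.5 * num / (num + (1.0 - x) ** beta)


def fit_beta(model, targets, estimates, line_length=1000.0):
    """Illustrative least-squares grid search for beta over 0.05..2.00."""
    xs = [t / line_length for t in targets]
    ys = [e / line_length for e in estimates]

    def sse(beta):
        return sum((model(x, beta) - y) ** 2 for x, y in zip(xs, ys))

    return min((b / 100.0 for b in range(5, 201)), key=sse)


# Simulated responses generated from a one-cycle model with beta = 0.7.
targets = [50, 200, 400, 600, 800, 950]
estimates = [1000.0 * one_cycle(t / 1000.0, 0.7) for t in targets]
beta_hat = fit_beta(one_cycle, targets, estimates)
accuracy = abs(beta_hat - 1.0)  # deviation of beta from 1
print(beta_hat, round(accuracy, 2))
```

With beta equal to 1, both functions reduce to perfectly accurate placement, which is why the deviation of beta from 1 can serve as an accuracy measure. Values of beta below 1 produce overestimation just above each anchor and underestimation just below the next one.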
The Woodcock Johnson III (WJ-III) was administered as a measure of math achievement and included mathematics subtests Math Fluency, Calculation, and Applied Problems at ages 12 and 15 (
A general cognitive ability summary measure was compiled from several measures, including the Boston Naming Test (
Descriptive statistics for raw values are listed in
Phenotypic correlations are displayed in
Task | 12-years-old: M (SD) | Min | Max | N | 15-years-old: M (SD) | Min | Max | N |
---|---|---|---|---|---|---|---|---|
NLE accuracy | ||||||||
PAE (log) | 3.97 (0.58) | 2.79 | 5.99 | 300 | 3.55 (0.46) | 2.28 | 4.85 | 300 |
λ (log) | 0.12 (0.14) | 0 | 0.66 | 300 | 0.03 (0.07) | 0 | 0.37 | 300 |
|β_{1}-1| | 0.20 (0.19) | 0 | 0.80 | 300 | 0.08 (0.11) | 0 | 0.54 | 300 |
|β_{2}-1| | 0.35 (0.24) | 0 | 0.90 | 300 | 0.19 (0.16) | 0 | 0.73 | 300 |
 | 102.07 (16.41) | 68 | 181 | 292 | 104.59 (18.03) | 63 | 171 | 300 |
 | 103.65 (13.85) | 52 | 146 | 287 | 102.90 (15.91) | 62 | 140 | 299 |
 | 108.55 (10.57) | 69 | 134 | 299 | 107.26 (10.68) | 76 | 140 | 295 |
Task | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
12-years-old | |||||||||||||||
1. PAE (log) | | .87** | .88** | .86** | -.16** | -.25** | -.40** | .37** | .40** | .41** | .28** | -.14** | -.28** | -.36** | -.28** |
2. λ (log) | | | .97** | .87** | -.13** | -.20** | -.37** | .34** | .44** | .44** | .30** | -.17** | -.23** | -.36** | -.25** |
3. |β_{1}-1| | | | | .88** | -.13** | -.24** | -.38** | .37** | .42** | .45** | .31** | -.17** | -.25** | -.36** | -.26** |
4. |β_{2}-1| | | | | | -.07 | -.21** | -.37** | .35** | .37** | .39** | .28** | -.08 | -.24** | -.33** | -.26** |
5. WJ Fluency | | | | | | .58** | .47** | -.17** | -.14* | -.12* | -.10 | .81** | .45** | .44** | .43** |
6. WJ Calculation | | | | | | | .66** | -.31** | -.22** | -.24** | -.23** | .53** | .66** | .69** | .49** |
7. WJ AP | | | | | | | | -.43** | -.37** | -.36** | -.38** | .51** | .66** | .82** | .59** |
15-years-old | |||||||||||||||
8. PAE (log) | | | | | | | | | .63** | .64** | .71** | -.22** | -.29** | -.43** | -.28** |
9. λ (log) | | | | | | | | | | .95** | .79** | -.17** | -.20** | -.29** | -.17** |
10. |β_{1}-1| | | | | | | | | | | | .72** | -.15** | -.21** | -.30** | -.16** |
11. |β_{2}-1| | | | | | | | | | | | | -.14* | -.22** | -.32** | -.17** |
12. WJ Fluency | | | | | | | | | | | | | .53** | .54** | .48** |
13. WJ Calculation | | | | | | | | | | | | | | .76** | .48** |
14. WJ AP | | | | | | | | | | | | | | | .59** |
15. g composite | | | | | | | | | | | | | | | |
*
To account for the non-independence of observations (each participant was part of a twin dyad), linear mixed-effects models with a random intercept and the Satterthwaite correction were used (
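A random-intercept model of this kind can be written out as a sketch. The notation is introduced here for illustration and is an assumption, not the authors' own specification: $y_{ij}$ is the math achievement score of twin $i$ in pair $j$, $g_{ij}$ is general cognitive ability, $\mathrm{NLE}_{ij}$ is one of the four accuracy measures, and $u_j$ is the intercept shared by both members of pair $j$.

```latex
y_{ij} = \beta_0 + \beta_1\, g_{ij} + \beta_2\, \mathrm{NLE}_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
```

The shared term $u_j$ absorbs the similarity between co-twins, so the test of $\beta_2$ does not treat the two members of a pair as independent observations. The Satterthwaite correction approximates the degrees of freedom for that test, which is why they need not be whole numbers.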
The NLE measure is used to predict math achievement while controlling for g in
NLE | 12-years-old | | | | 15-years-old | | | |
---|---|---|---|---|---|---|---|---|
Fluency | ||||||||
PAE (log) | 270.74 | .29*** | 221.98 | .01 | 275.79 | .37*** | 213.40 | .03 |
λ (log) | 269.96 | .28*** | 215.78 | -.01 | 276.91 | .36*** | 224.01 | -.04 |
|β_{1}-1| | 269.88 | .28*** | 216.31 | -.03 | 276.89 | .36*** | 222.40 | -.01 |
|β_{2}-1| | 272.04 | .30*** | 226.02 | .04 | 276.90 | .36*** | 199.29 | -.03 |
Calculation | ||||||||
PAE (log) | 263.15 | .43*** | 263.62 | -.01 | 272.74 | .39*** | 233.80 | -.05 |
λ (log) | 264.21 | .43*** | 260.53 | -.02 | 272.11 | .39*** | 241.62 | -.04 |
|β_{1}-1| | 263.62 | .43*** | 261.39 | -.05 | 271.55 | .39*** | 241.67 | -.07 |
|β_{2}-1| | 260.78 | .44*** | 266.75 | .01 | 271.35 | .39*** | 213.22 | -.05 |
Applied problems | ||||||||
PAE (log) | 275.29 | .43*** | 244.86 | -.16*** | 263.24 | .41*** | 239.34 | -.20*** |
λ (log) | 275.71 | .43*** | 238.00 | -.17*** | 266.83 | .42*** | 245.14 | -.14*** |
|β_{1}-1| | 275.66 | .44*** | 238.67 | -.17*** | 266.74 | .43*** | 245.16 | -.16*** |
|β_{2}-1| | 275.28 | .43*** | 247.44 | -.17*** | 264.77 | .43*** | 215.83 | -.16*** |
***
The twin sample allows the correlations of monozygotic (MZ) twins, who share 100% of their DNA, to be compared with those of dizygotic (DZ) twins, who share on average 50% of their segregating DNA. This comparison yields estimates of the amount of variance accounted for by genetics, shared environment (factors that make siblings more similar to one another), and nonshared environment (factors that make siblings less similar to one another) plus error. OpenMx, a software package in R, was used to conduct the twin analyses and obtain estimates of heritability, shared environment, and nonshared environment/error for the accuracy values at ages 12 and 15 (
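The logic of comparing MZ and DZ correlations can be illustrated with Falconer's formulas, a simple approximation to the variance decomposition. This is a sketch only and not the maximum-likelihood procedure implemented in OpenMx, so it will not reproduce model-based estimates or confidence intervals. The function name and the example correlations are hypothetical.

```python
def falconer_ace(r_mz, r_dz):
    """Approximate variance components from twin correlations.

    a2: additive genetic variance (heritability)
    c2: shared environment
    e2: nonshared environment plus measurement error
    """
    a2 = 2.0 * (r_mz - r_dz)   # MZ pairs share twice the genetic similarity of DZ pairs
    c2 = 2.0 * r_dz - r_mz     # similarity not explained by genetics
    e2 = 1.0 - r_mz            # whatever makes MZ twins differ
    return a2, c2, e2


# Made-up correlations, chosen only to show the arithmetic.
a2, c2, e2 = falconer_ace(r_mz=0.60, r_dz=0.40)
print(round(a2, 2), round(c2, 2), round(e2, 2))
```

When the MZ correlation is more than twice the DZ correlation, the formula for c2 gives a negative value. Model-fitting approaches instead bound the estimates at zero, which is one reason the two methods can disagree.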
NLE accuracy | 12-year-old | | | | | 15-year-old | | | | |
---|---|---|---|---|---|---|---|---|---|---|
PAE (log) | .46 | .13 | .40 [.12, .52] | 0 | .60 [.48, .71] | .27 | .20 | .06 [0, .38] | .19 [0, .34] | .75 [.61, .87] |
λ (log) | .52 | .09 | .45 [.25, .56] | 0 | .55 [.44, .68] | .28 | .28 | .00 [0, .36] | .28 [0, .37] | .72 [.62, .83] |
|β_{1}-1| | .51 | .10 | .45 [.30, .57] | 0 | .55 [.43, .55] | .27 | .25 | .05 [0, .41] | .22 [0, .36] | .73 [.58, .85] |
|β_{2}-1| | .55 | .10 | .49 [.32, .60] | 0 | .51 [.40, .64] | .05 | .02 | .04 [0, .18] | .002 [0, .15] | .96 [.82, 1.0] |
The debate about how to appropriately characterize performance on the NLE task has left an open question about how the theoretical stances reflected in the measurements differentially translate to prediction of math achievement. Does a method that describes the average error, such as PAE, predict math achievement better than a method designed to capture the logarithmic-to-linear shift, such as the mixed log-linear model, or a method designed to capture strategy use, such as the cyclical power model (one-cycle or two-cycle)? The results of this study support three conclusions: (1) PAE, the mixed log-linear model, the one-cycle cyclical power model, and the two-cycle cyclical power model are highly correlated with one another; (2) each accuracy measure provides more predictive value for the Applied Problems subtest than for the other math achievement measures when g is included as a predictor; and (3) the behavior genetic estimates do not differ notably among the accuracy measures.
First, the high correlations among the accuracy measures, especially in the 12-year-old sample, are notable given theoretical differences between the measures. Although the one-cycle cyclical power model and two-cycle cyclical power model account for performance based on strategy use, and the PAE and logarithmic-to-linear shift do not, the measures were still highly correlated. The highest correlation between the mixed log-linear model and one-cycle cyclical power model may be due to a similarity in the predicted shape of responding, in which both models capture overestimation of spaces on the lowest end of the number line. Such high correlations, especially between the mixed log-linear model and one-cycle cyclical power model, indicate that the different measures are mostly capturing the same variation.
Although all measures of the NLE task were significantly correlated both cross-sectionally and longitudinally with all three math achievement measures, the relationships between the NLE task and two measures of math achievement were no longer significant once general intelligence was included as a predictor. We have two possible explanations for why performance on the NLE task predicts performance on the Applied Problems subtest once g is controlled but does not predict performance on the Fluency or Calculation subtests.
First, it is possible that internal magnitude representation only assists in performance on tests that are developmentally challenging for the participant. For example, in children in kindergarten through second grade, NLE task performance was predictive of accuracy on simple addition and subtraction problems (
Alternatively, the significant prediction of performance on the NLE task for Applied Problems may be due to shared characteristics of the tasks such as proportional reasoning requirements and strategy use. The Applied Problems subtest requires the participants to perform proportional tasks such as identifying what a third of a quantity would be in a story problem. In addition, story problems in the Applied Problems subtest require the participants to choose relevant information before performing operations; this is a more complex task that requires some strategy for higher performance (
The phenotypic analyses also gave insight into the stability of task performance from age 12 to age 15. Across the 3-year interval, performance was relatively stable, with cross-age correlations of .37 for PAE, .44 for λ, .45 for β_{1}, and .28 for β_{2}. In a previous study of 5-year-olds, the correlation between performance (measured by PAE) on an NLE task across 30 weeks of measurement was .41 for 1-100 endpoints, .46 for 1-10 endpoints, and nonsignificant for 1-20 endpoints (
As in the phenotypic analyses, the behavior genetic analyses also did not show any differentiation between the measures despite theoretical differences. Genetics were hypothesized to be influential in all measures given the amount of variation that is typically predicted in cognitive variables (
We also hypothesized that a significant proportion of the variation in performance on the NLE task would be due to shared environment because of the environmental influences demonstrated by previous studies (
Overall, there do not appear to be fundamental differences between the accuracy measures on the NLE task in samples of 12- and 15-year-olds. All measures are highly correlated and approximately equally predictive of math achievement. The choice of accuracy measure for a given study of adolescents can thus be made on pragmatic and theoretical grounds. PAE is beneficial in that it is calculated without fitting data to a model, so a result can be obtained even when subjects' responses are extreme. However, if individual differences in progress towards linearity are of interest, then the mixed log-linear model still seems to be the most appropriate measure, although the fit may be poor for very low performers. In addition, the similarity of the one-cycle cyclical power model to the mixed log-linear model has been established. The high correlation between these measures indicates that both capture a similar pattern of responding, but the cyclical power model may lose individual differences for an even larger subset of the lowest performers due to model fit concerns. This study provided evidence for relevant individual differences in magnitude representation for adolescents, a group whose magnitude representation has not been widely studied. The similarity of the measures despite differences in theoretical underpinnings has also been shown using both regression and behavior genetic analyses.
The Western Reserve Reading and Math Project was supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development grants HD038075, HD059215, HD068728, and HD075460 and by the National Center for Advancing Translational Sciences grant 8UL1TR000090-05. S. Lukowski was supported by the National Science Foundation Graduate Research Fellowship Program under grant no. DGE-1343012.
The authors have declared that no competing interests exist.
The authors have no support to report.