Does nonverbal, approximate number acuity predict mathematics performance? Some studies report a correlation between acuity of representations in the Approximate Number System (ANS) and early math achievement, while others do not. Few previous reports have addressed (1) whether reported correlations remain when other domain-general capacities are considered, and (2) whether such correlations are causal. In the present study, we addressed both questions using a large (N = 204) 3-year longitudinal dataset from a successful math intervention, which included a wide array of non-numerical cognitive tasks. While we replicated past work finding correlations between approximate number acuity and math success, these correlations were very small when other domain-general capacities were considered. Also, we found no evidence that changes to math performance induced changes to approximate number acuity, militating against one class of causal accounts.

Beginning early in infancy, humans can represent approximate numerical quantities nonverbally, using what is sometimes called the “Approximate Number System” (ANS) or the “number sense” (

In fact, more than a dozen studies have reported some correlation between the ANS and symbolic math, and these correlations often survive the addition of non-numerical control predictors, like verbal SAT score, IQ, and spelling ability (e.g.,

Many other studies, however, have found the relation between ANS acuity and symbolic math ability to be negligibly small or even absent, especially when controlling for other non-numerical cognitive skills like inhibitory control, symbolic number knowledge, knowledge of numerical cardinality, and non-numerical quantity comparison (e.g.,

One way to adjudicate between these discrepant findings is via meta-analysis. For example, one recent meta-analysis demonstrated that – across a wide range of study methodologies and (36 independent) samples – ANS acuity explained substantial variability in symbolic math achievement (

In the present study, we tested whether the ANS is meaningfully causally linked to mathematics achievement. We did this first, by assessing its longitudinal predictive power relative to a large battery of other cognitive measures, and second, by assessing whether changes in math performance ^{nd} through 5^{th} graders (

In addition to the measures described above,

To summarize, we re-analyzed data from a longitudinal math intervention to test the uniqueness and causal status of correlations between the ANS and math achievement, while simultaneously probing dot-array estimation, a measure of associations between numerals and approximate magnitudes. In doing so, we provide the first large-scale longitudinal study to assess the causal link between the ANS and math achievement while simultaneously controlling for an exhaustive battery of domain-general cognitive capacities.

Data were obtained from a previous study by

Children received several measures of math competence, including the Woodcock-Johnson III Computation test, the Math Fluency subtest of the Wechsler Individual Achievement Test (WIAT-III), and in-house tests of arithmetic and place value understanding. Also, children’s math grades were available (as a score between 0-100), as reported by their school. Detailed descriptions of these measures are available in the Supplementary Materials of

As reported in

To test whether differences in math training are related to changes in ANS and estimation ability, we compared the ANS and estimation abilities of children who completed the mental abacus intervention to the control group.

Children’s ANS acuity was assessed using a 10-minute timed computerized task. As is typical for tasks assessing ANS acuity, two arrays of black dots were presented simultaneously on a gray background; the two arrays were separated by a vertical black line. Half of the trials controlled for total surface area across the arrays; the other half of trials controlled for item size (

Arrays were visible for 1000 ms and were followed by a 300 ms white noise mask image. Children were instructed to indicate which array was more numerous by pressing the Z (which was covered with a left arrow) or M (which was covered by a right arrow) key. The experiment was self-paced, and children pressed the space bar to progress to the next trial. To ensure that children attended to each trial, two beeps were presented via headphones immediately prior to the presentation of the arrays.

Trials were presented in blocks of 8. Within each block, the ratio of items in the two sets remained constant; all children started with a 4:5 ratio. Within each block, the numerical magnitudes of the arrays varied substantially (e.g., 16 vs. 20; 80 vs. 100). In order to succeed on a given block, the child needed to get 6 out of 8 trials correct. Side of the correct response was pseudo-randomly ordered so that alternating responses or consistent choices of “left” or “right” would lead to failure of the block. If participants succeeded on a block, they moved to the next hardest ratio (e.g., 5:6), while if they failed, they moved to the next easiest ratio (e.g., 3:4). Ratios ranged from 1:2 to 15:16.
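The block-level staircase described above can be sketched as follows. This is an illustrative reconstruction, not the task code used in the study: the full ladder of n:(n+1) ratios is an assumption (the text gives only the range and a few examples), as is reading "6 out of 8" as "at least 6", and the names `RATIOS`, `next_level`, and `run_staircase` are hypothetical.

```python
# Ratios ordered from easiest to hardest. The text states the range
# (1:2 to 15:16) and mentions 3:4, 4:5, and 5:6; the complete n:(n+1)
# ladder used here is an assumption.
RATIOS = [(n, n + 1) for n in range(1, 16)]
START = RATIOS.index((4, 5))   # all children started at 4:5
BLOCK_SIZE = 8                 # trials per block
PASS_THRESHOLD = 6             # assumed to mean "at least 6 of 8 correct"


def next_level(level: int, n_correct: int) -> int:
    """Return the index into RATIOS for the next block."""
    if n_correct >= PASS_THRESHOLD:
        return min(level + 1, len(RATIOS) - 1)  # harder, capped at 15:16
    return max(level - 1, 0)                    # easier, floored at 1:2


def run_staircase(block_scores):
    """Trace the ratios visited, given the number correct in each block."""
    level = START
    visited = [RATIOS[level]]
    for n_correct in block_scores:
        level = next_level(level, n_correct)
        visited.append(RATIOS[level])
    return visited
```

For instance, `run_staircase([8, 5, 6])` moves from 4:5 up to 5:6, back down to 4:5 after the failed block, and up to 5:6 again.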

Children’s estimation ability was tested by asking them to estimate the number of dots on a screen; task duration was 10 minutes. Arrays were randomly generated and contained black dots on a gray background. The number of dots ranged from 3-120, and dot size and total area of the array varied across trials. Children viewed each array for 400 ms, and then entered their numerical estimate on a keypad (errors could be corrected by using the backspace key). Prior to beginning the task, children completed a keypad typing training session to ensure that all children could appropriately use the keypad.

Children were tested on a battery of control tasks, again described in detail in the Supplemental Materials of

In the verbal working memory task, children heard a sequence of target syllables (e.g., “GU, TI”). They then had to decide whether a second sequence of syllables (e.g., “RA, TI”) was the same as or different from the target. The test sequence was either identical to the target or differed from it by one syllable. All stimuli were recorded by a native speaker of Gujarati, using syllables that are legal in English, Hindi, and Gujarati.

In the spatial working memory task, participants saw a 5x5 grid of circles. They then saw a target sequence of circles turn yellow (only one circle was filled at a time). They were then presented with a test sequence of blue circles, and had to decide whether the blue circles appeared in the same locations as the yellow circles. Again, the sequence of blue circles was either identical to the sequence of yellow circles, or contained a single circle that appeared in a different position.

Both working memory tasks were adaptive – children were asked to remember

Children also completed a paper-and-pencil task that measured mental rotation ability. In this task, participants were asked to match one of two target items (either letters or shapes, based on the

Before presenting our main analyses, we first describe how data were used to construct measures of ANS acuity, estimation ability, and mathematics achievement.

Each child’s ANS acuity was measured for each year (Year 0, 1, 2, and 3) as a Weber fraction (using the method described by _{acuity} Y0 = .37; Y1 = .18; Y2 = .15; Y3 = .14). Prior to analysis, we excluded children whose Weber fractions were > .8, as a value this large likely reflects a misunderstanding of the task. We selected the Weber fraction as our DV because it is the standard measure of ANS acuity in the field. However, because measures of ANS acuity differ from one another (
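To make the notion of a Weber fraction concrete, the sketch below fits one with a Gaussian model that is widely used in the ANS literature, in which the probability of a correct response is Φ(|n1 − n2| / (w·√(n1² + n2²))). The citation for the method actually used here did not survive in this text, so this should be read as an assumed stand-in rather than the authors' procedure; the function names and the grid-search fit are hypothetical choices.

```python
import math


def p_correct(n1, n2, w):
    """Probability of picking the larger array under a Gaussian ANS model
    with Weber fraction w (an assumed model, see the note above)."""
    z = abs(n1 - n2) / (w * math.sqrt(n1 ** 2 + n2 ** 2))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fit_weber_fraction(trials, grid=None):
    """Grid-search the w that maximizes the likelihood of the responses.

    trials: iterable of (n1, n2, correct), with correct equal to 0 or 1.
    """
    if grid is None:
        grid = [i / 1000 for i in range(10, 1501)]  # w from 0.010 to 1.500
    best_w, best_ll = None, -math.inf
    for w in grid:
        ll = 0.0
        for n1, n2, correct in trials:
            # Clamp to avoid log(0) when the model predicts certainty.
            p = min(max(p_correct(n1, n2, w), 1e-9), 1.0 - 1e-9)
            ll += math.log(p) if correct else math.log(1.0 - p)
        if ll > best_ll:
            best_w, best_ll = w, ll
    return best_w
```

Under this model, smaller values of w correspond to sharper acuity, which is consistent with the exclusion rule above treating very large values (> .8) as a sign that the task was misunderstood.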

Using previously unreported data from Years 1, 2, and 3 (Year 0 estimation data were not collected), we also constructed several measures of estimation ability. First, we tested the internal consistency of children’s estimates. To assess consistency, we used two measures: ordinality and linear r^{2}, described below.

Ordinality captures the extent to which a child’s estimates are ordered consistently. Specifically, we defined ordinality by calculating the proportion of trials on which the child gave estimates in the correct direction relative to previous estimates. For example, if a smaller number of dots was shown on trial
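One way to compute an ordinality score of this kind is sketched below. The exact comparison rule is not fully recoverable from the text, so two assumptions are made and flagged in the comments: each trial is compared only with the immediately preceding trial, and pairs showing the same number of dots are skipped.

```python
def ordinality(dots, estimates):
    """Proportion of consecutive trial pairs whose estimates move in the
    same direction as the number of dots shown.

    Assumptions (not stated in the source): only the immediately preceding
    trial is used for comparison, and pairs with equal dot counts are skipped.
    """
    def sign(x):
        return (x > 0) - (x < 0)

    consistent = 0
    total = 0
    for t in range(1, len(dots)):
        direction = sign(dots[t] - dots[t - 1])
        if direction == 0:
            continue  # no correct direction is defined for equal dot counts
        total += 1
        if sign(estimates[t] - estimates[t - 1]) == direction:
            consistent += 1
    return consistent / total if total else float("nan")
```

With dot counts `[10, 20, 15, 40]` and estimates `[12, 30, 35, 50]`, two of the three transitions are in the correct direction, giving a score of 2/3.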

The Linear r^{2} measure of internal consistency represents the amount of variability in estimation performance that can be accounted for by knowing the number of dots a child was estimating. In other words, this value represents the extent to which the relation between a child’s estimate and the number of dots that they saw can be described by a linear function (in previous work, this has been referred to as the “linearity” of children’s estimates; e.g., ^{2}, we constructed a linear regression predicting each child’s estimates from the number of dots presented, and then reported the linear r^{2} of the line (Y1 = .37; Y2 = .35; Y3 = .36; again, these values did not appear to increase over time). Importantly, a high Linear r^{2} score does not necessarily indicate that a child provided accurate estimates, but rather that the child’s estimates were internally consistent (for example, one could imagine a child who overestimated small numbers, underestimated large numbers, and yet still provided estimates that were perfectly linear). Unlike ordinality, which only captures the internal consistency of the ^{2} value, children must also be internally consistent in the ^{2}
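The per-child regression described above can be sketched with an ordinary least-squares fit. The function name `linear_r2` and the example values are hypothetical; the example is built to mirror the hypothetical child in the text who overestimates small numbers and underestimates large ones while remaining perfectly linear.

```python
def linear_r2(dots, estimates):
    """Fit estimates = intercept + slope * dots by least squares and return
    (slope, intercept, r_squared) for one child's trials."""
    n = len(dots)
    mean_x = sum(dots) / n
    mean_y = sum(estimates) / n
    sxx = sum((x - mean_x) ** 2 for x in dots)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dots, estimates))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(dots, estimates))
    ss_tot = sum((y - mean_y) ** 2 for y in estimates)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical child: estimates follow 20 + 0.5 * dots exactly, so small
# arrays are overestimated and large arrays underestimated, yet r^2 is 1.
slope, intercept, r2 = linear_r2([10, 20, 40, 80], [25, 30, 40, 60])
```

Here the fitted line has a slope of 0.5 and an intercept of 20, so the estimates are inaccurate for most arrays even though the fit is perfect.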

In addition to these two measures of internal consistency of estimates, we also calculated the accuracy of estimates via the Proportion Absolute Error (PAE), which represents the absolute value of the deviation of an estimate from the actual number presented, divided by the number presented (_{PAE}: Y1 = .71; Y2 = .71; Y3 = .70; again, no group-wise change over time). PAE has been shown previously to predict math performance on standardized tests (
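As a minimal sketch of the PAE definition above, the per-trial error is |estimate − presented| / presented. Averaging across a child's trials to obtain a single score is an assumption here, and the function names are hypothetical.

```python
def proportion_absolute_error(presented, estimate):
    """PAE for a single trial: |estimate - presented| / presented."""
    return abs(estimate - presented) / presented


def mean_pae(dots, estimates):
    """Average PAE over a child's trials (averaging is an assumption;
    the text defines only the per-trial quantity)."""
    errors = [proportion_absolute_error(n, e) for n, e in zip(dots, estimates)]
    return sum(errors) / len(errors)
```

For example, estimating 25 when 10 dots were shown gives a PAE of 1.5, while estimating 60 for 80 dots gives 0.25; smaller values indicate more accurate estimates.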

To provide an approximate measure of the reliability of each of our estimation and ANS measures, we predicted each year’s data from the previous year’s data; we report these Pearson correlation coefficients and significance levels in

Year | Ordinality | PAE | Linear r^{2} | ANS |
---|---|---|---|---|
Year 0-1 | n/a | n/a | n/a | 0.232** |
Year 1-2 | 0.213** | 0.278** | 0.191** | 0.191* |
Year 2-3 | 0.333*** | 0.542*** | 0.379*** | 0.444*** |

*
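The year-to-year correlations in the table above are Pearson coefficients between consecutive years of the same measure. A minimal sketch of that computation follows; restricting the correlation to children with scores in both years is an assumption, the child identifiers are hypothetical, and the significance tests are not reproduced.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def year_to_year_r(scores_prev, scores_next):
    """Correlate one measure across two consecutive years.

    scores_prev, scores_next: dicts mapping a (hypothetical) child ID to that
    year's score. Only children present in both years are used, which is an
    assumption about how missing data were handled.
    """
    shared = sorted(set(scores_prev) & set(scores_next))
    return pearson_r([scores_prev[c] for c in shared],
                     [scores_next[c] for c in shared])
```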

We did not have a set of specific, a priori, theoretically-motivated predictions about the differences between our particular measures of math competence (WIAT, WJ-III, arithmetic, place value, and math grades), and so we constructed two different math measures, both of which combined data from multiple math tests. Our goal in creating these two composite measures was to reduce the dimensionality of our analyses and avoid the issue of attempting to analyze five different but highly correlated measures of symbolic math.

We created a composite of the standardized math test scores (WIAT math fluency and WJ-III Computation subtest) by calculating the proportion correct for each test, and averaging scores on the two measures. This composite showed improved mathematics performance across each year of testing (_{standardized}: Y0 = .20; Y1 = .31; Y2 = .43; Y3 = .54). Because standardized math testing is commonly used both in psychology and education, this measure captured the type of math competence that is likely to be measured in a classroom or lab setting.

We also created a single composite measure that took into account

Our fitted model showed an effect of Year, suggesting that children’s ANS acuity improved over time (

ANS acuity (Years 0, 1, 2, and 3) and estimation performance (Years 1, 2, and 3 – we did not test estimation in Year 0). Red line indicates children who learned abacus; black line indicates children who were in the control group. For Proportion Absolute Error (PAE) and ANS measures, smaller numbers indicate better performance. For our Linear r^{2} and Ordinality measures, larger numbers indicate better performance. Error bars are standard errors.

We next tested whether our intervention influenced estimation performance. Because we did not have Year 0 (baseline) data for estimation, we could not assess with certainty whether abacus training caused changes to estimation. However, we were able to test whether there were differences in estimation performance between the abacus and control group during Years 1, 2, and 3.

No estimation measure showed consistent (e.g., across more than one year) differences between the abacus and control groups. Also, when correcting for multiple comparisons, no ^{2}, there was no effect in Years 1 or 2 (Year 1:

While our results thus far are not consistent with the view that math training causes improvements to the ANS, they leave open the opposite possibility that ANS acuity might still be related to mathematics ability. To assess this, our next set of analyses tested whether ANS acuity and estimation performance were related

To test whether ANS acuity was related to math performance across intervention groups, we constructed regression models predicting standardized math scores from ANS acuity. For simplicity, we fit these models for each year separately. Following the logic of previous studies in this literature, these models test whether ANS acuity predicts concurrent math achievement. For these and all subsequently reported models, we scaled all predictors in order to compare the relative predictive value of each parameter in the models directly (since all betas and standard errors are in standard units).
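For a single-predictor model of this kind, scaling yields a slope expressed in standard-deviation units. The sketch below z-scores both variables and returns the slope and its standard error; z-scoring the outcome as well as the predictor is an assumption (the text says only that predictors were scaled), and the function names are hypothetical.

```python
import math


def zscore(values):
    """Center to mean 0 and scale to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def standardized_slope(predictor, outcome):
    """Return (beta, standard_error) for the regression of z-scored outcome
    on z-scored predictor. Scaling the outcome is an assumption."""
    zx, zy = zscore(predictor), zscore(outcome)
    n = len(zx)
    sxx = sum(a * a for a in zx)                  # equals n - 1 after scaling
    beta = sum(a * b for a, b in zip(zx, zy)) / sxx
    # The intercept is zero because both variables are centered.
    ss_res = sum((b - beta * a) ** 2 for a, b in zip(zx, zy))
    se = math.sqrt(ss_res / (n - 2)) / math.sqrt(sxx)
    return beta, se
```

With one predictor and both variables scaled, the slope equals the Pearson correlation between them, which is why coefficients from models fit this way can be compared directly across predictors.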

Replicating previous research, we found that ANS acuity was a concurrent predictor of standardized math scores for Years 0, 1, and 3 (see ^{i} Our alternative measure of ANS acuity (hardest ratio reached) showed that ANS was a concurrent predictor of PC1 in all years. Thus, while we found inconsistent evidence, the majority of our correlations revealed a concurrent predictive relation between ANS and math, replicating previous results.

Predictor | Standardized Tests: β | SE | p | PC1: β | SE | p |
---|---|---|---|---|---|---|
Year 0 | | | | | | |
ANS | -0.286 | 0.076 | 0.0002 | -0.296 | 0.079 | 0.0002 |
Year 1 | | | | | | |
ANS | -0.121 | 0.073 | 0.01 | -0.129 | 0.073 | 0.08 |
PAE | -0.074 | 0.074 | 0.32 | -0.168 | 0.073 | 0.02 |
Linear r^{2} | 0.103 | 0.074 | 0.17 | 0.158 | 0.074 | 0.03 |
Ordinality | 0.0008 | 0.074 | 0.99 | 0.105 | 0.074 | 0.16 |
Year 2 | | | | | | |
ANS | -0.105 | 0.074 | 0.16 | -0.147 | 0.074 | 0.048 |
PAE | -0.145 | 0.073 | 0.049 | -0.248 | 0.071 | 0.011 |
Linear r^{2} | 0.191 | 0.073 | 0.001 | 0.249 | 0.073 | 0.0008 |
Ordinality | 0.208 | 0.073 | 0.005 | 0.191 | 0.075 | 0.011 |
Year 3 | | | | | | |
ANS | -0.226 | 0.072 | 0.002 | -0.262 | 0.071 | 0.0003 |
PAE | -0.228 | 0.072 | 0.002 | -0.248 | 0.071 | 0.0007 |
Linear r^{2} | 0.277 | 0.071 | 0.0001 | 0.338 | 0.07 | <.0001 |
Ordinality | 0.228 | 0.072 | 0.002 | 0.274 | 0.072 | 0.0002 |

We next asked whether estimation performance concurrently predicted math success. Each of our estimation measures concurrently predicted standardized math scores in Years 2 and 3, but not in Year 1 (see ^{2} concurrently predicted our math PC1 every year. Thus, as in previous work, we find that estimation performance predicts concurrent math achievement. More interesting, however, is whether this relationship survives the addition of a large battery of domain general measures.

Having replicated past work showing that ANS and estimation ability are predictive of concurrent math achievement, we next asked whether such predictive relations were uniquely numerical, or whether they could be explained by domain-general cognitive abilities. To do this, we predicted math outcomes (our standardized math score composite and PC1) from numerical predictors (ANS and estimation performance) and from our battery of non-numerical measures (mental rotation, spatial WM, verbal WM, Raven’s, age, and Intervention Condition; additional information about each of these measures is available here:

Standardized Beta weights (bars are standard error) when predicting standardized math scores from each of our predictors (ANS, PAE, Linear r^{2}, and Ordinality), controlling for other non-numerical tasks. Each cell represents the results of a single model output, such that the results of 13 models are depicted. Columns represent years of test (Y0-Y3).

Standardized Beta weights (bars are standard error) when predicting our math PC1 from each of our predictors (ANS, PAE, Linear r^{2}, and Ordinality), controlling for other non-numerical tasks.

While measures like Raven’s (Y1-Y3, all models |^{2} and PAE never significantly predicted standardized test scores (all ^{2} may have some predictive power), and Ordinality only predicted standardized test scores in Year 2 (

Having explored the relationship between standardized math scores and the ANS and estimation tasks, we next applied the same analyses to predict the composite math score (PC1; see ^{ii} Neither Ordinality nor PAE ever predicted PC1 when controlling for other factors (all ^{2} did significantly predict PC1 in both Year 2 (^{2}, our estimation measures typically failed to predict our math PC1 once other control variables were included in the model. Further, our ANS measure did not consistently predict math PC1 when controlling for other factors.

We tested whether nonverbal number (ANS) acuity and verbal estimation ability were uniquely predictive of symbolic math achievement. Specifically, we assessed (1) whether improvements to math performance caused changes to ANS acuity; (2) whether relations between math performance, ANS, and estimation were consistent over time; and (3) whether relations between math performance, ANS, and estimation persisted over time when controlling for performance on a battery of non-numerical control tasks. To test these questions, we conducted new analyses of data from a three-year-long randomized controlled math intervention,

We first asked whether changes to math performance

Next, we sought to replicate and then explain previous work which found that ANS and estimation ability are related to formal math skill. Consistent with some previous studies, we found that both ANS acuity and estimation performance served as concurrent predictors of math success. However, we also found that this predictive relation was attenuated substantially when other, non-numerical predictors were included in the model. In fact, non-numerical measures like Raven’s, Mental Rotation, and verbal working memory were very strong predictors of math outcome, whereas ANS acuity and estimation were not. With the exception of a small subset of our analyses (ANS acuity in Year 0 and Linear r^{2} in Years 2 and 3), we found little evidence that our ANS and estimation measures uniquely predicted math outcomes when controlling for other cognitive abilities. This finding supports the conclusion that the relation between ANS acuity and symbolic math performance is often weakest in the early elementary school years (

Why might correlations between estimation, ANS acuity, and mathematics achievement disappear when controlling for other cognitive capacities? One likely explanation is that tasks that measure ANS acuity and estimation also depend on capacities like spatial working memory, and domain general abilities like comparison, analogy, and perhaps even proportional reasoning; all of these skills have been implicated in mathematics or estimation performance (^{iii} Because the ANS and estimation tasks that have previously been shown to predict math skill also depend on non-numerical cognitive abilities, and because few past studies thoroughly measured these factors, these previous findings may be driven in part by confounding non-numerical factors.

Alternatively, some have argued that the ability to use ANS representations to perform approximate math computations (e.g., the ability to nonverbally “add” quantities) – rather than ANS acuity itself – is most predictive of math achievement (

Before concluding, we note that while our study serves as an important extension of existing work on the ANS, it remains possible that studies in other populations will yield different results. As in all psychological research, the characteristics of the participants in our sample may limit our ability to draw inferences that generalize to the broader human population (

To conclude, while we replicated past findings that ANS and estimation ability are concurrently predictive of math success, we failed to find evidence that changes to math skill caused changes to ANS or estimation performance. We also failed to find consistent evidence that ANS and estimation performance uniquely predicted math success. In fact, the strongest predictors of math performance were our non-numerical cognitive predictors, like Raven’s, verbal working memory, and mental rotation. These data suggest that, while approximate measures of numerical competence (e.g., ANS acuity and estimation) may be related to math success, this relationship is likely fragile, and is one among many that predict mathematics achievement. More informative predictors of math achievement include domain general capacities like working memory, mental rotation, and general intelligence.

Thanks to George Alvarez for stimulus design and funding support. This work was funded by NSF REESE grant #0910206 to D.B. and George Alvarez and by an NSF GRFP to J.S.

We thank the staff, families, and children at Zenith School for help in collecting data, with a special thank you to Abbasi Barodawala, Mary Joseph, and Snehal Karia. Thanks also to Neon Brooks, Sean Barner, Eleanor Chestnut, Jonathan Gill, Ali Horowitz, Talia Konkle, Ally Kraus, Molly Lewis, Bria Long, Ann Nordmeyer, Mahesh Srinivasan, Viola Störmer, Jordan Suchow, and Katharine Tillman for help with data collection.

Using our alternative measure of ANS acuity, we found that ANS was a concurrent predictor of standardized math scores for Years 0, 2, and 3, but not for Year 1, and that ANS was a concurrent predictor of PC1 across all years.

Our alternative measure of ANS acuity (hardest ratio tested) predicted PC1 in Years 0 and 1.

A deeper critique, based on the evidence that individuals’ scores on different ANS tasks often fail to correlate with one another, is that the non-numerical properties of ANS stimuli (including the controls used, whether trials get increasingly more difficult over time, and the visual properties of the stimuli;

The authors have declared that no competing interests exist.