Beginning early in infancy, humans can represent approximate numerical quantities nonverbally, using what is sometimes called the “Approximate Number System” (ANS) or the “number sense” (Dehaene, 1997; Xu & Spelke, 2000). The ANS is used to represent and compare numerical magnitudes, and does so according to Weber’s law, such that the ratio of any two numerical quantities determines the likelihood that they can be differentiated (for review, see Dehaene, 1997). A number of recent studies report that individual differences in ANS acuity are related to mathematics achievement, such that individuals with greater numerical acuity also perform better on standardized math tests, the SAT, and a host of other math measures (Chen & Li, 2014; Halberda, Mazzocco, & Feigenson, 2008; Halberda et al., 2012).
In fact, more than a dozen studies have reported some correlation between the ANS and symbolic math, and these correlations often survive the addition of nonnumerical control predictors, like verbal SAT score, IQ, and spelling ability (e.g., Anobile, Stievano, & Burr, 2013; Bonny & Lourenco, 2013; Desoete et al., 2012; DeWind & Brannon, 2012; Gilmore et al., 2010; Halberda, Mazzocco, & Feigenson, 2008; Halberda et al., 2012; Libertus et al., 2011; Libertus, Feigenson, & Halberda, 2013; Libertus, Odic, & Halberda, 2012; Mazzocco et al., 2011a, 2011b; Piazza et al., 2010; Starr et al., 2013). Further, Park and Brannon (2013, 2014) have provided evidence that training on nonverbal number tasks can lead to improvements to math performance in adults, raising the possibility that the ANS is foundational to mathematical learning, not merely an interesting correlate. These results are exciting for at least two reasons. First, they suggest a link between the evolutionarily ancient ANS and the more recent human innovation of symbolic arithmetic, thus potentially providing insight into the origins of mathematical thought. Second, they suggest that tests of ANS acuity may be helpful in designing diagnostic and intervention tools for early math difficulties (Park & Brannon, 2013, 2014; Starr, Libertus, & Brannon, 2013), perhaps even before children begin formal math training.
Many other studies, however, have found the relation between ANS acuity and symbolic math ability to be negligibly small or even absent, especially when controlling for other nonnumerical cognitive skills like inhibitory control, symbolic number knowledge, knowledge of numerical cardinality, and nonnumerical quantity comparison (e.g., Chu et al., 2015; Fuhs & McNeil, 2013; Gilmore et al., 2013; Göbel et al., 2014; Holloway & Ansari, 2009; Kolkman et al., 2013; Nosworthy et al., 2013; Price et al., 2012; Sasanguie et al., 2013; Sasanguie, Defever, Maertens, & Reynvoet, 2014; Sasanguie, De Smedt, Defever, & Reynvoet, 2012; Tibber et al., 2013; van Marle, Chu, Li, & Geary, 2014; Wei et al., 2012). Further, at least one study that trained children’s ANS acuity found no effect on math ability (Obersteiner, Reiss, & Ufer, 2013). These discrepant findings raise important questions about the nature (and malleability) of the ANS, and about the practical significance of any relationship between the ANS and symbolic math (for review, see De Smedt, Noël, Gilmore, & Ansari, 2013).
One way to adjudicate between these discrepant findings is via meta-analysis. For example, one recent meta-analysis demonstrated that – across a wide range of study methodologies and (36 independent) samples – ANS acuity explained substantial variability in symbolic math achievement (Chen & Li, 2014). However, half of the studies included in the meta-analysis did not control for participants’ nonnumerical cognitive capacities, and the vast majority of those that did controlled only for participants’ linguistic ability. This inconsistent inclusion of control tasks is a problem, because tasks typically used to measure ANS acuity – e.g., dot array comparison – could plausibly draw on other nonnumerical cognitive capacities, like working memory, nonnumerical quantity representation, and inhibitory control (e.g., Gilmore et al., 2013). Critically, each of these cognitive capacities has also been shown to predict early math achievement, making it possible that they mediate the relationship between ANS acuity and math achievement (Alloway & Passolunghi, 2011; Clark, Pritchard, & Woodward, 2010; DeStefano & LeFevre, 2004; Geary, 2011; Gilmore et al., 2013; Hornung, Schiltz, Brunner, & Martin, 2014; Lourenco, Bonny, Fernandez, & Rao, 2012; Passolunghi, Cargnelutti, & Pastore, 2014; Thompson, Nuerk, Moeller, & Cohen Kadosh, 2013). Thus, although the reported correlations between ANS acuity and math achievement may be due to a unique relationship between verbal and nonverbal numerical abilities, it is also possible that other, nonnumerical perceptual and cognitive capacities explain the reported correlations. Similar challenges confront the interpretation of studies that compare individuals with different levels of formal mathematics training (e.g., Castronovo & Göbel, 2012; Lindskog, Winman, & Juslin, 2014; Nys et al., 2013; Pica, Lemer, Izard, & Dehaene, 2004).
In the present study, we tested whether the ANS is meaningfully causally linked to mathematics achievement. We did this first by assessing its longitudinal predictive power relative to a large battery of other cognitive measures, and second by assessing whether changes in math performance caused changes to the ANS. Specifically, we analyzed publicly available data from a three-year longitudinal randomized controlled trial (RCT) of a math intervention in 2nd through 5th graders (N = 204). This RCT provided supplemental mathematics training in an experimental group using a popular mental arithmetic technique called “mental abacus” (Barner et al., 2016). Mental abacus training involves teaching participants to perform arithmetic calculations using an abacus, and, at more advanced levels, removing the physical abacus and asking users to visualize the abacus and calculations using this mental representation (Frank & Barner, 2012; Hatano, Miyake, & Binks, 1977; Stigler, Chalip, & Miller, 1986). In the RCT, abacus training improved children’s arithmetic abilities relative to a control group who had received additional training in a standard math curriculum, allowing us to test whether changes in arithmetic skill (induced by the intervention) led to changes in ANS acuity. Critically, the study also included a longitudinally administered measure of ANS acuity, as well as tests of spatial working memory, verbal working memory, mental rotation ability, and general intelligence (Raven’s Progressive Matrices). The availability of these measures allowed us to assess the predictive relation between ANS acuity and math achievement while concurrently controlling for a larger than usual set of nonnumerical cognitive capacities. To our knowledge, this is the first study to have experimentally manipulated math ability in order to test the effects of improvements to math skill on ANS acuity.
In addition to the measures described above, Barner et al. (2016) also collected (but did not analyze) longitudinal dot-array estimation data, by measuring children’s ability to label arrays of dots with number words. These data allowed us to test the relationship between estimation and math achievement. Given that ANS representations are known to be linked to number words in the verbal count list (e.g., Le Corre & Carey, 2007; Mundy & Gilmore, 2009; Whalen, Gallistel, & Gelman, 1999), it is possible that the strength and precision of this link mediates the relation between ANS acuity and symbolic math performance (Libertus, Odic, Feigenson, & Halberda, 2015). Since estimation ability captures the translational process between symbolic and nonsymbolic number representations (see Sullivan & Barner, 2014a, for discussion), we also tested whether children’s estimation ability was uniquely related to math performance (Booth & Siegler, 2006, 2008; Gunderson, Ramirez, Beilock, & Levine, 2012; Kolkman et al., 2013; Moore & Ashcraft, 2015; Siegler & Booth, 2004). To our knowledge, no previous study has assessed this link while simultaneously controlling for a large battery of domain-general cognitive capacities.
To summarize, we reanalyzed data from a longitudinal math intervention to test the uniqueness and causal status of correlations between the ANS and math achievement, while simultaneously probing dot-array estimation, a measure of associations between numerals and approximate magnitudes. In doing so, we provide the first large-scale longitudinal study to assess the causal link between the ANS and math achievement while simultaneously controlling for an exhaustive battery of domain-general cognitive capacities.
Method
Participants
Data were obtained from a previous study by Barner et al. (2016) at https://github.com/langcog/mentalabacus. Participants were 204 children from a charitable school in Gujarat, India. Children spoke English (the language of instruction at their school), and most children also spoke an additional language (Gujarati and Hindi were the most common). Children came from either Muslim (41%) or Hindu (59%) families. Family income was low, with 80% of children coming from families earning around $2,000 USD per year (median household income for 2006–2012 was just over $9,700 globally, and just over $3,000 in India; Phelps & Crabtree, 2013). Children were between 5 and 8 years of age (M = 6.65 years, SD = .53) at the time of enrollment in 2010 (Year 0).
Math Measures
Children received several measures of math competence, including the Woodcock-Johnson III Computation test, the Math Fluency subtest of the Wechsler Individual Achievement Test (WIAT-III), and in-house tests of arithmetic and place value understanding. Also, children’s math grades were available (as a score between 0 and 100), as reported by their school. Detailed descriptions of these measures are available in the Supplementary Materials of Barner et al. (2016).
Intervention
As reported in Barner et al. (2016), all children followed a standard math curriculum throughout the 3 years of the study. In addition to their standard math curriculum, students were also enrolled in one of two supplemental math programs: The control program used a standard international math curriculum, while the abacus program taught students mental abacus. Thus, all children received a shared core math curriculum, and one of two possible supplemental math curricula. The random assignment procedure operated as follows. At enrollment in Year 0, children were randomly assigned to one of three homerooms. Children randomly assigned to Homeroom 1 received supplemental instruction in a standard international math curriculum. Children assigned to Homeroom 2 received an equal number of hours of supplemental training using mental abacus. Half of the children in Homeroom 3 received supplemental abacus training, and the other half received the control supplemental standard math curriculum. Supplemental abacus instruction was conducted by a privately trained abacus instructor, and in the case of the split class was conducted in a space separate from control children. Of the 204 children in the study, the majority provided data for every year of testing (Years 0–3; Control group n = 88; Mental Abacus group n = 99).
To test whether differences in math training are related to changes in ANS and estimation ability, we compared the ANS and estimation abilities of children who completed the mental abacus intervention to the control group.
ANS and Estimation Tasks
Children’s ANS acuity was assessed using a 10-minute timed computerized task. As is typical for tasks assessing ANS acuity, two arrays of black dots were presented simultaneously on a gray background; the two arrays were separated by a vertical black line. Half of the trials controlled for total surface area across the arrays; the other half of trials controlled for item size (Dehaene et al., 2005). The correct answer was on the left 50% of the time.
Arrays were visible for 1000 ms and were followed by a 300 ms white noise mask image. Children were instructed to indicate which array was more numerous by pressing the Z (which was covered with a left arrow) or M (which was covered by a right arrow) key. The experiment was self-paced, and children pressed the space bar to progress to the next trial. To ensure that children attended to each trial, two beeps were presented via headphones immediately prior to the presentation of the arrays.
Trials were presented in blocks of 8. Within each block, the ratio of items in the two sets remained constant; all children started with a 4:5 ratio. Within each block, the numerical magnitudes of the arrays varied substantially (e.g., 16 vs. 20; 80 vs. 100). In order to succeed on a given block, the child needed to get 6 out of 8 trials correct. Side of the correct response was pseudorandomly ordered so that alternating responses or consistent choices of “left” or “right” would lead to failure of the block. If participants succeeded on a block, they moved to the next hardest ratio (e.g., 5:6), while if they failed, they moved to the next easiest ratio (e.g., 3:4). Ratios ranged from 1:2 to 15:16.
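The block-wise staircase described above can be sketched as follows. This is a minimal illustration rather than the original experiment code: the ladder of intermediate ratios (every k:k+1 step from 1:2 to 15:16), the treatment of "6 out of 8" as "at least 6", and the clamping at the easiest and hardest ratios are assumptions, and the function names are hypothetical.

```python
# Assumed ratio ladder from easiest (1:2) to hardest (15:16).
RATIOS = [(k, k + 1) for k in range(1, 16)]

def next_level(level, n_correct, pass_mark=6):
    """Return the ladder index for the next block of 8 trials."""
    if n_correct >= pass_mark:
        # Block passed: move one step harder (assumed to stop at 15:16).
        return min(level + 1, len(RATIOS) - 1)
    # Block failed: move one step easier (assumed to stop at 1:2).
    return max(level - 1, 0)

def run_blocks(block_scores, start=(4, 5)):
    """Trace the ratios a child would see given the number correct per block."""
    level = RATIOS.index(start)
    visited = [RATIOS[level]]
    for score in block_scores:
        level = next_level(level, score)
        visited.append(RATIOS[level])
    return visited
```

For instance, a child who passes the first block, fails the second, and passes the third would move from 4:5 to 5:6, back to 4:5, and then to 5:6 again.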
Children’s estimation ability was tested by asking them to estimate the number of dots on a screen; task duration was 10 minutes. Arrays were randomly generated and contained black dots on a gray background. The number of dots ranged from 3–120, and dot size and total area of the array varied across trials. Children viewed each array for 400 ms, and then entered their numerical estimate on a keypad (errors could be corrected by using the backspace key). Prior to beginning the task, children completed a keypad typing training session to ensure that all children could appropriately use the keypad.
Control Tasks
Children were tested on a battery of control tasks, again described in detail in the Supplemental Materials of Barner et al. (2016). This battery included two computerized measures of working memory: (1) a test of verbal working memory; and (2) a test of spatial working memory. For both tasks, the participant was first presented with a target sequence of events, and then had to decide whether a second sequence of events was the same as or different from the target (see below). For both tasks, children completed 10 practice trials prior to the start of the experiment, and 24 trials as part of the experiment. Children provided their responses by pressing either the ‘s’ key (relabeled as “S” for same) or the ‘l’ key (relabeled as “D” for different) on a keyboard.
In the verbal working memory task, children heard a sequence of target syllables (e.g., “GU”, “TI”). They then had to decide whether a second sequence of syllables (e.g., “RA”, “TI”) was the same as or different from the target. The test sequence was either identical to the target, or differed by one syllable. All stimuli were recorded by a native speaker of Gujarati, using syllables that are legal in English, Hindi, and Gujarati.
In the spatial working memory task, participants saw a 5x5 grid of circles. They then saw a target sequence of circles turn yellow (only one circle was filled at a time). They were then presented with a test sequence of blue circles, and had to decide whether the blue circles appeared in the same locations as the yellow circles. Again, the sequence of blue circles was either identical to the sequence of yellow circles, or contained a single circle that appeared in a different position.
Both working memory tasks were adaptive – children were asked to remember n items during the target sequence, and if they responded correctly, they were asked on the next trial to remember n+1 items (if they responded incorrectly, they were next asked to remember n-1 items). Thus, if a child successfully remembered two target syllables (e.g., “GU”, “TI”), on the next trial they were asked to remember three target syllables (e.g., “RA”, “MI”, “TU”). For both working memory tasks, we calculated participants’ memory score by averaging the level of difficulty for all 24 trials.
Children also completed a paper-and-pencil task that measured mental rotation ability. In this task, participants were asked to match one of two target items (either letters or shapes, based on the Shepard & Metzler, 1971 stimuli) to a sample. One target item was the mirror image of the sample while the other was an exact match; both target items were also rotated. Thus, in order to determine which target matched the sample, the child needed to mentally rotate the item. Finally, children completed Raven’s Progressive Matrices (A and/or B).
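The adaptive rule and the resulting memory score can be illustrated with a short sketch. The starting level of two items and the floor of one item are assumptions made for illustration, and the function names are hypothetical.

```python
def span_levels(responses, start_level=2, min_level=1):
    """Return the difficulty level (sequence length) presented on each trial.

    responses: list of booleans, True if the child answered that trial correctly.
    """
    levels = []
    level = start_level
    for correct in responses:
        levels.append(level)
        # One item more after a correct response, one fewer after an error.
        level = level + 1 if correct else max(level - 1, min_level)
    return levels

def memory_score(responses, start_level=2, min_level=1):
    """Average difficulty level across all trials (24 in the actual task)."""
    levels = span_levels(responses, start_level, min_level)
    return sum(levels) / len(levels)
```

With responses of correct, correct, incorrect, correct, the levels presented would be 2, 3, 4, 3, giving a score of 3.0.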
Results
Before presenting our main analyses, we first describe how data were used to construct measures of ANS acuity, estimation ability, and mathematics achievement.
Measures & Descriptive Statistics
ANS and Estimation Measures
Each child’s ANS acuity was measured for each year (Year 0, 1, 2, and 3) as a Weber fraction (using the method described by Halberda, Mazzocco, & Feigenson, 2008), where greater ANS acuity is indicated by a smaller Weber fraction (M_{acuity} Y0 = .37; Y1 = .18; Y2 = .15; Y3 = .14). Prior to analysis, we excluded children whose Weber fractions were > .8, as a value this large likely reflects a misunderstanding of the task. We selected the Weber fraction as our DV because it is the standard measure of ANS acuity in the field. However, because measures of ANS acuity differ from one another (Inglis & Gilmore, 2014), we also repeated all analyses using an alternative, less conservative measure of acuity, defined as the ratio of dots on the most difficult trial successfully completed by a child. These additional analyses are available here: https://github.com/langcog/jesstimation, and are only reported in the present paper when they differed from our main analyses.
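Weber fractions of this kind are commonly estimated with a psychophysical model in which each numerosity is represented as a Gaussian whose standard deviation scales with its mean. The sketch below shows that model together with a deliberately simple grid-search fit. The function names, the grid of candidate values, and the least-squares criterion are illustrative assumptions, not the fitting procedure used to produce the values reported here.

```python
import math

def predicted_accuracy(n1, n2, w):
    """Expected proportion correct when comparing n1 and n2 dots, given Weber fraction w."""
    big, small = max(n1, n2), min(n1, n2)
    z = (big - small) / (math.sqrt(2) * w * math.sqrt(big ** 2 + small ** 2))
    return 1 - 0.5 * math.erfc(z)

def fit_weber_fraction(observations):
    """Grid-search the w that best matches observed accuracy.

    observations: list of (n1, n2, proportion_correct) tuples.
    """
    grid = [i / 1000 for i in range(10, 1001)]  # candidate w from 0.01 to 1.00

    def squared_error(w):
        return sum((predicted_accuracy(a, b, w) - p) ** 2 for a, b, p in observations)

    return min(grid, key=squared_error)
```

Under this model a smaller w yields higher predicted accuracy at any given ratio, which is why smaller Weber fractions indicate greater acuity.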
Using previously unreported data, we also constructed several measures of estimation ability, using data from Years 1, 2, and 3 (Year 0 estimation data were not collected). First, we tested the internal consistency of children’s estimates. To assess consistency we used two measures: ordinality and linear r^{2}, described below.
Ordinality captures the extent to which a child’s estimates are ordered consistently. Specifically, we defined ordinality by calculating the proportion of trials on which the child gave estimates in the correct direction relative to previous estimates. For example, if a smaller number of dots was shown on trial n than on trial n-1, a child’s estimate was labeled as ordinal if their estimate was smaller on trial n than on trial n-1. The average rate of ordinal responding for each year was Y1 = .80; Y2 = .80; Y3 = .80, demonstrating high levels of ordinal responding. Surprisingly, performance on this measure did not appear to improve over time. Previous work has shown that children can provide ordinal estimates long before they provide accurate estimates, suggesting that this measure might capture children’s early structural knowledge of the relation between the verbal and nonverbal number systems (Sullivan & Barner, 2014a, 2014b). Recent work has even suggested that the understanding of ordinality mediates the link between the ANS and math achievement (Lyons & Beilock, 2011).
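As an illustration of the ordinality definition, the following sketch scores each trial against the one before it. How trials with an unchanged number of dots were handled is not stated, so skipping them is an assumption; the function name is hypothetical.

```python
def ordinality(dots, estimates):
    """Proportion of trials whose estimate moved in the same direction as the stimulus."""
    hits = 0
    scored = 0
    for i in range(1, len(dots)):
        stim_change = dots[i] - dots[i - 1]
        if stim_change == 0:
            continue  # assumption: no direction is defined, so the trial is not scored
        est_change = estimates[i] - estimates[i - 1]
        scored += 1
        if stim_change * est_change > 0:  # both increased or both decreased
            hits += 1
    return hits / scored if scored else float("nan")
```

For example, with arrays of 10, 20, 15, and 40 dots and estimates of 12, 18, 20, and 35, two of the three scored trials are ordinal.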
The Linear r^{2} measure of internal consistency represents the amount of variability in estimation performance that can be accounted for by knowing the number of dots a child was estimating. In other words, this value represents the extent to which the relation between a child’s estimate and the number of dots that they saw can be described by a linear function (in previous work, this has been referred to as the “linearity” of children’s estimates; e.g., Booth & Siegler, 2006). To calculate Linear r^{2}, we constructed a linear regression predicting each child’s estimates from the number of dots presented, and then reported the linear r^{2} of the line (Y1 = .37; Y2 = .35; Y3 = .36; again, these values did not appear to increase over time). Importantly, a high Linear r^{2} score does not necessarily indicate that a child provided accurate estimates, but rather that the child’s estimates were internally consistent (for example, one could imagine a child who overestimated small numbers, underestimated large numbers, and yet still provided estimates that were perfectly linear). Unlike ordinality, which only captures the internal consistency of the ordering of estimates, in order to have a high Linear r^{2} value, children must also be internally consistent in the relative distance between estimates. In number-line estimation tasks, Linear r^{2} has been shown repeatedly to correlate with symbolic math performance (Booth & Siegler, 2006, 2008; Gunderson, Ramirez, Beilock, & Levine, 2012; Kolkman et al., 2013; Moore & Ashcraft, 2015; Siegler & Booth, 2004).
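For a regression with a single predictor, r^{2} equals the squared Pearson correlation between the number of dots and the estimates, which the sketch below computes directly. The function name and the example data are hypothetical.

```python
def linear_r2(dots, estimates):
    """r-squared of the least-squares line predicting estimates from dots."""
    n = len(dots)
    mean_x = sum(dots) / n
    mean_y = sum(estimates) / n
    sxx = sum((x - mean_x) ** 2 for x in dots)
    syy = sum((y - mean_y) ** 2 for y in estimates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dots, estimates))
    return sxy ** 2 / (sxx * syy)

# A child who overestimates small arrays and underestimates large ones,
# yet is perfectly linear: estimate = 20 + 0.5 * dots.
dots = [4, 10, 40, 80, 120]
biased_but_linear = [20 + 0.5 * d for d in dots]
```

Here `linear_r2(dots, biased_but_linear)` is 1 even though every estimate is wrong except where the line crosses the true value, matching the point that linearity and accuracy are distinct.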
In addition to these two measures of internal consistency of estimates, we also calculated the accuracy of estimates via the Proportion Absolute Error (PAE), which represents the absolute value of the deviation of an estimate from the actual number presented, divided by the number presented (M_{PAE}: Y1 = .71; Y2 = .71; Y3 = .70; again, no group-wise change over time). PAE has been shown previously to predict math performance on standardized tests (Castronovo & Göbel, 2012; Sasanguie et al., 2013; Siegler & Booth, 2004; although not across all studies: see Booth & Siegler, 2006), addition/subtraction performance (Link, Nuerk, & Moeller, 2014; Moore & Ashcraft, 2015 [addition only]), and mental arithmetic (Lyons, Price, Vaessen, Blomert, & Ansari, 2014). Thus, if possessing highly accurate and stable mappings between number words and nonverbal representations of number is important to math success (e.g., in the case that children actually recruit the ANS to check or compute symbolic math calculations), then PAE should be related to math performance.
To provide an approximate measure of the reliability of each of our estimation and ANS measures, we predicted each year’s data from the previous year’s data; we report these Pearson correlation coefficients and significance levels in Table 1.
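A per-child PAE score can be sketched as below. Averaging the trial-level values into one score per child is an assumption about how the measure was aggregated; the function name is hypothetical.

```python
def proportion_absolute_error(dots, estimates):
    """Mean of |estimate - actual| / actual across a child's trials."""
    errors = [abs(e - d) / d for d, e in zip(dots, estimates)]
    return sum(errors) / len(errors)
```

For arrays of 10, 20, and 50 dots with estimates of 15, 20, and 25, the trial-level values are .5, 0, and .5, so PAE is one third. Lower values indicate more accurate estimates.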
Table 1
Years     Ordinality  PAE       Linear r^{2}  ANS
Year 0–1  n/a         n/a       n/a           0.232**
Year 1–2  0.213**     0.278**   0.191**       0.191*
Year 2–3  0.333***    0.542***  0.379***      0.444***
*p < .05. **p < .01. ***p < .001.
Math Measures
We did not have a set of specific, a priori, theoretically motivated predictions about the differences between our particular measures of math competence (WIAT, WJ-III, arithmetic, place value, and math grades), and so we constructed two different math measures, both of which combined data from multiple math tests. Our goal in creating these two composite measures was to reduce the dimensionality of our analyses and avoid the issue of attempting to analyze five different but highly correlated measures of symbolic math.
We created a composite of the standardized math test scores (WIAT math fluency and WJ-III Computation subtest) by calculating the proportion correct for each test, and averaging scores on the two measures. This composite showed improved mathematics performance across each year of testing (M_{standardized}: Y0 = .20; Y1 = .31; Y2 = .43; Y3 = .54). Because standardized math testing is commonly used both in psychology and education, this measure captured the type of math competence that is likely to be measured in a classroom or lab setting.
We also created a single composite measure that took into account all available math achievement data for each child. To do this, we conducted a Principal Components Analysis on all of our symbolic math measures (WIAT, WJ-III, arithmetic, place value, math grades). We then took the first principal component (PC1) as a measure of the primary shared variance between these tasks. We then predicted this measure – PC1 – from ANS and estimation performance.
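The extraction of a first principal component can be sketched in plain Python using power iteration on the correlation matrix of the measures. Standardizing each measure before the analysis is an assumption (the scaling used is not stated here), and the function names are hypothetical.

```python
import math

def standardize(col):
    """Convert raw scores to z-scores using the sample standard deviation."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
    return [(v - mean) / sd for v in col]

def first_principal_component(columns, n_iter=500):
    """Return (loadings, scores) for PC1.

    columns: list of equal-length score lists, one list per math measure.
    """
    z = [standardize(c) for c in columns]
    k, n = len(z), len(z[0])
    # Correlation matrix between measures.
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(k)]
            for i in range(k)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(n_iter):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Each child's PC1 score is the loading-weighted sum of their z-scores.
    scores = [sum(v[j] * z[j][i] for j in range(k)) for i in range(n)]
    return v, scores
```

When the measures are all positively correlated, as would be expected for five tests of symbolic math, the PC1 loadings share a sign and the scores behave like a weighted average of the standardized tests.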
Analyses
Effect of Intervention on ANS and Estimation
Using the current data, Barner et al. (2016) reported that abacus training had a significant impact on mathematics achievement when measured by the WJ-III Computation subtest and the in-house arithmetic battery. Therefore, by comparing the ANS acuity and estimation ability of children who received abacus instruction to that of children in the control group, we were able to ask whether improvements to mathematics ability caused improvements to ANS and estimation outcomes. We created a mixed effects linear regression model predicting ANS acuity from Year, Intervention Condition (abacus vs. no abacus), and their interaction. We also added participant-level random effects of Year, capturing individual children’s growth over time. If improvements to math ability cause improvements to ANS acuity, then we should expect a significant Year by Intervention interaction, such that children who received abacus training (and therefore got better at math) showed larger improvements in ANS acuity over time than children in the control group.
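One way to write a model of this form is shown below. The notation is ours, and the inclusion of a random intercept alongside the random slope of Year is an assumption; only the random effect of Year is described in the text.

```latex
w_{it} = \beta_0 + \beta_1\,\mathrm{Year}_{t} + \beta_2\,\mathrm{Cond}_{i}
       + \beta_3\,(\mathrm{Year}_{t} \times \mathrm{Cond}_{i})
       + u_{0i} + u_{1i}\,\mathrm{Year}_{t} + \varepsilon_{it}
```

Here w_{it} is the Weber fraction of child i in year t, Cond_{i} codes the child's intervention condition, u_{0i} and u_{1i} are the child-specific deviations in intercept and yearly growth, and \varepsilon_{it} is residual error. The coefficient of interest is \beta_3: a reliable nonzero value would indicate that acuity changed at different rates in the two groups.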
Our fitted model showed an effect of Year, suggesting that children’s ANS acuity improved over time (B = -.07, SE = .03; the negative coefficient captures older children’s smaller Weber fractions). However, we found no effect of Intervention Condition (B = .02, SE = .01, p = .11) or interaction of Intervention Condition and Year (B = .006, SE = .007, p = .36), and planned t-tests also revealed no effects of abacus training on ANS in Years 1 (t(179) = .20, p = .84, d = .03), 2 (t(183) = .34, p = .74, d = .05), or 3 (t(183) = 1.26, p = .26, d = .19; Fig. 1). When using our alternate measure of ANS acuity (the hardest ratio reached), the abacus and control group differed somewhat in ANS acuity in Years 1 (t(182) = 2.29, p = .023, d = .34; Bonferroni p = .068) and 3 (t(183) = 2.37, p = .019, d = .35; Bonferroni p = .056), but these differences failed to reach significance when correcting for multiple comparisons.
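The Bonferroni adjustment multiplies each p-value by the number of comparisons and caps the result at 1. The sketch below assumes three comparisons (one per post-baseline year), which is consistent with the corrected values reported above; the function name is hypothetical.

```python
def bonferroni(p, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_comparisons)

# Rounded p-values from the alternate ANS measure, assuming three yearly comparisons.
year1_adjusted = bonferroni(0.023, 3)
year3_adjusted = bonferroni(0.019, 3)
```

Applied to the rounded p-values, this gives roughly .069 and .057; the small discrepancies from the reported .068 and .056 would be expected if the correction was applied to unrounded p-values.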
Figure 1
We next tested whether our intervention influenced estimation performance. Because we did not have Year 0 (baseline) data for estimation, we could not assess with certainty whether abacus training caused changes to estimation. However, we were able to test whether there were differences in estimation performance between the abacus and control group during Years 1, 2, and 3.
No estimation measure showed consistent (e.g., across more than one year) differences between the abacus and control group. Also, when correcting for multiple comparisons, no p-value reached significance. For PAE, there was no effect of abacus training in Year 1 (t(182) = 1.64, p = .10, d = .24), a significant, uncorrected, effect in Year 2 (t(185) = 2.21, p = .03, d = .32; Bonferroni p = .09), and no effect in Year 3 (t(184) = 1.47, p = .14, d = .22). For Ordinality, there was a significant, uncorrected, effect in Year 1 (t(182) = 2.15, p = .03, d = .32; Bonferroni p = .11), and no effect in Years 2 or 3 (Year 2: t(185) = 1.39, p = .17, d = .20; Year 3: t(184) = 1.12, p = .26, d = .17). For Linear r^{2}, there was no effect in Years 1 or 2 (Year 1: t(182) = .598, p = .55, d = .09; Year 2: t(185) = .38, p = .71, d = .06), though there was a significant, uncorrected, effect in Year 3 (t(184) = 2.34, p = .02, d = .35; Bonferroni p = .06). Although there were trends indicating a relation between Intervention Condition and estimation performance (e.g., see Figure 1), none of these comparisons reached significance when correcting for the number of comparisons conducted. Further, there were no overall trends of improvement in estimation performance over time in relation to abacus training. To summarize: These analyses suggest that an intervention that improved math performance did not significantly improve ANS acuity. Also, analyses failed to find evidence for a benefit of training to estimation performance, though we lacked pre-intervention estimation data that could allow us to definitively test whether the intervention affected estimation.
While our results thus far are not consistent with the view that math training causes improvements to the ANS, they leave open the opposite possibility that ANS acuity might still be related to mathematics ability. To assess this, our next set of analyses tested whether ANS acuity and estimation performance were related at all to math achievement, when other cognitive measures were considered. Critically, the analyses that follow do not hinge on the particularities of the abacus intervention, and instead ask about the relation between the ANS and math independent from math training.
Relation Between Math Achievement and the ANS
To test whether ANS acuity was related to math performance across intervention groups, we constructed regression models predicting standardized math scores from ANS acuity. For simplicity, we fit these models for each year separately. Following the logic of previous studies in this literature, these models test whether ANS acuity predicts concurrent math achievement. For these and all subsequently reported models, we scaled all predictors in order to compare the relative predictive value of each parameter in the models directly (since all betas and standard errors are in standard units).
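With a single predictor, scaling both variables makes the regression slope equal to the Pearson correlation, which is what allows coefficients to be compared across measures. The sketch below assumes that the outcome was scaled along with the predictor; the function names and data are hypothetical.

```python
import math

def zscore(values):
    """Convert raw scores to z-scores using the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def standardized_slope(x, y):
    """Slope and standard error from regressing z-scored y on z-scored x."""
    zx, zy = zscore(x), zscore(y)
    n = len(x)
    b = sum(a * c for a, c in zip(zx, zy)) / (n - 1)  # equals Pearson r
    se = math.sqrt((1 - b ** 2) / (n - 2))
    return b, se
```

A coefficient of .29, for example, would then mean that a one standard deviation difference in the predictor is associated with a .29 standard deviation difference in math score.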
Replicating previous research, we found that ANS acuity was a concurrent predictor of standardized math scores for Years 0, 1, and 3 (see Table 2 for B, SE, and p), though we did not observe a significant relationship in Year 2. ANS was also a concurrent predictor of our math PC1 in Years 0, 2, and 3, but not in Year 1 (Table 2).^{i} Our alternative measure of ANS acuity (hardest ratio reached) showed that ANS was a concurrent predictor of PC1 in all years. Thus, while we found inconsistent evidence, the majority of our correlations revealed a concurrent predictive relation between ANS and math, replicating previous results.
Table 2
Predictor      Standardized Tests           PC1
               B       SE      p            B       SE      p
Year 0
ANS            0.286   0.076   0.0002       0.296   0.079   0.0002
Year 1
ANS            0.121   0.073   0.01         0.129   0.073   0.08
PAE            0.074   0.074   0.32         0.168   0.073   0.02
Linear r^{2}   0.103   0.074   0.17         0.158   0.074   0.03
Ordinality     0.0008  0.074   0.99         0.105   0.074   0.16
Year 2
ANS            0.105   0.074   0.16         0.147   0.074   0.048
PAE            0.145   0.073   0.049        0.248   0.071   0.011
Linear r^{2}   0.191   0.073   0.001        0.249   0.073   0.0008
Ordinality     0.208   0.073   0.005        0.191   0.075   0.011
Year 3
ANS            0.226   0.072   0.002        0.262   0.071   0.0003
PAE            0.228   0.072   0.002        0.248   0.071   0.0007
Linear r^{2}   0.277   0.071   0.0001       0.338   0.07    <.0001
Ordinality     0.228   0.072   0.002        0.274   0.072   0.0002
Note. Each row contains two models so there are (for example) eight Year 3 models presented in the table.
Relation Between Math Achievement and Estimation
We next asked whether estimation performance concurrently predicted math success. Each of our estimation measures concurrently predicted standardized math scores in Years 2 and 3, but not in Year 1 (see Table 2 for statistics). Ordinality was a concurrent predictor of our math PC1 in Years 2 and 3, while both PAE and Linear r^{2} concurrently predicted our math PC1 every year. Thus, as in previous work, we find that estimation performance predicts concurrent math achievement. More interesting, however, is whether this relationship survives the addition of a large battery of domain general measures.
Relation Between Domain-General Cognitive Mechanisms, ANS, Estimation, and Math
Having replicated past work showing that ANS and estimation ability are predictive of concurrent math achievement, we next asked whether such predictive relations were uniquely numerical, or whether they could be explained by domain-general cognitive abilities. To do this, we predicted math outcomes (our standardized math score composite and PC1) from numerical predictors (ANS and estimation performance) and from our battery of nonnumerical measures (mental rotation, spatial WM, verbal WM, Raven’s, age, and Intervention Condition; additional information about each of these measures is available here: https://github.com/langcog/mentalabacus). We constructed one model per year, per dependent variable. We report the results of our models (standardized betas and SEs) in Figure 2 (predicting standardized test scores) and Figure 3 (predicting PC1). We also experimented with a number of more sophisticated techniques, including longitudinal growth modeling with lagged predictors. Unfortunately, the combination of missing data and the relatively small number of longitudinal data points relative to the large number of possible predictors made these analyses difficult to interpret, though none contradict the results we report here.
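The logic of adding control predictors can be illustrated with a small multiple regression on synthetic data. In this sketch all names and numbers are hypothetical: the outcome depends only on a control measure, yet a numerical predictor correlated with that control would still appear predictive if entered alone.

```python
import math

def zscore(values):
    """Convert raw scores to z-scores using the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def standardized_betas(predictors, outcome):
    """Standardized regression coefficients for several predictors at once.

    predictors: dict mapping a predictor name to its list of scores.
    """
    names = list(predictors)
    z = [zscore(predictors[name]) for name in names]
    zy = zscore(outcome)
    # Normal equations (X'X) b = X'y; no intercept is needed after centering.
    xtx = [[sum(p * q for p, q in zip(zi, zj)) for zj in z] for zi in z]
    xty = [sum(p * q for p, q in zip(zi, zy)) for zi in z]
    return dict(zip(names, solve(xtx, xty)))

# Synthetic example: math depends only on the control measure.
ravens = [1, 2, 3, 4, 5, 6]
acuity = [2, 1, 3, 5, 4, 6]            # correlated with ravens
math_score = [2 * r + 1 for r in ravens]
betas = standardized_betas({"ravens": ravens, "acuity": acuity}, math_score)
```

In this toy case the coefficient for `acuity` falls to zero once `ravens` is in the model, even though the two numerical columns correlate strongly with the outcome on their own.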
Figure 2
Figure 3
While measures like Raven’s (Y1–Y3, all models B > .21), Verbal Working Memory (Y1–Y3, all models B > .09), Mental Rotation (Y1–Y3, all models B > .11), and the mental abacus intervention (Y1–Y3, all models B > .16) all predicted standardized math scores with relative consistency (Figure 2), the ANS and estimation measures did not. For example, when controlling for nonnumerical tasks, ANS only predicted standardized test scores in Year 0 (B = .24, SE = .08, p = .003; all other p > .05). Linear r^{2} and PAE never significantly predicted standardized test scores (all p > .05; but see Figure 2 for some evidence that Linear r^{2} may have some predictive power), and Ordinality only predicted standardized test scores in Year 2 (B = .15, SE = .07, p = .04). To summarize, our nonnumerical measures consistently predicted standardized test scores, while our numerical ANS and estimation measures contributed only inconsistently to predicting standardized test scores once we controlled for other factors.
Having explored the relationship between standardized math scores and the ANS and estimation tasks, we next applied the same analyses to predict the composite math score (PC1; see Figure 3). Again, measures like Raven’s (Y1–Y3, all models B > .19), Mental Rotation (Y1–Y3, all models B > .08), the abacus intervention (Y1–Y3, all models B > .19), and Verbal Working Memory (Y1–Y3, all models B > .07) all predicted PC1 relatively consistently. Again, ANS performance significantly predicted PC1, but only in Year 0 (B = .19, SE = .08, p = .02; all other p > .05).^{ii} Neither Ordinality nor PAE ever predicted PC1 when controlling for other factors (all p > .05; but see Figure 3 for some evidence that Ordinality may have had predictive value). Interestingly, Linear r^{2} did significantly predict PC1 in both Year 2 (B = .19, SE = .07, p = .007) and Year 3 (B = .17, SE = .07, p = .008), though not in Year 1 (p > .05). Once again, with the exception of Linear r^{2}, our estimation measures typically failed to predict our math PC1 once other control variables were included in the model. Further, our ANS measure did not consistently predict math PC1 when controlling for other factors.
Discussion
We tested whether nonverbal number (ANS) acuity and verbal estimation ability were uniquely predictive of symbolic math achievement. Specifically, we assessed (1) whether improvements to math performance caused changes to ANS acuity; (2) whether relations between math performance, ANS, and estimation were consistent over time; and (3) whether relations between math performance, ANS, and estimation persisted over time when controlling for performance on a battery of nonnumerical control tasks. To address these questions, we conducted new analyses of data from a three-year-long randomized controlled math intervention, which Barner et al. (2016) reported improved math performance substantially relative to a control group. Unlike the data from any previous study, this dataset allowed us to test both correlations and possible causal relations between formal math ability and informal numeracy measures, and to do so in a way that robustly controlled for nonnumerical cognitive abilities. While previous studies have reported conflicting results using a host of less robust designs, we both replicated positive findings and showed that – in our data – these results are explained almost entirely by domain-general factors.
We first asked whether changes to math performance caused longitudinal changes to ANS acuity. While previous studies tested this possibility indirectly (e.g., by comparing ANS acuity between educated and uneducated participants; Pica et al., 2004), or tested the opposite causal direction (whether improving ANS performance improves math; Park & Brannon, 2013, 2014), our study was unique in that it asked whether an RCT math intervention (which improved math performance) affected participants’ ANS acuity. We found no compelling evidence that this successful math intervention improved ANS acuity, or led to differences in estimation performance. These findings militate against one class of possible causal relations between the ANS and mathematics achievement – e.g., whereby changes in math achievement cause changes that result in greater ANS acuity. One limitation of our conclusions, however, is that mental abacus differs in many ways from other methods for improving math performance: mental abacus training appears to recruit cognitive skills not directly related to math (Barner et al., 2016; Frank & Barner, 2012), and abacus processing activates both regions of the brain associated with numerical processing and several additional regions (like those associated with visuospatial processing; Du et al., 2013). It therefore remains possible that because of the properties of mental abacus training, our findings do not generalize to other forms of math training.
Next, we sought to replicate and then explain previous work which found that ANS and estimation ability are related to formal math skill. Consistent with some previous studies, we found that both ANS acuity and estimation performance served as concurrent predictors of math success. However, we also found that this predictive relation was attenuated substantially when other, nonnumerical predictors were included in the model. In fact, nonnumerical measures like Raven’s, Mental Rotation, and verbal working memory were very strong predictors of math outcome, whereas ANS acuity and estimation were not. With the exception of a small subset of our analyses (ANS acuity in Year 0 and Linear r^{2} in Years 2 and 3), we found little evidence that our ANS and estimation measures uniquely predicted math outcomes when controlling for other cognitive abilities. This finding is consistent with the conclusion that the relation between ANS acuity and symbolic math performance is often weakest in the early elementary school years (Fazio et al., 2014), and with work showing that this relationship may be mediated by other, non-ANS-related factors (e.g., Gilmore et al., 2013; Göbel et al., 2014; Holloway & Ansari, 2009).
Why might correlations between estimation, ANS acuity, and mathematics achievement disappear when controlling for other cognitive capacities? One likely explanation is that tasks that measure ANS acuity and estimation also depend on capacities like spatial working memory, and domain general abilities like comparison, analogy, and perhaps even proportional reasoning; all of these skills have been implicated in mathematics or estimation performance (Alloway & Passolunghi, 2011; Barth & Paladino, 2011; Clark, Pritchard, & Woodward, 2010; DeStefano & LeFevre, 2004; Geary, 2011; Link, Nuerk, & Moeller, 2014; Passolunghi et al., 2014; Sullivan & Barner, 2014a, 2014b; Thompson, Nuerk, Moeller, & Cohen Kadosh, 2013). Similarly, the visual demands of most ANS tasks (e.g., Clayton et al., 2015) leave open the possibility that visual processing ability might partially mediate the link between ANS acuity and math performance.^{iii} Because the ANS and estimation tasks that have previously been shown to predict math skill also depend on nonnumerical cognitive abilities, and because few past studies thoroughly measured these factors, these previous findings may be driven in part by confounding nonnumerical factors.
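The logic of this confounding account can be made concrete with the standard first-order partial correlation formula. The sketch below is illustrative only: the three input correlations are hypothetical values chosen for the example, not estimates from this study, and `partial_corr` is a helper name introduced here.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation between x and y after partialling out z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical values: x = ANS acuity, y = math score,
# z = a domain-general ability that both tasks draw on.
r_ans_math = 0.30
r_ans_general = 0.50
r_math_general = 0.60

r_partial = partial_corr(r_ans_math, r_ans_general, r_math_general)
print(f"zero-order r = {r_ans_math:.2f}, partial r = {r_partial:.2f}")
```

With these made-up inputs, the zero-order correlation equals the product of the two correlations with the shared ability, so nothing remains once that ability is controlled. Real data would rarely be this clean, but the same arithmetic shows how a moderate zero-order correlation can shrink toward zero.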
Alternatively, some have argued that the ability to use ANS representations to perform approximate math computations (e.g., the ability to nonverbally “add” quantities) – rather than ANS acuity itself – is most predictive of math achievement (Park & Brannon, 2013). Consistent with this, recent interventions on ANS acuity have shown that, at least in adults, training the use of the ANS during approximate (nonverbal) arithmetic is more effective than training simple nonverbal numerical comparisons (like those tested in our study; Park & Brannon, 2013). These findings raise the possibility that there is a privileged relation between the ANS and math outcomes, and that our study simply failed to detect it. However, our results raise several additional questions. First, our intervention results suggest that improvements to ANS acuity are not required in order for improvements in math performance to occur, thus leaving open the question of why researchers so frequently find correlations between ANS acuity and math performance. Second, the mechanism by which practicing approximate addition might affect symbolic math remains unclear.
Before concluding, we note that while our study serves as an important extension of existing work on the ANS, it remains possible that studies in other populations will yield different results. As in all psychological research, the characteristics of the participants in our sample may limit our ability to draw generalizable inferences about the general human population (Henrich et al., 2010). Our participants were low-income students whose performance on a variety of tasks was below both Indian and US norms (Barner et al., 2016). Thus, the predictive relations between approximate and symbolic mathematics may be different in samples drawn from other populations. Finally, as already noted, to the extent that the mental abacus intervention failed to affect ANS representations, this may be particular to mental abacus, which is a unique math training program known to recruit nonmathematical cognitive skills and neural regions (Barner et al., 2016; Du et al., 2013). Thus, the lack of causal effect of math training on ANS acuity may not be generalizable to children learning mathematics using standard methods.
To conclude, while we replicated past findings that ANS and estimation ability are concurrently predictive of math success, we failed to find evidence that changes to math skill caused changes to ANS or estimation performance. We also failed to find consistent evidence that ANS and estimation performance uniquely predicted math success. In fact, the strongest predictors of math performance were our nonnumerical cognitive predictors, like Raven’s, verbal working memory, and mental rotation. These data suggest that, while approximate measures of numerical competence (e.g., ANS acuity and estimation) may be related to math success, this relationship is likely fragile, and is one among many that predict mathematics achievement. More informative predictors of math achievement include domain general capacities like working memory, mental rotation, and general intelligence.