Intensive math training does not affect approximate number acuity: Evidence from a three-year longitudinal curriculum intervention

Does nonverbal, approximate number acuity predict mathematics performance? Some studies report a correlation between acuity of representations in the Approximate Number System (ANS) and early math achievement, while others do not. Few previous reports have addressed (1) whether reported correlations remain when other domain-general capacities are considered, and (2) whether such correlations are causal. In the present study, we addressed both questions using a large (N = 204) 3-year longitudinal dataset from a successful math intervention, which included a wide array of non-numerical cognitive tasks. While we replicated past work finding correlations between approximate number acuity and math success, these correlations were very small when other domain-general capacities were considered. Also, we found no evidence that changes to math performance induced changes to approximate number acuity, militating against one class of causal accounts.

Beginning early in infancy, humans can represent approximate numerical quantities nonverbally, using what is sometimes called the "Approximate Number System" (ANS) or the "number sense" (Dehaene, 1997; Xu & Spelke, 2000). The ANS is used to represent and compare numerical magnitudes, and does so according to Weber's law, such that the ratio of any two numerical quantities determines the likelihood that they can be differentiated (for review, see Dehaene, 1997). A number of recent studies report that individual differences in ANS acuity are related to mathematics achievement, such that individuals with greater numerical acuity also perform better on standardized math tests, the SAT, and a host of other math measures (Chen & Li, 2014; Halberda, Mazzocco, & Feigenson, 2008; Halberda et al., 2012).
In fact, more than a dozen studies have reported some correlation between the ANS and symbolic math, and these correlations often survive the addition of non-numerical control predictors, like verbal SAT score, IQ, and spelling ability (e.g., Anobile, Stievano, & Burr, 2013; Bonny & Lourenco, 2013; Desoete et al., 2012; DeWind & Brannon, 2012; Gilmore et al., 2010; Halberda, Mazzocco, & Feigenson, 2008; Halberda et al., 2012; Libertus et al., 2011; Libertus, Feigenson, & Halberda, 2013; Libertus, Odic, & Halberda, 2012; Mazzocco et al., 2011a, 2011b; Piazza et al., 2010; Starr et al., 2013). Further, Park and Brannon (2013, 2014) have provided evidence that training on non-verbal number tasks can lead to improvements to math performance in adults, raising the possibility that the ANS is foundational to mathematical learning, not merely an interesting correlate. These results are exciting for at least two reasons. First, they suggest a link between the evolutionarily ancient ANS and the more recent human innovation of symbolic arithmetic, thus potentially providing insight into the origins of mathematical thought. Second, they suggest that tests of ANS acuity may be helpful in designing diagnostic and intervention tools for early math difficulties (Park & Brannon, 2013, 2014; Starr, Libertus, & Brannon, 2013), perhaps even before children begin formal math training.
One way to adjudicate between these discrepant findings is via meta-analysis. For example, one recent meta-analysis demonstrated that, across a wide range of study methodologies and (36 independent) samples, ANS acuity explained substantial variability in symbolic math achievement (Chen & Li, 2014). However, half of the studies included in the meta-analysis did not control for participants' non-numerical cognitive capacities, and the vast majority of those that did controlled only for participants' linguistic ability. This inconsistent inclusion of control tasks is a problem, because tasks typically used to measure ANS acuity (e.g., dot array comparison) could plausibly draw on other non-numerical cognitive capacities, like working memory, non-numerical quantity representation, and inhibitory control (e.g., Gilmore et al., 2013). Critically, each of these cognitive capacities has also been shown to predict early math achievement, making it possible that they mediate the relationship between ANS acuity and math achievement (Alloway & Passolunghi, 2011; Clark, Pritchard, & Woodward, 2010; DeStefano & LeFevre, 2004; Geary, 2011; Gilmore et al., 2013; Hornung, Schiltz, Brunner, & Martin, 2014; Lourenco, Bonny, Fernandez, & Rao, 2012; Passolunghi, Cargnelutti, & Pastore, 2014; Thompson, Nuerk, Moeller, & Cohen Kadosh, 2013). Thus, although the reported correlations between ANS acuity and math achievement may be due to a unique relationship between verbal and nonverbal numerical abilities, it is also possible that other, non-numerical perceptual and cognitive capacities explain the reported correlations. Similar challenges confront the interpretation of studies that compare individuals with different levels of formal mathematics training (e.g., Castronovo & Göbel, 2012; Lindskog, Winman, & Juslin, 2014; Nys et al., 2013; Pica, Lemer, Izard, & Dehaene, 2004).
In the present study, we tested whether the ANS is meaningfully causally linked to mathematics achievement. We did this first, by assessing its longitudinal predictive power relative to a large battery of other cognitive measures, and second, by assessing whether changes in math performance caused changes to the ANS. Specifically, we analyzed publicly available data from a three-year longitudinal randomized controlled trial (RCT) of a math intervention in 2nd through 5th graders (N = 204). This RCT provided supplemental mathematics training in an experimental group using a popular mental arithmetic technique called "mental abacus" (Barner et al., 2016). Mental abacus training involves teaching participants to perform arithmetic calculations using an abacus, and, at more advanced levels, removing the physical abacus and asking users to visualize the abacus and calculations using this mental representation (Frank & Barner, 2012; Hatano, Miyake, & Binks, 1977; Stigler, Chalip, & Miller, 1986).
In the RCT, abacus training improved children's arithmetic abilities relative to a control group who had received additional training in a standard math curriculum, allowing us to test whether changes in arithmetic skill (induced by the intervention) led to changes in ANS acuity. Critically, the study also included a longitudinally administered measure of ANS acuity, as well as tests of spatial working memory, verbal working memory, mental rotation ability, and general intelligence (Raven's Progressive Matrices). The availability of these measures allowed us to assess the predictive relation between ANS acuity and math achievement while concurrently controlling for a larger than usual set of non-numerical cognitive capacities. To our knowledge, this is the first study to have experimentally manipulated math ability in order to test the effects of improvements to math skill on ANS acuity.
Since estimation ability captures the translational process between symbolic and non-symbolic number representations (see Sullivan & Barner, 2014a, for discussion), we also tested whether children's estimation ability was uniquely related to math performance (Booth & Siegler, 2006, 2008; Gunderson, Ramirez, Beilock, & Levine, 2012; Kolkman et al., 2013; Moore & Ashcraft, 2015; Siegler & Booth, 2004). To our knowledge, no previous study has assessed this link while simultaneously controlling for a large battery of domain-general cognitive capacities.
To summarize, we re-analyzed data from a longitudinal math intervention to test the uniqueness and causal status of correlations between the ANS and math achievement, while simultaneously probing dot-array estimation, a measure of associations between numerals and approximate magnitudes. In doing so, we provide the first large-scale longitudinal study to assess the causal link between the ANS and math achievement while simultaneously controlling for an exhaustive battery of domain-general cognitive capacities.
Participants were 204 children from a charitable school in Gujarat, India. Children spoke English (the language of instruction at their school), and most children also spoke an additional language (Gujarati and Hindi were the most common). Children came from either Muslim (41%) or Hindu (59%) families. Family income was low, with 80% of children coming from families earning around $2000 USD per year (median household income for 2006-2012 was just over $9,700 globally, and just over $3,000 in India; Phelps & Crabtree, 2013). Children were between 5 and 8 years of age (M = 6.65 years, SD = .53) at the time of enrollment in 2010 (Year 0).

Math Measures
Children received several measures of math competence, including the Woodcock-Johnson III Computation test, the Math Fluency subtest of the Wechsler Individual Achievement Test (WIAT-III), and in-house tests of arithmetic and place value understanding. Also, children's math grades were available (as a score between 0 and 100), as reported by their school. Detailed descriptions of these measures are available in the Supplementary Materials of Barner et al. (2016).

Intervention
As reported in Barner et al. (2016), all children followed a standard math curriculum throughout the 3 years of the study. In addition to their standard math curriculum, students were also enrolled in one of two supplemental math programs: The control program used a standard international math curriculum, while the abacus program taught students mental abacus. Thus, all children received a shared core math curriculum, and one of two possible supplemental math curricula. The random assignment procedure operated as follows. At enrollment in Year 0, children were randomly assigned to one of three homerooms. Children randomly assigned to Homeroom 1 received supplemental instruction in a standard international math curriculum. Children assigned to Homeroom 2 received an equal number of hours of supplemental training using mental abacus. Half of the children in Homeroom 3 received supplemental abacus training, and the other half received the control supplemental standard math curriculum.
Supplemental abacus instruction was conducted by a privately trained abacus instructor, and in the case of the split class was conducted in a space separate from control children. Of the 204 children in the study, the majority provided data for every year of testing (Years 0-3; Control group n = 88; Mental Abacus group n = 99).
To test whether differences in math training are related to changes in ANS and estimation ability, we compared the ANS and estimation abilities of children who completed the mental abacus intervention to the control group.

ANS and Estimation Tasks
Children's ANS acuity was assessed using a 10-minute timed computerized task. As is typical for tasks assessing ANS acuity, two arrays of black dots were presented simultaneously on a gray background; the two arrays were separated by a vertical black line. Half of the trials controlled for total surface area across the arrays; the other half of trials controlled for item size (Dehaene et al., 2005). The correct answer was on the left 50% of the time.
Arrays were visible for 1000 ms and were followed by a 300 ms white noise mask image. Children were instructed to indicate which array was more numerous by pressing the Z (which was covered with a left arrow) or M (which was covered by a right arrow) key. The experiment was self-paced, and children pressed the space bar to progress to the next trial. To ensure that children attended to each trial, two beeps were presented via headphones immediately prior to the presentation of the arrays.
Trials were presented in blocks of 8. Within each block, the ratio of items in the two sets remained constant; all children started with a 4:5 ratio. Within each block, the numerical magnitudes of the arrays varied substantially (e.g., 16 vs. 20; 80 vs. 100). In order to succeed on a given block, the child needed to get 6 out of 8 trials correct.
Side of the correct response was pseudo-randomly ordered so that alternating responses or consistent choices of "left" or "right" would lead to failure of the block. If participants succeeded on a block, they moved to the next hardest ratio (e.g., 5:6), while if they failed, they moved to the next easiest ratio (e.g., 3:4). Ratios ranged from 1:2 to 15:16.
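The block-wise staircase described above can be illustrated with a minimal sketch. Several details are assumptions made only for illustration: that the ratio ladder contains every k:(k+1) step from 1:2 to 15:16, that "6 out of 8" means at least 6, and that the ladder is clamped at its easiest and hardest ends. The function `p_correct_for` is a hypothetical stand-in for a child's per-trial accuracy, and the pseudo-random ordering of the correct side is not modeled.

```python
import random

# Assumed ladder: every k:(k+1) ratio from 1:2 (easiest) to 15:16 (hardest).
RATIOS = [(k, k + 1) for k in range(1, 16)]


def next_ratio_index(index, n_correct, pass_mark=6):
    """Move one step harder after a passed block, one step easier otherwise."""
    if n_correct >= pass_mark:
        return min(index + 1, len(RATIOS) - 1)  # clamp at hardest ratio (assumption)
    return max(index - 1, 0)  # clamp at easiest ratio (assumption)


def run_staircase(p_correct_for, n_blocks, start=(4, 5), trials_per_block=8, seed=0):
    """Simulate n_blocks blocks and return the ratio used in each block.

    p_correct_for(ratio) is a hypothetical per-trial probability of a
    correct answer at that ratio.
    """
    rng = random.Random(seed)
    index = RATIOS.index(start)
    visited = []
    for _ in range(n_blocks):
        ratio = RATIOS[index]
        visited.append(ratio)
        n_correct = sum(
            rng.random() < p_correct_for(ratio) for _ in range(trials_per_block)
        )
        index = next_ratio_index(index, n_correct)
    return visited
```

Under these assumptions, a simulated child who always answers correctly climbs from 4:5 to 5:6 to 6:7 over three blocks, while one who always errs drops from 4:5 to 3:4 to 2:3.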
Children's estimation ability was tested by asking them to estimate the number of dots on a screen; task duration was 10 minutes. Arrays were randomly generated and contained black dots on a gray background. The number of dots ranged from 3 to 120, and dot size and total area of the array varied across trials. Children viewed each array for 400 ms, and then entered their numerical estimate on a keypad (errors could be corrected by using the backspace key). Prior to beginning the task, children completed a keypad typing training session to ensure that all children could appropriately use the keypad.

Control Tasks
Children were tested on a battery of control tasks, again described in detail in the Supplemental Materials of Barner et al. (2016). This battery included two computerized measures of working memory: (1) a test of verbal working memory; and (2) a test of spatial working memory. For both tasks, the participant was first presented with a target sequence of events, and then had to decide whether a second sequence of events was the same as or different from the target (see below). For both tasks, children completed 10 practice trials prior to the start of the experiment, and 24 trials as part of the experiment. Children provided their responses by pressing either the 's' key (relabeled as "S" for same) or the 'l' key (relabeled as "D" for different) on a keyboard.
In the verbal working memory task, children heard a sequence of target syllables (e.g., "GU", "TI"). They then had to decide whether a second sequence of syllables (e.g., "RA", "TI") was the same as or different from the target. The test sequence was either identical to the target, or differed by one syllable. All stimuli were recorded by a native speaker of Gujarati, using syllables that are legal in English, Hindi, and Gujarati.
In the spatial working memory task, participants saw a 5x5 grid of circles. They then saw a target sequence of circles turn yellow (only one circle was filled at a time). They were then presented with a test sequence of blue circles, and had to decide whether the blue circles appeared in the same locations as the yellow circles. Again, the sequence of blue circles was either identical to the sequence of yellow circles, or contained a single circle that appeared in a different position.
Both working memory tasks were adaptive: children were asked to remember n items during the target sequence, and if they responded correctly, they were asked on the next trial to remember n+1 items (if they responded incorrectly, they were next asked to remember n-1 items). Thus, if a child successfully remembered two target syllables (e.g., "GU", "TI"), on the next trial they were asked to remember three target syllables (e.g., "RA", "MI", "TU"). For both working memory tasks, we calculated participants' memory score by averaging the level of difficulty for all 24 trials.
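As a minimal sketch of the scoring rule just described, the following function tracks the difficulty level across trials and averages it. The starting level of 2 and the floor of 1 item are assumptions made for illustration; the text does not state either value.

```python
def mean_span_level(responses, start_level=2, min_level=1):
    """Average difficulty level across trials of an n+1 / n-1 adaptive task.

    responses: sequence of booleans, True when the child answered correctly.
    start_level and min_level are illustrative assumptions.
    """
    level = start_level
    levels = []
    for correct in responses:
        levels.append(level)  # record the level at which this trial was run
        if correct:
            level += 1
        else:
            level = max(min_level, level - 1)
    return sum(levels) / len(levels)
```

For example, the response pattern correct, correct, incorrect, correct is run at levels 2, 3, 4, and 3, giving a score of 3.0.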
Children also completed a paper-and-pencil task that measured mental rotation ability. In this task, participants were asked to match one of two target items (either letters or shapes, based on the Shepard & Metzler, 1971 stimuli) to a sample. One target item was the mirror image of the sample while the other was an exact match; both target items were also rotated. Thus, in order to determine which target matched the sample, the child needed to mentally rotate the item. Finally, children completed Raven's Progressive Matrices (A and/or B).

Results
Before presenting our main analyses, we first describe how data were used to construct measures of ANS acuity, estimation ability, and mathematics achievement.

Measures & Descriptive Statistics

ANS and Estimation Measures
Each child's ANS acuity was measured for each year (Year 0, 1, 2, and 3) as a Weber fraction (using the method described by Halberda, Mazzocco, & Feigenson, 2008), where greater ANS acuity is indicated by a smaller Weber fraction (M acuity: Y0 = .37; Y1 = .18; Y2 = .15; Y3 = .14). Prior to analysis, we excluded children whose Weber fractions were > .8, as a value this large likely reflects a misunderstanding of the task. We selected the Weber fraction as our DV because it is the standard measure of ANS acuity in the field. However, because measures of ANS acuity differ from one another (Inglis & Gilmore, 2014), we also repeated all analyses using an alternative, less conservative measure of acuity, defined as the ratio of dots on the most difficult trial successfully completed by a child. These additional analyses are available here: https://github.com/langcog/jesstimation, and are only reported in the present paper when they differed from our main analyses.
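To make the Weber fraction more concrete, the sketch below uses a Gaussian psychophysical model that is commonly applied to dot comparison data in this literature, in which predicted accuracy depends only on the ratio of the two numerosities and on the Weber fraction w. Whether this matches the exact fitting procedure used for the present data is an assumption, and the grid search in `fit_weber` is a simple illustrative stand-in for whatever optimizer was actually used.

```python
import math


def p_correct(n1, n2, w):
    """Predicted accuracy for comparing n1 vs. n2 dots given Weber fraction w.

    Assumes Gaussian number representations whose spread grows linearly
    with numerosity (scalar variability).
    """
    diff = abs(n1 - n2)
    spread = math.sqrt(2) * w * math.sqrt(n1 ** 2 + n2 ** 2)
    return 1 - 0.5 * math.erfc(diff / spread)


def fit_weber(trials, grid=None):
    """Return the w on a grid that maximizes the likelihood of the responses.

    trials: list of (n1, n2, correct) tuples, with correct a boolean.
    The grid (0.01 to 1.50 in steps of 0.01) is an illustrative choice.
    """
    grid = grid or [i / 100 for i in range(1, 151)]

    def log_likelihood(w):
        total = 0.0
        for n1, n2, correct in trials:
            # Clip to avoid log(0) when the model predicts certainty.
            p = min(max(p_correct(n1, n2, w), 1e-9), 1 - 1e-9)
            total += math.log(p if correct else 1 - p)
        return total

    return max(grid, key=log_likelihood)
```

Under this model, 16 vs. 20 and 80 vs. 100 yield the same predicted accuracy for a given w, which is the ratio dependence described by Weber's law; a smaller w yields higher predicted accuracy at any fixed ratio.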
Using previously unreported data, we also constructed several measures of estimation ability, using data from Years 1, 2, and 3 (Year 0 estimation data were not collected). First, we tested the internal consistency of children's estimates. To assess consistency we used two measures: ordinality and linear r², described below.
Ordinality captures the extent to which a child's estimates are ordered consistently. Specifically, we defined ordinality by calculating the proportion of trials on which the child gave estimates in the correct direction relative to previous estimates. For example, if a smaller number of dots was shown on trial n than on trial n-1, a child's estimate was labeled as ordinal if their estimate was smaller on trial n than on trial n-1. The average rate of ordinal responding for each year was Y1 = .80; Y2 = .80; Y3 = .80, demonstrating high levels of ordinal responding. Surprisingly, performance on this measure did not appear to improve over time. Previous work has shown that children can provide ordinal estimates long before they provide accurate estimates, suggesting that this measure might capture children's early structural knowledge of the relation between the verbal and nonverbal number systems (Sullivan & Barner, 2014a, 2014b). Recent work has even suggested that the understanding of ordinality mediates the link between the ANS and math achievement (Lyons & Beilock, 2011).
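The ordinality measure can be sketched as follows. The handling of edge cases is an assumption: here, consecutive trials showing the same number of dots are skipped, and an unchanged estimate is counted as non-ordinal. The text does not specify how such cases were treated.

```python
def ordinality(presented, estimates):
    """Proportion of consecutive trial pairs on which the estimate moved in
    the same direction as the number of dots presented."""
    scored = 0
    ordinal = 0
    for i in range(1, len(presented)):
        dot_change = presented[i] - presented[i - 1]
        est_change = estimates[i] - estimates[i - 1]
        if dot_change == 0:
            continue  # assumption: pairs with no change in dots are not scored
        scored += 1
        # Ordinal only if the estimate changed, and changed in the same direction.
        if est_change != 0 and (dot_change > 0) == (est_change > 0):
            ordinal += 1
    return ordinal / scored if scored else float("nan")
```

With hypothetical arrays of 10, 20, 15, and 40 dots and estimates of 12, 18, 25, and 30, two of the three scored transitions are ordinal, giving a score of about .67.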
The Linear r² measure of internal consistency represents the amount of variability in estimation performance that can be accounted for by knowing the number of dots a child was estimating. In other words, this value represents the extent to which the relation between a child's estimate and the number of dots that they saw can be described by a linear function (in previous work, this has been referred to as the "linearity" of children's estimates; e.g., Booth & Siegler, 2006). To calculate Linear r², we constructed a linear regression predicting each child's estimates from the number of dots presented, and then reported the linear r² of the line (Y1 = .37; Y2 = .35; Y3 = .36; again, these values did not appear to increase over time). Importantly, a high Linear r² score does not necessarily indicate that a child provided accurate estimates, but rather that the child's estimates were internally consistent (for example, one could imagine a child who overestimated small numbers, underestimated large numbers, and yet still provided estimates that were perfectly linear). Unlike ordinality, which only captures the internal consistency of the ordering of estimates, in order to have a high Linear r² value, children must also be internally consistent in the relative [...] the actual number presented, divided by the number presented (M PAE: Y1 = .71; Y2 = .71; Y3 = .70; again, no group-wise change over time). PAE has been shown previously to predict math performance on standardized tests (Castronovo & Göbel, 2012; Sasanguie et al., 2013; Siegler & Booth, 2004; although not across all studies: see Booth & Siegler, 2006), addition/subtraction performance (Link, Nuerk, & Moeller, 2014; Moore & Ashcraft, 2015 [addition only]), and mental arithmetic (Lyons, Price, Vaessen, Blomert, & Ansari, 2014). Thus, if possessing highly accurate and stable mappings between number words and nonverbal representations of number is important to math success (e.g., in the case that children actually recruit the ANS to check or compute symbolic math calculations), then PAE should be related to math performance.
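The contrast between internal consistency and accuracy can be illustrated with a short sketch. Linear r² is computed here as the squared Pearson correlation between the number presented and the estimate, which equals the r² of a simple linear regression. The PAE function assumes that PAE is the absolute difference between the estimate and the number presented, divided by the number presented, averaged over trials; the first part of that definition is not recoverable from the text above, so it should be read as an assumption.

```python
def linear_r2(presented, estimates):
    """r-squared of a simple linear regression of estimates on dots presented."""
    n = len(presented)
    mean_x = sum(presented) / n
    mean_y = sum(estimates) / n
    sxx = sum((x - mean_x) ** 2 for x in presented)
    syy = sum((y - mean_y) ** 2 for y in estimates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(presented, estimates))
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either variable is constant
    return sxy ** 2 / (sxx * syy)


def mean_pae(presented, estimates):
    """Mean of |estimate - presented| / presented across trials (assumed definition)."""
    errors = [abs(e - x) / x for x, e in zip(presented, estimates)]
    return sum(errors) / len(errors)
```

For a hypothetical child who sees 10, 20, 40, and 80 dots and answers 30, 35, 45, and 65, the estimates lie exactly on a line (Linear r² of 1) even though small arrays are overestimated and large arrays underestimated, so the mean PAE under the assumed definition is still about .77.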
To provide an approximate measure of the reliability of each of our estimation and ANS measures, we predicted each year's data from the previous year's data; we report these Pearson correlation coefficients and significance levels in Table 1.

Math Measures
We did not have a set of specific, a priori, theoretically-motivated predictions about the differences between our particular measures of math competence (WIAT, WJ-III, arithmetic, place value, and math grades), and so we constructed two different math measures, both of which combined data from multiple math tests.Our goal in creating these two composite measures was to reduce the dimensionality of our analyses and avoid the issue of attempting to analyze five different but highly correlated measures of symbolic math.
We created a composite of the standardized math test scores (WIAT math fluency and WJ-III Computation subtest) by calculating the proportion correct for each test, and averaging scores on the two measures. This composite showed improved mathematics performance across each year of testing (M standardized: Y0 = .20; Y1 = .31; Y2 = .43; Y3 = .54). Because standardized math testing is commonly used both in psychology and education, this measure captured the type of math competence that is likely to be measured in a classroom or lab setting.
We also created a single composite measure that took into account all available math achievement data for each child. To do this, we conducted a Principal Components Analysis on all of our symbolic math measures (WIAT, WJ-III, arithmetic, place value, math grades). We then took the first principal component (PC1) as a measure of the primary shared variance between these tasks. We then predicted this measure (PC1) from ANS and estimation performance.
Using the current data, Barner et al. (2016) reported that abacus training had a significant impact on mathematics achievement when measured by the WJ-III Computation subtest and the in-house arithmetic battery. Therefore, by comparing the ANS acuity and estimation ability of children who received abacus instruction to that of children in the control group, we were able to ask whether improvements to mathematics ability caused improvements to ANS and estimation outcomes. We created a mixed effects linear regression model predicting ANS acuity from Year, Intervention Condition (abacus vs. no abacus), and their interaction. We also added participant-level random effects of Year, capturing individual children's growth over time. If improvements to math ability cause improvements to ANS acuity, then we should expect a significant Year by Intervention interaction, such that children who received abacus training (and therefore got better at math) showed larger improvements in ANS acuity over time than children in the control group.
We next tested whether our intervention influenced estimation performance.Because we did not have Year 0 (baseline) data for estimation, we could not assess with certainty whether abacus training caused changes to estimation.However, we were able to test whether there were differences in estimation performance between the abacus and control group during Years 1, 2, and 3.
No estimation measure showed consistent (e.g., across more than one year) differences between the abacus and control group. Also, when correcting for multiple comparisons, no p-value reached significance. For PAE, there was no effect of abacus training in Year 1 (t(182) = -1.64, p = .10, d = -.24), a significant, uncorrected, effect in Year 2 (t(185) = -2.21, p = .03, d = -.32; Bonferroni p = .09), and no effect in Year 3 (t(184) = -1.47, p = .14, d = -.22). For Ordinality, there was a significant, uncorrected, effect in Year 1 (t(182) = 2.15, p = .03, d = .32; Bonferroni p = .11), and no effect in Years 2 or 3 (Year 2: t(185) = 1.39, p = .17, d = .20; Year 3: t(184) = 1.12, p = .26, d = .17). For Linear r², there was no effect in Years 1 or 2 (Year 1: t(182) = .598, p = .55, d = .09; Year 2: t(185) = .38, p = .71, d = .06), though there was a significant, uncorrected, effect in Year 3 (t(184) = 2.34, p = .02, d = .35; Bonferroni p = .06). Although there were trends indicating a relation between Intervention Condition and estimation performance (e.g., see Figure 1), none of these comparisons reached significance when correcting for the number of comparisons conducted. Further, there were no overall trends of improvement in estimation performance over time in relation to abacus training.
To summarize: These analyses suggest that an intervention that improved math performance did not significantly improve ANS acuity. Also, analyses failed to find evidence for a benefit of training to estimation performance, though we lacked pre-intervention estimation data that could allow us to definitively test whether the intervention affected estimation. While our results thus far are not consistent with the view that math training causes improvements to the ANS, they leave open the opposite possibility that ANS acuity might still be related to mathematics ability. To assess this, our next set of analyses tested whether ANS acuity and estimation performance were related at all to math achievement, when other cognitive measures were considered. Critically, the analyses that follow do not hinge on the particularities of the abacus intervention, and instead ask about the relation between the ANS and math independent from math training.
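The Bonferroni adjustment applied to the estimation comparisons above multiplies each uncorrected p-value by the number of comparisons and caps the result at 1. The sketch below assumes three comparisons per measure (one per year), which is consistent with an uncorrected p of .03 becoming .09; the number of comparisons is not stated explicitly in the text, and the reported values were presumably computed from unrounded p-values, so they need not match products of the rounded values exactly.

```python
def bonferroni(p_value, n_comparisons):
    """Bonferroni-adjusted p-value: p times the number of comparisons, capped at 1."""
    return min(1.0, p_value * n_comparisons)


# Hypothetical illustration with rounded, uncorrected p-values for one measure,
# assuming one comparison per year (three in total).
uncorrected = {"Year 1": 0.10, "Year 2": 0.03, "Year 3": 0.14}
adjusted = {year: bonferroni(p, len(uncorrected)) for year, p in uncorrected.items()}
```

Under this adjustment, an uncorrected p-value must fall below .05 / 3 (about .017) to remain significant at the .05 level.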

Relation Between Math Achievement and the ANS
To test whether ANS acuity was related to math performance across intervention groups, we constructed regression models predicting standardized math scores from ANS acuity. For simplicity, we fit these models for each year separately. Following the logic of previous studies in this literature, these models test whether ANS acuity predicts concurrent math achievement. For these and all subsequently reported models, we scaled all predictors in order to compare the relative predictive value of each parameter in the models directly (since all betas and standard errors are in standard units).
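The effect of scaling can be seen in a minimal single-predictor sketch. Variables are converted to z-scores before the slope is estimated, so the coefficient no longer depends on the original units. Scaling the outcome as well as the predictor is an assumption here (the text states only that predictors were scaled), and the function names are illustrative.

```python
def zscore(values):
    """Center values at 0 and scale to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in values]


def standardized_slope(predictor, outcome):
    """Least-squares slope after z-scoring both variables.

    With a single predictor this equals the Pearson correlation.
    """
    x = zscore(predictor)
    y = zscore(outcome)
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
```

Because the inputs are standardized, multiplying a predictor by a constant (for example, expressing a Weber fraction as a percentage) leaves the slope unchanged, which is what allows coefficients for differently scaled measures to be compared directly.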
Replicating previous research, we found that ANS acuity was a concurrent predictor of standardized math scores for Years 0, 1, and 3 (see Table 2 for B, SE, and p), though we did not observe a significant relationship in Year 2. ANS was also a concurrent predictor of our math PC1 in Years 0, 2, and 3, but not in Year 1 (Table 2). Our alternative measure of ANS acuity (hardest ratio reached) showed that ANS was a concurrent predictor of PC1 in all years. Thus, while we found inconsistent evidence, the majority of our correlations revealed a concurrent predictive relation between ANS and math, replicating previous results.
Note. Each row contains two models, so there are (for example) eight Year 3 models presented in the table.

Relation Between Math Achievement and Estimation
We next asked whether estimation performance concurrently predicted math success. Each of our estimation measures concurrently predicted standardized math scores in Years 2 and 3, but not in Year 1 (see Table 2 for statistics). Ordinality was a concurrent predictor of our math PC1 in Years 2 and 3, while both PAE and Linear r² concurrently predicted our math PC1 every year. Thus, as in previous work, we find that estimation performance predicts concurrent math achievement. More interesting, however, is whether this relationship survives the addition of a large battery of domain-general measures.

Relation Between Domain-General Cognitive Mechanisms, ANS, Estimation, and Math
Having replicated past work showing that ANS and estimation ability are predictive of concurrent math achievement, we next asked whether such predictive relations were uniquely numerical, or whether they could be explained by domain-general cognitive abilities. To do this, we predicted math outcomes (our standardized math score composite and PC1) from numerical predictors (ANS and estimation performance) and from our battery of non-numerical measures (mental rotation, spatial WM, verbal WM, Raven's, age, and Intervention Condition; additional information about each of these measures is available here: https://github.com/langcog/mentalabacus). We constructed one model per year, per dependent variable. We report the results of our models (standardized betas and SEs) in Figure 2 (predicting standardized test scores) and Figure 3 (predicting PC1). We also experimented with a number of more sophisticated techniques, including longitudinal growth modeling with lagged predictors. Unfortunately, the combination of missing data and the relatively small number of longitudinal data points relative to the large number of possible predictors made these analyses difficult to interpret, though none contradict the results we report here.
[...] all predicted standardized math scores with relative consistency (Figure 1), the ANS and estimation measures did not. For example, when controlling for non-numerical tasks, ANS only predicted standardized test scores in Year 0 (B = -.24, SE = .08, p = .003, all other p > .05). Linear r² and PAE never significantly predicted standardized test scores (all p > .05; but see Figure 1 for some evidence that Linear r² may have some predictive power), and Ordinality only predicted standardized test scores in Year 2 (B = .15, SE = .07, p = .04). To summarize, our non-numerical measures consistently predicted standardized test scores, while our numerical ANS and estimation measures only inconsistently contributed to predicting standardized test scores (once we controlled for other factors).
Neither Ordinality nor PAE ever predicted PC1 when controlling for other factors (all p > .05; but see Figure 3 for some evidence that Ordinality may have had predictive value). Interestingly, Linear r² did significantly predict PC1 in both Year 2 (B = .19, SE = .07, p = .007) and Year 3 (B = .17, SE = .07, p = .008), though not in Year 1 (p > .05). Once again, with the exception of Linear r², our estimation measures typically failed to predict our math PC1 once other control variables were included in the model. Further, our ANS measure did not consistently predict math PC1 when controlling for other factors.

Discussion
We tested whether nonverbal number (ANS) acuity and verbal estimation ability were uniquely predictive of symbolic math achievement. Specifically, we assessed (1) whether improvements to math performance caused changes to ANS acuity; (2) whether relations between math performance, ANS, and estimation were consistent over time; and (3) whether relations between math performance, ANS, and estimation persisted over time when controlling for performance on a battery of non-numerical control tasks. To test these questions, we conducted new analyses of data from a three-year-long randomized controlled math intervention, Barner et al. (2016) having reported that this intervention improved math substantially relative to the control group. Unlike any previous study, this dataset allowed us here to test both correlations and possible causal relations between formal math ability and informal numeracy measures, and to do so in a way that robustly controlled for non-numerical cognitive abilities. While previous studies have reported conflicting results using a host of less robust designs, we both replicated positive findings and showed that, in our data, these results are explained almost entirely by domain-general factors.
We first asked whether changes to math performance caused longitudinal changes to ANS acuity. While previous studies tested this possibility indirectly (e.g., by comparing ANS acuity between educated and uneducated participants; Pica et al., 2004), or tested the opposite causal direction (whether improving ANS performance improves math; Park & Brannon, 2013, 2014), our study was unique in that it asked whether an RCT math intervention (which improved math performance) affected participants' ANS acuity. We found no compelling evidence that this successful math intervention improved ANS acuity, or led to differences in estimation performance. These findings militate against one class of possible causal relations between the ANS and mathematics achievement, namely those whereby changes in math achievement cause changes that result in greater ANS acuity. One limitation of our conclusions, however, is that mental abacus differs in many ways from other methods for improving math performance: mental abacus training appears to recruit cognitive skills not directly related to math (Barner et al., 2016; Frank & Barner, 2012), and abacus processing activates both regions of the brain associated with numerical processing and several additional regions (like those associated with visuospatial processing; Du et al., 2013). It therefore remains possible that, because of the properties of mental abacus training, our findings do not generalize to other forms of math training.
Next, we sought to replicate and then explain previous work which found that ANS and estimation ability are related to formal math skill. Consistent with some previous studies, we found that both ANS acuity and estimation performance served as concurrent predictors of math success. However, we also found that this predictive relation was attenuated substantially when other, non-numerical predictors were included in the model. In fact, non-numerical measures like Raven's, mental rotation, and verbal working memory were very strong predictors of math outcome, whereas ANS acuity and estimation were not. With the exception of a small subset of our analyses (ANS acuity in Year 0 and Linear r² in Years 2 and 3), we found little evidence that our ANS and estimation measures uniquely predicted math outcomes when controlling for other cognitive abilities. This finding supports the conclusion that the relation between ANS acuity and symbolic math performance is often weakest in the early elementary school years (Fazio et al., 2014), and is consistent with work showing that this relationship may be mediated by other, non-ANS-related factors (e.g., Gilmore et al., 2013; Göbel et al., 2014; Holloway & Ansari, 2009).
Why might correlations between estimation, ANS acuity, and mathematics achievement disappear when controlling for other cognitive capacities? One likely explanation is that tasks that measure ANS acuity and estimation also depend on capacities like spatial working memory, and domain-general abilities like comparison, analogy, and perhaps even proportional reasoning; all of these skills have been implicated in mathematics or estimation performance (Alloway & Passolunghi, 2011; Barth & Paladino, 2011; Clark, Pritchard, & Woodward, 2010; DeStefano & LeFevre, 2004; Geary, 2011; Link, Nuerk, & Moeller, 2014; Passolunghi et al., 2014; Sullivan & Barner, 2014a, 2014b; Thompson, Nuerk, Moeller, & Cohen Kadosh, 2013). Similarly, the visual demands of most ANS tasks (e.g., Clayton et al., 2015) leave open the possibility that visual processing ability might partially mediate the link between ANS acuity and math performance.[iii] Because the ANS and estimation tasks that have previously been shown to predict math skill also depend on non-numerical cognitive abilities, and because few past studies thoroughly measured these factors, these previous findings may be driven in part by confounding non-numerical factors.
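The confounding account above can be made concrete with a small simulation. The sketch below assumes a hypothetical generative story in which a single domain-general ability contributes to both an ANS-like score and a math score, with no direct link between the two; all variable names, effect sizes, and data are invented for illustration and are not estimates from this study.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000  # large synthetic sample so the pattern is stable

# Assumed generative story: one domain-general ability drives both scores,
# and the ANS-like score has no direct effect on math.
general = rng.normal(size=n)
ans_score = 0.6 * general + rng.normal(scale=0.8, size=n)
math_score = 0.7 * general + rng.normal(scale=0.7, size=n)


def zscore(v):
    """Center and scale a 1-D array to unit sample variance."""
    return (v - v.mean()) / v.std(ddof=1)


def standardized_betas(predictors, outcome):
    """Standardized OLS coefficients for a list of predictors (intercept dropped)."""
    columns = [np.ones(len(outcome))] + [zscore(p) for p in predictors]
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, zscore(outcome), rcond=None)
    return coef[1:]


# ANS-like score as the only predictor (equivalent to a zero-order correlation)
beta_alone = standardized_betas([ans_score], math_score)[0]
# Same predictor once the domain-general ability is also in the model
beta_controlled = standardized_betas([ans_score, general], math_score)[0]

print(f"ANS beta, no controls:   {beta_alone:.2f}")
print(f"ANS beta, with control:  {beta_controlled:.2f}")
```

Under these assumptions, the ANS-like score predicts math on its own, but its standardized coefficient shrinks toward zero once the domain-general ability enters the model, which is the pattern a confound of this kind would produce.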
Alternatively, some have argued that the ability to use ANS representations to perform approximate math computations (e.g., the ability to nonverbally "add" quantities), rather than ANS acuity itself, is most predictive of math achievement (Park & Brannon, 2013). Consistent with this, recent interventions on ANS acuity have shown that, at least in adults, training the use of the ANS during approximate (nonverbal) arithmetic is more effective than training simple nonverbal numerical comparisons (like those tested in our study; Park & Brannon, 2013). These findings raise the possibility that there is a privileged relation between the ANS and math outcomes, and that our study simply failed to detect it. However, our results raise several additional questions. First, our intervention results suggest that improvements to ANS acuity are not required in order for improvements in math performance to occur, thus leaving open the question of why researchers so frequently find correlations between ANS acuity and math performance. Second, the mechanism by which practicing approximate addition might affect symbolic math remains unclear.
Before concluding, we note that while our study serves as an important extension of existing work on the ANS, it remains possible that studies in other populations will yield different results. As in all psychological research, the characteristics of the participants in our sample may limit our ability to draw generalizable inferences about the general human population (Henrich et al., 2010). Our participants were low-income students whose performance on a variety of tasks was below both Indian and US norms (Barner et al., 2016). Thus, the predictive relations between approximate and symbolic mathematics may be different in samples drawn from other populations. Finally, as already noted, to the extent that the mental abacus intervention failed to affect ANS representations, this may be particular to mental abacus, which is a unique math training program known to recruit non-mathematical cognitive skills and neural regions (Barner et al., 2016; Du et al., 2013). Thus, the lack of a causal effect of math training on ANS acuity may not generalize to children learning mathematics using standard methods.
To conclude, while we replicated past findings that ANS and estimation ability are concurrently predictive of math success, we failed to find evidence that changes to math skill caused changes to ANS or estimation performance. We also failed to find consistent evidence that ANS and estimation performance uniquely predicted math success. In fact, the strongest predictors of math performance were our non-numerical cognitive predictors, like Raven's, verbal working memory, and mental rotation. These data suggest that, while approximate measures of numerical competence (e.g., ANS acuity and estimation) may be related to math success, this relationship is likely fragile, and is one among many that predict mathematics achievement. More informative predictors of math achievement include domain-general capacities like working memory, mental rotation, and general intelligence.

Notes
i) Using our alternative measure of ANS acuity, we found that ANS was a concurrent predictor of standardized math scores for Years 0, 2, and 3, but not for Year 1, and that ANS was a concurrent predictor of PC1 across all years.
ii) Our alternative measure of ANS acuity (hardest ratio tested) predicted PC1 in Years 0 and 1.
iii) A deeper critique, based on the evidence that individuals' scores on different ANS tasks often fail to correlate with one another, is that the non-numerical properties of ANS stimuli (including the controls used, whether trials get increasingly more difficult over time, and the visual properties of the stimuli; Clayton et al., 2015; Inglis & Gilmore, 2014; Smets et al., 2014) may overwhelm the signal of the ANS in ANS assessments. Consistent with this view, even within our dataset, our two measures of ANS acuity differed slightly from one another in predicting math performance.

Figure 1. ANS acuity (Years 0, 1, 2, and 3) and estimation performance (Years 1, 2, and 3; we did not test estimation in Year 0). Red line indicates children who learned abacus; black line indicates children who were in the control group. For Proportion Absolute Error (PAE) and ANS measures, smaller numbers indicate better performance. For our Linear r² and Ordinality measures, larger numbers indicate better performance. Error bars are standard errors.

Figure 2. Standardized beta weights (bars are standard errors) when predicting standardized math scores from each of our predictors (ANS, PAE, Linear r², and Ordinality), controlling for other non-numerical tasks. Each cell represents the results of a single model output, such that the results of 13 models are depicted. Columns represent years of test (Y0-Y3).

Figure 3. Standardized beta weights (bars are standard errors) when predicting our math PC1 from each of our predictors (ANS, PAE, Linear r², and Ordinality), controlling for other non-numerical tasks.
longitudinal dot-array estimation data, by measuring children's ability to label arrays of dots with number words. These data allowed us to test the relationship between estimation and math achievement. Given that ANS representations are known to be linked to number words in the verbal count list (e.g., Le Corre & Carey, 2007; Mundy & Gilmore,

Table 1
Year-to-year reliability for each of our estimation and ANS measures.