Identifying Domain-General and Domain-Specific Predictors of Low Mathematics Performance: A Classification and Regression Tree Analysis

Abstract
Many children struggle to successfully acquire early mathematics skills. Theoretical and empirical evidence has pointed to deficits in domain-specific skills (e.g., non-symbolic mathematics skills) or domain-general skills (e.g., executive functioning and language) as underlying low mathematical performance. In the current study, we assessed a sample of 113 three- to five-year-old preschool children on a battery of domain-specific and domain-general factors in the fall and spring of their preschool year to identify Time 1 (fall) factors associated with low performance in mathematics knowledge at Time 2 (spring). We used the exploratory approach of classification and regression tree analyses, a strategy that uses step-wise partitioning to create subgroups from a larger sample using multiple predictors, to identify the factors that were the strongest classifiers of low performance for younger and older preschool children. Results indicated that the most consistent classifier of low mathematics performance at Time 2 was children’s Time 1 mathematical language skills. Further, other distinct classifiers of low performance emerged for younger and older children. These findings suggest that risk classification for low mathematics performance may differ depending on children’s age.

domain-specific skill has been questioned (Leibovich & Ansari, 2016). In contrast to the ANS, less work has been conducted on the OTS, but this system appears to be active beginning in infancy, and OTS deficits may be suggestive of later difficulties in mathematics (Andersson & Östergren, 2012; Mou & vanMarle, 2014).
Beyond the ANS and OTS, researchers have indicated that challenges in acquiring specific aspects of the symbolic numeracy system, such as cardinal number knowledge (Geary & vanMarle, 2016) and the numeral system (Göbel, Watson, Lervåg, & Hulme, 2014; Merkley & Ansari, 2016), may underlie difficulties in mathematics. Some have even suggested that these components of early numeracy are more related to low mathematics performance than the ANS as they are more representative of school-taught mathematics skills (De Smedt, Noël, Gilmore, & Ansari, 2013). It may be that more formal, school-taught skills are developmentally dependent on these aspects of early numeracy, as the relation between informal skills (e.g., cardinality) and formal skills (e.g., addition) is mediated by numeral knowledge (Purpura, Baroody, & Lonigan, 2013), such that a deficit at any point in that development may result in subsequent deficits. As such, difficulties in the acquisition of these symbolic numeracy skills may play a role in later low performance on measures of broader mathematics skills.

Domain-General Factors
A number of domain-general factors also have been associated with low mathematics performance, including broad structures of executive functioning and literacy skills as well as specific components of each of these domains, and particularly components such as working memory and language.
Executive functioning - Executive functioning, broadly, refers to the cognitive processes needed to complete a specific task or goal (Blair, Ursache, Greenberg, Vernon-Feagans, & the Family Life Project Investigators, 2015; Ponitz, McClelland, Matthews, & Morrison, 2009). It has been found to be an important predictor of academic success, particularly in mathematics (Allan, Hume, Allan, Farrington, & Lonigan, 2014; Fuhs, Nesbitt, Farran, & Dong, 2014; McClelland, Acock, & Morrison, 2006). Three primary components comprise executive functioning: working memory, response inhibition, and attention shifting (Lehto, Juujärvi, Kooistra, & Pulkkinen, 2003; Miyake et al., 2000). Some prior evidence indicates that components of executive functioning may be differentially related to distinct components of mathematics (Lan, Legare, Ponitz, Su, & Morrison, 2011; Purpura, Schmitt, & Ganley, 2017). In particular, much of the research focused on mathematical difficulties and components of executive functioning has linked domains such as working memory and other aspects of a general cognitive system with low mathematics performance (Raghubar, Barnes, & Hecht, 2010; Swanson & Jerman, 2006; cf. Landerl, Bevan, & Butterworth, 2004). Notably, the domain-general cognitive deficit hypothesis (Geary, 2004) proposes that children with low mathematical performance have a deficit in their underlying domain-general cognitive system (e.g., executive functioning, processing speed), which limits their ability to effectively acquire early mathematics skills.
Given the importance of language skills early on in children's mathematics development, it is important to note that evidence from other recent studies (Purpura & Logan, 2015; Purpura & Reid, 2016; Toll & Van Luit, 2014a, 2014b) indicates that a specific component of language that overlaps with both domain-general and domain-specific skills, namely mathematical language (e.g., knowledge of words such as "many," "few," "near," "before"), is actually a stronger and more proximal predictor of mathematics performance than general language skills and other cognitive skills. Deficits in such knowledge, rather than in general language skills, may underlie deficits in mathematical performance.

Trees
Given that domain-specific factors and domain-general factors have been independently associated with overall mathematics performance and low mathematical performance in particular, it is evident that deficits in each area and deficits in broad mathematical ability are likely to co-occur. However, separately assessing the association between each domain and mathematical ability does not capture each domain's relative importance as a predictor or risk factor for low mathematical performance. A number of methods (e.g., discriminant analysis, logistic regression) have been developed to utilize multiple domains to predict risk status for learning difficulties in general (Hosmer & Lemeshow, 1989; Lachenbruch, 1975). One underutilized analytic method that could be used to identify risk factors for classifying children at risk of low mathematics performance is CART (Breiman, Friedman, Olshen, & Stone, 1984). Though a number of studies have used CART analyses in reading difficulty classification (Compton, Fuchs, Fuchs, & Bryant, 2006; Koon, Petscher, & Foorman, 2014), this method can also be applied to identifying children at risk of low mathematics performance.
CART is a strategy that uses step-wise partitioning to create increasingly homogeneous subgroups from a heterogeneous sample using a variety of predictor variables (Gruenewald, Mroczek, Ryff, & Singer, 2008; Speybroeck, 2012). When the outcome variable is continuous, such as mathematics score, t-tests derived from F tests determine splits that maximize the difference in average mathematics scores between subgroups (Gruenewald et al., 2008). This continues until the predictors can no longer effectively split the subgroups.
Terminal subgroups are designated as "risk" or "no risk" based on the average mathematics score for the children in the subgroup. Each subgroup, whether or not it is split, is labeled as a "node." A node that cannot be split is termed a terminal node. The results resemble a tree structure such as the example in Figure 1 (note this is an example and does not use actual data). In this example, a full sample of 100 children was first split based on Variable 1. Children with a score less than or equal to 14 had the lowest average Time 2 mathematics score (Terminal Node 1; M = 14.52). Terminal Nodes 3 and 4 suggest that the combination of Variable 1 and Variable 2 was also associated with Time 2 mathematics score, such that children with a combination of high levels of Variables 1 and 2 had the highest average Time 2 mathematics score (Node 4; M = 21.67), and those with lower levels of Variable 2 had a lower average Time 2 mathematics score (Node 3; M = 18.20). In other words, this tree suggests that a low level of Variable 1 is a risk factor for low mathematics performance, whereas a high level of Variable 1 may serve as a protective factor for children with low levels of Variable 2. Importantly, these pathways are interpreted as combinations of factors, not a sequence of events (Gruenewald, Seeman, Ryff, Karlamangla, & Singer, 2006). CART is beneficial because it illuminates nonlinear pathways among the many factors included in the model. However, because a single tree structure is sensitive to changes in the sample from which it is created, "forests" of trees (i.e., all identified trees that account for significant variance in classification) may provide insight as to which factors consistently predict the outcome across many trees, rather than relying on a single tree (Strobl, Malley, & Tutz, 2009). CART analyses produce a series of potential trees (i.e., a forest of trees) that illuminate all reasonable potential classification modes.
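As a rough illustration of how one partitioning step can work for a continuous outcome, the sketch below searches every candidate cut point on a single predictor and keeps the one with the largest two-group F statistic. It is a minimal sketch under stated assumptions, not the SPSS procedure: the function names (`f_statistic`, `best_split`), the `min_node` parameter, and the toy scores are all hypothetical.

```python
from statistics import mean


def f_statistic(left, right):
    """One-way ANOVA F statistic for a two-group split (df = 1 and n - 2)."""
    all_scores = left + right
    grand = mean(all_scores)
    mean_left, mean_right = mean(left), mean(right)
    ss_between = (len(left) * (mean_left - grand) ** 2
                  + len(right) * (mean_right - grand) ** 2)
    ss_within = (sum((y - mean_left) ** 2 for y in left)
                 + sum((y - mean_right) ** 2 for y in right))
    if ss_within == 0:
        return float("inf")
    return ss_between / (ss_within / (len(all_scores) - 2))


def best_split(predictor, outcome, min_node=2):
    """Return (threshold, F) for the cut 'predictor <= threshold' with the largest F."""
    best = (None, 0.0)
    # The largest observed value cannot be a cut point (the right group would be empty).
    for threshold in sorted(set(predictor))[:-1]:
        left = [y for x, y in zip(predictor, outcome) if x <= threshold]
        right = [y for x, y in zip(predictor, outcome) if x > threshold]
        if len(left) < min_node or len(right) < min_node:
            continue  # skip splits that would create very small nodes
        f_value = f_statistic(left, right)
        if f_value > best[1]:
            best = (threshold, f_value)
    return best


# Toy scores invented for illustration; these are not study data.
variable_1 = [10, 12, 13, 14, 16, 18, 20, 22]
math_t2 = [13, 15, 14, 16, 19, 21, 20, 22]
threshold, f_value = best_split(variable_1, math_t2)
```

A full tree would repeat this search within each resulting subgroup and across all predictors, stopping when no split reaches significance.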

Current Study
Given the early emergence of individual differences in mathematics development, there is a critical need to take a broader view of early risk assessment for mathematics difficulties. Both domain-specific and domain-general factors have been implicated in predicting risk of low mathematics performance. However, limited research has examined risk classification methods from a multivariate perspective that includes a wide range of factors, particularly with preschool-age children. CART analyses are an appropriate method for evaluating such relations.
To examine the classification utility of a range of domain-general and domain-specific factors, we utilize a dataset that includes fall of preschool (i.e., Time 1) assessments of background demographics, numeracy skills, ANS, literacy skills, executive functioning domains, and basic processing speed to conduct a CART analysis to identify strong classifiers of low mathematics performance in the spring of the preschool year (i.e., Time 2). We take an exploratory approach in this study given the mixed evidence on antecedents to low mathematical performance classification and that CART analyses are data-driven and exploratory by nature. However, given that specific mathematical skills and concepts develop in a connected and cumulative manner, called a learning trajectory, and that these domain-specific and domain-general factors likely play a role in mathematical acquisition for more targeted components of mathematics development (Lan et al., 2011; Purpura & Napoli, 2015; Purpura, Schmitt, et al., 2017), we expected different factors to predict low mathematics performance for younger preschool children and older preschool children. Importantly, we are defining the classification outcomes as "low mathematical performance" rather than mathematical learning disabilities because of the exploratory nature of the study. We also note that prior work using this dataset (Purpura & Logan, 2015) has examined longitudinal predictors of general mathematics performance; however, the current study represents a novel approach to examining classification of low performance in mathematics as it focuses on data-driven classification strategies of children at risk for later low performance in mathematics.

Method

Participants
Data were collected in 12 private preschools in the Midwest of the United States serving children from a range of socioeconomic statuses. Parents of 136 preschool children completed the study consent forms. Seven children did not assent to participate, four left their school before testing was completed, one did not complete all key measures, and eleven left their schools before the spring assessment. Children who left the study after the fall assessment were not significantly different from completers on age, F(1, 123) = 0.25, p = .615, or fall mathematics performance, F(1, 123) = 3.29, p = .072. Of the 113 children who completed the fall and spring assessments, 54.0% were female. Children were 3.12 to 5.26 years old (M = 4.16, SD = .59). Students were representative of local demographics (72.6% Caucasian, 1.8% African American, 1.8% Hispanic, 8.8% Asian, 15.0% multi-racial/other). With regard to parental education, 23.0% of participants had parents with a high school education or less, 31.8% had at least one parent with a college degree, and 45.2% had at least one parent with a postgraduate degree.

Measures

Early Mathematics
The Preschool Early Numeracy Skills Test - Brief Version (PENS-B) is a 24-item numeracy task that takes approximately five minutes to administer. In the current sample, the measure had strong internal consistency (Cronbach's α = .93). The 24 items are representative of the broad range of early numeracy skills children are expected to learn in preschool and kindergarten. Specific areas of assessment include: set comparison, numeral comparison, one-to-one correspondence, number order, numeral identification, ordinality, and number combinations. Children answered all 24 items and received one point for each correct answer. However, an empirically-derived ceiling rule was applied during scoring where, after a child incorrectly responded to 3 items in a row, s/he did not receive points for subsequent items (results were comparable regardless of the use of the ceiling rule, but the rule was utilized to be consistent with the developed standards for the measure; Purpura, Reid, Eiland, & Baroody, 2015). The test has been shown to have evidence of high reliability using both classical test theory and item response theory metrics (Purpura et al., 2015). Furthermore, the measure has evidence of convergent validity, as it is highly correlated with the Test of Early Mathematics Ability - 3rd Edition (TEMA-3; r = .73), and evidence of discriminant validity, as it is more related to another measure of numeracy skills than it is to a measure of early literacy skills (Purpura et al., 2015).
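The ceiling rule described above can be sketched in a few lines. This is an illustrative reading of the rule as stated (no credit after three consecutive incorrect responses), not the published scoring software; the function name `score_with_ceiling` and the sample response pattern are hypothetical.

```python
def score_with_ceiling(responses, ceiling_run=3):
    """Sum correct responses, ignoring every item after `ceiling_run` consecutive errors."""
    total = 0
    consecutive_errors = 0
    for correct in responses:
        if correct:
            total += 1
            consecutive_errors = 0
        else:
            consecutive_errors += 1
            if consecutive_errors == ceiling_run:
                break  # ceiling reached: later items earn no points
    return total


# Hypothetical response pattern (True = correct), invented for illustration.
responses = [True, True, False, True, False, False, False, True, True]
raw_score = sum(responses)                     # every correct answer counted
ceiling_score = score_with_ceiling(responses)  # correct answers before the ceiling
```

In this pattern the two correct answers after the run of three errors count toward the raw score but not toward the ceiling-rule score.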

Approximate Number System
The Panamath Test (Halberda et al., 2008; www.panamath.org) was used to assess the ANS. Children were presented with a series of arrays on the computer screen and had to identify quickly which of two sets of dots was bigger. The program was set to 5 minutes and 4 years old as the default setting across all children to ensure all participants received comparable assessments. The default settings included four ratio bins (1.25 to 1.46, 1.46 to 1.71, 1.78 to 2.09, and 2.71 to 3.18) from which equal distributions of items were selected.
Percent correct (accuracy) was used as the measure of children's ANS.

Language and Literacy Skills
Children were assessed on four measures of language and literacy skills, including the three subtests of the Test of Preschool Early Literacy Skills (TOPEL; Lonigan, Wagner, Torgesen, & Rashotte, 2007) and a measure of mathematical language (Purpura & Logan, 2015). Cronbach's α = .87 for phonological awareness) according to the examiner's manual (Lonigan et al., 2007).

Mathematical language - The mathematical language subtest is an author-developed measure of mathematical content language that has strong evidence of internal consistency in this sample (Cronbach's α = .85). The measure includes 16 items assessing comparative language (e.g., combine, more, less, take away) and spatial language (e.g., near, far). Using an item response theory framework, these items were selected from a larger battery of items. The selected items had a range of difficulty parameters and strong discrimination parameters. All items were designed to be completed without exact quantitative skills and in a non-mathematics context. For example, the more/less questions were asked in multiple ways: (a) comparing dots with such a gross difference that, regardless of children's mathematics ability, if they knew the meaning of the language terms, they would be able to respond correctly (e.g., 10 vs. 2), and (b) using a picture of mostly full glasses and a mostly empty glass when asking "Which glass has only a little bit of water?"

Executive Functioning
Measures of each of the three primary executive functioning components (i.e., response inhibition, attentional shifting, and verbal working memory), as well as a broad measure of executive functioning, were used.

Purpura, Day, Napoli, & Hart 371
Response inhibition - A modified Stroop-like task was used to assess response inhibition (Gerstadt, Hong, & Diamond, 1994). Children were shown a page with pictures of suns and moons in a 5 x 6 layout and were asked to say "moon" when they saw a picture of a moon and "sun" when they saw a picture of a sun. They were then timed to see how many pictures they could respond to correctly in 45 seconds. Next, children were asked to repeat the task saying the opposite of the picture. Children were not allowed to continue on to the next picture until the previous picture was responded to correctly. The total score was the number of items completed on the "opposite" trial in 45 seconds.
Cognitive flexibility - A version of the Dimensional Change Card Sort task (Zelazo, 2006) was used as a measure of attention shifting. Children were asked to sort a variety of colored picture cards on the basis of three dimensions: color, shape, and size. One set of items was given for each of the three dimensions. Then, a fourth set of items that included two dimensions was given (i.e., children were told to sort on color if the card had a black border and on size if the card did not have a border). Children received one point for each correct response.
Verbal working memory - To assess working memory, we used the computerized listening recall task from the Automated Working Memory Assessment (AWMA; Alloway, 2007). In this task, children listened to one or more sentences, were asked whether each sentence was true or false, and then were asked to recall the last word of each sentence in order. The task increased in difficulty after each set of questions (in the first set, children responded with the last word of one sentence; in the next set, with the last words of two sentences; etc.). Children were explicitly informed that they would need to remember the last word(s). The children completed trials within each block until they did not recall any words from the sentences in the same block. Children were awarded one point for each correct last word they identified in the correct order. The outcome score was the total number of times a participant accurately recalled the last word in a sentence across trials. This test has been shown to have strong test-retest reliability (r = .88; Alloway, Gathercole, & Pickering, 2006).

Broad executive functioning - The Head-Toes-Knees-Shoulders (HTKS) was used as a measure that integrates cognitive flexibility, working memory, and response inhibition through a gross motor task (McClelland & Cameron, 2012; McClelland et al., 2014). On the first part of this measure, children were instructed to touch their toes when told to "touch your head" and vice versa. The measure includes three sections of ten items each, with the task becoming progressively harder. Possible scores range from 0 to 60, with a total of 30 test items receiving scores of 0 (incorrect), 1 (self-correct), or 2 (correct). Previous research indicates high interrater agreement and evidence of convergent and predictive validity of this measure in assessing children's executive functioning (McClelland et al., 2007; McClelland et al., 2014).
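The HTKS total described above (30 items, each scored 0, 1, or 2, for a range of 0 to 60) can be written out as a short scoring sketch. The names `ITEM_POINTS` and `htks_total` and the sample responses are hypothetical and only illustrate the arithmetic.

```python
# Points per item as described in the text: 0 (incorrect), 1 (self-correct), 2 (correct).
ITEM_POINTS = {"incorrect": 0, "self-correct": 1, "correct": 2}


def htks_total(item_responses):
    """Sum item points across the 30 HTKS test items (possible range 0 to 60)."""
    if len(item_responses) != 30:
        raise ValueError("expected 30 scored items (three sections of ten)")
    return sum(ITEM_POINTS[response] for response in item_responses)


# Hypothetical child: 12 correct, 5 self-corrected, 13 incorrect responses.
example = ["correct"] * 12 + ["self-correct"] * 5 + ["incorrect"] * 13
example_total = htks_total(example)
```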

Processing Speed
Rapid automatized naming (RAN) was used to measure processing speed as it has been shown to be highly related to both mathematics and literacy skills (Georgiou, Tziraki, Manolitsis, & Fella, 2013). RAN was assessed through a picture-naming task and a color-naming task. Children were initially asked to name four common pictures (house, cat, car, pig). They were then presented with a page of 40 pictures (5 x 8) and asked to name the pictures in order as fast as they could. If a child incorrectly named a picture or skipped a picture, they were redirected back to that picture. The total time (in seconds) it took to name all 40 pictures was recorded. After the picture task, the same task was repeated, but with colors (blue, red, green, black). The average of the two completion times was used as the child's RAN score. Higher scores on this task indicate slower processing speed.

Covariates
Three background variables were also used in the analyses. These covariates were age, sex, and highest parental education (scored on an 8-point scale ranging from less than 8th grade to doctoral/postgraduate degree).

Procedure

Assessment Procedure
Children were assessed on all tasks in the fall of the academic year in three or four 20-30 minute sessions. Testing was conducted in shorter sessions as needed. The testing was repeated in the spring (approximately five months later). Assessments took place in the preschools at times identified by the schools in a room or area designated by the school directors or teachers. Individuals who had either completed or were working toward completion of a Bachelor's degree in psychology, speech/language and hearing sciences, or human development conducted the assessments. All testers completed two 2-3 hour training sessions where they learned how to administer each of the assessments. Both individual and group practice sessions followed this training. After approximately one to two weeks, all testers completed a "testing-out" session where they administered the test to lead project staff in a mock testing session. During the testing-out session, lead project staff ensured all testers were fluent with the assessment measures and appropriately administered each task.

Analytic Strategy
Using IBM SPSS Statistics 22, we utilized classification and regression trees (also termed recursive partitioning) to explore the individual characteristics and combinations of characteristics that are associated with low mathematics ability. Analyses were initially conducted using the full sample; however, those analyses primarily produced group classifications that appeared largely based on age, given that all variables were raw scores and most do not have normative age standard scores. Therefore, we split the sample into two age groups at the median age: children aged 4.16 or younger (n = 58), and children older than 4.16 (n = 55). For each age group, we drew random subsamples to create forests of fifty trees. Classification and regression trees are particularly beneficial for small sample sizes because of the relaxed assumptions compared to more traditional regression analyses. Further, there are several benefits to using CART as a complement to more traditional regression techniques. First, CART is a tool that can be used to more easily explore and interpret nonlinear relations among variables (Gruenewald et al., 2008; Speybroeck, 2012). Second, it can be used to explore and interpret multi-level interactions (Gruenewald et al., 2008; Speybroeck, 2012). It also deals with a wide range of predictor variables (more variables are actually beneficial in CART analyses), and rather than estimating the average effect of variables on the outcome (as is the case in multiple regression analyses), it can estimate unique pathways for sub-groups of participants (Speybroeck, 2012). Finally, CART offers internal tests of the replicability of findings (described in detail below). Because multiple significance tests were used to create each tree, Bonferroni adjustments were made to reduce the chance of Type I error. To avoid overfitting the data (i.e., terminal nodes with 1 participant), we split continuous predictors into tertiles prior to analysis.
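The three preprocessing steps named above (a median split on age, tertile coding of continuous predictors, and a Bonferroni adjustment) can be sketched as follows. This is a minimal sketch under stated assumptions: the rank-based cut points, the handling of ties, and all function names and toy values are hypothetical, and SPSS may compute its bins differently.

```python
from statistics import median


def median_split(ages):
    """Split ages into a younger group (<= median) and an older group (> median)."""
    cut = median(ages)
    younger = [age for age in ages if age <= cut]
    older = [age for age in ages if age > cut]
    return cut, younger, older


def tertile_codes(scores):
    """Code each score 0 (low), 1 (middle), or 2 (high) using rank-based cut points.

    Assumption: ties at a cut point fall into the lower bin.
    """
    ordered = sorted(scores)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least three scores to form tertiles")
    low_cut = ordered[n // 3 - 1]
    high_cut = ordered[2 * n // 3 - 1]
    return [0 if s <= low_cut else 1 if s <= high_cut else 2 for s in scores]


def bonferroni_p(p_value, n_tests):
    """Bonferroni-adjusted p value: multiply by the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)


# Toy values invented for illustration; these are not study data.
ages = [3.2, 3.8, 4.0, 4.16, 4.5, 4.9, 5.1]
cut, younger, older = median_split(ages)
codes = tertile_codes([5, 9, 2, 14, 7, 11, 3, 8, 12])
adjusted = bonferroni_p(0.02, 4)
```

Coarsening predictors into three levels limits each predictor to at most two possible cut points, which is one way to keep terminal nodes from becoming very small.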
After generating forests, we used several criteria to select the final trees to represent each age group. We used a split-sample validation technique to test the robustness of each pathway; a training tree was grown in a randomly selected 60% of participants and then replicated in a test tree using the remaining 40%. Using t-tests, we compared mean levels of mathematics scores in each terminal node across the training and test trees.
Pathways that had significantly different mean scores were excluded from the final forest because they were determined to be nonreplicable, as were trees in which terminal nodes contained less than 10% of the sample.
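The split-sample check can be sketched as a random 60/40 partition followed by a comparison of a node's mean score in the two portions. This is a hedged sketch only: the text does not say which form of t-test was used, so Welch's unequal-variance statistic is an assumption, and the function names, the seed, and the toy scores are hypothetical. Converting the statistic to a p value would require a t distribution from a statistics library.

```python
import random
from statistics import mean, variance


def split_sample(ids, first_fraction=0.6, seed=0):
    """Randomly partition ids into a 60% portion and the remaining 40% portion."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_first = round(len(shuffled) * first_fraction)
    return shuffled[:n_first], shuffled[n_first:]


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# 58 hypothetical participant ids (the size of the younger group).
portion_60, portion_40 = split_sample(range(58))

# Toy node scores invented for illustration: the same terminal node in each portion.
node_scores_60 = [4, 5, 6, 5]
node_scores_40 = [4, 6, 5, 5]
t_value, df = welch_t(node_scores_60, node_scores_40)
```

A pathway whose node means differ significantly between the two portions would be treated as nonreplicable under the criterion described above.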
Finally, we calculated the proportion of variance explained by each tree. Trees that explained substantially low proportions of variance in either the training or test trees were excluded from the final forest for each age group.
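The equation itself is not reproduced in the text. One common definition of the variance explained by a tree, offered here only as an assumption about what may have been computed, is one minus the ratio of the within-node sum of squares to the total sum of squares. The function name `variance_explained` and the toy scores are hypothetical.

```python
from statistics import mean


def variance_explained(terminal_nodes):
    """Proportion of outcome variance explained by a tree's terminal nodes.

    terminal_nodes: list of lists, each holding the outcome scores in one terminal node.
    Assumed definition: 1 - SS_within / SS_total.
    """
    all_scores = [y for node in terminal_nodes for y in node]
    grand_mean = mean(all_scores)
    ss_total = sum((y - grand_mean) ** 2 for y in all_scores)
    ss_within = 0.0
    for node in terminal_nodes:
        node_mean = mean(node)
        ss_within += sum((y - node_mean) ** 2 for y in node)
    return 1 - ss_within / ss_total


# Toy terminal nodes invented for illustration; these are not study data.
r_squared = variance_explained([[13, 15, 14, 16], [19, 21, 20, 22]])
```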

Descriptive Statistics
Table 1 provides descriptive statistics and correlations for all study variables. Mathematics score at Time 1 was significantly correlated with all other variables for both age groups. Time 2 math was correlated with all other variables in both age groups with the exception of verbal working memory for the younger group. There were no significant differences in mathematics score at Time 1 or Time 2 between male and female students in either age group, Younger - Time 1: t(56) = 0.57, p = .570; Time 2: t(56) = 0.61, p = .545; Older - Time 1: t(53) = 0.97, p = .337; Time 2: t(53) = 0.50, p = .622.

Classification and Regression Tree Analyses
CART analyses resulted in nine trees for the younger age group and 12 trees for the older age group. Using the trees, Time 1 variables that classified both low and high performance at Time 2 were identified. Low performance was determined by identifying the terminal node with the lowest mean score in an individual tree, and high performance was determined by identifying the node with the highest mean performance in an individual tree. Though classification of high performance was not an explicit goal of this study, we present the data to show the consistency of classifiers for low and high mathematics performance. In Table 2, the variables that predict low and high mathematics performance for younger children are presented.
Table 2 note. T2 = Time 2; ML = Mathematical language; ANS = approximate number system; PK = Print knowledge; RI = Response inhibition; HTKS = Head, toes, knees, shoulders activity; CF = Cognitive flexibility. Variables with no significant predictors are not included in the table.
In Table 3, the variables that predict low and high mathematics performance for older children are presented. Variables that were included in the analyses but did not emerge as significant predictors for an age group are not included in the tables. Visuals of all of the trees can be found in the Appendix. Given the exploratory nature of the study, it is best to examine trends across the forest to identify factors that appear to be strong classifiers of low and high mathematics performance so as not to over-interpret the findings; however, we also describe the trees that accounted for the most variance for younger and older children to provide descriptive examples of the findings.

Younger Children
As can be seen in Table 2, for younger children, a total of seven variables (sex, mathematical language, ANS, print knowledge, response inhibition, HTKS, and cognitive flexibility) each appeared in at least one pathway predicting the lowest mathematical scores at Time 2. Individually and in combination with other factors, the most common predictors of low mathematics performance at Time 2 were Time 1 scores on mathematical language, print knowledge, and response inhibition. Similar patterns of predictors emerged for predicting high performance on Time 2 mathematics based on Time 1 variables.
The percentage of variance in Time 2 mathematics scores explained by the trees ranged from 11.49% to 53.39%. Tree 1 (Figure 2) accounted for the highest percentage of variance in mathematics scores at Time 2, and it also predicted the greatest discrepancy between high and low average mathematics scores. In this tree, a response inhibition score greater than 21 was associated with a high average mathematics score (15.67), whereas a response inhibition score less than or equal to 21 in combination with a mathematical language score less than or equal to 7 was associated with a low average mathematics score (4.43). This suggests that for children between the ages of 3.12 and 4.16, greater response inhibition is linked to greater mathematical ability, whereas the combination of lower levels of response inhibition and lower levels of mathematical language skills is a strong classifier of later low mathematical performance.
Table 3 note. T2 = Time 2; T1 = Time 1; ML = Mathematical language; ANS = approximate number system; PK = Print knowledge; DV = Definitional vocabulary; PA = Phonological awareness; VMW = Verbal working memory. Variables with no significant predictors are not included in the table.
Figure 2. The tree that accounted for the most variance in classifying younger children included nodes for both response inhibition and mathematical language such that the combination of low response inhibition and low mathematical language indicated those children who would be the lowest performers on mathematical knowledge at Time 2. Note that the test tree (40% of total sample) is presented, not the initial training tree (60% of sample).
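The pathways reported for this tree can be read as a nested decision rule. The sketch below encodes only the two cut points and node means stated in the text (response inhibition > 21; mathematical language ≤ 7); the function name, the labels, and the third branch, whose mean is not reported in the text, are hypothetical additions for illustration.

```python
def classify_younger_tree_1(response_inhibition, math_language):
    """Assign a Time 2 performance group from Time 1 scores, following the reported cuts."""
    if response_inhibition > 21:
        return "high"  # reported node mean at Time 2: 15.67
    if math_language <= 7:
        return "low"   # reported node mean at Time 2: 4.43
    return "other"     # implied remaining node; its mean is not reported in the text


# Hypothetical Time 1 scores for three children.
labels = [
    classify_younger_tree_1(25, 3),
    classify_younger_tree_1(21, 7),
    classify_younger_tree_1(10, 12),
]
```

Because the pathways describe combinations of factors rather than a sequence of events, the order of the checks in the code reflects only the tree layout, not a developmental ordering.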

Older Children
As can be seen in Table 3, for older children, a total of five different variables (Time 1 math, mathematical language, ANS, definitional vocabulary, and verbal working memory) each appeared in at least one pathway predicting the lowest mathematics scores at Time 2. Individually and in combination with other factors, the most common predictors of low mathematics performance at Time 2 were Time 1 scores on mathematics, mathematical language, and definitional vocabulary. Similar patterns of predictors emerged for predicting high performance on Time 2 mathematics based on Time 1 variables; however, print knowledge and phonological awareness each predicted one pathway, but the ANS and verbal working memory did not. Similar to predicting lower mathematics performance, Time 1 mathematics, mathematical language, and definitional vocabulary were the most common predictors of high performance.
The percentage of variance in Time 2 mathematics scores explained by the trees ranged from 12% to 44%. Trees 1 and 2 (Figure 3) accounted for the greatest percentage of variance in Time 2 mathematics score (about 44%). In both trees, lower mathematical language scores were associated with less mathematical ability at Time 2. In Tree 1, mathematical language scores greater than 14 were associated with the highest average Time 2 mathematics score (20.78). In Tree 2, however, children with mathematical language scores greater than 14 in combination with definitional vocabulary greater than 60 had the highest average mathematics score at Time 2 (22.25). These findings suggest that greater mathematical language and definitional vocabulary skills are linked to greater mathematical ability, whereas low mathematical language scores alone were a strong classifier of low mathematics performance.
Figure 3. The first tree includes a split just for mathematical language. The second tree also includes a node for mathematical language that is then split again based on general language, such that a combination of high mathematical language and high general language indicated high performance in mathematical knowledge at Time 2. Note that the test trees (40% of total sample) are presented, not the initial training trees (60% of sample).

Discussion
In this study, we examined a variety of domain-general and domain-specific factors to determine classification of low and high mathematical performance using CART analyses, a person-centered analytic strategy designed to highlight potential higher-order interactions among variables.One of the key findings from this study is that it appears that domain-general factors (e.g., language and executive functioning) seem to be better classifiers of low performance than are domain-specific predictors (e.g., ANS and numeracy skills), particularly for younger children.This is despite the strong correlations between domain-specific factors and mathematics performance at Time 1 and Time 2. Though general language was a strong classifier for older children, mathematical language was a more consistent and robust classifier across age groups.Mathematical language has been classified in this study as a domain-general variable, but it should be noted that it also highly overlaps with domain-specific skills as it is comprised of content-specific language.In prior work, it has been found to be similarly related to both general language (domain-general skills) and numeracy performance (domain-specific skills) (Purpura & Reid, 2016).
The pattern of results found in this study may be due to one of two potential reasons. First, mathematical language and executive functioning skills seemed to be the strongest and most consistent classifiers of low performance. Both of these domains have been reported to be foundational for early mathematics development (Blair & Raver, 2015; Purpura, Napoli, Wehrspann, & Gold, 2017; Schmitt, McClelland, Tominey, & Acock, 2015). Further, these findings align with evidence from twin studies that suggest that general genetic influences, potentially related to domain-general influences, underlie achievement across both general mathematics performance and mathematics difficulty (Plomin & Kovas, 2005). Given the limited time spent during the preschool year on mathematics skills in general (Piasta, Pelatti, & Miller, 2014), children deficient in either or both mathematical language and executive function may not have had the foundational skills necessary to benefit from that limited amount of instruction.
Second, it is evident that Time 1 mathematics performance is a better classifier of both low and high performance for older children than it is for younger children. This finding indicates that mathematics ability may be more stable for older preschoolers than for younger preschoolers. However, it may also be indicative of low performance in mathematics being the result of difficulties in non-mathematical areas such as language. These findings are supported by recent evidence using CART analyses with elementary school age children that found literacy and language skills were better classifiers of low mathematics performance than was initial mathematics performance (Truckenmiller, Petscher, Gaughan, & Dwyer, 2016). Further, the ANS was not a strong classifier of low performance, which is inconsistent with literature that suggests it is an underlying feature of mathematical difficulties (Mazzocco, Feigenson, & Halberda, 2011), but aligns with other evidence that indicates the ANS is not likely to be a core facet in the development of mathematics difficulties (De Smedt et al., 2013). It is possible that the ANS may be important in the development of symbolic mathematics skills, but only in a time- or developmental-phase-limited fashion. For example, Chu and colleagues (2015) found that the ANS was related to informal mathematics skills, but not formal, and Purpura and Logan (2015) found that the ANS was a classifier of low mathematics performance but not moderate or high. Given the results of the current study, it appears unlikely that the ANS underlies mathematics difficulties as it is not a strong classifier of risk status. It may be that, in prior work, the ANS was acting as a proxy for other more important variables such as response inhibition or mathematical language skills.
Of particular importance in this study, the most consistent predictor of low mathematics performance in both age groups at Time 2 was Time 1 mathematical language. Both parent and teacher mathematical language talk are associated with children's growth in mathematics skills before and during preschool (Boonen, Kolkman, & Kroesbergen, 2011; Levine, Suriyakham, Rowe, Huttenlocher, & Gunderson, 2010) and recent evidence suggests that mathematical language underlies the development of symbolic numeracy skills (Purpura, Napoli, et al., 2017). It may be that in classrooms or homes lacking such foundational inputs, children do not have access to the necessary foundation upon which to build their mathematical knowledge. This aligns with previous work identifying mathematical language as one of the strongest predictors of early mathematics skills (Purpura & Logan, 2015; Toll & Van Luit, 2014a) and evidence suggesting that language skills may exacerbate difficulties in mathematics (Hanich et al., 2001; Jordan & Hanich, 2000; Silver et al., 1999).
Finally, one noted difference between the classifiers for younger and older preschool children was that executive functioning skills (particularly response inhibition) and print knowledge were common classifiers for younger children and general vocabulary skills were a common classifier for older children. The finding for younger children is supportive of evidence indicating that executive functioning skills are foundational for other school readiness domains (Blair & Raver, 2015) and particularly for math (Schmitt, Pratt, & McClelland, 2014).
Further, the print system (connecting symbols to their names and quantities) has been identified as a critical connection between informal and formal mathematics skills, and children who have difficulties with the symbolic number system often have mathematical learning disabilities (Merkley & Ansari, 2016; Purpura et al., 2013; print skills, it may be acting as a proxy for more general code-related mapping skills (Koponen et al., 2013).
Finally, the finding for older children indicates that general language skills, in addition to content-specific mathematical language, may be related to low and high performance in mathematics, which suggests that multiple aspects of language, or more complex and advanced language, may be integral for the successful acquisition of mathematics skills in preschool. However, it is also possible that this measure of general vocabulary skills was acting as a proxy for more general ability such as IQ, which was not measured in this study.

Limitations and Future Directions
Though a relatively clear and logical set of classifiers of performance was identified across both older and younger children, there are a number of limitations that must be discussed. First, though there are no prespecified sample sizes necessary for CART analyses and they are often used with smaller samples, the sample still may have been too small to identify reliable additional splits after the first or second splits. Subsequent studies should include larger samples of children within these age ranges. Further, the median split utilized to separate the sample into two age groups still resulted in groups that were relatively close in age (i.e., approximately one year difference). To better test if risk classification for low mathematics performance may differ depending on children's age, contrasts between samples with slightly larger age differences (e.g., 3-year-olds vs. 5-year-olds) or analyses predicting risk for the same children across different time points (e.g., predicting risk for the same children at 3 years old and then again for the same sample at 4 years old) should be conducted in future research.
Second, the sample was of somewhat higher average parental education than the typical population.A more diverse sample, as well as a sample with lower overall mathematical abilities, may reveal additional nodes not identified in the current study.
Third, in this study we only utilized one broad measure of symbolic numeracy skills rather than measures of the diversity of subdomains within the construct (e.g., cardinality, numeral knowledge). Recent evidence has suggested that cardinal number knowledge (Geary & vanMarle, 2016) and knowledge of the formal numeral system (Göbel et al., 2014; Merkley & Ansari, 2016; Purpura, Baroody, & Lonigan, 2013) may be specific components of the symbolic system that, individually, may be stronger classifiers of risk for later difficulties than a broad general measure. Similarly, some evidence has indicated that the OTS may be a domain-specific skill that is important for symbolic mathematics development (Andersson & Östergren, 2012; Mou & vanMarle, 2014), but only at earlier stages in development (vanMarle, Chu, Mou, Seok, Rouder, & Geary, 2016). Unfortunately, a measure of the OTS was not included in the current study. Though the measure of symbolic numeracy skills was not revealed to be a consistent predictor of risk for younger children, more targeted components of numeracy may be unique predictors at different ages as the acquisition of these targeted skills may be indicators of critical junctions in mathematical development. Subsequent work should include a broader range of more targeted domain-specific measures.
Fourth, though we included a broad range of domain-general and domain-specific variables, a broader indicator of general IQ was not included. It is possible that IQ or other measures of general cognitive ability may be better classifiers of risk status as some evidence suggests that a significant portion of variance in mathematical development is accounted for by general cognitive abilities (Schenke, Rutherford, Lam, & Bailey, 2016).
Finally, though appropriate for the sample and as a first use of CART analyses in examining classification of low mathematics performance, these analyses were exploratory. CART analyses do, however, include an internal replication analysis (training versus test tree), which does significantly limit spurious findings. A number of trees were removed from the forest analyses because the replication across the test and training trees was not successful. Further follow-up studies are needed to replicate and extend these findings.
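The idea behind the training-versus-test check can be sketched as follows. This is a simplified illustration rather than the procedure used in this study: it fits a single cut point on a 60% training portion using a squared-error criterion, then asks whether the 40% test portion shows the same ordering of subgroup means. The function names (`fit_cut`, `leaf_means`, `replicates`) are hypothetical, the scores are synthetic, and the fixed index lists stand in for what would ordinarily be random assignment.

```python
from statistics import mean

def fit_cut(x, y):
    """Cut on x (x <= cut vs. x > cut) that minimises within-leaf squared
    error in y. Assumes x has at least two distinct values."""
    def sse(values):
        m = mean(values)
        return sum((v - m) ** 2 for v in values)

    best_cut, best_err = None, float("inf")
    for cut in sorted(set(x))[:-1]:  # dropping the max keeps both leaves non-empty
        low = [yi for xi, yi in zip(x, y) if xi <= cut]
        high = [yi for xi, yi in zip(x, y) if xi > cut]
        err = sse(low) + sse(high)
        if err < best_err:
            best_cut, best_err = cut, err
    return best_cut

def leaf_means(x, y, cut):
    """Mean outcome in each leaf, or None if either leaf is empty."""
    low = [yi for xi, yi in zip(x, y) if xi <= cut]
    high = [yi for xi, yi in zip(x, y) if xi > cut]
    if not low or not high:
        return None
    return mean(low), mean(high)

def replicates(train_x, train_y, test_x, test_y):
    """Fit the cut on the training portion, then check that the test portion
    shows the same ordering of leaf means."""
    cut = fit_cut(train_x, train_y)
    train_means = leaf_means(train_x, train_y, cut)
    test_means = leaf_means(test_x, test_y, cut)
    if test_means is None:
        return cut, False
    same_direction = (train_means[0] < train_means[1]) == (test_means[0] < test_means[1])
    return cut, same_direction

# Synthetic scores, invented for illustration only (not study data).
x = [8, 10, 11, 13, 14, 15, 16, 17, 18, 19]
y = [9, 11, 10, 12, 13, 19, 21, 20, 22, 23]

# Deterministic 60/40 assignment standing in for random assignment.
train_idx = [0, 2, 4, 5, 7, 9]
test_idx = [1, 3, 6, 8]
train_x = [x[i] for i in train_idx]
train_y = [y[i] for i in train_idx]
test_x = [x[i] for i in test_idx]
test_y = [y[i] for i in test_idx]

cut, ok = replicates(train_x, train_y, test_x, test_y)
print(f"training cut at > {cut}; replicated in test portion: {ok}")
```

A tree whose split fails this kind of check in the held-out portion would be a candidate for removal, which is the role the replication step plays in limiting spurious findings.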

Conclusions
Prior theoretical and empirical work examining predictors of low mathematical performance has yielded little consensus regarding the underlying causes of low mathematics performance, and some researchers have suggested that there may be multiple underlying causes of mathematical difficulties (Jordan et al., 2002; Mazzocco & Myers, 2003). The findings from this study support this notion that multiple factors, particularly mathematical language and executive functioning, may underlie low mathematics performance or at least be strong mechanisms for classifying children who are likely to have later low mathematics knowledge. However, and critically important, the factors associated with performance classification may differ by age. Future work examining more targeted classification strategies across ages is needed.

Figure 1. An example of a single recursive partitioning tree predicting math score at Time 2. Note that this figure is an example of a tree and is not based on real data.
This measure includes three subtests: print knowledge, definitional vocabulary, and phonological awareness. The print knowledge subtest measures print concepts, letter discrimination, word discrimination, letter-name identification, and letter-sound identification. It has three sets of items and each has a separate ceiling rule. There are a total of 36 items that are either multiple choice or free response and children are awarded one point for each correct response. The definitional vocabulary subtest measures children's single-word spoken vocabulary and their ability to formulate definitions for words. There are a total of 35 items for which children are shown a picture and asked to identify the object (e.g., "What is this?") and describe its function (e.g., "What is it for?"). Children are awarded one point each for identifying and describing for a maximum of 70 total points. The phonological awareness subtest includes both multiple choice and free response items involving blending and elision of words and sounds. There are a total of 27 questions divided into four sets. Two sets use multiple choice questions and the other two are free response. Children are awarded one point for each correct response. All three subtests have shown strong evidence of internal consistency (Cronbach's α = .95 for print knowledge, Cronbach's α = .94 for definitional vocabulary, and

exhaustive chi-square automatic interaction detector (Biggs, De Ville, & Suen, 1991) to generate trees predicting Time 2 mathematics score using an extensive range of covariates, mathematics score at Time 1, mathematical language score, Panamath percentage, print knowledge, definitional vocabulary, phonological awareness, verbal working memory, cognitive flexibility, response inhibition, HTKS, and RAN. Due to the

Purpura, Day, Napoli, & Hart. Journal of Numerical Cognition, 2017, Vol. 3(2), 365-399. doi:10.5964/jnc.v3i2.53

Below the diagonal are correlations for the younger sample (n = 55) and above the diagonal are correlations for the older sample (n = 58). ANS = approximate number system; RAN = rapid automatized naming; HTKS = Head, toes, knees, shoulders activity. *p < .05. **p < .01. ***p < .001.

Figure 3. The trees that accounted for the most variance in classifying older children included nodes for mathematical language. The first tree includes a split just for mathematical language. The second tree also included a node for mathematical language that is then split again based on general language such that a combination of high mathematical language and high general language indicated high performance in mathematical knowledge at Time 2. Note that the test trees (40% of total sample) are presented, not the initial training trees (60% of sample).

Table 1
Correlations and Descriptive Statistics for all Measures

Table 2
Forest of Trees for Time 1 Variables Predicting the Lowest and Highest Scores at Time 2 for Younger Children

Table 3
Forest of Trees for Time 1 Variables Predicting the Lowest and Highest Scores at Time 2 for Older Children