Research Reports

Identifying Domain-General and Domain-Specific Predictors of Low Mathematics Performance: A Classification and Regression Tree Analysis

David J. Purpura*[a], Elizabeth Day[a], Amy R. Napoli[a], Sara A. Hart[b][c]

[a] Department of Human Development and Family Studies, Purdue University, West Lafayette, IN, USA. [b] Department of Psychology, Florida State University, Tallahassee, FL, USA. [c] Florida Center for Reading Research, Tallahassee, FL, USA.

Journal of Numerical Cognition, 2017, Vol. 3(2), 365–399, https://doi.org/10.5964/jnc.v3i2.53

Received: 2016-05-28. Accepted: 2017-05-28. Published (VoR): 2017-12-22.

Handling Editors: Silke Goebel, University of York, York, United Kingdom; André Knops, Humboldt-Universität Berlin, Berlin, Germany; Hans-Christoph Nuerk, Universität Tübingen, Tübingen, Germany

*Corresponding author at: Department of Human Development and Family Studies, Purdue University, 1202 W. State Street, Rm. 231, West Lafayette, IN 47907-2055, USA. Phone: 1-765-494-2947. E-mail: purpura@purdue.edu

This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Many children struggle to successfully acquire early mathematics skills. Theoretical and empirical evidence has pointed to deficits in domain-specific skills (e.g., non-symbolic mathematics skills) or domain-general skills (e.g., executive functioning and language) as underlying low mathematical performance. In the current study, we assessed a sample of 113 three- to five-year-old preschool children on a battery of domain-specific and domain-general factors in the fall and spring of their preschool year to identify Time 1 (fall) factors associated with low performance in mathematics knowledge at Time 2 (spring). We used the exploratory approach of classification and regression tree analyses, a strategy that uses step-wise partitioning to create subgroups from a larger sample using multiple predictors, to identify the factors that were the strongest classifiers of low performance for younger and older preschool children. Results indicated that the most consistent classifier of low mathematics performance at Time 2 was children’s Time 1 mathematical language skills. Further, other distinct classifiers of low performance emerged for younger and older children. These findings suggest that risk classification for low mathematics performance may differ depending on children’s age.

Keywords: math, numeracy, preschool, risk status, classification and regression trees

Given the importance of foundational mathematical skills for the acquisition of later mathematical skills (Geary, 1994), it is critical that all children be provided with maximal opportunities to develop these foundational skills early in their academic careers. These opportunities are particularly important for those children who are most at risk for later difficulties. Unfortunately, many children enter preschool significantly behind their peers in foundational mathematical knowledge (Starkey, Klein, & Wakeley, 2004; Stevenson et al., 1990). Children with early mathematical deficits usually develop skills at a slower rate than their peers (Aunola, Leskinen, Lerkkanen, & Nurmi, 2004) and experience long-standing academic and career difficulties (Adelman, 1999; Duncan et al., 2007; Evan, Gray, & Olchefske, 2006; McGregor, 1994).

Efforts to provide high-quality early mathematics instruction to combat the challenges with early mathematics development have shown promising results (Clements & Sarama, 2008; Starkey et al., 2004). However, initial impacts of high-quality interventions are often attenuated as a function of underlying general abilities after children leave preschool (Bailey et al., 2016), and many children do not initially seem to respond to high-quality mathematics instruction (Fuchs, Fuchs, & Compton, 2012), potentially because they also experience deficits in non-mathematical areas. Researchers and practitioners would benefit from more targeted and effective methods for identifying which children are at risk for later difficulties and, most importantly, the factors that help predict risk, in order to develop more targeted and long-lasting instructional effects. This study addresses these needs by using classification and regression trees (CART), a method for identifying higher-order interactions among variables, to explore how combinations of domain-specific factors (e.g., the approximate number system [ANS], initial numeracy performance) and/or domain-general factors (e.g., executive functioning, literacy/language) are associated with low mathematics achievement for preschool children.

Difficulties in Early Mathematics

The acquisition of early mathematics skills does not occur in isolation from other academic and cognitive skills. A number of domain-specific (Chen & Li, 2014; Merkley & Ansari, 2016) and domain-general (LeFevre et al., 2010; McClelland et al., 2007) factors have been found to be related to children’s acquisition of mathematical skills (Geary, 2005; Geary & Moore, 2016). Importantly, when some of these factors are deficient, the severity of mathematical difficulties is magnified (Barbaresi, Katusic, Colligan, Weaver, & Jacobsen, 2005; Hanich, Jordan, Kaplan, & Dick, 2001; Jordan & Hanich, 2000; Silver, Pennett, Black, Fair, & Balise, 1999). Despite evidence indicating that such factors may be related to difficulties in mathematics skills, there is limited agreement in the field on how to identify children who are likely to experience later difficulties in mathematics (Watson & Gable, 2013). Some researchers hypothesize that difficulties in mathematics do not have a single cause, but instead could be the result of deficits in a range of different factors or combinations of factors (Jordan, Kaplan, & Hanich, 2002; Mazzocco & Myers, 2003). A number of theories regarding the origins of difficulties in mathematics have been posited (for an in-depth review, see Andersson & Östergren, 2012; Ashkenazi, Black, Abrams, Hoeft, & Menon, 2013). In general, these theories address individual aspects of domain-specific and domain-general factors related to difficulties in mathematics (Geary, 2005).

Domain-Specific Factors

Researchers suggest that there are two core non-symbolic systems that may underlie symbolic mathematics, the ANS and the object-tracking system (OTS), and deficits in one or both of these systems may result in difficulties in mathematics (Dehaene, 2011; Piazza, 2010; Piazza et al., 2010). The ANS is an individual’s ability to discriminate between two large numerosities (Halberda, Mazzocco, & Feigenson, 2008) and the OTS is a system that allows individuals to discriminate between small subitizable sets (Trick & Pylyshyn, 1994). A fairly substantial body of literature exists on the relation between the ANS and the symbolic numeracy system (e.g., skills such as counting, comparison, and addition) and it seems as though there may be a small and stable relation between the domains (Chen & Li, 2014) that is strongest in young children (Fazio, Bailey, Thompson, & Siegler, 2014; Inglis, Attridge, Batchelor, & Gilmore, 2011). This relation appears to be nonlinear (Bonny & Lourenco, 2013; Purpura & Logan, 2015) with the ANS more related to earlier verbally-based counting skills than to later print-based and calculation skills (e.g., numeral knowledge and addition; Chu, vanMarle, & Geary, 2015; Libertus, Feigenson, & Halberda, 2013). However, other researchers have found that the ANS may actually be reflective of domain-general skills such as executive functioning as its relation to symbolic numeracy skills is attenuated when controlling for response inhibition (Fuhs & McNeil, 2013). Thus, even though meta-analyses have indicated a small relation between the ANS and symbolic numeracy skills, its validity as a domain-specific skill has been questioned (Leibovich & Ansari, 2016). In contrast to the ANS, less work has been conducted on the OTS, but this system appears to be active beginning in infancy and OTS deficits may be suggestive of later difficulties in mathematics (Andersson & Östergren, 2012; Mou & vanMarle, 2014).

Beyond the ANS and OTS, researchers have indicated that challenges in acquiring specific aspects of the symbolic numeracy system, such as cardinal number knowledge (Geary & vanMarle, 2016) and the numeral system (Göbel, Watson, Lervåg, & Hulme, 2014; Merkley & Ansari, 2016), may underlie difficulties in mathematics. Some have even suggested that these components of early numeracy are more related to low mathematics performance than the ANS because they are more representative of school-taught mathematics skills (De Smedt, Noël, Gilmore, & Ansari, 2013). It may be that more formal, school-taught skills are developmentally dependent on these aspects of early numeracy: the relation between informal skills (e.g., cardinality) and formal skills (e.g., addition) is mediated by numeral knowledge (Purpura, Baroody, & Lonigan, 2013), such that a deficit at any point in that development may result in subsequent deficits. As such, difficulties in the acquisition of these symbolic numeracy skills may play a role in later low performance on measures of broader mathematics skills.

Domain-General Factors

A number of domain-general factors have also been associated with low mathematics performance, including the broad constructs of executive functioning and literacy skills as well as specific components of each of these domains, particularly working memory and language.

Executive functioning

Executive functioning, broadly, refers to the cognitive processes needed to complete a specific task or goal (Blair, Ursache, Greenberg, Vernon-Feagans, & the Family Life Project Investigators, 2015; Ponitz, McClelland, Matthews, & Morrison, 2009). It has been found to be an important predictor of academic success, particularly in mathematics (Allan, Hume, Allan, Farrington, & Lonigan, 2014; Fuhs, Nesbitt, Farran, & Dong, 2014; McClelland, Acock, & Morrison, 2006). Three primary components comprise executive functioning: working memory, response inhibition, and attention shifting (Lehto, Juujaarvi, Kooistra, & Pulkkinen, 2003; Miyake et al., 2000). Some prior evidence indicates that components of executive functioning may be differentially related to distinct components of mathematics (Lan, Legare, Ponitz, Su, & Morrison, 2011; Purpura, Schmitt, & Ganley, 2017). In particular, much of the research focused on mathematical difficulties and components of executive functioning has linked domains such as working memory and other aspects of a general cognitive system with low mathematics performance (Raghubar, Barnes, & Hecht, 2010; Swanson & Jerman, 2006; cf. Landerl, Bevan, & Butterworth, 2004). Notably, the domain-general cognitive deficit hypothesis (Geary, 2004) proposes that children with low mathematical performance have a deficit in their underlying domain-general cognitive system (e.g., executive functioning, processing speed) that limits their ability to effectively acquire early mathematics skills.

Literacy and language skills

Though some research has connected both print knowledge (Brizuela, 2004; Neumann, Hood, Ford, & Neumann, 2013; Purpura, Hume, Sims, & Lonigan, 2011; Purpura & Napoli, 2015) and phonological awareness (Hecht, Torgesen, Wagner, & Rashotte, 2001; Koponen, Aunola, Ahonen, & Nurmi, 2007; Krajewski & Schneider, 2009; Michalczyk, Krajewski, Prebler, & Hasselhorn, 2013; Simmons, Singleton, & Horne, 2008; Vukovic & Lesaux, 2013b) to the development of children’s mathematical ability, among literacy skills, children’s oral language abilities have the strongest connection to mathematical ability (Praet, Titeca, Ceulemans, & Desoete, 2013; Purpura et al., 2011; Vukovic & Lesaux, 2013a, 2013b). LeFevre and colleagues (2010) found that the linguistic pathway of mathematics development was a stronger and more stable pathway than either the quantitative or visual spatial pathways. Further, children with difficulties in both mathematics and language tend to have more severe (Hanich et al., 2001; Jordan & Hanich, 2000) and persistent (Silver et al., 1999) mathematics difficulties than children with mathematics difficulties alone.

Given the importance of language skills early on in children’s mathematics development, it is important to note that evidence from other recent studies (Purpura & Logan, 2015; Purpura & Reid, 2016; Toll & Van Luit, 2014a, 2014b) indicates that a specific component of language that overlaps with both domain-general and domain-specific skills—mathematical language (e.g., knowledge of words such as “many,” “few,” “near,” “before”)—is actually a stronger and more proximal predictor of mathematics performance than general language skills and other cognitive skills. Deficits in such knowledge, rather than in general language skills, may underlie deficits in mathematical performance.

An Alternative Risk Status Assessment Approach: Classification and Regression Trees

Given that domain-specific factors and domain-general factors have been independently associated with overall mathematics performance and low mathematical performance in particular, it is evident that deficits in each area and deficits in broad mathematical ability are likely to co-occur. However, separately assessing the association between each domain and mathematical ability does not capture each domain’s relative importance as a predictor or risk factor for low mathematical performance. A number of methods (e.g., discriminant analysis, logistic regression) have been developed to use multiple domains to predict risk status for learning difficulties in general (Hosmer & Lemeshow, 1989; Lachenbruch, 1975). One underutilized analytic method that could be used to identify risk factors for classifying children at risk of low mathematics performance is CART (Breiman, Friedman, Olshen, & Stone, 1984). Though a number of studies have used CART analyses in reading difficulty classification (Compton, Fuchs, Fuchs, & Bryant, 2006; Koon, Petscher, & Foorman, 2014), this method can also be applied to identifying children at risk of low mathematics performance.

CART is a strategy that uses step-wise partitioning to create increasingly homogeneous subgroups from a heterogeneous sample using a variety of predictor variables (Gruenewald, Mroczek, Ryff, & Singer, 2008; Speybroeck, 2012). When the outcome variable is continuous, such as a mathematics score, t-tests derived from F tests determine splits that maximize the difference in average mathematics scores between subgroups (Gruenewald et al., 2008). This continues until the predictors can no longer effectively split the subgroups. Terminal subgroups are designated as “risk” or “no risk” based on the average mathematics score for the children in the subgroup. Each subgroup, whether or not it is split, is labeled as a “node.” A node that cannot be split is termed a terminal node. The results resemble a tree structure such as the example in Figure 1 (note that this is an example and does not use actual data). In this example, a full sample of 100 children was first split based on Variable 1. Children with a score less than or equal to 14 had the lowest average Time 2 mathematics score (Terminal Node 1; M = 14.52). Terminal Nodes 3 and 4 suggest that the combination of Variable 1 and Variable 2 was also associated with Time 2 mathematics score, such that children with a combination of high levels of Variables 1 and 2 had the highest average Time 2 mathematics score (Node 4; M = 21.67), and those with lower levels of Variable 2 had a lower average Time 2 mathematics score (Node 3; M = 18.20). In other words, this tree suggests that low levels of Variable 1 are a risk factor for low mathematics performance, whereas high levels of Variable 1 may serve as a protective factor for children with low levels of Variable 2. Importantly, these pathways are interpreted as combinations of factors, not a sequence of events (Gruenewald, Seeman, Ryff, Karlamangla, & Singer, 2006). CART is beneficial because it illuminates nonlinear pathways among the many factors included in the model. However, because a single tree structure is sensitive to changes in the sample from which it is created, “forests” of trees (i.e., all identified trees that account for significant variance in classification) may provide insight as to which factors consistently predict the outcome across many trees, rather than relying on a single tree (Strobl, Malley, & Tutz, 2009). CART analyses produce a series of potential trees (i.e., a forest of trees) that illuminate all reasonable potential classification modes.
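
As a rough illustration of the step-wise partitioning described above, the following Python sketch grows a small regression tree by choosing, at each node, the predictor and cut point that minimize the summed within-subgroup squared error. This least-squares criterion is an assumption made for illustration and is not necessarily the splitting test used by the software in this study. The data, the variable names (`v1`, `v2`, `math`), and the `min_leaf` and `max_depth` settings are all hypothetical.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, outcome, predictors, min_leaf):
    """Return (predictor, threshold, sse_after) for the best split, or None."""
    best = None
    for name in predictors:
        # Candidate cut points: every observed value except the largest.
        for threshold in sorted({r[name] for r in rows})[:-1]:
            low = [r[outcome] for r in rows if r[name] <= threshold]
            high = [r[outcome] for r in rows if r[name] > threshold]
            if len(low) < min_leaf or len(high) < min_leaf:
                continue
            total = sse(low) + sse(high)
            if best is None or total < best[2]:
                best = (name, threshold, total)
    return best


def grow(rows, outcome, predictors, min_leaf=3, max_depth=2, depth=0):
    """Recursively partition rows; each node records its size and mean outcome."""
    node = {"n": len(rows), "mean": mean(r[outcome] for r in rows)}
    if depth >= max_depth:
        return node  # terminal node: depth limit reached
    split = best_split(rows, outcome, predictors, min_leaf)
    if split is None or split[2] >= sse([r[outcome] for r in rows]):
        return node  # terminal node: no split improves homogeneity
    name, threshold, _ = split
    node["split"] = (name, threshold)
    node["low"] = grow([r for r in rows if r[name] <= threshold],
                       outcome, predictors, min_leaf, max_depth, depth + 1)
    node["high"] = grow([r for r in rows if r[name] > threshold],
                        outcome, predictors, min_leaf, max_depth, depth + 1)
    return node


# Hypothetical toy data (not from the study): two predictors and an outcome.
TOY = [
    {"v1": 10, "v2": 3, "math": 13}, {"v1": 12, "v2": 8, "math": 15},
    {"v1": 13, "v2": 9, "math": 15}, {"v1": 14, "v2": 4, "math": 14},
    {"v1": 16, "v2": 2, "math": 18}, {"v1": 18, "v2": 5, "math": 17},
    {"v1": 20, "v2": 4, "math": 19}, {"v1": 17, "v2": 8, "math": 22},
    {"v1": 19, "v2": 9, "math": 21}, {"v1": 21, "v2": 7, "math": 23},
]

tree = grow(TOY, "math", ["v1", "v2"], min_leaf=3, max_depth=2)
```

On this toy data the first split falls on `v1` at 14, and only the high-`v1` branch is split again, on `v2` at 5. This mirrors the kind of pathway interpreted in the Figure 1 example: the second predictor matters only in combination with a high level of the first.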

Variable	1	2	3	4	5	6	7	8	9	10	11	12
1. Time 1 Math	−	.62***	.64***	.68***	.48***	.43**	-.46***	.39**	.51***	.64***	.48***	.68***
2. Mathematical language	.56***	−	.62***	.54***	.61***	.41**	-.44***	.46***	.46***	.51***	.41**	.61***
3. ANS	.51***	.66***	−	.51***	.38**	.51***	-.41**	.14	.57***	.46***	.54***	.59***
4. Print knowledge	.51***	.63***	.58***	−	.57***	.37**	-.38**	.27*	.43**	.55***	.30*	.58***
5. Definitional vocabulary	.35**	.59***	.42**	.37**	−	.39**	-.47***	.31*	.27*	.39**	.18	.53***
6. Phonological awareness	.32*	.42**	.31*	.34*	.37**	−	-.44***	.36**	.28*	.32*	.37**	.45***
7. RAN	-.44**	-.46***	-.32*	-.42**	-.35**	-.11	−	-.32*	-.33*	-.36**	-.25	-.31*
8. Response inhibition	.47***	.53***	.46***	.62***	.35**	.19	-.45***	−	.21	.40**	.25	.29*
9. HTKS	.40**	.45***	.35**	.39**	.38**	.37**	-.20	.27*	−	.30*	.35**	.36**
10. Cognitive flexibility	.49***	.56***	.32*	.38**	.25	.18	-.36**	.34*	.41***	−	.29*	.38**
11. Verbal working memory	.33*	.22	.27	.30*	.28*	.29*	-.17	.07	.26	-.06	−	.38**
12. Time 2 Math	.56***	.65***	.55***	.55***	.38**	.34*	-.45***	.53***	.50***	.50***	.26	−
Younger Sample
M	7.44	10.09	61.86	14.65	44.33	13.34	89.18	16.85	7.44	11.24	1.09	11.25
SD	4.41	3.87	15.06	10.20	13.56	4.80	32.80	7.23	13.32	6.10	1.80	5.38
Range	0 to 19	2 to 16	27.50 to 86.25	0 to 36	8 to 66	5 to 26	46.50 to 172.25	3 to 35	0 to 52	4 to 22	0 to 6	2 to 23
Older Sample
M	13.24	12.81	72.80	22.41	56.91	16.16	66.62	23.21	13.22	15.67	2.47	15.60
SD	5.55	2.81	16.25	10.39	6.33	5.57	17.94	7.58	15.74	6.19	3.25	5.69
Range	0 to 23	2 to 16	41.25 to 100	1 to 36	36 to 66	5 to 27	37.50 to 111.00	0 to 36	0 to 56	4 to 22	0 to 11	0 to 24

Tree	R²	M T2 Math	Sex	ML	ANS	PK	RI	HTKS	CF
Predictors of Low Performance
1	53.39	4.43		≤ 7			≤ 21
2	52.15	7.39				≤ 10
3	32.26	6.50	Male		≤ 66.25
4	30.85	7.20				≤ 7
5	29.44	7.67		≤ 7
6	16.14	9.00							≤ 6
7	14.18	5.50					≤ 11
8	13.47	3.00				≤ 7		0
9	11.49	8.00		≤ 8
Predictors of High Performance
1	53.39	15.67					> 21
2	52.15	12.44				> 10
3	32.26	14.44			> 66.25
4	30.85	17.00		> 11		> 7		> 2
5	29.44	14.67		> 12
6	16.14	14.75					> 13		> 6
7	14.18	13.22					> 20
8	13.47	11.79				> 7
9	11.49	13.60		> 13

Tree	R²	M T2 Math	T1 Math	ML	ANS	PK	DV	PA	VWM
Predictors of Low Performance
1	44.31	10.40		≤ 11
2	43.81	13.27		≤ 14
3	39.98	12.50		≤ 12
4	35.88	12.83		≤ 14			≤ 56
5	35.68	8.50	≤ 10	≤ 12
6	30.72	11.92		≤ 12
7	27.24	13.13			≤ 76.25		≤ 59
8	22.74	9.67	≤ 12	≤ 12
9	21.67	8.00		11-14			≤ 56		≤ 1
10	18.88	14.29					≤ 56
11	13.40	14.75		≤ 14
12	12.04	14.00		≤ 12
Predictors of High Performance
1	44.31	20.78		> 14
2	43.81	22.25		> 14			> 60
3	39.98	19.20		> 12				> 19
4	35.88	20.13		> 14
5	35.68	18.33	> 10
6	30.72	20.50	> 17	> 14
7	27.24	19.63					> 59
8	22.74	20.25		> 14
9	21.67	22.20		> 14			> 60
10	18.88	20.29					> 60
11	13.40	18.00		> 14		> 30
12	12.04	21.50		> 14			> 60
