What Applying Growth Mixture Modeling Can Tell Us About Predictors of Number Line Estimation

Number line estimation tasks have been considered a good indicator of mathematical competency for many years and are traditionally analyzed by fitting individual regression curves to individual responders. We innovate on this technique by applying growth mixture modeling and compare it to traditional regression using a sample of 2nd graders (n = 325) who completed both 0–20 and 0–100 number line tasks. We explore the effects of gender, special education needs, and migration background. Using growth mixture modeling, more children were identified as logarithmic responders than were identified using regressions. Growth mixture modeling identified significant effects of gender on class membership for both tasks, and of special education needs for the 0–20 task. Overall, growth mixture modeling provided a more complete picture of individual response patterns than traditional regression techniques. We discuss the implications of these findings and recommend that future analyses of number line tasks use growth mixture modeling.

children were more likely to provide linear responses than Chinese-American children. However, this study investigated cross-cultural differences and not the effect of a migration background. Direct investigation of these qualities will expand the overall picture of the development of number cognition, number line estimation, and mathematical competence. In particular, it can reveal whether the relationship between number line estimation and mathematical competence is a direct one or whether it is differentially affected by related variables.

Growth Mixture Modeling for Number Line Tasks
Commonly, studies analyze number line tasks based on children transitioning from a better logarithmic fit to a better linear fit (Opfer & Siegler, 2007; Rouder & Geary, 2014) or the percentage of children who had a better linear or logarithmic fit (Opfer & DeVries, 2008). A linear or a logarithmic responder was identified based on which (frequentist or Bayesian) regression model had lower error, without regard to the size of the difference between the linear and logarithmic fits or the location of the error. In this way, the data were reduced to nominal categories.
Meanwhile, growth mixture modeling can provide probabilities of linear or logarithmic fits, where a child is given a value between 0 and 1 reflecting the probability of being best fit by the model. This allows a gradation of certainty in the categorization of responders.
Traditionally, latent growth modeling techniques are used to assess change over time (Duncan & Duncan, 2004). They can be easily adapted to many different growth curves (e.g., linear, logarithmic, linear segments, etc.), allow for various missing data solutions, and can model data with many different measurement time points. Latent growth models can model number line tasks in the same way that regressions have. Instead of calculating a growth curve over time, one calculates the position of estimates over the space of the line. Similar to a random-slope, random-intercept model, this produces slopes and intercepts for the entire sample as well as estimates for each individual. Thus, separate regression models for each child are unnecessary. Furthermore, the models can compensate for issues of lower error rates around end- or anchor-points and different error distributions. Because latent growth models can be estimated within a structural equation modeling (SEM) framework, the models can be extended into other modeling techniques, such as mixture modeling.
Mixture modeling allows for the identification of latent classes (for a review, see Collins & Lanza, 2010). This is a good match for number line estimation studies because responses are believed to be generated from two distinct representations (i.e., linear or logarithmic). Graduated probabilities of class membership are generated based on both the degree and location of deviations from the expected curve. Mixture modeling applied to latent growth models is called growth mixture modeling (Ram & Grimm, 2009). This is a major improvement over all-or-nothing categorizing based on absolute error. Additionally, mixture models can test covariates based on class membership probabilities. Many past studies have examined covariates (e.g., gender, cognitive ability, age, and others; Laski & Yu, 2014; Opfer & Martens, 2012; Siegler & Ramani, 2009) based solely on better linear or better logarithmic regression fits. These results can be improved upon within a growth mixture modeling framework.
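To illustrate what a graded class membership probability means, the following minimal Python sketch computes the posterior probability that one child's estimates came from a linear rather than a logarithmic curve. It is not the model estimated in this paper: the two curves are fixed rather than estimated, residuals are assumed to be Gaussian with a common standard deviation, and the function name, the logarithmic scaling, `sigma`, and `prior_linear` are all illustrative assumptions.

```python
import math


def linear_class_probability(targets, estimates, upper=100.0,
                             sigma=10.0, prior_linear=0.5):
    """Posterior probability of the linear class for one child (illustrative)."""

    def log_likelihood(expected):
        # Gaussian log-likelihood up to a constant shared by both classes.
        return -0.5 * sum(((e - m) / sigma) ** 2
                          for e, m in zip(estimates, expected))

    # Expected positions under each class (assumed, fixed curves).
    expected_linear = [float(t) for t in targets]
    expected_log = [upper * math.log(t + 1.0) / math.log(upper + 1.0)
                    for t in targets]

    # Bayes' rule, evaluated on the log scale for numerical stability.
    log_post_linear = math.log(prior_linear) + log_likelihood(expected_linear)
    log_post_log = math.log(1.0 - prior_linear) + log_likelihood(expected_log)
    shift = max(log_post_linear, log_post_log)
    weight_linear = math.exp(log_post_linear - shift)
    weight_log = math.exp(log_post_log - shift)
    return weight_linear / (weight_linear + weight_log)
```

In this sketch, a child whose estimates lie halfway between the two curves receives a probability near .5, whereas an all-or-nothing rule would have to assign that child to one category.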
Furthermore, growth modeling can accurately capture a spatial relationship, while mixture modeling can potentially identify a particular strategy used by children within a block of number line trials. A child may use an anchoring strategy, which may be identified based on lower error rates at the anchors, or alternatively, the child may use a random response strategy, which may be identified by a flat line. Because growth mixture modeling produces curve statistics for each individual child, such strategies can be identified and sorted into a class of responder.
In Figure 1, an example of a growth mixture model applied to a number line task can be seen. Since we use an SEM framework, the model can be read much like other structural equation models. In this simplified model, the task is a 0-4 number line with three trials (Numbers 1, 2, and 3). The row at the bottom represents these trials. These compare directly to observations made at different time points in a traditional latent growth model. Above are I and S for the intercept and slope from the latent growth model. The intercept loading is fixed at 1 in all cases, whereas the slope loading is fixed at f(x), where x is the target of the individual estimation task (i.e., 1, 2, or 3 in this example). The scaling of the slope depends on the intended curve shape: in the linear class, f(x) = x, and in the logarithmic class, f(x) is a logarithmic function of x. Above the intercept and slope is C, which represents class membership. As described above, the classes differ only in slope loadings. Class membership then depends on the outcomes of the latent growth model with the differing slope loadings. Finally, class membership is regressed on the background variables of interest: gender, SEN, and migration background.
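The loading structure of the simplified 0-4 example can be written out in a few lines of Python. This is only a sketch: the helper names are hypothetical, and the natural logarithm stands in for the logarithmic scaling, which is an assumption rather than the scaling function used in our models.

```python
import math


def loading_matrix(targets, scale):
    """One row per trial: (intercept loading, slope loading)."""
    # The intercept loading is fixed at 1; the slope loading is f(x).
    return [(1.0, scale(float(x))) for x in targets]


def implied_means(loadings, intercept_mean, slope_mean):
    """Model-implied mean estimate for each trial within one class."""
    return [i * intercept_mean + s * slope_mean for i, s in loadings]


targets = [1, 2, 3]  # trials on the simplified 0-4 number line

# Linear class: f(x) = x.
linear_loadings = loading_matrix(targets, lambda x: x)

# Logarithmic class: f(x) = ln(x) is an assumed stand-in scaling.
log_loadings = loading_matrix(targets, math.log)
```

The two classes share the intercept column and differ only in the second column, which is what separates a linear from a logarithmic responder in the mixture.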

The Present Study
The present study describes the application of a pre-existing analysis to a new area, number line tasks. We apply growth mixture modeling to assess whether participant responses more closely match a linear, logarithmic, or two-anchor model. We also assess the effect of SEN, migration background, and gender on model fit probability. Along these lines, we have developed a series of hypotheses.
First, we assess performance on both 0-20 and 0-100 number line tasks. We expect that the vast majority of children will provide responses corresponding to a linear model on the 0-20 task, while fewer will do so on the 0-100 task. This corresponds to Siegler and Opfer's (2003) findings of linear responses in the second grade on similar scales, as well as more recent findings (Fuchs et al., 2013; Kim & Opfer, 2017; Laski & Yu, 2014; Opfer & DeVries, 2008; Siegler, 2009).
Second, we examine performance based on gender, SEN, and migration background. We expect that boys will respond more linearly than girls, based on other work estimating the effects of gender on mathematical competency (Cimpian et al., 2016). Similarly, we expect that children with SEN will show more logarithmic response patterns (Gebhardt et al., 2014; Hansen et al., 2015; Opfer & Martens, 2012). Furthermore, we expect that those with a migration background will also demonstrate more logarithmic responses, based on past work indicating a possible lag in mathematical development (Fuchs et al., 2013; Siegler & Ramani, 2009).
Third, we examine the fits of a two-anchors model, where responses vary based on distance to both endpoints, and not just the origin (see Rouder & Geary, 2014). We expect few children to fit within a two-anchors model because past results indicate a logarithmic model is still more likely early in development.
Finally, we repeat our analyses following the fit-based regression techniques used by previous work. We then compare the results of both techniques.

Method

Participants
Participants were 325 second grade students attending regular primary schools in the Northwest of Germany. Participants were recruited through their school administrators and teachers following established protocols for education research within Germany. Slightly under half the participants were boys (n = 144, 44.3%). Teachers were asked to report the SEN status and migration background of their participants; migration background was based upon whether the child or the child's parents were born abroad. The proportion of learners with a migration background was relatively high, although commensurate with the region (n = 143, 44.0%). A smaller number had SEN (n = 57, 17.5%), including language problems (n = 26), learning (n = 11), cognitive development (n = 6), and other (n = 14). An overview of the gender proportion and age of our participants can be found in Table 1.
Growth Mixture Models for Number Line Tasks 70

Procedure
All data were collected by the same trained research assistant. We updated previous similar procedures (e.g., Opfer & Siegler, 2007; Rouder & Geary, 2014; Siegler & Opfer, 2003) for use with a tablet computer. Tasks were implemented via the web-platform Levumi (www.levumi.de). Students were told they would play a game on a tablet computer. They were shown a number line task on a 10-inch tablet and the touch-interface was explained to them. Next, they received their own tablet with an example problem on it. When they clicked on the number line, a blue line indicating their choice appeared. They could then click a new position or click on the continue button. Once they completed the sample item, they had 3 minutes to complete as many of the number line problems as they could. The 3-minute limit ensured participants were engaged with the task and allowed for data to be collected with minimum disruption to the regular class. In the 0-20 condition, children received all possible numbers between 0 and 20. In the 0-100 condition, children received 20 numbers, 2 from each decade of the range (e.g., 2 between 10 and 19, 2 between 20 and 29, etc.). This procedure was replicated for both the 0-20 and 0-100 number lines.
Due to the random ordering of trials, items that were not reached were treated as missing at random and deleted pairwise in all analyses. The average number of missing responses was 3.2 (SD = 4.0) in the 0-20 number line and 4.1 (SD = 5.9) in the 0-100 number line. Most responders had fewer than 10% missing (64.6% in the 0-20 task and 63.1% in the 0-100 task). A multiple linear regression indicated that gender, migration background and SEN did not have a significant effect on the number of missing responses (all ps > .10). In six cases, no responses were given at all. These cases were removed from all analyses, leaving 319 participants.

Analysis
We applied growth mixture modeling (see Ram & Grimm, 2009 for a review) to estimate both linear and logarithmic response models. These were calculated using the robust maximum likelihood estimator (MLR; Yuan & Bentler, 2008). All growth mixture analyses were conducted using Mplus 7.4 (Muthén & Muthén, 1998). A two-class (linear and logarithmic) mixture model was applied to the 0-20 and 0-100 tasks. A three-class model (linear, logarithmic, and two-anchors; see Rouder & Geary, 2014) was also applied to the 0-100 number line task. A regression of gender, SEN, and migration background onto class membership probability was included in the mixture analysis via 1-step joint model estimation. The Appendix contains an example of our syntax, notes on model specification, and the scaling functions used to create a separate growth model for each latent class.

DeVries, Kuhn, & Gebhardt 71
In order to compare analysis techniques, we also conducted a regression analysis. Separate linear and logarithmic regressions were calculated for each child. A child was categorized as linear or logarithmic based on which model had the higher R².
Finally, we compare the results to linear and logistic regressions using gender, SEN, and migration background as predictors of linear R² or class membership, as determined by individual linear and logarithmic regressions of each child's responses.
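The fit-based categorization used in the comparison analysis can be sketched in plain Python as follows. The function names are hypothetical, ties are assigned to the linear category, and the sketch assumes strictly positive targets and non-constant responses so that the logarithm and R² are defined.

```python
import math


def r_squared(x, y):
    """R^2 of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # For simple regression, R^2 equals the squared correlation.
    return (sxy ** 2) / (sxx * syy)


def classify_child(targets, estimates):
    """Label one child by whichever individual regression fits better."""
    linear_r2 = r_squared([float(t) for t in targets], estimates)
    log_r2 = r_squared([math.log(t) for t in targets], estimates)
    label = "linear" if linear_r2 >= log_r2 else "logarithmic"
    return label, linear_r2, log_r2
```

Note that the returned label discards how close the two R² values were, which is the information that class membership probabilities retain.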

Overall Model Performance
The models converged successfully in all cases, replicating the log-likelihood values in multiple random starts. Fits of all models are described in Table 2, and additional test statistics for the models are described in Table 3. As seen in Table 4, good latent class separation was also achieved in all models. Based on all three fit metrics, the Akaike information criterion (AIC), Bayesian information criterion (BIC), and adjusted BIC, the three-class model performed better; however, as seen in Table 3, very few responders (5%) fit the two-anchors class in the three-class model. We explore the two- and three-class models in more detail below.
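For readers less familiar with these indices, the sketch below shows how the three information criteria are computed from a model's log-likelihood. The function name is hypothetical, and the adjusted BIC is assumed to be the sample-size adjusted form, which replaces n with (n + 2) / 24. Lower values indicate better fit on all three.

```python
import math


def information_criteria(log_likelihood, n_parameters, n_observations):
    """AIC, BIC, and sample-size adjusted BIC for one fitted model."""
    deviance = -2.0 * log_likelihood
    aic = deviance + 2.0 * n_parameters
    bic = deviance + n_parameters * math.log(n_observations)
    # Sample-size adjustment: penalize with ln((n + 2) / 24) instead of ln(n).
    adjusted_n = (n_observations + 2.0) / 24.0
    adjusted_bic = deviance + n_parameters * math.log(adjusted_n)
    return {"AIC": aic, "BIC": bic, "aBIC": adjusted_bic}


# Hypothetical values for illustration only; these are not results from this study.
example = information_criteria(log_likelihood=-1000.0,
                               n_parameters=10,
                               n_observations=319)
```

Because each index penalizes additional parameters differently, a model with more classes must improve the log-likelihood enough to offset its larger parameter count.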

Two Class Model
Graphical summaries of the two-class models for the 0-20 task can be seen in the left half of Figure 2. The linear class is well defined and shows a straightforward linear trajectory; however, the logarithmic class shows a very high intercept with a low slope. Responses for the logarithmic class are more erratic. It is therefore unlikely that the logarithmic class represents actual logarithmic responses on this scale; instead, it appears to indicate random responders. This is supported by the relatively small proportion of membership in this class (less than 3%). The 0-100 task also appears in Figure 2. In contrast to the 0-20 task, it demonstrates a clear logarithmic curve. Coupled with the high proportion of membership, a logarithmic interpretation of this class is appropriate.
Note. The dashed line is the predicted aggregate linear model, and the solid line is the predicted aggregate logarithmic model. Circles represent linear class mean responses, and triangles represent logarithmic class mean responses.

Effects of Subject Variables on Class Membership
The odds ratios for the probability of being a linear responder are described in Table 5. On the 0-20 number line, the only significant predictor of linear or logarithmic group membership was SEN: children with SEN were significantly less likely to respond linearly, p < .05. On the 0-100 number line, males were significantly more likely to produce linear responses than females, p < .01.
Note. An odds ratio describes the change in the odds of belonging to the linear class for a child with the described quality (e.g., male, possessing a migration background, or having special education needs). Values that are significantly different from 1 are marked. *p < .05. **p < .01.
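As a reading aid for odds ratios of this kind, the following sketch converts a logistic regression coefficient into an odds ratio and shows what an odds ratio implies for a probability. The function names and the numbers used are hypothetical and are not taken from Table 5.

```python
import math


def odds_ratio(logit_coefficient):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(logit_coefficient)


def probability_after_odds_ratio(baseline_probability, ratio):
    """Probability of linear class membership after applying an odds ratio."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * ratio
    return new_odds / (1.0 + new_odds)


# Hypothetical illustration: a baseline probability of .50 and an odds ratio of 2.
shifted = probability_after_odds_ratio(0.5, 2.0)
```

An odds ratio of 1 leaves the probability unchanged, which is why values are tested against 1 rather than 0.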

Three Class Model
We examined the three-class model for only the 0-100 number line task because the responders in the 0-20 task were overwhelmingly linear, and interpretation of the 0-20 model's second class was already unclear due to the flat slope of the logarithmic class. As seen in Table 4, the class separation remained good in the three-class model for the 0-100 task. The graphical summary of the three-class model can be seen in Figure 3. Here the slope of the two-anchors class is quite flat, and there is relatively little sinusoidal curvature. This class appears to capture mostly a small group of random responders. Table 6 shows that males were more likely to be in the linear class than the logarithmic class, p < .01. Migration background had no effect, p > .05. Children with SEN were more likely to be in the linear or logarithmic class than in the two-anchors class, ps < .05. Based on the flat curvature of the two-anchors solutions, this suggests that children with SEN were more likely to be random responders.

Model Selection
While the information criteria fit values for the three-class model are better and the likelihood ratio test values indicate a significantly better fit, theoretical interpretation is also a critical factor in model selection. In this case, the interpretation of the three-class model does not follow the theoretical goals of the model. Instead of representing a logarithmic function of the distance from either end-point, the third class represented relatively flat, random responses of a very small (5%) proportion of responders. Thus, the three classes of the model were linear, logarithmic, and random responders. We therefore prefer the simpler two-class model.

Comparison to Individual Regressions
We compared growth mixture modeling to traditional techniques which fit individual regression curves to each participant for the 0-100 task. We excluded the 0-20 task because of the relatively low variability in class membership at this range. These analyses produced very different results than the mixture modeling. Nearly 30% more responders were classified as linear (83%) than in growth mixture modeling. Additionally, we conducted three separate regressions, summarized in Table 7. The first regression was a logistic regression of SEN, gender, and migration background onto whether the individual linear regression fit better than the individual logarithmic regression. Here, both gender and migration background were significant, p < .05. We also conducted linear regressions of SEN, gender, and migration background onto both linear R² and logarithmic R²; no significant coefficients were found in either case, p > .05.
Migration Background −.04 0.03 .257
Note. SEN = special education needs. The logistic model was coded as a better or equal linear R² = 1, and worse linear R² = 0. Thus, positive coefficients indicate an increased likelihood to be better fit by the linear function. *p < .05.

Discussion
Our paper describes a new application of growth mixture modeling for the analysis of number line tasks. This technique identified a significant effect of gender on linear response probability in the 0-100 number line, and of SEN for the 0-20 number line. These results differed from traditional number line analyses, which involve fitting individual regressions for each child, where far fewer logarithmic responders were identified and significant effects were found for gender and migration background in the 0-100 task. Our analysis represents an important innovation in data analysis of number line tasks and allows for better treatment of missing data. Furthermore, growth mixture modeling can deal with violations of normality and homoscedasticity. Importantly, it provides continuous estimates of the probability of linear, logarithmic, or other response patterns. This technique can also be extended to analyze other functions including power, segmental, and other curves. The technique can be further extended to more measurement points and other response ranges via simple changes to the syntax and scaling functions. A further extension of the models to latent transition analysis could also model changes in response patterns over time or due to an intervention.
Our new analysis assigned more children to the logarithmic class than the regression analyses did. This difference can be explained by error distribution and model overlap. Number line errors are not consistent throughout the entire line (see Rouder & Geary, 2014 for a detailed explanation of this issue). This is further complicated by some individuals using an anchoring strategy (e.g., Friso-van den Bos et al., 2015; Kim & Opfer, 2017; Rouder & Geary, 2014). Thus, within expected error ranges, the linear and logarithmic representations overlap at many points. Responses at these points are less informative when identifying a linear or a logarithmic responder, but regression techniques typically use overall error of the regression model for this purpose. Mixture models give these responses less weight, which allows for a better, probabilistic assessment of responders. This helps to explain the differences between growth mixture modeling and individual regression results. In the two-class growth mixture model for the 0-100 number line task, boys were significantly more likely than girls to fit the linear class. Similarly, they were more likely to have a better linear R² in the logistic regression, although their R² did not differ significantly in the linear and logarithmic regressions. The logistic regression also found that children with a migrant background were significantly less likely to be better fit by the linear regression. However, the effect of migrant background was not significant in the growth mixture model. The effect of migration background in the logistic regression was biased due to the small number of logarithmic responders. Growth mixture modeling avoided this by identifying a larger group of logarithmic responders.
While the data match the logarithmic curves well in the 0-100 condition, there are some additional issues concerning model performance. In the 0-20 condition, the vast majority (over 97%) of responders fit into the linear class, but the logarithmic class appears to consist of random responders and not true logarithmic responders. Therefore, for the 0-20 range, it would be better to interpret the logarithmic class as a class of random responders. This class represents only a few outlier cases and may be an artifact of using a two-class model when almost all responders were linear. We see a similar artifact in the three-class model of the 0-100 task. It is therefore critical to carefully interpret each class when applying growth mixture modeling.
Higher class proportions of linear responders on the 0-20 task than the 0-100 task replicated the critical finding of a logarithmic to linear shift with increasing magnitudes (Friso-van den Bos et al., 2015; Laski & Yu, 2014; Opfer & Siegler, 2007). Our participants were overwhelmingly (over 97%) linear responders in the 0-20 condition, and only slightly above 50% linear responders in the 0-100 condition.
We also explored a three-class model, with a linear, logarithmic, and two-anchor class (based on Rouder & Geary's, 2014, M2). In the two-anchor class, curves are connected from the end-point to the midpoint, suggesting participants' compressed representation starts at the end points, and is fitted through the midpoint of the number line. However, responses in the two-anchors class fell along a straight, flat line, and not along a sinusoidal curve. This suggests that these children instead adopted a random response pattern. Further work fitting such models across more age groups may identify true two-anchor strategies.
While few studies have precisely examined the effects of gender on number line estimation, our finding that boys responded more linearly than girls was well predicted by previous work relating to mathematical competency and gender. More work precisely measuring and accounting for this effect remains necessary. We also found that children with SEN were significantly less likely to belong to the linear class on the 0-20 task, although not on the 0-100 task. However, this result requires further examination as it represented a very small number of our participants. Additional work involving classrooms from multiple countries and cultures, and students from diverse backgrounds, with and without SEN is necessary. Further work modeling the change in class membership after feedback, over time, and throughout development is also necessary.

Conclusion
We apply growth mixture modeling for the first time to number line estimation tasks. The new method can effectively discern both linear and logarithmic representations. This approach identified effects of SEN and gender, while no effects of migration background were found. Our method can be readily adapted to many different number line estimation tasks in future analyses.

Funding
The authors have no funding to report.

A further extension via a latent transition analysis could be attempted. This would be an ideal method for comparing group membership over the course of multiple measurement points; however, it may prove computationally complex. Future refinements may improve modeling efficiency allowing for such a model.
Lastly, alternative methods for entering covariates may be used. For instance, the 3-step process may be implemented in Mplus via the "auxiliary" command in the "variable" section. We explored this method for both the 2- and 3-class models for the 0-100 number line, and there were no changes to significant results.