The Importance of Replicating Meta-Analyses: Commentary on “Conceptual Replication and Extension of the Relation Between the Number Line Estimation Task and Mathematical Competence Across Seven Studies”

[1] Educational Psychology, University of Trier, Trier, Germany. [2] Cognitive Psychology, University of Trier, Trier, Germany. [3] Clinical Psychology, University of Duesseldorf, Duesseldorf, Germany. [4] Parenting and Special Education, University of Leuven, Leuven, Belgium. [5] Instructional Psychology and Technology, University of Leuven, Leuven, Belgium. [6] Educational Research and Development, University of Leuven, Brussels, Belgium.


Are Replication Attempts of Meta-Analyses Useful?
Replications of meta-analyses, and cross-validations of meta-analyses against other (e.g., preregistered original) studies, are rare, and their relevance might not be obvious. However, replications and other cross-validations of meta-analyses are highly useful because meta-analyses generate new findings, so-called synthesis-generated evidence: for example, the overall effect size averaged across studies, the distribution of effect sizes, between-study heterogeneity, and the proportions of between-study heterogeneity explained by moderator variables. Conducting a meta-analysis and producing such synthesis-generated evidence is a multi-step process in which researchers must make many methodological decisions regarding the literature search, inclusion criteria, coding manual, bias correction, statistical models, and result interpretation. Further, when many original studies are biased in similar ways, the meta-analytic results are also biased. Consequently, it is not self-evident that replications and cross-validations of a meta-analysis will reproduce the results of the original meta-analysis. An impressive example is a recent investigation comparing 15 meta-analyses against preregistered multiple-laboratory replication studies (Kvarven et al., 2020). On average, the effect sizes found in the replication studies were three times smaller than those previously found in the meta-analyses. Thus, replication attempts of meta-analyses, like the one conducted by EEA, are valuable and essential for examining the validity and robustness of synthesis-generated evidence.
EEA did not use meta-analytic statistical methods to integrate the findings from their seven datasets and did not report typical outcomes of meta-analyses, for example, the between-study heterogeneity and the distribution of the study effect sizes. Instead, they analyzed individual participants' data pooled over studies. This limits the comparability of EEA and SEA. Generally, meta-analyses of individual participants' data, such as the data used by EEA, are possible and offer many advantages over meta-analyses averaging effect sizes, as done by SEA (Cooper & Patall, 2009).
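The methodological contrast described above can be made concrete with a small simulation. The following sketch is purely illustrative (it does not reproduce SEA's or EEA's actual analyses, and the study sizes and correlations are invented): it computes (a) a sample-size-weighted average of per-study correlations, a simplification of the inverse-variance weighting used in conventional meta-analyses, and (b) a single correlation over the pooled individual-participant data.

```python
# Illustrative sketch (not EEA's or SEA's actual analysis): contrast between
# averaging study-level effect sizes and pooling individual participants' data.
import math
import random

random.seed(1)

def simulate_study(n, true_r):
    """Simulate n participants with two scores correlating at roughly true_r."""
    data = []
    for _ in range(n):
        x = random.gauss(0, 1)  # e.g., an NLE score
        y = true_r * x + math.sqrt(1 - true_r**2) * random.gauss(0, 1)  # e.g., math score
        data.append((x, y))
    return data

def pearson_r(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x, _ in pairs))
    sy = math.sqrt(sum((y - my) ** 2 for _, y in pairs))
    return cov / (sx * sy)

# Three hypothetical studies with invented sample sizes and true correlations.
studies = [simulate_study(80, 0.40), simulate_study(120, 0.50), simulate_study(60, 0.45)]

# (a) Study-level synthesis: weighted mean of per-study correlations.
rs = [pearson_r(s) for s in studies]
ns = [len(s) for s in studies]
weighted_mean_r = sum(r * n for r, n in zip(rs, ns)) / sum(ns)

# (b) Individual-participant pooling: concatenate all participants, one correlation.
pooled = [p for s in studies for p in s]
pooled_r = pearson_r(pooled)

print(round(weighted_mean_r, 3), round(pooled_r, 3))
```

With homogeneous simulated studies the two approaches yield similar estimates; they diverge when studies differ in means, variances, or samples, which is one reason the outcomes of EEA and SEA are not directly comparable.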

Commonalities and Differences of the Original and the Replication Study
EEA successfully replicated SEA's main finding that NLE correlates with broader mathematical competence but found differences regarding some of the moderating effects. Overall, we fully agree with EEA's conclusion: "the current study successfully replicated the overall findings from the Schneider et al. (2018) meta-analysis." This robustness of the NLE-mathematical competence relation in EEA dovetails with the reliable correlations between the two constructs observed across various moderator levels in SEA. Methodologically, the high replicability of the NLE-mathematical competence association may be due to the fact that the NLE task is easy to explain, understand, administer, and score. On the theoretical level, there is no doubt that number line estimation taps into central components of mathematical competence, notwithstanding the many open questions regarding the details. The successful replication of the main finding supports this view.

EEA reported some findings regarding moderating effects that diverge from the findings of SEA. The interpretation of these findings is hampered by the fact that EEA conducted not a direct but a conceptual replication. That is, they tested the same main hypotheses as the original study but used different methods. When a conceptual replication and an original study differ in their findings, it is unclear whether the evidence from one of the two studies is invalid or whether methodological differences explain the divergences. This is the case here, too. Table 1 lists the many methodological differences between EEA and SEA that potentially explain differences in the findings.

Table 1. Differences Between the Original Meta-Analysis (SEA; Schneider et al., 2018) and the Replication (EEA; Ellis et al., 2021, this issue)

For example, EEA found other age trends than SEA.
However, SEA found that the age trends differed between estimation with whole numbers and with fractions, and only SEA, but not EEA, included studies with fractions. Other relevant methodological differences are that EEA included a non-standard version of the NLE task involving nonsymbolic numerosities on flashcards, unusual marks on the number line, and a cover story with a bunny hopping along the line. Further, SEA had explicitly excluded measures of basic numerical cognition from consideration as mathematical competence measures. By contrast, EEA included one such measure (Panamath), which assesses the acuity of the approximate number system with a numerosity comparison task and has been found to have relatively low reliability (Inglis & Gilmore, 2014).

Extension of the Original Meta-Analysis
SEA reported that number line estimation predicts mathematical competence over time with r = .496. EEA extended these findings by analyzing how strongly number line estimation predicts later number line estimation and later mathematical competence. Because they analyzed individual participants' data, they could elegantly control the longitudinal relations for potentially confounding variables, such as age and sex. In these analyses, EEA found that number line estimation proficiency was only a very weak predictor of later number line estimation proficiency and mathematical competence. These findings are surprising because many of the studies synthesized by SEA found that the number line estimation task assessed a high proportion of systematic variance and had good reliability and good predictive validity. One explanation of the weak relations found by EEA is that their person-level data allowed them to control for person characteristics that might have inflated the correlations in other studies. Another possible explanation is that a substantial proportion of the longitudinal data analyzed by EEA had been obtained using a non-standard number line task and using basic ANS acuity rather than a more classical measure of mathematical competence, for example, arithmetic tasks or a school achievement test.
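The first explanation above, that controlling for person characteristics can shrink a correlation, can be illustrated with a toy simulation (again hypothetical, with invented parameters, not EEA's data or model): when a covariate such as age drives both the Time-1 predictor and the Time-2 outcome, partialling it out of both variables sharply reduces their association.

```python
# Hypothetical sketch: a covariate (here "age") inflates a raw longitudinal
# correlation; residualizing both variables on the covariate removes it.
import math
import random

random.seed(2)

n = 500
age = [random.gauss(0, 1) for _ in range(n)]
# Both measures load on age; their residual parts are unrelated by construction.
nle_t1 = [0.8 * a + random.gauss(0, 0.6) for a in age]   # Time-1 NLE score
math_t2 = [0.8 * a + random.gauss(0, 0.6) for a in age]  # Time-2 math score

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def residualize(ys, xs):
    """Residuals of ys after simple linear regression on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return [y - (a + b * x) for x, y in zip(xs, ys)]

raw_r = pearson_r(nle_t1, math_t2)  # inflated by the shared age component
partial_r = pearson_r(residualize(nle_t1, age), residualize(math_t2, age))
print(round(raw_r, 3), round(partial_r, 3))
```

In this construction the raw correlation is substantial while the partial correlation is near zero, mirroring how person-level controls, which only individual-participant data make possible, can deflate relations that appear strong at the study level.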

Conclusion
Overall, EEA's replication of the previous meta-analysis on number line estimation and broader mathematical competence is an innovative and useful contribution to the literature on numerical cognition. Any interpretation of the findings needs to carefully consider the many study characteristics that might moderate the relation between number line estimation and mathematical competence.

Funding:
The authors have no funding to report.