Several working memory processes have been hypothesized to influence different arithmetic operations. Working memory has been compartmentalized into a number of subprocesses, such as phonological memory and visuospatial memory, that are believed to make unique contributions to the performance of two distinct arithmetic operations: multiplication and subtraction. A previous dual-task experiment produced these effects, but subsequent experiments have yielded inconsistent results. Because the reasons for these inconsistencies are not immediately apparent, the current study systematically reviewed these subsequent attempts and attempted to replicate this effect in a within-subjects dual-task experiment using tasks developed from prior work across a number of different subsamples. In contrast to the original finding, we observed no differential impact of specific working memory secondary tasks by arithmetic operation in any of our analyses. However, our analyses do not entirely rule out the possibility of differential effects of working memory tasks. Our findings suggest that the working memory facet by arithmetic operation interactions observed in previous work may be idiosyncratic in nature and difficult to predict a priori in subsequent experiments.
Evidence from cognitive psychology and neuroscience suggests that domain-specific components of working memory may contribute to differences in mental arithmetic performance, but several important questions remain unanswered. A number of imaging and lesion studies suggest that the parietal regions are heavily involved in mental arithmetic, specifically addition and subtraction, as well as in visuospatial processes (
The current study will review these approaches and their findings and describe our current approach to investigate the unique contributions of working memory within mental arithmetic. An influential study by
Cohen’s
Several studies have used similar methods; although some have replicated the direction of these effects, none have produced the pattern of opposite effects with magnitudes approaching the size in the original
Considering the variation in design features, the inconsistent results from previous attempts to replicate
Moreover, current meta-analytic evidence from dual-task experiments also suggests that the influence of specific working memory components on arithmetic performance may not be as robust as other findings related to dual-task performance, such as the effect of domain-general demands of the secondary task on performance (
While
In the current study, a dual-task paradigm was used to test the involvement of phonological and visuospatial resources in mental subtraction and multiplication. The aim of this study is to test whether the findings reported in
We used the software program G*Power to conduct an a priori power analysis (
Following this plan, we recruited and ran 100 total participants from the University of California, Irvine (Female = 64, age range = 18 – 25 years old,
All tasks used in these experiments were created through PsychoPy 3 (
Subtraction problems were presented using a 2-alternative forced choice (2AFC) paradigm. Participants were presented with simple two-digit − two-digit problems for 2 s. There was no borrowing or crossing of decade boundaries, to minimize central executive involvement. Participants then chose from two answer choices, which were displayed for 3 s or until participants responded. Three different sets of subtraction problems were used across three rounds (round 1: subtraction only; round 2: subtraction under phonological load; round 3: subtraction under visuospatial load), with easy and hard working memory loads split across 2 blocks. The order of the three sets as well as the order of the difficulty blocks was counterbalanced across participants. Each set contained 28 different subtraction problems, each of which was displayed twice in total with a different answer pair each time. In half of the answer pairs, the correct and alternative answers had a distance of 2, whereas in the other half they had a distance of 10. This was done to encourage participants to take into account both decades and units and to discourage the strategy of paying attention only to the units or only to the decades. The distance from the correct response was in either the positive or the negative direction. For example, for the problem 36 − 14, the two answer pairs were 22 vs. 20 (difference = 2) or 12 vs. 22 (difference = +10). Problems with a decade in one of the operands or in the result were excluded. Eleven was not used as an operand.
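The item constraints described above can be sketched in a few lines of Python. This is only an illustration, not the authors' stimulus-generation code: the function names are hypothetical, "no borrowing" is assumed to mean that the units digit of the first operand is at least as large as that of the second, and the sketch lists all four signed distances (±2, ±10) even though each problem was shown with only two answer pairs.

```python
def is_valid_problem(minuend, subtrahend):
    """Check the stated constraints for a two-digit minus two-digit item.

    Assumption: 'no borrowing' means the units digit of the minuend is
    not smaller than the units digit of the subtrahend.
    """
    if not (10 <= subtrahend < minuend <= 99):
        return False
    if 11 in (minuend, subtrahend):
        return False  # eleven was not used as an operand
    result = minuend - subtrahend
    if any(n % 10 == 0 for n in (minuend, subtrahend, result)):
        return False  # no decade numbers in the operands or the result
    return minuend % 10 >= subtrahend % 10  # no borrowing


def answer_pairs(minuend, subtrahend):
    """Return (correct, alternative) pairs at distances of 2 and 10."""
    correct = minuend - subtrahend
    return [(correct, correct + d) for d in (-10, -2, 2, 10) if correct + d > 0]


print(is_valid_problem(36, 14))  # True
print(answer_pairs(36, 14))      # [(22, 12), (22, 20), (22, 24), (22, 32)]
```

Mixing distances of 2 and 10 means that neither the units digit nor the decade digit alone is enough to identify the correct answer.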
Multiplication problems were presented using a 2AFC paradigm. Participants were presented with simple one-digit by one-digit and two-digit by one-digit multiplication problems. Participants then chose from two answer alternatives, which were displayed for 3 s or until participants responded. Three different sets of multiplication problems were used across three rounds of tasks (round 1: multiplication only; round 2: multiplication under phonological load; round 3: multiplication under visuospatial load), with easy and hard working memory loads split across 2 blocks. The order of the three sets as well as the order of the difficulty blocks was counterbalanced across participants. Each set contained 28 different multiplication problems, each of which was displayed four times in total with a different answer pair each time. Among the four answer pairs, one contained a response alternative from the multiplication table of the first operand, another contained an alternative from the multiplication table of the second operand (table-related response alternatives), and the other two pairs contained response alternatives that were not from either operand’s multiplication table (non-table-related response alternatives). For example, for the problem 12 × 7, the four different answer pairs were 84 vs. 98 (98 from 7’s table), 84 vs. 72 (72 from 12’s table), 84 vs. 64, and 84 vs. 94. Half of the problems were two-digit by one-digit and the other half were one-digit by one-digit multiplication. In one-digit multiplication trials, the smaller operand preceded the larger operand. In two-digit by one-digit trials, the two-digit operand preceded the one-digit operand. The two-digit number was smaller than twenty. The one-digit number was larger than two. Tie problems (e.g., 6 × 6) and problems with a decade in an operand or in the result were excluded. Products were all below 100 so that responses had at most two digits, as in the subtraction task.
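The distinction between table-related and non-table-related response alternatives can be illustrated with a short check. The sketch below is an assumption-laden illustration rather than the authors' procedure: the function name is hypothetical, and "from an operand's multiplication table" is approximated as "divisible by that operand", which is consistent with the 12 × 7 example (98 = 7 × 14).

```python
def foil_type(first, second, foil):
    """Classify an incorrect alternative for the problem first x second.

    Assumption: a foil counts as table-related when it is a multiple of
    one of the operands.
    """
    if foil == first * second:
        raise ValueError("the foil must differ from the correct product")
    in_first = foil % first == 0
    in_second = foil % second == 0
    if in_first and in_second:
        return "both tables"
    if in_first:
        return "first operand's table"
    if in_second:
        return "second operand's table"
    return "non-table-related"


# Classify the four foils from the 12 x 7 example in the text.
for foil in (98, 72, 64, 94):
    print(foil, foil_type(12, 7, foil))
```

For 12 × 7 this labels 98 as belonging to the second operand's table, 72 to the first operand's table, and 64 and 94 as non-table-related, matching the description in the text.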
Following the same task designs as those outlined in
The visuospatial span task also followed similar procedures to those used in
The study used a 2 × 3 within-subject factorial design. The within-subject factors were arithmetic operation type (subtraction or multiplication) and WM load type (no load, PL load, and VSSP load). No-load (i.e., arithmetic alone) conditions served as controls against dual-task conditions. While culture and difficulty were part of the analysis, these were considered only in the subgroup analyses and not for additional interactions, because our main focus was on the operation × load interaction. The entire experiment was conducted online through video conferencing, in which an experimenter guided the participant in downloading the required materials and following the protocol for completing the experimental tasks. The experiment was administered in two sessions that were scheduled to be around the same time and spaced one week apart. Participants were also instructed to abstain from alcohol or drugs prior to either session. Participants completed the experiment using their own devices. To ensure that reaction times were sufficiently accurate and consistent across different devices and operating systems, participants were instructed to use either a home desktop or laptop rather than a tablet or mobile phone. No information related to the participants’ devices, such as IP address, was retained except for the operating system (e.g., Windows 10, macOS), which was noted in order to ensure proper installation of PsychoPy and the experiment itself. Sessions were also not recorded, to respect the privacy of the participants.
In session 1, participants were given a brief questionnaire to capture their demographic information and math education background before being introduced to the PsychoPy environment and downloading the experimental tasks. These questions asked about their current major and the number of math courses they had taken since entering university. In addition, we asked specific math background questions, including “Prior to coming to university, in which country did you receive the majority of your math education?”, “If you were taught how to use an abacus or mental abacus strategy for doing math, how often have you used it? (Never taught; Never used; Rarely; Sometimes; Often; Very often)”, and “Do you consider yourself an A, B, C, D, or F student compared to your peers?”. Altogether, these questions allowed us to examine potential differences in math proficiency within our sample, especially in our comparison between the Chinese-educated student group and the non-Chinese-educated student group. From here, participants were given the adaptive phonological and visuospatial staircase tasks. Prior to each staircase, 10 practice trials were administered to familiarize the participant with the stimuli and testing environment. Discounting the practice trials, there were 30 trials per staircase, for a total of 60 trials to determine difficulty thresholds. The order of these tasks was randomized and counterbalanced across participants. Staircase performance from session 1 was used to determine easy and hard span levels for the dual-task conditions used in session 2. In total, the first session took approximately 60 minutes.
In session 2, participants started the dual-task experiment. Participants downloaded their PsychoPy tasks, which had been modified to fit the appropriate difficulty levels as determined in session 1. Participants then completed arithmetic alone and under load over 4 experimental blocks (multiplication-easy load, multiplication-hard load, subtraction-easy load, subtraction-hard load). The order of these tasks followed a block randomization wherein the single arithmetic task was always administered first in the block, followed by either the visuospatial or the phonological load. Half of the participants received the visuospatial load before the phonological load, while the other half received the phonological load first. The order of the four blocks was also randomized and counterbalanced across participants such that each of the possible sequences as well as their reverse orders appeared an equal number of times. Ten practice trials were given before the start of the first block to familiarize participants with the dual-task procedure. Participants then completed each block, which contained 28 arithmetic problems for each condition (arithmetic alone, with PL load, with VSSP load), for a total of 336 trials. The order of conditions was also randomized and counterbalanced. At the end of each block, participants were given up to a 5-minute break. Participants finished after completing the fourth block. In total, the second session took no more than 2 hours to complete.
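The structure of session 2 can be made concrete with a small sketch that enumerates the possible block orders and recovers the 336-trial total (4 blocks × 3 conditions × 28 problems). All names below are hypothetical, and the sketch assumes that the no-load condition always opens a block and that one load order is used throughout a participant's session; it does not reproduce the authors' actual counterbalancing scheme.

```python
from itertools import permutations

OPERATIONS = ("multiplication", "subtraction")
DIFFICULTIES = ("easy", "hard")
PROBLEMS_PER_CONDITION = 28

# Four experimental blocks: operation crossed with secondary-task difficulty.
BLOCKS = [(op, diff) for op in OPERATIONS for diff in DIFFICULTIES]


def block_schedule(block, vssp_first):
    """Conditions within one block: arithmetic alone, then the two loads."""
    operation, difficulty = block
    loads = ["VSSP load", "PL load"] if vssp_first else ["PL load", "VSSP load"]
    return [(operation, difficulty, condition, PROBLEMS_PER_CONDITION)
            for condition in ["no load"] + loads]


def session_schedule(block_order, vssp_first):
    """Concatenate the block schedules in the given block order."""
    schedule = []
    for block in block_order:
        schedule.extend(block_schedule(block, vssp_first))
    return schedule


block_orders = list(permutations(BLOCKS))  # every possible block sequence
example = session_schedule(block_orders[0], vssp_first=True)
total_trials = sum(n_trials for _, _, _, n_trials in example)
print(len(block_orders), total_trials)  # prints "24 336"
```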
In this experiment, we focused on the key interaction predicted by
Hypothesis 1: As predicted by
Hypothesis 1a: Multiplication performance is slower and less accurate under PL load compared to VSSP load.
Hypothesis 1b: Subtraction performance is slower and less accurate under VSSP load compared to PL load.
In addition to these, we tested secondary hypotheses regarding the differences between single-task arithmetic conditions vs. each of the dual-task conditions as they were reported in
Hypothesis 1c: Multiplication performance alone is significantly faster than under PL load but not VSSP load.
Hypothesis 1d: Subtraction performance alone is significantly faster than under VSSP load but not PL load.
According to
Hypothesis 2: Receiving primary math education from China but not the US is associated with differences in load type by arithmetic operation performance, specifically:
Hypothesis 2a: Multiplication performance is slower and less accurate under PL load compared to VSSP load only in Chinese-educated samples.
Hypothesis 2b: Subtraction performance is slower and less accurate under VSSP load compared to PL load only in Chinese-educated samples.
Hypothesis 2c: Multiplication performance alone is significantly faster than under PL load but not VSSP load only in Chinese-educated samples.
Hypothesis 2d: Subtraction performance alone is significantly faster than under VSSP load but not PL load only in Chinese-educated samples.
To test Hypotheses 1a–1d, we conducted multiple 2 × 2 ANOVAs under four model specifications (for a summary of planned analyses, see
Although we acknowledge that testing these multiple hypotheses inflates the probability of Type I errors, we chose not to adjust error levels for each statistical test, because a statistically significant interaction does not guarantee that any of the more specific hypotheses will be supported. Instead, we reported on the level of support for the theorized crossover effect and predicted simple effects based on how closely our reported findings aligned with our predictions. For Hypotheses 1a–1d, we concluded that there was strong support for the underlying theory if we detected an interaction and main load effects in directions consistent with
As a complement to the frequentist analyses of the interaction effect, we also report a Bayesian analysis for the main model (whole group) to examine the relative support for both our hypotheses of interest and the null hypothesis. We conducted a Bayesian repeated measures ANOVA, dependent on the 2×2 factors in the main model. Following
Data were analyzed primarily in JASP using its frequentist and Bayesian repeated measures ANOVA and paired-sample
The following analyses were either changed or added from the preregistration. Full documentation of all deviations can be found in a document within the
In contrast to our hypothesis, in the full sample, multiplication performance was not significantly slower (
As a preregistered robustness check, we estimated the same models for three subsamples of the data: easier secondary task blocks, more difficult secondary task blocks, and the first arithmetic block under cognitive load only. Similar patterns of results for both frequentist and Bayesian analyses can be found in our secondary analyses of the easier load, harder load, and first block conditions (
The staircase procedure used during the first session of each experiment to estimate each participant’s subjective 80th and 99th percentile thresholds for their verbal and visuospatial cognitive loads provided reasonable estimates. On average, the 99th percentile (easy load) threshold for participants was 5.52 (
Again, in contrast to our hypothesis, in the full sample, subtraction performance was not significantly slower (
As a preregistered robustness check, we estimated the same models for the easy, hard, and first-arithmetic-block under load subsamples of the data. Similar patterns of reaction time results for both frequentist and Bayesian analyses can be found in our secondary analyses of the easier load, harder load, and first block conditions (
To test Hypothesis 1c, we included the single multiplication task condition in the two-way ANOVA and performed pairwise
For our preregistered robustness check, we estimated the same models for the easy, hard, and first-arithmetic-block under load subsamples of the data. The frequentist and Bayesian subsample analyses yielded patterns of results similar to those of our whole-sample analyses (Tables S3, S4, S7, and S8 in the
We found no support for Hypothesis 1d either. Subtraction performance under no load was significantly faster than either load condition (
For our preregistered robustness check, we estimated the same models for the easy, hard, and first-arithmetic-block under load subsamples of the data. Both the frequentist and the Bayesian subsample analyses yielded patterns of results similar to those of our whole-sample analyses (Tables S3, S4, S7, and S8 in the
To test whether the differential impact of WM load type on arithmetic operation depends on where participants received the majority of their math education, we computed a 2 (country: US- vs. Chinese-educated) × 2 (WM load) × 2 (arithmetic) ANOVA. The three-way ANOVA did not yield a significant main effect for country,
In accuracy, there were no significant effects for country,
Following the lack of a three-way interaction, we examined the Chinese-educated subgroup directly. Overall, we did not find evidence to support Hypothesis 2a. While there appeared to be a moderate effect of verbal vs. visuospatial load on multiplication reaction times (see
Overall, we did not find evidence to support Hypothesis 2b. The effect of visuospatial load on subtraction reaction times had a much smaller effect size than in
We did not find evidence to support Hypothesis 2c. Multiplication performance under no load was significantly faster than both load conditions (
We did not find evidence to support Hypothesis 2d. Subtraction performance under no load was significantly faster than both load conditions (
In this registered report, we tested several preregistered predictions based on previous findings from the dual-task literature with respect to the differential effects of secondary WM task load on arithmetic performance. That is, we tested whether verbal secondary tasks reduce multiplication performance but not subtraction performance, whether visuospatial secondary tasks reduce subtraction performance but not multiplication performance, and whether these differential effects can be observed relative to each other or to a no-load control. These predictions have implications for theories of mathematical cognition and working memory, along with dual-task performance specifically. Building upon previous work in the field, we identified potential moderators that could explain contradictory findings from the literature (specifically, secondary task difficulty and having learned mathematics primarily in China) and tested whether the hypothesized effects emerge under these conditions.
Consistent with all previous literature, we found arithmetic performance to be generally slower and less accurate under cognitive load. However, contrary to our preregistered predictions, we found no evidence for the moderating effect of secondary WM task load on arithmetic operation performance across the whole sample (Hypotheses 1a–1d) or in any of our preregistered subgroup analyses, nor did we find evidence for these effects in the Chinese-educated participants (Hypotheses 2a–2d). In our follow-up Bayesian analyses, our results generally provided support for the null hypothesis that there was no moderation by secondary WM load on arithmetic operation over the additive effects of secondary WM task and arithmetic operation. However, the majority of Bayes factors suggested only anecdotal evidence for the null over the alternative, suggesting that these differential effects could still be real but that our current experiment did not provide sufficient evidence either way. At best, we found anecdotal evidence of a verbal WM load effect on subtraction accuracy across some of our subsample analyses; however, the direction of this effect was the opposite of what was predicted by previous work. In sum, we did not find any evidence for the large crossover interaction reported by
Interactions between secondary task types and arithmetic play a prominent role in the dual-task literature. These interactions have been interpreted as providing evidence that domain-specific pathways, such as verbal or visuospatial pathways, have differential effects on numerical cognition (e.g.,
Additionally, our results conflict with parallel processing models of dual-task theory that attribute differences in dual-task performance to the amount of overlap in cognitive resources between two tasks (
To conclude, the current study investigated the differential effect of WM task loads (verbal and visuospatial) on arithmetic operations (multiplication and subtraction). Consistent with prior meta-analytic work on correlations between WM tasks and arithmetic performance and with the dual-task literature on WM and arithmetic performance, the current study found consistent effects on arithmetic performance under the load of more complex secondary tasks, but no clear pattern of domain-specific interference. Despite investigating whether the crossover effect would emerge under conditions previously hypothesized to moderate the effect (difficulty and the system in which participants were educated), we did not find evidence for the predicted interaction in any of our analyses. Although multiplication and subtraction seemed to operate exclusively through verbal and visuospatial pathways, respectively, in the original study, this interaction has not been subsequently observed. We interpret these findings as evidence for a more domain-general pathway for the influence of WM secondary tasks on numerical cognition, although we encourage future work that continues to carefully consider how theories of working memory and dual-task performance could explain previous domain-specific effects within numerical cognition.
| Author | Sample size | WM tasks | Arithmetic tasks | Multiplication effect (PL vs. VSSP) | Subtraction effect (PL vs. VSSP) | PL vs. VSSP in Multiplication; Subtraction (ms) |
|---|---|---|---|---|---|---|
|  | 10 | Repeat nonword string (PL), Matching abstract shapes and location (VSSP) | exact subtraction, exact multiplication | 2.42 | 3.31 | 1170 vs. 996 |
|  | 57 | Repeat nonword string (PL), 4×4 grid location task (VSSP) | two-digit subtraction, one × two-digit multiplication | 0.04 | 0.04 | 5103 vs. 5018 |
|  | 73 | Repeat nonword string (PL), 4×4 grid location task (VSSP) | two-digit subtraction, one × two-digit multiplication | 0.02 | 0.07 | 3015 vs. 3038 |
|  | 32 | Letter span (PL), 5×5 grid location task (VSSP) | 2AFC multiplication (one × one; two × one digit), | 0.10 | 0.00 | 1015 vs. 989 |
| Chen, E. H., Jaeggi, S. M., & Bailey, D. H. – Chinese-educated sample | 22 | Letter span (PL), 5×5 grid location task (VSSP) | 2AFC multiplication (one × one; two × one digit), | 0.28 | 0.09 | 883 vs. 841 |
| Chen, E. H., Jaeggi, S. M., & Bailey, D. H. – other-educated sample | 71 | Letter span (PL), 5×5 grid location task (VSSP) | 2AFC multiplication (one × one; two × one digit), | 0.05 | 0.02 | 946 vs. 939 |
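| WM task | Arithmetic | RT Mean | RT SD | ACC Mean | ACC SD | n |
|---|---|---|---|---|---|---|
| No load | Multiplication | 787 | 222 | 93% | 11% | 97 |
| No load | Subtraction | 713 | 213 | 95% | 10% | 97 |
| Verbal | Multiplication | 938 | 221 | 90% | 11% | 97 |
| Verbal | Subtraction | 850 | 234 | 91% | 8% | 97 |
| Visuospatial | Multiplication | 923 | 231 | 90% | 9% | 97 |
| Visuospatial | Subtraction | 854 | 276 | 93% | 8% | 97 |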
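| Variable | n | % |
|---|---|---|
| Gender |  |  |
| Male | 33 | 34 |
| Female | 64 | 66 |
| Age in years, M (SD) | 20.1 (1.3) |  |
| Country of primary math education |  |  |
| US | 71 | 73.2 |
| China | 22 | 22.7 |
| Other | 4 | 4.1 |
| Math grade compared to peers |  |  |
| A | 35 | 36.08 |
| B | 49 | 50.52 |
| C | 12 | 12.37 |
| D | 1 | 1.03 |
| F | 0 | 0 |
| Abacus use |  |  |
| Never Taught | 69 | 71.13 |
| Never Used | 16 | 16.49 |
| Rarely | 11 | 11.34 |
| Sometimes | 1 | 1.03 |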
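| Factor | Whole sample: F | Whole sample: df | Whole sample: p | Whole sample: ηp² | Chinese-educated: F | Chinese-educated: df | Chinese-educated: p | Chinese-educated: ηp² |
|---|---|---|---|---|---|---|---|---|
| PL vs. VSSP × Multiplication | 1.20 | (1, 96) | .28 | .01 | 1.69 | (1, 21) | .21 | .07 |
| PL vs. VSSP × Subtraction | .15 | (1, 96) | .70 | .002 | .17 | (1, 21) | .67 | .01 |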
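As a reading aid for the effect sizes in the table above, the last value in each group of four is consistent with partial eta squared computed from the F statistic and its degrees of freedom, ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below uses a hypothetical helper name and only illustrates that relationship; it is not the software used for the reported analyses.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error) triples taken from the table above.
reported = [
    (1.20, 1, 96),
    (0.15, 1, 96),
    (1.69, 1, 21),
    (0.17, 1, 21),
]
for f_value, df_effect, df_error in reported:
    print(f_value, round(partial_eta_squared(f_value, df_effect, df_error), 3))
```

Rounded as in the table, these reproduce .01, .002, .07, and .01.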
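| Factor | Whole sample: F | Whole sample: df | Whole sample: p | Whole sample: ηp² | Chinese-educated: F | Chinese-educated: df | Chinese-educated: p | Chinese-educated: ηp² |
|---|---|---|---|---|---|---|---|---|
| PL vs. VSSP × Multiplication | .49 | (1, 96) | .49 | .01 | 3.59 | (1, 21) | .07 | .15 |
| PL vs. VSSP × Subtraction | 6.31* | (1, 96) | .01 | .06 | 3.41 | (1, 21) | .08 | .14 |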
*
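| Model | Task | Mean difference | SE | t | df | p |
|---|---|---|---|---|---|---|
| Multiplication |  |  |  |  |  |  |
| Whole | Verbal | 151 | 13.98 | 10.77 | 96 | < .001 |
| Whole | Visuospatial | 136 | 13.98 | 9.71 | 96 | < .001 |
| Chinese | Verbal | 160 | 34.45 | 4.65 | 21 | < .001 |
| Chinese | Visuospatial | 118 | 34.45 | 3.42 | 21 | .014 |
| Subtraction |  |  |  |  |  |  |
| Whole | Verbal | 137 | 13.98 | 9.77 | 96 | < .001 |
| Whole | Visuospatial | 141 | 13.98 | 10.12 | 96 | < .001 |
| Chinese | Verbal | 94 | 34.45 | 2.74 | 21 | < .001 |
| Chinese | Visuospatial | 106 | 34.45 | 3.07 | 21 | < .001 |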
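| Model | Task | Mean difference | SE | t | df | p |
|---|---|---|---|---|---|---|
| Multiplication |  |  |  |  |  |  |
| Whole | Verbal | 3% | 1 | 4.68 | 96 | < .001 |
| Whole | Visuospatial | 3% | 1 | 3.99 | 96 | < .001 |
| Chinese | Verbal | 4% | 2 | 2.67 | 21 | .14 |
| Chinese | Visuospatial | 2% | 2 | 1.39 | 21 | 1 |
| Subtraction |  |  |  |  |  |  |
| Whole | Verbal | 4% | 1 | 5.23 | 96 | < .001 |
| Whole | Visuospatial | 2% | 1 | 2.64 | 96 | .07 |
| Chinese | Verbal | 2% | 2 | 1.22 | 21 | 1 |
| Chinese | Visuospatial | 0% | 2 | 0.22 | 21 | 1 |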
| Models | Whole: BF_{10} | Whole: error % | Chinese: BF_{10} | Chinese: error % |
|---|---|---|---|---|
| Arithmetic | 3.80e+6 | 0.91 | 0.25 | 1.33 |
| WM task + Arithmetic | 4.50e+5 | 1.78 | 0.06 | 1.42 |
| WM task + Arithmetic + WM task × Arithmetic | 9.60e+4 | 4.67 | 0.03 | 2.38 |
| WM task | 0.12 | 0.79 | 0.26 | 1.92 |
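One way to read model-comparison tables of this kind is to compare two models directly by dividing their Bayes factors, which is valid when both BF_{10} values are expressed against the same null model (assumed here, not confirmed by the table itself). The sketch below uses a hypothetical helper name and the whole-sample values from the table above to compare the model that includes the WM task × Arithmetic interaction with the additive model.

```python
def compare_models(bf10_model_a, bf10_model_b):
    """Bayes factor for model A over model B.

    Assumption: both inputs are BF10 values computed against the same
    null model, so their ratio is the Bayes factor between A and B.
    """
    return bf10_model_a / bf10_model_b


bf_interaction_model = 9.60e4  # WM task + Arithmetic + WM task x Arithmetic
bf_additive_model = 4.50e5     # WM task + Arithmetic

bf_interaction_vs_additive = compare_models(bf_interaction_model, bf_additive_model)
bf_additive_vs_interaction = 1 / bf_interaction_vs_additive

print(round(bf_interaction_vs_additive, 3))  # 0.213
print(round(bf_additive_vs_interaction, 2))  # 4.69
```

A ratio below 1 means the data are better predicted by the additive model than by the model that adds the interaction term.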
| Models | Whole: BF_{10} | Whole: error % | Chinese: BF_{10} | Chinese: error % |
|---|---|---|---|---|
| Arithmetic | 159.70 | 1.11 | 0.31 | 0.85 |
| WM task + Arithmetic | 106.89 | 3.19 | 0.49 | 1.73 |
| WM task + Arithmetic + WM task × Arithmetic | 28.05 | 2.00 | 0.15 | 2.86 |
| WM task | 0.59 | 0.90 | 1.58 | 1.68 |
| Models | Whole: BF_{10} | Whole: error % | Chinese: BF_{10} | Chinese: error % |
|---|---|---|---|---|
| WM task | 4.0e+26 | 0.87 | 808.66 | 0.81 |
| Arithmetic | 4.19e+7 | 0.73 | 0.19 | 2.02 |
| WM task + Arithmetic | 1.3e+37 | 1.02 | 153.86 | 2.02 |
| WM task + Arithmetic + WM task × Arithmetic | 6.8e+35 | 3.51 | 32.60 | 3.00 |
| Models | Whole: BF_{10} | Whole: error % | Chinese: BF_{10} | Chinese: error % |
|---|---|---|---|---|
| WM task | 1.61e+5 | 0.92 | 1.21 | 0.86 |
| Arithmetic | 1868.54 | 1.21 | 0.18 | 1.26 |
| WM task + Arithmetic | 6.13e+8 | 2.00 | 0.22 | 1.34 |
| WM task + Arithmetic + WM task × Arithmetic | 4.41e+7 | 0.90 | 0.05 | 2.12 |
Research was carried out in accordance with the ethical principles and standards of the Institutional Review Board at the University of California, Irvine.
For this article, a data set is freely available (
The Supplementary Materials contain the following items (for access see
Preregistration protocol
Research data and codebook
R code to organize data for analyses in JASP
Arithmetic tasks, additional analyses, and documentation of deviations from original preregistration
Protocol for dualtask experiment
The authors have no funding to report.
The authors have declared that no competing interests exist.
The authors have no additional (i.e., nonfinancial) support to report.