People often struggle with story problems even when they are capable of doing the underlying mathematics because they have difficulty extracting relevant information from the text to determine the appropriate mathematical equation and procedure (Daroczy, Wolska, Meurers, & Nuerk, 2015; Mayer & Hegarty, 1996; Schley & Fujita, 2014; Thevenot & Barrouillet, 2015). To be successful, solvers must use both mathematical and linguistic knowledge. For example, linguistic knowledge about word order may be critical in extracting the appropriate equation from the text (Fuchs, Gilbert, Fuchs, Seethaler, & Martin, 2018): ‘more than’ implies a different operation than does ‘then there were more’. In the present research, we addressed the question of whether subtle differences in the words used in story scenarios influence adults’ performance on mathematically simple problems.
Mattarella-Micke and Beilock (2010) reported that word choice in irrelevant story text coupled with numerical interference influenced math performance. More specifically, their foregrounding hypothesis reflected predictions based both on situation model research in text comprehension (e.g., Kintsch & van Dijk, 1978) and retrieval processes in arithmetic cognition (e.g., Siegler, 1988). Within a text, the situation model represents the characters, their behaviour, and object-character interactions (Zwaan & Radvansky, 1998). Shifts in the situation model can occur if important story elements are foregrounded (Glenberg, Meyer, & Lindem, 1987). Foregrounding occurs when objects are made more or less salient to participants by either spatially associating or dissociating extraneous information with the protagonist in a narrative (Glenberg et al., 1987). For example, ‘He put on his sweatshirt’ is spatially associative because the act of putting on the sweatshirt associates the sweatshirt with the protagonist’s body. In contrast, ‘He took off his sweatshirt’ is spatially dissociative because the act of taking off the sweatshirt dissociates the sweatshirt from the protagonist’s body. In texts where the object was spatially associated with the protagonist as opposed to dissociated, Glenberg and colleagues (1987) found that participants were faster to confirm or disconfirm the presence of the object. Based on this finding, Mattarella-Micke and Beilock hypothesized that foregrounding could affect math story problems when irrelevant interfering numbers were foregrounded and thus associated with the protagonist of the story.
Predictions consistent with the distribution of association (DOA) model of simple arithmetic (Siegler, 1988) were used by Mattarella-Micke and Beilock (2010) to refine the foregrounding hypothesis. According to the DOA model, both correct and incorrect answers to math problems are activated and compete for retrieval. Thus, if an incorrect answer is more highly activated than the correct answer, the incorrect response may be retrieved. The incorrect answer may also be retrieved if the solver activates the wrong information in response to a presented stimulus; this phenomenon is known as retrieval interference. For example, when participants are presented with single-digit addition problems (e.g., 2 + 3) followed by single-digit multiplication problems (e.g., 2 × 3), they are more likely to add when they should be multiplying (e.g., respond that 2 × 3 = 5; Campbell & Timm, 2000). For adults who have years of addition and multiplication practice, the correct response (e.g., 12 to 3 × 4) receives more activation than the highly-interfering answer (e.g., 7) under most conditions. However, the activation of these associations is sensitive to semantic context (Bassok, Pedigo, & Oskarsson, 2008). Thus, Mattarella-Micke and Beilock proposed that if an interfering answer became semantically foregrounded (i.e., through association with the protagonist), math answer retrieval would be disrupted because these slight shifts in the story problem text would change the accessibility of problem content. In essence, Mattarella-Micke and Beilock made the strong claim that minor word changes to text that is irrelevant to constructing an appropriate situation model could nevertheless influence problem solving of skilled adults, especially when coupled with an interfering number.
In each word problem used by Mattarella-Micke and Beilock (2010), a number that represented a quantity of objects was either associated with or dissociated from the protagonist. For example, consider this problem from Mattarella-Micke and Beilock: “Alexis, an architect, was busy designing a new office building in town. She picked up 16 sketches she had been working on and took the elevator to a meeting where she would deliver her ideas. Alexis took the elevator 7 floors. If each floor is 9 feet tall, how many feet did she travel?” In this problem, the 16 sketches are associated with the protagonist because they were picked up. In the dissociation version, the object would be dissociated from the protagonist, such as “She set aside 16 sketches….” The number connected with the associated or dissociated information was either highly-interfering or less-interfering relative to the required calculation. In this example, the highly-interfering number is 16, which is the sum of 7 and 9 (i.e., the critical numbers for the mathematical problem). In contrast, the less-interfering version of this scenario contained a number that was similar in magnitude to the interfering number (e.g., 15). Mattarella-Micke and Beilock hypothesized that foregrounded information that included a highly-interfering number would disrupt mathematical problem solving. In support of their hypothesis, participants made more multiplication errors when the story included a highly-interfering number (i.e., the sum of the critical numbers) and when the object in the story was associated with the protagonist.
Jarosz and Jaeger (2019) attempted to replicate Mattarella-Micke and Beilock’s (2010) results for multiplication and extend the findings to division. They proposed the inconsistent-operations hypothesis, which includes the prediction that association will activate addition and adds the further prediction that dissociation will activate subtraction. For example, the dissociative words ‘put down’ are often used in subtraction problems, whereas the associative words ‘picked up’ are often used in addition problems. Thus, in the example above, the addition operation would be activated in the associative condition, potentially interfering when the problem required multiplication, whereas subtraction would be activated in the dissociative condition, potentially interfering when the problem required division. Addition and multiplication interfere with each other in arithmetic studies without story contexts (Butterworth, Zorzi, Girelli, & Jonckheere, 2001; Campbell & Metcalfe, 2007; Campbell & Timm, 2000; LeFevre et al., 1996), and subtraction interferes with division if solvers are using the procedural strategy of repeated subtraction to calculate the quotient (Kouba, 1989; Mulligan & Mitchelmore, 1997). Note that although Mattarella-Micke and Beilock included division problems in their study, these problems were not analyzed.
The foregrounding hypothesis (Mattarella-Micke & Beilock, 2010) and the inconsistent-operations hypothesis (Jarosz & Jaeger, 2019) make the same prediction for multiplication problems: Solvers will make more errors on multiplication problems for associative than dissociative scenarios. Unlike the foregrounding hypothesis, however, the inconsistent-operations hypothesis also predicts that solvers will make more errors on division problems for dissociative than for associative scenarios because dissociative scenarios contain subtraction language. Consistent with this prediction, across all three of their studies, Jarosz and Jaeger found that participants made more division errors when the problem scenario was dissociative rather than associative. The presence or absence of an interfering number did not influence their results. Moreover, they failed to replicate Mattarella-Micke and Beilock’s findings in that there were no differences in performance for associative versus dissociative problems for multiplication. Thus, the foregrounding hypothesis was not strongly supported when all five studies across these two papers are considered. Furthermore, a search among published work that cited Mattarella-Micke and Beilock (2010) did not uncover any other studies in which associative/dissociative language was manipulated in math word problems, despite the substantial literature on interactions between linguistic and numerical factors in word problem solving (Daroczy et al., 2015).
The foregrounding and the inconsistent-operations hypotheses have implications for models of the interplay between text elements and numerical components of story problems (Daroczy et al., 2015). The lack of supporting literature and the inconsistent patterns across the two papers suggested to us, however, that the effects of these highly specific manipulations on problem-solving performance might not be strong. We hypothesized that minor wording changes and the presence of interfering numbers in irrelevant text may not be sufficient to disrupt the construction of the situation model for story problems, especially among skilled adults. Accordingly, we attempted to conceptually replicate and extend the findings of Mattarella-Micke and Beilock (2010).
A conceptual replication was necessary for several reasons. First, we were unable to obtain the original stimuli used by Mattarella-Micke and Beilock (2010). Second, Mattarella-Micke and Beilock’s design confounded specific stories with experimental conditions (i.e., story content was not counterbalanced across conditions) and other aspects of the problems were not controlled (i.e., problems varied in length). We redesigned the materials to be more uniform and to avoid confounds between specific stories and conditions. We also presented the word problems all at once, rather than in two parts, to maximize the ecological validity of the word problems. Third, sample sizes in Mattarella-Micke and Beilock’s studies were small (i.e., 38 and 21 in Experiments 1 and 2, respectively) and effect sizes were small, indicating that larger sample sizes would be prudent for replication.
After we had designed and implemented the current research, we became aware of similar work by Jarosz and Jaeger (2019) in which they also attempted to replicate Mattarella-Micke and Beilock (2010). Thus, not all the features of the studies conducted by Jarosz and Jaeger were included in the present research. Fortunately, we could compare performance across operations and across association conditions for both operations, because Jarosz and Jaeger had also modelled their manipulations on those of Mattarella-Micke and Beilock. We did not, however, have the same numerical interference manipulations as Jarosz and Jaeger. In summary, our analyses constitute a conceptual replication of the research of Mattarella-Micke and Beilock, and a partial conceptual replication of the research of Jarosz and Jaeger.
Study 1
Method
Participants
Two hundred and forty-seven undergraduate students accepted the invitation to participate in the study. However, after exclusion criteria were applied (see Results for details), data for 205 participants (83.0%) were analyzed (147 females; M_{age} = 22.4 years, SD = 5.9). Of these participants, all reported speaking fluent English, with 75.6% reporting English as their first language. After English, the most common first languages reported were Mandarin (6.3%), Arabic (5.9%), and French (2.9%). The remaining 9.3% of participants reported other first languages with a frequency of less than 1%. The study was approved by the Carleton University Research Ethics Board.
Measures
The Math Background and Interests Questionnaire
The Math Background and Interests Questionnaire (MBIQ; LeFevre, Smith-Chant, Hiscock, Daley, & Morris, 2003) is a 20-item measure of math attitudes (n = 3), language and writing skills (n = 6), and demographic information (n = 11). Demographic information included questions with respect to gender, age, and program of study. Math attitudes and language and writing skills were reported on a 5-point Likert-type scale, ranging from “1” (i.e., very low/almost never) to “5” (i.e., very high/always). With the exception of age and first language, these data were not used in the present research.
Math Story Problems
Six counterbalanced problem sets composed of 32 multiplication and 16 division problems were created based on the format used by Mattarella-Micke and Beilock (2010). Problems required single-digit multiplication with operands from 2 to 9 (e.g., 4 × 5), excluding ties (e.g., 4 × 4); division problems used the corresponding operands (e.g., 20 ÷ 4). For multiplication problems, the scenarios were either associative or dissociative and contained either highly-interfering (i.e., sum of the critical problem numbers) or less-interfering numbers (i.e., numbers similar in magnitude to the interfering number). For division problems, the scenarios were either associative or dissociative and contained the same number used in the corresponding highly-interfering multiplication problem. Mattarella-Micke and Beilock had an equal number of multiplication and division problems (36 of each, total of 72). They described division as filler problems and did not provide details of their design or any analyses. Jarosz and Jaeger had 32 multiplication and 32 division problems (total of 64) in their problem set. Because our study was designed prior to the Jarosz and Jaeger (2019) publication, we did not have highly- and less-interfering conditions for division. In all studies, half of the problems in each operation were associative and half were dissociative.
The story problems consisted of two components, although the whole story problem was presented simultaneously to the participants. The first part introduced the story problem protagonist and a scenario that included a number of items (highly-interfering or less-interfering for multiplication) that were either associated with or dissociated from the protagonist (see Table 1 for examples). The second part presented a mathematical question that required either multiplication or division. The highly-interfering numbers were the sums of the two numbers presented in the multiplication problem (e.g., “5” for 3 × 2). The less-interfering numbers were similar in magnitude to the highly-interfering numbers (e.g., “4” for 3 × 2). For division problems, the interfering number was the same number used in the highly-interfering multiplication problem (e.g., “5” for 6 ÷ 3).
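The number construction described above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and taking the less-interfering number to be the sum minus one is an assumption that happens to match the examples given here (5 vs. 4; 16 vs. 15), since the text specifies only that the number was similar in magnitude.

```python
def interfering_numbers(a: int, b: int) -> dict:
    """Return illustrative scenario numbers for the multiplication problem a x b."""
    if not (2 <= a <= 9 and 2 <= b <= 9):
        raise ValueError("operands must be single digits from 2 to 9")
    if a == b:
        raise ValueError("tie problems (e.g., 4 x 4) were excluded")
    highly = a + b     # highly-interfering: sum of the two critical numbers
    less = highly - 1  # ASSUMPTION: 'similar in magnitude' taken as sum - 1
    return {
        "product": a * b,
        "highly_interfering": highly,
        "less_interfering": less,
    }


print(interfering_numbers(3, 2))
print(interfering_numbers(7, 9))
```

For the corresponding division problem (e.g., 6 ÷ 3), the same highly-interfering value would be reused.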
Table 1

| Operation | Story Text |
| --- | --- |
| Multiplication | Jane, a wristwatch model, is preparing for her next photoshoot. She put nail polish on 5/4/some fingers/She took nail polish off 5/4/some fingers and then headed to work. Jane had 2 photoshoots today and each shoot lasted 3 hours. How many hours did Jane work today? |
| Division | Amy, a museum tour guide, was busy preparing for the next big tour. She picked up 15/some maps/She put down 15/some maps and collected payment from the tour group attendees. She collects $56 in total. If each person in the group paid $7, how many people are in her tour group? |

Note. Italics are used in this table to highlight the information that was manipulated. Associative/dissociative content and highly-interfering number/less-interfering number/interfering word (multiplication) and interfering number/interfering word (division) content is separated by slashes.
Mattarella-Micke and Beilock (2010) and Jarosz and Jaeger (2019) presented the components of the story problems sequentially. However, in the present research our goal was to make the story problems similar to those encountered in typical situations. In Mattarella-Micke and Beilock (2010), participants were asked to rate the introduction sentences for clarity and similarity to other passages to ensure that they read all of the text. In contrast, in the present research, participants were asked to answer a three-item multiple-choice comprehension question related to the protagonist (e.g., “What kind of model was Jane?”: swimsuit, wristwatch, or shoe; or “Where does Amy give tours?”: zoo, museum, or art gallery) on a separate screen after responding to the story problem. The comprehension question was included to ensure that participants read the whole story problem, including the associating or dissociating information and the interfering number.
Procedure
Participants were recruited through an online system and received course credit for their participation. Participants logged in, selected the study, and received a link to a survey created with an online survey tool (i.e., Qualtrics). Electronic informed consent was obtained prior to the start of the survey. Participants completed the MBIQ followed by the story problems. Prior to beginning the story problems, participants were asked to complete the problems as quickly and accurately as possible and informed that they could use pencil and paper if needed but were asked not to use a calculator. After participants responded to the story problem by typing a number into the box, the problem disappeared, and the comprehension question appeared. A single random order of the arithmetic problems was used, but different participants saw different initial scenarios for a particular problem. There were six different lists of problems (3 [multiplication highly-interfering, multiplication less-interfering, division] by 2 [associative, dissociative]). Participants were randomly assigned to one of the six lists.
After every eight story problems, an entertaining meme (i.e., a humorous photo combined with text) appeared on the screen. The meme signalled to participants that they could take a break for up to three minutes before continuing with the story problem task. The memes were inserted to reduce fatigue and encourage engagement so that participants would try their best and answer all of the questions. The study took approximately 60 minutes to complete. Stimuli, data, and analyses for Study 1 and Study 2 can be found in the Supplementary Materials.
Results
Several criteria were used to exclude participants who did not seem to be putting forth their best effort. First, data were excluded for participants who obtained a total score of less than 50% on the mathematical part of the story problems (n = 40). Second, participants who failed to answer at least 8 of the last 10 story problems were excluded (n = 2), on the assumption that they had decided not to complete the study. Thus, 205 participants were included in the analyses.
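The two exclusion rules can be expressed as a short filter. The sketch below is illustrative only: the function name, the argument layout, and the assumption that the score is computed over all 48 story problems (32 multiplication plus 16 division) are ours and are not taken from the analysis scripts.

```python
from typing import Sequence


def keep_participant(n_correct: int, answered: Sequence[bool], n_problems: int = 48) -> bool:
    """Apply both exclusion rules to one participant.

    n_correct: number of story problems answered correctly (hypothetical field).
    answered:  one flag per problem, in presentation order; True if any
               response was entered (hypothetical field).
    """
    score_ok = n_correct / n_problems >= 0.50  # rule 1: at least 50% correct
    finished = sum(answered[-10:]) >= 8        # rule 2: at least 8 of the last 10 answered
    return score_ok and finished


# Sample-size bookkeeping reported for Study 1.
recruited, low_score, did_not_finish = 247, 40, 2
print(recruited - low_score - did_not_finish)  # 205
```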
Analyses were conducted using repeated-measures ANOVAs. Error bars in figures are standard errors of the means, to be consistent with Mattarella-Micke and Beilock (2010) and Jarosz and Jaeger (2019). Furthermore, in addition to reporting frequentist statistics, Bayes factors were computed to evaluate the fit of the data under the null and alternative hypotheses. The Bayes factor, BF_{01}, is “a ratio that contrasts the likelihood of the data fitting under the null hypothesis with the likelihood of fitting under the alternative hypothesis” (Jarosz & Wiley, 2014). For example, a Bayes factor of 12.2 indicates that the data are 12.2 times more likely to occur under the null hypothesis than the alternative hypothesis (Jarosz & Wiley, 2014). Taking the inverse, BF_{10}, puts the Bayes factor in terms of the alternative hypothesis (e.g., BF_{01} = 12.2, BF_{10} = 1/12.2 = 0.082). The Bayes factors were calculated in JASP. All interpretations of the Bayes factors with respect to the strength of the evidence for the null or alternative hypothesis in the present analyses are in accordance with the guidelines developed by Jeffreys (1961) and as listed in Table 4 of Jarosz and Wiley (2014).
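The inversion and the verbal labels can be illustrated with a small sketch. The function names are hypothetical, and the cut-offs (3, 10, 30, 100) are the commonly cited Jeffreys categories as we understand them; they should be checked against Table 4 of Jarosz and Wiley (2014). How exact boundary values are assigned is an assumption.

```python
def invert_bf(bf: float) -> float:
    """Convert BF01 to BF10 (or BF10 to BF01) by taking the reciprocal."""
    return 1.0 / bf


def jeffreys_label(bf: float) -> str:
    """Label a Bayes factor >= 1, expressed in favour of the preferred hypothesis."""
    if bf < 1:
        raise ValueError("invert the Bayes factor first so that it is >= 1")
    # ASSUMPTION: lower bounds are inclusive, upper bounds exclusive.
    for upper, label in [(3, "anecdotal"), (10, "substantial"),
                         (30, "strong"), (100, "very strong")]:
        if bf < upper:
            return label
    return "decisive"


print(round(invert_bf(12.2), 3))  # 0.082
print(jeffreys_label(12.2))       # strong
print(jeffreys_label(7.1))        # substantial
```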
Story Problems
Story problem comprehension scores (M = 89.6%, SD = 10.9%) suggested that participants were reading the story problems in their entirety. Participants made errors on 7.2% (SD = 11.2) of multiplication problems and 10.9% (SD = 16.2) of division problems. This level of performance was similar to the percentage of errors for multiplication (Ms = 5–7%) and division (Ms = 7–9%) reported by Jarosz and Jaeger (2019) in three studies and to the percentage of error reported by Mattarella-Micke and Beilock (2010) for multiplication (Ms approximately 6–7%) in two studies.
Multiplication Problems
To replicate the analyses performed by Mattarella-Micke and Beilock (2010), percentage error for multiplication story problems was analyzed in a 2 (association: associated, dissociated) by 2 (interfering number: highly, less) repeated measures ANOVA (see Figure 1; compare to Figures 1 and 2 of Mattarella-Micke & Beilock). The effect of association was not significant, F(1, 204) = 0.14, p = .71, ${\mathrm{\eta}}_{\mathrm{p}}^{2}$ = .001. The estimated Bayes factor, BF_{01} = 12.2, indicates strong evidence for the null hypothesis. The effect of interference was not significant, F(1, 204) = 0.33, p = .57, ${\mathrm{\eta}}_{\mathrm{p}}^{2}$ = .002. The estimated Bayes factor, BF_{01} = 11.2, indicates strong evidence for the null hypothesis. Critically, in contrast to the results of Mattarella-Micke and Beilock, the interaction between interference and association was not significant, F(1, 204) = 1.06, p = .31, ${\mathrm{\eta}}_{\mathrm{p}}^{2}$ = .005. The estimated Bayes factor, BF_{01} = 7.1, indicates substantial evidence for the null hypothesis. In summary, no evidence was found for the pattern of results reported by Mattarella-Micke and Beilock.
Figure 1
Comparison Across Operations
To compare our results to those of Jarosz and Jaeger (2019), story problem errors were analyzed in a 2 (association: associated, dissociated) × 2 (operation: multiplication, division) repeated measures ANOVA (see Figure 2). Replicating Jarosz and Jaeger, there was a significant main effect of operation, F(1, 204) = 17.63, p < .001, ${\mathrm{\eta}}_{\mathrm{p}}^{2}$ = .08. People made more errors on division problems (M = 10.9%) than on multiplication problems (M = 7.2%). The estimated Bayes factor, BF_{10} = 1.220e+5, indicates decisive evidence for the effect of operation.
In contrast, there was no significant main effect of association condition (M = 9.0% for associated; M = 9.1% for dissociated), F(1, 204) = 0.07, p = .80, ${\mathrm{\eta}}_{\mathrm{p}}^{2}$ = .001. The estimated Bayes factor, BF_{01} = 12.7, indicates strong evidence for the null hypothesis. Furthermore, in contrast to Jarosz and Jaeger, the interaction of association and operation was not significant, F(1, 204) = 0.33, p = .57, ${\mathrm{\eta}}_{\mathrm{p}}^{2}$ = .002. The estimated Bayes factor, BF_{01} = 7.7, indicates substantial evidence for the null hypothesis. Thus, we replicated the effect of operation, but did not find evidence that dissociated contexts influenced problem solving for division problems.
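As a consistency check on the reported effect sizes, partial eta squared can be recovered from an F statistic and its degrees of freedom using the standard identity F × df_effect / (F × df_effect + df_error). The sketch below is illustrative; the function name is ours, and the reported values were produced by the analysis software rather than by this code.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# Main effect of operation: F(1, 204) = 17.63
print(round(partial_eta_squared(17.63, 1, 204), 2))  # 0.08
# Association by operation interaction: F(1, 204) = 0.33
print(round(partial_eta_squared(0.33, 1, 204), 3))   # 0.002
```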
Figure 2
Discussion
We failed to replicate the findings of Mattarella-Micke and Beilock (2010) and thus did not find support for the foregrounding hypothesis. Neither textual association nor numerical interference significantly affected story problem performance for multiplication (cf. Figures 1 and 2 in Mattarella-Micke and Beilock vs. Figure 1 in the present study). We also did not replicate the findings of Jarosz and Jaeger (2019), who found that dissociative contexts resulted in more errors on division problems than associative contexts (cf. Figures 1 to 3 in Jarosz & Jaeger vs. Figure 2 in the present study).
The stimuli used in the present study were constructed using the same criteria as Mattarella-Micke and Beilock (2010) and Jarosz and Jaeger (2019), but they were not identical. We assumed that, if association and interference truly influenced story problem solving, then the findings should be replicable with similar stimuli. We ensured that the story problems were counterbalanced such that each story appeared in every condition. It was not possible to determine if Mattarella-Micke and Beilock used this stringent criterion based on the information they provided; however, Jarosz and Jaeger did counterbalance stories across conditions. Careful problem construction and counterbalancing greatly reduced the possibility that one or a few poorly constructed story problems in a particular condition might drive the pattern of results.
Study 2
In Study 2, we attempted to provide a test of the foregrounding hypothesis that would maximize the opportunity for an interfering number in an associative context to influence problem solving. Thus, Study 2 was identical to Study 1 with one exception: Instead of presenting story problems that included either a highly-interfering or less-interfering number, the associated and dissociated multiplication problems contained either a highly-interfering number (as in Study 1) or a nonnumeric word. Division problems all had nonnumeric scenarios.
According to Mattarella-Micke and Beilock (2010) and the foregrounding hypothesis, only numerical information should interfere with problem solving. Thus, we directly compared numerical interference (i.e., the same number used in the highly-interfering story problems in Study 1) to nonnumerical interference (i.e., an interfering word as opposed to a number). For example, “She put/removed nail polish on/from 5 fingers and then headed to work” was compared with “She put/removed nail polish on/from some fingers and then headed to work.” Note that, in their Study 3, Jarosz and Jaeger (2019) found an interaction between operation and association, even without any numerical interference. They concluded that the interaction was driven by the relation between the protagonists’ actions and the required operation, not by numerical interference. Thus, Study 2 provides a partial conceptual replication of the effects for division problems reported by Jarosz and Jaeger (Study 3).
Method
Participants
Four hundred and twenty-five undergraduate students at a Canadian university accepted the invitation to participate in the study. As in Study 1, participants were excluded if they obtained a total score of less than 50% on the mathematical part of the story problems (n = 64). Participants who failed to answer at least 8 of the last 10 story problems were also excluded (n = 2), on the assumption that they had decided not to complete the study. After exclusion criteria were applied, data for 359 participants (84.5%) were analyzed (240 females; M_{age} = 20.1 years, SD = 3.8). All retained participants reported speaking fluent English, with 81.3% reporting English as their first language. After English, the most common first languages reported were Mandarin (5.0%), Arabic (3.6%), and French (3.6%). The remaining 6.5% of participants reported other first languages with a frequency of less than 1%.
Measures
With the exception of the changes made to the interference component of the story problems, the measures for Study 2 were the same as those reported in Study 1.
Procedure
The procedure for Study 2 was the same as that reported in Study 1.
Results
Story Problems
Story problem comprehension scores (M = 88.8%, SD = 9.9%) suggested that participants were reading the story problems in their entirety. Participants made errors on 5.0% (SD = 8.1%) of multiplication problems and 12.4% (SD = 19.3%) of division problems. These means are similar to those reported in Study 1, and in Mattarella-Micke and Beilock (2010) and Jarosz and Jaeger (2019).
Multiplication Problems
Percentage error for multiplication story problems was analyzed in a 2 (association: associated, dissociated) by 2 (interference: numeric, nonnumeric) repeated measures ANOVA (see Figure 3). There was a significant effect of interference, F(1, 358) = 4.28, p = .04, ${\mathrm{\eta}}_{\mathrm{p}}^{2}$ = .012. People made more errors on problems that contained numeric (M = 5.4%) than nonnumeric (M = 4.6%) scenarios. However, the estimated Bayes factor, BF_{10} = 0.13, indicates anecdotal evidence for the null hypothesis. Thus, there is not enough evidence to support the hypothesis that the presence of extraneous numbers made the problems more difficult.
The effect of association was not significant, F(1, 358) = 0.83, p = .36, ${\mathrm{\eta}}_{\mathrm{p}}^{2}$ = .002. The estimated Bayes factor, BF_{01} = 11.8, indicates strong evidence for the null hypothesis. Furthermore, the interaction between interference and association was not significant, F(1, 358) = 0.93, p = .34, ${\mathrm{\eta}}_{\mathrm{p}}^{2}$ = .003. The estimated Bayes factor, BF_{01} = 7.7, indicates substantial evidence for the null hypothesis. Thus, there was fairly strong evidence in favour of the null hypothesis for the effects of association and the interaction between association and interference. We again failed to replicate the findings of Mattarella-Micke and Beilock (2010).
Figure 3
Comparison Across Operations
To compare our results to those of Jarosz and Jaeger (2019), story problem errors were analyzed in a 2 (association: associated, dissociated) × 2 (operation: multiplication, division) repeated measures ANOVA (see Figure 4). There was a significant main effect of operation, F(1, 358) = 63.34, p < .001, ${\mathrm{\eta}}_{\mathrm{p}}^{2}$ = .15. People made more errors on division problems (M = 12.4%) than on multiplication problems (M = 5.0%). The estimated Bayes factor, BF_{10} = 6.427e+25, indicates decisive evidence for the effect of operation. However, the effect of association was not significant (M = 8.7% for associated; M = 8.8% for dissociated), F(1, 358) = 0.03, p = .87, ${\mathrm{\eta}}_{\mathrm{p}}^{2}$ = .001. The estimated Bayes factor, BF_{01} = 16.7, indicates strong evidence for the null hypothesis. The association by operation interaction was also not significant, F(1, 358) = 0.38, p = .54, ${\mathrm{\eta}}_{\mathrm{p}}^{2}$ = .001. The estimated Bayes factor, BF_{01} = 11.1, indicates strong evidence for the null hypothesis. Thus, there is decisive evidence that the model should include an effect of operation and strong support for the null hypothesis for all other effects.
Figure 4
Summary of Accuracy Analyses
We again failed to replicate the patterns reported by Mattarella-Micke and Beilock (2010). Although we did find a small effect of an interfering number compared to a nonnumerical word, the associated Bayes factor indicated that there was not much support for the model that included the effect of interference over the null model. Furthermore, although we found an effect of operation, we did not replicate the interaction between operation and association reported by Jarosz and Jaeger (2019). Thus, the results of this study did not support either the foregrounding or the inconsistent-operations hypothesis. To provide further insight into the possible effects of manipulations of operation, association, and interference on performance, we looked more closely at the types of errors that participants made.
Error Type Analysis
The foregrounding and inconsistent-operations hypotheses make specific predictions about the types of errors that people will make when they solve story problems that contain associative language and numerical interference. Thus, to better understand the types of errors individuals made, both in general and across each condition, errors for each story problem were analyzed, combining across Study 1 and Study 2.
Table 2 shows the error classifications for 1,463 out of the 2,112 errors participants made in both studies. The remaining 649 errors could not be classified into categories because they were either left blank or appeared to be mistyping errors (e.g., 7 × 9 = 634). Classifiable errors fell into two main categories: wrong operation errors and calculation errors. The foregrounding and inconsistent-operations hypotheses predict that participants will make specific wrong operation errors: using addition on multiplication problems with associated contexts and using subtraction on division problems with dissociated contexts. As shown in Table 2, however, addition and subtraction errors were infrequent (1.2% and 0.7% of total classifiable errors) and were approximately equally distributed across the story problem conditions. Instead, people frequently chose the wrong operation on division problems, multiplying instead of dividing (48.3% of classifiable errors). They rarely divided on multiplication problems (3.1% of classifiable errors); instead, most errors on multiplication problems were arithmetic miscalculations.
Table 2
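| Error Type | Division: Associated | Division: Dissociated | Multiplication: Associated, Interfering | Multiplication: Associated, Non-Interfering | Multiplication: Dissociated, Interfering | Multiplication: Dissociated, Non-Interfering | Total |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Wrong Operation Errors | | | | | | | |
| Multiplication | 333 | 373 | – | – | – | – | 706 |
| Division | – | – | 13 | 13 | 8 | 12 | 46 |
| Addition | 5 | 3 | 4 | 2 | 3 | 1 | 18 |
| Subtraction | 1 | 0 | 3 | 2 | 3 | 1 | 10 |
| Calculation Errors | | | | | | | |
| Table-related multiplication | – | – | 57 | 73 | 61 | 64 | 255 |
| Table-related division | 91 | 67 | – | – | – | – | 158 |
| Other numerical errors | 25 | 34 | 57 | 46 | 57 | 51 | 270 |
| Total Classifiable Errors | 455 | 477 | 134 | 136 | 132 | 129 | 1,463 |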
The arithmetic calculation errors observed in the present studies are common in research on simple arithmetic (Campbell, 1995; Campbell & Xue, 2001). Table-related multiplication errors occurred when solvers used a multiplication fact in which one operand was one larger or smaller than in the presented problem (17.4% of classifiable errors; e.g., 9 × 6 = 48, the product of 8 × 6). Table-related division errors occurred when solvers’ answers for division problems were one unit off of the correct answer. For example, if the division question was 21 ÷ 3, participants responded with 8 or 6 instead of 7 (10.8% of classifiable errors). Other numerical errors occurred when solvers’ responses were numerically close to the correct response but did not meet the criteria for being a table-related error (18.5% of classifiable errors). For example, they responded to 9 × 4 with 34 (i.e., an operand intrusion error; Campbell, 1994).
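The percentages quoted for each error category can be recomputed from the Total column of Table 2. The following is a minimal Python sketch for checking that arithmetic, not the authors' analysis script; the dictionary keys and the function name are illustrative assumptions, and only the counts are taken from Table 2.

```python
# Row totals from the Total column of Table 2
# (classifiable errors pooled across Study 1 and Study 2).
ROW_TOTALS = {
    "wrong operation: multiplication": 706,
    "wrong operation: division": 46,
    "wrong operation: addition": 18,
    "wrong operation: subtraction": 10,
    "calculation: table-related multiplication": 255,
    "calculation: table-related division": 158,
    "calculation: other numerical errors": 270,
}


def percent_of_classifiable(row_totals):
    """Return each category's share of all classifiable errors, in percent (1 decimal)."""
    total = sum(row_totals.values())
    return {name: round(100 * count / total, 1) for name, count in row_totals.items()}


if __name__ == "__main__":
    print("Total classifiable errors:", sum(ROW_TOTALS.values()))
    for name, pct in percent_of_classifiable(ROW_TOTALS).items():
        print(f"{name}: {pct}%")
```

Under these counts, the shares agree with the percentages reported in the text (for example, 48.3% for multiplying on division problems and 17.4% for table-related multiplication errors).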
Operation errors were more common than calculation errors for division problems (77% vs. 23%) but less common than calculation errors for multiplication problems (12% vs. 88%), χ²(1, N = 1,463) = 564.98, p < .001. Participants were probably biased to select multiplication because there were twice as many multiplication problems as division problems. Interestingly, for division problems, only 31% of operation errors occurred on problems where both operands were less than 12, suggesting that even when problems had a large operand (e.g., 63 ÷ 7) participants still multiplied when they should have divided.
Among errors on division problems, the percentage of errors on associative versus dissociative story problems did not differ for either operation errors (47% vs. 53%) or calculation errors (53% vs. 47%), χ²(1, N = 932) = 2.43, p = .12. Similarly, among errors on multiplication problems, the percentage of errors did not differ for associative or dissociative story problems for operation errors (57% vs. 43%) or calculation errors (50% vs. 50%), χ²(1, N = 531) = 1.09, p = .30. These patterns of errors suggest that the association/dissociation manipulation did not influence the types of errors solvers made for either multiplication or division problems.
With respect to the foregrounding and inconsistent-operations hypotheses, Table 2 shows that participants rarely made addition and subtraction errors. Thus, there is little support for the idea that story problem content primes specific operations that would lead to specific types of errors. Instead, the majority of multiplication errors were the result of miscalculations and the majority of division errors were the result of choosing the wrong operation. In summary, there was no evidence to suggest that the association/dissociation portion of the story problem influenced error patterns.
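The three chi-square tests on error types can be approximately reconstructed from Table 2. The sketch below is an illustration rather than the authors' analysis code. It assumes that the 2 × 2 cell counts are obtained by summing the relevant Table 2 cells (all wrong operation rows versus all calculation rows) and that a Pearson chi-square without continuity correction was used. Both are assumptions, chosen because they are consistent with the reported percentages and statistics; the function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p), where p is the upper-tail probability for df = 1.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For df = 1, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Counts summed from Table 2 (assumed aggregation, see the text above).
# Rows: division problems, multiplication problems.
# Columns: operation errors, calculation errors.
by_operation = chi_square_2x2(715, 217, 65, 466)

# Division problems only. Rows: operation errors, calculation errors.
# Columns: associated, dissociated.
division_by_association = chi_square_2x2(339, 376, 116, 101)

# Multiplication problems only (interfering and non-interfering pooled).
# Rows: operation errors, calculation errors. Columns: associated, dissociated.
multiplication_by_association = chi_square_2x2(37, 28, 233, 233)

if __name__ == "__main__":
    for label, (chi2, p) in [
        ("operation type x problem type", by_operation),
        ("division: error type x association", division_by_association),
        ("multiplication: error type x association", multiplication_by_association),
    ]:
        print(f"{label}: chi2 = {chi2:.2f}, p = {p:.3g}")
```

With these assumed cell counts, the computed statistics round to the values reported in the text (564.98, 2.43, and 1.09), which suggests, but does not confirm, that the original tests were run on the same aggregation.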
General Discussion
In two studies, we examined adults’ performance on math story problems that included either associative or dissociative relations between objects and a protagonist. We tested the foregrounding hypothesis with a conceptual replication of two studies by Mattarella-Micke and Beilock (2010). We tested the inconsistent-operations hypothesis with a partial conceptual replication of three studies by Jarosz and Jaeger (2019). Additionally, in Study 1, we examined the effect of numeric interference (i.e., highly vs. less-interfering) on problem solving, whereas in Study 2 we examined the effect of numeric versus non-numeric interference on problem solving.
The results did not support either the foregrounding hypothesis (Mattarella-Micke & Beilock, 2010) or the inconsistent-operations hypothesis (Jarosz & Jaeger, 2019). The foregrounding hypothesis predicts that highly interfering numerical information that is associated with the protagonist of the story will result in increased error rates on multiplication problems, presumably causing solvers to use an incorrect operation (i.e., addition instead of multiplication). Mattarella-Micke and Beilock found this pattern of increased errors in two studies; however, in the studies reported here, we found no support for the foregrounding hypothesis. The inconsistent-operations hypothesis (Jarosz & Jaeger, 2019) predicted that associative or dissociative activities of the protagonist would interfere with multiplication or division, respectively; that is, when the operation implied by the protagonist’s activities was inconsistent with the operation required in the word problem, performance would be worse. In three studies reported in their paper, they found partial support for their hypothesis (i.e., only with division and dissociative actions). In the present research, however, we did not find evidence for the inconsistent-operations hypothesis with either multiplication or division.
In addition to the analyses of accuracy reported in the two prior studies, we looked more closely at the types of errors that participants made across story problem conditions. We found that participants rarely made the specific types of errors predicted by either the foregrounding or inconsistent-operations hypotheses, that is, using addition or subtraction instead of multiplication or division. In contrast, participants frequently made calculation errors on multiplication problems and chose the incorrect operation on division problems. Thus, the present studies showed no empirical evidence for either the foregrounding or the inconsistent-operations hypothesis, suggesting that information being associated with or dissociated from the protagonist of a story problem does not affect math word problem solving on these computationally simple problems.
There were four important differences between our studies and the two studies reported by Mattarella-Micke and Beilock (2010). First, in the present research we used story problems that were counterbalanced across the experimental conditions. In contrast, it appears that Mattarella-Micke and Beilock assigned specific story problems and numerical values to specific conditions. Thus, it is possible that specific story problems were more difficult than others in their study, resulting in the small differences they observed across conditions. Second, Mattarella-Micke and Beilock had very small sample sizes for both studies (i.e., 38 and 21, respectively) whereas we had much larger sample sizes. Our data provide more confidence that effects are not due to the presence of extreme scores in small biased samples.
The third methodological difference between the previous research and the current work was that we presented each story problem as a unit whereas Mattarella-Micke and Beilock (2010; see also Jarosz & Jaeger, 2019) presented the associative/dissociative context prior to the remainder of the problem. We chose this approach to increase the ecological validity of the problem-solving process and minimize the possibility that participants would notice that the initial foregrounding context was irrelevant to the arithmetic problem. The fourth difference between the previous work and ours was that we used comprehension questions to ensure that participants had read the whole text, whereas in Mattarella-Micke and Beilock, participants rated the clarity of the text in comparison to other texts. Using comprehension questions allowed us to determine whether participants had read the associative/dissociative component of the story problem. We expected that our revised procedures would increase the chance of finding the predicted patterns if the foregrounding hypothesis was correct, because the critical text was presented simultaneously with the rest of the problem and the comprehension questions focussed on that content. In summary, despite the minor methodological differences across studies, we are confident that our design decisions, inclusion of Bayesian statistics, and large sample provided a reasonable test of the foregrounding hypothesis.
In comparing our results to those of Jarosz and Jaeger (2019), we found some similar patterns across operations (i.e., participants made more errors on division than multiplication problems) but did not replicate their partial support for the inconsistent-operations hypothesis. Although Jarosz and Jaeger included interfering numerical information in division problems and we did not, they found no effect of that manipulation, so these design differences are presumably not the reason for the different results. As in the current work, Jarosz and Jaeger counterbalanced problems across conditions, ensuring that there was no confounding of specific story elements with associative versus dissociative activities. Their sample sizes were larger than those of Mattarella-Micke and Beilock (i.e., 66, 94, and 100 participants in three studies), although still smaller than those used in the present research. Once again, we are confident that our design decisions, inclusion of Bayesian statistics, and large sample sizes provided a reasonable test of the inconsistent-operations hypothesis.
In summary, the results of our study do not support the foregrounding or inconsistent-operations hypotheses. In fact, Bayesian analyses suggested that we have strong evidence in support of the null hypotheses for effects of association and interfering numbers. Even in the original study, Mattarella-Micke and Beilock (2010) found only a 5% increase in multiplication errors in the critical condition. In general, across the two previous papers and in the current studies the overall percentages of error were similar; participants quite accurately solved simple arithmetic story problems. Patterns of errors were consistent with studies in which multiplication and division problems were mixed, resulting in cross-operation confusion effects (e.g., Bell, Fischbein, & Greer, 1984; Bell, Swan, & Taylor, 1981; Fischbein, Deri, Nello, & Marino, 1985; Graeber, Tirosh, & Glover, 1989). However, the text manipulations (i.e., associative vs. dissociative language) did not influence problem solving for our participants.
Mattarella-Micke and Beilock (2010) and Jarosz and Jaeger (2019) also examined the relations between individual differences in working memory and performance on story problems. In some of their studies, effects of association were moderated by working memory skill. In their Study 2, Mattarella-Micke and Beilock found that the increase in errors for foregrounded, interfering problems was related to individual differences in working memory for participants with lower working memory capacity. Jarosz and Jaeger, in their Study 1, found that participants with high working memory capacity made fewer errors than participants with lower working memory capacity, but they did not find interactions between working memory and association or interference. In their Study 2, participants were informed that they might be tested for their memory on any number that appeared in the story problem. Under these instructions, participants with low working memory capacity made more errors on dissociative division problems. In their Study 3, which was identical to the first study but did not contain numeric interference, they did not find any significant effects of working memory capacity.
Taken together, the patterns found in the two previous studies suggest that individual differences in working memory capacity may sometimes influence problem-solving performance, consistent with many other studies (e.g., Hambrick & Engle, 2003; Wiley & Jarosz, 2012), but there was no clear pattern of results related to the associative/dissociative or interfering manipulations. Because patterns of working memory effects were not consistent across Mattarella-Micke and Beilock and Jarosz and Jaeger, as described above, individual differences in working memory capacity seemed unlikely to be the source of the differences across studies. Accordingly, in the present studies we chose to focus on the effects of associative/dissociative text and interfering numbers on story problem performance.
Limitations
One limitation of the present studies was that we were unable to obtain the original stimuli used by Mattarella-Micke and Beilock (2010). If we had access to the original stimuli, we could have performed a closer replication. However, we carefully followed the description of the stimuli outlined by Mattarella-Micke and Beilock to design similar scenarios. Furthermore, if the foregrounding hypothesis was correct, it should be supported across a range of similar stimuli, as opposed to a specific set of story problems. Future studies that are able to directly test the stimuli used in the original Mattarella-Micke and Beilock article may be able to provide more insights into why we failed to replicate their findings.
The online administration of this study is a second potential limitation. To ensure that participants were fully engaged in the study, strict exclusion criteria were implemented, which resulted in a relatively high exclusion rate (i.e., exclusion rates of 17.0% and 15.5% for Studies 1 and 2 vs. 5.2% and 4.8% in Studies 1 and 2 of Mattarella-Micke & Beilock). However, Jarosz and Jaeger also excluded a substantial number of participants in Study 1 (26.7%) and Study 3 (13.0%), so exclusion rates are unlikely to explain the different findings. The online administration was also a strength because it allowed for a large sample size and full counterbalancing across all conditions. There is considerable evidence that online psychology studies produce similar results for a variety of effects originally observed in lab studies (Chuah, Drasgow, & Roberts, 2006; Cipora, Soltanlou, Reips, & Nuerk, 2019; Germine et al., 2012; Hilbig, Moshagen, & Zettler, 2016; Ihme et al., 2009). Furthermore, with more educators implementing classroom websites and using online assessments, the online environment may be representative of the typical environment in which students encounter word problems.
Implications and Conclusions
In two studies, we failed to replicate patterns of textual and numerical interference in math story problems that were reported in two previous papers (Mattarella-Micke & Beilock, 2010; Jarosz & Jaeger, 2019). Bayesian statistics suggested strong support for the null hypotheses. Our findings emphasize the importance of large sample sizes, careful manipulations, and proper counterbalancing of problem features. Additionally, our examination of the specific errors made by participants revealed further information about underlying cognitive processes involved in problem solutions. Although linguistic features of math word problems undoubtedly influence problem solving in many situations (Daroczy et al., 2015; Fuchs, Fuchs, Compton, Hamlett, & Wang, 2015; Nesher, Hershkovitz, & Novotna, 2003; Thevenot & Barrouillet, 2015), our findings suggest that manipulating the associative relation between the protagonist and problem elements, specifically as irrelevant details unrelated to the situation model of the story problem, has no effect on adults’ ability to solve simple word problems. In summary, the present results suggest boundary conditions on the extent to which textual manipulations influence adults’ problem-solving performance.