The arithmetic verification task (8 × 4 = 24, true or false?) has been widely used and studied since the 1980s (e.g., Stazyk, Ashcraft, & Hamann, 1982) and continues to feature in diverse research, including neurophysiological studies (e.g., Avancini, Soltész, & Szűcs, 2015; Domahs et al., 2007; Galfano, Penolazzi, Vervaeck, Angrilli, & Umiltà, 2009; Jasinski & Coch, 2012; Núñez-Peña, Gracia-Bafalluy, & Tubau, 2011; Szűcs & Soltész, 2010), cognitive science (e.g., Desmet et al., 2012; Ghirardelli et al., 2010), and educational psychology (e.g., Rotem & Henik, 2013, 2015; van der Ven, Straatemeier, Jansen, Klinkenberg, & van der Maas, 2015). The verification task, and the arithmetic production task in which participants verbally produce an answer, have been mainstays of the cognitive arithmetic research literature for decades (Ashcraft, 1992; Zbrodoff & Logan, 2005). Verification problems may be solved by recognition (i.e., how much familiarity or “resonance” an equation activates; Zbrodoff & Logan, 1990), by plausibility-based strategies (e.g., 7 × 3 = 24 must be false because two odd numbers yield an odd product; Krueger, 1986; Krueger & Hallford, 1984; Lemaire & Reder, 1999), or by a retrieve-and-compare strategy in which the problem’s correct answer is generated and directly compared to the presented answer (e.g., Ashcraft, Fierman, & Bartolotta, 1984; Avancini, Soltész, & Szűcs, 2015; Campbell & Fugelsang, 2001; Koshmider & Ashcraft, 1991; Romero, Rickard, & Bourne, 2006).
Despite wide use of the arithmetic verification task in diverse research, an important unresolved issue concerns the normative or default strategy for simple arithmetic verification. Do participants typically produce the correct answer during verification, or do they rely on a familiarity-based recognition strategy or a plausibility strategy? When false equations can be consistently identified as false on the basis of a salient characteristic or manipulation (e.g., parity or magnitude disagreement with the correct answer; Lemaire & Reder, 1999), participants likely do use a familiarity or plausibility check to decide true or false. But when participants are not induced to exploit regularities in the experimental stimuli, what is the default strategy? In the present experiments, we addressed this question for verification of simple multiplication equations.
There is a long history to this question. With respect to multiplication verification, Campbell (1987) argued that even when false answers make recognition or plausibility judgements difficult or unreliable (e.g., when false answers are strong associative lures, such as 8 × 4 = 24), verification does not measure the same number-fact retrieval process as the production task. The difference may be explained by priming effects arising from the presented answer to be verified. Meagher and Campbell (1995; see also Campbell, 1987, 1991) measured effects of numerical primes (displayed for 200 ms) on production of multiplication facts (e.g., 4 × 8 = ?) with prime-problem interstimulus intervals (ISIs) of 0, 750, or 1500 ms. The 0-ms ISI condition approximates the simultaneous presentation of answer and operands in the standard verification task. Experiment 1 employed three kinds of numerical primes (correct, related, and unrelated) as well as a neutral prime (##). A correct prime was the correct answer to the upcoming problem (e.g., 24 for 4 × 6), a related prime was a multiple of one of the operands (e.g., 28 or 18), and an unrelated prime was a product of whole-number factors but not a multiple of either operand (e.g., 27). Relative to neutral primes, correct primes produced constant RT and accuracy benefits across ISIs, and unrelated primes produced constant RT costs. Related primes produced costs compared to unrelated primes at the 0-ms ISI only. In Experiment 2, eliminating correct-answer primes from the stimulus set eliminated all the false-prime effects except the costs of related primes at the 0-ms ISI.
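The three prime categories amount to a simple decision rule. The minimal Python sketch below is illustrative only: the function names `classify_prime` and `is_table_product` are hypothetical, and the sketch assumes that "a product of whole-number factors" means a product in the 2 to 9 multiplication table.

```python
def is_table_product(n: int, lo: int = 2, hi: int = 9) -> bool:
    """True if n = x * y for whole numbers lo <= x, y <= hi (assumed range)."""
    return any(n % x == 0 and lo <= n // x <= hi for x in range(lo, hi + 1))


def classify_prime(a: int, b: int, prime: int) -> str:
    """Classify a numerical prime relative to the multiplication problem a x b."""
    if prime == a * b:
        return "correct"
    if prime % a == 0 or prime % b == 0:
        return "related"      # a multiple of one of the operands
    if is_table_product(prime):
        return "unrelated"    # a table product, but a multiple of neither operand
    return "other"            # outside the categories described in the text


# Examples from the text for 4 x 6: 24 correct, 28 related, 18 related, 27 unrelated
for p in (24, 28, 18, 27):
    print(p, classify_prime(4, 6, p))
```

Because "related" is defined here purely by divisibility, the sketch labels any multiple of an operand as related; the primes actually used in the study may have been further restricted.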
To explain these findings, Meagher and Campbell (1995) proposed a fast-acting retrieval-priming mechanism that yields interference effects for related-false primes and facilitation when the prime is the correct answer. These multiplication retrieval-priming effects were proposed to be automatic consequences of encoding a correct or associatively related numerical prime at the time of product retrieval. If such priming effects are automatic, they would also be expected to operate in the multiplication verification task, and similar effects have indeed been observed in the product verification task (Campbell, 1987; Koshmider & Ashcraft, 1991). Priming and interference effects in multiplication production and verification do not, however, provide strong evidence that product retrieval is the default strategy for multiplication verification, because they might reflect effects of related primes on equation recognition processes (Zbrodoff & Logan, 1990).
Furthermore, other phenomena raise doubts that product retrieval (as opposed to recognition of the equation without product retrieval) occurs during multiplication verification. Campbell and Tarling (1996) alternated multiplication production trials (e.g., 9 × 6 = ?) with verification trials (4 × 9 = 36, true or false?) and analyzed error priming. Error priming is the phenomenon whereby simple-arithmetic errors produced in a running sequence of trials frequently match the answer to a problem solved earlier in the trial block (e.g., 9 × 4 is answered correctly on Trial 10, and then 9 × 6 = “thirty six” is observed several trials later; Campbell, 1991, 1994; Campbell & Clark, 1989). They found that production errors were strongly primed by previous production trials (the error-answer matching rate was about twice that expected by chance) but not by previous verification trials. Conversely, verification errors were primed by previous verification trials but not by production trials. Campbell and Tarling (1996) concluded that simple multiplication production and verification were mediated by different memory processes and suggested that product verification was mediated by a familiarity-based recognition process rather than a retrieve-and-compare process.
The Present Experiments
An alternative approach to investigating retrieval processes in product verification involves retrieval-induced forgetting effects observed in simple addition (e.g., 2 + 3 = ?) following retrieval practice of the multiplication counterparts (e.g., 2 × 3 = ?). Several experiments (e.g., Campbell, Dufour, & Chen, 2015; Campbell & Therriault, 2013; Campbell & Thompson, 2012a) have demonstrated slower RTs and increased errors for addition counterparts following multiplication practice, using a variant of the retrieval-practice paradigm developed by Anderson, Bjork, and Bjork (1994) to study retrieval-induced forgetting (RIF) with verbal materials. In the original paradigm, participants first view several categorized word lists comprising category-exemplar pairs (e.g., FRUIT-Orange, FRUIT-Banana, PROFESSION-Teacher, PROFESSION-Lawyer); half of the items from half of the categories then receive cued-retrieval practice (e.g., FRUIT-O, PROFESSION-L), and finally all cues are tested. The typical finding is that unpracticed items from practiced categories are more difficult to remember than items from unpracticed categories. RIF may reflect inhibition of associative competitors during retrieval practice or cue-based interference (see Storm & Levy, 2012, for a review). In the present context, the critical feature of RIF is that it is retrieval dependent: RIF is observed when the practice phase requires fact retrieval but not when the items are only studied (see Anderson, 2003; Storm & Levy, 2012). Multiplication-induced RIF of addition counterparts has repeatedly been shown to be retrieval dependent: it is observed only when multiplication fact retrieval is practiced (e.g., 4 × 6 = ?) and not when multiplication equations (e.g., 4 × 6 = 24) are studied, even when study practice is as effective as product-retrieval practice at facilitating performance on a subsequent multiplication retrieval test of the practiced facts (Campbell et al., 2013; Campbell & Thompson, 2012a).
Galfano, Penolazzi, Fardo, et al. (2011) found that 40 repetitions of passive practice (i.e., study) of simple multiplication equations did slow mean RT to verify related multiplication equations. In contrast, our paradigm, which provides only six practice repetitions and tests addition production rather than verification, has never produced addition RIF following study practice of multiplication facts (Campbell et al., 2013; Campbell & Thompson, 2012a; Maslany & Campbell, 2013).
The retrieval dependence of addition RIF induced by multiplication-fact retrieval provides a diagnostic test for the occurrence of answer retrieval during multiplication verification trials. If product verification entails answer retrieval, rather than only familiarity-based recognition, then subsequent RIF of addition counterparts should be as robust following verification practice as following multiplication production practice. Furthermore, we may expect multiplication-induced RIF to be more robust following true-verification practice than following false-verification practice owing to retrieval priming. According to Meagher and Campbell (1995), the presented answer should prime correct-product retrieval in true equations but interfere with correct-product retrieval when the presented answer is a related-false product.
In Experiment 1, two groups of participants received either six practice blocks of product verification trials (e.g., 4 × 6 = 28, true or false?) or six practice blocks of product production trials (e.g., 4 × 6, state the product). False equations involved categorically related answers: specifically, the correct answer if one operand is changed by ±1 (e.g., 28 is an exemplar of the factor category 4). The multiplication practice phase was followed by an addition test phase with two addition-production blocks that included the addition counterparts of practiced multiplication problems and control additions whose multiplication counterparts were not practiced. The addition problems all had sums ≤ 10 or were so-called “tie” problems (2 + 2, 3 + 3, etc.), because larger non-tie additions (e.g., 9 + 6) usually do not produce the RIF effect (Campbell & Thompson, 2012a; Campbell et al., 2013; but see Campbell & Dowd, 2012). North American adults often display weak memory strength for the large non-tie additions, making them weaker competitors for their multiplication counterparts and therefore less susceptible to RIF (see Anderson, 2003, for a discussion of the competition dependence of RIF). For the product verification practice group, the same multiplication problems appeared with true or false answers in all six practice blocks (e.g., if counterbalancing assigned 4 × 6 to the true condition, it appeared as 4 × 6 = 24 in all six verification blocks). This design feature afforded tests of possible differences between the RIF induced by practicing true and by practicing related-false verification equations.
Experiment 1
Method
Participants
Seventy-two participants were recruited at the University of Saskatchewan and received course credit or $7.50. Participants were assigned alternately to the multiplication-production practice or multiplication-verification practice condition, yielding two groups of 36. Recruitment materials stipulated English as the first language for elementary arithmetic because addition RIF is potentially sensitive to linguistic or cultural factors. The effect is robust in English speakers (Campbell & Thompson, 2012a), but Campbell et al. (2013) and Chen and Campbell (2017) did not observe RIF for small non-tie addition problems among Chinese adult participants. The present sample included 50 women and 22 men (65 right-handed, 6 left-handed, and 1 ambidextrous) with a mean age of 26.9 years (SE = 1.06).
Apparatus
Stimuli were presented using E-Prime 2.0 (Schneider, Eschman, & Zuccolotto, 2012) on an LED monitor viewed by the experimenter and a CRT monitor viewed by the participant. Black characters in 14-point Courier New font appeared against a white background. The participant sat approximately 50 cm from the CRT monitor and held a microphone that detected the participant’s voice response and activated a switch that sent the stop signal to a software clock to measure RT.
Stimuli and Design
Participants received six practice blocks of either product-verification trials (e.g., 4 × 6 = 28, true or false?) or product-production trials (e.g., 4 × 6 = ?, state the product), followed by a two-block addition production test phase (e.g., 4 + 6 = ?). The multiplication practice phase included primary problems and filler problems (described below), and the test phase included only primary problems. For both the verification and production practice groups, in both the practice and test phases, problem order was independently randomized for each block.
The primary multiplication and addition stimuli were composed from two sets of numerically small non-tie (sum ≤ 10) and tie (i.e., repeated-operand) pairs. For example, the number pair 2-5 yielded 2 × 5 and 2 + 5 (or their commuted counterparts), and the pair 4-4 yielded 4 × 4 and 4 + 4. Direct memory retrieval is the predominant strategy reported by educated adults for the small non-tie and tie multiplication and addition problems (Campbell & Alberts, 2009; Campbell & Xue, 2001; LeFevre, Bisanz, et al., 1996). Half of the participants in each of the verification and production practice groups received non-ties with the smaller operand on the left (2 × 5), and for the other half it was on the right (5 × 2). Each set of operand pairs comprised two subsets, which were used to counterbalance problems across conditions. Set 1 included the pairs 2-5, 2-8, 3-6, 4-4, and 7-7 (Subset 1) and 3-4, 3-5, 2-6, 2-2, and 9-9 (Subset 2). Set 2 included the pairs 2-3, 2-7, 4-6, 5-5, and 6-6 (Subset 1) and 2-4, 4-5, 3-7, 3-3, and 8-8 (Subset 2).
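As an illustration of how the primary stimuli follow from the operand pairs listed above, the Python sketch below encodes the four subsets and renders multiplication or addition problems with the smaller operand on the left or on the right. The names `SETS` and `make_problems` are hypothetical; this is a sketch of the stimulus structure, not the E-Prime script used in the study.

```python
# Operand pairs from the text, written (smaller, larger); ties repeat the operand.
SETS = {
    1: {1: [(2, 5), (2, 8), (3, 6), (4, 4), (7, 7)],
        2: [(3, 4), (3, 5), (2, 6), (2, 2), (9, 9)]},
    2: {1: [(2, 3), (2, 7), (4, 6), (5, 5), (6, 6)],
        2: [(2, 4), (4, 5), (3, 7), (3, 3), (8, 8)]},
}


def make_problems(pairs, op, smaller_left=True):
    """Render operand pairs as problem strings, e.g. '2 × 5' or '5 + 2'."""
    problems = []
    for a, b in pairs:
        left, right = (a, b) if smaller_left else (b, a)
        problems.append(f"{left} {op} {right}")
    return problems


all_pairs = [p for s in SETS.values() for sub in s.values() for p in sub]
# Every pair is either a tie or a small non-tie (sum <= 10).
assert all(a == b or a + b <= 10 for a, b in all_pairs)

print(len(all_pairs))                                    # 20 additions per test block
print(make_problems(SETS[1][1], "×"))                    # smaller operand on the left
print(make_problems(SETS[1][1], "+", smaller_left=False))
```

The 20 pairs across both sets match the 20 addition problems per test block described below.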
For the true/false verification-practice group, the assignment of Sets 1 and 2 to the multiplication-practiced and multiplication-unpracticed conditions, and the assignment of the problem subsets to the true-verification and false-verification practice conditions, were fully counterbalanced across participants. Each operand pair appeared consistently as a true or a false equation throughout verification practice. For each false verification trial, a related-false answer was assigned pseudorandomly by increasing or decreasing one operand by one and multiplying the result by the other operand (e.g., 4 × 8 = 24). False answers were restricted so that they did not equal one of the problem’s operands and were not a multiple of five when 5 was one of the operands.
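The related-false rule can be sketched as a small candidate-and-filter procedure. This Python sketch uses a hypothetical function name, `related_false`, and one labelled assumption: for 5 × 5 every candidate is a multiple of five, and the text does not say what happened in that case, so the sketch falls back to the candidates that merely avoid the operands.

```python
import random


def related_false(a: int, b: int, rng: random.Random) -> int:
    """Pick a related-false product for a x b by changing one operand by +/-1.

    Candidates: (a - 1) * b, (a + 1) * b, a * (b - 1), a * (b + 1).
    Restrictions from the text: the answer may not equal an operand, and it
    may not be a multiple of 5 when 5 is an operand.
    """
    candidates = {(a + d) * b for d in (-1, 1)} | {a * (b + d) for d in (-1, 1)}
    allowed = {c for c in candidates
               if c not in (a, b) and not (5 in (a, b) and c % 5 == 0)}
    # ASSUMPTION (not stated in the text): if the filters remove every
    # candidate (e.g., 5 x 5), keep the candidates that are not operands.
    pool = allowed or {c for c in candidates if c not in (a, b)}
    return rng.choice(sorted(pool))


rng = random.Random(1)
print(related_false(4, 8, rng))   # one of 24, 28, 36, 40
print(related_false(2, 5, rng))   # one of 8, 12 (5 and 15 are excluded)
```

Sorting the pool before `rng.choice` keeps the selection reproducible for a given seed, since set iteration order is not part of the specification.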
For the production-practice group, exactly the same counterbalancing procedure with the problem sets and subsets was applied. Because this group viewed a problem to be answered verbally (e.g., 2 × 5 = ?) rather than an equation to be verified (e.g., 2 × 5 = 10 or 2 × 5 = 8), the counterbalanced assignment of problem subsets to nominal true and false conditions allowed us to treat true vs. false as a properly counterbalanced factor (with respect to problem subsets) in analyses that combined the production and verification groups.
Additionally, during the practice phase both groups received 10 large (sum > 10) non-tie multiplication problems: 2 × 9, 3 × 8, 3 × 9, 4 × 7, 4 × 8, 4 × 9, 5 × 6, 5 × 7, 5 × 8, and 6 × 7. These served as filler problems to make it harder for verification participants to notice that the same small and tie problems were consistently true or false. In each block, five of the filler problems were randomly selected to be true problems and the other five were false problems.
Following the practice phase, all participants received two test blocks of 20 addition production problems made from all operand pairs in Sets 1 and 2. Addition counterparts of the multiplication-practiced pairs (e.g., 2 + 5 is the addition counterpart of 2 × 5) were the RIF targets, and the addition counterparts of multiplication-unpracticed pairs served as the control additions.
Procedure
Participants were tested individually in a half-hour session that included a warm-up task preceding the main experimental task. Instructions encouraged both speed and accuracy. In the warm-up, the participant named the eight letters “a” through “h”, which appeared individually in random order at the center of the screen. On each trial, a fixation dot appeared at the center of the screen and then flashed twice over a 1-s interval. On what would have been the third flash, the letter to be named appeared at fixation. For experimental trials, the fixation-dot display was the same as in the warm-up task, and the problem appeared with the operator (× or +) at fixation. Verification participants were instructed to respond "true" or "false" verbally on verification trials, and production participants were asked to state the correct product. In the addition production test phase, all participants were instructed to state the correct sum. Response timing began when the problem appeared and stopped when the participant's verbal response triggered the voice-activated relay. The spoken response caused the problem to disappear from the screen immediately, which allowed the experimenter to detect and record spoiled RTs where the microphone had failed to detect response onset. After the experimenter entered the given answer or pressed the Enter key, the fixation dot for the next trial appeared. There was no feedback about speed or accuracy.
Results
For ANOVA tests, Greenhouse-Geisser corrected statistics were reported when Mauchly’s test indicated violation of the sphericity assumption. Along with null hypothesis significance tests, we also reported a Bayes factor (BF) for each test, calculated using MorePower 6.0 (Campbell & Thompson, 2012b). This program implements the Bayesian Information Criterion (BIC) approximation proposed by Masson (2011; see also Wagenmakers, 2007), which implicitly assumes the unit-information prior as a default, objective Bayesian prior (Wagenmakers, 2007). The BIC formulation favours H_{0} for small effect sizes, making it conservative with respect to Type I errors (Nathoo & Masson, 2016). The reported BF is the odds ratio of the null (H_{0}) over the alternative hypothesis (H_{1}). For example, a BF of 10 for a given ANOVA test indicates that the data favor H_{0} over H_{1} by 10 to 1, whereas a value of 0.1 indicates a 10 to 1 ratio in favour of H_{1}.^{i}
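For readers who want to check a reported BF, the BIC approximation can be computed from an F ratio alone. The Python sketch below assumes the formula given by Masson (2011), BF ≈ exp(ΔBIC/2) with ΔBIC = n ln(1 − η²_p) + df_effect · ln(n), where η²_p = F·df1/(F·df1 + df2) and n is the number of independent observations. The function name `bic_bayes_factor` is hypothetical, the caller must choose n, and MorePower's internal conventions may differ in some designs.

```python
import math


def bic_bayes_factor(f: float, df1: int, df2: int, n: int) -> float:
    """Approximate BF01 (odds for H0 over H1) from an F test via the BIC.

    SSE1/SSE0 equals 1 - partial eta squared, which is recovered from F.
    n is the number of independent observations, supplied by the caller.
    """
    eta_p2 = f * df1 / (f * df1 + df2)
    delta_bic = n * math.log(1 - eta_p2) + df1 * math.log(n)
    return math.exp(delta_bic / 2)


# Block effect on RIF in the verification group: F(1, 35) = 5.45, n = 36.
print(round(bic_bayes_factor(5.45, 1, 35, 36), 2))   # 0.44
```

With n set to the 36 participants of the verification group, the sketch returns the BF of .44 reported for that group's block effect in Experiment 1; with F = 0 it returns √n, the largest support for H_{0} that a one-df test can show under this approximation.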
Multiplication Practice Phase
RT
A total of 451 practice RTs (5.2%) were marked for exclusion by the experimenter because the voice key failed to detect response onset, or were discarded as outliers more than 2.5 SD from the participant’s mean for the corresponding Block (1 to 6) × Problem Type (true practiced, false practiced) cell. The overall error rate (excluding filler problems) during multiplication practice was 3.4%. Mean RT for correct responses received a Practice task (verification vs. production) × Problem type (true vs. false) × Block (1 to 6) ANOVA with practice task as a between-participants factor and problem type and block as repeated-measures factors. True vs. false problem type was a pseudofactor for the production-practice group. The corresponding means and SEs appear in Table 1.^{ii}
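The outlier rule can be sketched as a single pass over participant × cell groups. This is a minimal Python illustration, not the authors' analysis code: the trial format (dicts with `participant`, `block`, `ptype`, and `rt` keys) and the function name `trim_outliers` are hypothetical, and the sketch assumes the sample SD of each cell.

```python
from collections import defaultdict
from statistics import mean, stdev


def trim_outliers(trials, criterion=2.5):
    """Drop RTs more than `criterion` SDs from their participant x cell mean.

    trials: list of dicts with keys 'participant', 'block', 'ptype', 'rt'.
    Returns (kept, dropped). Cells with fewer than 2 trials are kept whole,
    because a sample SD cannot be computed for them.
    """
    cells = defaultdict(list)
    for t in trials:
        cells[(t["participant"], t["block"], t["ptype"])].append(t)
    kept, dropped = [], []
    for cell in cells.values():
        rts = [t["rt"] for t in cell]
        if len(rts) < 2:
            kept.extend(cell)
            continue
        m, sd = mean(rts), stdev(rts)
        for t in cell:
            (dropped if abs(t["rt"] - m) > criterion * sd else kept).append(t)
    return kept, dropped
```

One property of this sketch worth knowing: with the sample SD, no value in a cell of n trials can lie more than (n − 1)/√n SDs from the cell mean, so a 2.5-SD criterion can only flag trials in cells containing at least nine trials.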
Table 1
| Block | Verification group: True | Verification group: False | Production group: “True” | Production group: “False” |
| --- | --- | --- | --- | --- |
| RT (ms) | | | | |
| 1 | 1081 (50) | 1229 (60) | 1077 (54) | 1054 (51) |
| 2 | 931 (35) | 1148 (42) | 954 (53) | 936 (48) |
| 3 | 896 (35) | 1146 (49) | 963 (53) | 969 (45) |
| 4 | 884 (39) | 1090 (48) | 896 (45) | 944 (47) |
| 5 | 893 (40) | 1070 (44) | 915 (45) | 912 (45) |
| 6 | 882 (38) | 1067 (49) | 881 (39) | 906 (36) |
| % Errors | | | | |
| 1 | 2.2 (1.1) | 8.3 (2.2) | 2.2 (1.1) | 4.4 (2.1) |
| 2 | 1.1 (0.8) | 6.7 (1.6) | 2.8 (1.8) | 3.3 (1.3) |
| 3 | 1.7 (0.9) | 5.6 (2.1) | 1.1 (0.8) | 3.9 (1.8) |
| 4 | 1.7 (0.9) | 6.1 (1.9) | 1.7 (0.9) | 1.7 (0.9) |
| 5 | 2.2 (1.1) | 6.1 (2.4) | 0.6 (0.6) | 2.8 (1.2) |
| 6 | 1.7 (0.9) | 7.8 (1.8) | 2.8 (1.4) | 2.8 (1.4) |
Note. Values are means with SEs in parentheses. True vs. false was a pseudofactor for the production-practice group in that the counterbalancing of problem subsets was the same as that used for the true/false factor for the verification-practice group.
Mean RT followed a decelerating speed-up function across practice blocks (means of 1110, 992, 994, 953, 947, and 943 ms), with greater RT gains from Block 1 to Block 2 than across later practice blocks [F(3.73, 261.27) = 20.86, p < .005, MSE = 38255, η^{2}_{p} = .23, BF < .0001]. There were no significant interactions involving the block factor (all ps > .23, η^{2}_{p} < .01, BFs > 75000). Overall, false problems were answered more slowly than true problems [F(1, 70) = 32.62, p < .001, MSE = 68124, η^{2}_{p} = .32, BF < .0001], but practice task interacted with problem type [F(1, 70) = 29.06, p < .001, MSE = 68124, η^{2}_{p} = .29, BF < .0001]. For the verification-practice group, mean RT for false equations (1125 ms) was slower than for true equations (928 ms), whereas for the production-practice group, for which true/false was a pseudofactor, the nominally true and false problems had practically identical mean RTs (948 ms and 954 ms, respectively).
Error rate
Table 1 includes the mean percentage of errors during the practice phase as a function of practice task (verification vs. production), problem type (true vs. false) and block (1 to 6). The corresponding ANOVA indicated a main effect of problem type [F(1, 70) = 14.16, p < .001, MSE = 151.22, η^{2}_{p} = .16, BF = .01], and there was weak evidence for the same Practice task × Problem type interaction observed in RT [F(1, 70) = 4.90, p = .030, MSE = 151.21, η^{2}_{p} = .07, BF = .74], with the verification group making more errors on false equations (6.8%) than true equations (1.8%), whereas the production group had more similar error rates for the nominally false (3.1%) and true (1.9%) problems. There were no other significant omnibus effects (all ps > .47, η^{2}_{p} < .02, BFs > 247000).
Addition Test Phase
RT
A total of 77 test-phase RTs (2.7% of trials) were marked for exclusion by the experimenter or discarded as outliers more than 2.5 SD from the participant’s mean for the corresponding Block (1, 2) × Problem Type (true practiced, false practiced, or unpracticed) cell. The overall error rate during the addition test phase was 2.1% (61 errors). Mean RT for correct responses received a Practice task (verification vs. production) × Problem type (true practiced, false practiced, or unpracticed) × Block (1 vs. 2) ANOVA with practice task as a between-participants factor and problem type and block as repeated-measures factors.^{iii}
Mean RT (see Table 2) was faster in Block 2 (M = 852 ms, SE = 21 ms) than in Block 1 (M = 921 ms, SE = 24 ms) [F(1, 70) = 38.23, p < .001, MSE = 13219, η^{2}_{p} = .35, BF < .0001].^{iv} RT differed across problem types, with means of 905, 897, and 856 ms for the true-practiced, false-practiced, and unpracticed conditions, respectively [F(2, 140) = 8.59, p < .001, MSE = 11656, η^{2}_{p} = .11 for the omnibus test, BF = .04], and there was weak evidence for a linear component of the three-way interaction [F(1, 70) = 4.55, p = .036, MSE = 8182, η^{2}_{p} = .06, BF = .88].
Table 2
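| Block | Verification-practice: Unpracticed | Verification-practice: ΔTrue | Verification-practice: ΔFalse | Production-practice: Unpracticed | Production-practice: ΔTrue | Production-practice: ΔFalse |
| --- | --- | --- | --- | --- | --- | --- |
| RT (ms) | | | | | | |
| 1 | 841 (30) | 85 (21)*** | 50 (15)** | 930 (41) | 31 (25) | 44 (33) |
| 2 | 801 (30) | 21 (20) | 3 (18) | 853 (28) | 59 (22)* | 67 (21)** |
| M | 821 (29) | 53 (14)*** | 27 (12)* | 892 (33) | 45 (18)* | 56 (22)* |
| % Errors | | | | | | |
| 1 | 2.2 (0.8) | 1.7 (0.9) | 0.6 (1.5) | 1.4 (0.6) | 3.1 (1.6) | 1.9 (1.7) |
| 2 | 1.4 (0.6) | 0.8 (0.6) | 1.9 (1.7) | 1.4 (0.7) | 1.9 (1.4) | 1.4 (1.2) |
| M | 1.8 (0.5) | 1.3 (0.5) | 1.3 (1.3) | 1.4 (0.4) | 2.5 (1.2) | 1.7 (1.2) |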
Note. Values are means with SEs in parentheses. ΔTrue and ΔFalse represent potential RIF effects for true and false problems, respectively (i.e., the difference in RT, with the unpracticed condition as subtrahend). True vs. false was a pseudofactor for the production-practice group in that the counterbalancing of problem subsets was the same as for the verification-practice group.
*p ≤ .05. **p ≤ .01. ***p ≤ .001 (df = 35).
To pursue this interaction, we computed for each participant the mean difference between the true-practiced and unpracticed (i.e., baseline) conditions, and between the false-practiced and unpracticed conditions. These two difference scores represent potential RIF effects generated by true and false practice trials, respectively. Positive differences correspond to longer RT in the practiced condition (i.e., RIF). A Block × Problem Type ANOVA was conducted for each group (i.e., verification or production practice task). The true vs. false factor is a pseudofactor for the production-practice group. The corresponding means and SEs appear in Table 2.
The analysis of the verification-group data provided weak evidence for a larger RIF effect in Block 1 (68 ms, SE = 15.3) than in Block 2 (12 ms, SE = 17.0) [F(1, 35) = 5.45, p = .025, MSE = 20047, η^{2}_{p} = .14, BF = .44], which has been a common finding in arithmetic RIF (Campbell et al., 2013; Campbell & Thompson, 2012a). There was also weak evidence for larger RIF for true-practiced problems (53 ms, SE = 13.6) than for false-practiced problems (27 ms, SE = 11.7) [F(1, 35) = 4.68, p = .037, MSE = 5359, η^{2}_{p} = .12, BF = .63]. The corresponding analysis of the production-task group indicated no significant effects of block or problem type (all ps > .38), but both groups showed evidence of RIF overall: 50.3 ms for the production group [t(35) = 3.40, p = .002, SE = 14.8, η^{2}_{p} = .25, BF = .03] and 40.0 ms for the verification group [t(35) = 3.60, p = .001, SE = 11.1, η^{2}_{p} = .27, BF = .02]. Thus, the current experiment provided strong evidence that both the multiplication production and multiplication verification tasks induced RIF of the addition counterparts, as expressed in verbal production RTs.
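The difference-score logic of these analyses can be sketched in Python. The function names are hypothetical; the overall RIF score is assumed to be the mean of the true and false difference scores (approximately consistent with the Table 2 means), and η²_p for a one-sample t test is computed as t²/(t² + df), which gives .25 for t(35) = 3.40.

```python
import math
from statistics import mean, stdev


def rif_scores(cell_means):
    """Per-participant RIF difference scores in ms (unpracticed as subtrahend).

    cell_means: {'true': rt, 'false': rt, 'unpracticed': rt} for one participant.
    Positive values mean slower RT for practiced counterparts, i.e. RIF.
    """
    base = cell_means["unpracticed"]
    d_true = cell_means["true"] - base
    d_false = cell_means["false"] - base
    # ASSUMPTION: overall RIF is the average of the two difference scores.
    return {"d_true": d_true, "d_false": d_false, "overall": (d_true + d_false) / 2}


def one_sample_t(values):
    """One-sample t test of mean(values) against 0: returns (t, df, eta_p2)."""
    n = len(values)
    t = mean(values) / (stdev(values) / math.sqrt(n))
    df = n - 1
    return t, df, t * t / (t * t + df)


# Hypothetical cell means for one participant (ms):
print(rif_scores({"true": 900, "false": 880, "unpracticed": 850}))
# {'d_true': 50, 'd_false': 30, 'overall': 40.0}
```

Testing each group's overall scores with `one_sample_t` corresponds to the t tests reported above.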
Error rate
Table 2 includes the mean percentage of test phase errors by practice task (verification or production), problem type (true practiced, false practiced, or unpracticed) and block (1 and 2). There was a total of 61 errors in the test phase, 2.1% of trials. Of the 432 Participant × Problem type × Block cells, 375 (86.8%) contained zero errors. The preponderance of cells at the measurement floor precluded detailed inferential analyses of test phase errors.
Discussion
The finding of robust addition RIF from practicing multiplication counterparts in a verification task implies that the multiplication verification problems were solved using a retrieve-and-compare strategy (e.g., Koshmider & Ashcraft, 1991; Romero et al., 2006) rather than by evaluating equation familiarity (i.e., recognition) without explicit retrieval of the correct product (e.g., Campbell & Tarling, 1996; Zbrodoff & Logan, 1990). This conclusion follows because the addition RIF effect has repeatedly been shown to be retrieval dependent and is not observed when the multiplication equations are only studied (Campbell et al., 2013; Campbell & Thompson, 2012a). The overall RIF effect on addition RT was about the same size following verification practice (40 ms) and production practice (50 ms). There was weak evidence that RIF for the verification practice group was greater for true (53 ms) than for false (27 ms) verification trials, although the RIF effect was significant at the .05 level for both types. This fits with the proposal by Campbell (1987) that true verification equations are more likely than false verification equations to prime retrieval of the problem’s correct answer. Campbell (1987) argued that related-false verification products actively interfered with retrieval of the correct answer and promoted retrieval errors. This would contribute to slower RTs and higher error rates for related-false than for true equations, and also to weaker addition RIF from related-false verification equations.
Experiment 1 provided evidence that verification of simple multiplication equations was solved by a retrieve-and-compare strategy, but could this finding be owed to the use of closely related false answers? A plausible consequence of using related-false products is that discriminating true from false equations based only on familiarity was not a viable strategy, because both trial types would produce a strong familiarity response (e.g., 2 × 8 = 14 might initially seem plausible). This could discourage the use of familiarity information to perform the verification task and promote predominant use of a retrieve-and-compare strategy. Experiment 2 examined RIF of addition fact retrieval while manipulating the relatedness of false verification answers during the multiplication practice phase. One group constituted a replication of the verification condition in Experiment 1, in which false multiplication answers were categorically related (i.e., a multiple of one of the factors and the correct answer if one operand was changed by ±1; e.g., 2 × 8 = 14). For the second group, the false answers were unrelated in that they were not a multiple of either operand (e.g., 2 × 8 = 21). Participants find it much easier to reject such unrelated-false answers than related-false answers (Campbell, 1987; Koshmider & Ashcraft, 1991). Having all false equations appear with an unrelated-false product (e.g., 2 × 8 = 21) could increase the utility of a familiarity-based recognition strategy, because true equations (e.g., 2 × 8 = 16) and unrelated-false equations (e.g., 2 × 8 = 21) may be readily discriminable on the basis of familiarity. In that case, we would expect Experiment 2 to show RIF for the related-false group, as in Experiment 1, but not for the unrelated-false group, because a recognition-based strategy that does not require explicit retrieval of the correct product should not produce RIF of addition counterparts.
Experiment 2
Method
Seventy-two participants who had not participated in Experiment 1 were recruited in the same way as in Experiment 1. The sample included 51 women and 21 men with a mean age of 22.7 years (SE = 0.60); 66 were right-handed, 5 left-handed, and 1 ambidextrous. Experiment 2 was the same as Experiment 1 except that the production group was replaced with an unrelated-false verification group. On each trial, the unrelated-false answer was selected at random from the four products obtained by adding 1 to or subtracting 1 from both operands (e.g., for 2 × 8: 1 × 7, 1 × 9, 3 × 7, or 3 × 9). If none of these products was a non-multiple of the operands, the false answer was the true answer plus or minus 1.
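The unrelated-false rule can be sketched in the same way as the related-false rule. In this Python sketch (hypothetical function name `unrelated_false`), random selection is assumed to be restricted to the candidates that are not multiples of either operand, which is our reading of the fallback clause above; operands are assumed to be at least 2.

```python
import random


def unrelated_false(a: int, b: int, rng: random.Random) -> int:
    """Pick an unrelated-false product for a x b (assumes a, b >= 2).

    Candidates are (a +/- 1) * (b +/- 1). ASSUMPTION: only candidates that
    are not multiples of either operand are eligible; if none qualify, the
    answer is the correct product +/- 1.
    """
    candidates = {(a + da) * (b + db) for da in (-1, 1) for db in (-1, 1)}
    eligible = sorted(c for c in candidates if c % a != 0 and c % b != 0)
    if eligible:
        return rng.choice(eligible)
    # For a, b >= 2, a * b +/- 1 is never a multiple of a or b.
    return a * b + rng.choice((-1, 1))


rng = random.Random(3)
print(unrelated_false(2, 8, rng))   # one of 7, 9, 21, 27
print(unrelated_false(2, 5, rng))   # 4, 6, 12, 18 are all even, so 9 or 11
```

Problems with 2 as an operand illustrate why a fallback is needed: whenever the other operand is odd, all four candidates are even and therefore multiples of 2.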
Results
Multiplication Practice Phase
RT
A total of 199 practice RTs (2.3%) were excluded as in Experiment 1. The overall error rate during multiplication practice was 4.5%. Mean RT for correct responses received a False-answer type (related-false vs. unrelated-false practice group) × Problem type (true vs. false) × Block (1 to 6) ANOVA with false-answer type as a between-participants factor and problem type and block as repeated-measures factors. The corresponding mean RTs and SEs appear in Table 3.
Table 3
| Block | Unrelated-false group: True | Unrelated-false group: False | Related-false group: True | Related-false group: False |
| --- | --- | --- | --- | --- |
| RT (ms) | | | | |
| 1 | 958 (44) | 1079 (42) | 940 (33) | 1191 (57) |
| 2 | 838 (39) | 987 (44) | 873 (36) | 1044 (44) |
| 3 | 797 (38) | 986 (57) | 822 (32) | 1042 (40) |
| 4 | 814 (34) | 945 (43) | 782 (31) | 1006 (49) |
| 5 | 820 (38) | 932 (41) | 794 (28) | 950 (37) |
| 6 | 784 (30) | 896 (40) | 783 (30) | 965 (40) |
| % Errors | | | | |
| 1 | 5.6 (1.7) | 7.8 (2.0) | 3.3 (1.3) | 7.8 (2.7) |
| 2 | 0.6 (0.6) | 5.6 (2.2) | 1.7 (0.9) | 7.2 (2.0) |
| 3 | 2.8 (1.4) | 5.0 (1.9) | 2.2 (1.1) | 8.9 (2.8) |
| 4 | 2.8 (1.2) | 3.9 (1.6) | 3.3 (1.5) | 7.8 (2.9) |
| 5 | 2.8 (1.4) | 3.3 (1.5) | 1.1 (0.8) | 5.6 (2.2) |
| 6 | 1.7 (0.9) | 5.6 (1.9) | 3.9 (1.8) | 7.2 (2.3) |
As in Experiment 1, mean RT followed a decelerating speed-up function across practice blocks, with means of 1042, 935, 912, 887, 873, and 857 ms [F(3.642, 254.955) = 36.82, p < .001, MSE = 24099, η^{2}_{p} = .35, BF < .0001]. There were no significant interactions involving the block factor (all ps > .05). False equations were answered more slowly than true equations [F(1, 70) = 195.32, p < .001, MSE = 31311, η^{2}_{p} = .74, BF < .0001], but the between-group false-answer type factor interacted with problem type [F(1, 70) = 7.28, p = .009, MSE = 31311, η^{2}_{p} = .09, BF = .24]: The two groups had similar mean RTs for true equations (835 ms and 832 ms for the unrelated-false and related-false groups, respectively), but related-false equations were answered more slowly on average (1033 ms) than unrelated-false equations (971 ms). There were no other significant omnibus effects (all ps > .06).
Error rate
Table 3 presents mean error rates during the practice phase as a function of false-answer type (related vs. unrelated), problem type (true vs. false), and block (1 to 6). The only significant effect was the main effect of problem type [F(1, 70) = 10.19, p = .002, MSE = 283.64, ηp² = .09, BF = .06; all other p-values > .06]: False equations had a higher error rate (6.3%) than true equations (2.6%).
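The marginal means cited for the practice phase are unweighted averages of the Table 3 cells. The sketch below, with cell values transcribed by hand from Table 3, recomputes them. Because the tabled cells are themselves rounded, the results agree with the reported means only to within about 1 ms and 0.1 percentage points; the column labels are illustrative names, not variables from the original analysis.

```python
from statistics import mean

# Cell means transcribed from Table 3 (Experiment 2, multiplication practice phase).
# Each row is one block (1..6); columns follow COLUMNS below.
COLUMNS = ("unrelated-true", "unrelated-false", "related-true", "related-false")
RT_MS = [
    (958, 1079, 940, 1191),
    (838, 987, 873, 1044),
    (797, 986, 822, 1042),
    (814, 945, 782, 1006),
    (820, 932, 794, 950),
    (784, 896, 783, 965),
]
ERRORS_PCT = [
    (5.6, 7.8, 3.3, 7.8),
    (0.6, 5.6, 1.7, 7.2),
    (2.8, 5.0, 2.2, 8.9),
    (2.8, 3.9, 3.3, 7.8),
    (2.8, 3.3, 1.1, 5.6),
    (1.7, 5.6, 3.9, 7.2),
]


def column_means(rows):
    """Unweighted mean of each column across the six blocks."""
    return {name: mean(row[i] for row in rows) for i, name in enumerate(COLUMNS)}


# Block means: each block averaged over both groups and both problem types.
block_rt = [mean(row) for row in RT_MS]

# Group x problem-type RT means, averaged over the six blocks.
cell_rt = column_means(RT_MS)

# Error rates by problem type, averaged over groups and blocks.
err = column_means(ERRORS_PCT)
true_err = mean([err["unrelated-true"], err["related-true"]])
false_err = mean([err["unrelated-false"], err["related-false"]])

print([round(m) for m in block_rt])
print({name: round(m) for name, m in cell_rt.items()})
print(f"true {true_err:.2f}%, false {false_err:.2f}%")
```

The largest discrepancy is Block 5 (874 ms recomputed vs. 873 ms reported), and the error rates come out at roughly 2.65% for true and 6.31% for false equations.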
Addition Test Phase
RT
A total of 77 test RTs (2.7%) were discarded as outliers, as in Experiment 1. The error rate was 3.3% of trials. Mean RT for correct trials was analyzed as in Experiment 1, and the means and SEs appear in Table 4. We conducted a 2 (False-answer type group: related or unrelated during practice) × 2 (Block: 1 or 2) × 3 (Problem type: true practiced, false practiced, or unpracticed) ANOVA with false-answer type as a between-participants factor and block and problem type as repeated-measures factors. The ANOVA indicated that mean RT was faster in Block 2 (774 ms) than in Block 1 (845 ms) [F(1, 70) = 53.84, p < .001, MSE = 9996, ηp² = .44, BF < .0001]. There was weak evidence that mean RT differed across problem types, with means of 822 ms, 816 ms, and 790 ms for the true-practiced, false-practiced, and unpracticed conditions, respectively [F(2, 140) = 4.185, p = .017, MSE = 10295, ηp² = .06; but with BF = 2.20, the Bayesian analysis slightly favored H₀]. This effect was qualified by the linear component of the three-way interaction [F(1, 70) = 9.11, p = .004, MSE = 4710, ηp² = .12, BF = .10].
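Table 4 reports the practiced conditions as differences from the unpracticed baseline, so the condition means entering this ANOVA can be reconstructed by adding each Δ back to its baseline. The sketch below does this under one stated assumption: the unrelated-false group's Block 2 ΔTrue is entered as −9 ms, the sign required for that column's tabled mean of 26 ms. The dictionary layout and function name are illustrative.

```python
from statistics import mean

# Table 4 RT cells (ms): (unpracticed, delta_true, delta_false), keyed by (group, block).
# Assumption: unrelated-false Block 2 delta_true is -9 ms (sign implied by the M row, 26 ms).
TABLE4_RT = {
    ("unrelated", 1): (838, 62, 70),
    ("unrelated", 2): (795, -9, 9),
    ("related", 1): (794, 25, 16),
    ("related", 2): (732, 51, 13),
}


def condition_means(cell):
    """Return (true-practiced, false-practiced, unpracticed) addition RTs for one cell."""
    unpracticed, d_true, d_false = cell
    return unpracticed + d_true, unpracticed + d_false, unpracticed


cells = {key: condition_means(value) for key, value in TABLE4_RT.items()}

# Block means: averaged over both groups and all three problem types.
block_means = {
    block: mean(rt for (_, b), conds in cells.items() if b == block for rt in conds)
    for block in (1, 2)
}

# Problem-type means: averaged over both groups and both blocks.
labels = ("true practiced", "false practiced", "unpracticed")
problem_means = {
    label: mean(conds[i] for conds in cells.values())
    for i, label in enumerate(labels)
}

print({b: round(m) for b, m in block_means.items()})
print(problem_means)
```

This yields roughly 845 and 774 ms for Blocks 1 and 2, and 822, 816.75, and 789.75 ms for the true-practiced, false-practiced, and unpracticed conditions.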
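Table 4
Mean Addition Test-Phase RT (ms) and Percentage of Errors (SE in Parentheses) by False-Answer Group, Problem Type, and Block in Experiment 2

| Block | Unrelated-False Group: Unpracticed | Unrelated-False Group: ΔTrue | Unrelated-False Group: ΔFalse | Related-False Group: Unpracticed | Related-False Group: ΔTrue | Related-False Group: ΔFalse |
|---|---|---|---|---|---|---|
| RT (ms) | | | | | | |
| 1 | 838 (36) | 62 (23)* | 70 (25)** | 794 (27) | 25 (22) | 16 (17) |
| 2 | 795 (28) | −9 (16) | 9 (22) | 732 (25) | 51 (16)** | 13 (14) |
| M | 816 (31) | 26 (16) | 40 (19)* | 763 (25) | 38 (16)* | 14 (11) |
| % Errors | | | | | | |
| 1 | 4.7 (1.4) | 3.1 (2.5) | 0.8 (1.5) | 1.1 (0.5) | < 0.01 (0.8) | 4.4 (2.0) |
| 2 | 4.4 (1.2) | 0.6 (1.4) | 2.2 (1.3) | 1.4 (0.7) | 0.8 (1.1) | 0.3 (1.1) |
| M | 4.6 (1.2) | 1.3 (1.4) | 0.7 (0.9) | 1.3 (0.5) | 0.4 (0.7) | 2.4 (1.3) |

Note. ΔTrue and ΔFalse represent potential RIF effects for true-practiced and false-practiced problems, respectively (i.e., the practiced-condition mean minus the unpracticed-condition mean).
*p ≤ .05. **p ≤ .01 (df = 35).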
As in Experiment 1, to pursue the three-way interaction, a Block × Problem Type ANOVA was conducted for each group (i.e., the unrelated-false and related-false verification groups) on RT estimates of RIF for each cell (i.e., mean addition RT for true-multiplication practiced minus unpracticed problems, and mean addition RT for false-multiplication practiced minus unpracticed problems). For the unrelated-false group, RIF was larger in Block 1 (66.1 ms) [t(35) = 3.54, p = .001, SE = 18.7, ηp² = .26, BF = .02] than in Block 2 (.01 ms) [t(35) = .001, p = .999, SE = 13.6, ηp² < .001, BF = 6.00], and the main effect of block was significant [F(1, 35) = 11.76, p = .002, MSE = 13361, ηp² = .25, BF = .03]. There was no main effect of problem type and no Block × Problem Type interaction (both p > .5, BF > 5.0). Thus, there was no evidence that addition RIF differed as a function of problem type (i.e., true equations vs. unrelated-false equations).
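Each RIF estimate is a difference score tested against zero, so its t statistic is the mean difference divided by its standard error, and for such a one-sample test t² equals F(1, df), giving ηp² = t² / (t² + df). The sketch below recomputes the unrelated-false group's Block 1 values from the reported summary statistics; the function names are illustrative, and small discrepancies reflect rounding of the reported mean and SE.

```python
def one_sample_t(mean_diff: float, se: float) -> float:
    """t for H0: mean RIF = 0, given a mean difference score and its standard error."""
    return mean_diff / se


def partial_eta_squared_from_t(t: float, df: int) -> float:
    """For a one-sample t, t**2 equals F(1, df), so eta_p^2 = t^2 / (t^2 + df)."""
    t_squared = t * t
    return t_squared / (t_squared + df)


# Unrelated-false group, Block 1: RIF averaged over true and false practice (Table 4 cells).
rif_block1 = (62 + 70) / 2                          # 66.0 ms from rounded cells; text reports 66.1 ms
t_block1 = one_sample_t(66.1, 18.7)                 # reported t(35) = 3.54
eta_block1 = partial_eta_squared_from_t(3.54, 35)   # reported eta_p^2 = .26

print(f"RIF = {rif_block1:.1f} ms, t(35) = {t_block1:.2f}, eta_p^2 = {eta_block1:.2f}")
```

The same conversion reproduces the other reported values, for example ηp² = .13 for t(35) = 2.33 and ηp² = .12 for t(35) = 2.22.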
The corresponding analysis of the related-false group indicated no significant effects, although the main effect of problem type (true practiced vs. false practiced) was in the same direction as observed in Experiment 1 [F(1, 35) = 3.13, p = .09, MSE = 6400, ηp² = .08, BF = 1.28], with RIF of 38 ms for true-practiced problems [t(35) = 2.33, p = .025, SE = 16.4, ηp² = .13, BF = .45] compared to 14 ms for related-false practiced problems [t(35) = 1.27, p = .213, SE = 11.4, ηp² = .04, BF = 2.67]. There was weak evidence of greater RIF following true than following related-false multiplication verification for this group in Block 2 [51 ms for true vs. 13 ms for false; t(35) = 2.22, p = .033, SE = 17.3, ηp² = .12, BF = .59].
Error rate
Table 4 includes the mean percentage of test-phase errors by false-answer type, practice type (true practiced, false practiced, or unpracticed), and block (1 and 2) in Experiment 2. Of the 432 Participant × Practice Type × Block cells, 353 (81.7%) contained a value of 0, so we did not pursue inferential analyses of test-phase errors.
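The cell count follows from the design. Assuming 36 participants per false-answer group, as implied by the df = 35 of the within-group t tests and the df = 70 of the between-groups tests, the arithmetic is:

```latex
\underbrace{72}_{2 \times 36\ \text{participants}}
\times \underbrace{3}_{\text{practice types}}
\times \underbrace{2}_{\text{blocks}} = 432\ \text{cells},
\qquad \frac{353}{432} \approx .817 = 81.7\%
```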
Discussion
During the multiplication-practice phase, unrelated-false equations were answered faster on average than related-false equations (971 ms vs. 1033 ms), whereas the two groups had similar mean RTs for true equations (835 ms vs. 832 ms). Thus, as expected, unrelated-false equations were identified as false relatively faster than related-false equations. Nonetheless, in the test phase, practice with unrelated-false multiplication equations produced robust RIF of the addition counterparts in Block 1 (70 ms), suggesting that the correct products were retrieved for the unrelated-false multiplication problems during the practice phase. The related-false replication group produced significant RIF only for true equations, but the evidence that addition RIF from practice of true multiplication equations was greater than that from practice of related-false equations was weak and was observed only in test Block 2. When the two related-false verification groups from Experiments 1 and 2 were combined, addition RIF averaged across blocks was approximately twice as large following practice of true product-verification equations (46 ms, SE = 10.6) as following practice of related-false equations (21 ms, SE = 8.1) [F(1, 70) = 7.76, p = .007, MSE = 5799, ηp² = .10, BF = .20]. Thus, the combined experiments provided positive evidence that verification of true multiplication equations induced stronger RIF in addition counterparts than did practice of related-false equations.
General Discussion
Arithmetic verification is a widely used experimental task, but what type of cognitive skill does it measure? Two experiments were designed to use RIF of addition fact retrieval as a diagnostic tool to assess whether multiplication product retrieval or a familiarity-based recognition strategy was used to solve true/false multiplication verification equations. Selection between these two strategies has been assumed to depend on the familiarity or plausibility of the false answers used. We proposed that, for multiplication verification, addition RIF from multiplication practice indicates answer retrieval rather than only a familiarity/plausibility check, because addition RIF has repeatedly been demonstrated to be retrieval dependent (e.g., Campbell & Thompson, 2012a; Campbell et al., 2013). Merely studying correct multiplication equations (e.g., 2 × 3 = 6) does not produce a subsequent slowing of correct-answer retrieval for the addition counterparts (2 + 3 = ?), even when study practice is as effective as retrieval practice in facilitating subsequent product retrieval of the studied multiplication problems (Campbell & Thompson, 2012a). Accordingly, we would expect to observe addition RIF following practice of multiplication verification only if correct product retrieval occurred during verification performance. Thus, the addition RIF effect provides indirect evidence that multiplication answer retrieval occurred during the practice phase.
Experiment 1 used related-false verification products (e.g., 6 × 4 = 28), which are products categorically related to one of the problem factors (i.e., operands). Such equations would be expected to be relatively difficult to identify as false on the basis of familiarity alone and therefore likely to promote a retrieve-and-compare strategy. Verification practice of the multiplication counterparts produced strong addition RIF that did not differ statistically in effect size from the RIF produced by answer-production multiplication practice. Nonetheless, there was evidence in Experiment 1 that true-verification practice produced larger RIF of addition counterparts than did false-verification practice. The related-false replication group in Experiment 2 provided weaker evidence than Experiment 1 that true multiplication verification yielded stronger addition RIF than related-false verification, but the effect was present for this group in Block 2 (51 ms for true vs. 13 ms for false), and the evidence for the effect was positive when the two related-false verification groups from Experiments 1 and 2 were combined, with RIF averaged across blocks approximately twice as large for true as for related-false equations (46 ms vs. 21 ms). Campbell (1987) proposed that true verification equations primed retrieval of the correct product, whereas related-false equations interfered with retrieval of the correct product and often induced retrieval errors (see also Meagher & Campbell, 1995; Romero et al., 2006, p. 106). Because addition RIF is retrieval dependent, it follows that correct-answer priming and related-answer interference could both contribute to stronger addition RIF from practice of true than of related-false multiplication verification equations.
Experiment 2 introduced an unrelated-false multiplication practice condition. Although unrelated-false verification was easier (i.e., faster) than related-false verification, which might have induced a familiarity-based recognition strategy that would not produce addition RIF, practice of unrelated-false multiplication equations nonetheless produced addition RIF equal in magnitude to that produced by true-verification equations. This is consistent with correct-product retrieval mediating verification of both true and unrelated-false equations. Perhaps correct-answer priming on true trials promotes retrieve-and-compare more generally, at least when false answers are somewhat plausible. The similar RIF effect sizes for true and unrelated-false multiplication equations suggest that retrieve-and-compare is the default strategy under these experimental conditions. Finding addition RIF from verification practice of true, related-false, and unrelated-false trials does not imply that a retrieve-and-compare strategy was used exclusively for multiplication verification in the present studies. Indeed, Romero et al. (2006, Experiment 1) found that North American university students reported using retrieve-and-compare on 72% of multiplication verification equations involving ties (e.g., 6 × 6 = 36) and small non-ties (sum ≤ 10). Consistent with this, our results imply that an answer-retrieval-based strategy was used on a sufficiently large proportion of true, related-false, and unrelated-false multiplication verification trials to induce a robust RIF effect on the addition counterpart problems.
Reconciling the RIF and Error-Priming Evidence
RIF and error priming in arithmetic both reflect associative competition in retrieval, but the two phenomena may arise through different mechanisms. As explained previously, Campbell and Tarling (1996) found, in blocks of alternating production and verification trials, that error priming was task specific (i.e., correct production trials such as 8 × 3 = “twenty four” primed future production errors such as 8 × 4 = “twenty four” but did not prime future verification errors, and vice versa). They concluded that multiplication production and verification were mediated by different memory processes and suggested a familiarity-based rather than a retrieval-based model of arithmetic verification. In contrast to this same-task dependency for error priming, the present results provided good evidence that both multiplication production and verification involved retrieval of the correct product, as evidenced by RIF of the addition counterparts. This is not the only dissociation observed between error priming and RIF in arithmetic. Campbell (1994) examined error priming in both simple addition and multiplication in separate blocks, with the problem format (Arabic digits or English number words) alternating across trials. Both multiplication and addition showed format-specific error priming but no cross-format error priming (i.e., Arabic-format problems primed errors on later Arabic-format problems but not on word-format problems, and vice versa). Nonetheless, Campbell and Thompson (2012a) found that practicing multiplication facts in visual number-word format (e.g., three × four) induced RIF, measured in RT, for addition counterparts tested in digit format (e.g., 3 + 4).
Thus, although error priming and RIF in number-fact retrieval are both indicators of retrieval competition, they apparently arise from distinct mechanisms. Error priming is sensitive to the conditions of problem encoding (verification equation vs. production problem; Arabic digit vs. number-word format), whereas arithmetic RIF is less sensitive to these factors. Error priming has been shown to be stronger when the priming problem and the error-primed problem have the same operand in the same left or right position. For example, solving 6 × 4 is more likely to prime its answer (24) as an error to 8 × 4 than to 4 × 8 (Arbuthnott & Campbell, 1996). Error priming therefore may reflect interference resulting from reinstantiation of a recent retrieval pathway when the current problem shares encoding surface features with a previous retrieval episode (e.g., a common operand, operand surface format, and operand spatial position). As noted previously, study of multiplication facts (i.e., viewing and reading aloud 3 × 4 = 12) does not induce RIF in addition counterparts even when the study phase produces robust facilitation in a subsequent product production task (3 × 4 = ?; Campbell & Thompson, 2012a). This indicates that priming of problem surface features and increasing the memory strength of the studied problem are insufficient to produce RIF. Instead, addition RIF effects may reflect inhibition of retrieval competitors during multiplication retrieval practice, an inhibitory mechanism of interference that operates when the target is successfully retrieved, regardless of the similarity of practice and test problem formats (see Storm & Levy, 2012, for a review of the status of an inhibition theory of RIF).
Conclusion
The present experiments provided strong evidence that multiplication verification produced RIF in addition counterparts, expressed as slower addition RTs. Multiplication-induced RIF of addition counterparts has repeatedly been shown not to occur when multiplication equations are studied and no answer is generated, as would normally be the case if multiplication verification were solved by a familiarity-based recognition strategy that discriminates true from false equations. Consequently, the present results are strong evidence that product retrieval occurred in multiplication verification, even for false equations with weakly associated presented answers (i.e., the unrelated-false products used in Experiment 2). We propose that, except under conditions in which participants are induced to use logical criteria (e.g., odd/even agreement of the presented and correct answers) or in which answers are very implausible (e.g., remote in numerical magnitude from the correct answer), retrieval of the product is normally a routine stage of the verification process (Ashcraft et al., 1984; Koshmider & Ashcraft, 1991), at least for the small and tie simple multiplication problems that induce RIF in addition counterparts (Campbell & Thompson, 2012a). The results also suggest that verification practice can be a useful task for facilitating multiplication production learning, given that answer retrieval from memory is routinely involved in verification practice.