Distributed Practice and Time Pressure Interact to Affect Learning and Retention of Arithmetic Facts

Arithmetic is commonly taught through timed practice and drill, yet little research exists to guide optimal practice structure. This study investigated the effects of distributed practice and time pressure on the acquisition and retention of arithmetic facts. Following a pretest, adult participants (n = 211) were randomly assigned to learn unfamiliar times tables (17 and 19) in one of ten conditions in a 5 (spacing: daily, every other day, weekly, every 10 days, every other week) x 2 (time pressure: timed or untimed) factorial design. After the learning phase, retention tests were given to measure both accuracy and response time immediately, after a ten-day delay, and at the end of semester. Time pressure during learning elevated participants’ perceived stress. It also led to faster response times during testing when learning was spaced daily and every other day, but slower response times for all other spacings. These patterns were reversed in the absence of time pressure during learning. While timed and untimed practice during learning led to similar forgetting of practiced facts over time, untimed practice allowed participants to gradually improve on unpracticed facts and conceptually related facts across test phases. Ultimately, distributed practice and time pressure may interact in complex ways to affect the learning and retention of arithmetic facts, and the effects shown in previous studies using verbal material (e.g., narrative texts, word lists) may not generalize to arithmetic.

Research on distributed practice has also been conducted in practical settings (e.g., Goossens et al., 2016), as investigating the spacing effect in educational contexts can lead to valuable recommendations for improving the quality of learning. Some previous recommendations for incorporating distributed practice in the classroom include splitting up lessons, revisiting information, tailoring technology, and administering cumulative assessments (Carpenter & Agarwal, 2020). Additionally, retention tests can serve as distributed practice sessions that enhance long-term retention (Roediger & Karpicke, 2006). In fact, several academic curricula, including Saxon mathematics, now incorporate distributed practice by distributing and integrating concepts to provide multiple learning opportunities (Bullock et al., 2015).
To provide more precise recommendations about optimal spacing, it is necessary to directly examine the timing of memory consolidation and retroactive interference as well as how this fits with the storage of memories in the hippocampus and the transfer of knowledge to the cortex (Squire et al., 2015). Some evidence suggests that hippocampal memory loss may occur approximately one week after learning new material (Fisher & Radvansky, 2018), although this is not observed for all types of material, such as well-integrated narratives (Fisher & Radvansky, 2019). According to Radvansky et al. (2022), memory retention passes through several phases, including working memory (WM) for time periods less than one minute, early Long-Term Memory (e-LTM) for time periods from one minute to 12 hours, transitional Long-Term Memory (t-LTM) for time periods from 12 hours to around seven days, and Long-Lasting Memory (LLM) for periods longer than around one week. However, the ability to observe the consequences of these transitions can vary depending on the complexity of the material, how well it was learned, and its similarities to previously learned material.
The concept of multiple memory phases, regardless of the exact number of phases, is consistent with the two-factor model, which integrates encoding variability and study-phase retrieval mechanisms into a single model (Toppino & Bloom, 2002). Encoding variability suggests that each new encounter with information is encoded differently, leading to supplementary retrieval cues and additional memory traces. Study-phase retrieval suggests that the first encounter must be retrieved from long-term memory during the second encounter to enhance the memory trace. This theory explains distributed practice intervals and predicts an inverted-U-shaped relationship between learning opportunities and free recall (Verkoeijen et al., 2008). As the time between sessions increases, so does free recall, up to a certain point, after which performance begins to decrease. Therefore, study-phase retrieval suggests that successive encounters should not be distributed more than three and a half weeks apart to maintain the benefits of distributed practice (Verkoeijen et al., 2008).
When planning for optimal spacing between learning sessions, it is important to consider the non-linear relationship between the length of time between learning sessions and the length of time between the last learning session and the test (Cepeda et al., 2008). Forgetting may vary based on this ratio, suggesting that the length of the optimal spacing interval is not fixed. This differs from the suggestion by Fisher and Radvansky (2018) that hippocampal memory loss is always observed one week after learning new material, particularly for simple facts presented once. However, it is consistent with more recent suggestions by Radvansky (2019, 2022) and Radvansky et al. (2022) that the consequences of hippocampal memory loss may not always be observable one week after first learning complex material presented multiple times.
It is also consistent with the Multiscale Context Model (MCM) of long-term memory, which predicts how a study schedule influences retention and accounts for declarative memory (Mozer et al., 2009). While the MCM and two-factor model both aim to improve students' math proficiency through optimal scheduling, the models differ in their considerations: specifically, the two-factor model considers the inter-study interval while the MCM considers the retention interval and can treat it as a random variable when the actual retention interval is unknown. Thus, by integrating both models or determining which is best suited to learning arithmetic facts, educators can better tailor their teaching strategies to improve students' mathematics proficiency.
It is important to note that both the MCM and two-factor model are based on verbal learning and may not fully apply to the mechanisms involved in learning arithmetic facts. For instance, a meta-analysis of distributed practice showed an inverse relationship between the size of the distributed practice effect and the task's conceptual difficulty (Donovan & Radosevich, 1999). Thus, distributed practice benefits for arithmetic problems may be attenuated, as they are more conceptually demanding than verbatim verbal recall (Rohrer & Taylor, 2006). It would be useful to provide guidelines regarding the boundaries between when the timing between learning sessions should be extended and when it has been extended too much, especially with regard to arithmetic practice. Therefore, the present study seeks to reconcile the time course for the optimal learning and retention of arithmetic facts.

Time Pressure
Learning arithmetic facts is a gradual process, and it is important not to prioritize speed over accuracy and comprehension. The What Works Practice Guide for Assisting Students Struggling with Mathematics (Fuchs et al., 2021) recommends timed practice as a beneficial approach based on evidence from 27 studies. It suggests that timed activities should be embedded in mathematics interventions once students have been introduced to the given arithmetic concept. Educators may use timed practice to improve arithmetic fluency (Knowles, 2010), including traditional timed tests like "mad minute" worksheets. However, we still have limited knowledge about how time pressure affects long-term retention of arithmetic facts.
Although the effects of time pressure on mathematical cognition are not fully understood, some students may feel stressed when taking timed arithmetic tests (Engle, 2002). It is uncertain if this induced stress hinders or improves learning and retention. Additionally, it is worth considering how students perceive time pressure during arithmetic practice. If it is perceived as a challenge or goal to achieve, time pressure can act as a typical challenge stressor and provide an opportunity for growth, which may enhance performance (Widmer et al., 2012).
The question of whether time pressure induces stress is complex. Daily timed practice drills result in higher posttest scores and more improvements between pretest and posttest scores compared to once-a-week timed practice drills or no practice at all (Knowles, 2010). Although this suggests that daily practice may be more effective than weekly practice, the study did not control for the total number of practices or compare timed and untimed practice, highlighting the need for further investigation.
One study compared timed versus untimed practice as part of a larger tutoring intervention for children identified as at-risk for math learning difficulties. The results showed that while timed practice improved the learning of arithmetic and two-digit calculations, it did not enhance number knowledge and word problems (Fuchs et al., 2013), indicating that the effects of time pressure may not be universal for all math tasks, especially those requiring higher levels of mathematical reasoning. However, the study also suggested that timed practice may be beneficial for basic arithmetic facts such as single-digit addition, but these positive effects were shown only on an immediate posttest. A follow-up study found that although the larger tutoring intervention had overall positive effects over a long time period, the differences between timed and untimed practice faded (Bailey et al., 2018), suggesting that time pressure may benefit short-term learning, but it may not persist.
Another concern about time pressure is that it may induce math anxiety (Boaler, 2014), which has the potential to negatively affect performance by depleting working memory, a vital component of mathematical cognition (Ashcraft & Krause, 2007). However, the interaction between emotion and cognition is complex. Optimal cognitive performance occurs with intermediate emotional arousal, and the Yerkes-Dodson Law details how too much or too little arousal negatively affects performance (Yerkes & Dodson, 1908). Similarly, children intrinsically motivated in math show an inverted-U curvilinear relationship between math anxiety and math performance (Wang et al., 2015), suggesting that moderate levels of anxiety can enhance performance in intrinsically motivated students.
Thus, there are two competing predictions. Timed practice may impair the learning of arithmetic facts because students are forced to learn new material in a stressful environment (Boaler, 2014). Alternatively, timed practice may enhance learning by increasing arousal to optimal levels or by compensating for weaker problem-solving strategies (Fuchs et al., 2013). The present study seeks to inform the debate on the effects of time pressure on the learning and retention of arithmetic facts.

The Current Study

The Interaction Between Distributed Practice and Time Pressure
Although distributed practice and time pressure are well-studied factors that are known to affect learning and memory, little research has investigated how these factors interact during math learning. Mathematics achievement depends on various factors, including domain-specific concepts, skills, and strategies such as number sense and arithmetic decomposition (Busch et al., 2015). Given that multiple factors impact learning in the classroom, it is important to examine their interaction to gain a better understanding of basic learning processes and to provide more realistic and practical guidance for educators.
Our experiment tested competing theories of optimal distribution and the effectiveness of timed versus untimed practice. While numerous studies have focused on identifying the optimal learning conditions for retaining information in the verbal domain, relatively few studies have explored the conditions that promote mathematical cognition. It is possible that the relationship between inter-stimulus and retention intervals may vary depending on the type of memory formed during encoding. Furthermore, few distributed practice studies have investigated time pressure, so analyzing their interaction could offer practical insights into the contexts that support arithmetic learning and retention.

A Priori Hypotheses
We had two competing hypotheses in relation to distributed practice: the exact time distribution hypothesis and the variational distribution hypothesis. With the exact time distribution hypothesis, practice within a week (daily, every other day, weekly) should be superior to the other distribution conditions for recall (every 10 days, every other week) in all post-learning tests. This hypothesis is based on the idea that the 17- and 19-times tables, and arithmetic in general, are not used daily, making them weaker knowledge, and weaker memories fade if not refreshed prior to being lost in the hippocampus. This fading from the hippocampus is predicted to occur at around one week for simple facts presented once (Fisher & Radvansky, 2018), though this time frame may differ for complex materials presented multiple times.
Alternatively, according to the variational distribution hypothesis, daily practice may be more effective than other conditions (every other day, weekly, every 10 days, every other week) for immediate test recall, while every-other-day practice may be superior to other conditions for ten-day delayed test recall. This theory also predicts that the ratio between retention interval and inter-study interval should influence the end-of-semester performance, but the ideal ratio is still unknown. Although longer retention intervals often require longer interstudy intervals, the distribution of time between sessions does not follow a fixed proportion of time between learning and test (Cepeda et al., 2008). Rather, the proportion relating the retention interval and the inter-study interval is variable.
For time pressure, we expected practice with time pressure to be superior to practice without, but we also expected this effect to fade (e.g., its effect should be most noticeable in the immediate test, less noticeable in the ten-day delayed test, and least noticeable in the end-of-semester test). Time pressure would help make memories stronger in the moment, reducing reliance on hippocampal storage. The rationale was that stronger memory may reduce the urgency of including more practice prior to when memories of less frequently used knowledge like arithmetic may fade from the hippocampus at around one week (Squire et al., 2015). Based on math anxiety and math performance's inverted-U curvilinear relationship, if time pressure induces a moderate level of anxiety, it should encourage deeper encoding and automaticity. According to this view, time pressure should enhance arithmetic learning and retention. Based on the effects of timed arithmetic practice seen in Fuchs et al. (2013) and Bailey et al. (2018), we expected mostly short-term effects visible in the immediate test.
Based on our synthesis of the literature, we predicted that the exact time distribution hypothesis would be supported. If so, we further predicted that the effect of condition (within a week vs. after a week) would be reduced with time pressure. Time pressure may create a sense of urgency during learning that may compensate for the longer gap between sessions for participants in the every-10-days and every-other-week conditions. Time pressure may also be less effective for shorter distribution conditions because participants may already have a similar sense of urgency simply by knowing that their next learning session is approaching. Therefore, we reasoned that time pressure may have a larger effect in the longer conditions than in the shorter conditions.

Method

Participants
We powered the study to detect a medium effect size because Davoli and colleagues (2020) found a medium effect of different study conditions in adults learning unfamiliar times tables, and a medium effect is practically meaningful in education. Thus, two hundred and fifty adults were recruited to detect a medium effect size at power = .80 while accounting for an anticipated 20% attrition due to the study's longitudinal nature. The actual attrition was 13.6% for 1-day testers, 11% for 2-day testers, 26.8% for 7-day testers, 24.4% for 10-day testers, and 35% for 14-day testers. The final sample for the main analysis included 211 adults (ages 18-40; M_age = 25.2, SD_age = 5.9; 36.5% women and 63.5% men). Descriptive data of education, math anxiety, socioeconomic status, ethnicity, occupation, gender, math interest, age, and current residence are presented in the Appendix. Adults were recruited for the study because, although multiplication is a third-grade standard, second graders' arithmetic proficiency prior to learning multiplication is highly variable (De Smedt et al., 2009). As the first study of this issue, working with adults was also easier than working with children who may or may not have learned the basic concept of multiplication (Imbo et al., 2011), a baseline variability that could complicate analysis.
A total of 39 participants were initially assessed but excluded from the analysis. Participants were excluded for the following reasons: (a) if they did not participate in the three learning sessions and take all four tests (n = 32), (b) if they reported experiencing internet connection issues (n = 1), (c) if they reported using a calculator (n = 1), (d) if they spent less than one second on a problem for multiple problems (n = 4), or (e) if they answered the embedded attention check problems incorrectly (n = 1). The study was conducted through Prolific, an online platform for reliable data collection from a global cohort of participants used in prior psychology studies (e.g., Callan et al., 2017). Participants were screened for at least ten prior submissions and a 95% or above approval rating before they could participate for $10 an hour. There were no demographic constraints. A $2 bonus was awarded for completing all study parts. Participants were given adequate information and filled out an informed consent form. This research was approved by the University of Notre Dame's Institutional Review Board. Data is publicly available (see Supplementary Materials).
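The participant counts reported above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers stated in this section; the variable names are illustrative, and the "expected completers" figure simply applies the anticipated 20% attrition to the recruited sample (the source does not state a separate target sample size).

```python
# Counts taken from the Participants section; names are illustrative only.
recruited = 250
anticipated_attrition = 0.20

exclusions = {
    "missed a learning session or test": 32,
    "reported internet connection issues": 1,
    "reported using a calculator": 1,
    "under one second on multiple problems": 4,
    "failed embedded attention checks": 1,
}

excluded = sum(exclusions.values())          # total excluded participants
final_n = recruited - excluded               # sample used in the main analysis
overall_loss = excluded / recruited          # overall proportion lost
expected_completers = recruited * (1 - anticipated_attrition)

print(f"excluded={excluded}, final n={final_n}, overall loss={overall_loss:.1%}")
print(f"expected completers under 20% attrition: {expected_completers:.0f}")
```

The exclusion reasons sum to the 39 excluded participants, and subtracting them from the 250 recruited gives the final sample of 211, which is above the number expected under the anticipated attrition rate.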

Materials
The materials consisted of the 17- and 19-times tables. In the learning phase, the times tables included multipliers from 1 to 12 in a random order, as these are common in children's timed tests. In the testing phase, the multipliers were 3, 6, 7, 8, 9, and 12, as these take longer to calculate mentally, making it easier to tell when the facts have not been memorized. These arithmetic facts were used to design the pretest, learning phase problem sets, and tests on Qualtrics. The 17-times table had been used as a set of unknown arithmetic facts in the "Hand Position Affects Performance on Multiplication Tasks" study by Davoli and colleagues (2020). To establish that the results were not dependent on the particular arithmetic practiced (e.g., the 17-times table), the 19-times table added variability (Haverty, 1999). There were no expected differences based on the 17- or the 19-times table. To avoid expertise through prior practice, the times tables had high multiplicands: 17 and 19. Because some countries memorize up to the 16-times table, these practiced facts were above the 16-times table to minimize familiarity. Filler facts (random, unpracticed 7-, 9-, 12-times table facts) were also included in the tests. The 7- and 9-times tables were filler facts because they have the same digits in the ones place as the 17- and 19-times tables while the 12-times table similarly requires double digit multiplication. The pretest was identical in format to the post-tests in the testing phase, and six 17-times table facts, six 19-times table facts, and six filler facts were used.
The problem sets at the end of each learning phase were composed of either the 17- or 19-times table, which were randomly assigned to each participant. The assigned times table (e.g., 17) was the multiplicand, so the format of all the learning phase problem sets was a x b = c (e.g., 17 x 4 = __). The problem sets were designed as multiple-choice questions that included the correct answer and seven other randomly ordered answers based on common mistakes children make: Operand-Related errors (using the incorrect multiplier), Miss 1 errors (the answer is off by one from the correct answer), and Adding Instead of Multiplying errors (Buwalda et al., 2016). The adding instead of multiplying error was not included, as it is mostly seen in children who have not mastered multiplication. For 2-digit by 2-digit multiplication problems, the Forgetting Place Value errors (e.g., 15 x 12 = 30, 30 + 15 = 45) and Rounding Incorrectly errors (e.g., for 15 x 12: 15 x 10 = 150, 150 + [2 x 12] = 174) replaced other common errors. Answers with the same ones digit (Same Ones Digit errors) were also included. This category was also included for incorrect answers to add more variability. A Miss 1 category was included for incorrect answers to prevent searching for consecutive answers, as only using this error for correct answers could signal that the correct answer was a consecutive number. Multiple-choice was used instead of fill-in-the-blank to minimize possible variability due to typing speed or internet connection.
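To make the error categories concrete, the sketch below generates candidate distractors for a problem a x b. It is an illustration only and not the procedure used to build the actual problem sets: the function name, the choice of neighboring multipliers for Operand-Related errors, and the use of plus or minus 10 for Same Ones Digit errors are assumptions. The two-digit rules are written so that they reproduce the worked examples given above for 15 x 12.

```python
def candidate_distractors(a: int, b: int) -> dict[str, list[int]]:
    """Return illustrative wrong answers for a x b, grouped by error category.

    Hypothetical helper: the category names follow the text, but the exact
    rules used to build the study's answer options are not specified there.
    """
    correct = a * b
    options = {
        # Assumption: "incorrect multiplier" means a neighboring multiplier.
        "operand_related": [a * (b - 1), a * (b + 1)],
        # Off by one from the correct answer.
        "miss_1": [correct - 1, correct + 1],
        # Assumption: shifting by 10 keeps the ones digit unchanged.
        "same_ones_digit": [correct - 10, correct + 10],
    }
    if b >= 10:
        tens, ones = divmod(b, 10)
        # Treats the tens digit as a ones digit: 15 x 2 = 30, 30 + 15 = 45.
        options["forgetting_place_value"] = [a * ones + a * tens]
        # Multiplies the leftover by the wrong operand: 150 + (2 x 12) = 174.
        options["rounding_incorrectly"] = [a * 10 * tens + ones * b]
    return options


example = candidate_distractors(15, 12)
print(example["forgetting_place_value"], example["rounding_incorrectly"])
print(candidate_distractors(17, 4)["miss_1"])
```

A real item generator would also need to remove duplicates, drop any option equal to the correct answer, and trim the pool to seven distractors before shuffling.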
The State-Trait Anxiety Inventory (STAI; Spielberger et al., 1983) assesses how anxious participants feel while completing a task (i.e., state anxiety) and in general (i.e., trait anxiety). To identify whether participants experienced stress, especially when timed, participants filled out a shortened STAI after each learning phase. This version contained Items 1, 3, 6, 15, 16, and 17 of the State form of the STAI and had a favorable internal consistency (all r's > .90), making it a reliable and valid instrument (Tluczek et al., 2009). The STAI questions asked participants to indicate how they felt right after each learning phase. Using a five-point Likert scale, they indicated their present feelings for "I feel calm. I feel tense. I feel upset. I am relaxed. I feel content. I am worried." Scoring was reversed for items 1, 15, and 16, as recommended in the manual (Spielberger et al., 1983). Higher scores indicated greater state anxiety, a measure of perceived stress.
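The reverse scoring described above can be sketched as follows. This is a minimal illustration under two assumptions that the text does not state explicitly: responses are coded 1 to 5, and the six item scores are summed into a single state anxiety score. The item labels are shorthand for the six statements quoted above.

```python
# Shorthand labels for STAI State items 1, 3, 6, 15, 16, and 17.
ITEMS = ["calm", "tense", "upset", "relaxed", "content", "worried"]
# Items 1, 15, and 16 are positively worded and therefore reverse-scored.
REVERSED = {"calm", "relaxed", "content"}
SCALE_MAX = 5  # assumption: five-point scale coded 1..5


def state_anxiety_score(responses: dict[str, int]) -> int:
    """Sum the six items after reverse-scoring, so higher means more anxious."""
    total = 0
    for item in ITEMS:
        rating = responses[item]
        if not 1 <= rating <= SCALE_MAX:
            raise ValueError(f"rating for {item!r} must be 1..{SCALE_MAX}")
        # Reverse-scored items map 1 -> 5, 2 -> 4, ..., 5 -> 1.
        total += (SCALE_MAX + 1 - rating) if item in REVERSED else rating
    return total


relaxed_participant = {"calm": 5, "tense": 1, "upset": 1,
                       "relaxed": 5, "content": 5, "worried": 1}
print(state_anxiety_score(relaxed_participant))
```

Under these assumptions, scores range from 6 (lowest state anxiety) to 30 (highest).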
After the learning phase, the participants started the testing phase. The tests were identical to the pretest, and they included 18 problems: six 17-times table facts, six 19-times table facts, and six filler facts (unpracticed arithmetic: 7-, 9-, 12-times tables). Each test included embedded attention check statements with simpler arithmetic such as "2 x 2 = __." The questions were presented one at a time in black 12-point Arial font and in a random order to each participant during each test. The interstimulus interval depended on when the participant clicked the arrow to move on but had a limit of 90 seconds.
To evaluate understanding of the practiced facts, a "transfer" test was given after the end-of-semester test. This test included three different forms of "transfer" facts: the practiced multiplicand as the multiplier (e.g., when 17 x 7 is the practiced fact, the test problem would be 7 x 17), the non-traditional "answer = operations" (e.g., __ = 17 x 8), and the division version (e.g., 51 / 17). Three 17-times table, three 19-times table, and three filler transfer problems were used. Three filler traditional multiplication problems (e.g., 7 x 4 = __) were added to prevent signaling the variables of interest. The inclusion of "transfer" facts in this study was modeled after the transfer of knowledge in commutative pairs from a study by Davoli and colleagues (2020), in which undergraduates learned multiplication facts (of the format a x b = c) and transferred their learning to a novel format (Exp. 1: b x a = __) or the same format (Exp. 2: a x b = __). Participants used their devices and Prolific accounts and were asked to not use additional materials (e.g., paper and pencil, calculator, times table).

Task
Participants learned and practiced unfamiliar arithmetic facts from a 17- or 19-times table under different distribution conditions with time pressure as a between-subjects manipulation. The primary task was to learn arithmetic facts based on a randomly assigned 17- or 19-times table. The task consisted of five main phases: a pretest, a learning phase (three sessions), an immediate test (the final task in the third learning phase session), a ten-day delayed test, and a delayed test based on a later fixed date (9 weeks after the study start date; Figure 1).

Equal Learning Schedule With Time in Days Depicted Horizontally
Note. The interstudy interval (ISI) varies with assigned spacing condition: 1-, 2-, 7-, 10-, or 14-day intervals. The ten-day delayed test occurs after a fixed retention interval (RI) 10 days after the third study session. The end-of-semester test occurs after a variable RI from the ten-day delayed test (fixed at 9 weeks after the first session).
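The schedule described in the note can be expressed as a small calculation. The sketch below assumes that the first learning session falls on day 0 and that "9 weeks after the first session" corresponds to day 63; the function and key names are illustrative.

```python
END_OF_SEMESTER_DAY = 9 * 7  # assumption: day 63, counting the first session as day 0
DELAYED_TEST_RI = 10         # fixed retention interval after the third session
SPACING_CONDITIONS = [1, 2, 7, 10, 14]  # interstudy intervals in days


def study_schedule(isi_days: int) -> dict[str, object]:
    """Return the day of each session and test for one spacing condition."""
    sessions = [0, isi_days, 2 * isi_days]       # three learning sessions
    delayed_test = sessions[-1] + DELAYED_TEST_RI
    return {
        "sessions": sessions,
        "immediate_test": sessions[-1],          # end of the third session
        "ten_day_delayed_test": delayed_test,
        "end_of_semester_test": END_OF_SEMESTER_DAY,
        # Variable gap between the ten-day delayed test and the final test.
        "final_gap": END_OF_SEMESTER_DAY - delayed_test,
    }


for isi in SPACING_CONDITIONS:
    s = study_schedule(isi)
    print(isi, s["sessions"], s["ten_day_delayed_test"], s["final_gap"])
```

Under these assumptions, the gap between the ten-day delayed test and the end-of-semester test shrinks from 51 days in the daily condition to 25 days in the every-other-week condition, which is why the final retention interval is described as variable.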
The pretest, immediate test, and end-of-semester test were the same for participants in all conditions. They mimicked traditional classroom tests with a time limit of 90 seconds per question, but this time limit was not highlighted as a "beat the clock" scenario nor presented to create an acute sense of time pressure regardless of assigned condition.

Procedure
After meeting the criteria, signing up, and providing informed consent, participants began the learning phase on Qualtrics. They read instructions describing the study's phases, how to answer questions, and its purpose. They were reminded of their anonymity. Due to an anticipated wide range in baseline levels of arithmetic proficiency, a pretest was administered at the start of the first learning phase session to gauge mathematical abilities. The pretest was also a baseline measure to ensure that question order did not influence answers and to assess participant learning over time.
Each participant was given the following task during each of the three learning phase sessions: solve a times table, study the times table, and take an assessment. Participants were first asked to solve a 17- or 19-times table with multipliers presented in a random order and to study their assigned times table with consecutive multipliers for as long as they would like, although this was roughly equal across conditions with an average of 6 minutes. When ready to move on, participants were told to try their best to solve each problem as quickly as they could while still maintaining high accuracy. They then recalled the studied arithmetic by completing two consecutive sets of 20 multiple-choice problems. In the timed condition of the learning phase, participants were given the goal of trying to solve at least 15 problems correctly in 30 seconds. After 30 seconds, regardless of how much was completed, each problem was displayed along with a red 0/1 (incorrect) or a green 1/1 (correct) score. Participants could review their answers for as long as they liked before moving on. The next screen showed how many problems they answered correctly and whether they met the goal of correctly answering at least 15 problems in 30 seconds. Any problems answered incorrectly or left blank reappeared to be redone. Participants then had a chance to beat their initial scores with another set of 20 problems with the same goal. Meanwhile, in the untimed condition, participants were also given two sets of 20 problems but were not made aware of the time spent. As in the timed condition, answers were scored and revealed. They also reviewed answers, were notified of their accuracy, and re-solved incorrect or unanswered problems.
After completing the first learning phase session tasks, participants completed the State form of the modified State-Trait Anxiety Inventory. As an additional reliability measure, participants then answered a question about whether there was anything they had to share about their participation (e.g., opening a times table in a separate tab, connection issues). They were reminded that their answers would not impact their monetary compensation and completed a survey that collected their demographic information. This self-reported measure was included at the end of data collection to minimize biased responses during the study.
After all three learning phase sessions, participants completed a filler task (the shortened State-Trait Anxiety Inventory) followed by the immediate test. The end-of-semester test was administered 9 weeks after the start of the study. To serve as a more practical measure for how educators could incorporate the study's results into teaching, it was designed as a fixed date rather than a variable ratio with the last learning session. Nine weeks were chosen because K-12 schools in the U.S. commonly use a nine-week grading period system; this is thought to provide more time to learn complex concepts and to receive feedback and intervention prior to grade deadlines despite limited empirical evidence supporting this idea. With more schools shifting to a nine-week grading period, it may be helpful to assess the learning and retention of arithmetic using this grading period system.

Design
The design was a 5 (spacing condition: 1-, 2-, 7-, 10-, or 14-day intervals) x 2 (time pressure: timed or untimed) x 2 (fact type: practiced or unpracticed) x 4 (test phase: pretest, immediate, ten-day delayed, end-of-semester) mixed design with repeated measures on fact type and test phase. The distribution of study times was modified based on studies by Fisher and Radvansky (2018) and Cepeda et al. (2008). The spacing effects on the end-of-semester test were extrapolated from Cepeda et al. (2008) data of test performance as a function of the inter-study interval. Participants were randomly assigned to one of the ten between-subjects conditions in the 5 (spacing) x 2 (time pressure) factorial design for the learning phase. Bayesian analyses were performed to directly test the null hypothesis while comparing multiple competing hypotheses. A Bayes factor of less than 1 indicates evidence for the null hypothesis while a Bayes factor greater than 1 indicates evidence for the alternative hypothesis. The further the Bayes factor is from 1, the stronger the evidence.
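The reading of Bayes factors described above can be sketched in code. Because a Bayes factor is a ratio and is always positive, one common way to compare strength of evidence in either direction is the distance from 1 on a logarithmic scale. The function below is an illustration only; it assumes the Bayes factor is expressed as the alternative over the null, and the function name is hypothetical.

```python
import math


def describe_bayes_factor(bf10: float) -> tuple[str, float]:
    """Return the favored hypothesis and the strength of evidence.

    Assumption: bf10 is the Bayes factor for the alternative over the null.
    Strength is |log10(bf10)|, so 10 and 0.1 count as equally strong
    evidence in opposite directions.
    """
    if bf10 <= 0:
        raise ValueError("a Bayes factor must be positive")
    log_bf = math.log10(bf10)
    if log_bf > 0:
        favored = "alternative"
    elif log_bf < 0:
        favored = "null"
    else:
        favored = "neither"
    return favored, abs(log_bf)


for bf in (0.1, 1.0, 3.0, 10.0):
    print(bf, describe_bayes_factor(bf))
```

On this scale a Bayes factor of 0.1 supports the null hypothesis as strongly as a Bayes factor of 10 supports the alternative.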

Response Time
Because Qualtrics timing was accurate only to the second, response times (RTs) were recorded in whole seconds. RTs ranged from 2 to 90 s, with a mean RT per problem of 12 s (SE = .2 s) across all conditions. The response time data at each test were right-skewed, W(1806) = .79, p < .001, with skewness of 2.65 (SE = .06) and kurtosis of 12.96 (SE = .12), which is an expected and typical finding with response time data. Given that ANOVA is robust against violations of normality, we proceeded with the analysis. We conducted a 5 (spacing condition: 1-, 2-, 7-, 10-, or 14-day intervals) x 2 (time pressure: timed or untimed) x 2 (fact type: practiced or unpracticed) x 4 (test phase: pretest, immediate, ten-day delayed, end-of-semester) mixed analysis of variance (ANOVA) with repeated measures on fact type and test phase.
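The standard errors reported for skewness and kurtosis can be checked against the usual large-sample formulas, which depend only on the number of observations. The sketch below assumes that the reported values were computed with these standard formulas (as in common statistics packages) and that n = 1806, the degrees of freedom given for the W statistic; neither assumption is confirmed by the text.

```python
import math


def se_skewness(n: int) -> float:
    """Standard error of sample skewness for n observations."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n: int) -> float:
    """Standard error of sample excess kurtosis for n observations."""
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


n_obs = 1806  # assumption: taken from W(1806) in the text
print(round(se_skewness(n_obs), 2), round(se_kurtosis(n_obs), 2))
```

Under these assumptions the two values round to .06 and .12, matching the standard errors reported above, and the skewness of 2.65 is many standard errors from zero.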

Effect of Test Phase on the Response Time for Arithmetic Fact Recall
Note. Response time on a pretest and three posttests was assessed among participants (n = 211) in all spacing and timing conditions. Marginal means ± standard error of the mean. *p < .05 as compared to pretest.

Effects of Time Pressure (A) and Fact Type (B) Across Test Phase and the Interaction Between Test Phase, Fact Type, and Time Pressure (C) on the Response Time of Arithmetic Fact Recall
Note. Response time on a pretest and three posttests was assessed among participants (n = 211) in all spacing conditions and compared between the timed (time pressure, n = 104) and untimed (no time pressure, n = 107) conditions (A) as well as for practiced facts (e.g., 17-times table facts for participants assigned to the 17-times table) versus unpracticed facts (e.g., 19-times table facts for participants assigned to the 17-times table) (B). Response time on a pretest and three posttests was then compared between timed and untimed condition participants and their performance on practiced versus unpracticed facts (C). Marginal means ± standard error of the mean. *p < .05. However, post hoc analysis indicated that time pressure improved performance on the immediate test (M = 8.84, SE = .45; M = 10.16, SE = .44). The other three-way interactions and the four-way interaction were not significant (all p > .05).

Effect of the Interaction Between Time Pressure and Spacing Condition on the Response Time of Arithmetic Fact Recall
Note. Response time on a pretest and three posttests was assessed among participants (n = 211) in all spacing conditions and compared between the timed (time pressure, n = 104) and untimed (no time pressure, n = 107) conditions. Marginal means ± standard error of the mean. *p < .05.

Effect of the Three-Way Interaction Between Test Phase, Fact Type, and Time Pressure on the Response Time of Arithmetic Fact Recall
Note. Response time on a pretest and three posttests was compared between the timed condition participants (A) and the untimed condition participants (B) and their performance on practiced facts (e.g., 17-times table facts for participants assigned to the 17-times table) versus unpracticed facts (e.g., 19-times table facts for participants assigned to the 17-times table). The time between the ten-day and end-of-semester posttests varies according to spacing condition (producing either an expanding or contracting schedule), which would warrant a bar graph. However, the relationship across test phase (where time is a continuous variable) was best demonstrated with a line graph. The untimed condition participants' performance on unpracticed facts followed a different forgetting trajectory from the other combinations of fact type and timing condition. Marginal means ± standard error of the mean.

Robustness Checks
To further understand the test phase x fact type, three-way, and the distribution condition x time pressure interactions, we re-analyzed the data without the pretest to see if the interactions still held, and they did in all posttests. We also checked if the interactions held when controlling for average accuracy as a covariate. This showed that the test phase x fact type interaction no longer held. However, the test phase x fact type x time pressure interaction did, as did the distribution condition x time pressure interaction.

Exploratory Analysis
Because the two post-practice interactions were not predicted a priori, we did an exploratory analysis. The first major finding was that timed practice decreased response time for participants in the 1- and 2-day conditions while increasing response time for participants in the 7-, 10-, and 14-day conditions (Figure 6). This interaction's effect size was medium, F(1, 199) = 13.60, p < .001, ηp² = .06.
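Partial eta squared can be recovered from a reported F statistic and its degrees of freedom, which offers a quick consistency check on the effect sizes in this section. The sketch below applies the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error); the function name is hypothetical and this is not the authors' analysis code.

```python
def partial_eta_squared(f_value: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and degrees of freedom positive.")
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Exploratory spacing group x time pressure interaction: F(1, 199) = 13.60
print(round(partial_eta_squared(13.60, 1, 199), 2))
```

The same function accepts fractional degrees of freedom, so it can also be applied to sphericity-corrected tests.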

Effects of Time Pressure and Within a Week Versus Greater Than a Week Spacing Conditions on Response Time
Note. Response time was assessed according to spacing condition (within a week, n = 89; or, greater than a week, n = 122) and timing condition (timed practice, n = 104; untimed practice, n = 107) for all tests excluding the pretest. Estimated marginal means ± standard error of the mean. *p < .05.
The second major finding was that practiced and unpracticed facts seemed to follow the same forgetting trajectory across the test phases among timed condition participants (Figure 5; Figure 3C). This trend was not found for untimed condition participants. Rather, these participants gradually improved on the unpracticed facts with repeated practice across the posttests.

Accuracy
We anticipated near-ceiling performance in accuracy. Accuracy ranged from .00 to 1.00 with a mean of .84 (SE = .01) across all conditions. The accuracy data at each test were left-skewed, W(1780) = .74, p < .001, with skewness of -1.610 (SE = .06) and kurtosis of 2.191 (SE = .12). Because ANOVA is robust against violations of normality, we proceeded with the analysis. Participants exhibited more variability across test phase than expected on the arithmetic recall task (Figure 7). Seeing condition effects in accuracy was a novel finding, so we moved to testing our hypotheses using accuracy data. We conducted a 5 (distribution condition: 1-, 2-, 7-, 10-, or 14-day intervals) x 2 (time pressure: timed or untimed) x 2 (fact type: practiced or unpracticed) x 4 (test phase: pretest, immediate, ten-day delayed, end-of-semester) mixed analysis of variance (ANOVA) with repeated measures on fact type and test phase.

Effect of Test Phase on the Accuracy of Arithmetic Fact Recall
Note. Accuracy on a pretest and three posttests was assessed among participants (n = 211) in all spacing and timing conditions. Marginal means ± standard error of the mean. *p < .05 as compared to pretest, † p < .05 as compared to end-of-semester test.
Mauchly's test indicated that the assumption of sphericity had been violated for test phase, χ²(5) = 70.58, p < .001. Degrees of freedom were corrected using Huynh-Feldt estimates of sphericity (ε = .902) for the within-subjects tests involving test phase. Two main effects were found. First, there was a main effect of test phase, F(2.71, 503.31) = 8.77, p < .001, ηp² = .045 (Figure 7). To mirror response time data analysis, we next moved to the orthogonal Helmert contrasts. The Helmert contrasts revealed that accuracy at pretest was worse than the average accuracy on the three posttests, F(1, 186) = 13.20, p < .001. The immediate test was not more accurate than the ten-day delayed and end-of-semester tests, F(1, 186) = 2.62, p = .11, but the ten-day delayed test was more accurate than the end-of-semester test, F(1, 186) = 7.59, p = .006. Second, the main effect of fact type (practiced or unpracticed) was significant, F(1, 186) = 32.94, p < .001, ηp² = .15. Accuracy was higher on the practiced facts (M = .87, SE = .01) than on the unpracticed facts (M = .82, SE = .01).
These main effects were qualified by significant test phase by time pressure, F(2.71, 503.31) = 3.67, p = .015, ηp² = .019, and test phase by fact type, F(3, 558) = 11.79, p < .001, ηp² = .060, interactions. The timed condition was less accurate than the untimed condition on the immediate test (Figure 8A). People were also more accurate on the practiced facts than on the unpracticed facts for the immediate and ten-day tests (Figure 8B). This was not the case for the other tests. The other three-way interactions and the four-way interaction were not significant (all p > .05).
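The fractional degrees of freedom reported for tests involving test phase follow from multiplying the uncorrected degrees of freedom by the Huynh-Feldt ε. The sketch below reproduces that arithmetic, assuming uncorrected values of 3 (four test phases minus one) and 558 (3 × 186), which match the F(3, 558) reported for the test phase by fact type interaction. The function name is hypothetical.

```python
def sphericity_corrected_df(
    epsilon: float, df_effect: float, df_error: float
) -> tuple[float, float]:
    """Scale both degrees of freedom by a sphericity correction factor epsilon."""
    if not 0 < epsilon <= 1:
        raise ValueError("Epsilon is expected to lie in the interval (0, 1].")
    return (epsilon * df_effect, epsilon * df_error)


# Huynh-Feldt epsilon = .902; uncorrected df assumed to be 3 and 558.
df1, df2 = sphericity_corrected_df(0.902, 3, 558)
print(f"corrected df: ({df1:.3f}, {df2:.3f})")
```

The result agrees with the reported F(2.71, 503.31) to within rounding of ε. The correction leaves the F value unchanged and only makes the reference distribution more conservative.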

A Priori Hypotheses
We assessed accuracy in relation to our two hypotheses. We did not find evidence for a distribution effect on accuracy, F(4, 186) = 2.0, p = .10, ηp² = .04, BF10 = .14. We also did not find evidence that timed practice led to higher accuracy than untimed practice. The means were in the opposite direction of our prediction, although the difference was not significant, F(1, 186) = 1.98, p = .16, ηp² = .01, BF10 = 3.23. Again, we did not test the hypothesis that the distribution effect would be reduced in the time pressure condition, as it depended on finding a main effect of distribution.

Transfer Test
We conducted a 5 (spacing condition: 1-, 2-, 7-, 10-, or 14-day intervals) x 2 (time pressure: timed or untimed) x 2 (fact type: practiced or unpracticed) mixed analysis of variance (ANOVA) with repeated measures on fact type using the transfer test accuracy and response time data. Accuracy results revealed a main effect of fact type, F(1, 207) = 5.52, p = .02, BF10 = 2.11, with accuracy higher for practiced (M = .86, SE = .02) than unpracticed facts (M = .81, SE = .02). Evidence for a main effect of fact type on response time was indeterminate, F(1, 207) = 1.89, p = .17, BF10 = .48. A main effect of timing condition was found on accuracy, F(1, 207) = 4.12, p = .04, with more accurate responses in the timed than untimed condition (M = .87, SE = .02; M = .80, SE = .02, respectively). This was not seen for response time. There were no main effects of distribution on accuracy or response time.

Effects of Time Pressure (A) and Fact Type (B) Across Test Phase
Note. Accuracy on a pretest and three posttests was assessed among participants (n = 211) in all spacing conditions and compared between the timed (time pressure, n = 104) and untimed (no time pressure, n = 107) conditions (A), for practiced facts (e.g., 17-times table facts for participants assigned to the 17-times table) versus unpracticed facts (e.g., 19-times table facts for participants assigned to the 17-times table) (B), and the interaction between time pressure and fact type across the test phases (C). Marginal means ± standard error of the mean. *p < .05.

Stress Measure
To determine the impact of time pressure during learning, the State-Trait Anxiety Inventory (STAI) was given after each learning phase session. Using state anxiety ratings (STAI-state), we conducted a 2 (time pressure: timed or untimed) x 3 (learning phase session: first, second, or third learning phase) mixed analysis of variance (ANOVA) with repeated measures on learning phase session. There was a main effect of timing condition on reported state anxiety, F(1, 10) = 14.81, p = .003, ηp² = .60, a main effect of learning phase session on reported state anxiety, F(2, 20) = 58.65, p < .001, ηp² = .85, and an interaction between timing condition and learning phase session, F(2, 20) = 28.27, p < .001, ηp² = .74. Overall, participants in each time pressure condition reported experiencing less stress with each subsequent session. Participants in the timed condition reported experiencing more stress than those in the untimed condition following all sessions. They reported experiencing similar stress during the first and second learning sessions, and it was not until the third session that their stress decreased, while the participants in the untimed condition reported experiencing less stress during the second learning session (Table 1).

Common Arithmetic Errors
We did an exploratory analysis for the most common arithmetic error by recording how often a type of error appeared among incorrect answers for different fact types (17- or 19-times table) across test phase (Pretest, Immediate, Ten-Day, End-of-Semester) and the overall frequency of each error (Table 2). Recall that the tests were multiple choice, so errors were limited to the answers we had included as foils based on previous research. We had an exploratory hypothesis (see pre-registration in the Supplementary Materials) that testing pressure could lead participants who relied on calculation rather than retrieval to hastily select an answer once the correct ones digit was known, resulting in the "Same Ones Digit" error. This error was indeed the most common (37%). Overall, 162 of the 211 participants (77%) made the Same Ones Digit error at least once during testing. The next most frequent error was Rounding Incorrectly (23.7%), which is only found in 2-digit by 2-digit multiplication. Place Value errors were non-existent after the pretest.
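To make the Same Ones Digit error concrete, the sketch below flags a response that is incorrect but ends in the same digit as the correct product. The function name and the example responses are hypothetical; they are not the foils used in the study, and the other error types are not modeled because their exact definitions are not given here.

```python
def is_same_ones_digit_error(a: int, b: int, response: int) -> bool:
    """True if the response is wrong but shares the ones digit of a * b."""
    product = a * b
    return response != product and response % 10 == product % 10


# 17 x 6 = 102; the responses below are hypothetical, for illustration only.
for response in (102, 112, 92, 108):
    print(response, is_same_ones_digit_error(17, 6, response))
```

A participant who computes only the ones digit (7 × 6 ends in 2) and then picks any option ending in 2 would produce this pattern, which is the behavior the exploratory hypothesis describes.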

Discussion
The current study examined how distributed practice and time pressure affect the learning and retention of unfamiliar arithmetic facts. Our goal was to contribute to the ongoing search for optimal ways to structure practice. The study was designed to extend prior knowledge on distributed practice by integrating time pressure and considering effects on mathematical cognition. The main effects of test phase and fact type increased our confidence in the major findings. Participants took more time and were less accurate on the pretest compared to the immediate posttest. However, they took more time and were less accurate as the retention interval increased, suggesting that learning was followed by forgetting (Figures 2 and 7). Participants performed better on practiced than unpracticed facts. Despite not finding evidence in support of our a priori hypotheses, results revealed the following: (a) time pressure elevated perceived stress and induced state anxiety, (b) distributed practice and time pressure interacted: time pressure during learning led to faster test performance if the inter-stimulus study interval was less than one week but slower test performance for longer inter-stimulus study intervals, (c) the presence or absence of time pressure contributed to a different forgetting trajectory of practiced and unpracticed facts, and (d) learning transferred to conceptually related facts.

Time Pressure as a Stressor
Participants who were timed during learning scored higher on the state anxiety portion of the State-Trait Anxiety Inventory (Table 1). The state anxiety component of the inventory assesses how anxious participants feel during learning, and our results suggest that time pressure increased participants' perceived anxiety levels. Timed participants reported feeling subjectively more stressed than those in the untimed condition. Although future studies may benefit from using math-specific anxiety measures, prior research on time pressure supports our findings. Time pressure has been shown to heighten the stress associated with decision-making tasks (Byrne et al., 2015; Young et al., 2012). Despite the increased stress reported by timed participants, there were no main effects of time pressure during learning on test performance. Nevertheless, the timed problem sets were perceived as stressful and affected performance differently across different spacing conditions.

The Interaction Between Spacing and Time Pressure
The second major finding was an interaction between time pressure and time between study sessions. Participants in the timed learning phase who studied material daily or every other day responded faster during the testing phase as compared to participants in the untimed learning phase. However, this pattern reversed when the inter-stimulus interval was around one week. Participants in the untimed learning phase who studied material weekly, every 10 days, or every other week responded faster during the testing phase than those in the timed learning phase (Figures 4 and 6). Therefore, time pressure during learning was found to speed responses during testing only when paired with closely spaced practice sessions. This finding is consistent with prior studies in which daily timed practice enhanced performance as compared to farther-spaced timed practice or no practice (Duhon et al., 2022; Knowles, 2010), although these prior studies did not compare timed practice with untimed practice. The hippocampus plays a critical role in memory (Scoville & Milner, 1957). Hippocampal memory loss is thought to occur approximately one week after first learning new material (Fisher & Radvansky, 2018). A hippocampal explanation would initially predict a main effect of spacing, as this has been empirically demonstrated in the memory literature (Hintzman et al., 1973; Kim et al., 2019; Shaughnessy et al., 1972). However, this robust and reliable finding has not been applied to arithmetic learning. Although the current study did not find a main effect of practice distribution, we observed an interaction between time pressure and distribution. One possible explanation for this is that time pressure did not compensate for inter-stimulus intervals longer than one week when learning arithmetic. Compared to language, arithmetic is less frequently used knowledge, largely because it is often disliked and avoided outside of educational settings (Cannon & Ginsburg, 2008).
Although time pressure may create a beneficial sense of urgency (Squire et al., 2015), our study suggests that timed practice during learning led to faster response times during testing only when memories were recently consolidated. Therefore, the effects of time pressure may depend on the temporal dimensions of hippocampal-neocortical dialogue, during which hippocampal-dependent memories stabilize in the cortex (Sweegers et al., 2014), allowing for the transition of problem-solving strategies to more automatic retrieval (Rickard, 1997), and eventually leading to skill learning (Rickard et al., 2008). Although the precise temporal dimensions of this process are unknown (Squire et al., 2015), our results suggest that it likely occurs during an inter-stimulus interval between two and seven days.
Another possibility is that the longer inter-stimulus intervals functioned as stressors because a lack of recent review may be perceived as stressful. Time pressure is another stressor, as corroborated by the shortened State-Trait Anxiety Inventory (Table 1). By adding timed practice, the stressors may accumulate to surpass the amount of beneficial stress, as depicted by the inverted-U-shaped relationship between spacing and free recall (Verkoeijen et al., 2008). This may occur at an inter-stimulus interval between two and seven days, contributing to the observed time pressure and spacing trend reversal.

The Effects of Time Pressure on Forgetting
Our third major finding indicated that the forgetting trajectory differed based on whether practice occurred under time pressure, as evidenced by the time pressure, fact type, and test phase interactions. In the timed condition, both practiced and unpracticed facts followed a typical learning-then-forgetting-over-time pattern across the multiple test phases. Similarly, practiced facts in the untimed condition followed this pattern. However, participants in the untimed condition seemed to gradually improve on unpracticed facts across the tests (Figures 5 and 3C). The forgetting that occurred for the unpracticed facts in the timed condition suggests that the results are not merely a general effect of additional practice during the testing phase.

Transfer Test
Finally, we found that participants who learned a specific set of arithmetic facts were able to apply their knowledge to solve similar problems, indicating transfer of learning. Accuracy was higher for transfer facts that used the same times table as the practiced facts, suggesting that participants constructed conceptual knowledge. These results build upon a previous study where undergraduates learned multiplication facts (a x b = c) and then were asked to either transfer their learning to a novel format (b x a = __) or the same format (a x b = __) (Davoli et al., 2020). Our findings support the conclusion that learning occurred, as knowledge that is not transferred to new contexts or problems is considered inert (Bransford et al., 2001). While retention requires recall, transfer requires understanding and application (Bransford et al., 2001; Haskell, 2001). Additionally, the lack of time pressure in the untimed condition may have contributed to an enhanced conceptual understanding, possibly through a relational strategy like decomposition, which is associated with accurate equation encoding (Chesney et al., 2014). Alternatively, participants may have relied on operational patterns, as prior practice did not affect response times. Operational patterns may be helpful for traditional arithmetic, but are unhelpful when encoding, interpreting, and solving non-traditional problems (McNeil et al., 2015). Overall, although participants showed enhanced learning of the practiced facts, as seen in the accuracy data, the transfer of learning to a novel format takes time.

Implications
The present results demonstrate that time pressure during learning and whether arithmetic facts were previously practiced both affect test performance. While forgetting occurred for the other conditions, participants gradually improved on unpracticed facts with untimed practice, albeit with slower response time on the immediate posttest and no difference on the end-of-semester posttest compared to other groups (Figure 5). Although it is unclear whether performance would continue to improve with a longer retention interval, reviewing previously unlearned but related information in an untimed, stress-reduced environment may allow students to learn from tests. One way to implement such an environment is through something like math story time, which can increase children's mathematics achievement (Berkowitz et al., 2015). Given that time pressure only helps if it is paired with closely spaced review (e.g., 1- or 2-day spacing), educators should consider the length of time between study sessions when using timed practice. These findings provide insights for optimizing arithmetic practice and using limited instructional time effectively.
The current study also contributes to the theoretical understanding of how the time between study sessions and the presence of time pressure influence learning and memory. It advances knowledge by examining memory for mathematical information, an area that is studied less than verbal information in memory research. The study did not replicate the typical distributed practice effect seen in the memory literature. This failure to replicate suggests that the ratio between the inter-stimulus interval and retention interval may vary based on the type of material studied, such as arithmetic, narrative texts, or word lists. Furthermore, the study provides insights on the relationship between practiced facts and the transfer to conceptually related facts across different inter-study intervals and time pressure conditions. Participants responded more accurately but not faster on practiced facts, and untimed participants were more accurate, which deepens our understanding of how these factors affect learning.

Limitations
Despite the current study's theoretical and practical contributions, it is not without limitations. First, the 1-, 2-, and 7-day spacing conditions were in the "within a week" group, but each learning session was open on Prolific for 1-2, 2-3, and 7-8 days, respectively. Given these classifications, the 7-day group could have been part of the "greater than a week" category. That said, the 1-week time is a rough estimate. Second, the end-of-semester test, although practically relevant, may be theoretically uninterpretable because of the differences in the retention interval between the groups. Our study used an expanding, contracting, and equal learning schedule where the inter-stimulus interval respectively increased, decreased, or remained constant. An inconsistent pattern of learning episodes contributed to a variable retention interval between the ten-day delayed and end-of-semester tests, possibly acting as a confound for end-of-semester test interpretability, as the optimal learning schedule may depend on the retention interval (Küpper-Tetzel et al., 2014). Despite this, the end-of-semester test is ecologically relevant because the testing situation closely resembles a real class: assessing what students know, teaching the content and standards, testing them shortly after learning to assess what they retained, and administering an end-of-semester assessment. Lastly, practicing just one times table (17 or 19) makes our task somewhat different from a more natural learning situation where there would be higher potential for cross-times table interference.

Future Directions
The current study sheds light on how timed practice and distributed practice affect learning and retention of arithmetic facts, which may differ from the learning and retention of other information (e.g., narrative texts, word lists). More research is needed to understand the potential differences and interactions between time pressure and inter-stimulus intervals of less than one week versus greater than one week. One avenue for future work is to conduct a study with multiple inter-stimulus intervals between two and seven days to identify the exact time when practice with time pressure is no longer preferable. Understanding how spacing and time pressure impact participants' retrieval comfort level is also worth exploring, as individuals differ greatly in their retrieval comfort level. Retrieval comfort refers to how comfortable individuals are with relying on retrieval to solve arithmetic problems (Hecht, 2006; Siegler, 1988). For example, time pressure may harm learning for "not-so-good retrievers," who are slower at executing fact retrieval processes, but may have a smaller effect on "good retrievers" who consistently opt for retrieval even when strategy choices are allowed. The impact on "perfectionists," who are reluctant to rely solely on retrieval despite retrieving correct answers, is difficult to predict. Time pressure could serve as a form of exposure to help them overcome their discomfort by forcing them to practice retrieval, or it could lead to increased anxiety and reduce their retrieval comfort level further in the long run.
It is also important for future studies to address the end-of-semester test's theoretical interpretability by excluding the immediate and ten-day delayed tests, as they may act as learning sessions. To deepen our understanding of the interaction between spacing and timed practice, future research should systematically examine mechanisms that contribute to the gradual improvement on unpracticed facts that occurs in the untimed learning condition. If we can ascertain why the relationship between time pressure and spacing reverses between two and seven days, and why a lack of time pressure results in gradual improvement on unpracticed facts, then we will have a fuller picture of how distributed practice and time pressure affect math performance. Ultimately, this line of work has the potential to assist mathematics educators in making informed and nuanced decisions about how best to allocate their valuable instructional time to optimize student learning of arithmetic facts.