
Automatized arithmetic can interfere with numerical judgments, and semantic misalignment may diminish this interference. We gave 92 adults two numerical priming tasks that involved semantic misalignment. We found that misalignment either facilitated or reversed arithmetic interference effects, depending on misalignment type. On our number matching task, digit pairs (as primes for sums) appeared with nouns that were either categorically aligned and concrete (e.g., pigs, goats), categorically misaligned and concrete (e.g., eels, webs), or categorically misaligned with mixed concrete and intangible referents (e.g., goats, tactics). Next, participants were asked whether a target digit matched either member of the previously presented digit pair. Participants were slower to reject sum vs. neutral targets on aligned/concrete and misaligned/concrete trials, but unexpectedly slower to reject neutral versus sum targets on misaligned/concrete-intangible trials. Our sentence verification task also elicited unexpected facilitation effects. Participants read a cue sentence that contained two digits, then evaluated whether a subsequent target statement was true or false. When target statements included the product of the two preceding digits, this inhibited accepting correct targets and facilitated rejecting incorrect targets, although only when semantic context did not support arithmetic. These novel findings identify a potentially facilitative role of arithmetic in semantically misaligned contexts and highlight the complex role of contextual factors in numerical processing.

Humans are numerical thinkers. Adults often use efficient processes for generating and retrieving arithmetic sums (

Although

Priming effects for a) automatic arithmetic without context where participants match the target digit to preceding cue digits (LeFevre interference effect), b) the LeFevre interference effect moderated by the presence of categorically

We specifically modified the misaligned condition, which in Bassok’s version included multiple types of misalignment.^{i} Some of their misaligned sets included only tangible, concrete nouns (e.g., hens, radios), whereas other sets included both concrete and abstract, intangible nouns (e.g., tractors, messages). We propose that combinations of concrete and intangible referents are especially unconducive to automatic arithmetic because they are less likely to generate a plausible

We also explored contextual interference with automatic arithmetic at the level of full sentences, based on evidence that semantic misalignment affects accuracy and findings of event-related potential (ERP) responses during word problem solving and verification (e.g.

We also pursued two secondary aims concerning how semantic alignment effects generalize across settings and persons. First, we attempted to replicate

Participants were 92 students (61 females) enrolled in undergraduate (

We designed this computerized task to test whether priming for addition facts varies with semantic context, modifying the version designed by

Our task differed from the original version (

Illustration of the experimental procedure for the Number Matching task.

We included 176 noun sets and enforced strict control of the nouns’ surface features, as follows: Nouns were drawn from the 5,000 most common plural nouns in the Corpus of Contemporary American English (

No single noun appeared in more than one noun triplet. Synonyms (e.g., ships, boats) were excluded within sets since they may be more subject to combining than other nouns. Likewise, subset relationships between cue nouns were excluded (e.g., roads, lanes) to avoid prompting division instead of addition (

Our critical manipulation was three levels of categorical alignment based on the appropriateness of summing across noun referents within a set (as per

The Number Matching Task included 40 noun triplets for each set (ACC, MCC, and MCI) in the digit-first trials (i.e., the trials of interest in this study). To ensure that participants did not statistically learn that noun triplets were often misaligned, 48 of the 56 noun-first trials involved ACC triplets, so that overall, half of all triplets in the study were aligned (as was the case in

| Noun Type | Non-Matching: Cue 1 | Cue 2 | Target | Matching: Cue 1 | Cue 2 | Target |
|---|---|---|---|---|---|---|
| ACC | pigs | cows | goats | whales | sharks | sharks |
| | ticks | fleas | moths | doctors | lawyers | doctors |
| | donuts | bagels | cookies | plates | bowls | bowls |
| | bankers | actors | sailors | frogs | toads | toads |
| | apples | lemons | mangoes | lamps | desks | lamps |
| MCC | webs | eels | cabs | magnets | wizards | magnets |
| | homes | bones | cops | hotels | ladders | hotels |
| | cards | boats | ports | prisons | oysters | oysters |
| | brakes | swords | phones | statues | cigars | cigars |
| | robots | towels | guitars | papers | hunters | hunters |
| MCI | tanks | myths | laws | wives | facts | wives |
| | trucks | nights | clowns | pearls | tales | pearls |
| | sins | dogs | chips | pastors | options | options |
| | weeks | bombs | hairs | defects | turkeys | defects |
| | tactics | acorns | lessons | tasks | eggs | eggs |

Digit sets used in the Number Matching task consisted of three unique digits between 1 and 9 (

| Nonmatching: Cue | Sum Target | Neutral Target | Target Control: Cue | Target | Cue Control: Cue | Target |
|---|---|---|---|---|---|---|
| 2■3 | 5 | 8 | 7■5 | 5 | 2■3 | 2 |
| 3■2 | 5 | 7 | 5■8 | 5 | 3■2 | 3 |
| 2■5 | 7 | 9 | 3■7 | 7 | 2■5 | 5 |
| 5■2 | 7 | 9 | 9■7 | 7 | 5■2 | 2 |
| 6■2 | 8 | 5 | 5■8 | 8 | 6■2 | 2 |
| 5■3 | 8 | 6 | 9■8 | 8 | 5■3 | 5 |
| 4■3 | 7 | 9 | 7■9 | 7 | 4■3 | 4 |
| 3■5 | 8 | 6 | 8■4 | 8 | 3■5 | 3 |
| 6■3 | 9 | 7 | 9■1 | 9 | 6■3 | 3 |
| 5■4 | 9 | 7 | 9■6 | 9 | 5■4 | 4 |

We also controlled for the size of each sum/neutral target, its distance (split) from its associated cue digits, and where target digits appeared. The minimum split of the neutral set (M_{neutral} = 2.8) exceeded that of the sum set (M_{sum} = 1.1) by 1.7, because sum splits had a unimodal distribution whereas neutral splits had a bimodal distribution. Similar patterns held for the maximum and average splits.
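To make the split metric concrete, the sketch below (our own illustration, using only the example digit sets from the table above; the full 176-item stimulus list yields the statistics reported in the text) computes each target's minimum split, i.e., its smallest absolute distance from either cue digit:

```python
# Minimum "split" (absolute distance) between a target digit and its cue digits,
# computed for the example digit sets shown above (illustration only).
digit_sets = [  # (cue1, cue2, sum_target, neutral_target)
    (2, 3, 5, 8), (3, 2, 5, 7), (2, 5, 7, 9), (5, 2, 7, 9), (6, 2, 8, 5),
    (5, 3, 8, 6), (4, 3, 7, 9), (3, 5, 8, 6), (6, 3, 9, 7), (5, 4, 9, 7),
]

def min_split(target, cues):
    """Smallest absolute distance between the target and any cue digit."""
    return min(abs(target - c) for c in cues)

sum_splits = [min_split(s, (a, b)) for a, b, s, _ in digit_sets]
neutral_splits = [min_split(n, (a, b)) for a, b, _, n in digit_sets]

print(sum(sum_splits) / len(sum_splits))          # mean minimum split, sum targets
print(sum(neutral_splits) / len(neutral_splits))  # mean minimum split, neutral targets
```

For these ten example sets, the neutral targets sit farther, on average, from their nearest cue digit than the sum targets do, in line with the pattern described for the full stimulus list.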

Similarly, it was necessary to prevent the target digit from revealing whether it was likely to be a match. Since the cue digits in the nonmatching sets were constrained to sum to less than 10, these cue digits tended to be small (

Participants completed 176 trials. On the 120

Participants were assigned to one of four fixed trial orders. To prevent participants from ignoring word cues (since noun-first trials were less numerous), practice trials and the first of four blocks of testing trials included an equal number of digit-first and noun-first trials. However, maintaining this balance throughout the entire task would require an unfeasibly long task, so we decreased the ratio of noun-first to digit-first trials harmonically to one-half, one-third, and one-quarter in subsequent blocks for

Within each order, the sequence of trials was randomly generated with several constraints. Consecutive identical answers (match vs. non-match) did not exceed four trials, and no more than four trials included the same noun- or digit-triplet type. No more than four noun-first trials occurred in a row so that these less-numerous control trials were sufficiently spread out throughout the task.
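Constraints of this kind can be checked mechanically when generating candidate orders. The sketch below (our own illustration; function and variable names are hypothetical, not the authors' code) verifies that no run of identical answers or triplet types exceeds the stated maximum:

```python
def max_run_length(values):
    """Length of the longest run of consecutive identical values."""
    longest, current = 0, 0
    previous = object()  # sentinel that matches nothing
    for v in values:
        current = current + 1 if v == previous else 1
        longest = max(longest, current)
        previous = v
    return longest

def satisfies_constraints(answers, triplet_types, max_run=4):
    """True if neither the answer sequence nor the triplet-type sequence
    contains more than max_run identical trials in a row."""
    return (max_run_length(answers) <= max_run
            and max_run_length(triplet_types) <= max_run)

# A candidate order with five 'match' answers in a row violates the constraint.
answers = ["match"] * 5 + ["non-match"] * 3
print(satisfies_constraints(answers, ["ACC", "MCC", "MCI"] * 2))  # False
```

A generator would simply reshuffle and re-check until a sequence passes all such tests.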

Each trial occurred in a fixed order (

Participants received verbal instructions to strive for both accuracy and speed. They completed a demonstration trial with an experimenter who provided instructions and feedback, then completed 10 practice trials. During the experiment, participants were alerted when one-third and two-thirds of the trials were completed. Consistent with procedures adopted by

We created the Sentence Verification Task (

Illustration of the experimental procedure for the Sentence Verification Task.

Each cue was a declarative sentence containing two whole numbers. The semantic content of each cue sentence either implicated multiplication of the two numbers (e.g.

| Prompt Type | Digit Type | Cue Sentence | Target Prompt | Filler (Second) Prompt |
|---|---|---|---|---|
| *Implicative Contexts* | | | | |
| Reject | Neutral | “The 3 ships each transported 9 crates.” | | |
| Reject | Product | “Evan mowed 10 lawns at 6 dollars each.” | | |
| Accept | Neutral | “Frank dealt 4 cards each to 10 poker players.” | | |
| Accept | Product | “Gwen bought 6 toys for each of her 4 kids.” | | |
| *Non-Implicative Contexts* | | | | |
| Reject | Neutral | “Jacky visited 3 orchards to pick 9 peaches.” | | |
| Reject | Product | “They reserved 6 tables at the 4 Seasons Cafe.” | | |
| Accept | Neutral | “Don baked 3 batches of lemon squares in 9 pans.” | | |
| Accept | Product | “Mike won 6 medals in 4 hours.” | | |

Similar to the Number Matching task, the key contextual distinction between the cue sentences was whether the sentences implicated multiplication of the two numbers appearing in the sentence. In the 16

The target prompt statements always included one number that was either the product of the two numbers from the preceding cue sentence (for 16

One half of target prompts were classified as

We analyzed responses for the first prompt sentence only, because priming effects on the second prompt sentence may have been contaminated by the presence of the first prompt sentence. The second prompts included both

Eight sets of cue sentences and target prompt statements were generated in a 2 (Prompt Type: Accept or Reject) × 2 (Context for Products: Implicative or Non-implicative) × 2 (Digit Type: Neutral or Product) design. There were four sentences per experimental condition, yielding a total of 32 trials.
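The full 2 × 2 × 2 crossing can be sketched as follows (an illustration only; variable names are ours):

```python
from itertools import product

# The three two-level factors of the Sentence Verification task design.
prompt_types = ["Accept", "Reject"]
contexts = ["Implicative", "Non-implicative"]
digit_types = ["Neutral", "Product"]
SENTENCES_PER_CONDITION = 4

# Full factorial crossing: 2 x 2 x 2 = 8 experimental conditions.
conditions = list(product(prompt_types, contexts, digit_types))

# Four sentences per condition yields the 32 trials.
trials = [(cond, i) for cond in conditions for i in range(SENTENCES_PER_CONDITION)]

print(len(conditions))  # 8
print(len(trials))      # 32
```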

Several features of the sentences were balanced across conditions in order to strengthen the validity of reaction time (RT) comparisons. Each experimental condition had exactly one trial in which the first prompt referred to the same unit of measurement as the cue. For instance, there were four trials wherein cues implicated multiplication but the product did not appear in the first target prompt (e.g., the cue sentence, “Frank dealt 4

Cue sentences and prompt sentences were finalized through iterative piloting with 210 adults who completed prior versions of the task, either as volunteer study participants at our university (82 participants) or on Mechanical Turk (127 participants), an online marketplace for contract work where participants were paid for their responses. Based on pilot responses, we modified statements to maximize ease of judging the likelihood of being true. In the final pilot testing, two items were excluded for failing to reach our threshold of 80% accuracy, including an Accept/Implicative condition item (71%) and an Accept/Non-implicative condition item (59%). These were omitted because unusually difficult items may introduce cognitive complexity and construct-irrelevant variance to the measures. All remaining items had accuracy rates of 85% or above.

Participants listened to instructions, completed a single demonstration practice trial that did not involve any numbers, and then received feedback. All participants saw the same stimuli in the same quasi-randomly generated order adjusted to limit the number of consecutive trials with the same combination of condition and outcome (no more than two in a row) or the same keyed response (no more than three in a row for the first prompt). No feedback was given on trial responses. The task required about 5 minutes to complete. Two participants were excluded from analyses for failing to respond correctly to any trials in one or more conditions.

Participants completed a three-minute calculation fluency measure, the Math Fluency subtest of the Woodcock-Johnson III, during the testing session. This subtest is from a standardized, paper-and-pencil mathematics achievement measure. Participants were asked to solve as many problems as possible, as quickly as possible. Problems appeared in a test booklet, in order of increasing difficulty. The subtest has a median reliability of .92 with adult participants (

Participants were asked to provide their standardized college entrance examination test scores (ACT Math and ACT Reading). The ACT and SAT are widely used standardized college entrance exams. Each exam yields separate Mathematics and Reading scores. Both tests require basic to complex mathematics problem-solving skills or reading skills that tap meaning comprehension. Historically, scores for these exams have been highly correlated, with reported correlations of .92 for composite scores, .89 for Mathematics, and .83 for ACT Reading with SAT Verbal (now labeled Critical Reading;

The study was approved by our institutional human subjects review board. All 92 participants completed the Number Matching, Sentence Verification, and Math Fluency tasks, in that order. (A matching task excluded from the present study was administered as the third of four tasks.) The entire testing session took approximately one hour. In addition, Math and Reading ACT and SAT scores were collected from 73 participants who consented for the University’s Office of Institutional Research to release these scores to the researchers.

We carried out separate analyses for our two primary numerical tasks. We used repeated measures ANOVAs to test for hypothesized main effects and interactions involving noun alignment in the Number Matching task (non-matching trials only) and implication of multiplication in the Sentence Verification task (responses to first prompt sentences only). For the Number Matching task, we first evaluated whether we replicated previously reported interference effects. We then fit linear mixed-effects models, using the marginal R^{2}_{GLMM} to evaluate changes in fixed effects and the conditional R^{2}_{GLMM} to evaluate changes in random effects. These measures are not necessarily comparable to the η^{2} values reported for the ANOVAs.
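For reference, these statistics are conventionally defined for linear mixed models (in the formulation of Nakagawa and Schielzeth; notation ours) as:

$$R^{2}_{\mathrm{GLMM(m)}} = \frac{\sigma^{2}_{f}}{\sigma^{2}_{f} + \sum_{l}\sigma^{2}_{l} + \sigma^{2}_{\varepsilon}}, \qquad R^{2}_{\mathrm{GLMM(c)}} = \frac{\sigma^{2}_{f} + \sum_{l}\sigma^{2}_{l}}{\sigma^{2}_{f} + \sum_{l}\sigma^{2}_{l} + \sigma^{2}_{\varepsilon}},$$

where $\sigma^{2}_{f}$ is the variance of the fixed-effect predictions, $\sigma^{2}_{l}$ is the variance of the $l$th random effect, and $\sigma^{2}_{\varepsilon}$ is the residual variance.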

We first examined the degree to which our results replicate those of

| Effect | df | df error | F | p | η^{2} |
|---|---|---|---|---|---|
| *Combined Misaligned Analysis* | | | | | |
| Digit Type | 1 | 91 | 24.83 | < .001 | .008 |
| Context for Sums | 1 | 91 | 2.25 | .138 | .001 |
| Digit Type × Context for Sums | 1 | 91 | 15.49 | < .001 | .005 |
| *Expanded Misaligned Analysis* | | | | | |
| Digit Type | 1 | 91 | 14.26 | < .001 | .004 |
| Context for Sums | 2 | 182 | 1.21 | > .250 | .001 |
| Digit Type × Context for Sums | 2 | 182 | 14.79 | < .001 | .010 |

Our replication attempt was successful. We found a Context × Digit Type interaction (η^{2} = .005) similar in strength to that found by Bassok and colleagues (2008; η^{2} = .008). Non-matching Aligned

Reaction times in the Number Matching task (non-matching trials only) with Context for sums separated (a) two ways into Aligned Concrete-Concrete (ACC) vs. Misaligned, and (b) with Misaligned further separated into Misaligned Concrete-Concrete (MCC) and Misaligned Concrete-Intangible (MCI). Error bars represent one standard error of the mean.

The findings were only partially similar when we separated the two types of misalignment: the Context × Digit Type interaction was stronger (η^{2} = .010) than in the collapsed analysis (η^{2} = .005). Moreover, the MCC trials displayed the classic LeFevre interference effect (contrast effect size = .336).

However, unlike the MCC condition, the MCI condition had a

We examined patterns of individual responses to rule out the potential influence of outliers on the observed facilitative effect of the MCI trials (under the Sum condition). Distributions and SDs were similar across conditions, and inspection of participant-level distributions of interference (Neutral – Sum) did not reveal outliers. Moreover, a binomial sign test revealed that a statistically significant number of participants (60 of 92) displayed LeFevre interference for ACC trials,
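The binomial sign test reported above can be reproduced with an exact calculation; the following is a stdlib-only sketch (our illustration; the authors' exact procedure may differ):

```python
from math import comb

def sign_test_two_sided(k, n):
    """Exact two-sided binomial sign test against chance (p = .5):
    doubles the one-tailed probability of an outcome at least as extreme as k."""
    tail = sum(comb(n, i) for i in range(max(k, n - k), n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# 60 of 92 participants showed LeFevre interference on ACC trials.
p_value = sign_test_two_sided(60, 92)
print(p_value < .05)  # True
```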

Linear mixed modeling can provide additional insight into individual differences and help bring more features of the design under statistical control. Including random intercepts for each participant significantly improved model fit, χ^{2}(1) = 1517, conditional R^{2}_{GLMM} = .360, as did including random intercepts for each item, χ^{2}(1) = 58.4, ΔR^{2}_{GLMM} = .021. As with the repeated measures ANOVA, there was a significant Context × Digit Type interaction (Kenward-Roger approximation), marginal ΔR^{2}_{GLMM} = .004. Model 2 controls for practice and/or fatigue effects by additionally including a fixed effect for the trial number. For each successive trial, participants performed about 0.0014 trials/s faster (Kenward-Roger approximation).
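In our notation (trial $i$, participant $j$, item $k$; a sketch of the stated random-effects structure, not the authors' exact specification), Model 1 with by-participant and by-item random intercepts can be written as:

$$\mathrm{speed}_{ijk} = \beta_{0} + \beta_{1}\,\mathrm{Sum}_{i} + \beta_{2}\,\mathrm{MCC}_{i} + \beta_{3}\,\mathrm{MCI}_{i} + \beta_{4}\,(\mathrm{Sum}\times\mathrm{MCC})_{i} + \beta_{5}\,(\mathrm{Sum}\times\mathrm{MCI})_{i} + u_{j} + w_{k} + \varepsilon_{ijk},$$

$$u_{j} \sim \mathcal{N}(0, \sigma^{2}_{u}), \qquad w_{k} \sim \mathcal{N}(0, \sigma^{2}_{w}), \qquad \varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^{2}_{\varepsilon}),$$

where Sum, MCC, and MCI are dummy codes for Digit Type and Context, matching the fixed-effect terms in the table below.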

| Effect | Model 1 β | SE | Model 2 β | SE | Model 3 β | SE | Model 4 β | SE |
|---|---|---|---|---|---|---|---|---|
| Trial number | | | 0.001*** | 0.0001 | 0.001*** | 0.0001 | 0.001*** | 0.0001 |
| Digit Type (Sum) | –0.065** | 0.026 | –0.064*** | 0.020 | –0.064*** | 0.020 | –0.064*** | 0.020 |
| Context (MCC) | 0.007 | 0.025 | –0.002 | 0.020 | –0.002 | 0.020 | –0.002 | 0.020 |
| Context (MCI) | –0.045* | 0.026 | –0.036* | 0.020 | –0.036* | 0.020 | –0.036* | 0.020 |
| ACT Reading^{a} | | | | | | | 0.016*** | 0.005 |
| Math Fluency rate | | | | | 0.004* | 0.002 | 0.003 | 0.002 |
| Digit Type (Sum) × Context (MCC) | 0.011 | 0.036 | 0.021 | 0.028 | 0.021 | 0.028 | 0.021 | 0.028 |
| Digit Type (Sum) × Context (MCI) | 0.092** | 0.036 | 0.068** | 0.028 | 0.068** | 0.028 | 0.068** | 0.028 |
| Constant | 1.300*** | 0.029 | 1.200*** | 0.029 | 1.000*** | 0.100 | 0.600*** | 0.160 |
| Akaike Inf. Crit. | 879 | | 643 | | 642 | | 634 | |
| Bayesian Inf. Crit. | 935 | | 706 | | 711 | | 710 | |
| Marginal R^{2}_{GLMM} | .006 | | .050 | | .067 | | .109 | |
| Conditional R^{2}_{GLMM} | .385 | | .419 | | .419 | | .417 | |

Note. Marginal R^{2}_{GLMM} estimates the variance accounted for by fixed effects, while conditional R^{2}_{GLMM} estimates the variance accounted for by both fixed and random effects.

^{a}Includes 6 participants with missing ACT Scores imputed from SAT scores.

Math fluency and ACT scores may also capture individual differences relevant to the Number Matching task. Math Fluency had a higher correlation with response speed (

Achievement measures were then entered into the model. ACT Math was not a significant predictor of speed, β = –0.003 trials/s, Kenward-Roger

In summary, we found striking evidence of interactions on the Number Matching task, including interference effects consistent with

In this task, cue numbers were presented within complete sentences that either implicated or did not implicate multiplication, and the prompt statements that followed contained either the product of the cue numbers or a neutral number. Two participants were excluded from these analyses for failing to respond correctly to any trials in one or more conditions.

We first carried out ANOVAs to test whether contextual alignment moderated evaluation of the veracity of cue sentences. This 2 (Prompt Type: Accept or Reject) × 2 (Context for Products: Implicative or Non-implicative) × 2 (Digit Type: Neutral or Product) repeated measures ANOVA focused on participants’ mean response speed on correct trials only. We found a strong Prompt Type × Context × Digit Type interaction, η^{2} = .053, and significant main effects and two-way interactions, excepting the Context × Digit Type interaction (

| Effect | F | p | η^{2} |
|---|---|---|---|
| *Full Analysis* | | | |
| Context | 42.06 | < .001 | .020 |
| Prompt Type | 25.03 | < .001 | .031 |
| Digit Type | 32.81 | < .001 | .017 |
| Context × Prompt Type | 7.70 | .007 | .004 |
| Context × Digit Type | 1.07 | > .250 | .001 |
| Prompt Type × Digit Type | 35.41 | < .001 | .018 |
| Context × Prompt × Digit Type | 97.59 | < .001 | .053 |
| *Implicative Trials Analysis* | | | |
| Prompt Type | 5.98 | .016 | .011 |
| Digit Type | 16.88 | < .001 | .021 |
| Prompt Type × Digit Type | 7.01 | .010 | .008 |
| *Non-Implicative Trials Analysis* | | | |
| Prompt Type | 41.13 | < .001 | .067 |
| Digit Type | 12.47 | .001 | .013 |
| Prompt Type × Digit Type | 183.34 | < .001 | .145 |

To further understand this three-way interaction, we evaluated Implicative and Non-implicative trials separately. For Implicative trials, the Prompt Type × Digit Type interaction was significant, η^{2} = .008. Unlike the interference effect seen in the Number Matching task, there was little evidence of interference with rejection of incorrect prompts for Implicative Product trials, Δ

Mean reaction times by condition in the Sentence Verification task separated by Prompt Type and into (a) Implicative and (b) Non-Implicative trials. Error bars represent one standard error of the mean.

The repeated measures ANOVA on trials with Non-Implicative contexts revealed clear evidence of a strong Prompt Type × Digit Type crossover interaction, η^{2} = .145 (

Since exactly one trial in each condition had the same unit paired with one of the cue numbers and also the target number, we re-ran the analysis with Unit as an additional factor. There was an effect of Unit, η^{2} = .037, but there was no Prompt Type × Digit Type × Unit interaction, η^{2} < .001. This indicates that the interaction is not simply due to the units associated with the digits. Post-hoc comparisons on

We tested for associations between achievement level and Sentence Verification task performance via linear mixed models (

| Effect | Model 1 β | SE | Model 2 β | SE | Model 3 β | SE | Model 4 β | SE |
|---|---|---|---|---|---|---|---|---|
| Digit Type | 0.029* | 0.017 | 0.029* | 0.017 | 0.029* | 0.017 | 0.050 | 0.076 |
| Prompt Type | –0.056*** | 0.019 | –0.055*** | 0.020 | –0.055*** | 0.020 | –0.056*** | 0.020 |
| Context | –0.088*** | 0.017 | –0.088*** | 0.017 | –0.011 | 0.042 | –0.012 | 0.042 |
| Digit Type × Prompt Type | 0.059** | 0.026 | 0.059** | 0.025 | 0.058** | 0.025 | 0.059** | 0.025 |
| Digit Type × Context | 0.130*** | 0.025 | 0.130*** | 0.024 | 0.120*** | 0.024 | 0.130*** | 0.024 |
| Prompt Type × Context | 0.110*** | 0.026 | 0.100*** | 0.025 | 0.100*** | 0.025 | 0.100*** | 0.025 |
| Digit Type × Prompt Type × Context | –0.300*** | 0.036 | –0.300*** | 0.036 | –0.300*** | 0.036 | –0.300*** | 0.036 |
| Fluency rate | | | | | 0.004*** | 0.001 | 0.004*** | 0.001 |
| Context × Fluency rate | | | | | –0.002** | 0.001 | –0.002** | 0.001 |
| ACT Math^{a} | | | | | | | –0.004 | 0.004 |
| ACT Reading^{a} | | | | | | | 0.013*** | 0.003 |
| Digit Type × ACT Math^{a} | | | | | | | 0.006** | 0.002 |
| Digit Type × ACT Reading^{a} | | | | | | | –0.006*** | 0.002 |
| Constant | 0.530*** | 0.019 | 0.530*** | 0.019 | 0.340*** | 0.066 | 0.120 | 0.110 |
| Akaike Inf. Crit. | –474 | | –487 | | –493 | | –506 | |
| Bayesian Inf. Crit. | –418 | | –403 | | –397 | | –387 | |
| Marginal R^{2}_{GLMM} | .059 | | .060 | | .081 | | .110 | |
| Conditional R^{2}_{GLMM} | .303 | | .327 | | .324 | | .328 | |

Note. Marginal R^{2}_{GLMM} estimates variance accounted for by fixed effects; conditional R^{2}_{GLMM} estimates variance accounted for by fixed and random effects.

^{a}Includes 6 participants with missing ACT Scores imputed from SAT scores.

*p < .05. **p < .01. ***p < .001.

Random slopes for all main effects were then added to the model, significantly improving the fit according to a likelihood ratio test, χ^{2}(9) = 30.54, ΔR^{2}_{GLMM} = .028. However, the random slope for Context was highly correlated with the random slope for Digit Type, and removing it did not significantly reduce fit, χ^{2}(4) = 7.36, ΔR^{2}_{GLMM} = .003. Model 2 therefore excluded this term (see

In contrast to the Number Matching task, associations between math scores and contextual sensitivity

The addition of ACT Math and Reading to the model provided further explanatory power. The final model for the Sentence Verification task, Model 4 (see the preceding table), improved fit, ΔR^{2}_{GLMM} = .028. Both ACT Reading and Math interacted with Digit Type, with effects that were roughly of equal magnitude but opposite direction. No higher-order three-way interactions of the achievement measures with the condition variables were found,

Modeled Interaction between Context for Products and (a) ACT Math and (b) ACT Reading on the Sentence Verification task.

After adding all significant condition and individual difference terms to the model, we examined whether there remained any evidence of unexplained individual variability in the interactions between variables. There was no evidence of participant-specific differences in the slopes of two-way interactions for Model 4, χ^{2}(22) = 20.68, ΔR^{2}_{GLMM} = .016, and only marginal evidence of potential individual differences when Model 4 was compared with a model that included random slopes for all possible condition interactions, χ^{2}(30) = 41.01, ΔR^{2}_{GLMM} = .029.

Cognitive science has demonstrated that automatized cognitive processes, including arithmetic, can be modulated by context (e.g.,

Our Number Matching task replicated the LeFevre interference effect and its moderation by semantic alignment; the strength of the Context × Digit Type interaction (η^{2} = .005) was similar to that in Bassok et al.’s Experiment 1 (η^{2} = .008).

We hypothesized that semantic misalignment lies on a continuum, with more misaligned noun pairs suppressing obligatory arithmetic to a greater degree than less misaligned nouns (

Priming effects for rejection trials on our revised Number Matching task and on acceptance and rejection trials on our Sentence Verification task. These effects contrast with our originally hypothesized results reported in

Our experiment provides evidence that a wholly different effect may occur in specific misaligned conditions, counter to the Bassok effect. In the Misaligned Concrete-Concrete condition,

Another possibility is that participants engage in a rapid, efficient, strategic rejection in the MCI condition. Results suggest that participants automatically added in all conditions, but perhaps obligatory arithmetic

Additional evidence for strategic use of semantic misalignment comes from recent ERP studies on a different type of sentence verification task (

Analogous facilitation effects may also explain our Sentence Verification Task findings. In this task, participants read a sentence that either did or did not implicate multiplication, and then judged if a prompt that followed the sentence was likely to be true or false. When multiplication was

Our findings additionally point to the importance of context beyond the semantic alignment of the nouns that accompany numbers. On the Sentence Verification trials where multiplication was not implicated, the same interaction was observed even on trials where the same unit appeared both in the cue and the target sentences. This suggests that participants react to the broader context of the cue sentence and not only to the semantic alignment of the nouns associated with the cue numbers.

Are these alignment effects subject to individual differences? We found subtle evidence in the Sentence Verification task only, and equally subtle associations with arithmetic fluency scores. Fluency is ostensibly a measure of speed when correctly answering problems in a highly implicative context (an explicit arithmetic task), so it is intriguing that participants with higher math fluency scores made especially efficient use of information in non-implicative contexts. This suggests that fluency may be partially a matter of choosing operations accurately. Products within prompts interacted with math and reading achievement in opposite directions: Participants with

The theoretical bases for individual differences on the Sentence Verification task vary from well-documented individual differences underlying numerosity judgments (e.g.,

How do these findings apply to everyday situations wherein numbers appear in diverse arithmetic and non-arithmetic contexts?

We do not focus on mechanistic explanations for the semantic alignment effects, and thus do not attempt to resolve this issue. Following from Bassok (e.g.,

This work was supported by a Grant-in-Aid of Research, Artistry and Scholarship from the University of Minnesota Office of the Vice President for Research to MM.

The authors have declared that no competing interests exist.

The authors would like to thank Chun Hei Li and Taylor Praus for dedicated assistance with data collection, scoring, and entry; Dr. Sashank Varma for early discussions concerning experimental designs; Dr. Panayiota Kendeou for input on developing our text stimuli; Drs. Sashank Varma and Keisha Varma for invaluable support by sharing their respective lab spaces for recruitment and testing; Ella Coben for assistance in developing the noun triplets; members of the Early Math and Numeracy Lab who provided feedback on the development and piloting of our stimuli and protocols; and the Center for Cognitive Science for providing resources necessary for recruitment and data collection. The authors also thank the editor and two reviewers for feedback on an earlier version of the manuscript. MM conceived the study. MM and EB designed the study and stimuli. EB programmed all data collection protocols and oversaw the study logistics and data collection and, with input from MM and LR, analyzed the data. EB wrote the paper, with MM and LR. NS contributed to data collection, scoring and entry, and preparation of the manuscript.

ΔR^{2}_{GLMM} refers to the change in R^{2}_{GLMM} when comparing random intercepts models to random slopes models.

R^{2}_{GLMM} refers to R^{2} from generalized linear mixed-effects models.