Success in mathematics requires fluency with numerical symbols and their relation to one another. Indeed, research shows a link between efficient comparison of Arabic numerals and mathematics achievement in children and adults (Castronovo & Göbel, 2012; De Smedt, Noël, Gilmore, & Ansari, 2013; Holloway & Ansari, 2009; Schneider et al., 2017). However, success in higher-level mathematics such as algebra requires fluency with other symbols that represent numerical magnitude, namely literal symbols (e.g., x) (Rosnick, 1980, 1982; Schoenfeld & Arcavi, 1999). The present study concerns processing literal symbols compared to other symbols that represent numerical magnitude.
Students experience substantial and persistent difficulty with literal symbols from first exposure through college algebra (Akgün & Özdemir, 2006; Booth, 1999; Christou & Vosniadou, 2005; MacGregor & Stacey, 1997; McNeil et al., 2010; Philipp, 1992; Rosnick, 1982; Trigueros & Ursini, 2003; Ursini & Trigueros, 2004). Student difficulty results from many factors, including mathematical syntax (Schoenfeld & Arcavi, 1999), novel algebraic notational conventions (Kieran, 2007), and insufficient explanations in mathematics curricula (Rosnick, 1982). Importantly, student difficulty also stems from literal symbol processing – connecting the literal symbol to its referent (MacGregor & Stacey, 1997; Philipp, 1992; Rosnick, 1982, 1999; Stacey & MacGregor, 1999). Even the basic act of connecting literal symbols to their referents requires more complex processing compared to Arabic numerals because of properties inherent to literal symbols and how they function in mathematics.
Arabic numerals and literal symbols co-occur in algebra problems (e.g., 3x = 12), but the two symbolic representations qualitatively differ in three main ways. First, unlike Arabic numerals, literal symbols do not have consistent numerical magnitudes across mathematical contexts (e.g., x can be 3, 4, or ½). Second, unlike Arabic numerals, literal symbols do not necessarily have a single magnitude (e.g., ‘3’ never stands for both 3 and 4 objects; x can stand for two numbers or all real numbers [Usiskin, 1999]). Third, unlike Arabic numerals, literal symbols are not specific to numeracy. Children initially develop fluency with literal symbols in the context of literacy. When children see literal symbols in a mathematics context, they bring strong prior associations for those symbols related to reading and writing. For instance, young students may confuse a literal symbol’s numerical magnitude with its numerical position in the alphabet, or confuse the linear ordering of numbers with the linear ordering of the alphabet (MacGregor & Stacey, 1997; Wagner, 1983). Students may also incorrectly associate literal symbols with objects (e.g., a = apples) rather than a related numerical magnitude (e.g., a = the number of apples), or confuse the two associations (McNeil et al., 2010; Rosnick, 1982). Associating literal symbols with objects can interfere with the notion that literal symbols can have a numerical referent (Booth, 1999) and can lead to difficulties with word problems that persist into college (McNeil et al., 2010; Philipp, 1992; Rosnick, 1982). Nascent research into the cognitive mechanisms of literal symbol processing suggests that even for their most restricted use, when they represent one magnitude like Arabic numerals, literal symbols are processed differently (Pollack, Leon Guerrero, & Star, 2016).
In sum, literal symbols signify a substantial departure from the symbol-magnitude mapping that students learn with Arabic numerals or number words. Students’ familiarity with literal symbols in the context of literacy and their inherent properties in mathematics show that literal symbols require more complex cognitive processing compared to Arabic numerals. However, an understanding of this cognitive processing is lacking. A useful starting point, as undertaken in the present study, is to target the core, lower-level cognitive processes that support literal symbol use in numerical contexts. Such work can inform future research related to higher-level mathematical contexts that involve literal symbols.
Measuring Literal Symbol Processing
One way to investigate literal symbol processing is to adapt tasks, such as number comparison, which are commonly used to measure numerical magnitude processing (Dehaene, Dehaene-Lambertz, & Cohen, 1998; Schneider et al., 2017). In these tasks, participants may view two arrays of dots or two Arabic numerals and judge which one is larger in number. Error rates and response times decrease as the distance between the two numbers increases, a behavioral signature known as the Comparison Distance Effect (CDE) (Moyer & Landauer, 1967). The CDE may arise from the engagement of a mental number line, which indicates numerical magnitude processing (Restle, 1970). Numbers that are closer together are harder to compare since their representations overlap more on the mental number line than numbers that are farther apart. However, many argue that the CDE may result from alternative cognitive processes, such as decision-making associated with comparison itself (Cohen Kadosh, Brodsky, Levin, & Henik, 2008; Krajcsi & Kojouharova, 2017; Smets, Gebuis, & Reynvoet, 2013; Van Opstal, Gevers, De Moor, & Verguts, 2008; Verguts, Fias, & Stevens, 2005) or the strength of associations between discrete quantities in a semantic network (Krajcsi, Lengyel, & Kojouharova, 2016). Alternative numerical judgment tasks can elicit distance effects that result from processing numerical magnitude rather than decision-making processes, including a priming task (Reynvoet, De Smedt, & Van den Bussche, 2009) or a same-different task (e.g., Van Opstal & Verguts, 2011).
In a priming task, participants view two sequentially presented numbers (i.e., prime-target pairs) and compare the target to a fixed standard (i.e., 5). Response times and error rates are smaller when the distance between the prime and target is smaller. This is known as the priming distance effect (PDE), which occurs because the first number primes the comparison (e.g., Defever, Sasanguie, Gebuis, & Reynvoet, 2011; Van Opstal et al., 2008). For example, given the prime-target pairs 3 – 4 and 1 – 4, comparing ‘4’ to ‘5’ will be faster in the first pair because ‘3’ primes ‘4’ more than ‘1’ primes ‘4’.
Pollack et al. (2016) used a priming paradigm to investigate the differences between literal symbol and Arabic numeral processing in adults. The authors tested whether literal symbols produced a PDE when they were assigned a particular numerical magnitude (e.g., y = 9) to use during comparison. Literal symbols could be the prime (i.e., first number in the pair) or the target (i.e., second number in the pair). Consistent with prior literature (Defever et al., 2011), Pollack et al. (2016) found a PDE with Arabic number pairs. However, there was no PDE with literal symbols, even though participants learned the numerical magnitudes with a high level of accuracy. Because participants compared both the prime and target to ‘5,’ the authors used the prime comparisons to look for a CDE and found similar results: a CDE for Arabic numerals, but not literal symbols. These findings suggest that literal symbol processing may fundamentally differ from Arabic numeral processing, even when literal symbols represent one numerical magnitude. Alternatively, the task may not have required participants to process numerical magnitude. Because participants pressed left and right buttons for ‘less than 5’ and ‘more than 5,’ participants may have merely associated literal symbols with ‘left’ or ‘right.’ Further, this study did not include numerical judgments with a non-alphabetic symbol set. Therefore, differences between literal symbol and Arabic numeral processing, and whether pre-existing associations in literacy affect literal symbol processing, remain open questions.
The Present Study
The present study addresses both limitations discussed above. First, it examines differences between processing literal symbols and Arabic numerals using a same-different task, in which participants decide whether two simultaneously presented symbols represent the same or different numerical magnitudes. This task produces a Same-Difference Distance Effect (SDDE) that results from numerical magnitude processing, in which response times and error rates decrease as the numerical distance between different pairs increases. A same-different task is ideal to probe whether literal symbols can also produce distance effects, as with other number symbols. Prior studies have consistently elicited SDDEs with dot arrays, number words, Arabic numerals, and their cross-format combinations, such as ‘TWO’ and ‘7’ (Defever, Sasanguie, Vandewaetere, & Reynvoet, 2012; Dehaene & Akhavein, 1995; Duncan & McFarland, 1980; Van Opstal & Verguts, 2011). Importantly, cross-format combinations produce SDDEs from access to numerical magnitude representations, not decision-related processes (Van Opstal & Verguts, 2011) or perceptual effects (Cohen, 2009). Further, while letters (without numerical magnitude) induce comparison distance effects (Lovelace & Snodgrass, 1971; Parkman, 1971), they do not produce SDDEs (Van Opstal & Verguts, 2011), which further supports using a same-different task to investigate numerical magnitude processing with literal symbols. Second, the present study incorporates same-different judgments with artificial symbols to test whether pre-existing associations with literacy affect literal symbol processing.
Participants completed three same-different judgments, with Arabic numerals and number words (i.e., numbers only condition), Arabic numerals and literal symbols (i.e., literal symbols condition), and Arabic numerals and artificial symbols (i.e., artificial symbols condition). For the latter two, participants learned symbol-magnitude associations for the literal or artificial symbols. An SDDE was expected with Arabic numerals and number words. If pre-existing linguistic associations hamper literal symbol processing, there would be an SDDE for artificial symbols but not literal symbols. Alternatively, both could produce SDDEs, but literal symbol processing may be more difficult, resulting in increased error rates and/or response times.
Because the literal and artificial symbol sets require learned associations, participants with higher working memory capacity may respond more accurately and quickly than participants with low working memory capacity. Indeed, taxing working memory can affect performance on some number comparison tasks (van Dijck & Fias, 2011; van Dijck, Gevers, & Fias, 2009). A working memory task was included to account for this possibility.
Method
Participants
Twenty-four typically developing adolescents aged 14–18 (M = 16.63, SD = 1.28, 67% female, 87% right-handed) from the Boston area participated. A sample of adolescents of high school age provides an opportunity to examine literal symbol processing in the earlier stages of developing proficiency with literal symbols. Students in this age range have been introduced to literal symbols in a mathematics context and thus have learned (even if implicitly) that literal symbols can represent numerical magnitudes. Based on prior studies, a sample of 24 is adequate to elicit group-level symbolic SDDEs (e.g., Dehaene & Akhavein, 1995; Van Opstal & Verguts, 2011).
Participants were recruited via flyers, online postings, and through schools. All participants were native English speakers, since the language in which numbers are learned contributes to differences in mathematics performance (Dehaene, 1997). To ensure adequate introductory exposure to literal symbols, participants had previously passed Algebra I and either had taken, or were concurrently enrolled in, a subsequent mathematics class (e.g., Geometry). However, participants’ highest level of mathematics was not recorded. Participants who were 18 years old provided consent; those under 18 years old provided assent along with parental permission. The study was approved by the Harvard Committee on the Use of Human Subjects and participants received small monetary compensation for participating.
Stimuli and Procedures
Participation involved a one-hour testing session with three same-different tasks, two training tasks, and a working memory task. Training and same-different judgments were designed using OpenSesame 2.8.3 (Mathôt, Schreij, & Theeuwes, 2012) and presented on a Google Nexus 7 tablet using the OpenSesame Experiment Runtime Application. To assess working memory, participants completed a backward digit span task (Mueller, 2011), which was administered using The Psychology Experiment Building Language (Mueller & Piper, 2014) on a MacBook Pro running OS X.
Participants completed the tasks sitting in a quiet space at their school, in a university research lab, or at a public library. Participants first completed same-different judgments with Arabic numerals and number words, and ended with the backward digit span task. In between, participants completed two sets of tasks that involved associating literal or artificial symbols with unique numerical magnitudes and subsequently performing same-different judgments with those symbols. The symbol set (i.e., literal symbols, artificial symbols) that participants worked with first was counterbalanced.
Same-Different Judgment With Numbers Only
Participants viewed cross-notation pairs of Arabic numerals and number words using 1, 2, 7, and 8; and “ONE,” “TWO,” “SEVEN,” and “EIGHT,” in Arial font size 72. Cross-notation pairs eliminate a visual matching strategy that interferes with semantic processing (Defever et al., 2012; Van Opstal & Verguts, 2011). Participants judged whether the pairs represented same or different magnitudes. Same pairs (e.g., ONE-1) had a distance of zero and different pairs were either Near, with a distance of one (e.g., TWO-1), or Far, with a distance of five, six, or seven (e.g., 8-ONE). There were 8 trials of Same pairs, each shown four times, 8 trials of Near pairs, each shown twice, and 16 trials of Far pairs, each shown twice, for 80 trials per block. There were three blocks, resulting in 240 total trials, shown in a pseudorandom order. As in Van Opstal and Verguts (2011, see Experiment 2), participants saw Near and Far pairs equally often. Same pairs were shown twice as often to provide approximately the same number of left and right responses. Trials began with a fixation dot displayed for 500 ms, followed by the number pair, which displayed until response. Participants touched the left side of the screen if the number pair represented the same magnitude and touched the right side of the screen if the number pair represented different magnitudes. The trial ended with a 500 ms intertrial interval.
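The trial counts reported above can be checked by enumerating the stimulus pairs. The following is a minimal sketch, assuming that each pair places one Arabic numeral and one number word in either left-right order; the function and variable names are illustrative and are not taken from the study materials.

```python
from itertools import product

VALUES = [1, 2, 7, 8]
REPEATS = {"Same": 4, "Near": 2, "Far": 2}  # presentations of each pair per block


def classify(a, b):
    """Label a pair by the numerical distance between its two magnitudes."""
    distance = abs(a - b)
    if distance == 0:
        return "Same"
    return "Near" if distance == 1 else "Far"


def count_pairs():
    """Count unique cross-notation pairs of each type.

    Assumption: a pair is defined by its left value, its right value,
    and which side shows the Arabic numeral (the other side shows a word).
    """
    counts = {"Same": 0, "Near": 0, "Far": 0}
    for left, right, _numeral_side in product(VALUES, VALUES, ("left", "right")):
        counts[classify(left, right)] += 1
    return counts


pairs = count_pairs()
trials_per_block = sum(pairs[kind] * REPEATS[kind] for kind in pairs)
total_trials = 3 * trials_per_block
print(pairs, trials_per_block, total_trials)
```

Under these assumptions the enumeration yields 8 Same, 8 Near, and 16 Far pairs, which matches the reported 80 trials per block and 240 trials overall.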
Literal Symbols
Participants first completed a paired-associate learning task with four symbol-magnitude associations, similar to a procedure used in Pollack et al. (2016). Table 1 provides the set of literal symbols (adopted from Van Opstal & Verguts, 2011) and their numerical equivalents. The goal was for participants to equate the literal symbols with numerical magnitudes for use in a subsequent same-different task. Initially, participants viewed the four associations (e.g., Q = 1) for as long as needed, but for a minimum of 20 seconds. Then participants completed a test of the associations. Participants saw a symbol for 750 ms and were asked to vocalize the associated numerical magnitude and select it from a choice screen containing 1, 2, 7, and 8, each in one quadrant of the screen, displayed until response. The number positions changed for each trial. Feedback of “Good job!” or “Oops!” displayed for 750 ms and the correct symbol-magnitude association was displayed for 750 ms. The trial ended with a 500 ms intertrial interval. The task began with eight practice trials, two of each association. To test all associations approximately equally, trials were presented in the same pseudorandom order for all participants. Trials continued until the participant reached an accuracy of at least 93% with at least 21 additional trials.^{i} On average, participants completed 29 trials (SD = 23, range: 21–109). This task took approximately five minutes.
Table 1
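Arabic numeral  Literal symbol  Artificial symbol
1               Q               [Gibson figure; image not reproduced]
2               G               [Gibson figure; image not reproduced]
7               R               [Gibson figure; image not reproduced]
8               H               [Gibson figure; image not reproduced]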
After paired-associate learning, participants completed a same-different task. Pairs were from the numerical same-different task, but in the different number pairs, the number words “ONE,” “TWO,” “SEVEN,” and “EIGHT” were replaced with Q, G, R, and H. Half of the Same pairs had the literal symbol on the left (e.g., Q-1) and the remaining half had the literal symbol on the right (e.g., 1-Q). All procedures were the same as the numerical same-different task.
Artificial Symbols
As with literal symbols, participants completed a paired-associate learning task and then completed a same-different task. All participants successfully completed the learning task with 8 practice and 21 additional trials. In the same-different task, different number pairs consisted of the numbers 1, 2, 7, and 8, and four artificial symbols, which were corresponding Gibson figures used in prior research on artificial symbol training (Tzelgov, Yehene, Kotler, & Alon, 2000). Gibson, Gibson, Pick, and Osser (1962) created these figures in accordance with characteristics of letters (e.g., number of curves, lines, and angles; open and closed forms; symmetry).
Table 1 displays the four artificial symbols, and their literal symbol and numerical equivalents. As with literal symbols, half of the Same pairs had the artificial digit on the left and the other half had the artificial digit on the right. All other procedures were the same as in the literal symbols condition.
Backward Digit Span
Finally, each participant completed a backward digit span task (Mueller & Piper, 2014) to measure working memory capacity. Each participant saw a sequence of numbers 3 to 10 digits in length. After the sequence ended, the participant typed the sequence in reverse order. If the response was correct, the next sequence was one digit longer. If incorrect, the next sequence was the same length. Length and number of correct items were recorded; participants’ backward digit span was measured as the previous string length after two unsuccessful recalls at the same string length. Average backward digit span was 6.71 (SD = 1.55, range: 4–10).
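The stopping rule described above can be sketched as a short adaptive loop. This is an illustration only, not the PEBL implementation: `recall_ok` is a hypothetical stand-in for a single trial, and returning the maximum length when a participant never fails twice is an assumption, since the text does not state how the ceiling was handled.

```python
def backward_digit_span(recall_ok, start=3, max_len=10):
    """Return the span produced by the adaptive procedure.

    recall_ok(length) -> bool is a hypothetical callback: True when the
    participant correctly reverses a sequence of the given length.
    """
    length = start
    failures = 0  # unsuccessful recalls at the current length
    while length <= max_len:
        if recall_ok(length):
            length += 1   # correct: next sequence is one digit longer
            failures = 0
        else:
            failures += 1  # incorrect: next sequence keeps the same length
            if failures == 2:
                return length - 1  # span is the previous string length
    return max_len  # assumption: ceiling reached without two failures


# A simulated participant who can reverse at most six digits.
print(backward_digit_span(lambda n: n <= 6))  # prints 6
```

Because a correct response always moves to a longer sequence, two failures at one length are necessarily consecutive, so a single counter is enough.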
Data Analysis
Following Sasanguie, Defever, Van den Bussche, and Reynvoet (2011) and Van Opstal and Verguts (2011), same pairs were excluded from analysis since the SDDE only manifests between Near and Far pairs, even for cross-notation pairs. Mean error rate was calculated for each participant, separately for Near and Far pairs for each symbol set (i.e., numbers only, literal symbols, artificial symbols). Median response time was calculated for each participant for correct responses, separately for Near and Far pairs for each symbol set.
Error rate and response time were modeled as separate outcomes. The relationship between distance and each outcome was estimated using random-effects multilevel models. This approach is preferred for data analyses because it accounts for repeated measures of distance (i.e., near, far) for each participant, simultaneously provides estimates of differences between all three symbol sets, allows for the inclusion of covariates, and facilitates model comparison. Equation (1) describes the two-level multilevel model:
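The aggregation step can be illustrated as follows. This is a minimal sketch under stated assumptions: the trial records and their keys (`participant`, `symbol_set`, `distance`, `correct`, `rt`) are hypothetical and do not reflect the format of the study's data files.

```python
from collections import defaultdict
from statistics import mean, median


def summarize(trials):
    """Aggregate trials into one summary per participant, symbol set, and distance.

    Same pairs are dropped; error rate uses all remaining trials in a cell,
    and the median response time uses correct trials only.
    """
    cells = defaultdict(list)
    for t in trials:
        if t["distance"] == "Same":
            continue
        cells[(t["participant"], t["symbol_set"], t["distance"])].append(t)

    summary = {}
    for key, cell in cells.items():
        correct_rts = [t["rt"] for t in cell if t["correct"]]
        summary[key] = {
            "error_rate": 100 * mean(0 if t["correct"] else 1 for t in cell),
            "median_rt": median(correct_rts) if correct_rts else None,
        }
    return summary


# Hypothetical trials for one participant in one condition.
demo = [
    {"participant": 1, "symbol_set": "literal", "distance": "Near", "correct": True, "rt": 900},
    {"participant": 1, "symbol_set": "literal", "distance": "Near", "correct": True, "rt": 1000},
    {"participant": 1, "symbol_set": "literal", "distance": "Near", "correct": False, "rt": 1500},
    {"participant": 1, "symbol_set": "literal", "distance": "Near", "correct": True, "rt": 950},
    {"participant": 1, "symbol_set": "literal", "distance": "Same", "correct": True, "rt": 700},
]
result = summarize(demo)
print(result[(1, "literal", "Near")])
```

In this toy example the Near cell has a 25% error rate and a median correct response time of 950 ms; the Same trial and the incorrect 1500 ms trial do not enter the median.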
${Y}_{ij}={\beta}_{0}+{\beta}_{1}{\mathit{NEAR}}_{ij}+{\beta}_{2}{\mathit{NUM}}_{ij}+{\beta}_{3}{\mathit{AS}}_{ij}+{\beta}_{4}{X}_{j}+\left({e}_{ij}+{u}_{j}\right)$  (1)
In Equation 1, ${Y}_{ij}$ represents each outcome (i.e., error rate, response time) for each distance i and each participant j. ${\mathit{NEAR}}_{ij}$ is a dichotomous predictor in which Near trials are coded as ‘1’ and Far trials are the reference category. A set of dummy predictors, ${\mathit{NUM}}_{ij}$ and ${\mathit{AS}}_{ij}$, represent the Arabic numeral (${\mathit{NUM}}_{ij}$ = 1) and artificial symbol (${\mathit{AS}}_{ij}$ = 1) sets for each distance i and participant j; literal symbols serve as the reference category to facilitate comparison between participants’ performance with literal symbols and the other symbol sets. A vector of participant-level covariates, ${X}_{j}$, includes backward digit span, age, gender, and order of administration (for potential order effects beyond counterbalancing). Finally, the model includes random effects for the repeated-measures (${e}_{ij}$) and participant levels (${u}_{j}$) to account for the multilevel nature of the data. There are three parameters of interest: ${\beta}_{1}$ captures the effect of distance on the outcome, ${\beta}_{2}$ represents the average difference in the outcome between same-different judgments with number symbols and literal symbols, and ${\beta}_{3}$ represents the average difference in the outcome between same-different judgments with artificial and literal symbols. Together, ${\beta}_{2}$ and ${\beta}_{3}$ represent the effect of symbol set.
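The coding of the predictors in Equation 1 can be made concrete with a small sketch. The function names and the coefficient values below are purely illustrative assumptions, chosen to show how the reference categories work; they are not estimates from the study, and the covariates and random effects are left out.

```python
def design_row(distance, symbol_set):
    """Dummy-code one observation using the reference categories of Equation 1."""
    return {
        "NEAR": 1 if distance == "Near" else 0,       # Far is the reference
        "NUM": 1 if symbol_set == "numbers" else 0,   # literal symbols are the reference
        "AS": 1 if symbol_set == "artificial" else 0,
    }


def fixed_part(betas, row):
    """Fixed portion of Equation 1, omitting covariates and random effects."""
    return (
        betas["b0"]
        + betas["b1"] * row["NEAR"]
        + betas["b2"] * row["NUM"]
        + betas["b3"] * row["AS"]
    )


# Illustrative coefficients only (not the fitted values).
betas = {"b0": 100.0, "b1": 10.0, "b2": -20.0, "b3": -5.0}

far_literal = fixed_part(betas, design_row("Far", "literal"))
near_numbers = fixed_part(betas, design_row("Near", "numbers"))
print(far_literal, near_numbers)  # prints 100.0 90.0
```

With this coding the intercept is the expected outcome for Far trials with literal symbols, and each remaining coefficient is a shift away from that cell.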
Models were fit using maximum likelihood estimation and bootstrapped standard errors (200 replications) to address violations of Level 1 and Level 2 residual normality. For error rate, Shapiro-Wilk tests showed violations of Level 1 (W = 0.955, p < .001) and Level 2 (W = 0.905, p = .03) residual normality. Similarly, for response time, Shapiro-Wilk tests showed violations of Level 1 (W = 0.876, p < .0001) and Level 2 (W = 0.891, p = .01) residual normality. Lastly, due to a software malfunction, one participant’s data for the literal symbols same-different condition were not recorded.
Results
Descriptive Statistics
Table 2 displays the descriptive statistics for error rate and response time for each of the three symbol sets, both overall and by Near and Far distances. As Table 2 shows, error rates were larger for Near trials than for Far trials, in the sample. Sample response times were longer for Near trials compared to Far trials. Further, sample error rates were higher on average for literal and artificial symbols than for the numbers only condition. Average sample median response times were longer for literal and artificial symbols compared to the numbers only condition.
Table 2
Symbol set          Error rate (%)             Response time (ms)
                    Overall  Near     Far      Overall  Near     Far
Numbers only
  M                 3.56     2.95     1.17     772      809      770
  SD                3.29     3.63     1.61     130      148      145
Literal symbols
  M                 4.91     4.26     1.59     921      953      931
  SD                5.44     4.05     3.09     220      189      257
Artificial symbols
  M                 4.58     4.60     1.91     843      872      846
  SD                4.05     4.56     2.31     140      185      147
No Differences Across Symbolic Formats for Error Rate
Table 3 displays a taxonomy of fitted models estimating the effects of distance and symbol set on each outcome, and includes parameter estimates, standard errors, random effects, and goodness-of-fit statistics. The left portion of Table 3 shows three fitted models estimating the effects of distance and symbol set on error rate.
Table 3
Parameter                   Error rate (%)                         Response time (ms)
                            Model 1      Model 2      Model 3      Model A      Model B      Model C
Intercept
  $\widehat{\beta}$         2.718***     1.528***     1.650**      860.870***   846.683***   924.343***
  SE                        0.470        0.397        0.582        29.875       30.392       42.817
Near
  $\widehat{\beta}$                      2.377***     2.377***                  28.366***    28.366***
  SE                                     0.346        0.346                     6.772        6.772
Numbers only
  $\widehat{\beta}$                                   0.777                                  -149.068***
  SE                                                  0.545                                  37.193
Artificial symbols
  $\widehat{\beta}$                                   0.417                                  -79.516**
  SE                                                  0.582                                  28.951
Random effects
  ${\widehat{\sigma}}_{u}$  2.239**      2.302**      2.311**      146.366***   146.505***   147.899***
  ${\widehat{\sigma}}_{e}$  2.737***     2.406***     2.344***     119.368***   118.350***   98.240***
  $\widehat{\rho}$          0.401*       0.478**      0.493**      0.601***     0.605***     0.694***
Goodness-of-fit
  Log-likelihood            363.647      348.463      345.344      908.041      907.030      884.888
*p < .05. **p < .01. ***p < .001.
Model 1 in Table 3 displays the fitted error rate across all distances and symbol sets. The intraclass correlation (i.e., $\widehat{\rho}$) for Model 1 shows that 40% of variance in error rate was attributable to differences across participants. As Model 2 shows, there was a distance effect for error rate. The error rate on Near trials was 2.377 percentage points higher than the error rate on Far trials, on average (z = 6.88, p < .0001). The third model estimates the effects of distance and symbol set on error rate. As Model 3 shows, there was not a statistically significant relationship between symbol set and error rate (ps > .15). Further, the relationship between distance and error rate was unchanged, showing the distance effect on error rate remained when controlling for symbol set. Taken together, these models show an SDDE for error rate across all symbol sets, but no effect of symbol set.
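The intraclass correlation and the test statistic quoted above can be recomputed from the rounded values in Table 3. The sketch below assumes the usual definitions, $\widehat{\rho}={\widehat{\sigma}}_{u}^{2}/\left({\widehat{\sigma}}_{u}^{2}+{\widehat{\sigma}}_{e}^{2}\right)$ and z as the estimate divided by its standard error with a two-sided normal p-value; the function names are illustrative.

```python
from math import erfc, sqrt


def icc(sigma_u, sigma_e):
    """Share of total variance that lies between participants."""
    return sigma_u ** 2 / (sigma_u ** 2 + sigma_e ** 2)


def wald_z(estimate, se):
    """Return the z statistic and its two-sided normal p-value."""
    z = estimate / se
    p = erfc(abs(z) / sqrt(2))
    return z, p


rho_model_1 = icc(2.239, 2.737)       # error rate, Model 1
rho_model_a = icc(146.366, 119.368)   # response time, Model A
z_near, p_near = wald_z(2.377, 0.346)  # distance effect on error rate, Model 2
print(round(rho_model_1, 3), round(rho_model_a, 3), round(z_near, 2))
```

The recomputed intraclass correlations agree with the tabled values of 0.401 and 0.601. The z statistic comes out at about 6.87 from the rounded table entries, close to the reported 6.88; the small gap is consistent with rounding of the standard error.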
Additional models were fit to test for interactions between symbol set and distance and to test the effects of the backward digit span, gender, age, and order covariates. None produced statistically significant relationships with error rate (ps > .34 and ps > .13, respectively). Figure 1 illustrates the fitted relationship between error rate and distance by symbol set (i.e., Model 3).
Figure 1
Differences Across Symbolic Formats for Response Time
The right portion of Table 3 shows three models estimating the effects of distance and symbol set on response time. Model A shows the fitted average response time across all distances and symbol sets. The intraclass correlation for Model A shows that approximately 60% of the variance in response time was attributable to differences between participants. Model B shows a statistically significant distance effect for response times. On average, participants took about 28 ms longer to respond to Near trials than to Far trials (z = 4.19, p < .0001). Model C shows the distance effects remained when controlling for symbol set.
Model C also shows there was a statistically significant difference in response time between literal symbols and the numbers only condition; it took about 149 ms longer, on average, to make same-different judgments with literal symbols, controlling for distance (z = 4.01, p < .0001). Crucially, there was a statistically significant difference in response time between literal symbols and artificial symbols. Average response times were about 80 ms longer when making same-different judgments with literal symbols, controlling for distance (z = 2.75, p = .006). Further, a General Linear Hypothesis test showed that, on average, participants took 69.55 ms longer to make same-different judgments with artificial symbols compared to Arabic numerals (SE = 19.63, z = 3.54, p < .001). This shows that same-different judgments were fastest in the numbers only condition, which is expected because of participants’ fluency with Arabic numerals and number words, and a lack of automatic associations between numerical magnitudes and the other two symbol sets.
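The 69.55 ms contrast between artificial symbols and the numbers only condition follows directly from the two Model C differences, because the literal-symbol baseline cancels. The sketch below reproduces that arithmetic; the standard error is copied from the text, since it depends on the covariance between the two estimates and cannot be derived from Table 3 alone. Variable names are illustrative.

```python
from math import erfc, sqrt

# Response-time differences relative to literal symbols (Model C), in ms.
literal_minus_numbers = 149.068
literal_minus_artificial = 79.516

# Artificial symbols vs. numbers only: the literal-symbol baseline cancels out.
artificial_minus_numbers = literal_minus_numbers - literal_minus_artificial

reported_se = 19.63  # as reported in the text, not recomputed here
z = artificial_minus_numbers / reported_se
p = erfc(abs(z) / sqrt(2))  # two-sided normal p-value

print(round(artificial_minus_numbers, 2), round(z, 2))  # prints 69.55 3.54
```

The resulting p-value is below .001, in line with the reported test.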
Subsequent models were also fit to test for interactions between distance and symbol set, and to test for effects of the backward digit span, gender, age, and order covariates. None showed a statistically significant relationship with response time (ps > .40 and ps > .18, respectively). Figure 2 illustrates Model C, the relationship between response time and distance for each symbol set.
Figure 2
To further investigate the response time differences, an additional multilevel model was fit using trial-level data, adjusting the standard errors to account for the clustering of trials within participants. Unlike the more common participant-level analyses, which weight each participant’s aggregated response times equally, the trial-level analysis differentially weights participants according to the number of trials each participant contributes (i.e., how accurate the participant is). Table 4 displays three models (A1, B1, C1) that relate to the participant-level Models A–C (see Table 3). As Table 4 shows, the results were consistent across both analyses. For distance, comparison of Near trials took longer on average than Far trials. For symbol set, comparison with literal symbols took longer than comparison with numbers only and with artificial symbols, and the difference was greater in magnitude for the former than the latter. The trial-level and participant-level estimates differ slightly, which is expected based on the differential weighting of the data (since only accurately answered trials were included in the response time analysis). All subsequent models testing covariates and interactions were consistent with the participant-level analysis.
Table 4
Parameter                   Response time (ms)
                            Model A1      Model B1      Model C1
Intercept
  $\widehat{\beta}$         946.508***    937.758***    1045.849***
  SE                        39.080        39.777        62.150
Near
  $\widehat{\beta}$                       26.699***     26.917***
  SE                                      6.923         6.877
Numbers only
  $\widehat{\beta}$                                     -207.910***
  SE                                                    54.421
Artificial symbols
  $\widehat{\beta}$                                     -108.312*
  SE                                                    43.908
Random effects
  ${\widehat{\sigma}}_{u}$  199.857***    199.873***    198.492***
  ${\widehat{\sigma}}_{e}$  406.508***    406.314***    397.485***
  $\widehat{\rho}$          0.195         0.195         0.200
Goodness-of-fit
  Log-likelihood            73600.353     73595.646     73378.444
*p < .05. **p < .01. ***p < .001.
Taken together, these results show that in addition to digits and number words, literal and artificial symbols produced an SDDE. As one would expect, it took longer, on average, to make same-different numerical magnitude judgments with symbols that are not Arabic numerals or number words. Importantly, these results suggest that some symbol sets may be more difficult to work with than others. Specifically, while there was no difference in accuracy when working with different symbol sets, same-different judgments with literal symbols elicited a cognitive processing cost compared to artificial symbols, which manifested as a longer response time, on average, regardless of distance.
Discussion
The present study used a series of same-different judgments to examine the nature of literal symbol processing related to numerical magnitude. It replicated an SDDE with Arabic numeral-number word pairs and examined whether pre-existing associations for literal symbols hinder literal symbol processing. Each aim is discussed in turn.
SDDEs Across Symbolic Formats
Participants compared cross-notation pairs of number words and Arabic numerals that either represented the same magnitude or differed in magnitude by a small or large amount. Participants showed an SDDE; error rate was lower and response times were shorter on average for number pairs with greater distance. These results support the presence of a symbolic SDDE in adolescents, which aligns with prior research on the symbolic SDDE in children (e.g., Defever et al., 2012) and adults (e.g., Dehaene & Akhavein, 1995; Van Opstal & Verguts, 2011).
After paired-associate learning, participants completed same-different tasks with literal and artificial symbols. Both symbol sets produced SDDEs for error rate and response time. Building on prior research on the SDDE (e.g., Dehaene & Akhavein, 1995; Van Opstal & Verguts, 2011), these findings imply that same-different judgments with newly learned symbol formats involve processing numerical magnitude. These SDDEs also align with related studies that show congruity effects for numerical Stroop tasks with artificial symbols (Cohen Kadosh, Soskic, Iuculano, Kanai, & Walsh, 2010; Tzelgov et al., 2000) and comparison distance effects with artificial symbols associated with non-symbolic numerical magnitudes (Lyons & Ansari, 2009). The present results extend this work, suggesting that non-numeric symbols associated with symbolic, cardinal representations of number show behavioral signatures of numerical processing.
Further, the magnitude of the SDDEs did not differ across symbolic formats. This may seem surprising, since the literal and artificial symbol trainings may be akin to the development of symbol-quantity associations. Research on changes in the CDE over developmental time suggests that distance effects may be smaller (Lyons, Nuerk, & Ansari, 2015) or larger (Halberda & Feigenson, 2008; Holloway & Ansari, 2009; Sekuler & Mierkiewicz, 1977) for symbols learned more recently. However, literature examining SDDEs over developmental time suggests a stable magnitude across people of different ages (Defever et al., 2012). Priming paradigms also appear to produce stable distance effect magnitudes across developmental time (Defever et al., 2011; Reynvoet et al., 2009). These differences, between the stability of the SDDE and the PDE on one hand, and the variability of the CDE on the other, may stem from the latter’s lack of dependence on access to numerical magnitude representations.
Literal Symbol Processing
The present study lends additional insight into literal symbol processing of numerical magnitude. An SDDE when making judgments with literal symbols stands in contrast to prior related work in which literal symbols did not produce a PDE, while Arabic numerals did (Pollack et al., 2016). Differences in findings across the two studies may result from methodological differences. It may be that the same-different task required numerical magnitude processing while the priming task in Pollack et al. (2016) did not, since participants could use a ‘left’-‘right’ strategy rather than processing numerical magnitudes. Alternatively, same-different and priming numerical judgments may require a different level or strength of association between symbols and their numerical referents. Arabic numerals have strong, permanent associations with numerical magnitudes that are overlearned through formal schooling, whereas associations between literal symbols and numerical magnitudes in authentic mathematical contexts are temporary. Temporary associations may not be encoded strongly enough to elicit a PDE as with Arabic numerals, but may be strong enough to elicit an SDDE. Finally, the SDDE and PDE may arise from different mechanisms, such as a differential reliance on automatic versus intentional processing (Sasanguie et al., 2011), which could explain the discrepant findings across the two studies.
Even though an SDDE was present for literal symbols for both error rate and response time, the present study suggests that literal symbol processing involving numerical magnitude may differ from processing with other symbolic formats. It is expected that same-different judgments would be fastest in the numbers-only condition because of strong associations between numerical magnitudes, Arabic numerals, and number words. However, when comparing performance between the literal and artificial symbol conditions, participants took longer on average to make numerical judgments with literal symbols, controlling for distance. Put another way, even though one would expect a processing cost when recalling newer associations generally, that cost is greater in magnitude for literal symbols compared to artificial ones. This suggests that the cognitive processing cost that manifests between the literal and artificial symbols conditions results from literal symbol processing per se.
What might account for this difference? First, one might argue that the difference in average response time between literal and artificial symbols may depend on the specific symbols and associations. For example, all symbol-magnitude associations were randomly assigned, which resulted in a numerical magnitude ordering for literal symbols that does not preserve the linear ordering of the alphabet (i.e., Q, G, R, H), which could in turn produce longer response times. However, this seems unlikely here given that the linear ordering of the alphabet does not affect performance on the same-different task (Van Opstal & Verguts, 2011). If ordinal positions of letters in the alphabet were to affect numerical magnitude processing, this would suggest one mechanism by which literal symbols are more difficult to work with in numerical contexts due to their preexisting associations from literacy (i.e., ordinal representations). Such hypotheses can be tested explicitly in future studies, for example by contrasting performance utilizing symbol-quantity mappings that preserve and do not preserve the ordinality of the alphabet.
Second, the cognitive processing cost associated with literal symbol same-different judgments may reflect executive function ability. While in the present study working memory was not a statistically significant predictor of performance, inhibition may play a role. Participants may inhibit a literal symbol’s competing representation in the context of literacy when processing associated numerical magnitude information. Indeed, recent research has shown that inhibitory control plays an important role in performance on tasks that measure numerical magnitude processing (e.g., Gilmore et al., 2013). Future research can examine whether inhibitory control, measured in numerical and nonnumerical contexts, may mitigate the magnitude of the literal symbol processing cost.
Limitations
There are several limitations to the present study. First, participants likely had heterogeneous levels of knowledge and instruction related to literal symbols. Participants came from different schools and may have had different (or no) formal instruction related to the role of literal symbols in mathematics (see Nie, Cai, & Moyer, 2009). Second, the participant age range was relatively large, suggesting the sample spanned beginning high school to beginning college. Age was not a statistically significant predictor of either outcome, which aligns with prior research on the SDDE (Defever et al., 2012; Duncan & McFarland, 1980). However, there may be differences in literal symbol processing across age, grade, or level or type of exposure to literal symbols. Future studies can address this limitation with a cross-sectional study that tests groups of participants from different grades in the same school, for example, and by confirming with a larger sample size that the covariates tested in the present study are unrelated to response time. A third limitation pertains to the Arabic numeral and number word stimuli. As noted in prior studies on symbolic number processing (Peters, De Smedt, & Op de Beeck, 2015), numerosity is confounded with the length of number words, which may affect same-different judgments when comparing Arabic numerals and number words. For example, ‘1’ is a small numerosity and ‘ONE’ has relatively few letters, whereas ‘8’ is a larger numerosity and ‘EIGHT’ has relatively many letters. Finally, the same-different task contrasts performance between different numerosities with Near and Far distances. As such, this task precludes a fine-grained analysis of distance that is more common to comparison tasks. However, same-different tasks eliminate the confound of distance effects that arise from decision-related processes (Van Opstal et al., 2008). These considerations underscore the importance of task selection when measuring numerical magnitude processing.
Conclusion
The present study examined differences in numerical magnitude processing across three symbolic formats. It focused on differences between literal and artificial symbol processing using same-different numerical judgments. Participants completed three same-different tasks in which they compared Arabic numerals to number words, literal symbols, or artificial symbols. To work with literal and artificial symbols, participants first associated these symbols with numerical magnitudes. All three symbol sets elicited an SDDE, evidence of numerical processing across all symbolic formats. Importantly, same-different judgments with literal symbols took longer than those with artificial symbols, on average, regardless of distance. These findings suggest that literal symbol processing of numerical magnitude may engage mechanisms that require additional cognitive processing steps, such as reducing interference from competing representations or processing verbal labels, that are not required for working with other number symbols.
To learn algebra, students must develop fluency and flexibility with literal symbols, including when they represent numerical magnitude. The core of this fluency and flexibility is symbol-referent connections, investigated here in a restricted case, in which literal symbols represented one numerical magnitude as Arabic numerals do. While persistent student difficulty with literal symbol processing results from myriad factors, the present study suggests that difficulty partially results from unique, lower-level cognitive processing demands required for basic processing of literal symbols as symbols that represent numerical magnitude in mathematics contexts. Additional research can illuminate the nature of such mechanisms and how they impact students’ understanding of literal symbols, which in turn can support learning in higher-level mathematics.