Number and space are known to be associated with each other, as first mentioned by Galton (1880), who reported that participants had vivid mental images of a number line. Later, Dehaene et al. (1993) discovered an effect indicating a tight link between numbers and space that did not rely on introspection, namely the SNARC effect (an acronym for Spatial Numerical Association of Response Codes). When participants perform a parity judgement task by pressing a left or a right button in response to odd or even numbers, they tend to respond faster with the left hand to small numbers and faster with the right hand to large numbers. This effect is classically explained by the mental number line account, which states that numbers are represented on a left-to-right oriented line, as on a ruler (Dehaene et al., 1993), although other accounts have been proposed more recently, such as the polarity coding account (Proctor & Cho, 2006), the working memory account (van Dijck & Fias, 2011) or the brain’s asymmetric frequency tuning account (Felisatti et al., 2020).
Apart from having been studied as a topic in its own right, addressing questions such as which types of number stimuli elicit it (Ebersbach et al., 2014; Fias, 2001; Nemeh et al., 2018; Nuerk et al., 2005), in which tasks it occurs (Didino et al., 2019; Fias et al., 1996; Fias et al., 2001), or what type of spatial information is involved (Aleotti et al., 2020; Dehaene et al., 1993; Wood et al., 2006), the SNARC effect has been investigated as being possibly related to mathematical abilities, the idea being that well-elaborated spatial representations of numerical magnitude scaffold the learning and development of mathematical concepts and procedures (Cipora et al., 2020). The results of the studies that looked at correlations between the SNARC effect and mathematics, however, are far from unequivocal. While there is some evidence that better math performance is associated with a stronger SNARC effect in children (Bachot et al., 2005; Crollen et al., 2015; Georges et al., 2017; Hoffmann et al., 2013) and with a weaker SNARC effect in adults (Cipora et al., 2020; Cipora et al., 2016; Hoffmann et al., 2014; Kramer et al., 2018), the majority of studies reveal no significant relation between the SNARC effect and mathematics (Bonato et al., 2007; Bull et al., 2013; Crollen & Noël, 2015; Dehaene et al., 1993; Gibson & Maurer, 2016; Göbel et al., 2015; Schneider et al., 2009). The reason for these inconsistent findings is not yet understood. An issue that arises if one wants to correlate mathematical performance with the SNARC effect is how individual differences in the SNARC effect are measured. Different tasks or task designs may measure different aspects of spatial number processing or may measure them differently. These paradigmatic differences may explain some of the inconsistencies.
Different tasks recruit the spatially defined number magnitude representations in different ways and may therefore have contributed to the inconsistency in findings. At first sight, the most frequently used tasks, i.e. magnitude classification and parity judgment, measure the strength of a mental number line in a similar way, as both tasks use left- and right-hand responses and quantify how strongly their latencies differ for small versus large numbers. There are, however, substantial differences between the two tasks. As pointed out by Gevers et al. (2006), the magnitude classification SNARC effect and the parity judgment SNARC effect are qualitatively different, the first one being categorical (with numbers smaller than the reference showing equal left-hand reaction time advantages and numbers larger than the reference showing equal right-hand advantages) and the second one being continuous (with the left-hand advantage for the smallest numbers gradually changing to a right-hand advantage for the largest numbers). The categorical nature of the magnitude classification SNARC effect has been shown to originate from the distance effect (Gevers et al., 2006), which is inherent to number comparison. Consequently, the magnitude classification SNARC effect can be a less pure measure of how strongly number and space are related, as the SNARC measure depends not only on how strongly a participant’s magnitude representations are associated with space but also on how precisely these representations code number magnitude (with more sharply tuned magnitude representations leading to smaller distance effects). Hence, while the magnitude classification task is useful to evaluate whether or not a SNARC effect occurs, for instance to verify spatial coding in young children who don’t understand the parity concept yet (see for instance Hoffmann et al., 2013), it is not well suited to quantify the strength of spatial number magnitude coding.
The parity judgment task provides a purer measure of the association of individual number magnitudes and their spatial coding, as it is not influenced by the distance effect. Nevertheless, several design choices might affect the size of the SNARC effect and, accordingly, correlations with mathematical ability. Two factors are important to consider. The first is mapping order, which has been largely overlooked in previous research (Cipora, Soltanlou, et al., 2019). Participants start either with the even-left and odd-right mapping or with the even-right and odd-left mapping, and in the second half of the experiment the other mapping is used, so that at the end of the experiment both left- and right-hand responses are obtained for every number. The mapping and the order in which the two mappings are administered may not be trivial, as the mapping coincides with congruency levels of the Markedness of Response Codes (MARC; Nuerk et al., 2004) effect and may therefore influence reaction times. The MARC effect reflects congruency between the linguistic markedness of parity status and that of left vs right. Linguistic markedness refers to the fact that in pairs of words reflecting a dichotomy (like even-odd, left-right, happy-unhappy etc.) one of the words can be considered atypical (labelled as ‘marked’) as opposed to the typical word (labelled as ‘unmarked’). For parity status, even is linguistically unmarked and odd is marked; for left vs right, right is unmarked and left is marked. Thus, when participants have to respond right to even numbers and left to odd numbers, there is congruency between the linguistic markedness of the even-odd and right-left dichotomies. Conversely, when responding left to even numbers and right to odd numbers, a markedness-incongruent mapping occurs. The MARC effect is the difference in reaction times between the markedness-incongruent and the markedness-congruent mapping.
Because response latencies affect the size of the SNARC effect, with longer latencies resulting in stronger SNARC effects (Gevers et al., 2006), the mapping order could have an important influence on the measured size of the SNARC effect. This effect of mapping order could be even more pronounced if one considers a general learning effect, which implies elevated response times at the beginning of the task compared to the end. The switch cost associated with the change of mapping also needs to be considered, especially because switch costs are typically asymmetric, i.e. larger when switching from a difficult task to an easy task (Allport et al., 1994; de Jong, 1995; Meuter & Allport, 1999; Yeung & Monsell, 2003). In a parity judgement SNARC experiment, this would result in a larger switch cost for the MARC incongruent-to-congruent order than for the MARC congruent-to-incongruent order. The effect of mapping order can be eliminated by counterbalancing in group-level studies, but not at the individual level, which is the level that matters for the study of individual differences.
A second factor that might affect the size of the SNARC effect is the nature of the instruction. Because a number’s parity status cannot be derived from the surface characteristics of the stimulus, it has to be retrieved from memory. This retrieval is supposed to be mediated by access to semantic memory taking the form of a mental number line (Fias et al., 1996). In this respect, instructing the participants to indicate whether a number is odd or even by pressing either of two response keys fits the purpose of obtaining a valid measure of the SNARC effect well. Yet, the step towards a different type of instruction, imposed by the experimenter or self-generated by the participant at various stages in the experiment, is small. Grouping the stimuli in two categories (2-4-6-8 and 1-3-7-9), associating a response with each of these two categories and responding on the basis of these category-response bindings stored in working memory (for instance “if 2 4 6 or 8 press left, if 1 3 7 9 press right”) isn’t farfetched as a possible strategy. With this strategy, semantic processing is no longer necessary to solve the task. Hence, with instructions that make explicit reference to the possibility of distinguishing between two sets of numbers, one might obtain differently sized SNARC effects compared to instructions where no reference is made to these categories.
Given the potential influence of mapping and instruction, we decided to experimentally verify the role of order of mapping and of the nature of the instruction on the size of the SNARC effect as measured in the context of a parity judgment task. The effect of mapping was investigated by comparing the MARC incongruent first to the MARC congruent first mapping conditions. The effect of instruction was investigated by comparing instructions which require semantic processing (odd/even) and instructions which do not require semantic processing (2-4-6-8/ 1-3-7-9) within a parity judgement task. As it is impossible to rule out transfer from one condition to the other if the different conditions were administered to the same participants, a between subject design was adopted.
Method
Participants
A total of 119 psychology students took part in this study; however, only the data of 116 participants were used (97 female, 104 right-handed; mean age (SD) = 18.9 (2.3) years, range = 17-36). The data of one participant were removed due to technical issues, another participant did not follow the instructions and, finally, for one participant the SNARC effect could not be calculated due to empty cells. All participants had normal or corrected-to-normal vision. All participants had Dutch, which is read from left to right, as their main language. Five participants stated that they had been exposed to right-to-left languages. All participants gave informed consent to participate and received one course credit for their participation. This study is in line with the regulations of the Ethical Committee of Ghent University.
Materials and Procedure
Participants enrolled for a session of approximately one hour. Each session contained three different tasks, always in the same order: a parity judgement task (with four different conditions across participants), an ordinal position working memory task (cf. van Dijck & Fias, 2011) and a Number Line Estimation task (Berteletti et al., 2015), which won’t be further investigated in this study. Apart from a brief general explanation, all task instructions were included within the self-paced test battery, and the instructions for each task were followed by a practice block. In case there were any questions, the experimenter was available to assist the participants. Data collection took place in groups of at most 10 participants. The study was implemented using E-Prime Go (Psychology Software Tools, 2020), which enables millisecond accuracy for stimulus presentation and response registration.
A 2x2 between subject factorial design (mapping order x instruction) was set up to verify the role of mapping order and instructions, resulting in four different conditions. An overview of these conditions can be found in Table 1.
Table 1
Overview of the Different Conditions
| Mapping order | Instruction: semantic processing (even or odd) | Instruction: non-semantic processing (‘2-4-6-8’ or ‘1-3-7-9’) |
|---|---|---|
| | Condition 1 | Condition 2 |
| A: MARC incongruent | EVEN → press left; ODD → press right | 2 – 4 – 6 – 8 → press left; 1 – 3 – 7 – 9 → press right |
| | switch mapping | switch mapping |
| B: MARC congruent | EVEN → press right; ODD → press left | 2 – 4 – 6 – 8 → press right; 1 – 3 – 7 – 9 → press left |
| | Condition 3 | Condition 4 |
| B: MARC congruent | EVEN → press right; ODD → press left | 2 – 4 – 6 – 8 → press right; 1 – 3 – 7 – 9 → press left |
| | switch mapping | switch mapping |
| A: MARC incongruent | EVEN → press left; ODD → press right | 2 – 4 – 6 – 8 → press left; 1 – 3 – 7 – 9 → press right |
Note. C1 starts with the MARC incongruent mapping and instruction which requires semantic processing by pressing left for even numbers and right for odd numbers. C2 starts with the MARC incongruent mapping and instruction which requires non-semantic processing by pressing left for the numbers ‘2-4-6-8’ and right for the numbers ‘1-3-7-9’. C3 starts with the MARC congruent mapping and instruction which requires semantic processing by pressing right for even numbers and left for odd numbers. C4 starts with the MARC congruent mapping and instruction which requires non-semantic processing by pressing right for the numbers ‘2-4-6-8’ and left for the numbers ‘1-3-7-9’.
The first factor, mapping order, differed across conditions: participants in one group started by pressing left to even/2-4-6-8 numbers and right to odd/1-3-7-9 numbers (Conditions 1 and 2). The other group started by pressing right to even/2-4-6-8 numbers and left to odd/1-3-7-9 numbers (Conditions 3 and 4). Halfway through the experiment, the initial mapping was reversed. The second factor, instruction, implied either a semantic processing or a non-semantic processing strategy. In the semantic processing conditions, participants received the instruction to respond as a function of the parity status of the target number (Conditions 1 and 3). In the non-semantic processing conditions, participants were instructed to respond with one button to the numbers 2, 4, 6, 8 and with the other button to 1, 3, 7, 9 (Conditions 2 and 4), without any reference to parity or magnitude.
The experiment consisted of 256 trials, equally divided across 4 blocks of 64 trials. Each trial started with an empty black square on a white background (22% x 35% of screen size, border 2px). After 300ms, one of the eight possible stimuli (Arabic digits: 1, 2, 3, 4, 6, 7, 8, 9) was shown for 2000ms in the middle of the square (Arial, font size 48), followed by an intertrial interval (an empty screen) of 1000ms. Participants were asked to judge the digits as fast and accurately as possible by pressing a left (‘f’) or right (‘j’) key on the keyboard with the left and right index finger, respectively. The stimulus-response mapping was reversed after the second block. Each number was shown exactly 8 times per block; the order was pseudo-randomized such that no number was shown on consecutive trials. Across the experiment, 16 trials were presented for each combination of number and response side (8 numbers x 2 response sides), resulting in a total of 256 trials. After each block of 64 trials, participants had a short break. During the second break, participants were informed in red letters that the mapping had switched. The task contained two practice blocks of 8 trials: one at the start of the task and one after the reversal of the response mapping. Feedback (750ms) was given during the practice trials but not during the experimental trials.
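The constraint on trial order described above (each digit 8 times per block, never the same digit twice in a row) can be illustrated with a minimal Python sketch. The procedure actually used in E-Prime Go is not described here, so the chunk-wise shuffling below is only one possible way to satisfy the constraint, and the function name `make_block` and the seed are hypothetical.

```python
import random

DIGITS = [1, 2, 3, 4, 6, 7, 8, 9]  # the eight stimuli; 5 is not used


def make_block(repeats=8, rng=None):
    """Return one block in which each digit occurs `repeats` times
    and no digit appears on two consecutive trials.

    Assumption: the block is built from `repeats` shuffled runs of the
    eight digits, which is stricter than the constraint in the text.
    """
    if rng is None:
        rng = random.Random()
    block = []
    for _ in range(repeats):
        chunk = DIGITS[:]
        rng.shuffle(chunk)
        # Reshuffle while this chunk would start with the digit
        # that ended the previous chunk.
        while block and chunk[0] == block[-1]:
            rng.shuffle(chunk)
        block.extend(chunk)
    return block


block = make_block(rng=random.Random(2020))  # arbitrary seed
print(len(block), block[:8])
```

Because every run contains each digit exactly once, repetitions can only occur at the boundary between two runs, which is the only place the sketch has to check.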
Results
The following analysis plan was adopted. First, an ANOVA was conducted to evaluate the general effects of instruction, of MARC congruency and of the order in which the mappings were administered on reaction times. Next, we zoomed in on the SNARC effect and used an ANOVA to look specifically at the effect of instruction and mapping order on the SNARC effect. Finally, we used regression analyses to quantify the size of the SNARC effect and ran an ANOVA to evaluate whether the size of the SNARC effect differed between instruction and mapping conditions.
General Description (Accuracy and Reaction Times)
All participants had an accuracy above 73.4%, with an average accuracy of 96.2% (C1: 97.0%, C2: 96.8%, C3: 96.7%, C4: 94.5%). Only correct trials with reaction times slower than 250ms and faster than 2.5 SD above the mean per condition were used in the analysis (excluding 6.3% of trials). Average overall reaction times were C1: 517ms (SD = 62), C2: 519ms (SD = 80), C3: 525ms (SD = 48), C4: 549ms (SD = 86). There was no significant relation between response speed and accuracy, as shown by the correlations between mean reaction time and mean accuracy (C1: Pearson’s r = .149, p = .430; C2: Pearson’s r = -.177, p = .367; C3: Pearson’s r = .189, p = .335; C4: Pearson’s r = -.100, p = .598).
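As an illustration of the trial selection rule, the following Python sketch keeps correct trials above 250ms and below an upper cutoff of 2.5 SD per condition. It rests on assumptions that the text does not settle: that the cutoff applies only to slow responses, that mean and SD are computed over correct trials above 250ms, and that trials are stored as dictionaries with the hypothetical keys `condition`, `rt` and `correct`.

```python
from statistics import mean, stdev


def filter_trials(trials, min_rt=250.0, sd_cutoff=2.5):
    """Keep correct trials with min_rt < RT <= mean + sd_cutoff * SD.

    Mean and SD are computed per condition over the correct trials
    that pass the min_rt criterion (an assumption of this sketch).
    """
    candidates = [t for t in trials if t["correct"] and t["rt"] > min_rt]

    # Collect the candidate reaction times per condition.
    by_condition = {}
    for t in candidates:
        by_condition.setdefault(t["condition"], []).append(t["rt"])

    # Upper cutoff per condition.
    upper = {}
    for cond, rts in by_condition.items():
        sd = stdev(rts) if len(rts) > 1 else 0.0
        upper[cond] = mean(rts) + sd_cutoff * sd

    return [t for t in candidates if t["rt"] <= upper[t["condition"]]]


# Hypothetical data: 20 regular trials, one very slow trial,
# one anticipation and one error.
trials = [{"condition": "C1", "rt": 500.0, "correct": True} for _ in range(20)]
trials += [
    {"condition": "C1", "rt": 3000.0, "correct": True},
    {"condition": "C1", "rt": 200.0, "correct": True},
    {"condition": "C1", "rt": 500.0, "correct": False},
]
print(len(filter_trials(trials)), "of", len(trials), "trials kept")
```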
Effect of Mapping and Instruction on the Overall Reaction Times and MARC-Effect
First, we looked at the general effects of Instruction, Mapping order and MARC congruency as possible factors that determine reaction times. For this purpose, we used a 2 x 2 x 2 ANOVA with MARC congruency (incongruent vs congruent) as within-subject factor and Mapping order (incongruent first vs congruent first) and Instruction (parity vs categorical) as between-subject factors, to explore the general effects of learning and mapping order on reaction time. There was a significant MARC effect, F(1, 112) = 9.281, p = .003, ηp² = .077, which did not interact with Mapping order, F(1, 112) = 0.167, p = .684, ηp² = .001, nor with Instruction, F(1, 112) = 0.866, p = .354, ηp² = .008. None of the interactions were significant. Although reaction times seemed to be somewhat slower for the congruent-first mapping order (see Figure 1), this difference was not significant, F(1, 112) = 2.208, p = .140, ηp² = .019. The main effect of Instruction was not significant either, F(1, 112) = 1.001, p = .319, ηp² = .009.
Figure 1
Effect of Mapping Order Within Different Instructions
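The effect sizes reported next to the F ratios in this section are numerically consistent with partial eta squared, which can be recovered from an F value and its degrees of freedom. The short Python sketch below shows that relation; the function name is hypothetical, and the identification of the effect size as partial eta squared is an inference from the reported numbers, not a statement taken from the analysis software.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# MARC main effect reported above: F(1, 112) = 9.281
print(round(partial_eta_squared(9.281, 1, 112), 3))  # 0.077
```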
Effect of Mapping and Instruction on the SNARC Effect
First, we evaluated the effects of Mapping order and Instruction on the interaction between Magnitude and Response side with an ANOVA with Mapping order (incongruent-congruent vs. congruent-incongruent) and Instruction (even/odd vs 2-4-6-8/1-3-7-9) as between-subject factors and Magnitude (8 levels: 1, 2, 3, 4, 6, 7, 8, 9) and Response side (2 levels: left vs right) as within-subject factors. This is visualized in Figure 2. When the assumption of sphericity was violated, a Greenhouse-Geisser correction was applied (Greenhouse & Geisser, 1959). The effects of Magnitude, F(5.608, 628.067) = 18.766, p < .001, ηp² = .144, and Response side, F(1, 112) = 7.615, p = .007, ηp² = .064, were significant. The main effects of Instruction, F(1, 112) = 0.955, p = .330, and of Mapping order, F(1, 112) = 2.256, p = .136, ηp² = .020, were not significant. Furthermore, the interaction between Magnitude and Response side was significant, F(4.509, 504.980) = 10.890, p < .001, ηp² = .089, reflecting the presence of a SNARC effect. This Magnitude x Response side interaction did not enter into further interactions with Instruction, F(4.509, 504.980) = 0.634, p = .657, ηp² = .006, or Mapping order, F(4.509, 504.980) = 0.690, p = .616, ηp² = .006. Finally, no significant four-way interaction between Magnitude x Response side x Instruction x Mapping order was detected, F(4.509, 504.980) = 0.655, p = .642, ηp² = .006.
Figure 2
Reaction Times Per Number and Hand for Each Mapping and Instruction Condition
Regression Analysis
To further quantify the SNARC effect, slopes were calculated using the regression approach described by Fias et al. (1996). Average reaction times were calculated for each number and each response side, after which the differences in reaction times (dRT) were calculated for each number by subtracting the left-hand from the right-hand reaction times. Next, for each participant, these dRTs were used in a linear regression with Magnitude as predictor, leading to a regression weight for each participant. A negative regression weight reflects a left-to-right mapping: faster left-hand responses for small numbers and faster right-hand responses for large numbers. A t-test was used to verify whether the regression weights differed significantly from zero at the group level. The results showed an average SNARC slope of -6.653, t(115) = -6.062, p < .001, d = 0.100. A 2 x 2 ANOVA with Mapping order and Instruction as between-subject variables showed that the slopes didn’t differ as a function of instruction, F(1, 112) = 0.565, p = .454, ηp² = .005, or mapping order, F(1, 112) = 3.264 × 10⁻⁴, p = .986, ηp² = 2.913 × 10⁻⁶. Furthermore, there was no significant interaction between instruction and mapping order, F(1, 112) = 0.014, p = .905, ηp² = 1.282 × 10⁻⁴. SNARC slopes for each condition are visualized in Figure 3.
Figure 3
SNARC Slopes for Each Condition
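The regression approach used to obtain the SNARC slopes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the analysis script used in the study: trials are assumed to be available as (digit, side, RT) tuples, the function names are hypothetical, and the data in the example are synthetic.

```python
from math import sqrt
from statistics import mean, stdev


def snarc_slope(trials):
    """Slope of dRT (right minus left mean RT) regressed on magnitude.

    `trials` is an iterable of (digit, side, rt) tuples with side in
    {"left", "right"}. A KeyError is raised if a digit lacks trials for
    one of the two sides (an empty cell).
    """
    cells = {}
    for digit, side, rt in trials:
        cells.setdefault((digit, side), []).append(rt)
    digits = sorted({d for d, _ in cells})
    drt = [mean(cells[(d, "right")]) - mean(cells[(d, "left")]) for d in digits]

    # Ordinary least squares slope of dRT on magnitude.
    mx, my = mean(digits), mean(drt)
    sxy = sum((x - mx) * (y - my) for x, y in zip(digits, drt))
    sxx = sum((x - mx) ** 2 for x in digits)
    return sxy / sxx


def one_sample_t(values):
    """t statistic testing whether the mean of `values` differs from zero."""
    return mean(values) / (stdev(values) / sqrt(len(values)))


# Synthetic participant whose dRT decreases by 5 ms per unit of magnitude.
demo = []
for d in [1, 2, 3, 4, 6, 7, 8, 9]:
    demo.append((d, "left", 500.0))
    demo.append((d, "right", 500.0 + 20.0 - 5.0 * d))
print(snarc_slope(demo))
```

Applying `snarc_slope` to each participant and passing the resulting list of slopes to `one_sample_t` corresponds to the group-level test against zero.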
To evaluate if the SNARC effect and MARC effect are dependent or independent from each other, we investigated the correlation between the SNARC slope and the MARC congruency effect. No significant correlations were observed: Pearson r for the parity/incongruent first condition was 0.08, for the category/incongruent first condition 0.153, for the parity/congruent first condition -0.055, and for the category/congruent first condition -.056 with all p > 0.41. It can be concluded that the size of the SNARC effect is independent from the size of the MARC effect.
To verify whether there is a relationship between general processing speed and the size of the SNARC effect, we calculated the correlation between average reaction time and SNARC slope across participants. No significant correlations were observed for any of the conditions, with a Pearson r of -0.14 for the parity/incongruent first condition, 0.14 for the category/incongruent first condition, -0.17 for the parity/congruent first condition and -0.14 for the category/congruent first condition (all p > 0.38).
To verify the reliability of the task, a split-half reliability was calculated (Cipora, van Dijck, et al., 2019). Based on the order of appearance, valid trials were divided into two parts. For both parts separately, regression slopes were computed for each participant. Across participants, the slopes of the two parts were correlated with each other to yield a reliability index. To adjust for task length, the Spearman-Brown correction was applied, resulting in a split-half reliability of 0.82 for the parity/incongruent first condition, 0.81 for the category/incongruent first condition, 0.61 for the parity/congruent first condition and 0.70 for the category/congruent first condition.
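The split-half procedure with Spearman-Brown correction can be sketched as follows in Python. The sketch assumes that one slope per participant has already been computed for each half of the trials; the function names and the example slopes are hypothetical and do not reproduce the data of this study.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step the half-test correlation up to the full test length."""
    return 2 * r_half / (1 + r_half)


# Hypothetical slopes of five participants in the two halves of the task.
slopes_part1 = [-9.0, -4.0, -12.0, 1.0, -6.0]
slopes_part2 = [-7.0, -5.0, -10.0, -1.0, -3.0]

r_half = pearson_r(slopes_part1, slopes_part2)
print(round(r_half, 3), round(spearman_brown(r_half), 3))
```

For illustration only, an uncorrected correlation of .69 between the two halves would be stepped up to about .82 by this correction.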
Discussion
The ability to represent numbers on a mental number line has been taken as an important determinant of mathematical abilities (Bonato et al., 2007; Bull et al., 2013; Cipora et al., 2020; Crollen & Noël, 2015; Dehaene et al., 1993; Gibson & Maurer, 2016; Göbel et al., 2015; Schneider et al., 2009). The SNARC effect is a prototypical experimental marker to assess the involvement of spatial number coding in a specific cognitive task. To investigate the link between spatial number coding and mathematical ability, the SNARC effect is also used as an individual difference measure. Here we investigated whether the SNARC effect as an individual difference measure is robust against some design choices that can reasonably be thought to potentially affect the size of the SNARC effect. Such robustness is a necessary precondition if one wants to use the SNARC effect in correlational research.
More precisely, in the present study we investigated two design factors that may potentially have an impact on the size of the SNARC effect as measured in the parity judgement task, the task which we believe, as outlined in the introduction, is best suited to quantify spatial number coding. First, to obtain reaction times of left- and right-hand responses for each number, it is necessary to apply an initial response mapping (e.g. odd is left, even is right) that has to be switched later on in the experiment (e.g. odd is right, even is left). Because one mapping can be assumed to be easier than the other based on linguistic markedness (Nuerk et al., 2004), the order in which the response mappings are administered may affect the size of the SNARC effect, certainly if combined with a general task training effect: starting with the difficult markedness-incongruent mapping combined with the training effect might lead to very slow reaction times in the first half of the experiment and relatively fast reaction times in the second half, which has an easier mapping and benefits from more practice with parity judgment. With the opposite mapping order, the difference between the two halves of the experiment may be smaller. The results show that, indeed, the response mapping has an influence on reaction times based on linguistic markedness, with the congruent MARC mapping (odd-left and even-right) being faster than the incongruent MARC mapping. Importantly, however, the MARC effect wasn’t influenced by the order in which the two parity-to-response assignments were administered. Similarly, the order of response assignments didn’t have an impact on the size of the SNARC effect.
A second factor that we investigated was the nature of the instruction. Following earlier models of the architecture of the numerical cognitive system, determining the parity status of a number requires access to semantic number memory (Dehaene & Cohen, 1995; Fias et al., 1996). Yet, it is also possible that participants don’t follow this path and adopt an alternative strategy, for instance if they are less familiar with the concept of parity. A plausible strategy would be to reconfigure the task set by putting numbers in two categories (2-4-6-8 and 1-3-7-9) and associating these categories with the required response. Such a category-to-response mapping wouldn’t in principle require access to semantic number knowledge, in which case no SNARC effect would be expected. Of course, participants could also use a mixture of strategies, in which case the SNARC effect wouldn’t disappear but diminish in size. Here we adopted a condition in which participants were instructed to perform a category-to-response mapping task: if you see 2, 4, 6 or 8 press left (or right) and if you see 1, 3, 7 or 9 press right (or left). We found that the nature of the instructions didn’t have an impact on the size of the SNARC effect, nor on the MARC effect, suggesting that participants behaved uniformly, whatever the instruction. Given that a MARC effect was also observed in the categorical task, where in principle linguistic markedness isn’t at stake, it is reasonable to assume that participants spontaneously translated the categorical instruction into the parity instruction, where linguistic markedness is relevant.
The observed robustness against mapping order and nature of instructions encourages the use of the parity judgment SNARC effect as an individual difference measure. Yet, of course, a useful instrument to differentiate between individuals and to correlate these differences with mathematical ability (or other) measures should also have good reliability. The split-half reliabilities are reasonably high (between 0.61 and 0.82); consequently, the current tasks allow for the measurement of individual differences, with the conditions that have the incongruent mapping first showing overall higher internal consistency than the conditions that have the congruent mapping first. Of course, one has to keep in mind that reliability is not restricted to internal consistency but might also be looked at over time as test-retest reliability. Our observations do not speak to reliability over time. There are few test-retest reliabilities reported in the literature, but Viarouge, Hubbard, and McCandliss (2014) found moderate test-retest reliability (r = 0.37) with two weeks between test and retest. Reliability of this size is acceptable, especially taking into consideration that the SNARC slope is a performance-based measure (Brysbaert, 2024), but it requires large numbers of participants to have enough power to detect true correlations with other variables (Brysbaert, 2024; Hedge et al., 2018). Note that in a recent study Roth et al. (2024) pushed the test-retest reliability to the extreme by having participants perform the SNARC experiment on thirty days. It was found that the SNARC effect wasn’t stable across the testing sessions. Yet, one can wonder to what extent the task is processed in the same way, and to what extent the SNARC effect measures the same underlying construct, if it is repeated so many times.
The present results relate to those of Didino, Breil, and Knops (2019). While Didino et al. (2019) conducted a within-subject study comparing the size of the SNARC effect in different tasks that differed in the level of semantic processing needed to solve the task (magnitude comparison, parity judgment, phoneme monitoring and color judgement), we adopted a between-subject approach investigating different implementations of the same task (parity judgment). Didino et al. (2019) observed differently sized SNARC effects for the different tasks and demonstrated that the size of the SNARC effect depended on the processing time needed for the task (with slower tasks leading to larger SNARC effects), not on the degree of semantic processing. The present study is in line with these findings. There were no latency differences between the versions of the task and, accordingly, no differences in the SNARC effect. In an exploratory analysis, we also looked at individual differences in response latencies and correlated them with the SNARC effect, but didn’t find a relationship, so also at an individual level it is not the case that slower participants exhibit stronger SNARC effects.
In conclusion, the current study shows that the SNARC effect measured with the parity judgment task is resistant to the variations in design investigated here, namely mapping order and type of instruction. This makes the parity judgement SNARC effect a useful tool to study individual differences in the degree to which numbers are represented on a mental number line and to relate these differences to other cognitive skills and abilities. Yet, one should realize that the present study was conducted in a homogeneous group of university students. It remains unknown whether other age groups show the same invariance to mapping order and instructions.
This is an open access article distributed under the terms of the Creative Commons Attribution License (