The symbolic number magnitude comparison task is often used to investigate the cognitive processes of (multi-digit) number processing. In this task, participants are usually asked to indicate the larger out of two numbers. Two stable hallmark effects observed when comparing numbers are the numerical distance (Moyer & Landauer, 1967; see also e.g., Hohol et al., 2020) and (unit-decade) compatibility effects (Nuerk, Weger, & Willmes, 2001; see Nuerk, Moeller, & Willmes, 2015 for an overview of further numerical effects in multi-digit number processing).
Numerical Distance and Compatibility Effects
The distance effect reflects the finding that performance in number comparison tasks increases with larger distance between numbers. Thereby, the distance effect gave rise to the widely held thought that numbers are represented and processed analogically along the so-called mental number line (e.g., Dehaene & Changeux, 1993; Gallistel & Gelman, 1992; Restle, 1970). The fact that the distance effect was observed for the overall distance when comparing multi-digit numbers (Hinrichs, Yurko, & Hu, 1981) led to the assumption of a simple elongation of the mental number line from the single- into the two-digit number range (e.g., Brysbaert, 1995; Dehaene, Dupoux, & Mehler, 1990) proposing analogue, holistic processing also for multi-digit numbers. However, further research showed that next to the overall distance between numbers, distances between the single corresponding digits (i.e., hundreds, tens, units) influence numerical processing as well (e.g., Nuerk et al., 2001; Verguts & De Moor, 2005). These results favour an alternative account suggesting that rather than being processed purely holistically, the single digits of multi-digit numbers are processed componentially (e.g., hundreds, tens, units etc. are processed separately; see Huber, Nuerk, Willmes, & Moeller, 2016, for a comprehensive computational modelling approach).
Strong support for the componential processing account was further provided by the unit-decade compatibility effect (Nuerk et al., 2001). The unit-decade compatibility effect reflects performance differences between unit-decade compatible number pairs (i.e., the comparison of both tens and units leads to the same decision: 32_57, 3 < 5 and 2 < 7) and unit-decade incompatible number pairs (i.e., the comparison of tens and units leads to opposing decisions: 37_62, 3 < 6 but 7 > 2). When overall distance is held constant between compatible and incompatible number pairs, compatible number pairs are usually responded to faster and with fewer errors than incompatible ones (e.g., Nuerk et al., 2001; see also Huber et al., 2019, for a large-scale online investigation). Moreover, compatibility effects were also observed for three-digit numbers (Bahnmueller et al., 2015, 2016; Korvorst & Damian, 2008; Mann, Moeller, Pixner, Kaufmann, & Nuerk, 2012; see also Huber, Moeller, Nuerk, & Willmes, 2013, for simulated data; see also Meyerhoff, Moeller, Debus, & Nuerk, 2012, for compatibility effects in four- and six-digit numbers). Following the same logic as for two-digit numbers, hundred-decade and hundred-unit compatibility effects can be defined for three-digit numbers (e.g., the number pair 327_465 is hundred-decade compatible because 3 < 4 and 2 < 6, but it is hundred-unit incompatible because 3 < 4 but 7 > 5). In sum, compatibility effects indicate that the magnitudes of the decision-irrelevant digits (i.e., units in two-digit number comparison, tens and units in three-digit number comparison) interfere with the comparison process suggesting that the magnitudes of the single constituting digits of a number are processed componentially.
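The compatibility definitions above can be made concrete with a minimal Python sketch. The function names (`digits`, `compatibility`) are illustrative and not taken from the study materials. The sketch assumes that the two numbers differ in their hundreds digit, and it treats tied tens or unit digits as not compatible, which is a simplifying assumption.

```python
def digits(n: int) -> tuple[int, int, int]:
    """Split a three-digit number into (hundreds, tens, units)."""
    if not 100 <= n <= 999:
        raise ValueError("expected a three-digit number")
    return n // 100, (n // 10) % 10, n % 10


def sign(x: int) -> int:
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def compatibility(a: int, b: int) -> dict[str, bool]:
    """Classify a three-digit pair as hundred-decade / hundred-unit compatible.

    A digit position is compatible when comparing it leads to the same
    decision as comparing the (decision-relevant) hundreds digits.
    Assumption: ties at the tens or unit position count as not compatible.
    """
    ha, ta, ua = digits(a)
    hb, tb, ub = digits(b)
    if ha == hb:
        raise ValueError("hundreds digits must differ (between-hundred pair)")
    h = sign(ha - hb)
    return {
        "hundred_decade_compatible": sign(ta - tb) == h,
        "hundred_unit_compatible": sign(ua - ub) == h,
    }


# Example from the text: 327_465 is hundred-decade compatible (3 < 4, 2 < 6)
# but hundred-unit incompatible (3 < 4, 7 > 5).
print(compatibility(327, 465))
```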
Both distance and compatibility effects were typically investigated with the magnitude comparison paradigm in which participants were asked to indicate the larger of two numbers. Following from this, one may ask whether the observed effects are generic to multi-digit number processing or whether they also, at least partly, originate from the specific task setup of selecting the larger number. The natural alternative to this task is a setup in which participants are asked to indicate the smaller of the two numbers presented. Even though it may seem that these setups do not differ much, empirical evidence evaluating effects of task instruction (e.g., picking the larger vs. picking the smaller number) is surprisingly limited. In this context, the concept of linguistic markedness might be of particular interest.
Linguistic Markedness
Linguistic markedness refers to the fact that most adjective pairs have an unmarked/base form and a marked/derived form. Examples for unmarked/base adjectives are “old”, “even”, “right”, “large” or “friendly” and their respective marked/derived counterparts are “young”, “odd”, “left”, “small” and “unfriendly”. Thereby, marked adjectives can, for instance, be constructed by adding a prefix or suffix to the unmarked form (e.g., un-friendly; formal markedness) and/or can represent the adjective form that is used less frequently (e.g., “How young are you?”, “How small are you?”; distributive markedness, cf. Lyons, 1968). In this context, previous studies, for instance, indicate that marked adjectives decrease performance in sentence comprehension (e.g., Sherman, 1973, 1976). Another example can be found in the study by Hines (1990) who observed slower reactions to numbers that have to be classified as odd compared to numbers that have to be classified as even (showing a so-called “odd effect”; see also Nuerk, Iversen, & Willmes, 2004). Following from the linguistic markedness account, the default (unmarked) pick larger setup might differ from the marked pick smaller setup resulting in differences in general task performance (i.e., longer reaction times in the pick smaller setup) as well as in observed numerical effects.
Linguistic Markedness and Numerical Effects
Up to now, only a few studies have investigated modulations of numerical effects resulting from manipulations of unmarked vs. marked task instructions. For instance, Verguts and De Moor (2005) manipulated linguistic markedness of task instruction (pick the smaller vs. pick the larger number) when investigating the distance effect in a two-digit number comparison task. They found an overall distance effect for within-decade number pairs (e.g., 64_68) but not for between-decade number pairs for which decade distance was held constant (decade distance was always 1; e.g., 68_72) for both the marked and the unmarked task instructions (see Moeller, Klein, & Nuerk, 2013, for a discussion of the differential results regarding distance effects). Crucially, although there was no formal statistical comparison, descriptively overall response times in the pick smaller condition were about 60 ms slower than in the pick larger condition (see Figure 1 in Verguts & De Moor, 2005). Thus, this study seems to show an effect of linguistic markedness on overall reaction times; however, it provided no evidence for a modulating effect of linguistic markedness on the numerical distance effect.
In contrast, Arend and Henik (2015) demonstrated that the linguistic markedness of the task instruction modulates the size congruity effect (SiCE). The SiCE refers to the finding that in numerical and physical comparison tasks, response times are shorter when number magnitude and physical size are congruent (i.e., the numerically larger digit is also the physically larger one) than when they are incongruent (Henik & Tzelgov, 1982). In their study, reaction times were longer in the pick smaller condition compared to the pick larger condition. Moreover, the SiCE was larger when participants were instructed to pick the larger as compared to when they were instructed to pick the smaller number in the number magnitude comparison task, but no difference was found in the physical comparison task.
Further studies show that the linguistic markedness of task instruction also affects other types of Spatial-Numerical Associations (SNAs; see e.g., Cipora, Schroeder, Soltanlou, & Nuerk, 2018). Patro and Haman (2012) found an effect of SNA congruency (i.e., faster reactions to larger numerosities on the right) only in the pick larger but not in the pick smaller condition (i.e., reactions to smaller numerosities did not differ between left and right; see Figure 2 in Patro & Haman, 2012). Type of instruction also affects comparative judgments of conceptual size of objects, but not Arabic numbers (Shaki, Petrusic, & Leth-Steensen, 2012).
To sum up, the evidence for the modulating role of linguistic markedness of task instruction on numerical effects remains inconsistent. One potential mechanism by which linguistic markedness of task instruction might affect specific numerical effects is its influence on overall reaction times. For instance, the spatial-numerical association of response codes effect (SNARC effect; Dehaene, Bossini, & Giraux, 1993) was shown to increase with longer overall reaction times (Cipora, Soltanlou, Reips, & Nuerk, 2019; see also Gevers, Verguts, Reynvoet, Caessens, & Fias, 2006; see Cipora et al., 2016, for a discussion of potential measurement artifacts in this context). Other cognitive effects, such as the Simon effect, also seem to vary with general reaction time (Mapelli, Rusconi, & Umiltà, 2003; see also Glaser & Glaser, 1982, for the Stroop effect).
With respect to the effects of interest in the present study, the distance effect was shown to be more pronounced for longer reaction times (Hohol et al., 2020). However, to the best of our knowledge, associations of overall response times and compatibility effects have not been reported yet. Nonetheless, in developmental studies overall reaction times were standardized to control for potential effects of interindividual variability in reaction times on the size of compatibility effects (Mann, Moeller, Pixner, Kaufmann, & Nuerk, 2012; Nuerk, Kaufmann, Zoppoth, & Willmes, 2004; Pixner, Moeller, Heřmanová, Nuerk, & Kaufmann, 2011). The reasoning behind the standardization is that prolonged processing of a stimulus might lead to increased interference of task irrelevant digits (i.e., unit digit in two-digit number pairs, unit and tens digit in three-digit number pairs) in incompatible number pairs and, thereby, to larger compatibility effects.
The Present Study
The current study set out to evaluate the generality of basic effects in multi-digit number processing (i.e., distance and compatibility effects) across marked and unmarked task instructions (i.e., pick the larger vs. pick the smaller number). In particular, in a conceptual replication attempt of the study by Bahnmueller et al. (2015), we employed the same three-digit number comparison paradigm with Arabic digits in a comparable sample of German- and English-speaking adults. However, instead of asking participants to indicate the larger of two presented three-digit numbers we asked participants to indicate the smaller of two three-digit numbers.
As it seems unlikely that a change in linguistic markedness of task instructions leads to major disruptions of the main underlying cognitive mechanisms of multi-digit Arabic number processing (i.e., number magnitude should still be processed, numbers should still be processed componentially), we predicted reliable main effects of hundred distance, hundred-decade compatibility, and hundred-unit compatibility when participants are asked to pick the smaller number.
To investigate potential modulating effects of linguistic markedness more directly, we compared overall reaction times as well as the respective numerical effects directly between the newly conducted pick smaller and the pick larger experiment in Bahnmueller et al. (2015). In line with previous reports (Arend & Henik, 2015; Verguts & De Moor, 2005), we expected prolonged reaction times when instructed to pick the smaller as compared to picking the larger number.
Regarding modulations of the numerical effects due to linguistic markedness of the task instruction, we expected to replicate the findings by Verguts and De Moor (2005) showing comparable distance effects for marked and unmarked task instructions. However, regarding the hundred-decade and the hundred-unit compatibility, we expected to find larger compatibility effects when instructed to pick the smaller number because longer overall reaction times and, thus, prolonged processing of number pairs in the pick smaller experiment should lead to increased interference of task irrelevant digits (i.e., unit and tens digit) in incompatible number pairs and, thereby, to larger compatibility effects.
Method
Participants
For the analyses of the pick smaller experiment, newly collected data of a total of 53 participants were considered (after exclusions, see below). Based on Bahnmueller et al. (2015; henceforth referring to the pick larger experiment), we did not expect three-digit number processing to be influenced by the number word structure (e.g., inverted vs. non-inverted number words; but see, e.g., Steiner et al., 2021, this issue, for inversion-related effects when processing multi-digit numbers in children). However, we recruited a comparable sample of German- and English-speaking participants for the pick smaller experiment. This allowed for optimal comparability between studies and further exploration of potential language-related modulations within the present pick smaller experiment.
Three participants were excluded in the pick smaller experiment because error rates exceeded 10% in the experimental trials. Moreover, another four participants were excluded because they consistently used the reverse response coding (i.e., they picked the larger number). Thus, the final pick smaller sample consisted of 30 native German speakers (24 female, all right handed, Mage = 22.7 years, SD = 2.8) and 23 native English speakers (16 female, all right handed, Mage = 19.7 years, SD = 1.4).
For the re-analyses of the pick larger experiment, data of a total of 51 participants were considered. Two participants were excluded because error rates exceeded 10%. Thus, the final pick larger sample consisted of 24 native German speakers (21 female, 20 right handed, Mage = 23.1 years, SD = 6.3) and 27 native English speakers (21 female, 25 right handed, Mage = 20.1 years, SD = 2.3).
German-speaking participants were recruited via postings at the University of Tuebingen and the Leibniz-Institut für Wissensmedien Tübingen. English-speaking participants were recruited at the University of York. Participants received course credit or 5€/4£ for compensation. The study was approved by the local ethics committee of the University of York.
Power Calculations
Sample size estimates for paired t-tests for the pick smaller experiment were calculated using JAMOVI (The jamovi project, 2020) and were based on the respective effect sizes observed in the pick larger experiment. Based on this, a sample size of 27 should be sufficient to detect a hundred-decade compatibility effect (i.e., the smallest main effect observed in the pick larger experiment) of an effect size of d = 0.59 or larger with α = .05 (one-tailed) and a power of .90. To achieve comparability between the pick smaller and the pick larger experiment and to increase sensitivity for detecting a smaller effect in the pick smaller experiment, we aimed at collecting a comparable number of participants (N = 51) allowing us to detect a medium sized effect of d = 0.46.
G*Power (Faul et al., 2009) was used for power estimates of the between-subject effect of instruction (pick smaller vs. pick larger) as well as the within-between interaction of the respective numerical effect and instruction in the 2 × 2 mixed factor ANOVAs. A total sample size of 100 is sufficient to detect a medium sized between-subject as well as interaction effect of f = 0.33 (ηp² = .1) with α = .05 and a power of .90 (see Supplementary Materials for all outputs of the power calculations).
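The correspondence between the effect size f of 0.33 and a partial eta squared of .1 follows from the standard conversion f = sqrt(ηp² / (1 − ηp²)). The short Python sketch below only illustrates this arithmetic; the function name is hypothetical and the snippet is not part of the reported power analysis.

```python
import math


def cohens_f_from_partial_eta_sq(eta_p_sq: float) -> float:
    """Convert partial eta squared to Cohen's f (standard conversion)."""
    if not 0.0 <= eta_p_sq < 1.0:
        raise ValueError("partial eta squared must lie in [0, 1)")
    return math.sqrt(eta_p_sq / (1.0 - eta_p_sq))


# A partial eta squared of .1 corresponds to f of roughly one third.
print(round(cohens_f_from_partial_eta_sq(0.1), 2))
```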
Stimuli
The same stimulus set was used in the pick smaller and the pick larger experiment. In total, 640 three-digit number pairs were used. Of these, 320 were experimental items manipulated orthogonally according to hundred-, decade-, and unit distance (each small [1-3] vs. large [4-8]), as well as hundred-decade and hundred-unit compatibility (compatible vs. incompatible). Moreover, problem size was matched across all item categories and decade as well as unit distance was matched for the respective item categories. In addition to the 320 experimental items, 320 filler items were included in the stimulus set to avoid that participants focused only on the decision-relevant hundred-digit (160 within-hundred filler items, e.g., 672_648; 160 within-hundred-within-decade filler items, e.g., 282_284). Please refer to the Supplementary Materials in Bahnmueller et al. (2015) for a more detailed description of the stimulus set as well as descriptive characteristics of all item categories.
Unfortunately, due to a programming error in the pick smaller experiment, participants were only presented with 560 of the 640 items (i.e., the last block [80 items] was not presented). The 560 items were randomly drawn from the total item set for each participant. Regarding the 320 experimental stimuli included in the analyses, an item was presented 46.4 times on average (SD = 2.4, range: 40-52). Because items were drawn randomly, stimulus matching was not substantially affected (see Supplementary Materials for item characteristics of the experimental items in the pick smaller experiment compared to item characteristics of the matched stimulus set).
Procedure
The procedure of both experiments was identical and differed only with respect to the task instruction. In particular, participants were instructed to indicate the smaller (pick smaller experiment) or the larger (pick larger experiment) of two simultaneously presented three-digit numbers as fast and as accurately as possible. Numbers were presented above each other. In the pick smaller experiment, participants were asked to press the upward arrow of a standard keyboard in case the upper number was smaller, and they were asked to press the downward arrow in case the lower number was the smaller one. In contrast, in the pick larger experiment, participants had to indicate the location of the larger number by pressing the upward arrow in case the upper number was larger, and the downward arrow in case the lower number was larger.
The respective experiment started with 10 practice trials, followed by 8 blocks (7 blocks in the pick smaller study) containing 80 items each. After each block, the participant could take a short break. Stimulus order was randomized separately for each participant and across blocks. Stimuli were presented centrally in white against a black background (font: Arial, font size: 24, bold). A trial started with a fixation cross presented centrally for 500ms. Following the fixation cross, a number pair was presented and remained on the screen until a response was given. The next trial started after an inter-trial-interval of 500ms.
Results
Analyses
Analyses were performed using R (R Core Team, 2020) and RStudio (RStudio Team, 2020) as well as JASP for Bayesian analyses (JASP Team, 2020). For the interpretation of Bayes factors, we use the classification adopted in JASP (van Doorn et al., 2019), differentiating strong (BF10 < 1/10) and moderate evidence for H0 (1/10 < BF10 < 1/3), weak/inconclusive evidence (1/3 < BF10 < 3), as well as moderate (3 < BF10 < 10) and strong evidence for H1 (BF10 > 10). Data, analysis script and JASP output files illustrating Bayesian analyses with all the parameters used can be found in the Supplementary Materials.
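For illustration, the classification of Bayes factors can be written as a small lookup on BF10. This is a sketch rather than the JASP implementation: the function name is hypothetical, and the assignment of values that fall exactly on a threshold is an assumption, since the verbal classification leaves the boundaries open.

```python
def classify_bf10(bf10: float) -> str:
    """Map a Bayes factor BF10 onto the verbal categories used in the text.

    Assumption: values exactly on a threshold (1/10, 1/3, 3, 10) are
    assigned to the weaker of the two adjacent categories.
    """
    if bf10 <= 0:
        raise ValueError("a Bayes factor must be positive")
    if bf10 > 10:
        return "strong evidence for H1"
    if bf10 > 3:
        return "moderate evidence for H1"
    if bf10 >= 1 / 3:
        return "weak/inconclusive evidence"
    if bf10 >= 1 / 10:
        return "moderate evidence for H0"
    return "strong evidence for H0"


for value in (0.18, 0.49, 5.0):
    print(value, "->", classify_bf10(value))
```

When models are compared to the best fitting model, as in the Bayesian ANOVAs reported here, "H0" in this sketch corresponds to the best model and "H1" to the model it is compared with.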
As error rates were very low (pick smaller experiment: M = 4.3%, SD = 2.0%; pick larger experiment: M = 3.7%, SD = 2.1%) analyses focused on reaction times (RT). Practice trials and filler items were excluded from the analyses. Moreover, RTs faster than 200ms as well as RTs deviating more than +/- 3SD from an individual participant’s mean RT were excluded. This trimming procedure resulted in a loss of 1.4% of data.
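The trimming procedure can be sketched as follows for the RTs of a single participant. The sketch is written in Python rather than R, and several details are assumptions not specified in the text: the mean and SD are computed over all of the participant's experimental trials before any exclusion, the sample SD is used, and the function name `trim_rts` is hypothetical.

```python
from statistics import mean, stdev


def trim_rts(rts_ms: list[float], floor_ms: float = 200.0, n_sd: float = 3.0) -> list[float]:
    """Drop anticipations and outliers from one participant's RTs.

    Excludes RTs faster than `floor_ms` and RTs deviating more than
    `n_sd` standard deviations from the participant's mean RT.
    Assumption: mean and SD are computed on the untrimmed RTs.
    """
    m = mean(rts_ms)
    s = stdev(rts_ms)
    lower, upper = m - n_sd * s, m + n_sd * s
    return [rt for rt in rts_ms if rt >= floor_ms and lower <= rt <= upper]


# Toy data: thirty regular trials, one anticipation, one very slow response.
toy = [700] * 30 + [150, 5000]
kept = trim_rts(toy)
print(len(toy) - len(kept), "trials removed")
```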
Directly addressing our primary research question, we first report results of the analyses of numerical effects in the new pick smaller experiment using three paired t-tests (i.e., one per numerical effect). Effect sizes (Cohen’s d for paired t-tests) along with 95% confidence intervals were estimated as implemented in JASP. Moreover, a 2 × 2 × 2 × 2 mixed design ANOVA similar to the one reported by Bahnmueller et al. (2015) discerning the within-subject factors hundred distance, hundred-decade compatibility, and hundred-unit compatibility, as well as the between-subject factor language group (German vs. English) will also be reported for the pick smaller experiment.
Analyses of the pick smaller experiment are directly followed by the re-analysis of the results of the pick larger experiment using the same, more focused analyses (i.e., one paired t-test per numerical effect). Afterwards, results of the direct comparison of the two experiments are reported separately for mean reaction times and each numerical effect using both frequentist as well as Bayesian measures to be able to quantify the evidence for both the null and the alternative hypothesis.
Pick Smaller Experiment
Results of t-tests indicated a regular hundred distance effect with faster RTs for number pairs with a large (M = 694ms, SD = 123ms) as compared to a small hundred distance, M = 788ms, SD = 147ms, t(52) = 18.80, p < .001, d = 2.58, 95% CI [2.02, 3.14]. Moreover, both the hundred-decade, t(52) = 6.34, p < .001, d = 0.87, 95% CI [0.55, 1.18], and the hundred-unit compatibility effects were significant, t(52) = 6.89, p < .001, d = 0.95, 95% CI [0.62, 1.27]. Responses were faster for compatible (hundred-decade: M = 731ms, SD = 135ms; hundred-unit: M = 728ms, SD = 132ms) compared to incompatible number pairs (hundred-decade: M = 749ms, SD = 134ms; hundred-unit: M = 751ms, SD = 138ms). The significance of results remains unchanged when correcting for multiple comparisons. Thus, all three numerical effects were also present when participants had to pick the smaller number.
We further ran a 2 × 2 × 2 × 2 mixed design ANOVA discerning the within-subject factors hundred distance, hundred-decade compatibility, and hundred-unit compatibility, as well as the between-subject factor language group (German vs. English) for the pick smaller experiment. As expected based on the results of the t-tests above, we observed significant main effects of hundred distance, F(1, 51) = 353.75, p < .001, ηp² = .87, hundred-decade compatibility, F(1, 51) = 46.23, p < .001, ηp² = .48, and hundred-unit compatibility, F(1, 51) = 51.32, p < .001, ηp² = .50. Moreover, the interaction of hundred distance and hundred-unit compatibility was significant, F(1, 51) = 4.66, p = .036, ηp² = .08, indicating that the hundred-unit compatibility effect was significant for both small and large hundred distances (small: t(52) = 6.16, p < .001; large: t(52) = 4.03, p < .001) but was larger for small compared to large hundred distances, t(52) = 2.24, p = .029. Crucially, neither the main effect of language group, F(1, 51) = 1.88, p = .176, ηp² = .04, nor any of the interactions with language group were significant (all p ≥ .142). Thus, results for the pick smaller experiment provide no evidence for a difference in numerical effects between German and English speakers, replicating observations of Bahnmueller et al. (2015) previously reported for the pick larger experiment. Results of a parallel Bayesian mixed design ANOVA showing a comparable pattern can be found in the Supplementary Materials.
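The effect sizes reported for the pick smaller experiment can be cross-checked against the test statistics using two standard identities: for a paired t-test, Cohen’s d equals t divided by the square root of the sample size, and partial eta squared equals (F × df1) / (F × df1 + df2). The Python sketch below only illustrates these identities; the function names are hypothetical and the snippet is not the analysis script from the Supplementary Materials.

```python
import math


def paired_d_from_t(t: float, n: int) -> float:
    """Cohen's d for a paired t-test: mean difference / SD of differences."""
    return t / math.sqrt(n)


def partial_eta_sq(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Hundred distance effect in the pick smaller experiment (N = 53).
print(round(paired_d_from_t(18.80, 53), 2))      # approximately 2.58
print(round(partial_eta_sq(353.75, 1, 51), 2))   # approximately 0.87
```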
Pick Larger Experiment
Paralleling analyses of the pick smaller experiment and providing a more focused analysis than the one presented in Bahnmueller et al. (2015), three separate paired t-tests were also run for the pick larger experiment. Comparable to the pick smaller study, a significant hundred distance effect was observed showing faster RTs for number pairs with a large (M = 728ms, SD = 160ms) as compared to a small hundred distance, M = 815ms, SD = 176ms, t(50) = 22.88, p < .001, d = 3.20, 95% CI [2.52, 3.88]. In addition, the effect of hundred-decade compatibility was significant, t(50) = 4.19, p < .001, d = 0.59, 95% CI [0.29, 0.88], indicating that compatible number pairs (M = 764ms, SD = 172ms) were responded to faster than incompatible number pairs (M = 777ms, SD = 165ms). Finally, the effect of hundred-unit compatibility was also significant, t(50) = 6.79, p < .001, d = 0.95, 95% CI [0.62, 1.28], with compatible number pairs (M = 757ms, SD = 165ms) being responded to faster than incompatible number pairs (M = 784ms, SD = 172ms). Again, the significance of results remains unchanged when correcting for multiple comparisons. Refer to Bahnmueller et al. (2015) for results of the analysis of the full factorial design.
Pick Smaller vs. Pick Larger Experiment
Mean Reaction Time
Results of an independent-samples t-test showed no significant difference in mean RT between the pick larger (M = 770ms, SD = 168ms) and the pick smaller task instruction (M = 740ms, SD = 134ms), t(102) = 1.03, p = .306, d = 0.20, 95% CI [-0.19, 0.59].
Modulation of the Hundred Distance Effect
A mixed design ANOVA with the within-subject factor hundred distance (small vs. large) and the between-subject factor instruction (pick smaller vs. pick larger) revealed a significant effect of hundred distance, F(1, 102) = 818.76, p < .001, ηp² = .89 (small: M = 801ms, SD = 162ms; large: M = 711ms, SD = 143ms). Neither the main effect of instruction, F(1, 102) = 1.05, p = .308, ηp² = .01 (pick smaller: M = 741ms, SD = 143ms; pick larger: M = 771ms, SD = 173ms), nor the interaction of hundred distance and instruction, F(1, 102) = 1.56, p = .214, ηp² = .02, was significant.
To quantify the evidence in case of non-significant results, we further ran a Bayesian mixed design ANOVA using default JASP prior scales. It revealed that the data were best represented by a model that included the main effect of hundred distance only. The Bayes Factor (BF10) for this model was 4.33 × 10⁴⁶, indicating strong evidence for this model over the null model. Results further showed strong evidence against the model only including the main effect of instruction (BF10 = 1.29 × 10⁻⁴⁷ or BF01 = 7.75 × 10⁴⁶) as the data were 7.75 × 10⁴⁶ times more likely under the best model (i.e., the model only including the main effect of hundred distance). Moreover, results revealed weak/inconclusive evidence against the model including both main effects (BF10 = 0.49 or BF01 = 2.03) and moderate evidence against the model additionally including the interaction term (BF10 = 0.18 or BF01 = 5.55) when compared to the best model (see Table 1, see also Supplementary Materials for JASP output and analyses files).
Table 1
Models | p(m) | p(m\|data) | BFM | BF10 | Effects | p(incl) | p(incl\|data) | BFincl
---|---|---|---|---|---|---|---|---
HD | .20 | .60 | 5.95 | 1.00 | HD | .40 | .89 | 4.15 × 10⁴⁶
HD + Instr | .20 | .29 | 1.67 | 0.49 | Instr | .40 | .29 | 0.49
HD + Instr + HD × Instr | .20 | .11 | 0.48 | 0.18 | HD × Instr | .20 | .11 | 0.37
Null Model | .20 | 1.38 × 10⁻⁴⁷ | 5.52 × 10⁻⁴⁷ | 2.31 × 10⁻⁴⁷ | | | |
Instr | .20 | 7.72 × 10⁻⁴⁸ | 3.09 × 10⁻⁴⁷ | 1.29 × 10⁻⁴⁷ | | | |
Note. Columns p(m) to BF10 report the model comparison; columns Effects to BFincl report the analyses of effects. HD = hundred distance; Instr = instruction; m = model; incl = inclusion. Models are compared to the best fitting model (i.e., the model only including the main effect of HD).
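To make the columns of the Bayesian model comparison easier to read, the following Python sketch shows two relations that hold by definition: BF01 is the reciprocal of BF10, and BFM expresses the change from prior to posterior model odds. The function names are hypothetical. Because the sketch works with the rounded values printed in the table, it reproduces the tabled entries only approximately.

```python
def bf01_from_bf10(bf10: float) -> float:
    """Evidence for the reference model is the reciprocal of BF10."""
    return 1.0 / bf10


def model_bf_m(prior: float, posterior: float) -> float:
    """Change from prior to posterior model odds (the BFM column)."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = posterior / (1.0 - posterior)
    return posterior_odds / prior_odds


# Model with the interaction term compared to the best model: BF10 = 0.18.
print(round(bf01_from_bf10(0.18), 2))

# Best model in Table 1: prior .20, rounded posterior .60 (tabled BFM = 5.95).
print(round(model_bf_m(0.20, 0.60), 2))
```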
Modulation of the Hundred-Decade Compatibility Effect
A mixed design ANOVA with the within-subject factor hundred-decade compatibility (compatible vs. incompatible) and the between-subject factor instruction revealed a significant effect of hundred-decade compatibility, F(1, 102) = 54.64, p < .001, ηp² = .35 (compatible: M = 747ms, SD = 154ms; incompatible: M = 763ms, SD = 150ms). The interaction of hundred-decade compatibility and instruction was not significant, F(1, 102) = 1.57, p = .213, ηp² = .02. The paralleling Bayesian mixed design ANOVA showed that the data were best represented by a model that included the main effect of hundred-decade compatibility only. The BF10 for this model was 1.72 × 10⁸, indicating strong evidence for this model when compared to the null model. Moreover, there was strong evidence against the model only including the main effect of instruction (BF10 = 2.51 × 10⁻⁹ or BF01 = 3.98 × 10⁸), indicating that the data are 3.98 × 10⁸ times more likely under the best model (i.e., only including the main effect of hundred-decade compatibility). Finally, results revealed weak/inconclusive evidence against the model including both main effects (BF10 = 0.45 or BF01 = 2.23) and moderate evidence against the model additionally including the interaction term (BF10 = 0.17 or BF01 = 5.97) when compared to the best model (see Table 2).
Table 2
Models | p(m) | p(m\|data) | BFM | BF10 | Effects | p(incl) | p(incl\|data) | BFincl
---|---|---|---|---|---|---|---|---
HDC | .20 | .62 | 6.49 | 1.00 | HDC | .40 | .90 | 1.74 × 10⁸
HDC + Instr | .20 | .28 | 1.54 | 0.45 | Instr | .40 | .28 | 0.45
HDC + Instr + HDC × Instr | .20 | .10 | 0.46 | 0.17 | HDC × Instr | .20 | .10 | 0.37
Null Model | .20 | 3.60 × 10⁻⁹ | 1.44 × 10⁻⁸ | 5.81 × 10⁻⁹ | | | |
Instr | .20 | 1.55 × 10⁻⁹ | 6.21 × 10⁻⁹ | 2.51 × 10⁻⁹ | | | |
Note. Columns p(m) to BF10 report the model comparison; columns Effects to BFincl report the analyses of effects. HDC = hundred-decade compatibility; Instr = instruction; m = model; incl = inclusion. Models are compared to the best fitting model (i.e., the model only including the main effect of HDC).
Modulation of the Hundred-Unit Compatibility Effect
A final mixed design ANOVA with the within-subject factor hundred-unit compatibility (compatible vs. incompatible) and the between-subject factor instruction revealed a significant effect of hundred-unit compatibility, F(1, 102) = 93.43, p < .001, ηp² = .48 (compatible: M = 742ms, SD = 149ms; incompatible: M = 767ms, SD = 155ms). The interaction of hundred-unit compatibility and instruction was not significant, F(1, 102) = 0.47, p = .494, ηp² = .01. The corresponding Bayesian mixed design ANOVA showed that the data were best represented by a model that included the main effect of hundred-unit compatibility only. The BF10 for this model was 1.06 × 10¹³, indicating strong evidence for this model when compared to the null model. When compared to the best model (i.e., only including the main effect of hundred-unit compatibility), results revealed strong evidence against the model only including the main effect of instruction (BF10 = 5.61 × 10⁻¹⁴ or BF01 = 1.78 × 10¹³). Moreover, when compared to the best model, results revealed weak/inconclusive evidence against the model including both main effects (BF10 = 0.43 or BF01 = 2.32) and moderate evidence against the model additionally including the interaction term (BF10 = 0.11 or BF01 = 9.01; see Table 3).
Table 3
Models | p(m) | p(m\|data) | BFM | BF10 | Effects | p(incl) | p(incl\|data) | BFincl
---|---|---|---|---|---|---|---|---
HUC | .20 | .65 | 7.38 | 1.00 | HUC | .40 | .93 | 9.49 × 10¹²
HUC + Instr | .20 | .28 | 1.55 | 0.43 | Instr | .40 | .28 | 0.43
HUC + Instr + HUC × Instr | .20 | .07 | 0.31 | 0.11 | HUC × Instr | .20 | .07 | 0.26
Null Model | .20 | 6.15 × 10⁻¹⁴ | 2.46 × 10⁻¹³ | 9.48 × 10⁻¹⁴ | | | |
Instr | .20 | 3.64 × 10⁻¹⁴ | 1.46 × 10⁻¹³ | 5.61 × 10⁻¹⁴ | | | |
Note. Columns p(m) to BF10 report the model comparison; columns Effects to BFincl report the analyses of effects. HUC = hundred-unit compatibility; Instr = instruction; m = model; incl = inclusion. Models are compared to the best fitting model (i.e., the model only including the main effect of HUC).
Figure 1 illustrates Cohen’s d and 95% confidence intervals around the respective effect separately for each numerical effect and instruction (pick smaller vs. pick larger). In line with Bayesian analyses, similar point estimates and largely overlapping confidence intervals do not provide evidence for a difference in numerical effect between experiments.
Figure 1
Bin Analyses
To explore potential differences in the time course of the effects of interest both within and across experiments, we further ran a bin analysis dividing the RT distribution in each condition into four equal bins (i.e., from fastest to slowest RTs; cf. Arend & Henik, 2015). In contrast to Arend and Henik (2015), the result pattern did not show evidence for a systematic influence of RT bin on the numerical effects of interest (neither in the pick smaller nor in the pick larger experiment). The differential result pattern may result from differences in the effects under investigation (size congruity effect vs. distance and compatibility effects) and in number range (single- vs. multi-digit numbers). For the interested reader, results of these analyses are provided in the Supplementary Materials.
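The binning step described above can be sketched as follows. This is a minimal illustration, not the analysis script used in the study: the function names and the example RTs are hypothetical, and the sketch assumes that RTs are rank-ordered within one participant and condition, split into four (nearly) equally sized bins, and that the effect of interest is the difference between the bin means of two conditions.

```python
from statistics import mean


def bin_means(rts, n_bins=4):
    """Mean RT per bin after rank-ordering RTs from fastest to slowest."""
    ordered = sorted(rts)
    n = len(ordered)
    if n < n_bins:
        raise ValueError("need at least one RT per bin")
    # Integer bin boundaries that split the ordered RTs into (nearly) equal parts.
    edges = [i * n // n_bins for i in range(n_bins + 1)]
    return [mean(ordered[edges[i]:edges[i + 1]]) for i in range(n_bins)]


def binned_effect(rts_a, rts_b, n_bins=4):
    """Bin-wise difference in mean RT (condition B minus condition A)."""
    means_a = bin_means(rts_a, n_bins)
    means_b = bin_means(rts_b, n_bins)
    return [b - a for a, b in zip(means_a, means_b)]


# Hypothetical RTs (ms) of a single participant, for illustration only.
compatible = [612, 655, 690, 701, 734, 760, 802, 845]
incompatible = [630, 671, 712, 725, 760, 781, 835, 880]
print(binned_effect(compatible, incompatible))
```

In a full analysis, such bin-wise effects would be computed per participant and could then be compared across bins and experiments.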
Discussion
In a conceptual replication attempt of the study by Bahnmueller et al. (2015), the present study aimed at evaluating the generalizability of basic effects in multi-digit number processing across marked and unmarked task instructions. Overall, we replicated effects of hundred distance, hundred-decade compatibility, and hundred-unit compatibility that were previously reported using an unmarked task instruction (i.e., pick the larger number, cf. Bahnmueller et al., 2015) in a three-digit number comparison task using a marked task instruction (i.e., pick the smaller number). Results showed no significant difference in overall reaction times between the comparison tasks using the marked (pick smaller) and the unmarked (pick larger) task instruction. Moreover, no evidence for a difference between experiments in the size of any of the numerical effects was observed. These results were confirmed by Bayesian analyses providing moderate evidence against the interaction of task instruction and the respective numerical effect. Taken together, our data suggest that distance and compatibility effects, and with them the componential processing of multi-digit numbers, are largely unaffected by variations of the linguistic markedness of task instructions.
Numerical Effects and Task Instruction
In line with previous observations regarding three-digit number comparison tasks (Bahnmueller et al., 2015, 2016; Huber et al., 2013; Korvorst & Damian, 2008; Mann et al., 2012), we replicated both the hundred-decade and the hundred-unit compatibility effect as well as the effect of hundred distance in the pick smaller experiment. Importantly, effect sizes observed in the pick smaller experiment were very similar to those observed in the pick larger experiment, and the interaction between task instruction and the numerical effects of interest was not significant. Moreover, Bayesian analyses provided moderate evidence against an influence of linguistic markedness on the three numerical effects under investigation.
Thus, no major disruptions of the behavioural signatures of multi-digit Arabic number processing were observed when participants were confronted with a marked task instruction. Thereby, the present study provides further evidence for the robustness of the numerical effects under investigation and suggests that these numerical effects are not bound to specific experimental setups. Furthermore, as indexed by significant compatibility effects resulting from interference due to the decision-irrelevant tens/unit digit, the present study provides evidence for the componential processing account put forward for multi-digit number processing (cf. Huber et al., 2016).
General Performance and Task Instruction
In contrast to previous findings in single- and two-digit number comparison (Arend & Henik, 2015; Verguts & De Moor, 2005), however, we did not detect reliable differences in overall response times in frequentist analyses. Although the Bayesian analysis supports the null model, the evidential value is relatively weak. Thus, it is possible that with a larger sample the direction of the evidence would change, providing evidence for an effect of linguistic markedness. However, given our sample size, this scenario seems rather unlikely. What we can conclude is that an effect of linguistic markedness on general reaction times, if it exists, must be rather subtle. Furthermore, as overall reaction times were comparable between experiments, the mechanism through which we anticipated modulations of the compatibility effects (i.e., longer reaction times when confronted with the marked task instruction resulting in more elaborated processing of a stimulus and, therefore, increased interference due to the irrelevant tens/unit digit in incompatible trials) could not be demonstrated.
Moreover, it seems that most participants in the pick smaller experiment adapted well to the marked task instruction. Interestingly, in the pick smaller experiment, four participants had to be excluded from the analyses because they consistently picked the larger number although instructed to pick the smaller one. Similar confusions did not occur in the pick larger experiment. Thereby, our results may suggest that, when comparing numbers beyond the two-digit number range, following a marked task instruction relies on an initial categorical internalization of the task instruction rather than on a continuous, ongoing conflict or source of interference throughout the comparison task. As this account is rather speculative, future studies might consider manipulating linguistic markedness of the task instruction in within-participant designs, for instance, using a task-switching paradigm (cf. Shaki et al., 2012). In such a task-switching paradigm, participants would have to switch between marked and unmarked task instructions on a trial-by-trial basis when comparing numbers. This would allow for evaluating whether marked task instructions indeed influence multi-digit number processing on a trial-by-trial basis when an initial categorical internalization of the task instruction is not possible.
Conclusion
Taken together, we successfully replicated the main results reported by Bahnmueller et al. (2015), showing that distance and compatibility effects in a three-digit number comparison task generalize across marked and unmarked task instructions. Crucially, linguistic markedness of task instructions did not seem to influence basic numerical processing, as the size of the numerical effects was comparable between experiments using a marked and an unmarked task instruction. In particular, results suggest that basic strategies in three-digit number processing are rather robust against variations of the linguistic markedness of task instructions.