Many researchers have suggested that the ability to process numerical quantities stems from an evolutionarily old Approximate Number System (ANS; e.g., Dehaene, 1997; Cordes et al., 2001; Gallistel & Gelman, 1992). This system is believed by some researchers to solve the “symbol-grounding problem”, in the sense that it explains how children learn to associate semantic meaning with symbolic representations of number (i.e., the sounds of number words or their corresponding Arabic digits). The theory suggests that the ANS provides nonsymbolic representations of number, at least for numbers outside of the subitizing range (i.e., greater than 4), onto which we map our learned symbolic representations. However, more recently this account has been criticised on the grounds that ANS representations are not genuinely numerical (e.g., Gebuis, Cohen Kadosh, & Gevers, 2016; Leibovich & Ansari, 2016; Leibovich, Katzin, Harel, & Henik, 2017).

Given this debate, understanding how the ANS functions is of great theoretical and practical importance. In particular, the precision, or acuity, of children’s ANS representations has been found to predict their general mathematical performance (e.g., Chen & Li, 2014), leading some to hypothesize that training the ANS will improve mathematical performance (e.g., Park & Brannon, 2013). Clearly if such training were shown to be effective, there would be substantial implications for educational practice (but see Szűcs & Myers, 2017).

However, in recent years a debate about the nature of ANS representations has emerged.
In its original “number sense” conception, researchers posited that the representations
formed by the ANS are *numerical*, in the sense that they are related to natural numbers. That is to say that these
representations are formed by directly perceiving the cardinality of sets of objects,
such as a collection of apples or an array of dots. This assumption seems critical
to the proposal that the ANS is the mechanism through which our representations of
symbolic numbers derive their meaning. If there are important disanalogies
between ANS representations and mental representations of symbolic numbers, then it
seems less likely that the former are the basis of the latter.

There are good reasons to believe that ANS representations are in some sense numerical: the ANS can be used to compare, add and subtract sets of items (e.g., Barth et al., 2005, 2006; Brannon, 2002). However, more recently it has been suggested that, rather than being formed by directly perceiving (albeit approximately) the cardinality of a set, ANS representations are instead formed by amalgamating perceptual features that, taken together, are confounded with the set’s cardinality. For instance, sets of dots with a larger cardinality typically also have a larger cumulative surface area and a larger convex hull (the smallest convex region containing all the dots). Because it is not possible to control for all perceptual features of a set simultaneously, this account is hard to rule out through careful experimental design. As a result, some researchers have argued that the ANS may actually be a holistic sensory-integration system, based on continuous-magnitude cues rather than numerical cues (Gebuis, Cohen Kadosh, & Gevers, 2016; Leibovich, Katzin, Harel, & Henik, 2017). Critically, those who favour a sensory-integration account argue that the cues used to generate ANS representations are based on magnitudes that are non-numerical, even though these continuous magnitudes share some characteristics with real (but not natural) numbers (e.g., Leibovich et al., 2017; Núñez, 2017).

In response to the sensory-integration critique of the classical “number sense” account,
Halberda (2019) argued that the critique’s proponents mistakenly assume that perceptual input determines
conceptual content. Halberda suggested that, when considering whether or not ANS representations
are numerical, it is irrelevant how the representations are *formed*: what matters is how they *behave*. If ANS representations behave like natural numbers then they are, in all relevant
aspects, representations of natural numbers.

Interestingly, Halberda’s (2019) argument has a strong philosophical pedigree. In the classic paper *What Numbers Could Not Be,* Benacerraf (1965) made an analogous argument about natural numbers in general, arguing that the philosophically
relevant thing about numbers is not what they *are*, but how they *behave.* He illustrated his point by comparing two different formalizations of the natural
numbers, one based on Zermelo ordinals and one based on von Neumann ordinals. Under
the Zermelo approach the number 3 is {{{∅}}}; under the von Neumann approach it is {∅, {∅}, {∅, {∅}}} (here
∅ denotes the empty set, {∅} denotes the set containing the empty set, and {∅, {∅}}
denotes the set containing two sets, namely the empty set and the set containing the
empty set). Benacerraf pointed out that there is no fact of the matter about which
of these formalizations is correct: numbers are numbers not because of what they are,
but because of how they behave in relation to other objects within the larger structure.
Both {{{∅}}} and {∅, {∅}, {∅, {∅}}} can play the role of the number 3, because both
fit into wider abstract structures that have the properties we expect the natural
numbers to have (i.e., the usual notions of addition, subtraction, multiplication
and so on can be defined on both of these formal structures). This observation led
to the branch of the philosophy of mathematics known as *structuralism*.
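Benacerraf's point can be illustrated with a short sketch (ours, not Benacerraf's; Python `frozenset`s stand in for sets of sets, and all function names are illustrative):

```python
# Two set-theoretic encodings of the natural numbers, modelled with frozensets.
EMPTY = frozenset()

def zermelo_succ(s):
    # Zermelo: successor of n is {n}
    return frozenset([s])

def von_neumann_succ(s):
    # von Neumann: successor of n is n ∪ {n}
    return s | frozenset([s])

def encode(succ, n):
    # Build the encoding of n by applying the successor n times to 0 = ∅
    s = EMPTY
    for _ in range(n):
        s = succ(s)
    return s

# The two encodings of 3 are different sets...
assert encode(zermelo_succ, 3) != encode(von_neumann_succ, 3)
# ...and only the von Neumann encoding of n has n elements:
assert len(encode(zermelo_succ, 3)) == 1
assert len(encode(von_neumann_succ, 3)) == 3

def add(succ, a, b):
    # Addition defined purely structurally: apply the successor b more times
    s = encode(succ, a)
    for _ in range(b):
        s = succ(s)
    return s

# The structural operation behaves identically under both encodings:
assert add(zermelo_succ, 1, 2) == encode(zermelo_succ, 3)
assert add(von_neumann_succ, 1, 2) == encode(von_neumann_succ, 3)
```

Only the von Neumann encoding has the property that the set representing *n* has *n* elements, yet a purely structural operation such as successor-based addition behaves identically on both, which is the structuralist moral: what matters is the role an object plays in the wider structure, not what the object is.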

If we take the Halberda/Benacerraf approach seriously, then addressing how ANS representations are formed cannot answer the question of whether or not they are numerical. Instead, we must ask in which respects ANS representations behave like numbers and, in particular, whether they can play the role of numbers within some larger structure. Some have begun to address this issue explicitly. For example, Marshall (2018) asserted that ANS representations fail to behave numerically in at least four respects: they (i) are fuzzy and not discrete, (ii) lack the potential for infinity, (iii) lack general applicability (i.e., cannot be abstracted across modalities) and (iv) lack cardinal/ordinal duality (i.e., they lack the distinction between cardinal and ordinal numerical properties).

Clarke and Beck (2021) agreed with the Halberda/Benacerraf critique of sensory-integration accounts, but
disputed whether Marshall’s (2018) list of numerical properties not shared by ANS representations could resolve the
dispute. Such a resolution would require that Marshall subscribe to what Clarke and
Beck called the *strong sensitivity principle*, the idea that “if X has properties *p*_{1}, … *p*_{n} *essentially*, then representing X requires being sensitive to *all* of *p*_{1}, … *p*_{n}” (p. 21). Arguing that this principle was implausible, Clarke and Beck instead proposed the *weak sensitivity principle*, which only insists that representing X requires being sensitive to *some* of *p*_{1}, … *p*_{n}. So, although Marshall was clearly correct to draw attention to, for example, the fuzzy nature of ANS representations, and to note that this property is not shared by the natural numbers, under Clarke and Beck’s weak sensitivity principle this fact is not sufficient to conclude that ANS representations fail to represent the natural numbers.

These discussions lead us to suggest that treating the issue of whether ANS representations are numerical as a binary matter is unhelpful. Instead, we should ask in what respects they are numerical by identifying numerical properties that ANS representations have, and those they do not have. The experiments reported in this paper address this issue. In particular, we asked whether participants are able to multiply ANS representations.

Clearly multiplication is well defined on the natural numbers, and educated adults are able to multiply the mental representations they form when exposed to Arabic numerals (usually via recall for single-digit Arabic numerals, using paper-and-pencil strategies for larger multi-digit Arabic numerals). But can ANS representations be multiplied? Prior research has addressed this question, albeit in restricted contexts. For instance, Barth, Baron, Spelke, and Carey (2009) investigated whether 6-7 year old children could multiply ANS representations by a fixed quantity. Specifically, their participants were shown a set of dots that was then covered by an occluder and subjected to a ‘magic’ transformation that resulted in the number of dots being doubled (in a separate experiment, the transformation resulted in half the number of dots). The participants were then asked to compare the magnitude of this (still occluded) quantity to a separate display of dots. Participants performed at above-chance levels on this task, leading Barth et al. to conclude that children have the ability to perform simple transformations of ANS representations. They speculated that this may be a precursor to a more formal understanding of multiplication, and indeed rational numbers more generally.

In a similar study, McCrink and Spelke (2010) told 5-7 year old children that the magical transformation doubled the number of objects in a visual display following its occlusion (“Look! It’s our magic multiplying wand. It made more. There used to be one rectangle, and now there are two.”), again finding that children could assess the outcome of the multiplication at above-chance levels. McCrink and Spelke also found that children could perform this task when the multiplier was 4 and 2.5. Similarly, McCrink, Shafto, and Barth (2017) found that children could scale representations formed from nonsymbolic representations by factors 0.25, 0.5, 2 and 4. Critically, McCrink and Spelke noted that their task, like Barth et al.’s, differed from traditional symbolic multiplication, as taught in schools, in that the multiplier was fixed across trials. In each case, children were asked to multiply ANS representations by an unvarying quantity (0.5, 2, 2.5 or 4). In contrast, in typical symbolic contexts all three of multiplier, multiplicand and product have the potential to vary from problem to problem.

However, the multiplication of representations formed by observing arrays of dots has also been studied in a context where the multiplier, multiplicand and product vary across trials. Ciccione and Dehaene (2020) asked adults to enumerate arrays of dots grouped in different formations. For instance, in one condition, participants were asked to enumerate an ungrouped set of nine dots, and in another they were asked to enumerate three spatially separated groups of three dots. Ciccione and Dehaene found that response times were shorter when participants were able to use a multiplication strategy (e.g., form representations of the number of groups and the number of dots per group, and multiply them). This advantage on trials where a multiplication strategy was available was present both when groups were defined spatially, and also when they were defined by colour. Although this finding provides strong evidence that participants use mental arithmetic to enumerate where it is possible to do so, it does not directly speak to the question of whether or not ANS representations can be multiplied. The quantities used in Ciccione and Dehaene’s study were all within the subitizing range: 2, 3 or 4. In other words, Ciccione and Dehaene were studying whether or not the Object Tracking System (OTS), the system upon which nonsymbolic processing of quantities within the subitizing range is based (e.g., Lipton & Spelke, 2004), generates representations that can be multiplied.

Relatedly, in their studies of operational momentum in symbolic and nonsymbolic arithmetic, Katz and Knops (2014) asked adult participants to multiply two sets of dots, selecting the correct answer from five nonsymbolic alternatives (or seven alternatives in a subsequent similar study, Katz, Hoesterey, & Knops, 2017). They found that participants performed nonrandomly, in the sense that participants were influenced by the response options they were presented with (e.g., whether the correct answer was the second or fourth largest of the five presented). Nevertheless, overall accuracy was low, and close to the 20% chance level. Moreover, Katz and Knops’s problems crossed the subitizing range: of their 24 problems, there was 1 where both operands were within the subitizing range, 18 where one operand was within the subitizing range and one was outside, and 5 where both operands were outside the subitizing range. Given this range of problems it is difficult to draw conclusions about the relative performance of participants on problems requiring the multiplication of two OTS representations with those requiring the multiplication of two ANS representations. However, given Katz and Knops’s results, we might expect that adult participants should be able to perform at above-chance levels on multiplication problems where one operand is an OTS representation and one is an ANS representation. To our knowledge, no previous study has directly investigated whether participants are able to multiply two ANS representations (as opposed to two OTS representations) in a manner analogous to mental representations formed from Arabic numerals: i.e., where the multiplier and multiplicand are ANS representations that potentially vary across trials.

Note that if ANS representations can be multiplied in a similar manner to representations formed from Arabic numerals, then this would not allow us to rule out the possibility that they track (numerical) magnitudes rather than natural numbers. Gallistel (2011, p. 3) defined magnitudes to be real (or computable) numbers and, as with the natural numbers (and indeed other mathematical objects such as vectors, matrices, sets, etc.), it is possible to define the notion of multiplication on the real numbers. However, if ANS representations cannot be multiplied (either together, or with OTS representations), then we can conclude that multiplicability is a numerical property that ANS representations do not possess, and that there is at least one important disanalogy between ANS representations and mental representations of symbolic Arabic numerals.

In sum, our goal in this paper is to explore whether ANS representations can be multiplied. We presented participants with a multiplicand and multiplier, both displayed as arrays of dots, and asked them to verify whether a third array of dots represented the correct product. This allowed us to explore whether participants are able to multiply representations formed from nonsymbolic arrays. By varying the cardinality of our arrays across subitizing (2-4) and non-subitizing ranges (5-8) we explored whether there is a difference between ANS representations and OTS representations with respect to whether they are multipliable. In Experiment 1 we concentrated on the issue of whether multiplications are possible where both multiplicand and multiplier are ANS representations. In Experiment 2 we also explored whether multiplication problems can be successfully solved when the multiplicand is an ANS representation and the multiplier is an OTS representation (and vice versa).

## Experiment 1

### Method

#### Participants

An opportunity sample of 100 adults consented to participate online after having been
recruited through Prolific (www.Prolific.co). We preregistered that if, after data cleaning (following preregistered exclusion
criteria, detailed below and in the Supplementary Materials), fewer than 80 complete datasets had been collected, then we would continue to collect
data until a minimum sample size of 80 had been achieved. This figure was estimated
via a pilot study. After two rounds of data collection and data cleaning, 96 participants’
data were available. Participants’ mean age was 24.3 years (*SD* 5.8 years); 66 were female, 29 male, and one person preferred not to say or identified
as another sex. Participants were reimbursed £1.25 for participation. Participation
was voluntary and study procedures were approved by Loughborough University’s Ethical
Approvals (Human Participants) Sub-Committee.

#### Design, Apparatus and Materials

The study used a fourteen-condition repeated-measures design: both the symbolic and
nonsymbolic arithmetic tasks used the multiplicands 2-8, each multiplied by itself
to form seven unique questions (2×2, 3×3, …, 8×8) per task. The study
was programmed in Gorilla^{TM} (www.Gorilla.sc) and completed online. The experiment files can be found online (see Supplementary Materials).

##### Nonsymbolic Task

Nonsymbolic stimuli were created in MATLAB using the CUSTOM scripts (De Marco & Cutini, 2020). The nonsymbolic task consisted of 140 experimental trials. For experimental trials, the seven unique multiplicands/multipliers (2-8) were presented twenty times, half with the correct product and half with the incorrect product. For seventy trials, a set combination of dot arrays was used for each question’s multiplicand, multiplier and correct product (a total of 210 unique dot arrays, for 70 unique-stimulus combinations). Questions were then repeated with incorrect products (a further 70 unique dot arrays for incorrect products). Half the incorrect products were too big (correct answer / 0.7) and half were too small (correct answer × 0.7).
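The product construction described above can be sketched as follows (an illustrative Python sketch, not the authors' MATLAB/CUSTOM code; rounding foil products to whole dot counts is our assumption, as the paper specifies only the ×0.7 and ÷0.7 ratios):

```python
def foil_products(n, ratio=0.7):
    """For a square problem n x n, return the correct product and the two
    incorrect foils: too small (correct x 0.7) and too big (correct / 0.7).
    Rounding to whole dot counts is an assumption for illustration."""
    correct = n * n
    too_small = round(correct * ratio)
    too_big = round(correct / ratio)
    return correct, too_small, too_big

# e.g., for the 7 x 7 problem: correct product 49, foils 34 and 70
assert foil_products(7) == (49, 34, 70)
```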

To ensure participants could easily differentiate between different arrays, multiplicands were always pink, multipliers were blue and purported products were black. All stimuli were presented on a plain white background. Trials began with a fixation cross (250ms), dot arrays were presented for 300ms and response screens remained until the participants responded. Between the multiplier and multiplicand, a blank screen was used in lieu of a fixation cross to avoid confusion with, or priming of, addition-based arithmetic. Participants responded with a key press, and response screens reminded participants of the experimental instructions, “Press M if you think dots A × dots B EQUALS dots C. Press Z if you think dots A × dots B does NOT EQUAL dots C”. Response screens also had a plain white background. All 140 trials were presented in a random order for each participant. Participants completed six practice questions, with feedback, before starting the main task. Figure 1 shows a schematic representation of an example trial.

##### Figure 1

The diameters of dots were kept the same across trials, whereas the contour, surface area, convex hull and the average distance between dots varied across stimuli images. Note that we did not attempt to control our nonsymbolic stimuli for these, or other, non-numerical cues (e.g. Clayton, Gilmore, & Inglis, 2015; Gebuis & Reynvoet, 2012). At no point were participants asked to compare the size of two or more dot arrays, so the issue of basing decisions on non-numerical confounds did not arise in our design. Because of this, our stimuli were more ecologically valid than typical stimuli used in nonsymbolic comparison tasks: ANS representations in day-to-day life are formed from stimuli where various continuous quantities are correlated with numerosity. If participants were able to multiply two ANS representations, we would expect our stimuli to give them the best possible chance to do so.

##### Symbolic Task

For the symbolic task, arithmetic questions and answers, colour schemes, response screens, presentation durations, and response keys were all identical to the nonsymbolic task. Here, 56 trials were presented with 8 presentations of each of the 7 unique questions. Numerals were always presented centrally on the screen in Calibri 150-point font. All 56 trials were presented in a random order for each participant. There were no practice questions, but instructions were provided before the task and were reprinted on the response screens.

#### Procedure

Participants completed the study on desktop or laptop computers (a Prolific filter ensured this). Participants first gave their consent to participate and then answered some demographic questions (their age, sex, highest qualification, whether English is their first language, and which country they live in). Then, they completed the nonsymbolic arithmetic task followed by the symbolic arithmetic task. As an attention check, a strange word was embedded in the task instructions; participants were asked to read the instructions carefully and to report this word. At the end of the study, we again asked participants what their highest qualification was and whether they had experienced any technical issues with the study. They received a short written debrief. The session lasted approximately twelve minutes.

#### Analysis

We pre-registered a set of analyses at aspredicted.org (#78603, see Supplementary Materials); raw data and analyses are available as Online Supplements. We pre-registered six exclusion criteria. Before data analysis, we removed any participants
who withdrew before the end of the experiment (*N* not recorded by Gorilla), were aged under 18 (*N* = 0), entered nonsensical items to a free-text question (*N* = 0), did not select the same answer to two identical multiple-choice questions (about
participants’ qualifications, *N* = 1), did not correctly identify the ‘strange word’ embedded within two sets of experimental
instructions (*N* = 4) and had accuracy at or below 67% for any question in the symbolic-arithmetic
task (*N* = 18). This ensured that average symbolic task performance was good and that, on
an individual level, every participant could answer all of the symbolic questions
(with an accuracy of at least 6 out of 8 correct for each question). Additionally,
participants were removed for misunderstanding or misremembering the task instructions (*N* = 3): in response to the question ‘Did everything go alright with the study?’, two participants reported that they had performed mental addition, rather than multiplication, in the nonsymbolic task, and one reported confusion over the answer keys. These exclusions were not pre-registered, but were deemed necessary
to protect the integrity of the data. Overall, 24 participants’ data were removed
(2 participants failed on more than one criterion).

### Results

#### Pre-Registered Analyses

Mean accuracies for the various conditions are shown in Figure 2. A 7 (problem: 2×2, 3×3, 4×4, 5×5, 6×6, 7×7, 8×8) by 2 (task: symbolic, nonsymbolic)
within-subjects ANOVA was conducted on percentage accuracy. There was a significant
main effect of problem, *F*(6, 570) = 107.39, *p* < .001,
${\mathrm{\eta}}_{\mathrm{p}}^{2}$ = .531. There was also a significant main effect of task, *F*(1, 95) = 4612.72, *p* < .001,
${\mathrm{\eta}}_{\mathrm{p}}^{2}$ = .980. We predicted, in line with the hypothesis that OTS but not ANS representations
are multipliable, that there would be an interaction whereby accuracy would be high
throughout the symbolic condition (our exclusion criteria assured this), but that
participants’ performance on the nonsymbolic trials would vary by number. As predicted,
there was a significant interaction, *F*(6, 570) = 110.26, *p* < .001,
${\mathrm{\eta}}_{\mathrm{p}}^{2}$ = .537. To investigate this interaction further, for the nonsymbolic task, preregistered
one-sample *t*-tests showed that accuracy was significantly above 50% (chance level) for problems
with small subitizable multiplicands/multipliers 2-4 (all *p*s < .01, *d*s > 0.25), but not for those with larger multiplicands/multipliers 5-8 (all *p*s ≥ .19, *d*s < 0.1).

##### Figure 2

We further predicted that, within the nonsymbolic task, accuracy would be highest
for 2×2, would be lower for 3×3, and then lower again for 4×4. This prediction derived
from the observations that (i) for 2×2 trials, two of the three purported solutions
presented to participants were within the subitizing range for most participants,
whereas none were for the 3×3 and 4×4 trials, and (ii) whereas 3 is subitizable for
all participants, it has been found that not everyone can subitize 4 (e.g., Leibovich-Raveh, Lewis, Kadhim, & Ansari, 2018), suggesting that accuracy should be higher for the 3×3 trials than the 4×4 trials.
As predicted, paired-sample *t*-tests indicated that accuracy was higher for 2×2 trials than both 3×3, *t*(95) = 14.16, *p* < .001, *d* = 1.45, and 4×4 trials, *t*(95) = 20.16, *p* < .001, *d* = 2.06. Accuracy was also higher for 3×3 trials than 4×4 trials, *t*(95) = 4.35, *p* < .001, *d* = 0.44. These results suggest a gradual reduction in accuracy as subitizable multiplicands
and multipliers increase in size.

#### Modelling

The results shown in Figure 2 are not, on their own, sufficient to conclude that ANS representations cannot be
multiplied. In addition, we need to demonstrate that if there were any participants
who generated ANS representations, successfully multiplied them, and then compared
them to their ANS representation of the purported product, they would show accuracy
rates above 50%. To explore this question, we modelled an ANS-based solution to our
task as follows. We assumed that participants, upon seeing the multiplicand array
of *n* dots, would generate an ANS representation, *p*, by sampling from the normal distribution N(*n*, *wn*), where *w* is their personal Weber fraction, a measure of the acuity (precision) of their ANS.
Similarly, we assumed they would then generate an ANS representation of the multiplier,
*q*, by sampling again from the same distribution N(*n, wn*). When observing the purported product *m* (equal to either 0.7*n*^{2}, *n*^{2} or *n*^{2}/0.7), we assumed participants would generate an ANS representation of the purported
product, *r*, by sampling from N(*m*, *wm*). Having formed representations for *p* and *q*, we assumed that participants would then multiply these (of course, for the purposes
of this model, we are assuming that participants are able to multiply ANS representations)
to generate a representation of *pq* (recall that our exclusion criteria ensured that all the participants in our sample
were successfully able to multiply numbers, expressed as Arabic digits, in the range
used in the experiment). Finally, we assumed that participants would compare *pq* to *r*, following the method outlined by Piazza et al. (2004, Supplementary Materials, §1.3). Specifically, we operationalized the window within
which participants were willing to label two ANS representations as representing the
same number with a parameter, δ. So a participant would respond “correct” if *pq* was within δ of *r*, i.e., if (1 − δ)*r* ≤ *pq* ≤ (1 + δ)*r*, and “incorrect” otherwise.
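The decision model just described can be sketched as follows (an illustrative Python translation; the authors' simulation was written in R, and the parameter values here are examples only):

```python
# Sketch of the ideal ANS-multiplier model: w is the Weber fraction,
# delta the acceptance window. Both values below are illustrative.
import random

def simulate_accuracy(n=7, w=0.2, delta=0.2, trials=10_000, seed=1):
    rng = random.Random(seed)
    hits = 0
    for i in range(trials):
        # Purported product m: correct (n^2) on half of trials,
        # otherwise a foil (n^2 * 0.7 or n^2 / 0.7), as in the task
        if i % 2 == 0:
            m, is_correct = float(n * n), True
        else:
            m = n * n * 0.7 if i % 4 == 1 else n * n / 0.7
            is_correct = False
        p = rng.gauss(n, w * n)  # ANS representation of the multiplicand
        q = rng.gauss(n, w * n)  # ANS representation of the multiplier
        r = rng.gauss(m, w * m)  # ANS representation of the purported product
        # Respond "correct" iff pq falls within the delta window around r
        says_equal = (1 - delta) * r <= p * q <= (1 + delta) * r
        if says_equal == is_correct:
            hits += 1
    return hits / trials

# An ideal ANS multiplier performs above the 50% chance level
assert simulate_accuracy() > 0.5
```

With a mid-range Weber fraction and a moderate δ, this model responds correctly well above chance, which is the logic behind the simulation reported below: above-chance accuracy is what ANS-based multiplication would predict.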

We ran a simulation for each problem outside the subitizing range (R code is available
in the Supplementary Materials) to explore whether participants’ accuracy was consistent with them using their ANS
in this manner. The outcomes for 7×7 are shown in Figure 3, but essentially identical results were obtained for other problems. Specifically,
Figure 3 demonstrates that we would expect accuracy levels well above 50% for all δs in the
range 0.05 to 0.5, for low-*w*, medium-*w* and high-*w* samples (operationalized here as being populations with mean *w*s of 0.15, 0.2 and 0.25 respectively). The only situation under which accuracy would
be expected to approach 50% is when δ approaches 0 (an implausible scenario in which
we would expect participants to respond “incorrect” to all trials). In sum, if participants
were successfully multiplying ANS representations to tackle this task, they would
be expected to achieve accuracy levels well above 50%.

##### Figure 3

### Discussion

In sum, we found that participants were able to successfully multiply nonsymbolic quantities, but only when those quantities were within the subitizing range. In contrast, our participants did not perform at above-chance levels when asked to multiply numbers greater than 4. This pattern of performance is not consistent with the proposal that ANS representations are multipliable. In Experiment 1 we only asked participants to tackle square multiplication problems. To extend our results to the situation where the multiplicand and multiplier are different, we ran a second experiment. This allowed us to also consider the situation where one operand is from within, and one is from outside, the subitizing range.

## Experiment 2

### Method

#### Participants

An opportunity sample of 100 adults consented to participate online after having been
recruited through Prolific (www.Prolific.co). We preregistered that if, after data cleaning (following preregistered exclusion
criteria, detailed below and in the Supplementary Materials), fewer than 80 complete datasets had been collected, then we would continue to collect
data until a minimum sample size of 80 had been achieved. After two rounds of data
collection and data cleaning, 93 participants’ data were available. Participants’
mean age was 28.3 years (*SD* 9.6 years); 35 were female and 58 male. Participants were reimbursed £1.25 for
participation. Participation was voluntary and study procedures were approved by Loughborough
University’s Ethical Approvals (Human Participants) Sub-Committee.

#### Design, Apparatus and Materials

The study used an eight-condition repeated-measures design, with two independent variables:
task type (symbolic and nonsymbolic) and multiplicand/multiplier sizes (small × small,
small × large, large × small, large × large). ‘Small’ multiplicands/multipliers were
defined as numerals 2-4 (i.e., within the subitizing range and processed by the OTS)
and ‘large’ multiplicands/multipliers were defined as numerals 5-7 (i.e., outside
of the subitizing range and processed by the ANS). Unlike Experiment 1, this study
included both square and non-square problems (e.g., 3 × 3, 3 × 4). The study
was programmed in Gorilla^{TM} (www.Gorilla.sc) and completed online. The experiment files can be found online (see Supplementary Materials).

##### Nonsymbolic Task

The creation of dot arrays used the same method as in Experiment 1. The nonsymbolic task consisted of 144 experimental trials. The four unique trial types (small × small, small × large, large × small, large × large) were presented 36 times, half with the correct product and half with the incorrect product. Within each trial type, there were three multipliers, three multiplicands and two answer types (correct/incorrect) resulting in eighteen unique question types per condition (3 × 3 × 2). Each question type was repeated twice. For 72 trials, a set combination of dot arrays was used for each question. Questions were then repeated with incorrect products (an additional 72 unique answer arrays). Half the incorrect products were too big (correct answer / 0.7) and half were too small (correct answer × 0.7). All other task details were the same as Experiment 1.

##### Symbolic Task

For the symbolic task, arithmetic questions and answers, colour schemes, response screens, presentation durations, and response keys were all identical to the nonsymbolic task. Here, 144 trials were presented with 36 presentations of each of the four trial types (small × small, small × large, large × small, large × large). Numerals were always presented centrally on the screen in Calibri 150-point font. All 144 trials were presented in a random order for each participant. There were no practice questions, but instructions were provided before the task and were reprinted on the response screens.

#### Procedure

The procedure was identical to Experiment 1.

#### Analysis

We pre-registered a set of analyses at aspredicted.org (#81047, see Supplementary Materials); raw data and analyses are available as Online Supplements. We pre-registered seven exclusion criteria. Before data analysis, we removed any
participants who withdrew before the end of the experiment (*N* not recorded by Gorilla), were aged under 18 (*N* = 0), entered nonsensical items to a free-text question (*N* = 0), did not select the same answer to two identical multiple-choice questions (*N* = 0), did not correctly identify the ‘strange word’ embedded within two sets of experimental
instructions (*N* = 1), expressed confusion over the task instruction in response to the open-ended
question “Did everything go well with this study?” (*N* = 3), and had accuracy at or below 85% (30 out of 36 questions) for any of the four
trial types in the symbolic-arithmetic task (*N* = 26). Collectively, 27 data sets were removed as three participants were excluded
on two criteria. Our final sample had 93 participants.

### Results

#### Pre-Registered Analyses

Mean accuracies for the various conditions are shown in Figure 4. A 4 (type: small × small, small × large, large × small, large × large) by 2 (task:
symbolic, nonsymbolic) within-subjects ANOVA was conducted on percentage accuracy.
There was a significant main effect of type, *F*(3, 276) = 32.44, *p* < .001,
${\mathrm{\eta}}_{\mathrm{p}}^{2}$ = .261. There was also a significant main effect of task, *F*(1, 92) = 5987.46, *p* < .001,
${\mathrm{\eta}}_{\mathrm{p}}^{2}$ = .985. We predicted that there would be an interaction whereby accuracy would be
high throughout the symbolic condition (as in Experiment 1, our exclusion criteria
ensured this), but that participants’ performance on the nonsymbolic trials would
vary by problem type. In particular, we predicted that accuracy would be significantly
above chance for small × small problems, but not for large × large problems. We had
no specific predictions for the small × large and large × small problems (but in light
of Katz and Knops’s (2014) results, perhaps one might expect slightly above-chance performance).

##### Figure 4

As predicted, there was a significant interaction, *F*(3, 276) = 33.96, *p* < .001,
${\mathrm{\eta}}_{\mathrm{p}}^{2}$ = .270. To investigate this interaction further, for the nonsymbolic task, pre-registered
one-sample *t*-tests showed that accuracy was significantly above 50% (chance level) for the small
× small, small × large, and large × small problem types, all *p*s < .005, *d*s > 0.3, but not for the large × large problem type, *p* = .253, *d* = 0.119. Notably, although the mean accuracies for the small × large and large ×
small problems were significantly above chance (*p* = .002 and *p* < .001 respectively), with small-to-medium standardised effect sizes (*d* = 0.323 and *d* = 0.536 respectively), the mean unstandardised accuracies in these conditions, at
52.7% and 54.3% respectively, were only slightly above chance, and well below the
62.8% accuracy achieved by participants in the small × small condition (*d* = 1.210).

Finally, we pre-registered a paired *t*-test to compare accuracy on the small × small and large × large problems (the two
conditions most analogous to those in Experiment 1). As expected, the small × small
problems had a significantly higher mean accuracy than the large × large problems,
*t*(92) = 8.388, *p* < .001, *d* = 0.871.
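The pre-registered comparisons against chance report *t* statistics and Cohen's *d* values computed from per-participant accuracies. A minimal sketch of this calculation, assuming the conventional one-sample formulas; the accuracy values below are made-up placeholders, not the study's data:

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_test(scores, chance=0.5):
    """One-sample t statistic for per-participant accuracies against
    chance, plus Cohen's d = (mean - chance) / SD."""
    n = len(scores)
    m, s = mean(scores), stdev(scores)  # stdev uses n - 1 (sample SD)
    t = (m - chance) / (s / sqrt(n))
    d = (m - chance) / s
    return t, d

# Hypothetical accuracies for five participants (illustration only).
scores = [0.60, 0.55, 0.65, 0.70, 0.50]
t, d = one_sample_test(scores)
print(round(t, 2), round(d, 2))  # 2.83 1.26
```

Note that *t* grows with the square root of the sample size while *d* does not, which is why a modest *d* of 0.323 can still be highly significant with 93 participants.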

#### Reanalyzing Qu, Szkudlarek, and Brannon (2021)

While we were writing this paper, Qu, Szkudlarek, and Brannon (2021) published an article reporting a study that investigated a similar question. They asked whether approximate nonsymbolic multiplication is possible in young children. Given our finding that educated adults were unable to successfully multiply ANS representations, we expected that the answer to Qu et al.’s question would be no. In contrast, they found that the children in their study were able to complete their approximate multiplication task at above-chance levels.

Qu et al.’s (2021) task involved showing participants an image of a flower. The multiplicand was displayed as an array of dots on one of the flower’s petals, and the multiplier as the number of petals on the flower (with the dots visible on only one petal). Participants were then shown two candidate solutions to the multiplication and had to pick the correct one.

One important difference between our study and Qu et al.’s (2021), which could explain the conflicting results, is that Qu et al.’s problems crossed the subitizing range. This was not considered as a factor in their analysis: in some trials both the multiplicand and the multiplier were subitizable, in some only one was subitizable, and in some neither was subitizable. Qu et al.’s overall accuracy figure (66.7% in the no-feedback condition) was derived by collapsing across all three trial types.

To investigate whether this difference could account for our differing conclusions,
we reanalysed Qu et al.’s (2021) data in a manner similar to our Experiment 2. Specifically, we split their trials
into four categories: trials that involved small × small, small × large, large × small
and large × large multiplications (dots × petals in each case). (Following best practice,
Qu et al. posted their data online along with their manuscript. We are grateful to
them for facilitating this reanalysis.) Note that the mean ratios of the two candidate
solutions displayed to participants by Qu et al. were similar across these four problem
types (means 0.544, 0.532, 0.566 and 0.552 (*SD*s 0.162, 0.159, 0.170, 0.158) for the small × small, small × large, large × small
and large × large problems respectively), suggesting that it is reasonable to compare
the accuracies in these conditions without concern about a confound with ratio.
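The four-way split used in this reanalysis amounts to a simple classification rule over the two operands. A sketch (the cutoff of 5 follows the paper's treatment of quantities greater than 4 as outside the subitizing range):

```python
def operand_size(x, cutoff=5):
    """Classify an operand as subitizable ('small', below 5) or not ('large')."""
    return "small" if x < cutoff else "large"

def trial_type(dots, petals):
    """Assign a dots x petals trial to one of the four reanalysis categories."""
    return f"{operand_size(dots)} x {operand_size(petals)}"

print(trial_type(3, 8))  # small x large
print(trial_type(6, 7))  # large x large
```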

Accuracy data are shown in Figure 5. For the small × small, small × large, and large × small conditions (i.e., trials
where at least one of the multiplicands/multipliers was less than 5), accuracy was
significantly above chance levels: small × small 77.8%, *t*(43) = 14.4, *p* < .001, *d* = 2.17; small × large 75.0%, *t*(43) = 12.2, *p* < .001, *d* = 1.84; large × small 72.2%, *t*(43) = 10.6, *p* < .001, *d* = 1.61. However, for the large × large condition, which involved trials where neither
the multiplicand nor the multiplier was subitizable (i.e., where participants would
have had to use their ANS rather than their OTS to represent these quantities), accuracy
was not significantly above chance, 53.4%, *t*(43) = 0.95, *p* = .350, *d* = 0.14.

##### Figure 5

In sum, when analysed in a manner analogous to our study, Qu et al.’s (2021) data show a similar pattern across the four conditions. When participants were asked to multiply two nonsymbolic quantities outside the subitizing range – when they were required to rely upon their ANS rather than their OTS – they did not succeed at above-chance levels. In contrast, when participants were asked to multiply two nonsymbolic quantities within the subitizing range, they were successful. Interestingly, unlike in our Experiment 2, Qu et al.’s participants’ performance in the small × large and large × small conditions was high, and only slightly lower than when both the multiplicand and the multiplier were subitizable. One possibility is that this difference results from Qu et al.’s decision to display the multiplicand and multiplier simultaneously, rather than sequentially as in our study. This difference in display format might also explain why Qu et al. found generally higher accuracy rates in the small × small condition than we did in Experiment 2.

## General Discussion

In summary, both our data and the data collected by Qu et al. (2021) show that participants can successfully multiply, at above-chance levels, two representations formed from nonsymbolic quantities within the subitizing range, but perform only at chance levels when multiplying two representations formed from non-subitizable quantities. These results are inconsistent with the hypothesis that two ANS representations can be multiplied, which would predict above-chance performance across all trials.

Participants’ performance on the symbolic arithmetic task confirmed that all participants understood how the tasks worked and could complete the corresponding symbolic-arithmetic calculations correctly. This suggests that our findings are not the product of misunderstood task instructions or participants’ poor arithmetic skills: participants were able to successfully multiply two numbers outside the subitizing range when represented symbolically, but not when represented nonsymbolically.

### When Are Nonsymbolic Representations Multipliable?

We have considered different types of nonsymbolic representation: those formed by the OTS (in the range 2 to 4) and those formed by the ANS (in the range 5 to 10). Like Ciccione and Dehaene (2020), we found that our participants were capable of reliably multiplying two OTS representations (small × small problems). However, we also investigated whether participants could reliably multiply two ANS representations (large × large problems), finding that they could not. In Experiment 2, and when we reanalysed Qu et al.’s (2021) data, we again found that participants were successful on small × small problems and unsuccessful on large × large problems. However, intriguingly, we also found that they were successful at above-chance levels on small × large and large × small problems (which were not used in Experiment 1). Why was this?

As noted in the introduction, prior research has suggested that ANS representations
can be scaled by exact quantities. Barth et al. (2009) and McCrink and Spelke (2010) both used what we might call an *n* × large task, where participants were asked to multiply an ANS representation by
a fixed quantity that remained stable from trial to trial. In McCrink and Spelke’s
study *n* was explicitly identified verbally (via the instruction “the wand takes one rectangle,
and makes it two”).

Perhaps the reason participants can successfully tackle small × large problems but
not large × large problems is that representations generated by the OTS are exact
(e.g., Carey, 2009; Hutchison, Ansari, Zheng, De Jesus, & Lyons, 2020; Lipton & Spelke, 2004; Revkin, Piazza, Izard, Cohen, & Dehaene, 2008), much like the *n* in an *n* × large problem. In contrast, ANS representations are fuzzy, not exact. Relatedly,
and in contrast to ANS representations, representations generated by the OTS can be
verbalised (when observing two dots, participants can report that there are two dots,
whereas they are not able to do this reliably when the number of dots is outside the
subitizing range). If it is the precision/verbalisability of a representation which
determines whether or not it can be used to scale other quantities, then we would
expect participants to succeed on small × small, small × large, large × small, *n* × small, small × *n*, *n* × large and large × *n* problems, but not large × large problems, which is indeed the pattern of results
found in the literature and the current experiments (Barth et al., 2009; Ciccione & Dehaene, 2020; McCrink & Spelke, 2010; Qu et al., 2021).

An alternative possibility for the different results for small × large and large ×
large problems comes from the observation that the scale factors used by Barth et al. (2009) and McCrink and Spelke (2010) were all less than or equal to 4. Although these scale factors were not presented
visually, it is notable that they are nevertheless all within the subitizing range
(although one was a non-integer). Perhaps the size of *n* used to date in *n* × large problems accounts for participants’ successes on these tasks. Running an
experiment using McCrink and Spelke’s paradigm with a larger scale factor would be
necessary to rule out this possibility.

### How Do Participants Multiply OTS Representations?

We found that participants were able to succeed on small × small, small × large and large × small problems. How did they do this? For small × small problems there seem to be at least three possibilities. Given that OTS representations are exact, it is possible that they can be directly multiplied. Alternatively, perhaps participants translated these representations into the verbal code and retrieved the answer from their long-term memory. We cannot distinguish between these possibilities with our data, although future studies could do so. We would expect that if participants were retrieving verbal facts then their performance on small × small problems should be impaired by articulatory suppression paradigms (e.g., Lee & Kang, 2002; Moeller, Klein, Fischer, Nuerk, & Willmes, 2011). Because ANS representations are not exact, we would not expect this retrieval strategy to be available for small × large or large × small problems, and therefore would not expect performance in these conditions to be affected by articulatory suppression manipulations.

A third possibility, which could be operating in all three problem types where we
found above-chance performance (small × small, small × large and large × small), is
that participants were conducting a sequential addition strategy (i.e. rather than
multiplying 6 by 3, they may have been calculating 6 + 6 + 6). To investigate this
possibility, we compared accuracies and RTs between two groups of trials, *n* × *m*, where (i) *n* > *m* and (ii) *m* > *n.* If participants were using sequential addition, we might expect trials of type (i)
to have lower RTs and higher accuracy than trials of type (ii) because the latter
type of trials would require more operations (e.g., 3 + 3 + 3 + 3 + 3 + 3 rather than
6 + 6 + 6). We restricted this analysis to just the small × small, small × large and
large × small trials and, for the RT analysis, to only trials where participants responded
correctly. We found no evidence of a difference between trial types (i) and (ii) on
either accuracy, means 0.556, 0.560, *t*(92) = 0.368, *p* = .714, *d* = 0.038, or RT, means 825 ms, 814 ms, *t*(92) = 0.801, *p* = .425, *d* = 0.083. Given this lack of evidence for a sequential addition strategy, it seems likely that the small × large and large × small problems were solved using a scaling method analogous to the *n* × large problems discussed above (Barth et al., 2009; McCrink & Spelke, 2010).
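The operation-count rationale behind this prediction can be made concrete with a sketch, assuming a strategy that adds the multiplicand once per unit of the multiplier:

```python
def addition_ops(multiplicand, multiplier):
    """Additions needed to compute multiplicand x multiplier by repeated
    addition: summing b copies of a requires b - 1 additions."""
    return multiplier - 1

# Type (i), n > m: 6 x 3 computed as 6 + 6 + 6
ops_i = addition_ops(6, 3)   # 2 additions
# Type (ii), m > n: 3 x 6 computed as 3 + 3 + 3 + 3 + 3 + 3
ops_ii = addition_ops(3, 6)  # 5 additions
print(ops_i, ops_ii)  # 2 5
```

Under this strategy, type (ii) trials always require more operations than type (i) trials with the same product, which is why the absence of an accuracy or RT difference weighs against sequential addition.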

### Are ANS Representations Numerical?

What do our results mean for the ongoing debate about the extent to which ANS representations are numerical? Accepting the Halberda/Benacerraf position – as we do – means that asking whether or not ANS representations are numerical should be interpreted as a question about whether ANS representations have numerical properties. Here we have demonstrated that participants cannot reliably multiply two ANS representations formed by observing two sets of dots outside of the subitizing range. In contrast, participants are able to reliably multiply representations formed by observing two sets of dots in the subitizing range, or representations generated by reading two Arabic digits. Thus we can add multiplication to Marshall’s (2018) list of numerical properties not possessed by ANS representations (or at least not fully possessed, in the sense that two ANS representations cannot be reliably multiplied together), alongside discreteness, the potential for infinity, general applicability and ordinality.

Even if one does not accept the Halberda/Benacerraf argument – that something can be said to be (in a certain respect) numerical if it exhibits numerical properties, regardless of what it actually is or how it is formed – then understanding the operational limitations of ANS representations is nevertheless important. In contrast to the robust evidence that both children and adults can reliably add two ANS representations together (e.g., Barth et al., 2005, 2006), here we have shown that the same is not true of multiplication. Given that our participants were able to reliably multiply OTS representations, this finding represents another piece of evidence that ANS and OTS representations differ in important ways (e.g., Carey, 2009; Hutchison et al., 2020; Lipton & Spelke, 2004; Revkin et al., 2008). Previous investigations into nonsymbolic approximate multiplication have often not distinguished between problems with operands within, across and outside the subitizing range (e.g., Katz & Knops, 2014; Qu et al., 2021). Our data suggest that this factor is important and that future researchers investigating this topic should carefully take it into account in their analyses.

Does this mean that we can conclude that ANS representations are not numerical? Not necessarily. As Clarke and Beck (2021) have argued, for ANS representations to successfully represent numbers, they must be sensitive to a subset of the properties that numbers have, not necessarily all of those properties. If this argument is correct, then the goal of researchers should be to understand in which respects ANS representations behave numerically, and in which respects they do not, rather than to try to definitively pronounce upon the binary question of whether or not they are numerical. Our results constitute a step in this direction.