

Are there some differences so small that we cannot detect them? Are some quantities so similar (e.g., the number of spots on two speckled hens) that they simply look the same to us? Although modern psychophysical theories such as Signal Detection Theory (SDT) would predict that, with enough trials, even minute differences would be perceptible at an above-chance rate, this prediction has rarely been tested empirically for any psychological dimension, and never for the domain of number perception. In an experiment with over 400 adults, we find that observers can distinguish, from a brief glance, which of two collections has more dots. Impressively, observers performed above chance on every numerical comparison tested, even one as difficult as 50 versus 51 dots. Thus, we present empirical evidence that numerical discrimination abilities, consistent with SDT, are remarkably fine-grained.

What are the limits of our perception? This question has been a central focus of the scientific study of the mind since the investigation of thresholds began with psychophysics (

While the argument we will explore applies broadly to all magnitude discriminations, the literature on approximate number cognition is a particularly informative test case, as limits on our ability to judge number have been widely discussed. The ability to rapidly determine the numerically greater of two collections is attested across species and across human development (e.g.,

The idea that some differences are too small to perceive is prevalent throughout the literature on numerical cognition, as exemplified by the following quote: “The difference between eight and nine is not experienced at all, since eight and nine, like any higher successive numerical values, cannot be discriminated” (

However, the conclusion that these populations are

This last point is of particular theoretical significance, as there is precedent for revision of psychophysical theory in response to new data from edge cases: for example, previous models of magnitude perception such as Weber’s law were eventually found to be inadequate to explain behavior at extremes such as extremely high and low stimulus intensities (

Here, we empirically test whether psychophysical theories of magnitude comparison (using number as a case study) are indeed correct even in extreme cases, despite their inconsistency with the intuitive appeal of hard limits and with the convention of using such limits to characterize the acuity of perceptual systems. That is, are we above chance at numerical discrimination, even for extremely difficult comparisons? Importantly, most of the comparisons we presented were far more difficult than those typically employed in numerical discrimination experiments, with the hardest ratio being 51:50 dots. Although many authors in numerical cognition may expect the contrary, whether from a belief in true perceptual limits or an intuition that subjects would give up on subjectively “impossible” comparisons, we hypothesized that, with enough trials, people would perform above chance on even the hardest ratios.

We ran batches until we had at least 100 subjects for each of four conditions. A total of

On each trial, subjects saw two ensembles and indicated which of the two groups contained more dots. There was always a correct answer, so equal numbers of dots were never shown. Each stimulus image was displayed for 1000 ms. There were two conditions. Half of subjects saw the groups sequentially, at the center of the screen (all white dots against a gray background), with a 50 ms blank screen between ensemble presentations. The other half of subjects saw the groups simultaneously, with one group on the left (white dots) and one on the right (black dots). Following the stimuli, a response prompt (“Did the first/left or second/right image contain more dots?”) remained on the screen until a response was recorded on the subject’s keyboard.

Stimulus sets A and B were distinct sets of images generated with the same algorithm; both were included solely to ensure that our results generalize beyond any single stimulus set. This means there were effectively four “conditions” with approximately equal numbers of participants: stimulus set A with sequential presentation, stimulus set B with sequential presentation, stimulus set A with simultaneous presentation, and stimulus set B with simultaneous presentation.

Finally, during stimulus generation, we implemented non-numerical feature controls in an attempt to focus subjects on number. Surface area and convex hull ratios on each trial were approximately equated to that trial’s number ratio, with equal numbers of congruent and incongruent trials for each feature. However, we note that our broader argument – that subjects are capable of above-chance discrimination performance on extremely difficult ratios – would stand whether subjects in our task were relying on number, area, convex hull, or any combination of features (i.e., the ratio of any feature on the 51:50 trials was equal to 51:50 – far more difficult than the ratios typically used in number, area, and convex hull experiments).
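The logic of this control can be illustrated with a short sketch. This is not the authors' actual stimulus-generation code; the function name, the `base_area` value, and the dictionary layout are ours, chosen only to show how an area ratio can be yoked to the number ratio and flipped on incongruent trials.

```python
def assign_areas(n_big, n_small, congruent, base_area=10000.0):
    """Illustrative sketch: set total surface areas so the area ratio
    equals the number ratio. On congruent trials the more numerous set
    also has the larger total area; on incongruent trials the mapping
    is flipped. base_area is an arbitrary total (e.g., pixels^2)."""
    larger_area = base_area * (n_big / n_small)  # area ratio == number ratio
    if congruent:
        return {'n': (n_big, n_small), 'area': (larger_area, base_area)}
    return {'n': (n_big, n_small), 'area': (base_area, larger_area)}
```

With this scheme, even a subject relying purely on area would still face a 51:50 ratio discrimination on the hardest trials.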

There were four blocks of trials, with participant-paced breaks between blocks. Each block began with practice trials on easy ratios that scaffolded the participant towards the difficult test ratio (Confidence Hysteresis;). The correct answer (1st vs. 2nd) was counterbalanced, such that each option was the correct response on exactly half of the trials in that condition.

Subjects were tested online on their own personal computers during the COVID-19 pandemic, so factors like total display size and luminance were not tightly controlled – but these should have only minor effects on performance, given the previously demonstrated reliability of internet-based psychological studies (e.g.,

We fit each subject’s response data with two models (see

The probability (
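A minimal sketch of the standard Gaussian SDT prediction used in this literature, in which accuracy is a smooth function of the two numerosities and a subject's Weber fraction, p = Φ(|n1 − n2| / (w·√(n1² + n2²))). The function and variable names here are ours, for illustration only.

```python
import math

def sdt_accuracy(n1, n2, w):
    """Predicted probability of correctly choosing the larger set under
    a Gaussian signal-detection model with scalar (Weber) noise:
    p = Phi(|n1 - n2| / (w * sqrt(n1^2 + n2^2))), where w is the
    subject's Weber fraction."""
    d_prime = abs(n1 - n2) / (w * math.sqrt(n1**2 + n2**2))
    return 0.5 * (1.0 + math.erf(d_prime / math.sqrt(2)))  # Phi(d')
```

Note that predicted accuracy never equals exactly 0.5 for unequal sets: for example, with a hypothetical w = 0.15, a 51:50 comparison is predicted at roughly 54% correct – small, but above chance.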

As can be seen in left panel of

For the alternative model, which we are calling the Give Up Model, we devised a modification of the SDT model in an attempt to capture the intuition that comparisons eventually become so difficult as to be imperceptible – i.e., that they give rise to random guessing. In our Give Up Model, if the ratio was more difficult than the Guess Boundary (i.e.,
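The Give Up Model can be sketched as a one-line modification of the Gaussian SDT form p = Φ(|n1 − n2| / (w·√(n1² + n2²))): below a fitted guess-boundary ratio, predicted accuracy is pinned to chance. Parameter and function names are ours, not necessarily the authors'.

```python
import math

def give_up_accuracy(n1, n2, w, guess_boundary):
    """Give Up Model sketch: identical to the Gaussian SDT model except
    that any ratio more difficult (closer to 1) than guess_boundary is
    treated as imperceptible and answered by a coin flip."""
    ratio = max(n1, n2) / min(n1, n2)
    if ratio < guess_boundary:
        return 0.5  # "impossible" comparison -> random guessing
    d_prime = abs(n1 - n2) / (w * math.sqrt(n1**2 + n2**2))
    return 0.5 * (1.0 + math.erf(d_prime / math.sqrt(2)))
```

The two models coincide on easy ratios and diverge only below the boundary, which is what makes responses on the hardest ratios diagnostic for model comparison.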

We excluded subjects based on average accuracy across the whole experiment. Overall, subjects were correct on 62.0% of trials (

Our main question of interest is whether subjects can perceive the difference between groups at the most difficult numerical ratios. For each ratio, we calculated each subject’s accuracy. Then we performed a series of planned one-sample
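For illustration, a planned comparison of per-subject accuracies against chance at a single ratio can be run as follows. The accuracy values below are hypothetical placeholders, not the reported data.

```python
import numpy as np
from scipy.stats import ttest_1samp

# Hypothetical per-subject accuracies at one hard ratio (e.g., 51:50);
# illustrative values only, not the data reported in the paper.
acc = np.array([0.52, 0.55, 0.49, 0.53, 0.51, 0.56, 0.50, 0.54])

# One-sample t-test of mean accuracy against chance (0.5), one-sided.
res = ttest_1samp(acc, popmean=0.5, alternative='greater')
print(res.statistic, res.pvalue)
```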

Subjects performed above chance for all ratios, all

To evaluate whether performance varied by condition, we ran a two-way (stimulus set × presentation method) between-subjects ANOVA on overall accuracy. There was a marginal difference in performance between subjects who saw stimulus set A (

Because of the difference in performance between the presentation conditions, we evaluated whether subjects were significantly above chance at each ratio separately for each presentation condition. In the sequential presentation condition, subjects were significantly above chance on every comparison including 51:50 (

For each subject, we fit their responses with the two models using Maximum Likelihood Estimation, then evaluated which model provided a better fit to each subject’s data using the Bayesian Information Criterion (BIC;
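A minimal sketch of this per-subject fitting pipeline, assuming the Gaussian SDT form p = Φ(|n1 − n2| / (w·√(n1² + n2²))) and penalizing the Give Up Model's extra parameter via BIC. Function and parameter names are ours; this is an illustration of the approach, not the authors' code.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_log_lik(params, n1, n2, correct, model):
    """Negative log-likelihood of one subject's trial-level responses.
    model='sdt' has one parameter (Weber fraction w); model='giveup'
    adds a guess-boundary ratio below which accuracy is fixed at 0.5."""
    w = params[0]
    p = norm.cdf(np.abs(n1 - n2) / (w * np.sqrt(n1**2 + n2**2)))
    if model == 'giveup':
        ratio = np.maximum(n1, n2) / np.minimum(n1, n2)
        p = np.where(ratio < params[1], 0.5, p)
    p = np.clip(p, 1e-9, 1 - 1e-9)  # guard the logs
    return -np.sum(np.where(correct, np.log(p), np.log(1 - p)))

def bic(nll, k, n_trials):
    """Bayesian Information Criterion: k * ln(n) + 2 * NLL; lower is better."""
    return k * np.log(n_trials) + 2 * nll

# Example fit for one subject (trial arrays n1, n2, correct would go here):
# fit = minimize(neg_log_lik, x0=[0.2], args=(n1, n2, correct, 'sdt'),
#                bounds=[(0.01, 2.0)], method='L-BFGS-B')
# bic_sdt = bic(fit.fun, k=1, n_trials=len(correct))
```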

Additionally, we fit all 410 subjects’ data together to compare the two model fits (see

The idea that some differences are too small to perceive has intuitive appeal. However, the data presented here suggest that humans are capable of far finer distinctions than this idea would imply. Although comparisons as large as an 8:7 ratio have been cited as the limit of approximate number perception for adults (e.g.,

Although this success may feel counterintuitive, it is in fact consistent with modern models of psychophysics. For example, if representations of number are well-ordered, then there will always be some region of activation for 51 that is greater in magnitude than the activation for 50 (likewise for 101:100, etc.). If the number representations are close (e.g., 51:50), just like any two similar signals in the mind, the region of non-overlap, where 51 has a higher signal than 50, will be small, but it will be represented. And it is this small region of non-overlap in representation that drives the success we observed here – a small but significant improvement from chance. Nothing changes as one progresses to more and more difficult numerical comparisons. The observer is thinking the same thought (e.g.,

In

Why do some previous studies report data consistent with “at chance” performance on difficult ratios (e.g.,

What do the present results mean for our understanding of individual differences in ANS precision? It has been typical in numerical cognition to use the point at which performance transitions from above-chance to at-chance as a metric for distinguishing between populations and species. But although such a description in terms of the minimum ratio of discriminability is widespread across the numerical literature (see

In the SDT model, individual differences are explained via the Weber fraction parameter. In such a model, while everyone would be near perfect with the easiest ratios (e.g., 2:1), performance would only drop to
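This pattern can be made concrete with the Gaussian SDT form p = Φ(|n1 − n2| / (w·√(n1² + n2²))): two hypothetical observers with different Weber fractions are nearly indistinguishable on easy ratios and separate only as ratios harden. The Weber-fraction values here are illustrative assumptions.

```python
import math

def sdt_accuracy(n1, n2, w):
    """Predicted accuracy under the Gaussian SDT model (Weber fraction w)."""
    d_prime = abs(n1 - n2) / (w * math.sqrt(n1**2 + n2**2))
    return 0.5 * (1.0 + math.erf(d_prime / math.sqrt(2)))

# Two hypothetical observers: precise (w = 0.10) vs. imprecise (w = 0.25).
for n_big, n_small in [(2, 1), (8, 7), (51, 50)]:
    print(n_big, n_small,
          round(sdt_accuracy(n_big, n_small, 0.10), 3),
          round(sdt_accuracy(n_big, n_small, 0.25), 3))
```

Both observers sit near ceiling at 2:1, yet both remain above 0.5 even at 51:50, where their predicted accuracies differ only slightly.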

Data collection and analysis were supported by an NSF GRFP, DGE1746891, awarded to E.M.S. and by a McDonnell Foundation Scholar Award to J.H.

The authors have declared that no competing interests exist.

The authors have no additional (i.e., non-financial) support to report.