In studies of long-term memory of multi-digit numbers the leading digit tends to be recalled correctly more often than less significant digits, which has been interpreted as evidence for an analog magnitude encoding of the numbers. However, upon closer examination of data from one of these studies we found that the distribution of recall errors does not fit a model based on analog encoding. Rather, the data suggested an alternative hypothesis that each digit of a number is encoded separately in long-term memory, and that encoding of one or more digits sometimes fails due to insufficient attention in which case they are simply guessed when recall is requested, with no regard for the presented value. To test this hypothesis of nominal encoding with value-independent mistakes, we conducted two studies with a total of 1,080 adults who were asked to recall a single piece of numerical information that had been presented in a story they had read earlier. The information was a three-digit number, manipulated between subjects with respect to its value (between 193 and 975), format (Arabic digits or words), and what it counted (baseball caps or grains of sand). Results were consistent with our hypothesis. Further, the leading digit was recalled correctly more often than less significant digits when the number was presented in Arabic digits but not when the number was presented in words; our interpretation of this finding is that the latter format does not focus readers’ attention on the leading digit.
People encounter numerical information in a multitude of situations in their day-to-day life. Numbers come in several formats, they may come across as large or small depending on the context, and the information they contain may be used in many ways. When numerical information is encountered, it is potentially stored in long-term memory and can later be recalled. For instance, a person reading in the morning paper about a shipwreck disaster that cost 376 lives may at a lunchtime conversation attempt to recall how many lives were lost. Here, we are interested in how accurate such recalled numbers tend to be and what this may tell us about how numerical information is encoded in long-term memory.
Memory for numbers was studied already by
Although the abovementioned NDE and SNARC studies do not address whether an analog magnitude representation is used in long-term memory, a couple of other studies do.
The basic idea of the analog encoding hypothesis is that an encountered number is encoded as a magnitude with a certain degree of noise, which becomes evident when the number is recalled and reported. Moreover, the degree of noise is thought to be the main source of recall errors. A straightforward model is that a given presented number
where the error term
While the model above is perhaps the most straightforward way to describe the analog encoding hypothesis it has been suggested that analog magnitude representations might not be linear but rather log-compressed (e.g.,
where
Note that the literature on analog encoding also includes hybrid models (e.g.,
As argued by both
We shall now present an alternative hypothesis according to which the leading digit correctness phenomenon would be consistent also with nominal encoding. To begin with, we shall follow
We further follow
Before proceeding we remark that there is another kind of “mistake” in nominal encoding that would in fact yield a relation to the value of the presented digit, namely if the person rounds the presented number before encoding it. However, analysis of the data from
Of the possible mistakes listed above, recall failure
We shall now present a formal model of nominal encoding with value-independent mistakes. In contrast to the analog encoding model, we must now describe errors in recall of a given digit position rather than an entire number. We shall use letter
where
Fitting this model to a set of recall data amounts to estimating the values of the parameters
Above we have presented two competing hypotheses based on analog and nominal encoding, respectively. These hypotheses differ in what is the primary object of recall. According to the analog encoding hypothesis, the primary object of recall is the magnitude of the number, whereas according to our nominal encoding hypothesis it is separate digits that are primarily recalled. Thus the two hypotheses address two different kinds of data: residuals for the entire number and residuals for each digit, respectively. This means that we cannot make a direct comparison of how well the two hypotheses fit a dataset. The closest comparison we can make is between what the analog encoding hypothesis predicts about the distribution of residuals for entire numbers and what our nominal encoding hypothesis predicts about the distribution of residuals for
We use letter “A” to label predictions from the analog encoding hypothesis; the label “Alog” for predictions from the log-compressed version of the analog encoding hypothesis; and label “N” for predictions from the nominal encoding with value-independent mistakes hypothesis.
The hypotheses make distinct predictions about how residuals should vary with the presented number or digit. For the analog encoding hypotheses, the predictions are very simple and follow directly from the form of the equations that define the models of analog encoding:
We shall now derive what the nominal encoding hypothesis predicts about how residuals in the leading digit should depend on what leading digit was presented. Let
Note that only the last term depends on which digit value
The hypotheses differ in their conception of the relation between the origins of correct and incorrect recall. Whereas our nominal encoding hypothesis attributes incorrect recall to mistakes, the analog encoding hypothesis attributes both correct and incorrect recall to the same imprecise encoding such that the correctness of a particular instance of recall is determined by whether the imprecision turned out to be negligible or substantial in that instance. This difference between the hypotheses has important consequences for their ability to account for data on correct and incorrect recall. Whereas the nominal encoding hypothesis is consistent with any ratio between correct and incorrect recalls, the analog encoding hypothesis predicts that the proportion of correct recalls should be consistent with the overall degree of imprecision in recall exhibited in the data. For instance, a study of children’s recall of three-digit numbers (
Assuming that the context of the recall task requires the response to be an integer, correct recall is equivalent to |
To obtain the corresponding prediction from the log-compressed model of analog encoding, we rewrite the condition for correct recall, |
As we discussed above, the nominal encoding hypothesis requires no corresponding relation between the standard deviation of residuals and the proportion of correct recall. Instead, the assumption of value-independence of mistakes requires the patterns of residuals to be described by the ten parameters
and
In case the leading digit is incorrectly recalled, the analog encoding hypothesis attributes this to the imprecision being so large that even the leading digit is affected. This implies that the imprecision in recall has completely overwhelmed the magnitude represented by the second digit, so that no signal from the second digit of the presented number would remain in the recalled number.
In contrast, the nominal encoding hypothesis says that each digit is encoded separately. If a mistake has occurred in the leading digit, the second digit may still be correctly encoded. (Although the mistake in the leading digit is an indication that the participant’s attention to the numerical information may have been low, so that the likelihood of mistakes in the second digit is increased.)
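The contrast between the two hypotheses can be illustrated with a small simulation. The following is a minimal sketch, not the analysis used in this paper: it assumes zero-mean normal noise for analog encoding, uniform guessing over 0 to 9 for nominal mistakes, and illustrative parameter values (a noise standard deviation and per-digit encoding probabilities) chosen by the reader. All function names are hypothetical.

```python
import random

# The nine three-digit stimulus values used in Studies 1 and 2.
STIMULI = [193, 217, 348, 426, 534, 651, 782, 869, 975]


def digits(n):
    """Split a number in 0..999 into (hundreds, tens, ones)."""
    return n // 100, (n // 10) % 10, n % 10


def analog_recall(x, sigma, rng):
    """Assumed analog model: the magnitude plus normal noise, rounded.

    Responses outside 0..999 are redrawn, mimicking the exclusion of
    negative numbers and numbers with more than three digits.
    """
    while True:
        r = round(x + rng.gauss(0, sigma))
        if 0 <= r <= 999:
            return r


def nominal_recall(x, p_encoded, rng):
    """Assumed nominal model: each digit is kept with its own probability,
    otherwise replaced by a uniform guess that ignores the presented value."""
    out = [d if rng.random() < p else rng.randrange(10)
           for d, p in zip(digits(x), p_encoded)]
    return 100 * out[0] + 10 * out[1] + out[2]


def tens_correct_given_hundreds_wrong(recall_fn, n_trials, rng):
    """Share of trials with a correct tens digit among trials in which
    the hundreds digit was recalled incorrectly."""
    hits = total = 0
    for _ in range(n_trials):
        x = rng.choice(STIMULI)
        r = recall_fn(x, rng)
        if digits(r)[0] != digits(x)[0]:
            total += 1
            hits += digits(r)[1] == digits(x)[1]
    return hits / total


if __name__ == "__main__":
    rng = random.Random(0)
    analog = tens_correct_given_hundreds_wrong(
        lambda x, r: analog_recall(x, 200.0, r), 20000, rng)
    nominal = tens_correct_given_hundreds_wrong(
        lambda x, r: nominal_recall(x, (0.7, 0.5, 0.5), r), 20000, rng)
    print(f"analog:  {analog:.2f}")
    print(f"nominal: {nominal:.2f}")
```

Under these assumptions the analog model leaves the second digit near chance once the leading digit is wrong, whereas the nominal model keeps it well above chance, because each digit is handled independently.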
The analysis used the data on recall of presented three-digit numbers, yielding 127 × 14 = 1,778 data points, of which one data point was empty and 19 data points were excluded because the recalled number had more than three digits. The exclusion of those stimuli that were single-digit or double-digit makes for a cleaner analysis of the nominal encoding model, while causing only negligible changes in the results of analysis of the analog encoding model. (The four excluded stimuli yielded disproportionately many correct answers, and the support for our conclusion would be even stronger if recall of these stimuli were included in the analysis.)
The key parameters for the analog encoding models were estimated to
As reported by
The left panel of
95% confidence intervals for mean residuals (left) and mean residuals in the leading digit (right). Data from
Based on the observed standard deviation of residuals (
Based on the observed standard deviation of log-residuals (
For the nominal encoding model we instead compared the observed standard deviation of residuals in the leading digit,
The analog encoding model predicts that correctness of second digits should be only at chance levels when the leading digit is incorrectly recalled (prediction A-3), whereas the nominal encoding hypothesis predicts correctness at levels above chance unless attention is low (prediction N-3). However, it is difficult to say exactly what the chance level would be, as the possible values for the second digit were not uniformly distributed over the presented numbers (and some digit values are more popular guesses than others). For this reason, we will only carry out a rough analysis. As there are 10 possible digit values, the chance level should be approximately 10%. Out of the 1,254 data points (71%) in which the leading digit was incorrectly recalled, 154 (12%) had the correct value of the second digit. This result seems roughly consistent with analog encoding but is also consistent with nominal encoding under low attention levels. As a large majority of children (71%) made mistakes even on the leading digit, attention levels indeed seem to have been generally low.
In this reanalysis of children’s recall of three-digit numbers we found that the data tended to support the hypothesis of nominal encoding with value-independent mistakes over the analog encoding hypothesis. Two findings were particularly inconsistent with analog encoding. First, residuals depended strongly on the presented number. Second, the proportion of correct recalls was much too large in relation to the overall low accuracy of recall.
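To see why a high proportion of correct recalls is hard to reconcile with a large spread of residuals, consider the following minimal sketch. It assumes, as an illustration only, that the analog recall error is normally distributed with mean zero and that a response counts as correct when the error rounds to zero (absolute error below 0.5). The function name and the example standard deviations are hypothetical and are not estimates from the data.

```python
import math


def p_correct_analog(sigma):
    """P(|error| < 0.5) for a zero-mean normal error with SD sigma.

    Uses the identity 2 * Phi(z) - 1 = erf(z / sqrt(2)) with z = 0.5 / sigma.
    """
    return math.erf(0.5 / (sigma * math.sqrt(2)))


if __name__ == "__main__":
    # Illustrative standard deviations only.
    for sigma in (1.0, 10.0, 100.0, 200.0):
        print(f"sigma = {sigma:6.1f}  ->  P(correct) = {p_correct_analog(sigma):.4f}")
```

Under this assumption, the predicted proportion of exactly correct recalls falls off roughly in inverse proportion to the standard deviation, so a dataset with widely scattered residuals should contain very few exactly correct responses.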
Whereas results were more consistent with nominal encoding, they were not exactly as predicted. This could be due to the recall task being sectioned in several stories, each of which presented several numbers that were always in the same range; either three low numbers, three medium range numbers, or three high numbers. This could cause some mistakes in recall in which one number is mixed up with another number in the same range, which would violate the model assumption of value-independent mistakes.
The reanalyzed dataset was from a study of children who listened to stories and were asked to recall numerical information from these stories. In order to subject the competing hypotheses to more comprehensive testing we conducted two new studies in which adults read a story and were asked to recall numerical information presented in it.
In the data on children’s recall, we found that recall may have been influenced by other numbers presented in the same story. To avoid such contamination we ensured that the story in our studies contained only a single piece of numerical information (but lots of other kinds of information). In addition to manipulating the value of the presented number, our studies manipulated two other aspects: the format and the context.
A reason to manipulate the format is that processing of numbers may depend on whether they are presented in the format of Arabic digits or words (
Previous research investigating numerical judgments more generally has indicated that people’s judgments are often influenced by contextual factors such as framing (
To ensure an even distribution over different digits, nine three-digit stimulus numbers were constructed such that each of the digits from 1 to 9 appears exactly once in each of the three positions (193, 217, 348, 426, 534, 651, 782, 869, 975). To avoid contamination between stimuli, participants were presented with only one stimulus number each in a between-subjects design. The stimulus number was presented using either digits or words (e.g., 193 vs. one hundred and ninety-three), for a total of 18 versions (nine numbers times two formats). As we presumed that the rate of recall errors might be low when adults are asked to recall a single number, we decided to collect a rather large dataset of 540 data points (30 for each of the 18 stimuli).
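The balancing property of the stimulus set can be checked mechanically. The short sketch below uses hypothetical helper names and simply verifies that each digit from 1 to 9 occurs exactly once in the hundreds, tens, and ones positions of the nine stimulus numbers listed above.

```python
STIMULI = [193, 217, 348, 426, 534, 651, 782, 869, 975]


def digit_columns(numbers):
    """Return the hundreds, tens, and ones digits as three lists."""
    return [[int(str(n)[pos]) for n in numbers] for pos in range(3)]


def is_balanced(numbers):
    """True if every position contains each digit 1..9 exactly once."""
    return all(sorted(col) == list(range(1, 10))
               for col in digit_columns(numbers))


if __name__ == "__main__":
    print(is_balanced(STIMULI))
```

The same check is useful when constructing alternative stimulus sets, since changing a single digit breaks the balance in the affected position.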
Participants were 540 American adults (52% male, mean age = 32), recruited online among users of Amazon Mechanical Turk (mturk.com) for a fee of US$0.50. Participants were anonymous and gave their informed consent to participate.
The study was conducted online and participants were told that the study would examine certain aspects of memory. Participants were asked to read a one-page excerpt adapted from a comic short story by acclaimed writer Connie Willis, and informed that they would later be asked questions about the story. The page contained just one piece of numeric information, namely, the number of baseball caps that a visiting alien had collected. The piece of numeric information varied across eighteen different versions of the text as described above. The material is included in the
The story was followed by a filler task (a six-item questionnaire measuring agreement with relativist and objectivist quotations from various scholars; available on request). The filler task took approximately five minutes to complete. Thereafter, participants were asked to recall various pieces of information presented in the story. Specifically, the first question was how many baseball caps the alien had brought (with the additional instruction “If you don't remember exactly, please give your best estimate”).
We first examined whether our data replicated the leading digit correctness phenomenon. We then examined the predictions from the competing hypotheses, just as we did in our reanalysis above. Analyses were conducted both for the pooled data and separately for the digits and words conditions. However, to avoid an unnecessarily cumbersome results section, we report analyses per condition only where they are relevant.
Seven participants were excluded because they did not fill in a recalled number at all. Another two participants were excluded because they filled in negative numbers, another nine because they filled in numbers with more than three digits (i.e., greater than 999). This left 522 participants for further analysis.
The key parameters for the analog encoding models were estimated to
First we examined whether our data replicated the leading digit correctness phenomenon (
However, separating the digits condition and the words condition revealed an important difference. The digits condition exhibited a very strong leading digit correctness phenomenon, with 75% correct hundreds digits, compared with 56% correct tens digits and 55% correct ones digits. In contrast, the words format did
The difference in results between formats was expected from the hypothesized role of Arabic digits in drawing attention to the first digit. It is possible that the numerical information presented in the digits condition was more salient than the corresponding information in the words format. Note, however, that although such a difference in salience could explain a difference in the overall proportion of recalled numbers, it cannot explain why the leading digit correctness phenomenon occurs in the digits condition but not in the words condition.
95% confidence intervals for mean residuals (left) and mean residuals in the leading digit (right) in Study 1.
Based on the observed standard deviation of residuals (
Similar results were obtained in separate analyses of the digits condition (
Next we compared the observed standard deviation in the leading digit,
Because the numbers used as stimuli included all digits from 1 to 9 equally often but never included 0, we examined the 136 instances of recall where the hundreds digit was recalled incorrectly and the tens digit was recalled as a non-zero digit. The probability that the recalled tens digit would be correct by chance is then one chance in nine (11%). The actual frequency of correct recall of the tens digit in this subset of the data was 38%, 95% CI [29%, 46%], which is much higher than expected by chance. We then examined if the results differed between the two formats. In the words format, the tens digit was correct in 42 out of 89 instances of recall (47%). In the digits format the tens digit was correct only in 9 out of 47 instances of recall (19%). Thus, when the hundreds digit was not correctly recalled the frequency of correct recall of the tens digit was higher for numbers presented in words than for numbers presented in digits, χ2(1,
In Study 1 we replicated the leading digit correctness phenomenon and found the data to be inconsistent with the analog encoding hypothesis. Importantly, the data from our adult population had the same features as the data from the child population in
Finally, note that Study 1 used a contextual cue that presumably influenced recall mistakes. More specifically, it is possible that participants anchored their judgments of the number of baseball caps on previous experiences (
We conducted a second study with the aim of replicating the results of the first study with numbers counting something else. Specifically, the stimulus of Study 1 was the number of baseball caps, whereas in Study 2 we changed the story such that the number instead refers to grains of sand. In addition to testing the robustness of results, this manipulation provides a test of the role of contextual cues for numeric recall.
Participants were 540 American adults (51% male, mean age = 34), recruited online among users of Amazon Mechanical Turk (mturk.com) for a fee of US$0.50.
The procedure was identical to that of Study 1 except for one important change. The term “baseball caps” was changed to “grains of sand” both in the short story participants were asked to read and in the subsequent questionnaire.
Nine participants were excluded because they did not fill in a number at all, two participants were excluded because they filled in negative numbers, and 23 participants were excluded because the recalled number had more than three digits. This left 506 participants. Note the greater frequency of recalled numbers with more than three digits in Study 2 (23 out of 540) than in Study 1 (10 out of 540), χ2(1,
The key parameters for the analog encoding models were estimated to
Replicating Study 1, the digits condition exhibited a very strong leading digit correctness phenomenon whereas the words condition did not. In the digits condition there were 83% correct hundreds digits, 95% CI [78%, 87%], compared with only 61% correct tens digits, 95% CI [54%, 67%], and only 61% correct ones digits, 95% CI [55%, 67%]. In the words condition there were just 67% correct hundreds digits, 95% CI [61%, 73%], which was not significantly different from the 61% correct tens digits, 95% CI [54%, 67%], nor from the 62% correct ones digits, 95% CI [56%, 68%].
We assessed this difference between the two conditions in the same way as in Study 1. Among those participants who recalled either the hundreds or the tens digit correctly but not both, 65 out of 73 (89%) recalled the hundreds digit correctly in the digits condition, compared to only 43 out of 71 (61%) in the words condition. As in Study 1 this difference between the two formats in the recall advantage of the leading digit over the tens digit was statistically significant, χ2(1,
Replicating the corresponding findings in Study 1,
95% confidence intervals for mean residuals (left) and mean residuals in the leading digit (right) in Study 2.
As in Study 1, the proportion of correctly recalled numbers, 46%, was vastly greater than predicted, Pcorr = 0.20%, by the analog encoding model from the observed standard deviation of residuals,
As in Study 1 we examined those instances of recall where the hundreds digit was recalled incorrectly and the tens digit was recalled as a non-zero digit (104 data points). Results were similar to the previous study. The frequency of correct recall of the tens digit, given that the hundreds digit was incorrectly recalled, was 38%, 95% CI [28%, 48%], much higher than the chance level of 11%. As in Study 1, this frequency was higher for numbers presented in words (30 out of 68 instances, 44%) than in digits (9 out of 36 instances, 25%), χ2(1,
Study 2 was designed with two aims in mind. First, we wanted to conduct a replication of Study 1 to evaluate the robustness of our findings. Indeed, we replicated all the main findings. We found strong support for the notion that numbers are stored in memory according to the hypothesis of nominal encoding with value-independent mistakes. Conversely, there is very little support in the data for the analog encoding hypothesis. Our second aim with Study 2 was to evaluate the effect of contextual cues on recall. We found only small effects of contextual cues, but these effects were consistent with the hypothesis that contextual cues may guide guessing when the presented number cannot be accurately recalled.
Numerical information is an important and ubiquitous part of people’s day-to-day life. Previous research has primarily focused on the issue of how such information is represented when numbers are mapped from an external representation to an internal meaning (e.g.,
Here we revisited the leading digit phenomenon, which has previously been taken as an indication that numerical information is stored in memory according to an analog magnitude representation (
To further investigate this alternative account we conducted two studies that presented participants with a story including a piece of numerical information that varied in numerical value, format (words or digits), and what was counted (baseball caps or sand grains). In both of these studies we found the pattern of results to be consistent with the hypothesis of nominal encoding with value-independent mistakes while the analog model of encoding could not account for the data. We concluded that both the data from
We acknowledge that our studies, being conducted online, were not as controlled as traditional memory studies. In particular, we had no way of making sure that participants did not take notes. However, for our conclusions this limitation is not very problematic. Cheating by taking notes could not explain the slope of mean residuals (although it could contribute to a high proportion of correct answers). Moreover, we emphasize that we found the same patterns of results in the data from the more tightly controlled study of
In addition to our main findings we found predicted effects of contextual cues and numerical format. These effects are also consistent with the hypothesis of nominal encoding with value-independent mistakes. Specifically, contextual cues were expected to affect the guessing part of the recall process, similar to framing and anchoring effects seen in traditional judgment and decision-making tasks (
In this study we examine certain aspects of memory. We want you to carefully read the following excerpt adapted from a comic short story by acclaimed writer Connie Willis. Please, do NOT make any notes about the text as that would defeat the purpose of the study. Later in the study we will ask you questions about what you've read.
"You've got to talk to him," Chris said. "I've told him there isn't enough space, but he keeps bringing things home anyway."
"Things?" Stewart said absently. He had his head half-turned as if he were listening to someone out of the holographic image.
"Things. A Buddha, a Persian rug, and XXX baseball caps [grains of sand in Study 2], so far!" Chris shouted at him. "Things I didn't even know they had on Sony. Today he brought home a piano! How did they even get a piano up here with the weight restrictions?"
"What?" Stewart said. The person who had been talking to him moved into the holo-image, focusing as he entered, put a piece of paper in front of Stewart, and then stood there, obviously waiting for some kind of response. "Listen, Chris, darling, can I put you on hold? Or would you rather call me back?"
It had taken her almost an hour to get him in the first place. "I'll hold," she said, and watched the screen grimly as it went back to a two-dimensional wall image on the phone's screen and froze with Stewart still smiling placatingly at her. Chris sighed and leaned back against the piano. There was hardly room to stand in the narrow hall, but she knew that if she wasn't right in view when Stewart came back on the line, he'd use it as an excuse to hang up. He'd been avoiding her for the last few days.
Stewart's image jerked into a nonsmiling one and grew to a full holo-image again. With the piano in here, there wasn't really enough room for the phone. Stewart's desk blurred and dissolved on the keyboard, but Chris wanted Stewart to see how crowded the piano made the hall. "Chris, I really don't have time to worry about a few souvenirs," he said. "We've got real communications difficulties over here with the aliens. The Japanese translation team's been negotiating with them for a space program for over a week, but the Eahrohhs apparently don't understand what it is we want."
"I'm having communications difficulties over here, too," Chris said. "I tell Mr. Ohghhi" She stopped and looked at the alien's name she had written on her hand so she could pronounce it. "Mr. Ohghhifoehnnahigrheeh that there isn't room in my apartment and that he's got to stop buying things, and he seems to understand what I'm saying, but he goes right on buying. I've only got a small apartment, Stewart."
Now try to recall the story excerpt you read earlier.
How many baseball caps had Mr. Ohghhifoehnnahigrheeh brought to Chris's house? (Enter a number. If you don't remember exactly, please give your best estimate.)
What was the name of the person Chris talked to?
What were the Japanese translating team negotiating with the aliens about?
What did Chris want the other person to see in her apartment?
When the picture froze, what did it show the person doing?
The data collection was paid for by funds provided to the first author by Mälardalen University.
We are grateful to Clarissa Thompson and Robert Siegler for sharing their data.
The authors declare that there is no conflict of interest regarding the publication of this paper.