The acquisition of counting, though it may seem to be a mundane skill, is a major milestone for children. In one sense, counting is clearly a feat of word learning: Children need to learn words for highly abstract number concepts and grasp how the structure of the count list gives meaning to the number words (i.e., going up one word in the list represents adding one item to the set; Gallistel & Gelman, 1992; Gelman & Gallistel, 1978). In another sense, counting is a feat of conceptual development: Children build an explicit symbolic representation that enables them to determine, track, and remember exact quantities (Carey, 2009; Carey & Sarnecka, 2006; Frank, Everett, Fedorenko, & Gibson, 2008; Gordon, 2004). A question central to this subject is how acquisition of the verbal count list interacts with nonverbal conceptual development in numerical reasoning.
Previous studies of children’s verbal counting abilities have documented that children begin to recite a count list long before they develop stable meanings for the number words in this list. It then takes children two to three years from the time they learn the count list to acquire the cardinal principle: the idea that the last number in the count represents the cardinality of the set (e.g., Fuson, 1988; Gelman & Gallistel, 1978; Schaeffer, Eggleston, & Scott, 1974; Wynn, 1992). The cardinal principle is a foundational concept that serves as a building block for later mathematical thinking. Using the Give-N task, researchers have consistently shown that English-speaking children start off learning the meanings of number words in a piecemeal fashion. First, around the age of two, children develop an understanding of the meaning of “one” (“one”-knowers; i.e., they can give one object when asked for one, and avoid giving one when asked for other quantities). They then make stage-like jumps approximately every six months to become “two”-, “three”-, and then “four”-knowers, collectively termed “Subset Knowers” (henceforth SS-knowers) because during these stages children know the exact quantities referred to by numerals for only a subset of the count list they are able to recite (Le Corre, Van de Walle, Brannon, & Carey, 2006; Lee & Sarnecka, 2010, 2011; Sarnecka & Lee, 2009; Wynn, 1990, 1992). Children who can accurately provide an experimenter with any requested number (usually tested up to 6) are labeled “Cardinal-Principle Knowers” (henceforth CP-knowers). Becoming a CP-knower is often regarded as a categorical shift distinct from the subset-knower levels. While SS-knowers might understand the cardinalities of small numbers, they have not generalized the cardinal principle such that they can flexibly apply it to a range of set sizes.
Using a wide range of tasks, several studies have found that SS-knowers and CP-knowers differ qualitatively in their interpretations of number word meanings (e.g., Condry & Spelke, 2008; Le Corre et al., 2006; Le Corre & Carey, 2007; Sarnecka & Carey, 2008; Slusser & Sarnecka, 2011; Wynn, 1990, 1992). For example, only CP-knowers, and not SS-knowers, have exact numerical meanings for quantities larger than three or four. On the Give-N task, only CP-knowers can generate a large set of objects upon request (e.g., Can you give me six fish?), and this ability defines their status as CP-knowers. Further, CP-knowers, but not SS-knowers, appear to understand that moving forward one number word on the count list means adding one item to the set; for example, when an object is added to a set of five objects, the set now has six objects and not seven (Sarnecka & Carey, 2008; see Davidson, Eng, & Barner, 2012 for evidence that this generalization extends only to a limited count list). Moreover, only CP-knowers can correctly match pictures that are labeled with large number words (e.g., This picture has eight turtles. Find another picture with eight turtles.), while SS-knowers are equally likely to choose the foil that matches on continuous extent such as total surface area, or on non-numerical attributes such as color (green turtles) or mood (happy turtles; Slusser & Sarnecka, 2011). Finally, only CP-knowers show evidence that they understand equinumerosity, the principle that two sets with the same cardinal value hold the same quantity (Sarnecka & Wright, 2013). Together, these studies show that CP-knowers differ from SS-knowers with regard to their knowledge of how number words relate to cardinalities of sets of objects, particularly for sets greater than four.
Several studies on populations that lack a verbal count list suggest that the ability to represent large exact quantities is dependent on having exact meanings for large number words. For example, studies with two Amazonian tribes show that verbal counting may allow humans to represent, match, and track large exact quantities (Frank et al., 2008; Gordon, 2004; Pica, Lemer, Izard, & Dehaene, 2004). In one study of the Pirahã, a tribe that lacks a natural number system in their language, performance on set-matching tasks was excellent for small numbers (one, two, three) but became more variable as the set size increased (Gordon, 2004). A follow-up study by Frank and colleagues (2008) replicated Gordon’s results, showing near-perfect performance on small set sizes and a linear decline in performance for quantities greater than four. In particular, they found that performance on larger sets (> 4) was especially impaired for tasks that required participants to remember particular quantities for some short period of time or to recreate an array in a different spatial orientation than the target array. This pattern of results has also been found in another population. Flaherty and Senghas (2011) showed that older deaf signers in Nicaragua who lacked a stable count list also faced difficulties recreating exact set sizes larger than six, but they were capable of recreating small sets. Collectively, these studies show that speakers of languages that do not have a verbal count list have difficulty representing and tracking large but not small quantities, suggesting that the acquisition of verbal number is central to solving nonverbal numerical problems that require the representation of large exact quantities.
These findings are consistent with previous research showing that small sets of objects can be represented with parallel individuation (PI)—an object tracking system that is present in infants, adults, and other animal species (for review, see Feigenson, Dehaene, & Spelke, 2004). Using PI representations, individual objects are represented as mental symbols in working memory, but the system’s working memory capacity is limited to representations of set sizes of up to three or four items in human adults (e.g., Feigenson & Carey, 2003, 2005; Feigenson, Carey, & Hauser, 2002; Hauser & Carey, 2003; Trick & Pylyshyn, 1994). This explains how people from the Amazonian tribes and Deaf Nicaraguan signers can still represent small exact sets by deploying the PI system, despite not having a verbal count list. However, given its set size limit, the PI system cannot support the representation of large sets beyond four. In fact, much previous research has pointed towards another innate nonverbal representation of sets—the approximate number system (ANS)—for representing sets of all sizes (Dehaene, 2011; Piazza, Pinel, LeBihan, & Dehaene, 2007; Whalen, Gallistel, & Gelman, 1999; Xu & Spelke, 2000; see Feigenson et al., 2004 and Cantrell & Smith, 2013 for reviews). Unlike the PI system, the ANS represents number as continuous mental magnitudes and is limited in precision: The ability to discriminate any two numbers is determined by their ratio (i.e., Weber’s law). For example, it is easier to discriminate 5 from 6 dots than 55 from 56 dots, even though the absolute difference is one dot in both comparisons. Therefore, although the ANS can represent large sets, it cannot generate an exact representation of large quantities, leading to the less precise performance on number tasks seen amongst people from Amazonian tribes and the Deaf Nicaraguan signers. 
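The ratio dependence described above can be made concrete with a small sketch. The function names and the threshold value below are illustrative assumptions, not quantities taken from the studies cited, and the hard cutoff is a simplification of what is in practice a graded, probabilistic discrimination process.

```python
def numerical_ratio(a: int, b: int) -> float:
    """Return the larger-to-smaller ratio of two set sizes (always >= 1)."""
    if a <= 0 or b <= 0:
        raise ValueError("set sizes must be positive")
    return max(a, b) / min(a, b)


def is_discriminable(a: int, b: int, threshold_ratio: float) -> bool:
    """Hypothetical decision rule: two sets count as discriminable when
    their ratio meets or exceeds the observer's threshold ratio."""
    return numerical_ratio(a, b) >= threshold_ratio


# Same absolute difference (one dot), very different ratios.
print(numerical_ratio(5, 6))    # 1.2
print(numerical_ratio(55, 56))  # roughly 1.018

ASSUMED_THRESHOLD = 1.15  # illustrative value only, not taken from the text
print(is_discriminable(5, 6, ASSUMED_THRESHOLD))    # True
print(is_discriminable(55, 56, ASSUMED_THRESHOLD))  # False
```

Under this simplified rule, 5 versus 6 clears the assumed threshold while 55 versus 56 does not, even though both pairs differ by exactly one item.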
Consistent with this cognitive science literature, in this paper we refer to sets within the PI range as “small” sets and beyond this range as “large” sets.
If neither PI nor the ANS can support the representation of large exact quantities, how do humans represent, for example, a set of ten objects exactly? As studies on populations that lack a verbal count list suggest, the process of acquiring number word meanings may allow us to represent and track large exact quantities (Carey, 2004, 2009). Thus, one possibility is that acquiring the cardinal principle coincides with gaining the ability to track large exact quantities. Drawing on the crosslinguistic evidence, Frank et al. (2008) argued that the verbal counting system is a “cognitive technology” that makes existing exact number representations explicit, allowing a person to note and remember exact quantities larger than four via counting. It is possible that the acquisition of verbal counting influences even more fundamental processes than memory for ephemeral quantities: Learning number language may actually change perceptual, attentional, or conceptual aspects of number representation.
Notably, both of the nonverbal number systems undergo significant developmental change over the same time period during which most children acquire meanings for number words. The acuity of the ANS improves in a protracted fashion throughout early childhood, with a discriminability ratio of 1:2 in 6-month-old infants (Xu & Spelke, 2000) and 2:3 in 9-month-old infants (Lipton & Spelke, 2003, 2004). Numerical acuity dramatically increases in children between 3 and 5 years, the same period during which children acquire verbal counting (Halberda & Feigenson, 2008), and eventually reaches a discriminability ratio of about 7:8 in Western adults (Barth, Kanwisher, & Spelke, 2003). Developmental change is also evident in the PI system: Newborns can track only two objects (Coubart, Izard, Spelke, Marie, & Streri, 2014); 12-month-olds fail tasks requiring them to track more than three objects simultaneously (Feigenson & Carey, 2003, 2005); between the ages of 3 and 6, this limit increases to sets of four or even five (O’Hearn, Hoffman, & Landau, 2011; Ross-Sheehy, Oakes, & Luck, 2003; Starkey & Cooper, 1995).
Previous studies examining the relationship between children’s counting skills and their ability to solve nonverbal numerical tasks (i.e., tasks that do not require the interpretation of numerical language like “two,” “six,” “more”) have yielded mixed findings. Importantly, many of the studies on number language and concepts do not provide convincing evidence regarding the relationship between acquiring the cardinal principle and representing large sets in typical development, because the methods used do not evaluate both of these abilities simultaneously (Brannon & Van de Walle, 2001; Huntley-Fenner & Cannon, 2000; Mix, 1999a, 1999b, 2008a, 2008b; Mix, Huttenlocher, & Levine, 1996; Negen & Sarnecka, 2010). To test the hypothesis that acquiring a count list allows us to represent and track large exact quantities, two criteria need to be met: First, it is necessary to compare preschoolers who have not yet acquired the cardinal principle (i.e., SS-knowers) to those who have (i.e., CP-knowers). The SS-knowers serve as a meaningful comparison group: Although they often have a verbal count list up to “eight” or higher, they only have meanings for the small numbers. Second, it is important to use tasks that require children to represent large exact sets greater than four.
Previous developmental studies exploring the relationship between counting and nonverbal numerical representation have not met these two criteria. Using a nonverbal triad task, Mix (1999a, 1999b, 2008a, 2008b) showed children a target set of objects (2, 3, or 4), and asked them to find its match among two or three alternative arrays that differed from the target set in various dimensions (e.g., the number of objects, density between individual objects, length, and object type). Mix tested children with a range of counting abilities (i.e., ages consistent with SS- and CP-knowers) and found that children with minimal counting proficiency (i.e., those who could give at least two objects in Give-N) could recognize numerical equivalence between sets despite surface dissimilarities, but children who completely lacked counting ability could not (see Negen & Sarnecka, 2010, for similar findings). These findings reveal that verbal counting knowledge is correlated with nonverbal numerical cognition, in that minimal counting ability corresponded with above-chance performance on a matching task. In these studies, further counting development, including performance on the Give-N task consistent with understanding cardinality, was not related to performance on the nonverbal tasks. However, the stimuli used in these studies entailed small numbers only. If cardinality is specifically related to nonverbal representation of large exact sets, the relationship could not have been observed in these studies.
In another study exploring the relationship between counting and nonverbal number representation, Brannon and Van de Walle (2001) used a nonverbal numerical comparison task and found an overall effect of verbal counting knowledge similar to that in Mix’s (1999a, 1999b, 2008a, 2008b) studies. Specifically, children who knew at least some number word meanings were better at numerical comparison than those who did not know any number words. However, while this study used both small and large sets, the children were 2- and 3-year-olds and all in the pre-counting or SS-knower range. Thus, these findings again indicate a relationship between the onset of verbal counting and nonverbal numerical cognition, but the question remains whether there is an additional relationship between acquiring cardinality and nonverbal representation of large sets.
Finally, numerous recent studies have reported a more general correlation between symbolic number knowledge, typically performance on a math achievement test, and nonverbal numerical acuity in the ANS, from infancy through adulthood (e.g., Libertus, Feigenson, & Halberda, 2011; Libertus, Odic, & Halberda, 2012; Mazzocco, Feigenson, & Halberda, 2011; Sasanguie, Göbel, Moll, Smets, & Reynvoet, 2013; Starr, Libertus, & Brannon, 2013). Most of the studies in this area do not focus on how counting ability and the acquisition of the cardinal principle relate to the development of nonverbal numerical representations. Several recent studies explicitly test for and find a relationship between cardinality and ANS acuity (Chu & Geary, 2015; Shusterman, Slusser, Halberda, & Odic, 2016; Wagner & Johnson, 2011), indicating a qualitative change in magnitude representations when children acquire cardinality. These studies used a dot comparison task in which participants were shown two sets of dots and asked to select the more numerous set (e.g., “Point to the side with more dots”). While this task is often used to investigate the ANS, this paradigm reveals how finely participants perceive more-less relations, but not how accurately participants represent any particular quantity. Thus, the dot-comparison studies do not explicitly test for a difference between CP- and SS-knowers in the exact representation of quantities specifically larger than four.
To summarize, previous developmental studies have not found a relationship between the acquisition of number words beyond four (i.e., beyond the PI range) and the representation of exact quantities above four. These studies point to only a minimal relationship between number words and performance on nonverbal tasks beyond the onset of counting, in contrast with the data from atypical populations, which suggest a strong relationship between number language and number concepts. However, those developmental studies did not explicitly hypothesize or test for a relationship between language and thought in the representation of large numbers. Additionally, there is a need in the field to develop nonverbal tasks other than dot comparison to assess the nature of children’s cognitive representations of number. The current study addresses these challenges.
The Present Study
The goal of the present research is to investigate whether children who have grasped the counting system in language (i.e., CP-knowers) show better performance in matching and tracking large quantities in nonverbal numerical tasks than those who have not (i.e., SS-knowers). We hypothesized that children should be able to solve nonverbal problems involving small quantities (one, two, and three) regardless of their verbal number knowledge, drawing on the PI system (Chi & Klahr, 1975; Feigenson et al., 2004; Starkey & Cooper, 1980). In contrast, for quantities larger than three, children who have not fully grasped the verbal counting system should show more variable and less accurate performance than children who have mastered verbal counting.
To test this, we examined preschool children’s performance on verbal and nonverbal number tasks using both small (1–3) and large (5–20) sets. In the present research, we developed two nonverbal numerical tasks, inspired by Hannula and colleagues’ research on ‘spontaneously focusing on numerosity’ or SFON (e.g., Edens & Potter, 2013; Hannula & Lehtinen, 2005; McMullen, Hannula-Sormunen, & Lehtinen, 2013, 2014; see Rathé et al., 2016, for a recent review). SFON is defined by Hannula and Lehtinen (2005) as a child’s self-generated tendency to pay attention to and engage with quantities and number without prompting or any explicit cues in the environment. SFON is measured with tasks that include the exact numerosity of a set as one of several dimensions to which a child might attend (e.g., imitating the experimenter’s action of feeding carrots to a rabbit), without explicit guidance or instruction regarding counting or numbers (i.e., without statements like “How many are there?” or “Count the carrots.”).
In Experiments 1 and 2, we adapted the Cardinality task from Hannula and Lehtinen (2005) to create the Caterpillar Game: Children were presented with stuffed toy ‘caterpillars’ with some number of ‘feet’ sewn on, and were asked to retrieve socks for each caterpillar. The major difference between our method and the original was that children in our study completed trials with all set sizes presented, whereas Hannula and Lehtinen started with a set size of one and truncated the procedure if children did not provide an exactly correct response for a certain set size. Thus, not all children in their study received trials with large set sizes, precluding systematic analyses of children’s performance on those set sizes. We presented children with a variety of set sizes that ranged from 1 to 10. Critically, following the spirit of other SFON research, the instructions did not involve number words or any numerical language (e.g., “more”). Children were also never explicitly prompted to count (e.g., by asking “how many”), but they were not discouraged from doing so.
We predicted that if acquiring the cardinal principle makes it easier for children to encode large quantities, CP-knowers should be more accurate than SS-knowers in retrieving socks for the caterpillars. However, if any difference were found between SS- and CP-knowers’ performance on the nonverbal matching task, it could alternatively be explained by children’s ability to generate verbal estimates corresponding to the number of items in the set. Therefore, we also assessed children’s estimation knowledge, and the quality of children’s mapping between verbal numerals and approximate representations of quantity in the ANS, using an estimation task, “Fast Cards,” used in previous studies (Le Corre & Carey, 2007; Odic, Le Corre, & Halberda, 2015; Shusterman et al., 2016). In this task, children were shown sets of items flashed quickly on the screen and were asked to guess the number of items. Previous studies have used this task to distinguish children who provide larger estimates for larger sets (Mappers) from those whose estimates do not differentiate among large sets (Non-Mappers). We analyzed children’s performance on the Caterpillar task as a function of the quality of their mappings on Fast Cards.
While Experiments 1 and 2 tested children’s ability to match large numerosities, in Experiment 3 we developed another nonverbal numerical task testing children’s ability to track exact large numerosities. Children were shown an apparatus called “Mr. Elephant.” They saw N balls (between 2 and 7) go into the chute on the top of the box, and then Mr. Elephant “blew” either N or N – 1 (between 1 and 7) balls out of his trunk. Children were then asked if all the balls came out. As in the nonverbal numerical matching task in Experiments 1 and 2 (the Caterpillar Game), children were never explicitly prompted to count; however, they were also not discouraged from doing so. We predicted that, if induction of the cardinal principle allows children to encode and track large quantities, CP-knowers would demonstrate better performance than SS-knowers on tracking the larger numbers of objects that were put into the chute (i.e., more than 3 or 4).
Experiment 1
In Experiment 1, children were tested on two assessments of verbal number knowledge (Give-N and Fast Cards) and one assessment of nonverbal number reasoning (Caterpillar Game). In the Caterpillar Game, three small set sizes (1, 2, and 3) and three large set sizes (6, 7, and 9) were used to test the hypothesis that CP-knowers would outperform SS-knowers in the high but not in the low number range.
Methods
Participants
Forty-nine children (M = 50 months, range = 36–63 months; 31 females) were tested at a child development laboratory or at nearby preschools. One additional child refused to follow the rules of the games during testing and was excluded from analyses. Participants were drawn from a socioeconomically diverse area and were primarily white and from middle-class backgrounds. Children received small prizes for their participation and parents who traveled to the lab received a $5 travel reimbursement.
Testing Session
Children were tested in a single session on the Caterpillar Game, Elicited Counting, Give-N, and Fast Cards, in that order. The Caterpillar Game was administered first so that performance on it would not be affected by exposure to the explicit counting tasks.
Caterpillar Game
Seven 19-inch-long caterpillars were created from dark green soccer socks that were stuffed with batting and sewn shut (Figure 1). Each caterpillar was uniquely decorated with a distinct face and features and had a different number of light green feet: three small numbers (1, 2, 3) and three large numbers (6, 7, 9). The feet were distributed along the two sides of the caterpillar body. Seventeen of the children were also tested with a five-footed caterpillar, which was introduced later in the experiment. Thirty-six identical white infants’ socks were arrayed on a table on the other side of the room from the experimenter and child. The positions of the experimental stimuli were adjusted so that children could not easily see the caterpillar when standing near the socks.
Figure 1
On each trial, children were introduced to a caterpillar and were told the following:
“Sammy wants to go for a walk, but he needs socks. See the socks over there? Could you get just enough socks for Sammy? Be careful though! If you don’t bring enough socks, his feet will be cold. But if you bring too many socks, it will make a mess. Sammy’s parents really do not like a messy room, so we don’t want to have extra socks lying around. Can you go there and get just enough socks?”
Critically, the experimenter never explicitly suggested to children that they should count the socks, and avoided using phrases like “how many socks” or “the right number of socks.” Children were then allowed to go to the sock table, which was located between 2 m and 5 m away from the testing area, and bring socks, which the experimenter and the child then put onto the caterpillar’s feet. Once all the socks that the child had brought were used (or all the feet covered), the experimenter asked the child if there were “just enough socks.” Children were encouraged to retrieve more socks or return extra socks to the pile as many times as needed. If they brought too many and did not spontaneously correct their error, the experimenter pointed out that the area was now messy and asked the children to return the extra socks; if they did not bring enough, the experimenter pointed out that the caterpillar’s feet would be cold, and asked the child to retrieve additional socks. The child thus received feedback on every trial. We present analyses only of children’s responses on the first attempt to retrieve socks, because on most trials the number of socks still needed after the first retrieval was small, and children were therefore very accurate on the second retrieval.
Each session began with a one-footed caterpillar to help children understand the task^{i}, and ended with a two-footed caterpillar to ensure that children were attentive throughout the session (i.e., if children were consistently correct on this trial, we assumed that they understood the constraints of the task through all trials). Counting was neither encouraged nor forbidden, but we noted whether the child counted caterpillars’ feet or socks. The one-footed caterpillar was used as the practice trial for each participant. The remaining trials (three, five, six, seven, or nine feet) were administered in one of three pseudorandom orders. A different caterpillar was used on every trial.
Elicited Counting
To establish if the child had a stable count list, the experimenter placed a line of 12 rubber ducks on a table and asked the child to count them. Each child’s highest count was recorded.
Give-N
This task was adapted from Wynn (1992) following the method of Le Corre and Carey (2007). Children were presented with twelve small yellow toy ducks and a large green bowl (the “duck pond”). On each trial, children were asked to put a specific number of ducks in the pond (e.g., “Can you put one in the pond?”). Before coding each trial, the experimenter asked, “Is that N?” and gave the children an opportunity to spread out the ducks in a line, check if they had the correct number, and fix the number of ducks if they identified an error. The first trial asked for one duck, followed by a request for two ducks, three ducks, and so forth until the child made an error. The order of trials was N = 1, 2, 3, 4, 5, 6, 8, 7. If children made an error, the experimenter requested one fewer item on the next trial. To be classified as an N-knower, children had to (1) give N on at least two out of three trials, (2) fail to give N+1 on at least two out of three trials where N+1 was requested, and (3) avoid giving N on at least two-thirds of the trials asking for more than N. Children classified as ‘1-knowers’, ‘2-knowers’, ‘3-knowers’, or ‘4-knowers’ were collectively classified as ‘Subset-Knowers’ (SS-knowers). Children who succeeded on all three large-number trials (6, 8, and 7) were classified as cardinal-principle knowers, or CP-knowers.
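The three classification criteria above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' scoring script: the function name, the `(requested, given)` trial format, the "pre-knower" label for children who meet no criterion, and the `max_subset_level` parameter are all hypothetical, and the "two out of three" rule is generalized to "at least two-thirds of the available trials."

```python
from collections import defaultdict


def classify_knower_level(trials, cp_requests=(6, 7, 8), max_subset_level=5):
    """Classify a child from (requested, given) Give-N trials.

    All names and defaults here are illustrative assumptions.
    """
    responses = defaultdict(list)
    for requested, given in trials:
        responses[requested].append(given)

    def at_least_two_thirds(hits, total):
        # Integer arithmetic avoids floating-point comparison with 2/3.
        return total > 0 and 3 * hits >= 2 * total

    def gives_n_when_asked(n):
        given = responses.get(n, [])
        return at_least_two_thirds(sum(g == n for g in given), len(given))

    def avoids_n_for_larger_requests(n):
        larger = [g for req, gs in responses.items() if req > n for g in gs]
        if not larger:  # no larger requests were made, so nothing to violate
            return True
        return at_least_two_thirds(sum(g != n for g in larger), len(larger))

    # CP-knower: correct on every trial at each of the large requests.
    if all(responses.get(n) and all(g == n for g in responses[n])
           for n in cp_requests):
        return "CP-knower"

    # Otherwise, credit the highest consecutive N that satisfies criteria
    # (1) and (3); stopping at the first failing N implements criterion (2).
    level = 0
    for n in range(1, max_subset_level + 1):
        if not (gives_n_when_asked(n) and avoids_n_for_larger_requests(n)):
            break
        level = n
    return f"{level}-knower" if level else "pre-knower"


# A child who reliably gives 1 and 2 but fails most requests for 3.
example = [(1, 1), (2, 2), (3, 5), (2, 2), (3, 4), (2, 2), (3, 3)]
print(classify_knower_level(example))  # 2-knower
```

The example mirrors the titration procedure: after each error on "three," the next request drops back to "two," so the child accumulates three trials at each of the two critical numbers.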
Fast Cards
Following Le Corre and Carey (2007), we used a verbal estimation task to identify whether children had made a mapping between number words in the non-subitizing range (i.e., between 4 and 10) and approximate quantities. The purpose of this task was to investigate whether children’s performance on the Caterpillar Game could be explained by their generating verbal estimates for the target number of feet. In this task, children were told “This is a game where your job is to say the number word that goes with each picture. What do you see? One fish. So for this picture, you say ‘one.’” During a demonstration set, children saw 1 to 15 fish presented in sequential order to orient them to the task. The experimenter provided the correct answer on each demonstration trial. Children then received four test blocks (trains, hats, monkeys, and snowflakes). Each block used a different fixed random order of 1, 2, 3, 4, 6, 8, and 10 items. In two blocks, total picture area and envelope size were held constant and item size varied; in the other two blocks, item size was held constant while picture area and envelope size varied. Stimuli were displayed for 1 s in a PowerPoint presentation on a laptop. Two children gave responses above 20, which were replaced with a score of 20 to reduce the impact of outlier guesses (such as 100) while giving children credit for guessing a large number. Following previous studies (Le Corre & Carey, 2007), children were classified as Mappers if the linear slope of the mean responses for 6-, 8-, and 10-item trials was above 0.3; otherwise, the child was classified as a Non-Mapper.
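The Mapper criterion can be illustrated with a short sketch. It assumes that each child's responses are stored in a dictionary keyed by set size and that the slope is an ordinary least-squares slope of mean response on set size; the function names and data layout are hypothetical, and the original analysis may have been computed differently.

```python
def mean(values):
    return sum(values) / len(values)


def estimate_slope(responses_by_set_size, set_sizes=(6, 8, 10), cap=20):
    """Least-squares slope of mean (capped) estimate against set size."""
    xs = list(set_sizes)
    # Responses above the cap are replaced with the cap, as described above.
    ys = [mean([min(r, cap) for r in responses_by_set_size[x]]) for x in xs]
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


def classify_mapper(responses_by_set_size, criterion=0.3):
    """Mapper if the slope over the large set sizes exceeds the criterion."""
    slope = estimate_slope(responses_by_set_size)
    return "Mapper" if slope > criterion else "Non-Mapper"


# Hypothetical child whose estimates grow with set size (one outlier of 100).
child = {6: [6, 5, 7, 6], 8: [8, 9, 7, 8], 10: [10, 100, 9, 10]}
print(classify_mapper(child))  # Mapper
```

Capping before averaging keeps a single wild guess from dominating the slope while still crediting the child for naming a large number.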
Results
Elicited Counting, Give-N, and Fast Cards
Thirty-eight of the 49 participants were able to count to 10 with no errors, and all could count to 8 without error. Using the Give-N task, children were classified into knower levels. We found 19 SS-knowers (M = 47 months) and 30 CP-knowers (M = 51 months). SS-knowers were further classified as one-knowers (N = 3), two-knowers (N = 9), three-knowers (N = 3), four-knowers (N = 3), and five-knowers (N = 1). Finally, responses from Fast Cards were used to sort children into Mappers and Non-Mappers. We found that no SS-knowers were able to map large number words onto large numerosities (i.e., zero SS-knowers met the criterion for Mappers). For CP-knowers, we identified 16 Non-Mappers (M = 52 months) and 14 Mappers (M = 51 months). Using the 6-, 8-, and 10-item trials, the mean slope for Non-Mappers was 0.12 (SD = .28) and the mean slope for Mappers was 0.96 (SD = .69).
Caterpillar Game
For the practice trial, children were shown a one-footed caterpillar. We found that SS-knowers brought a mean of 2.16 socks, while CP-knowers brought a mean of 2.47 socks. CP-knowers performed perfectly when the caterpillar had two feet, and SS-knowers were also highly accurate (M = 2.05). Children’s relatively worse performance on the one-footed caterpillar is likely because children expected the caterpillar to have at least two feet, and did not fully understand the constraint that they should not bring back too many socks. The first trial thus provided children the opportunity to understand and reinforce the rule of bringing just enough socks for the caterpillar. Means and standard deviations for the number of socks retrieved at each target size are shown in Table 1.
Table 1
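Expt. | Group | Statistic | Set Size (X) = 1 | 2 | 3 | 5^{a} | 6 | 7 | 9 | 20
1 | SS (N = 19) | M | 2.16 | 2.05 | 2.89 | 4.71^{a} | 5.68 | 5.37 | 5.58 | –
1 | SS (N = 19) | SD | 1.64 | 0.23 | 1.66 | 2.93 | 2.54 | 2.61 | 3.06 | –
1 | CP (N = 30) | M | 2.47 | 2.00 | 2.83 | 4.90^{a} | 5.80 | 6.10 | 7.40 | –
1 | CP (N = 30) | SD | 2.00 | 0.00 | 0.70 | 1.52 | 1.32 | 1.77 | 1.81 | –
2 | SS (N = 7) | M | 2.14 | 2.43 | – | 4.88 | – | – | – | 7.00
2 | SS (N = 7) | SD | 1.35 | 1.13 | – | 2.53 | – | – | – | 4.36
2 | CP (N = 16) | M | 2.50 | 2.00 | – | 5.25 | – | – | – | 9.75
2 | CP (N = 16) | SD | 2.22 | 0.00 | – | 1.06 | – | – | – | 4.74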
Note. SS = Subset-knowers; CP = Cardinal-principle knowers.
^{a}Only 7 SS- and 10 CP-knowers were tested with the 5-footed caterpillar in Experiment 1.
To examine if SS-knowers and CP-knowers performed differently on the Caterpillar Game, we first analyzed the number of socks children retrieved for the caterpillars, followed by an analysis of the mean absolute errors on each retrieval.
Mean Retrieval
On low-number trials, SS-knowers and CP-knowers showed similar performance: On their first attempt, both groups brought more socks for the three-footed than the two-footed caterpillar, SS: t(18) = 2.19, p = .042, d = 1.03; CP: t(29) = 6.53, p < .001, d = 2.43. However, performance differed between groups on the high-number trials: Only CP-knowers demonstrated an understanding that they should bring more socks for more feet. Focusing on six and nine (the smallest and largest of the high-number trials), CP-knowers brought more socks for the nine-footed than for the six-footed caterpillar, t(29) = 4.74, p < .001, d = .89, while SS-knowers did not, t(18) = 0.16, p = .878, d = .034.^{ii} Additionally, for the nine-footed caterpillar, CP-knowers brought significantly more socks than did the SS-knowers, t(47) = 2.62, p = .012, d = .686. This difference between SS- and CP-knowers was not significant at lower set sizes. Finally, the number of socks that CP-knowers brought for the six-, seven-, and nine-footed caterpillars increased linearly with the increasing number of feet (M_{slope} = 0.55, SD = .58). This slope was significantly and robustly different from a chance slope of zero, t(29) = 5.151, p < .001, d = 5.15. In contrast, SS-knowers exhibited a flat slope over these three high-number trials (M_{slope} = 0.02, SD = .91) that was not significantly different from zero, t(18) = 0.072, p = .943, d = .27. The mean slope of CP-knowers was significantly higher than that of SS-knowers, t(47) = 2.66, p = .01, d = .776, reflecting CP-knowers’ tendency (and SS-knowers’ inability) to bring more socks for caterpillars with more feet on the high-number trials (the 6-to-9 range).
Mean Error
SS-knowers and CP-knowers also differed in the magnitude of their errors in retrieving socks. Error was defined as the absolute difference between the number of feet on the caterpillar (target) and the number of socks brought on the first attempt (response). For example, for a seven-footed caterpillar, children who brought back either five socks or nine socks would have an error of 2.
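The error measure is simple enough to state as code. The sketch below follows the definition above; the function names and the three-trial example are illustrative and are not taken from the study's materials.

```python
def absolute_error(target_feet, socks_brought):
    """Absolute difference between target and first-attempt response."""
    return abs(target_feet - socks_brought)


def summed_error(trials):
    """Total error across (target, response) pairs for one child."""
    return sum(absolute_error(target, response) for target, response in trials)


# The worked example from the text: five or nine socks for seven feet.
too_few = absolute_error(7, 5)
too_many = absolute_error(7, 9)

# Hypothetical child on the six-, seven-, and nine-footed trials.
example_total = summed_error([(6, 5), (7, 9), (9, 5)])
```

Because the measure is unsigned, over- and underestimates of the same size count equally, which is why it is analyzed separately from mean retrieval.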
We conducted a 2 × 2 mixed ANOVA with Knower-Level (SS vs. CP) as a between-subjects factor and Set Size (Small vs. Large) as a within-subjects factor. The dependent variable was the summed error across trials. The analysis revealed an overall main effect of Knower-Level, F(1,47) = 14.89, p < .001, η^{2}_{p} = .241, showing that the magnitude of errors of CP-knowers (M = 4.07, SD = 3.83) was smaller than that of SS-knowers (M = 8.89, SD = 4.89). Not surprisingly, we also found a main effect of Set Size, F(1,47) = 92.18, p < .001, η^{2}_{p} = .662, showing that the magnitude of errors on large-number trials (M = 5.45, SD = 4.43) was larger than that on small-number trials (M = 0.49, SD = 1.06). We also found the hypothesized Knower-Level × Set Size interaction, F(1,47) = 11.75, p = .001, η^{2}_{p} = .200. For the small-number trials, the magnitude of errors was similar for SS-knowers, M = 0.79, SD = 0.15, and CP-knowers, M = 0.30, SD = 0.65, t(47) = 1.60, p = .117, d = .57. However, for the large-number trials, the magnitude of errors for SS-knowers, M = 8.11, SD = 4.63, was significantly greater than that for CP-knowers, M = 3.77, SD = 3.40, with a large effect size, t(30) = 3.53, p = .001, d = 1.29. We also analyzed errors for the six-, seven-, and nine-footed caterpillars individually, and found that SS-knowers made significantly larger errors than CP-knowers for each of the large-number caterpillars (six-footed: t(47) = 3.14, p = .003, d = .92; seven-footed: t(47) = 2.28, p = .027, d = .67; nine-footed: t(47) = 3.31, p = .002, d = .97; see Figure 2).
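The large-number comparison reports 30 degrees of freedom rather than 47, which is what an unequal-variance (Welch) t-test would yield. That reading is an inference from the reported values, not something stated in the text. A minimal sketch recomputing the statistic from the reported means, standard deviations, and group sizes (19 SS-knowers, 30 CP-knowers):

```python
import math


def welch_t(m1, sd1, n1, m2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite df from summary statistics."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1 mean
    v2 = sd2 ** 2 / n2  # squared standard error of group 2 mean
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Large-number error: SS-knowers (M = 8.11, SD = 4.63) vs. CP-knowers (M = 3.77, SD = 3.40).
t_large, df_large = welch_t(8.11, 4.63, 19, 3.77, 3.40, 30)
```

With these inputs the statistic comes out at about 3.53 on roughly 30 degrees of freedom, in line with the reported test.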
Figure 2
Performance by Knower-Level
Although the sample size did not permit analysis by each knower-level, we were interested in whether children’s performance increased across knower-levels prior to the CP transition. We separated the SS-knower group into two groups, 1/2-knowers (N = 12) and 3/4-knowers (N = 7). We compared these two groups to each other and found no significant differences on any of the critical dependent measures: mean retrieval for large set sizes, total error for large set sizes, and slope from 6 to 9, even after controlling for age (retrieval: F(1,16) = .076, p = .786, η^{2}_{p} = .005; error: F(1,16) = .297, p = .593, η^{2}_{p} = .018; slope: F(1,16) = .768, p = .394, η^{2}_{p} = .046). Finally, to confirm that the difference between SS-knowers and CP-knowers would hold for both groups of SS-knowers, not just the younger or less knowledgeable ones, we ran a series of one-way ANOVAs with Group as the independent variable (3 levels: 1/2-knowers, 3/4-knowers, and CP-knowers) and Total Error for the large sets, Slope for the large sets, and Mean Retrieval for the large sets as the dependent variables. All three ANOVAs were statistically significant, F(2,46) > 4.15, p < .03, η^{2}_{p} > .22. Importantly, post-hoc LSD tests indicated that the differences between the two groups of SS-knowers were not statistically significant, but the differences between 3/4-knowers and CP-knowers were (Total Error: 1/2-knowers (M = 8.67) vs. 3/4-knowers (M = 7.14), p = .419, d = .33; 3/4-knowers vs. CP-knowers (M = 3.77), p = .046, d = 1.00; Slope: 1/2-knowers (M = -.125) vs. 3/4-knowers (M = .26), p = .275, d = .416; 3/4-knowers vs. CP-knowers (M = .55), p = .011, d = 1.37; Retrieval: 1/2-knowers (M = 5.33) vs. 3/4-knowers (M = 5.90), p = .487, d = .24; 3/4-knowers vs. CP-knowers (M = 6.43), p = .067, d = .38, the last finding significant in a one-tailed t-test).
Effects of Counting
One explanation for why CP-knowers outperformed SS-knowers is that CP-knowers more readily engaged in counting as a problem-solving strategy. To address this, we analyzed effects of counting on task performance. We loosely defined children as “Counting” if they showed evidence of engaging in overt counting on any trial (N = 26) and “Not Counting” if they did not count on any of the trials (N = 23). Only three participants chose to count on every trial, and nobody counted both the feet and the socks on a single trial. Of the 19 SS-knowers, 7 counted on at least one trial, while 12 never counted. Of the 30 CP-knowers, 19 counted on at least one trial, while 11 never counted. Counting was marginally associated with Knower-Level, χ^{2}(1) = 3.28, p = .07.
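The association test can be reproduced from the counts given above. The sketch below assumes a Pearson chi-square without continuity correction, which is the variant that matches the reported value; the function name is illustrative.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper-tail probability reduces to erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: SS-knowers, CP-knowers. Columns: counted on at least one trial, never counted.
chi2, p = pearson_chi2_2x2(7, 12, 19, 11)
```

Roughly 37% of SS-knowers (7 of 19) and 63% of CP-knowers (19 of 30) counted at least once, which is the difference the test evaluates.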
Of the 26 children who counted, only four, all CP-knowers, brought the correct number of socks for each trial on which they counted, implying that counting does not guarantee success on this task. However, on large-number trials (6, 7, and 9), children who counted made fewer total errors on average (M = 3.27) than children who did not count (M = 7.91), t(29) = 4.10, p < .001, d = 1.53.
Next we explored whether Counting and Knower-Level contributed independently to success on the task, or whether a propensity to count was behind the better performance seen in the CP-knowers. An ANOVA with total error on large sets as the dependent measure, counting behavior (Counting, Not Counting) and Knower-Level (SS, CP) as fixed factors, and age in months as a covariate, showed main effects of counting, F(1,44) = 12.95, p < .001, η^{2}_{p} = .215, and knower-level, F(1,44) = 7.00, p = .011, η^{2}_{p} = .137, no interaction between the two, and neither a main effect of age nor any interaction with age. Looking only at the subgroup of children who did not count, CP-knowers still made fewer total errors in the large-number range than did SS-knowers, t(21) = 2.38, p = .027, d = 1.039, a large effect. This CP-knower advantage was also observed for children who counted, though with a smaller effect, t(24) = 2.18, p = .039, d = .636. Thus, the effects of knower-level on performance in the Caterpillar Game appear to be independent of whether children choose to count during the task and independent of age.
Effects of Verbal Estimation Skills
We also tested the possibility that CP-knowers outperformed SS-knowers because of superior abilities in generating verbal estimates. The Caterpillar Game is, in some sense, a nonverbal estimation task. Our group of CP-knowers included both Mappers, who could generate fairly accurate verbal estimates for set sizes between 6 and 10, and Non-Mappers, whose estimates were far less accurate. We reasoned that if verbal estimation ability was beneficial to performance on this task, then CP-Mappers should outperform CP-Non-Mappers. Contrary to this prediction, CP-Mappers and CP-Non-Mappers made a similar number of total errors on the large-number trials (mean total error 3.94 vs. 3.97, respectively, t(28) = 0.29, p = .770, d = .11) and exhibited nearly identical slopes on these trials (0.56 for Non-Mappers, 0.54 for Mappers, t(28) = 0.05, p = .958, d = .019). There was no correlation between the slope of children’s verbal estimates and the slope of their responses on the Caterpillar Game for numbers above 3 (r = .07, p = .64). Thus, although Mappers and Non-Mappers differ in the accuracy of their verbal estimates for higher numbers, their nonverbal estimation abilities appear equivalent.
Set Size Five
Because of the marked differences between SS- and CP-knowers on the high-number set sizes, it was of interest to know whether SS-knowers’ ability to solve the task was poor for any set size outside the PI limit, or alternatively, whether their ability declined incrementally with increasing set size. Consequently, the last seventeen children tested received an additional trial with a five-footed caterpillar intermixed with the other trials. SS- and CP-knowers brought similar numbers of socks, 4.71 (SS) vs. 4.90 (CP), t(15) = 0.17, p = .87, d = .088. CP-knowers’ errors were smaller in magnitude than SS-knowers’, 0.70 vs. 2.00, though this difference did not reach significance, t(15) = 1.61, p = .13, d = .831. Though these findings suggest that SS- and CP-knowers performed similarly on the 5-footed caterpillar, CP-knowers brought significantly fewer socks for the five- than the six-footed caterpillar, t(9) = 2.25, p = .05, d = 2.1, while this difference was not statistically significant in SS-knowers, t(6) = 0.79, p = .46, d = .51. To further explore the question of whether SS-knowers’ performance in the large-number range drops linearly, we analyzed their slopes for the five-, six-, seven-, and nine-footed caterpillars. Only children who were tested on the five-footed caterpillar were included in this analysis. We found that SS-knowers exhibited a flat slope over these four high-number trials (M_{slope} = 0.14) that was not significantly different from zero, t(16) = 0.37, p = .72, d = .61. These results suggest that SS-knowers treated five as they did the other large sets: less precisely than their own performance with small sets, and less precisely than did CP-knowers.
Effects of Age, Sex, and Fall vs. Spring Testing
As a final test, separate ANCOVAs were run with Knower-Level as the independent factor and total error as the dependent variable; counting and one other factor (age, sex, or spring vs. fall testing) were included as covariates in each analysis. Time of testing was identified as a possible covariate because children who were tested in the spring had several more months of schooling than those tested in the fall, and thus may have demonstrated better performance on the Caterpillar Game. Knower-Level was significantly associated with total error in all three analyses. None of the three covariates yielded significant effects (Age: F(1,45) = 2.43, p = .13, η^{2}_{p} = .051; Sex: F(1,45) = 0.015, p = .903, η^{2}_{p} = .037; Time of testing: F(1,45) = 0.103, p = .750, η^{2}_{p} = .096). These results indicate that having exact meanings for number words higher than four is robustly related to the ability to solve problems with quantities higher than three in a nonverbal task, even after controlling for the effects of spontaneous counting, age, sex, and amount of schooling.
Discussion
Consistent with our hypothesis, Experiment 1 demonstrated comparable performance between SS- and CP-knowers for small numbers up to four, and contrasting performance between SS- and CP-knowers for large numbers between six and nine. CP-knowers gave increasing responses for larger targets in the high-number range, while SS-knowers did not show differential responses for caterpillars that had a large number of feet. These differences were statistically significant and robust with large effect sizes.
Although there were no significant differences in mean retrieval for two of the large set sizes (six and seven), this is likely because SS-knowers brought roughly 5.5 socks no matter how many were required. Thus, while their responses appeared accurate for set sizes 6 and 7, this was likely just an artifact of 5.5 being SS-knowers’ typical response for all set sizes. In the more informative comparisons on set size nine, errors for 6–9, and slopes in the 6–9 range, SS- and CP-knowers’ responses were clearly different.
These effects could not be explained by overt counting or by estimation skills. Rather, the results suggested that children’s knowledge of verbal counting and cardinality was related to more refined approximate performance on the nonverbal numerical task. The results implied a sharp cutoff in performance beyond the PI range, but this needed to be tested further as Experiment 1 did not systematically include, for all children, a set size of 5, just beyond the PI boundary. Experiment 1 also showed a stark contrast between small and large sets for subset-knowers, in that they were sensitive to quantities within the small-number range, but completely insensitive to quantities beyond it. Given that even subset-knowers have access to approximate number representations, which should support reasoning about larger quantities, this result raises the question of why such representations were not engaged on trials with quantities above four. Experiment 2 was designed to explore these two findings further by introducing a 5-footed caterpillar, just beyond the PI range, and a 20-footed caterpillar, to test whether subset-knowers would express more sensitivity to a more extreme numerical difference.
Experiment 2
Results from Experiment 1 indicated a clear distinction between SS-knowers and CP-knowers on their performance on large-number trials. Specifically, SS-knowers appeared to treat all large sets (between 5 and 9) similarly in the Caterpillar Game, while CP-knowers differentiated among the large set sizes. Experiment 2 was conducted to address two questions that were raised by the Experiment 1 results. First, it was not clear whether children’s representation of set size ‘five’ followed the pattern of exact responses to small numbers, or poorly differentiated responses to large numbers, or an intermediate pattern. Many SS-knowers brought approximately five socks for a variety of large-number trials, making it difficult to tell whether a response of five socks to five feet was an accurate response based on an exact representation, or a typical response to large numbers. The five-footed caterpillar trials in Experiment 1 provided preliminary evidence that children handled sets of five as a “large” number, suggesting a sharp drop-off in performance beyond the PI system’s limit. However, because this set size was added partway into the study, only a small group of children was tested on ‘five’ and the comparison of error between SS-knowers and CP-knowers did not reach statistical significance. Experiment 2 therefore aimed to replicate the patterns on set size five with a different sample of children.
Second, it was not clear whether SS-knowers had entirely failed to notice the differences between the six-, seven-, and nine-footed caterpillars, or, alternatively, whether they had difficulty discriminating these quantities in order to solve a number-relevant problem. To address this, we included a caterpillar with 5 feet and a caterpillar with 20 feet, as children should be more likely to notice the difference between 5 and 20 feet than the difference between 6 and 9 feet. If SS-knowers used this very marked difference to retrieve more socks for the 20-footed caterpillar, it is likely that they simply did not notice the differences between the high-number caterpillars in Experiment 1. If, however, SS-knowers did not bring more socks for the 20-footed caterpillar, perhaps they noticed the difference but were unable to apply this information to solve the numerical problem.
Methods
Participants
Twenty-three children (M = 48 months, range = 37–61 months) participated in Experiment 2. Children were recruited as in Experiment 1. They completed Give-N, Fast Cards, and the Caterpillar Game, in that order. Give-N and Fast Cards were identical to Experiment 1.
Caterpillar Game
The Caterpillar Game consisted of four trials, with set sizes 1, 2, 5, and 20. As in Experiment 1, the first trial involved a one-footed caterpillar to make sure children understood the task, and the two-footed caterpillar came last. The order of the two middle trials (five- and twenty-footed caterpillars) was counterbalanced across children. Two additional children, both subset-knowers, did not complete the Caterpillar Game and were therefore not analyzed.
Results
We identified 7 SS-knowers (1 one-knower, 4 two-knowers, 1 three-knower, and 1 four-knower) and 16 CP-knowers using Give-N. CP-knowers were further sorted using Fast Cards into 8 Non-Mappers and 6 Mappers; 2 CP-knowers did not complete the task. A preliminary analysis revealed no differences between CP-Mappers and CP-Non-Mappers on either the 5- or 20-foot trials (ps > .4), so these groups were combined for analysis as CP-knowers.
We first analyzed children’s performance on the five-footed and twenty-footed caterpillars separately to better understand children’s responses to quantities outside the PI range (see Table 1). For the five-footed caterpillar, SS-knowers brought an average of 4.88 socks (SD = 2.53) while CP-knowers brought 5.25 (SD = 1.07). The difference in mean retrieval was not significant between groups, t(21) = 0.05, p = .961, d = .02, but the difference in error was large and significant: SS-knowers’ error averaged 2.13 socks compared to 0.63 for CP-knowers, t(21) = 3.13, p = .005, d = 1.37.
For the twenty-footed caterpillar, both SS- and CP-knowers noticed and commented that there were “a lot” of feet, and indeed retrieved more socks, on average, for this caterpillar (7.00 for SS, 9.75 for CP) than children had in Experiment 1 for any other set size, including the 9-footed caterpillar; this difference between 9 and 20 was statistically significant only in CP-knowers, t(44) = 2.42, p = .02, d = .73.
Next, we asked whether children, particularly the SS-knowers, were sensitive to the difference between the 5-footed caterpillar and the 20-footed caterpillar. CP-knowers retrieved significantly more socks for the 20- than the 5-footed caterpillar, t(15) = 3.67, p = .002, d = 1.12. SS-knowers also retrieved more socks for the 20- than the 5-footed caterpillar, though this difference did not reach statistical significance, t(6) = 1.77, p = .127, d = 1.18. Analyzing individual children’s data, we note that five SS-knowers brought more socks for the more numerous target, and only one child brought more for the less numerous one; this pattern was significant, Wilcoxon signed-rank test, Z = 17, p = .05 (one child brought the same number). These findings suggest that neither SS- nor CP-knowers treated the 5-footed and 20-footed caterpillars as if they were the same. Although the pattern of results is similar in both groups, CP-knowers differentiated their responses more by bringing more socks for the 20-footed caterpillar.
Discussion
Experiment 2 extended the findings from Experiment 1 by examining children’s performance with sets of five (just beyond the PI range) and sets of twenty (more than double the highest set size from Experiment 1). For the five-footed caterpillar, the variance in the responses was much higher in SS- than CP-knowers, just as it was for the six-, seven-, and nine-footed caterpillars in Experiment 1. This pattern suggests that five is treated as a ‘large’ number by subset-knowers, corroborating the conclusion from Experiment 1 that there is a divergence in the quality of SS- and CP-knowers’ performance on this task with set sizes outside the PI range.
For the twenty-footed caterpillar, the pattern of responses did not exactly mirror those observed for large sets under ten, in two ways. First, SS-knowers showed some sensitivity (by Wilcoxon test, above) to the difference between five and twenty, indicating that with sufficiently different target quantities, they could differentiate their responses. Second, even CP-knowers underestimated how many socks to bring back, and the differences between SS- and CP-knowers were not as strong for the 20-footed caterpillar as for other set sizes. If CP-knowers were generally more attuned to quantity (for instance, if their performance on twenty could be predicted by extrapolating from their responses for six, seven, and nine), then the 20-footed caterpillar provided them with the best chance to demonstrate a large difference from SS-knowers, but they did not. Notably, “twenty” was beyond the comfortable estimation range for most of the children. Rather than revealing a robust difference between SS- and CP-knowers in their representation of twenty, it appears that the difference between groups diminished when children were presented with a larger quantity that was less familiar to both groups.
In short, Experiments 1 and 2 indicated that CP-knowers had a more refined response to a quantity-matching task, with more accurate and less noisy estimates of socks for a given number of feet. We note two results that could have occurred but did not: CP-knowers could have solved the task precisely, simply by counting the feet and the socks; or CP-knowers could have exhibited better exact matching of the numbers of socks and feet with a different nonverbal representation (e.g., chunking). However, the observed pattern suggests that CP-knowers had a more refined approximate representation of the number of feet and corresponding socks for larger sets beyond the PI range (i.e., above four).
Experiment 3
While Experiments 1 and 2 revealed an advantage in CP-knowers in their use of approximate representations of large sets, Experiment 3 explored whether CP-knowers show advantages in the representation and use of exact numerical quantities. To this end, we created a new nonverbal task that required children to track an exact number of objects.
This paradigm, the Mr. Elephant Game, tested children’s ability to track large exact quantities beyond the PI range. In the Mr. Elephant Game, the experimenter placed balls inside a box named “Mr. Elephant.” On half of the trials, the experimenter surreptitiously stopped one ball from coming out of Mr. Elephant (by toggling a small plastic disc), and on the other half of the trials, all of the balls came out. At the end of each trial, children were asked whether there were balls left in the box. While both the Mr. Elephant Game and the Caterpillar Game were nonverbal numerical tasks, one fundamental difference between these two tasks is the nature of the responses elicited from the children. The Caterpillar Game requires children to retrieve “just enough socks” from a very large pile, posing an essentially open-ended problem of how many socks to retrieve. In contrast, the Mr. Elephant Game asks children to distinguish x objects from x − 1 objects by answering a yes-or-no question; thus, on a given trial, children’s responses were always correct or incorrect.
Methods
Participants
Nineteen children (M = 46.8 months, range = 38–56 months, 9 females) participated in Experiment 3. All participants were recruited as in Experiment 1.
Testing Session
Fifteen of the 19 children were run in a single testing session on Give-N and the Mr. Elephant Game, in that order. The remaining four participants were run on Give-N in one testing session and on Mr. Elephant in a second testing session two days later.
Mr. Elephant
A hollow, wooden cube (length, width, and height = 27 cm) was painted dark blue, and paper eyes and felt ears were pasted on the front and sides of the box to create “Mr. Elephant” (Figure 3). There was one cylindrical chute on the top of the box and another chute coming out the front, connected by a tube inside the box. Two small plastic doors inside the tube, operated by levers on the exterior of Mr. Elephant, allowed the experimenter to stop the balls from passing all the way through the tube. One door was near the top, and one was near the trunk.
Figure 3
Testing Procedure
At the beginning of the experiment, the child was shown a bowl containing 7 green Styrofoam balls 4 cm in diameter. The experimenter explained to the child that Mr. Elephant liked to eat the “green peanuts” and then blow them out of his trunk. But sometimes, the child was told, the peanuts got stuck in his trunk, so Mr. Elephant needed the child’s help to make sure all the peanuts came out.
On each trial, the experimenter placed either 2, 3, 5, or 7 balls on top of the box in a fixed pseudorandom order. Each number of balls was presented to the child twice (one trial releasing N balls, and the other trial releasing N − 1 balls), yielding a total of eight trials per testing session. Each testing session began with the easier 2-ball trials, which introduced the procedure to the child, and ended with the 3-ball trials, to ensure that children understood the task and were attentive throughout the session. The remaining trials (5 and 7) were presented in one pseudorandom order (5 in/4 out, 7 in/7 out, 5 in/5 out, 7 in/6 out). Feedback was available on every trial since one more ball either came out or did not on each trial.
The experimenter circumscribed the balls with her finger and said, “Look! I'm going to feed Mr. Elephant these peanuts!” At the beginning of each new trial, the experimenter said “Remember, let me know if you think a peanut is stuck.” The experimenter then dropped the balls into the top chute one by one. The balls were blocked from immediately coming out of the front chute by the small plastic door near the trunk. On one of the two trials for each number, the experimenter surreptitiously toggled the second plastic disc to block the final ball from going down the top chute.
The experimenter then told the child that Mr. Elephant was going to blow out the ‘peanuts,’ lifted the disc blocking the front chute, and allowed the balls to come out. The child was then asked, “Did they all come out?” An affirmative response was correct for the 50% of the trials when all of the balls came out, while a negative response was correct for the 50% of trials when all but one of the balls came out. Once all of the balls were out, the experimenter said “Good job!” if the child was correct. If the child was incorrect, she would say “Let’s check! Uh oh! I think a peanut is stuck! Can you make it come out? Thank you!” or “Oops! It doesn’t look like any peanuts were stuck.”
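The trial structure and scoring rule can be summarized in a short sketch. The middle four trials follow the order given in the procedure; the order of the two trials within the 2-ball and 3-ball pairs is not specified in the text and is an assumption here, as are all names in the code.

```python
from typing import NamedTuple


class Trial(NamedTuple):
    balls_in: int   # balls dropped into the top chute
    balls_out: int  # balls released from the front chute

    @property
    def all_came_out(self) -> bool:
        return self.balls_in == self.balls_out


TRIALS = [
    Trial(2, 2), Trial(2, 1),  # introductory 2-ball pair (within-pair order assumed)
    Trial(5, 4), Trial(7, 7), Trial(5, 5), Trial(7, 6),  # order stated in the procedure
    Trial(3, 3), Trial(3, 2),  # closing 3-ball pair (within-pair order assumed)
]


def is_correct(trial: Trial, said_all_came_out: bool) -> bool:
    """Score the child's answer to "Did they all come out?"."""
    return said_all_came_out == trial.all_came_out


def proportion_correct(responses):
    """responses: list of (Trial, bool) pairs for one child."""
    return sum(is_correct(t, r) for t, r in responses) / len(responses)
```

Because half of the trials have a ball held back, a child who gives the same answer on every trial scores exactly 50%, which is the chance level used in the analyses.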
Results
We identified 9 SS-knowers (4 one-knowers, 1 two-knower, 3 three-knowers, 1 four-knower) and 8 CP-knowers using Give-N. Two five-knowers were also identified, but were excluded from subsequent analyses.^{iii}
We tested whether CP-knowers would respond more accurately than SS-knowers on the large-number trials in the Mr. Elephant Game, as they had on the Caterpillar Game. A 2 × 2 mixed ANCOVA was performed on the percentage of correct responses, with Set Size (Small [2- and 3-ball trials] vs. Large [5- and 7-ball trials]) as a within-subjects factor, Knower-Level (SS vs. CP) as a between-subjects factor, and age (in months) as a covariate. This analysis revealed a significant main effect of Knower-Level, F(1,14) = 11.87, p = .004, η^{2}_{p} = .46, with CP-knowers (M = .81, SD = .093) correctly assessing whether all the balls had come out more frequently than SS-knowers (M = .64, SD = .083). In addition, as predicted, there was a significant interaction between Knower-Level and Set Size, F(1,14) = 8.49, p = .011, η^{2}_{p} = .38 (see Figure 4). There was no significant effect of age. Post-hoc pairwise comparisons revealed that in the Small trials, SS- and CP-knowers performed equally (SS: M = .94, SD = .11; CP: M = .94, SD = .12; p = .93, d = 0). In contrast, CP-knowers significantly outperformed SS-knowers on the Large trials (SS: M = .34, SD = .18; CP: M = .69, SD = .13; F(1,14) = 15.24, p = .002, η^{2}_{p} = .52). Furthermore, both SS- and CP-knowers performed well above chance (50% accuracy) for small set sizes (ps < .001), while only CP-knowers performed above chance for large set sizes, t(7) = 3.03, p = .019, d = 1.74. In fact, surprisingly, SS-knowers fell significantly below chance on the large set size trials, t(8) = 2.88, p = .021, d = 1.70.
Figure 4
Looking at Table 2, it is clear that both SS- and CP-knowers were more accurate on N − 1 trials, suggesting that children had an overall tendency to say that something was stuck in Mr. Elephant; nevertheless, CP-knowers were more accurate than SS-knowers on both trial types (N and N − 1).
Table 2
| Trial Type | SS-knowers | CP-knowers |
|---|---|---|
| Empty | .111 (.220) | .500 (.378) |
| One Inside | .611 (.417) | .813 (.372) |
| d′ (d-prime) | .947 | .878 |
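For readers unfamiliar with the sensitivity index in the last row, the standard signal-detection formula is sketched below. The text does not describe how d′ was computed for Table 2, so the mapping used here is an assumption: a hit is saying a ball is stuck when one is inside, and a false alarm is saying a ball is stuck when the box is empty. The rates in the example are hypothetical rather than taken from the table.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Signal-detection sensitivity: z(hit rate) minus z(false-alarm rate).

    Rates must lie strictly between 0 and 1, because the inverse normal
    CDF is undefined at the endpoints.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical rates, for illustration only.
sensitive_child = d_prime(0.80, 0.50)
biased_child = d_prime(0.60, 0.60)  # always leaning toward "stuck", no discrimination
```

The index separates a general tendency to say that something is stuck (which raises hits and false alarms together) from the ability to tell the two trial types apart.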
Effects of Counting
Counting behavior was not recorded for all participants, but was available for 4 of the 8 CP-knowers and 7 of the 9 SS-knowers in our sample. No participant counted on every trial, 4 children (2 SS-knowers and 2 CP-knowers) counted on at least one trial, and 7 children never counted. Of the two SS-knowers who counted, both counted on two trials, but responded inaccurately. Of the two CP-knowers who counted, one counted on one trial and responded accurately. The other CP-knower counted on two trials, and was accurate on one trial but inaccurate on the other trial.
Including only the children for whom counting behavior was recorded, an ANCOVA with SS/CP knower-level as the independent variable, age and counting behavior as covariates, and proportion correct on small and large trials as a repeated measure, revealed a large and significant main effect of SS/CP, F(1,7) = 10.66, p = .014, η^{2}_{p} = .60, a marginal interaction between set size and SS/CP, F(1,7) = 4.609, p = .069, η^{2}_{p} = .40, and no other significant effects or interactions. Specifically, there was no hint that counting behavior explained the SS/CP difference, because no effect of counting was detected, F(1,7) = .034, p = .859, η^{2}_{p} = .005. Thus, knower-level, not counting behavior or age, predicted success on the Mr. Elephant task.
Discussion
Experiment 3 replicated and extended the pattern of results from Experiment 1 using a different task. SS- and CP-knowers performed similarly and were highly accurate on a nonverbal number task for smaller numbers (1 to 3), but CP-knowers significantly outperformed SS-knowers for larger numbers (> 4). Using a nonverbal tracking task, we found evidence that understanding the cardinal principle is related to better tracking and memory for large quantities. Specifically, in the Mr. Elephant Game, children were asked to track one set of items over a temporal and spatial gap, and to notice whether the complete set was reestablished, whereas in the Caterpillar Game, children had to numerically match one set of objects (feet) to another set of objects (socks) in one-to-one correspondence. Nevertheless, we found the same pattern of results: CP-knowers outperformed SS-knowers on large-number trials.
General Discussion
The current experiments test the relationship between preschool children’s knowledge of cardinality and their responses on two nonverbal tasks: numerical matching (Caterpillar Game) and objecttracking (Mr. Elephant Game). The results provide strong support for the hypothesis that verbal counting and nonverbal quantitative reasoning are related in children, and help bring clarity to a conflicted body of findings about the relationship between these skills during development (e.g., HuntleyFenner & Cannon, 2000; Mix, 1999a, 1999b; Rousselle, Palmers, & Noël, 2004). Just like adults whose language lacks an integer list (Flaherty & Senghas, 2011; Frank et al., 2008; Spaepen, Coppola, Spelke, Carey, & GoldinMeadow, 2011), children who do not yet understand cardinality exhibit relatively coarse representations of numerical quantity when the quantity to be represented is greater than four—beyond the limit of the parallel individuation system. On smallnumber trials with set sizes of one, two, and three, SSknowers performed no differently from CPknowers, consistent with evidence for a primitive, languageindependent cognitive system for representing small exact quantities (Agrillo, Piffer, Bisazza, & Butterworth, 2012; Feigenson, Carey, & Hauser, 2002; Starr, Libertus, & Brannon, 2013). However, on largenumber trials with set sizes between five and nine, CPknowers were more sensitive to the differences between quantities: In the Caterpillar Game, they brought more socks for larger sets of feet, and the number of socks they brought was closer to the target, and in the Mr. Elephant game, they more accurately identified whether the exact number of hidden objects reappeared. Thus, mastery of verbal counting was correlated with performance on large, but not small, set sizes.
Three verbal mathematical capacities other than cardinality notably did not relate to performance on the nonverbal tasks. First, counting behavior during the nonverbal Caterpillar Game improved children’s performance on the large number trials, but it did not explain the difference in success between SS and CPknowers. Second, verbal estimation skill within the CPknower groups was also unrelated to performance on the Caterpillar Game (note that children were not tested on Fast Cards in Experiment 3). Third, no differences were observed across different levels of SSknowers (e.g., “one”knowers, “two”knowers, etc.), with the caveat that the samples were small; as a group, SSknowers’ performance differed substantially from that of CPknowers. Thus, after controlling for age, children’s mastery of verbal counting—specifically, their understanding of cardinality—was the key predictor of accuracy on a nonverbal numerical matching task.
The current data establish the relationship between the acquisition of meanings for large numbers and nonverbal processing for quantities and numerals larger than “four.” We provide developmental evidence for the cross-cultural finding that knowledge of a meaningful count list is related to more precise nonverbal representation of large numbers (Flaherty & Senghas, 2011; Frank et al., 2008; Gordon, 2004; Spaepen et al., 2011).
These findings raise two important questions: First, which cognitive systems underlie the observed change in performance on nonverbal tasks, and second, what is the causal mechanism underlying the relationship between verbal and nonverbal numerical knowledge?
Some researchers have argued that the role of number language is to provide a concept of exact number or a “tool for thought” enabling exact number representations (Frank et al., 2008), but this does not seem to be the right explanation for CP-knowers’ more precise performance in this study. In the Caterpillar Game, even CP-knowers did not use exact representations of quantity to match numbers of socks to feet; rather, they tended to retrieve an approximately correct number of socks. Furthermore, both groups’ near-perfect performance on the small set sizes indicates that they understood the task and its specifically numerical demands. Therefore, if CP-knowers paid more attention to numerosity, this was not a binary switch between attending or not attending to number: Both groups attended to numerosity in the small-number trials, and even CP-knowers failed to engage exact number in the large-number trials. Furthermore, when comparing adults from numerate cultures to adults from innumerate cultures, the reason for differences in performance seems clear: The adults in numerate cultures can count, and thus form a stable representation to assist memory for ephemeral events. The children in our study who understood counting, however, did not necessarily deploy it in the service of solving the nonverbal problems. Moreover, the advantage of CP-knowers over SS-knowers held even after taking counting behavior into account. Therefore, our findings suggest a different role of language than a “tool for thought”: Acquiring language for numbers might be more deeply related to the cognitive representation of numerical quantity, or the use of these representations in computation, beyond providing an efficient tool for memory.
The observed data are consistent with several possible cognitive systems for nonverbal numerical representations that could support these patterns of performance. One explanation is that CP-knowers have better numerical acuity (e.g., Chu & Geary, 2015; Shusterman et al., 2016; Wagner & Johnson, 2011), which may help them more accurately encode and reason about the larger set sizes in the nonverbal tasks. This explanation accords with the data in some ways. CP-knowers’ responses in the Caterpillar Game demonstrated two characteristics of representations in the ANS: They were approximately but not exactly correct, and their responses exhibited scalar variability (i.e., increasing standard deviations for larger targets). Thus, change in numerical acuity in the ANS could underlie changes in nonverbal numerical problem-solving in the Caterpillar Game.
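Scalar variability can be stated formally as follows. This is only a conventional sketch, and the notation is an assumption introduced for illustration rather than taken from the analyses reported here: $n$ is the target set size, $\hat{n}$ is a child’s response, and $w$ is a proportionality constant of the kind often interpreted as a Weber fraction. The value of $w$ in the closing comment is hypothetical and is not an estimate from these data.

```latex
% Scalar variability: mean response tracks the target, and the
% standard deviation of responses grows in proportion to the target.
\mathbb{E}[\hat{n} \mid n] \approx n,
\qquad
\mathrm{SD}[\hat{n} \mid n] = w\,n
\;\Longrightarrow\;
\mathrm{CV}(n) = \frac{\mathrm{SD}[\hat{n} \mid n]}{\mathbb{E}[\hat{n} \mid n]} \approx w
% Hypothetical illustration with w = 0.2:
%   n = 5  gives  SD = 1.0
%   n = 9  gives  SD = 1.8
```

Under this formulation, a roughly constant coefficient of variation across targets is the signature of scalar variability, whereas exact enumeration would predict little or no growth in variability with set size.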
However, ANS acuity is an unlikely explanation of the CP-knowers’ superior performance on the Mr. Elephant Game: The most difficult trials required children to distinguish between outcomes of six and seven balls coming out on the seven-ball trials. This ratio of approximately 1.17 (7:6) is considered a very difficult discrimination ratio on other tests of numerical acuity in this age group (e.g., Halberda & Feigenson, 2008). It would therefore be surprising if differences in ANS acuity between CP- and SS-knowers drove differential performance on that task.
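The arithmetic behind the discrimination ratio for the hardest Mr. Elephant trials can be written out explicitly. The symbol $r$ is introduced here only for illustration.

```latex
% Ratio of the larger to the smaller outcome on the seven-ball trials
r = \frac{7}{6} = 1.1\overline{6}
% i.e., 1.16 when truncated and 1.17 when rounded to two decimal places
```

A ratio this close to 1 means the two outcomes differ by only about 17% of the smaller quantity, which is why approximate representations alone would be expected to support near-chance discrimination at this age.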
Differences in PI representations could also plausibly underlie the different performance between CP- and SS-knowers. In the Caterpillar Game, because the “feet” were distributed across the two sides of the caterpillar “body,” children could solve the problem either by noting the total set size or by noting the set size on each side (e.g., four on one side and five on the other on the nine-footed caterpillar). CP-knowers might have been more likely than SS-knowers to use a chunking strategy, noting the set on each side and combining them in their final response.
The distribution of the feet on either side of the caterpillar might also have affected SS-knowers’ ability to represent the total numerosity presented. Importantly, it is not the case that SS-knowers simply attended to one side and ignored the other; if they had ignored one side, their responses on the large-number sets would have ranged from three (one side of the 6-footed caterpillar) to five (the largest number on one side on any trial, on the 9-footed caterpillar). This was not the case: SS-knowers responded with sets larger than five socks on many of the large-number trials, suggesting that they were oriented to the totality of the presented set of feet.
In addition to potentially using the two sides to help break down the task into two ‘chunks’, CP-knowers might also have been better able to maintain exact representations of the set size on each side. Some research suggests that the set-size limit on PI increases through the preschool years, from a limit of three in 3-year-olds to four or five in 5-year-olds (Starkey & Cooper, 1995). This increased capacity could plausibly support children’s performance with the seven- and nine-footed caterpillars, where children would have to remember a set of four or five on each side; a less mature system, with a limit of only three objects, would limit children’s performance with these larger set sizes. Thus, a small increase in the set-size limit of the PI system could support the observed pattern in the Caterpillar Game.
The results from the Mr. Elephant Game are also consistent with this explanation: CP-knowers were much more likely to correctly assess both five-ball trials, demonstrating a true understanding of five discrete objects. SS-knowers, on the other hand, performed at chance for the sets of five. These observations strengthen the possibility that there is a relationship between an increase in the set-size limit of the PI system (to five) and the timing of children’s acquisition of the cardinal principle. Because recent work has focused heavily on the relationship between ANS acuity and symbolic mathematics, developmental change in the PI system has received comparatively little attention, but it may be important for children’s emerging number concepts.
Although it is not considered a cognitive ‘system’ per se, another possible explanation for the CP-knowers’ advantage is enhanced spontaneous focus on numerosity or SFON, a child’s tendency to engage with quantities and number in her environment (Rathé et al., 2016). Children who intuitively pay more attention to numerosity in SFON tasks might enter more quickly into skillful counting or might be more motivated to practice counting (Hannula, Räsänen, & Lehtinen, 2007). Individual differences in SFON are consistently related to children’s counting skills such as subitizing in the small-number range and rote counting in the large-number range (Batchelor, Inglis, & Gilmore, 2015; Edens & Potter, 2013; Hannula & Lehtinen, 2005; Hannula et al., 2007); furthermore, SFON in 3.5-year-old children is related to higher subsequent mathematical knowledge at ages 5 and 12 (Hannula & Lehtinen, 2005; Hannula-Sormunen, Lehtinen, & Räsänen, 2015). However, the current study differs from previous SFON studies in several important ways. Previous SFON studies have not distinguished performance between children who fully understand cardinality past the number “four” from those who do not. Counting ability is typically evaluated in SFON studies by procedural knowledge (e.g., Hannula et al., 2007) rather than a generalized understanding of cardinality. Additionally, the SFON tasks themselves typically focus on smaller set sizes (2, 3, or 4) than the ones used here (e.g., Hannula et al., 2007). In some studies, the participants are slightly older and may have already acquired cardinality (e.g., Batchelor et al., 2015). Finally, few SFON studies take place in the cultural context of the U.S., where children’s early number experiences may meaningfully differ. In short, development of theory related to SFON has enabled novel ways of thinking about the relationship between children’s nonverbal enumeration and their verbal counting skills.
The current study is the first to explore this relationship between language and thought using the SFON approach with numbers beyond the subitizing range, and to emphasize the conceptual rather than procedural development of counting and cardinality.
Notably, one recent study provides compelling evidence that the sequential-enumeration tasks used in some SFON studies likely draw on ANS representations (Sella, Berteletti, Lucangeli, & Zorzi, 2016). Nevertheless, SFON tasks should not be taken as a measure of ANS acuity: When guided, ‘low-SFON’ children can perform these nonverbal tasks at the same level as ‘high-SFON’ children (Hannula & Lehtinen, 2005), showing that SFON reflects children’s self-guided attention to numerosity, not their underlying competence in numerical discrimination.
The current findings lend support to arguments for a qualitative shift, or conceptual change, between SS-knowers and CP-knowers. Previous research on this transition has focused on children’s conceptual change in terms of their construction of a novel representation for numbers, namely a meaningful count list (Carey, 2010; Sarnecka & Carey, 2008; Slusser & Sarnecka, 2011; Wynn, 1990, 1992). Our findings corroborate the qualitative change by showing that the acquisition of meaningful counting and cardinality is accompanied by additional changes in nonverbal reasoning.
Some previous researchers have argued that there is no semantic induction when children become CP-knowers, based on the fact that CP-knowers fail to answer many questions about higher numbers within their count list (Davidson, Eng, & Barner, 2012). They posit that these limitations in CP-knowers’ knowledge mean that there is no conceptual change, and no radical discontinuity between numerical representations in SS-knowers and those in CP-knowers. The current data argue against this perspective because they highlight a clear discontinuity in nonverbal representations that correlates with the discontinuity in number language. Additional evidence for a discontinuity between SS- and CP-knowers comes from Sarnecka and Wright (2013), who demonstrated that CP-knowers, but not SS-knowers, understand the principle of equinumerosity, and from Shusterman et al. (2016), who demonstrated in a longitudinal study that children’s numerical acuity on a nonverbal dot-discrimination task increases right around the moment when they become CP-knowers. Drawing on all of these findings, we conclude that the transition to CP-knower status does indeed represent a major conceptual shift, and that the induction of cardinality is related to a broad suite of changes in children’s representation of quantity, including a dramatic and abrupt increase in its precision.
Of course, there are limits to what children learn when they acquire the cardinal principle. Davidson et al. (2012) convincingly demonstrate that CP-knowers do not have stable number meanings ‘as high as they can count,’ contrary to the original suggestions of Carey, Sarnecka and others (Sarnecka & Carey, 2008) that children ‘bootstrap’ the meanings of all of the numbers within their count list when they become CP-knowers. Furthermore, children slowly refine the mappings between numerals and representations of quantities in the ANS (Le Corre & Carey, 2007). The acquisition of cardinality, then, might best be characterized as a ‘limited semantic induction’ but nevertheless a robust conceptual change. The CP-induction is important because it is in this moment that children acquire their first meanings for higher number words beyond the range of parallel individuation; because they recognize in a new way (i.e., in a way not available to SS-knowers) how the structure of the count list imbues numbers with their meanings; and because, as we show here, they exhibit a corresponding shift in the precision of nonverbal representations for those quantities.
An additional parallel between verbal and nonverbal number concepts comes from the 20-footed caterpillar in Experiment 2. On the 20-footed caterpillar, neither group performed very accurately, and CP-knowers responded only marginally more accurately than SS-knowers. This pattern on the nonverbal task parallels the low verbal knowledge of “twenty” in both groups. “Twenty” is essentially an ‘unknown’ number for some CP-knowers and most SS-knowers: Although they may produce it, they often cannot reliably count to it using stable order (Shusterman & Berkowitz, 2011), and they often cannot generate sets of more than 10 objects in Give-N (Shusterman, Cheung, Sarbh, & Taggart, 2015). CP- and SS-knowers’ limited verbal meaning for 20 (perhaps as something vague like “a lot”) may be related to their poorer performance with this set size on the nonverbal task. Lower familiarity, distinctiveness, or acquired meaning of large number words like “twenty” (and like “five” for SS-knowers) therefore appears to be associated with less accurate performance on nonverbal tasks in which these quantities need to be represented in memory. Even after becoming CP-knowers, children clearly need to learn more about and become more familiar with higher number words. An open question is whether further development of number language, beyond cardinality, correlates with more precise numerical representations in nonverbal tasks.
Finally, we note that this study is correlational, and therefore cannot in itself address causality: Change in core nonverbal number representations may support number language; number language may induce change in cognitive systems involved in nonverbal numerical reasoning; or these changes may co-occur as part of an interrelated set of conceptual shifts. The cross-cultural findings with innumerate people imply that language causally changes the way in which quantities can be represented, and that in the absence of acquiring such language, conceptual representations are not pushed to change. Studies on the development of number concepts under conditions of accelerated language (e.g., with number training) or delayed language (e.g., due to limited access to a native language) will help to tease apart these possibilities.
To summarize, children’s acquisition of exact cardinal meanings for large numbers (beyond the PI range) correlates with their performance on numerical problem-solving tasks that require remembering and matching large, exact quantities. In particular, responses to target sets between five and nine were less accurate and more variable in children who had not induced cardinality than in those who had. In contrast, smaller set sizes were handled easily by all children, regardless of knower-level. Performance on the nonverbal tasks did not vary across SS-knower levels or with verbal estimation skill, and counting behavior did not account for the difference between SS- and CP-knowers. Our findings accord with reports of close links between symbolic and nonsymbolic mathematical competence in children (Libertus et al., 2011; Shusterman et al., 2016) and adults (Frank et al., 2008), and extend this conclusion by demonstrating the same result with two engaging tasks with low task demands.
These findings thus bring some clarity to a previously conflicted body of literature by showing a distinct relationship between the acquisition of number meanings larger than “four” and nonverbal problem-solving with quantities larger than four. This pattern of results helps to explain why previous studies, which lacked the specific comparison between SS- and CP-knowers on large numbers, did not find hypothesized relationships between verbal and nonverbal number knowledge. These findings open up a new set of questions regarding which nonverbal cognitive skills and causal mechanisms underlie the tight link observed between number language and number thought.