Introduction
Research on mathematical cognition has made substantial advances in recent years, in areas that span diverse theoretical frameworks both within and across disciplines. For instance, neuroscience research has revealed potential mechanisms contributing to the representation of number and related concepts in the brain (De Smedt, Noël, Gilmore, & Ansari, 2013). Behavioural research has identified distinctive characteristics in the mathematical thinking of young children with atypical development (Dennis, Berch, & Mazzocco, 2009; Geary, 2010; Kaufmann et al., 2013), individual differences in typical mathematics learning (Geary, 2011; Raghubar, Barnes, & Hecht, 2010), and the nature and variety of mathematical expertise (Weber, Inglis, & Mejía-Ramos, 2014). Studies on the effects of the environment, including home (LeFevre et al., 2009), school (Beilock, Gunderson, Ramirez, & Levine, 2010; Clements & Sarama, 2004), and culture (e.g., Jones, Inglis, Gilmore, & Dowens, 2012), have also influenced theoretical conceptualisations of mathematical learning and contributed to the design of interventions to support that learning (Cohen Kadosh, Dowker, Heine, Kaufmann, & Kucian, 2013).
Collectively, this work could have substantial international policy implications, and indeed specific research programmes have led to large-scale trials of teaching practices and materials development (Torgerson, Wiggins, Torgerson, Ainsworth, & Hewitt, 2013). However, the potential of the field as a whole has yet to be fully realised, perhaps in part due to its diverse nature. This diversity affords the benefits that accompany multiple perspectives, but also generates obstacles resulting from discrepant methodologies. So, while mathematical cognition researchers share central concerns – the desire to understand and support mathematical thinking and learning at all educational levels – they have different priorities, methods and conceptions of central theoretical constructs, and their work is published across a diverse range of outlets. Arguably this is an appropriate time to draw together current thinking from these fields and to articulate a mutually understood view of research priorities that can inform and shape future research in mathematical cognition.
This paper is designed to do that. It reports the results of an exercise that brought together 16 researchers from six countries with expertise in mathematics education, psychology and cognitive neuroscience, and that sought to identify key open questions in mathematical cognition. The questions were generated using a collaborative process modelled on previous exercises in other fields (Sutherland et al., 2006, 2012; Sutherland, Fleishman, Mascia, Pretty, & Rudd, 2011), with adaptations appropriate to the comparatively focused research topic and small number of participants. We followed a six-stage process to generate an initial list of questions, select questions that participants collectively viewed as most important, refine these so as to meet a list of inclusion criteria, and organise them in a form conducive to the discussion and planning of new and ongoing research programmes (for details see the Method section). From these questions, practical suggestions evolved regarding how to maximise the field’s ability to share both work and findings across research laboratories and disciplines.
The final question list, as presented in the Results section, is narrow in the sense that it is specifically focused on research related to the cognitive aspects of human mathematical thinking and learning. But the list is broad in that it reflects priorities ranging from issues of representation in the brain, through the mathematical thinking and learning of very young and older children, up to and including the practices and reasoning of expert mathematicians. We intend the questions to support greater coherence in both investigation and reporting of issues in mathematical cognition, and thus to build a stronger base of information for consideration by researchers and policymakers. We hope that the questions will inspire researchers in the relevant fields to likewise address big challenges in mathematical cognition and, in the process, to develop methods that promote interconnected theories.
Background
Researchers with interests in mathematical cognition represent a variety of disciplines.
Those working in mathematics education have often trained initially as mathematicians or teachers; some have moved from an undergraduate or masters experience in mathematics straight through to a doctorate in education, but many have moved into educational research later in their careers after teaching in schools, working to prepare teachers, or working as mathematicians. They typically publish in outlets such as the Journal for Research in Mathematics Education^{i}, Educational Studies in Mathematics and the Journal of Mathematical Behavior, and share work at conferences such as that of the International Group for the Psychology of Mathematics Education. Much work in all of these outlets comprises detailed qualitative studies (including case studies) on students’ understandings of specific mathematical concepts, from those in early number right up through undergraduate study and into work on the reasoning of mathematical experts. Other work takes the form of design research (Prediger, Gravemeijer, & Confrey, 2015), and experimental studies are comparatively rare (Alcock, Gilmore, & Inglis, 2013).
Those working in psychology have usually trained initially in developmental, experimental, or cognitive psychology; some pursue research in mathematical cognition due to an interest in mathematics per se, but others are interested in more general cognitive functioning and use mathematical ideas primarily as a vehicle to study this. These researchers typically publish in journals including Cognition, Child Development, Developmental Science, Journal of Experimental Child Psychology, Journal of Educational Psychology and Learning and Instruction, and share work at conferences such as that of the European Association for Research on Learning and Instruction and the Society for Research in Child Development. Unlike those with backgrounds in education, they frequently conduct experimental or quasi-experimental studies, but the mathematical concepts involved are often based in simple arithmetic and its precursors.
Those working in cognitive neuroscience have often trained initially in experimental or cognitive psychology, cognitive science or neuroscience. Most have moved into developmental cognitive neuroscience after doctoral training in adult cognitive neuroscience and/or developmental disorders. These researchers typically publish in journals such as the Journal of Cognitive Neuroscience, Neuropsychologia and Trends in Neuroscience and Education, and share work at conferences such as the Annual Meeting of the Cognitive Neuroscience Society and the Conference of the European Society for Cognitive Psychology. Typical studies in this area include measures of brain activity during number processing and arithmetic; often the work relates these to behavioural measures. Central research questions involve investigating the brain correlates of number processing and arithmetic, how these change with development, skill acquisition and training, and their role in dyscalculia.
Crucially for the development of research in mathematical cognition, there has traditionally been little communication between researchers working in these areas; cross-citations (with education in particular) are comparatively rare. If this is a fault then it is a common and predictable one: researchers across all disciplines publish their work primarily in those journals that most closely reflect their influences and methods, so it is natural that citation silos should evolve. But this clearly limits an interdisciplinary endeavour: insights from one area might well inform, accelerate, or supplant theoretical developments in another, and methodological diversity should be embraced as a means of developing both different types of insight and individual researcher skills. For these reasons we sought to assemble researchers with a diverse range of interests in human mathematical cognition, and to implement a process that would focus their attention on their common goals, clarify similarities and differences in theories and methods, and facilitate future communication. In communicating the outcomes of this process in the form of a paper, we aim to further these goals by stimulating a similar debate across the broader mathematical cognition community. We believe that the time is right for this because a collective desire for communication is evident in the establishment of interdisciplinary societies (e.g., the International Mind, Brain, and Education Society) and journals like this one, and because one way to maximise the gains from structured crosstalk is to focus on the big questions we would all like to see answered. We return to this point in the Discussion section of this paper.
Method
We modelled our method on that described in Sutherland et al. (2012), who pursued a Delphi-like process (Hsu & Sandford, 2007) to identify a list of unanswered questions on the relationship between science and policy; Sutherland has also been involved in similar exercises in ecology and conservation (e.g., Sutherland et al., 2006)^{ii}. We were conscious throughout that, as Sutherland et al. (2011, p. 246) note, “[a]ny priority-setting exercise is the product of the people who participate. The results can be influenced by the interests present. Furthermore, individuals may have agendas. However, a diverse and moderately large group, clear criteria, and a democratic process all help reduce the impact of any individual.” In this section we explain how we addressed these issues.
Compared with fields like science policy, our research domain is small. But we aimed nevertheless to act in the spirit of Sutherland et al.’s (2012) approach, seeking participants who would represent a range of disciplines and who would ideally have experience or at least knowledge of the theories and methods typically in use in two or more areas. Because resources were limited and because we were interested in drawing together a community of researchers whose work could be expected to influence the field for an extended time, we sought particularly to invite early- to mid-career academics with research programmes that were established enough to have a high probability of making substantial further contributions in the coming decades. With these goals in mind, we invited 16 scholars from the disciplines of mathematics education, psychology and cognitive neuroscience to take part (the 16 are the authors of this paper); 15 attended as delegates at the Royal Society-funded conference Grand Challenges in Mathematical Cognition^{iii} and one participated in all the non-conference stages and had a particular role in organising the questions for presentation. Table 1 lists the participants’ countries of affiliation, main research areas, and specific illustrative research foci.
Table 1
Participant  Country  Edu  Psych  Neuro  Illustrative Research Foci 

Lara Alcock  UK  X  
Daniel Ansari  Canada  X  X  
Sophie Batchelor  UK  X  X  
Marie-Josée Bisson  UK  X  X  
Bert De Smedt  Belgium  X  X  X  
Camilla Gilmore  UK  X  
Silke M. Göbel  UK  X  X  
Minna Hannula-Sormunen  Finland  X  X  
Jeremy Hodgen  UK  X  
Matthew Inglis  UK  X  X  
Ian Jones  UK  X  
Michèle Mazzocco  USA  X  X  
Nicole McNeil  USA  X  
Michael Schneider  Germany  X  
Victoria Simms  UK  X  
Keith Weber  USA  X 
Note. Edu = Education; Psych = Psychology; Neuro = Neuroscience.
With the participants in place, the questions that form the core of this paper were generated in six stages: Stages 1 and 2 were completed online, Stages 3 to 5 occurred at the conference and Stage 6 took place after the meeting.
In Stage 1, all participants were invited to draw up a list of research questions for which answers would significantly advance our understanding of mathematical cognition. They were encouraged to draw on their own work and that of their broader discipline, as well as their knowledge of classic problems or recent developments; they were also encouraged to solicit input from colleagues and contacts. The team of organisers collated these questions, amalgamated or removed duplicates, and organised the resulting 155 questions into themes. In Stage 2, these questions were anonymised and made available in an online survey via which all participants were invited to rate all questions on a scale of 1 (low) to 10 (high) according to their importance for advancing the field. (The original questions and original and summarised rankings are available as Supplementary Material.) After the survey, the 63 questions with a median score of 8 or higher were retained for the conference stages.
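The Stage 2 retention rule can be illustrated with a minimal Python sketch. The question identifiers and ratings below are hypothetical, invented purely for illustration; only the 1 to 10 scale and the median threshold of 8 come from the procedure described above. This is a sketch of the rule, not the tooling used in the exercise.

```python
from statistics import median


def retain_by_median(ratings, threshold=8):
    """Return the IDs of questions whose median rating is >= threshold, sorted."""
    return sorted(
        question
        for question, scores in ratings.items()
        if median(scores) >= threshold
    )


# Hypothetical ratings (1 = low, 10 = high), one score per participant.
example_ratings = {
    "Q1": [9, 8, 7, 10, 8],  # median 8 -> retained
    "Q2": [5, 6, 7, 8, 4],   # median 6 -> dropped
    "Q3": [8, 8, 9, 7, 6],   # median 8 -> retained
}

retained = retain_by_median(example_ratings)
print(retained)
```

Using the median rather than the mean means that a single very low or very high rating from one participant has limited influence on whether a question is retained.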
At the conference, Stages 3, 4 and 5 of the question winnowing and refinement process were completed. Delegates were asked to base their question selection and refinement on criteria drawn from Sutherland et al. (2011) stating that the eventual questions should:

address an important gap in knowledge;

be formulated specifically (not as a general topic area);

be clear, where appropriate, about specific interventions and outcome measures;

be answerable specifically (not by ‘it all depends’);

have a factual answer that does not depend on value judgements;

be answerable through a realistic research design;

be of a scope that could reasonably be addressed by a research team.
For Stage 3, delegates were assigned to subgroups of three or four participants representing the variety of research backgrounds present across the assembly of 16. Sufficient time was provided for members of the mixed groups to discuss all the questions, noting ambiguities and suggesting initial rephrasings. For Stage 4, delegates were assigned to topic groups composed of four or five participants, where the topics were named numerical representations and developmental dyscalculia, cognitive factors beyond basic representations, and beyond basic numeracy. The topic groups were tasked with reducing their respective question lists by half, and with rewriting questions where appropriate to reduce ambiguity or duplication and to ensure that the questions met the aforementioned inclusion criteria. For Stage 5, all 15 participants reconvened and reviewed the remaining lists. At this stage further refinements to wording were suggested and questions that appeared essentially equivalent across two of the lists were combined. We note that the discussions were extremely constructive throughout, and we found that our conceptions of key constructs and issues in learning were remarkably similar. Resolving differences was therefore largely a matter of settling on terminology that did not evoke unintended ideas or assumptions for any one group.
Stage 6 took place after the conference. We were conscious that the assignment of delegates to topic groups necessarily stratified the questions by the educational stage of their typical research participants. Thus the question lists were merged and were sent to the non-attending participant for sorting into smaller groups without this restriction. The outcome grouped the questions in a way that focused on conceptual relationships rather than educational stage; these groups of questions appear in the Results section.
Results: Research Questions
We present the results of the exercise in two sections. In the first section, we list the groups of questions that emerged from Stage 6 (as described above); each group (labelled A–F) is accompanied by a brief commentary. We present the questions in generic forms, rather than from the perspectives of any specific theoretical frameworks, because each might profitably be addressed using multiple frameworks (as is reflected in the field). Where there have been recent developments in competing theories, we endeavour to draw attention to these. In the second section, we take two subgroups of questions – those on the nature of mathematical thinking and those on developmental trajectories and their interactions – and offer more detailed critical review of recent developments in these areas. We take this approach because it enables us not only to illustrate the range of work taking place across research in mathematical cognition but also to discuss the relative maturity of different parts of the field, thereby highlighting the shifting theoretical and methodological demands that arise as research areas develop.
A. Elucidating the Nature of Mathematical Thinking

Do infants really have a ‘sense of number’ or are they merely sensitive to quantitative dimensions?

How do feelings of correctness or doubt arise when people are doing mathematics, and how do they influence mathematical reasoning and response to instruction?

What strategies do experts and students use when evaluating whether a mathematical assertion is true or false?

What mathematical cognition do experts use in their practice and what methods can researchers use to provide an accurate portrayal of this?
These questions reflect the ambitious long-term goal of understanding mathematical cognition from infancy to mature expertise. Collectively they address the contribution of nativist theories in proposing foundational “number sense” capacities seen in nearly all human infants (e.g., Feigenson, Dehaene, & Spelke, 2004; Geary, 2007), the mature state of refined formal cognitive capabilities associated with highly successful mathematicians (Weber, Inglis, & Mejía-Ramos, 2014), and the as-yet unexplained mechanisms that link the two. Although individual research studies necessarily focus on a relatively narrow part of this spectrum, participants in the exercise recognised the importance of working towards an empirically evidenced understanding of mathematical cognition that accurately captures effective mathematical reasoning, that provides reliable ways of diagnosing atypical development, and that is specific enough to be leveraged for the design of interventions. In the Critical Review section to follow, we discuss the methodological challenges associated with studying expertise in particular.
B. Mapping Predictors and Processes of Competence Development

How do children learn the meaning of non-iconic representations of number?

What are the key mathematical concepts and skills that children should have in place prior to the start of compulsory education?

Do causal mechanisms underlie the correlational evidence between domain-specific foundational competencies (e.g., the Approximate Number System) and mathematics performance? What are reliable early and later longitudinal predictors of the development of number skills, arithmetic and other aspects of mathematics? These might include:

What are the interactions amongst these predictors?

What are the directions of associations?
These questions are motivated by concerns about the large proportion of individuals of all ages whose achievement in mathematics is considered inadequate relative to standards or criteria specified by governments or related agencies. Compared with the questions in Set A, they are more specific in several ways. First, they explicitly acknowledge that both typical and atypical mathematical development might be influenced by a wide range of environmental and affective factors, as well as by domain-specific and domain-general cognitive competences. As such, it is necessary to consider the full range of influences (and the ways in which we measure these) because such factors might interact in complex ways. Moreover, the nature of these interactions might vary across periods of development. This issue becomes more salient as apparent inconsistencies across research studies come to light, as illustrated by studies of the Approximate Number System (ANS) and formal numerical skills (De Smedt, Noël, Gilmore, & Ansari, 2013; Gilmore et al., 2013). The time is ripe for studies designed to disentangle the effects of different predictors and to map patterns of development, and such studies are beginning to be reported (Fuchs et al., 2010; Göbel, Watson, Lervåg, & Hulme, 2014; Klibanoff, Levine, Huttenlocher, Vasilyeva, & Hedges, 2006; Mazzocco, Feigenson, & Halberda, 2011).
C. Charting Developmental Trajectories and Their Interactions

Are there alternative cognitive pathways by which one can be a successful mathematics student? If yes, what does this imply for typical research methods used by mathematical cognition researchers?

How are different mathematical skills (including representing number, counting, performing arithmetic, using fractions) and their developmental trajectories related to each other?

Is developmental dyscalculia qualitatively different from arithmetic performance at the lower end of the normal distribution?

As we build causal models of pathways to mathematics learning and thinking, what models emerge for compensatory mechanisms – cognitive skills or abilities that allow individuals to overcome shortcomings in foundational skills, and that contribute to patterns of behaviours?
The questions in Set B acknowledge the continuum of typical and atypical development, but there is no reason to presume that pathways to mathematical understanding are homogeneous among all children whose path is considered either “typical” or “atypical”; in both scenarios, individual differences are likely (Geary, 2011; Kaufmann et al., 2013, respectively). The questions in Set C thus consider how research might capture variations in mathematical learning. They are particularly pertinent for researchers considering cognition about mathematical concepts beyond basic arithmetic because, earlier points notwithstanding, early number and basic arithmetic have been comparatively well studied. This is partly because arithmetic concepts provide a clean context for setting and evaluating performance on a restricted set of problems with definite right and wrong answers, meaning that these concepts provide a sensible setting for investigating more general cognitive processes (e.g., strategy use, for which see Siegler, 1988; Wu et al., 2008; gesture, for which see Goldin-Meadow, Wagner Cook, & Mitchell, 2009). They are also amenable to investigation in the sense that research has identified a fairly restricted number of strategies that a student might use to solve, for instance, a single-digit addition problem. But mathematics as a whole has a hierarchical structure that builds rapidly through many levels of abstraction. At every level, it might be possible to explain learning not only in terms of what students lack – mature working memory systems, proficiency with basic arithmetic facts – but also in terms of what they already know: previous knowledge can interfere with the learning of higher-level ideas (e.g., McNeil, 2014). Researchers have developed pockets of knowledge about how this might play out (e.g., the natural number bias as discussed by Van Hoof, Lijnen, Verschaffel, & Van Dooren, 2013). But knowledge development is currently understood only in single steps in which a particular existing conception is known to influence interpretations of a new concept – we know very little about longitudinal development in mathematical cognition, and the questions in this section reflect this.
D. Fostering Conceptual Understanding and Procedural Skill

How can educators effectively capitalise on informal mathematical knowledge when teaching mathematical concepts?

How can instruction help students to identify normatively valued similarities between mathematical domains, problems, representations, and situations?

How can we help children develop fluency with basic facts and skills while still promoting understanding of the underlying concepts?

How can we teach/learn so that useful mathematical knowledge gets automated, while also promoting understanding of the underlying concepts?

For specific concepts, which students profit from different sequences of exposure to more and less concrete representations?
Mathematical thinking does not emerge in a vacuum. By definition, emergence of formal mathematics requires instructional input, but what form should this instruction take? The questions in Set D sit at the heart of a distinction that causes consternation and controversy. Students at all levels must develop procedural fluency so that they can efficiently execute standard manipulations, and conceptual understanding so that they can identify and work effectively with mathematical structures that arise in a variety of situations (Kilpatrick, Swafford, & Findell, 2001). The fundamental distinction between the two is conceptualised in various guises and terminologies in both education (Skemp, 1976; Star, 2005) and psychology (Rittle-Johnson & Schneider, 2015), and the apparent tension manifests itself most dramatically in the so-called “math wars” (Schoenfeld, 2004), which contrast “back to basics” movements advocating drill and practice – intended to develop factual knowledge and procedural fluency – with programmes designed to foster conceptual understanding by providing mathematical learning situations that are personally meaningful to the student (Gravemeijer, 1994; Van den Heuvel-Panhuizen, 2003). A similar tension also appears in a more moderate guise in programmes designed to test the relative efficacy of learning via concrete or abstract representations (De Bock, Deprez, Van Dooren, Roelens, & Verschaffel, 2011; Kaminski, Sloutsky, & Heckler, 2008). If one of these approaches were truly superior in a straightforward way then we would surely by now have conclusive evidence of this, so it seems likely that progress will be made by asking not about the extremes of the dichotomy but about the specific implementation and combination of different types of learning experience. As indicated by the questions in this section, there is much potential for developing and testing specific instructional sequences.
E. Designing Effective Interventions

Which domain-specific foundational competencies are most malleable and when in developmental time? And does their malleability impact on other aspects of mathematical performance?

What are the features (including content of intervention or instruction and characteristics of children) of current successful interventions and instruction?

What concepts and skills need to be in place for success in algebra, how are these treated in different curricula, and what learning outcomes result if these treatments and their sequencing are experimentally compared?

What are the most effective interventions for children with dyscalculia, when is the best time for intervention, and which factors best predict the response to intervention?
These questions address issues already raised, but in a more practical way. They focus on the nuances of applied settings, recognising that the effectiveness of any given intervention might depend upon characteristics of the individuals for whom it is intended. The questions also highlight the two-way relationship between theoretical development and experimental testing: by interceding with an intervention based on theoretical propositions about the causes of mathematical errors and difficulties, researchers test both the intervention and the underlying theoretical propositions. It is therefore encouraging that effective interventions do exist and have been tested at scale. Some of these have been designed to allow for individual differences among students who are falling seriously behind, by tailoring research-based interventions to individuals after detailed diagnoses (Holmes & Dowker, 2013; Torgerson et al., 2013). Others have been designed to change instruction in ordinary classrooms by reducing unhelpful regularities in teaching materials and by deliberately designing activities that have demonstrable positive effects (McNeil, Fyfe, & Dunwiddie, 2015). Such interventions so far have been primarily directed at comparatively young children, which is natural due to levels of research-based knowledge and to serious concerns about the cultural and economic implications of children falling behind at early educational stages. But the forms of such interventions, when combined with localised knowledge about typical understandings of a broader range of concepts, have potential for improving instruction at higher levels too.
F. Developing Valid and Reliable Measures

Are there reliable and valid methods for measuring ANS acuity that can be used to track development over an extended period of time (infancy – childhood – adulthood)?

How can we measure informal numeracy experiences validly and reliably?

How can we measure the variability in children’s mathematical experiences outside of school?

How can we develop reliable and valid measures of understanding of key mathematical concepts?
These questions capture issues that pervade the entire list: to obtain convincing evidence within and across these questions, we must give appropriate attention to our measures of key predictors and outcomes. In some parts of the literature, debates about these issues are both obvious and detailed. For example, in the substantial body of work on the ANS, methods have become an overt subject of debate as comparisons across research studies reveal serious inconsistencies (Clayton, Gilmore, & Inglis, 2015; De Smedt et al., 2013). In other areas, the discussion is less specific: for instance, methods have been advanced for measuring constructs such as understanding in relation to certain concepts, but there is a lack of agreement over what constitutes evidence of conceptual understanding (Crooks & Alibali, 2014). At higher levels, with exceptions such as the Calculus Concept Inventory (Epstein, 2013), there are few posited standard measures of mathematical knowledge and understanding. The listed questions reflect the need for valid and reliable measures of understanding and performance across all levels of mathematics. Attention to multiple methods, to their details and differences, is certainly a positive step.
Critical Reviews
In this section we extend the preceding commentaries for those questions from two sections: Section A (the nature of mathematical thinking) and Section C (developmental trajectories and their interactions).
Elucidating the Nature of Mathematical Thinking
Recall that the questions in this section are:

Do infants really have a ‘sense of number’ or are they merely sensitive to quantitative dimensions?

How do feelings of correctness or doubt arise when people are doing mathematics, and how do they influence mathematical reasoning and response to instruction?

What strategies do experts and students use when evaluating whether a mathematical assertion is true or false?

What mathematical cognition do experts use in their practice and what methods can researchers use to provide an accurate portrayal of this?
By their nature these questions present serious methodological challenges. A wealth of data suggests that infants have an innate ability to process quantitative information (Izard, Sann, Spelke, & Streri, 2009); habituation and preferential-looking studies indicate that infants can discriminate between sets of small (Starkey & Cooper, 1980) or large quantities (Xu & Spelke, 2000; but see Clearfield & Mix, 2001, for an alternative perspective). It has been argued that these numerical discrimination abilities are guided by two separate systems, Parallel Individuation for small quantities and the ANS for large quantities (Feigenson et al., 2004); both systems are believed to be multimodal in the sense that they aid the discrimination of visual, auditory and action-based quantitative stimuli (Carey, 2009; Lipton & Spelke, 2004). But a variety of additional characteristics can vary with quantity: for visual stimuli, for instance, increases in quantity generally co-occur with increases in surface area or contour length (Mix & Cheng, 2012). Although researchers may attempt to control for these extraneous perceptual variables, they cannot all be controlled simultaneously (see Cantrell & Smith, 2013, for an extensive review). Therefore, previous findings may be interpreted as indicating that infants are sensitive to changes in a variety of general magnitude dimensions, such as duration, length, convex hull, and area, in addition to quantity (Gebuis & Reynvoet, 2012; Newcombe, Levine, & Mix, 2015). The question of which is the most accurate interpretation has yet to be resolved.
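The difficulty of controlling all perceptual variables at once can be illustrated with a small worked example, sketched here in Python under simplifying assumptions that are ours rather than those of the cited studies: stimuli are arrays of equal-sized, non-overlapping circular dots, and only total surface area and total contour length are considered. The function names and values are hypothetical. If total area is held constant while numerosity increases, total contour length must grow in proportion to the square root of numerosity, so the two cannot both be equated across sets.

```python
import math


def dot_array_properties(n, radius):
    """Total surface area and total contour length of n equal, non-overlapping dots."""
    total_area = n * math.pi * radius ** 2
    total_contour = n * 2 * math.pi * radius
    return total_area, total_contour


def radius_for_fixed_area(n, total_area):
    """Dot radius that gives n equal dots the requested total surface area."""
    return math.sqrt(total_area / (n * math.pi))


# Hold total surface area constant (arbitrary units) across two numerosities.
fixed_area = 100.0
for n in (8, 16):
    radius = radius_for_fixed_area(n, fixed_area)
    area, contour = dot_array_properties(n, radius)
    print(f"n={n}: area={area:.2f}, contour={contour:.2f}")
```

Doubling the number of dots at constant total area multiplies the total contour length by the square root of two, so an apparent response to number in such a design could equally reflect a response to contour length.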
Regarding more complex mathematics we know even less. This is important because in mathematics education there is a large movement for students to engage in authentic mathematical practices – to learn to approach problems in a manner epistemologically consistent with the thinking of professional mathematicians (e.g., Common Core State Standards Initiative, 2012; Lampert, 1990; National Council of Teachers of Mathematics, 2000; Sfard, 1998). Designing for such learning requires an understanding of how mathematical experts complete mathematical tasks and how professional mathematicians practise their craft.
In particular, a desired outcome is for students to learn to solve nonroutine problems (e.g., Common Core State Standards Initiative, 2012; National Council of Teachers of Mathematics, 2000; Schoenfeld, 1985), a process that often involves making conjectures and estimating their veracity. Such activity likely involves many behaviours and competences, one of which is constructing and interpreting mathematical representations. Both analysis of expert protocols (Koedinger & Anderson, 1990; Schoenfeld, 1985; Weber & Alcock, 2004) and mathematicians’ reflections on their own practice (Hadamard, 1945; Polya, 1957) indicate that mathematicians’ judgments about truth values often involve diagrams, and that their use of these can interact with symbolic reasoning in complex ways (e.g., Koedinger & Anderson, 1990). But there is a growing body of literature indicating that students’ propensities to use diagrams have little correlation with their mathematical achievement (Presmeg, 2006), and we do not yet know why students do not typically reap the same benefits from multiple representations as do mathematicians.
We also lack a clear understanding of what evidence mathematicians use to gain certainty (or high degrees of confidence) in mathematical conjectures. The traditional claim is that mathematicians rely on proofs – logically rigorous deductive arguments (e.g., Griffiths, 2000). But this only raises questions about how such proofs are produced and how mathematicians judge whether they are valid. Given that students usually lack proficiency in constructing and evaluating deductive proofs (e.g., Healy & Hoyles, 2000), even after completing curricula explicitly designed to teach these skills, better models of the proving process could have important implications for mathematics instruction.
Even in the absence of proof, mathematicians must still sometimes make subjective estimations about the plausibility of conjectures. Franklin (2013) argued that such judgments partially determine what problems a mathematician chooses to investigate, and Devlin (2002) proposed that the way mathematicians check arguments for correctness is determined by how much confidence or doubt they have in those arguments. However, there is almost no systematic research into how mathematicians do, or students should, make such judgments. Clearly mathematicians and students do experience varying levels of confidence and doubt about mathematical claims and arguments, so it is natural to ask how these arise and how they influence reasoning and behaviour; these are open questions.
We conclude this section by acknowledging the methodological difficulties inherent in investigating students’ and mathematicians’ mathematical practices. Developing conjectures and forming and evaluating arguments are time-consuming tasks that rely extensively on an individual’s background knowledge. Experts do not necessarily address these tasks in the same ways, and the obvious approaches to investigating their practices each have limitations. An investigator can ask individuals to describe their own reasoning processes, and many researchers have employed verbal protocol analyses in attempts to model students’ and mathematicians’ problem solving (e.g., Schoenfeld, 1985). But we know that individuals’ self-reports about their own reasoning are often inaccurate (e.g., Nisbett & Wilson, 1977). An alternative is to conduct a naturalistic study, but much mathematical reasoning is done in isolation and silence, so this might provide only limited information. Moreover, a naturalistic study incurs substantial time costs: even in school environments, developing and testing a single conjecture can take a class period or longer, and in expert practice, creating and testing a conjecture can span months or years. Naturalistic studies have taken on this challenge, observing mathematicians in meetings over several months (e.g., Greiffenhagen & Sharrock, 2011; Smith, 2012). But they necessarily have relatively small numbers of participants, making it difficult to generalize the findings to the broader population of mathematicians, especially those who work in different domains. And, of course, analyzing the data requires the investigators to be at least partially conversant with cutting-edge mathematics.
One more tractable methodological alternative is to present individuals with proxy tasks, simplified versions of the activities that we would like to investigate that can be completed in a relatively short time and that do not require extensive background knowledge. Using proxy tasks does not guarantee that behavioural patterns observed would be present in authentic situations, but such tasks do permit the investigator to competently analyze the data, and they are beginning to enable progress at least in confirming or debunking common claims. Recent eye-movement studies, for instance, have confirmed that undergraduate students pay less attention than mathematicians to words (as compared with symbols) in purported proofs, and that they are less likely to shift their attention around in a manner consistent with seeking deductive justifications (Inglis & Alcock, 2012). Other recent studies have revealed that mathematicians disagree on the validity of even fairly simple mathematical arguments (Inglis, Mejía-Ramos, Weber, & Alcock, 2013), that they respond in more nuanced ways than is traditionally believed to arguments based on diagrams (Inglis & Mejía-Ramos, 2009a) or presented as from an authoritative source (Inglis & Mejía-Ramos, 2009b), and that their sense of beauty in a mathematical proof is not systematically related to that proof’s perceived simplicity (Inglis & Aberdein, 2015). Several of these findings challenge conventional claims about the nature of mathematical expertise by revealing greater-than-expected heterogeneity in the experiences and judgments of mathematicians. They thus constitute steps toward a more informed view of what mathematics students should learn.
Charting Developmental Trajectories and Their Interactions
The questions in this section are:

Are there alternative cognitive pathways by which one can be a successful mathematics student? If yes, what does this imply for typical research methods used by mathematical cognition researchers?

How are different mathematical skills (including representing number, counting, performing arithmetic, using fractions) and their developmental trajectories related to each other?

How is developmental dyscalculia qualitatively different from arithmetic performance at the lower end of the normal distribution?

As we build causal models of pathways to mathematics learning and thinking, what models emerge for compensatory mechanisms – cognitive skills or abilities that allow individuals to overcome shortcomings in essential skills, and that contribute to patterns of behaviours?
In comparison with the questions discussed in the previous section, research pertaining to these is comparatively advanced. But it is by no means conclusive: the available data suggest that multiple potential pathways to mathematical success exist, but do not specify precisely what those pathways are. This is evident among children who have mathematical difficulties of known (e.g., Dennis et al., 2009) or unknown etiology (e.g., Geary, 2011; Kaufmann et al., 2013), the most easily differentiated of whom have readily classifiable developmental diagnoses such as fragile X syndrome, spina bifida, and 22q deletion syndrome (De Smedt, Swillen, Verschaffel, & Ghesquière, 2009; Mazzocco, Quintero, Murphy, & McCloskey, 2016). Some group differences, while disorder-specific, may provide models for variation in the general population if origins for and mechanisms underlying different pathways and trajectories were well understood. This makes inquiry into these trajectories particularly valuable.
To understand differences in developmental trajectories, researchers attend to relative deficits and assets – a neuropsychological approach that reveals potential dissociations of mathematics skills and that thus extends knowledge of basic cognitive processes and potential routes to both dyscalculia and typical mathematical achievement. For instance, school-age children with 22q deletion syndrome (22qDS) have intact fact retrieval skills but relatively impaired enumeration of quantities (see the review by De Smedt et al., 2009); they also exhibit intact word learning, but undercount enumeration errors for sets exceeding four (Quintero, Beaton, Harvey, Ross, & Simon, 2014). Whether these enumeration errors reflect difficulties with mapping of numbers to quantities (e.g., Dehaene, Piazza, Pinel, & Cohen, 2003), with spatial attention (Simon, Bearden, McGinn, & Zackai, 2005), or with other basic processes has important implications for the origins of developmental trajectories. But, in all possible cases, their existence demonstrates a dissociation between components of mathematical cognition.
Similar dissociations are evident across other conditions. For instance, mathematical performance remains comparatively unimpaired in children with 22qDS when tapping the verbal system (Mazzocco et al., 2016), but not when engaging in nonverbal tasks. In contrast, whereas children with spina bifida meningomyelocele show delayed counting principles in early childhood and poor numeracy across the lifespan, their mathematical fact retrieval performance is not correlated with presence or absence of a learning disability (English, Barnes, Taylor, & Landry, 2009) but rather with level of fine motor skills (Barnes et al., 2006). In children with Williams syndrome (WS), the dissociation between numerical skills is not only in nature but also in timing: abstract representations of small quantity are age appropriate during infancy, as is the ability to track small item sets. But these skills do not remain intact throughout life, and impaired approximate number representations of large sets emerge even in early childhood. Remarkably, despite this profile, young children with WS outperform young girls with fragile X syndrome on counting principles such as cardinality.
These systematic profile analyses are each guided by theories linked to specific biochemical, neuroanatomical, or neuropsychological characteristics of known conditions (as reviewed by Dennis et al., 2009). Such principled approaches allow us to test theories of the origins of mathematical learning trajectories, or to ask when pathways diverge or become unique. Possible factors in divergence include impaired motor function that prevents motor exploration in infants and toddlers with spina bifida (and that limits finger use in older children), and disrupted development of the visuospatial processing system in 22qDS or WS. But these factors are not syndrome-specific, so the mathematical assets observed in these and other populations have implications for potential compensatory mechanisms – routes via which we might promote mathematical learning in any children who share assets in these domains.
This argument about assets extends to domain-general skills like working memory (English et al., 2009), and of course developmental differences are also observed in children with typical mathematical achievement (LeFevre et al., 2010). To date, studies of typical populations have focused on numerical skills, fractions, fact retrieval, word problem solving, or general mathematical achievement, depending on the researcher’s discipline. Measured potential mediators or moderators frequently include executive functions (e.g., working memory, inhibitory control; Bull, Espy, & Wiebe, 2008), processing speed, and, more recently, indices of mathematics anxiety or motivational factors (Beilock et al., 2010; Wang et al., 2015). Importantly, most of the relationships to emerge from these studies are nonlinear. Outcomes may interact with problem type (Agostino, Johnson, & Pascual-Leone, 2010), stage of development (Ansari, 2010), and motivation or mathematics anxiety (Beilock et al., 2010; Wang et al., 2015). Other known influences (e.g., reading disability, stereotype threat, motor function) may work directly on mathematical outcomes or indirectly moderate known associations (for instance, children with a reading disability or moderate mathematics anxiety might be more or less responsive to an intervention).
Ideally, we would draw comparisons across studies that ask similar questions or address different disorders; this would help to identify common dissociations, alternative compensatory paths, and indicators of whether children are good candidates for specific compensation-based interventions. But, at present, an explicit comparison of this sort is problematic. At minimum it requires commonality in at least some of the measures used to assess skills (mathematical or otherwise), overlap in ages assessed, and similar methodological approaches to study design and data analysis. This returns us to the overarching questions about methodology raised in Section F: better consistency in design and reporting would contribute to both disorder-specific knowledge and knowledge of mathematical cognition in the broadest sense.
Discussion
The exercise reported in this paper led to a list of questions that reflects a broad approach to understanding human mathematical cognition, encompassing the effects of environmental and affective factors, of domain-general and domain-specific cognitive factors, and of teaching interventions. It covers mathematical cognition as it does or can develop from infancy to expertise: from basic representations of number in the brain, through informal and formal learning and ways in which these might be studied and supported, to the thinking of expert mathematicians. This variation, however, should not be seen as overwhelming: the list exhibits an encouraging level of coherence among the main concerns of researchers at all mathematical levels and from all of the disciplines involved in our exercise. Indeed, although the conference was characterised by a spirit of openness, engagement, and a willingness to do the work necessary to iron out any misunderstandings that arose due to different terminologies, this was hardly required – genuine disagreements were few, which encourages us to believe that our different approaches are converging on similar conceptions of the key issues.
That said, any exercise like this is naturally limited by the experience and knowledge of its participants, and must trade off breadth of representation against depth of focus. With a comparatively small exercise like this, we leant towards depth: all of our participants work on mathematical cognition as a major strand of their own research. One consequence was that minority interests are less well reflected in our list of questions; in particular, although questions related to cognitive neuroscience arose in the drafting stages (see https://figshare.com/articles/S1_Dataset_xlsx/1299358), they did not survive the winnowing process. This could be seen as reflecting a primary focus among our participants on questions of fairly direct concern to educators; fundamental questions about representations in the brain are perhaps, at present, a step further removed from these concerns. Similarly, resource restrictions and the UK location of the funder skewed the participation toward local participants and toward those who publish primarily in English. We fully acknowledge that other researchers – especially those with broader interests, or those working primarily in other regions or languages – might take different views, and we hope that one function of this paper will be to stimulate debate.
We hope also to stimulate debate on prioritisation of research endeavours, toward which end we should clarify that we intend this paper as a resource, not a manifesto: we do not wish to make actual or speculative value judgments about how our fellow researchers use their time and resources. Some bodies do make such judgments: funders are entitled to establish agendas and research priorities. But we consider it inappropriate to use our list – whether preemptively or retrospectively – as a basis for judging the worthiness of others’ work. We nevertheless believe that the spirit of academic freedom would be preserved if an outcome of our exercise is a set of important shared considerations that becomes refined by greater interdisciplinary communication. Accordingly we hope that others will consider addressing some questions directly, and relating their ongoing work to a broader swathe of research pertinent to mathematical cognition.
We are also aware that our list of questions raises issues about practicality: some questions could be addressed via small research projects run by small teams, but some would necessitate larger-scale, multi-lab investigations. The list also raises natural questions about sequencing; for instance, it may seem that Question 25 (How can we measure the variability in children’s mathematical experiences outside of school?) must be answered before Question 7 (including What are reliable early and later longitudinal predictors of the development of number skills, arithmetic and other aspects of mathematics?). But this presupposes that a perfect measure is possible, and that it is possible to know in isolation that we have found it. In fact, such a measure must necessarily be constructed in conjunction with research about the influence of various factors on development: there is no point in measuring perfectly a quantity that turns out not to have influence in the world (arguably, the abstract nature of theoretical constructs means that this is not even possible). There are also more prosaic concerns: we could invest in developing excellent measures for constructs that we currently believe to be important, but what if they turn out not to be? Without some concurrent effort to establish at a rough level whether outside-of-school experiences have as much influence as, say, the first year in school, that investment could be wasted (at least from the perspective of educational development, if not from that of basic research).
So, although our questions represent an informed, collective, current perspective, it would be folly to set ourselves up as able to make predictions about what are destined to become the really important breakthroughs or insights. These will come about due to continued reflection informed by new developments across our fields, so we hope – as stated in the introduction – to promote communication and interdisciplinary work across the fields of mathematics education, psychology and cognitive neuroscience by drawing attention to what is common in their methods, theories, and ultimate research goals.