Enhancing Cognitive Flexibility Through a Training Based on Multiple Categorization: Developing Proportional Reasoning in Primary School

Proportional reasoning is a key topic both at school and in everyday life. However, students are often misled by their preconceptions regarding proportions. Our hypothesis is that these limitations can be mitigated by working on alternative ways of categorizing situations that enable more adequate inferences. Multiple categorization triggers flexibility, which enables reinterpreting a problem statement and adopting a more relevant point of view. The present study aims to show the improvements in proportional reasoning after an intervention focusing on such multiple categorization. Twenty-eight 4th and 5th grade classes participated in the study during one school year. Schools were classified by the SES of their neighborhood. The experimental group received 12 math lessons focusing on flexibly envisioning a situation involving proportional reasoning from different points of view. At the end of the school year, compared to a control group, the experimental group had better results on the posttest when solving proportion word problems and proposed more diverse solving strategies. The analyses also show that the performance gap linked to the school's SES classification was reduced. This offers promising perspectives regarding multiple categorization as a path to overcome preconceptions and develop cognitive flexibility at school.

demands. However, adapting the strategy to perform a multiplication is constrained by the accessibility of alternative points of view and strategies beyond the first ones that come to mind.
Indeed, adaptive expertise in mathematics amounts to finding the solution to a problem in a flexible manner by selecting the most appropriate strategy, and not merely using multiple strategies (Verschaffel et al., 2009). The construct of adaptive expertise has been considered to integrate both conceptual and procedural knowledge (Baroody, 2003; Hatano, 1982). Conceptual knowledge as a necessity for achieving adaptive expertise is in line with insights on conceptual development. Several approaches stipulate that the conceptual system relies on mental categories (Barsalou, 1991; Malt & Johnson, 1992; Tversky & Hemenway, 1983; Vosniadou, 2012). Mental categories "embody much of our knowledge of the world, telling us what things there are and what properties they have" (Murphy, 2002, p. 2). In fact, categorization provides a maximum amount of information with the least cognitive effort (Rosch, 1978). It makes it possible to relate newly encountered situations to previous ones and to attribute the properties associated with a category to a new situation: when a situation is categorized as a member of a specific category, that situation inherits its properties (Hofstadter & Sander, 2013). In this view, assigning an object or a situation to a category provides a certain point of view on it by making salient the properties associated with the category. For instance, assigning a tomato to the category of fruits makes salient its botanical properties, whereas assigning it to the category of vegetables makes salient its culinary properties. The most obvious forms of categories concern concrete objects, such as a category for chairs, where considering an object a chair triggers the inference that one can sit on it (Murphy, 2002).
But categories also concern abstract mental constructs, such as a category for freedom, or proverbs, such as a category for situations that can be labeled "you can't judge a book by its cover", which triggers the inference that appearances are often misleading. Mathematics is no exception: for example, repeated addition situations are candidate categories for multiplication (Anghileri, 1989; Mulligan & Mitchelmore, 1997). Therefore, when a mathematical problem is categorized as, for instance, a repeated addition situation, a student is able to make inferences about the solving strategy associated with it, such as adding the multiplicand as many times as the multiplier indicates in order to find the solution (Hofstadter & Sander, 2013; Scheibling-Sève et al., 2020). However, a problem such as "what is the area of a rectangle whose sides are 10 cm and 16 cm?" would be outside the scope of the category of multiplication as repeated addition.
Some studies contrast deep structures and surface features of a situation as two possible bases for categorization (Chi & VanLehn, 2012; Gentner & Kurtz, 2006). Two situations share the same deep structure if they invoke the same relation, even if they are superficially dissimilar (Gentner & Kurtz, 2006). However, surface features are easily perceived, while deep structure features are hard to perceive unless relevant knowledge has been acquired (Chi & VanLehn, 2012). Novices, unlike experts, construct their categories primarily based on superficial information (Schoenfeld & Herrmann, 1982). One of the goals of teaching is to enable students to transfer knowledge learned in one situation to another. To do this, they must be able to recognize that two situations with different surface features belong to a common category in terms of their deep structure (Dupuch & Sander, 2007). This means that to go beyond one's initial point of view, a categorical shift might be crucial (Vosniadou, 2012). In this approach, participants who failed to solve the problem of calculating the total cost of 50 chocolates, as introduced above, did so because they categorized the problem as a member of the category of multiplication as repeated addition and failed to perceive the situation as part of the category of product situations, which allows commutativity (Lakoff & Núñez, 2000). In the perspective presented in the current study, adaptively selecting different strategies requires flexibly changing between different mental categories, and recognizing structural features makes this recategorization possible. For example, in the first problem regarding the cost of 50 chocolates, one would need to recategorize the situation from a repeated addition to a product situation.
To help students recognize common structural features among different situations and flexibly switch between different categories, a pedagogical approach can be based on semantic recoding (Gamo et al., 2010; Gvozdic & Sander, 2020; Scheibling-Sève et al., 2017). The principle of semantic recoding is to lead the student to recode a representation of the problem initially based on superficial features into a representation that makes the problem's deep mathematical structure salient. Indeed, adopting different representations of an object or situation opens possibilities for new inferences, depending on the category that is solicited (Hofstadter & Sander, 2013). A pathway for triggering this kind of cognitive flexibility is to enhance multiple categorization, i.e., a mechanism by which an individual is able to perceive a given entity from different points of view (Scheibling-Sève et al., 2022). For example, first grade students who participated in a semantic recoding intervention practiced solving a subtraction problem such as "There are 11 flowers in the bouquet. Sophie takes out 9 flowers from the bouquet. How many flowers are in the bouquet now?" with strategies compatible with two different points of view (Gvozdic & Sander, 2020). The problem could be solved either by a direct subtraction, which was within the scope of the initial categorization of the problem as a search for the remainder, or by an indirect addition. To use the latter strategy, a student would need to put aside the semantic features of the wording, which describe a search for the remainder, and identify an underlying structure that makes it possible to also consider the problem as a part-whole situation. This makes the missing addend strategy accessible: searching for what needs to be added to 9 to reach 11.
Indeed, students who took part in this intervention were more successful than the control group at solving problems requiring such a recategorization, and used strategies consistent with this recoded representation to a greater extent. Being able to adopt multiple points of view on the same object or situation therefore makes it easier to move from one category to another and to choose the most relevant viewpoint for the situation and the most adequate solving strategy (Hofstadter & Sander, 2013). Multiple categorization thus constitutes a hallmark of cognitive flexibility.
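The two strategies described for the flower problem can be contrasted in a short Python sketch. The function names are hypothetical illustrations, not part of the intervention materials; they only show that the "search for the remainder" view and the part-whole (missing addend) view lead to the same answer.

```python
def remainder_view(total, removed):
    """'Looking for the remainder': take the removed part away from the whole."""
    return total - removed


def part_whole_view(known_part, whole):
    """'Part-whole': count up from the known part until the whole is reached."""
    added = 0
    while known_part + added < whole:
        added += 1
    return added


# "There are 11 flowers in the bouquet. Sophie takes out 9 flowers..."
print(remainder_view(11, 9))   # direct subtraction, 11 - 9: prints 2
print(part_whole_view(9, 11))  # indirect addition, 9 + ? = 11: prints 2
```

Counting up from 9 to 11 takes only two steps, which is why the recoded, part-whole view is the more economical one for this particular problem.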
Nevertheless, achieving cognitive flexibility can be especially difficult when a situation elicits preconceptions. In fact, children conceptualize most notions taught at school based on prior knowledge, also known as preconceptions (e.g., Ausubel, 1968; Carey, 1985). Preconceptions lead to inferences, which makes them useful for providing explanations and making predictions about future outcomes regarding target notions (Gopnik & Wellman, 1994; Vosniadou, 2017). Another specificity of preconceptions is that they are rarely put to the test of falsification (Vosniadou, 2012) and continue to be influential even after instruction (Shtulman & Harrington, 2016). Following the works on categorization presented above, preconceptions as such can be regarded as categories (Babai et al., 2010). Indeed, they can be considered initial categories: the first categorization at students' disposal for understanding a new situation, which gives access to specific solving strategies associated with the category. Furthermore, preconceptions are characterized by a set of properties that determine the scope of their validity (Fischbein, 1989). When the properties of a preconception are assigned to a mathematical situation, the preconception can at times be useful for learning, but at other times it can lead to erroneous conclusions (Inagaki & Hatano, 2008; Lautrey et al., 2008). Indeed, these initial, spontaneously evoked categories are often not aligned with expert categories of the target notion. Failure to solve a problem might result from being stuck on an initial point of view determined by the preconception. Ultimately, to adaptively use a strategy on problems that are not compatible with a preconception, one would need to recategorize the situation from the preconception to a more expert category.
Thus, multiple categorization can be a lever for overcoming preconceptions and adopting a more relevant perspective on the problem. Proportional reasoning is a domain well suited to understanding the constraints imposed by preconceptions and how the mechanism of multiple categorization plays a role in overcoming them (Carey, 2000; diSessa et al., 2004; Keil, 2011; Vosniadou, 1994).

Proportional Reasoning and Misconceptions
Proportional reasoning has long been considered an activity requiring a high level of expertise (Piaget & Inhelder, 1951). Misunderstanding it leads to many errors and biases (Mackie & Bruce, 2016; Noelting, 1980; Van Dooren et al., 2005). For example, it can affect decision making that relies on statistical information, such as neglecting base rates, which plays a role in risk taking (Casscells et al., 1978). But as with many other mathematical activities, proportional reasoning has cognitive foundations in the early conceptions of number, space, and time. Indeed, elementary numerical cognition rests on an approximate sense of number (Dehaene, 1997; Spelke & Kinzler, 2007). It allows children to identify a common invariant relation between two variables, representing a first form of proportional reasoning (McCrink & Spelke, 2010). This identification of multiplicative relations also relies on specific vocabulary, for instance, mastering "twice as many" or "more than half" (Staples & Truxaw, 2012). Some specific mathematical vocabulary concepts ("double", "three more") are even significant predictors of proportional reasoning at the beginning of primary school (Vanluydt et al., 2021). However, this early proportional reasoning also relies on several preconceptions that are constraining and limit the development of expertise and flexibility in solving proportional problems at school and in daily life. Four main preconceptions, which act as initial categories used for interpreting situations involving proportionality, can be identified.

Multiplication as Repeated Addition
Proportional reasoning relies on the identification of multiplicative relations (McCrink & Spelke, 2010) and is therefore influenced by preconceptions regarding multiplication. A multiplicative situation that is categorized as part of the well-identified preconception of repeated addition (Fischbein, 1989) will not be in line with the concept of ratio. Bell et al. (1981) found that when students aged 12-15 were asked to solve the problem "If petrol costs £1.2 per gallon, what would be the cost of filling a can containing 0.22 gallons?", the most common operation used to find the answer was division (1.2 ÷ 0.22) instead of the correct one, multiplication (1.2 × 0.22). When multiplication is categorized as repeated addition, multiplying by 0.22 amounts to adding a quantity to itself 0.22 times, which is hardly meaningful. Indeed, problems containing a decimal multiplier smaller than one are more difficult for students to solve (Fischbein et al., 1985): such problems are not compatible with the constraining inferences imposed by the preconception, since they lead to a result smaller than the initial value. Thus, the preconception of multiplication as repeated addition imposes constraints such as believing that the multiplier must be a whole number and that the result must be greater than the multiplied value. It also hinders conceiving of multiplication as a commutative operation.
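The constraint imposed by the repeated-addition preconception can be made concrete with a minimal Python sketch. The function `repeated_addition` is a hypothetical illustration, not a model of how students compute: multiplication implemented as repeated addition is only defined for whole-number multipliers, whereas ordinary multiplication handles the petrol problem directly. `Decimal` is used so that the printed prices are not affected by binary floating-point rounding.

```python
from decimal import Decimal


def repeated_addition(multiplicand, multiplier):
    """Multiply by adding `multiplicand` to itself `multiplier` times.

    Mirrors the preconception: the multiplier must be a whole number.
    """
    if not isinstance(multiplier, int) or multiplier < 0:
        raise ValueError("repeated addition needs a whole-number multiplier")
    total = 0
    for _ in range(multiplier):
        total += multiplicand
    return total


price_per_gallon = Decimal("1.2")
gallons = Decimal("0.22")

print(repeated_addition(price_per_gallon, 3))  # 3 gallons: 1.2 + 1.2 + 1.2 = 3.6
print(price_per_gallon * gallons)              # correct operation: 0.264, smaller than 1.2
print(round(price_per_gallon / gallons, 2))    # the common error, 1.2 / 0.22: 5.45

try:
    repeated_addition(price_per_gallon, gallons)
except ValueError as err:
    print("not meaningful:", err)
```

The sketch also shows why the preconception blocks commutativity: `repeated_addition(4, 3)` and `repeated_addition(3, 4)` describe different additions even though both equal 12.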

Division as Sharing
A second preconception associated with proportional reasoning and the concept of ratio concerns division. Division is not intuitively categorized as the ratio between two quantities, but as sharing, where one quantity is shared into equal parts and one searches for the size of a part (Fischbein, 1989). Fischbein et al. (1985) showed that problems falling outside the scope of the inferences based on this preconception, such as those in which the divisor is larger than the dividend (e.g., "15 friends together bought 5 kg of cookies. How much did each one get?"), are challenging for 5th graders, since it is difficult for them to use the correct solving strategy, 5 ÷ 15. The mental category "dividing is partitioning (or sharing)" is thus restrictive, because it precludes viewing division as measurement (Fischbein et al., 1985). Such an alternative, quotative view of division refers to the ratio between quantities of the same unit and entails fewer constraints than the partitive view: the quotative view only requires that the dividend be larger than the divisor. Therefore, in proportional problems such as "2 baguettes cost 3€. How much do 8 baguettes cost?", the partitive view would lead one to first calculate the price of one baguette. This strategy, known as identifying the base rate, consists of first calculating the base quantity (here, 1 baguette costs 1.50€) and then multiplying it by the number of units sought (1.50€ × 8 = 12€). The quotative view, however, makes it possible to solve the problem with a different strategy (8 ÷ 2 = 4: I buy 4 times more baguettes, so I will pay 4 times more, 4 × 3 = 12€).
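The two strategies for the baguette problem can be written side by side in a minimal Python sketch. The function names are hypothetical; `Fraction` keeps the arithmetic exact so that both routes visibly give 12€. The same exact arithmetic gives the answer to the cookie problem, 5 ÷ 15 = 1/3 kg per friend, a quotient the sharing preconception makes hard to accept because the divisor exceeds the dividend.

```python
from fractions import Fraction


def base_rate_strategy(known_qty, known_price, target_qty):
    """Partitive view: share the price among the items, then scale up."""
    unit_price = Fraction(known_price) / known_qty  # price of one baguette (3/2)
    return unit_price * target_qty


def scalar_strategy(known_qty, known_price, target_qty):
    """Quotative view: how many times does the known quantity fit into the target?"""
    factor = Fraction(target_qty, known_qty)  # 8 / 2 = 4 times more baguettes
    return factor * known_price               # so 4 times the price


print(base_rate_strategy(2, 3, 8))  # 12
print(scalar_strategy(2, 3, 8))     # 12
print(Fraction(5, 15))              # 1/3 (kg of cookies per friend)
```

In this example, the base-rate route passes through a non-integer intermediate value (1.50€), while the scalar route stays in whole numbers, one illustration of why having both points of view available can matter.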

Fraction as a Bipartite Structure
Another difficulty in grasping proportional reasoning comes from not categorizing a fraction as a ratio between two quantities. One of the difficulties stems from the fact that students draw an analogy between natural numbers and rational numbers, and therefore categorize rational numbers as natural numbers and apply the properties of integers to fractions. This phenomenon is called the whole number bias (Ni & Zhou, 2005). It can lead to the inferences that rational numbers have a single successor (Siegler & Lortie-Forgues, 2015; Vamvakoussi & Vosniadou, 2010), that a larger numerator, denominator, or both represent a larger fraction (Ni & Zhou, 2005), or that multiplying two fractions necessarily makes the result larger while dividing two fractions necessarily makes it smaller (Siegler & Lortie-Forgues, 2015). A second difficulty amounts to categorizing a fraction as a bipartite, i.e., part-whole, structure (Bonato et al., 2007; DeWolf et al., 2014), related to division (a/b is a division of a by b, with a and b integers and b larger than a). The fraction is thus seen as a division of two numbers and not as a number (Sophian, 2007). A challenge in teaching fractions is therefore to build the mental category of fractions as magnitudes. This bipartite conception entails difficulties in comparing fractions, since students see the task as a comparison between pairs of numbers. For example, in a frequently used paradigm, two fractions are presented to participants, who are asked to judge which fraction is larger. Some pairs are congruent with the whole number bias (6/8 vs. 7/9) and others are incongruent with it (2/9 vs. 1/3). Van Hoof et al. (2013) showed that first and fifth graders' reaction times are longer for incongruent pairs, illustrating the difficulty of perceiving fractions as magnitudes.
Furthermore, empirical findings reveal that expert mathematicians use different fraction comparison strategies when fractions are viewed as magnitudes rather than as bipartite structures (Obersteiner et al., 2013). This suggests that recategorizing fractions from a bipartite point of view to a holistic, magnitude-based point of view is important for fostering the appropriate and flexible use of strategies.
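The congruent/incongruent distinction in the fraction comparison paradigm can be made explicit with a small Python sketch. It assumes, for illustration only, that the whole number bias is modeled as the rule "the fraction with the larger numerator and the larger denominator is larger"; the function names are hypothetical. Pairs are kept as (numerator, denominator) tuples because `Fraction` would reduce 6/8 to 3/4 and hide the components the bias relies on.

```python
from fractions import Fraction


def whole_number_guess(p, q):
    """Fraction judged larger by comparing components as whole numbers.

    Returns None when the components point in opposite directions,
    since the naive rule then makes no clear prediction.
    """
    (a, b), (c, d) = p, q
    if a > c and b > d:
        return p
    if a < c and b < d:
        return q
    return None


def is_congruent(p, q):
    """True if the whole-number guess matches the true magnitude order."""
    guess = whole_number_guess(p, q)
    if guess is None or Fraction(*p) == Fraction(*q):
        return None
    actual = p if Fraction(*p) > Fraction(*q) else q
    return guess == actual


print(is_congruent((6, 8), (7, 9)))  # True: 7/9 is larger and has larger components
print(is_congruent((2, 9), (1, 3)))  # False: 1/3 is larger despite smaller components
```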

The Illusion of Linearity
Finally, the last preconception we will develop concerns the linear property of proportions. In fact, children as young as 6 can solve some missing-value proportional problems, especially with an informal proportional reasoning that amounts to a repeated addition strategy (Kaput & West, 1994; Sophian & Wood, 1997; Van Den Brink & Streefland, 1979). For example, "For 6 m², I need 0.75 liters of paint. How many liters do I need for 18 m²?" is solved by applying additive reasoning as a principle of linearity: "18 m² is 6 + 6 + 6 m², so I need 0.75 + 0.75 + 0.75 liters of paint".
But even in contexts where the proportional strategy is not valid, students apply the principle of linearity (Van Dooren et al., 2005). Surprisingly, as students progress through school, from second grade to sixth grade, the number of proportional (linear), and therefore incorrect, strategies and answers they produce when solving non-proportional problems increases. One explanation for this result comes from students' school experience: at certain points in the mathematics curriculum, significant attention is given to proportionality. The emphasis is then often on performing the procedures correctly, and students apply them consistently. Thus, students categorize a problem describing a situation with one missing value among four values as a proportional problem, e.g., "In his toy box, John has dice in several sizes. The smallest one has a side of 10 mm and weighs 800 mg. What would be the weight of the largest die (with a side of 30 mm)?" (Van Dooren et al., 2004). This categorization is based on surface features (3 known values and 1 missing value) and not on the mathematical structure, related to the principles at play in the situation, here a non-linear situation: the volume of a cube grows with the cube of its side length, not linearly.
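The paint and dice problems can be contrasted in a short Python sketch. The `build_up` function reproduces the informal additive strategy for the paint problem; `die_weight` assumes, as the problem implicitly does, that all dice are solid and made of the same material, so that weight is proportional to volume. Both function names are hypothetical.

```python
from fractions import Fraction


def build_up(known_area, known_paint, target_area):
    """Informal strategy: keep adding the known (area, paint) pair until the target is reached."""
    area, paint = 0, Fraction(0)
    while area < target_area:
        area += known_area
        paint += known_paint
    if area != target_area:
        raise ValueError("the target is not a whole multiple of the known area")
    return paint


def die_weight(known_side, known_weight, new_side, assume_linear=False):
    """Weight of a die with side `new_side`, assuming the same material as the known die."""
    scale = Fraction(new_side, known_side)
    if assume_linear:
        return known_weight * scale       # the illusion of linearity
    return known_weight * scale ** 3      # weight follows volume: side cubed


print(float(build_up(6, Fraction("0.75"), 18)))     # 2.25 liters for 18 m²
print(die_weight(10, 800, 30, assume_linear=True))  # 2400 mg (linear, incorrect)
print(die_weight(10, 800, 30))                      # 21600 mg (cubic, correct)
```

Both problems share the same surface form (three known values, one missing), yet tripling the side of the die multiplies its weight by 27, not by 3.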

A Pedagogical Intervention for Enhancing Flexibility Through Multiple Categorization
In order to attenuate the obstacles imposed by preconceptions in the domain of proportional reasoning, we created a pedagogical intervention based on principles of multiple categorization as a way of increasing flexibility. The intervention program consisted of 12 one-hour in-class math lessons. These lessons were composed of different written arithmetic word problems. Each lesson focused on one key concept of proportional reasoning in relation to the relevant preconceptions (Table 1). By studying how to solve comparison problems, students first worked on additive structures (e.g., three more and three less) and multiplicative structures (e.g., three times more and three times less). Then they learned to distinguish between additive and multiplicative structures (e.g., three more vs. three times more). After these first steps, fractions and proportions were studied. The aim of this intervention was to guide the students towards an awareness of their preconceptions and the construction of mental categories more in line with the academic notion. The latter will be referred to as the expert conception in this study, and strategies in line with expert conceptions will be considered a reflection of flexibility, since they indicate that a more adequate perspective has been adopted instead of a more intuitive but less relevant one.
In our intervention, the notion of multiple categorization was made explicit to the students through the notion of point of view. For example, in order to learn the reciprocity between multiplication and division, which is crucial in the construction of the concept of ratio, two points of view can be taken on the following situation: "Jena has 15 marbles and Mateo has 5 marbles". Taking Jena's point of view, labeled "times more", one can conclude: "Jena has three times more marbles than Mateo". From Mateo's point of view, labeled "times less", one can conclude: "Mateo has three times less marbles than Jena". This explicit approach is motivated by findings that explicit methods increase performance compared to implicit methods (see the meta-analysis by Alfieri et al., 2011) and that labeling helps students identify a deep structure (Namy & Gentner, 2002). Furthermore, comparing and contrasting two solution methods by their efficiency can lead to greater gains in flexibility than studying the solution methods one at a time (Rittle-Johnson & Star, 2007), which is also beneficial for understanding (Hattikudur et al., 2016; Rittle-Johnson & Star, 2007). Finally, in order to promote transfer, it is important to practice the same reasoning across a variety of contents (Bransford et al., 2000; Halpern, 2013; Perkins & Salomon, 1989). In line with these studies, our intervention presented and compared side by side the different strategies associated with the points of view. Students were also prompted to transfer the same points of view to various contexts throughout the 12 lessons.
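The two points of view on the marble situation express one multiplicative relation and its reciprocal. A minimal Python sketch, with hypothetical variable names, shows that the "times more" factor and the "times less" factor are inverses of each other, which is the reciprocity between multiplication and division the lessons target.

```python
from fractions import Fraction

jena, mateo = 15, 5

times_more = Fraction(jena, mateo)  # Jena's point of view: 3 times more
times_less = Fraction(mateo, jena)  # Mateo's point of view: "3 times less", i.e. 1/3 as many

print(mateo * times_more)       # multiplication recovers Jena's 15 marbles
print(jena / times_more)        # division recovers Mateo's 5 marbles
print(times_more * times_less)  # the two viewpoints are reciprocal: 1
```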

The Current Study
The current study investigated the impact of a pedagogical intervention based on multiple categorization principles as a way of achieving flexibility. It used proportional reasoning as a tool of intervention and investigation. The general rationale was that, since difficulties in understanding proportionality are rooted in preconceptions, categorizing situations in alternative ways should make it possible for students to overcome the constraints induced by preconceptions and to adopt strategies aligned with the expert conception of the mathematical concepts. We expected students in the experimental group to succeed better than students in the active control group. Each group included subgroups created based on grade level and on the school's SES. First, at pretest, we expected no difference between the groups or between the different subgroups (Prediction 1). At posttest, the experimental group and subgroups were expected to score higher than the control group (Prediction 2). For each skill measured in the tests, we expected no differences at pretest between groups or between subgroups (Prediction 3). At posttest, the experimental group and subgroups should score higher than the control group and subgroups on each subscore regarding the studied notions (Prediction 4).

Method
Participants
Twenty-eight French classes participated in the study. A total of 588 students (53% female, mean age 10.5 years, SD = 0.65 for the experimental group; 48% female, mean age 10.6 years, SD = 0.62 for the active control group) were present at both pre- and posttests (see Table 2 for exact distributions). The experimental and control classes were paired according to the socio-economic status commonly associated with the context of the participating schools (low SES, middle SES, high SES). In France, most students attend public schools outside priority education. These schools have a relatively mixed student population (Botton & Miletto, 2018). All the classes of the middle SES group belonged to such public schools. Since 2015, priority education schools have been split between Priority Education Networks (REP) and Enhanced Priority Education Networks (REP+). Of REP+ students, 74.1% are children of working-class or unemployed parents (Direction de l'évaluation, de la prospective et de la performance [DEPP], 2016). Only 8% of primary school pupils are enrolled in the REP+ network (DEPP, 2018). All the classes of the low SES group belonged to this REP+ network. Lastly, private schools enroll 14.5% of pupils (DEPP, 2017). All the classes of the high SES group came from a selective private Parisian school whose admission is based on an exam and interviews. The teachers in both the experimental and control groups participated on a voluntary basis, and the selection process for teachers was similar in each of the subgroups: at the beginning of the year, several projects were presented to them, including the current project. The objective was to control for the teacher "motivation" effect (Willingham, 2008): all the teachers included in the control and experimental groups were motivated to invest themselves in an optional project included in their class hours.

Pre- and Posttests
The pretest consisted of 17 items for 4th graders and 23 items for 5th graders. The pretest differed between the two grades because, at the beginning of the school year, 4th graders have not yet been taught division, fractions, and proportionality. The posttest was identical for the two grades and consisted of 35 items. All the items included in the posttest required expert conceptions of proportional reasoning to be solved successfully. Four items from French national evaluations and 4 items from TIMSS (2015) were included. At posttest, 2 items for which the success threshold (75%) had been met at pretest were removed. The test items are detailed in the Appendix. They were classified according to the 6 notions studied in these grades:
• Distinguishing between additive and multiplicative structures
• Solving distributivity problems
• Solving multiplicative problems
• Decomposing and comparing fractions
• Solving fraction problems
• Solving proportion problems
The control and experimental groups took the pretest at the end of the first term and the posttest during the last month of the school year. The test booklets were composed of a series of problems. Each problem statement was followed by a box for the calculation and a line for the answer statement. To control for order effects, 4 booklets were created. At pre- and posttest, students were informed that the test was part of a scientific study and were instructed about the importance of completing the calculation. Each item had to be solved within a time limit (2 or 3 minutes depending on the item). The timing was determined based on pilot tests and was introduced to limit the total duration of the test. Once the time was up, the experimenter told the students to move on to the next exercise without going back to the previous ones. The pretest was administered by the first author.
The posttest was divided into two testing sessions to limit the duration of each session for the students. Because of the large number of testing sessions, two additional experimenters were recruited to administer the tests in the classrooms. Teachers were present during the administration of the pre- and posttest but did not intervene and did not keep copies of the tests.

The Intervention Program
The control group followed the usual math curriculum. In France, each class has to follow an official mathematics curriculum specified for each grade (Eduscol, 2022a, 2022b). All the notions studied by the experimental group were part of the official curriculum; thus, experimental and control classes studied the same notions. The experimental group participated in 12 one-hour lessons over a 5-month period. The lessons were part of the teaching hours dedicated to mathematics. In the middle SES group, the lessons were entirely conducted by the first author in the presence of the teacher. For the other two groups, half of the lessons were conducted by the first author and half by the teachers. Before the beginning of the intervention, teachers from the experimental classes participated in a 2-hour training on preconceptions and multiple categorization, given by the first and last authors. Before each lesson they had to teach, the teachers received a teacher's guide and the necessary material (student worksheets and slides) (Figure 1). The teacher's guide started with a summary of the general objectives of the lesson (Figure 2), then described step by step the problems and the points requiring particular attention.

Scoring
For each problem, the expert strategy (i.e., a strategy that does not rely on preconceptions but requires categorizing the situation from the expert point of view) was defined prior to collecting the data (Appendix). Each expert strategy counted for 1 point. Calculation errors were not taken into account. For items involving more than one question, the answer to each question was given 1 point. Several scores were derived from the coding:
• A global score (ranging from 0 to 18 points for 4th graders and from 0 to 29 points for 5th graders on the pretest, and from 0 to 40 points on the posttest)
• A sub-score associated with each studied notion (see Appendix Tables A.1, A.2, A.3, A.4, A.5, A.6, A.7, A.8, and A.9)
To compare the pretest and posttest, which did not contain the same items, a z-score was calculated for each student at pre- and posttest, relative to the mean and standard deviation of the control group (Dillon et al., 2017).
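The standardization step can be sketched in Python as follows. This is a minimal illustration under assumptions the text does not specify: it uses the sample standard deviation of the control group (`statistics.stdev`) and standardizes each test occasion separately, so that the control group has a mean z-score of 0 at both pre- and posttest. The function name is hypothetical and the scores are toy values, not study data.

```python
from statistics import mean, stdev


def z_relative_to_control(scores, control_scores):
    """Standardize raw scores against the control group's mean and SD."""
    mu = mean(control_scores)
    sd = stdev(control_scores)  # sample SD (n - 1): an assumption of this sketch
    return [(s - mu) / sd for s in scores]


# Toy raw scores on one test occasion (not the study's data).
control = [10, 12, 14]
experimental = [12, 16]
print(z_relative_to_control(experimental, control))    # [0.0, 2.0]
print(mean(z_relative_to_control(control, control)))   # 0.0
```

Because each occasion is standardized against its own control distribution, a z-score reads as "how many control-group standard deviations above or below the control mean", which is what makes pretest and posttest comparable despite their different items.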

Results at Global Level
The data regarding student performance were not independent, since classrooms, not individual students, were recruited. Along with checking the equivalence of the two groups at pretest, this also required checking the variance explained by the hierarchical organization of the data (class clustering). At pretest, the mean z-score of the experimental group was 0.02 (SD = 0.90) (Figure 3). A t-test comparing the mean scores of the two groups revealed no significant difference between them, t(587.45) = -.17, p > .5. However, such a frequentist test is not sufficient to conclude that there is no difference between two groups. Therefore, we resorted to a Bayesian approach and calculated the Bayes factor with the BayesFactor package in R. According to the classification of Kass and Raftery (1995), BF01 = 10.79 provides positive support for the absence of a difference between the performance of the two groups at pretest. This leads us to consider the assignment of participants to the experimental and control classes as quasi-random, which makes it possible to conduct the further inferential analyses.
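The fractional degrees of freedom reported for the pretest comparison are characteristic of Welch's unequal-variance t-test, which is the default of R's `t.test`; the text does not name the test variant, so this is an inference. A minimal Python sketch of the Welch statistic and its Welch-Satterthwaite degrees of freedom is given below (the function name is hypothetical). Computing the p-value would additionally require the t distribution's CDF, which the Python standard library does not provide, and the Bayes factor computed with the BayesFactor package is not reproduced here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Toy example (not study data): equal variances and sizes give df = n1 + n2 - 2 = 6.
t, df = welch_t([1, 2, 3, 4], [2, 3, 4, 5])
print(round(t, 3), round(df, 3))
```

Unlike Student's pooled test, Welch's version does not assume equal variances in the two groups, which is why its degrees of freedom are generally not an integer.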

Boxplots of z-scores at Pre- and Posttest by Experimental Condition
At posttest, the mean z-score of the experimental group was 0.66 (SD = 1.2) (Figure 3). To study whether the improvement from pretest to posttest was significantly influenced by the intervention, a multilevel analysis was applied: since the data had a hierarchical structure, this analysis takes into account the dependency of students nested within classrooms. We first ran models with only the classroom as a random intercept, for performance at both pre- and posttest. These models made it possible to quantify the intraclass correlation coefficient (ICC). At pretest, the ICC = .421, and at posttest, the ICC = .387, indicating substantial intra-class homogeneity. Hence, about 42% of the observed variance at pretest and 38.7% of the variance at posttest can be attributed to classroom clustering.
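The ICC reported here comes from random-intercept models; in that framework it is the share of the between-classroom variance in the total variance, ICC = τ² / (τ² + σ²). As a self-contained illustration that does not need a mixed-model library, the Python sketch below swaps in the classical one-way ANOVA estimator, ICC(1), for equally sized groups. It is not the estimator used in the study, the function name is hypothetical, and the toy classrooms are invented.

```python
from statistics import mean


def icc1_balanced(groups):
    """One-way ANOVA estimator ICC(1) for equally sized groups (e.g., classrooms)."""
    k = len(groups[0])
    if any(len(g) != k for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    n_groups = len(groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-group mean square: spread of classroom means around the grand mean.
    ms_between = k * sum((mean(g) - grand_mean) ** 2 for g in groups) / (n_groups - 1)
    # Within-group mean square: spread of students around their own classroom mean.
    ms_within = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n_groups * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Toy z-scores for three classrooms of four students (not study data).
classrooms = [[-1.0, -0.8, -1.2, -0.6], [0.1, 0.3, -0.1, 0.2], [0.9, 1.1, 0.7, 1.3]]
print(round(icc1_balanced(classrooms), 3))
```

Values close to 1 mean that students within a classroom resemble each other much more than students from different classrooms; the ANOVA estimator can even turn negative when within-classroom variation dominates, which the model-based variance ratio cannot.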
To study the interaction between Time of testing (Pretest vs. Posttest) and Group (Experimental vs. Control), linear mixed-effects models (Bates et al., 2015) were then fitted to the z-scores. The null model (M0) included only participant and classroom as random effects. Starting from the null model, we constructed four new models by successively adding Time of testing (M1), its interaction with Group (M2), Grade (4th vs. 5th) (M3), and SES (Low vs. Middle vs. High) (M4) as fixed effects. The successive models were then compared with an ANOVA. As indicated in Table 3, the Akaike Information Criterion (AIC) decreases from M0 to M4, consistent with an improvement in fit at each step of model construction; M4 was therefore retained. The M4 model revealed a significant interaction between Time of testing and Group (β = -0.66777, t = -10.361, p < .001), and its conditional R²GLMM was .75 (Bartoń, 2020). To better understand the importance of the fixed factors, we also constructed a model of the posttest results alone, with Group, Grade, and SES as fixed factors and classroom as a random factor (Table 4). For each fixed effect, the level whose effect is estimated relative to the reference level of that predictor is indicated in parentheses. The results confirm the highly significant influence of all three factors in the expected directions: at posttest, the Experimental group performed better than the Control group, 5th graders performed better than 4th graders, and students from the Low SES group performed worse than students from both the Middle SES and the High SES groups.
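With treatment (dummy) coding, the Time × Group coefficient is the difference between the two groups' pretest-to-posttest gains. For a saturated 2 × 2 model with no other terms, it equals exactly the difference of differences of the four cell means; in M4, which adds random effects and Grade and SES covariates, the estimate is adjusted and need not equal the raw difference, but its interpretation is the same. The sketch below assumes, hypothetically, that "experimental" and "pre" are the reference levels, a coding under which a negative interaction means the experimental group gained more. The cell means in the usage line are invented for illustration.

```python
def dummy_coded_coefficients(means):
    """Coefficients of a saturated 2 x 2 model under treatment coding.

    means: dict mapping (group, time) to the mean z-score of that cell.
    Assumed reference levels: group "experimental", time "pre".
    """
    e_pre = means[("experimental", "pre")]
    e_post = means[("experimental", "post")]
    c_pre = means[("control", "pre")]
    c_post = means[("control", "post")]
    return {
        "intercept": e_pre,
        "time_post": e_post - e_pre,  # experimental gain
        "group_control": c_pre - e_pre,  # pretest group difference
        # control gain minus experimental gain (difference of differences)
        "time_post:group_control": (c_post - c_pre) - (e_post - e_pre),
    }


# Invented cell means, not the study's data
coefs = dummy_coded_coefficients({
    ("experimental", "pre"): 0.0, ("experimental", "post"): 0.7,
    ("control", "pre"): 0.0, ("control", "post"): 0.1,
})
print(coefs["time_post:group_control"])
```

Summing all four coefficients recovers the control posttest mean, which is a quick check that the coding is consistent.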

Results by Grades
Furthermore, the experimental conditions were distinguished by grade level (Tables 5 and 6 and Figure 4). Pairwise comparisons were conducted on the retained M4 model using the lsmeans function from the lsmeans package in R, with Bonferroni correction of p-values. At pretest, there was no significant difference between the control and experimental groups in 4th grade (β = -0.0365, t = -.324, p > .05) or in 5th grade (β = -.0365, t = -.324, p > .05). This result confirms the second part of Prediction 1.
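Bonferroni correction multiplies each p-value by the number m of comparisons in the family and caps the result at 1, which is equivalent to testing each comparison at α/m. A minimal sketch that mirrors what R's p.adjust(method = "bonferroni") returns; the function name is hypothetical, and the family size lsmeans uses depends on how the contrasts are grouped, which the text does not specify.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: min(1, p * m) for a family of m tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Three comparisons in one family
print(bonferroni([0.01, 0.04, 0.5]))
```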

Boxplots of z-scores at Pre- and Posttest by Experimental Conditions and by Grades
At posttest, each control subgroup obtained a lower z-score than the corresponding experimental subgroup (β = -0.6943, t = -6.168, p < .001 for both 4th and 5th grade classes). This result confirms the second part of Prediction 2. In addition, while the 4th grade control group had a significantly lower z-score than the 5th grade control group, the z-score of the 4th grade experimental group did not differ significantly from that of the 5th grade control group.

Results by SES
Performance across the different SES categories was then compared using pairwise comparisons on the retained M4 model, with Bonferroni correction of p-values (Lenth, 2016). At pretest, for each SES category, the experimental group had a z-score similar to that of the corresponding control group. This result confirms the last part of Prediction 1. At posttest, each experimental group had a significantly higher z-score than its corresponding control group (Tables 7 and 8 and Figure 5). This result confirms the last part of Prediction 2.

Boxplots of z-scores at Pre- and Posttest by Experimental Conditions and SES.
Furthermore, the results revealed that the performance gap between the three SES categories was maintained at posttest within the experimental group. However, the gaps between SES subgroups differed between the control and experimental groups. The middle SES experimental group had a lower z-score than the high SES control group at pretest, but a similar z-score at posttest. In contrast, the middle SES control group maintained lower z-scores than the high SES control group. The same trend was significant when comparing the low SES control group with the middle SES experimental group. Interestingly, even though the high SES control group outperformed the low SES experimental group at pretest, the difference was no longer significant at posttest.

Results by Sub-Scores
Then, each proportional reasoning sub-score was analyzed at pretest and posttest (Table 9). At pretest, Mann-Whitney-Wilcoxon tests revealed no significant differences between the two groups on four sub-scores. For the sub-score "Decomposing and comparing fractions", the control group performed significantly better than the experimental group at pretest. For the sub-score "Solving proportion problems", we distinguished the score on the missing value proportional problems, taken only by the fifth graders, from the score on the proportional graphic situation, taken by all students. While there was no difference between the two groups on the missing value proportional problems, the experimental group did better on the proportional graphic situation. However, given the high scores already reached on this item at pretest (0.77 for the control group and 0.81 for the experimental group), it was not kept at posttest. In sum, at pretest there was no difference on four sub-scores of the studied notions, the control group was superior on the skill "Decomposing and comparing fractions", and the experimental group was superior on only one item, "solve a graphical situation of proportionality". These results partially confirm the first part of Prediction 3.
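The Mann-Whitney-Wilcoxon test pools the scores of both groups, ranks them, and asks whether one group's ranks are systematically higher. A minimal stdlib sketch that computes U for the first sample, using midranks for ties (sub-scores on short tests are heavily tied), and a two-sided p-value from the tie-corrected normal approximation. Unlike R's wilcox.test, it omits the continuity correction applied by default, so p-values may differ slightly; the function name and inputs are illustrative.

```python
import math


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p) via midranks and a normal approximation."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0  # sum of (t^3 - t) over tie groups, for the variance correction
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    n1, n2 = len(x), len(y)
    n = n1 + n2
    r1 = sum(r for r, (_, src) in zip(ranks, pooled) if src == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value tied: no evidence of a difference
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))


print(mann_whitney_u([1, 2, 2, 3], [2, 3, 4, 4]))
```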
At posttest, the experimental group had a significantly higher mean than the control group on 5 of the 6 sub-scores for the studied notions, and a trend toward significance (p = .053) for the skill "Decomposing and comparing fractions" (Table 9 and Figure 6). Relative to the pretest, the experimental group thus caught up with and overtook the control group. These results confirm the first part of Prediction 4.
Each sub-score was also analyzed by grade and SES with Mann-Whitney-Wilcoxon tests. At pretest, results were similar across subgroups (by grade or school type) (1242 < U < 12398, .07 < p < .96), except for the skill "Decomposing and comparing fractions". On this task, the high SES control group outperformed the high SES experimental group (U = 7034, p < .01), and the low SES control group outperformed the low SES experimental group (U = 7590, p < .001). Unlike in the analysis by experimental condition, there was no difference between subgroups on the item "solve a graphical situation of proportionality" (3521.5 < U < 11802, .07 < p < .86). These results confirm the last part of Prediction 3, even though a slight advantage for some control subgroups can be noted.

Means of the Sub-scores for the Studied Notions at Posttest by Experimental Condition
At posttest, on every sub-score, each experimental subgroup obtained better scores than the corresponding control subgroup, except for the comparison of the middle SES groups on the sub-score "Decomposing and comparing fractions". Of the 30 comparisons, 23 were significant (1565.5 < U < 10590, 1.69E-12 < p < .02), 3 were at the significance threshold (4549 < U < 10828, p = .05), 3 were non-significant (3140 < U < 5212.5, p > .05), and 1 favored the middle SES control group, although not significantly ("Decomposing and comparing fractions", U = 4015, p > .05). These results are in line with the last part of Prediction 4.

Discussion
The present study investigated to what extent a pedagogical intervention based on multiple categorization might improve students' mathematical flexibility. This intervention focused on proportional reasoning, a domain in which a wide set of preconceptions might prevent students from using an appropriate strategy to find the solution. In fact, preconceptions often lead students to categorize problems on the basis of superficial features and preclude the possibility of considering an alternative, more adequate solving strategy consistent with the expert point of view. Therefore, teaching students to analyze the notions related to proportional reasoning from different points of view, each point of view being the hallmark of a different way of categorizing the problem, was expected to enable students to flexibly adopt relevant strategies. Namely, the intervention was expected to allow students to adopt strategies that lie outside the scope of the intuitive conception but are consistent with an alternative categorization in line with an expert point of view. Fourth and fifth graders from three different social backgrounds took part in the study. The experimental classes received 12 lessons based on multiple categorization, designed to guide them in overcoming their initial point of view and building an alternative one that they could adaptively draw on when the initial one proved inadequate for finding the solution. The performance of the experimental and active control groups was compared before and after the intervention. The results revealed that the control and experimental groups had homogeneous performance at pretest. At posttest, the experimental group outperformed the control group, and this was consistent across grades and across the SES of the schools.
This suggests that the pedagogical intervention based on multiple categorization helped students in the experimental group build a better understanding of proportionality. In the current study, these observations were made using written word problems. Yet multiplicative thinking and proportional reasoning are crucial in real-world situations, such as financial contexts or risk assessment (Casscells et al., 1978; Sawatzki et al., 2019). For example, when students use additive strategies in proportional situations that require comparisons, their ability to make informed financial decisions seems to be limited (Hilton et al., 2012; Sawatzki et al., 2019). Further studies could therefore directly include real-life situations and a wider range of tasks (such as students baking from a recipe and adjusting the quantities of ingredients to a different proportion) in order to measure students' ability to transfer this kind of mathematical knowledge from school to real-life contexts.
Additionally, this research supports the idea that, to develop flexibility in problem solving in school contexts, it helps to have several solving strategies at one's disposal. Indeed, students were encouraged to use as many strategies as possible by adopting different points of view. The wide variety of possible solutions was not simply the result of pooling the strategies proposed by different students; each student had to propose several strategies. As a result, at posttest, more than one third of the experimental students proposed two strategies to solve distributivity problems, three times the proportion observed in the control group. Additionally, for missing value proportional problems, no student from the control group managed to propose two strategies, whereas one seventh of the students from the experimental group did. Thus, it seems that experimental students were not restricted to the first point of view induced by the problem and developed more flexible strategies. In addition, at posttest the 4th grade experimental group reached a level similar to that of the 5th grade control group. Finally, although the gap between SES subgroups remained significant within the experimental group, the performance gaps between the lower-SES experimental subgroups and the high SES control subgroup narrowed.

The process of selecting among several strategies has also received much attention in work on conceptual and procedural knowledge in mathematics and their relations. In this literature, flexibility has been underlined as a crucial point in Star's (2005) reconsideration of procedural knowledge: deep procedural knowledge is introduced and defined as "knowledge of procedures that is associated with comprehension, flexibility, and critical judgment and that is distinct from (but possibly related to) knowledge of concepts" (p. 408).
This combination of knowing multiple procedures and choosing the most appropriate one given a problem's features has also been termed procedural flexibility by other scholars (Kilpatrick et al., 2001). Yet, as mentioned in the introduction, the flexible choice between different strategies must also be considered in connection with conceptual knowledge. The latter "involves connecting concepts to specific procedures -for example, knowing why certain procedures work for certain problems or knowing the purpose of each step in a procedure" (Crooks & Alibali, 2014, p. 371). The relations between conceptual and procedural knowledge are bidirectional (Rittle-Johnson et al., 2001), and developing either one can contribute to fostering flexibility. Some studies of instructional interventions have even found that conceptual instruction leads to greater gains in procedural knowledge than instruction focused solely on procedures (Rittle-Johnson et al., 2016). The results of our study also highlight the importance of conceptual knowledge for fostering procedural knowledge.
Indeed, when a problem can be solved with several strategies, it can be particularly beneficial to work on the conceptual knowledge to which each strategy is attached. This is precisely what was done in the current study, in which students were introduced to points of view reflecting the different conceptions. Only after these points of view had been identified were mathematical strategies associated with each of them. This approach is in line with other findings stressing that flexibility cannot simply be reduced to smooth transitions between several strategies, and that achieving flexibility mobilizes the complex relations between conceptual and procedural knowledge (Baroody, 2003; Prather & Alibali, 2009; Verschaffel et al., 2009). In our view, relying on multiple categorization might contribute to developing conceptual knowledge, since it allows one to perceive the common conceptual structure of problems whose superficial features differ. The intervention and its assessment not only highlight the usefulness of overcoming some of the limitations of an initial representation, but also provide insight into the benefits of this approach for fostering students' flexibility in strategy use.
Funding: The work carried out by the first author was funded by a doctoral contract from the Université Paris Lumières.

Fraction addition problem: Tom ate 1/2 of the cake and Jane ate 1/4 of the cake. How much of the cake did they eat altogether?
A unit can be written in the form of a fraction: 1 h = 4/4; 4/4 + 1/4 = 5/4. There are 5 quarters.
Fraction decomposition problem: Tom ate 1/2 of the cake and Jane ate 1/4 of the cake. Between them, what fraction of the cake did they eat?
A fraction can be written as a number multiplied by a fraction (a × 1/b): 3 × 1/4 = 3/4 or 1/4 + 1/4 + 1/4 = 3/4. There are 3 quarters.
Proportion problem: 90 students must be transported in 40-seat buses. If the first buses are all full, what proportion of the last bus will be full?
Proportion is a ratio: 10/40 = 1/4 of the bus will be filled.
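The arithmetic in the fraction decomposition and proportion items can be checked with exact rational arithmetic rather than floating point. A minimal sketch with Python's standard fractions module; the variable names are illustrative. For the bus item, divmod gives the number of full buses and the students left over for the last one, whose share of the seats is the requested proportion.

```python
from fractions import Fraction

# Fraction decomposition item: 1/2 + 1/4 expressed as a number of quarters
half, quarter = Fraction(1, 2), Fraction(1, 4)
eaten = half + quarter  # 3/4 of the cake
n_quarters = eaten / quarter  # 3, i.e. 3 x 1/4

# Proportion item: 90 students, 40-seat buses, all but the last bus full
students, seats = 90, 40
full_buses, remaining = divmod(students, seats)  # 2 full buses, 10 students left
last_bus_share = Fraction(remaining, seats)  # 10/40 reduces to 1/4

print(eaten, n_quarters, full_buses, last_bus_share)
```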