Comparing Fraction Magnitudes: Adults’ Verbal Reports Reveal Strategy Flexibility and Adaptivity, but Also Bias

Many studies have used fraction magnitude comparison tasks to assess people’s abilities to quickly assess fraction magnitudes. However, since there are multiple ways to compare fractions, it is not clear whether people actually reason about the holistic magnitudes of the fractions in this task and whether they use multiple strategies in a flexible and adaptive way. We asked 72 adults to solve challenging fraction comparisons (e

with unfamiliar fractions is scarce. The aim of this study was, therefore, to assess individual strategy use on challenging fraction comparisons using a trial-by-trial approach.
Understanding strategy use in fraction comparison is important for several reasons. First, flexible and adaptive strategy use is an important facet of mathematical competency (Heinze et al., 2009; Siegler & Lemaire, 1997), and competence with fractions is associated with later mathematical achievement (Siegler et al., 2012). Thus, studying strategy use in fraction comparisons can contribute to better understanding of an important mathematical domain. Second, although there are several mathematically valid strategies for comparing fractions (i.e., clearly defined algorithms that yield a correct answer for any comparison), these strategies are often cognitively demanding, for example, when they involve two-digit multiplication. Consequently, people sometimes rely on heuristics that are not always valid or on estimations that are less precise. Heuristics can be correct and efficient in some cases, but they lead to incorrect responses in other cases, resulting in biased reasoning patterns (e.g., "natural number bias"; Ni & Zhou, 2005). Studying fraction comparison strategies with challenging fraction pairs therefore contributes to understanding people's (imperfect) decision-making processes in mathematical problem situations (Fischbein, 1987; Gillard et al., 2009). Third, although there is evidence that people can construct fraction magnitude representations quickly, particularly for familiar fractions (Liu, 2018; Schneider & Siegler, 2010), little is known about how people do so for unfamiliar fractions. Finally, fraction comparison tasks are often used to assess fraction magnitude representations (e.g., Schneider & Siegler, 2010). However, the validity of this measure hinges on the assumption that people actually engage in magnitude processing when comparing fractions.
This assumption has been found to be untenable for simple fraction comparisons (Obersteiner et al., 2013;Reinhold et al., 2020), but it is still under debate regarding challenging fraction comparisons (e.g., Obersteiner et al., 2020;Schneider & Siegler, 2010).
Studies of educated adults also often report pronounced "distance effects" in fraction comparisons, that is, strong associations between the numerical distance between the two fractions and participants' average accuracy and/or response times. Accuracy tends to be higher, and response times lower, with increasing numerical distance. Such distance effects suggest the processing of holistic (overall) magnitude representations, similar to those recruited for whole number magnitude comparisons (Moyer & Landauer, 1967). Obersteiner et al. (2013) found that numerical distance accounted for a substantial amount of the variance in response times for a set of challenging fraction comparisons (up to 71%, depending on the specific item subset), suggesting that people activate holistic magnitudes when comparing fractions. Notably, the distance effect appears to be more pronounced for challenging fraction comparisons than for simple ones, for which alternative short-cut strategies (e.g., comparing numerators) are applicable (Bonato et al., 2007; Ganor-Stern et al., 2011; Obersteiner et al., 2013; Reinhold et al., 2020).
One limitation of previous research is that almost all studies aggregated data across participants (e.g., Meert et al., 2010;Obersteiner et al., 2013;Van Hoof et al., 2013). Such analyses do not capture the potentially large variability in individual patterns of accuracy and response times. Substantial variability has frequently been documented in children, whose responses are often incorrect (Reinhold et al., 2020;Rinne et al., 2017). Although adults generally display less variability in aggregated performance than children, they may still vary in the specific strategies they use to solve challenging fraction comparisons. For familiar fractions, such as 1/2, most adults may readily activate holistic magnitude representations (Liu, 2018); however, it is not well understood how people evaluate the magnitudes of less familiar fractions.

Strategy Use in Fraction Comparison
Various facets of strategy use have been identified and characterized in previous work (Heinze et al., 2009;Lemaire & Siegler, 1995). Strategy repertoire refers to the set of different strategies that individuals know and can apply. In the case of fraction comparison, individuals may have several generally valid strategies for comparing fractions in their repertoires. For example, one can convert both fractions to the same denominator and then compare numerators (2/5 = 16/40; 5/8 = 25/40, hence 5/8 > 2/5); one can convert both fractions into decimals and compare them (2/5 = 0.4; 5/8 = 0.625, and 0.625 > 0.4, so 5/8 > 2/5); or one can use 1/2 as a benchmark (i.e., 2/5 is less than half, 5/8 is more than half, hence 5/8 > 2/5). An individual may know and use all three of these strategies or only a subset of them.
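The three generally valid strategies named above can be sketched in code. This is only an illustration under stated assumptions: the function names are hypothetical, and the benchmark function uses exact arithmetic where a person would estimate. It requires Python 3.9 or later for `math.lcm`.

```python
from fractions import Fraction
from math import lcm  # available from Python 3.9


def compare_common_denominator(a: Fraction, b: Fraction) -> Fraction:
    """Rewrite both fractions over a shared denominator, then compare numerators."""
    d = lcm(a.denominator, b.denominator)
    num_a = a.numerator * (d // a.denominator)
    num_b = b.numerator * (d // b.denominator)
    return a if num_a > num_b else b


def compare_decimal(a: Fraction, b: Fraction) -> Fraction:
    """Convert both fractions to decimals and compare the decimals."""
    return a if float(a) > float(b) else b


def compare_benchmark(a: Fraction, b: Fraction, benchmark=Fraction(1, 2)):
    """Return the larger fraction if the benchmark lies strictly between the two.

    Returns None when both fractions fall on the same side of the benchmark,
    because the strategy is then inconclusive.
    """
    if a > benchmark > b:
        return a
    if b > benchmark > a:
        return b
    return None


two_fifths, five_eighths = Fraction(2, 5), Fraction(5, 8)
print(compare_common_denominator(two_fifths, five_eighths))  # 16/40 vs. 25/40
print(compare_decimal(two_fifths, five_eighths))             # 0.4 vs. 0.625
print(compare_benchmark(two_fifths, five_eighths))           # 1/2 separates them
```

All three calls select 5/8 for the example pair. The benchmark function differs from the other two in that it can fail to decide, which is one reason a person may need more than one strategy in their repertoire.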
A second facet of strategy use is strategy frequency, which refers to how often people use particular strategies. Strategy frequency may depend on characteristics of the context or task. For example, people may use certain strategies more frequently when comparing fractions mentally than when using paper and pencil. Contextual factors (e.g., time pressure) may also encourage people to use strategies or heuristics that are not generally valid but that lead to correct responses on some comparisons. For example, the gap comparison strategy consists of finding the numerical difference (gap) between the numerator and denominator of each fraction and choosing the fraction with the smaller gap as the larger fraction. In some cases, this strategy yields the correct response (e.g., 31/71 vs. 13/23, where the larger fraction has a smaller gap [40 vs. 10]), but in other cases, it does not (e.g., 31/65 vs. 13/31, where the larger fraction has a larger gap [34 vs. 18]). Moreover, instructing or encouraging people to use specific strategies may also affect the frequency with which they use particular strategies (Fazio et al., 2016).
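The gap comparison heuristic described above can be written down in a few lines, which makes it easy to see where it succeeds and where it fails. The sketch below is illustrative only; the function names are hypothetical, and fractions are passed as (numerator, denominator) pairs so that the written components are used as given.

```python
from fractions import Fraction


def gap(frac):
    """Difference between denominator and numerator of a (numerator, denominator) pair."""
    num, den = frac
    return den - num


def gap_comparison(a, b):
    """Heuristic choice: the fraction with the smaller gap is judged larger (None on a tie)."""
    if gap(a) == gap(b):
        return None
    return a if gap(a) < gap(b) else b


def truly_larger(a, b):
    """Exact comparison, used here only to check the heuristic."""
    return a if Fraction(*a) > Fraction(*b) else b


for pair in [((31, 71), (13, 23)), ((31, 65), (13, 31))]:
    guess, truth = gap_comparison(*pair), truly_larger(*pair)
    print(pair, "heuristic:", guess, "correct:", guess == truth)
```

For the two pairs from the text, the heuristic is correct on 31/71 vs. 13/23 (gaps of 40 vs. 10) and incorrect on 31/65 vs. 13/31 (gaps of 34 vs. 18).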
A third facet of strategy use is strategy efficiency, which refers to the speed and accuracy with which strategies are applied. For example, in challenging fraction comparisons, it may be more efficient to use benchmarks (i.e., reference numbers such as 1/2) than to convert fractions into decimals, because benchmarking relies on estimation instead of exact mental computation. In some situations, people may rely on generally invalid heuristics that are efficient because they are easy to apply (e.g., gap comparison or simple comparison of numerators), even though such heuristics carry the risk of yielding incorrect answers (see above).
Two additional, related facets of strategy use are strategy flexibility and adaptivity. Flexibility refers to the ability to shift between different strategies, and adaptivity refers to how frequently people choose strategies that are best suited for a given problem. Note that identifying which strategy is "best suited" is difficult because it requires both a normative perspective (i.e., which strategies are valid and applicable on a given problem) and an individual perspective (i.e., which strategies a given individual can apply quickly and accurately).
Assessing all facets of strategy use simultaneously is a methodological challenge. For example, assessing strategy efficiency requires recording accuracy and response times, but these measures alone do not allow for straightforward conclusions about specific strategies. Accordingly, previous studies have investigated fraction comparison strategies using a variety of methods, including identifying error patterns across sets of items (González-Forte et al., 2020; Stafylidou & Vosniadou, 2004), measuring reaction times (Gómez & Dartnell, 2019; Morales et al., 2020), and studying eye movement patterns (Obersteiner & Tumpek, 2016). In all of these studies, data were aggregated over many participants and items, and the focus of the analyses was typically on the broad distinction between holistic and componential strategies. Holistic strategies are strategies that rely on reasoning about the magnitudes of fractions as wholes, whereas componential strategies rely on reasoning about fraction components (i.e., numerators and denominators), without considering holistic magnitudes. These past studies have shown that people are more likely to use componential strategies in simple comparisons, in which fractions have the same numerators or denominators (e.g., 13/25 vs. 21/25), and use holistic strategies when this is not the case. However, studies using these methods did not characterize individual strategy use in detail.
One approach that allows for individual-level analyses of strategy use is collecting verbal reports on a trial-by-trial basis. However, previous studies using verbal reports have not considered challenging fraction comparisons. For example, Faulkenberry and Pierce (2011) collected trial-by-trial strategy reports from adults; however, the items included common fractions such as 1/2 and 1/3, potentially affecting the range of strategies used. Clarke and Roche (2009) and González-Forte and colleagues (2019) collected trial-by-trial strategy reports from children in grades 6 and 7, respectively; however, both studies used item sets with very simple fraction comparisons. These studies provided some empirical evidence that people use a range of strategies to make simple fraction comparisons, but they do not provide information about how people make difficult comparisons quickly.
One study that used a trial-by-trial analysis did include a few difficult comparisons. Fazio, DeWolf, and Siegler (2016) studied fraction comparison with verbal reports in university students who were presented with eight different types of fraction pairs. One type of comparison that they included was very challenging: pairs with two-digit, non-equal components and small numerical distances. On these difficult comparisons, participants often used benchmark strategies (referred to as "general magnitude reference"), gap comparisons, or strategies that were coded as "other" (infrequent or uncodable strategies). However, the experiment included only three items of this type. Thus, a larger set of items and a more fine-grained analysis of strategy use are needed to provide a more comprehensive picture of strategy use on challenging fraction comparisons.

Strategy Use and the Natural Number Bias
Analyzing strategy use in fraction comparison may contribute to better understanding of the natural number bias that has been well documented in the literature (e.g., Alibali & Sidney, 2015; Ni & Zhou, 2005). One manifestation of the natural number bias is a systematic difference in performance (in accuracy and/or response times) between fraction comparisons that are congruent with naïve natural number reasoning (i.e., the larger fraction has the larger whole number components, e.g., 1/4 vs. 3/5) and those that are incongruent with naïve natural number reasoning (i.e., the larger fraction has the smaller whole number components, e.g., 1/4 vs. 2/9). Research has documented pronounced individual differences in bias patterns (Gómez & Dartnell, 2019; González-Forte et al., 2020; Rinne et al., 2017). Although studies using simple fraction comparisons have often reported better performance on congruent than incongruent items, studies that addressed individual differences or that included more challenging fraction pairs have frequently documented the reverse pattern (i.e., better performance on incongruent than congruent comparisons) (Barraza et al., 2017; DeWolf & Vosniadou, 2015; Obersteiner et al., 2020).
The performance difference between congruent and incongruent items has been attributed to people's use of simple heuristics (e.g., "larger numerator/denominator makes larger fraction" or "smaller numerator/denominator makes larger fraction" heuristics; González-Forte et al., 2020;Reinhold et al., 2020). However, the sources of the reverse bias pattern that is sometimes observed in educated adults are less well understood. It seems unlikely that educated adults use a "smaller numerator/denominator makes larger fraction" heuristic, and some authors have suggested that other heuristics, especially gap comparison, could explain the reverse bias pattern (e.g., Gómez & Dartnell, 2019;Obersteiner et al., 2020). The reason is that for comparisons of fractions smaller than 1 (the item type used in most previous studies), gap comparison yields the correct response for all incongruent items, but it may or may not yield the correct response for congruent items. For example, 31/71 vs. 13/23 is an incongruent comparison, and applying gap comparison (40 vs. 10, hence 13/23 must be larger) yields the correct response. However, 39/51 vs. 18/31 and 35/51 vs. 18/31 are both congruent comparisons, and gap comparison leads to the correct response in the former case (gaps of 12 vs. 13) but not the latter (16 vs. 13). If people rely strongly on gap comparison and the item set includes both congruent and incongruent items, people would be more efficient (faster and more accurate) on incongruent than on congruent items, which might partly explain the reverse bias pattern.
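The relation between congruency and gap comparison can be checked with a short sketch. It classifies a pair as congruent or incongruent, tests whether gap comparison would pick the larger fraction, and runs a brute-force check of the claim that gap comparison is always correct on incongruent pairs of fractions smaller than 1. The function names and the search bound `max_den` are assumptions made for illustration. Fractions are passed as (numerator, denominator) pairs because 39/51 is not in lowest terms and reducing it would change its gap.

```python
from fractions import Fraction


def order(a, b):
    """Return (larger, smaller) by exact value for two (numerator, denominator) pairs."""
    return (a, b) if Fraction(*a) > Fraction(*b) else (b, a)


def congruency(a, b):
    """Classify a pair by whether the larger fraction has larger or smaller components."""
    larger, smaller = order(a, b)
    if larger[0] > smaller[0] and larger[1] > smaller[1]:
        return "congruent"
    if larger[0] < smaller[0] and larger[1] < smaller[1]:
        return "incongruent"
    return "mixed"


def gap_comparison_correct(a, b):
    """True if the larger fraction has the strictly smaller denominator-numerator gap."""
    larger, smaller = order(a, b)
    return (larger[1] - larger[0]) < (smaller[1] - smaller[0])


def incongruent_pairs_all_correct(max_den=30):
    """Brute-force check over incongruent pairs a/b > c/d with a < c, b < d, both below 1."""
    for b in range(2, max_den + 1):
        for a in range(1, b):
            for d in range(b + 1, max_den + 1):
                for c in range(a + 1, d):
                    # a/b > c/d is equivalent to a*d > b*c for positive denominators
                    if a * d > b * c and (b - a) >= (d - c):
                        return False
    return True


for pair in [((31, 71), (13, 23)), ((39, 51), (18, 31)), ((35, 51), (18, 31))]:
    print(pair, congruency(*pair), gap_comparison_correct(*pair))
print(incongruent_pairs_all_correct())
```

The enumeration agrees with a one-line argument: if a/b > c/d and b < d, then (b - a)/b < (d - c)/d, so b - a < (d - c) · b/d < d - c.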

The Present Study
The aim of the present study was to assess adults' strategy use on challenging fraction comparisons. We considered the facets of strategy use described earlier. Considering participants' strategy repertoires and frequencies, we focus in particular on the use of holistic strategies (especially the use of benchmarks) and componential strategies (especially the use of gap comparison). We expected that participants would use holistic strategies to a large extent, because previous studies have documented adults' strong performance and distance effects in fraction comparison (see section entitled Strategy Use in Fraction Comparison), both of which are in line with holistic strategy use. We were less certain about whether adults would use componential strategies, especially gap comparison.
Regarding strategy efficiency, we did not make specific predictions. Although holistic strategies may be faster to apply than some componential strategies, and they may consistently yield accurate answers, they also include approximations that may be imprecise in challenging fraction comparisons.
Regarding strategy flexibility and adaptivity, we expected that individual participants would use many different strategies, as in previous research with simple comparisons (e.g., Fazio et al., 2016). We also expected that strategy use would depend on features of the items. Specifically, we expected that participants would use benchmark strategies more frequently when a benchmark was in between the given fractions (e.g., when one fraction was larger and the other smaller than 1/2), or when one fraction was particularly close to a benchmark.
Finally, we were interested in whether we could evoke shifts in participants' strategy use by highlighting the usefulness of a holistic strategy. To this end, we encouraged some participants to use benchmark strategies. If people can easily shift towards more frequent use of holistic strategies, it would suggest that strategy use is malleable, which would have implications for educational practice.

Stimuli
We used the same set of 56 fraction comparison items as in a previous study that did not involve strategy reports. All fractions were smaller than 1 and in simplest form. All denominators and most numerators (86%) were two-digit numbers. The two numerators or two denominators within a fraction pair were never equal, and they were never integer multiples of one another. Half of the items were congruent (i.e., the larger fraction had a larger numerator and larger denominator than the smaller fraction) and half were incongruent (i.e., the larger fraction had a smaller numerator and smaller denominator than the smaller fraction). Within the congruent and the incongruent subsets, there were three types of items with respect to the benchmarks 0, 1/4, 1/2, 3/4, and 1. The first type was straddle items, in which one fraction was smaller and the other larger than one of the benchmarks, so that fraction pairs of this type "straddled" either 1/4, 1/2, or 3/4. The second type was in-between items, in which both fractions' magnitudes were between two adjacent benchmarks but did not straddle a benchmark. Thus, the two fractions were either both larger than 1/4 and smaller than 1/2, or both larger than 1/2 and smaller than 3/4. The third type was close-to-0-or-1 items, in which both fractions were either smaller than 1/4 (and larger than 0), or larger than 3/4 (and smaller than 1). Even though close-to-0-or-1 items were also between benchmarks, we considered them a separate category because previous research suggested that fractions' proximity to 0 or 1 made these items easier to solve. All items had small numerical distances (M = 0.14; range 0.09-0.16), and all fractions had similar distances to the closest benchmark (0, 1/4, 1/2, 3/4, or 1; M = 0.06; range 0.03-0.11).
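The three item types can be expressed as a small classification rule over the quarter intervals between adjacent benchmarks. The sketch below is one possible reading of the definitions above, not the procedure used to build the item set; the function names are hypothetical, and the example pairs are illustrative and not necessarily items from the study.

```python
from fractions import Fraction

# Benchmarks named in the text: 0, 1/4, 1/2, 3/4, 1
BENCHMARKS = [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)]


def interval_index(x: Fraction) -> int:
    """Index of the open quarter interval containing x: 0 for (0, 1/4) up to 3 for (3/4, 1)."""
    for i in range(4):
        if BENCHMARKS[i] < x < BENCHMARKS[i + 1]:
            return i
    raise ValueError("fraction must lie strictly between two adjacent benchmarks")


def item_type(a: Fraction, b: Fraction) -> str:
    """Classify a fraction pair as straddle, in-between, or close-to-0-or-1."""
    ia, ib = interval_index(a), interval_index(b)
    if ia != ib:
        return "straddle"          # at least one benchmark lies between the fractions
    if ia in (0, 3):
        return "close-to-0-or-1"   # both below 1/4 or both above 3/4
    return "in-between"            # both in (1/4, 1/2) or both in (1/2, 3/4)


print(item_type(Fraction(31, 71), Fraction(13, 23)))  # about 0.44 vs. 0.57
print(item_type(Fraction(11, 17), Fraction(13, 23)))  # about 0.65 vs. 0.57
print(item_type(Fraction(17, 19), Fraction(23, 29)))  # about 0.89 vs. 0.79
```

A fraction that lies exactly on a benchmark raises an error in this sketch, since the definitions above do not cover that case.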
The item set was split into two subsets of 28 items each, such that each subset contained equally many items of each type, as described above. We added three filler items to each subset, and these fillers were not included in the analysis. Filler items were easy fraction pairs (with same numerators or with very large differences between fractions) that did not fall into any of the categories described above. These items were added to maintain participants' motivation and attention. Thus, each participant was presented with a total of 31 items, consisting of one subset of 28 items and three filler items. Preliminary analyses revealed no differences in performance across the two subsets, so they were collapsed for analysis.

Procedure
Data were collected in small group sessions, with each participant working individually at a computer. Items were presented using E-Prime software. Participants wore headphones with microphones. They were presented with two fractions at a time and asked to choose the greater fraction as quickly and accurately as possible by pressing the corresponding left ("f") or right ("j") key within 15 seconds. After participants pressed the key, the fraction pair remained visible on the screen, and the question "How did you solve this problem?" appeared above the fraction pair. Participants then had another 20 seconds to respond by speaking into their microphones. After the given time limit, the next item appeared automatically.
There were two practice items before the experiment started. Accuracy feedback was provided for practice items but not for test items. Test items were presented in random order, and the position of the larger fraction (left or right) was counterbalanced across trials.
Participants in the tip condition saw an additional screen after the general instructions. The text on the screen suggested that it could be helpful to think of numbers such as 1/2, 1/4, or 3/4 that could be used as benchmarks to compare fractions. The example "5/8 vs. 3/7" was provided to illustrate the benchmark strategy. The text said that one could think that 5/8 is larger than 1/2, and that 3/7 is smaller than 1/2, so that 5/8 must be the larger fraction. No further explanation was provided.

Data Analysis

Accuracy and Response Time Data
We first provide an overview of accuracy and response time data (previously reported in Obersteiner et al., 2019). The data were analyzed in SPSS 23 using a Generalized Estimating Equations (GEE) model that accounts for repeated measures within subjects. In these analyses, the within-subject factors were congruency (congruent/incongruent) and item type (close-to-0-or-1/straddle/in-between); the between-subject factor was tip (yes/no). For the dichotomous accuracy data, we used a binary logistic regression model. For response time data, we used a linear regression model with a logarithmic link function.

Verbal Responses
Participants' verbal responses (N = 2016) were transcribed and then coded by two coders (the first and third authors of this paper). The coding scheme was developed following a combined deductive and inductive approach: We first defined the major categories (holistic and componential) and then identified specific strategies that fell into these categories. Specific strategies (e.g., benchmark, gap comparison) were identified based on the literature (e.g., Clarke & Roche, 2009;Fazio et al., 2016). We modified and extended the coding scheme as necessary until no new strategies were observed in the data. Table 1 provides an overview of strategies that were used on at least 1% of items.
Verbal recordings from 18 randomly-selected participants (n = 504 responses, 25% of the data) were double coded, and interrater agreement was evaluated for this subset. Agreement was high, both for coding strategies into broad categories (holistic vs. componential), Cohen's kappa = 0.84, and for coding strategies into subcategories, kappa = 0.77. Disagreements were discussed until agreement was reached, and the agreed-upon codes were used in the analysis.
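For readers unfamiliar with the agreement statistic, Cohen's kappa corrects the observed proportion of agreement for the agreement expected by chance from each coder's category frequencies. The sketch below is a minimal illustration; the function name is hypothetical and the ten codes are invented toy labels ("H" for holistic, "C" for componential), not data from this study.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who each assigned one category per response."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must code the same non-empty set of responses")
    n = len(codes_a)
    # Observed agreement: share of responses given the same code by both coders
    observed = sum(x == y for x, y in zip(codes_a, codes_b)) / n
    # Chance agreement: product of the two coders' marginal proportions, summed over categories
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    # Undefined (division by zero) if both coders used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Invented toy codes for illustration only
coder_1 = ["H", "H", "C", "C", "H", "C", "H", "H", "C", "C"]
coder_2 = ["H", "H", "C", "C", "H", "C", "H", "C", "C", "C"]
print(round(cohens_kappa(coder_1, coder_2), 2))
```

In this toy case the coders agree on 9 of 10 responses and chance agreement is 0.5, so kappa is 0.8.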
Participants sometimes reported two strategies on the same trial. In most cases, one strategy was clearly the primary strategy, and another was mentioned only briefly (e.g., as a possible approach). In these cases, we coded only the primary strategy. Otherwise (in 17% of all trials), we coded both strategies.² Because our primary interest was the distinction between holistic and componential strategies, we considered as holistic all strategies in which people relied on overall fraction magnitudes. Importantly, categorizing a strategy as holistic did not imply that people needed to directly activate a holistic magnitude representation; they could also construct a holistic magnitude representation.
2) Note that for these reasons the total number of strategies in the following analyses is larger than the number of verbal responses coded.
Note. Heuristics that are not generally valid are marked with an asterisk (*).

Accuracy and Response Times
Average accuracy (Acc; M = 86%, SD = 11%) and response times (RT; M = 4813 ms, SD = 1432) were comparable with previous studies, suggesting that participants were well able to solve these challenging fraction comparisons.

Strategy Use
One participant was excluded from the strategy analyses due to technical problems with the recording. As reported above, the effect of the tip about benchmark strategies on accuracy and response times was generally small, and the main effect of tip was not significant for either accuracy or response times. This was also the case for strategy use. Participants who received the tip used holistic strategies only slightly more often (52%) than participants who did not receive the tip (49%), and this difference was not significant, χ²(1) = 2.46, p = .117. Therefore, for the remainder of the analyses, we collapsed the tip groups.³

Strategy Repertoire and Frequency
Participants used many different strategies (see Table 1). As seen in Table 3, overall, participants used holistic strategies more frequently (51% of responses) than componential strategies (38%). The large majority of holistic strategies (82%) involved benchmarks. Benchmarks were either used as numbers straddled by the given fractions ("one fraction larger, the other smaller"; 19%) or, more often, as reference numbers to which one or both fractions were compared ("fraction close to benchmark"; 81%). The most prominent benchmarks were 1/2 (used in 42% of benchmark strategies), 1 (29%), 1/3 (10%) and 1/4 (4%). Several other numbers (including 0, 3/4 and 2/3) were infrequently used as benchmarks, as well. Holistic strategies that did not include benchmarks involved either multiplicative reasoning about the ratio between numerator and denominator or performing a division, often to convert fractions to an exact decimal or percentage. These strategies were relatively infrequent.
Among componential strategies, the most frequent was gap comparison, which was the second-most-frequent strategy overall. Other componential strategies were multiplication strategies, in which fraction components were multiplied to get common (or similar) numerators or denominators. It is notable that component comparison (i.e., simple comparison of numerators or denominators) was used infrequently (5% overall).
3) We ran all analyses reported in the following separately for participants in the tip and no-tip group. All differences between the two groups were small and did not change the overall conclusions. For separate analyses, see Supplementary Materials.

Strategy Efficiency
Although mean accuracy was high overall, holistic strategies led to correct responses slightly more often (M = 90%, SE = 30) than componential strategies (M = 86%, SE = 35), Wald χ²(1) = 4.47, p = .035, OR = 1.04, 95% CI [1.01, 1.08]. There was no significant difference in response times between holistic and componential strategies, χ²(1) = 1.92, p = .166. Participants who used holistic strategies more frequently tended to be more accurate overall; percentage use of holistic strategies was correlated with accuracy, r(69) = 0.29, p = .014, but not with average response times, r(69) = -0.01, p = .976. To evaluate how efficiently (i.e., quickly and accurately) different strategies were used, we calculated the Inverse Efficiency Score (IES), which combines accuracy and response times into one measure (Bruyer & Brysbaert, 2011). The IES is calculated by dividing average response times by accuracy, and it can be interpreted as the average response time needed to provide a correct response (i.e., smaller values indicate higher efficiency). Table 4 displays accuracy, response times, and IES for each strategy. Among holistic strategies, benchmarking was the most efficient. Among componential strategies, component approximation was most efficient, followed by gap comparison. Comparing benchmark and gap comparison (the two most frequent strategies), gap comparison was more efficient.
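The IES calculation described above amounts to a single division. The sketch below assumes response times in milliseconds and accuracy as a proportion between 0 and 1; the units and scaling used in Table 4 may differ, and the function name is hypothetical. The example inputs are the overall means reported in the Accuracy and Response Times section, used only to show the arithmetic.

```python
def inverse_efficiency_score(mean_rt_ms: float, proportion_correct: float) -> float:
    """IES = mean response time / proportion correct (smaller values mean higher efficiency)."""
    if not 0 < proportion_correct <= 1:
        raise ValueError("proportion_correct must be in (0, 1]")
    return mean_rt_ms / proportion_correct


# Overall means reported in the text: 4813 ms and 86% correct
print(round(inverse_efficiency_score(4813, 0.86), 1))
```

With these inputs the score is about 5597 ms, which can be read as the average time invested per correct response. At perfect accuracy the score equals the mean response time, and it grows as accuracy falls.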

Strategy Flexibility and Adaptivity
Across the 28 comparisons, participants used, on average, five of the strategies described in Table 3 (M = 5.13, SD = 1.22; range: 2-8).⁴ No participant rigidly used a single strategy. Thus, participants used strategies flexibly. The number of different strategies participants used was not correlated with their overall accuracy, r(69) = 0.11, p = .373, or response times, r(69) = 0.00, p = .988.
4) Note that this analysis did not include strategies that were coded as "other".
Participants used strategies adaptively in the sense that they frequently used highly efficient strategies, primarily benchmarking and gap comparison (ranked fourth and second, respectively, in Table 4). It is notable that participants rarely used the most efficient strategy, component approximation, presumably because it was applicable only on a subset of the items (i.e., those with similar numerators or denominators). We also evaluated whether strategy use varied depending on item type. First, we considered the three item types (straddle, in-between, and close-to-0-or-1). Recall that participants were most accurate and fastest (i.e., most efficient) on close-to-0-or-1 items, followed by straddle items, and then in-between items (see section entitled Accuracy and Response Times). A major difference between the three item types was that participants used holistic strategies much more frequently than componential strategies for in-between items, 60% vs. 27%; χ²(1) = 82.51, p < .001, and straddle items, 53% vs. 34%; χ²(1) = 44.15, p < .001, but the opposite was true for close-to-0-or-1 items, 38% vs. 54%; χ²(1) = 16.70, p < .001.⁵ Figure 1 illustrates this difference, and Figure 2 presents data for all strategies for each of the three item types. On in-between and straddle items, participants frequently used benchmarking, and they did so more often than they did on close-to-0-or-1 items (Wald χ²(2) = 47.85, p < .001; all direct comparisons between the three types p < .001, OR = 1.07-1.19). On close-to-0-or-1 items, participants still used benchmarking most often, but they also used gap comparison relatively frequently, and more often than on the other two item types (Wald χ²(2) = 50.31, p < .001; all direct group comparisons p < .05, OR = 1.03-1.13). Participants also used multiplicative reasoning less frequently on close-to-0-or-1 items than on the other item types (4% vs. 9% vs. 9%), and they used component approximation (the most efficient strategy, see Table 4) more frequently (10% vs. 1% vs. 3%), although these strategies were used infrequently overall. In sum, participants adapted their strategy use to item type, with benchmarking playing a special role on in-between and straddle items.
5) Non-parametric chi-square test of the distributions of holistic vs. componential strategies.

Figure 2
Percent Use of Individual Holistic and Componential Strategies, for Each of the Three Item Types
Note. Percentages may not exactly correspond to those displayed in Figure 1 due to rounding.
Finally, we were interested in whether strategy use could partially explain the congruency effect. We had hypothesized that this could be the case if gap comparison was used more often or more efficiently for incongruent than congruent items. Table 5 displays strategy frequencies and efficiency scores for the two most frequent strategies, benchmarking and gap comparison, separately for congruent and incongruent items, and also displays differences in efficiency scores. While there were no substantial differences in frequencies of these strategies between congruent and incongruent items, it stands out that gap comparison was much more efficient on incongruent than congruent items, whereas the efficiency difference was much smaller for benchmarking. As noted earlier, gap comparison yields correct responses on all incongruent items, but not on all congruent items, which may explain the more efficient use of gap comparison on incongruent items overall. In our item set, gap comparison led to the correct response on only 10 of the 28 congruent items. When we removed the 18 congruent items for which gap comparison yielded an incorrect response, the efficiency score difference between congruent and incongruent items was reduced by 50% (from 11.8 to 5.9), but it did not fully disappear. Specifically, incongruent items were still solved significantly more accurately, M = 94%, SE = 1.  -164, 442]. In sum, the use of gap comparison explained some, but not all, of the congruency effect.

Discussion
Adults can solve complex fraction comparisons quickly and accurately. On average, participants needed fewer than 5 seconds per item and solved 86% of these challenging comparisons correctly, in line with earlier research on challenging fraction comparisons in educated adults (Obersteiner et al., 2013; Schneider & Siegler, 2010). This generally high level of performance raises questions about what strategies adults use to compare fractions and how flexible and adaptive they are.

Strategy Use in Fraction Comparison
Participants used holistic strategies, which rely on fraction magnitudes, far more frequently than componential strategies, which do not rely on fraction magnitudes. Thus, in line with earlier research with other methods (González-Forte et al., 2020; Obersteiner et al., 2013), our verbal report data suggest that the fraction magnitude comparison task with challenging items (but not with simple items) is a suitable means to assess fraction magnitude processing.
Our study allowed us to characterize the specific strategies that adults used to reason about fraction magnitudes. The most prominent holistic strategy involved using familiar numbers (especially 1 and ½) as benchmarks, in line with earlier research on simpler fraction comparisons (e.g., Clarke & Roche, 2009). The question remains how participants determined which benchmark was most suitable for any given item. For example, it is not clear why participants mentioned 1/3 as a benchmark more frequently than 3/4. One explanation could be that 1/3 is a unit fraction (with numerator 1), which people may use more readily. On the other hand, one could argue that 3/4 is more familiar from everyday situations. Further research on item features that may encourage using specific benchmarks is needed.
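As an illustration only, the following Python sketch shows one way a benchmark can settle a comparison: if one fraction lies on or above a familiar benchmark and the other lies below it, the order follows without computing either magnitude exactly. The function names, the order in which benchmarks are tried, and the example pairs are assumptions made for this sketch. It covers only cases in which a benchmark separates the two fractions, not reasoning about how close each fraction is to 0 or 1.

```python
# Fractions and benchmarks are (numerator, denominator) tuples.
# The benchmark set and its order are assumptions made for this sketch.
DEFAULT_BENCHMARKS = ((1, 2), (1, 3), (3, 4))

def side(frac, bench):
    """Return -1, 0, or 1 if frac is below, equal to, or above bench.
    Uses cross-multiplication with the (small) benchmark components only."""
    n, d = frac
    p, q = bench
    lhs, rhs = n * q, d * p
    return (lhs > rhs) - (lhs < rhs)

def benchmark_comparison(a, b, benchmarks=DEFAULT_BENCHMARKS):
    """Return (larger_fraction, benchmark_used) if some benchmark separates
    the two fractions, otherwise None (the strategy is inconclusive)."""
    for bench in benchmarks:
        side_a, side_b = side(a, bench), side(b, bench)
        if side_a != side_b:
            return (a if side_a > side_b else b), bench
    return None

# Hypothetical pair that straddles 1/2: 5/12 is below 1/2 and 7/11 is above it.
straddle_result = benchmark_comparison((5, 12), (7, 11))
# Hypothetical pair that none of the three benchmarks separates.
inconclusive_result = benchmark_comparison((7, 12), (5, 8))
```

Here `straddle_result` is `((7, 11), (1, 2))`, and `inconclusive_result` is `None`, because both 7/12 and 5/8 lie between 1/2 and 3/4. Which benchmark a person tries first for a given item is exactly the open question raised above; the fixed order in this sketch is arbitrary.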
Gap comparison was the second most frequent strategy overall, and the most frequent componential strategy. Although this strategy is not always valid, participants used it relatively frequently, as in previous studies of simpler fraction comparisons (e.g., Clarke & Roche, 2009; González-Forte et al., 2020). Our results showed that participants used gap comparison most frequently and very efficiently on close-to-0-or-1 items. This result reflects participants' adaptivity, because the subset of close-to-0-or-1 items included relatively more items for which gap comparison led to the correct response (31%, compared to 13% in each of the straddle and the in-between item subsets). Again, the question remains how participants decided whether gap comparison would be efficient for any given item. One explanation is that in the close-to-0-or-1 items, one fraction always had a particularly large or particularly small gap, and such gaps were presumably highly salient. It is also possible that participants sometimes used gap comparison first, and then reasoned about fractions' proximity to 0 or 1, which is a form of benchmarking. We coded both strategies (gap comparison and benchmarking) whenever participants mentioned both on the same trial, and such combinations were very rare. Nevertheless, we cannot rule out the possibility that some participants used both strategies in combination but did not verbalize both.
We did not find a large effect of the tip that we gave some participants about the usefulness of benchmark strategies. This could mean that strategy use is not easily malleable. However, Fazio et al. (2016) found that university students' strategy use in fraction comparison was malleable. Given these past findings, we assume that the minimal effect of our tip was more likely due to the brief nature of the tip and to the high difficulty of our comparison items. We do not take our results as evidence against teaching approaches that encourage students to use various strategies. On the contrary, we think that adaptive and flexible use of various strategies such as benchmarking should be a focus of mathematics instruction, as it may help participants build readily-accessible representations of fraction magnitudes.

Natural Number Bias
Overall, participants displayed a reverse natural number bias. Our results suggest that this reverse bias may be associated with gap comparison more so than with simple comparisons between numerators or denominators, which people commonly employ when comparing fractions with common components (Fazio et al., 2016; Van Hoof et al., 2020). Our fraction comparison items did not have common components, and participants very rarely reported comparing components across fractions. Instead, people used gap comparison not only on incongruent items, for which it was very efficient, but also, albeit somewhat less frequently, on congruent items, for which it was very inefficient. Participants' limited adaptivity in using gap comparison could partly explain the reverse bias pattern.
What other factors could contribute to the reverse bias pattern? One possibility is subtle processing mechanisms that are not accessible to verbal strategy reports (e.g., intuitive processes). Although some studies have identified intuitive processes, such as automatic activation of natural number magnitudes, as contributing to the typical bias pattern, these processes cannot explain the reverse bias pattern. It may be that item congruency is related to other salient features that have not yet been identified.

Limitations
One limitation of this work concerns the validity of retrospective self-reports of strategy use. Our experimental procedure was in line with accepted criteria for valid self-reports (Ericsson & Simon, 1993; i.e., solution time longer than 1 sec, response within 10 seconds). However, some participants struggled to explain their strategies (about 5% of responses were missing or unintelligible), and prompting participants to report their strategies may have affected their strategy choices. Even so, the accuracy and response time data were largely in line with previous work that used the same items but did not require verbal reports, and most of the strategies we identified have been previously documented (see, e.g., Fazio et al., 2016).
A second limitation is that this work does not provide definitive evidence about the sequential and temporal processes involved in fraction comparison. For example, people might start out by considering differences (rather than ratios) between numerators and denominators, and if these differences provide enough information for them to feel confident in their magnitude comparisons, they may base their responses on these differences (i.e., using the gap comparison strategy). If, however, gap comparison is inconclusive, or if other item features encourage holistic reasoning, they may then engage in more effortful holistic reasoning. Some adults, knowing that gap comparison is not a universally applicable heuristic, may bypass such reasoning altogether, and engage immediately in holistic reasoning. Further work is needed to delineate the processes involved in selecting and applying comparison strategies.

Conclusion
In sum, this study revealed that educated adults often rely on fraction magnitudes when comparing challenging fraction pairs. Thus, the fraction comparison task with challenging items is suitable for assessing fraction magnitude processing. The present work highlights both flexibility and adaptivity in adults' strategies for fraction comparison, and it demonstrates that strategy choices contribute to the occurrence of the reverse natural number bias.