Developing a Rigorous Measure of the Pre-School Home Mathematics Environment

Children begin pre-school with varying levels of school readiness. Those children who enter pre-school with better foundational mathematics skills are more likely to succeed in school than those who do not. This initial variation in early mathematics suggests that experiences outside of the school setting, namely the home environment, may support learning and development. This study aims to systematically develop a comprehensive home mathematics environment questionnaire that reliably assesses the experiences of pre-school children (i.e., 3–5-year-olds), following recognised scale development and validation methods. Four studies were used to develop and validate the Pre-school Home Mathematics Questionnaire (PHMQ). Study 1 focused on 1) item generation through individual, in-depth interviews with parents of young children and 2) identifying questions from previous home mathematics environment (HME) questionnaires to be incorporated into the PHMQ. Study 2 involved questionnaire refinement and was used to assess the psychometric properties of the new measure while addressing construct validity (i.e., factor structure and scale score reliability). Study 3 assessed content and criterion validity of the scale. Finally, Study 4 focused on construct validity through confirmatory factor analysis. Overall, the four studies demonstrate construct, content, and criterion validity. Hence, the newly developed PHMQ satisfies the American Psychological Association (APA) standards for psychometric adequacy.


The Home Learning Environment
The frequency of home learning activities has been established to have an impact on child development. For example, Melhuish et al. (2013) investigated the long-term effects of different pre-school provision on child development and found that children from homes with the lowest frequency of home learning environment activities were almost three times less likely to attain Level 5 in mathematics at the end of Key Stage 2 (i.e., at age 11 in Northern Ireland) than children from homes with a higher frequency of home learning environment activities. Thus, it has been suggested that the frequency of home learning environment activities can hinder or benefit individual success later in life (Sénéchal & LeFevre, 2002).
Studies that explore the nature of the home learning environment have found wide variations between families. For instance, the quality of the home learning environment is associated with the availability of educational resources, for example books and board games (Anders et al., 2012; Cankaya & LeFevre, 2016; Gunn, Simmons, & Kameenui, 1995; Melhuish et al., 2008). Previous research demonstrates that the quality of the home learning environment can be investigated either in relation to the domain of home literacy or home numeracy (Huntsinger, Jose, Larson, Balsink Krieg, & Shaligram, 2000; Huntsinger, Jose, Liaw, & Ching, 1997; LeFevre et al., 2009; Sénéchal & LeFevre, 2002) or irrespective of domain (Anders et al., 2012; Melhuish et al., 2008). There is a vast amount of literature that examines the role of the home literacy environment, in contrast to the emergent literature on the home numeracy environment (HNE; Burgess, Hecht, & Lonigan, 2002; Frijters, Barron, & Brunello, 2000; Hart, Ganley, & Purpura, 2016; Kirby & Hogan, 2008; Sénéchal & LeFevre, 2002; Sénéchal, LeFevre, Thomas, & Daley, 1998). Hence, more research is necessary to understand the impact of the HNE.
The motivation for the creation of home mathematics environment (HME) measures has been grounded in evidence that, since the early home environment (i.e., during pre-school years) has been connected to children's literacy skills, it is theoretically reasonable to predict that the early home environment will impact children's numeracy skills (Blevins-Knabe, 2016; LeFevre et al., 2009, 2010; Lukie, Skwarchuk, LeFevre, & Sowinski, 2014). Accordingly, researchers have adapted questions from home literacy environment questionnaires or generated novel questions to create HME questionnaires. Alternatively, other home numeracy questionnaire measures have been based on variations of the Home Observation for Measurement in the Environment (HOME) inventory (Caldwell & Bradley, 1984), for example Anders et al. (2012). However, as the development of measurements should ideally be both deductive and inductive (Williamson, Karp, Dalphin, & Gray, 1982), the current study used both rigorous approaches for questionnaire development.

Inconsistent Findings in HME Research
It is important to note that the literature on the relation between the HME and mathematical learning has produced inconsistent results. This is in stark contrast to the literature on the home literacy environment and its relation to reading outcomes (Morrison, Bachman, & Connor, 2005). The majority of HNE studies used questionnaire-based self-report measures of the frequency of home numeracy activities. Many studies have established a positive, unique impact of the frequency of HNE activities on mathematical development (Dearing et al., 2012; Kleemans, Peeters, Segers, & Verhoeven, 2012; Manolitsis et al., 2013; Niklas & Schneider, 2014). In contrast, some studies have found no relation between the HNE and a range of mathematical skills (e.g., Blevins-Knabe, Austin, Musun, Eddy, & Jones, 2000; Missall, Hojnoski, Caskie, & Repasky, 2015). Typically, research indicates that socio-economic status (SES) is related to mathematical development (Galobardes, Shaw, Lawlor, Lynch, & Smith, 2006; Mercy & Steelman, 1982; Sammons et al., 2004). Even when a relationship has been established, some studies have identified that after controlling for SES and parental attitudes the relationship does not persist (DeFlorio & Beliakoff, 2015).

Child Characteristics
A recent review of published papers indicated that inconsistent findings may be attributable to differences in age of children within samples (Thompson, Napoli, & Purpura, 2017). This narrative review indicated that HNE did not impact on mathematical outcomes of younger children (approximately 3-4-year-olds) but did moderately affect older children (approximately 5-6-year-olds). Although not specifically highlighted by the review, it is striking to note that across the 13 included studies a wide variety of questionnaire measures were administered, with some overlapping content.

Psychometric Properties
Some researchers who have created HNE scales have not provided adequate information about item generation and refinement, scale dimensionality, scale score reliability, or validity (e.g., Kleemans et al., 2012; LeFevre et al., 2009; Melhuish et al., 2008). As few questionnaires have been developed following best practice for scale development, or have been validated beyond construct validity (e.g., LeFevre et al., 2009), the inconsistent results are perhaps unsurprising.

Characteristics of Content and Activities
There are many concepts that are captured in mathematics (i.e., numeracy, spatial skills, geometry, patterning) that are not captured in every frequency of activities questionnaire in the same way. For example, geometry is covered in both the Hart et al. (2016; ages 3–8 years) and the Missall, Hojnoski, and Moreano (2017; ages 3–5 years) questionnaires through different questions: "Fold or cut paper to make 3D objects," "Play with legos" (Hart et al., 2016) and "Identify shapes in the everyday settings and activities," "Put shapes together to make a larger shape" (Missall et al., 2017). The wide variety of skills that are encompassed by the concept "mathematics", and the variety of ways by which these skills can be measured, could be a source of inconsistency in the HME literature.
Some researchers have made distinctions between different types of activities, using terms such as indirect versus direct and informal versus formal interchangeably, with definitions differing between studies (e.g., Anderson, 1998; LeFevre et al., 2009; Skwarchuk et al., 2014). LeFevre et al. (2009) conceptualised activities as either indirect or direct. Indirect activities were defined as naturally occurring tasks that communicate mathematical information incidentally, for example playing board games with dice, setting the table or weighing ingredients while baking. Direct activities are those used to explicitly teach mathematical skills or concepts to develop a child's mathematical skills, for example practising simple sums and learning to identify number symbols. Skwarchuk et al. (2014) suggested that participating in formal practices would support the development of symbolic mathematics knowledge, while informal mathematics exposure would promote non-symbolic mathematics skills. Consistent with this hypothesis, Skwarchuk et al. (2014) found that formal home numeracy practices accounted for unique variance in children's symbolic number knowledge, whereas informal exposure to games with numerical content predicted children's non-symbolic arithmetic performance. However, this hypothesised conceptual model of the HNE has rarely been replicated. For example, there appears to be a differential effect of formal and informal activities on mathematical learning, with formal activities being positively related to attainment and informal activities being negatively related (Huntsinger, Jose, & Luo, 2016). Further, Huntsinger et al. (2016) found that participating in formal mathematics activities predicted both formal (learned through explicit instruction using rules, principles, and procedures, e.g., addition and subtraction calculations) and informal (acquired outside of formal schooling, e.g., concepts of relative magnitude) mathematics knowledge, whereas engaging in informal activities predicted neither. Hence, dichotomisation of home mathematics activities does not seem to reduce the inconsistencies in the literature.
In addition, some studies make a distinction between basic and advanced activities (Skwarchuk, 2009). Of course, the content of these two types of activities varies with age: for pre-schoolers, advanced activities may include multiplicative counting, whereas this may be a basic task for a child in the early primary years. These developmental changes in children's skills have perhaps led to inconsistent findings on the relationship between the HNE and mathematics skills. For older children, a heightened frequency of advanced activities is associated with higher-level mathematical skills, and the reverse holds for the frequency of basic activities (Skwarchuk, 2009; Skwarchuk et al., 2014). In contrast, for younger pre-schoolers (i.e., 3-year-olds) the opposite is true, with more basic activities, rather than advanced ones, associated with higher attainment (Thompson et al., 2017). Of course, these categories of activities can overlap, but it is important that any validated questionnaire can record and assess this breadth of home-based activities for the targeted age group. In the context of conflicting results from a growing body of studies, there is a clear need to develop and validate a coherent and inclusive measure of the HME which is both reliable and valid.

Other Considerations
In addition, it is uncertain that the items within the currently published HME literature reflect the rapidly changing home environment of children (albeit mainly those growing up in the Global West), specifically in relation to technology (OfCom, 2013, 2016). OfCom (2016) states that two devices in the home continue to be used by children: television sets (92% for 3–4-year-olds and 96% for 5–7-year-olds) and tablets (55% for 3–4s and 67% for 5–7s). Thus, technological advances have potentially expanded the reach of maths learning in the home. Yet, questions about educational technology rarely extend beyond one item in HME questionnaire measures, e.g., How often did you and your child engage in the following activities? "Uses maths software" (Huntsinger et al., 2016) and "Playing counting games using child computer or arithmetic software" (Kleemans et al., 2012), and so on (e.g., DeFlorio & Beliakoff, 2015; Skwarchuk & LeFevre, 2015). This makes it difficult to measure the extent of educational technology use in the home and whether it makes a difference. Hence, this study aims to develop a measure that includes a variety of items regarding educational technology in relation to maths learning, and will explore what types of items need to be included in an HME measure to reflect educational technology practices in the home environment. Through qualitative research, Cahoon, Cassidy, and Simms (2017) identified that parents regularly use technology with their pre-schoolers to support mathematical learning. Thus, failure to include multiple questions related to technology use in HME questionnaires may lead to misrepresentation of the home environment.
The HNE has sometimes been approached as a unidimensional construct (e.g., Blevins-Knabe & Musun-Miller, 1996; Kleemans et al., 2012) wherein all numeracy-related activities occurring in the home environment have been measured. Thus, many studies focus on number activities and ignore other important areas such as technology and sibling interaction (Cahoon et al., 2017). Some HNE questionnaires do cover other mathematical domains such as geometry and shape (e.g., LeFevre et al., 2009). However, most HNE questionnaires use narrow terminology by using the term numeracy. It is necessary to be more consistent in communicating that the home environment's influence on both children's numerical skills and a broader range of mathematics skills (i.e., numeracy, spatial skills, geometry, patterning etc.) is being examined (Blevins-Knabe, 2016; Hart et al., 2016); thus the term HME may be more appropriate. In this paper the terminology HME has been used when a broader range of mathematics skills is discussed (including numeracy, spatial skills, geometry, patterning).

Current Study
There are many possible reasons for the inconsistent findings among HNE research: 1) the characteristics of the children participating in the studies, 2) the psychometric properties of the questionnaires that were used in previous studies and 3) the characteristics of the content and activities that the parents offer to these children. This study aims to systematically develop a comprehensive HME questionnaire that reliably assesses the experiences of pre-school children (3–5-year-olds); this new measurement tool will be referred to as the Pre-school Home Mathematics Questionnaire (PHMQ), as it involves home environment relevant dimensions beyond numeracy. The questionnaire was developed using the framework of Learning Trajectories (Clements & Sarama, 2004), reflecting the learning goals and activities that children might engage in (Simon, 1995). Most HME questionnaires have been developed and used in home environments that reflect the developed world, for example Canada. This is the first study within the UK to create an HME questionnaire that is culturally specific, where items are not just deductive and drawn from other HME questionnaires such as Melhuish et al. (2008). Hence, the aim is for the PHMQ to be a culturally appropriate HME questionnaire that shows good psychometric qualities for 3–5-year-olds growing up in the UK. This specific age-related focus is important due to the varied nature of activities that are appropriate across development. It is of utmost importance that this new measurement instrument demonstrates strong psychometric properties (i.e., reliability and validity; Hinkin, 1998; Schoenfeldt, 1984). The creation of measurement tools should ideally be both inductive and deductive (Williamson et al., 1982), an approach unique to this current study of scale development. An advantage of using both deductive and inductive approaches to scale development is that it increases the chances of content validity in the final scale (Hinkin, 1998).
To develop and validate the dimensions of the PHMQ and produce an instrument with evidence of reliability and validity, this study has followed recent scale development and validation research processes (e.g., Hinkin, 1998; Nunes, Pretzlik, & Ilicak, 2005). Overall, four studies are included in this paper that support the examination of construct, content, and criterion validity. The ultimate objective of this scale development and validation process is to ensure that the new PHMQ measure aligns with APA standards for psychometric adequacy (APA, 1995; Hinkin, 1998). Table 1 provides an overview of the processes involved in each of the four studies within this paper that ensure rigorous development and validation of the PHMQ. As there are four studies involved in this paper, each study begins with an overview followed by the method and results of the study. The only study that does not follow this structure is Study 1, which involves an overview and method only. The reason for this is that Study 1 involves generating items based on previous interview transcripts (Cahoon et al., 2017), and it is believed all relevant information is provided for the reader to understand how the items were generated, including information provided in the Supplementary Materials of this paper (see Appendix 2, Table 1). Each study in this paper was reviewed and approved by the School of Psychology Research Ethics Committee before the study commenced. Signed written consent was obtained from all participants. For each study the criterion for participation was that the parent/guardian defined themselves as the primary carer for a child of pre-school age.

Study 1: An Overview
Following construct definition, item generation (Table 1: Study 1) featured individual, in-depth interviews with eight parents of pre-school aged children (i.e., 37 to 59 months, M age = 47.5 months), using the same transcripts used in previous literature (Cahoon et al., 2017). The interviews were exploratory and aimed to gain opinions from parents on their everyday routine activities and to understand the way in which parents encourage the development of early numeracy skills in the home. The six themes identified from the thematic analysis were: 1) numeracy environment structure, 2) frequency of number-related experiences, 3) levels of number knowledge, 4) views of technology, 5) parent-child interactions and 6) social interaction. The diversity of the themes illustrated how the HME may be influenced by parents' views and experiences of numeracy-related activities and children's interactions with others. For instance, the numeracy environment structure theme demonstrated the types of environments that parents create for their children to learn numeracy in the home. Initially, participants stated that teaching mathematics should be instinctive but admitted that it is difficult to spontaneously formulate plans. Findings showed that parents may not always be cognisant of undertaking numerical activities with their child in the home and hence the HME is largely unstructured (see Cahoon et al., 2017 for more detail). Through the thematic analysis used within this paper (i.e., Cahoon et al., 2017), the theoretical foundation for the PHMQ was developed. These transcripts were then used to generate items for the PHMQ using content analysis. In addition, previous questions from other questionnaires were identified and incorporated into the PHMQ.

Method Item Generation
Using content analysis, this inductive approach developed 44 items to create the initial PHMQ. Further, the deductive item generation method developed a base set of items that assessed the HME, drawn from previous HME measures (e.g., LeFevre et al., 2009; Lukie et al., 2014; Kleemans et al., 2012; Melhuish et al., 2008) and previous parent-child interaction research, such as observational research involving parent guidance and support (e.g., Bjorklund, Hubertz, & Reubens, 2004; Vandermaas-Peeler, Boomgarden, Finn, & Pittard, 2012). All items were cross-referenced between those mentioned in the interviews (e.g., a numeracy activity such as counting objects) and items from other HME measures or those cited in previous parent-child interaction research. Together, the deductive items (N = 25) were combined with the inductive items (N = 44), totalling 69 items. Thirty-eight items focused on the frequency of maths activities. Additional questions investigated more nuanced factors, such as interaction with parents and siblings. It is acknowledged that there are more numeracy items than broader maths items at this stage. However, this is reflective of the age group that the PHMQ is targeted towards (i.e., ages 3–5 years). Therefore, the activities are developmentally appropriate (see Supplementary Materials, Appendix 2, Table 1 for a detailed breakdown of the items, how each item was generated and the initial item reduction criteria).

Study 2: An Overview
Questionnaire refinement (Table 1: Study 2) involved parents with children aged 3 to 5 years old completing the PHMQ. The aim of Study 2 was to examine how well items confirmed expectations concerning the psychometric properties of the new measure (Hinkin, 1998). Study 2 addressed construct validity, which incorporated two psychometric properties: factor structure and scale score reliability. Furthermore, it is important that the response scale used for the items (e.g., rank order or rating items) produces the variance necessary for subsequent statistical analyses (Stone, 1978).

Method Participants
A total of 172 parents/guardians completed the PHMQ. To acquire an equal spread of participants across SES during data collection, the proportion of free school meals (FSM) per school was calculated across Northern Ireland, using Department of Education (2014) statistics. It should be noted that FSM eligibility is an imperfect proxy for maternal and paternal education and social class (Hobbs & Vignoles, 2007). Therefore, to avoid imperfect proxy bias (i.e., a proxy that correlates with the key variable but cannot be understood in isolation), parents were asked in the PHMQ to complete eight questions from the National Statistics Socio-economic Classification (NS-SEC; Rose & Pevalin, 2010), which allowed the researcher to derive SES using the Standard Occupational Classification (SOC). The FSM statistics were divided into three bands to distinguish schools that had low (4–18%), medium (19–58%) and high (59–85%) FSM eligibility. The average FSM eligibility was 37.7%. It was anticipated that an equal spread of pre-schools would be contacted from the three FSM eligibility categories. However, there was a low participation rate from the pre-schools in the medium FSM eligibility category, so more pre-schools were contacted from this category. Participants were recruited through 11 local pre-schools and two privately owned soft-play centres. A soft-play centre is a soft obstacle play area for children up to the age of 8 years at which parents/guardians supervise play. Thus, it was deemed an ideal setting in which to target parents with children aged between 3 and 5 years old. The proportions of PHMQs returned from the low, medium and high FSM categories were 30%, 42.5% and 27.5%, respectively. The sample consisted of 148 mothers, 18 fathers, three grandparents, two foster parents and one adoptive parent.
The target children that parents/guardians answered questions about were aged between 36 and 60 months (M age = 46.2 months; 52.3% female), with 85.5% of the target pre-school children having at least one sibling (N = 147). The parents/guardians were between 23 and 65 years old (M age = 35.26 years). SES data were converted into a three-class categorical variable as described in the NS-SEC (Rose & Pevalin, 2010), which can be assumed to form a hierarchy: high SES (50.7%), middle SES (17.5%), and low SES (25.5%).

Procedure
The questionnaire was piloted (N = 10) to assess completion time and to ensure that the presentation was easy to read and understand. The questionnaire took approximately 10-15 minutes to complete, and adjustments were made to the questionnaire to make sure participants would understand the terminology. After these changes were made, the PHMQ was administered. No pilot data, at any point in this study, were used in analysis (e.g., for the exploratory factor analysis [EFA]). Participants who completed the PHMQ in the play centres did so on the day they agreed to take part and did not take the questionnaires home. Participants who completed the PHMQ via the pre-schools returned it to the child's teacher in sealed envelopes to maintain confidentiality.

Data Analysis
Data were entered by two researchers and verified for accuracy. A subject-to-variable ratio of 1:4.5 was achieved, with 172 participants and 38 variables included in the EFA. This is consistent with previous research suggesting that subject-to-variable ratios of 1:3 to 1:6 are acceptable (Arrindell & Van der Ende, 1985; Cattell, 1978).

Questionnaire Refinement
The PHMQ consisted of eight dimensions: 1) parent expectation of their children's academic success, 2) child maths literacy, 3) child counting ability, 4) parent-child teaching methods (e.g., what are the specific things parents say or do to encourage and support their child to learn maths?), 5) target child-sibling interactions (e.g., what numerical activities siblings are most likely to do together?), 6) parent's view of their child's understanding of numeracy, 7) caregivers support of numeracy learning in the home and 8) frequency of maths activities scale. See Supplementary Materials, Appendix 2, Table 1 for a detailed breakdown of the items, how each item was generated and initial item reduction criteria.
The first three dimensions, mentioned above, are known as benchmark questions as they give context to results by allowing comparison between participant responses. These are essential questions that gauge the background of the parents' expectations for their child and the child's ability level. Each of these three dimensions had good variance and was retained for the final PHMQ. The next two dimensions, parent-child teaching methods and target child-sibling interactions, were termed interaction questions as they involve the target child interacting with both parents and siblings. The parent-child teaching methods section was kept due to good variation in results. However, the target child-sibling interactions section (originally 13 questions) was reduced due to lack of variability, potentially explained by a "halo effect" (i.e., parents wanting their child to be perceived favourably by reporting that they take part in an activity that may be too advanced for them). This pattern was also identified in the previous qualitative interviews (Cahoon et al., 2017). Therefore, the 11 ranking options for target child-sibling interactions were reduced to seven ranking options; the threshold for cut off was any rank option that scored over 20% in the least likely categories. The reason for reducing the rank order options was that participants found it too difficult to rank order 11 options. However, after reduction to seven ranking options this question was piloted again (N = 10) and was still found to be difficult to complete. Therefore, this question was changed to match the 5-point Likert scale of the frequency of maths activities scale. The parent's view of their child's understanding of numeracy and caregiver support of numeracy learning in the home dimensions were removed due to lack of variability in results; this lack of variance indicates that they were classic "halo effect" questions (Fitzpatrick, 1991; Wilson, Hewitt, Matthews, Richards, & Shepperd, 2006).
The 38 items of the frequency of maths activities scale were analysed using EFA to investigate variable relations for this complex concept. These items were analysed using a principal components analysis with oblique rotation (direct oblimin). Table 2 summarises the factor loadings after rotation for the frequency of maths activities scale. The Kaiser-Meyer-Olkin measure verified the sampling adequacy for the analysis (KMO = 0.80), and all KMO values for individual items were greater than 0.59. Five factors, comprising 28 items, had eigenvalues over Kaiser's criterion of 1 and in combination explained 53.14% of the common variance. The factors were labelled as follows: 1) parent-child interactions, 2) computer maths games, 3) TV programmes, 4) shape and 5) counting. Ten items did not load onto any factor and were therefore removed from further analysis and questionnaire administration. Of note, one item that loaded onto the parent-child interactions subscale involves shape (i.e., asking shape related questions [e.g., "how many sides does a circle have?"]). However, theoretically this makes sense, as this activity would involve a parent interacting with their child to ask shape related questions. Cronbach's alpha for the full scale was .89. Cronbach's alphas for the subscales were acceptable, ranging from .76 for the counting factor to .81 for both the parent-child interactions and computer maths games factors, thus displaying good internal reliability. Overall, from the initial 69 items, 19 items (14 deductively and five inductively generated items) were removed for the reasons mentioned previously. Thus, a total of 50 items were retained.
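The internal reliability statistic reported above can be illustrated with a short sketch of the standard Cronbach's alpha formula, α = k/(k−1) × (1 − Σ item variances / variance of total scores). This is a generic illustration, not the authors' analysis code, and the Likert responses shown are hypothetical:

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents' item scores.

    responses: one row per respondent; each row is a list of item
    scores (e.g., 1-5 Likert ratings).
    """
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # transpose: one tuple per item
    item_vars = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical responses: 4 respondents x 3 items
data = [
    [5, 4, 5],
    [4, 4, 4],
    [2, 3, 2],
    [3, 3, 3],
]
print(round(cronbach_alpha(data), 2))  # → 0.93
```

In practice the 38-item scale and its five subscales would each be passed to such a function separately, which is how a full-scale alpha of .89 can coexist with subscale alphas of .76 to .81.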
Through the item reduction process, the PHMQ contained six home environment dimensions: 1) parent expectation of their children's academic success, 2) child maths literacy, 3) child counting ability, 4) parent-child teaching methods, 5) target child-sibling interactions and 6) the frequency of maths activities scale. The dimensions concerning the child's understanding of numeracy and caregiver support were removed.

Study 3: An Overview
After the development of the PHMQ, the scale validation process involved two studies, the first being qualitative to assess content and criterion validity (Table 1: Study 3). Content validity considers whether appropriate questions have been asked in the measure (Nunes et al., 2005). It allows for comparison of the themes identified in the questionnaire with those emerging in subsequent interviews (Nunes et al., 2005). Criterion validity investigates contrast cases of participants with very high or very low scores on each of the themes within a questionnaire and compares the contrasting cases to the interview responses (Nunes et al., 2005). This enhances the validity of the dimensions included within the PHMQ.

Method New Dimension
At this stage, before the following semi-structured interviews were conducted, a new dimension called the number game checklist was developed and added to the PHMQ. Skwarchuk et al. (2014) created a measure to assess informal numeracy experiences by developing a number games title checklist. This framework was adapted from Sénéchal and LeFevre's (2002) study, which used parents' knowledge of storybook titles as a proxy measure of informal home literacy practices. The number games title checklist by Skwarchuk et al. (2014) was created for a Canadian sample; thus, a new, culturally appropriate number games checklist was developed as a measure of informal home numeracy practices (number games exposure checklist) so that the games were relevant to the United Kingdom (UK). The rationale for including this dimension is that the measure can then assess informal home numeracy practices through parents' recognition of board games, alongside the frequency of maths activities scale that potentially assesses both formal and informal home numeracy practices. To develop the board game checklist, information was gathered about commercially available board games suitable for children aged 3 to 6 years, both in store and online, from three retail establishments. To compile the list of games, selection criteria were used to give parents a reasonable chance of knowing the games. In Sénéchal, LeFevre, Hudson, and Lawson's (1996) book title checklist, fairy tale titles (i.e., those involving fairy tale characters) for which a movie or television version existed were eliminated due to possible over-familiarisation; the same criterion was applied here. To ensure that the games were readily available to parents, only game titles that were stocked in two of the three retail establishments were selected. Lastly, to ensure that the games were accessible to all parents regardless of income level, only games under £15 were selected.
Games were categorised according to whether they included numerical components (counting, adding, and recognising numbers). In contrast to Skwarchuk et al. (2014), whose checklist included 25 titles (10 numerical games, 10 non-numerical games, and five plausible but non-existent games), this board game checklist consisted of 30 game titles: 10 numerical, 10 non-numerical, and 10 plausible but non-existent games. The number of plausible but non-existent games was increased to 10 so that it was equal to the number of numerical and non-numerical games.
The newly created number games checklist was cross-referenced with Skwarchuk et al.'s (2014) number game exposure checklist. Four numerical, two non-numerical, and one plausible but non-existent game were taken from Skwarchuk et al.'s (2014) checklist as they also met the selection criteria used in this study. As in previous home numeracy research, parents were asked to indicate their familiarity with children's game titles. Parents were asked not to guess or stop to verify any game titles online or in a catalogue. Participants were informed that non-existent games were included in the checklist to minimise guessing. To calculate the number game checklist score, the total of correctly marked number games was corrected for guessing (e.g., if seven number games and one non-existent game were selected, this was scored as ((7 − 1)/10) × 100 = 60%; Skwarchuk et al., 2014). Therefore, the PHMQ was made up of seven home environment relevant dimensions, including the new informal home numeracy practices (number game exposure) section. The updated PHMQ with the seven dimensions was subsequently piloted with parents/guardians (N = 30) to confirm the refinement.
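As a concrete illustration, the guessing correction described above can be written as a short function. This is a minimal sketch of the scoring rule attributed to Skwarchuk et al. (2014); the function name and parameter names are illustrative, and it assumes 10 genuine number games, as in the checklist described here.

```python
def checklist_score(real_games_marked, foils_marked, n_real_games=10):
    """Guessing-corrected number game checklist score, as a percentage.

    real_games_marked: genuine number games the parent recognised.
    foils_marked: plausible but non-existent games the parent marked.
    """
    corrected = real_games_marked - foils_marked
    return corrected / n_real_games * 100

# Worked example from the text: seven number games and one foil marked.
print(checklist_score(7, 1))  # 60.0
```

A parent who marked all 10 genuine games and no foils would score 100%, while marking foils lowers the corrected score.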

Participants
Eight participants (M age = 37.8 years) agreed to complete the PHMQ and take part in the interview: six mothers, one father, and one grandparent. The target children (50% female) were aged between 36 and 49 months (M age = 42.8 months). Data saturation was reached within the eight interviews, which is consistent with other studies (Isman, Ekéus, & Berggren, 2013; Isman, Mahmoud Warsame, Johansson, Fried, & Berggren, 2013). Data saturation was achieved when no further coding was possible and additional interviews were no longer expected to yield new information (Fusch & Ness, 2015).

Procedure
The interviews took place at two soft-play centres that had been used as sites in Study 2. The topic guide included questions such as, "Do you think your child is interested in maths? If so, why?" and "Can you compare the frequency and structure of mathematical activities to reading at home?" These questions were the same as those asked in the original interviews (Study 1; Cahoon et al., 2017) and were deemed appropriate for gathering sufficient information to assess content and criterion validity. Half of the participants were administered the questionnaire before the interview and half after. The individual interviews lasted approximately 40 minutes and the PHMQ took approximately 10 minutes to complete. The interviews were recorded and transcribed before analysis.

Data Analysis
The subscales in the frequency of maths activities scale were used to assess content and criterion validity. The other dimensions of the PHMQ, such as the frequency of reading compared to numeracy, target child-sibling interaction, structure of the HNE, and parent-child teaching methods, were evaluated to assess the content validity of the PHMQ. The parents' responses were coded using NVivo (Version 11) into content categories based on the five subscales within the frequency of maths activities scale. Criterion validity was assessed through contrasting cases, identified by calculating total scores on the five subscales for each participant. Scores ranged from 0 to 4, based on a 5-point Likert scale. Respondents with low scores were more likely to answer that an activity did not occur and hence scored closer to 0. Respondents with high scores were more likely to answer that an activity occurred almost daily and thus scored closer to 4. The parents' interview transcripts were then searched for comments relevant to the subscales.
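The contrasting-case identification described above can be sketched as follows. Note that averaging item responses and the specific low/high cut-offs are illustrative assumptions introduced here; the text specifies only that subscale scores range from 0 (activity did not occur) to 4 (almost daily).

```python
def subscale_score(item_responses):
    """Mean of a participant's 0-4 Likert responses on one subscale."""
    return sum(item_responses) / len(item_responses)

def contrasting_case(score, low_cut=1.0, high_cut=3.0):
    """Label a respondent as a low, middle, or high case.

    The cut-offs are illustrative, not taken from the study.
    """
    if score <= low_cut:
        return "low"
    if score >= high_cut:
        return "high"
    return "middle"

# A parent reporting that activities rarely occur versus one reporting
# near-daily activity on a four-item subscale.
print(contrasting_case(subscale_score([0, 0, 1, 0])))  # low
print(contrasting_case(subscale_score([4, 3, 4, 4])))  # high
```

Transcripts of respondents at the two extremes would then be searched for comments relevant to the subscale, as described above.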

Content Validity
There was agreement between parents' views in the interviews and those assessed by the PHMQ. Issues surrounding the six dimensions of the PHMQ (i.e., 1) parent expectation of their children's academic success, 2) child maths literacy, 3) child counting ability, 4) parent-child teaching methods, 5) target child-sibling interactions, and 6) frequency of maths activities scale) and the five frequency of maths activities subscales were mentioned in the eight interviews and used to assess content validity. The definitions and sample comments illustrating each dimension and factor are summarised in the Supplementary Materials (see Appendix 3, Table 2). Appendix 3, Table 2 shows that each factor from the frequency of maths activities scale was mentioned in the interviews; thus, all items were retained in the scale.
Summary of Content Validity
Two additional topics arose from the current interviews that were not mentioned in the previous interviews. The first was that parents' interest in mathematics may influence the frequency of maths activities occurring in the home; therefore, a question related to this topic was added to the PHMQ. The second was that parents reported watching videos with mathematics content on YouTube with their children; therefore, the item "Maths related YouTube videos" was added to the frequency of maths activities scale. Overall, the analysis of the interviews confirmed the dimensions included in the PHMQ.

Criterion Validity
It was only possible to assess criterion validity for the frequency of maths activities scale within the PHMQ, as criterion validity involves contrasting cases of high and low scores, which requires a Likert scale. Analysing contrasting cases indicates that, in the frequency of maths activities subscales, there are differences between the extreme high and low scorers. The high and low contrasting cases with sample comments illustrating each subscale dimension are summarised in the Supplementary Materials (see Appendix 4, Table 3). Some noteworthy findings were that time limits were important with regard to the frequency of computer usage; this is one reason for the varying frequencies on the computer maths games subscale. The types of TV programmes being watched may influence the frequency and perhaps be one reason for the contrasting cases. This would be expected as the TV programmes subscale only involves questions about educational programmes. Therefore, children who mostly watch non-educational TV programmes would score low on the TV programmes subscale. It is important to note that a child's interest plays a role in the TV programmes they want to watch (Cahoon et al., 2017), and this could influence high and low frequencies on this subscale. Nevertheless, the subscale seems to identify contrasting cases well. Parents who scored lowest on the counting subscale stated that counting was mostly brought up by the parent, whereas parents who scored high on the counting subscale stated that mathematics was brought up naturally by their child; hence, counting may be covered more often in the home if both the parent and the child are likely to bring it up. Overall, there are clear differences between the views of parents with contrasting frequency scores as assessed through the interviews.

Study 4: An Overview
As previously stated, the scale validation process involved two studies, the second of which was quantitative and addressed construct validity (i.e., CFA; Table 1: Study 4).

Method
Participants
152 parents with children aged 43 to 54 months (M age = 48 months) agreed to complete the PHMQ. 136 (89%) participants (91% female) returned the PHMQ; these respondents were on average 34.9 years old (SD = 5.7, range 21-46 years). The same FSM classification approach was used as in Study 2. The proportions of PHMQs returned from the FSM eligibility categories were 32%, 50%, and 18%, respectively. There were 39.5% parents from high SES, 19.7% from middle SES, and 23.7% from low SES backgrounds. The remaining 17.1% represents missing data or cases where the responding parent had never worked, had been long-term unemployed, or was a full-time student.

Procedure
The parent of the target child was asked to complete the PHMQ. Parents who completed and returned the PHMQ were entered into a prize draw for a £50.00 Amazon voucher. The PHMQ was returned to the pre-school teacher and collected by the researcher.

Data Analysis
The CFA was completed in Mplus Version 1.5 (Muthén & Muthén, 1998). Mplus was used to examine the factor structure instead of SPSS because Mplus allows the researcher to place each item in the factor suggested by the exploratory factor analysis to test whether the model fits. The CFA was conducted on the five subscales found in Study 2.

Construct Validity
A CFA with robust maximum likelihood was conducted in Mplus. This approach has been widely used in CFA models when continuous observed variables deviate slightly or moderately from normality, and in such cases it is superior to maximum likelihood (Li, 2016). The five-factor model is presented in Figure 1.
The selection of the most appropriate model was based upon goodness-of-fit statistics (Table 3). For more information on the other models that were examined (i.e., one-factor, total frequency of maths activities, five-factor second-order models, and a two-factor model based on the original definitions of direct and indirect numeracy activities by LeFevre et al. [2009]) refer to the Supplementary Materials (see Appendix 5). The model had acceptable fit indices, with a Comparative Fit Index (CFI) of .83 and a Tucker Lewis Index (TLI) of .81. Good-fitting models are indicated by a CFI of > .95 (better model: > .97), and the same cut-off value applies to the TLI (Geiser, 2012). A CFI > .90 is often regarded as an indicator of adequate model fit (Awang, 2012; Coroiu et al., 2018; Hair, Black, Babin, & Anderson, 2010), and the same cut-off value applies to the TLI (Awang, 2012; Coroiu et al., 2018; Forza & Filippini, 1998).
The CFI and the TLI are incremental fit indices that compare the fit of the target model to the fit of a baseline model (Geiser, 2012). In Mplus the baseline model, also known as the null or independence model, assumes that the population covariance matrix of the observed variables is a diagonal matrix; in other words, it is assumed that there is no relation between any of the variables (Geiser, 2012). As a consequence, it is possible that the null model is "too good," meaning that the average level of correlations in the current data is rather low. In this case, Kenny (2015) argued that the CFI should not be computed if the RMSEA (i.e., Root-Mean-Square Error of Approximation) of the null model is less than .158, as the CFI obtained will be too small (Beldhuis, 2012; Kenny & McCoach, 2003). The model demonstrated an acceptable RMSEA value of .07, below the .08 threshold (Awang, 2012). Therefore, the five-factor model is a reasonable model.
The SRMR (i.e., Standardised Root Mean Square Residual) coefficient is a standardised measure for the evaluation of the model residuals; however, the SRMR is somewhat biased by sample size. Marsh, Hau, and Wen (2004) state that SRMR values for solutions based on small sample sizes tend to be unacceptable (greater than .08), whereas those based on large sample sizes are acceptable. A value < .08 is generally considered a good fit (Hu & Bentler, 1999). Therefore, taking into consideration all criteria for assessing goodness of fit, the model presents acceptable fit indices (CFI = .83, TLI = .81, RMSEA = .07, SRMR = .072); thus, it seems reasonable that the five-factor model be deemed a suitable measurement model.
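The cut-offs cited above can be gathered into a single check applied to the reported indices. This is an illustrative summary of the thresholds attributed to Awang (2012), Geiser (2012), and Hu and Bentler (1999), not part of the original analysis; the function name is hypothetical.

```python
def evaluate_fit(cfi, tli, rmsea, srmr):
    """Compare fit indices against the cut-offs cited in the text."""
    return {
        "CFI > .90 (adequate)": cfi > .90,
        "CFI > .95 (good)": cfi > .95,
        "TLI > .90 (adequate)": tli > .90,
        "RMSEA < .08 (acceptable)": rmsea < .08,
        "SRMR < .08 (good)": srmr < .08,
    }

# Reported five-factor model: the absolute indices (RMSEA, SRMR) pass,
# while the incremental indices (CFI, TLI) fall short of the .90 guideline,
# which is why Kenny's (2015) caveat about the CFI is invoked.
print(evaluate_fit(cfi=.83, tli=.81, rmsea=.07, srmr=.072))
```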

The Additional "Maths Related YouTube Videos" Item
As discussed in Study 3, an additional item was identified through the content validity assessment and added to the frequency of maths activities scale. This item was named "Maths related YouTube videos." As confirmed by the interviews with parents during the content analysis, younger children mostly use YouTube to consume traditional, "TV-like" content (OfCom, 2016). Therefore, the item "Maths related YouTube videos" was initially added to the TV programmes subscale of the frequency of maths activities scale. However, on examination of the modification indices (i.e., restrictions that may be relaxed to obtain a significant improvement of the global model fit; Geiser, 2012) it was apparent that the item should be placed within the computer maths games subscale, which produced better model fit indices. The fit indices for the new item placed in the TV programmes subscale were CFI = .81, TLI = .79, RMSEA = .073, SRMR = .078, whereas the fit indices for the new item placed in the computer maths games subscale were CFI = .82, TLI = .81, RMSEA = .070, SRMR = .072. As suggested by the modification indices and the model fit statistics, the new item was placed in the computer maths games subscale. This was the only modification suggested, providing further evidence that the five-factor model is a suitable measurement model.
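The two candidate placements can be compared index by index. This small sketch simply restates the figures reported above, recalling that higher CFI/TLI and lower RMSEA/SRMR indicate better fit; the variable names are illustrative.

```python
# Fit indices reported for the two candidate placements of the new item.
tv = {"CFI": .81, "TLI": .79, "RMSEA": .073, "SRMR": .078}
computer = {"CFI": .82, "TLI": .81, "RMSEA": .070, "SRMR": .072}

higher_is_better = ("CFI", "TLI")
better_on = [
    index for index in ("CFI", "TLI", "RMSEA", "SRMR")
    if (computer[index] > tv[index]) == (index in higher_is_better)
]
# The computer maths games placement improves on every reported index.
print(better_on)  # ['CFI', 'TLI', 'RMSEA', 'SRMR']
```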

Discussion
By following the procedures used by Hinkin (1998) and Nunes et al. (2005), the new PHMQ measure demonstrates construct, content, and criterion validity and satisfies APA standards for psychometric adequacy (APA, 1995; Hinkin, 1998), which was the ultimate objective of this scale development and validation process. The scale development process (Table 1: Study 1 and 2) addressed construct validity via two psychometric properties. Firstly, the five-factor structure of the frequency of maths activities scale found through the EFA was supported, and the scale scores demonstrated high levels of reliability (α = .76 to .81).
This high level of reliability is consistent with other studies in which a factor analysis was used to refine an HNE measure. For instance, LeFevre et al. (2009) reported reliabilities between .71 and .84 for their numeracy-related activities measure comprising four factors: 1) number skills, 2) games, 3) applications, and 4) number books. Kleemans et al. (2012) established two factors in their home numeracy questionnaire, 1) parent-child numeracy activities and 2) parents' numeracy expectations, with reliabilities of .76 and .83, respectively. Further, Lukie et al. (2014) established a four-factor model, 1) exploratory cognitive play, 2) active play, 3) crafts, and 4) screen time, within their child-interest scale, with reliabilities ranging from .60 to .79. LeFevre et al. (2009) used factor analysis to classify activities reported in the 1) number skills and 4) number books subscales as direct teaching activities and the 2) games and 3) applications factors as indirect experiences. However, the results of the factor analysis in the current study do not replicate LeFevre et al.'s (2009) findings of direct versus indirect experiences; instead, five separate subscales were identified: 1) parent-child interactions, 2) computer maths games, 3) TV programmes, 4) shape, and 5) counting.
Each of the studies mentioned above contributes to the growing body of research on the influence of the home environment on mathematical development. However, the unique aspect of the current PHMQ measure is its rigorous development through the use of both deductive and inductive approaches. Skwarchuk (2009) drew numerical content from a questionnaire, diary entries, and videotaped play sessions in a Canadian setting. Similar to Skwarchuk (2009), the aim here was to draw out mathematical content that occurred in the home through interviews, within a UK context. The literature demonstrates equivocal definitions (Cahoon et al., 2017), rendering it difficult to determine what defines an effective HNE that facilitates development in mathematics. This is further complicated by the lack of agreement on which parental involvement and interactions matter most. The current study broadens the definition of the HME through interviews with parents, allowing items to be generated inductively and therefore developing a comprehensive measure of the HME for pre-school children following well-established procedures such as Hinkin (1998) and Nunes et al. (2005). This rigorous approach ensures that the measure captures the actual HNE that young children experience.
The scale validation process (Table 1: Study 3 and 4) consisted of content and criterion validity. Content validity demonstrates that the themes included in the PHMQ are raised by parents in the interviews. The examination of criterion validity showed that there were clear differences between the views and experiences of parents with low and high scores across all five PHMQ subscales. One of the new items spontaneously raised by the parents was that their children watched a range of videos on YouTube, including educational videos. YouTube is widely used, with 37% of 3- to 4-year-olds and 54% of 5- to 7-year-olds using the YouTube app or website (OfCom, 2016). As confirmed by the interviews with parents, younger children mostly use YouTube to consume traditional, "TV-like" content (OfCom, 2016). Therefore, the item "Maths related YouTube videos" was placed within the computer maths games subscale of the frequency of maths activities scale based on the model fit indices from the CFA. A CFA was used to quantitatively assess the quality of the five-factor structure of the frequency of maths activities scale, offering evidence of the construct validity of the scale (Hinkin, 1998). Taking into consideration all criteria for assessing goodness of fit, the five-factor model was deemed a suitable measurement model, confirming the findings from the EFA (Study 2).
Overall, there are more numeracy-based items than broader mathematics-based items within the PHMQ. This is reflective of the target age group (ages 3-5 years) for the Pre-school Home Mathematics Questionnaire; the activities included in the questionnaire are therefore developmentally appropriate. The questionnaire is titled the Pre-school Home Mathematics Questionnaire because items broader than numeracy, such as shape and patterns, are included. Similar to Clements, Sarama, and Liu (2008), who created a measure to assess the mathematical knowledge and skills of children aged 3 to 7 years, the PHMQ broadly covers mathematics and is proportional to the amount of non-numeracy mathematics presented in pre-school.

Contribution to Research
As far as the authors are aware, this is the first study to use both an inductive and a deductive approach to develop an HME questionnaire, which increases the chance of content validity in the final scale (Hinkin, 1998). Previous scales (i.e., frequency of number activities scales) have rarely gone beyond creating items using a deductive approach. Further, these scales have rarely been validated beyond construct validity (e.g., LeFevre et al., 2009). Schoenfeldt (1984, p. 78) stated that "the construction of the measuring devices is perhaps the most important segment of any study." Therefore, the PHMQ, in particular the frequency of maths activities scale, was evaluated across five psychometric properties (i.e., construct validity, factor structure, scale score reliability, content validity, and criterion validity) and therefore satisfies APA standards for psychometric adequacy (APA, 1995; Hinkin, 1998). As with all questionnaire methods, the PHMQ is a self-report measure of the HNE and could be subject to social desirability bias. However, the PHMQ has been rigorously developed to allow researchers to obtain data efficiently to further understand how parents contribute to their pre-school child's learning. Therefore, the PHMQ is a good measure to use with parents of children between the ages of 3 and 5 as it is both developmentally appropriate and rigorously developed.
At this stage of the PHMQ development and validation, only one form of criterion validity has been included and no assessment of predictive validity has been reported. Due to the mixed findings in this area of research (Thompson et al., 2017), it is difficult to hypothesise what we would anticipate in terms of predictive validity. Thompson et al. (2017) examined studies relating HNE practices to mathematics performance and established that there are mixed findings in the literature. Some studies show positive relations (i.e., Anders et al., 2012; Niklas, Cohrssen, & Tayler, 2016), no significant relations (i.e., Blevins-Knabe et al., 2000; Missall et al., 2015), or negative relations (i.e., Blevins-Knabe & Musun-Miller, 1996) between HNE practices and mathematics performance. In fact, both positive and null relations (i.e., DeFlorio & Beliakoff, 2015; Zippert & Ramani, 2016) or both positive and negative relations (i.e., Skwarchuk, 2009) have been observed within the same study. Therefore, rather than focusing on the predictive nature of the PHMQ, we aimed to generate a robustly developed measure with good construct validity, factor structure, scale score reliability, content validity, and criterion validity. Thus, future research can utilise this measure to further assess whether a relationship between the HME and mathematical development truly exists. Moreover, Daucourt's (2019) meta-analysis on the relationship between the HME and mathematics performance found, on average, a very small effect (r = .14). One of the major limitations of previous studies is that the measurement development process in these studies either 1) references LeFevre et al.'s (2009) scale without further attention to age, cultural, or setting-specific concerns or 2) presents final items and only discusses internal consistency (e.g., Kleemans et al., 2012).
In measurement development, reporting a clear and transparent outline of the process that was undertaken to generate the final measure is essential (Hinkin, 1998). One of the core contributions of the current study is that we focus on the measurement development process and provide a model that can be used in other contexts across the numerical cognition field.
One of the issues that may be driving the inconsistency of findings in this area, is the lack of agreement on how the HME should be defined. Our study has addressed this issue by defining the HME from the perspective of the parent through the first study of the four presented in this paper (also see the initial qualitative research to this project, Cahoon et al., 2017). Therefore, the main aim of this paper was to rigorously develop and validate a measure of the home environment that went right back to redefining the HME and subsequently demonstrating high levels of content and criterion validity.
Further, this study goes beyond only including frequency of maths activities questions by including questions on children's maths literacy and counting ability. Additional dimensions/items were discovered and included, such as parent-child teaching methods (e.g., what specific things do parents say or do to encourage and support their child to learn maths?) and target child-sibling interactions (e.g., which numerical activities are siblings most likely to do together?). In addition to children interacting with their parents/caregivers at home, interactions with others, such as siblings, have been observed to play an important role in learning numerical concepts (Howe et al., 2015; Howe, Ross, & Recchia, 2011); however, these types of questions have rarely made it into HME questionnaires. Such interaction questions could allow researchers to investigate whether parent-child teaching methods and target child-sibling interactions help in the development of mathematical knowledge.

Limitations
Future research should attempt to align questionnaire measurement with other data collection techniques. This is particularly pertinent as the main focus of questionnaire-based HNE measurement is the frequency of activities. Future studies should also attempt to measure the quality of the content of these activities and interactions, which is a very difficult aspect to capture using questionnaires.
In both Studies 2 and 4 there were more participants in the high SES category, with the middle SES category having the fewest participants. Hence, although considerable efforts were made to acquire an equal spread of participants across SES, there were fewer parents in the middle and low SES categories than in the high SES category. However, this could be expected, as research has shown that lower SES parents are less likely than others to engage in their child's schooling (e.g., Braun, Noden, Hind, McNally, & West, 2005; Moon & Ivins, 2004; West, 2007).
Eight participant interviews were used to assess criterion validity, and although there were clear differences between the views and experiences of parents with low and high scores across all five PHMQ subscales, findings based on this limited sample size should be interpreted with caution. Further, it should be noted that the majority of items/questions within the PHMQ are numeracy related, which is developmentally appropriate for the intended age group. However, the questionnaire involves home environment relevant dimensions beyond numeracy; therefore, it has been called the PHMQ so as not to be misleading.

Future Recommendations
Most HME questionnaires have been developed and used in home environments that reflect the developed world, for example Canada, America, and, in the case of the current PHMQ, the UK. This is the first study within the UK to create an HME questionnaire that is culturally specific, where items are not simply drawn deductively from other HME questionnaires such as Melhuish et al. (2008). Hence, this HME questionnaire, alongside other available HME questionnaires, may be context specific. There is a need for an international measure that is developed and validated as rigorously as the current one, but for low-income country contexts. This study offers a theoretical and empirical framework for how an HME measure that reflects the home environment in low-income countries could be created and validated to meet APA standards for psychometric adequacy (APA, 1995).

Conclusion
Some HME questionnaires have not provided adequate information about item generation and refinement, scale dimensionality, scale score reliability, or validity (e.g., Kleemans et al., 2012; LeFevre et al., 2009; Melhuish et al., 2008). In the previous literature, a major weakness in studying the HNE has been the lack of information describing the psychometric integrity of the scales used to measure the construct. Nevertheless, these studies have made a widespread impact on home learning environment research, and the number of studies in this area has increased in recent years. The current study extends the rigour of HME questionnaire development and validation; it provides details on psychometric integrity, and the PHMQ appears to be psychometrically sound (Hinkin & Schriesheim, 1989; MacKenzie, Podsakoff, & Fetter, 1991). The PHMQ covers a vast array of HNE areas; thus, it is concluded that the PHMQ can be used to successfully describe the HNE that a parent creates for their child to learn numeracy. Every learning experience in the home is a shared learning experience for children, whether with parents or siblings. The PHMQ can allow researchers to quickly obtain data to understand how parents contribute to their child's learning of numeracy-related concepts and skills.

Funding:
The authors have no funding to report.