Math anxiety is a form of anxiety in which an individual experiences anxiety during specific math situations (Ashcraft & Ridley, 2005; Bieg et al., 2015). Math anxiety is relatively common. While some estimates suggest 15–20% of American college students may experience high levels of math anxiety (Ashcraft & Ridley, 2005), researchers have cautioned against using prevalence rates given that math anxiety is a continuous trait (Cipora et al., 2019). Studies suggest that children experience math anxiety as early as elementary school and levels of math anxiety increase during middle school, peak in early high school, and remain stable through college (Ahmed, 2018; Hembree, 1990; Luo et al., 2009).

Higher levels of math anxiety are associated with a number of adverse outcomes (e.g., Hembree, 1990; Luttenberger et al., 2018; Núñez-Peña et al., 2013). Studies suggest small to moderate negative correlations between math anxiety and math performance (i.e., scores on math achievement/aptitude tests) for elementary, secondary, and college-aged students (Barroso et al., 2021; Devine et al., 2012; Hembree, 1990; Ma, 1999; Ramirez et al., 2016; Zhang et al., 2019). Math anxiety is associated with increased avoidance of math-related situations, reduced intent to continue with math courses, reduced enrollment in math courses, enrollment in lower-level math courses, and poorer attitudes toward math (Ahmed, 2018; Ashcraft & Ridley, 2005; D’Ailly & Bergering, 1992; Hembree, 1990; Núñez-Peña et al., 2013; Suárez-Pellicioni et al., 2016). Students who experience high levels of math anxiety in middle school are less likely than their peers to be employed in science, technology, engineering, and math (STEM) jobs as adults (Ahmed, 2018), which tend to provide higher salaries and better projected growth than careers in other industries (Fayer et al., 2017; National Science Board, 2018).

Given the potential short- and long-term consequences of math anxiety in children and adolescents, it is important to have psychometrically sound measures that assess math anxiety in this population. Establishing measurement invariance is an important part of the validation process of a measure since differences in interpretation of the items may be accounting for score discrepancies between groups rather than true differences in the construct (Meredith & Teresi, 2006). The 9-item Abbreviated Math Anxiety Scale (AMAS) is one of the most widely used self-report measures of math anxiety (Hopko et al., 2003), and includes numerous international adaptations (Cipora et al., 2015; Primi et al., 2014; Schillinger et al., 2018). Previous studies using the AMAS with elementary and college-aged populations have demonstrated a two-factor model (i.e., factor 1 = anxiety about learning math, factor 2 = anxiety about math evaluation) provides a good fit (Cipora et al., 2015; Schillinger et al., 2018). Although measurement invariance across gender has been established for the AMAS among elementary school aged children and adults (Caviola et al., 2017; Hopko et al., 2003; Primi et al., 2014; Vahedi & Farrokhi, 2011), it has not yet been evaluated among middle school aged children. This is noteworthy since math anxiety increases significantly for almost one-quarter of students over the course of middle school (Ahmed, 2018) and girls tend to report higher levels of math anxiety than boys (Alexander & Martray, 1989; Bieg et al., 2015; Dowker et al., 2016; Goetz et al., 2013; Hill et al., 2016; Meece et al., 1990; Wang et al., 2014; Wigfield & Meece, 1988), even though gender differences in math performance appear to be small to negligible (Ashcraft & Ridley, 2005; Devine et al., 2012; Hill et al., 2016; Lindberg et al., 2010; Ma, 1999; Reilly et al., 2015). Furthermore, there are no existing studies with the AMAS that have tested a bifactor model in middle school students. 
Bifactor models allow for the simultaneous assessment of the common and independent effects of specific latent factors on scale items, and are increasingly being used in psychological research because they often fit the data better than higher-order and correlated-factors models (Bornovalova et al., 2020). The present study aimed to fill these important gaps in the empirical literature by examining the factor structure of the AMAS and the equivalence of that structure across middle school girls and boys. The following hypotheses were tested:

  1. A bifactor model will provide an improved fit for AMAS item scores compared to a two-factor solution, with resulting goodness-of-fit indices meeting or exceeding cutoff scores for adequate model fit (CFI ≥ 0.95, TLI ≥ 0.95, SRMR ≤ 0.09, RMSEA ≤ 0.06; Hu & Bentler, 1999).

  2. The AMAS total scale and Learning Math Anxiety (LMA) and Math Evaluation Anxiety (MEA) subscales will be invariant across male and female participants, as evidenced by increasingly constrained nested models meeting or exceeding cutoff criteria for adequate model fit in both groups (CFI ≥ 0.95, TLI ≥ 0.95, SRMR ≤ 0.09, RMSEA ≤ 0.06; Hu & Bentler, 1999), with decrements in CFI values less than or equal to 0.01 in magnitude (Cheung & Rensvold, 2002), and change in RMSEA values less than or equal to 0.015 in magnitude (Chen, 2007).

Materials and Methods

Participants

Participants for this study were 604 children recruited from two middle schools in Texas. Table 1 contains demographic information for the sample. The mean age of the sample was 12.99 years (SD = 0.78; range = 10–15 years). With regard to gender, 203 participants (33.6%) were male and 329 participants (54.5%) were female (missing = 72; 11.9%). The majority of participants were in the 7th grade (54.6%) and from a middle school in Central Texas (68.4%). With regard to ethnicity, 198 participants identified as Hispanic (32.8%) and 403 (66.7%) as Non-Hispanic (Missing = 3; 0.5%). Most participants in our sample had at least one parent who had obtained a minimum of a college degree (53.5%). Five-hundred-forty participants (89.4%) completed the study measures online and 64 (10.6%) completed the study measures in-person.

Table 1 Sample characteristics

Measures

The AMAS is a 9-item self-report questionnaire for assessing math anxiety (Hopko et al., 2003). Respondents rate their anxiety regarding typical situations involving math in school on a Likert-type scale ranging from 1 (low anxiety) to 5 (high anxiety). A total score was derived by summing responses for all items. Total scores ranged from 9 to 45, with higher scores indicating greater anxiety. Learning Math Anxiety (LMA; e.g., listening to a lecture in a math class, listening to another student explain a math formula) and Math Evaluation Anxiety (MEA; e.g., taking an examination in a math course, being given a pop quiz in a math class) subscales were also computed. The AMAS has demonstrated coefficient alphas of 0.86 for the total scale, 0.80 for the LMA subscale, and 0.81 for the MEA subscale among 11- to 13-year-old children, suggesting good internal consistency for children 11 years of age and older (Carey et al., 2017). The measure has demonstrated good 2-week test–retest reliability in adults, with coefficients of 0.85 for the total scale, 0.85 for the LMA subscale, and 0.83 for the MEA subscale (Hopko et al., 2003). In the current study, the AMAS full scale Cronbach’s alpha coefficient was 0.86 and the Learning Math Anxiety (LMA) Scale and Math Evaluation Anxiety (MEA) Scale alphas were 0.82 and 0.80, respectively.
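As a minimal sketch of the scoring and internal-consistency computations described above (not the authors' code), the following Python functions sum the nine ratings into total, LMA, and MEA scores and compute Cronbach's alpha from a respondents-by-items table. The function names are hypothetical, and the item-to-subscale assignment (items 1, 3, 6, 7, and 9 for LMA; items 2, 4, 5, and 8 for MEA) is the one specified for the two-factor model in this article.

```python
from statistics import variance

# Item-to-subscale assignment as specified in this article (1-based item numbers).
LMA_ITEMS = (1, 3, 6, 7, 9)
MEA_ITEMS = (2, 4, 5, 8)


def score_amas(responses):
    """Return total, LMA, and MEA sum scores for one respondent's nine 1-5 ratings."""
    if len(responses) != 9 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("expected nine ratings between 1 and 5")
    lma = sum(responses[i - 1] for i in LMA_ITEMS)
    mea = sum(responses[i - 1] for i in MEA_ITEMS)
    return {"total": lma + mea, "LMA": lma, "MEA": mea}


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents (rows) by items (columns)."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the respondents' sum scores.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

For example, a respondent who rates every item 1 receives a total score of 9 (LMA = 5, MEA = 4), and one who rates every item 5 receives a total score of 45, matching the 9 to 45 range given above.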

Demographics

Participants completed a demographic questionnaire that assessed for grade, age, race/ethnicity, gender, and parental highest level of education.

Procedures

Participants were recruited in-person and online from two middle schools in Texas during the 2018–2019 and 2019–2020 academic years. Participants were offered the chance to participate in a research study related to math anxiety. Students received recruitment materials in math class or via their school email address. Students were eligible if they were in middle school and provided consent to participate in the study. Students who were unable to read and write in English and/or who had an intellectual disability that precluded them from understanding the consent process and/or the administered measures were excluded from recruitment. Eligible participants completed questionnaires in their classroom or online, which included demographic questions and the AMAS. Participants were offered the opportunity to win one of eight $25 gift cards through completion of the study. Survey collection was anonymous, with the exception of participants providing the first three initials of their first and last names and consent to contact them and their parents by email if they were interested in being entered in the raffle for a gift card. All measures and recruitment and data collection practices were reviewed and approved by the Institutional Review Board of X University and the principals of the participating schools.

Data Analysis

Data analyses were performed using the lavaan and semPlot packages in R, Version 3.6.3 (R Core Team, 2020; Rosseel, 2012; Epskamp, 2015). For survey measures with missing data, a Missing Value Analysis was run to determine whether there were any patterns to the missing responses (Little, 1988). For data that were found to be missing at random, item responses were estimated using the expectation–maximization algorithm, an estimation algorithm for missing data based on maximum-likelihood estimation (Dempster et al., 1977; Schafer & Graham, 2002).

Model Fit

Confirmatory Factor Analysis (CFA) was conducted to test the latent structure of the AMAS items (Brown, 2014). Three types of models were tested for fit: (1) a one-factor model with all items loading onto a single latent factor (math anxiety), (2) a two-factor model with items loading onto two correlated latent factors [Learning Math Anxiety (LMA) and Math Evaluation Anxiety (MEA)], and (3) a bifactor model with items loading onto two orthogonal factors (LMA and MEA) and a common g-factor. For the two-factor model, items 1, 3, 6, 7, and 9 of the AMAS were specified to load onto the first factor (LMA) and items 2, 4, 5, and 8 were specified to load onto the second factor (MEA), which is consistent with previous studies (e.g., Hopko et al., 2003). Item coefficients were fixed to zero for the factor that the items were not expected to load onto (Bandalos, 2018). The first item loading onto each factor (i.e., item 1 on LMA and item 2 on MEA) was fixed to 1.0 as a marker indicator (Bandalos, 2018; Brown, 2014). For the bifactor model, items were specified to load onto the LMA and MEA factors in the same manner as the two-factor model and were also specified to load onto the g-factor. A two-factor model was expected to provide a better fit to the data than a one-factor model (Carey et al., 2017; Caviola et al., 2017). To our knowledge, a bifactor solution has not been previously tested for the AMAS. A bifactor model allows for the simultaneous assessment of both the specific, independent effects of the latent factors (i.e., the LMA and MEA factors) and the common, general effect on the items shared by the factors (i.e., g-factor; Chen et al., 2012). In other words, the bifactor model posits that the specific factors explain variance in the measured items above and beyond the variance explained by the common factor, which captures what is shared among all items.
If the bifactor model provided a better fit for the data than the two-factor model, it would provide additional support that the LMA and MEA factors measure separate constructs by controlling for the general factor underlying all item responses (Furtner et al., 2015).
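To make the relative complexity of the three specifications concrete, the sketch below counts free parameters and model degrees of freedom for nine indicators. It is an illustration under stated assumptions rather than a reproduction of the reported analyses: it assumes a covariance structure only (no mean structure), no correlated residuals, and one marker loading fixed per factor, including on the g-factor, which the text above does not state explicitly. The function name is hypothetical.

```python
LMA_ITEMS = (1, 3, 6, 7, 9)
MEA_ITEMS = (2, 4, 5, 8)
N_ITEMS = 9


def degrees_of_freedom(n_loadings, n_factors, n_factor_covariances):
    """Model df for a covariance-structure CFA with one marker loading per factor."""
    # Unique variances and covariances among nine items: 9 * 10 / 2 = 45.
    unique_moments = N_ITEMS * (N_ITEMS + 1) // 2
    # One loading per factor is fixed to 1.0 as a marker (assumed for g as well).
    free_loadings = n_loadings - n_factors
    free_parameters = (
        free_loadings
        + n_factors             # factor variances
        + n_factor_covariances  # zero when factors are orthogonal
        + N_ITEMS               # residual variances
    )
    return unique_moments - free_parameters


model_df = {
    "one-factor": degrees_of_freedom(N_ITEMS, 1, 0),
    "two-factor": degrees_of_freedom(len(LMA_ITEMS) + len(MEA_ITEMS), 2, 1),
    "bifactor": degrees_of_freedom(N_ITEMS + len(LMA_ITEMS) + len(MEA_ITEMS), 3, 0),
}
print(model_df)
```

Under these assumptions the bifactor model estimates the most parameters and therefore has the fewest degrees of freedom, which is one reason its fit is compared against the more parsimonious two-factor model rather than judged on its own.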

Maximum likelihood (ML) model estimation was used (Bandalos, 2018). The Satorra-Bentler (2010) scaled chi-square and robust standard error adjustments were applied (Bandalos, 2018; Li, 2016; Maydeu-Olivares, 2017). The chi-square statistic and goodness-of-fit indices were used to evaluate fit of the three models (Bandalos, 2018; Brown, 2014; Hooper et al., 2008). Because of the tendency for the chi-square test to over-reject good-fitting models, the resulting chi-square statistic was not weighted as heavily in assessing model fit in the present study as the other fit indices. The following absolute and comparative fit indices were used to test model fit based on recommendations by Hu and Bentler (1999): comparative fit index (CFI; Bentler, 1990), Tucker-Lewis index (TLI; Tucker & Lewis, 1973), standardized root mean square residual (SRMR; Bentler, 1995), and the root mean square error of approximation (RMSEA; Steiger & Lind, 1980). Goodness-of-fit index cutoff criteria recommended by Hu and Bentler (1999) were used to evaluate model fit (CFI ≥ 0.95, TLI ≥ 0.95, SRMR ≤ 0.09, RMSEA ≤ 0.06). The results for these indices were evaluated in conjunction to assess fit for the two-factor model (Hu & Bentler, 1999).
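The cutoff comparison described above can be sketched as a simple lookup. The function name and the example values below are hypothetical and are not results from this study; only the four cutoffs come from the text.

```python
# Cutoffs from Hu and Bentler (1999) as cited in this article.
CUTOFFS = {
    "CFI": (">=", 0.95),
    "TLI": (">=", 0.95),
    "SRMR": ("<=", 0.09),
    "RMSEA": ("<=", 0.06),
}


def evaluate_fit(indices):
    """Return, for each index, whether its value meets the cutoff for good fit."""
    results = {}
    for name, (direction, cutoff) in CUTOFFS.items():
        value = indices[name]
        results[name] = value >= cutoff if direction == ">=" else value <= cutoff
    return results


# Hypothetical values for illustration only; not results from this study.
example = {"CFI": 0.97, "TLI": 0.96, "SRMR": 0.04, "RMSEA": 0.07}
print(evaluate_fit(example))
```

In this made-up example three indices meet their cutoffs and the RMSEA does not, which is the kind of mixed pattern that calls for weighing the indices together rather than relying on any single one.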

Measurement Invariance

A Multigroup Confirmatory Factor Analysis (MGCFA) was used to investigate whether the AMAS demonstrated strong factorial invariance across the sample of girls and boys, in order to allow for the comparison of group means without item-specific biases (Meredith & Teresi, 2006). The following sequence of tests was conducted to evaluate configural, metric, and scalar invariance, respectively: (1) simultaneous analysis of equal form, (2) test of equal factor loadings, and (3) invariant intercepts analysis (Brown, 2014; Putnick & Bornstein, 2016). Invariance testing proceeded in a stepwise fashion, in which the least restricted solution was evaluated first, followed by nested models with increasingly restrictive constraints (Brown, 2014; Putnick & Bornstein, 2016).

In line with current conventions for invariance testing, multiple fit criteria were used to assess for measurement invariance between boys and girls (Putnick & Bornstein, 2016). Equivalence of model fit between the configural, metric, and scalar invariance models was evaluated using the scaled difference chi-square test (Satorra & Bentler, 2010) and Cheung and Rensvold’s (2002) suggested criterion of a decrement in the CFI of 0.01 or smaller in magnitude (|ΔCFI| ≤ 0.01). The ΔCFI criterion was weighted more heavily than the scaled difference chi-square test, as the latter is sensitive to sample size (Milfont & Fischer, 2010; Putnick & Bornstein, 2016).
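The stepwise comparison of nested models can be sketched as follows. This is an illustrative decision rule, not the authors' code: it applies the CFI decrement criterion described above together with the RMSEA change criterion of 0.015 stated in the second hypothesis (Chen, 2007). The function name, the small floating-point tolerance, and the example values are assumptions introduced for the sketch.

```python
MAX_CFI_DROP = 0.01       # Cheung and Rensvold (2002), as cited in this article
MAX_RMSEA_CHANGE = 0.015  # Chen (2007), as cited in the second hypothesis
TOL = 1e-9                # guards against floating-point rounding at the boundary


def compare_steps(steps):
    """Compare each model with the next, more constrained one.

    steps: list of (label, cfi, rmsea) ordered from least to most constrained.
    Returns a list of (comparison label, invariance supported) pairs.
    """
    verdicts = []
    for (prev_label, prev_cfi, prev_rmsea), (label, cfi, rmsea) in zip(steps, steps[1:]):
        cfi_drop = prev_cfi - cfi              # positive when fit worsens
        rmsea_change = abs(rmsea - prev_rmsea)
        supported = (cfi_drop <= MAX_CFI_DROP + TOL
                     and rmsea_change <= MAX_RMSEA_CHANGE + TOL)
        verdicts.append((f"{prev_label} -> {label}", supported))
    return verdicts


# Hypothetical fit values for illustration only; not results from this study.
example_steps = [
    ("configural", 0.980, 0.040),
    ("metric", 0.978, 0.041),
    ("scalar", 0.960, 0.050),
]
print(compare_steps(example_steps))
```

In this made-up example the metric step would be retained, while the scalar step would not, because the CFI drops by more than 0.01 when intercepts are constrained.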

Results

Missing Data

Of the 604 participants who completed the AMAS, two participants produced invalid responses to several items on the scale by either providing more than one response to an item or responding with a text rather than Likert response. These responses were excluded, and the items were treated as missing data. Of the remaining responses, 1.0% of item responses were missing on the AMAS. Little’s test of Missing Completely at Random (MCAR) was run to determine whether item responses on the AMAS scale were missing at random (Little, 1988). Little’s MCAR test was not significant, suggesting that item responses on the AMAS scale were missing completely at random, χ²(84) = 83.810, p = 0.485. Of the 604 participants who completed the AMAS, 72 participants (11.9%) did not report their gender and were excluded from the multigroup confirmatory factor analysis.

Descriptive Statistics

Responses on the AMAS total scale, LMA subscale, and MEA subscale were found to uphold the assumption of normality through examination of histograms, normal Q-Q plots, and skewness and kurtosis values of the data, which were within normal limits. There were no statistically significant mean differences between boys and girls on the AMAS total score or LMA or MEA subscales.

Model Fit

Fit indices for all models and differences in model fit are presented in Table 2. Item loadings are presented in Table 3, and descriptive statistics for items and inter-item correlations are presented in Table 4.

Table 2 Model fit indices and differences in fit
Table 3 AMAS item loadings by model
Table 4 AMAS item descriptive statistics and inter-item Pearson correlation coefficients

One-Factor Model

The first model tested was a one-factor (unidimensional) model, for which all items were loaded onto a single latent factor, Math Anxiety. Chi-square results from the ML model estimation with Satorra-Bentler adjustment produced a significant value, which initially suggested that the model did not provide adequate fit for the data. However, due to the strong tendency for the chi-square test to reject good-fitting models due to negligible discrepancies in fit function, other fit indices were weighted more highly in determining model fit (Brown, 2014; Hu & Bentler, 1999). When compared to the fit index values recommended by Hu and Bentler (1999) for identifying good-fitting models (CFI ≥ 0.95, TLI ≥ 0.95, SRMR ≤ 0.09, RMSEA ≤ 0.06), the model did not demonstrate a good fit. The one-factor model is depicted in Fig. 1.

Fig. 1 Standardized estimates of the one-factor model of the Abbreviated Math Anxiety Scale (AMAS) in middle school students. Note. MtA = math anxiety latent factor

Two-Factor Model

The second model tested was a two-factor (bidimensional) model, in which items were loaded onto two latent factors, Learning Math Anxiety (LMA) and Math Evaluation Anxiety (MEA), based on the findings of previous CFAs of the AMAS (e.g., Cipora et al., 2015; Hopko et al., 2003; Schillinger et al., 2018). Chi-square results from the ML model estimation with Satorra-Bentler adjustment produced a significant value. With the exception of the RMSEA, which was marginally above the cutoff value, comparisons of the model fit index values to Hu and Bentler’s (1999) recommended values suggested that the two-factor model demonstrated a good fit. The two-factor model findings were in line with those of previous research, which found that a two-factor model demonstrated a better fit for the AMAS in children and adults than a single-factor model (e.g., Carey et al., 2017; Caviola et al., 2017; Hopko et al., 2003). The LMA and MEA latent factors demonstrated a large correlation of r = 0.70, which is similar to previous findings and below the cutoff of 0.85 for problematic discriminant validity (Brown, 2014). The two-factor model is depicted in Fig. 2.

Fig. 2 Standardized estimates of the two-factor model of the Abbreviated Math Anxiety Scale (AMAS) in middle school students. Note. LMA = learning math anxiety subscale; MEA = math evaluation anxiety subscale

A chi-square test between the single-factor and two-factor models was significant, suggesting a difference in fit between the two models. Taking all of these findings into account, the two-factor model was found to demonstrate a good fit for the AMAS and improved fit over the single-factor model.

Bifactor Model

The third model tested was a bifactor model, for which items were fixed and loaded onto the LMA and MEA factors as described for the two-factor model above, except that all items were also set to load freely on the common g-factor. Since factors in bifactor models are orthogonal, the correlations between factors were set to 0. Chi-square results from the ML model estimation with Satorra-Bentler adjustment produced a non-significant value, which suggested that the model provided adequate fit for the AMAS items. Comparisons of the model fit index values to recommended values for good model fit suggested that the bifactor model demonstrated excellent fit. The bifactor model is depicted in Fig. 3.

Fig. 3 Standardized estimates of the bifactor model of the Abbreviated Math Anxiety Scale (AMAS) in middle school students. Note. LMA = learning math anxiety subscale; MEA = math evaluation anxiety subscale; g = g-factor

A chi-square test between the bifactor and two-factor models was significant, suggesting a significant difference in fit between the two models. Taking the fit results from the single-factor, two-factor, and bifactor models into account, the bifactor model provided an excellent fit and the best fit of the three models tested.

Model Fit by Gender

Increasingly restricted and nested multigroup confirmatory factor analyses were run using the bifactor model from the previous step to determine whether the factor structure of the AMAS was invariant across groups. For measurement invariance analysis, only participants who reported their gender were included (N = 532). As a preliminary step, the bifactor model, found to provide an excellent fit for the entire sample in the previous step, was evaluated for fit independently in boys and girls to determine whether the model provided a good fit for both groups and if a multigroup confirmatory factor analysis was further indicated. Fit indices for the combined bifactor model and bifactor models for boys and girls only are presented in Table 5.

Table 5 Bifactor model fit indices for boys and girls

Responses on all three AMAS scales were normally distributed for boys and girls as determined by examination of histograms, normal Q-Q plots, and skewness and kurtosis values of the data. For the bifactor model for boys and girls combined, chi-square results from the ML model estimation with Satorra-Bentler adjustment produced a significant value, and the fit index values suggested a good fit for the data. Standardized item loadings ranged between 0.06 and 0.69 on the LMA factor, 0.32–0.42 on the MEA factor, and 0.44–0.67 on the g-factor.

For the bifactor model using data from girls only, chi-square results from the ML model estimation with Satorra-Bentler adjustment produced a significant value and fit index values that suggested a good fit for the data. Standardized item loadings ranged between 0.04 and 0.71 on the LMA factor, 0.30–0.45 on the MEA factor, and 0.48–0.72 on the g-factor.

For the bifactor model using data from boys only, chi-square results from the ML model estimation with Satorra-Bentler adjustment produced a non-significant value and fit index values suggesting a good fit for the data. Standardized item loadings ranged between 0.11 and 0.79 on the LMA factor, 0.36–0.68 on the MEA factor, and 0.23–0.83 on the g-factor.

Because the bifactor model provided a good fit for both boys and girls independently, a confirmatory factor analysis was run with nested models to test for measurement invariance between the two groups. The model was increasingly constrained to evaluate invariance of model form (configural invariance), factor loadings (metric invariance), and item intercepts (scalar invariance) across gender. Fit indices for the nested models are presented in Table 6. Changes in CFI values were less than 0.01 across increasingly constrained models, suggesting invariance of form, factor loadings, and intercepts between genders. Although the Satorra-Bentler scaled difference chi-square test was significant when the unconstrained bifactor model was compared to the configural invariance model, the results of this test were weighted less heavily than the ΔCFI, as the scaled difference chi-square test has been found to be overly sensitive to trivial deviations in model fit in large samples (Milfont & Fischer, 2010; Putnick & Bornstein, 2016). The ΔCFI between the unconstrained and configural invariance models fell within Cheung and Rensvold’s (2002) cutoff criterion (|ΔCFI| ≤ 0.01), suggesting configural invariance of the AMAS across gender. The remainder of the increasingly constrained models produced non-significant Satorra-Bentler scaled difference chi-square results and |ΔCFI| ≤ 0.01, supporting invariance of item loadings and item intercepts across gender.

Table 6 Fit indices of nested bifactor gender measurement invariance models

Discussion

In order to confirm the factor structure of the AMAS and assess model fit in a middle school population, we performed Confirmatory Factor Analyses for one-factor, two-factor, and bifactor models. Previous studies have found a two-factor model to provide a good fit for the AMAS in college and elementary school populations in the USA and abroad (e.g., Carey et al., 2017; Caviola et al., 2017; Cipora et al., 2015; Schillinger et al., 2018). We therefore expected a two-factor model to provide an improved fit for the data compared to a unidimensional model, with items loading onto two subscales, Learning Math Anxiety (LMA) and Math Evaluation Anxiety (MEA; Hopko et al., 2003). We also sought to determine whether a bifactor model would provide a good and improved fit for the data compared to a two-factor model, as bifactor models allow for the simultaneous assessment of the common and independent effect of specific, latent factors on scale items (Bornovalova et al., 2020). In line with our hypotheses, we found that the two-factor model provided a good fit for the data and an improved fit over the one-factor model, which did not provide an adequate fit. Furthermore, we found that the bifactor model provided a superior fit for the data over the two-factor model. Taken as a whole, these findings suggest the bifactor model may be the best fitting model when assessing math anxiety in middle school students with the AMAS. Future studies that use the AMAS with other middle school populations should consider testing the bifactor model to determine if it provides a superior fit to the two-factor model. Replication of our findings in other middle school populations would underscore the importance of accounting for the unique characteristics of anxiety about learning math and anxiety about math evaluation while taking into account the considerable overlap among these two factors when assessing math anxiety in middle school students with the AMAS. 
Such information has the potential to not only inform best practices for assessing math anxiety in middle school students, but also treatment planning approaches in middle school students. For example, treating anxiety about learning math and anxiety about math evaluation as two distinct subsets of math anxiety in middle school students may result in poorer treatment outcomes than an approach that acknowledges the unique features of each factor and the considerable overlap between them.

Of note, two items produced relatively smaller item loadings across all models tested. Specifically, item 1 “Having to use tables in the back of a math book” and item 5 “Being given a homework assignment of many difficult problems that is due the next class meeting” produced standardized item loadings of less than 0.60 for the one-factor and two-factor models. These findings, in conjunction with previous research (Cipora et al., 2015; Schillinger et al., 2018), suggest that item 1 and item 5 may not be particularly salient items for assessing math anxiety in this population. Closer examination of item means presented in Table 4 indicates that participants reported relatively lower scores for item 1 and relatively higher scores for item 5 compared to other items. However, examination of the inter-item coefficients suggests that the magnitude of the correlations between items 1 and 5 and the other items on their respective subscales were large enough to indicate that they measure similar constructs (e.g. r > 0.20; Piedmont, 2014).

We propose several explanations for the weak loadings of these items onto the full scale and subscales of the AMAS. First, as schools are increasingly integrating technology into the classroom, especially during the COVID-19 pandemic, it is possible that many students have adopted the use of digital math texts rather than printed books for learning and completing assignments. It is, therefore, possible that today’s school-age students are not familiar with the process of turning to the back of a math book to check for reference tables. This hypothesis is supported by previous research (Cipora et al., 2015; Schillinger et al., 2018) and the authors’ anecdotal experience in administering the AMAS questionnaire in-person, when several students expressed confusion regarding the meaning of item 1 and indicated that they did not use printed textbooks in math class. Due to changes in the modern learning environment since the AMAS was first published in 2003, item 1 may no longer capture the construct that it was intended to measure.

Second, compared to the other items on the MEA scale that measure anxiety related to tests and quizzes, item 5 pertains to anxiety related to completing homework. Because homework may be viewed as a less threatening task than quizzes or tests, it is possible that item 5 measures a closely related, yet distinct construct (e.g., Math Assignment Anxiety). Other authors have argued that being given difficult homework involves both learning math and math evaluation anxiety (Cipora et al., 2015; Schillinger et al., 2018). Thus, it is possible that item 5 measures a subsection of math evaluation anxiety that participants in this particular sample found less anxiety-provoking or that it taps into both math learning and math evaluation anxiety.

Third, it is possible that both items represent additional latent factors that are not accounted for by the model. Given that the two items do not correlate strongly (r = 0.26), it is unlikely that they would load onto the same latent factor; they would instead represent separate factors. However, taking into account results of previous exploratory and confirmatory analyses supporting a two-factor solution for the AMAS, as well as the principle of parsimony (Vandekerckhove et al., 2015), this explanation is unlikely.

The bifactor model was found to provide a good fit for both boys and girls in the sample. Results of a multigroup confirmatory factor analysis indicated that the model was equivalent for boys and girls across form, factor loadings, and intercepts. As hypothesized, configural, metric, and scalar invariance were supported between middle school boys and girls. These results suggest that the AMAS demonstrates strong factorial invariance across gender for middle school students and can be used to measure math anxiety in middle-school aged boys and girls in an unbiased manner. That is, when differences between middle school boys and girls are reported on the AMAS, these differences are more likely a result of true differences in the construct rather than differences in interpretation of the items on the AMAS due to gender.

There were a number of limitations to this study. First, the majority of data was collected several months into nationwide lockdowns imposed in response to the COVID-19 pandemic. It is possible that changes in participant schedules, learning environments, and levels of stress related to environmental factors (e.g., COVID-19 related worry, economic impact, isolation, increased mental health difficulties) may have impacted ratings of math anxiety and other constructs. Second, data were collected in two separate settings with some participants completing questionnaires via paper-and-pencil in the classroom, while other students participated at home with questionnaires administered electronically. Although some research suggests that response differences to questionnaires administered electronically versus on paper tend to be negligible (Gwaltney et al., 2008; Mangunkusumo et al., 2005; Muehlhausen et al., 2015), specific research with the AMAS indicates there may be differences between online and paper and pencil AMAS administration (Cipora et al., 2017). Thus, differences between data collection modalities in our study may have impacted the study findings. Third, as we collected data from school children, a specially protected population, we did not penalize participants for skipping items that they did not wish to respond to. As a result, demographic information is missing for a number of participants. Despite this, our sample’s rate of completion was very close to those of large-scale online surveys with youth participants (Anderson & Jiang, 2018; Larson et al., 2011), suggesting that our rate of completion was within expected limits. Our study was only administered in English and we did not collect data on potential participants who were not able to participate because they did not speak English or because they had an intellectual disability that prevented them from understanding the consent process and/or the administered measures. 
The present study investigated measurement invariance of the AMAS across gender in middle school students; however, future research is needed to assess measurement invariance of the AMAS in this population across other characteristics including grade, ethnicity, and parental highest level of education. Finally, our sample population was homogenous in that the large majority of participants in the sample were White, not Hispanic, in 7th grade, and had parents with high levels of educational attainment. Therefore, results of the study may not generalize to other populations and the study would benefit from replication in a more racially, ethnically, economically, and age diverse sample.

In conclusion, the AMAS demonstrated factorial invariance for gender in a community sample of middle school students, suggesting that the items are interpreted in a similar manner by both boys and girls in this age group. A bifactor model provided the best fit for the measure, suggesting that the Math Evaluation Anxiety and Learning Math Anxiety subscales contribute additional variance in scores over and above the total scale. Overall, our findings suggest that the AMAS is a psychometrically sound measure of math anxiety for use in middle school-aged populations with similar demographic characteristics to our sample and can be used to compare differences in math anxiety between boys and girls in an unbiased manner.