Assessing the assessment measures for menstrual cycle symptoms


Journal of Psychosomatic Research 52 (2002) 223 – 237

Assessing the assessment measures for menstrual cycle symptoms: A guide for researchers and clinicians

A. Haywood (a, *), P. Slade (b), H. King (c)

(a) Department of Psychology, University of Sheffield, Western Bank, Sheffield S10 2TP, UK
(b) Department of Psychology, Clinical Psychology Unit, University of Sheffield, Western Bank, Sheffield S10 2TP, UK
(c) Directorate of Sexual and Reproductive Health, Central Health Clinic, Community Health Sheffield, 1 Mulberry Street, Sheffield S1 2PJ, UK

Received 24 April 2001; accepted 12 December 2001

Abstract

A review of measures of menstrual cycle symptoms is provided to enable researchers and clinicians to choose the method appropriate to their study requirements. In recent years, these measures have taken the form of retrospective questionnaires (rating the severity of symptoms from memory) and prospective diaries (daily checklists of symptoms). Many draw on aspects of the well-known retrospective questionnaires, the Menstrual Distress Questionnaire and the Premenstrual Assessment Form, in their development and validation. Each measure is briefly described, followed by comments on its development and psychometric properties and, finally, an evaluation of its strengths and weaknesses. The review concludes with an examination of the implications arising from it, and some recommendations that menstrual cycle researchers and clinicians may wish to consider as they decide upon the most appropriate measure for their needs. © 2002 Elsevier Science Inc. All rights reserved.

Keywords: Measurement; Menstrual cycle; Psychometric; Review

* Tel.: +44-114-222-6585. E-mail address: [email protected] (A. Haywood).

0022-3999/02/$ – see front matter © 2002 Elsevier Science Inc. All rights reserved.
PII: S0022-3999(02)00297-0

Introduction

Debate in the literature over the past 20 years regarding the aetiology of premenstrual syndrome (PMS) has led to the emergence of many conflicting theories. These range from a biological/hormonal perspective [1] to an interactional one (an amalgamation of biological factors, psychological processes and social circumstances) [2]. Researchers have also considered whether PMS is better described as a single entity or as incorporating different dimensions of change. Physical and psychological symptoms of PMS are believed to affect up to 75% of women with regular menstrual cycles. The most commonly cited symptoms include irritability, depression, anxiety, tension, crying easily/feeling tearful, headache/migraine and breast tenderness/soreness [3]. In sufferers, these symptoms are presumed to be significantly more severe premenstrually than in the postmenstrual phase. At one extreme of what many now view as a continuum of symptom experience is premenstrual dysphoric disorder (PMDD), said to affect 3–8% of women [4]. PMDD is categorised under mood disorders in the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) [5] as 'depression not otherwise specified,' whilst PMS is mentioned (Appendix B) as an 'associated feature' of PMDD. Endicott [6] provides a review of the history, evaluation and diagnosis of PMDD.

One crucial aspect of menstrual cycle research concerns the adequacy of the method by which symptoms are assessed. In recent years, menstrual cycle symptoms have been measured by a variety of instruments, including both retrospective reports (rating the severity of symptoms over a typical cycle from memory) and prospective/concurrent reports (daily checklists of symptoms). Retrospective questionnaires have been criticised for providing an inflated estimate of symptom severity [7] that is biased by cultural expectations and heavily reliant upon participants' memory of past menstrual-related symptoms [8]. Concurrent reports, the most popular method used today, are less reliant on memory but are demanding for participants and can produce biased samples through nonadherence. Nonadherence is more likely if the investigation lasts longer than 2 months [9], the period most consider the minimum rating time.

There is a bewildering array of instruments, which vary in their original development process and in subsequent attention to reliability and validity. Some were designed for use in the clinical setting and determine the presence or absence of PMS using dichotomous criteria [10]. Others measure symptoms on a scale ranging from not present to severe [11]. Whilst the former is useful in identifying higher levels of symptom severity, the latter allows the investigator to examine differing levels of change in a range of populations. Instruments should provide guidance on what constitutes a significant level of cyclical suffering, yet this information is often neglected in the development process. Consequently, many investigators follow guidelines from the 1983 National Institute of Mental Health (NIMH) Workshop [12]. These criteria regard symptoms as significant when, over 2 consecutive months, there is a 30% change in symptom intensity between the premenstrual period (the 6 days before the onset of menstruation) and days 5–10 of the cycle. Although these recommendations may serve as a starting point, it has been argued that both the required symptom intensity and the required degree of change need further specification [13].

This paper aims to enable both researchers and clinicians to understand the instruments' strengths and limitations and to select the most appropriate measure for their requirements. As retrospective and prospective measures each have advantages and disadvantages, both are included. The description and psychometric properties of each measure are presented in Tables 1 and 2 (retrospective and prospective instruments, respectively), and each description is followed by a critical evaluation of the measure's strengths and weaknesses.
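As a worked illustration of the NIMH guideline described above, the following minimal Python sketch applies a 30% threshold to daily ratings. It rests on assumptions the guideline itself leaves open: change is read as an increase expressed relative to the mean of cycle days 5–10, ratings come from a scale whose lowest point is above zero (e.g., 1–6), and the threshold must be met in each of two consecutive cycles. The function names and ratings are hypothetical, and the guideline can be operationalised in other ways (for example, relative to the range of the rating scale).

```python
from statistics import mean


def percent_change(premenstrual, days_5_to_10):
    """Percentage change of the mean premenstrual rating relative to the
    mean rating on cycle days 5-10 (an assumed baseline, not a fixed rule)."""
    baseline = mean(days_5_to_10)
    if baseline == 0:
        raise ValueError("baseline mean is zero; percentage change is undefined")
    return 100.0 * (mean(premenstrual) - baseline) / baseline


def meets_30_percent_rule(cycles, threshold=30.0, consecutive=2):
    """cycles: chronological list of (premenstrual_ratings, days_5_to_10_ratings),
    where premenstrual_ratings cover the 6 days before menstruation.
    Returns True once `consecutive` cycles in a row reach the threshold."""
    run = 0
    for premenstrual, days_5_to_10 in cycles:
        if percent_change(premenstrual, days_5_to_10) >= threshold:
            run += 1
            if run >= consecutive:
                return True
        else:
            run = 0  # the qualifying cycles must be consecutive
    return False


# Hypothetical daily ratings for one symptom on a 1-6 scale.
cycle_1 = ([4, 5, 5, 4, 5, 4], [3, 3, 3, 3, 3, 3])  # 50% increase
cycle_2 = ([4, 4, 4, 4, 4, 4], [3, 3, 3, 3, 3, 3])  # about 33% increase
print(meets_30_percent_rule([cycle_1, cycle_2]))  # True
```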

Retrospective measures

The menstrual distress questionnaire (MDQ) [14]

Description and development
Developed in the USA, the MDQ is one of the most widely cited questionnaires. It consists of 47 items rated on a six-point severity scale, from no experience to partially disabling experience of a symptom. Women retrospectively rate their symptoms in the premenstrual, menstrual and intermenstrual phases of their cycle, allowing comparisons between the follicular and luteal phases. Symptom scores are calculated for each phase to identify particular areas of complaint for individual women. No guidance is provided to help the researcher determine what constitutes a significant level of cyclical suffering.

Psychometric properties
Reliability issues. No tests of internal consistency were conducted, but test–retest reliability was assessed on data from 15 participants. Significant correlations of .57–.95 were reported between symptom ratings in cycles one and two.
Validity issues. Content validity. Items were derived from a review of previous research, open-ended questionnaires and/or interviews with women. Control symptoms from the Blatt Menopausal Index [15] were also included. The measure was tested on a sample of 839 wives of graduate students at a large western university, 472 of whom had no children. They were aged 20–30 years (mean 25.2 years). It is unclear whether these women were used in the measure's original derivation. If so, the homogeneity and relatively young age of the sample may compromise its content validity, as the sample does not represent a varied population. Consequently, the measure may omit symptoms experienced by older women or by women from different backgrounds.

Factor structure
Factor analysis revealed eight 'symptom clusters' (pain, concentration, behavioural change, autonomic reactions, water retention, negative affect, arousal and control). Richardson [8] has criticised this factor structure as an inadequate basis for future research, suggesting that an oblique solution (with correlated factors) would be more appropriate than the orthogonal (uncorrelated) rotation used by Moos. However, orthogonal rotation has been used by many other researchers [16], including those who have modified the MDQ [17]. Following the factor analysis, intercorrelations were computed to investigate relationships between the symptom scales and background data. Significant positive correlations were found between age and premenstrual symptom intensity, and between number of children and premenstrual symptom intensity. This further emphasises the need for adequate representation of parous and older women in development samples.
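To make the orthogonal versus oblique distinction concrete, the following minimal Python/NumPy sketch extracts principal-component loadings from an item correlation matrix and applies a varimax rotation, a widely used orthogonal rotation. It is not Moos' original procedure; the function names `pca_loadings` and `varimax` and the simulated item data are hypothetical. Because the rotation matrix is orthogonal, each item's communality (its sum of squared loadings) is unchanged and the factors stay uncorrelated; an oblique rotation, as Richardson recommends, would drop that constraint.

```python
import numpy as np


def pca_loadings(data, n_factors):
    """Principal-component loadings (eigenvector * sqrt(eigenvalue)) from the
    item correlation matrix; data is respondents x items."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_factors]    # keep the largest components
    return eigvecs[:, order] * np.sqrt(eigvals[order])


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation; returns (rotated_loadings, rotation_matrix)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = np.sum(rotated ** 2, axis=0)        # column sums of squared loadings
        target = rotated ** 3 - rotated * (col_ss / p)
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                            # nearest orthogonal matrix
        previous, criterion = criterion, np.sum(s)
        if previous != 0 and criterion < previous * (1 + tol):
            break                                    # criterion has stopped improving
    return loadings @ rotation, rotation


# Hypothetical data: six items driven by two latent dimensions plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))
pattern = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
items = latent @ pattern.T + 0.5 * rng.normal(size=(300, 6))

unrotated = pca_loadings(items, 2)
rotated, R = varimax(unrotated)
print(np.round(rotated, 2))
```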

User acceptability
The overcomplexity of the rating system (symptoms are rated for the premenstrual, menstrual and intermenstrual phases simultaneously) could compromise the measure's face validity.

Strengths and weaknesses
Although emphasis was placed on establishing the measure's test–retest reliability, its internal consistency and validity were inadequately addressed. This raises concern, as the MDQ has been used as the basis for many derivative measures. It has also been criticised for its complexity [17], for covering too broad a range of symptoms, and for not being specific to the changes in premenstrual tension syndrome

Table 1 Retrospective measures (for each measure: items and scale; reliability; validity; factor structure)

MDQ [14]
Items and scale: 47 items on a six-point scale (No experience of symptom to Partially disabling experience). Rated in the premenstrual, menstrual and intermenstrual phases of the cycle, so comparison with the follicular phase is possible. No guidance provided for the interpretation of scores.
Reliability: No tests of internal reliability were performed. Test–retest reliability performed on 15 cases.
Validity: Content: unclear whether the homogeneous sample used to test the measure was involved in its derivation. Criterion: no evidence. Construct: no evidence.
Factor structure: Criticised for providing an inadequate basis for future research. Intercorrelations investigated the relationship between symptom scales and background data.

Modified MDQ [17]
Items and scale: 35 items on a four-point scale (scale anchors not specified). Rated in the premenstrual phase only, so no comparison with the follicular phase is possible. No guidance provided for the interpretation of scores.
Reliability: No tests of internal reliability were performed.
Validity: Content: underpinned by Moos' MDQ, which may have questionable content validity. Criterion: administered as an interview. Construct: factors correlated with the GHQ.
Factor structure: Factor analysed, with factors similar to those of Moos' MDQ emerging.

PMTRS [10]
Items and scale: 36-item form with yes/no responses, plus a 10-item form for therapist use rated 0–4 (severe). Four subcategories developed to ensure only those severely affected would be diagnosed with PMTS.
Reliability: No tests of internal reliability were performed.
Validity: Content: measure less able to detect normative changes. Criterion: external assessment by psychiatrist and research nurse. Construct: women completed the MDQ, VAS, a state-trait anxiety measure and a depression measure.
Factor structure: Not factor analysed.

PAF [21]
Items and scale: 95 items on a six-point scale (Not Applicable to Extreme Change). Rated in the premenstrual phase only, so no comparison with the follicular phase is possible. Guidance is provided for the interpretation of scores and whether or not they meet the criteria for a 'disorder.'
Reliability: Evidence of internal consistency. No test–retest reliability.
Validity: Content: developed on nontreatment-seeking women, but fairly homogeneous and young. Criterion: narratives rated by two psychiatric social workers. Construct: compared with the POMS and MDQ.
Factor structure: Not factor analysed. Items grouped on intercorrelations and alpha coefficients of internal consistency.

Shortened PAF [23]
Items and scale: 33 items; scoring method not stated. Rated in the premenstrual phase only, so no comparison with the follicular phase is possible. Some guidance is provided for the interpretation of scores (133 or above severe, 115 moderate).
Reliability: Evidence of internal consistency. No test–retest reliability.
Validity: Content: included items endorsed by women in the development of the PAF. Criterion: no evidence. Construct: correlated with general well being.
Factor structure: Factor analysed.

Shortened PAF [24]
Items and scale: 20- and 10-item forms on a six-point scale (Not Present to Extreme Change). Rated in the premenstrual phase, so no comparison with the follicular phase is possible. No guidance is provided for the interpretation of scores.
Reliability: Evidence of internal consistency. Test–retest reliability.
Validity: Content: item reduction based on an inadequate sample. Criterion: no evidence. Construct: possible contamination by use of nicotine withdrawal symptoms.
Factor structure: 20-item version was factor analysed.

PEA [26]
Items and scale: 88 items on a four-point scale (Do not experience to Severely experienced). Rated in the premenstrual phase, so no comparison with the follicular phase is possible. No guidance provided for the interpretation of scores.
Reliability: Evidence of internal consistency. No test–retest reliability.
Validity: Content: developed by specialists, items reduced by interjudge agreement. Criterion: sample classified into severity groups. Construct: no evidence.
Factor structure: Factor analysed.


Table 2 Prospective measures Items and scale

Reliability

Validity

Factor structure

DSRS [27]

17 items on a six-point scale—Not experienced at all to Very large amount.

Split half reliability coefficients.

Not factor analysed.

Allows the examination of an individual’s symptom severity and classification into high/low levels of suffering.

Test–retest reliability (*questionable utility of test–retest in prospective measures).

Content—based on young nulliparous women, so may fail to encompass symptoms experienced by other groups. Criterion—premenstrual ratings correlated with criteria indicative of PMS.

No guidance provided for the interpretation of scores. DRF [29]

20 items on a six-point scale—Not experienced at all to Extreme.

No tests of internal reliability were performed.

Scores analysed for 5 days pre- and postmenstrually. Emphasis on ways of summarising complex sets of data.

COPE [11]

End-of-Day Questionnaire [33]

22 items on a four-point scale—Symptom not present to Severe/Intolerable. A follicular phase score of < 40 and luteal phase score of >42 is required as evidence of PMS.

No tests of internal consistency were performed. Test–retest reliability (*).

42 items on a five-point scale—Not present at all to Extreme.

Evidence of internal consistency.

No guidance provided for the interpretation of scores. 20 items on a nine-point scale—Not present at all to Extreme.

PMS defined as the presence of four symptoms rated > moderate on four of the 6 premenstrual days, and no more than two symptoms rated > moderate on no more than two of the 6 days following the menstrual week.

Factor analysis performed on 61–64 cases (low number for factor analysis).

Criterion—no evidence.

Analyses focussed on midcycle, premenstrual and menstrual phases.

MDQ-T [18]

Construct—unsatisfactory—modified MDQ used without reliability or validity checks. Content—sample may be unrepresentative of population.

No tests of internal reliability were performed.

Construct—scores compared with measures including family history of mental disorders. Content—developed from daily symptom reports of treatment seeking women. Criterion—interviews with 2 psychologists and psychiatrist. Profile of Mood States and Beck Depression Inventory used to establish validity. Construct—COPE correlated with POMS and BDI. Content—questionable as heavily biased towards how exercise makes one feel. Items taken from MDQ were changed from the original. ‘Classic’ symptoms of PMS omitted. Criterion—inadequate as no external assessment of women’s usual symptom levels. Construct—no evidence. Content—Moos’ MDQ modified for prospective use with confusing explanation of items. No rationale for omission of MDQ items. Criterion—women recruited to study on basis of GP/Gynaecologist’s diagnosis of PMS.

Construct—no evidence.

Not factor analysed. Discriminant Function Analysis used to demonstrate COPE’s ability to differentiate between sufferers and controls.

Factor analysed.

Not factor analysed.


Measure


(PMTS) [10]. This has led to the development of modified retrospective measures. The MDQ has also been modified as a concurrent method of symptom reporting [18]. These measures are described below.

Modified version of the MDQ [17]

Description and development
This British adaptation of the MDQ has not been widely cited in the literature since its development. It comprises 35 items rated on a four-point scale, instead of the six-point scale used by Moos. It was developed in response to the overcomplexity of Moos' original 'three-phase' MDQ and to concerns that some American symptom descriptors were both confusing and lengthy. No criteria are provided for the presence or absence of PMS, but women were considered sufferers if symptoms either appeared or were exacerbated during the premenstrual period. As women rate their symptoms only for this period, no comparison with the follicular phase is possible.

Psychometric properties
Reliability issues. No tests of internal consistency or test–retest reliability were performed.
Validity issues. Content validity. The measure arose from a pilot study of the Moos MDQ, with data collected from 50 women attending a London general practitioner (GP). Items eliminated from the reduced scale were those Moos had shown not to vary with menstrual phase. The 'distractibility' symptom was omitted as it was covered by 'difficulty in concentrating.' In addition, certain words were changed to suit a British population (e.g., 'hot flashes' to 'hot flushes,' 'cramps' to 'stomach pains,' 'insomnia' to 'sleep disturbance'). This modified version of the MDQ is underpinned by Moos' [14] version, which may have questionable content validity owing to the homogeneity of the original sample.
Criterion validity. The instrument was administered to a subsample of participants in the form of an interview.
This was to ascertain whether women identified as PMS sufferers (73 women) or nonsufferers (45 women) on the modified MDQ would be similarly identified on interview. Results showed high sensitivity in detecting positive cases of PMS (98.2%) but lower specificity (66.1%), with a considerable false positive rate.
Construct validity. This was addressed by correlating the seven factors (negative affect, pain, concentration, water retention, somatic behavioural change and skin disorders/breast pain) with the General Health Questionnaire (GHQ). The highest correlation with the GHQ was found for negative affect and the lowest for water retention.
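With the interview treated as the reference standard, sensitivity is the share of interview-confirmed sufferers that the questionnaire also flags, and specificity is the share of interview-confirmed nonsufferers that it correctly clears; a specificity of 66.1% therefore implies a false positive rate of about 34% among nonsufferers. The minimal Python sketch below shows the arithmetic. The class name `TwoByTwo` and the counts are hypothetical, since the full cross-tabulation is not reported here.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Questionnaire classification cross-tabulated against interview (reference)."""
    true_pos: int   # sufferer on questionnaire and on interview
    false_pos: int  # sufferer on questionnaire, nonsufferer on interview
    false_neg: int  # nonsufferer on questionnaire, sufferer on interview
    true_neg: int   # nonsufferer on both

    def sensitivity(self) -> float:
        # proportion of interview-confirmed cases that the questionnaire detects
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        # proportion of interview-confirmed noncases that the questionnaire clears
        return self.true_neg / (self.true_neg + self.false_pos)

    def false_positive_rate(self) -> float:
        # noncases wrongly flagged as cases
        return 1.0 - self.specificity()


# Hypothetical counts for illustration only; they are not the study's data.
table = TwoByTwo(true_pos=45, false_pos=10, false_neg=5, true_neg=40)
print(f"sensitivity={table.sensitivity():.2f}, specificity={table.specificity():.2f}")
```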


Factor structure
Data were obtained from a sample of 521 women from 25 GPs. However, no descriptive data are provided regarding the age and sociocultural characteristics of the sample. The 34 symptoms were intercorrelated and subjected to a principal components analysis with varimax rotation of the factor matrix. 'Strikingly similar factors' to those of Moos emerged; although these lend support to his initial factor structure, they are not explained in detail. It might have been useful to include the GHQ items in the factor analysis, to see whether they loaded on a factor separate from the modified MDQ items.

User acceptability
The measure has fewer items, and women need only rate the premenstrual phase of their cycle retrospectively, which they may find less complicated than the original.

Strengths and weaknesses
Although the measure's reliability was not assessed, there is a fair attempt at establishing its criterion and construct validity. Reducing the original scale from six to four points may lead to a loss of sensitivity, as women have less response choice. The measure appears easy for women to complete, but its focus on the premenstrual phase is less useful if the researcher or clinician is interested in patterns of change throughout the cycle.

Premenstrual tension rating scale (PMTRS) [10]

Description and development
A frequently used retrospective measure developed in the USA. It consists of a 36-item self-report section, with a 'yes/no' response to each symptom, plus a 10-item form rated 0–4 (severe) for therapist use. It was developed because of concerns that commonly used scales, such as the MDQ, covered a broad range of phenomena and were therefore less specific in identifying particular changes in PMTS. The self-report section and the 10-item form are designed to be used together to monitor changes in PMTS in women already diagnosed as sufferers.
To identify the levels of symptomatology necessary for diagnosis of PMTS, four subcategories were developed: (A) the presence of at least five descriptive symptoms indicative of the core changes that occur in PMTS (including feeling irritable, tense and fatigued); (B) a ‘severity factor’ which included social impairment and whether help was sought for symptoms (from another person or from medication); (C) symptoms had to be present for at least six preceding menstrual cycles; (D) symptoms only occurred during the premenstrual period, with relief soon after the onset of menses. In addition, it was established that women did not meet the criteria for other psychiatric disorders.
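Taken together, subcategories (A) to (D) and the exclusion of other psychiatric disorders form a conjunctive screening rule: every condition must hold. The minimal Python sketch below assumes each criterion has already been judged clinically. The class and field names are hypothetical, and criterion (B), which the source describes as combining social impairment and help-seeking, is simplified to a single yes/no flag.

```python
from dataclasses import dataclass


@dataclass
class PMTSAssessment:
    core_symptoms: int                # (A) number of core PMTS symptoms present
    severity_factor: bool             # (B) social impairment / help sought (simplified assumption)
    symptomatic_cycles: int           # (C) preceding cycles in which symptoms occurred
    premenstrual_only: bool           # (D) confined to premenstruum, relief soon after menses
    other_psychiatric_disorder: bool  # exclusion: meets criteria for another disorder


def meets_pmts_subcategories(a: PMTSAssessment) -> bool:
    """True only if all four subcategories hold and no other disorder is present."""
    return (
        a.core_symptoms >= 5
        and a.severity_factor
        and a.symptomatic_cycles >= 6
        and a.premenstrual_only
        and not a.other_psychiatric_disorder
    )


# Hypothetical woman: six core symptoms over eight cycles, otherwise eligible.
example = PMTSAssessment(6, True, 8, True, False)
print(meets_pmts_subcategories(example))  # True
```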


Psychometric properties
Reliability issues. No test–retest reliabilities were performed, and the internal consistency of the measure was not addressed.
Validity issues. Content validity. The PMTRS was constructed after an extensive review of the literature and clinical observation. The sample used to construct the measure consisted of 42 women aged 22–43 years (mean 32 years) who suffered from severe premenstrual symptoms. Most were married with children. Items were derived from a rank-ordered list of 27 MDQ items (4 physical and 23 emotional and behavioural items, including two positive affect items). These were combined with six items from the Carroll Depression Scale [19], selected on the basis of participants' responses during the evaluation period. The scale also included participants' own descriptions of symptoms.
Criterion validity. This was assessed through external assessment of women by a psychiatrist and a research nurse during the initial evaluation period, to confirm the principal inclusion criteria. During subsequent visits in the follicular and luteal phases of the cycle, women completed self-rating scales and took part in semistructured interviews, after which a research nurse and psychiatrist completed observer-rating scales. No details are provided regarding agreement between observer ratings and women's self-reports.
Construct validity. This was addressed indirectly: during the study period, women completed the MDQ, visual analogue scales, a measure of state-trait anxiety and a depression measure, to generate data for the item analysis that would form the PMTRS.

Factor structure
Factor analysis was not performed, but an item analysis of the MDQ identified symptoms that changed markedly between the follicular and premenstrual phases of the cycle.
Only women showing normative MDQ-Today follicular phase scores were included in the item analysis, and these values were compared with three previously studied populations (including Moos' 1968 sample). A rank-ordered list of 27 MDQ items was obtained from the item analysis.

User acceptability
The measure is easy for women to complete, but the self-report section requires a yes/no response to each of the 36 symptoms rather than the severity scale adopted by most other measures. This forced-choice response may be unacceptable to participants and could lack the sensitivity to detect less severe suffering. Condon [20] believed this dichotomy avoided the 'ambiguity inherent in more graduated responses, namely that many minor complaints result in the same score as a few major ones' (p. 545).
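Condon's argument, and the concern about sensitivity raised above, can both be seen in a small Python sketch comparing a summed graded score with a yes/no count. The 0–4 ratings and the assumption that a woman answers 'yes' only when a symptom is at least moderate (2 or more) are hypothetical; the PMTRS self-report section itself records only yes/no. Under these assumptions, eight mild complaints and two severe ones receive the same graded total, while the yes/no count separates them but gives the woman with eight mild complaints a score of zero.

```python
def graded_total(ratings):
    """Sum of graded severity ratings (hypothetical 0 = absent ... 4 = severe)."""
    return sum(ratings)


def yes_no_total(ratings, endorse_at=2):
    """Count of symptoms answered 'yes'.

    endorse_at is a hypothetical threshold: a symptom is assumed to be
    endorsed only when its underlying severity is at least moderate.
    """
    return sum(1 for r in ratings if r >= endorse_at)


many_minor = [1, 1, 1, 1, 1, 1, 1, 1]  # eight mild complaints
few_major = [4, 4, 0, 0, 0, 0, 0, 0]   # two severe complaints

print(graded_total(many_minor), graded_total(few_major))  # 8 8
print(yes_no_total(many_minor), yes_no_total(few_major))  # 0 2
```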

Strengths and weaknesses
Although the validity of the measure was addressed, less emphasis was placed on reliability and on assessment of the underlying factor structure. The therapist rating section of the form requires some prior knowledge of participants over time. This limits its utility as a research tool, as investigators may not have direct contact with participants. Strict entry criteria meant that only one in six volunteers was eligible to participate, resulting in a relatively small sample that was unrepresentative of the population at large. It was, however, developed on actual sufferers and is therefore more likely to measure symptoms relevant to a clinical condition than more normative levels of dysfunction.

The premenstrual assessment form (PAF) [21]

Description and development
Developed in the USA, the PAF is a well-used questionnaire designed to measure changes in mood, behaviour and physical condition during the premenstrual period. Ninety-five symptoms are rated for severity of change from 'usual self' (from 1, not applicable, to 6, extreme change) during the last three premenstrual periods. Item frequencies and intercorrelations were used to reduce the scale to these 95 items, and three types of summary scoring system were developed (bipolar continua, unipolar summary scales and typological categories). The PAF focuses on the severity of change from a woman's usual nonpremenstrual state, so no comparison with the follicular phase can be made. Guidelines are included to enable women to define the premenstrual period and its duration. The PAF enables the investigator to distinguish between premenstrual changes and other symptoms (e.g., depression and anxiety). Guidance is given for the interpretation of scores and whether or not they meet the criteria for a 'disorder.'

Psychometric properties
Reliability issues. Internal consistency coefficients of the unipolar scale scores revealed alphas above .7 in 13 out of 18 scales.
No test–retest reliabilities were performed.
Validity issues. Content validity. The initial pool of 150 negative and positive items was generated from the literature, existing questionnaires (including the MDQ) and the authors' experience. Female staff members reviewed these items and suggested additional ones. Data were obtained from 154 participants: 69 medical centre staff (mean age 34 years) and 85 student nurses. Preliminary analysis of these data reduced the 150 items to 95. The sample was not seeking treatment for premenstrual problems, which may help in detecting less severe manifestations of symptoms. Participants were not taking the contraceptive pill or any medication that may have influenced premenstrual changes. However, the sample may be considered homogeneous (69 medical centre staff and 85 student nurses) and at the younger end of the spectrum (mean 24 years).
Criterion validity. Two psychiatric social workers rated women's written narratives of their premenstrual changes, and their ratings were consistent with the changes reflected in the unipolar summary scale scores.
Construct validity. To investigate the overlap between the PAF and other questionnaires assessing premenstrual changes, PAF items were compared with items in the MDQ and the Profile of Mood States (POMS) [22]. This led to certain PAF items being described in greater detail than those appearing in the MDQ or POMS (e.g., seven items on the PAF describe the MDQ 'irritability' item).

Factor structure
Although factor analysis of scales is usual practice, the unipolar summary scales of the PAF were grouped on the basis of item intercorrelations and alpha coefficients of internal consistency. Richardson [8] has criticised this, stating that the PAF appeared to be developed 'more on the basis of clinical judgement rather than detailed statistical analysis' (p. 20).

User acceptability
Women may find the lengthy instructions, coupled with the number of items, tedious to complete.

Strengths and weaknesses
The reliability and validity of the PAF were well addressed, and it has received less criticism than the MDQ. It was designed to reflect the variability of PMS, as opposed to treating changes as a single entity. Concerns about its length have led to the development of shortened versions. It also appears to be a complex measure to score.

Shortened PAF [23]

Description and development
A retrospective form consisting of 33 items, developed in the USA. It has not been widely cited in the menstrual cycle literature since its development.
This measure identifies changes in mood, behaviour and physical well being during the week before the onset of menses, so no comparison can be made with the follicular phase. It is based on the original PAF and provides confirmatory evidence for the three main PAF subtypes, which are thought to be associated with premenstrual dysphoric change (PMDC). The authors suggest that a score of 115 is indicative of moderate symptoms, whilst a score of 133 or above suggests severe symptoms of PMDC. Half of the sample had slight or virtually no symptoms.
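The suggested cut-offs can be applied as a simple banding rule. In the minimal Python sketch below, the function name is hypothetical, and reading every total from 115 to 132 as 'moderate' is an assumption, because only the two anchor points (115 moderate, 133 or above severe) are given. The scoring method for the 33 items is not stated, so the totals depend on whichever scoring convention the user adopts.

```python
def shortened_paf_band(total: int) -> str:
    """Map a 33-item shortened PAF total to the authors' suggested bands.

    Assumption: totals from 115 up to 132 are treated as 'moderate'; the
    source gives only the anchors 115 (moderate) and 133 or above (severe).
    """
    if total >= 133:
        return "severe"
    if total >= 115:
        return "moderate"
    return "below the moderate cut-off"


for score in (90, 115, 128, 133, 150):
    print(score, shortened_paf_band(score))
```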


Psychometric properties
Reliability issues. Cronbach's alpha for internal reliability was high (.98). No test–retest reliabilities were performed.
Validity issues. Content validity. The original 95-item PAF was reduced to the 33 items most frequently endorsed as moderate or higher by the original 154 volunteers (described in the review of the PAF above). This version also included some less frequently endorsed items, ensuring that the main subscales of the PAF were represented.
Construct validity. The authors found a significant negative correlation between PMDC and general well being (GWB). GWB explained 33% of the variance when regressed with the modified PAF, and predicted the major part of the variance in the hostile depression, atypical depression and anxiety/physical distress factors. This is consistent with research linking PMDC with other depressive conditions, in which GWB may well be lowered.

Factor structure
Questionnaires were administered to 737 women (from an initial sample of 950). They were wives of military personnel aged 15–46 years (mean 27 years), were not pregnant and had menstruated during the preceding 3 months. The homogeneity of the sample could be criticised as unrepresentative of the population, but the sample size was large. Factor analysis using principal components analysis with varimax rotation revealed three main subtypes of PMDC ('hostile,' 'atypical' and 'anxious') and two minor subtypes ('organic' and 'positive well being').

User acceptability
The reduced number of items is likely to be acceptable to women. The paper does not make clear whether women rate their symptoms on the same scale as the PAF, or over the last three premenstrual periods as they do with the PAF.
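The alpha values quoted for the PAF and its shortened versions follow the standard formula alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score), for k items. The minimal Python/NumPy sketch below implements it; the function names and example data are hypothetical. The alpha-if-item-deleted helper mirrors the kind of item omission described for the 10-item PAF, where items not contributing to internal consistency were dropped.

```python
import numpy as np


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items score matrix (k >= 2 items)."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_var = scores.var(axis=0, ddof=1).sum()    # sum of item variances
    total_var = scores.sum(axis=1).var(ddof=1)     # variance of summed scale score
    return (k / (k - 1)) * (1 - item_var / total_var)


def alpha_if_item_deleted(scores):
    """Alpha recomputed with each item removed in turn."""
    scores = np.asarray(scores, dtype=float)
    return [cronbach_alpha(np.delete(scores, j, axis=1)) for j in range(scores.shape[1])]


# Hypothetical ratings: items 1 and 2 agree perfectly, item 3 does not.
noisy = [[1, 1, 3], [2, 2, 1], [3, 3, 4], [4, 4, 1], [5, 5, 5]]
print(round(cronbach_alpha(noisy), 3))                   # about 0.785
print([round(a, 3) for a in alpha_if_item_deleted(noisy)])
```

Here dropping the third item raises alpha, which is the signal used to omit an item when shortening a scale.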
Strengths and weaknesses
The primary focus of this paper was to examine, by regression analysis, modified PAF scores and PAF factor scores against various predictor variables evaluating stress and well being in a fairly homogeneous sample. Reliability was addressed, but there is insufficient evidence concerning the measure's utility as a measuring tool. There is no evidence of criterion-related validity, as no direct comparison is made with the follicular state.

Shortened PAF [24]

Description and development
A 20- and 10-item retrospective measure developed in the USA and not widely cited in the literature. As with the PAF, women were asked to rate symptoms during the premenstrual week on a scale of 1–6 (not present at all to extreme change). The measures were administered at baseline, 6 and 12 months. This shortened version of the PAF allows assessment of three PMS constructs (change in affect, water retention and pain) that occur in the 7 days before the onset of menses, compared with the nonpremenstrual state. As with the PAF and the shortened PAF [23], there is no direct comparison with the follicular state. Items were summed to provide the PAF score, but no guidance is given regarding interpretation of the score and what would constitute PMS.

Psychometric properties
Reliability issues. Internal consistency was assessed, and Cronbach's alpha was high (.95 at baseline) for the 20-item PAF. Alpha coefficients were then calculated for the three subscale scores ('affect,' 'water retention' and 'pain'), and items not contributing to internal consistency were omitted to form the 10-item scale. Alpha coefficients for the three subscales and the 10-item PAF were also high (.95 at baseline). Test–retest correlation coefficients for the 10- and 20-item PAF were .6–.7, with a correlation between subscales of .6 (P < .001).
Validity issues. Content validity. This measure reduces the 95-item PAF to 20 items, then further to 10 items. The 20 items were those found by Halbreich [21] to change most frequently in the week before the onset of menstruation. A subsample of 217 nontreatment-seeking women, from an original sample of 417 white women, was used to test the 20-item measure, and these data were used to develop the 10-item scale. The entire sample was taking part in a smoking cessation trial, had regular menses and was aged 22–65 years (mean 38 years). There are many concerns regarding the sample selection.
It could be argued that item reduction for the 10-item scale was based on an inadequate sample: the women were at the older end of the age range, and all were attempting to stop smoking (reporting an average of 26 cigarettes per day at baseline). Smoking cessation may account for the presence of certain symptoms, such as feeling under stress, weight gain and anxiety, and could have influenced the data. The sample consisted entirely of white women, yet evidence suggests that the type of symptoms reported may differ between ethnic groups [25], which makes it difficult to generalise to a wider population. The 10-item PAF is strongly weighted towards physical symptoms (a ratio of 6:4), and the greatly reduced number of items may fail to capture the full range of symptoms experienced by women. Construct validity. Scores on the 10-item PAF were correlated with symptoms of nicotine withdrawal (although these symptoms are not specified in the paper) to assess whether the scales were measuring separate symptom domains. The correlations were significant at each measurement point (averaging .4), suggesting some overlap of symptoms and possible contamination of the PAF scale, which may be due to the inappropriate sample used to develop the 10-item measure.
Factor structure
A principal components analysis with varimax rotation was performed on the 20-item PAF, revealing three factors (‘affect,’ ‘water retention’ and ‘pain’).
User acceptability
Both the 20- and 10-item PAF would be easier and quicker to complete than the original 95-item version.
Strengths and weaknesses
The utility of the 10- and 20-item PAF as measuring instruments is questionable. Although reliability was addressed, it is debatable whether these measures demonstrated adequate content validity. There is also no evidence of criterion-related validity.
Premenstrual experience assessment (PEA) [26]
Description and development
Developed in the USA, but not widely used or cited in the menstrual cycle literature. This measure has 88 items, rated on a four-point scale (from 0 = I don’t experience this symptom to 3 = severely experience symptom). Women rate the symptoms they experience in the week or two before their monthly period. The measure is designed to elicit information in several areas: medical/gynaecological history, psychological/life event stressors, sociocultural influences and premenstrual symptomatology. Because symptoms are measured only in the luteal phase, no comparison can be made with follicular scores. No guidance is given regarding interpretation of scores.
Psychometric properties
Reliability issues. Cronbach’s alpha coefficients for internal consistency were computed for each of the five factors (described below), with values between .74 and .92. It appears that no test–retest reliabilities were calculated.
Validity issues. Content validity.
A group of specialists gathered 150 symptoms of PMS over a 4-year period from interview protocols and clinical practice, rated from 0 = probably not a PMS symptom to 3 = definitely a contributing PMS symptom. Interjudge agreement was used to eliminate items, reducing the 150 items to 88. Criterion validity. Although no formal assessment of general premenstrual status was provided, validity was assessed by classifying the sample into severity groups based on their answers to the question: ‘How much discomfort do you experience before your monthly menstrual period?’ Women were instructed to choose one of three responses: Severe, Moderate or Mild (although no definition of ‘discomfort’ is provided). Analyses of variance yielded significant results, reported as evidence of the instrument’s ability to differentiate between factor profiles and levels of symptom severity.
Factor structure
A sample of 1011 nontreatment-seeking women was recruited through newspaper advertisements, flyers posted in universities, colleges and hospitals, and presentations in educational establishments. After screening, the final sample comprised 878 women aged 12–56 years (mean 28.4 years, a relatively young sample). Although attempts were made to recruit from wider backgrounds, the sample was predominantly white, raising the issue of generalisability (addressed above). A principal components analysis with varimax rotation identified five dimensions: (i) cognitive/attentional, (ii) heightened emotionality, (iii) physical complaints, (iv) sexual behaviour and feelings, and (v) eating behaviour and water retention. No eigenvalues are reported in the paper.
User acceptability
In addition to the 88 premenstrual symptom items, numerous other items are included, which participants may find tedious. However, the instructions ask how these symptoms interfere with women’s daily lives, a framing that some may find useful.
Strengths and weaknesses
Development of this measure was ‘data driven’ rather than grounded in preexisting measures. It provides information on women’s psychosexual, medical and gynaecological history, in addition to sociocultural influences and life stressors.
The PEA was tested on a nonclinical population and provides information on mild, moderate and severe levels of symptomatology. It draws on women’s perception of the type and severity of their premenstrual symptoms by asking how each symptom interferes with their daily lives. This is useful because a woman may experience a symptom severely without considering it a disruption to her life. Although relatively long and time consuming to complete, it has been recommended for use by researchers conducting clinical trials of PMS [3].
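Several measures in this review were analysed by principal components analysis with varimax rotation, and for the PEA no eigenvalues are reported. As an illustration of what such reporting involves, the minimal NumPy sketch below (not the procedure of any cited study) extracts the eigenvalues of the item correlation matrix, the proportion of total variance each component explains (the basis of a scree plot or the eigenvalue > 1 rule), and applies Kaiser’s varimax rotation to the retained loadings. The simulated ratings and the choice of two components are assumptions for illustration only.

```python
import numpy as np

def principal_components(ratings, n_components):
    """PCA of the item correlation matrix.

    ratings: (n_respondents, n_items) array of symptom ratings.
    Returns eigenvalues (largest first), the proportion of total
    variance each explains, and loadings for the retained components.
    """
    corr = np.corrcoef(ratings, rowvar=False)      # items x items
    eigvals, eigvecs = np.linalg.eigh(corr)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()            # eigenvalues sum to n_items
    loadings = eigvecs[:, :n_components] * np.sqrt(eigvals[:n_components])
    return eigvals, explained, loadings

def varimax(loadings, max_iter=100, tol=1e-8):
    """Kaiser's varimax: orthogonal rotation maximising loading variance."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                          # stays orthogonal
        if s.sum() < criterion * (1 + tol):
            break
        criterion = s.sum()
    return loadings @ rotation

# Hypothetical data: 200 respondents, 6 items driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
pattern = np.array([[0.8, 0.0], [0.7, 0.1], [0.75, 0.0],
                    [0.0, 0.8], [0.1, 0.7], [0.0, 0.75]])
ratings = latent @ pattern.T + 0.5 * rng.normal(size=(200, 6))

eigvals, explained, loadings = principal_components(ratings, n_components=2)
rotated = varimax(loadings)
print("eigenvalues:", np.round(eigvals, 2))
print("variance explained by 2 components:", round(explained[:2].sum(), 3))
print(np.round(rotated, 2))
```

Because varimax is an orthogonal rotation, each item’s communality (its row sum of squared loadings) is unchanged; only the distribution of loadings across components shifts, which is why reporting eigenvalues alongside rotated loadings matters.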

Concurrent (or prospective) methods of data collection
Although retrospective questionnaires are still used, the majority of studies now employ prospective or concurrent methods of data collection. This is partly due to research suggesting that retrospective instruments are likely to lead to stereotyped recording of symptoms, and increase the likelihood that women rate their symptoms as more severe by describing their ‘worst case’ [7]. DSM-IV also indicates that prospective daily ratings over two consecutive symptomatic cycles should be used to establish whether criteria for PMDD are met. Daily ratings allow the investigator to establish whether symptoms occur in the luteal phase and remit in the follicular phase, which is not possible with some of the retrospective measures described above.
The daily symptom rating scale (DSRS) [27]
Description and development
The DSRS was developed in Australia in 1979. Although not widely cited in the literature, it was designed to provide ratings of women’s subjective experience of symptoms during the menstrual cycle. It comprises 17 items (10 ‘affect’ and 7 ‘physical’), including three items reflecting ‘positive affect.’ Symptoms are rated for intensity on a 0–5 scale (not at all to very large amount), and five weeks of ratings are recorded on each form. The measure rates daily symptoms over the premenstrual, menstrual and intermenstrual phases of the cycle. It enables the investigator to examine the severity of symptoms in each cycle phase and to classify participants into those with high or low levels of symptoms. However, no guidance is given regarding interpretation of scores.
Psychometric properties
Reliability issues. Reliability of scores during the premenstrual week, menstruation, the remainder of the cycle and the complete cycle was assessed by calculating split-half reliability coefficients (all above .9). Scores during the premenstrual week were subjected to a variant of Kuder–Richardson’s k0 for nondichotomised data, giving a coefficient of .96 [28].
Test–retest reliabilities were calculated on data from 25 participants, giving coefficients above .80. However, because investigators expect scores to vary when using prospective methods of data collection, test–retest reliabilities might be considered inappropriate.
Validity issues. Content validity. The DSRS was constructed from items previously used in menstrual cycle symptom questionnaires and rating scales. Items were left unchanged, combined with others or excluded by the authors (e.g., skin changes and emotional lability). Taylor excluded items on the basis that ‘the item was inappropriate to a substantial proportion of the available population sample’ (p. 88). Omitting items in this way could have serious implications for the measure’s content validity, as the scale may fail to encompass symptoms experienced by different groups of women. The measure was tested on a sample of 65 women with a median age of 19 years (mainly hospital staff), 25 of whom completed the scale for two consecutive cycles. The content validity of the measure is, therefore, compromised, as the sample was biased towards young nulliparous women of a particular educational and occupational status. As Taylor himself noted, the population of menstruating women was not represented in the research, and some less commonly occurring symptoms were omitted. He also acknowledged that the criteria for some items (such as breast swelling and swelling of the face, hands and ankles) were imperfect. He believed ‘the selected criteria did tap at least some of the behaviour in question’ (p. 98), but evidence of ‘some behaviour’ may not be particularly useful.
Criterion validity. This was addressed by correlating premenstrual ratings with four criteria indicative of severe PMS: whether the woman had ever consulted a doctor about her symptoms, the number of tablets taken in the premenstrual week, self-rated symptom severity, and estimated severity relative to most women. Premenstrual scores on the DSRS (total, affective and somatic) correlated significantly with these criteria.
Construct validity. A modified version of the MDQ (reduced from 47 to 39 items) was used to obtain an account of women’s previous experience of menstrual and premenstrual symptoms. ‘Orderliness’ was replaced with sexual desire, rated from 1 (greatly increased) to 5 (greatly decreased). Moos’ items ‘weight gain’ and ‘changes in eating habits’ were altered to permit both increases and decreases, and scored in the same way as the ‘sexual desire’ item (although ‘change in eating habits’ does not actually appear on the MDQ). The original scoring system was also changed from a 1–6 to a 0–5 scale.
It appears that this modified MDQ was not subjected to any reliability or validity checks, and so may provide an inaccurate indication of participants’ menstrual and premenstrual symptoms.
Factor structure
No factor analysis was reported, although one would have been appropriate for examining the underlying scale structure. Items appear to have been selected on the basis of research evidence of commonly occurring symptoms.
User acceptability
The measure appears easy and quick to complete. Three items reflect positive affect, which women may find acceptable.
Strengths and weaknesses
The reliability of the measure is addressed in the paper, and its criterion validity is well addressed. However, its content validity is questionable and its factor structure is neglected. Taylor states that providing test–retest correlations may be seen as inappropriate, as scores are expected to change across different phases of the menstrual cycle. The measure may be able to detect less severe manifestations of symptoms, as no levels of perimenstrual symptomatology were specified in the inclusion criteria and the participants were sampled from a nonclinical population.
The daily rating form (DRF) [29]
Description and development
Developed in the USA as part of ongoing research into premenstrual changes [7], this 20-item measure has been widely used by other menstrual cycle researchers. Women rate the items every evening on a scale of 1 (not at all) to 6 (extreme). The measure is completed for an entire menstrual cycle, with analysis focusing on the 5 premenstrual and 5 postmenstrual days, so as to capture both the most severe changes and a stable pattern of lower ratings. The emphasis is less on detecting changes than on ways of summarising complex sets of data. The summary scale scores (described under ‘Factor structure’ below) give the investigator some flexibility, as they may be calculated in two ways: (1) to focus on premenstrual changes by contrasting pre- and postmenstrual scores; (2) to follow the pattern of one summary score over an entire cycle or a particular period within it.
Psychometric properties
Reliability issues. The authors did not directly address the reliability of the measure. Instead, they state that the study is ‘focused upon detection of patterns of change and their interrelationships’ (p. 129).
Validity issues. Content validity. Women aged 18–45 years were recruited through notices posted around two medical centres and in newspapers. The sample was not representative of the population, as only women whose scores indicated either moderate to severe dysphoric premenstrual changes or minimal or no dysphoric premenstrual changes were included.
Those reporting slight or mild changes were not studied further. Women were included or excluded on the basis of screening with the PAF and 1 month of daily ratings. Daily ratings were available for 100 subjects, of whom staff members rejected 36. Most of the 20 items were selected because ‘there is fairly consistent evidence that they describe changes that are more severe during the premenstrual than during the postmenstrual phase, e.g., fatigue, breast pain, depressed mood and irritability’ (p. 129). However, no references are provided to corroborate this. The form includes two items intended to determine the severity of social impairment, plus items relating to alcohol and drug use.

Criterion validity. The measure’s criterion validity could have been established by comparing the PAF scores and medical histories of the women in the sample with those of women excluded because they reported slight or mild changes, but this was not reported. Construct validity. Summary scale scores were compared with other measures, including family history of mental disorders (nonsignificant) and measures of gonadal hormones (results not reported). The relationship between the three clusters (described below) and lifetime diagnosis of affective disorder did not reach significance, but there was a trend towards a positive association (r = .24, P < .06). Whilst a low positive correlation with depressive symptoms provides some evidence of construct validity, care must be taken that overlap with a different diagnostic category does not contaminate the menstrual symptom measure.
Factor structure
Analyses were performed on 61–64 cases. Change scores were calculated for each subject on the 20 items, and a between-subjects factor analysis was run. Three- and five-factor solutions were each subjected to oblimin and varimax rotations. The oblimin rotation of the five-factor solution (‘dysphoric mood,’ ‘physical discomfort,’ ‘low energy,’ ‘consumption’ and ‘more alcohol, sex, active’) was chosen as the most clinically meaningful basis for a summary scoring system. The small number of cases used in the factor analysis may, however, have produced an unreliable factor solution [30]. Analysis of pre- and postmenstrual change scores gave a three-cluster solution, separating women into three groups according to their pattern of premenstrual change (moderately worse premenses, mildly worse premenses, and no change/worse postmenses).
User acceptability
The face validity of the measure is questionable because of its confusing layout: all previous ratings remain visible, enabling participants to make comparisons across days.
Several items of particular interest to the original researchers (e.g., on alcohol and drug use) remain on the DRF, but may be irrelevant to other investigators. There is space on the form for women to record contributory factors (e.g., life events) that may have influenced their ratings that day, allowing them to comment on any issues they feel are important to how they are feeling on a particular day.
Strengths and weaknesses
Although the reliability of the measure is not directly addressed in the paper, the DRF has been widely used in many different populations, and it has provided evidence for the notion of premenstrual changes rather than a unitary PMS. An updated version of the DRF has been developed [31], but no information is currently available regarding its reliability and validity.
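The DRF summary scoring described above rests on premenstrual-minus-postmenstrual change scores. The minimal sketch below computes, for each item, the mean rating over the premenstrual days minus the mean over the postmenstrual days, and then averages the change scores of a chosen subset of items into one summary score. Which days count as pre- and postmenstrual, and which items form each summary scale, are supplied by the caller; the two-item example and its grouping are hypothetical and are not the published DRF scoring key.

```python
def item_means(days):
    """Mean rating per item across days; days is a list of per-day item lists."""
    n_items = len(days[0])
    return [sum(day[i] for day in days) / len(days) for i in range(n_items)]

def change_scores(pre_days, post_days):
    """Premenstrual minus postmenstrual mean per item (positive = worse premenstrually)."""
    pre, post = item_means(pre_days), item_means(post_days)
    return [p - q for p, q in zip(pre, post)]

def summary_score(changes, item_indices):
    """Average change over the items forming one (hypothetical) summary scale."""
    return sum(changes[i] for i in item_indices) / len(item_indices)

# Hypothetical 1-6 ratings on two items (e.g., irritability, breast pain)
# for 5 premenstrual and 5 postmenstrual days.
pre = [[5, 4], [6, 4], [5, 3], [4, 4], [5, 5]]
post = [[1, 2], [2, 2], [1, 1], [1, 2], [2, 3]]
changes = change_scores(pre, post)
print(changes, summary_score(changes, [0, 1]))
```

Averaging within each phase before subtracting makes the change score less sensitive to a single unusually good or bad day, which fits the DRF’s emphasis on stable patterns rather than day-to-day fluctuation.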

Calendar of premenstrual experiences (COPE) [11]
Description and development
A well-cited measure developed in the USA, consisting of 22 items (10 ‘physical’ and 12 ‘behavioural’) rated on a four-point scale, from 0 = symptom not present to 3 = severe (intolerable, unable to perform normal activities). The measure was developed to identify PMS and provides the investigator with follicular and luteal phase cut-offs to distinguish sufferers from nonsufferers.
Psychometric properties
Reliability issues. There is no evidence of tests of internal consistency, but test–retest reliability was high (.78, P < .0001) when the calendar was administered in the same phase over two menstrual cycles. As mentioned above, it could be argued that test–retest reliabilities are not particularly useful with prospective methods of data collection.
Validity issues. Content validity. The measure was developed from the open-ended daily symptom reports of 170 women seeking treatment for PMS, collected over a 3-year period. The most commonly occurring symptoms were included in the calendar; it therefore appears that symptoms were not allocated by judges, but simply tabulated from the daily records kept by the women. Criterion validity. The completed scale was tested on women recruited through newspaper advertisements (and, thus, not actively seeking treatment for PMS). A sample of 54 women aged 18–45 years participated in the study: 36 were identified as PMS sufferers (mean age 35 years) and 18 as controls (mean age 33 years). Entry criteria for the PMS group can be summarised as the presence of at least one affective and one physical symptom in the 5 days before menses for 3 months, severe enough to cause social or work dysfunction, with relief within 4 days of the onset of menses and a symptom-free interval until at least day 12.
None of the women in the PMS group had a history of psychiatric disorder (assessed by interview with two psychologists and a psychiatrist) or any coexisting medical illness (assessed from their history and a physical examination). The POMS [22] and the Beck Depression Inventory (BDI) [32] were used to establish the concurrent validity of the COPE variables tension, depression, anger, fatigue and confusion. The measure shows evidence of predictive validity, as it correctly distinguished women in the PMS group from controls on 104 of 108 cycles (a 2.8% false negative rate and no false positives when used over two cycles). Construct validity. This was addressed, as scores on the POMS and BDI were significantly correlated with scores on the COPE.
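The COPE identifies PMS with a luteal and a follicular phase cut-off, and its predictive validity is summarised as overall agreement (104 of 108 cycles, about 96.3%) together with false negative and false positive rates. The minimal sketch below illustrates that logic. The cut-off values are hypothetical placeholders, not the published COPE thresholds, and the rule that a cycle is positive only when the luteal total reaches its cut-off while the follicular total stays at or below its own is an assumption about how the two cut-offs combine.

```python
def classify_cycle(luteal_total, follicular_total, luteal_cutoff, follicular_cutoff):
    """Flag a cycle as PMS-positive (assumed rule: high luteal AND low follicular)."""
    return luteal_total >= luteal_cutoff and follicular_total <= follicular_cutoff

def error_rates(predicted, actual):
    """Agreement, false negative rate and false positive rate for paired labels."""
    pairs = list(zip(predicted, actual))
    positives = sum(1 for _, a in pairs if a)
    negatives = len(pairs) - positives
    false_neg = sum(1 for p, a in pairs if a and not p)
    false_pos = sum(1 for p, a in pairs if p and not a)
    return {
        "agreement": sum(1 for p, a in pairs if p == a) / len(pairs),
        "false_negative_rate": false_neg / positives if positives else 0.0,
        "false_positive_rate": false_pos / negatives if negatives else 0.0,
    }

# Hypothetical cut-offs and (luteal, follicular) totals for four cycles.
LUTEAL_CUTOFF, FOLLICULAR_CUTOFF = 35, 20
cycles = [(48, 12), (30, 10), (52, 25), (15, 8)]
diagnosed = [True, True, False, False]   # e.g., from clinical screening
flags = [classify_cycle(l, f, LUTEAL_CUTOFF, FOLLICULAR_CUTOFF) for l, f in cycles]
print(flags, error_rates(flags, diagnosed))
print(round(104 / 108, 3))  # overall agreement reported for the COPE
```

Requiring a low follicular score as well as a high luteal score is what lets a prospective calendar separate cyclical symptoms from persistent ones, which a luteal-only measure cannot do.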

Factor structure
The calendar was not factor analysed. Items appear to be grouped according to how they correspond to factors on the POMS, as during the development phase five POMS factors were found to correspond with COPE items. Scores of PMS sufferers and controls were compared, and a discriminant function analysis was performed to demonstrate the calendar’s ability to distinguish between them.
User acceptability
This is a very simple measure to complete and is therefore likely to be acceptable to women. Its scoring system is uncomplicated, so the investigator can easily obtain luteal and follicular phase scores. As with other measures developed outside the UK, researchers may wish to change the wording of some items (e.g., ‘hot flashes’ to ‘hot flushes’). Certain terms may need further qualification, as they could be regarded as ambiguous (e.g., ‘overly sensitive’).
Strengths and weaknesses
The measure’s validity was addressed, but no internal reliability tests were performed, and the instrument was not subjected to factor analysis. The scoring system makes it possible to consider the physical and behavioural items separately as well as together. Items are fairly evenly divided between those reflecting physical (10) and behavioural (12) symptoms, which gives the researcher some flexibility in identifying the types of symptoms experienced by women. The authors state that the measure is applicable in both clinical and research settings. It was developed using treatment-seeking women, and during the testing phase candidates were extensively screened before entry to the study. As a result, fewer than one in 10 nontreatment-seeking volunteers were selected for inclusion in the PMS group, which may indicate that the measure is less able to detect changes in women presenting with more normative levels of symptomatology.
The method of recruitment may also raise issues regarding the characteristics of women likely to respond to newspaper advertisements.
The end of day questionnaire [33]
Description and development
This 42-item British measure rates the symptoms experienced on a particular day, on a five-point scale from 0 (not at all) to 4 (extremely). It has not been well cited in the literature since its development. Women provided ratings for the entire cycle, with analyses focusing on 5 days in each of the midcycle, premenstrual and menstrual phases. The measure was reported to be sensitive to the changes experienced by women. However, as with most of the instruments described above, no criteria are provided for identifying symptom levels that indicate the presence or absence of PMS.

Psychometric properties
Reliability issues. The internal consistency of the component-based scales was high (Cronbach’s alphas ranging from .75 to .95). Since developers of other prospective instruments have reported test–retest reliabilities, it should be noted that none were performed in this case.
Validity issues. Content validity. This is questionable, as the research focus was the effect of exercise on premenstrual symptoms. Adjectives believed to be sensitive to the effects of running on positive and negative mood in men and women were selected from a mood questionnaire. Further items (‘strong,’ ‘in control,’ ‘assertive,’ ‘powerful,’ ‘motivated’ and ‘tearful’) were chosen after interviews with 10 psychology postgraduate students about their feelings throughout their menstrual cycle. Five of these six adjectives are positive items that may equally describe how exercise makes one feel, and there is no explanation of how they came to be selected for the questionnaire. Physical items relevant to the menstrual cycle were taken from the MDQ [14]. However, when we examined Moos’ MDQ, several items were found to have been changed from the original. Moos’ ‘tension,’ for example, became ‘tense’ in Choi and Salmon’s questionnaire, which could alter the definition of the symptom. In addition, ‘difficulty concentrating’ was changed to a positive ‘concentrates well.’ Their ‘fatigue’ dimension (consisting of four symptoms instead of ‘take naps, stay in bed,’ which appears on the MDQ) could equally be exercise oriented, as could ‘sore breasts’ in place of ‘painful breasts.’ Furthermore, MDQ items that remained unchanged from the original may easily be interpreted as related to exercise (e.g., ‘active’ or ‘dizziness’). In addition, what could be considered the ‘classic symptoms’ of PMS, including anxiety, depression and mood swings, do not appear on the questionnaire. In a review of a number of measures, Budieri et al. [3] reported that these symptoms were present on 31, 37 and 30 questionnaires, respectively. Criterion validity. This was inadequately addressed, as there were no external assessments of the women’s usual levels of premenstrual symptomatology.
Factor structure
The sample used to test the measure consisted of 143 women, comprising competitive sportswomen, sedentary women, and high and low exercisers. They were monitored for 5 days during each of their midcycle, premenstrual and menstrual phases. Recruitment was through advertisements and leaflets placed in newspapers and in various establishments, including sports centres, department stores and London-based companies. In addition to the main sample, a further 91 students completed the questionnaire for a separate principal components analysis to standardise the scores.

A scree test determined the number of factors to retain for varimax rotation. Five components emerged after rotation, explaining 57.6% of the variance, labelled ‘positive affect,’ ‘negative affect,’ ‘physical symptoms,’ ‘fatigue’ and ‘irritability.’
User acceptability
The measure appears to be heavily biased towards items reflecting how exercise makes one feel. Women may therefore be irritated by what could be considered irrelevant descriptors of menstrual cycle symptoms (e.g., ‘vigorous’ or ‘difficulty breathing’). It does allow for the expression of positive as well as negative experiences, which women might find acceptable.
Strengths and weaknesses
Items reflecting ‘affect’ in the End-of-Day Questionnaire were heavily biased towards the effects of exercise rather than towards identifying premenstrual suffering. This may seriously compromise the measure’s content validity, as does the omission of certain key symptoms associated with premenstrual suffering. With these points in mind, the utility of this measure as a tool for identifying symptoms of premenstrual suffering more generally is questionable. The recruitment method may also raise issues about the characteristics of women likely to respond to newspaper advertisements.
The menstrual distress questionnaire–today (MDQ-T) [18]
Description and development
A British measure that has not been well cited in the literature. This 20-item scale is based on the retrospective MDQ, but women are asked to report on symptoms experienced that day. Items have been reduced from 47 to 20 and are rated on a 0–8 scale (from 0 = not at all to 8 = extreme). The measure was adapted for use in a study evaluating the effectiveness of cognitive therapy (CT) as a psychological treatment for PMS, compared with a wait list group, in a sample of women referred by their GP or gynaecologist.
Women completed the measure for 2 months, and PMS was defined as the presence of four symptoms of severity >4 (moderate) on 4 of the 6 premenstrual days, and no more than two symptoms of severity >4 on no more than 2 of the 6 days following the menstrual week.
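The MDQ-T definition of PMS is a rule that can be applied mechanically to daily ratings. The minimal sketch below encodes one reading of it, which is an assumption because the wording is open to interpretation: the premenstrual criterion is met when at least four items are rated above 4 on at least 4 of the 6 premenstrual days, and the postmenstrual criterion when no more than 2 of the 6 days after the menstrual week have any item above 4, with no more than two such items on any of those days. The function names and example ratings are hypothetical.

```python
def n_severe(day, threshold=4):
    """Number of items on one day rated above the threshold (0-8 scale)."""
    return sum(1 for score in day if score > threshold)

def meets_mdqt_pms(pre_days, post_days, threshold=4):
    """Apply one (assumed) reading of the MDQ-T PMS definition.

    pre_days: the 6 premenstrual days, each a list of item ratings.
    post_days: the 6 days following the menstrual week, same format.
    """
    severe_pre_days = sum(1 for day in pre_days if n_severe(day, threshold) >= 4)
    flagged_post = [n_severe(day, threshold) for day in post_days
                    if n_severe(day, threshold) > 0]
    premenstrual_ok = severe_pre_days >= 4
    postmenstrual_ok = len(flagged_post) <= 2 and all(n <= 2 for n in flagged_post)
    return premenstrual_ok and postmenstrual_ok

# Hypothetical 20-item days: four items rated 6 vs. all items rated 0.
bad_day = [6, 6, 6, 6] + [0] * 16
clear_day = [0] * 20
print(meets_mdqt_pms([bad_day] * 4 + [clear_day] * 2, [clear_day] * 6))
```

Writing the rule out this way makes the ambiguities explicit (for example, whether the same four symptoms must recur on each qualifying day), which is the kind of interpretive guidance the review notes is often missing.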

Psychometric properties
Reliability issues. No tests of reliability were performed on the measure. The 0–8 scale may make the measure more sensitive to change, but the large number of response options could compromise its reliability.
Validity issues. Content validity. The MDQ-T was modified from the original MDQ by extending the 1–6 severity scale to 0–8 and reducing the number of items from 47 to 20. Certain wording has been simplified from the original (e.g., ‘lowered motor coordination’ was changed to ‘clumsiness’). Similar items were grouped together and treated as one item (e.g., ‘decreased efficiency, lowered performance’). All the MDQ items in the ‘behavioural change’ category were grouped under the heading ‘I changed or avoided my usual activities today because of my symptoms.’ This is slightly confusing, as two of these items, ‘decreased efficiency’ and ‘lowered performance,’ actually appear on the modified version. Criterion validity. This was indirectly addressed, as women were recruited to the study on the basis of a GP’s or gynaecologist’s referral following a diagnosis of PMS. These women reported a 6-month history of premenstrual symptoms and had no coexisting medical or psychiatric problems. There was 62% agreement between the GP/gynaecologist diagnosis and confirmation by the daily ratings. Construct validity. This was not addressed, although the BDI and Beck Anxiety Inventory were used by the authors to examine the effect of treatment on depression and anxiety. It would therefore have been useful to correlate scores on these measures with the MDQ-T.
Factor structure
No factor analysis was performed.
User acceptability
This measure is simpler to complete than the original MDQ and is adapted for daily ratings, instructing women to report on the symptoms they experienced that day. Its reduced number of items and even distribution of psychological and physiological items should make it acceptable, and one item reflects positive change (‘feeling of well being’), which women might find useful. There is space at the bottom of each sheet for women to add comments for qualitative analysis, which gives the researcher a greater understanding of a participant’s particular situation.
Strengths and weaknesses
The modified MDQ is used without reliability and validity checks or adequate explanation of its derivation. Although it is easy to see why certain items were omitted from the MDQ (e.g., ‘general aches and pains’), no rationale is given for the omission of others (e.g., ‘skin disorders’).

Implications and recommendations
A variety of measuring instruments is available. All of them have weaknesses, and many draw on aspects of the MDQ or PAF in their development or validation. There may be scope to develop new instruments that are not constrained by preexisting measures but are more clearly grounded in the systematic, qualitative investigation of women’s experiences of their symptomatology. Until such instruments exist, we have to consider the available methods for measuring menstrual cycle symptoms in a systematic way, which has been the purpose of this review. To summarise, when deciding upon the most suitable measure for their purposes, researchers and clinicians need to consider the following points.
(1) The derivation of the scale, and whether symptom lists were derived ostensibly from sufferers or from a nonclinical population. This may raise questions about whether the scale is more effective in measuring severe or more normative changes. It is also important to consider the appropriateness of the participants in terms of age, parity, cultural group or employment status.
(2) The psychometric underpinnings, in terms of evidence of construct validity (through the use of other scales) and criterion validity (using other methods of assessing PMS, such as clinical evaluation). There should also be adequate checks of internal reliability and evidence of factor structure.
(3) Whether the scale measures symptoms or responses to symptoms (their impact on life), and how this fits with the purpose of the research.
(4) Whether the scale measures symptoms in both the premenstrual and follicular phases of the cycle, or in the premenstrual phase only.
(5) Whether adequate guidance is given for interpreting symptom levels, and whether symptoms are reported as a dichotomy (presence/absence of a symptom) or on a continuum (a severity scale).
(6) The general acceptability of the measure to women, in terms of length, ease of completion and clarity of items.
The literature suggests that researchers fall into a pattern of using one particular scale [34,35]. Whilst this lends consistency and comparability to studies, investigators may need to reconsider the alternatives regularly. It is hoped that this review will be of use in their decision-making.
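Points (2) and (4) above rest on two calculations that are simple to state but often left implicit. The first is an internal-consistency check on a scale's items. The second is a comparison of premenstrual with follicular-phase ratings. The Python sketch below illustrates both under stated assumptions:

- Cronbach's alpha is used here as one common internal-consistency statistic.
- The phase comparison is expressed as the percentage increase of the mean premenstrual rating over the mean follicular rating.
- The data layout, the function names (`cronbach_alpha`, `premenstrual_change`, `meets_change_threshold`) and the 30% cut-off are hypothetical choices made for illustration. They are not features of any measure reviewed here.
- Which cycle days count as premenstrual or follicular is left to the user.

```python
"""Illustrative helpers for the checks in points (2) and (4).

Hypothetical sketch: the function names, data layout and 30% cut-off are
assumptions for illustration, not part of any published menstrual cycle measure.
"""
from statistics import mean, pvariance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a respondents x items table of numeric ratings.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    Population variances are used throughout; the n / (n - 1) factor cancels.
    """
    k = len(ratings[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    if any(len(row) != k for row in ratings):
        raise ValueError("every respondent must rate every item")
    items = list(zip(*ratings))  # transpose: one tuple of ratings per item
    item_variance_sum = sum(pvariance(item) for item in items)
    total_variance = pvariance([sum(row) for row in ratings])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def premenstrual_change(premenstrual_days, follicular_days):
    """Percentage increase of the mean premenstrual rating over the follicular mean."""
    pre = mean(premenstrual_days)
    fol = mean(follicular_days)
    if fol == 0:
        raise ValueError("follicular mean is zero; percentage change is undefined")
    return 100 * (pre - fol) / fol


def meets_change_threshold(premenstrual_days, follicular_days, threshold=30.0):
    """True if the premenstrual increase is at least `threshold` percent (hypothetical cut-off)."""
    return premenstrual_change(premenstrual_days, follicular_days) >= threshold


if __name__ == "__main__":
    # Four hypothetical respondents rating three items on a 0-8 severity scale.
    ratings = [[2, 3, 3], [4, 4, 5], [1, 2, 1], [3, 3, 4]]
    print(f"alpha = {cronbach_alpha(ratings):.3f}")

    # Hypothetical daily ratings of one item for one woman in each phase.
    premenstrual = [6, 5, 7]
    follicular = [2, 2, 2]
    print(f"change = {premenstrual_change(premenstrual, follicular):.0f}%")
    print("meets threshold:", meets_change_threshold(premenstrual, follicular))
```

A percentage change is undefined when follicular ratings average zero, which is possible on scales where 0 means ‘absent’. The sketch raises an error in that case rather than guessing, and a researcher may prefer a simple difference score there.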

Acknowledgments
The authors would like to thank the ESRC and Community Health Sheffield, who are providing the funding for this research.

References
[1] Redei E, Freeman EW. Daily plasma estradiol and progesterone levels over the menstrual cycle and their relation to premenstrual symptoms. Psychoneuroendocrinology 1995;20(3):259–67.
[2] Gotts G, Morse C, Dennerstein L. Premenstrual complaints: an idiosyncratic syndrome. J Psychosom Obstet Gynecol 1995;16:29–35.
[3] Budieri DJ, Li Wan Po A, Dornan JC. Clinical trials of treatments of premenstrual syndrome: entry criteria and scales for measuring treatment outcomes. Br J Obstet Gynaecol 1994;101(4):689–95.
[4] Steiner M, Born L. Advances in the diagnosis and treatment of premenstrual dysphoric disorder. CNS Drugs 2000;13(4):287–304.
[5] American Psychiatric Association. Diagnostic and statistical manual of mental disorders. 4th ed. Washington, DC: American Psychiatric Association, 1994.
[6] Endicott J. History, evolution and diagnosis of pre-menstrual dysphoric disorder. J Clin Psychiatry 2000;61(12):5–8.
[7] Endicott J, Halbreich U. Retrospective reports of premenstrual depressive changes: factors affecting confirmation by daily ratings. Psychopharmacol Bull 1982;18(3):109–12.
[8] Richardson JTE. Questionnaire studies of paramenstrual symptoms. Psychol Women Q 1990;14:15–42.
[9] Hartley Gise L, Bebovits AH, Paddison PL, Strain JJ. Issues in the identification of premenstrual syndromes. J Nerv Ment Dis 1990;178(4):228–34.
[10] Steiner M, Haskett R, Carroll B. Premenstrual tension syndrome: the development of research diagnostic criteria and new rating scales. Acta Psychiatr Scand 1980;62:177–90.
[11] Mortola JF, Girton L, Beck L, Yen SSC. Diagnosis of premenstrual syndrome by a simple prospective and reliable instrument: the calendar of premenstrual experiences. Obstet Gynecol 1990;36:302–7.
[12] NIMH. Workshop on pre-menstrual syndrome. Rockville, MD: Cosponsored by the Centre for Studies of Affective Disorders and the Psychobiological Processes and Behavioral Medicine Section, April 1983, 14–15.
[13] Anderson M, Severino SK, Hurt SK, Williams NA. Pre-menstrual syndrome research: using the NIMH guidelines. J Clin Psychiatry 1988;49(12):484–9.
[14] Moos RH. The development of the menstrual distress questionnaire. Psychosom Med 1968;30:853–67.
[15] Blatt M, Wesbader H, Kupperman H. Vitamin E and climacteric syndrome. Arch Intern Med 1953;91:792.
[16] Morse C, Dennerstein L. The factor structure of symptom reports in premenstrual syndrome. J Psychosom Res 1988;32:93–8.
[17] Clare AW, Wiggins RD. The construction of a modified version of the menstrual distress questionnaire for use in general practice populations. London: Academic Press, 1979.
[18] Blake F, Salkovskis P, Gath D, Day A, Garrod A. Cognitive therapy for premenstrual syndrome: a controlled trial. J Psychosom Res 1998;45(4):307–18.
[19] Carroll BJ, Feinberg M, Smouse PE, Rawson SG, Greden JF. The Carroll rating-scale for depression. 1. Development, reliability and validation. Br J Psychiatry 1981;138:194–200.
[20] Condon JT. Investigation of the reliability and factor structure of a questionnaire for assessment of the premenstrual syndrome. J Psychosom Res 1993;37(5):543–51.
[21] Halbreich U, Endicott J, Schacht S, Nee J. The diversity of premenstrual changes as reflected in the premenstrual assessment form. Acta Psychiatr Scand 1982;65:46–65.
[22] McNair DM, Lorr M, Droppleman LS. EITS manual for the profile of mood states. San Diego: Educational and Industrial Testing Service, 1971.
[23] Rosen LN, Moghadam LZ, Endicott J. Psychosocial correlates of premenstrual dysphoric subtypes. Acta Psychiatr Scand 1988;77:446–53.
[24] Allen SS, McBride CM, Pirie PL. The shortened premenstrual assessment form. J Reprod Med 1991;36(11):769–72.
[25] Woods N, Most A, Dery GK. Estimating perimenstrual distress: a comparison of two methods. Res Nurs Health 1982;5:81–91.
[26] Futterman LA, Jones ME, Miccio-Fonseca LC, Quigley T. Assessing premenstrual syndrome using the premenstrual experiences assessment. Psychol Rep 1988;63:19–34.
[27] Taylor JW. The timing of menstruation-related symptoms assessed by a daily symptom rating scale. Acta Psychiatr Scand 1979;60:87–105.
[28] Cronbach LJ. Essentials of psychological testing. New York: Harper and Row, 1970.
[29] Endicott J, Nee J, Cohen J, Halbreich U. Premenstrual changes: patterns and correlates of daily ratings. J Affect Disord 1986;10:127–35.
[30] Hammond S. Introduction to multivariate data analysis. In: Breakwell GM, Hammond S, Fife-Schaw C, editors. Research methods in psychology. London: Sage, 1995. p. 377.
[31] Endicott J, Harrison W. Daily rating of severity of problems form. New York: Department of Research Assessment and Training, New York State Psychiatric Institute, 1990.
[32] Beck AT, Ward CH, Mendelson M, Mock J, Erbaugh J. An inventory for measuring depression. Arch Gen Psychiatry 1971;35:773–81.
[33] Choi PYL, Salmon P. Symptom changes across the menstrual cycle in competitive sportswomen, exercisers and sedentary women. Br J Clin Psychol 1995;34:447–60.
[34] Fontana AM, Palfai TG. Psychosocial factors in premenstrual dysphoria: stressors, appraisal and coping processes. J Psychosom Res 1994;38(6):557–67.
[35] Fontana AM, Pontari B. Menstrual-related perceptual changes in women with premenstrual syndrome: factors to consider in treatment. Counsel Psychol Q 1994;7(4):399–406.