A validation study revealed differences in design and performance of MEDLINE search filters for qualitative research


Journal Pre-proof

PII: S0895-4356(19)30817-0
DOI: https://doi.org/10.1016/j.jclinepi.2019.12.008
Reference: JCE 10032
To appear in: Journal of Clinical Epidemiology
Received Date: 11 September 2019
Revised Date: 27 November 2019
Accepted Date: 11 December 2019

Please cite this article as: Wagner M, Rosumeck S, Küffmeier C, Döring K, Euler U, A validation study revealed differences in design and performance of MEDLINE search filters for qualitative research, Journal of Clinical Epidemiology (2020), doi: https://doi.org/10.1016/j.jclinepi.2019.12.008. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 The Authors. Published by Elsevier Inc.

A validation study revealed differences in design and performance of MEDLINE search filters for qualitative research

Mandy Wagner*, Stefanie Rosumeck, Christian Küffmeier, Kristina Döring, Ulrike Euler

Institute for Quality Assurance and Transparency in Healthcare (IQTIG), Katharina-Heinroth-Ufer 1, 10787 Berlin, Germany

*Corresponding author: Mandy Wagner
E-Mail address: [email protected]
Postal address: Institute for Quality Assurance and Transparency in Healthcare (IQTIG), Katharina-Heinroth-Ufer 1, 10787 Berlin, Germany

Figures: 1 Tables: 4 Supplementary Tables: 1

Declarations of interest: All authors declare no conflicts of interest.


Abstract

Objectives: Several search filters exist to identify qualitative research, but so far none of them has been validated with an independent set of relevant references irrespective of a medical topic. The aim of this study was to provide a comparative overview of validation results for various MEDLINE search filters.

Study Design and Setting: Search filters were tested for plausibility. A relative recall approach was used to generate a gold standard based on an overview of systematic reviews of qualitative studies. For each review, the included qualitative studies were collected and checked for MEDLINE indexing. The body of indexed articles yielded the gold standard. Validation tests were conducted to determine whether the references of the gold standard could be identified with the respective search filters.

Results: Thirteen search filters were validated in MEDLINE. One search filter by Wong et al. (2004) was found to be the most sensitive (93.63%). The medical subject heading “qualitative research” achieved the best precision (2.15%) but the lowest sensitivity (22.56%). The University of Texas provided the best-balanced search filter, with a sensitivity of 81.96% and a precision of 0.80%.

Conclusion: Search filters to identify qualitative research in MEDLINE differ greatly in design and performance. The selection of the appropriate search filter depends on project-specific demands and resources.


Keywords Search filter; Qualitative Research; Information storage and retrieval; MEDLINE; Databases, bibliographic; Sensitivity and Precision

Running title: Validation of MEDLINE search filters for qualitative research

Word count: 3,560

What is new?

Key findings
• MEDLINE search filters for identification of qualitative research differ greatly in design and performance

What this adds to what is known?
• Initial validation of a search filter designed by University of Texas
• Comparative overview of validation performance of MEDLINE search filters using a uniform gold standard (independent from medical topic and with literature published in several years)

What is the implication, what should change now?
• It is not possible to suggest a single search filter to identify qualitative research in MEDLINE
• The choice for or against a search filter depends on project specific demands and resources


1 Introduction

Qualitative studies provide deep insights into patient preferences and experiences. In recent years, qualitative research has established itself as an important contribution to evidence-based healthcare. Therefore, studies of this kind should be considered for high-quality systematic reviews, as stated by the Cochrane Handbook for Systematic Reviews of Interventions [1, 2]. Qualitative research comprises many research designs, methods, and approaches such as interviews, focus groups, ethnography, and grounded theory. Furthermore, published qualitative studies are often unstructured and contain no or only insufficient standardized vocabulary [3]. In contrast to other study designs such as randomized controlled trials, it is therefore more difficult and complex to find qualitative studies in bibliographic databases [4, 5]. To obtain a comprehensive picture of qualitative studies as well as systematic reviews of qualitative studies on a specific medical or health-related topic, a search filter with a balanced ratio of sensitivity (retrieved relevant articles as a proportion of all relevant articles) and precision (retrieved relevant articles as a proportion of the total number of records found) can be a helpful tool for systematic searches in bibliographic databases [6].

Many search filters have not been adequately validated by means of an independent set of relevant references (gold standard). Moreover, gold standards (GS) are often restricted, lacking an optimal size and references from several years [7]. In addition, topic-specific GS limit generalizability. The aim of the present study was therefore to obtain an overview of already available search filters to identify qualitative research papers in the bibliographic database MEDLINE. To determine the filter with the highest sensitivity, the highest precision or the best balance between sensitivity and precision, all filters were validated with a newly generated GS without the constraint of a medical topic. With these validation results in mind, a search filter for MEDLINE can be selected with respect to individual requirements.

2 Methods

A four-step approach was used: (1) identification of search filters to locate qualitative research in the bibliographic database MEDLINE, (2) verification and plausibility checks of search filters, (3) generation of a gold standard irrespective of a medical or health-related topic, and (4) validation of the identified search filters against the newly created GS in MEDLINE using the Ovid platform.

2.1 Identification of search filters

A search was carried out to identify the most important search filters to locate qualitative research in the bibliographic database MEDLINE. To this end, well-known sources and websites, e.g. from the InterTASC Information Specialists' Subgroup (ISSG) [8], the Cochrane Handbook for Systematic Reviews of Interventions [1, 2], and the Health Information Research Unit (HiRU) of McMaster University [9], were extensively searched.

2.2 Verification and plausibility checks of search filters

All search filters for qualitative studies that were developed for the bibliographic database MEDLINE using the Ovid platform and that were plausible and completely reproducible were included in the validation.


To ensure the replicability of search filters, a plausibility check was performed on all search filters identified. If search filters contained considerable syntax errors or content inconsistencies, they were excluded from validation. Developers and authors were not contacted. The Medical Subject Heading (MeSH) “Qualitative Research” was introduced in 2003. However, some of the identified filters were developed before this controlled vocabulary term was implemented and have not been updated so far [10, 11]. In addition, the heading “Qualitative Research” comprises the sole narrower term “Hermeneutics”, with an entry date of 2014, resulting in two additional search filters (“Qualitative Research/” and “exp Qualitative Research/”) [12].
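Parts of such a plausibility check can be automated. The following is a minimal sketch, not the procedure used in this study, which was carried out manually. It assumes that a well-formed Ovid search line either ends in a field suffix such as ".tw." or ".mp.", ends in "/" (subject heading), or combines earlier search lines by number; the function names and the two patterns are hypothetical and cover only a small part of Ovid syntax.

```python
import re

# Assumed patterns (illustrative only, far from the full Ovid grammar):
# a free-text line ends in a field suffix such as ".tw." or ".ti,ab.",
# a combination line looks like "or/1-3" or "1 or 2".
FIELD_SUFFIX = re.compile(r"\.[a-z]{2}(?:,[a-z]{2})*\.$")
COMBINATION = re.compile(
    r"(?:or|and)/\d+(?:-\d+)?|\d+(?:\s+(?:or|and|not)\s+\d+)+",
    re.IGNORECASE,
)


def balanced(text: str) -> bool:
    """Return True if every parenthesis in the line is properly closed."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def check_line(line: str) -> list[str]:
    """List the problems found in a single search line (empty list if none)."""
    text = line.strip()
    problems = []
    if not balanced(text):
        problems.append("unbalanced parentheses")
    has_notation = (
        text.endswith("/")                    # subject heading
        or FIELD_SUFFIX.search(text)          # free-text line with field suffix
        or COMBINATION.fullmatch(text)        # combination of earlier lines
    )
    if not has_notation:
        problems.append("missing field notation")
    return problems


for line in ["interviews.tw.", "exp Qualitative Research/", "(grounded theory"]:
    print(line, "->", check_line(line) or "ok")
```

A check of this kind only flags formal problems. Content-related discrepancies, such as misspelled search terms or redundant search lines, still require manual review.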

2.3 Generation of a gold standard

A gold standard can be generated using various methods. On the one hand, journals in which topic-relevant publications can be expected can be hand searched [10, 11, 13]. On the other hand, relevant articles can also be identified in a bibliographic database [14]. In contrast to creating a GS by manually compiling references (hand search), which is very time-consuming, Sampson et al. [15] formed a GS by combining all studies that were included in various systematic reviews (relative recall approach). This allowed for the generation of a GS that includes relevant publications over a long period of time, regardless of a medical or health-related topic. The relative recall approach has already been used [16, 17] and was therefore employed in the context of this validation.

A review by Dalton et al. [18] formed the basis to generate the GS. The aim of Dalton et al. was to provide a descriptive overview of systematic reviews of qualitative studies listed in the Database of Abstracts of Reviews of Effects (DARE). A search for the period from 2009 to 2014, using the internal tagging system, yielded 145 published systematic reviews. Until 2014, DARE itself included records from MEDLINE, Embase, PsycINFO, PubMed and CINAHL, and from 2009 onwards it tagged relevant reviews as qualitative research [18, 19]. Therefore, the source for the GS can be considered comprehensive.

In a first step, all 145 systematic reviews of qualitative studies included in Dalton et al. [18] were obtained in full text and managed in EndNote. Thereafter, all primary studies included in these systematic reviews that were also available as journal articles or dissertations were loaded into an EndNote library. Grey literature and books were not considered because they are generally not indexed in bibliographic databases. There was no assessment as to whether the primary studies applied qualitative methods, as we assumed that they are all qualitative studies and thus fulfil our inclusion criteria. However, if the included studies in the systematic reviews were clearly divided into qualitative and quantitative primary studies, only qualitative studies were considered to compile the GS (e.g. Torquato Lopes et al. [20]). Lastly, duplicates were removed and only publications indexed in MEDLINE were added to the entire GS.
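The compilation steps described above can be summarized in a short sketch. It is illustrative only: in this study the references were managed in EndNote, and the record layout (the keys "id", "type" and "design"), the function name and the example identifiers are hypothetical.

```python
def build_gold_standard(reviews, medline_ids):
    """Pool the included studies of all reviews into one gold standard.

    reviews: list of reviews, each a list of study records (dicts) with the
             hypothetical keys 'id', 'type' and, optionally, 'design'.
    medline_ids: set of identifiers known to be indexed in MEDLINE.
    """
    seen = set()
    gold_standard = []
    for included_studies in reviews:
        for study in included_studies:
            # Books and grey literature are not considered.
            if study["type"] not in {"journal article", "dissertation"}:
                continue
            # Studies are dropped only if the review clearly labels them
            # as quantitative; no further assessment is made.
            if study.get("design") == "quantitative":
                continue
            # Remove duplicates across reviews.
            if study["id"] in seen:
                continue
            seen.add(study["id"])
            # Keep only references indexed in MEDLINE.
            if study["id"] in medline_ids:
                gold_standard.append(study["id"])
    return gold_standard


# Made-up example records, not data from the study.
reviews = [
    [{"id": "s1", "type": "journal article"}, {"id": "s2", "type": "book"}],
    [
        {"id": "s1", "type": "journal article"},
        {"id": "s3", "type": "journal article", "design": "quantitative"},
        {"id": "s4", "type": "dissertation"},
        {"id": "s5", "type": "journal article"},
    ],
]
gold = build_gold_standard(reviews, medline_ids={"s1", "s3", "s5"})
print(gold)
```

In this made-up example, only "s1" and "s5" remain: "s2" is a book, "s3" is labelled as quantitative, and "s4" is not indexed in MEDLINE.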

2.4 Validation of search filters

During the validation process it was determined whether the references of the GS could be identified with the respective filter in MEDLINE via Ovid. Based on the number of hits, the sensitivity, the precision and the number needed to read (NNR) were calculated (Table 1). The relative recall approach does not allow for the estimation of the true precision and NNR [17]. However, to enhance the comparability of search filter performances, the precision ratio was calculated, with the precision ratio set to 1 for the search filter with the highest sensitivity [17].
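The performance measures can be expressed compactly in code. The sketch below follows the definitions in Table 1 (a: gold-standard references retrieved, b: other records retrieved, c: gold-standard references missed); the class name, the function name and the example counts are hypothetical and do not reproduce values from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterResult:
    """Hit counts for one search filter, named as in Table 1."""
    name: str
    a: int  # gold-standard references retrieved by the filter
    b: int  # other records retrieved (total hits minus a)
    c: int  # gold-standard references not retrieved by the filter

    @property
    def sensitivity(self) -> float:
        """Sensitivity in percent: a / (a + c) * 100."""
        return self.a / (self.a + self.c) * 100

    @property
    def precision(self) -> float:
        """Precision in percent: a / (a + b) * 100."""
        return self.a / (self.a + self.b) * 100

    @property
    def nnr(self) -> float:
        """Number needed to read: (a + b) / a."""
        return (self.a + self.b) / self.a


def precision_ratios(results):
    """Precision of each filter relative to the most sensitive filter."""
    reference = max(results, key=lambda r: r.sensitivity)
    return {r.name: r.precision / reference.precision for r in results}


# Made-up counts for two hypothetical filters.
broad = FilterResult("broad", a=90, b=9910, c=10)
narrow = FilterResult("narrow", a=60, b=940, c=40)
ratios = precision_ratios([broad, narrow])
print(f"broad: sensitivity {broad.sensitivity:.1f}%, NNR {broad.nnr:.0f}")
print(f"narrow: precision ratio {ratios['narrow']:.1f}")
```

By construction, the most sensitive filter receives a precision ratio of 1, so the ratios of the remaining filters can be read as "times as precise as the most sensitive filter".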


3 Results

3.1 Identification of search filters

Thirteen search filters for qualitative research could be identified [10, 11, 14, 21, 22] as well as the MeSH-term "Qualitative Research" (exploded as well as unexploded [12]). Shaw et al. [22] evaluated three different filters ("thesaurus terms", "free-text terms", "broad-based terms") for several databases including MEDLINE via Ovid, where the GS consisted of references on support for breastfeeding. The McMaster University hedges team developed seven different search filters for qualitative studies that can be used for a variety of purposes (different "single term" and "two or three term" strategies) [11]. The gold standard used consisted of publications from 161 health care journals from the year 2000 that dealt with patient experiences and in which data were gathered using qualitative methods [11]. In a later publication, McKibbon and colleagues [10], also from the McMaster University hedges team, listed another filter that searched a single word in multiple fields. The aforementioned search filters were developed and published without taking into account the MeSH-term "Qualitative Research", introduced in 2003 [12]. The website of the University of Texas School of Public Health (UTHealth) lists a further, non-validated search strategy for qualitative studies that considers the MeSH-term mentioned above [21]. In a publication by DeJean et al. [14] an additional hybrid filter was validated. None of these search filters has been validated with a topic-independent GS covering published references from several years. Also, no comparison was made in terms of sensitivity and precision.

3.2 Verification and plausibility checks of search filters for qualitative research

A total of 15 search filters were tested for plausibility (Table 2). All seven different search filters by Wong et al. (Wonga-g) and the search filter from UTHealth were included for validation [11, 21]. The search filters Wongc ("maximizes sensitivity"), Wonge ("maximizes specificity") and Wongg ("best balance of sensitivity and specificity") can also be found on the McMaster University website [9]. The single term strategy “interviews.mp.”, identified in a paper by McKibbon et al. [10], could be a transfer error that arose while multiple hedges were tested primarily for the database PsycINFO. It is possible that the single term strategy “interviews.tw.” by Wong et al. (Wongb) [11], with a different field notation, was meant. As this uncertainty could not be resolved, “interviews.mp.” (McKibbon) was kept as another search filter to identify qualitative articles [10].

Of the three search filters developed by Shaw et al. [22], the search filter with the “free-text terms” (Shawb) was excluded from the validation because it contains both syntax inconsistencies (e.g. missing field notations in some search lines, incorrect parentheses) and content-related discrepancies (e.g. “speigelberg$.tw.” instead of “spiegelberg$.tw.”, redundant search lines). The hybrid filter from DeJean et al. [14] relies, amongst others, on Shawb without correcting syntax errors or inconsistencies. It was therefore excluded as well. Both aforementioned filters were constructed for medical and health-related topics with a special focus on women’s experiences. Shaw et al. [22] searched for the topic breastfeeding and DeJean et al. [14] searched for qualitative articles in the field of chronic obstructive pulmonary disease as well as early breast cancer. Including terms like “women’s stor*” or “feminis$” seems appropriate for these topics (if male breast cancer is excluded) but is not suitable for an overall search filter to locate qualitative research in general. For such a filter, terms like “men’s stor*” would also have to be incorporated. In conclusion, both search filters (Shawb [22], DeJean [14]) were excluded from the validation process. However, Shawa (“thesaurus terms”) and Shawc (“broad-based terms”) were included, as well as the MeSH-term "Qualitative Research" (exploded and unexploded; MeSHa and MeSHb [12]). The length of the search filters ranged from a single word [10, 11] or MeSH-term [12] up to very long search strategies including free-text search terms and/or controlled vocabulary [21, 22] (Table 2).

3.3 Generation of a gold standard

Of the 145 systematic reviews included in the publication by Dalton et al. [18], 131 were indexed in MEDLINE (90.3%, gold standard 1, Figure 1). Characteristics of these reviews can be found in the initial publication by Dalton et al. [18]. The reviews were published between 2009 and 2014 and focus on different medical or health-related topics, e.g. cancer, mental health, or diabetes. For the systematic literature searches described in the systematic reviews, between one and 26 bibliographic databases were employed (mean: 7). MEDLINE (via PubMed and/or via Ovid) was searched most often, in 96.6% of all systematic reviews (140 out of 145). CINAHL was accessed in 87.6%, PsycINFO in 82.8% and Embase in 63.4% of the systematic reviews.

In about 60% of included reviews, the search strategies included a search block to identify qualitative studies. This search block was very heterogeneous across the reviews; often only single words were used, such as "qualitative", "grounded theory" or "focus groups", without specifying a field notation. A clear reference to previously published search filters was very rare (see Supplementary Table). Due to a lack of reporting standards regarding search strategies, and in particular the applied search filters, it is unclear how often published filters were used or adapted for the literature search. However, almost all reviews took additional sources (hand searches of specific websites and journals, review of reference lists, contact with authors and experts, forward and backward citation of references, etc.) into account to ensure a systematic search for qualitative studies. In the remaining reviews the search was carried out without restriction to qualitative studies or no details about the entire literature search strategies were reported.

In total, the systematic reviews included 3,012 primary studies, of which 2,898 were recorded in an EndNote library, excluding clearly defined quantitative studies, books and grey literature. After deduplication, 2,715 articles remained, of which 2,192 references were indexed in MEDLINE (80.7%, gold standard 2, Figure 1). A critical appraisal of the included primary qualitative studies was done using a wide variety of tools in 133 systematic reviews (91.7%). Authors of 46 reviews planned to exclude studies due to their poor quality. The entire GS, consisting of qualitative studies and systematic reviews of qualitative studies, included a total of 2,323 references. The publication period of the references included ranged from 1968 to 2014.

3.4 Validation of search filters

The validation with the entire GS showed that the search filter Wongc, which is also preferred by McMaster University as a very sensitive search filter [9], had the best sensitivity in MEDLINE (93.63%) (Table 3). With this filter 2,175 out of 2,323 references could be retrieved. However, the NNR was 1,418 articles, indicating a very high screening workload. During validation the search filter Wongd [11] showed a very good sensitivity (92.25%) with a considerably reduced screening effort (NNR: 491 articles), as described by Wong and colleagues [11] with “best sensitivity - small decrease in sensitivity with large increase in specificity”. The best precision of 2.15% could be reached with the MeSH-term "Qualitative Research", but with the lowest sensitivity (22.56%) and a NNR of 47 articles (for both MeSHa and MeSHb) [12]. It was irrelevant whether the MeSH-term was exploded or not. The search filter of the UTHealth [21] demonstrated the best balance between sensitivity and precision (81.96% and 0.80%, respectively) for the entire GS. In this case, 126 articles need to be read to uncover one relevant article (Table 3). This search filter was thus 11.3 times more precise (precision ratio) than the very sensitive search filter Wongc [11].

If the GS is considered separately as systematic reviews of qualitative studies (gold standard 1) and primary studies (gold standard 2), the validation results are slightly different (Table 4). With GS 1, the best sensitivity could be reached with the Wongd search filter (97.71%). One hundred twenty-eight of the 131 references in GS 1 could be identified using this search filter. Furthermore, three search filters presented by Wong and colleagues [11] each reached a sensitivity of 96.18% (Wonge, Wongf, and Wongg). Wonge, on the other hand, retrieved 126 out of 131 references with a notable reduction in the expected screening workload (NNR: 518 articles). Moreover, the Wonge filter was 5.3 times more precise (precision ratio) than the search filter Wongd. Thus, Wonge achieved the best balance between sensitivity and precision. Wong et al. [11] found that Wongd has the best sensitivity when compared to their other developed two- or three-term strategies, keeping specificity ≥ 50%. The search filter Wongc, listed as the most sensitive filter by McMaster University [9, 11], performed worse in validation with GS 1 (Table 4, sensitivity of 87.79%) than with the entire GS (Table 3, sensitivity of 93.63%). Again, the MeSH-term searches yielded the best precision (0.360%), but with a low sensitivity of 47.33%.

4 Discussion

4.1 Main findings

To our knowledge, these findings provide a good overview of already available search filters for qualitative research for the bibliographic database MEDLINE. Based on the validation results, which were generated using a gold standard according to the relative recall approach, a search filter for MEDLINE may be selected with respect to the individual requirements.

The search filter Wongc (two- or three-term strategy), which is also listed on the McMaster University website, can be recommended for a very sensitive search in MEDLINE [9, 11]. Although this search filter was very sensitive, not all relevant qualitative studies from the GS were detected with this filter. This is probably due to the heterogeneous reporting in qualitative studies with no or only insufficient standardized vocabulary and many different research designs, methods, and approaches [3-5]. Although the MeSH-term searches (exploded and unexploded) have the best precision, they cannot be recommended as the sole search string due to the very poor sensitivity of about 23%, even if the NNR promises a low screening workload. With these MeSH-term searches only a few qualitative studies can be found without any claim to completeness (about 500 out of 2,323 from the entire GS), but the hits are probably very precise. Here, consistent tagging by indexers of the National Library of Medicine would have been desirable. Additionally, it would be beneficial if study authors used a more standardized wording to describe their qualitative studies.

However, with most of the search filters tested, both systematic reviews and primary qualitative studies can be retrieved with sufficiently good results. Only search filters that contain just the single word "interview" (with different field notations) showed very poor results in terms of sensitivity when only systematic reviews were considered (gold standard 1). Here, the use of longer search strategies or of MeSH-terms has proven to be much more advantageous.

To our knowledge, the search filter developed by the University of Texas [21] has never been validated, and the date and method of development are unknown. It considers the MeSH-term “Qualitative Research” and might fill the gap between the one- to three-term strategies by Wong et al. [11] and the very long search strategy by Shaw et al. [22]. Our findings show that this search filter obtained a good balance between sensitivity and precision. With its sufficiently good sensitivity combined with an appropriate workload for screening, it can be recommended for systematic searches on qualitative studies.

Our results show strong differences between the available search filters for qualitative studies in terms of sensitivity, precision and NNR. When searching for qualitative studies in MEDLINE, the selection of a suitable search filter therefore depends on the research question, the claim for completeness and the resources available for screening. Furthermore, a systematic search should include different bibliographic databases (not only MEDLINE). For example, the bibliographic database CINAHL may contain a good collection of qualitative studies, as shown for studies on dementia [5]. Furthermore, trials, registries and grey literature should also be included to get a comprehensive overview of a topic.

The relative recall approach by Sampson et al. [15] allows the creation of a GS regardless of a medical or health-related topic. The size of the GS and the long publication period are also of particular importance to improve the estimation of the validation measurements. In addition, the GS was checked for a possible selection bias. This would be conceivable, for example, if the majority of the 145 systematic reviews themselves had used only one particular published search filter to identify the qualitative studies. Since this assumption did not apply, the GS in place provides good generalizability. However, a proper calculation of precision and thus the NNR was not possible with the relative recall approach. For this, the exact number of irrelevant references found by the search filter would have been necessary. This number could only be approximately determined due to a lack of manual screening of the references, which probably led to an underestimation of the true precision and thus an overestimation of the NNR. The validation results also benefit from the fact that all search filters validated were developed exclusively for use in Ovid MEDLINE, so that no filters had to be translated (with the attendant problems of translation inaccuracies).
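The trade-off between completeness and screening workload can be made explicit with a simple selection rule. The sketch below is one possible way to operationalize it, not a recommendation derived from the study: it picks the filter with the lowest NNR among those that reach a required minimum sensitivity. The values are those reported in Section 3.4 for the entire GS; only the four filters whose sensitivity and NNR are stated in the text are included, and the function name is hypothetical.

```python
# (filter, sensitivity in %, NNR) as reported in Section 3.4 for the entire GS.
REPORTED = [
    ("Wongc", 93.63, 1418),
    ("Wongd", 92.25, 491),
    ("UTHealth", 81.96, 126),
    ("MeSH", 22.56, 47),
]


def pick_filter(min_sensitivity, results=REPORTED):
    """Return the filter with the lowest NNR that meets the sensitivity floor.

    Returns None if no filter reaches the required sensitivity.
    """
    eligible = [r for r in results if r[1] >= min_sensitivity]
    if not eligible:
        return None
    return min(eligible, key=lambda r: r[2])[0]


for floor in (80, 90, 93):
    print(f"sensitivity >= {floor}%: {pick_filter(floor)}")
```

Under this rule, a floor of 80% selects the UTHealth filter, a floor of 90% selects Wongd, and a floor of 93% leaves only Wongc. As the performance measures are rough estimations, such a rule can only support, not replace, a project-specific decision.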


4.2 Limitations

Overall, it needs to be emphasized that the calculated performance measures are only rough estimations. It cannot be ruled out that precision is underestimated, because the references retrieved by the individual search filters could still contain relevant qualitative articles that were not part of the entire GS. Furthermore, it is possible that some references of the entire GS are not qualitative research, because we did not manually verify this. To mitigate this, we additionally presented the precision ratio as a means of normalization to compare the search filters with each other. Likewise, GS 2 could be subject to selection bias, as some review authors aimed to exclude studies of poor quality. The magnitude of this impact could not be fully examined due to scarce reporting, but it seems to be acceptable. Since the GS were generated irrespective of a subject, there is a possibility that the performance measures may differ in combination with a specific medical topic. Nevertheless, this approach increases generalizability. It cannot be ruled out that using a GS created from publications based on systematic reviews (including systematic searches using specific words referring to qualitative research, see Supplementary Table) influences the presented validation results. The true informative value of the measurements in everyday use depends on the combination of the search filter with the topic-specific search terms and other limitations. For example, the true values of the performance measures can change considerably when a thematic search block is added. In order to ensure that all qualitative studies are retrieved, it should also be examined whether the use of a restrictive search filter for qualitative studies should be omitted completely.

5 Conclusions

To the best of our knowledge, these are validation results for a broad range of published search filters in MEDLINE using a GS irrespective of a restrictive subject following the relative recall approach by Sampson and colleagues [15]. In this context, the filter Wongc can be recommended as a very sensitive search filter to identify primary studies and systematic reviews of qualitative research in MEDLINE. However, this comes along with a high number of articles to be read, which would be necessary to identify the relevant ones. The search filter of the University of Texas represents an appropriate choice if a sensitivity of above 80% is desired while minimizing the workload.

Artwork and Tables with Captions

Figure 1: Generation of gold standard
Table 1: Calculations
Table 2: Search strategies of all validated search filters
Table 3: Validation results with the entire gold standard (2,323 articles)
Table 4: Validation results with gold standard 1 (131 references, left) and gold standard 2 (2,192 references, right)
Supplementary table: Qualitative research search strategy of 145 systematic reviews included in Dalton et al. (2017)

Funding: This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Acknowledgment: The authors gratefully acknowledge Dr. Laura Haas for proof-reading and her valuable input.


References

[1] Noyes J, Popay J, Pearson A, Hannes K, Booth A. Chapter 20: Qualitative research and Cochrane reviews. Version 5.1.0. Updated 20.03.2011. London: Cochrane Collaboration. URL: http://handbook-5-1.cochrane.org/. Accessed August 26, 2019.
[2] Booth A. Chapter 3: Searching for Studies. In: Noyes J, Booth A, Hannes K, Harden A, Harris J, Lewin S, et al. (editors): Supplementary Guidance for Inclusion of Qualitative Research in Cochrane Systematic Reviews of Interventions. Version 1 (updated August 2011). Cochrane Collaboration Qualitative Methods Group, 2011. URL: http://methods.cochrane.org/qi/supplemental-handbook-guidance. Accessed August 26, 2019.
[3] Evans D. Database searches for qualitative research. J Med Libr Assoc 2002;90(3): 290-3. URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC116400/pdf/i0025-7338-090-03-0290.pdf.
[4] Booth A. Searching for qualitative research for inclusion in systematic reviews: a structured methodological review. Systematic Reviews 2016;5(74). DOI: 10.1186/s13643-016-0249-x.
[5] Rogers M, Bethel A, Abbott R. Locating qualitative studies in dementia on MEDLINE, EMBASE, CINAHL, and PsycINFO: A comparison of search strategies. Research Synthesis Methods 2017, Epub 28.10.2017. DOI: 10.1002/jrsm.1280.
[6] Beale S, Duffy S, Glanville J, Lefebvre C, Wright D, McCool R, et al. Choosing and using methodological search filters: searchers’ views. Health Information and Libraries Journal 2014;31(2): 133-47. DOI: 10.1111/hir.12062.
[7] Jenkins M. Evaluation of methodological search filters – a review. Health Information and Libraries Journal 2004;21(3): 148-63. DOI: 10.1111/j.1471-1842.2004.00511.x.
[8] Glanville J, Lefebvre C, Wright K (editors): ISSG Search Filters Resource. Updated 2019 May 7. York (UK): ISSG [InterTASC Information Specialists’ Sub-Group]. URL: https://sites.google.com/a/york.ac.uk/issg-search-filters-resource/home. Accessed August 26, 2019.
[9] HiRU [Health Information Research Unit]. Search Filters for Medline in Ovid Syntax and the PubMed translation. Last modified: February 9, 2016. Hamilton, Ca: HiRU. URL: https://hiru.mcmaster.ca/hiru/HIRU_Hedges_MEDLINE_Strategies.aspx. Accessed August 26, 2019.
[10] McKibbon KA, Wilczynski NL, Haynes RB. Developing Optimal Search Strategies for Retrieving Qualitative Studies in PsycINFO. Eval Health Prof 2006;29(4): 440-54. DOI: 10.1177/0163278706293400.
[11] Wong SS-L, Wilczynski NL, Haynes RB. Developing Optimal Search Strategies for Detecting Clinically Relevant Qualitative Studies in MEDLINE. Stud Health Technol Inform 2004;107: 311-6. DOI: 10.3233/978-1-60750-949-3-311.
[12] NLM [U. S. National Library of Medicine]. Qualitative Research. MeSH Descriptor Data 2019. Revision Date: 16.06.2014. Rockville Pike, US-MD: NLM. URL: https://meshb.nlm.nih.gov/record/ui?ui=D036301. Accessed August 26, 2019.
[13] Wilczynski NL, Marks S, Haynes RB. Search Strategies for Identifying Qualitative Studies in CINAHL. Qual Health Res 2007;17(5): 705-10. DOI: 10.1177/1049732306294515.
[14] DeJean D, Giacomini M, Simeonov D, Smith A. Finding Qualitative Research Evidence for Health Technology Assessment. Qual Health Res 2016;26(10): 1307-17. DOI: 10.1177/1049732316644429.
[15] Sampson M, Zhang L, Morrison A, Barrowman NJ, Clifford TJ, Platt RW, et al. An alternative to the hand searching gold standard: validating methodological search filters using relative recall. BMC Med Res Methodol 2006;6: 33. DOI: 10.1186/1471-2288-6-33.
[16] Durão S, Kredo T, Volmink J. Validation of a search strategy to identify nutrition trials in PubMed using the relative recall method. J Clin Epidemiol 2015;68(6): 610-6. DOI: 10.1016/j.jclinepi.2015.02.005.
[17] Waffenschmidt S, Hermanns T, Gerber-Grote A, Mostardt S. No suitable precise or optimized epidemiologic search filters were available for bibliographic databases. J Clin Epidemiol 2017;82: 112-8. DOI: 10.1016/j.jclinepi.2016.08.008.
[18] Dalton J, Booth A, Noyes J, Sowden AJ. Potential value of systematic reviews of qualitative evidence in informing user-centered health and social care: findings from a descriptive overview. J Clin Epidemiol 2017;88: 37-46. DOI: 10.1016/j.jclinepi.2017.04.020.
[19] Centre for Reviews and Dissemination. York (UK): University of York. URL: https://www.crd.york.ac.uk/CRDWeb/HomePage.asp. Accessed August 26, 2019.
[20] Torquato Lopes APA, das Neves Decesaro M. The Adjustments Experienced by Persons With an Ostomy: An Integrative Review of the Literature. Ostomy Wound Manage 2014;60(10): 34-42.
[21] UTHealth [University of Texas Health Science Center at Houston]. Search Filters for Various Databases: Ovid Medline. Last Updated: 28.11.2018. Houston, US-TX: UTHealth. URL: http://libguides.sph.uth.tmc.edu/search_filters/ovid_medline_filters. Accessed February 2, 2019.
[22] Shaw RL, Booth A, Sutton AJ, Miller T, Smith JA, Young B, et al. Finding qualitative research: an evaluation of search strategies. BMC Med Res Methodol 2004;4: 5. DOI: 10.1186/1471-2288-4-5.


Table 1: Calculations

| Term | Definition / formula |
|---|---|
| a | relevant articles of the gold standard retrieved with search filter |
| b | irrelevant articles† retrieved with search filter (hits in MEDLINE for each search filter minus references of the gold standard) |
| c | relevant articles of the gold standard not retrieved with search filter |
| Sensitivity (%) | a/(a+c) × 100 |
| Precision (%) | a/(a+b) × 100 |
| NNR | (a+b)/a |
| Precision ratio | precision / precision of the search filter with the highest sensitivity |

† These could be articles that relate to qualitative research and are relevant but are not part of the gold standard.
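The formulas in Table 1 can be sketched in a few lines of Python. This is only an illustration: the class name `FilterCounts`, the function names, and all counts are hypothetical and are not taken from the study; only the formulas follow Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterCounts:
    """Counts for one search filter, named as in Table 1 (hypothetical container)."""
    a: int  # gold-standard articles retrieved by the filter
    b: int  # retrieved articles that are not in the gold standard
    c: int  # gold-standard articles the filter did not retrieve


def sensitivity(fc: FilterCounts) -> float:
    """Sensitivity in percent: a / (a + c) * 100."""
    return fc.a / (fc.a + fc.c) * 100


def precision(fc: FilterCounts) -> float:
    """Precision in percent: a / (a + b) * 100."""
    return fc.a / (fc.a + fc.b) * 100


def nnr(fc: FilterCounts) -> float:
    """Number needed to read: (a + b) / a."""
    return (fc.a + fc.b) / fc.a


def precision_ratio(fc: FilterCounts, reference: FilterCounts) -> float:
    """Precision relative to the filter with the highest sensitivity."""
    return precision(fc) / precision(reference)


# Hypothetical counts, chosen only to give round numbers.
example = FilterCounts(a=80, b=920, c=20)
most_sensitive = FilterCounts(a=95, b=4655, c=5)

print(f"sensitivity:     {sensitivity(example):.2f} %")
print(f"precision:       {precision(example):.2f} %")
print(f"NNR:             {nnr(example):.1f}")
print(f"precision ratio: {precision_ratio(example, most_sensitive):.1f}")
```

Because NNR is the reciprocal of precision expressed as a fraction, a filter with low precision translates directly into a large number of records that must be screened per relevant article.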

Table 2: Search strategies of all validated search filters

| Search filter | Labeled by the filter author(s) | Search strategy for MEDLINE (via Ovid)1 |
|---|---|---|
| Wonga [11] | Single term – best sensitivity, and single term – best optimization of sensitivity and specificity | interview:.mp. |
| Wongb [11] | Single term – best specificity | interviews.tw. |
| Wongc [9, 11] | Two- or three terms – best sensitivity | interview:.tw. or px.fs. or exp health services administration/ |
| Wongd [11] | Two- or three terms – best sensitivity – small decrease in sensitivity with large increase in specificity | interview:.mp. or px.fs. or qualitative.tw. |
| Wonge [9, 11] | Two- or three terms – best specificity | qualitative.tw. or themes.tw. |
| Wongf [11] | Two- or three terms – best specificity – small decrease in specificity with large increase in sensitivity | interviews.mp,pt. or qualitative.mp. or experiences.tw. |
| Wongg [9, 11] | Two- or three terms – best optimization of sensitivity and specificity | interview:.mp. or experience:.mp. or qualitative.tw. |
| McKibbon [10] | High specificity | interviews.mp. |
| Shawa [22] | Thesaurus terms | Qualitative Research/ or Nursing Methodology Research/ or Questionnaires/ or exp Attitude/ or Focus Groups/ or discourse analysis.mp. or content analysis.mp. or ethnographic research.mp. or ethnological research.mp. or ethnonursing research.mp. or constant comparative method.mp. or qualitative validity.mp. or purposive sample.mp. or observational method$.mp. or field stud$.mp. or theoretical sampl$.mp. or phenomenology/ or phenomenological research.mp. or life experience$.mp. or cluster sampl$.mp. |
| Shawb [22] | Free-text terms | excluded from validation |
| Shawc [22] | Broad based terms | findings.af. or interview$.af. or interviews/ or qualitative.af. |
| UTHealth [21] | - | ((("semi-structured" or semistructured or unstructured or informal or "in-depth" or indepth or "face-to-face" or structured or guide) adj3 (interview* or discussion* or questionnaire*))).ti,ab. or (focus group* or qualitative or ethnograph* or fieldwork or "field work" or "key informant").ti,ab. or interviews as topic/ or focus groups/ or narration/ or qualitative research/ |
| MeSHa [12] | - | exp Qualitative Research/ |
| MeSHb [12] | - | Qualitative Research/ |
| DeJean [14] | Hybrid filter (compilation of different search filters connected with Boolean operator ‘OR’) | excluded from validation |

1 Notations like *, :, or $ are database-specific syntax (please refer to the database guide for more details).

Table 3: Validation results with the entire gold standard (2,323 articles)

| Search filter | Sensitivity (%) | Precision (%) | NNR | Precision ratio |
|---|---|---|---|---|
| Wonga | 68.23 | 0.61 | 165 | 8.6 |
| Wongb | 52.13 | 1.21 | 83 | 17.2 |
| Wongc | 93.63* | 0.07 | 1,418 | 1.0 |
| Wongd | 92.25 | 0.20 | 491 | 2.9 |
| Wonge | 63.84 | 0.99 | 102 | 14.0 |
| Wongf | 84.12 | 0.59 | 171 | 8.3 |
| Wongg | 90.31 | 0.20 | 490 | 2.9 |
| McKibbon | 57.04 | 1.06 | 94 | 15.1 |
| Shawa | 75.38 | 0.24 | 415 | 3.4 |
| Shawc | 88.03 | 0.12 | 816 | 1.7 |
| UTHealth | 81.96‡ | 0.80‡ | 126‡ | 11.3‡ |
| MeSHa | 22.56 | 2.15** | 47 | 30.5 |
| MeSHb | 22.56 | 2.15** | 47 | 30.5 |

NNR – number needed to read; * best sensitivity, ** best precision, ‡ best balance between sensitivity and precision (filter with highest precision among filters with sensitivity > 80%)
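The markers in the Table 3 legend follow simple selection rules, which the sketch below applies to the sensitivity and precision values reported in Table 3. The dictionary layout and function names are assumptions made for illustration; the 80% threshold is the one stated in the legend.

```python
# (sensitivity %, precision %) per filter, copied from Table 3.
TABLE_3 = {
    "Wonga": (68.23, 0.61),
    "Wongb": (52.13, 1.21),
    "Wongc": (93.63, 0.07),
    "Wongd": (92.25, 0.20),
    "Wonge": (63.84, 0.99),
    "Wongf": (84.12, 0.59),
    "Wongg": (90.31, 0.20),
    "McKibbon": (57.04, 1.06),
    "Shawa": (75.38, 0.24),
    "Shawc": (88.03, 0.12),
    "UTHealth": (81.96, 0.80),
    "MeSHa": (22.56, 2.15),
    "MeSHb": (22.56, 2.15),
}


def best_sensitivity(results):
    """Filter with the highest sensitivity (marked * in the table)."""
    return max(results, key=lambda name: results[name][0])


def best_precision(results):
    """All filters sharing the highest precision (marked ** in the table)."""
    top = max(precision for _, precision in results.values())
    return sorted(name for name, (_, precision) in results.items() if precision == top)


def best_balance(results, min_sensitivity=80.0):
    """Highest precision among filters with sensitivity above the threshold (marked ‡)."""
    eligible = {name: v for name, v in results.items() if v[0] > min_sensitivity}
    return max(eligible, key=lambda name: eligible[name][1])


print(best_sensitivity(TABLE_3))
print(best_precision(TABLE_3))
print(best_balance(TABLE_3))
```

Changing `min_sensitivity` shows how the recommended filter depends on the sensitivity a project is willing to accept, which is consistent with the paper's point that the choice of filter is project specific.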

Table 4: Validation results with gold standard 1 (131 references, left) and gold standard 2 (2,192 references, right)

| Search filter | GS1 sens (%) | GS1 pre (%) | GS1 NNR | GS1 pre/ratio | GS2 sens (%) | GS2 pre (%) | GS2 NNR | GS2 pre/ratio |
|---|---|---|---|---|---|---|---|---|
| Wonga | 5.34 | 0.007 | 14,242 | 0.2 | 71.99 | 0.61 | 166 | 9.1 |
| Wongb | 4.58 | 0.013 | 7,817 | 0.4 | 54.97 | 1.21 | 83 | 18.1 |
| Wongc | 87.79 | 0.011 | 8,814 | 0.3 | 93.98* | 0.07 | 1,497 | 1.0 |
| Wongd | 97.71* | 0.036 | 2,762 | 1.0 | 91.93 | 0.19 | 522 | 2.9 |
| Wonge | 96.18‡ | 0.193‡ | 518‡ | 5.3‡ | 61.91 | 0.90 | 111 | 13.5 |
| Wongf | 96.18 | 0.099 | 1,008 | 2.7 | 83.39 | 0.55 | 182 | 8.2 |
| Wongg | 96.18 | 0.036 | 2,753 | 1.0 | 89.96 | 0.19 | 521 | 2.9 |
| McKibbon | 4.58 | 0.011 | 9,415 | 0.3 | 60.17 | 1.06 | 95 | 15.9 |
| Shawa | 70.23 | 0.035 | 2,869 | 1.0 | 75.68 | 0.23 | 438 | 3.4 |
| Shawc | 94.66 | 0.021 | 4,871 | 0.6 | 87.64 | 0.12 | 869 | 1.7 |
| UTHealth | 93.89 | 0.118 | 849 | 3.3 | 81.25‡ | 0.75‡ | 135‡ | 11.1‡ |
| MeSHa | 47.33 | 0.360** | 278 | 9.9 | 21.08 | 1.90** | 53 | 28.4 |
| MeSHb | 47.33 | 0.360** | 278 | 9.9 | 21.08 | 1.90** | 53 | 28.4 |

GS1 – gold standard 1 (systematic reviews); GS2 – gold standard 2 (qualitative studies); sens – sensitivity; pre – precision; NNR – number needed to read; pre/ratio – precision ratio; * best sensitivity, ** best precision, ‡ best balance between sensitivity and precision (filter with highest precision among filters with sensitivity > 80%)

Validation of MEDLINE search filters to identify qualitative research Mandy Wagner*, Stefanie Rosumeck, Christian Küffmeier, Kristina Döring, Ulrike Euler

What is new?

Key findings
• MEDLINE search filters for identification of qualitative research differ greatly in design and performance

What this adds to what is known?
• Initial validation of a search filter designed by University of Texas
• Comparative overview of validation performance of MEDLINE search filters using a uniform gold standard (independent from medical topic and with literature published in several years)

What is the implication, what should change now?
• It is not possible to suggest a single search filter to identify qualitative research in MEDLINE
• The choice for or against a search filter depends on project specific demands and resources


Author statement

1) conceived and designed the experiments: MW, SR, UE
2) performed the experiments: MW, SR
3) analyzed and interpreted the data: MW, SR
4) contributed reagents, materials, analysis tools or data: MW, SR, CK, KD
5) wrote the paper: MW, SR, UE

© IQTIG 2019



Declarations of interest: All authors declare no conflicts of interest.
