
Title: Concept Mapping Internal Validity: A Case of Misconceived Mapping?

Author names and affiliations: Normand Péladeau, Provalis Research, 1255 Rue University, Suite 1202, Montréal, QC H3B 3W9

Christian Dagenais, Department of Psychology, University of Montreal, Pavillon Marie-Victorin, Room C355, P.O. Box 6128, Centre-ville Station, Montreal, Quebec, H3C 3J7, Canada.

Valéry Ridde, Department of Social and Preventive Medicine, University of Montreal School of Public Health (ESPUM), Montreal, Canada; University of Montreal Public Health Research Institute (IRSPUM), Montreal, Canada.

Corresponding author: Normand Péladeau, Provalis Research, 1255 Rue University, Suite 1202, Montréal, QC H3B 3W9.


Highlights

The concept mapping procedure proposed by Trochim (1989) has proven to be an invaluable method for program evaluators.
The statistical processing of this concept mapping technique, as initially conceived and as still conducted today, is less than optimal.
Three arguments are presented to demonstrate that the suggested adjustment to the concept mapping tradition is more than justified.
We recommend changing the concept mapping procedure and creating clusters of statements by applying hierarchical cluster analysis techniques to the original proximity or co-occurrence measures.

Abstract

Since the early 1990s, the concept mapping technique developed by William M. K. Trochim has been widely used by evaluators for program development and evaluation and has proven to be an invaluable tool for evaluators and program planners. The technique combines qualitative and statistical analysis and is designed to help identify and prioritize the components, dimensions, and particularities of a given reality. The aim of this paper is to propose an alternative way of conducting the statistical analysis to make the technique even more useful and the results easier to interpret. We posit that some methodological choices made at the inception of the technique were ill-informed, producing maps of participants' points of view that were not optimal representations of their reality. Such a depiction results from the statistical analysis process by which multidimensional scaling (MDS) is applied to the similarity matrix, followed by a hierarchical cluster analysis (HCA) on the Euclidean distances between statements as plotted on the resulting two-dimensional MDS map. As an alternative, we suggest that HCA should be performed first and MDS second, rather than the reverse. To support this proposal, we present three levels of argument: 1) a logical argument backed up by expert opinions on this issue; 2) statistical evidence of the superiority of our proposed approach; and 3) the results of a social validation experiment.

Keywords: concept mapping, multidimensional scaling, hierarchical cluster analysis, evaluation methods, internal validity

Introduction

Mapping means knowing. At the beginning of the 1970s, Joseph D. Novak of Cornell University developed a technique of concept mapping (CM) which made it possible to visualize the relationships among various concepts (Novak, 1990). The results, obtained without the help of statistical analysis, were presented in the form of a diagram in which concepts were linked by arrows and the relationships explained in short sentences. Concept maps have been used in several disciplines, particularly in education and philosophy, to give a visual representation of knowledge (Kremer & Gaines, 1994). Towards the end of the 1980s, William M. K. Trochim, also from Cornell University, refined a concept mapping technique that combined strategies for qualitative and quantitative analysis and was based on the active participation of interested parties. The technique is designed to help identify the components, dimensions, and particularities of a given reality, to prioritize them, and to relate them to one another (Caracelli, 1989; Daughtry & Kunkel, 1993). The concept maps are based on information produced to answer a single question. The method typically involves five steps: (1) The first step is to formulate the question. (2) A group of participants is then invited to collectively answer this question by generating statements during a brainstorming session. (3) Participants are then asked to sort the statements into piles, creating distinct categories, each representing an idea or a concept. They also rate the importance of each statement on a scale of 1 to 5. (4) The data analysis step starts with the creation of a distance matrix between all statements, transforming the number of times statements are grouped together in a pile into a distance measure (the more often they appear together, the smaller the distance). A multidimensional scaling analysis (Kruskal & Wish, 1978) is then applied to this matrix to create a two-dimensional map in which the position of all statements reflects, as much as possible, the computed distances between them. Next, a hierarchical cluster analysis (Aldenderfer & Blashfield, 1984) is conducted on the map coordinates to group statements that are close to each other, forming clusters that represent concepts (for more details, see Kane & Trochim, 2007; Trochim, 1989a, 1989b). (5) The last step involves a second meeting with the participants, where they are asked to assess, name, and interpret the concept map obtained in the previous step.

Since the early 1990s, the technique has been widely used by evaluators for program development and evaluation. Most program evaluation journals have published articles on projects that have employed this method. Today, over 200 references to this specific technique have been published in peer-reviewed journals. This data collection and analysis technique has proven to be very useful for logic model development, outcome evaluation, needs assessment, concept definition, theory creation, instrument development, and so on. The method has proven to be an invaluable tool for evaluators and program planners. Although the CM technique has been widely employed, it should be noted that articles published on the subject rarely appraise critically the statistics used in the method. In fact, in our review of 190 articles published in peer-reviewed journals from 1989 to 2012, only 12 of them mention the statistical procedures underlying CM, without raising any concerns. This may be due to the fact that most of the authors of these articles are using the technique within the framework of research projects that deal with the advancement of knowledge in a specific field of research, and they are not analyzing the method itself. It could also be explained by the fact that the concept mapping technique developed by Trochim (1989a, 1989b) has been integrated into software by Concept Systems Incorporated® that runs all the different statistical analyses automatically. Still, reflexive analysis should be a quality of researchers in general, and of evaluators in particular.

We believe that some methodological choices made at the inception of the technique, and still in use today, were ill-informed, producing maps of participants' points of view that were not optimal representations of their reality and making them unnecessarily hard to interpret. For example, several authors have reported the difficulty participants had in understanding and naming clusters, and the necessity of removing from the clustered solution statements with no obvious connection to the others (e.g., Campbell & Salem, 1999; Dagenais, Ridde, Laurendeau, & Souffez, 2009; Gol & Cook, 2004; Mercier, Piat, Péladeau, & Dagenais, 2000; Rosas & Camphausen, 2007; Sutherland & Katz, 2005). The aim of this paper is to propose an alternative way of conducting the statistical analysis to make the results easier to interpret and the technique even more useful for evaluators and program planners. We do not challenge the whole CM method, but question the statistical analysis performed to create maps (step 4). We propose a change that should result in the creation of more coherent clusters of statements and thus facilitate their interpretation and naming. We will establish that the main problem lies in the statistical analysis sequence, by which multidimensional scaling (MDS) is first applied to the similarity matrix, followed by a hierarchical cluster analysis (HCA) on the Euclidean distances between statements as plotted on the resulting two-dimensional MDS map. We will also demonstrate that this problem is exacerbated by two interrelated factors, namely the high stress values typical of CM studies and the initial choice of restricting the number of dimensions extracted with MDS to two only. As an alternative, we will advance the idea that the clustering process must be performed on the original similarity (or distance) matrix rather than on the one obtained through an MDS transformation. In other words, HCA should be performed first and MDS second, rather than the reverse, as suggested by Trochim (1989a, 1989b). To support this proposal, we will present three levels of argument. First, we will present a logical argument backed up by expert opinions on this issue. Second, we will attempt to present statistical evidence of the superiority of our proposed approach (HCA→MDS) over the approach implemented by the Concept System software (MDS→HCA). Third, we will present the results of a social validation experiment that demonstrates the superiority of our proposed approach in representing the participants' points of view. Recommendations and suggestions for further research and for alternative ways of producing concept maps will then be presented.
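To make the two analysis sequences concrete, the sketch below contrasts them on generic sorting data. It is an illustration only, not the Concept Systems or WordStat implementation: the input `sort_matrix` (a binary statements-by-piles matrix pooled across participants), the function names, the use of the Jaccard dissimilarity as a stand-in for the co-occurrence-based distance, and the choice of Ward's linkage for the map-based clustering are all our own assumptions.

```python
# Illustrative sketch only (not the Concept Systems or WordStat code).
# `sort_matrix[i, p]` is assumed to equal 1 if statement i was placed in pile p.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

def traditional_cm(sort_matrix, n_clusters=10, seed=0):
    """MDS first, then HCA on the 2-D map coordinates (the sequence criticized here)."""
    dist = squareform(pdist(sort_matrix.astype(bool), metric="jaccard"))
    coords = MDS(n_components=2, dissimilarity="precomputed",
                 random_state=seed).fit_transform(dist)
    # Cluster the (possibly distorted) Euclidean distances on the map;
    # Ward's linkage is shown here as one common choice, not a prescription.
    clusters = fcluster(linkage(pdist(coords), method="ward"),
                        t=n_clusters, criterion="maxclust")
    return coords, clusters

def proposed_cm(sort_matrix, n_clusters=10, seed=0):
    """HCA on the original proximity matrix first; MDS is used only for display."""
    dist_condensed = pdist(sort_matrix.astype(bool), metric="jaccard")
    clusters = fcluster(linkage(dist_condensed, method="weighted"),  # WPGMA
                        t=n_clusters, criterion="maxclust")
    coords = MDS(n_components=2, dissimilarity="precomputed",
                 random_state=seed).fit_transform(squareform(dist_condensed))
    return coords, clusters
```

In both sketches the map coordinates are produced the same way; the only difference is whether the clusters are derived from the map or from the original proximities, which is precisely the point at issue in this paper.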

The Logical Arguments

Multidimensional scaling is a technique that attempts to represent a matrix of distances (or dissimilarities) between multiple data points in a multidimensional Euclidean space as accurately as possible. It has its origin in psychometrics, where it was developed to identify the underlying dimensions used by people to judge the similarity of a set of objects (Torgerson, 1952). It has since been used in a variety of fields, especially marketing, but also sociology, political science, physics, and biology (e.g., Young & Hamer, 1994). In CM, the objects are the statements generated in the brainstorming session, while the similarity is obtained through the grouping of statements by participants into piles during the sorting and rating step. A common example given to illustrate MDS is to start with a matrix of distances between cities, like the one used in Borgatti, Everett and Johnson (2013) and reproduced in Table 1.

Applying MDS to such a matrix and plotting all nine cities on a two-dimensional Euclidean plane results in a map that represents the positions of those cities relative to each other with such precision that it is possible to overlay a geographic map of the United States and find the cities located very close to their actual locations. Since only the relative distances between these cities are used, such a map may have to be rotated and sometimes flipped to reproduce a US map as we are used to seeing it (north at the top and east on the right). In CM, the co-occurrence of statements in the piles created by participants during the sorting process is transformed into a proximity measure, so that the more often two statements appear in the same pile, the closer they will be (Trochim, 1989a, 1989b). MDS is then applied to this matrix to produce a two-dimensional map of points. In traditional CM, the Euclidean distances between points on the map are then grouped using HCA to form clusters of statements. Except in very rare situations, the Euclidean distances will not be proportional to the original distances. There will always be some distortion in the visual representation of the distances. Such distortion is quantified in MDS by the stress index. This index is a "badness of fit" measure: the higher the stress, the more distortion there is in the resulting MDS map (Kruskal & Wish, 1978). Stress is inversely related to a correlation coefficient obtained by comparing the original distances with the resulting Euclidean distances on the map: the higher the stress value, the lower the correlation. For Borgatti (1996): "Care must be exercised in interpreting any map that has non-zero stress since, by definition, non-zero stress means that some or all of the distances in the map are, to some degree, distortions of the input data." (p. 33)
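As a concrete illustration of the Table 1 example, the snippet below runs a metric MDS on the nine-city matrix and derives a stress value by hand from the map coordinates. This is a minimal sketch under our own choices (scikit-learn's MDS and the metric version of Kruskal's stress-1 formula), not the computation used by any CM software.

```python
# Minimal sketch: metric MDS on the Table 1 distance matrix and a hand-computed
# Kruskal stress-1 value (near zero here, since intercity distances are almost
# perfectly representable in two dimensions).
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

cities = ["Boston", "NYC", "Washington", "Miami", "Chicago",
          "Seattle", "San Francisco", "Los Angeles", "Denver"]
D = np.array([
    [   0,  206,  429, 1504,  963, 2976, 3095, 2979, 1949],
    [ 206,    0,  233, 1308,  802, 2815, 2934, 2786, 1771],
    [ 429,  233,    0, 1075,  671, 2684, 2799, 2631, 1616],
    [1504, 1308, 1075,    0, 1329, 3273, 3053, 2687, 2037],
    [ 963,  802,  671, 1329,    0, 2013, 2142, 2054,  996],
    [2976, 2815, 2684, 3273, 2013,    0,  808, 1131, 1307],
    [3095, 2934, 2799, 3053, 2142,  808,    0,  379, 1235],
    [2979, 2786, 2631, 2687, 2054, 1131,  379,    0, 1059],
    [1949, 1771, 1616, 2037,  996, 1307, 1235, 1059,    0],
], dtype=float)

coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(D)
d_map = squareform(pdist(coords))            # distances as plotted on the map
iu = np.triu_indices_from(D, k=1)            # each pair counted once
stress1 = np.sqrt(((D[iu] - d_map[iu]) ** 2).sum() / (d_map[iu] ** 2).sum())
print(f"Kruskal stress-1 = {stress1:.3f}")   # low stress: little distortion
```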

Beyond a certain level of stress, the amount of distortion may be so high as to make the interpretation of an MDS solution problematic. The presence of this noise in the graphical representation raises a simple logical argument against the CM statistical approach: Why compute HCA on distances that are, to some extent, distorted, when one could perform the clustering on the original distances? One may, however, argue that the answer to this question depends on the amount of noise introduced by the MDS transformation. If very small variations are introduced by MDS, one may consider the distortions to be negligible. Experts in the domain have suggested criteria for acceptable stress values. For example, Borgatti (1996) suggests that "the rule of thumb we use is that anything under 0.1 is excellent and anything over 0.15 is unacceptable" (p. 29). This stress criterion of 0.15 is the most commonly cited and is adopted by MDS experts (Davison, 1983; Kruskal & Wish, 1978), while other authors suggest a more liberal criterion of 0.20 as the threshold for what they consider a "poor fit" (Borg & Groenen, 2005). However, according to Trochim (1993), in an unpublished meta-analytic review of published papers that have used CM, the level of stress reported by authors varies from 0.205 to 0.365, with an average value of 0.285, indicating that all the concept maps obtained through MDS contained levels of distortion in the represented distances that would be considered unacceptable.

Kane and Trochim (2007) offer two justifications for rejecting the 0.15 criterion and tolerating higher stress values. They first point out that the development of MDS and the establishment of such a strict standard originate from research performed in controlled testing environments, different from the social situations more typical of concept mapping studies. Such a statement ignores the numerous applications of MDS in less controlled settings (e.g., Young & Hamer, 1994). However, even if one accepted this first argument, it would not exempt CM proponents from demonstrating that higher stress values can be tolerated without any major impact on the interpretation of the maps. Nor does it exempt them from proposing new criteria in light of a careful analysis of that impact. To our knowledge, the question of what should be considered an acceptable level of stress for CM studies has never been answered. In the absence of any clear alternative standard, the descriptive results presented in Trochim (1993) and Kane and Trochim (2007) have become, by default, the standards against which CM users compare the quality of their mapping results. For example, Brennan et al. (2012), Vinson (2014), and Wisner (2008) suggested that their mapping solutions were acceptable because their stress values were near or below the average value of 0.285 reported by Trochim (1993). Other authors who obtained higher stress values have also declared their results tolerable because they remained within the range of values presented in the same article (e.g., Steele-Pierce, 2006; Windsor, 2013). In other words, a stress value is tolerable if it does not exceed the highest level obtained by others.

The second argument presented by Kane and Trochim (2007) is that "stress calculations are sensitive to slight movements in statements that are not likely to have meaningful interpretative values in concept mapping" (emphasis ours). What one should then ask is how "slight" these movements really are. To illustrate the amplitude of the movements, we can use the US city distance data presented in Table 1. We performed a Monte Carlo experiment in which we introduced random errors, following a normal distribution, on both the vertical and horizontal positions of each city. We then computed new distance matrices as well as the corresponding stress values. Standard deviations corresponding to various stress levels were estimated using 3,000 runs generating 108,000 distorted distances. Using our nine cities as an example, to obtain a stress of 0.15 (considered a "poor fit"), the deviation of a city from its real location would be anywhere within a 148-mile radius half of the time. The radius would increase to 430 miles if we want to be right 95% of the time (see Figure 1). It means that Chicago could be located anywhere between Omaha in the west and Pittsburgh in the east, Nashville in the south and Lake Superior in the north. Despite such large variation, the relative distances between cities very far from each other would likely remain fairly reliable. Things could be different, however, for cities close to each other. Washington, DC may very well be plotted closer to Boston than to New York, which may itself be positioned closer to Chicago. This is a well-known characteristic of MDS. For example, Borgatti, Everett and Johnson (2013) state that "longer distances tend to be more accurate than shorter distances, so larger patterns are still visible even when stress is high" (p. 91). DeJordy, Borgatti, Roussin and Halgin (2007) state that closer distances in MDS are less meaningful and conclude that "MDS is not typically useful in identifying the relative ranking of similarities within a group of items positioned closely together (except when the stress is 0)" (p. 245). Kruskal and Wish (1978) also come to this conclusion when they state that "MDS does a much better job in representing large distances (the global structure) than in representing small distances (the local structure)." But the amplitude of movement typical of MDS solutions with stress values of 0.15 is not representative of what is typically found in CM research. An average stress value of 0.285, as reported by Trochim (1993), would mean that any city could be found anywhere within an 878-mile radius 95% of the time (see Figure 1). Chicago could well be located close to Denver, Colorado, or somewhere in New Mexico, Florida, Maine, or 200 miles offshore in the Atlantic Ocean. For a higher stress value of 0.365 (the maximum reported by Trochim, 1993), the amplitude of movement would be so great that, 95% of the time, Chicago could be plotted anywhere within a 1,218-mile radius. Anyone would agree that a variation of this magnitude can hardly be qualified as "slight movements."
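The following is a hedged sketch of the kind of simulation just described, not the authors' original code: each city's map position is jittered with Gaussian noise of a given standard deviation, and the resulting stress is measured against the true distances. The variables `coords` and `D` are assumed to come from the nine-city snippet above; the function name and the example sigma values are illustrative.

```python
# Hedged sketch of the Monte Carlo idea (not the authors' original code).
# `coords` (city map positions) and `D` (true distance matrix) are assumed
# to come from the previous nine-city snippet.
import numpy as np
from scipy.spatial.distance import pdist, squareform

def mean_stress_for_jitter(coords, D, sigma, n_runs=1000, seed=42):
    """Average metric stress-1 produced by Gaussian jitter of std `sigma` (miles)."""
    rng = np.random.default_rng(seed)
    iu = np.triu_indices(len(D), k=1)
    stresses = []
    for _ in range(n_runs):
        noisy = coords + rng.normal(scale=sigma, size=coords.shape)
        d_noisy = squareform(pdist(noisy))   # distances among jittered positions
        stresses.append(np.sqrt(((D[iu] - d_noisy[iu]) ** 2).sum()
                                / (d_noisy[iu] ** 2).sum()))
    return float(np.mean(stresses))

# Increasing sigma until the average stress reaches a target level (e.g., 0.15,
# 0.285, or 0.365) links each stress level to a typical positional displacement,
# whose quantiles give confidence radii of the kind discussed above.
for sigma in (100, 200, 400, 800):
    print(sigma, round(mean_stress_for_jitter(coords, D, sigma), 3))
```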

Such large distortions, combined with the lower reliability of the shorter distances typical of MDS maps (Borgatti et al., 2013; Kruskal & Wish, 1978), raise another logical objection to the statistical analysis suggested by Trochim (1993). The HCA creates clusters by successively grouping items that are closest to each other on the MDS map. Thus, it seems very hard to justify progressively building the clusters of items by the agglomeration of the least reliable distances. Arabie, Carroll, and DeSarbo (2003) raised this concern when they warned users against such a temptation: "One should not try visually to 'define' clusters on the basis of a spatial representation of the objects resulting from an analysis based on […] spatial MDS." (p. 54, authors' emphasis)

One could argue, like Kane and Trochim (2007), that such a combination of clustering and multidimensional scaling is legitimate and has been performed by MDS experts such as Kruskal and Wish (1978). For example, to justify the use of two-dimensional solutions, Kane and Trochim wrote: "…we have almost universally found the two-dimensional solutions to be acceptable and highly useful (…) especially when coupled with cluster analysis like Kruskal and Wish (1978) suggest." (p. 96) While this does seem to suggest such a thing, a careful reading of the original reference tells a different story: "…the structure within neighborhoods sometimes provides a rather poor representation of the relative proximities between the stimuli involved. This is why we advocate drawing lines on the configuration to indicate the closeness based on the original data" (p. 48, our emphasis). So not only do Kruskal and Wish (1978) clearly state that the clustering should be performed not on the MDS coordinates but on the original data, they also propose using such clustering as a way to diagnose MDS representations: for these authors, items that HCA clusters together on the observed distances but that are not plotted close to each other on an MDS map indicate a failure of MDS to accurately represent the cluster structure of the data. Borg and Groenen (2005) also suggest using cluster analysis on the original data as a way to "check whether the clusters that one sees in a MDS solution are but scaling artifacts" (p. 108).

The Statistical Argument

Besides the logical argument and the appeal to authority used in the previous section, one can also attempt to measure how good various clustering approaches are at representing the original groupings of items made by participants in concept mapping projects. To achieve this, we reanalyzed CM data from five studies. The authors, number of items, number of piles, and obtained stress values are presented in Table 2.

For each of the five studies, we computed 29 clustering solutions, varying the number of clusters from 2 to 30; the Concept Systems Inc. software was used for this. We then performed ascending hierarchical cluster analysis (HCA) on the same data sets using the WordStat 6.1 content analysis software (Provalis Research, 2010), with the Jaccard coefficient as the similarity measure and weighted average linkage as the agglomeration technique. Clustering solutions from 2 to 30 clusters were produced this way. We then computed, for each clustering solution, "goodness of fit" indicators allowing one to assess how close the solutions were to the original grouping of statements made by participants. For the sake of clarity, we chose two simple measures that can be easily understood. We present next the results obtained with these two indicators.

A) The percentage of pairings represented by clusters

If we keep all statements in a single big cluster, all pairings will be taken into account. With a two-cluster solution, several observed pairs of items will be consistent with the clusters, consistency being defined by the fact that the two paired items are found in the same cluster. Some other observed pairs of items would, however, be inconsistent with the clustering solution, each item of the pair being assigned to a different cluster. As the number of clusters increases, and as clusters become smaller, a gradual decrease in the percentage of pairings represented by those clusters should be observed. What matters for our current demonstration is that a better clustering method should show a slower decline, indicating that the clusters it forms are able to encompass a greater number of item pairings than those of an inferior clustering method. Figure 2 clearly shows that, in all cases, clusters created from the original matrix consistently provide a better representation of the original pairings made by participants than the clustering of MDS coordinates typical of CM.
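To make this first indicator concrete, here is a small sketch of one way to compute it; it is our illustration, not necessarily the exact formula used in the original analyses. `sort_matrix` is again the assumed binary statements-by-piles matrix, and `clusters` is the vector of cluster assignments returned by either clustering approach.

```python
# Sketch of the "percentage of pairings represented" indicator: the share of
# participant pairings (two statements sorted into the same pile) whose two
# statements end up in the same cluster of a given solution.
import numpy as np
from itertools import combinations

def pct_pairings_represented(sort_matrix, clusters):
    n_items = sort_matrix.shape[0]
    kept = total = 0
    for i, j in combinations(range(n_items), 2):
        # number of piles in which statements i and j were grouped together
        n_pairings = int(np.sum(sort_matrix[i] * sort_matrix[j]))
        total += n_pairings
        if clusters[i] == clusters[j]:
            kept += n_pairings
    return 100.0 * kept / total if total else 0.0
```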

B) Number of misclassified items

Both the CM and the HCA clustering approaches presented here rely on an ascending hierarchical agglomeration process, by which items are grouped gradually, one item or one cluster at a time, until all items form a single cluster. A well-known deficiency of such an approach is that some items aggregated at an early stage may, as their containing clusters become bigger and bigger, find themselves associated with additional items with which they are seldom paired. At some point, the average level of association of an item with all the other items in its containing cluster may become lower than its average level of association with the items of another cluster. As a result, a better fit would potentially be achieved by moving this item to the other cluster. At any given moment of the agglomeration process, one can compute the number of items that could potentially be moved to another cluster because their average level of association with the items in their containing cluster has become lower than their average association with the items of another cluster. From a purely statistical perspective, we can call these "misclassified items" to indicate that the statement is more highly associated with statements in another cluster. A superior clustering method should normally produce fewer misclassifications, and this count may thus be used as another "badness-of-fit" indicator: the higher the number of misclassified items, the worse the clustering solution. We computed the number of misclassified items for all 145 clustering solutions and plotted the results graphically, as shown in Figure 3.
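The sketch below shows one way to count such misclassified items, under our reading of the definition above; the matrix and variable names are illustrative. `similarity` is a statement-by-statement proximity matrix (e.g., Jaccard co-occurrence) and `clusters` a vector of cluster assignments.

```python
# Sketch of the "misclassified items" count: a statement is flagged when its
# mean similarity to the other members of its own cluster is lower than its
# mean similarity to the members of some other cluster.
import numpy as np

def count_misclassified(similarity, clusters):
    clusters = np.asarray(clusters)
    misclassified = 0
    for i in range(len(clusters)):
        own = (clusters == clusters[i])
        own[i] = False                      # exclude the item itself
        if not own.any():
            continue                        # singleton cluster: nothing to compare
        own_mean = similarity[i, own].mean()
        for c in np.unique(clusters):
            if c == clusters[i]:
                continue
            other = (clusters == c)
            if similarity[i, other].mean() > own_mean:
                misclassified += 1
                break                       # one better-fitting cluster is enough
    return misclassified
```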

The results clearly confirm the benefit of clustering the original distances (or proximities) over clustering the MDS coordinates typical of CM. Only five of the 145 comparisons indicate that clustering MDS coordinates may result in a lower number of misclassified items, and all of those occurred for solutions with two to four clusters, which are atypical of final CM mapping solutions. There is, however, another interesting characteristic of the curves that is worth mentioning. When clustering the MDS coordinates, one can see a gradual increase in the number of misclassified items: the higher the number of clusters, and thus the smaller the number of items per cluster, the higher the number of items potentially misclassified. The Kane and Trochim (2007) mapping results appear to be the only deviation from this rule; they show an increase in the number of misclassified items up to a 21-cluster solution, then a gradual decrease for the remaining nine clustering solutions. The pattern seems totally different when clustering the original proximity matrix: while the number of misclassified items initially tends to increase, it quickly stabilizes and even decreases as the number of clusters increases. Such divergent patterns are consistent with the previously mentioned observation that, in MDS, shorter distances tend to be less reliable than longer ones (Kruskal & Wish, 1978). One may thus expect the number of misclassified items to be higher for the first clustered items when clustering is performed on MDS coordinates. On the contrary, the initial clustering of items performed on the original distance (or similarity) matrix should be more reliable; it may become less reliable as the size of the clusters increases, yet it seems to remain better than clustering MDS coordinates. Overall, this additional "badness-of-fit" indicator provides still more evidence that clustering MDS coordinates may not be an optimal way of clustering statements in CM projects.

The Social Validity Arguments

Despite the logical arguments presented previously, the appeal to authority, and the statistical arguments, one may still wonder whether such superiority also holds in the eyes of potential participants. However, concept maps produced by the two alternative approaches may be very different, and the process of identifying and naming concepts can be very time-consuming, making a comparison in the context of a full implementation of the method quite a challenge. Two authors of this paper have been using CM for more than 15 years and, after experimenting with the HCA method, had the strong impression that the results were much clearer and easier to interpret. However, to provide a less subjective answer as to which CM solution is perceived as more valid, we designed a secondary analysis of existing CM data in which people were asked to identify, indirectly, which method offered the most coherent clustering solution.

Clusters showing some similarity between the traditional CM and the HCA solutions were identified in three of the five concept mapping studies presented previously: Dagenais and Hackett (2008), Jean et al. (2007), and Kane and Trochim (2007). These three studies were selected because of the greater similarity of the cluster solutions obtained by the two approaches. They were also chosen because the subjects addressed in these studies did not require specialized knowledge, allowing non-expert external participants to assess more easily the consistency of the solutions. All clusters sharing less than 65% of their statements were discarded, along with their associated statements. All other clusters, sharing at least 65% of their statements, were selected and their discordant items put aside. We then selected from the list of discordant statements those that came from two of the remaining clusters, one associated with the CM solution and the other with the HCA solution. A total of 37 statements were identified this way.

Thirty-four graduate students from a psychology department and a public health department participated in this experiment. No financial incentive was offered and everyone was free to opt out of participating. Three questionnaires were then built, one per study, in which each of the discordant statements was presented along with the remaining clusters. Participants had to read and familiarize themselves with the statements in the various clusters. Then they had to look at each extracted statement and put it back in a cluster, being asked, in a forced-choice situation, to choose either the CM or the HCA cluster from which it originated. For example, they were asked whether the item "develop employee incentive program" belonged to cluster A or cluster B. Students had no indication as to how the existing clustering solutions were obtained. Figure 4 presents the percentage of decisions favoring each of the two solutions for all three studies. In all three cases, a greater proportion of students chose to put the extracted statements back in the cluster obtained using HCA. The largest difference was observed for the Dagenais and Hackett (2008) study, where more than three out of four participants favored the HCA solution; in fact, all items in this study were put back in the HCA clusters by a majority of students. The smallest difference was found for the Kane and Trochim (2007) study, for which 57% of the decisions still favored the HCA solution over the CM solution.

This experiment suggests that grouping items based on their original distances creates clusters that are perceived as more coherent and thus have greater social validity than clusters created from MDS coordinates. The sample size (n=37) and the number of studies being compared (n=3) are rather small and would not, by themselves, represent sufficient evidence of the superiority of our proposal over the approach prescribed by Trochim (1989a). Yet we believe the results of this social validity experiment are the logical consequence of the specific statistical properties of the two approaches.

Conclusion

One of the objectives of this article is to stress the inherent problem in the statistical procedures used in traditional CM to create clusters of statements. While the concept mapping procedure proposed by Trochim (1989a) has proven to be an invaluable method for program theory building, needs assessment, outcome evaluation, instrument development, and so on, we believe that this article presents a solid case against applying cluster analysis to the coordinates of a two-dimensional multidimensional scaling solution. We would thus strongly recommend changing this procedure and creating clusters of statements by applying HCA techniques to the original proximity or co-occurrence measures. However, the current proposal does not address two important methodological issues that need further discussion and may justify additional studies. The choices made here of the Jaccard coefficient as the proximity measure for co-occurrences and of weighted average linkage as the agglomeration technique for the hierarchical clustering were not based on any careful reflection on their respective statistical properties or on any prior experiment. We used instead the default clustering options of the WordStat content analysis software, settings which were selected as defaults because of their good performance when applied to text mining tasks. The clearly superior results obtained from the very start saved us from the necessity of exploring other clustering options. Further experiments on CM data using other similarity measures, other agglomerative techniques, or non-hierarchical clustering methods (e.g., k-means) may provide better groupings, reduce the number of misclassified statements, and thus increase the validity of the extracted concepts.

Another important issue that has not been addressed is the visual representation of the extracted concepts. Dendrograms, typically used to graphically represent the product of a hierarchical clustering, are not as intuitive as two-dimensional maps such as those produced by Concept Systems. Dendrograms are also not an optimal solution for representing the relationships (or distances) between all the extracted concepts, a task better handled by graphics such as those produced using multidimensional scaling or graph layout algorithms (DeJordy et al., 2007). An obvious solution would be to perform an MDS on a matrix of distances between the concepts created using HCA, those distances being computed, for example, by averaging the distances between items from different clusters (a brief sketch of this computation is given at the end of this section). Such a computation approach has been used successfully by Dagenais, Pinard, St-Pierre, Briand-Lamarche, Cantave and Péladeau (2016). Concepts displayed on such MDS maps could very well be represented as shapes of various sizes to express other numerical dimensions, as is often done in CM using layers. Concepts may also be graphically displayed on a 2D MDS map as lists of statements ordered by their strength of association with other cluster items. Such a measure, like Trochim's bridging value, allows users to easily identify items that may potentially be moved to another cluster. However, contrary to the bridging value, it would be based solely on the co-occurrence of statements in piles and would not take into account standardized Euclidean distances between those points on an MDS plot. One may argue that such a solution would greatly reduce the level of detail to that of the cluster level, leaving stakeholders with little to no understanding of how specific statements relate to each other. However, one has to remember that smaller distances are the least accurate, especially when the stress level is high. A better solution, proposed by Borgatti (1996), is to extract the similarity matrix of the items in a given cluster and reapply MDS to it. Further exploration of statistical and graphical techniques may be needed to fully support the various requirements of different concept mapping tasks.

It seems clear to us that the imperative of producing a visual representation of concepts in the shape of a 2D map led Trochim (1989a, 1989b) to select a statistical procedure that was less than optimal for the extraction of concepts. We believe that the choice of a graphical representation of data should be based on its ability to effectively convey the underlying characteristics of the data and to facilitate its interpretation. It should never be done at the cost of validity or make such interpretation more difficult, as we believe it does in the way concept mapping was initially conceived and the way it is still being performed today. We also believe that the required adjustment to the tradition of concept mapping is more than justified by the improved validity and interpretability of the data that will logically follow from such a change. It could only have a positive impact on the concept mapping technique, a technique that has already proven its usefulness.
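As a minimal sketch of the cluster-level map mentioned above, the snippet below averages the original between-item distances across clusters and runs MDS on the resulting concept-by-concept matrix. It is an illustration under our own assumptions (the variable and function names are ours), not the procedure used in Dagenais et al. (2016).

```python
# Minimal sketch: build a concept-level distance matrix by averaging the
# original between-item distances across clusters, then map the concepts
# with MDS. `dist` is the original item distance matrix; `clusters` holds
# the HCA cluster id of each item.
import numpy as np
from sklearn.manifold import MDS

def cluster_level_map(dist, clusters, seed=0):
    clusters = np.asarray(clusters)
    labels = np.unique(clusters)
    k = len(labels)
    cluster_dist = np.zeros((k, k))
    for a in range(k):
        for b in range(a + 1, k):
            rows = clusters == labels[a]
            cols = clusters == labels[b]
            # mean distance between items of cluster a and items of cluster b
            cluster_dist[a, b] = cluster_dist[b, a] = dist[np.ix_(rows, cols)].mean()
    coords = MDS(n_components=2, dissimilarity="precomputed",
                 random_state=seed).fit_transform(cluster_dist)
    return labels, coords
```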

Acknowledgments

Normand Péladeau is the current president of Provalis Research. Valéry Ridde holds a CIHR-funded Research Chair in Applied Public Health (CPP-137901).

References

Aldenderfer, M. S., & Blashfield, R. K. (1984). Cluster analysis. Newbury Park, CA: Sage.
Arabie, P., Carroll, J. D., & DeSarbo, W. S. (2003). Three-way scaling and clustering. Newbury Park, CA: Sage.
Borg, I., & Groenen, P. J. F. (2005). Modern multidimensional scaling: Theory and applications (2nd ed.). New York: Springer.
Borgatti, S. P. (1996). Anthropac 4 methods guide. Natick, MA: Analytic Technologies.
Borgatti, S. P., Everett, M. G., & Johnson, J. C. (2013). Analyzing social networks. Los Angeles, CA: Sage.
Brennan, L. K., Brownson, R. C., Kelly, C., Ivey, M. K., & Leviton, L. C. (2012). Concept mapping. American Journal of Preventive Medicine, 43(5), S337–S350. http://doi.org/10.1016/j.amepre.2012.07.015
Campbell, R., & Salem, D. A. (1999). Concept mapping as a feminist research method: Examining the community response to rape, 23(1), 65–89. doi:10.1111/j.1471-6402.1999.tb00342.x
Caracelli, V. J. (1989). Structured conceptualization. Evaluation and Program Planning, 12(1), 45–52. http://doi.org/10.1016/0149-7189(89)90021-9
Dagenais, C., & Hackett. (2008). Determinants of research use in the field of literacy: A concept mapping project. Montreal: Équipe RENARD. Unpublished manuscript.
Dagenais, C., Pinard, R., St-Pierre, M., Briand-Lamarche, M., Cantave, A. K., & Péladeau, N. (2016). Using concept mapping to identify conditions that foster knowledge translation from the perspective of school practitioners. Research Evaluation, 25(1), 70–78. http://doi.org/10.1093/reseval/rvv026
Dagenais, C., Ridde, V., Laurendeau, M. C., & Souffez, K. (2009). Knowledge translation research in population health: Establishing a collaborative research agenda. Health Research Policy and Systems, 7, 28. http://doi.org/10.1186/1478-4505-7-28
Daughtry, D., & Kunkel, M. A. (1993). Experience of depression in college students: A concept map. Journal of Counseling Psychology, 40(3), 316–323. http://doi.org/10.1037/0022-0167.40.3.316
Davison, M. L. (1983). Multidimensional scaling. New York: Wiley.
DeJordy, R., Borgatti, S. P., Roussin, C., & Halgin, D. S. (2007). Visualizing proximity data. Field Methods, 19(3), 239–263. http://doi.org/10.1177/1525822X07302104
Gol, A. R., & Cook, S. W. (2004). Exploring the underlying dimensions of coping: A concept mapping approach, 23(2), 155–171. Retrieved from http://www.atyponlink.com/GPI/doi/abs/10.1521/jscp.23.2.155.31021
Jean, B., Ridde, V., Bédard, S., & Beaudry, R. (2007). Les représentations de la ruralité au Québec: Une démarche de cartographie conceptuelle de la ruralité [Representations of rurality in Quebec: A concept mapping approach to rurality] (Rapport no 9, CRDT et Chaire de recherche du Canada en développement rural). Rimouski: UQAR.
Kane, M., & Trochim, W. M. K. (2007). Concept mapping for planning and evaluation. Thousand Oaks, CA: Sage.
Knox, C. (1995). Concept mapping in policy evaluation: A research review of community relations in Northern Ireland. Evaluation, 1(1), 65–79. http://doi.org/10.1177/135638909500100105
Kremer, R., & Gaines, B. R. (1994). Groupware concept mapping techniques. In Proceedings of the 12th annual international conference on systems documentation: Technical communications at the great divide. Banff, AB, Canada.
Kruskal, J. B., & Wish, M. (1978). Multidimensional scaling (Sage University Paper Series on Quantitative Applications in the Social Sciences, 07-011). Beverly Hills, CA: Sage.
Mercier, C., Piat, M., Péladeau, N., & Dagenais, C. (2000). An application of theory-driven evaluation to a drop-in youth center. Evaluation Review, 24(1), 73–91. Retrieved from http://erx.sagepub.com/cgi/reprint/24/1/73
Novak, J. D. (1990). Concept mapping: A useful tool for science education. Journal of Research in Science Teaching, 27(10), 937–949. http://doi.org/10.1002/tea.3660271003
Paré, M.-H. (2010). Challenges & opportunities for North-South research partnership in mental health & psychosocial support in humanitarian settings. Retrieved from http://www.academia.edu/13624462/Challenges_and_Opportunities_for_NorthSouth_Research_Partnership_in_Mental_health_and_Psychosocial_Support_in_Humanitarian_Settings
Rosas, S. R., & Camphausen, L. C. (2007). The use of concept mapping for scale development and validation in evaluation. Evaluation and Program Planning, 30(2), 125–135. Retrieved from http://www.sciencedirect.com/science/article/B6V7V-4MX56GG1/2/cb0d369125e522442e814b85c9cf9e16
Siau, K., & Wang, Y. (2007). Cognitive evaluation of information modeling methods. Information and Software Technology, 49(5), 455–474. http://doi.org/10.1016/j.infsof.2006.07.001
Steele-Pierce, M. E. (2006). Leadership as teaching: Mapping the thinking of administrators and teachers. Antioch University. Retrieved from https://etd.ohiolink.edu/pg_10?0::NO:10:P10_ACCESSION_NUM:antioch1165860089
Sutherland, S., & Katz, S. (2005). Concept mapping methodology: A catalyst for organizational learning. Evaluation and Program Planning, 28(3), 257–269. Retrieved from http://www.sciencedirect.com/science/article/B6V7V-4GFCT041/2/adeaf63e56ae3bc067b4b9786c7a12b3
Torgerson, W. S. (1952). Multidimensional scaling: I. Theory and method. Psychometrika, 17(4), 401–419.
Trochim, W. M. (1993). The reliability of concept mapping. Paper presented at the Annual Conference of the American Evaluation Association, Dallas, TX.
Trochim, W. M. K. (1989a). An introduction to concept mapping for planning and evaluation. Evaluation and Program Planning, 12(1), 1–16. http://doi.org/10.1016/0149-7189(89)90016-5
Trochim, W. M. K. (1989b). Concept mapping: Soft science or hard art? Evaluation and Program Planning, 12(1), 87–110. http://doi.org/10.1016/0149-7189(89)90027-X
Vinson, C. A. (2014). Using concept mapping to develop a conceptual framework for creating virtual communities of practice to translate cancer research into practice. Preventing Chronic Disease, 11. http://doi.org/10.5888/pcd11.130280
Windsor, L. C. (2013). Using concept mapping in community-based participatory research: A mixed methods approach. Journal of Mixed Methods Research, 7(3), 274–293. http://doi.org/10.1177/1558689813479175
Wisner, B. L. (2008). The impact of meditation as a cognitive-behavioral practice for alternative high school students. Dissertation Abstracts International, DAI-A 70/01.
Young, F. W., & Hamer, R. M. (1994). Theory and applications of multidimensional scaling. Hillsdale, NJ: Erlbaum.


Figure 1. Ninety-Five Percent Confidence Intervals in the Positioning of Chicago for Three Levels of Stress

Figure 2. Percentage of Statement Pairings by Number of Clusters for Five Concept Mapping Studies


Figure 3. Number of Misclassified Items by Number of Clusters for Five Concept Mapping Studies


Figure 4. Percentage of Decisions Favoring Two Clustering Approaches


Table 1. Distances Between 9 US Cities

                 BOS    NYC    WAS    MIA    CHI    SEA     SF     LA    DEN
BOSTON             0    206    429   1504    963   2976   3095   2979   1949
NYC              206      0    233   1308    802   2815   2934   2786   1771
WASHINGTON       429    233      0   1075    671   2684   2799   2631   1616
MIAMI           1504   1308   1075      0   1329   3273   3053   2687   2037
CHICAGO          963    802    671   1329      0   2013   2142   2054    996
SEATTLE         2976   2815   2684   3273   2013      0    808   1131   1307
SAN FRANCISCO   3095   2934   2799   3053   2142    808      0    379   1235
LOS ANGELES     2979   2786   2631   2687   2054   1131    379      0   1059
DENVER          1949   1771   1616   2037    996   1307   1235   1059      0

Table 2. Five concept mapping studies

                                                                                 ITEMS   PILES   STRESS
Dagenais & Hackett (2008) Literacy                                                 104      18     .353
Jean, Bédard & Ridde (2007) Rural living                                            81       9     .329
Paré (2010) Disaster Mental Health Intervention                                     99      11     .295
Kane & Trochim (2007) Non-profit                                                    89      18     .259
Dagenais, Ridde, Laurendeau, & Souffez (2009) Knowledge Application & Transfer      80      11     .313
