Evaluation and Program Planning 35 (2012) 124–132
Understanding the "what should be condition" in needs assessment data

Jeffry L. White a,*, James W. Altschuld b,1

a University of Louisiana, Lafayette, 200 E. Devalcourt St., 262 Picard Center, Lafayette, LA 70506, United States
b The Ohio State University, 1900 Kenny Road, Columbus, OH 43212, United States
ARTICLE INFO

Article history: Received 21 February 2011; Received in revised form 29 August 2011; Accepted 10 September 2011; Available online 16 September 2011

Keywords: Needs assessment; Discrepancy scores; Importance ratings

ABSTRACT

In needs assessment (NA), the calculation of discrepancies rests on the assumption that the "what should be" condition is a reasonable representation of respondent perceptions. That assumption may be erroneous and requires closer inspection. This paper examines the use of importance scores in NA and some of the problems that can arise when they are used as a proxy measure of the "what should be" condition. A review of the literature and of ways of dealing with importance scores is presented, followed by a discussion of the problems and issues that can arise. Some solution strategies are offered along with recommendations for practice and research. The paper provides guidance for others interested in improving needs assessment procedures.

© 2011 Elsevier Ltd. All rights reserved.
1. Introduction

Surveys are prominent in needs assessment (Altschuld & Witkin, 2000) and usually contain items for rating the "what should be" and "current" conditions, as well as motivation or other proximal scales on key topics or issues. At a minimum, discrepancies are calculated between the normative or "current" condition and the optimal or "what should be" state, in accord with the classic definition of needs (Gupta, Sleezer, & Russ-Eft, 2007; Kaufman & English, 1979; Leigh, Watkins, Platt, & Kaufman, 1998).

The "what should be" status is often referred to as "what is important." This raises concerns about the validity of the scores, because importance is used as a proxy with few questions about how it is measured and with limited research on the subject. This is troubling because importance rankings are frequently used to compute discrepancy scores (Lee, Altschuld, & White, 2007a).

This paper discusses the nature of the importance variable in needs assessment and some of the concerns associated with obtaining a meaningful and relevant measure. It begins with a theoretical review of the rationales respondents may use when assigning importance scores. This is followed by a discussion of pre-structuring importance data for surveys and some potential solution strategies. The paper describes some of the problems that can be encountered and their practical implications, and illustrates key issues found in the literature, such as how to illuminate disparities.
* Corresponding author. Tel.: +1 337 482 1010; fax: +1 337 482 5262.
E-mail addresses: [email protected] (J.L. White), [email protected] (J.W. Altschuld).
1 Tel.: +1 614 292 6541.
doi:10.1016/j.evalprogplan.2011.09.001
While needs assessors are the primary audience, the paper is also intended for those interested in the measurement of needs. Finally, it offers recommendations for researchers and suggestions to help avoid erroneous interpretations of needs data.
2. Importance data

At first glance it may seem that importance scores should be accepted at face value. But what does that information indicate about the basis respondents used for assigning the ratings? Everyone has an opinion about what is important, but is that opinion well-founded or ill-conceived? Ratings from less-than-informed perspectives cloud the picture and become suspect in calculating discrepancies. So what might the scores actually signify?

2.1. The nature of importance: theoretical overview

Ausubel's (1978) principles offer insight into the origin of importance judgments: prior knowledge is essential before rating an entity. Without accurate information, understanding is imprecise and respondents may construe what they have learned to fit their belief systems (Ausubel, Novak, & Hanesian, 1978). Anderson and Pearson (1984) refer to this as a network of mental processes that depicts how individuals look at the world through learned information, which leads to making connections between ideas and concepts. These principles continue to offer guidance for researchers on how to integrate new ideas (Novak, 2003) and make sense of data (Novak, 2002). In needs assessment practice, this is addressed by using visual images for illustration, oral prompts for context, and written stimuli such as short case vignettes to elicit responses about needs on a survey.
Table 1
Basis of importance, effect on discrepancy, and interpretation considerations.

Basis of score | Effect on discrepancy | Interpretation considerations
No knowledge about the topic | Discrepancy scores are probably not meaningful | Scores reflect opinion only
Incomplete or limited knowledge evidenced by quality/quantity of stimuli | Improvement on previous category but meaning is still in question | Marginally better than opinion
Inaccurate knowledge evidenced by quality/quantity of stimuli | Misleading discrepancies | Error-based or distorted scores
Knowledgeable about the topic evidenced by quality/quantity of stimuli | More meaningful discrepancies | Most accurate data
Another way to look at importance is through Coombs' (1964) theory of data, in which observations are: (1) products of stimuli, and (2) viewed in geometric terms as relations between points on a scale. It is the combination of these two propositions that becomes critical to obtaining meaningful observations. Coombs classified data into three categories. Preferential choice is characterized by selecting preferences from homogeneous groups such as political candidates or flavors of ice cream; in practice this is accomplished by providing a variety of options about needs from which to choose. In contrast, single stimulus data is marked by its lack of choices. Instead of selecting from a group, a candidate is scored on a scale in proximity to the other candidates, typically after some form of stimulus (visual, oral, or written) about that candidate is received. The last category, stimulus comparison, is distinguished by choices from pairs of options that are ordered in relation to the others after information is provided about all of the candidates.

The salient point from Coombs' (1964) theory that is still applicable is that a simple prompt about importance is not sufficient to challenge a respondent to think about choices. Selecting a high or low score is relatively easy since respondents do not have to make any distinctions. In other words, single stimulus data is a poor alternative and draws attention to the inherent problem of balancing the theoretical and practical aspects (Thompson, 1966, 1968) of needs assessment.

2.2. Basis of importance scores

For importance scores to be of value in computing discrepancies, thought must be given to the basis on which they were made and to the quality and quantity of the prompt. Are they derived from an informed perspective or something less? What is the quality of the evidence? Did they come from single or multiple sources? This raises a number of concerns, described in Table 1, about the basis on which respondents provide importance ratings and the effect that basis has on discrepancy scores.

The most desirable importance rating is in the last row of Table 1, where a score and subsequent discrepancy originate from an informed perspective. This yields the most valid and legitimate result. When discrepancies are derived from the other rows they are less precise and interpretation becomes problematic.
There are other issues that must be addressed when measuring importance. They include selection of the content (Calsyn & Winter, 1999; Kaufman & English, 1979), perspective of the group (Reviere, Berkowitz, Carter, & Ferguson, 1996), and construction of the items (Altschuld & White, 2010; Altschuld & Witkin, 2000; Witkin & Altschuld, 1995). These issues are described in Table 2, along with the nature of the problem and the effect they can have on needs assessment. It should be noted that the information presented in Tables 1 and 2 is compiled from the needs assessment literature. The issues and concerns are not intended to be exhaustive but rather to generate more discussion about the impact of importance (meaningful or otherwise) on the calculation of discrepancy scores.
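To ground the discussion that follows, the short sketch below shows how discrepancies are typically computed from "what should be" (importance) and "what is" ratings. The item names and ratings are hypothetical and the snippet is only a minimal illustration; as Table 1 emphasizes, the value of the resulting discrepancies depends entirely on the basis of the importance ratings.

```python
# Minimal sketch: mean discrepancy per item from "what should be" (importance)
# and "what is" (current status) ratings on a 1-5 scale.
# Item names and ratings are hypothetical.
from statistics import mean

ratings = {
    "tutoring services": {
        "importance": [5, 4, 5, 4, 5],   # "what should be"
        "current":    [3, 2, 3, 3, 2],   # "what is"
    },
    "financial aid advising": {
        "importance": [4, 4, 3, 5, 4],
        "current":    [4, 3, 4, 4, 3],
    },
}

for item, scores in ratings.items():
    discrepancy = mean(scores["importance"]) - mean(scores["current"])
    print(f"{item}: discrepancy = {discrepancy:+.2f}")
```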
3. Pre-structuring importance data

Most assessors and needs assessment committees collect information in order to make preliminary decisions about the content of the instrument. This pre-structuring is typically qualitative in nature and creates a context for constructing the quantitative sections of the survey. While mixing methods is acceptable (Greene, 2001), it is not without concern. Merely transferring the interview data into the survey is not sufficient for good alignment, and the opposite may be closer to reality (Harris & Brown, 2010; Reams & Twale, 2008; Stemler, 2004). It is the strength of alignment between methods that produces well-chosen survey items (more on alignment later).

When importance survey items are pre-structured, the distributions tend to be negatively skewed, clustered at the upper end of the scale with little differentiation. The lack of variability can be problematic when working with this type of data. Since discrepancies are dependent on current status scores, the concern is the consequence of using invalid data in their calculation. Such highly skewed distributions can be the product of poor alignment of the pre-structured information with the survey, which results in a loss or distortion of the information.

There are some scaling options available to help get around this problem. One is to use packed rating scales (Brown, 2004) with more response options at the positive or negative end of the spectrum, depending on the context. Other ways are to avoid the use of neutral mid-points (Doyle, 1975; Ory & Wise, 1981), clearly label the end points (Dixon, Bobo, & Stevick, 1984), and provide intermediate response options to obtain more refined discriminations (Lam & Klockars, 1982).
Table 2
Problems and issues in measuring importance.

Issue | Nature of the problem | Effect on needs assessment
Pre-selection of content by needs assessors (Calsyn & Winter, 1999) | Respondents can rate pre-selected items on high end of scale | Negatively skewed distribution of importance scores affecting calculation of discrepancies
Multiple group perspectives of importance (Reviere et al., 1996; Sung, 1992) | Groups rate on different bases of knowledge | Interpreting different discrepancies within and across groups
Multiple unique sets of items (stimuli) in surveys (Altschuld & White, 2010; Altschuld & Witkin, 2000; Witkin & Altschuld, 1995) | No relative importance of sections due to pre-selection | All large discrepancies seen as similar, no differentiation
This is not the same situation with the "what is" status, since respondents often select "do not know," "not applicable," or other similar options. This can create a different problem: a smaller number of "what is" responses compared to "what should be" (Lee, Altschuld, & White, 2007b). There may be little the needs assessor can do other than ensure an adequate sample size.

Going further, it is recommended that numerous groups (recipients, service providers, administrators) be sampled for needs assessment surveys. Multiple perspectives are valuable and help assessors understand how issues are perceived and where subgroups agree or disagree. There is a strong possibility that some items will indicate where subgroups have unique perceptions of importance based on different sources of information of varying degrees of accuracy. Needs assessors may also have to contend with within- and across-group discrepancies. (The same reasoning applies to current status ratings, but they are not the focus of this paper.) Importance scores in the second row of Table 2 may reflect different views and be influenced by the various knowledge bases on which they are predicated. There may also be differences based on characteristics such as age, race/ethnicity, and gender.

The last row corresponds to a latent issue in needs assessment methodology. Surveys are composed of sections, which are not usually rank ordered. Even when importance for all items emanates from accurate information, the sections themselves may not be of equal value. Unless there is an appropriate mechanism for ranking them (more on this later), it becomes difficult to interpret which discrepancies take precedence over others of similar size. This adds more complexity to thinking about discrepancies calculated from an anchor score based on importance.

4. Solution strategies

While some of the problems with the importance variable are formidable, the needs assessment literature provides possible solution strategies. Each has strengths, weaknesses, and unique characteristics, as shown in Table 3.
In addition, there are implications for instrument design, challenges for analyzing data, and interpretation issues for those conducting needs assessments.

The first strategy is to normalize the distribution of importance scores and compare the normed scores to the "what is" variable (Altschuld & White, 2010); a brief computational sketch of this idea follows Table 3. Normalizing removes skewness, but it would be better to have more variability without resorting to a statistical adjustment. While norming data can generate smaller measures of variability (standard errors, variances), it also creates concerns about the accuracy and bias of the results. This can become problematic with some stakeholders (needs assessment committees, providers, recipients), for whom statistical manipulation could be perceived as an artificial change.

The second solution was used by Witkin (1984) in a needs assessment about guidance counseling in schools. She felt that parents were not adequately informed and therefore were unable to make meaningful responses. To overcome this, a short paragraph was embedded before each section of the survey as a prompt before respondents rated the items. This strategy was intended to standardize the information, but it made the survey longer and more tedious to complete. The approach in this case was not based on the literature but was primarily intuitive, without a full understanding of the effects on response patterns (B. Witkin, personal communication, August, 1994).

Another solution is to have survey respondents describe their knowledge about a topic and their confidence in rating its importance in individual interviews, focus groups, or on questionnaires. Incentives (based on the context) can be used to acquire the explanatory information (Anderson, Puur, Silver, Soova, & Voormann, 1994; Brehm, 1994; Dillman, Smyth, & Christian, 2009). In one such study, Yoon (1990) asked college administrators (n = 62) about their attitudes and the factors involved in the utilization of needs assessments. He also queried the participants about their confidence in the answers and their choice of scale values. As with Witkin (1984), Yoon's approach did not use the additional information to adjust the findings.

Still another way of getting at importance was used by Hamann (1997) to study the knowledge and utilization of needs assessments in mental health counseling. Working with three groups of evaluators, administrators, and consumers (n = 162), she wanted to know whether the design and structure of the instrument affected perceptions of need and the resulting outcomes.
Table 3
Potential solutions, strengths, and weaknesses.

Solution | Strengths | Weaknesses | Characteristics
Normalizing the distribution of importance scores (Altschuld & White, 2010) | Simple, easy to do; Not too burdensome/costly; Eases skewness if observed | Substitute for not getting variability in the first place; May not accurately reflect responses; Only partially affects skewness; May be perceived as artificial | No change to instrument, standardization done after data collection
Adding information (stimuli) to inform ratings of importance (Witkin, 1984) | Provides all respondents with same basic, accurate information prior to rating; Standardizes base from which responses are made | Increases respondent burden by making instrument more complex; May change and bias responses | Each section of survey includes sufficient detail to set context for response
Adding inquiry items about the basis for the ratings (Yoon, 1990) | Gathers information about the underlying nature of responses; Better understanding of importance and meaning of discrepancies | Increases respondent burden; May get reconstructed logic as to rationale | Survey includes items inquiring into the rationale for assigned importance scores
Rank order sections within the survey (Hamann, 1997) | Straightforward and intuitive; Get a better feel for importance; Better able to prioritize discrepancies | Requires weighting responses; Various approaches used to arrive at weight values; Methodological concerns with different approaches | Instrument has subsections for rank ordering by respondents at end of survey
Visual cueing (Lee et al., 2007b) | Focuses respondents on group differences; Visual attracts attention and is unique; Highlights understanding of scores/discrepancy | Adds a little to respondent burden; Requires follow-up of respondents with additional data gathering; Can only deal with a subset of items | Follow-up survey incorporates Fig. 1 illustration into instrument
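To make the first strategy in Table 3 concrete, here is a minimal sketch of one way to normalize a skewed set of importance ratings before comparing them with "what is" scores. The exact normalization used by Altschuld and White (2010) is not reproduced here; the rank-based normal-scores transform below and the ratings themselves are illustrative assumptions.

```python
# Illustrative only: rank-based normal-scores (Blom) transform of skewed
# importance ratings, one of several ways to "normalize the distribution"
# before comparing with "what is" ratings. Data are hypothetical.
import numpy as np
from scipy.stats import rankdata, norm, skew

importance = np.array([5, 5, 4, 5, 5, 4, 5, 3, 5, 4, 5, 5])  # ceiling-heavy ratings

ranks = rankdata(importance)                      # average ranks handle ties
blom = (ranks - 0.375) / (len(importance) + 0.25) # Blom plotting positions in (0, 1)
normed = norm.ppf(blom)                           # approximate normal scores

print("skewness before:", round(skew(importance), 2))
print("skewness after: ", round(skew(normed), 2))
```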
[Fig. 1: horizontal bars for the STUDENTS and FACULTY/ADMINISTRATORS groups, with satisfaction (S) and importance (I) endpoints plotted on a 1.0-5.0 scale (values 0.38 and 1.05 shown on the bars).]
Fig. 1. Importance and satisfaction ratings by group (highlighted via the Altschuld loop).
The survey had several subsections and an overall question asking subjects to rank order the different sections. The idea was to use a weighting factor derived from the ranks as a proxy for importance when examining the numerical data. As was the case in Yoon's (1990) earlier research, the weighting was not factored into any subsequent calculations.

The approaches that Yoon (1990) and Hamann (1997) used might appear to offer a way to partially address the importance dilemma, but they are also problematic. That is because a ranked value is not necessarily an overall ranking of importance or a similar attribute (van der Pligt, de Vries, Manstead, & van Harreveld, 2000). For example, the order of finish in a NASCAR race is not an absolute measure of performance; instead, points are assigned based on finishing position and the number of laps led during the race. Similar reasoning was one of the reasons the determination of the college football national championship was changed (Oriard, 2009): rather than relying on national polls, a formula of weights and values was adopted that attempts to account for strength of schedule, quality of wins, and other important factors.

One possible solution is to assign weights using rank order centroids, rank exponent, or similar metrics (a brief computational sketch appears below). These values are derived after taking into account multiple factors (Barron, 1992; Barron & Barrett, 1996). They also undergo a process of rank ordering, weighing the pros and cons of each option, and ultimately balancing the choices (Buede, 2000). Since the result is a more meaningful importance score (and discrepancy analysis), it is worth the extra time and effort (Jia, Fischer, & Dyer, 1998). While this would have been helpful to Hamann (1997) and Yoon (1990), such research was not readily apparent in the needs assessment literature; hence the call for additional studies and more comprehensive literature reviews.

The final strategy in Table 3 is somewhat novel and requires more work. Taken from a study that examined discrepancies, it is a visual cueing technique created to reconcile differences between importance ratings given by college students and faculty (Lee et al., 2007b). While used with an online survey, the approach is also suitable for print media; it requires a follow-up survey. To demonstrate its utility in needs assessment, consider the following illustration.

Lee and colleagues (2007b) conducted a needs assessment of university programs designed to improve the retention rates of minorities in science, technology, engineering, and mathematics. Two groups, students (n = 186) and faculty (n = 39), were asked to rank the relative importance of, satisfaction with, and frequency of use of pre-college, academic, and financial support services. A within-method variation in the wording of the survey items was used to compare responses from the two groups. When the information was found to be dissimilar, it was necessary to devise a strategy that would provide insight into the rationale behind the ratings and thus a more meaningful interpretation of needs. The first survey rated the importance, satisfaction, and frequency of use of various programs. Importance was rated notably lower by students on some items compared to faculty responses. Working with respondents geographically dispersed across 15 universities, a qualitative follow-up instrument was administered online. It utilized a visual cueing device with a small subset of items from the original survey where disparities between the groups were large.
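The sketch referred to above is given here: a minimal illustration of rank order centroid (ROC) weights of the kind described by Barron and Barrett (1996), which could convert respondents' rankings of survey sections (as in Hamann's design) into weights for a discrepancy analysis. The section names are hypothetical, and ROC is only one of several possible weighting schemes.

```python
# Rank order centroid (ROC) weights for ranked survey sections.
# For n sections, the section ranked i (1 = most important) gets
#   w_i = (1/n) * sum_{k=i..n} 1/k
# Section names are hypothetical.
def roc_weights(n):
    return [sum(1.0 / k for k in range(i, n + 1)) / n for i in range(1, n + 1)]

sections = ["pre-college services", "academic support", "financial support"]
weights = roc_weights(len(sections))

for name, w in zip(sections, weights):   # sections listed in ranked order
    print(f"{name}: weight = {w:.3f}")
# Each section's mean discrepancy could then be multiplied by its ROC weight
# before priorities are set.
```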
The cueing is highlighted in Fig. 1 by what the researchers called the Altschuld Loop, which was designed to call attention to the disparity. In Fig. 1, the opposite ends of the bars represent the average importance (I) and satisfaction (S) ratings. In the example, importance was rated differently by the two groups of respondents. Circling the differing ratings placed the issue directly in front of individuals, prompting feedback about their opinions and perceptions as to why this occurred. The guiding questions were: (1) why was the other group different from yours, and (2) on what basis were judgments of importance made by your group and by the other one? The loop draws attention to the situation, and space was allotted for the reasons underlying the responses. Since answering the second survey took extra time, it should be stressed that only the items with the most pronounced differences were explored. This is a limitation of the technique: if there are numerous differences, its feasibility would be questionable.

This study (Lee et al., 2007b) led to some interesting interpretations of the needs data. In several cases, respondents revealed that the basis for their group's perceptions was not only unique but seemed, in hindsight, to have originated from an uninformed perspective, much like that presented in Table 1. Once they had the benefit of the rationale used by the other group, the smaller qualitative sample indicated they would likely change their initial importance (or satisfaction) ratings. While the reasons varied between and within the groups (e.g., lack of information, secondhand knowledge), both groups were influenced after receiving the insight of the other. Although the method has the potential to provide insight and offer possible explanations for the discrepancies, there are limitations when generalizing from a small group to a much larger one. On the other hand, without the information obtained from the second study this would not have been apparent, and the result could have been incorrect interpretations and spurious explanations of the data.

5. Illuminating disparities

Different approaches could be used as strategies for importance scores originating from an under-informed basis (and thus leading to erroneous interpretations). One approach would be to add stimuli (more items) to get additional information. Since numerical data by itself may not always provide enough information, respondents can be asked about the basis for their importance scores. A similar bar graphing technique, shown in Fig. 2, visually represents two key pieces of information: (a) the importance rating and (b) the basis on which the rating is given. The horizontal bars illustrate the importance score (I) and the basis (F) on which it has been assigned. In this case the 3.0 rating (on a 5-point scale) comes from a good foundation of knowledge and would be considered more valid than the inverse situation. This procedure could also be done with rating scales instead of bar graphs. Inquiries could also be made about the type of knowledge: direct (participation in the program), indirect (cursory interactions), anecdotal, or that influenced by sources such as the Internet, television, or other media. Importance scores can also be categorized as originating from little or no basis of information, a moderate amount, and so forth.
[Fig. 2: horizontal bars for the importance rating (I) and the basis for the importance rating (F), plotted on a 0-5.0 scale.]
Fig. 2. Concurrence of importance rankings (I) with foundation (F) where 0 = limited or no information and 5 = considerable information.
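As one way to operationalize the pairing of importance (I) with its foundation (F) shown in Fig. 2, the sketch below screens each item's importance rating against a minimum foundation level before a discrepancy is computed. The items, ratings, and cut-point are hypothetical; the cut-point itself would be set a priori by the committee or afterwards by a panel of experts, as discussed next.

```python
# Illustrative sketch: screen importance ratings by their "foundation" (F)
# rating (0 = no information, 5 = considerable information) before computing
# discrepancies. Items, ratings, and the cut-point are hypothetical.
CUT_POINT = 3  # minimum acceptable foundation level (set a priori or by experts)

items = [
    # (item, importance, foundation, current status)
    ("career counseling", 4.6, 4.2, 3.1),
    ("campus shuttle",    4.8, 1.5, 3.9),
    ("writing center",    3.9, 3.6, 3.0),
]

for name, importance, foundation, current in items:
    discrepancy = importance - current
    if foundation >= CUT_POINT:
        print(f"{name}: discrepancy {discrepancy:+.1f} (informed basis, F={foundation})")
    else:
        print(f"{name}: discrepancy {discrepancy:+.1f} flagged -- weak basis (F={foundation})")
```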
Any items with high values from the little-to-no-information category would not be considered in the same light as those arising from a more informed perspective. At a minimum, they would be treated differently for analysis and interpretation purposes. This gives the assessor a better understanding of the basis of the rating and would be beneficial for generating more realistic interpretations of discrepancies. While adding survey items is always a concern, the simplicity of the technique should mitigate any impact on response rate since it is relatively quick and easy for respondents.

An alternative would be to set a minimum level of information required for importance scores to be deemed valid. This could be determined a priori by the needs assessment committee or after the fact via a threshold set by a panel of experts. The most valid importance scores would be limited to those above the cut-point, leading to more certainty about the discrepancies being generated. Since some subjects may be hesitant to admit to having little or no information about an important issue, socially desirable responding is another concern (van de Mortel, 2008). This can be countered by incorporating social desirability scaling (Crowne & Marlowe, 1960; Paulhus, 1998) into the needs instrument so that it can be correlated with the discrepancies (Thompson & Phua, 2005). Discrepancy scores significantly correlated with the more socially desirable responses would be closely scrutinized.

6. Implications for needs assessment

At this juncture several conclusions can be drawn. First are the implications of probing into discrepancy scores and, more specifically, importance ratings. Doing so increases the data collection activities. While the methods suggested in Table 3 are intended to offer some guidance, they require more time and effort from the needs assessor. Most auxiliary data collection activities will create an additional burden for the subjects, exacerbate nonresponse (overall and per item), and produce lower return rates. For needs assessors, this will require more resources and more deliberation in constructing instruments capable of eliciting meaningful responses. Since there is no empirical evidence to suggest that any one way of resolving these issues is superior, practitioners will have to draw from experience, consult with colleagues or other experts, and ultimately plan for higher budgets. These budgets can be administratively justified on the basis of data quality: the more certainty desired from the discrepancy information, the greater the time and costs.

While the case for more inquiry into importance scores is evident, it comes with the caveat that each situation is unique. What works in one context may not translate to another. Consider, for example, a program or service for which there is no prior experience. On what basis other than intuition does a respondent have to choose? Without any frame of reference, the choice of scale values may simply be intuitive. In other situations, importance might not become apparent until the nature of the response emerges and indicates a need for more information.

With the additional information come more complexity and challenges, such as explaining very detailed and complex information to constituents. This is particularly true when support is required before going forward.
To illustrate one such case, consider the Boston Housing Authority's (Boston Housing Authority, n.d.) needs assessment. Prior to replacing its business software, the authority, serving more than 14,000 public housing units, determined it was mission critical to secure buy-in from its nearly 25,000 elderly, disabled, or economically disadvantaged clients. Due to the intricacy of the project (e.g., dealing with work orders, lease enforcement and collection, inspections, payment processing), considerable effort was devoted to explaining the process to a predominantly non-technical audience via flow charts and visual aids, simplifying what was at times overly technical information. This resulted in a better understanding of the processes involved and helped secure support for the project changes. Some of the same tactics can be employed with needs data. Graphics and multimedia can be used to reveal where the data came from, what they mean, and how they ultimately will be utilized. The techniques chosen would be based on the level of understanding of the specific audiences.

The information obtained from studying importance scores should help address some of the aforementioned needs assessment issues (Altschuld & White, 2010). The first is finding a meaningful standard for the "what should be" condition. The most desirable and accurate standard is one derived from an informed perspective. Delving into the issue of importance permits the assessor to establish a range of knowledge about the topic that can be used to construct a minimum standard. A secondary concern is confusion of the "what should be" condition with "wants" and "preferences." In the absence of measurable discrepancies, wants and preferences are merely opinions expressed on a survey. Altschuld and Witkin (2000) found that respondents sometimes expressed what they feel is important in terms of ideal conditions. Probing into the basis of the importance ratings could be useful in this regard. Working from the minimum knowledge base described above, the standard would help delineate meaningful "what should be" data from wants.

7. Recommendations for practice

With so many issues in obtaining a meaningful measure of importance and calculating useful discrepancies, where should assessors turn for assistance? There is not much guidance in the needs assessment literature. This is notable since there has been a continuous mantra calling for more publications on planning and conducting needs assessments (Altschuld & White, 2010; Altschuld & Witkin, 2000; Witkin & Altschuld, 1995). The lack of sources is not surprising since many assessments are never circulated beyond the organization, probably because they are used for internal purposes with no thought of publishing. Sharing them with the evaluation community would go a long way toward helping practitioners as well as building the body of professional knowledge.

Toward that end, more needs assessment research should be disseminated. This starts with greater emphasis on writing and publication by university students, professional associations, practitioners, and those supporting needs assessments. Until then, practitioners should review the fundamental measurement and instrument design literature (Gable & Wolf, 1993).
As more articles become available, assessors can discover better ways to implement studies and, more specifically, how to obtain valid importance ratings.

More deliberation in the construction of items on needs assessment surveys is also a consideration. Particular attention should be given to how surveys were constructed and the types of problems encountered. How did others get around the issue? What approach did they use, and did it provide insight into the basis for the responses without compromising the integrity of the instrument? In some cases, the only way to get richer data may be to measure other attributes (confidence, knowledge, etc.) or to use open-ended items that allow respondents to elaborate.

A thorough scrutiny of the survey should be undertaken in all needs assessments (Altschuld & White, 2010). This includes a psychometric examination of the instrument's internal consistency and confirmation of its constructs. The non-metric components, such as interview questions and protocols, can be validated with qualitative methods such as triangulation, member checking, and corroboration. As indicated earlier, the validity of the importance ratings can be compromised when the pre-assessment reconnaissance work fails to create a well-aligned survey.

In addition to verifying the feasibility of the needs assessment, small pilot studies can be used to confirm the reliability and validity of the instrument. Pilot studies go a long way toward identifying problems with the survey (e.g., instructions, item wording) as well as with access to the sample. This gives the assessor the opportunity to better understand the population and improve the overall fit of the instrument. When done in concert with stratified sampling, variability within subgroups can be identified to forestall some of the confounding problems. To maximize the likelihood that the data will align, the assessor can: (1) structure the pilot study prompts to be consistent with those on the needs survey, (2) remember there is a limited shelf-life between data gathering activities, (3) anchor responses to a common context, (4) focus on simplicity and avoid complex structures, (5) present items in unambiguous terms, and (6) when estimating agreement, use consensus-building techniques (Carpenter, 1999; Harris & Brown, 2010). After initial data collection, sending a follow-up e-mail to respondents simplifies the process of obtaining consensus. Unfortunately, some needs assessments may not lend themselves well to pilot studies; small projects with limited budgets may preclude this option. In these situations the literature becomes an even more valuable resource.

7.1. Alternative media

In addition to improving survey alignment, technology offers innovative opportunities for collecting importance information. For example, data collection can now include multimedia such as cyber focus groups, podcasts, or webinars to ask respondents about their rationale for assigning importance score values. Speech recognition software such as DragonDictate, Voice Direct, and ViaVoice can transcribe the information for thematic analysis with software such as NUD*IST, Atlas/TI, or Ethnograph. Advances in technology permit needs assessors to collect more data about importance scores from ever more geographically dispersed groups (Turney & Pocknee, 2005). Concomitant with these advances is the move toward newer generations of electronic surveys.
While many began as one-dimensional replications of pen and paper surveys, commercial platforms such as Sawtooth, Survey Monkey, and SurveyGizmo are taking advantage of the multimedia opportunities. By integrating imaging software, interactive audio/video, and graphics,
we should be able to delve more deeply into the "what should be" condition. This can be accomplished through highly realistic vignettes like those found in gaming technology and avatar communities (Christensen & Levinson, 2003), where the subjects and the needs assessor would interact in a three-dimensional virtual reality. The richness of the data would be limited only by the boundaries of the technology.

Using technology to create an interactive environment permits a level of inquiry not attainable through online or traditional questionnaires. In contrast to Witkin's (1984) reliance on scripted text to prompt survey responses, personal computers and multimedia software can now create a more realistic needs assessment context. Scenarios could be tailored to fit an array of purposes and to examine responses under different conditions. For example, would respondents provide the same importance rating if the program were offered under a different set of circumstances, location, time of day, and so on? With advances in technology, assessors should capitalize on innovative ways of collecting needs data.

When planning to use technology to collect importance data, needs assessors must consider a number of factors such as the culture of the organization, the approach and context of the study, and ethical considerations. For example, does the organization have a culture of technology that permits its use? If so, is it readily available throughout the entity or are there areas that are not accessible? What is the technological proficiency of the participants? Do they need training before the project can go forward?

The approaches used in data collection are also considerations. A follow-up questionnaire about the rationale for an importance score can generate significant amounts of qualitative information that technology can help analyze thematically. The time it takes to transcribe audio and/or visual recordings can be reduced to expedite the needs assessment process. Similarly, electronic surveys and statistical software have greatly increased the volume of quantitative data needs assessors can handle.

The context of the study is another concern. Do the time frame and budget allow the use of technology to collect importance data? Are the respondents centralized or geographically dispersed? In large-scale needs assessments, technology can be an invaluable tool (Lee et al., 2007b).

The decision to use technology to collect importance data can also generate ethical dilemmas. The anonymity of technology can lull respondents into a false sense of security. The assessor must be clear about when electronic communications and requests for needs data are coordinated and when they are not. The nature of the responses may differ when coordinated (synchronous) versus uncoordinated (asynchronous) communications are used. This can be particularly true when respondents do not know the differences between the two methods. Needs assessors must ensure that all participants understand and fully appreciate the processes involved. Above all, consideration must be given to protecting human subjects against potential abuses of technology.

In a similar vein, practitioners should look at various interactive media to see if there are better ways to assess importance. For instance, if alternative media were used independently of a traditional survey, the two sources could be used to validate the information before calculating discrepancy scores.
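As a small sketch of the cross-validation idea just mentioned, the example below compares per-item mean importance ratings gathered through two independent modes (a traditional survey and an alternative medium) using a Spearman rank correlation. The data are hypothetical, and the correlation is only one simple way to check agreement before the ratings are trusted in a discrepancy analysis.

```python
# Illustrative sketch: compare mean importance ratings gathered through two
# independent modes (e.g., a traditional survey and an interactive medium)
# before trusting them in a discrepancy analysis. Data are hypothetical.
from scipy.stats import spearmanr

survey_means    = [4.5, 3.2, 4.8, 2.9, 3.7]  # per-item means, traditional survey
alt_media_means = [4.3, 3.4, 4.6, 3.3, 3.5]  # per-item means, alternative medium

rho, p_value = spearmanr(survey_means, alt_media_means)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
# High rank-order agreement lends support to the importance data; weak agreement
# suggests probing the basis of the ratings before calculating discrepancies.
```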
While not advocating abandonment of surveys, the authors recommend experimenting with alternatives that reduce respondent burden while enhancing the quality of the importance data. Care must be taken to be judicious in requests for additional data. Inundating subjects can lead to data asphyxiation and negate any benefits associated with the new technology (Murray, 1998; Shenk, 2009). While technology has the potential to be a very useful needs assessment tool (Yilmaz & Altschuld, 2008), it must be emphasized that Web-based or cyber engagements may not elicit the same response that would be given in an online or traditional survey.
Responses obtained in an open forum may be the product of social desirability bias and produce answers not automatically useful for interpretation. This phenomenon is similar to that in exit poll data, where interviewees' reported choices sometimes differ from the ballots they actually cast in the voting booth (Fisher, 1993). The privacy of the computer is not dissimilar to that of the voting booth. Until more research is done, those working in assessment will have to be attentive to the concerns associated with the new media.

In addition to where responses are given is the notion of when. A 24-hour news cycle constantly produces changes in knowledge and life experiences, and the passage of time can affect individual and collective answers. Survey responses are dynamic: what was observed six months ago might not pertain today. Such was the case when Malmsheimer and Germain (2002) studied how well a needs survey predicted the utilization of extension training opportunities. Initial results found that most of the respondents (n = 200) indicated a preference for educational workshops. Six months later a follow-up was done to see if they actually attended. The study had mixed outcomes, with attendance ranging from 12% (general interest) to 22% (strong interest). This led to subsequent investigation to see whether utilization needs had changed over time or differed more than anticipated. The same logic can be generalized to other concerns such as the labor market (job search strategies, training, skills, etc.), higher education, health care, small business practices, and others. The relative importance of a previously detected need can change after respondents view a media report or an article about the topic.

While surveys have been the primary focus here, some comments about databases are in order. Before accepting importance scores from archival sources, the assessor should investigate how the information was obtained and what quality control mechanisms were in place (Altschuld, 2010). There should never be strict adherence to electronic information without understanding how the data were collected and the psychometric integrity of the instrument. What was the nature of the questions and the wording used in relation to the importance rating? The need for full understanding of and attention to these features cannot be overstated, particularly when the assessor is under pressure to use existing information to avoid the added costs of designing instruments and obtaining new data.

7.2. Knowledge gaps and research options

The literature indicates practitioners are incorporating advances in technology into assessments (Altschuld & White, 2010). However, that is not to imply there are no gaps in knowledge. More research is needed into the use of alternative media and the role it plays in needs assessment practice. For example, do investigators possess adequate understanding of technology (particularly new advances) and integrate it into their studies? What is their knowledge base and is it reliable? Has technology been used primarily to collect consumer data, or has it been expanded to include outcome assessment and program personnel?

Other relevant questions are: What groups are most amenable to technology-based needs assessments? What are the training needs of consumers and practitioners in order to capitalize on technology? Which technology-based strategies are better than others? What are the parameters on technology in needs assessment decision-making? What is the knowledge base for new practitioners?
What understanding do practitioners have of the algorithms used to store, reduce, and analyze data from large needs-based data sets?
What methods are employed to present complex needs information to various audiences, and are some more effective than others? What are the ethical issues encountered in the use of technology? What is the prevalence of human error in technology-based needs assessments? While not a comprehensive list, what is evident from the literature is that more information is needed about the use of innovative technology approaches in needs assessment practice.

7.3. Data analyses and models

New statistical approaches for analyzing survey data are available for needs assessment. The fuzzy Delphi method and fuzzy statistics are areas of particular interest (Chang, Huang, & Lin, 2000). In contrast to the Bayesian approach, respondents select from a range of values, hence the "fuzzy" concept. This may be more consistent with how importance is conceptualized. Instead of being limited to choosing from static numeric values (Halpern, 2003; Höppner, Klawonn, Kruse, & Ranker, 1999), fuzzy logic models can calculate importance score averages across groups of respondents. Considering that individuals tend to think within a range of values (desired income, travel time), this becomes an interesting and constructive way to design needs assessment instruments.

Needs assessors can also use other preference-based models such as unfolding, conjoint analysis, and discrete choice methods (Rossi, Allenby, & McCulloch, 2006). The algorithms used in unfolding respond to changes in circumstances: respondents are initially asked about the relative importance of an item, followed by similar questions that incorporate variations in the context. This affords needs assessors the opportunity to anticipate and plan for changes as they unfold. Conjoint analysis can be used in needs assessment to measure the importance of an item when it is evaluated simultaneously with others having similar features. Consider, for example, the importance of a good or service based on the time commitments, costs, and duration of the program. Because people will make tradeoffs, assessors can measure the relative worth of each attribute, attribute level, or combination of attributes and/or levels. Lastly, discrete choice modeling can be used to determine the priority given to a rating by examining the combination of intrinsic and extrinsic factors involved in making the decision. To illustrate, consider how the importance of health services to a community is determined. The availability of and demand for medical resources must be considered, as well as local demographics such as income, median age, transportation, and available medical alternatives. Discrete choice models become a more realistic option for needs assessment because they can account for many more variables.

8. Closing thoughts

To better understand the "what should be" condition, importance ratings must undergo a rigorous review before discrepancies are calculated: (a) during the writing of the items, (b) in the choice of scales, and (c) after data collection. The focal point of this paper has been some of the problems and issues that can arise and possible solution strategies. The discussion is not exhaustive but is intended to generate dialogue about a topic that has been glossed over in the needs assessment literature.
Each of the strategies offered has its own strengths and weaknesses that challenge practitioners to find a balance for the context of the needs assessment. This is imperative since most assessments require a significant investment in time, money, and labor. The benefits are worth the investment, however, particularly when importance scores are used to calculate discrepancies and prioritize needs.

When all is said and done, the end users of needs assessments will make decisions about the allocation of financial, material, and human resources. Less-than-adequate information about the "what should be" condition is not desirable when dealing with life-impacting situations and prioritizing some needs at the expense of others. Relying on poor quality data, only to subsequently find it is less than factual, can have serious consequences for programs and personnel, and is of greatest consequence to the recipients of such endeavors. Why should we accept less than the best available data we can obtain?

Finally, while the focus has been on importance, the issues raised in this paper could be generalized to satisfaction, motivation, or frequency scores. Therefore our call for more research should not be limited to importance but should go beyond it to include other similar attributes.
Acknowledgement

The authors would like to acknowledge our colleague Yi-Fang Lee, Associate Professor of Comparative Education at National Chi Nan University, for her support and contribution to this paper.
References

Altschuld, J. W. (2010). Needs assessment phase II: Collecting data. Thousand Oaks, CA: Sage Publications.
Altschuld, J. W., & White, J. L. (2010). Needs assessment: Analysis and prioritization. Thousand Oaks, CA: Sage Publications.
Altschuld, J. W., & Witkin, B. R. (2000). From needs assessment to action: Transforming needs into solution strategies. Thousand Oaks, CA: Sage Publications.
Anderson, R. C., & Pearson, P. D. (1984). A schema-theoretic view of basic processes in reading comprehension. In P. D. Pearson (Ed.), Handbook of reading research (pp. 255-291). Mahwah, NJ: Lawrence Erlbaum Associates.
Anderson, B., Puur, A., Silver, B., Soova, H., & Voormann, R. (1994). Use of a lottery as an incentive for survey participation: A pilot study in Estonia. International Journal of Public Opinion Research, 6, 64-71.
Ausubel, D. (1978). In defense of advance organizers: A reply to the critics. Review of Educational Research, 48, 251-257.
Ausubel, D., Novak, J., & Hanesian, H. (1978). Educational psychology: A cognitive view (2nd ed.). New York: Holt, Rinehart & Winston.
Barron, F. H. (1992). Selecting a best multiattribute alternative with partial information about attribute weights. Acta Psychologica, 80(1-3), 91-103.
Barron, F. H., & Barrett, B. E. (1996). The efficacy of SMARTER: Simple multi-attribute rating technique extended to ranking. Acta Psychologica, 93(1-3), 23-36.
Boston Housing Authority (n.d.). IT needs assessment project. Retrieved from http://www.bostonhousing.org/detpages/deptinfo128.html.
Brehm, J. (1994). Stubbing our toes for a foot in the door: Prior contact, incentives and survey responses. International Journal of Public Opinion Research, 6, 45-63.
Brown, G. T. L. (2004). Measuring attitude with positively packed self-report ratings: Comparison of agreement and frequency scales. Psychology Reports, 94(3), 1015-1024.
Buede, D. M. (2000). Engineering design of systems: Models and method. New York: John Wiley and Sons.
Calsyn, R. J., & Winter, J. P. (1999). Understanding and controlling response bias in needs assessment studies. Evaluation Review, 23(4), 399-417.
Carpenter, S. L. (1999). Choosing appropriate consensus building techniques and strategies. In L. Susskind, J. Thomas-Larmer, & S. McKearnan (Eds.), Consensus building handbook (pp. 169-198). Thousand Oaks, CA: Sage Publications.
Chang, P.-T., Huang, L.-C., & Lin, H.-J. (2000). The fuzzy Delphi method via fuzzy statistics and membership function fitting and an application to the human resources. Fuzzy Sets and Systems, 112(3), 511-520.
Christensen, K., & Levinson, D. (2003). Encyclopedia of community: From the village to the virtual world (Vol. 1). Thousand Oaks, CA: Sage Publications.
Coombs, C. H. (1964). A theory of data. New York: John Wiley and Sons Inc.
Crowne, D. P., & Marlowe, D. (1960). A new scale of social desirability independent of psychopathology. Journal of Consulting Psychology, 24(4), 349-354.
Dillman, D. A., Smyth, J. D., & Christian, L. M. (2009). Internet, mail, and mixed-mode surveys: The tailored design method (3rd ed.). New York: John Wiley and Sons.
Dixon, P. N., Bobo, M., & Stevick, R. A. (1984). Response differences and preferences for all-category-defined and end-category-defined Likert formats. Educational and Psychological Measurement, 44(1), 61-66.
Doyle, K. O. (1975). Student evaluation of instruction. Lexington, MA: Lexington Books.
Fisher, R. J. (1993). Social desirability bias and the validity of indirect questioning. Journal of Consumer Research, 20(2), 303-315.
Gable, R. K., & Wolf, M. B. (1993). Instrument development in the affective domain: Measuring attitudes and values in corporate and school settings (2nd ed.). Boston, MA: Kluwer Academic.
Greene, J. C. (2001). Understanding social programs through evaluation. In N. K. Denzin & Y. S. Lincoln (Eds.), Handbook of qualitative inquiry (2nd ed., pp. 981-999). Thousand Oaks, CA: Sage Publications.
Gupta, K., Sleezer, C. M., & Russ-Eft, D. F. (2007). A practical guide to needs assessment (2nd ed.). San Francisco: Jossey-Bass (Pfeiffer).
Halpern, J. Y. (2003). Reasoning about uncertainty. Cambridge, MA: MIT Press.
Hamann, M. S. (1997). The effects of instrument design and respondent characteristics on perceived needs. Dissertation Abstracts International: Section A. Humanities and Social Sciences, 58(05), 1672.
Harris, L. R., & Brown, G. T. L. (2010). Mixing interview and questionnaire methods: Practical problems in aligning data. Practical Assessment, Research & Evaluation, 15(1). Retrieved from http://pareonline.net/pdf/v15n1.pdf.
Höppner, F., Klawonn, F., Kruse, R., & Ranker, T. (1999). Fuzzy cluster analysis: Methods for classification, data analysis and image recognition. New York: John Wiley and Sons.
Jia, J., Fischer, G. W., & Dyer, J. S. (1998). Attribute weighting methods and decision quality in the presence of response error: A simulation study. Journal of Behavioral Decision Making, 11(2), 85-105.
Kaufman, R. A., & English, F. W. (1979). Needs assessment: Concept and application. Englewood Cliffs, NJ: Educational Technology Publications Inc.
Lam, T. C. M., & Klockars, A. J. (1982). Anchor point effects on the equivalence of questionnaire items. Journal of Educational Measurement, 19(4), 317-322.
Lee, Y.-F., Altschuld, J. W., & White, J. L. (2007a). Problems in needs assessment data: Discrepancy analysis. Evaluation and Program Planning, 30(3), 258-266.
Lee, Y.-F., Altschuld, J. W., & White, J. L. (2007b). Effects of multiple stakeholders in identifying and interpreting perceived needs. Evaluation and Program Planning, 30(1), 1-9.
Leigh, D., Watkins, R., Platt, W., & Kaufman, R. (1998). Alternate models of needs assessment: Selecting the right one for your organization. Human Resource Development Quarterly, 11(1), 87-93.
Malmsheimer, R. W., & Germain, R. H. (2002). Needs assessment surveys: Do they predict attendance at continuing education workshops? Journal of Extension, 40(4). Retrieved from http://www.joe.org/joe/2002august/a4.php.
Murray, B. (1998). Data smog: Newest culprit in brain drain. APA Monitor, 29(3). Retrieved from http://www.apa.org/monitor/mar98/smog.html.
Novak, J. D. (2002). Meaningful learning: The essential factor for conceptual change in limited or inappropriate propositional hierarchies leading to empowerment of learning. Science of Education, 76(4), 548-571.
Novak, J. D. (2003). The promise of new ideas and new technology for improving teaching and learning. CBE-Life Sciences Education, 2(2), 122-132.
Oriard, M. (2009). Bowled over: Big-time college football from the sixties to the BCS era. Chapel Hill, NC: The University of North Carolina Press.
Ory, J. C., & Wise, S. L. (1981). Attitude change measured by scales with 4 and 5 response options. Paper presented at the annual meeting of the National Council on Measurement in Education.
Paulhus, D. L. (1998). Paulhus deception scales (PDS). New York: Multi-Health Systems Inc.
Reams, P., & Twale, D. (2008). The promise of mixed methods: Discovering conflicting realities in the data. International Journal of Research and Methods in Education, 31(2), 133-142.
Reviere, R., Berkowitz, S., Carter, C. C., & Ferguson, C. G. (1996). Needs assessment: A creative and practical guide for social scientists. Washington, DC: Taylor & Francis.
Rossi, P. E., Allenby, G. M., & McCulloch, R. (2006). Bayesian statistics and marketing. John Wiley and Sons.
Shenk, D. (2009). Data smog: Surviving the information glut. New York: HarperCollins e-books.
Stemler, S. E. (2004). A comparison of consensus, consistency, and measurement approaches to estimating interrater reliability. Practical Assessment, Research & Evaluation, 9(4). Retrieved from http://pareonline.net/getvn.asp?v=9&n=4.
Sung, K.-T. (1992). Identification and prioritization of needs of families by multiple groups: Residents, key informants, and agency directors. Social Research Indicators, 26(2), 137-158.
Thompson, J. W. (1966). Coombs' theory of data. Philosophy of Science, 33(4), 376-382. Retrieved from http://www.jstor.org/stable/186640.
Thompson, J. W. (1968). Discussion: Polarity in the social science and in physics. Philosophy of Science, 35(2), 190-194. Retrieved from http://www.jstor.org/stable/186488.
Thompson, E. R., & Phua, F. T. T. (2005). Reliability among senior managers on the Marlowe-Crowne short-form social desirability scale. Journal of Business and Psychology, 19(4), 541-554.
Turney, L., & Pocknee, C. (2005). Virtual focus groups: New frontiers in research. International Journal of Qualitative Methods, 4(2). Retrieved from http://www.ualberta.ca/iiqm/backissues/4_2/html/turney.htm.
van de Mortel, T. F. (2008). Faking it: Social desirability response bias in self-report research. Australian Journal of Advanced Nursing, 25(4), 40-48.
van der Pligt, J., de Vries, N. K., Manstead, A. S. R., & van Harreveld, F. (2000). The importance of being selective: Weighting the role of attribute importance in attitudinal judgment. Advances in Experimental Social Psychology, 32, 135-200.
Witkin, B. R. (1984). Assessing needs in educational and social programs. San Francisco: Jossey-Bass.
Witkin, B. R., & Altschuld, J. W. (1995). Conducting needs assessment: A practical guide. Thousand Oaks, CA: Sage Publications.
Yilmaz, H. B., & Altschuld, J. W. (2008). Use of two focus group interview formats to evaluate public health grand rounds: Methods and results. Presentation to the annual conference of the American Evaluation Association.
Yoon, J.-S. (1990). Factors related to the utilization of needs assessment in Ohio colleges and universities. Dissertation Abstracts International: Section A. Humanities and Social Sciences, 51(07), 2288.
Dr. Jeffry L. White is Assistant Professor of Educational Foundations and Leadership at the University of Louisiana, Lafayette. His primary foci are in quantitative methods and prioritizing needs assessment data.
Dr. James W. Altschuld is Professor Emeritus of Quantitative Research, Evaluation and Measurement in Education at The Ohio State University. His research and writing interests are in needs assessment and the training of evaluators.