How to improve a technology evaluation model: A data-driven approach


Heeyong Noh (a), Ju-Hwan Seo (b), Hyoung Sun Yoo (b), Sungjoo Lee (a)

(a) Department of Industrial Engineering, Ajou University, 206 Worldcup-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do, South Korea
(b) Department of Industry Information Analysis, Korea Institute of Science and Technology Information, 66 Hoegiro, Dongdaemun-gu, Seoul, South Korea

Correspondence to: Department of Industrial Engineering, Ajou University, 206 Worldcup-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do, South Korea. E-mail addresses: [email protected] (H. Noh), [email protected] (J.-H. Seo), [email protected] (H. Sun Yoo), [email protected] (S. Lee).

Keywords: Technology evaluation model; Validity; Improvement; South Korea; Data-driven

Abstract

Academic research suggests a number of technology evaluation models. To ensure effective use, models need to be improved in accordance with changing internal and external environments. However, a majority of previous studies focus on model development, while a few emphasize their implementation or improvement. To fill this research gap, this study suggests a systematic approach to examining the validity of technology evaluation models and improving them. We consider three propositions as criteria for improvement: 1) the coherence of the evaluation results with the evaluation purpose, 2) the appropriateness of the evaluation methods, and 3) the concreteness of the evaluation model. Rather than using expert opinions, this study takes a data-driven approach, wherein we analyze actual evaluation results and determine whether the model produces the intended results. A case study of 291 technology evaluation results, all made by the South Korean government in support of technology-based small and medium-sized enterprises, is conducted to verify the suggested approach's applicability. This is one of the few studies to address issues regarding improvements to a technology evaluation model. Its approach can help to develop and continuously improve a valid technology evaluation model, thus leading to more effective practice.

1. Introduction

Technology evaluation has long received considerable attention, in both industry and academia (Cho and Lee, 2013; Hsu et al., 2015; Kim et al., 2011; Perkmann et al., 2011); accordingly, a number of technology evaluation models have been suggested to date and are currently in use. Existing efforts towards the use of a technology evaluation model can be divided largely into two categories, namely, model development and implementation. Development relates to activities for deciding "what and how" to evaluate technologies in order to achieve the evaluation purpose in a given context (ex-ante efforts). Meanwhile, implementation refers to the application of the developed model in practice, and it includes activities for investigating the evaluation process and results in an attempt to improve the model (ex-post efforts).

The development category comes first, since without a valid model, any remaining work may not be meaningful. Quite naturally, mainstream research in South Korea, the United States, and Europe has also focused on this first category of activities (Lee et al., 1996). However, once a valid model has been developed, it needs to be implemented with the necessary commitment of resources and a customized application to a real context. The same evaluation model can produce different performance results, depending on differences in resource commitments (Bremser and Barsky, 2004) and project profiles (Loch and Tapper, 2002).



Thus, if an evaluation model is to be used effectively, continuous efforts need to be made so that the model may be implemented and further improved. Regarding development and implementation, Kaplan and Norton (1996, p. 99), who developed the balanced scorecard as a strategic management tool concerning measures to achieve strategic goals, highlight the continuous adaptation of strategy, arguing that "measurement has consequences far beyond reporting on the past. Measurement creates focus for the future." This implies that implementation, in particular continuous improvement, also requires careful analysis and consideration.

Implementing a technology evaluation model can be a management challenge that produces unexpected results, even when the model was reasonably developed on the basis of scientific literature and data. In practice, unexpected barriers (e.g., objection to model use, or a lack of experts involved in the evaluation process) may be encountered in a model's actual application, which can in turn hinder the achievement of the anticipated results. Sometimes, when developing the model, incorrect cause-and-effect relationships can be hypothesized (e.g., evaluation criteria irrelevant to the construct of interest). Internal or external environments may also change, and this can affect the model's validity (e.g., changes in corporate innovation policy, or in government policy and funding programs). Hence, the validity of the model needs to be tested, and on the basis of that validity, its improvement strategy needs to be developed.





These problems are observed not only in the technology evaluation context, but also in other contexts such as decision support systems and knowledge management. Borenstein (1998) insists that little attention has been given to validating decision support systems that have been developed and put into practice. More recently, Park et al. (2010) argue that a majority of studies in a knowledge management context tend to concentrate on evaluation model development, but that insufficient efforts have been made to improve the evaluation models. The current study recognizes this research gap and aims to develop an approach by which to improve technology evaluation models, while focusing especially on models that look to predict future impacts (i.e., outcomes). Of course, improvement strategies can target various elements such as people (evaluators) and processes (evaluation procedures), whereas this study emphasizes the significance of tools (the evaluation model).

The effectiveness of this particular tool—an evaluation model—depends on its ability to discriminate high-performance technologies from low-performance ones. The validity of the model can be tested by comparing predicted performance to actual performance. However, it is not easy to measure R & D performance, on account of unobservable effort levels, uncertain project success, and time lags between investment and performance (Loch and Tapper, 2002). Thus, a comparison analysis between predicted and actual performance can cover only limited aspects of model validity. Therefore, in this study, we reviewed the literature on validity theory and built on three propositions that we consider valuable guidelines in improving model validity: 1) the coherence of the evaluation results with the evaluation purpose, 2) the appropriateness of the evaluation methods, and 3) the concreteness of the evaluation model (the mutually exclusive and collectively exhaustive (MECE) nature of indicators as evaluation criteria). In particular, we suggest a data-driven approach, in which evaluation results are used to determine model validity; this allows the data to directly express the characteristics of the model, and thus the current approach can complement an expert-based approach in advancing the model.

To verify the applicability of the propositions, we adopted a technology evaluation model that has been used since 2002 by a South Korean government agency. The agency is one of the most representative institutes in charge of supporting South Korean small and medium-sized enterprises (SMEs). SMEs with great potential to successfully commercialize their technologies in the future have been identified by the agency and then involved in the national-level SME development program. For this purpose, the agency has used the model to assess the potential of the technology a firm possesses. Although the agency recognized the significance of ex-post activities in model implementation, it has focused mainly on program implementation and maintenance. The model has evolved through several stages, but mainly on the basis of experts' insights, even as the necessity of employing more systematic approaches that use evaluation results was raised within the agency. Therefore, the agency was selected as a case study, and its technology evaluation data—which were gathered during the 2008–2013 period—were provided as a major analytical source.
Theoretically, the current study is one of the few to address how to improve a technology evaluation model that has already been developed and is being implemented. The approach can also be practically helpful by improving the validity of both the development and use of a technology evaluation model.

The remainder of this paper is structured as follows. Section 2 describes various types of technology evaluation models and the requirements for building a valid evaluation model. Section 3 then suggests three propositions to test model validity and introduces suitable statistical methods for the test. The study's results are addressed in Section 4, and discussions are presented in Section 5. Finally, contributions and limitations are discussed in Section 6.

2. Literature review

2.1. Technology evaluation models

Since the early 1960s, technology assessment has received consistent attention from both academia and industry (Azzone and Manzini, 2008; Linstone, 2011). Technology is defined as "… the practical application of knowledge to achieve particular tasks that employs both technical artefacts (hardware, equipment) and (social) information (software, know-how for production and use of artefacts)" (IPCC, 2007). As the nature of technology depends on the tasks to be achieved, technology is diverse in its application, and technological innovation can also be observed in different ways (OECD, 2005). This indicates that technology evaluation can be conducted in various ways, according to the purpose of a given evaluation. This study uses the term "technology evaluation" in a broad sense, so as to include the evaluation, assessment, and measurement of technology or R & D-related factors. Diverse types of technology evaluation, in this context, need to be reviewed to achieve the aim of this study—namely, the development of a framework by which to improve technology evaluation models. We adopt two basic elements to recap the existing approaches to technology evaluation: 1) what to evaluate, and 2) how to evaluate. These can be considered the core of an evaluation model.

First, "what to evaluate" relates to an evaluation target. Different technology evaluation purposes require different perspectives with respect to technology. Specifically, as technology has long served as a springboard or a source of innovation (Danneels, 2004; Kang and Park, 2012), key technology evaluation areas include not only technological features, but also technology-based organizational capabilities and technological impacts on markets, since all these factors work together to determine the value of a technology. According to Roure and Keeley (1990), three perspectives—namely, management, environment, and the firm—have been suggested in evaluating the technologies in a venture. The management perspective relates to organizational or individual capabilities that make a firm's technology or R & D-related activities effective and efficient. This perspective has been a particular focus of recent studies that look to assess organizational technology capabilities (e.g., Cheng and Lin, 2012; Kim et al., 2011; Mohammadi et al., 2017; Sobanke et al., 2014; Van Wyk, 2010). The environment perspective is used to predict the successful diffusion of these capabilities in a market and the benefits that can be expected from corresponding R & D investments. Therefore, a number of researchers have incorporated this perspective into their technology evaluation models (e.g., Abbassi et al., 2014; Jolly, 2012; Santiago et al., 2015). Finally, the firm perspective deals with the advantages of technology itself, as a resource, and as a corporate strategic choice with respect to technologies (e.g., Chiu and Chen, 2007; Kim et al., 2011; Rocha et al., 2014; Shen et al., 2010). Here, it should be noted that although the three perspectives are valuable in capturing the value of technology, they need not always be considered simultaneously when evaluating a technology; the choice of perspective in evaluating a technology should instead depend upon the evaluation purposes at hand.

Second, "how to evaluate" is linked to evaluation methods, which can in turn be largely classified as qualitative, quantitative, or a combination thereof.
Technology assessments, in spite of their well-known limitations in terms of reliability and validity (Yin, 2013), have often been conducted in a qualitative manner, given the ease of capturing through this method the softer aspects of technology-related factors (Azzone and Manzini, 2008; Facey et al., 2010). Qualitative methods include interviews with experts, or a focus group study. In contrast, quantitative approaches offer hard data and provide numerical clarification. Such inherent advantages make them attractive for use in technology evaluation (Daim et al., 2009; Kalbar et al., 2012; Wang et al., 2015). Quantitative data, or a set of quantified soft data, have generally been applied in the purely quantitative approach, because doing so can facilitate a more realistic evaluation process (Ordoobadi, 2008).


Researchers have utilized numerous methodologies as quantitative approaches, including complex mathematical models (e.g., Van Zee and Spinler, 2014; Wang et al., 2015), scoring models based on multi-criteria decision-making analyses (e.g., Daim et al., 2009; Hsu et al., 2010; Kalbar et al., 2012; Lee et al., 2014; Rocha et al., 2014), and heuristic algorithms (e.g., Abbassi et al., 2014). Finally, previous studies have also attempted to measure technological characteristics by simultaneously using qualitative and quantitative methods (Shen et al., 2010; Wortley et al., 2015; Yu and Lee, 2013), in what is generally called a "qualitative," "hybrid," or "mixed" approach.

The evaluation model in this study takes into account all three of the aforementioned perspectives (i.e., management, environment, and firm) in terms of "what to measure"; it is based on a scoring model regarding "how to measure." As a scoring model produces final evaluation results simply by adding a set of different factors, it is one of the most commonly used technology evaluation models (Sohn et al., 2005). Additionally, this study restricts its focus to an evaluation model for predicting future impacts (or outcomes).

2.2. Requirements for a valid technology evaluation model

It is essential at the outset to understand the concept of the validity of a technology evaluation model. Leading theorists (e.g., Cronbach, 1988; Kane, 2001; Messick, 1986; Shepard, 1993) have debated validity over the last two decades, but have not yet achieved a solid consensus. In this study, we newly define the validity of a technology evaluation model as a model's ability to accurately measure the value of technology, according to the purpose of the evaluation. In practice, measurement differs from evaluation: the former is defined as "the assignment of numbers to aspects of objects or events according to one or another rule of convention" (Stevens, 1968), while the latter is defined as "the making of judgements about the value of a program or program design" (Bloom, 1956). However, scoring-based models—one of which is the target model in this study—comprise a set of criteria to be measured, and require a judgment concerning the value of the evaluation target for each criterion in order to assign an appropriate number. Then, by integrating the assigned numbers for all criteria, a final value of the evaluation target is determined. This indicates that validity theories for measurement can be effectively applied to evaluation, in particular to the technology evaluation model investigated in this study.

Specifically, we follow the work of Borsboom et al. (2004), who suggest simple yet adequate semantics for the validity concept within three domains—namely, ontology, reference, and causality. As these three domains provide thorough explanations and insights for considering the validity of a technology evaluation model, we make three essential propositions that constitute a framework by which to improve a technology evaluation model, based on these domains.

First, Borsboom et al. (2004) explain the concept of ontology, i.e., the nature of existence, rather than epistemology, which studies the nature of knowledge, the rationality of belief, and justification. This means that a valid measurement system should enable users to measure the target "as is," and not based on what they already know about the target (i.e., the recognized target). To meet the requirements of ontology, the measurement system needs to be properly designed from the beginning; however, investigations also need to feature follow-up activities that monitor whether what is expected to happen corresponds to what actually happened. If this concept is applied to a technology evaluation model, it concerns the appropriateness of the evaluation methods—ensuring that the evaluation method is sufficiently appropriate for measuring what it needs to measure. With respect to the ontology requirement, there is a need for both ex-ante efforts (to develop a technology evaluation model) and ex-post efforts (to improve the model). As technology covers a wide range of intersecting and heterogeneous contingencies (Rooney, 1997), the technology evaluation criteria should be well developed while taking into account the evaluation purpose (e.g., discovering promising technology, selecting competitive R & D projects, or assessing organizational technology strength). In particular, evaluation criteria and evaluation methods should be developed while taking human reasoning into consideration; this includes the human propensity to use approximate information and sidestep uncertainty in making evaluations. Thus, one should avoid the use of measurements and observations that could be considered by human assessment to be ambiguous. On the assumption that the evaluation criteria are based on a theoretical background vis-à-vis a proper evaluation purpose, we focus more on the evaluation methods, while attempting to test appropriateness in terms of whether the methods are sufficiently precise to produce meaningful outcomes.

Second, Borsboom et al. (2004) insist that theoretical terms may be referenced by several meanings (e.g., the planet Venus can be called both the "morning star" and the "evening star"), and thus can be measured by several attributes that need to exist in reality and be referenced by theoretical terms. They link this concept to construct validity, which is defined as "the degree to which a test measures what it claims, or purports, to be measuring" (Cronbach and Meehl, 1955). Additionally, they argue as follows:


Therefore, a question like Are IQ tests valid for intelligence? can only be posed under the prior assumption that there does exist, in reality, an attribute that one designates when using the term intelligence; the question of validity concerns the question of whether one has succeeded in constructing a test that is sensitive to variations in that attribute. (p. 1065)


For this reason, the reference domain relates to the concreteness of the evaluation model. As to this reference domain, the validity test measures the appropriateness of inferences made on the basis of measurements (observations for attributes) to indicate the intended construct (theoretical terms). To evaluate whether the evaluation criteria can measure what they claim to measure, the current study focuses on whether a set of criteria (intended constructs) consisting of several indicators (measurements) is MECE; such criteria would enable the evaluation of technology from various perspectives, and on the basis of an evaluation purpose. These criteria would also help make appropriate inferences regarding the technology from the evaluation results.

Finally, Borsboom et al. (2004) suggest the causal concept of validity, which indicates that differences in measurement outcomes should be explained by differences in attributes. They argue that:

…it is clear that if attribute differences do not play a causal role in producing differences in measurement outcomes, then the measurement procedure is invalid for the attribute in question. (p. 1067)

This causality requirement suggests the need to test the coherence of the evaluation results with the evaluation purpose. In the context of technology evaluation, this implies that a valid technology evaluation model should be able to estimate future impacts or outcomes on the basis of its evaluation results; otherwise, the technology evaluation results might be useless. As technology evaluation is fundamentally a predictive study—in which R & D is typically conducted at the back end of actual businesses (Baden-Fuller and Haefliger, 2013)—this causality should be particularly investigated when testing the validity of a technology evaluation model.

3. A data-driven approach to improving a model

3.1. Propositions to test model validity

Based on the results of our literature review, we propose a framework for testing the validity of a technology evaluation model, while focusing particularly on a way of continuously improving that model. The current study focuses on the scoring model, as mentioned in Section 2. Scoring models are popular, primarily because of their simplicity and robustness compared with other evaluation methods. Scoring models ordinarily comprise a number of evaluation indicators that produce a final score through the addition or multiplication of individual scores (Souder, 1972), and their structure can make the use of an evaluation model easy; the ease with which their final scores are calculated can create the expectation of robustness through risk-pooling effects.


Furthermore, scoring models can easily cover a wide range of technology evaluation purposes by employing different sets of indicators, in particular those significantly related to a certain technology evaluation purpose. Limiting our improvement target to the evaluation model, the decisions on "what to evaluate" and "how to evaluate" are placed at the center of the improvement strategies. Between the two, the former relates to selecting appropriate attributes (reference) that can indicate the target technological characteristics (causality), while the latter is closely connected to developing proper guidelines for evaluation (ontology); by having adequate scales for attributes and assigning rational ratings to each of the attributes in accordance with their features, the model can provide practical guidelines for users and act as a useful predictor of the future of the technology. The propositions for a valid technology evaluation model, as suggested in the current study, include the following: 1) the coherence of the evaluation results with the evaluation purpose, 2) the appropriateness of the evaluation methods, and 3) the concreteness of the evaluation model. A framework based on these propositions is proposed to test the validity of a scoring-based technology evaluation model (Fig. 1).

First, the coherence of the evaluation results with the evaluation purpose (causality) is the most significant and basic proposition for a valid model: the model, after all, cannot be valid when its purpose cannot be accomplished. Thus, this coherence is located at the top of the triangle. Once this coherence condition is satisfied, the other two propositions are examined to identify ways in which we can improve the model's efficiency and effectiveness. Second, the appropriateness of the evaluation methods (ontology) relates to the scoring system for each indicator in an evaluation model. In general, evaluators tend to avoid assigning extreme values to indicators, whereby their evaluation results can be biased to a certain degree. A good scoring system should be able to minimize these tendencies by adjusting a scale for indicators or offering enhanced evaluation guidelines. Therefore, this study claims that an evaluation model is valid only when experts' evaluated scores are well dispersed within the indicators' initially designed score ranges; this means that the model has been developed while considering the evaluators' judgment patterns. Finally, the third proposition is the concreteness of the evaluation model (reference). If the intention of the initial model's designers is adequately reflected in the model, its indicators should, at a minimum, be mutually exclusive for their corresponding evaluation criteria and collectively exhaustive when used for the overall evaluation purpose.

If a technology evaluation model is valid, these propositions will be well supported by statistical evidence. To produce statistical evidence, a data-driven approach—in which a validity test is conducted using data collected from the evaluation results—needs to be adopted.

This approach has several advantages over literature- or expert-based approaches. First, the actual usability of an evaluation model can be monitored. The model outcomes anticipated at the time of its development may not align with the actual outcomes at the point of model implementation, as Bremser and Barsky (2004) argue. By using a data-driven approach, it becomes easy to see where the gaps between expectation and reality exist. Second, statistical tests can be applied to guide improvements to a model; such tests offer clear criteria and objective information in distinguishing an acceptable model from one that needs to be improved, and they can also propose directions for further improvement.

3.2. Required statistical methodologies for the framework

3.2.1. Coherence of the evaluation results with the evaluation purpose: causality

An investigation of the evaluation results and the actual future values obtained in accordance with the evaluation purpose is needed to test the coherence of the evaluation results with the evaluation purpose. If an evaluation model is useful, technologies with high evaluation scores should tend to perform better (or create more value) than those with low evaluation scores. Here, of course, the definition of "performance" needs to align with the purpose of the evaluation; external criteria for measuring performance should be designed and used for this test. For example, if the purpose of the evaluation is to identify a technology with a high potential for commercialization within five years, actual future values should be obtained by observing the degree of successful technology commercialization after a five-year time lag from the evaluation—for which another indicator measuring performance (i.e., the actual degree of success in technology commercialization) needs to be compared to the evaluation results. Then, there should be statistically significant differences in performance (or value) between the groups with high and low technology evaluation scores (Fig. 2); a t-test can be a simple and powerful tool for accomplishing this. If this usefulness test cannot demonstrate statistically significant results, the other two perspectives are unnecessary, as the model has failed to achieve its basic purpose.

3.2.2. Appropriateness of the evaluation methods: ontology

The scale of each individual indicator for technology evaluation should cover the evaluators' overall perceptions as broadly as possible. However, evaluation scores for each indicator can decline toward the lower end (with an evaluation guide that is too strict) or rise toward the higher end (with an evaluation guide that is too generous). If the indicator values are skewed, or concentrated on one value, the indicator is not likely to make a meaningful contribution to the overall evaluation results. The evaluation methods—such as the evaluation guidelines used to assign values to the corresponding characteristics of the technologies—need to be able to distinguish technologies that should be highly valued for the indicator from those that should not. Thus, the appropriateness of the evaluation methods is the proposition that relates to investigations of the dispersion, curvature, and bias of ratings on individual indicators of a technology evaluation model. Descriptive statistics, including skewness and kurtosis, are used to test appropriateness for each indicator. Skewness (adjusted) and kurtosis are measured through the following equations:

$$S \text{ (skewness)} = \frac{\sqrt{N(N-1)}}{N-1} \cdot \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^3 / N}{S^3}, \qquad K \text{ (kurtosis)} = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^4 / N}{S^4},$$

where N is the number of technologies evaluated, Y_i is the evaluation result for technology i, \bar{Y} is the mean value of Y, and S is the standard deviation of Y.

Skewness refers to the degree of one-sided propensity; such propensity needs to be adjusted to make indicator values centrally distributed around the mean. Kurtosis can be used to identify the ambiguity of evaluation scales; a higher level of kurtosis indicates that the scale has less-distinguishable ratings and should therefore be adjusted to avoid over-concentration on a particular value. Fig. 3 visualizes the anticipated evaluation results and a possible undesirable (i.e., highly skewed or highly concentrated) distribution.
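As a concrete illustration, the following minimal sketch (ours, not the agency's implementation) computes these two statistics for one indicator's scores with NumPy, following the formulas above; the example `scores` array is a hypothetical set of 5-point ratings.

```python
import numpy as np

def indicator_skewness(y):
    """Adjusted skewness of one indicator's scores, per the formula above
    (S is the sample standard deviation)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    m3 = np.mean((y - y.mean()) ** 3)      # third central moment (divided by N)
    s = y.std(ddof=1)                      # sample standard deviation S
    return np.sqrt(n * (n - 1)) / (n - 1) * m3 / s**3

def indicator_kurtosis(y):
    """Kurtosis of one indicator's scores (equals 3 for a normal distribution)."""
    y = np.asarray(y, dtype=float)
    m4 = np.mean((y - y.mean()) ** 4)      # fourth central moment (divided by N)
    s = y.std(ddof=1)
    return m4 / s**4

scores = [4, 5, 4, 3, 5, 4, 4, 5, 2, 4]    # hypothetical 5-point ratings for one indicator
print(indicator_skewness(scores), indicator_kurtosis(scores))
```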

Fig. 1. Framework for improving the validity of a technology evaluation model.



Fig. 2. T-test to investigate the causality condition.

Fig. 3. Skewness and kurtosis tests to investigate the ontology condition.

3.2.3. Concreteness of the evaluation model: reference

To satisfy the concreteness condition, the indicators of a technology evaluation model should reflect a technology's unique and multifaceted characteristics, without any overlap. Under the premise that a technology evaluation model is developed to satisfy ontological claims, this concreteness can be evaluated through the use of factor analysis. When the model has mutually exclusive and collectively exhaustive evaluation criteria, factor analysis results should indicate that the model comprises independent evaluation criteria, and that its indicators are assigned to only one of the criteria (Fig. 4). However, if an indicator fails to measure a criterion, being simultaneously loaded to multiple criteria, it needs to be revised or deleted from the model.

Fig. 4. Factor analysis test to investigate the reference condition.

4. Case study

4.1. Background

This study takes up a single case that provides a rich description and understanding of the phenomena of concern (Walsham, 1995). At the same time, this study is exploratory, as it has few precedents; it is also longitudinal, and requires a set of cumulated data and additional data collection. Given these characteristics, we restrict our focus to only a single case. However, this single case may provide a basis by which to explain why ex-post efforts are essential to technology evaluations, and to investigate additional cases in other settings (Darke et al., 1998).

Therefore, to test the applicability of the three propositions for improving a technology evaluation model, this study examines an R & D support program funded by the South Korean government that targets domestic SMEs. This program aims to promote successful technology developments in SMEs by reinforcing the competitiveness of entrepreneurial SMEs that would otherwise have limited business capabilities; it ultimately aims to boost the national innovation system by vitalizing the commercialization of novel technologies. For this program, a technology evaluation model capable of identifying R & D projects with great potential in terms of technological commercialization was first developed in 2002 by one of the South Korean government agencies, and it has been in use since then.


There are two reasons for choosing this model as a case example. First, in terms of country characteristics, South Korea has invested considerably in national R & D programs, and a number of technology evaluation models have been developed to select the beneficiaries of these programs. Since the early 1980s, South Korea has strengthened its innovation capabilities through intensive national R & D programs (Kim and Dahlman, 1992); it has also established government-funded organizations that specialize in R & D planning, as well as project selection and evaluation, to increase the efficiency of these programs (Lee et al., 1996). As a result of such efforts, South Korea now has the world's highest rate of R & D intensity—4.3% of GDP—which puts it ahead of Israel, Japan, and the United States. It also has the world's largest government budget appropriations for R & D (OECD, 2016). Considering the enormous resources that South Korea must have invested to evaluate R & D projects, it is worthwhile to derive implications from the South Korean case.

Second, in terms of organizational characteristics, the agency in this case study is a representative South Korean organization that has been involved in evaluating technologies among SMEs. Its first-generation technology evaluation model was developed in 2002 and has been successfully used for more than 10 years. There have been several attempts to improve the model; however, all attempts to date have been based on expert opinions. There arose internal organizational needs regarding the use of evaluation data in measuring the performance of the evaluation model and finding a way to improve it more systematically.

The evaluation model (hereafter referred to as the "agency model") has been used to select the beneficiaries of government funding, in support of R & D planning. It evaluates the technologies of applicant companies, in terms of their potential for success in technology commercialization. The model—which consists of a set of evaluation criteria, indicators, and guidelines to obtain values for the indicators—was developed by internal researchers at the agency, based on literature reviews that featured group discussions. Since then, it has been used annually, and two technology and business experts are assigned to evaluate candidate technologies. As mentioned, the model has been adjusted to changing environments, by integrating, adding, and eliminating some of its indicators; however, these adjustments are subject to chronic limitations, in that only internal opinions are used to improve the model. To address these limitations, in 2014, this research team was invited to offer an external investigation; this is expected to lead to results more objective than those generated by internal reviews, and to suggest ways of improving the agency model.

The latest version of the model comprises 26 indicators (Table 1); a final value, as an evaluation result, was obtained by calculating a weighted average value for those indicators. The data were gathered between 2008 and 2013, and correspond to 291 evaluation results using the agency model; they were then provided to us for the current study. Hence, the data-driven approach to improving a technology evaluation model was developed through collaboration between us (i.e., the research team) and the agency.
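To make the aggregation step concrete, here is a minimal sketch of how a weighted-average scoring model of this kind combines indicator ratings into a final value; the indicator subset and the weights shown are illustrative assumptions, since the agency's actual weighting scheme is not reported here.

```python
from typing import Dict

def final_score(ratings: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of indicator ratings (keyed by indicator code, e.g. "6-2")."""
    total_weight = sum(weights.values())
    return sum(weights[code] * ratings[code] for code in weights) / total_weight

# Hypothetical ratings for a small subset of the Table 1 indicators
ratings = {"1-1": 4, "1-2": 3, "6-2": 5, "8-2": 4}
weights = {"1-1": 1.0, "1-2": 1.0, "6-2": 1.5, "8-2": 1.0}
print(round(final_score(ratings, weights), 2))
```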
Table 1
Indicators of the agency model.

| Class | Indicator |
|---|---|
| Innovativeness | 1–1 Degree of technological leading edge |
| | 1–2 Technological differentiation |
| | 1–3 Lifecycle position of technology |
| Technological competitiveness | 2–1 Technological contribution to a specific product |
| | 2–2 Applicability and extendibility of technology |
| | 2–3 Ease of production |
| | 2–4 Availability of alternative technology |
| Strategic validity | 3–1 Strategic compatibility of a technology |
| | 3–2 Validity of technology development plan |
| R & D infrastructure | 4–1 Size of R & D office |
| | 4–2 Expertise of R & D employees |
| | 4–3 Public certification records (legitimacy) |
| | 4–4 Holding of IPRs |
| Competitiveness of managers | 5–1 Manager experience in similar fields |
| | 5–2 Commercialization capability of managers |
| | 5–3 Career of managers |
| Market attractiveness | 6–1 Degree of market competition |
| | 6–2 Potential market size |
| | 6–3 Potential market growth |
| | 6–4 Stability of market demand |
| | 6–5 Barriers to market entry |
| | 6–6 Expected market share |
| Economic effect | 7–1 Expected return on investment |
| | 7–2 Expected economic ripple effect |
| Commercialization feasibility | 8–1 Strategic compatibility of commercialization |
| | 8–2 Validity of commercialization plan |

However, to test the first proposition of "the coherence of the evaluation results with the evaluation purpose," further data collection was needed to compare "the evaluation results" with "the evaluation purpose." As the evaluation purpose was to find a technology that is likely to show high performance in terms of technology commercialization, two kinds of proxies were used as performance criteria: 1) a willingness to maintain the project, and 2) overall satisfaction with the project results.

There were several reasons for using subjective proxies to measure the performance of R & D projects rather than objective measures such as changes in sales or other economic performance measures. First, the commercialization stages of the technologies under analysis were all different when the survey was conducted, because we traced technologies that were evaluated between 2008 and 2013. Some of the technologies had started to create market value, while others were still at the research stage and trying to move to a development stage. Second, the technologies under analysis are in different sectors, which might have different characteristics that affect the commercialization process. For example, the time lag between market entry and economic performance may vary by sector. Third, successful technology commercialization may be defined in various ways and cannot always be evaluated in terms of economic performance, especially among startups and entrepreneurial SMEs, which were the target of the funding. Gruber (2007) argues that a positive cash flow or profitability may not be the prime goal for early-stage ventures that are attempting to establish competitive positioning in an emerging market. Therefore, the use of subjective performance measures is fairly common in entrepreneurship research (Pavia, 1990). Finally, subjective performance measures have strong inter-rater reliability (Calantone et al., 2002); additionally, they strongly correlate with objective performance measures (Frishammar and Hörte, 2005; Zhou et al., 2015).

The aforementioned rationale encouraged us to design two subjective proxies and use them to measure technology performance, ultimately to test the coherence of the results with the evaluation purpose of this study. Hence, firms that stated that they were willing to maintain the project, those that had already reached the break-even point, or those satisfied with the project were classified into a "successful" group (73 firms, 25.09%); others were classified into a "failed" group (39 firms, 13.40%). For this data collection, an e-mail survey was conducted: 112 firms responded to the survey (response rate: 38.49%) and comprised the final sample for this study.

4.2. Testing the coherence of the evaluation results with the evaluation purpose

A t-test was used in this study to examine coherence. Initially, Levene's test demonstrated that equal variances can be assumed between the successful and failed groups for both proxies. Given equal variances, the agency model's final scores were found to be statistically significant in classifying an R & D project's actual future performance, as noted in Table 2. The analytical results show that the successful group has a higher average score than the failed group, thus implying that the coherence of the evaluation results with the evaluation purpose is acceptable (Appendix A).


Table 2
T-test for the usefulness.

| Variable | Levene's test F | Levene's test sig. | t | Degrees of freedom | Sig. (two-tailed) | Mean difference |
|---|---|---|---|---|---|---|
| Willingness to keep up the project | 1.607 | .208 | 2.344 | 110 | .021* | 2.380 |
| Overall satisfaction with results of the project | 2.507 | .116 | 2.480 | 110 | .015* | 2.544 |

* p-value < .05.
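For reference, a minimal sketch of this procedure (Levene's test for equality of variances followed by an independent-samples t-test) is given below; the two score arrays are randomly generated placeholders with the study's group sizes, not the actual evaluation data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
scores_success = rng.normal(75, 8, 73)   # hypothetical final scores, "successful" group (n = 73)
scores_fail = rng.normal(70, 8, 39)      # hypothetical final scores, "failed" group (n = 39)

lev_stat, lev_p = stats.levene(scores_success, scores_fail)      # equality of variances
t_stat, t_p = stats.ttest_ind(scores_success, scores_fail,
                              equal_var=(lev_p > 0.05))          # independent-samples t-test
print(f"Levene p = {lev_p:.3f}, t = {t_stat:.3f}, p = {t_p:.3f}")
```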

4.3. Testing the appropriateness of the evaluation methods

Descriptive statistics, including skewness and kurtosis, were used to test appropriateness (Table 3). The average skewness of the 26 indicators was approximately −.748, indicating that there is a negative skew and that evaluations lean toward the higher scores. The average kurtosis of the 26 indicators was approximately .563, meaning that the evaluation scores tended to converge at a certain point. These results suggest that the scale for each indicator could be more carefully elaborated, or that the evaluation guidelines could be improved.

In a normality test, the generally accepted range for skewness is between −1 and +1, although more generously, a range between −2 and +2 is allowed (Field, 2009; Gravetter and Wallnau, 2014; Trochim and Donnelly, 2006). Under the criterion of a range between −1 and +1, seven indicators (3–1, 3–2, 4–2, 5–1, 5–2, 5–3, and 6–1; see Table 1) failed to meet the conditions required to assume a symmetric distribution. The same range is used for kurtosis as for testing normality. Generally, the kurtosis criterion is shifted by −3; this is in line with what most commercial statistical software uses and is adopted in this study. (Kurtosis was originally defined to have a value of 3 when normality is satisfied, so the shifted measure is 0 for a normal distribution.) In our case study, the ideal distribution of indicator values is a normal distribution, as the indicators were originally designed to take such a bell-shaped distribution, with more technologies appearing in the middle of the distribution than at the extremes. When these criteria were applied to our data, six indicators (1–2, 3–2, 5–1, 5–2, 5–3, and 6–3) showed room for improvement. Here, one should note that another ideal distribution can be a uniform distribution, in which technologies are evenly distributed across the different evaluation values. To test whether indicator values follow a uniform distribution, a chi-square test is suggested; this test is commonly used to see whether a statistically significant difference exists between expected and observed frequencies in several categories.

As a result, we identified a set of indicators that require further investigation: indicators that failed to meet both the skewness and kurtosis requirements were given high priority for improvement, while indicators meeting only one of the requirements were listed to be given attention. In the agency model, 16 indicators were regarded as acceptable, whereas four needed to be improved first (3–2, 5–1, 5–2, and 5–3); the remaining six (1–2, 2–1, 3–1, 4–2, 6–1, and 6–3) needed to be monitored. The indicator values of the agency model tend to show negative skewness and positive kurtosis values. Thus, it is generally recommended that stricter guidelines (for increasing skewness values) and more distinguishable scales (for reducing kurtosis values) be designed when the model is improved.
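The screening rule described above can be sketched as follows; the ±1 thresholds mirror the criteria used in this section, while the example ratings, the 5-point scale, and the use of SciPy's bias-corrected estimators (rather than the exact formulas of Section 3.2.2) are assumptions for illustration.

```python
import numpy as np
from scipy import stats

def improvement_priority(ratings, skew_limit=1.0, kurt_limit=1.0):
    """'High' if both the skewness and the shifted (excess) kurtosis criteria fail,
    'Medium' if only one fails, 'Low' otherwise."""
    skew_fail = abs(stats.skew(ratings, bias=False)) > skew_limit
    kurt_fail = abs(stats.kurtosis(ratings, fisher=True)) > kurt_limit   # kurtosis shifted by -3
    return {2: "High", 1: "Medium", 0: "Low"}[int(skew_fail) + int(kurt_fail)]

def uniformity_pvalue(ratings, levels=(1, 2, 3, 4, 5)):
    """Chi-square test of the observed rating frequencies against a uniform distribution."""
    observed = [int(np.sum(np.asarray(ratings) == v)) for v in levels]
    return stats.chisquare(observed).pvalue        # expected frequencies default to uniform

ratings = [5, 5, 4, 5, 5, 4, 5, 3, 5, 5, 4, 5]     # hypothetical ratings for one indicator
print(improvement_priority(ratings), round(uniformity_pvalue(ratings), 3))
```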

Table 3
Descriptive statistics to test the appropriateness.

| Indicator | Skewness | Kurtosis | Improvement priority |
|---|---|---|---|
| 1–1 Degree of technological leading edge | −.06 | .11 | Low |
| 1–2 Technological differentiation | −.31 | 1.20 | Medium |
| 1–3 Lifecycle position of technology | −.87 | .10 | Low |
| 2–1 Technological contribution to a specific product | −.91 | 1.07 | Medium |
| 2–2 Applicability and extendibility of technology | −.11 | −.13 | Low |
| 2–3 Ease of production | −.24 | −.73 | Low |
| 2–4 Availability of alternative technology | −.11 | −.13 | Low |
| 3–1 Strategic compatibility of a technology | −1.01 | .03 | Medium |
| 3–2 Validity of technology development plan | −1.97 | 3.13 | High |
| 4–1 Size of R & D office | −.68 | −.51 | Low |
| 4–2 Expertise of R & D employees | −1.24 | .56 | Medium |
| 4–3 Public certification records (legitimacy) | −.29 | −.09 | Low |
| 4–4 Holding of IPRs | −.94 | .53 | Low |
| 5–1 Manager experience in similar fields | −1.63 | 2.09 | High |
| 5–2 Commercialization capability of managers | −1.66 | 2.01 | High |
| 5–3 Career of managers | −1.51 | 2.82 | High |
| 6–1 Degree of market competition | −1.05 | .76 | Medium |
| 6–2 Potential market size | −.76 | .24 | Low |
| 6–3 Potential market growth | −.84 | 1.49 | Medium |
| 6–4 Stability of market demand | −.77 | .27 | Low |
| 6–5 Barriers to market entry | −.63 | .22 | Low |
| 6–6 Expected market share | −.37 | −.13 | Low |
| 7–1 Expected return on investment | −.73 | −.67 | Low |
| 7–2 Expected economic ripple effect | .18 | −.12 | Low |
| 8–1 Strategic compatibility of commercialization | −.44 | .74 | Low |
| 8–2 Validity of commercialization plan | −.51 | −.23 | Low |

4.4. Testing the concreteness of the evaluation model

We investigated the indicators' MECE characteristics in the agency model through factor analysis (Table 4). Oblique rotation was employed in this analysis, and factor components with eigenvalues greater than 1 were selected, following Kaiser's method. As a result, 10 components were identified; the explanatory power of the selected components was acceptable, as their overall cumulative variance approximated 70%. The factor analysis results show that the agency model can be improved by appropriately designing its evaluation criteria and assigning indicators to these criteria. Of the eight criteria, only three—namely, competitiveness of managers (5–1, 5–2, and 5–3), economic effect (7–1 and 7–2), and commercialization feasibility (8–1 and 8–2)—were found to have coherent sets of indicators that were distinct from other sets. Apart from these, other indicators could be recognized via different criteria; some criteria could be divided into subcriteria, while new criteria could be adopted.
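A minimal sketch of this analysis is shown below, assuming the 291 × 26 matrix of indicator ratings is available in a hypothetical file `agency_ratings.csv` (rows: evaluated technologies; columns: the 26 indicators of Table 1) and that the third-party `factor_analyzer` package is installed; the .4 loading cut-off used to flag cross-loaded indicators is also an assumption for illustration.

```python
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

X = pd.read_csv("agency_ratings.csv")        # hypothetical 291 x 26 matrix of indicator ratings

# Kaiser's criterion: retain components whose eigenvalues exceed 1
fa_unrotated = FactorAnalyzer(rotation=None)
fa_unrotated.fit(X)
eigenvalues, _ = fa_unrotated.get_eigenvalues()
n_factors = int(np.sum(eigenvalues > 1))

# Refit with an oblique (oblimin) rotation and inspect the loading pattern
fa = FactorAnalyzer(n_factors=n_factors, rotation="oblimin")
fa.fit(X)
loadings = pd.DataFrame(fa.loadings_, index=X.columns)

# An indicator is "clean" if exactly one loading exceeds the cut-off in absolute value
cross_loaded = loadings.abs().gt(0.4).sum(axis=1) != 1
print(loadings.round(3))
print("Indicators to revise or reassign:", list(loadings.index[cross_loaded]))
```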

4.5. Strategy for improving the model

After the proposed framework was applied to the real case of the agency model, its validity was investigated in terms of the coherence, appropriateness, and concreteness propositions. First, R & D performance was tracked, and willingness and satisfaction were used as proxies to identify future attainments in evaluations. A t-test was conducted to examine the coherence between the evaluation results and what the evaluation intended to measure (R & D performance); a statistically significant difference in R & D performance between firms with high evaluation results and firms with low evaluation results was identified, and this indicated that the model performed as expected. Second, descriptive statistics, including skewness and kurtosis, were used to examine the appropriateness of the evaluation methods. Unlike the situation with coherence, only 16 indicators met the criteria here. Finally, factor analysis was conducted to test the concreteness of the evaluation model; only 14 indicators were satisfactory, matching the criteria they were initially designed to measure. Although the coherence condition was satisfied in the agency model case, the analytical results demonstrate that both the appropriateness and the concreteness could be improved.

This data-driven approach shows that the agency model can be greatly enhanced, even though, according to the coherence results, the basic requirement for being a valid technology evaluation model has been met. Specifically, there are four indicators that require more focus in improving the model with respect to appropriateness—namely, the validity of the technology development plan (3–2), manager experience in similar fields (5–1), the commercialization capability of managers (5–2), and the career of managers (5–3). Since these indicators have in common comparatively low skewness values (less than −1) and high kurtosis values (greater than +1), their evaluation results were concentrated on one or two positive values; such circumstances are unlikely to help distinguish "good" technologies from "bad" ones. We conducted another t-test to determine whether differences in these indicator values between the high-performance group and the low-performance group are statistically significant. As expected, none of the four indicators showed statistically significant differences when the significance level was set to .05. It is likely that these indicators were either too subjective to be evaluated strictly, or so obvious that most technologies being evaluated obtained high values. In the former case, there is a need for more detailed guidelines that can help an evaluator assign appropriate values as objectively as possible; in the latter case, the indicator needs to be removed from the evaluation model, or rescaled to be stricter.

In addition, the evaluation model could be better structured, because several indicators failed to match their corresponding criteria, that is, the intended ideal constructs of the model. Table 5 describes a newly designed set of criteria based on the factor analysis results. The eight criteria in the original model are extended to 10 criteria, which include four technology-related criteria (technology competitiveness, R & D risks, production risks, and commercialization feasibility), three market-related criteria (market competitiveness, market potential, and economic effect), and three organization-related criteria (management competitiveness, R & D capabilities, and IP capabilities). For example, holding intellectual property rights (IPRs) (4–4) may not be an appropriate indicator for measuring R & D infrastructure, as most technologies evaluated by the agency model belong to SMEs, which use both IPRs and non-IPR mechanisms to protect their technologies. It might be better to separate the indicator "holding of IPRs" from the other three indicators regarding R & D infrastructure. On the other hand, the market attractiveness criterion in the original model consists of six indicators that attempt to evaluate the characteristics of market attractiveness from various perspectives. The analytical results divide the criterion into two subcriteria—that is, market competitiveness and market potential—and reassign one of its indicators, stability of market demand (6–4), to another criterion, R & D risks. Two indicators, technological differentiation (1–2) and lifecycle position of technology (1–3), should be examined ontologically, as they are not closely related to any of the criteria, having a factor loading value greater than .4 but less than .5. Although Hair et al. (1998) insist that, for practical reasons, loading values greater than .35 are acceptable if the sample size is larger than 250, these two indicators have relatively low loading values compared with the others; increasing their loading values is expected to improve the overall validity of this model.


Table 4
Factor analysis results to test concreteness. (Columns 1–10 show the factor loadings on the ten extracted components.)

| Criteria | Indicator | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Innovativeness | 1–1 Degree of technological leading edge | .700 | .086 | .014 | .039 | .048 | −.056 | −.259 | .072 | .163 | −.032 |
| | 1–2 Technological differentiation | .397 | −.115 | −.170 | .351 | −.064 | .048 | −.443 | .271 | −.013 | −.155 |
| | 1–3 Lifecycle position of technology | .107 | .215 | .069 | .196 | −.081 | −.237 | −.167 | −.161 | −.150 | .465 |
| Technological competitiveness | 2–1 Technological contribution to a specific product | .717 | .066 | .064 | .072 | −.170 | −.023 | .099 | −.159 | −.197 | .006 |
| | 2–2 Applicability and extendibility of technology | .761 | −.132 | −.064 | −.150 | .061 | .107 | −.068 | .084 | .046 | −.083 |
| | 2–3 Ease of production | −.084 | .015 | −.006 | −.014 | −.004 | .051 | −.035 | .878 | .010 | .037 |
| | 2–4 Availability of alternative technology | .506 | −.016 | .035 | −.064 | .130 | −.233 | .088 | −.131 | .537 | .115 |
| Strategic validity | 3–1 Strategic compatibility of a technology | .125 | −.179 | −.627 | .063 | −.076 | −.062 | .061 | .243 | .029 | −.120 |
| | 3–2 Validity of technology development plan | .212 | .549 | −.384 | −.203 | −.071 | −.050 | −.239 | .203 | −.024 | .230 |
| R & D infrastructure | 4–1 Size of R & D office | −.120 | .700 | .098 | −.123 | .090 | −.050 | .107 | −.007 | −.007 | −.335 |
| | 4–2 Expertise of R & D employees | −.032 | .693 | .066 | .044 | .231 | .216 | −.113 | −.225 | .098 | −.023 |
| | 4–3 Public certification records (legitimacy) | .060 | .618 | .082 | .418 | −.306 | −.002 | .289 | .202 | −.017 | .036 |
| | 4–4 Holding of IPRs | .124 | .159 | .009 | .124 | −.010 | −.025 | −.045 | −.095 | −.099 | −.826 |
| Competitiveness of managers | 5–1 Manager experience in similar fields | −.111 | .004 | .037 | .153 | .808 | −.065 | .033 | −.010 | .085 | −.107 |
| | 5–2 Commercialization capability of managers | .151 | .004 | −.253 | .182 | .715 | .187 | .077 | −.059 | −.102 | .239 |
| | 5–3 Career of managers | −.060 | .189 | .248 | −.303 | .556 | −.068 | −.061 | .119 | −.046 | −.253 |
| Market attractiveness | 6–1 Degree of market competition | −.115 | .024 | .050 | −.105 | −.100 | −.310 | −.558 | .223 | .237 | .132 |
| | 6–2 Potential market size | −.010 | −.011 | .089 | .790 | .273 | −.152 | .004 | .022 | −.022 | −.008 |
| | 6–3 Potential market growth | −.223 | .007 | −.266 | .544 | −.127 | .061 | −.225 | −.265 | .326 | −.104 |
| | 6–4 Stability of market demand | .195 | −.067 | .728 | .080 | −.157 | .051 | −.146 | .229 | .132 | −.095 |
| | 6–5 Barriers to market entry | .019 | −.040 | .254 | .123 | .052 | .014 | −.796 | .107 | −.136 | .167 |
| | 6–6 Expected market share | .121 | .024 | −.088 | −.077 | −.078 | .031 | −.764 | −.177 | .084 | −.200 |
| Economic effect | 7–1 Expected return on investment | .068 | .147 | −.106 | .017 | .039 | .020 | −.077 | .124 | .601 | −.123 |
| | 7–2 Expected economic ripple effect | −.171 | −.116 | .262 | .060 | −.108 | .107 | .068 | −.065 | .699 | .172 |
| Commercialization feasibility | 8–1 Strategic compatibility of commercialization | .075 | −.151 | .010 | .228 | .020 | −.806 | .189 | .218 | .016 | −.006 |
| | 8–2 Validity of commercialization plan | −.086 | .030 | −.112 | −.062 | −.027 | −.823 | −.149 | −.256 | −.059 | .004 |
| | Percent variance | 9.782 | 8.761 | 7.712 | 7.186 | 6.582 | 6.288 | 5.735 | 5.705 | 5.650 | 5.379 |
| | Cumulative variance | 9.782 | 18.543 | 26.254 | 33.440 | 40.022 | 46.311 | 52.045 | 57.751 | 63.401 | 68.780 |



Table 5
New criteria based on indicators. Columns 1–10 report factor loadings on the ten extracted components.

| Code | Indicator | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Re-defined criteria |
|------|-----------|---|---|---|---|---|---|---|---|---|----|---------------------|
| 1–1 | Degree of technological leading edge | .700 | .086 | .014 | .039 | .048 | −.056 | −.259 | .072 | .163 | −.032 | Technology competitiveness |
| 2–1 | Technological contribution to a specific product | .717 | .066 | .064 | .072 | −.170 | −.023 | .099 | −.159 | −.197 | .006 | Technology competitiveness |
| 2–2 | Applicability and extendibility of technology | .761 | −.132 | −.064 | −.150 | .061 | .107 | −.068 | .084 | .046 | −.083 | Technology competitiveness |
| 3–2 | Validity of technology development plan | .212 | .549 | −.384 | −.203 | −.071 | −.050 | −.239 | .203 | −.024 | .230 | R & D capabilities |
| 4–1 | Size of R & D office | −.120 | .700 | .098 | −.123 | .090 | −.050 | .107 | −.007 | −.007 | −.335 | R & D capabilities |
| 4–2 | Expertise of R & D employees | −.032 | .693 | .066 | .044 | .231 | .216 | −.113 | −.225 | .098 | −.023 | R & D capabilities |
| 4–3 | Public certification records (Legitimacy) | .060 | .618 | .082 | .418 | −.306 | −.002 | .289 | .202 | −.017 | .036 | R & D capabilities |
| 3–1 | Strategic compatibility of a technology | .125 | −.179 | −.627 | .063 | −.076 | −.062 | .061 | .243 | .029 | −.120 | R & D risks |
| 6–4 | Stability of market demand | .195 | −.067 | .728 | .080 | −.157 | .051 | −.146 | .229 | .132 | −.095 | R & D risks |
| 6–2 | Potential market size | −.010 | −.011 | .089 | .790 | .273 | −.152 | .004 | .022 | −.022 | −.008 | Market potential |
| 6–3 | Potential market growth | −.223 | .007 | −.266 | .544 | −.127 | .061 | −.225 | −.265 | .326 | −.104 | Market potential |
| 5–1 | Manager experience in similar fields | −.111 | .004 | .037 | .153 | .808 | −.065 | .033 | −.010 | .085 | −.107 | Management competitiveness |
| 5–2 | Commercialization capability of managers | .151 | .004 | −.253 | .182 | .715 | .187 | .077 | −.059 | −.102 | .239 | Management competitiveness |
| 5–3 | Career of managers | −.060 | .189 | .248 | −.303 | .556 | −.068 | −.061 | .119 | −.046 | −.253 | Management competitiveness |
| 8–1 | Strategic compatibility of commercialization | .075 | −.151 | .010 | .228 | .020 | −.806 | .189 | .218 | .016 | −.006 | Commercialization feasibility |
| 8–2 | Validity of commercialization plan | −.086 | .030 | −.112 | −.062 | −.027 | −.823 | −.149 | −.256 | −.059 | .004 | Commercialization feasibility |
| 1–2 | Technological differentiation | .397 | −.115 | −.170 | .351 | −.064 | .048 | −.443 | .271 | −.013 | −.155 | Market competitiveness |
| 6–1 | Degree of market competition | −.115 | .024 | .050 | −.105 | −.100 | −.310 | −.558 | .223 | .237 | .132 | Market competitiveness |
| 6–5 | Barriers to market entry | .019 | −.040 | .254 | .123 | .052 | .014 | −.796 | .107 | −.136 | .167 | Market competitiveness |
| 6–6 | Expected market share | .121 | .024 | −.088 | −.077 | −.078 | .031 | −.764 | −.177 | .084 | −.200 | Market competitiveness |
| 2–3 | Ease of production | −.084 | .015 | −.006 | −.014 | −.004 | .051 | −.035 | .878 | .010 | .037 | Production risks |
| 2–4 | Availability of alternative technology | .506 | −.016 | .035 | −.064 | .130 | −.233 | .088 | −.131 | .537 | .115 | Economic effect |
| 7–1 | Expected return on investment | .068 | .147 | −.106 | .017 | .039 | .020 | −.077 | .124 | .601 | −.123 | Economic effect |
| 7–2 | Expected economic ripple effect | −.171 | −.116 | .262 | .060 | −.108 | .107 | .068 | −.065 | .699 | .172 | Economic effect |
| 4–4 | Holding of IPRs | .124 | .159 | .009 | .124 | −.010 | −.025 | −.045 | −.095 | −.099 | −.826 | IP capabilities |
| 1–3 | Lifecycle position of technology | .107 | .215 | .069 | .196 | −.081 | −.237 | −.167 | −.161 | −.150 | .465 | IP capabilities |
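A loading table of this kind can be generated directly from the raw evaluation scores. The sketch below is a minimal illustration of the concreteness test, assuming the 291 evaluations are held in a pandas DataFrame named scores with one column per indicator; the use of scikit-learn's FactorAnalysis with varimax rotation, the helper name dominant_components, and the file name in the usage comment are illustrative assumptions rather than the study's exact procedure.

```python
# Minimal sketch of the concreteness check: fit a rotated factor model and
# report the component on which each indicator loads most strongly.
import pandas as pd
from sklearn.decomposition import FactorAnalysis

def dominant_components(scores: pd.DataFrame, n_factors: int = 10) -> pd.DataFrame:
    """Return the rotated loadings plus each indicator's strongest component."""
    fa = FactorAnalysis(n_components=n_factors, rotation="varimax", random_state=0)
    fa.fit(scores.values)
    # components_ has shape (n_factors, n_indicators); transpose so rows are indicators.
    loadings = pd.DataFrame(
        fa.components_.T,
        index=scores.columns,
        columns=[f"component_{i + 1}" for i in range(n_factors)],
    )
    # The component with the largest absolute loading is taken as the criterion
    # the indicator actually measures (cf. the last column of Table 5).
    loadings["dominant"] = loadings.abs().idxmax(axis=1)
    return loadings

# Example usage (hypothetical file name):
# scores = pd.read_csv("evaluation_scores.csv")
# print(dominant_components(scores).round(3))
```

Comparing the "dominant" column with the criteria to which the indicators were originally assigned gives the planned-versus-actual comparison that is summarized in the final column of Table 5.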

5. Discussion

5.1. Theoretical implications

This is one of the first attempts to test the validity of technology evaluation models and ultimately improve them. The findings of the analysis indicate that the validity concept developed in the measurement field applies well to the technology evaluation field, as technology evaluation is a process of measuring the value of technology. The three propositions proposed in this study to test the validity of a technology evaluation model can be linked to types of validity in measurement. The first proposition—the coherence of evaluation results with the evaluation purpose—is in line with criterion validity, in particular predictive validity. The third proposition—the concreteness of the evaluation model—is related to construct validity. On the other hand, the second proposition—the appropriateness of the evaluation methods—is peculiar to technology evaluation, although its basic concept comes from the "ontology" aspect of validity theory. Unlike measurement, which simply assigns a number to an object based on an agreed standard, evaluation entails a process of assessing the merit, worth, and significance of an object based on a set of standards (criteria). This means that the criteria used for evaluation should contribute to distinguishing an object with more value from one with less value.

It should also be noted that the suggested approach focuses on continuously improving existing evaluation models. In principle, there are two ways to improve technology evaluation models. One involves radical improvement and attempts to develop a completely different model (e.g., from a scoring model to an option model). The other involves minor improvement and pursues modifications to the existing model (e.g., changes to an indicator in a scoring model). Between these two, the suggested approach will be more useful for the latter purpose than the former. Under this condition, to examine the effectiveness of ex-post efforts through comparative analysis, technology evaluation results should be obtained using both the initial model (control group) and the revised model (experimental group). Given the time gap between the technology evaluation and the performance measurement, we could not undertake such a comparative analysis; instead, we adopted a logical approach to investigate the usability of a technology evaluation model and provide an opportunity to improve the model, based on the three aforementioned propositions.

5.2. Policy and practical implications

Among the three propositions, we find the first—the coherence of evaluation results with the evaluation purpose—to be the most significant for both practice and policy making, as it forms the basis for the other two. Therefore, the purpose of technology evaluation needs to be clearly defined before the evaluation model is developed or improved. Such purposes can be differentiated along the R & D funnel. This issue is also raised by Cooper (1993), who suggests the use of the well-known Stage–Gate R & D process model. This is a step-by-step mechanism that ranges from the idea evaluation stage to the product support stage; within it, different evaluation purposes are applied at each gate to initiate the next stage of the process.

Based on the R & D funnel, organizations may initially wish to discover technological potential in the earlier R & D stages, in order to construct future plans. This stage would primarily focus on evaluating a technology's promising aspects, so factors such as originality, innovativeness, urgency, and technological difficulties would be considered significant (e.g., Cho and Lee, 2013; Shen et al., 2010). Second, organizations should deliberately select and conduct a proper R & D project from among a number of R & D candidates. An organization at this stage might look to identify the most remarkable R & D candidate by weighing expected project benefits against required project investments. Therefore, primary considerations might include such resources as cost, time, and labor; the probability of commercial success; patentability; research return; and risk (e.g., Benson et al., 1993; Wang et al., 2010). Third, estimating the actual value of developed technology that derives from R & D projects is another crucial task, whether an organization wants to use that technology directly or transfer it. This technology valuation can be considered similar to the evaluation used to discover promising technologies, but here, more weight is put on factors related to current technological availability; these factors include the scope of application, adaptability, degree of completeness, and ease of use (e.g., Chiu and Chen, 2007; Yu and Hang, 2011; Yu and Lee, 2013).

In addition to these three purposes, there may be a fourth that cuts across the R & D funnel. An organization's technological capabilities strongly affect its R & D-related activities: securing a higher level of technological capability, for example, can help ensure more effective and efficient R & D activities. Thus, an organization's technological resources, R & D infrastructure, R & D climate, or even technology commercialization capabilities are given significant consideration in this evaluation context, rather than the characteristics of a particular technology or R & D project. The evaluation targets here are organizational capabilities as a collection of technological assets, and the potential for them to be used and advanced. Relevant studies have investigated assessments of organizations' technological capabilities (Cheng and Lin, 2012; Kim et al., 2011; Mohammadi et al., 2017; Sobanke et al., 2014; Van Wyk, 2010).

The case addressed in this study aligns with the second purpose: a government agency wants to choose R & D projects that are worthy of being supported by government subsidies. As a number of evaluation models exist (as discussed in Section 2.1), clarifying the evaluation purpose will greatly assist in understanding the context of technology evaluation, and it should therefore be a precondition to testing the validity of an evaluation model.

We also found that the guidelines suggested in this study were quite useful in practice for improving a technology evaluation model. The model used in the case study had been in use for several years and had been improved continuously based on expert opinions. Nevertheless, there remained ample room for further improvement when the approach suggested in this study was applied to test its validity. By taking a data-driven approach, objective rationales for improvement plans could be obtained. In particular, the appropriateness of the evaluation methods can be tested and improved relatively easily when a data-driven approach is taken, while still generating meaningful results. Indeed, a number of technology evaluation models are used as part of policy-making or strategy-planning processes, yet few of them are monitored to check their validity. Furthermore, the purpose of evaluation or the external environment may change as time goes by. For instance, in our case, the target of later evaluations moved towards technologies at earlier stages of their life cycle. Therefore, a validity test for the evaluation model is essential to ensure its effective use as a sustainable tool for distinguishing valuable technologies from the others.

5.3. Methodological considerations

Although our approach is basically sound, several issues may arise in relation to our test methods; these need to be discussed in greater detail to explain the rationales behind the development of our approach. First, with respect to the coherence of the evaluation results with the evaluation purpose (causality), we actually needed to test causalities between the evaluation results and what the evaluation intended to measure (in our case, technology performance). A t-test, however, cannot be used to investigate such causalities. In this case, the purpose of the evaluation model is to predict technology performance, expressed as a binary value (success versus failure). Moreover, all indicators in the model were chosen carefully from existing studies that identify factors affecting technology performance. This justified the use of a t-test as a causality test, but a simple regression analysis might be better for testing causality in other contexts.

Second, to check the appropriateness of the evaluation methods (ontology), this study used two descriptive statistics (skewness and kurtosis) with an acceptable range of −2 to +2, and applied a t-test at the .05 significance level for the coherence of the evaluation results with the evaluation purpose (causality). However, there is no strict rule for determining a significance level, since the ultimate goal of this study is to improve the evaluation model and not to test the validity of the model itself. No single model is best in all cases, given the wide variety of evaluation contexts. The rule of thumb for testing the normality of a distribution using skewness and kurtosis is a range from −2 to +2, but this can be applied more generously or more strictly, in line with the evaluation model used. For example, an evaluation model used for screening can rely on less precise measurements than one used for investment decisions. Similarly, an evaluation model for early-stage technologies may require less precise measurements than one for later-stage technologies. Regarding the t-test, the most frequently used criterion is .05, but considering the characteristics of the evaluation model, it can be changed to .1 (more generous) or .01 (more strict). A stricter criterion requires the difference between highly evaluated technologies and the others to be more pronounced.

Finally, no absolute criterion value is proposed for determining whether the evaluation model is also valid from the perspective of concreteness (reference). Of course, an index can be designed to measure the degree of discrepancy between the structure of the planned model and that of the actual model. For this purpose, the value 0 is given to an indicator that is assigned to the correct criterion, and 1 to the others; the obtained values are summed and then divided by the number of indicators. The greater the index value thus obtained, the greater the discrepancy. In this case, a value of .385 was obtained, which means that roughly four out of every ten indicators were not assigned to the correct criteria. Again, just how strict the criteria should be depends on the user, but this index can help summarize the results in a concise way.
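For illustration, a minimal sketch of how the appropriateness screening and this discrepancy index could be computed is given below. The DataFrame of indicator scores, the function names, the default bounds, and the example mapping are assumptions made for the sketch, not the study's actual code; the thresholds can be tightened or relaxed as discussed above.

```python
# Sketch of the appropriateness screening (skewness/kurtosis bounds) and of
# the concreteness discrepancy index (share of misassigned indicators).
import pandas as pd
from scipy import stats

def appropriateness_flags(scores: pd.DataFrame, low: float = -2.0, high: float = 2.0) -> pd.DataFrame:
    """Flag indicators whose score distributions fall outside the chosen bounds."""
    summary = pd.DataFrame({
        "skewness": scores.apply(stats.skew),
        "kurtosis": scores.apply(stats.kurtosis),  # excess kurtosis (normal distribution = 0)
    })
    summary["acceptable"] = summary["skewness"].between(low, high) & summary["kurtosis"].between(low, high)
    return summary

def discrepancy_index(planned: dict, actual: dict) -> float:
    """Share of indicators whose actual (dominant) criterion differs from the planned one."""
    mismatches = sum(planned[key] != actual[key] for key in planned)
    return mismatches / len(planned)

# Example usage with two hypothetical indicators:
# planned = {"5-1": "Management competitiveness", "2-4": "Production risks"}
# actual  = {"5-1": "Management competitiveness", "2-4": "Economic effect"}
# discrepancy_index(planned, actual)  # -> 0.5
```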

6. Conclusions

This study proposes a systematic approach to examining the validity of technology evaluation models and to providing guidelines by which to improve them. For this purpose, we reviewed existing studies on both technology evaluation and validity, proposed a framework based on the results of that review, and applied the framework to a single in-depth case (i.e., a technology evaluation model used by a South Korean government agency). Accordingly, we developed a strategy by which the agency model could be improved, and discussed relevant topics.

The contribution of this study, based on its case study results and discussion, can be described as follows. First, this study can be considered an early effort to connect validity theories and evaluation practices within the technology evaluation context; few attempts have been made to address these urgent and practical needs. In particular, this study not only applied analytics to test the validity of a technology evaluation model, but also integrated contemporary debates regarding validity and the nature of technology evaluation. On this basis, we suggested a theoretically solid framework for use in actual practice. Second (and consequently), the framework suggested in this study, as well as its propositions, can be used effectively in practice. The strategy derived from this research can serve as a useful guideline for organizations that experience similar problems. Third, this study provided an opportunity for suggesting future investigations and highlighted several topics for discussion. As such, it promotes and extends the research domains of the technology evaluation context towards ex-post efforts, namely, the implementation and improvement of technology evaluation models.

Despite its contributions to both industry and academia, this study does have several limitations. First, although the use of a single case allows researchers to investigate phenomena in depth and thus derive rich descriptions and understanding, there may be problems with generalizability; the validity concept was applied only to a single case and its usability tested in a limited context. Further studies are needed to test its usability in other contexts. In a similar vein, among the various technology evaluation models, this study considered only a scoring model. Different evaluation models will demand different approaches to testing their validity; this will be addressed as a future research topic. Finally, this study helped develop a strategy for improving the model under consideration, which is considered a fruitful step towards improvement. However, the results of the actual improvement could not be observed, as testing the validity of the improved model would require data collection over several more years. Such a validity test is beyond the scope of this study, but further activities of this kind would help ensure the validity of the suggested approach. Future research will address these issues.

Acknowledgement This work was supported by the Korea Institute of Science and Technology Information (KISTI) and Ajou University.

Appendix A: Mean score differences between successful and failed groups

| Variable | Group | No. | Mean | Standard deviation |
|----------|-------|-----|------|--------------------|
| Willingness to keep up the project | Successful | 73 | 82.467 | 4.734 |
| | Failed | 39 | 80.087 | 5.782 |
| Overall satisfaction with results of the project | Successful | 75 | 82.479 | 4.671 |
| | Failed | 37 | 79.935 | 5.899 |
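The summary statistics above are sufficient to re-run the coherence t-test without access to the raw data. The sketch below does so for the "willingness to keep up the project" variable using SciPy; the choice of Welch's unequal-variance variant is an assumption made for illustration, as the study does not state whether variances were pooled.

```python
# Re-running the coherence t-test from the summary statistics in Appendix A
# (willingness to keep up the project, successful vs. failed groups).
from scipy.stats import ttest_ind_from_stats

t_stat, p_value = ttest_ind_from_stats(
    mean1=82.467, std1=4.734, nobs1=73,   # successful group
    mean2=80.087, std2=5.782, nobs2=39,   # failed group
    equal_var=False,                      # Welch's variant; set True for pooled variances
)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # p falls below .05, consistent with the coherence result
```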

References Abbassi, M., Ashrafi, M., Tashnizi, E.S., 2014. Selecting balanced portfolios of R & D projects with interdependencies: a cross-entropy based methodology. Technovation 34 (1), 54–63. Azzone, G., Manzini, R., 2008. Quick and dirty technology assessment: the case of an Italian Research Centre. Technol. Forecast. Soc. Change 75 (8), 1324–1338. Baden-Fuller, C., Haefliger, S., 2013. Business models and technological innovation. Long Range Plan. 46 (6), 419–426. Benson, B., Sage, A.P., Cook, G., 1993. Emerging technology-evaluation methodology: With application to micro-electromechanical systems. IEEE Trans. Eng. Manag. 40 (2), 114–123. Bloom, B.S., 1956. Taxonomy of Educational Objectives. Vol. 1: Cognitive domain. McKay, New York. Borenstein, D., 1998. Towards a practical method to validate decision support systems. Decis. Support Syst. 23 (3), 227–239. Borsboom, D., Mellenbergh, G.J., van Heerden, J., 2004. The concept of validity. Psychol. Rev. 111 (4), 1061. Bremser, W.G., Barsky, N.P., 2004. Utilizing the balanced scorecard for R & D performance measurement. R & D Manag. 34 (3), 229–238. Calantone, R.J., Cavusgil, S.T., Zhao, Y., 2002. Learning orientation, firm innovation capability, and firm performance. Ind. Mark. Manag. 31 (6), 515–524. Cheng, Y.L., Lin, Y.H., 2012. Performance evaluation of technological innovation capabilities in uncertainty. Procedia Soc. Behav. Sci. 40, 287–314. Chiu, Y.J., Chen, Y.W., 2007. Using AHP in patent valuation. Math. Comput. Model. 46 (7), 1054–1062. Cho, J., Lee, J., 2013. Development of a new technology product evaluation model for assessing commercialization opportunities using Delphi method and fuzzy AHP approach. Expert Syst. Appl. 40 (13), 5314–5330. Cooper, R.G., 1993. Winning at New Products: Accelerating the Process form Idea to Launch. Addison-Wesley, Reading, MA. Cronbach, L.J., 1988. Internal consistency of tests: analyses old and new. Psychometrika 53 (1), 63–70. Cronbach, L.J., Meehl, P.E., 1955. Construct validity in psychological tests. Psychol. Bull. 52 (4), 281–302. Daim, T., Yates, D., Peng, Y., Jimenez, B., 2009. Technology assessment for clean energy technologies: the case of the Pacific Northwest. Technol. Soc. 31 (3), 232–243. Danneels, E., 2004. Disruptive technology reconsidered: a critique and research agenda. J. Prod. Innov. Manag. 21 (4), 246–258. Darke, P., Shanks, G., Broadbent, M., 1998. Successfully completing case study research: combining rigour, relevance and pragmatism. Inf. Syst. J. 8 (4), 273–289. Facey, K., Boivin, A., Gracia, J., Hansen, H.P., Scalzo, A.L., Mossman, J., Single, A., 2010. Patients' perspectives in health technology assessment: a route to robust evidence and fair deliberation. Int. J. Technol. Assess. Health Care 26 (03), 334–340. Field, A., 2009. Discovering Statistics Using SPSS. SAGE, London. Frishammar, J., Hörte, S.Å., 2005. Managing external information in manufacturing firms: the impact on innovation performance. J. Prod. Innov. Manag. 22 (3), 251–266. Gravetter, F., Wallnau, L., 2014. Essentials of Statistics for the Behavioral Sciences, 8th ed. Wadsworth, Belmont, CA. Gruber, M., 2007. Uncovering the value of planning in new venture creation: a process and contingency perspective. J. Bus. Ventur. 22 (6), 782–807. Hair, J.F., Tatham, R.L., Anderson, R.E., Black, W., 1998. Multivariate Data Analysis, 5th ed. Prentice-Hall, London. Hsu, D.W., Shen, Y.C., Yuan, B.J., Chou, C.J., 2015. 
Toward successful commercialization of university technology: performance drivers of university technology transfer in Taiwan. Technol. Forecast. Soc. Change 92, 25–39.

Hsu, Y.L., Lee, C.H., Kreng, V.B., 2010. The application of Fuzzy Delphi Method and Fuzzy AHP in lubricant regenerative technology selection. Expert Syst. Appl. 37 (1), 419–425. IPCC, 2007. Climate Change 2007: Mitigation. Contribution of Working Group III to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge University Press, Cambridge, United Kingdom. Jolly, D.R., 2012. Development of a two-dimensional scale for evaluating technologies in high-tech companies: an empirical examination. J. Eng. Technol. Manag. 29 (2), 307–329. Kalbar, P.P., Karmakar, S., Asolekar, S.R., 2012. Selection of an appropriate wastewater treatment technology: a scenario-based multiple-attribute decision-making approach. J. Environ. Manag. 113, 158–169. Kane, M.T., 2001. Current concerns in validity theory. J. Educ. Meas. 38 (4), 319–342. Kang, K.N., Park, H., 2012. Influence of government R & D support and inter-firm collaborations on innovation in Korean biotechnology SMEs. Technovation 32 (1), 68–78. Kaplan, R.S., Norton, D.P., 1996. The Balanced Scorecard. Harvard Business School Press, Boston MA. Kim, L., Dahlman, C.J., 1992. Technology policy for industrialization: an integrative framework and Korea's experience. Res. Policy 21 (5), 437–452. Kim, S.K., Lee, B.G., Park, B.S., Oh, K.S., 2011. The effect of R & D, technology commercialization capabilities and innovation performance. Technol. Econ. Dev. Econ. 17 (4), 563–578. Lee, M., Son, B., Om, K., 1996. Evaluation of national R & D projects in Korea. Res. Policy 25 (5), 805–818. Lee, S., Kim, W., Kim, Y.M., Lee, H.Y., Oh, K.J., 2014. The prioritization and verification of IT emerging technologies using an analytic hierarchy process and cluster analysis. Technol. Forecast. Soc. Change 87, 292–304. Linstone, H.A., 2011. Three eras of technology foresight. Technovation 31 (2), 69–76. Loch, C.H., Tapper, U.A., 2002. Implementing a strategy-driven performance measurement system for an applied research group. J. Prod. Innov. Manag. 19 (3), 185–198. Messick, S., 1986. The Once and Future Issues of Validity: Assessing the Meaning and Consequences of Measurement (ETS Research Report No. 86-30). Educational Testing Service, Princeton, NJ. Mohammadi, M., Elyasi, M., Mohseni Kiasari, M., 2017. Technology assessment: technological capability assessment for automotive parts manufacturers. In: Daim, T.U. (Ed.), Managing Technological Innovation: Tools and Methods. World Scientific Series in R & D Management, New Jersey, pp. 103–127. OECD, 2005. Oslo Manual: Proposed Guidelines for Collecting and Interpreting Technological Innovation Data, 3rd ed. OECD Publishing, Paris. OECD, 2016. Main Science and Technology Indicators. Available at: 〈http://www.oecd.org/sti/msti.htm〉. June 2016. Ordoobadi, S.M., 2008. Fuzzy logic and evaluation of advanced technologies. Ind. Manag. Data Syst. 108 (7), 928–946. Park, M., Lee, H.S., Kwon, S., 2010. Construction knowledge evaluation using expert index. J. Civ. Eng. Manag. 16 (3), 401–411. Pavia, T.M., 1990. Product growth strategies in young high-technology firms. J. Prod. Innov. Manag. 7 (4), 297–309. Perkmann, M., Neely, A., Walsh, K., 2011. How should firms evaluate success in university–industry alliances? A performance measurement system. R & D Manag. 41 (2), 202–216. Rocha, A., Tereso, A., Cunha, J., Ferreira, P., 2014. Investments analysis and decision making: Valuing R & D project portfolios using the PROV exponential decision method. Tékhne 12 (1), 48–59. Rooney, D., 1997. A contextualising, socio-technical definition of technology: learning from Ancient Greece and Foucault. Prometheus 15 (3), 399–407. Roure, J.B., Keeley, R.H., 1990. Predictors of success in new technology based ventures. J. Bus. Ventur. 5 (4), 201–220. Santiago, L.P., Martinelli, M., Eloi-Santos, D.T., Hortac, L.H., 2015. A framework for assessing a portfolio of technologies for licensing out. Technol. Forecast. Soc. Change 99, 242–251.

Shen, Y.C., Chang, S.H., Lin, G.T., Yu, H.C., 2010. A hybrid selection model for emerging technology. Technol. Forecast. Soc. Change 77 (1), 151–166. Shepard, L.A., 1993. Evaluating test validity. Rev. Res. Educ. 19, 405–450. Sobanke, V., Adegbite, S., Ilori, M., Egbetokun, A., 2014. Determinants of technological capability of firms in a developing country. Procedia Eng. 69, 991–1000. Sohn, S.Y., Moon, T.H., Kim, S., 2005. Improved technology scoring model for credit guarantee fund. Expert Syst. Appl. 28 (2), 327–331. Souder, W.E., 1972. A scoring methodology for assessing the suitability of management science models. Manag. Sci. 18 (10), B-526. Stevens, S.S., 1968. Measurement, statistics, and the schemapiric view. Science 161 (3844), 849–856. Trochim, W.M., Donnelly, J.P., 2006. The Research Methods Knowledge Base, 3rd ed. Atomic Dog, Cincinnati, OH. Van Wyk, R.J., 2010. Technology assessment for portfolio managers. Technovation 30 (4), 223–228. Van Zee, R.D., Spinler, S., 2014. Real option valuation of public sector R & D investments with a down-and-out barrier option. Technovation 34 (8), 477–484.

Walsham, G., 1995. Interpretive case studies in IS research: nature and method. Eur. J. Inf. Syst. 4 (2), 74–81. Wang, J., Lin, W., Huang, Y.H., 2010. A performance-oriented risk management framework for innovative R & D projects. Technovation 30 (11), 601–611. Wang, J., Wang, C.Y., Wu, C.Y., 2015. A real options framework for R & D planning in technology-based firms. J. Eng. Technol. Manag. 35, 93–114. Wortley, S., Tong, A., Lancsar, E., Salkeld, G., Howard, K., 2015. Public preferences for engagement in Health Technology Assessment decision-making: protocol of a mixed methods study. BMC Med. Inform. Decis. Mak. 15 (1), 52. Yin, R.K., 2013. Case Study Research: Design and Methods. Sage Publications, London. Yu, D., Hang, C.C., 2011. Creating technology candidates for disruptive innovation: generally applicable R & D strategies. Technovation 31 (8), 401–410. Yu, P., Lee, J.H., 2013. A hybrid approach using two-level SOM and combined AHP rating and AHP/DEA-AR method for selecting optimal promising emerging technology. Expert Syst. Appl. 40 (1), 300–314. Zhou, W., Hu, H., Shi, X., 2015. Does organizational learning lead to higher firm performance? An investigation of Chinese listing companies. Learn. Organ. 22 (5), 271–288.
