A novel machine learning approach for evaluation of public policies: An application in relation to the performance of university researchers


Technological Forecasting & Social Change 149 (2019) 119756

Contents lists available at ScienceDirect

Technological Forecasting & Social Change journal homepage: www.elsevier.com/locate/techfore


María Teresa Ballestar a,⁎, Luis Miguel Doncel b, Jorge Sainz c, Arturo Ortigosa-Blanch d

a ESIC Business & Marketing School, Spain
b Universidad Rey Juan Carlos, Spain
c University of Bath, UK and Universidad Rey Juan Carlos, Spain
d ESIC Business & Marketing School, Spain

Article info

Keywords: Research evaluation; Machine learning; Longitudinal clustering; Incentive-based policies

Abstract

Research has become the main reference point for academic life in modern universities. Research incentives have been a controversial issue because of the difficulty of identifying who the main beneficiaries are and what the long-term effects are. Still, new policies including financial incentives have been adopted to increase research output at all possible levels. Little literature has been devoted to the response to those incentives. To bridge this gap, we analyze data from a six-year program developed in Madrid (Spain). Instead of using a traditional econometric approach, we design a machine learning multilevel model to discover on whom, when, and for how long those policies have an effect. The empirical model consists of an automated nested longitudinal clustering (ANLC) performed in two stages. First, it performs a stratification of academics, and second, it performs a longitudinal segmentation of each group. The second stage considers the researchers' sociodemographic and academic information and the evolution of their performance over time, in the form of the annual percentage variation of their marks over the period. The new methodology, whose robustness is tested with a multilayer perceptron artificial neural network with a back-propagation learning algorithm, shows that tenure track researchers respond better to incentives than tenured researchers, and that gender plays an important role in academia. These findings are relevant to administrations and universities seeking to understand the productivity of academics working under long-term incentive-based programs, together with the drawbacks and inequalities involved, in order to maximize the generation of knowledge.

1. Introduction

The design of incentives for academics and the evaluation of their performance is a pressing issue for higher education institutions and funding agencies. Over the last decades, academia has witnessed a shift in the balance between teaching and research in favor of the latter. This process has been strongly promoted by the growing internationalization of students, the parallel emergence of international rankings (Shanghai, Times Higher Education, European Commission, etc.) intended to reduce informational asymmetries, and the global competition for research funds from private and public organizations (Hicks, 2012). The logic behind this behavior is the belief, difficult to measure, that good research is related to good teaching. This is driving professors to actively promote their accomplishments and their visibility in the search for funding from financing bodies and from their own institutions (Auranen and Nieminen, 2010). The emergence of knowledge networks rapidly translates into a differentiation among researchers. Those who are in top-tier universities play in a different league and have average or below-average impact even if they have higher productivity (Taylor, 2011). To increase the dominance of their own researchers, different countries have built incentive systems meant to promote the quality of their institutions and researchers, aiming to increase their global impact. Countries learn from the policy experiments of their peers (Dobbin et al., 2007; Easterly and Levine, 1997). Two of the best known are the 2014 Research Excellence Framework (REF) and the 2008 Research Assessment Exercise (RAE) in the United Kingdom. It is difficult to overstate their relevance: “with 0.88% of global population, 3.2% of global R&D expenditure, and 4.1% of global researchers, accounting for 9.5% of research downloads, 11.6% of citations, and 15.9% of the world's most

⁎ Corresponding author.
E-mail addresses: [email protected] (M.T. Ballestar), [email protected] (L.M. Doncel), [email protected] (J. Sainz), [email protected] (A. Ortigosa-Blanch).

https://doi.org/10.1016/j.techfore.2019.119756
Received 12 June 2019; Received in revised form 2 September 2019; Accepted 19 September 2019
0040-1625/ © 2019 Elsevier Inc. All rights reserved.


M.T. Ballestar, et al.

can be used in several policy issues with large social and economic implications that go beyond policy management and also have a theoretical impact (Kleinberg et al., 2015; Kleinberg et al., 2018). While these methods clearly have many advantages, Athey (2017) points out that ML-driven policies may deprive stakeholders of knowledge about how and why policies are made, raising issues such as transparency, interpretability, fairness, and discrimination. But, as she concludes, the need to explain real-world policy issues through big data will bring positive methodological advances and the possibility of implementing them in other policies based on data science. Our methods of choice, clustering and artificial neural networks, are recommended by Mjolsness and DeCoste (2001) in their seminal paper in Science on machine learning methods, given their ability to “… improve their performance of a task on the basis of their own previous experience.” They have been widely used in areas such as ecology (Kattge et al., 2011), engineering (LaConte et al., 2005), employment (Cvecic and Sokolic, 2018; Wang, 2019) and business (Ballestar et al., 2018a, 2018b, 2019), and they follow a validation scheme similar to the one designed by Pers et al. (2009). In higher education, these methods have mainly been applied to student and teacher evaluation (Sin and Muthu, 2015). Although this analysis, like any knowledge structure, depends heavily on its initial structure, some predictions can be established. This is the case with academic tenure, a traditional topic in higher education research (Musselin, 2005; Chait, 2009; Epstein and Fischer, 2017), as it has a lasting effect on how academics behave, especially in Continental Europe, where contracts are in many cases linked to public-sector job security.
Our analysis provides new results on the effect of different work contracts, which is relevant for designing new career paths, as European countries have already done following the recommendations of the European Commission and the OECD (Wang et al., 2018). The analysis of the effects of evaluation on research strategies is a much less-travelled path. None of the previous contributions we reviewed has a comparable number of evaluations with which to study the long-term effect of incentives on the behavior of academics. We test the performance of the program over time and how the characteristics of the researchers correspond to different responses to the incentives. To do so, we developed an innovative machine learning method that learns from a longitudinal dataset processed in two different stages. Our contribution is twofold. First, we develop a robust ML method, and second, we use it to evaluate the effects of economic incentives on research in long-term public policies. We focus on two main hypotheses that are relevant for policy design. First, employment status, in terms of contractual relationship, is key to how incentives affect academics (H1) and, second, the effects of this type of policy dilute over time (H2). Both issues have already been discussed at some length in the analysis of the British system (Hobbs and Roberts, 2016), but the scarcity of data covering a long period and a large number of individuals makes our results highly relevant to these discussions. For robustness, we test the results of our ML model based on an Automated Nested Longitudinal Clustering (ANLC) method by using a completely new ML model based on a multilayer perceptron (MLP) artificial neural network (ANN) with a back-propagation learning algorithm, as designed in Ballestar et al. (2019), which also allows the results of future policies to be forecast.
Both hypotheses are documented in the literature on human resources (see the reviews by Jenkins et al., 1998, or Wright and Boswell, 2002) and in the experimental approach of Camerer and Hogarth (1999). The use of incentives has been the subject of a heated debate on its effects on academic integrity, reproducibility and focus (see, for example, Edwards and Roy, 2017; Chambers et al., 2015; Lakens et al., 2018). To our knowledge, none of these studies has used a dynamic ML approach such as ours, suited to a dynamic environment and allowing real-time traceability and reporting.

highly-cited articles” (Hobbs and Roberts, 2016). Some of the results of both exercises were as expected from the first theoretical models in the literature: an increase in the hiring of quality academics, pressure for results that differ widely across disciplines and universities, and an improvement in the international rankings and visibility of UK universities, which have become international competitors (Taylor, 2011; Tietze, 2018). The competition between departments results in a “hunting season” for top-tier academics across the globe as REF and RAE periods approach. There were also unexpected side effects. There is an acute case of short-termism in research: much of the effort was concentrated in the periods close to the evaluations, so academics focused more on the volume of output than on its quality, which affected its impact. Long-term projects are sidelined, even when their expected output is large, because of the acute need for funding. Also, although it was not generalized, there has been a surge in cases of malpractice in academia that has affected the credibility of the whole profession and increased the call for higher ethical standards for researchers, journals and the entire system (Pontille and Torny, 2010; Rothstein and Uslaner, 2005). To further analyze these issues, which are similar to those in other public policy assessments, we take advantage of a newly available dataset from an incentive program implemented in the Madrid Region from 2005 to 2010 to boost the productivity of its public universities. Instead of using traditional econometrics, we developed a data science method consisting of an Automated Nested Longitudinal Clustering (ANLC) performed in two stages, which allows us to identify the main results of the program and its dynamic behavior and, through machine learning, facilitates the application of the method to the evaluation of public policies.
Our results show that long-term incentive programs are more effective with tenure track researchers, who benefit from them in two different ways at the same time: gaining tenure and obtaining the economic rewards. This finding holds across areas of knowledge and universities. We also see that the effects of the program fade over time and that gender plays an important role in academia. The comparison of performance between men and women depends on their contractual relationship with the university: men reach higher marks in the tenured researchers' group, and women in the tenure track researchers' group, whose members are significantly younger. In the next sections we survey the literature on Artificial Intelligence (AI), specifically Machine Learning, and on public policy evaluation, and then analyze our dataset using a novel multilevel machine learning approach that applies stratification in the first stage and an unsupervised machine learning method in the second stage. We then validate the robustness of this novel method, analyze the results and propose further developments in the conclusions section. The validation of the robustness of the ANLC also represents an innovation, as we develop a completely new supervised machine learning method, a multilayer perceptron (MLP) artificial neural network (ANN) with a back-propagation learning algorithm, to test the results.

2. Theoretical framework

The use of AI in public policy design is a relatively novel area of study. One of the AI instruments that is becoming widely used is Machine Learning (ML). Athey and Imbens (2017) recently addressed the advantages of these methods, pointing out that “Machine learning methods provide important new tools to improve estimation of causal effects”, which can reduce the “…reliance of these estimates on modeling assumptions…”, enhancing the “…credibility of policy analysis”. Chalfin et al.
(2016) point out some additional advantages: when looking for accuracy in out-of-sample predictions, prediction errors are a function of variance as well as bias, and ML allows for an explicit trade-off between the two, which traditional econometric methods do not. These new ML methods are able to generate better predictions that


3. Data collection

We analyze a monetary incentive program developed in the Region of Madrid (Spain) from 2005 to 2010, when it was terminated owing to budgetary constraints related to the financial crisis. Moreover, after 2010 the national government imposed hiring restrictions on the universities, which would have made it impossible to track the diverse effects. The participants were the academics of the six public universities of Madrid, around 25,000 individuals, some of whom hold high positions in different fields of knowledge in international rankings (Aguion et al., 2010). We included in the sample only the individuals who participated during the whole six years of the program. This allows us to properly analyze the evolution of their performance over time with our ML method, as most clustering methods do not allow for missing values (Yu et al., 2014). We applied the marginalization method, a well-known method for handling missing data without biasing the sample. Bias is one of the problems of other methods such as imputation, in which the estimated missing values are inherently less reliable than the observed data. Marginalization is a better solution, as it does not create any new data values (Wagstaff, 2004). Hence, researchers who did not participate during the whole period were eliminated, leaving a sample of 5861. The recommended sample size for this method is 2^k cases (k = number of variables) and preferably 5 × 2^k cases (Formann, 1984; Dolnicar, 2002), which here means a minimum of 640 individuals, far fewer than the sample size in this research. As the sample is large enough, we can model it without imputing missing data, thereby avoiding the bias that imputation methods may introduce.

Before selecting the individuals in the sample, we carefully checked, year by year, that the characteristics of the non-selected researchers were fully represented in the sample by checking the following variables. Researchers who joined the program were of two types depending on their contractual relationship with the university: tenured (permanent, civil servant contract; 4279 researchers, 73% of the sample) or tenure track (temporary, or permanent but non-civil-servant, contracts; 1582 researchers, 26.99% of the sample). The program's criteria for evaluating the productivity of the researchers vary according to the type of contractual relationship with the university. Each year, researchers who took part in the program received a total mark for their academic productivity, made up of three different aspects of their performance. Projects and quinquennium are common criteria for both tenured and tenure track researchers, but the third criterion, sexennium, applies only to tenured researchers, while accreditation applies to tenure track researchers. Projects are research projects within a delimited period of time whose aim is to increase knowledge on a specific topic. These projects have to provide income to the institution where the researcher works, which means they are financed by public funds external to the university. Quinquennium is a positive assessment of the researcher's teaching activity over five years, which may be full-time or part-time. Sexennium is a positive assessment of the research activity of tenured researchers over a six-year period. Non-tenured contracted PhD professors need to obtain a favorable report, called accreditation, from the National Agency for Quality Assessment and Accreditation of Spain (ANECA). This accreditation is needed to apply for entry into the civil service university teaching bodies and become a tenured researcher.

The overall mark a researcher can obtain each year ranges from zero to ten, and it is used to distribute the economic incentives among researchers up to the total budget available for the year, which in 2010 was €15,000,000. The overall annual mark is calculated as follows:

a) Tenured researchers can obtain a maximum of 5.7 points from sexennium, 2.8 points from projects and 1.5 points from quinquennium. Researchers in the sample reach an average of 3.41 points for sexennium, 2.01 points for projects and 0.98 points for quinquennium. The average total mark is 6.45, with a growth of 9.36% over the six years.
b) Tenure track researchers can obtain a maximum of 4.1 points from accreditation, 4.4 points from projects and 1.5 points from quinquennium. Researchers in the sample reach an average of 1.27 points for accreditation, 3.11 points for projects and 0.46 points for quinquennium. The average total mark is 5.46, with a growth of 40.8% over the six years.

The data from the six years were aggregated at the researcher level in a single table with 5861 records and 96 variables. The first 12 variables contain the researcher's characteristics, and the remaining 84 were calculated to measure the researcher's performance over the years from a longitudinal perspective. Of these, 14 were relevant for the empirical analysis. The descriptive analysis is detailed in the following sections.

3.1. Researchers' characteristics

The researchers' characteristic variables used in the model were the type of contractual relationship with the university, gender, and area of knowledge.

3.1.1. Type of contractual relationship

Researchers have two types of contractual relationship with the university: civil servants of university teaching bodies providing full-time service, also called tenured researchers, who represent 73.01% of the sample (4279 researchers); and tenure track researchers (some of whom have a permanent relationship with the university but are not civil servants and could in principle be furloughed), who represent 26.99% of the sample (1582 researchers).

3.1.2. Gender distribution

Men accounted for 62.3% of the sample (mean age 52.05 years) and women for 37.7% (mean age 50.12 years). This distribution varies with the type of contractual relationship the researcher has with the university. In the tenured group, men accounted for 64.9% of the sample (mean age 54.00 years) and women for 35.1% (mean age 52.97 years). In the tenure track group, men accounted for 55.3% of the sample (mean age 45.88 years) and women for 44.7% (mean age 44.06 years).

3.1.3. Area of knowledge

Researchers belong to 168 different areas of knowledge, with a high concentration in biology and biomedicine (13.3%), followed by applied economics (3.5%), computer sciences and AI (3.4%) and medicine (2.1%); each of the remaining areas of knowledge has a weight below 2%.

3.2. Researchers' longitudinal performance

The longitudinal performance of the researchers is captured by eleven numerical variables. The evolution of their performance over time is evaluated from two perspectives: the absolute value of the total marks obtained each year within the program, and the annual percentage variation of those marks.

3.2.1. Annual marks

Six variables were calculated with the total marks, obtained as the sum of the individual marks in each evaluated category on a yearly basis (from 2005 to 2010). Total marks for tenured researchers are calculated as the sum of sexennium, projects and quinquennium marks, while total marks for tenure track researchers are calculated as the sum of accreditation, projects and quinquennium marks.
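The data preparation steps described in this section (keeping only researchers observed in all six years, the 2^k / 5 × 2^k sample-size rule, the total annual mark and, for Section 3.2.2, the year-on-year percentage variation) can be sketched in Python as below. This is a minimal illustration under our own assumptions, not the authors' code: the function names, the dictionary layout of the panel, the percent scale of the variations and the handling of a zero baseline are all hypothetical choices.

```python
from typing import Dict, List, Optional

YEARS = list(range(2005, 2011))  # the six evaluations, 2005 to 2010

# Maximum points per criterion, as listed in items a) and b) above.
MAX_POINTS = {
    "tenured": {"sexennium": 5.7, "projects": 2.8, "quinquennium": 1.5},
    "tenure_track": {"accreditation": 4.1, "projects": 4.4, "quinquennium": 1.5},
}


def total_mark(components: Dict[str, float], contract: str) -> float:
    """Annual mark: sum of the three criteria that apply to the contract type."""
    criteria = MAX_POINTS[contract]
    for name, value in components.items():
        if name not in criteria:
            raise ValueError(f"{name!r} is not a criterion for {contract!r}")
        if not 0.0 <= value <= criteria[name]:
            raise ValueError(f"{name!r}={value} is outside [0, {criteria[name]}]")
    # Missing criteria count as zero points (hypothetical convention).
    return sum(components.get(name, 0.0) for name in criteria)


def complete_cases(panel: Dict[str, Dict[int, float]]) -> Dict[str, Dict[int, float]]:
    """Marginalization: keep only researchers with a mark in every program year."""
    return {rid: marks for rid, marks in panel.items() if all(y in marks for y in YEARS)}


def min_sample_size(k: int, factor: int = 5) -> int:
    """Rule of thumb cited above: at least 2**k cases, preferably 5 * 2**k."""
    return factor * 2 ** k


def pct_variations(marks: Dict[int, float]) -> List[Optional[float]]:
    """Five year-on-year percentage variations: 2006 vs 2005, ..., 2010 vs 2009.

    A zero baseline yields None; this is our assumption, as the text does not
    say how zero marks were handled.
    """
    out: List[Optional[float]] = []
    for prev, curr in zip(YEARS, YEARS[1:]):
        base = marks[prev]
        out.append(None if base == 0 else 100.0 * (marks[curr] - base) / base)
    return out
```

With these definitions the maximum attainable total is 10 points for both contract types (5.7 + 2.8 + 1.5 and 4.1 + 4.4 + 1.5), matching the zero-to-ten scale of the program, and min_sample_size(7) returns 640, the minimum quoted above, which is consistent with k = 7 variables.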

3.2.2. Annual percentage variation of marks

Five variables were calculated to measure the annual percentage variation of achieved marks from 2005 to 2010 at the researcher level. The percentage variation makes it possible to classify researchers according to the evolution of their performance over time, independently of their baseline, which could bias the results if absolute values were used instead.

4. Empirical analysis and results

This investigation applies data science methods to public policy analysis. We developed a novel approach, a machine learning multilevel model consisting of an ANLC performed in two stages. The first stage stratifies the academics according to their contractual relationship with the university, a confounding variable. The main characteristic of a confounding variable is that it correlates with both the predictor of interest and the outcome (Anderson et al., 1980; Frank, 2000). We use the confounding variable to perform the stratification and then analyze the resulting groups independently in a second stage (Austin and Brunner, 2004; Austin, 2011). The second stage performs a longitudinal segmentation of each of these two groups, taking into consideration not only the researchers' characteristics, such as sociodemographic and academic information, but also the evolution of their performance over time, in the form of the annual percentage variation of their marks over six years. The aim is to identify the different groups of researchers according to their characteristics and their response to the incentive-based program during the six years it lasted.

4.1. First stage of the multilevel model: stratification

In the first stage of the multilevel model, a stratification method was used to split the researchers' sample into two groups according to their type of contractual relationship with the university. The criteria used to evaluate researchers' performance differ between tenured and tenure track researchers, and these differences are expected to affect both the total marks researchers can achieve and the evolution of their performance. Therefore, the contractual relationship, a categorical variable, acts as a confounding variable, and this stratification stage mitigates its effects when analyzing researchers' longitudinal performance (Cochran, 1968; Anderson et al., 2009). Six one-way ANOVA tests were conducted to confirm significant differences between the two groups in terms of researchers' average annual marks over the six years of the program. The goal of the ANOVA testing was to confirm the suitability of the type of contractual relationship between the researcher and the university as a confounding variable and, therefore, as the variable for performing the stratification (Pourhoseingholi et al., 2012). As a result, the first group is made up of 4279 tenured researchers (73% of the sample) and the second of 1582 tenure track researchers (26.99% of the sample). The ANOVA tests revealed statistically significant differences between the two groups in terms of the average marks obtained during the six years the program lasted. To guarantee the robustness of the findings, another five one-way ANOVA tests were conducted, this time on the annual percentage variation of researchers' achieved marks from 2005 to 2010, confirming that the two groups differ not only in their total marks but also in the evolution of their performance over time. These tests revealed significant differences between the two groups in all five periods of annual percentage variation: all p-values were 0.000 except the one corresponding to the last period, where the p-value was 0.015.

4.2. Second stage of the multilevel model: clustering process

In the second stage of the multilevel model, we apply a two-step cluster analysis to group researchers according to their characteristics and longitudinal performance (Heggeseth et al., 2015) within each of the two groups (tenured and tenure track researchers) identified as significantly different in the first stage. Two-step analysis has several advantages over other clustering methods, such as the ability to handle large datasets with both categorical and continuous variables and to select the number of clusters automatically (Jones and Nagin, 2007). The two-step clustering method gives the best results when all variables are independent, categorical variables follow a multinomial distribution and continuous variables follow a normal distribution (Hox et al., 2017). According to empirical internal testing, the procedure is robust to violations of both the categorical and the continuous assumptions. Moreover, “because cluster analysis does not involve hypothesis testing and calculation of observed significance levels, other than for descriptive follow-up, it's perfectly acceptable to cluster data that may not meet the assumptions for best performance” (Norušis, 2014; Ballestar et al., 2018a). Our two groups of data met the criteria described by Norušis (2014). The first two-step cluster analysis is performed on the tenured researchers' group, with 4279 records, one categorical variable (gender) and five continuous variables (annual percentage variation of achieved marks from 2005 to 2010). The second two-step cluster analysis is performed on the tenure track researchers' group, with 1582 records, two categorical variables (gender and area of knowledge) and five continuous variables (annual percentage variation of achieved marks from 2005 to 2010).

The two-step cluster analysis proceeds as follows. First, the raw data are pre-clustered using the log-likelihood distance as the similarity criterion: in a sequential process, each standardized data record is merged into an existing pre-cluster or starts a new pre-cluster, whichever leads to the largest log-likelihood. Second, the pre-clusters are combined using agglomerative hierarchical clustering under the Schwarz Bayesian criterion (BIC), which yielded three clusters in each of the two groups. We used silhouette validation (Rousseeuw, 1987) to evaluate the consistency of the clustering structure; it measures cohesion between elements within a cluster and separation between clusters. The silhouette coefficient ranges from –1 to 1, where –1 indicates a poor model and 1 an optimal one; values greater than 0.5 indicate good model quality (Kaufman and Rousseeuw, 1990; Ballestar et al., 2018a). The silhouette coefficient was 0.9 for the tenured researchers' group and 0.8 for the tenure track researchers' group, so both models were robust. Predictor importance reflects how much each variable of the model contributes to making a prediction; it does not relate to model accuracy or to whether the prediction is correct. In the tenured researchers' group, the highest importance is held by the gender variable, followed by the annual percentage variation of the researchers' achieved marks for periods 5, 1, 2, 4 and 3. In the tenure track researchers' group, the highest importance is also held by the gender variable, followed by the annual percentage variation of the researchers' achieved marks for periods 1, 2, 3, 5 and 4 and, finally, by the area of knowledge.

4.3. Clustering structure for tenured and tenure track researchers' groups

On the one hand, tenured researchers were grouped into three clusters based on their gender and their annual percentage variation over the achieved marks (Fig. 1). The percentage of the sample in each cluster was as follows: Cluster 1 (63.4%), Cluster 2 (34.4%) and Cluster 3 (2.2%). The smallest cluster (Cluster 3) had 93 researchers, and the largest cluster (Cluster 1) had 2,715 researchers. Cluster profiles appear in Table 3.

On the other hand, tenure track researchers were grouped into three clusters based on their gender, area of knowledge and their annual percentage variation over the achieved marks (Fig. 1). The percentage of the sample in each cluster was as follows: Cluster 1 (52.7%), Cluster 2 (4.2%) and Cluster 3 (43.1%). The smallest cluster (Cluster 2) had 66

Fig. 1. Cluster distribution chart for tenured and tenure track researchers' groups.

Table 1
Researchers' average annual marks from 2005 to 2010: mean (standard deviation).

                     Total portfolio    Tenured researchers    Tenure track researchers
Sample size          5,861              4,279                  1,582
Year 2005            5.62 (3.06)        6.09 (3.04)            4.37 (2.75)
Year 2006            6.39 (3.03)        6.61 (3.04)            5.79 (2.77)
Year 2007            6.51 (3.01)        6.67 (3.03)            6.10 (2.84)
Year 2008            6.52 (2.99)        6.66 (3.01)            6.15 (2.87)
Year 2009            6.19 (2.99)        6.46 (3.02)            5.46 (2.89)
Year 2010            5.88 (2.97)        6.25 (2.99)            4.86 (2.91)
Overall average      6.22 (3.01)        6.49 (3.02)            5.51 (2.84)

Table 2
Annual percentage variation over researchers' achieved marks from 2005 to 2010: mean (standard deviation).

                                     Total portfolio    Tenured researchers    Tenure track researchers
Sample size                          5,861              4,279                  1,582
Variation 1 (2006 vs 2005)           4.52% (0.18)       2.74% (0.12)           11.20% (0.35)
Variation 2 (2007 vs 2006)           5.85% (0.24)       3.71% (0.16)           13.32% (0.46)
Variation 3 (2008 vs 2007)           2.66% (0.12)       1.86% (0.09)           5.21% (0.20)
Variation 4 (2009 vs 2008)           2.00% (0.09)       0.91% (0.04)           5.35% (0.22)
Variation 5 (2010 vs 2009)           0.09% (0.004)      -0.15% (0.01)          0.78% (0.03)
Overall variation (2010 vs 2005)     15.95% (0.37)      9.36% (0.24)           40.75% (0.71)
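Two diagnostics used in Section 4, the one-way ANOVA F statistic that checks whether tenured and tenure track researchers differ (compare the group means in Table 1) and the silhouette coefficient used to validate each clustering, can be sketched in plain Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, distances are Euclidean on continuous variables assumed to be already standardized (the two-step procedure itself uses a log-likelihood distance that also covers categorical variables, which is not reproduced here), singleton clusters score 0 by convention, and p-values are not computed (a library routine such as scipy.stats.f_oneway would normally supply them). Statistical packages may report a simplified silhouette variant, so values from this sketch are not necessarily identical to those quoted in the text.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def one_way_anova(*groups: Sequence[float]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA across the groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n <= k:
        raise ValueError("need at least two non-empty groups and n > k")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    if ss_within == 0:
        return math.inf, df_between, df_within
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


def silhouette(points: List[Sequence[float]], labels: List[int]) -> float:
    """Mean silhouette coefficient (Rousseeuw, 1987) with Euclidean distance."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Mean distance from point i to the members of each cluster (excluding i)
        mean_dist: Dict[int, Optional[float]] = {}
        for c in clusters:
            d = [math.dist(p, q) for j, (q, lab) in enumerate(zip(points, labels))
                 if lab == c and j != i]
            mean_dist[c] = sum(d) / len(d) if d else None
        a = mean_dist[own]                      # cohesion
        if a is None:                           # singleton cluster: s(i) = 0
            scores.append(0.0)
            continue
        b = min(v for c, v in mean_dist.items() if c != own)  # separation
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

For two groups, F equals the square of the pooled two-sample t statistic; a silhouette above 0.5, the threshold cited in Section 4.2, would be read as good cluster quality.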


Table 3 Cluster proﬁles: centroids of continuous variables and frequencies of categorical variables for tenured and tenured track of researchers. Tenured group of researchers gender

Cluster

1 2 3 Combined

Man Frequency

Percent

Women Frequency

Percent

2,715 0 61 2,776

97.80% 0.00% 2.20% 100.00%

0 1,471 32 1,503

0.00% 97.87% 2.13% 100.00%

Centroids

Cluster

1 2 3 Combined

perc_variation_1 Mean Std. Deviation

perc_variation_2 Mean Std. Deviation

perc_variation_3 Mean Std. Deviation

perc_variation_4 Mean Std. Deviation

perc_variation_5 Mean Std. Deviation

2.96 1.91 18.68 2.74

3.44 3.28 36.82 3.71

1.67 1.98 8.64 1.86

0.99 0.95 -3.95 0.91

-0.68 -0.16 27.20 -0.15

0.13 0.08 0.31 0.12

0.16 0.14 0.73 0.16

0.08 0.09 0.23 0.09

0.05 0.04 0.12 0.04

0.03 0.01 0.77 0.01

Tenured track group of researchers gender

Cluster

1 2 3 Combined

Man Frequency

Percent

Women Frequency

Percent

834 41 0 875

95.31% 4.69% 0.00% 100.00%

0 25 682 707

0.00% 3.54% 96.46% 100.00%

Area of knowledge

Cluster

1 2 3 Combined

biology and biomedicine Frequency Percent

applied economics Frequency Percent

computer sciences and AI Frequency Percent

medicine Frequency

Percent

rest of areas of knowledge Frequency Percent

128 16 69 213

34 4 17 55

38 2 14 54

17 1 15 33

51.50% 3.00% 45.50% 100.00%

617 43 567 1,227

60.10% 7.50% 32.40% 100.00%

61.80% 7.30% 30.90% 100.00%

70.40% 3.70% 25.90% 100.00%

50.29% 3.50% 46.21% 100.00%

Centroids

Cluster

1 2 3 Combined

perc_variation_1 Mean Std. Deviation

perc_variation_2 Mean Std. Deviation

perc_variation_3 Mean Std. Deviation

perc_variation_4 Mean Std. Deviation

perc_variation_5 Mean Std. Deviation

8.20 209.68 11.88 11.20

11.66 63.20 13.25 13.32

5.54 15.69 4.28 5.21

4.77 13.09 5.51 5.35

0.56 16.43 -0.0013 0.78

0.25 0.99 0.40 0.35

0.39 0.93 0.49 0.46

0.21 0.37 0.18 0.20

0.19 0.36 0.24 0.22

0.02 0.51 0.0001 0.03
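Both kinds of entries in Table 3 can be recomputed from cluster labels. The "percent" figures are shares within each category, so each category's percentages add up to 100 across clusters. For example, 2,715 of the 2,776 tenured men (97.80%) fall in Cluster 1. The sketch below uses hypothetical function names. The table does not say whether its standard deviations are sample or population values, so the sketch assumes sample values.

```python
import statistics
from collections import Counter, defaultdict


def category_profile(labels, values):
    """Frequency and percentage of each category value across clusters.

    Percentages are computed within each category value, so they add up to
    100 across clusters (the share of, e.g., all men who fall in each cluster).
    """
    counts = defaultdict(Counter)
    for lab, val in zip(labels, values):
        counts[val][lab] += 1
    profile = {}
    for val, per_cluster in counts.items():
        total = sum(per_cluster.values())
        profile[val] = {lab: (n, round(100.0 * n / total, 2)) for lab, n in sorted(per_cluster.items())}
    return profile


def centroid_profile(labels, rows):
    """Per-cluster (mean, sample std. deviation) of each continuous variable."""
    groups = defaultdict(list)
    for lab, row in zip(labels, rows):
        groups[lab].append(row)
    out = {}
    for lab, members in sorted(groups.items()):
        out[lab] = [
            # a single-member cluster has no sample std. deviation; report 0.0
            (statistics.mean(col), statistics.stdev(col) if len(col) > 1 else 0.0)
            for col in zip(*members)
        ]
    return out
```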

researchers, and the largest cluster (Cluster 1) had 834 researchers. Cluster profiles appear in Table 3, which reports the centroids of the continuous variables and the frequencies of the categorical variables.

5. Robustness of the model

The robustness of the automated nested longitudinal clustering performed in two stages (ANLC) was tested by developing an additional machine learning model: a predictive model based on a multilayer perceptron (MLP) artificial neural network (ANN) with a back-propagation learning algorithm. ANNs are mathematical models that can analyze large datasets even when complex relationships exist between the input and output variables. In the model, the input variables are the independent ones, while the output variable corresponds to the dependent one (Ballestar et al., 2018a). This research uses a feed-forward MLP ANN, one of the most popular types of ANN (Kavzoglu and Mather, 2003; Hu and Weng, 2009). The ANN was trained on the same data sample used to build the ANLC model, partitioned into 69.7% (4,080 researchers) for training and 30.3% (1,777 researchers) for validation (the testing sample in Table 4). The ANN input variables are also the same variables used to develop the ANLC multilevel model (type of contractual relationship, gender, area of knowledge and the researcher's longitudinal performance). The output variable is the segment to which the researcher belongs, that is, one of the six clusters obtained with the ANLC multilevel model (three clusters for tenured researchers and three for tenure track researchers). Accordingly, the ANN has three layers: an input layer with 192 units (receiving values from eight independent input variables), a hidden layer with seventeen units, and an output layer with six units, one for each cluster of researchers. In our MLP ANN, the hyperbolic tangent was the activation function for all units in the hidden layer, and the softmax function was the activation function for the six units in the output layer.

This new model fulfills two different purposes at the same time. On the one hand, it allows the ANLC multilevel model to be validated with a completely independent data science method. On the other hand, it is also a predictive model that can be implemented in real time to classify new samples of researchers that need to be evaluated (Ballestar et al., 2019). The overall classification accuracy of the ANN was 99.2% (an error rate of 0.8%). The confusion matrix in Table 4 shows the percentage of cases classified correctly and incorrectly for each of the six categories (clusters of researchers) of the dependent variable. We used the area under the ROC curve (AUC) as the main classification performance indicator. In some circumstances it can be more informative than accuracy, for instance when the classes are as unbalanced as they are here. For a useful classifier, the AUC lies between 0.5 and 1, where 1 means that the model classifies perfectly and 0.5 means that it classifies no better than chance.
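The architecture just described can be illustrated with a minimal pure-Python sketch: 192 input units, one hidden layer of 17 tanh units, and 6 softmax outputs trained by back-propagation. The section does not report the loss, the learning rate, the weight initialisation or the training schedule. This sketch therefore assumes a cross-entropy loss, plain per-example gradient descent and a small uniform initialisation. The random input vector is a stand-in for one encoded researcher. It is not the authors' trained model.

```python
import math
import random

N_INPUT, N_HIDDEN, N_OUTPUT = 192, 17, 6  # layer sizes reported in Section 5


def init_layer(n_in, n_out, rng):
    """Small random weights (n_out rows of n_in) and zero biases for a dense layer."""
    limit = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-limit, limit) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, weights, biases):
    """Affine map: weights @ x + biases."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, params):
    """Return the hidden activations (tanh) and output probabilities (softmax)."""
    (w1, b1), (w2, b2) = params
    hidden = [math.tanh(v) for v in dense(x, w1, b1)]
    return hidden, softmax(dense(hidden, w2, b2))


def train_step(x, target, params, lr=0.01):
    """One back-propagation step on the cross-entropy loss; returns the loss before the update."""
    (w1, b1), (w2, b2) = params
    hidden, probs = forward(x, params)
    # gradient w.r.t. the output logits for softmax + cross-entropy: p - one_hot(target)
    d_out = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
    # propagate to the hidden pre-activations through tanh'(a) = 1 - tanh(a)^2
    d_hidden = [
        (1.0 - h * h) * sum(w2[k][j] * d_out[k] for k in range(len(d_out)))
        for j, h in enumerate(hidden)
    ]
    for k, g in enumerate(d_out):  # output layer update
        for j, h in enumerate(hidden):
            w2[k][j] -= lr * g * h
        b2[k] -= lr * g
    for j, g in enumerate(d_hidden):  # hidden layer update
        for i, xi in enumerate(x):
            w1[j][i] -= lr * g * xi
        b1[j] -= lr * g
    return -math.log(probs[target])


rng = random.Random(42)
params = (init_layer(N_INPUT, N_HIDDEN, rng), init_layer(N_HIDDEN, N_OUTPUT, rng))
x = [rng.random() for _ in range(N_INPUT)]  # stand-in for one encoded researcher
_, probs = forward(x, params)
predicted = max(range(N_OUTPUT), key=probs.__getitem__)  # index of the most likely cluster
```

Repeated calls to `train_step` on labelled examples lower the loss. Classifying a new researcher is a single `forward` pass followed by taking the most probable of the six clusters.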

Table 4 Confusion matrix (dependent variable: cluster_final). Rows give the observed cluster; columns (1)–(6) give the predicted cluster, in the same order as the rows.

Training sample
Observed                       (1)     (2)     (3)     (4)     (5)     (6)    Percent correct
(1) tenured_cluster_1         1879       0       1       0       1       0    99.9%
(2) tenured_cluster_2            0    1053       1       0       0       0    99.9%
(3) tenured_cluster_3           12       1      44       0       5       2    68.8%
(4) tenure_track_cluster_1       0       0       0     572       3       0    99.5%
(5) tenure_track_cluster_2       0       0       4       3      36       0    83.7%
(6) tenure_track_cluster_3       0       0       0       0       0     463    100.0%
Overall percent              46.3%   25.8%    1.2%   14.1%    1.1%   11.4%    99.2%

Testing sample
Observed                       (1)     (2)     (3)     (4)     (5)     (6)    Percent correct
(1) tenured_cluster_1          830       0       1       0       0       0    99.9%
(2) tenured_cluster_2            0     415       0       0       0       0    100.0%
(3) tenured_cluster_3            3       1      20       0       7       0    64.5%
(4) tenure_track_cluster_1       0       0       0     258       0       0    100.0%
(5) tenure_track_cluster_2       0       0       2       2      18       2    75.0%
(6) tenure_track_cluster_3       0       0       0       0       0     218    100.0%
Overall percent              46.9%   23.4%    1.3%   14.6%    1.4%   12.4%    99.0%
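The summary rows and columns of Table 4 follow directly from its cell counts. The sketch below uses hypothetical function names. It recomputes the per-class "Percent correct", the "Overall percent" row and the overall accuracy for the training sample. It also adds a one-vs-rest AUC computed as the Mann-Whitney statistic. The AUC needs the network's predicted probabilities, which are not reported, so that part is shown only on toy scores. The paper does not say how its AUC values were computed.

```python
def confusion_summary(matrix):
    """Rows are observed classes, columns predicted classes (as in Table 4).

    Returns the per-row 'Percent correct', the per-column 'Overall percent'
    (share of all cases predicted as each class) and the overall accuracy, in %.
    """
    n = sum(sum(row) for row in matrix)
    percent_correct = [100.0 * row[i] / sum(row) for i, row in enumerate(matrix)]
    overall_percent = [100.0 * sum(row[j] for row in matrix) / n for j in range(len(matrix))]
    accuracy = 100.0 * sum(matrix[i][i] for i in range(len(matrix))) / n
    return percent_correct, overall_percent, accuracy


def auc_one_vs_rest(scores, is_positive):
    """AUC of one class against the rest, as the Mann-Whitney statistic: the share of
    (positive, negative) pairs in which the positive case scores higher, ties counting 1/2."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Training block of Table 4 (observed rows x predicted columns, same cluster order)
TRAINING = [
    [1879, 0, 1, 0, 1, 0],
    [0, 1053, 1, 0, 0, 0],
    [12, 1, 44, 0, 5, 2],
    [0, 0, 0, 572, 3, 0],
    [0, 0, 4, 3, 36, 0],
    [0, 0, 0, 0, 0, 463],
]

percent_correct, overall_percent, accuracy = confusion_summary(TRAINING)
```

The 99.2% accuracy is 4,047 correctly classified cases out of 4,080. The weak spots are the two small mixed-gender clusters, tenured_cluster_3 and tenure_track_cluster_2. This is where an imbalance-insensitive indicator such as the AUC is useful.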

The AUC was calculated for each of the six clusters of researchers, obtaining values between 0.987 and 1, which indicates that the ANN model performs very well (Hosmer et al., 2013). In conclusion, the predictions made by the ANN match the classification made by the ANLC model in 99.2% of the cases, indicating a high convergence between the two methods. The robustness of the ANLC is therefore strongly supported by the MLP-ANN.

6. Discussion and implications

The clustering structures of both groups, tenured and tenure track researchers, are described below.

6.1. Tenured researchers

Tenured researchers (4,279 individuals) are classified into three clusters based on their gender and their annual percentage variation over the achieved marks. Cluster 1 is made up exclusively of male researchers (2,715; 63.45% of the sample). Cluster 2 is made up exclusively of female researchers (1,471; 34.38% of the sample). Cluster 3 is a small group of both male and female researchers (93 researchers; 2.17% of the sample) (Table 3). Researchers in Cluster 1 have the highest marks over the six years of the program (average mark 6.69), higher than those in Cluster 2 (average mark 6.21) and Cluster 3 (average mark 3.69). Cluster 1 also shows a better evolution of performance than Cluster 2 in periods 1 and 2 (from 2005 to 2007), peaking at an annual percentage variation of 3.44% in the second period (Fig. 2). Clusters 1 and 2 show similar trends in the annual percentage variation of their performance. Both peak in the second period, with an increase of 3.44% for Cluster 1 and 3.28% for Cluster 2. However, the researchers are not able to keep increasing their marks at that pace, and the incremental gain decreases until the end of the program. Cluster 3 follows a more accentuated trend. It also peaks in the second period, with an increase in performance of 36.82% (after 18.68% in the first period). The incremental gain then falls rapidly over the following two periods (from 2007 to 2009) to −3.95%, before a sharp increase of 27.20% in the last period (Fig. 2; Table 3). This means that the men's cluster (Cluster 1) achieves better marks than the other two clusters from the very beginning. It also maintains a higher level of improvement than the women's cluster (Cluster 2) in the first two periods (from 2005 to 2007). After the second period, the gap in the annual percentage variation of performance between the women of Cluster 2 and the men of Cluster 1 narrows in periods 3 and 4. In the last period (period 5), the variation is even lower for Cluster 1 than for Cluster 2 (Fig. 2). The smallest group of researchers is Cluster 3, made up of women (32 researchers; 34.41% of the cluster) and men (61 researchers; 65.59% of the cluster). These researchers start the program with the lowest performance of the three groups (average mark 2.35 in 2005) but show the fastest growth in their marks (average mark 5.06 in 2010). This represents an increase of 115.53% from the first year of the program to the last, compared with 8.61% in Cluster 1 and 8.18% in Cluster 2 (Fig. 2).

6.2. Tenure track researchers

Tenure track researchers (1,582 individuals) are classified into three clusters based on their gender, area of knowledge and annual percentage variation over the achieved marks. Cluster 1 is made up exclusively of male researchers (834 researchers; 52.72% of the sample). Cluster 3 is made up exclusively of female researchers (682 researchers; 43.11% of the sample). Cluster 2 is a small group of both male and female researchers (66 researchers; 4.17% of the sample) (Table 3). Researchers in Cluster 3 have the highest marks over the six years of the program (average mark 5.89), higher than those in Cluster 1 (average mark 5.28) and Cluster 2 (average mark 3.26). Cluster 3 also shows a better evolution of performance than Cluster 1 in periods 1 and 2 (from 2005 to 2007), peaking at an annual percentage variation of 13.25% in the second period (Fig. 2). Clusters 1 and 3 show similar trends. Both peak in the second period, with an increase of 13.25% for Cluster 3 and 11.66% for Cluster 1. However, the researchers are not able to keep increasing their marks at that pace, and the incremental gain decreases until the end of the program. Cluster 2 follows a different trend. It peaks in the first period with an increase in performance of 209.68%, and the incremental gain then falls rapidly, ending at 16.43% in the last period (Fig. 2). This means that the women's cluster (Cluster 3) achieves better marks than the other two clusters from the very beginning. It also maintains a higher level of improvement than the men's cluster (Cluster 1) in the first two periods (from 2005 to 2007). Finally, the gap in the annual percentage variation of performance between the women of Cluster 3 and the men of Cluster 1 narrows in periods 3, 4 and 5 (Fig. 2). The smallest group of researchers is Cluster 2, and it is made up of

women (25 researchers; 37.88% of the cluster) and men (41 researchers; 62.12% of the cluster). These researchers start the program with a very low performance (average mark 0.66 in 2005), but their marks improve rapidly (average mark 5.14 in 2010). This represents an increase of 669.88% from the first year of the program to the last, compared with 34.34% in Cluster 1 and 39.40% in Cluster 3 (Fig. 2).

Fig. 2. Average annual marks and annual percentage variation of marks for the clusters of the tenured researchers' group and the tenure track researchers' group.

6.3. Summary of findings

The findings support H1, showing that the researchers' employment status, in terms of their contractual relationship with the university, is key to how incentives affect them. The main reason is that the two groups correspond to professionals at very different stages of their careers. On the one hand, incentives have little impact on tenured researchers (73.01% of the sample). Their productivity increases by only 9.36% over the six years, compared with an increase of 40.75% for tenure track researchers (26.99% of the sample). This is consistent with previous literature, as summarized by Dnes and Garoupa (2005). On the other hand, the baselines of their performance are also very different. Tenured researchers have an average of 6.08 points in the first year of the program, compared with 4.36 points for tenure track researchers. This suggests that programs that evaluate researchers' performance on different criteria and then rank them by their outputs to distribute the incentives can have unexpected outcomes. These outcomes lead to inequalities and inefficiencies, as Rauber and Ursprung (2008) and Batterbury (2008) also find.

The findings also support H2, showing that incentive-based programs provide incremental growth up to the third year, when they start to decelerate. This type of program therefore has a positive impact on researchers' performance in the short term, especially for tenure track researchers (11.20% growth in the first period, from 2005 to 2006, and 13.32% in the second period, from 2006 to 2007). Performance reaches a saturation point in the third year, which has a negative impact on the return of the program. In addition, this research shows that gender also plays an important role in academia, being a relevant variable when analyzing performance within each of the two main groups. The comparison of performance between the clusters of men and women varies depending on their contractual relationship with the university. Men reach higher marks in the tenured researchers' group, with an average mark of 6.86 in 2010 (the last year of the program). Women reach higher marks in the tenure track researchers' group, with an average mark of 5.89. Our results are robust and significant. The use of the multilayer perceptron (MLP) artificial neural network (ANN) with a back-propagation learning algorithm also facilitates their use in forecasting and improving public policies.
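The deceleration pattern behind H2 can be checked mechanically on the period-by-period variations. The minimal sketch below uses hypothetical function names. It reads "saturation" simply as the first period whose variation is lower than the previous one, which is an assumption of this illustration.

```python
def peak_period(variations):
    """1-based index of the period with the largest annual percentage variation."""
    return max(range(len(variations)), key=variations.__getitem__) + 1


def first_slowdown(variations):
    """1-based index of the first period whose variation is below the previous one, or None."""
    for t in range(1, len(variations)):
        if variations[t] < variations[t - 1]:
            return t + 1
    return None


# 'Combined' centroids of perc_variation_1..5 from Table 3 (equal to Table 2's variations 1-5)
TENURED = [2.74, 3.71, 1.86, 0.91, -0.15]
TENURE_TRACK = [11.20, 13.32, 5.21, 5.35, 0.78]

for name, series in (("tenured", TENURED), ("tenure track", TENURE_TRACK)):
    print(name, "peak:", peak_period(series), "first slowdown:", first_slowdown(series))
```

Both groups peak in period 2 (2007 vs 2006) and first slow down in period 3 (2008 vs 2007). For tenure track researchers the variation ticks up slightly in period 4 (5.35% after 5.21%) before falling to 0.78%. A single "first slowdown" index is therefore only a coarse summary of the saturation described above.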

7. Conclusions

This paper proposes a new machine learning (ML) method to measure the success and long-term effects of incentives in public policies. We use this model to assess the efficiency of long-term incentive-based programs designed to boost research productivity. To do so, we analyze an anonymized individual-level data sample of 5,861 researchers who participated in a program in public universities in the Madrid Region from 2005 to 2010.

To our knowledge, this is the first study to focus on researchers' response to this kind of program over such an extended period and with such a large number of individuals. We have also shown the advantages of using data science methods such as machine learning for research in this area. In this case, we have developed an automated nested longitudinal clustering (ANLC). It first stratifies researchers by their contractual relationship with the university and then performs a longitudinal segmentation of each group, taking into account the researchers' characteristics and their performance over time. Therefore, this paper bridges the gap identified in the literature and paves the way for new lines of research based on data analysis that can be readily implemented for the benefit of both organizations and researchers. One of the main benefits of this research is that it enables us to understand the behavior and response to incentives of heterogeneous groups of researchers. Thus, organizations will be able to optimize the design of programs, maximizing scientific production and supporting the development of researchers' careers at the same time, knowing that research and innovation produce potentially large social benefits (Jaffe et al., 2005). Our results are in line with previous literature on incentives, such as Jenkins et al. (1998), Camerer and Hogarth (1999) and Wright and Boswell (2002). They are also in line with the recommendations on the use of machine learning by Chalfin et al. (2016), Athey (2017) and Athey and Imbens (2017). They can also inform the heated debate on the reproducibility of research results for academic promotions (Lakens et al., 2018). The study has some limitations. Future research should further analyze the sample of researchers who did not participate during the whole period. Some of them did not enter at the very beginning, and others dropped out before its end. Also, the use of other data science methods would yield additional insight into this issue.

References

Aghion, P., Dewatripont, M., Hoxby, C., Mas-Colell, A., Sapir, A., 2010. The governance and performance of universities: evidence from Europe and the US. Econ. Policy 25 (61), 7–59.
Anderson, S., Auquier, A., Hauck, W.W., Oakes, D., Vandaele, W., Weisberg, H.I., Bryk, A.S., Kleinman, J., 1980. Statistical Methods for Comparative Studies. Wiley, New York.
Anderson, S.R., Auquier, A., Hauck, W.W., Oakes, D., Vandaele, W., Weisberg, H.I., 2009. Statistical Methods for Comparative Studies: Techniques for Bias Reduction, Vol. 170. John Wiley & Sons.
Athey, S., 2017. Beyond prediction: using big data for policy problems. Science 355 (6324), 483–485.
Athey, S., Imbens, G.W., 2017. The state of applied econometrics: causality and policy evaluation. J. Econ. Perspect. 31 (2), 3–32.
Auranen, O., Nieminen, M., 2010. University research funding and publication performance – an international comparison. Res. Policy 39 (6), 822–834.
Austin, P.C., 2011. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behav. Res. 46 (3), 399–424.
Austin, P.C., Brunner, L.J., 2004. Inflation of the type I error rate when a continuous confounding variable is categorized in logistic regression analyses. Stat. Med. 23 (7), 1159–1178.
Ballestar, M.T., Grau-Carles, P., Sainz, J., 2018a. Customer segmentation in e-commerce: applications to the cashback business model. J. Bus. Res. 88, 407–414.
Ballestar, M.T., Soriano, D.R., Sanz, J., 2018b. ¿Es el big data el siguiente paso en la digitalización de la empresa? Economía Industrial (409), 47–56.
Ballestar, M.T., Grau-Carles, P., Sainz, J., 2019. Predicting customer quality in e-commerce social networks: a machine learning approach. Rev. Manag. Sci. 1–15.
Batterbury, S., 2008. Tenure or permanent contracts in North American higher education? A critical assessment. Policy Futures Educ. 6 (3), 286–297.
Camerer, C.F., Hogarth, R.M., 1999. The effects of financial incentives in experiments: a review and capital-labor-production framework. J. Risk Uncertainty 19 (1–3), 7–42.
Chait, R., 2009. The Questions of Tenure. Harvard University Press.
Chalfin, A., Danieli, O., Hillis, A., Jelveh, Z., Luca, M., Ludwig, J., Mullainathan, S., 2016. Productivity and selection of human capital with machine learning. Am. Econ. Rev. 106 (5), 124–127.
Chambers, C.D., Dienes, Z., McIntosh, R.D., Rotshtein, P., Willmes, K., 2015. Registered reports: realigning incentives in scientific publishing. Cortex 66, A1–A2.
Cochran, W.G., 1968. The effectiveness of adjustment by subclassification in removing bias in observational studies. Biometrics 295–313.
Cvecic, I., Sokolic, D., 2018. Impact of public expenditure in labour market policies and other selected factors on youth unemployment. Econ. Res.-Ekonomska Istraživanja 31 (1), 2060–2080.
Dnes, A., Garoupa, N., 2005. Academic tenure, posttenure effort, and contractual damages. Econ. Inquiry 43 (4), 831–839.
Dobbin, F., Simmons, B., Garrett, G., 2007. The global diffusion of public policies: social construction, coercion, competition, or learning? Annu. Rev. Sociol. 33, 449–472.
Dolnicar, S., 2002. A review of unquestioned standards in using cluster analysis for data-driven market segmentation.
Easterly, W., Levine, R., 1997. Africa's growth tragedy: policies and ethnic divisions. Q. J. Econ. 112 (4), 1203–1250. https://doi.org/10.1162/003355300555466.
Edwards, M.A., Roy, S., 2017. Academic research in the 21st century: maintaining scientific integrity in a climate of perverse incentives and hypercompetition. Environ. Eng. Sci. 34 (1), 51–61.
Epstein, N., Fischer, M.R., 2017. Academic career intentions in the life sciences: can research self-efficacy beliefs explain low numbers of aspiring physician and female scientists? PLoS One 12 (9), e0184543.
Formann, A.K., 1984. Die Latent-Class-Analyse: Einführung in Theorie und Anwendung. Beltz.
Frank, K.A., 2000. Impact of a confounding variable on a regression coefficient. Sociol. Methods Res. 29 (2), 147–194.
Heggeseth, B., Harley, K., Warner, M., Jewell, N., Eskenazi, B., 2015. Detecting associations between early-life DDT exposures and childhood growth patterns: a novel statistical approach. PLoS One 10 (6), e0131443.
Hicks, D., 2012. Performance-based university research funding systems. Res. Policy 41 (2), 251–261.
Hobbs, F.R., Roberts, L.M., 2016. The Stern review of the research excellence framework.
Hosmer Jr., D.W., Lemeshow, S., Sturdivant, R.X., 2013. Applied Logistic Regression. Wiley, Hoboken.
Hox, J.J., Moerbeek, M., Van de Schoot, R., 2017. Multilevel Analysis: Techniques and Applications. Routledge.
Hu, X., Weng, Q., 2009. Estimating impervious surfaces from medium spatial resolution imagery using the self-organizing map and multi-layer perceptron neural networks. Remote Sens. Environ. 113 (10), 2089–2102.
Jaffe, A.B., Newell, R.G., Stavins, R.N., 2005. A tale of two market failures: technology and environmental policy. Ecol. Econ. 54 (2–3), 164–174.
Jenkins Jr., G.D., Mitra, A., Gupta, N., Shaw, J.D., 1998. Are financial incentives related to performance? A meta-analytic review of empirical research. J. Appl. Psychol. 83 (5), 777.
Jones, B.L., Nagin, D.S., 2007. Advances in group-based trajectory modeling and an SAS procedure for estimating them. Sociol. Methods Res. 35 (4), 542–571.
Kattge, J., Diaz, S., Lavorel, S., Prentice, I.C., Leadley, P., Bönisch, G., Cornelissen, J.H.C., 2011. TRY–a global database of plant traits. Global Change Biol. 17 (9), 2905–2935.
Kaufman, L., Rousseeuw, P.J., 1990. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, Hoboken, NJ, USA.
Kavzoglu, T., Mather, P., 2003. The use of back propagating artificial neural networks in land cover classification. Int. J. Remote Sens. 24 (23), 4907–4938.
Kleinberg, J., Ludwig, J., Mullainathan, S., Obermeyer, Z., 2015. Prediction policy problems. Am. Econ. Rev. 105 (5), 491–495.
Kleinberg, J., Lakkaraju, H., Leskovec, J., Ludwig, J., Mullainathan, S., 2018. Human decisions and machine predictions. Q. J. Econ. 133 (1), 237–293.
LaConte, S., Strother, S., Cherkassky, V., Anderson, J., Hu, X., 2005. Support vector machines for temporal classification of block design fMRI data. NeuroImage 26 (2), 317–329.
Lakens, D., Adolfi, F.G., Albers, C.J., Anvari, F., Apps, M.A., Argamon, S.E., Buchanan, E.M., 2018. Justify your alpha. Nat. Hum. Behav. 2 (3), 168.
Mjolsness, E., DeCoste, D., 2001. Machine learning for science: state of the art and future prospects. Science 293 (5537), 2051–2055.
Musselin, C., 2005. European academic labor markets in transition. Higher Educ. 49 (1–2), 135–154.
Norušis, M.J., 2014. SPSS 13.0 Statistical Procedures Companion. Prentice Hall.
Pers, T.H., Albrechtsen, A., Holst, C., Sørensen, T.I.A., Gerds, T.A., 2009. The validation and assessment of machine learning: a game of prediction from high-dimensional data. PLoS One 4 (8), e6287.
Pontille, D., Torny, D., 2010. The controversial policies of journal ratings: evaluating social sciences and humanities. Res. Eval. 19 (5), 347–360.
Pourhoseingholi, M.A., Baghestani, A.R., Vahedi, M., 2012. How to control confounding effects by statistical analysis. Gastroenterol. Hepatol. Bed Bench 5 (2), 79.
Rauber, M., Ursprung, H.W., 2008. Life cycle and cohort productivity in economic research: the case of Germany. German Econ. Rev. 9 (4), 431–456.
Rothstein, B., Uslaner, E.M., 2005. All for all: equality, corruption, and social trust. World Polit. 58 (1), 41–72.
Rousseeuw, P.J., 1987. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65.
Sin, K., Muthu, L., 2015. Application of big data in education data mining and learning analytics – a literature review. ICTACT J. Soft Comput. 5 (4).
Taylor, J., 2011. The assessment of research quality in UK universities: peer review or metrics? Br. J. Manage. 22 (2), 202–217.
Tietze, S., 2018. Multilingual research, monolingual publications: management scholarship in English only? Eur. J. Int. Manage. 12 (1/2), 28–45.
Wagstaff, K., 2004. Clustering with missing values: no imputation required. In: Classification, Clustering, and Data Mining Applications. Springer, Berlin, Heidelberg, pp. 649–658.
Wang, D., 2019. International labour movement, public intermediate input and wage inequality: a dynamic approach. Econ. Res.-Ekonomska Istraživanja 32 (1), 1–16.
Wang, J., Lee, Y.N., Walsh, J.P., 2018. Funding model and creativity in science: competitive versus block funding and status contingency effects. Res. Policy 47 (6), 1070–1083.
Wright, P.M., Boswell, W.R., 2002. Desegregating HRM: a review and synthesis of micro and macro human resource management research. J. Manage. 28 (3), 247–276.
Yu, H., Su, T., Zeng, X., 2014. A three-way decisions clustering algorithm for incomplete data. In: International Conference on Rough Sets and Knowledge Technology. Springer, Cham, pp. 765–776.
