A typology of adopters and nonadopters of improved sorghum seeds in Tanzania: A deep learning neural network approach


World Development 127 (2020) 104839. Contents lists available at ScienceDirect. World Development journal homepage: www.elsevier.com/locate/worlddev



Aloyce R. Kaliba a,⇑, Richard J. Mushi b, Anne G. Gongwe c, Kizito Mazvimavi d

a College of Business, Southern University and A&M College, Baton Rouge, LA, USA
b Department of Social Sciences, Mississippi Valley State University, MS, USA
c Faculty of Business Administration, St. Augustine University of Tanzania, Mwanza, Tanzania
d International Crop Research Institute for Semi-Arid Crops, Bulawayo, Zimbabwe

Article info

Article history: Accepted 7 December 2019

Keywords: Double hurdle; Deep learning; Neural networks; Sorghum; Typology; t-SNE

Abstract

For more than three decades, agricultural research and extension efforts in Sub-Saharan Africa have been directed toward developing improved seeds for agricultural transformation. Despite these efforts and substantial investment in physical and human capital, the adoption of improved seeds has remained marginal. One of the factors constraining adoption is the limited choice available to heterogeneous small-scale farmers, who are often targeted by fit-for-all agricultural technologies. In this paper, we typify small-scale sorghum producers in Tanzania based on the socio-economic characteristics of farmers, including the propensity for adoption and the intensity of adoption. These two variables are predicted using a deep learning neural network with back-propagation. The identified groups of adopters and nonadopters are visualized using t-distributed stochastic neighbor embedding. Knowing the typology of farmers is a critical first step when the goal is scaling up the adoption process through tailored advisory services. Results show that sorghum producers in Tanzania are heterogeneous, and there is a need to develop targeted agricultural innovations and public policies that serve specific groups of farmers. Since Tanzania's agricultural policies are formulated at the national level, there must be room for adjustment by regional and district-level authorities to reflect local demand for services. Published by Elsevier Ltd.

1. Introduction

Except in Sub-Saharan Africa, many developing countries have registered productivity gains from adopting hybrid seeds, inorganic fertilizer, and irrigation. For example, Gollin, Morris, and Byerlee (2005) show that improved maize varieties constituted 17 percent of the total area harvested in Sub-Saharan Africa, compared to 90 percent in East and South-East Asia and the Pacific and 57 percent in Latin America and the Caribbean. Moreover, analyses by Breisinger et al. (2011) show that while the population of Sub-Saharan Africa is growing rapidly and about 70 percent of the people live in rural areas, the agricultural sector is not growing fast enough to stimulate and support economic development. Expansion into new land is the primary source of agricultural growth and a leading cause of environmental degradation.

⇑ Corresponding author: A.R. Kaliba. https://doi.org/10.1016/j.worlddev.2019.104839 0305-750X/Published by Elsevier Ltd.

The adoption of innovative technologies is the only pathway to making the Sub-Saharan agricultural sector efficient and sustainable. As discussed in Meijer, Catacutan, Ajayi, Sileshi, and Nieuwenhuis (2015), a prerequisite for adoption is that farmers perceive these technologies as beneficial and profitable. Wesley and Faminow (2014) emphasize stabilizing the research and extension linkages that are essential in developing tailored advisory services. They suggest addressing complementary factors that support fertilizer use, market participation, and localized value-addition activities to increase profits from improved seeds. Merely making improved technologies available cannot ensure sustained adoption or effectively support the improved seed value chain. In the region, public agricultural research and extension programs continue to be the primary source of farming innovations, and there is an urgent need to strengthen their ability to deliver effective services. Based on Mazvimavi and Twomlow (2009) and other studies, these services help farmers assess costs and determine the risk profile and profitability of new innovations. Despite the services being freely available, adoption is sometimes low, as there is a profound mismatch between available agricultural technologies and the socio-economic circumstances of farmers.


Makate, Makate, and Mango (2017) provide evidence that farmers in Sub-Saharan Africa are heterogeneous and face different production and marketing constraints. They need diversified technologies to choose from; generally, fit-for-all agricultural innovations do not achieve extensive adoption, as adoption decisions depend on the varied needs and abilities of the farmers. Authors such as Doss (2006, 2013), Kassie, Teklewold, Jaleta, Marenya, and Erenstein (2015), and several others address the adoption question either by estimating adoption rates or by identifying characteristics associated with adoption. A few studies, including Kuivanen et al. (2016) and Makate et al. (2017), typify farmers based on farmer characteristics. Only a limited number of studies, such as Gorgulu (2010) and Daloğlu, Nassauer, Riolo, and Scavia (2014), link the typology of farmers to new agricultural technology adoption patterns. This study fills this knowledge gap. It uses the socio-economic characteristics of farmers, the availability of institutional support systems, and potential adoption patterns to typify the farmers, and it uses the results to identify possible research and extension support systems for each defined group.

Hoop, Mack, Mann, and Schmid (2014) contend that the classification of farm households by certain commonalities or differences is an essential step in exploring the factors that explain adoption. Labeling also helps to understand existing adoption constraints and to find opportunities for change. Makate et al. (2017) applied a multivariate analysis that combines principal component and cluster analyses to classify typical farm households based on their socio-economic characteristics. The two-step approach has two drawbacks observed by Lattin, Carroll, and Green (2011). First, while principal component analysis mitigates the curse of dimensionality, it may lead to loss of information, and choosing the best number of clusters remains difficult. Second, the technique is suited to variables with near-normal distributions, whereas the factors influencing adoption include binary and categorical variables.

This study aims to find the salient features of adopters and potential adopters of improved sorghum varieties in Tanzania and to suggest tailored interventions to scale up the adoption process. We use deep learning neural networks to predict the propensity for and intensity of adoption, and van der Maaten and Hinton's t-SNE (t-Distributed Stochastic Neighbor Embedding) for classification and visualization of the resulting farmer groups. The robustness of the deep learning neural network results was tested using the double-hurdle model with dependent errors. Cragg (1971) and Jones (1992) explain a procedure that allows modeling the propensity for and intensity of adoption using two linked but separate processes, estimated through a probit model and a truncated normally distributed model, respectively. Deep learning is a type of machine learning in which a model learns to perform prediction directly from data (Bengio, Courville, & Vincent, 2013). Comparing the predictions from the two frameworks helps identify the better-performing results. The t-SNE is a non-linear dimensionality reduction technique that is well suited to visualizing multi-dimensional data. We use data collected from the main sorghum-producing farming systems in Northern and Central Tanzania, with the aims of using novel tools to find representative groups of farmers and generating a body of knowledge for use in public awareness work and advocacy.

In the following section, we present a summary of sorghum research activities in Tanzania and the method of data collection, a brief review of the double-hurdle model and of deep learning using neural networks, and the methods of data analysis. In Section 3, we present summary statistics of the variables used in the research and discuss the results arising from the neural network and cluster analyses. The summary and strategies to enhance the uptake of improved sorghum varieties in Tanzania are in Section 4.

2. History of sorghum research in Tanzania, source of data, and conceptual framework

2.1. Sorghum research in Tanzania and data collection

Rohrbach and Kiriwaggulu (2001) summarize the history of sorghum research and development activities in Tanzania. The program traces back to early 1932, when the colonial government started the sorghum and millet improvement program. In the 1980s, the International Crops Research Institute for Semi-Arid Tropics (ICRISAT) started a high-level collaboration with the Tanzania Ministry of Agriculture through the Department of Research and Development (DRD). The partnership led to the release of three sorghum varieties (Mgonja et al., 2005): Tegemeo, Pato, and Macia in 1978, 1997, and 1998, respectively. Two other sorghum varieties, Wahi and Hakika, were released in 2002. The NARCO Mtama 1 and Sila varieties were released in 2008. Kilimo (2008) presents the agronomic and physical characteristics of these varieties. The varieties are drought tolerant and suitable for human consumption, animal feed, baking, and brewing. The adoption of these varieties remains low (Kaliba, Mazvimavi, Gregory, Mgonja, & Mgonja, 2018), and this study aims at generating information that could enhance the adoption of improved sorghum seeds in Tanzania.

The source of sample data for this analysis is a survey conducted by the Selian Agricultural Research Institute (SARI), Arusha, Tanzania, in collaboration with ICRISAT, Nairobi, Kenya. The principal author developed the structured questionnaire used in the study. Twenty-five extension agents working in major sorghum farming systems and three scientists from ICRISAT participated in a two-day training workshop and reviewed and pretested the questionnaire in Singida Rural (Central Tanzania) and Rombo (Northern Tanzania) districts. The sampling frame included Dodoma Region (102 sample households), Kilimanjaro Region (57 sample households), Manyara Region (110 sample households), Singida Region (435 sample households), and Shinyanga Region (118 sample households). Within the regions, the districts were Iramba, Singida Rural, and Manyoni (Singida Region); Kondoa (Dodoma Region); Babati (Manyara Region); Rombo (Kilimanjaro Region); and Kishapu (Shinyanga Region). The selection of districts within each region accounted for the intensity and importance of sorghum production in the farming system. In each district, sample Wards and sample Villages were selected at random (see footnote 1). The final sample includes fourteen Wards and fourteen Villages (Map 1). The sample comprised 822 households, of which 505 were adopters (61.44%) and 317 were nonadopters (38.56%) during the 2012/13 farming season.

Footnote 1: Tanzania's administrative units include Regions, Districts, Wards, and Villages; the Village is the lowest administrative unit.

Map 1. Location of sample households in Tanzania.

2.2. The double-hurdle model

We use a double-hurdle model as a base-case model to test the robustness of the neural network model. Double-hurdle models were developed to account for zero observations in the intensity of adoption variable using consumer choice theory. Jones (1992) observes that, regardless of potential profitability, some households cannot be induced to adopt improved agricultural technologies, as other behavioral factors and market constraints may determine zero observations. Pudney (1989) derived an alternative model using discrete random preference regimes, where adopters are hypothesized to have a different preference structure than nonadopters. Blaylock and Blisard (1993) extend this assumption to the case where the intensity of adoption reflects either the decision not to adopt or a standard corner solution. The double-hurdle model generalizes the Tobit model (Tobin, 1958): the first hurdle represents a household's decision to adopt, and the second hurdle is how much land to allocate to improved seeds. The two choices (i.e., adoption and allocation) can be either dependent on or independent of each other (Jones, 1992), or they can be observed sequentially (Lee & Maddala, 1985). Let $y_{i2}$ be the proportion of cropland allotted to improved seeds by household $i$, let $y_{i2}^{*}$ be a latent variable representing the proportion of land under improved sorghum varieties (ISVs) for both adopters and nonadopters, and let $y_{i1}^{*}$ be a latent variable describing the first hurdle (i.e., the decision to adopt). The double-hurdle model is:

\[
\begin{aligned}
y_{i1}^{*} &= Z_i\theta + \mu_i,\\
y_{i2}^{*} &= X_i\beta + \varepsilon_i,\\
y_{i2} &= y_{i2}^{*} \quad \text{if } Z_i\theta + \mu_i > 0 \text{ and } X_i\beta + \varepsilon_i > 0,\\
y_{i2} &= 0 \quad \text{otherwise},
\end{aligned}
\tag{1}
\]

where $(\mu, \varepsilon) \sim BVN(0, \Sigma)$ and
\[
\Sigma = \begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^{2} \end{pmatrix}.
\]

In Eq. (1), $Z$ and $X$ are sets of socio-economic and built environment variables that enter the first and second hurdles, and $\sigma$ and $\rho$ are, respectively, the standard deviation of the intensity of adoption and the correlation between the errors in the first and second hurdles. A generalized likelihood function for the double-hurdle model can be constructed using Eq. (1) and estimated using the maximum likelihood technique suggested by Gao, Wailes, and Cramer (1995).

There are two considerations when developing the likelihood function. First, a normal distribution of the two error terms is an implicitly maintained hypothesis, and estimates are inconsistent when the normality assumption is violated (Arabmazar & Schmidt, 1982). Gao et al. (1995) suggest accommodating non-normal errors by transforming $y_{i2}$ using the Box-Cox transformation. However, Lee and Maddala (1985) show that the Box-Cox transformation is not defined when $y_{i2} < 0$ and therefore propose using the inverse hyperbolic sine (IHS) transformation. The likelihood function that includes the IHS transformation is:

\[
L = \prod_{y_{i2}=0}\left[1 - \Phi\!\left(Z_i\theta, \frac{X_i\beta}{\sigma}, \rho\right)\right]
\times \prod_{y_{i2}>0} \Phi\!\left[\frac{Z_i\theta + \frac{\rho}{\sigma}\left(T(y_{i2}) - X_i\beta\right)}{\sqrt{1-\rho^{2}}}\right]
\frac{1}{\left(1+\tau^{2}y_{i2}^{2}\right)^{1/2}}\,
\frac{1}{\sigma}\,\phi\!\left[\frac{T(y_{i2}) - X_i\beta}{\sigma}\right].
\tag{2}
\]

In Eq. (2), $T(\cdot)$ is the IHS transformation function, $\Phi(\cdot)$ and $\phi(\cdot)$ are, respectively, the cumulative distribution and density functions, $\tau$ is the parameter that controls kurtosis, and all other variables are as explained before. The IHS transformation is such that:

\[
T(y_{i2}) = \frac{\log\!\left[\tau y_{i2} + \left(\tau^{2}y_{i2}^{2}+1\right)^{1/2}\right]}{\tau} = \frac{\sinh^{-1}(\tau y_{i2})}{\tau}.
\tag{3}
\]
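As a numerical illustration of Eq. (3), the following Python sketch evaluates the transformation in both of its equivalent forms, together with its derivative with respect to the untransformed variable, $(1 + \tau^{2}y^{2})^{-1/2}$. The function names and the parameter values are illustrative assumptions and are not taken from the study.

```python
import math


def ihs(y, tau):
    """Inverse hyperbolic sine transformation: asinh(tau * y) / tau, for tau > 0."""
    return math.asinh(tau * y) / tau


def ihs_log_form(y, tau):
    """The same transformation written in the logarithmic form of Eq. (3)."""
    return math.log(tau * y + math.sqrt(tau ** 2 * y ** 2 + 1.0)) / tau


def ihs_derivative(y, tau):
    """Derivative of the transformation with respect to y."""
    return 1.0 / math.sqrt(1.0 + tau ** 2 * y ** 2)


if __name__ == "__main__":
    # Hypothetical shares of land and two illustrative values of tau.
    for y in (-0.5, 0.0, 0.25, 1.0):
        print(y, ihs(y, 0.01), ihs(y, 5.0))
```

With a very small value of the scale parameter the output is close to the input, while a large value compresses large inputs; negative inputs are handled without any special treatment.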

In Eq. (3), the IHS transformation is symmetric about 0 in $\tau$, so one can consider only $\tau > 0$. According to Lee and Maddala (1985), $T(y_{i2})$ is linear when $\tau$ approaches zero and behaves logarithmically for large values of $y_{i2}$ and for a wide range of $\tau$. It also has the desirable property of being scale-invariant, and the transformation is defined for random variables that take both positive and negative values. Imposing the restrictions $\tau = 0$ and $\rho = 0$ on Eq. (2) gives the Cragg (1971) double-hurdle model, and when $Z = X$ and $\theta = \beta/\sigma$ we have the likelihood function for the Tobit model (Tobin, 1958). Selection among these models is by likelihood ratio tests or by a t-test for a single parameter restriction (Loeys, Moerkerke, Smet, & Buysse, 2012). The second consideration is heteroscedasticity, in which the standard deviation ($\sigma$) is generally specified using the exponential function such that $\sigma_i = \exp(w_i\gamma)$, where $w_i$ is a vector of exogenous variables and $\gamma$ are estimable parameters. After estimating all parameters using the likelihood function in Eq. (2), the propensity or the probability of adoption is:

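\[
\Pr(y_{i2} > 0) = \Phi\!\left(Z_i\theta, \frac{X_i\beta}{\sigma_i}, \rho\right),
\tag{4}
\]

and the conditional mean of $y_{i2}$, which measures the average intensity of adoption conditional on the probability of adoption being greater than zero, is:

\[
E(y_{i2} \mid y_{i2} > 0) = \Phi\!\left(\frac{X_i\beta}{\sigma_i}\right)^{-1} \int_{0}^{\infty} y_{i2}\, \frac{1}{\sigma_i}\, \phi\!\left(\frac{T(y_{i2}) - X_i\beta}{\sigma_i}\right) \frac{1}{\left(1+\tau^{2}y_{i2}^{2}\right)^{1/2}}\, dy_{i2}.
\tag{5}
\]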

Using the results in Eqs. (4) and (5), the unconditional mean of $y_{i2}$, which measures the overall average intensity of adoption among adopters and nonadopters, is the product of Eqs. (4) and (5). Recent applications of the double-hurdle model include Asfaw, Shiferaw, Simtowe, and Mekbib (2011), who analyze the determinants of the intensity of technology adoption conditional on overcoming seed access constraints in Ethiopia. Also, Akpan, Veronica, and Essien (2012) determine decision variables that influence fertilizer adoption and optimal intensity of use among crop farmers in Southern Nigeria, and Amare, Asfaw, and Shiferaw (2011) analyze the determinants of improved pigeon pea and maize seed adoption conditional on overcoming seed access constraints in Tanzania.

2.3. The fundamentals of the deep learning neural network model

As explained above, the double-hurdle model is a base-case model when adoption behavior consists of two decisions: an adoption decision and an amount/quantity decision that takes on continuous positive values. According to Brockett, Cooper, Golden, and Pitaktong (1994) and Soni and Abdullahi (2015), double-hurdle models fail to capture the underlying dynamics present in the data, and these authors suggest using artificial neural networks as an alternative. Machine learning is a subfield of artificial intelligence that enables computers to learn on their own by finding patterns in observed data. With supervised machine learning, the algorithm learns the relationship among variables in the existing data without pre-programmed rules and applies that learned relationship to classify or predict with entirely new data. Following Schmidhuber (2015) and Bengio et al. (2015), let $y_i = f(x_{i1}) + \varepsilon_i$, where $y_i$, $f(x_{i1})$, and $\varepsilon_i$ are, respectively, the variable to be classified or predicted, the function describing the relationship between the variables $x$ (explanatory variables) and $y$ (response variables), and a random error term (positive or negative) with mean zero. The goal of supervised learning is to learn the relationship between $y$ and $x$ using the training dataset, $x_{i1}$, and to use the learned relationship to predict unknown values of $y$ as accurately as possible using a new test dataset, $x_{i2}$. Therefore, the tasks of supervised learning are to classify or predict $y$ with new datasets into distinct groups according to their resemblances and affinities.

Hinton and Salakhutdinov (2006) indicate that deep learning is a subtype of machine learning that uses a neural network as a prediction and classification architecture. Neural networks are general nonlinear classifiers or predictors and are excellent with genetic data that show highly nonlinear properties. While standard machine learning algorithms rely on manual extraction of the relevant features of the data for classification and prediction (Bengio, 2009), deep learning feeds the information directly into a deep neural network that automatically learns the data's features. Bengio et al. (2013) show that deep learning neural networks are very efficient at learning $f(x_i)$, particularly in situations where the data are complex and nonlinear. The network starts with an input layer to match the feature space, followed by multiple layers of nonlinearity, and ends with a linear regression or classification layer to match the output space, as shown in Fig. 1. At each node, there is a weighted combination of input signals ($a = \sum_{i=1}^{n} w_i x_i$) and then an output signal, $f\left(\sum_{i=1}^{n} w_i x_i + \varepsilon_i\right)$, transmitted by the connected neuron. Therefore, the function $f(\cdot)$ is the nonlinear activation function used throughout the network, and the bias $\varepsilon_i$ is the neuron's activation threshold. What differentiates the outputs of different neurons is their weights ($w_i$). From Fig. 1, a traditional method of transferring information from a layer to a neuron is setting $w_{k0} = b_k$ for the first input ($x_0$), which makes it a bias input and leaves the other data (from $x_1$ to $x_m$) to connect directly to the neuron using activation functions. Mathematically, a neural network with one hidden layer is of the form:

\[
y_k = f_k\!\left(\alpha_k + \sum_{j} w_{jk}\, f_h\!\left(\alpha_j + \sum_{i} w_{ij}\, x_i\right)\right).
\tag{6}
\]
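The forward computation in Equation (6) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function and argument names are hypothetical, the hidden activation defaults to tanh, and the output activation defaults to the identity; none of these choices is taken from the study.

```python
import math


def logistic(a):
    """Logistic activation, one of the common choices for hidden nodes."""
    return 1.0 / (1.0 + math.exp(-a))


def forward(x, w_in, alpha_hidden, w_out, alpha_out,
            f_h=math.tanh, f_k=lambda a: a):
    """One-hidden-layer network.

    w_in[i][j]  : weight from input i to hidden node j
    w_out[j][k] : weight from hidden node j to output k
    alpha_*     : thresholds (biases) of the hidden and output nodes
    """
    hidden = [
        f_h(alpha_hidden[j] + sum(w_in[i][j] * x[i] for i in range(len(x))))
        for j in range(len(alpha_hidden))
    ]
    return [
        f_k(alpha_out[k] + sum(w_out[j][k] * hidden[j] for j in range(len(hidden))))
        for k in range(len(alpha_out))
    ]
```

Passing `f_h=logistic` switches the hidden layer to the logistic function without changing the rest of the computation.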

The function $f(\cdot)$ in Equation (6) represents an activation function, the parameters $\alpha$ are the functions' thresholds, and $w_{ij}$ are weights connecting the input to other hidden and output nodes. There are several choices for the nonlinear activation function, the common ones being the Logistic, Hyperbolic Tangent (tanh), Rectified Linear Unit (ReLU), and Max-Pooling (Maxout) functions. The tanh function is a rescaled and shifted logistic function; its symmetry around zero allows the training algorithm to converge faster (Bishop, 1995). The rectified linear activation function is more appropriate when working with biological models (Glorot, Bordes, & Bengio, 2011) and is applied mostly in image recognition studies. Goodfellow, Warde-Farley, Mirza, Courville, and Bengio (2013) define a Maxout as a generalization of the LeCun activation function in which each neuron picks the largest output of a fixed number of separate channels. Each channel has its own weights and bias values, and a Maxout unit has the property that it can approximate any convex function of the input.

For simplicity, Fig. 2 displays the structure of a simple feedforward neural network model with back-propagation, three inputs, and two outputs. The circles are neurons, and different weights, biases, and activation functions characterize each neuron. The network has two hidden layers, each of which has four nodes, connected by activation functions. Each connection has its own weighting parameter estimated by the model. In Fig. 2, using the training dataset with three independent variables ($x_1$, $x_2$, and $x_3$), the network can learn the specific features contained in the data and associate them with the corresponding dependent variables ($y_1$ and $y_2$). Each layer in the network takes in data from the previous layer, transforms it using activation functions, and passes it on (feedforward). The complexity and detail of what the network represents increase from one layer to the next. The layers may have different or equal numbers of neurons and the same or different activation functions. Note that the network learns directly from the data; the researcher does not influence which features the network learns.

Fig. 1. A neuron of a feedforward neural network.

Fig. 2. A simple structure of the neural network model with backpropagation.

The input layer in Fig. 2 accepts the inputs, and the neurons of each layer transform the received inputs using the weights, the biases ($b_i$), and the activation functions. The information moves from the input layer to the hidden layers, which process it and send the final output to the output layer through feed-forward propagation. Back-propagation then updates the weights and biases of the neurons using the differences between predicted and actual output, which improves the predictive power of the neural network. In the machine learning literature, there is no established method for deciding on the number of nodes in the hidden layer. A rule of thumb suggested by Masters (1993) is to set the number of nodes between the number of inputs and the number of outputs.
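The feed-forward and back-propagation steps described above can be sketched for a network shaped like the one in Fig. 2 (three inputs, two hidden layers of four nodes, two outputs). This is a minimal sketch under stated assumptions, not the estimation code used in the study: the hidden activations are tanh, the outputs are linear, the loss is the squared error, the learning rate is arbitrary, and the data are synthetic.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases); weights[j][i] links input i to node j."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(layers, x):
    """Feed x through the network; return the activations of every layer."""
    activations = [list(x)]
    for idx, (weights, biases) in enumerate(layers):
        pre = [sum(w * a for w, a in zip(row, activations[-1])) + b
               for row, b in zip(weights, biases)]
        is_output = idx == len(layers) - 1
        activations.append(pre if is_output else [math.tanh(p) for p in pre])
    return activations


def train_step(layers, x, y, lr=0.05):
    """One gradient-descent update on 0.5 * sum((prediction - y)^2)."""
    acts = forward(layers, x)
    delta = [p - t for p, t in zip(acts[-1], y)]  # linear output layer
    for idx in reversed(range(len(layers))):
        weights, biases = layers[idx]
        inputs = acts[idx]
        # Error sent back to the previous layer, computed before the update.
        back = [sum(weights[j][i] * delta[j] for j in range(len(delta)))
                for i in range(len(inputs))]
        for j, d in enumerate(delta):
            for i, a in enumerate(inputs):
                weights[j][i] -= lr * d * a
            biases[j] -= lr * d
        if idx > 0:
            # tanh'(z) = 1 - tanh(z)^2, and inputs already hold tanh(z).
            delta = [b * (1.0 - a * a) for b, a in zip(back, inputs)]


def mse(layers, data):
    """Mean squared error over a list of (x, y) pairs."""
    return sum(sum((p - t) ** 2 for p, t in zip(forward(layers, x)[-1], y))
               for x, y in data) / len(data)


rng = random.Random(0)
sizes = [3, 4, 4, 2]  # inputs, two hidden layers, outputs
layers = [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]

# Synthetic data, used only for illustration (not the survey data).
data = []
for _ in range(40):
    x = [rng.uniform(-1, 1) for _ in range(3)]
    data.append((x, [x[0] * x[1], 0.5 * x[2]]))

loss_before = mse(layers, data)
for _ in range(200):
    for x, y in data:
        train_step(layers, x, y)
loss_after = mse(layers, data)
print(loss_before, loss_after)
```

Each call to `train_step` performs one forward pass followed by one backward pass, so the weights change after every observation; batch updates would be an equally valid design choice.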

2.4. Data analysis and cross-validation

As explained before, we define the first output variable ($y_{i1}$) as the propensity for adoption using a binary variable, which is equal to one for adopters of ISVs (61.44% of the sample) and zero for nonadopters (38.56% of the sample). The second output variable ($y_{i2}$) is the intensity of adoption, measured as the proportion of acreage allotted to improved seeds (area under improved seeds/area under all annual crops). The available literature guided the choice of variables included in the model as explanatory variables ($x_i$). These variables include socio-economic characteristics of farmers, availability of institutional support systems and infrastructure, farming system characteristics, and regional variables that represent the characteristics of the varieties adopted by farmers.

Table 1. Land allocation to improved and local varieties.

Varieties                 Percent       Area under ISVs      Total farm size      Land allocated
                          households    ha       St. Err     ha       St. Err     to ISVs (%)
Adopters (N = 505)
  Tegemeo                 18%           0.65     0.051       2.68     0.26        24%
  Pato                    14%           0.81     0.015       2.78     0.39        29%
  Macia                   22%           0.94     0.033       1.91     0.35        49%
  Wahi                    12%           0.75     0.056       2.77     0.87        47%
  Hakika                  10%           0.83     0.050       1.62     0.35        51%
  Sila                    10%           0.72     0.079       2.28     0.13        32%
  Mtama 1                 4%            0.65     0.064       2.37     0.50        27%
Nonadopters (N = 317)
  Langalanga              54%           0.85     0.099       2.82     0.46        NA
  Other local cultivars   91%           0.75     0.095       2.68     0.88        NA

Note: N is the number of households, St. Err is the standard error, and ISVs is improved sorghum varieties.

With unbalanced data, classification algorithms tend to favor the most frequent class. One solution is to create a more balanced subset of the original dataset, which helps classification algorithms avoid overfitting toward the majority class. From the 505 adopters, we created a subsample of 317 randomly selected adopters. The final sample included all 317 nonadopters and 317 adopters. We used 80 percent of this sample as a training dataset to estimate the "optimal" coefficients of the double-hurdle model and to learn the weights of the neural network structure (Fig. 2); the remaining 20 percent served as the test dataset. Since there are two response variables, we used different metrics to measure the performance and robustness of the two models. For predicting adopters and nonadopters, we use accuracy, precision, recall, and the F1 score as defined in Kohavi (1995) and Lim, Loh, and Shih (2000). For the intensity of adoption, we use the textbook root mean squared error (RMSE). Accuracy measures the proportion of correct predictions of adopters and nonadopters among all predictions made using the test dataset. The RMSE measures how far the predictions are from the real data (Ullah, Gabbett, Finch, & Br, 2014). Cross-validation is another crucial step in building a predictive model (Kohavi, 1995). The validation process aims to confirm that the model has learned the patterns in the real data and is not picking up too much noise, that is, that it does not overfit.
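The balancing step and the performance metrics described above can be sketched as follows. This is a hedged illustration using only the Python standard library: the function names are hypothetical, the labels in the usage lines are made up, and the actual analysis in the study was carried out in R.

```python
import math
import random


def undersample(majority, minority, rng):
    """Randomly keep as many majority cases as there are minority cases."""
    return rng.sample(majority, len(minority)) + list(minority)


def classification_metrics(actual, predicted):
    """Accuracy, precision, recall, and F1 for binary labels (1 = adopter)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def rmse(actual, predicted):
    """Root mean squared error between observed and predicted intensities."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))


# Household identifiers are placeholders: 505 adopters and 317 nonadopters.
rng = random.Random(42)
balanced = undersample(list(range(505)), list(range(505, 822)), rng)
rng.shuffle(balanced)
cut = int(0.8 * len(balanced))
train_ids, test_ids = balanced[:cut], balanced[cut:]

# Made-up labels, only to show the call.
print(classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Undersampling discards some adopters; oversampling the minority class would be an alternative design, but it is not what the text describes.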
We use the k-fold cross-validation method discussed in Arlot and Celisse (2010), which involves randomly splitting the original dataset into k subsets (also known as folds), training the model on all but one of the subsets (k-1), and then evaluating the model on the subset not used for training. This process is repeated k times, with a different subset reserved for evaluation each time. We conducted all analyses in the R Software Environment (R Core Team, 2019). The R Stuttgart Neural Network Simulator (RSNNS) package (Bergmeir & Benitez, 2012) was used to estimate a multi-output deep learning neural network with back-propagation. The rule of thumb for determining the number of neurons per layer is the geometric pyramid rule proposed by Masters (1993): a network with $n$ inputs and $m$ outputs would have $\sqrt{n \times m}$ neurons. Since we had 20 inputs and two outputs ($\sqrt{20 \times 2} \approx 6.3$), we estimate a neural network with three hidden layers, each with six neurons.

We then used the predicted propensity for adoption (Pr[y|x]) and the expected intensity of adoption (E[y|x]) from the neural network (using the original dataset), together with all explanatory variables, to cluster the households using t-SNE. Since there are different types of variables in the sample, we used the Gower distance explained in Gower (1971) to partition farmers into (dis)similarity groups. Partitioning around medoids (PAM), explained by Jin and Han (2016), was the clustering method of choice. The silhouette width discussed by Campello and Hruschka (2006) was the internal validation metric for selecting the optimal number of clusters. We then used the tsne package (Donaldson, 2016) to visualize the identified groups. For interested readers, van der Maaten and Hinton (2008) and Hinton and Roweis (2012) present the basics and theoretical foundation of the t-SNE algorithm. We also estimated the double-hurdle model using the mhurdle package (Croissant, Carlevaro, & Hoareau, 2016). The results from the two models are compared using the robustness and performance metrics described above.
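Three of the building blocks named in this subsection (the geometric pyramid rule, k-fold splitting, and the Gower distance for mixed variable types) can be sketched in plain Python. The function names, the example households, and the variable ranges are hypothetical; the study itself used R packages for these steps.

```python
import math
import random


def pyramid_rule(n_inputs, n_outputs):
    """Geometric pyramid rule: about sqrt(n * m) neurons in a hidden layer."""
    return round(math.sqrt(n_inputs * n_outputs))


def k_fold_indices(n, k, rng):
    """Yield (train, test) index lists; each case is held out exactly once."""
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def gower_distance(a, b, numeric_ranges):
    """Gower dissimilarity between two records with mixed variable types.

    numeric_ranges[v] is the sample range of variable v when it is numeric,
    or None when variable v is categorical (simple matching is used then).
    """
    total = 0.0
    for x, y, value_range in zip(a, b, numeric_ranges):
        if value_range is None:
            total += 0.0 if x == y else 1.0
        elif value_range > 0:
            total += abs(x - y) / value_range
    return total / len(numeric_ranges)


# Hypothetical households: (age of head, education category, predicted propensity).
household_a = (40, "primary", 0.8)
household_b = (60, "primary", 0.3)
print(pyramid_rule(20, 2))
print(gower_distance(household_a, household_b, (40.0, None, 1.0)))
```

A matrix of such pairwise distances is the kind of input that a medoid-based clustering step and a silhouette calculation would then operate on.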

3. Results and discussion

3.1. Description and summary statistics of variables

Summary statistics on the incidence and intensity of adoption are shown in Table 1. The total farm size is the cropland cultivated in the 2013/2014 farming season. All farmers grew a single variety rather than a combination of different varieties. Macia was the most widely adopted variety (22% of adopters), followed by Tegemeo (18% of adopters). Land allocation to ISVs was highest among Hakika adopters. While Hakika adopters accounted for 10% of the adopters, they allotted 51% of their total cropland to the Hakika variety. Macia and Hakika adopters have relatively smaller landholdings. These farmers cultivated about 1.91 and 1.62 ha of land, respectively, and allotted 0.94 (49%) and 0.83 (51%) hectares to ISVs. Other adopters had between about 2.28 (Sila adopters) and 2.78 (Pato adopters) hectares of cropland. Among them, the proportion of land allotted to ISVs varied from 24 percent (Tegemeo variety) to 47 percent (Wahi variety). There is more variability in the percentage of area allocated to the Sila, Mtama 1, Wahi, Hakika, and Tegemeo varieties than in the portion of land allocated to the Pato and Macia varieties. The majority of nonadopters (91%) cultivated local varieties other than the Langalanga landrace. There were no regional variations in the types of varieties adopted by farmers. Based on the available literature and data, we selected twenty explanatory variables to include in the two models, and Table 2 shows critical categorical variables. The P-values in the table are probability values for the parametric z-test. The p-values in Table 2

Table 2. Characteristics of the household head.

Variable                                     Proportion                 Total    P-value
                                             Adopters    Nonadopters
If the household head is resident
  No                                         0.004       0.019          8        0.078*
  Yes                                        0.996       0.981          814
Household type
  Female headed                              0.141       0.117          108      0.376
  Male headed                                0.859       0.883          714
Marital status of the household head
  Widow                                      0.091       0.117          84       0.035**
  Divorced                                   0.028       0.047          29       0.198
  Single                                     0.026       0.025          21       0.988
  Married                                    0.855       0.811          689      0.110
Education of household head
  Cannot read and write                      0.117       0.169          106      0.047**
  No school but can read and write           0.015       0.031          16       0.185
  Primary level education (1–3) years        0.038       0.044          31       0.828
  Primary level education (4–7) years        0.737       0.702          560      0.33
  More than seven years of education         0.094       0.054          61       0.058*
Gender of the main farmer
  Female                                     0.214       0.186          170      0.372
  Male                                       0.782       0.811          652
N                                            505         317            822

Note: The stars show statistical significance at the 1% (***), 5% (**), and 10% (*) levels. Total is the sample size in each group, and the P-value is the probability value from the parametric z-test. Totals may not sum to 822 due to missing data.

Table 3 Means and standard deviations of important household-level variables.

| Variables | Adopters | Nonadopters | P-Value |
|---|---|---|---|
| Weighted labor equivalent scale | 17.617 (6.985) | 17.779 (6.638) | 0.742 (0.311) |
| Unweighted household size | 6.382 (2.208) | 6.416 (2.235) | 0.831 (0.551) |
| Dependent ratio | 0.749 (0.628) | 0.711 (0.543) | 0.372 (0.078*) |
| Age of household head | 47.54 (14.28) | 47.99 (15.03) | 0.667 (0.696) |
| Geometric mean age of all adults | 36.38 (10.43) | 37.19 (10.34) | 0.277 (0.469) |
| Weighted education variable for all adults | 8.224 (3.855) | 7.656 (3.892) | 0.041** (0.541) |

Note: The stars indicate statistical significance at the 1% (***), 5% (**), and 10% (*) levels. The numbers in brackets are standard deviations. The labor equivalency was estimated as $(P_1 P_2 R + \sum_i P_{2i} R_i \, 0.7 \sum_i M_i + 0.3 \sum_i C_i)$, where $P_1$ is a dummy variable indicating if the head of the household was available for agricultural activities, and $P_2$ is an ordinal variable for the main occupation of the household member, including the household head (3 = farming, 2 = other related agricultural activities, 1 = other activities). The variable $R$ is a dummy variable indicating if the household member, including the household head, is resident, and $M$ is a dummy variable indicating if the household member is a close relative (18 and older), such as wife, son, daughter, etc., who is dependent on the household head. The variable $N$ is a dummy variable indicating if the member is not a close relative (18 and older), such as son-in-law, niece, etc., who is partially dependent on the household head, and $C$ is a dummy variable indicating if the household member was a child (10–17 years old). The household-level education/literacy index was also estimated as $(\sum_{ij} H_{ij} AE_i / H_{ij})$, where $AE_i$ is the number of adults in each education group and $H_{ij}$ are the corresponding weights for each education group. The weights were, respectively, one, two, three, four, and five, representing, respectively, cannot read or write; can read and write but no primary education; four years; seven years; and more than seven years of primary education.

that are less than 0.1 provide enough evidence to reject the null hypothesis that the proportions for adopters and nonadopters are equal. In eight of the 822 sample households, the head of the household was not living in the home, and the difference in this proportion between the two groups was statistically significant (p < 0.1). Results in Table 2 also show that household types (female-headed versus male-headed) were similar across the two sub-groups.
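As a rough illustration of the kind of test behind the P-values in Table 2, the sketch below implements a pooled two-proportion z-test. The function name and the counts in the example are hypothetical and are not the survey data; the exact variant of the z-test used for Table 2 is not specified in the text, so this should not be expected to reproduce the reported values.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Pooled two-sided z-test for equality of two proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis of equal proportions.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 60 of 500 households in one group, 40 of 500 in the other.
z, p = two_proportion_z_test(60, 500, 40, 500)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A p-value below the chosen significance level (0.1 in the text) would lead to rejecting the hypothesis of equal proportions.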

Regarding marital status and education level, widowed (p < 0.05) and illiterate (p < 0.05) heads of household were more likely to be found in the nonadopter group. Conversely, household heads with more than seven years of education were more likely to be found in the adopter sample (p < 0.1). Widowed and illiterate household heads are likely to be poor and, therefore, more risk-averse. In both groups, most heads of households had between 4 and 7 years of education. Several studies, including Croppenstedt, Demeke, and Meschi (2003), suggest that adoption rates increase with the education of the farmer, especially if the technology is sophisticated and learning enhances the skills to use the technology (Marra, Pannell, & Ghadimb, 2003). Another household-level characteristic that affects adoption is the gender of the farmer, which is linked with other factors that indirectly influence adoption behavior, especially gender-linked differences in access to critical inputs. The proportions of female farmers among adopters and nonadopters were statistically similar.

Table 3 shows the mean and standard deviation (in brackets) of continuous explanatory variables. P-values are from the z-test that compares the means of the two groups. The results in Table 3 show that, except for the dependency ratio and the weighted education variable, the differences in means and standard deviations are not statistically significant. While the mean dependency ratios are equal, the variable is more spread among adopters. For the weighted education variable, the level of spread was identical, but the mean was higher among adopters.

Labor availability is a crucial factor in technology adoption, especially when labor supply is sparse and it is challenging to hire extra labor. Otherwise, the adoption of labor-intensive technology is likely if labor is abundant and cheap, or if opportunities for household members to seek non-farm employment are artificially depressed.
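The comparison of means reported in Table 3 can be sketched with a two-sample z-test, as below. The function name is illustrative, and the group sizes (505 adopters and 317 nonadopters) are an assumption taken from the N row of Table 2; missing data may have changed the effective sample sizes. With the means and standard deviations of the weighted education variable, the p-value comes out close to the 0.041 reported in the table.

```python
import math


def two_sample_z_test(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sided z-test for a difference in means (unpooled variances)."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = (mean_a - mean_b) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Weighted education variable for all adults (Table 3).
# Group sizes are assumed from Table 2 and may differ from those actually used.
z, p = two_sample_z_test(8.224, 3.855, 505, 7.656, 3.892, 317)
print(f"z = {z:.2f}, p = {p:.3f}")
```

The same function applied to the age of the household head (47.54 versus 47.99) gives a large p-value, in line with the nonsignificant difference reported for that variable.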
The dependency ratio in Table 3 is the ratio of the number of dependents (aged 0–14 and over the age of 65) to the working-age population in the household (aged 15–64). Most studies use the number of adults in the household as a proxy for labor availability. However, the number of adults does not show whether an individual took part in agricultural activities. In developing countries, such as Tanzania, child labor is also an essential input in agricultural production. To estimate available labor, we considered whether the household member was resident, the relationship of the member to the household head, age, whether the member was in school, and whether the member was available for farm activities (full-time or part-time) in the last two months. Therefore, the weighted labor equivalent scale in Table 3 adjusts for household members' age and availability for farm activities, and for economies of scale as suggested by the modified Oxford scale (OECD, 2008). The scale assigns a value of 1 to the household head, 0.5 to each added adult member, and 0.3 to each child (10 years and older). Adjusted labor availability was similar among the adopter and nonadopter groups.

The age of the household head in Table 3 captures experience in agricultural production. Farmers may gain management experience over time and learn more about input application and expected returns. Although they may have more resources to invest in new agricultural technologies, older farmers tend to be conservative and reluctant to adopt due to risk aversion. Although the age of the household head may have a far-reaching influence on both skills and experience in agricultural production, joint decision-making during the adoption process is also common in many households. In Table 3, the geometric mean age of close relatives in the household, therefore, presents a better measure of the accumulated skills and experience of all adult farmers in the household. Also, due to the joint decision-making process, adults in the family manage different activities associated with technology adoption; see, for example, Meijer et al. (2015).

The weighted education variable for all adults in Table 3 is the years of education weighted, respectively, by the proportions of household members who cannot read and write, household members who did not go to school but can read and write, household members who have up to three years of primary education, household members who have up to seven years of primary education, and household members who have more than seven years of primary school.

Table 4a Other covariate variables.

| Variables | Adopters | Nonadopters | P-Value |
|---|---|---|---|
| Knowledge on improved seeds (years) | 3.97 (6.15) | 2.34 (7.84) | 0.0001 *** |
| Total household wealth (Tshs) | 933,069.29 (171,829.62) | 819,294.96 (129,425.15) | 0.2812 |
| Quality of government extension services | 6.25 (4.03) | 4.70 (3.99) | 0.0001 *** |
| Quality of extension services from NGOs | 1.28 (2.82) | 1.05 (2.37) | 0.2138 |
| Participation in market activities (%) | 18.13 | 10.46 | 0.4631 |
| Participation in credit market (%) | 7.54 | 4.01 | 0.4151 |

Note: The stars indicate statistical significance at the 1% (***), 5% (**), and 10% (*) levels. The numbers in brackets are standard deviations.

Table 4b Distribution of sample households.

| Variable | Number of adopters | Number of nonadopters | % Adopters |
|---|---|---|---|
| Minimum interaction with research | 87 | 58 | 60.00 |
| Some interaction with research activities | 358 | 199 | 64.27 |
| High interaction with research activities | 60 | 60 | 50.00 |
| Intermediate potential for sorghum production | 207 | 143 | 59.14 |
| High potential for sorghum production | 298 | 174 | 63.14 |
| Dodoma Region | 62 | 40 | 60.78 |
| Kilimanjaro Region | 31 | 26 | 54.39 |
| Manyara Region | 63 | 47 | 57.27 |
| Shinyanga Region | 63 | 55 | 53.39 |
| Singida Region | 286 | 149 | 65.75 |

Note: To estimate the quality of extension services, the farmer answered three questions. The first question asked the farmer if an extension agent from either the private sector or the government had visited. The answer was yes or no. In the follow-up questions, the farmer reported the frequency of visits and rated the service on a scale of good (3), average (2), and bad (1). The extension service variable is, therefore, the product of the farmer's rating and the frequency of visits.

Table 5 Determinants of the probability and intensity of adoption.

| Variable | Estimate | Pr(>\|t\|) | |
|---|---|---|---|
| Probit (decision) model results | | | |
| Intercept | 7.5334 | 0.0001 | *** |
| Dummy variable for the Singida Region | 0.3129 | 0.2311 | |
| Dummy variable for the Kilimanjaro Region | 0.9562 | 0.0046 | ** |
| Dummy variable for the Manyara Region | 1.1326 | 0.0016 | ** |
| Dummy variable for the Shinyanga Region | 0.0212 | 0.9417 | |
| Dummy variable for the gender of household head | 2.0662 | 0.0001 | *** |
| Dummy variable for the marital status of the household head | 1.2597 | 0.0001 | *** |
| Available labor equivalency in man-days | 0.0187 | 0.7943 | |
| Dependent ratio | 0.0031 | 0.9791 | |
| The weighted mean age of adults in years | 0.0146 | 0.0622 | * |
| The weighted mean education for all adults in years | 0.0386 | 0.1408 | |
| Education of household head in years | 0.1655 | 0.0135 | ** |
| Ranking on the quality of government extension services | 0.0357 | 0.0224 | ** |
| Ranking on the quality of NGOs extension services | 0.0208 | 0.3813 | |
| Dummy variable if the farmer has access to credit | 0.1597 | 0.4158 | |
| Dummy variable if farmer participates in crop market | 0.2321 | 0.0663 | * |
| Log of total wealth | 0.4856 | 0.0001 | *** |
| Farmer is in a village where research activities are high | 0.5976 | 0.0463 | ** |
| Farmer is in a village where research activities are low | 0.6400 | 0.0245 | ** |
| High potential for sorghum production | 0.1125 | 0.4908 | |
| An index of the quality of the built environment | 0.5374 | 0.0434 | ** |
| Truncated (intensity) model results | | | |
| Intercept | 3.1393 | 0.0022 | *** |
| Gender of household head | 1.1867 | 0.0002 | *** |
| Marital status of the household head | 0.9677 | 0.0004 | *** |
| Available labor equivalency in man-days | 0.0287 | 0.5375 | |
| Dependent ratio | 0.2160 | 0.0053 | *** |
| The weighted mean age of adults in years | 0.0095 | 0.0494 | ** |
| The weighted mean education for all adults in years | 0.0088 | 0.6160 | |
| Education of household head in years | 0.0144 | 0.7105 | |
| An index of the built environment at the village level | 0.1820 | 0.0121 | ** |
| Standard error of the mean | 0.6483 | 0.0001 | *** |
| Correlation coefficient | 0.7184 | 0.0001 | *** |
| Test of normality for distributed errors | 12.2680 | 0.4478 | |
| Log-Likelihood ratio test | 28.3002 | 0.0158 | ** |

Statistical significance codes: *** significant at the 1% level, ** significant at the 5% level, and * significant at the 10% level.

Tables 4a and 4b present the summary results related to the availability of institutional support systems and other explanatory variables. Log of total wealth status removes capital constraint during the


adoption processes. Wealth is the sum of tangible assets, the value of livestock, and cash income. Tangible assets are the value of consumer durables such as radio, television, telephone, refrigerator, bicycle, oxcart, etc. Cash income is from annual sales of improved sorghum varieties, local sorghum varieties, other crops, and all kinds of livestock, plus income from family businesses, revenue from formal and informal employment, and other sources as recalled by the farmer. Extension and advisory services enhance farmers' knowledge and skills, removing information constraints and increasing the speed of technology transfer. The quality of extension services is the weighted mean of advisory services received by farmers. In Table 4a, the market participation and credit availability variables are, respectively, the percentages of farmers who sold some of the sorghum produced and who used credit to buy inputs such as seeds and fertilizer for sorghum production. Although the differences between groups are not statistically significant, market participation is low for both adopters and nonadopters, which emphasizes the subsistence nature of sorghum production in the study area. Similarly, very few farmers use credit, which also emphasizes the capital constraint on agricultural inputs for sorghum producers in the study area. In this study, access to credit or credit use combines formal loans from a bank or microfinance institution and credit from informal sources, such as friends and relatives, to buy agricultural inputs.

Table 4b shows the distribution of households living in areas with low, intermediate, and high interaction with research activities. Research activities include verification and on-farm trials, demonstration plots, or farmer-to-farmer field days. High research interaction means the farmers lived in a village in which ICRISAT, in collaboration with DRD, conducted leading research and extension activities, and the respondent took part in these activities. Intermediate research interaction implies that farmers lived in a village, or near a village, with an elevated level of research activities, and the respondent was aware of these activities. The indicator variable for low research interaction means that the farmers were unaware of these research activities.

3.2. Models diagnoses and typology of adopters and nonadopters

Since the double-hurdle model was just for comparison purposes, we do not discuss the regression results in detail and instead focus on model performance. The results of the dependent double-hurdle model using the testing dataset are in Table 5, and the distribution of estimated coefficients through k-fold validation is in Appendix 1. The last part of Table 5 shows the model fit test results. The normality test in Table 5 examines whether the distribution of the errors is normal. Because the estimated p-value is 0.4478, which is higher than the significance level of 0.01, we fail to reject the null hypothesis of normally distributed errors. The likelihood ratio test compares the dependent model with the standard double-hurdle model, in which the adoption decision is independent of the proportion of land allotted to improved seeds. The test results in Table 5 and the estimated correlation coefficient both support the assumption of dependency between the adoption decision and the allocation of land to improved sorghum varieties.

The first part of Table 5 shows the estimated coefficients and probability values used for testing the level of significance for both the decision and the intensity of adoption. The decision model was estimated using the probit model, and the Dodoma Region is the control for the regional dummy variables. A negative sign shows that farmers in the Dodoma Region are more likely to adopt ISVs than farmers in other regions, especially the Kilimanjaro and Manyara Regions. The two regions are newcomers in sorghum production due to increased shortage and variability of rainfall and the high price of sorghum due to increased demand (Kabyemela, 2015; Kombe, 2012). Traditionally, the remaining three regions, that is, Singida, Shinyanga, and Dodoma, are leading producers of sorghum in Tanzania, and the propensity for adoption may be the same.

All statistically significant variables in the decision model have the expected signs, as discussed in the adoption literature. A positive sign shows that the variable increases the propensity for adopting ISVs, and a negative sign indicates that the variable is associated with an increased probability of nonadoption. Although not statistically significant, the weighted mean education for all adults has an unexpected negative sign. There is a consensus that an increase in the overall knowledge of household members should have a positive effect on agricultural technology adoption. Since the education of the household head is statistically significant, this may be a sign that the head of the household dominates the adoption decision-making process, as discussed in Sidibe (2005) and Doss (2013).

All statistically significant variables in the intensity model also have the expected signs. Variables with positive signs support the hypothesis that these variables are associated with increased land under ISVs, and negative signs show that these variables correlate with low acreage under ISVs. Notice that while the dependency ratio was not statistically significant in the decision model, it was negative and highly statistically significant in the intensity model. Sheikh, Rehman, and Yates (2003) argue that the dependency ratio is inversely related to technology adoption due to higher risk aversion among households with a higher dependency ratio. The variable standing for the weighted mean age of all adults in the household had a negative sign and was statistically significant.
Since the lower and upper values of the variable represent households with young and old members, the results imply that younger families are more likely to allocate land to ISVs than households dominated by older members. For example, Doss (2006), David (2005), and Foster and Rosenzweig (2010) show that while families dominated by older adults have more experience in agricultural production and have access to more resources, young farmers are more innovative and more likely to take risks.

Fig. 3 shows the estimated relative importance of input variables in the neural networks model with backpropagation. Appendix 2 presents similar values calculated through k-fold validation. The procedure for calculating the relative importance of explanatory variables in the neural network model with backpropagation is explained in David, Rumelhart, Hinton, and Williams (1986). The measure identifies the relative importance of explanatory variables by deconstructing the model weights and determining the strength of association between the dependent and explanatory variables. The connecting weights are tallied for each input node and scaled relative to all other inputs. Note that the weights that connect variables in a neural network (see Fig. 1 above) are partially analogous to parameter coefficients in a standard regression model. The weights dictate the relative influence of information processed in the network, such that they suppress input variables that are not relevant in their correlation with the response variables. The results in Fig. 3 indicate the relative importance of each input variable for both the propensity and intensity variables. These values are scaled from -1 to 1 relative to all other inputs and indicate the specific importance of each variable when predicting the incidence and intensity of adoption.
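One common way to tally connection weights for each input node is to multiply the weight matrices layer by layer and rescale the result so that the largest absolute value equals one. The sketch below follows that idea; it is an assumption about the general technique, not the authors' exact procedure, and the function name and toy weights are hypothetical. Bias terms are ignored, and a single output node is assumed.

```python
def connection_weight_importance(weight_matrices):
    """Relative importance of inputs from products of connection weights.

    weight_matrices: list of matrices (lists of rows). The first matrix has
    one row per input node; the last matrix has a single output column.
    Returns one value per input, scaled to lie between -1 and 1.
    """
    product = weight_matrices[0]
    for matrix in weight_matrices[1:]:
        # Plain matrix multiplication: (inputs x hidden) times (hidden x next).
        product = [
            [sum(row[k] * matrix[k][j] for k in range(len(matrix)))
             for j in range(len(matrix[0]))]
            for row in product
        ]
    raw = [row[0] for row in product]  # single output node assumed
    largest = max(abs(value) for value in raw)
    if largest == 0:
        return [0.0 for _ in raw]
    return [value / largest for value in raw]


# Toy network: 3 inputs, 2 hidden nodes, 1 output (weights are made up).
input_to_hidden = [[1.0, -2.0], [0.5, 0.5], [-1.0, 0.0]]
hidden_to_output = [[2.0], [1.0]]
print(connection_weight_importance([input_to_hidden, hidden_to_output]))
```

In this toy case the first input has offsetting paths and ends up with zero importance, while the third input carries the strongest negative association with the output.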
The bar plot tells us that the variables log of total wealth (LTweath), education of household head (EDUHHH), ranking on the quality of government extension services (GOVGR), marital status of the household head (MSTATUS), and an index showing the quality of the built environment at the village level (EFF3) have the most robust positive relationships with the response variables. Similarly, the five variables with the most substantial negative correlation with the response variables are a dummy variable representing the Shinyanga Region, the dependency ratio (DERatio), the gender of the household head (GENDER), and dummy variables representing the Kilimanjaro (KILI) and Manyara Regions. Note that variables with relative importance close to zero do not have any substantial influence on the response variables but will most likely have some marginal effect on them. However, based on the results, the predictive power of variables that are close to zero is irrelevant in the context of the other explanatory variables. The estimated coefficients for the geometric mean age of all adults in the household (Agemean) and a variable indicating if the farmer participated in the credit market (CREDIT) have positive signs in the double-hurdle model and negative signs in the deep learning neural networks model. For the propensity-of-adoption model, all variables in the double-hurdle model and the deep learning neural network have the same direction in terms of marginal impact.

Fig. 3. The importance of variables as estimated from the test dataset.

The results in Fig. 4 present the distribution of performance and robustness metrics of the deep learning neural networks with backpropagation and the dependent double-hurdle model. As discussed above, precision, recall, and F1 are the robustness and performance metrics for predicting the propensity for adoption, and the RMSE is the metric for predicting the intensity of adoption. All metrics indicate that the deep learning neural networks model outperforms the double-hurdle model in predicting the incidence and the intensity of adoption. While the estimated precision from the double-hurdle model ranged from 0.46 to 0.62, with a mean of 0.54, the comparable value from the neural network ranged from 0.72 to 0.91, with a mean of 0.82. In terms of this metric, the worst-case scenario of the neural networks model outperformed the best results of the double-hurdle model by 10 percentage points. The deep learning neural networks with backpropagation also outperformed the dependent double-hurdle model when using the recall and F1 metrics.

The last part of Fig. 4 shows the distribution of the RMSE for both models. The estimation of the RMSE involves averaging the squared errors, which gives a relatively high weight to large values. Therefore, the RMSE statistic is a negatively oriented score, meaning that lower values are better, and it is more useful when large errors are particularly undesirable. The RMSE from the double-hurdle model ranged from 21.76 to 32.73, with a mean of 27.43. The RMSE from the neural networks model ranged from 8.29 to 30.70, with a mean of 16.45. Although there is an overlap between the two models, the third quartiles were, respectively, 27.90 and 18.64, and the 90th percentiles were, respectively, 28.96 and 21.74 for the double-hurdle model and the neural networks model. The skewness of the estimated RMSE from k-fold validation was, respectively, 0.67 and 0.11 for the double-hurdle model and the neural networks model. The results from the neural networks model were relatively symmetric compared to the results of the double-hurdle model. Sakthivel and Rajitha (2017) found comparable results: the neural networks model outperformed the double-hurdle model and provided a better fit in terms of RMSE when modeling claims frequency. We, therefore, used the neural networks model to estimate the propensity for adoption and the intensity of adoption included in the t-SNE model to profile adopters and nonadopters.

The first part of Fig. 5 shows the relationship between the estimated average silhouette distance and the proposed number of optimal clusters for both adopters and nonadopters. When there are three clusters, the mean silhouette distance is 0.3 for the entire dataset. However, the shape of the graph does not taper off after three clusters, implying that many farmers are outside or on the boundary of


Fig. 4. Distribution of robustness and performance metrics from the k-fold validation.
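The metrics summarized in Fig. 4 are commonly computed as in the minimal sketch below. The function names and the toy labels and values are illustrative assumptions, not the authors' code or data; adoption is coded as 1 and nonadoption as 0.

```python
import math


def classification_metrics(actual, predicted):
    """Precision, recall, and F1 for a binary outcome coded as 0/1."""
    true_pos = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    false_pos = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    false_neg = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def rmse(actual, predicted):
    """Root mean squared error; squaring gives large errors more weight."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy adoption decisions (1 = adopter) and toy land shares in percent.
print(classification_metrics([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
print(rmse([10.0, 20.0], [13.0, 16.0]))
```

Computing these values on each held-out fold and collecting them across folds would give distributions of the kind plotted in the figure.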

Fig. 5. Proposed number of clusters and position of the nine clusters.
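The average silhouette distance plotted in the first part of Fig. 5 can be illustrated with the sketch below, which computes the mean silhouette width for a labelled set of points using Euclidean distance. The function name, the toy coordinates, and the labels are hypothetical; in practice the computation would be repeated for each candidate number of clusters, and at least two clusters are required.

```python
import math


def mean_silhouette(points, labels):
    """Average silhouette width (requires at least two distinct labels)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    clusters = sorted(set(labels))
    scores = []
    for i, (point, label) in enumerate(zip(points, labels)):
        # a: mean distance to the other members of the point's own cluster.
        own = [dist(point, q) for j, (q, l) in enumerate(zip(points, labels))
               if l == label and j != i]
        if not own:
            scores.append(0.0)  # singleton cluster: silhouette set to zero
            continue
        a = sum(own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(point, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in clusters if other != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data: two tight, well-separated groups of households in two dimensions.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(mean_silhouette(points, [0, 0, 1, 1]))  # close to 1: clear separation
print(mean_silhouette(points, [0, 1, 0, 1]))  # negative: poor assignment
```

Values near 1 indicate compact, well-separated clusters, while values near zero indicate points lying on cluster boundaries.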

the three selected clusters. Tapering occurs when the number of clusters equals 16. Other potential numbers of clusters are 7, 9, and 14. We expected these results, given the heterogeneous nature of small-scale farms. For example, using cluster analysis to study family farms in Switzerland, Hoop et al. (2014) estimated a mean silhouette distance of 0.24 for 12 optimal clusters. Gorgulu (2010) used similar techniques to classify dairy animal performance and calculated average silhouette distances that were between 0.35 and 0.52, and the average silhouette distance was 0.203 for 12 clusters.

We used visual inspection to find the number of clusters with the best results after plotting the clusters using t-SNE and the Barnes-Hut algorithm to approximate the distance between households. The algorithm reduced the number of pairwise distances and grouped the farmers into nine clusters, as shown in the second part of Fig. 5. For these nine clusters, there were few overlaps, only a few farmers were outside the ellipse of each cluster, and each cluster had a reasonable sample size. Cluster 1 had 31 adopters and 10 nonadopters (5% of the sample), and cluster 2 had 31 adopters and 30 nonadopters (7.40% of the sample). Clusters 3, 4, and 5 had, respectively, 31, 63, and 34 adopters and 26, 47, and 25 nonadopters (or 7.00%, 13.40%, and 7.1% of the sample). Clusters 6, 7, 8, and 9 had, respectively, 29, 70, 78, and 138 adopters and 30, 51, 39, and 59 nonadopters (or 7.10%, 14.70%, 14.20%, and 24.00% of the sample).

The radar (spider) charts in Fig. 6 present the summary characteristics of the identified farmers' groups. The abbreviations in the charts denote the gender of the household head (gend), intensity of adoption (inte), propensity for adoption (prop), index of the village-level built environment (buil), high (hrds), intermediate (irds), and low (lrds) spillover from research activities, log of total wealth (wealth), and crop (crop) and credit market (credit) participation. Other variables represent the quality of non-governmental organization (engo) and government extension services (egov), education of household head (eduh), weighted education of all adults in the household (wedu), weighted mean age of all adults in the household (wmage), dependent ratio (ratio), weighted labor equivalent scale (labor), and marital status of the household head (msts). Additional variables indicate if the farmers live in a village with high (higm) or intermediate production potential (medm).

Since all continuous variables were normalized to range between zero and one, and all dummy variables take the values zero and one, the center of the wheel (or the x-axis) represents the minimum value, which is zero, and the middle circle and the outer circle represent the average and the maximum values relative to the sample. The scores on each variable radiate outward on spokes from a central zero hub, and the edge of the wheel marks the maximum values from the sample. On each spoke, the further towards the edge of the wheel a variable reaches, the higher the value of the variable. Since the objective is determining the variables that are lacking in each group, a non-limiting variable will be close to the edge of the wheel or will stand out relative to other variables. The comparison is among adopters and nonadopters; the charts, therefore, represent varying characteristics and what is lacking for the cluster to improve both the propensity and intensity of adoption. The variables that are similar within the cluster are therefore not plotted.

Fig. 6. Farm and farmer characteristics of adopters and nonadopters.

There were 41 households in cluster 1, and all were from the Dodoma Region and in villages with high and medium interaction with research activities. Therefore, the plot for this cluster has neither regional variables nor indicator variables for research interactions. Reading the chart anticlockwise (as for all other plots thereafter), the predicted propensity for and intensity of adoption were higher among adopters than among nonadopters and were also above the sample averages. The adopters lived in villages with a higher built environment index and were relatively wealthy, and the education level of the household head and the literacy index within the household were relatively higher than among nonadopters and in the sample. Labor supply was relatively low. The adopters had a higher dependency ratio than nonadopters, but it was still below the sample average. The cluster constitutes households with young farmers with relatively low education and limited access to extension services from both government and nongovernment organizations. Credit availability and participation in crop markets were also low. The most limiting factors to adoption were income and education of the household head.


Improvement in the institutional support systems that enhance the quality of extension services, increase the availability of credit, and promote participation in crop marketing services will scale up the adoption of ISVs among these households.

The 61 farmers in cluster 2 were also from the Dodoma Region, but from villages with limited contact with research activities and in a farming system with medium production potential, though with relatively higher extension services from the government. The distributions of all other characteristics were similar to those in cluster 1, except that cluster 2 includes all female adopters in the Dodoma Region. Scaling up the adoption process in the Dodoma Region requires an immediate increase in research activities such as on-farm trials, demonstration plots, and farmers' field days that focus on increasing awareness of ISVs, especially among female farmers. Improvement in the institutional support system will have a long-term impact on increasing both the propensity and intensity of adoption.

Fifty-seven households in cluster 3 lived in villages with high research activities and were from areas with medium production potential in the Kilimanjaro Region. There were significant differences in the predicted propensity for and intensity of adoption between adopters and nonadopters. Adopters were mainly from villages with a high built environment index, and the indicator of wealth was almost the same among adopters and nonadopters. Although market participation was relatively low compared to the total sample, participation was higher among adopters than among nonadopters. Credit availability and the quality of extension services from nongovernmental organizations were both low and relatively equal across the two groups. Extension service contacts were significantly more extensive among adopters and above the sample average.
Except for marital status, for which the nonadopter group had a higher proportion of female farmers, all other variables were almost similar across the two groups. Improvement in the extension services among nonadopters, including farmer training activities and the removal of credit and marketing constraints, has the potential to increase the adoption rate in the Kilimanjaro Region.

Cluster 4 constituted 110 households from the Manyara Region. They lived in villages with both high and medium spillover from research activities, and the farming system has medium potential for sorghum production. While the predicted propensity for and intensity of adoption were significantly higher among adopters, the built environment was similar. Adopters were slightly wealthier than nonadopters, and credit availability was also slightly higher among adopters. Market participation was low and at a similar level. Extension contact from nongovernmental organizations was higher among nonadopters. Adopters in this group received significantly higher-quality extension services from the government when compared to nonadopters and the sample. While the education level of the head of the household was higher among adopters, the distribution of the education index was equal for the two groups. Both groups constituted young farmers with equal dependent ratio and marital status. Labor availability was slightly higher among nonadopters, and most female farmers were in the nonadopter group. Targeted extension services are more likely to increase adoption and produce impactful results.

Cluster 5 constituted 59 households from the Shinyanga Region with elevated levels of spillover from research activities and medium potential for sorghum production. Although both adopters and nonadopters experienced a similar built environment, the predicted value of propensity for and intensity of adoption in cluster 5 was above the sample average and higher among adopters.
Adopters were wealthier than nonadopters, and market participation, credit availability, and extension services from nongovernment organizations were meager for both groups. The quality of extension services received from the government, the education level of the household head, and the household literacy index were also higher among adopters, and nonadopters were relatively older than adopters. Dependency ratio and labor availability were slightly higher among adopters, and most unmarried farmers were in the adopter group. Apart from being inclusive, research and extension services that target older farmers (who usually have lower education) will have an immediate impact on accelerating the adoption of ISVs among farmers in this group.

In Cluster 6, there were 59 households, all from the Shinyanga Region, who lived in villages with a low level of spillover from research activities, and the farming system had medium sorghum production potential. The predicted values of propensity for and intensity of adoption were below the sample average but higher among adopters. Although adopters and nonadopters were equally wealthy, adopters lived in villages with an elevated built environment that guaranteed improved agricultural services. Market participation, credit availability, and extension services from nongovernment organizations were also low for both groups. Although below the sample average, the quality of extension services received from the government, the education level of the household head, and the household literacy index were higher among adopters. The age distribution was similar for the two groups. In this cluster, while the dependency ratio was slightly higher among nonadopters, labor availability was higher among adopters, and most single-parent households belonged to the adopter group. This group tends to have smaller landholdings. Research and extension services and institutional support systems that target female farmers are more likely to scale up the adoption of ISVs in this cluster.

About 121 farmers from the Manyara Region formed Cluster 7.
They were from villages with elevated levels of spillover from research activities, and the farming system had a higher potential for sorghum production. The predicted values of propensity for and intensity of adoption were somewhat similar; however, nonadopters lived in villages with a slightly higher built environment. While the adopters were relatively wealthier and most single-parent households belonged to the adopter group, all other variables had similar values and were all below the sample averages. Adopters and nonadopters have equal potential in scaling up the adoption process. This cluster needs an integrated approach and coordinated efforts to increase both the incidence and intensity of adoption. Despite the high potential for sorghum production and high research activities, the low quality of extension services and the unavailability of important institutional support systems are dragging down the adoption process.

Cluster 8 has 117 farmers from villages with elevated levels of spillover from research activities and farming systems with high and intermediate sorghum production potential in the Singida Region. The propensity for and intensity of adoption were higher among adopters but below the sample average. The built environment was slightly higher among nonadopters, and adopters were wealthier than nonadopters. Market participation was equally high among adopters and nonadopters. The availability of credit and extension services from nongovernmental organizations was still low. The availability and quality of government extension services, education of the household head, and the literacy index at the household level were higher among adopters but below the sample averages. The cluster included young farmers with a low dependency ratio and low labor availability. Poverty and weak institutional support systems are the primary constraints limiting the adoption process.
Research and extension systems could take advantage of the current market activities and increase the income of the farmer.

All 197 households that formed Cluster 9 were from the Singida Region. The villages experienced low levels of spillover from research activities, and the farming system had an intermediate potential for sorghum production. For adopters, the predicted propensity for and intensity of adoption were just above the sample average; for nonadopters, they were below the sample average. The village-level built environment was higher among adopters, and adopters were wealthier than nonadopters. The indicator variable for participation in the crop market was similar across the two groups, and the availability of credit and quality extension services from the government were still low. The availability of quality extension services was higher among adopters but below the sample average. The distribution of all other variables was relatively the same among adopters and nonadopters. The cluster members were young farmers with a low dependency ratio. Credit availability, limited income, and research and extension services appear to be the main constraints, especially among nonadopters.

Credit availability, participation in crop markets, and the availability of extension services from nongovernment organizations were deficient in all nine clusters. Specific government or public policies are essential in ameliorating the credit market. The subsistence nature of sorghum producers accentuates limited market participation. Farmers must produce surpluses to participate in the crop market. Increased production through the adoption of ISVs will create new opportunities for value addition and increase the income and wealth of farmers. Sample households were in villages with high, intermediate, or low spillover from research activities and high or medium potential for sorghum production.
The variables that significantly differentiated households across clusters were wealth, availability of government extension services, education level of the household head, the household literacy index, and the gender distribution of farmers in the cluster. These results emphasize the need for developing targeted, gender-sensitive farmer support systems and underscore the principle of learning by doing. Requirements to scale up the adoption process were generally region-specific. Localizing research and extension activities is necessary for meeting real demand and having a sustainable impact on the lives of people. Because a significant percentage of research and extension work has always been directed toward solving localized agricultural problems, the results support expanding the role of regional and district officials in developing regionally focused research and extension activities.
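The households compared across clusters are described by characteristics of mixed type: numeric indices such as wealth and literacy, and categorical attributes such as gender. As a rough illustration of how a dissimilarity between two such households can be computed, the sketch below implements the Gower (1971) coefficient cited in this paper. It is a minimal sketch and not the authors' implementation; the variable names, values, and ranges are hypothetical and are not taken from the survey data.

```python
from typing import Dict, Union

Value = Union[float, str]


def gower_distance(
    a: Dict[str, Value],
    b: Dict[str, Value],
    numeric_ranges: Dict[str, float],
) -> float:
    """Gower dissimilarity between two records that share the same keys.

    Numeric variables contribute |a - b| / range; every other variable is
    treated as categorical and contributes 0 on a match and 1 otherwise.
    The result is the unweighted mean of the per-variable contributions.
    """
    total = 0.0
    for key in a:
        if key in numeric_ranges:
            span = numeric_ranges[key]
            total += abs(a[key] - b[key]) / span if span > 0 else 0.0
        else:
            total += 0.0 if a[key] == b[key] else 1.0
    return total / len(a)


# Hypothetical households (illustrative values only, not survey data).
household_a = {"wealth_index": 0.2, "literacy_index": 0.5, "gender": "female"}
household_b = {"wealth_index": 0.6, "literacy_index": 0.5, "gender": "male"}

# Assumed observed range (max minus min) of each numeric variable.
ranges = {"wealth_index": 1.0, "literacy_index": 1.0}

d = gower_distance(household_a, household_b, ranges)
print(round(d, 4))
```

Because each variable contributes a value between 0 and 1, no single characteristic dominates the distance simply because of its measurement scale, which is the usual reason for preferring this coefficient with mixed household data.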

4. Summary and implications

Applying deep learning neural networks with backpropagation allowed us to efficiently estimate the propensity for and intensity of adoption of improved sorghum varieties in Tanzania. The neural network model outperformed the conventional double-hurdle models in predicting propensity for and intensity of adoption after controlling for factors that influence the adoption process. The predicted values of propensity for and intensity of adoption were combined with sample households’ characteristics to typify the households into nine clusters using t-Distributed Stochastic Neighbor Embedding (t-SNE). As explained earlier, the deep learning neural network model with backpropagation uses learning rules such as least mean square, gradient descent, Newton’s rule, and conjugate gradient to estimate the weights of the connecting layers. Russell and Norvig (2015) show that this approach finds an exemplary weight at each connection: after the learning rule calculates the error at the output unit, this error is backpropagated to all the units such that the error at each unit is proportional to the contribution of that unit towards the total error at the output unit. The fundamental principle is that each layer can be pre-trained by unsupervised learning, one layer at a time, and backpropagation is used for fine-tuning all the layers. The process gives better initialization through unsupervised learning than random initialization does. Summaries of the practical application of deep learning can be found in Bengio (2009) and Zhang (1990), with applications including face and voice recognition, text translation, and advanced driver aid systems. Introduced by van der Maaten and Hinton (2008), t-SNE is a variation of the Stochastic Neighbor Embedding of Hinton and Roweis (2002) that allows optimization and produces significantly better visualizations by reducing the tendency to lump points together in the center of the map, which often renders the visualization ineffective and unreadable. The technique is good at creating clusters that reveal embedded relationships at many different scales by applying the Gower distance.

The knowledge generated in this paper is essential for formulating specific agricultural recommendation packages and targeting specific groups of farmers to scale up the adoption process. Results show that Tanzanian sorghum producers are heterogeneous; therefore, targeted research and extension efforts aimed at training farmers on improved sorghum production activities will have a more effective outcome. Expanding marketing opportunities in the Manyara Region has the potential to increase adoption among adopters and attract new adopters. There is an immediate need to improve the quality of extension services in the Dodoma Region. Implementing research and extension services that focus on learning by doing in the Kilimanjaro Region will produce impactful and sustainable results. The Shinyanga and Singida Regions require gender-sensitive research and extension efforts that target young and female farmers. Overall, there is an urgent need to localize public institutional support systems to meet the immediate needs of farmers. Coordinated effort among research, extension, and public policy is essential in promoting easily accessible agricultural technology aimed at solving localized problems.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We want to thank the farmers who willingly participated in the study and the extension agents in the Dodoma and Kilimanjaro Regions who conducted the surveys. This study was funded by the International Crop Research Institute for Semiarid Tropics (ICRISAT), Nairobi, Kenya, and Economic and Impact Assessment Program in East Africa. The views expressed in this paper are those of the authors and do not necessarily represent the views of ICRISAT.

Appendix 1. Distributions of Parameters from the k-fold validation of the double-hurdle model through resampling and bootstrapping.

Appendix 2. Distribution of importance of each variable in the deep neural network estimated through k-fold validation through resampling and bootstrapping.

References

Akpan, S. B., Veronica, S. N., & Essien, U. A. (2012). A double-hurdle model of fertilizer adoption and optimum use among farmers in Southern Nigeria. Tropicultura, 30(4), 249–253.

Amare, M., Asfaw, S., & Shiferaw, B. (2011). Welfare impacts of maize–pigeonpea intensification in Tanzania. Agricultural Economics, 43(1), 27–43.

Arabmazar, A., & Schmidt, P. (1982). An investigation of the robustness of the Tobit estimator to non-normality. Econometrica, 50(4), 1055–1063.

Arlot, S., & Celisse, A. (2010). A survey of cross-validation procedures for model selection. Statistics Surveys, 4, 40–79.

Asfaw, S., Shiferaw, B., Simtowe, F., & Mekbib, H. (2011). Agricultural technology adoption, seed access constraints, and commercialization in Ethiopia. Journal of Development and Agricultural Economics, 3(9), 436–477.

Bengio, Y., LeCun, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.

Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1), 1–127.

Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798–1828.

Bergmeir, C., & Benitez, J. M. (2012). Neural networks in R using the Stuttgart neural network simulator: RSNNS. Journal of Statistical Software, 46(7), 1–26.

Bishop, C. M. (1995). Neural networks for pattern recognition. Oxford, New York: Oxford University Press.

Blaylock, J., & Blisard, W. (1993). Women and the demand for alcohol: Estimating participation and consumption. The Journal of Consumer Affairs, 27, 319–334.

Breisinger, C., Zhu, T., Al Riffai, P., Nelson, G., Robertson, R., Funes, J., & Verner, D. (2011). Global and local economic impacts of climate change in the Syrian Arab Republic and options for adaptation. IFPRI Discussion Paper 1071. Washington, D. C.: International Food Policy Research Institute.

Brockett, P. L., Cooper, W. W., Golden, L. L., & Pitaktong, U. (1994). A neural network method for obtaining an early warning of insurer insolvency. The Journal of Risk and Insurance, 61(3), 402–424.

Campello, R. J. G. B., & Hruschka, E. R. (2006). A fuzzy extension of the silhouette width criterion for cluster analysis. Fuzzy Sets and Systems, 157(21), 2858–2875.

Cragg, J. G. (1971). Some statistical models for limited dependent variables with applications to the demand for durable goods. Econometrica, 39, 829–844.

Croissant, Y., Carlevaro, F., & Hoareau, S. (2016). mhurdle: Multiple hurdle Tobit models. R package version 1.1-7.

Croppenstedt, A., Demeke, M., & Meschi, M. (2003). Technology adoption in the presence of constraints: The case of fertilizer demand in Ethiopia. Review of Development Economics, 7(1), 58–70.

Daloğlu, I., Nassauer, J. I., Riolo, R. L., & Scavia, D. (2014). Development of a farmer typology of agricultural conservation behavior in the American Corn Belt. Agricultural Systems, 129, 93–102.

David, R. L. (2005). Agricultural sustainability and technology adoptions: Issues and policies for developing countries. American Journal of Agricultural Economics, 87(5), 1325–1334.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(October), 533–536.

Donaldson, J. (2016). tsne: T-distributed stochastic neighbor embedding for R (t-SNE). R package version 0.1-3. https://CRAN.R-project.org/package=tsne.

Doss, C. R. (2006). Analyzing technology adoption using microstudies: Limitations, challenges, and opportunities for improvement. Agricultural Economics, 34, 207–219.

Doss, C. R. (2013). Intrahousehold bargaining and resource allocation in developing countries. The World Bank Research Observer, 28(1), 52–78.

Foster, A., & Rosenzweig, M. (2010). Microeconomics of technology adoption. Annual Review of Economics, 2, 395–424.

Gao, X. M., Wailes, E. J., & Cramer, G. L. (1995). Double hurdle model with bivariate normal errors: An application to US rice demand. Journal of Agricultural and Applied Economics, 27, 363–376.

Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep sparse rectifier neural networks. In International Conference on Artificial Intelligence and Statistics (pp. 315–323).

Gollin, D., Morris, M., & Byerlee, D. (2005). Technology adoption in intensive post-green revolution systems. American Journal of Agricultural Economics, 87(5), 1310–1316.

Goodfellow, I. J., Warde-Farley, D., Mirza, M., Courville, A., & Bengio, Y. (2013). Maxout networks. Proceedings ICML, 1319–1327.

Gorgulu, O. (2010). Classification of dairy cattle in terms of some milk yield characteristics using fuzzy clustering. Journal of Animal and Veterinary Advances, 9(14), 1947–1951.

Gower, J. C. (1971). A general coefficient of similarity and some of its properties. Biometrics, 27, 857–874.

Hinton, G. E., & Roweis, S. (2002). Stochastic neighbor embedding. In NIPS’02: Proceedings of the 15th International Conference on Neural Information Processing Systems (pp. 857–864).

Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(July), 504–507.

Hoop, D., Mack, G., Mann, S., & Schmid, D. (2014). On the dynamics of agricultural labor input and their impact on productivity and income: An empirical study of the Swiss family farm. International Journal of Agricultural Management, 3(4), 221–231.

Jin, X., & Han, J. (2016). K-medoids clustering. In Encyclopedia of Machine Learning and Data Mining. DOI: 10.1007/978-1-4899-7502-7_432-1.

Jones, A. M. (1992). A note on the computation of the double hurdle model with dependence with an application to tobacco expenditure. Bulletin of Economic Research, 44, 67–73.

Kabyemela, M. A. (2015). Use of drought-tolerant crops as a strategy for efficient use of available water: Sorghum in Same, Tanzania. Water-Smart Agriculture in East Africa, 67–90.

Kaliba, A. R., Mazvimavi, K., Gregory, T., Mgonja, M., & Mgonja, M. (2018). Assessing the adoption of improved sorghum varieties in Tanzania under information and capital constraints. Agricultural and Food Resource Economics, 6(18). https://doi.org/10.1186/s40100-018-0114-4.

Kassie, M., Teklewold, H., Jaleta, M., Marenya, P., & Erenstein, O. (2015). Understanding the adoption of a portfolio of sustainable intensification practices in eastern and southern Africa. Land Use Policy, 42(January), 400–411.

Kilimo (2008). Ministry of Agriculture, Livestock, and Fisheries Development (2008) Tanzania Variety list. http://www.kilimo.go.tz/publications/english%20docs.

Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence IJCAI, 1137–1145.

Kombe, S. (2012). Tanzania: Moshi farmers assured of sorghum, millet market. Tanzania Daily News (Dar es Salaam). Available: http://allafrica.com/stories/201207060815.html.

Kuivanen, K. S., Alvarez, S., Michalscheck, M., Adjei-Nsiah, S., Descheemaeker, K., Mellon-Bedi, S., & Groot, J. C. J. (2016). Characterizing the diversity of smallholder farming systems and their constraints and opportunities for innovation: A case study from the Northern Region, Ghana. NJAS-Wageningen Journal of Life Science, 78, 153–166.

Lattin, J., Carroll, J. D., & Green, P. E. (2011). Analyzing multivariate data. Pacific Grove, CA: Brooks/Cole, Thomson Learning.

Lee, L.-F., & Maddala, G. S. (1985). The common structure of tests for selectivity bias, serial correlation, heteroscedasticity, and non-normality in the tobit model. Microeconomics Review, 26(February), 1–20.

Lim, T. S., Loh, W. Y., & Shih, Y. S. (2000). A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning, 40, 203–228.

Loeys, T., Moerkerke, B., Smet, O. D., & Buysse, A. (2012). The analysis of zero inflated count data: Beyond zero inflated Poisson regression. British Journal of Mathematical and Statistical Psychology, 65, 163–180.

Makate, C., Makate, M., & Mango, N. (2017). Smallholder farmers’ perceptions of climate change and the use of sustainable agricultural practices in the Chinyanja Triangle, Southern Africa. Social Science, 6(1), 30.

Marra, M., Pannell, D., & Ghadim, A. (2003). The economics of risk, uncertainty, and learning in the adoption of new agricultural technologies: Where are we on the learning curve? Agricultural Systems, 75, 215–234.

Masters, T. (1993). Practical Neural Network Recipes in C++. Morgan-Kaufmann.

Mazvimavi, K., & Twomlow, S. (2009). Socioeconomic and institutional factors influencing the adoption of conservation farming by vulnerable households in Zimbabwe. Agricultural Systems, 101(1), 20–29.

Meijer, S. S., Catacutan, D., Ajayi, O. C., Sileshi, G. W., & Nieuwenhuis, M. (2015). The role of knowledge, attitudes, and perceptions in the uptake of agricultural and agroforestry innovations among smallholder farmers in sub-Saharan Africa. International Journal of Agricultural Sustainability, 13(1), 40–54.

Mgonja, M. A., Chandra, S., Gwata, E. T., Obilana, A. B., Monyo, E. S., Rohrbach, D. D., ... Saadan, H. M. (2005). Improving the efficiencies of national crop breeding programs through region-based approaches: The case of sorghum and pearl millet in southern Africa. Journal of Food, Agriculture and Environment, 3, 124–129.

Organization for Economic Cooperation and Development. (2008). Handbook on constructing composite indicators: methodology and user guide. www.oecd.org/sdd/42495745.pdf.

Pudney, S. (1989). Modeling individual choice: The econometrics of corners, kinks, and holes. Oxford, U.K.: Basil Blackwell.

R Core Team (2019). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Rohrbach, D. D., & Kiriwaggulu, J. A. B. (2001). Commercialization prospects for sorghum and pearl millet in Tanzania. Open Access Journal, 1, 1–28.

Russell, S., & Norvig, P. (2015). Artificial intelligence: A modern approach. Upper Saddle River, NJ, USA: Prentice Hall.

Sakthivel, K. M., & Rajitha, C. S. (2017). A comparative study of zero-inflated, hurdle models with an artificial neural network in claim count modeling. International Journal of Statistics and Systems, 12(2), 265–276.

Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61(January), 85–117.

Sheikh, A. D., Rehman, T., & Yates, C. M. (2003). Logit models for identifying the factors that influence the uptake of new no-tillage technologies by farmers in the rice-wheat and the cotton-wheat farming systems of Pakistan’s Punjab. Agricultural Systems, 75, 79–95.

Sidibe, A. (2005). Farm-level adoption of soil and water conservation techniques in northern Burkina Faso. Agricultural Water Management, 71, 211–224.

Soni, A. K., & Abdullahi, A. U. (2015). Using neural networks for credit scoring. International Journal of Science, Technology and Management, 4(5), 26–31.

Tobin, J. (1958). Estimation of relationships for limited dependent variables. Econometrica, 26, 24–36.

Ullah, S., Gabbett, T. J., & Finch, C. F. (2014). Statistical modeling for recurrent events: An application to sports injuries. British Journal of Sports Medicine, 48, 1287–1293.

van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.

Wesley, A. S., & Faminow, M. (2014). Research and development and extension services in agriculture and food security. Canada: Asian Development Bank.

Zhang, W. (1990). Parallel distributed processing model with local space-invariant interconnections and its optical architecture. Applied Optics, 29(32), 4790–4797.