Environmental controls on cultivated soybean phenotypic traits across China


G Model

ARTICLE IN PRESS

AGEE-4680; No. of Pages 7

Agriculture, Ecosystems and Environment xxx (2014) xxx–xxx

Contents lists available at ScienceDirect

Agriculture, Ecosystems and Environment journal homepage: www.elsevier.com/locate/agee

Qianqian Li a,b, Yueming Hu a,b,c,∗, Feixiang Chen a,b, Jinfeng Wang d, Zhenhua Liu a, Zhizhong Zhao c

a Guangdong Provincial Key Laboratory of Land Use and Consolidation, South China Agricultural University, Guangzhou 510642, China
b Key Laboratory of Construction Land Transformation, Ministry of Land and Resources, South China Agricultural University, Guangzhou 510642, China
c College of Agriculture and Animal Husbandry, Qinghai University, Xining 810016, China
d Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China

Article info

Article history: Received 29 July 2013; received in revised form 10 March 2014; accepted 16 March 2014; available online xxx

Keywords: Accession; Crude oil content; Crude protein content; Environmental variables; Seed weight; Phenotypic traits; Regression models; Soybean germplasm

Abstract

The impacts of environmental variables on basic phenotypic traits of cultivated soybean varieties can be significant and vary with individual traits. However, such studies are extremely rare at the continental scale because observations are limited and because collinearity and spatial autocorrelation among abiotic factors can make attribution difficult. This study was designed to explore and quantify the environmental variables that relate closely to soybean phenotypic traits across China. Data on cultivated soybean phenotypic traits (100-seed weight, crude oil content, crude protein content, and plant height) and environmental variables were compiled from 18,686 samples across 29 provinces of China. Different regression models were used to remove collinearity and spatial autocorrelation among the selected variables. As the first attempt at a continental scale, our study shows that climatic and geographic variables contributed much more to trait variation than soil variables did. Minimum temperature was the most critical variable, followed by longitude, and soil properties explained more of the variance in crude protein content than in the other traits. Abiotic variables explained 29%, 20%, 17%, and 38% of the observed variation (P < 0.05) in crude protein content, crude oil content, 100-seed weight, and plant height, respectively. This result implies that, besides the effects of farming practices such as fertilization, irrigation, and planting density, biotic factors (e.g., genes) likely play a more important role in determining the phenotypic traits and their spatial variability. It is possible to improve soybean quality and yield by selecting suitable environments, even though it is hard to develop a soybean variety with all ideal germplasm traits simultaneously.

© 2014 Elsevier B.V. All rights reserved.

∗ Corresponding author at: Guangdong Provincial Key Laboratory of Land Use and Consolidation, South China Agricultural University, Guangzhou 510642, China. E-mail address: [email protected] (Y. Hu).

http://dx.doi.org/10.1016/j.agee.2014.03.034 0167-8809/© 2014 Elsevier B.V. All rights reserved.

Please cite this article in press as: Li, Q., et al., Environmental controls on cultivated soybean phenotypic traits across China. Agric. Ecosyst. Environ. (2014), http://dx.doi.org/10.1016/j.agee.2014.03.034

1. Introduction

Soybean is one of the most valuable crops economically and nutritionally because it has the highest protein content and the second highest oil content of all legumes (Liu, 1997). China, the first country to domesticate soybeans and a major global soybean grower and consumer, has an extensive distribution of soybean accessions. Of all soybean germplasm traits, 100-seed weight, plant height, protein content, and oil content are the most valuable phenotypic characteristics related to soybean seed quality (Borras et al., 2004; Pipolo et al., 2004). The variations in these traits are genetically controlled but can be strongly influenced by environmental conditions (Burton, 1989; Basra and Randhawa, 2002).

Soybean traits can vary greatly with geographical location. For example, in the northwestern area of the United States, soybeans have higher seed oil content and lower seed protein content than those found in the southeastern states (Breene et al., 1988; Hurburgh et al., 1990). Seed weight, plant height, protein content, and oil content can also vary widely with temperature. In a controlled-environment greenhouse, 100-seed weight increased with increasing temperature up to an optimum level (Sionit et al., 1987) and then declined (Sato and Ikeda, 1979; Baker et al., 1989). A similar temperature effect was reported for seed oil content, which was positively correlated with temperature up to the higher end of the optimum range (Dornbos and Mullen, 1992; Gibson and Mullen, 1996), whereas protein content showed a negative response (Dornbos and Mullen, 1992; Gibson and Mullen, 1996). At a plot or local scale, an experimental field produced greater soybean seed weight and plant height in Japan than in Serbia because of higher precipitation in Japan (Miladinovic et al., 2006). In a natural environment at a regional or continental scale, oil content is positively correlated with day length but negatively correlated with growing season temperature and precipitation (Zu, 1983). The opposite relationships were observed for protein content (Hu et al., 1990).

The physical, chemical, and biological properties of a soil significantly affect soybean growth and seed quality. For example, reducing the potassium supply to a deficient level can lead to lower seed oil content and 100-seed weight but higher protein content (Sale and Campbell, 1986). Plant height was reported to respond positively to soil nitrogen status (Suyantohadi et al., 2010) and to the thickness of the soil plowing layer (Yang et al., 1996). These relationships provide important hints for deciding which environmental variables should be included in models so that the effects of individual factors on the selected phenotypic traits can be evaluated with higher confidence.

Most previous studies emphasized the significance of correlations between soybean traits and a limited set of environmental factors, and ignored potential collinearity and spatial autocorrelation among the selected variables. Here, we explored and quantified the environmental controls on major phenotypic traits of cultivated soybean by statistically removing potential collinearity and spatial autocorrelation among all recorded geographical, climatic, and soil variables across China.

2. Materials and methods

2.1. Data collection

The dataset of cultivated soybean germplasms was provided by the Chinese Soybean Germplasm Resources Inventory (CSGRI). It contains 18,686 samples that were collected from individual farms across China, as illustrated in Fig. 1.
The collection period ranged from 1950 to 2000, but about 80% of all records were collected from 1978 onward. Each sample came with both qualitative and quantitative measurements of the phenotypic traits of soybean accessions and of their associated environmental variables from 29 soybean-growing provinces of China (Fig. 1). The dataset spans 18°29′ to 53°33′N and 80°17′ to 134°33′E. Average annual precipitation varies from 200 mm in the northwest to 1500 mm in the southeastern coastal areas, and daylight hours gradually increase from the southeast to the northwest, with an annual maximum of 3550 h in Qinghai and an annual minimum of 791 h in Sichuan.

This study focused on four soybean phenotypic traits: 100-seed weight (SW), crude protein content (CP), crude oil content (CO), and soybean plant height (SPH). The environmental (geographic, climatic, and soil) variables appended to each record of these four traits, and their definitions, are listed in Table 1. Longitude and latitude were derived from a digitized administrative map of China, and elevation was obtained from Shuttle Radar Topography Mission (SRTM) 1-km resolution digital elevation data. The climatic variables were derived from the WorldClim database, which has a 1-km spatial resolution and was compiled by Hijmans et al. (2005). To represent climatic influences on soybean production more accurately, we also defined growing period (GP) temperature and precipitation for each soybean accession. The GP daylight hours were calculated according to Allen et al. (1998), and the daylight intensity was calculated as solar radiation divided by daylight hours. Soil data were extracted from the Second Chinese Soil Inventory Database.

2.2. Data analysis

The analyses were conducted with SAS version 9.1 (SAS, Inc., 2004). All 18,686 samples were stratified by growing period (GP). A training dataset of 4701 samples was then formed by equal-probability random selection within each GP-based group of the original observations. In other words, the number of observations for each type of GP was used as the weighting factor to specify the stratum sample sizes, so as to prevent losses of soybean diversity during random sampling and to ensure that the selected samples were representative. Based on our preliminary results (not shown here), the original records of some soybean traits and environmental variables were log10-transformed to normalize their distributions, as indicated in Table 1.

Regression modeling was performed to identify the importance of each variable to each soybean trait. Full multiple linear regressions (MLRs) were conducted to evaluate the overall effect and to test for collinearity between predictors. Forward stepwise MLRs were performed to identify the minimum number of significant variables so as to avoid strong multicollinearity (Cocu et al., 2005). Ridge regression models were developed to remove multicollinearity between explanatory variables (Kutner et al., 2005) based on ridge and variance inflation factor (VIF) traces, that is, two-dimensional plots of the standardized regression coefficients (Schroeder et al., 1986) and of the corresponding VIFs against the ridge parameter (also called the ridge trace control value or biasing constant). We selected the smallest ridge parameter at which all standardized coefficients were stable; where possible, VIFs close to 1 were preferred (Chatterjee and Hadi, 2006). Explanatory variables whose ridge traces were stable but small, or unstable with coefficients tending toward zero, were dropped from the predictive ridge regression models. Variables with unstable ridge traces that did not tend toward zero were also considered for removal (Ji and Peters, 2004; Kutner et al., 2005).
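The stratified selection of the training dataset described above can be sketched in a few lines. This is only an illustration of proportional allocation under stated assumptions, not the SAS procedure the analysis used: the function name `stratified_sample`, the growing-period labels, and the toy record counts are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(records, stratum_of, n_total, seed=0):
    """Draw about n_total records, allocating draws to strata in proportion to size."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)

    sample = []
    for key in sorted(strata):
        members = strata[key]
        # Proportional allocation: stratum share of the population times n_total.
        n_k = round(n_total * len(members) / len(records))
        # Keep at least one record per stratum so that no growing-period type is lost.
        n_k = max(1, min(n_k, len(members)))
        # Within a stratum every record has the same selection probability.
        sample.extend(rng.sample(members, n_k))
    return sample


# Hypothetical toy data: 1000 accessions in three growing-period (GP) classes.
labels = ["spring"] * 600 + ["summer"] * 300 + ["autumn"] * 100
records = [{"id": i, "gp": gp} for i, gp in enumerate(labels)]

training = stratified_sample(records, lambda r: r["gp"], n_total=250)
training_ids = {r["id"] for r in training}
# The records that were not drawn form the validation dataset.
validation = [r for r in records if r["id"] not in training_ids]
```

Because each stratum contributes in proportion to its size, rare growing-period types remain represented in the training data, which is the stated purpose of the weighting.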
Spatial autocorrelation (SAC) might exist in soybean germplasm data, which would violate the assumption of independently and identically distributed residuals (Anselin, 2002) and affect hypothesis testing and prediction (Dormann, 2007; Dormann et al., 2007; Kühn, 2007; Peres-Neto and Legendre, 2010). Therefore, SAC in the residuals of each ridge regression model was assessed by fitting a theoretical spherical semivariogram to the empirical semivariogram using the weighted least squares method (Jian et al., 1996; Olea, 2006). The residual maximum likelihood method was used to estimate regression coefficients (Ji and Peters, 2004). Model testing and validation were carried out during the model-building process. Normal probability plots of residuals (P-P plots) and the Anderson–Darling test were used to check normality. Residuals were plotted against predicted values to check for heteroskedasticity, and against predictors to verify linearity. The remaining samples were used as the validation dataset to evaluate the performance of the predictive models.
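The semivariogram fitting step can be illustrated with a small sketch. It assumes that the empirical semivariogram has already been binned into lag distances, semivariance values, and pair counts, and that the pair counts serve as the weights (one common choice; the weighting actually used is not stated in the text). For a fixed range the spherical model is linear in the nugget and partial sill, so the sketch solves a 2 × 2 weighted least squares problem for each candidate range and keeps the best one. The function names and all numbers are hypothetical.

```python
def spherical(h, nugget, psill, a):
    """Spherical semivariogram: rises to nugget + psill at range a, flat beyond it."""
    if h >= a:
        return nugget + psill
    r = h / a
    return nugget + psill * (1.5 * r - 0.5 * r ** 3)


def fit_spherical_wls(lags, gammas, npairs, ranges):
    """Weighted least squares fit of (nugget, partial sill, range) over candidate ranges."""
    best = None
    for a in ranges:
        # Basis f(h) such that gamma(h) = nugget + psill * f(h) for this range.
        f = [spherical(h, 0.0, 1.0, a) for h in lags]
        sw = sum(npairs)
        sf = sum(w * fi for w, fi in zip(npairs, f))
        sff = sum(w * fi * fi for w, fi in zip(npairs, f))
        sg = sum(w * g for w, g in zip(npairs, gammas))
        sfg = sum(w * fi * g for w, fi, g in zip(npairs, f, gammas))
        det = sw * sff - sf * sf
        if abs(det) < 1e-12:
            continue  # all lags lie beyond the range, so the fit is undetermined
        nugget = (sg * sff - sf * sfg) / det
        psill = (sw * sfg - sf * sg) / det
        if nugget < 0 or psill < 0:
            continue  # not a valid semivariogram
        wss = sum(w * (g - nugget - psill * fi) ** 2
                  for w, fi, g in zip(npairs, f, gammas))
        if best is None or wss < best[0]:
            best = (wss, nugget, psill, a)
    if best is None:
        raise ValueError("no admissible spherical fit among the candidate ranges")
    return best[1:]


# Synthetic check: semivariances generated from a known model (all values made up).
lags = [50.0 * i for i in range(1, 11)]                        # lag distances
npairs = [120, 340, 510, 640, 700, 720, 690, 650, 600, 540]    # pairs per lag
gammas = [spherical(h, 0.2, 0.8, 300.0) for h in lags]
nugget, psill, a = fit_spherical_wls(lags, gammas, npairs,
                                     [50.0 * i for i in range(2, 11)])
```

A fitted range well inside the study extent and a nugget clearly below the sill would indicate spatially structured residuals; a flat semivariogram would indicate little residual autocorrelation.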

Fig. 1. Sample locations of 18,686 cultivated soybean genotypes across China. One dot represents one soybean accession or germplasm accession. The red dots (n = 4701) represent the training samples randomly (with an equal probability) taken from the original 18,686 samples.

3. Results

3.1. Regression analysis and modeling

The results presented in Table 2 show that soybean traits were significantly correlated, either positively or negatively, with most of the predictive factors. Strong multicollinearity was found among the environmental predictors (VIF > 10 and condition index (CI) > 100), and some of the predictors were not significant at the α = 0.01 level, such as precipitation and TDH in the crude oil model (Table S1). Compared with the full MLR, the forward stepwise regression greatly reduced multicollinearity, as indicated by VIF and CI (Table S2), and removed all the insignificant parameter estimates from the models (i.e., 7 to 11 of the 19 candidate variables were retained), whereas the ridge regression completely eliminated the collinearity between independent variables (VIF ≈ 1, CI < 10 in Table S3). Results of the spatial analysis indicated that soybean germplasm accessions were positively spatially autocorrelated within a certain distance, which varied with individual traits (Table S4). Compared with the non-spatial models, the spatial modeling technique further reduced the redundant variables with a slight decrease in R² (Table 3).
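The contrast between VIF > 10 in the full regression and VIF ≈ 1 under ridge regression can be reproduced on a small example. The sketch below works in correlation form, as in Kutner et al. (2005): with predictor correlation matrix R, predictor–response correlations r, and ridge parameter k, the standardized coefficients are (R + kI)⁻¹r and the VIFs are the diagonal of (R + kI)⁻¹R(R + kI)⁻¹. The matrix values are hypothetical (two nearly collinear predictors and one weakly related predictor), and the helper names are illustrative only.

```python
def inverse(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    """Plain matrix product for lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def ridge_trace(R, r_xy, ks):
    """Return (k, standardized coefficients, VIFs) for each ridge parameter k."""
    n = len(R)
    out = []
    for k in ks:
        shifted = [[R[i][j] + (k if i == j else 0.0) for j in range(n)] for i in range(n)]
        a_inv = inverse(shifted)
        coefs = [sum(a_inv[i][j] * r_xy[j] for j in range(n)) for i in range(n)]
        v = matmul(matmul(a_inv, R), a_inv)
        out.append((k, coefs, [v[i][i] for i in range(n)]))
    return out


# Hypothetical correlations: predictors 1 and 2 are nearly collinear (r = 0.95).
R = [[1.0, 0.95, 0.3], [0.95, 1.0, 0.2], [0.3, 0.2, 1.0]]
r_xy = [0.5, 0.45, 0.3]
trace = ridge_trace(R, r_xy, [0.0, 0.05, 0.1, 0.2])
for k, coefs, vifs in trace:
    print(k, [round(c, 3) for c in coefs], [round(x, 2) for x in vifs])
```

At k = 0 the VIFs are simply the diagonal of R⁻¹, which is the ordinary least squares case; as k grows the VIFs shrink and the coefficients settle, and the ridge parameter is read off the trace where the coefficients stabilize.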

3.2. Environmental associations

The spatial models showed that environmental variables accounted for 38%, 29%, 20%, and 17% of the variance in plant height, crude protein content, crude oil content, and 100-seed weight, respectively (Table 3). Climatic variables were the most important determinants, of which MTmin was the strongest predictor for

Table 1
Descriptions of soybean traits and environmental variables used in this study.

Variable | Unit | Description and source

Soybean
GP | – | Growing period (or breeding period)
SW | g | 100-seed weight; Chinese Soybean Germplasm Resources Inventory (CSGRI)
SPH* | cm | Soybean plant height; CSGRI
CP | g kg−1 | Crude protein content; CSGRI
CO | g kg−1 | Crude oil content; CSGRI

Geography
x | m | Longitude; http://nfgis.nsdi.gov.cn/nfgis/chinese/c xz.htm
y | m | Latitude; http://nfgis.nsdi.gov.cn/nfgis/chinese/c xz.htm
z | m | Elevation; http://srtm.csi.cgiar.org/

Climate
Temperature | °C | http://www.worldclim.org/
Precipitation | mm | http://www.worldclim.org/
MTmin | °C × 10 | Mean monthly minimum temperature during growing period from 1950 to 2000
MTmax | °C × 10 | Mean monthly maximum temperature during growing period from 1950 to 2000
ATmean | °C × 10 | Accumulated mean monthly temperature during growing period from 1950 to 2000
MP | mm | Mean monthly precipitation during growing period from 1950 to 2000
TP*** | mm | Accumulated mean monthly precipitation during growing period from 1950 to 2000
TDH | hour | The total daylight hours during growing period
RAD* | – | Solar radiation intensity during growing period

Soil
SPT | cm | Soil thickness; the Second Chinese Soil Inventory Database (CSID)
ST | % | Silt fraction; CSID
CY | % | Clay fraction; CSID
SOM | % | Soil organic matter content; CSID
N | % | Soil total nitrogen content; CSID
P*** | % | Phosphorus content; CSID
K | % | Potassium content; CSID
pH | – | The measure of soil acidity or alkalinity; CSID

Note: Variables marked with asterisks (*) were log10-transformed for subsequent analyses. The number of asterisks denotes the number of log10 transformations applied. MTmin, MTmax, ATmean, MP, TP, TDH, and RAD were estimated from the original monthly data according to the specific growing period (GP) of each soybean genotype.
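As an illustration of how a growing-period daylight total such as TDH in Table 1 can be derived, the sketch below uses the standard astronomical relations given in Allen et al. (1998): solar declination δ = 0.409 sin(2πJ/365 − 1.39), sunset hour angle ωs = arccos(−tan φ tan δ), and daylight hours N = 24ωs/π, where J is the day of year and φ the latitude. Whether the original analysis applied exactly these equations on a daily step is an assumption, and the latitude and growing-period dates in the example are hypothetical.

```python
import math


def daylight_hours(lat_deg, day_of_year):
    """Maximum possible daylight hours for a latitude (degrees) and day of year."""
    phi = math.radians(lat_deg)
    # Solar declination in radians.
    decl = 0.409 * math.sin(2.0 * math.pi * day_of_year / 365.0 - 1.39)
    x = -math.tan(phi) * math.tan(decl)
    # Clamp so that polar day and polar night do not raise a math domain error.
    x = max(-1.0, min(1.0, x))
    omega_s = math.acos(x)  # sunset hour angle in radians
    return 24.0 / math.pi * omega_s


def total_daylight_hours(lat_deg, first_day, last_day):
    """Sum of daily daylight hours over a growing period (inclusive day numbers)."""
    return sum(daylight_hours(lat_deg, j) for j in range(first_day, last_day + 1))


# Hypothetical accession at 35.6°N with a growing period from day 115 to day 288.
tdh = total_daylight_hours(35.6, 115, 288)
```

These are astronomical day lengths, so they depend only on latitude and date; measured sunshine duration, which also reflects cloudiness, would need station data.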


Table 2
Correlation matrix of soybean characteristics and environmental variables.

Variable | SW | SPH | CP | CO
SW | 1 | −0.053*** | 0.022 | 0.254****
SPH | −0.053*** | 1 | −0.249**** | 0.012
CP | 0.022 | −0.249**** | 1 | −0.517****
CO | 0.254**** | 0.012 | −0.517**** | 1
GP | 0.026 | 0.416**** | −0.064**** | −0.179****
y | 0.095**** | 0.455**** | −0.443**** | 0.299****
x | 0.335**** | 0.213**** | −0.248**** | 0.359****
z | −0.232**** | −0.057*** | −0.005 | −0.177****
MTmin | 0.075**** | −0.233**** | 0.369**** | −0.142****
MTmax | 0.048*** | −0.039 | 0.199**** | −0.057***
ATmean | 0.096**** | 0.307**** | 0.085**** | −0.158****
MP | −0.055**** | −0.405**** | 0.387**** | −0.134****
TP | 0.002 | −0.272**** | 0.328**** | −0.148****
TDH | 0.023 | 0.513**** | −0.277**** | 0.01
RAD | −0.305**** | −0.148**** | 0.056*** | −0.136****
SPT | −0.044**** | 0.365**** | −0.424**** | 0.171****
SD | −0.117**** | 0.228**** | −0.236**** | 0.022
ST | 0.036 | −0.096**** | 0.104**** | −0.015
CY | 0.166**** | −0.299**** | 0.304**** | −0.025
SOM | 0.054*** | −0.234**** | 0.227**** | 0.032
N | 0.075**** | −0.141**** | 0.156**** | 0.082****
P | 0.026 | 0.389**** | −0.382**** | 0.160****
K | 0.263**** | −0.067**** | 0.051*** | 0.184****
pH | −0.016 | 0.239*** | −0.216*** | 0.161***

Note: * P < 0.1, ** P < 0.01, *** P < 0.001, **** P < 0.0001. Pearson correlations were established between the four germplasm traits and all environmental predictors based on 4071 training samples.

100-seed weight, crude oil content, and protein content. Geographical variables accounted for much more variation than soil variables did: longitude was the strongest geographical determinant of 100-seed weight, crude oil content, and protein content, while latitude heavily influenced plant height. Elevation also affected 100-seed weight and crude oil content. Soil factors explained more variance in crude protein content than in the other three traits. As indicated by Table 3, longitude, elevation, MTmin, ATmean, CY (soil clay fraction), and K content had positive effects on 100-seed weight, while MTmax, TP (total precipitation), and RAD (solar radiation intensity) had negative effects. Crude protein content increased primarily with increasing MTmin and SOM and with decreasing longitude, SPT (soil thickness), and P (phosphorus content). Crude oil content was significantly positively correlated with longitude, MTmax, RAD, and K, but negatively correlated with elevation, MTmin, and ATmean. The positive determinants of soybean plant height were latitude, ATmean, TDH (total daylight hours), and P, whereas RAD was a weak negative factor (Table 3). The magnitude and sign of a standardized parameter estimate (δ) indicate the contribution of a control variable to the dependent variable.

The full regression was of little use for identifying the controlling factors because it could not remove collinearity, so highly collinear variables were included in the list of significant predictors (Table S1). The stepwise regression greatly reduced the number of predictors, with a smaller decrease in collinearity, compared with the full regression (Table S2). The ridge regression completely removed multicollinearity and further reduced the number of predictors for both plant height and protein content (Table S3). After spatial autocorrelation was taken into account, the spatial regression models retained the fewest predictors, with a slight decrease in the R² values of the soybean traits (except crude oil content) compared with the ridge regression models (Table 3). Very similar sets of controlling variables were identified by the different models for 100-seed weight and crude oil content (Fig. S1; Table 3, and Tables S2 and S3). Removing collinearity and/or considering spatial autocorrelation can improve the prediction to some extent, as indicated by the ridge regression.
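The reason for reporting standardized estimates alongside raw ones can be shown with a short sketch. A standardized estimate rescales the raw coefficient by the ratio of standard deviations, δ = β·s_x/s_y, so it does not depend on the units of the predictor (for example, temperature stored as °C × 10, as in Table 1). The example uses a single predictor for brevity, whereas the models in Table 3 are multivariate; all data values and function names are hypothetical.

```python
import statistics


def ols_slope(x, y):
    """Least squares slope of y on a single predictor x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def standardized_estimate(beta, x, y):
    """Rescale a raw coefficient to standard deviation units: beta * s_x / s_y."""
    return beta * statistics.stdev(x) / statistics.stdev(y)


temp_c = [14.2, 15.1, 16.8, 18.0, 19.5, 21.3]   # hypothetical MTmin in °C
seed_w = [15.9, 17.2, 16.8, 19.1, 20.4, 20.0]   # hypothetical 100-seed weight in g
temp_c10 = [10 * t for t in temp_c]             # the same variable stored as °C × 10

beta_c = ols_slope(temp_c, seed_w)              # g per °C
beta_c10 = ols_slope(temp_c10, seed_w)          # g per 0.1 °C, ten times smaller
delta_c = standardized_estimate(beta_c, temp_c, seed_w)
delta_c10 = standardized_estimate(beta_c10, temp_c10, seed_w)
```

The raw slopes differ by a factor of ten while the standardized values coincide, which is why small raw coefficients in Table 3 (such as those for ATmean) can still correspond to sizeable standardized effects.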

3.3. Evaluation and application of the spatial models

We plotted the observed values against the predicted ones for the validation dataset to test the performance of the spatial models (Fig. 2). We then generated maps (Fig. 3) of the observed and predicted values of soybean traits, using the full sample dataset as the validation dataset, to further assess the accuracy of these spatial models and to identify their spatial variability. These models explained 9.6%, 30.3%, 26.5%, and 15.3% of the variance in 100-seed weight, plant height, crude protein content, and crude oil content, respectively (Fig. 2). Although the coefficients of determination from the validation dataset were lower than the corresponding R² values from the training dataset, implying that some other factors might play a more important role in determining variations in the traits, the maps clearly illustrated that these spatial models captured the general patterns of all four soybean traits (Fig. 3).
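The validation statistic can be written out explicitly. The sketch below assumes that the coefficient of determination is computed as one minus the ratio of the residual sum of squares to the total sum of squares of the observed values; the text does not state which definition was used, and the observed and predicted values here are invented for illustration.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical validation records: observed and model-predicted plant height (cm).
observed = [62.0, 75.0, 81.0, 90.0, 104.0, 68.0]
predicted = [70.0, 72.0, 85.0, 83.0, 95.0, 76.0]
r2_validation = r_squared(observed, predicted)

# Reference cases: a perfect model scores 1, predicting the mean everywhere scores 0.
r2_perfect = r_squared(observed, observed)
r2_mean_only = r_squared(observed, [sum(observed) / len(observed)] * len(observed))
```

Computed this way on held-out records, the statistic penalizes both scatter and systematic bias, so it is normally lower than the R² reported for the training data.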

4. Discussion

All environmental determinants together explained less than 40% of the total variance in 100-seed weight, plant height, crude protein content, and crude oil content (Table 3), implying that environmental variables were not the most critical determinants of the soybean germplasm traits even though they showed statistically significant impacts, especially on plant height and crude protein content. Soybean protein content can be highly heritable and is generally less affected by environmental factors than grain yield is (Erikson et al., 1982; Burton, 1987; Basra and Randhawa, 2002), which can also be seen from our results, suggesting that some other determinants of soybean grain yield, such as planting

Table 3
Summary of the spatial mixed linear model for environmental effects on soybean germplasm traits.

100-seed weight (R² = 0.17)
Variable | β | δ+
Intercept | 72.4**** | –
x | 0.15* | 0.34
z | 0.002** | 0.29
MTmin | 0.05*** | 0.41
MTmax | −0.05** | −0.33
ATmean | 0.0001**** | 0.15
TP | −20.51**** | −0.14
RAD | −23.77**** | −0.22
CY | 0.06** | 0.10
K | 0.65** | 0.12

Crude oil content (R² = 0.2)
Variable | β | δ+
Intercept | −18 | –
x | 0.08**** | 0.34
z | −0.001** | −0.27
MTmin | −0.006* | −0.35
MTmax | 0.01** | 0.27
ATmean | −0.0001**** | −0.16
RAD | 10.31**** | 0.30
K | 0.16* | 0.08

Crude protein content (R² = 0.29)
Variable | β | δ+
Intercept | 52.1**** | –
x | −0.08**** | −0.18
MTmin | 0.01**** | 0.26
SPT | −0.01* | −0.17
SOM | 0.16*** | 0.14
P | −0.81* | −0.11

Plant height (R² = 0.38)
Variable | β | δ+
Intercept | 3.08**** | –
y | 0.005* | 0.27
ATmean | 0.00001**** | 0.23
TDH | 0.0002**** | 0.22
RAD | −0.72**** | −0.02
P | 0.07** | 0.19

Note: Spatial mixed linear models were fitted to soybean germplasm traits against the environmental predictors retained in the ridge regression models, based on 4071 training records. * P < 0.1, ** P < 0.01, *** P < 0.001, **** P < 0.0001. R² = adjusted coefficient of determination; β = parameter estimate; δ+ = standardized parameter estimate.


Fig. 2. Predicted values vs. observed values of soybean traits calculated from spatial regression models with the validation dataset excluding the training sample dataset.

density, pods per plant, and seeds per pod, might be much more vulnerable to environmental variables than seed weight is. One of the questions that puzzles soybean breeders and growers is when and how, from an environmental control perspective, they can produce bigger seeds with both higher protein content and higher oil content. Several studies have argued that it is difficult to simultaneously increase seed oil content, protein content, and yield by conventional breeding (Burton, 1991; Jin et al., 2010). Similarly, our results suggest that environmental conditions can affect soybean traits but that such effects are limited in comparison

Fig. 3. Spatial distributions of observed and predicted values of soybean traits from the full sample dataset. The accuracy of the predictive results from spatial models was tested by mapping all the soybean germplasm accessions in the validation subset. Maps (a) to (d) were produced with observed values of the four soybean traits, while maps (a′) to (d′) were produced using predicted values from the spatial models.


with those of accessions. As illustrated by Fig. S1, the role of each variable could vary greatly with individual traits. In other words, not all variables could be manipulated to improve these soybean traits simultaneously. For example, MTmin had the strongest positive effect on 100-seed weight and crude protein content (followed by MTmax), but also the strongest negative effect on crude oil content (see Table 3). These findings agree with the conclusions of Kumar et al. (2006), who found that minimum temperature was significantly negatively correlated with soybean seed oil content but positively correlated with seed protein content. Accumulated GP temperature (ATmean) has frequently been reported as a vital climatic factor for predicting plant growth (Bartholomew and Williams, 2005; Wang et al., 2007), which was also confirmed by our study for 100-seed weight, crude oil content, and plant height. Although precipitation was eliminated from the models of crude protein and crude oil content in our study because of multicollinearity among independent variables, it still showed a significantly negative effect on seed weight (Table 3), as documented by Martijena and Bullock (1997). In addition, the spatial models confirmed a critical impact of longitude on oil content (positive) and protein content (negative) (see Table 3 and Tables S2–S4). These results are consistent with those of Kumar et al. (2006). Note that precipitation in China generally increases gradually from the west to the east; thus, longitude may be an indirect predictor that reflects the influence of precipitation on soybean traits. Properly managing water supply could be helpful for breeding soybean varieties with targeted seed protein and oil contents. Different accessions may be adapted to different locations or environments and thus differ in genotypic variables (100-seed weight, protein content, and oil content).
Our stepwise regression model showed (after removing the multicollinearity effect) that solar intensity (RAD) or total daylight hours (TDH) can help increase seed oil content but lead to a decrease in protein content. Cure et al. (1982) found that shortening day length would facilitate the translocation of nitrogen to the seed and increase seed protein content. The strong positive correlation between plant height and latitude observed in our study (Table 3) appears to agree with Jiang et al. (2011), who claimed that day length positively affects soybean plant height, because daylight hours in China generally increase from south to north. Soil properties are frequently reported as determinants of soybean seed traits (Yang et al., 1996; Suyantohadi et al., 2010). For example, soil K content in a natural environment at the continental scale was positively related to 100-seed weight and crude oil content (Table 3), whereas soil attributes affected crude protein content much more than the other traits (Table 3): protein content was negatively correlated with soil profile thickness (SPT) and soil P content but positively associated with SOM. These results suggest that improving soil physical and chemical properties would be favorable for improving soybean seed quality. In our study, soybean plants were usually taller in higher-latitude areas with sufficient daylight hours and soil P content (Table 3). For instance, the tallest soybean accession was observed in Jishan county of Shanxi Province, where sunlight is plentiful and the soil is rich in P. Moreover, the GP of this accession runs from late April to mid-October, giving it longer daylight hours and a higher GP accumulated temperature than other accessions in the CSGRI.
The areas that produced soybeans with higher 100-seed weight are in the eastern high-elevation region, with higher accumulated temperature, a favorable temperature regime without harsh extremes, mild solar intensity, higher soil clay and K contents, and modest annual rainfall (Table 3). As our models indicated, the soybean accessions with the highest 100-seed weight were found in Liyang and Qidong city of Jiangsu Province and Yuexi county of Anhui Province, where the growing period runs from June to October. Qidong and Liyang have much higher soil clay content than other CSGRI sampling sites, and Liyang and Yuexi are the richest in soil K. The soybean accession with the highest seed crude oil content (highest-CO) was found in Wuzhai county of Shanxi Province, where the growing period runs from late April to late September with long daylight hours, strong solar radiation, low temperature, and less rainfall (Ma et al., 2008). The soybean accession with the highest seed crude protein content (highest-CP) was found in Wuyuan county of Jiangxi Province, where the accession grew from May to October with ample rainfall, high monthly temperature (the lowest monthly temperature during the growing period is >13 °C), and high SOM (but low soil P content). The spatial correlations between soybean traits and environmental variables, or the spatial variability of the traits mentioned above, can be explicitly illustrated and explained by means of Kriging techniques. More importantly, Kriging (e.g., Universal Kriging) can be applied to residuals to delineate any spatial patterns: the absence of a spatial pattern implies that all variables are correctly selected; otherwise, some variables may be missing or the model may not be correctly specified.

5. Conclusions

Environmental factors could greatly influence soybean germplasm traits. However, attribution has been challenging because of potential collinearity and spatial autocorrelation among the abiotic factors. For the first time, we conducted such an attribution study at the continental scale by performing full multiple linear regression, forward stepwise regression, and ridge regression analyses to eliminate less relevant variables. Across China, climatic variables, especially MTmin and MTmax, exerted the most influence on soybean traits, and MTmin was the strongest climatic predictor for 100-seed weight, crude oil content, and crude protein content.
Longitude, a proxy for growing period precipitation, which generally increases from the west to the east of China, was significantly correlated with the 100-seed weight, crude oil content, and crude protein content of soybean seeds. Although statistically significant, the multiple abiotic variables (listed in Table 3) could explain only small portions of the observed variation in soybean traits (ranging from 17% for 100-seed weight to 38% for plant height), implying a more critical role for biotic factors (such as genetic characteristics) in determining phenotypic traits. It is possible to improve the yield and quality of soybean varieties by selecting suitable environmental conditions or improving soils, but it is difficult to produce a soybean accession with all ideal germplasm traits simultaneously because the most favorable conditions vary with individual soybean traits. Note that our conclusions were drawn from an observational study (rather than an experimental investigation), from which one may not be able to infer the causal effects of correlated environmental variables on soybean properties.

Acknowledgements

This research was funded by the National Natural Science Foundation of China (grant no. 40971125). The authors greatly thank Lei Ji, Hong Liao, Lu Wang and Zhoupeng Ren for their valuable suggestions on data analyses and drafting. We are also grateful to all anonymous referees for their thoughtful comments.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.agee.2014.03.034.

References

Allen, R.G., Pereira, L.S., Raes, D., Smith, M., 1998. Crop evapotranspiration: Guidelines for computing crop water requirements, FAO Irrigation and Drainage Paper 56. FAO, Rome, Italy.
Anselin, L., 2002. Under the hood issues in the specification and interpretation of spatial regression models. Agric. Econ. 27, 247–267.
Baker, J.T., Allen, L.H., Boote, K.J., Jones, P., Jones, J.W., 1989. Response of soybean to air-temperature and carbon-dioxide concentration. Crop Sci. 29, 98–105.
Bartholomew, P.W., Williams, R.D., 2005. Cool-season grass development response to accumulated temperature under a range of temperature regimes. Crop Sci. 45, 529–534.
Basra, A.S., Randhawa, L.S., 2002. Quality Improvement in Field Crops. Food Product Press, New York.
Borras, L., Slafer, G.A., Otegui, M.E., 2004. Seed dry weight response to source-sink manipulations in wheat, maize and soybean. A quantitative reappraisal. Field Crop. Res. 86, 131–146.
Breene, W.M., Lin, S., Hardman, L., Orf, J., 1988. Protein and oil content of soybeans from different geographic locations. J. Am. Oil Chem. Soc. 65, 1927–1931.
Burton, J.W., 1987. Quantitative genetics: Results relevant to soybean breeding. In: Wilcox, J.R. (Ed.), Soybean: Improvement, Production and Uses. American Society of Agronomy, Madison, WI, pp. 211–247.
Burton, J.W., 1989. Breeding soybean cultivars for increased seed protein percentage. In: Conferencia Mundial de Investigacion en Soja IV. Proceedings. AASOJA, Buenos Aires, pp. 1079–1085.
Burton, J.W., 1991. Development of high-yielding high-protein soybean germplasm. In: Wilson, R.F. (Ed.), Designing Value-added Soybeans for Markets of the Future. American Oil Chemists Society, Champaign, IL, pp. 109–117.
Chatterjee, S., Hadi, A.S., 2006. Regression Analysis by Example, fourth ed. John Wiley & Sons, Inc., New York.
Cocu, N., Harrington, R., Rounsevell, M.D.A., Worner, S.P., Hullé, M., 2005. Geographical location, climate and land use influences on the phenology and numbers of the aphid, Myzus persicae, in Europe. J. Biogeogr. 32, 615–632.
Cure, J.D., Patterson, R.P., Raper Jr., C.D., Jackson, W.A., 1982. Assimilate distribution in soybeans as affected by photoperiod during seed development. Crop Sci. 22, 1245–1250.
Dormann, C.F., 2007. Effects of incorporating spatial autocorrelation into the analysis of species distribution data. Glob. Ecol. Biogeogr. 16, 129–138.
Dormann, C.F., McPherson, J.M., Araújo, M.B., Bivand, R., Bolliger, J., Carl, G., Davies, R.G., Hirzel, A., Jetz, W., Kissling, W.D., Kühn, I., Ohlemüller, R., Peres-Neto, P.R., Reineking, B., Schröder, B., Schurr, F.M., Wilson, R., 2007. Methods to account for spatial autocorrelation in the analysis of species distributional data. A review. Ecography 30, 609–628.
Dornbos Jr., D.L., Mullen, R.E., 1992. Soybean seed protein and oil contents and fatty acid composition adjustments by drought and temperature. J. Am. Oil Chem. Soc. 69, 228–231.
Erikson, L.R., Beversdorf, W.D., Ball, S.T., 1982. Genotype × environment interactions for protein in Glycine max × Glycine soja crosses. Crop Sci. 22, 1099–1101.
Gibson, L.R., Mullen, R.E., 1996. Soybean seed composition under high day and night growth temperatures. J. Am. Oil Chem. Soc. 73, 733–737.
Hijmans, R.J., Cameron, S.E., Parra, J.L., Jones, P.G., Jarvis, A., 2005. Very high resolution interpolated climate surfaces for global land areas. Int. J. Climatol. 25, 1965–1978.
Hu, M.X., Yu, D.X., Meng, X.X., 1990. The effect of different ecogeographic environment on the seed quality of soybeans in China. Soybean Sci. 9, 39–49.
Hurburgh Jr., C.R., Brumm, T.J., Guinn, J.M., Hartwig, R.A., 1990. Protein and oil patterns in U.S. and world soybean markets. J. Am. Oil Chem. Soc. 67, 966–973.
Ji, L., Peters, A.J., 2004. A spatial regression procedure for evaluating the relationship between AVHRR-NDVI and climate in the northern Great Plains. Int. J. Remote Sens. 25, 297–311.
Jian, X., Olea, R.A., Yu, Y.S., 1996. Semivariogram modeling by weighted least squares. Comput. Geosci. 22, 387–397.
Jiang, Y., Wu, C., Zhang, L., Hu, P., Hou, W., Zu, W., Han, T., 2011. Long-day effects on the terminal inflorescence development of a photoperiod-sensitive soybean [Glycine max (L.) Merr.] variety. Plant Sci. 180, 504–510.
Jin, J., Liu, X., Wang, G., Mi, L., Shen, Z., Chen, X., Herbert, S.J., 2010. Agronomic and physiological contributions to the yield improvement of soybean cultivars released from 1950 to 2006 in Northeast China. Field Crop. Res. 115, 116–123.
Kühn, I., 2007. Incorporating spatial autocorrelation may invert observed patterns. Divers. Distrib. 13, 66–69.
Kumar, V., Rani, A., Solanki, S., Hussain, S.M., 2006. Influence of growing environment on the biochemical composition and physical characteristics of soybean seed. J. Food Compos. Anal. 19, 188–195.
Kutner, M.H., Nachtsheim, C.J., Neter, J., Li, W., 2005. Applied Linear Statistical Models, fifth ed. McGraw-Hill, Boston.
Liu, K.S., 1997. Soybeans: Chemistry, Technology, and Utilization. Aspen Publishing, New York.
Ma, Z.P., Wang, J.J., Li, J.W., Yang, Y., Hao, X.P., 2008. Analysis on the climatic characteristics in recent fifty years in Wuzhai County. Sci.-Tech. Inf. Dev. Econ. 18, 153–154.
Martijena, N.E., Bullock, S.H., 1997. Geographic variation in seed mass in the chaparral shrub Heteromeles arbutifolia (Rosaceae). Southwest. Nat. 42, 119–121.
Miladinovic, J., Kurosaki, H., Burton, J.W., Hrustic, M., Miladinovic, D., 2006. The adaptability of short-season soybean genotypes to varying longitudinal regions. Eur. J. Agron. 25, 243–249.
Olea, R.A., 2006. A six-step practical approach to semivariogram modeling. Stoch. Environ. Res. Risk 20, 307–318.
Peres-Neto, P.R., Legendre, P., 2010. Estimating and controlling for spatial structure in the study of ecological communities. Glob. Ecol. Biogeogr. 19, 174–184.
Pipolo, A.E., Sinclair, T.R., Camara, G.M.S., 2004. Protein and oil concentration of soybean seed cultured in vitro using nutrient solutions of differing glutamine concentration. Ann. Appl. Biol. 144, 223–227.
Sale, P.W.G., Campbell, L.C., 1986. Yield and composition of soybean seed as a function of potassium supply. Plant Soil 96, 317–325.
Sato, K., Ikeda, T., 1979. The growth responses of soybean to photoperiod and temperature. IV. The effect of temperature during the ripening period on the yield and characters of seeds. Jpn. J. Crop Sci. 48, 283–290.
Schroeder, L.D., Sjoquist, D.L., Stephan, P.E., 1986. Understanding Regression Analysis: An Introductory Guide, Sage University Paper Series on Quantitative Applications in the Social Sciences, series no. 07-057. Sage Publications, Beverly Hills, CA.
Sionit, N., Strain, B.R., Flint, E.P., 1987. Interaction of temperature and CO2 enrichment on soybean: photosynthesis and seed yield. Can. J. Plant Sci. 67, 629–636.
Suyantohadi, A., Hariadi, M., Purnomo, M.H., Morimoto, T., 2010. Dynamic neural network model for identifying cumulative responses of soybean plant growth based on nitrogen fertilizer compositions. Aust. J. Agric. Eng. 1, 188–193.
Wang, H.L., Chen, S.G., Xiang, S.P., Hao, X.Y., Ma, H., 2007. Effects of climate factors on the relative contents of major storage protein fractions and its subunits in soybean seeds. Chin. J. Oil Crop Sci. 29, 431–437.
Yang, J., Blanchar, R.W., Hammer, R.D., Thompson, A.L., 1996. Soybean growth and rhizosphere pH as influenced by A horizon thickness. Soil Sci. Soc. Am. J. 60, 1901–1907.
Zu, S.H., 1983. The agroclimatic analysis on the oil content of soybean and its geographical distribution in Heilongjiang Province. Soybean Sci. 2, 266–276.
