Estuarine, Coastal and Shelf Science 236 (2020) 106656
Estuarine, Coastal and Shelf Science journal homepage: http://www.elsevier.com/locate/ecss
A basic end-member model algorithm for grain-size data of marine sediments
Xiaodong Zhang *, Hongmin Wang, Shumei Xu, Zuosheng Yang
Key Lab of Submarine Geosciences and Prospecting Techniques, MOE and College of Marine Geosciences, Ocean University of China, Qingdao, 266100, PR China
ARTICLE INFO
Keywords: End-member unmixing; End-member space; Basic end-members; Genetic algorithm; Grain size distribution
ABSTRACT
End-member (EM) unmixing algorithms are widely used in the earth sciences for unmixing compositional data such as grain-size data of sediments. However, many unmixing algorithms rely only on goodness-of-fit measures, which justify the EMs without regard to geological feasibility, and others tend to find EMs that enclose the sample points as tightly as possible, so that the calculated EMs are still mixed products, especially when unmixing highly mixed datasets. This paper proposes that EM unmixing algorithms should search for the Basic-EMs, i.e., the outermost points in the EM space. EMs that can be unmixed mathematically can also be purified by physical processes, i.e., the actual EMs are most likely the Basic-EMs. This paper also introduces a basic end-member model algorithm (BasEMMA) that uses genetic algorithms, which mimic natural evolution processes, to seek the Basic-EMs. Evaluations using both artificial and actual grain-size data show that BasEMMA can accurately find the Basic-EMs. This paper also introduces a procedure for determining the appropriate EM number, a problem that has plagued previous researchers. In summary, this paper introduces a new way to determine geologically feasible EMs, a new EM unmixing algorithm and a new method to determine the appropriate EM number.
1. Introduction
The grain size distribution (GSD) of sediments exhibits multiple peaks that reflect different transport modes and sedimentation processes (Middleton, 1976; Ashley, 1978); therefore, unmixing the GSD of sediments helps reveal the provenance, transport and settlement of sediments (Weltje and Prins, 2007). Unmixing compositional data (subject to nonnegativity and sum-to-one constraints) such as the GSD into constituent end-members (EMs) and their abundances is a key problem not only in sedimentology (Weltje, 1997; Shang et al., 2016; Liang et al., 2019; Wen et al., 2019; Torre et al., 2020) but also in hydrology, geochemistry and petrology (Myers et al., 1987; Geen et al., 1988; Hannigan et al., 2001; Lourens et al., 2001; Wyatt and Mcsween, 2002; Laumonier et al., 2014; Heslop, 2015; Li et al., 2016; Kim et al., 2017; Derrien et al., 2018).
EM unmixing is inherently nonunique: an infinite set of EMs can bound all of the mixed samples (Fig. 1A, Heslop, 2015). Some researchers have advised that the calculated EMs should be in close proximity to the samples, from which the Minimum-EMs can be obtained (Weltje and Prins, 2007; Paterson and Heslop, 2015). If a dataset consists of samples that are widely distributed across the EM space, the Minimum-EMs will be close to the actual EMs. However, when the samples are highly mixed (abundances are never 100% or 0%; van Hateren et al., 2018), a single extreme sample point, especially a point at the edge of the sample cloud, can greatly affect the position of the Minimum-EMs, resulting in undesirable results (Fig. 1, revised after Heslop, 2015).
We propose that EM unmixing algorithms should search for the Basic-EMs, i.e., the outermost points in EM space (Fig. 1). Any EM tuple in EM space can be linearly composed from the Basic-EMs, meaning that the EM tuple is still a mixed product of different provenances. The mixed product will finally be purified by different “dynamic populations” (Weltje and Prins, 2003). Therefore, the mixed product is unlikely to represent the actual EMs; the actual EMs are most likely the Basic-EMs, which cannot be further unmixed. Even if researchers wish to study the intermediate states of the sediments along their transport path, they can study these from the distribution of samples in the entire EM space formed by the Basic-EMs instead of only in a local EM space formed by Random-EMs or by Minimum-EMs (Fig. 1). We have studied the basic end-member model algorithm (BasEMMA)
* Corresponding author. E-mail addresses:
[email protected] (X. Zhang),
[email protected] (H. Wang),
[email protected] (S. Xu),
[email protected] (Z. Yang). https://doi.org/10.1016/j.ecss.2020.106656 Received 1 August 2019; Received in revised form 8 February 2020; Accepted 19 February 2020 Available online 24 February 2020 0272-7714/© 2020 Elsevier Ltd. All rights reserved.
(Goldberg, 1989). A genetic search starts with a population of randomly generated initial solutions, and each solution is then evaluated by a fitness function. The solutions are duplicated or eliminated depending on their fitness values, and new generations of solutions are created by applying genetic operators. This method eventually leads to high-performing solutions (Goldberg, 1989; Awadh et al., 1995).
2.3.2. The operation of BasEMMA
The operation of BasEMMA is shown in Fig. 2. The solution to Equation (1) includes matrices A, B and E. However, if all of these matrices are included in the solution, there are too many variables to search, resulting in a long search time. Therefore, BasEMMA includes only matrix B in the solution, uses Equation (2) to estimate A, and ignores the effect of E.
$$A = D B^{T} \left( B B^{T} \right)^{-1} \tag{2}$$
Fig. 1. Nonuniqueness of the calculated EMs (A) and the Minimum-EMs susceptible to extreme samples (B).
since 2006 and have used BasEMMA for the unmixing of grain-size data from marine sediments (Zhang et al., 2006a, 2006b, 2016; Zhang and Feng, 2009; Meng et al., 2014). In this paper, we introduce BasEMMA in detail, evaluate it using artificial test datasets and propose a new criterion for determining the appropriate EM number. Finally, we present a case in which BasEMMA is used to unmix grain-size data of marine sediments.
In Equation (2), A is the abundance matrix, D is the sample data matrix, B is the EM matrix, $B^{T}$ is the transpose of B, and $(B B^{T})^{-1}$ is the inverse of $B B^{T}$. Since the sample data contain the information of the actual EMs, BasEMMA randomly selects combinations of different sample data to initialize the solutions, which speeds up the search process. The fitness function (Equation 3) is composed of two parts and is used to assess the performance of each solution.
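The abundance estimate in Equation (2) is an ordinary least-squares projection of the samples onto the EMs. The following is a minimal pure-Python sketch of that step, not the authors' VBA implementation; the helper names (`matmul`, `transpose`, `invert`, `estimate_abundances`) and the small example matrices are assumptions chosen for illustration.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def invert(M):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("B B^T is singular; the EMs may be linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def estimate_abundances(D, B):
    """Equation (2): A = D B^T (B B^T)^-1, with the error matrix E ignored."""
    Bt = transpose(B)
    return matmul(matmul(D, Bt), invert(matmul(B, Bt)))


# Hypothetical example: 2 EMs over 3 size grades, 2 samples mixed without error.
B = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
A_true = [[0.25, 0.75], [0.60, 0.40]]
D = matmul(A_true, B)
A_est = estimate_abundances(D, B)  # recovers A_true up to rounding
```

This plain least-squares estimate does not by itself enforce nonnegativity or sum-to-one on the abundances; for error-free mixtures of linearly independent EMs, as in the example, it returns the original abundances.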
2. Linear end-member model for compositional data and its associated algorithms
2.1. Linear end-member model
Unmixing techniques assume that the sample data can be described as a linear mixture of EMs in the EM space, which is an (X-1)-dimensional simplex constrained by nonnegativity and sum-to-one for X EMs. For example, a simplex for 3 EMs is a 2-dimensional triangle and a simplex for 4 EMs is a 3-dimensional tetrahedron. Mathematically, the unmixing problem can be expressed in matrix notation as:
$$D = AB + E \tag{1}$$
$$
\begin{cases}
\text{Fitness function} = E\_Factor - D\_Factor \\
E\_Factor = \sum_{i=1}^{n} \sum_{j=1}^{p} \left| D_{ij} - \sum_{k=1}^{q} A_{ik} \times B_{kj} \right| \times 100 / n \\
D\_Factor = \sum_{i=1}^{q} \sum_{j=i+1}^{q} \sum_{k=1}^{p} \left| B_{ik} - B_{jk} \right| / \left( q \times (q - 1) \right) / 10
\end{cases}
\tag{3}
$$
where E_Factor is the error factor and D_Factor is the distance factor. D is the sample data, A is the abundance matrix, and B is the EM matrix. n is the sample number, p is the grain-size number, and q is the EM number. E_Factor is a number indicating the mean error between the synthesized GSD and the actual GSD, and E_Factor = 0 means no error. D_Factor is a number indicating the mean distance between EMs, and D_Factor = 0 means that the GSDs of the EMs are identical. Smaller fitness values are better. E_Factor is used to find the solution with a small fitting error, and this ensures that the calculated EMs expand in the EM space in which all sample points reside (Fig. 3). E_Factor loses its ability to further optimize the solution when the best-fit solution is found. Then, BasEMMA uses D_Factor to further optimize the solution while gradually approaching the Basic-EMs (Fig. 3).
The selection, crossover and mutation operators are the main operators in genetic algorithms. The selection operator selects the best solution to replace the poor solutions according to the fitness function.
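The two parts of the fitness function in Equation (3) can be sketched in a few lines of Python. This is an illustrative reading rather than the authors' VBA code: it assumes that the fitness is E_Factor minus D_Factor, that the scaling constants are × 100 / n and / 10, and that GSDs are stored as fractions summing to one. The function names and the toy matrices are hypothetical.

```python
def e_factor(D, A, B):
    """Mean absolute misfit per sample between D and A*B, scaled by 100."""
    n, p, q = len(D), len(D[0]), len(B)
    total = 0.0
    for i in range(n):
        for j in range(p):
            fitted = sum(A[i][k] * B[k][j] for k in range(q))
            total += abs(D[i][j] - fitted)
    return total * 100.0 / n


def d_factor(B):
    """Scaled sum of pairwise L1 distances between EM distributions (needs q >= 2)."""
    q, p = len(B), len(B[0])
    total = 0.0
    for i in range(q):
        for j in range(i + 1, q):
            total += sum(abs(B[i][k] - B[j][k]) for k in range(p))
    return total / (q * (q - 1)) / 10.0


def fitness(D, A, B):
    """Smaller is better: low misfit and widely separated EMs (assumed sign convention)."""
    return e_factor(D, A, B) - d_factor(B)


# One sample that both EM pairs reproduce exactly with abundances (0.5, 0.5).
D = [[0.5, 0.5]]
A = [[0.5, 0.5]]
B_outer = [[1.0, 0.0], [0.0, 1.0]]      # non-overlapping EMs
B_inner = [[0.75, 0.25], [0.25, 0.75]]  # EMs that are still partly mixed

fit_outer = fitness(D, A, B_outer)
fit_inner = fitness(D, A, B_inner)
```

Both EM pairs fit the sample with zero error, so E_Factor cannot tell them apart; the distance term gives the outer pair the smaller fitness value, which illustrates how D_Factor pushes the search toward the outermost points.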
In Equation (1), D represents the sample data, A represents the abundance matrix, B represents the EM matrix and E represents the errors.
2.2. Previously proposed algorithms for solving end-member model
The difficulty in solving Equation (1) is that both matrices A and B, which must satisfy the aforementioned constraints, have to be derived from matrix D. There is no closed-form solution to the equation, which must be solved numerically (Renner, 1993; Heinz and Chang, 2001). Weltje (1997) proposed an EMMA algorithm based on a simplex expansion, which sequentially updates each EM to expand the simplex and reduce the error. Dietze et al. (2012) introduced the EMMAgeo algorithm based on principal component analysis, which seeks the best way to depict data in a low-dimensional EM space. Paterson and Heslop (2015) contributed the AnalySize algorithm under an EM minimum-distance constraint. Other algorithms are NMF by Heslop et al. (2007) and BEMMA by Yu et al. (2016). Based on these unmixing algorithms, many research results have been obtained. However, the evaluation of these unmixing algorithms by van Hateren et al. (2018) showed that many calculated EMs were far from the test EMs. In particular, when unmixing highly mixed datasets, no algorithm yields satisfactory results. Highly mixed datasets, such as the grain-size data from cores sampled in stable sedimentary environments (e.g., marine mud areas), are very common in sedimentology (Zhang et al., 2016; Liu et al., 2017; Li et al., 2019; Liu et al., 2019a,b).
2.3. A basic end-member model algorithm
2.3.1. The principle of BasEMMA
BasEMMA directly solves Equation (1) using genetic algorithms. Genetic algorithms are general-purpose search techniques based on principles inspired by the genetic and evolutionary mechanisms observed in natural systems and the populations of living beings
Fig. 2. Flow chart of BasEMMA.
Sea by Zhang et al. (2016). The GSDs of the 4 Basic-EMs range from 0.32 to 707 μm and are divided into 28 size grades (Appendix B). Four artificial test datasets were produced using the Basic-EMs and randomly generated abundances (Appendix B). Test dataset 1 represents data evenly mixed from Basic-EMs 1–4, and the mean EM abundances are approximately 25%. Test dataset 2 is unevenly mixed from Basic-EMs 1–4, and the mean abundance of Basic-EM1 is approximately 7%. Test dataset 3 represents highly mixed data from Basic-EMs 1–3, and the mean abundances of Basic-EMs 1–3 are 50%, 35% and 15%, respectively. Unlike the noise-free test datasets 1–3, test dataset 4 includes noise: 90% of this dataset is from test dataset 3 and the rest is random error. In addition, the coversand and noisy coversand datasets synthesized from four coversand samples (CS1–4, Appendix B, data from van Hateren et al., 2018) were also used.
Fig. 3. The roles of the error factor and the distance factor when determining the Basic-EMs.
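A dataset of the kind described above can be synthesized directly from Equation (1). The sketch below is an illustration under stated assumptions, not a reproduction of Appendix B: the Gaussian-shaped EMs, the Dirichlet draw for the abundances, the sample count of 50, and the reading of "90% signal, 10% random error" as a 0.9/0.1 blend with a random distribution are all hypothetical choices.

```python
import math
import random


def normalize(row):
    """Rescale a nonnegative vector so that it sums to one."""
    s = sum(row)
    return [v / s for v in row]


def gaussian_em(n_classes, center, width):
    """Hypothetical unimodal EM: a Gaussian bump over the size-class index."""
    return normalize([math.exp(-0.5 * ((k - center) / width) ** 2)
                      for k in range(n_classes)])


def random_abundances(weights, rng):
    """Draw one abundance vector from a Dirichlet distribution via gamma variates."""
    return normalize([rng.gammavariate(w, 1.0) for w in weights])


def mix(A, B):
    """Noise-free form of Equation (1): D = A B."""
    return [[sum(a * em[j] for a, em in zip(row, B)) for j in range(len(B[0]))]
            for row in A]


rng = random.Random(0)
p = 28                                              # size grades, as in the text
B = [gaussian_em(p, c, 2.0) for c in (6, 13, 20)]   # three assumed EMs

# Dirichlet weights chosen so that mean abundances are about 50%, 35% and 15%.
A = [random_abundances((5.0, 3.5, 1.5), rng) for _ in range(50)]
clean = mix(A, B)

# Noisy variant: 90% signal plus 10% of a random distribution (assumed noise model).
noisy = [[0.9 * v + 0.1 * e
          for v, e in zip(row, normalize([rng.random() for _ in range(p)]))]
         for row in clean]
```

Because every EM and every abundance vector sums to one, each synthesized GSD also sums to one, so both the clean and the noisy samples remain valid compositional data.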
3.2. Indexes used to evaluate the unmixing results
It is crucial to obtain accurate GSDs of the EMs; therefore, we use the mean distance between the calculated EMs and the Basic-EMs (MDEM) to evaluate whether the GSDs of the calculated EMs are sufficiently accurate:
BasEMMA uses the best solution to replace the worst 10% of the solutions. The crossover operator exchanges some data between different solutions; it is the main method for generating new solutions and determines the global search capability of the algorithm. BasEMMA randomly picks one EM from 20% of the solutions for exchange with the corresponding EM of another randomly selected solution. The mutation operator fine-tunes some data in an individual solution; it is an auxiliary method for generating new solutions and determines the local search capability. BasEMMA randomly selects one EM from 50% of the solutions and then adds a random value between −0.01 and 0.01 to a randomly selected grain-size grade. The GSD of each EM is normalized after the mutation operation since the result might violate the aforementioned restrictions of nonnegativity and sum-to-one.
BasEMMA is embedded in the popular Microsoft Excel program using VBA (Visual Basic for Applications, Appendix A), and the intermediate and final results are displayed in real time using Excel graphs. The authors advise that BasEMMA should be performed twice. First, the parameters “EM number from 2 to 5”, “Maximum generation number = 30” and “Population number = 300” are used to perform a rough and quick analysis. Second, after the appropriate EM number has been determined, a thorough but slow analysis should be performed using parameters such as “EM number from 3 to 3” and “Maximum generation number = 300”. The “Grain size number” and “Sample number” depend on the data involved.
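The three operators described above can be sketched as follows. This is a simplified Python illustration rather than the VBA code of Appendix A: the fractions 10%, 20% and 50% and the ±0.01 step come from the text, while the function names, the data layout (a solution is a list of EMs, each EM a list of fractions), the clipping of negative values before renormalization and the placeholder scores are assumptions.

```python
import copy
import random


def normalize(row):
    """Rescale a nonnegative vector so that it sums to one."""
    s = sum(row)
    return [v / s for v in row]


def select(population, scores, frac=0.10):
    """Overwrite the worst `frac` of solutions with copies of the best (lowest score)."""
    order = sorted(range(len(population)), key=lambda i: scores[i])
    best = population[order[0]]
    n_replace = int(len(population) * frac)
    for i in order[len(order) - n_replace:]:
        population[i] = copy.deepcopy(best)


def crossover(population, rng, frac=0.20):
    """Swap one randomly chosen EM between a solution and a random partner."""
    n = len(population)
    for i in rng.sample(range(n), int(n * frac)):
        j = rng.randrange(n)
        k = rng.randrange(len(population[i]))
        population[i][k], population[j][k] = population[j][k], population[i][k]


def mutate(population, rng, frac=0.50, step=0.01):
    """Perturb one size grade of one EM, then restore nonnegativity and sum-to-one."""
    n = len(population)
    for i in rng.sample(range(n), int(n * frac)):
        em = rng.choice(population[i])
        g = rng.randrange(len(em))
        em[g] = max(0.0, em[g] + rng.uniform(-step, step))
        if sum(em) > 0:
            em[:] = normalize(em)


rng = random.Random(42)
# 20 hypothetical solutions, each holding 2 EMs over 4 size grades.
population = [[normalize([rng.random() for _ in range(4)]) for _ in range(2)]
              for _ in range(20)]
scores = [rng.random() for _ in population]  # placeholder fitness values

best_before = copy.deepcopy(population[min(range(20), key=lambda i: scores[i])])
worst_index = max(range(20), key=lambda i: scores[i])
select(population, scores)
worst_replaced = population[worst_index] == best_before
crossover(population, rng)
mutate(population, rng)
```

In a full run these three calls would sit inside a loop over generations, with the scores recomputed from the fitness function after each generation.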
$$MDEM = \sum_{i=1}^{q} \sum_{j=1}^{p} \left| Inverted\_EM_{ij} - Basic\_EM_{ij} \right| / q \tag{4}$$
where p is the grain-size number and q is the EM number. MDEM = 0 means that the two sets of EMs are identical, and MDEM = 2 means there is no overlap at all. The mean R² value between the calculated abundances and the test abundances (MRA) is used to evaluate the similarity between the calculated and test abundances. MRA = 0 means that the abundances are completely different, and MRA = 1 means that the abundances are identical. In addition, the median value of the class-wise R² values between the calculated grain-size data and the original data (MCR) is used to compare the accuracy of BasEMMA and previous unmixing algorithms.
3.3. The results of end-member modeling for artificial grain-size datasets
BasEMMA was performed twice for unmixing test datasets 1–4, based on “Maximum generation number” values of 30 and 100, respectively. The results are shown in Table 1 and in Appendix C. For the noise-free test datasets 1 and 3, BasEMMA accurately reproduced the Basic-EMs and their respective abundances after 30 generations. However, it took BasEMMA 100 generations to find satisfactory results for the unevenly mixed test dataset 2, which might be because the Basic-EMs are far from the sample points of test dataset 2 in the EM space. For the noisy test dataset 4, the MDEM is slightly larger than for the other test datasets, even with a “Maximum generation number” of 100; the main grain-size peaks of the calculated EMs based on 3 EMs are slightly lower than the peaks of the corresponding Basic-EMs (Appendix C, Fig. 4E), indicating the effects of random error. However, the MRA is 1, indicating that the signals contained in the abundance sequences are accurately extracted (Appendix C, Fig. 4H).
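The two evaluation indexes defined above are simple to compute. The sketch below assumes that the calculated EMs have already been matched to the Basic-EMs in the same order and that R² is the squared Pearson correlation coefficient; both are assumptions, since the text does not spell them out, and the function names and example values are hypothetical.

```python
def mdem(calculated, basic):
    """Equation (4): summed absolute difference between matched EMs, divided by q."""
    q = len(basic)
    return sum(abs(c - b)
               for calc_em, basic_em in zip(calculated, basic)
               for c, b in zip(calc_em, basic_em)) / q


def r_squared(x, y):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def mra(calculated_A, test_A):
    """Mean R² over EMs; abundance matrices are laid out as samples x EMs."""
    pairs = list(zip(zip(*calculated_A), zip(*test_A)))
    return sum(r_squared(c, t) for c, t in pairs) / len(pairs)


# Two EMs over three size grades, with GSDs stored as fractions.
basic = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
disjoint = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]  # no overlap with either Basic-EM

mdem_identical = mdem(basic, basic)    # 0.0
mdem_disjoint = mdem(disjoint, basic)  # 2.0

test_A = [[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]]
mra_identical = mra(test_A, test_A)
```

The two MDEM values reproduce the limits stated in the text: 0 for identical EM sets and 2 for EM sets with no overlap at all.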
3. Testing optimality of BasEMMA using artificial grain-size datasets
3.1. The generation of artificial grain-size datasets
Fig. 4 shows four Basic-EMs that were revised after the calculated EMs from the grain-size data of the surface sediment in the South Yellow
Table 1. Mean distances between the calculated EMs and the Basic-EMs (MDEM) and the mean R² values between the calculated abundances and the test abundances (MRA) of test datasets 1–4, for maximum generation numbers of 30 and 100.
Test dataset | MDEM (30 generations) | MDEM (100 generations) | MRA (30 generations) | MRA (100 generations)
Test dataset 1 | 0.018 | 0.012 | 1 | 1
Test dataset 2 | 0.059 | 0.006 | 1 | 1
Test dataset 3 | 0.009 | 0.005 | 1 | 1
Test dataset 4 | 0.079 | 0.075 | 1 | 1
Fig. 4. The four Basic-EMs used in this paper.
If the small amount of Basic-EM 1 contained in the unevenly mixed test dataset 2 is treated as error, the unmixing of test dataset 2 based on 3 EMs provides an example for evaluating the effects of nonrandom error (or a weak signal). The main GSD peaks of the calculated EMs based on 3 EMs are slightly lower than the peaks of the corresponding Basic-EMs (Appendix C, Fig. 2E). The MRA is 0.99, indicating that the signals contained in the abundance sequences were extracted almost perfectly. In summary, BasEMMA is capable of unmixing all of the test datasets.
3.4. Optimality evaluation of the new algorithm
As noted above, van Hateren et al. (2018) evaluated five EM unmixing algorithms using coversand datasets. We also unmixed the coversand datasets using BasEMMA and found four EMs, which we name Basic-CS1–4 because the EMs calculated by BasEMMA are Basic-EMs (see the Excel sheet “Coversand EMs” in Appendix B for details). The MCR of the unmixing is 1 (Table 2), meaning that the error is zero. Nevertheless, there are large differences between the GSDs of the Basic-CSs and the CSs: Basic-CS3 and 4 show more prominent peaks (Fig. 5A). Further study shows that the CSs can be accurately reproduced by mixing the Basic-CSs, with R² = 1 (Fig. 5B and C; see the Excel sheet “Coversand EMs” in Appendix B for details). For example: CS3 = 0.010 * Basic-CS1 + 0.274 * Basic-CS2 + 0.403 * Basic-CS3 + 0.313 * Basic-CS4. Although CS3 has a single GSD peak, it is still a mixed product because the modes of Basic-CS 2–4 are very close.
If the coversand dataset is regarded as produced not by the CSs but by the Basic-CSs, it can be viewed as a highly mixed dataset. The unmixing of the coversand dataset based on 4 EMs by van Hateren et al. (2018) therefore provides a good case for evaluating EM unmixing algorithms on highly mixed grain-size data. The GSDs of the EMs calculated by the aforementioned unmixing algorithms based on 4 EMs vary greatly (Fig. 6). Fig. 5B and C show that there is a large space outside the tetrahedron formed by the CSs in the EM space formed by the Basic-CSs, especially on the Basic-CS3 side. The existence of such a large space outside the tetrahedron means that many different EM tuples can be obtained that accurately reproduce the coversand dataset. It is therefore impractical to expect previous unmixing algorithms to recover the CSs exactly. The MCR results show that the calculated EMs from BasEMMA, AnalySize, NMF and EMMAgeo can accurately reproduce the coversand dataset although the GSDs of the calculated EMs vary greatly, while the EMMA and BEMMA results show large errors (Table 2). However, some of the calculated EMs from AnalySize, NMF, EMMAgeo and EMMA have obvious bimodal or multimodal GSDs, indicating that they are mixed products (Fig. 6). Although some of the calculated EMs have a single peak, their peaks are lower than those of the Basic-CSs, indicating that they are also mixed products, similar to CS3 as shown above (Fig. 5A). The calculated EM2 from EMMA has a higher peak than Basic-CS2, indicating that it exceeds the EM space. In summary, most of the EMs calculated by the previous unmixing algorithms can reproduce the coversand dataset well, but they are still mixed products.
Fig. 5. Basic-EMs of the coversand dataset (A) and the distribution of coversand samples and the CSs in the EM space formed by the Basic-CSs (B and C). The coversand dataset is from van Hateren et al. (2018).
3.5. A new criterion for determining the appropriate EM number
Previous methods for determining the appropriate EM number are based mainly on statistical results (Weltje, 1997), such as the inflection point on the class-wise R² curve (Prins et al., 2000; Liang et al., 2019; Wen et al., 2019; Torre et al., 2020). However, the evaluation by van Hateren et al. (2018) shows that the method using the inflection point may be incorrect. Our results also show an exception: the class-wise R² curve of test dataset 1 (synthesized from 4 EMs) has a significant inflection point when the EM number is 3 (Appendix C, Fig. 1C). The criteria for determining the appropriate EM number therefore need further study.
We recommend a new criterion when using BasEMMA: the stability of the GSD of the newly found EM. This criterion is based on the following results: (1) the GSDs of the calculated EMs in multiple operations are stable when the EM number is equal to the actual number (Appendix C, Figs. 1F, 2F, 3E and 4E); (2) the GSD of the newly found EM in multiple operations is not stable when the EM number is greater than the actual number (Fig. 7). This instability may arise because the signals in the remaining information are not strong enough to overcome the effects of noise.
4. A case of application of BasEMMA
4.1. Grain-size data of sediments in the East China Sea
The grain-size data of the sediments from Core 30 were further used to demonstrate the use of BasEMMA and how to interpret the unmixing results. Core 30 was sampled in the East China Sea in June 2003. Cores 8 and 17 and thirty surface sediment samples collected on the same voyage were also used to explain the geological implications of the unmixed results from the grain-size data of Core 30. The sampling locations of Cores 8, 17 and 30 are shown in Fig. 9C, and their lengths are 280 cm, 146 cm and 236 cm, respectively. The cores were divided into subsamples at 2 cm intervals except for the 0–6 cm interval of Core 30, which was divided into 1 cm intervals. From each sample, approximately 0.2 g of sediment was first digested with 15 mL of H2O2 (15%) and 5 mL of diluted HCl (0.2 mol/dm³) for 24 h, and distilled water was later added for centrifuging. The sample was then ultrasonically vibrated for 15 s. The grain-size analyses were performed using a
Table 2. Median values of the class-wise R² between the calculated grain-size data and the coversand datasets.
Algorithm | Coversand | Noisy coversand
BasEMMA | 1 | 0.9
AnalySize | 1 (a) | 0.9 (a)
NMF | 1 (a) | 0.9 (a)
EMMAgeo | 1 (a) | 0.9 (a)
EMMA | 0.5 (a) | 0.6 (a)
BEMMA | Failed (a) | 0.7 (a)
(a) Data from van Hateren et al. (2018).
Fig. 6. GSD comparisons of the calculated-EMs from different EM unmixing algorithms. The GSDs of EMs calculated from AnalySize, EMMAgeo, DRS-unmixer and EMMA are from van Hateren et al. (2018).
Fig. 7. Instability of newly found EM. The GSDs of the newly found EM in multiple operations on test dataset 1 (A), test dataset 2 (B), test dataset 3 (C), and test dataset 4 (D) when the EM number is greater than the actual number.
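The stability criterion of Section 3.5 is judged from plots such as Fig. 7, but the same idea can be expressed numerically. The sketch below is one possible measure and not part of BasEMMA: it takes the GSD of the newly found EM from several independent runs and reports the mean pairwise L1 distance between them. The function name and the example GSDs are hypothetical, and no decision threshold is implied.

```python
from itertools import combinations


def mean_pairwise_distance(gsds):
    """Mean L1 distance between every pair of GSDs from repeated runs."""
    pairs = list(combinations(gsds, 2))
    return sum(sum(abs(a - b) for a, b in zip(x, y)) for x, y in pairs) / len(pairs)


# Hypothetical GSDs (three size grades) of the newest EM from three runs each.
stable_runs = [
    [0.10, 0.80, 0.10],
    [0.10, 0.80, 0.10],
    [0.12, 0.78, 0.10],
]
unstable_runs = [
    [0.80, 0.10, 0.10],
    [0.10, 0.10, 0.80],
    [0.10, 0.80, 0.10],
]

spread_stable = mean_pairwise_distance(stable_runs)
spread_unstable = mean_pairwise_distance(unstable_runs)
```

A spread close to zero corresponds to the stable case in which the EM number equals the actual number; a large spread corresponds to the unstable, redundant EM found when the EM number is too high.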
Malvern 2000 laser grain-size analyzer with a detection range of 0.02–2,000 μm.
caused by aggregation effects related to biological activities in the East China Sea (Lei et al., 2001). Our EM unmixing results also show that it is difficult to find the EM that corresponds to the very fine sediments (Fig. 8). Therefore, we believe that 3 EMs is a reasonable choice for unmixing the grain-size data of Core 30. Finally, assuming 3 EMs, we performed the second unmixing using BasEMMA with a “Maximum generation number” of 300.
4.2. The operational process
BasEMMA was first performed for unmixing the grain-size data of Core 30 using a “Maximum generation number” of 30. The mean class-wise R² curve has an inflection point at 3 EMs; however, the class-wise R² values based on 3 EMs are low for the grain-size grades of <2 μm and >300 μm (Fig. 8). Therefore, it is difficult to determine the appropriate EM number based only on the statistical results. Multiple operations based on 4 EMs show that the GSDs of the newly found EM are unstable (Fig. 8D). Therefore, we believe that the signals contained in the grain-size grades of <2 μm and >300 μm are too weak to be extracted. Previous research shows that the sedimentation process of the very fine sediments of <1 μm, which belong to clay, might be
4.3. The geological implications of the calculated EMs and their abundances
4.3.1. Sediment transport information
The calculated EM1–3 from the grain-size data of Core 30 exhibit dominant peaks at 8, 16, and 125 μm, respectively (Fig. 9A). EM3 has a secondary peak at 1 μm, while EM1 and 2 have fine tails at grain sizes less than 1 μm. All sample points for Core 30 cluster on the lower left
Fig. 8. The unmixing results of grain-size data from Core 30. The class-wise R2 values between the calculated grain-size data and original data (A), sample-wise R2 values between the calculated grain-size data and original data (B), class-wise and sample-wise mean R2 values versus EM number (C), and redundant EMs in multiple operations based on 4 EMs (D).
Fig. 9. The unmixing results of the grain-size data from Cores 8, 17 and 30 and the surface sediment in the East China Sea. The GSDs of the calculated EMs of the grain-size data from Core 30 (A), the sample distribution in EM space (B), and the spatial distributions of EM1 (C), EM2 (D), and EM3 (E) calculated from the grain-size data of the surface sediments in the East China Sea (data from Zhang et al., 2006b).
side of the ternary diagram, with most of the samples lying on the straight line between EM1 and 2 (Fig. 9B), showing a highly mixed feature. Core 30 is located at the far end of the southward transport path of the sediments that were mainly discharged by the Yangtze River (Liu et al., 2007; Xu et al., 2012; Jia et al., 2018). Previous research on the grain-size data of the surface sediments in the East China Sea using BasEMMA shows that the spatial distributions of EM1 and 2 with dominant modes at 10 and 32 μm (corresponding to EM1 and 2 of Core 30) demonstrate the southward transport of terrigenous sediments (Zhang et al., 2006b). On the way south, coarser EM2 is preferentially deposited, resulting in the higher EM2 and lower EM1 abundances in the north; the opposite is observed in the south, showing a proximal-to-distal fining trend (Fig. 9C and D, Zhang et al., 2006b). The spatial distribution of EM3 with a dominant mode at 125 μm (corresponding to the EM3 of Core 30) reflects the transport of the resuspended relict sediments from the mid-shelf to the inner shelf during short-term climate events such as storms (Fig. 9E, Zhang et al.,
2006b). Similar phenomena also occur in the South China Sea (Xu et al., 2017), the middle Okinawa Trough (Li et al., 2019) and the inner shelf of Heini Bay (Liu et al., 2019a,b). Using Equation (2), the grain-size data from the surface sediments and from Cores 8 and 17 were projected to the EM space formed by the calculated EMs of Core 30 (Fig. 9B). The sample points spread throughout the EM space although they are still densely distributed between EM1 and 2. The EM median for Core 30 is closer to the fine EM1 than for Core 8, indicating the proximal sedimentary characteristics. The ratio between EM1 and EM2 of Core 17, located approximately 130 km southeast of the Yangtze Estuary, is comparable to that of Core 30, located approximately 250 km south of the Yangtze Estuary, indicating the anisotropy in different transport directions of the terrigenous sediments (Zhang et al., 2006b; Liu et al., 2007, 2019a,b; Xu et al., 2012; Li et al., 2016; Jia et al., 2018). If we draw a ternary diagram using Minimum-EMs (refer to the red dashed lines in Fig. 9B) instead of the Basic-EMs, the samples will scatter throughout the local EM space, which is not conducive to studying the transport phase of Core 30 in the entire sediment transport system.
4.3.2. Climate change information
The southward transport of sediments in the East China Sea is mainly controlled by coastal currents that are driven chiefly by the East Asian winter monsoon, and this monsoon also controls China’s climate changes (Xiao et al., 2004, 2005; Xu et al., 2017). Therefore, the grain-size changes seen for the Core 30 sediments may reflect paleoclimatic changes in China. We use EM2/(EM1 + EM2) as the climate indicator for Core 30. The impact of EM3 is filtered out because it does not reflect the southward transport of terrestrial sediments as described above.
When this climate indicator is large, there are more coarse constituents in the sediments, corresponding to strong coastal currents and winter monsoons and to lower temperatures, and vice versa. According to the AMS 14C ages of adjacent cores since 4,000 years BP (e.g., Cores DD2, 30 and PC6 in Xu et al., 2012), the deposition rate in this area was approximately 0.06–0.17 cm/year, with an average value of 0.11 cm/year. We compared the climate indicator from Core 30 with the paleoclimate change curve of China derived by Zhu (1973) and found that a comparison based on a deposition rate of 0.10 cm/year gives good agreement (Fig. 10). The main warm and cold periods in Chinese history show good correspondence to the low and high values of the climate indicator from Core 30. The paleoclimate change curve of China by Zhu (1973) was derived from archeological and historical documents, so this curve has better time accuracy. The good correspondence between the above two climate change curves not only further confirms the above sediment transport mechanism in the East China Sea but also reveals the potential of using grain-size data from sediments to accurately determine deposition rates in stable sedimentary environments (Zhang et al., 2016; Liu et al., 2017; Li et al., 2019; Liu et al., 2019a,b).
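The climate indicator and the depth-to-time conversion described above amount to two short formulas. The sketch below illustrates them in Python; the function names and the sample abundances are hypothetical, and the constant deposition rate of 0.10 cm/year is the value adopted in the text for the comparison with Zhu (1973).

```python
def climate_indicator(em1, em2):
    """EM2 / (EM1 + EM2); EM3 is left out, as described in the text."""
    if em1 + em2 <= 0:
        raise ValueError("EM1 + EM2 must be positive")
    return em2 / (em1 + em2)


def years_before_sampling(depth_cm, rate_cm_per_year=0.10):
    """Convert core depth to elapsed time, assuming a constant deposition rate."""
    return depth_cm / rate_cm_per_year


# Hypothetical down-core samples: (depth in cm, EM1, EM2, EM3 abundances).
samples = [
    (10, 0.60, 0.30, 0.10),
    (100, 0.45, 0.45, 0.10),
    (236, 0.70, 0.20, 0.10),
]

series = [(years_before_sampling(depth), climate_indicator(em1, em2))
          for depth, em1, em2, _ in samples]
```

Under this rate the full 236 cm of Core 30 spans roughly 2,360 years before the sampling date, and larger indicator values mark intervals with relatively more of the coarser EM2.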
Fig. 10. (A) Chinese temperature variation curves inferred by Zhu (1973), and (B) climate change information deduced from the grain-size data from Core 30 using BasEMMA.
mixed data such as the grain-size data from cores sampled in stable sedimentary environments. The GSDs of the calculated EMs by BasEMMA in multiple operations will not be exactly the same: the GSDs are stable when the EM number is equal to the actual number, and the GSDs of the newly found EMs are not stable when the EM number is greater than the actual number. This feature can be used to determine the appropriate EM number.
Computer code availability
The code of the algorithm and supplementary data can be found at https://github.com/ouczxd/BasEMMA.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
CRediT authorship contribution statement
Xiaodong Zhang: Conceptualization, Methodology, Software, Formal analysis, Writing - original draft. Hongmin Wang: Data curation, Formal analysis, Writing - review & editing. Shumei Xu: Investigation, Resources. Zuosheng Yang: Writing - review & editing, Supervision.
5. Conclusions
We propose that EM unmixing algorithms should search for the Basic-EMs instead of Random-EMs or Minimum-EMs, because EMs that can be unmixed mathematically can also be purified by physical processes. Studying the GSDs of the Basic-EMs and the distributions of sample points in the EM space formed by the Basic-EMs, rather than in any local EM space formed by the Minimum-EMs or Random-EMs, helps reveal the true sediment provenances and the transport phases in the entire sediment transport system. BasEMMA introduces genetic algorithms for unmixing compositional data, locates the EM space where the samples reside, and searches for the Basic-EMs in the EM space. The unmixing errors of BasEMMA are comparable to those of the state-of-the-art EM unmixing algorithms. However, BasEMMA tends to find the outermost points in the EM space, i.e., the Basic-EMs, which is crucial, especially when unmixing highly
Acknowledgements
This work was supported by the Ministry of Science and Technology of China (2017YFC0405502) and the Shandong Provincial Natural Science Foundation, China (ZR2019MD037). We thank J. A. van Hateren for providing the coversand datasets and conducting in-depth discussion. We thank two anonymous reviewers for their insightful and clear comments, from which our manuscript has greatly benefited.
Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.ecss.2020.106656.
References
Ashley, G.M., 1978. Interpretation of polymodal sediments. J. Geol. 86, 411–421.
Awadh, B., Sepehri, N., Hawaleshka, O., 1995. A computer-aided process planning model based on genetic algorithms. Comput. Oper. Res. 22 (8), 841–856.
Derrien, M., Kim, M., Ock, G., Hong, S., Cho, J., Shin, K., Hur, J., 2018. Estimation of different source contributions to sediment organic matter in an agricultural-forested watershed using end member mixing analyses based on stable isotope ratios and fluorescence spectroscopy. Sci. Total Environ. 618, 569–578.
Dietze, E., Hartmann, K., Diekmann, B., Ijmker, J., Lehmkuhl, F., Opitz, S., Stauch, G., Wünnemann, B., Borchers, A., 2012. An end member algorithm for deciphering modern detrital processes from lake sediments of Lake Donggi Cona, NE Tibetan Plateau, China. Sediment. Geol. 243–244, 169–180.
Geen, A.V., Rosener, P., Boyle, E., 1988. Entrainment of trace-metal-enriched Atlantic-shelf water in the inflow to the Mediterranean Sea. Nature 331 (6155), 423–426.
Goldberg, D.E., 1989. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Publishing Company, New Jersey, pp. 2104–2116.
Hannigan, R.E., Basu, A.R., Teichmann, F., 2001. Mantle reservoir geochemistry from statistical analysis of ICP-MS trace element data of equatorial mid-Atlantic MORB glasses. Chem. Geol. 175, 397–428.
Heinz, D.C., Chang, C.I., 2001. Fully constrained least squares linear spectral mixture analysis method for material quantification in hyperspectral imagery. IEEE Trans. Geosci. Rem. Sens. 39 (3), 529–545.
Heslop, D., Von Dobeneck, T., Höcker, M., 2007. Using non-negative matrix factorization in the “unmixing” of diffuse reflectance spectra. Mar. Geol. 241 (1–4), 63–78.
Heslop, D., 2015. Numerical strategies for magnetic mineral unmixing. Earth Sci. Rev. 150, 256–284. https://doi.org/10.1016/j.earscirev.2015.07.007.
Jia, J., Gao, J., Cai, T., Li, Y., Yang, Y., Wang, Y., Xia, X., Li, J., Wang, A., Gao, G., 2018. Sediment accumulation and retention of the Changjiang (Yangtze River) subaqueous delta and its distal muds over the last century. Mar. Geol. 401, 2–16.
Kim, J.H., Kim, K.H., Thao, N.T., Batsaikhan, B., Yun, S.T., 2017. Hydrochemical assessment of freshening saline groundwater using multiple end members mixing modeling: a study of Red River delta aquifer, Vietnam. J. Hydrol. 549, 703–714.
Laumonier, M., Scaillet, B., Pichavant, M., Champallier, R., Andujar, J., Arbaret, L., 2014. On the conditions of magma mixing and its bearing on andesite production in the crust. Nat. Commun. 5, 5607.
Lourens, L.J., Wehausen, R., Brumsack, H.J., 2001. Geological constraints on tidal dissipation and dynamical ellipticity of the Earth over the past three million years. Nature 409, 1029–1033.
Lei, K., Yang, Z.S., Guo, Z.G., 2001. Sedimentation with aggregation of suspended sediment in a mud area of the northern East China Sea. Oceanol. Limnol. Sinica 32 (3), 288–295 (in Chinese with English abstract).
Li, T., Sun, G., Ma, S., Liang, K., Yang, C., Li, B., Luo, W., 2016. Inferring sources of polycyclic aromatic hydrocarbons (PAHs) in sediments from the western Taiwan Strait through end-member mixing analysis. Mar. Pollut. Bull. 112, 166–176.
Li, Q., Zhang, Q., Li, G., Liu, Q., Chen, M., Xu, J., Li, J., 2019. A new perspective for the sediment provenance evolution of the middle Okinawa Trough since the last deglaciation based on integrated methods. Earth Planet Sci. Lett. 528, 115839.
Liang, X., Niu, Q., Qu, J., Liu, B., Liu, B., Zhai, X., Niu, B., 2019. Applying end-member modeling to extricate the sedimentary environment of yardang strata in the Dunhuang Yardang National Geopark, northwestern China. Catena 180, 238–251.
Liu, J.P., Xu, K.H., Li, A.C., Milliman, J.D., Velozzi, D.M., Xiao, S.B., Yang, Z.S., 2007. Flux and fate of Yangtze River sediment delivered to the East China Sea. Geomorphology 85 (3), 208–224.
Liu, S., Mi, B., Fang, X., Li, X., Pan, H., Chen, M., Shi, X., 2017. A preliminary study of a sediment core drilled from the mud area on the inner shelf of the East China Sea: implications for paleoclimatic changes during the fast transgression period (13 ka B.P.-8 ka B.P.). Quat. Int. 441, 35–50.
Liu, J., He, W., Cao, L., Zhu, Z., Xiang, R., Li, T., Shi, X., Liu, S., 2019a. Staged fine-grained sediment supply from the Himalayas to the Bengal Fan in response to climate change over the past 50,000 years. Quat. Sci. Rev. 212, 164–177.
Liu, Y., Huang, H., Liu, X., Yan, L., Zhang, Z., Zhang, Y., Song, Z., 2019b. Response of seafloor sediment composition to a strong storm event in the inner-shelf of Heini Bay, China. Continent. Shelf Res. 175, 1–11.
Meng, X.W., Liu, Y.G., Zhang, X.D., Zhang, J., 2014. The combined effect of Tibetan plateau uplift and glacial-interglacial cycles on the quaternary evolution of the East Asian monsoon: evidence from South China Sea sediments. Acta Geol. Sin. 88 (2), 661–668.
Middleton, G.V., 1976. Hydraulic interpretation of sand size distributions. J. Geol. 84, 405–426.
Myers, J.D., Angevine, C.L., Frost, C.D., 1987. Mass balance calculations with end member compositional variability: applications to petrologic problems. Earth Planet Sci. Lett. 81 (2), 212–220.
Paterson, G.A., Heslop, D., 2015. New methods for unmixing sediment grain size data. Gcubed 16 (12), 4494–4506.
Prins, M.A., Postma, G., Weltje, G.J., 2000. Controls on terrigenous sediment supply to the Arabian Sea during the late Quaternary: the Makran continental slope. Mar. Geol. 169, 351–371.
Renner, R.M., 1993. The resolution of a compositional data set into mixtures of fixed source compositions. J. Roy. Stat. Soc. 42, 615–631.
Shang, Y., Beets, C.J., Tang, H., Prins, M.A., Lahaye, Y., Van Elsas, R., Sukselainen, L., Kaakinen, A., 2016. Variations in the provenance of the late Neogene Red Clay deposits in northern China. Earth Planet Sci. Lett. 439, 88–100.
Torre, G., Gaiero, D.M., Cosentino, N.J., Coppo, R., 2020. The paleoclimatic message from the polymodal grain-size distribution of late Pleistocene-early Holocene Pampean loess (Argentina). Aeolian Res. 42, 100563.
van Hateren, J.A., Prins, M.A., van Balen, R.T., 2018. On the genetically meaningful decomposition of grain size distributions: a comparison of different end member modelling algorithms. Sediment. Geol. 375, 49–71. https://doi.org/10.1016/j.sedgeo.2017.12.003.
Weltje, G.J., 1997. End member modeling of compositional data: numerical-statistical algorithms for solving the explicit mixing problem. Math. Geol. 29, 503–549.
Weltje, G.J., Prins, M.A., 2003. Muddled or mixed? Inferring palaeo-climate from size distributions of deep-sea clastics. Sediment. Geol. 162 (1–2), 39–62.
Weltje, G.J., Prins, M.A., 2007. Genetically meaningful decomposition of grain size distributions. Sediment. Geol. 202 (3), 409–424.
Wen, Y., Wu, Y., Tan, L., Li, D., Fu, T., 2019. End-member modeling of the grain size record of loess in the Mu Us Desert and implications for dust sources. Quat. Int. 532, 87–97.
Wyatt, M.B., Mcsween, H.Y., 2002. Spectral evidence for weathered basalt as an alternative to andesite in the northern lowlands of Mars. Nature 417 (6886), 263–266.
Xiao, S.B., Li, A.C., Jiang, F.Q., Li, T.G., Huang, P., Xu, Z.K., 2004. The 2Ka record and its climate significance of the mud area of inner shelf of the East China Sea. Chin. Sci. Bull. 49 (21), 2233–2238 (in Chinese with English abstract).
Xu, F., Hu, B., Dou, Y., Liu, X., Wan, S., Xu, Z., Xu, T., Liu, Z., Yin, X., Li, A., 2017. Sediment provenance and paleoenvironmental changes in the northwestern shelf mud area of the South China Sea since the mid-Holocene. Continent. Shelf Res. 144, 21–30.
Xu, K.H., Li, A.C., Liu, J.P., Milliman, J.D., Yang, Z.S., Liu, C.S., Kao, S.J., Wang, S.M., Xu, F.J., 2012. Provenance, structure, and formation of the mud wedge along inner continental shelf of the East China Sea: a synthesis of the Yangtze dispersal system. Mar. Geol. 291–294, 176–191.
Xiao, S.B., Li, A.C., 2005. A study on environmentally sensitive grain size population in inner shelf of the East China Sea. Acta Sedimentol. Sin. 1, 122–129 (in Chinese with English abstract).
Yu, S.Y., Colman, S.M., Li, L., 2016. BEMMA: a hierarchical Bayesian end member modeling analysis of sediment grain size distributions. Math. Geosci. 48 (6), 723–741.
Zhang, C.Y., Feng, X.L., 2009. The spatial distribution and analysis about the grain size of sediments in the Lianyungang nearshore sea area. Acta Oceanol. Sin. 31, 120–127 (in Chinese with English abstract).
Zhang, X.D., Xu, S.M., Zhai, S.K., Zhang, H.J., 2006a. The inversion of climate information from the sediment of inner shelf of East China Sea using End member model. Mar. Geol. Quat. Geol. 26, 25–32 (in Chinese with English abstract).
Zhang, X.D., Zhai, S.K., Xu, S.M., 2006b. The application of grain size end-member modeling to the shelf near the estuary of Changjiang River in China. Acta Oceanol. Sin. 28, 159–166 (in Chinese with English abstract).
Zhang, X.D., Ji, Y., Yang, Z.S., Wang, Z.B., Liu, D.S., Jia, P.M., 2016. End member inversion of surface sediment grain size in the South Yellow Sea and its implications for dynamic sedimentary environments. Sci. China Earth Sci. 59 (2), 258–267.
Zhu, K.Z., 1973. The primary study of the last 5 000 years climate changes in China. Sci. China (2), 168–189 (in Chinese).