Journal Pre-proof
A new approach to formulate fuzzy regression models
Liang-Hsuan Chen, Sheng-Hsing Nien
PII: S1568-4946(19)30696-9
DOI: https://doi.org/10.1016/j.asoc.2019.105915
Reference: ASOC 105915
To appear in: Applied Soft Computing Journal
Received date: 13 March 2019
Revised date: 30 October 2019
Accepted date: 1 November 2019
Please cite this article as: L.-H. Chen and S.-H. Nien, A new approach to formulate fuzzy regression models, Applied Soft Computing Journal (2019), doi: https://doi.org/10.1016/j.asoc.2019.105915.
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
© 2019 Published by Elsevier B.V.
Declaration of Interest Statement
Declaration of interests ☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
☐ The authors declare the following financial interests/personal relationships which may be considered as potential competing interests:
A New Approach to Formulate Fuzzy Regression Models
Liang-Hsuan Chen* and Sheng-Hsing Nien
*Corresponding e-mail:
[email protected]
Department of Industrial and Information Management,
National Cheng Kung University, Tainan 701, Taiwan, R.O.C.
Highlights
1. A new approach is developed to formulate fuzzy regression models.
2. A new operator, the fuzzy product core (FPC), is proposed for the formulations.
3. The approach reduces unnecessary information to increase model performance.
4. Comparisons show that the approach performs better than existing approaches.
A New Approach to Formulate Fuzzy Regression Models
Liang-Hsuan Chen and Sheng-Hsing Nien
Department of Industrial and Information Management,
National Cheng Kung University, Tainan, Taiwan, R.O.C.
Abstract
A fuzzy regression model is developed to construct the relationship between the response and explanatory variables in fuzzy environments. To enhance explanatory power and take into account the uncertainty of the formulated model and parameters, a new operator, called the fuzzy product core (FPC), is proposed for the formulation processes to establish fuzzy regression models with fuzzy parameters using fuzzy observations that include fuzzy response and explanatory variables. In addition, the signs of the parameters can be determined in the model-building processes. Compared to existing approaches, the proposed approach reduces the amount of unnecessary or unimportant information arising from fuzzy observations and determines the signs of the parameters in the models to increase model performance. This addresses a weakness of related approaches with fuzzy parameters, in which the signs of the parameters must be predetermined in the formulation processes. The proposed approach outperforms existing models in terms of distance, mean similarity, and credibility measures, even when crisp explanatory variables are used.
Keywords: Fuzzy regression model, Fuzzy product core, Mathematical programming
1. Introduction
Statistical regression analysis is a well-known method for formulating the relationship between the response (output) variable and some explanatory (input) variables using a set of observations, based on the assumption of normal distributions. In general, the least-squares method is used to determine the unbiased parameters in model-fitting processes. The uncertainty of the established model results from the randomness inherent in the observations. In the real world, observations might not be measured as quantitative values, but in linguistic (fuzzy) terms, such as “about 10 inches”, “approximately equal to 50 pounds”, “low”, “high”, and so on. Membership functions are defined to characterize such linguistic terms [1]. To deal with this kind of data, linear regression analysis with fuzzy models [2] was developed to perform the functions of statistical regression analysis. Fuzzy regression differs from statistical regression in that it does not assume a probability distribution. The deviations (estimation errors) are attributed to the imprecision of the observed values, the indefiniteness of the model structure, or both. The uncertainty of the model is due to the fuzziness of the observations, which introduces noise into the fuzzy parameters of the model. Therefore, an important step in model-fitting processes is the adjustment of the fuzzy parameters to fit the model to the available samples.
The scope of this study focuses on the development of fuzzy regression models. In general, two types of approach are adopted to formulate optimal fuzzy regression models that minimize the estimation errors between the observed and estimated fuzzy responses: the first type minimizes the total difference in membership degrees, and the second minimizes the total distance between the membership functions. The first type of approach was first proposed by Tanaka et al. [2], who built a fuzzy regression model that can explain all observations with a fuzzy response and several crisp explanatory variables under a subjective confidence level h. The parameters in their model are fuzzy, and thus the uncertainty of the model can be described. Tanaka [3], Tanaka and Watada [4], Sakawa and Yano [5], and Hojati et al. [6] later improved this approach. However, these improved versions are sensitive to outliers [7], may produce infinite solutions, and unnecessarily widen the spread of the estimated fuzzy response when more data are included in the model [8]. Kao and Chyu [8] proposed a fuzzy regression model with crisp coefficients that uses a two-stage approach to avoid widening the spread. In their formulation processes, a mathematical programming model was built to minimize the criterion proposed by Kim and Bishu [9], namely the non-overlapping area between the membership functions of the estimated and observed fuzzy responses. Unlike Tanaka et al.’s approach, which requires the confidence level h to be set subjectively, some studies [10] have formulated fuzzy regression models by determining the optimal confidence level h. With this type of approach, many studies [11-15] have shown the advantages of their respective fuzzy regression analyses. However, the criterion used to measure model performance in the formulation processes may not reflect the actual difference between the observed and estimated fuzzy responses [16] when the membership functions do not overlap. As such, the formulated fuzzy regression model may not perform optimally for prediction.
The second type of approach for constructing fuzzy regression models, proposed by Celmiņš [16], is based on the least-squares method from statistical regression analysis. The objective of this approach is to minimize the sum of squared errors in terms of the distance between the observed and estimated responses. A number of studies have applied this approach to fuzzy observations containing fuzzy response and explanatory variables to establish fuzzy regression models with crisp parameters [17-19]. Some fuzzy regression models with fuzzy parameters have been formulated using fuzzy response and crisp explanatory variables [20-24] or fuzzy response and explanatory variables [25, 26]. These models share the problems described for the first type of approach. Chang [27] adopted the concept of least absolute deviations (LADs) for formulating models. Although the LAD estimator is more robust than the least-squares deviation estimator [28], fuzzy regression models based on LAD produce a wider spread of the estimated responses when the explanatory and response variables and the parameters are fuzzy. To avoid this, many studies have developed approaches for constructing fuzzy regression models using crisp explanatory variables [29, 30] or crisp parameters [31, 32], or for constructing quadratic models [33, 34]. Few studies have constructed fuzzy regression models based on LAD with fuzzy explanatory and response variables and fuzzy parameters [35, 36]. Since probability assumptions do not hold in fuzzy regression analyses [37], it cannot be proven that the parameters are unbiased; thus, fuzzy regression models should retain the uncertainty of the explanatory and response variables and the parameters. However, many proposed models with crisp parameters may ignore this uncertainty [18, 33, 38], since the multiplication of fuzzy numbers may produce a wider spread, degrading model performance. Acknowledging this problem, Hong et al. [39] proposed shape-preserving operations based on the weakest T-norm for formulating fuzzy regression models with the least-squares error. Using shape-preserving operations, Kelkinnama and Taheri [36] established fuzzy least-absolutes regression models and achieved better performance than previous studies. However, the signs of the parameters in the model must be predetermined in their approach, which restricts its general application.
The determination of the signs of the parameters in the model-building processes can influence model performance, because the fuzzy multiplication of two fuzzy numbers with the same sign differs from that of two fuzzy numbers with opposite signs [40]. Existing approaches suffer from this problem. For example, some approaches [35, 36] with fuzzy explanatory variables, response variables, and parameters are derived under the assumption that the signs of the fuzzy explanatory variables and fuzzy parameters are either positive or known. However, this may not be the case in practical applications. Sometimes negative fuzzy parameters are obtained in formulations, or negative fuzzy explanatory variables are used in applications; however, they are usually set as positive when the approaches are developed. This may affect the performance of the established models and even produce ineffective results.
Acknowledging the problems described above, the present study proposes an approach for formulating fuzzy regression models with fuzzy parameters considering observations with fuzzy explanatory and response variables. In the formulation processes, a computationally simple and generalized operator, called the fuzzy product core (FPC), is proposed. With FPC, a mathematical programming problem is built using LAD as the objective to determine the optimal fuzzy parameters with the corresponding signs. With the established fuzzy regression models, the estimated responses do not have unnecessary spread in their membership functions, increasing model performance.
The rest of this paper is organized as follows. In Section 2, the concept of the FPC operator is described and formulated based on fuzzy multiplication, and some properties of FPC are provided. Section 3 describes the performance criteria used for comparisons, namely the mean similarity measure, distance measure, and credibility measure. In Section 4, the proposed operator is used to build the mathematical programming model that deals with observations with fuzzy explanatory and response variables, based on the LAD criterion, to construct a fuzzy regression model with fuzzy parameters. In Section 5, some examples are used for comparison with existing approaches to illustrate the advantages of the proposed approach. Conclusions are given in Section 6.
2. FPC operator
Considering the uncertainty of a fuzzy regression model, the parameters of the model are fuzzy. To overcome the possible weaknesses of existing approaches, a new operator, called FPC, is introduced and derived in this section. Let $X = \{x_1, x_2, \ldots, x_n\}$ denote the universal set, and let $\tilde{A}$ and $\mu_{\tilde{A}}(x)$ be a fuzzy set on $X$ and its membership function, respectively. An $\alpha$-cut of $\tilde{A}$ is defined as $\tilde{A}_\alpha = \{x \mid \mu_{\tilde{A}}(x) \geq \alpha\}$, $\alpha \in [0,1]$, representing a crisp interval that can be denoted by $\tilde{A}_\alpha = [A_\alpha^L, A_\alpha^U]$. In particular, when $\alpha = 0$, $\tilde{A}_0 = [A_0^L, A_0^U]$ denotes the support of $\tilde{A}$. To develop the new operator, the definitions of fuzzy numbers, triangular fuzzy numbers, and fuzzy arithmetic are briefly introduced as follows. Detailed information can be found elsewhere [40-42].
Definition 1. Suppose that $\tilde{A}$ is a fuzzy set on the real line; then $\tilde{A}$ is called a fuzzy number if the properties of normality, convexity, and boundedness are satisfied.
Definition 2. A triangular fuzzy number (TFN) $\tilde{A}$ can be represented by a triple $\tilde{A} = (a^L, a^C, a^U)$. The corresponding membership function $\mu_{\tilde{A}}(x)$ increases linearly on $[a^L, a^C]$ and decreases linearly on $[a^C, a^U]$, where $\mu_{\tilde{A}}(a^L) = \mu_{\tilde{A}}(a^U) = 0$ and $\mu_{\tilde{A}}(a^C) = 1$. In addition, if $a^L \geq 0$ ($a^U \leq 0$), then $\tilde{A} \geq 0$ ($\tilde{A} \leq 0$).
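Definition 3. Two algebraic operations, addition ($\oplus$) and multiplication ($\otimes$), of two TFNs $\tilde{A} = (a^L, a^C, a^U)$ and $\tilde{B} = (b^L, b^C, b^U)$ can be expressed as follows, where the product is approximated by a TFN:
$$\tilde{A} \oplus \tilde{B} = (a^L + b^L,\ a^C + b^C,\ a^U + b^U) \tag{1}$$
$$\tilde{A} \otimes \tilde{B} \approx \begin{cases} (a^L b^L,\ a^C b^C,\ a^U b^U), & \text{if } \tilde{A}, \tilde{B} \geq 0 \\ (a^U b^L,\ a^C b^C,\ a^L b^U), & \text{if } \tilde{A} \geq 0,\ \tilde{B} \leq 0 \\ (a^U b^U,\ a^C b^C,\ a^L b^L), & \text{if } \tilde{A}, \tilde{B} \leq 0 \end{cases} \tag{2}$$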
Based on Eq. (2), for any two TFNs $\tilde{A}$ and $\tilde{B}$, the bounds of $\tilde{A} \otimes \tilde{B}$ are determined by the extreme values among $a^L b^L$, $a^L b^U$, $a^U b^L$, and $a^U b^U$, and the central value is $a^C b^C$.
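The arithmetic in Eqs. (1) and (2) can be sketched in a few lines. This is an illustrative sketch only, assuming TFNs are stored as `(lower, center, upper)` tuples; the function names `tfn_add`, `tfn_mul`, and `tfn_mul_extremes` are hypothetical. The mirror case ($\tilde{A} \leq 0$, $\tilde{B} \geq 0$) is filled in by commutativity, and the sign-free variant shows that the three sign cases of Eq. (2) all reduce to taking the extreme end-point products.

```python
# Sketch of Eqs. (1)-(2) for TFNs stored as (lower, center, upper) tuples.
# The tuple representation and function names are illustrative assumptions.


def tfn_add(a, b):
    """Eq. (1): component-wise addition."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def tfn_mul(a, b):
    """Eq. (2): sign-based approximate TFN product (TFNs must not straddle zero)."""
    (aL, aC, aU), (bL, bC, bU) = a, b
    if aL >= 0 and bL >= 0:
        return (aL * bL, aC * bC, aU * bU)
    if aL >= 0 and bU <= 0:
        return (aU * bL, aC * bC, aL * bU)
    if aU <= 0 and bL >= 0:  # mirror of the previous case, by commutativity
        return (aL * bU, aC * bC, aU * bL)
    if aU <= 0 and bU <= 0:
        return (aU * bU, aC * bC, aL * bL)
    raise ValueError("Eq. (2) assumes sign-definite TFNs")


def tfn_mul_extremes(a, b):
    """Sign-free form: the bounds are the extreme end-point products."""
    ends = [x * y for x in (a[0], a[2]) for y in (b[0], b[2])]
    return (min(ends), a[1] * b[1], max(ends))


A, B, C = (1, 2, 3), (2, 3, 4), (-3, -2, -1)
print(tfn_add(A, B))  # (3, 5, 7)
print(tfn_mul(A, B), tfn_mul(A, C))  # (2, 6, 12) (-9, -4, -1)
```

The sign-free variant is the form reused in the sketches that follow, since it avoids a case split.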
Lemma 1: Define the zero TFN as $\tilde{0} = (0, 0, 0)$. Based on the definition of the arithmetic operators of TFNs, the product of two TFNs with different signs, such as $\tilde{A} \geq 0$ and $\tilde{B} \leq 0$, is zero if and only if $\tilde{A}$ or $\tilde{B}$ is the zero TFN. The proof is straightforward.
Fuzzy multiplication differs from the multiplication of crisp values in that the fuzzy product may have a large spread. This spread can lower the value of the generated information owing to a possible increase in fuzziness (or uncertainty). Furthermore, the augmented spread arising from the multiplication of fuzzy numbers can reduce performance in applications that use fuzzy data. The formulation of a fuzzy regression model is one such application: for fuzzy observations with fuzzy explanatory and response variables, the performance of the established fuzzy regression models with fuzzy parameters may be greatly influenced by the inclusion of unnecessary uncertainty. This has motivated the development of a new operator in this study.
The core of a fuzzy number $\tilde{A}$ is defined as the set where the membership degree equals one, denoted as $H(\tilde{A}) = \{x \mid \mu_{\tilde{A}}(x) = 1, x \in \mathbb{R}\}$ [41]. If $\tilde{A}$ is a TFN, the core is $a^C$, i.e., $H(\tilde{A}) = a^C$. The core of the algebraic product of two TFNs can be easily obtained from Eq. (2) as $H(\tilde{A} \otimes \tilde{B}) = a^C b^C$, which is a crisp value. Since $H(\tilde{A} \otimes \tilde{B})$ yields only the most likely value without any uncertainty, it is unsuitable as an operator for the problem mentioned above.
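Suppose that $\tilde{A}$ and $\tilde{B}$ are two positive TFNs, i.e., $\tilde{A}, \tilde{B} > 0$. The fuzzy product can be represented as the composition, via multiplication, of all possible values of the $\alpha$-cuts of $\tilde{A}$ and $\tilde{B}$. The $\alpha$-cuts of $\tilde{A} \otimes \tilde{B}$ can be expressed as $(\tilde{A} \otimes \tilde{B})_\alpha = [A_\alpha^L, A_\alpha^U] \otimes [B_\alpha^L, B_\alpha^U] = [A_\alpha^L B_\alpha^L, A_\alpha^U B_\alpha^U]$. This interval can be decomposed into the union of all $x[B_\alpha^L, B_\alpha^U]$ with $x \in [A_\alpha^L, A_\alpha^U]$, as illustrated in Figure 1. For example, let $\alpha = 0$, $\tilde{A}_0 = [A_0^L, A_0^U]$, and $\tilde{B}_0 = [B_0^L, B_0^U]$. If $x = x^*$, an interval is obtained as $x^*[B_0^L, B_0^U]$, and $(\tilde{A} \otimes \tilde{B})_0$ is the union of $x[B_0^L, B_0^U]$ over $x \in [A_0^L, A_0^U]$, i.e., $(\tilde{A} \otimes \tilde{B})_0 = \bigcup_{x \in [A_0^L, A_0^U]} x[B_0^L, B_0^U]$. As such, applying the decomposition theorems [41], the multiplication of two TFNs can be expressed as:
$$\tilde{A} \otimes \tilde{B} = \bigcup_{\alpha \in [0,1]} \alpha \left( [A_\alpha^L, A_\alpha^U] \otimes [B_\alpha^L, B_\alpha^U] \right) = \bigcup_{\alpha \in [0,1]} \alpha \left( \bigcup_{x \in [A_\alpha^L, A_\alpha^U]} x[B_\alpha^L, B_\alpha^U] \right) \tag{3}$$
where $\alpha$ denotes the membership degree of $\tilde{A} \otimes \tilde{B}$, and $\bigcup_{x \in [A_\alpha^L, A_\alpha^U]} x[B_\alpha^L, B_\alpha^U]$ represents the interval of its $\alpha$-cut.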
However, as described above, $\tilde{A} \otimes \tilde{B}$ in Eq. (2) may have a large spread that significantly increases the coverage of the domain variable $x$ at lower membership degrees, resulting in unnecessary fuzziness (uncertainty). To deal with this problem, this study proposes the FPC operator, which preserves the core information in the multiplication of two TFNs and thus avoids unnecessary fuzziness. To this end, the union operator used to determine the interval of $(\tilde{A} \otimes \tilde{B})_\alpha$ is replaced by the intersection operator, i.e., $\bigcap_{x \in [A_\alpha^L, A_\alpha^U]} x[B_\alpha^L, B_\alpha^U]$. Based on fuzzy logic theories, we adopt the operator “AND” instead of “OR” to acquire the common parts at the various confidence levels, i.e., different $\alpha$ levels, which represent the core information. The proposed operator can be formulated as:
$$\mathrm{FPC}^*(\tilde{A} \otimes \tilde{B}) = \bigcup_{\alpha \in [0,1]} \alpha \left( \bigcap_{x \in [A_\alpha^L, A_\alpha^U]} x[B_\alpha^L, B_\alpha^U] \right) \tag{4}$$
[Figure 1. Decomposition of $\tilde{A} \otimes \tilde{B}$ (membership degree versus $x$; labelled points $a^L b^L$, $a^U b^L$, $a^C b^C$, $a^L b^U$, $a^U b^U$, and $x^*$).]
Figure 2 illustrates the concept of the new operator, considering the core information. The product $\mathrm{FPC}^*(\tilde{A} \otimes \tilde{B})$ has a smaller spread than $\tilde{A} \otimes \tilde{B}$, since the former operation excludes unnecessary information. From Eq. (4), $\mathrm{FPC}^*(\tilde{A} \otimes \tilde{B})$ is computed by multiplying the various values of $\tilde{A}_\alpha$ by the fixed interval $\tilde{B}_\alpha$ for each $\alpha \in [0,1]$; the product approximates the TFN $(a^U b^L, a^C b^C, a^L b^U)$. Similarly, $\mathrm{FPC}^*(\tilde{B} \otimes \tilde{A})$ is computed by multiplying the various values of $\tilde{B}_\alpha$ by the fixed interval $\tilde{A}_\alpha$ for each $\alpha$. By the commutativity of multiplication, $\mathrm{FPC}^*(\tilde{A} \otimes \tilde{B})$ should equal $\mathrm{FPC}^*(\tilde{B} \otimes \tilde{A})$. However, either of them may be null. Figure 3 illustrates a possible condition under which $\mathrm{FPC}^*(\tilde{B} \otimes \tilde{A})$ is null because $[a^L b^L, a^U b^L] \cap [a^L b^U, a^U b^U] = \emptyset$. To satisfy the commutativity requirement, i.e., $\mathrm{FPC}(\tilde{A} \otimes \tilde{B}) = \mathrm{FPC}(\tilde{B} \otimes \tilde{A})$, we define the formulation of FPC as:
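$$\mathrm{FPC}(\tilde{A} \otimes \tilde{B}) = \left[ \bigcup_{\alpha \in [0,1]} \alpha \left( \bigcap_{x \in [A_\alpha^L, A_\alpha^U]} x[B_\alpha^L, B_\alpha^U] \right) \right] \cup \left[ \bigcup_{\alpha \in [0,1]} \alpha \left( \bigcap_{x \in [B_\alpha^L, B_\alpha^U]} x[A_\alpha^L, A_\alpha^U] \right) \right] = \mathrm{FPC}^*(\tilde{A} \otimes \tilde{B}) \cup \mathrm{FPC}^*(\tilde{B} \otimes \tilde{A}) \tag{5}$$
[Figure 2. Decomposition of $\mathrm{FPC}^*(\tilde{A} \otimes \tilde{B})$ (membership degree versus $x$; support $\bigcap_{x \in [A_0^L, A_0^U]} x[B_0^L, B_0^U]$).]
[Figure 3. Decomposition of $\mathrm{FPC}^*(\tilde{B} \otimes \tilde{A})$ (membership degree versus $x$; support $\bigcap_{x \in [B_0^L, B_0^U]} x[A_0^L, A_0^U]$).]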
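To make Eqs. (4) and (5) concrete, the sketch below evaluates a single $\alpha$-cut of $\mathrm{FPC}^*$ and of FPC by interval intersection and union. It is an illustration under stated assumptions, not the authors' code: TFNs are `(l, c, u)` tuples, the helper names (`cut`, `fpc_star_cut`, `fpc_cut`) are hypothetical, and when both $\mathrm{FPC}^*$ cuts are non-empty they are merged into their hull. Because $\min(x B^L_\alpha, x B^U_\alpha)$ is concave in $x$ and $\max(x B^L_\alpha, x B^U_\alpha)$ is convex, each with a kink only at $x = 0$, the intersection over all $x \in \tilde{A}_\alpha$ only needs to be checked at the cut's end points and at $x = 0$ when it lies inside.

```python
# Sketch of Eqs. (4)-(5) at one alpha level, for TFNs stored as (l, c, u).
# Helper names are hypothetical; this is not code from the paper.


def cut(t, alpha):
    """alpha-cut [L, U] of a TFN."""
    l, c, u = t
    return (l + alpha * (c - l), u - alpha * (u - c))


def fpc_star_cut(a, b, alpha):
    """Eq. (4) at one alpha: intersection of x * B_alpha over x in A_alpha.

    Returns None when the intersection is empty (the null case of Figure 3).
    """
    xl, xu = cut(a, alpha)
    bl, bu = cut(b, alpha)
    # Candidate x values: end points of A_alpha, plus 0 if A_alpha contains it.
    xs = [xl, xu] + ([0.0] if xl <= 0.0 <= xu else [])
    lo = max(min(x * bl, x * bu) for x in xs)
    hi = min(max(x * bl, x * bu) for x in xs)
    return (lo, hi) if lo <= hi else None


def fpc_cut(a, b, alpha):
    """Eq. (5) at one alpha: union of the FPC*(A⊗B) and FPC*(B⊗A) cuts (hull if both exist)."""
    parts = [p for p in (fpc_star_cut(a, b, alpha), fpc_star_cut(b, a, alpha)) if p]
    if not parts:
        return None
    return (min(p[0] for p in parts), max(p[1] for p in parts))


A, B = (1, 2, 4), (2, 3, 4)
print(fpc_star_cut(A, B, 0.0), fpc_star_cut(B, A, 0.0))  # None (4.0, 8.0)
print(fpc_cut(A, B, 0.5))  # (5.25, 7.5)
```

For these two TFNs, $\mathrm{FPC}^*(\tilde{A} \otimes \tilde{B})$ is null while $\mathrm{FPC}^*(\tilde{B} \otimes \tilde{A})$ is not, which is why Eq. (5) combines both. The exact cut at $\alpha = 0.5$, $[5.25, 7.5]$, differs slightly from the $[5, 7]$ cut of the triangular approximation $(4, 6, 8)$ obtained from Definition 4, since the exact product is not triangular.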
Based on the above descriptions, the outcome space of the multiplication is generated by $x[B_\alpha^L, B_\alpha^U]$ with $x \in [A_\alpha^L, A_\alpha^U]$, or by $x[A_\alpha^L, A_\alpha^U]$ with $x \in [B_\alpha^L, B_\alpha^U]$, and is bounded by the end points $a^L b^L$, $a^L b^U$, $a^U b^L$, and $a^U b^U$. The product of two TFNs, $\tilde{A} \otimes \tilde{B}$, can be represented by an approximate TFN determined by the smallest and largest of $a^L b^L$, $a^L b^U$, $a^U b^L$, and $a^U b^U$. This is not the case for FPC. Instead, based on Eq. (2), the approximate TFN of FPC is determined as the TFN with the central value $a^C b^C$ and the narrower spreads. Specifically, the proposed FPC, $\mathrm{FPC}(\tilde{A} \otimes \tilde{B})$, is defined as follows.
Definition 4. Suppose that $\tilde{A} = (a^L, a^C, a^U)$ and $\tilde{B} = (b^L, b^C, b^U)$ are two TFNs, and $o^{(1)}$, $o^{(2)}$, $o^{(3)}$, and $o^{(4)}$ are the values $a^L b^L$, $a^L b^U$, $a^U b^L$, and $a^U b^U$ sorted in increasing order. If the relation $o^{(2)} \leq a^C b^C \leq o^{(3)}$ holds, the approximate TFN of the FPC can be represented as:
$$\mathrm{FPC}(\tilde{A} \otimes \tilde{B}) = (o^{(2)},\ a^C b^C,\ o^{(3)}) \tag{6}$$
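Definition 4 is easy to compute directly: sort the four end-point products, keep the two middle ones, and check that the central value lies between them. A minimal sketch, assuming TFNs as `(l, c, u)` tuples and hypothetical function names; `fpc` returns `None` when the existence condition of P1 fails, and `usual_product` gives the $(o^{(1)}, a^C b^C, o^{(4)})$ approximation of $\tilde{A} \otimes \tilde{B}$ for comparison.

```python
# Sketch of Definition 4 / Eq. (6) alongside the usual product bounds.
# TFNs are (l, c, u) tuples; function names are illustrative assumptions.


def endpoint_products(a, b):
    """a^L b^L, a^L b^U, a^U b^L, a^U b^U sorted increasingly: o(1)..o(4)."""
    return sorted(x * y for x in (a[0], a[2]) for y in (b[0], b[2]))


def fpc(a, b):
    """Eq. (6): (o(2), a^C b^C, o(3)), or None if o(2) <= a^C b^C <= o(3) fails."""
    o = endpoint_products(a, b)
    center = a[1] * b[1]
    return (o[1], center, o[2]) if o[1] <= center <= o[2] else None


def usual_product(a, b):
    """Approximate TFN of A ⊗ B: (o(1), a^C b^C, o(4))."""
    o = endpoint_products(a, b)
    return (o[0], a[1] * b[1], o[3])


A, B = (1, 2, 4), (2, 3, 4)
print(usual_product(A, B))  # (2, 6, 16)
print(fpc(A, B), fpc(B, A))  # (4, 6, 8) (4, 6, 8)
print(fpc((1, 2, 3), (1, 2, 3)))  # None: a^C b^C = 4 exceeds o(3) = 3
```

The example shows how much narrower the FPC is than the usual product, $(4, 6, 8)$ versus $(2, 6, 16)$, while sharing the same core.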
Based on the above definition and the formulation of FPC, the following properties can be obtained:
P1. If the relation $o^{(2)} \leq a^C b^C \leq o^{(3)}$ does not hold, then, based on the formulation of FPC in Eq. (5), the resulting approximate TFN of the FPC does not meet the definition of a fuzzy number, since the central value lies outside the support ($\alpha = 0$). In other words, $\mathrm{FPC}(\tilde{A} \otimes \tilde{B})$ exists if and only if $o^{(2)} \leq a^C b^C \leq o^{(3)}$.
P2. If $\mathrm{FPC}(\tilde{A} \otimes \tilde{B})$ exists, then $\mathrm{FPC}(\tilde{A} \otimes \tilde{B}) = \mathrm{FPC}(\tilde{B} \otimes \tilde{A})$. Based on the formulation of FPC in Eq. (5), this property is straightforward to prove.
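P3. Some approximate TFNs of FPC can be easily determined under the corresponding conditions. They are expressed as follows:
(1) For $\tilde{A}, \tilde{B} > 0$ or $\tilde{A}, \tilde{B} < 0$, if $\mathrm{FPC}(\tilde{A} \otimes \tilde{B})$ exists, then at least one of the following holds:
(a) $\mathrm{FPC}(\tilde{A} \otimes \tilde{B}) = (a^U b^L, a^C b^C, a^L b^U)$ iff
$$\frac{b^L}{b^C} \leq \frac{a^C}{a^U} \ \text{ and } \ \frac{b^C}{b^U} \leq \frac{a^L}{a^C} \tag{7}$$
(b) $\mathrm{FPC}(\tilde{A} \otimes \tilde{B}) = (a^L b^U, a^C b^C, a^U b^L)$ iff
$$\frac{a^L}{a^C} \leq \frac{b^C}{b^U} \ \text{ and } \ \frac{a^C}{a^U} \leq \frac{b^L}{b^C} \tag{8}$$
(2) For ($\tilde{A} > 0$, $\tilde{B} < 0$) or ($\tilde{A} < 0$, $\tilde{B} > 0$), if $\mathrm{FPC}(\tilde{A} \otimes \tilde{B})$ exists, then at least one of the following holds:
(a) $\mathrm{FPC}(\tilde{A} \otimes \tilde{B}) = (a^L b^L, a^C b^C, a^U b^U)$ iff
$$\frac{b^C}{b^L} \leq \frac{a^L}{a^C} \ \text{ and } \ \frac{b^U}{b^C} \leq \frac{a^C}{a^U} \tag{9}$$
(b) $\mathrm{FPC}(\tilde{A} \otimes \tilde{B}) = (a^U b^U, a^C b^C, a^L b^L)$ iff
$$\frac{a^C}{a^U} \leq \frac{b^U}{b^C} \ \text{ and } \ \frac{a^L}{a^C} \leq \frac{b^C}{b^L} \tag{10}$$
(3) $\mathrm{FPC}(\tilde{A} \otimes \tilde{B}) = \{0\}$ if $0 \in [a^L, a^U]$, $0 \in [b^L, b^U]$, and $a^C b^C = 0$.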
Proof: (1) For $\tilde{A}, \tilde{B} > 0$, Definition 4 gives $o^{(1)} = \min\{a^L b^L, a^U b^L, a^L b^U, a^U b^U\} = a^L b^L$ and $o^{(4)} = \max\{a^L b^L, a^U b^L, a^L b^U, a^U b^U\} = a^U b^U$. Therefore, if $\mathrm{FPC}(\tilde{A} \otimes \tilde{B})$ exists, it must be composed of $a^L b^U$, $a^U b^L$, and the central point $a^C b^C$. If $\mathrm{FPC}(\tilde{A} \otimes \tilde{B}) = (a^U b^L, a^C b^C, a^L b^U)$, then $a^U b^L \leq a^C b^C \leq a^L b^U$, where the relations $a^U b^L \leq a^C b^C$ and $a^C b^C \leq a^L b^U$ can be expressed as $b^L/b^C \leq a^C/a^U$ and $b^C/b^U \leq a^L/a^C$, respectively. Similarly, if $\mathrm{FPC}(\tilde{A} \otimes \tilde{B}) = (a^L b^U, a^C b^C, a^U b^L)$, the relation $a^L b^U \leq a^C b^C \leq a^U b^L$ holds, implying that $a^L/a^C \leq b^C/b^U$ and $a^C/a^U \leq b^L/b^C$. For $\tilde{A}, \tilde{B} < 0$, since $o^{(1)} = a^U b^U$ and $o^{(4)} = a^L b^L$, the rest of the proof is the same as above.
(2) Similar to (1), the condition $\tilde{A} < 0$, $\tilde{B} > 0$ implies that $o^{(4)} = a^U b^L$ and $o^{(1)} = a^L b^U$. Thus, if $\mathrm{FPC}(\tilde{A} \otimes \tilde{B})$ exists, it is composed of $a^L b^L$, $a^U b^U$, and the central point $a^C b^C$. If $\mathrm{FPC}(\tilde{A} \otimes \tilde{B}) = (a^L b^L, a^C b^C, a^U b^U)$ holds, then $b^C/b^L \leq a^L/a^C$ and $b^U/b^C \leq a^C/a^U$ hold. Otherwise, $\mathrm{FPC}(\tilde{A} \otimes \tilde{B}) = (a^U b^U, a^C b^C, a^L b^L)$ means that the relations $a^C/a^U \leq b^U/b^C$ and $a^L/a^C \leq b^C/b^L$ are satisfied.
(3) If $0 \in [a^L, a^U]$ and $0 \in [b^L, b^U]$, then $\bigcap_{x \in [A_\alpha^L, A_\alpha^U]} x[B_\alpha^L, B_\alpha^U] = \lim_{x \to 0;\, x \in [A_\alpha^L, A_\alpha^U]} x[B_\alpha^L, B_\alpha^U] = 0$ and $\bigcap_{x \in [B_\alpha^L, B_\alpha^U]} x[A_\alpha^L, A_\alpha^U] = \lim_{x \to 0;\, x \in [B_\alpha^L, B_\alpha^U]} x[A_\alpha^L, A_\alpha^U] = 0$ hold, respectively. By P1, if $a^C b^C = 0$, then $\mathrm{FPC}(\tilde{A} \otimes \tilde{B}) = \{0\}$; otherwise, $\mathrm{FPC}(\tilde{A} \otimes \tilde{B})$ does not exist.
P4. If $\tilde{A}$ is a crisp value $A$, then $\mathrm{FPC}(\tilde{A} \otimes \tilde{B}) = \tilde{A} \otimes \tilde{B} = (A b^L, A b^C, A b^U)$. Since $\tilde{A}$ is crisp, one can set $a^L = a^C = a^U = A$. Then, taking $A \geq 0$ (for $A < 0$ the end points are interchanged), $A b^L \leq A b^L \leq A b^C \leq A b^U \leq A b^U$. Based on Definition 4, $o^{(1)} = o^{(2)}$ and $o^{(3)} = o^{(4)}$ hold, implying that $\tilde{A} \otimes \tilde{B} = (o^{(1)}, a^C b^C, o^{(4)}) = (o^{(2)}, a^C b^C, o^{(3)}) = \mathrm{FPC}(\tilde{A} \otimes \tilde{B})$. This property shows that if one of the fuzzy numbers is crisp, FPC becomes the usual fuzzy multiplication operator.
P5. If $\mathrm{FPC}(\tilde{A} \otimes \tilde{B})$ exists, then $H(\mathrm{FPC}(\tilde{A} \otimes \tilde{B})) = H(\tilde{A} \otimes \tilde{B})$.
Proof: Based on Definition 4, $\tilde{A} \otimes \tilde{B} = (o^{(1)}, a^C b^C, o^{(4)})$ and $\mathrm{FPC}(\tilde{A} \otimes \tilde{B}) = (o^{(2)}, a^C b^C, o^{(3)})$. Since both are TFNs, their cores equal the central point, i.e., $a^C b^C$.
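P6. For any $\lambda \in \mathbb{R}$, $\mathrm{FPC}(\lambda\tilde{A} \otimes \tilde{B}) = \lambda\,\mathrm{FPC}(\tilde{A} \otimes \tilde{B})$.
Considering Definition 4, if $\lambda \geq 0$, then the relation $\lambda o^{(1)} \leq \lambda o^{(2)} \leq \lambda a^C b^C \leq \lambda o^{(3)} \leq \lambda o^{(4)}$ holds, leading to $\mathrm{FPC}(\lambda\tilde{A} \otimes \tilde{B}) = \lambda\,\mathrm{FPC}(\tilde{A} \otimes \tilde{B}) = (\lambda o^{(2)}, \lambda a^C b^C, \lambda o^{(3)})$. If $\lambda < 0$, the relation $\lambda o^{(1)} \geq \lambda o^{(2)} \geq \lambda a^C b^C \geq \lambda o^{(3)} \geq \lambda o^{(4)}$ makes $\mathrm{FPC}(\lambda\tilde{A} \otimes \tilde{B}) = (\lambda o^{(3)}, \lambda a^C b^C, \lambda o^{(2)})$ and $\lambda\,\mathrm{FPC}(\tilde{A} \otimes \tilde{B}) = \lambda(o^{(2)}, a^C b^C, o^{(3)}) = (\lambda o^{(3)}, \lambda a^C b^C, \lambda o^{(2)})$ hold. As such, $\mathrm{FPC}(\lambda\tilde{A} \otimes \tilde{B}) = \lambda\,\mathrm{FPC}(\tilde{A} \otimes \tilde{B})$ holds.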
Based on the above definitions, formulation, and properties, the proposed FPC operator produces a fuzzy number that includes only the core information. This study uses FPC to formulate fuzzy regression models.
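The closed forms in P3 can be cross-checked numerically against Definition 4. The sketch below is an illustration, assuming strictly signed TFNs with no zero end points (so every ratio in Eqs. (7)-(10) is defined) and hypothetical function names; `fpc_by_p3` tests the ratio conditions and returns the matching triple, while `fpc` applies Definition 4 directly.

```python
# Sketch: check P3's closed forms, Eqs. (7)-(10), against Definition 4.
# Assumes strictly signed TFNs (l, c, u) with no zero end points.


def fpc(a, b):
    """Definition 4: (o(2), a^C b^C, o(3)), or None if it does not exist."""
    o = sorted(x * y for x in (a[0], a[2]) for y in (b[0], b[2]))
    c = a[1] * b[1]
    return (o[1], c, o[2]) if o[1] <= c <= o[2] else None


def fpc_by_p3(a, b):
    """Return the P3 form whose ratio conditions hold, or None."""
    (aL, aC, aU), (bL, bC, bU) = a, b
    same_sign = (aL > 0 and bL > 0) or (aU < 0 and bU < 0)
    if same_sign:  # P3 case (1)
        if bL / bC <= aC / aU and bC / bU <= aL / aC:  # Eq. (7)
            return (aU * bL, aC * bC, aL * bU)
        if aL / aC <= bC / bU and aC / aU <= bL / bC:  # Eq. (8)
            return (aL * bU, aC * bC, aU * bL)
    else:  # P3 case (2): opposite signs
        if bC / bL <= aL / aC and bU / bC <= aC / aU:  # Eq. (9)
            return (aL * bL, aC * bC, aU * bU)
        if aC / aU <= bU / bC and aL / aC <= bC / bL:  # Eq. (10)
            return (aU * bU, aC * bC, aL * bL)
    return None


cases = [((2, 3, 4), (1, 2, 4)),      # Eq. (7)
         ((1, 2, 4), (2, 3, 4)),      # Eq. (8)
         ((2, 3, 4), (-4, -2, -1)),   # Eq. (9)
         ((1, 2, 4), (-4, -3, -2)),   # Eq. (10)
         ((1, 2, 3), (1, 2, 3))]      # FPC does not exist
for a, b in cases:
    print(a, b, fpc_by_p3(a, b), fpc(a, b))
```

In each case the ratio test and the sorting rule of Definition 4 should agree, which is the content of P3.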
3. Performance measures
In order to illustrate the performance of the proposed mathematical programming model in constructing fuzzy regression models using FPC, comparisons with existing approaches are made. For this purpose, three commonly used performance measures are adopted in this study, as described below.
(1) Mean similarity measure
This measure, proposed by Pappis and Karacapilidis [42], determines the similarity of two fuzzy numbers based on the area concept. This study adopts it to measure the degree of similarity between the observed response variables and those of the formulated fuzzy regression model as a performance criterion; a larger value indicates better model performance. The measure is expressed as:
$$S = \frac{1}{n}\sum_{i=1}^{n} S_i = \frac{1}{n}\sum_{i=1}^{n} \frac{\int \min\left(\mu_{y_i}(x), \mu_{\hat{y}_i}(x)\right) dx}{\int \max\left(\mu_{y_i}(x), \mu_{\hat{y}_i}(x)\right) dx} \tag{11}$$
where $\mu_{y_i}(x)$ and $\mu_{\hat{y}_i}(x)$ are the membership degrees of the observed and predicted response variables, respectively.
(2) Distance measure
This measure evaluates the distance between the observed response variables and those of the formulated fuzzy regression model in terms of the Hamming distance [43]. By measuring the distances over several $\alpha$-cuts for all observations, the total estimation error can be obtained. A smaller sum of distances indicates smaller total estimation errors, and thus higher prediction performance; a good fuzzy regression model produces a smaller total distance. This measure is formulated as:
$$D = \sum_{i=1}^{n} D_i = \sum_{i=1}^{n} \frac{1}{2m} \sum_{k=1}^{m} \left( \left| (\hat{Y}_i)_{\alpha_k}^L - (Y_i)_{\alpha_k}^L \right| + \left| (\hat{Y}_i)_{\alpha_k}^U - (Y_i)_{\alpha_k}^U \right| \right) \tag{12}$$
where $\alpha_k = (k-1)/(m-1)$. This study uses this measure as the performance criterion for establishing fuzzy regression models in the formulation processes.
(3) Credibility measure
The credibility measure [44] was proposed as a performance criterion in formulation processes to find the optimal h level based on Tanaka et al.’s model [2]. This measure considers a crisp observed response variable and a fuzzy predicted response, and is represented as:
$$MFC_i^* = \frac{\mu_{\hat{Y}_i}(Y_i)}{(\hat{Y}_i)^U - (\hat{Y}_i)^L} \tag{13}$$
where $\mu_{\hat{Y}_i}(Y_i)$ is the membership degree of the observed response $Y_i$ in the predicted response $\hat{Y}_i$, and $(\hat{Y}_i)^L$ and $(\hat{Y}_i)^U$ are the lower and upper bounds of $\hat{Y}_i$. To obtain the credibility of two fuzzy numbers, Eq. (13) can be modified by replacing the membership degree of $Y_i$ with the highest membership degree of the intersection of $Y_i$ and $\hat{Y}_i$, formulated as:
$$MPF = \sum_{i=1}^{n} MFC_i = \sum_{i=1}^{n} \frac{\max_x \mu_{\hat{Y}_i \cap Y_i}(x)}{(\hat{Y}_i)^U - (\hat{Y}_i)^L} \tag{14}$$
The above three measures are used to compare the performance of the proposed model with existing approaches via illustrative examples.
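The per-observation terms of the three measures can be sketched for TFNs as follows. This is a numerical illustration under explicit assumptions, not the authors' implementation: TFNs are `(l, c, u)` tuples; the integrals in Eq. (11) are taken as min-area over max-area and approximated with the trapezoidal rule on a uniform grid; the height of $\hat{Y}_i \cap Y_i$ in Eq. (14) is found by bisection on $\alpha$ (largest $\alpha$ at which the two cuts still overlap); and all function names are hypothetical. Summing (or averaging, for $S$) over $i$ gives $D$, $MPF$, and $S$.

```python
# Sketch of the per-observation terms of Eqs. (11)-(14) for TFNs (l, c, u).
# Assumptions: trapezoidal integration for Eq. (11); bisection on alpha for
# the intersection height in Eq. (14). Names are hypothetical.


def membership(t, x):
    """Membership degree of x in the TFN t."""
    l, c, u = t
    if x < l or x > u:
        return 0.0
    if x <= c:
        return (x - l) / (c - l) if c > l else 1.0
    return (u - x) / (u - c)


def cut(t, alpha):
    """alpha-cut [L, U] of a TFN."""
    l, c, u = t
    return (l + alpha * (c - l), u - alpha * (u - c))


def similarity(y, y_hat, n=3001):
    """S_i: area under min(mu_y, mu_yhat) divided by area under their max."""
    lo, hi = min(y[0], y_hat[0]), max(y[2], y_hat[2])
    if hi == lo:
        return 1.0  # both are the same crisp value
    h = (hi - lo) / (n - 1)
    xs = [lo + k * h for k in range(n)]

    def area(op):
        v = [op(membership(y, x), membership(y_hat, x)) for x in xs]
        return h * (sum(v) - 0.5 * (v[0] + v[-1]))  # trapezoidal rule

    return area(min) / area(max)


def distance(y, y_hat, m=11):
    """D_i of Eqs. (12)/(17) with alpha_k = (k - 1) / (m - 1); requires m >= 2."""
    total = 0.0
    for k in range(1, m + 1):
        a = (k - 1) / (m - 1)
        yl, yu = cut(y, a)
        hl, hu = cut(y_hat, a)
        total += abs(hl - yl) + abs(hu - yu)
    return total / (2 * m)


def credibility(y, y_hat, iters=60):
    """MFC_i of Eq. (14); y_hat must have a non-zero support width."""
    def overlap(a):
        (yl, yu), (hl, hu) = cut(y, a), cut(y_hat, a)
        return max(yl, hl) <= min(yu, hu)

    if overlap(1.0):
        height = 1.0
    elif not overlap(0.0):
        height = 0.0
    else:
        lo, hi = 0.0, 1.0  # invariant: overlap(lo) is True, overlap(hi) is False
        for _ in range(iters):
            mid = (lo + hi) / 2
            lo, hi = (mid, hi) if overlap(mid) else (lo, mid)
        height = lo
    return height / (y_hat[2] - y_hat[0])


Y, Y_hat = (0, 1, 2), (1, 2, 3)
print(similarity(Y, Y_hat), distance(Y, Y_hat), credibility(Y, Y_hat))
```

For the shifted pair above, the printed values should be close to 1/7, 1.0, and 0.25. A crisp observation can be passed as a degenerate TFN `(v, v, v)`, in which case `credibility` reduces to Eq. (13).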
4. Fuzzy regression modeling based on FPC
To retain the uncertainty of fuzzy regression models, this study formulates fuzzy regression models with fuzzy parameters using observations with fuzzy explanatory and response variables. The proposed FPC operator is adopted in the formulation processes, since it focuses on the core information. With the proposed operator, the estimated responses do not have unnecessary spread in their membership functions, increasing model performance. Consider the fuzzy observation set $(Y_i, X_{i1}, X_{i2}, \ldots, X_{ij}, \ldots, X_{ip})$, where $Y_i = (y_i^L, y_i^C, y_i^U)$ is the response variable and $X_{ij} = (x_{ij}^L, x_{ij}^C, x_{ij}^U)$ represents the explanatory variables, all of which are TFNs. The general predicted fuzzy regression model can be expressed as:
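$$\hat{Y}_i = \hat{B}_0 \oplus \hat{B}_1 \otimes X_{i1} \oplus \hat{B}_2 \otimes X_{i2} \oplus \cdots \oplus \hat{B}_p \otimes X_{ip} = \bigoplus_{j=0}^{p} \hat{B}_j \otimes X_{ij} \tag{15}$$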
where $\hat{Y}_i = (\hat{y}_i^L, \hat{y}_i^C, \hat{y}_i^U)$ is the predicted fuzzy response variable, $\hat{B}_j = (\hat{b}_j^L, \hat{b}_j^C, \hat{b}_j^U)$ is the estimated fuzzy parameter, and $X_{i0} = (1, 1, 1)$. To enhance the performance of fuzzy regression models, this study uses the proposed FPC operator in the formulation processes. Assuming that the FPC product of each explanatory variable and its corresponding parameter exists, the predicted fuzzy regression model with the FPC operator can be formulated as:
$$\hat{Y}_i = \hat{B}_0 \oplus \mathrm{FPC}(\hat{B}_1 \otimes X_{i1}) \oplus \mathrm{FPC}(\hat{B}_2 \otimes X_{i2}) \oplus \cdots \oplus \mathrm{FPC}(\hat{B}_p \otimes X_{ip}) = \bigoplus_{j=0}^{p} \mathrm{FPC}(\hat{B}_j \otimes X_{ij}) \tag{16}$$
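Eq. (16) can be evaluated for given fuzzy parameters by summing FPC terms component-wise, as in Eq. (1). The sketch below is illustrative only: it applies Definition 4 directly rather than the sign-specific rules R1-R8 developed next, treats $X_{i0} = (1, 1, 1)$ so that the intercept term reduces to $\hat{B}_0$ (by P4), and uses hypothetical names `fpc` and `predict`. It returns `None` whenever some FPC term does not exist, since Eq. (16) assumes that all of them do.

```python
# Sketch of Eq. (16): Y_hat = B0 ⊕ FPC(B1 ⊗ X1) ⊕ ... ⊕ FPC(Bp ⊗ Xp), X_i0 = (1, 1, 1).
# Applies Definition 4 directly; function names are hypothetical.


def fpc(a, b):
    """Definition 4: (o(2), a^C b^C, o(3)), or None if it does not exist."""
    o = sorted(x * y for x in (a[0], a[2]) for y in (b[0], b[2]))
    c = a[1] * b[1]
    return (o[1], c, o[2]) if o[1] <= c <= o[2] else None


def predict(params, x_row):
    """params = [B0, B1, ..., Bp]; x_row = [X1, ..., Xp]; all TFNs (l, c, u)."""
    terms = [fpc(b, x) for b, x in zip(params, [(1, 1, 1)] + list(x_row))]
    if any(t is None for t in terms):
        return None  # Eq. (16) assumes every FPC term exists
    # Eq. (1): add the TFN terms component-wise.
    return tuple(sum(t[k] for t in terms) for k in range(3))


params = [(0.5, 1.0, 1.5), (1, 2, 4)]
print(predict(params, [(2, 3, 4)]))  # (4.5, 7.0, 9.5)
```

In the actual formulation the parameters are decision variables of the mathematical programming model rather than given values; this sketch only shows how a fitted model would produce a predicted TFN.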
Based on P3, various approximate TFNs of $\mathrm{FPC}(\hat{B}_j \otimes X_{ij})$ can be adopted, depending on the combination of the signs of the explanatory variable and the corresponding parameter. However, to avoid unnecessary information in the formulation processes, a specific FPC is chosen for each combination. For example, when $X_{ij}, \hat{B}_j \geq 0$, either Eq. (7) or Eq. (8) can be applied, as stated in P3; this study adopts Eq. (8), i.e., $\mathrm{FPC}(\hat{B}_j \otimes X_{ij}) = (X_{ij}^L \hat{b}_j^U, X_{ij}^C \hat{b}_j^C, X_{ij}^U \hat{b}_j^L)$. This choice aims to reduce the spread of the fuzzy parameter as much as possible through the objective function of the mathematical programming model; that is, applying Eq. (8) gives the best chance of obtaining fuzzy parameters with minimal spread. In the extreme case, a crisp parameter can be produced when optimizing the objective function subject to the restrictions, so that the spread of the fuzzy parameter converges to zero. In that case, $\hat{b}_j^L = \hat{b}_j^C = \hat{b}_j^U = \hat{b}_j$, and the FPC requirement $\mathrm{FPC}(\hat{B}_j \otimes X_{ij})^L \leq \mathrm{FPC}(\hat{B}_j \otimes X_{ij})^C \leq \mathrm{FPC}(\hat{B}_j \otimes X_{ij})^U$ is satisfied: with the notation $\hat{b}_j$, the relation $X_{ij}^L \hat{b}_j \leq X_{ij}^C \hat{b}_j \leq X_{ij}^U \hat{b}_j$ holds, implying that $X_{ij}^L / X_{ij}^C \leq \hat{b}_j / \hat{b}_j = 1$ and $X_{ij}^C / X_{ij}^U \leq \hat{b}_j / \hat{b}_j = 1$ based on P3. In addition, regardless of whether the parameters are crisp, the relations $X_{ij}^L \hat{b}_j^U \leq X_{ij}^C \hat{b}_j^C \leq X_{ij}^U \hat{b}_j^L$ and $\hat{b}_j^L \leq \hat{b}_j^C \leq \hat{b}_j^U$ should be imposed as restrictions in the mathematical programming model to ensure that the FPC exists, as stated in P1. In contrast, if Eq. (7) is applied for $X_{ij}, \hat{B}_j \geq 0$, the reduction of unnecessary information in terms of the spread of the fuzzy parameter may be limited. For example, the extreme case that produces crisp parameters cannot occur, since with $\hat{b}_j > 0$ the relation $X_{ij}^U \hat{b}_j \leq X_{ij}^C \hat{b}_j \leq X_{ij}^L \hat{b}_j$ would require $X_{ij}^U \leq X_{ij}^C \leq X_{ij}^L$, violating the basic requirements of a fuzzy number. Furthermore, applying Eq. (7) for $X_{ij}, \hat{B}_j \geq 0$ may yield a wider spread of the fuzzy parameter than Eq. (8), so that unnecessary information is retained. Similarly, the appropriate $\mathrm{FPC}(\hat{B}_j \otimes X_{ij})$ can be determined
15
Journal Pre-proof based on the other combinations of the value signs of X ij and Bˆ j . The selection of
FPC ( Bˆ j X ij ) based on the four combinations is summarized by the following four rules: R1. If X ij , Bˆ j 0 , apply Eq. (8): FPC ( Bˆ j X ij ) ( X ijLbˆUj , X ijC bˆCj , X ijU bˆjL )
pro of
R2. If X ij , Bˆ j 0 , apply Eq. (7): FPC ( Bˆ j X ij ) ( X ijU bˆjL , X ijC bˆCj , X ijLbˆUj ) R3. If X ij 0 and Bˆ j 0 , apply Eq. (10): FPC ( Bˆ j X ij ) ( X ijU bˆUj , X ijC bˆCj , X ijLbˆjL ) R4. If X ij 0 and Bˆ j 0 , apply Eq. (9): FPC ( Bˆ j X ij ) ( X ijLbˆ jL , X ijC bˆCj , X ijU bˆUj ) For datasets that contain observations with fuzzy response and crisp explanatory variables, P4 shows that the FPC operator becomes the usual fuzzy multiplication operator, i.e., FPC ( A B) =
re-
A B . If the crisp explanatory variable and its approximated parameter are positive and R1 is adopted, the estimated parameter should be only crisp, since the relation bˆUj xij bˆCj xij bˆjL xij in R1
lP
will make the equivalence of bˆjL bˆCj bˆUj , resulting in unreasonably formulated fuzzy regression models. As such, rule R1 for this case should be modified to adopt Eq. (7) as FPC ( Bˆ j X ij )
urn a
( X ij bˆ jL , X ij bˆCj , X ij bˆUj ) , where the relation X ij bˆ jL X ij bˆCj X ij bˆUj implies that bˆ jL bˆCj X ij X ij 1 and bˆCj bˆUj X ij X ij 1 based on P3. Similarly, the determination of FPC ( Bˆ j X ij ) for the other cases with crisp explanatory variables can be obtained as shown by the following rules: R5. If X ij , Bˆ j 0 , apply Eq. (7): FPC ( Bˆ j X ij ) ( X ij bˆ jL , X ij bˆCj , X ij bˆUj )
Jo
R6. If $x_{ij}, \hat{B}_j \le 0$, apply Eq. (8): $\mathrm{FPC}(\hat{B}_j \otimes x_{ij}) = (x_{ij}\hat{b}_j^U,\ x_{ij}\hat{b}_j^C,\ x_{ij}\hat{b}_j^L)$
R7. If $x_{ij} \ge 0$ and $\hat{B}_j \le 0$, apply Eq. (9): $\mathrm{FPC}(\hat{B}_j \otimes x_{ij}) = (x_{ij}\hat{b}_j^L,\ x_{ij}\hat{b}_j^C,\ x_{ij}\hat{b}_j^U)$
R8. If $x_{ij} \le 0$ and $\hat{B}_j \ge 0$, apply Eq. (10): $\mathrm{FPC}(\hat{B}_j \otimes x_{ij}) = (x_{ij}\hat{b}_j^U,\ x_{ij}\hat{b}_j^C,\ x_{ij}\hat{b}_j^L)$

Following the above rules, the prediction variable in Eq. (16) can be expressed in terms of the aggregated membership functions of $\mathrm{FPC}(\hat{B}_j \otimes X_{ij})$ or $\mathrm{FPC}(\hat{B}_j \otimes x_{ij})$. Without loss of generality, and for simplicity, only the notation $\mathrm{FPC}(\hat{B}_j \otimes X_{ij})$ is used hereafter, since a crisp value can be regarded as a degenerate fuzzy number. However, to apply the correct rules when formulating the mathematical programming model, the signs of the explanatory variables and their corresponding parameters must be known, and the signs of the TFN parameters cannot be determined in advance. To deal with this problem, this study sets up, for each parameter, two alternative TFN variables with different signs, $\hat{B}_{j1} \le 0$ and $\hat{B}_{j2} \ge 0$. Based on Lemma 1, if the fuzzy regression model

$$\hat{Y}_i = \sum_{j=0}^{p} \left[ \mathrm{FPC}(\hat{B}_{j1} \otimes X_{ij}) \oplus \mathrm{FPC}(\hat{B}_{j2} \otimes X_{ij}) \right] \quad \text{subject to} \quad \hat{B}_{j1} \otimes \hat{B}_{j2} = 0$$

is used in the formulation of the mathematical programming problem, then one alternative TFN variable is obtained as the optimal parameter for the corresponding explanatory variable and the other is zero. The mathematical programming problem can therefore be constructed to determine the optimal fuzzy parameters by setting up the two alternative variables $\hat{B}_{j1}$ and $\hat{B}_{j2}$, $j = 0, 1, \ldots, p$, for each parameter. As the performance criterion, this study adopts the absolute distance between the observed response $Y_i$ of each observation and the predicted response $\hat{Y}_i$ of the fuzzy regression model [43], defined as

$$D_i = \frac{1}{2m} \sum_{k=1}^{m} \left( \left| (\hat{Y}_i)^L_{\alpha_k} - (Y_i)^L_{\alpha_k} \right| + \left| (\hat{Y}_i)^U_{\alpha_k} - (Y_i)^U_{\alpha_k} \right| \right), \quad i = 1, 2, \ldots, n \tag{17}$$
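where $[(\hat{Y}_i)^L_{\alpha_k}, (\hat{Y}_i)^U_{\alpha_k}]$ and $[(Y_i)^L_{\alpha_k}, (Y_i)^U_{\alpha_k}]$ are the $\alpha_k$-cut intervals of $\hat{Y}_i$ and $Y_i$, respectively. With LAD as the objective, the mathematical programming problem minimizes the total absolute distance over all observations:

$$\min \sum_{i=1}^{n} D_i = \sum_{i=1}^{n} \frac{1}{2m} \sum_{k=1}^{m} \left( \left| (\hat{Y}_i)^L_{\alpha_k} - (Y_i)^L_{\alpha_k} \right| + \left| (\hat{Y}_i)^U_{\alpha_k} - (Y_i)^U_{\alpha_k} \right| \right) \tag{18}$$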
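$$\text{s.t.} \quad \hat{Y}_i = \sum_{j=0}^{p} \left[ \mathrm{FPC}(\hat{B}_{j1} \otimes X_{ij}) \oplus \mathrm{FPC}(\hat{B}_{j2} \otimes X_{ij}) \right] \tag{19}$$

$$\mathrm{FPC}(\hat{B}_{js} \otimes X_{ij})^L \le \mathrm{FPC}(\hat{B}_{js} \otimes X_{ij})^C \le \mathrm{FPC}(\hat{B}_{js} \otimes X_{ij})^U, \quad \forall i, j, s \tag{20}$$

$$\hat{b}_{js}^L \le \hat{b}_{js}^C \le \hat{b}_{js}^U, \quad \forall j, s \tag{21}$$

$$\hat{B}_{j1} \otimes \hat{B}_{j2} = 0, \quad \forall j \tag{22}$$

$$i = 1, 2, \ldots, n; \quad j = 0, 1, \ldots, p; \quad s = 1, 2$$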
where $\mathrm{FPC}(\hat{B}_{j1} \otimes X_{ij})$ is determined by R2 or R3, and $\mathrm{FPC}(\hat{B}_{j2} \otimes X_{ij})$ by R1 or R4. The constraints in Eqs. (20) and (21) ensure that the determined $\mathrm{FPC}(\hat{B}_{js} \otimes X_{ij})$ exists, as stated in P1, and that each $\hat{B}_{js}$ meets the basic requirements of a fuzzy number, even if some estimated parameters turn out to be crisp. Eq. (22) requires one of the two alternative parameters to be zero.
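The rule selection in R1 to R8 can be sketched in Python as follows. This is a minimal, hypothetical illustration rather than the authors' implementation: a TFN is assumed to be a tuple `(L, C, U)`, a crisp explanatory value is passed as a plain number, and the names `fpc`, `is_nonneg`, `is_nonpos`, and `is_valid_tfn` are our own. A TFN is treated as nonnegative when its lower end is at least zero and as nonpositive when its upper end is at most zero; `fpc` only evaluates the rule formulas, and the existence condition of Eqs. (20) and (21) is checked separately.

```python
from typing import Tuple, Union

TFN = Tuple[float, float, float]  # (lower, center, upper)


def is_nonneg(v: TFN) -> bool:
    return v[0] >= 0


def is_nonpos(v: TFN) -> bool:
    return v[2] <= 0


def fpc(b: TFN, x: Union[float, TFN]) -> TFN:
    """Evaluate FPC(B (x) X) with the rule matching the signs of B and X."""
    bL, bC, bU = b
    if isinstance(x, (int, float)):          # crisp explanatory value: R5-R8
        if x >= 0 and is_nonneg(b):          # R5, Eq. (7)
            return (x * bL, x * bC, x * bU)
        if x <= 0 and is_nonpos(b):          # R6, Eq. (8)
            return (x * bU, x * bC, x * bL)
        if x >= 0 and is_nonpos(b):          # R7, Eq. (9)
            return (x * bL, x * bC, x * bU)
        if x <= 0 and is_nonneg(b):          # R8, Eq. (10)
            return (x * bU, x * bC, x * bL)
    else:                                    # fuzzy explanatory value: R1-R4
        xL, xC, xU = x
        if is_nonneg(x) and is_nonneg(b):    # R1, Eq. (8)
            return (xL * bU, xC * bC, xU * bL)
        if is_nonpos(x) and is_nonpos(b):    # R2, Eq. (7)
            return (xU * bL, xC * bC, xL * bU)
        if is_nonneg(x) and is_nonpos(b):    # R3, Eq. (10)
            return (xU * bU, xC * bC, xL * bL)
        if is_nonpos(x) and is_nonneg(b):    # R4, Eq. (9)
            return (xL * bL, xC * bC, xU * bU)
    raise ValueError("B and X must each be entirely nonnegative or nonpositive")


def is_valid_tfn(v: TFN) -> bool:
    """Existence condition behind Eqs. (20)-(21): lower <= center <= upper."""
    return v[0] <= v[1] <= v[2]
```

For instance, `fpc((0.4397, 0.4902, 0.5451), (151, 274, 322))` applies R1 and pairs $X^L$ with $\hat{b}^U$ and $X^U$ with $\hat{b}^L$, which is what narrows the spread relative to the usual product.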
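The above model can be further transformed into a linear programming model for computational efficiency. Define the positive deviation variables $\delta_{i1}^{k} = \max\{(Y_i)^L_{\alpha_k} - (\hat{Y}_i)^L_{\alpha_k}, 0\}$ and $\eta_{i1}^{k} = \max\{(Y_i)^U_{\alpha_k} - (\hat{Y}_i)^U_{\alpha_k}, 0\}$, and the negative deviation variables $\delta_{i2}^{k} = \max\{(\hat{Y}_i)^L_{\alpha_k} - (Y_i)^L_{\alpha_k}, 0\}$ and $\eta_{i2}^{k} = \max\{(\hat{Y}_i)^U_{\alpha_k} - (Y_i)^U_{\alpha_k}, 0\}$. Then $(Y_i)^L_{\alpha_k} - (\hat{Y}_i)^L_{\alpha_k} = \delta_{i1}^{k} - \delta_{i2}^{k}$ and $(Y_i)^U_{\alpha_k} - (\hat{Y}_i)^U_{\alpha_k} = \eta_{i1}^{k} - \eta_{i2}^{k}$, so that $|(\hat{Y}_i)^L_{\alpha_k} - (Y_i)^L_{\alpha_k}| = \delta_{i1}^{k} + \delta_{i2}^{k}$ and $|(\hat{Y}_i)^U_{\alpha_k} - (Y_i)^U_{\alpha_k}| = \eta_{i1}^{k} + \eta_{i2}^{k}$. The model formulation then becomes: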
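$$\min \sum_{i=1}^{n} D_i = \sum_{i=1}^{n} \frac{1}{2m} \sum_{k=1}^{m} \left( \delta_{i1}^{k} + \delta_{i2}^{k} + \eta_{i1}^{k} + \eta_{i2}^{k} \right) \tag{23}$$

$$\text{s.t.} \quad (Y_i)^L_{\alpha_k} - \sum_{j=0}^{p} \mathrm{FPC}(\hat{B}_{j1} \otimes X_{ij})^L_{\alpha_k} - \sum_{j=0}^{p} \mathrm{FPC}(\hat{B}_{j2} \otimes X_{ij})^L_{\alpha_k} = \delta_{i1}^{k} - \delta_{i2}^{k} \tag{24}$$

$$(Y_i)^U_{\alpha_k} - \sum_{j=0}^{p} \mathrm{FPC}(\hat{B}_{j1} \otimes X_{ij})^U_{\alpha_k} - \sum_{j=0}^{p} \mathrm{FPC}(\hat{B}_{j2} \otimes X_{ij})^U_{\alpha_k} = \eta_{i1}^{k} - \eta_{i2}^{k} \tag{25}$$

$$\mathrm{FPC}(\hat{B}_{js} \otimes X_{ij})^L_{\alpha_k} \le \mathrm{FPC}(\hat{B}_{js} \otimes X_{ij})^C \le \mathrm{FPC}(\hat{B}_{js} \otimes X_{ij})^U_{\alpha_k} \tag{26}$$

$$\hat{b}_{js}^L \le \hat{b}_{js}^C \le \hat{b}_{js}^U \tag{27}$$

$$\hat{B}_{j1} \otimes \hat{B}_{j2} = 0 \tag{28}$$

$$\delta_{i1}^{k},\ \delta_{i2}^{k},\ \eta_{i1}^{k},\ \eta_{i2}^{k} \ge 0 \tag{29}$$

$$i = 1, 2, \ldots, n; \quad j = 0, 1, \ldots, p; \quad k = 1, 2, \ldots, m; \quad s = 1, 2$$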
where the corresponding $\mathrm{FPC}(\hat{B}_{js} \otimes X_{ij})$ follows one of rules R1 to R4 if the explanatory variable $X_{ij}$ is fuzzy, or one of rules R5 to R8 if $X_{ij}$ is crisp.
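The linearization above rests on splitting each endpoint difference into a positive and a negative part whose sum is the absolute value. The sketch below checks this for a single observation by evaluating the absolute distance of Eq. (17) directly and through the deviation split. It is illustrative only: TFNs are assumed to be `(L, C, U)` tuples, the α-cut of a TFN is taken as the standard interval $[L + \alpha(C - L),\ U - \alpha(U - C)]$, the α-levels are supplied by the caller, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

TFN = Tuple[float, float, float]


def alpha_cut(a: TFN, alpha: float) -> Tuple[float, float]:
    """Interval [lower, upper] of a triangular fuzzy number at level alpha."""
    lo, c, up = a
    return lo + alpha * (c - lo), up - alpha * (up - c)


def distance(y: TFN, y_hat: TFN, alphas: Sequence[float]) -> float:
    """Absolute distance D_i of Eq. (17) for one observation."""
    total = 0.0
    for a in alphas:
        yl, yu = alpha_cut(y, a)
        hl, hu = alpha_cut(y_hat, a)
        total += abs(hl - yl) + abs(hu - yu)
    return total / (2 * len(alphas))


def deviation_split(y: TFN, y_hat: TFN,
                    alphas: Sequence[float]) -> List[Tuple[float, float, float, float]]:
    """Per alpha-cut: (positive lower, negative lower, positive upper, negative upper)."""
    out = []
    for a in alphas:
        yl, yu = alpha_cut(y, a)
        hl, hu = alpha_cut(y_hat, a)
        out.append((max(yl - hl, 0.0), max(hl - yl, 0.0),
                    max(yu - hu, 0.0), max(hu - yu, 0.0)))
    return out


y, y_hat = (4.2, 6.4, 8.6), (7.3, 9.25, 11.2)   # observation 2 of Example 1
alphas = [0.0, 1.0]
d_direct = distance(y, y_hat, alphas)
d_split = sum(sum(p) for p in deviation_split(y, y_hat, alphas)) / (2 * len(alphas))
print(round(d_direct, 4), round(d_split, 4))
```

Because at most one variable in each positive/negative pair is nonzero, minimizing their sum in an LP recovers the absolute deviation without any absolute-value terms in the objective.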
The above linear programming model can easily be solved using commercial software such as LINGO [45]. In addition, three or four α-cuts are sufficient in the model when the membership functions of the fuzzy numbers are triangular [43].
5. Illustrative examples
In this section, several examples are used to illustrate the proposed approach and compare it with existing approaches. The first example includes one crisp explanatory variable and a fuzzy response; in Examples 2 to 4, both the explanatory and response variables are fuzzy. The performance of the proposed model (denoted CN) is compared with those of the models of Tanaka et al. [2] (TKB), Diamond [46] (DM), Kao and Chyu [8] (KC), Chen and Hsueh [43] (CH-MP), Chen and Hsueh [18] (CH-LSE), Wu [1] (WU), and Kelkinnama and Taheri [36] (KT) in terms of similarity, distance, and credibility. Like CN, DM, CH-MP, CH-LSE, and KT minimize a distance criterion. TKB seeks the best explanatory or predictive fuzzy regression model under a subjective confidence level, and KC formulates the model with the minimum difference in membership degrees between the observed and predicted fuzzy responses.

Example 1: This example adopts a dataset designed by Tanaka et al. [2], which has been used to illustrate the performance of fuzzy regression models [2, 8, 43, 46]. The dataset includes five fuzzy observations with one crisp explanatory variable and a fuzzy response (see Table 1). Chen and Hsueh [43] used this dataset to verify their model and demonstrate its advantages. Since the explanatory variable is crisp and positive in this example, R5 and R7 are adopted for the formulation. Using two α-cuts, the proposed model is formulated as

$$\hat{Y}_{CN} = (5.1, 6.75, 8.4) \oplus (1.1, 1.25, 1.4)\, X \tag{30}$$
As Table 2 shows, the proposed model (CN) and CH-MP attain the minimum distance. For the similarity and credibility measures, CN performs best among the five approaches. Although the proposed model was formulated for the case in which the explanatory and response variables and the parameters are fuzzy, it still outperforms the other methods when the explanatory variable is crisp.

Table 1. Dataset for example 1

| Obs. | X | Y | $\hat{Y}$ |
|---|---|---|---|
| 1 | 1 | (6.2, 8, 9.8) | (6.2, 8, 9.8) |
| 2 | 2 | (4.2, 6.4, 8.6) | (7.3, 9.25, 11.2) |
| 3 | 3 | (6.9, 9.5, 12.1) | (8.4, 10.5, 12.6) |
| 4 | 4 | (10.9, 13.5, 16.1) | (9.5, 11.75, 14) |
| 5 | 5 | (10.6, 13, 15.4) | (10.6, 13, 15.4) |

Table 2. Comparison of performance of five approaches for example 1

| Criterion | TKB | DM | KC | CH-MP | CN |
|---|---|---|---|---|---|
| Mean similarity | 2.7181 | 2.7548 | 2.9123 | 3.0409 | 3.1264 |
| Distance | 12.3125 | 6.1125 | 6.0125 | 5.6000 | 5.6000 |
| Credibility | 0.4190 | 0.7764 | 0.7618 | 0.7894 | 0.8957 |
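As a check on Eq. (30) and Tables 1 and 2, the sketch below recomputes the predicted responses of Example 1 with R5 (a positive crisp X multiplied by a nonnegative TFN parameter) and sums the absolute distance of Eq. (17) over the five observations. Taking the two α-cuts at α = 0 and α = 1 is our assumption; with that choice the total agrees with the CN distance of 5.6000 in Table 2. The names `predict` and `distance` are hypothetical.

```python
from typing import Tuple

TFN = Tuple[float, float, float]

# Eq. (30): intercept and slope as (lower, center, upper)
A0: TFN = (5.1, 6.75, 8.4)
A1: TFN = (1.1, 1.25, 1.4)

# Table 1: crisp X and fuzzy Y
DATA = [
    (1, (6.2, 8.0, 9.8)),
    (2, (4.2, 6.4, 8.6)),
    (3, (6.9, 9.5, 12.1)),
    (4, (10.9, 13.5, 16.1)),
    (5, (10.6, 13.0, 15.4)),
]


def predict(x: float) -> TFN:
    # R5: x >= 0 and B >= 0 gives (x*bL, x*bC, x*bU); the TFN intercept is then added
    return tuple(a + x * b for a, b in zip(A0, A1))


def distance(y: TFN, y_hat: TFN, alphas=(0.0, 1.0)) -> float:
    # Eq. (17) with alpha-cuts [L + a(C - L), U - a(U - C)]
    total = 0.0
    for a in alphas:
        yl, yu = y[0] + a * (y[1] - y[0]), y[2] - a * (y[2] - y[1])
        hl, hu = y_hat[0] + a * (y_hat[1] - y_hat[0]), y_hat[2] - a * (y_hat[2] - y_hat[1])
        total += abs(hl - yl) + abs(hu - yu)
    return total / (2 * len(alphas))


predictions = [predict(x) for x, _ in DATA]
total_distance = sum(distance(y, yh) for (_, y), yh in zip(DATA, predictions))
print([tuple(round(v, 2) for v in p) for p in predictions])
print(round(total_distance, 4))  # 5.6
```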
Example 2: The dataset in this example, proposed by Sakawa and Yano [5], contains eight fuzzy observations with a fuzzy response variable and one fuzzy explanatory variable (see Table 3). Since the explanatory variable is a positive fuzzy number, R1 and R3 are adopted to determine the FPC products for the two alternative TFN parameters. Applying two α-cuts in Eq. (17), the proposed model is

$$\hat{Y}_{CN} = (3.6667, 3.9444, 4.2222) \oplus (0.444 X^L, 0.444 X^C, 0.444 X^U) \tag{31}$$

Because the estimated slope parameter in this formulation is the crisp value 0.444, the FPC operation on X and the parameter degenerates to the multiplication of a fuzzy number by a crisp one. As shown in Table 4, the proposed model has a lower distance than DM and CH-MP, which also use distance as the criterion. For the similarity measure, the TKB model performs best, but its distance is much worse than those of the other models. The proposed model performs best in terms of the distance and credibility measures and second best in terms of the similarity measure.
Table 3. Dataset for example 2

| Obs. | X | Y | $\hat{Y}$ |
|---|---|---|---|
| 1 | (1.5, 2, 2.5) | (3.5, 4, 4.5) | (4.33, 4.83, 5.33) |
| 2 | (3, 3.5, 4) | (5, 5.5, 6) | (5.00, 5.50, 6.00) |
| 3 | (4.5, 5.5, 6.5) | (6.5, 7.5, 8.5) | (5.67, 6.39, 7.11) |
| 4 | (6.5, 7, 7.5) | (6, 6.5, 7) | (6.56, 7.06, 7.56) |
| 5 | (8, 8.5, 9) | (8, 8.5, 9) | (7.22, 7.72, 8.22) |
| 6 | (9.5, 10.5, 11.5) | (7, 8, 9) | (7.89, 8.61, 9.33) |
| 7 | (10.5, 11, 11.5) | (10, 10.5, 11) | (8.33, 8.83, 9.33) |
| 8 | (12, 12.5, 13) | (9, 9.5, 10) | (9.00, 9.50, 10.00) |
Table 4. Comparison of performance of five approaches for example 2

| Criterion | TKB | DM | KC | CH-MP | CN |
|---|---|---|---|---|---|
| Mean similarity | 3.1051 | 1.9276 | 1.7829 | 2.8080 | 2.8168 |
| Distance | 9.8203 | 5.8235 | 5.8165 | 5.564 | 5.5556 |
| Credibility | 1.5219 | 2.8557 | 3.0246 | 0.5145 | 3.5209 |
Example 3: In this example, a dataset consisting of 15 fuzzy observations is adopted (see Table 5). Wu [1] applied the extension principle to formulate a fuzzy regression model from this dataset. Chen and Hsueh [18, 43] also adopted this dataset to establish two different fuzzy regression models based on the distance measure, one by mathematical programming (CH-MP) and one by least-squares errors (CH-LSE). For the proposed model, since all explanatory variables are positive TFNs, R1 and R3 are adopted. The fuzzy regression model obtained using the FPC operator is

$$\hat{Y}_{CN} = 5.3583 \oplus (0.5451 X_1^L, 0.4902 X_1^C, 0.4397 X_1^U) \oplus (0.0116 X_2^L, 0.0089 X_2^C, 0.0089 X_2^U) \tag{32}$$

A performance comparison of WU, CH-MP, CH-LSE, and CN is shown in Table 6. Although Chen and Hsueh [18, 43] proposed two models with crisp parameters to reduce the unnecessary spread of the predicted fuzzy response variable, the model obtained using the proposed FPC operator has the lowest distance among the four models. That is, with the FPC operator, even when a fuzzy regression model is formulated with fuzzy explanatory variables, a fuzzy response, and fuzzy parameters, the resulting model outperforms the other models in terms of the distance measure.

Table 5. Dataset for example 3

| Obs. | X1 | X2 | Y | $\hat{Y}$ |
|---|---|---|---|---|
| 1 | (151, 274, 322) | (1432, 2450, 3461) | (111, 162, 194) | (104.24, 161.53, 177.82) |
| 2 | (101, 180, 291) | (2448, 3254, 4463) | (88, 120, 161) | (88.73, 122.62, 173.13) |
| 3 | (221, 375, 539) | (2592, 3802, 5116) | (161, 223, 288) | (155.81, 223.10, 288.00) |
| 4 | (128, 205, 313) | (1414, 2838, 3252) | (83, 131, 194) | (91.49, 131.17, 172.00) |
| 5 | (62, 86, 112) | (1024, 2347, 3766) | (51, 67, 83) | (51.00, 68.46, 88.21) |
| 6 | (132, 265, 362) | (2136, 3782, 5091) | (124, 169, 213) | (102.02, 169.00, 209.95) |
| 7 | (66, 98, 152) | (1687, 3008, 4325) | (62, 81, 102) | (60.85, 80.24, 110.79) |
| 8 | (151, 330, 463) | (1524, 2450, 3864) | (138, 192, 241) | (105.30, 188.97, 243.41) |
| 9 | (115, 195, 291) | (1216, 2137, 3161) | (82, 116, 159) | (82.11, 120.01, 161.51) |
| 10 | (35, 53, 71) | (1432, 2560, 3782) | (41, 55, 71) | (41.00, 54.18, 70.33) |
| 11 | (307, 430, 584) | (2592, 4020, 5562) | (168, 252, 367) | (202.69, 252.00, 311.77) |
| 12 | (284, 372, 498) | (2792, 4427, 6163) | (178, 232, 346) | (192.47, 227.20, 279.32) |
| 13 | (121, 236, 370) | (1734, 2660, 4094) | (111, 144, 198) | (91.37, 144.77, 204.57) |
| 14 | (103, 157, 211) | (1426, 2088, 3312) | (78, 103, 148) | (78.00, 100.95, 127.69) |
| 15 | (216, 370, 516) | (1785, 2605, 4042) | (167, 212, 267) | (143.75, 209.96, 268.30) |
Table 6. Comparison of performance of four approaches for example 3

| Criterion | WU | CH-MP | CH-LSE | CN |
|---|---|---|---|---|
| Mean similarity | 12.2013 | 12.5974 | 12.2649 | 12.8657 |
| Distance | 151.7525 | 118.1732 | 125.4342 | 109.6013 |
| Credibility | 0.1529 | 0.2401 | 0.2636 | 0.2079 |
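To make the R1 pairing concrete, the sketch below evaluates Eq. (32) for observation 1 of Table 5 and contrasts it with the usual product of nonnegative TFNs, $(X^L\hat{b}^L, X^C\hat{b}^C, X^U\hat{b}^U)$. It is an illustration only; the function names are hypothetical, and the small gaps from the tabulated $\hat{Y}$ presumably reflect the four-decimal rounding of the printed coefficients.

```python
from typing import Callable, Tuple

TFN = Tuple[float, float, float]

INTERCEPT = 5.3583                      # crisp intercept in Eq. (32)
B1: TFN = (0.4397, 0.4902, 0.5451)      # parameter of X1 as (bL, bC, bU)
B2: TFN = (0.0089, 0.0089, 0.0116)      # parameter of X2 as (bL, bC, bU)


def fpc_r1(b: TFN, x: TFN) -> TFN:
    """R1 (X >= 0, B >= 0): pair X^L with b^U and X^U with b^L."""
    return (x[0] * b[2], x[1] * b[1], x[2] * b[0])


def ordinary_product(b: TFN, x: TFN) -> TFN:
    """Usual product of two nonnegative TFNs: (X^L b^L, X^C b^C, X^U b^U)."""
    return (x[0] * b[0], x[1] * b[1], x[2] * b[2])


def predict(x1: TFN, x2: TFN, product: Callable[[TFN, TFN], TFN]) -> TFN:
    t1, t2 = product(B1, x1), product(B2, x2)
    return tuple(INTERCEPT + a + b for a, b in zip(t1, t2))


x1, x2 = (151, 274, 322), (1432, 2450, 3461)   # observation 1 of Table 5
with_fpc = predict(x1, x2, fpc_r1)
with_usual = predict(x1, x2, ordinary_product)
print([round(v, 2) for v in with_fpc])          # compare with (104.24, 161.53, 177.82)
print("spread with FPC:", round(with_fpc[2] - with_fpc[0], 2))
print("spread with usual product:", round(with_usual[2] - with_usual[0], 2))
```

Both products share the same center, but the usual product nearly doubles the spread of the prediction for this observation, which is the unnecessary fuzziness the FPC operator is designed to avoid.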
Example 4: This example uses a dataset containing 30 observations with four fuzzy explanatory variables (see Table 7). Chen and Hsueh [18] used this dataset to construct a model that minimizes the least-squares errors of the distance (CH-LSE):

$$\hat{Y}_{\text{CH-LSE}} = 0.859 X_1 \oplus 0.207 X_2 \oplus 0.134 X_3 \oplus 0.108 X_4 \oplus (10.794, 12.093, 12.132) \tag{33}$$

Kelkinnama and Taheri [36] (KT) used the same dataset, predetermined the signs of the parameters, and then adopted shape-preserving operations to construct an LAD-based model:

$$\begin{aligned} \hat{Y}_{KT} = {} & 14.526 \oplus_W (0.7858, 0.8429, 0.9420) \otimes_W X_1 \oplus_W (-0.3205, -0.1526, -0.1526) \otimes_W X_2 \\ & \oplus_W (-0.3362, -0.1709, -0.1077) \otimes_W X_3 \oplus_W (0.0824, 0.0664, 0.0982) \otimes_W X_4 \end{aligned} \tag{34}$$

where $\oplus_W$ and $\otimes_W$ are the arithmetic operations on fuzzy numbers based on the weakest t-norm [47]. For the proposed approach, the fuzzy regression model obtained using the FPC operator with the associated rules is

$$\begin{aligned} \hat{Y}_{CN} = {} & (12.0053, 14.2755, 14.2755) \oplus (0.8713 X_1^L, 0.8544 X_1^C, 0.8522 X_1^U) \oplus (-0.1570 X_2^U, -0.1570 X_2^C, -0.1570 X_2^L) \\ & \oplus (-0.1812 X_3^U, -0.1812 X_3^C, -0.1812 X_3^L) \oplus (0.0655 X_4^L, 0.0611 X_4^C, 0.8522 X_4^U) \end{aligned} \tag{35}$$
The performance comparison of CH-LSE, KT, and CN is shown in Table 8. Note that in this example the parameters of $X_2$ and $X_3$ are estimated to be negative. Although Chen and Hsueh [18] established a model with crisp parameters to reduce the unnecessary spread of the predicted fuzzy response variable, the parameters of $X_2$ and $X_3$ are positive in their model. To check the signs of the parameters, traditional statistical regression was applied to the central values of all fuzzy observations, giving parameter values of 0.85, -0.17, -0.13, and 0.076, respectively. Thus, Chen and Hsueh's model [18] may mislead a decision-maker about the influence of the explanatory variables, whereas the KT model and the proposed model do not have this disadvantage. The objective of Kelkinnama and Taheri's approach [36] is similar to that of the proposed method: they adopted the weakest t-norm to reduce the spread of the predicted response variable. However, their model requires the signs of the TFN parameters to be predetermined, and its formulation requires a series of comparison processes, which makes implementation difficult. In contrast, the proposed method, based on the FPC operator, provides a set of simple and generalized rules for establishing fuzzy regression models with fuzzy parameters. As shown in Table 8, the model obtained using the proposed FPC operator has the lowest distance and the highest similarity among the three models. Even though the proposed method focuses on reducing unnecessary information in the model-building process for fuzzy explanatory variables and fuzzy parameters, it is also effective for crisp explanatory variables, negative parameters, or both.

Table 7. Dataset for example 4

| Obs. | X1 | X2 | X3 | X4 | Y | $\hat{Y}$ |
|---|---|---|---|---|---|---|
| 1 | (42, 50, 58) | (92, 98, 100) | (62, 71, 82) | (59, 70, 83) | (19, 30, 39) | (21.90, 33.02, 42.55) |
| 2 | (21, 29, 37) | (70, 76, 78) | (52, 61, 72) | (35, 46, 59) | (7, 20, 30) | (7.30, 18.88, 28.61) |
| 3 | (33, 41, 49) | (82, 88, 90) | (64, 73, 84) | (47, 58, 71) | (14, 25, 37) | (14.48, 25.80, 35.43) |
| 4 | (51, 60, 67) | (53, 62, 72) | (70, 79, 85) | (58, 66, 85) | (33, 45, 55) | (33.53, 45.52, 55.00) |
| 5 | (40, 49, 56) | (41, 50, 60) | (66, 75, 81) | (46, 54, 63) | (26, 38, 46) | (25.77, 38.00, 47.04) |
| 6 | (50, 59, 66) | (51, 60, 70) | (76, 85, 91) | (56, 64, 73) | (32, 43, 52) | (31.76, 43.77, 52.72) |
| 7 | (52, 61, 72) | (69, 77, 83) | (80, 85, 93) | (11, 18, 31) | (23, 40, 51) | (28.15, 40.00, 51.99) |
| 8 | (49, 58, 69) | (67, 75, 81) | (77, 82, 90) | (9, 16, 29) | (27, 38, 50) | (26.26, 38.17, 50.19) |
| 9 | (46, 55, 66) | (64, 72, 78) | (74, 79, 87) | (6, 13, 26) | (25, 37, 49) | (24.47, 36.44, 48.48) |
| 10 | (58, 66, 73) | (42, 59, 70) | (31, 39, 48) | (69, 83, 94) | (49, 60, 72) | (47.37, 59.40, 69.40) |
| 11 | (61, 69, 76) | (46, 63, 74) | (41, 49, 58) | (73, 87, 98) | (49, 59, 68) | (47.80, 59.77, 69.73) |
| 12 | (51, 59, 66) | (36, 53, 64) | (31, 39, 48) | (63, 77, 88) | (43, 54, 62) | (41.82, 54.00, 64.05) |
| 13 | (70, 74, 80) | (78, 89, 94) | (58, 70, 83) | (68, 82, 92) | (47, 61, 64) | (47.65, 55.85, 64.71) |
| 14 | (37, 41, 47) | (46, 57, 62) | (46, 58, 71) | (36, 50, 60) | (24, 34, 42) | (24.00, 32.90, 42.04) |
| 15 | (45, 49, 55) | (54, 65, 70) | (54, 66, 79) | (44, 58, 68) | (29, 38, 47) | (28.79, 37.52, 46.59) |
| 16 | (68, 76, 83) | (65, 75, 83) | (29, 37, 48) | (70, 75, 85) | (48, 64, 73) | (54.10, 65.31, 74.18) |
| 17 | (49, 57, 64) | (46, 56, 64) | (10, 18, 29) | (51, 56, 66) | (43, 56, 63) | (42.73, 54.34, 63.38) |
| 18 | (64, 72, 79) | (61, 71, 79) | (25, 33, 44) | (66, 71, 81) | (52, 63, 72) | (51.71, 63.00, 71.90) |
| 19 | (71, 78, 86) | (59, 65, 71) | (71, 82, 93) | (56, 64, 76) | (50, 66, 71) | (49.53, 59.76, 69.58) |
| 20 | (51, 58, 66) | (39, 45, 51) | (51, 62, 73) | (36, 44, 56) | (37, 49, 58) | (37.56, 48.22, 58.21) |
| 21 | (65, 72, 80) | (53, 59, 65) | (65, 76, 87) | (50, 58, 70) | (45, 55, 67) | (45.94, 56.30, 66.17) |
| 22 | (82, 90, 95) | (82, 95, 98) | (69, 80, 88) | (65, 72, 85) | (56, 67, 81) | (56.37, 66.16, 74.49) |
| 23 | (60, 68, 73) | (60, 73, 76) | (47, 58, 66) | (43, 50, 63) | (43, 53, 62) | (43.20, 53.46, 61.98) |
| 24 | (63, 71, 76) | (63, 76, 79) | (50, 61, 69) | (46, 53, 66) | (45, 54, 64) | (45.00, 55.19, 63.69) |
| 25 | (84, 92, 98) | (70, 76, 85) | (68, 78, 84) | (18, 27, 42) | (57, 70, 77) | (57.80, 68.46, 76.77) |
| 26 | (86, 94, 100) | (72, 78, 87) | (70, 80, 86) | (20, 29, 44) | (59, 68, 78) | (59.00, 69.61, 77.91) |
| 27 | (79, 87, 93) | (65, 71, 80) | (63, 73, 79) | (13, 22, 37) | (55, 65, 74) | (54.81, 65.57, 73.93) |
| 28 | (88, 94, 99) | (42, 51, 59) | (21, 30, 41) | (20, 29, 45) | (70, 75, 89) | (73.29, 82.91, 90.70) |
| 29 | (89, 95, 100) | (43, 52, 60) | (22, 31, 42) | (21, 30, 46) | (74, 84, 91) | (73.89, 83.49, 91.26) |
| 30 | (80, 86, 91) | (34, 43, 51) | (13, 22, 33) | (12, 21, 37) | (68, 80, 86) | (68.50, 78.29, 86.15) |
Table 8. Comparison of performance of three approaches for example 4

| Criterion | CH-LSE | KT | CN |
|---|---|---|---|
| Mean similarity | 26.0452 | 26.1807 | 27.0338 |
| Distance | 50.25 | 52.9315 | 36.6143 |
| Credibility | 1.2658 | 1.4380 | 1.3092 |
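Example 4 mixes both signs: $X_1$ and $X_4$ have nonnegative parameters (R1), while $X_2$ and $X_3$ have nonpositive ones (R3). The sketch below recomputes the lower and center ends of $\hat{Y}$ for the first three observations of Table 7 from the coefficients printed in Eq. (35); these two ends need only $\hat{b}_j^U$ and $\hat{b}_j^C$ of each parameter. It is a hypothetical check, not the authors' code, and it assumes each parameter is entirely nonnegative or entirely nonpositive so that the sign of $\hat{b}_j^U$ identifies the rule.

```python
from typing import List, Tuple

TFN = Tuple[float, float, float]

INTERCEPT_L, INTERCEPT_C = 12.0053, 14.2755
# (b^C, b^U) for X1..X4 read from Eq. (35); X2 and X3 have negative crisp parameters
PARAMS: List[Tuple[float, float]] = [
    (0.8544, 0.8713),
    (-0.1570, -0.1570),
    (-0.1812, -0.1812),
    (0.0611, 0.0655),
]

# First three observations of Table 7: (X1, X2, X3, X4)
OBS: List[Tuple[TFN, TFN, TFN, TFN]] = [
    ((42, 50, 58), (92, 98, 100), (62, 71, 82), (59, 70, 83)),
    ((21, 29, 37), (70, 76, 78), (52, 61, 72), (35, 46, 59)),
    ((33, 41, 49), (82, 88, 90), (64, 73, 84), (47, 58, 71)),
]


def lower_and_center(xs: Tuple[TFN, ...]) -> Tuple[float, float]:
    lower, center = INTERCEPT_L, INTERCEPT_C
    for (bC, bU), x in zip(PARAMS, xs):
        if bU >= 0:      # R1 (B >= 0): lower end is X^L * b^U
            lower += x[0] * bU
        else:            # R3 (B <= 0): lower end is X^U * b^U
            lower += x[2] * bU
        center += x[1] * bC  # both rules pair X^C with b^C
    return lower, center


for xs in OBS:
    lo, c = lower_and_center(xs)
    print(round(lo, 2), round(c, 2))
```

The recomputed ends agree with the tabulated $\hat{Y}$ to within about 0.01, and the branch on the sign of each parameter shows why a negative parameter draws its lower end from $X^U$ rather than $X^L$.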
6. Conclusion

Fuzzy regression models are usually formulated to characterize the relation between response and explanatory variables in fuzzy environments. Several critical problems must be dealt with when developing such models. One is that, owing to the nature of fuzzy environments, the uncertainty arising from the model and its parameters should be considered in the model. For this purpose, the parameters of the formulated fuzzy regression models can be represented as fuzzy numbers. However, when fuzzy regression models are formulated from observations with fuzzy response and explanatory variables, existing approaches usually perform poorly if the parameters are fuzzy, because the resulting wider spread unnecessarily increases fuzziness. To enhance explanatory power, some approaches adopt crisp parameters to reduce the resulting spread; however, the uncertainty of the model and parameters is then ignored. Another problem with existing approaches is the determination of the signs of the parameters. The signs of the parameters, and even of the fuzzy explanatory variables, are usually assumed to be positive or known in advance, which is usually not the case in practical applications; an incorrect sign setting in the formulation process degrades the performance of the formulated model.

This study proposed the FPC operator for formulating fuzzy linear regression models with fuzzy parameters from observations with fuzzy response and explanatory variables. The FPC operator enters the mathematical programming problem through its constraints, and the proposed formulation has several advantages. LAD is used as the objective function to measure the actual distance between the observed and estimated responses. The FPC operator reduces the unnecessary information that results from fuzzy observations in the formulation process, improving model performance in terms of the mean similarity, distance, and credibility measures. More importantly, the uncertainty of the model and parameters is retained. In addition, the signs of the parameters are determined in the formulation process without prior assumptions, so that the formulated models accurately reflect the influence of the fuzzy parameters on the estimated fuzzy response. Comparisons with existing approaches show that the proposed approach performs well in terms of the three above-mentioned measures, even when crisp explanatory variables are used. The proposed approach requires certain conditions, such as P1, to be met; to deal with this, constraints were added to the mathematical programming model. In future research, a more generalized operator could be developed to deal with more common cases.
Acknowledgement
This work was funded in part by Contract MOST 107-2410-H-006-041-MY2 from the Ministry of Science and Technology, Republic of China.
References
[1] H.C. Wu, Linear regression analysis for fuzzy input and output data using the extension principle, Computers & Mathematics with Applications, 45 (2003) 1849-1859.
[2] H. Tanaka, S. Uejima, K. Asai, Linear-Regression Analysis with Fuzzy Model, IEEE Transactions on Systems, Man, and Cybernetics, 12 (1982) 903-907.
[3] H. Tanaka, Fuzzy Data-Analysis by Possibilistic Linear-Models, Fuzzy Sets and Systems, 24 (1987) 363-375.
[4] H. Tanaka, J. Watada, Possibilistic Linear-Systems and Their Application to the Linear-Regression Model, Fuzzy Sets and Systems, 27 (1988) 275-289.
[5] M. Sakawa, H. Yano, Multiobjective Fuzzy Linear-Regression Analysis for Fuzzy Input Output Data, Fuzzy Sets and Systems, 47 (1992) 173-181.
[6] M. Hojati, C.R. Bector, K. Smimou, A simple method for computation of fuzzy linear regression, European Journal of Operational Research, 166 (2005) 172-184.
[7] D.T. Redden, W.H. Woodall, Properties of Certain Fuzzy Linear-Regression Methods, Fuzzy Sets and Systems, 64 (1994) 361-375.
[8] C. Kao, C.L. Chyu, A fuzzy linear regression model with better explanatory power, Fuzzy Sets and Systems, 126 (2002) 401-409.
[9] B.J. Kim, R.R. Bishu, Evaluation of fuzzy linear regression models by comparing membership functions, Fuzzy Sets and Systems, 100 (1998) 343-352.
[10] F.N. Chen, Y.Z. Chen, J. Zhou, Y.Y. Liu, Optimizing h value for fuzzy linear regression with asymmetric triangular fuzzy coefficients, Engineering Applications of Artificial Intelligence, 47 (2016) 16-24.
[11] H. Hoglund, Fuzzy linear regression-based detection of earnings management, Expert Systems with Applications, 40 (2013) 6166-6172.
[12] T. Hong, P. Wang, Fuzzy interaction regression for short term load forecasting, Fuzzy Optimization and Decision Making, 13 (2013) 91-103.
[13] H.M. Jiang, C.K. Kwong, W.H. Ip, Z.Q. Chen, Chaos-Based Fuzzy Regression Approach to Modeling Customer Satisfaction for Product Design, IEEE Transactions on Fuzzy Systems, 21 (2013) 926-936.
[14] S. Muzzioli, A. Ruggieri, B. De Baets, A comparison of fuzzy regression methods for the estimation of the implied volatility smile function, Fuzzy Sets and Systems, 266 (2015) 131-143.
[15] K.Y. Chan, U. Engelke, Varying Spread Fuzzy Regression for Affective Quality Estimation, IEEE Transactions on Fuzzy Systems, 25 (2017) 594-613.
[16] A. Celmins, Least-Squares Model-Fitting to Fuzzy Vector Data, Fuzzy Sets and Systems, 22 (1987) 245-269.
[17] H.K. Kim, J.H. Yoon, Y. Li, Asymptotic properties of least squares estimation with fuzzy observations, Information Sciences, 178 (2008) 439-451.
[18] L.H. Chen, C.C. Hsueh, Fuzzy Regression Models Using the Least-Squares Method Based on the Concept of Distance, IEEE Transactions on Fuzzy Systems, 17 (2009) 1259-1272.
[19] F. Torkian, M. Arefi, M.G. Akbari, Multivariate Least Squares Regression using Interval-Valued Fuzzy Data and based on Extended Yao-Wu Signed Distance, International Journal of Computational Intelligence Systems, 7 (2014) 172-185.
[20] R. Coppi, P. D'Urso, P. Giordani, A. Santoro, Least squares estimation of a linear regression model with LR fuzzy response, Computational Statistics & Data Analysis, 51 (2006) 267-286.
[21] P. D'Urso, A. Santoro, Goodness of fit and variable selection in the fuzzy multiple linear regression, Fuzzy Sets and Systems, 157 (2006) 2627-2647.
[22] E.N. Nasibov, Fuzzy least squares regression model based of weighted distance between fuzzy numbers, Automatic Control and Computer Sciences, 41 (2007) 10-17.
[23] M.B. Ferraro, R. Coppi, G.G. Rodriguez, A. Colubi, A linear regression model for imprecise response, International Journal of Approximate Reasoning, 51 (2010) 759-770.
[24] K.Y. Chan, H.K. Lam, T.S. Dillon, S.H. Ling, A Stepwise-Based Fuzzy Regression Procedure for Developing Customer Preference Models in New Product Development, IEEE Transactions on Fuzzy Systems, 23 (2015) 1728-1745.
[25] A.R. Arabpour, M. Tata, Estimating the parameters of a fuzzy linear regression model, Iranian Journal of Fuzzy Systems, 5 (2008) 1-19.
[26] Y.Y. Hsu, H.K. Liu, B.L. Wu, On the optimization methods for fully fuzzy regression models, International Journal of Intelligent Technologies and Applied Statistics, 3 (2010) 45-55.
[27] P.T. Chang, E.S. Lee, Fuzzy Least Absolute Deviations Regression and the Conflicting Trends in Fuzzy Parameters, Computers & Mathematics with Applications, 28 (1994) 89-101.
[28] W. Stahel, S. Weisberg, Directions in robust statistics and diagnostics, Springer Science and Business Media, 2012.
[29] J.H. Li, W.Y. Zeng, J.J. Xie, Q. Yin, A new fuzzy regression model based on least absolute deviation, Engineering Applications of Artificial Intelligence, 52 (2016) 54-64.
[30] J. Chachi, S.M. Taheri, A least-absolutes approach to multiple fuzzy regression, in: Bulletin of the ISI 58th World Statistics Congress of the International Statistical Institute, Dublin, Ireland, 2011, pp. 1-6.
[31] J. Chachi, S.M. Taheri, H.R. Pazhand, M. Geotechnical, An interval-based approach to fuzzy regression for fuzzy input-output data, in: 2011 IEEE International Conference on Fuzzy Systems (FUZZ), IEEE, Taipei, Taiwan, 2011, pp. 2859-2863.
[32] S.M. Taheri, M. Kelkinnama, Fuzzy Linear Regression Based on Least Absolutes Deviations, Iranian Journal of Fuzzy Systems, 9 (2012) 121-140.
[33] W.Y. Zeng, Q.L. Feng, J.H. Li, Fuzzy least absolute linear regression, Applied Soft Computing, 52 (2017) 1009-1019.
[34] L.H. Chen, W.C. Ko, F.T. Yeh, Approach based on fuzzy goal programing and quality function deployment for new product planning, European Journal of Operational Research, 259 (2017) 654-663.
[35] H. Torabi, J. Behboodian, Fuzzy least-absolutes estimates in linear models, Communications in Statistics - Theory and Methods, 36 (2007) 1935-1944.
[36] M. Kelkinnama, S.M. Taheri, Fuzzy least-absolutes regression using shape preserving operations, Information Sciences, 214 (2012) 105-120.
[37] J. Neter, M.H. Kutner, C.J. Nachtsheim, W. Wasserman, Applied linear statistical models, 4th ed., Irwin, Chicago, 1996.
[38] C. Kao, C.L. Chyu, Least-squares estimates in fuzzy regression analysis, European Journal of Operational Research, 148 (2003) 426-435.
[39] D.H. Hong, J.K. Song, H.Y. Do, Fuzzy least-squares linear regression analysis using shape preserving operations, Information Sciences, 138 (2001) 185-193.
[40] D. Dubois, H. Prade, Fuzzy sets and systems: theory and applications, Academic Press, New York, 1980.
[41] G.J. Klir, B. Yuan, Fuzzy sets and systems: theory and applications, Prentice Hall, New Jersey, 1995.
[42] C.P. Pappis, N.I. Karacapilidis, A Comparative-Assessment of Measures of Similarity of Fuzzy Values, Fuzzy Sets and Systems, 56 (1993) 171-174.
[43] L.H. Chen, C.C. Hsueh, A mathematical programming method for formulating a fuzzy regression model based on distance criterion, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 37 (2007) 705-712.
[44] X.L. Liu, Y.Z. Chen, A Systematic Approach to Optimizing h Value for Fuzzy Linear Regression with Symmetric Triangular Fuzzy Numbers, Mathematical Problems in Engineering, 2013 (2013) 1-9.
[45] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, D. Sorensen, LINGO The Modeling Language and Optimizer, LINDO Systems, Chicago, Illinois, 2017.
[46] P. Diamond, Fuzzy Least-Squares, Information Sciences, 46 (1988) 141-157.
[47] D.H. Hong, H.Y. Do, Fuzzy system reliability analysis by the use of Tω (the weakest t-norm) on fuzzy number arithmetic operations, Fuzzy Sets and Systems, 90 (1997) 307-316.