A new approach to formulate fuzzy regression models

Journal Pre-proof

Liang-Hsuan Chen, Sheng-Hsing Nien

PII: S1568-4946(19)30696-9
DOI: https://doi.org/10.1016/j.asoc.2019.105915
Reference: ASOC 105915

To appear in: Applied Soft Computing Journal

Received date: 13 March 2019
Revised date: 30 October 2019
Accepted date: 1 November 2019

Please cite this article as: L.-H. Chen and S.-H. Nien, A new approach to formulate fuzzy regression models, Applied Soft Computing Journal (2019), doi: https://doi.org/10.1016/j.asoc.2019.105915.

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 Published by Elsevier B.V.

Declaration of Interest Statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


A New Approach to Formulate Fuzzy Regression Models

Liang-Hsuan Chen* and Sheng-Hsing Nien

*Corresponding e-mail: [email protected]

Department of Industrial and Information Management,
National Cheng Kung University, Tainan 701, Taiwan, R.O.C.

Highlights

1. A new approach is developed to formulate fuzzy regression models.
2. A new operator, the fuzzy product core (FPC), is proposed for the formulations.
3. The approach reduces unnecessary information to increase model performance.
4. Comparisons show better performance than existing approaches.


Abstract

A fuzzy regression model is developed to construct the relationship between the response and explanatory variables in fuzzy environments. To enhance explanatory power and take into account the uncertainty of the formulated model and parameters, a new operator, called the fuzzy product core (FPC), is proposed for the formulation processes to establish fuzzy regression models with fuzzy parameters using fuzzy observations that include fuzzy response and explanatory variables. In addition, the sign of parameters can be determined in the model-building processes. Compared to existing approaches, the proposed approach reduces the amount of unnecessary or unimportant information arising from fuzzy observations and determines the sign of parameters in the models to increase model performance. This remedies a weakness of related approaches, in which the signs of the fuzzy parameters must be predetermined in the formulation processes. The proposed approach outperforms existing models in terms of distance, mean similarity, and credibility measures, even when crisp explanatory variables are used.

Keywords: Fuzzy regression model, Fuzzy product core, Mathematical programming

1. Introduction

Statistical regression analysis is a well-known method for formulating the relationship between the response (output) variable and some explanatory (input) variables using a set of observations based on the assumption of normal distributions. In general, the least-squares method is used to determine the unbiased parameters in model-fitting processes. The uncertainty of the established model results from the randomness inherent in the observations. In the real world, observations might not be measured as quantitative values, but in linguistic (fuzzy) terms, such as “about 10 inches”, “approximately equal to 50 pounds”, “low”, “high”, and so on. Membership functions are defined to characterize such linguistic terms [1]. To deal with this kind of data, linear regression analysis with fuzzy models [2] was developed to perform the functions of statistical regression analysis. Fuzzy regression is different from statistical regression in that it does not follow a probability distribution. The deviations (estimation errors) are attributed to the imprecision of the observed values, the indefiniteness of the model structure, or both. The uncertainty of the model is due to the fuzziness of observations, which introduces noise into the fuzzy parameters of the model. Therefore, an important step in model-fitting processes is the adjustment of fuzzy parameters to fit the model using the available samples.

This study focuses on the development of fuzzy regression models. In general, two types of approach are adopted to formulate optimal fuzzy regression models that minimize the estimation errors between the observed and estimated fuzzy responses. The two types minimize the total difference in membership degrees and the total distance between membership functions, respectively. The first type of approach was first proposed by Tanaka et al. [2], who built a fuzzy regression model that can explain all observations with a fuzzy response and several crisp explanatory variables under a subjective confidence level h. The parameters in their model are fuzzy and thus the uncertainty of the model can be described. Tanaka [3], Tanaka and Watada [4], Sakawa and Yano [5] and Hojati et al. [6] later improved this approach. However, these improved versions are sensitive to outliers [7], may produce infinite solutions, and unnecessarily widen the spread of the estimated fuzzy response when more data are included in the model [8]. Kao and Chyu [8] proposed a fuzzy regression model with crisp coefficients that uses a two-stage approach to avoid widening the spread. In their formulation processes, a mathematical programming model was built to minimize the criterion proposed by Kim and Bishu [9], namely the non-overlapping area between the membership functions of the estimated and observed fuzzy responses. Unlike Tanaka et al.’s approach, which requires the confidence level h to be set subjectively, some studies [10] have formulated fuzzy regression models by determining the optimal confidence level h. With this type of approach, many studies [11-15] have shown the advantages of their respective fuzzy regression analyses. However, the criterion used to measure the model performance in the formulation processes may not reflect the actual difference between the observed and estimated fuzzy responses [16] when the membership functions do not overlap. As such, the formulated fuzzy regression model may not have optimal performance for prediction purposes.

The second type of approach, proposed by Celmiņš [16], for constructing fuzzy regression models is based on the least-squares method from statistical regression analysis. The objective of this approach is to minimize the sum of squared errors in terms of the distance between the observed and estimated responses. A number of studies have applied this approach for fuzzy observations containing fuzzy response and explanatory variables to establish fuzzy regression models with crisp parameters [17-19]. Some fuzzy regression models with fuzzy parameters have been formulated using fuzzy response and crisp explanatory variables [20-24] or fuzzy response and explanatory variables [25, 26]. These models share the problems described for the first type of approach. Chang [27] adopted the concept of least absolute deviations (LADs) for formulating models. Although the LAD estimator is more robust than the least-squares deviation estimator [28], the fuzzy regression models based on LAD produce a wider spread of the estimated responses when the explanatory and response variables and parameters are fuzzy. To avoid this, many studies have developed approaches for constructing fuzzy regression models using crisp explanatory variables [29, 30] or crisp parameters [31, 32], or constructing quadratic models [33, 34]. Few studies have constructed fuzzy regression models based on LAD with fuzzy explanatory and response variables and fuzzy parameters [35, 36].

Since probability assumptions do not hold in fuzzy regression analyses [37], it cannot be proven that the parameters are unbiased, and thus fuzzy regression models should retain the uncertainty of explanatory and response variables and parameters. However, many proposed models with crisp parameters may ignore the uncertainty [18, 33, 38], since the multiplication of fuzzy numbers may produce a wider spread, degrading model performance. Acknowledging this problem, Hong et al. [39] proposed shape-preserving operations based on the weakest T-norm for formulating fuzzy regression models with the least-squares error. Using shape-preserving operations, Kelkinnama and Taheri [36] established fuzzy least-absolutes regression models and achieved better performance than previous studies. However, the sign of the parameters in the model must be predetermined in their approach, which restricts its general application.

The determination of the sign of parameters in the model-building processes can influence model performance because the fuzzy multiplication operation for two fuzzy numbers with the same sign is different from that for those with opposite signs [40]. Existing approaches have this problem. For example, some approaches [35, 36] with fuzzy explanatory variables, response variables, and parameters are derived based on the assumption that the signs of the fuzzy explanatory variables and fuzzy parameters are either positive or known. However, this may not be the case in practical applications. Sometimes, negative fuzzy parameters are obtained in formulations, or negative fuzzy explanatory variables are used in applications; however, they are usually set as positive in the development of approaches. This may affect the performance of the established models and even produce ineffective results.

Acknowledging the problems described above, the present study proposes an approach for formulating fuzzy regression models with fuzzy parameters considering observations with fuzzy explanatory and response variables. In the formulation processes, a computationally simple and generalized operator, called the fuzzy product core (FPC), is proposed. With FPC, a mathematical programming problem is built using LAD as the objective to determine the optimal fuzzy parameters with the corresponding sign. With the established fuzzy regression models, the estimated responses do not have unnecessary spread in their membership functions, increasing model performance.

The rest of this paper is organized as follows. In the following section, the concept of the FPC operator is described and formulated based on fuzzy multiplication. Some properties of FPC are also provided. Section 3 describes the performance criteria, namely the mean similarity measure, distance measure, and credibility measure, used for comparisons. The proposed new operator is used in Section 4 to build the mathematical programming model that deals with observations with fuzzy explanatory and response variables based on the criterion of LAD to construct a fuzzy regression model with fuzzy parameters. In Section 5, some examples are used for comparison with some existing approaches to illustrate the advantages of the proposed approach. Conclusions are given in Section 6.

2. FPC operator

Considering the uncertainty of a fuzzy regression model, the parameters of the model are fuzzy. To overcome the possible weakness of existing approaches, a new operator, called FPC, is introduced and derived in this section. Let $X = \{x_1, x_2, \ldots, x_n\}$ denote the universal set, and let $A$ and $\mu_A(x)$ be a fuzzy set and its membership function on $X$, respectively. An $\alpha$-cut of $A$ is defined as $A_\alpha = \{x \mid \mu_A(x) \ge \alpha\}$, $\alpha \in [0,1]$, representing a crisp interval that can be denoted by $A_\alpha = [A_\alpha^L, A_\alpha^U]$. In particular, when $\alpha = 0$, $A_0 = [A_0^L, A_0^U]$ denotes the support of $A$. To develop the new operator, the definitions of fuzzy numbers, triangular fuzzy numbers, and fuzzy arithmetic are briefly introduced as follows. Detailed information can be found elsewhere [40-42].

Definition 1. Suppose that $A$ is a fuzzy set on the real line; then, $A$ is called a fuzzy number if the properties of normality, convexity, and boundary are satisfied.

Definition 2. A triangular fuzzy number (TFN) $A$ can be represented by a triple as $A = (a^L, a^C, a^U)$. The corresponding membership function $\mu_A(x)$ is linearly increasing and decreasing in $[a^L, a^C]$ and $[a^C, a^U]$, respectively, where $\mu_A(a^L) = \mu_A(a^U) = 0$ and $\mu_A(a^C) = 1$. In addition, if $a^L > 0$ ($a^U < 0$), it means that $A > 0$ ($A < 0$).

Definition 3. Two algebraic operations, i.e., addition ($\oplus$) and multiplication ($\otimes$), of two TFNs $A = (a^L, a^C, a^U)$ and $B = (b^L, b^C, b^U)$ can be expressed as:

$$A \oplus B = (a^L + b^L,\; a^C + b^C,\; a^U + b^U) \qquad (1)$$

$$A \otimes B = \begin{cases} (a^L b^L,\; a^C b^C,\; a^U b^U), & \text{if } A, B > 0 \\ (a^U b^L,\; a^C b^C,\; a^L b^U), & \text{if } A > 0,\ B < 0 \\ (a^U b^U,\; a^C b^C,\; a^L b^L), & \text{if } A, B < 0 \end{cases} \qquad (2)$$

Based on Eq. (2), for any two TFNs, $A$ and $B$, the boundary of $A \otimes B$ can be determined by their extreme values, i.e., $a^L b^L$, $a^L b^U$, $a^U b^L$, or $a^U b^U$. The central value is determined by $a^C b^C$.
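The two operations of Definition 3 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the tuple representation `(lo, mid, hi)` and the function names `tfn_add` and `tfn_mul` are assumptions introduced here. The product takes the smallest and largest of the four end-point products as its bounds, which agrees with each sign case of Eq. (2).

```python
def tfn_add(a, b):
    """Eq. (1): component-wise addition of two TFNs given as (lo, mid, hi)."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def tfn_mul(a, b):
    """Approximate TFN product in the sense of Eq. (2).

    The bounds are the extreme values among the four end-point products,
    and the central value is the product of the two central values.
    """
    al, ac, au = a
    bl, bc, bu = b
    ends = [al * bl, al * bu, au * bl, au * bu]
    return (min(ends), ac * bc, max(ends))


# Hypothetical example values (not taken from the paper).
A = (1, 2, 3)
B = (4, 5, 6)
print(tfn_add(A, B))  # (5, 7, 9)
print(tfn_mul(A, B))  # (4, 10, 18)
```

Using the minimum and maximum avoids branching on the signs of the operands, at the cost of four multiplications per product.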

Lemma 1: Define a zero TFN as $0 = (0, 0, 0)$. Based on the definition of the arithmetic operators of TFNs, the multiplication of any two TFNs with different signs, such as $A \ge 0$ and $B \le 0$, will result in zero if and only if $A$ or $B$ is a zero TFN. The proof is straightforward.

Fuzzy multiplication differs from the multiplication of crisp values in that the fuzzy product may have a large spread. This spread could lower the value of the generated information due to a possible increase in fuzziness (or uncertainty). Furthermore, the augmented spread produced by multiplying fuzzy numbers can reduce the performance of applications that use fuzzy data. The formulation of a fuzzy regression model is one such application. For fuzzy observations with fuzzy explanatory and response variables, the performance of the established fuzzy regression models with fuzzy parameters may be greatly influenced by the inclusion of unnecessary uncertainty. This has motivated the development of a new operator in this study.

The core of a fuzzy number $A$ is defined as the space where the height of membership is equal to one; it is denoted as $H(A) = \{x \mid \mu_A(x) = 1,\ x \in \mathbb{R}\}$ [41]. If $A$ is a TFN, the core is equal to $a^C$, i.e., $H(A) = a^C$. The core of the algebraic product of two TFNs can be easily obtained from Eq. (2) as $H(A \otimes B) = a^C b^C$, which is a crisp value. Since $H(A \otimes B)$ produces the most likely value without any uncertainty, it is unsuitable for serving as an operator to deal with the problem mentioned above.

Suppose that $A$ and $B$ are two positive TFNs, i.e., $A, B > 0$. The fuzzy product can be represented as the composition via multiplication of all possible values of the $\alpha$-cuts of $A$ and $B$. The $\alpha$-cuts of $A \otimes B$ can be expressed as $(A \otimes B)_\alpha = [A_\alpha^L, A_\alpha^U] \otimes [B_\alpha^L, B_\alpha^U] = [A_\alpha^L B_\alpha^L, A_\alpha^U B_\alpha^U]$. This equation can be decomposed into the union of all $x \otimes [B_\alpha^L, B_\alpha^U]$, where $x \in [A_\alpha^L, A_\alpha^U]$, as illustrated in Figure 1. For example, let $\alpha = 0$, $A_0 = [A_0^L, A_0^U]$, and $B_0 = [B_0^L, B_0^U]$. If $x = x^*$, an interval can be obtained using $x^* \otimes [B_0^L, B_0^U]$, and the interval of $(A \otimes B)_0$ is the union of $x \otimes [B_0^L, B_0^U]$, where $x \in [A_0^L, A_0^U]$, i.e., $(A \otimes B)_0 = \bigcup_{x \in [A_0^L, A_0^U]} x \otimes [B_0^L, B_0^U]$. As such, applying decomposition theorems [41], the multiplication of two TFNs can be expressed as:

$$A \otimes B = \bigcup_{\alpha \in [0,1]} \alpha \cdot \left( [A_\alpha^L, A_\alpha^U] \otimes [B_\alpha^L, B_\alpha^U] \right) = \bigcup_{\alpha \in [0,1]} \alpha \cdot \Big( \bigcup_{x \in [A_\alpha^L, A_\alpha^U]} x \otimes [B_\alpha^L, B_\alpha^U] \Big) \qquad (3)$$

where $\alpha$ denotes the membership degree of $A \otimes B$, and $\bigcup_{x \in [A_\alpha^L, A_\alpha^U]} x \otimes [B_\alpha^L, B_\alpha^U]$ represents the interval of the $\alpha$-cuts.
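The $\alpha$-cut view of the product of two positive TFNs can be checked numerically. The sketch below is illustrative only; the helper names `alpha_cut` and `product_alpha_cut` and the example values are assumptions made here. It uses the linear membership function of Definition 2 to obtain each $\alpha$-cut and then multiplies the interval end points as in $(A \otimes B)_\alpha = [A_\alpha^L B_\alpha^L, A_\alpha^U B_\alpha^U]$.

```python
def alpha_cut(tfn, alpha):
    """Return the alpha-cut [lower, upper] of a TFN given as (lo, mid, hi)."""
    lo, mid, hi = tfn
    return (lo + alpha * (mid - lo), hi - alpha * (hi - mid))


def product_alpha_cut(a, b, alpha):
    """alpha-cut of the product of two positive TFNs.

    Valid only when both TFNs are positive, so that the smallest product
    comes from the lower bounds and the largest from the upper bounds.
    """
    al, au = alpha_cut(a, alpha)
    bl, bu = alpha_cut(b, alpha)
    return (al * bl, au * bu)


# Hypothetical positive TFNs (not taken from the paper).
A = (4, 5, 6)
B = (2, 3, 5)
for alpha in (0.0, 0.5, 1.0):
    print(alpha, product_alpha_cut(A, B, alpha))
```

At $\alpha = 0$ the cut is the support $[8, 30]$ and at $\alpha = 1$ it collapses to the core $15$; the intermediate cuts are not exactly linear in $\alpha$, which is why the text speaks of an approximate TFN.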

However, as described above, $A \otimes B$ in Eq. (2) may have a large spread that significantly increases the coverage of the domain variable $x$ at lower membership degrees, resulting in unnecessary fuzziness (uncertainty). To deal with this problem, this study proposes the FPC operator, which preserves the core information in the multiplication of two TFNs and thus avoids unnecessary fuzziness. For this purpose, the union operator used to determine the interval of $(A \otimes B)_\alpha$ is replaced by the intersection operator, i.e., $\bigcap_{x \in [A_\alpha^L, A_\alpha^U]} x \otimes [B_\alpha^L, B_\alpha^U]$. Based on fuzzy logic theories, we adopt the operator “AND”, instead of “OR”, to acquire the common parts from various confidence levels, i.e., different $\alpha$ levels, which represent the core information. The proposed operator can be formulated as:

$$FPC^*(A \otimes B) = \bigcup_{\alpha \in [0,1]} \alpha \cdot \Big( \bigcap_{x \in [A_\alpha^L, A_\alpha^U]} x \otimes [B_\alpha^L, B_\alpha^U] \Big) \qquad (4)$$

Figure 1. Decomposition of $A \otimes B$.

Figure 2 illustrates the concept of the new operator, considering the core information. The product $FPC^*(A \otimes B)$ has a smaller spread than the product $A \otimes B$, since the former operation does not consider unnecessary information. Examining Eq. (4), $FPC^*(A \otimes B)$ is computed by multiplying various values of $A$ and a fixed $B$ with respect to each $\alpha$ ($\alpha \in [0, 1]$); the product approximates the TFN $(a^U b^L, a^C b^C, a^L b^U)$. Similarly, $FPC^*(B \otimes A)$ is computed by multiplying various values of $B$ by a fixed $A$ for each $\alpha$. Based on the commutativity of multiplication, $FPC^*(A \otimes B)$ should be equal to $FPC^*(B \otimes A)$. However, either $FPC^*(A \otimes B)$ or $FPC^*(B \otimes A)$ may be null. Figure 3 illustrates a possible condition for $FPC^*(B \otimes A)$ that leads to a null result due to $[a^L b^L, a^U b^L] \cap [a^L b^U, a^U b^U] = \emptyset$. In order to satisfy the commutativity requirement, i.e., $FPC^*(A \otimes B) = FPC^*(B \otimes A)$, we define the formulation of FPC as:

$$FPC(A \otimes B) = \Big[ \bigcup_{\alpha \in [0,1]} \alpha \cdot \Big( \bigcap_{x \in [A_\alpha^L, A_\alpha^U]} x \otimes [B_\alpha^L, B_\alpha^U] \Big) \Big] \cup \Big[ \bigcup_{\alpha \in [0,1]} \alpha \cdot \Big( \bigcap_{x \in [B_\alpha^L, B_\alpha^U]} x \otimes [A_\alpha^L, A_\alpha^U] \Big) \Big] \qquad (5)$$

Figure 2. Decomposition of $FPC^*(A \otimes B)$.

Figure 3. Decomposition of $FPC^*(B \otimes A)$.
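The intersection used in Eq. (4), and the reason Eq. (5) considers both orders of the operands, can be illustrated at a single $\alpha$ level. The following sketch is an assumption-laden illustration: the function name `core_interval` and the example intervals are introduced here, and it covers positive intervals only. For positive intervals, the intersection over $x \in [A^L, A^U]$ of $x \otimes [B^L, B^U]$ is $[A^U B^L, A^L B^U]$ when that interval is non-empty.

```python
def core_interval(a_cut, b_cut):
    """Intersection over x in a_cut of the intervals x * b_cut.

    Both arguments are positive crisp intervals (lower, upper). Each x
    yields [x * bl, x * bu]; the intersection keeps the largest lower
    bound and the smallest upper bound. Returns None when it is empty.
    """
    al, au = a_cut
    bl, bu = b_cut
    lower = au * bl  # largest of the lower bounds x * bl
    upper = al * bu  # smallest of the upper bounds x * bu
    return (lower, upper) if lower <= upper else None


# Supports (alpha = 0) of two hypothetical positive TFNs A = (4, 5, 6), B = (2, 3, 5).
A0 = (4, 6)
B0 = (2, 5)
print(core_interval(A0, B0))  # (12, 20): non-empty
print(core_interval(B0, A0))  # None: the intersection is empty in this order
```

In this example one order gives a non-empty interval and the other gives a null result, which is the situation the text describes before defining FPC through both orders.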

Based on the above descriptions, the outcome space of multiplication is generated by $x \otimes [B_\alpha^L, B_\alpha^U]$, where $x \in [A_\alpha^L, A_\alpha^U]$, or $x \otimes [A_\alpha^L, A_\alpha^U]$, where $x \in [B_\alpha^L, B_\alpha^U]$, and is bounded by the end points $a^L b^L$, $a^L b^U$, $a^U b^L$, and $a^U b^U$. The multiplication of two TFNs, $A \otimes B$, can be represented by an approximate TFN, which is determined by the lowest and largest values of $a^L b^L$, $a^L b^U$, $a^U b^L$, and $a^U b^U$. This is not done for FPC. Instead, based on Eq. (2), the approximate TFN of FPC is determined as the TFN with the central value $a^C b^C$ and the narrower spreads. Specifically, the proposed FPC, $FPC(A \otimes B)$, can be defined as follows.

Definition 4. Suppose that $A = (a^L, a^C, a^U)$ and $B = (b^L, b^C, b^U)$ are two TFNs, and $o^{(1)}$, $o^{(2)}$, $o^{(3)}$, and $o^{(4)}$ are the values of $a^L b^L$, $a^L b^U$, $a^U b^L$, and $a^U b^U$ arranged in increasing order. If the relation $o^{(2)} \le a^C b^C \le o^{(3)}$ holds, the approximate TFN of the FPC can be represented as:

$$FPC(A \otimes B) = (o^{(2)},\; a^C b^C,\; o^{(3)}) \qquad (6)$$
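Definition 4 translates directly into a short routine. The sketch below is a minimal reading of Eq. (6), not the authors' code: the function name `fpc`, the tuple representation, and the choice to return `None` when the FPC does not exist are assumptions made for illustration.

```python
def fpc(a, b):
    """Approximate TFN of the fuzzy product core, following Definition 4.

    a and b are TFNs given as (lo, mid, hi). The four end-point products
    are sorted; the two middle values bound the result, provided the
    product of the central values lies between them.
    """
    al, ac, au = a
    bl, bc, bu = b
    o = sorted([al * bl, al * bu, au * bl, au * bu])
    centre = ac * bc
    if o[1] <= centre <= o[2]:
        return (o[1], centre, o[2])
    return None  # the FPC does not exist for this pair


# Hypothetical example values (not taken from the paper).
print(fpc((4, 5, 6), (2, 3, 5)))  # (12, 15, 20); the usual product is (8, 15, 30)
print(fpc((2, 3, 4), (4, 5, 8)))  # None: 15 lies outside [16, 16]
```

In the first example the support shrinks from $[8, 30]$ to $[12, 20]$ while the central value stays at 15.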

Based on the above definition and the formulation of FPC, the following properties can be obtained:

P1. If the relation $o^{(2)} \le a^C b^C \le o^{(3)}$ does not hold, then, based on the formulation of FPC in Eq. (5), the resulting approximate TFN of the FPC will not meet the definition of a fuzzy number, since the central value will lie outside the support ($\alpha = 0$). In other words, $FPC(A \otimes B)$ exists if and only if $o^{(2)} \le a^C b^C \le o^{(3)}$.

P2. If $FPC(A \otimes B)$ exists, then $FPC(A \otimes B) = FPC(B \otimes A)$. Based on the formulation of FPC in Eq. (5), this property is straightforward to prove.

P3. Some approximate TFNs of FPC can be easily determined under the corresponding conditions. They are expressed as follows:

(1) For $A, B > 0$ or $A, B < 0$, if $FPC(A \otimes B)$ exists, then at least one of the following will hold:

(a) $FPC(A \otimes B) = (a^U b^L, a^C b^C, a^L b^U)$, iff $\dfrac{b^L}{b^C} \le \dfrac{a^C}{a^U}$ and $\dfrac{b^C}{b^U} \le \dfrac{a^L}{a^C}$ (7)

(b) $FPC(A \otimes B) = (a^L b^U, a^C b^C, a^U b^L)$, iff $\dfrac{a^L}{a^C} \le \dfrac{b^C}{b^U}$ and $\dfrac{a^C}{a^U} \le \dfrac{b^L}{b^C}$ (8)

(2) For ($A > 0$, $B < 0$) or ($A < 0$, $B > 0$), if $FPC(A \otimes B)$ exists, then at least one of the following will hold:

(a) $FPC(A \otimes B) = (a^L b^L, a^C b^C, a^U b^U)$, iff $\dfrac{b^C}{b^L} \le \dfrac{a^L}{a^C}$ and $\dfrac{b^U}{b^C} \le \dfrac{a^C}{a^U}$ (9)

(b) $FPC(A \otimes B) = (a^U b^U, a^C b^C, a^L b^L)$, iff $\dfrac{a^C}{a^U} \le \dfrac{b^U}{b^C}$ and $\dfrac{a^L}{a^C} \le \dfrac{b^C}{b^L}$ (10)

(3) $FPC(A \otimes B) = \{0\}$, if $0 \in [a^L, a^U]$, $0 \in [b^L, b^U]$ and $a^C b^C = 0$.

Proof:

(1) For $A, B > 0$, the values $o^{(1)} = \min\{a^L b^L, a^U b^L, a^L b^U, a^U b^U\} = a^L b^L$ and $o^{(4)} = \max\{a^L b^L, a^U b^L, a^L b^U, a^U b^U\} = a^U b^U$ can be obtained based on the above definition. Therefore, if $FPC(A \otimes B)$ exists, it must be composed of $a^L b^U$, $a^U b^L$, and the central point $a^C b^C$. If $FPC(A \otimes B) = (a^U b^L, a^C b^C, a^L b^U)$, then $a^U b^L \le a^C b^C \le a^L b^U$, where the relations $a^U b^L \le a^C b^C$ and $a^C b^C \le a^L b^U$ can be expressed as $b^L / b^C \le a^C / a^U$ and $b^C / b^U \le a^L / a^C$, respectively. Similarly, if $FPC(A \otimes B) = (a^L b^U, a^C b^C, a^U b^L)$, the relation $a^L b^U \le a^C b^C \le a^U b^L$ will hold, implying that $a^L / a^C \le b^C / b^U$ and $a^C / a^U \le b^L / b^C$. For $A, B < 0$, since $o^{(1)} = a^U b^U$ and $o^{(4)} = a^L b^L$ are determined, the other parts of the proof are the same as described above.

(2) Similar to (1), the condition $A < 0$, $B > 0$ implies that $o^{(4)} = a^U b^L$ and $o^{(1)} = a^L b^U$. Thus, if $FPC(A \otimes B)$ exists, it will be composed of $a^L b^L$, $a^U b^U$, and the central point $a^C b^C$. If $FPC(A \otimes B) = (a^L b^L, a^C b^C, a^U b^U)$ holds, then $b^C / b^L \le a^L / a^C$ and $b^U / b^C \le a^C / a^U$ hold. Otherwise, $FPC(A \otimes B) = (a^U b^U, a^C b^C, a^L b^L)$ would mean that the relations $a^C / a^U \le b^U / b^C$ and $a^L / a^C \le b^C / b^L$ are satisfied.

(3) If $0 \in [a^L, a^U]$ and $0 \in [b^L, b^U]$, then $\lim_{x \to 0;\, x \in [A_\alpha^L, A_\alpha^U]} x \otimes [B_\alpha^L, B_\alpha^U] = 0$ and $\lim_{x \to 0;\, x \in [B_\alpha^L, B_\alpha^U]} x \otimes [A_\alpha^L, A_\alpha^U] = 0$ will hold, leading to $\bigcap_{x \in [A_\alpha^L, A_\alpha^U]} x \otimes [B_\alpha^L, B_\alpha^U] = 0$ and $\bigcap_{x \in [B_\alpha^L, B_\alpha^U]} x \otimes [A_\alpha^L, A_\alpha^U] = 0$, respectively. By P1, if $a^C b^C = 0$, then $FPC(A \otimes B) = \{0\}$; otherwise, $FPC(A \otimes B)$ does not exist.

P4. If $A$ is a crisp value, then $FPC(A \otimes B) = A \otimes B = (A b^L, A b^C, A b^U)$. Since $A$ is a crisp value, one can set $a^L = a^C = a^U = A$. Then, it can be concluded that $A b^L = A b^L \le A b^C \le A b^U = A b^U$. Based on Definition 4, $o^{(1)} = o^{(2)}$ and $o^{(3)} = o^{(4)}$ will hold, implying that $A \otimes B = (o^{(1)}, a^C b^C, o^{(4)}) = (o^{(2)}, a^C b^C, o^{(3)}) = FPC(A \otimes B)$ holds. This property shows that if one of the fuzzy numbers is crisp, FPC becomes the usual fuzzy multiplication operator.

P5. If $FPC(A \otimes B)$ exists, then $H(FPC(A \otimes B)) = H(A \otimes B)$. Based on Definition 4, $A \otimes B = (o^{(1)}, a^C b^C, o^{(4)})$ and $FPC(A \otimes B) = (o^{(2)}, a^C b^C, o^{(3)})$. Since they are TFNs, their cores are the same as the central point, i.e., $a^C b^C$.

P6. For any $\lambda \in \mathbb{R}$, $FPC(\lambda A \otimes B) = \lambda \otimes FPC(A \otimes B)$.

Proof: Considering Definition 4, if $\lambda \ge 0$, then the relation $\lambda o^{(1)} \le \lambda o^{(2)} \le \lambda a^C b^C \le \lambda o^{(3)} \le \lambda o^{(4)}$ holds, leading to $FPC(\lambda A \otimes B) = \lambda \otimes FPC(A \otimes B) = (\lambda o^{(2)}, \lambda a^C b^C, \lambda o^{(3)})$. If $\lambda < 0$, the relation $\lambda o^{(1)} \ge \lambda o^{(2)} \ge \lambda a^C b^C \ge \lambda o^{(3)} \ge \lambda o^{(4)}$ will make $FPC(\lambda A \otimes B) = (\lambda o^{(3)}, \lambda a^C b^C, \lambda o^{(2)})$ and $\lambda \otimes FPC(A \otimes B) = \lambda \otimes (o^{(2)}, a^C b^C, o^{(3)}) = (\lambda o^{(3)}, \lambda a^C b^C, \lambda o^{(2)})$ hold. As such, $FPC(\lambda A \otimes B) = \lambda \otimes FPC(A \otimes B)$ holds.

Based on the above definitions, formulation, and properties, the proposed FPC operator can produce a fuzzy number in which only the main information is included. This study uses FPC to formulate fuzzy regression models.
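Properties P2, P4, and P6 can be spot-checked numerically on a single pair of TFNs. The sketch below is illustrative only and is not a proof: the helper names `fpc`, `tfn_mul`, and `scale`, and the example values, are assumptions introduced here, with `fpc` following Definition 4.

```python
def tfn_mul(a, b):
    """Usual approximate TFN product: extreme end-point products as bounds."""
    ends = [a[0] * b[0], a[0] * b[2], a[2] * b[0], a[2] * b[2]]
    return (min(ends), a[1] * b[1], max(ends))


def fpc(a, b):
    """FPC per Definition 4; returns None when the FPC does not exist."""
    o = sorted([a[0] * b[0], a[0] * b[2], a[2] * b[0], a[2] * b[2]])
    centre = a[1] * b[1]
    return (o[1], centre, o[2]) if o[1] <= centre <= o[2] else None


def scale(tfn, lam):
    """Multiply a TFN by a crisp scalar; a negative scalar swaps the bounds."""
    vals = (lam * tfn[0], lam * tfn[1], lam * tfn[2])
    return vals if lam >= 0 else vals[::-1]


# Hypothetical example values (not taken from the paper).
A, B = (4, 5, 6), (2, 3, 5)
crisp = (3, 3, 3)

print(fpc(A, B) == fpc(B, A))                        # P2: commutativity
print(fpc(crisp, B) == tfn_mul(crisp, B))            # P4: crisp operand
print(fpc(scale(A, -2), B) == scale(fpc(A, B), -2))  # P6: scalar multiple
```

Each comparison holds for these values; other inputs would need their own check that the FPC exists.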

3. Performance measures

In order to illustrate the performance of the proposed mathematical programming model in constructing fuzzy regression using FPC, comparisons with existing approaches are made. For this purpose, three kinds of commonly used performance measure are adopted in this study, as described below.

(1) Mean similarity measure

This measure, proposed by Pappis and Nikos [42], is used to determine the similarity of two fuzzy numbers based on the area concept. This study adopts this measure to determine the similarity degree between the response variables from observations and the formulated fuzzy regression model as the performance criterion of the model. A larger value indicates better model performance. The measure is expressed as:

$$S = \sum_{i=1}^{n} S_i = \sum_{i=1}^{n} \left( 1 - \frac{\int \left| \mu_{y_i}(x) - \mu_{\hat{y}_i}(x) \right| dx}{\int \mu_{\hat{y}_i}(x)\, dx + \int \mu_{y_i}(x)\, dx} \right) \qquad (11)$$

where $\mu_{y_i}(x)$ and $\mu_{\hat{y}_i}(x)$ are the membership degrees of the observed and predicted response variables, respectively.

(2) Distance measure

This measure is designed to evaluate the distance between the response variables from observations and the formulated fuzzy regression model in terms of the Hamming distance [43]. By measuring the distances via several $\alpha$-cuts for all observations, the total estimation errors can be obtained. A smaller sum of distances indicates smaller total estimation errors, and thus higher prediction performance. A good fuzzy regression model will produce a smaller total distance. This measure is formulated as:

$$D = \sum_{i=1}^{n} D_i = \sum_{i=1}^{n} \frac{1}{2m} \sum_{k=1}^{m} \left( \left| (\hat{Y}_i)_{\alpha_k}^{L} - (Y_i)_{\alpha_k}^{L} \right| + \left| (\hat{Y}_i)_{\alpha_k}^{U} - (Y_i)_{\alpha_k}^{U} \right| \right) \qquad (12)$$

where $\alpha_k = (k - 1)/(m - 1)$. This study uses this measure as a performance criterion for establishing fuzzy regression models in the formulation processes.

(3) Credibility measure

The credibility measure [44] was proposed as a performance criterion in formulation processes to find the optimal h level based on Tanaka et al.’s model [2]. This measure considers a crisp observed response variable and a fuzzy predicted response, and is represented as:

$$MFC_i^* = \frac{\mu_{Y}(Y_i)}{(\hat{Y}_i)^U - (\hat{Y}_i)^L} \qquad (13)$$

where $\mu_{Y}(Y_i)$ is the membership degree of the observed response $Y_i$. In order to obtain the credibility of two fuzzy numbers, Eq. (13) can be modified by replacing the membership degree of $Y_i$ with the highest membership degree at the intersection of $Y_i$ and $\hat{Y}_i$, formulated as:

$$MPF = \sum_{i=1}^{n} MFC_i = \sum_{i=1}^{n} \frac{\max\left\{ \mu_{\hat{Y}_i \cap Y_i} \right\}}{(\hat{Y}_i)^U - (\hat{Y}_i)^L} \qquad (14)$$

The above three measures are used to compare the performance of the proposed model with existing approaches via illustrative examples.
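Of the three measures, the distance measure in Eq. (12) is the simplest to compute for TFNs, because each $\alpha$-cut has a closed form. The sketch below is a minimal illustration under stated assumptions: the function names `alpha_cut` and `total_distance`, the default of `m = 11` levels, and the example data are hypothetical and not taken from the paper.

```python
def alpha_cut(tfn, alpha):
    """Return the alpha-cut [lower, upper] of a TFN given as (lo, mid, hi)."""
    lo, mid, hi = tfn
    return (lo + alpha * (mid - lo), hi - alpha * (hi - mid))


def total_distance(observed, predicted, m=11):
    """Sum over observations of the alpha-cut distance in Eq. (12).

    observed and predicted are equally long lists of TFNs; m is the number
    of alpha levels, with alpha_k = (k - 1) / (m - 1) for k = 1..m.
    """
    total = 0.0
    for y, y_hat in zip(observed, predicted):
        d = 0.0
        for k in range(1, m + 1):
            alpha = (k - 1) / (m - 1)
            y_lo, y_hi = alpha_cut(y, alpha)
            p_lo, p_hi = alpha_cut(y_hat, alpha)
            d += abs(p_lo - y_lo) + abs(p_hi - y_hi)
        total += d / (2 * m)
    return total


# Hypothetical data: one exact prediction and one shifted by one unit.
observed = [(1, 2, 3), (4, 6, 8)]
predicted = [(1, 2, 3), (5, 7, 9)]
print(total_distance(observed, predicted))  # close to 1.0
```

An exact prediction contributes zero, and a prediction shifted by one unit contributes one, so smaller totals indicate a better fit, as stated in the text.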

4. Fuzzy regression modeling based on FPC

To retain the uncertainty of fuzzy regression models, this study formulates fuzzy regression models with fuzzy parameters using observations with fuzzy explanatory and response variables. The proposed FPC operator is adopted in the formulation processes, since it focuses on the core information. With the use of the proposed operator, the estimated responses will not have unnecessary spread in their membership functions, increasing model performance.

Consider the fuzzy observation set $(Y_i, X_{i1}, X_{i2}, \ldots, X_{ij}, \ldots, X_{ip})$, where $Y_i = (y_i^L, y_i^C, y_i^U)$ is the response variable and $X_{ij} = (x_{ij}^L, x_{ij}^C, x_{ij}^U)$ represents the explanatory variables, all of which are represented as TFNs. The general predicted fuzzy regression model can be expressed as:

$$\hat{Y}_i = \hat{B}_0 \oplus \hat{B}_1 \otimes X_{i1} \oplus \hat{B}_2 \otimes X_{i2} \oplus \cdots \oplus \hat{B}_p \otimes X_{ip} = \sum_{j=0}^{p} \hat{B}_j \otimes X_{ij} \qquad (15)$$

where $\hat{Y}_i = (\hat{y}_i^L, \hat{y}_i^C, \hat{y}_i^U)$ is the predicted fuzzy response variable, $\hat{B}_j = (\hat{b}_j^L, \hat{b}_j^C, \hat{b}_j^U)$ is the estimated fuzzy parameter, and $X_{i0} = (1, 1, 1)$ is specified. To enhance the performance of fuzzy regression models, this study uses the proposed FPC operator in the formulation processes. Assume that the FPC product for each explanatory variable and its corresponding parameter exists. The predicted fuzzy regression model with the FPC operator can be formulated as:

$$\hat{Y}_i = \hat{B}_0 \oplus FPC(\hat{B}_1 \otimes X_{i1}) \oplus FPC(\hat{B}_2 \otimes X_{i2}) \oplus \cdots \oplus FPC(\hat{B}_p \otimes X_{ip}) = \sum_{j=0}^{p} FPC(\hat{B}_j \otimes X_{ij}) \qquad (16)$$
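Once the fuzzy parameters are known, evaluating Eq. (16) for one observation only needs TFN addition and the FPC of Definition 4. The sketch below is a hypothetical illustration: the function names `fpc`, `tfn_add`, and `predict`, the parameter values, and the choice to raise an error when an FPC product does not exist are all assumptions made here. It does not estimate the parameters, which the paper obtains from a mathematical programming model.

```python
def tfn_add(a, b):
    """Component-wise addition of two TFNs given as (lo, mid, hi)."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def fpc(a, b):
    """FPC per Definition 4; returns None when the FPC does not exist."""
    o = sorted([a[0] * b[0], a[0] * b[2], a[2] * b[0], a[2] * b[2]])
    centre = a[1] * b[1]
    return (o[1], centre, o[2]) if o[1] <= centre <= o[2] else None


def predict(params, xs):
    """Evaluate Eq. (16) for one observation.

    params = [B0, B1, ..., Bp] are fuzzy parameters (TFNs); xs = [X1, ..., Xp]
    are the explanatory variables (TFNs) of the observation.
    """
    y_hat = params[0]  # intercept term, since X_i0 = (1, 1, 1)
    for b_j, x_j in zip(params[1:], xs):
        term = fpc(b_j, x_j)
        if term is None:
            raise ValueError("FPC product does not exist for this pair")
        y_hat = tfn_add(y_hat, term)
    return y_hat


# Hypothetical parameters and one observation with a single explanatory variable.
params = [(1, 1, 1), (2, 3, 5)]
xs = [(4, 5, 6)]
print(predict(params, xs))  # (13, 16, 21)
```

With the usual product in place of FPC, the same inputs would give the wider response $(9, 16, 31)$.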

Based on P3, various approximate TFNs of $\mathrm{FPC}(\hat{B}_j \otimes \tilde{X}_{ij})$ can be adopted, depending on the combination of the signs of the explanatory variable and the corresponding parameter. However, to avoid unnecessary information in the formulation process, a specific FPC is chosen for each combination. For example, when $\tilde{X}_{ij}, \hat{B}_j > 0$, either Eq. (7) or Eq. (8) can be applied, as stated in P3. In this study, Eq. (8), i.e., $\mathrm{FPC}(\hat{B}_j \otimes \tilde{X}_{ij}) = (X_{ij}^L \hat{b}_j^U, X_{ij}^C \hat{b}_j^C, X_{ij}^U \hat{b}_j^L)$, is adopted. This choice aims to reduce the spread of the fuzzy parameter as much as possible under the objective function of the mathematical programming model. That is, Eq. (8) is the most likely to yield the minimal spread of the fuzzy parameters from the mathematical model. As an extreme case, a crisp parameter can be produced when optimizing the objective function subject to the restrictions, so that the spread of the fuzzy parameter converges to zero. In that situation, $\hat{b}_j^L = \hat{b}_j^C = \hat{b}_j^U = \hat{b}_j$, for which the requirements of FPC products, i.e., $\mathrm{FPC}(\hat{B}_j \otimes \tilde{X}_{ij})^L \le \mathrm{FPC}(\hat{B}_j \otimes \tilde{X}_{ij})^C \le \mathrm{FPC}(\hat{B}_j \otimes \tilde{X}_{ij})^U$, are satisfied. Using the notation $\hat{b}_j$, the relation $X_{ij}^L \hat{b}_j \le X_{ij}^C \hat{b}_j \le X_{ij}^U \hat{b}_j$ holds, implying that $X_{ij}^L / X_{ij}^C \le \hat{b}_j / \hat{b}_j = 1$ and $X_{ij}^C / X_{ij}^U \le \hat{b}_j / \hat{b}_j = 1$ based on P3. In addition, regardless of whether the parameters are crisp, the relations $X_{ij}^L \hat{b}_j^U \le X_{ij}^C \hat{b}_j^C \le X_{ij}^U \hat{b}_j^L$ and $\hat{b}_j^L \le \hat{b}_j^C \le \hat{b}_j^U$ should be included as restrictions in the mathematical programming model to ensure the existence of an FPC operator, as stated in P1.

If Eq. (7) is applied instead for $\tilde{X}_{ij}, \hat{B}_j > 0$, the reduction of unnecessary information in terms of the spread of the fuzzy parameter may be limited. For example, the extreme case that produces crisp parameters cannot occur, since the relation $X_{ij}^U \hat{b}_j \le X_{ij}^C \hat{b}_j \le X_{ij}^L \hat{b}_j$ violates the basic requirements of a fuzzy number when $\hat{b}_j > 0$. Furthermore, applying Eq. (7) for $\tilde{X}_{ij}, \hat{B}_j > 0$ may yield a wider spread of the fuzzy parameter than that obtained using Eq. (8), so that the unnecessary information is retained. Similarly, the appropriate $\mathrm{FPC}(\hat{B}_j \otimes \tilde{X}_{ij})$ can be determined for the other combinations of the signs of $\tilde{X}_{ij}$ and $\hat{B}_j$. The selection of $\mathrm{FPC}(\hat{B}_j \otimes \tilde{X}_{ij})$ for the four combinations is summarized by the following four rules:

R1. If $\tilde{X}_{ij}, \hat{B}_j > 0$, apply Eq. (8): $\mathrm{FPC}(\hat{B}_j \otimes \tilde{X}_{ij}) = (X_{ij}^L \hat{b}_j^U, X_{ij}^C \hat{b}_j^C, X_{ij}^U \hat{b}_j^L)$

R2. If $\tilde{X}_{ij}, \hat{B}_j < 0$, apply Eq. (7): $\mathrm{FPC}(\hat{B}_j \otimes \tilde{X}_{ij}) = (X_{ij}^U \hat{b}_j^L, X_{ij}^C \hat{b}_j^C, X_{ij}^L \hat{b}_j^U)$
R3. If $\tilde{X}_{ij} > 0$ and $\hat{B}_j < 0$, apply Eq. (10): $\mathrm{FPC}(\hat{B}_j \otimes \tilde{X}_{ij}) = (X_{ij}^U \hat{b}_j^U, X_{ij}^C \hat{b}_j^C, X_{ij}^L \hat{b}_j^L)$
R4. If $\tilde{X}_{ij} < 0$ and $\hat{B}_j > 0$, apply Eq. (9): $\mathrm{FPC}(\hat{B}_j \otimes \tilde{X}_{ij}) = (X_{ij}^L \hat{b}_j^L, X_{ij}^C \hat{b}_j^C, X_{ij}^U \hat{b}_j^U)$

For datasets that contain observations with a fuzzy response and crisp explanatory variables, P4 shows that the FPC operator becomes the usual fuzzy multiplication operator, i.e., $\mathrm{FPC}(A \otimes B) = A \otimes B$. If the crisp explanatory variable and its approximated parameter are positive and R1 is adopted, the estimated parameter can only be crisp, since the relation $\hat{b}_j^U x_{ij} \le \hat{b}_j^C x_{ij} \le \hat{b}_j^L x_{ij}$ in R1 forces $\hat{b}_j^L = \hat{b}_j^C = \hat{b}_j^U$, resulting in unreasonably formulated fuzzy regression models. As such, rule R1 for this case should be modified to adopt Eq. (7) as $\mathrm{FPC}(\hat{B}_j \otimes X_{ij}) = (X_{ij} \hat{b}_j^L, X_{ij} \hat{b}_j^C, X_{ij} \hat{b}_j^U)$, where the relation $X_{ij} \hat{b}_j^L \le X_{ij} \hat{b}_j^C \le X_{ij} \hat{b}_j^U$ implies that $\hat{b}_j^L / \hat{b}_j^C \le X_{ij} / X_{ij} = 1$ and $\hat{b}_j^C / \hat{b}_j^U \le X_{ij} / X_{ij} = 1$ based on P3. Similarly, $\mathrm{FPC}(\hat{B}_j \otimes X_{ij})$ for the other cases with crisp explanatory variables can be determined as shown by the following rules:

R5. If $X_{ij}, \hat{B}_j > 0$, apply Eq. (7): $\mathrm{FPC}(\hat{B}_j \otimes X_{ij}) = (X_{ij} \hat{b}_j^L, X_{ij} \hat{b}_j^C, X_{ij} \hat{b}_j^U)$

R6. If $X_{ij}, \hat{B}_j < 0$, apply Eq. (8): $\mathrm{FPC}(\hat{B}_j \otimes X_{ij}) = (X_{ij} \hat{b}_j^U, X_{ij} \hat{b}_j^C, X_{ij} \hat{b}_j^L)$
R7. If $X_{ij} > 0$ and $\hat{B}_j < 0$, apply Eq. (9): $\mathrm{FPC}(\hat{B}_j \otimes X_{ij}) = (X_{ij} \hat{b}_j^L, X_{ij} \hat{b}_j^C, X_{ij} \hat{b}_j^U)$
R8. If $X_{ij} < 0$ and $\hat{B}_j > 0$, apply Eq. (10): $\mathrm{FPC}(\hat{B}_j \otimes X_{ij}) = (X_{ij} \hat{b}_j^U, X_{ij} \hat{b}_j^C, X_{ij} \hat{b}_j^L)$

Following the above rules, the equation of the prediction variable in Eq. (16) can be expressed in terms of the aggregated membership functions of $\mathrm{FPC}(\hat{B}_j \otimes \tilde{X}_{ij})$ or $\mathrm{FPC}(\hat{B}_j \otimes X_{ij})$. Without loss of generality and for simplicity, only the notation $\mathrm{FPC}(\hat{B}_j \otimes \tilde{X}_{ij})$ is used hereafter, since a crisp value can be considered as a degenerate fuzzy number. However, to apply the correct rules when formulating the mathematical programming model, the signs of the explanatory variables and their corresponding parameters must be known. Unfortunately, the sign of the TFN parameters cannot be determined in advance. To deal with this problem, this study sets up two alternative TFN variables with different signs, i.e., $\hat{B}_{j1} \le 0$ and $\hat{B}_{j2} \ge 0$, for each parameter. Based on Lemma 1, if the fuzzy regression model has the formulation

$$\hat{Y}_i = \sum_{j=0}^{p} \left[ \mathrm{FPC}(\hat{B}_{j1} \otimes \tilde{X}_{ij}) + \mathrm{FPC}(\hat{B}_{j2} \otimes \tilde{X}_{ij}) \right]$$

subject to $\hat{B}_{j1} \otimes \hat{B}_{j2} = 0$ in the mathematical programming problem, then one alternative TFN variable will be obtained as the optimal parameter for the corresponding explanatory variable, and the other will be zero.

The mathematical programming problem can then be constructed to determine the optimal fuzzy parameters by setting up two alternative variables, $\hat{B}_{j1}$ and $\hat{B}_{j2}$, $j = 0, 1, \ldots, p$, for each parameter. For this task, this study adopts the absolute distance between the response variable, $Y_i$, of each observation and the prediction variable, $\hat{Y}_i$, in the fuzzy regression model as the performance criterion [43]. The absolute distance is defined as:

$$D_i = \frac{1}{2m} \sum_{k=1}^{m} \left( \left| (\hat{Y}_i)^L_{\alpha_k} - (Y_i)^L_{\alpha_k} \right| + \left| (\hat{Y}_i)^U_{\alpha_k} - (Y_i)^U_{\alpha_k} \right| \right), \quad i = 1, 2, \ldots, n \tag{17}$$

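where $[(\hat{Y}_i)^L_{\alpha_k}, (\hat{Y}_i)^U_{\alpha_k}]$ and $[(Y_i)^L_{\alpha_k}, (Y_i)^U_{\alpha_k}]$ are the intervals of the $k$th $\alpha$-cut of $\hat{Y}_i$ and $Y_i$, respectively. With LAD as the objective, the mathematical programming problem minimizes the overall absolute distance over all observations as follows:

$$\min \sum_{i=1}^{n} D_i = \sum_{i=1}^{n} \frac{1}{2m} \sum_{k=1}^{m} \left( \left| (\hat{Y}_i)^L_{\alpha_k} - (Y_i)^L_{\alpha_k} \right| + \left| (\hat{Y}_i)^U_{\alpha_k} - (Y_i)^U_{\alpha_k} \right| \right) \tag{18}$$

$$\text{s.t.} \quad \hat{Y}_i = \sum_{j=0}^{p} \left[ \mathrm{FPC}(\hat{B}_{j1} \otimes \tilde{X}_{ij}) + \mathrm{FPC}(\hat{B}_{j2} \otimes \tilde{X}_{ij}) \right] \tag{19}$$

$$\mathrm{FPC}(\hat{B}_{js} \otimes \tilde{X}_{ij})^L \le \mathrm{FPC}(\hat{B}_{js} \otimes \tilde{X}_{ij})^C \le \mathrm{FPC}(\hat{B}_{js} \otimes \tilde{X}_{ij})^U, \quad \forall i, j, s \tag{20}$$

$$\hat{b}_{js}^L \le \hat{b}_{js}^C \le \hat{b}_{js}^U, \quad \forall j, s \tag{21}$$

$$\hat{B}_{j1} \otimes \hat{B}_{j2} = 0, \quad \forall j \tag{22}$$

$$i = 1, 2, \ldots, n, \quad j = 0, 1, \ldots, p, \quad s = 1, 2$$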

where $\mathrm{FPC}(\hat{B}_{j1} \otimes \tilde{X}_{ij})$ is determined by R2 or R3, and $\mathrm{FPC}(\hat{B}_{j2} \otimes \tilde{X}_{ij})$ by R1 or R4. Note that the restrictions of Eqs. (20) and (21) ensure that the determined $\mathrm{FPC}(\hat{B}_{js} \otimes \tilde{X}_{ij})$ exists, as stated in P1, and that $\hat{B}_{js}$ meets the basic requirements of a fuzzy number, even when some estimated parameters are crisp values. Eq. (22) requires that one of the two alternative parameters be zero.
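The sign-based selection among rules R1 to R8 can be sketched in a few lines of Python. This is only an illustrative reading of the rules, not the authors' implementation: the names `fpc`, `sign_of`, and `exists` are hypothetical, TFNs are represented as `(lower, center, upper)` tuples, a crisp value is treated as a degenerate TFN, and the numeric values are made up for the demonstration.

```python
from typing import Tuple

TFN = Tuple[float, float, float]  # (lower, center, upper)

def sign_of(t: TFN) -> int:
    """Return +1 for a non-negative TFN and -1 for a non-positive one."""
    if t[0] >= 0:
        return 1
    if t[2] <= 0:
        return -1
    raise ValueError("TFN straddles zero; rules R1-R8 do not cover this case")

def fpc(b: TFN, x: TFN) -> TFN:
    """FPC(B (x) X), with the rule chosen from the signs of X and B."""
    bl, bc, bu = b
    xl, xc, xu = x
    sx, sb = sign_of(x), sign_of(b)
    if xl == xu:  # crisp X (degenerate TFN): rules R5-R8
        if sx > 0:
            return (xc * bl, xc * bc, xc * bu)   # R5 and R7
        return (xc * bu, xc * bc, xc * bl)       # R6 and R8
    if sx > 0 and sb > 0:
        return (xl * bu, xc * bc, xu * bl)       # R1
    if sx < 0 and sb < 0:
        return (xu * bl, xc * bc, xl * bu)       # R2
    if sx > 0 and sb < 0:
        return (xu * bu, xc * bc, xl * bl)       # R3
    return (xl * bl, xc * bc, xu * bu)           # R4

def exists(t: TFN) -> bool:
    """Ordering requirement on the FPC product, as imposed by Eq. (20)."""
    return t[0] <= t[1] <= t[2]

# Illustrative (made-up) values: one positive and one negative fuzzy X.
x_pos, x_neg = (2.0, 3.0, 4.0), (-4.0, -3.0, -2.0)
b_pos, b_neg = (0.4, 0.5, 0.6), (-0.6, -0.5, -0.4)

for label, b, x in [("R1", b_pos, x_pos), ("R2", b_neg, x_neg),
                    ("R3", b_neg, x_pos), ("R4", b_pos, x_neg)]:
    product = fpc(b, x)
    print(label, product, exists(product))

# A parameter with a wide spread can violate the ordering under R1,
# which is why the restrictions of Eq. (20) are part of the model.
print(exists(fpc((0.1, 0.5, 0.9), x_pos)))
```

The last line illustrates why the ordering restrictions are needed: pairing the lower bound of $X$ with the upper bound of the parameter only yields a valid TFN when the spread of the parameter is small relative to that of $X$.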

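The above model can be further transformed into a linear programming model for computational efficiency. We define the positive deviation variables, $\varepsilon^1_{i\alpha_k} = \max\{[(Y_i)^L_{\alpha_k} - (\hat{Y}_i)^L_{\alpha_k}], 0\}$ and $\eta^1_{i\alpha_k} = \max\{[(Y_i)^U_{\alpha_k} - (\hat{Y}_i)^U_{\alpha_k}], 0\}$, and the negative deviation variables, $\varepsilon^2_{i\alpha_k} = \max\{[(\hat{Y}_i)^L_{\alpha_k} - (Y_i)^L_{\alpha_k}], 0\}$ and $\eta^2_{i\alpha_k} = \max\{[(\hat{Y}_i)^U_{\alpha_k} - (Y_i)^U_{\alpha_k}], 0\}$. Then, $(Y_i)^L_{\alpha_k} - (\hat{Y}_i)^L_{\alpha_k} = \varepsilon^1_{i\alpha_k} - \varepsilon^2_{i\alpha_k}$ and $(Y_i)^U_{\alpha_k} - (\hat{Y}_i)^U_{\alpha_k} = \eta^1_{i\alpha_k} - \eta^2_{i\alpha_k}$, indicating that $|(\hat{Y}_i)^L_{\alpha_k} - (Y_i)^L_{\alpha_k}| = \varepsilon^1_{i\alpha_k} + \varepsilon^2_{i\alpha_k}$ and $|(\hat{Y}_i)^U_{\alpha_k} - (Y_i)^U_{\alpha_k}| = \eta^1_{i\alpha_k} + \eta^2_{i\alpha_k}$. The model formulation then becomes:

$$\min \sum_{i=1}^{n} D_i = \sum_{i=1}^{n} \frac{1}{2m} \sum_{k=1}^{m} \left( \varepsilon^1_{i\alpha_k} + \varepsilon^2_{i\alpha_k} + \eta^1_{i\alpha_k} + \eta^2_{i\alpha_k} \right) \tag{23}$$

$$\text{s.t.} \quad (Y_i)^L_{\alpha_k} - \sum_{j=0}^{p} \mathrm{FPC}(\hat{B}_{j1} \otimes \tilde{X}_{ij})^L_{\alpha_k} - \sum_{j=0}^{p} \mathrm{FPC}(\hat{B}_{j2} \otimes \tilde{X}_{ij})^L_{\alpha_k} = \varepsilon^1_{i\alpha_k} - \varepsilon^2_{i\alpha_k} \tag{24}$$

$$(Y_i)^U_{\alpha_k} - \sum_{j=0}^{p} \mathrm{FPC}(\hat{B}_{j1} \otimes \tilde{X}_{ij})^U_{\alpha_k} - \sum_{j=0}^{p} \mathrm{FPC}(\hat{B}_{j2} \otimes \tilde{X}_{ij})^U_{\alpha_k} = \eta^1_{i\alpha_k} - \eta^2_{i\alpha_k} \tag{25}$$

$$\mathrm{FPC}(\hat{B}_{js} \otimes \tilde{X}_{ij})^L_{\alpha_k} \le \mathrm{FPC}(\hat{B}_{js} \otimes \tilde{X}_{ij})^C_{\alpha_k} \le \mathrm{FPC}(\hat{B}_{js} \otimes \tilde{X}_{ij})^U_{\alpha_k} \tag{26}$$

$$\hat{b}_{js}^L \le \hat{b}_{js}^C \le \hat{b}_{js}^U \tag{27}$$

$$\hat{B}_{j1} \otimes \hat{B}_{j2} = 0 \tag{28}$$

$$\varepsilon^1_{i\alpha_k} \ge 0, \quad \varepsilon^2_{i\alpha_k} \ge 0, \quad \eta^1_{i\alpha_k} \ge 0, \quad \eta^2_{i\alpha_k} \ge 0 \tag{29}$$

$$i = 1, 2, \ldots, n, \quad j = 0, 1, \ldots, p, \quad k = 1, 2, \ldots, m, \quad s = 1, 2$$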

where the corresponding $\mathrm{FPC}(\hat{B}_{js} \otimes \tilde{X}_{ij})$ follows one of the rules R1 to R4 if the explanatory variable $\tilde{X}_{ij}$ is fuzzy, or one of the rules R5 to R8 if $X_{ij}$ is crisp.
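The substitution of absolute values by pairs of non-negative deviation variables in Eqs. (23) to (25) rests on a simple identity, which the following Python sketch checks numerically. The function name `split_deviation` and the sample bounds are hypothetical and chosen only for illustration.

```python
def split_deviation(observed: float, predicted: float):
    """Positive and negative deviations of one alpha-cut bound."""
    positive = max(observed - predicted, 0.0)   # observed above predicted
    negative = max(predicted - observed, 0.0)   # predicted above observed
    return positive, negative

# (observed bound, predicted bound) pairs, illustrative values only
pairs = [(4.2, 7.3), (10.9, 9.5), (6.2, 6.2)]

for y, y_hat in pairs:
    pos, neg = split_deviation(y, y_hat)
    # The difference of the pair recovers the signed gap ...
    assert abs((pos - neg) - (y - y_hat)) < 1e-12
    # ... and the sum recovers the absolute gap used in the objective.
    assert abs((pos + neg) - abs(y_hat - y)) < 1e-12
    # At most one member of each pair is non-zero.
    assert pos == 0.0 or neg == 0.0
    print(y, y_hat, pos, neg)
```

In the linear program itself the `max` definitions are not imposed explicitly. Only the equalities and the non-negativity conditions of Eq. (29) appear, and minimizing the sum of each pair drives at least one member to zero at the optimum.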

The above linear programming model can be easily solved using commercial software, such as LINGO [45]. In addition, the use of only three or four $\alpha$-cuts in the model is sufficient when the membership functions of fuzzy numbers are triangular [43].

5. Illustrative examples

In this section, four examples are used to illustrate the proposed approach and to compare it with existing studies. The first example includes one crisp explanatory variable and one fuzzy response. The explanatory and response variables are both fuzzy in examples 2 to 4. The performance of the proposed model (denoted as CN) is compared to those of the models of Tanaka et al. [2] (TKB), Diamond [46] (DM), Kao and Chyu [8] (KC), Chen and Hsueh [43] (CH-MP), Chen and Hsueh [18] (CH-LSE), Wu [1] (WU), and Kelkinnama and Taheri [36] (KT) in terms of similarity, distance, and credibility. DM, CH-MP, CH-LSE, and KT attempt to minimize distance, as done by CN. TKB attempts to find the best explanatory or predicted fuzzy regression model under a subjective confidence level, and KC formulates the model with the minimum difference of membership degrees between observed and predicted fuzzy responses.

Example 1: This example adopts a dataset designed by Tanaka et al. [2]. It has been used to illustrate the performance of fuzzy regression models [2, 8, 43, 46]. This dataset includes five fuzzy observations with one crisp explanatory variable and one fuzzy response (see Table 1). Using this dataset, Chen and Hsueh [43] verified their model and showed its advantages. Since the explanatory variable is crisp and positive in this example, R5 and R7 are adopted for the formulation. Using two $\alpha$-cuts, the proposed model is formulated as:

$$\hat{Y}_{CN} = (5.1, 6.75, 8.4) + (1.1, 1.25, 1.4) X \tag{30}$$

Referring to Table 2, the proposed model (CN) and CH-MP have the minimum distance based on the distance measure. For the similarity and credibility measures, CN has the best performance among the five approaches. Although the proposed model was formulated considering that explanatory and response variables and parameters are fuzzy, it still outperforms the other methods when the explanatory variable is crisp.

Table 1. Dataset for example 1

| Obs. | $X$ | $Y$ | $\hat{Y}$ |
|---|---|---|---|
| 1 | 1 | (6.2, 8, 9.8) | (6.2, 8, 9.8) |
| 2 | 2 | (4.2, 6.4, 8.6) | (7.3, 9.25, 11.2) |
| 3 | 3 | (6.9, 9.5, 12.1) | (8.4, 10.5, 12.6) |
| 4 | 4 | (10.9, 13.5, 16.1) | (9.5, 11.75, 14) |
| 5 | 5 | (10.6, 13, 15.4) | (10.6, 13, 15.4) |

Table 2. Comparison of performance of five approaches for example 1

| Criterion | TKB | DM | KC | CH-MP | CN |
|---|---|---|---|---|---|
| Mean similarity | 2.7181 | 2.7548 | 2.9123 | 3.0409 | 3.1264 |
| Distance | 12.3125 | 6.1125 | 6.0125 | 5.6000 | 5.6000 |
| Credibility | 0.4190 | 0.7764 | 0.7618 | 0.7894 | 0.8957 |
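As a check on Example 1, the sketch below evaluates Eq. (30) at the five crisp inputs and then applies the absolute distance of Eq. (17) to the observed responses of Table 1. It assumes that the two $\alpha$-cuts are taken at $\alpha = 0$ and $\alpha = 1$, which is not stated in this section; the helper names `predict`, `alpha_cut`, and `abs_distance` are hypothetical.

```python
from typing import Sequence, Tuple

TFN = Tuple[float, float, float]  # (lower, center, upper)

INTERCEPT: TFN = (5.1, 6.75, 8.4)   # from Eq. (30)
SLOPE: TFN = (1.1, 1.25, 1.4)       # from Eq. (30)

def predict(x: float) -> TFN:
    """Eq. (30) for a crisp, positive x: each bound is b0 + b1 * x."""
    return tuple(b0 + b1 * x for b0, b1 in zip(INTERCEPT, SLOPE))

def alpha_cut(t: TFN, alpha: float) -> Tuple[float, float]:
    """Interval [L, U] of a triangular fuzzy number at level alpha."""
    lo, c, up = t
    return lo + alpha * (c - lo), up - alpha * (up - c)

def abs_distance(y_hat: TFN, y: TFN, alphas: Sequence[float]) -> float:
    """Eq. (17): mean absolute gap between alpha-cut bounds."""
    total = 0.0
    for a in alphas:
        (hl, hu), (yl, yu) = alpha_cut(y_hat, a), alpha_cut(y, a)
        total += abs(hl - yl) + abs(hu - yu)
    return total / (2 * len(alphas))

# Observed responses from Table 1, keyed by the crisp input X.
OBSERVED = {
    1: (6.2, 8.0, 9.8),
    2: (4.2, 6.4, 8.6),
    3: (6.9, 9.5, 12.1),
    4: (10.9, 13.5, 16.1),
    5: (10.6, 13.0, 15.4),
}
ALPHAS = (0.0, 1.0)  # assumed levels of the two alpha-cuts

predicted = {x: predict(x) for x in OBSERVED}
distances = {x: abs_distance(predicted[x], y, ALPHAS) for x, y in OBSERVED.items()}
total_distance = sum(distances.values())

for x in OBSERVED:
    print(x, tuple(round(v, 2) for v in predicted[x]), round(distances[x], 4))
print("total distance:", round(total_distance, 4))
```

Under these assumptions the predicted TFNs agree with the $\hat{Y}$ column of Table 1, and the per-observation distances (0, 2.85, 1.0, 1.75, 0) sum to about 5.6, the value listed for CN in Table 2.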

Example 2: The dataset in this example, proposed by Sakawa and Yano [5], contains eight fuzzy observations with a fuzzy response variable and one fuzzy explanatory variable (see Table 3). To determine the FPC product, note that the explanatory variable is a positive fuzzy number; R1 and R3 are thus adopted for the two alternative TFN parameters. By applying two $\alpha$-cuts in Eq. (17), the proposed model is:

$$\hat{Y}_{CN} = (3.6667, 3.9994, 4.2222) + (0.444 X^L, 0.444 X^C, 0.444 X^U) \tag{31}$$

It is noted that since the estimated slope parameter in the above formulation is a crisp value, 0.444, the FPC operation of $X$ and the parameter degenerates to the multiplication of a fuzzy number and a crisp one. As shown in Table 4, the proposed model has a lower distance than those of DM and CH-MP, which use distance as the criterion. For the similarity measure, the TKB model has the best performance, but its performance in terms of distance is much worse than those of the other models. The proposed model has the best performance in terms of distance and credibility measures and is second best in terms of the similarity measure.

Table 3. Dataset for example 2

| Obs | $X$ | $Y$ | $\hat{Y}$ |
|---|---|---|---|
| 1 | (1.5, 2, 2.5) | (3.5, 4, 4.5) | (4.33, 4.83, 5.33) |
| 2 | (3, 3.5, 4) | (5, 5.5, 6) | (5.00, 5.50, 6.00) |
| 3 | (4.5, 5.5, 6.5) | (6.5, 7.5, 8.5) | (5.67, 6.39, 7.11) |
| 4 | (6.5, 7, 7.5) | (6, 6.5, 7) | (6.56, 7.06, 7.56) |
| 5 | (8, 8.5, 9) | (8, 8.5, 9) | (7.22, 7.72, 8.22) |
| 6 | (9.5, 10.5, 11.5) | (7, 8, 9) | (7.89, 8.61, 9.33) |
| 7 | (10.5, 11, 11.5) | (10, 10.5, 11) | (8.33, 8.83, 9.33) |
| 8 | (12, 12.5, 13) | (9, 9.5, 10) | (9.00, 9.50, 10.00) |

Table 4. Comparison of performance of five approaches for example 2

| Criterion | TKB | DM | KC | CH-MP | CN |
|---|---|---|---|---|---|
| Mean similarity | 3.1051 | 1.9276 | 1.7829 | 2.8080 | 2.8168 |
| Distance | 9.8203 | 5.8235 | 5.8165 | 5.564 | 5.5556 |
| Credibility | 1.5219 | 2.8557 | 3.0246 | 0.5145 | 3.5209 |

Example 3: In this example, a dataset consisting of 15 fuzzy observations is adopted (see Table 5). Using this dataset, Wu [1] applied the extension principle to formulate a fuzzy regression model. Similarly, Chen and Hsueh [18, 43] adopted this dataset to establish two different fuzzy regression models by applying mathematical programming (CH-MP) and least-squares errors (CH-LSE) based on the distance measure. For the proposed model, as all explanatory variables are TFNs, R1 and R3 are adopted. The fuzzy regression model obtained using the FPC operator is:

$$\hat{Y}_{CN} = (5.3583) + (0.5451 X_1^L, 0.4902 X_1^C, 0.4397 X_1^U) + (0.0116 X_2^L, 0.0089 X_2^C, 0.0089 X_2^U) \tag{32}$$

A performance comparison of WU, CH-MP, CH-LSE, and CN is shown in Table 6. Although Chen and Hsueh [18, 43] proposed two models with crisp parameters to reduce the unnecessary spread of the predicted fuzzy response variable, the fuzzy regression model obtained using the proposed FPC operator has the lowest distance among the four models. That is, based on the FPC operator, even though a fuzzy regression model is formulated with fuzzy explanatory and response variables and parameters, the resulting model outperforms other models in terms of the distance measure.

Table 5. Dataset for example 3

| Obs | $X_1$ | $X_2$ | $Y$ | $\hat{Y}$ |
|---|---|---|---|---|
| 1 | (151, 274, 322) | (1432, 2450, 3461) | (111, 162, 194) | (104.24, 161.53, 177.82) |
| 2 | (101, 180, 291) | (2448, 3254, 4463) | (88, 120, 161) | (88.73, 122.62, 173.13) |
| 3 | (221, 375, 539) | (2592, 3802, 5116) | (161, 223, 288) | (155.81, 223.10, 288.00) |
| 4 | (128, 205, 313) | (1414, 2838, 3252) | (83, 131, 194) | (91.49, 131.17, 172.00) |
| 5 | (62, 86, 112) | (1024, 2347, 3766) | (51, 67, 83) | (51.00, 68.46, 88.21) |
| 6 | (132, 265, 362) | (2136, 3782, 5091) | (124, 169, 213) | (102.02, 169.00, 209.95) |
| 7 | (66, 98, 152) | (1687, 3008, 4325) | (62, 81, 102) | (60.85, 80.24, 110.79) |
| 8 | (151, 330, 463) | (1524, 2450, 3864) | (138, 192, 241) | (105.30, 188.97, 243.41) |
| 9 | (115, 195, 291) | (1216, 2137, 3161) | (82, 116, 159) | (82.11, 120.01, 161.51) |
| 10 | (35, 53, 71) | (1432, 2560, 3782) | (41, 55, 71) | (41.00, 54.18, 70.33) |
| 11 | (307, 430, 584) | (2592, 4020, 5562) | (168, 252, 367) | (202.69, 252.00, 311.77) |
| 12 | (284, 372, 498) | (2792, 4427, 6163) | (178, 232, 346) | (192.47, 227.20, 279.32) |
| 13 | (121, 236, 370) | (1734, 2660, 4094) | (111, 144, 198) | (91.37, 144.77, 204.57) |
| 14 | (103, 157, 211) | (1426, 2088, 3312) | (78, 103, 148) | (78.00, 100.95, 127.69) |
| 15 | (216, 370, 516) | (1785, 2605, 4042) | (167, 212, 267) | (143.75, 209.96, 268.30) |

Table 6. Comparison of performance of four approaches for example 3

| Criterion | WU | CH-MP | CH-LSE | CN |
|---|---|---|---|---|
| Mean similarity | 12.2013 | 12.5974 | 12.2649 | 12.8657 |
| Distance | 151.7525 | 118.1732 | 125.4342 | 109.6013 |
| Credibility | 0.1529 | 0.2401 | 0.2636 | 0.2079 |
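To see how rule R1 limits the spread of the predicted response, the sketch below evaluates Eq. (32) for observation 5 of Table 5 and compares it with the usual approximate product of positive TFNs, $(X^L b^L, X^C b^C, X^U b^U)$, using the same parameters. Reading the parameters of Eq. (32) as the TFNs $(0.4397, 0.4902, 0.5451)$ and $(0.0089, 0.0089, 0.0116)$ is an interpretation of the rule, and the helper names are hypothetical.

```python
B0 = 5.3583                       # crisp intercept from Eq. (32)
B1 = (0.4397, 0.4902, 0.5451)     # (b^L, b^C, b^U) read off Eq. (32) under R1
B2 = (0.0089, 0.0089, 0.0116)
X1 = (62.0, 86.0, 112.0)          # observation 5 of Table 5
X2 = (1024.0, 2347.0, 3766.0)

def fpc_r1(b, x):
    """Rule R1 (X > 0, B > 0): pair X^L with b^U and X^U with b^L."""
    return (x[0] * b[2], x[1] * b[1], x[2] * b[0])

def standard_product(b, x):
    """Usual approximate product of two positive TFNs."""
    return (x[0] * b[0], x[1] * b[1], x[2] * b[2])

def add(*terms):
    """Component-wise sum of TFNs."""
    return tuple(sum(t[i] for t in terms) for i in range(3))

intercept = (B0, B0, B0)
y_fpc = add(intercept, fpc_r1(B1, X1), fpc_r1(B2, X2))
y_std = add(intercept, standard_product(B1, X1), standard_product(B2, X2))

print("FPC     :", tuple(round(v, 2) for v in y_fpc),
      "spread", round(y_fpc[2] - y_fpc[0], 2))
print("standard:", tuple(round(v, 2) for v in y_std),
      "spread", round(y_std[2] - y_std[0], 2))
```

With these inputs the FPC-based prediction is roughly (51.03, 68.40, 88.12), close to the (51.00, 68.46, 88.21) listed in Table 5; small differences are to be expected because Eq. (32) reports rounded coefficients. The standard product with the same parameters gives roughly (41.73, 68.40, 110.10), so its spread is almost twice as wide.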

Example 4: This example uses a dataset containing 30 observations and four fuzzy explanatory variables (see Table 7). Using this dataset, Chen and Hsueh [18] adopted the least-squares errors (CH-LSE) of the distance as the performance criterion to construct the model. The model is:

$$\hat{Y}_{CH\text{-}LSE} = 0.859 X_1 + 0.207 X_2 + 0.134 X_3 + 0.108 X_4 + (10.794, 12.093, 12.132) \tag{33}$$

Using this dataset, Kelkinnama and Taheri [36] (KT) predetermined the signs of the parameters and then adopted the shape-preserving operations to construct a model based on LAD as:

$$\hat{Y}_{KT} = 14.526 \oplus_W (0.7858, 0.8429, 0.9420) \otimes_W X_1 \oplus_W (-0.3205, -0.1526, -0.1526) \otimes_W X_2 \oplus_W (-0.3362, -0.1709, -0.1077) \otimes_W X_3 \oplus_W (0.0824, 0.0664, 0.0982) \otimes_W X_4 \tag{34}$$

where $\oplus_W$ and $\otimes_W$ are the arithmetic operations on fuzzy numbers based on the weakest T-norm [47].

For the proposed approach with the FPC operator, the fuzzy regression model employing the associated rules obtained using the FPC operator is:

$$\hat{Y}_{CN} = (12.0053, 14.2755, 14.2755) + (0.8713 X_1^L, 0.8544 X_1^C, 0.8522 X_1^U) + (-0.1570 X_2^U, -0.1570 X_2^C, -0.1570 X_2^L) + (-0.1812 X_3^U, -0.1812 X_3^C, -0.1812 X_3^L) + (0.0655 X_4^L, 0.0611 X_4^C, 0.8522 X_4^U) \tag{35}$$

The performance comparison of CH-LSE, KT, and CN is shown in Table 8. Note that two of the parameters in this example are estimated to be negative. Although Chen and Hsueh [18] established a model with crisp parameters to reduce the unnecessary spread of the predicted fuzzy response variable, the parameters of $X_2$ and $X_3$ are positive in their model. The traditional statistical regression method was employed to justify the sign of the parameters by using the central values of all fuzzy observations. The values of the four parameters are 0.85, -0.17, -0.13, and 0.076, respectively. Thus, the use of Chen and Hsueh's model [18] may mislead a decision-maker, making them ignore the influence of the explanatory variables; the KT and proposed models do not have this disadvantage. In addition, the objective of Kelkinnama and Taheri's approach [36] is similar to that of the proposed method. They adopted the weakest T-norm to reduce the spread of the predicted response variable. However, their model needs to predetermine the sign of the TFN parameters in the formulation process. In addition, their model formulation requires a series of comparison processes, making implementation difficult. In contrast, based on the FPC operator, the proposed method provides a set of simple and generalized rules to establish fuzzy regression models with fuzzy parameters. As shown in Table 8, the fuzzy regression model obtained using the proposed FPC operator has the lowest distance and highest similarity among the three models. Even though the proposed method focuses on the reduction of unnecessary information in the model-building process for fuzzy explanatory variables and fuzzy parameters, it is also effective for crisp explanatory variables, negative parameters, or both.

Table 7. Dataset for example 4

| Obs | $X_1$ | $X_2$ | $X_3$ | $X_4$ | $Y$ | $\hat{Y}$ |
|---|---|---|---|---|---|---|
| 1 | (42, 50, 58) | (92, 98, 100) | (62, 71, 82) | (59, 70, 83) | (19, 30, 39) | (21.90, 33.02, 42.55) |
| 2 | (21, 29, 37) | (70, 76, 78) | (52, 61, 72) | (35, 46, 59) | (7, 20, 30) | (7.30, 18.88, 28.61) |
| 3 | (33, 41, 49) | (82, 88, 90) | (64, 73, 84) | (47, 58, 71) | (14, 25, 37) | (14.48, 25.80, 35.43) |
| 4 | (51, 60, 67) | (53, 62, 72) | (70, 79, 85) | (58, 66, 85) | (33, 45, 55) | (33.53, 45.52, 55.00) |
| 5 | (40, 49, 56) | (41, 50, 60) | (66, 75, 81) | (46, 54, 63) | (26, 38, 46) | (25.77, 38.00, 47.04) |
| 6 | (50, 59, 66) | (51, 60, 70) | (76, 85, 91) | (56, 64, 73) | (32, 43, 52) | (31.76, 43.77, 52.72) |
| 7 | (52, 61, 72) | (69, 77, 83) | (80, 85, 93) | (11, 18, 31) | (23, 40, 51) | (28.15, 40.00, 51.99) |
| 8 | (49, 58, 69) | (67, 75, 81) | (77, 82, 90) | (9, 16, 29) | (27, 38, 50) | (26.26, 38.17, 50.19) |
| 9 | (46, 55, 66) | (64, 72, 78) | (74, 79, 87) | (6, 13, 26) | (25, 37, 49) | (24.47, 36.44, 48.48) |
| 10 | (58, 66, 73) | (42, 59, 70) | (31, 39, 48) | (69, 83, 94) | (49, 60, 72) | (47.37, 59.40, 69.40) |
| 11 | (61, 69, 76) | (46, 63, 74) | (41, 49, 58) | (73, 87, 98) | (49, 59, 68) | (47.80, 59.77, 69.73) |
| 12 | (51, 59, 66) | (36, 53, 64) | (31, 39, 48) | (63, 77, 88) | (43, 54, 62) | (41.82, 54.00, 64.05) |
| 13 | (70, 74, 80) | (78, 89, 94) | (58, 70, 83) | (68, 82, 92) | (47, 61, 64) | (47.65, 55.85, 64.71) |
| 14 | (37, 41, 47) | (46, 57, 62) | (46, 58, 71) | (36, 50, 60) | (24, 34, 42) | (24.00, 32.90, 42.04) |
| 15 | (45, 49, 55) | (54, 65, 70) | (54, 66, 79) | (44, 58, 68) | (29, 38, 47) | (28.79, 37.52, 46.59) |
| 16 | (68, 76, 83) | (65, 75, 83) | (29, 37, 48) | (70, 75, 85) | (48, 64, 73) | (54.10, 65.31, 74.18) |
| 17 | (49, 57, 64) | (46, 56, 64) | (10, 18, 29) | (51, 56, 66) | (43, 56, 63) | (42.73, 54.34, 63.38) |
| 18 | (64, 72, 79) | (61, 71, 79) | (25, 33, 44) | (66, 71, 81) | (52, 63, 72) | (51.71, 63.00, 71.90) |
| 19 | (71, 78, 86) | (59, 65, 71) | (71, 82, 93) | (56, 64, 76) | (50, 66, 71) | (49.53, 59.76, 69.58) |
| 20 | (51, 58, 66) | (39, 45, 51) | (51, 62, 73) | (369, 44, 56) | (37, 49, 58) | (37.56, 48.22, 58.21) |
| 21 | (65, 72, 80) | (53, 59, 65) | (65, 76, 87) | (50, 58, 70) | (45, 55, 67) | (45.94, 56.30, 66.17) |
| 22 | (82, 90, 95) | (82, 95, 98) | (69, 80, 88) | (65, 72, 85) | (56, 67, 81) | (56.37, 66.16, 74.49) |
| 23 | (60, 68, 73) | (60, 73, 76) | (47, 58, 66) | (43, 50, 63) | (43, 53, 62) | (43.20, 53.46, 61.98) |
| 24 | (63, 71, 76) | (63, 76, 79) | (50, 61, 69) | (46, 53, 66) | (45, 54, 64) | (45.00, 55.19, 63.69) |
| 25 | (84, 92, 98) | (70, 76, 85) | (68, 78, 84) | (18, 27, 42) | (57, 70, 77) | (57.80, 68.46, 76.77) |
| 26 | (86, 94, 100) | (72, 78, 87) | (70, 80, 86) | (250, 29, 44) | (59, 68, 78) | (59.00, 69.61, 77.91) |
| 27 | (79, 87, 93) | (65, 71, 80) | (63, 73, 79) | (13, 22, 37) | (55, 65, 74) | (54.81, 65.57, 73.93) |
| 28 | (88, 94, 99) | (42, 51, 59) | (21, 30, 41) | (20, 29, 45) | (70, 75, 89) | (73.29, 82.91, 90.70) |
| 29 | (89, 95, 100) | (43, 52, 60) | (22, 31, 42) | (21, 30, 46) | (74, 84, 91) | (73.89, 83.49, 91.26) |
| 30 | (80, 86, 91) | (34, 43, 51) | (13, 22, 33) | (12, 21, 37) | (68, 80, 86) | (68.50, 78.29, 86.15) |

Table 8. Comparison of performance of three approaches for example 4

| Criterion | CH-LSE | KT | CN |
|---|---|---|---|
| Mean similarity | 26.0452 | 26.1807 | 27.0338 |
| Distance | 50.25 | 52.9315 | 36.6143 |
| Credibility | 1.2658 | 1.4380 | 1.3092 |

6. Conclusion

Fuzzy regression models are usually formulated to characterize the relation between response and explanatory variables in fuzzy environments. In the development of fuzzy regression models, some critical problems must be dealt with. One of them is that the uncertainty arising from the model and parameters should be considered in the model due to the nature of fuzzy environments. For this purpose, the parameters of the formulated fuzzy regression models can be represented as fuzzy numbers. However, when fuzzy regression models are formulated using fuzzy observations with fuzzy response and explanatory variables, existing approaches usually perform poorly if the parameters of the models are fuzzy, because the resulting wider spread unnecessarily increases fuzziness. To enhance explanatory power, some approaches adopt crisp parameters in the formulated models to reduce the resulting spread; however, the uncertainty of the model and parameters is then ignored. Another problem with existing approaches is the determination of the sign of the parameters of the formulated models. The signs of parameters and even fuzzy explanatory variables are usually assumed to be positive or known in the existing approaches; however, this is usually not the case in practical applications. An incorrect setting of the sign in the formulation process degrades the performance of the formulated models.

This study proposed the FPC operator for formulating fuzzy linear regression models with fuzzy parameters using fuzzy observations with fuzzy response and explanatory variables. The FPC operator is used in the restrictions of the mathematical programming problem. The proposed mathematical programming problem has several advantages. The LAD is used as the objective function to measure the actual distance between the observed and estimated responses. The use of the FPC operator can reduce the unnecessary information that results from fuzzy observations in the formulation process, increasing model performance in terms of mean similarity, distance, and credibility measures. More importantly, the uncertainty of the model and parameters is retained. In addition, without prior assumptions, the sign of the parameters can be determined in the formulation process so that the formulated models can accurately reflect the influence of fuzzy parameters on the estimated fuzzy response. Comparisons with existing approaches show that the proposed approach performs best or close to best on the three above-mentioned measures, even when crisp explanatory variables are used. To use the proposed approach, some requirements, such as P1, must be met. To deal with this problem, restrictions were added to the mathematical programming model. In future research, a more generalized operator could be developed to deal with more common cases.

Acknowledgement

This work was funded in part by Contract MOST 107-2410-H-006-041-MY2 from the Ministry of Science and Technology, Republic of China.


References

[1] H.C. Wu, Linear regression analysis for fuzzy input and output data using the extension principle, Computers & Mathematics with Applications, 45 (2003) 1849-1859.
[2] H. Tanaka, S. Uejima, K. Asai, Linear-Regression Analysis with Fuzzy Model, IEEE Transactions on Systems, Man, and Cybernetics, 12 (1982) 903-907.
[3] H. Tanaka, Fuzzy Data-Analysis by Possibilistic Linear-Models, Fuzzy Sets and Systems, 24 (1987) 363-375.
[4] H. Tanaka, J. Watada, Possibilistic Linear-Systems and Their Application to the Linear-Regression Model, Fuzzy Sets and Systems, 27 (1988) 275-289.
[5] M. Sakawa, H. Yano, Multiobjective Fuzzy Linear-Regression Analysis for Fuzzy Input Output Data, Fuzzy Sets and Systems, 47 (1992) 173-181.
[6] M. Hojati, C.R. Bector, K. Smimou, A simple method for computation of fuzzy linear regression, European Journal of Operational Research, 166 (2005) 172-184.
[7] D.T. Redden, W.H. Woodall, Properties of Certain Fuzzy Linear-Regression Methods, Fuzzy Sets and Systems, 64 (1994) 361-375.
[8] C. Kao, C.L. Chyu, A fuzzy linear regression model with better explanatory power, Fuzzy Sets and Systems, 126 (2002) 401-409.
[9] B.J. Kim, R.R. Bishu, Evaluation of fuzzy linear regression models by comparing membership functions, Fuzzy Sets and Systems, 100 (1998) 343-352.
[10] F.N. Chen, Y.Z. Chen, J. Zhou, Y.Y. Liu, Optimizing h value for fuzzy linear regression with asymmetric triangular fuzzy coefficients, Engineering Applications of Artificial Intelligence, 47 (2016) 16-24.
[11] H. Hoglund, Fuzzy linear regression-based detection of earnings management, Expert Systems with Applications, 40 (2013) 6166-6172.
[12] T. Hong, P. Wang, Fuzzy interaction regression for short term load forecasting, Fuzzy Optimization and Decision Making, 13 (2013) 91-103.
[13] H.M. Jiang, C.K. Kwong, W.H. Ip, Z.Q. Chen, Chaos-Based Fuzzy Regression Approach to Modeling Customer Satisfaction for Product Design, IEEE Transactions on Fuzzy Systems, 21 (2013) 926-936.
[14] S. Muzzioli, A. Ruggieri, B. De Baets, A comparison of fuzzy regression methods for the estimation of the implied volatility smile function, Fuzzy Sets and Systems, 266 (2015) 131-143.
[15] K.Y. Chan, U. Engelke, Varying Spread Fuzzy Regression for Affective Quality Estimation, IEEE Transactions on Fuzzy Systems, 25 (2017) 594-613.
[16] A. Celmins, Least-Squares Model-Fitting to Fuzzy Vector Data, Fuzzy Sets and Systems, 22 (1987) 245-269.
[17] H.K. Kim, J.H. Yoon, Y. Li, Asymptotic properties of least squares estimation with fuzzy observations, Information Sciences, 178 (2008) 439-451.
[18] L.H. Chen, C.C. Hsueh, Fuzzy Regression Models Using the Least-Squares Method Based on the Concept of Distance, IEEE Transactions on Fuzzy Systems, 17 (2009) 1259-1272.
[19] F. Torkian, M. Arefi, M.G. Akbari, Multivariate Least Squares Regression using Interval-Valued Fuzzy Data and based on Extended Yao-Wu Signed Distance, International Journal of Computational Intelligence Systems, 7 (2014) 172-185.
[20] R. Coppi, P. D'Urso, P. Giordani, A. Santoro, Least squares estimation of a linear regression model with LR fuzzy response, Computational Statistics & Data Analysis, 51 (2006) 267-286.
[21] P. D'Urso, A. Santoro, Goodness of fit and variable selection in the fuzzy multiple linear regression, Fuzzy Sets and Systems, 157 (2006) 2627-2647.
[22] E.N. Nasibov, Fuzzy least squares regression model based on weighted distance between fuzzy numbers, Automatic Control and Computer Sciences, 41 (2007) 10-17.
[23] M.B. Ferraro, R. Coppi, G.G. Rodriguez, A. Colubi, A linear regression model for imprecise response, International Journal of Approximate Reasoning, 51 (2010) 759-770.
[24] K.Y. Chan, H.K. Lam, T.S. Dillon, S.H. Ling, A Stepwise-Based Fuzzy Regression Procedure for Developing Customer Preference Models in New Product Development, IEEE Transactions on Fuzzy Systems, 23 (2015) 1728-1745.
[25] A.R. Arabpour, M. Tata, Estimating the parameters of a fuzzy linear regression model, Iranian Journal of Fuzzy Systems, 5 (2008) 1-19.
[26] Y.Y. Hsu, H.K. Liu, B.L. Wu, On the optimization methods for fully fuzzy regression models, International Journal of Intelligent Technologies and Applied Statistics, 3 (2010) 45-55.
[27] P.T. Chang, E.S. Lee, Fuzzy Least Absolute Deviations Regression and the Conflicting Trends in Fuzzy Parameters, Computers & Mathematics with Applications, 28 (1994) 89-101.
[28] W. Stahel, S. Weisberg, Directions in robust statistics and diagnostics, Springer Science and Business Media, 2012.
[29] J.H. Li, W.Y. Zeng, J.J. Xie, Q. Yin, A new fuzzy regression model based on least absolute deviation, Engineering Applications of Artificial Intelligence, 52 (2016) 54-64.
[30] J. Chachi, S.M. Taheri, A least-absolutes approach to multiple fuzzy regression, in: Bulletin of the ISI 58th world statistics congress of the International Statistical Institute, Dublin, Ireland, 2011, pp. 1-6.
[31] J. Chachi, S.M. Taheri, H.R. Pazhand, M. Geotechnical, An interval-based approach to fuzzy regression for fuzzy input-output data, in: Fuzzy Systems (FUZZ), 2011 IEEE International Conference on, IEEE, Taipei, Taiwan, 2011, pp. 2859-2863.
[32] S.M. Taheri, M. Kelkinnama, Fuzzy Linear Regression Based on Least Absolutes Deviations, Iranian Journal of Fuzzy Systems, 9 (2012) 121-140.
[33] W.Y. Zeng, Q.L. Feng, J.H. Li, Fuzzy least absolute linear regression, Applied Soft Computing, 52 (2017) 1009-1019.
[34] L.H. Chen, W.C. Ko, F.T. Yeh, Approach based on fuzzy goal programing and quality function deployment for new product planning, European Journal of Operational Research, 259 (2017) 654-663.
[35] H. Torabi, J. Behboodian, Fuzzy least-absolutes estimates in linear models, Communications in Statistics - Theory and Methods, 36 (2007) 1935-1944.
[36] M. Kelkinnama, S.M. Taheri, Fuzzy least-absolutes regression using shape preserving operations, Information Sciences, 214 (2012) 105-120.
[37] J. Neter, M.H. Kutner, C.J. Nachtsheim, W. Wasserman, Applied linear statistical models, 4 ed., Irwin, Chicago, 1996.
[38] C. Kao, C.L. Chyu, Least-squares estimates in fuzzy regression analysis, European Journal of Operational Research, 148 (2003) 426-435.
[39] D.H. Hong, J.K. Song, H.Y. Do, Fuzzy least-squares linear regression analysis using shape preserving operations, Information Sciences, 138 (2001) 185-193.
[40] D. Dubois, H. Prade, Fuzzy sets and systems: theory and applications, Academic Press, New York, 1980.
[41] G.J. Klir, B. Yuan, Fuzzy sets and fuzzy logic: theory and applications, Prentice Hall, New Jersey, 1995.
[42] C.P. Pappis, N.I. Karacapilidis, A Comparative-Assessment of Measures of Similarity of Fuzzy Values, Fuzzy Sets and Systems, 56 (1993) 171-174.
[43] L.H. Chen, C.C. Hsueh, A mathematical programming method for formulating a fuzzy regression model based on distance criterion, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 37 (2007) 705-712.
[44] X.L. Liu, Y.Z. Chen, A Systematic Approach to Optimizing h Value for Fuzzy Linear Regression with Symmetric Triangular Fuzzy Numbers, Mathematical Problems in Engineering, 2013 (2013) 1-9.
[45] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, D. Sorensen, LINGO The Modeling Language and Optimizer, LINDO Systems, Chicago, Illinois, 2017.
[46] P. Diamond, Fuzzy Least-Squares, Information Sciences, 46 (1988) 141-157.
[47] D.H. Hong, H.Y. Do, Fuzzy system reliability analysis by the use of Tω (the weakest t-norm) on fuzzy number arithmetic operations, Fuzzy Sets and Systems, 90 (1997) 307-316.