Computer Communications 153 (2020) 538–544
Research on location fusion of spatial geological disaster based on fuzzy SVM

Guobin Chen a, Shijin Li b,∗

a Chongqing Key Laboratory of Spatial Data Mining and Big Data Integration for Ecology and Environment, Rongzhi College of Chongqing Technology and Business University, Chongqing 401320, China
b Academic Affairs Office, Yunnan University of Finance and Economics, Yunnan 650221, China
∗ Corresponding author. E-mail address: [email protected] (S. Li).
https://doi.org/10.1016/j.comcom.2020.02.033
Keywords: Geological disaster; Machine learning; Support vector machine; Fuzzy reasoning; Fuzzy fusion
ABSTRACT

Spatial data fusion plays a key role in improving geological disaster response capacity and disaster tolerance capacity and in reducing human and financial losses. Whether location information can be effectively fused determines whether the occurrence of geological hazards can be monitored and the existing risks effectively identified. In machine learning, the similarity and complementarity between support vector machines and fuzzy inference form the basis of their fusion: a support vector machine can acquire and learn knowledge, while fuzzy inference can reason over knowledge rules. Aiming at the different attributes and dimensions of spatial location data, a fuzzy fusion method based on the support vector machine is proposed, and the related support vector machine theories and models are described. Compared with three other algorithms on the GPLUS, OKLAHOMA and UNC datasets, the proposed support vector machine fusion algorithm (SVM-F) shows a clear advantage in RMSE and running time. It also performs well on F1 on all three datasets, which shows that the algorithm is effective.
1. Introduction

On the whole, research on information fusion technology can be summarized along four lines: the levels of information fusion, the representation and transformation of information, the structure of information fusion, and the methods of information fusion. At present, most information fusion technologies at home and abroad only solve the optimal fusion estimation of multi-sensor data in specific application fields, so there is still no universal theoretical framework for information fusion, nor an effective generalized fusion model and algorithm. Common stochastic and least-squares fusion methods include weighted fusion, Kalman filtering, Bayesian estimation, statistical decision theory and Dempster–Shafer evidence theory; intelligent fusion methods include production rules with confidence factors, support vector machines, fuzzy logic, Bayesian networks and neural networks.

For fuzzy support vector machines (FSVM), how to determine the membership value, that is, the weight of each sample, is a key problem, and many researchers have studied it effectively. For example, [1,2] take a linear function of the distance between a sample point and its class center as the membership function and propose a class-center-based method for determining membership. However, such membership functions often depend heavily on the geometry of the training sample set, which easily reduces the membership degree of the support vectors.
To reduce the dependence of the membership function on the geometric shape of the training sample set, [3] takes a linear function of the distance between a sample point and the within-class hyperplane as the membership function and proposes a membership determination method based on the within-class hyperplane distance, which can increase the membership degree of the support vectors. In [4], a new kernel membership function is built from a linear function of the distances to the K sample points nearest to a given sample. In [5], a new optimal classifier is established by assigning dual membership degrees to each sample. The edge-removed fuzzy support vector machine [6,7] divides the training set into two subsets according to their geometry, discards the subset that does not contain support vectors, and trains only on the subset that does; obviously, this method can accidentally remove genuine support vectors, which reduces classification accuracy. To address the long training time and low training efficiency of fuzzy support vector machines, [8] proposes an improved strategy that increases the penalty on easily misclassified samples, giving larger membership to edge data and smaller membership to data near the class center, with good results in practice. Building on [8], [9] assigns smaller membership to data far from the classification hyperplane, which cannot become support vectors; this greatly reduces the size of the training sample set and alleviates the long training time and low training efficiency of the fuzzy support vector machine to some extent.
Fig. 1. SVM fuzzy fusion structure.
Intelligent information fusion methods that incorporate artificial intelligence techniques are attracting more and more attention because of their short training time, high training efficiency and high classification accuracy. It should be pointed out, however, that the theory and technology of information fusion combining support vector machines and fuzzy reasoning are still being developed and perfected.

2. Fuzzy fusion structure analysis of SVM

The proposed SVM fuzzy fusion structure is shown in Fig. 1. It takes three data sources: geographic data, POI data and user attribute data, which are processed in three steps. In the first step, vectors and attribute data are extracted from each of the three data sources; the extracted attribute data are then decomposed so that features with similar attributes are grouped together. In the second step, the decomposed attributes of each data source are fuzzified; the purpose of fuzzification is to map attribute values into classes for better classification. In the third step, the fuzzified data of the different attribute data sources are processed separately and then fused.

The three data types are representative. Geographic data are characterized by spatial location and attributes, and carry two kinds of meaning: first, the geographic location of the object itself, usually given in some geographic coordinate system; second, the spatial relationships among the locations of multiple objects, such as distance, adjacency, connection and containment. Geospatial data must also include qualitative or quantitative indicators that describe the natural or human attributes of the features, referred to as attribute feature data or attribute data. POI data cover various check-in data; a POI (Point of Interest) is point-like spatial data obtained by abstracting a spatial entity on the basis of its geographic location while ignoring its volume, area, appearance and other physical information. Each POI corresponds to a spatial entity and generally records the longitude, latitude and other basic information of that entity; the information a POI carries varies with its type. User attribute data classify and statistically analyze users according to their own attributes; their main value is to enrich the dimensions of the user portrait and make insights into user behavior more detailed, so that user behavior becomes easier to analyze and predict.

Denote the three sources $(D_1, D_2, D_3)$. After processing, they are converted into data vectors, and the three vectors are preprocessed respectively. The preprocessed data enter the local fusion centers $(F_1, F_2, \dots, F_n)$, and the outputs $(l_1, l_2, \dots, l_n)$ are local decisions based on different characteristics of the detected object. Conditional attributes are represented by the set of local decisions in the detection system, that is, the local decision results; the decision attributes are represented by the set of final decision quantities output from the fusion center, as shown in Table 1 (see Fig. 2). A code sketch of the three-step pipeline is given after Table 1.
Fig. 2. Structure of SVM.
Table 1. Fusion center information table.

| U | $F_1$ | $F_2$ | ... | $F_n$ | L |
|---|-------|-------|-----|-------|---|
| 1 | $u_{1j}^1$ | $u_{2j}^1$ | ... | $u_{nj}^1$ | $l_{fj}^1$ |
| 2 | $u_{1j}^2$ | $u_{2j}^2$ | ... | $u_{nj}^2$ | $l_{fj}^2$ |
| ... | ... | ... | ... | ... | ... |
| $k$ | $u_{1j}^k$ | $u_{2j}^k$ | ... | $u_{nj}^k$ | $l_{fj}^k$ |
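To make the three-step pipeline of Fig. 1 concrete, the following is a minimal sketch under stated assumptions: the record dictionaries for the three sources and the min-max fuzzification rule are illustrative stand-ins, not the system used in the experiments.

```python
import numpy as np

def extract_attributes(records, keys):
    """Step 1: pull the attribute columns of one data source into a numeric matrix."""
    return np.array([[float(r[k]) for k in keys] for r in records])

def fuzzify(X):
    """Step 2: min-max fuzzification, mapping each attribute into [0, 1] memberships."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)

def local_decision(F):
    """Local fusion center F_k: here simply the mean membership per sample."""
    return F.mean(axis=1)

# Hypothetical sources D1 (geographic), D2 (POI), D3 (user attributes), two samples each.
D1 = [{"lat": 29.5, "lon": 106.5}, {"lat": 29.6, "lon": 106.4}]
D2 = [{"poi_density": 12.0}, {"poi_density": 48.0}]
D3 = [{"age": 30.0, "checkins": 5.0}, {"age": 41.0, "checkins": 22.0}]

local_decisions = [local_decision(fuzzify(extract_attributes(D, list(D[0]))))
                   for D in (D1, D2, D3)]
# Step 3: the fusion center combines the local decisions l_1, ..., l_n.
fused = np.mean(local_decisions, axis=0)
print(fused)
```

In the paper the final fusion step is performed by the fuzzy SVM developed in Section 3; the simple mean used here only marks where that model plugs in.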
In Fig. 2, circles and squares represent the two classes of samples. The thin solid line through the middle is the classification line; the lines (hyperplanes, in higher dimensions) passing through the samples of each class nearest to it and parallel to it are denoted $H_1$ and $H_2$, and the distance between them is called the classification margin. The classification line is defined by

$$(w \cdot x) + b = 0 \tag{1}$$

where $x$ is the input vector, $w$ is the weight vector of the input, and $b$ is the threshold; Eq. (1) describes a linear decision boundary. To guarantee valid classification with the maximum margin $2/\|w\|$, the problem is cast as a convex quadratic program:

$$
\begin{cases}
\min\ \frac{1}{2}\|w\|^2 \\
\text{s.t.}\ y_i\left((w \cdot x_i) + b\right) \ge 1
\end{cases} \tag{2}
$$

Among all separating lines, the optimal classification line $H$ has the largest margin, that is, the smallest $\|w\|^2$, and the training samples lying on $H_1$ and $H_2$ are called support vectors. Using the Lagrange optimization method, the Lagrange polynomial built from Eq. (2) is maximized:

$$W(a) = \sum_{i=1}^{n} a_i - \frac{1}{2}\sum_{i,j=1}^{n} a_i a_j y_i y_j (x_i \cdot x_j) \tag{3}$$

where $a_i \ge 0$ and $\sum_{i=1}^{n} y_i a_i = 0$; Eq. (3) is optimized by quadratic programming. Assuming Eq. (3) attains its maximum at $a^0 = (a_1^0, a_2^0, \dots, a_n^0)$ and the optimal hyperplane is described by $(w_0, b_0)$, then $w_0$ is given by

$$w_0 = \sum_{i=1}^{n} a_i^0 y_i x_i \tag{4}$$
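As an illustration of Eqs. (1)-(4), the sketch below fits a linear SVM on made-up two-class data with scikit-learn and recovers $w_0$ from the support vectors via Eq. (4); a large penalty $C$ is used to approximate the hard-margin problem (2).

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (labels y in {-1, +1}).
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.5],
              [6.0, 1.0], [7.0, 2.0], [8.0, 0.5]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C ~ hard margin of Eq. (2)

# Eq. (4): w0 = sum_i a_i^0 * y_i * x_i over the support vectors.
# dual_coef_ already stores the products a_i^0 * y_i for each support vector.
w0 = clf.dual_coef_ @ clf.support_vectors_
print(w0, clf.coef_)           # for a linear kernel the two vectors agree
print(2 / np.linalg.norm(w0))  # classification margin 2 / ||w||
```

That `clf.coef_` matches the hand-computed `w0` is a quick check that Eq. (4) is exactly what the solver stores.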
If the constraint of Eq. (4) is imposed, the decision function of the optimal classification is

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{n} a_i^0 y_i (x_i \cdot x) + b_0\right) \tag{5}$$

Introducing the Lagrangian of problem (2),

$$L(w, b, a) = \frac{1}{2}(w \cdot w) - \sum_{i=1}^{n} a_i\left[y_i\left((w \cdot x_i) + b\right) - 1\right] \tag{6}$$

in which $a$ collects the Lagrange coefficients. Differentiating with respect to $w$ and $b$ yields the quadratic programming problem

$$
\begin{cases}
\min\ \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j y_i y_j \left[\phi(x_i) \cdot \phi(x_j)\right] - \sum_{i=1}^{n} a_i \\
\text{s.t.}\ \sum_{i=1}^{n} y_i a_i = 0, \quad a_i \ge 0,\ (i = 1, 2, \dots, n)
\end{cases} \tag{7}
$$

in which $\phi(x)$ maps the input space $R^d$ into a high-dimensional feature space $F$ by means of a nonlinear mapping:

$$\phi: R^d \to F, \quad x \mapsto \phi(x) \tag{8}$$

and the kernel function is $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$. To improve the classification effect of the hyperplane, slack variables $\xi_i, \xi_i^*\ (\xi_i \ge 0,\ \xi_i^* \ge 0,\ i = 1, 2, \dots, l)$ and a penalty coefficient $C$ are introduced, and the above function becomes

$$
\begin{aligned}
L(w, b, \xi, a^{(*)}) = {} & \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}(\xi_i + \xi_i^*) - \sum_{i=1}^{l} a_i\left(\varepsilon + \xi_i - y_i + (w \cdot \phi(x_i)) + b\right) \\
& - \sum_{i=1}^{l} a_i^*\left(\varepsilon + \xi_i^* + y_i - (w \cdot \phi(x_i)) - b\right) - \sum_{i=1}^{l}(\beta_i \xi_i + \beta_i^* \xi_i^*)
\end{aligned} \tag{9}
$$

where $a_i, a_i^*, \beta_i, \beta_i^* \ge 0,\ i = 1, 2, \dots, l$ are Lagrange multipliers. Eq. (9) is represented by its minimum value:

$$L(a^{(*)}) = \min_{w, b, \xi} L(w, b, \xi, a^{(*)}) \tag{10}$$

Setting the derivatives of Eq. (10) with respect to $w$, $b$ and $\xi$ to zero gives

$$
\begin{cases}
\dfrac{\partial L}{\partial w} = 0 \Rightarrow w = \sum_{i=1}^{l} (a_i - a_i^*)\,\phi(x_i) \\
\dfrac{\partial L}{\partial b} = 0 \Rightarrow \sum_{i=1}^{l} (a_i - a_i^*) = 0 \\
\dfrac{\partial L}{\partial \xi_i} = 0 \Rightarrow C - a_i - \beta_i = 0 \\
\dfrac{\partial L}{\partial \xi_i^*} = 0 \Rightarrow C - a_i^* - \beta_i^* = 0
\end{cases} \tag{11}
$$

Substituting Eq. (11) into Eq. (10) (the conditions $C - a_i - \beta_i = 0$ with $\beta_i \ge 0$ give the box constraints $0 \le a_i \le C$) yields, in the form of Eq. (2), the dual

$$
\begin{cases}
\min\ \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} a_i a_j y_i y_j K(x_i, x_j) - \sum_{i=1}^{l} a_i \\
\text{s.t.}\ \sum_{i=1}^{l} y_i a_i = 0, \quad 0 \le a_i \le C,\ (i = 1, 2, \dots, l)
\end{cases} \tag{12}
$$

which, for the $\varepsilon$-insensitive formulation with slack variables, is minimized as

$$
\begin{cases}
\min_{a, a^*}\ \frac{1}{2}\sum_{i,j=1}^{l}(a_i - a_i^*)(a_j - a_j^*)K(x_i, x_j) + \varepsilon\sum_{i=1}^{l}(a_i + a_i^*) - \sum_{i=1}^{l}(a_i - a_i^*)y_i \\
\text{s.t.}\ 0 \le a_i, a_i^* \le C, \quad \sum_{i=1}^{l}(a_i - a_i^*) = 0
\end{cases} \tag{13}
$$

If the optimal solution of the dual problem is $\alpha^*$, the normal vector of the optimal classification hyperplane is

$$w^* = \sum_{i=1}^{l} y_i \alpha_i^* \phi(x_i) \tag{14}$$

For a component $\alpha_j^*$ of $\alpha^*$ with $0 < \alpha_j^* < C$, the offset is

$$b^* = y_j - \sum_{i=1}^{l} y_i \alpha_i^* K(x_i, x_j) \tag{15}$$

which gives the decision function of the SVM:

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{l} y_i \alpha_i^* K(x_i, x) + b^*\right) \tag{16}$$
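Eq. (16) can be evaluated directly from a trained kernel machine. Below is a minimal sketch (toy data, RBF kernel assumed) that reproduces the decision function by hand from the stored dual coefficients and checks it against the library's own prediction.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="rbf", gamma=0.5, C=10.0).fit(X, y)

def decision(x_new):
    # K(x_i, x) for every support vector x_i
    K = rbf_kernel(clf.support_vectors_, x_new.reshape(1, -1), gamma=0.5)
    # dual_coef_ holds y_i * alpha_i^*; intercept_ is b^*
    return np.sign(clf.dual_coef_ @ K + clf.intercept_)  # Eq. (16)

x_new = np.array([0.1, 0.0])
print(decision(x_new), clf.predict([x_new]))  # both give the same class
```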
3. Inference fusion model based on fuzzy SVM

The fuzzy support vector machine (FSVM) can enhance the noise immunity of the SVM. The main idea is to introduce triangular fuzzy numbers to construct a new training set, the fuzzy training set, to fuzzify the classification problem on the basis of a possibility measure, and to assign a different membership degree to each training sample point. In this way, by assigning minimal weights to noise points or outliers, their influence can be eliminated or reduced. Following this idea, a triangular fuzzy number $\tau$ is introduced, with membership function

$$
\mu_\tau(x) =
\begin{cases}
\dfrac{x - r_1}{r_2 - r_1}, & r_1 \le x \le r_2 \\[4pt]
\dfrac{x - r_3}{r_2 - r_3}, & r_2 < x \le r_3 \\[4pt]
0, & \text{otherwise}
\end{cases} \tag{17}
$$

in which $r_1 \le r_2 \le r_3$, $r_j \in R\ (j = 1, 2, 3)$; $\tau$ is called a triangular fuzzy number, denoted $\tau = (r_1, r_2, r_3)$, where $r_2$ is the center of $\tau$ and $r_1, r_3$ are its left and right endpoints. The center embodies the central position of the triangular fuzzy number.

The fuzzy training points are rearranged so that the fuzzy positive class points come first and the fuzzy negative class points come last, giving a fuzzy training set of the form

$$S = \{(x_1, \tilde{y}_1), \dots, (x_p, \tilde{y}_p), (x_{p+1}, \tilde{y}_{p+1}), \dots, (x_l, \tilde{y}_l)\} \tag{18}$$

in which $(x_i, \tilde{y}_i)$, $i = 1, \dots, p$, are the fuzzy positive class points and $(x_j, \tilde{y}_j)$, $j = p + 1, \dots, l$, are the fuzzy negative class points. For a given confidence level $\lambda\ (0 < \lambda < 1)$, if there exist $w \in R^n$, $b \in R$ such that $\operatorname{Pos}\{\tilde{y}_i((w \cdot x_i) + b) \ge 1\} \ge \lambda$ holds for all training points, the fuzzy training set is fuzzy linearly separable at the confidence level, which is equivalent to

$$
\begin{cases}
\left((1 - \lambda)r_{t3} + \lambda r_{t2}\right)\left((w \cdot x_t) + b\right) \ge 1, & t = 1, \dots, p \\
\left((1 - \lambda)r_{i1} + \lambda r_{i2}\right)\left((w \cdot x_i) + b\right) \ge 1, & i = p + 1, \dots, l
\end{cases} \tag{19}
$$

Firstly, the fuzzy linearly separable problem is analyzed. Under the confidence level $\lambda\ (0 < \lambda < 1)$, it is transformed into a fuzzy chance-constrained programming problem with $(w, b)^T$ as the decision variable:

$$
\begin{cases}
\min\ \frac{1}{2}\|w\|^2 \\
\text{s.t.}\ \operatorname{Pos}\{\tilde{y}_i((w \cdot x_i) + b) \ge 1\} \ge \lambda, \quad i = 1, \dots, l
\end{cases} \tag{20}
$$

in which $\tilde{y}_i\ (i = 1, \dots, l)$ is a triangular fuzzy number and $\operatorname{Pos}\{\cdot\}$ is the possibility measure of the fuzzy event $\{\cdot\}$.
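Eq. (17) translates directly into code; the endpoints used below are example values only.

```python
def mu_triangular(x, r1, r2, r3):
    """Membership of x in the triangular fuzzy number tau = (r1, r2, r3), Eq. (17)."""
    if r1 <= x <= r2:
        return (x - r1) / (r2 - r1) if r2 > r1 else 1.0
    if r2 < x <= r3:
        return (x - r3) / (r2 - r3) if r3 > r2 else 1.0
    return 0.0

# tau = (0, 1, 3): membership rises on [0, 1] and falls on [1, 3].
for x in (0.0, 0.5, 1.0, 2.0, 3.0, 4.0):
    print(x, mu_triangular(x, 0.0, 1.0, 3.0))
```

In an FSVM implementation the resulting memberships are typically supplied as per-sample weights during training (for example through the `sample_weight` argument of scikit-learn's `SVC.fit`), so that noise points with small membership contribute little to the solution.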
The explicit equivalent programming of the fuzzy chance-constrained programming problem (20) is

$$
\begin{cases}
\min_{w,b}\ \frac{1}{2}\|w\|^2 \\
\text{s.t.}\ \left((1 - \lambda)r_{t3} + \lambda r_{t2}\right)\left((w \cdot x_t) + b\right) \ge 1, & t = 1, \dots, p \\
\quad\ \ \left((1 - \lambda)r_{i1} + \lambda r_{i2}\right)\left((w \cdot x_i) + b\right) \ge 1, & i = p + 1, \dots, l
\end{cases} \tag{21}
$$
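The crisp equivalent (21) is an ordinary convex quadratic program and can be handed to a generic solver. The sketch below uses CVXPY with made-up triangular labels; the coefficient vector `c` collects $(1-\lambda)r_{t3}+\lambda r_{t2}$ for the fuzzy positive points and $(1-\lambda)r_{i1}+\lambda r_{i2}$ for the fuzzy negative ones.

```python
import numpy as np
import cvxpy as cp

# Toy data: 2 fuzzy positive points, then 2 fuzzy negative points (made up).
X = np.array([[2.0, 2.0], [3.0, 2.5], [-2.0, -1.5], [-3.0, -2.0]])
lam = 0.8
# Triangular labels (r1, r2, r3) around +1 for positives, -1 for negatives.
R = np.array([[0.8, 1.0, 1.2], [0.7, 1.0, 1.3],
              [-1.2, -1.0, -0.8], [-1.3, -1.0, -0.7]])
# Coefficients of Eq. (21): right endpoint for positives, left for negatives.
c = np.array([(1 - lam) * R[0, 2] + lam * R[0, 1],
              (1 - lam) * R[1, 2] + lam * R[1, 1],
              (1 - lam) * R[2, 0] + lam * R[2, 1],
              (1 - lam) * R[3, 0] + lam * R[3, 1]])

w = cp.Variable(2)
b = cp.Variable()
constraints = [cp.multiply(c, X @ w + b) >= 1]   # the l constraints of Eq. (21)
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)), constraints)
prob.solve()
print(w.value, b.value)
```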
An optimal solution of the convex quadratic program (21) exists. The Lagrange function is introduced to solve the dual programming:

$$
\begin{aligned}
L(w, b, \alpha, \beta) = {} & \frac{1}{2}\|w\|^2 - \sum_{t=1}^{p} \alpha_t\left[\left((1 - \lambda)r_{t3} + \lambda r_{t2}\right)\left((w \cdot x_t) + b\right) - 1\right] \\
& - \sum_{i=p+1}^{l} \beta_i\left[\left((1 - \lambda)r_{i1} + \lambda r_{i2}\right)\left((w \cdot x_i) + b\right) - 1\right]
\end{aligned} \tag{22}
$$

in which $\alpha = (\alpha_1, \dots, \alpha_p)^T \in R_+^p$ and $\beta = (\beta_{p+1}, \dots, \beta_l)^T$; $\alpha_t, \beta_i$ are Lagrange multipliers. First, the minimum of the Lagrange function over $w, b$ is found from the extremal conditions:

$$\nabla_w L(w, b, \alpha, \beta) = 0, \quad \nabla_b L(w, b, \alpha, \beta) = 0 \tag{23}$$
which give

$$w = \sum_{t=1}^{p} \alpha_t\left((1 - \lambda)r_{t3} + \lambda r_{t2}\right)x_t + \sum_{i=p+1}^{l} \beta_i\left((1 - \lambda)r_{i1} + \lambda r_{i2}\right)x_i \tag{24}$$

$$\sum_{t=1}^{p} \alpha_t\left((1 - \lambda)r_{t3} + \lambda r_{t2}\right) + \sum_{i=p+1}^{l} \beta_i\left((1 - \lambda)r_{i1} + \lambda r_{i2}\right) = 0 \tag{25}$$

Using Eq. (25), the maximization of the objective function is converted into a minimization, and the dual programming is obtained:

$$
\begin{cases}
\min_{\alpha,\beta}\ \frac{1}{2}(A + 2B + D) - \left(\sum_{t=1}^{p} \alpha_t + \sum_{i=p+1}^{l} \beta_i\right) \\
\text{s.t.}\ \sum_{t=1}^{p} \alpha_t\left((1 - \lambda)r_{t3} + \lambda r_{t2}\right) + \sum_{i=p+1}^{l} \beta_i\left((1 - \lambda)r_{i1} + \lambda r_{i2}\right) = 0 \\
\quad\ \ \alpha_t \ge 0,\ t = 1, \dots, p; \quad \beta_i \ge 0,\ i = p + 1, \dots, l
\end{cases} \tag{26}
$$

in which

$$
\begin{cases}
A = \sum_{t=1}^{p}\sum_{s=1}^{p} \alpha_t \alpha_s \left((1 - \lambda)r_{t3} + \lambda r_{t2}\right)\left((1 - \lambda)r_{s3} + \lambda r_{s2}\right)(x_t \cdot x_s) \\
B = \sum_{t=1}^{p}\sum_{i=p+1}^{l} \alpha_t \beta_i \left((1 - \lambda)r_{t3} + \lambda r_{t2}\right)\left((1 - \lambda)r_{i1} + \lambda r_{i2}\right)(x_t \cdot x_i) \\
D = \sum_{i=p+1}^{l}\sum_{q=p+1}^{l} \beta_i \beta_q \left((1 - \lambda)r_{i1} + \lambda r_{i2}\right)\left((1 - \lambda)r_{q1} + \lambda r_{q2}\right)(x_i \cdot x_q)
\end{cases} \tag{27}
$$

Eq. (26) is a convex quadratic programming problem whose optimal solution is

$$(\alpha^*, \beta^*) = (\alpha_1^*, \dots, \alpha_p^*, \beta_{p+1}^*, \dots, \beta_l^*)^T \tag{28}$$

from which

$$w^* = \sum_{t=1}^{p} \alpha_t^*\left((1 - \lambda)r_{t3} + \lambda r_{t2}\right)x_t + \sum_{i=p+1}^{l} \beta_i^*\left((1 - \lambda)r_{i1} + \lambda r_{i2}\right)x_i \tag{29}$$

$$b^* = \frac{1}{(1 - \lambda)r_{s3} + \lambda r_{s2}} - \sum_{t=1}^{p} \alpha_t^*\left((1 - \lambda)r_{t3} + \lambda r_{t2}\right)(x_t \cdot x_s) - \sum_{i=p+1}^{l} \beta_i^*\left((1 - \lambda)r_{i1} + \lambda r_{i2}\right)(x_i \cdot x_s), \quad s \in \{s \mid \alpha_s^* > 0\} \tag{30}$$

The optimal classification surface is $(w^* \cdot x) + b^* = 0$ with $g(x) = (w^* \cdot x) + b^*$, and the optimal classification function is

$$f(x) = \operatorname{sgn}(g(x)) = \operatorname{sgn}\left((w^* \cdot x) + b^*\right), \quad x \in R^n \tag{31}$$

4. Experimental analyses

In the Internet of Things, geographic location information can be collected as sensor data [10,11]. In this paper, three real network datasets are used for simulation; their statistical indicators are shown in Table 2.

Table 2. Raw dataset information.

| Data set | Nodes | Edges | Clustering coefficient |
|----------|-------|-------|------------------------|
| GPLUS | 4450 | 1,473,709 | 0.468 |
| OKLAHOMA | 17,425 | 892,528 | 0.230 |
| UNC | 18,163 | 766,800 | 0.202 |

The datasets are described below. GPLUS [12] is an ego network of a Google Plus user, in which nodes represent ''friends'' of the user and edges represent social connections; it contains 4450 nodes and 1,473,709 edges. Each node carries six user-described attributes: gender, organization, occupation, last name, region and university. Gender is used as the category label in this paper. OKLAHOMA and UNC [13] are two of the 100 American-university Facebook social networks collected in [13]; OKLAHOMA has 17,425 nodes and 892,528 edges, and UNC has 18,163 nodes and 766,800 edges. Each node carries seven user-described attributes: student/teacher status tag, gender, major, minor, address, grade and high school. The status tag is used as the category label in this paper.

The attribute decomposition principle is as follows: attributes that express personal information, such as name and gender, form one group; location information, such as residential address and latitude and longitude, forms another group; and building information, such as a school or a shopping mall, forms a third group. Attributes with the same properties are grouped into one class, while uncorrelated attributes are put into different classes. Some attributes may be incomplete, such as a missing address; when a key attribute is missing, the entire node record is deleted (a code sketch of this preprocessing is given below). By preprocessing the datasets in this way, the information in Table 3 is obtained.

Table 3. Grouping information after dataset decomposition.

| Data set | Groups | Total group | Edges | Clustering coefficient |
|----------|--------|-------------|-------|------------------------|
| GPLUS | 3 | 11,769 | 1,365,846 | 0.476 |
| OKLAHOMA | 4 | 64,980 | 763,684 | 0.257 |
| UNC | 4 | 69,700 | 687,215 | 0.265 |

Next, the POI data information is entered; it includes name, category, longitude and latitude. POI data are point data representing real geographic entities such as buildings, shops, and even whole areas. Besides the attributes listed above, the POI data studied in this paper can carry richer attribute information such as house number, zip code, address and telephone number. The POI data are grouped and compared with the actual data; the positions produced by the fusion algorithm are shown in Fig. 3, where locations are denoted by the corresponding numbers. The difference between the fused position information and the actual positions is small.
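The grouping and cleaning rules just described can be sketched as a small preprocessing step; the column names, groupings and the choice of "address" as the key attribute below are hypothetical, for illustration only.

```python
import pandas as pd

# Hypothetical node attribute table for one dataset.
nodes = pd.DataFrame({
    "name":    ["a", "b", "c"],
    "gender":  ["f", "m", None],
    "address": ["campus north", None, "campus south"],
    "lat":     [35.2, 35.3, 35.1],
    "lon":     [-97.4, -97.5, -97.4],
    "school":  ["eng", "arts", "eng"],
})

# Group correlated attributes into classes; unrelated attributes go to separate groups.
groups = {
    "personal": ["name", "gender"],
    "location": ["address", "lat", "lon"],
    "building": ["school"],
}

# Drop a node entirely when a key attribute is missing (here: address).
clean = nodes.dropna(subset=["address"])
grouped = {g: clean[cols] for g, cols in groups.items()}
print({g: df.shape for g, df in grouped.items()})
```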
Fig. 3. Comparison of real and fusion positions.
Table 4. RMSE comparison of different algorithms on different datasets.

| Data set | Position num | TADW | UPP-SNE | SNE | SVM-F |
|----------|--------------|------|---------|-----|-------|
| GPLUS | 20 | 1.213 | 1.276 | 1.265 | 1.109 |
| GPLUS | 40 | 1.121 | 1.187 | 1.173 | 1.043 |
| GPLUS | 60 | 1.023 | 1.112 | 1.087 | 1.011 |
| OKLAHOMA | 20 | 1.209 | 1.276 | 1.298 | 1.176 |
| OKLAHOMA | 40 | 1.182 | 1.231 | 1.217 | 1.087 |
| OKLAHOMA | 60 | 1.078 | 1.175 | 1.165 | 1.043 |
| UNC | 20 | 1.289 | 1.298 | 1.293 | 1.169 |
| UNC | 40 | 1.243 | 1.212 | 1.254 | 1.132 |
| UNC | 60 | 1.217 | 1.198 | 1.213 | 1.112 |

Fig. 4. Time comparison of different algorithms on the GPLUS dataset.
To show the effectiveness and accuracy of the proposed fusion algorithm, it is compared with the following algorithms:

TADW [14]: a fusion algorithm in the form of matrix factorization, which jointly factorizes the network together with the node attribute information matrix to obtain the fusion representation vectors of the nodes. It serves as the matrix-factorization-based baseline.

UPP-SNE [15]: based on the DeepWalk algorithm, it improves the SkipGram model and exchanges knowledge between the two kinds of information through vector inner products. It serves as the random-walk-based baseline.

SNE [16]: a fusion algorithm based on a deep neural network, which uses the nonlinear feature extraction ability of the deep network to mine the complex nonlinear relationships between nodes and extract the fused node representation vectors. It serves as the deep-neural-network baseline.

The root mean square error (RMSE) is used as the evaluation index: the lower the RMSE, the higher the accuracy. The number of predicted positions in each dataset is set to 20, 40 and 60. As Table 4 shows, SVM-F attains the lowest RMSE on all three datasets and for all three position counts, which is why SVM-F is the method recommended in this paper.

The running times of the four algorithms on the three datasets are compared in Figs. 4-6. On the GPLUS dataset the four algorithms differ greatly; TADW is especially slow, while SNE and SVM-F are close in time, with SVM-F slightly better. On the OKLAHOMA dataset, over the first 3000 nodes the differences among TADW, UPP-SNE and SNE are small but fluctuate considerably, which shows that the stability of these three algorithms on OKLAHOMA is poor. On the UNC dataset, the four algorithms are close over the first 4000 nodes, after which UPP-SNE and SNE alternate in time and the difference between them is not obvious. Comparing the three datasets shows that the proposed SVM-F algorithm has the best time efficiency, while the TADW algorithm has the worst.
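As a side note to Table 4, RMSE here is the usual root mean square error between fused and actual positions; a minimal sketch over hypothetical 2-D position estimates:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between actual and fused positions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Hypothetical ground-truth positions and noisy fused estimates (20 locations).
true_pos = np.random.rand(20, 2)
fused_pos = true_pos + 0.05 * np.random.randn(20, 2)
print(rmse(true_pos, fused_pos))
```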
Fig. 5. Time comparison of different algorithms on the OKLAHOMA dataset.
Fig. 6. Time comparison of different algorithms on the UNC dataset.
In the multi-class node classification experiment, for any position $A$, $TP(A)$, $FP(A)$ and $FN(A)$ denote the numbers of true positives, false positives and false negatives, respectively, and $C$ is the collection of all location lists. $F_1$ is defined as:

$$P_r = \frac{\sum_{A \in C} TP(A)}{\sum_{A \in C} \left(TP(A) + FP(A)\right)} \tag{32}$$

$$R = \frac{\sum_{A \in C} TP(A)}{\sum_{A \in C} \left(TP(A) + FN(A)\right)} \tag{33}$$

$$F_1 = \frac{2 \cdot P_r \cdot R}{P_r + R} \tag{34}$$
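Eqs. (32)-(34) pool the counts over all positions in $C$ before forming precision and recall (a micro-average); a direct implementation with hypothetical counts:

```python
def micro_f1(stats):
    """stats: {position A: (TP, FP, FN)}; returns (Pr, R, F1) per Eqs. (32)-(34)."""
    tp = sum(s[0] for s in stats.values())
    fp = sum(s[1] for s in stats.values())
    fn = sum(s[2] for s in stats.values())
    pr = tp / (tp + fp)
    r = tp / (tp + fn)
    return pr, r, 2 * pr * r / (pr + r)

# Hypothetical counts for three positions.
print(micro_f1({"A1": (50, 5, 10), "A2": (30, 8, 4), "A3": (20, 2, 6)}))
```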
We further analyze the influence of hyper-parameters on the performance of the SVM-F algorithm. As shown in Figs. 7 and 8, we test the trend of the node classification performance index $F_1$ under different parameter choices. During the experiments, all parameters except the one under test are set to their default values, and the proportion of classifier training data is set to 10%.

First, the comparison in Fig. 7 shows that node classification performance gradually improves as the representation vector dimension increases; once the dimension grows beyond a certain point, performance increases only slightly and tends to stabilize. At the same time, we find that in the OKLAHOMA and UNC networks classification performance improves rapidly with the dimension, because in sparse networks the node structure features are relatively dispersed and more dimensions are needed to represent the similarities and differences between nodes. Therefore, the default dimension is set to 128 in the simulations, which ensures good performance while using few feature dimensions.
Fig. 7. SVM-F dimension analysis in dataset.
Secondly, by adjusting the attribute weight, we further study its impact on node classification performance. When the weight is small, classification performance increases as it grows; as Fig. 8 shows, a value around 0.3 achieves good performance on all three datasets. When the weight becomes large, classification performance gradually declines, because the representations then tend toward the attribute information alone, weakening the influence of the structural feature information between nodes on the representation vectors. Therefore, the default weight is set to 0.3 in the simulations.
Fig. 8. SVM-F attribute weight analysis in dataset.
5. Conclusion

The application of SVM-F to fused data can improve the fusion effect: it performs well in running time and RMSE and also has an advantage in F1. The purpose of research on fusion algorithms is to achieve the best performance on the datasets and to improve the analysis effect. In the process of information fusion between support vector machines and fuzzy inference, whether a sufficient number of sample data can be obtained is an important premise of any fusion method based on the two. Acquiring sample data for SVM and fuzzy-reasoning information fusion is in fact a process of extracting feature information from complex systems; strengthening feature-information extraction algorithms for complex systems therefore remains an important direction for future work.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No. KJZD-K201902101) and the Open Fund of Chongqing Key Laboratory of Spatial Data Mining and Big Data Integration for Ecology and Environment.
References
[1] S.G. Chen, X.J. Wu, A new fuzzy twin support vector machine for pattern classification, Int. J. Mach. Learn. Cybern. 9 (3) (2017) 1–12.
[2] W. Jin, F. Gong, W. Tian, et al., The segmentation of brain magnetic resonance image by adaptive fuzzy support vector machine, J. Med. Imaging Health Inform. 7 (2) (2017) 400–406.
[3] S.E. Pandarakone, Y. Mizuno, H. Nakamura, Distinct fault analysis of induction motor bearing using frequency spectrum determination and support vector machine, IEEE Trans. Ind. Appl. 53 (3) (2017) 3049–3056.
[4] X. Peng, L. Kong, D. Chen, A structural information-based twin-hypersphere support vector machine classifier, Int. J. Mach. Learn. Cybern. 8 (1) (2017) 295–308.
[5] R.K. Sevakula, N.K. Verma, Compounding general purpose membership functions for fuzzy support vector machine under noisy environment, IEEE Trans. Fuzzy Syst. 25 (6) (2017) 1446–1459.
[6] Y. Chai, Y. Wang, J. Zhang, et al., Support vector machine algorithm based on boundary vector, J. Liaoning Tech. Univ. 36 (2) (2017) 202–205.
[7] K. Tatsumi, T. Tanino, Support vector machines maximizing geometric margins for multi-class classification, TOP 22 (3) (2014) 815–840.
[8] D. Gupta, B. Richhariya, P. Borah, A fuzzy twin support vector machine based on information entropy for class imbalance learning, Neural Comput. Appl. (3) (2018) 1–12.
[9] A. Mansouri, L.S. Affendy, A. Mamat, A new fuzzy support vector machine method for named entity recognition, in: International Conference on Computer Science & Information Technology, 2008.
[10] Z. Huang, X. Xu, J. Ni, H. Zhu, C. Wang, Multimodal representation learning for recommendation in internet of things, IEEE Internet Things J. (2019) http://dx.doi.org/10.1109/JIOT.2019.2940709.
[11] J. Leskovec, J.J. Mcauley, Learning to discover social circles in ego networks, in: NIPS 2012: The Twenty-Sixth Annual Conference on Neural Information Processing Systems, MIT Press, Cambridge, 2012, pp. 539–547.
[12] W. Wei, S. Liu, W. Li, D. Du, Fractal intelligent privacy protection in online social network using attribute-based encryption schemes, IEEE Trans. Comput. Soc. Syst. 5 (3) (2018) 736–747.
[13] A.L. Traud, P.J. Mucha, M.A. Porter, Social structure of Facebook networks, Physica A 391 (16) (2012) 4165–4180.
[14] C. Yang, Z. Liu, D. Zhao, et al., Network representation learning with rich text information, in: IJCAI 2015: Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, AAAI, Menlo Park, 2015, pp. 2111–2117.
[15] D. Zhang, J. Yin, X. Zhu, et al., User profile preserving social network embedding, in: IJCAI 2017: Proceedings of the 26th International Joint Conference on Artificial Intelligence, AAAI, Menlo Park, 2017, pp. 3378–3384.
[16] L. Liao, X. He, H. Zhang, et al., Attributed social network embedding, 2017, arXiv preprint arXiv:1705.04969.