Publicly verifiable privacy-preserving aggregation and its application in IoT


Accepted Manuscript

Tong Li, Chongzhi Gao, Liaoliang Jiang, Witold Pedrycz, Jian Shen

PII: S1084-8045(18)30305-9
DOI: https://doi.org/10.1016/j.jnca.2018.09.018
Reference: YJNCA 2218
To appear in: Journal of Network and Computer Applications
Received Date: 18 June 2018
Revised Date: 5 September 2018
Accepted Date: 27 September 2018

Please cite this article as: Li, T., Gao, C., Jiang, L., Pedrycz, W., Shen, J., Publicly verifiable privacy-preserving aggregation and its application in IoT, Journal of Network and Computer Applications (2018), doi: https://doi.org/10.1016/j.jnca.2018.09.018. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT


Publicly Verifiable Privacy-Preserving Aggregation and Its Application in IoT

Tong Li^a, Chongzhi Gao^a,∗, Liaoliang Jiang^a, Witold Pedrycz^b, Jian Shen^c

^a School of Computer Science, Guangzhou University, Guangzhou, China
^b Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
^c School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, China

Abstract


With the development of smart devices, the Internet of Things (IoT) has found wide application and has extended various Internet services. On intermediate nodes and edge nodes of an IoT network, aggregation is a basic primitive for forwarding data: it takes the data of other source nodes as input. To protect the sensitive information of source nodes while still enabling aggregation, several works have presented secure protocols. However, the aggregation node could return invalid results due to transmission failures, software bugs, or computation delays. Thus, verifying the results is a major challenge for secure aggregation protocols. In this paper, we focus on the verification problem and propose a new publicly verifiable scheme for the aggregation operation. Unlike existing solutions, our scheme enables a public verifier to check an aggregation result over the data of source nodes while protecting data privacy. Security analysis shows that the proposed scheme achieves the desired security properties. Finally, we provide an experimental evaluation that demonstrates the effectiveness of our scheme.

Keywords: Privacy-Preserving, Aggregation, Verification, IoT



∗ Corresponding author. Email addresses: [email protected] (Tong Li), [email protected] (Chongzhi Gao), [email protected] (Liaoliang Jiang), [email protected] (Witold Pedrycz), s [email protected] (Jian Shen)

Preprint submitted to XXX Journal

October 4, 2018


1. Introduction


With the rapid development of the Internet and communication technologies, the Internet of Things (IoT) has emerged as a popular technology that brings convenience to individuals and enterprises. More and more smart terminals connected to the Internet play important roles in our daily lives. According to a report of the International Data Corporation (IDC), there will be nearly 28 billion installed IoT devices by 2020. Interactive intelligent terminals in IoT include mobile devices and environmental sensors, which can capture snapshots of their states and report them [1, 2]. For example, files, logs, and other real-time content are forwarded in this way from these terminals to intermediate nodes. Although these activities benefit data sharing, propagation, and utilization, a privacy issue inevitably arises. Normally, submitting data to untrusted nodes (e.g., edge nodes or receivers) means that data owners lose direct control over the data in subsequent processing [3], while the data could contain sensitive information about the owners, such as personal locations and proprietary asset data. Processing these data without security measures entails a risk of leaking sensitive information.

For aggregation, which aims at computing the sum of values modified by coefficients, previous works presented solutions that allow an untrusted aggregator to execute the computation without disclosing each individual value. These solutions can be used to implement secret combination over the data of smart devices in IoT. However, there are many possible reasons for the aggregation node to return an invalid result, such as transmission failures, software bugs, or computation delays, and an invalid result affects the subsequent computations. Unfortunately, in an IoT system, previous solutions do not provide both data security and verifiability of results.

A trivial verification method is to appoint a supervisor that recalculates the result, but this is difficult in practical applications since the calculation details of an intermediate node are not transparent enough to terminals, receivers, and other nodes. Thus, a public verifier should be able to check whether the result contains any error. A scheme for secure aggregation on untrusted nodes must face this challenge. To give a solution, in this paper we propose a new publicly verifiable privacy-preserving aggregation scheme for multiple source nodes in IoT. The highlights of our scheme can be summarized as follows.

• We propose a verifiable secure aggregation scheme that enables an untrusted aggregation node to perform the aggregation over data from source nodes without revealing the data.

• We design a tuple of algorithms in the scheme, by which any public verifier can verify the validity of aggregation results from the aggregation node.

• We implement a prototype of the proposed scheme and conduct an experimental evaluation. The experimental results show the effectiveness and efficiency of the scheme.

The rest of this paper is organized as follows. Section 2 reviews related works. Section 3 gives preliminary definitions used in our work. In Section 4, we describe our verifiable privacy-preserving aggregation scheme. Section 5 analyzes the correctness and security of the proposed scheme. In Section 6, we implement a prototype of our scheme and evaluate its experimental results. In Section 7, we summarize the paper.

2. Related Works

Secure Aggregation. As mentioned in Section 1, aggregation computes the sum of values. Sometimes, these values are modified by coefficients before the sum operation. In network routing [4, 5], aggregation is a common operation for receiving packets and forwarding them to neighbouring nodes. In IoT, the task is that intermediate nodes combine received data packets and forward them, and the next node still obtains the correct sum. There are also other secure computation solutions for IoT, such as integration [6, 7], keyword search [8], data mash-up [9], and authentication [10]. Aggregation is a very common sub-operation in machine learning, where it is used to create intermediate results of Naive Bayes learning and neural network learning, so most secure aggregation schemes are proposed in this field. Shokri et al. [11] proposed a privacy-preserving system that enables trainers to jointly train a neural network classifier in parallel and aggregate the model. Abadi et al. [12] proposed a deep learning scheme with differential privacy [13]. In this work, trainers should share the sensitivity [13] of the whole dataset and aggregate noise into the classifier in each learning round. Ohrimenko et al. [14] gave a solution for data-oblivious multi-party machine learning by using trusted SGX processors. To reduce the communication overhead and enhance robustness against drop-outs, Bonawitz et al. [15] proposed a secure aggregation protocol. Besides, many existing works concern aggregation in learning algorithms, such as neural networks [16–18], Naive Bayes classification [19], k-means clustering [20], and SVM [21]. These previous works did not consider a verification mechanism that supports a public verifier.

Verifiable Computation. The verification problem has attracted researchers' attention since the rapid development of cloud computing and outsourcing techniques. It is doubtful whether a server can be fully trusted by users to perform outsourcing protocols correctly. To address this problem, Golle et al. [22] realized the verification of some computational results. Since then, several research works have addressed verifiable computation [23–28]. Shen et al. [29] designed a secure auditing protocol with public verification. Benabbas et al. [30] were the first to propose a practical verifiable scheme for high-degree polynomial functions. Fiore et al. [31] proposed a publicly verifiable scheme for outsourcing matrix multiplication based on garbled circuits, but it is not efficient due to the overhead of garbling a big circuit. Kun et al. [32] proposed an efficient scheme for computing large-scale matrix multiplication. To defend against a malicious cloud server, Lei et al. [33] proposed a protocol that provides cheating resistance in verifiable outsourced computations. Normally, outsourced data contain sensitive information about their owners, so some verifiable solutions also preserve data owners' privacy by cryptographic techniques [34]. Chen et al. [35] proposed a secure outsourcing scheme for large-scale linear equations that blinds the input equations with a sparse matrix. Salinas et al. [36] presented a privacy-preserving matrix transformation scheme that enables users to disguise their private matrices by adding a matrix. Zhang et al. [37] presented a secure outsourcing scheme that supports publicly verifiable computations for batch matrix multiplications.

3. Preliminaries

In this section, we give some definitions used by our proposed scheme.


3.1. Verifiable Scheme

A verifiable computation scheme enables a user to outsource the evaluation of a function $f$ to a server and then verify whether the returned results are correct. Secure verifiable computation is formally defined as follows [15]. A verifiable computation scheme $\mathcal{VC}$ consists of a tuple of probabilistic polynomial-time algorithms (KeyGen, ProbGen, Compute, Verify, Solve) with the following functionality:

• KeyGen$(f, \lambda) \to (pk, sk)$. Given the security parameter $\lambda$ and the function $f$, this algorithm outputs a key pair $(pk, sk)$, where $sk$ is the secret key and $pk$ is the public key for encoding the function $f$.

• ProbGen$(sk, x) \to (\sigma_x, \tau_x)$. Using the secret key $sk$, the algorithm produces an encoded problem $(\sigma_x, \tau_x)$ of the function input $x$. The part $\sigma_x$ is submitted to the server as public, while $\tau_x$ is kept private by the user.

• Compute$(pk, \sigma_x) \to \sigma_y$. Taking the public key $pk$ and the encoded input $\sigma_x$ as input, the server outputs the encoded result $\sigma_y$.

• Verify$(sk, \tau_x, \sigma_y) \to y \cup \perp$. Using the secret key $sk$ and the secret part $\tau_x$, the verification algorithm decodes the encoded output $\sigma_y$ into the clear output $y$, or outputs $\perp$ if $\sigma_y$ does not represent a valid output of $f$.

• Solve$(sk, \tau_x, \sigma_y) \to y$. Given the secret key $sk$, the secret part $\tau_x$, and the encoded result $\sigma_y$, this solving algorithm outputs the result $y = f(x)$.

In this paper, to support aggregation and public verification, we construct our scheme as a modified tuple.

3.2. Bilinear Pairing

A bilinear map is a map $e : G \times G' \to G_T$ with the following properties:

• Bilinearity: for all $u \in G$, $v \in G'$ and $a, b \in \mathbb{Z}_p^*$, we have $e(u^a, v^b) = e(u, v)^{ab}$.

• Non-degeneracy: $e(g, g') \neq 1$ for generators $g$ of $G$ and $g'$ of $G'$.

• Computability: there is an efficient algorithm to compute $e(u, v)$ for any $u \in G$ and $v \in G'$.

Let $e : G \times G' \to G_T$ be a bilinear map, where $G$, $G'$ and $G_T$ are bilinear groups of prime order $p$.

• (BDHE Problem) The Bilinear Diffie-Hellman Exponent (BDHE) problem is stated as follows. Given $g, g^{\alpha}, g^{\beta} \in G$ and $h \in G'$, output $e(g, h)^{\alpha\beta}$.

• (BDHE Assumption) Given $g, g^{\alpha}, g^{\beta} \in G$ and $h \in G'$ for any $\alpha, \beta \in \mathbb{F}_p^*$, if the probability of solving the BDHE problem is negligible, we say that the BDHE assumption holds in $G_T$.
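The bilinearity and non-degeneracy properties can be checked numerically in a toy model. The following is a minimal sketch and not a secure pairing: it assumes that elements of $G$ and $G'$ are stored directly as their discrete logarithms modulo a small prime order, and that $G_T$ is the subgroup of that order in the integers modulo 2039. The names `ORDER`, `MODULUS`, `GT_GEN`, `g_exp`, and `pair` are illustrative and do not come from the paper.

```python
# Toy, insecure model of a bilinear map e: G x G' -> G_T (illustration only).
# Assumption: an element g^a of G (or g'^b of G') is stored as its exponent
# a (or b) modulo ORDER, so discrete logarithms are in the clear.

ORDER = 1019     # prime order p of G, G' and G_T (toy size)
MODULUS = 2039   # prime with MODULUS = 2 * ORDER + 1
GT_GEN = 4       # generates the subgroup of order ORDER in Z_MODULUS^*


def g_exp(elem: int, k: int) -> int:
    """Raise a stored element (an exponent) to the power k."""
    return (elem * k) % ORDER


def pair(u: int, v: int) -> int:
    """Toy pairing: e(g^u, g'^v) = GT_GEN^(u*v) in G_T."""
    return pow(GT_GEN, (u * v) % ORDER, MODULUS)


# Generators g of G and g' of G' in exponent form.
g, g_prime = 1, 1
```

In this model, `pair(g_exp(u, a), g_exp(v, b))` equals `pow(pair(u, v), a * b, MODULUS)` for any integers `u`, `v`, `a`, `b`, which is exactly the bilinearity property stated above.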


3.3. Key Agreement

A key agreement protocol $\mathcal{KA}$ consists of a tuple of algorithms (KA.ParamGen, KA.KeyGen, KA.Agree). The algorithm KA.ParamGen$(\lambda) \to param$ produces the public parameters. KA.KeyGen$(param) \to (pk_i, sk_i)$ allows the $i$-th participant to generate a public-private key pair. KA.Agree$(sk_i, pk_j) \to s_{i,j}$ allows any participant $i$ to combine its private key $sk_i$ with the public key $pk_j$ of any participant $j$ to generate a key $s_{i,j}$ shared between $i$ and $j$.

A typical key agreement scheme is Diffie-Hellman key agreement composed with a hash function. In more detail, KA.ParamGen$(\lambda) \to (G, g, q, H)$ chooses a group $G$ of prime order $q$, a generator $g$ of $G$, and a hash function $H$. The algorithm KA.KeyGen$(param) \to (x_i, g^{x_i})$ samples a random $x_i \leftarrow \mathbb{F}_q^*$ as the secret key $sk_i$, while $g^{x_i}$ is the public key $pk_i$. The agreement between $i$ and $j$ is KA.Agree$(sk_i, pk_j) \to s_{i,j}$, which outputs $s_{i,j} = H((g^{x_j})^{x_i})$.
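The Diffie-Hellman instantiation above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the group is a toy subgroup of prime order 1019 in the integers modulo 2039 (far too small for real use), SHA-256 stands in for the hash function $H$, and the function names `ka_keygen` and `ka_agree` are hypothetical.

```python
import hashlib
import secrets

# Toy parameters (assumption, far too small for real use):
# the subgroup of prime order Q in Z_P^*, with P = 2 * Q + 1.
P = 2039
Q = 1019
G = 4


def ka_keygen():
    """Return (sk_i, pk_i) = (x_i, g^{x_i}) with x_i sampled from [1, Q-1]."""
    sk = secrets.randbelow(Q - 1) + 1
    return sk, pow(G, sk, P)


def ka_agree(sk_i: int, pk_j: int) -> bytes:
    """Return s_{i,j} = H((g^{x_j})^{x_i}), with SHA-256 standing in for H."""
    shared = pow(pk_j, sk_i, P)
    width = (P.bit_length() + 7) // 8
    return hashlib.sha256(shared.to_bytes(width, "big")).digest()
```

Both participants obtain the same key because $(g^{x_j})^{x_i} = (g^{x_i})^{x_j}$, so `ka_agree(sk_i, pk_j)` and `ka_agree(sk_j, pk_i)` return identical bytes.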


3.4. Pseudo-Random Generator

The secure pseudo-random generator (PRG) used in this paper takes a uniformly random seed of some fixed length as input. Security of a pseudo-random generator guarantees that its output on a uniformly random seed is computationally indistinguishable from a uniformly sampled element of the output space, as long as the seed is kept secret from adversaries. In this paper, we use the secure pseudo-random generator to generate random masks for data owners.
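One possible way to expand a short seed into a mask vector over $\mathbb{F}_p$ is sketched below. The paper does not name a concrete PRG; the use of SHAKE-256 here, the function name `prg`, and the choice of eight extra bytes per element are assumptions made only for illustration.

```python
import hashlib


def prg(seed: bytes, m: int, p: int) -> list:
    """Expand a seed into m pseudo-random elements of Z_p."""
    # Draw 8 extra bytes per element so the reduction mod p is nearly unbiased.
    width = (p.bit_length() + 7) // 8 + 8
    stream = hashlib.shake_256(seed).digest(m * width)
    return [int.from_bytes(stream[k * width:(k + 1) * width], "big") % p
            for k in range(m)]
```

The expansion is deterministic, so two owners holding the same seed derive the same mask vector without exchanging the vector itself.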


4. The Secure Verifiable Aggregation Scheme


4.1. Architecture and Threat Model

Consider a set of source nodes, each of which holds a feature vector that represents information collected from sensors. An intermediate node is assigned the aggregation operations of a machine learning task over these feature vectors. However, the vectors usually contain sensitive information such as location information, which makes the source nodes reluctant to submit their vectors to an untrusted entity that performs the aggregation. Thus, there should be a security measure that protects data privacy against the aggregation node. In addition, there are many possible reasons for the aggregation node to return a wrong result. To guarantee the correctness of subsequent data forwarding in the network, there should also be a verification mechanism that allows a public verifier to check whether the aggregation result is correct. Normally, a source node that holds private data can be seen as a data owner. In the rest of this paper, we use the terms source node and data owner interchangeably.

We consider the secure verification problem in a setting similar to this example. A solution to the problem should include a method for blinding vectors and a verification mechanism for checking aggregation results. As shown in Figure 1, our scheme involves four entities, each with its own role.


• Data Center. We introduce a trusted data center, which is an administrator of the system. The data center provides initialization of parameters and produces keys for data owners. At the same time, it also produces keys for the public verification.


• Data Owner. There are $n$ data owners that need to submit data. After the public parameters are generated, each data owner also receives its own key pair for blinding vectors. For convenience, we assume that each owner holds one feature vector. To blind the private vector before submission, the data owner encodes it into a protected form. Besides, the data owner produces a public verification key.

• Aggregation Node. The aggregation node has $n$ coefficients for the aggregation. When it receives the blinded vectors, it computes the sum of the vectors together with a proof. Finally, the aggregation result is prepared for forwarding on the outgoing edges, while the proof is released to a public verifier for the subsequent verification.

• Verifiers. Using the verification keys, the main task of a verifier is to check whether the result computed over the blinded vectors is correct. It should be remarked that the verifier can be an arbitrary entity.

Figure 1: Architecture of the proposed scheme. (Diagram labels: Data Owners, Aggregation Node (aggregate vectors), Verifier (verify the result), Other Nodes.)


In our work, we consider an “honest-but-curious” model. The adversary is the untrusted aggregation node; both the data owners and the aggregation node are guaranteed to execute the predetermined protocol. However, the aggregation node will use everything it observes during the execution to infer information about the inputs. In other words, the aggregation node honestly executes the predetermined protocol but is curious about the content of the computation task.

4.2. The Details of the Scheme

Now, we present our construction of the verifiable secure aggregation scheme. There are $n$ devices acting as data owners and an intermediate node for the aggregation. The $i$-th data owner holds and forwards a feature vector $\mathbf{x}_i$. The aggregation node has coefficients that can be represented as a vector $\mathbf{s}$, and it aims at computing the linear combination $\mathbf{res} = \sum_{i=1}^{n} s_i \mathbf{x}_i$. The node also generates a proof for the result $\mathbf{res}$. Using this proof, any public verifier can efficiently verify the correctness of $\mathbf{res}$.
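As a reference point for the protected computation, the unprotected target $\mathbf{res} = \sum_{i=1}^{n} s_i \mathbf{x}_i$ can be written directly. This is a minimal sketch over a toy prime field; the function name `linear_combination` and the modulus used in the example are assumptions for illustration.

```python
def linear_combination(s, xs, p):
    """Return res = sum_i s[i] * xs[i] (component-wise) modulo p."""
    if len(s) != len(xs):
        raise ValueError("need one coefficient per vector")
    m = len(xs[0])
    return [sum(s_i * x[k] for s_i, x in zip(s, xs)) % p for k in range(m)]
```

For instance, with coefficients `[2, 3]`, vectors `[1, 2]` and `[10, 20]`, and modulus 1019, the result is `[32, 64]`. The scheme aims to let the aggregation node obtain this value without seeing the individual vectors in the clear.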


As described previously, besides the verification mechanism, a challenge is to design a blinding method that enables other nodes to perform the aggregation on blinded vectors while keeping each vector private. To meet this requirement, we adopt a one-time mask for hiding vectors. Similar to the masking in [15], each pair of owners $i$ and $j$ agrees on a random vector $\mathbf{v}_{i,j}$. The $i$-th owner adds $\mathbf{v}_{i,j}$ to $\mathbf{x}_i$ if $i < j$; otherwise, the owner subtracts it from $\mathbf{x}_i$:

$$\mathbf{y}_i = \mathbf{x}_i + \Big(\sum_{j=i+1}^{n} \mathbf{v}_{i,j} - \sum_{j=1}^{i-1} \mathbf{v}_{i,j}\Big).$$

Without coefficients, the masks cancel when all vectors are summed up, since each $\mathbf{v}_{i,j}$ is added once by the owner with the smaller index and subtracted once by the owner with the larger index. To reduce the communication between owners, $\mathbf{v}_{i,j}$ can be regarded as a common seed for a pseudo-random generator PRG, which outputs the real mask. Moreover, the shared seeds $\mathbf{v}_{i,j}$ are computed by engaging in a key agreement. Then, we describe the algorithms of a verifiable secure aggregation scheme $\mathcal{VSA}$ = (ParamGen, KeyGen, Mask, Aggregate, Verify) as follows.


ParamGen$(\lambda, \mathbb{F}) \to param$. This algorithm performs the system initialization and is executed by a trusted data center (the administrator of the system). Source nodes, which are also the data owners $O$, are supposed to submit private vectors to an aggregation node (aggregator) $AG$ for the aggregation. The aggregation involves a coefficient vector $\mathbf{s} \in \mathbb{F}_p^{*n}$, where $p$ is a large prime and $n$ is the total number of involved owners. Note that $\mathbf{s}$ is determined by the aggregation node. Choose a bilinear map $e : G \times G' \to G_T$, where $G$, $G'$, $G_T$ are bilinear groups of prime order $p > 2^{\lambda}$, $g$ is a generator of $G$, and $g'$ is a generator of $G'$. Then, choose $\delta \in \mathbb{F}_p^*$ and compute $g'' = (g')^{\delta}$. Also, the data center initializes a key agreement protocol $\mathcal{KA}$ = (ParamGen, KeyGen, Agree) for the data owners. Finally, the public parameters $param = (G, G', G_T, \mathcal{KA}, p, g, g', g'')$ are published.

KeyGen$(param, \mathbf{s}) \to (\mathbf{PK}, \mathbf{h})$. Taking the public parameters $param$ and the coefficient vector $\mathbf{s}$ as input, the data center runs this algorithm to produce the public key $\mathbf{PK}$ and the evaluation vector $\mathbf{h}$, which are used for aggregating vectors and generating proofs in the following procedures. First, it randomly picks a vector $\mathbf{r} \in \mathbb{F}_p^{*n}$. Note that $\mathbf{r}$ should be kept secret. Then, it computes another vector $\mathbf{h} = (h_1, ..., h_n)$ for generating proofs, where $h_i = g^{\delta s_i + r_i}$ $(1 \le i \le n)$. The public key $\mathbf{PK} = (PK_1, ..., PK_n)$ is thus generated, where $PK_i = e(g^{r_i}, g')$. The output of this algorithm is $(\mathbf{PK}, \mathbf{h})$.


Mask$(i, \mathbf{x}_i, \mathbf{s}) \to (\mathbf{y}_i, \mathbf{VK}_i)$. This algorithm is executed by the $i$-th data owner $O_i$ to blind its private vector $\mathbf{x}_i \in \mathbb{F}_p^m$ and generate the corresponding verification key $\mathbf{VK}_i$. First, the owner $O_i$ runs the key agreement $\mathcal{KA}$ with the other owners to obtain the seed vectors $\mathbf{v}_{i,j}$ $(1 \le j \le n, j \ne i)$. To mask the private vector with the output of a pseudo-random generator (PRG), the owner $O_i$ computes $\mathbf{y}_i$ as follows, where $\mathbf{y}_i$ and $\mathbf{x}_i$ are of the same size:

$$\mathbf{y}_i = \mathbf{x}_i + s_i^{-1}\Big(\sum_{j=i+1}^{n} PRG(\mathbf{v}_{i,j}) - \sum_{j=1}^{i-1} PRG(\mathbf{v}_{i,j})\Big).$$

Then, the owner generates a public verification key $\mathbf{VK}_i = (VK_{i,1}, ..., VK_{i,m})$, where $VK_{i,k} = PK_i^{\,y_{i,k}}$ $(1 \le k \le m)$, which is used by a verifier $V$ to check whether the aggregation result is correct. The output of this algorithm is $(\mathbf{y}_i, \mathbf{VK}_i)$.
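The effect of the factor $s_i^{-1}$ in the masking equation can be checked numerically: after each masked vector is multiplied by its coefficient $s_i$, the pairwise masks cancel in the weighted sum. The sketch below works over a toy prime field and covers only the masking part of Mask (the verification key is omitted). The helper `pair_seed` is a hypothetical stand-in for the key agreement output, and SHAKE-256 is an assumed PRG.

```python
import hashlib

P = 1019  # toy prime field size (assumption)


def prg(seed: bytes, m: int) -> list:
    """Expand a shared seed into m elements of Z_P (SHAKE-256 is an assumed PRG)."""
    stream = hashlib.shake_256(seed).digest(m * 10)
    return [int.from_bytes(stream[10 * k:10 * k + 10], "big") % P
            for k in range(m)]


def pair_seed(i: int, j: int) -> bytes:
    """Hypothetical stand-in for the key agreement output v_{i,j} = v_{j,i}."""
    return f"seed-{min(i, j)}-{max(i, j)}".encode()


def mask(i: int, x_i: list, s_i: int, n: int) -> list:
    """Compute y_i = x_i + s_i^{-1} (sum_{j>i} PRG(v_ij) - sum_{j<i} PRG(v_ij))."""
    m = len(x_i)
    total = [0] * m
    for j in range(1, n + 1):
        if j == i:
            continue
        sign = 1 if j > i else -1
        pad = prg(pair_seed(i, j), m)
        total = [(t + sign * v) % P for t, v in zip(total, pad)]
    s_inv = pow(s_i, -1, P)  # modular inverse, requires Python 3.8+
    return [(x + s_inv * t) % P for x, t in zip(x_i, total)]


def weighted_sum(s: list, vectors: list) -> list:
    """Return sum_i s[i] * vectors[i] component-wise modulo P."""
    m = len(vectors[0])
    return [sum(s_i * v[k] for s_i, v in zip(s, vectors)) % P for k in range(m)]
```

With three owners, calling `mask(i, x_i, s_i, 3)` for `i = 1, 2, 3` and then `weighted_sum` on the masked vectors gives the same result as `weighted_sum` on the original vectors, because $s_i \cdot s_i^{-1} = 1$ and every pad appears once with each sign. Setting every $s_i = 1$ recovers the unweighted masking described earlier.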


Aggregate$(\{\mathbf{y}_i\}_{i=1}^{n}, \mathbf{h}) \to (\mathbf{res}, \sigma)$. Once the aggregation node $AG$ has received all valid masked vectors from the $n$ data owners, it takes the auxiliary vector $\mathbf{h}$ as input to run this algorithm. The aggregation result is computed as $\mathbf{res} = \sum_{i=1}^{n} s_i \mathbf{y}_i$. In addition, the proof is generated by computing $\sigma = (\sigma_1, ..., \sigma_m)$, where $\sigma_k = \prod_{i=1}^{n} h_i^{\,y_{i,k}}$ $(1 \le k \le m)$. The output of this algorithm is $\mathbf{res}$ along with its proof $\sigma$.


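Verify$(\{\mathbf{VK}_i\}_{i=1}^{n}, \mathbf{res}, \sigma) \to accept$. Taking the result $\mathbf{res}$, the proof $\sigma$, and the verification keys $\{\mathbf{VK}_i\}_{i=1}^{n}$ as input, the verifier $V$ runs the verification algorithm to check, for each $k$-th proof component $(1 \le k \le m)$,

$$e(\sigma_k, g') \stackrel{?}{=} e(g^{res_k}, g'') \prod_{i=1}^{n} VK_{i,k},$$

and accepts if the equation holds for every $k$.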
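The algebra of KeyGen, Aggregate, and Verify can be exercised end to end in a toy model. The sketch below is not a secure implementation: it assumes that elements of $G$ and $G'$ are stored as their discrete logarithms modulo a small prime order, so a product in $G$ becomes a sum of exponents, and it treats the vectors $\mathbf{y}_i$ as already masked. All function and constant names are illustrative.

```python
import secrets

# Toy, insecure model (assumption): elements of G and G' are stored as
# exponents modulo ORDER; G_T is the order-ORDER subgroup of Z_MODULUS^*.
ORDER = 1019
MODULUS = 2039
GT_GEN = 4


def pair(u: int, v: int) -> int:
    """Toy pairing e(g^u, g'^v) = GT_GEN^(u*v)."""
    return pow(GT_GEN, (u * v) % ORDER, MODULUS)


def rand_scalar() -> int:
    """Sample a non-zero scalar from [1, ORDER - 1]."""
    return secrets.randbelow(ORDER - 1) + 1


def param_gen():
    """Return g, g', g'' = g'^delta (exponent form) and the secret delta."""
    g, g_prime = 1, 1
    delta = rand_scalar()
    return g, g_prime, (g_prime * delta) % ORDER, delta


def key_gen(g, g_prime, delta, s):
    """h_i = g^(delta*s_i + r_i), PK_i = e(g^(r_i), g')."""
    r = [rand_scalar() for _ in s]
    h = [((delta * s_i + r_i) * g) % ORDER for s_i, r_i in zip(s, r)]
    pk = [pair((r_i * g) % ORDER, g_prime) for r_i in r]
    return pk, h


def verification_key(pk_i, y_i):
    """VK_{i,k} = PK_i^(y_{i,k}) in G_T."""
    return [pow(pk_i, y, MODULUS) for y in y_i]


def aggregate(s, ys, h):
    """res_k = sum_i s_i y_{i,k}; sigma_k = prod_i h_i^(y_{i,k}) (a sum of exponents here)."""
    m = len(ys[0])
    res = [sum(s_i * y[k] for s_i, y in zip(s, ys)) % ORDER for k in range(m)]
    sigma = [sum(h_i * y[k] for h_i, y in zip(h, ys)) % ORDER for k in range(m)]
    return res, sigma


def verify(g, g_prime, g_dprime, vks, res, sigma):
    """Check e(sigma_k, g') == e(g^(res_k), g'') * prod_i VK_{i,k} for every k."""
    for k in range(len(res)):
        lhs = pair(sigma[k], g_prime)
        rhs = pair((res[k] * g) % ORDER, g_dprime)
        for vk in vks:
            rhs = (rhs * vk[k]) % MODULUS
        if lhs != rhs:
            return False
    return True
```

Running `param_gen`, `key_gen`, `verification_key` for each owner, and `aggregate` yields a pair `(res, sigma)` that `verify` accepts. Changing any component of `res` makes the check fail in this model, because the right-hand side is then multiplied by a non-trivial power of $e(g, g'')$.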


5. Security Analysis


5.1. Correctness

According to the definition in Section 3, we analyze the correctness of the proposed scheme.

Theorem 1. The proposed secure verifiable aggregation scheme is correct.


Proof. For each $k \in \{1, ..., m\}$, the $k$-th proof component of an aggregated vector $\mathbf{res}$ is denoted by $\sigma_k$. We have

$$\sigma_k = \prod_{i=1}^{n} h_i^{\,y_{i,k}} = \prod_{i=1}^{n} g^{(\delta s_i + r_i) y_{i,k}} = g^{\delta \sum_{i=1}^{n} s_i y_{i,k} + \sum_{i=1}^{n} r_i y_{i,k}}.$$

The $k$-th component of $\mathbf{res}$ is $res_k = \sum_{i=1}^{n} s_i y_{i,k}$. Thus, if $res_k$ and the corresponding $\sigma_k$ are both correct, the check passes since

$$\begin{aligned}
e(\sigma_k, g') &= e\big(g^{\delta \sum_{i=1}^{n} s_i y_{i,k} + \sum_{i=1}^{n} r_i y_{i,k}}, g'\big) \\
&= e(g, g')^{\delta \sum_{i=1}^{n} s_i y_{i,k}} \cdot e(g, g')^{\sum_{i=1}^{n} r_i y_{i,k}} \\
&= e(g^{res_k}, g')^{\delta} \cdot \prod_{i=1}^{n} e(g, g')^{r_i y_{i,k}} \\
&= e(g^{res_k}, g'') \cdot \prod_{i=1}^{n} e(g^{r_i}, g')^{y_{i,k}} \\
&= e(g^{res_k}, g'') \cdot \prod_{i=1}^{n} VK_{i,k}.
\end{aligned}$$

Therefore, the algorithms of the proposed scheme are correct.

5.2. Security

Next, we give the security proof of the verifiable aggregation scheme.


Theorem 2. The proposed scheme for aggregation is secure under the BDHE assumption.


Proof. We will prove that if there exists a PPT adversary $\mathcal{A}$ who can break the proposed scheme (i.e., forge a valid result along with its proof) with a non-negligible advantage, it can be used to construct an efficient algorithm $\mathcal{B}$ that solves the BDHE problem with non-negligible probability. We set an oracle $\mathcal{O}$ of BDHE to initialize $g, g^{\alpha}, g^{\beta} \in G$ and $g' \in G'$. Next, $\mathcal{B}$ utilizes $\mathcal{A}$ to complete the experiment as follows.

• Select $\alpha, \beta \in \mathbb{F}_p^*$. Set $param_1 = (p, G, G', G_T, e, g_1, g', g'_1)$, where $g_1 = g^{\alpha}$ and $g'_1 = ((g')^{\beta})^{\delta}$.

• Randomly choose one vector $\mathbf{s}' \in \mathbb{F}_p^{*m}$.

• For each $1 \le k \le m$, compute $PK'_k = e(g^{s'_k}, g') / e(g_1^{s'_k}, g'_1)$.

$\mathcal{B}$ thus knows the public key $\mathbf{PK}'$, the public parameters $param$, and the corresponding coefficient vector $\mathbf{s}$. The output of $\mathcal{O}$.KeyGen is statistically indistinguishable from the distribution of the output of algorithm KeyGen. That is, given each private vector $\mathbf{x}_i$ and the masked vector $\mathbf{y}_i$, there is

$$e\Big(\prod_{i=1}^{n} (g^{s'_k})^{x_{i,k}}, g'\Big) = e\Big(\prod_{i=1}^{n} (g_1^{s'_k})^{y_{i,k}}, g'_1\Big) \prod_{i=1}^{n} VK'_{i,k},$$

where $VK'_{i,k} = (PK'_i)^{y_{i,k}}$. Then, letting adversary $\mathcal{A}$ query $\mathcal{O}$.Mask with each input $\mathbf{x}_i$, $\mathcal{B}$ takes $(\mathbf{x}_i, \mathbf{VK}_i)$ as output. Finally, $\mathcal{A}$ returns forged results $(\mathbf{res}', \sigma)$ with $res'_k \ne \sum_{i=1}^{n} s_i x_{i,k}$. Following this, $\mathcal{B}$ verifies whether $\mathbf{res}' = \mathbf{res}$. If the result passes the verification, it means that $\mathcal{B}$ fails; otherwise, it returns the following result and solves the BDHE problem by computing

$$e(g, g')^{\alpha\beta} = \left(\frac{\sigma_k}{\prod_{i=1}^{n} (g^{s'_k})^{y_{i,k}}}\right)^{(\delta s_k (res'_k - res_k))^{-1}}.$$

The equation is easy to verify. Therefore, the proposed scheme is secure under the BDHE assumption.

6. Evaluation

In this section, we implement a prototype of the proposed scheme and evaluate it experimentally.


6.1. Implementation Details

Our prototype is implemented in C++ on several machines in a LAN connected by 1 Gbps Ethernet. Some computers act as the data owners $O$, the data center, and the verifier $V$; each of them is equipped with an Intel(R) Core(TM) i7-2600 3.40 GHz CPU and 4 GB RAM, and runs 64-bit Ubuntu 16.04. Another machine acts as the aggregation node $AG$; it is equipped with an Intel(R) Xeon(R) E5-2630 v3 2.40 GHz CPU and 16 GB RAM, and runs 64-bit Ubuntu Server 16.04. We adopt the GMP library to implement cryptographic operations. Considering security and efficiency, the prime $p$ is set to 1024 bits. The key agreement scheme we use is Diffie-Hellman key agreement along with a hash function.

In the experiments, we use fully synthetic data for our algorithms in the verifiable secure aggregation $\mathcal{VSA}$ = (ParamGen, KeyGen, Mask, Aggregate, Verify). We first focus on the performance of the aggregation over $n$ owners with $n$ ranging from 2 to 20, while setting $m = 1000$. Then, we fix the number of involved owners ($n = 12$) and vary $m$, the dimension of the aggregated vectors, from 100 to 5000. The experimental results are shown in Figure 2 and Figure 3, and we analyze them as follows.


6.2. Experimental Results

Time cost of ParamGen. We consider the computational cost of algorithm ParamGen on the data center side. The overhead is dominated by the initialization of the key agreement protocol $\mathcal{KA}$, while the overhead of the other operations is constant. The data center executes the algorithm KA.KeyGen to generate the key pair $(pk_i, sk_i)$ for each $i$-th data owner, so that the owner can obtain the seeds of masks by running the agreement algorithm KA.Agree with other owners. Therefore, as shown in Figure 2 and Figure 3, the cost is independent of the dimension $m$ and grows linearly with the number of owners $n$. Note that for the largest vector size $m = 5000$, ParamGen costs less than two seconds on the data center side. Since $\mathcal{KA}$ is only used for computing masks, the key pair will not be revoked in our scheme. Thus, ParamGen needs to be performed only once and can be prepared off-line.

Time cost of KeyGen. Next, we consider the computational cost of algorithm KeyGen on the data center side. The main computation overhead is dominated by exponentiation operations, which include computing the auxiliary vector $\mathbf{h}$ and the public key $\mathbf{PK}$. Generating $\mathbf{h}$ consists of $n$ exponentiation operations over $G$ (i.e., $h_i = g^{\delta s_i + r_i}$ for $1 \le i \le n$), while the computation of $\mathbf{PK}$ contains $2n$ exponentiation operations over $G$ and $n$ bilinear pairing operations. Similar to ParamGen, the main overhead of this algorithm is independent of the dimension $m$ and linear in $n$, so an extremely large $m$ does not affect its performance.

Time cost of Mask. The algorithm Mask is executed in parallel on the side of each data owner $O_i$, so we consider the computational cost of one owner. The main computational cost is to calculate bilinear pairings for the verification key $\mathbf{VK}_i$, with complexity $O(m)$. Besides, this algorithm also needs the seeds generated by the key agreement protocol, which costs $O(n)$ group operations and can be done off-line. Note that if the aggregation is performed again in the future, the seeds are reusable. The masks are computed by the pseudo-random generator PRG. The main overhead of this algorithm is linear in $m$.

Figure 2: Computational time cost of each algorithm with m = 1000. (Plot: time cost in seconds versus the number of data owners, with one series each for ParamGen, KeyGen, Mask, Aggregate, and Verify.)

Time cost of Aggregate. The algorithm Aggregate is executed on the side of the aggregation node $AG$. The task of the aggregation node $AG$ is to compute the aggregation result $\sum_{i=1}^{n} s_i \mathbf{y}_i$. Besides, it should generate a proof $\sigma$ with complexity $O(nm)$ to prove that the aggregator has performed the correct aggregation. Note that for the largest size $m = 5000$, our prototype takes less than one minute.

Time cost of Verify. The algorithm Verify is run by a verifier $V$. As mentioned above, the verifier $V$ can be any public entity that needs to supervise the aggregation in IoT. Given the verification keys $\{\mathbf{VK}_i\}_{i=1}^{n}$ and the proof $\sigma$, the verifier $V$ checks whether the aggregation result $\mathbf{res}$ is correct with $O(mn)$ multiplication operations over $G$ and $O(mn)$ multiplication operations over $G_T$. For the largest vector size $m = 5000$ in our experiments, the time cost is less than 15 seconds. For large vectors, it is therefore beneficial for the verifier $V$ to apply our scheme in IoT applications.

Figure 3: Computational time cost of each algorithm with n = 12. (Plot: time cost in seconds versus the size of a vector, with one series each for ParamGen, KeyGen, Mask, Aggregate, and Verify.)

7. Conclusion


In this paper, to address the verification problem in secure aggregation operations, we propose a verifiable privacy-preserving aggregation scheme. The scheme enables an intermediate node to perform aggregation operations on the data collected from source nodes without learning the data. Meanwhile, the correctness of the aggregation result can be checked by a public verifier. The analysis shows that the scheme is secure under the BDHE assumption, and our experimental results demonstrate the effectiveness and efficiency of the proposed scheme. In the near future, we expect to improve the robustness of the scheme and give an efficient way to handle drop-outs of data owners.


Acknowledgement

This work was supported by the Guangzhou scholars project for universities of Guangzhou (No. 1201561613).

References

[1] A. P. Plageras, C. Stergiou, K. E. Psannis, H. Wang, B. B. Gupta, Efficient iot-based sensor big data collection-processing and analysis in smart buildings, Future Generation Computer Systems 82 (2017) 349–357.

[2] V. A. Memos, K. E. Psannis, Y. Ishibashi, B. G. Kim, B. B. Gupta, An efficient algorithm for media-based surveillance system (eamsus) in iot smart city framework, Future Generation Computer Systems 83 (2017) 619–628.

[3] Z. Huang, J. Lai, W. Chen, T. Li, Y. Xiang, Data security against receiver corruptions: Soa security for receivers from simulatable dems, Information Sciences DOI: 10.1016/j.ins.2018.08.059.

[4] R. Ahlswede, N. Cai, S.-Y. Li, R. W. Yeung, Network information flow, IEEE Transactions on information theory 46 (4) (2000) 1204–1216.

[5] S.-Y. Li, R. W. Yeung, N. Cai, Linear network coding, IEEE transactions on information theory 49 (2) (2003) 371–381.

[6] C. Stergiou, K. E. Psannis, B. G. Kim, B. B. Gupta, Secure integration of iot and cloud computing, Future Generation Computer Systems 78 (2018) 964–975.

[7] T. Li, W. Chen, Y. Tang, H. Yan, A homomorphic network coding signature scheme for multiple sources and its application in iot, Security and Communication Networks DOI: 10.1155/2018/9641273.

[8] C. Gao, S. Lv, Y. Wei, Z. Wang, Z. Liu, X. Cheng, M-sse: An effective searchable symmetric encryption with enhanced security for mobile devices, IEEE Access 6 (2018) 38860–38869.

[9] A. M. Elmisery, M. Sertovic, B. B. Gupta, Cognitive privacy middleware for deep learning mashup in environmental iot, IEEE Access 6 (2018) 8029–8041.

[10] A. Tewari, B. B. Gupta, Cryptanalysis of a novel ultra-lightweight mutual authentication protocol for iot devices using rfid tags, Journal of Supercomputing 73 (3) (2017) 1085–1102.

[11] R. Shokri, V. Shmatikov, Privacy-preserving deep learning, in: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, ACM, 2015, pp. 1310–1321.

[12] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, L. Zhang, Deep learning with differential privacy, in: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, ACM, 2016, pp. 308–318.

[13] C. Dwork, A. Roth, et al., The algorithmic foundations of differential privacy, Foundations and Trends® in Theoretical Computer Science 9 (3–4) (2014) 211–407.

[14] O. Ohrimenko, F. Schuster, C. Fournet, A. Mehta, S. Nowozin, K. Vaswani, M. Costa, Oblivious multi-party machine learning on trusted processors, in: USENIX Security, Vol. 16, 2016, pp. 619–636.

[15] K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal, K. Seth, Practical secure aggregation for privacy-preserving machine learning, in: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, ACM, 2017, pp. 1175–1191.

[16] T. Chen, S. Zhong, Privacy-preserving backpropagation neural network learning, IEEE Transactions on Neural Networks 20 (10) (2009) 1554–1564.

[17] T. Graepel, K. Lauter, M. Naehrig, Ml confidential: Machine learning on encrypted data, in: International Conference on Information Security and Cryptology, Springer, 2012, pp. 1–21.

[18] J. Yuan, S. Yu, Privacy preserving back-propagation neural network learning made practical with cloud computing, IEEE Transactions on Parallel and Distributed Systems 25 (1) (2014) 212–221.

[19] J. Vaidya, M. Kantarcıoğlu, C. Clifton, Privacy-preserving naive bayes classification, The VLDB Journal - The International Journal on Very Large Data Bases 17 (4) (2008) 879–898.

[20] G. Jagannathan, R. N. Wright, Privacy-preserving distributed k-means clustering over arbitrarily partitioned data, in: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, ACM, 2005, pp. 593–599.

[21] S. Laur, H. Lipmaa, T. Mielikäinen, Cryptographically private support vector machines, in: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2006, pp. 618–624.

[22] P. Golle, I. Mironov, Uncheatable distributed computations, in: Topics in Cryptology - CT-RSA 2001, The Cryptographer’s Track at RSA Conference 2001, 2001, pp. 425–440.

[23] M. Blum, M. Luby, R. Rubinfeld, Self-testing/correcting with applications to numerical problems, Journal of Computer and System Sciences 47 (3) (1993) 549–595.

[24] X. Chen, J. Li, X. Huang, J. Ma, W. Lou, New publicly verifiable databases with efficient updates, IEEE Transactions on Dependable and Secure Computing 12 (5) (2015) 546–556.

[25] X. Chen, J. Li, J. Weng, J. Ma, W. Lou, Verifiable computation over large database with incremental updates, IEEE Transactions on Computers 65 (10) (2014) 3184–3195.

[26] V. Vu, S. Setty, A. J. Blumberg, M. Walfish, A hybrid architecture for interactive verifiable computation, in: IEEE Symposium on Security and Privacy, 2013, pp. 223–237.

[27] J. Li, X. Huang, J. Li, X. Chen, Y. Xiang, Securely outsourcing attribute-based encryption with checkability, IEEE Transactions on Parallel and Distributed Systems 25 (8) (2014) 2201–2210.

[28] J. Li, X. Chen, M. Li, J. Li, P. P. Lee, W. Lou, Secure deduplication with efficient and reliable convergent key management, IEEE transactions on parallel and distributed systems 25 (6) (2014) 1615–1625.

[29] J. Shen, J. Shen, X. Chen, X. Huang, W. Susilo, An efficient public auditing protocol with novel dynamic structure for cloud data, IEEE Transactions on Information Forensics and Security 12 (10) (2017) 2402–2415.

[30] S. Benabbas, R. Gennaro, Y. Vahlis, Verifiable delegation of computation over large datasets, in: Conference on Advances in Cryptology, CRYPTO 2011, 2011, pp. 111–131.

[31] D. Fiore, R. Gennaro, Publicly verifiable delegation of large polynomials and matrix computations, with applications, in: ACM Conference on Computer and Communications Security, 2012, pp. 501–512.

[32] K. Jia, H. Li, D. Liu, S. Yu, Enabling efficient and secure outsourcing of large matrix multiplications, in: IEEE Global Communications Conference, 2015, pp. 1–6.

[33] X. Lei, X. Liao, T. Huang, F. Heriniaina, Achieving security, robust cheating resistance, and high-efficiency for outsourcing large matrix multiplication computation to a malicious cloud, Information Sciences 280 (2014) 205–217.

[34] B. B. Gupta, D. P. Agrawal, S. Yamaguchi, Handbook of Research on Modern Cryptographic Solutions for Computer and Cyber Security, IGI Global, 2016.

[35] X. Chen, X. Huang, J. Li, J. Ma, W. Lou, D. S. Wong, New algorithms for secure outsourcing of large-scale systems of linear equations, IEEE Transactions on Information Forensics and Security 10 (1) (2014) 69–78.

[36] S. Salinas, C. Luo, X. Chen, P. Li, Efficient secure outsourcing of large-scale linear systems of equations, in: Computer Communications, 2015, pp. 281–292.

[37] X. Zhang, T. Jiang, K. C. Li, A. Castiglione, X. Chen, New publicly verifiable computation for batch matrix multiplication, Information Sciences DOI: 10.1016/j.ins.2017.11.063.