DedupDUM: Secure and scalable data deduplication with dynamic user management

Haoran Yuan a, Xiaofeng Chen a,∗, Tao Jiang a, Xiaoyu Zhang a, Zheng Yan a, Yang Xiang a,b

a State Key Laboratory of Integrated Service Networks (ISN), Xidian University, Xi'an, P.R. China
b Digital Research & Innovation Capability Platform, Swinburne University of Technology, Hawthorn, Australia

Article info

Article history: Received 23 January 2018; Revised 5 May 2018; Accepted 7 May 2018; Available online 8 May 2018

Keywords: Data deduplication; Random convergent encryption; Dynamic user management; Access control; User joining

Abstract: Data deduplication enables a cloud server to store a single copy of data and eliminate redundant ones, so that storage space and network bandwidth are saved. Recently, many works have addressed the privacy-preserving problem of dynamic ownership management in the secure data deduplication setting. However, to our knowledge, the existing schemes are not efficient when cloud users join and are revoked frequently, especially in the absence of a trusted third party in practical cloud storage systems. In this paper, we propose a secure and scalable data deduplication scheme with dynamic user management, which updates dynamic group users in a secure way and restricts unauthorized cloud users from accessing the sensitive data owned by valid users. To further mitigate the communication overhead, a pre-verified access control technique is adopted, which prevents unauthorized cloud users from downloading data. In other words, our scheme ensures that only valid cloud users are able to download and decrypt the ciphertext from the cloud server, which reduces the communication overhead of the scheme. © 2018 Elsevier Inc. All rights reserved.

1. Introduction

The advent of cloud storage attracts a soaring number of users and enterprises to store their sensitive data or databases on remote cloud servers [1,7–10,18,28,29,38,42–44]. In 2015, the Center for Democracy & Technology (CDT) issued a report stating that the total data generated would surpass 7.9 zettabytes (ZB) [34]. Similarly, the International Data Corporation (IDC) claimed that the digital universe would double in size every two years, reaching 40 trillion gigabytes by 2020 (more than 5200 gigabytes for every man, woman, and child) [17]. This ever-growing data volume poses a challenge for cloud servers, which must be equipped with substantial disk space and bandwidth. On the other hand, a cloud server may store much redundant data: for instance, the same movie or music files are stored by different cloud users, which wastes a large amount of storage space and backup space. To overcome this problem, deduplication methods over unencrypted data were proposed and have attracted much attention in the past years [31]. Nowadays, deduplication techniques are widely used by cloud storage providers, e.g., Wuala [40], Mozy [32], Dropbox [13], and Google Drive [12]. It has been reported that up to 90 percent of disk storage space and bandwidth for business applications is saved [14].



Corresponding author. E-mail address: [email protected] (X. Chen).

https://doi.org/10.1016/j.ins.2018.05.024 0020-0255/© 2018 Elsevier Inc. All rights reserved.


Despite these benefits, new security challenges also appear, especially concerning the security of users' data. Since the cloud server is assumed to be honest-but-curious, it may try to infer and analyze the outsourced data. To protect the privacy of their sensitive data, cloud users therefore have to encrypt their data before outsourcing it to the cloud server. In general, however, different users encrypt the same data with their own encryption keys, so identical data yields different ciphertexts and deduplication becomes unachievable. Convergent encryption (CE) provides a feasible solution to protect data privacy while enabling deduplication [11]. The main idea is as follows. A file is encrypted and decrypted with a convergent key derived by computing the hash value of the file. After encrypting the file, the cloud user keeps only the encryption key and outsources the ciphertext to the cloud server to save storage. Since the hash value of a file is deterministic, the same file derives the same convergent key and is therefore encrypted to the same ciphertext, which allows the cloud server to perform deduplication. Unfortunately, convergent encryption suffers from the tag consistency problem. To address it, Bellare et al. [3] formalized the notion of message-locked encryption and proposed a randomized convergent encryption (RCE) scheme. However, this scheme still has security flaws with respect to ownership revocation: revoked cloud users who keep their encryption keys are still able to decrypt the corresponding data. How to restrict unauthorized cloud users from accessing sensitive data has thus become a research focus in dynamic ownership management [33]. Recently, Hur et al. presented a secure data deduplication scheme focusing on dynamic ownership management in cloud storage (SDDOM) [20]. This scheme considers both key updating and dynamic ownership management, but it has some limitations. In the key generation stage, the cloud server fixes the maximum number of cloud users beforehand and sets up a binary KEK (key-encrypting key) tree for the universe of cloud users. Therefore, when more cloud users join the cloud and the group grows beyond this limit, the scheme cannot adapt; a detailed analysis is given in Section 3. Although some existing data deduplication schemes support both cloud user revocation and new cloud user joining, they require a fully trusted third party or demand abundant computing power from cloud users [20,39,41]. In addition, the cloud server cannot distinguish whether a cloud user is authorized before the data is downloaded. In other words, the cloud server cannot confirm whether a cloud user belongs to the group and possesses the ability to decrypt. Consequently, malicious attackers are able to download ciphertexts even though decryption is restricted, which causes an enormous communication cost for the cloud server. To the best of our knowledge, these schemes cannot fully support cloud user joining or verify cloud users' identities before data downloading completes, especially when a fully trusted third party is absent in practical cloud storage systems. In this paper, we attempt to solve the above problems.

Our Contribution.
In this paper, we further revisit the problem of dynamic user revocation and new cloud user joining over encrypted data deduplication. Our contributions are threefold:

• We propose a data deduplication scheme with dynamic user management which supports cloud user revocation and new cloud user joining by exploiting re-encryption techniques.
• Compared with the existing schemes [20,39,41], our scheme requires neither a fully trusted third party nor excessive computational overhead on the cloud users.
• Our scheme uses an access control technique to verify the validity of cloud users before they download data. Only when a cloud user is in the group will the cloud server send the ciphertext to that user. Therefore, abundant communication cost is reduced.

1.1. Related work

Convergent encryption (CE) provided the first solution for deduplication over encrypted data [11]. In this scheme, the data owner obtains the encryption key by hashing the data and then uses this key to encrypt the data and obtain the ciphertext. The same data is always deterministically encrypted under the same secret key, which realizes data deduplication over encrypted data. However, the CE scheme is vulnerable to the tag consistency problem. To solve this problem, Bellare et al. proposed the randomized convergent encryption (RCE) scheme [3], which is an instantiation of message-locked encryption and uses an additional tag checking mechanism to guarantee the integrity of users' data. Bellare et al. [2] also proposed the DupLESS scheme for secure deduplicated storage that resists brute-force attacks. Jin et al. [22] proposed an anonymous data deduplication scheme based on a proxy re-encryption algorithm. To share the convergent key among different cloud servers, Li et al. [24] realized convergent key management by employing the ramp secret sharing scheme [5]. Based on predicate encryption, Shin and Kim [35] constructed an equality predicate encryption scheme that realizes deduplication over encrypted data; however, this approach is only efficient in the single-user deduplication case. Chen et al. [6] designed a block-level message-locked encryption (BL-MLE) scheme for secure large-file deduplication [37]. Wen et al. [39] proposed a session-key-based convergent key management scheme (SKC) and a convergent key sharing scheme (CKS), both of which support secure and dynamic updates in data deduplication; in the SKC scheme it is difficult to change the session key and replace the encrypted convergent key, and in the CKS scheme abundant computing power is required. Li et al. [26] augmented CAONT [27] and designed a rekeying-aware encrypted deduplication scheme that realizes secure and lightweight rekeying; they further extended it with dynamic access control by integrating the primitives of CP-ABE [4] and key regression [16]. Based on the ownership challenge and proxy


re-encryption, Yan et al. [41] proposed a scheme to deduplicate encrypted data stored in the cloud. Jiang et al. introduced a new primitive called R-MLE2 and a cloud data deduplication scheme with randomized tags [21]. Li et al. proposed several deduplication schemes supporting authorized duplicate checks in a hybrid cloud architecture [25]. Liu et al. [30] proposed the first secure deduplication scheme that supports client-side encryption without an additional server, based on the PAKE protocol. Tang et al. [36] modified the classical CP-ABE algorithm and presented a secure deduplication scheme over encrypted data. Li et al. proposed distributed deduplication systems with improved reliability [23]. Hur et al. addressed the dynamic ownership management problem and presented a secure data deduplication scheme [20]. In that work, the dynamic ownership management and key updating problems are investigated: by introducing a group key for dynamic ownership to re-encrypt the ciphertext, only authorized users are able to access the shared data, and both cloud users without valid data ownership and the honest-but-curious cloud server are kept from accessing the plaintext. In this scheme, the cloud server needs to preset a binary KEK tree, limited by a maximum number, for the universe of users who own the same data, and the group key is distributed under the KEKs of the root nodes of the minimum cover sets. The SDDOM scheme therefore resolves the dynamic ownership management problem; however, since the maximum number of users is fixed, it no longer supports any extra cloud user joining.

1.2. Organization

The rest of this paper is organized as follows. In Section 2, we present the preliminaries used in our scheme. In Section 3, we analyze the SDDOM scheme. In Section 4, we give the system model, the definition of the DedupDUM scheme, and the security requirements; the notations and the detailed description of our proposed scheme are presented in Section 5. The security and efficiency analyses are conducted in Section 6, and the performance evaluation is shown in Section 7. Finally, we conclude the paper in Section 8.

2. Preliminaries

In this section, we introduce the notations and primitives that will be used in our scheme.

2.1. Randomized convergent encryption

A randomized convergent encryption scheme consists of four algorithms, described as follows.

Definition 1. A randomized convergent encryption scheme RCE = (RCE.KeyGen, RCE.Encrypt, RCE.Decrypt, RCE.TagGen) consists of the following algorithms:







• RCE.KeyGen(M) → K is the key generation algorithm that takes the data M as input and outputs the convergent key K;
• RCE.Encrypt(K, M) → C is the symmetric encryption algorithm that takes the data M and the convergent key K as inputs and outputs C = C^1 || C^2 || T;
• RCE.Decrypt(K, C) → M is the symmetric decryption algorithm that takes the ciphertext C = C^1 || C^2 || T and the convergent key K as inputs and outputs the original data M;
• RCE.TagGen(C) → T_M is the tag generation algorithm that takes the ciphertext C = C^1 || C^2 || T as input and outputs the tag T_M.
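For concreteness, here is a minimal Python sketch of such an RCE interface. It is an illustrative assumption rather than the paper's implementation: SHA-256 stands in for H, a toy hash-based stream cipher stands in for the symmetric cipher (a real deployment would use AES), and the tag is taken from the T component of the ciphertext, so two independent encryptions of the same file carry the same tag even though their C^1 parts differ.

```python
import hashlib
import os

def _keystream(key: bytes, length: int) -> bytes:
    # Toy hash-based keystream, for illustration only; use AES in practice.
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def rce_keygen(m: bytes) -> bytes:
    return hashlib.sha256(m).digest()              # convergent key K = H(M)

def rce_encrypt(k: bytes, m: bytes):
    l = os.urandom(32)                             # fresh random data key L
    c1 = _xor(m, _keystream(l, len(m)))            # C1: M encrypted under L
    c2 = _xor(l, k)                                # C2: L wrapped with K
    t = hashlib.sha256(k).digest()                 # T derived from K
    return c1, c2, t

def rce_decrypt(k: bytes, c):
    c1, c2, _ = c
    l = _xor(c2, k)                                # unwrap the data key L
    return _xor(c1, _keystream(l, len(c1)))        # recover M from C1

def rce_taggen(c) -> bytes:
    return c[2]                                    # tag T_M taken from C = C1||C2||T

if __name__ == "__main__":
    m = b"the same file always yields the same convergent key and tag"
    k = rce_keygen(m)
    ca, cb = rce_encrypt(k, m), rce_encrypt(k, m)  # two uploads of the same file
    assert rce_decrypt(k, ca) == m
    assert rce_taggen(ca) == rce_taggen(cb)        # equal tags enable deduplication
```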

2.2. ElGamal public-key encryption scheme

ElGamal [15] proposed a public-key cryptosystem. We define the ElGamal encryption scheme as follows. Let G be a cyclic multiplicative group with prime order p and let g be a generator of G.

Definition 2. An ElGamal public-key encryption scheme ElGamal = (Key.Generation, Encryption, Decryption) consists of the following algorithms:





• Key.Generation: The key generation algorithm takes (p, G, g) as input and outputs a public key and a secret key. The user uniformly selects an element x ∈ Z_p, computes X = g^x mod p, and outputs the public key e = (p, G, g, X) and the secret key d = (p, G, g, x).
• Encryption: The ElGamal encryption algorithm takes (p, G, g, X) and data M as inputs and outputs the ciphertext C. For the data M, the user uniformly selects an element k ∈ Z_p, computes C_1 = g^k mod p and C_2 = M·X^k mod p, and outputs the ciphertext C = C_1 || C_2.
• Decryption: The decryption algorithm takes (p, G, g, x) and the ciphertext C = C_1 || C_2 as inputs and outputs the plaintext M = C_2 / C_1^x mod p.
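The following Python sketch illustrates Definition 2 over Z_p. The 127-bit Mersenne prime is a toy parameter chosen only so the example runs quickly; the evaluation in Section 7 assumes a 2048-bit prime. The same encryption is reused later (in the Data Download phase) as a challenge-response identity check.

```python
import secrets

P = 2**127 - 1   # toy prime for illustration; use a 2048-bit prime in practice
G = 3            # fixed group element g

def keygen():
    x = secrets.randbelow(P - 2) + 1          # secret key x in Z_p
    return (P, G, pow(G, x, P)), x            # public key e = (p, g, X = g^x mod p)

def encrypt(e, m):
    p, g, X = e
    k = secrets.randbelow(p - 2) + 1          # fresh randomness k
    return pow(g, k, p), (m * pow(X, k, p)) % p   # C = (C1, C2) = (g^k, M * X^k)

def decrypt(x, c):
    c1, c2 = c
    return (c2 * pow(pow(c1, x, P), -1, P)) % P   # M = C2 / C1^x mod p

if __name__ == "__main__":
    e, x = keygen()
    m = 123456789                              # message encoded as an element < p
    assert decrypt(x, encrypt(e, m)) == m
```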


Fig. 1. Binary KEK tree.

3. Analysis of SDDOM scheme

Hur et al. presented a secure data deduplication scheme with dynamic ownership management in cloud storage [20], which considers the problems of key updating and dynamic ownership management. To protect the privacy of sensitive data, a cloud user first uses the RCE encryption algorithm to encrypt the data, and the cloud server then performs data deduplication and re-encrypts the sensitive data under a group key. In addition, to distribute the group keys, the cloud server constructs a binary KEK (key-encrypting key) tree for the universe of users, whose total number is bounded by a fixed value. In the binary KEK tree, each node holds a KEK, and every user, represented by a leaf node, obtains the KEKs on the path from its leaf to the root. The cloud server selects the root nodes of the minimum cover sets in the binary KEK tree that cover all of the leaf nodes associated with users who uploaded the same data. The group key is encrypted under the KEKs of these root nodes, so only the valid owners are permitted to obtain the group key, and unauthorized users are prevented from accessing the plaintext stored in the cloud server. Therefore, Hur et al. claimed that their scheme resolves the dynamic ownership management problem.

An example of the SDDOM scheme is shown in Fig. 1 [20]. The universe of users is U = {u_1, u_2, u_3, u_4, u_5, u_6, u_7, u_8}. The user u_2 possesses the path keys PK_2 = {KEK_1, KEK_2, KEK_4, KEK_9}. For the group G = {u_1, u_2, u_3, u_4, u_7, u_8}, the root nodes of the minimum cover sets are v_2 and v_7, so KEK(G) = {KEK_2, KEK_7}. The cloud server then randomly chooses the group key GK and encrypts it as C = {E_K(GK)}_{K ∈ KEK(G)}. Since KEK_2 ∈ PK_2, the user u_2 can decrypt the ciphertext C and obtain the group key GK.

However, in the key generation stage, the cloud server sets up a binary KEK tree for the universe of users. This implicitly assumes that the total number of cloud users is fixed, bounded by a maximum, and known to the cloud server. In a cloud environment it is difficult to determine the number of users before data is uploaded, and cloud users may upload and access their data at any moment, so this assumption is impractical. Besides, once the cloud server sets up the binary KEK tree, the scheme no longer supports new cloud user joining. If a new user is to be inserted as a new leaf node of the binary KEK tree, re-constructing the tree takes abundant overhead. If instead a new cloud user is inserted into the same leaf node as a revoked user, the new user securely receives the same path keys, from its leaf node to the root of the tree, as the revoked user; even though the revoked user has been removed from the binary KEK tree, he can still recover the group key if he keeps his path keys. Consequently, the SDDOM scheme cannot support user joining in a cloud environment.

Suppose the user u_2 is revoked from the binary KEK tree and a new user u_9 is inserted into the position of u_2, as shown in Fig. 2. The user u_9 then securely receives the path keys PK_9 = {KEK_1, KEK_2, KEK_4, KEK_9} from the cloud server. The group becomes G' = {u_1, u_9, u_3, u_4, u_7, u_8} with KEK(G') = {KEK_2, KEK_7}. The cloud server randomly chooses a new group key GK' and encrypts it as C' = {E_K(GK')}_{K ∈ KEK(G')}. Because the revoked user u_2 can use its retained path keys PK_2 to decrypt C' and obtain GK' (since KEK_2 ∈ KEK(G')), the privacy of the users' data is violated. Therefore, the SDDOM scheme cannot support user joining in a cloud environment.


Fig. 2. User revocation and new user joining.
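To make this limitation concrete, the hypothetical helper code below (not part of either scheme's implementation) models the 8-leaf KEK tree of Figs. 1 and 2: nodes are numbered v_1..v_15 with the root at v_1 and user u_t at leaf v_{7+t}. It shows that a new user inserted at a revoked user's leaf receives exactly the same path keys, so a revoked u_2 that retains PK_2 still holds a KEK (here KEK_2) under which the new group key is encrypted.

```python
def path_key_nodes(user_index: int, num_users: int = 8) -> set:
    # KEKs held by user u_t: every node on the path from its leaf to the root.
    node = (num_users - 1) + user_index      # leaf node id: u_t sits at v_{7+t}
    nodes = set()
    while node >= 1:
        nodes.add(node)
        node //= 2                           # move to the parent node
    return nodes

def minimum_cover(group: set, num_users: int = 8) -> set:
    # Roots of the minimal set of subtrees whose leaves are exactly the group.
    covered = {(num_users - 1) + t for t in group}
    merged = True
    while merged:
        merged = False
        for node in sorted(covered):
            if node > 1 and (node ^ 1) in covered:   # sibling subtree also covered
                covered -= {node, node ^ 1}
                covered.add(node // 2)
                merged = True
                break
    return covered

pk2 = path_key_nodes(2)                      # u2 holds {KEK_1, KEK_2, KEK_4, KEK_9}
cover = minimum_cover({1, 2, 3, 4, 7, 8})    # group G of Fig. 1
assert pk2 == {1, 2, 4, 9} and cover == {2, 7}

# Revoke u2 and insert u9 at the same leaf (Fig. 2): identical path keys,
# and the cover roots for the new group are still {v2, v7}.
pk9 = path_key_nodes(2)
assert pk9 == pk2
assert cover & pk2                           # revoked u2 still holds KEK_2
```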

Fig. 3. The data deduplication model.

4. System model and formal definitions

4.1. System model

In this paper, we propose a secure and scalable data deduplication scheme with dynamic user management (DedupDUM). The DedupDUM system contains two entities: the cloud user and the cloud server. The data deduplication model is similar to SDDOM [20] and is depicted in Fig. 3. Data deduplication schemes can be categorized into file-level and block-level deduplication. The block data in a block-level scheme can be treated as file data in a file-level scheme, so the idea of a file-level scheme can easily be used to design a block-level scheme. Thus, we only consider file-level deduplication in this paper.

• Cloud user: A cloud user is an entity that wants to outsource data to the cloud server and access it later. He encrypts his data M and uploads it, together with its index information, to the cloud server. The cloud user then deletes the local copy of M to save storage space. We construct a group containing the cloud users who upload the same data; the group is used to protect data privacy.


• Cloud server: The cloud server is an entity that provides cloud storage services, including deleting redundant data and storing only a single copy of it. The cloud server manages the ownership list of the data, which contains the tags of the stored data, the identities of cloud users, and the corresponding public keys. Furthermore, the cloud server checks a cloud user's identity before he downloads the sensitive data. Finally, the cloud server is assumed to be honest-but-curious: it executes the protocol honestly but is curious about the contents of the stored data, and it should not be able to access the plaintext.

4.2. The definition of DedupDUM

Definition 3. The DedupDUM scheme DedupDUM = (Setup, Encrypt, Group-Encrypt, Group-Decrypt, Decrypt) consists of the following algorithms:









• Setup: The setup algorithm chooses a cyclic multiplicative group G with prime order p and a generator g of G. Let H: {0, 1}* → {0, 1}^n be a cryptographic hash function. The public parameters are PP = (p, g, G, H).
• Encrypt(M): The encryption algorithm takes the plaintext M as input. The cloud user runs the key generation algorithm to obtain the key K ← RCE.KeyGen(M), computes C = C^1 || C^2 || T ← RCE.Encrypt(K, M), and uses the ciphertext C to generate the tag T ← RCE.TagGen(C). In addition, the cloud user randomly chooses an element x ∈ Z_p as his secret key and computes X = g^x mod p as his public key. Finally, the algorithm outputs the ciphertext C, the tag T, and the public key X.
• Group-encrypt(C): For the ciphertext C = C^1 || C^2 || T, the cloud server first generates the group key G. Then the cloud server re-encrypts C^1 as C^1' = E_{H(G)}(C^1) and constructs C' = C^1' || C^2 || T.
• Group-decrypt(U, C'): The cloud user first uses the auxiliary information U to recover the group key G = U · X mod p. Then the cloud user uses the group key G to decrypt C^1' as C^1 = D_{H(G)}(C^1'), obtaining the ciphertext C = C^1 || C^2 || T.
• Decrypt(K, C): The cloud user runs the opening algorithm to generate the tag T ← RCE.TagGen(C) and the original data M ← RCE.Decrypt(K, C). If the decryption algorithm returns M, the cloud user accepts M; if it returns ⊥, the cloud user drops the message.

4.3. Security requirements

The security requirements of our scheme, similar to SDDOM [20], are as follows:









• Data privacy: The cloud server cannot be fully trusted and should be prevented from obtaining the plaintext. Additionally, any unauthorized cloud user who does not hold valid ownership should not obtain the plaintext.
• Data integrity: The data deduplication scheme needs to ensure tag consistency and protect the data from poison attacks. The cloud server should guarantee the correctness of the data, and the scheme should allow an authorized cloud user to verify that the data has not been tampered with.
• Backward secrecy: Before uploading the data, a cloud user should be prevented from accessing the plaintext of the outsourced data. Even though the cloud user possesses the data, he is still considered unauthorized at this point.
• Forward secrecy: Contrary to backward secrecy, after a cloud user deletes or updates the data, he no longer belongs to the group, and the cloud server should prevent him from accessing the outsourced data.
• Collusion resistance: Unauthorized cloud users who do not own the sensitive data stored in the cloud server should not be able to access the plaintext even if they collude.

5. Construction of DedupDUM scheme

5.1. Notations

Table 1 summarizes the notations used in the DedupDUM scheme.

5.2. A concrete scheme

In this subsection, we construct the DedupDUM scheme. In the cloud environment, the cloud server updates the ownership list of the cloud users frequently. Unauthorized cloud users should be prevented from accessing the data stored in the cloud server before uploading their own data or after deleting or updating the data. We assume that the cloud users in the same group upload the same data. To protect the forward and backward secrecy of the data, we apply a group key to re-encrypt it. Compared with the existing data deduplication schemes [20,39,41], our scheme neither introduces a fully trusted third party nor requires cloud users to spend abundant computing power on dynamic ownership management. Besides, the existing data deduplication schemes cannot differentiate whether a cloud user is authorized before the data is downloaded, so a malicious attacker can download ciphertexts even though he cannot decrypt them.


Table 1. Notation.

Symbol     Meaning
x ←$ X     An element x selected uniformly at random from a set X.
λ          λ ∈ N denotes the security parameter.
a || b     a concatenated with b.
T_i        The tag of data M_i.
G_i        The group key of data M_i.
ID_t       The identity of the cloud user u_t.
X_t^i      The public key of user u_t for data M_i.
U_i        The part of the group key generated by the cloud server to help cloud user u_i reconstruct the group key G_i.
V_i        The ownership list V_i = {T_i, (ID_t, X_t^i)} for data M_i, containing the tag T_i and, for each group user, the identity ID_t with the corresponding public key X_t^i stored in the cloud.

All this causes the cloud server to incur an enormous communication cost. To overcome this problem, we use an access control technique to verify whether a cloud user is in the group. Only cloud users who have the decryption ability can obtain the ciphertext. Hence, our scheme reduces the abundant communication cost. Our scheme consists of five algorithms defined as follows:



• Setup: Let G be a cyclic multiplicative group with prime order p and let g be a generator of G. The encryption algorithm E_K(M) is a symmetric encryption that takes the key K and the plaintext M as inputs and outputs the ciphertext C; the decryption algorithm D_K(C) is the corresponding symmetric decryption that takes the key K and the ciphertext C as inputs and outputs the plaintext M. The encryption algorithm E(M, pk) is the ElGamal public-key encryption that takes the public key pk and the plaintext M as inputs and outputs the ciphertext C; the decryption algorithm D(C, sk) is the ElGamal public-key decryption that takes the secret key sk and the ciphertext C as inputs and outputs the plaintext M. Let H: {0, 1}* → {0, 1}^n be a cryptographic hash function. The public parameters are PP = (p, g, G, H). Every cloud user randomly chooses a number ID_i as his identity before uploading data to the cloud server.

• Data Upload: Suppose a cloud user u_t wants to upload data M_i to the cloud server and the corresponding ownership list V_i and ciphertext C_i do not yet exist there; we call u_t the first uploader. The cloud user u_t runs the Encrypt(M_i) algorithm as follows. First, he runs the key generation algorithm to obtain the key K_i ← RCE.KeyGen(M_i), uses K_i to encrypt the plaintext M_i as C_i = C_i^1 || C_i^2 || T ← RCE.Encrypt(K_i, M_i), and uses the ciphertext C_i to generate the tag T_i ← RCE.TagGen(C_i). Second, u_t randomly chooses an element x_t^i ∈ Z_p as his secret key and computes X_t^i = g^(x_t^i) mod p as his public key. Finally, u_t sends upload || T_i || C_i || ID_t to the cloud server, sends X_t^i to the cloud server through a secure channel (e.g., SSL), and deletes the data M_i. The cloud user u_t retains only the key K_i and the corresponding secret key x_t^i to save storage.

After receiving the data, if u_t is the first uploader, the cloud server first creates V_i = {T_i}, then generates the two-tuple (ID_t, X_t^i) and inserts it into V_i. The cloud server keeps the ownership list V_i securely. To protect the data, the cloud server must prevent unauthorized cloud users from accessing the plaintext, so it re-encrypts the ciphertext C_i before distributing it. The cloud server conducts the Group-encrypt(C_i) algorithm as follows:

1. For the ciphertext C_i, the cloud server sets n_i = 1, randomly chooses an element y_{n_i} ∈ Z_p, and computes Y_{n_i} = g^(y_{n_i}) mod p.
2. The cloud server constructs the ownership group key G_i = Y_1 · X_t^i mod p. Then the cloud server re-encrypts C_i^1' = E_{H(G_i)}(C_i^1) and constructs C_i' = C_i^1' || C_i^2 || T_i.

If the ownership list V_i and ciphertext C_i already exist in the cloud server, we call the cloud user u_t a subsequent uploader. The cloud user u_t runs the Encrypt(M_i) algorithm as above: he generates the key K_i ← RCE.KeyGen(M_i), encrypts the plaintext M_i as C_i* ← RCE.Encrypt(K_i, M_i), and generates T_i ← RCE.TagGen(C_i*). (Because the encryption key L is chosen randomly, with overwhelming probability C_i* ≠ C_i; the subsequent uploader uploads the ciphertext C_i* to prevent side-channel attacks [19].) He then randomly chooses x_t^i ∈ Z_p and computes X_t^i = g^(x_t^i) mod p. Finally, u_t sends upload || T_i || C_i* || ID_t to the cloud server and sends X_t^i through a secure channel. If the received tag matches the stored tag T_i, the cloud server inserts (ID_t, X_t^i) into V_i, sets n_i = n_i + 1, and then conducts the Group-encrypt(C_i) algorithm as follows:

3. The cloud server decrypts the ciphertext C_i^1 = D_{H(G_i)}(C_i^1') with the current ownership group key G_i.
4. The cloud server uses the number n_i to randomly choose an element y_{n_i} ∈ Z_p and computes Y_{n_i} = g^(y_{n_i}) mod p. The new group key is

   G_i' = (G_i · Y_{n_i} · X_t^i) / Y_{n_i − 1} mod p.

5. The cloud server re-encrypts the ciphertext C_i^1 with the new group key G_i' and generates C_i^1' = E_{H(G_i')}(C_i^1). The final ciphertext is C_i' = C_i^1' || C_i^2 || T_i.

• Data Download: If a cloud user u_j wants to download the data M_i, he computes the tag T_i = H(K_i) and sends download || T_i || ID_j to the cloud server. The data decryption process is as follows:

1. The cloud server checks the list V_i to find the public key X_j^i that corresponds to ID_j. Then the cloud server chooses a random number r ∈ Z_p and sends R = E(r, X_j^i) to the cloud user u_j.
2. Upon receiving the ciphertext R = E(r, X_j^i), the cloud user u_j decrypts R to obtain r' = D(R, x_j^i) and sends H(r') to the cloud server.
3. The cloud server receives the message H(r') and checks whether H(r') = H(r). If H(r') = H(r), the cloud server computes the auxiliary information

   U_j = Y_{n_i} · ∏_{ID_t ∈ V_i, ID_t ≠ ID_j} X_t^i mod p

   and sends C_i' = C_i^1' || C_i^2 || T_i || U_j to the cloud user u_j; otherwise it does nothing. (Our scheme introduces an identity authentication mechanism which only allows group users to receive the ciphertext; this saves a large amount of communication cost.)
4. After passing the verification, the cloud user u_j receives C_i' from the cloud server and runs the Group-decrypt(U_j, C_i') algorithm: he first recovers the group key G_i = U_j · X_j^i mod p, then uses G_i to decrypt C_i^1 = D_{H(G_i)}(C_i^1'), obtaining the ciphertext C_i = C_i^1 || C_i^2 || T_i. Finally, the cloud user runs the Decrypt(K_i, C_i) algorithm: he uses the ciphertext C_i to compute the tag T_i ← RCE.TagGen(C_i) and uses the key K_i to recover the original data M_i ← RCE.Decrypt(K_i, C_i). If the decryption algorithm returns M_i, the cloud user accepts M_i; if it returns ⊥, the cloud user drops the message.

• Data Deletion: If a cloud user u_s wants to delete the data M_i, he sends delete || T_i || ID_s to the cloud server. The data deletion process is as follows:

1. The cloud server checks the list V_i to find the public key X_s^i that corresponds to ID_s. Then the cloud server chooses a random number r ∈ Z_p and sends R = E(r, X_s^i) to the cloud user u_s.
2. Upon receiving the ciphertext R = E(r, X_s^i), the cloud user u_s decrypts R to obtain r' = D(R, x_s^i) and sends H(r') to the cloud server.
3. The cloud server receives H(r') and checks whether H(r') = H(r).
   - If H(r') = H(r), the cloud server conducts the Group-encrypt(C_i') algorithm. First, it decrypts C_i^1 = D_{H(G_i)}(C_i^1'), sets n_i = n_i + 1, uses the new number n_i to randomly choose an element y_{n_i} ∈ Z_p, and computes Y_{n_i} = g^(y_{n_i}) mod p. The new group key is

     G_i' = (G_i · Y_{n_i}) / (Y_{n_i − 1} · X_s^i) mod p.

     Then the cloud server deletes (ID_s, X_s^i) from V_i and re-encrypts the ciphertext C_i^1' = E_{H(G_i')}(C_i^1); the final ciphertext is C_i' = C_i^1' || C_i^2 || T_i.
   - Otherwise, it stops running the algorithm.

• Data Update: If a cloud user u_s needs to update the data M_i to M_j, u_s runs the Encrypt(M_j) algorithm. First, he runs the key generation algorithm to obtain the key K_j ← RCE.KeyGen(M_j), uses K_j to compute the ciphertext C_j ← RCE.Encrypt(K_j, M_j), and uses C_j to generate T_j ← RCE.TagGen(C_j). Second, u_s randomly chooses an element x_s^j ∈ Z_p and computes X_s^j = g^(x_s^j) mod p. Finally, u_s sends update || T_i || T_j || C_j || ID_s to the cloud server and sends X_s^j through a secure channel. The cloud server then performs the following operations:

1. The cloud server checks the list V_i to find the public key X_s^i that corresponds to ID_s. Then it chooses a random number r ∈ Z_p and sends R = E(r, X_s^i) to the cloud user u_s.
2. Upon receiving the ciphertext R = E(r, X_s^i), the cloud user u_s decrypts R to obtain r' = D(R, x_s^i) and sends H(r') to the cloud server.
3. The cloud server receives H(r') and checks whether H(r') = H(r).
   - If H(r') = H(r), the cloud server conducts the Group-encrypt(C_i') algorithm. First, it decrypts C_i^1 = D_{H(G_i)}(C_i^1') and sets n_i = n_i + 1. Then it uses the new number n_i to randomly choose an element y_{n_i} ∈ Z_p and computes Y_{n_i} = g^(y_{n_i}) mod p. The new group key is

     G_i' = (G_i · Y_{n_i}) / (Y_{n_i − 1} · X_s^i) mod p.

     Then the cloud server deletes (ID_s, X_s^i) from V_i and re-encrypts C_i^1' = E_{H(G_i')}(C_i^1); the final ciphertext is C_i' = C_i^1' || C_i^2 || T_i.
   - Otherwise, it does nothing and stops running the algorithm.
4. The cloud server checks whether the tag T_j exists.
   - If T_j exists, the cloud server inserts (ID_s, X_s^j) into V_j, sets n_j = n_j + 1, and then conducts the re-encryption operations described in Data Upload, steps 3–5.
   - Otherwise, the cloud user u_s is the first uploader of M_j. The cloud server first creates V_j = {T_j}, then generates the two-tuple (ID_s, X_s^j), inserts it into V_j, sets n_j = 1, stores C_j, and conducts the re-encryption operations described in Data Upload, steps 1–2.
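The group-key algebra above can be checked numerically. The sketch below is illustrative Python with toy parameters and hypothetical helper names (it is not the paper's code): it walks through a first upload, a join, and a deletion, and verifies at each step that a current owner can rebuild G_i from the auxiliary value U_j = Y_{n_i} · ∏_{ID_t ≠ ID_j} X_t^i mod p, while a removed owner cannot.

```python
import secrets

P = 2**127 - 1     # toy prime; the evaluation in Section 7 uses a 2048-bit prime
G = 3

def keypair():
    x = secrets.randbelow(P - 2) + 1
    return x, pow(G, x, P)                 # (secret x_t^i, public X_t^i = g^x mod p)

def inv(a):
    return pow(a, -1, P)                   # modular inverse mod p

members = {}                               # ownership list: id -> (secret, public)

# First uploader u1: n_i = 1, G_i = Y_1 * X_1 mod p
x1, X1 = keypair(); members["u1"] = (x1, X1)
Y_prev = pow(G, secrets.randbelow(P - 2) + 1, P)          # Y_1
Gi = Y_prev * X1 % P

# Subsequent uploader u2 joins: G_i' = G_i * Y_2 * X_2 / Y_1 mod p
x2, X2 = keypair(); members["u2"] = (x2, X2)
Y_new = pow(G, secrets.randbelow(P - 2) + 1, P)           # Y_2
Gi = (Gi * Y_new % P) * X2 % P * inv(Y_prev) % P
Y_prev = Y_new

def aux(uid):
    # Auxiliary value U_j = Y_{n_i} * product of the other owners' public keys mod p
    U = Y_prev
    for other, (_, X) in members.items():
        if other != uid:
            U = U * X % P
    return U

for uid, (_, X) in members.items():
    assert aux(uid) * X % P == Gi          # U_j * X_j^i mod p recovers G_i

# u2 deletes its copy: G_i' = G_i * Y_3 / (Y_2 * X_2) mod p
_, X2 = members.pop("u2")
Y_new = pow(G, secrets.randbelow(P - 2) + 1, P)           # Y_3
Gi = Gi * Y_new % P * inv(Y_prev * X2 % P) % P
Y_prev = Y_new

assert aux("u1") * members["u1"][1] % P == Gi   # remaining owner still recovers G_i
assert aux("u1") * X2 % P != Gi                 # revoked u2's key no longer works
```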

6. Analysis of the proposed scheme

6.1. Security analysis

In this section, we analyze the security of our scheme with respect to the five security requirements: data privacy, data integrity, backward secrecy, forward secrecy, and collusion resistance. We assume that the underlying building blocks are secure, namely the randomized convergent encryption scheme, the symmetric encryption scheme, and the ElGamal encryption scheme under the Decisional Diffie–Hellman assumption. The security of our scheme rests on these assumptions.

Theorem 6.1. The proposed DedupDUM scheme guarantees data privacy.

Proof. In the described trust model, the cloud server is assumed to be honest-but-curious. Therefore, neither the cloud server nor unauthorized cloud users should be able to obtain the plaintext. One attack may be launched by the cloud server. The ciphertext C' = C^1' || C^2 || T is stored in the cloud server. Although the cloud server owns the group key, it is computationally infeasible for it to guess the key K because of the properties of the cryptographic hash function. Although the cloud server can decrypt C^1' and obtain the ciphertext C^1, it cannot obtain the secret key L from C^2, so it cannot decrypt C^1 and obtain the plaintext M. Another attack may be launched by an unauthorized cloud user who wants to obtain the plaintext. In our scheme, we use the ElGamal encryption scheme to realize access control [15], and any unauthorized cloud user is prevented from receiving any message about the data. This not only keeps the unauthorized cloud user from obtaining the ciphertext but also saves abundant communication cost. Even if an unauthorized cloud user obtains the ciphertext C' = C^1' || C^2 || T, he cannot obtain the group key and therefore cannot decrypt C^1', even if he obtains the key L. Therefore, our data deduplication scheme prevents the honest-but-curious cloud server and unauthorized cloud users from accessing the plaintext, and data privacy is guaranteed. □

Theorem 6.2. The proposed DedupDUM scheme guarantees data integrity.

Proof. In a data deduplication scheme, an attacker may use a correct tag together with a different ciphertext to damage data integrity. Assume the attacker and a cloud user u own the same data M. The attacker uses M' ≠ M to generate a ciphertext, then uploads this ciphertext with the tag T, pretending it encrypts M. Our scheme is based on the RCE scheme [3], in which tag inconsistency can easily be detected. When the cloud user u requests the data, the cloud server sends the ciphertext C^1' || C^2 || T, the group-key share U, and the tag T to u. The ownership group key is recovered as G = U · X mod p. The cloud user u then uses the group key G to decrypt C^1' and obtain C^1, uses the key K to recover the secret key L from C^2, decrypts C^1 to obtain the data M', and computes T' = H(P, H(P, M')). The cloud user u checks whether T' = T holds; if the tag is not consistent, he drops the message. Since tag inconsistency is easily detected, our scheme guarantees data integrity. □

Theorem 6.3. The proposed DedupDUM scheme guarantees the backward secrecy of the outsourced data.

Proof. Suppose that before a cloud user u uploads the data M, the data M already exists in the cloud server with corresponding ciphertext C' = C^1' || C^2 || T. The current group key does not include this cloud user's public key, and the cloud user does not appear in the ownership list V. Therefore, the cloud user cannot pass the identity authentication, and the cloud server ignores his requests. Since the cloud user cannot obtain the group key, he is prevented from accessing the plaintext, and backward secrecy is guaranteed. □

Theorem 6.4. The proposed DedupDUM scheme guarantees the forward secrecy of the outsourced data.

Proof. After a cloud user deletes or updates the data, the corresponding group key G and the ownership list V are updated immediately, since deletion and update can be viewed as the cloud user being revoked from the data ownership list. The cloud server changes the group key from G to G', decrypts the ciphertext C^1 = D_{H(G)}(C^1'), and generates the new ciphertext C^1' = E_{H(G')}(C^1). When the cloud user is revoked from the list, his ID and public key are deleted from the data ownership list. Therefore, the cloud user can neither pass the identity authentication nor obtain the group key, and our scheme guarantees forward secrecy. □

Theorem 6.5. The proposed DedupDUM scheme is secure against collusion attacks.

Proof. To guarantee collusion resistance, unauthorized cloud users should not be able to obtain the plaintext even if they collude. In our data deduplication scheme, a cloud user can decrypt the ciphertext and obtain the plaintext only if he possesses both the encryption key L and the group key G. The group key is generated from the valid cloud users' public keys.

Table 2. Comparison of secure deduplication schemes.

                                CE [11]   RCE [3]   SDDOM [20]   Our scheme
Encrypted data deduplication    yes       yes       yes          yes
Tag consistency                 no        yes       yes          yes
Ownership management            no        no        yes          yes
User joining                    no        no        no           yes
Access control                  no        no        no           yes

Table 3. Communication overhead.

Scheme       Upload message size               Download message size   Rekeying key size                  Access control
CE [11]      C_C + C_T + C_ID                  C_C                     –                                  –
RCE [3]      C_C + C_K + C_T + C_ID            C_C + C_K + C_T         –                                  –
SDDOM [20]   C_C + C_K + C_T + C_ID            C_C + C_K + C_T         (n − m) log(n/(n − m)) C_K         –
Our scheme   C_C + C_K + C_T + C_ID + C_X      C_C + C_K + C_T         C_X                                C_X + C_T
the cloud user passing the identity authentication, the cloud server will send the auxiliary information to the valid cloud user, which is used to help the cloud user to recover the group key. When the cloud user revocation or new user joining, the group key will be changed and the data is re-encrypted immediately. Even if the unauthorized cloud users maybe own the key K, which can be used to decrypt C2 and get the key L. Because the unauthorized cloud users cannot pass the identity authentication and get the group key G, they also cannot decrypt the ciphertext and get the plaintext. Therefore, collusion resistance is guaranteed in our scheme.  6.2. Efficiency analysis Here, similar to SDDOM [20], we take a comparison of our scheme with previous schemes by using Tables range from 2 to 6. Table 2 is a comparison among four data deduplication schemes. These four schemes are the convergent encryption scheme [11], the randomized convergent encryption scheme [3], secure data deduplication scheme with dynamic ownership management [20] and our scheme in terms of encrypted data deduplication, tag consistency, ownership management, user joining and access control. All the data deduplication schemes allow the cloud users to encrypt their data, they can prevent the unauthorized cloud users and the cloud server from accessing the plaintext and guarantee the data privacy. Convergent encryption neglects the data integrity, it’s vulnerable to the tag consistency attack. Because of the additional mechanism adopted in other schemes, other data deduplication schemes enable the cloud users to detect the tag consistency of their sensitive data. In the SDDOM scheme, the cloud server needs to preset a binary KEK tree for distributing the group key for the universe of users who own the same data, here the total number of cloud users are bounded by a maximum number. The cloud server selects root nodes of the minimum cover sets in the binary KEK tree that can cover all of the leaf nodes associated with cloud users who upload the same data. The group key is encrypted by the root nodes KEKs of the minimum cover sets. Therefore, the SDDOM scheme resolves the dynamic ownership management problem. Since the binary KEK tree is stationary, the KEK tree no longer supports any other new cloud user joining. In addition, the SDDOM scheme cannot distinguish the authorized cloud users before they download sensitive data, then anyone can download the data from the cloud server. This will cause the cloud server to spend enormous communication cost. Different from the previous schemes, we construct an ownership list for every cloud users’ data. Because the group key is generated by the cloud users’ public key, our scheme supports the cloud user revocation and new user joining. Before a cloud user download the data, we enable to check the cloud user’s identity. Only those cloud users who belong to the group and have the decrypt ability can download the data. The abundant communication overhead can be greatly reduced. The main overhead of our scheme consists of the communication overhead and the storage overhead. Table 3 presents the comparison in term of communication overhead among four schemes. CC denotes the size of the encrypted data, CID denotes the size of a cloud user’s ID, CT denotes the size of a tag, CK denotes the size of a key, CX denotes the size of a public key, n denotes the number of cloud users in the system and m denotes the number of owners in an ownership list for an encrypted data. 
To achieve a 128-bit security level, we set CK = 128 bits, CT = 128 bits and CX = 3072 bits. The SDDOM scheme and the RCE scheme have the same upload and download message sizes. In our scheme, the size of uploading message increases the size of cloud user’s public key. The additional message is used to support cloud user joining and identity authentication. With regard to the rekeying message size, because the CE scheme and RCE scheme didn’t consider key updating for the data ownership management, the SDDOM scheme and our scheme increase the size of n re-encryption key. In the SDDOM scheme, the rekeying message size is (n − m ) log n−m CK , which increase with the number


Table 4. Storage overhead.

           CE [11]   RCE [3]   SDDOM [20]        Our scheme
Key size   C_K       C_K       (log n + 1) C_K   C_X + C_K
Tag size   C_T       C_T       C_T               C_T

Fig. 4. The number of keys.

In our scheme, the rekeying message size is C_X, which does not increase with the number of users; the additional message is used to protect backward and forward secrecy. In the CE scheme the encryption key is determined by the data, and in the RCE scheme the encryption key is determined by the initial uploader and never changed, so the CE and RCE schemes suffer from security flaws with respect to ownership revocation. With regard to access control, only our scheme supports it: we use the access control technique to verify the validity of cloud users before they download data, and only when a cloud user belongs to the group and has the decryption ability will the cloud server send him the ciphertext, which saves abundant communication cost. Table 4 compares the storage overhead. In the SDDOM scheme, the cloud user needs to store log n additional KEKs to recover the group key. In our scheme, the cloud user only needs to store one additional key of size C_X to recover the group key, and the number of additional keys does not increase with the number of users. Our scheme therefore saves abundant storage space, especially when the number of cloud users is huge. The details are shown in Fig. 4.
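Using the formulas from Tables 3 and 4 and the paper's parameters (C_K = 128 bits, C_X = 3072 bits), the short script below (illustrative, not from the paper; it assumes for concreteness that half of the users own the data, i.e., m = n/2) tabulates how the SDDOM rekeying message and per-user key storage grow with n while DedupDUM's stay constant.

```python
import math

C_K = 128      # symmetric key size in bits
C_X = 3072     # public key size in bits (128-bit security level)

def sddom_rekey_bits(n: int, m: int) -> float:
    # SDDOM rekeying message: (n - m) * log2(n / (n - m)) * C_K  (Table 3)
    return (n - m) * math.log2(n / (n - m)) * C_K

def sddom_user_keys_bits(n: int) -> float:
    # SDDOM per-user key storage: (log2(n) + 1) * C_K  (Table 4)
    return (math.log2(n) + 1) * C_K

for n in (2**10, 2**16, 2**20):
    m = n // 2                               # assumption: half of the users own the data
    print(f"n={n:>8}: SDDOM rekey {sddom_rekey_bits(n, m):>12.0f} bits, "
          f"DedupDUM rekey {C_X} bits; "
          f"SDDOM keys/user {sddom_user_keys_bits(n):.0f} bits, "
          f"DedupDUM keys/user {C_X + C_K} bits")
```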

7. Performance evaluation

In this section, we compare the performance of the proposed data deduplication scheme with previous works. Each cryptographic operation is implemented using the OpenSSL library ver. 1.1.0 and the Crypto++ library ver. 5.6.5. We measure the time cost of the data encryption and decryption algorithm with AES under a 128-bit key, for data sizes ranging from 10MB to 100MB. We set C_K = 128 bits, C_T = 128 bits, and a 2048-bit prime order p. The testing environment is an Intel(R) Core(TM) i7-3770 CPU at 3.40 GHz with 16.0GB RAM, and Ubuntu v12.04.2 with 8.0GB RAM, a quad-core processor, and a 20.0GB hard disk. We first compare the SDDOM scheme with our scheme in the upload phase: both schemes need two hash operations and one data encryption operation (we do not count the time for key generation). In the download phase, both schemes need two hash operations and two decryption operations: one decryption operation uses the group key to decrypt C_i^1 ← D_{H(G_i)}(C_i^1'), and the other uses the key L to

decrypt M_i ← D_L(C_i^1). The difference is that an SDDOM user keeps log n path keys, which are used to decrypt the group key, whereas in our scheme the user only keeps the secret key x and reconstructs the group key as G = U · g^x mod p; the cloud user then uses the group key to decrypt the ciphertext. In Table 5, we assume that the size of one block is 1MB and that the encryption and decryption times per block are 8.272 ms and 9.692 ms, respectively; n denotes the number of blocks. The time for identity authentication is not included in Table 5. The computation cost of the proposed scheme is slightly higher than that of the SDDOM scheme, because our scheme uses a public-key operation to recover the group key while the SDDOM scheme uses a symmetric one. The key decryption with a 2048-bit key costs 1.096 ms in our scheme, which is slower than the 0.007 ms of the SDDOM scheme. However, decrypting 10MB of data with a 128-bit AES key already takes 97.921 ms, so the key decryption time is relatively negligible. In general, the decryption time for 10MB of data is more than 90 ms and the communication time is more than 1000 ms (at 10MB/s).

Table 5. Comparison of computation cost.

                    SDDOM [20]                          Our scheme
Operation           upload           download           upload           download
Hash (ms)           0.008            0.008              0.008            0.008
Encrypt (ms)        8.272n           –                  8.272n           –
Decrypt (ms)        –                19.384n            –                19.384n
Key decrypt (ms)    –                0.007              –                1.096
Computation (ms)    8.272n + 0.008   19.384n + 0.015    8.272n + 0.008   19.384n + 1.104

Fig. 5. Encryption and decryption time.

The identity authentication in our scheme takes only 2.072 ms, which is relatively negligible.

7.1. Computation time for encryption and decryption

The DedupDUM scheme is based on the RCE scheme. In the encryption stage, the key generation time in our scheme is longer than in the RCE scheme. However, the CE and RCE schemes do not consider ownership changes in the ownership list. Compared with previous works, the SDDOM scheme and our scheme resolve the dynamic ownership management problem; by using the rekeying mechanism, backward secrecy and forward secrecy are protected. The detailed data encryption and decryption times for different data sizes (ranging from 10MB to 100MB) are shown in Fig. 5.

7.2. Computation time for upload

For the upload time, the SDDOM scheme requires the same computations as the CE and RCE schemes. As described in Section 5, our scheme uses the cloud users' public keys to construct the group key: the cloud user randomly chooses an element x_t^i ∈ Z_p as his secret key and computes X_t^i = g^(x_t^i) mod p as his public key. Compared with the other schemes, the public key in our scheme must additionally be transmitted to the cloud server; however, compared with the ciphertext, the size of the public key is negligible. The computation time for upload is shown in Fig. 6.

7.3. Computation time for download

Compared with the other schemes, we apply an access control mechanism to verify whether a cloud user is in the group. Before a cloud user downloads the data, the cloud server encrypts a random number r with the cloud user's public key to check his identity; only the cloud user who possesses the corresponding secret key can obtain the random number r. If the cloud user cannot pass the verification, the cloud server does not transmit the ciphertext to him. Although this slightly increases the time overhead of our scheme, the overhead is so small that it is negligible, and the cloud server saves abundant communication cost. The measured computation times for downloading ciphertexts in our scheme and the other schemes are depicted in Fig. 7. The measured upload and download times are summarized in Table 6.


Fig. 6. Computation time for upload.

Fig. 7. Computation time for download.

Table 6. Upload time and download time (ms).

             CE [11]                RCE [3]                SDDOM [20]              Our scheme
Data size    upload     download    upload     download    upload     download     upload     download
10 MB        82.724     97.921      82.724     97.929      82.724     195.857      83.700     198.898
20 MB        154.964    182.764     154.964    182.772     154.964    365.543      155.940    368.584
40 MB        302.852    375.752     302.852    375.760     302.852    751.519      303.828    754.560
60 MB        456.275    535.527     456.275    535.535     456.275    1071.069     457.251    1074.110
80 MB        560.966    736.095     560.966    736.103     560.966    1472.205     561.942    1475.246
100 MB       708.595    865.168     708.595    865.176     708.595    1730.351     709.571    1733.392

8. Conclusion

In this paper, we propose a secure and scalable data deduplication scheme with dynamic user management. By using symmetric encryption to re-encrypt the sensitive data, it not only provides dynamic group user updates in a secure way but also prevents unauthorized cloud users from accessing the sensitive data. To further mitigate the communication overhead, we use the access control technique to verify the validity of cloud users before they download data: only when a cloud user is in the group and has the decryption ability will the cloud server send the ciphertext to him. Therefore,


the DedupDUM scheme can efficiently prevent the unauthorized users from accessing and downloading the sensitive data of valid users, which greatly reduces the unnecessary communication overhead. Acknowledgments This work is supported by the National Natural Science Foundation of China (Nos. 61572382, 61702402, 61772405), Key Project of Natural Science Basic Research Plan in Shannxi Province of China (No. 2016JZ021) and China 111 Project (Grant No. B16037). References [1] J. Baek, R. Safavi-Naini, W. Susilo, On the integration of public key data encryption and public key encryption with keyword search, in: Proceedings of International Conference on Information Security, Samos Island, 2006, pp. 217–232. [2] M. Bellare, S. Keelveedhi, T. Ristenpart, DupLESS: server-aided encryption for deduplicated storage, in: Proceedings of USENIX Conference on Security, Washington, 2013, pp. 179–194. [3] M. Bellare, S. Keelveedhi, T. Ristenpart, Message-locked encryption and secure deduplication, in: Proceedings of International Conference on the Theory and Applications of Cryptographic Techniques, Athens, 2013, pp. 296–312. [4] J. Bethencourt, A. Sahai, B. Waters, Ciphertext-policy attribute-based encryption, in: Proceedings of IEEE Symposium on Security and Privacy, Oakland, 2007, pp. 321–334. [5] G.R. Blakley, C. Meadows, Security of ramp schemes, in: Proceedings of CRYPTO, Santa Barbara, 1984, pp. 242–268. [6] R. Chen, Y. Mu, G. Yang, F. Guo, BL-MLE: block-level message-locked encryption for secure large file deduplication, IEEE Trans. Inf. Foren. Secur. 10 (12) (2015) 2643–2652. [7] X. Chen, J. Li, X. Huang, J. Ma, W. Lou, New publicly verifiable databases with efficient updates, IEEE Trans. Depend. Secure Comput. 12 (5) (2015) 546–556. [8] X. Chen, J. Li, J. Weng, J. Ma, W. Lou, Verifiable computation over large database with incremental updates, IEEE Trans. Comput. 65 (10) (2016) 3184–3195. [9] X. Chen, J. Li, J. Ma, Q. Tang, W. Lou, New algorithms for secure outsourcing of modular exponentiations, IEEE Trans. Parallel Distrib. Syst. 25 (9) (2014) 2386–2396. [10] M. Dong, H. Li, K. Ota, H. Zhu, HVSTO: efficient privacy preserving hybrid storage in cloud data center, in: Proceedings of IEEE INFOCOM, Toronto, 2014, pp. 529–534. [11] J.R. Douceur, A. Adya, W.J. Bolosky, D. Simon, M. Theimer, Reclaiming space from duplicate files in a serverless distributed file system, in: Proceedings of International Conference on Distributed Computing Systems, 2002, pp. 617–624. [12] Google drive, 2017, [Online], Available: http://drive.google.com/. [13] Dropbox, 2007, [Online], Available: http://www.dropbox.com/. [14] M. Dutch, Understanding data deduplication ratios, in: Proceedings of SNIA Data Management Forum, 2008, pp. 1–13. [15] T. Elgamal, A public key cryptosystem and a signature scheme based on discrete logarithms, IEEE Trans. Inf. Theory 31 (4) (1985) 469–472. [16] K. Fu, S. Kamara, T. Kohno, Key regression: enabling efficient key distribution for secure distributed storage, in: Proceedings of Network and Distributed Systems Security Symposium, San Diego, 2006. [17] J. Gantz, D. Reinsel, The digital universe in 2020: Big Data, bigger digital shadows, and biggest growth in the far east, 2012, [Online], Available: http://www.emc.com/collateral/analystreports/idc- the- digital- universe- in- 2020.pdf. [18] J. Han, W. Susilo, Y. Mu, Identity-based data storage in cloud computing, Futur. Gener. Comput. Syst, 29 (3) (2013) 673–681. [19] D. Harnik, B. Pinkas, A. 
Shulman-Peleg, Side channels in cloud services: deduplication in cloud storage, IEEE Secur. Priv. 8 (6) (2010) 40–47. [20] J. Hur, D. Koo, Y. Shin, K. Kang, Secure data deduplication with dynamic ownership management in cloud storage, IEEE Trans. Knowl. Data Eng. 28 (11) (2016) 3113–3125. [21] T. Jiang, X. Chen, Q. Wu, J. Ma, W. Susilo, W. Lou, Secure and efficient cloud data deduplication with randomized tag, IEEE Trans. Inf. Foren. Secur. 12 (3) (2017) 532–543. [22] X. Jin, L. Wei, M. Yu, N. Yu, J. Sun, Anonymous deduplication of encrypted data with proof of ownership in cloud storage, in: Proceedings of International Conference on Communications in China, 2013, pp. 224–229. [23] J. Li, X. Chen, X. Huang, S. Tang, Y. Xiang, M.M. Hassan, A. Alelaiwi, Secure distributed deduplication systems with improved reliability, IEEE Trans. Comput. 64 (12) (2015) 3569–3579. [24] J. Li, X. Chen, M. Li, J. Li, P.P.C. Lee, W. Lou, Secure deduplication with efficient and reliable convergent key management, IEEE Trans. Parallel Distrib. Syst. 25 (6) (2014) 1615–1625. [25] J. Li, Y.K. Li, X. Chen, P.P.C. Lee, W. Lou, A hybrid cloud approach for secure authorized deduplication, IEEE Trans. Parallel Distrib. Syst. 26 (5) (2015) 1206–1216. [26] J. Li, C. Qin, P.P.C. Lee, J. Li, Rekeying for encrypted deduplication storage, in: Proceedings of International Conference on Dependable Systems and Networks, Toulouse, 2016, pp. 618–629. [27] M. Li, C. Qin, J. Li, P.P.C. Lee, CDStore: toward reliable, secure, and cost-efficient cloud storage via convergent dispersal, in: Proceedings of USENIX Annual Technical Conference, Santa Clara, 2015, pp. 111–124. [28] K. Liang, J.K. Liu, D.S. Wong, W. Susilo, An efficient cloud-based revocable identity-based proxy re-encryption scheme for public clouds data sharing, in: Proceedings of European Symposium on Research in Computer Security, Wroclaw, 2014, pp. 257–272. [29] K. Liang, M.H. Au, J.K. Liu, W. Susilo, D.S. Wong, G. Yang, T.V.X. Phuong, Q. Xie, A DFA-based functional proxy re-encryption scheme for secure, IEEE Trans. Inf. Foren. Secur. 9 (10) (2014) 1667–1680. [30] J. Liu, N. Asokan, B. Pinkas, Secure deduplication of encrypted data without additional independent servers, in: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, 2015, pp. 874–885. [31] D.T. Meyer, W.J. Bolosky, A study of practical deduplication, in: Proceedings of the USENIX Conference on File and Storage Technologies, San Jose, 2011, pp. 1–13. [32] Mozy, 2018, [Online], Available: http://www.mozy.com/. [33] M. Mulazzani, S. Schrittwieser, M. Leithner, M. Huber, E. Weippl, Dark clouds on the horizon: using cloud storage as attack vector and online slack space, in: Proceedings of the USENIX Security Symposium, San Francisco, 2011. [34] N. Park, D.J. Lilja, Characterizing datasets for data deduplication in backup applications, in: Proceeding of IEEE International Symposium on Workload Characterization(IISWC), Atlanta, 2010, pp. 1–10. [35] Y. Shin, K. Kim, Equality predicate encryption for secure data deduplication, in: Proceedings of Proc. Conf. Inf. Security Cryptol, 2012, pp. 64–70. [36] H. Tang, Y. Cui, C. Guan, J. Wu, J. Weng, K. Ren, Enabling ciphertext deduplication for secure cloud storage and access control, in: Proceeding of the 11th ACM on Asia Conference on Computer and Communications Security, 2016, pp. 59–70. [37] J. Wang, X. Chen, Efficient and secure storage for outsourced data: a survey, Data Sci. Eng. 1 (3) (2016) 178–188. [38] J. Wang, X. Chen, X. 
Huang, I. You, Y. Xiang, Verifiable auditing for outsourced database in cloud computing, IEEE Trans. Comput. 64 (11) (2015) 3293–3303.


[39] M. Wen, K. Ota, H. Li, J. Lei, C. Gu, Z. Su, Secure data deduplication with reliable key management for dynamic updates in CPSS, IEEE Trans. Comput. Social Syst. 2 (4) (2015) 137–147. [40] Wuala, 2015, [Online], Available: http://www.wuala.com/. [41] Z. Yan, W. Ding, X. Yu, H. Zhu, R.H. Deng, Deduplication on encrypted big data in cloud, IEEE Trans. Big Data 2 (2) (2016) 138–150. [42] M. Zhou, Y. Mu, W. Susilo, J. Yan, L. Dong, Privacy enhanced data outsourcing in the cloud, J. Netw. Comput. Appl. 35 (4) (2012) 1367–1373. [43] X. Zhang, K.C. Li, A. Castiglione, X. Chen, New publicly verifiable computation for batch matrix multiplication, Inf. Sci. (2017). [44] Z. Zhang, T. Jiang, J. Li, X. Tao, J. Ma, HVDB: a hierarchical verifiable database scheme with scalable updates, J. Ambient Intell. Hum. Comput. (2018) 1868–5145.