Pervasive and Mobile Computing
Contents lists available at ScienceDirect. Journal homepage: www.elsevier.com/locate/pmc
Secure multi-server-aided data deduplication in cloud computing

Meixia Miao a, Jianfeng Wang b, Hui Li b, Xiaofeng Chen b,∗

a School of Entrepreneurship, Xi'an International University, Xi'an 710077, PR China
b State Key Laboratory of Integrated Service Networks (ISN), Xidian University, Xi'an 710071, PR China
Article history: Available online xxxx

Keywords: Data deduplication; Server-aided; Brute-force attack; Verifiable secret sharing
Abstract

Cloud computing enables on-demand, ubiquitous access to a centralized pool of configurable resources such as networks, applications, and services. This motivates a huge number of enterprises and individual users to outsource their data to cloud servers, and as a result the data volume held by cloud servers is growing extremely fast. How to efficiently manage this ever-increasing data is a new challenge in cloud computing. Recently, secure deduplication techniques have attracted considerable interest in both the academic and industrial communities: they not only optimize the usage of storage and network bandwidth at the cloud storage provider, but also reduce the storage cost for users. Although convergent encryption has been extensively adopted for secure deduplication, it inevitably suffers from off-line brute-force dictionary attacks, since messages are usually predictable in practice. To address this weakness, the notion of DupLESS was proposed, in which the user generates the convergent key with the help of a key server. We argue that DupLESS fails when the key server is corrupted by the cloud server. In this paper, we propose a new multi-server-aided deduplication scheme based on threshold blind signatures, which can effectively resist collusion between the cloud server and a subset of the key servers. Furthermore, we prove that our construction achieves the desired security properties. © 2015 Elsevier B.V. All rights reserved.
1. Introduction

Cloud computing, the long-dreamed vision of computing as a utility, offers plenty of benefits for real-world applications, such as on-demand self-service, ubiquitous network access, location-independent resource pooling, rapid resource elasticity, usage-based pricing, and outsourced computation. With the rapid advances of cloud computing, plenty of enterprises and individual users outsource their sensitive data to cloud storage providers (e.g. Dropbox [1], Google Drive [2]), where they can enjoy high-quality data storage and computing services in a ubiquitous manner while reducing the burden of data storage and maintenance. As a result, the data volume at cloud storage providers (CSPs) has grown rapidly in recent years, especially now that we have entered the era of big data. According to an IDC analysis [3], the volume of data in the world is expected to reach 40 trillion gigabytes by 2020. Therefore, one critical challenge for a CSP is how to efficiently manage this ever-increasing data. As a promising primitive, deduplication [4] has attracted more and more attention from both the academic and industrial communities. Data deduplication is a specialized data compression technique for eliminating duplicate copies of repeating
∗ Corresponding author. E-mail address: [email protected] (X. Chen).
http://dx.doi.org/10.1016/j.pmcj.2015.03.002
1574-1192/© 2015 Elsevier B.V. All rights reserved.
data. This technique is used to improve storage utilization and can also be applied to network data transfers to reduce the number of bytes that must be sent [5]. In the process of deduplication, the CSP keeps only one copy of each identical data file. More specifically, the CSP stores a data file upon receiving the first upload request; for each subsequent upload request, the uploading user is simply given a link to the original data copy. This ensures that each data file is stored only once on the server. In scenarios of high data redundancy, deduplication can effectively reduce data storage space and communication overhead. Despite these tremendous benefits, deduplication of sensitive data also raises new security challenges. Specifically, to protect the confidentiality of outsourced data, the data must be encrypted before outsourcing. However, conventional encryption requires different users to encrypt identical data with their own keys, which yields different ciphertexts for the same plaintext and makes cross-user deduplication impossible. To tackle this incompatibility, Douceur et al. [6] first introduced the notion of convergent encryption, which uses a convergent key derived from the cryptographic hash of the data content to encrypt/decrypt the data copy. That is, given a data file F, the user first computes the convergent key K = H(F), where H(·) is a one-way collision-resistant hash function, and then encrypts F to obtain the ciphertext C = E(K, F). Since E is a deterministic symmetric encryption scheme, all users with identical data generate the identical convergent key and ciphertext, which allows the CSP to perform deduplication over ciphertexts in the cross-user setting. However, convergent encryption is vulnerable to off-line brute-force attacks.
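As a concrete illustration, the convergent-encryption workflow and the resulting off-line dictionary attack can be sketched in Python. The XOR keystream cipher below is only a stand-in for the deterministic symmetric scheme E, and the sample messages are hypothetical; this is not production cryptography.

```python
import hashlib

def convergent_key(data: bytes) -> bytes:
    """K = H(F): the key is derived from the file content itself."""
    return hashlib.sha256(data).digest()

def ce_encrypt(key: bytes, data: bytes) -> bytes:
    """Deterministic toy cipher: XOR with a SHA-256-based keystream.
    Stands in for the deterministic scheme E; decryption is the same operation."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, 'big')).digest()
        counter += 1
    return bytes(d ^ k for d, k in zip(data, stream))

# Two users holding the same file produce the same ciphertext -> dedup works.
f = b"quarterly-report contents"
c1 = ce_encrypt(convergent_key(f), f)
c2 = ce_encrypt(convergent_key(f), f)
assert c1 == c2

# Off-line brute force: if F comes from a small predictable set, an attacker
# holding only the ciphertext recovers F by re-encrypting every candidate.
candidates = [b"salary: 1000", b"salary: 2000", b"quarterly-report contents"]
recovered = next(m for m in candidates
                 if ce_encrypt(convergent_key(m), m) == c1)
assert recovered == f
```

The attack needs no interaction with any server, which is precisely why the paper introduces server-aided key generation.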
The reason is that the plaintext space of a given ciphertext C is often not large enough (the message is often predictable [7]). Hence, an attacker can learn the underlying plaintext by encrypting all possible plaintexts off-line (note that the encryption scheme E is deterministic and the convergent key K depends only on the data file F). To address this issue, Bellare et al. [7] designed a more secure deduplication system called DupLESS. In DupLESS, the user generates his convergent key with the aid of a key server (KS); the convergent key incorporates the private key of the KS through a blind signature protocol run between the user and the KS. We argue that the DupLESS system fails when the CSP colludes with the KS, because the attacker then obtains the ciphertext and the convergent key simultaneously. Recently, Duan [8] presented a novel distributed encrypted deduplication scheme. In their construction, before uploading a file, a user generates the convergent key using a threshold signature technique with the assistance of other users. Moreover, a trusted dealer must be included in their scheme to distribute key shares to users. We argue that this trusted dealer plays a role similar to the key server in DupLESS and is thus equally vulnerable to a single point of failure. To the best of our knowledge, no existing deduplication scheme can fully resist the brute-force attack.

1.1. Contributions

In this paper, we present a solution for designing a deduplication scheme that can resist the brute-force attack. Motivated by the goal of reducing the trust assumption on the KS, we propose a multi-key-server deduplication scheme based on the idea of threshold blind signatures. Our contributions are twofold:
• We propose a new multi-server-aided deduplication scheme based on a decentralized threshold blind signature, in which each user generates the convergent key by interacting with multiple key servers. Furthermore, no partial set of key servers can acquire knowledge of the secret key distributed among all key servers.
• Security analysis shows that the proposed scheme is secure with respect to the proposed security model, and resists brute-force attacks even if a limited number of key servers are corrupted.

1.2. Related work

Secure data deduplication. Data deduplication has been an active research area in data storage for several years; it saves network bandwidth and storage space by eliminating duplicate copies of data. However, traditional deduplication techniques [9–11] focused on basic methods and compression ratios, and did not consider data privacy. To this end, Douceur et al. [6] first introduced the notion of convergent encryption, which ensures data confidentiality while supporting deduplication. Driven by the ever-increasing volume of data in cloud computing, plenty of research on deduplication over encrypted data has appeared recently [12,4,7,6,13–17]. Bellare et al. [4] formalized this primitive as message-locked encryption and explored its application to space-efficient secure outsourced storage. Stanek et al. [16] proposed a novel deduplication encryption scheme that provides different security levels for data files according to their popularity (roughly, a file is more popular if more users share it). In this way, they achieve a more fine-grained trade-off between storage efficiency and data security for the outsourced data. To better protect the confidentiality of outsourced data, Li et al. [14] proposed a new authorized data deduplication scheme in a hybrid cloud architecture.
In their construction, each user can only perform deduplication on the files matching his privileges. Yuan et al. [18] presented a new deduplication scheme that simultaneously supports efficient and secure data integrity auditing and data deduplication. We argue that none of the above solutions can resist the brute-force attack. As a first attempt, Bellare et al. [7] introduced the DupLESS system, which partially addresses the problem by adding a key server (KS). That is, the user generates the convergent key by
performing a blind signature protocol under the secret key of the KS. This implies that, at worst, DupLESS achieves the same security as MLE [4]. Nevertheless, this solution still fails when the cloud server colludes with the key server.

Proof of ownership. Halevi et al. [19] first introduced the notion of proofs of ownership (PoW) to ensure data privacy and confidentiality in client-side deduplication. With PoW, a user can efficiently prove to the cloud storage server that he indeed owns a file without uploading the file itself. They presented three concrete PoW constructions based on a Merkle hash tree (MHT) built over the content of the data file. Specifically, a challenge/response protocol is run between server and client: each time, the server requires the client to return a valid verification object for a requested subset of MHT leaf nodes.1 Using PoW, cheating attacks by malicious users can be prevented in client-side deduplication. Pietro et al. [20] proposed an efficient PoW scheme that takes the projection of a file onto some randomly selected bit positions as the file proof, requiring only constant computational cost. Recently, Blasco et al. [21] presented a novel PoW scheme based on Bloom filters, which is more efficient on both the server and the client side.

1.3. Organization

The rest of the paper is organized as follows. We present some preliminaries in Section 2. In Section 3, we present the system and threat model of the proposed deduplication scheme. The detailed constructions and their security analysis are presented in Section 4. The performance evaluation of the construction is given in Section 5. Finally, the conclusion is given in Section 6.

2. Preliminaries

2.1. Bilinear pairings

Let G1, G2 be cyclic groups of prime order p, g be a generator of G1, and e : G1 × G1 → G2 be a map with the following properties.
1. Bilinearity: e(g^a, g^b) = e(g, g)^{ab} for all a, b ∈ Zp.
2.
Non-degeneracy: there exist x, y ∈ G1 such that e(x, y) ≠ 1.
3. Computability: for all x, y ∈ G1, e(x, y) can be computed efficiently.

2.2. Gap Diffie–Hellman (GDH) groups

Let G be a cyclic multiplicative group generated by g with prime order q. We focus on the following problems in G:
1. Discrete Logarithm Problem (DLP): given h, g ∈ G, find an integer a ∈ Z*_q such that h = g^a, whenever such an integer exists.
2. Computational Diffie–Hellman Problem (CDHP): given a triple (g, g^a, g^b) for a, b ∈ Z*_q, compute g^{ab}.
3. Decisional Diffie–Hellman Problem (DDHP): given a quadruple (g, g^a, g^b, g^c) for a, b, c ∈ Z*_q, decide whether c ≡ ab mod q.
We call G a GDH group if the DDHP can be solved in polynomial time, but no polynomial-time algorithm can solve the CDHP with non-negligible probability.

2.3. Verifiable secret sharing

The concept of secret sharing was introduced by Shamir [22]. It allows a dealer to split a secret into different shares, each assigned to a participant, such that only more than a certain number (the threshold) of participants can recover the secret. However, Shamir's secret sharing scheme requires a dealer to distribute the secret to the participants. As an extension of secret sharing, Chor et al. [23] proposed the first verifiable secret sharing (VSS) scheme. Unlike conventional secret sharing, both the dealer and the participants can verify that the shares of the secret have been correctly distributed, which resists cheating by participants. Pedersen [24] presented a non-interactive VSS scheme in which each participant acts as a dealer, choosing a secret and distributing it verifiably to the other participants. No one can obtain any information about the secret key unless all participants cooperate with each other.

2.4. Threshold blind signature

Blind signatures were first introduced by Chaum [25] and can be used to ensure the anonymity of users, e.g. in electronic voting systems. In such a scenario, a user can have a signer sign any message with the signer's secret key, while the signer learns neither
1 All leaf nodes together constitute the data file.
Fig. 1. Architecture of secure data deduplication.
any plaintext information about the signed message nor the resulting signature. However, a traditional blind signature scheme relies on a single signer; a dishonest signer can be corrupted by attackers, presenting a single point of failure. To address this issue, the idea of threshold cryptography is introduced, in which the signature is produced by a group of signers instead of a single one. Unlike a traditional signature scheme, in a threshold signature scheme the secret key is shared among all signers, and any set of at least t (the threshold) signers can produce a valid signature. Specifically, it satisfies the following two properties:
• Any t signers can issue a valid blind signature.
• Any t − 1 signers cannot produce a valid blind signature.

2.5. Convergent encryption

Convergent encryption [6] provides data confidentiality in deduplication. To achieve deduplication in the cross-user scenario, users derive the convergent key for an identical file in a consistent manner. Moreover, each user derives a tag for the data file, which is used to detect duplicates. Note that CE can only achieve a weak notion of security, called privacy against chosen-distribution attacks [4] (this is very similar to PRIV security in [26]). A convergent encryption scheme consists of the following four algorithms:
• KeyGen_CE(F) → K_CE is the key generation algorithm. It takes the data file F as input and outputs the convergent key K_CE.
• Enc_{K_CE}(F) → C is the symmetric encryption algorithm. It takes both the convergent key K_CE and the data file F as inputs and outputs a ciphertext C.
• Tag_CE(F) → T_F is the tag generation algorithm. It takes the data file F as input and outputs a file tag T_F.
• Dec_{K_CE}(C) → F is the decryption algorithm. It takes both the ciphertext C and the convergent key K_CE as inputs and outputs the original data file F.

3. Problem formulation

3.1. System model

In this work, we consider a cloud data-outsourcing system supporting deduplication, which consists of three different entities: the user, the storage cloud service provider (S-CSP), and the key cloud service providers (K-CSPs), as elaborated below.
• User. Before storing data, the user first checks whether the uploaded data is a duplicate. If so, the upload is canceled and a link to the existing data is assigned to the user.
• S-CSP. The S-CSP is responsible for storing users' outsourced data. To reduce storage cost, the S-CSP eliminates redundant data via deduplication and keeps only unique data.
• K-CSPs. The K-CSPs work as semi-honest third parties that aid in deriving the convergent key. In our deduplication system, we consider a group of K-CSPs that share a secret key used to generate the convergent keys of data. In addition, each data file is associated with a tag for duplicate checking, which is stored at the S-CSP. The architecture of the proposed data deduplication scheme is shown in Fig. 1.
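The S-CSP's tag-based duplicate check described above can be sketched as follows. This is a minimal illustration of the bookkeeping only; the class and method names are our own, and tags here are opaque byte strings.

```python
class SCSP:
    """Minimal sketch of the S-CSP's tag-based duplicate check (illustrative only)."""
    def __init__(self):
        self.store = {}    # tag -> ciphertext (one stored copy per unique file)
        self.owners = {}   # tag -> set of user ids holding a pointer

    def upload(self, user_id, tag, fetch_ciphertext):
        """fetch_ciphertext() simulates the actual data transfer; it is invoked
        only for the first upload of a given tag."""
        if tag not in self.store:
            self.store[tag] = fetch_ciphertext()          # first uploader: store data
        self.owners.setdefault(tag, set()).add(user_id)   # later uploaders: pointer only
        return tag                                        # the tag doubles as the pointer

s = SCSP()
uploads = []
def fetch():
    uploads.append(1)                  # count how many real transfers happen
    return b"ciphertext-of-F"

s.upload("alice", b"tag-of-F", fetch)
s.upload("bob", b"tag-of-F", fetch)    # duplicate: no second transfer occurs
```

After both uploads only one ciphertext is stored, while both users hold a pointer to it.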
3.2. Threat model and security goals

We assume that both the S-CSP and the K-CSPs are ''honest-but-curious'' servers: they honestly follow the proposed protocol, but try to find out as much secret information as possible from what they hold [27]. Users may try to access data either within or beyond the scope of their privileges. Based on these assumptions, two types of attackers are considered in our deduplication system: (i) an external attacker, an entity who may obtain some knowledge of a data file of interest via public channels; this kind of attacker includes revoked and unauthorized users who want to obtain secret information beyond their scope; (ii) an internal attacker, i.e. the S-CSP or any of the K-CSPs, whose goal is to extract useful information about the outsourced encrypted data or the convergent keys. In addition, we consider collusion between the S-CSP and the K-CSPs. Here, we require that the number of compromised K-CSPs be less than the predefined threshold t when an (n, t)-threshold blind signature scheme is used. This means that even if the S-CSP colludes with t − 1 K-CSPs, the convergent key of an unpredictable message cannot be guessed by a brute-force attack. We aim to achieve the following two security goals:
• Security of the convergent key: no attacker can obtain any useful information about the convergent key, even if he corrupts up to t − 1 K-CSPs. Moreover, these K-CSPs are also allowed to collude with the S-CSP and with users. The goal of the adversary is to retrieve and recover the convergent keys of files that do not belong to him.
• Data confidentiality: the encrypted data must be semantically secure when the underlying files are unpredictable. Namely, no attacker can obtain plaintext information about the encrypted data unless he obtains the convergent key.

4. Multi-server-aided deduplication scheme

In this section, we first present a basic solution that resists brute-force attacks in deduplication and point out its limitation. Then we give an improved approach, which preserves the secrecy of users' data while preventing brute-force attacks.

4.1. The basic construction

We first introduce some notation. Let G1 be a GDH group with generator g, H : {0, 1}* → G1 a hash function, and e : G1 × G1 → G2 a bilinear map. Without loss of generality, we denote by {KS_i}_{i=1,...,n} the set of n K-CSPs, by P the system public key (resp. by S the master secret key), and by P_i the public share of KS_i (resp. by S_i its corresponding secret share). The details of the proposed scheme are given as follows.

System setup. The system setup phase initializes the necessary parameters. Specifically, all the K-CSPs collaborate to generate the system public/secret key pair (P, S) and the public/secret key pair (P_i, S_i) of each KS_i. Each KS_i performs the following operations:

• KS_i randomly chooses a_0^i ∈ Z*_q and computes g^{a_0^i}; he publishes g^{a_0^i} and keeps a_0^i secret.
• KS_i randomly picks a degree-(t − 1) polynomial F_i[x] such that F_i[0] = a_0^i. Let F_i[x] = a_0^i + a_1^i x + ··· + a_{t−1}^i x^{t−1}, with a_j^i ∈ Zq for j = 0, ..., t − 1.
• KS_i computes and publishes g^{a_j^i} for j = 1, ..., t − 1. Then he sends F_i[j] secretly to every other KS_j, j = 1, ..., n, j ≠ i.
• On receiving F_j[i] from KS_j, j = 1, ..., n, j ≠ i, KS_i verifies F_j[i] by checking

g^{F_j[i]} = ∏_{k=0}^{t−1} (g^{a_k^j})^{i^k}.

Here, we assume that all K-CSPs' checks succeed.
• KS_i computes his own secret share

S_i = Σ_{k=1}^{n} F_k[i]

and publishes his public share P_i = g^{S_i}. The system public key is then published as

P = ∏_{i=1}^{n} g^{a_0^i}.

Remark 1. Note that the system public key is P = g^S, where S = Σ_{i=1}^{n} a_0^i
is the master secret key. Clearly, S never appears explicitly in the protocol; no one can obtain any useful information about it unless all the K-CSPs collude and pool their secret values a_0^i.

File upload. To upload a file F, the user performs the deduplication operation with the aid of the K-CSPs. Specifically, the user computes the hash value H(F) of the file, then interacts with t K-CSPs to run the threshold blind signature protocol and obtain the convergent key K_CE. The tag of each file is T_F = Tag_CE(F) = H(Enc_{K_CE}(F)). The user sends the tag T_F to the S-CSP for the duplicate check. If a duplicate is found, a corresponding pointer is assigned to the user; otherwise, the file is stored at the S-CSP. The details are as follows:
• The user randomly chooses r ∈ Z*_q and computes F′ = H(F)^r, then sends F′ together with W_i = ∏_{j∈C, j≠i} j/(j − i) to each KS_i, i ∈ C.2
• On receiving F′, each KS_i computes a partial signature σ_i = (F′)^{S_i W_i} and sends it back to the user.
• Upon receiving all partial signatures σ_i, the user checks the equation e(σ_i, g) = e((F′)^{W_i}, P_i). If it holds, the user computes the blind signature

σ = (∏_{i∈C} σ_i)^{r^{−1}},

where σ is the convergent key K_CE. Then he checks the correctness of σ with the equation e(σ, g) = e(H(F), P).
• The user encrypts the file F with a conventional symmetric encryption scheme, C = Enc_{K_CE}(F), then computes the file tag T_F = H(C) and sends T_F to the S-CSP.
• On receiving the file tag T_F, the S-CSP performs the duplicate check. If the tag T_F is not matched, the S-CSP stores the ciphertext C and the tag T_F; otherwise, the S-CSP assigns the user a pointer to file F, without the file being uploaded.

File download. Suppose the user wants to download the file F. The user fetches C from the S-CSP using the corresponding pointer, then decrypts it with the convergent key.

Remark 2. In the file upload phase, the correctness of the convergent key K_CE can be verified with the following two equations. On the one hand, the blind signature σ can be expanded into the following form:
σ = (∏_{i∈C} σ_i)^{r^{−1}}
  = (∏_{i∈C} (F′)^{S_i W_i})^{r^{−1}}
  = ((F′)^{Σ_{i∈C} S_i ∏_{j∈C, j≠i} j/(j−i)})^{r^{−1}}
  = ((H(F))^{rS})^{r^{−1}}
  = H(F)^S.    (1)

On the other hand, the blind signature σ can be verified as follows:

e(σ, g) = e(H(F)^S, g) = e(H(F), g^S) = e(H(F), P).    (2)
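The algebra of the threshold blind signing and unblinding above can be simulated end to end in a plain prime-order subgroup of Z*_p; the pairings are only needed for the public verifiability checks, which we omit here. All parameters below are toy-sized, and the VSS distribution of shares is replaced by a simulated dealer, so this is a sketch of the algebra, not a secure implementation.

```python
import hashlib, random

# Toy group: q is the prime subgroup order, p = 6q + 1 a prime, g a generator
# of the order-q subgroup (a real deployment would use a pairing-friendly curve).
q = 101
p = 607
g = pow(3, (p - 1) // q, p)

def H(data: bytes) -> int:
    """Hash a file into the order-q subgroup (stand-in for H: {0,1}* -> G1)."""
    e = int.from_bytes(hashlib.sha256(data).digest(), 'big') % q
    return pow(g, e + 1, p)  # +1 avoids the identity element

n, t = 5, 3
# Each KS_i contributes a_0^i; the master key S is their sum mod q.
contrib = [random.randrange(1, q) for _ in range(n)]
S = sum(contrib) % q

# Shamir-share S with a degree t-1 polynomial so any t servers can sign
# (in the paper this sharing emerges from the joint VSS, not a dealer).
poly = [S] + [random.randrange(q) for _ in range(t - 1)]
shares = {i: sum(c * pow(i, k, q) for k, c in enumerate(poly)) % q
          for i in range(1, n + 1)}

def lagrange_at_zero(i, ids):
    """Lagrange coefficient W_i = prod_{j != i} j/(j - i) mod q."""
    num = den = 1
    for j in ids:
        if j != i:
            num = num * j % q
            den = den * (j - i) % q
    return num * pow(den, q - 2, q) % q  # division via Fermat inverse

file_data = b"hello deduplication"
r = random.randrange(1, q)
F_blind = pow(H(file_data), r, p)          # user blinds H(F) as F' = H(F)^r

ids = [1, 2, 4]                            # any t participating servers
partials = {i: pow(F_blind, shares[i] * lagrange_at_zero(i, ids) % q, p)
            for i in ids}                  # sigma_i = (F')^{S_i W_i}

sigma = 1
for s_i in partials.values():
    sigma = sigma * s_i % p                # product of partial signatures
K_CE = pow(sigma, pow(r, q - 2, q), p)     # unblind with r^{-1}: equals H(F)^S
```

Running the sketch, `K_CE == pow(H(file_data), S, p)` holds for any choice of r and of the t participating servers, mirroring Eq. (1).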
4.2. An enhanced construction with data confidentiality

In [28], Harnik et al. pointed out some security weaknesses of client-side deduplication. One threat is that the content of a data file stored on the server can be revealed to an attacker who knows only its hash value. Specifically, anyone who knows the hash value of a data file F can generate the corresponding file tag T_F and send it to the storage server to perform the deduplication operation; on receiving T_F, the storage server assigns the user a pointer to F. Thus, merely by learning the hash value of F, a user can obtain the entire file. To overcome this weakness, the notion of
2 Here we denote the t participating K-CSPs by C = {KS_i | 1 ≤ i ≤ t}.
Proof of Ownership (PoW) [19] is introduced, which requires the user to prove ownership of a given data file to the cloud server. It prevents malicious users from abusing client-side deduplication. More precisely, PoW is an interactive protocol (denoted PoW) run between the user and the server. The cloud derives a short value φ(F) from each stored data file F (longer than its hash value). To prove ownership of F, the user sends a value φ′ to the server; if φ′ = φ(F) holds, the user is assigned a pointer to F without uploading it. Based on the above observations, we present an enhanced construction with data confidentiality by combining an efficient PoW scheme (bf-PoW [21]) with our basic construction. Our main idea is to check a user's ownership of a file by running a PoW protocol between the user and the server, which effectively protects the confidentiality of the outsourced data. For convenience, we introduce the following notation: H_1 and H_2 are two cryptographic hash functions and P is a pseudorandom function. A is an index in which each entry is a 3-tuple for a single file: A_F = (T_F, BF_F, ID_F), where T_F is the tag of file F, BF_F is a Bloom filter for F, and ID_F is a list of identifiers of the users that own F. The details of the enhanced construction are as follows.

System setup. The system setup phase is similar to that of the basic construction, but involves an additional step that initializes the PoW protocol. We denote by PoW_F a PoW protocol for file F.
• The necessary system parameters are initialized in the same way as in the basic construction, i.e. the system public/secret key pair (P, S) and the public/secret key pair (P_i, S_i) of each KS_i.
• The PoW protocol is initialized. Specifically, the parameters of the Bloom filter are chosen and the index A is generated.

File upload. Suppose a user uploads a file F. He first computes the file hash H_1(F) and obtains the convergent key K_CE by interacting with t K-CSPs to run the threshold blind signature protocol. Then he computes the file tag T_F = Tag_CE(F) = H(Enc_{K_CE}(F)) and sends it to the S-CSP for the deduplication check.
• If the tag T_F is not found, the user splits F into a set of equal-sized blocks {B_i} and computes C = Enc_{K_CE}(F). For each block B_i, the user performs the following operations: (1) computes the block token T_{B_i} = H_2(B_i) and a pseudorandom value E_{B_i} = P(T_{B_i}, i); (2) inserts E_{B_i} into the Bloom filter BF_F; (3) adds the user's identity to ID_F, i.e. ID_F = ID_F ∪ {ID_user}. Finally, (T_F, BF_F, ID_F) is inserted into the index A and the ciphertext C is stored at the S-CSP.
• If a duplicate of T_F is found, the S-CSP randomly chooses q blocks and sends their identities {B_j} to the user. Upon receiving this challenge set, the user computes the token T_j = H_2(B_j) for each challenged index j and sends {T_j} back to the S-CSP. The S-CSP then performs the following operations: (1) for the q chosen blocks, computes E_{B_j} = P(T_{B_j}, j) from the returned {T_j}; (2) checks whether every E_{B_j} belongs to BF_F; if so, the user's identity is added to ID_F and a corresponding pointer is assigned to the user; otherwise, the process is aborted.

File download. Suppose the user wants to download the file F. The user fetches C from the S-CSP with his own pointer, then decrypts it with the convergent key.

4.3. Security analysis

Our system is designed to resist the off-line brute-force attack in secure deduplication. We analyze security in two respects: the security of the convergent key and the confidentiality of the data. Some basic cryptographic tools are applied in our construction to realize the deduplication protocol. We assume that the underlying building blocks, namely the convergent encryption scheme, the threshold blind signature, and the PoW scheme, are secure, and analyze security under these assumptions.

Confidentiality of outsourced data. In our construction, the data file is encrypted with convergent encryption before being outsourced to the S-CSP.
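The block-level ownership check of the enhanced upload protocol, on which the confidentiality argument below relies, can be sketched as follows. The Bloom-filter parameters, the HMAC instantiation of the pseudorandom function P, and all names here are illustrative assumptions rather than the exact bf-PoW [21] construction.

```python
import hashlib, hmac

class BloomFilter:
    """Tiny Bloom filter with assumed parameters m (bits) and k (hash functions)."""
    def __init__(self, m=1024, k=4):
        self.m, self.k, self.bits = m, k, bytearray(m)

    def _positions(self, item: bytes):
        for idx in range(self.k):
            h = hashlib.sha256(bytes([idx]) + item).digest()
            yield int.from_bytes(h, 'big') % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def contains(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

PRF_KEY = b'illustrative server-side PRF key'
def prf(block_token: bytes, i: int) -> bytes:
    """E_{B_i} = P(T_{B_i}, i), instantiated here with HMAC-SHA256 (an assumption)."""
    return hmac.new(PRF_KEY, block_token + i.to_bytes(4, 'big'),
                    hashlib.sha256).digest()

# First upload of F: the server inserts E_{B_i} for every block into BF_F.
blocks = [b'block-0', b'block-1', b'block-2', b'block-3']
bf_F = BloomFilter()
for i, b in enumerate(blocks):
    bf_F.add(prf(hashlib.sha256(b).digest(), i))

# Later ownership claim: the server challenges block indices, the user answers
# with tokens T_{B_j} = H2(B_j), and the server recomputes E_{B_j} against BF_F.
challenge = [1, 3]
answers = {j: hashlib.sha256(blocks[j]).digest() for j in challenge}
owns_file = all(bf_F.contains(prf(tok, j)) for j, tok in answers.items())
```

An honest owner always passes (owns_file is True); a cheater who cannot produce the block tokens fails except with the Bloom filter's small false-positive probability.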
Thus, data confidentiality holds as long as the adversary can neither obtain the secret key used in convergent encryption nor break the convergent encryption itself. Traditional convergent encryption cannot achieve semantic security, as it is inherently subject to brute-force attacks that recover files drawn from a known set. In contrast, the convergent key in our construction is not determined by the data content alone: it also depends on the secret key of the K-CSPs, which is unknown to the attacker. Thus, as long as the attacker cannot collude with at least t (the threshold) K-CSPs, our scheme achieves semantic security. Another attack is that a dishonest user attempts to convince the S-CSP that he owns some data file stored there. Although the attacker may obtain the tag of the file via a public channel, he must run a correct PoW protocol with the S-CSP in order to download the data file, and a user who does not own the file cannot produce the auxiliary values required by the PoW. Thus, by the security of the PoW scheme, this attack is prevented.

Security of the convergent key. In our construction, the core technique for improving security is to introduce the threshold blind signature [29] into the generation of the convergent key. It has been proven in [29] that this scheme achieves blindness, robustness, and unforgeability simultaneously. This means that even an adversary who corrupts up to t − 1 K-CSPs cannot produce a valid signature on a message (i.e. on H(F)). In other words, the security of the convergent key is guaranteed as long as fewer than the threshold number of K-CSPs are corrupted. Notice that the secret key shared by all K-CSPs is leaked if and only if all K-CSPs are corrupted. Without the secret key, the adversary can only perform
Fig. 2. Efficiency analysis of the proposed scheme: (a) time cost of system setup; (b) time cost of data upload.
the brute-force attack in an ''on-line'' manner, with the help of the controlled K-CSPs. This implies that the adversary cannot mount an off-line brute-force attack as long as at least one K-CSP remains uncorrupted. Thus, our scheme provides stronger security and effectively prevents the off-line brute-force attack.

5. Performance evaluation

In this section, we provide an experimental evaluation of the proposed data deduplication scheme. For convenience, we introduce some notation: we denote by P the pairing operation, by Exp the modular exponentiation, by n the total number of K-CSPs, and by t the number of K-CSPs participating in convergent key creation. Note that we omit the ordinary file transfer and file encryption/decryption modules for simplicity. In addition, since data download requires only a (symmetric) decryption operation, it is very efficient and does not influence the efficiency of the system, so we omit it from our discussion. All our experiments were performed on a computer with an Intel(R) Core(TM) i5-3210 CPU running at 2.5 GHz and 4 GB of memory, running Linux. In the following experiments, we fix the reliability level n − t = 2 and vary the number of files in the data upload process from 10 to 100.

5.1. Cost of system setup

In the system setup phase, the main computation cost is dominated by modular exponentiation operations, which are used to generate the public/private pair of each K-CSP and the system public/private pair. More precisely, each K-CSP performs n + 2 Exp operations to establish its own public/private pair by interacting with the other K-CSPs. We performed the experiments with the number of K-CSPs ranging from 5 to 10. As shown in Fig. 2(a), the time cost increases linearly with the number of K-CSPs.

5.2. Cost of data upload

To ensure the secrecy of the data file, the user generates the convergent key by interacting with t K-CSPs.
Specifically, the user checks all partial signatures from each K-CSP; the total computation consists of 2t + 1 Exp operations, 2t + 2 pairing operations, and a number of convergent encryption operations. In our experiments, we fix the reliability level n − t = 2. As shown in Fig. 2(b), we present the simulation results for the two cases n = 5 and n = 10. It can clearly be seen that, for the same number of uploaded files, the time cost increases with the number of K-CSPs. This is because, as the total number of K-CSPs grows, the user needs to interact with additional K-CSPs and verify more partial signatures.

6. Conclusion

In this paper, we further study the problem of resisting off-line brute-force attacks in deduplication. In our construction, the convergent key is generated with an additional secret key held by the K-CSPs. That is, the user interacts with any t of the K-CSPs to perform the threshold blind signature. Note that the secret key can be leaked if and only if all K-CSPs are corrupted. Thus, our scheme provides stronger security and effectively resists off-line brute-force attacks. We also prove that our scheme achieves the desired security goals.
Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61272455), China 111 Project (No. B08038), Doctoral Fund of Ministry of Education of China (No. 20130203110004), Program for New Century Excellent Talents in University (No. NCET-13-0946), and the Fundamental Research Funds for the Central Universities (Nos. BDY151402 and JB142001-14).