Generalised regenerating codes for securing distributed storage systems against eavesdropping



ARTICLE IN PRESS

JID: JISA

[m5G;March 3, 2017;16:46]

Journal of Information Security and Applications 0 0 0 (2017) 1–8

Contents lists available at ScienceDirect

Journal of Information Security and Applications journal homepage: www.elsevier.com/locate/jisa

Jian Xu∗, Yewen Cao, Deqiang Wang

School of Information Science and Engineering, Shandong University, Jinan, China

Article info

Article history: Available online xxx

Keywords: Distributed storage system; Regenerating code; Generalised regenerating code; Secrecy capacity; Security level

Abstract

Regenerating codes (RCs) are efficient in both storage cost and repair bandwidth, and are therefore regarded as preferable candidates for distributed storage systems (DSSs). For DSSs with RCs, a file stored across n distributed nodes can be reconstructed from k (< n) nodes. The collection of these k nodes is called a reconstruction set. A failed node can be regenerated (i.e., repaired) from d (< n) remaining nodes. The collection of these d nodes is called a regeneration set. In traditional RCs, the numbers of reconstruction sets and regeneration sets are fixed to specific values. In this paper, we introduce the concept of generalised RCs, in which the ranges of the numbers of both reconstruction sets and regeneration sets are extended. Compared to traditional RCs, generalised RCs admit more coding schemes and a better system security level in terms of the probability of revealing the original data file. An explicit construction of generalised RCs is provided, in which the numbers of both reconstruction sets and regeneration sets can be designed flexibly. Furthermore, based on the generalised RCs, an intruder model in which an eavesdropper can access some nodes is considered and a general upper bound on secrecy capacity is derived. The relationship between the obtained upper bound and existing ones achieved by traditional RCs is discussed in detail. The provided explicit construction is the first optimal construction of generalised RCs, which achieves the upper bound on secrecy capacity and has the flexibility in designing the numbers of reconstruction sets and regeneration sets.

© 2017 Elsevier Ltd. All rights reserved.

1. Introduction

A distributed storage system (DSS) is an emerging technology relevant to cloud computing and has drawn a lot of attention from the telecommunications industry [1–3]. In DSSs, storage nodes are distributed across a wide geographical area and connected by a network. These storage nodes are individually unreliable (due to many factors, such as disk failures or peer churning in peer-to-peer storage systems [4]) and are collectively used to store data files reliably over a long period of time. Examples of these systems include peer-to-peer storage clouds and large data centers [4]. To improve system reliability, data are stored redundantly on the storage nodes. In addition, a self-sustaining DSS must be able to regenerate (i.e., repair) failed nodes. The most straightforward strategy is replication, in which the data are duplicated and stored on multiple storage nodes [5]. This method, though simple, has low storage efficiency. The other strategy is erasure coding, which provides better storage efficiency [6]. However, when a failed storage node has to be repaired, erasure coding wastes bandwidth. Both proactive secret sharing [7,8] and regenerating codes (RCs) [4,9] can provide similar security guarantees. However, secret sharing often operates with high redundancy [10]. In contrast, RCs are efficient in both storage and repair bandwidth (i.e., the amount of data downloaded to repair a failed node) [9,11,12]. Thus, RCs are preferable candidates for DSSs in terms of data security, storage efficiency and repair bandwidth efficiency.

In traditional RCs, a file stored across n distributed nodes can be recovered from any k out of the n nodes. The collection of k nodes is called a reconstruction set. If a node fails, any d out of the remaining (n − 1) nodes can be used to repair the lost data previously stored on the failed node. The collection of d nodes is called a regeneration set. It is generally required that in traditional RCs, the number of reconstruction sets, $N_R$, and the number of regeneration sets, $N_H$, satisfy $N_R = C_n^k$ and $N_H = C_{n-1}^d$. These requirements on reconstruction and regeneration are preferable for legitimate users who want to reconstruct the original data; however, they are demanding for the design of coding schemes, which limits the available code constructions. Moreover, these requirements can be adverse to data security in the presence of an intruder that can eavesdrop on some nodes, since any k out of n nodes (i.e., there are $C_n^k$ reconstruction sets) can be used to recover the whole original data file.

∗ Corresponding author. E-mail addresses: [email protected] (J. Xu), [email protected] (Y. Cao), [email protected] (D. Wang).

http://dx.doi.org/10.1016/j.jisa.2017.02.002 2214-2126/© 2017 Elsevier Ltd. All rights reserved.

Please cite this article as: J. Xu et al., Generalised regenerating codes for securing distributed storage systems against eavesdropping, Journal of Information Security and Applications (2017), http://dx.doi.org/10.1016/j.jisa.2017.02.002


In this paper, we introduce the concept of generalised RCs, where $N_R$ and $N_H$ satisfy $1 < N_R \le C_n^k$ and $1 \le N_H \le C_{n-1}^d$, respectively. Obviously, the traditional RC is a special case of the generalised RC. Compared to traditional RCs, the proposed generalised RCs have relaxed requirements for data reconstruction and regeneration, which admits more coding schemes. In addition, the generalised RCs possess a better security level (SL) and are preferable for securing DSSs against eavesdropping. Here, the system SL refers to the probability of revealing the original data file in the presence of the intruder. In generalised RCs, the eavesdropper (Eve) can obtain the whole original data file only when it accesses a specific set of k nodes (i.e., the k nodes in one reconstruction set). When Eve accesses k nodes that do not form a reconstruction set, the original data file cannot be recovered and thus does not leak to Eve, resulting in a better SL. In practice, different DSS applications, such as government cloud platforms, the most security-sensitive enterprise cloud platforms and other private or public clouds, have different SL requirements. By designing different values of $N_R$, the generalised RCs can provide different system SLs.

In [13] and [14], failed-node regeneration is realised by accessing specific node sets in order to take full advantage of different storage capacities or different repair bandwidths. However, the number of nodes used for regeneration there is not restricted to d, so long as the total amount of data downloaded exceeds a certain threshold. The performance of generalised RCs, in terms of data security, storage capacity and repair bandwidth, has not been studied so far. Based on the generalised RCs, we focus on data security and study one type of intruder model. The results obtained in the present paper are as follows.
(a) An explicit construction of generalised RCs is provided, in which $N_R$ and $N_H$ can be designed to take different values;
(b) A general upper bound on the secrecy capacity of the intruder model is derived with an information-theoretic proof;
(c) To verify the generality of the obtained upper bound, it is specialised to the traditional RCs case, where it is shown to be tight and, at two special cases, consistent with previous upper bounds obtained for traditional RCs.

The provided explicit construction of generalised RCs achieves the upper bound on secrecy capacity and is optimal in certain parameter regimes. An example implementation is also provided.

2. Related work

Considerable studies on RCs can be found in the literature [9,12,16–18]. An optimal tradeoff between storage and repair bandwidth was given in the original work of Dimakis et al. [19]. Two special points on this tradeoff curve are termed the minimum storage regenerating (MSR) and minimum bandwidth regenerating (MBR) points, which correspond to the best storage efficiency and the minimum repair bandwidth, respectively. The RCs achieving the MSR and MBR points are called MSR and MBR codes, which were constructed in [12]. It has been shown in [20] that the interior points between the MSR and MBR points on this tradeoff cannot be achieved optimally under exact repair.

RCs permit a failed node to be repaired by downloading β units of information from each of d storage nodes. To demonstrate, an example inspired by Pawar et al. [4] is shown in Fig. 1, where a maximum distance separable (MDS) code is used to store a file F of 4 symbols $(x_1, \ldots, x_4)$ over a finite field $\mathbb{F}_q$ of size q = 5. The message symbols $(x_1, \ldots, x_4)$ are coded by a (4,2) MDS code, and then stored on four different nodes (i.e., nodes 1, 2, 3, 4), with each node having a storage capacity of α = 2 symbols. Here we combine input node $\mathrm{In}_i$ and output node $\mathrm{Out}_i$ into a single vertex, representing storage node i, i ∈ [1, 4]. A data collector (DC) can reconstruct the whole file F by connecting to any k = 2 out of n = 4 storage nodes. When a storage node fails, the system needs to be repaired by replacing the failed node with a new node (called the replacement node). As depicted in Fig. 1, node 2 failed and the two stored symbols $x_3, x_4$ were lost. The failed node 2 is then replaced by replacement node r(2). By making node r(2) connect to d = 3 nodes (helper nodes) and download β = 1 symbol from each, the lost data $x_3, x_4$ can be recovered.

Fig. 1. An example of a DSS under repair. A file F of 4 symbols $(x_1, \ldots, x_4) \in \mathbb{F}_q$ is coded by a (4,2) MDS code and then stored on four different nodes. Denote the four storage nodes as “N i”, i ∈ [1, 4]. Combine input node $\mathrm{In}_i$ and output node $\mathrm{Out}_i$ into a single vertex, representing storage node i. Storage node 2 failed and the two stored symbols $x_3, x_4$ were lost. The failed node 2 is then replaced by replacement node r(2), which downloads $(x_1 + 2x_2)$, $(x_1 + x_3 + 2x_2 + 2x_4)$, $(x_1 + 2x_3 + 2x_2 + x_4)$ from nodes 1, 3, 4 respectively, to compute and store symbols $x_3, x_4$. A data collector (DC) can recover the whole file F by connecting to any k = 2 out of n = 4 storage nodes. The edges in the graph are labeled by their capacities.

Securing DSSs against an Eve who may arrive at different time instances during the lifetime of the storage system has been studied extensively in the literature, e.g., [4,9,15,16,21]. The aim of security against eavesdropping is to prevent an Eve from obtaining any information about the stored data. The present paper studies the intruder model $\{e_1, e_2\}$DSS, in which an Eve can obtain all the downloaded data passed on to a set $E_1$ of $e_1$ nodes in the system as these nodes undergo repair, and may also gain read-access to the content of a disjoint second set $E_2$ of $e_2$ nodes. Related work on the $\{e_1, e_2\}$DSS for traditional RCs can be found in [4,9,21]. The authors in [4] considered an $\{e_1, e_2 = 0\}$DSS and derived an upper bound on the secrecy capacity of the system. The authors in [9] considered the $\{e_1, e_2\}$DSS and constructed secure MBR and secure MSR codes. The constructed secure MBR codes are shown to be optimal in terms of the upper bound in [4]. The authors in [21] derived an upper bound for the $\{e_1, e_2\}$DSS at the special case of the MSR point, showing that the secure MSR codes constructed in [9] are optimal in terms of this upper bound.

The rest of this paper is organized as follows. The framework of generalised RCs and an explicit construction of generalised RCs with multi-reconstruction sets and multi-regeneration sets are provided in Section 3. The system model analysis based on the information flow graph is presented in Section 4.
In Section 5, a general upper bound on the secrecy capacity of the intruder model in the generalised RCs is derived. The generality of the obtained upper bound is verified under the traditional RCs case at the MSR and MBR points. In Section 6, an example implementation and an analysis of generalised RCs are given. We conclude the paper in Section 7.

3. Generalised RCs

3.1. Framework of generalised RCs

In this subsection, we first introduce the framework of generalised RCs for DSSs. We focus on the repair of a single node failure throughout this paper because it is the dominant failure case in DSSs [22]. Data are stored across n storage nodes in a distributed manner. Each node has a capacity of α symbols, which is used to store the coded symbols. When a storage node i, i ∈ [1, n], fails, it is replaced by a replacement node r(i). Consider the exact repair case, where the data stored on replacement node r(i) is an exact copy of the lost data stored previously on failed node i. Thus the replacement node r(i) and node i can be regarded as the same storage node. The replacement node r(i) regenerates the failed node by downloading a total of γ = dβ symbols from d storage nodes (i.e., helper nodes) $\{i_1, i_2, \ldots, i_d\}$. Denote the set $H_i = \{i_1, i_2, \ldots, i_d\}$ as a regeneration set of failed node i. Moreover, whenever a single node i has failed and has not yet been repaired, a DC can still reconstruct the data by downloading symbols from a specific set consisting of k other nodes $\{i_1, i_2, \ldots, i_k\}$. Denote the set $R_i = \{i_1, i_2, \ldots, i_k\}$ as a reconstruction set for failed node i.

Definition of generalised regenerating codes (generalised RCs): An RC is called a generalised RC if the following properties are simultaneously satisfied even if an arbitrary single node i, i ∈ [1, n], fails:

(i) The number of reconstruction sets, $N_R$, satisfies $1 < N_R \le C_n^k$. This means there exists at least one reconstruction set $R_i$ consisting of k nodes such that a DC can recover the original data at any time instant. This is called the generalised reconstruction property;
(ii) The number of regeneration sets, $N_H$, satisfies $1 \le N_H \le C_{n-1}^d$. This means there exists at least one regeneration set $H_i$ consisting of d nodes such that the replacement node r(i) of failed node i can repair the lost data stored previously on the failed node i at any time instant. This is called the generalised regeneration property.

The replacement node together with the existing nodes should always satisfy the generalised reconstruction and generalised regeneration properties. Any RC satisfying the generalised reconstruction and generalised regeneration properties is called a generalised RC.
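As a concrete instance of exact repair with γ = dβ, the repair of node 2 in the Fig. 1 example (d = 3, β = 1, q = 5) can be checked numerically. The following is a minimal sketch, not part of the original construction: it uses only the three downloaded symbols listed in the Fig. 1 caption, and the function names (`helper_downloads`, `repair_node2`, `repair_is_exact`) are hypothetical names chosen for illustration.

```python
from itertools import product

Q = 5  # field size of the Fig. 1 example; all arithmetic is modulo 5


def helper_downloads(x1, x2, x3, x4):
    """Symbols sent by helper nodes 1, 3 and 4, as listed in the Fig. 1 caption."""
    d1 = (x1 + 2 * x2) % Q
    d3 = (x1 + x3 + 2 * x2 + 2 * x4) % Q
    d4 = (x1 + 2 * x3 + 2 * x2 + x4) % Q
    return d1, d3, d4


def repair_node2(d1, d3, d4):
    """Recover (x3, x4) from the d = 3 downloaded symbols (beta = 1 each)."""
    a = (d3 - d1) % Q  # equals x3 + 2*x4
    b = (d4 - d1) % Q  # equals 2*x3 + x4
    # The inverse of [[1, 2], [2, 1]] modulo 5 is [[3, 4], [4, 3]].
    x3 = (3 * a + 4 * b) % Q
    x4 = (4 * a + 3 * b) % Q
    return x3, x4


def repair_is_exact():
    """Check that repair returns the lost symbols for every file in F_5^4."""
    return all(
        repair_node2(*helper_downloads(*x)) == (x[2], x[3])
        for x in product(range(Q), repeat=4)
    )
```

Calling `repair_is_exact()` enumerates all 625 possible files, so the check covers the whole field rather than a single sample file.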
Note that for any DSS to be generalised, it is required that there always exists at least one reconstruction (regeneration) set $R_i$ ($H_i$) such that data reconstruction (failed node regeneration) can be realised whenever an arbitrary single node i fails. This condition is assumed to hold throughout this paper. In traditional RCs, $N_R$ and $N_H$ satisfy $N_R = C_n^k$ and $N_H = C_{n-1}^d$. This means that “any k out of n” nodes (i.e., there are $C_n^k$ reconstruction sets) can be used for data reconstruction, and “any d out of the remaining (n − 1)” nodes (i.e., there are $C_{n-1}^d$ regeneration sets) can be used for failed node regeneration. Compared with traditional RCs, our proposed generalised RCs have the following advantages:

(a) In the generalised RCs, there exist more coding schemes, because the generalised reconstruction and generalised regeneration properties are extensions of the reconstruction and regeneration properties in traditional RCs;
(b) The generalised RCs possess a better security level (SL). Here, the system SL refers to the probability of revealing the original data file in the presence of an intruder that can eavesdrop on a certain number of storage nodes. Obviously, a lower probability of revealing the original data file corresponds to a better SL. As shown in Fig. 2, at the extreme right point (i.e., the traditional RCs case), if the Eve obtains the content stored on any k nodes, the whole data file leaks to the Eve. In the generalised RCs, by contrast, the Eve obtains the whole original data only when it has obtained the data stored on a specific set of k nodes (i.e., the k nodes in one reconstruction set) that can collectively be used for data reconstruction. Consider the other extreme, the left point, where there exists only one reconstruction set. In this case, the Eve obtains the whole original data file only when it accesses the data stored on the specific k nodes in that single reconstruction set. When it accesses any other collection of k nodes, the original data file cannot be recovered and does not leak to the Eve, which gives a better system SL. In practice, different DSS applications, such as government cloud platforms, the most security-sensitive enterprise cloud platforms and other private or public clouds, have different SL requirements. By designing different values of $N_R$, the generalised RCs can provide different system SLs.

Fig. 2. Security level (SL) in generalised RCs: at the extreme right point (i.e., the traditional RCs case), the number of reconstruction sets, $N_R$, satisfies $N_R = C_n^k$. At the other extreme, the left point, $N_R = 1$. In generalised RCs, $N_R$ satisfies $1 < N_R \le C_n^k$ and the system SL can increase as $N_R$ decreases.

3.2. Explicit construction with multi-reconstruction sets and multi-regeneration sets

In this subsection, an explicit construction of generalised RCs is provided, in which $N_R$ and $N_H$ can be designed to take different values. Our code construction is designed for β = 1, relying on the striping property [12]. Under this property, the whole message file is divided into stripes of small size, each corresponding to β = 1. As each stripe has minimum size, the complexities of the encoding, generalised reconstruction and generalised regeneration operations are quite low. In addition, we design the code construction at a special case of the MBR point, i.e., α = dβ [9].

This subsection gives a linear exact generalised RC construction. Linearity means that the stored symbols are linear combinations of the source symbols over the finite field $\mathbb{F}_q$. Inspired by the approach in [4], our code construction consists of the concatenation of an MDS code with a generalised repetition code, as depicted in Fig. 3.

Fig. 3. An explicit construction representation of generalised RCs for a DSS operating at the MBR point. The $F^s$ information symbols $x_1, x_2, \ldots, x_{F^s}$ and U independent random keys $K_1, \ldots, K_U$ are mixed appropriately using an (F, F) MDS code with $F = U + F^s$. The encoded symbols $z_1, \ldots, z_{F^s}, K_1, \ldots, K_U$ are then stored on the DSS using a generalised repetition code.

Denote the capacity of a DSS in generalised RCs in the absence of an Eve as $C^G(\alpha, \gamma) \le F$, where F is the maximum number of information symbols that can be stored on the DSS in the absence of an Eve. Similarly, denote the secrecy capacity of a DSS in generalised RCs in the presence of an Eve as $C_s^G(\alpha, \gamma) \le F^s$, where $F^s$ is the maximum number of information symbols that can be stored securely on the DSS in the presence of the Eve. Denote $X = [x_1, x_2, \ldots, x_{F^s}]$ as the information vector over $\mathbb{F}_q$ containing the $F^s$ information symbols $x_1, x_2, \ldots, x_{F^s}$. Define $U = F - F^s$, and let $K = [K_1, \ldots, K_U]$ be a random key vector containing U independent random keys $K_1, \ldots, K_U$, each uniformly distributed over the finite field $\mathbb{F}_q$ of size q and independent of the $F^s$ message symbols. Here we choose q ≥ F.


Then the proposed explicit construction of generalised RCs, which meets the upper bound on (secrecy) capacity of a DSS, includes an (F, F) MDS code with its generator matrix given by

$$G = \begin{bmatrix} I_{U\times U} & A_{U\times F^s} \\ 0_{F^s\times U} & I_{F^s\times F^s} \end{bmatrix}, \tag{1}$$

where $A_{U\times F^s}$ is a matrix of size $(U \times F^s)$ given by

$$A_{U\times F^s} = \left[\begin{array}{ccccc}
1 & 1 & \cdots & 1 & 1 \\
1 & 1 & \cdots & 1 & 1 \\
\vdots & \vdots & & \vdots & \vdots \\
1 & 1 & \cdots & 1 & 1 \\
1 & 1 & \cdots & 1 & 1 \\
\hline
1 & 1 & \cdots & 1 & 0 \\
1 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & & \vdots & \vdots \\
1 & 1 & \cdots & 0 & 0 \\
1 & 0 & \cdots & 0 & 0
\end{array}\right]; \tag{2}$$

$I_{U\times U}$ is a unit matrix of size $(U \times U)$; $I_{F^s\times F^s}$ is a unit matrix of size $(F^s \times F^s)$; and $0_{F^s\times U}$ is a null matrix of size $(F^s \times U)$. Set

$$z_i = x_i + \sum_{j=1}^{U-i+1} K_j, \quad i \in [1, F^s], \tag{3}$$

then the (F, F) MDS code takes information vector $X = [x_1, x_2, \ldots, x_{F^s}]$ and random key vector $K = [K_1, \ldots, K_U]$ as input, and the output $\hat{X}$ is given by

$$\hat{X} = [K \;\; X]\, G. \tag{4}$$

According to (1)–(3), Eq. (4) can be rewritten as $\hat{X} = [z_1 \; z_2 \; \ldots \; z_{F^s} \; K_1 \; \ldots \; K_U]$. The F coded symbols $z_1, \ldots, z_{F^s}, K_1, \ldots, K_U$ are then stored on the n nodes by using the generalised repetition codes. In the following part, the generalised repetition codes are described in detail under the conditions that

$$\begin{cases} n = \sigma k, \quad \sigma \in \{1, 2, 3, \ldots\} \\ n\alpha - \sigma F + \sigma = n \\ d = k + 1 \end{cases} \tag{5}$$

(1) In the generalised repetition codes, the F coded symbols are filled into the n storage nodes, with each node storing α symbols. This can be regarded as filling the nα different locations with the F coded symbols.
(2) According to the second equality in (5), the generalised repetition codes can store at least σ copies of each coded symbol, since n > σ. The $n\alpha - \sigma F \,(= n - \sigma)$ remaining locations are filled with a carefully selected symbol, such that for an arbitrary single failed node i, i ∈ [1, n], the codes satisfy the generalised reconstruction and generalised regeneration properties simultaneously.
(3) Specifically, according to the first equality in (5), divide the n storage nodes into σ groups with k nodes per group. Then at the ith group, i ∈ [2, σ], repeat “vertically” (i.e., across all the k nodes) the data stored “horizontally” (i.e., on a single storage node in the first group), as shown in Table 1. There are (n − σ) remaining locations filled with symbol $K_1$, that is to say, the symbol $K_1$ is stored once in each of the n nodes. Here, the symbol $K_1$ is referred to as the King symbol.
(4) For example, in the first group (i.e., nodes 1, 2, ..., k), the α = d symbols stored on storage node 1 are $z_1, K_1, K_2, \ldots, K_{\alpha-1}$. Then in the second group (i.e., nodes (k + 1), (k + 2), ..., 2k), repeat the (α − 1) = k symbols $z_1, K_2, \ldots, K_{\alpha-1}$ (except for the King symbol $K_1$) on the first location of each node of the second group.
(5) There exist $\frac{n}{k} = \sigma$ reconstruction sets $R^{(1)} = \{1, 2, \ldots, k\}$, $R^{(2)} = \{k + 1, k + 2, \ldots, 2k\}$, ..., $R^{(\sigma)} = \{(\sigma - 1)k + 1, (\sigma - 1)k + 2, \ldots, \sigma k \,(= n)\}$, satisfying the generalised reconstruction property.
(6) Besides, the α coded symbols stored in a single node can be found on at least d other different nodes respectively, thus our codes satisfy the generalised regeneration property. Specifically, since the King symbol $K_1$ is stored once in each of the n nodes, there exist (n − α) regeneration sets $H_i$ for an arbitrary single failed node i, i ∈ [1, n].

Then we obviously have the number of reconstruction sets, $N_R$, and the number of regeneration sets, $N_H$, as

$$\begin{cases} 1 < N_R = \frac{n}{k} = \sigma \le C_n^k \\ 1 \le N_H = (n - \alpha) \le C_{n-1}^d \end{cases} \tag{6}$$

Table 1. Generalised repetition codes for a DSS operating at the MBR point with n = σk, d = k + 1 and $F^s$ original message symbols. The F coded symbols are $z_1, \ldots, z_{F^s}, K_1, \ldots, K_U$ with $F - F^s = U$. Denote the n nodes as “N i”, i ∈ [1, n], and denote the α = d coded symbols stored on each of these n nodes as $D_1, \ldots, D_\alpha$.

| Node | $D_1$ | $D_2$ | $D_3$ | ... | $D_\alpha$ |
|---|---|---|---|---|---|
| N 1 | $z_1$ | $K_1$ | $K_2$ | ... | $K_{\alpha-1}$ |
| N 2 | $z_2$ | $K_1$ | $K_\alpha$ | ... | $K_{2\alpha-3}$ |
| N 3 | $z_3$ | $K_1$ | $K_{2\alpha-2}$ | ... | $K_{3\alpha-5}$ |
| ... | ... | $K_1$ | ... | ... | ... |
| N $F^s$ | $z_{F^s}$ | $K_1$ | ... | ... | ... |
| N ($F^s$ + 1) | ... | $K_1$ | ... | ... | ... |
| ... | ... | $K_1$ | ... | ... | ... |
| N k | ... | $K_1$ | ... | ... | ... |
| N (k + 1) | $z_1$ | $K_1$ | $z_2$ | $z_3$ | ... |
| N (k + 2) | $K_2$ | $K_1$ | $K_\alpha$ | $K_{2\alpha-2}$ | ... |
| ... | ... | $K_1$ | ... | ... | ... |
| N 2k | $K_{\alpha-1}$ | $K_1$ | $K_{2\alpha-3}$ | $K_{3\alpha-5}$ | ... |
| N (2k + 1) | $z_1$ | $K_1$ | $z_2$ | $z_3$ | ... |
| ... | ... | $K_1$ | ... | ... | ... |
| N n (= σk) | ... | $K_1$ | ... | ... | ... |
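The construction can be tried out on a small instance. The sketch below is illustrative only and rests on several assumptions that are not stated in the text: q is taken to be prime so that arithmetic modulo q is field arithmetic, $F^s = k$ so that every first-group node starts with a $z_i$, and the placement of keys follows one reading of Table 1 (node i of the first group holds $z_i$, $K_1$ and k − 1 keys of its own; node j of every later group holds the jth symbol of each first-group node plus $K_1$). The names `precode`, `layout` and `check` are hypothetical.

```python
from math import comb


def precode(x, keys, q):
    """Eq. (3): z_i = x_i + sum_{j=1}^{U-i+1} K_j (mod q), with i counted from 1."""
    U = len(keys)
    return [(xi + sum(keys[: U - i + 1])) % q for i, xi in enumerate(x, start=1)]


def layout(k, sigma):
    """Symbol labels per node, assuming F^s = k and alpha = d = k + 1 (beta = 1)."""
    alpha = k + 1
    rows = []      # rows[i-1]: symbols of first-group node i, without K1
    next_key = 2   # K1 is the shared "King" symbol, so own keys start at K2
    for i in range(1, k + 1):
        own = [f"K{j}" for j in range(next_key, next_key + k - 1)]
        next_key += k - 1
        rows.append([f"z{i}"] + own)
    group1 = [[r[0], "K1"] + r[1:] for r in rows]
    # Later groups: node j collects the j-th symbol of every first-group node.
    other = [[rows[0][j], "K1"] + [rows[i][j] for i in range(1, k)] for j in range(k)]
    nodes = group1 + other * (sigma - 1)
    assert all(len(node) == alpha for node in nodes)
    return nodes


def check(k, sigma):
    """Check the conditions in (5) and the designed reconstruction sets R^(1..sigma)."""
    nodes = layout(k, sigma)
    n, alpha = sigma * k, k + 1
    symbols = {s for node in nodes for s in node}
    F = len(symbols)
    copies = {s: sum(node.count(s) for node in nodes) for s in symbols}
    groups = [nodes[g * k:(g + 1) * k] for g in range(sigma)]
    return {
        "F": F,
        "condition_5": n * alpha - sigma * F + sigma == n,
        "min_copies": min(copies.values()),
        "K1_on_every_node": all("K1" in node for node in nodes),
        "groups_hold_all_symbols": all(
            {s for node in g for s in node} == symbols for g in groups
        ),
        # Share of k-node subsets that are designed reconstruction sets.
        "designed_fraction": sigma / comb(n, k),
    }
```

For k = 2 and σ = 2 this gives n = 4, α = 3 and F = 5, with nodes holding (z1, K1, K2), (z2, K1, K3), (z1, K1, z2) and (K2, K1, K3). The value `designed_fraction` equals σ / $C_n^k$; reading it as a probability of disclosure requires the additional assumption that the Eve observes a uniformly random set of k nodes.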

4. System model analysis

4.1. Distributed storage system in generalised RCs

A distributed storage system (DSS) is a dynamic network of storage nodes. These nodes include a source node S that has an incompressible data file of size F symbols, each belonging to a finite field $\mathbb{F}_q$ of size q. The source node S is connected to n storage nodes, denoted by node i, i ∈ [1, n]. Each node has a storage capacity of α symbols, which may be utilized to save coded parts of the file. A flow graph representation of the DSS is a directed acyclic graph G, shown in Fig. 4. The vertices are divided into stages, starting from stage −1. At the wth stage, where w = 1, 2, ..., there is a single failed node, and it is replaced by a new node called the replacement node of the failed node. The replacement node downloads β units of information from each of d storage nodes (called helper nodes) to repair the lost data previously stored on the failed node. Note that the edges are directed and labelled by the corresponding capacities. The information flow graph G is stated formally as follows:

(1) There is a single source vertex, S, at stage −1. It represents the data object of size F to be stored in the system.
(2) There are 2n vertices at stage 0, denoted by $\mathrm{In}_i$ and $\mathrm{Out}_i$, where i = 1, 2, ..., n. Combine $\mathrm{In}_i$ and $\mathrm{Out}_i$ into a single vertex, representing storage node i. Besides, a directed edge from $\mathrm{In}_i$ to $\mathrm{Out}_i$ with capacity α is drawn.


Fig. 4. Information flow graph G corresponding to an {e1 , e2 }DSS: When a node i, i ∈ [1, k] fails at ith stage, its replacement node r(i) connects to p(i) of the replacement nodes at stages 1,2,... ,(i-1) with 0 ≤ p(i ) ≤ i − 1, and downloads β units from each to repair. A DC connects to these k nodes through infinite capacity links and wants to reconstruct the original file of size F. The data downloaded by nodes r(1), · · ·, r(e1 ) to repair (represented by dashed arrows) are observed by an eavesdropper during the repair processes. Nodes e1 + 1, · · ·, e1 + e2 (represented by broken boundaries) are compromised by the eavesdropper who gains access to the content stored on these e2 nodes.

(3) At any stage of network evolution, a data collector (represented by a vertex DC) corresponds to a request for data reconstruction, where the DC retrieves the stored data object by observing the data stored on k nodes. Let $i_1, i_2, \ldots, i_k$ be the k storage nodes to which a DC connects; the corresponding reconstruction set is $R_i = \{i_1, i_2, \ldots, i_k\}$.
(4) At each stage w (w = 1, 2, ...), a failure occurs and a corresponding replacement node joins the storage network. The replacement node connects to d helper nodes, and downloads β units from each to repair the lost data. Thus the network evolves through an infinite chain of failures and regenerations. Assuming that node i failed at the ith stage, two vertices, $\mathrm{In}_{r(i)}$ and $\mathrm{Out}_{r(i)}$, are constructed at stage i to represent the replacement node r(i) of failed node i. In this paper, we focus on exact repair, where the data stored in replacement node r(i) is exactly the content previously stored in failed node i. Furthermore, d directed edges are drawn from “Out” vertices at earlier stages to the vertex $\mathrm{In}_{r(i)}$. Let the d “Out” vertices to which $\mathrm{In}_{r(i)}$ connects be $\mathrm{Out}_{i_1}, \ldots, \mathrm{Out}_{i_d}$, and let the d helper nodes of failed node i form a regeneration set $H_i = \{i_1, i_2, \ldots, i_d\}$. For j = 1, 2, ..., d, the capacity of the directed edge from $\mathrm{Out}_{i_j}$ to $\mathrm{In}_{r(i)}$ is β, meaning that the replacement node r(i) downloads β units of data from its helper node $i_j \in H_i$. The repair bandwidth γ = dβ is typically smaller than the file size F, with γ ≥ α.
(5) Under traditional RCs, any k out of n nodes (i.e., there are $C_n^k$ reconstruction sets) can realize data reconstruction, and when an arbitrary single node i fails, any d out of the remaining (n − 1) nodes (i.e., there are $C_{n-1}^d$ regeneration sets) can realize the regeneration of the failed node. It is assumed without loss of generality that all the replacement nodes r(1), ..., r(i − 1) at stages 1, 2, ..., (i − 1) serve as helper nodes of failed node i. In the proposed generalised RCs, not all collections of k nodes can realize data reconstruction and not all collections of d nodes can realize regeneration. In this case, not all but only some p(i) of the replacement nodes at the earlier (i − 1) stages become helper nodes of failed node i. Here the parameter p(i) is an integer with 0 ≤ p(i) ≤ i − 1. The remaining [d − p(i)] helper nodes of failed node i come from storage nodes i + 1, ..., n at stage 0.

A flow on the information flow graph G is an assignment of non-negative real numbers to the edges, satisfying the flow conservation constraints and capacity constraints. The graph G constitutes a multicast network with the DCs as destinations. An underlying assumption here is that the flow graph G depends on the sequence of failed nodes [4]. Note that at every stage of the network there can be a different sequence of failures and regenerations, and thus there may exist different reconstruction sets and regeneration sets.

4.2. Intruder model

It is assumed in the intruder model $\{e_1, e_2\}$DSS that there exists an Eve who can obtain all the downloaded data passed on to a set $E_1$ of $e_1$ nodes in the system as these nodes undergo repair, and may also gain read-access to the content of a disjoint second set $E_2$ of $e_2$ nodes. As assumed in [4], a DC downloads data from the k output nodes in one reconstruction set $R = \{\mathrm{Out}_{r(1)}, \cdots, \mathrm{Out}_{r(k)}\}$, and an Eve gains all the downloaded data by observing the $e_1$ input nodes $\mathrm{In}_{r(1)}, \cdots, \mathrm{In}_{r(e_1)}$ while they are being repaired, and also gains read-access to the stored data by observing the $e_2$ output nodes $\mathrm{Out}_{r(e_1+1)}, \cdots, \mathrm{Out}_{r(e_1+e_2)}$. Therefore, the amount of data downloaded by node i, $i \in E_1$, and observed by the Eve is $[d - p(i)]\beta$, as shown in Fig. 4. The total amount of downloaded data that leaks out is

$$\sum_{i=1}^{e_1} [d - p(i)]\beta.$$

A worst case is considered, where the $e_1 + e_2$ storage nodes compromised by the Eve are in one reconstruction set R. For this case, the upper bound on the secrecy capacity of the $\{e_1, e_2\}$DSS will be derived in the next section. Note that the Eve has also observed the data stored on the nodes in set $E_1$, since the stored content can be computed from the repair data. Also, we must have

$$e_1 + e_2 < k; \tag{7}$$

otherwise, the Eve will obtain the content of k or more nodes, from which the original data can be computed. As assumed in [9], the Eve has unbounded computational power, and is passive and non-collusive. As an example of this model, consider a peer-to-peer storage system. The $e_1$ nodes described above may represent nodes in a network belonging to an adversary. The Eve is thereby allowed to listen to all the downloaded data during the repair processes of these $e_1$ nodes. The $e_2$ nodes may represent nodes that are exposed only momentarily, allowing the Eve access to only the stored data.
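The leakage count and constraint (7) are simple enough to evaluate directly. The following is a minimal sketch under the model above; the function names and the list representation of p(i) are assumptions made for illustration.

```python
def leaked_repair_symbols(d, beta, p):
    """Total repair data observed by Eve: sum over i = 1..e1 of [d - p(i)] * beta.

    p is a list with p[i - 1] = p(i), the number of earlier replacement nodes
    helping node i; the model requires 0 <= p(i) <= i - 1. Here e1 = len(p).
    """
    for i, p_i in enumerate(p, start=1):
        if not 0 <= p_i <= i - 1:
            raise ValueError(f"p({i}) = {p_i} violates 0 <= p(i) <= i - 1")
    return sum((d - p_i) * beta for p_i in p)


def eve_is_admissible(e1, e2, k):
    """Constraint (7): Eve must compromise fewer than k nodes in total."""
    return e1 + e2 < k
```

For instance, with d = 4, β = 1 and p = [0, 1] (so $e_1$ = 2), the Eve observes 4 + 3 = 7 downloaded symbols; with k = 4, at most one further node may be placed in $E_2$ without violating (7).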


5. Upper bound on secrecy capacity

5.1. Problem statement: secrecy capacity

Denote by $\mathcal{F}$ a random variable, distributed uniformly over a finite field $\mathbb{F}_q$ of size $q$, that represents the incompressible data file of size $F$ at the source node $S$. The data file of size $F$ is to be stored on a DSS. Thus we have $H(\mathcal{F}) = F$ (with logarithms taken to base $q$). Let $\mathrm{In}^{(\infty)} = \{\mathrm{In}_1, \mathrm{In}_2, \cdots\}$ and $\mathrm{Out}^{(\infty)} = \{\mathrm{Out}_1, \mathrm{Out}_2, \cdots\}$ be the sets of input and output storage nodes in the flow graph $G$, respectively. For each storage node $i$, let $S_i$ and $D_i$ be the random variables standing for its stored content and the data downloaded from its helper nodes, respectively. Thus, $S_i$ also represents the data observed by a DC when connecting to node $i$. When an Eve accesses output node $\mathrm{Out}_i$, it observes all the stored content $S_i$; when it accesses input node $\mathrm{In}_i$, it observes the downloaded data $D_i$, with $H(D_i) \le \gamma$.

For any subset $\mathrm{Out}^{(k)}$ of $\mathrm{Out}^{(\infty)}$ with $|\mathrm{Out}^{(k)}| = k$, define $S^{(k)} \triangleq \{S_i \mid \mathrm{Out}_i \in \mathrm{Out}^{(k)}\}$, representing the data stored in $k$ nodes. Similarly, for any subset $\mathrm{In}^{(e_1)}$ of $\mathrm{In}^{(\infty)}$, define $D^{(e_1)} \triangleq \{D_i \mid \mathrm{In}_i \in \mathrm{In}^{(e_1)}\}$, representing the data downloaded by the $e_1$ nodes in set $E_1$. For any subset $\mathrm{Out}^{(e_2)}$ of $\mathrm{Out}^{(\infty)}$, define $S^{(e_2)} \triangleq \{S_i \mid \mathrm{Out}_i \in \mathrm{Out}^{(e_2)}\}$, representing the data stored in the $e_2$ nodes in set $E_2$. The reconstruction property can be written as

\[ H(\mathcal{F} \mid S^{(k)}) = 0, \quad \forall\, \mathrm{Out}^{(k)} \subset \mathrm{Out}^{(\infty)}, \tag{8} \]

and the perfect secrecy condition implies

\[ H(\mathcal{F} \mid D^{(e_1)}, S^{(e_2)}) = H(\mathcal{F}), \quad \forall\, \mathrm{In}^{(e_1)} \subset \mathrm{In}^{(\infty)},\ \mathrm{Out}^{(e_2)} \subset \mathrm{Out}^{(\infty)}, \tag{9} \]

with $|\mathrm{In}^{(e_1)}| \le e_1$ and $|\mathrm{Out}^{(e_2)}| \le e_2$. The secrecy capacity $C_s(\alpha, \gamma)$ is then defined as the maximum amount of data that can be stored in this system while the reconstruction property in (8) and the perfect secrecy condition in (9) are satisfied simultaneously for all possible DCs and Eves [4], i.e.,

\[ C_s(\alpha, \gamma) \triangleq \sup_{\substack{H(\mathcal{F} \mid S^{(k)}) = 0,\ \forall\, \mathrm{Out}^{(k)} \subset \mathrm{Out}^{(\infty)} \\ H(\mathcal{F} \mid D^{(e_1)}, S^{(e_2)}) = H(\mathcal{F}),\ \forall\, \mathrm{In}^{(e_1)},\, \mathrm{Out}^{(e_2)}}} H(\mathcal{F}), \tag{10} \]

where $\mathrm{In}^{(e_1)} \subset \mathrm{In}^{(\infty)}$ and $\mathrm{Out}^{(e_2)} \subset \mathrm{Out}^{(\infty)}$ with $|\mathrm{In}^{(e_1)}| \le e_1$ and $|\mathrm{Out}^{(e_2)}| \le e_2$.

5.2. Results

The principal result of this section is a general upper bound on the secrecy capacity of an $\{e_1, e_2\}$ DSS under the generalised RCs.

Theorem 1 (Upper bound): For an $\{e_1, e_2\}$ DSS under the generalised RCs, the secrecy capacity $C^G_s(\alpha, \gamma)$ is upper bounded by

\[ C^G_s(\alpha, \gamma) \le \sum_{i=e_1+e_2+1}^{k} \min\{(\alpha - e_1\beta),\ [d - p(i)]\beta\}, \tag{11} \]

where $\gamma = d\beta$ and $0 \le p(i) \le i - 1$.

Proof. Consider an $\{e_1, e_2\}$ DSS and assume that node $i$ fails at the $i$th stage and is replaced by node $r(i)$, $i \in [1, k]$. For a single failed node $i$, some $p(i)$ of the replacement nodes at stages $1, \ldots, i - 1$ are its helper nodes, and the remaining $[d - p(i)]$ helper nodes are from nodes $i + 1, \ldots, n$ at stage 0, as shown in Fig. 4. Assume that a DC connects to the $k$ nodes in $\mathrm{Out}^{(k)} = \{\mathrm{Out}_{r(1)}, \cdots, \mathrm{Out}_{r(k)}\}$. Now suppose that an Eve accesses the $e_1$ input nodes in set $\mathrm{In}^{(e_1)} = \{\mathrm{In}_{r(1)}, \cdots, \mathrm{In}_{r(e_1)}\} \subset \mathrm{In}^{(\infty)}$, and that the Eve also accesses the $e_2$ output nodes in set $\mathrm{Out}^{(e_2)} = \{\mathrm{Out}_{r(e_1+1)}, \cdots, \mathrm{Out}_{r(e_1+e_2)}\} \subset \mathrm{Out}^{(\infty)}$.

The reconstruction property of (8) implies $H(\mathcal{F} \mid S^{(k)}) = 0$ and the perfect secrecy condition of (9) implies $H(\mathcal{F} \mid D^{(e_1)}, S^{(e_2)}) = H(\mathcal{F})$. It can therefore be obtained that

\[
\begin{aligned}
H(\mathcal{F}) &= H(\mathcal{F} \mid D^{(e_1)}, S^{(e_2)}) - H(\mathcal{F} \mid S^{(k)}) \\
&= H(\mathcal{F} \mid D^{(e_1)}, S^{(e_2)}) - H(\mathcal{F} \mid D^{(e_1)}, S^{(e_2)}, S^{(k \setminus e_1, e_2)}) \\
&= I(\mathcal{F};\, S^{(k \setminus e_1, e_2)} \mid D^{(e_1)}, S^{(e_2)}) \\
&\le H(S^{(k \setminus e_1, e_2)} \mid D^{(e_1)}, S^{(e_2)}) \\
&\le \sum_{i=e_1+e_2+1}^{k} H(S_{r(i)} \mid D_{r(1)}, \ldots, D_{r(e_1)}) \\
&\overset{(12a)}{=} \sum_{i=e_1+e_2+1}^{k} \min\{(\alpha - e_1\beta),\ [d - p(i)]\beta\},
\end{aligned}
\tag{12}
\]

where $S^{(k \setminus e_1, e_2)}$ represents $S^{(k)} \mid D^{(e_1)}, S^{(e_2)}$ with $H(S^{(k \setminus e_1, e_2)}) = H(S^{(k)} \mid D^{(e_1)}, S^{(e_2)})$. Equality (12a) follows from the fact that, for node $j$, $j \in [e_1 + e_2 + 1, k]$, the secure node storage capacity $\alpha$ turns into $(\alpha - e_1\beta)$, since node $j$ provides $\beta$ symbols to each of the replacement nodes $r(1), \cdots, r(e_1)$ for repair. Note that these $e_1\beta$ symbols are linearly independent with high probability [4]. Besides, for each replacement node $r(i)$, $i \in [e_1 + e_2 + 1, k]$, the amount of downloaded data is $[d - p(i)]\beta$. The general upper bound on secrecy capacity in Theorem 1 then follows directly from the definition in (10). □

Relationship with previous upper bounds: In the following, the generality of the upper bound on secrecy capacity in Theorem 1 is verified for the traditional RCs case, since previous upper bounds on secrecy capacity were obtained in the traditional RCs setting. In traditional RCs, $p(i) = i - 1$ for each $i \in [1, k]$. According to (11), the secrecy capacity under traditional RCs, denoted by $C^T_s(\alpha, \gamma)$, can be obtained as follows.

\[ C^T_s(\alpha, \gamma) = C^G_s(\alpha, \gamma)\big|_{p(i) = i - 1} \le \sum_{i=e_1+e_2+1}^{k} \min\{(\alpha - e_1\beta),\ (d - i + 1)\beta\}. \tag{13} \]

Since $0 \le p(i) \le i - 1$ implies $[d - p(i)]\beta \ge (d - i + 1)\beta$ for every $i$, the bound in (13) never exceeds the bound in (11). Comparing (11) and (13), we can obtain that

\[ C^T_s(\alpha, \gamma) \le C^G_s(\alpha, \gamma), \tag{14} \]

which implies that, under the generalised RCs, the secrecy capacity may be improved by choosing $p(i)$, $i \in [e_1 + e_2 + 1, k]$, carefully.

As mentioned in Section 2, the interior points between the MSR and MBR points cannot be achieved optimally under exact repair. Thus, the generality of the obtained upper bound is verified at the two special points MSR and MBR. Specifically, the generality of $C^T_s(\alpha, \gamma)$ in (13) is verified by specialising it to the MBR and MSR points and then showing its compactness (tightness) and consistency with previous results.

MSR point: At this point, we have $\alpha = (d - k + 1)\beta$ [9]. Then, based on (13), the secrecy capacity of the $\{e_1, e_2\}$ DSS at the MSR point, denoted by $C^T_s|_{\mathrm{MSR}}$, can be obtained as follows.

\[
\begin{aligned}
C^T_s\big|_{\mathrm{MSR}} &= C^T_s\big|_{\alpha = (d - k + 1)\beta} \\
&\le \sum_{i=e_1+e_2+1}^{k} \min\{(d - k + 1 - e_1)\beta,\ (d - i + 1)\beta\} \\
&\overset{(15a)}{=} \sum_{i=e_1+e_2+1}^{k} (d - k + 1 - e_1)\beta \\
&= (d - k + 1 - e_1)\beta\,(k - e_1 - e_2) = (\alpha - e_1\beta)(k - e_1 - e_2),
\end{aligned}
\tag{15}
\]


where (15a) follows from the fact that the minimum value of $(d - i + 1)\beta$, $i \in [e_1 + e_2 + 1, k]$, is $(d - k + 1)\beta$, which is no smaller than $(d - k + 1 - e_1)\beta$. Inequality (15) agrees with the upper bound on secrecy capacity obtained in [21].

Compactness: Sasidharan et al. [21] have shown that the upper bound of (15) is tight.

MBR point: At this point, we have $\alpha = d\beta$ [9], which means that a replacement node stores all the data downloaded during its repair. Thus an Eve does not obtain any extra information from the downloaded data. Hence, for the MBR point, we can set $e_2 = 0$. Based on (13), the secrecy capacity of the $\{e_1, e_2\}$ DSS at the MBR point, denoted by $C^T_s|_{\mathrm{MBR}}$, can be obtained as follows.

\[
\begin{aligned}
C^T_s\big|_{\mathrm{MBR}} &= C^T_s\big|_{\alpha = d\beta} \\
&\le \sum_{i=e_1+1}^{k} \min\{(d - e_1)\beta,\ (d - i + 1)\beta\} \\
&\overset{(16a)}{=} \sum_{i=e_1+1}^{k} (d - i + 1)\beta \\
&= \left[kd - \binom{k}{2}\right]\beta - \left[e_1 d - \binom{e_1}{2}\right]\beta,
\end{aligned}
\tag{16}
\]

where (16a) follows from the fact that the maximum value of $(d - i + 1)\beta$, $i \in [e_1 + 1, k]$, is $(d - e_1)\beta$. Inequality (16) agrees with the secrecy capacity obtained in [4] at the MBR point.

Compactness: Shah et al. [9] have shown that the upper bound of (16) is tight.

Note that when there is no Eve, i.e., $e_1 = e_2 = 0$, the upper bound on secrecy capacity $C^G_s(\alpha, \gamma)$ of (11) collapses to the capacity of a DSS under the generalised RCs, denoted by $C^G(\alpha, \gamma)$, given by

\[ C^G(\alpha, \gamma) \le \sum_{i=1}^{k} \min\{\alpha,\ [d - p(i)]\beta\}. \tag{17} \]

Specialising the above capacity $C^G(\alpha, \gamma)$ to the traditional RCs case, the capacity of a DSS under traditional RCs can be obtained as

\[ C^T(\alpha, \gamma) = C^G(\alpha, \gamma)\big|_{p(i) = i - 1} \le \sum_{i=1}^{k} \min\{\alpha,\ (d - i + 1)\beta\}, \tag{18} \]

which agrees with the capacity obtained in [19].

Compactness: The RC constructions in [12] achieve the upper bound on capacity in (18); that is, the upper bound of (18) is tight.

6. Example implementation and analysis of generalised RCs

6.1. Example implementation of generalised RCs achieving upper bound

In this section, an example implementation of the explicit construction of generalised RCs is provided, which achieves the upper bound on secrecy capacity in Theorem 1. Consider an $\{e_1, e_2\}$ DSS with $n = 6$, $k = 3$, $d = \alpha = 4$, $\beta = 1$, $e_1 = 2$, $e_2 = 0$. Denote the set $P_k = \{p(1), p(2), p(3)\} = \{0, 1, 1\}$. Based on (5), we have $\sigma = 2$. Further, according to (11), we have secrecy capacity $C^G_s(\alpha, \gamma) \le 2$. Let the size of the secure message, $F_s$, attain the upper bound on secrecy capacity, that is, $F_s = 2$. From (17), the capacity of a DSS in the absence of an intruder is $C^G(\alpha, \gamma) \le 10$. Thus the maximum amount of information that can be stored on the DSS in the absence of an intruder is $F = 10$. The parameter set of this example is then $(\alpha, \beta, F, F_s) = (4, 1, 10, 2)$.

Table 2
A generalised RC for a DSS with $(n, k, d) = (6, 3, 4)$, $(\alpha, \beta, F, F_s) = (4, 1, 10, 2)$ and $P_k = \{p(1), p(2), p(3)\} = \{0, 1, 1\}$. The six nodes are denoted by "N $i$", $i \in [1, 6]$, and the $\alpha = 4$ symbols stored on each of these six nodes by $D_j$, $j \in [1, 4]$. The reconstruction set and regeneration sets of a single failed node $i$ are $R_i$ and $H_i$, respectively. For example, when node 3 fails, there exist two regeneration sets $H_3^{(1)} = \{1, 4, 5, 6\}$ and $H_3^{(2)} = \{2, 4, 5, 6\}$. This means that the new replacement node $r(3)$ can connect to nodes 1, 4, 5, 6 or nodes 2, 4, 5, 6, and download $K_1$ from node 1 (or node 2) and $K_6$, $K_7$ and $K_8$ from nodes 4, 5 and 6, respectively, which are the exact symbols previously stored on the failed node 3.

| Node | $D_1$ | $D_2$ | $D_3$ | $D_4$ | $R_i$ | $H_i$ |
|------|-------|-------|-------|-------|-------|-------|
| N1 | $z_1$ | $K_1$ | $K_2$ | $K_3$ | $R_1 = \{4, 5, 6\}$ | $H_1^{(1)} = \{3, 4, 5, 6\}$, $H_1^{(2)} = \{2, 4, 5, 6\}$ |
| N2 | $z_2$ | $K_1$ | $K_4$ | $K_5$ | $R_2 = \{4, 5, 6\}$ | $H_2^{(1)} = \{1, 4, 5, 6\}$, $H_2^{(2)} = \{3, 4, 5, 6\}$ |
| N3 | $K_6$ | $K_1$ | $K_7$ | $K_8$ | $R_3 = \{4, 5, 6\}$ | $H_3^{(1)} = \{1, 4, 5, 6\}$, $H_3^{(2)} = \{2, 4, 5, 6\}$ |
| N4 | $z_1$ | $K_1$ | $z_2$ | $K_6$ | $R_4 = \{1, 2, 3\}$ | $H_4^{(1)} = \{1, 2, 3, 6\}$, $H_4^{(2)} = \{1, 2, 3, 5\}$ |
| N5 | $K_2$ | $K_1$ | $K_4$ | $K_7$ | $R_5 = \{1, 2, 3\}$ | $H_5^{(1)} = \{1, 2, 3, 4\}$, $H_5^{(2)} = \{1, 2, 3, 6\}$ |
| N6 | $K_3$ | $K_1$ | $K_5$ | $K_8$ | $R_6 = \{1, 2, 3\}$ | $H_6^{(1)} = \{1, 2, 3, 4\}$, $H_6^{(2)} = \{1, 2, 3, 5\}$ |

The information vector is $X = [x_1, x_2]$ with $x_1, x_2 \in \mathbb{F}_q$. Here we choose $q = 11$. The random key vector is $K = [K_1, \ldots, K_8]$. Then, according to (1) and (2), our generalised RCs consist of a (10, 10) MDS code with its generator matrix given by



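\[
\begin{bmatrix}
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\tag{19}
\]

then the output codeword, written $\mathcal{X}$ to distinguish it from the information vector $X$, is given by the row vector $[K \;\; X]$ multiplied by the matrix in (19):

\[ \mathcal{X} = [K \;\; X] \cdot \big(\text{matrix in (19)}\big). \tag{20} \]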
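The sizes used in this example follow from the two bounds: (17) gives $F = 10$ coded symbols, (11) gives $F_s = 2$ message symbols, and the remaining $F - F_s = 8$ positions are filled with keys. The sketch below is an illustration under stated assumptions rather than the authors' implementation. It evaluates the right-hand sides of (11) and (17) for the example parameters and then applies the matrix of (19) to $[K \;\; X]$ over $\mathbb{F}_{11}$. The function names and the sample key and message values are hypothetical, and arithmetic modulo the prime 11 is used as the field.

```python
Q = 11  # field size chosen in the example


def secrecy_bound(k, d, alpha, beta, e1, e2, p):
    """Right-hand side of (11); p[i - 1] holds p(i) for i = 1..k."""
    return sum(min(alpha - e1 * beta, (d - p[i - 1]) * beta)
               for i in range(e1 + e2 + 1, k + 1))


def example_generator_matrix():
    """The 10 x 10 matrix of (19): rows K1..K8, x1, x2; columns z1, z2, K1..K8."""
    rows = []
    for j in range(8):
        in_z2 = 1 if j < 7 else 0          # K8 does not enter z2
        rows.append([1, in_z2] + [1 if c == j else 0 for c in range(8)])
    rows.append([1, 0] + [0] * 8)          # x1 enters z1 only
    rows.append([0, 1] + [0] * 8)          # x2 enters z2 only
    return rows


def encode(keys, message, q=Q):
    """Row vector [K X] times the matrix of (19), reduced modulo q."""
    g = example_generator_matrix()
    vec = list(keys) + list(message)
    return [sum(v * g[r][c] for r, v in enumerate(vec)) % q for c in range(10)]


# Example parameters: k = 3, d = alpha = 4, beta = 1, P_k = {0, 1, 1}.
k, d, alpha, beta, p = 3, 4, 4, 1, [0, 1, 1]
F_s = secrecy_bound(k, d, alpha, beta, e1=2, e2=0, p=p)   # bound (11)
F = secrecy_bound(k, d, alpha, beta, e1=0, e2=0, p=p)     # bound (17): no Eve

# Hypothetical keys K1..K8 = 1..8 and message (x1, x2) = (3, 5).
codeword = encode(keys=range(1, F - F_s + 1), message=[3, 5])
```

The first two entries of `codeword` are $z_1$ and $z_2$, and the remaining eight are the keys unchanged. Calling `secrecy_bound` with the traditional choice `p=[0, 1, 2]` and no Eve gives 9 rather than 10, which illustrates how the choice of $p(i)$ affects the bound.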

According to (19) and (3), the output codeword in (20) can be rewritten as $[z_1 \;\; z_2 \;\; K_1 \; \ldots \; K_8]$. The coded symbols $z_1, z_2, K_1, \ldots, K_8$ are then stored on nodes $1, \ldots, 6$ as shown in Table 2, following a generalised repetition code. The $\sigma = 2$ reconstruction sets are $R^{(1)} = \{1, 2, 3\}$ and $R^{(2)} = \{4, 5, 6\}$. Besides, there are $(n - \alpha) = 2$ regeneration sets $H_i^{(1)}$ and $H_i^{(2)}$ for an arbitrary single failed node $i$, $i \in [1, 6]$. In the following, the reconstruction, regeneration and security against eavesdropping processes are discussed in detail.

Reconstruction process: It can be checked that, when an arbitrary single node $i$, $i \in [1, 6]$, has failed and has not yet been repaired, there exists a reconstruction set $R_i$ consisting of $k = 3$ nodes to which a DC can connect for data reconstruction. From the data stored on the three nodes in one reconstruction set $R_i$, the DC observes all the symbols $z_1, z_2, K_1, \ldots, K_8$ and hence can decode the information symbols $x_1$ and $x_2$, since $z_1 = x_1 + \sum_{i=1}^{8} K_i$ and $z_2 = x_2 + \sum_{i=1}^{7} K_i$. For example, when node 1 has failed and has not been repaired, there exists the reconstruction set $R_1 = \{4, 5, 6\}$. This means that the DC connects to nodes 4, 5 and 6 and downloads the symbols $K_1, \ldots, K_8$ and $z_1, z_2$ from these three nodes to recover the two message symbols $x_1$ and $x_2$. The reconstruction sets $R_i$ for an arbitrary failed node $i$, $i \in [1, 6]$, are summarised in Table 2.
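The reconstruction step can be sketched in a few lines. This is a minimal illustration, assuming the symbol placement of Table 2 as transcribed in `LAYOUT` and arithmetic modulo 11. The names `LAYOUT`, `encode`, `store` and `reconstruct` are hypothetical and not taken from the paper.

```python
Q = 11  # field size chosen in the example

# Symbol placement of Table 2: node -> names of its four stored symbols.
LAYOUT = {
    1: ["z1", "K1", "K2", "K3"],
    2: ["z2", "K1", "K4", "K5"],
    3: ["K6", "K1", "K7", "K8"],
    4: ["z1", "K1", "z2", "K6"],
    5: ["K2", "K1", "K4", "K7"],
    6: ["K3", "K1", "K5", "K8"],
}
RECONSTRUCTION_SETS = [{1, 2, 3}, {4, 5, 6}]


def encode(x1, x2, keys, q=Q):
    """Coded symbols: z1 = x1 + K1 + ... + K8 and z2 = x2 + K1 + ... + K7 (mod q)."""
    symbols = {f"K{j + 1}": keys[j] % q for j in range(8)}
    symbols["z1"] = (x1 + sum(keys)) % q
    symbols["z2"] = (x2 + sum(keys[:7])) % q
    return symbols


def store(symbols):
    """Place the coded symbols on the six nodes according to LAYOUT."""
    return {node: {name: symbols[name] for name in names}
            for node, names in LAYOUT.items()}


def reconstruct(nodes, storage, q=Q):
    """Recover (x1, x2) from the content of the given nodes."""
    seen = {}
    for node in nodes:
        seen.update(storage[node])
    keys = [seen[f"K{j + 1}"] for j in range(8)]  # KeyError if a key is missing
    x1 = (seen["z1"] - sum(keys)) % q
    x2 = (seen["z2"] - sum(keys[:7])) % q
    return x1, x2


# Hypothetical message (3, 5) and keys 1..8.
storage = store(encode(3, 5, list(range(1, 9))))
recovered = [reconstruct(r, storage) for r in RECONSTRUCTION_SETS]
```

Both reconstruction sets return the original pair $(x_1, x_2)$, because each set covers all ten coded symbols. A set that misses a symbol raises a `KeyError` instead of returning a wrong answer.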


Regeneration process: In the case of a single failure, the replacement node for the failed node contacts its $d = 4$ helper nodes in the system and recovers an exact copy of the lost data. For example, when node 3 fails, there exist two regeneration sets $H_3^{(1)} = \{1, 4, 5, 6\}$ and $H_3^{(2)} = \{2, 4, 5, 6\}$. This means that the new replacement node $r(3)$ can connect to nodes 1, 4, 5, 6 or to nodes 2, 4, 5, 6, and download $K_1$ from node 1 (or node 2) and $K_6$, $K_7$ and $K_8$ from nodes 4, 5 and 6, respectively, which are the exact symbols previously stored on the failed node 3.

Security against eavesdropping: To illustrate the security guarantee for the data stored in the $\{e_1, e_2\}$ DSS, consider in the example that nodes 1 and 2 are compromised and observed by an Eve who can access their downloaded data. The Eve observes seven independent symbols $z_1, z_2, K_1, \ldots, K_5$ (see Table 2), and therefore cannot obtain any information about $x_1$ and $x_2$. Only if the Eve obtained the three additional independent symbols $K_6$, $K_7$ and $K_8$ would the original message and keys be completely known to the Eve.

6.2. Analysis of generalised RCs

Implementation complexity: Generalised RCs possess several desirable properties, such as linearity, small field size and striping, which result in low complexity as discussed below. The generalised RCs are linear over a finite field $\mathbb{F}_q$ of size $q$. As shown in Section 3.2, the stored symbols $z_1, \ldots, z_{F_s}$ are linear combinations of the source symbols $x_1, \ldots, x_{F_s}, K_1, \ldots, K_U$ over the finite field $\mathbb{F}_q$. Any field of size $F$ or higher suffices for our code. With striping, the whole message file is divided into stripes of small size corresponding to $\beta = 1$. Since each stripe is of minimum size, the complexities of the encoding, generalised reconstruction and generalised regeneration processes are very low. In practice, the stripes can be processed in parallel and efficiently on GPUs, FPGAs or multi-core processors.
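Because the example uses the small field $\mathbb{F}_{11}$, the secrecy claim of Section 6.1 for an Eve on nodes 1 and 2 can be checked by exhaustive counting. The sketch below is a rough sanity check, not a proof and not the authors' code. It assumes $z_1 = x_1 + \sum_{i=1}^{8} K_i$ and $z_2 = x_2 + \sum_{i=1}^{7} K_i$ modulo 11, fixes one hypothetical observation $(z_1, z_2, K_1, \ldots, K_5)$, and counts for every candidate message how many values of the unseen keys $(K_6, K_7, K_8)$ are consistent with it.

```python
from itertools import product

Q = 11  # field size chosen in the example


def consistent_completions(z1, z2, known_keys, q=Q):
    """For each candidate (x1, x2), count the unseen keys (K6, K7, K8)
    that agree with the Eve's view (z1, z2, K1..K5)."""
    base = sum(known_keys)  # K1 + ... + K5
    counts = {}
    for x1, x2 in product(range(q), repeat=2):
        matches = 0
        for k6, k7, k8 in product(range(q), repeat=3):
            first_seven = base + k6 + k7
            if (x1 + first_seven + k8) % q == z1 and (x2 + first_seven) % q == z2:
                matches += 1
        counts[(x1, x2)] = matches
    return counts


# Hypothetical observation: z1 = 6, z2 = 0, K1..K5 = 1..5.
counts = consistent_completions(6, 0, [1, 2, 3, 4, 5])
all_equally_likely = len(set(counts.values())) == 1
```

Every one of the 121 candidate messages is consistent with the same number of key completions, so with uniformly random keys the observation does not favour any message. This is the behaviour expected from the perfect secrecy condition in (9).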
In addition, because of the "vertical" repetition in generalised repetition codes, the reconstruction and regeneration processes can be realised by simply copying symbols, which further lowers the implementation complexity.

Better system SL: As shown in Section 3.1, the generalised RCs possess a better system SL than traditional RCs. This is attractive for securing DSSs against eavesdropping, as more and more attention is paid to data security. In practice, different DSS applications, such as government cloud platforms, the most security-sensitive enterprise cloud platforms and private clouds in banks, usually have different system SL requirements. By carefully designing the number of reconstruction sets, $N_R$, the generalised RCs can provide different system SLs.

7. Conclusion

In this paper, the generalised RCs, a generalisation of traditional RCs, are proposed for securing DSSs against eavesdropping. Compared to traditional RCs, the generalised RCs possess more coding schemes and a better security level. An explicit construction of generalised RCs is provided at the MBR point under the conditions that $n = \sigma k$, $\sigma \in \{1, 2, 3, \ldots\}$, $n\alpha - \sigma F + \sigma = n$, and $d = k + 1$. In the explicit construction, the number of reconstruction sets, $N_R$, and the number of regeneration sets, $N_H$, can be designed as $N_R = n/k = \sigma$ and $N_H = (n - \alpha)$, with $1 < n/k \le C_n^k$ and $1 \le (n - \alpha) \le C_{n-1}^d$. Based on the generalised RC setting, an intruder model is considered and a general upper bound on the secrecy capacity of the intruder model is obtained. The obtained upper bound has been shown to be consistent with previous upper bounds at the MSR and MBR points in traditional RCs. The explicit construction of generalised RCs achieves the upper bound on the secrecy capacity of the intruder model. As a kind of generalised RC, the explicit construction possesses several desirable advantages, such as linearity, small field size and striping, resulting in low implementation complexity. In future work, we plan to investigate explicit construction designs of generalised RCs for arbitrary parameters.

Acknowledgments

This research was supported by the Natural Science Foundation (NSF) of China, Grant No. 61471222, and the NSF of Shandong Province, Grant No. ZR2015FM003.

References

[1] Yu Q, Shum KW, Sung CW. Tradeoff between storage cost and repair cost in heterogeneous distributed storage systems. Trans Emerging Telecommun Technol 2015;26(10):1201–11.
[2] Ali O, Soar J, Yong J. An investigation of the challenges and issues influencing the adoption of cloud computing in Australian regional municipal governments. J Inf Secur Appl Nov. 2015;27–28:19–34.
[3] Dorey PG, Leite A. Commentary: cloud computing - a security problem or solution? Inf Secur Tech Rep Aug. 2011;16(3–4):89–96.
[4] Pawar S, Rouayheb SE, Ramchandran K. Securing dynamic distributed storage systems against eavesdropping and adversarial attacks. IEEE Trans Inf Theory Oct. 2011;57(10):6734–53.
[5] Ghemawat S, Gobioff H, Leung ST. The Google file system. In: Proceedings of the 19th ACM SIGOPS symposium on operating systems principles (SOSP03). ACM, New York, USA; Oct. 2003. p. 29–43.
[6] Kubiatowicz J, Bindel D, Chen Y, Czerwinski S, Eaton P, Geels D, et al. Oceanstore: an architecture for global-scale persistent storage. In: Proceedings of the 9th international conference on architectural support for programming languages and operating systems (ASPLOS). Cambridge, MA; Nov. 2000. p. 190–201.
[7] Schultz D, Liskov B, Liskov M. MPSS: mobile proactive secret sharing. ACM Trans Inf Syst Secur Dec. 2010;13(4):34.
[8] Zhou L, Schneider FB, Renesse RV. APSS: proactive secret sharing in asynchronous systems. ACM Trans Inf Syst Secur Aug. 2005;8(3):259–86.
[9] Shah NB, Rashmi K, Kumar PV. Information-theoretically secure regenerating codes for distributed storage. In: 54th annual IEEE global telecommunications conference, Kathmandu, Nepal; 2011. p. 1–5.
[10] Li M, Qin C, Li J, Lee PPC. CDStore: toward reliable, secure, and cost-efficient cloud storage via convergent dispersal. IEEE Internet Comput May-June 2016;20(3):45–53.
[11] Dimakis AG, Godfrey PB, Wainwright MJ, Ramchandran K. Network coding for distributed storage systems. In: Proceedings of IEEE international conference on computer commun (INFOCOM 07), Anchorage, Alaska; May 2007. p. 2000–8. doi:10.1109/INFCOM.2007.232.
[12] Rashmi KV, Shah NB, Kumar PV. Optimal exact-regenerating codes for the MSR and MBR points via a product-matrix construction. IEEE Trans Inf Theory Aug. 2011;57(8):5227–39.
[13] Benerjee KG, Gupta MK. Tradeoff for heterogeneous distributed storage systems between storage and repair cost. Mathematics Mar. 2015. [Online]. Available: http://arxiv.org/abs/1503.02276v1
[14] Benerjee KG, Gupta MK. On heterogeneous regenerating codes and capacity of distributed storage systems. CoRR 2014;abs/1402.3801. [Online]. Available: http://arxiv.org/abs/1402.3801.
[15] Chen YJ, Liao CH, Wang LC. An eavesdropping prevention problem when repairing network coded data from remote distributed storage. In: IEEE global communications conference, Atlanta, GA, USA; 2013. p. 2711–16.
[16] Xu J, Cao Y, Wang D, Wu C, Yang G. Optimal heterogeneous distributed storage regenerating code at minimum remote-repair bandwidth regenerating point. ETRI J Jun. 2016;38(3):529–39.
[17] Koyluoglu OO, Rawat AS, Vishwanath S. The secrecy capacity of minimum bandwidth cooperative regenerating codes. In: IEEE international symposium on information theory proceedings (ISIT), Istanbul, Turkey; 2013. p. 1421–5.
[18] Dau SH, Song W, Yuen C. On block security of regenerating codes at the MBR point for distributed storage systems. In: 2014 IEEE international symposium on information theory (ISIT), Honolulu, HI, USA; 29 Jun.-4 Jul. 2014. p. 1967–71.
[19] Dimakis AG, Godfrey PB, Wu Y, Wainwright MJ, Ramchandran K. Network coding for distributed storage systems. IEEE Trans Inf Theory Sept. 2010;56(9):4539–51.
[20] Shah N, Rashmi K, Kumar PV, Ramchandran K. Distributed storage codes with repair-by-transfer and nonachievability of interior points on the storage-bandwidth tradeoff. IEEE Trans Inf Theory Mar. 2012;58(3):1837–52.
[21] Sasidharan B, Kumar PV, Shah NB, Rashmi KV, Ramachandran K. Optimality of the product-matrix construction for secure MSR regeneration code. In: International symposium on communications, Athens; 2014. p. 10–14.
[22] Huang C, Simitci H, Xu Y, Ogus A, Calder B, Gopalan P, et al. Erasure coding in windows azure storage. In: Usenix conference on technical conference, Boston MA; 2012. p. 82–96.

Please cite this article as: J. Xu et al., Generalised regenerating codes for securing distributed storage systems against eavesdropping, Journal of Information Security and Applications (2017), http://dx.doi.org/10.1016/j.jisa.2017.02.002