Utilizing cloud storage architecture for long-pulse fusion experiment data storage

Ming Zhang a,b, Qiang Liu a,b, Wei Zheng a,b,∗, Kuanhong Wan a,b, Feiran Hu a,b, Kexun Yu a,b

a State Key Laboratory of Advanced Electromagnetic Engineering and Technology, Wuhan, Hubei, China
b School of Electrical and Electronic Engineering, Huazhong University of Science and Technology, Wuhan, Hubei, China
Article history: Received 30 May 2015; Received in revised form 15 January 2016; Accepted 15 February 2016; Available online xxx.

Keywords: MongoDB; Long-pulse; LVS; Data archiving; Scientific database
Abstract

Scientific data storage plays a significant role in research facilities. The rapid growth of data in recent years has made data access, acquisition and management increasingly difficult, especially in fusion research. Future long-pulse experiments such as ITER will generate extremely large amounts of data continuously over long periods, putting pressure on both write performance and scalability. Traditional databases also have drawbacks, such as inconvenient management and architectures that are hard to scale. A new data storage system is therefore essential. J-TEXTDB is a data storage and management system built on an application cluster and a storage cluster. It is designed for big data storage and access, aiming to improve read–write speed and to optimize the structure of the data system. The application cluster provides data management functions and handles read and write operations from users. The storage cluster provides the storage services. Both clusters are composed of general-purpose servers; simply adding servers to a cluster improves the read–write performance, the storage space and the redundancy, making the whole data system highly scalable and available. In this paper, we propose a data system architecture and a data model to manage data more efficiently. Benchmarks of J-TEXTDB read and write performance are given.

© 2016 Elsevier B.V. All rights reserved.
1. Introduction

With the development of data acquisition systems and plasma control technology, more and more tokamaks are undertaking long-pulse experiments, and the data they produce grows rapidly. Taking ASDEX Upgrade as an example, the amount of data acquired per shot increased from 4 GiB to 40 GiB in seven years [1]. In addition, the data volume of the Large Helical Device (LHD) increased tenfold in approximately 5 years [2]. Furthermore, the rate of data to be recorded at ITER is anticipated to reach 50 GB/s at peak times [3]. However, traditional storage technology based on a single server has a limited read–write speed. Although a Storage Area Network (SAN) offers high-speed block-level access, it is so expensive and complicated that only large organizations can afford it. Therefore, clusters and distributed systems have to be considered.
∗ Corresponding author at: State Key Laboratory of Advanced Electromagnetic Engineering and Technology, Wuhan, Hubei, China. Fax: +86 2787793060.
E-mail addresses: [email protected], [email protected] (W. Zheng).
The purpose of a distributed file system (DFS) is to allow users of physically distributed computers to share data and storage resources through a common file system. A DFS generally offers a highly scalable and high-performance solution. Several laboratories have published relatively mature results on distributed file systems. ASDEX Upgrade uses AFS-OSD, a modification of AFS (Andrew File System) featuring a parallel storage concept, to store big files more efficiently [1]. COMPASS and LHD have each tested GlusterFS and obtained good read–write performance. COMPASS deployed a "Distributed-Replicate" 6 × 2 volume on six servers and six custom-made computers, and the maximum read speed approached 300 MiB/s for 6 clients [4]. LHD also adopted GlusterFS for its 5th-generation storage; their tests indicated that SSD-based GlusterFS volumes can provide 300–900 MB/s write and 500–1900 MB/s read performance [5]. However, a DFS is inflexible compared with a database cluster. A DFS lacks a uniform data model, so it is not easy to manage the data, especially when it grows quickly.
Fig. 1. Architecture of J-TEXTDB.
When users want to search for the data of a signal, a DFS usually returns a block of data rather than the exact data wanted. Recently, a new storage technology named NoSQL (Not only SQL) has been widely used by Internet companies such as Facebook and Twitter. NoSQL databases are increasingly applied to big data and real-time web applications, whose requirements are similar to those of fusion research. Compared with a DFS, NoSQL is more flexible and provides many functions including disaster recovery, security and horizontal scaling. Hence, J-TEXTDB adopts MongoDB [6] to archive massive amounts of data. MongoDB is an open-source, document-oriented NoSQL database designed for ease of development and scaling. This paper focuses on the following issues, which in our opinion are vital in fusion research:
• High availability and high scalability.
• Read–write speed that meets the demands of thousands of DAQ cards.
• High concurrency of requests when researchers worldwide query data.
• Redundancy and failover.
• Load balancing and security.
2. Architecture

The architecture of J-TEXTDB is made up of two parts. The first is the application cluster, which is oriented to users and also plays the role of a load balancer; it is composed of Linux Directors and APP Servers. The second is the storage cluster, which is responsible for all data transactions; it is composed of Config Servers and Shards. The architecture is depicted in Fig. 1.
2.1. Application cluster

The application cluster is a group of computers running different processes or services. Its major functions are as follows:

1. The Linux Director receives user requests, including those from general users and DAQs, and dispatches them to an idle APP Server using the Least Connections algorithm [7,8] (a sketch of this selection rule is given at the end of this subsection).
2. The APP Server receives a request and analyzes it into a series of atomic operations. The APP Server sends the operations to the storage cluster and waits for its response.
3. The storage cluster processes the operations and sends the results back to the APP Server.
4. The APP Server integrates the results and returns them to the user according to the user's demand.

The architecture of the application cluster is fully transparent to end users, who interact with it as if it were a single high-performance virtual server. When the Director accepts request packets, it routes them to the chosen servers, while response packets are sent to the client directly rather than passing back through the Director first. In addition, J-TEXTDB adopts a backup Director with Keepalived to handle Director failover and to provide strong and robust health checking for LVS clusters. EAST has also used LVS as a load balancer between users and publish servers to lighten the load [9].

Apart from supporting the WebScope of J-TEXT [10], another main function of the APP Server is to run a lightweight process named mongos. Mongos is a router for the storage cluster. In other words, if the Linux Director is the first door between users and the application cluster, mongos is the second gate between the application cluster and the storage cluster. Mongos targets operations to the storage cluster and returns the results to the APP Server.
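The scheduling rule itself is simple. The minimal Python sketch below is written only for illustration: the Director keeps a count of active connections per APP Server and dispatches each new request to the server with the fewest. The server names and counts are hypothetical; in practice the scheduling is performed inside the LVS kernel module configured through ipvsadm and Keepalived.

```python
# Illustrative sketch of the Least Connections selection rule used by the
# Linux Director; real dispatching happens in the LVS kernel module.
# Server names and connection counts are hypothetical.
active_connections = {"appserver-1": 12, "appserver-2": 7, "appserver-3": 9}

def pick_app_server(conn_table):
    """Return the APP Server that currently holds the fewest active connections."""
    return min(conn_table, key=conn_table.get)

backend = pick_app_server(active_connections)
active_connections[backend] += 1  # the dispatched request is now an active connection
print("dispatch next request to", backend)
```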
2.2. Storage cluster

The storage cluster scales out based on MongoDB. It partitions the data set and distributes the data over multiple servers, or shards [11]. A shard is a very agile component of the cluster because each shard is an independent database, and collectively the shards make up a single logical database. Thus the number of shards, and hence the capacity of the cluster, depends only on how many computers are at hand: we can add shards whenever the storage cluster needs to expand its capacity. Moreover, each shard is a master–slave replica set. With multiple copies of the data on different database servers, replication protects the database from the loss of a single server and allows recovery from hardware failures and service interruptions [11]. Each replica set comprises a primary and two secondaries; when the primary becomes unavailable, the replica set elects a secondary to be primary based on heartbeats and the priorities we set beforehand.

Each shard is composed of many chunks, each holding a contiguous range of shard key values. The database splits a chunk when it grows beyond the configured chunk size. The allowed range of the chunk size is between 1 and 1024 megabytes, and a proper chunk size benefits the performance of the cluster. Config servers are very important because they store the metadata of the storage cluster; there are exactly three config servers to ensure data consistency. The metadata contains a mapping of the cluster's data set to the shards. Mongos acts like a bridge connecting users and shards, and it uses this metadata to target operations to the specific shard when query requests arrive.

The storage cluster is a database archiving all the data, including metadata, scientific data and engineering data. It concentrates on read–write operations, failover, backup and scalability.
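As an illustration of how such a storage cluster is assembled, the sketch below uses pymongo to issue the standard MongoDB sharding commands from a client connected to a mongos router. The host names, database name, collection name and shard key are assumptions for the example, not the actual J-TEXTDB configuration.

```python
# Hedged sketch: configuring a MongoDB sharded cluster through a mongos router.
# Host names, database/collection names and the shard key are illustrative only.
from pymongo import MongoClient

client = MongoClient("mongodb://mongos-host:27017/")

# Register one shard; each shard is itself a replica set (one primary, two secondaries).
client.admin.command("addShard", "rs0/shard0-a:27018,shard0-b:27018,shard0-c:27018")

# Enable sharding on the database and shard the signal collection on a chosen key.
client.admin.command("enableSharding", "jtextdb")
client.admin.command("shardCollection", "jtextdb.signals", key={"name": 1})

# Adjust the cluster-wide chunk size (in megabytes, within the allowed 1-1024 range).
client.config.settings.update_one(
    {"_id": "chunksize"}, {"$set": {"value": 64}}, upsert=True
)
```

Once the collection is sharded, queries routed through mongos are targeted to the relevant shards using the metadata held by the config servers.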
3. Data model and user API
The data model is a big deal and must take everything into account. Therefore, we first designed a preliminary data model to test the architecture of J-TEXTDB. Since we have chosen NoSQL technology, we should not copy the entire model from MDSplus. Nevertheless, MDSplus is very good software, widely used in fusion research, and many of its details are worth learning from [12]. We absorb the advantages of MDSplus and HDF5, using a UNIX-style file path and a shot number to locate a signal [13]. There are only two concepts for users: one named Experiment and another called Signal. An Experiment is like a folder in UNIX or a Group in HDF5, and a Signal is like a file in UNIX or a Dataset in HDF5. There are only two rules in the model:
1. An Experiment and a Signal must have at least two parameters: an absolute path and a shot number.
2. An Experiment can have child Experiments and Signals; a Signal does not have any children.

Following these simple rules, users can create any model they like. Both Experiments and Signals are stored as BSON documents in a collection, which contains a group of related documents [14]. The path and shot number are not only attributes of the model classes, they are also database indexes, which improves query speed by avoiding a scan of every document in a collection.

The user API is a service designed to handle segmented data. When a continuous data flow arrives, the service splits the data into many small pieces and then stores them in the database. The data of a signal can also be resampled at a lower sample rate, so that it can be displayed at different scales and support J-TEXT WebScope. A sketch of this model and of segmented writes is given below.
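The following pymongo sketch illustrates, under assumed names, how a Signal could be stored following the rules above: each document carries an absolute path and a shot number, both fields are indexed, and a continuous stream is written as small segments. It is an illustration of the model, not the actual J-TEXTDB user API.

```python
# Hedged sketch of the Experiment/Signal model and segmented writes with pymongo.
# Database, collection and field names, and the example path/shot, are assumptions.
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://mongos-host:27017/")
signals = client["jtextdb"]["signals"]

# Index path and shot number so a query does not scan every document in the collection.
signals.create_index([("path", ASCENDING), ("shot", ASCENDING)])

def write_signal(path, shot, samples, segment_size=4096):
    """Split a continuous signal into small segments and store each as a BSON document."""
    for seq, start in enumerate(range(0, len(samples), segment_size)):
        signals.insert_one({
            "path": path,                              # UNIX-style absolute path of the signal
            "shot": shot,                              # shot number
            "segment": seq,                            # position of this piece in the stream
            "data": samples[start:start + segment_size],
        })

write_signal("/diagnostics/ip", 1043000, [0.0] * 10000)
```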
The data model used in the tests is preliminary: each document contains an Id, a shot number, a name, a double array and a path. The name attribute is generated randomly so that the data are distributed uniformly across the whole cluster.

4. Performance test

We adopted the following hardware, software and methods for the performance test of the J-TEXTDB cluster.

4.1. Server

We used a Dell PowerEdge R410; CPU: Intel(R) Xeon(R) E5606 2.13 GHz; Memory: 8 × 2 GB DDR3 2.127 GHz; System: VMware ESXi 5.5.0. Using VMware ESXi, we virtualized 6 servers, each with 2 GB memory, CentOS 6.5 64-bit, and the following software: MongoDB 2.6.4, ipvsadm v1.26, Keepalived v1.2.13.

Fig. 2. Shard test with 4 mongos, without LVS.
Fig. 3. Mongos test with 4 shards, without LVS.
Fig. 4. Chunk size test with 4 shards, 4 mongos, without LVS.
4.2. Client

A custom-made PC; CPU: Intel(R) Core(TM) i5 2.8 GHz; Memory: 4 GB DDR3; System: Windows 7 Ultimate SP1 64-bit; Software: vSphere Client 5.5.0.

4.3. Test methods

First of all, we listed the factors that may influence the performance of the cluster:

1. Number of shards.
2. Number of mongos.
3. Chunk size of the cluster.
4. LVS.
We then used the controlled-variable method and normalization to analyze the influence of each variable. The PC client acts as the data producer, and multithreading was used to send data to the virtual server via a gigabit switch. Each test was conducted 5–10 times and the results were averaged. Writing is slowest when the cluster has only 1 shard, so the results are normalized with that maximum time set to 100%. Fig. 2 shows that adding the second shard to the cluster improved performance considerably; the third and the fourth shard contributed less and less, but still improved performance. Mongos is the router of the cluster, so it sends data to the cluster after querying the config servers. In Fig. 3, the data size is 256 MB in the first and second tests and 1024 MB in the third test. Fig. 4 shows that the worst case occurs when the chunk size is 5 MB, and the curve of relative time gradually flattens once the chunk size is large enough. When LVS is added to the cluster, the client threads no longer connect to mongos directly but to the Linux Director, which dispatches their requests using the Least Connections algorithm. Connecting to the database with 500 threads takes 63 ms on average with LVS and 59 ms without it. A read test was also performed; the read speed approached 69 MB/s with 4 shards, 4 mongos and no LVS.

5. Discussion

We have conducted many tests with different variables in the cluster, and the data reported above suggest that adding shards is a good way to improve the write speed of the cluster. A proper chunk size also makes the storage cluster faster in some respects. In addition, it is normal that running with LVS is slightly slower than running without it, because without LVS the client program distributed the threads to the different mongos instances by a modulo operation. The performance improves greatly when the cluster grows from one to two servers, because this is a qualitative change from a single server to a cluster. After that, performance improves more slowly, for several reasons. On the one hand, the gain may saturate because the data set is not large enough compared with the number of shards. On the other hand, balancing the shards and guaranteeing data consistency costs more as shards are added. Fig. 3 suggests, roughly, that with little data the cluster is faster when the number of mongos is decreased.
Compared with the first and second tests, the effect is more pronounced in the third test, which is not what we expected. One possible cause is that the host CPU has only 4 cores, so adding mongos instances increases the overhead of switching between cores and degrades performance instead. Another possible reason is that the number of parallel connections is not large enough to saturate mongos. When a chunk grows beyond the specified chunk size, the database splits the chunk in half; thus, the smaller the chunks are, the more frequently they split. However, large chunks lead to a potentially more uneven distribution of data, so a proper chunk size is vital for the storage cluster. Finally, it is worth emphasizing that the entire cluster is built on one virtual server. Furthermore, in YCSB tests, MongoDB 3.0 provides around 7x the throughput of MongoDB 2.6 for multi-threaded batch inserts [15,16]. We are therefore confident that the storage cluster will perform better in a practical environment.

6. Conclusion

A number of tests were conducted on the J-TEXTDB cluster with different configurations. The number of shards, the number of mongos and the chunk size all influence the performance of the storage cluster. LVS provides a unified IP address and reduces the load. The write speed approaches 47 MB/s and the read speed is nearly 69 MB/s. These results are promising, although it should be noted that this study was carried out on only one virtual server. As a next step, we will examine the performance of failover and backup. Furthermore, the deficiencies noted above are to be addressed in our future work, and a data model suited to fusion research and other scientific data storage is also being designed.

References

[1] K. Behler, et al., Update on the ASDEX Upgrade data acquisition and data management environment, Fusion Eng. Des. 89 (5) (2014) 702–706.
[2] H. Nakanishi, et al., Data acquisition system for steady-state experiments at multiple sites, Nucl. Fusion 51 (2011) 113014.
[3] G. Abla, et al., ITERDB—the data archiving system for ITER, Fusion Eng. Des. 89 (5) (2014) 536–541.
[4] J. Písačka, et al., Cluster storage for COMPASS tokamak, Fusion Eng. Des. 87 (12) (2012) 2238–2241.
[5] H. Nakanishi, et al., Revised cloud storage structure for light-weight data archiving in LHD, Fusion Eng. Des. 89 (5) (2014) 707–711.
[6] MongoDB, 2015. https://www.mongodb.org/.
[7] Wensong Zhang, Linux Virtual Server for Scalable Network Services, Ottawa Linux Symposium, 2000.
[8] Linux Virtual Server, 2015. http://www.linuxvirtualserver.org/.
[9] F. Yang, et al., A new long-pulse data system for EAST experiments, Fusion Eng. Des. 89 (5) (2014) 674–678.
[10] G. Zhuang, et al., Overview of the recent research on the J-TEXT tokamak, Nucl. Fusion 55 (2015) 104003.
[11] MongoDB documentation, 2015. https://docs.mongodb.org/manual/core/sharding-introduction/.
[12] T.W. Fredian, J.A. Stillerman, MDSplus. Current developments and future directions, Fusion Eng. Des. 60 (3) (2002) 229–233.
[13] G. Manduchi, Commonalities and differences between MDSplus and HDF5 data systems, Fusion Eng. Des. 85 (3–4) (2010) 583–590.
[14] BSON, 2015. http://bsonspec.org/.
[15] Performance testing MongoDB 3.0 part 1: throughput improvements measured with YCSB, 2015. https://www.mongodb.com/blog/post/performance-testing-mongodb-30-part-1-throughput-improvements-measured-ycsb.
[16] Brian F. Cooper, et al., Benchmarking cloud serving systems with YCSB, SoCC (2010) 143–154.