Future Generation Computer Systems 93 (2019) 58–67


Lifetime-aware FTL to improve the lifetime and performance of solid-state drives ∗

Yubiao Pan a,b,∗, Yongkun Li c, Huizhen Zhang a, Yinlong Xu c

a The School of Computer Science and Technology, Huaqiao University, Xiamen, China
b The College of Mechanical Engineering and Automation, Huaqiao University, Xiamen, China
c The School of Computer Science and Technology, University of Science and Technology of China, Hefei, China

Highlights

• A compression-aware PMT achieves less memory cost compared with existing schemes.
• A latency-aware read approach is proposed to address the read amplification problem.
• A latency-aware write approach is proposed to further improve the performance.

Article info

Article history: Received 9 January 2018; Accepted 7 October 2018; Available online xxxx.

Keywords: Solid-state drives; Lifetime-aware FTL; Compression; Endurance; Performance

Abstract

Data compression techniques have been deployed in SSDs to prolong their lifetime, owing to the good trade-off between data reduction and performance degradation. However, existing schemes either consume a large amount of DRAM or cause the read amplification problem, which may further result in significant performance degradation. In this paper, we propose a lifetime-aware FTL scheme (LAFTL) to improve both the lifetime and performance of SSDs by compressing several pages into one physical page. We design a compression-aware page mapping table (PMT) to reduce memory consumption, and propose a latency-aware approach to improve performance. We conduct extensive trace-driven evaluations based on real-world workloads; results show that LAFTL reduces the average read response time by 3%–78% and ensures minimal DRAM consumption while obtaining an acceptable data reduction rate compared to other schemes.

© 2018 Elsevier B.V. All rights reserved.

∗ Correspondence to: The School of Computer Science and Technology, Huaqiao University, No. 668, Jimei Road, Jimei District, Xiamen City, Fujian Province, China.
E-mail addresses: [email protected] (Y. Pan), [email protected] (Y. Li), [email protected] (H. Zhang), [email protected] (Y. Xu).
https://doi.org/10.1016/j.future.2018.10.011
0167-739X/© 2018 Elsevier B.V. All rights reserved.

1. Introduction

Flash-based solid-state drives (SSDs) are an emerging class of storage devices that provide higher I/O performance, lower power consumption and less noise than traditional HDDs. As the price of SSDs continues to drop, they have been widely deployed in consumer devices, desktop systems and large-scale storage systems [1–3]. However, SSDs have several limitations; one major issue is that each block in an SSD wears out after a limited number of program/erase cycles. For example, multi-level cell (MLC) SSDs allow a block to sustain only about 10K erasures, and this number may drop to several thousand for triple-level cell (TLC) SSDs [4,5]. When a block reaches its erasure limit, it is treated as a bad block and replaced by a new block from the over-provisioning area, which has a limited size. On the other hand, the bit error rate of a flash chip increases as it undergoes more erasures [6,7], and the increase becomes sharp as the number of erasures approaches the limit.

Since writing less data into SSDs leads to fewer erasures, data reduction technologies are a good choice for prolonging the lifetime of SSDs. For example, data compression schemes have been developed in SSDs [8–12]. Existing data compression schemes are all deployed in the FTL, and they can be classified into three categories: compression packing [8–10], combination compressing [11], and compression compacting [12]. For ease of presentation, we denote compression packing, combination compressing and compression compacting as CompPack, CombComp and CompComp, respectively. In particular, CompPack first compresses each incoming page, then stores the compressed pages in a buffer, and finally packs them into a single physical page. It needs an additional structure to record the offset and size of each compressed page in the page mapping table (PMT), which consumes a large amount of DRAM inside the SSD. Besides, the data reduction rate of CompPack is also the lowest among the three schemes. CombComp first combines a fixed-size group of incoming pages (e.g., 4 pages) into a chunk in the buffer, then compresses the chunk and flushes the


compressed chunk into flash. It achieves a better data reduction rate, but allowing a compressed chunk to span several physical pages causes read amplification. Besides, it also consumes additional DRAM to serve read requests. CompComp compresses each incoming page and writes the compressed pages sequentially to flash, leaving no space between them, which helps to improve space utilization. However, CompComp also causes read amplification and consumes a large amount of DRAM. Fig. 1 further compares the three schemes in detail. How to mitigate read amplification and reduce the memory overhead while preserving an acceptable data reduction rate remains a critical problem for compression-based FTL designs that aim to improve both the performance and lifetime of SSDs.

In this paper, we propose LAFTL (short for lifetime-aware FTL), a scheme that leverages compression in SSDs. LAFTL improves the performance and lifetime of SSDs with two novel designs. One is a compression-aware page mapping table, which consumes the least DRAM among the existing schemes described above. The other is a latency-aware approach, which improves read/write performance when conducting decompression and compression in SSDs. In particular, we make the following contributions in this paper.

• We propose a compression-aware PMT design to reduce memory consumption. In this design, we need only 3 additional bits for each page in the PMT, which ensures minimal modification of the original PMT in the FTL. Furthermore, our design incurs a fixed memory overhead even when the physical page size increases, while other existing schemes may require more DRAM space for larger page sizes.
• We develop a latency-aware read approach to address the read amplification problem. In particular, with our design, access to any page triggers only one read operation on one physical page, which eliminates read amplification. Moreover, with the compression-aware PMT, our design intelligently stops the decompression procedure as soon as the requested page has been decompressed from the compressed page, instead of decompressing all pages encoded in it.
• We also design a latency-aware write approach to further improve performance. First, our design adaptively stops compression when there is not enough space in a physical page, so as to fit the compressed data into one physical page. Second, we predict incompressible pages and bypass the compression procedure for these pages, so as to avoid unnecessary compression operations.
• We validate the effectiveness of LAFTL in improving the lifetime and performance of SSDs by using trace-driven simulations with real-world workloads. Results show that compared to other compression-based schemes, our scheme ensures minimal memory overhead in DRAM and reduces the average response time while preserving an acceptable data reduction rate.

The rest of this paper is organized as follows. In Section 2, we first provide necessary background on SSDs, then give a brief introduction to existing schemes which deploy compression in SSDs, and finally we discuss the problems in existing schemes and present the motivation of our design. In Section 3, we describe the detailed design of LAFTL.
In Section 4, we conduct trace-driven simulations to show the performance and lifetime improvement of our design. We review related work in Section 5 and conclude the paper in Section 6.

2. Background and motivation

In this section, we first introduce some background on SSDs. After that, we provide a brief introduction to existing schemes which deploy compression in SSDs, and give the motivation of our design.


2.1. Background on SSDs

Read, write and erase are the three basic operations in SSDs. In particular, read and write operations are performed in units of a page, with a typical size of 4 KB. The erase operation is performed in units of a block, which commonly consists of 128 or 256 physical pages. Because pages cannot be updated in place, SSDs adopt an out-of-place update scheme. Precisely, in order to update one page, this scheme first writes the new data to another free page, and then marks the old page as invalid. In addition, SSDs introduce garbage collection (GC) to reclaim the space occupied by invalid pages. Because their operations differ from those of HDDs, SSDs employ the flash translation layer (FTL) to handle page mapping and GC, so as to provide the same host interface (e.g., SATA, PCIe, etc.) as HDDs. A page-mapped FTL maps logical page numbers (LPNs) to physical page numbers (PPNs) using a page mapping table (PMT).

2.2. Existing compression-based approaches

As stated in Section 1, data compression not only obtains a better data reduction rate than delta encoding, but also achieves higher performance than data deduplication in SSDs. Various data compression schemes in the FTL have been proposed, and they can be classified into three categories: CompPack [8–10], CombComp [11], and CompComp [12]. Fig. 1 illustrates the three schemes. In this example, we suppose that there are four pages, denoted A, B, C, D, in four incoming write requests. We assume that the four write requests arrive sequentially and their LPNs are 0000, 0001, 0002 and 0003, respectively.

Fig. 1(a) shows an example of CompPack. It first compresses the data into A′, B′, C′, D′, then packs A′ and B′ together because they fit into one physical page. Unfortunately, C′ and D′ each occupy one physical page separately, because the size of C′ plus D′ is larger than one page. Finally, the compressed data are flushed to flash after their mapping information is added to the PMT. As shown in Fig. 1(a), LPN 0000 is mapped to PPN 0105, and so is LPN 0001. LPN 0002 is mapped to PPN 0106, and LPN 0003 to PPN 0107. In order to distinguish B′ within the page at PPN 0105, additional OFFSET and SIZE fields are needed. For example, to read the page with LPN 0001, the FTL fetches the data from the physical page with PPN 0105 through the PMT, extracts the content of B′ by using its OFFSET and SIZE in the PMT, and decompresses B′ to serve the read request.

Fig. 1(b) illustrates CombComp. It first combines the four pages into a chunk denoted (A + B + C + D), then compresses this chunk into (A + B + C + D)′. Finally, the compressed chunk is flushed to flash after the mapping information in the PMT is updated. As shown in Fig. 1(b), LPNs 0000, 0001, 0002 and 0003 are all mapped to PPN 0105. Because the compressed chunk (A + B + C + D)′ occupies two physical pages, an additional data structure is needed to record the number of pages that the compressed chunk occupies. To read the page with LPN 0002, the FTL must first fetch two physical pages and decompress the compressed chunk back to (A + B + C + D). With the help of the head information shown in Fig. 1(b), it gets the third page from the original chunk to serve the read request.

As illustrated in Fig. 1(c), CompComp first compresses the data into A′, B′, C′, D′, just like CompPack, but it writes the compressed pages to flash sequentially. This scheme allows a compressed page to span two physical pages (e.g., the compressed page C′). It also needs the OFFSET and SIZE in the PMT to serve read requests. For instance, to read the page with LPN 0002, the FTL directs the read operation to the physical page with PPN 0105 via the PMT. According to the OFFSET and SIZE, the FTL knows that this read operation needs to fetch two physical pages; after reading the data from flash, it obtains C′ and decompresses it to its original version C.
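As an illustration of the CompPack packing rule described above, here is a minimal sketch. The function name and the compressed sizes are hypothetical, not from the paper; compressed pages are packed greedily into the current physical page and a new page is started once the next one no longer fits.

```python
PAGE_SIZE = 4096  # bytes, the typical 4 KB page

def pack_pages(compressed_sizes, page_size=PAGE_SIZE):
    """CompPack-style packing sketch: greedily pack consecutive
    compressed pages into one physical page; start a new physical
    page when the next compressed page no longer fits. Returns a
    list of physical pages, each a list of logical page indices."""
    pages, current, used = [], [], 0
    for lpn, size in enumerate(compressed_sizes):
        if current and used + size > page_size:
            pages.append(current)      # current page is full
            current, used = [], 0
        current.append(lpn)
        used += size
    if current:
        pages.append(current)
    return pages

# A' and B' fit together; C' and D' each need their own page,
# mirroring the Fig. 1(a) example (the sizes are made up).
print(pack_pages([1500, 2000, 3000, 2500]))  # [[0, 1], [2], [3]]
```

As in Fig. 1(a), the first physical page then holds two compressed pages, which is why the OFFSET and SIZE fields are needed to locate B′ inside it.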


Fig. 1. An example of three categories of compression schemes in FTL.

2.3. Problems and motivation

On the one hand, all of the above compression-based schemes consume a large amount of DRAM. Assume that these schemes are implemented in a page-mapped FTL and that the size of each physical page is 4 KB. Then 12 bits are enough to represent the OFFSET in Fig. 1(a) and Fig. 1(c), and keeping the SIZE consumes the same number of bits as the OFFSET. Besides, a practical implementation also needs one bit to indicate whether a page is compressed or not, which is not shown in Fig. 1. In an SSD with a capacity of 256 GB, CompPack therefore needs at least (256 GB / 4 KB) × (12 bits + 12 bits + 1 bit) = 200 MB of additional DRAM space compared to an FTL without compression, and so does CompComp. Furthermore, the head information in Fig. 1(b) is obviously larger than 200 MB. Such large DRAM consumption inevitably increases the production cost of SSDs.

On the other hand, CombComp and CompComp cause the read amplification problem, which means an SSD may fetch more than one physical page to serve a single-page read request. Though CompPack does not cause read amplification, its data reduction rate is the lowest among the three schemes, as it performs compression page by page.

From the above discussion, we conclude that eliminating read amplification and minimizing memory overhead while maintaining an acceptable data reduction rate remains a critical problem for compression-based FTL design, affecting both the performance and lifetime of SSDs. In this paper, we address this problem by developing a lifetime-aware FTL, which we call LAFTL, and we present the design details in the next section.
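The 200 MB estimate above can be reproduced with a few lines. This is only a sketch of the arithmetic; the 25-bit per-page cost (12-bit OFFSET, 12-bit SIZE, 1-bit flag) and the capacity figures are the ones from the text.

```python
def metadata_overhead_mb(capacity_bytes, page_bytes, bits_per_page):
    """Total DRAM needed for per-page compression metadata, in MB
    (MiB). Pages = capacity / page size; each page carries
    bits_per_page bits of metadata."""
    n_pages = capacity_bytes // page_bytes
    return n_pages * bits_per_page / 8 / 2**20

CAPACITY = 256 * 2**30   # 256 GB SSD
PAGE = 4 * 2**10         # 4 KB pages

# CompPack/CompComp: 12 + 12 + 1 = 25 bits per page.
print(metadata_overhead_mb(CAPACITY, PAGE, 25))  # 200.0
```

The same function with 3 bits per page gives the 24 MB figure for the compression-aware PMT discussed in Section 3.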

3. Design of LAFTL

In this section, we first present our design objectives, then show the details of LAFTL, which contains a compression-aware PMT design and latency-aware read and write approaches. Finally, we discuss several issues in practical implementation.

3.1. Design objectives

The lifetime-aware FTL design mainly aims to improve both the performance and lifetime of SSDs, and it is motivated by the following three objectives.

• Minimizing additional memory overhead. Compression requires additional resources, which is one major limitation when implementing it inside SSDs. These additional resources include a large amount of DRAM, which is closely related to both the price and performance of SSDs. Therefore, our first objective is to minimize the additional memory overhead.
• Eliminating read amplification. Compression may cause the read amplification problem in SSDs, which in turn degrades performance. Therefore, we must propose a novel design that eliminates read amplification for all read requests.
• Obtaining an acceptable data reduction rate. Reducing data writes to SSDs improves their lifetime, and thus decreases the price/capacity ratio. Besides, it can also improve the performance of SSDs by reducing garbage collection overhead. Thus, our third objective is to obtain an acceptable data reduction rate by utilizing compression in the LAFTL design.

To achieve these three goals, the main idea of LAFTL is to compress as many pages together as possible and store the compressed output entirely within one single physical page. In particular, LAFTL can compress at most seven incoming pages into one physical page. Specifically, LAFTL consists of a novel compression-aware PMT design to reduce resource consumption, as well as latency-aware read and write approaches to improve performance. We present the details of these techniques in the following subsections.

3.2. Compression-aware PMT

Because the LAFTL compresses at most seven pages together into one physical page, the PMT requires additional resources, as described in Section 2. Note that minimizing the consumption of additional resources is the key point of the LAFTL design. Therefore, we propose a compression-aware PMT which consumes only 3 additional bits for each page in the PMT. Thus, a 256 GB SSD needs only (256 GB / 4 KB) × 3 bits = 24 MB of additional DRAM space, which is much less than that consumed by the existing schemes shown in Fig. 1.

Three bits can represent eight states. First, we use 000 to indicate whether the page with a specific LPN is compressed or not. Due to the variation of data compressibility at runtime, some incompressible pages, such as video, picture or encrypted data, may also be written to LAFTL. Unfortunately, a compression algorithm may even increase the size of such data, so the compressed version would occupy more than one physical page in the SSD. Thus, it is necessary to store the original version of such data rather than the compressed version. If the state is 000, the corresponding physical page contains the original data. Otherwise, we must perform read and decompression operations to obtain the original data. We then use the remaining states to indicate the relative order of a page when it is compressed with other pages. For example, 010 indicates that the page with the specific LPN is at the second place in the compression order. Seven states are enough to indicate all relative orders, because LAFTL compresses at most seven pages together into one physical page.

Fig. 2 shows the three cases that may occur when the compression-aware PMT is at work.

Fig. 2. Three cases when performing compression in compression-aware PMT.

Fig. 2(a) shows the case in which seven pages can be compressed into one physical page. In this case, when serving the write requests, the LAFTL first compresses those 7 pages together, and then records the same PPN for those pages in the compression-aware PMT. We note that the additional 3 bits are filled with the corresponding state, respectively. Finally,


the LAFTL flushes the compressed data into the physical page, say the page with PPN 0105 in the example. If we want to read a page, e.g., the page with LPN 0001, the LAFTL first uses one lookup in the compression-aware PMT to fetch the data from the physical page, i.e., the page with PPN 0105 in the example, and then performs decompression to get the data according to the relative order recorded by the three bits. For example, the second 4 KB of data is the one we need in the example, because its state is 010 in the compression-aware PMT.

Fig. 2(b) shows the case in which fewer than seven pages are compressed into one physical page, e.g., only four pages as in the example. Compressing only four pages together means that either compressing a fifth page with those four pages would make the compressed result occupy more than one physical page, or the SSD is forced to commit the four pages to flash because buffering times out. In this case, the compression-aware PMT uses only the first four states to denote the relative orders of the four pages.

Finally, Fig. 2(c) shows the case in which the compressed data exceeds one physical page even when only a single page of data is compressed. That is, compression actually increases the size of the uncompressed data. We call this case a compression failure, and LAFTL chooses to flush the original uncompressed data in this case. As shown in the example, the size of the incoming page with LPN 0000 increases after compression, so the LAFTL flushes the original page into the physical page with PPN 0105. Finally, state 000 is assigned to LPN 0000 in the compression-aware PMT to indicate that the data is not compressed. When the LAFTL serves a read request for the page with LPN 0000, it uses a lookup in the compression-aware PMT to get its PPN and state, and then fetches the content from its PPN without performing decompression.

As described above, the compression-aware PMT ensures minimal modification of the PMT in the FTL. Furthermore, the compression-aware PMT consumes no extra space even when the physical page size increases. The reason is that the LAFTL uses the additional bits to represent states, which do not depend on the page size. This feature is very important because page size varies across SSDs, and it tends to be larger (e.g., 8 KB/16 KB) in some SSDs. In contrast, the other schemes introduced in Section 2 will consume more DRAM for those SSDs. For example, the OFFSET shown in Fig. 1(a) will consume 14 bits for each LPN when the physical page size increases to 16 KB, whereas it consumes 12 bits when the page size is 4 KB.

3.3. Latency-aware read

Enabling compression in an SSD may increase the latency of read operations. To alleviate the influence of compression on read latency, we propose a latency-aware read approach, which reads only one physical page from the flash chip for each read request and also alleviates the latency caused by the decompression procedure. In particular, our latency-aware read can stop the decompression procedure intelligently as soon as the desired page has been decompressed, rather than decompressing all pages compressed into the physical page. Take Fig. 2(b) as an example. If there is a read request for the page with LPN 0001, the LAFTL first finds its PPN and state in the compression-aware PMT, then fetches the content from the physical page with PPN 0105. According to the state 010, the LAFTL stops the decompression procedure once it has decompressed the first two pages. Finally, it uses the second page to respond to the read request. Note that the decompression procedure can always be stopped correctly by using the state in the compression-aware PMT, even though the LAFTL fetches the whole content, which may consist of the compressed content and some irrelevant information, from the corresponding physical page. Fig. 3 shows the flow of a read operation using LAFTL.
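The latency-aware read combined with the 3-bit state can be sketched as follows. All interfaces here (`pmt`, `flash`, `decompress_one`) are hypothetical stand-ins for illustration, not the paper's implementation; the point is that exactly one flash read is issued and decompression stops at the recorded relative order.

```python
UNCOMPRESSED = 0  # state 000: page stored as-is

def read_page(lpn, pmt, flash, decompress_one):
    """Latency-aware read sketch. pmt maps LPN -> (PPN, state);
    flash maps PPN -> raw physical-page content; decompress_one(raw, i)
    yields the i-th page encoded in raw. One flash read is performed,
    and decompression stops as soon as the desired page is recovered."""
    ppn, state = pmt[lpn]
    raw = flash[ppn]              # the single physical-page read
    if state == UNCOMPRESSED:
        return raw                # no decompression needed
    # State k (1..7) means "k-th page in the compressed group":
    # decode pages 1..k, then stop early instead of decoding all.
    for order in range(1, state + 1):
        page = decompress_one(raw, order)
    return page

# Toy backing store: "compression" is just a list of page contents.
flash = {0x0105: ["A", "B", "C", "D"]}
pmt = {0: (0x0105, 1), 1: (0x0105, 2), 3: (0x0105, 4)}
decomp = lambda raw, i: raw[i - 1]
print(read_page(1, pmt, flash, decomp))  # "B"
```

For LPN 0001 the state is 2 (binary 010), so only the first two pages of the group are decoded, matching the Fig. 2(b) walkthrough.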


Fig. 3. The flow diagram of a read operation.

Note that Fig. 3 shows how the LAFTL handles only one read request. To handle multiple read requests, a read buffer can be used to avoid some fetching and decompression operations, because keeping data that have already been fetched from physical pages in the read buffer increases the hit rate thanks to data locality. Besides, with compression enabled, the read buffer can cache more data, which further improves the hit rate and alleviates read latency.

3.4. Latency-aware write

Compression may reduce write latency because it reduces the amount of data written into the SSD. However, it also adds another operation to the write path, which takes extra time to finish the write operation. To address this issue, we design a latency-aware write approach with three key points. First, the latency-aware write can intelligently stop the compression procedure when there is not enough space in a physical page for compressing the seven incoming pages of data. After stopping compression, it cuts off the partial compressed result that just fits into one physical page. Second, incompressible page data can be predicted by the latency-aware write, and the compression procedure is bypassed for the writes of these pages. Third, when only one page of data can be compressed into a physical page, the LAFTL keeps its original version on flash instead of storing the compressed result. Note that the first two key points avoid unnecessary overhead during the compression procedure, so as to improve write performance. The last two key points eliminate the decompression procedure when performing read operations on those pages.
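A minimal sketch of these three key points, under the simplifying assumption of greedy per-page grouping (the helper names `compress` and `is_incompressible` are invented for illustration and abstract away the actual compressor and predictor):

```python
PAGE_SIZE = 4096  # bytes

def write_group(pages, compress, is_incompressible):
    """Latency-aware write sketch. Greedily fits up to seven pages'
    compressed output into one physical page:
      1. stop compressing once the next page no longer fits,
      2. bypass compression for pages predicted incompressible,
      3. keep a lone page uncompressed (readers then skip
         decompression entirely).
    Returns ('raw', page) or ('compressed', pages_in_group)."""
    if is_incompressible(pages[0]):
        return ("raw", pages[0])           # key point 2: bypass
    group, used = [], 0
    for p in pages[:7]:                    # at most seven pages
        size = len(compress(p))
        if used + size > PAGE_SIZE:
            break                          # key point 1: stop early
        group.append(p)
        used += size
    if len(group) <= 1:
        return ("raw", pages[0])           # key point 3: store original
    return ("compressed", group)

# Toy compressor that halves every page; nothing is incompressible.
compress = lambda p: p[: len(p) // 2]
result = write_group([b"x" * 4096] * 7, compress, lambda p: False)
print(result[0], len(result[1]))  # compressed 2
```

With a 2:1 compressor, exactly two pages fit into one physical page before the early stop triggers, which is the behavior the first key point describes.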

Fig. 4. The working flow of a write operation.

To realize the latency-aware write approach, two simple predictors, denoted P1 and P2, are proposed. In particular, when the first page in a group of pages is being compressed, P1 checks the data reduction rate after one fourth of its content has been compressed. Only if not a single bit can be compressed does P1 treat the page as incompressible, in which case the latency-aware write stops the compression procedure and flushes the original content. When the second page can be compressed together with the first page into one physical page, P2 takes the data reduction rate of the second page and uses it to predict whether the third page can be compressed with the first two pages into one physical page. If not, we stop the compression procedure and flush the compressed result of the first two pages. Otherwise, P2 is updated to the data reduction rate of the third page and used to make the prediction again. Note that this predictor design is reasonable because of the locality in workloads. Fig. 4 shows the flow of a write operation.

The LAFTL uses a buffer to combine the incoming pages and does not send them through the write module until seven pages


have been combined into a group, the SSD is forced to commit the data to flash, or buffering times out. Thus the first step in Fig. 4 is reasonable. In addition, in the next subsection we formulate the procedure of compression in the SSD to guide developers in selecting a compression algorithm.

3.5. Discussion

3.5.1. PMT management

Data updates change the LPN–PPN mappings in the PMT. Though we add 3 bits to each entry of the PMT, there is no difference between the PMT management of the LAFTL and that of the original FTL. We also pack the entries of the PMT into translation pages in ascending order of LPNs. These translation pages are stored in the flash chips, and the flash translation layer adopts a TLB-like cache policy with an LRU scheme to serve requests and maintain consistency.

3.5.2. P1 in the latency-aware write

As described above, P1 treats a page as incompressible only if its data reduction rate is 1. Otherwise, the first page and the second one are always compressed together to check whether they can fit into one physical page. However, this may introduce compression latency without any benefit. For example, suppose the data reduction rate of the first page is 0.8 and the second page cannot be compressed with the first into a physical page. Then the LAFTL performs compression operations for these two pages but flushes the original content of the first one into flash, according to Fig. 4. Because the LAFTL compresses several pages into one physical page, it is likely that the first two pages can be compressed into one physical page when the data reduction rate of the first one is 0.5. Therefore, compression continues only if the data reduction rate of the first quarter of the first page is less than 0.5. This setting of P1 alleviates the latency caused by compression, but introduces more writes into flash memory due to the increase of the data reduction rate.
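The refined P1 rule can be stated in a few lines. This is only a sketch: the function name is ours, the 0.5 threshold is the one discussed above, and the data reduction rate is taken as compressed size over original size (so 1 means no reduction, matching the incompressible case).

```python
def p1_continue(sample_compressed_len, sample_len, threshold=0.5):
    """P1 predictor sketch: after compressing the first quarter of a
    page, compute its data reduction rate (compressed size / original
    size) and keep compressing only if the rate is below `threshold`.
    With the stricter 0.5 threshold of Section 3.5.2, a page passing
    the check could plausibly share one physical page with another."""
    rate = sample_compressed_len / sample_len
    return rate < threshold

print(p1_continue(400, 1024))   # True: rate ~0.39, keep compressing
print(p1_continue(820, 1024))   # False: rate ~0.80, flush original
```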
We will further show the impact of this setting of P1 on the average write response time and the number of writes in Section 4.

4. Experimental evaluation

In this section, we conduct extensive evaluations to show the effectiveness and performance of our LAFTL, using the widely accepted trace-driven SSD simulator FlashSim [13]. In particular, we compare LAFTL with three compression-based approaches, CompPack, CombComp and CompComp, as well as a traditional FTL without compression, in terms of lifetime and I/O performance. For lifetime, we compare the total number of pages written into the SSD. For I/O performance, we show the average read/write response time of the requests within a workload. Besides, we also study the impact of different settings of the latency-aware write on the average write response time and the number of writes. In the following subsections, we first introduce the FlashSim configuration and the I/O workloads used to drive our evaluations, and then present the evaluation results.

4.1. FlashSim configuration

In our evaluation, an SSD with 64 GB capacity is configured to contain 2 packages, each of which contains 4 dies with 32 planes each. There are 256 blocks in each plane and 256 pages of size 4 KB in each block. For the timing parameters, reading a page of data from flash media into a register takes 25 µs, and writing a page of data from a register to flash media takes 300 µs. Besides, compressing one page of data is set to 50 µs, and decompressing one page of data from its compressed version is set to 2.5 µs. We extend FlashSim with our lifetime-aware FTL, CombComp, CompComp, and CompPack.
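Under these timing parameters, a back-of-envelope latency model (not the simulator's actual accounting, which also covers queueing and channel parallelism) illustrates why compressing a full group can pay off on the write path:

```python
# Per-operation latencies from the simulator configuration (in µs).
T_READ, T_WRITE = 25, 300       # flash page read / program
T_COMP, T_DECOMP = 50, 2.5      # compress / decompress one page

def write_latency(n_pages, compressed=True):
    """Rough service-time sketch: compressing n pages and programming
    one physical page, versus programming n uncompressed pages."""
    if compressed:
        return n_pages * T_COMP + T_WRITE
    return n_pages * T_WRITE

# Seven compressible pages: 7*50 + 300 = 650 µs vs 7*300 = 2100 µs.
print(write_latency(7), write_latency(7, compressed=False))
```

On the read path the cost is asymmetric: one page read plus at most seven decompressions adds only 25 + 7 × 2.5 = 42.5 µs, which is why the early-stop decompression of Section 3.3 matters mostly for CPU work rather than flash time.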


Table 1
Statistics of workload traces.

Trace     # of W.   Avg. W. size   # of R.   Avg. R. size
Fin1      4.06 M    4639.2 B       1.27 M    4125.2 B
Fin2      0.56 M    5369.0 B       3.14 M    4635.7 B
Webmail   6.39 M    4096.0 B       0.56 M    10359.0 B
Online    4.21 M    4096.0 B       0.69 M    8894.9 B

Table 2
Additional resource consumption for each page in DRAM under different page sizes.

Size of page   LAFTL    CombComp   CompComp   CompPack
4 KB           3 bits   107 bits   25 bits    25 bits
8 KB           3 bits   107 bits   27 bits    27 bits
16 KB          3 bits   107 bits   29 bits    29 bits

4.2. Workloads

We consider four workloads in our evaluations: Fin1 and Fin2 [14], and Webmail and Online [15]. Fin1, Webmail and Online are all write-dominant. In particular, the Fin1 trace contains 4.06 million write requests and 1.27 million read requests. The Webmail trace contains 6.39 million write requests and 0.56 million read requests. The numbers of writes and reads in the Online trace are 4.21 million and 0.69 million, respectively. Fin2 is read-dominant. Readers can refer to Table 1 for the detailed statistics of the workloads. Additionally, we conduct the evaluations with the compression-based FTLs by assigning randomized data reduction rate values to the requests in the traces. These values follow a Gaussian distribution with mean µ, as in [16]. In our experiments, we set µ = 0.25, 0.5 and 0.9 to represent high, medium and low data reduction levels, respectively. In the following subsections, we show the results under different I/O workloads with different data reduction levels for the different FTLs.

4.3. Memory overheads

Our proposed compression-aware PMT minimizes the memory overhead compared to other compression-based FTLs. This feature is the key point of our LAFTL. As described in Section 3, our compression-aware PMT consumes only 3 extra bits per page compared to a page-mapped FTL without compression. For CompComp and CompPack, as shown in Fig. 1, the OFFSET consumes 12 bits with a page size of 4 KB, and so does the SIZE. Besides, CompComp and CompPack both need one bit to indicate whether a page is compressed or not. Thus, they both consume 25 extra bits for the compression function. For CombComp, from Fig. 1(b), we find that the amount of additional resource consumption is related to the total number of pages in the SSD. If there are 64 million pages, 26 bits are needed to represent an LPN/PPN, which costs an additional (26 × 4 + 2 + 1) = 107 bits. Table 2 shows the additional memory overhead for each page in DRAM under different page sizes. Compared with the other compression-based FTL schemes, we can conclude that, on the one hand, the LAFTL ensures minimal DRAM consumption, which is important for the price and performance of SSDs. On the other hand, the memory consumption of the compression-aware PMT is fixed even when the page size increases, which is a very important feature because page size varies across SSDs and tends to be larger for large-scale SSDs, e.g., 3D SSDs.
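The per-page figures in Table 2 follow from a simple formula, sketched below. The CombComp constant is taken directly from the text and assumes a 64-million-page drive; the function name is ours.

```python
from math import log2

def extra_bits_per_page(page_size, scheme):
    """Per-page DRAM metadata (in bits) behind Table 2 (a sketch).
    CompComp/CompPack: OFFSET + SIZE (log2(page bytes) bits each)
    plus a 1-bit compressed flag. LAFTL: a fixed 3-bit state.
    CombComp: 4 LPN/PPN fields of 26 bits plus 3 extra bits, a
    constant from the text that does not depend on the page size."""
    if scheme == "LAFTL":
        return 3
    if scheme == "CombComp":
        return 26 * 4 + 2 + 1               # 107 bits
    offset = int(log2(page_size))           # 12 bits for 4 KB pages
    return 2 * offset + 1                   # OFFSET + SIZE + flag

for size in (4096, 8192, 16384):
    print(size, extra_bits_per_page(size, "CompComp"))
# 4096 25 / 8192 27 / 16384 29, matching Table 2
```

The formula makes the contrast explicit: the OFFSET/SIZE schemes grow with log2 of the page size, while the LAFTL state field stays at 3 bits.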


4.4. Average read response time

In this subsection, we compare the average read response time under different I/O workloads with different values of µ. In particular, we collect the response time of each read request and the total number of read requests within a workload, and then compute the average response time over all read requests. Fig. 5 shows the average read response time under the different workloads. The results show that our LAFTL achieves better read performance than all the other schemes, especially NoComp, CombComp and CompComp. In particular, for the Fin1 trace with different values of µ, LAFTL reduces the average read response time by 60%, 60%–78%, 20%–47% and 3% compared with NoComp, CombComp, CompComp and CompPack, respectively. The improvement mainly benefits from the high temporal locality in the Fin1 workload, which means that a recently accessed page is likely to be accessed again in the near future. The main reason why LAFTL achieves higher performance is that its latency-aware read design always triggers only one flash read operation per read request, while more than one page read may be triggered by CombComp and CompComp. Compared to CompPack, which also triggers only one read and decompresses the page data directly given its size and offset, LAFTL achieves a better data reduction rate (see Fig. 7), which indirectly improves the hit ratio of the read buffer. Thus, the average read response time of LAFTL can be slightly better than that of CompPack, as shown in Fig. 5. Moreover, our LAFTL achieves better write performance and a better data reduction rate than CompPack (see Figs. 6 and 7). Besides, we see that the average response time increases considerably for CombComp and CompComp as µ varies from 0.25 to 0.9. The main reason is that more and more read requests trigger more than one page read, since it becomes harder to obtain gains via compression when µ grows.

4.5. Average write response time

In this experiment, we compare the average write response time under different workloads for the different compression-based FTL schemes. In particular, we collect the response time of each write request and the total number of write requests within a workload, and then compute the average response time over all write requests. The results are shown in Fig. 6. We find that LAFTL always achieves better write performance than the other compression-based FTL schemes. The improvement mainly comes from two key points in our LAFTL design. One is that LAFTL obtains a reasonable data reduction rate and thus reduces the total number of writes to the SSD. The other is that the latency-aware write in LAFTL can intelligently stop the compression procedure if it predicts that a page cannot be compressed into one physical page. The former reduces the latency caused by flash writes, and the latter alleviates the latency caused by compression. Specifically, for the Fin1 workload, LAFTL reduces the average response time of write requests by 17%–74.1% compared with the other schemes when µ = 0.25, and by 3%–56.3% and 10.2%–40.9% when µ = 0.5 and 0.9, respectively. For the Fin2 workload, LAFTL reduces it by 15.9%–62.6%, 2.7%–37.7% and 11.5%–20.7% compared with the other schemes for µ = 0.25, 0.5 and 0.9, respectively. The main reason is that the high locality in the Fin1 workload makes its data easy to compress, which leads to fewer writes to the SSD. One special case, shown in Fig. 6(c), is that the write performance of all the compression-based FTLs is even lower than that of NoComp for the Webmail workload with µ = 0.9. This is because there is little benefit from the compression procedure. In other words,

Table 3
Total number of page writes to the SSD and the average response time with LAFTL under different settings.

Trace      P1      # of page W.    Avg. W. Resp. time (us)
Fin1       0.50    1835294         156.1
           0.75    1742367         157.7
           1.00    1740483         160.6
Webmail    0.50    4554974         242.2
           0.75    4304422         243.2
           1.00    4298763         247.8

the latency saved by the reduced number of writes to the SSD is smaller than the latency added by the compression operations. The Online workload shares the same trend as the Webmail workload.

4.6. Total number of writes

In this subsection, we collect the total number of pages written to the SSD under different workloads with different values of µ. In this evaluation, we allow P1 in LAFTL to continue the compression only when the data reduction rate measured on the first quarter of the first page is less than 0.75. The results are shown in Fig. 7, where each sub-figure corresponds to a particular setting of µ. The horizontal axis represents the different workloads, and the vertical axis represents the number of page writes of each scheme. For comparison, we normalize the results of the FTL without compression (NoComp) to one. From Fig. 7(a), we find that our LAFTL reduces the number of pages written to the SSD more than all the other compression-based schemes under all workloads with µ = 0.25. When µ is set to 0.5 and 0.9, our scheme achieves a data reduction rate similar to that of CompPack, but slightly worse than CombComp and CompComp, especially under the Webmail and Online workloads. However, the data reduction rate of LAFTL is still acceptable, and LAFTL achieves better read/write performance than the other compression-based schemes, as shown in the previous two subsections.

4.7. Impact of different settings on latency-aware write

As discussed in Section 3, P1 in the latency-aware write checks the data reduction rate once one quarter of the first page has been compressed. If it is larger than a pre-defined threshold, P1 stops the compression procedure; otherwise, it keeps compressing. Different settings of this threshold have a significant impact on write performance, e.g., the average write response time and the number of page writes to the SSD.
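The early-stop check performed by P1 can be sketched as follows. This is a hedged illustration only: zlib stands in for whatever compressor the SSD firmware actually uses, and the function name is ours.

```python
import zlib

def should_continue_compression(page, threshold=0.75):
    """P1-style early stop (illustrative sketch): compress the first
    quarter of the page; if the achieved data reduction rate
    (compressed size / original size) exceeds the threshold, predict
    that the page will not pack well and stop compressing it."""
    quarter = page[: len(page) // 4]
    rate = len(zlib.compress(quarter)) / len(quarter)
    return rate <= threshold
```

A highly repetitive 4 KB page yields a tiny rate, so compression continues; an incompressible page yields a rate near (or above) 1.0, so P1 gives up early and avoids wasting compression latency.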
In this experiment, we study this impact by setting the threshold in P1 to 0.5, 0.75 and 1, and run our LAFTL under two workloads with µ = 0.5. Table 3 shows the results. As the threshold increases, the average write response time also increases, while the total number of page writes decreases. This is because more pages go through compression to check whether they can be packed with others into one physical page, so a higher data reduction rate is obtained at the cost of extra compression latency. Therefore, setting the threshold involves a tradeoff between the number of page writes and the write response time. In particular, from the results, we can see that setting the threshold to 0.75 provides a good tradeoff for the evaluated workload traces.

Fig. 5. Average read response time under different I/O workloads with different data reduction levels.

5. Related work

Three kinds of data reduction technologies have been studied for SSDs to prolong their lifetime. Deduplication in SSDs [17–19] requires plenty of CPU and I/O resources to achieve a high data reduction rate, while delta encoding [16,20] performs better but with a lower data reduction rate. Different from these two technologies, data compression achieves better performance, which is important because compression sits on the critical I/O path; it also obtains a better data reduction rate than delta encoding. Therefore, we study how to apply data compression to SSDs in this paper. Existing data compression schemes for SSDs can be classified into three categories: CompPack [8–10], CombComp [11], and CompComp [12]. Yim et al. [8] propose the Internal Packing Scheme after studying a compression layer for the SmartMedia card system. This scheme compresses every write and uses two buffering policies to fit the compressed content, which improves the lifetime of the SmartMedia card. Park et al. [9] use a CompPack method in which the offset information for each compressed logical page is stored together with its content in the physical page itself. This design indeed reduces the memory consumption, but it yields a worse data reduction rate than LAFTL and makes read operations more complex. Zuck et al. [10] extend the method in [8] by re-ordering the compressed data to improve the data reduction rate. The three solutions above consume plenty of DRAM resources, and their data reduction rates are worse than that of our LAFTL. Lee et al. [11] propose CombComp, which combines a fixed-size group of incoming pages into a chunk, compresses the chunk, and flushes the compressed chunk to flash; this method is implemented via hardware acceleration. Chen et al. [12] propose a CompComp scheme called the IPS real-time scheme, which compresses each page first and allows a compressed page to span two physical pages, leaving much less fragmentation. The schemes in [11,12] also consume more DRAM resources than our LAFTL and

increase the read latency due to read amplification. We emphasize that our LAFTL achieves minimal memory overheads compared to existing schemes, eliminates read amplification, and has better write performance than the other schemes. Besides, there are studies focusing on host-level data compression to improve the performance and lifetime of SSD-based storage systems [21–24]. Differently, our paper applies compression at the device level, which helps SSDs obtain gains even when they are mounted in storage systems without host-level data compression. Furthermore, compression in SSDs can become a selling point for SSD products.

6. Conclusions

In this paper, we propose a lifetime-aware FTL to improve the lifetime and performance of SSDs. Our LAFTL tries to compress several pages into one physical page so as to obtain an acceptable data reduction rate and thus enhance the lifetime of SSDs. With the design of the compression-aware PMT, LAFTL ensures minimal resource consumption in DRAM compared to other existing compression-based FTL schemes. Furthermore, we develop a latency-aware approach to handle read and write requests so as to improve the performance of SSDs. We implement our LAFTL, as well as the existing schemes, including CombComp, CompComp and CompPack, in the FlashSim simulator and conduct extensive experiments. The results confirm that our LAFTL not only consumes the least DRAM resources, but also improves performance while obtaining an acceptable data reduction rate, thereby prolonging the lifetime of SSDs.


Fig. 6. Average write response time under different I/O workloads with different data reduction levels.

Fig. 7. Total number of writes under different workloads. The results of the FTL without compression (NoComp) are normalized as one.

Acknowledgments

The work of Yubiao Pan was supported by the National Natural Science Foundation of China under Grant 61802133, by the Natural Science Foundation of Fujian Province, China under Grant 2018J05107, by the Education Department of Fujian Province, China under Grant No. JAT170039, and by the Scientific Research Funds of Huaqiao University, China under Grant No. 16BS807. The work of Huizhen Zhang was supported by the National Natural Science Foundation of China under Grant No. 61502181.

References

[1] http://www.wired.com/2012/06/flash-data-centers/all/.
[2] http://www.networkcomputing.com/storage/google-plans-to-use-intelssd-storage-in-servers/d/d-id/1067741?.
[3] http://www.tomsitpro.com/articles/amazon-aws-google-ssd-cloudcomputing,1-2011.html.
[4] F. Chen, D.A. Koufaty, X. Zhang, Understanding intrinsic characteristics and system implications of flash memory based solid state drives, in: SIGMETRICS, 2009.
[5] L.M. Grupp, J.D. Davis, S. Swanson, The bleak future of NAND flash memory, in: FAST, 2012.
[6] L.M. Grupp, A.M. Caulfield, J. Coburn, S. Swanson, E. Yaakobi, P.H. Siegel, J.K. Wolf, Characterizing flash memory: Anomalies, observations, and applications, in: MICRO, 2009.
[7] N. Mielke, T. Marquart, N. Wu, J. Kessenich, H. Belgal, E. Schares, F. Trivedi, E. Goodness, L.R. Nevill, Bit error rate in NAND flash memories, in: IRPS, 2008.
[8] K.S. Yim, H. Bahn, K. Koh, A flash compression layer for SmartMedia card systems, IEEE Trans. Consum. Electron. 50 (1) (2004) 192–197.
[9] Y. Park, J.-S. Kim, zFTL: Power-efficient data compression support for NAND flash-based consumer electronics devices, IEEE Trans. Consum. Electron. 57 (3) (2011).
[10] A. Zuck, S. Toledo, D. Sotnikov, D. Harnik, Compression and SSDs: Where and how?, in: INFLOW, 2014.
[11] S. Lee, J. Park, K. Fleming, J. Kim, et al., Improving performance and lifetime of solid-state drives using hardware-accelerated compression, IEEE Trans. Consum. Electron. 57 (4) (2011).
[12] C.-H. Chen, C.-T. Chen, W.-T. Huang, The real-time compression layer for flash memory in mobile multimedia devices, Mob. Netw. Appl. 13 (6) (2008) 547–554.
[13] Y. Kim, B. Tauras, A. Gupta, B. Urgaonkar, FlashSim: A simulator for NAND flash-based solid-state drives, in: SIMUL, 2009.
[14] Storage Performance Council, http://traces.cs.umass.edu/index.php/Storage/Storage, 2002.
[15] A. Verma, R. Koller, L. Useche, R. Rangaswami, SRCMap: Energy proportional storage using dynamic consolidation, in: FAST, 2010.
[16] G. Wu, X. He, Delta-FTL: Improving SSD lifetime via exploiting content locality, in: EuroSys, 2012.
[17] F. Chen, T. Luo, X. Zhang, CAFTL: A content-aware flash translation layer enhancing the lifespan of flash memory based solid state drives, in: FAST, 2011.
[18] J. Kim, C. Lee, S. Lee, I. Son, J. Choi, S. Yoon, H.-u. Lee, S. Kang, Y. Won, J. Cha, Deduplication in SSDs: Model and quantitative analysis, in: MSST, 2012.
[19] W. He, N. Xiao, F. Liu, Z. Chen, Y. Fu, DL-Dedupe: Dual-level deduplication scheme for flash-based SSDs, in: WAIM, 2013.
[20] X. Zhang, J. Li, H. Wang, K. Zhao, T. Zhang, Reducing solid-state storage device write stress through opportunistic in-place delta compression, in: FAST, 2016.
[21] B. Mao, H. Jiang, S. Wu, Y. Yang, Z. Xi, Elastic data compression with improved performance and space efficiency for flash-based storage systems, in: IPDPS, 2017.
[22] T. Makatos, Y. Klonatos, M. Marazakis, M.D. Flouris, A. Bilas, Using transparent compression to improve SSD-based I/O caches, in: EuroSys, 2010.
[23] J. Li, K. Zhao, X. Zhang, J. Ma, M. Zhao, T. Zhang, How much can data compressibility help to improve NAND flash memory lifetime?, in: FAST, 2015.
[24] D. Harnik, R.I. Kat, O. Margalit, D. Sotnikov, A. Traeger, To zip or not to zip: Effective resource usage for real-time compression, in: FAST, 2013.

Yubiao Pan is currently a lecturer with the School of Computer Science and Technology, Huaqiao University in Xiamen. He received the B.S. and Ph.D. degrees from the School of Computer Science and Technology, University of Science and Technology of China, Hefei, China, in 2010 and 2015, respectively. His current research interests include solid-state devices, distributed storage systems, and data deduplication.


Yongkun Li is currently an associate researcher in the School of Computer Science and Technology, University of Science and Technology of China. He received the B.Eng. degree in Computer Science from the University of Science and Technology of China in 2008, and the Ph.D. degree in Computer Science and Engineering from The Chinese University of Hong Kong in 2012. After that, he worked as a postdoctoral fellow in the Institute of Network Coding at The Chinese University of Hong Kong. His research mainly focuses on performance evaluation of networking and storage systems.

Huizhen Zhang is currently a lecturer in the College of Computer Science and Technology, Huaqiao University. He received the B.S. degree in Computer Science from the University of Science and Technology of China in 2005, and the Ph.D. degree in Computer Architecture from the University of Science and Technology of China in 2010. His research mainly focuses on reconfigurable computing, compilers, and performance evaluation and optimization of computer systems.

Yinlong Xu received the B.S. degree in mathematics from Peking University in 1983, and the M.S. and Ph.D. degrees in computer science from the University of Science and Technology of China (USTC) in 1989 and 2004, respectively. He is currently a professor with the School of Computer Science and Technology at USTC. Prior to that, he served the Department of Computer Science and Technology at USTC as an assistant professor, a lecturer, and an associate professor. Currently, he leads a group of research students in networking and high-performance computing research. His research interests include network coding, wireless networks, combinatorial optimization, design and analysis of parallel algorithms, parallel programming tools, etc. He received the Excellent Ph.D. Advisor Award of the Chinese Academy of Sciences in 2006.