SMR drive performance analysis under different workload environments


Control Engineering Practice 75 (2018) 86–97


Junpeng Niu a,b,*, Mingzhou Xie c, Jun Xu b,**, Lihua Xie a, Li Xia c

a School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
b Western Digital Corporation, Singapore
c Department of Automation, Tsinghua University, China

ARTICLE INFO

Keywords:
SMR drive
Queuing theory
Disk scheduling algorithm
Simulation and modeling

ABSTRACT

As the mainstream of Hard Disk Drive (HDD) technology, Shingled Magnetic Recording (SMR) drives have unique features that differ from conventional disk drives, e.g., append-only (sequential) write, indirect address mapping and garbage collection. Batch processing is also designed to fully utilize the sequential write property to improve SMR drive performance. The selection of different system parameters and policies affects the system performance and capacity efficiency. However, no dedicated analytical tool is available so far to guide the parameter and policy selection. A queuing model is built and solved through a Markov chain process for the SMR drive to analyze the system performance under different kinds of system settings and workload environments. The control policies and parameter settings are also studied to explore their relation to the system performance of SMR disks, from the points of view of both the analytical model and numerical simulation. An adaptive Garbage Collection (GC) policy is proposed to automatically select between foreground GC and background GC to provide consistent drive performance. An SMR drive simulator is further developed to validate the model and check the performance impacts of different drive parameters. We illustrate the similarity of the analytical and simulation results, and show that our tool can be utilized as a guide for SMR drive design.

1. Introduction

To meet the rapidly growing data volume, storage density is an important factor that needs to be improved in HDD design. Many efforts have been devoted to increasing the Tracks per Inch (TPI) and thus the overall capacity, such as improving the accuracy of servo control (Herrmann, Ge, & Guo, 2008; Horowitz, Li, Oldham, Kon, & Huang, 2007; Suthasun, Mareels, & Al-Mamun, 2004; Venkataramanan, Chen, Lee, & Guo, 2002). The SMR drive (Amer et al., 2011; Feldman & Gibson, 2013; Gibson & Ganger, 2011) was also introduced to increase the storage capacity while minimizing the added storage cost. In a conventional HDD, the read track is much narrower than the write track, which the SMR drive exploits to increase the track density by overlapping part of each write track with its neighboring tracks. Fig. 1 shows the data layout of an SMR drive, where the outer tracks are overlapped by the inner tracks, e.g., track 1 is overlapped by track 2. As neighboring write tracks partially overlap each other, if write data were recorded to the media at random locations, the existing data on the neighboring tracks might be over-written. To avoid this over-write problem, only appending writes are allowed in an SMR drive. To address this constraint, some research works (Liu, Zeng, & Feng, 2016; Luo, Wan

et al., 2016; Luo, Yao, Qu, Wan, & Xie, 2016; Xiao, Dong, Ma, Liu, & Zhang, 2016) group a certain number of consecutive overlapped tracks into a band, with a few non-recordable tracks in between. This solves the sequential-only write problem of the SMR drive; however, the data update process takes much more time than in a conventional HDD. Some industry players (INCITS T10 Technical Committee, 2014; INCITS T13 Technical Committee, 2016) and researchers (Hall, Marcos, & Coker, 2012) use log-structured data recording to solve the problem. In this type of architecture, the write data, whether random or sequential, are recorded into the zone sequentially. As updated data is recorded into a new location instead of replacing the existing data, to ensure data consistency the original data needs to be marked as invalid; we call it invalid data for simplicity. The invalid data needs to be cleaned to avoid wasting disk space. Garbage Collection (GC) is one of the methods to clean the invalid data, and it includes 3 steps: (1) Select a suitable zone for invalid data cleaning; the selection can be based on the dirty data ratio, which is the ratio of invalid data blocks to the total size of the zone. (2) Collect all the valid data in the zone in batch and pass it to the internal memory buffer. (3) The data collected needs to be

* Corresponding author at: School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore.
** Corresponding author.
E-mail addresses: [email protected], [email protected] (J. Niu), [email protected] (J. Xu).

https://doi.org/10.1016/j.conengprac.2018.03.016
Received 6 October 2017; Received in revised form 20 February 2018; Accepted 19 March 2018
0967-0661/© 2018 Elsevier Ltd. All rights reserved.


Fig. 1. Overview of SMR drive data layout.

processed and written to the new zone in sequence. However, when the disk workload is heavy, the idle periods available for the GC process might not be sufficient, and the amount of dirty data may keep increasing if it is not cleaned in time. When the disk capacity is nearly full, the performance of the SMR drive degrades, as the disk needs to clean some of the dirty data to free enough space for each new incoming write request. To tackle this issue, this paper proposes a new GC policy to address the poor performance of the SMR drive when the disk capacity usage reaches its limit. In the new strategy, we classify the GC process as background and foreground. The background GC process cleans invalid data only during disk idle periods and has a lower priority than user requests. The foreground GC process cleans invalid data together with user requests, regardless of whether the disk is idle, and has the same priority as user requests. Background GC and foreground GC are selected based on the changing disk status, including the idle time duration, disk fullness, etc. The bandwidth allocated to the foreground GC is also determined dynamically based on the disk status. With the proposed GC policy, the performance of the SMR drive becomes more predictable, particularly when the disk capacity usage approaches its limit.

However, with the large number of parameters and policies for handling write requests and GC processes, and the changing client requirements, it is not easy to compare and evaluate the effectiveness of these algorithms, especially the performance of SMR drives under these designs in practice. Some emulation (Pitchumani et al., 2012) and simulation tools (Tan, Xi, Ching, Jin, & Lim, 2013) have been designed and developed to conduct such comparison and evaluation; however, selecting algorithms and tuning drive parameters through a trial-and-error approach remains difficult. Theoretical analysis and modeling are required to provide guidelines and validation for simulation and actual drive design. Conventionally, analytical models (Ruemmler & Wilkes, 1994; Shriver, Merchant, & Wilkes, 1998; Xie, Xia, & Xu, 2017; Xie, Xu, & Xia, 2016) have been created to analyze the performance of HDD devices. M/G/1 (Abouee-Mehrizi & Baron, 2016; Altman, Avrachenkov, Barakat, & Núñez-Queija, 2002) and M/G/1/K (Abouee-Mehrizi & Baron, 2016; Gross, 2008) queues are commonly applied to build HDD performance models. However, with the various scheduling algorithms, such as the SCAN algorithm (Denning, 1967), Shortest Access Time First (SATF) (Jacobson & Wilkes, 1991; Seltzer, Chen, & Ousterhout, 1990; Worthington, Ganger, & Patt, 1994) and RPO scheduling (Burkhard & Palmer, 2001), the service time depends on the status of the queue. In Lebrecht, Dingle, and Knottenbelt (2009), the HDD is modeled as an M/G/1/K queue with state-dependent service time, whose arrivals follow a Poisson process, whose service time follows a general, queue-state-dependent distribution, whose capacity is limited to a maximum of K elements, and which has a single server to process the arrivals. However, the modeling of SMR drives has not been addressed before. In this paper, the SMR drive is modeled through an M/G/1/K queue with state-dependent service time, which is solved through a Markov chain model. Meanwhile, a simulator is developed to validate the analytical modeling. The main contributions of this paper are as follows:

∙ We formulate the SMR drive as an M/G/1/K queue with state-dependent service time; the formulation can be utilized to analyze the performance of the SMR drive under different workload environments and drive parameter settings. The control policies and parameter settings are also studied to explore their relation to the system performance of SMR disks, from the points of view of both the analytical model and numerical simulation.
∙ We propose a new GC policy that combines background GC and foreground GC to solve the significant performance degradation that occurs when the disk capacity is nearly full.
∙ We design and develop an SMR drive simulator to simulate the drive performance under different settings and to validate the analytical model. We study and optimize the SMR drive performance under different drive settings, such as write batch size, GC batch write size and disk status, and under different workload environments, such as read ratio, request size and sequential ratio, through both theoretical and simulation approaches.

The remainder of this paper is organized as follows. Section 2 formulates the performance of the SMR drive as a queuing model problem. Section 3 analyzes the SMR drive performance through queuing theory. Section 4 compares and analyzes the performance of the SMR drive under analytical modeling and simulation. Finally, Section 5 draws the conclusion.

2. Problem formulation

Fig. 2 shows the mechanism of the SMR drive, including the sequential write process and the GC process. In the figure, the external requests are issued from the host side and follow a Poisson distribution with predefined arrival rate λ. When requests arrive at the external queue, a read request is passed directly to the internal queue if the data is not cached. Meanwhile, a write request is buffered in the external queue.
If the number of write requests in the buffered queue reaches a pre-defined value, the write batch size, all the buffered write requests are combined into one write request and passed to the internal queue. In the internal buffer, there are two types of operations: one is the user read request and the batched write request, and the other is the GC operation, which includes the GC read operation and the GC write operation. As shown in Fig. 2, zone 1 is a normal zone containing data for an incoming read request, zone 3 has the largest ratio of dirty data, and zone X + 1 is the current active zone for recording write data. The disk can only process one of these operations at a time. With the parameters and settings listed in Tables 1 and 2, the SMR drive model is described in the rest of this section.
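As an illustration of the batching step just described, the following sketch (our own illustration, not the authors' implementation; the function name and request tuples are hypothetical) routes reads straight to the internal queue and merges every N1 buffered writes into one batched write:

```python
from collections import deque

N1 = 8  # host write batch size, default value from Table 1

def route_requests(requests, n1=N1):
    """Yield the operations handed to the internal queue, in arrival order.

    requests is an iterable of (kind, blocks) tuples, kind being "read" or
    "write".  Writes are held in the external queue until n1 of them can be
    merged; a partially filled batch simply stays buffered, as in the model.
    """
    write_buffer = deque()
    for kind, blocks in requests:
        if kind == "read":
            yield ("read", blocks)            # reads bypass the write buffer
        else:
            write_buffer.append(blocks)
            if len(write_buffer) == n1:       # batch is full: merge writes
                yield ("batched_write", sum(write_buffer))
                write_buffer.clear()

# Eight 32-block writes merge into one 256-block batched write; the
# trailing read passes through unchanged.
ops = list(route_requests([("write", 32)] * 8 + [("read", 32)], n1=8))
```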


Fig. 2. The illustrative diagram for request queuing for SMR drive with GC process.

Table 1
SMR drive setting and workload related parameters.

Name            Description                       Default value
N1              Host write batch size             8
N2              GC read batch size                8
r_gt            Garbage ratio threshold           0.3
t_idle          Idle time threshold               0.005 s
p_disk          Disk fullness ratio               0.8
r_dirty         Average zone garbage ratio        0.3
P_b^threshold   Queuing block ratio threshold     0.1%
λ               Arrival rate                      0.0
r_r             Read request percentage           0.4
S_req           Average request size              32 K
r_s             Host request sequential ratio     0.0

Table 2
Parameters utilized in the paper.

Name         Description
λ1           Disk service rate for read request
λ2           Disk service rate for batch write
λ3           Disk service rate for GC process
t1           Disk service time for read request
t2           Disk service time for batch write
t3           Disk service time for GC process
P_b          Blocking probability
t_gc^r       GC read processing time
t_gc^w       GC write processing time
t_pos^(n)    Positioning time with n requests
t_tran       Data transfer time
t_seek^(n)   Seek time with n requests
t_rot^(n)    Rotational latency with n requests

Table 3
SMR drive setting parameters.

Name          Description                              Value
t_min^seek    Minimal seek time                        0.01 ms
t_max^seek    Maximal seek time                        8.33 ms
Tr_total      Total number of tracks                   290 K
n_SPT         Average sectors per track                2050
S_sec         Sector size                              512
S_seg         Total track number (size) per segment    (default 1)
RPM           Rotations per minute                     7200
t_f           Full revolution time                     8.33 ms
K             Internal queue buffer size               32

2.1. Workload property formulation

With the arrival rate λ and the read request ratio r_r, which is the number of read requests over the total number of requests, we obtain the arrival rates of read and write requests as λ1 = r_r λ and (1 − r_r)λ, respectively. Meanwhile, the request size S_req denotes the average number of data blocks a request accesses on the disk. Sequential access requests are two or more requests that access data in sequential order. For example, if the last accessed data block of request A is blk1, the first accessed data block of request B is blk2, and blk1 + 1 = blk2, then requests A and B are considered sequential access requests. The number of sequential access requests over the total number of requests is the sequential ratio of the workload, denoted r_s. In the internal queue, sequentially accessed requests can be combined into one request, which increases the expected request size to S_req/r_s and reduces the arrival rate to λ r_s.

2.2. Disk processing time

To analyze the overall performance of the SMR drive, we first formulate T1, T2 and T3 as (1) and (2). The detailed explanation of these formulations can be found in Niu, Xu, and Xie (2017). In the equations, $t_{pos}^{(n)}$ and $t_{tran}$ are analyzed in detail in Section 3, and their calculation is based on the parameters of the drive's mechanical components, which are shown in Table 3.

$$\begin{cases} T_1 = t_{pos}^{(n)} + t_{tran} \\ T_2 = t_{pos}^{(n)} + N_1 t_{tran} \\ T_3 = t_{gc}^{r} + t_{gc}^{w} \end{cases} \qquad (1)$$

In the equation, $t_{pos}^{(n)} = t_{seek}^{(n)} + t_{rot}^{(n)}$, and $t_{gc}^{r}$ and $t_{gc}^{w}$ are expressed as follows:

$$\begin{cases} t_{gc}^{r} = t_{pos}^{(n)} + N_2 t_{tran} \\ t_{gc}^{w} = t_{pos}^{(n)} + N_2 (1 - r_{dirty}) t_{tran}. \end{cases} \qquad (2)$$

2.3. Disk resource constraints

In the SMR drive, two constraints affect the drive performance: the internal queue length and the disk capacity usage ratio. In this section, we explain these two constraints in detail.

2.3.1. Internal queue length constraint

As the queue buffer is finite due to hardware limitations, the number of requests that can be buffered in the queue is also limited. Let us denote the capacities of the external and internal queues as K1 and K2, which are the maximum numbers of requests allowed to be buffered in the external and internal queues, respectively. In the current HDD design, the media controller can only handle one request at a time. If the disk is busy handling one request, the other incoming requests need to wait for the current one to be completed, and these requests are buffered in the queue while waiting. If the queue is full and a new request arrives, the new request is blocked. A blocked request may cause data loss or increase the response time, which degrades the user experience. To prevent request blocking from occurring too frequently, we define a blocking probability threshold P_b^threshold to match the user QoS requirement. If P_b < P_b^threshold, we consider that the incoming user workload requirement can be fulfilled by the SMR drive design; otherwise, the user needs to change the workload or the design of the SMR drive. Different users may require different QoS levels for the SMR drive design; for example, bank transactions may require P_b^threshold = 0, while video storage may require P_b^threshold = 0.5%. Here, we apply P_b^threshold = 0.1% as an illustration.

2.3.2. Disk capacity usage constraint

The disk needs to apply GC processes to clean the invalid data, especially when there are not enough idle periods for GC and the disk is nearly full, e.g., p_disk > 0.9, where p_disk is the size of the used space over the total capacity of the drive. Allocating disk bandwidth to the GC process significantly reduces the overall performance. To solve the inconsistent performance of the SMR drive when the disk capacity is nearly full, a new GC policy is proposed as in Fig. 3. In the figure, the module Idle time detection checks whether the disk is idle by comparing the time duration for which the internal queue length equals 0 against the idle time threshold, which is predefined by the drive to check disk idleness. The module Require foreground GC mainly checks the disk fullness status p_disk and the disk capacity cleaned by the background GC. If p_disk is large and the amount of data cleaned by the background GC is not enough, the foreground GC is required to help with the data cleaning; otherwise, the foreground GC is deactivated to avoid performance degradation for user requests. If the disk has many idle periods, the foreground GC is not activated and only background GC is performed, which does not reduce the drive performance significantly. When the disk is always busy, a certain amount of disk bandwidth is allocated to the foreground GC based on the disk dirty ratio and p_disk, which reduces the overall drive performance; however, this allows the disk to start cleaning the invalid data earlier, avoiding a significant performance loss when p_disk > 0.9. When the disk has only a few idle periods, both foreground GC and background GC are activated to clean the invalid data, which can further reduce the impact on the drive performance while maintaining the speed of invalid data cleaning. In the proposed policy, when p_disk is large, e.g., p_disk > 0.6, and there are not enough idle periods, we need to allocate a certain ratio of the disk processing bandwidth to the foreground GC process. The amount of disk processing bandwidth allocated to the foreground GC process increases as p_disk increases.

Fig. 3. The method to select different GC policies.

2.4. Performance evaluation

To evaluate the performance of the SMR drive, IOPS is one of the most important factors to be measured. In this paper, we use λ_max as computed in (3) under different kinds of workloads to study the SMR drive performance. In the equation, b_gc is the bandwidth allocated to the GC process, and f(p_disk) is the function that calculates the GC bandwidth allocation based on the disk status.

$$\lambda_{max} = \lambda \,\big|\, 0 < P_b \le P_b^{threshold} \cap b_{gc} = f(p_{disk}). \qquad (3)$$

3. Analysis

As shown in the previous section, the performance metric λ_max is determined by two factors: the request blocking ratio in the internal queue, P_b, and the disk fullness status p_disk. In this section, we calculate P_b theoretically and estimate the effects of p_disk on the drive performance. The method for estimating λ_max is also provided.

3.1. Disk processing time analysis

The disk processing time consists of 3 components: seek time, rotational latency and data transfer time. In an SMR drive applying the RPO scheduling policy, the seek time and rotational latency are related to the number of requests in the disk queue, while the data transfer time is only affected by the size of the data to be transferred. For simplicity, we assume that the expected data transfer time is only related to the average request size, which can be expressed as:

$$E[t_{tran}] = E[S_{req}] \times \frac{60}{RPM \times SPT} \qquad (4)$$

where SPT means sectors per track and RPM means rotations per minute.
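As a quick numerical check of the transfer-time formula (4), the sketch below (our own illustration, using the Table 3 defaults) computes the expected transfer time of the 32 KB default workload request:

```python
# Eq. (4): E[t_tran] = E[S_req] * 60 / (RPM * SPT), with E[S_req] expressed
# in sectors.  Values below are the Table 3 defaults.
RPM = 7200            # rotations per minute
SPT = 2050            # average sectors per track
SECTOR_BYTES = 512    # sector size in bytes

def expected_transfer_time(request_bytes):
    """Expected data transfer time in seconds for an average request size."""
    sectors = request_bytes / SECTOR_BYTES   # E[S_req] in sectors
    time_per_sector = 60.0 / (RPM * SPT)     # one revolution covers SPT sectors
    return sectors * time_per_sector

t = expected_transfer_time(32 * 1024)   # the 32 KB default request
# t is about 0.26 ms, small compared with the 8.33 ms full revolution time
```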

3.1.1. Seek time distribution

Seek time is the time taken by the disk head to move from the source track Tr_s to the destination track Tr_d. The seek distance S_d can be expressed as S_d = |Tr_s − Tr_d|. To calculate the seek time distribution, we need to know the relationship among the seek time, the seek distance, and the seek distance distribution.

Lemma 3.1. Assume all tracks have an equal number of sectors, equal to the number of sectors on the middle track of the actual drive. With known $t_{min}^{seek}$, $t_{max}^{seek}$ and $Tr_{total}$, the probability density function (PDF) of the seek time can be expressed as (5):

$$f_{seek}(t) = \begin{cases} 2/u, & 0 \le t < m + n \\ \dfrac{4(t-m)}{n^2 u} - \dfrac{4(t-m)^3}{n^4 u^2}, & \text{otherwise} \end{cases} \qquad (5)$$

where $u = Tr_{total} - 1$, $m = \dfrac{t_{min}^{seek}\sqrt{u} - t_{max}^{seek}}{\sqrt{u} - 1}$ and $n = \dfrac{t_{max}^{seek} - t_{min}^{seek}}{\sqrt{u} - 1}$.

Proof. According to Chen and Towsley (1993), the seek time is related to the seek distance, and the relationship is expressed as (6):

$$t_{seek}(S_d) = \begin{cases} 0, & S_d = 0 \\ \dfrac{t_{min}^{seek}\sqrt{u} - t_{max}^{seek}}{\sqrt{u} - 1} + \dfrac{t_{max}^{seek} - t_{min}^{seek}}{\sqrt{u} - 1}\sqrt{S_d}, & \text{otherwise.} \end{cases} \qquad (6)$$
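The seek-time curve of (6) can be sanity-checked numerically. The sketch below is our own illustration with the Table 3 defaults; it verifies that the model returns $t_{min}^{seek}$ at distance 1 and $t_{max}^{seek}$ at the maximal distance u:

```python
import math

# Seek-time model of Eq. (6) (Chen & Towsley, 1993): t_seek = m + n*sqrt(S_d)
# for S_d > 0, with m and n chosen so that t_seek(1) = t_min and
# t_seek(u) = t_max.  Values are the Table 3 defaults.
T_MIN = 0.01e-3       # minimal seek time, seconds
T_MAX = 8.33e-3       # maximal seek time, seconds
TR_TOTAL = 290_000    # total number of tracks
u = TR_TOTAL - 1      # maximal seek distance in tracks

m = (T_MIN * math.sqrt(u) - T_MAX) / (math.sqrt(u) - 1)
n = (T_MAX - T_MIN) / (math.sqrt(u) - 1)

def t_seek(distance):
    """Seek time in seconds as a function of seek distance in tracks."""
    return 0.0 if distance == 0 else m + n * math.sqrt(distance)
```

The two boundary conditions (a one-track seek costs the minimal seek time, a full-stroke seek costs the maximal one) pin down m and n uniquely, which is where the closed forms for m and n in Lemma 3.1 come from.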


As we assume all tracks have the same number of sectors, every track has the same probability of being accessed by a random request, and the PDF of the track Tr being accessed can be expressed as:

$$f_{Tr}(x) = \frac{1}{u} \quad \text{where } u = Tr_{total} - 1. \qquad (7)$$

The PDF of the seek distance is calculated by combining all possible combinations of two distinct tracks, denoted Tr_1 and Tr_2. During the computation, two cases need to be taken care of: (1) Tr_1 > Tr_2, and (2) Tr_1 ≤ Tr_2. The distribution function is expressed as:

$$f_{dis}(x) = \int_{0}^{u-x} f_{Tr}(y) f_{Tr}(y+x)\,dy + \int_{x}^{u} f_{Tr}(y) f_{Tr}(y-x)\,dy. \qquad (8)$$

After some mathematical simplification, we obtain the expression of f_dis(x) as (9), and the CDF of the seek distance F_dis(x) can be expressed based on f_dis(x):

$$f_{dis}(x) = \frac{2(u-x)}{u^2}. \qquad (9)$$

Then the CDF and PDF of the seek time can be expressed as (10) and (11), respectively, in terms of F_dis(x) and f_dis(x). Finally, by substituting (9) into (11), we obtain (5).

$$F_{seek}(t) = \begin{cases} F_{dis}(0), & 0 \le t < m + n \\ F_{dis}\!\left(\left(\dfrac{t-m}{n}\right)^{2}\right), & \text{otherwise} \end{cases} \qquad (10)$$

$$f_{seek}(t) = \begin{cases} f_{dis}(0), & 0 \le t < m + n \\ f_{dis}\!\left(\left(\dfrac{t-m}{n}\right)^{2}\right) \dfrac{2(t-m)}{n^{2}}, & \text{otherwise.} \end{cases} \qquad (11) \quad \square$$

A similar model can be applied to the zoned disk model, which has a larger number of sectors on the outer tracks and a smaller number of sectors on the inner tracks. The distribution of the track being accessed can be formulated as a linearly increasing model, with increasing rate $(sec[Tr_{total}-1] - sec[0])/(Tr_{total}-1)$. The seek distance distribution and seek time distribution can then be computed accordingly.

3.1.2. Rotational latency distribution

As the disk rotational speed is constant with a pre-specified RPM, we can express the rotational latency distribution f_rot(x) as the uniform distribution shown in (12), and the CDF F_rot(x) can easily be calculated by integrating f_rot(x):

$$f_{rot}(x) = 1/R_{max}, \qquad 0 \le x \le R_{max}. \qquad (12)$$

Lemma 3.2. Given the PDFs of the seek time and rotational latency as (5) and (12), the PDFs of the disk positioning time and service time can be expressed as (13) and (14):

$$f_{pos}(x) = \begin{cases} \dfrac{1}{R_{max}} \displaystyle\int_{0}^{x} f_{seek}(y)\,dy, & 0 \le x < R_{max} \\[2ex] \dfrac{1}{R_{max}} \displaystyle\int_{x-R_{max}}^{x} f_{seek}(y)\,dy, & R_{max} \le x < t_{max}^{seek} \\[2ex] \dfrac{1}{R_{max}} \displaystyle\int_{x-R_{max}}^{t_{max}^{seek}} f_{seek}(y)\,dy, & x \ge t_{max}^{seek} \end{cases} \qquad (13)$$

$$f_{service}(x) = \begin{cases} 0, & 0 \le x < t_{tran} \\ f_{pos}(x - t_{tran}), & \text{otherwise.} \end{cases} \qquad (14)$$

Proof. With known f_seek(x) and f_rot(x), we can get the disk positioning time by the convolution of the disk seek time and rotational latency, which is expressed as (15):

$$f_{pos}(x) = f_{seek+rot}(x) = \int_{0}^{R_{max}} f_{seek}(x-z) f_{rot}(z)\,dz = \frac{1}{R_{max}} \int_{x-R_{max}}^{x} f_{seek}(y)\,dy \quad \text{where } y = x - z. \qquad (15)$$

As the disk seek time lies in the range $[0, t_{max}^{seek}]$, the y in (15) must lie in the seek time range. Thus (15) needs to be calculated piecewise over the ranges $[0, R_{max})$, $[R_{max}, t_{max}^{seek})$, and $[t_{max}^{seek}, R_{max} + t_{max}^{seek}]$. After some algebraic calculation, we obtain the disk positioning time distribution shown in (13). Meanwhile, adding the disk data transfer time into the equation gives us the service time distribution (14). □

By using the results from Lemma 3.2, we can get the CDFs of the positioning time, F_pos(x), and the disk service time, F_service(x), by integrating f_pos(x) and f_service(x).

3.1.3. Disk processing time under different queue lengths

The seek time and rotational latency distribution functions F_seek(x), F_rot(x), f_seek(x) and f_rot(x) were given in the previous section. We can now apply the PDFs of the seek time and rotational latency to compute the PDF of the disk positioning time f_pos(x), which is the convolution of the seek time and rotational latency.

Lemma 3.3. If the disk applies the RPO scheduling algorithm, and the PDF and CDF of the disk positioning time are known as f_pos(x) and F_pos(x), then the PDF of the minimal disk positioning time under different queue length conditions is given by (16):

$$f_{pos}^{(1)}(x) = n(1 - F_{pos}(x))^{n-1} f_{pos}(x). \qquad (16)$$

In the equation, n is the number of requests in the queue, and $f_{pos}^{(1)}(x)$ is the PDF of the smallest disk positioning time of all the requests in the disk queue.

Proof. In a disk with the RPO scheduling algorithm, the disk service time is the minimal disk positioning time over all the queued requests plus the disk transfer time, which depends on the number of sectors to be transferred. Thus the disk service time can be expressed generally as:

$$t_{service}^{n} = \min_{r=1,\dots,n}(t_{pos}^{r}) + t_{tran}. \qquad (17)$$

In the equation, $t_{pos}^{r}$ is the disk positioning time, which includes the seek time and rotational latency, and $t_{tran}$ is the disk transfer time under different request sizes. To calculate the distribution of $t_{service}^{n}$, we need to know the distribution of the minimal positioning time. Here, order statistics (David & Nagaraja, 2004) is employed to calculate the distribution of the minimal positioning time by using the first-order distribution. Assume there are n requests in the disk queue, denoted $R_1, R_2, \dots, R_n$, which are independent and identically distributed. Then the CDF of the minimal positioning time can be expressed as:

$$F_{pos}^{(1)}(x) = 1 - (1 - F_{pos}(x))^{n}. \qquad (18)$$

From (18), we can get the PDF of the minimal disk positioning time by taking the derivative of the CDF $F_{pos}^{(1)}(x)$, which gives (16). □

Similar to the result of Lemma 3.3, the PDF of the minimal disk service time can be expressed as:

$$f_{service}^{(1)}(x) = n(1 - F_{service}(x))^{n-1} f_{service}(x). \qquad (19)$$

Based on (16) and (19), we can compute $f_{pos}^{(1)}(x)$ and $f_{service}^{(1)}(x)$ numerically.

3.2. $P_b$ calculation and M/G/1/K queue analysis

In an SMR drive with the RPO scheduling algorithm, the disk service time is dependent on the number of requests in the queue, and the queue
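The order-statistics step in (18) can be checked with a small Monte Carlo experiment. The sketch below is our own illustration: it replaces the real positioning-time PDF with a uniform stand-in on [0, 1], since only the minimum-of-n structure is being tested, not the drive-specific distribution:

```python
import random

# Eq. (18): the CDF of the minimum of n i.i.d. positioning times is
# F1(x) = 1 - (1 - F(x))^n.  With a Uniform(0, 1) stand-in, F(x) = x.
random.seed(7)
n = 8              # number of requests in the disk queue
trials = 200_000
x = 0.2            # point at which the CDF is evaluated

hits = sum(min(random.random() for _ in range(n)) <= x
           for _ in range(trials))
empirical = hits / trials
analytic = 1 - (1 - x) ** n   # = 1 - 0.8**8, about 0.832

# empirical and analytic agree to within Monte Carlo noise, illustrating
# why a longer queue sharply shrinks the positioning time under RPO.
```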



length distribution is related to the disk processing time and the request arrival rate. It is not possible to use the traditional M/G/1/K queue model to find the solution, as the service time G is state dependent. In this section, we apply a theoretical approach to solve the problem, and find the queue length distribution and the blocking probability given λ and $f_{service}^{(1)}(x)$.

Let $L_n$ denote the queue length at $t_n$; considering the queue length limitation, we have the following condition:

$$0 \le L_n \le K - 1. \qquad (21)$$

Let $\alpha_n$ denote the number of requests that arrive during $[t_{n-1}, t_n)$. Assume $L_n$ is known at the time instant $t_n$; then we can obtain the distribution of $\alpha_{n+1}$ through:

$$Pr(\alpha_{n+1} = k \mid L_n) = \int_{0}^{\infty} f_{T|L_n}(t) \frac{(\lambda t)^k}{k!} e^{-\lambda t}\,dt \qquad (22)$$

where $f_{T|L_n}(t)$ is the PDF of the disk service time under queue length $L_n$. With $\alpha_{n+1}$, we have the following queue length transition equation:

$$L_{n+1} = \min\{L_n - 1 + \alpha_{n+1}, K - 1\}. \qquad (23)$$

Based on (23), we can observe that the sequence $L_n$, $n = 1, 2, \dots$, is a Markov chain. Let $P = (P_{L,L'})$ be the transition probability matrix, in which $P_{L,L'}$ is the transition probability from queue length $L$ to queue length $L'$. Based on (22), we can get $P_{L,L'}$ through the following equation:

$$P = \begin{bmatrix} \alpha_{0,1} & \alpha_{1,1} & \alpha_{2,1} & \dots & \alpha_{K-2,1} & 1 - \sum_{n=0}^{K-2} \alpha_{n,1} \\ \alpha_{0,1} & \alpha_{1,1} & \alpha_{2,1} & \dots & \alpha_{K-2,1} & 1 - \sum_{n=0}^{K-2} \alpha_{n,1} \\ 0 & \alpha_{0,2} & \alpha_{1,2} & \dots & \alpha_{K-3,2} & 1 - \sum_{n=0}^{K-3} \alpha_{n,2} \\ 0 & 0 & \alpha_{0,3} & \dots & \alpha_{K-4,3} & 1 - \sum_{n=0}^{K-4} \alpha_{n,3} \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \dots & \alpha_{0,K-1} & 1 - \alpha_{0,K-1} \end{bmatrix}.$$

To know whether there is an equilibrium queue length distribution, so that the chain converges to the steady state, we need to check whether the transition matrix P is irreducible and aperiodic. By applying Lemma A.1, shown in the appendix, we can prove that P is irreducible and aperiodic, so an equilibrium distribution exists. Let us denote the steady-state queue length distribution at departure instants as $\pi^{(d)}$, and the steady-state probability of queue length L as $\pi_L^{(d)}$. Then we have (20). By solving these linear equations, we obtain the steady-state queue length distribution at departure instants, $\pi^{(d)}$. □

Fig. 4. The disk positioning times and number of requests in the disk queue under different IOPS. (a) Positioning time. (b) Queue length.

With the steady-state distribution $\pi^{(d)}$ at departure instants obtained from (20), we can calculate the average service time E[T] and the average queue length E[Q] through (24) and (25).

E[T] =

๐พโˆ’1 โˆ‘

(๐ธ[๐‘‡ |๐ฟ] ร— ๐œ‹๐ฟ(๐‘‘) )

๐ฟ=0 ๐พโˆ’1 โˆ‘

๐ธ[๐‘„](๐‘‘) =

(๐ฟ ร— ๐œ‹๐ฟ(๐‘‘) ).

(24) (25)

๐ฟ=0

From Lemma A.2, we can get that the probability of that the queue length is ๐พ, which means the disk queue buffer is full. When the disk queue is in this state, the new coming request will be blocked. Thus ๐œ‹ (๐‘Ž) (๐พ) is the disk blocking probability. Other than that, the average queue lengths and average disk processing times under different ๐œ† can also be calculated through (24) and (25). The comparisons of the theoretical and simulation results of the expected queue length and the disk service time are shown in Fig. 4. From the figures, we can observe that the differences between the analytical results and simulation results are less than 10%. These small differences do not affect the drive performance analysis, especially the performance trends under various kinds of parameter settings and workload environments.

(1) Corollary 3.1. If the PDF of disk service time is given as ๐‘“๐‘ ๐‘’๐‘Ÿ๐‘ฃ๐‘–๐‘๐‘’ (๐‘ฅ) in (19), the disk arrival rate is ๐œ†, the disk queue buffer size limitation is fixed as ๐พ, and assume that the processing of requests in the queue follows a Markov Chain model, which has the transition matrix ๐‘ƒ , then the queue length distribution at departure instance ๐œ‹ (๐‘‘) exists and can be obtained by solving (20).

โŽง๐œ‹ (๐‘‘) = ๐œ‹ (๐‘‘) ๐‘ƒ โŽช๐พโˆ’1 โŽจ โˆ‘ (๐‘‘) ๐œ‹๐ฟ = 1. โŽช โŽฉ ๐ฟ=0

๐›ผ1,1

(20)

Based on Algorithm 1, we can get ๐‘ƒ๐‘ under different settings and request arrival rates. Then we can use binary-search algorithm to find ๐œ†๐‘š๐‘Ž๐‘ฅ .

Proof. Let ๐‘ก๐‘› denote the ๐‘›๐‘กโ„Ž departure instant, which means a request is just leaving the queue buffer and serviced by the disk, and ๐ฟ๐‘› denote 91
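The embedded chain above can be sketched numerically. Below is a minimal sketch under the assumption of exponential state-dependent service times (a stand-in for the real RPO service-time PDF $f_{T|L_n}$, chosen because it makes the arrival-count integral (22) closed form); the rates in `mus` are illustrative, not values from the paper:

```python
import numpy as np

def alpha_probs(lam, mu, K):
    # Poisson(lam) arrivals during an Exp(mu) service time are geometric:
    # Pr(alpha = k) = (mu / (lam + mu)) * (lam / (lam + mu))**k
    p = lam / (lam + mu)
    return (1 - p) * p ** np.arange(K)

def transition_matrix(lam, mus, K):
    """Embedded Markov chain over queue lengths 0..K-1 at departure
    instants; mus[m] is the state-dependent service rate with m queued."""
    P = np.zeros((K, K))
    for L in range(K):
        m = max(L, 1)  # rows for states 0 and 1 are identical
        a = alpha_probs(lam, mus[m], K)
        for Lp in range(m - 1, K - 1):
            P[L, Lp] = a[Lp - m + 1]
        P[L, K - 1] = 1.0 - P[L, : K - 1].sum()  # truncation: buffer full
    return P

def stationary(P, iters=10000):
    # Power iteration on pi = pi P (the chain is irreducible, aperiodic).
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        pi = pi @ P
    return pi / pi.sum()

K = 8
lam = 80.0
mus = {m: 100.0 + 5.0 * m for m in range(1, K)}  # RPO: faster when queue is longer
P = transition_matrix(lam, mus, K)
pi_d = stationary(P)
print(pi_d.round(4), pi_d.sum())
```

With the real $f_{T|L_n}$, the geometric formula in `alpha_probs` would be replaced by numerical evaluation of the integral in (22).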


Algorithm 1 Blocking probability calculation.
Input: $\lambda_0$, $K$, $f_{T_0|m}(t)$
Output: $P_b$
1: Initialization:
2:   $P = (P_{L,L'}) = 0$
3: Compute $Pr(\alpha \mid m)$:
4: for $m = 1$ to $K-1$ do
5:   for $i = 0$ to $K-1$ do
6:     $Pr(\alpha = i \mid m) = \int_0^{\infty} f_{T_0|m}(t)\, \frac{(\lambda_0 t)^i}{i!}\, e^{-\lambda_0 t}\, dt$
7:   end for
8: end for
9: Compute $P = (P_{L,L'})$:
10: for $L_1 = 1$ to $K-1$ do
11:   for $L_1' = L_1 - 1$ to $K-2$ do
12:     $P_{L_1, L_1'} = Pr(\alpha = L_1' - L_1 + 1 \mid L_1)$
13:   end for
14:   $P_{L_1, K-1} = 1 - \sum_{n=0}^{K-L_1-1} Pr(\alpha = n \mid L_1)$
15: end for
16: Compute $P_b$:
17: Solve $\pi = \pi P$
18: for $i = 1$ to $K-1$ do
19:   $\pi^{(d)}(i) = \pi_i$
20: end for
21: $E[T] = \sum_{m=0}^{K-1} E[T \mid m]\, \pi^{(d)}(m)$
22: $\pi_F^{(a)}(1) = \lambda_0 / (\lambda_0 + 1/E[T])$
23: $F_F(K-2) = \sum_{j=0}^{K-2} \pi^{(d)}(j)$
24: $\pi^{(a)}(K) = (1 - F_F(K-2)) / (1/\pi_F^{(a)}(1) - F_F(K-2))$
25: for $i = 1$ to $K-1$ do
26:   $\pi^{(a)}(i) = \pi^{(d)}(i)(1 - \pi^{(a)}(K))$
27: end for
28: $P_b = \pi^{(a)}(K)$
29: return $P_b$

3.3. Performance modeling under different user batch write sizes

Since we mainly analyze the status of the internal queue to check $P_b$, we need to gather the properties of the requests entering or buffered in the internal queue. These requests include read requests, batch write requests, and GC read and GC write requests. In this part, we only consider the SMR drive performance without the GC process, so the main incoming requests are read requests and batch write requests. Since the write requests in the external queue are batched together as one request to the internal queue, the IOPS and the request size for the internal queue are different from those of the external queue.

Lemma 3.4. If the request arrival rate to the external queue follows a Poisson distribution, the arrival rate to the internal queue will also follow a Poisson distribution.

Proof. From Kleinrock (1976), we know that the sum of two Poisson processes with arrival rates $\lambda_a$ and $\lambda_b$ is still a Poisson process with arrival rate $\lambda_a + \lambda_b$. Meanwhile, splitting a Poisson process with arrival rate $\lambda$ generates two or more Poisson processes with arrival rates $\alpha\lambda$ and $\beta\lambda$. If the workload has arrival rate $\lambda$ and read ratio $r_r$, the read and write requests follow Poisson distributions with arrival rates $\lambda_1 = r_r \lambda$ and $(1 - r_r)\lambda$, respectively. The batch write requests to the internal queue then follow a Poisson distribution with arrival rate $\lambda_2 = (1 - r_r)\lambda / b_1$. The combination of read and batch write requests to the disk internal queue also follows a Poisson distribution with arrival rate $\lambda_1 + \lambda_2 = \lambda(r_r + (1 - r_r)/b_1)$. □

Since there is no GC process, we can assume $\lambda_3 = 0$; then the arrival rate at the internal queue $\lambda_{act}$ can be expressed as:

$$\lambda_{act} = \lambda_1 + \lambda_2 + \lambda_3 = \lambda r_r + \lambda(1 - r_r)/b_1. \tag{26}$$

With the request arrival rate at the internal disk queue, the expected size of requests to the internal buffer queue $S_{act}$ can be expressed as:

$$S_{act} = S_{req} / (r_r + (1 - r_r)/b_1). \tag{27}$$

Through binary search, the maximum IOPS $\lambda_{act}$ at the internal queue can be found, and we can use this result to compute $\lambda_{max}$ with (28):

$$\lambda_{max} = \lambda_{act} / (r_r + (1 - r_r)/b_1). \tag{28}$$

3.4. Performance modeling under different GC batch write sizes

Since the impact of background GC can be reduced to nearly zero by canceling the background GC process while a user request is waiting to be handled by the disk, we focus on the performance with foreground GC to analyze $\lambda_{max}$ under different GC batch write sizes $b_2$. Let us assume that the foreground GC process arrivals also follow a Poisson distribution; then the combination of GC processes and user requests also follows a Poisson distribution, which can be proved similarly to Lemma 3.4.

Theorem 3.1. The average request size to the disk internal queue $S_{act}$ can be expressed as (29) when the disk is in foreground GC mode:

$$S_{act} = \frac{S_{req}\left(1 + b_2 \frac{2 - r_{dirty}}{2}\left(r_r + \frac{1 - r_r}{b_1}\right)\frac{p_{gc}}{1 - p_{gc}}\right)}{\left(r_r + \frac{1 - r_r}{b_1}\right) + \left(r_r + \frac{1 - r_r}{b_1}\right)\frac{p_{gc}}{1 - p_{gc}}}. \tag{29}$$

Proof. Assume $\lambda$ is the arrival rate of user requests; then the user request arrival rate to the internal queue is:

$$\lambda_{internal} = \lambda_1 + \lambda_2 = \lambda\left(r_r + \frac{1 - r_r}{b_1}\right). \tag{30}$$

The GC process arrival rate is:

$$\lambda_3 = \lambda\left(r_r + \frac{1 - r_r}{b_1}\right)\frac{p_{gc}}{1 - p_{gc}}. \tag{31}$$

The total size of data the disk processes during the time period is:

$$S_{total} = S_{req}\left(\lambda + \lambda_{gc}\, \frac{b_2 + b_2(1 - r_{dirty})}{2}\right). \tag{32}$$

Then the average request size to the internal queue is $S_{act} = \frac{S_{total}}{\lambda_{internal} + \lambda_{gc}}$. With some algebraic calculations, we can get the expression in (29). □

The arrival rate to the internal queue $\lambda_{act}$ can be expressed as:

$$\lambda_{act} = \lambda_1 + \lambda_2 + \lambda_3 = \frac{\lambda r_r + \lambda(1 - r_r)/b_1}{1 - p_{gc}}. \tag{33}$$

Through binary search, we can get the maximum IOPS at the internal queue $\lambda_{act}$. Then we can compute $\lambda_{max}$ from (33):

$$\lambda_{max} = \frac{\lambda_{act}(1 - p_{gc})}{r_r + (1 - r_r)/b_1}. \tag{34}$$
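The binary search for $\lambda_{max}$ described above can be sketched as follows. The toy blocking-probability function `pb` is a made-up monotone stand-in for Algorithm 1, and the $r_r$, $b_1$, and QoS target values are illustrative assumptions, not values from the paper:

```python
def find_lambda_max(blocking_prob, pb_target, lam_hi=10000.0, tol=1e-3):
    """Binary-search the largest arrival rate whose blocking probability
    stays at or below pb_target; blocking_prob(lam) is assumed to be
    monotonically increasing in lam."""
    lo, hi = 0.0, lam_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if blocking_prob(mid) <= pb_target:
            lo = mid
        else:
            hi = mid
    return lo

# Toy stand-in for Algorithm 1's blocking probability, for illustration:
pb = lambda lam: lam / (lam + 500.0)

lam_act = find_lambda_max(pb, 0.01)       # internal-queue IOPS limit
rr, b1 = 0.6, 8                            # assumed workload / drive settings
lam_max = lam_act / (rr + (1 - rr) / b1)   # user-visible limit, Eq. (28)
print(lam_act, lam_max)
```

In the actual tool, `blocking_prob` would invoke Algorithm 1 with the state-dependent service-time distribution for each candidate arrival rate.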

3.5. Performance modeling under different disk fullness statuses

Under different $s_{disk}$, the policy selection for the GC process differs. If $s_{disk}$ is less than a pre-defined threshold, such as 0.5, the disk does not need to run the foreground GC process, as the disk capacity is still sufficient for future write requests, so the performance evaluation can be conducted through Eqs. (26)-(28). When $s_{disk}$ is larger than the pre-defined threshold, the foreground GC process is required to clean the invalid data. The disk processing bandwidth $p_{gc}$ is calculated based on $s_{disk}$, and it increases as $s_{disk}$ increases. The performance of the SMR drive under this scenario is estimated through Eqs. (29), (33) and (34).

Table 4. Differences between the existing simulators and ours.

SMR emulator (Pitchumani et al., 2012): write process: Read-Modify-Write; GC process: none; disk status consideration: no.
SMR simulator (Tan et al., 2013): write process: log-structured write; GC process: none; disk status consideration: no.
Our simulator: write process: user batch write; GC process: newly proposed GC process; disk status consideration: adaptively allocates disk processing bandwidth to the GC process based on disk status.
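The threshold-based GC policy selection described above can be sketched as follows. The linear bandwidth ramp and the `p_gc_max` cap are our own assumptions for illustration; the paper only states that $p_{gc}$ grows with $s_{disk}$ above the threshold:

```python
def gc_mode(s_disk, threshold=0.5):
    """Select the GC mode from the disk fullness status (Sec. 3.5 policy)."""
    return "foreground" if s_disk > threshold else "background"

def gc_bandwidth(s_disk, threshold=0.5, p_gc_max=0.4):
    """Fraction of disk processing bandwidth granted to foreground GC.
    Grows with fullness above the threshold; the linear ramp and the
    0.4 cap are illustrative assumptions."""
    if s_disk <= threshold:
        return 0.0
    return p_gc_max * (s_disk - threshold) / (1.0 - threshold)

print(gc_mode(0.3), gc_bandwidth(0.3))  # background GC, no reserved bandwidth
print(gc_mode(0.8), gc_bandwidth(0.8))  # foreground GC with a ramped-up share
```

The returned $p_{gc}$ would then be plugged into Eqs. (29), (33) and (34) to estimate the drive performance at that fullness level.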

4. Results and discussion

In this section, we present the simulator developed to verify the correctness of the SMR drive model. The performance of the SMR drive is then evaluated through simulation and analytical modeling, and the evaluation results under different drive settings and workload environments are compared and discussed.

4.1. Simulation setup

To validate the correctness of the proposed analytical model, we design and develop an SMR drive simulator to simulate the performance of the SMR drive under different drive settings and workload environments. The differences between our simulator and the conventional SMR simulators are listed in Table 4. The overall structure of the simulator is shown in Fig. 5. There are mainly five components in the SMR drive simulator. The Workload generator generates user workloads based on different workload properties. The SMR handler handles the user requests and the GC process; the external buffer queue is located at this level. A user read request is directly passed to the next module for disk processing, while a user write request is buffered in the external queue; when the number of write requests in the external queue reaches the write batch size $b_1$, all the write requests are combined and passed to the next module. The GC process is also initiated by this module: when the disk conditions, such as disk capacity usage and idleness, meet the GC process requirements, GC read and GC write operations are issued to the disk. The Scheduler buffers all the requests, including user requests and GC processes. The RPO scheduling algorithm is implemented in this module to decide which specific request is handled by the next disk operation. In the queue, priority is assigned to different kinds of requests based on the drive design; for example, the background GC process is given lower priority than user requests, while the foreground GC process is given the same priority as user requests. The Disk module simulates the mechanical properties of the SMR device.

4.2. Performance analysis under different user write batch sizes

To study the SMR drive performance under different write batch size settings for different kinds of workloads, we fix the other drive parameters as shown in Table 1. The simulation results and the analytical results are shown in Fig. 6. As shown in Fig. 6(a), the $\lambda_{max}$ performance degrades when $r_r$ increases. This is because the number of write requests decreases as $r_r$ increases, so the benefit of the batch write in combining several random write requests into one sequential write request is reduced; the reduction of the request arrival rate to the internal queue is smaller, and thus the overall performance degrades with increasing $r_r$. We also notice that, for a fixed $r_r$, $\lambda_{max}$ is larger when $b_1$ is larger. This is a consequence of the fact that the IOPS to the internal queue can be further reduced by increasing $b_1$. Fig. 6(a) also shows that increasing $b_1$ can significantly improve $\lambda_{max}$ when $r_r$ is small. However, the performance improvement is less significant when $r_r$ is large, as the performance of read requests is not affected by the batch process. Although increasing $b_1$ can improve the overall performance when $r_r$ is small, we cannot increase $b_1$ without limit: a larger $b_1$ means that more memory buffer space must be allocated to store the write requests, leaving less buffer space to cache read data. Fig. 6(c) exhibits the results for the SMR drive performance under different request sizes. It is obvious that $\lambda_{max}$ decreases with increasing request size $S_{req}$, as more data transfer time is required to handle requests with larger $S_{req}$. With similar disk positioning times, the overall disk service time is larger for a request with larger $S_{req}$, which reduces the overall disk service rate and $\lambda_{max}$. Although $\lambda_{max}$ is smaller when $S_{req}$ is larger, the overall throughput is larger compared with a workload with smaller $S_{req}$, as can be seen in Fig. 6(d). The performance $\lambda_{max}$ under different sequential ratios $r_s$ is also studied, and the results are presented in Fig. 6(b). Under different $b_1$, $\lambda_{max}$ always increases with increasing $r_s$, as the disk saves a significant amount of positioning time when handling sequential requests, which increases the disk service rate. We can also notice that the $\lambda_{max}$ improvement with increasing $b_1$ becomes less significant when $r_s$ increases. The main reason for this is that sequential requests have similar performance whether or not the batch process is applied, while random requests benefit much more from batching.

4.3. Performance analysis under different GC batch write sizes

Similarly, the performance of the SMR drive under different GC batch write sizes is studied by fixing the other parameters as shown in Table 1. The simulation results and the analytical results are shown in Fig. 7. Fig. 7(a) shows a performance changing pattern similar to Fig. 6(a). The only difference between them is the value of $\lambda_{max}$, which is mainly caused by the foreground GC process handled by the SMR drive, requiring a certain amount of disk processing bandwidth (in this experiment, 20% of the disk processing bandwidth is used for the GC process). The performance $\lambda_{max}$ under different GC batch write sizes $b_2$ is also illustrated. It is easy to notice that $\lambda_{max}$ decreases with increasing $b_2$, as the amount of data to be cleaned by the GC process increases with larger $b_2$, which requires more data transfer time and thus increases the overall disk processing time. However, the data cleaning speed also increases, which is the benefit of a larger $b_2$; this is especially useful with larger $s_{disk}$. In Fig. 7(c), we notice that $\lambda_{max}$ degrades when the request size $S_{req}$ increases; a similar performance changing pattern can be found in Fig. 6(c). This is due to the fact that requests with larger $S_{req}$ always require longer data transfer times for different values of $b_1$ and $b_2$, which increases the disk service time and decreases the disk processing rate. As shown in Fig. 7(b), $\lambda_{max}$ increases with increasing $r_s$. The reason is similar to the one given in Section 4.2: requests with larger $r_s$ save disk processing bandwidth and increase the disk service rate, which increases $\lambda_{max}$. Meanwhile, we also notice that the $\lambda_{max}$ performance degrades when $b_2$ increases. This is similar to Fig. 7(a), as more time is needed to handle the GC process with larger $b_2$, which reduces the disk service rate for user requests and decreases the $\lambda_{max}$ performance.
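The SMR handler's write-batching behavior described in the simulation setup can be sketched as a simplified toy (class and field names are ours; timing, GC, and the scheduler's RPO reordering are omitted):

```python
class SMRHandler:
    """Sketch of the simulator's SMR handler: reads pass straight
    through, writes are buffered and released as one batch of size b1."""

    def __init__(self, b1):
        self.b1 = b1
        self.write_buf = []        # external queue for pending writes
        self.internal_queue = []   # requests forwarded to the scheduler

    def submit(self, req):
        # req is a ("read" | "write", size) pair
        kind, size = req
        if kind == "read":
            self.internal_queue.append(req)
        else:
            self.write_buf.append(req)
            if len(self.write_buf) == self.b1:
                # b1 writes become one sequential batch write request
                total = sum(s for _, s in self.write_buf)
                self.internal_queue.append(("batch_write", total))
                self.write_buf = []

h = SMRHandler(b1=4)
for _ in range(6):
    h.submit(("write", 4))
h.submit(("read", 4))
print(h.internal_queue)  # one combined batch write, then the read
```

This mirrors why the internal-queue IOPS drops by the factor $r_r + (1 - r_r)/b_1$ used in Eqs. (26)-(28): only every $b_1$th write reaches the internal queue.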


Fig. 5. The overall structure of the SMR drive simulator.

Fig. 6. SMR drive performance under different workload properties and write batch sizes. (a) Read ratio. (b) Sequential ratio. (c) Request size. (d) Request size throughput.

4.4. Performance analysis under different disk fullness statuses

A similar study is conducted on the performance of the SMR drive under different disk statuses, with the other drive parameters fixed as shown in Table 1. The simulation results and the analytical results are shown in Fig. 8. Fig. 8(a) shows a performance changing pattern similar to Fig. 6(a) with different $r_r$. The differences between them are the actual values of $\lambda_{max}$ under different settings, and the performance trends under different $b_1$ and $s_{disk}$. In Fig. 8(a), the actual values of $\lambda_{max}$ are much smaller than those in Fig. 6(a) with the same $r_r$ settings, as a certain amount of disk processing bandwidth is allocated to the GC process in Fig. 8(a), which lowers the service rate for user requests and thus $\lambda_{max}$. With increasing $s_{disk}$, more disk processing bandwidth is allocated to the GC process, and $\lambda_{max}$ further decreases. As a result, the overall trend is that $\lambda_{max}$ decreases with increasing $s_{disk}$. Similar to Fig. 6(c), Figs. 7(c) and 8(c) also show that $\lambda_{max}$ decreases and throughput increases with larger $S_{req}$. The $\lambda_{max}$ performance degradation with increased $s_{disk}$ can be found in Fig. 8(c), which is caused by the same reason as in Fig. 8(a). The performance increase with increasing $r_s$ under different $s_{disk}$ is illustrated in Fig. 8(b); the reason is similar to the one explained in Section 4.2. Based on the results shown in Figs. 8(a)-8(c), we can conclude that $s_{disk}$ only affects the value of $\lambda_{max}$, but has minimal impact on the performance trends under different workload properties, such as the read ratio $r_r$, request size $S_{req}$ and sequential ratio $r_s$. Meanwhile, $\lambda_{max}$ always decreases when $s_{disk}$ increases. This can be explained by the fact that a larger share of the disk processing bandwidth is allocated to the GC process when $s_{disk}$ increases, which reduces the disk service rate for user requests. However, since we start the foreground GC process earlier, e.g., at $s_{disk} = 0.5$, more dirty space can be cleaned through these foreground GC processes before the disk status reaches the capacity limit, e.g., $s_{disk} > 0.9$. Although the $\lambda_{max}$ performance is reduced slightly, depending on the bandwidth allocation policy, the speed of disk capacity consumption is reduced and the dirty ratio of the whole disk decreases, which means the disk usage ratio increases. More time is allowed for the disk to be utilized before $s_{disk} = 0.9$ is reached. As the overall dirty ratio is reduced significantly, even when $s_{disk} > 0.9$ is reached, the priority of foreground GC is not as high as in a normal SMR drive design, which may have a large dirty ratio. Thus the performance of the SMR drive is improved when $s_{disk} > 0.9$, and $s_{disk}$ approaches full more slowly compared with a drive that does not apply the foreground GC policy. In all, we can notice that the performance differences between the simulation results and the analytical results are less than 15% for all the disk parameters and workload properties; in some cases, the performance error is even less than 10%. The performance trends under different workload parameters and drive settings are also similar. This allows us to use the analytical model to provide reasonable performance estimation under different drive designs and workload environments, and to serve as a guide for future SMR drive design.

Fig. 7. SMR drive performance under different workload properties and GC batch write sizes. (a) Read ratio. (b) Sequential ratio. (c) Request size. (d) Request size throughput.

5. Conclusion

This paper provides an analytical model and a simulation tool to study SMR drive performance under different kinds of workload environments, such as read ratio, request size, and sequential ratio, and SMR drive setting parameters, such as write batch size, GC batch write size, and disk fullness status. We analyze the disk processing time and queue length distribution with the RPO scheduling algorithm under different arrival rates through an $M/G/1/K$ queue model with state-dependent service time. The $\lambda_{max}$ is also studied through the generated model based on the user QoS requirement, such as the request blocking ratio. A new GC policy selection algorithm is proposed to address the poor performance of the SMR drive when the usage of disk capacity reaches its limit. The results from the analytical model and the simulation tool are studied and compared, which cross-validates the correctness of the model and the simulation tool. The 10%-15% performance difference between the analytical and simulation results allows us to use the analytical model as a guide for future SMR drive design. We note that there are many influential factors not fully considered in this paper, e.g., complex IO patterns with bursts, fragmented reads, and the spatial/temporal locality of writes. For example, the IO burst pattern contains multiple parameters, e.g., frequency and length. We may want to design a new type of SMR drive in which the number of GC processes is just enough to accommodate the design target for burst size. Fragmented reads have turned out to have an important performance impact on user experience. These topics will be discussed in our future work.


Fig. 8. SMR drive performance under different workload properties and disk statuses. (a) Read ratio. (b) Sequential ratio. (c) Request size. (d) Request size throughput.

Appendix. The supporting lemmas

Lemma A.1. The disk queue length state transition matrix $P$ is irreducible and aperiodic.

Proof. From Cinlar (2013), we know that a Markov chain is said to be irreducible if it is possible to get to any state from any state. Let $i$ and $j$ be a pair of states in $P$. If $i = j$, the state does not change, which can happen when the disk queue receives exactly one request during the disk processing time. If $i < j$, this can happen when the disk queue receives $j - i + 1$ requests during the disk processing time. If $i > j$, this can occur when the disk queue receives no requests during $i - j$ disk processing times. Thus $P$ is irreducible. To prove that $P$ is aperiodic, first define the period of a state as $k = \gcd\{n > 0 : Pr(X_n = i \mid X_0 = i) > 0\}$. If $k > 1$, the state is periodic; if $k = 1$, the state is aperiodic. From Cinlar (2013), we know that a Markov chain is aperiodic if every state is aperiodic. Consider a state $i$ in $P$. If $i = 0$, i.e., the queue length is 0, the state can be reached from state $j = 1$ when no request arrives during the disk processing time. If $i = K$, the state can be reached from $j = K - 1$ when 2 requests arrive during the disk processing time. Any other state can be reached from $j = i - 1$ when 2 requests arrive, or from $j = i + 1$ when no request arrives, during the disk processing time. Thus all the states in $P$ are aperiodic, and the Markov chain matrix $P$ is aperiodic. □

Lemma A.2. If the queue length distribution at the departure instant is $\pi_L^{(d)}$, then the request blocking ratio can be calculated through (35).

๐œ‹ (๐‘Ž) (๐พ) =

๐œ‹๐น(๐‘Ž) (1) =

โˆ‘๐พโˆ’2

๐‘‘ ๐‘—=0 ๐œ‹ (๐‘—) โˆ‘๐พโˆ’2 ๐‘‘ , โˆ’ ๐‘—=0 ๐œ‹ (๐‘—) (๐‘Ž) ๐œ‹๐น (1)

1โˆ’

(35)

1

๐œ† ๐œ†+

1 ๐ธ[๐‘‡ ]

(36)

,

where ๐œ‹๐น(๐‘Ž) (1) are the probability distributions that there is 1 request in the queue. Proof. Based on the Markov chain property, we have ๐œ‹ (๐‘Ž) (๐‘–) = ๐œ‹ (๐‘‘) (๐‘–)(1 โˆ’ ๐œ‹ (๐‘Ž) (๐พ)),

(37)

๐‘– = 0, 1, โ€ฆ , ๐พ โˆ’ 1

๐œ‹ (๐‘Ž) (๐พ) = (1 โˆ’ ๐น๐น (๐พ โˆ’ 2))๐œ‹๐น(๐‘Ž) (1). โˆ‘๐‘–

(38) โˆ‘๐พโˆ’2

With ๐น๐น (๐‘–) = ๐‘—=0 ๐œ‹ (๐‘Ž) (๐‘—), we can get ๐น๐น (๐พ โˆ’ 2) = ๐‘—=0 ๐œ‹ (๐‘Ž) (๐‘—), and then apply it into (38). Based on (37) and (38), we can get (35) through some algebraic calculations. โ–ก 96


References

Abouee-Mehrizi, H., & Baron, O. (2016). State-dependent M/G/1 queuing systems. Queueing Systems, 82(1-2), 121-148.
Altman, E., Avrachenkov, K., Barakat, C., & Núñez-Queija, R. (2002). State-dependent M/G/1 type queuing analysis for congestion control in data networks. Computer Networks, 39(6), 789-808.
Amer, A., Holliday, J., Long, D. D., Miller, E. L., Pâris, J.-F., & Schwarz, T. (2011). Data management and layout for shingled magnetic recording. IEEE Transactions on Magnetics, 47(10), 3691-3697.
Burkhard, W., & Palmer, J. (2001). Rotational position optimization (RPO) disk scheduling. Tech. rep. La Jolla, CA, USA: University of California at San Diego.
Chen, S., & Towsley, D. (1993). The design and evaluation of RAID 5 and parity striping disk array architectures. Journal of Parallel and Distributed Computing, 17(1-2), 58-74.
Cinlar, E. (2013). Introduction to stochastic processes. Courier Corporation.
David, H. A., & Nagaraja, H. (2004). Order statistics. John Wiley & Sons.
Denning, P. J. (1967). Effects of scheduling on file memory operations. In Proceedings of spring joint computer conference (pp. 9-21). ACM.
Feldman, T., & Gibson, G. (2013). Shingled magnetic recording: Areal density increase requires new data management. USENIX, 38(3), 22-30.
Gibson, G., & Ganger, G. (2011, April). Principles of operation for shingled disk devices. Carnegie Mellon University Parallel Data Lab technical report CMU-PDL-11-107.
Gross, D. (2008). Fundamentals of queueing theory. John Wiley & Sons.
Hall, D., Marcos, J. H., & Coker, J. D. (2012). Data handling algorithms for autonomous shingled magnetic recording HDDs. IEEE Transactions on Magnetics, 48(5), 1777-1781.
Herrmann, G., Ge, S. S., & Guo, G. (2008). Discrete linear control enhanced by adaptive neural networks in application to a HDD-servo-system. Control Engineering Practice, 16(8), 930-945.
Horowitz, R., Li, Y., Oldham, K., Kon, S., & Huang, X. (2007). Dual-stage servo systems and vibration compensation in computer hard disk drives. Control Engineering Practice, 15(3), 291-305.
INCITS T10 Technical Committee (2014, September). Information technology - zoned block commands (ZBC). Draft standard T10/BSR INCITS 536. American National Standards Institute, Inc.
INCITS T13 Technical Committee (2016, January). Zoned-device ATA command set (ZAC), working draft. Draft standard T10/BSR INCITS 536. American National Standards Institute, Inc.
Jacobson, D. M., & Wilkes, J. (1991). Disk scheduling algorithms based on rotational position. Palo Alto, CA: Hewlett-Packard Laboratories.
Kleinrock, L. (1976). Queueing systems, volume 2: Computer applications. New York: Wiley.
Lebrecht, A. S., Dingle, N. J., & Knottenbelt, W. J. (2009). A performance model of zoned disk drives with I/O request reordering. In Sixth international conference on the quantitative evaluation of systems (QEST'09) (pp. 97-106). IEEE.
Liu, W., Zeng, L., & Feng, D. (2016). APAS: An application aware hybrid storage system combining SSDs and SWDs. In 2016 IEEE international conference on networking, architecture and storage (NAS) (pp. 1-4). IEEE.
Luo, D., Wan, J., Zhu, Y., Zhao, N., Li, F., & Xie, C. (2016). Design and implementation of a hybrid shingled write disk system. IEEE Transactions on Parallel and Distributed Systems, 27(4), 1017-1029.
Luo, D., Yao, T., Qu, X., Wan, J., & Xie, C. (2016). DVS: Dynamic variable-width striping RAID for shingled write disks. In 2016 IEEE international conference on networking, architecture and storage (NAS) (pp. 1-10). IEEE.
Niu, J., Xu, J., & Xie, L. (2017). Analytical modeling of SMR drive under different workload environments. In 2017 13th IEEE international conference on control and automation (ICCA). IEEE.
Pitchumani, R., Hospodor, A., Amer, A., Kang, Y., Miller, E. L., & Long, D. D. (2012). Emulating a shingled write disk. In IEEE 20th international symposium on modeling, analysis and simulation of computer and telecommunication systems (pp. 339-346). IEEE.
Ruemmler, C., & Wilkes, J. (1994). An introduction to disk drive modeling. Computer, 27(3), 17-28.
Seltzer, M., Chen, P., & Ousterhout, J. (1990). Disk scheduling revisited. In Proceedings of the winter 1990 USENIX technical conference (pp. 313-323).
Shriver, E., Merchant, A., & Wilkes, J. (1998). An analytic behavior model for disk drives with readahead caches and request reordering. ACM SIGMETRICS Performance Evaluation Review, 26(1), 182-191.
Suthasun, T., Mareels, I., & Al-Mamun, A. (2004). System identification and controller design for dual actuated hard disk drive. Control Engineering Practice, 12(6), 665-676.
Tan, S., Xi, W., Ching, Z. Y., Jin, C., & Lim, C. T. (2013). Simulation for a shingled magnetic recording disk. IEEE Transactions on Magnetics, 49(6), 2677-2681.
Venkataramanan, V., Chen, B. M., Lee, T. H., & Guo, G. (2002). A new approach to the design of mode switching control in hard disk drive servo systems. Control Engineering Practice, 10(9), 925-939.
Worthington, B. L., Ganger, G. R., & Patt, Y. N. (1994). Scheduling algorithms for modern disk drives. ACM SIGMETRICS Performance Evaluation Review, 22(1), 241-251.
Xiao, W., Dong, H., Ma, L., Liu, Z., & Zhang, Q. (2016). HS-BAS: A hybrid storage system based on band awareness of shingled write disk. In 2016 IEEE 34th international conference on computer design (ICCD) (pp. 64-71). IEEE.
Xie, M., Xia, L., & Xu, J. (2017). State-dependent M/G/1/K queuing model for hard disk drives. In The 13th conference on automation science and engineering (CASE). IEEE.
Xie, M., Xu, J., & Xia, L. (2016). Performance analysis of media-based cache in hard disk drives. In IEEE region ten conference (TENCON). IEEE.

Junpeng Niu received the B.E. and M.S. degrees from the School of Computer Engineering, Nanyang Technological University, Singapore, in 2007 and 2009, respectively. He is currently a Ph.D. student in the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore.

Jun Xu is a principal engineer at Western Digital. He received his B.S. in applied mathematics in 2001 from Southeast University (China) and his Ph.D. in control and automation in 2006 from Nanyang Technological University (Singapore). Before joining Western Digital, he was with the Data Storage Institute, Nanyang Technological University, and the National University of Singapore. He has multidisciplinary knowledge and solid experience in complex system design, management, modeling and simulation, data analytics, data centers, cloud storage, and IoT. He has published around 60 international papers and patents and one monograph. He was a committee member of several international conferences on control and automation and is an editor of the journal Unmanned Systems. He is a senior member of IEEE and a certified FRM.

Lihua Xie received the B.E. and M.E. degrees in electrical engineering from Nanjing University of Science and Technology, Nanjing, China, in 1983 and 1986, respectively, and the Ph.D. degree in electrical engineering from the University of Newcastle, Callaghan, NSW, Australia, in 1992. Since 1992, he has been with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, where he is currently a Professor and Director of the Delta-NTU Corporate Laboratory for Cyber-Physical Systems. He served as the Head of the Division of Control and Instrumentation from July 2011 to June 2014. His research interests include robust control and estimation, networked control systems, multiagent networks, localization, and unmanned systems. He is an Editor-in-Chief of Unmanned Systems and an Associate Editor of the IEEE Transactions on Control of Network Systems. He has served as an Editor of the IET Book Series in Control and an Associate Editor of a number of journals, including the IEEE Transactions on Automatic Control, Automatica, IEEE Transactions on Control Systems Technology, and IEEE Transactions on Circuits and Systems II. He is an elected member of the Board of Governors, IEEE Control Systems Society (January 2016–December 2018). He is a Fellow of IEEE and IFAC.

Li Xia is an associate professor at the Center for Intelligent and Networked Systems (CFINS), Department of Automation, Tsinghua University, Beijing, China. He received the B.S. degree and the Ph.D. degree in control theory in 2002 and 2007, respectively, both from Tsinghua University. After graduation, he worked at IBM Research China as a research staff member (2007–2009) and at the King Abdullah University of Science and Technology (KAUST), Saudi Arabia, as a postdoctoral research fellow (2009–2011). He then returned to Tsinghua University in 2011. He has been a visiting scholar at Stanford University, the Hong Kong University of Science and Technology, etc. He serves/served as an associate editor and program committee member of a number of international journals and conferences. His research interests include methodological research in stochastic learning and optimization, queueing theory, Markov decision processes, and reinforcement learning, and applied research in building energy, the energy Internet, the industrial Internet, the Internet of Things, etc. He is a senior member of IEEE.
