CHAPTER 20
Drive Interface

The disk drive is not a stand-alone device. To fulfill its function as a device for storing user data, the disk drive needs to be attached to the user's system, traditionally called the host. It operates by receiving requests, or I/O commands, from the host to either store the data that the host is sending to it or retrieve and return to the host some piece of data that the host had previously stored in the drive. For the disk drive to provide its data storage service to a host, it needs to have a protocol established to receive I/O commands from the host and to signal back to the host their completions. The protocol also has to define the mechanisms for receiving the user data when asked to store them and returning the stored user data when requested. This interface between the disk drive and its host will be discussed in this chapter.

A disk drive has a predefined set of commands which is standard for the interface that the disk drive is designed for. While many of the commands in the command set have to do with housekeeping and other miscellaneous things, the vast majority of the time the commands being sent to the disk drive by its user are I/O requests, i.e., requests to either read or write a piece of data. Since such requests are the primary function and purpose of a disk drive, they are the commands of the greatest interest in this book.

Common to all standard interfaces, when an I/O request is issued to a disk drive, three basic parameters are needed before the drive knows what to do: (i) whether the command is for a read or a write operation, (ii) the starting address for the command, and (iii) the number of sectors of data to be read or written. When the drive has finished performing an I/O request, regardless of which standard interface is being used, the drive needs to inform the host of this completion.
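The three parameters common to every I/O request can be captured in a few lines. The sketch below is purely illustrative; the `IORequest` name and the 512-byte default sector size are assumptions of this example, not part of any interface standard.

```python
from dataclasses import dataclass

@dataclass
class IORequest:
    """The three parameters every standard interface requires:
    (i) direction, (ii) starting address, (iii) transfer length."""
    is_write: bool     # read or write?
    start_lba: int     # starting sector address (logical block address)
    num_sectors: int   # number of sectors to read or write

    def byte_length(self, sector_size: int = 512) -> int:
        # Total payload the interface must move for this request.
        return self.num_sectors * sector_size

# An 8-sector read starting at sector 2048:
req = IORequest(is_write=False, start_lba=2048, num_sectors=8)
```

On completion, the drive would signal the host through whatever mechanism the particular interface defines, as discussed in the sections that follow.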
In this chapter, the most common standard interfaces in use today for disk drives are discussed. It starts off with a general overview of interfaces, followed by high-level descriptions of the major characteristics and features of five different interface standards. It concludes with a discussion of a disk drive’s cost, performance, and reliability with respect to its interface.
20.1 Overview of Interfaces
The interface is the communication channel over which I/O requests are sent from the host to the disk drive and through which all data transfers for reading and writing take place. For it to function, both ends of the interface, namely the host system and the disk drive, must conform to some predefined protocol specific to that interface. A basic logical model for directly attaching a disk drive to a system is shown in Figure 20.1. This model is applicable whether the host system is a server, a personal computer (PC), or some consumer electronics such as a digital video recorder (DVR). For large enterprise computing systems, directly attached drives are usually only for things such as booting and program loading, with data drives indirectly attached via some storage servers. Various flavors of storage servers and configurations will be discussed in Chapter 24. For PCs, the system I/O bus would be the PCI bus. For DVRs, the processor would be responsible for the MPEG-2 coder/decoder function, as well as scheduling on-time delivery of video stream data to/from the disk. The host side controller may have different names. Host bus adapter (HBA) is a commonly used term. In mainframes it is called a Storage Controller,
ch20-P379751.indd 699
8/6/07 4:22:40 PM
700 Memory Systems: Cache, DRAM, Disk
[FIGURE 20.1: Model of a disk drive attached to a host system via an interface. The processor and memory, joined by the system I/O bus and memory bus, connect through a host side controller and the host-drive interface to the drive controller and HDA.]

such as the IBM 3880. In PCs, it may be integrated on the motherboard with no special name, or it may be a PCI card and called simply the controller card or sometimes referred to as the disk drive adapter. Multiple host side controllers may be attached to the system I/O bus, and multiple drives may be attached to each host side controller. The main point is that the host processor issues I/O requests to the drive via the host side controller, and the disk drive exchanges data with the system's memory through the interface and the host side controller. By using a specific host side controller to bridge between a standard interface and a specific system's I/O bus, a disk drive with a standard interface can be connected to different systems with distinct processors and architectures. For example, SCSI drives and ATA drives can be connected to a PC with an Intel processor, an Apple computer with a PowerPC processor, or a Sun workstation with a SPARC processor by using the right controller.

In the early days, before Large Scale Integration (LSI) made adequate computational power economical to be put in a disk drive, the disk drives were "dumb" peripheral devices. The host system had to micromanage every low-level action of the disk drive. Hence, most of the intelligence resided in the host side of the interface, with the drive side controller doing not much more than controlling the rotation of the drive and the servo. The host system had to know the detailed physical geometry of the disk drive, e.g.,
number of cylinders, number of heads, number of sectors per track, etc. It even had to handle the defect management of the drive. Upgrading to a new disk drive meant having to upgrade the software on the host system. The interfaces were proprietary and not standardized, although IBM, having the dominant market share at the time, set the de facto standard for the mainframe.

Two things changed this picture. First, with the emergence of PCs, which eventually became ubiquitous, and the low-cost disk drives that went into them, interfaces became standardized. Second, large-scale integration technology in electronics made it economical to put a lot of intelligence in the drive side controller. Volume production of disk drives driven by the fast-expanding PC market reinforced the trend. The end result is that today's disk drive handles all the nitty-gritty details of its internal housekeeping and management. The host side controller becomes much simpler and standardized. Logical addressing allows the host system to do all its disk accesses using simple high-level I/O commands, freeing it from having to know anything about the geometry of the disk drive. New disk drives can be added to a system without having to change its software. Furthermore, instead of proprietary interfaces, the computer industry has adopted a handful of standard interfaces. While mainly used by hard disk drives, these standard interfaces can also be used by other storage devices such as tape drives and optical drives.

There are several common standard interfaces for disk drives today: the parallel and serial versions of ATA, the parallel and serial versions of SCSI, and the serial Fibre Channel. They will be discussed in the remainder of this chapter. A few others, such as Serial Storage Architecture (SSA) and various flavors of Intelligent Peripheral Interface (IPI), have come and gone and will not be covered in this book.
Since these interfaces are standardized by standards committees, their details are fully described in their respective official standard documents, which are generally thousands of pages long. Hence, it is neither necessary nor possible for this book to present all the fine details. Rather, the purpose here is to give a very high-level overview of each so that the reader can more readily reference the standard documents for precise specifications.
20.1.1 Components of an Interface
An interface logically consists of two protocol levels. The lower level deals with transmission (physical, link, and transport layers), which includes the cables and the connectors for carrying the data, the voltages and electrical signals that are to be applied to the cables, coding schemes, and transmission procedures. It is necessary to define what form each signal takes. Is it level-triggered or edge-triggered? If it is level-triggered, is it active high or active low? If it is edge-triggered, is it triggered on the rising edge, the falling edge, or both?

At the higher level is the logical layer, which defines the protocol and the set of commands that can be issued by the host to the drive. The protocol specifies the precise sequence of handshakes that needs to be observed by both the host and the drive. The command set defines what each command code represents and what actions the drive must take in response.

Because of the dual layering of the interface, two standard interfaces may share the same lower or higher level. For instance, parallel ATA and serial ATA share the same higher level protocol, but have different transmission layers. Ditto for parallel SCSI, serial SCSI, and Fibre Channel, which all use the SCSI command set. On the other hand, serial ATA and serial SCSI, while using different logical protocols, have similar and compatible, though not identical, transmission layers. This is because serial SCSI borrowed much of the physical layer technology from serial ATA.

The standard interfaces are not static. They have all gone through multiple generations of evolution. New versions are usually defined to add features and to increase speed. Fortunately, each new version is, most of the time, backward compatible with earlier versions. This means a newer disk drive can be attached to an older version of host side controller, and vice-versa, and things will still function properly, albeit only at the functional and performance level of the older of the two versions. Data rate generally increases from one version to the next. This is usually accomplished by increasing the clock rate of the interface.

20.1.2 Desirable Characteristics of Interface

Ideally, from a performance point of view, an interface should have the following desirable characteristics:

• Simple protocol. The fewer handshakes that need to be exchanged between the host and the drive to complete a command, the lower the overhead.

• High autonomy. The less the host processor needs to be involved in completing a command, the more it is freed up to handle other tasks.

• High data rate, up to a point. It is essential for good performance that the interface data rate be higher than the media data rate of the drive. Otherwise, data-overrun for reads and data-underrun for writes¹ will occur, causing the disk drive to suspend media data transfer, thus requiring one or more extra revolutions to complete the command. However, there is not much performance benefit for the interface data rate to be much faster than the media data rate. The only time this benefit is observable is when a command with a large block size is being satisfied from the cache. This is because, as discussed in Chapter 19, the data transfer time is an insignificant portion of an I/O's total time for random access. For a long sequential transfer, data must come from the media, and hence, the throughput is gated by the media data rate.

• Overlapping I/Os. If multiple disk drives can be placed on a shared interface, the interface should support concurrent commands to be issued to those disk drives by the host so that all drives on the interface can be kept busy.

• Command queueing. A disk drive's throughput can be greatly improved by allowing multiple requests to be queued up at the drive so that the drive can have the flexibility of choosing which command to service next. Hence, an interface should support this feature for better performance. Command queueing will be discussed in Chapter 21.

¹ Data-overrun is when the read buffer is being emptied at a slower rate than the disk drive is filling it, causing it to become full. Data-underrun is when the write buffer is being emptied by the disk drive at a faster rate than the host is filling it, causing it to become empty.
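The overrun condition described in the data-rate discussion above is simple rate arithmetic: during a read, the media fills the drive's buffer while the interface drains it. The sketch below illustrates the idea; the function name, and expressing rates in MB/s and buffer size in MB, are choices made for this example only.

```python
def seconds_until_read_overrun(buffer_mb, media_rate_mbps, interface_rate_mbps):
    """Time until the read buffer fills when the media delivers data
    faster than the interface removes it. If the interface keeps up,
    overrun never happens."""
    net_fill_rate = media_rate_mbps - interface_rate_mbps
    if net_fill_rate <= 0:
        return float('inf')   # interface keeps up: no overrun
    return buffer_mb / net_fill_rate

# A 16-MB buffer, 80-MB/s media rate, 64-MB/s interface rate:
# the buffer gains 16 MB every second, so it fills in 1 second,
# after which the drive must suspend media transfer.
```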
20.2 ATA

The ATA interface [see www.T13.org] is the most common interface in use in PCs and consumer electronics today, mainly due to its low cost. It is a parallel interface,² and is starting to be explicitly referred to as PATA to distinguish it from the emerging serial ATA.

The first hard disk drive to be attached to a PC was Seagate's ST506, a 5.25” form factor 5-MB drive introduced in 1980. The drive itself had little on-board control electronics; most of the drive logic resided in the host side controller. Around the second half of the 1980s, drive manufacturers started to move the control logic from the host side to the drive side and integrate it with the drive.³ The interface for such drives was referred to as the AT Attachment (ATA) interface because those drives were designed to be attached to the IBM PC-AT.⁴

ATA has had six revisions since ATA-1 was standardized in 1994 as ANSI standard X3.221-1994, four years after the standard was first submitted for approval. Currently, the seventh revision, ATA-8, is being worked on by the T13 Technical Committee, which was also responsible for all previous versions of ATA. Each revision usually increased the data rate over its predecessor, sometimes resolved problems with the interface definition (twice to address the maximum disk drive capacity imposed by the standard), and sometimes added new features.

To accommodate devices such as CD-ROMs and tape drives which use SCSI commands, a feature was added to ATA to allow packets to be sent over the transport and physical layers. This is called ATAPI (pronounced "a-tap-pee"), for ATA Packet Interface. With this feature, SCSI commands can be placed in packets and sent over the ATA link. This enables such devices to be added to a system without requiring a SCSI controller card. The ATA interface is often referred to as ATA/ATAPI.

² Parallel in the sense that multiple signals are sent in parallel over multiple wires.
³ Such drives became known as "Integrated Drive Electronics," or IDE, drives, and the name IDE was also used to refer to the interface. Hence, IDE and ATA are oftentimes synonymous. Later, when an industry committee was formed to standardize this interface, ATA became the chosen name of the standard.
⁴ AT stands for Advanced Technology.
⁵ Cables are used by 3.5” form factor drives. Most 2.5” and smaller form factor drives are plugged directly into system connectors.

Cabling For many years ATA used a 40-wire ribbon cable with commensurate connectors.⁵ A separate 4-wire cable and connector is used for power. Because the cables are unshielded, crosstalk became a problem as the clock rate increased. Today, 80-conductor cables are used, but the original 40-pin connectors are retained; the 40 new wires are simply ground wires added between every pair of original conductors. The maximum cable length officially approved is only 18 in. This limit makes it difficult to build a large storage system with many PATA drives. Of the original 40 pins, 6 are ground, 16 are for data (2 bytes of parallel transfer), and the remainder are for various signals.

Topology Each ATA channel can have two drives attached to it. A typical host adapter would have two ATA channels, as shown in Figure 20.2. Only one drive on each channel can be active at a time, so I/O overlapping is not allowed, and the other drive must be kept idle.

Device Address A drive on an ATA channel is either Drive 0 ("Master") or Drive 1 ("Slave"). A jumper or a DIP switch on a drive is usually used to set the drive's address.

Data Transfer Modes PATA supports three different types of data transfer modes:

• PIO Mode: Programmed I/O mode is the original and oldest method of doing data transfer in ATA. It requires the host processor to be actively involved in transferring every 2 bytes of data. Hence, the host's CPU utilization is high during transfers, reducing its ability
to do multi-tasking and negatively impacting performance. The fastest PIO data rate is PIO Mode-4, which has a cycle time of 120 ns and a data rate of 16.7 MB/s. Because of its high CPU utilization, PIO is no longer used except in some lower end consumer electronics.

• DMA Mode: First party, or bus mastering, DMA (direct memory access) is the standard DMA mode for ATA. First party means the data sender can initiate data transfer without the use of an external, or third party, DMA controller. DMA is more efficient than PIO because the host processor is not needed to babysit the mundane process of transferring data. The fastest DMA data rate is DMA Mode-2, which has a cycle time of 120 ns and a data rate of 16.7 MB/s.

• Ultra DMA Mode: The double transition clocking technique was introduced with Ultra DMA mode as a means of further increasing the data rate. Instead of doing one data transfer on the rising edge of each clock cycle, as is done in DMA mode, data is transferred on both the rising and the falling edges, thus doubling the data rate for a given clock rate. Ultra DMA also adds CRC to the data transfer, greatly improving data integrity. Today, Ultra DMA is the dominant mode of data transfer for ATA, with 133 MB/s (30-ns cycle time) being the current fastest Ultra DMA mode, commonly referred to as Ultra/133.

Different command codes indicate which data transfer mode is to be used.

[FIGURE 20.2: Configuration of a dual-channel ATA controller. The controller has a primary and a secondary ATA bus, each with a Drive 0 and a Drive 1 attached.]
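The Ultra/133 figure follows directly from the cycle time: with double transition clocking, each 30-ns cycle moves 2 bytes on each of its two clock edges. A quick check (the function name is just for illustration):

```python
def udma_rate_MBps(cycle_time_ns):
    """Double transition clocking: 2 bytes on the rising edge plus
    2 bytes on the falling edge = 4 bytes per clock cycle."""
    cycles_per_second = 1e9 / cycle_time_ns
    return cycles_per_second * 4 / 1e6

# 30-ns cycle -> roughly 133 MB/s, i.e., Ultra/133.
```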
Error Checking ATA initially had no error checking, but CRC was added in Ultra DMA to check for errors in transmission.

Command Delivery The host sends an I/O request to an ATA drive via a set of registers. Command parameters (address⁶ and block size) are first written into the appropriate registers by the host, and then, by writing a command code into the command register, the disk drive is triggered into action to service the command. Status registers are used by the drive to communicate back to the host, such as to indicate that it is busy or that a requested command has been completed.

Command Queueing A simple form of tagged command queueing was defined in ATA-5, but was not widely used. Only IBM/Hitachi-GST implemented this ATA command queueing, in their 3.5” drives.
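The write-parameters-then-command sequence can be sketched against the legacy task-file register layout for a 28-bit LBA read (command code 0x20). The register names below follow common usage; the dictionary standing in for the drive's register block is, of course, a simplification for illustration.

```python
READ_SECTORS = 0x20   # ATA command code for a PIO read

def issue_read(regs, lba, sector_count):
    """Program the ATA task-file registers for a 28-bit LBA read.
    The command register is written last; that write is what triggers
    the drive to start servicing the request."""
    regs['sector_count'] = sector_count & 0xFF
    regs['lba_low']      = lba & 0xFF
    regs['lba_mid']      = (lba >> 8) & 0xFF
    regs['lba_high']     = (lba >> 16) & 0xFF
    regs['device']       = 0xE0 | ((lba >> 24) & 0x0F)  # LBA mode + LBA bits 24-27
    regs['command']      = READ_SECTORS
    return regs

regs = issue_read({}, lba=0x123456, sector_count=8)
```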
20.3 Serial ATA

Several factors drove the development of serial ATA (SATA) [see www.serialata.org]. As discussed previously, the ribbon cable of PATA is unshielded, and its susceptibility to crosstalk makes increasing the data rate difficult. Furthermore, as the clock rate goes up, the window for all the parallel signals to arrive without skew becomes narrower and harder to achieve. The wide ribbon cable, which takes up a lot of room, and its short maximum length place a great deal of limitation on what storage subsystems can be constructed out of PATA drives. The answer is to use a simple serial link that transmits one bit at a time, but at a much faster clock rate.

SATA was created so that the lower transport and link levels of PATA could be replaced with a much faster serial link, while at the same time keeping the upper logical layer of ATA. In this way, a new SATA controller and a new SATA drive can be transparently introduced to an existing system to improve performance.
⁶ Address can be given in either the older Cylinder-Head-Sector format or the newer and more commonly used Logical Block Address format.
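The relationship between the two address formats is a simple linear mapping given the drive's (logical) geometry. A sketch, with hypothetical parameter names:

```python
def chs_to_lba(cylinder, head, sector, num_heads, sectors_per_track):
    """Map a Cylinder-Head-Sector address to a Logical Block Address.
    CHS numbers sectors from 1; LBA numbers blocks from 0."""
    return (cylinder * num_heads + head) * sectors_per_track + (sector - 1)

# With a (hypothetical) 16-head, 63-sectors-per-track geometry,
# cylinder 1, head 0, sector 1 maps to LBA 16 * 63 = 1008.
```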
Any host that can already deal with standard ATA devices will not have to change any of its software. Similarly, for drive manufacturers, only the physical layer of ATA drives needs to be replaced; the logical layer remains intact. In fact, the first generation of SATA drives was typically built by taking a PATA drive and adding a bridge circuit to convert between the SATA and PATA physical layers. A new private consortium, called SATA-IO, was formed to develop the standards for SATA, and is currently working on the second-generation SATA-2 standard. SATA-1 was adopted by the T13 standards committee as part of the ATA-7 standard.

Cabling SATA uses a 7-wire cable for signals and a separate 5-wire cable for power. Three of the wires of the signal cable are for ground, while the other four wires are two pairs of differential signals, one pair for transmitting in each direction. The maximum cable length is 1 m from controller to drive, but can be as long as 6 m between external boxes, which is one of the reasons why using SATA drives is becoming an attractive option for building large storage systems.

Topology SATA is a point-to-point interface, meaning that each SATA port in the host system can have only one SATA drive connected to it. Thus, each SATA drive has its own cable. Of course, a SATA controller card can have multiple ports on it; four ports is a typical number, and Figure 20.3 illustrates such a configuration.

[FIGURE 20.3: Configuration of a four-port SATA controller, with one drive attached point-to-point to each port.]

Additionally, a SATA port can be expanded by a port multiplier, which enables
system builders to attach up to 15 SATA devices to a single port.

Device Address Since only one drive can be attached to a SATA cable, there is no need to give the drive an identification.

Data Transfer Modes The raw speed of the first-generation SATA link is 1.5 Gbps, with 8b/10b encoding used over the link. Hence, this translates to a user data rate of 150 MB/s. SATA controllers and disk drives with 3 Gbps are starting to appear, with 6 Gbps on SATA's roadmap. Since SATA emulates the ATA logical interface on top of a serial physical interface, all the original ATA commands are supported. This means commands that explicitly call for one of the ATA transfer modes, such as PIO, will still work, and at close to the much higher speed of the underlying SATA link.

Error Checking SATA uses CRC to check for errors in transmission.

Command Delivery Since SATA's upper level interface is still the standard ATA logical layer (called the application layer), software communicates by using exactly the same protocol as ATA. However, at the lower level interface, that same information is delivered using a different mechanism. Packets, called Frame Information Structures (FIS), are used to send command and status information serially over the link. Unique FIS codes specify the format and content of an FIS, as well as its purpose. For example, a command FIS contains the ATA command code along with the parameters of the command, such as address and block size.

Command Queueing SATA-1 uses the same tagged command queueing that was defined for ATA. SATA-2 adds a newer standard for command queueing, called Native Command Queueing (NCQ). NCQ supports a maximum queue depth of 32, which is adequate for most workloads.
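The 150-MB/s user data rate is just the 1.5-Gbps line rate discounted by the 8b/10b code, which spends 10 line bits to carry every 8 data bits. A quick sketch, using integer arithmetic and decimal megabytes:

```python
def sata_user_rate_MBps(line_rate_gbps):
    """Effective one-way user data rate of an 8b/10b-coded serial link."""
    line_bps = int(line_rate_gbps * 1_000_000_000)
    data_bps = line_bps * 8 // 10          # 8 data bits per 10 line bits
    return data_bps // 8 // 1_000_000      # bits/s -> bytes/s -> MB/s

# 1.5 Gbps -> 150 MB/s; the 3-Gbps and 6-Gbps generations scale linearly.
```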
20.4 SCSI

The Small Computer Systems Interface (SCSI) [see www.T10.org] is a more advanced interface, with some functionalities and features not available in ATA. Consequently, it is a more costly interface to implement and, hence, is not as widely used in lower end systems where those added functionalities and features are not needed. Because it is a more costly interface, it tends to be used by more expensive devices, such as disk drives with higher performance and reliability than those used in PCs. Thus, SCSI is more likely to be used in higher end workstations and servers, due to both its functionalities and the availability of disk drives with higher performance and reliability.

SCSI consists of an upper logical protocol level and a lower transmission level. It originated from a simple interface called the Shugart Associates Systems Interface (SASI) back in 1979 and has since evolved into a much more complex and advanced interface. Today, it is an ANSI standard under the control of the T10 technical standards committee. The term SCSI, unless otherwise noted, usually implies the original parallel transmission interface (officially referred to as the SCSI Parallel Interface, or SPI). In a sense, SCSI can be considered a collection of several related interface standards grouped under its umbrella label, all governed by the SCSI Architecture Model (SAM), a common ground for all the sub-standards. The SCSI logical protocol using a serial transmission link will be discussed in Section 20.5, and Section 20.6 talks about the SCSI logical protocol running on top of the Fibre Channel link.

Cabling There are a multitude of options for cables and connectors in SCSI. First, a SCSI cable can be "narrow" (for 8-bit parallel data transfer) or "wide." For a given transmission speed, a wide cable has twice the data rate of a narrow cable. There are 50 conductors in a narrow cable and 68 conductors in a wide cable. SCSI cables need to be properly terminated to eliminate reflection of signals. Next, three different signaling methods can be used in SCSI, viz., single ended (SE), high voltage differential (HVD), and low voltage differential (LVD). The maximum cable length depends on the number of devices on the cable and the signaling method. Higher speed transfer modes require LVD signaling and are not supported on narrow cable. Therefore, today's SCSI cable is typically wide and uses LVD, with a maximum length of up to 12 m. Finally, cables can be used internally inside a box or externally outside a box. External cables are shielded and better constructed than internal cables, as they have to work in an unprotected environment. To add to all of these choices, there are at least eight different types of SCSI connectors, four each for external and internal connections. SCSI devices use the same four-conductor power cables and connectors as ATA.

Topology SCSI is really a bus, allowing multiple devices⁷ to be connected to the same cable. Every device on the bus has a unique ID which is assigned a priority. Since each data line in a cable is also used by one device to signal its ID, up to 8 devices can be attached to a narrow bus and 16 devices to a wide bus. The host controller⁸ is one of the devices and is usually assigned the highest priority. Figure 20.4 illustrates a simple SCSI configuration with four disk drives. Because multiple devices share the same bus, they must arbitrate for the bus before the winning device (the one with the highest priority among those arbitrating) can put data on the bus. Each device on the SCSI bus can have up to eight logical units, called LUNs, associated with it; LUN 0 is required and is the default if the device itself is the only LUN. With this feature, it is possible to attach a new type of controller onto the SCSI bus as a device on that bus and connect multiple drives to that controller.
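On a narrow bus, arbitration reduces to picking the highest ID among the requesters, since each device asserts the data line corresponding to its own ID and every arbitrating device can see all the asserted lines. A minimal sketch (narrow-bus priority order only; the wide-bus priority order is more involved):

```python
def arbitration_winner(arbitrating_ids):
    """On a narrow SCSI bus, IDs 0-7 arbitrate by each asserting its
    own data line; the highest ID present wins the bus. ID 7 is
    usually the host adapter, giving it top priority."""
    return max(arbitrating_ids)

# Drives 2 and 5 and the host adapter (ID 7) arbitrate; the adapter wins.
```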
⁷ In SCSI speak, a device is anything that can be attached to the SCSI bus, including the host bus adapter and, of course, the disk drive.
⁸ In SCSI speak, a host or host controller is called the "initiator," and a storage device is called the "target."
[FIGURE 20.4: Configuration of a SCSI controller (device 7, with terminator) and four disk drives. Note that the drive numbers can be in any order.]

Device Address A device on a SCSI bus must have a unique device address between 0 and 7 (narrow) or 0 and 15 (wide). A jumper or a DIP switch on a drive is usually used to set the drive's address.

Data Transfer Modes The data speed of SCSI has been doubled five times since its beginning, and the naming convention has evolved along the way: Original SCSI (5 MB/s) and Wide SCSI (10 MB/s); Fast SCSI (10 MB/s) and Fast Wide SCSI (20 MB/s); Ultra or Fast-20 SCSI (20 MB/s) and Fast-20 Wide SCSI (40 MB/s); Ultra2 or Fast-40 SCSI (40 MB/s) and Fast-40 Wide SCSI (80 MB/s); Ultra3 or Ultra160 SCSI (160 MB/s, wide only); and Ultra4 or Ultra320 SCSI (320 MB/s, wide only).

Error Checking SCSI uses parity across 8 parallel bits of data for checking errors in transmission. CRC was also added in Ultra3.

Command Delivery The SCSI logical layer protocol uses a data structure called the command descriptor block (CDB) to communicate an I/O request to the device or disk drive. The CDB contains the command code and parameters such as address and sector count. SCSI uses LBA only to specify addresses; the CHS format was never used. A CDB can be 6, 10, 12, or 16 bytes long, depending on how much accompanying information needs to be transferred along with the command. CDBs, status information from the drive, and other messages are transmitted asynchronously between the host controller and the disk drive.

Command Queueing Tagged command queueing has been supported since SCSI-2 and has been in use for quite some time, especially in servers, which typically support many users and, hence, can benefit the most from command queueing. SCSI devices are free to implement a maximum queue depth of any size up to 256, though there is not much benefit beyond 64, or even just 32.
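As a concrete illustration of a CDB, the sketch below builds the 10-byte READ(10) descriptor: operation code 0x28 in byte 0, a big-endian 4-byte LBA in bytes 2-5, and a big-endian 2-byte transfer length in bytes 7-8. The flag, group, and control bytes are left zero here for simplicity.

```python
def read10_cdb(lba, num_blocks):
    """Build a minimal 10-byte READ(10) command descriptor block."""
    cdb = bytearray(10)
    cdb[0] = 0x28                             # READ(10) operation code
    cdb[2:6] = lba.to_bytes(4, 'big')         # logical block address
    cdb[7:9] = num_blocks.to_bytes(2, 'big')  # transfer length in blocks
    return bytes(cdb)

cdb = read10_cdb(lba=0x12345678, num_blocks=16)
```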
20.5 Serial SCSI

Serial Attached SCSI (SAS) came about for the same reason that SATA was created, viz., to use serial link technology to overcome the drawbacks of a parallel interface while retaining the logical layer of the interface, so that existing software does not have to be changed to take advantage of new hardware that incorporates this better performing technology. In fact, the SAS developers smartly took advantage of the technical work already carried out by the SATA developers and leveraged the technical solutions SATA came up with. As a result, SAS quickly got off the ground, using cables, connectors, and other link layer designs that are essentially borrowed from SATA. In 2002, about a year after an initial brainstorming meeting to create SAS, it gained rapid approval from the ANSI INCITS T10 technical committee to become a part of the SCSI family of standards.

In a nutshell, one can consider SAS to be an interface using the SCSI upper level command set on a lower level serial transmission. By adopting similar cables and connectors as SATA, it is possible to mix both SAS and SATA disk drives in the same SAS domain, as the transmission layers of the SAS lower level interface are designed to be compatible with those of the SATA interface. Thus, a host system with an SAS domain can use the SATA Tunneling Protocol⁹ (STP) to talk to a SATA disk drive that is attached using an STP/SATA bridge.
⁹ One of three possible protocols in SAS. The other two are the Serial SCSI Protocol (SSP) for talking to SAS devices and the SAS Management Protocol (SMP) for talking to expanders.
Cabling SAS uses a similar type of cable and connector as SATA, except that, as discussed next, SAS devices are dual ported. For each port, two pairs of wires carry differential signals, one pair for transmitting in each direction. The maximum cable length is 1 m from controller to drive. External cables can be as long as 8 m between boxes, which is quite adequate for storage systems with racks of disk drives.

Topology Like SATA, SAS is a point-to-point interface, meaning that each SAS port can have only one SAS device connected to it. However, an SAS device is dual ported, with each port connected to a separate physical link so that the device can be accessed by two different hosts or devices independently. This feature provides fail-over capability for improved reliability/availability, as is required by high-end systems. The SAS standard also provides for optional expander devices which enable a very large SAS domain to be constructed, up to 16,256 devices. Two types of expanders are defined: fan-out expanders (maximum of one per SAS domain) and edge expanders. Both types of expanders have 128 ports each. Any combination of edge expanders, host adapters, and storage devices can be attached to a fan-out expander, but an edge expander can have no more than one other expander attached to it. Figure 20.5 shows a very simple SAS domain with just one edge expander and four disk drives.

Device Address Each SAS port has a worldwide unique identifier, and each SAS device has a worldwide unique name, both of which are used as SAS addresses. The SAS address conforms to the NAA (Name Address Authority) IEEE Registered format identification descriptor, with 4 bits of NAA ID, 24 bits of IEEE company ID, and 36 bits of vendor-specific ID. Since a SATA drive does not have a worldwide name, one is provided by the expander to which it is attached.
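The 4 + 24 + 36 bits of the NAA IEEE Registered format add up to a 64-bit address, which can be packed with a few shifts. A sketch (the field values below are made up for illustration; NAA ID 5 denotes the IEEE Registered format):

```python
def sas_address(naa_id, company_id, vendor_id):
    """Pack the NAA IEEE Registered fields into a 64-bit SAS address:
    4-bit NAA ID | 24-bit IEEE company ID | 36-bit vendor-specific ID."""
    assert naa_id < (1 << 4)
    assert company_id < (1 << 24)
    assert vendor_id < (1 << 36)
    return (naa_id << 60) | (company_id << 36) | vendor_id

addr = sas_address(naa_id=0x5, company_id=0x123456, vendor_id=0x9ABCDEF01)
```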
FIGURE 20.5: Configuration of an SAS domain with one edge expander and four disk drives.
Data Transfer Modes SAS leapfrogged SATA and provided for a transmission speed of 3.0 Gbps for the first-generation SAS. With 8b/10b encoding, this translates to a user data rate of 300 MB/s. However, the SAS physical link is full-duplex (data transmission can take place simultaneously in each direction over the two differential signal pairs), unlike SATA, which is defined to be half-duplex. Therefore, first-generation SAS has a theoretical bandwidth of 600 MB/s per physical link. A transmission speed of 6.0 Gbps (a user data rate of 600 MB/s, for a theoretical bandwidth of 1.2 GB/s per link) is being planned for the second-generation SAS. Command Delivery Since SAS’s upper-level interface is the standard SCSI logical layer, software communicates using exactly the same protocol as SCSI. Packets, called “frames,” are used to send command and status information serially over the link. The payload of the packets consists of CDBs and other SCSI constructs. Command Queueing SAS uses the same command queueing as parallel SCSI.
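The 8b/10b rate arithmetic above can be checked in a few lines. This is a quick sketch: 8b/10b spends 10 bits on the wire per 8 bits of user data, so a line rate in Gbps divides by 10 to give MB/s per direction; the 6.0-Gbps figure is the planned second-generation SAS line rate.

```python
# Sketch: deriving user data rates from raw line rates under 8b/10b
# encoding, which transmits 10 wire bits per byte of user data.

def user_rate_mb_s(line_rate_gbps: float, full_duplex: bool = False) -> float:
    """User data rate in MB/s for a given line rate in Gbps."""
    one_direction = line_rate_gbps * 1000 / 10  # 10 wire bits per data byte
    return one_direction * (2 if full_duplex else 1)

print(user_rate_mb_s(3.0))                    # first-gen SAS, one direction: 300.0
print(user_rate_mb_s(3.0, full_duplex=True))  # full-duplex bandwidth: 600.0
print(user_rate_mb_s(6.0, full_duplex=True))  # planned second generation: 1200.0
```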
20.6 Fibre Channel Fibre Channel10 (FC) [see www.fibrechannel.org] is a high-end, feature-rich, serial interface. It is a transmission-level protocol which carries a higher cost than the interfaces discussed previously. Although
10. Spelling reflects the European origins of the standard.
708 Memory Systems: Cache, DRAM, Disk
presented last in this chapter, it actually predates SATA and SAS. It was originally designed to operate over fiber-optic physical links, hence its name. Later, it evolved to add support for running the same transport and link protocols over copper physical links, but the name Fibre Channel remains. Different logical-level protocols can run on top of FC,11 but SCSI is, by far, dominant. In fact, FC is part of the SCSI-3 family of standards. Cabling Different types of copper cables can be used, including coaxial wires and shielded twisted pair. For fiber, 62.5-μm multi-mode, 50-μm multi-mode, and single-mode are the choices. Both short-wave and long-wave lasers can be used. With copper cabling, FC can have a range of 30 m, enough to locate storage devices in a different room or even on a different floor from the host system. With optical cabling, the range can be as long as 50 km, easily covering a campus or even a city. Topology Each FC device, called a node, has two serial paths, one for receiving and one for transmitting. The most basic configuration is that of a point-to-point connection between two nodes, as shown in Figure 20.6(a). However, such a configuration is seldom used in storage applications. Instead, a much more common topology is one in which all the devices are connected in a loop, as shown in Figure 20.6(b). Up to 127 nodes can be connected in a loop configuration. Any device on the loop that wishes to transmit data must first arbitrate and gain control of the loop. Hence, this topology is known as the Fibre Channel Arbitrated Loop, or FC-AL, topology. A third topology, one which gives FC its power and flexibility as a high-end storage interface, is the switched fabric topology. With a fabric, any port in the fabric can communicate with any other port in the fabric by means of a cross-point switch. A single FC node can be connected to a port in a fabric in a point-to-point connection. Alternatively, a port in the
fabric can be part of an arbitrated loop. Furthermore, two switches can be connected together, essentially forming a bigger fabric. All of these are illustrated in Figure 20.7. FC drives are used exclusively in high-performance systems, which also demand high reliability and availability and where the higher drive cost can be justified. To improve overall system reliability, FC drives are typically dual ported, meaning that a drive can be connected to two different links. Thus, if one link is down, the drive can still be accessed using the second link. With FC-AL, it is customary to have data flow in opposite directions in a dual-loop setup. Device Address Standard FC addressing uses a 3-byte address identifier. For FC-AL, the low byte is used as an Arbitrated Loop Physical Address (AL_PA). Only 127 of the 256 addresses are valid; the remainder are reserved for special FC-AL functions. An AL_PA is dynamically assigned to each FC node in an FC-AL loop at initialization. A numerically lower AL_PA has higher priority. AL_PA 0 has the highest priority and is used for the fabric connection to the loop, which wins selection during loop initialization to become the loop master. The upper 2 bytes of an FC port’s address are assigned by the fabric; they are set to “0000” if no fabric exists on the loop. Data Transfer Modes The original definition of FC was for a 1-Gbps (100-MB/s) link.12 Since then, the data rate has been doubled to 200 MB/s, with 400 MB/s in the works. As in other serial links used for storage devices, the IBM 8b/10b encoding scheme is used for transmission. Command Delivery Packets, or frames, are used to deliver command and status information of the upper-level protocol serially over the link. As mentioned earlier, the SCSI command protocol is typically used today, although initially other interface standards such as HIPPI
11. One of the original goals of FC was to allow HIPPI (High-Performance Parallel Interface) to map to it.
12. Half speed, quarter speed, and eighth speed were also defined.
FIGURE 20.6: FC topology: (a) point-to-point topology, and (b) loop topology showing a host controller and four FC disk drives in the loop.
FIGURE 20.7: Example of FC Fabric Topology. Two switches are connected in this example. Each fabric has a point-to-point connection and an FC-AL loop connection.
and IPI-3 were also used. Frames are variable in length, consisting of 36 bytes of overhead and a payload of from 128 up to 2112 bytes (2048 bytes of data plus 64 bytes of optional header), in 4-byte increments, for a total maximum frame size of 2148 bytes. For improved data reliability, a 4-byte CRC is used. Each port is required to have a frame buffer, so the maximum frame size is dictated by the smallest of the frame buffer sizes of all the ports involved.
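The frame-size arithmetic above can be captured in a short sketch. Only the 36-byte overhead, the 4-byte payload granularity, and the 2112-byte maximum payload are taken from the text; the function name is illustrative.

```python
# Sketch of the FC frame-size arithmetic: 36 bytes of overhead (which
# includes the 4-byte CRC) plus a payload in 4-byte increments.

FRAME_OVERHEAD = 36   # bytes of framing overhead, per the text
MAX_PAYLOAD = 2112    # 2048 data bytes + 64 bytes of optional header

def frame_size(payload: int) -> int:
    """Total frame size in bytes for a given payload length."""
    assert payload % 4 == 0 and payload <= MAX_PAYLOAD
    return FRAME_OVERHEAD + payload

print(frame_size(128))          # minimum payload quoted in the text: 164
print(frame_size(MAX_PAYLOAD))  # maximum frame size: 2148
```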
20.7 Cost, Performance, and Reliability
It is generally true today that SCSI and FC disk drives13 cost significantly more, have higher performance, and are more reliable than ATA drives. It is a common misconception that these differences are all inherently due to the interface. While it is true that SCSI and FC interface electronics are more costly to implement on a disk drive than simple ATA electronics, that difference is nowhere near the two to three times cost difference
13. SCSI drives and FC drives from the same manufacturer generally share the same HDA and have similar electronics except for the interface.
between those two classes of drives. The main reason for the large difference in cost is that, because it also costs more to implement SCSI or FC in a system than ATA, SCSI and FC are used only in systems that really need their more advanced features and flexibility. These are the systems that require higher performance and reliability, and these are the more expensive, higher end systems which can afford to pay more for such capabilities in a disk drive. It is the need to deliver higher performance and reliability in SCSI/FC drives that makes them more expensive—the design and engineering are more difficult, the components used are more expensive, and manufacturing and testing also take longer and cost more [Anderson et al. 2003]. SCSI and FC disk drives generally have higher raw performance than ATA drives. SCSI drives are either 10K or 15K rpm and have sub-4-ms average seek times, while ATA drives are usually 7200 rpm or lower and have average seek times of around 8 ms. The media data rate tends not to be that much different because 3.5” ATA drives use 95-mm diameter disks, while SCSI drives with the same form factor actually use smaller diameter disks (75 mm for 10K rpm and 65 mm for 15K rpm) internally. Whether the difference in interface in itself makes a difference to the user’s I/O performance depends on the application environment and workload:
• A difference in interface speed does not matter for random I/O accesses, as discussed in Chapter 19, since the data transfer time is an insignificant component of the total I/O time for such accesses.
• Even for long sequential accesses, there is little benefit in having a faster interface speed as long as it is faster than the disk’s media data rate, as the sustained data rate is then gated by how fast data can be transferred to/from the disk.
• For a simple single user and a low level of multi-tasking, when most I/O requests show up at the disk drive one at a time synchronously, ATA will outperform SCSI because it has less overhead.
• In the parallel interface world, SCSI has a definite advantage over ATA in terms of throughput for a multi-user environment, in that it can support concurrent I/Os in multiple drives on the same channel, and commands can also be queued up in each drive.
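The first two points can be sketched as a toy I/O-time model. All numbers below are illustrative assumptions, not figures from the text: sequential throughput is gated by the slower of the interface and the media data rate, while for a random access the transfer time is dwarfed by seek time plus rotational latency.

```python
# Toy model of the two points above. Rates and times are assumed,
# illustrative values, not measurements.

def sustained_rate(interface_mb_s: float, media_mb_s: float) -> float:
    """Sequential data rate is limited by the slower of the two paths."""
    return min(interface_mb_s, media_mb_s)

def random_io_time_ms(seek_ms: float, rot_latency_ms: float,
                      xfer_kb: float, media_mb_s: float) -> float:
    """One random I/O: seek + average rotational latency + transfer time."""
    return seek_ms + rot_latency_ms + (xfer_kb / 1024) / media_mb_s * 1000

# Doubling interface speed (150 -> 300 MB/s) does not change the
# sustained rate of a drive whose media delivers 60 MB/s:
print(sustained_rate(150, 60), sustained_rate(300, 60))

# For a 4-KB random read on an assumed 8-ms-seek, 7200-rpm drive,
# the transfer adds well under 0.1 ms to a total of roughly 12 ms:
print(round(random_io_time_ms(8.0, 4.17, 4, 60), 2))
```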
As serial interfaces start to replace parallel interfaces, the difference in performance between SATA and SAS will likely become insignificant. This is especially true as SATA drives implement NCQ. Not only does command queueing in SATA level the playing field between the ATA and SCSI interfaces, but command queueing in itself can also reduce the performance advantage of the higher rpm of SCSI drives, as we will discuss in Chapter 21.
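The claim that command queueing can offset a higher rpm can be illustrated with a simple simulation. This is an assumption-laden sketch, not the analysis of Chapter 21: with a queue of n random requests, the drive can serve the rotationally nearest one first, so the expected rotational latency drops from half a revolution toward one revolution divided by n + 1.

```python
# Toy Monte-Carlo illustration: the minimum of n uniform rotational
# delays shrinks as queue depth n grows, so a queued 7200-rpm drive
# approaches the unqueued latency of a 15K-rpm drive.
import random

def avg_rotational_latency_ms(t_rev_ms: float, queue_depth: int,
                              trials: int = 20000) -> float:
    rng = random.Random(42)  # fixed seed for reproducibility
    total = 0.0
    for _ in range(trials):
        # latency to each queued target is uniform in [0, one revolution)
        total += min(rng.uniform(0, t_rev_ms) for _ in range(queue_depth))
    return total / trials

t7200 = 60000 / 7200    # ~8.33 ms per revolution at 7200 rpm
t15000 = 60000 / 15000  # 4.0 ms per revolution at 15,000 rpm

print(round(avg_rotational_latency_ms(t7200, 1), 2))   # ~4.17 ms, no queueing
print(round(avg_rotational_latency_ms(t15000, 1), 2))  # ~2.0 ms, no queueing
print(round(avg_rotational_latency_ms(t7200, 8), 2))   # ~0.93 ms with depth 8
```

The design point is only the statistical one stated in the lead-in; real drives must also account for seek distance when reordering, which this sketch ignores.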