Optical Fiber Technology 17 (2011) 512–517
Optical Fiber Technology www.elsevier.com/locate/yofte
Invited Papers
Standardization efforts for 100 Gigabit Ethernet and beyond
John D’Ambrosia ⇑,1, Kapil Shrikhande 2
CTO Office, Force10 Networks, San Jose, CA, United States
Article info
Article history: Available online 5 October 2011
Keywords: 100 Gigabit Ethernet (100 GbE); Terabit Ethernet; 100GBASE-LR4; Ethernet; 400 Gb/s Ethernet; Standards

Abstract
The focus of this paper is 100 GbE and beyond. An initial overview of the current 100 GbE specification is provided. New electrical and optical implementations of the specification are considered, as well as other 100 GbE physical layer specifications. Finally, the paper looks to the future and explores the issues associated with beginning the next speed of Ethernet. © 2011 Elsevier Inc. All rights reserved.
⇑ Corresponding author. E-mail addresses: [email protected] (J. D’Ambrosia), [email protected] (K. Shrikhande).
1 John D’Ambrosia is the chair of the IEEE P802.3bj 100 Gb/s Backplane and Copper Cable Task Force and the IEEE 802.3 Ethernet Bandwidth Assessment ad hoc. For more information on the IEEE P802.3bj 100 Gb/s Backplane and Copper Cable Task Force, see http://www.ieee802.org/3/bj/index.html. For more information on the IEEE 802.3 Ethernet Bandwidth Assessment ad hoc, see http://www.ieee802.org/3/ad_hoc/bwa/index.html.
2 Kapil Shrikhande is vice-chair of the IEEE Next Generation 100Gb/s Optical Ethernet Study Group. For more information on the Study Group, see http://www.ieee802.org/3/100GNGOPTX/index.html.
1068-5200/$ - see front matter © 2011 Elsevier Inc. All rights reserved. doi:10.1016/j.yofte.2011.06.010

1. Introduction

The ratification of IEEE Std 802.3ba™-2010 raised Ethernet’s top rate beyond 10 Gb/s to 40 Gb/s and 100 Gb/s. This standard, which draws heavily on existing electrical and optical technology, introduced a scalable and flexible architecture that will allow future implementations to take advantage of new electrical and optical technology developments. These implementations will be built on future standardization efforts that will enable lower cost, higher density 100 Gigabit Ethernet-based systems.

This article provides a basic overview of the IEEE Std 802.3ba™-2010 architecture. With this baseline established, it explores how future technologies could play a role in the development of future implementations of 100 Gigabit Ethernet and related physical layer standards. The final part of the article focuses on standardization activities related to the future of Ethernet beyond 100 Gb/s.

2. Architecture

Fig. 1 illustrates the new overall architecture that supports both 40 Gigabit Ethernet and 100 Gigabit Ethernet. While all of the
Physical Layer Specifications (PHYs) have a Physical Coding (PCS) sub-layer, Physical Medium Attachment (PMA) sub-layer, and a Physical Medium Dependent (PMD) sub-layer, only the copper cable (-CR) and backplane (-KR) PHYs have an Auto-Negotiation (AN) sub-layer and an optional Forward Error Correction (FEC) sub-layer.

For 40 Gigabit Ethernet the respective PCS and PMA sub-layers support PMDs that operate electrically across four differential pairs in each direction or optically across four optical fibers or four wavelengths in each direction. The architecture can also support other 40 Gigabit PMDs that could operate either across two lanes or a single serial lane, such as the current IEEE P802.3bg project, which is specifying a serial 40 Gb/s optical PMD. Likewise, for 100 Gigabit Ethernet the respective PCS and PMA sub-layers support PMDs that operate electrically across 10 differential pairs in each direction or optically across 10 optical fibers or four optical wavelengths in each direction. This same architecture could also support other 100 Gigabit PMDs that might potentially operate across five lanes, two lanes or a single serial lane.

The PCS sub-layer couples the respective Media Independent Interface (MII) to the PMA sub-layer. For 40 Gigabit Ethernet, the MII is called “XLGMII,” and for 100 Gigabit Ethernet, the MII is called “CGMII.” The aggregate stream coming from the MII into the PCS sub-layer undergoes the 64B/66B coding scheme that was used in 10 Gigabit Ethernet. Using a round robin distribution scheme, 66-bit blocks are then distributed across multiple lanes, referred to as “PCS Lanes,” each with a unique lane marker, which is periodically inserted. This is illustrated in Fig. 2.

The PMA sub-layer, which is the intermediary sub-layer between the PCS and the PMD, provides the multiplexing function that is responsible for converting the number of PCS lanes to the appropriate number of lanes or channels needed by the PMD.
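The round robin distribution and marker-based recovery described above can be illustrated with a small model. This is a minimal sketch, not the encoding defined in the standard: blocks are represented as plain Python values rather than 66-bit words, and the function names (`distribute`, `reassemble`), the tuple tags `"M"`, `"D"` and `"X"`, and the marker period of 5 are all assumptions chosen for illustration.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel for lanes that ran out of blocks


def distribute(blocks, num_lanes, marker_period):
    """Deal blocks round robin onto PCS lanes.

    A lane marker ("M", lane_id) is inserted on each lane before every
    `marker_period` data blocks (hypothetical period, for illustration).
    """
    lanes = [[] for _ in range(num_lanes)]
    for index, block in enumerate(blocks):
        lane_id = index % num_lanes
        blocks_already_on_lane = index // num_lanes
        if blocks_already_on_lane % marker_period == 0:
            lanes[lane_id].append(("M", lane_id))
        lanes[lane_id].append(("D", block))
    return lanes


def reassemble(received_lanes):
    """Rebuild the aggregate stream from lanes that arrive in any order
    and with any amount of leading skew."""
    by_id = {}
    for lane in received_lanes:
        # De-skew: lock onto the first marker and ignore what precedes it.
        start = next(i for i, (kind, _) in enumerate(lane) if kind == "M")
        lane_id = lane[start][1]
        # Strip markers, keeping only the data blocks.
        by_id[lane_id] = [value for kind, value in lane[start:] if kind == "D"]
    # Re-order lanes by the identity carried in their markers.
    ordered = [by_id[lane_id] for lane_id in sorted(by_id)]
    stream = []
    for row in zip_longest(*ordered, fillvalue=_MISSING):
        stream.extend(block for block in row if block is not _MISSING)
    return stream


blocks = list(range(40))
lanes = distribute(blocks, num_lanes=4, marker_period=5)
# Simulate a link that swaps lane order and delays each lane differently.
skewed = [[("X", None)] * delay + lane
          for delay, lane in enumerate(reversed(lanes))]
recovered = reassemble(skewed)
```

Because each lane carries its own identity in its marker, the receiver does not need to know which physical path a lane took, which is the property the architecture relies on.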
There are four PCS lanes for 40 Gigabit Ethernet and twenty PCS lanes for 100 Gigabit Ethernet. The input of the PMA sub-layer essentially multiplexes/de-multiplexes the input lanes back to the number of PCS lanes for the given rate, while the output stage then converts the PCS lanes to the appropriate number of lanes needed. In this multiplexing scheme, regardless of how the PCS lanes get multiplexed together, all bits from the same PCS lane will follow the same physical path. Therefore, the PMA sub-layer will de-multiplex the lanes back to the original PCS lanes, at which point the PCS sub-layer can perform a de-skewing operation to re-align the PCS lanes, assisted by the unique lane markers that were periodically inserted into each PCS lane. The PCS lanes can then be put back into their original order, at which point the original aggregate stream can be reconstructed. Therefore, the functionality embedded in the PCS and PMA represents a two-stage process that couples the respective MII to the different 40 Gigabit Ethernet and 100 Gigabit Ethernet PMDs.

It is possible to have up to four instances of a PMA sub-layer in a given configuration, which is critical to the architecture’s flexibility in developing various implementations, as a PMA sub-layer will exist on both sides of the respective Attachment Unit Interface (AUI), which is an optional retimed physical interface. For 40 Gigabit Ethernet, the AUI is called “XLAUI” (“XL” is the Roman numeral for 40). For 100 Gigabit Ethernet, the AUI is called “CAUI” (“C” is the Roman numeral for 100). These interfaces are used for partitioning the system design, and have been specified to support approximately 25 cm over a printed circuit board including one connector for chip-to-chip and chip-to-module applications. Each lane operates at an effective data rate of 10 Gb/s. For a XLAUI interface, there are four transmit pairs and four receive pairs. For a CAUI interface, there are 10 transmit differential pairs and 10 receive differential pairs.

In addition to the optional retimed AUI interface, the 40 Gigabit Ethernet and 100 Gigabit Ethernet architecture provides for a non-retimed parallel physical interface (PPI) for either rate between the PMA and PMD sub-layers (or, in implementation terms, at the input to a module). For 40 Gigabit Ethernet the interface is known as XLPPI, while the 100 Gigabit Ethernet interface is known as CPPI. These interfaces are non-retimed, utilize 64B/66B encoding, and have been specified to support approximately 100 mm or 4.4 dB of insertion loss on a host board (not including the connector) at 5.15625 GHz. The XLPPI is based on four transmit and four receive differential pairs of 10 Gb/s, while the CPPI is based on 10 transmit and 10 receive differential pairs of 10 Gb/s.

Fig. 1. IEEE Std 802.3ba-2010 architecture.

3. The Physical Layer Specifications (PHYs)
40 Gigabit Ethernet was initially conceived to target computing applications, and the PHYs developed for that application space support distances up to 150 m for a full range of server form factors including blade, rack, and pedestal configurations. 100 Gigabit Ethernet was initially conceived to target network aggregation applications, and the PHYs developed for that application space support distances and media types appropriate for data center networking, as well as service provider intra-office and inter-office connections. It should be noted that 40 Gigabit Ethernet PHYs targeting longer reaches for network applications were also ultimately developed. Table 1 provides a summary of the different physical layer specifications that were ultimately targeted by the task force, with their respective port type names.

As this article is focused on 100 Gigabit Ethernet and beyond, a description of each of the PHYs in the 100 GbE family is provided below.

The 100GBASE-CR10 PHY supports transmission of 100 GbE over 7 m of twin-axial copper cable across 10 differential pairs in each direction. The PHY leverages past work from the Backplane Ethernet project by utilizing the 10GBASE-KR architecture, channel budget, and Physical Medium Dependent Sublayer (PMD).

The 100GBASE-SR10 PHY is based on 850 nm multi-mode fiber (MMF) optical technology and supports transmission of 100 GbE across 10 parallel fibers in each direction. The effective data rate per lane is 10 Gb/s. Optical Multimode 3 (OM3) grade fiber, which has an effective modal bandwidth of 2000 MHz·km, can support reaches up to at least 100 m, while Optical
Multimode 4 (OM4) grade fiber, which has an effective modal bandwidth of 4700 MHz·km, can support reaches up to at least 150 m.

The 100GBASE-LR4 PHY is based on Dense Wavelength Division Multiplexing (DWDM) technology and supports transmission of at least 10 km over a pair of single-mode fibers (SMF). The four center wavelengths are 1295 nm, 1300 nm, 1305 nm, and 1310 nm. The center frequencies are spaced at 800 GHz, and are members of the frequency grid for 100 GHz spacing and above defined in ITU-T G.694.1. The effective data rate per lambda is 25 Gb/s. Therefore, the 100GBASE-LR4 PMD supports transmission of 100 GbE over four wavelengths on a single SMF in each direction.

The 100GBASE-ER4 PHY is also based on DWDM technology and supports transmission of at least 40 km over a pair of single-mode fibers. The four center wavelengths are 1295 nm, 1300 nm, 1305 nm, and 1310 nm. The center frequencies are spaced at 800 GHz, and are members of the frequency grid for 100 GHz spacing and above defined in ITU-T G.694.1. The effective data rate per lambda is 25 Gb/s. Therefore, the 100GBASE-ER4 PMD supports transmission of 100 GbE over four wavelengths on a single SMF in each direction. To achieve the 40 km reaches called for, it is anticipated that implementations may need to include semiconductor optical amplifier (SOA) technology.

While the architecture developed to support 100 GbE is flexible and scalable, first generation implementations of the physical layer specifications are limited by the technology that was available at the time the standard was written. As early adopters look to the initial deployment of systems supporting 100 GbE to provide relief for congested networks, others are looking to the development of the next generation of electrical and optical signaling technologies that will enable reductions in 100 GbE port cost and power, while simultaneously maximizing the usable port densities per system. These same technologies will also provide the building blocks for the next rate of Ethernet.

Fig. 2. PCS lane distribution concept [1]. (Diagram: an aggregate stream of 64B/66B words #1, #2, ..., #n, #n+1, ... is distributed by simple 66-bit word round robin onto PCS lanes 1 to n; lane k carries words #k, #n+k, #2n+k, ... together with its periodically inserted lane marker Mk.)

Table 1
Summary of 40 Gigabit Ethernet and 100 Gigabit Ethernet physical layer specifications.

Port type                    Reach                                            40 Gigabit Ethernet  100 Gigabit Ethernet  Description
40GBASE-KR4                  At least 1 m backplane                           Yes                  -                     4 × 10 Gb/s
40GBASE-CR4, 100GBASE-CR10   At least 7 m cu cable                            Yes                  Yes                   “n” × 10 Gb/s
40GBASE-SR4, 100GBASE-SR10   At least 100 m OM3 MMF, at least 150 m OM4 MMF   Yes                  Yes                   “n” × 10 Gb/s (use of parallel fiber)
40GBASE-FR (a)               At least 2 km SMF                                Yes                  -                     1 × 40 Gb/s
40GBASE-LR4                  At least 10 km SMF                               Yes                  -                     4 × 10 Gb/s
100GBASE-LR4                 At least 10 km SMF                               -                    Yes                   4 × 25 Gb/s
100GBASE-ER4                 At least 40 km SMF                               -                    Yes                   4 × 25 Gb/s

(a) This PHY was developed as part of the IEEE P802.3bg project, which was ratified on March 31, 2011.
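The wavelength plan quoted for 100GBASE-LR4 and 100GBASE-ER4 can be cross-checked numerically. The sketch below assumes that the highest-frequency lane sits at 231.4 THz; that value is not stated in this article and is used here only as a plausible starting point for illustration. The 800 GHz spacing and the 100 GHz grid come from the text, and the 193.1 THz grid anchor is the reference frequency of ITU-T G.694.1. All function and constant names are hypothetical.

```python
C_M_PER_S = 299_792_458        # speed of light in vacuum
GRID_ANCHOR_GHZ = 193_100      # ITU-T G.694.1 reference frequency (193.1 THz)
GRID_STEP_GHZ = 100            # 100 GHz grid named in the text
LANE_SPACING_GHZ = 800         # lane spacing named in the text
TOP_LANE_GHZ = 231_400         # ASSUMPTION: highest-frequency lane
QUOTED_NM = [1295, 1300, 1305, 1310]  # nominal wavelengths from the text


def lane_frequencies_ghz(top_ghz=TOP_LANE_GHZ, spacing_ghz=LANE_SPACING_GHZ,
                         lanes=4):
    """Center frequencies of the lanes, highest frequency first."""
    return [top_ghz - spacing_ghz * i for i in range(lanes)]


def on_grid(frequency_ghz):
    """True if the frequency is a member of the 100 GHz grid."""
    return (frequency_ghz - GRID_ANCHOR_GHZ) % GRID_STEP_GHZ == 0


def wavelength_nm(frequency_ghz):
    """Vacuum wavelength: (m/s) / GHz comes out directly in nm."""
    return C_M_PER_S / frequency_ghz


frequencies = lane_frequencies_ghz()
wavelengths = [wavelength_nm(f) for f in frequencies]
aggregate_gbps = 4 * 25  # four lambdas at an effective 25 Gb/s each

for f, w, quoted in zip(frequencies, wavelengths, QUOTED_NM):
    print(f"{f / 1000:.1f} THz -> {w:.2f} nm (quoted as {quoted} nm)")
```

Under this assumption the computed wavelengths fall within 1 nm of the rounded values quoted above, which suggests that the figures in the text are nominal values rounded to the nearest 5 nm.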
4. The first generation of 100 GbE – using what is available

First generation implementations of specifications tend to be non-optimal. The first generation of CFP modules that implemented 100GBASE-LR4 illustrates this issue. This form factor, developed by the CFP MSA, leverages the 10 × 10 Gb/s CAUI specification. The 20 differential pairs of this electrical interface drive the width of the form factor and result in a mismatch between the electrical input and the four-lambda optical output of the module. While the 100 GbE architecture can accommodate this mismatch via the use of a PMA sub-layer, the use of a 10:4 serializer on the transmit side of the module and a 4:10 de-serializer on the receive side of the module yields higher power consumption, due to their non-integer ratio [2]. This is illustrated in Fig. 3. The additional power consumption also increases the module form factor, which must be sized to dissipate the resulting heat. As a result, the CFP module is specified at 144.75 mm deep by 82 mm wide [3]. The CFP is more than twice the width of the XENPAK module, which was one of the earlier form factors for 10 Gigabit Ethernet (10 GbE).

The IEEE P802.3ba task force leveraged 10 Gb/s technologies in the development of the IEEE Std 802.3ba-2010 standard. While the 100GBASE-LR4 example cited above demonstrates how the use of currently available technology could lead to a sub-optimal implementation, it should be pointed out that an implementation was at least possible.

5. The next generation of 100 GbE – developing what is needed

Fig. 4 illustrates how the development of a retimed 25 Gb/s electrical interface could be applied to a next generation module implementation of 100GBASE-LR4 [2]. Reducing the width and number of pins of the electrical interface, and reducing the power by eliminating the serializer/deserializer in the module, could help enable a smaller module of approximately half the width of the current implementation. Furthermore, using a non-retimed CPPI interface to the module might further reduce power and the size of the module, due to the elimination of the additional CDR components. Either of these concepts for a smaller module would help to enable lower power and higher port counts, enabling integration to help drive per port costs downward.

Communications between the IEEE 802.3 and the Optical Internetworking Forum (OIF) in 2010 indicated an industry interest in the development of 25 Gb/s electrical signaling.
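To make the comparison between the 10 × 10 Gb/s and 4 × 25 Gb/s module interfaces concrete, the following sketch works through the lane arithmetic. It assumes that the only overhead on each lane is the 64B/66B coding described in Section 2, so that a lane’s line rate is the effective rate multiplied by 66/64; the function names are hypothetical and the figures are a back-of-the-envelope check rather than values taken from a specification.

```python
from fractions import Fraction

CODING_OVERHEAD = Fraction(66, 64)  # 64B/66B coding


def electrical_interface(mac_rate_gbps, lanes):
    """Per-lane figures for an interface carrying `mac_rate_gbps`
    over `lanes` lanes in each direction."""
    line_rate = Fraction(mac_rate_gbps) * CODING_OVERHEAD / lanes
    return {
        "lanes": lanes,
        "differential_pairs": 2 * lanes,       # transmit plus receive
        "line_rate_gbps": float(line_rate),
        "nyquist_ghz": float(line_rate / 2),   # half the line rate
    }


def gearbox_ratio(electrical_lanes, optical_lanes):
    """Serializer ratio between the electrical and optical sides."""
    return Fraction(electrical_lanes, optical_lanes)


gen1 = electrical_interface(100, 10)   # 10 x 10 Gb/s, as in CAUI
next_gen = electrical_interface(100, 4)  # 4 x 25 Gb/s

for name, iface in (("10 x 10G", gen1), ("4 x 25G", next_gen)):
    print(name, iface)

# 10:4 reduces to 5/2, a non-integer ratio; 4:4 reduces to 1.
print(gearbox_ratio(10, 4), gearbox_ratio(4, 4))
```

The Nyquist frequency computed for the 10-lane case matches the 5.15625 GHz at which the PPI insertion loss is specified in Section 2, and the pair count drops from 20 to 8 when moving to four lanes.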
The OIF, which has been working on two specifications related to 25 Gb/s signaling, CEI-25G-LR (intended for backplane applications) and CEI-28G-SR (intended for chip-to-chip applications), requested feedback from the IEEE 802.3 on its CEI-28G-VSR project, which is intended as a chip-to-module interface. In its response the IEEE 802.3 noted the need for both retimed and non-retimed interfaces, but noted that non-retimed or partially retimed interfaces need to consider the total end-to-end link budget, and therefore the OIF should initially focus on the development of a retimed interface [4]. Those interested in the OIF CEI-25G/28G projects are directed to the OIF for further information. Please see www.oiforum.com.

6. Growing the 100 Gigabit Ethernet family

As noted above, the introduction of 25 Gb/s electrical signaling for chip-to-module applications would enable the next generation
of 100GBASE-LR4 modules. As noted above, the OIF has its CEI-28G-VSR effort, which could be the enabling technology for this next generation of modules. What about the 100 Gigabit Ethernet family of physical layer specifications? What efforts are underway? Table 2 provides a summary of potential efforts. Some of these efforts are actually underway at the time of writing, while others are speculative on the authors’ part. There are four main efforts, which will be described in the following paragraphs.

Fig. 3. Generation 1 100GBASE-LR4 implementation concept. (Block diagram: 10 × 10G electrical lanes TXLANE0 to TXLANE9 feed a 10:4 serializer driving four 25G MD/EML transmitters into a 4:1 WDM MUX onto SMF; on the receive side, SMF feeds a 1:4 WDM DeMUX, four PIN/TIA receivers at 25G and a 4:10 de-serializer driving RXLANE0 to RXLANE9; also shown are the TEC, micro-controller, REFCLK, TX_DIS, RX_RCLK and RX_LOS signals.)

Fig. 4. Next generation 100GBASE-LR4 implementation concept. (Block diagram: 4 × 25G electrical lanes TXLANE0 to TXLANE3, each with a CDR, drive four 25G LD/DML transmitters into a 4:1 WDM MUX onto SMF; on the receive side, SMF feeds a 1:4 WDM DeMUX, four PIN/TIA receivers at 25G and a CDR per lane driving RXLANE0 to RXLANE3; also shown are the TEC, micro-controller, REFCLK, TX_DIS, RX_RCLK and RX_LOS signals.)

The first effort entails 100 Gigabit Ethernet across an electrical backplane and a narrower twin-axial copper cable assembly than
the 100GBASE-CR10 PHY specified in IEEE Std 802.3ba-2010. The first stage in developing a standard in the IEEE is the “Study Group” phase. The normal output at this stage is the project documentation, a key aspect of which is the identification of project objectives. The first study group meeting was in January of this year, and, as this article is being written, the group is wrestling with the project objectives. The key question being asked is what the reaches across the backplane and copper cabling will be. The study group is looking at a four lane PHY approach, which implies approximately 25 Gb/s signalling (the actual rate will be dependent upon the actual solution that gets
selected). As in other projects, the debate will focus on improving the channel versus using some advanced modulation scheme. This will most likely lead to a signalling debate between NRZ and PAM-n signalling.

Table 2
Growing the 100 Gigabit Ethernet family.

Status                                    Area                  Description
First study group meeting, January 2011   Backplane             4 × 25 Gb/s
                                          Twin-axial            10 × 10 → 4 × 25 Gb/s
Anticipated CFI, July 2011                Chip-to-chip/module   10 × 10 → 4 × 25 Gb/s
                                          Multi-mode fiber      Reduced width (10 × 10 → 4 × 25)
                                          Single-mode fiber     Lower cost – reduced reach? The 4 × 25 Gb/s interface?
TBD?                                      Twisted pair          Something less than 100 m?
TBD?                                      Energy efficiency     Electrical and optical aspects?

It is anticipated that there will be a “Call-for-Interest” for the next effort in July 2011, and that it will most likely contain three items: (a) the development of 25 Gb/s electrical chip-to-chip and chip-to-module interfaces, (b) reducing the width of the 100GBASE-SR10 interface to four parallel fibers (i.e. increasing the lane rate from 10 Gb/s to 25 Gb/s), and (c) an effort to reduce the cost of 100 Gb/s optical implementations, which might lead to the development of a new optical specification. Fig. 5 shows potential architectural options for supporting 100 GbE optically, utilizing 4 × 25 Gb/s non-retimed, fully-retimed, or partially retimed electrical interfaces. For the multi-mode effort this will mean the development of 25 Gb/s VCSELs. For the single-mode effort the need for a new optical specification other than 100GBASE-LR4 will be evaluated, based on the interaction with the electrical interfaces.

It is unclear to the authors at this time how the industry will handle its next rate over twisted pair. Given what has happened with 10GBASE-T, it is likely that the reach will be reduced to better address the data center application, and a reach somewhere from 30 to 40 m is likely. It is also unclear what the next rate of Ethernet over twisted pair will be: 40 Gigabit Ethernet or 100 Gigabit Ethernet. Given the timing of standards and product development, 100 Gigabit Ethernet might make more sense, but the technical challenges of such a jump cannot be overstated.

Finally, there is Energy Efficient Ethernet or EEE. The first generation of EEE is specified in IEEE Std 802.3az™-2010, which specifies EEE operation for certain copper-based PHYs for rates of 10 Gb/s and below. It targets end-stations, and essentially enables power
savings by going into a low-power idle during periods of low utilization. The key to EEE actually involves the PCS, so EEE for the 40 Gigabit and 100 Gigabit Ethernet architecture has to be developed.

Fig. 6. Bandwidth projections from IEEE P802.3ba project. (Plot of rate in Mb/s, from 100 to 1,000,000 in decade steps, against date, from 1995 to 2020; trend lines are labelled core networking, doubling ≈ 18 months, and server I/O, doubling ≈ 24 months, with Gigabit, 10 Gigabit, 40 Gigabit and 100 Gigabit Ethernet marked.)
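The doubling periods shown in the bandwidth projection of Fig. 6 (approximately 18 months for core networking and 24 months for server I/O) can be turned into a quick estimate. The sketch below assumes a starting point of 100 Gb/s in 2010 for core networking, chosen here because that is the year the standard was ratified; the baseline and the function names are illustrative assumptions, not values read off the original chart.

```python
import math


def projected_rate_gbps(base_rate_gbps, base_year, year, doubling_months):
    """Rate after exponential growth with the given doubling period."""
    doublings = (year - base_year) * 12 / doubling_months
    return base_rate_gbps * 2 ** doublings


def years_to_reach(base_rate_gbps, target_rate_gbps, doubling_months):
    """Years needed to grow from the base rate to the target rate."""
    doublings = math.log2(target_rate_gbps / base_rate_gbps)
    return doublings * doubling_months / 12


# ASSUMPTION: 100 Gb/s in 2010 as the baseline for both trends.
core_2015 = projected_rate_gbps(100, 2010, 2015, doubling_months=18)
server_2015 = projected_rate_gbps(100, 2010, 2015, doubling_months=24)
years_to_terabit = years_to_reach(100, 1000, doubling_months=18)

print(f"Core networking, 2015: {core_2015:.0f} Gb/s")
print(f"Same baseline at the server I/O pace, 2015: {server_2015:.0f} Gb/s")
print(f"Years from 100 Gb/s to 1 Tb/s at 18-month doubling: {years_to_terabit:.2f}")
```

Under these assumptions, an 18-month doubling period grows a 100 Gb/s requirement roughly tenfold in five years, while a 24-month doubling period grows it by a factor of about 5.7 over the same interval.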
7. Beyond 100 Gigabit Ethernet

Fig. 6 shows the key chart from the IEEE P802.3ba project that helped justify the need for 40 Gigabit Ethernet and 100 Gigabit Ethernet. Networking applications, whose bandwidth requirements are doubling approximately every 18 months, have greater bandwidth demands than computing applications, where the bandwidth capabilities of servers are doubling approximately every 24 months. If one projects the trend for core networking out to 2015, there is a need for Terabit Ethernet.

There are two issues with the implication of this forecast. First, the technical challenges of such a jump cannot be overstated. As pointed out, development work on electrical interfaces at this time is focused on 25 Gb/s signalling, while optical signalling specified by IEEE 802.3 is based on 4 × 25 Gb/s or serial 40 Gb/s. While it is true that there are serial lambda solutions using advanced modulation techniques for transport, one must remember the associated cost issues, which will always be dictated
by the application space that is being serviced. There has been significant debate in the industry already regarding the next rate of Ethernet, with many thinking 400 Gigabit Ethernet makes more sense from a technical feasibility perspective.

The other issue that must be considered is the very nature of this paper itself. There is significant work already underway in the industry to enable higher density, lower cost, and lower power 100 Gigabit Ethernet.

Despite the technical challenges, it is clear that understanding historical bandwidth trends will be of vital importance to the Ethernet industry. To that end the IEEE 802.3 Ethernet Wireline Bandwidth Assessment Ad hoc was formed. As the name implies, this ad hoc will be performing an assessment of the Ethernet wireline bandwidth needs of the industry, which will be useful reference material for a future standards activity tackling the next rate of Ethernet beyond 100 Gigabit Ethernet.

Fig. 5. Options for 100 GbE, based on 4 × 25 Gb/s. (Layer-stack diagrams for 100GBASE-nR4 options, showing MAC, Reconciliation, CGMII, PCS, PMA (20:10, 10:4, 20:4 and 4:4 variants), CAUI, CAUI-4, CPPI-4, PMD, MDI and medium.)

References
[1] D’Ambrosia, Law, Nowell, 40 Gigabit Ethernet and 100 Gigabit Ethernet Technology Overview, Ethernet Alliance White Paper, November 2008.
[2] Cole, Allouche, Flens, Huebner, Nguyen, 100GBE Optical LAN Technologies, IEEE Applications & Practice, December 2007.
[3] CFP MSA Draft 1.0, March 23, 2009.
[4] IEEE 802.3 Ethernet Working Group Liaison to OIF, July 15, 2010.