Journal Pre-proof
Embedded memory options for ultra-low power IoT devices
Khader Mohammad, Temesghen Tekeste, Baker Mohammad, Hani Saleh, Mahran Qurran
PII: S0026-2692(19)30659-7
DOI: https://doi.org/10.1016/j.mejo.2019.104634
Reference: MEJ 104634
To appear in: Microelectronics Journal
Received Date: 13 August 2019
Revised Date: 10 September 2019
Accepted Date: 30 September 2019
Please cite this article as: K. Mohammad, T. Tekeste, B. Mohammad, H. Saleh, M. Qurran, Embedded memory options for ultra-low power IoT devices, Microelectronics Journal (2019), doi: https://doi.org/10.1016/j.mejo.2019.104634.
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
© 2019 Published by Elsevier Ltd.
Embedded Memory Options for Ultra-low Power IoT Devices
Khader Mohammad1, Temesghen Tekeste2, Baker Mohammad2, Hani Saleh2, Mahran Qurran1
1 Birzeit University, ECE, Ramallah, Palestine
2 Khalifa University, System on Chip Center, EECE, Abu Dhabi, UAE

Abstract: The Internet of Things (IoT) connects everyday devices to the internet to gather information using sensors and embedded systems. Emerging low-power IoT devices, wearables and medical electronics share the goal of reducing overall system power and extending battery life, and they require efficient memory solutions. Minimizing energy consumption requires the correct architectural choice. This paper presents the selection of memory options for ultra-low power IoT devices. Three types of memory, namely flip-flop based, latch-based and SRAM-based memory, have been evaluated in a 65nm low-power foundry technology. The results of the study show that latch-based memory has better resource utilization in terms of power-area product. The latch-based RAM saved 60% in power-area product relative to an SRAM-based memory and more than 90% relative to a flip-flop based RAM. The study targets wearable electronics for ECG monitoring, which require 2KB of RAM.

Index Terms: Memory, latch, flip-flop, SRAM, ultra-low power, Internet of Things (IoT)
I. INTRODUCTION
The rapid development of the Internet of Things (IoT) has created a number of exciting challenges for designers, chief among them the choice of memory for the device. The memory type may depend on whether the IoT device is a brand-new design intended to be connected from the start, or an upgrade that adds capability to an existing device. IoT technology has enabled a wide range of devices to communicate and provide deterministic information in modern society. IoT applications encompass smart homes, healthcare, industrial sensing and environmental measurements. Ultra-low power methodologies are imperative for IoT devices since these devices are powered either by batteries or by energy harvesting. Energy consumption is a determining factor for battery lifetime, so reducing it prolongs battery life and lowers the associated replacement costs. Reducing the energy dissipation also mitigates power interruption [1].

An effective way to achieve ultra-low power dissipation in IoT devices is to reduce the supply voltage (VDD). When the supply voltage is reduced, the transistors operate in either the sub-threshold (sub-Vth) or the near-threshold region. Operating in the sub-threshold region has its own difficulties, such as integrating memories, especially SRAMs. There are also design challenges due to process, voltage and temperature variation. Although the number of memory options for IoT devices is low compared to the number of devices available on the market, it remains a challenge to decide what type of memory to include in a specific device.

Various memory types for IoT system-on-chips (SoCs) have been proposed in the literature [2] [3] [4] [5]. In [2], a non-volatile memory with a special interface for low to medium sized IoT devices is presented. In [3], an ultra-low voltage SRAM in 28nm that is tolerant to process, voltage and temperature variation is reported. A pulsed-latch based memory is presented in [4]. Others have constructed a bit-interleaved SRAM macro in 28nm [5]; this SRAM operates in the sub-threshold region at 0.25V and offers low leakage. There is always a need for ultra-low voltage and ultra-low power memory in IoT devices. One of the main issues in memory architecture is the clock switching energy. The pulsed-latch based memory reported in [4] addresses it by using only one clocked transistor, and its power consumption is around 10 times lower than that of a conventional latch that uses six clocked transistors. The choice of memory element for energy-efficient digital circuits is demonstrated in [6], where a latch-based memory element is compared with a flip-flop. It was reported that the pulsed-latch based design is 14% smaller in area and consumes 50% less average energy. The only penalty of using the pulsed latch is the large hold time required when writing data.

In ultra-low power digital circuits, there are three commonly applicable memory architectures [7]: 1) SRAM at nominal VDD with level shifters for up-scaling to communicate with the rest of the circuit, 2) sub-threshold SRAM custom-designed for ultra-low voltage operation, and 3) standard-cell based memories (basically flip-flops or latches). Each of these approaches has its own advantages and disadvantages. The first option has relatively high leakage power due to its high voltage requirement, though it offers low area. The second option offers lower power consumption, but at the cost of higher area because its architecture requires a larger number of transistors. The third option offers high flexibility since it can be designed as a fully custom macro to suit the target architecture, but it requires a relatively higher area. Standard-cell based memory runs at the same speed and supply voltage as the core; hence, there is no need for level shifters or custom cell design.

When comparing memory options, one of the main tasks is to determine the device priorities, which depend on the device and the target application. The first consideration is cost and size: the silicon area required for memory must be kept to a minimum, because the more silicon is required, the higher the cost. The second is the power consumption. Most IoT devices either run on small batteries or rely on energy harvesting for recharging. For this reason, it is important to
consider the power consumption of the memory selection and choose an option that uses the least power and the lowest voltage, both in use and during standby. The third consideration is the startup time: users expect responsive devices, so the memory must allow a quick startup. Keeping these points in mind allows the designer to make the right memory selection for an IoT device. The question is then which options are available. The selection of memory is also affected by the size and type of memory, such as RAM [7]. In [7], the authors demonstrated the tradeoff point between standard-cell based memories and sub-Vth SRAMs, which lies between 4 and 6 KB.

In this paper, three types of RAM are compared for application in ultra-low power IoT devices. The RAMs are analyzed in terms of power consumption and utilized area. The analysis is demonstrated on an ECG processor that uses a latch-based memory. The RAM size at which each memory type becomes appropriate is also determined. The paper is organized as follows: Section II describes the ECG basic structure and memory architectures for ultra-low power IoT devices, Section III gives the discussion and Section IV concludes the paper.

Figure 1: Transparent LOW/HIGH latch schematics
Figure 2: (a) Edge-triggered D flip-flop (symbol and transistor view)
Figure 2: (b) Master-slave rising-edge flip-flop (symbol and transistor view)
II. MEMORY ARCHITECTURE
As highlighted in the previous section, various memory types and architectures have been proposed in the literature and applied in various IoT devices. A memory is a circuit that retains its value after its inputs change or, in the case of non-volatile memory, after the power goes down. The basic storage element can be a capacitive element, as in dynamic RAMs, or a bistable latching circuit, as in flip-flops, latches and SRAMs. Each of these memory types has pros and cons depending on supply voltage, size and application. In ultra-low power applications, the main challenges are leakage power, area and speed of operation. IoT sensor nodes usually operate at low frequency, so the main concerns are power and area. Here, we first describe the three types of memory and then analyze which one works best for IoT.

A. Latch based Memory
There are different design approaches for latches, depending on the requirements and the input combination. A schematic view of transparent low and high latches is shown in Fig. 1. The latch is the building block of a flip-flop, as the flip-flop is composed of two latches (Fig. 1). A latch requires only about half the area of a flip-flop, as it can be constructed using only 14 transistors. The latch is level sensitive and its hold time is the same as the clock pulse width. Usually, narrow pulses are used to write into a latch. Given that the trend in high-performance systems is toward higher clock frequencies, more transistors are needed. Latches and flip-flops are both storage devices, like SRAM. The main difference between a latch and a flip-flop is the triggering mechanism. Latches are transparent when enabled, whereas flip-flops respond to a transition of the clock signal, i.e. either the positive edge or the negative edge, as discussed in the next subsection.

B. Flip-flop based Memory
Flip-flops, as shown in Fig. 2a, form the basic memory elements of sequential circuits. They form the register file inside central processing units (CPUs). They are preferred for their reliability and ease of data access, as flip-flops are clock-edge triggered. A typical D flip-flop requires 26 transistors, as shown in Fig. 2b. A D flip-flop is a logic device that has two stable states. Its output takes the value of the input signal, but changes state only when a control signal or the clock signal is activated. In this paper, a positive-edge-triggered D flip-flop is used. A positive-edge-triggered flip-flop changes its state only when the clock transitions from logic low to logic high. The flip-flop remains in that state until the next rising edge arrives at the clock input. A master-slave D flip-flop consists of two D latches and is so called because the second (slave) latch changes its state only in response to a change in the master latch's state. A D flip-flop has two outputs: the desired state and its complement.
When the complemented output is connected back to the D input, the flip-flop toggles on every active clock edge and divides the clock frequency by two. In modern usage, the term flip-flop is reserved for clocked devices, while the term latch describes much simpler devices. Fig. 2a shows the basic D flip-flop circuit in master-slave configuration. It consists of transmission gates and inverters that latch the input. When the clock is low, transmission gates T1 and T4 are on: the D input is passed by T1, inverted, and stored at point B. Since T3 is off, the input is not passed to the second latch. Also, since T2 is off, the feedback loop of the first latch does not interfere with the input. Whatever the previous output at Q was, it is held by the second latch, and because T4 is on, the output at Q remains the same. When a positive edge arrives at the clock, T1 and T4 are turned off and T2 and T3 are turned on. Due to T2, the sampled value keeps circulating in the first latch, and hence the D input can now be removed. Because T3 is on, the inverted input at B is passed on to the second latch, where it is inverted again and passed on to Q. When the clock goes low again, T1 and T4 are turned on and the same cycle repeats. The master-slave configuration avoids the effect of noise that can lead to metastability.
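The master-slave behaviour described above can be illustrated with a small behavioural model. This is only a sketch under simplifying assumptions: it abstracts away the transmission gates T1 to T4, the inversions at node B and all timing, and the class and function names (DLatch, MasterSlaveDFF, simulate) are hypothetical and not taken from the paper.

```python
class DLatch:
    """Level-sensitive D latch, transparent at one clock level (idealized)."""

    def __init__(self, transparent_when_high):
        self.transparent_when_high = transparent_when_high
        self.q = 0  # stored bit

    def update(self, d, clk):
        # While transparent, the output follows D; otherwise it holds.
        if bool(clk) == self.transparent_when_high:
            self.q = d
        return self.q


class MasterSlaveDFF:
    """Positive-edge-triggered D flip-flop made of two D latches."""

    def __init__(self):
        # Master samples D while the clock is low.
        self.master = DLatch(transparent_when_high=False)
        # Slave forwards the master's value to Q while the clock is high.
        self.slave = DLatch(transparent_when_high=True)

    def update(self, d, clk):
        captured = self.master.update(d, clk)
        return self.slave.update(captured, clk)


def simulate(waveform):
    """Apply (clk, d) pairs in order and return Q after each step."""
    dff = MasterSlaveDFF()
    return [dff.update(d, clk) for clk, d in waveform]


# D changes while the clock is high, but Q only updates on rising edges.
print(simulate([(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]))
```

In this model, Q changes only on the step where the clock goes from low to high, whereas a single DLatch would follow D for the whole time it is transparent, which is the triggering difference the text describes.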
The main design targets for memory and sequential elements are a small clock load, the shortest direct path from Din to Dout, low-power feedback and high driving capability, in order to optimize the speed-power product and the area-power figure of merit AP = Area * Power.

C. SRAM based Memory
Every electronic gadget that has a digital processor on its circuit board, from the household microwave oven to Apple's iPhone and Amazon's commercial cloud servers, uses a fast and power-efficient on-chip memory called static random access memory (SRAM) [11][12]. The unprecedented growth in IoT devices worldwide has triggered an evolution in new technology, with a particular focus on random access memory (RAM), which is one of the key parts of any system. When deciding what system to choose or build, people usually pay most attention to RAM capacity, on the assumption that more RAM is better. Performance matters as well: a designer should also pay attention to RAM design and speed, whether the RAM is intended for general IoT use or for more speed-hungry devices such as gaming systems.
Figure 3: (a) Memory cells: 6T SRAM
Figure 3: (b) Memory cells: 6T SRAM write operation
Figure 3: (c) Memory cells: 6T SRAM read operation
An SRAM cell is constructed using a bistable latch circuit, as shown in Fig. 3a. This type of SRAM cell requires only six transistors; hence, we expect it to be smaller than the latch and the flip-flop. The 6-transistor SRAM only operates above threshold and hence requires a higher supply voltage. For sub-Vth applications, the SRAM requires upsized transistors or additional transistors, from 8 up to 14 per cell. In this paper, the analysis is done on a standard cell library and an SRAM macro in 65nm. The SRAM macros were characterized for nominal operation at 0.9V, 1V and 1.2V. An RTL test is constructed that contains only the memory and its access lines (address and data bus). The access lines are the same for all memory types under analysis. Synthesis based on the standard cell flow is used to perform the power and area measurements.

For the write operation, shown in Fig. 3b, the bit to be written is applied to the bit line BL and its complement to the conjugate bit line. The access transistors M5 and M6 are then turned on to store the bit: M5 writes to the right inverter and M6 to the left inverter. In terms of device sizes, M5 and M6 are stronger than M1-M4. For the read operation, shown in Fig. 3c, the bit lines are first precharged to Vdd/2. M5 and M6 are then turned on to transfer charge to the bit lines, and finally the voltage difference between BL and its conjugate is amplified and sensed as "0" or "1". If the voltage difference is small enough, the read is non-destructive.

SRAM design is important, as is the design of the data bus used to transfer data between the SRAM and the CPU.
The same applies to the data buses of a content addressable memory (CAM) system, which includes one or more CAM cells, each comprising a bit cell to store a bit and a complementary bit, and a compare circuit to compare a reference input to the stored bit and to the stored complementary bit [14] [13]. High-performance systems transfer a significant amount of data between the on-chip L2 cache and the L3 cache of the off-chip memory through the power-expensive off-chip memory bus [15], so choosing the right memory design type, such as SRAM, affects overall performance, cost and area.
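To put the per-cell transistor counts quoted in this section (14 for a latch, 26 for a D flip-flop, 6 for a 6T SRAM cell) into perspective, the following sketch estimates the number of storage transistors in a 2KB array. It is a rough illustration only: it assumes 2KB means 2048 bytes, counts storage cells alone, and ignores decoders, sense amplifiers, multiplexers and clocking. The function and variable names are hypothetical.

```python
# Transistors per stored bit, as quoted in Section II of the text.
TRANSISTORS_PER_BIT = {
    "latch": 14,
    "flip-flop": 26,
    "6T SRAM": 6,
}


def storage_transistors(capacity_bytes, transistors_per_bit):
    """Transistors in the storage cells only (no periphery)."""
    return capacity_bytes * 8 * transistors_per_bit


CAPACITY_BYTES = 2 * 1024  # assumption: 2KB = 2048 bytes

counts = {
    name: storage_transistors(CAPACITY_BYTES, per_bit)
    for name, per_bit in TRANSISTORS_PER_BIT.items()
}

for name, count in counts.items():
    print(f"{name:>10}: {count:,} transistors")
```

The ratio between the counts, rather than the absolute numbers, is the useful part: it mirrors the qualitative area ordering discussed in the text, although real layouts also depend on cell sizing and peripheral circuits.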
III. DISCUSSION AND ANALYSES
In this section, we present a comparative analysis of the memory types. SRAM cells for the sub-Vth region require a larger area, while the drawback of the standard-cell based memory is also its large area. Hence, there is a tradeoff between area and power when selecting a memory type. Latches are almost half the size of flip-flops.

A. Optimum RAM type versus size of RAM
To optimize the device sizes, we vary the Wp/Wn ratio of the cell and measure the average current Iavg drawn from VCC. We also measure the cell delay, (rise + fall) / 2, and then plot Iavg^2 * Delay as shown in Fig. 4. We choose the Wp/Wn ratio corresponding to the minimum (1.38). Since Wp + Wn, the total diffusion width, is fixed by the cell architecture and design rules, Wp and Wn can then be solved for.

B. Optimum Operating Frequency for different size of RAM
What is the impact of operating frequency on the selection of the type of RAM? Aggressive scaling of the SRAM transistor size is widely used in VLSI chips, because the SRAM cell generally employs the smallest transistor size in each technology generation in order to achieve a smaller chip size. Figure 5 shows the six-transistor (6T) SRAM cell size from 180nm technology down to 65nm. The SRAM cell size shrinks by about 50% with each technology generation. In 65nm technology, SRAM cell sizes of about 0.5 µm2 have already been reported. In the 65nm CMOS generation, a large local Vth variability degrades the 6T-SRAM cell stability. The Vth variability is divided into local and global components. The local Vth variability occurs due to fluctuations of the doped impurities, while the global Vth variability occurs mainly due to the manufacturing process, which leads to variations in the MOS transistor's physical dimensions such as gate length (L), gate width (W) and gate oxide thickness. Because the accuracy of the physical dimensions improves with advances in manufacturing equipment as the transistor size shrinks, the global Vth variability is mostly maintained [12].
Figure 4: Optimize size Wp/Wn Ratio
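The sizing procedure of Section III-A (pick the Wp/Wn ratio that minimizes Iavg^2 * Delay, then split a fixed total width) can be sketched as follows. The sweep values and the normalized total width below are made-up placeholders for illustration, not measurements from the paper; only the ratio 1.38 comes from the text. All function names are hypothetical.

```python
def cell_delay(rise, fall):
    """Cell delay taken as the average of rise and fall delays."""
    return (rise + fall) / 2.0


def pick_ratio(sweep):
    """Return the Wp/Wn ratio that minimizes Iavg^2 * Delay.

    sweep: iterable of (ratio, iavg, delay) tuples.
    """
    return min(sweep, key=lambda point: point[1] ** 2 * point[2])[0]


def split_widths(ratio, total_width):
    """Solve Wp/Wn = ratio and Wp + Wn = total_width for (Wp, Wn)."""
    wn = total_width / (1.0 + ratio)
    wp = total_width - wn
    return wp, wn


# Hypothetical sweep data (ratio, Iavg, delay) in arbitrary units.
example_sweep = [
    (1.00, 2.0, cell_delay(0.9, 1.1)),
    (1.38, 1.0, cell_delay(1.4, 1.6)),
    (2.00, 1.5, cell_delay(1.1, 1.3)),
]

best_ratio = pick_ratio(example_sweep)
# Total width normalized to 1.0; the real value is set by the cell
# architecture and design rules and is not given in the text.
wp, wn = split_widths(best_ratio, total_width=1.0)
print(f"ratio={best_ratio}, Wp={wp:.4f}, Wn={wn:.4f}")
```

With the total width known, the closed form is Wn = W / (1 + r) and Wp = r * W / (1 + r), where r is the chosen ratio and W = Wp + Wn.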
Figure 5: Technology versus SRAM cell size

C. Best operating voltage for each type of RAM
What would the energy-efficient operating point be for each type of RAM? Figure 6 shows the relationship between voltage scaling, number of failures and power saving. Our proposed approach optimizes power consumption against performance by tolerating errors in the system.

Figure 6: Tradeoff between cell fail and power
Figure 7: ECG Processing Architecture with Latch Based Memory
D. Results
The three types of memory described above were first analyzed in terms of their respective area and power. The memory choice for the ECG architecture was then made, as presented in Fig. 7. Memory based on either latches or flip-flops uses the standard cells without a separate memory macro. This standard-cell based memory provides flexibility, reliability, lower design effort and the same voltage range as the standard cells. SRAM macros, in contrast, are designed to work at a certain voltage; to operate the SRAM at a lower voltage (sub-Vth region), the SRAM cells have to be redesigned. This requires more design effort and usually has to be done for each type of RAM. SRAM cells for the sub-Vth region also require a larger area. The drawback of the standard-cell based memory is its large area. Hence, there is a tradeoff between area and power when selecting a memory type. Latches are almost half the size of flip-flops.

The memory analysis was demonstrated on the architecture depicted in Fig. 7 [8], which was designed for processing ECG. It consists of two main parts: ECG feature extraction and Cardiac Autonomic Neuropathy (CAN) classification. In the first stage, ECG features are extracted for use in the classification of CAN. These features are the main characteristic points of an ECG wave: the P-wave, the QRS-complex and the T-wave. The peak, onset and offset of these waves are also determined. In the second stage, the extracted features are used to evaluate the QT and RR intervals, which are then applied in detecting CAN severity. The ECG feature extraction stage requires memory to temporarily store ECG samples and perform the delineation process. The system memory required for this architecture was only 2KB. Experiments were done using RTL simulation and synthesis for three memory implementations: flip-flop based, SRAM based and latch based.
Table I gives the synthesis results of area, power (leakage and dynamic) and power-area product for the three types of memory at 1V and a frequency of 1kHz, and Table II summarizes the same comparison. Note that the SRAM requires the smallest area for 2KB of memory; hence, we use the power-area product as the comparison metric. The leakage power is highest for the SRAM, and its supply voltage is limited to a minimum of 0.9V. The latches exhibit the lowest leakage and dynamic power, and a smaller area than the flip-flops. In terms of power-area product, the latch-based memory has the smallest value. When the size of the RAM increases, the effect of the area becomes dominant and the relative power-area product figures change. There is a break-even point beyond which the SRAM becomes more advantageous in terms of power-area product. In the same configuration, the supply voltage for the latch and flip-flop based memories can be scaled down to near-Vth or sub-Vth, in the same way as specified for the standard cell library. However, scaling the supply voltage for the SRAM is not possible, as the SRAM needs to be redesigned to suit the voltage scaling.

Figure 8: Memory Choices
Figure 9: Die Photo for SoC with Latch-Based Memory
Figure 10: RAM size with different memory types (x-axis: size in Kb, 2 to 8192; y-axis: PA product, 0.00E+00 to 6.00E+14; series: Latch P-A product, SRAM P-A product, Flip Flop P-A product)
Figure 11: Latch Based Memory for 1024x16 bits
Figure 12: Register based Memory 1024x16 bits
Figure 13: SRAM based memory for 1024x16 bits
IV. CONCLUSIONS
In the proposed system, the latch-based RAM was chosen to enable the system to operate at a low supply voltage of 0.6V. As shown in Table III, the latch-based RAM saved 60% in power-area product relative to an SRAM-based memory and more than 90% relative to a flip-flop based RAM (Fig. 8). The die photo of the ECG processor is depicted in Fig. 9; the fabricated chip consumed only 75nW from a supply voltage of 0.6V at 250Hz. The chip area is only 0.227mm2. This low level of power consumption and low area would not have been achieved if flip-flop based or SRAM-based memory had been incorporated.

Figure 10: Memory options comparison of power-area with address and data lines at 1kHz.

Figure 11 shows the latch-based memory for 1024x16 bits, with an area of 512 * 260 = 133,120 um2 = 0.133 mm2. Figure 12 shows the register-based memory for 1024x16 bits, with an area of 577 * 292 = 168,484 um2 = 0.168 mm2. Figure 13 shows the SRAM-based memory for 1024x16 bits, with an area of 267 * 138 = 36,846 um2 = 0.0368 mm2.

Table I: Memory Area, power (leakage and dynamic) and Power-Area Comparison: 2KB RAM
Memory type   Area (um2)   Area (mm2)   Leakage (nW)   Dynamic (nW)   Total (nW)   P-A product   Normalized PA   Area per bit (um2)
Latch         82003.00     0.08         60.50          7.20           67.70        5.55          0.45            5.01
SRAM          21503.00     0.02         507.00         7.80           514.80       11.07         0.89            1.31
Flip Flop     104697.00    0.10         72.50          46.40          118.90       12.45         1.00            6.39
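The derived columns of Table I can be recomputed from the area, leakage and dynamic power columns. The sketch below assumes that the power-area product is total power in nW multiplied by area in mm2, that normalization is relative to the largest product (the flip-flop entry), and that 2KB corresponds to 16,384 bits; these assumptions reproduce the tabulated values to two decimals. Names such as TABLE_I and derive are hypothetical.

```python
# (area in um2, leakage in nW, dynamic in nW), copied from Table I.
TABLE_I = {
    "Latch": (82003.0, 60.5, 7.2),
    "SRAM": (21503.0, 507.0, 7.8),
    "Flip Flop": (104697.0, 72.5, 46.4),
}

BITS = 2 * 1024 * 8  # assumption: 2KB = 16,384 bits


def derive(area_um2, leakage_nw, dynamic_nw):
    """Return total power, power-area product and area per bit."""
    total_nw = leakage_nw + dynamic_nw
    area_mm2 = area_um2 * 1e-6
    return {
        "total_nw": total_nw,
        "pa_product": total_nw * area_mm2,  # nW * mm2
        "area_per_bit_um2": area_um2 / BITS,
    }


derived = {name: derive(*row) for name, row in TABLE_I.items()}
worst_pa = max(entry["pa_product"] for entry in derived.values())

for name, entry in derived.items():
    normalized = entry["pa_product"] / worst_pa
    print(
        f"{name:>9}: total={entry['total_nw']:.1f} nW, "
        f"PA={entry['pa_product']:.2f}, norm={normalized:.2f}, "
        f"area/bit={entry['area_per_bit_um2']:.2f} um2"
    )
```

Keeping leakage and dynamic power as separate inputs makes it easy to see that the SRAM's product is dominated by leakage while the flip-flop's is dominated by area and dynamic power.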
Table II: Memory Options Power-Area Comparison: 2KB
Memory Type   Area (mm2)   Area per bit (um2)   Leakage (nW)   Dynamic Power (nW)   Total Power (nW)   Power Area Product
Latch         0.082        5.01                 60.5           7.2                  67.7               5.55
Flip-Flop     0.1047       6.39                 72.5           46.4                 118.9              12.45
SRAM          0.0215       1.31                 507            7.8                  514.8              11.07
Table III: Memory Options Power-Area Comparison: 2KB RAM with Address and Data lines
Supply Voltage        0.6V
Operating Frequency   250Hz
Area                  0.227 mm2
Memory                2 KB
Power Consumption     75nW
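Two figures of merit follow directly from the Table III values by simple arithmetic. The sketch below treats the 75nW figure as the average total chip power at 250Hz, which is an assumption; the resulting energy per clock cycle and power density are derived here for illustration and are not stated in the paper. The function names are hypothetical.

```python
def energy_per_cycle_joules(power_watts, frequency_hz):
    """Average energy drawn per clock cycle: E = P / f."""
    return power_watts / frequency_hz


def power_density_nw_per_mm2(power_nw, area_mm2):
    """Average power per unit chip area."""
    return power_nw / area_mm2


POWER_NW = 75.0        # Table III
FREQUENCY_HZ = 250.0   # Table III
AREA_MM2 = 0.227       # Table III

energy_j = energy_per_cycle_joules(POWER_NW * 1e-9, FREQUENCY_HZ)
density = power_density_nw_per_mm2(POWER_NW, AREA_MM2)

print(f"energy per cycle: {energy_j * 1e12:.0f} pJ")
print(f"power density: {density:.1f} nW/mm2")
```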
In this paper, the selection of memory options for ultra-low power IoT devices is analyzed. Based on the analysis of three types of memory, namely flip-flop based, latch-based and SRAM-based memory, it was shown that latch-based memory has better resource utilization in terms of power-area product. The latch-based RAM saved 60% in power-area product relative to an SRAM-based memory and more than 90% relative to a flip-flop based RAM.
REFERENCES
[1] M. Kim, J. Lee, Y. Kim, and Y. H. Song, "An analysis of energy consumption under various memory mappings for FRAM-based IoT devices," in Internet of Things (WF-IoT), 2018 IEEE 4th World Forum on. IEEE, 2018, pp. 574–579.
[2] M. K. Dinesh and R. Bhakthavatchalu, "Storage memory/NVM based executable memory interface IP for advanced IoT applications," in Recent Trends in Information Technology (ICRTIT), 2016 International Conference on. IEEE, 2016, pp. 1–9.
[3] J. Yang, H. Ji, Y. Guo, J. Zhu, Y. Zhuang, Z. Li, X. Liu, and L. Shi, "A double sensing scheme with selective bitline voltage regulation for ultralow-voltage timing speculative SRAM," IEEE Journal of Solid-State Circuits, vol. 53, no. 8, pp. 2415–2426, 2018.
[4] M. Saint-Laurent, B. Mohammad, and P. Bassett, "A 65-nm pulsed latch with a single clocked transistor," Proceedings of the 2007 International Symposium on Low Power Electronics and Design, pp. 347–350, 2007.
[5] B. Mohammad and J. A. Abraham, "A reduced voltage swing circuit using a single supply to enable lower voltage operation for SRAM-based memory," Microelectronics Journal, vol. 43, no. 2, pp. 110–118, February 2012, doi: 10.1016/j.mejo.2011.11.006.
[6] P. Bassett and M. Saint-Laurent, "Energy efficient design techniques for a digital signal processor," in IC Design & Technology (ICICDT), 2012 IEEE International Conference on. IEEE, 2012, pp. 1–4.
[7] O. Andersson, B. Mohammadi, P. Meinerzhagen, A. Burg, and J. N. Rodrigues, "Ultra low voltage synthesizable memories: A trade-off discussion in 65 nm CMOS," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 63, no. 6, pp. 806–817, 2016.
[8] T. Tekeste, H. Saleh, B. Mohammad, A. Khandoker, H. Jelinek, and M. Ismail, "A nanowatt real-time cardiac autonomic neuropathy detector," IEEE Transactions on Biomedical Circuits and Systems, vol. 12, no. 4, pp. 739–750, 2018.
[9] M. Adiseshaiah, D. Sharath Babu Rao, and V. Venkateswara Reddy, "Implementation and design of 6T-SRAM with read and write assist circuits," IJREAS, vol. 2, no. 5, May 2012.
[10] M. Cadogan and C. Nickson, "ECG Basics" [online]. Available: http://lifeinthefastlane.com/ecglibrary/basics/.
[11] A. Banerjee, "Ultra-Low-Power Embedded SRAM Design for Battery-Operated and Energy-Harvested IoT Applications," IntechOpen, 2018.
[12] B. Mohammad, H. Saleh, and M. Ismail, "Design Methodologies for Yield Enhancement and Power Efficiency in SRAM-Based SoCs," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 23, no. 10, pp. 2054–2064, Oct. 2015, doi: 10.1109/TVLSI.2014.2360319.
[13] K. Mohammad, A. Kabeer, T. M. Taha, M. Owaida, and M. Washha, "Off-chip bus power minimization using serialization with cache-based encoding," Microelectronics Journal, vol. 54, pp. 138–149.
[14] K. Mohammad, "Low power content addressable memory system," US Patent 9,007,799, 2015.
[15] K. Mohammad, A. Kabeer, and T. Taha, "On-chip power minimization using serialization-widening with frequent value encoding," VLSI Design journal 2014, 6.
[16] Y.-C. Chien and J.-S. Wang, "A 0.2 V 32-Kb 10T SRAM with 41 nW standby power for IoT applications," IEEE Transactions on Circuits and Systems I: Regular Papers, no. 99, pp. 1–12, 2018.