High speed and low sensitive current-mode CMOS perceptron




Microelectronic Engineering 165 (2016) 41–51



Research paper

Szymon Szczęsny, Faculty of Computing, Chair of Computer Engineering, Poznań University of Technology, Piotrowo 3A Street, BWE, Room 508, 61-138 Poznań, Poland

Article history: Received 14 May 2016; Received in revised form 11 August 2016; Accepted 22 August 2016; Available online 27 August 2016

Keywords: Perceptron; Current-mode; Mixed systems; CMRR; Mismatch; Row strategy

Abstract. The paper presents a prototype of a perceptron network working in the current mode, with a digital interface for applications in analog-to-digital systems. The presented implementation makes it possible to create a network using a standard nanometer CMOS technology, to use a row strategy for designing the topography, and to place an analog network IPcore on a silicon substrate common with the digital part. The work introduces an implementation of a CMOS axon with a monotonic, differentiable and sigmoid activation function. It also introduces an implementation of a dendrite with a fully digital interface. The obtained dendrite weight mapping error is 0.072%. The article discusses routing of neurons based on weight values, including elimination of the CMRR concurrent component and a reduction of power consumption. The work presents a strategy for programming weights that completely eliminates the effect of process mismatch. The work includes a mismatch analysis of the operation of the network and a verification of the signal processing accuracy. A sample 7.8 MHz multi-layer implementation with 16 weights yielded precision = 94.62%, and a 5 Ms/s multi-layer implementation with 1098 weights yielded precision = 93.28%. © 2016 Elsevier B.V. All rights reserved.

1. Introduction

In the era of miniaturization of electronic equipment, reconfigurable analog solutions are a competitive alternative to digital circuits due to their valuable physical properties [1–3]. Moreover, it has been proven that it is worth preprocessing analog signals in an analog circuit [4] before applying them to the digital part of a mixed system. The development of semiconductor techniques and the miniaturization of the CMOS technology make implementations of analog circuits in current-mode techniques particularly popular. One of the developed areas of research is the implementation of neural networks as reconfigurable circuits. One of the first works on the problem demonstrated an implementation of a perceptron based on current mirrors working in saturation [5]. In that implementation, the sigmoid function is linear in certain regions, but it is neither continuous nor differentiable over the whole range. Another downside of such an implementation is the need to control the bias current in the neuron. The implementation of a discontinuous sigmoid function using two level-shifted current mirrors in a serial connection is similar [6]. That paper presents a way to implement a number of sample weights with a limited possibility of regulating their values. The implementation of the sigmoid function also requires controlling bias currents. The main drawback of both approaches is the discontinuity of the activation function, which limits the learning possibilities of the network. On the other hand, CMOS circuits are continuous by nature. However, the proposed implementations of

E-mail address: [email protected].

http://dx.doi.org/10.1016/j.mee.2016.08.010 0167-9317/© 2016 Elsevier B.V. All rights reserved.

sigmoidal functions only provide their estimates using piecewise linear functions. Such a solution can indeed limit hyperbolic function estimation errors, but because of the discontinuity of the activation function it is also a source of noise in the response of the final network. The implementation of complex perceptron networks also requires an implementation of taught weights with high accuracy, which is difficult in the discussed implementations. A proposition for mapping a monotonic activation function based on a CMOS implementation of a class AB current conveyor is presented in [7]. However, it is relatively complicated, as it requires 18 transistors and is controlled by the bias current. The literature presents other interesting implementations of networks working in the current mode, e.g. an analog Kohonen neural network [8] or a hardware implementation of a neural network for the purpose of analog built-in self-test [9]. Currently, methods for implementing artificial neural networks in an Implantable Unit [10] are being developed, as well as analog implementations of perceptrons using VHDL-AMS and macro-models, for example [11]. As already mentioned, the choice between a digital and an analog implementation is largely dictated by physical properties, e.g. the power consumption. Experience proves, however, that designers of analog circuits are reluctant to use this solution due to its low processing accuracy, which limits the complexity of implemented examples. Popularizing analog circuits only by pushing up their physical parameters is inefficient, because in reality it does not result in practical applications. The digital signal processing accuracy in reprogrammable circuits such as FPGAs depends solely on the chosen bit resolution. Yet, it comes at the expense of a large power consumption of the computing system. The presence of a


S. Szczęsny / Microelectronic Engineering 165 (2016) 41–51

Fig. 1. Mixed system diagram. The neural network works as a preprocessor for analog signals but the system only features a digital interface. Bitstream is used for implementing weights of the taught network.

digital circuit in an analog-to-digital mixed system challenges the merits of further reducing the usually low power consumption of the analog part, which is small in relation to the power consumption of the digital part. When it comes to designing reconfigurable analog circuits, it is therefore reasonable to propose mechanisms that improve accuracy, provide the possibility of implementing the analog part in a standard digital CMOS technology, and add a digital interface to the analog circuit. This article presents an implementation of a perceptron meeting the above criteria. Fig. 1 presents an application of an analog implementation of a neural network on a substrate common with the digital part of a mixed system. The network performs preprocessing of analog signals, but the system only features a digital interface, used for implementing the weights of the taught network and for exchanging data with a digital processor. The author proposes to focus on those aspects of an analog implementation of a neural network which improve the processing accuracy without sacrificing the physical properties. The following chapters explain methods for implementing weights with high accuracy in a wide range of values, an analytical method for determining a continuous, monotonic and differentiable activation function of an axon, and a routing method for eliminating the concurrent component of the signal. The proposed implementation avoids the need for bias currents or voltages, as only digital signals are used to configure the structure. The author proposes a minimalistic prototype of a circuit implementing a sigmoid function, consisting of a single pair of complementary transistors (in a diode connection) and two inverters.
Circuits of an axon, a dendrite and a module for eliminating the concurrent component, discussed in the following sections, contain structures that make it possible to generate the topography of an analog network using digital techniques such as the row strategy.

2. CMOS neuron model

The summing of signals in current-mode techniques is implemented in accordance with Kirchhoff's current law in circuit nodes. The role of multipliers is played by current mirrors, which in dedicated circuits are implemented as a connection of two transistors, one of which operates in a diode configuration. The value of the multiplying coefficient is determined by the ratio of the device transconductances of the two transistors. The next subsection presents the idea of implementing a reconfigurable current mirror with a high accuracy of mapping of the scaling factor. A digitally programmable mirror makes it possible to easily implement the dendrite weight with an option to reconfigure it. In the following subsections a transistor model of an axon is discussed.
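The current-mode arithmetic just described can be sketched behaviorally. This is a minimal illustrative model, not the paper's circuit: the names `mirror` and `neuron_sum` are hypothetical, and an ideal mirror is assumed.

```python
# Behavioral sketch of current-mode arithmetic: currents meeting in a node
# add up (Kirchhoff's current law), and a current mirror scales a current by
# the transconductance ratio of its two transistors. Idealized model.

def mirror(i_in, beta_out, beta_in):
    """Ideal current mirror: i_out = (beta_out / beta_in) * i_in."""
    return (beta_out / beta_in) * i_in

def neuron_sum(inputs, weights):
    """Weighted input currents simply add in the summing node."""
    return sum(mirror(x, w, 1.0) for x, w in zip(inputs, weights))

# Two input currents of 1 uA and -0.5 uA, weights 2 and 4: the sum cancels.
print(neuron_sum([1e-6, -0.5e-6], [2.0, 4.0]))
```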

Subsection 2.3 presents a method for programmable routing with CMRR¹ elimination using a digital technology.

2.1. Reprogrammable current mirror

The IPcore application of the neural network, as part of a larger mixed system, requires the provision of a digital interface. Implementing the dendrite weight with high accuracy as the scaling factor of a single current mirror is impossible, as it would require a very large transconductance of the output stages of mirrors. Therefore the author proposes the structure presented in Fig. 2. The circuit consists of a serial connection of two multi-output current mirrors switched using n1 + n2 switches. The architecture of the mirrors is based on complementary transistor pairs in each of the stages. Such a structure makes it possible to use techniques for eliminating the concurrent component in the signal processing track. In the presented model of a dendrite it is assumed that each subsequent output stage of the mirror has a scaling factor 2 times larger than the previous stage. The dendrite is configured using a bit word which is a concatenation of vectors A of length n1 and B of length n2. Assuming the smallest scaling factors of the mirrors have values αA and αB respectively, the weight value is expressed by Eq. (1).

$w_{ij} = \left( \sum_{i=1}^{n_1} 2^i A[i]\,\alpha_A \right) \cdot \left( \sum_{j=1}^{n_2} 2^j B[j]\,\alpha_B \right)$   (1)
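A minimal sketch of the dendrite weight of Eq. (1), computed from the two configuration vectors. The function name and the example values below are illustrative; the real weight values come from post-layout simulation, so they differ from this idealized formula.

```python
# Sketch of the dendrite weight of Eq. (1): two cascaded multi-output
# current mirrors configured by bit vectors A (length n1) and B (length n2).
# The 1-based indexing follows the paper's notation.

def dendrite_weight(A, B, alpha_A, alpha_B):
    """w_ij = (sum_i 2^i A[i] alpha_A) * (sum_j 2^j B[j] alpha_B)."""
    wa = sum(2 ** i * a * alpha_A for i, a in enumerate(A, start=1))
    wb = sum(2 ** j * b * alpha_B for j, b in enumerate(B, start=1))
    return wa * wb

# Example: 12-bit AB word split into two 6-bit halves, alpha_A = alpha_B = 0.14.
# Smallest non-zero weight: (2 * 0.14) ** 2 = 0.0784
w = dendrite_weight([1, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0], 0.14, 0.14)
print(w)
```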

Regardless of the selection of coefficients αA and αB, in such a dendrite implementation the number of unique weight values is $2^{n_1+n_2} - \sum_{i=1}^{2^{n_1+n_2-1}} i$. The maximum possible number of weights ($2^{n_1+n_2}$) is reduced by a set of redundant solutions resulting from the commutativity of multiplying both factors of Eq. (1). Moreover, weight values for which all elements of vector A or B are zero are not allowed. To increase the number of weights, a method for determining the AB bit word based on the generation of a predefined grid of solutions can be proposed. The circuit in Fig. 2 was simulated for all values of the AB bit word after the extraction of parasitics from the topography. The extraction was performed using the Calibre tool of the Mentor Graphics package. The result of the extraction is a simulated netlist of the circuit containing, beside transistors, also the resistances and capacitances of topography paths. The result of the netlist simulation is a grid of $2^{n_1+n_2} - (2^{n_1} + 2^{n_2})$

¹ Common Mode Rejection Ratio.


resources. The author proposes an implementation using properties of the simplest digital cell, i.e. an inverter composed of a complementary transistor pair. The current mirror layout presented in Section 2.1 works properly in the saturation area of both transistors, and at an input current of zero the input node voltage is (VDD + VSS)/2. For a simple CMOS inverter the dependency of the output voltage on the input voltage, in the threshold voltage working area, imitates the sigmoid function in a transposed coordinate system. The author proposes to use the voltage-mode working characteristic of the inverter cell to implement an axon operating in the current mode. First assume that function fs, defined with Eq. (2) and expressing the dependency of the output voltage on the input voltage of the inverter, can be approximated by a composition of the hyperbolic function tanh() and some differentiable function h() (Eq. (3)).

$V_{out} = f_s(V_{in})$   (2)

$f_s = \tanh \circ h$   (3)

Meeting the condition of differentiability of function h is necessary to allow network learning using gradient or backpropagation algorithms. The proposed axon prototype is presented in Fig. 4. The voltage model of the inverter assumes the equality of the currents of both transistors and a zero output current. Loading the inverter with the earlier-mentioned dendrite results in a non-zero output current of the inverter, which can be expressed with Eq. (4).

$i_{p3} = i_{out} + i_{n3}$   (4)

Assuming that transistors Mp3 and Mn3 work in the saturation area, the above equation can be expressed as:

Fig. 2. Current-mode dendrite realisation with a digital interface.

assignments of bit words and their corresponding weights. Thanks to the post-layout simulations, the obtained grid on the one hand includes information on the actual weight values (which depend on routing and parasitics) and on the other guarantees that all weights have unique values. In such a strategy, redundant solutions make it possible to select weights in a more accurate way. The parameters of the generated grid for the 12-bit AB word and αA = αB = 0.14 are the following: maximum weight value 8.3136 and average weight mapping error of 0.072%. The current consumption of a circuit designed in the 0.18 μm technology is 130 μA at a power supply of 1.8 V. Fig. 3 presents the dendrite response to a pulse input function for different weight values. The response is simulated including reprogramming on the run. The maximum input frequency is 49.57 MHz. The time required for reprogramming a dendrite is 8.5 ns. The proposed implementation is compared in Table 1 with other multipliers. Evidently only one of the implementations is characterized by a greater work speed, which is however achieved at a much higher power consumption. Moreover, the structure of that implementation does not make it possible to implement a multiplier using transistors only. Increasing the accuracy of mapping weights in a dendrite is possible by increasing the size of the AB word, although increasing sizes n1, n2 requires the use of long transistors, which limits the maximum operating frequency. A better solution is to serially connect an additional mirror of size n3 configured with an additional word C.

2.2. Transistor axon model

Using current mirrors makes it possible to implement neurons with a linear activation function. Implementing a perceptron requires the use of circuits implementing sigmoidal activation functions. The CMOS technology does not feature any simple solution for implementing such functionality which would not require using excessive hardware

$\frac{\beta_{p3}}{2}\left(V_{DD} - V_2 + V_{Tp}\right)^2 = i_{out} + \frac{\beta_{n3}}{2}\left(V_2 - V_{SS} - V_{Tn}\right)^2$   (5)

where VTn and VTp are threshold voltages in the given technology. Assuming the equality of device transconductances βp3 = βn3 = β3 and the fact that VSS = 0, the solution of the equation takes the linear form (6).

$i_{out} = V_2 \cdot A + B, \qquad i_{out} = g(V_2)$   (6)

where

$A = \beta_3\left(V_{Tn} - V_{Tp} - V_{DD}\right)$

$B = \frac{\beta_3}{2}\left(V_{DD}^2 + 2 V_{DD} V_{Tp} + V_{Tp}^2 - V_{Tn}^2\right)$   (7)

are constant and depend only on the sizes of the transistors. Voltage V2 depends on changes of voltage V1 at the input of the middle stage of the axon in Fig. 4. The stage is an inverter and therefore meets Eq. (2), which for this stage can be expressed with Eq. (8). This equation proves that the introduction of an additional inverter component makes it possible to modulate the linear characteristic of a typical current mirror using a composition of functions fs, in which one of the functions is the sigmoidal function tanh.

$V_2 = f_s(V_1)$   (8)
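The reduction of Eq. (5) to the linear form of Eqs. (6)-(7) can be spot-checked numerically. This is a plain-Python sketch with illustrative parameter values (VDD = 1.8 V, symmetric thresholds, an arbitrary β3), not the paper's device data.

```python
# Numeric check that Eq. (5), with beta_p3 = beta_n3 = beta3 and V_SS = 0,
# collapses to the linear form i_out = A*V2 + B with A, B as in Eq. (7).
import random

def i_out_quadratic(V2, VDD, VTn, VTp, b3):
    """i_out from Eq. (5): difference of the two square-law currents."""
    return b3 / 2 * (VDD - V2 + VTp) ** 2 - b3 / 2 * (V2 - VTn) ** 2

def i_out_linear(V2, VDD, VTn, VTp, b3):
    """i_out = A*V2 + B with the constants of Eq. (7)."""
    A = b3 * (VTn - VTp - VDD)
    B = b3 / 2 * (VDD ** 2 + 2 * VDD * VTp + VTp ** 2 - VTn ** 2)
    return A * V2 + B

random.seed(0)
for _ in range(100):
    V2 = random.uniform(0.0, 1.8)  # sweep the inverter output voltage
    q = i_out_quadratic(V2, 1.8, 0.45, -0.45, 1e-4)
    l = i_out_linear(V2, 1.8, 0.45, -0.45, 1e-4)
    assert abs(q - l) < 1e-12
print("Eq. (5) reduces to the linear form of Eqs. (6)-(7)")
```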

Decomposing the current equation for the axon input according to Eq. (9) results in relation (10) describing voltage V1, which depends linearly on the input current iin (Eq. (11)).

$i_{in} = -i_{p1} + i_{n1}$   (9)

$V_1 = \frac{-i_{in}\frac{1}{\beta_3} - \frac{1}{2}\left(V_{Tn}^2 - V_{Tp}^2\right)}{V_{Tp} + V_{Tn}}$   (10)


Fig. 3. Dendrite response to an impulse input function, including programming the dendrite on the run.

$V_1 = h(i_{in})$   (11)

Voltage V2 is therefore a composition of function fs and linear function h with an input current argument (Eq. (12)). The output current of the axon, based on Eq. (6), is in turn a composition of function V2 and linear function g (Eq. (13)).

$V_2 = f_s \circ h(i_{in})$   (12)

$i_{out} = g \circ f_s \circ h(i_{in})$   (13)

The above analysis proves that the current in the serial connection of a current mirror and an inverter inherits some of the static characteristics of the inverter. An approximation of the current is possible using a differentiable composition of a sigmoid function and a linear function. As seen in the above equations, the level of the output current depends on the sizes of transistors in the third axon stage, i.e. the selection of parameter β3. The other two device transconductances, at stages one and two, define the mirror scaling factor, which determines the value of voltage VIN. Selecting the scaling factor results in a horizontal shift of characteristic iout. The strategy for selecting transistor sizes was the same as for designing the dendrite, i.e. the length of transistors was modified. Fig. 5 presents example axon simulations for different values of β3 and for scaling factor β2/β1 = 1. An approximation of the axon response based on Eq. (13) can be proposed with Eq. (14), where the γ constants are coefficients of the linear functions.

Using simple optimization methods, the γ coefficients were chosen so that the mean square error of mapping the response of the CMOS circuit was as small as possible. Fig. 6 presents a comparison of responses obtained from the axon simulation and its mathematical model for two different scaling factors β2/β1 of the mirror.

$i_{out}^{apr} = \gamma(1)\, i_{in} + \gamma(2)\, \tanh\!\left(\frac{i_{in}}{\gamma(3)} + \gamma(4)\right) + \gamma(5)$   (14)

$\gamma_1 = [\,0.011 \;\; -1.023 \;\; -2.946 \;\; 0 \;\; 0\,]$

$\gamma_{10} = [\,0.004 \;\; -1.37 \;\; -1.377 \;\; -0.03 \;\; -0.022\,]$   (15)
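A numeric sketch of the axon model of Eq. (14) with the fitted coefficient vectors of Eq. (15). The function name is illustrative; γ uses the paper's 1-based γ(1)..γ(5) convention, packed here into a plain list.

```python
# Axon model of Eq. (14): i_out ~ g1*i_in + g2*tanh(i_in/g3 + g4) + g5,
# with the fitted coefficient vectors of Eq. (15).
import math

def axon_response(i_in, gamma):
    """Approximated axon output current for input current i_in."""
    g1, g2, g3, g4, g5 = gamma
    return g1 * i_in + g2 * math.tanh(i_in / g3 + g4) + g5

gamma_1  = [0.011, -1.023, -2.946, 0.0, 0.0]       # beta2/beta1 = 1
gamma_10 = [0.004, -1.37, -1.377, -0.03, -0.022]   # beta2/beta1 = 10

# With gamma(4) = gamma(5) = 0 a zero input current gives a zero output,
# as stated in the text for properly matched PMOS/NMOS transconductances.
print(axon_response(0.0, gamma_1))  # 0.0
```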

The coefficient vectors γ are defined with Eq. (15). Vector γ1 corresponds to scaling factor β2/β1 = 1 and γ10 corresponds to scaling factor β2/β1 = 10. Selecting proper transconductances of the PMOS transistors with respect to the NMOS transistors should, where possible, result in an output current of zero when applying an input current of zero, which means zeroing the last two values in vector γ. For very large values of the scaling factor the sigmoid is approximated in the area of large input currents by a constant value, which corresponds to a zero value of the first element in vector γ.

2.3. Routing

The following section addresses several aspects of routing neurons working in the current mode. Implementing large analog structures containing hundreds or more modules is not easy because of the signal

Table 1. Comparison with other multipliers.

| Parameter | [12] | [13] | [14] | [15] | [16] | [17] | [18] | Current work |
|---|---|---|---|---|---|---|---|---|
| Voltage supply [V] | 1.8 | 3.3 | 1.0 | 0.5 | 0.75 | −2.5 → 2.5 | 0.9 | 1.8 |
| Number of transistors | 24 | 4 | 16 | 16 | 10 | 48 | 10 + 8R + OpAmp | 28 |
| Speed | 31.2 MHz | 10 MHz | 2.8 MHz | 10 MHz | 10 MHz | 5 MHz | 72.45 MHz | 49.57 MHz |
| Power | 207 μW | 13.2 μW | 2.3 μW | 1.56 μW | 19.9 μW | 6.4 mW | 25.14 mW | 234 μW |


Fig. 6. Comparison of simulation results of the circuit in Fig. 4 (*,*) with the approximation of its response (−) using the mathematical model of Eq. (14) for the parameters of Eq. (15).

Fig. 4. Transistor axon prototype.

processing accuracy as well as concurrent components (related to the Common Mode Rejection Ratio), present both in circuits operating in the voltage mode [20] and in ones working in the current mode [21]. Moreover, programmable routing requires additional digital resources which increase the power consumption of the whole structure, as well as its dimensions. The author presents a solution to these problems in the form of routing with elimination of the CMRR component. Another advantage of the approach is the possibility of removing the digital part from the process of routing neurons: the information concerning connections is already included in the weight programming bits. The routing scheme is shown in Fig. 7. Due to the current mode, both positive and negative signals from inputs x are reproduced in current mirrors CM. The number of mirror outputs depends on the number L1 of axons in the first layer. The circuit in Fig. 2 works as a dendrite. However, due to the balanced structure, one of the dendrites has two serial connections of multipliers. Dendrites are programmed using the Wi bit matrices consisting of concatenations AB[Wi]. Before being applied to an axon, the summed signals are subject to elimination of the concurrent component in CMRR blocks. These blocks are also separators between the summing node (cell body), to which signals from the dendrites arrive with various delays, and the module which implements the activation function. Implementation of the CMRR blocks is described in

the later part of the current section. Axon output signals are again applied to the next layer of neurons via reproducing circuits. The number of outputs of the CM circuits depends on the number L2 of neurons of the subsequent layer. The architecture of subsequent layers is similar. Implementation of a chosen neural network, consisting of a certain number of neurons in specific layers, requires turning off blocks which are not used. In the presented solution it is done using the vectors AB, which contain only information about weight values. As shown in Fig. 2, a vector with zero values corresponds to the zero weight and causes switching off the given dendrite, i.e. the elimination of the connection in the network. The presence of the switched-off dendrite does not affect the operation of the adjacent modules. The AB word can also be used for disconnecting power from unused dendrites. A schematic of such a connection is presented in Fig. 8. Blocks S of switches, connected as an OR gate to the VDD line, disconnect power from an unused dendrite (and the corresponding axon/CM/CMRR module) when the value of its weight is 0 by default. Therefore, the power consumption of the matrix of neurons depends solely on the power consumption of the used areas of the network. The advantage of such routing is that there is no need for additional signals for implementing network connections. The CMRR elimination in the router topology is discussed next. A schematic of a block removing the constant component is

Fig. 5. Axon response depending on the input current for various lengths L3.


Fig. 7. A diagram of routing of a perceptron working in a balanced structure including elimination of the concurrent component.

presented in Fig. 9. The circuit consists of three 2-output current mirrors. The values of the scaling factors are presented in the schematic. Due to the balanced structure and the presence of non-negated signals p as well as negated ones m in the whole topology of the neural network, the following equation of input currents (16) is true for the circuit, in which the id currents represent dendrite output signals adding up in the input node of the CMRR circuit. This equation means that any current signal inside a network consisting of d dendrites has its representation as the pair (idp, idm).

$\bigwedge_d \; i_{dp} = -i_{dm}$   (16)

Each signal coming from the dendrites contains an unknown value of the concurrent component e. Thus it is correct to define Eq. (17) describing the actual values of the currents iinp, iinm which flow into the circuit in Fig. 9. The value of the input current consists of the signal idp, idm carrying information and also of the total value of error ed from all branches of the topology connected to the input node of the circuit. Omitting this circuit would mean that the currents would flow directly to the axon, and the e component would affect the response of the perceptron. In an analog reconfigurable circuit, the value of the e component is particularly difficult to determine, let alone take into account during data processing. Therefore elimination of the component is crucial for the accuracy of calculations.

$i_{inp} = \sum_d \left(i_{dp} + e_d\right), \qquad i_{inm} = \sum_d \left(i_{dm} + e_d\right)$   (17)

The current flowing into node s in Fig. 9 is derived from Eq. (18) and has the value of the doubled sum of all components e.

$s = -i_{inp} - i_{inm} = -2 \sum_d e_d$   (18)

The e component is subtracted in the output nodes according to Eq. (19), which means eliminating it from the output currents ioutp, ioutm (Eq. (20)).

$i_{outm} = -i_{inp} - 0.5\,s, \qquad i_{outp} = -i_{inm} - 0.5\,s$   (19)

$i_{outm} = -\sum_d i_{dp}, \qquad i_{outp} = -\sum_d i_{dm}$   (20)

The circuit output current is therefore the value of the signal carrying information.
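The cancellation of Eqs. (17)-(20) can be checked with a short numeric sketch. The currents and per-dendrite errors below are arbitrary example values, and `cmrr_block` is an idealized behavioral model of the circuit in Fig. 9, not a transistor-level one.

```python
# Behavioral model of the concurrent-component elimination, Eqs. (17)-(20):
# every dendrite d contributes a balanced pair (i_dp, i_dm = -i_dp), Eq. (16),
# plus an unknown error e_d. Node s recovers -2*sum(e_d), Eq. (18), and
# subtracting 0.5*s in the output nodes removes the error exactly.

def cmrr_block(id_p, errors):
    i_inp = sum(p + e for p, e in zip(id_p, errors))    # Eq. (17)
    i_inm = sum(-p + e for p, e in zip(id_p, errors))   # i_dm = -i_dp, Eq. (16)
    s = -i_inp - i_inm                                  # Eq. (18)
    i_outm = -i_inp - 0.5 * s                           # Eq. (19)
    i_outp = -i_inm - 0.5 * s
    return i_outp, i_outm

# Three dendrites with arbitrary error components e_d.
outp, outm = cmrr_block([1.0, -0.4, 0.7], [0.05, 0.2, -0.1])
print(outp, outm)  # approximately (1.3, -1.3): errors eliminated, Eq. (20)
```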

The structure shown in Fig. 9 is also adapted to be placed in a row. In addition, it is fully symmetrical, which means that the output signals are summed and supplied to the axon with equal delay. The implementations proposed by the author, i.e. the dendrite (Fig. 2), the axon (Fig. 4) and the CMRR circuit (Fig. 9), take advantage of a complementary structure [22]. Using the lengths of transistors with a constant width for implementing weights makes it possible to design a layout using a row strategy known from digital technologies [19]. Complementary transistors in the given input or output stage have a common length. The NMOS transistor width is constant and slightly greater than the minimum transistor width in the given technology. The PMOS transistor width is chosen so that the voltage measured at the drains of both transistors is exactly (VDD + VSS)/2. A part of a layout of an example perceptron implemented in the 0.18 μm technology is presented in Fig. 10. The transistors are placed horizontally in the layout, in rows designed for digital standard cells with a constant height in the given

Fig. 8. A sample floorplan of a network, ready for implementation in the form of a row topography. Input signals are reproduced in mirror blocks C. Axons of the last layer are the outputs of the network. The matrix of W words programming the network also simultaneously manages power distribution in each row, using S switches. Next to the diagram: the scheme of disconnecting unused modules of the network using the weight configuration word.
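The power-gating rule illustrated in Fig. 8 reduces to a simple predicate on the weight words: an all-zero AB word means "no connection, power the module down". The sketch below is illustrative; `power_map` is a hypothetical name for the decision the S switch blocks implement in hardware.

```python
# Routing-by-weights sketch: the AB configuration word alone decides both
# the connection and the power gating. An all-zero word disconnects the
# dendrite and cuts its supply (and that of the attached axon/CM/CMRR).

def power_map(W):
    """Per-dendrite power-enable flags derived purely from weight words W."""
    return [any(bit for bit in word) for word in W]

# Three dendrites; the middle one has a zero weight word and is powered off.
W = [[1, 0, 1, 0], [0, 0, 0, 0], [0, 1, 1, 1]]
print(power_map(W))  # [True, False, True]
```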


Fig. 9. The concurrent component removing circuit.

technology. On the one hand, it makes it possible to easily implement a network in mixed circuits, and on the other it leads to compacting the topography [23]. More on the methods of selecting transistor sizes in such an approach can be found in [24–26]. 3. Sensitivity to mismatch One of the key problems with implementing neural networks is their sensitivity to changing parameters of modules [27]. This section presents results of a PVT (Process, Voltage, Temperature dispersion) analysis [28] for the proposed axon model. Parameter dispersion is expressed as changes of coefficient γ1 from Eq. (14). Fig. 11 illustrates the


importance and the impact of each component of vector γ on the shape of the sigmoidal function (14), for sample dispersions of their values. PVT analysis results are shown in Fig. 12. Components γ1(4) and γ1(5) remain zero in all analyses. The performed PVT analyses prove that the parameter dispersion mainly influences the gain value of an axon. Yet, it does not change its symmetry and does not cause an increase of the concurrent component. A separate issue is the analysis of the resistance of the proposed dendrite structure to the parameter dispersion of a chip topography. As mentioned in Section 2.1, the method of implementing weights based on a pre-defined grid of solutions makes it possible to include the real properties of the topography, including its parameter dispersion, in the process of configuring a dendrite. In order to analyse dendrite resistance to parameter dispersion, the author generated 4 sample grids using the TSMC manufacturer technology files, covering the process corners FF, SS, FS and SF. The grids were generated for dendrite input currents of ±1 μA. Then the dendrite was programmed based on the generated grids, for a specific type of dispersion, so that in subsequent sequences it realizes 6 different scaling factors in the range 0.5–3. In order to ensure the reliability of the carried out analyses, the input functions for the dendrite during the test phase were different from the ones for which the grid was generated (i.e. ±0.7 μA). Fig. 13b) presents the result of dendrite operation under parameter dispersion. The expected weight values were confronted with the ones actually obtained during the test. The figure presents the obtained weight values and the percentage errors of their implementation. As a comparison, Fig. 13a) presents results of the same analysis performed for a current mirror implementing the given weights and designed in the same technology.
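The grid-based programming described above can be sketched as a nearest-weight lookup: a post-layout (or corner-specific) simulation maps every legal AB word to its measured weight, and programming picks the word whose measured weight is closest to the target. The toy weight function and the 3+3-bit word below are illustrative stand-ins, not the paper's netlist data.

```python
# Sketch of the "grid of solutions" programming strategy of Section 2.1:
# enumerate all legal AB words, record each word's (simulated) weight,
# then program the dendrite with the closest-matching word.

def build_grid(measure_weight, n1, n2):
    """Enumerate all AB words whose A and B halves are both non-zero."""
    grid = {}
    for word in range(2 ** (n1 + n2)):
        A, B = word >> n2, word & ((1 << n2) - 1)
        if A == 0 or B == 0:        # all-zero halves are not allowed
            continue
        grid[word] = measure_weight(word)
    return grid

def program_dendrite(grid, target):
    """Return the bit word whose measured weight best matches `target`."""
    return min(grid, key=lambda w: abs(grid[w] - target))

# Toy stand-in for the post-layout measurement (assumption, not real data):
# weight proportional to the product of the two 3-bit halves.
toy = lambda word: 0.01 * ((word >> 3) * (word & 7))
g = build_grid(toy, 3, 3)
print(program_dendrite(g, 0.25))  # word 45: A = 5, B = 5, weight 0.25
```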
There was no possibility of reprogramming the mirror, nor of compensating the dispersion phenomenon. The figure presents the scale of the problem in the case of implementing circuits operating in the current mode. As seen in Fig. 13b), the method of generating a grid under dispersion conditions makes it possible to become independent of the influence of the phenomenon. The multiplier retains linearity and the weight mapping error, regardless of the scale and type of the dispersion, is less than 0.5%.

4. Example implementation in a pattern recognition task

This section presents the results of implementing an example neural network which recognises 26 letters of the alphabet. The size of the input frame is 5×7 pixels. After standardization, signals take values of ±1 μA. This means that the direction of the current is the information carrier. The input layer of the network consists of 35 neurons and the output layer consists of 26 neurons. There are 18 neurons in the hidden layer, which means 1098 multipliers throughout the network. For network learning, the Levenberg-Marquardt algorithm for 250 epochs

Fig. 10. Fragment of a layout of an analog perceptron network, designed using the digital row strategy. The figure presents selected dendrites, axons, concurrent component removal circuits and circuits reproducing current signals, implemented as cells with a fixed height.


Fig. 11. Illustration of the influence of the γ factors on the shape of the sigmoidal function described with Eq. (14).

Fig. 12. γ1 vs. process/power supply/temperature mismatch.

and the method of steepest descent with momentum for 600 epochs were used. Fig. 14 presents two examples of learning processes for the above learning algorithms. The effect of the work done by the network is visible in Fig. 15. As current inputs, 26 combinations corresponding to the subsequent letters of the alphabet were applied. In the case of a proper letter recognition, a selected neuron in the output layer changes the direction of its output current. The precision of the network was calculated with

Eq. (27). The average precision of recognizing patterns is 93.28%. Values above 100% result from selected responses of neurons with values below −1 μA. Note that the network is characterized by high selectivity, as out of all the patterns only one proper network output generates current of the opposite direction. It is an important property, as it makes it possible to quickly convert the response to a binary signal, regardless of the value of the flowing current.

Fig. 13. An analysis of sensitivity of dendrite parameters to the parameter dispersion phenomenon was performed using TSMC technological files - predicted weights vs. actual weights: a) current multiplier b) dendrite programmed with a grid method proposed in the current work.


Fig. 14. The process of learning (MSE) for the Levenberg-Marquardt algorithm (LM) and the method of gradient descent with momentum (gdx).
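The gdx method referenced in the caption is the standard gradient descent with a momentum term; a minimal sketch follows (the learning rate and momentum coefficient below are illustrative, not the values used for training the network):

```python
def gdx_step(w, grad, velocity, lr=0.01, momentum=0.9):
    """One step of gradient descent with momentum (gdx).

    w, grad, velocity are equal-length lists of floats.
    Returns the updated weights and velocity.
    """
    new_v = [momentum * v - lr * g for v, g in zip(velocity, grad)]
    new_w = [wi + vi for wi, vi in zip(w, new_v)]
    return new_w, new_v

# Usage: minimizing f(w) = w^2 (gradient 2w) starting from w = 1.0
w, v = [1.0], [0.0]
for _ in range(200):
    w, v = gdx_step(w, [2 * wi for wi in w], v)
```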

The network was also tested for its classification abilities. For this purpose 76 models were created, together representing 4 classes of pairs: (A,H), (B,C), (E,F), (X,Y). In each model, 1–3 pixels were corrupted with values taken from the original training vectors belonging to the opposite class of the given pair. Sample models obtained in this way are presented in Fig. 16a). As an example, the figure presents a few noisy E and F models: the first one has one pixel of model F and the other one has one pixel of model E. The more pixels of a given model are borrowed from a representative of the opposite class, the more difficult the classification of the model becomes. The transistor-level simulation of the network implementation resulted in the confusion matrix presented in Fig. 16b), which is a commonly used method of assessing classifiers [34]. On its basis, the following network parameters were calculated: sensitivity TPR = 0.9 (Eq. (21)), specificity TNR = 0.76 (Eq. (22)), precision = 0.79 (Eq. (23)) and accuracy ACC = 0.83 (Eq. (24)).

TPR = TP / (TP + FN)  (21)

TNR = TN / (FP + TN)  (22)

precision = TP / (TP + FP)  (23)

ACC = (TP + TN) / (TP + FN + FP + TN)  (24)
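Eqs. (21)–(24) can be evaluated directly from the four confusion matrix counts. A small helper illustrating the formulas; the TP/FN/FP/TN counts below are hypothetical, only sized to a 76-model test set, and are not read from Fig. 16b):

```python
def classifier_metrics(tp, fn, fp, tn):
    """Eqs. (21)-(24): sensitivity, specificity, precision, accuracy."""
    tpr = tp / (tp + fn)                   # Eq. (21), sensitivity
    tnr = tn / (fp + tn)                   # Eq. (22), specificity
    precision = tp / (tp + fp)             # Eq. (23)
    acc = (tp + tn) / (tp + fn + fp + tn)  # Eq. (24)
    return tpr, tnr, precision, acc

# Hypothetical counts summing to 76 models
tpr, tnr, precision, acc = classifier_metrics(tp=36, fn=4, fp=9, tn=27)
```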

Network electrical parameters are presented in Table 2. The network response time makes it possible to use the network in image processing (assuming a 24 fps rate) in the following standards: XGA, SXGA, WXGA and HD Ready. A comparison of the presented implementation with similar implementations is shown in Table 3. The figure-of-merit parameters were calculated using Eqs. (25) and (26), where NU is the number of units (e.g. neurons), NC is the number of channels (e.g. the size of the inputs multiplied by the number of neurons), fS is the operating frequency and P is the power dissipation. The discussed implementation, compared with other notable implementations, stands out mainly due to its high operating frequency combined with a low power per channel.

FOM1 = NC · fS / P  (25)

FOM2 = NU · fS / P  (26)
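As a numerical cross-check of Eqs. (25) and (26): taking the pattern recognition network parameters (NC = 1098 channels, NU = 35 + 18 + 26 = 79 neurons, fS = 5 MHz, P = 826 mW; these values are assumed readings from Tables 2 and 3) gives figures of merit of roughly 6.65/nJ and 0.48/nJ:

```python
def fom1(nc, fs_hz, p_w):
    """Eq. (25): channels x frequency / power, in 1/J."""
    return nc * fs_hz / p_w

def fom2(nu, fs_hz, p_w):
    """Eq. (26): units (neurons) x frequency / power, in 1/J."""
    return nu * fs_hz / p_w

# Pattern recognition network (parameter values assumed from Tables 2 and 3)
f1 = fom1(nc=1098, fs_hz=5e6, p_w=0.826) / 1e9  # convert 1/J -> 1/nJ
f2 = fom2(nu=79, fs_hz=5e6, p_w=0.826) / 1e9
```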

Electrical parameters, e.g. the power consumption of a network working in the current mode, are not directly proportional to the complexity of a topology. For this reason, for the comparison with other works to be reliable, an additional simple network for mapping given non-linear functions was implemented. The power consumption of a network with a single input and output, consisting of 6 neurons with biases (16

Fig. 15. Neural network response for 26 subsequent samples. Next to the network responses, one can see precision values calculated based on Eq. (27).


Fig. 16. a) Sample elements of the pairs of classes used to determine the confusion matrix: (F,E), (X,Y), (C,B), (A,H). b) Confusion matrix obtained for the 76 sample models.

5. Conclusion

Table 2
Parameters of the network for the pattern recognition and function mapping tasks.

Parameter                                      | Pattern recogn. | Mapping func.
Technology [nm]                                | 180             | 180
Power supply [V]                               | 1.8             | 1.8
Size of the input vector                       | 35              | 1
Size of the output vector                      | 26              | 1
Number of neurons in the hidden layer          | 18              | 4
Number of multiplying circuits                 | 1098            | 16
Range of input currents [μA]                   | ±1              | ±1
Range of input currents of an axon [μA]        | ±30             | ±30
Range of input currents of a dendrite [μA]     | ±10             | ±10
Power consumption of the neural network [mW]   | 826             | 13.7
Processing speed                               | 5 MS/s          | 7.8 MHz
Precision [%]                                  | 93.28           | 94.62

Table 3
Comparison of the current implementation with other artificial neural network implementations.

Parameter

[29]

[30]

[31]

[32]

[33]

Current work Pattern recogn.

Technology [nm] Number of channels Power per channel [mW] fs [MHz] Area [mm2] Area per unit [mm2] FoM1 [1/nJ] FoM2 [1/nJ] #

350 8

500 100

350 40

65 2048

45 45

0.58 0.003 1.62 0.00174 3.3

1098 0.75

Mapping func. 180 8 0.00171

– – –

0.001 1.8 0.053# 1.2 5.17 1.48 0.12 0.32 0.72e-3

7.07 5# – 2.57 – 0.0023

7.8# 0.039 0.0049

– –

0.33 1.11 30.48# 0.033 0.44 1.9#

2.14 6.65# 0.86 0.48#

4.56# 3.42#

# The worst case.

References

dendrites in the network) was 13.7 mW at a 1.8 V power supply. The network worked in a balanced structure, including reduction of the concurrent component. The comparator proposed in [35] can be used as an ADC2 converter for the output signals.

precision_i = 100% · 2 · i_out,i / (26 · 10^−6 A + Σ_{k=1}^{26} i_out,k)  (27)

2 Analog-to-Digital Converter
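Eq. (27) can be transcribed directly. In the sketch below, the sign convention follows the text: the selected neuron reverses its current to about +1 μA while the remaining outputs stay near −1 μA (the helper name and the ideal test vector are illustrative):

```python
def precision_i(i_out, i):
    """Eq. (27): per-pattern precision of output neuron i.

    i_out : list of the 26 output currents [A]
    i     : index of the neuron expected to respond
    """
    return 100.0 * 2.0 * i_out[i] / (26e-6 + sum(i_out))

# Ideal response: neuron 0 reverses direction (+1 uA), the rest stay at -1 uA
outputs = [1e-6] + [-1e-6] * 25
p = precision_i(outputs, 0)  # approx. 100.0
```

A neuron response below −1 μA shrinks the denominator, which is why per-pattern precision values above 100% appear in Fig. 15.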

The paper introduces a method of implementing a high-speed current-mode perceptron, focusing on its use in analog-to-digital systems. It defines the design requirements of such systems and solves a number of problems concerning the implementation of neurons with a sigmoid activation function as analog circuits, with the possibility of implementing them in standard digital nanometer CMOS technologies. One of the major achievements is the proposal of a noise-limited axon prototype with an activation function that is continuous, monotonic and differentiable over the whole range. The work introduces an implementation of a dendrite with a fully digital interface for configuring weights. The prototype does not require bias currents or voltages with a number of DAC3 converters. The article presents a method of using the row strategy for generating a topography of neurons, concerning the fabrication of a network on a substrate common with the digital part of a mixed system. The author proposed a routing method eliminating the CMRR concurrent component, reducing power consumption and using solely the dendrite weight values. Significant attention was paid to the data processing accuracy of the network, which was proven by proposing circuits for removing the concurrent component and by performing a PVT analysis, an analysis of resistance to the mismatch process and a confusion matrix analysis. The obtained results are characterized by a high accuracy of the responses calculated by the network, which, apart from the physical parameters, is the main argument for using an analog circuit as an alternative to a digital one.


[1] M. Laiho, J. Hasler, J. Zhou, D. Chao, L. Wei, E. Lehtonen, J. Poikonen, FPAA/Memristor hybrid computing infrastructure, IEEE Trans. Circuits Syst. Regul. Pap. 62 (3) (2015). [2] S. Szczesny, A. Handkiewicz, M. Naumowicz, M. Melosik, FPAA accelerator for machine vision systems, Przegl. Elektrotech. 9 (91) (2015). [3] S. George, S. Kim, S. Shah, J. Hasler, M. Collins, F. Adil, R. Wunderlich, S. Nease, S. Ramakrishnan, A programmable and configurable mixed-mode FPAA SoC, IEEE Trans. Very Large Scale Integr. VLSI Syst. (99) (2016) 1–9 (vol. PP). [4] E. Vittoz, Analogue VLSI signal processing: why, where and how, Analog Integr. Circ. Sig. 6 (1994) 27–44. [5] G.M. Bo, D.D. Caviglia, M. Valle, A current mode CMOS multi-layer perceptron chip, IEEE Proc. MicroNeuro (1996). [6] A. Paasio, K. Halonen, V. Porra, A. Dawidziuk, Current mode cellular network with digitally adjustable template coefficients, Microelectronics for Neural Networks and Fuzzy Systems, IEEE, 1994. [7] K. Wawryn, B. Strzeszewski, Current mode circuits for programmable WTA neural network, Analog Integr. Circ. Sig. Process 27 (2001) 49–69. [8] T. Talaśka, R. Długosz, R. Wojtyna, Current Mode Analog Kohonen Neural Network, Mixdes, IEEE, 2007 250–255. [9] D. Maliuk, H.G. Stratigopoulos, Y. Makris, An analog VLSI multilayer perceptron and its application towards built-in self-test in analog circuits, On-Line Testing Symposium, IEEE 2010, pp. 71–76.

3 Digital-to-Analog Converter

[10] H. Hosseini-Nejad, A. Jannesari, A.M. Sodagar, J.N. Rodrigues, A 128-channel discrete cosine transform-based neural signal processor for implantable neural recording microsystems, Int. J. Circuit Theory Appl. 43 (2015) 489–501. [11] G. Doménech-Asensi, J.A. Díaz-Madrid, R. Ruiz-Merino, Synthesis of CMOS analog circuit VHDL-AMS descriptions using parameterizable macromodels, Int. J. Circuit Theory Appl. 41 (2013) 732–742. [12] S. Kaedi, E. Farshidi, A new low voltage four-quadrant current mode multiplier, 20th Iranian Conference on Electrical Engineering (ICEE), 2012, pp. 160–164 (Tehran, Iran). [13] H. Chible, Simulation of four-quadrant four transistors synapse analog multiplier, Int. J. Model. Simul. 35 (2) (2015) 49–56. [14] M.A. Al-Absi, A. Hussein, M.T. Abuelma'atti, A novel current-mode ultra-low-power analog CMOS four-quadrant multiplier, International Conference on Computer and Communication Engineering (ICCCE), 2012, pp. 13–17 (Kuala Lumpur, Malaysia). [15] A.S. Medina-Vazquez, L.N. Oliva-Moreno, GMOS four-quadrant analog multiplier, 9th International Conference on Electrical Engineering, Computing Science and Automatic Control (CCE), 2012, pp. 1–6 (Mexico City). [16] M. Kumngern, J. Chanwutitum, 0.75-V four-quadrant current multiplier using floating-gate MOS transistors, International Electrical Engineering Congress (iEECON), 2014, pp. 1–4 (Chonburi, Thailand). [17] M. Kumngern, M.S. Junnapiya, A CMOS four-quadrant current multiplier using electronically tunable CCII, International Conference on Advanced Technologies for Communications (ATC), 2013, pp. 366–369 (Ho Chi Minh City). [18] N.B. Modi, P.P. Gandhi, Four quadrant analog multiplier with VCVS in deep-submicron technology, IEEE Conference on Information and Communication Technologies (ICT), 2013, pp. 1091–1094 (Thuckalay, India). [19] D. Clein, CMOS IC Layout: Concepts, Methodologies, and Tools, Newnes, 1999 (ISBN 0-7506-7194-7). [20] G. Giustolisi, G. Palmisano, G. Palumbo, CMRR frequency response of CMOS operational transconductance amplifiers, IEEE Trans. Instrum. Meas. 49 (1) (2000). [21] S. Szczesny, Computer Tools for Layout Generation of Switched-Current Circuits (Ph.D. dissertation), Poznań University of Technology, Poznań, 2013. [22] C.M. Horwitz, M.D. Silver, Complementary current-mirror logic, IEEE J. Solid State Circuits 23 (1) (Feb. 1988) 91–97.


[23] A. Handkiewicz, P. Śniatała, G. Pałaszyński, S. Szczesny, P. Katarzyński, M. Melosik, M. Naumowicz, Automated DCT layout generation using AMPLE language, Proceedings of the 17th International Conference MIXDES, 2010, pp. 215–218. [24] A. Handkiewicz, S. Szczesny, M. Naumowicz, P. Katarzyński, M. Melosik, P. Śniatała, M. Kropidłowski, SI-Studio, a layout generator of current mode circuits, Expert Syst. Appl. 42 (6) (2015) 3205–3218. [25] S. Szczesny, M. Naumowicz, A. Handkiewicz, SI-Studio - environment for SI circuits design automation, Bull. Pol. Acad. Sci. Tech. Sci. 60 (4) (2012) 757–762. [26] A. Handkiewicz, P. Katarzyński, S. Szczesny, M. Naumowicz, M. Melosik, P. Śniatała, M. Kropidłowski, Design automation of a lossless multiport network and its application to image filtering, Expert Syst. Appl. 41 (5) (2014) 2211–2221. [27] R. Długosz, T. Talaśka, W. Pedrycz, Current-mode analog adaptive mechanism for ultra-low-power neural networks, IEEE Trans. Circuits Syst. Express Briefs 58 (1) (January 2011). [28] T. McConaghy, K. Breen, J. Dyck, A. Gupta, Variation-Aware Design of Custom Integrated Circuits: A Hands-On Field Guide, Springer, 2013. [29] G. Zatorre, M.T. Sanz, N. Medrano, P.A. Martinez, S. Celma, An analogue CMOS neural circuit for improved sensing, Ph.D. Research in Microelectronics and Electronics (PRIME), 2006, pp. 185–188. [30] D. Maliuk, H.G. Stratigopoulos, Y. Makris, An analog VLSI multilayer perceptron and its application towards built-in self-test in analog circuits, IEEE 16th International On-Line Testing Symposium, 2010, pp. 71–76. [31] J. Chen, T. Shibata, A neuron-MOS-based VLSI implementation of pulse-coupled neural networks for image feature generation, IEEE Trans. Circuits Syst. Regul. Pap. 57 (6) (2010). [32] S. Sasaki, M. Yasuda, H.J. Mattausch, Digital associative memory for word-parallel Manhattan-distance-based vector quantization, Proc. IEEE Eur. Solid-State Circuit Conf., Bordeaux, France, Sep. 2012, pp. 185–188. [33] J.R. Shinde, S. Salankar, Multi-objective optimization for VLSI implementation of artificial neural network, Advances in Computing, Communications and Informatics (ICACCI), International Conference, 2015, pp. 1694–1700. [34] R. Kohavi, F. Provost, Glossary of terms, Mach. Learn. 30 (2/3) (1998) 271–274. [35] P. Śniatała, M. Naumowicz, A. Handkiewicz, S. Szczesny, J. de Melo, N. Paulino, J. Goes, Current mode sigma-delta modulator designed with the help of transistor's size optimization tool, Bull. Pol. Acad. Sci. Tech. Sci. 63 (4) (2015) 919–922.