Applied Soft Computing 49 (2016) 603–610
Contents lists available at ScienceDirect
Applied Soft Computing journal homepage: www.elsevier.com/locate/asoc
Multi-objective optimization applied to unified second level cache memory hierarchy tuning aiming at energy and performance optimization
Filipe Rolim Cordeiro ∗, Abel Guilhermino da Silva-Filho
Informatic Center, Universidade Federal de Pernambuco, CEP: 50740-540 Recife, PE, Brazil
Article info
Article history: Received 20 January 2014; Received in revised form 12 January 2016; Accepted 2 September 2016; Available online 7 September 2016
Keywords: Embedded systems; Multi-objective optimization; Energy consumption; Evolutionary algorithms
Abstract
Cache memory optimization has an important impact on the energy consumption of an embedded system. However, optimization is a hard task due to the large exploration space and conflicting objectives. In this work, five multi-objective optimization techniques are applied to cache memory optimization. The PESA-II, NSGAII, SPEA2, PAES and NPGA approaches were applied to 18 different applications from the MiBench and PowerStone benchmark suites. The quality of the results was compared in terms of the metrics of generational distance, diversity, hypervolume and precision. All techniques performed well for cache optimization, but PESA-II showed better performance for all metrics analyzed, achieving the best results in 83% and 88% of cases for the metrics of generational distance and hypervolume, respectively. Additionally, PESA-II needs to explore only 1.47% of the exploration space to find solutions near the Pareto Optimal. © 2016 Elsevier B.V. All rights reserved.
1. Introduction
One of the main aspects of optimization in recent projects is related to the energy consumption of an embedded system [1,2]. It is known that in a microprocessor system, one of the main factors responsible for energy consumption is the cache memory hierarchy, which can consume up to 50% of the energy required by the complete system [3,4]. Therefore, tuning a memory hierarchy has an important impact on a processor's total energy and, consequently, directly influences the energy consumed by the embedded system. The main purpose of a cache subsystem is to provide high performance access to memory, and cache optimization should not only save energy, but also avoid degrading the application's performance. Studies reveal that tuning cache memory parameters for a specific application can save on average 60% of the energy consumption [5]. However, each application usually requires a specific kind of architecture, and the search for optimal configurations can be costly due to the size of the exploration space. In most commercially used memory hierarchies, with a unified
∗ Corresponding author. E-mail addresses:
[email protected] (F.R. Cordeiro),
[email protected] (A.G. da Silva-Filho). http://dx.doi.org/10.1016/j.asoc.2016.09.006 1568-4946/© 2016 Elsevier B.V. All rights reserved.
second level cache memory, the exploration space can involve thousands of configurations, due to the interdependency between the instruction and the data cache [6]. In those cases, finding optimal architecture configurations in the least time possible becomes a big challenge, since an exhaustive search of the design space is practically infeasible. Several optimization techniques have been proposed in recent years with the objective of finding a set of optimal solutions while analyzing only a small subregion of the total exploration space. Among those techniques, one that has been successfully applied is the multi-objective approach, which seeks a set of solutions located in the optimal region of the exploration space. Multi-objective optimization algorithms are well suited to embedded systems because design problems in this area usually present the typical characteristics of problems that those algorithms can solve. Among the main characteristics is the search for configurations in a large exploration space, involving design restrictions and aiming to optimize attributes defined by the designer. Despite the large number of existing optimization algorithms, few works have addressed the optimization of cache memory hierarchies with a unified second level. In this work, five multi-objective optimization algorithms were applied to the optimization of cache memory with a unified second level. The algorithms were applied aiming to optimize the energy consumption and the number of cycles necessary
F.R. Cordeiro, A.G. da Silva-Filho / Applied Soft Computing 49 (2016) 603–610
to execute an application. A set of 18 applications was analyzed to validate this work.
2. Background
Different strategies have been developed to solve multi-objective problems such as memory hierarchy optimization. One approach that has succeeded in parameter optimization is genetic algorithms (GAs), introduced by Holland [7]. This strategy is based on the evolution of a group of individuals, representing the solutions of a problem, which are submitted to a selection procedure to find the individuals best adapted to the problem. The set of equally optimal best solutions in the exploration space is called the Pareto Optimal, whereas the best solution set found by the algorithm is called the Pareto Front. In GAs, each candidate solution is mapped as an individual of a population. The population creates new individuals through genetic recombination and selection, which generates a new population. Based on the GA approach, several techniques were proposed in the literature, such as NPGA [8], PAES [9], NSGAII [10], PESA-II [11] and SPEA2 [12], which vary in terms of the selection operators and parameters used to approach the Pareto Optimal. In NPGA, the selection mechanism is based on Pareto dominance, where a niche counter indicates the number of neighbor solutions, such that the region density can be used as a selection criterion. In PAES, just one individual is produced at a time and compared with its mutated solution. A grid is used to control the density of the best solutions. NSGAII uses the concept of non-dominance to select individuals, separating the solutions into fronts. The crowding distance metric is used to select the best individuals of a non-dominated solution set. PESA-II uses the concept of hypergrids to separate solutions by density but, unlike PAES, the whole population is submitted to the crossover and mutation operators.
SPEA2 also uses a neighbor density strength measure to select the individuals of the best solution set, following the basic sequence of GAs. Silva-Filho et al. [13] apply the NSGAII algorithm to the problem of memory hierarchy tuning. The technique was applied to a set of 12 applications, trying to optimize the energy consumption and the number of cycles necessary to execute an application. The results were compared with the techniques TECH-CYCLES and TEMGA, and the technique showed good results in 67% of the analyzed cases. Palermo et al. [14] use the Discrete Particle Swarm Optimization (DPSO) algorithm for two-level memory hierarchies with a unified second level. It is based on the traditional particle swarm optimization (PSO) algorithm, adapted to the cache memory problem. In the results, DPSO is compared only with the exhaustive approach, accelerating the search by 5 times with a precision of 70%. Gordon-Ross et al. [6] present the ACE-AWT heuristic, also for caches with a unified second level, aiming to reduce energy consumption. This approach is based on the concatenation method, which allows the banks of memory to be logically concatenated, allowing
different configurations of memory hierarchies. Results were compared with the SERP heuristic, achieving a 61% energy reduction. In this work the techniques NPGA, PAES, NSGAII, PESA-II and SPEA2 were applied; they were selected based on a previous study of the multi-objective approaches most used in the literature. The techniques were adapted to optimize the parameters of a cache hierarchy with a unified second level.
3. Experimental environment
3.1. Architecture specification
The architecture used in this work is composed of a MIPS processor, a level 1 instruction cache (IC), a level 1 data memory (DM), a level 2 unified cache (L2) and a main memory (MEM), as shown in Fig. 1. It uses an input voltage of 1.7 V, a write-through write policy and 70 nm transistor technology. Commercial cache configurations used in embedded systems applications were adapted to the exploration space. The parameters used to adjust the architecture configuration were cache size, line size and associativity, for each cache component. The ranges of the parameters are shown in Table 1: each parameter of a cache component takes one of three values. The full set of combinations of Table 1, which represents the exploration space, is composed of 9084 valid configurations of the cache hierarchy. During the exploration process, the optimization techniques tune the cache parameters, trying to find the cache configuration with the best tradeoff between consumed energy and cycles necessary to run each application. In this work, we used 18 applications from the MiBench [15] and PowerStone [16] benchmarks. The applications cover different areas, allowing the validation of the optimization techniques in different exploration contexts. The behavior of the architecture, in terms of consumed energy and cycles, changes depending on which application is running.
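To give a feel for the size of the exploration space described above, the following minimal sketch enumerates every raw combination of the Table 1 ranges for the three cache components. The constant and function names are illustrative only. The rule that reduces the raw combinations to the 9084 valid configurations reported in the text is not stated in this section, so no validity filter is applied here.

```python
from itertools import product

# Parameter ranges copied from Table 1 (sizes in bytes).
L1_SIZES = (2048, 4096, 8192)
L2_SIZES = (16384, 32768, 65536)
LINE_SIZES = (8, 16, 32)
ASSOCIATIVITIES = (1, 2, 4)


def raw_configurations():
    """Return an iterator over every (IC, DM, L2) combination.

    Each component is a (cache size, line size, associativity) triple.
    """
    level_one = list(product(L1_SIZES, LINE_SIZES, ASSOCIATIVITIES))
    level_two = list(product(L2_SIZES, LINE_SIZES, ASSOCIATIVITIES))
    return product(level_one, level_one, level_two)


raw_count = sum(1 for _ in raw_configurations())
print(raw_count)  # 27 * 27 * 27 combinations before any validity filtering
```

The raw count is larger than 9084, which suggests that some combinations are excluded as invalid; the exclusion criteria would have to be taken from the original experimental setup.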
The values of total energy consumption and number of cycles necessary to run each application for a specific configuration were obtained with the tools SimpleScalar [17] and eCACTI [18]. The cache memory energy model used supports both energy components: static and dynamic.
3.2. Metrics
Metrics are used to measure characteristics of each multi-objective algorithm, helping to understand its behavior and allowing more concrete evaluations of its performance. The metrics are also an important basis of comparison between algorithms, since it is often hard to notice which algorithm presents a better solution set for a problem. The metrics used in this work are described below. The metric of generational distance was proposed by Van Veldhuizen and Lamont [19] and measures the Euclidean distance between the solutions found and the Pareto Optimal. The generational distance is calculated by the following equation:

$$GD = \frac{\sqrt{\sum_{i=1}^{s} d_i^{2}}}{s}, \tag{1}$$
Table 1 Range of cache parameters.

| Parameter | Level one cache | Level two cache |
|---|---|---|
| Cache size | 2KB, 4KB, 8KB | 16KB, 32KB, 64KB |
| Line size | 8B, 16B, 32B | 8B, 16B, 32B |
| Associativity | 1, 2, 4 | 1, 2, 4 |

Fig. 1. Cache memory hierarchy.
Fig. 2. Solution mapping. Example vector: IC (cache size 2048, line size 16, associativity 4), DM (cache size 4096, line size 16, associativity 4), L2 (cache size 16384, line size 32, associativity 2).
where $s$ is the number of non-dominated solutions in the Pareto Front and $d_i$ is the smallest Euclidean distance between solution $i$ and the Pareto Optimal. The diversity metric [20] measures how uniformly the solutions are distributed along the Pareto Front. The diversity metric is calculated through the following equation:

$$\Delta = \frac{\sum_{m=1}^{M} d_m^{e} + \sum_{i=1}^{s-1} \lvert d_i - \bar{d} \rvert}{\sum_{m=1}^{M} d_m^{e} + (s-1)\,\bar{d}}, \tag{2}$$

where $d_i$ is the distance of solution $i$ to its neighbor solutions, $s$ is the number of solutions in the Pareto Front, $M$ is the number of objectives to be optimized and $\bar{d}$ represents the average value of all $d_i$ distances. The $d_i$ value can be represented as the Euclidean distance between the $i$th and the $(i+1)$th solutions. The $d_m^{e}$ term represents the distance between the extreme solution of the Pareto Front and the nearest solution, in the $m$th objective. For an ideal distribution of the solutions, $\Delta$ is equal to zero. The hypervolume [21] indicator calculates the volume of the region covered between the points of the solutions of the Pareto Front and a reference point. The hypervolume can be calculated through the following equation:

$$HV = \mathrm{volume}\left(\bigcup_{i \in Q} v_i\right), \tag{3}$$

where, for each solution $i$ belonging to the Pareto Front $Q$, a hypercube $v_i$ is constructed based on a reference point. Therefore, the hypervolume is calculated as the union of the hypercubes $v_i$ of the solutions of the Pareto Front. The reference point can be defined by constructing a vector with the worst values of each objective function. For this metric, larger values are better, because a high hypervolume indicates a wide spread between the extreme solutions of the Pareto Front. It also indicates better convergence, because the more the algorithm converges, the larger the volume calculated by this metric. The metric of precision describes the accuracy of the algorithm, calculating the number of solutions from the Pareto Front that are in the Pareto Optimal. This indicates the accuracy rate of finding solutions in the Pareto Optimal.

Fig. 3. Project Flow. Flowchart boxes: Start, Initialize Population, Crossover, Mutate, Simulate (each solution of population) with SimpleScalar and eCACTI to obtain Energy and Cycles, Calculate Solution Fitness, Selection, Stop criterion (Yes/No), Final Solutions, Metrics, Results.

4. Optimization mechanism
4.1. Solution mapping
The exploration space, also called search space or decision space, corresponds to the space of possible configurations that a solution can assume. For the problem of optimizing a memory hierarchy with a unified second level, the search space corresponds to the possible cache configurations involving the parameters of cache size, line size and associativity for the instruction cache, data cache and second level cache. Each solution in the search space is mapped to a space of objectives, which represents the objectives to be optimized. In the problem considered, each configuration is mapped to a value of energy and cycles. The mapping process represents a function $f : D \to I$, whose domain is the set of possible cache configurations and whose image is the number of cycles and the energy necessary to run an application with those cache parameters. This mapping is done through the SimpleScalar and eCACTI simulations, which provide the values of energy and cycles for a given cache configuration. The mapping of solutions describes how each solution of the problem is represented. In the problem of cache memory hierarchy optimization, each solution represents a configuration of the cache memory architecture. The solutions are mapped as a vector of integer numbers, where each vector position represents a parameter of the memory hierarchy, as shown in Fig. 2. The solution vector is divided into three regions, which are related to the cache components: level one instruction cache (IC), level one data memory (DM) and unified second level (L2). For each region three cache parameters are defined: cache size, line size and associativity. Fig. 2 shows an example with the following memory architecture configuration: level one instruction cache with cache size of 2048 bytes, line size of 16 bytes and associativity of 4; level one data cache with cache size of 4096 bytes, line size of 16 bytes and associativity of 4; and second level cache with cache size of 16,384 bytes, line size of 32 bytes and associativity of 2. A value of energy consumption and number of cycles is associated with each solution. The mutation operator used was random mutation, in which each vector position has a mutation probability and the value of that position is varied randomly within its range of possible configurations. The type of crossover used was two-point crossover, which randomly selects two cut points to perform the genetic recombination between the solutions of the problem.
4.2. Project Flow
The Project Flow shows the steps performed to implement and simulate each technique. Fig. 3 describes the generalized simulation process used for all techniques.
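The solution encoding and the two genetic operators described in Section 4.1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `DOMAINS`, `random_solution`, `random_mutation` and `two_point_crossover` are hypothetical, the position order follows the example of Fig. 2, and no check for valid configurations is included because the validity rule is not given in the text.

```python
import random

# Allowed values per vector position, following Table 1 and the
# IC | DM | L2 layout of Fig. 2 (cache size, line size, associativity).
LEVEL_ONE = [(2048, 4096, 8192), (8, 16, 32), (1, 2, 4)]
LEVEL_TWO = [(16384, 32768, 65536), (8, 16, 32), (1, 2, 4)]
DOMAINS = LEVEL_ONE + LEVEL_ONE + LEVEL_TWO  # nine positions in total

# The configuration used as the example in Fig. 2.
FIG2_EXAMPLE = [2048, 16, 4, 4096, 16, 4, 16384, 32, 2]


def random_solution(rng):
    """Draw one value per position from that position's domain."""
    return [rng.choice(domain) for domain in DOMAINS]


def random_mutation(solution, gene_probability, rng):
    """Resample each position independently with probability gene_probability."""
    return [
        rng.choice(DOMAINS[i]) if rng.random() < gene_probability else value
        for i, value in enumerate(solution)
    ]


def two_point_crossover(parent_a, parent_b, rng):
    """Exchange the segment between two random cut points."""
    first, second = sorted(rng.sample(range(1, len(parent_a)), 2))
    child_a = parent_a[:first] + parent_b[first:second] + parent_a[second:]
    child_b = parent_b[:first] + parent_a[first:second] + parent_b[second:]
    return child_a, child_b


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
mate = random_solution(rng)
child_a, child_b = two_point_crossover(FIG2_EXAMPLE, mate, rng)
mutant = random_mutation(child_a, 0.11, rng)
```

Because every position is resampled from its own domain, a mutated or recombined vector always stays inside the ranges of Table 1; whether it is one of the valid configurations would still need to be checked separately.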
Some techniques vary in the calculation of the fitness value, the selection step or the stop criterion, but the general process is quite similar. At the beginning of the
process, the algorithm generates a random population, in which each individual represents a valid cache configuration. Next, the crossover and mutation operators are applied to the individuals of the population. After that, the individuals (cache configurations) are simulated to obtain the values of energy and cycles necessary to run the application with each cache configuration. The SimpleScalar and eCACTI tools are used to obtain the energy and cycles data. Next, the fitness value of each solution is calculated and all the solutions are submitted to a selection process to generate the new population. If the stop criterion is achieved, the solutions from the last generation are taken as the final solution set. To evaluate the results, we applied the metrics described previously to measure the quality of the solutions obtained by each technique.
NPGA presents a tournament selection mechanism based on Pareto dominance. In this tournament, two individuals are selected randomly and compared to a subpopulation of pre-defined size. If one candidate is dominated by the comparison set and the other is not, then the latter is selected. If both or neither of the candidates are dominated by the subpopulation, then the winner is selected based on a niche counter. The niche counter counts the number of neighbor solutions that are close to the solution being analyzed. Thus, the niche counter evaluates the density of close solutions, such that the region density of a solution can be used as a selection criterion for the individuals of a population.
In PAES the concept of population is not used. Each individual is mutated and compared with the original solution. The better one is selected to compose the solution set. Then this solution is mutated again and the process continues until a stop criterion is reached. As the solution set has a limited size, old solutions in the set can be replaced by new ones.
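The NPGA tournament described above can be sketched as follows. This is a simplified illustration, not the original NPGA code: the niche counter is taken as a plain count of neighbors within a radius `sigma_share`, both objectives are assumed to be minimized and scaled to comparable ranges, and all function names are hypothetical.

```python
import math


def dominates(a, b):
    """True when a is no worse than b in every objective and better in one.

    Both objectives (energy, cycles) are assumed to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def niche_count(candidate, population, sigma_share):
    """Count population members closer than sigma_share in objective space."""
    return sum(1 for other in population if math.dist(candidate, other) < sigma_share)


def npga_tournament(first, second, comparison_set, population, sigma_share):
    """Pick a winner between two candidates using dominance, then density."""
    first_dominated = any(dominates(c, first) for c in comparison_set)
    second_dominated = any(dominates(c, second) for c in comparison_set)
    if first_dominated and not second_dominated:
        return second
    if second_dominated and not first_dominated:
        return first
    # Tie (both or neither dominated): prefer the less crowded candidate.
    if niche_count(first, population, sigma_share) <= niche_count(second, population, sigma_share):
        return first
    return second
```

In a tie, the candidate with fewer close neighbors wins, which pushes the search toward sparsely covered parts of the front.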
NSGAII follows the flow of Fig. 3, but its selection mechanism is based on the concept of non-dominance, using the crowding distance metric to select the best individuals of a non-dominated solution set.
PESA-II presents a selection strategy based on hypergrids. The hypergrid is a division of the exploration space into cells with ranges proportional to each dimension, composing the hypercubes. Analyzing those regions reveals the density of the solutions, which helps the selection process and the update of the external archive. Initially, the N individuals that compose the initial population are randomly generated. From these individuals, the non-dominated ones are moved to the external archive. If the stop criterion is not reached, a new population of size N is created by selecting from the external archive, using the hypercube density as the selection criterion. Solutions found in less dense regions have a higher probability of being chosen. The process is repeated until the stop criterion is reached.
SPEA2 uses an external archive to store the best solutions found through the generations. It is based on the concepts of dominance and neighbor density to evaluate the fitness of the solutions found. The neighbor density shows which individuals are more representative of the final solution set. Thus, the density is used as a decision criterion between individuals with the same level of dominance.
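The density-based selection described for PESA-II can be illustrated with the sketch below. It is a simplification under stated assumptions rather than the published PESA-II procedure: the grid is built over the objective values of the current archive, each member is weighted by the inverse of its cell occupancy so that every occupied cell is equally likely to be chosen, and the names `grid_cell` and `select_from_archive` are hypothetical.

```python
import random
from collections import Counter


def grid_cell(point, lows, highs, divisions):
    """Map an objective vector to the index of its hypergrid cell."""
    cell = []
    for value, low, high in zip(point, lows, highs):
        width = (high - low) / divisions or 1.0  # avoid zero width on a flat axis
        cell.append(min(int((value - low) / width), divisions - 1))
    return tuple(cell)


def select_from_archive(archive, divisions, rng):
    """Draw one archive member, favoring members of sparsely filled cells."""
    dims = range(len(archive[0]))
    lows = [min(p[k] for p in archive) for k in dims]
    highs = [max(p[k] for p in archive) for k in dims]
    cells = [grid_cell(p, lows, highs, divisions) for p in archive]
    density = Counter(cells)
    # A member sharing its cell with many others gets a smaller weight.
    weights = [1.0 / density[cell] for cell in cells]
    return rng.choices(archive, weights=weights, k=1)[0]
```

With three archive members crowded into one cell and one isolated member, the isolated member is drawn about half of the time, which matches the stated preference for less dense regions.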
5. Results
This section presents the results obtained with the multi-objective optimization techniques, aiming to optimize the cache memory hierarchy according to the exploration space defined in the experimental environment. The optimization techniques were applied to explore the cache hierarchy for 18 applications, trying to obtain cache configurations with the lowest values of energy consumption and number of cycles necessary to run the applications.
As the techniques are nondeterministic, each technique execution was repeated 30 times to obtain average results. Each optimization technique has configuration parameters that influence its behavior in finding the best solution to the problem. The parameters of each implemented technique were based on those of its respective article. For NSGAII, probabilities of crossover, gene mutation and individual mutation of 50%, 20% and 20%, respectively, were applied. For SPEA2, those probabilities were 80%, 11% and 10%, according to the article that proposed it. The external archive size used in SPEA2 was 10 individuals. In PAES, the values of 80%, 11%, 10%, 10 and 5 were used for the crossover probability, gene mutation probability, individual mutation probability, external archive size and grids per axis, respectively. In PESA-II, the probability of crossover was 70%, while the other parameters were the same as in PAES. Finally, in NPGA the probabilities of crossover, gene mutation and individual mutation were 90%, 11% and 10%, respectively, and the sigma share value was 0.1. For all techniques, a population size of 10 individuals was used. The choice of population size was based on the work of Silva-Filho et al. [13], in which NSGAII was applied to other kinds of cache memory hierarchy and a population of 10 individuals was recommended. The crossover probability was high in most techniques; for NSGAII, however, a crossover rate of 50% was applied, as suggested in [13]. Each stop criterion was based on the number of simulations, which was defined so that the algorithm continues its simulation even after it reaches a convergence point.
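For reference, the parameter values listed above can be collected in one place. The sketch below is only a convenient restatement of those values; the `Settings` class and its field names are hypothetical and do not come from any of the cited implementations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    """Parameter values reported in the text for one technique."""
    crossover: float
    gene_mutation: float
    individual_mutation: float
    archive_size: Optional[int] = None
    grids_per_axis: Optional[int] = None
    sigma_share: Optional[float] = None
    population: int = 10  # same population size for all techniques


SETTINGS = {
    "NSGAII": Settings(0.50, 0.20, 0.20),
    "SPEA2": Settings(0.80, 0.11, 0.10, archive_size=10),
    "PAES": Settings(0.80, 0.11, 0.10, archive_size=10, grids_per_axis=5),
    "PESA-II": Settings(0.70, 0.11, 0.10, archive_size=10, grids_per_axis=5),
    "NPGA": Settings(0.90, 0.11, 0.10, sigma_share=0.1),
}

RUNS_PER_TECHNIQUE = 30  # each technique execution was repeated 30 times
```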
This study was performed to analyze the quality of the solutions in cases in which the number of simulations is not critical and the search for optimal solutions can continue. As each technique has a different convergence speed, we used a high number of iterations to evaluate the quality of the solutions after each technique has converged. The results were based on the metrics of generational distance, diversity, hypervolume and precision. Table 2 shows the results for the metric of generational distance. The first column of Table 2 shows each application of the benchmark, whereas the following columns show the results of the generational distance metric for each algorithm. The last column shows the best algorithm for each application, that is, the one with the best value in that row. The metric of generational distance measures the proximity of the final solution set to the Pareto Optimal set, so lower values are better. As can be observed, PESA-II presented the lowest generational distance in most cases, being the best technique in 83% of the applications. That means that the solutions found by PESA-II have a better tradeoff between the objectives being optimized when compared to the other techniques. A similar analysis was done for the metric of diversity, as shown in Table 3. According to Table 3, the techniques presenting the best results for the metric of diversity are SPEA2 and PESA-II, each presenting the best results in 44% of cases. The column named Best indicates which technique was the best for each application. The metric of diversity shows how uniformly the solutions found by each technique are distributed. Solutions uniformly distributed along the Pareto Front are more interesting because they differ more from each other, being equally good but with different tradeoffs.
For this metric, PESA-II also presented good results for all applications, with results as good as those of SPEA2. A comparison for the hypervolume metric is shown in Table 4.
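The generational distance and diversity values discussed above follow Eqs. (1) and (2). A minimal sketch of both computations is given below, assuming two objectives and solutions given as (energy, cycles) tuples. The reading of the extreme-distance term as the gap between each end of the Pareto Optimal and the corresponding end of the front is an assumption, and the function names are illustrative.

```python
import math


def generational_distance(front, pareto_optimal):
    """Eq. (1): square root of the summed squared nearest distances, over s."""
    nearest = [min(math.dist(p, q) for q in pareto_optimal) for p in front]
    return math.sqrt(sum(d * d for d in nearest)) / len(nearest)


def diversity(front, pareto_optimal):
    """Eq. (2) for two objectives; needs at least two distinct front solutions."""
    front = sorted(front)
    optimal = sorted(pareto_optimal)
    # d_i: distance between consecutive solutions along the front.
    gaps = [math.dist(a, b) for a, b in zip(front, front[1:])]
    mean_gap = sum(gaps) / len(gaps)
    # Assumed reading of the extreme term: gap between each extreme of the
    # optimal set and the corresponding boundary solution of the front.
    extremes = math.dist(optimal[0], front[0]) + math.dist(optimal[-1], front[-1])
    spread = sum(abs(g - mean_gap) for g in gaps)
    return (extremes + spread) / (extremes + len(gaps) * mean_gap)


optimal = [(0, 4), (1, 3), (2, 2), (3, 1), (4, 0)]
uneven = [(0, 4), (1, 3), (4, 0)]
print(generational_distance(uneven, optimal), diversity(uneven, optimal))
```

An evenly spaced front that coincides with the Pareto Optimal gives zero for both metrics, while the uneven front above keeps a generational distance of zero but a nonzero diversity value, which is why the two metrics are reported separately.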
Table 2 Results of the generational distance metric applied to the solutions found by different multi-objective techniques for 18 applications.

| Application | NSGAII | SPEA2 | PESA-II | PAES | NPGA | Best |
|---|---|---|---|---|---|---|
| basicmath large | 7.81E+09 | 2.04E+10 | 2.34E+09 | 6.07E+11 | 1.80E+10 | PESA-II |
| basicmath small | 5.38E+07 | 3.12E+08 | 4.26E+07 | 3.28E+07 | 3.26E+07 | NPGA |
| bitcount large | 3.42E+05 | 3.39E+05 | 1.28E+05 | 1.36E+05 | 1.53E+05 | PESA-II |
| bitcount small | 2.02E+04 | 2.42E+04 | 1.80E+04 | 1.68E+04 | 5.50E+04 | PAES |
| crc32 large | 4.19E+05 | 2.64E+05 | 2.02E+05 | 5.73E+05 | 2.68E+05 | PESA-II |
| crc32 small | 2.66E+04 | 3.49E+04 | 5.71E+03 | 6.24E+04 | 1.86E+04 | PESA-II |
| djkstra large | 1.49E+07 | 9.48E+06 | 1.85E+06 | 2.48E+07 | 2.97E+06 | PESA-II |
| djkstra small | 9.70E+05 | 5.85E+05 | 1.12E+05 | 7.44E+05 | 2.56E+05 | PESA-II |
| fft large | 6.57E+08 | 1.38E+09 | 1.35E+08 | 1.18E+09 | 2.98E+08 | PESA-II |
| fft small | 6.36E+06 | 4.47E+06 | 2.24E+06 | 1.02E+07 | 5.22E+06 | PESA-II |
| patricia large | 2.79E+09 | 4.67E+10 | 7.71E+08 | 1.36E+10 | 1.05E+09 | PESA-II |
| patricia small | 5.77E+08 | 9.20E+08 | 2.74E+07 | 4.21E+07 | 2.54E+08 | PESA-II |
| qsort large | 3.45E+07 | 5.54E+07 | 4.05E+06 | 7.96E+07 | 1.98E+07 | PESA-II |
| qsort small | 2.58E+08 | 6.17E+07 | 2.99E+04 | 4.42E+06 | 1.14E+06 | PESA-II |
| sha large | 8.09E+05 | 2.43E+05 | 4.62E+05 | 6.20E+05 | 9.33E+05 | SPEA2 |
| sha small | 2.33E+05 | 1.85E+05 | 1.33E+04 | 9.84E+04 | 6.07E+04 | PESA-II |
| susan large | 4.42E+05 | 1.32E+06 | 3.51E+05 | 1.26E+06 | 5.56E+05 | PESA-II |
| susan small | 2.05E+04 | 2.32E+04 | 7.73E+03 | 4.12E+04 | 1.84E+04 | PESA-II |
| average | 6.78E+08 | 3.88E+09 | 1.85E+08 | 3.45E+10 | 1.09E+09 | PESA-II |
Table 3 Results of diversity metric applied to the solutions found by different multi-objective techniques in 18 applications.

| Application | NSGAII | SPEA2 | PESA-II | PAES | NPGA | Best |
|---|---|---|---|---|---|---|
| basicmath large | 1.39 | 0.98 | 0.93 | 1.17 | 1.50 | PESA-II |
| basicmath small | 0.95 | 0.89 | 0.85 | 1.37 | 1.30 | PESA-II |
| bitcount large | 1.21 | 1.20 | 0.93 | 1.13 | 1.19 | PESA-II |
| bitcount small | 1.07 | 1.07 | 0.80 | 1.16 | 1.08 | PESA-II |
| crc32 large | 1.03 | 1.07 | 1.18 | 1.27 | 1.11 | NSGAII |
| crc32 small | 1.04 | 0.95 | 1.28 | 1.13 | 1.12 | SPEA2 |
| djkstra large | 0.78 | 0.75 | 0.96 | 1.26 | 1.07 | SPEA2 |
| djkstra small | 0.89 | 0.53 | 0.80 | 1.07 | 0.86 | SPEA2 |
| fft large | 1.10 | 1.17 | 1.02 | 1.40 | 1.25 | PESA-II |
| fft small | 0.88 | 0.71 | 0.83 | 1.39 | 1.11 | SPEA2 |
| patricia large | 1.20 | 1.10 | 0.87 | 1.17 | 1.31 | PESA-II |
| patricia small | 1.26 | 1.17 | 0.86 | 1.15 | 1.35 | PESA-II |
| qsort large | 1.10 | 0.87 | 1.14 | 1.40 | 1.04 | SPEA2 |
| qsort small | 1.21 | 1.04 | 0.74 | 1.18 | 1.21 | PESA-II |
| sha large | 0.97 | 1.18 | 1.08 | 1.36 | 1.16 | NSGAII |
| sha small | 1.16 | 0.81 | 1.07 | 1.30 | 1.08 | SPEA2 |
| susan large | 1.03 | 0.99 | 1.05 | 1.27 | 1.06 | SPEA2 |
| susan small | 1.07 | 0.95 | 1.16 | 1.10 | 0.98 | SPEA2 |
| average | 1.07 | 0.97 | 0.98 | 1.24 | 1.15 | SPEA2 |
Table 4 Results of hypervolume metric applied to the solutions found by different multi-objective techniques for 18 applications.

| Application | NSGAII | SPEA2 | PESA-II | PAES | NPGA | Best |
|---|---|---|---|---|---|---|
| basicmath large | 3.60E+36 | 3.61E+36 | 3.63E+36 | 3.62E+36 | 3.62E+36 | PESA-II |
| basicmath small | 2.70E+30 | 2.68E+30 | 2.72E+30 | 2.71E+30 | 2.70E+30 | PESA-II |
| bitcount large | 4.01E+22 | 4.01E+22 | 4.08E+22 | 4.03E+22 | 4.06E+22 | PESA-II |
| bitcount small | 1.95E+20 | 1.94E+20 | 1.97E+20 | 1.94E+20 | 1.96E+20 | PESA-II |
| crc32 large | 4.23E+26 | 4.23E+26 | 4.28E+26 | 4.19E+26 | 4.28E+26 | NPGA |
| crc32 small | 3.12E+21 | 3.08E+21 | 3.13E+21 | 3.01E+21 | 3.12E+21 | PESA-II |
| djkstra large | 1.92E+27 | 1.93E+27 | 1.95E+27 | 1.89E+27 | 1.95E+27 | PESA-II |
| djkstra small | 7.80E+24 | 7.87E+24 | 8.01E+24 | 7.80E+24 | 7.97E+24 | PESA-II |
| fft large | 1.35E+32 | 1.35E+32 | 1.36E+32 | 1.36E+32 | 1.36E+32 | PESA-II |
| fft small | 2.89E+28 | 2.91E+28 | 2.95E+28 | 2.93E+28 | 2.94E+28 | PESA-II |
| patricia large | 1.43E+33 | 1.43E+33 | 1.46E+33 | 1.45E+33 | 1.45E+33 | PESA-II |
| patricia small | 1.21E+30 | 1.21E+30 | 1.22E+30 | 1.21E+30 | 1.22E+30 | PESA-II |
| qsort large | 4.15E+29 | 4.19E+29 | 4.23E+29 | 4.22E+29 | 4.19E+29 | PESA-II |
| qsort small | 4.49E+26 | 4.50E+26 | 4.53E+26 | 4.50E+26 | 4.52E+26 | PESA-II |
| sha large | 8.09E+26 | 8.15E+26 | 8.17E+26 | 8.13E+26 | 8.13E+26 | PESA-II |
| sha small | 7.11E+22 | 7.06E+22 | 7.21E+22 | 7.17E+22 | 7.18E+22 | PESA-II |
| susan large | 9.39E+25 | 9.40E+25 | 9.42E+25 | 9.41E+25 | 9.42E+25 | PESA-II |
| susan small | 1.29E+21 | 1.29E+21 | 1.29E+21 | 1.28E+21 | 1.29E+21 | NPGA |
| average | 2.00E+35 | 2.01E+35 | 2.02E+35 | 2.01E+35 | 2.01E+35 | PESA-II |
In Table 4, PESA-II obtained the best results in most cases, corresponding to 89% of the applications analyzed. The hypervolume measures, at the same time, the convergence to the Pareto Optimal and the spread along it. The values of the hypervolume are close to each other because the region calculated is very large. Therefore, a small improvement in convergence to the Pareto Optimal results in a small change in the calculated hypervolume. However, the differences between the values of the techniques are significant enough to identify
Fig. 4. Comparison between PAES, NSGAII, NPGA, SPEA2 and PESA-II for six applications.
the techniques which achieved the best results. In Table 4, it was not possible to show all digits of each hypervolume value; however, the Best column indicates the highest value for each application, considering all digits of each hypervolume value. A similar analysis is done for the metric of precision and is shown in Table 5. In Table 5, PESA-II obtained the best results for the metric of precision, obtaining the best precision in 89% of the cases. The precision metric is used to analyze the number of solutions in the final solution set which are present in the Pareto Optimal. Therefore, these results mean that PESA-II is more likely, on average, to find solutions that are in the Pareto Optimal than the other techniques analyzed. Based on the analysis performed, PESA-II obtained the best or equal-best results for all metrics applied to the solutions across the 18 applications (for diversity it tied with SPEA2, as shown in Table 3). Because of this, PESA-II is recommended to optimize cache memory hierarchies. The same techniques can be applied to a different configuration space or to other hierarchies.
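The hypervolume of Eq. (3) and the precision metric can be sketched for the two-objective case as follows. This is an illustrative implementation under stated assumptions: both objectives are minimized, the reference point holds the worst value of each objective, precision is expressed as a percentage of the front, and the function names are hypothetical.

```python
def hypervolume_2d(front, reference):
    """Area dominated by the front and bounded by the reference point."""
    inside = sorted(p for p in front if p[0] <= reference[0] and p[1] <= reference[1])
    area, best_second = 0.0, reference[1]
    for first, second in inside:
        if second < best_second:
            # Add only the strip not already covered by earlier points.
            area += (reference[0] - first) * (best_second - second)
            best_second = second
    return area


def precision(front, pareto_optimal):
    """Percentage of front solutions that belong to the Pareto Optimal."""
    optimal = {tuple(p) for p in pareto_optimal}
    hits = sum(1 for p in front if tuple(p) in optimal)
    return 100.0 * hits / len(front)


front = [(1, 3), (2, 2), (3, 1)]
print(hypervolume_2d(front, (4, 4)))
print(precision([(1, 3), (2, 2), (5, 5)], front))
```

Adding a dominated point to the front leaves the hypervolume unchanged, while moving any point closer to the origin increases it, which is why the metric rewards both convergence and spread.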
The quality of the Pareto Fronts of each technique can be seen in Fig. 4, where the solution set of each technique is presented for 6 different applications. As can be observed in Fig. 4, PESA-II usually finds solutions of similar or superior quality when compared to the solutions obtained by the other techniques. As can be seen in djkstra large (d) and fft large (e), the PESA-II Pareto Front has good spread and uniformity. In fft small (f), although the solutions are all in the same region, the solutions obtained by PESA-II have more favorable values of number of cycles when compared to the other techniques. The Pareto Fronts shown in Fig. 4 support the results obtained from the metrics, which indicated the quality of the solutions of PESA-II. Finally, we performed an analysis of the time saved when using PESA-II, compared with the exhaustive approach. For this analysis we used a different stop criterion, based on the convergence rate of the algorithm. Thus, as soon as the algorithm converges, it stops and the final solution set is obtained. The comparison of simulation time
Table 5 Results of precision metric applied to the solutions found by different multi-objective techniques for 18 applications.

| Application | NSGAII | SPEA2 | PESA-II | PAES | NPGA | Best |
|---|---|---|---|---|---|---|
| basicmath large | 39.17 | 25 | 87.25 | 61.38 | 31.67 | PESA-II |
| basicmath small | 30 | 20 | 69.5 | 66.31 | 52.42 | PESA-II |
| bitcount large | 5 | 12.5 | 62.33 | 27.5 | 40.5 | PESA-II |
| bitcount small | 18.1 | 10.5 | 52.17 | 36.67 | 26.43 | PESA-II |
| crc32 large | 19.83 | 26.15 | 30.01 | 16.44 | 17.27 | PESA-II |
| crc32 small | 14.64 | 15.22 | 57.89 | 20.24 | 31.26 | PESA-II |
| djkstra large | 10.58 | 21 | 73.57 | 43.33 | 43.81 | PESA-II |
| djkstra small | 28.62 | 25.83 | 68.69 | 50.78 | 35.49 | PESA-II |
| fft large | 17.83 | 32.21 | 62.54 | 28.89 | 15.83 | PESA-II |
| fft small | 5.83 | 22.5 | 54.89 | 23.75 | 32.67 | PESA-II |
| patricia large | 5.54 | 22.75 | 69.57 | 58.17 | 26.99 | PESA-II |
| patricia small | 17.45 | 9.76 | 47.67 | 58.4 | 54.19 | PAES |
| qsort large | 54.25 | 47.5 | 87.81 | 48.05 | 56.67 | PESA-II |
| qsort small | 11.67 | 28.33 | 83 | 52.36 | 26.67 | PESA-II |
| sha large | 17.11 | 49.23 | 72.64 | 60.54 | 44.56 | PESA-II |
| sha small | 13.61 | 12.67 | 87.14 | 35.14 | 47 | PESA-II |
| susan large | 29.83 | 3.33 | 54.84 | 40.39 | 25.5 | PESA-II |
| susan small | 23.87 | 43 | 39.85 | 16.81 | 34.36 | SPEA2 |
| average | 20.16 | 23.75 | 64.52 | 41.4 | 35.74 | PESA-II |
Table 6 Time saved using PESA-II. Notations: s = seconds; h = hours; d = days; min = minutes.

Application       One Sim (seconds)   #Sim. PESA-II   Total time (Exhaustive)   Total time (PESA-II)
basicmath large   1424                61              149.7 d                   1.02 d
basicmath small   37                  85              3.9 d                     29.6 min
bitcount large    96                  65              11 d                      1.68 h
bitcount small    6                   63              15 h                      6 min
crc32 large       97                  57              10 d                      2.02 h
crc32 small       5                   61              12.6 h                    7.25 min
djkstra large     47                  54              5 d                       44.65 min
djkstra small     10                  64              1 d                       10.3 min
fft large         179                 62              18.8 d                    2.93 h
fft small         20                  62              50.4 h                    21.7 min
patricia large    190                 90              20 d                      2.22 h
patricia small    31                  84              3.3 d                     25.83 min
qsort large       105                 62              11 d                      1.92 h
qsort small       1                   76              2.5 h                     1.25 min
sha large         62                  57              6.52 d                    52.7 min
sha small         6                   65              15 h                      5.9 min
susan large       80                  61              8.4 d                     1.4 h
susan small       6                   64              15 h                      7.9 min
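The exhaustive column of Table 6 appears consistent with multiplying the time of one simulation by the 9084 configurations of the exploration space reported for this study. The sketch below, offered only as a rough check, reproduces that arithmetic for three applications, along with the fraction of the space covered by the reported average of 133 PESA-II iterations. The helper names are hypothetical, and the PESA-II column is not reproduced because the text does not state how it was derived.

```python
CONFIGURATIONS = 9084        # size of the exploration space reported in the text
AVG_PESA_ITERATIONS = 133    # average number of PESA-II iterations reported in the text
SECONDS_PER_DAY = 86400.0


def exhaustive_seconds(one_sim_seconds: float, configurations: int = CONFIGURATIONS) -> float:
    """Time to simulate every configuration once, assuming a constant cost per simulation."""
    return one_sim_seconds * configurations


def to_days(seconds: float) -> float:
    return seconds / SECONDS_PER_DAY


# One-simulation times in seconds, copied from Table 6 for three applications.
one_sim = {"basicmath large": 1424, "fft large": 179, "sha large": 62}
exhaustive_days = {app: to_days(exhaustive_seconds(t)) for app, t in one_sim.items()}

# Share of the exploration space visited by PESA-II, in percent (about 1.46).
explored_percent = 100.0 * AVG_PESA_ITERATIONS / CONFIGURATIONS

# Ratio between the total simulation times reported in the text (6000 h versus 40 h).
time_ratio = 6000 / 40
```

Under these assumptions the three applications come out at about 149.7, 18.8 and 6.52 days, matching the table to rounding, and the reported totals correspond to a reduction in simulation time of roughly 150 times.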
with the exhaustive approach is presented in Table 6. The simulations were performed using the experimental environment mentioned in Section 3, on a computing platform with an Intel Core 2 Duo at 1.33 GHz and 2 GB of RAM. PESA-II performs an average of 133 iterations to find the solution set, which corresponds to exploring only 1.47% of the total exploration space of 9084 different configurations. In terms of time, approximately 40 h were needed to simulate all benchmarks with the PESA-II mechanism, whereas the exhaustive approach required 6000 h of simulation time. Besides exploring only a small part of the exploration space, PESA-II finds solutions whose quality is very close to the Pareto Optimal. An example of this analysis is illustrated in Fig. 5, which compares the solution set found by PESA-II with that of the exhaustive approach for the fft large application. As can be seen in Fig. 5, the solutions obtained by PESA-II are quite close to the Pareto Optimal of the application, which demonstrates that, despite searching only a small part of the exploration space, the algorithm can find solutions with quality close to that of the exhaustive approach.
Fig. 5. Solutions found by PESA-II running the fft large application (axes: Number of Cycles (1xE7) versus Energy (J); series: Exhaustive, PESA-II).
6. Discussion It could be observed from the presented data that PESA-II obtains better results than the other techniques in a very high percentage of cases. It can also be observed from Fig. 4 that the Pareto Fronts of the PESA-II solutions are more spread through the exploration space than those of the other techniques. This is due to the way PESA-II generates new solutions, compared with the other optimization techniques, and to the type of Pareto Optimal of the cache optimization problem. Most of the techniques use a density measure applied to solutions to obtain a uniformly distributed
Pareto. However, they apply the density measure only as a tie-breaking criterion between candidate solutions. This can be seen in NPGA, which uses a niche count, in NSGAII, which uses the crowding distance metric, and in SPEA2, which also uses a density measure. Because a density measure helps to improve the diversity of solutions, it is present in all the optimization algorithms. However, PESA-II uses this measure not as a tie-breaking criterion, but as a selection measure. PESA-II divides the exploration space into a hypergrid, aiming to have equally dense regions. In PESA-II, solutions that are in less dense regions have a higher probability of providing parents to the next generation. This feature helps the Pareto to spread because the extreme solutions of the Pareto are usually in low-density regions, and selecting parents from low-density regions increases the chance of generating new solutions in that area, extending the Pareto. Because of this, the Pareto of PESA-II is wider for this problem, which increases the precision of the method. This characteristic of PESA-II was an advantage because the Pareto Optimal sets of the applications used are quite spread, as can be seen in Fig. 5. In problems with a less spread Pareto Optimal, the other techniques may be more competitive. Therefore, PESA-II showed better results for the cache optimization problem and can be applied to other problems with similar characteristics.
7. Conclusion and future works In this work, we studied the optimization of cache memory hierarchies with a unified second level, aiming to reduce energy consumption and optimize performance across different applications. Five multiobjective optimization techniques were implemented and adapted to cache memory optimization. The techniques were simulated for 18 applications of the MiBench and PowerStone benchmarks and the results were analyzed according to the metrics of generational distance, diversity and hypervolume.
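The region-based selection that the Discussion credits for the wider PESA-II front can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: it assumes the hypergrid is laid over the two objectives (energy, number of cycles), uses a binary tournament between occupied cells in which the less crowded cell wins, and then draws a parent at random from the winning cell. The function names, grid resolution and sample archive are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # objective vector, e.g. (energy, cycles)
Cell = Tuple[int, ...]       # integer coordinates of a hypergrid cell


def assign_cells(archive: Sequence[Point], divisions: int) -> Dict[Cell, List[Point]]:
    """Group the members of a non-empty archive by the hypergrid cell containing them."""
    dims = len(archive[0])
    lows = [min(p[d] for p in archive) for d in range(dims)]
    highs = [max(p[d] for p in archive) for d in range(dims)]
    cells: Dict[Cell, List[Point]] = defaultdict(list)
    for p in archive:
        index = []
        for d in range(dims):
            span = highs[d] - lows[d]
            # An axis where all values are equal collapses to cell 0.
            pos = 0 if span == 0 else int((p[d] - lows[d]) / span * divisions)
            index.append(min(pos, divisions - 1))  # keep the maximum in the last cell
        cells[tuple(index)].append(p)
    return dict(cells)


def select_parent(cells: Dict[Cell, List[Point]], rng: random.Random) -> Point:
    """Binary tournament between occupied cells; the less crowded cell wins."""
    keys = list(cells)
    a, b = rng.choice(keys), rng.choice(keys)
    winner = a if len(cells[a]) <= len(cells[b]) else b
    return rng.choice(cells[winner])


# Hypothetical archive: two extreme solutions and a crowded central region.
archive = [(0.0, 10.0), (5.0, 5.0), (5.1, 4.9), (5.2, 4.8), (5.3, 4.7), (10.0, 0.0)]
cells = assign_cells(archive, divisions=4)
rng = random.Random(0)
parents = [select_parent(cells, rng) for _ in range(10)]
```

With this archive, each of the three singly occupied cells wins the tournament with probability 5/16, so each extreme point is chosen as a parent more often than under uniform selection over individuals (1/6 each), which is the effect that tends to extend the front.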
Results showed that PESA-II presented the best results for the analyzed metrics in the majority of applications, being best in 83% of applications for the generational distance metric and in 88% of cases for the hypervolume metric. Compared with the exhaustive approach, PESA-II needed to explore only 1.47% of the total exploration space and presented solutions with quality close to the Pareto Optimal found by the exhaustive search. The implementation methods used in this work can be easily extended to larger exploration spaces and to different objectives or memory architectures. In future work we intend to apply new optimization techniques to different memory hierarchies, such as a memory hierarchy with three cache levels. New benchmark applications will also be analyzed, with a larger exploration space.
Filipe Rolim Cordeiro received the M.Sc. degree in Computer Engineering from the Federal University of Pernambuco (UFPE), Brazil, in 2011. He is currently a Ph.D. candidate in Computer Engineering at UFPE and an Assistant Professor at the Federal Rural University of Pernambuco. He has worked with multi-objective algorithms applied to embedded systems optimization, with a focus on cache memory hierarchy, and has published several papers in the area in recent years.
His research interests include computational intelligence, machine learning, computer vision and artificial intelligence.
Abel Guilhermino da Silva Filho received the Ph.D. degree in Computer Science from the Federal University of Pernambuco (UFPE) in 2006 and has been an adjunct professor at UFPE since 2008. He is author and co-author of papers in the area of embedded systems, covering optimization mechanisms to reduce energy consumption, computer architecture, reconfigurable architectures, high performance applications, evolutionary algorithms to reduce energy consumption, cache memory and memory hierarchies. He coordinates research projects in computer engineering and high-performance computing, and has been a reviewer for the Journal of Integrated Circuits and Systems and the International Journal of Software Engineering and Knowledge Engineering since 2012.