Electric Power Systems Research 79 (2009) 1393–1399
Electric Power Systems Research journal homepage: www.elsevier.com/locate/epsr
Data mining methodology for disaggregation of load demand
José A. Dominguez-Navarro ∗, José L. Bernal-Agustín, Rodolfo Dufo-López
Electrical Engineering Department, University of Zaragoza, Calle María de Luna 3, 50018 Zaragoza, Spain
Article info
Article history: Received 9 May 2008; received in revised form 17 March 2009; accepted 20 April 2009; available online 27 May 2009
Keywords: Disaggregation; Load demand; Data mining; Tabu Search
Abstract
This paper presents a new data mining method for analyzing how the electric demand is composed of different consumption types and how each type of load behaves. The proposed method uses a heuristic optimization algorithm (Tabu Search) to minimize the error between the real demand and the calculated approximation of that demand. The search is adaptive because the algorithm changes both the relative weight and the profile of each load. The results obtained show that the proposed methodology performs well. They also show an advantage over the classic approach: the classic approach yields "a picture" of the consumption, whereas this methodology yields the evolution of the consumption over time; that is to say, it shows "a movie" of the behavior of the loads. © 2009 Elsevier B.V. All rights reserved.
∗ Corresponding author. Tel.: +34 976 76 24 01; fax: +34 976 76 22 26. E-mail addresses: [email protected] (J.A. Dominguez-Navarro), [email protected] (J.L. Bernal-Agustín), [email protected] (R. Dufo-López). 0378-7796/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.epsr.2009.04.015
1. Introduction
Accurate knowledge of how and when electricity is consumed has long been a goal of all the agents of an electric system. Initially, the interest was in the effect of demand side management programs on planning activities [1] and in the effect of the composition of the demand on the analysis of electric systems [2,3]. Later, interest centered on the influence of the composition of the demand in the liberalized electricity market: on the design of rates [4,5], on the flexibility and control of demand [6], on the definition of strategies, and/or on the analysis of this market [7,8].
The main approach has been the "bottom-up" method [1–3], which obtains the demand profiles of individual consumers or consumption types from socioeconomic and historical data. The total demand of the system is then obtained by aggregating the individual demands. Various techniques have been used to classify and identify the consumer profiles: statistical methods [1–3], clustering and decision trees [5,9], knowledge discovery in databases [7], and SOM neural networks [8].
Another possible approach is the "top-down" method, which decomposes the total demand into individual types of consumption. The techniques used for this task have been based on monitoring the loads and then detecting the signatures (for example, the harmonic content) that the individual types of consumption leave in the aggregated demand [6,10].
In this paper, a "top-down" approach is used, from the aggregated demand toward the individual demands, but based on data mining methods. The new method consists of a heuristic optimization algorithm, called Tabu Search, that minimizes the error between the aggregated demand and the sum of the individual demands. The method uses an adaptive strategy that allows the individual consumption patterns to be adjusted to the local particularities of the case under study.
The rest of the paper is structured in three sections: the first describes the consumption patterns used; the second presents the new data mining method used to disaggregate the demand; and the third presents the results obtained by applying the new technique to real data.
2. Demand curve and consumption patterns
The presented technique disaggregates the aggregated demand curve into demand curves for the different consumption types. The aggregated demand curve and the consumption patterns are explained next.
2.1. Aggregated demand curve
The aggregated demand of a group of consumers in an area during one day can be represented by a vector $D_k = [d_{k1}, \ldots, d_{k24}]$ that contains the 24 hourly power values of day k. For example, Fig. 1 shows the demand curve for one winter day of the initial data.
J.A. Dominguez-Navarro et al. / Electric Power Systems Research 79 (2009) 1393–1399
Fig. 1. Example of the demand curve of one day in the initial data.
2.2. Normalized consumption patterns
Each consumption type can be represented by an approximate normalized curve, similar to the curves found in earlier work [1,11,12]. In this paper, these curves are represented by vectors $P_i = [p_{i1}, \ldots, p_{i24}]$ that contain the 24 hourly power values of pattern i. Fig. 2 shows the standard curves used for the different consumption types; the same curves are used in the section on computational results. For residential consumers, the consumption is typical of housing: illumination, different types of appliances, and heating and air conditioning. These normalized use patterns can be taken from previous works [1,11,12], or they can be hypothetical patterns that serve as a first approximation and are later modified by an adaptive optimization that minimizes the error with respect to the available data. The most important characteristics of the consumption patterns used in this paper are as follows:
Pattern 1: consumption during daylight hours.
Pattern 2: consumption during the night, for example illumination. Daylight hours vary considerably with latitude and time of year, so it is reasonable to use several profiles (for example, for summer and winter illumination).
Pattern 3: consumption in the second part of the day (for example, several appliances: TV, DVD, personal computers, microwaves, etc.).
Pattern 4: energy consumption of diverse appliances for cleaning the home, such as washing machines, vacuum cleaners, etc.
Pattern 5: consumption of electric heating or air conditioning. It is assumed to be located in the second part of the day.
Pattern 6: constant consumption throughout the day.
For example, a refrigerator does not really draw constant power throughout the day; it switches on and off periodically. However, the aggregated consumption of all the refrigerators in an area can be represented by a constant power averaged over the 24 h of the day.
Fig. 2. Patterns of normalized consumption types.
3. Data mining with Tabu Search
Tabu Search is a neighborhood search method [13]. Fig. 3 shows a simple version of the Tabu Search algorithm in pseudocode. The method starts from a feasible initial solution $s_{initial}$. Then the search process is repeated until a stop criterion is reached. In each iteration, the process selects several simple modifications (movements) that can be applied to the current solution $s_{now}$ and stores them in the candidate list $CL(s_{now})$. Each movement generates a new solution. The principal characteristic of the method is that the Tabu List forbids repeating movements that were carried out recently. This simple strategy diversifies the search and so helps it avoid becoming trapped in local minima. Next, each solution $s_{new}$ generated by the movements of the candidate list is evaluated, and the best one is selected and stored in $s_{better}$. After that, the Tabu List $TL(t)$ is updated.
In Table 1, the modified variable is marked with a cross, and the movement made in each iteration is given in the "Movement" column. Table 1 also shows that modifying a variable again is forbidden for a certain number of iterations (two in this case). The forbidden movements are stored in the Tabu List during these iterations, and the parameters that cannot be changed because they belong to the Tabu List are shaded. This mechanism has proved useful for escaping from local optima and for continuing the search strategically toward better solutions; the name of the heuristic comes from this strategy. The search continues until a stop criterion is fulfilled, for example when the best solution found has not improved for a certain number of iterations or when the maximum number of iterations is reached.
Fig. 3. Tabu Search basic algorithm.
Table 1 Development of Tabu Search.
3.1. Description of a solution
A solution $\hat{D}_s$ of the posed problem is defined by two matrices:
• $P_s = [P_i]$ of dimension np × 1, where $P_i = [p_{i1}, \ldots, p_{ih}, \ldots, p_{i24}]$ is a normalized profile holding the values of the 24 h, and np is the number of profiles used (so $P_s$ is np × 24 when written out element by element).
• $A_s = [\alpha_{ki}]$ of dimension nd × np, where $\alpha_{ki}$ is the coefficient that multiplies the normalized profile i on day k, and nd is the number of available data.
The product of the two matrices gives a solution that approximates the real values:
\[ \hat{D}_s = A_s \times P_s \qquad (1) \]
Every change in the value of a variable $p_{ih}$ or $\alpha_{ki}$ leads to a new solution. These modifications, $m(p_{ih})$ or $m(\alpha_{ki})$, are called movements.
3.2. Evaluation of a solution
The real data are stored in a matrix $D_s$ of dimension nd × 24. For the solution $\hat{D}_s$, the difference between the two matrices gives the error matrix $E_s$:
\[ E_s = D_s - \hat{D}_s \qquad (2) \]
The sum of the errors of a row k, $e_k$, represents the error made in approximating datum k, that is, the sum of the errors made over the 24 h of day k:
\[ e_k = \sum_{h=1}^{24} (d_{kh} - \hat{d}_{kh}) \qquad (3) \]
The sum of the errors of a column h, $e_h$, represents the error made in approximating hour h, that is, the sum of the errors made over the nd available data:
\[ e_h = \sum_{k=1}^{nd} (d_{kh} - \hat{d}_{kh}) \qquad (4) \]
The error function over all the data can be calculated in either of two ways:
\[ F_{error}(\hat{D}_s) = \sum_{k=1}^{nd} e_k = \sum_{h=1}^{24} e_h \qquad (5) \]
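The evaluation of a solution in Eqs. (1)–(5) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `evaluate`, the array names, and the random toy data are hypothetical and do not come from the paper; the profile matrix is written out as np × 24 so that the product in Eq. (1) is an ordinary matrix product.

```python
import numpy as np

def evaluate(A, P, D):
    """Evaluate a candidate solution following Eqs. (1)-(5)."""
    D_hat = A @ P            # Eq. (1): (nd x np) @ (np x 24) -> nd x 24
    E = D - D_hat            # Eq. (2): error matrix
    e_k = E.sum(axis=1)      # Eq. (3): one summed error per day k
    e_h = E.sum(axis=0)      # Eq. (4): one summed error per hour h
    F = e_k.sum()            # Eq. (5): total error
    return D_hat, E, e_k, e_h, F

# Hypothetical toy data: 3 days, 2 profiles (not the paper's data).
rng = np.random.default_rng(0)
P = rng.uniform(0.0, 1.0, size=(2, 24))
A = rng.uniform(100.0, 300.0, size=(3, 2))
D = rng.uniform(300.0, 700.0, size=(3, 24))

D_hat, E, e_k, e_h, F = evaluate(A, P, D)
same_total = bool(np.isclose(e_k.sum(), e_h.sum()))  # both routes of Eq. (5) agree
```

Because the sums in Eqs. (3)–(5) are signed, positive and negative errors can cancel. The results section reports quadratic errors, so squaring `E` before summing may be closer to what was measured in practice; that reading is an assumption.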
3.3. Generation of the list of available movements
The available movements for improving the coefficients $\alpha_{ki}$ of the matrix $A_s$ are those that act on the coefficients of the row k with the largest error $e_k$. The available movements for improving the coefficients $p_{ih}$ of the matrix $P_s$ are those that act on the coefficients of the column h with the largest error $e_h$.
• For the coefficients $\alpha_{ki}$:
\[ (\alpha_{ki})_{new} = mov(\alpha_{ki}) = \alpha_{ki} + c\,(e_k/24) \qquad (6) \]
• For the coefficients $p_{ih}$:
\[ (p_{ih})_{new} = mov(p_{ih}) = p_{ih} + c\,(e_h/nd) \qquad (7) \]
where c is a parameter that depends on the refinement of the search. The coefficients $p_{ih}$ are bounded between 0 and 1. These allowed movements, in general $m(w_k)$, are stored in the candidate list of the solution $s_i$, $CL(s_i) = \{m(w_1), m(w_2), \ldots, m(w_p)\}$. Some movements can be forbidden, either because the variable $w_k$ has reached its limit or because a movement has recently been applied to this variable in the search process. That is to say, movements on variables $w_k$ that belong to the Tabu List are not available. The Tabu List is a vector of length t, $TL(t) = \{w_1, w_2, \ldots, w_{t-1}, w_t\}$. The Tabu List stores the indices ki of the changed coefficients. There is one Tabu List for the coefficients $\alpha_{ki}$ and another for the coefficients $p_{ih}$.
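The movement rules of Eqs. (6) and (7) and the role of the Tabu List can be sketched as follows. This is a minimal sketch, not the authors' implementation. Several choices are assumptions made for illustration: the objective is the sum of squared errors (the results are reported in MW²), the "largest error" row is chosen by absolute value, the oldest tabu entry is released when every movement on the selected row is forbidden, and the names `sq_error`, `move_p`, and `tabu_phase_alpha` as well as the toy data are hypothetical.

```python
from collections import deque
import numpy as np

def sq_error(A, P, D):
    """Assumed objective: sum of squared errors over all days and hours."""
    return float(((D - A @ P) ** 2).sum())

def move_p(p, e_h, c, nd):
    """Eq. (7), with the result kept inside [0, 1]."""
    return float(np.clip(p + c * e_h / nd, 0.0, 1.0))

def tabu_phase_alpha(A, P, D, c=0.5, tabu_len=2, n_iter=50):
    """Sketch of a Tabu Search over alpha_ki with the profiles P held fixed."""
    A = A.copy()
    tabu = deque(maxlen=tabu_len)             # recently changed (k, i) indices
    best_A, best_F = A.copy(), sq_error(A, P, D)
    for _ in range(n_iter):
        e_k = (D - A @ P).sum(axis=1)         # Eq. (3)
        k = int(np.argmax(np.abs(e_k)))       # row with the largest error (by magnitude)
        candidates = []                       # candidate list CL(s_now)
        for i in range(A.shape[1]):
            if (k, i) in tabu:
                continue                      # movement forbidden by the Tabu List
            trial = A.copy()
            trial[k, i] += c * e_k[k] / 24.0  # Eq. (6)
            candidates.append((sq_error(trial, P, D), i, trial))
        if not candidates:
            tabu.popleft()                    # every move on row k is tabu: free the oldest
            continue
        F, i, A = min(candidates, key=lambda cand: cand[0])
        tabu.append((k, i))                   # accept the best candidate, even if it is worse
        if F < best_F:
            best_A, best_F = A.copy(), F
    return best_A, best_F

# Hypothetical toy problem: 2 days, 2 profiles (constant and ramp).
P = np.vstack([np.ones(24), np.linspace(0.0, 1.0, 24)])
D = np.array([[400.0, 100.0], [300.0, 200.0]]) @ P
A0 = np.full((2, 2), D[0].mean() / 2)         # same starting value for every coefficient
F0 = sq_error(A0, P, D)
best_A, best_F = tabu_phase_alpha(A0, P, D)
```

The best candidate is accepted even when it does not improve on the current solution, which is what lets the search leave a local optimum; the best solution seen so far is tracked separately and returned.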
The neighborhood of a solution, $N(\hat{D}_s)$, is the set of all the solutions that can be reached from this solution by applying one movement. When only the available movements are applied, the result is the restricted neighborhood $RN(\hat{D}_s)$, which is a subset of $N(\hat{D}_s)$.
3.4. Description of the algorithm
The complete disaggregation algorithm is shown in Fig. 4. It is composed of two phases and uses the Tabu Search method in each phase. A solution $\hat{D}_s$ of the search process is the product of two matrices: $A_s = [\alpha_{ki}]$ and $P_s = [P_i]$. The search process is repeated until a stop criterion is reached. In each iteration, the process executes two phases: optimization of the coefficients $\alpha_{ki}$ and optimization of the patterns $P_i$. Each phase can be explained as follows:
Phase 1: search for the optimal coefficients $\alpha_{ki}$. The search is carried out according to the following minimization problem:
\[ \min: F_{error}(\hat{D}_s) = \sum_{k=1}^{nd} e_k(\alpha_{ki}) \qquad (8) \]
where nd is the number of available data. In this optimization, the standard profiles remain unchanged and the coefficients $\alpha_{ki}$ are modified. All coefficients $\alpha_{ki}$ are initialized with the same value: the mean power of the first data divided by the number of profiles. Phase 1 yields a rough approximation of the real data, as shown in Fig. 5.
Fig. 5. Approximation obtained after the optimization of the coefficients $\alpha_{ki}$ in phase 1.
Phase 2: search for the optimal patterns $P_i$. The normalized patterns $P_i$ are calculated according to the following minimization problem:
\[ \min: F_{error}(\hat{D}_s) = \sum_{h=1}^{24} e_h(p_{ih}) \qquad (9) \]
In this optimization, the coefficients $\alpha_{ki}$ remain unchanged and the coefficients $p_{ih}$ of the standard profiles are modified. Phase 2 yields a better approximation of the real data, as shown in Fig. 6.
Each phase is optimized with the Tabu Search algorithm. Phases 1 and 2 are both executed in every iteration of the principal optimization loop, so the results presented in the aforementioned figures are partial results of the general optimization process.
4. Computational results
The described method has been applied to one year of data from a residential area of a Spanish city to evaluate its performance. For example, one month of hourly demand data (in MVA) used to validate the method is given in Table 2. The standard profiles considered initially are represented in Fig. 2, and the matrix of coefficients $A_s$ has been initialized with the unit matrix. Fig. 7 shows the evolution of the Tabu Search during the optimization as the error is reduced. The steps observed in the graph correspond to the transition from phase 1 of coefficient
Fig. 4. Disaggregation algorithm.
Fig. 6. Approximation obtained after the optimization of the normalized patterns $P_i$ in phase 2.
Table 2 Real demand data in MW (one row per hour, h1 to h24; the 29 values in each row are consecutive days).
h1   619 632 594 521 537 603 597 607 618 591 482 479 531 513 555 554 553 489 471 543 565 566 581 560 484 480 557 603 572
h2   591 599 541 508 516 567 567 571 568 556 460 443 504 487 521 527 513 443 433 523 525 528 545 522 439 435 508 559 538
h3   559 577 556 485 509 536 541 545 536 526 403 425 477 450 517 504 492 410 420 493 504 504 512 495 410 419 478 532 513
h4   544 548 502 467 501 522 526 554 521 532 377 407 465 454 469 485 471 380 403 472 488 495 497 474 385 384 473 523 511
h5   526 526 464 442 478 507 514 540 506 493 355 395 456 434 458 468 450 360 395 459 469 473 481 444 355 368 456 508 494
h6   522 524 441 424 455 498 502 503 491 453 340 400 451 427 441 461 416 343 392 458 465 470 465 409 339 358 448 502 485
h7   548 559 423 417 510 528 534 523 484 442 327 449 476 453 486 493 397 328 440 490 503 506 506 380 326 414 481 530 505
h8   585 582 422 394 573 588 590 578 569 432 303 517 533 551 544 558 386 305 513 551 559 561 562 385 264 509 528 584 547
h9   635 621 437 384 628 635 643 617 634 455 296 599 606 613 636 625 415 294 588 622 638 645 635 384 261 584 592 596 591
h10  668 644 494 421 683 668 680 669 658 494 335 674 621 637 665 643 474 343 634 661 678 664 671 472 309 637 633 627 617
h11  675 659 533 455 703 697 695 671 681 529 379 670 649 653 686 639 525 393 683 690 725 738 688 524 387 678 645 665 633
h12  702 666 544 462 714 712 693 692 710 563 387 679 652 660 704 656 546 423 687 707 734 755 699 541 424 698 677 660 645
h13  704 666 519 465 718 712 713 702 704 569 434 677 662 669 713 715 549 451 681 722 754 770 712 546 441 700 697 678 628
h14  677 642 513 497 694 716 674 672 680 562 448 646 619 637 665 674 540 454 652 693 699 702 673 541 447 671 636 628 612
h15  639 607 488 489 643 641 621 640 621 485 427 609 597 602 615 629 496 425 611 645 658 689 632 488 424 628 613 595 576
h16  619 632 594 521 537 603 597 607 618 591 482 479 531 513 555 554 553 489 471 543 565 566 581 560 484 480 557 603 572
h17  591 599 541 508 516 567 567 571 568 556 460 443 504 487 521 527 513 443 433 523 525 528 545 522 439 435 508 559 538
h18  559 577 556 485 509 536 541 545 536 526 403 425 477 450 517 504 492 410 420 493 504 504 512 495 410 419 478 532 513
h19  544 548 502 467 501 522 526 554 521 532 377 407 465 454 469 485 471 380 403 472 488 495 497 474 385 384 473 523 511
h20  526 526 464 442 478 507 514 540 506 493 355 395 456 434 458 468 450 360 395 459 469 473 481 444 355 368 456 508 494
h21  522 524 441 424 455 498 502 503 491 453 340 400 451 427 441 461 416 343 392 458 465 470 465 409 339 358 448 502 485
h22  548 559 423 417 510 528 534 523 484 442 327 449 476 453 486 493 397 328 440 490 503 506 506 380 326 414 481 530 505
h23  585 582 422 394 573 588 590 578 569 432 303 517 533 551 544 558 386 305 513 551 559 561 562 385 264 509 528 584 547
h24  635 621 437 384 628 635 643 617 634 455 296 599 606 613 636 625 415 294 588 622 638 645 635 384 261 584 592 596 591
Fig. 7. Error of the best solution for each iteration.
optimization to phase 2 of normalized consumption profile optimization. The reason is that the search stops advancing in the current phase, as shown by the rough approximation of Fig. 5, and the algorithm must pass to the following phase to reduce the error further and achieve better approximations, as shown in Fig. 6. In the first 100 iterations, the error falls quickly from $1 \times 10^{6}$ to $2 \times 10^{5}$, and it falls slowly after the first 2000 iterations. At the end of the optimization, both the coefficients (Fig. 8) and the normalized consumption profiles (Fig. 9) have evolved to minimize the error. In Fig. 8, we can observe that the coefficients reflect the weekly cycle of demand and evolve smoothly over the days of the month, without abrupt variations. The main coefficient is that of pattern 6, because it represents the constant energy consumption throughout the day. In Fig. 9, we can also observe that the profiles have changed slightly, but they still conserve the essence of the initial profiles.
Fig. 8. Temporal evolution of the coefficients after the optimization: (a) main coefficient; (b) other coefficients of less importance.
Fig. 9. Changes in the normalized consumption patterns.
With the information provided by the standard profiles and the final coefficients, we can obtain the disaggregation of the residential curve into the different consumption profiles for each of the analyzed days, as Fig. 10 shows for one of these days (the constant profile is not drawn). Fig. 11 shows the quadratic errors of every analyzed day. The largest errors occur on Saturdays, Sundays, and Mondays. During these days, some of the consumption differs from that of the other days (working days). Fig. 6 shows the real demand and the estimated demand for one of the working days, and the approximation is good.
Table 3 presents the analysis of the number of explanatory variables introduced in the model. The "No. of variables" column indicates the number of explanatory variables considered, "Variables" indicates which variables are considered, and "Error" shows the mean square error obtained. Eliminating explanatory variables leads to models with larger approximation errors. Table 4 presents the analysis of the importance of selecting the explanatory variables. As the errors obtained show, the choice of explanatory variables in the model is of great importance.
Fig. 10. Disaggregation of the demand in consumption.
Fig. 11. Square error.
Table 3 Effect of the number of variables.
No. of variables    Variables           Error (MW²)
6                   1; 2; 3; 4; 5; 6    51734.3
5                   1; 2; 4; 5; 6       58482.4
4                   2; 4; 5; 6          63492.2
3                   4; 5; 6             75177.0
2                   5; 6                79747.0
1                   6                   141320.0
Table 4 Effect of the selection of variables.
No. of variables    Variables    Error (MW²)
2                   1; 2         2112300
2                   3; 4         2503290
2                   5; 6         79747
5. Conclusion
The proposed method allows the analysis and disaggregation of electricity consumption as a function of flexible demand profiles. It addresses two major problems of load clustering studies: how to extend the results obtained from one distribution network's data to another, and how to account for the local and temporal dependence of load classes. The method is generally applicable and performs the data mining task satisfactorily. It requires little information: historical demand data and some approximate standard profiles. The results obtained describe clearly the recurrent behavior of the demand, the weight of each type of consumption, and a consumption profile that is closer to reality. The presented results allow us to consider the developed method an appropriate tool for studying the behavior of consumers in the liberalized energy market.
References
[1] J.H. Eto, J.G. Koomey, J.E. McMahon, E.P. Kahn, Integrated analysis of demand-side programs, IEEE Transactions on Power Systems 3 (4) (1988) 1397–1403.
[2] A. Capasso, W. Grattieri, R. Lamedica, A. Prudenzi, A bottom-up approach to residential load modeling, IEEE Transactions on Power Systems 9 (2) (1994) 957–964.
[3] J.A. Jardini, C.M.V. Tahan, S.U. Ahn, F.M. Figueiredo, Daily load profiles for residential, commercial and industrial low voltage consumers, IEEE Transactions on Power Delivery 15 (1) (2000) 375–380.
[4] C.S. Chen, J.C. Hwang, C.W. Huang, Application of load survey to proper tariff design, IEEE Transactions on Power Systems 12 (4) (1997) 1746–1751.
[5] B. Pitt, D. Kirchen, Applications of data mining techniques to load profiling, in: Proceedings of IEEE PICA, Santa Clara, CA, 1999.
[6] J.A. Fuentes Moreno, A.M. García, A.G. Marín, E.G. Lázaro, C. Alvarez Bel, An integrated tool for assessing the demand profile flexibility, IEEE Transactions on Power Systems 19 (1) (2004) 668–675.
[7] V. Figueiredo, F. Rodrigues, Z. Vale, J.B. Gouveia, An electric energy consumer characterization framework based on data mining techniques, IEEE Transactions on Power Systems 20 (2) (2005) 596–602.
[8] S. Valero Verdú, M. Ortiz García, C. Senabre, A. Gabaldón Marín, F.J. García Franco, Classification, filtering, and identification of electrical customer load patterns through the use of self-organizing maps, IEEE Transactions on Power Systems 21 (4) (2006) 1672–1682.
[9] G. Chicco, R. Napoli, P. Postulache, M. Scutariu, C. Toader, Customer characterization options for improving the tariff offer, IEEE Transactions on Power Systems 18 (1) (2003) 381–387.
[10] R. Mancini, Z. Zabar, L. Birembaum, E. Levi, J. Hajagos, S. Kalinowsky, An area substation load model in the presence of harmonics, IEEE Transactions on Power Delivery 11 (1996) 2013–2019.
[11] http://www.ree.es/cap07/pdf/indel/Atlas INDEL REE.pdf.
[12] J.V. Paatero, P.D. Lund, A model for generating household electricity load profiles, International Journal of Energy Research 30 (5) (2005) 273–290.
[13] C.R. Reeves, Heuristic modern techniques for combinatorial problems, John Wiley & Sons, New York, 1993.