A non-parametric learning algorithm for small manufacturing data sets


Expert Systems with Applications 34 (2008) 391–398, www.elsevier.com/locate/eswa

Der-Chang Li *, Chun-Wu Yeh
Department of Industrial and Information Management, National Cheng Kung University, 1 University Road, Tainan 70101, Taiwan

Abstract

Nowadays the manufacturing environment changes rapidly owing to globalization and innovation, and the life cycle of products consequently becomes shorter and shorter. Although data mining techniques are widely employed to extract management information from data, only scarce data can be obtained in the early stages of a manufacturing system. From the view of machine learning, the size of the training data significantly influences learning accuracy, and learning based on limited experience is a tough task. For this reason, this research systematically estimates data behavior, such as the trend and potency, to capture the dependency within a sequence of time series data. It should also be added that the analyzed data in this article are dependent examples that come from different populations. Instead of using parametric statistics, this research proposes a non-parametric learning algorithm for small-data-set learning. The proposed algorithm, named the trend and potency tracking method (TPTM), explores predictive information by generating a trend and potency (TP) value for each datum. The experiments show that the extra information extracted from the data trend and potency can speed up the stabilization of the learning task and can dynamically improve the derived knowledge as the latest data occur.

© 2006 Elsevier Ltd. All rights reserved.

Keywords: Trend and potency function; Small data sets; Non-parametric learning; Machine learning

* Corresponding author. Tel.: +886 911 874507; fax: +886 6 2374252. E-mail address: [email protected] (D.-C. Li).

doi:10.1016/j.eswa.2006.09.008. © 2006 Elsevier Ltd. All rights reserved.

1. Introduction

Nowadays the manufacturing environment changes rapidly owing to globalization and innovation, and the life cycle of products becomes shorter and shorter. Managing the manufacturing system efficiently is crucial to the future of a company, so that managers can detect manufacturing problems beforehand and do the troubleshooting in a timely fashion. Although data mining techniques are widely employed to extract management information from data, only scarce data can be obtained in the early stages of a manufacturing system. Consequently, investigators always want to acquire more training data to implement learning tasks; nonetheless, in small-data-set learning, the problems encountered result chiefly from insufficient information. Non-parametric methods are employed in this case, where researchers know little about the parameters of the variable of interest in the population, because, technically, non-parametric methods do not rely on the estimation of parameters such as the mean and the standard deviation. Thus, managers often try to analyze ''very limited data'' using non-parametric methods to extract useful information for timely manufacturing decisions. From the view of machine learning, the sample size of the collected data significantly influences learning accuracy, and learning based on limited experience is a tough task. For this reason, this research systematically estimates data behavior, such as the trend and potency, to capture the dependency within a sequence of time series data. It should also be added that the analyzed data in this research are dependent examples that come from different populations. In the following sections, we review pertinent research concerning this issue and then introduce the proposed approach (the trend and potency tracking method, TPTM) with detailed descriptions. Eventually the learning


algorithm will be applied to the synthetic control chart time series (SCCTS) data obtained from manufacturing processes to demonstrate the advantages of the proposed method. Conclusions and discussions are drawn in the last section.

2. Literature reviews

First, this research reviews recent machine learning techniques applied in the field of manufacturing, especially within the scope of a flexible manufacturing system (FMS). Secondly, learning with small data sets is elaborated, including an evaluation of methods. Since the focus here is on learning with dependent data, several articles dedicated to learning with time series data are also reviewed at the end of this section.

2.1. Applying machine learning techniques in the field of manufacturing

The scheduling of jobs in manufacturing systems is essential to reduce production cost and to balance the loading of production lines. Unlike traditional mass production systems, the surroundings of a flexible manufacturing system (FMS) are much more complex. Thus, the phenomena in such an environment are subject to frequent changes during the production period. Among the research related to these scheduling problems, machine learning methods have been widely applied in the last decade. In general, several methodologies such as analytical, heuristic, simulation-based, and artificial intelligence-based approaches were applied to characterize the manufacturing experience by generating sufficient examples, and machine learning techniques were used to determine the scheduling rules for a manufacturing system (Priore, De La Fuente, Gomez, & Puente, 2001).
For example, Shaw, Park, and Raman (1992) employed the ID3 learning algorithm proposed by Quinlan (1986) in a pattern-directed scheduling system for FMS scheduling; Pierreval and Ralambondrainy (1990) proposed an inductive learning method named GENREG to reduce the large number of scheduling rules; and Nakasuka and Yoshida (1992) introduced a learning-aided dynamic scheduler to solve scheduling problems. Furthermore, Chen and Yih (1996) implemented such techniques to determine the most important attributes in constructing a knowledge-based scheduling system, and Sun and Yih (1996) and Sabuncuoglu and Touhami (2002) applied back-propagation artificial neural networks to learning with artificially generated samples.

2.2. Learning with small data sets

According to computational learning theory (Anthony & Biggs, 1997), the sample size in machine learning problems apparently influences the learning performance, and different learning tools require different amounts of data for

stable learning. Investigators are likely to encounter learning problems with unsatisfactory data in reality. To conquer this problem, generating some artificial data for the learning system may be one approach. In virtual data generation, Niyogi, Girosi, and Tomaso (1998) employed the prior knowledge obtained from a given small training set to create virtual examples for improving recognition ability in the field of pattern recognition. After being given a 3-D view of an object, new images were generated from another orientation through mathematical transformations. The new images generated are called virtual samples, and the procedure of creating virtual samples is mathematically equivalent to incorporating prior knowledge. Via these virtual samples, a learning machine can detect a target more precisely. Applying the concept of adding virtual data, Li, Chen, and Lin (2003) developed the functional virtual population (FVP) approach, which adds virtual samples to assist in obtaining scheduling knowledge in dynamic manufacturing environments. They used a collected small random data set (20 data) to form a functional virtual population and drew more data out of this virtual pool. In their experimental studies, the information in the virtual data did raise the learning performance of neural networks. In other words, the scheduling knowledge was not enough in the early stages since the collected data set was rather small; the FVP algorithm expanded the domains of the system attributes and generated a number of virtual samples, and with these virtual samples a new level of scheduling knowledge was constructed. Similar to the principle of adding data to the system, Li, Wu, and Chang (2005) proposed the data-fuzzification method to improve flexible manufacturing system (FMS) forecasting accuracy with the concept of the continuous data band. The authors used a common membership function to fuzzify a set of crisp data to be continuous. Their approach differed from the traditional fuzzification of individual data in that it aims at filling the gaps between separate data to make the data status more complete. Their results upon performing neural network learning with fuzzified continuous data showed superiority compared with using traditional crisp data. To further develop the approach, a concept of domain external expansion was also introduced into their procedure. Besides filling the data gaps within the data set, the purpose of domain external expansion is to add more data outside the existing data limits, since probable data are expected to appear not only inside but also outside the current data ranges. However, this study left open the question of how far the domain range should be expanded. Huang and Moraga (2004) presented a diffusion neural network (DNN) that combines a conventional neural network and the information diffusion technique for function learning. The information diffusion approach uses fuzzy theories to derive new samples to solve the problem of data incompleteness. Though the DNN shows better learning


accuracy than the back-propagation neural network (BPN), the research does not offer any clear determination of the diffusion functions and coefficients. In addition, the symmetric diffusion technique sometimes over-simplifies the generation of new samples and can bring about miscalculation of the domain ranges; either under-estimating or over-estimating the domain ranges leads to poor learning accuracy. Therefore, asymmetric domain range estimation, known as data trend estimation, was subsequently developed to conquer this problem. Li, Hsu, Tsai, Lu, and Hu (2007) proposed another diffusion-based learning technique and applied it to the field of cancer diagnosis. In brief, the genome and oncogenes are believed to have close relations to human cancers and are reported to transduce proliferative signals to the nucleus. Bladder cancer cells have been reported to possess increased expression of oncogenes such as the epidermal growth factor receptor (EGFR), c-erbB-3, c-erbB-4 and HER-2/neu proteins (Quek, Quinn, Daneshmand, & Stein, 2003), and the malignant process is triggered when the number of altered oncogenes and tumor suppressor genes reaches a certain level. Concerning the molecular analysis, it is expected that distinct gene expression profiles should represent different tumor types, and these genetic alterations are utilized to diagnose different cancers. However, investigators do not have sufficient genetic alterations to characterize bladder cancer, and the relations among multiple genes are usually non-linear. Therefore the study employed the mega-trend-diffusion technique to improve the accuracy of gene diagnosis for bladder cancer with a very limited number of samples. The modeling results showed that the learning accuracy of the bladder cancer diagnosis improved steadily, from 82% to 100%. Compared with traditional methods such as back-propagation neural networks and decision-tree learning techniques, this article provides a new approach for building a reliable model for small-data-set analysis.

2.3. Learning with time series data

When it comes to dependent-data cases, occasionally called time series data, the data theoretically come from various populations. Hence it is more difficult to extract the information, especially when the number of available data is small. Quite a few studies have aimed at learning with time series data, such as Montanes, Quevedo, and Prieto (2002), Kadous and Sammut (2004), Prudenico and Ludermir (2004a, 2004b), Chi and Ersoy (2005), Popescu and Wong (2005), and Lindsay and Cox (2005). Lately, research related to learning with dependent (or time series) data has spotlighted forecasting or predicting a pattern or amount concerning future data. Basically, the requirement for the number of data used is rigorous; namely, sufficient data are a must for stabilizing the learning processes. Generally, samples collected in the early stages of a manufacturing system (small dependent-data sets) are mutually dependent by time since the manufacturing system is not yet stable. This implies that learning in the early stages is more difficult than learning in the mature phase of a manufacturing system. Though less research has focused on this hardship, we still think it is meaningful to develop a learning method for dependent data collected in the early stages. As mentioned above, making proper decisions from useful information as soon as possible is an important weapon for becoming more competitive in the industry. Therefore, this research aims at developing a method, named the trend and potency tracking method (TPTM), for extracting useful information based on the insufficient data collected in the early stages. The TPTM mainly captures the dependency within a sequence of time series data and then uses it as the basis for formulating the trend and potency (TP) function. The TP value of each datum is calculated through the given time stages and the existing data. In this way, the TPTM creates extra information tied closely to the original data, and this increases the knowledge available to the learning machines.

3. Development of the incremental learning algorithm

In small-data-set learning, researchers do not have adequate observations in the early learning phase. In other words, the data at hand do not provide sufficient information for learning machines. Concerning a sequential (time series) data set, data collection can be regarded as an incessant and incremental procedure. Therefore, the incoming data gradually improve and update the details of the acquired knowledge. This research intends to probe for more information from the underlying populations in order to stabilize the knowledge acquisition process.

3.1. Central location estimation of the existing data

Fig. 1 shows that when the first observation X1 appears, the central location (CL) emerges in the same position as X1. The occurrence of the subsequent datum X2 will guide the CL to its possible site. Suppose that X2 comes from

Fig. 1. The occurrence of the data guides the direction of the central location (CL).


the population different from X1's. This phenomenon means the population has been changed by extrinsic factors, and the CL intuitively moves to a new site between X1 and X2. In terms of fuzzy set theory, neither X1 nor X2 holds the largest value of the membership function. Though the two data originate from different populations, the subsequent datum X2 still reveals important information for tracking the CL. As the procedure carries on, the CL direction is influenced by the occurrence of the latest datum, and we can estimate the possible site of the CL from the existing data.

3.2. Data trend and potency estimation: formulating the trend and potency tracking method (TPTM)

Generally speaking, only scarce data are available for acquiring manufacturing knowledge in the initial stages of production systems. Such few data provide imprecise system information in the learning phases and thus lower the prediction efficiency of learning machines. For this reason, investigators should consider how to increase flexibility or allowance in the learning procedure to make the learning machines more general. To conquer this problem, estimating the data behavior, such as trend and potency, is practical. As the previous section mentioned, we can estimate the possible data trend by means of the occurrence of the subsequent data. Furthermore, it cannot be emphasized too strongly that closely monitoring data tendency and potency is as important as providing learning flexibility. This research therefore proposes the trend and potency tracking method (TPTM) by formulating the trend and potency (TP) function, an asymmetric domain range expansion, to seize the possible change of the data behavior at different phases. The detailed steps of the procedure are as follows:

Step 1: Assume we obtain t periods of data; that is, the data set at phase i consists of {X1, X2, ..., Xi}, i = 1, 2, ..., t, and is marked as Xt. Let X(min:i) be the element of Xt with the minimal value and X(max:i) the one with the maximal value. Obviously, X(min:i) and X(max:i) indicate the possible range of the CL at phase i. We sort the sequential data according to the order of their occurrence.

Step 2: We then calculate the variations ri of the paired data (Xi-1, Xi), i = 2, ..., t, to obtain the increasing or decreasing potencies according to the data sequence. If ri concerning the paired data (Xi-1, Xi) is positive, the trend of the data at phase i is moving upward; on the contrary, the movement of the data at phase i is descending if ri < 0.

Step 3: Since the latest datum greatly dominates the occurrence of the oncoming datum, we give more weight to the up-to-date data to represent the intensity of the different phases. The importance wi of the datum at phase i equals i - 1, i = 1, 2, ..., t. For instance, at phase 6 the importance (or the weight) of the datum is w6 = 6 - 1 = 5.

Step 4: Let Ai = ri × wi, i = 2, ..., t, to strengthen the data trend and potency at the different phases by multiplying the variations by the weights. Ai > 0 denotes an increasing potency (IP), and Ai < 0 a decreasing potency (DP).

Step 5: We then find the central location (CL) of the existing data using the following equation: CL = (X(min:i) + X(max:i)) / 2. This research uses the CL as the reference point for the asymmetric domain range expansion.

Step 6: We compute the average of the increasing potencies (AIP) and the average of the decreasing potencies (ADP) and then use them to asymmetrically expand the domain range. The upper limit of the expanded domain range (EDR_UL) is X(max:i) + AIP, and the lower limit (EDR_LL) is X(min:i) + ADP (since ADP is negative, this extends the range downward). Thus we acquire a new expanded domain range in which to explore the extra information of data trend and potency.

Step 7: CL, EDR_UL, and EDR_LL are employed to form a triangular TP function. We set the TP value of the CL to be 1, and then obtain the TP values of the existing data {X1, X2, ..., Xi}, i = 1, 2, ..., t, through the ratio rule of a triangle. As Fig. 2 shows for a simple instance, the TP value of X(min:i) is n = p / (p + q), where p is the distance between EDR_LL and X(min:i) and q is the distance between X(min:i) and CL. The TP value ranges between 0 and 1 and represents how close the current datum is to the CL.

Step 8: When the latest datum appears in the sequence, return to Step 1 and re-compute the TP value of each of the existing data.
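As a concrete illustration, the steps above can be sketched in Python. This is a reconstruction from the textual description, not the authors' original implementation; in particular, the handling of degenerate cases (for example a purely monotonic sequence, where one side has no potencies) is an assumption. Applied to the five input values of the second training item in Table 2, the sketch reproduces the TP values reported for that item in Table 4.

```python
def tptm_tp_values(x):
    """Compute trend and potency (TP) values for a sequence x = [X1, ..., Xt],
    following Steps 1-7 of the TPTM as described in the paper."""
    t = len(x)
    # Step 2: variations of consecutive pairs (X_{i-1}, X_i)
    r = [x[i] - x[i - 1] for i in range(1, t)]
    # Step 3: weights w_i = i - 1, favouring the latest data (phases 2..t)
    w = list(range(1, t))
    # Step 4: weighted variations A_i = r_i * w_i
    a = [ri * wi for ri, wi in zip(r, w)]
    ip = [v for v in a if v > 0]               # increasing potencies
    dp = [v for v in a if v < 0]               # decreasing potencies
    aip = sum(ip) / len(ip) if ip else 0.0     # average increasing potency
    adp = sum(dp) / len(dp) if dp else 0.0     # average decreasing potency (<= 0)
    # Step 5: central location of the existing data
    lo, hi = min(x), max(x)
    cl = (lo + hi) / 2
    # Step 6: asymmetric expanded domain range
    edr_ll = lo + adp                          # adp is negative: extends downward
    edr_ul = hi + aip
    # Step 7: triangular TP function with apex 1 at the CL
    def tp(v):
        if v <= cl:
            return (v - edr_ll) / (cl - edr_ll) if cl > edr_ll else 1.0
        return (edr_ul - v) / (edr_ul - cl) if edr_ul > cl else 1.0
    return [tp(v) for v in x]

# Training item 2 from Table 2: reproduces TP1..TP5 of Table 4.
print([round(v, 5) for v in
       tptm_tp_values([28.5487, 38.744, 41.5082, 37.8523, 40.991])])
# [0.62861, 0.7664, 0.59262, 0.82247, 0.62514]
```

Per Step 8, when a new datum arrives the whole computation is simply re-run on the extended sequence, so the TP value of every existing datum is refreshed.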

Fig. 2. A triangular TP function and the TP value corresponding to the datum X(min:i).

3.3. The concept of the trend and potency tracking method (TPTM)

Nowadays the environment that we confront is dynamic, and the knowledge model that investigators try to explore will be influenced by the occurrence of the oncoming data. That is, collected data usually vary with time in dynamic surroundings, and the TPTM holds the concept of seizing the current change of information created by the latest data. In the TPTM, the generation of a TP value for each datum quantifies and raises the amount of information concerning the data trend and potency. The TPTM can thus give learning machines the benefit of accelerating the stabilization of knowledge acquisition.

4. Experimental studies

This research explores the benefits of the proposed trend and potency tracking method (TPTM) for learning machines. The experimental procedure is described in detail in the following sub-sections.

4.1. Application: analysis of the synthetic control chart time series (SCCTS) data

The experimental demonstration aims to show prediction by means of the TPTM, which combines the original


data and the trend and potency (TP) values when the collected data are insufficient in the early stages. For this purpose, this research selects a synthetic control chart time series data set (Alcock & Manolopoulos, 1999) collected in the Knowledge Discovery Database (http://kdd.ics.uci.edu/databases/synthetic_control/synthetic_control.html). The SCCTS data set contains 600 synthetically generated control chart observations divided into six classes (normal, cyclic, increasing trend, decreasing trend, upward shift, downward shift). That is, there are 100 examples for each class, and every example has 60-stage data. In this research, we choose the cyclic examples, listed in Table 1, for the experimental analysis. It is noteworthy that the purpose of this research is to create a knowledge-based model that can provide adequate information for applications concerning small-data-set learning. In this article, suppose each example has only five-stage data at present; given {X1, X2, ..., X5}, we can construct a knowledge-based model based on the current observations. Afterwards, we can apply this learned knowledge-based model to predict the

Table 1
The 100 cyclic data (each with 60 stages) in the synthetic control chart time series data set collected from Knowledge Discovery Database

No.   t1      t2      t3      t4       t5      t6      t7      t8      t9      t10     ...  t59     t60
1     35.771  34.396  35.225  46.2939  44.404  41.047  37.556  26.676  25.004  17.224  ...  30.777  24.585
2     24.971  33.832  46.942  42.503   40.059  38.257  30.974  24.269  19.556  14.96   ...  11.414  13.196
3     35.535  41.707  39.171  48.396   38.61   34.576  39.418  29.16   21.181  20.481  ...  37.225  32.949
4     24.21   41.768  45.223  43.776   48.818  42.406  33.754  27.353  21.977  22.225  ...  40.308  45.007
5     31.556  33.811  47.76   40.406   34.213  26.856  30.115  13.082  19.803  19.383  ...  35.197  23.164
6     28.841  34.949  42.306  35.188   38.322  29.961  28.558  27.03   14.152  17.746  ...  40.234  28.644
7     28.4    39.884  40.404  42.5     42.754  32.951  29.775  20.679  16.198  15.693  ...  32.696  26.729
8     34.54   36.109  40.715  40.938   35.092  35.817  38.318  24.396  23.492  19.829  ...  37.639  38.26
9     28.261  33.457  43.39   41.215   37.153  33.437  30.182  25.123  18.335  20.659  ...  38.329  26.839
...
98    28.075  41.784  42.12   38.735   44.32   34.316  32.212  31.868  24.301  14.547  ...  46.458  44.432
99    33.67   38.675  39.742  41.989   37.291  43.975  31.909  25.878  31.08   15.858  ...  36.91   39.749
100   30.573  41.074  44.979  44.922   43.272  39.713  33.097  31.012  26.03   22.191  ...  19.72   12.751

Table 2
A randomly selected training set (15 data) for the 5–5–1 BPN learning structure

      Input                                             Output
No.   X1       X2       X3       X4       X5       X6
1     28.4004  39.8835  40.4039  42.5002  42.7544  32.9513
2     28.5487  38.744   41.5082  37.8523  40.991   40.9896
3     30.7568  30.4465  35.8128  42.8655  38.2801  43.9991
4     33.2194  35.8685  36.955   47.1315  38.3517  37.7406
5     24.0037  35.8809  38.8854  39.6478  39.082   40.1985
6     26.3583  41.0159  46.4368  42.2493  40.7608  31.7019
7     32.8485  32.6025  44.1224  44.172   38.9286  45.4076
8     30.6026  32.5901  37.9136  48.5191  39.0556  31.6838
9     24.1212  39.138   44.7903  39.59    38.8291  32.6002
10    34.7518  37.8581  33.9115  39.0104  40.3342  35.3339
11    32.3175  39.9719  40.6855  36.9174  36.5269  35.9285
12    29.7213  42.3627  36.3361  44.5727  34.6856  27.065
13    27.1111  34.0624  38.6419  36.5439  42.0513  32.2969
14    27.6615  37.1994  39.1576  44.2635  46.4725  40.1781
15    26.2397  35.0625  48.1225  49.9508  35.5473  40.5235


Table 3
The testing set (85 data) for the 5–5–1 BPN learning structure; the predicted value of the first testing item is 29.39766 (X̂6) while the actual value of X6 is 41.0473

      Input                                             Actual    Predicted
No.   X1       X2       X3       X4       X5       X6        X̂6
16    35.7709  34.396   35.2249  46.2393  44.4036  41.0473   32.56436
17    24.9706  33.8315  46.9423  42.5033  40.0592  38.257    40.27427
18    35.5351  41.7067  39.1705  48.3964  38.6103  34.5755   33.82551
19    24.2104  41.7679  45.2228  43.7762  48.8175  42.4056   41.08333
...
97    30.889   32.4894  37.0183  39.6446  45.9137  31.5081   35.64234
98    28.0747  41.7835  42.1198  38.7345  44.3196  34.3157   36.60514
99    33.6696  38.6754  39.7419  41.9887  37.2908  43.9746   38.9501
100   30.5729  41.0741  44.9793  44.9217  43.2718  39.713    34.12239

values of the incoming data, such as the value of X6. Namely, the five-stage data will be adopted to induce certain knowledge for predicting X6.

4.2. Applying BPN to the original data set

The learning tool used in this study is the Pythia software, and the back-propagation neural network (BPN) is adopted as a general learning machine for its convenience. Among the 100 cyclic examples, we randomly pick 15 examples as learning exemplars, listed in Table 2, to train the BPN; each of the 15 exemplars includes five input attributes and one output attribute, (X1, X2, X3, X4, X5, X6). The remaining 85 examples become the testing items, listed in Table 3, to verify the generalization of the BPN. Because of the limited training set, this study employs a BPN with a 5–5–1 structure as shown in Fig. 3 (three layers: one hidden layer with five neurons plus one output layer with one neuron; training with 1000 repetitions; learning rate 0.5) to conduct the prediction task. We repeat this experimental process ten times to obtain an average of X̂6, and use the mean square error (MSE) to evaluate the prediction quality of X6.

4.3. Applying TP values to augment the original data set

Since the original data in the training set contain little information (each training item with five data) with which to produce robust neural network learning, this research generates the TP value for each datum to augment the training set to a sufficient learning size. Take the second training item in Table 4 for example. The original data set is (X1, X2, X3, X4, X5, X6) = (28.5487, 38.744, 41.5082, 37.8523, 40.991, 40.9896). After augmenting the training set with the TP values, the modified training set becomes (X1, X2, X3, X4, X5, TP1, TP2, TP3, TP4, TP5, X6) = (28.5487, 38.744, 41.5082, 37.8523, 40.991, 0.62861, 0.7664, 0.59262, 0.82247, 0.62514, 40.9896). The TP values help the learning process acquire more stable knowledge owing to the extra information.
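A minimal sketch of this augmentation step follows; the TP values here are copied from Table 4 (row 2) rather than recomputed, since in practice they come from the TPTM procedure of Section 3.2.

```python
def augment(inputs, tp_values):
    """Concatenate the five original inputs with their five TP values,
    yielding the 10-attribute input vector for the 10-5-1 BPN."""
    return list(inputs) + list(tp_values)

# Second training item of Table 2 with its TP values from Table 4.
row2 = augment([28.5487, 38.744, 41.5082, 37.8523, 40.991],
               [0.62861, 0.7664, 0.59262, 0.82247, 0.62514])
print(len(row2))  # 10
```

The target X6 = 40.9896 is unchanged; only the input dimension grows from 5 to 10, which is why the network topology changes from 5–5–1 to 10–5–1.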
For the modified training set, this research uses a 10–5–1 neural network structure for learning the prediction knowledge and then trains this topology to output X̂6, as shown in Table 5.

Fig. 3. The 5–5–1 BPN structure including three layers, five neurons within the hidden layer and one neuron in the output layer.

4.4. Experimental results between the BPN and the proposed approach TPTM

This research implemented the forecasting task for the different cyclic time series data in SCCTS and computed the mean square error (MSE) between the predicted value (X̂6) and the actual value (X6) of the sixth stage. In the experiment, the 5–5–1 BPN structure's output is the predicted value of the sixth stage (X̂6), according to the


Table 4
A randomly selected training set with the corresponding TP values (15 data) for the 10–5–1 BPN learning structure

      Input                                                                                             Output
No.   X1       X2       X3       X4       X5       TP1      TP2      TP3      TP4      TP5      X6
1     28.4004  39.8835  40.4039  42.5002  42.7544  0        0.40001  0.3275   0.03542  0        32.9513
2     28.5487  38.744   41.5082  37.8523  40.991   0.62861  0.7664   0.59262  0.82247  0.62514  40.9896
3     30.7568  30.4465  35.8128  42.8655  38.2801  0.62027  0.6003   0.94572  0.71972  0.92669  43.9991
4     33.2194  35.8685  36.955   47.1315  38.3517  0.83468  0.89764  0.92346  0.62881  0.95666  37.7406
5     24.0037  35.8809  38.8854  39.6478  39.082   0.22441  0.72123  0.51468  0.46227  0.50117  40.1985
6     26.3583  41.0159  46.4368  42.2493  40.7608  0.47976  0.79734  0.55947  0.74322  0.80854  31.7019
7     32.8485  32.6025  44.1224  44.172   38.9286  0.66216  0.64715  0.67     0.66714  0.96885  45.4076
8     30.6026  32.5901  37.9136  48.5191  39.0556  0.80863  0.85109  0.96481  0.62321  0.98921  31.6838
9     24.1212  39.138   44.7903  39.59    38.8291  0.47425  0.80072  0.56014  0.78148  0.81386  32.6002
10    34.7518  37.8581  33.9115  39.0104  40.3342  0.78648  0.93383  0.71081  0.83011  0.71097  35.3339
11    32.3175  39.9719  40.6855  36.9174  36.5269  0.60592  0.60224  0.52045  0.95233  0.99709  35.9285
12    29.7213  42.3627  36.3361  44.5727  34.6856  0.77651  0.80017  0.97559  0.7155   0.92592  27.065
13    27.1111  34.0624  38.6419  36.5439  42.0513  0.45728  0.96231  0.79881  0.90276  0.62989  32.2969
14    27.6615  37.1994  39.1576  44.2635  46.4725  0        0.98592  0.77773  0.23486  0        40.1781
15    26.2397  35.0625  48.1225  49.9508  35.5473  0.82934  0.95634  0.60416  0.53198  0.96332  40.5235

Table 5
The testing set with the corresponding TP values (85 data) for the 10–5–1 BPN learning structure; the predicted value of the first testing item is 43.29225 (X̂6), while the actual value of X6 is 41.0473

      Input                                                                                             Actual    Predicted
No.   X1       X2       X3       X4       X5       TP1      TP2      TP3      TP4      TP5      X6        X̂6
16    35.7709  34.396   35.2249  46.2393  44.4036  0.55773  0.42399  0.50462  0.74555  0.82443  41.0473   45.03059
17    24.9706  33.8315  46.9423  42.5033  40.0592  0.51245  0.90569  0.6149   0.7705   0.85618  38.257    42.34584
18    35.5351  41.7067  39.1705  48.3964  38.6103  0.77467  0.99092  0.90206  0.72466  0.88243  34.5755   38.73485
19    24.2104  41.7679  45.2228  43.7762  48.8175  0.26075  0.80671  0.6796   0.73282  0.54735  42.4056   42.51023
...
97    30.889   32.4894  37.0183  39.6446  45.9137  0        0.21304  0.8159   0.83451  0        31.5081   44.07551
98    28.0747  41.7835  42.1198  38.7345  44.3196  0.55562  0.72566  0.70915  0.87539  0.60112  34.3157   38.96939
99    33.6696  38.6754  39.7419  41.9887  37.2908  0.81877  0.90368  0.78229  0.52657  0.97654  43.9746   40.62916
100   30.5729  41.0741  44.9793  44.9217  43.2718  0.31977  0.7984   0.55968  0.5632   0.66406  39.713    40.88601

knowledge learned from the original data set (15 training data, each datum with five input attributes), while the 10–5–1 BPN structure's output is the predicted value (X̂6) based on the modified training set accompanied by the TP values (15 training data, each datum with 10 input attributes). The results are shown in Table 6.

Table 6
The MSE results concerning two kinds of prediction performances for the cyclic data in SCCTS

Experiment   Original data (5–5–1)   TPTM data (10–5–1)   MSE improvement (%)
1            75.2110                 54.9034              27.00
2            49.4867                 39.4013              20.38
3            66.6144                 59.8310              10.18
4            52.4217                 51.0857               2.55
5            54.7936                 53.8367               1.75
6            56.9419                 51.0015              10.43
7            51.1941                 48.8599               4.56
8            55.6704                 49.5393              11.01
9            48.5102                 45.0901               7.05
10           45.2612                 39.2403              13.30
Average      55.6105                 49.2789              10.82
Variance     73.2404                 38.6117               0.56
A 5–5–1 BPN structure is trained with the original data, while a 10–5–1 BPN topology is trained with the TPTM data.
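The improvement column in Table 6 can be re-derived from the two MSE columns as (original − TPTM) / original × 100. A short plain-Python check, with the values copied from the table:

```python
# MSE values copied from Table 6.
original = [75.2110, 49.4867, 66.6144, 52.4217, 54.7936,
            56.9419, 51.1941, 55.6704, 48.5102, 45.2612]
tptm = [54.9034, 39.4013, 59.8310, 51.0857, 53.8367,
        51.0015, 48.8599, 49.5393, 45.0901, 39.2403]

# Per-experiment percentage improvement, rounded to two decimals.
improvements = [round((o - t) / o * 100, 2) for o, t in zip(original, tptm)]

avg_original = sum(original) / len(original)
avg_tptm = sum(tptm) / len(tptm)
```

The recomputed figures match the table: the first experiment improves by 27.00%, and the column averages are 55.6105 and 49.2789.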


5. Conclusions and discussions

In this research, we present a new learning algorithm to address a problem that arises in the early stages of manufacturing systems: the available data are too few, and the information they carry is insufficient for the learning process. An approach named TPTM is developed to extract the information concealed in a set of successive observations. Briefly, TPTM identifies the data trend by considering the occurrence order of the observed data, and quantifies the potency of each existing datum by computing its TP value through the ratio rule of a triangle. The experimental results show that TPTM can raise the prediction performance of learning machines such as neural networks by systematically expanding the domain of the training items. In addition, the training mechanism of TPTM conducts an incremental learning process, which makes it practical for learning knowledge in dynamic, early stages. In today's global, competitive environment, where product life cycles grow ever shorter, TPTM can help engineers and decision makers obtain management information more efficiently. This research also proposes the concept of asymmetric domain-range expansion to capture possible changes in data behavior at different phases. An open question is how many examples are needed to stabilize the learning process in small-data-set learning; this issue may be an interesting subject for further study.
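The TP computation itself is defined earlier in the paper and is not reproduced in this excerpt; the sketch below only illustrates the two ideas restated in the conclusions, under stated assumptions: a potency that grows linearly with occurrence order (the triangle ratio, peaking at the most recent observation), and the augmentation that turns each five-attribute training item into the ten-attribute item used by the 10–5–1 BPN. The function names and the linear ramp are assumptions for illustration, not the paper's exact formula.

```python
def tp_values(n):
    """Hypothetical linear triangle-ratio potency: the latest of n
    time-ordered data gets potency 1, earlier data are discounted linearly."""
    return [(i + 1) / n for i in range(n)]

def augment_with_tp(item, tp):
    """Append the TP values to the raw attributes, turning a 5-attribute
    training item into a 10-attribute item."""
    assert len(item) == len(tp)
    return list(item) + list(tp)

weights = tp_values(5)                                 # [0.2, 0.4, 0.6, 0.8, 1.0]
item = [35.7709, 34.396, 35.2249, 44.4036, 32.4894]    # illustrative X1..X5
augmented = augment_with_tp(item, weights)             # 10 attributes in total
```

In TPTM proper, the TP values additionally encode the trend captured from the sequence, which is what lets the network exploit the occurrence order of the data.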