Utilize bootstrap in small data set learning for pilot run modeling of manufacturing systems


Tung-I Tsai a, Der-Chiang Li b,*

a Department of Information Management, Tainan University of Technology, Tainan, Taiwan
b Department of Industrial and Information Management, National Cheng Kung University, 1, University Road, Tainan 701, Taiwan
* Corresponding author. Tel.: +886 6 2757575x50211; fax: +886 6 2766417. E-mail address: [email protected] (D.-C. Li).

Expert Systems with Applications 35 (2008) 1293–1300

Abstract

If the production process, production equipment, or material changes, it becomes necessary to execute pilot runs before mass production in manufacturing systems. Using the limited data obtained from pilot runs to shorten the lead time in predicting future production is thus worthy of study. Although artificial neural networks are widely utilized to extract management knowledge from acquired data, sufficient training data is one of their fundamental assumptions. Unfortunately, this is often not achievable for pilot runs, because few data are obtained during the trial stages, which theoretically means that the knowledge obtained is fragile. The purpose of this research is to utilize the bootstrap to generate virtual samples that fill the information gaps of the sparse data. The results of this research indicate that the prediction error rate can be significantly decreased by applying the proposed method to a very small data set.

Keywords: Small data set; Bootstrap; Pilot runs; Manufacturing system

1. Introduction

In the production process, we usually execute a few pilot runs before a mass production plan in manufacturing systems. Using the limited data (small data sets) obtained to build a robust management model for mass production is difficult, because the data obtained from pilot runs are often incomplete and insufficient. Besides, as product life cycles grow shorter and shorter, the ability to rapidly and robustly transfer the experience learned from pilot runs to mass production has become a core competence of enterprises. Furthermore, since a comprehensive manufacturing problem is generally known to be complicated and nonlinear, the artificial neural network (ANN), which has non-linear modeling capability, has been widely utilized to extract management knowledge from acquired data. But


having sufficient training samples is one of its basic assumptions. Researchers believe that without sufficient training samples, neural networks suffer non-negligible errors. In our experience, a sufficient training sample set is defined as a sample set which provides enough information for a learning method to obtain stable learning accuracy. In other words, a training sample set which cannot establish a robust ANN is considered insufficient, and is named a small sample set in this research. This sample size is believed to be closely related to the stability of the system environment, the production method used, and the sample dimensions. Therefore, an unstable manufacturing situation generally needs more samples than a stable one, and problems with fewer evaluation attributes need fewer samples. Regarding sample sufficiency, computational learning theory (Anthony & Biggs, 1997) looks for answers to machine learning questions concerning sample size, such as: how many training examples are needed for successful learning, how much computation is needed for successful learning, and what is the estimated misclassification rate in learning. Anthony and Biggs


(1997) also developed the probably approximately correct (PAC) concept to identify classes of hypotheses that can or cannot be learned from a polynomial number of training examples with a reasonable amount of computation. Furthermore, Vladimir (2000) defined a sample size as small if the ratio of the number of training samples to the VC dimension of the learning machine function is less than 20. However, these theories focus on general machine learning with a large number of training samples, and cannot be applied to practical cases requiring a small data set learning model.

Adding some artificial data to the system is one effective approach to increase learning accuracy. In virtual data generation, mostly used in pattern recognition, Niyogi, Girosi, and Tomaso (1998) used prior knowledge obtained from a given small training set to create virtual examples to improve recognition ability. In their method, from a given 3D view of an object, new views may be generated from any other angle through mathematical transformations. The new views generated are called virtual samples. With these virtual samples, a learning machine can verify an instance more precisely. Niyogi et al. (1998) proved that the process of creating virtual samples is mathematically equivalent to incorporating prior knowledge.

Few closely related studies in the field of manufacturing are found in the literature. Li and Lin (2006) proposed the Functional Virtual Population (FVP) approach, involving the use of a neural network to learn scheduling knowledge in dynamic manufacturing environments. The FVP approach was the first method proposed for small data set learning for scheduling problems; it was developed to expand the domain of the system attributes and generate a number of virtual samples for constructing so-called early scheduling knowledge. However, being based on a trial and error procedure, the FVP approach requires many steps to complete the process. Li, Wu, Tsai, and Chang (2006) utilized a unique data fuzzification technique, named mega-fuzzification, combined with a data trend estimation procedure to systematically expand the small data set obtained in the early stages of manufacturing. In their study, the Adaptive-Network-based Fuzzy Inference System (ANFIS), proposed by Jang (1993), was applied to neuro-fuzzy learning. Although, according to the results achieved by Li et al. (2006), the learning accuracy is improved, the ANFIS is not commonly accepted in real-world industries, and has been shown to be insensitive in small data set learning.

Huang and Moraga (2004) combined the principle of information diffusion (Huang, 1997) with a traditional neural network, called the diffusion-neural network (DNN), for function learning. According to their numerical experiments, the DNN improved the accuracy of the backpropagation neural network (BPN). The information diffusion approach partially fills the information gaps caused by data incompleteness by applying fuzzy theories to derive new samples, but the research does not provide clear indications for determining the diffusion functions

and diffusion coefficients. Besides, the symmetric diffusion technique sometimes oversimplifies the generation of new samples, which can cause over-estimation of the domain range; either under-estimating or over-estimating the range leads to reduced accuracy. Therefore, in order to fill the information gaps more fully, Li, Wu, Tsai, and Lin (2007) substituted diffusing the whole sample set for diffusing samples one by one, using a technique called mega diffusion. Furthermore, a data trend estimation concept was combined with the mega diffusion technique to avoid over-estimation. This combination of mega diffusion and data trend estimation was called mega-trend-diffusion by Li et al. (2007). Following mega-trend-diffusion, the production of virtual samples was proposed to improve FMS scheduling accuracy. Unfortunately, in their research the DNN is adopted to extract knowledge; the DNN has twice as many input factors as the original data, which means the network requires much more complex calculations than a plain ANN. Ivănescu, Bertrand, Fransoo, and Kleijnen (2006) proposed a procedure to solve the limited data problem in batch process industries. They assumed the job arrival moments obey a Poisson arrival process and utilized a bootstrap procedure to generate 250 additional bootstrap jobs. According to their results, the proposed procedure improved the regression modeling performance.

In this research, the bootstrap is applied to generate virtual samples in order to fill the information gaps. However, we execute the bootstrap procedure once for each input factor, rather than resampling whole jobs. The processes are described in detail in Section 3, and a real data set acquired from a Taiwanese manufacturer of multi-layer ceramic capacitors (MLCC) is used to illustrate the effectiveness of the proposed procedure. The research is organized as follows: Section 2 briefly states the physical meanings of the real data and the input factors of the ANN model; Section 3 offers a numerical example of the proposed method; and computational results and conclusions are provided in Section 4.

2. The physical meanings of real data and input factors of the ANN model

A capacitor is a kind of passive component used for storing and releasing an electric charge in a very short period of time. The MLCC (multi-layer ceramic capacitor) is a product whose basic material is ceramic powder. The cost of ceramic powder contributes about 40% of the entire production cost, and thus greatly influences the profit margin. Most of the key technology for ceramic powder is currently held by Japanese manufacturers, though some domestic Taiwanese manufacturers are developing their own techniques. Many of the physical characteristics of ceramic powder are difficult to understand and control, and one notable problem is the low stability among batches. Consequently, manufacturers (users) must do some pilot runs after receiving a batch of powder to affirm


the dielectric constant (K-value), which is the dielectric property that determines the amount of electrostatic energy stored in a capacitor relative to a vacuum, and is considered the material's most important characteristic. This work delays production and thus increases costs. Therefore, if one can appraise the K-value faster and infer the production parameters and the defect rate, the related lead time and stock costs will be decreased.

In this study, we focus on AD143 ceramic powder made by the Ferro Company for producing Y5V products. Since this powder is consumed in large quantities, instead of doing test runs we expect to build a forecast system from the data set of 44 samples (batches) provided by Ferro. After consulting the domestic Taiwanese manufacturer, the twelve input factors utilized in this study are listed as follows:

(1) Specific surface area (SA): the sum of the surface area within a specific volume. It is measured by putting the powder into a container of regular size with gas, and measuring the surface area through the amount of gas the powder absorbs.

(2) Particle size distribution-90th percentile (PSD-90): in a unit of time, the 90th percentile of the diameter of the ceramic powder flowing through.

(3) Particle size distribution-50th percentile (PSD-50): in a unit of time, the 50th percentile of the diameter of the ceramic powder flowing through.

(4) Particle size distribution-10th percentile (PSD-10): in a unit of time, the 10th percentile of the diameter of the ceramic powder flowing through.

(5) Moisture content (Mois): the moisture content of the ceramic powder, measured after baking at 130 °C for an hour.

(6) Sintering temperature (sinter temp.): the temperature in the furnace while sintering in the laboratory.

(7) The K-value (K): the K-value of samples acquired from the laboratory.

(8) Dissipation factor (DF): the portion of the total energy in the capacitor that is lost as internal heat, or the ratio of energy dissipated to energy stored. A large DF will damage the MLCC; the DF of an ideal capacitor equals 0.

(9) Temperature characteristic (TC-min): the minimum change of capacitance over the specified temperature range, governed by the specific dielectric material.

(10) Temperature characteristic (TC-max): the maximum change of capacitance over the specified temperature range, governed by the specific dielectric material.

(11) Curie temperature (TC peak): the temperature at which the ceramic material exhibits a peak or sudden increase in dielectric constant is called the Curie point. Chemical agents may be added to shift and/or depress the Curie point; this is a major consideration in designing specific TCC limits.

(12) Particle size distribution-50th percentile of the production line (D-50): in a unit of time, the 50th percentile of the diameter of the ceramic powder flowing through, measured at the powder milling station of the manufacturing company.

The only value for the output node of the network is the real K-value (RK): the K-value of samples acquired from the pilot runs.

3. The detailed processes

The bootstrap, proposed by Efron and Tibshirani (1993), involves resampling a given data set with replacement and is used for measuring the accuracy of statistical estimates. In this paper, we attempt to use the bootstrap method to generate virtual samples and solve the learning problem using the data sheet (44 data in total) provided by a manufacturer of MLCC. At the beginning of this case study, we simulate the situation in which the company is in the early stage of production and only three data are available: we use only three data for training the neural network and the rest of the data for validation. Following this, to represent each of the manufacturing stages, we repeat the experiment with other training-set sizes (5–35, in increments of 5).

To explain the procedure in detail: a total of 44 data are obtained from the manufacturer, as shown in Table 1. Among them, this research randomly selects a specific number (3, and 5–35 in increments of 5) of data as the training set, and uses the rest as the testing data for evaluating the average learning error rates of the ANN. Thus the experimental scales in this study are 3, 5, 10, 15, 20, 25, 30 and 35 training data. The following is an example of the process with 10 training data; the procedure is depicted in steps:

Step 1. Select 10 data randomly from Table 1 as the training data for the ANN. The selected data set is listed in Table 2.

Step 2. Use the data in Table 2 to construct an ANN. The RK value in Table 2 is the value assigned to the output node of the ANN; the others are the inputs. Although other researchers, such as Amirakian and Nishimura (1994) and Wang, Dimassimo, Tham, and Morris (1994), provided algorithms suggesting ways to determine the number of hidden nodes and hidden layers, these numbers are believed to differ case by case. In this study, the optimal structure of the ANN is determined by the Evolutionary Optimizer tool of the Pythia software. Pythia is a program for the development and design of neural networks and features backpropagation networks. This tool executes a genetic algorithm (GA) with a crossover rate of 0.2 and a mutation rate of 0.04.
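To make Steps 1 and 2 concrete before turning to the data tables, the following minimal Python sketch performs the random selection and the network training. It is only an illustration: the file name mlcc_44.csv and the column names are hypothetical, and scikit-learn's MLPRegressor with a fixed single hidden layer stands in for the GA-optimized backpropagation network that Pythia would produce.

```python
# A sketch of Steps 1-2, assuming the 44 batches of Table 1 sit in a CSV file
# "mlcc_44.csv" (hypothetical) with one column per input factor plus RK.
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

FACTORS = ["SA", "PSD90", "PSD50", "PSD10", "Mois", "SinterT",
           "K", "DF", "TCmin", "TCmax", "TCpeak", "D50"]

data = pd.read_csv("mlcc_44.csv")            # 44 rows: 12 factors + RK
rng = np.random.default_rng(seed=0)

# Step 1: randomly select 10 batches for training; the other 34 are held out.
train_idx = rng.choice(len(data), size=10, replace=False)
train = data.iloc[train_idx]
test = data.drop(index=data.index[train_idx])

# Step 2: train a backpropagation network with RK as the single output node.
scaler = StandardScaler().fit(train[FACTORS])
ann = MLPRegressor(hidden_layer_sizes=(6,), max_iter=5000, random_state=0)
ann.fit(scaler.transform(train[FACTORS]), train["RK"])
```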


Table 1
The 44 data obtained from the quality certificate of the supplier. (In the original layout the columns are grouped as Ferro property: SA–Mois; Ferro laboratory: sinter temp.–TC peak; factory production: D-50 and RK.)

No.  SA   PSD-90  PSD-50  PSD-10  Mois  Sinter temp.  K       DF·10⁴  TC-min  TC-max  TC peak  D-50   RK
1    2.8  2.15    1       0.44    0.05  1297          11,689  36      74.3    46.8    7.3      0.970  18,092
2    2.9  2.1     0.98    0.44    0.02  1307          11,142  40      73.6    41.8    7.2      0.995  17,145
3    2.8  2.05    0.98    0.46    0.02  1271          10,834  41      74.5    42.1    6.3      0.940  16,066
4    3    2.06    0.95    0.43    0.04  1292          10,345  37      73.8    44.5    7.2      0.910  16,188
5    3    1.97    0.94    0.45    0.06  1287          10,286  28      73.5    49.3    9        0.991  16,479
6    3.1  1.97    0.94    0.45    0.06  1282          10,806  30      74.5    47.5    7.2      0.905  16,642
7    3    1.97    0.92    0.43    0.06  1286          10,268  29      74.3    49.3    7.4      0.854  17,153
8    3.2  2.06    0.98    0.47    0.03  1274          10,904  29      74.4    49.3    7.5      0.876  15,201
9    3.0  2.11    0.96    0.43    0.05  1306          10,282  27      74      48.5    8.2      0.958  17,112
10   2.7  2.07    0.94    0.42    0.05  1268          10,620  35      74.2    40.6    5.5      0.888  17,577
11   3.1  2.12    0.96    0.42    0.01  1284          10,539  30      73.6    46.5    6.9      0.918  17,287
12   3.2  2.15    0.95    0.41    0.02  1283          10,797  30      74      43.5    6.5      0.843  17,292
13   2.9  2.01    0.95    0.44    0.04  1272          10,846  31      74.2    45.7    6.8      0.824  18,302
14   2.9  2.04    0.98    0.45    0.02  1281          10,533  30      73.9    41.9    6.5      0.862  16,662
15   3.2  2.00    0.97    0.48    0.01  1272          11,206  34      74.2    41.7    6.1      0.879  18,527
16   3.1  1.93    0.94    0.47    0.00  1268          10,523  33      74.3    40.4    5.6      0.886  16,606
17   3.3  1.88    0.96    0.5     0.07  1281          10,232  36      74      41.1    5.9      1.000  18,030
18   3.2  1.89    0.92    0.45    0.01  1296          10,651  31      74.2    45.3    6.9      0.883  17,781
19   3    1.87    0.94    0.46    0.02  1300          10,508  26      74      47.1    7.7      1.020  17,537
20   3    1.93    0.94    0.46    0.05  1284          11,003  29      74.9    46.1    6.5      0.830  16,935
21   3.1  2.01    0.94    0.43    0.08  1270          11,117  39      75      43.8    5.7      0.908  18,218
22   3.1  1.93    0.94    0.46    0.01  1292          10,479  35      74      43.9    6.8      0.948  18,181
23   3.1  1.83    0.96    0.5     0.07  1296          10,875  23      74.7    51.3    7.2      0.873  17,286
24   3.3  1.98    0.92    0.43    0.04  1262          11,750  34      75.6    49      6.4      0.917  16,681
25   3.1  2       0.93    0.43    0.02  1270          11,101  25      74.8    53.4    6.7      0.850  16,603
26   3    1.92    0.95    0.47    0.01  1305          10,493  40      74.1    44.3    5.2      0.880  18,230
27   2.9  1.92    0.95    0.47    0.05  1278          11,442  27      75.1    52.1    5.9      1.030  17,187
28   3    1.98    0.94    0.44    0.01  1269          10,682  28      74.5    52.5    6.8      0.950  17,047
29   3    1.97    0.99    0.5     0.05  1274          11,008  22      74.8    54      6.7      0.846  17,308
30   3.2  2.04    0.96    0.45    0.05  1270          10,994  25      74.6    54      7        0.914  17,052
31   3.2  1.91    0.93    0.46    0.06  1273          11,295  32      75      50.6    5.8      0.868  18,624
32   3.1  1.87    0.96    0.5     0.04  1280          11,018  28      74.6    51.4    6.3      0.859  17,687
33   3.2  1.82    0.95    0.5     0.02  1280          11,590  27      75.3    58.3    7        0.955  17,635
34   2.9  1.83    0.95    0.49    0.01  1281          11,384  25      75.6    54.1    6.4      0.878  18,623
35   2.6  1.93    0.98    0.49    0.04  1270          11,513  24      75.6    55.2    6.2      0.991  20,156
36   2.8  1.91    0.97    0.49    0.02  1286          10,875  21      75.4    55      6.4      1.062  20,164
37   3.1  1.97    0.97    0.48    0.04  1293          11,804  25      76.4    59.8    7        0.873  19,840
38   2.9  1.91    0.96    0.48    0.03  1283          11,388  26      76.4    56.1    6.3      0.956  20,230
39   2.9  1.94    0.96    0.47    0.03  1273          12,242  30      75.4    55.3    6.3      0.874  19,044
40   2.8  1.88    0.99    0.52    0.01  1288          11,312  30      75.1    51.2    6.1      0.918  17,427
41   2.8  1.86    0.98    0.51    0.08  1266          10,858  36      74.8    47.3    5.3      1.008  17,747
42   3    1.87    0.97    0.51    0.01  1293          11,170  31      74.8    50.1    5.6      0.929  18,901
43   3    1.9     0.94    0.47    0.01  1292          11,673  33      75.2    49.7    5.2      0.957  16,640
44   3.1  1.96    0.96    0.48    0.07  1280          11,647  45      75.6    48.2    5.1      0.830  17,388

Table 2
The 10 training data selected randomly from the total data set

No.  SA   PSD-90  PSD-50  PSD-10  Mois  Sinter temp.  K       DF·10⁴  TC-min  TC-max  TC peak  D-50  RK
5    3    1.97    0.94    0.45    0.06  1287          10,286  28      73.5    49.3    9        0.94  16,479
7    3    1.97    0.92    0.43    0.06  1286          10,268  29      74.3    49.3    7.4      0.92  17,153
12   3.2  2.15    0.95    0.41    0.02  1283          10,797  30      74      43.5    6.5      0.95  17,292
19   3    1.87    0.94    0.46    0.02  1300          10,508  26      74      47.1    7.7      0.94  17,537
21   3.1  2.01    0.94    0.43    0.08  1270          11,117  39      75      43.8    5.7      0.94  18,218
22   3.1  1.93    0.94    0.46    0.01  1292          10,479  35      74      43.9    6.8      0.94  18,181
34   2.9  1.83    0.95    0.49    0.01  1281          11,384  25      75.6    54.1    6.4      0.95  18,623
35   2.6  1.93    0.98    0.49    0.04  1270          11,513  24      75.6    55.2    6.2      0.98  20,156
38   2.9  1.91    0.96    0.48    0.03  1283          11,388  26      76.4    56.1    6.3      0.96  20,230
39   2.9  1.94    0.96    0.47    0.03  1273          12,242  30      75.4    55.3    6.3      0.96  19,044


Table 3
The 100 virtual sample values acquired in Step 3. (For compactness, the values are listed factor by factor; within each row, the values correspond in order to virtual samples no. 1–100.)

SA: 2.9 2.6 2.6 3 2.9 3.1 3.2 3 2.9 2.9 3.1 3.1 2.9 3 2.6 2.9 3 2.9 2.9 3 2.6 2.6 3.1 2.6 3 2.9 3.1 2.9 3 3 3 3 3.1 3 2.9 3 3.1 3.2 2.9 2.9 3.2 2.9 2.9 2.6 2.9 3 3 3.2 3 3.2 3 2.6 3.2 3.1 2.9 3.1 2.6 3 3 3.1 2.9 3 2.9 3 2.9 3 3.1 2.9 2.9 3.2 3 3.2 2.9 3.2 3.2 3 3.1 3 2.9 2.9 3 2.9 3.1 2.9 3 2.6 2.9 2.6 3 2.6 3.2 2.6 3 2.9 3 3 2.6 3.1 2.9 3

PSD-90: 1.93 1.91 1.91 1.93 1.94 1.94 1.93 1.97 1.83 2.01 1.94 1.97 1.83 1.94 1.87 2.01 1.93 1.97 1.83 1.94 1.94 1.97 2.01 2.01 2.01 1.83 1.97 1.83 1.87 1.93 1.97 1.91 1.83 1.94 1.97 1.93 1.94 1.93 1.93 2.15 1.87 1.87 1.91 1.97 1.83 1.91 1.91 2.01 2.01 2.01 1.91 1.97 1.91 1.91 1.94 1.94 2.15 2.01 1.94 2.01 2.01 1.91 1.93 1.97 1.83 1.93 1.83 2.01 2.01 1.97 1.97 1.97 1.97 1.97 2.15 1.97 2.15 1.87 2.01 1.97 1.93 2.15 1.97 1.87 1.93 1.83 1.93 1.97 1.94 1.93 2.15 1.97 1.93 1.94 2.01 1.93 2.15 1.97 1.97 1.83

PSD-50: 0.96 0.94 0.94 0.94 0.98 0.94 0.95 0.92 0.98 0.98 0.94 0.96 0.94 0.96 0.94 0.94 0.94 0.95 0.96 0.94 0.94 0.94 0.94 0.94 0.92 0.95 0.94 0.94 0.96 0.95 0.94 0.95 0.94 0.94 0.94 0.94 0.95 0.96 0.96 0.92 0.94 0.94 0.96 0.96 0.94 0.96 0.95 0.96 0.95 0.94 0.94 0.94 0.94 0.95 0.94 0.94 0.94 0.98 0.95 0.94 0.96 0.95 0.95 0.94 0.95 0.96 0.94 0.94 0.98 0.94 0.92 0.95 0.95 0.95 0.94 0.94 0.95 0.95 0.96 0.92 0.92 0.95 0.94 0.92 0.94 0.94 0.94 0.94 0.92 0.94 0.94 0.96 0.92 0.98 0.96 0.92 0.92 0.94 0.94 0.96

PSD-10: 0.49 0.49 0.48 0.43 0.43 0.45 0.43 0.43 0.49 0.41 0.46 0.49 0.49 0.45 0.49 0.45 0.41 0.41 0.41 0.46 0.45 0.46 0.46 0.43 0.47 0.43 0.46 0.49 0.49 0.47 0.45 0.43 0.43 0.43 0.46 0.46 0.48 0.41 0.47 0.45 0.45 0.46 0.48 0.43 0.43 0.43 0.43 0.49 0.41 0.47 0.41 0.49 0.49 0.49 0.49 0.45 0.46 0.49 0.49 0.41 0.46 0.45 0.45 0.49 0.47 0.43 0.47 0.46 0.48 0.46 0.48 0.43 0.46 0.45 0.43 0.49 0.47 0.48 0.43 0.45 0.49 0.46 0.41 0.43 0.45 0.48 0.46 0.48 0.41 0.43 0.43 0.48 0.47 0.43 0.41 0.43 0.49 0.49 0.48 0.45

Mois: 0.06 0.02 0.03 0.03 0.03 0.01 0.04 0.02 0.04 0.01 0.02 0.03 0.06 0.04 0.01 0.03 0.03 0.03 0.03 0.03 0.01 0.02 0.03 0.01 0.02 0.08 0.01 0.06 0.02 0.01 0.03 0.06 0.01 0.06 0.01 0.01 0.03 0.01 0.08 0.04 0.01 0.02 0.08 0.08 0.06 0.01 0.02 0.06 0.08 0.02 0.02 0.01 0.01 0.08 0.06 0.04 0.08 0.04 0.02 0.02 0.01 0.01 0.06 0.08 0.03 0.03 0.06 0.02 0.02 0.01 0.08 0.03 0.03 0.04 0.03 0.03 0.03 0.04 0.06 0.06 0.04 0.03 0.02 0.06 0.01 0.03 0.02 0.03 0.06 0.06 0.01 0.01 0.01 0.08 0.03 0.08 0.08 0.02 0.03 0.02

Sinter temp.: 1270 1270 1270 1270 1286 1270 1281 1281 1281 1273 1286 1287 1286 1283 1270 1281 1286 1286 1287 1273 1292 1283 1281 1292 1270 1281 1270 1283 1283 1270 1300 1270 1287 1283 1273 1300 1270 1270 1292 1292 1292 1270 1286 1286 1270 1300 1283 1287 1283 1287 1300 1281 1283 1270 1292 1283 1286 1270 1273 1286 1283 1273 1283 1292 1270 1270 1283 1287 1287 1281 1270 1292 1286 1283 1270 1270 1286 1287 1292 1270 1300 1281 1292 1270 1286 1283 1286 1270 1270 1270 1270 1273 1270 1283 1283 1283 1300 1287 1286 1273

K: 10,268 12,242 11,513 10,797 10,479 11,513 12,242 10,268 10,797 10,508 11,384 10,286 10,479 11,388 10,268 11,384 10,479 10,797 10,286 11,117 11,388 10,508 10,508 11,388 10,508 10,508 11,117 11,513 11,513 11,388 11,384 11,513 11,384 11,513 10,479 11,117 10,268 10,508 10,286 11,117 11,384 11,513 10,286 10,268 10,508 11,384 10,479 10,797 11,513 10,286 10,268 11,117 10,797 10,508 11,384 10,286 10,286 11,384 11,513 10,479 11,388 11,388 11,117 10,268 11,513 10,508 11,388 12,242 10,286 12,242 12,242 11,513 11,117 10,797 12,242 12,242 12,242 10,797 10,268 10,268 11,117 11,388 12,242 10,286 10,479 10,268 10,479 12,242 11,384 10,268 11,384 10,797 11,117 11,117 10,286 11,384 11,513 10,286 10,268 11,117

DF·10⁴: 29 28 30 39 26 28 28 26 29 35 39 26 35 28 30 26 28 28 29 30 39 28 35 26 35 26 24 30 26 29 24 39 35 29 24 28 30 29 25 26 26 24 30 28 25 39 30 26 24 35 29 26 39 26 26 25 25 24 35 30 26 30 28 30 39 30 28 26 28 30 35 30 26 24 30 26 24 26 26 24 28 26 30 28 28 26 39 26 35 26 24 28 25 25 25 26 24 30 26 29

TC-min: 73.5 74.3 75.6 74 74 75 74 76.4 74 74 74 74 74 76.4 76.4 75.6 73.5 75.6 74 74 76.4 75 73.5 75.6 74 74.3 75.4 75.6 73.5 74 73.5 73.5 74 76.4 74 74 74 75.6 75.6 74 74.3 75.6 75.4 73.5 74 74 73.5 75.6 74 74 76.4 75 75.4 75 76.4 76.4 75.6 75.6 74 74.3 74 75.6 75.6 75.6 76.4 74.3 75.4 74 76.4 74 75.4 75 74 73.5 76.4 75.6 74 75 75 75 75.6 73.5 73.5 74 75.6 75.6 75.4 74 74 75.6 74.3 75.6 74 75.4 75.4 74 74 73.5 75 75.6

TC-max: 43.5 49.3 49.3 43.8 49.3 56.1 49.3 49.3 55.2 49.3 43.8 49.3 47.1 47.1 55.2 43.9 54.1 49.3 49.3 55.2 43.9 55.3 43.8 49.3 49.3 43.8 43.9 49.3 56.1 55.2 43.5 47.1 47.1 54.1 55.2 47.1 43.9 55.2 55.2 43.9 47.1 49.3 55.2 55.3 43.5 43.8 54.1 55.3 49.3 49.3 47.1 55.3 43.5 56.1 56.1 49.3 56.1 43.5 56.1 49.3 47.1 49.3 54.1 43.5 49.3 43.9 43.5 43.5 49.3 43.5 43.5 55.2 43.8 55.2 49.3 55.3 49.3 49.3 54.1 55.2 55.3 54.1 43.5 49.3 47.1 54.1 43.9 56.1 54.1 43.9 43.8 43.9 43.8 54.1 43.9 47.1 49.3 47.1 54.1 56.1

TC peak: 5.7 6.4 6.2 6.2 6.8 7.4 6.4 5.7 7.7 6.3 7.4 6.3 6.3 9 5.7 5.7 7.4 6.3 9 6.3 6.3 6.3 6.3 6.5 7.7 7.4 7.4 6.4 5.7 6.5 7.7 6.3 5.7 7.7 6.2 7.7 6.3 6.8 7.4 6.3 7.7 6.2 6.8 6.2 7.4 6.3 6.2 6.5 6.3 6.2 7.7 6.8 7.7 6.5 5.7 6.3 6.3 6.5 7.7 5.7 5.7 7.7 5.7 6.5 9 7.7 6.8 7.4 6.2 6.3 5.7 6.4 6.2 6.4 7.7 7.7 6.8 5.7 6.5 6.4 6.4 7.7 9 7.7 7.7 7.4 5.7 6.5 7.4 6.3 6.8 6.8 7.7 6.3 6.3 5.7 9 7.7 7.7 7.7

D-50: 0.94 0.94 0.95 0.92 0.96 0.98 0.95 0.94 0.94 0.94 0.95 0.96 0.96 0.96 0.98 0.98 0.95 0.94 0.94 0.92 0.96 0.94 0.94 0.95 0.94 0.94 0.94 0.94 0.98 0.94 0.94 0.96 0.94 0.94 0.94 0.94 0.96 0.98 0.94 0.95 0.94 0.94 0.98 0.94 0.94 0.92 0.96 0.95 0.98 0.92 0.94 0.96 0.95 0.98 0.96 0.94 0.96 0.96 0.92 0.98 0.94 0.95 0.92 0.94 0.96 0.94 0.95 0.94 0.96 0.95 0.94 0.94 0.94 0.96 0.94 0.92 0.94 0.94 0.96 0.95 0.98 0.94 0.92 0.94 0.98 0.94 0.98 0.98 0.98 0.96 0.96 0.98 0.94 0.94 0.95 0.95 0.95 0.95 0.94 0.98

RK: 18,356.5 18,922.0 20,011.2 18,350.1 17,716.7 18,513.0 18,348.7 19,712.8 19,055.4 17,944.0 18,315.4 18,371.4 18,430.3 18,041.9 20,188.3 18,575.9 17,976.9 18,371.4 17,209.2 18,241.2 19,530.4 19,889.0 17,987.7 19,589.1 18,005.1 17,254.0 18,357.5 19,527.0 18,472.4 18,195.4 17,437.5 18,317.6 18,317.9 19,369.1 17,482.6 17,861.3 18,330.2 18,436.2 19,741.3 17,431.2 17,988.6 18,540.0 19,512.5 17,588.7 17,394.4 18,065.1 18,473.6 19,141.1 17,969.3 18,345.4 17,198.4 19,216.1 18,458.0 18,782.8 20,219.7 19,116.2 19,516.8 18,433.4 18,309.1 18,284.2 18,471.5 18,546.5 20,209.0 17,574.4 18,783.7 17,436.0 18,306.1 17,834.6 18,720.1 18,407.5 19,760.3 18,437.3 17,684.6 18,371.7 18,752.3 19,300.2 18,041.0 18,518.3 17,879.2 18,757.9 19,147.6 17,943.8 17,530.8 17,639.0 18,244.7 19,983.9 18,500.0 19,058.8 18,376.3 17,450.6 17,916.9 17,798.8 18,213.4 20,140.5 17,284.4 17,921.7 16,652.4 18,170.0 18,823.9 18,553.8

The Evolutionary Optimizer's initial generation contains 50 randomly created networks; each network in the generation is trained briefly and its fitness is determined according to the parameters set in "goals to achieve". The 10 fittest networks of the old generation are retained as the parents of the next generation. The search continues until it finds a network with a fitness of 100, or until the 1000th generation. "Goals to achieve" is a setting that specifies what the network should be optimized for. Three goals are possible:

1. Optimize for medium deviation (Ø deviation <).
2. Optimize for maximum deviation within the pattern set (*deviation <).
3. Optimize for size (# neurons <=).

In this research, we use the default settings of the Pythia software: the medium deviation should be below 0.001, the maximum deviation should be below 0.1, and the network size should be at most 100 neurons. The checked goals contribute equally to the overall fitness of an evolutionarily created network. After the topology of the network is determined, the network is trained until 1000 repetitions or 300 s have passed, with a learning rate of 0.5 (the default settings).
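Pythia's Evolutionary Optimizer is a closed tool, but the kind of search just described can be sketched as follows. The population size of 50, the 10 retained parents, and the crossover and mutation rates of 0.2 and 0.04 follow the description above; the topology search space (one or two hidden layers of 1–10 neurons), the 30-generation budget, and the fitness definition (negative mean deviation over the training patterns) are assumptions made purely for illustration.

```python
# An illustrative evolutionary topology search (not Pythia itself).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

def random_topology():
    # Assumed search space: 1-2 hidden layers with 1-10 neurons each.
    return tuple(int(rng.integers(1, 11)) for _ in range(int(rng.integers(1, 3))))

def fitness(topology, X, y):
    # Smaller mean deviation on the training patterns -> higher fitness.
    net = MLPRegressor(hidden_layer_sizes=topology, max_iter=300,
                       random_state=0).fit(X, y)
    return -np.mean(np.abs(net.predict(X) - y) / y)

def evolve(X, y, generations=30, pop_size=50, n_parents=10,
           crossover_rate=0.2, mutation_rate=0.04):
    population = [random_topology() for _ in range(pop_size)]
    for _ in range(generations):                       # slow but illustrative
        ranked = sorted(population, key=lambda t: fitness(t, X, y), reverse=True)
        parents = ranked[:n_parents]                   # keep the 10 fittest
        children = []
        while len(children) < pop_size - n_parents:
            a, b = rng.choice(n_parents, size=2, replace=False)
            child = list(parents[a])
            if rng.random() < crossover_rate:          # swap in a layer from b
                i = int(rng.integers(len(child)))
                child[i] = parents[b][i % len(parents[b])]
            if rng.random() < mutation_rate:           # perturb one layer size
                i = int(rng.integers(len(child)))
                child[i] = int(rng.integers(1, 11))
            children.append(tuple(child))
        population = parents + children
    return max(population, key=lambda t: fitness(t, X, y))

# Usage with the Step 1-2 sketch above:
# best = evolve(scaler.transform(train[FACTORS]), train["RK"].to_numpy())
```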

Step 3. Execute the bootstrap procedure on the data in Table 2, once for each input factor, to acquire a virtual sample. After the input factors of a virtual sample are obtained, its RK value is computed using the ANN constructed in Step 2. Repeating this procedure 100 times, we acquire 100 virtual samples (the determination of the optimal number of virtual samples needs further study). The resulting values of each factor are given in Table 3.

Step 4. Apply the data in Tables 2 and 3 together as the training data to train a new ANN. The optimal structure of the new ANN is again determined by the Pythia software, as in Step 2.

Step 5. Use the rest of the data in Table 1 as the testing data for the new ANN and calculate the average error rate, defined as

$$\text{average error rate} = \frac{1}{n}\sum_{i=1}^{n}\frac{\lvert RK_i - \text{output of network}_i \rvert}{RK_i},$$

where n is the number of samples in the validation set and i = 1, 2, ..., n; in this example n equals 34. The RK values and the corresponding network outputs are shown in Table 4, and the average error rate is 0.046967.

Step 6. Repeat Steps 1 to 5 ten times and calculate the average error rate.

Step 7. Repeat Steps 1 to 6 with the different scales of training data sets.
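To make Steps 3–5 concrete, the sketch below continues the two sketches above (train, test, scaler, ann, FACTORS and rng are defined there). Reading the paper's "execute the bootstrap once for each input factor" as drawing each factor value independently, with replacement, from its 10 training values is our interpretation, and the fixed hidden layer again stands in for a Pythia-optimized topology.

```python
# A sketch of Steps 3-5: bootstrap virtual samples, retrain, evaluate.
import pandas as pd
from sklearn.neural_network import MLPRegressor

# Step 3: bootstrap 100 virtual input vectors factor by factor,
# then label each one with the Step-2 ANN.
virtual = pd.DataFrame({f: rng.choice(train[f].to_numpy(), size=100, replace=True)
                        for f in FACTORS})
virtual["RK"] = ann.predict(scaler.transform(virtual[FACTORS]))

# Step 4: retrain a new ANN on the 10 real + 100 virtual samples together.
augmented = pd.concat([train, virtual], ignore_index=True)
ann2 = MLPRegressor(hidden_layer_sizes=(6,), max_iter=5000, random_state=0)
ann2.fit(scaler.transform(augmented[FACTORS]), augmented["RK"])

# Step 5: average error rate (1/n) * sum(|RK_i - output_i| / RK_i)
# over the n = 34 held-out batches.
pred = ann2.predict(scaler.transform(test[FACTORS]))
error_rate = ((test["RK"] - pred).abs() / test["RK"]).mean()
print(f"average error rate: {error_rate:.4f}")
```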


Table 4
The 34 RK values and the corresponding network outputs

RK      Output of network
18,624  17,852.89397
17,427  18,000.44408
16,935  16,973.43009
16,662  16,600.94971
17,577  16,576.64653
18,230  17,737.32225
18,901  17,895.05039
17,781  16,474.66786
15,201  17,210.32833
16,066  17,365.64859
16,640  17,973.51696
16,603  17,608.13645
17,187  18,803.94318
17,388  18,083.67605
17,747  18,613.52504
19,840  19,805.3381
17,286  17,012.58289
16,606  17,655.21324
17,052  17,143.52793
18,092  17,104.96129
17,112  15,352.66035
18,527  17,531.23692
18,030  17,217.84834
17,145  16,738.40954
20,164  19,177.95804
16,642  16,696.15803
17,047  17,670.02683
17,308  17,668.71213
17,635  17,947.11132
16,681  17,982.80631
16,188  15,796.90434
18,302  16,889.58651
17,287  15,743.9563
17,687  17,819.19656

Table 5
The computational results: average error rate (%) by scale of training data set

Scale of training data set   3        5       10      15      20      25      30      35
Bootstrap                    8.1854   6.8855  6.0580  5.8520  5.2769  5.0971  4.3348  3.5647
Primitive data               11.2617  8.8688  7.0899  6.7002  7.0095  6.6064  6.0171  6.5554

Fig. 1. The comparisons of computational results: average error rate (%) versus scale of training data set, for the bootstrap procedure and the ANN trained on primitive data only.

4. Computational results and conclusions

The computational results are compared with the results obtained using the primitive data, as presented in Table 5 and Fig. 1.

It is obvious that the average error rate of the ANN using only primitive data does not converge as the number of training samples increases. That is, using such scarce pilot run data alone, the ANN cannot build up a robust forecast network. The proposed procedure, however, yields lower and more stable learning errors. Hence, when the data collected are insufficient, the procedure of this study makes the forecast system better and more stable. The above results are encouraging: as shown in Fig. 1, when the training data set increases, the average error rate of the proposed procedure monotonically decreases.

This research provides a useful forecast model that can build a precise model of the powder earlier in the production process, and thus raise the profit margin of the factory. If MLCC manufacturers follow the procedure outlined in this research, it is expected that the turnover rate of stock will rise, and thus interest costs can be reduced. The storehouse utilization rate will also improve.


Studies of small samples are few at present, and there is much potential for seeking better theories to obtain higher accuracy. We believe this research can be applied in the pilot run stages for other powders. We also believe that how to forecast yields more effectively for other new products can be regarded as a worthy subject for future research.

References

Amirakian, B., & Nishimura, H. (1994). What size network is good for generalization of a specific task of interest? Neural Networks, 7, 321–329.
Anthony, M., & Biggs, N. (1997). Computational learning theory. Cambridge University Press.
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York: Chapman & Hall.
Huang, C. F. (1997). Principle of information diffusion. Fuzzy Sets and Systems, 91, 69–90.
Huang, C. F., & Moraga, C. (2004). A diffusion-neural-network for learning from small samples. International Journal of Approximate Reasoning, 35, 137–161.
Ivănescu, V. C., Bertrand, J. W. M., Fransoo, J. C., & Kleijnen, J. P. C. (2006). Bootstrapping to solve the limited data problem in production control: An application in batch process industries. Journal of the Operational Research Society, 57, 2–9.
Jang, J. S. R. (1993). ANFIS: Adaptive-network-based fuzzy inference systems. IEEE Transactions on Systems, Man and Cybernetics, 23, 665–685.
Li, D. C., & Lin, Y. S. (2006). Using virtual sample generation to build up management knowledge in the early manufacturing stages. European Journal of Operational Research, 175, 413–434.
Li, D. C., Wu, C. S., Tsai, T. I., & Chang, F. M. (2006). Using mega-fuzzification and data trend estimation in small data set learning for early FMS scheduling knowledge. Computers & Operations Research, 33, 1857–1869.
Li, D. C., Wu, C. S., Tsai, T. I., & Lin, Y. S. (2007). Using mega-trend-diffusion and artificial samples in small data set learning for early flexible manufacturing system scheduling knowledge. Computers & Operations Research, 34, 966–982.
Niyogi, P., Girosi, F., & Tomaso, P. (1998). Incorporating prior information in machine learning by creating virtual examples. In Proceedings of the IEEE (pp. 275–298).
Vladimir, N. V. (2000). The nature of statistical learning theory. New York: Springer.
Wang, Z. N., Dimassimo, C., Tham, M. T., & Morris, A. J. (1994). A procedure for determining the topology of multilayer feedforward neural networks. Neural Networks, 7, 291–300.