AE—Automation and Emerging Technologies


Biosystems Engineering (2002) 82 (2), 151–159, doi:10.1006/bioe.2002.0064, available online at http://www.idealibrary.com

Specialist Neural Networks for Cereal Grain Classification

N. S. Visen1; J. Paliwal1; D. S. Jayas1; N. D. G. White2

1 Department of Biosystems Engineering, University of Manitoba, Winnipeg, MB, Canada R3T 5V6; e-mail of corresponding author: digvir [email protected]
2 Cereal Research Centre, Agriculture and Agri-Food Canada, 195 Dafoe Road, Winnipeg, MB, Canada R3T 2M9; e-mail: [email protected]

(Received 3 April 2001; accepted in revised form 13 February 2002; published online 11 June 2002)

In the past few years, artificial neural networks have gained widespread acceptance for cereal grain classification and identification tasks. With the availability of different types of neural network architectures, the choice of the architecture for a particular task becomes crucial. It was hypothesized that robust specialist networks can be designed using a combination of simple networks having similar or different network architectures. To test this hypothesis, the classification accuracies of four simple network architectures (namely, back propagation network (BPN), Ward network, general regression neural network (GRNN) and probabilistic neural network (PNN)) were compared with the accuracies given by specialist networks. Each specialist network was designed using a combination of five simple networks, each specializing in classifying one grain type. The grain types used in this study were Canada Western Red Spring (CWRS) wheat, Canada Western Amber Durum (CWAD) wheat, barley, oats and rye. To evaluate the classification accuracy of the different neural network architectures, high resolution colour images of 7500 kernels (1500 kernels of each grain type) were taken for training and testing of networks. For each kernel, eight morphological features (namely, area, perimeter, length of major axis, length of minor axis, elongation, roundness, Feret diameter and compactness) and four colour features (namely, mean, median, mode and standard deviation of the grey-level values of the objects in the image) were extracted and used as input to the neural networks. The best classification accuracies (98.7, 99.3, 96.7, 98.4 and 96.9% for barley, CWRS wheat, CWAD wheat, oats and rye, respectively) were obtained using specialist probabilistic neural networks.

© 2002 Silsoe Research Institute. Published by Elsevier Science Ltd. All rights reserved.

1. Introduction

Machine vision systems (MVS) are a key link in the process of achieving total automation for various processes in different types of industries. In the recent past, efforts to develop MVS for industrial applications have increased considerably owing to the availability of inexpensive and fast processing hardware. Some such systems are fully functional in manufacturing, processing and packaging industries where most of the items handled have similar shape, size, colour or other visually distinguishable features. This, however, is not applicable to agricultural products such as cereal grains where features such as shape, size, colour and texture are not governed by a unique mathematical function. The natural variability of these products makes the task of identification and classification extremely challenging and computationally intensive because of the need to have a large number of classification features.

Much research has been conducted into determining the potential of morphological features to classify different grain species, classes, varieties, damaged grains and impurities using statistical pattern recognition techniques (Segerlind & Weinberg, 1972; Neuman et al., 1987; Keefe, 1992; Sapirstein & Kohler, 1995; Barker et al., 1992a, 1992b, 1992c, 1992d; Paliwal et al., 1999). Some researchers (Neuman et al., 1989a, 1989b; Luo et al., 1999) have tried to use colour features for grain identification, but variability in the illumination of common light sources poses a practical problem in such cases. Majumdar et al. (1999) reported that a combination of morphological, textural and colour features resulted in the best classification accuracy.

The classification accuracy can be affected by the classification criteria selected. Classification criteria partition the data set, thereby placing the object into one of several categories. The classification criteria are commonly developed using statistical methods or neural network (NN) methods. The statistical methods are based on the Bayes minimum error rule (Duda & Hart, 1973).

Artificial neural networks are frequently used for pattern recognition. These networks, inspired by biological nervous systems, have proven to be robust in dealing with ambiguous data and with problems that require the interpolation of large amounts of data. Instead of sequentially performing a programme of instructions, NNs explore many hypotheses simultaneously using massive parallelism (Lippmann, 1987). This, however, is true only if a specific hardware implementation is used. Neural network methods involve developing a computing network of highly interconnected processing elements called ‘neurons’ or ‘nodes’. Neural networks have the potential for solving problems in which some inputs and corresponding output values are known, but the relationship between the inputs and outputs is not well understood or is difficult to translate into a mathematical function. Thus, these classifiers have great potential in tasks involving grading, sorting and identifying agricultural products.

Various NN architectures and techniques have been developed by researchers in the past decade for application in the agricultural industry (Jayas et al., 2000). The architecture developed for a problem, however, is very specific to that problem and may not be suitable for classification of other types of products. The published literature indicates that in most of the efforts to classify agricultural products using NN classifiers, multi-layer feed forward networks such as the back propagation network (BPN) were used (Liao et al., 1993; Romaniuk et al., 1993; Dowell, 1993; Sayeed et al., 1995; Ozer et al., 1995; Shibata et al., 1996; Ghazanfari et al., 1998).
These networks were made of a single neural network that used certain input features and returned one or several output values that were indicative of the class of the unknown object that had been presented to the neural network. With the growing popularity of NN techniques in image analysis, it is important to explore the applicability of various NN architectures for classification of agricultural products including cereal grains.

At present, the identification and grading of cereal grains are done manually in Canada. This task is subjective and time consuming. A machine vision system to replace this manual assessment of grain samples is highly desired by the grain industry. Despite the research work, several ‘bottlenecks’ have restricted the implementation of machine vision technology in the grain industry. A part of the problem is the slow classification process. Neural networks, which perform faster classification than most of their statistical counterparts, might provide a solution to the problem of slow classification in such systems.

Previous research (Paliwal et al., 2001) indicates that use of one neural network may not result in high classification accuracy if the objects to be classified have very subtle differences in morphological, colour and textural features. To overcome this problem, a combination of several neural networks can be used, each of which ‘specializes’ in identifying a particular type of object. In light of the above problems, the objectives of this study were:

(1) to develop four ‘single’ NN architectures for classification of Canada Western Red Spring (CWRS) wheat, Canada Western Amber Durum (CWAD) wheat, barley, oats and rye, using eight morphological and four colour features;
(2) to develop four ‘specialist’ NN architectures for classification of CWRS wheat, CWAD wheat, barley, oats and rye, using eight morphological and four colour features; and
(3) to compare the performance of the architectures in the classification process.

2. Materials and Methods

2.1. Image acquisition system

The hardware consisted of a 3-chip charge coupled device (CCD) colour camera (DXC-3000A, Sony, Japan) with a zoom lens of 10–120 mm focal length (VCL-1012BY), a camera control unit (CCU-M3, Sony, Japan), a personal computer (PC) (PIII 450 MHz), a colour frame grabbing board (Matrox Meteor-II multi-channel, Matrox Electronic Systems Ltd., Montreal, PQ), and a diffuse illumination chamber. The camera was mounted over the illumination chamber on a stand which provided easy vertical movement. The National Television System Committee (NTSC) composite colour signal from the camera was converted by the camera control unit at a speed of 30 frames s⁻¹ into three parallel analogue video signals, namely red (R), green (G), and blue (B), corresponding to the three NTSC colour primaries, and a synchronous signal. The frame grabber installed in the PC digitized the RGB analogue video signals from the camera control unit into three 8-bit 640 by 480 square pixel digital images and stored them in three on-board buffers. The spatial resolution of each pixel was 60 µm. The digital images were then sent to the colour monitor for on-line display and transferred to the networked hard disc for storage.


2.2. Software

The images were acquired using the Meteor Lite software library (Matrox Electronic Systems Ltd., Montreal, PQ) that was provided along with the frame grabber. The image acquisition programme enabled the saving of an image to a computer file in tagged image file (tif) format. The image analysis script was developed using macros of the program called Image Tools (developed at the University of Texas Health Science Center at San Antonio, TX and available from the Internet by anonymous FTP from ftp://maxrad6.uthscsa.edu) under the Windows environment. For developing the neural networks, a Windows-based software package, NeuroShell 2 (Ward Systems Group, Frederick, MD), was used.

2.3. Grain samples

The Industry Services Division of the Canadian Grain Commission, Winnipeg, Manitoba, provided the cereal grain samples used in this study. The uncleaned commercial samples of five grain types (barley, CWRS wheat, CWAD wheat, oats and rye) were collected from ten different growing regions across Western Canada. These five grain types were chosen because, in Canada, these grains are handled the most by railcars, grain elevators, and cargo ships. Therefore, the problem of cross-contamination within these grain types is most prevalent. The grain samples were manually presented to the camera in a non-touching fashion so that no kernels overlapped or touched. For each grain type, eight morphological features and four colour features were extracted from the colour images of 1500 kernels (150 each from ten growing regions).

2.4. Morphological features

The feature extraction algorithm extracted the following morphological features of individual kernels for each grain type.

Area: the area $A$ of the kernel is measured as the number of pixels in the polygon.
Perimeter: the perimeter $P$ is the mathematical sum of the Euclidean distances between all the successive pairs of pixels around the circumference of the kernel.
Major axis length: the length of the longest line $L$ that can be drawn through the object.
Minor axis length: the length of the longest line $l$ that can be drawn through the object perpendicular to the major axis.
Elongation: the ratio of the length of the minor axis to the length of the major axis, $l/L$.
Roundness: given by $(4\pi A)/P^2$.
Feret diameter: the diameter of a circle having the same area as the object, computed as $[(4A)/\pi]^{1/2}$.
Compactness: a measure of the object’s roundness. It is the ratio of the Feret diameter to the object’s length $L$: $[(4A)/\pi]^{1/2}/L$.
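The four derived shape descriptors listed above follow directly from the four measured quantities (area, perimeter, major and minor axis lengths). The following is a minimal sketch of those formulas only; the function name and the dictionary keys are illustrative assumptions and are not part of the Image Tools macros used in the study.

```python
import math


def shape_features(area, perimeter, major_axis, minor_axis):
    """Derived shape descriptors of Section 2.4 (names are illustrative)."""
    # Diameter of the circle that has the same area as the object
    feret_diameter = math.sqrt(4.0 * area / math.pi)
    return {
        "elongation": minor_axis / major_axis,
        "roundness": 4.0 * math.pi * area / perimeter ** 2,
        "feret_diameter": feret_diameter,
        "compactness": feret_diameter / major_axis,
    }


# Sanity check with an ideal circle of radius 10 pixels:
# elongation, roundness and compactness should all equal 1.
circle = shape_features(
    area=math.pi * 10 ** 2,
    perimeter=2 * math.pi * 10,
    major_axis=20.0,
    minor_axis=20.0,
)
```

For an elongated kernel all three ratios fall below 1, which is what makes them useful for separating, for example, oats from the wheat classes.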

2.5. Colour features

The colour features were extracted by computing the following four statistical parameters for the grey-level distribution within each object in the image: mean grey level, median grey level, mode grey level and standard deviation. Although these features can be determined using a grey scale image, colour images were acquired considering future work of incorporating colour features.

2.6. Neural network architectures

NeuroShell 2 provides the user with an option to design several different architectures. For this study, the different types of NNs that were designed and evaluated are BPN, Ward network, general regression neural network (GRNN) and probabilistic neural network (PNN). Each type of network architecture was evaluated for performance using a single network and a combination of specialist networks.

2.6.1. Single network

A single network was presented with 12 input features and produced five outputs, one for each grain type. The output class was determined by comparing the five outputs, which ranged from 0 to 1. The unknown kernel was assigned to the class having the highest output.

2.6.2. Specialist network

This network was a combination of five sub-networks (barley specialist network, CWAD wheat specialist network, CWRS wheat specialist network, oats specialist network and rye specialist network). Each specialist sub-network was similar to the single network except that it had 12 inputs and one output (except for PNN, which requires a minimum of two output categories). A sub-network specializing in a particular grain type, say barley, was trained using data which consisted of two classes of output patterns: barley and ‘the rest’. In essence, each of the specialist sub-networks was trained to categorize an unknown object belonging to its class by assigning it a value between 0 and 1. A value of 1 meant that the unknown object belonged to that class, here barley, and 0 meant that the object belonged to any of the rest of the classes, i.e. CWAD wheat, CWRS wheat, oats or rye. The final outcome was determined by comparing the outputs of all the specialist sub-networks, and the one having the highest value was declared the winner. The unknown grain type was assigned the class corresponding to that of the winner.

2.6.3. Back-propagation networks

Back-propagation networks are the most commonly used networks because of their ability to generalize. A BPN consists of an input layer, one or more hidden layers and an output layer. Initially, the number of nodes $n$ in the hidden layers was calculated using the formula

$$n = \frac{n_i + n_o}{2} + y^{0.5} \qquad (1)$$

where: $n_i$ is the number of input nodes, $n_o$ is the number of output nodes and $y$ is the number of input patterns in the training set (Ward Systems Group, 1998). As there were 12 input features, five output categories and 4000 input patterns, 72 nodes were obtained using the above formula. These nodes were equally divided between the two hidden layers. The number of nodes was varied to check for any significant improvement in performance. Work done by Paliwal et al. (2001) revealed that no significant improvement was observed by changing the number of nodes; therefore, the number of nodes calculated by the formula was used to train the network.

Training a BPN involves presenting a pattern to the network in a forward pass which generates the output; the weights are then changed during a backward pass, beginning with the nodes in the output layer (i.e. back-propagation of ‘error’). When variables are loaded into a neural network, they must be scaled from their numeric range into the numeric range that the neural network deals with efficiently. Neural networks commonly operate in two main numeric ranges depending upon the type of activation functions used: 0 to 1 (logistic and Gaussian) and −1 to 1 (tanh). A logistic scaling function was used for connections between the input and hidden layers, and a logistic activation function was used for all other network connections. The logistic scaling function scales data to (0, 1) according to the following formula:

$$f(x) = \frac{1}{1 + e^{-(x - \bar{x})/s}} \qquad (2)$$

where: $\bar{x}$ is the mean of all of the values of the variable $x$ in the pattern file and $s$ is the standard deviation of those values. The advantage of using a logistic scaling function is that data, no matter how large, are never clipped or scaled out of range.
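As a worked check of Eq. (1) and an illustration of the logistic scaling of Eq. (2), the two formulas can be sketched as below. This is a sketch under stated assumptions, not NeuroShell 2 code: the function names are hypothetical, the use of the population standard deviation is an assumption (the text does not say which estimator is used), and the kernel areas are made-up values.

```python
import math
from statistics import mean, pstdev


def hidden_nodes(n_inputs, n_outputs, n_patterns):
    """Eq. (1): half the input-plus-output node count plus the square root of the pattern count."""
    return (n_inputs + n_outputs) / 2 + math.sqrt(n_patterns)


def logistic_scale(values):
    """Eq. (2): map each value into (0, 1) around the mean, in units of standard deviation."""
    centre = mean(values)
    spread = pstdev(values)  # assumption: population standard deviation
    return [1.0 / (1.0 + math.exp(-(v - centre) / spread)) for v in values]


# 12 features, 5 grain types and 4000 training patterns, as stated in the text
n = hidden_nodes(12, 5, 4000)      # about 71.7, i.e. 72 after rounding
nodes_per_layer = round(n) // 2    # split equally between the two hidden layers

# Hypothetical kernel areas in pixels, including one extreme value
scaled = logistic_scale([5200, 5400, 5600, 5800, 9000])
```

Even the extreme area of 9000 pixels maps to a value strictly below 1, which is the "never clipped" property described above.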

Logistic activation functions (also called sigmoid, semi-linear or soft-limiting functions) were used in this study because they provide a balance between linear and non-linear (hard-limiting) activation functions and are considered to be the closest to biological neurons. Linear activation functions cannot suppress noise and have limited learning capabilities, whereas non-linear functions may introduce network instability and risk computational and analytical intractability (Mehrotra et al., 1996). The types of BPNs used were: single back propagation (BP) network, a four-layer network (two hidden layers) with each layer connected to the immediately previous layer; and specialist BP network, a combination of five BP specialist sub-networks.

2.6.4. Ward network

Hidden layers in a neural network are known as feature detectors. A Ward network is essentially a BPN with multiple hidden slabs, each having a different activation function. Different activation functions applied to hidden layer slabs detect different features in a pattern processed through a network. The Ward network used in this study had three slabs with different activation functions, namely, a tanh activation function, a Gaussian activation function and a Gaussian-complement activation function, thus offering three ways of viewing the data in the hidden layer. The tanh activation function scales to (−1, 1) by computing the hyperbolic tangent of the input value. This function tends to squeeze together data at the low and high ends of the original data range. It is thus helpful in reducing the effects of outliers. The Gaussian activation function is unique because, unlike the others, it is not an increasing function. It is the classic bell-shaped curve, which maps high values into low ones, and maps mid-range values into high ones. The Gaussian-complement activation function tends to bring out meaningful characteristics in the extremes of the data.
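The behaviour of the three hidden-slab activation functions can be illustrated with a short sketch. The paper does not give their functional forms; the forms below (exp(−x²) for the Gaussian and 1 − exp(−x²) for its complement) are assumptions chosen to match the qualitative description above, and the function names are hypothetical.

```python
import math


def tanh_activation(x):
    """Squashes the input into (-1, 1); large magnitudes are compressed."""
    return math.tanh(x)


def gaussian_activation(x):
    """Bell-shaped: mid-range inputs map high, extreme inputs map low (assumed form)."""
    return math.exp(-x * x)


def gaussian_complement_activation(x):
    """Complement of the bell curve: extreme inputs map high (assumed form)."""
    return 1.0 - math.exp(-x * x)


def ward_hidden_views(weighted_sum):
    """The three 'views' of one weighted input sum, one per hidden slab."""
    return (
        tanh_activation(weighted_sum),
        gaussian_activation(weighted_sum),
        gaussian_complement_activation(weighted_sum),
    )


for z in (-3.0, 0.0, 3.0):
    print(z, ward_hidden_views(z))
```

The printed values show why the slabs complement each other: at the extremes the Gaussian slab is nearly silent while its complement is nearly saturated, and only the tanh slab keeps the sign of the input.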
A linear scaling function was used at the input slab and a logistic activation function was used at the output slab. The types of Ward networks used were: single Ward network, a three-layer BPN with three slabs in the hidden layer having different activation functions; and specialist Ward network, a combination of five Ward specialist sub-networks.

2.6.5. General regression neural network

General regression neural networks (GRNN) are memory-based feed forward networks based on the estimation of probability density functions. There are no training parameters such as learning rate and momentum as there are in BPN, but there is a smoothing factor $d$ that is used when the network is applied to new data. The smoothing factor determines how tightly the network matches its predictions to the data in the training patterns (Ward Systems Group, 1998). The smoothing parameter defines the width of the Gaussian curve. As $d$ approaches zero, the classifier approximates a nearest-neighbour classifier and as $d$ increases, the classifier approximates a linear discriminator (Specht, 1991). Rather than categorizing data into different classes like PNN, GRNN are able to produce continuous valued outputs and fit multi-dimensional surfaces through data. A logistic scaling function was used for connections between the input and hidden layers. The types of GRNN used were: single GRNN, a three-layer network with one hidden neuron for each training pattern; and specialist GRNN, a combination of five GRNN specialist sub-networks.

2.6.6. Probabilistic neural network

On the basis of the training data, a PNN estimates the probability density function for each of the categories. Thus, in a way, it approximates a Bayesian classifier (Specht, 1990). Probabilistic neural networks are three-layer networks wherein the training patterns are presented to the input layer and the output layer has one neuron for each possible category. There must be as many neurons in the hidden layer as there are training patterns. These patterns are stored at the pattern (hidden) layer, one pattern per node. The network produces activations in the output layer corresponding to the probability density function estimate for that category. When an unknown pattern is presented to the network, the following computation is performed at each pattern node:

$$f(x) = e^{-(x-u)^{\mathrm{T}}(x-u)/(2d^{2})} \qquad (3)$$

where: $f(x)$ is the output of the hidden layer, $x$ is an input pattern, the superscript T denotes transpose, $u$ is the stored pattern and $d$ is the smoothing factor. The highest output represents the most probable category. As in a GRNN, the smoothing factor $d$ is a parameter that the algorithm uses to generate the value of the output. A little experimentation is needed to select an appropriate $d$ and obtain the best results. Different values of $d$ can be fed to the network once it has been trained. A logistic scaling function was used for connections between the input and hidden layers. The types of PNN used were: single PNN, a three-layer network with one hidden neuron for each training pattern; and specialist PNN, a combination of five PNN specialist sub-networks.

2.7. Network training

Training of the networks was started with all of the 12 features as inputs. The network was trained on 800 kernels and then cross-validated on 200 kernels, for each grain type. The procedure was repeated five times for cross-validation, thereby ensuring that all the kernels were, at some point, subjected to training and testing of the neural network. After the network was trained, it was evaluated by applying it to a production set consisting of 500 kernels of each grain type. The production set consisted of data that were never presented to the network during training or testing. The weights and thresholds for each neuron were adjusted to minimize the mean square error (MSE) between the predicted and observed outputs. For all the connections (except in the Ward network, which uses different activation functions for different connections), logistic activation functions were used. The number of hidden nodes was varied until the best results were obtained. Training was stopped after 500 epochs. An epoch is defined as the time during which a network is trained by presenting each pattern in the training set exactly once. In all the architectures, 500 epochs were more than enough for the network to train, as the coefficient of multiple determination $R^2$ became constant well before 500 epochs. To avoid over-training, the trained network was saved every time it reached a new minimum average error for the test set. The time taken by each network to train was also determined to compare the computational speeds of the various networks.

The overall performance of a network was judged on the basis of classification accuracy, network complexity and the time required to train and apply the network. Apart from these criteria, other theoretical factors taken into account were whether the network is good at learning local or global features, whether it can interpolate unknown patterns, and its ability to generalize.
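The pattern-node computation of Eq. (3) and the winner-take-all rule of Section 2.6.2 can be put together in a short sketch. It is an illustration under stated assumptions rather than the NeuroShell 2 implementation: averaging the pattern activations per class, normalizing the two class scores of a sub-network into a value between 0 and 1, the function names, and the two-feature toy patterns are all hypothetical choices made here for demonstration.

```python
import math


def pattern_activation(x, u, d):
    """Eq. (3): Gaussian kernel between input pattern x and stored pattern u."""
    squared_distance = sum((xi - ui) ** 2 for xi, ui in zip(x, u))
    return math.exp(-squared_distance / (2.0 * d * d))


def class_scores(x, patterns, d):
    """Average pattern-layer activation per class label (assumed summation rule)."""
    totals, counts = {}, {}
    for u, label in patterns:
        totals[label] = totals.get(label, 0.0) + pattern_activation(x, u, d)
        counts[label] = counts.get(label, 0) + 1
    return {label: totals[label] / counts[label] for label in totals}


def specialist_output(x, patterns, d, grain):
    """Two-class sub-network, `grain` versus 'the rest', as a value in [0, 1]."""
    relabelled = [(u, grain if label == grain else "rest") for u, label in patterns]
    scores = class_scores(x, relabelled, d)
    total = sum(scores.values())
    return scores[grain] / total if total > 0.0 else 0.0


def classify(x, patterns, d, grains):
    """Winner-take-all over the specialist sub-networks."""
    return max(grains, key=lambda grain: specialist_output(x, patterns, d, grain))


# Hypothetical stored patterns: two scaled features per kernel
GRAINS = ["barley", "oats", "rye"]
TOY_PATTERNS = [
    ((0.10, 0.10), "barley"), ((0.20, 0.10), "barley"),
    ((0.90, 0.90), "oats"), ((0.80, 0.90), "oats"),
    ((0.10, 0.90), "rye"), ((0.20, 0.80), "rye"),
]

predicted = classify((0.15, 0.12), TOY_PATTERNS, d=0.1, grains=GRAINS)
```

Because every stored pattern is compared with the unknown input, the cost of one call grows with the size of the training set, which is the reason memory-based networks of this kind are slower at recall than a trained BPN.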

3. Results and discussion

3.1. Back propagation networks

3.1.1. Single network

The network was trained with 12 neurons in the input layer (corresponding to 12 input parameters) and five neurons in the output layer (corresponding to five output categories). Thirty-six neurons in each of the hidden layers gave the best results. The network took on average 39 min and 20 s of the computer’s central processing unit (CPU) time to train for 500 epochs on a Pentium III, 450 MHz personal computer running the Windows NT 4.0 operating system. The average recall time, i.e. the time taken by a trained network to process an unknown input, was 0.0034 s per kernel. The classification accuracies of the network are shown in Table 1.

3.1.2. Specialist network

With 12 input features and one output feature, the network performed best with 35 neurons in each of the two hidden layers. Each sub-network took 36 min and 17 s of CPU time to train for 500 epochs. The average recall time of each trained specialist sub-network was 0.0018 s per kernel. As the specialist network consisted of five sub-networks, the total recall time was 0.009 s. Results of a t-test indicated that the classification accuracy for all the grain types except oats remained unchanged.

Table 1
Classification results obtained using a back propagation network

Grain type    Single network                          Specialist network
              Mean accuracy, %   Standard deviation   Mean accuracy, %   Standard deviation
Barley        94.22              0.85                 93.88*             0.23
CWRS wheat    97.66              0.56                 97.02*             0.29
CWAD wheat    85.10              1.75                 85.10*             0.59
Oats          93.62              0.70                 92.76              0.62
Rye           87.06              1.66                 85.74*             1.12

Note: CWRS, Canada Western Red Spring; CWAD, Canada Western Amber Durum.
* Mean is statistically similar to the mean of single network (probability P > 0.05).

3.2. Ward networks

3.2.1. Single network

The network architecture consisted of 12 neurons in the input slab, 24 neurons in each of the three hidden slabs, and five neurons in the output slab. The network took 33 min and 3 s of CPU time to train for 500 epochs. The recall time of the network was 0.0034 s per kernel. The classification results for the Ward network are shown in Table 2. The performance of this network was better than that of both the single and specialist BPN.

3.2.2. Specialist network

The specialist Ward network consisted of 12 input neurons, 23 neurons in each of the three hidden slabs, and one output neuron. The average training time for each sub-network was 28 min and 35 s of CPU time. Each sub-network took 0.0018 s per kernel for recall. The average classification accuracies of the specialist network were statistically the same as those of the single Ward network (Table 2).

Table 2
Classification results obtained using a Ward network

Grain type    Single network                          Specialist network
              Mean accuracy, %   Standard deviation   Mean accuracy, %   Standard deviation
Barley        96.40              0.39                 96.36*             0.30
CWRS wheat    97.46              0.33                 97.40*             0.29
CWAD wheat    87.38              1.33                 87.70*             0.99
Oats          93.70              0.63                 94.16*             0.57
Rye           86.48              0.47                 87.26*             1.12

Note: CWRS, Canada Western Red Spring; CWAD, Canada Western Amber Durum.
* Mean is statistically similar to the mean of single network (probability P > 0.05).


Table 3
Classification results obtained using general regression neural network

Grain type    Single network                          Specialist network
              Mean accuracy, %   Standard deviation   Mean accuracy, %   Standard deviation
Barley        98.08              0.41                 98.70              0.73
CWRS wheat    99.02              0.34                 99.26              0.20
CWAD wheat    95.62              0.57                 96.66              0.43
Oats          97.98              0.25                 98.62*             0.51
Rye           95.62              0.89                 97.18              0.34

Note: CWRS, Canada Western Red Spring; CWAD, Canada Western Amber Durum.
* Mean is statistically similar to the mean of single network (probability P > 0.05).

3.3. General regression neural network

3.3.1. Single network

The network consisted of 12 neurons in the input layer, 4000 neurons in the hidden layer, and five neurons in the output layer. The network’s classification results are shown in Table 3. The network took 3 min and 15 s of CPU time for training and determining the default smoothing factor of 0.081. Recall time of the network was 0.0116 s per kernel. The value of $d$ was then varied to determine its effect on the network performance. The network performance improved with reduction in $d$. Beyond a certain point, however, the number of patterns that could not be classified in any of the output categories also increased as $d$ was decreased. The best results were obtained with $d$ of 0.045. The classification accuracies were better than those of the BPN and Ward networks for the default smoothing factor and improved further when the smoothing factor was adjusted.

3.3.2. Specialist network

The network consisted of 12 neurons in the input layer, 4000 neurons in the hidden layer, and one neuron in the output layer. Each sub-network took an average of 2 min and 57 s of CPU time for training and determining the default smoothing factor. Recall time of each sub-network per kernel was 0.0088 s. Except for oats, where the classification accuracy remained unchanged, the specialist GRNN outperformed the single GRNN (Table 3).

3.4. Probabilistic neural network

The network generated two types of output values for both single and specialist networks: a raw output value between 0 and 1 and a probability output value between 0 and 1. The raw output is the actual value computed by the network, whereas the probability output is the probability of an unknown object belonging to a particular output class. The classification accuracies, however, remained unchanged for the single network when using raw output or probability output.

3.4.1. Single network

Using 12 input features, the results were first obtained by taking the default value of $d$ generated by NeuroShell. The network took less than 20 s of CPU time to train because training involved merely copying of input patterns to hidden nodes. The network, however, took 2 min and 7 s of CPU time to determine the default smoothing factor for the network. The recall time for the single PNN was 0.0084 s per kernel. The classification accuracies for the single and specialist PNN are shown in Table 4.

Table 4
Classification results obtained using probabilistic neural network

Grain type    Single network                          Specialist network (raw output)         Specialist network (probability output)
              Mean accuracy, %   Standard deviation   Mean accuracy, %   Standard deviation   Mean accuracy, %   Standard deviation
Barley        98.12              0.42                 98.50              0.44                 98.76              0.70
CWRS wheat    99.04              0.15                 99.30              0.08                 99.34              0.16
CWAD wheat    95.94              0.56                 96.24*             1.15                 96.74              0.55
Oats          97.98              0.31                 98.48*             0.31                 98.46*             0.47
Rye           96.26              0.45                 93.64              1.21                 96.90              0.25

Note: CWRS, Canada Western Red Spring; CWAD, Canada Western Amber Durum.
* Mean is statistically similar to the mean of single network (probability P > 0.05).

3.4.2. Specialist network

Unlike other specialist networks, this network had two neurons in the output slab. The input slab consisted of 12 neurons and the hidden layer consisted of 4000 neurons. Each sub-network took less than 20 s to train and a total time of 2 min 21 s (including training time) to determine the default smoothing factor. The recall time for each specialist sub-network was 0.0058 s per kernel. In comparison to the single PNN, the classification accuracy for raw output improved for barley and CWRS wheat, whereas it remained unchanged for CWAD wheat and oats. In the case of rye, the classification accuracy decreased for the raw output. For probability output, classification accuracy increased for all grain types except oats, for which it remained unchanged in comparison to the single PNN (Table 4).

4. Conclusions

Network performances were evaluated on the basis of classification accuracies, training time and recall time. A thorough analysis of results indicates that specialist networks outperform single networks in ten instances out of 25 comparisons. The performances remained unchanged in 13 instances and deteriorated in two instances. The network architectures in descending order of classification accuracies are probabilistic neural network (PNN), general regression neural network (GRNN), Ward network and back propagation network (BPN). The training time was least for PNN, followed by GRNN, Ward network and BPN. The recall time for PNN and GRNN was higher than for BPN and Ward networks. This is because, in the case of the former, all the training patterns are stored in the network and compared to the unknown input pattern at the time of recall. Recall was slower for specialist networks than for their single counterparts. The specialist networks, however, can be made faster by implementing them in parallel. This is supported by the fact that the recall time of a specialist sub-network is less than that of its corresponding single network. An overall evaluation of network performance, based on classification accuracy, training time and recall time, showed that the specialist PNN with modified smoothing factor and probability output is best suited for cereal grain classification. The BPN was least suited for such a classification process. The only disadvantage of using specialist networks is that five different networks need to be designed and trained separately.

Acknowledgements

We thank the Natural Sciences and Engineering Research Council of Canada and the University of Manitoba Graduate Fellowship Committee for partial funding of this study.
