Computers & Operations Research 39 (2012) 93–104
A heuristic approach for allocation of data to RFID tags: A data allocation knapsack problem (DAKP)

Lauren Davis a,*, Funda Samanlioglu b, Xiaochun Jiang a, Daniel Mota a, Paul Stanfield a

a Department of Industrial and Systems Engineering, North Carolina A&T State University, Greensboro, NC 27411, USA
b Department of Industrial Engineering, Kadir Has University, Cibali, Istanbul 34083, Turkey
Available online 26 February 2011

Abstract
Durable products and their components are increasingly being equipped with one of several forms of automatic identification technology such as radio frequency identification (RFID). This technology enables data collection, storage, and transmission of product information throughout its life cycle. Ideally, all available relevant information could be stored on RFID tags, with new information being added to the tags as it becomes available. However, because of the finite memory capacity of RFID tags along with the magnitude of potential life cycle data, users need to be more selective in data allocation. In this research, the data allocation problem is modeled as a variant of the nonlinear knapsack problem. The objective is to determine which items to place on the tag such that the value of the "unexplained" data left off the tag is minimized. A binary encoded genetic algorithm is proposed, and an extensive computational study is performed to illustrate the effectiveness of this approach. Additionally, we discuss some properties of the optimal solution which can be effective in solving more difficult problem instances.

© 2011 Elsevier Ltd. All rights reserved.
Keywords: RFID tags; Data allocation; Knapsack problem
1. Introduction

Radio frequency identification (RFID) is a form of auto-identification technology (AIT) that uses radio waves to automatically transmit data and identify objects. RFID systems typically consist of two components: (1) RFID tags that are placed on items and (2) RFID tag readers. RFID tags have a small integrated circuit that enables data storage and an antenna that allows communication and data transmission to the reader. RFID is considered a significant improvement over conventional barcodes in terms of data storage capacity, the ability to be read under extreme circumstances, and the ability to read multiple, distant, and even out-of-sight objects [1,2].

Besides their conventional usage in supply chain applications (e.g., tracking product movement), RFID tags are used in a variety of innovative applications. Animal-tracking tags are injected beneath the skin of animals. Other tags are used in smart tires to store information on the operating history of the tires. Some tags carry sensors that collect environmental data such as temperature, pressure, or humidity, an arrangement often applied in food distribution. Tags are also used for security purposes in credit cards, access cards, and passports, and for tracking purposes in airline baggage handling and transit carrier
labeling, and for anti-counterfeiting of money. Innovative applications in healthcare include patient tracking with RFID bracelets, tracking blood products from donor to patient, prevention of medical errors by ensuring the correct dosage is delivered to the correct patient, and fighting drug counterfeiting [3–5]. Many other healthcare applications are being proposed, such as the use of smart health cards to store patient benefits information, drug allergies, and other clinical information that may be of use to healthcare providers [6]. These examples illustrate emerging AIT uses (such as RFID) and capabilities such as read/write dynamic data storage, sensor integration, data processing capacity, and self-powering using energy harvesting. For a more comprehensive discussion of RFID technology, the reader is referred to the work of Roberts [7].

AIT has historically been used for asset tracking of goods, often consumables, through the supply chain. Increasingly, such technology is being applied to durable goods by organizations such as the Department of Defense, both for products and their component parts, enabling improvements in life cycle management and expanded opportunities for economic value generation. Processes used to enhance durable goods life cycle management, from production through numerous cycles of operation and field/depot maintenance, have the potential to generate a great deal of operational data. It is interesting to note the similarities between durable good and human life cycle management. Both systems are faced with determining which important information to
collect and retain for improved operation and health maintenance. In addition, the amount of available information can exceed the available capacity. These systems rely on having the right information at the right time to ensure items are quickly returned to a functional state. Given the improvement and rapid expansion of RFID and other AIT devices, potential process improvements are achievable by storing some of this information in a form that permits improved information visibility and process freedoms [1]. In the context of durable goods asset tracking, data stored on the device might be used for improvements in product or part distribution, operation, maintenance, redesign, and disposal.

The rapid growth in RFID technology, coupled with rising data acquisition, usage, and storage needs, illustrates the challenge faced by future adopters of this technology in determining 'the right information' to store on AIT devices. As in human life cycle management, it is attractive to store information with a high frequency and value of use. However, limitations in tag storage capacity restrict the decision maker to the most significant and valuable information. Often, stored data are translated into operationally relevant parameters. For example, one might estimate the time required to repair a part or perform a medical procedure based on prior usage and maintenance history. Therefore, it is not attractive to store information that could be ascertained from other information previously stored. Practically, this means that data correlation, along with value and space requirements, must be considered in data allocation. As a result, users need to be selective and find an intelligent way to allocate data elements on RFID tags, since the total memory requirements for all data elements may exceed the tag capacities [1,8].

Limited research has been done in the area of allocating data elements on RFID tags. Osman et al. [9] develop mathematical models to optimize the placement of RFID tags for durable end products. Their model considers the selection of the type of tag, the tag configuration, and the allocation of data elements on the tags for an end product and its components. The RFID data allocation model presented in Osman et al.'s research is a bi-objective model that minimizes the total cost of the tag system and maximizes the total value of data stored on parts. The total value of the data is calculated based on the usage frequency of the data item and a fixed value. An illustrative example for a gas turbine engine and its components is presented and solved using GAMS (General Algebraic Modeling System).

In this paper, the problem of allocating data to RFID tags with finite memory capacities is modeled as a version of the nonlinear knapsack problem, hereafter referred to as the data allocation knapsack problem (DAKP). In the DAKP, the objective is to minimize the total value of "unexplained" data that is off the tag. This "unexplained" value measures the relationship between the objects on the tag and the objects off the tag and is quantified numerically as a function of the coefficient of determination between data pairs. A binary encoded genetic algorithm (GA) is developed in order to find (near) optimal solutions to the DAKP. Due to the sensitive nature of defense applications, we have chosen to illustrate the effectiveness of the GA using declassified clinical data obtained from a health care provider.
Our results show that the GA finds an optimal solution in 85% of all cases generated. Under certain GA parameter settings (low levels of the elite fraction and crossover fraction), the optimal solution is found 100% of the time. Note that these results are applicable to any form of read/write AIT.

The remainder of this paper is organized as follows. The mathematical model of the DAKP and its differences from and similarities to other knapsack problems are given in Section 2. Section 3 summarizes the steps of the GA. In Sections 4 and 5, details of the experimental design and the results of the computational experiments are presented. Concluding remarks appear in Section 6.
2. Problem formulation

2.1. Overview of knapsack problem

The objective of the classical knapsack problem is to determine a subset of N items, which should be placed in a knapsack of size C, such that the overall profit (measured as the sum of the item profits p_i) is maximized. The multidimensional knapsack problem (MDKP), also known as the m-dimensional knapsack problem, multi-constraint knapsack problem, and multi-knapsack problem, is a knapsack problem with multiple resource (knapsack) constraints. The binary version of the MDKP (0/1 MDKP), also known as the multi-constraint 0-1 knapsack problem, is widely studied in the literature [10–14]. The 0/1 MDKP is formulated as

\max z = \sum_{i=1}^{N} p_i x_i    (1)

s.t. \sum_{i=1}^{N} a_{ij} x_i \le b_j, \quad j = 1, \ldots, m    (2)

x_i \in \{0, 1\}, \quad i = 1, \ldots, N    (3)
where N is the number of items and m is the number of knapsacks (knapsack constraints) with capacities b_j (j = 1, ..., m). Each item i requires a_{ij} units of resource consumption in the jth knapsack and yields a profit p_i (i = 1, ..., N). The goal is to determine which items to include in the knapsacks, given the knapsack capacity limits, such that the total profit is maximized. The binary decision variable x_i is 1 if item i is included and 0 otherwise. The standard 0/1 knapsack problem (0/1 KP) is a special case of the 0/1 MDKP where m = 1 [15].

There are several well-known non-standard knapsack problems in the literature [16], such as the multi-dimensional multiple choice knapsack problem, the unbounded knapsack problem, the nonconvex piecewise linear knapsack problem, the collapsing knapsack problem, the 0/1 dynamic knapsack problem, the fuzzy knapsack problem, the stochastic knapsack problem, and the multi-objective multi-dimensional 0/1 knapsack problem. The multi-dimensional multiple choice knapsack problem [17] is a variant of the classical 0/1 KP in which there are several mutually exclusive groups of items; the objective is to pick exactly one item from each group so that the total value of the items collected is maximized subject to m knapsack constraints. When m = 1, the problem reduces to the multiple choice knapsack problem. The unbounded knapsack problem considers different object types to include in the knapsack, with an unbounded number of copies of each object type [18]. In the nonconvex piecewise linear knapsack problem, total cost is minimized while the items to include in the knapsack have a divisible quantity and a cost function that is a nonconvex piecewise linear function of the quantity [19]. The collapsing knapsack problem, mentioned in Pferschy et al. [20], considers the case where the knapsack capacity is not fixed but depends on the items already in the knapsack. In the 0/1 dynamic knapsack problem [21,22], the knapsack capacity can change over time between different values. The fuzzy knapsack problem [23] includes imprecise weight coefficients (fuzzy linguistics or decimal truncation) for the items to include in the knapsack, since it is assumed that the decision maker has only vague knowledge about these items. The stochastic knapsack problem [24] assumes objects arrive to and depart from the knapsack at random times based on a known distribution; arriving objects are accepted or rejected as a function of the current state while maximizing average revenue. The multi-objective multi-dimensional 0/1 knapsack problem [25–27] is a 0/1 MDKP with several objective
functions. Surveys of different versions of knapsack problems and solution methods can be found in [15,23,28].

The versions of the knapsack problem closest to the DAKP are the nonlinear knapsack problem and the quadratic knapsack problem. The nonlinear knapsack problem is a nonlinear minimization problem with one constraint, bounds on variables, and in some cases generalized upper bounds [28–30]. It has different versions, such as convex, separable, continuous; convex, separable, integer; nonconvex, separable; and nonconvex, nonseparable, based on the structures of the objective function and constraint. The quadratic knapsack problem [31], on the other hand, is a special case of the quadratic multiple knapsack problem [32] in which there is only one knapsack. The quadratic knapsack problem is an extended version of the 0/1 knapsack problem where value computations are based on quadratic terms. Here, in addition to the value of each object alone (v_i), a non-negative value related to having each pair of objects i and j together in the knapsack (v_{ij}) is taken into consideration. The formulation of the quadratic knapsack problem, as presented in Julstrom [31], is

\max V = \sum_{i=1}^{N} v_i x_i + \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} v_{ij} x_i x_j    (4)

s.t. \sum_{i=1}^{N} w_i x_i \le C    (5)
where C is the knapsack capacity, w_i is a positive weight associated with object i, and x_1, x_2, ..., x_N are binary variables that indicate the selection (x_i = 1) or exclusion (x_i = 0) of each object. In the following section, we formally define the mathematical model of the DAKP and discuss its similarities to and differences from the quadratic and nonlinear knapsack problems.

2.2. Data allocation knapsack problem (DAKP)

The objective of the DAKP is to determine a subset of N items which should be placed in a knapsack (RFID tag) with finite capacity C. The optimal allocation of data to the tag should minimize the amount of unexplained information (quantified as a function of the data that are not on the tag) while taking into consideration the importance of each data item, as specified by the decision maker. In the DAKP, the individual items are related in such a way that the subset of items included on the tag can explain the data elements off the tag. This relationship is structurally quantified in a correlation matrix (CV). We formally define the DAKP as a version of the nonlinear 0-1 KP as follows:

\min \sum_{i=1}^{N} v_i u_i    (6)

s.t. \sum_{i=1}^{N} w_i x_i \le C    (7)

u_i = \prod_{j=1}^{N} (1 - cv_{ij} x_j), \quad i = 1, \ldots, N    (8)

u_i \ge 0, \quad i = 1, \ldots, N    (9)

x_i \in \{0, 1\}, where x_i = 1 if data item i is included (on the tag) and x_i = 0 if it is excluded (off the tag).
Here, the total value of unexplained data that is off the tag (6) is minimized, subject to the tag capacity constraint (7). Each item has a non-negative weight w_i representing the amount of storage
space required. In (6), the values v_i are specified by the decision maker and represent the importance given to each data item i. The correlation values (cv_{ij}) between each data pair (i, j), specifically for data that are on the tag and off the tag, represent the ability of the data elements on the tag to explain the data elements that are off the tag. Thus, the (1 − cv_{ij} x_j) values, called "uncorrelation" values in this research, represent the inability of the data elements that are on the tag to explain the data that are off the tag. The u_i values are calculated based on the "uncorrelation" values between data elements that are on the tag and off the tag. Each u_i value can be regarded as a measure of the proportion of the variance of data item i (off the tag) that is unexplained by the data that are on the tag. Naturally, data that are on the tag have u_i = 0. It is important to note that this product structure is an estimate of this uncorrelation measure. In practice, life cycle data would come from a variety of sources, precluding more sophisticated measures of interrelationship. Additionally, a correlation relationship might be estimated where a lack of historical data or the data type prevents computation.

A 5-data example demonstrating how the u_i values are calculated is presented below. Assume that Table 1 shows the correlation values matrix (CV) between each pair of data items. If the data items on the tag are 1, 2, and 3, and the data items off the tag are 4 and 5, the corresponding u_i values based on (8) are as follows:

u_1 = (1 − (1 × 1))(1 − (0.02432 × 1))(1 − (0.01075 × 1))(1 − (0.012505 × 0))(1 − (0.017732 × 0)) = 0
u_2 = (1 − (0.02432 × 1))(1 − (1 × 1))(1 − (0.076046 × 1))(1 − (0.01091 × 0))(1 − (0.012074 × 0)) = 0
u_3 = (1 − (0.01075 × 1))(1 − (0.076046 × 1))(1 − (1 × 1))(1 − (0.017544 × 0))(1 − (0.033684 × 0)) = 0
u_4 = (1 − (0.012505 × 1))(1 − (0.01091 × 1))(1 − (0.017544 × 1))(1 − (1 × 0))(1 − (0.053333 × 0)) = 0.959585
u_5 = (1 − (0.017732 × 1))(1 − (0.012074 × 1))(1 − (0.033684 × 1))(1 − (0.053333 × 0))(1 − (1 × 0)) = 0.937721

Therefore, approximately 96% of the information associated with data item 4 and 94% of the information associated with data item 5 are not explained.

The DAKP can be regarded as a version of the 0/1 knapsack problem where, instead of maximizing the total value (or profit) associated with the items that are included, the total value of excluded and unexplained data is minimized. Unlike in a 0/1 knapsack problem, the u_i values make the objective function (6) nonlinear. Therefore, the DAKP can be regarded as a nonlinear knapsack problem in which, based on correlation values, the total value of unexplained data that is off the tag is minimized. The DAKP differs from a quadratic knapsack problem: in the quadratic knapsack problem, the objective function (4) is quadratic as a result of value computations between each pair of data items (the v_{ij} values), whereas in the DAKP the objective function (6) is nonlinear as a result of the u_i values of data that are off the tag, which are calculated by multiplying the "uncorrelation" values of the data that are on the tag. Therefore, instead of a one-to-one (pairwise) value computation, the DAKP considers the interaction between one data item off the tag and many data items on the tag.

Table 1
Correlation values matrix (CV) for a 5-data example.

cv_ij   1         2         3         4         5
1       1         0.02432   0.01075   0.012505  0.017732
2       0.02432   1         0.076046  0.01091   0.012074
3       0.01075   0.076046  1         0.017544  0.033684
4       0.012505  0.01091   0.017544  1         0.053333
5       0.017732  0.012074  0.033684  0.053333  1
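To make the computation in (8) concrete, the example above can be reproduced with the following minimal sketch (the paper's own implementation is in Matlab; Python/NumPy and the function name `unexplained` are used here purely for illustration):

```python
import numpy as np

# Correlation values matrix (CV) from Table 1.
CV = np.array([
    [1.0,      0.02432,  0.01075,  0.012505, 0.017732],
    [0.02432,  1.0,      0.076046, 0.01091,  0.012074],
    [0.01075,  0.076046, 1.0,      0.017544, 0.033684],
    [0.012505, 0.01091,  0.017544, 1.0,      0.053333],
    [0.017732, 0.012074, 0.033684, 0.053333, 1.0],
])

def unexplained(CV, x):
    """u_i = prod_j (1 - cv_ij * x_j), per Eq. (8)."""
    x = np.asarray(x, dtype=float)
    # Broadcasting: element [i, j] of (CV * x) is cv_ij * x_j;
    # the product over axis 1 multiplies across all j for each row i.
    return np.prod(1.0 - CV * x, axis=1)

x = [1, 1, 1, 0, 0]        # data items 1-3 on the tag, items 4 and 5 off
print(unexplained(CV, x))  # [0. 0. 0. 0.9596 0.9377] (rounded)
```

Because cv_ii = 1, any item on the tag contributes a factor (1 − 1) = 0 to its own product, which is why u_i = 0 for on-tag items.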
The knapsack problem and DAKP are NP-hard, so the computing time may grow exponentially with problem size in the worst case. Genetic algorithms and multiple objective evolutionary algorithms have been widely used for different versions of knapsack problems in order to find (near) optimal solutions. Some examples include the 0/1 knapsack problem [33–35], dynamic knapsack problem [21,22], fuzzy knapsack problem [23], multi-objective multidimensional 0/1 knapsack problem [25,26], quadratic knapsack problem [31], quadratic multiple knapsack problem [32], and nonlinear knapsack problem [29]. In this paper, a binary encoded GA is developed in order to find (near) optimal solutions to DAKP. Details about the GA are given in Section 3.
3. Genetic algorithm

Genetic algorithms were introduced by Holland [36] as a methodology for finding approximate solutions to complex problems by mimicking the concepts of biological evolution. The main idea behind a GA is to start with randomly generated solutions and evolve to better solutions through generations by implementing the "survival of the fittest" strategy. In GAs, solutions are represented as "chromosomes," and future generations are generated by "reproduction." The process of creating generations terminates when a termination condition is reached, at which point the best chromosome is presented as the solution [37].

An important design issue of a GA is the chromosomal representation of a solution. Chromosomes are strings of numbers that represent the solution of a problem or can be decoded to represent the solution. Sometimes these numbers are 0s and 1s (binary encoding), but other possibilities exist, such as non-negative integers. In the presented GA, binary encoding is used, where 1 represents data that is on the RFID tag and 0 represents data that is off the RFID tag. For example, in a 5-data instance with data items 1, 2, and 3 on the tag and 4 and 5 off the tag, the encoding of the related chromosome is 1-1-1-0-0 [37,38]. The GA is implemented in Matlab using the GA toolbox. Since the GA is an iterative procedure, a termination condition is used; in particular, the GA terminates after a specified number of generations (X) is reached. The basic steps of the GA are shown below for a specific instance of the DAKP with N data items, correlation matrix (CV), decision maker's values for the N data items (v_i), amounts of tag memory space required to store the data items (w_i), and total tag memory capacity (C):

1. Input the population size (P), termination condition (X), and GA parameters (e%, c%, p%).
2. Randomly generate the initial population of P chromosomes, each consisting of N genes (x_i) that correspond to the N possible data items that can be allocated to an RFID tag. Evaluate the fitness of each member of the current population.
3. Perform an elitist reproduction strategy whereby the best e% of chromosomes are copied to the next generation. At this point, the number of chromosomes in the population of the next generation is e% × P.
4. Perform crossover with tournament selection on c% of the rest of the population, so (100% − e%) × c% × P of the next generation comes from this genetic operator. The tournament size is taken as p%, so each time p% × P chromosomes are selected at random from the population and the best individual is selected as a parent for crossover. In crossover with tournament selection, two parents are selected, and vector entries (genes) from each parent are selected to form a new chromosome (offspring) of the correct length. The particular type of crossover operator selected for this research is discussed in the next section.
5. The rest of the population ((100% − e%) × (100% − c%) × P of the population) comes from mutation. Mutation is used to maintain diversity within the population. Here, a random number is generated for each gene, and the gene is mutated if the number generated is less than the mutation rate. For the binary encoding, 1s are replaced by 0s and vice versa. The next generation is now complete with P chromosomes, since e% × P + (100% − e%) × c% × P + (100% − e%) × (100% − c%) × P = P.
6. Evaluate the fitness of each new member of the new population.
7. If the termination condition (X) is satisfied, go to step 8; otherwise, go to step 3.
8. Present the best solution and terminate the algorithm.

In the developed GA, the evaluation function integrates the objective function (6) with the tag memory capacity constraint (7). There are several ways of handling constraints in GAs [39,40]. We utilize penalty functions to incorporate the effect of the tag memory capacity constraint. The idea of this method is to transform a constrained optimization problem into an unconstrained one by adding (or subtracting) a constraint violation penalty value to (from) the objective function. The evaluation function utilized in the presented GA is therefore the sum of the original objective function (6) and the penalty function (pen). This is represented as
\min \sum_{i=1}^{N} v_i u_i + \mathrm{pen}(x)    (10)

\mathrm{pen}(x) = \rho \left[ \sum_{i=1}^{N} x_i w_i - C \right]^{+}
where ρ is a positive penalty factor and [a]^+ denotes max(0, a). Therefore, if the tag memory capacity is not exceeded in a data allocation (pen(x) = 0), the evaluation function (10) reduces to the original objective function (6). The population is sorted in ascending order of the evaluation function (10), since overall this is a minimization problem.
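A minimal sketch of this penalized evaluation function follows. The study's implementation uses the Matlab GA toolbox; Python is used here only for illustration, and the function and argument names are our own:

```python
import numpy as np

def evaluate(x, v, w, CV, C, rho):
    """Evaluation function (10): objective (6) plus the capacity penalty pen(x).

    x: binary allocation vector; v: item values; w: item weights (storage);
    CV: correlation values matrix; C: tag capacity; rho: penalty factor.
    """
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    w = np.asarray(w, dtype=float)
    u = np.prod(1.0 - CV * x, axis=1)       # u_i from Eq. (8)
    pen = rho * max(0.0, float(w @ x) - C)  # pen(x) = rho * [sum_i x_i w_i - C]^+
    return float(v @ u) + pen
```

A feasible allocation scores exactly its objective value (6), so sorting a population by this quantity in ascending order, as described above, ranks feasible, well-explaining allocations first.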
4. Experimental design

4.1. Objective

The objective of the experimental design is to (1) evaluate the effectiveness of the GA and the impact of the parameter settings on the quality of the solution obtained, and (2) understand the relationship between the problem parameters (value, weight, and CV matrix) and the selection of the items that are placed on the tag. While parameter tuning is important in research involving evolutionary operators, the intent of this research is not to conduct an exhaustive study. The choice of parameters to vary, and the levels by which they vary, are based on values that have been studied in the literature for the 0/1 knapsack problem and its variants.

4.2. Selection of GA parameters

While the Matlab GA toolbox allows several parameters to be varied, we limit the scope of our experimentation to the elite percentage, crossover fraction, and population size. Table 2 presents a sample of the genetic algorithm operators used in solving different variations of the knapsack problem. It should be noted that the study of Hoff et al. [41] was an experimental study with the sole objective of determining which parameters worked well; the results displayed in Table 2 therefore represent the final recommended parameters based on their work.
Table 2
GA parameters used in knapsack problems.

Reference                 Problem type  Selection mechanism   Crossover mechanism  Crossover probability  Mutation rate
Bhatia and Basu [43]      0/1 KP        Rank                  Double point         0.6                    1/N
Michalewicz et al. [50]   0/1 KP        Rank                  Single point         0.65                   0.05
Ezziane [33]              0/1 KP        Elitism               Single point         0.8                    0.01
Hoff et al. [41]          MKP           Weighted Monte Carlo  Burst                0.5                    1/N
Chu and Beasley [42]      MKP           Tournament            Uniform              N/A                    2 bits per child string
Hill and Hiremath [45]    MKP           Roulette              Uniform              0.85, 0.95             0.1, 0.3
Khuri et al. [44]         MKP           Proportional          Single point         0.6                    1/N
Julstrom [31]             QKP           Tournament            Custom               0.7                    Custom
Hiley and Julstrom [32]   Q-MKP         Tournament            Custom               0.6                    Custom
As illustrated in the table, a variety of crossover operators have been used. Single point is the simplest and most frequently used. Chu and Beasley [42] note that combinatorial optimization problems are relatively insensitive to the choice of crossover operator. The probability of mutation should be set to prevent convergence to a local optimum while still generating a level of diversity within the population. Bhatia and Basu [43] suggest that prior research supports a mutation rate equal to the inverse of the number of variables in the problem, which is also used by Hoff et al. [41] and Khuri et al. [44] for the MKP. In general, small mutation rates are used, which is consistent with the sample of the literature reflected in Table 2, particularly for the 0/1 KP.

Several approaches to setting the population size have been considered. Fixed population sizes are used by all with the exception of Hoff et al. [41] and Julstrom [31]. Hoff et al. experiment with several different values and suggest that the best size is proportional to the number of variables in the problem (N); in particular, they suggest that a value of 5N provides good results for the multi-dimensional knapsack problem. Hiley and Julstrom [32] set the population size to 10N for the quadratic knapsack problem.

For this research, we choose the scattered crossover function, which randomly selects genes from each parent to create offspring. This is the same as the uniform crossover described in Chu and Beasley [42] and Hill and Hiremath [45]. We keep the mutation probability small and set it equal to the inverse of the number of variables in the problem (N); this corresponds to a rate of approximately 0.04. We vary the population size and select values such that the combined number of function evaluations (population size times the number of generations) is still less than the total solution space. In addition, we set the fitness scaling to rank, which is the default option for the Matlab GA toolbox. This option removes the effect of the spread of the raw scores; as a result, the selection function assigns a higher probability of selection to individuals with higher scaled values. Tournament selection with a tournament size equal to 20% of the population size is used. The primary parameters varied as part of this research are summarized in Table 3. A 4 × 5 × 6 factorial design is used in this experiment. Ten replications are executed for each treatment combination, resulting in 1200 observations.

Table 3
GA experimental parameters.

Parameter                 Levels  Values
Elite fraction (e)        4       0.05, 0.10, 0.15, 0.20
Crossover fraction (pc)   5       [0.5, 0.9]
Mutation rate (pm)        1       1/N
Population size (P)       6       [30, 130]
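The scattered crossover and bit-flip mutation operators described above can be sketched as follows. The study relies on the Matlab GA toolbox implementations; this Python version, with names of our choosing, is only an illustration of the operators' logic:

```python
import random

def scattered_crossover(p1, p2):
    """Scattered (uniform) crossover: each gene is copied from a randomly chosen parent."""
    return [a if random.random() < 0.5 else b for a, b in zip(p1, p2)]

def bit_flip_mutation(chromosome, pm):
    """Each gene flips independently with probability pm (here pm = 1/N)."""
    return [1 - g if random.random() < pm else g for g in chromosome]

child = scattered_crossover([1, 1, 1, 0, 0], [0, 1, 0, 1, 0])
child = bit_flip_mutation(child, pm=1.0 / len(child))
```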
4.3. Generation of problem instances

4.3.1. Description of data

Since there are no standard datasets for this particular type of knapsack problem, we generate problem instances based on data obtained from a healthcare provider. Human health data are appropriate because of their earlier described similarity to the assessment of durable good health. In addition, this particular dataset allows us to demonstrate how the CV matrix can be built from historical data. The data consist of signals and symptoms associated with patients with liver disease, more specifically hepatic cirrhosis. This disease is chronic and degenerative and can be explained by a reduction in the liver's capacity to handle toxins and to produce specific types of substances, such as hormones and proteins [46]. Some of the causes of this particular disease include excessive alcohol consumption, virus infection, and metabolic diseases. Even though there are different treatments to manage the degradation of the patient's condition, more severe cases require liver transplantation.

Given the important role the liver plays in metabolism, its dysfunction can be observed through a variety of signals and symptoms that depend on the patient's characteristics and the stage of the disease. The first steps during medical evaluation are based on disease history, physical examination, and laboratory tests in which blood samples are analyzed. All steps require an accurate information management system so that the physician can properly analyze the test results and draw conclusions about the patient's health condition. The database constructed as part of this research was designed with this focus, in an effort to provide a practical and easy way to access patient information.
4.3.2. Construction of the CV matrix

The dataset consists of 40 variables collected from patient medical examinations. Twenty-seven variables are classified as binary, representing the absence or presence of a particular signal. The remaining variables are classified as continuous, denoting the concentration of particular substances found in the blood of the patient. The variables represent specific hepatic cirrhosis signals used to estimate mortality and cirrhosis severity, which are important for predicting the time to liver transplantation. A complete definition of the variables is provided in Appendix A. Thirty-nine patient records, each consisting of the 40 variables mentioned above, were selected. In order to construct the CV matrix, we must measure the strength of the relationship between each pair of variables. The sample coefficient of determination (r_ij)^2 is used to quantify this
relationship between each pair of variables x(i) and x(j) as follows:

cv_{ij} = (r_{ij})^2 = \left[ \frac{\sum_{k=1}^{n} (x(i)_k - \bar{x}(i))(x(j)_k - \bar{x}(j))}{(n-1)\, s_{x(i)}\, s_{x(j)}} \right]^2

where cv_ij is the [i, j] element of the matrix CV; x(·)_k is the kth observation associated with variable x(·); \bar{x}(·) is the sample mean associated with variable x(·); s_{x(·)} is the sample standard deviation associated with variable x(·); and n is the sample size, which is 39 in this study. The sample coefficient of determination is an appropriate measure because it represents the proportion of the variance in one variable that can be explained by the other variable.

4.3.3. Partitioning of data

Given that complete enumeration is used to determine the optimal solution to a specific problem instance, thus requiring the evaluation of 2^40 solutions, it is not computationally feasible to use all 40 variables as data items. Hence, three smaller matrices are constructed by partitioning the data based on the strength of the relationship (high, medium, or low). That is, a matrix classified as high consists of variables that have a strong relationship with each other, and thus large (r_ij)^2 entries relative to a matrix classified as low. This partitioning is achieved by ranking the variables according to their values in the CV matrix as follows:

1. Each column is associated with a specific variable x(j) and is sorted in descending order such that cv_{[1],x(j)} ≥ cv_{[2],x(j)} ≥ … ≥ cv_{[N],x(j)}. Note that we exclude the variable x(j) from the list and assign it the highest value by default.
2. Each variable x(i) is assigned a numerical value (rank) equal to its relative position in the sorted list. This implies each variable will have a total of n different rank values, one per column variable.
3. We construct a new ranked matrix CVR with entries [i, j] equal to the ranked relationship between variables x(i) and x(j).
4. The overall rank for each variable is determined by summing the entries in each row and sorting the results in ascending order.

The result of this ranking, along with the original CV matrix, is presented in Appendix A. As a result of this scheme, the lower-ranked variables have a stronger relationship with the other variables in the list. This overall rank is used to classify the variables and construct problem sub-matrices with a high, medium, or low degree of dependence. For example, the high CV matrix for a problem of size N = 29 is constructed using the first 29 variables in the ranked list; the low CV matrix is constructed using the last 29 variables in the list. To qualify the effectiveness of this approach, we calculate the matrix norm \|CV_y\|_1 = \max_j \sum_i |cv_{ij}| for each sub-matrix, where y corresponds to the category high, medium, or low. Under this ranking scheme, \|CV_H\|_1 ≥ \|CV_M\|_1 ≥ \|CV_L\|_1.
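The CV construction and ranking scheme above can be sketched as follows (Python/NumPy for illustration; the function names are ours, and the handling of the diagonal relies on cv_jj = 1 naturally ranking first, matching the "highest value by default" convention):

```python
import numpy as np

def build_cv(records):
    """CV matrix of squared sample correlations.

    records is an (n x N) array: n patient records over N variables."""
    return np.corrcoef(records, rowvar=False) ** 2

def overall_rank(CV):
    """Rank variables by the row sums of the ranked matrix CVR (steps 1-4 above)."""
    N = CV.shape[0]
    CVR = np.empty((N, N))
    for j in range(N):
        order = np.argsort(-CV[:, j])        # sort column j in descending order
        CVR[order, j] = np.arange(1, N + 1)  # cv_jj = 1 puts x(j) first by default
    totals = CVR.sum(axis=1)                 # sum of ranks per variable (row)
    return np.argsort(totals)                # ascending: most-dependent variables first

# Example: a 'high' sub-matrix of size 29 from the first 29 ranked variables.
# idx = overall_rank(CV)[:29]; CV_high = CV[np.ix_(idx, idx)]
```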
4.3.4. Generation of values and weights

Prior computational studies generate knapsack parameter values (profits and weights) based on the level of correlation between these two parameters; see, for example, the work of Martello et al. [47], Bhatia and Basu [43], and Pisinger [48]. The relationship is typically identified as uncorrelated, weakly correlated, or strongly correlated. In all cases, the weights are randomly generated from a uniform distribution on a pre-defined interval, and the profits are generated as a function of the weights. For the uncorrelated case (UC), the profits are generated randomly from a uniform distribution on a pre-defined interval [1, R]. For the weakly correlated (WC) case, profits are generated in an interval defined by the corresponding weight, for example p_i ∈ [w_i − r, w_i + r], where r is either fixed or randomly generated. For the strongly correlated (SC) case, profits are generated as a constant positive deviation from the weights (e.g., p_i = w_i + r). Strongly correlated instances are typically harder to solve [48].

In our problem, we consider two approaches to generating the weights (data item storage requirements) and the corresponding values (v_i), which measure the importance of the data items. The first method represents a realistic case based on the data storage requirements of the collected data elements: each piece of information is modeled as a float data type, requiring 4 bytes of storage space. The v_i values are constructed in one of two ways: (1) all v_i are equal to 1; or (2) v_i is equal to the corresponding column position in the CV matrix (note that the matrix columns are not ordered based on rank). This first method corresponds to a case where the values and weights are uncorrelated. It is assumed that a tag can hold up to 96 bytes of data, based on specifications found in specialized industries [49]. The second method models the strongly correlated instance: the item size (w_i) is randomly generated in the interval [1, 4] and v_i = w_i + 4. For this scenario, we assume that the capacity is equal to 80% of the sum of the individual data item weights.
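A sketch of the strongly correlated instance generation described above (Python for illustration; the paper does not state whether the weights are drawn as integers or reals in [1, 4], so continuous uniform draws are assumed here):

```python
import random

def generate_sc_instance(N, r=4, fill=0.8):
    """Strongly correlated instance per Section 4.3.4: w_i drawn from [1, 4],
    v_i = w_i + r, and tag capacity C = fill * (total weight)."""
    w = [random.uniform(1, 4) for _ in range(N)]  # assumption: continuous draws
    v = [wi + r for wi in w]
    C = fill * sum(w)
    return v, w, C
```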
The problem instance parameters are summarized in Table 4.

Table 4
Summary of problem instance settings.

Problem instance  CV matrix classification  ||CV_y||_1  N   Penalty factor  Value of data item (v_i)  v_i, w_i correlation level  Tag capacity
1                 High                      5.32        29  7.25            Column position           UC                          92
2                 Low                       2.48        29  7.25            Column position           UC                          92
3                 High                      5.05        26  10              1                         UC                          96
4                 Medium                    2.72        26  10              1                         UC                          96
5                 High                      5.05        26  10              Based on w_i              SC                          54

5. Results

5.1. Effectiveness of GA

To measure the effectiveness of the GA, we compute the optimal solution using total enumeration and construct the relative error between the heuristic and optimal solution values. The relative error is defined as (z̃ − z*)/z*, where z* is the optimal solution found via total enumeration and z̃ is the best solution obtained from the GA. The Matlab GA toolbox is executed on a Dell Inspiron dual-core laptop with a 2.00 GHz processor and 1 GB of memory. The total enumeration procedure is executed on a high-speed Dell workstation and takes on average 58 h when the number of data items is 29, as 2^29 solutions are evaluated. The smaller problem (with 26 data items) takes on average 9 h to evaluate 2^26 solutions. The execution time for the GA is significantly less, and in most cases an optimal solution is found that is identical to the solution obtained from the enumeration procedure. The stall generation limit is set to 500, indicating that the algorithm terminates if the weighted average change in the fitness function over 500 generations is less than 1e−6. Additionally, the algorithm terminates if the best solution in the current population is less than or equal to the optimal solution determined from the enumeration procedure. Under these stopping criteria, the maximum, minimum, and average execution times across all experimental settings are 7.656, 0.031, and 3.11 s, respectively.

The fraction of times the optimal solution is found by the GA is shown in Table 5. Most of the problems are solved optimally at the smallest crossover probability (0.5) and smallest elite fraction (0.05). As the crossover probability increases, larger population sizes are needed to find the optimal solution more frequently. Note that the crossover fraction of 0.9 is dominated by the other settings in all but two cases (problem instance 2, population size 90, elite fractions 0.15 and 0.05) and is not displayed in the table. For problem instances 2 and 4, Fig. 1 shows the average number of generations until a solution is found or the algorithm terminates. The algorithm terminates in fewer than 500 generations. Furthermore, the number of generations until the optimal solution is found decreases as the population size increases. Fig. 1, along with the results in Table 5, demonstrates that good solutions can be found and that less than 0.002% (0.018%) of the solution space for problem instance 2 (4) is examined to obtain that solution. That is, the ratio of solutions examined to the size of the solution space is less than 0.002%. For example, for problem instance 2, an elite fraction of 0.05, and a population size of 30, on average 8940 solutions (population size 30 × 298 generations) are evaluated. Since there are a total of 536,870,912 (2^29) possible solutions to the problem, approximately 0.002% of the solution space is examined (assuming duplicate chromosomes are not generated by the GA).

Problem instance 5 corresponds to the case where the values and weights are strongly correlated. As indicated in Pisinger [48], strongly correlated parameter values appear to be more difficult to solve, which is also illustrated by the performance of the GA: the fraction of time the optimal solution is found is significantly smaller than for the other problem instances (Table 6). In most cases, the optimal solution is found more frequently at higher population sizes. The average relative errors for the cases that were not solved optimally are displayed in Table 7. For compactness, the error is averaged across all population sizes. Even though most of the problems are solved optimally (as indicated in Table 6) when the population size is 130, the few instances that are not solved have very high error rates (on average 40% above the optimal solution). Seeding the GA with better starting solutions may help to eliminate such large optimality gaps. In Section 5.3, we discuss some properties of the optimal solution observed from the experimental problem set.
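The total enumeration baseline used above can be sketched as follows (Python for illustration; the function name is ours, and the approach is practical only for small N, which is why the exact instances are capped at 29 data items):

```python
from itertools import product
import numpy as np

def enumerate_optimal(v, w, CV, C):
    """Evaluate all 2^N allocations and return the feasible minimizer of objective (6)."""
    v = np.asarray(v, dtype=float)
    w = np.asarray(w, dtype=float)
    best_x, best_z = None, float("inf")
    for bits in product([0, 1], repeat=len(v)):
        x = np.asarray(bits, dtype=float)
        if w @ x > C:
            continue                                  # violates tag capacity (7)
        z = float(v @ np.prod(1.0 - CV * x, axis=1))  # objective (6)
        if z < best_z:
            best_x, best_z = bits, z
    return best_x, best_z
```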
5.2. Nonparametric analysis

In order to examine the effect of the elite fraction, crossover probability, and population size on the quality of the solution obtained by the GA, nonparametric analysis is performed. The relative error between the heuristic and optimal solution values is the dependent variable for this study. A normal probability plot of the residual data from the factorial design indicated a serious normality violation, which was further verified by the Kolmogorov–Smirnov (KS) test (D = 0.410998, p < 0.01). The KS test uses as the statistic D the maximum vertical deviation between the cumulative fraction curve of the treatment group and the cumulative fraction curve of the control group.
Table 5
Fraction of replications where the optimal solution is found.

         e = 0.05                e = 0.10                e = 0.15                e = 0.20
P \ c    0.5  0.6  0.7  0.8      0.5  0.6  0.7  0.8      0.5  0.6  0.7  0.8      0.5  0.6  0.7  0.8

Problem instance 1
30       1    0.6  0.5  0.4      0.7  0.9  0.3  0.1      0.8  0.8  0.7  0.4      1    0.8  0.5  0.1
50       1    1    0.9  0.8      0.9  1    0.7  0.8      1    1    0.9  0.7      0.9  0.9  0.8  0.7
70       1    1    0.9  0.9      1    1    1    1        1    0.9  1    0.8      1    0.8  1    0.7
90       1    1    1    0.7      1    1    0.9  0.9      1    1    1    1        1    1    1    0.9
110      1    1    1    1        1    1    1    1        1    1    1    1        1    1    1    0.9
130      1    1    1    1        1    1    1    1        1    1    1    1        1    1    1    1

Problem instance 2
30       1    0.9  0.5  0.2      1    0.8  0.8  0.3      0.9  0.7  0.8  0.6      0.9  0.7  0.7  0.5
50       1    1    1    0.8      0.9  1    1    0.7      1    0.9  0.8  0.7      1    1    0.7  0.8
70       1    1    1    0.9      1    1    1    1        1    1    0.9  1        1    1    1    1
90       1    1    1    0.9      1    1    1    1        1    1    1    0.8      1    1    0.9  1
110      1    1    1    1        1    1    1    1        1    1    1    1        1    1    1    0.9
130      1    1    1    1        1    1    1    1        1    1    1    1        1    1    1    1

Problem instance 3
30       1    0.9  0.7  0.3      1    0.9  0.7  0.8      0.7  1    0.7  0.6      0.8  0.8  0.8  0.4
50       1    1    0.9  0.7      1    1    0.9  0.9      1    1    1    1        1    1    1    1
70       1    1    1    0.8      1    1    1    1        1    1    1    0.9      1    1    1    1
90       1    1    1    1        1    1    1    1        1    1    0.9  0.9      1    1    1    1
110      1    1    1    1        1    1    1    1        1    1    1    1        1    1    1    1
130      1    1    1    1        1    1    1    1        1    1    1    1        1    1    1    1

Problem instance 4
30       1    0.7  0.4  0.5      0.9  0.8  0.7  0.4      1    0.8  0.5  0.3      0.9  0.8  0.6  0.4
50       1    0.9  1    0.7      1    1    0.7  0.8      0.9  1    1    0.7      1    1    0.7  0.9
70       1    1    0.9  0.9      1    1    1    0.9      1    1    0.9  1        1    1    1    1
90       1    1    1    0.9      1    1    1    1        1    1    0.9  0.9      1    1    0.9  1
110      1    1    1    1        1    1    1    1        1    1    0.9  1        1    1    1    1
130      1    1    1    1        1    1    1    1        1    1    1    1        1    1    1    1
[Fig. 1. Average number of generations until termination: (a) problem instance 2 and (b) problem instance 4.]

Table 6
Results for strongly correlated parameter values, problem instance 5.

         e = 0.05                      e = 0.10                      e = 0.15                      e = 0.20
P \ c    0.5  0.6  0.7  0.8  0.9      0.5  0.6  0.7  0.8  0.9      0.5  0.6  0.7  0.8  0.9      0.5  0.6  0.7  0.8  0.9
30       0.5  0.5  0.4  0.5  0.6      0.5  0.4  0.5  0.2  0.2      0.5  0.3  0.3  0.5  0.2      0.5  0.5  0.7  0.3  0.1
50       0.7  1    0.6  0.6  0.7      0.6  0.5  0.7  0.5  0.4      0.7  0.8  1    0.6  0.3      0.7  0.8  0.7  0.4  0.1
70       0.8  0.8  0.9  0.9  0.7      0.9  0.7  0.8  0.9  0.6      1    0.9  0.5  0.7  0.6      0.7  1    0.8  0.6  0.8
90       1    1    0.7  1    0.6      0.9  0.9  1    0.8  0.8      0.8  0.9  0.9  0.8  0.7      1    0.9  0.9  0.9  0.6
110      0.8  0.9  1    1    0.8      0.9  0.9  0.7  0.8  0.7      1    0.8  1    0.9  1        1    1    0.8  0.9  0.9
130      1    0.9  0.9  0.8  0.8      1    1    1    1    0.9      1    1    1    0.9  0.8      1    1    0.9  1    0.8
Table 7
Average relative error for strongly correlated parameter values.

        Crossover fraction
e       0.5     0.6     0.7     0.8     0.9
0.05    0.4600  0.4617  0.4543  0.3957  0.1782
0.10    0.4877  0.4260  0.4510  0.5067  0.2333
0.15    0.4486  0.4864  0.4254  0.5040  0.4032
0.20    0.3433  0.4116  0.3935  0.4245  0.3266
As a result, analysis of variance (ANOVA) is not appropriate for analyzing the data in this experiment; therefore, a nonparametric alternative to ANOVA is used.
Table 8
Results of nonparametric analysis.

Source  Num DF  Den DF  F-value  p-Value
a       3       5880    0.525    0.6666
b       4       5880    54.24    <.0001
ab      12      5880    0.56     0.8741
c       5       5880    92.34    <.0001
ac      15      5880    0.51     0.9373
bc      20      5880    5.38     <.0001
abc     60      5880    0.50     0.9996

a: elite fraction; b: crossover fraction; c: population size.
Table 9
Result summary for larger problem instances.

Problem data         GAMS result                    Best GA result                 Greedy CV result
N    Tag capacity    z         Capacity  Num on    z         Capacity  Num on    z         Capacity  Num on
100  260             1.82E−07  232.56    69        1.51E−10  259.86    85        1.74E−09  259.38    78
50   116             4.13E−05  115.21    43        3.04E−05  115.03    41        1.54E−04  114.60    38
In this study, SAS PROC MIXED with option CHISQ was applied to the data. The results are summarized in Table 8. No significant three-way interaction exists between the variables elite fraction, crossover fraction, and population size. Additionally, no significant two-way interaction exists between elite fraction and crossover fraction, or between elite fraction and population size. More importantly, no significant elite fraction effect is found. However, a significant interaction between crossover fraction and population size exists, indicating that the effect of the crossover fraction on the solution depends on the population size. Post hoc analysis of the interaction revealed that the population size has a significant impact on the solution at all levels of crossover fraction, as follows: (1) for level 0.5, F(5,5880) = 3.84, p < 0.01; (2) for level 0.6, F(5,5880) = 8.10, p < 0.0001; (3) for level 0.7, F(5,5880) = 13.61, p < 0.0001; (4) for level 0.8, F(5,5880) = 34.36, p < 0.0001; and (5) for level 0.9, F(5,5880) = 53.16, p < 0.0001. However, when the population size is fixed, it is noted that for large values (110 and 130) there is no significant difference among the levels of crossover fraction (F(4,5880) = 0.91, p = 0.4577 for a population size of 110; F(4,5880) = 1.28, p = 0.2762 for a population size of 130). When the population size is relatively small, there is a significant difference among the levels of crossover fraction (F(4,5880) = 38.32, p < 0.0001 for a population size of 30; F(4,5880) = 22.38, p < 0.0001 for a population size of 50; F(4,5880) = 7.44, p < 0.0001 for a population size of 70; F(4,5880) = 4.76, p < 0.001 for a population size of 90). The F statistic is the ratio of the estimate of the between-sample variance to the estimate of the within-sample variance, with two degrees-of-freedom parameters.
5.3. Properties of optimal solutions

Based on the sample problems generated, we are able to characterize some properties of the optimal solution. These characterizations provide methods by which the initial population of the GA could be constructed, which could lead to higher-quality solutions for more difficult problem instances.

When the coefficient of determination matrix is classified as low, all items have equal weight, and the value of the items is in [1, N] (problem instance 2), the items with the lowest value are left off the tag in the optimal solution. In other words, when there is no relationship between the data items, suggesting low (r_ij)^2 values in the CV matrix, a greedy approach to filling the RFID tag based on the value of the item may be appropriate: order the v_i in descending order and successively add items to the tag until the tag capacity is reached. This result is intuitive, since there is no relationship between the data elements and the importance value determines the decision maker's priority in terms of data selection.

In problem instance 3, the value of each item is set equal to 1 and the CV matrix is classified as high. The items left off the tag under this setting correspond to the variables with the highest column averages. Therefore, an approach to filling the RFID tag under this setting would be as follows (a code sketch is given at the end of this subsection):

1. Determine the column averages \bar{cv}_j = \sum_{i=1}^{N} cv_{ij} / N.
2. Order the \bar{cv}_j in ascending order.
3. Successively add items to the tag until the tag capacity is reached.

For problem instance 4, this approach does not yield the optimal solution but results in a solution with a relative error of 0.000791. Problem instances 1 and 5 represent CV matrices with a high degree of dependence between the data items. Neither technique mentioned above yields optimal solutions for those instances. For example, in problem instance 1, sorting values by minimum column average would result in very high-value items being left off the tag, yielding an objective function value of 16.07032 as opposed to the optimal 2.638214. Selecting a solution strictly based on value yields an objective function value of 4.34103 (relative error of 0.645).
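A sketch of this greedy column-average heuristic (Python for illustration; here items that do not fit are skipped rather than stopping outright, which is equivalent when all storage requirements are equal, as in problem instance 3):

```python
import numpy as np

def greedy_by_column_average(CV, w, C):
    """Steps 1-3 above: add items in ascending order of column average
    cv_bar_j while the tag capacity allows."""
    col_avg = CV.mean(axis=0)          # cv_bar_j = sum_i cv_ij / N
    x = np.zeros(CV.shape[0], dtype=int)
    used = 0.0
    for j in np.argsort(col_avg):      # ascending column averages
        if used + w[j] <= C:
            x[j] = 1                   # place item j on the tag
            used += w[j]
    return x
```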
5.4. Larger problem instances

To test the scalability of the GA approach presented, two randomly generated problem instances are considered. The first problem contains 50 data elements and the second contains 100. In both instances, the CV matrix is randomly generated with entries taking values between 0 and 0.5. The weights are generated as indicated in Section 4.3.4 for the strongly correlated problem instance, and the tag capacity is 80% of the sum of the individual item weights. The mathematical model presented in Eqs. (6)–(9) is formulated in GAMS and solved using the LindoGlobal solver on the NEOS Server environment.1 We used problem instance 5 to validate the formulation, and the optimal solution matched the one obtained from the enumeration procedure. The solution obtained from GAMS for the larger problems is compared with the best solution obtained from the GA after 10 replications. We set the population size to 130 and set the termination criterion based on the stall generation limit of 500. Note that we do not use the second stopping criterion mentioned in Section 5.1 based on the value of the optimal solution. The elite fraction and crossover fraction are set at 0.15 and 0.5, respectively. We also calculate the best solution obtained from the greedy-based heuristic outlined in Section 5.3, which selects items based on the average column value in the CV matrix. The results of all three approaches are summarized in Table 9, where z denotes the value of the objective function, capacity denotes the total capacity used, and num on denotes the number of data items allocated to the tag. To minimize the bias associated with round-off error between the different solvers, we take the optimal solution vector from GAMS and calculate the value of z using the RFID evaluation function in Matlab. In both cases, the solution obtained by the GA results in the smallest value of unexplained data while simultaneously maximizing the capacity utilization.
www.neos.mcs.anl.gov/neos/
6. Conclusions and recommendations for future studies

In this research, the problem of allocating data to RFID tags is studied. The problem is modeled as a version of the nonlinear knapsack problem, namely the data allocation knapsack problem (DAKP), in which the total value of "unexplained" data is minimized subject to a tag capacity constraint. In order to find (near) optimal solutions to the DAKP, a binary encoded GA is developed. An extensive computational study is performed to evaluate the effectiveness of the GA under different problem settings. Our results show that the GA performs well for the uncorrelated (weights and values) problem instances, finding the optimal solution 100% of the time when the elite fraction and crossover fraction are at their lowest levels. For the strongly correlated (weights and values) problem instance, the GA finds the optimal solution more frequently at larger population sizes. Furthermore, the nonparametric analysis indicates a significant interaction between the crossover fraction and population size with respect to the quality of the solution obtained. Post hoc analysis revealed that the population size is significant at all levels of crossover fraction, whereas the crossover fraction is significant only at small population sizes.

Based on the sample problems generated, we are able to characterize some properties of the optimal solution. If the decision maker is indifferent among the values of the data items and the CV matrix has a high degree of dependence, then selecting items in increasing order of the column averages until the capacity is reached is a good approach to the data allocation problem, assuming the items have equal storage requirements. If there is no dependence between the data items and the storage requirements are the same, then selecting items in decreasing order of value is an appropriate method. When there is a high degree of dependence, unequal weights, and a preference in terms of item value, the GA is an effective method to determine the allocation of items to the RFID tag.

For future research, extensions of the DAKP, such as the multi-dimensional DAKP and the multi-objective DAKP, can be investigated. The multi-dimensional DAKP can be regarded as a DAKP where there is more than one tag to consider and data must be allocated simultaneously. Also, various objective functions can be optimized simultaneously with the DAKP's objective function (6), such as minimizing the total cost of allocating data or maximizing the total value of the data that are allocated. Furthermore, the DAKP and multi-dimensional DAKP can be solved with different heuristics (GA, tabu search, simulated annealing, etc.), and the multi-objective DAKP can be solved with multi-objective evolutionary algorithms.
Appendix A

See Tables A1 and A2.
Table A1
Description of data elements.

Variable                        Description
1. Age                          Self-explanatory
2. Alanine aminotransferase     Measure of the concentration of the enzyme produced by the liver and other organs
3. Albumin                      Measure of the concentration of the protein produced by the liver; low levels indicate poor liver function
4. Alcohol                      Concentration of alcohol (ethanol) intake (drink)
5. Alcohol + VHB                Joint signal of alcohol concentration and hepatitis B virus
6. Alcohol + VHC                Joint signal of alcohol concentration and hepatitis C virus
7. Mild Ascitis                 Large quantity of liquid inside the abdomen produced by chronic liver disease (light on a standard scale)
8. Moderate Ascitis             Large quantity of liquid inside the abdomen produced by chronic liver disease (moderate on a standard scale)
9. Severe Ascitis               Large quantity of liquid inside the abdomen produced by chronic liver disease (severe on a standard scale)
10. Ascitis—Physical Exam       Presence of excess fluid in the space between the tissues lining the abdomen and abdominal organs, measured manually by a physician
11. Ascitis—Ultrasonographic    Presence of excess fluid in the space between the tissues lining the abdomen and abdominal organs, measured with ultrasonographic equipment
12. Aspartate aminotransferase  Measure of the concentration of an enzyme produced by the liver
13. Bilirubin                   Measure of the concentration of the yellowish pigment found in bile, a fluid produced by the liver
14. CHILD A                     Child-Pugh score, a classification designed to assess the risk of death of a patient suffering from chronic liver disease (5–6)
15. CHILD B                     Child-Pugh score, a classification designed to assess the risk of death of a patient suffering from chronic liver disease (7–9)
16. CHILD C                     Child-Pugh score, a classification designed to assess the risk of death of a patient suffering from chronic liver disease (10–15)
17. CHILD PTS                   Scores from the Child-Pugh classification
18. Collateral circulation      Abnormal veins found in the abdomen that suggest chronic liver disease
19. Cryptoglobulinemia          Frequently related to hepatitis C
20. Erythema                    Skin condition characterized by redness or rash
21. Gender                      Male/female
22. Gynecomastia                Benign enlargement of the male breast
23. Hemoglobin                  Measure of the concentration of the major component of red blood cells, a substance involved in oxygen carriage
24. Hepatitis B virus           Virus involved in the mechanism of liver disease
25. Hepatitis C virus           Virus involved in the mechanism of liver disease
26. INR index                   International normalized ratio (INR)
27. Jaundice                    Observable yellowing of the skin (bilirubin skin deposit)
28. Plaques                     Measure of the concentration of the type of cell produced by bone marrow involved in coagulation
29. Prothrombin time            Measurement of the clotting tendency of the blood
30. Race                        Self-explanatory
31. Rarefaction                 Notable paucity of axillary and pubic hair
32. Red spots                   Signal found on varices that suggests a recent bleeding complication
33. Spiders                     Grouping of small blood vessels on the surface of the skin
34. Splenomegaly                Enlargement of the spleen, frequently found in chronic liver disease
35. Varices                     Abnormally dilated veins found in the esophagus
36. Large varices               Abnormally dilated veins found in the esophagus (large on a standard scale)
37. Medium varices              Abnormally dilated veins found in the esophagus (medium on a standard scale)
38. Thin varices                Abnormally dilated veins found in the esophagus (thin on a standard scale)
39. White blood cells           Cells found in the blood responsible for immunity
Table A2
Variable ranks (Rank. Variable: Sum of the ranks).

1. Ascitis—Physical Exam: 489
2. CHILD A: 565
3. Varices: 605
4. CHILD PTS: 628
5. Prothrombin time: 638
6. CHILD C: 645
7. Severe Ascitis: 689
8. INR index: 689
9. Jaundice: 719
10. Rarefaction: 721
11. Alcohol: 726
12. Thin varices: 728
13. Ascitis—Ultrasonographic: 734
14. Gender: 744
15. Moderate Ascitis: 747
16. Hepatitis C virus: 757
17. White blood cells: 759
18. Splenomegaly: 768
19. Albumin: 771
20. Aspartate aminotransferase: 771
21. Spiders: 775
22. Age: 783
23. Hemoglobin: 800
24. Plaques: 803
25. Medium varices: 804
26. Bilirubin: 808
27. Alanine aminotransferase: 811
28. CHILD B: 822
29. Red spots: 832
30. Cryptoglobulinemia: 874
31. Alcohol + VHC: 878
32. Gynecomastia: 891
33. Large varices: 898
34. Mild Ascitis: 907
35. Hepatitis B virus: 930
36. Erythema: 945
37. Collateral circulation: 965
38. Race: 985
39. Alcohol + VHB: 1016