Accepted Manuscript
Title: Using Classification Techniques for Statistical Analysis of Anemia
Authors: Kanak Meena, Devendra K. Tayal, Vaidehi Gupta, Aiman Fatima
PII: S0933-3657(18)30131-3
DOI: https://doi.org/10.1016/j.artmed.2019.02.005
Reference: ARTMED 1665
To appear in: ARTMED
Received date: 27 February 2018
Revised date: 20 August 2018
Accepted date: 18 February 2019
Please cite this article as: Meena K, Tayal DK, Gupta V, Fatima A, Using Classification Techniques for Statistical Analysis of Anemia, Artificial Intelligence In Medicine (2019), https://doi.org/10.1016/j.artmed.2019.02.005 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Using Classification Techniques for Statistical Analysis of Anemia
Kanak Meena1, Devendra K. Tayal2, Vaidehi Gupta3, Aiman Fatima4
1,2,3,4: Computer Science and Engineering, Indira Gandhi Delhi Technical University for Women, India
Highlights
Anemia in children is becoming a worldwide problem owing to a lack of awareness of the disease, its causes and preventive measures. This study develops a decision support system using data mining techniques applied to a database of nutritional factors for children. The data set was taken from NFHS-4, a survey conducted by the Government of India in 2015-16. The work attempts to predict anemia among children and to establish a relation between a mother’s health and diet during pregnancy and the anemic status of her child. It aims to help parents and clinicians understand the influence of an infant’s feeding practices and diet on his/her health, and to provide dietary guidelines for preventing anemia. Two techniques, decision trees and association rule mining, have been applied and compared to select the more appropriate technique for this task, and a model is proposed in the healthcare domain with the aim of reducing the risk of the blood-related disease anemia. A statistical analysis of anemia in children is provided with respect to factors such as type of residence, state and gender. The results are presented as rules: count and overall accuracy for rules from decision trees, and Support, Confidence, Lift and Count for rules from association rule mining. Based on these rules, some corollaries are suggested to clinicians, parents and the government on actions that should be undertaken to reduce the risk of anemia in children.
Abstract: Anemia in children is becoming a worldwide problem owing to a lack of awareness of the disease, its causes and preventive measures. This study develops a decision support system using data mining techniques applied to a database of nutritional factors for children. The data set was taken from NFHS-4, a survey conducted by the Government of India in 2015-16. The work attempts to predict anemia among children and to establish a relation between a mother’s health and diet during pregnancy and the anemic status of her child. It aims to help parents and clinicians understand the influence of an infant’s feeding practices and diet on his/her health, and to provide dietary guidelines for preventing anemia. Earlier systems were built by collecting medical experts’ advice and translating it into algorithms. Because this method was time consuming, artificial intelligence came into play, using knowledge discovery and data mining tools for predictive modeling. Two techniques, decision trees and association rule mining, have been applied and compared to select the more appropriate technique for this task, and a model is proposed in the healthcare domain with the aim of reducing the risk of the blood-related disease anemia.
Keywords: Data Mining; Anemia; Healthcare; Decision Tree; Associative Classification.
1. INTRODUCTION
Anemia is a blood disorder defined as a decrease in hemoglobin level, also described as a decrease in the number of Red Blood Cells (RBCs), which leaves the blood unable to carry oxygen through the body efficiently [1]. Its three main causes are loss of blood, reduced production of red blood cells and increased breakdown of RBCs [2]. It affects nearly a third of the population worldwide, with Iron Deficiency Anemia (IDA) being the most common type, affecting more than 1.5 billion people [3]. In 2015, about 54,000 deaths occurred due to it [4]. Fig 1 shows a Venn diagram of the world population affected by anemia, the majority of whom suffer from IDA. It generally remains undiagnosed owing to its high prevalence, especially in women and children. It is easily treatable if detected at an early stage but may lead to death when left untreated. IDA is most common among children, in whom a chronic condition can even lead to behavioral disturbances such as pica and Restless Legs Syndrome [5] because of long-term, permanent impairment of neurological development and altered motor functions. Treatment of less severe cases involves oral iron supplements and fortified food for babies, with injectable iron if required. In severe cases [6], methods such as blood transfusion, administration of erythropoiesis-stimulating agents [7] and hyperbaric oxygen are used. Neglect of preventive measures such as a healthy diet and proper feeding has led to this high prevalence of anemia among children.
According to the NFHS-4 (National Family Health Survey) National Report (2015-16), 57.52% of children in India suffer from anemia, including mild, moderate and severe conditions [8]. The effects of this disease are not restricted to health: it also affects quality of life and causes huge economic losses for people and for countries with high prevalence [9]. In South Asian countries such as Bangladesh, India and Pakistan, losses in economic productivity due to anemia are estimated at $4.5 billion annually [9]. People are often unable to identify the presence of the disease because its symptoms, such as paleness of skin and dizziness, are very common, yet the disease itself is rampant, which is a matter of concern. Since lack of awareness about the role of a child’s feeding and diet in preventing anemia is one of the main reasons for its high prevalence nationwide, it is important to impart the required knowledge. Using the recently released real dataset NFHS-4, obtained from the official website of DHS [10], we developed a predictive data mining model. This data, collected over the two years 2015-16, contains information about the feeding habits, hygiene and health supplements of children below the age of 5. With the purpose of reducing the widespread presence of this disease and its negative consequences, we built a data mining model to predict whether a child is anemic or not. The predictions depend upon factors such as infant feeding practices, diet, and the mother’s health and diet during pregnancy. An earlier study [11] had many shortcomings and left many prospects for future work. That paper mentioned a relation between the mother’s iron deficiency and the child’s growth and development, but its authors were unable to provide definite constraints around this relation.
Our proposed model overcomes the drawbacks of paper [11] by defining proper factors and conditions to establish the relation between a mother’s health and the risk of anemia in her child.
Fig1. Venn diagram depicting number of people suffering from anemia and Iron deficiency anemia
1.1 Iron Deficiency Anemia
When the body has sufficient functional iron, the remainder is usually stored in the bone marrow or liver as iron stores for later use by the cells; when the iron requirement exceeds the available amount, the body starts to use the stored iron [12-13]. However, if the stores are used continuously without being replenished, the body eventually becomes so depleted that RBC formation is no longer normal. This condition, in which the hemoglobin value falls below the normal value, is called anemia [13].
The first test performed for diagnosis is usually the CBC (complete blood count) test, which measures components such as RBCs, white blood cells (WBCs), hemoglobin, hematocrit and platelets. Low hemoglobin and hematocrit values are usually enough to indicate IDA, but small RBC size can also be considered in its determination [14]. The first indication of IDA in a CBC test is usually a high red blood cell distribution width (RDW), reflecting increased variability in the size of RBCs [15][16]. Many times, the available datasets contain only hemoglobin level values, and other attributes are missing. Anemia can still be detected using hemoglobin as the sole factor, but the error rate of this method is about 5%. In this method, a hemoglobin value more than 2 SDs (two standard deviations) below the distribution mean for healthy individuals is taken as the risk line for iron deficiency anemia [17]. Table 1 specifies the mean hemoglobin level and the level 2 SDs below the mean for infants and children up to the age of 6 years; the latter is the level at which IDA is considered present in a child. The healthy ranges vary distinctly in the first days after birth and differ across ages generally. Hemoglobin levels also vary slightly between the laboratories performing the tests, and thus a single generalized value cannot be given.
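The 2 SD rule described above can be sketched in a few lines of Python. The reference mean and standard deviation below are hypothetical placeholders, not the values of Table 1, and the function names are illustrative assumptions.

```python
# Hypothetical reference values in g/dL; these are NOT the values of Table 1.
REFERENCE = {
    "example_age_group": {"mean": 12.0, "sd": 0.75},
}


def anemia_cutoff(mean, sd):
    """Return the hemoglobin level two standard deviations below the mean."""
    return mean - 2 * sd


def is_below_cutoff(hemoglobin, mean, sd):
    """True when a measured hemoglobin value falls below the mean - 2 SD line."""
    return hemoglobin < anemia_cutoff(mean, sd)


ref = REFERENCE["example_age_group"]
cutoff = anemia_cutoff(ref["mean"], ref["sd"])
print(cutoff)                                          # risk line for this group
print(is_below_cutoff(10.2, ref["mean"], ref["sd"]))   # measured value 10.2 g/dL
```

In practice the mean and SD would be looked up per age group, since the text notes that healthy ranges differ by age and by laboratory.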
Iron deficiency is a condition in which there are no mobilizable iron stores, and it results from a long-term negative iron balance [17]. Its severe stage is iron deficiency anemia. Although the two terms are often used interchangeably, they refer to different conditions, the latter being a subset of the former: iron deficiency anemia is the lower end of the distribution of iron deficiency [17].
Data mining is the process of examining large existing databases to generate new information that helps in predicting future outcomes, using methods from statistics, machine learning and data analysis [19]. Machine learning tasks are of two types, supervised and unsupervised learning, which infer functions from labeled and unlabeled training datasets respectively [20]. Consider a set of input variables $X^T = (X_1, X_2, \ldots, X_p)$ for which the corresponding set of response variables is $Y = (Y_1, Y_2, \ldots, Y_p)$ [21]. Supervised learning makes predictions based on a training sample $(x_1, y_1), \ldots, (x_p, y_p)$ of previously solved cases, in which the values of the target variable are already known. It is also described as “learning with a supervisor”: the “student” presents an answer $\hat{y}$ for each case in the training sample, and the “supervisor” provides the correct answer and/or an error associated with the student’s answer, characterized by a loss function $L(y, \hat{y})$. Unsupervised learning, in contrast, is “learning without a supervisor”. In this case, for a set of N observations $(X_1, X_2, \ldots, X_N)$, inferences are drawn directly, without a supervisor providing the right answer or an error for each observation [21].
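As a small illustration of the loss function idea, the sketch below averages a 0-1 loss over a toy training sample. The 0-1 loss is one common choice and is an assumption here, since the text does not fix a particular loss; the labels and predictions are made up.

```python
def zero_one_loss(y_true, y_pred):
    """Loss of 0 for a correct answer and 1 for a wrong one."""
    return 0 if y_true == y_pred else 1


def empirical_risk(labels, predictions, loss=zero_one_loss):
    """Average loss of the 'student' answers against the 'supervisor' answers."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    return sum(loss(y, y_hat) for y, y_hat in zip(labels, predictions)) / len(labels)


# Made-up example: correct answers and a model's answers for four children.
labels = ["mild", "not anemic", "moderate", "not anemic"]
predictions = ["mild", "mild", "moderate", "not anemic"]
risk = empirical_risk(labels, predictions)
print(risk)  # fraction of wrong answers
```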
The model proposed here explores two techniques, decision trees and association rule mining, which are supervised and unsupervised respectively. Another common approach to unsupervised learning is clustering, which provides improved stability [22] and better performance on real-world data sets [23]. To improve predictive outcomes, ensemble methods can also be used: they build a model by integrating many different models [24], and the task can then be converted into an optimization problem [25]. In this paper, however, the two techniques have been applied separately. We intended to compare them to see how two completely different techniques can give varied results and to find which one is better suited to this particular task, as no one technique is always better than the other [26]. Fig 2 depicts the most commonly used data mining techniques and highlights the ones applied in the system built here.
1.2 Decision Tree
A decision tree is a decision support tool with a flowchart-like structure of decisions and their possible outcomes. It represents an algorithm whose decisions are made using conditional control statements only. Each internal node represents a test (or condition check), the branches that follow show the outcomes of the test (for example, yes or no), and each leaf node represents a particular class (here, whether the child is mildly, moderately, severely or not anemic), which is the final outcome of all the conditions applied up to that point [27].
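The node, branch and leaf structure described above can be sketched as a nested dictionary. The tests, branch values and leaf classes below are hypothetical and chosen only for illustration; they are not rules derived from the NFHS-4 data.

```python
# Hypothetical tree: internal nodes hold a test, branches map answers to subtrees,
# and leaves hold the predicted class.
TREE = {
    "test": "mother_anemic",
    "branches": {
        "yes": {
            "test": "child_given_iron_supplement",
            "branches": {
                "yes": {"leaf": "mild"},
                "no": {"leaf": "moderate"},
            },
        },
        "no": {"leaf": "not anemic"},
    },
}


def classify(node, record):
    """Follow the branch matching each test result until a leaf is reached."""
    while "leaf" not in node:
        answer = record[node["test"]]
        node = node["branches"][answer]
    return node["leaf"]


child = {"mother_anemic": "yes", "child_given_iron_supplement": "no"}
print(classify(TREE, child))
```

Each path from the root to a leaf reads as one rule, which is the form in which decision tree results are usually reported.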
It is one of the most popular machine learning tools, and the reason for its widespread use is a simple algorithm that can be applied in almost every type of situation. It can be combined easily with other decision-making techniques, allows new scenarios to be added, and helps determine the worst, best and expected outcomes for different scenarios. However, it also has drawbacks that often call for more complicated techniques. Calculations can become very complex if many values are uncertain or missing. Also, when attributes differ in their number of levels, information gain is biased towards the attributes with more levels, which reduces the credibility of the technique [28].
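The bias of information gain towards attributes with many levels can be seen in a small worked example. In the sketch below, a made-up identifier attribute with one level per row receives the maximum possible gain even though it carries no predictive meaning, while a made-up two-level attribute receives much less. The rows and attribute names are assumptions for illustration only.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Made-up rows: case_id has 8 levels (one per row), diet has 2 levels.
rows = [
    {"case_id": 1, "diet": "poor", "status": "anemic"},
    {"case_id": 2, "diet": "poor", "status": "anemic"},
    {"case_id": 3, "diet": "poor", "status": "anemic"},
    {"case_id": 4, "diet": "poor", "status": "not anemic"},
    {"case_id": 5, "diet": "good", "status": "anemic"},
    {"case_id": 6, "diet": "good", "status": "not anemic"},
    {"case_id": 7, "diet": "good", "status": "not anemic"},
    {"case_id": 8, "diet": "good", "status": "not anemic"},
]

gain_id = information_gain(rows, "case_id", "status")
gain_diet = information_gain(rows, "diet", "status")
print(gain_id, gain_diet)  # the identifier wins despite being useless for prediction
```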
Some of the available decision tree algorithms are CHAID, QUEST, ID3, CRUISE, CART, MARS, C4.5, E-CHAID, GUIDE, etc. [29]. Decision trees used to predict categorical variables are called classification trees, while those used to predict continuous variables are called regression trees [29]. Although all of these are decision-tree-based methods, they work differently. For example, QUEST, GUIDE and CART produce only binary splits, while all the others can produce both binary and multiway splits [30].
1.3 Association Technique
Association is a rule-based machine learning technique that uses the attributes in a large database to establish interesting relations between the attributes and the target variable [31]. In data mining, it is a very popular and well-researched technique for extracting rules [32]. It is an unsupervised method in which each rule is composed of two different sets of items: the left-hand side (LHS) or antecedent, and the right-hand side (RHS) or consequent [33]. For given values of the antecedent, the consequent takes a particular class or value, and together they form a complete rule generated by learning. Various constraints are used to select the strongest rules out of the many rules formed, namely confidence and support. However, these are insufficient for removing unnecessary association rules, and thus another correlation measure called lift is used [32]. Rules with positive correlation, that is, with lift greater than 1, are generally preferred. The most commonly used algorithms in association rule mining are the Apriori algorithm [33], the Eclat algorithm [34] and the FP-growth algorithm [35]. Others include AprioriDP, Context Based Association Rule Mining Algorithm, OPUS Search, etc. The Apriori algorithm has been used in this paper for association rule mining.
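The rule measures named above can be computed directly from their standard definitions. The sketch below does this for one rule over a handful of made-up transactions; the item names are hypothetical, and the code evaluates a single given rule rather than running the Apriori search itself.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_metrics(transactions, lhs, rhs):
    """Support, confidence, lift and count for the rule lhs -> rhs."""
    both = set(lhs) | set(rhs)
    s_both = support(transactions, both)
    confidence = s_both / support(transactions, lhs)
    lift = confidence / support(transactions, rhs)
    count = sum(1 for t in transactions if both <= t)
    return {"support": s_both, "confidence": confidence, "lift": lift, "count": count}


# Made-up transactions; each set lists the items recorded for one child.
transactions = [
    {"mother_anemic", "no_iron_supplement", "child_anemic"},
    {"mother_anemic", "child_anemic"},
    {"mother_anemic", "no_iron_supplement", "child_anemic"},
    {"no_iron_supplement"},
    {"child_anemic"},
]

metrics = rule_metrics(transactions, {"mother_anemic"}, {"child_anemic"})
print(metrics)
```

For this toy rule the lift is above 1, meaning that the antecedent and the consequent occur together more often than they would if they were independent.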
[Figure: tree diagram. Data Mining branches into Supervised Learning (Classification; Regression) and Unsupervised Learning (Clustering). Techniques listed: KNN, ANN, SVM, Logistic Regression; Decision Tree, Multiple Linear Regression, KNN, etc.; K-Means, SOM, etc.]
Fig2. Data Mining Tasks and Techniques
The model was developed to understand the influence of infant feeding and eating habits on a child’s health and to discuss the general ideal practices for preventing anemia in children. It also establishes a relation between a mother’s health condition and diet during pregnancy and the health of her child, including the child’s chances of having anemia. It aims to provide parents of infants and unborn children with diet-based preventive measures that may help to prevent anemia in babies and thus reduce the risk of anemia in the country.
The rest of the paper proceeds as follows. Section 2, Related Work, reviews previous works (mainly from the last 10-15 years) published in the same domain and the relation of the current study to them. Section 3 is Statistical Analysis, followed by Section 4, Development of Model, which is divided into subsections such as data collection, data preprocessing and data mining techniques used, and gives the details of the developed model. Section 5, Computational Results, presents the results obtained from the entire system, followed by a comparative analysis of the two techniques used. Section 6, Predicted Outcomes, shows the outcomes predicted by the developed model along with suggestions for the target audience to prevent anemia. The last section, Section 7, Conclusion, briefly summarizes the work done and is followed by the References.
2. RELATED WORK
This section reviews papers published in the past 15 years in the field of data mining and its applications to the analysis of diseases, especially anemia. It also explores the relation between the current work and already published work. In the healthcare field, much research has used artificial intelligence and its subset machine learning, as well as data mining techniques, to present many different systems. A few papers related to our work and relevant to our model are discussed below.
According to paper [19], the past 10 years have already seen many advances in the diagnosis of anemia using data mining along with other technologies such as machine learning and fuzzy concepts. As already discussed, many techniques such as support vector machines (SVM), artificial neural networks (ANN), logistic regression, KNN and K-Means have been used to develop systems for analyzing and predicting anemia. Using clustering algorithms such as K-Means and Fuzzy C-Means, the authors of [36] predicted five diseases, including pernicious anemia, from hemogram blood test samples, while proposing a new weight-based k-means algorithm. The techniques were compared on accuracy, time and error rate, and the proposed algorithm turned out to be better than the existing ones. To identify factors associated with anemia in children aged 6 to 59 months, a multinomial multilevel logistic regression model was developed in paper [37]. The two types of analysis used in that paper, bivariate and multivariate, showed different results. The former showed no connection between a child’s health and his/her living conditions and household living standards, while the latter showed a significant effect of the mother’s age on her child. Children of mothers between the ages of 18 and 26 years were found to have a greater risk of anemia; however, no mention was made of underage mothers. In this paper, we treated whether the mother was underage at the time of birth as an important risk factor, as it is common, especially in rural areas, for a girl to be under 18 at the birth of her child.
Because such models were lacking, various machine learning techniques such as artificial neural networks (ANN) [38] and support vector machines (SVM) have been used to build a non-linear function showing the interdependency of the collected data and erythrocyte levels [39]. These techniques proved to be complex for large sample sizes. The authors of [38] developed an ANN and an adaptive neuro-fuzzy inference system (ANFIS) to diagnose anemia and serum levels, and compared the two models; the ANN model was found to be more accurate and precise. However, the most frequently used technique is the decision tree method [29-30] [40-41]. In paper [30], Jahangiri M. et al. used various classification and regression trees to differentiate β-thalassemia trait from iron deficiency anemia. The purpose was to select the most precise decision-tree-based method, which turned out to be CRUISE (Classification Rule with Unbiased Interaction Selection and Estimation). Out of these methods, we selected the CART algorithm for the model proposed in our paper.
The paper [40] focused on developing a decision support system to predict the hospitalization of hemodialysis patients using decision trees and association rules with temporal abstraction, but it did not compare the two techniques. In [42], the authors aimed at discovering the co-occurrence of diseases along with diabetes by extracting association rules from patients’ medical transcripts. That study considered only one variable as the antecedent of a rule, which proves to be less accurate. Also, the lift of the generated rules is almost equal to 1, which is not high, and confidence is as low as 12%, which is highly undesirable. These shortcomings have been addressed in the current study to provide better results. Following similar lines to [40], we used the two techniques and compared them in order to find the most suitable technique for predicting anemia in children and to explore the effects of a mother’s health condition on her child’s anemic condition.
The aim of the study in [11] was to determine whether anemia in the mother alters infant development and mother-child interaction. Although that study affirms a relation between maternal iron deficiency and the infant’s growth and development, the constraints on this relation were not defined and were left for future work. We intended to define these constraints in the current study to answer the questions left unresolved. It has also been mentioned that there is a very strong connection between anemia and depression in women, especially pregnant women. A similar conclusion is drawn in [43]: low hemoglobin level (or anemia) in pregnant women results in depression, which is associated with the future child’s health and can even lead to his/her death. Considering maternal nutritional status as a risk factor, we developed a model to establish the relation between the mother’s and the child’s anemic status using decision trees and association rules.
Association rule mining is still less explored than other techniques. Thus, this paper focuses on analyzing the prevalence of anemia using association rules while drawing a comparison with decision trees, a technique frequently used in this domain. It serves as an extension of [19], implementing the same in R, a programming language for data analysis, in order to deduce results about anemic conditions in children. To develop a model that predicts anemia using techniques other than those mentioned above, we chose association rules and decision trees for the clinical analysis.
3. STATISTICAL ANALYSIS
Using the dataset provided by the Government of India, some statistical estimates are made about anemia in children with respect to factors such as type of residence, sex and state. All these factors matter for the health of a child.
3.1 Type of residence
Fig3. Anemia levels in Rural and Urban areas
Human settlements are broadly classified as rural or urban depending on the infrastructure and the resident population of the area. Fig 3 shows two pie charts depicting anemia levels in children in the rural and urban populations of India respectively.
It can be seen that the number of non-anemic children is greater in urban areas than in rural areas, though the difference is not large. Iron deficiency is the main reason for the immense burden of anemia on children in countries like India, especially in rural areas [44], but the impact of the type of residence on anemia is not that significant. This implies that people, irrespective of where they live, should be mindful of the consequences and the prevalence of the disease.
3.2 Child’s Gender
Gender usually plays an important role in the prevalence of a disease, especially in blood disorders, where women are more vulnerable than men. In Fig 4 it can be seen that the numbers of male and female children in each category (anemic or not anemic) are almost the same, without any noticeable difference. This shows that in children anemia is not gender specific, and neither gender is more likely to fall ill, because children’s bodies are very similar irrespective of gender. The fact that women lose blood through menstruation after puberty [17] is one of the reasons why grown women are at greater risk. Thus, parents need to give equal attention to male and female children when trying to prevent anemia.
Fig4. Anemia in Male and Female Children
3.3 State wise Analysis
In India, there are 29 states and 7 Union Territories. Fig 5 shows the state-wise analysis of anemia in children: a bar graph of the percentage of mild, moderate and severe anemia in each state and Union Territory, along with the percentage of non-anemic children in the respective area. Various results can be drawn from the graph. The Union Territory and the state with the highest number of anemic children are Dadra and Nagar Haveli and Haryana respectively, and those with the lowest number are Puducherry and Mizoram. In the graph and statistics of [45], which used NFHS-3 data collected in 2005-06, the state with the highest prevalence was Bihar and the one with the lowest was Goa. However, the north-eastern states generally show the lowest prevalence of anemia in both the 2005-06 and the 2015-16 data. In Fig 5, prevalence in Mizoram, Manipur, Nagaland, Tripura, Assam and Arunachal Pradesh, though not in Meghalaya, is considerably lower than in other states. Similarly, in paper [45] the north-eastern states had fewer anemic children. A detailed analysis of anemia in the north-eastern states of India is given in paper [37].
Fig5. State wise analysis of anemia
4. DEVELOPMENT OF PROPOSED DATA MINING MODEL
4.1 Data Collection
The proposed system uses data collected during the National Family Health Survey (NFHS), a large-scale survey of households throughout India coordinated and technically guided by the International Institute for Population Sciences (IIPS), Mumbai. The first survey was conducted in 1992-93, and three rounds of the survey have been conducted since then. The National Family Health Survey 2015-16 (NFHS-4), the fourth round in the NFHS series, was conducted 10 years after the 2005-06 round and provides data on population, health, hygiene and nutrition for India and each of its States and Union Territories. It provides information on anemia, maternal and child health, fertility, child mortality, family planning habits, reproductive health and contraception, nutrition and vaccination. The dataset corresponding to children below 5 years was collected from the Demographic Health Survey Program’s data distribution system. It contains data about children born in the last three years before the survey along with their mothers’ health details, and is dated from 20th January 2015 to 4th December 2016. Before pre-processing, the complete dataset consisted of 1341 variables and 259627 observations. The target variable is the anemia level of the child, which has 4 categories: Severe, Moderate, Mild and Not Anemic. Fig. 6 shows the number of children in the survey at each level of anemia: 3386 (1.3%) Severe, 60117 (23.1%) Moderate, 57004 (21.9%) Mild and 88988 (34.2%) Not Anemic, out of the total 259627 observations.
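The reported shares can be recomputed from the counts given above. The short sketch below does so; it suggests that the printed percentages are truncated rather than rounded to one decimal, and that the four categories together cover 209495 of the 259627 observations. The text does not say how the remaining 50132 observations are classified, so the code only reports that difference.

```python
# Counts and total as reported in the text for the NFHS-4 child dataset.
counts = {"Severe": 3386, "Moderate": 60117, "Mild": 57004, "Not Anemic": 88988}
total = 259627

# Share of each anemia level as a percentage of all observations.
shares = {level: 100 * n / total for level, n in counts.items()}
for level, share in shares.items():
    print(f"{level}: {share:.3f}%")

categorized = sum(counts.values())   # observations with one of the four levels
uncategorized = total - categorized  # observations not covered by the four counts
print(categorized, uncategorized)
```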
[Figure: bar chart titled “Childhood Anaemia”; x-axis: Anaemia level (Severe, Moderate, Mild, Not Anemic); y-axis: Number of children]
Fig6. No. of children against each level of Anemia (with respect to NFHS dataset)
4.2 Data Preprocessing
The raw data taken for the analysis initially consisted of 1341 variables or features. Feature selection was therefore performed, and the variables that would have an impact on hemoglobin levels, iron deficiency and hence the anemia levels of children and mothers were selected. After this step, a total of 23 features were shortlisted: Case Id (primary key, for unique identification of each child), Month_of_Interview, Year_of_Interview, Day_of_Interview, BirthMonth_child, BirthYear_child, Sex_of_child, Age_of_child, Hemoglobin_level_of_child and Anemialevel_child (the target/class feature), along with features covering infant feeding practices and the mother’s health and diet during pregnancy. Feature selection is often also done using ensemble methods, which many researchers have used to avoid parameter sensitivity [46]. Linkage learning techniques can also be of great help in grouping variables that are related to each other [47]. Table 2 lists all the selected features for each type of model with their data types, their categories/ranges and the action taken to clean them.
Fig 7 describes data preprocessing in detail as a flowchart. As shown in the figure, feature selection is followed by outlier processing, which removes the data that disturbs the general pattern. Removing outliers can change the predictions drastically and reduce the error rate. Further cleaning is then performed on the selected features to obtain a structured dataset, but the methods differ considerably between the two techniques.
To handle missing values before building the decision trees, any feature with more than 50% of its observations missing was discarded. For the remaining features, every observation with at least one missing value was removed. This is depicted in Fig. 7. As shown in Table 3, only a few thousand of the hundreds of thousands of observations remained after preprocessing for the decision tree model. Because decision tree learning is supervised, the data was then divided into training and testing datasets for modeling. For association rule mining, in contrast, observations that contain missing values need not be removed, as this technique can handle features with missing values and still generate sensible rules. The variable type of each feature, however, matters for the association algorithm, because only categorical variables can be input to the apriori algorithm. A categorical variable was added to the dataset directly, whereas a numerical variable was first converted into a categorical variable using binning. Statistical data binning is a preprocessing method that groups a number of continuous values into a smaller number of groups [48] to reduce the effect of minor observation errors; in simpler terms, it converts numerical variables to categorical variables. An alternative approach for handling categorical and numerical variables simultaneously is to use a clustering algorithm based on the k-means paradigm, as shown in [49]. The derived features were finally added to the dataset, which was then converted to a transactional dataset. The respective models were then built from the processed datasets.
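The missing-value handling described above for the decision tree model can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the study; the encoding of missing values as None, the function names and the toy records (including the invented feature toy_sparse_feature) are assumptions made only for this example.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[object]]  # None marks a missing value (assumed encoding)


def drop_sparse_features(rows: List[Row], max_missing: float = 0.5) -> List[Row]:
    """Drop every feature whose fraction of missing values exceeds max_missing."""
    if not rows:
        return []
    kept = [
        name
        for name in rows[0]
        if sum(r.get(name) is None for r in rows) / len(rows) <= max_missing
    ]
    return [{name: r.get(name) for name in kept} for r in rows]


def drop_incomplete_rows(rows: List[Row]) -> List[Row]:
    """Remove every observation that still has at least one missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


# Hypothetical toy records, not taken from NFHS-4.
raw = [
    {"Hemoglobin_level_of_child": 10.4, "Gave_ironpills": "Yes", "toy_sparse_feature": None},
    {"Hemoglobin_level_of_child": 8.7, "Gave_ironpills": None, "toy_sparse_feature": None},
    {"Hemoglobin_level_of_child": 11.9, "Gave_ironpills": "No", "toy_sparse_feature": "x"},
    {"Hemoglobin_level_of_child": 12.3, "Gave_ironpills": "No", "toy_sparse_feature": None},
]

reduced = drop_sparse_features(raw)       # feature-level filter (more than 50% missing)
complete = drop_incomplete_rows(reduced)  # observation-level filter
```

Applying the feature-level filter first keeps more observations, because rows are no longer rejected on account of a feature that is discarded anyway.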
Fig7. Data Preprocessing Flowchart
4.3 Platform Used
The clinical decision support system presented in this paper was developed using the R language (version 3.4.2) as the main data analysis language. R is a programming language used by statisticians, data scientists and data miners for statistical computing and graphics and for developing data analysis software. The platform used is RStudio, an Integrated Development Environment (IDE) for R. The model was developed on a 64-bit operating system with an x64-based Intel® Core™ i5-6200U CPU @ 2.30 GHz and 8.00 GB RAM.
4.4 Decision Tree Learning
In statistics, machine learning and data mining, decision tree learning is the construction of a decision tree to visually represent the decisions made, and it is one of the most common approaches for predictive modeling. Of the decision tree algorithms available, we used the Classification and Regression Tree algorithm [50], also known as CART. In data mining, two main types of decision trees are used, namely classification trees and regression trees. CART generates one of the two depending on the type of the dependent variable: a classification tree if the variable is categorical and a regression tree if it is numerical. The target variable in the proposed model is Anemialevel_child, with the categories severe, moderate, mild and not anemic. It is therefore a categorical variable, and the trees generated in our work are classification trees. In R, the rpart package was installed and loaded to build the decision trees; the functions in this package are based on the CART algorithm.
The construction of a decision tree is a top-down approach: splitting starts from the root node, and at each step further splits are made using a metric that selects the best split [51]. Different algorithms use different metrics, such as Gini impurity, information gain and variance reduction. The CART algorithm uses the Gini impurity, denoted IG. It measures how often a randomly chosen element of the subset would be labeled incorrectly (for example, a child suffering from mild anemia being labeled as not anemic) if it were labeled randomly according to the distribution of labels in the subset.
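As a small illustration of how a Gini-based split can be chosen, the sketch below computes the Gini impurity as one minus the sum of squared class fractions and searches for the numeric threshold with the lowest weighted impurity. It is written in Python for illustration only and is not the rpart implementation; the function names and the toy hemoglobin values and labels are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple


def gini_impurity(labels: Sequence[str]) -> float:
    """Gini impurity: 1 minus the sum of squared class fractions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values: Sequence[float], labels: Sequence[str]) -> Tuple[float, float]:
    """Return (threshold, weighted impurity) of the best split 'value < threshold'."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("nan"), float("inf"))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no threshold separates identical values
        threshold = (low + high) / 2
        left = [lab for v, lab in pairs if v < threshold]
        right = [lab for v, lab in pairs if v >= threshold]
        score = (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / n
        if score < best[1]:  # ties keep the first (lowest) threshold
            best = (threshold, score)
    return best


# Hypothetical toy data, not taken from the study.
hemoglobin = [8.5, 9.2, 10.1, 10.6, 11.4, 12.0]
anemia = ["Moderate", "Moderate", "Mild", "Mild", "Not Anemic", "Not Anemic"]

root_impurity = gini_impurity(anemia)
threshold, split_impurity = best_threshold(hemoglobin, anemia)
```

A full CART implementation would repeat this search over every feature and recurse on the two resulting subsets; the sketch shows only a single split on one numeric feature.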
The Gini impurity can be evaluated using the formula below:

$$I_G(p) = \sum_{i=1}^{J} p_i \sum_{k \neq i} p_k = \sum_{i=1}^{J} p_i (1 - p_i) = \sum_{i=1}^{J} (p_i - p_i^2) = \sum_{i=1}^{J} p_i - \sum_{i=1}^{J} p_i^2 = 1 - \sum_{i=1}^{J} p_i^2$$

IG in the formula above represents the Gini impurity for a set of items with J classes, where i ranges from 1 to J and pi is the fraction of items labeled with class i.
4.5 Association Rule Mining
Association rule extraction consists of two main steps [32]:
1. Generation of rules
2. Selection of interesting rules
The first step is carried out using algorithms such as the Apriori algorithm and the FP-Growth algorithm, while the second step relies on several measures of rule quality. The Apriori algorithm is used for association rule mining here. It uses a breadth-first search to count the support of candidate itemsets [33].
Various constraints are used to select the strongest rules from the many rules generated. These constraints are minimum thresholds on support and confidence, where support is how frequently an itemset appears in the dataset and confidence is how often the rule is found to be true. Another measure is lift, a relative measure that relates the confidence of the rule to the overall frequency of its consequent in the dataset.

a. Support
The support S(A) of an itemset A is defined as

$$S(A) = \frac{\text{no. of transactions which contain the itemset } A}{\text{total no. of transactions}} \qquad (1)$$

b. Confidence
The confidence C of a rule (A → B) is defined as

$$C(A \rightarrow B) = \frac{S(A \cup B)}{S(A)} \qquad (2)$$

c. Lift
The lift L of a rule (A → B) is defined as

$$L(A \rightarrow B) = \frac{S(A \cup B)}{S(B) \cdot S(A)} \qquad (3)$$
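The three measures in Eqs. (1) to (3) can be illustrated with a short Python sketch. The four transactions below are hypothetical and serve only to make the arithmetic concrete; the function names are likewise chosen for this example.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Eq. (1): fraction of transactions that contain every item of the itemset."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(a: Iterable[str], b: Iterable[str], transactions: List[Transaction]) -> float:
    """Eq. (2): S(A ∪ B) / S(A)."""
    a, b = frozenset(a), frozenset(b)
    return support(a | b, transactions) / support(a, transactions)


def lift(a: Iterable[str], b: Iterable[str], transactions: List[Transaction]) -> float:
    """Eq. (3): S(A ∪ B) / (S(A) * S(B))."""
    a, b = frozenset(a), frozenset(b)
    return support(a | b, transactions) / (
        support(a, transactions) * support(b, transactions)
    )


# Hypothetical toy transactions, not taken from the study.
toy = [
    frozenset({"Gave_ironpills=Yes", "anemialevel_child=Not Anemic"}),
    frozenset({"Gave_ironpills=Yes", "anemialevel_child=Not Anemic"}),
    frozenset({"Gave_ironpills=No", "anemialevel_child=Moderate"}),
    frozenset({"Gave_ironpills=Yes", "anemialevel_child=Moderate"}),
]

A = {"Gave_ironpills=Yes"}
B = {"anemialevel_child=Not Anemic"}
s_ab = support(A | B, toy)
c_ab = confidence(A, B, toy)
l_ab = lift(A, B, toy)
```

In this toy example S(A) = 3/4, S(B) = 2/4 and S(A ∪ B) = 2/4, so the confidence is 2/3 and the lift is 4/3; a lift above 1 indicates that the antecedent and consequent occur together more often than expected under independence.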
The final attributes selected were divided into two groups, based on feeding practices and on the mother-child relation. Since the apriori algorithm works on categorical variables, all numerical attributes were first converted to categorical attributes. Breastfeed_duration was categorized as "Less than 6 months", "From 6 months up to 1 year", "From 1 year to 2 years", "From 2 years to 3 years", "More than 3 years", "Ever breastfed, not currently breastfeeding", "Never breastfed", "Still breastfeeding", "Breastfed until died", "Inconsistent", "Don't know", "Missing", and Hemoglobin_level_mother and hemoglobin_level_child were categorized as "Less than 9.0g/dl", "From 9.0g/dl to 14.0g/dl", "From 14.0g/dl to 18.0g/dl", "Above 18.0g/dl", "Not Present", "Refused", "Other", "Not Tested". After this step, the two datasets were converted to transactional datasets.
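The binning and the conversion to a transactional format can be sketched as below. The category labels are the ones listed above; how the boundary values 9.0, 14.0 and 18.0 are assigned is not stated in the text, so the choice made here is an assumption, as are the function names and the example record.

```python
from typing import Dict, FrozenSet


def bin_hemoglobin(value: float) -> str:
    """Map a numeric hemoglobin level (g/dl) to one of the listed categories.

    Boundary handling (9.0 and 14.0 go to the upper bin, 18.0 to the lower
    bin) is an assumption made for this sketch.
    """
    if value < 9.0:
        return "Less than 9.0g/dl"
    if value < 14.0:
        return "From 9.0g/dl to 14.0g/dl"
    if value <= 18.0:
        return "From 14.0g/dl to 18.0g/dl"
    return "Above 18.0g/dl"


def to_transaction(record: Dict[str, object]) -> FrozenSet[str]:
    """Turn one categorical record into a set of 'attribute=value' items."""
    return frozenset(f"{name}={value}" for name, value in record.items())


# Hypothetical record, not taken from the study.
record = {
    "hemoglobin_level_child": bin_hemoglobin(8.4),
    "Gave_ironpills": "No",
    "anemialevel_child": "Severe",
}
transaction = to_transaction(record)
```

Each item has the form attribute=value, which matches the way items are written in the rules reported in this paper.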
Association rule mining can be defined as follows. Let S = {i1, i2, ..., in} be a set of n attributes called items, and let the database D = {t1, t2, ..., tm} be a set of m transactions. Each transaction has a unique transaction ID and contains a subset of the items in S. A rule is of the form A→B, where A, B ⊆ S and A∩B = ∅. The itemsets A (the LHS) and B (the RHS) are called the antecedent and the consequent of the rule, respectively. In this case, the set S would look like S = {Gave_ironpills=No, Gave_ironpills=Yes, Gave_fortified_baby_food=No, Gave_fortified_baby_food=Yes, hemoglobin_level_child=Less than 9.0g/dl … anemialevel_child=Moderate, anemialevel_child=Mild, anemialevel_child=Severe, anemialevel_child=Not Anemic}, and each transaction t in the dataset would be a subset of S.
Rules are drawn from two transactional datasets, T1 and T2. The former corresponds to the feeding practices of the child and the latter to the mother-child relation.
Frequent itemset: a set of items having at least a given minimum support. The set of frequent itemsets of size k is denoted by Fk.
Candidate itemset: the set of candidate itemsets of size k, denoted CIk, is obtained by joining Fk-1 with itself (that is, taking the Cartesian product Fk-1 × Fk-1) and eliminating every candidate that contains a (k-1)-size itemset which is not frequent.
1. Rule Generation
Step 1 is the algorithm that is used for generating rules.
for each i in {1, 2} {
    Apriori (Ti, minimum_Support) {
        CIk: candidate itemsets of size k
        Fk: frequent itemsets of size k
        F1 = {frequent items};
        for (k = 1; Fk != ∅; k++) {
            CIk+1 = candidates generated from Fk;
            for each transaction t in Ti {
                increment the count of all candidates in CIk+1 that are contained in t
            }
            Fk+1 = candidates in CIk+1 with minimum_Support
        }
        return ∪k Fk;
    }
}
2. Rule Selection
Step 2 depicts the part of the algorithm that is used for selecting certain rules out of all the rules generated.
For T1:
Minimum confidence: 0.7 (or 70%)
Minimum support for target variable
= "Not anemic": 0.002
= "Mild": 0.00003
= "Moderate": 0.005
= "Severe": 0.00001
For T2:
Minimum confidence: 0.85 (or 85%)
Minimum support for target variable
= "Not anemic": 0.003
= "Moderate": 0.02
The minimum support was set individually for each value of the target variable because the frequencies of two of them, Mild and Severe anemia, were too low to be considered by the system under a common threshold; the support thresholds for these values are therefore very small.
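The frequent-itemset generation of Step 1 can be written as a short runnable sketch. The version below is a plain Python illustration of the Apriori idea (join, prune, count), not the apriori function of the R package used in the study; the toy transactions and the threshold of 0.5 are hypothetical and far larger than the thresholds listed above.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]


def apriori(transactions: List[Itemset], min_support: float) -> Dict[Itemset, float]:
    """Return every frequent itemset together with its support."""
    n = len(transactions)

    def support(candidate: Itemset) -> float:
        return sum(1 for t in transactions if candidate <= t) / n

    # F1: frequent itemsets of size 1
    level: Dict[Itemset, float] = {}
    for item in {i for t in transactions for i in t}:
        candidate = frozenset([item])
        s = support(candidate)
        if s >= min_support:
            level[candidate] = s

    frequent: Dict[Itemset, float] = {}
    k = 1
    while level:
        frequent.update(level)
        previous = list(level)
        # Join step: unite pairs of frequent k-itemsets into (k+1)-candidates.
        candidates = {a | b for a in previous for b in previous if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k))
        }
        # Count step: keep candidates that reach the minimum support.
        next_level: Dict[Itemset, float] = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                next_level[c] = s
        level = next_level
        k += 1
    return frequent


# Hypothetical toy transactions, not taken from the study.
toy = [
    frozenset({"Gave_ironpills=Yes", "anemialevel_child=Not Anemic"}),
    frozenset({"Gave_ironpills=Yes", "anemialevel_child=Not Anemic"}),
    frozenset({"Gave_ironpills=No", "anemialevel_child=Moderate"}),
    frozenset({"Gave_ironpills=Yes", "anemialevel_child=Moderate"}),
]
frequent_itemsets = apriori(toy, min_support=0.5)
```

With a minimum support of 0.5, three single items and one pair are frequent in the toy data. Rules are then formed from the frequent itemsets and filtered by the confidence and support thresholds of Step 2.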
4.6 Architecture of Developed Model
The complete architecture of the proposed model is depicted concisely in Fig 8. The architecture is divided into a 3-tier system, namely the Data-Tier, the Processing-Tier and the User-Tier. The Data-Tier represents the data stored in the database. The Processing-Tier is the middle tier, which covers the entire development of the data model. The last tier is the User-Tier, which shows how the developed model is viewed and used by the target audience and how they benefit from it. It is the level closest to the end user.
In this architecture, the Processing-Tier covers the pre-processing phase, the development of the models using the two techniques, and the final rule generation and selection. The data obtained from the Data-Tier is passed to preprocessing, which is shown in detail in Fig. 7. After pre-processing, the resulting dataset is used for model development. This involves two different models, the decision tree model and the association model, which follow very different modeling strategies. For decision tree development, the arguments of the rpart function are set to appropriate values, and decision trees corresponding to feeding practices and the mother-child relation are obtained to predict anemia in children. For association rule development, the dataset is converted to a transactional dataset and then passed to the apriori function to generate the required rules. Finally, the rules obtained after a long filtration process are used by clinicians and parents to gain knowledge and take preventive measures.
Fig8. Architecture of the proposed Predictive Data Mining Model
5. COMPUTATIONAL RESULTS
In this section, we focus on the shortcomings of the paper [11], which we have overcome by defining the constraints that govern the relation between anemia in the mother and the infant’s growth and development. These constraints are the factors determining the mother’s health and diet during pregnancy. We developed a model that establishes the relation between the anemic status of mother and child using decision trees and association rules, taking the mother’s nutritional status as a risk factor. In [11], the existence of this relation was asserted; however, the authors were not able to define it clearly and left it for future work. We defined this relation by identifying the mother-related factors that matter for the child’s health.
5.1 Results of Data Mining
A total of 25 rules were inferred from the two decision trees and 337 from the two association models. After evaluation, only 32 meaningful association rules were selected, based on constraints such as support and confidence, to predict the prevalence of anemia in children. However, the classes of the dataset were imbalanced, that is, the number of data points differed considerably between classes. For example, the number of data points in the class ‘Severe’ was much smaller than the number in the class ‘Moderate’. Such imbalanced datasets are difficult to handle using standard algorithms. The algorithm described in [52] could be used for this purpose in future work; it has not been used in the current paper. Instead, the approach used in this paper to deal with the imbalance was to adjust the parameters of the algorithm used for each technique. More specifically, the analysis was focused on the under-represented class so as to produce meaningful rules for that class. Some of the important rules are explained here.
Fig9. Decision Tree for Predicting Anemia using only Hemoglobin levels (Accuracy: 97.35%)
The decision tree shown in Fig 9 uses only the attribute Hemoglobin level of the child in its split conditions.
5.1.1 Rules from Decision Trees
Two decision trees were generated from the selected attributes, one for infant feeding habits and one for the mother-child relation. The path to any leaf node can be written as a rule using IF-THEN statements [40]. A rule can be pruned by removing any condition in its antecedent that decreases the accuracy of the rule and does not improve its credibility [40]. The rules generated from the two decision trees are listed in Tables 4 and 5 respectively, where cover indicates how often the rule occurs and accuracy is the overall accuracy of the decision tree. A few rules are explained below.
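The way a root-to-leaf path turns into an IF-THEN rule can be sketched as follows. The nested-dictionary tree format, the function name and the small tree itself are hypothetical and chosen only to illustrate the idea; they do not reproduce the trees of Figs 9 and 10 or the output format of rpart.

```python
from typing import Dict, List, Tuple


def extract_rules(node: Dict, conditions: Tuple[str, ...] = ()) -> List[str]:
    """Walk the tree and emit one IF-THEN rule per leaf."""
    if "leaf" in node:
        antecedent = " AND ".join(f"({c})" for c in conditions) or "(TRUE)"
        return [f"IF {antecedent} THEN (Anemialevel_child = {node['leaf']})"]
    # The 'yes' branch adds the test itself, the 'no' branch adds its negation.
    rules = extract_rules(node["yes"], conditions + (node["test"],))
    rules += extract_rules(node["no"], conditions + (f"NOT {node['test']}",))
    return rules


# Hypothetical toy tree, not one of the trees reported in the paper.
toy_tree = {
    "test": "Hemoglobin_level_of_child>=10.95",
    "yes": {"leaf": "Not Anemic"},
    "no": {
        "test": "Gave_ironpills=Yes",
        "yes": {"leaf": "Mild"},
        "no": {"leaf": "Moderate"},
    },
}

rules = extract_rules(toy_tree)
```

Each leaf yields exactly one rule, so the number of rules equals the number of leaves; pruning would then drop conditions from the antecedent of individual rules.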
I. Infant Feeding Habits
1) Rule 4: If (Hemoglobin_level_of_child>=9.95) AND (Hemoglobin_level_of_child>=10.95) AND (Hemoglobin_level_of_child< 11.35) AND (First_time_breastfed=Immediately, within one day) AND (Tinned_powered_or_freshmilk_given=Yes) AND (Gave_ironpills=Yes) THEN (Anemialevel_child = Not Anemic) (count = 41)
The time at which the child is first put to the breast is one of the important factors for the prevalence of anemia in children. Supplementary items such as tinned, powdered or fresh milk and iron pills also help in reducing the risk of iron deficiency anemia. It can be inferred from this rule that if a child is put to the breast immediately or within one day, and if supplementary items such as tinned, powdered or fresh milk and iron pills are given, he/she is less likely to be anemic. As discussed in [17], usual diets must contain adequate iron. Hence, either supplements or an iron-rich diet must be provided.
2) Rule 6: If (Hemoglobin_level_of_child>=9.95) AND (Hemoglobin_level_of_child< 10.95) AND (Hemoglobin_level_of_child< 10.25) AND (First_time_breastfed=Within one day) AND (Breastfeed_duration_months< 13.5) AND (Age_of_child< 0.5) AND (Gave_solid_semis_food_yesterday=Yes) AND (Frequency_solidfood< 2.5) THEN (Anemialevel_child = Moderate) (count = 6)
For children with a hemoglobin level between 9.95 and 10.25, age also comes into play in determining the risk level of anemia; as this rule suggests, children below 6 months are prone to anemia. Also, the frequency with which semi-solid food is given should be approximately above 2-3 times for risk prevention.
II. Mother-Child Relation
Fig 10 shows the decision tree formed to depict the mother-child relation for the risk of anemia.
Fig10. Decision Tree depicting mother-child relation
1) Rules relevant to Iron Pills during Pregnancy and Anemia Level of mother
Rule 5: IF (AnemiaLevel_of_Mother=Severe, Moderate, Mild) AND (HemoglobinLevel_of_Mother>=10.25) AND (AnemiaLevel_of_Mother=Mild) AND (HemoglobinLevel_of_Mother< 11.45) AND (Took_IronPills_during_pregnancy=No) THEN (AnemiaLevel_of_child = “Moderate”)
Whether or not the mother took iron pills during pregnancy plays an important role in predicting the prevalence of anemia in her child. The above rule states that a mother with a hemoglobin level between 10.25 and 11.45 and a mild level of anemia who has not taken supplementary iron pills is more likely to have a ‘Moderately Anemic’ child. It also shows that even mild anemia in the mother can affect the child.
2) Rule 2: IF (AnemiaLevel_of_Mother=Severe, Moderate, Mild) AND (HemoglobinLevel_of_Mother>=10.25) AND (AnemiaLevel_of_Mother=Mild) AND (HemoglobinLevel_of_Mother< 11.45) AND (Took_IronPills_during_pregnancy=Yes) THEN (AnemiaLevel_of_child = “Not Anemic”)
This rule confirms the conclusion drawn from the first rule (Rule 5). It states that when the mother’s hemoglobin level is between 10.25 and 11.45 and she has taken iron pills during her pregnancy, her child is observed to be ‘Not Anemic’. Comparing the two rules therefore shows that the mother’s intake of iron pills plays an important role in the child’s anemia status: when the pills were taken, the child is not anemic.
3) Rules relevant to Anemia level of Mother
Rule 7: IF (AnemiaLevel_of_Mother=Severe, Moderate, Mild) AND (HemoglobinLevel_of_Mother>=10.25) THEN (AnemiaLevel_of_child = “Moderate”)
This rule says that if the anemia level of the mother is ‘Severe’, ‘Moderate’ or ‘Mild’, there is a high chance that her child will also be anemic. When the mother suffers from any kind of anemia, the child is therefore likely to develop the disease as well, so the mother’s health condition has a great impact.
5.1.2 Rules from Apriori Algorithm
The association rules are selected based on their support, confidence and lift values. Higher values of all three measures are usually preferred, and strong rules are considered to be those with a confidence of at least 0.7 [40]. Rules that did not improve the system’s accuracy and reliability were deleted as unnecessary. Tables 6 and 7 show the final rules selected from the apriori algorithm, with the support, confidence, lift and count of each rule listed alongside. A few rules from each category are explained below.
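A threshold-based selection of this kind can be sketched as follows. The confidence threshold and the per-class support thresholds are the ones stated for T1 in Section 4.5; the dictionary layout, the candidate rules with their metric values, and the final ordering by lift are assumptions made for this example and are not taken from Tables 6 and 7.

```python
from typing import Dict, List

# Thresholds stated for dataset T1 in Section 4.5.
MIN_CONFIDENCE = 0.7
MIN_SUPPORT_BY_TARGET = {
    "Not Anemic": 0.002,
    "Mild": 0.00003,
    "Moderate": 0.005,
    "Severe": 0.00001,
}


def select_rules(rules: List[Dict]) -> List[Dict]:
    """Keep rules that meet both thresholds, strongest lift first (assumed order)."""
    kept = [
        r for r in rules
        if r["confidence"] >= MIN_CONFIDENCE
        and r["support"] >= MIN_SUPPORT_BY_TARGET[r["target"]]
    ]
    return sorted(kept, key=lambda r: r["lift"], reverse=True)


# Hypothetical candidate rules with made-up metric values.
candidates = [
    {"target": "Severe", "support": 0.00002, "confidence": 0.90, "lift": 40.0},
    {"target": "Moderate", "support": 0.001, "confidence": 0.95, "lift": 2.0},
    {"target": "Not Anemic", "support": 0.01, "confidence": 0.60, "lift": 1.5},
    {"target": "Not Anemic", "support": 0.01, "confidence": 0.80, "lift": 1.9},
]

selected = select_rules(candidates)
```

The class-specific support thresholds are what allow a rule for the rare class Severe to survive even though its support is far below that of the other rules.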
I. Infant Feeding Practices
Rule 22: If (hemoglobin_level_child=Less than 9.0g/dl) AND (breastfeed_duration=From 6 months up to 1 year) AND (First_time_breastfed=Within one week) AND (tinned_powered_or_freshmilk_given=Yes) AND (Gave_baby_formula=NO) AND (Gave_fortified_baby_food=No) AND (Gave_solid_semis_soft_food_yesterday=No) THEN (anemialevel_child=Severe)
This rule describes cases of severe anemia (hemoglobin level less than 9.0 g/dl). It suggests that putting the child to the breast only within one week (one day is generally prescribed), together with a short breastfeeding duration, may have severe impacts. Therefore, if for some reason the child cannot be breastfed, a proper supplementary diet should be given.
II. Mother-Child Relation
1) Rules relevant to Anemia level and age of mother
Rule 3: IF (under18=Above 18) AND (hemoglobin_level_mother= from 9.0g/dl to 14.0g/dl) AND (anemia_level_mother=Not Anemic) AND (hemoglobin_level_child=from 14.0g/dl to 18.0g/dl) THEN (Anemialevel_child= “Not Anemic”)
When the mother is above 18, the child is usually born healthier than when the mother is below 18 at the time of childbirth. The anemic level of the mother is also a strong determining factor for the child’s anemic level.
6. PREDICTED OUTCOMES
In this section, we discuss the outcomes predicted from the results of the proposed model. Suggestions are made for clinicians and parents to follow in order to reduce the prevalence of anemia. Each of the corollaries listed below represents a preventive measure that should be undertaken to reduce the risk of anemia in children:
Corollary 1: A child should be breastfed on an average for 3 years.
Children who are breastfed for less than a year are usually at a high risk of suffering from anemia, and those who are breastfed for about 2 years suffer from mild anemia. A child who is breastfed for 3 or more years is generally healthy and not anemic, with a very low risk of suffering from anemia.
Corollary 2: A child should be put to breast within one day after birth.
When a child is put to the breast immediately after birth or within one day, and breastfeeding is continued, he/she grows up to be healthy and less susceptible to anemia. The longer the delay in putting the child to the breast, the higher the chances of the child being afflicted by the disease.
Corollary 3: Iron supplements need to be provided to children with hemoglobin level less than 9 g/dl.
After the age of 6 months, the hemoglobin level of a child drops to low values and the child becomes more vulnerable to iron deficiency and anemia; iron supplements therefore become a requirement. They are required especially when the child’s hemoglobin level is less than 9 g/dl.
Corollary 4: After 6 months of breastfeeding, a child should be given fortified food, baby formula and fresh milk.
During the first 6 months, a child requires only breast milk. Once that period has passed, however, he/she should be provided with additional supplements such as fortified food, baby formula and tinned, powdered or fresh milk for proper nourishment and to avoid the risk of anemia.
Corollary 5: Mother must not be under the age of 18 at the time of birth.
If the mother is older than 18 at the time of birth, age is no longer the important factor in determining whether the child will be healthy, not anemic or at risk of anemia, as other significant factors come into play. However, if the mother is below 18, the chances of the child having anemia become quite high.
Corollary 6: Mother must take iron pills during pregnancy.
The diet taken during pregnancy plays a very important role in the well-being of the child to be born. A mother who neglects her diet during this time usually gives birth to children with low immunity. For a child not to have anemia at birth or in later years, the mother must make sure to take iron pills and supplements during her pregnancy. If iron is taken during this time, the mother usually bears a child with a high hemoglobin level and a low risk of anemia.
In order to enforce these measures and ensure the health of children, some preventive steps must be taken by the government, clinicians and parents. The government needs to spread awareness by organizing campaigns all over the country, especially in the more affected areas. Although the legal age for a woman to get married in India is 18, it is very common in rural areas for a woman to give birth before the age of 18. Hence, there must be special camps in rural areas to spread awareness about family planning, the risks of anemia, etc.
7. CONCLUSION
This study uses data mining techniques to extract professional knowledge in the healthcare sector, specifically for the blood-related disease anemia. Data mining replaces traditional data analysis, which proved quite cumbersome for large data sets and sometimes even for small ones, so that new methods needed to be developed. This study is an improvement over traditional systems in which rules were made using the advice of medical experts, whose advice is no longer needed owing to artificial intelligence tools such as data mining techniques. We have used decision trees and association rule mining to analyze the dataset and infer results, developing a predictive model for the risk of anemia in children below the age of 5 years.
The anemia level of a child depends mainly upon the hemoglobin level (measured in g/dl): high levels indicate a healthy child, while lower levels indicate the presence of anemia. Among the infant feeding practices examined in this study, the duration of breastfeeding and the intake of iron pills are the most important indices for predicting anemia. The mother’s anemia level and her intake of iron pills during pregnancy play an important role in the child’s anemia level.
The two decision trees generated with hemoglobin level only and with infant feeding habits have an accuracy of more than 97%, which is generally not favorable because the value is unrealistically high. The third decision tree, for the mother-child relation, had an accuracy of about 44%, which is too low to be trusted. From these results, we gather that the decision trees generated do not provide accurate and reliable results. This technique therefore did not perform well when applied to this particular dataset to predict anemia in children. However, when the constraint measures of the association rules are considered, every rule has a confidence of at least 0.7 and a high lift value, which is highly desirable. Although the support was low for many rules (when the target variable was Severe anemia), meaningful rules were generated overall and can be used for the analysis. The number of children suffering from severe anemia is small in comparison to the other classes, which explains the low support of the rules formed in this case. The decision support system developed in this paper favored the association rules over the decision trees, even though many models rely heavily on the latter. This also shows that choosing an appropriate technique for a specific task is very important: no single technique can be assumed to be better in general, as this depends upon the type of data and the purpose of the work.
Future studies should consider improving the accuracy of the decision tree methods. Another important method that can be used in further work is the ensemble method, which integrates different models into one for better predictive outcomes. Factors such as hygiene and standard of living should also be considered in future work as a basis for predicting anemia in children as well as women. Models can also be improved using more advanced concepts such as fuzzy logic. Finally, we conclude that this research will help clinicians and the parents of newborns to take the required preventive measures to reduce the risk of anemia, and thus the rate of this widespread disease nationwide. The presence of a disease most often depends largely on the diet consumed and on several other factors (such as the results of other clinical tests), especially for diseases that are generally found in children and affect their health more severely. It will therefore always be useful to find interesting patterns and rules for other diseases using our approach.
REFERENCES
[1] Rodak BF (2007). Hematology: clinical principles and applications (3rd ed.). Philadelphia: Saunders. p. 220. ISBN 978-1-4160-3006-5. Archived from the original on 2016-04-25.
[2] Janz TG, Johnson RL, Rubenstein SD (November 2013). "Anemia in the emergency department: evaluation and treatment". Emergency Medicine Practice. 15 (11): 1–15; quiz 15–6. PMID 24716235. Archived from the original on 2016-10-18.
[3] GBD 2015 Disease and Injury Incidence and Prevalence, Collaborators. (8 October 2016). "Global, regional, and national incidence, prevalence, and years lived with disability for 310 diseases and injuries, 1990-2015: a systematic analysis for the Global Burden of Disease Study 2015". Lancet. 388 (10053): 1545–1602.
[4] GBD 2015 Mortality and Causes of Death, Collaborators. (8 October 2016). "Global, regional, and national life expectancy, all-cause mortality, and cause-specific mortality for 249 causes of death, 1980-2015: a systematic analysis for the Global Burden of Disease Study 2015". Lancet. 388 (10053): 1459–1544. PMID 27733281.
[5] Einollahi, Behzad (5 September 2014). "Restless Leg Syndrome: A Neglected Diagnosis". Nephro-Urology Monthly. doi:10.5812/numonthly.22009. PMC 4318015.
[6] Kansagara D, Dyer E, Englander H, Fu R, Freeman M, Kagen D (December 2013). "Treatment of anemia in patients with heart disease: a systematic review". Annals of Internal Medicine. 159 (11): 746–757. doi:10.7326/0003-4819-159-11-201312030-00007. PMID 24297191.
[7] Aapro MS, Link H (2008). "September 2007 update on EORTC guidelines and anemia management with erythropoiesis-stimulating agents". The Oncologist. 13 Suppl 3 (Supplement 3): 33–6. doi:10.1634/theoncologist.13-S3-33. PMID 18458123.
[8] National Family Health Survey - 4 (2015-16). India Fact Sheet. International Institute for Population Sciences. http://rchiips.org/nfhs/pdf/NFHS4/India.pdf.
[9] WHO/UNICEF/USAID (2003) Anemia prevention and control: what works. WHO.
[10] https://dhsprogram.com/Data/.
[11] Beard J.L., Hendricks M.K., Perez E.M., Murray-Kolb L.E. et al. Maternal Iron Deficiency Anemia Affects Postpartum Emotions and Cognition. American Society for Nutritional Sciences. J. Nutr. 135: 267–272, (2005).
[12] "What Is Iron-Deficiency Anemia? - NHLBI, NIH". www.nhlbi.nih.gov. 26 March 2014. Archived from the original on 16 July 2017. Retrieved 17 July 2017.
[13] Porwit, Anna; McCullough, Jeffrey; Erber, Wendy (2011). Blood and Bone Marrow Pathology. pp. 173–195. ISBN 9780702031472.
[14] Iron Deficiency Anemia. Medically reviewed by Shuvani Sanyal, MD on July 17, 2017; written by Jacquelyn Cafasso and Rachel Nall on October 15, 2015.
[15] Howard, Martin; Hamilton, Peter (2013). Haematology: An Illustrated Colour Text. pp. 24–25. ISBN 978-07020-5139-5.
[16] Goldman, Lee; Schafer, Andrew (2016). Goldman-Cecil Medicine. pp. 1052–1059, 1068–1073, 2159–2164. ISBN 978-1-4557-5017-7.
[17] World Health Organisation (WHO). Iron Deficiency Anemia. Assessment, Prevention and Control. A guide for programme managers. WHO/NHD/01.3; 2001.
[18] Evaluation of Anemia in Children. Copyright © 2010 by the American Academy of Family Physicians. This content is owned by the AAFP. Adapted with permission from Robertson J, Shilkofski N, eds. The Harriet Lane Handbook. 17th ed. Philadelphia, Pa.: Mosby; 2005:337.
[19] Tayal D, Meena K, Gupta V, Fatima A & Vij S. Analysis of Various Data Mining Techniques for Blood Related Disease Anemia along with Fuzzy Logic and Machine Learning. Accepted for presentation in INDIACom - 2018; Computing for Nation Development, technically sponsored by IEEE Delhi Section, scheduled to be held during 14th - 16th March, 2018 at Bharati Vidyapeeth, New Delhi (INDIA).
[20] Mehryar Mohri, Afshin Rostamizadeh, Ameet Talwalkar (2012) Foundations of Machine Learning, The MIT Press. ISBN 9780262018258.
[21] Hastie T., Tibshirani R., Friedman J. (2009) Unsupervised Learning. In: The Elements of Statistical Learning. Springer Series in Statistics. Springer, New York, NY.
[22] Parvin, H., Minaei-Bidgoli, B., Alinejad-Rokny, H., & Punch, W. F. (2013). Data weighing mechanisms for clustering ensembles. Computers and Electrical Engineering, 39(5), 1433-1450. DOI: 10.1016/j.compeleceng.2013.02.004.
[23] Minaei-Bidgoli, B., Parvin, H., Alinejad-Rokny, H. et al. Artif Intell Rev (2014) 41: 27. https://doi.org/10.1007/s10462-011-9295-x
[24] Rokach, L. (2010) Ensemble-based classifiers. Artif Intell Rev 33: 1. https://doi.org/10.1007/s10462-009-91247.
[25] Hamid Parvin, Hamid Alinejad-Rokny, Behrouz Minaei-Bidgoli & Sajad Parvin (2013) A new classifier ensemble methodology based on subspace learning, Journal of Experimental & Theoretical Artificial Intelligence, 25:2, 227-250, DOI: 10.1080/0952813X.2012.715683.
[26] Pravin H, MirnabiBaboli M, Alinejad-Rokny H (2015). Proposing a classifier ensemble framework based on classifier selection and decision tree. Engineering Applications of Artificial Intelligence, 37, 34-42.
[27] Janz TG, Johnson RL, Rubenstein SD (November 2013). "Anemia in the emergency department: evaluation and treatment". Emergency Medicine Practice. 15 (11): 1–15; quiz 15–6. PMID 24716235. Archived from the original on 2016-10-18.
[28] Deng, H.; Runger, G.; Tuv, E. (2011). Bias of importance measures for multi-valued attributes and solutions. Proceedings of the 21st International Conference on Artificial Neural Networks (ICANN).
[29] D. Thangamani, P. Sudha. Identification of Malnutrition with Use of Supervised Datamining Techniques – Decision Trees and Artificial Neural Networks. International Journal of Engineering and Computer Science ISSN: 2319-7242 Volume - 3 Issues - 9 September, (2014) Page No. 8236-8241.
[30] Jahangiri M, Khodadi E, Rahim F, Saki N, Saki Malehi A. Decision-tree-based methods for differential diagnosis of β-thalassemia trait from iron deficiency anemia. Expert Systems (2017); 34:e12201.
[31] Piatetsky-Shapiro, Gregory (1991), Discovery, analysis, and presentation of strong rules, in Piatetsky-Shapiro, Gregory; and Frawley, William J.; eds., Knowledge Discovery in Databases, AAAI/MIT Press, Cambridge, MA.
[32] K.S. Laxmi, Kumar G.S. Association Rule Extraction from Medical Transcripts of Diabetic Patients. 97814799-2259-14/$31.00©2014.
[33] Agrawal, R.; Imieliński, T.; Swami, A. (1993). "Mining association rules between sets of items in large databases". Proceedings of the 1993 ACM SIGMOD international conference on Management of data, SIGMOD '93. p. 207. doi:10.1145/170035.170072. ISBN 0897915925.
[34] Zaki, M. J. (2000). "Scalable algorithms for association mining". IEEE Transactions on Knowledge and Data Engineering. 12 (3): 372–390. doi:10.1109/69.846291.
[35] Han (2000). "Mining Frequent Patterns Without Candidate Generation". Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. SIGMOD '00: 1–12. doi:10.1145/342009.335372.
[36] S. Vijayarani and S. Sudha. An Efficient Clustering Algorithm for Predicting Diseases from Hemogram Blood Test Samples. Indian Journal of Science and Technology, Vol 8(17), DOI: 10.17485/ijst/2015/v8i17/52123, August (2015).
[37] Dey, Raheem. Multilevel multinomial logistic regression model for identifying factors associated with anemia in children 6–59 months in northeastern states of India. Cogent Mathematics (2016), 3: 1159798.
[38] Azarkhish I. et al. Artificial Intelligence Models for Predicting Iron Deficiency Anemia and Iron Serum Level Based on Accessible Laboratory Data. J Med Syst (2012) 36:2057–2061.
[39] Akrimi J.A., Ahmad A.R., George L.E. Review of Machine Learning Techniques in Anemia Recognition. International Journal of Science and Research (IJSR), India, March (2013). Online ISSN: 2319-7064.
[40] J.-Y. Yeh et al. / Decision Support Systems 50 (2011) 439–448.
[41] Maitreya Maity, Prabir Sarkar, Chandan Chakraborty. Computer-Assisted Approach to Anemic Erythrocyte Classification Using Blood Pathological Information (2012) Third International Conference on Emerging Applications of Information Technology (EAIT).
[42] K.S. Laxmi, Kumar G.S. Association Rule Extraction from Medical Transcripts of Diabetic Patients. 97814799-2259-14/$31.00©2014.
[43] Ahmed F, Sitara A. Exploration of Co-Relation between Depression and Anemia in Pregnant Women using Knowledge Discovery and Data Mining Algorithms and Tools, Journal of Selected Areas in Health Informatics (JSHI), September Edition, (2012).
[44] Sant-Rayn P, Black J, Muthayya S, Shet A, et al. Determinants of Anemia Among Young Children in Rural India. American Academy of Pediatrics. Print ISSN 0031-4005, Online ISSN 1098-4275 (2010). doi: 10.1542/peds.2009-3108.
[45] Goswami S, Das KK. Socio-economic and demographic determinants of childhood anemia. J Pediatr (Rio J) (2015); 91:471–77.
[46] Minaei-Bidgoli B., Asadi M., Parvin H. (2011) An Ensemble Based Approach for Feature Selection. In: Iliadis L., Jayne C. (eds) Engineering Applications of Neural Networks. EANN 2011, AIAI 2011. IFIP Advances in Information and Communication Technology, vol 363. Springer, Berlin, Heidelberg.
[47] Parvin, Hamid & Helmi, Hala & Minaei, Behrouz & Alinejad-Rokny, Hamid & Shirgahi, Hossein. (2011). 2.12. Linkage Learning Based on Differences in Local Optimums of Building Blocks with One Optima. International journal of physical sciences. 10.5897/IJPS11.798.
[48] http://stn.spotfire.com/spotfire_client_help/bin/bin_what_is_binning.htm
[49] Ahmad, Amir & Dey, Lipika. (2007). A k-mean clustering algorithm for mixed numeric and categorical data. Data & Knowledge Engineering. 63. 503-527. 10.1016/j.datak.2007.03.016.
[50] Breiman, Leo; Friedman, J. H.; Olshen, R. A.; Stone, C. J. (1984). Classification and regression trees. Monterey, CA: Wadsworth & Brooks/Cole Advanced Books & Software. ISBN 978-0-412-04841-8.
[51] Rokach, L.; Maimon, O. (2005). "Top-down induction of decision trees classifiers-a survey". IEEE Transactions on Systems, Man, and Cybernetics, Part C. 35 (4): 476–487. doi:10.1109/TSMCC.2004.843247.
[52] Parvin, Hamid; Minaei-Bidgoli, Behrouz; Alinejad-Rokny, Hamid (2013). A new imbalanced learning and dictions tree method for breast cancer diagnosis. Journal of Bionanoscience, 7(6), 673-678.
EP
[27]
[47]
A
[48] [49] [50] [51] [52]
SC RI PT
AUTHOR’S BIO
First Author [Kanak Meena] graduated in Computer Science and Engineering (Bachelor), followed by a Master of Engineering degree in Information Management and Security. She is pursuing a Ph.D. at the Indira Gandhi Delhi Technical University for Women, Delhi, India. Her research interests include Data Mining, Artificial Intelligence, Predictive Analysis, and Big Data and its Management. Her published research ranges from data mining techniques in applied fields such as healthcare to various areas of big data. She has 15 research papers published in reputed journals and conferences.
Second Author [D.K Tayal] is a professor at the Indira Gandhi Delhi Technical University for Women. He completed his Ph.D. at JNU, Delhi, India. He has more than 17 years of teaching experience, focused on databases and data management. In his research, he explores database applications for innovative information systems in the area of big data management. He has more than 50 research papers published in reputed international journals and conferences.
Third Author [Vaidehi Gupta] is a second-year undergraduate currently pursuing a Bachelor of Technology (B.Tech.) degree in Computer Science and Engineering at Indira Gandhi Delhi Technical University for Women, Delhi, India. Her first research work focused on “Analysis of Various Data Mining Techniques for Blood Related Disease Anaemia along with Fuzzy Logic and Machine Learning”. It has been accepted by Computing for Nation Development, to be published by IEEE Xplore in March 2018. Her primary research interests are in the areas of Machine Learning, Data Mining and Fuzzy Logic.
Fourth Author [Aiman Fatima] is a second-year student at Indira Gandhi Delhi Technical University for Women (IGDTUW) pursuing Computer Science and Engineering. Her first research work focused on “Analysis of Various Data Mining Techniques for Blood Related Disease Anaemia along with Fuzzy Logic and Machine Learning”. It has been accepted by INDIACom - 2018; Computing for Nation Development, to be published by IEEE Xplore in March 2018. Her primary research interests are in the areas of Data Science with Data Mining and Machine Learning, Predictive Analytics and Fuzzy Logic.
Age-specific normative red blood cell values

Age | Hemoglobin (g per dL), mean | Hemoglobin (g per dL), 2 SDs below mean
1 to 3 days | 18.5 | 14.5
2 weeks | 16.6 | 13.4
1 month | 13.9 | 10.7
2 months | 11.2 | 9.4
6 months | 12.6 | 11.1
6 months to 2 years | 12.0 | 10.5
2 years to 6 years | 12.5 | 11.5

Table 1. Evaluation of Anemia in Children [18]
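The lower bound in Table 1 can be read as a per-age lookup: a hemoglobin value below the "2 SDs below mean" entry for the child's age row falls outside the normative range. The following is a minimal Python sketch of that lookup, transcribing the table values. The names `NORMATIVE_HB` and `is_below_normative_range` are hypothetical, and keying the lookup on the row label (rather than on an exact age) is an assumption made for illustration.

```python
# Table 1 transcribed as: age row label -> (mean Hb, 2 SDs below mean), in g/dL.
# Keying on the row label is an illustrative assumption; the table does not
# say how ages between rows should be assigned.
NORMATIVE_HB = {
    "1 to 3 days": (18.5, 14.5),
    "2 weeks": (16.6, 13.4),
    "1 month": (13.9, 10.7),
    "2 months": (11.2, 9.4),
    "6 months": (12.6, 11.1),
    "6 months to 2 years": (12.0, 10.5),
    "2 years to 6 years": (12.5, 11.5),
}


def is_below_normative_range(age_row: str, hemoglobin_g_dl: float) -> bool:
    """Return True if the value is under the '2 SDs below mean' bound."""
    _mean, lower_bound = NORMATIVE_HB[age_row]
    return hemoglobin_g_dl < lower_bound


if __name__ == "__main__":
    print(is_below_normative_range("2 months", 9.0))
    print(is_below_normative_range("6 months to 2 years", 10.5))
```

This is only a reading aid for the table, not a diagnostic rule.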
Group | No. of Features | Selected Features | Type | Categories/Range | Note
Feeding Practices | 9 | Breastfeed_duration_months | Numeric | 0-60 months | Observations with > 60 months were considered outliers and hence removed
 | | First_time_breastfed | Categorical | Immediately, within one day, within one week, within one month, missing |
 | | Gave_zinc | Categorical | Yes, No, Don't know |
 | | Gave_ironpills | Categorical | Yes, No, Don't know |
 | | Tinned_powered_or_freshmilk_given | Categorical | Yes, No, Don't know |
 | | Gave_baby_formula | Categorical | Yes, No, Don't know |
 | | Gave_fortified_baby_food | Categorical | Yes, No, Don't know |
 | | Gave_solid_semis_food_yesterday | Categorical | Yes, No, Don't know |
 | | Frequency_solidfood | Categorical | None, 1 time, 2 times, 3 times, 4 times, 5 times, 6 times, 7+ times, Don't know |
Mother-child relation | 4 | iron_during_pregnancy (mother took iron nutrition during pregnancy) | Categorical | Yes, No, Don't know |
 | | under18 (if mother is underage or not) | Categorical | Yes, No, Don't know |
 | | hemoglobin_level_mother | Numerical | 2.0 g/l to 22.6 g/l |
 | | Anemia_level_mother | Categorical | Severe, Moderate, Mild, Not Anemic |

Table 2 Details of Features selected
S. No. | Dataset | Total Observations | Training | Testing
1 | Before Preprocessing | 259627 | NIL | NIL
2 | After Preprocessing | 9265 | 4848 | 4417

Table 3. Description of Dataset Used for Decision Tree
S. No. | Rule | => | Predictive Class (Anemic Level of Child) | Count
1 | Hemoglobin_level_of_child>=9.95; Hemoglobin_level_of_child>=10.95; Hemoglobin_level_of_child<11.35; First_time_breastfed=Immediately, Within one day; Tinned_powered_or_freshmilk_given=No | => | Not Anemic | 206
2 | Hemoglobin_level_of_child>=9.95; Hemoglobin_level_of_child>=10.95; Hemoglobin_level_of_child<11.35; First_time_breastfed=Immediately, Within one day; Tinned_powered_or_freshmilk_given=No | => | Not Anemic | 72
3 | Hemoglobin_level_of_child>=9.95; Hemoglobin_level_of_child>=10.95; Hemoglobin_level_of_child<11.35; First_time_breastfed=Within one week, Within one month | => | Not Anemic | 72
4 | Hemoglobin_level_of_child>=9.95; Hemoglobin_level_of_child>=10.95; Hemoglobin_level_of_child<11.35; First_time_breastfed=Immediately, Within one day; Tinned_powered_or_freshmilk_given=Yes; Gave_ironpills=Yes | => | Not Anemic | 41
5 | Hemoglobin_level_of_child>=9.95; Hemoglobin_level_of_child<10.95; Hemoglobin_level_of_child<10.25; First_time_breastfed=Within one day; Breastfeed_duration_months<13.5; Age_of_child>=0.5 | => | Mild | 97
6 | Hemoglobin_level_of_child>=9.95; Hemoglobin_level_of_child>=10.95; Hemoglobin_level_of_child<11.35; First_time_breastfed=Immediately, Within one day; Tinned_powered_or_freshmilk_given=Yes; Gave_ironpills=No; Gave_solid_semis_food_yesterday=No; First_time_breastfed=Immediately | => | Mild | 7
7 | Hemoglobin_level_of_child>=9.95; Hemoglobin_level_of_child<10.95; Hemoglobin_level_of_child<10.25; First_time_breastfed=Within one day; Breastfeed_duration_months<13.5; Age_of_child<0.5; Gave_solid_semis_food_yesterday=Yes; Frequency_solidfood<2.5 | => | Moderate | 6

ACCURACY: 97.65%

Table 4 Rules from DT of Infant Feeding Practices
S. No. | Rule | => | Predictive Class (Anemic Level of Child) | Count
1 | AnemiaLevel_of_Mother=Not Anemic | => | Not Anemic | 45732
2 | AnemiaLevel_of_Mother=Severe, Moderate, Mild; HemoglobinLevel_of_Mother>=10.25; AnemiaLevel_of_Mother=Mild; HemoglobinLevel_of_Mother<11.45; Took_IronPills_during_pregnancy=Yes | => | Not Anemic | 18175
3 | AnemiaLevel_of_Mother=Severe, Moderate, Mild; HemoglobinLevel_of_Mother<10.25 | => | Moderate | 18437
4 | AnemiaLevel_of_Mother=Severe, Moderate, Mild; HemoglobinLevel_of_Mother>=10.25; AnemiaLevel_of_Mother=Mild; HemoglobinLevel_of_Mother>=11.45; HemoglobinLevel_of_Mother<12.35 | => | Not Anemic | 12283
5 | AnemiaLevel_of_Mother=Severe, Moderate, Mild; HemoglobinLevel_of_Mother>=10.25; AnemiaLevel_of_Mother=Mild; HemoglobinLevel_of_Mother<11.45; Took_IronPills_during_pregnancy=No | => | Moderate | 5406
6 | AnemiaLevel_of_Mother=Severe, Moderate, Mild; HemoglobinLevel_of_Mother>=10.25; AnemiaLevel_of_Mother=Mild; HemoglobinLevel_of_Mother>=11.45; HemoglobinLevel_of_Mother>=12.35 | => | Moderate | 338
7 | AnemiaLevel_of_Mother=Severe, Moderate, Mild; HemoglobinLevel_of_Mother>=10.25 | => | Moderate | 296

ACCURACY: 44.21465%

Table 5 Rules from DT of Mother-Child relation
S. No. | LHS | => | RHS | Support | Confidence | Lift | Count
1 | hemoglobin_level_child=From 14.0g/dl to 18.0g/dl, breastfeed_duration=More than 3 years | => | anemialevel_child=Not Anemic | 0.00262684 | 1.0000000 | 2.917551 | 682
2 | hemoglobin_level_child=From 14.0g/dl to 18.0g/dl, First_time_breastfed=Immediately | => | anemialevel_child=Not Anemic | 0.00302356 | 0.9987277 | 2.913839 | 785
3 | hemoglobin_level_child=Less than 9.0g/dl, First_time_breastfed=Within one day, Gave_ironpills=No, breastfeed_duration=More than 3 years | => | anemialevel_child=Moderate | 0.01863442 | 0.9007634 | 3.890122 | 4838
4 | hemoglobin_level_child=Less than 9.0g/dl, First_time_breastfed=Immediately, Gave_ironpills=No, breastfeed_duration=More than 3 years | => | anemialevel_child=Moderate | 0.01543753 | 0.8998653 | 3.886244 | 4008
5 | hemoglobin_level_child=Less than 9.0g/dl, Gave_ironpills=No, breastfeed_duration=More than 3 years | => | anemialevel_child=Moderate | 0.04323895 | 0.8967170 | 3.872647 | 11226
6 | hemoglobin_level_child=Less than 9.0g/dl, breastfeed_duration=More than 3 years | => | anemialevel_child=Moderate | 0.05674294 | 0.8958891 | 3.869072 | 14732
7 | hemoglobin_level_child=Less than 9.0g/dl, Gave_ironpills=No | => | anemialevel_child=Moderate | 0.07575868 | 0.8861107 | 3.826842 | 19669
8 | hemoglobin_level_child=Less than 9.0g/dl, Gave_zinc=No, Gave_ironpills=No, breastfeed_duration=More than 3 years | => | anemialevel_child=Moderate | 0.00507262 | 0.8762475 | 3.784246 | 1317
9 | hemoglobin_level_child=Less than 9.0g/dl, breastfeed_duration=From 6 months up to 1 year | => | anemialevel_child=Moderate | 0.00556952 | 0.8731884 | 3.771035 | 1446
10 | hemoglobin_level_child=Less than 9.0g/dl, First_time_breastfed=Within one week, Gave_ironpills=No | => | anemialevel_child=Moderate | 0.01180154 | 0.8719408 | 3.765647 | 3064
11 | hemoglobin_level_child=Less than 9.0g/dl, tinned_powered_or_freshmilk_given=NO, Gave_fortified_baby_food=No | => | anemialevel_child=Moderate | 0.03612105 | 0.8969871 | 3.873814 | 9378
12 | hemoglobin_level_child=Less than 9.0g/dl, tinned_powered_or_freshmilk_given=NO, Gave_baby_formula=NO, Gave_fortified_baby_food=No | => | anemialevel_child=Moderate | 0.03493473 | 0.8972203 | 3.874821 | 9070
13 | hemoglobin_level_child=Less than 9.0g/dl, Gave_ironpills=No, Gave_baby_formula=No | => | anemialevel_child=Moderate | 0.04686724 | 0.8882400 | 3.836038 | 12168
14 | hemoglobin_level_child=Less than 9.0g/dl, Gave_ironpills=No, Gave_fortified_baby_food=No, Gave_solid_semis_soft_food_yesterday=Yes | => | anemialevel_child=Moderate | 0.02394589 | 0.8885237 | 3.837263 | 6217
15 | hemoglobin_level_child=Less than 9.0g/dl, Gave_baby_formula=NO | => | anemialevel_child=Moderate | 0.06017094 | 0.8872104 | 3.831591 | 15622
16 | hemoglobin_level_child=Less than 9.0g/dl, Gave_ironpills=No, Gave_baby_formula=NO, Gave_fortified_baby_food=No, Gave_solid_semis_soft_food_yesterday=Yes | => | anemialevel_child=Moderate | 0.02257855 | 0.8884510 | 3.836949 | 5862
17 | hemoglobin_level_child=From 14.0g/dl to 18.0g/dl | => | anemialevel_child=Not Anemic | 0.00608180 | 0.9987350 | 2.913860 | 1579
18 | hemoglobin_level_child=From 14.0g/dl to 18.0g/dl, First_time_breastfed=Within one day | => | anemialevel_child=Not Anemic | 0.00209146 | 0.9981618 | 2.912188 | 543
19 | hemoglobin_level_child=Less than 9.0g/dl, Gave_baby_formula=NO, Gave_solid_semis_soft_food_yesterday=Yes | => | anemialevel_child=Moderate | 0.03251588 | 0.8904124 | 3.845420 | 8442
20 | hemoglobin_level_child=From 9.0g/dl to 14.0g/dl, First_time_breastfed=Within one week, Gave_zinc=No, Gave_ironpills=Yes, tinned_powered_or_freshmilk_given=NO, Gave_baby_formula=NO, Gave_fortified_baby_food=No, Gave_solid_semis_soft_food_yesterday=No | => | anemialevel_child=Mild | 3.851680e-05 | 0.7142857 | 3.253243 | 10
21 | hemoglobin_level_child=From 9.0g/dl to 14.0g/dl, breastfeed_duration=Less than 6 months, First_time_breastfed=Within one week, Gave_ironpills=Yes, tinned_powered_or_freshmilk_given=Yes, Gave_solid_semis_soft_food_yesterday=No | => | anemialevel_child=Mild | 3.081344e-05 | 0.7272727 | 3.312393 | 8
22 | hemoglobin_level_child=Less than 9.0g/dl, breastfeed_duration=From 6 months up to 1 year, First_time_breastfed=Within one week, tinned_powered_or_freshmilk_given=Yes, Gave_baby_formula=NO, Gave_fortified_baby_food=No, Gave_solid_semis_soft_food_yesterday=No | => | anemialevel_child=Severe | 1.925840e-05 | 0.7142857 | 54.76901 | 5

Table 6 Rules from Association Technique for Feeding Practices of Infants
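The Support, Confidence, Lift and Count columns of Table 6 are linked by the standard association-rule definitions: support = count / N, confidence = count / count(LHS), and lift = confidence / P(RHS). The Python sketch below works through two rows of the table. It assumes that N is the 259627 observations listed in Table 3 (an assumption, although the reported supports are consistent with it); the `Rule` container and the helper names are hypothetical and are not taken from the authors' tooling.

```python
from dataclasses import dataclass

# Assumption: supports are computed over the 259627 observations in Table 3.
N_OBSERVATIONS = 259627


@dataclass(frozen=True)
class Rule:
    """One row of an association-rule table (hypothetical container)."""
    rhs: str
    support: float
    confidence: float
    lift: float
    count: int


def support_from_count(count: int, n: int = N_OBSERVATIONS) -> float:
    """support = (records matching LHS and RHS) / (all records)."""
    return count / n


def lhs_count(rule: Rule) -> float:
    """confidence = count / count(LHS), so count(LHS) = count / confidence."""
    return rule.count / rule.confidence


def rhs_share(rule: Rule) -> float:
    """lift = confidence / P(RHS), so P(RHS) = confidence / lift."""
    return rule.confidence / rule.lift


# Two rows transcribed from Table 6 (rules 19 and 22).
rule_19 = Rule("Moderate", 0.03251588, 0.8904124, 3.845420, 8442)
rule_22 = Rule("Severe", 1.925840e-05, 0.7142857, 54.76901, 5)

if __name__ == "__main__":
    for name, rule in (("19", rule_19), ("22", rule_22)):
        print(name, support_from_count(rule.count), lhs_count(rule), rhs_share(rule))
```

Rule 22 has by far the highest lift in the table, but it rests on 5 records out of roughly 7 that match its LHS, so the Count column is worth reading alongside Lift.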
S. No. | LHS | => | RHS | Support | Confidence | Lift | Count
1 | anemia_level_mother=Not Anemic, hemoglobin_level_child=From 14.0g/dl to 18.0g/dl | => | anemialevel_child=Not Anemic | 0.004217589 | 1.0000000 | 2.917551 | 1095
2 | hemoglobin_level_mother=From 9.0g/dl to 14.0g/dl, anemia_level_mother=Not Anemic, hemoglobin_level_child=From 14.0g/dl to 18.0g/dl | => | anemialevel_child=Not Anemic | 0.003035123 | 1.0000000 | 2.917551 | 788
3 | under18=Above 18, hemoglobin_level_mother=From 9.0g/dl to 14.0g/dl, anemia_level_mother=Not Anemic, hemoglobin_level_child=From 14.0g/dl to 18.0g/dl | => | anemialevel_child=Not Anemic | 0.003027420 | 1.0000000 | 2.917551 | 786
4 | Took_iron_during_pregnancy=Yes, hemoglobin_level_child=From 14.0g/dl to 18.0g/dl | => | anemialevel_child=Not Anemic | 0.003293186 | 0.9988318 | 2.914142 | 855
5 | under18=Above 18, Took_iron_during_pregnancy=Yes, hemoglobin_level_child=From 14.0g/dl to 18.0g/dl | => | anemialevel_child=Not Anemic | 0.003273928 | 0.9988249 | 2.914122 | 850
6 | hemoglobin_level_mother=From 9.0g/dl to 14.0g/dl, anemia_level_mother=Mild, Took_iron_during_pregnancy=Yes, hemoglobin_level_child=Less than 9.0g/dl | => | anemialevel_child=Moderate | 0.024696969 | 0.9041173 | 3.904607 | 6412
7 | under18=Above 18, anemia_level_mother=Not Anemic, Took_iron_during_pregnancy=No, hemoglobin_level_child=Less than 9.0g/dl | => | anemialevel_child=Moderate | 0.006351420 | 0.8937669 | 3.859907 | 1649
8 | under18=Above 18, hemoglobin_level_mother=From 9.0g/dl to 14.0g/dl, anemia_level_mother=Not Anemic, hemoglobin_level_child=Less than 9.0g/dl | => | anemialevel_child=Moderate | 0.028228959 | 0.8936715 | 3.859495 | 7329
9 | under18=Above 18, anemia_level_mother=Mild, Took_iron_during_pregnancy=No, hemoglobin_level_child=Less than 9.0g/dl | => | anemialevel_child=Moderate | 0.008593097 | 0.8884906 | 3.837120 | 2231
10 | Took_iron_during_pregnancy=No, hemoglobin_level_child=Less than 9.0g/dl | => | anemialevel_child=Moderate | 0.020175097 | 0.8822638 | 3.810228 | 5238

Table 7 Rules from Association Technique for Mother-child relation.