Improvised prophecy using regularization method of machine learning algorithms on medical data

Improvised prophecy using regularization method of machine learning algorithms on medical data

Personalized Medicine Universe xxx (2015) 1e9 Contents lists available at ScienceDirect Personalized Medicine Universe journal homepage: www.elsevie...

2MB Sizes 0 Downloads 12 Views

Personalized Medicine Universe xxx (2015) 1e9

Contents lists available at ScienceDirect

Personalized Medicine Universe journal homepage: www.elsevier.com/locate/pmu

Original article

Improvised prophecy using regularization method of machine learning algorithms on medical data Vadamodula Prasad a, *, T. Srinivasa Rao Dr b, P.V.G.D. Prasad Reddy Prof c a

Dept of Computer Science & Engg, Raghu Institute of Technology, AP, India Dept of Computer Science & Engg, Gitam Institute of Technology, Gitam University, AP, India c Dept of Computer Science & Systems Engg, AUCOE (A), Andhra University, AP, India b

a r t i c l e i n f o

a b s t r a c t

Article history: Received 17 July 2015 Received in revised form 29 August 2015 Accepted 11 September 2015

Patients with thyroid disease (TD) boast continuously increasing because of excessive growth of thyroid gland and its hormones. Automatic classification tools may reduce the burden on doctors. This paper evaluates the selected algorithms for predicting thyroid disease diagnoses (TDD). The algorithms considered here are regularization methods (RM) of machine learning algorithms (MLA). The analysis report generated by the proposed work suggests the best algorithm for predicting the exact levels of TDD. This work is a comparative study of MLA on UCI thyroid datasets (UCITD). The developed system deals with RM i.e., ridge regression algorithm (RRA) & least absolute shrinkage and selection operator algorithm (LASSO). The above algorithms personage produce at most 79% accuracy by RRA and 98.99% accuracy by LASSO. Thus, this paper shows the importance of LASSO, along with an example for parameter generation. The decisive factors (DF) also suggest the accuracy rate of LASSO is much better when compared with RRA. Copyright © 2015, International Society of Personalized Medicine. Published by Elsevier B.V. All rights reserved.

Keywords: Thyroid Ridge regression Machine learning Regularization method

1. Introduction The objective of the proposed work is used for identification of the 7 categories of TD [17] i.e., hyperthyroid, hypothyroid, binding protein, replacement therapy, anti-thyroid therapy, non-thyroid illness and miscellaneous features by considering 28 major pristine attributes. The work is done by capturing the attention of MLA i.e., RM [14] and SMS [13]. The system developed is used for offline analysis [3] on TDD. Originally the system is developed by using a SMS, when the SMS fails to identify the relevant data in the

Abbreviations: TD, thyroid disease; TDD, thyroid disease diagnose; MLA, machine learning algorithms; RM, regularization methods; UCITD, UCI thyroid datasets; RRA, ridge regression algorithm; LASSO, least absolute shrinkage and selection operator algorithm; DF, decisive factors; SMS, string matching system; mT, meticulousness; eX, exactitude; cP, compassion; rI, rigor; EAS, expert advisory system; NTP, number of true positives; NFP, number of false positives; NTN, number of true negatives; NFN, number of false negatives; TP, true positives; FP, false positives; TN, true negatives; FN, false negatives; RS, rule-set. * Corresponding author. Department of Computer Science & Engineering, Raghu Institute of Technology, Dakamarri, Visakhapatnam 531162, Andhra Pradesh, India. Tel.: þ91 9440024661, þ91 9177559425. E-mail address: [email protected] (V. Prasad).

knowledge, then automatically the system invokes to get the optimistic disease [1,2] by using two individual methods of RM which belongs to MLA namely (i) RRA [20] (ii) LASSO [5]. The method of RRA on TDD produced a result of maximum accuracy 79%, where as the LASSO on TDD produced 98.99%, but as per the comparisons made by DF [21], it is understood that LASSO [7] performance is best in prophecy, than RRA. Hence, the system is further enhanced with DF values namely meticulousness (mT), exactitude (eX), compassion (cP) and rigor (rI) in order to provide the disease name with its prevention and curing methods. Here, we obtain the diagnostic methods for different symptoms entered by the user dynamically. If the data entered by the user is sufficient and if it matches the knowledge, then the proposed system displays the actual disease with which the human is suffering, or else it displays the dialogue box stating that the knowledge is insufficient. For the purpose of calculating missing attribute values [10], RM is introduced, so that the nearby related diseases of TD can be determined based on the raised values [19], if the exact disease is unavailable in the knowledge of the expert, then the proposed system identically shows the probabilistic disease with which the human is suffering from. The LASSO method provides the actual disease by considering the values generated by the function

http://dx.doi.org/10.1016/j.pmu.2015.09.001 2186-4950/Copyright © 2015, International Society of Personalized Medicine. Published by Elsevier B.V. All rights reserved.

Please cite this article in press as: Prasad V, et al., Improvised prophecy using regularization method of machine learning algorithms on medical data, Personalized Medicine Universe (2015), http://dx.doi.org/10.1016/j.pmu.2015.09.001

2

V. Prasad et al. / Personalized Medicine Universe xxx (2015) 1e9

LASSO_K, if the raised values satisfies the RM then LASSO is responsible for providing the optimized disease which could be predicted. 2. Material & methods 2.1. Role of TDD, MLA & health professionals 2.1.1. Thyroid medical diagnosis (TDD) Thyroid medical diagnosis (TDD) is known to be subjective and depends not only on the available data but also on the experience of the physician, intuition, biases and even on the psychophysiological condition of the physician. Automatically derived diagnostic knowledge may assist physicians to make the diagnostic process more objective and more reliable. Database is constructed using 390 patient's data consisting 2096 rules obtained from the Intelligent system Laboratory of K.N.Toosi University of Technology from Imam Khomeini hospital [12,13,17]. 2.1.2. Machine learning algorithms (MLA) Machine learning algorithms (MLA) [1,15] has recompensed erudition, high parallelism and pace against noise. The readiness of reuse or reprocess feature in MLA made us to choose it as an option to develop this proposed work. Thus, we proposed a method of invoking MLA into this work of TDD. Nowadays, by developing machinery and information in therapeutic sciences, the computer science professionals are capable of providing expert advisory system EAS [2,9,12] to diagnose different kinds of diseases with high accuracy. 2.1.3. Health professionals Health professionals are made to use these systems, due to some developed errors during general verdict process. Ailment diagnosis operations using EAS are performed based on set of disease symptoms. These systems are based on artificial intelligence [18] which helps the physician to minimize the costs and time in valuable diagnoses. Our previous proposed system [1] had succeeded in tracing out the missing attribute values for the identification of original TDD; the system solves all kind of problems raised by thyroid gland and the hormone produced by it. 2.2. Regularization method

Pseudo code of the RRA developed: Step 1: Input vector, X ¼ (S1, S2 … Sp), where ‘p’ value ranges from 1 to 28 considering the pristine attribute values, which is the count of symptoms. Let {Sj j j ¼ 1, 2, … ., j} be a set of J samples. The outcome of each sample SJ is denoted by the trait tJ. Step 2: Output Y is real-valued. For the problem defined above, the Idimensional binary vector C ¼ {b1, b2….., bI} where each bit indicates (‘1’ and ‘0’) based on the availability in the algorithm Step 3: Predict Y from X by f (X) so that the expected loss function E (L(Y, f (X))) is minimized. To define the objective function, we use ridge regression to feature subset selection. Through regression coefficients by forcing a penalty on the size of subset, ridge regression is able to smoothly approach the solution with less variance compared with forward and backward stepwise selection methods. Give the trait t and C leading to a subset of selected attributes. X ¼ {xk1, xk2,xk3, … …xkn},we fit the following pair wise interaction model as Eq. (1) From Eq. (1), we can obtain selected attributes as displayed in Eq (2), i.e.

tðCÞ ¼ b0 þ

n X

bi xki

i¼1

n1 X X u¼1 v > u

buv xku xkv

Eq. (2)

Step 4: Square loss: L(Y, f (X)) ¼ (Y ¡ f(X))2. The optimal predictor f*(X) ¼ argminf (X)E(Y ∧¡ f(X))2. Then the ridge regression is to compute the coefficient set b ¼ fb0 ; b1 ; b2 ; ::::::::bn g by minimizing the penalized residual sum of squares obtained from trait t from Eq. (2). Therefore,

8 !2 J u i¼1 j¼1 !9 n n 1 X = X X þl b2i þ b2uv ; v>u ∧

u¼1

i¼1

Eq. (3) RM [15] in fields of MLA and problems refers to a process of introducing additional information in order to solve an ill-posed problem in TDD. In this method, we provided some restrictions on data for smoothness or bounds in the data base. The novelty which you can observe in this proposed RM algorithms are that, these are used to calculate the missing values, because it is already informed that the data of this thyroid is inconsistent and vague. As per the representation, the pseudo code which is implemented can show you the exact working style of RRA in RM. The continuous values which are obtained are used for prophecy of TDD. The method of obtaining the continuous numerical values is mentioned in the pseudo codes of Sections 2.2.1 & 2.2.2.

Where complexity parameter is denoted as ðlÞ in Eq. (3) Then, the objective function is defined as a multiple R2 value, which is a decreasing function of the residual sum of squares from obtained Eq (3)

2.2.1. Ridge regression algorithm RRA used for analyzing multiple regression data which is having multi co-linearity. When multi co-linearity [6,8,11] occurs, least squares estimates are unbiased (restrictions are avoided), but their variances are large so they are far from the true value. By adding a degree of bias (restriction) to the regression estimates, RR reduces the standard errors. It is hoped that the net effect will give estimates that are more reliable and useful for TDD.

D¼l

PJ j¼1

∧ tj  b0 

Pn1 j i¼1 bxki  u¼1

Pn

R2 ðcÞ ¼ 1 



2 PJ  t t j¼1 j

P v> u

∧ buv

!2 xjku xjkv

þD

Eq. (4) Where f*(X) ¼ E(Y jX) and “D” values in Eq (4) is defined as n X i¼1

b2i þ

n1 X

X

u¼1 v > u

! b2uv

Eq. (5)

PJ PJ t ¼ 1=J j¼1 tj is the mean outcome and ðt  tÞ2 is the total j¼1 j outcome variation. The complexity parameter l  0 controls the eradication in accuracy levels. In the case of ordinary least squares regression ðl ¼ 0Þ, R2 lies between 0 and 1; R2 ¼ 1 indicates perfect model fit.

Please cite this article in press as: Prasad V, et al., Improvised prophecy using regularization method of machine learning algorithms on medical data, Personalized Medicine Universe (2015), http://dx.doi.org/10.1016/j.pmu.2015.09.001

V. Prasad et al. / Personalized Medicine Universe xxx (2015) 1e9

Step 5: The function E(Y jX) is the regression function. The expected loss function E (L(Y, f (X))), is the additional data which acts as the restriction to predict the actual TDD. Choice of the ridge complexity parameter l is a variant of model selection used to choose a value. In general, the value chosen should be the one associated with the largest R2 value. 2.2.2. Least absolute shrinkage and selection operator An alternative regularized version of least squares is LASSO [16]. The prime differences between LASSO and RRA is that, in RRA as the restrictions increases, all parameters are reduced setting DF values as non-zero, while in LASSO increasing the restrictions will cause more and more of the parameters to be driven zero. This is an advantage of LASSO over RRA, as driving parameters to zero deselects the features from the regression. Thus, LASSO automatically selects more relevant features and discards the others whereas RRA never fully discards any features. Pseudo code of the LASSO implemented: Step 1: Consider an example, the extension of “Y” (OUTPUT) is proportional to the input given “X” (INPUT), i.e., (S1, S2……S12) and. Y ¼ f (X, k) ¼ kX,

Yi ¼ k Xi þ Vi

Step 4: There are several methods we used to estimate the unknown parameter k. Noting, that the n equations in the m variables in our data comprise an over determined system with one unknown and n equations, we had choose to estimate k using least squares. The sum of squares to be minimized as



Eq. (7)

n X

ðYi  kXi Þ2

Eq. (8)

i¼1

This is obtained from Eq. (7) Step 5: The least squares estimate of the output constant, k, is given by

Pn Xi Yi k ¼ Pi¼1 n 2 i¼1 Xi

Eq. (9)

The above generated least square estimate is the combination obtained from Eqs. (7) and (8). Step 6: The output constant k is the required identical value produced by LASSO, thus k value by Eq. (9), Table 1. Example code for obtaining LASSO function using TD data: [20]

Eq. (6)

constitutes the model, where X is the independent variable. Step 2: To estimate the output constant k, a series of n measurements with different outputs will produce a set of data, (Xi, Yi), i ¼ 1, 2 … n, where Yi is a measured extension. Step 3: Each experimental observation has an error. We denote this error with “V”, and specified an empirical model for our observations,

3

Table 1 This contains the entered data through the questionnaire form, when the TSH, T3, TT4, T4U, FTI, TBG values are enabled “1”, then we can enter the related values of data obtained through the test dynamically and from this, we can calculate the k values as shown above. TSH TSH-value T3 T3-value TT4 TT4-value T4U T4U-value FTI FTI-value 1 1 1 1 1

1.5176 0.4775 0.0678 0.1762 0.0500

1 1 1 1 1

1.4622 0.4234 0.0113 0.2292 0.1500

1 1 1 1 1

1.4183 0.3806 0.0334 0.2713 0.2500

1 1 1 1 1

1.3827 0.3459 0.0696 0.3054 0.3500

1 1 1 1 1

1.3532 0.3172 0.0997 0.3336 0.4500

Please cite this article in press as: Prasad V, et al., Improvised prophecy using regularization method of machine learning algorithms on medical data, Personalized Medicine Universe (2015), http://dx.doi.org/10.1016/j.pmu.2015.09.001

4

V. Prasad et al. / Personalized Medicine Universe xxx (2015) 1e9

Therefore, the “k” value obtained here could be a value nearer to Boltzman Constant, for obtaining optimistic value nearer, upper and lower approximations of data could be considered by having Boltzman Constant into the data for securing the data [4e7]. 2.3. Decisive factors These factors help in identifying the best algorithm in RM for classification [14,15] and also give the accuracy and correctness for prophecy towards TDD. The term cumulative gives the total count of existing elements in the entered input string which are successful smacks/loss to the database, whereas term individual gives the occurrence, as of read at particular iteration. mT: Meticulousness of a classifier is the percentage of the test set tuples that are correctly predicted.

The procedure discussed above is calculated for single tuple knowledge, but in the beginning it is informed that the UCITD are using additional tuples of data respectively, hence the matching and neighbouring elements trace would give more apex values for the DF. 2.4. System design In the development of this offline web portal we had implemented the system with Fig. 1, Table 3. 2.4.1. Event flow diagram

2.4.2. Functional requirements 2.4.2.1. Inputs

mT ¼ ðNTP þ NTNÞ = ðNTP þ NFP þ NFN þ NTNÞ

✓ Thyroid disease

eX: Exactitude is also referred as true positive rate i.e. the proportion of positive tuples that are correctly identified.

✓ Thyroid disease parameters

eX¼ NTP= ðNTP þ NFNÞ

✓ Curing methodologies

cP : Compassion is defined as the proportion of the true positives against all the positive results.

✓ Preventions

cP¼ NTP=ðNTP þ NFPÞ

2.4.2.2. Outputs

rI: Rigour is the true negative rate i.e. the proportion of negative tuples that are correctly identified.

✓ Information Disease

rI ¼ NTN = ðNTN þ NFPÞ

✓ Description about the disease

2.3.1. Procedure for calculating DF Example: Consider a single tuple database of 12 symptoms or attribute values for calculating DF Table 2. Therefore, the obtained DF for the above considered data is.

✓ Diagnosis 2.4.3. Non-functional requirements ✓ Windows platform that is equipped with Java Server Pages and Net Beans. ✓ Technology is JAVA & MYSQL.

2.4.4. Database

Please cite this article in press as: Prasad V, et al., Improvised prophecy using regularization method of machine learning algorithms on medical data, Personalized Medicine Universe (2015), http://dx.doi.org/10.1016/j.pmu.2015.09.001

V. Prasad et al. / Personalized Medicine Universe xxx (2015) 1e9

Please cite this article in press as: Prasad V, et al., Improvised prophecy using regularization method of machine learning algorithms on medical data, Personalized Medicine Universe (2015), http://dx.doi.org/10.1016/j.pmu.2015.09.001

5

6

V. Prasad et al. / Personalized Medicine Universe xxx (2015) 1e9

Table 2 This is a sample table which is picked up from the original database of UCITD, to display the diagnose method along with its prevention and referral source as a suggestion. The entered data is the choice given by the user, if the entered data directly smacks the knowledge, then the SMS system produces 100% output, otherwise the system modifies itself by sending the same data to the classifiers to check, whether the smacks of the data is successful or not, the values which are required by the DF are updated and maintained as shown below. “*” indicates the TP and “~” indicates the TN. The remaining 0's in the Rule Sets are FN and 1's are FP. Original database available: Symptoms

S1

S2

S3

S4

S5

S6

S7

S8

S9

S10

S11

S12

Diagnose

Prevention

Rule Set-1

0

1

1

1

0

1

0

0

0

0

1

0

Thyroid is Negative

SVHC (or) Normal condition.

Entered input string: Symptoms

S1

S2

S3

S4

S5

S6

S7

S8

S9

S10

S11

S12

Diagnose

Prevention

Rule Set-1 (already existing knowledge)

0

1

1*

1

0

1*

0~

0

0

0

1

0

Thyroid is Negative

SVHC (or) Normal condition.

Entered Data

1

0

1*

0

1

1*

0~

1

1

1

0

1

3. Results & discussions When the input string is given to the SMS, it processes the entered string with the knowledge automatically and then the diagnose is suggested, otherwise the MLA will predict the TD. After predicting the TD levels, the DF calculations are made. As per the

apex value obtained by the DF, the best classifier would be traced out, and the result obtained by the classifier is finally displayed as a suggestion Figs. 2e4, Table 4. Resultant output for the input screen: Report generated as: Hyperthyroid.j 1873. Referral source specified: SVI.

Fig. 1. In this stage, the DF are obtained by the input string smacks or the loss of object values to the knowledge and based on the strength of DF's i.e., DF > 85%, the best resultant algorithm is announced.

Please cite this article in press as: Prasad V, et al., Improvised prophecy using regularization method of machine learning algorithms on medical data, Personalized Medicine Universe (2015), http://dx.doi.org/10.1016/j.pmu.2015.09.001

V. Prasad et al. / Personalized Medicine Universe xxx (2015) 1e9

7

Table 3 After considering the entered symptoms, the disease can be predicted from this sample database, like wise it consists of more than 2096 tuples for identification of TD. S

S1

S2

S3

S4

S5

S6

S7

S8

S9

S10

S11

S12

S13

S14

S15

S16

RS-1 RS-2 RS-3 RS-4 RS-5 RS-6 RS-7 RS-8

41 87 44 19 25 60 29 53

F F M F F M F M

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0

0 0 0 0 0 1 0 0

0 0 1 0 0 0 0 0

0 0 0 1 0 0 1 0

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0

0 0 0 1 0 0 1 0

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0

S18 1.3 0.15 2 0.45 1.6 0.2 1.4 1.4

S19 1 1 1 1 1 1 1 1

S20 2.5 1.7 1.3 3.2 5.4 4 3.4 1.9

S21 1 1 1 1 1 1 1 1

S22 125 162 136 130 152 68 147 104

S23 1 1 1 1 1 1 1 1

S24 1.14 0.87 0.94 1.83 1.5 1 1.49 0.93

S25 1 1 1 1 1 1 1 1

S26 109 186 145 71 102 67 99 112

S27 0 0 0 0 0 0 0 0

S17 1 1 1 1 1 1 1 1

Fig. 2. Input screen to identify hyperthyroid disease. The resultant output is as followed below and this screen shot shows the diagnosis of hyperthyroid according to the symptoms fed by the user, the prediction and the referral source SVI is suggested to the user for further proceedings.

S28 0 0 0 0 0 0 0 0

Referral Source (S29) SVHC SVI SVHD STMW STMW OTHER STMW SVI

Predicted disease (S30) Negative.j3733 Hyperthyroid.j1873 Negative.j3345 Goitre.j3523 Negative.j1183 T3 toxic.j547 Goitre.j2469 Negative.j43

Fig. 3. Input screen to identify goiter disease. The resultant output is as follows and this screen shot shows the diagnosis of goiter according to the symptoms fed by the user, and as the range of the test results fed, the prediction is made and the referral source STMW is suggested to the user for further proceedings of treatment.

Please cite this article in press as: Prasad V, et al., Improvised prophecy using regularization method of machine learning algorithms on medical data, Personalized Medicine Universe (2015), http://dx.doi.org/10.1016/j.pmu.2015.09.001

8

V. Prasad et al. / Personalized Medicine Universe xxx (2015) 1e9

Resultant output for the input screen: Report generated as: Goiter.j 3523. Referral source specified: STMW.

4. Conclusion

Fig. 4. This screen shot compares the TDD obtained through several MLA developed using RRA, LASSO and the systems developed is working with result approximations up to 98.99% considering a 28 preeminent attributes in knowledge for identifying the TD. This is the comparison chart produced based on RRA & LASSO algorithms on evaluation methods of DF.

Table 4 Performance of RRA for UCITD for a sample considered as CASE 1, 2 & 3. CASE Entry of symptoms CASE-1 CASE-2 CASE-3

RM algorithms

Meticulousness

Exactitude

Compassion

Rigour

UCITD

UCITD

UCITD

UCITD

RRA LASSO RRA LASSO RRA LASSO

71.59 95.94 74.32 98.96 75.84 95.63

75.17 97.71 73.47 98.71 75.67 98.41

71.03 90.34 76.86 98.45 76.42 96.44

77.5 98.93 79.2 96.48 78.29 95.67

RM of MLA is considered for evaluating the performance in terms of DF in predicting TDD. The attributes for UCITD data S1to S28 are crucial in deciding TD status, with the selected data. LASSO is found giving better results with all the features available. When the observations turned towards Table 4, it is observed that there is a growth rate of values in measure. As per the results obtained, it can be recommended that LASSO related to RM shows high performance rate compared with RRA [11]. The efficiency in the parameters chose the best optimization algorithm which can lead to exact prophecy of the TDD. Successful implementation, i.e., (i) introduced all 28 pristine attributes along with its values for predicting TDD, (ii) comparison of two high flying algorithms belonging to same cadre of MLA are applied on common featured attributes of UCITD, which merely helped in obtaining predicting levels at 98.99% accuracy. Lastly, (iii) about 2096 tuples of data for predicting TD. The successful implementation & completion of this work can give young researchers the idea, how useful the “MLA” in prediction as well as its application in medical epoch. In this experimental study on thyroid using RM, has given good prophecy. As thyroid has large features which indirectly increase prophecy execution time as well as may lead to misclassification, MLA has proved very useful pre-processing step to remove unwanted features and to resolve the missing data. Most importantly this experimental study showed more improved accuracy over the original data set with missed values [10].

Please cite this article in press as: Prasad V, et al., Improvised prophecy using regularization method of machine learning algorithms on medical data, Personalized Medicine Universe (2015), http://dx.doi.org/10.1016/j.pmu.2015.09.001

V. Prasad et al. / Personalized Medicine Universe xxx (2015) 1e9

References [1] V Prasad, T S Rao, M S P Babu ,” Thyroid disease diagnosis via hybrid architecture composing rough data sets theory & machine learning algorithms.”, DOI: 10.1007/s00500-014-1581-5 [SOFT COMPUTING., SOCO] [2] Prasad V, Rao TS. Health diagnosis expert advisory system on trained data sets for hyperthyroid. Int J Comput Appl 2014;102(3). [3] Prasad V, S Rao T, Babu MSP. Offline analysis & optimistic approach on livestock expert advisory system. CiiT Int J Artif Intelligent Syst Mach Learn 2013;5(12). [4] Prasad V, Rao TS. Information clustering based upon rough sets. Int J Sci Eng Technol Res 2014;3(4):8330e3. [5] Jaggi Martin. An equivalence between the LASSO and support vector machines. 2014. 25 April, arXiv:1303.1152v2. [6] Rohe Karl. A note relating ridge regression and ols p-values to preconditioned sparse penalized regression. 2014. 03rd December, arXiv:1411.7405v2. [7] Seunghak Lee By, Xing Eric P. Screening rules for overlapping group lasso. 2014. 25th October, arXiv:1410.6880v1. [8] Kapelner Adam, Bleich Justin. bartMachine: machine learning with Bayesian additive regression trees. 2014. November, arXiv:1312.2171v3. [9] Azar Ahmad Taher, Hassanien Aboul Ella, Kim Tai-Hoon. Expert system based on Neural-Fuzzy rules for thyroid diseases diagnosis. Springer-Verlag Berlin Heidelberg; 2012. [10] Peters Georg, Weber Richard, Nowatzke Rene. Dynamic Rough clustering and its applications. Appl Soft Comput 2012;12:3193e207.

9

[11] Zhang Bin, Horvath Steve. Ridge regression based hybrid genetic algorithms for multi locus quantitative trait mapping. Int J Bioinform Res Appl 2006;1(3): 261e72. [12] Keles A. ESTDD: expert system for thyroid diseases diagnosis. Expert Syst Appl 2008;34(1):242e6. [13] Crochemore M, Czumaj A, et al. Speeding up two string-matching algorithms. Algorithmica 1994;12(4e5):247e67. [14] Fan Yue, Raphael Louise, Kon Mark. Feature vector regularization in machine learning. 2013. arXiv:1212.4569. [15] Georgiev Stoyan, Mukherjee Sayan. Randomized dimension reduction on massive data. 2013. arXiv:1211.1642. [16] Prasad V, Srinivasa Rao T. Standard cog exploration on medicinal data. Int J Comput Appl (IJCA) 2015;119.10:34e8. [17] Thyroid Disease Records: https://archive.ics.uci.edu/ml/datasets/ ThyroidþDisease. [18] Gharehchopogh FS, Molany M, Mokri FD. Using artificial neural network in diagnosis of thyroid disease: a case study. Int J Comput Sci Appl 2013;3(4):49e61. [19] Prasad V, Siva Kumar R, Mamtha M. Plug in generator to produce variant outputs for unique data. Int J Res Eng Sci 2014b;2(4):1e7. [20] Prasad Vadamodula, Rao Tamada Srinivasa. Implementation of regularization method ridge regression on specific medical datasets. Int J Res Comput Appl Inf Technol 2015;3.2:25e33. [21] Purnachandra Rao M. Database evaluation considering the decisive factors for obtaining an optimistic value to predict the diagnose. Int J Comput Sci Inf Technol 2015;6(3):2145e9.

Please cite this article in press as: Prasad V, et al., Improvised prophecy using regularization method of machine learning algorithms on medical data, Personalized Medicine Universe (2015), http://dx.doi.org/10.1016/j.pmu.2015.09.001