CHAPTER 7
Data Science Driven Drug Repurposing for Metabolic Disorders
Selvaraman Nagamani, Rosaleen Sahoo, Gurusamy Muneeswaran, G. Narahari Sastry
Centre for Molecular Modeling, CSIR-Indian Institute of Chemical Technology, Hyderabad, India
1 INTRODUCTION
Data science, artificial intelligence, and machine learning are rapidly making inroads not only into all aspects of our lives and businesses but also into all branches of science, engineering, and medicine (Craven & Page, 2015; Hand, 2015; Issa, Byers, & Dakshanamurthy, 2014). Health care is essential to human survival and sustainability, and we believe that data-driven approaches have great potential to tackle many of the challenges that health care will face in the coming decades. Collecting and analyzing large amounts of data provides a detailed understanding of the pathophysiology and manifestation of diseases. In the postgenomic era, data-driven and knowledge-driven approaches and various big-data applications have changed the face of the health-care system (Aronson & Rehm, 2015). Advances in big-data analytics at both the individual and population levels are leading toward personalized medicine. Health-care systems generate large amounts of biomedical data, including electronic health records, medical images, multiomics data, and scientific articles (Craven & Page, 2015; Hand, 2015; Issa et al., 2014). The results of high-throughput cellular and protein-binding assays can be analyzed using big-data analytics, which broadens our understanding of the drug-discovery process and supports the development of chemoinformatics databases (Manzoni et al., 2018). Databases act as central hubs that collate biological, clinical, and physicochemical data. Data mining plays an important role in finding new therapeutic targets, drug-target associations, and drug-repurposing hypotheses. Notably, new processes and methodologies have been developed to automate the drug-discovery process, with a high priority placed on drug repurposing (Schneider, 2018).
In Silico Drug Design. https://doi.org/10.1016/B978-0-12-816125-8.00007-9
© 2019 Elsevier Inc. All rights reserved.
Metabolic disorders result from alterations in metabolic processes, and a large amount of data on metabolic disorders is available in the public domain. High-calorie diets dysregulate a large number of metabolic pathways, which contributes to the development of metabolic diseases. Metabolic syndrome and metabolic disorders are the most common types of metabolic disease. Metabolic diseases have largely been attributed to genetic background, diet, physical activity, environmental factors, and ageing. This chapter provides an overview of metabolic disorders and drug repurposing. It covers topics ranging from the pathophysiology of metabolic disorders, big-data applications in metabolic disorders, the genomic, proteomic, and epigenetic aspects of metabolic disorders, metabolic health and disease, and the metabolic and immune systems, to repurposed drugs for metabolic diseases and the role of data science in drug discovery.
2 OVERVIEW OF METABOLIC DISORDERS
In this section we give a brief description of metabolism, metabolic disorders, metabolic syndrome, and the different factors responsible for metabolic disorders.
2.1 Metabolism
Metabolism is the set of biochemical reactions in each cell through which the human body obtains energy from nutrients. Metabolic processes are categorized as anabolism and catabolism (Fig. 1) (Lehninger, Nelson, & Cox, 2000).
FIG. 1 Types of cell metabolism: catabolism and anabolism.
2. THEORETICAL BACKGROUND AND METHODOLOGIES
Anabolism is the biosynthetic (build-up) process in which complex molecules such as carbohydrates, nucleic acids, and proteins are formed from small molecules. Catabolism is the degradative process in which complex molecules (e.g., carbohydrates, proteins, and lipids) are broken down into smaller units. Catabolic processes release energy, which is captured as ATP and reduced cofactors (NADH, NADPH, and FADH2), whereas anabolic processes consume this energy. The energy obtained from these processes is used for the construction and maintenance of the body and for several other processes, and waste products are excreted from the body (Lehninger et al., 2000). Metabolic pathways are series of enzyme-mediated chemical reactions in the cell. Catabolic pathways are convergent in nature, whereas anabolic pathways are divergent. Metabolic pathways can be linear, branched, or cyclic: a single precursor may give rise to multiple end products, or several precursors may be converted into a single product (Lehninger et al., 2000; Pi-Sunyer, 1993). Metabolites are the substances that take part in metabolic processes or are produced by them. Metabolites are classified into primary and secondary metabolites. Primary metabolites are produced by pathways such as glycolysis and the tricarboxylic acid (TCA) cycle, whereas secondary metabolites are produced by other pathways (e.g., fatty acid derivative biosynthesis and antibiotic biosynthesis). Diseases, disorders, and syndromes are three different terms associated with alterations or abnormalities in normal chemical reactions. Diseases generally arise from pathophysiological responses to external or internal factors. Disorders are disruptions of the normal or regular function of the body as a consequence of disease. A syndrome is a set of signs and symptoms that together characterize or suggest a disease.
Any disease or disorder that disrupts normal metabolic processes is known as a metabolic disease; metabolic diseases can be either inherited or acquired.
2.2 Metabolic Disorders and Metabolic Syndrome
Metabolic disorders occur mainly due to deficiencies in the enzymes that are necessary to convert one metabolite into another. The manifestations of metabolic disorders are due either to the accumulation of large amounts of one metabolite or to a deficiency of one or more metabolites. Metabolic disorders can be broadly classified into inherited and acquired metabolic disorders. Inherited metabolic disorders are caused by inborn errors of metabolism, that is, genetic defects that lead to deficient production of enzymes or to abnormalities in their function. Examples of inherited metabolic disorders include lysosomal storage disorders (Hurler syndrome, Tay-Sachs disease, Gaucher’s disease, and Fabry disease), peroxisomal disorders (Zellweger syndrome and adrenoleukodystrophy), and metal metabolism disorders (Wilson’s disease and hemochromatosis) (Sliwinska, Kasinska, & Drzewoski, 2017). Acquired metabolic disorders are associated with external factors, such as an unhealthy lifestyle combined with little physical activity and excessive caloric intake. Evidence has shown that lifestyle is associated with inherited epigenetic patterns that affect gene expression and protein activity, leading to the development of metabolic diseases or disorders (Eckel, Alberti, Grundy, & Zimmet, 2010). Metabolic syndrome is the most common metabolic disorder associated with the global epidemic of obesity and diabetes (Boelens & Wynn, 2017). The increased risk of type-2 diabetes and cardiovascular disease signals an increased risk of metabolic disorders as a whole, so strategies are required to prevent this emerging global threat. Weight reduction and a reasonable level of physical activity are the fundamental approaches, and drug treatment can be appropriate for risk reduction in patients with diabetes and cardiovascular disease (Heal, Gosden, & Smith, 2009).
2.3 Different Components Underlying Metabolic Syndrome
An unhealthy diet and lifestyle have led to an epidemic of obesity in both adults and children. An imbalance between caloric intake and energy expenditure is the leading cause of the accumulation of adipose tissue. Obese patients are at high risk of developing comorbid metabolic conditions, including type-2 diabetes, hypertension, hyperlipidemia, and cardiovascular diseases (Wisse, 2004). Different criteria for the diagnosis of metabolic syndrome have been given by the NCEP ATP III (National Cholesterol Education Program Adult Treatment Panel III), the WHO (World Health Organization), and the IDF (International Diabetes Federation) (Table 1) (Eckel et al., 2010; Heal et al., 2009). Several individual and population factors, such as gender, age, and diet, modify the risk of disease (Fig. 2).

TABLE 1 Risk Factors or Criteria for the Diagnosis of Metabolic Syndrome According to WHO (World Health Organization), NCEP ATP III (National Cholesterol Education Program Adult Treatment Panel III), and IDF (International Diabetes Federation)

S. No. | Criteria | WHO: T2D, IGT, glucose intolerance and/or IR + ≥2 others | NCEP ATP III: ≥3 risk factors | IDF: central obesity + ≥2 other risk factors
1 | IGT (impaired glucose tolerance) | 5.6 mmol/L (100 mg/dL) | 6.1 mmol/L (110 mg/dL) | 5.6 mmol/L (100 mg/dL)
2 | Insulin resistance | Glucose uptake below lowest quartile (hyperinsulinemic-euglycemic clamp) | Not included | Not included
3 | Obesity | BMI >30 kg/m² and/or waist-hip ratio >0.9 M; >0.85 F | Waist >102 cm (40 in.) M; >88 cm (35 in.) F | Waist >94 cm (37.0 in.) M; >80 cm (31.5 in.) F
4 | Hypertension | 140/90 mmHg | 130/85 mmHg | 130 mmHg systolic or 85 mmHg diastolic
5 | Serum triglycerides | 1.7 mmol/L (150 mg/dL) | 1.7 mmol/L (150 mg/dL) | 1.7 mmol/L (150 mg/dL)
6 | HDL-cholesterol | <0.9 mmol/L (35 mg/dL) M; <1.0 mmol/L (39 mg/dL) F | <1.03 mmol/L (40 mg/dL) M; <1.29 mmol/L (50 mg/dL) F | <1.03 mmol/L (40 mg/dL) M; <1.29 mmol/L (50 mg/dL) F; or drug treatment
7 | Microalbuminuria | Urinary albumin excretion rate 20 μg/min | Not included | Not included

M, male; F, female; T2D, type-2 diabetes; IR, insulin resistance; BMI, body mass index.

FIG. 2 Possible causative factors implicated in metabolic disorders.
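To make the NCEP ATP III rule in Table 1 concrete (metabolic syndrome when at least three of five risk factors are present), the following Python sketch counts risk factors for one patient. It is a minimal illustration, not a clinical tool: the `Patient` fields and function names are hypothetical, all values are assumed to be in SI units, the drug-treatment qualifiers are ignored, and the glucose, triglyceride, and blood-pressure thresholds are assumed to be inclusive (≥), as in the NCEP ATP III definition, because Table 1 does not show the comparison signs for those cells.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Patient:
    """Hypothetical patient record; all measurements assumed to be in SI units."""
    sex: str                      # "M" or "F"
    waist_cm: float
    triglycerides_mmol_l: float
    hdl_mmol_l: float
    systolic_mmhg: float
    diastolic_mmhg: float
    fasting_glucose_mmol_l: float


def ncep_atp3_risk_factors(p: Patient) -> List[str]:
    """Return the NCEP ATP III risk factors (Table 1 thresholds) that the patient meets."""
    factors = []
    # Abdominal obesity: waist >102 cm (men) or >88 cm (women).
    if p.waist_cm > (102.0 if p.sex == "M" else 88.0):
        factors.append("abdominal obesity")
    # Serum triglycerides at or above 1.7 mmol/L (150 mg/dL), assumed inclusive.
    if p.triglycerides_mmol_l >= 1.7:
        factors.append("high triglycerides")
    # Low HDL-cholesterol: <1.03 mmol/L (men) or <1.29 mmol/L (women).
    if p.hdl_mmol_l < (1.03 if p.sex == "M" else 1.29):
        factors.append("low HDL-cholesterol")
    # Blood pressure at or above 130/85 mmHg; either component counts.
    if p.systolic_mmhg >= 130 or p.diastolic_mmhg >= 85:
        factors.append("raised blood pressure")
    # Fasting glucose at or above 6.1 mmol/L (110 mg/dL), assumed inclusive.
    if p.fasting_glucose_mmol_l >= 6.1:
        factors.append("raised fasting glucose")
    return factors


def meets_ncep_atp3(p: Patient) -> bool:
    """NCEP ATP III: metabolic syndrome if three or more risk factors are present."""
    return len(ncep_atp3_risk_factors(p)) >= 3


if __name__ == "__main__":
    example = Patient("M", 105, 2.0, 0.95, 128, 82, 5.8)
    print(ncep_atp3_risk_factors(example))
    # ['abdominal obesity', 'high triglycerides', 'low HDL-cholesterol']
    print(meets_ncep_atp3(example))  # True
```

The same structure could be adapted to the WHO or IDF columns, which differ mainly in their mandatory component (insulin resistance or central obesity) rather than in a simple count.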
3 CLASSIFICATION OF METABOLIC DISORDERS
Metabolic disorders are broadly classified in two ways: a general classification and a pathophysiological classification.
3.1 General Classification
The general classification of metabolic disorders comprises six types: disorders in protein metabolism, disorders in lipid metabolism, disorders in carbohydrate metabolism, disorders in hormone metabolism, lysosomal storage disorders, and mitochondrial storage disorders (Bohra & Bhateja, 2015).
3.1.1 Disorders in Protein Metabolism
Disorders in protein metabolism are caused by a lack of enzymes, which leads to alterations or abnormalities in the metabolism of endogenous or exogenous proteins. These disorders are of three types:
3.1.1.1 DUE TO THE INABILITY TO METABOLIZE SOME AMINO ACIDS
Disorders that occur due to the inability to metabolize certain amino acids fall into this category (e.g., maple syrup urine disease, homocystinuria, and tyrosinemia). Maple syrup urine disease is caused by a defect in the metabolism of isoleucine, leucine, and valine. Accumulation of homocysteine and its metabolites due to alterations in methionine metabolism leads to homocystinuria. Tyrosinemia is characterized by elevated blood levels of tyrosine (Bohra & Bhateja, 2015).
3.1.1.2 DUE TO ORGANIC ACID DISORDERS
Deficiencies in some enzymes result in the excretion of nonamino organic acids in the urine (e.g., isovaleric acidemia, propionic acidemia, and methylmalonic acidemia). Isovaleric acidemia is caused by a deficiency in an enzyme of leucine metabolism; leucine is not metabolized properly, and isovaleric acid is produced. Propionic acidemia results from a defective form of propionyl-coenzyme A (CoA) carboxylase, which elevates the propionic acid level. Improper breakdown of certain proteins and fats elevates blood levels of methylmalonic acid, which causes methylmalonic acidemia (Bohra & Bhateja, 2015).
3.1.1.3 DUE TO UREA CYCLE DEFECTS CAUSED BY A DEFECT IN OR ABSENCE OF AN ENZYME OR COFACTOR IN THE UREA CYCLE
Defects in the urea cycle lead to various diseases, such as citrullinemia, argininosuccinic aciduria, and carbamoyl phosphate synthetase deficiency. Citrullinemia results in elevated levels of ammonia and other toxic substances in the blood (Bohra & Bhateja, 2015).
3.1.2 Disorders in Lipid Metabolism
Disorders in lipid metabolism are caused by a lack of enzymes or the presence of defective enzymes in lipid metabolism. Disturbances in lipid metabolism result in an increased blood lipid level, known as lipidemia or hyperlipidemia. Hyperlipidemia is associated with increased blood cholesterol and a higher risk of atherosclerosis, coronary artery disease (CAD), and stroke (Bohra & Bhateja, 2015).
3.1.3 Disorders in Carbohydrate Metabolism
These disorders occur either due to defects in the transport of metabolites involved in carbohydrate metabolism or due to enzyme deficiencies in carbohydrate metabolism. Galactosemia, glycogen storage diseases, disorders of mucopolysaccharide metabolism, and disorders of pyruvate metabolism (Bohra & Bhateja, 2015) are a few examples of disorders associated with carbohydrate metabolism, and diabetes mellitus (DM) is the most common carbohydrate metabolism disorder.
3.1.3.1 DIABETES MELLITUS
DM is a chronic metabolic disease characterized by impaired regulation of blood glucose, in which the pancreas fails to produce adequate insulin or the body is unable to use the insulin it produces. When DM progresses over a long period of time, severe complications, such as blindness, nephropathy, and renal failure, can occur. People with diabetes have a high risk of several other diseases, such as cardiovascular, peripheral vascular, and cerebrovascular diseases. The threat of DM is increasing worldwide as a result of several risk factors, such as genetic predisposition, obesity, low levels of physical activity, and an improper diet. Insulin resistance (IR) and glucose intolerance are mainly responsible for the development of type-2 diabetes. Several genetic syndromes (e.g., Down’s syndrome, Klinefelter’s syndrome, Turner’s syndrome, and Wolfram’s syndrome) are also accompanied by an increased incidence of DM. Evidence shows that lifestyle changes improve glucose tolerance and delay or even prevent the onset of type-2 diabetes in both men and women, and also reduce several other cardiovascular risk factors (Tuomilehto et al., 2001; World Health Organization, 2016).
3.1.4 Disorders in Hormone Metabolism
There are two types of hormones in the body that carry out specific metabolic processes: anabolic hormones (growth hormone (GH), insulin-like growth factor (IGF), insulin, testosterone, oestrogen, etc.) and catabolic hormones (cortisol, glucagon, adrenaline/epinephrine, etc.). Disturbances in pituitary, thyroid, or parathyroid hormones result in disorders in hormone metabolism (e.g., hypopituitarism, hyperpituitarism, and hyperthyroidism). Symptoms of hypopituitarism include weakness, irregular or stopped menstrual periods, fatigue, low blood pressure, weight loss, and infertility. Symptoms of hyperpituitarism include headaches, visual disturbances, growth failure, hirsutism, and depression. Symptoms of hyperthyroidism include irritability, muscle weakness, depression, fatigue, polydipsia, and polyuria (Bohra & Bhateja, 2015).
3.1.5 Lysosomal Storage Disorders
Lysosomal storage disorders are inherited disorders caused by defects in lysosomal function. Their symptoms vary with the particular disorder and with variables such as age of onset, and their severity ranges from mild to severe. Examples of lysosomal storage disorders are Schindler disease, Kanzaki disease, Farber disease, Krabbe disease, Tay-Sachs disease, Sandhoff disease, and pyknodysostosis (Bohra & Bhateja, 2015).
3.1.6 Mitochondrial Storage Disorders
Disorders caused by mitochondrial dysfunction are known as mitochondrial diseases, and they are either inherited or acquired. Inherited mitochondrial diseases are caused by mutations in mitochondrial DNA (mtDNA) or in the nuclear genes that encode mitochondrial components. Acquired mitochondrial dysfunction mainly results from adverse drug effects, infections, or other environmental factors. These conditions can lead to learning disabilities, muscle weakness, loss of muscle coordination, heart disease, and other problems.
3.2 Pathophysiological Classification
According to their pathophysiology, inherited metabolic disorders can be classified into three categories: disorders that give rise to intoxication, disorders that involve energy metabolism, and disorders that involve complex molecules (Saudubray et al., 1989).
3.2.1 Metabolic Disorders That Give Rise to Intoxication
Some metabolic disorders, such as phenylketonuria (PKU), maple syrup urine disease, Wilson disease, porphyria, and galactosemia, lead to intoxication through the accumulation of a particular metabolite. This set of diseases is caused by inborn errors of metabolism (Saudubray et al., 1989).
3.2.1.1 PHENYLKETONURIA
PKU is the most common inherited disorder of amino acid metabolism. It is characterized by a deficiency in phenylalanine hydroxylase (PAH), which leads to the accumulation of phenylalanine (Phe) in the blood and results in intellectual impairment if the patient is not treated with a Phe-restricted diet. According to a 2001 report of the National Institutes of Health (NIH), approximately 1 in 13,500–19,000 infants in the United States are born with PKU. The NIH consensus treatment goal for children up to 12 years of age is a blood Phe level of 120–360 μmol/L (National Institutes of Health Consensus Development Portal, 2001). In 2007, the Food and Drug Administration (FDA) approved the first drug for PKU, sapropterin dihydrochloride (Kuvan), a synthetic form of the PAH cofactor tetrahydrobiopterin, to control blood Phe concentrations. Tetrahydrobiopterin was reported to lower blood Phe levels in 27 (87%) of 31 patients with mild hyperphenylalaninemia (10 patients) or mild PKU (21 patients), and Phe oxidation was significantly increased in 23 (74%) of these 31 patients (Burnett, 2007; Lindegren et al., 2013; Muntau et al., 2002; National Institutes of Health Consensus Development Portal, 2001).
3.2.2 Metabolic Disorders Involving Energy Metabolism
Some inborn errors of metabolism lead to deficient energy production in the liver, brain, or other tissues. Hypoglycemia, hyperlactatemia, and hepatomegaly are common symptoms of this group of diseases, which includes defects in glycolysis and glycogen metabolism and hyperinsulinism (Saudubray et al., 1989).
3.2.2.1 HYPOGLYCEMIA
Patients with diabetes can develop hypoglycemia (low blood glucose), and the risk increases with age and with the duration of diabetes. Hypoglycemia occurs more frequently in type-1 than in type-2 diabetic patients; among type-2 diabetic patients, it mainly affects those treated with sulfonylureas, glinides, or insulin. The rate of hypoglycemia ranges from 115 to 320 per 100 patient-years in type-1 diabetic patients and from 35 to 70 per 100 patient-years in type-2 diabetic patients. Hypoglycemia is a common problem in type-1 diabetes (especially in children) because of the challenges of insulin dosing, variable eating patterns, irregular day-to-day activities, and difficulties in detecting hypoglycemia early in children (Seaquist et al., 2013).
3.2.3 Metabolic Disorders Involving Complex Molecules
This category includes diseases of cellular organelles that show abnormalities in the synthesis or catabolism of complex molecules. Their symptoms are permanent and progressive and are independent of food intake. Lysosomal storage disorders, peroxisomal disorders, and congenital disorders of glycosylation are included in this category (Saudubray et al., 1989).
3.2.3.1 LYSOSOMAL STORAGE DISORDERS
Lysosomal acid lipase (LAL) deficiency is an autosomal recessive lysosomal storage disorder caused by a deficiency of lysosomal acid lipase, with an onset that ranges from infancy to later life. In infants, the disease is known as Wolman’s disease; it progresses very rapidly, and death occurs within 6 months after birth. In older patients, the disease is known as cholesteryl ester storage disease, which causes complications in childhood or later and is characterized by increased serum aminotransferase levels. A lack of awareness of the disease leads to delayed diagnosis and misdiagnosis, which increases the severity of disease-related complications. Phase I, II, and III clinical trials of sebelipase alfa showed decreased serum aminotransferase levels and an improved serum lipid profile, and these results led to its development as an enzyme-replacement therapy for LAL deficiency in children and adults (Burton et al., 2015; Tolar et al., 2009).
4 COMPUTATIONAL APPROACHES
4.1 Next Generation Sequencing
Biological science entered the genomics era after the completion of the Human Genome Project in 2003 (Auton et al., 2015). Genome sequencing became faster with the development of next generation sequencing (NGS) techniques, which encouraged various genomic projects, such as the 1000 Genomes Project and the International HapMap Project. In the postgenomic era, NGS has generated large-scale genomic profiles for many diseases, including metabolic disorders, as well as for drugs and compounds. NGS can be used in target identification by providing whole-exome information, which can be used to identify mutations at the genomic level. RNA sequencing data can be used to compare gene expression in diseased and normal tissue, which helps to identify the genes and pathways that are important for understanding disease pathology. NGS also supports genetic linkage studies that identify new drug targets from the complex traits of hereditary diseases. Whole-genome mutation detection can be further applied to “personalized medicine” by identifying effective biomarkers. Information on genes and genetic variants is very important for analyzing drug response at the time of medication, because personalized medicine and targeted therapy depend on an individual’s genetic makeup. NGS also supports biomarker discovery by generating whole-genome microRNA (miRNA) profiles from which biomarker signatures can be determined. These miRNA profiles provide insight into disease mechanisms and help identify new drug targets for the drug-discovery process. NGS data can be combined with data-driven approaches for drug repurposing in metabolic disorders (Woollard et al., 2011).
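The comparison of gene expression between diseased and normal tissue described in this section can be sketched as follows. This minimal Python example, which uses only the standard library, reports a log2 fold change for each gene and ranks genes by Welch's t statistic. The gene names and expression values are hypothetical, and real RNA-seq analyses usually rely on count-based models with multiple-testing correction (for example, in dedicated packages such as DESeq2 or edgeR) rather than on this simplified statistic.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Tuple


def log2_fold_change(disease: List[float], normal: List[float],
                     pseudocount: float = 1.0) -> float:
    """log2 ratio of mean expression (disease / normal); the pseudocount avoids log(0)."""
    return math.log2((mean(disease) + pseudocount) / (mean(normal) + pseudocount))


def welch_t(a: List[float], b: List[float]) -> float:
    """Welch's t statistic for two groups with possibly unequal variances.

    Each group needs at least two samples, and the two groups must not both
    have zero variance (the standard error would then be zero).
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_genes(disease: Dict[str, List[float]],
               normal: Dict[str, List[float]]) -> List[Tuple[str, float, float]]:
    """Return (gene, log2 fold change, t) tuples sorted by |t|, largest first."""
    rows = [(g, log2_fold_change(disease[g], normal[g]), welch_t(disease[g], normal[g]))
            for g in disease]
    return sorted(rows, key=lambda row: abs(row[2]), reverse=True)


if __name__ == "__main__":
    # Hypothetical normalized expression values for two genes, three samples each.
    disease = {"GENE_A": [10.0, 12.0, 11.0], "GENE_B": [5.0, 5.5, 4.5]}
    normal = {"GENE_A": [3.0, 4.0, 2.0], "GENE_B": [5.2, 4.8, 5.0]}
    for gene, lfc, t in rank_genes(disease, normal):
        print(f"{gene}: log2FC={lfc:.2f}, t={t:.2f}")
```

Genes at the top of such a ranking are candidates for the disease-relevant genes and pathways mentioned above; they still need statistical and biological validation.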
NGS has been used effectively in the diagnosis of inborn errors of metabolism (Ghosh et al., 2017). Zhao, Zhu, Boerwinkle, and Xiong (2015) developed a novel statistic based on principal component analysis (PCA) for pathway association tests with NGS data. NGS can also be used effectively to detect hidden inherited metabolic diseases (Yamamoto et al., 2015). NGS enables the cheap and rapid detection of de novo mutations, characterization of the phenotypic spectrum of most genes, and detection of digenic inheritance or of more than one rare metabolic disease in a patient, and it paves the way to promising new therapeutics (Fernandez-Marmiesse, Gouveia, & Couce, 2018).
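The general idea behind a PCA-based pathway association test can be illustrated with a simplified sketch. The code below is not the statistic of Zhao et al. (2015); it is a generic, assumed approach written with the standard library only: the genotypes (0, 1, or 2 minor alleles) of the variants in one pathway are centered, the first principal component is found by power iteration, and the case and control scores on that component are compared with Welch's t statistic. The genotype matrix and case labels are hypothetical, and a real test would also need a null distribution (for example, from permutations) to obtain a p-value.

```python
import math
import random
from statistics import mean, variance
from typing import List, Sequence


def center_columns(X: List[List[float]]) -> List[List[float]]:
    """Subtract each column mean so that PCA operates on centered data."""
    col_means = [mean(col) for col in zip(*X)]
    return [[x - m for x, m in zip(row, col_means)] for row in X]


def first_principal_component(Xc: List[List[float]], iters: int = 200,
                              seed: int = 0) -> List[float]:
    """Leading eigenvector of Xc^T Xc, found by power iteration.

    Assumes the data are not constant (otherwise Xc^T Xc is the zero matrix).
    """
    p = len(Xc[0])
    C = [[sum(row[i] * row[j] for row in Xc) for j in range(p)] for i in range(p)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]  # positive random start vector
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def pc1_association(genotypes: List[List[float]], is_case: Sequence[bool]) -> float:
    """Welch t statistic comparing PC1 scores of cases and controls.

    The sign of an eigenvector is arbitrary, so only |t| is meaningful.
    """
    Xc = center_columns(genotypes)
    v = first_principal_component(Xc)
    scores = [sum(x * w for x, w in zip(row, v)) for row in Xc]
    cases = [s for s, c in zip(scores, is_case) if c]
    controls = [s for s, c in zip(scores, is_case) if not c]
    se = math.sqrt(variance(cases) / len(cases) + variance(controls) / len(controls))
    return (mean(cases) - mean(controls)) / se


if __name__ == "__main__":
    # Hypothetical minor-allele counts for three variants in one pathway.
    genotypes = [[2, 1, 2], [1, 2, 2], [2, 2, 1], [2, 1, 1],   # cases
                 [0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 0, 0]]   # controls
    is_case = [True] * 4 + [False] * 4
    print(abs(pc1_association(genotypes, is_case)))
```

Summarizing many correlated variants by one component reduces the number of tests from one per variant to one per pathway, which is the main motivation for PCA-based pathway statistics.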
4.2 Personalized Medicine
Metabolic phenotyping plays an important role in the development of personalized medicine for metabolic disorders. Analyzing the metabolites present in urine and blood provides information on the metabolic phenotype of an individual or a population. Metabolic profiles can provide information beyond the genotype, gene expression profiles, or even the proteome of an individual, because they offer a “systems level” view of chemical and physicochemical processes, health status, and environmental factors for individuals and populations, which can be used in personalized medicine and public health care (Janssen, Katzmarzyk, & Ross, 2004). Personalized medicine has become increasingly popular for the diagnosis and treatment of metabolic syndromes, including diabetes, obesity, and nonalcoholic fatty liver disease (Shapiro, Suez, & Elinav, 2017). Recently, more microfluidic devices with miniaturized biosensors and transducers have become available that can facilitate earlier diagnosis and better treatment of metabolic disorders (Guest & Guest, 2018). Fig. 3 shows the integration of the data pools required for personalized medicine.
FIG. 3 Integration of data pools that might help toward developing personalized medicine.
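As a simple illustration of how an individual's metabolite measurements can be compared with a reference population, the sketch below computes z-scores and flags metabolites that deviate strongly. This is an assumed, deliberately minimal approach rather than a method described in the cited studies; the metabolite names, reference values, units (mmol/L), and the |z| ≥ 2 cut-off are all hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List


def metabolite_z_scores(individual: Dict[str, float],
                        reference: Dict[str, List[float]]) -> Dict[str, float]:
    """z-score of each measured metabolite against a reference population."""
    return {name: (value - mean(reference[name])) / stdev(reference[name])
            for name, value in individual.items()}


def flag_deviations(z_scores: Dict[str, float], cutoff: float = 2.0) -> List[str]:
    """Sorted names of metabolites whose |z| meets or exceeds the cut-off."""
    return sorted(name for name, z in z_scores.items() if abs(z) >= cutoff)


if __name__ == "__main__":
    # Hypothetical reference measurements (mmol/L) from healthy individuals.
    reference = {"glucose": [5.0, 5.2, 4.8, 5.1, 4.9],
                 "lactate": [1.0, 1.2, 0.8, 1.1, 0.9]}
    patient = {"glucose": 7.0, "lactate": 1.1}
    z = metabolite_z_scores(patient, reference)
    print(z)
    print(flag_deviations(z))  # ['glucose']
```

Real metabolic phenotyping works with hundreds of metabolites and uses multivariate models, but the per-metabolite comparison against a reference population is the simplest form of the same idea.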
4.3 Big-Data Approaches
Managing the heterogeneity of data is currently the most challenging task. Different types of biological data (e.g., sequences, structures, graphs, patterns, pathways, genes, and expression profiles) generated from experiments, medical practice, and computational analyses help us better understand biological systems. Various bioinformatics tools and computational methods are being developed to extract value from these huge amounts of data and to generate new knowledge using different data-driven approaches. Organizing, analyzing, and interpreting the data is a daunting task, since the complexity and volume of data are increasing day by day. Different health-care providers generate different kinds of patient data, such as diagnostic parameters (hemoglobin, glucose level, cholesterol, etc.), medical tests, personal health records, radiology images, clinical trial data submitted to the FDA, human genetics and population data, and genomic sequences; together, these make up the “big data” of health care. The available data can be categorized into three types: structured, unstructured, and semistructured data. Structured data (e.g., lab values, diagnoses, demographic data, and procedures) are the easiest to capture, categorize, and analyze. Unstructured data are irregular, unorganized, or ambiguous in format; they are captured but stored as free text. Most of the available data are unstructured. Converting unstructured data into structured data can reveal patterns that add value to the data and increase its usability. Semistructured data are a mixture of structured and unstructured data (Raghupathi & Raghupathi, 2014). Large datasets produced by advances in high-throughput technologies help identify new drug targets, drug indications, and biomarkers.
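To illustrate how unstructured free text can be converted into structured data, the following Python sketch uses regular expressions to pull a few laboratory values out of a clinical note. The field names, patterns, and note text are hypothetical; real clinical text mining needs far more robust natural language processing (handling negation, abbreviations, unit variants, and dates), so this shows only the basic idea.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns: an analyte name, up to 20 non-digit characters,
# a number (optionally with decimals), then the expected unit.
LAB_PATTERNS = {
    "glucose_mg_dl": re.compile(r"glucose\D{0,20}(\d+(?:\.\d+)?)\s*mg/dl", re.IGNORECASE),
    "hba1c_percent": re.compile(r"hba1c\D{0,20}(\d+(?:\.\d+)?)\s*%", re.IGNORECASE),
    "ldl_mg_dl": re.compile(r"ldl\D{0,20}(\d+(?:\.\d+)?)\s*mg/dl", re.IGNORECASE),
}


def structure_note(note: str) -> Dict[str, Optional[float]]:
    """Map a free-text note onto a fixed set of fields; missing values become None."""
    record: Dict[str, Optional[float]] = {}
    for field, pattern in LAB_PATTERNS.items():
        match = pattern.search(note)
        record[field] = float(match.group(1)) if match else None
    return record


if __name__ == "__main__":
    note = "Follow-up visit. Fasting glucose was 132 mg/dL; HbA1c of 7.2%. No new complaints."
    print(structure_note(note))
    # {'glucose_mg_dl': 132.0, 'hba1c_percent': 7.2, 'ldl_mg_dl': None}
```

Once notes are reduced to a fixed schema like this, they can be joined with other structured data (lab tables, demographics) and analyzed in the same way.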
The exploration of big data has led to the development of various data science-driven approaches in biomedical research, including the identification of disease subtypes with distinct molecular patterns, the identification of novel biomarkers, and the discovery of new indications for drugs along with novel mechanisms of drug action. However, several obstacles remain to integrating and interpreting biomedical data and converting it into informative knowledge or therapeutic discoveries. To face these challenges, sophisticated computational approaches are required for big-data analysis (Li, 2015). The types of data available for the analysis of diseases include single nucleotide polymorphisms (SNPs), copy number variations (CNVs), mutations, gene expression, protein expression, protein-protein interactions, protein-DNA interactions, gene silencing, gene overexpression, drug efficacy, drug-target interactions, electronic medical records (EMR)/electronic health records (EHR), and -omics profiles. Fig. 4 shows a pictorial representation of data sharing and data analytics in translational medicine that can be applied to metabolic disorders. Big data play an important role in drug discovery and in the health-care system and open a new window for personalized medicine. The amount of patient data has greatly increased because of computer-based information systems (Belle et al., 2015). Apart from clinical data, various other data, such as claims, cost data, mechanisms of action, side effects, and toxicity, have been analyzed with the help of the big-data revolution. When such huge amounts of data are used meaningfully, they can help avoid unnecessary treatments, minimize adverse drug effects, maximize overall safety, and create a path to personalized medicine (Alyass, Turcotte, & Meyre, 2015). Hadoop is designed to run on large numbers of connected servers to scale up data storage and processing at a very low cost. Text-mining platforms based on Hadoop allow more patient information to be collated, helping to predict disease risk, prevent disease, and provide more precise information about patients (Alexander & Wang, 2017; Jung & Choi, 2014). In this section we present a few case studies that apply big-data and data-driven approaches to different biomedical and bioinformatics problems.

FIG. 4 Pictorial representation of data sharing and big-data analytics in translational medicine.

4.3.1 Diabetic Retinopathy Screening Using Artificial Intelligence
Deep learning can provide a detailed analysis of data by monitoring changes in various physiological characteristics. Analyses of the different stages of a disease contribute significantly to estimating its severity and progression, and the characterization of disease stages plays a vital role in diabetic retinopathy treatment. Diabetic retinopathy is a leading cause of vision loss globally. Wong and Bressler (2016) described the use of deep-learning technology for diabetic retinopathy screening. In this approach, an algorithm is first “trained” on a large data set (n = 128,175 images) and then “tested” on two separate data sets (n = 9963 images and n = 1748 images). The resulting screening software achieved 87%–90% sensitivity and 98% specificity.
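Screening performance figures such as the sensitivity and specificity quoted above are computed from a confusion matrix. The Python sketch below shows that calculation and how moving the decision threshold on a model's output score trades sensitivity against specificity, which is why a single screening algorithm can be reported at more than one operating point. The labels, scores, and thresholds are hypothetical and are not data from the cited studies.

```python
from typing import List, Sequence, Tuple


def predict(scores: Sequence[float], threshold: float) -> List[int]:
    """Call a case positive (1) when its model score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical ground truth (1 = referable retinopathy) and model scores.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05, 0.15]
    for threshold in (0.25, 0.5):
        sens, spec = sensitivity_specificity(y_true, predict(scores, threshold))
        print(f"threshold={threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With these toy numbers, lowering the threshold from 0.5 to 0.25 raises sensitivity from 0.75 to 1.00 while specificity falls from 0.83 to 0.67; a screening program chooses the operating point that best suits its referral capacity.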
In another study, Gulshan et al. (2016) also developed a novel artificial intelligence (AI) algorithm for disease staging in diabetic retinopathy, which helps ophthalmologists to suggest treatments and determine prognoses. A modified GoogLeNet deep-learning neural network was trained on 9939 posterior pole photographs from 2740 diabetic patients. Although the study has a few limitations, the resulting application could be used effectively to help improve vision outcomes in diabetic retinopathy patients.
4.3.2 Drug Repurposing Based on -Omics-Data Mining
Data-mining techniques (machine learning, AI, and statistical analysis) are widely used to discover valuable knowledge in large amounts of data. For instance, Zhang, Luo, Xi, and Rogaeva (2015) used a data-mining approach to retrieve genes, proteins, and metabolites associated with diabetes or impaired glucose metabolism from 16 genome-wide association studies (GWAS), 17 proteomics studies, and 18 metabolomics studies on diabetes. Based on this -omics-data mining, they identified 12 drug targets and 58 potential drugs that could be repurposed for diabetes treatment. They then analyzed gene-expression profiles for these 58 drugs and identified nine drugs (diflunisal, nabumetone, niflumic acid, valdecoxib, phenoxybenzamine, idazoxan, diflorasone, D-cycloserine, and perhexiline) with great potential for diabetes treatment.
4.3.3 Molecular Property Diagnostic Suite
The web-based Molecular Property Diagnostic Suite (MPDS) Galaxy tool is a comprehensive, indigenous, end-to-end online drug-discovery portal.
MPDS is the first drug-discovery workbench to offer an open-source platform on which chemoinformaticians, bioinformaticians, medicinal chemists, computational biologists, pharmacologists, and other scientists can work on the design of effective and safer drugs for a particular disease (Gaur et al., 2017, 2018; Nagamani et al., 2017). The strength of Galaxy lies in its ability to build extensive user-defined workflows. The MPDS Galaxy tools contain three classes of modules, namely, Data Library (modules: (1) Literature, (2) Target library, and (3) Compound library), Data Processing (modules: (4) File format conversion and (5) Descriptor calculation), and Data Analysis (modules: (6) QSAR, (7) Docking, (8) Screening, and (9) Visualization). Although modules 1, 2, 7, and 8 are specific to a particular disease, at least three of these four modules (2, 7, and 8) can be made common across diseases. Further, the Galaxy-based web tool is designed to integrate other software or scripts conveniently. We have developed MPDS web portals for tuberculosis (MPDSTB) (Gaur et al., 2017) and diabetes mellitus (MPDSDM) (Gaur et al., 2018). Currently, we are planning to develop an MPDS web portal for metabolic disorders (Sahoo, Nagamani, Gaur, Muneeswaran, & Sastry, 2018).
4.3.4 Genetics and -Omics Toolkit to Analyse Gene Function
Genetic studies of genotype-to-phenotype (G2P) relationships have been successfully carried out in different human populations (Altshuler, Daly, & Lander, 2008; Williams & Auwerx, 2015). The major limitations of such studies are environmental influences and limited access to relevant deep-tissue samples for mechanistic validation. Alternatively, model organisms (e.g., S. cerevisiae, C. elegans, D. melanogaster, M. musculus, and R. norvegicus) can be used to imitate the complex genetics of human populations; by controlling environmental factors, gene-by-environment (GXE) interactions can be studied for a deeper understanding of tissue modeling at different ages and under different treatments (Cook, Zdraljevic, Roberts, & Andersen, 2017; Williams & Auwerx, 2015). Available genome-wide association resources include GWAS, phenome-wide association studies (PheWAS), and quantitative trait loci (QTL) linkage analyses; however, these studies have not captured the full spectrum of possible relationships between genotypes, intermediate phenotypes, and clinical phenotypes. Exploring this space in humans is a daunting task, since little genome, transcriptome, proteome, and phenome information is available for different populations. To address this issue, Li et al. (2018) collected and deposited clinical phenotypes (5092), metabolites (979), proteins (2622), transcripts (34 tissues), and markers (6800) from the BXD mouse family in order to understand G2P relationships, and they developed an open-source "systems genetics approaches" platform (systems-genetics.org). This multilayered toolkit should speed up the dissection of gene function and help scientists to understand G2P relationships.
4.3.5 Functional Annotation of the Human Genome
Understanding the functional impact of genetic alterations on different biological processes is a challenging task. To address this challenge, different groups have developed numerous databases linking human genes to their model-organism homologs. Wang, Al-Ouran et al. (2017) developed the model organism aggregated resources for rare variant exploration (MARRVEL) database, which provides a single platform for accessing human rare-variant information from other databases (OMIM, ExAC, ClinVar, Geno2MP, DGV, and DECIPHER) together with data from different model organisms. Various experimental results, such as tissue expression, protein subcellular localization, and the molecular function of the human gene, can be accessed on the output page.
Users can access 18 million records through the MARRVEL database to facilitate clinical diagnostics and basic research.
4.3.6 Classification of Common Human Diseases
Wang, Gaitsch, Poon, Cox, and Rzhetsky (2017) conducted a large-scale, family-based, phenotypic-variance analysis of several diseases. They analyzed the familial environmental patterns of 149 diseases and the genetic and environmental correlations for a set of 29 complex diseases. Further, they compared the contributions of environmental and genetic determinants to the phenotypic variances and covariances of a broad range of diseases and transformed these covariances into disease classifications. In addition, they observed a near-linear relationship between total phenotypic correlations and family-based genetic correlations. Based on this study, they found that migraine is associated with dermatitis in the genetic classification and is closely associated with the infections cystitis and urethritis in the environmental classification. They also hypothesized that migraine etiology is closely associated with immune system function. Similarly, although neuropsychiatric diseases stayed in the same stable cluster in both taxonomies, their positions within the cluster varied considerably. According to the ICD-9 taxonomy, depression is closely associated with mood and bipolar disorders, but in the environmental classification, schizophrenia is significantly closer to bipolar and mood disorders than depression is.
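One generic way to turn pairwise disease correlations into a classification is to convert each correlation r into a distance 1 − r and then cluster hierarchically. The Python sketch below uses average-linkage agglomerative clustering; it is an illustration of that general idea, not the procedure of Wang, Gaitsch et al. (2017), and the diseases D1 to D4 and their correlations are hypothetical placeholders.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical pairwise correlations between four placeholder diseases.
NAMES = ["D1", "D2", "D3", "D4"]
CORR: Dict[Tuple[str, str], float] = {
    ("D1", "D2"): 0.80, ("D1", "D3"): 0.10, ("D1", "D4"): 0.20,
    ("D2", "D3"): 0.15, ("D2", "D4"): 0.10, ("D3", "D4"): 0.70,
}


def corr(r: Dict[Tuple[str, str], float], a: str, b: str) -> float:
    """Look up a symmetric correlation stored under either key order."""
    return r[(a, b)] if (a, b) in r else r[(b, a)]


def cluster_by_correlation(names: List[str], r: Dict[Tuple[str, str], float], k: int) -> List[List[str]]:
    """Average-linkage agglomerative clustering on the distance 1 - r, stopping at k clusters."""
    clusters = [[n] for n in names]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Mean pairwise distance between the members of the two clusters.
            dists = [1.0 - corr(r, a, b) for a in clusters[i] for b in clusters[j]]
            d = sum(dists) / len(dists)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index i stays valid after deleting j
        del clusters[j]
    return clusters


if __name__ == "__main__":
    print(cluster_by_correlation(NAMES, CORR, k=2))  # [['D1', 'D2'], ['D3', 'D4']]
```

Running the same clustering separately on genetic and on environmental correlation matrices would yield two taxonomies that can then be compared, in the spirit of the study described above.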
5 IMMUNE METABOLIC INTERACTIONS
5.1 Metabolic System and Immune System
The metabolic and immune systems play fundamental roles in human survival. Evidence shows that metabolic and immune pathways are evolutionarily conserved across species, and a tight link between metabolic homeostasis and the immune system has been observed in most organisms. The immune response varies between individuals and populations, and it changes throughout life. A cluster of chronic metabolic disorders, particularly obesity, type-2 diabetes, and cardiovascular disease, is associated with dysfunction of central homeostatic mechanisms (Sliwinska et al., 2017). The immune system has a major impact on metabolic processes through multiple mechanisms; for example, cytokines such as leptin, IL-6, and TNF-α contribute to the expression of metabolic syndrome.
5.1.1 Leptin
The role of leptin in energy homeostasis is well known, but its role in the inflammatory syndrome caused by abdominal obesity remains unclear. The concentration of leptin is directly proportional to body adiposity, and leptin relays information about the depletion or accumulation of fat stores to the brain (Wisse, 2004). A number of studies have reported relationships among leptin, body fat, and insulin resistance, but the underlying mechanism is not known.
5.1.2 IL-6
Production of IL-6 by adipose tissue rises with increasing adiposity, in the same way as leptin. The concentration of IL-6 is directly related to body fat, glucose level, and insulin resistance. Increased serum IL-6 concentrations have been linked to a net increase in IL-6 secretion from adipose tissue. IL-6 gene polymorphisms are associated with the development and progression of cardiovascular dysfunction and act as indicators of the risk of metabolic syndrome.
IL-6 inhibitors may be used as a treatment for metabolic syndrome (Wisse, 2004).
5.1.3 TNF-α
TNF-α concentration increases with obesity and correlates with insulin resistance. Within adipose tissue, TNF-α causes adipocyte insulin resistance through serine phosphorylation (inactivation) of both the insulin receptor (IR) and IR substrate-1 (IRS-1), which reduces activation of phosphatidylinositol 3-kinase and of the essential second-messenger signal that governs most of insulin's metabolic effects (Wisse, 2004).
5.2 Dietary Restriction in Metabolic Health
Experimental studies in different organisms provide reasonable evidence that fasting or periodic fasting promotes general health and metabolic benefits. Intermittent caloric restriction improved fasting insulin, insulin resistance, high blood pressure, serum cholesterol, and lipid levels in humans (Brandhorst et al., 2015).
Immunomodulatory effects of dietary routines have also been suggested in inflammation models, which demonstrated that intermittent caloric restriction was associated with reduced sickness, decreased circulating levels of IL-6, increased circulating levels of IL-10, and up-regulation of the HPA axis. This dietary regimen is significantly associated with a shift from mitochondrial fatty acid oxidation toward ketogenesis and ketolysis, raises plasma beta-hydroxybutyrate levels, and activates the GPR109A receptor (Rahman et al., 2014).
5.3 Different Molecular Mechanisms Associated With Metabolic Disorders
Understanding the relationship between inflammation and metabolic signals suggests anti-inflammatory strategies that may be useful in treating insulin resistance and metabolic disturbances. Blockade of inflammatory cytokines (TNF-α and IL-1) is one approach to improving metabolic function. Antibody-mediated neutralization of TNF-α, activation of PPAR-γ by thiazolidinediones, and blockade of IL-1β signaling have been investigated, and reductions in HbA1c and fasting glucose levels are the hallmarks of improvement in metabolic disturbances (Donath, 2011). Essential omega-3 fatty acids play an important role in the prevention and treatment of metabolic syndrome: they improved insulin sensitivity and glucose tolerance in metabolic syndrome and reduced the risk of developing cardiovascular disease by lowering plasma triglyceride levels (Iglesia et al., 2016).
6 EPIGENETICS, GENETICS AND TRANSCRIPTOMICS
6.1 Host Epigenetic Response in Immune-Metabolic Interactions
Environmental factors may influence host-gene expression via epigenetics and contribute to the cross-regulation of the immune and metabolic systems. A recent epigenome-wide association study found that inflammatory pathway genes (i.e., TNFRSF4, MAP3K2, and IL5RA) correlate with methylation markers in obese patients (Wahl et al., 2017). Hyperglycemia is a continuous trigger for the epigenetic activation of inflammatory genes (e.g., NF-κB p65) through alterations in histone methylation (Brasacchio et al., 2009). Increased global methylation levels in natural killer (NK) cells have been found in diabetic patients, and increased methylation levels in B cells have been recorded in obese and type-2 diabetic patients. These methylation changes in NK cells and B cells lead to functional alterations in immune cells that correlate with insulin resistance and serve as an indicator of metabolic syndrome. Obesity shifts adipose tissue macrophages (ATMs) from an anti-inflammatory M2 state to a proinflammatory M1-like state; this polarization is driven by DNA methylation in these cells and is supported by DNA methyltransferases 3a and 3b (DNMT3a and DNMT3b). Increased DNMT3b in obese patients, together with saturated fatty acids, leads to M1 polarization (Yang et al., 2014). The microbiome has a major impact on diseases ranging from intestinal disorders to obesity and cancer (Holmes, Wilson, & Nicholson, 2008). Gut bacteria also alter the body's metabolism and immunity through short-chain fatty acids (SCFAs). These SCFAs act as histone deacetylase inhibitors that regulate the expression of immune-related genes and thereby attenuate inflammation. Many SCFAs cross the placenta and are involved in the host epigenetic response in immune-metabolic interactions in the offspring.
6.2 Role of Genetics and Transcriptomics in Metabolic Disorders
The primary nucleotide sequences of human genomes are approximately 99.9% identical; variations occur in only about 0.1% of the sequence. Polymorphisms are the most common variations, and they underlie differences in phenotype, behavior, morphology, and susceptibility to diseases. Polymorphisms include single nucleotide polymorphisms (SNPs), microsatellite and minisatellite repeat sequences, and viral insertions. SNPs are the most common polymorphisms; they are single-base-pair differences in the DNA sequence arising from the substitution, insertion, or deletion of one base pair. Metabolic syndromes arise mainly from genes being switched on and off, excess caloric intake, and genetic factors. Using genetic mapping and positional cloning linked to clinical traits, genetic analysis of metabolic syndrome examines genetic variations, polymorphisms, and mutations underlying single-gene disorders with large effects on metabolic traits. It also analyzes the levels of mRNA transcripts, proteins, and metabolites (Lusis, Attie, & Reue, 2008). Obesity-associated SNPs include three in the FTO (fat mass and obesity associated) gene, one near the TMEM18 (transmembrane protein 18) gene, and one near the MC4R (melanocortin 4 receptor) gene. A large number of genes and SNPs are associated with diabetes, such as SLC16A11 and SLC16A13. In the case of lipid metabolism disorders, more than 150 mutations are associated with the low-density lipoprotein (LDL) receptor gene, apolipoprotein B (APOB), proprotein convertase subtilisin/kexin type 9 (PCSK9), low-density lipoprotein receptor adaptor protein 1 (LDLRAP1), apolipoprotein C2 (APOC2), apolipoprotein A5 (APOA5), lipase maturation factor 1 (LMF1), and glycosylphosphatidylinositol-anchored high-density lipoprotein binding protein 1 (GPIHBP1).
SNPs in patatin-like phospholipase domain containing 3 (PNPLA3) and transmembrane 6 superfamily member 2 (TM6SF2) are associated with nonalcoholic fatty liver disease (NAFLD) (Heindel et al., 2017). Genetic analysis should integrate DNA variation, gene expression, and clinical phenotype data. MicroRNAs (miRNAs) are small noncoding RNAs that play an important role in regulating gene expression at the posttranscriptional stage, through transcript destabilization, translational inhibition, or both. A single miRNA can regulate the expression of hundreds of genes, and a single gene can be regulated by multiple miRNAs. Several miRNAs show altered expression in obesity and metabolic syndrome (Table 2) (Aryal, Singh, Rotllan, Price, & Fernandez-Hernando, 2017). miR-143 and miR-223 act as biomarkers of obesity-induced metabolic changes in humans. miR-122 controls lipid homeostasis in the liver by regulating cholesterol synthesis and lipoprotein secretion, and miR-370 has similar effects on lipid metabolism. miR-375 is associated with the control of pancreatic homeostasis; miR-143, miR-27, and miR-335 are associated with lipid metabolism and adipocyte differentiation; and miR-378 plays an important role in regulating lipid metabolism. miR-33 acts as a posttranscriptional regulator of cholesterol homeostasis, and miR-33a and miR-33b are involved in cholesterol and lipid metabolism
TABLE 2 Role of Different miRNAs in the Alterations of Functions of Different Tissues in Obesity and Metabolic Syndrome Along With Their Targeted Genes
S. No. | miRNAs | Tissue | Function | Target Genes | References
1 | miR-103 and miR-107 | Adipose | Adipocyte differentiation; insulin and glucose homeostasis | PANK1, CAV1, DICER | Trajkovski et al. (2011); Xie, Lim, and Lodish (2009); Martello et al. (2010); Wilfred, Wang, and Nelson (2007)
2 | miR-143 | Adipose, pancreas | (Pre)adipocyte differentiation and insulin resistance | MAPK7, ORP8 (OSBPL8) | Esau et al. (2004); Takanabe et al. (2008); Jordan et al. (2011)
3 | miR-132 | Adipose | Adipocyte proliferation and growth, and insulin resistance | CREB | Klöting et al. (2009)
4 | miR-17-5p | Adipose | Adipocyte clonal expansion and insulin resistance | RBL2 | Klöting et al. (2009); Wang et al. (2008)
5 | miR-29 | Adipose, liver, kidney, muscle | Glucose transport, amino acid metabolism, and insulin resistance | INSIG1, CAV2, BCKDHA | He, Zhu, Gupta, Chang, and Fang (2007); Wang, Gauthier, Hagenfeldt-Johansson, Iezzi, and Wollheim (2002)
6 | miR-122 | Liver | Cholesterol biosynthesis, cellular stress response, and hepatitis C virus replication | PMVK, TRPV6, BCL2L2, CCNG1, HMGCR | Lugli, Larson, Martone, Jones, and Smalheiser (2005); Girard, Jacquemin, Munnich, Lyonnet, and Henrion-Caude (2008)
7 | miR-145 | Colon | Cell proliferation | IRS1 | Shi et al. (2007)
8 | miR-375 | Pancreas | Insulin secretion and pancreatic islet development | MTPN, USP1, JAK2, ADIPOR2 | Gauthier and Wollheim (2006); Stoffel (2017); Krek et al. (2005)
9 | miR-124a | Pancreas | Pancreatic islet development | FOXA2, RAB27A | Tang, Tang, and Ozcan (2008)
10 | miR-9 | Pancreas | Insulin secretion | ONECUT2 | Plaisance et al. (2006)
11 | miR-133 | Heart | Long QT syndrome and cardiac hypertrophy | HERG, RHOA, CDC42, WHSC2 | Tang et al. (2008)
12 | miR-192 | Kidney | Kidney and diabetic nephropathy development | SIP1 | Kato et al. (2007)
(Vienberg, Geiger, Madsen, & Dalgaard, 2017). Several miRNAs are involved in lipid metabolism (miR-33/33*, miR-122, miR-27a/b, miR-378/378*, miR-34a, and miR-21) and in the regulation of fat metabolism (miR-27a and miR-27b). Knowledge of miRNA function in lipid metabolism may also suggest therapeutic strategies for fatty liver disease. The metabolically relevant tissues most affected by miRNAs are pancreatic beta cells, liver, skeletal muscle, heart, white adipose tissue (WAT), and brown adipose tissue (BAT) (Aryal et al., 2017).
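The many-to-many relationship between miRNAs and their targets is naturally stored as a map from each miRNA to its target genes, which can be inverted to list the miRNAs regulating each gene. The Python sketch below transcribes a few rows of Table 2 as listed there (splitting the combined miR-103/miR-107 row into two entries, which is an assumption about how that row should be read) and finds genes that the table attributes to more than one miRNA.

```python
from collections import defaultdict
from typing import Dict, List, Set

# miRNA -> target genes, transcribed from rows 1, 2, 6, and 8 of Table 2.
# Row 1 lists miR-103 and miR-107 together; here they are split into two entries.
MIRNA_TARGETS: Dict[str, List[str]] = {
    "miR-103": ["PANK1", "CAV1", "DICER"],
    "miR-107": ["PANK1", "CAV1", "DICER"],
    "miR-143": ["MAPK7", "ORP8"],
    "miR-122": ["PMVK", "TRPV6", "BCL2L2", "CCNG1", "HMGCR"],
    "miR-375": ["MTPN", "USP1", "JAK2", "ADIPOR2"],
}


def invert(targets: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Turn a miRNA -> genes map into a gene -> miRNAs map."""
    regulators: Dict[str, Set[str]] = defaultdict(set)
    for mirna, genes in targets.items():
        for gene in genes:
            regulators[gene].add(mirna)
    return dict(regulators)


def multiply_regulated(targets: Dict[str, List[str]]) -> List[str]:
    """Genes listed as targets of more than one miRNA, sorted alphabetically."""
    return sorted(gene for gene, mirnas in invert(targets).items() if len(mirnas) > 1)


if __name__ == "__main__":
    print(multiply_regulated(MIRNA_TARGETS))  # ['CAV1', 'DICER', 'PANK1']
    print(sorted(invert(MIRNA_TARGETS)["CAV1"]))  # ['miR-103', 'miR-107']
```

Keeping both directions of the map makes it easy to ask either question raised in the text: which genes a given miRNA may regulate, and which miRNAs converge on a given gene.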
7 STRUCTURAL BIOINFORMATICS
Identifying deleterious SNPs and understanding their function are important for clarifying the role of variants in disease; experimentally, however, this is a difficult task. Alternatively, various computational methods are available to assess the effect of nonsynonymous single nucleotide polymorphisms (nsSNPs) on protein activity and on the 3D structure of the protein. Such predictions help to prioritize variants for further in-vitro and in-vivo analysis (Doss et al., 2008). SNP information combined with the protein's 3D structure provides insight into the variability of drug responses (McCammon, Gelin, & Karplus, 1977).
7.1 Mutation and Single Nucleotide Polymorphism-Based Structural Analysis
There are roughly 300 million validated human variants available in dbSNP build 151 (dbSNP 151 Data Summary, n.d.), so it is unrealistic to study every individual variant in detail. GWAS is a technique that identifies the SNPs important for diseases. In addition, various other tools predict the effects of SNPs on protein function and stability, which helps to filter these datasets further.
7.1.1 Variation Databases
An enormous amount of data has been generated by next-generation sequencing (NGS) projects. dbSNP (Sherry et al., 2001) is a well-known database that stores variations identified by various projects; data from the 1000 Genomes Project, HapMap, and many others have been incorporated into it. dbVar (Lappalainen et al., 2013), dbGaP (Mailman et al., 2007), and ClinVar (Landrum et al., 2016) are variant databases from the National Center for Biotechnology Information (NCBI). OMIM (Hamosh, Scott, Amberger, Bocchini, & McKusick, 2005), GWAS (Bush & Moore, 2012), UniProt (Magrane & UniProt, 2011), Ensembl (Yates et al., 2016), and BioMart (Guberman et al., 2011) are a few of the databases that provide SNP information along with phenotype.
7.1.2 Deleterious Mutation Prediction
Identifying disease-associated SNPs is a major challenge in computational SNP analysis. GWAS (Bush & Moore, 2012) and OMIM (Hamosh et al., 2005) are useful sources for studying disease-associated variation. Numerous tools are available to predict the impact of nonsynonymous SNPs on protein function. These tools fall into two categories: sequence-based predictors, and combined structure- and sequence-based predictors.
SIFT (Ng & Henikoff, 2003), PROVEAN (Choi & Chan, 2015), FATHMM (Shihab et al., 2013), and PANTHER (Mi, Poudel, Muruganujan, Casagrande, & Thomas, 2016) fall into the first category, whereas PolyPhen-2 (Adzhubei, Jordan, & Sunyaev, 2013) and Auto-Mute 2.0 (Masso & Vaisman, 2014) fall into the second. PhD-SNP (Capriotti, Calabrese, & Casadio, 2006) and Parepro (Tian et al., 2007) are machine-learning methods that apply sequence-based support vector machines (SVMs) to predict the functional effect of a variant (Cheng, Randall, & Baldi, 2006). The different databases of SNP variants are listed in Table 3.
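The core intuition behind sequence-based predictors such as SIFT is that a substitution to a residue rarely seen at a conserved alignment position is more likely to be deleterious. The Python sketch below is a deliberately simplified conservation score, not the actual SIFT algorithm (which adds sequence weighting and pseudocounts): it scales the frequency of the substituted residue by that of the most frequent residue in the column and applies the 0.05 cutoff conventionally used by SIFT. The alignment is hypothetical.

```python
from collections import Counter
from typing import List

TOLERANCE_CUTOFF = 0.05  # SIFT's conventional cutoff, reused here for a simplified score

# Hypothetical alignment of a query protein (first row) with three homologs.
ALIGNMENT: List[str] = ["MKTAYIAK", "MKTAFIAK", "MRTAYIAK", "MKSAYLAK"]


def column(alignment: List[str], pos: int) -> List[str]:
    """Residues observed at a 0-based alignment position, ignoring gap characters."""
    return [seq[pos] for seq in alignment if seq[pos] != "-"]


def conservation_score(alignment: List[str], pos: int, substitute: str) -> float:
    """Frequency of the substituted residue, scaled so the most common residue scores 1.0."""
    counts = Counter(column(alignment, pos))
    if not counts:
        raise ValueError("alignment column contains only gaps")
    return counts[substitute] / max(counts.values())


def predict(alignment: List[str], pos: int, substitute: str) -> str:
    """Call a substitution tolerated or deleterious from the simplified score."""
    score = conservation_score(alignment, pos, substitute)
    return "tolerated" if score >= TOLERANCE_CUTOFF else "deleterious"


if __name__ == "__main__":
    print(predict(ALIGNMENT, 3, "P"))  # deleterious: position 3 is always A
    print(predict(ALIGNMENT, 4, "F"))  # tolerated: F already occurs at position 4
```

With only four sequences every unseen residue scores zero, which is why real tools rely on deep alignments and pseudocounts before drawing conclusions.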
TABLE 3 Public Domain Databases Describing Different Genetic Variation and Mutation Along With Their URLs
S. No. | Database and URL | Description | References
1 | dbSNP http://www.ncbi.nlm.nih.gov/projects/SNP/ | Single nucleotide polymorphism database (dbSNP), a public-domain archive of genetic polymorphisms | Sherry et al. (2001)
2 | GWAS https://www.ebi.ac.uk/gwas/ | Genome-wide set of disease-related associations with genetic variants | Bush and Moore (2012)
3 | OMIM https://www.omim.org/ | Human genes, genetic disorders, and genotype-phenotype relationships | Hamosh et al. (2005)
4 | Ensembl https://asia.ensembl.org/index.html | Comprehensive biological database including automatic annotation of SNPs | Birney (2006)
5 | HGMD http://www.hgmd.cf.ac.uk/ac/index.php | Gene lesions responsible for human inherited disease | Stenson et al. (2013)
6 | HGVD http://www.hgvd.genome.med.kyoto-u.ac.jp/ | Japanese genetic variation and associations between the variation and gene transcription levels | Higasa et al. (2016)
7 | Human mutation analysis (HUMA) https://huma.rubi.ru.ac.za/ | Analysis of genetic variation in humans | Brown and Bishop (2017)
8 | UniProt http://www.uniprot.org/ | Protein database including nsSNPs | Apweiler (2004)
9 | VnD http://vnd.kobic.re.kr:8080/VnD/ | Variation and drugs database | Yang et al. (2011)
10 | dbGaP https://www.ncbi.nlm.nih.gov/gap | Database of genotype and phenotype relationships | Mailman et al. (2007)
11 | ClinVar https://www.ncbi.nlm.nih.gov/clinvar/ | Medically important variants and phenotypes | Landrum et al. (2016)
12 | PharmGKB https://www.pharmgkb.org/ | Impact of human genetic variation on drug response | Thorn, Ji, Weinshilboum, Altman, and Klein (2012)
13 | GenAtlas http://genatlas.medecine.univ-paris5.fr/ | Gene mutations and their consequences on diseases | Frezal (1998)
14 | GeneCards http://www.genecards.org/ | Comprehensive database of annotated and predicted genes | Safran et al. (2010)
15 | ExPASy Molecular Biology server https://www.expasy.org/ | Protein database with extensive variant annotations | Artimo et al. (2012)
7.1.3 Role of Structural Bioinformatics in Single Nucleotide Polymorphism Analysis
Structural bioinformatics analyzes the structure, movement, and interactions of biological macromolecules in 3D space. It plays a vital role in the pharmaceutical industry, having been used in all stages of the drug-discovery process (Blundell et al., 2006; Cavasotto & Phatak, 2009; Chou, 2015; Kapetanovic, 2008; Taboureau, Baell, Fernandez-Recio, & Villoutreix, 2012), where its methods can complement and sometimes replace costlier experimental techniques (Congreve, Murray, & Blundell, 2005; Durrant & McCammon, 2011; Scapin, 2006). Mutations have been associated with drug resistance in numerous metabolic disorders (Kantardjieff & Rupp, 2004); similarly, mutations can also be linked to drug sensitivity among patients. Analyzing drug-resistance and drug-sensitivity SNPs along with nsSNPs at the individual level will help in designing novel drugs and will give precise information on a particular disease (Yang et al., 2011).
7.2 Protein Secondary Structure Prediction
A large number of protein sequences are available in the UniProt database, but far fewer experimentally determined structures are available. According to current UniProt (Universal Protein Resource) statistics, 557,491 annotated/reviewed sequences are available, and 141,209 structures are available in the Protein Data Bank (PDB) (Berman et al., 2000). Although both numbers are increasing, the available structures amount to only about one-quarter of the available sequences. Different molecular modeling approaches are used to predict protein structure and help to reduce the gap between sequences and structures, which is especially helpful for the computer-aided drug design (CADD) process when structures are not available.
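The "one-quarter" figure follows directly from the two counts quoted above. Note that PDB entries and UniProt sequences are not matched one-to-one (a well-studied protein can have many PDB entries, and one entry can contain several proteins), so this ratio is a rough comparison of counts rather than the fraction of sequences with a solved structure.

```latex
\frac{141{,}209\ \text{PDB structures}}{557{,}491\ \text{reviewed UniProt sequences}} \approx 0.253 \approx \frac{1}{4}
```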
Protein secondary structure provides insight into the molecular function of the protein. Experimental structure determination of membrane proteins is an arduous task, and homology modeling is a potential alternative for predicting the structure of membrane proteins based on sequence similarity (Reddy, Vijayasarathy, Srinivas, Sastry, & Sastry, 2006). A detailed understanding of protein structure and function aids CADD (Kantardjieff & Rupp, 2004). Two kinds of protein-structure prediction methods are available, namely, template-based modeling and ab initio (de novo) methods. Template-based modeling can be further divided into homology modeling and protein threading. Homology modeling predicts a structure from a solved template that shares more than 30% sequence similarity with the target (the safe zone). When sequence similarity is between 20% and 30% (the twilight zone), protein threading can be used, and when it is between 10% and 20% (the midnight zone), ab initio methods can be used for protein structure prediction. Protein threading relies on the observation that proteins with little sequence similarity can share the same fold, whereas ab initio modeling performs a conformational search guided by an energy function (Peng & Xu, 2010).
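The similarity zones described above amount to a simple decision rule. The Python sketch below encodes it; how the exact boundary values (30% and 20%) are assigned, and the choice to fall back to ab initio modeling below 10% (a range the text does not discuss), are assumptions made for illustration.

```python
def suggest_modeling_method(percent_similarity: float) -> str:
    """Map template sequence similarity (%) to the modeling zones described in the text."""
    if not 0.0 <= percent_similarity <= 100.0:
        raise ValueError("similarity must be a percentage between 0 and 100")
    if percent_similarity > 30.0:
        return "homology modeling"  # safe zone (>30%)
    if percent_similarity >= 20.0:
        return "protein threading"  # twilight zone (20-30%)
    # Midnight zone (10-20%) and below: assume template-free ab initio modeling.
    return "ab initio"


if __name__ == "__main__":
    for similarity in (45.0, 25.0, 15.0):
        print(similarity, suggest_modeling_method(similarity))
```

In practice the zone boundaries are soft, and modelers usually also weigh alignment coverage and template quality before choosing a method.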
7.3 Molecular Docking and Virtual Screening
Molecular docking predicts the bound conformations of a protein-ligand complex and is mainly used in structure-based drug design to gain a detailed understanding of biomolecular interactions. Docking whole libraries of compounds into the active site of a receptor protein is called virtual screening; it scans compound libraries to identify potential candidates by estimating the binding affinity of each compound for the receptor. The impact of SNPs on drug response can also be assessed using molecular docking: a mutation in the active site can change the binding affinity of a drug, which may lead to drug resistance. Docking gives the binding conformation of the drug molecule in the protein active site and thus provides clues for predicting the effects of SNPs on drug responses (Blair et al., 2015).
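The two uses of docking described above, ranking a library and comparing wild-type with mutant binding, can be sketched in a few lines of Python. The sketch assumes, as a loud simplification, that a docking score can be read as an approximate binding free energy ΔG in kcal/mol; real docking scores correlate only loosely with measured affinities. Under that assumption, ΔG = RT ln Kd gives the fold change in the dissociation constant caused by a mutation. All compound names and scores are hypothetical.

```python
import math
from typing import Dict, List, Tuple

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

# Hypothetical docking scores (kcal/mol) for a three-compound library.
SCORES: Dict[str, float] = {"cmpd_1": -7.2, "cmpd_2": -9.0, "cmpd_3": -8.1}


def rank_library(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Virtual screening step: order compounds from most to least favorable (most negative) score."""
    return sorted(scores.items(), key=lambda item: item[1])


def kd_fold_change(dg_wild_type: float, dg_mutant: float, temp_k: float = 298.15) -> float:
    """Kd(mutant) / Kd(wild type), assuming dG = RT ln Kd.

    Values above 1 mean the drug binds the mutant more weakly."""
    return math.exp((dg_mutant - dg_wild_type) / (R_KCAL * temp_k))


if __name__ == "__main__":
    print(rank_library(SCORES))  # cmpd_2 first, then cmpd_3, then cmpd_1
    # A hypothetical active-site mutation weakens the top hit from -9.0 to -7.6 kcal/mol,
    # roughly a 10-fold loss of affinity, the kind of change linked to drug resistance.
    print(f"{kd_fold_change(-9.0, -7.6):.1f}")
```

The exponential relation explains why a seemingly small score difference of about 1.4 kcal/mol corresponds to an order-of-magnitude change in affinity at room temperature.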
7.4 Molecular Dynamics Simulation
Protein-structure prediction and molecular docking provide a snapshot of a protein structure or a protein-ligand complex, whereas molecular dynamics (MD) simulates the movements and trajectories of all the atoms in a structure over time. MD simulation can help to identify SNPs that destabilize the protein structure or affect protein folding and the stability of protein-ligand complexes (Kumar & Purohit, 2014). In Fig. 5 we show computational protocols for predicting the effects of mutations on metabolic disorders. SNPs can also affect the internal energy and the structure of the protein. The difference in Gibbs free energy between the wild-type and mutant protein is a common measure of the effect of a mutation on stability. Changes in protein stability due to nsSNPs can be predicted with various tools, as listed in Table 4. I-Mutant 2.0 (Capriotti et al., 2006) and SNPeffect (Baets et al., 2011) are two methods for predicting protein stability, whereas SIFT (Ng & Henikoff, 2003) predicts functional effects from sequence conservation.
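The stability measure mentioned above is usually reported as ΔΔG, the difference between the folding stabilities of the mutant and the wild type. Sign conventions differ between tools, so the Python sketch below fixes one explicitly: ΔG is taken as the unfolding free energy (larger means more stable), ΔΔG = ΔG(mutant) − ΔG(wild type), and a negative ΔΔG therefore means the mutation destabilizes the protein. The ±0.5 kcal/mol neutral band and the example energies are illustrative assumptions.

```python
def ddg(dg_unfold_wild_type: float, dg_unfold_mutant: float) -> float:
    """ddG = dG(mutant) - dG(wild type), using unfolding free energies in kcal/mol."""
    return dg_unfold_mutant - dg_unfold_wild_type


def classify_stability(ddg_value: float, neutral_band: float = 0.5) -> str:
    """Label a mutation; with this sign convention, negative ddG means less stable."""
    if ddg_value < -neutral_band:
        return "destabilizing"
    if ddg_value > neutral_band:
        return "stabilizing"
    return "neutral"


if __name__ == "__main__":
    # Hypothetical unfolding free energies for a wild-type protein and an nsSNP variant.
    change = ddg(dg_unfold_wild_type=5.2, dg_unfold_mutant=3.9)
    print(f"ddG = {change:.1f} kcal/mol -> {classify_stability(change)}")
```

When comparing outputs from different predictors, it is worth checking each tool's documented sign convention before applying a threshold like this one.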
FIG. 5 Computational protocols for predicting the effects of mutations on metabolic disorders.
TABLE 4 Tools for Predicting the Functional Effect of Nonsynonymous SNPs
S. No. | Tool and URL | Description | References
1 | SIFT http://sift.jcvi.org/ | Prediction based on the degree of conservation of amino acid residues in sequence alignments derived from closely related sequences collected through PSI-BLAST | Ng and Henikoff (2003)
2 | PolyPhen-2 http://genetics.bwh.harvard.edu/pph2/ | Predicts the possible impact of an amino acid substitution on the structure and function of a human protein | Adzhubei et al. (2013)
3 | SNPs&GO https://snps-and-go.biocomp.unibo.it/snps-and-go/ | Predicts single-point protein mutations likely to be involved in the onset of diseases in humans | Capriotti, Calabrese et al. (2013)
4 | I-Mutant http://folding.biofold.org/imutant/ | Predicts protein stability changes upon single-point mutation from the protein sequence and structure | Capriotti et al. (2006)
5 | SNPeffect http://snpeffect.switchlab.org/ | Focuses primarily on the molecular characterization and annotation of disease and polymorphism variants in the human proteome | Baets et al. (2011)
6 | MutDB http://www.mutdb.org/ | A database for assessing the impact of genetic variants | Singh et al. (2007)
7 | SNAP http://www.bio-sof.com/snap | Predicts the functionality of mutated proteins from protein sequences and lists of mutants | Bromberg, Yachdav, and Rost (2008)
8 | MAPPER http://bio.chip.org/mapper/mapper-top | Identifies putative transcription factor binding sites (TFBSs) using hidden Markov models built from alignments of known sites | Marinescu (2004)
9 | Meta-SNP http://snps.biofold.org/meta-snp/ | Meta-predictor of disease-causing variants | Capriotti, Altman, and Bromberg (2013)
10 | MutPred http://mutpred.mutdb.org/ | Classifies an amino acid substitution as disease-associated or neutral in humans | Mort et al. (2014)
11 | PROVEAN http://provean.jcvi.org/ | Filters sequence variants to identify nonsynonymous or indel variants that are predicted to be functionally important | Choi and Chan (2015)
7.5 Importance of Modeling, Informatics, Simulation, and Data Analytics in Drug Discovery
Molecular modeling, informatics, MD simulation, and various data analytics methods have revolutionized research in the life sciences and in the drug discovery and development process. CADD began in the 1980s, when an article was published on the correlation between specific physicochemical properties and the efficacy of known inhibitors (Moreau & Broto, 1980). CADD has become an indispensable tool in lead finding and optimization because it helps to address a wide range of issues. Examples include studying the selectivity and specificity of different families of inhibitors (Badrinarayan & Sastry, 2014), modeling cell permeability (Janardhan, Vivek, & Sastry, 2016), target selection and ADME property prediction based on scaffolds (Choudhury, Priyakumar, & Sastry, 2016), and combining these techniques for new lead identification (Janardhan, John, Prasanthi, Poroikov, & Sastry, 2017). CADD, together with high-throughput synthesis and screening, is very helpful for filtering the most promising molecules from a large number of candidates within reasonable cost and time. The specificity and efficacy of drugs toward their targets are essential properties to consider during drug discovery and development. Structural bioinformatics analysis, molecular docking, and MD studies help to identify conformational variations and the role of noncovalent interactions, and these approaches are prioritized in the design of different classes of inhibitors (Badrinarayan & Sastry, 2013; Badrinarayan & Sastry, 2014).
Docking-based virtual screening (Badrinarayan & Sastry, 2011a, 2011b, 2012; Kulkarni, Srivani, Achaiah, & Sastry, 2007; Reddy et al., 2006; Reddy, Pati, Kumar, Pradeep, & Sastry, 2007), dynamic pharmacophore modeling (Choudhury, Priyakumar, & Sastry, 2014, 2015), and QSAR (Bohari, Srivastava, & Sastry, 2011; Ravindra, Achaiah, & Sastry, 2008; Srivastava, Choudhury, & Sastry, 2012) have also proved useful for lead optimization.
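QSAR models of the kind cited above relate biological activity to computed molecular descriptors, the same idea as correlating physicochemical properties with inhibitor efficacy. As a minimal sketch, assuming a classical Hansch-style linear model fitted by ordinary least squares, the code below regresses a hypothetical activity (e.g., pIC50) on two hypothetical descriptors. The descriptor choice, the toy data, and the helper names `fit_qsar` and `predict` are illustrative assumptions, not the methods of the cited studies.

```python
"""Minimal Hansch-style QSAR sketch: activity = b0 + b1*x1 + ... fitted by least squares."""
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: descriptors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_qsar(descriptors: Sequence[Sequence[float]], activity: Sequence[float]) -> List[float]:
    """Return [intercept, coef_1, ..., coef_k] from the normal equations (X^T X) b = X^T y."""
    X = [[1.0, *row] for row in descriptors]  # prepend an intercept column
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, activity)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs: Sequence[float], descriptors: Sequence[float]) -> float:
    """Apply a fitted linear QSAR model to one compound's descriptors."""
    return coefs[0] + sum(c * d for c, d in zip(coefs[1:], descriptors))


if __name__ == "__main__":
    # Hypothetical training set: (logP, molar refractivity) per compound.
    train_x = [[1.0, 20.0], [2.0, 25.0], [3.0, 22.0], [4.0, 30.0], [2.5, 28.0]]
    train_y = [predict([0.5, 0.8, 0.1], x) for x in train_x]  # noise-free toy activities
    coefs = fit_qsar(train_x, train_y)
    print([round(c, 3) for c in coefs])
    print(round(predict(coefs, [3.5, 24.0]), 3))
```

A production QSAR workflow would add descriptor selection, cross-validation, and applicability-domain checks; the normal-equations solve is used here only because it is short and dependency-free.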
Combined with advances in high-performance computing (HPC), CADD opens new avenues for drug discovery and moves the process towards personalized medicine. The market for drug-discovery information and associated skills was expected to grow substantially by 2022, and many organizations are already investing in databases and software development, clinical trials, and next-generation sequencing tools. The development of informatics tools and data-science expertise is changing current drug-discovery and development approaches. Informatics and data analytics are essential for managing and analyzing the data (e.g., patterns and biomarkers) generated in the biomedical sciences, whether at the population level or from individual observations. The future of drug discovery therefore relies on big-data, data-mining, and machine-learning approaches, which help with feature selection, classification, image processing, and related tasks. Machine-learning algorithms are used to monitor the effects of drugs on patients, which supports the development of safer and more effective drugs in less time than traditional methods. Clustering methods such as clustering using representatives (CURE) (Rajaram & Das, 2010) can be very helpful for partitioning large datasets into multiple groups. Fig. 6 shows the importance of modeling, informatics, and AI in health care and drug discovery.
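CURE represents each cluster by a few well-scattered points shrunk toward the cluster centroid, and it repeatedly merges the two clusters whose representative points are closest; this lets it recover clusters that are not simply spherical. The sketch below is a small, quadratic-time toy under stated assumptions: the parameter values (`c` representatives per cluster, shrink factor `alpha`) and the 2-D data are hypothetical, and it omits the random sampling and partitioning that the full algorithm uses to scale to large datasets.

```python
"""Toy CURE-style agglomerative clustering (clustering using representatives)."""
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _centroid(points: Sequence[Point]) -> Point:
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def _representatives(points: Sequence[Point], c: int, alpha: float) -> List[Point]:
    """Choose up to c well-scattered points, then shrink each toward the centroid by alpha."""
    center = _centroid(points)
    chosen: List[int] = []
    while len(chosen) < min(c, len(points)):
        remaining = [i for i in range(len(points)) if i not in chosen]

        def spread(i: int) -> float:
            # First pick: farthest from the centroid; later picks: farthest from those chosen.
            if not chosen:
                return math.dist(points[i], center)
            return min(math.dist(points[i], points[j]) for j in chosen)

        chosen.append(max(remaining, key=spread))
    return [tuple(x + alpha * (m - x) for x, m in zip(points[i], center)) for i in chosen]


def cure(points: Sequence[Point], k: int, c: int = 4, alpha: float = 0.3) -> List[int]:
    """Merge clusters until k remain; cluster distance = closest pair of representatives."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    members: List[List[int]] = [[i] for i in range(len(points))]
    reps: List[List[Point]] = [[tuple(p)] for p in points]
    while len(members) > k:
        best = (math.inf, 0, 1)
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                d = min(math.dist(p, q) for p in reps[a] for q in reps[b])
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = members[a] + members[b]
        del members[b], reps[b]  # b > a, so index a stays valid
        members[a] = merged
        reps[a] = _representatives([points[i] for i in merged], c, alpha)
    labels = [0] * len(points)
    for label, cluster in enumerate(members):
        for i in cluster:
            labels[i] = label
    return labels


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors (e.g., two descriptors per compound).
    data = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.7), (10.0, 10.0), (10.4, 9.8), (9.7, 10.3)]
    print(cure(data, k=2))
```

Shrinking the representatives toward the centroid (the `alpha` step) is what makes CURE less sensitive to outliers than single-linkage clustering, while keeping several representatives preserves information about cluster shape.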
FIG. 6 Importance of modeling, informatics, and artificial intelligence in health care and drug discovery.
8 DRUG REPURPOSING
Drug repurposing (also called drug repositioning) aims to identify new therapeutic uses for existing FDA-approved drugs or abandoned pharmacotherapies. Because approved drugs have already passed the preclinical and clinical trial phases, they carry a lower risk of toxic effects in humans. Compared with the traditional drug-development process, drug repurposing needs less development time, is cost-effective, and carries a lower risk of failure (Gligorijevic, Malod-Dognin, & Przulj, 2016). The traditional drug-discovery and development process takes 10–17 years, whereas drug repurposing reduces this timeline to 3–12 years. Repurposing can revive withdrawn drugs and can reveal new indications or new therapeutic uses for an approved drug (Lotfi, Ghadiri, Mousavi, Varshosaz, & Green, 2017). It is also a good approach for getting more therapeutic value out of existing drugs while reducing the risk of failure (Jin & Wong, 2014; Ye, Liu, & Wei, 2014). Analysis of FDA-approved drugs, including docking studies of these drugs with their targets (Bohari & Sastry, 2012), is a useful strategy for drug repurposing. Beyond finding new targets for known drugs, drug repurposing also helps predict dose, dosing frequency, side effects, and bioactivity profiles (Murtazalieva, Druzhilovskiy, Goel, Sastry, & Poroikov, 2017). Big data provides new opportunities for correlating drugs with diseases and diseases with other diseases, and for revealing the complexity of drug mechanisms of action. Understanding the drug-target, target-disease, drug-drug, drug-disease, and disease-disease relationships plays an important role in drug repurposing. Computational methods integrated with big-data approaches have been shown to pave the way for drug repurposing in health care and medicine (Gligorijevic et al., 2016).
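The drug-target and target-disease relationships listed above can be chained to propose repurposing hypotheses. A minimal sketch, assuming a toy knowledge graph with hypothetical drug, target, and disease names: each drug-disease pair that is not already a known indication is scored by how many of the drug's targets are linked to that disease. Real pipelines would weight the edges and add drug-drug and disease-disease similarity; this only illustrates the path-counting idea.

```python
"""Toy drug -> target -> disease path counting for repurposing hypotheses."""
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def candidate_indications(
    drug_targets: Dict[str, Set[str]],
    target_diseases: Dict[str, Set[str]],
    known_indications: Dict[str, Set[str]],
) -> List[Tuple[str, str, int]]:
    """Rank unknown (drug, disease) pairs by the number of targets linking them."""
    support: Dict[Tuple[str, str], int] = defaultdict(int)
    for drug, targets in drug_targets.items():
        already_known = known_indications.get(drug, set())
        for target in targets:
            for disease in target_diseases.get(target, set()):
                if disease not in already_known:
                    # Each target connecting the drug to the disease adds one unit of evidence.
                    support[(drug, disease)] += 1
    ranked = [(drug, disease, n) for (drug, disease), n in support.items()]
    # Most-supported pairs first; ties broken alphabetically for a stable order.
    return sorted(ranked, key=lambda row: (-row[2], row[0], row[1]))


if __name__ == "__main__":
    # Hypothetical toy knowledge graph; all names are placeholders, not real data.
    drug_targets = {"drug_A": {"T1", "T2"}, "drug_B": {"T2"}}
    target_diseases = {"T1": {"disease_X"}, "T2": {"disease_X", "disease_Y"}}
    known = {"drug_A": {"disease_Y"}}
    for row in candidate_indications(drug_targets, target_diseases, known):
        print(row)
```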
8.1 Different Methods for Drug Repurposing
Several kinds of methods are available for drug repurposing, namely drug-, pathway- or network-, target-, knowledge-, phenome-, and signature-based methods. In drug-based methods, either drug similarity or clinical side effects can serve as the basis of analysis for identifying new therapeutic uses of an existing drug. For example, the Predicting Drug IndicaTions (PREDICT) algorithm takes a set of known drug-disease associations as input and ranks candidate associations by their similarity to the known ones in order to identify new, previously unknown associations. The steps of the algorithm are (1) construction of drug-drug and disease-disease similarity measures, (2) use of these similarities to score candidate drug-disease associations, and (3) prediction of new associations. Another hypothesis is based on side effects, which arise when a drug binds to off-targets instead of its intended target: drugs with similar side-effect profiles may also share therapeutic properties and mechanisms of action (Gottlieb, Stein, Ruppin, & Sharan, 2011). Based on the similarity of the side effects of FDA-approved drugs, a drug-drug network can be constructed, and new indications can be predicted by analyzing neighboring drugs in the network. Network-based approaches mainly focus on identifying novel drug targets and new indications for existing drugs. They provide insight into drug-disease interactions based on existing knowledge; however, they cannot by themselves reveal a drug's mechanism of action (Hurle et al., 2013). Several network-based approaches explore networks of drugs, diseases, and targets to reposition drugs for new targets. Drug-target relations can be understood by combining drug-drug, drug-target, drug-disease, and disease-gene relations (Lotfi et al., 2017).
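PREDICT, as described above, scores a candidate drug-disease pair by how closely it resembles known associations. The sketch below is a simplified, hypothetical version: it uses a single drug-drug similarity (Jaccard overlap of side-effect sets, following the side-effect hypothesis) and a caller-supplied disease-disease similarity, and takes the maximum geometric mean over known associations as the score. The published method combines several similarity measures and a trained classifier, which are omitted here; all drug names, diseases, and similarity values are placeholders.

```python
"""Simplified PREDICT-style scoring using side-effect similarity (toy sketch)."""
import math
from typing import Dict, Iterable, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two side-effect sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def disease_similarity(sim: Dict[Tuple[str, str], float], s1: str, s2: str) -> float:
    """Symmetric lookup of a disease-disease similarity; identical diseases score 1.0."""
    if s1 == s2:
        return 1.0
    return sim.get((s1, s2), sim.get((s2, s1), 0.0))


def score_candidates(
    side_effects: Dict[str, Set[str]],
    disease_sim: Dict[Tuple[str, str], float],
    known: Set[Tuple[str, str]],
    diseases: Iterable[str],
) -> List[Tuple[str, str, float]]:
    """score(d, s) = max over known (d2, s2) of sqrt(sim_drug(d, d2) * sim_disease(s, s2))."""
    diseases = list(diseases)
    results = []
    for drug in side_effects:
        for disease in diseases:
            if (drug, disease) in known:
                continue  # only unknown pairs are candidates
            best = 0.0
            for d2, s2 in known:
                drug_sim = jaccard(side_effects[drug], side_effects.get(d2, set()))
                best = max(best, math.sqrt(drug_sim * disease_similarity(disease_sim, disease, s2)))
            results.append((drug, disease, best))
    # Highest score first; ties broken alphabetically.
    return sorted(results, key=lambda r: (-r[2], r[0], r[1]))


if __name__ == "__main__":
    side_effects = {
        "drug_A": {"nausea", "headache", "dizziness"},
        "drug_B": {"nausea", "headache"},
        "drug_C": {"rash"},
    }
    disease_sim = {("disease_X", "disease_Y"): 0.5}
    known = {("drug_A", "disease_X")}
    for row in score_candidates(side_effects, disease_sim, known, ["disease_X", "disease_Y"]):
        print(row)
```

In this toy run, `drug_B` ranks first for `disease_X` because its side-effect profile overlaps strongly with `drug_A`, which is already associated with `disease_X`; this is the neighbor-in-the-network reasoning described above.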
8.2 Repurposed Drugs for Metabolic Disorders
Many substances have the potential to be reused as therapeutics for new indications, but only a few compounds have been approved for metabolic disorders. Repositioned drugs can be identified either by finding new targets for known compounds or new indications for known targets. The most common approach is to rescreen existing drugs against new targets to identify possible therapeutic benefits or side effects (Oo & Rusch, 2016). Repurposed drugs for metabolic disorders can be classified into three groups:
1. Both the original and the repurposed indications concern metabolic disorders.
2. The original indication was a nonmetabolic disorder, but after repurposing the drug is beneficial in metabolic disorders.
3. The original indication was a metabolic disorder, and the drug has been repurposed against nonmetabolic disorders.
Examples of repurposed drugs whose original and repurposed indications both concern metabolic disorders (Finsterer & Frank, 2013) include the following:
1. Tetrahydrobiopterin was originally developed for phenylketonuria and was repurposed for tetrahydrobiopterin deficiency.
2. Ubiquinone was originally approved for primary ubiquinone deficiency and was repurposed for statin myopathy.
3. Colesevelam was initially approved to reduce low-density lipoprotein cholesterol (LDL-C) levels and was later repurposed for type 2 diabetes.
4. Pioglitazone was initially approved as an antidiabetic and was repurposed for oxidative stress.
One drug that was originally approved for a metabolic disorder and repurposed for a nonmetabolic disorder is phenylbutyrate (Finsterer & Frank, 2013), which was originally approved for urea-cycle disorders and was repurposed for progressive familial intrahepatic cholestasis type II.
Some examples of repurposed drugs that were originally approved for nonmetabolic diseases and repurposed for metabolic diseases include the following (Finsterer & Frank, 2013):
1. Imatinib, dasatinib, and nilotinib were initially approved for cancer but were repurposed for diabetes.
2. Milnacipran and duloxetine were initially approved as antidepressants but were repurposed for fibromyalgia.
3. Cicletanine was initially approved for pulmonary hypertension and was repurposed for diabetes.
4. Pramipexole was initially approved for parkinsonism and was repurposed as an antioxidant.
5. Tetracyclines were initially approved as antibiotics and were repurposed for oxidative stress.
6. Quetiapine and olanzapine were initially approved as neuroleptics and were repurposed as antioxidants.
7. Fluoxetine was initially approved as an antidepressant and was repurposed as an antioxidant.
8. NSAIDs (nonsteroidal anti-inflammatory drugs) were initially approved as analgesics and were repurposed as antioxidants.
9. Amitriptyline and nortriptyline were initially approved as antidepressants and were repurposed as antiapoptotics.
10. MAO (monoamine oxidase) inhibitors and ladostigil were initially approved for parkinsonism and were repurposed as antiapoptotics.
11. Lithium was initially approved as an antidepressant and was repurposed as an antiapoptotic.
12. Apomorphine was initially approved for parkinsonism and was repurposed as an antioxidant.
Thousands of metabolic disorders are known, but for most of them the exact cause, the underlying disease mechanisms, and the mode of action of their medications remain unknown. The large amount of available data on metabolic processes can be used in data-driven CADD and drug-development processes. Knowledge-based analysis of information on FDA-approved drugs, and on compounds that have passed or failed clinical trials, can be useful for drug repurposing in metabolic disorders. Studies that examine the relationship between human metabotypes and disease risk factors in the general population are known as metabolome-wide association studies (MWAS). The first step of a metabolomic experiment is to quantify all the metabolites in a cellular system, which involves several challenges: extracting the metabolites, coping with their complexity and heterogeneity, and measuring them (Goodacre, Vaidyanathan, Dunn, Harrigan, & Kell, 2004). Several metabolome databases provide information on the metabolites present in humans and other organisms (Table 5).
TABLE 5 Different Metabolome Databases, Their Description, and the Number of Molecules Present in Each Database
Name of the Database | URL | Description | Number of Molecules | Reference
Human Metabolome Database (HMDB) | http://www.hmdb.ca/ | Information about small-molecule metabolites found in the human body | 114,100 | Wishart et al. (2007)
Yeast Metabolome Database (YMDB) | http://www.ymdb.ca/ | Small-molecule metabolites found in or produced by Saccharomyces cerevisiae | 16,042 | Jewison et al. (2011)
Mouse Multiple Tissue Metabolome Database (MMMDB) | http://mmdb.iab.keio.ac.jp/ | A collection of metabolites measured in multiple tissues from single mice | Metabolites from 11 different tissues | Sugimoto et al. (2011)
Escherichia coli Metabolome Database (ECMDB) | http://ecmdb.ca/ | Metabolomic data and metabolic pathway diagrams for Escherichia coli | 3755 | Guo et al. (2012)
Pseudomonas aeruginosa Metabolome Database (PAMDB) | http://pseudomonas.umaryland.edu/ | Metabolomic data and metabolic pathway diagrams for Pseudomonas aeruginosa | >4370 | Huang et al. (2017)
Livestock Metabolome Database (LMDB) | http://lmdb.ca/ | A collection of small-molecule metabolites found in different livestock species | 1070 | Goldansaz et al. (2017)
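An MWAS tests each measured metabolite for association with a phenotype or risk factor and then corrects for the large number of tests performed. A minimal sketch, assuming a case-control design, Welch's two-sample statistic with a large-sample normal approximation, and a Bonferroni correction; the metabolite names and intensities are hypothetical, and real studies would also adjust for covariates such as age and sex.

```python
"""Toy metabolome-wide association scan (MWAS): one test per metabolite plus Bonferroni."""
import math
from statistics import mean, variance
from typing import Dict, List, Sequence, Tuple


def welch_statistic(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Welch's two-sample statistic: difference in means divided by its standard error."""
    se = math.sqrt(variance(cases) / len(cases) + variance(controls) / len(controls))
    if se == 0:
        raise ValueError("both groups have zero variance")
    return (mean(cases) - mean(controls)) / se


def mwas(
    levels: Dict[str, Tuple[Sequence[float], Sequence[float]]], alpha: float = 0.05
) -> List[Tuple[str, float, bool]]:
    """Return (metabolite, two-sided p-value, significant?) sorted by p-value.

    Assumption: the statistic is treated as standard normal, which is reasonable
    only for fairly large groups; small studies should use the t distribution
    or a permutation test instead.
    """
    threshold = alpha / len(levels)  # Bonferroni correction across all metabolites
    rows = []
    for name, (cases, controls) in levels.items():
        z = welch_statistic(cases, controls)
        p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
        rows.append((name, p, p < threshold))
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    # Hypothetical intensities as (cases, controls) for two metabolites.
    data = {
        "metabolite_1": ([5.1, 5.4, 4.9, 5.6, 5.2, 5.3, 5.0, 5.5],
                         [3.0, 3.2, 2.9, 3.1, 3.3, 2.8, 3.0, 3.1]),
        "metabolite_2": ([2.0, 2.5, 3.0, 2.2, 2.8, 2.4, 2.6, 2.1],
                         [2.0, 2.5, 3.0, 2.2, 2.8, 2.4, 2.6, 2.1]),
    }
    for row in mwas(data):
        print(row)
```

Bonferroni is the simplest family-wise correction; with thousands of metabolites, a false-discovery-rate procedure is often preferred because Bonferroni becomes very conservative.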
9 OUTLOOK
Modern medical science focuses both on the treatment of diseases and on understanding the relationship between disease and health. Metabolic disorders present some of the most important challenges among noncommunicable diseases, and understanding their pathophysiology is interesting in its own right. A major gap in this regard is the complexity of the drug-action mechanisms in these diseases and the lack of available drugs for treatment. As a result, patients often receive only palliative care, because effective therapeutic treatment is elusive, not definitive, or too expensive. Efforts to obtain an atomistic understanding of the metabolic pathways are imperative for making meaningful progress in unraveling the pathophysiology of metabolic disorders, and computational approaches involving molecular modeling and data-driven methods appear to be of profound importance in these efforts. While efforts to discover new therapeutic agents are certainly important, the drug-repurposing strategy should also be used effectively. We expect drug-repurposing strategies to be practical and cost-effective, and they should therefore be thoroughly explored for many metabolic disorders. Data analytics has shown great promise in tackling this problem because of its ability to analyze large-scale health-care data, which provides a deeper understanding of these disorders. We feel that the search for effective therapeutic solutions to metabolic disorders will rely heavily on drug repurposing, and that molecular modeling and data-analytics approaches are indispensable to these efforts. In an era of continually increasing life spans, ageing and metabolic disorders pose some of the major challenges to health care, and we believe that data-driven approaches will become indispensable in the near future.
References
Adzhubei, I., Jordan, D. M., & Sunyaev, S. R. (2013). Predicting functional effect of human missense mutations using PolyPhen-2. Current Protocols in Human Genetics, Chapter 7, Unit 7.20. https://dx.doi.org/10.1002/0471142905.hg0720s76.
Alexander, C. A., & Wang, L. (2017). Big data in healthcare: a new frontier in personalized medicine. Open Access Journal of Translational Medicine & Research, 1(1), 15–18. https://dx.doi.org/10.15406/oajtmr.2017.01.00005.
Altshuler, D., Daly, M. J., & Lander, E. S. (2008). Genetic mapping in human disease. Science, 322(5903), 881–888. https://dx.doi.org/10.1126/science.1156409.
Alyass, A., Turcotte, M., & Meyre, D. (2015). From big data analysis to personalized medicine for all: challenges and opportunities. BMC Medical Genomics, 8, 33. https://dx.doi.org/10.1186/s12920-015-0108-y.
Apweiler, R. (2004). UniProt: the universal protein knowledgebase. Nucleic Acids Research, 32(90001). https://dx.doi.org/10.1093/nar/gkh131.
Aronson, S. J., & Rehm, H. L. (2015). Building the foundation for genomics in precision medicine. Nature, 526(7573), 336–342. https://dx.doi.org/10.1038/nature15816.
Artimo, P., Jonnalagedda, M., Arnold, K., Baratin, D., Csardi, G., Castro, E. D., & Stockinger, H. (2012). ExPASy: SIB bioinformatics resource portal. Nucleic Acids Research, 40(W1). https://dx.doi.org/10.1093/nar/gks400.
Aryal, B., Singh, A. K., Rotllan, N., Price, N., & Fernandez-Hernando, C. (2017). MicroRNAs and lipid metabolism. Current Opinion in Lipidology, 28(3), 273–280. https://dx.doi.org/10.1097/MOL.0000000000000420.
Auton, A., Brooks, L. D., Durbin, R. M., Garrison, E. P., Kang, H. M., & Abecasis, G. R. (2015). A global reference for human genetic variation. Nature, 526(7571), 68–74. https://dx.doi.org/10.1038/nature15393.
Badrinarayan, P., & Sastry, G. N. (2011a). Virtual high throughput screening in new lead identification. Combinatorial Chemistry and High Throughput Screening, 14(10), 840–860. https://dx.doi.org/10.2174/138620711797537102.
Badrinarayan, P., & Sastry, G. N. (2011b). Sequence, structure, and active site analyses of p38 MAP kinase: exploiting DFG-out conformation as a strategy to design new type II leads. Journal of Chemical Information and Modeling, 51(1), 115–129. https://dx.doi.org/10.1021/ci100340w.
Badrinarayan, P., & Sastry, G. N. (2012). Virtual screening filters for the design of type II p38 MAP kinase inhibitors: a fragment based library generation approach. Journal of Molecular Graphics and Modeling, 34, 89–100. https://dx.doi.org/10.1016/j.jmgm.2011.12.009.
Badrinarayan, P., & Sastry, G. N. (2013). Rational approaches towards lead optimization of kinase inhibitors: the issue of specificity. Current Pharmaceutical Design, 19(26), 4714–4738. https://dx.doi.org/10.2174/1381612811319260005.
Badrinarayan, P., & Sastry, G. N. (2014). Specificity rendering 'hot-spots' for aurora kinase inhibitor design: the role of non-covalent interactions and conformational transitions. PLoS One, 9(12). https://dx.doi.org/10.1371/journal.pone.0113773.
Baets, G. D., Durme, J. V., Reumers, J., Maurer-Stroh, S., Vanhee, P., Dopazo, J., & Rousseau, F. (2011). SNPeffect 4.0: on-line prediction of molecular and structural effects of protein-coding variants. Nucleic Acids Research, 40(D1). https://dx.doi.org/10.1093/nar/gkr996.
Belle, A., Thiagarajan, R., Soroushmehr, S. M., Navidi, F., Beard, D. A., & Najarian, K. (2015). Big data analytics in healthcare. BioMed Research International, 2015, 370194. https://dx.doi.org/10.1155/2015/370194.
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., & Bourne, P. E. (2000). The protein data bank. Nucleic Acids Research, 28(1), 235–242.
Birney, E. (2006). Ensembl 2006. Nucleic Acids Research, 34(90001). https://dx.doi.org/10.1093/nar/gkj133.
Blair, J. M., Bavro, V. N., Ricci, V., Modi, N., Cacciotto, P., Kleinekathöfer, U., & Piddock, L. J. (2015). AcrB drug-binding pocket substitution confers clinically relevant resistance and altered substrate specificity. Proceedings of the National Academy of Sciences of the United States of America, 112(11), 3511–3516. https://dx.doi.org/10.1073/pnas.1419939112.
Blundell, T. L., Sibanda, B. L., Montalvao, R. W., Brewerton, S., Chelliah, V., Worth, C. L., & Burke, D. (2006). Structural biology and bioinformatics in drug design: opportunities and challenges for target identification and lead discovery. Philosophical Transactions of the Royal Society B: Biological Sciences, 361(1467), 413–423. https://dx.doi.org/10.1098/rstb.2005.1800.
Boelens, J. J., & Wynn, R. F. (2017). Clinical manual of blood and bone marrow transplantation (1st ed.), Chapter 31. Wiley. https://dx.doi.org/10.1002/9781119095491.ch31.
Bohari, M. H., & Sastry, G. N. (2012). FDA approved drugs complexed to their targets: evaluating pose prediction accuracy of docking protocols. Journal of Molecular Modeling, 18(9), 4263–4274. https://dx.doi.org/10.1007/s00894-012-1416-1.
Bohari, M., Srivastava, H., & Sastry, G. (2011). Analogue-based approaches in anti-cancer compound modelling: the relevance of QSAR models. Organic and Medicinal Chemistry Letters, 1(1), 3. https://dx.doi.org/10.1186/2191-2858-1-3.
Bohra, A., & Bhateja, S. (2015). Oral aspects of metabolic disorders. Journal of Metabolic Syndrome, 4(3), 1–4. https://dx.doi.org/10.4172/2167-0943.1000178.
Brandhorst, S., Choi, I. Y., Wei, M., Cheng, C. W., Sedrakyan, S., Navarrete, G., & Longo, V. D. (2015). A periodic diet that mimics fasting promotes multi-system regeneration, enhanced cognitive performance, and healthspan. Cell Metabolism, 22(1), 86–99. https://dx.doi.org/10.1016/j.cmet.2015.05.012.
Brasacchio, D., Okabe, J., Tikellis, C., Balcerczyk, A., George, P., Baker, E. K., … El-Osta, A. (2009). Hyperglycemia induces a dynamic cooperativity of histone methylase and demethylase enzymes associated with gene-activating epigenetic marks that coexist on the lysine tail. Diabetes, 58(5), 1229–1236. https://dx.doi.org/10.2337/db08-1666.
Bromberg, Y., Yachdav, G., & Rost, B. (2008). SNAP predicts effect of mutations on protein function. Bioinformatics, 24(20), 2397–2398. https://dx.doi.org/10.1093/bioinformatics/btn435.
Brown, D. K., & Bishop, Ö. T. (2017). HUMA: a platform for the analysis of genetic variation in humans. Human Mutation, 39(1), 40–51. https://dx.doi.org/10.1002/humu.23334.
Burnett, J. R. (2007). Sapropterin dihydrochloride (Kuvan/phenoptin), an orally active synthetic form of BH4 for the treatment of phenylketonuria. IDrugs, 10(11), 805–813.
Burton, B. K., Balwani, M., Feillet, F., Baric, I., Burrow, T. A., Camarena Grande, C., & Quinn, A. G. (2015). A phase 3 trial of sebelipase alfa in lysosomal acid lipase deficiency. New England Journal of Medicine, 373(11), 1010–1020. https://dx.doi.org/10.1056/NEJMoa1501365.
Bush, W. S., & Moore, J. H. (2012). Chapter 11: Genome-wide association studies. PLoS Computational Biology, 8(12), e1002822. https://dx.doi.org/10.1371/journal.pcbi.1002822.
Capriotti, E., Altman, R. B., & Bromberg, Y. (2013). Collective judgment predicts disease-associated single nucleotide variants. BMC Genomics, 14(Suppl. 3), S2. https://dx.doi.org/10.1186/1471-2164-14-S3-S2.
Capriotti, E., Calabrese, R., & Casadio, R. (2006). Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information. Bioinformatics, 22(22), 2729–2734. https://dx.doi.org/10.1093/bioinformatics/btl423.
Capriotti, E., Calabrese, R., Fariselli, P., Martelli, P., Altman, R. B., & Casadio, R. (2013). WS-SNPs&GO: a web server for predicting the deleterious effect of human protein variants using functional annotation. BMC Genomics, 14(Suppl. 3), S6. https://dx.doi.org/10.1186/1471-2164-14-s3-s6.
Cavasotto, C. N., & Phatak, S. S. (2009). Homology modeling in drug discovery: current trends and applications. Drug Discovery Today, 14(13–14), 676–683. https://dx.doi.org/10.1016/j.drudis.2009.04.006.
Cheng, J., Randall, A., & Baldi, P. (2006). Prediction of protein stability changes for single-site mutations using support vector machines. Proteins, 62(4), 1125–1132. https://dx.doi.org/10.1002/prot.20810.
Choi, Y., & Chan, A. P. (2015). PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels. Bioinformatics, 31(16), 2745–2747. https://dx.doi.org/10.1093/bioinformatics/btv195.
Chou, K. C. (2015). Impacts of bioinformatics to medicinal chemistry. Medicinal Chemistry, 11(3), 218–234.
Choudhury, C., Priyakumar, U. D., & Sastry, G. N. (2014). Molecular dynamics investigation of the active site dynamics of mycobacterial cyclopropane synthase during various stages of the cyclopropanation process. Journal of Structural Biology, 187(1), 38–48. https://dx.doi.org/10.1016/j.jsb.2014.04.007.
Choudhury, C., Priyakumar, U. D., & Sastry, G. N. (2015). Dynamics based pharmacophore models for screening potential inhibitors of mycobacterial cyclopropane synthase. Journal of Chemical Information and Modeling, 55(4), 848–860. https://dx.doi.org/10.1021/ci500737b.
Choudhury, C., Priyakumar, U. D., & Sastry, G. N. (2016). Structural and functional diversities of the hexadecahydro-1H-cyclopenta[a]phenanthrene framework, a ubiquitous scaffold in steroidal hormones. Molecular Informatics, 35(3–4), 145–157. https://dx.doi.org/10.1002/minf.201600005.
Congreve, M., Murray, C. W., & Blundell, T. L. (2005). Structural biology and drug discovery. Drug Discovery Today, 10(13), 895–907. https://dx.doi.org/10.1016/S1359-6446(05)03484-7.
Cook, D. E., Zdraljevic, S., Roberts, J. P., & Andersen, E. C. (2017). CeNDR, the Caenorhabditis elegans natural diversity resource. Nucleic Acids Research, 45(D1), D650–D657. https://dx.doi.org/10.1093/nar/gkw893.
Craven, M., & Page, C. D. (2015). Big data in healthcare: opportunities and challenges. Big Data, 3(4), 209–210. https://dx.doi.org/10.1089/big.2015.29001.mcr.
DbSNP 151 Data Summary. (n.d.). Available from: https://www.ncbi.nlm.nih.gov/projects/SNP/snp_summary.cgi. October 6, 2017.
Donath, M. Y. (2011). Inflammation as a sensor of metabolic stress in obesity and type 2 diabetes. Endocrinology, 152(11), 4005–4006. https://dx.doi.org/10.1210/en.2011-1691.
Doss, C. G. P., Sudandiradoss, C., Rajasekaran, R., Choudhury, P., Sinha, P., Hota, P., … Rao, S. (2008). Applications of computational algorithm tools to identify functional SNPs. Functional & Integrative Genomics, 8(4), 309–316. https://dx.doi.org/10.1007/s10142-008-0086-7.
Durrant, J. D., & McCammon, J. A. (2011). Molecular dynamics simulations and drug discovery. BMC Biology, 9(71), 1–9. https://dx.doi.org/10.1186/1741-7007-9-71.
Eckel, R. H., Alberti, K. G., Grundy, S. M., & Zimmet, P. Z. (2010). The metabolic syndrome. Lancet, 375(9710), 181–183. https://dx.doi.org/10.1016/S0140-6736(09)61794-3.
Esau, C., Kang, X., Peralta, E., Hanson, E., Marcusson, E. G., Ravichandran, L. V., & Griffey, R. (2004). MicroRNA-143 regulates adipocyte differentiation. Journal of Biological Chemistry, 279(50), 52361–52365. https://dx.doi.org/10.1074/jbc.c400438200.
Fernandez-Marmiesse, A., Gouveia, S., & Couce, M. L. (2018). NGS technologies as a turning point in rare disease research, diagnosis and treatment. Current Medicinal Chemistry, 25(3), 404–432. https://dx.doi.org/10.2174/0929867324666170718101946.
Finsterer, J., & Frank, M. (2013). Repurposed drugs in metabolic disorders. Current Topics in Medicinal Chemistry, 13(18), 2386–2394.
Frezal, J. (1998). Genatlas database, genes and development defects. Comptes Rendus de l'Académie des Sciences, Série III, Sciences de la Vie, 321(10), 805–817. https://dx.doi.org/10.1016/s0764-4469(99)80021-3.
Gaur, A. S., Bhardwaj, A., Sharma, A., John, L., Vivek, M. R., Tripathi, N., … Sastry, G. N. (2017). Assessing therapeutic potential of molecules: molecular property diagnostic suite for tuberculosis (MPDSTB). Journal of Chemical Sciences, 129(5), 515–531. https://dx.doi.org/10.1007/s12039-017-1268-4.
Gaur, A. S., Nagamani, S., Tanneeru, K., Druzhilovskiy, D., Rudik, A., Poroikov, V., & Sastry, G. N. (2018). Molecular property diagnostic suite for diabetes mellitus (MPDSDM): an integrated web portal for drug discovery and drug repurposing. Journal of Biomedical Informatics, 152(11), 4005–4006.
Gauthier, B. R., & Wollheim, C. B. (2006). MicroRNAs: ribo-regulators of glucose homeostasis. Nature Medicine, 12(1), 36–38. https://dx.doi.org/10.1038/nm0106-36.
Ghosh, A., Schlecht, H., Heptinstall, L. E., Bassett, J. K., Cartwright, E., Bhaskar, S. S., & Banka, S. (2017). Diagnosing childhood-onset inborn errors of metabolism by next-generation sequencing. Archives of Disease in Childhood, 102(11), 1019–1029. https://dx.doi.org/10.1136/archdischild-2017-312738.
Girard, M., Jacquemin, E., Munnich, A., Lyonnet, S., & Henrion-Caude, A. (2008). MiR-122, a paradigm for the role of microRNAs in the liver. Journal of Hepatology, 48(4), 648–656. https://dx.doi.org/10.1016/j.jhep.2008.01.019.
Gligorijevic, V., Malod-Dognin, N., & Przulj, N. (2016). Integrative methods for analyzing big data in precision medicine. Proteomics, 16(5), 741–758. https://dx.doi.org/10.1002/pmic.201500396.
Goldansaz, S. A., Guo, A. C., Sajed, T., Steele, M. A., Plastow, G. S., & Wishart, D. S. (2017). Livestock metabolomics and the livestock metabolome: a systematic review. PLoS One, 12(5). https://dx.doi.org/10.1371/journal.pone.0177675.
Goodacre, R., Vaidyanathan, S., Dunn, W. B., Harrigan, G. G., & Kell, D. B. (2004). Metabolomics by numbers: acquiring and understanding global metabolite data. Trends in Biotechnology, 22(5), 245–252. https://dx.doi.org/10.1016/j.tibtech.2004.03.007.
Gottlieb, A., Stein, G. Y., Ruppin, E., & Sharan, R. (2011). PREDICT: a method for inferring novel drug indications with application to personalized medicine. Molecular Systems Biology, 7, 496. https://dx.doi.org/10.1038/msb.2011.26.
Guberman, J. M., Ai, J., Arnaiz, O., Baran, J., Blake, A., Baldock, R., & Kasprzyk, A. (2011). BioMart central portal: an open database network for the biological community. Database (Oxford), 2011, bar041. https://dx.doi.org/10.1093/database/bar041.
Guest, F. L., & Guest, P. L. (2018). Point-of-care testing and personalized medicine for metabolic disorders. In Investigations of early nutrition effects on long-term health: methods and applications. Methods in Molecular Biology, 1735, [Chapter 6].
Gulshan, V., Peng, L., Coram, M., Stumpe, M. C., Wu, D., Narayanaswamy, A., & Webster, D. R. (2016). Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. Journal of the American Medical Association, 316(22), 2402–2410. https://dx.doi.org/10.1001/jama.2016.17216.
Guo, A. C., Jewison, T., Wilson, M., Liu, Y., Knox, C., Djoumbou, Y., & Wishart, D. S. (2012). ECMDB: the E. coli metabolome database. Nucleic Acids Research, 41(D1). https://dx.doi.org/10.1093/nar/gks992.
Hamosh, A., Scott, A. F., Amberger, J. S., Bocchini, C. A., & McKusick, V. A. (2005). Online Mendelian inheritance in man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Research, 33(Database issue), D514–D517. https://dx.doi.org/10.1093/nar/gki033.
Hand, D. J. (2015). Statistics and computing: the genesis of data science. Statistics and Computing, 25(4), 705–711. https://dx.doi.org/10.1007/s11222-015-9565-6.
He, A., Zhu, L., Gupta, N., Chang, Y., & Fang, F. (2007). Overexpression of micro ribonucleic acid 29, highly up-regulated in diabetic rats, leads to insulin resistance in 3T3-L1 adipocytes. Molecular Endocrinology, 21(11), 2785–2794. https://dx.doi.org/10.1210/me.2007-0167.
Heal, D. J., Gosden, J., & Smith, S. L. (2009). Regulatory challenges for new drugs to treat obesity and comorbid metabolic disorders. British Journal of Clinical Pharmacology, 68(6), 861–874. https://dx.doi.org/10.1111/j.1365-2125.2009.03549.x.
Heindel, J. J., Blumberg, B., Cave, M., Machtinger, R., Mantovani, A., Mendez, M. A., & Saal, F. V. (2017). Metabolism disrupting chemicals and metabolic disorders. Reproductive Toxicology, 68, 3–33. https://dx.doi.org/10.1016/j.reprotox.2016.10.001.
Higasa, K., Miyake, N., Yoshimura, J., Okamura, K., Niihori, T., Saitsu, H., & Matsuda, F. (2016). Human genetic variation database, a reference database of genetic variations in the Japanese population. Journal of Human Genetics, 61(6), 547–553. https://dx.doi.org/10.1038/jhg.2016.12.
Holmes, E., Wilson, I. D., & Nicholson, J. K. (2008). Metabolic phenotyping in health and disease. Cell, 134(5), 714–717. https://dx.doi.org/10.1016/j.cell.2008.08.026.
Huang, W., Brewer, L. K., Jones, J. W., Nguyen, A. T., Marcu, A., Wishart, D. S., & Wilks, A. (2017). PAMDB: A comprehensive Pseudomonas aeruginosa metabolome database. Nucleic Acids Research, 46(D1). https://dx.doi.org/10.1093/nar/gkx1061.
Hurle, M. R., Yang, L., Xie, Q., Rajpal, D. K., Sanseau, P., & Agarwal, P. (2013). Computational drug repositioning: from data to therapeutics. Clinical Pharmacology & Therapeutics, 93(4), 335–341. https://dx.doi.org/10.1038/clpt.2013.1.
Iglesia, R. D., Loria-Kohen, V., Zulet, M., Martinez, J., Reglero, G., & Molina, A. R. (2016). Dietary strategies implicated in the prevention and treatment of metabolic syndrome. International Journal of Molecular Sciences, 17(11), 1877. https://dx.doi.org/10.3390/ijms17111877.
Issa, N. T., Byers, S. W., & Dakshanamurthy, S. (2014). Big data: the next frontier for innovation in therapeutics and healthcare. Expert Review of Clinical Pharmacology, 7(3), 293–298. https://dx.doi.org/10.1586/17512433.2014.905201.
Janardhan, S., John, L., Prasanthi, M., Poroikov, V. V., & Sastry, G. N. (2017). A QSAR and molecular modelling study towards new lead finding: polypharmacological approach to Mycobacterium tuberculosis. SAR and QSAR in Environmental Research, 28(10), 815–832. https://dx.doi.org/10.1080/1062936x.2017.1398782.
Janardhan, S., Vivek, M. R., & Sastry, G. N. (2016). Modeling the permeability of drug-like molecules through the cell wall of Mycobacterium tuberculosis: an analogue based approach. Molecular BioSystems, 12(11), 3377–3384. https://dx.doi.org/10.1039/c6mb00457a.
Janssen, I., Katzmarzyk, P. T., & Ross, R. (2004). Waist circumference and not body mass index explains obesity-related health risk. American Journal of Clinical Nutrition, 79(3), 379–384.
Jewison, T., Knox, C., Neveu, V., Djoumbou, Y., Guo, A. C., Lee, J., & Wishart, D. S. (2011). YMDB: the yeast metabolome database. Nucleic Acids Research, 40(D1). https://dx.doi.org/10.1093/nar/gkr916.
Jin, G., & Wong, S. T. (2014). Toward better drug repositioning: prioritizing and integrating existing methods into efficient pipelines. Drug Discovery Today, 19(5), 637–644. https://dx.doi.org/10.1016/j.drudis.2013.11.005.
Jordan, S. D., Krüger, M., Willmes, D. M., Redemann, N., Wunderlich, F. T., Brönneke, H. S., & Brüning, J. C. (2011). Obesity-induced overexpression of miRNA-143 inhibits insulin-stimulated AKT activation and impairs glucose metabolism. Nature Cell Biology, 13(4), 434–446. https://dx.doi.org/10.1038/ncb2211.
Jung, U. J., & Choi, M. S. (2014). Obesity and its metabolic complications: the role of adipokines and the relationship between obesity, inflammation, insulin resistance, dyslipidemia and nonalcoholic fatty liver disease. International Journal of Molecular Sciences, 15(4), 6184–6223. https://dx.doi.org/10.3390/ijms15046184.
Kantardjieff, K., & Rupp, B. (2004). Structural bioinformatic approaches to the discovery of new antimycobacterial drugs. Current Pharmaceutical Design, 10(26), 3195–3211.
Kapetanovic, I. M. (2008). Computer-aided drug discovery and development (CADDD): in silico-chemico-biological approach. Chemico-Biological Interactions, 171(2), 165–176. https://dx.doi.org/10.1016/j.cbi.2006.12.006.
Kato, M., Zhang, J., Wang, M., Lanting, L., Yuan, H., Rossi, J. J., & Natarajan, R. (2007). MicroRNA-192 in diabetic kidney glomeruli and its function in TGF-beta-induced collagen expression via inhibition of E-box repressors. Proceedings of the National Academy of Sciences, 104(9), 3432–3437. https://dx.doi.org/10.1073/pnas.0611192104.
Klöting, N., Berthold, S., Kovacs, P., Schön, M. R., Fasshauer, M., Ruschke, K., & Blüher, M. (2009). MicroRNA expression in human omental and subcutaneous adipose tissue. PLoS One, 4(3), e4699. https://dx.doi.org/10.1371/journal.pone.0004699.
Krek, A., Grün, D., Poy, M. N., Wolf, R., Rosenberg, L., Epstein, E. J., & Rajewsky, N. (2005). Combinatorial microRNA target predictions. Nature Genetics, 37(5), 495–500. https://dx.doi.org/10.1038/ng1536.
Kulkarni, R. G., Srivani, P., Achaiah, G., & Sastry, G. N. (2007). Strategies to design pyrazolyl urea derivatives for p38 kinase inhibition: a molecular modeling study. Journal of Computer-Aided Molecular Design, 21(4), 155–166. https://dx.doi.org/10.1007/s10822-006-9092-9.
Kumar, A., & Purohit, R. (2014). Use of long term molecular dynamics simulation in predicting cancer associated SNPs. PLoS Computational Biology, 10(4), e1003318. https://dx.doi.org/10.1371/journal.pcbi.1003318.
Landrum, M. J., Lee, J. M., Benson, M., Brown, G., Chao, C., Chitipiralla, S., & Maglott, D. R. (2016). ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Research, 44(D1), D862–D868. https://dx.doi.org/10.1093/nar/gkv1222.
2. THEORETICAL BACKGROUND AND METHODOLOGIES
7. DATA SCIENCE DRIVEN DRUG REPURPOSING FOR METABOLIC DISORDERS
Lappalainen, I., Lopez, J., Skipper, L., Hefferon, T., Spalding, J. D., Garner, J., & Church, D. M. (2013). dbVar and DGVa: public archives for genomic structural variation. Nucleic Acids Research, 41(Database issue), D936–D941. https://dx.doi.org/10.1093/nar/gks1213.
Lehninger, A. L., Nelson, D. L., & Cox, M. M. (2000). Lehninger principles of biochemistry. New York: Worth [Chapter 14].
Li, F. (2015). Data-driven biomarker and drug discovery using network-based approach. Journal of Genetics and Genome Research, 2(2). https://dx.doi.org/10.23937/2378-3648/1410020.
Li, H., Wang, X., Rukina, D., Huang, Q., Lin, T., Sorrentino, V., & Auwerx, J. (2018). An integrated systems genetics and omics toolkit to probe gene function. Cell Systems, 6(1), 90–102.e4. https://dx.doi.org/10.1016/j.cels.2017.10.016.
Lindegren, M. L., Krishnaswami, S., Reimschisel, T., Fonnesbeck, C., Sathe, N. A., & McPheeters, M. L. (2013). A systematic review of BH4 (sapropterin) for the adjuvant treatment of phenylketonuria. JIMD Reports, 8, 109–119. https://dx.doi.org/10.1007/8904_2012_168.
Lotfi, S. M., Ghadiri, N., Mousavi, S. R., Varshosaz, J., & Green, J. R. (2017). A review of network-based approaches to drug repositioning. Briefings in Bioinformatics. https://dx.doi.org/10.1093/bib/bbx017.
Lugli, G., Larson, J., Martone, M. E., Jones, Y., & Smalheiser, N. R. (2005). Dicer and eIF2c are enriched at postsynaptic densities in adult mouse brain and are modified by neuronal activity in a calpain-dependent manner. Journal of Neurochemistry, 94(4), 896–905. https://dx.doi.org/10.1111/j.1471-4159.2005.03224.x.
Lusis, A. J., Attie, A. D., & Reue, K. (2008). Metabolic syndrome: from epidemiology to systems biology. Nature Reviews Genetics, 9(11), 819–830. https://dx.doi.org/10.1038/nrg2468.
Magrane, M., & UniProt Consortium. (2011). UniProt knowledgebase: a hub of integrated protein data. Database (Oxford), 2011, bar009. https://dx.doi.org/10.1093/database/bar009.
Mailman, M. D., Feolo, M., Jin, Y., Kimura, M., Tryka, K., Bagoutdinov, R., & Sherry, S. T. (2007). The NCBI dbGaP database of genotypes and phenotypes. Nature Genetics, 39(10), 1181–1186. https://dx.doi.org/10.1038/ng1007-1181.
Manzoni, C., Kia, D. A., Vandrovcova, J., Hardy, J., Wood, N. W., Lewis, P. A., & Ferrari, R. (2018). Genome, transcriptome and proteome: the rise of omics data and their integration in biomedical sciences. Briefings in Bioinformatics, 19(2), 286–302. https://dx.doi.org/10.1093/bib/bbw114.
Marinescu, V. D. (2004). The MAPPER database: a multi-genome catalog of putative transcription factor binding sites. Nucleic Acids Research, 33(Database issue). https://dx.doi.org/10.1093/nar/gki103.
Martello, G., Rosato, A., Ferrari, F., Manfrin, A., Cordenonsi, M., Dupont, S., … Piccolo, S. (2010). A MicroRNA targeting dicer for metastasis control. Cell, 141(7), 1195–1207. https://dx.doi.org/10.1016/j.cell.2010.05.017.
Masso, M., & Vaisman, I. I. (2014). AUTO-MUTE 2.0: a portable framework with enhanced capabilities for predicting protein functional consequences upon mutation. Advances in Bioinformatics, 2014, 278385. https://dx.doi.org/10.1155/2014/278385.
McCammon, J. A., Gelin, B. R., & Karplus, M. (1977). Dynamics of folded proteins. Nature, 267(5612), 585–590.
Mi, H., Poudel, S., Muruganujan, A., Casagrande, J. T., & Thomas, P. D. (2016). PANTHER version 10: expanded protein families and functions, and analysis tools. Nucleic Acids Research, 44(D1), D336–D342. https://dx.doi.org/10.1093/nar/gkv1194.
Moreau, G., & Broto, P. (1980). The auto-correlation of a topological structure: a new molecular descriptor. Nouveau Journal de Chimie (New Journal of Chemistry), 4, 359–360.
Mort, M., Sterne-Weiler, T., Li, B., Ball, E. V., Cooper, D. N., Radivojac, P., & Mooney, S. D. (2014). MutPred splice: machine learning-based prediction of exonic variants that disrupt splicing. Genome Biology, 15(1), R19. https://dx.doi.org/10.1186/gb-2014-15-1-r19.
Muntau, A. C., Roschinger, W., Habich, M., Demmelmair, H., Hoffmann, B., Sommerhoff, C. P., & Roscher, A. A. (2002). Tetrahydrobiopterin as an alternative treatment for mild phenylketonuria. New England Journal of Medicine, 347(26), 2122–2132. https://dx.doi.org/10.1056/NEJMoa021654.
Murtazalieva, K. A., Druzhilovskiy, D. S., Goel, R. K., Sastry, G. N., & Poroikov, V. V. (2017). How good are publicly available web services that predict bioactivity profiles for drug repurposing? SAR and QSAR in Environmental Research, 28(10), 843–862. https://dx.doi.org/10.1080/1062936X.2017.1399448.
Nagamani, S., Gaur, A. S., Tanneeru, K., Muneeswaran, G., Madugula, S. S., Consortium, M., … Sastry, G. N. (2017). Molecular property diagnostic suite (MPDS): Development of disease-specific open source web portals for drug discovery. SAR and QSAR in Environmental Research, 28(11), 913–926.
National Institutes of Health Consensus Development Portal. (2001). National Institutes of Health Consensus Development Conference Statement: phenylketonuria: screening and management, October 16–18, 2000. Pediatrics, 108(4), 972–982.
Ng, P. C., & Henikoff, S. (2003). SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Research, 31(13), 3812–3814.
Oo, C., & Rusch, L. M. (2016). A personal perspective of orphan drug development for rare diseases: a golden opportunity or an unsustainable future? Journal of Clinical Pharmacology, 56(3), 257–259. https://dx.doi.org/10.1002/jcph.599.
Peng, J., & Xu, J. (2010). Low-homology protein threading. Bioinformatics, 26(12), i294–i300. https://dx.doi.org/10.1093/bioinformatics/btq192.
Pi-Sunyer, F. X. (1993). Medical hazards of obesity. Annals of Internal Medicine, 119(7_Part_2), 655. https://dx.doi.org/10.7326/0003-4819-119-7_part_2-199310011-00006.
Plaisance, V., Abderrahmani, A., Perret-Menoud, V., Jacquemin, P., Lemaigre, F., & Regazzi, R. (2006). MicroRNA-9 controls the expression of granuphilin/Slp4 and the secretory response of insulin-producing cells. Journal of Biological Chemistry, 281(37), 26932–26942. https://dx.doi.org/10.1074/jbc.m601225200.
Raghupathi, W., & Raghupathi, V. (2014). Big data analytics in healthcare: promise and potential. Health Information Science and Systems, 2, 3. https://dx.doi.org/10.1186/2047-2501-2-3.
Rahman, M., Muhammad, S., Khan, M. A., Chen, H., Ridder, D. A., Müller-Fielitz, H., et al. (2014). The β-hydroxybutyrate receptor HCA2 activates a neuroprotective subset of macrophages. Nature Communications, 5, 3944. https://dx.doi.org/10.1038/ncomms4944.
Rajaram, T., & Das, A. (2010). Modeling of interactions among sustainability components of an agro-ecosystem using local knowledge through cognitive mapping and fuzzy inference system. Expert Systems With Applications, 37(2), 1734–1744. https://dx.doi.org/10.1016/j.eswa.2009.07.035.
Ravindra, G., Achaiah, G., & Sastry, G. (2008). Molecular modeling studies of phenoxypyrimidinyl imidazoles as p38 kinase inhibitors using QSAR and docking. European Journal of Medicinal Chemistry, 43(4), 830–838. https://dx.doi.org/10.1016/j.ejmech.2007.06.009.
Reddy, A. S., Pati, S. P., Kumar, P. P., Pradeep, H., & Sastry, G. N. (2007). Virtual screening in drug discovery: a computational perspective. Current Protein & Peptide Science, 8(4), 329–351. https://dx.doi.org/10.2174/138920307781369427.
Reddy, C. S., Vijayasarathy, K., Srinivas, E., Sastry, G. M., & Sastry, G. N. (2006). Homology modeling of membrane proteins: a critical assessment. Computational Biology and Chemistry, 30(2), 120–126. https://dx.doi.org/10.1016/j.compbiolchem.2005.12.002.
Safran, M., Dalah, I., Alexander, J., Rosen, N., Stein, T. I., Shmoish, M., & Lancet, D. (2010). GeneCards version 3: the human gene integrator. Database, 2010, baq020. https://dx.doi.org/10.1093/database/baq020.
Sahoo, R., Nagamani, S., Gaur, A. S., Muneeswaran, G., & Sastry, G. N. (2018). Developing a disease specific web portal for metabolic disorders: molecular property diagnostic suite: metabolic disorders (MPDSMD). [in preparation].
Saudubray, J. M., Ogier, H., Bonnefont, J. P., Munnich, A., Lombes, A., Herve, F., et al. (1989). Clinical approach to inherited metabolic diseases in the neonatal period: a 20-year survey. Journal of Inherited Metabolic Disease, 12(Suppl. 1), 25–41.
Scapin, G. (2006). Structural biology and drug discovery. Current Pharmaceutical Design, 12(17), 2087–2097.
Schneider, G. (2018). Automating drug discovery. Nature Reviews Drug Discovery, 17(2), 97–113. https://dx.doi.org/10.1038/nrd.2017.232.
Seaquist, E. R., Anderson, J., Childs, B., Cryer, P., Dagogo-Jack, S., Fish, L., & Endocrine Society. (2013). Hypoglycemia and diabetes: a report of a workgroup of the American Diabetes Association and the Endocrine Society. Journal of Clinical Endocrinology & Metabolism, 98(5), 1845–1859. https://dx.doi.org/10.1210/jc.2012-4127.
Shapiro, H., Suez, J., & Elinav, E. (2017). Personalized microbiome-based approaches to metabolic syndrome management and prevention. Journal of Diabetes, 9(3), 226–236. https://dx.doi.org/10.1111/1753-0407.12501.
Sherry, S. T., Ward, M. H., Kholodov, M., Baker, J., Phan, L., Smigielski, E. M., & Sirotkin, K. (2001). dbSNP: the NCBI database of genetic variation. Nucleic Acids Research, 29(1), 308–311.
Shi, B., Sepp-Lorenzino, L., Prisco, M., Linsley, P., Deangelis, T., & Baserga, R. (2007). Micro RNA 145 targets the insulin receptor substrate-1 and inhibits the growth of colon cancer cells. Journal of Biological Chemistry, 282(45), 32582–32590. https://dx.doi.org/10.1074/jbc.m702806200.
Shihab, H. A., Gough, J., Cooper, D. N., Stenson, P. D., Barker, G. L., Edwards, K. J., & Gaunt, T. R. (2013). Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models. Human Mutation, 34(1), 57–65. https://dx.doi.org/10.1002/humu.22225.
Singh, A., Olowoyeye, A., Baenziger, P. H., Dantzer, J., Kann, M. G., Radivojac, P., & Mooney, S. D. (2007). MutDB: update on development of tools for the biochemical analysis of genetic variation. Nucleic Acids Research, 36(Database issue). https://dx.doi.org/10.1093/nar/gkm659.
Sliwinska, A., Kasinska, M. A., & Drzewoski, J. (2017). MicroRNAs and metabolic disorders: where are we heading? Archives of Medical Science, 13(4), 885–896. https://dx.doi.org/10.5114/aoms.2017.65229.
Srivastava, H. K., Choudhury, C., & Sastry, G. N. (2012). The efficacy of conceptual DFT descriptors and docking scores on the QSAR models of HIV protease inhibitors. Medicinal Chemistry, 8(5), 811–825. https://dx.doi.org/10.2174/157340612802084351.
Stenson, P. D., Mort, M., Ball, E. V., Shaw, K., Phillips, A. D., & Cooper, D. N. (2013). The human gene mutation database: building a comprehensive mutation repository for clinical and molecular genetics, diagnostic testing and personalized genomic medicine. Human Genetics, 133(1), 1–9. https://dx.doi.org/10.1007/s00439-013-1358-4.
Stoffel, M. (2017). MicroRNAs and the regulation of glucose and lipid metabolism. Endocrine Abstracts. https://dx.doi.org/10.1530/endoabs.49.mtbs3.
Sugimoto, M., Ikeda, S., Niigata, K., Tomita, M., Sato, H., & Soga, T. (2011). MMMDB: mouse multiple tissue metabolome database. Nucleic Acids Research, 40(D1), D809–D814. https://dx.doi.org/10.1093/nar/gkr1170.
Taboureau, O., Baell, J. B., Fernandez-Recio, J., & Villoutreix, B. O. (2012). Established and emerging trends in computational drug discovery in the structural genomics era. Chemistry & Biology, 19(1), 29–41. https://dx.doi.org/10.1016/j.chembiol.2011.12.007.
Takanabe, R., Ono, K., Abe, Y., Takaya, T., Horie, T., Wada, H., & Hasegawa, K. (2008). Up-regulated expression of microRNA-143 in association with obesity in adipose tissue of mice fed high-fat diet. Biochemical and Biophysical Research Communications, 376(4), 728–732. https://dx.doi.org/10.1016/j.bbrc.2008.09.050.
Tang, X., Tang, G., & Ozcan, S. (2008). Role of microRNAs in diabetes. Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms, 1779(11), 697–701. https://dx.doi.org/10.1016/j.bbagrm.2008.06.010.
Thorn, C. F., Ji, Y., Weinshilboum, R. M., Altman, R. B., & Klein, T. E. (2012). PharmGKB summary. Pharmacogenetics and Genomics, 22(8), 646–651. https://dx.doi.org/10.1097/fpc.0b013e3283527c02.
Tian, J., Wu, N., Guo, X., Guo, J., Zhang, J., & Fan, Y. (2007). Predicting the phenotypic effects of non-synonymous single nucleotide polymorphisms based on support vector machines. BMC Bioinformatics, 8, 450. https://dx.doi.org/10.1186/1471-2105-8-450.
Tolar, J., Petryk, A., Khan, K., Bjoraker, K. J., Jessurun, J., Dolan, M., & Orchard, P. J. (2009). Long-term metabolic, endocrine, and neuropsychological outcome of hematopoietic cell transplantation for Wolman disease. Bone Marrow Transplantation, 43(1), 21–27. https://dx.doi.org/10.1038/bmt.2008.273.
Trajkovski, M., Hausser, J., Soutschek, J., Bhat, B., Akin, A., Zavolan, M., & Stoffel, M. (2011). MicroRNAs 103 and 107 regulate insulin sensitivity. Nature, 474(7353), 649–653. https://dx.doi.org/10.1038/nature10112.
Tuomilehto, J., Lindström, J., Eriksson, J. G., Valle, T. T., Hämäläinen, H., Ilanne-Parikka, P., & Uusitupa, M. (2001). Prevention of type 2 diabetes mellitus by changes in lifestyle among subjects with impaired glucose tolerance. New England Journal of Medicine, 344(18), 1343–1350. https://dx.doi.org/10.1056/nejm200105033441801.
Vienberg, S., Geiger, J., Madsen, S., & Dalgaard, L. T. (2017). MicroRNAs in metabolism. Acta Physiologica (Oxford), 219(2), 346–361. https://dx.doi.org/10.1111/apha.12681.
Wahl, S., Drong, A., Lehne, B., Loh, M., Scott, W. R., Kunze, S., & Chambers, J. C. (2017). Epigenome-wide association study of body mass index, and the adverse outcomes of adiposity. Nature, 541(7635), 81–86. https://dx.doi.org/10.1038/nature20784.
Wang, J., Al-Ouran, R., Hu, Y., Kim, S. Y., Wan, Y. W., Wangler, M. F., & Bellen, H. J. (2017). MARRVEL: Integration of human and model organism genetic resources to facilitate functional annotation of the human genome. American Journal of Human Genetics, 100(6), 843–853. https://dx.doi.org/10.1016/j.ajhg.2017.04.010.
Wang, K., Gaitsch, H., Poon, H., Cox, N. J., & Rzhetsky, A. (2017). Classification of common human diseases derived from shared genetic and environmental determinants. Nature Genetics, 49(9), 1319–1325. https://dx.doi.org/10.1038/ng.3931.
Wang, H., Gauthier, B. R., Hagenfeldt-Johansson, K. A., Iezzi, M., & Wollheim, C. B. (2002). Foxa2 (HNF3β) controls multiple genes implicated in metabolism-secretion coupling of glucose-induced insulin release. Journal of Biological Chemistry, 277(20), 17564–17570. https://dx.doi.org/10.1074/jbc.m111037200.
Wang, Q., Li, Y. C., Wang, J., Kong, J., Qi, Y., Quigg, R. J., & Li, X. (2008). MiR-17-92 cluster accelerates adipocyte differentiation by negatively regulating tumor-suppressor Rb2/p130. Proceedings of the National Academy of Sciences of the United States of America, 105(8), 2889–2894. https://dx.doi.org/10.1073/pnas.0800178105.
Wilfred, B. R., Wang, W., & Nelson, P. T. (2007). Energizing miRNA research: a review of the role of miRNAs in lipid metabolism, with a prediction that miR-103/107 regulates human metabolic pathways. Molecular Genetics and Metabolism, 91(3), 209–217. https://dx.doi.org/10.1016/j.ymgme.2007.03.011.
Williams, E. G., & Auwerx, J. (2015). The convergence of systems and reductionist approaches in complex trait analysis. Cell, 162(1), 23–32. https://dx.doi.org/10.1016/j.cell.2015.06.024.
Wishart, D. S., Tzur, D., Knox, C., Eisner, R., Guo, A. C., Young, N., & Querengesser, L. (2007). HMDB: the Human Metabolome Database. Nucleic Acids Research, 35(Database issue). https://dx.doi.org/10.1093/nar/gkl923.
Wisse, B. E. (2004). The inflammatory syndrome: the role of adipose tissue cytokines in metabolic disorders linked to obesity. Journal of the American Society of Nephrology, 15(11), 2792–2800. https://dx.doi.org/10.1097/01.ASN.0000141966.69934.21.
Wong, T. Y., & Bressler, N. M. (2016). Artificial intelligence with deep learning technology looks into diabetic retinopathy screening. Journal of the American Medical Association, 316(22), 2366–2367. https://dx.doi.org/10.1001/jama.2016.17563.
Woollard, P. M., Mehta, N. A., Vamathevan, J. J., Van Horn, S., Bonde, B. K., & Dow, D. J. (2011). The application of next-generation sequencing technologies to drug discovery and development. Drug Discovery Today, 16(11–12), 512–519. https://dx.doi.org/10.1016/j.drudis.2011.03.006.
World Health Organization. (2016). Global report on diabetes. World Health Organization. http://www.who.int/iris/handle/10665/204871.
Xie, H., Lim, B., & Lodish, H. F. (2009). MicroRNAs induced during adipogenesis that accelerate fat cell development are downregulated in obesity. Diabetes, 58(5), 1050–1057. https://dx.doi.org/10.2337/db08-1299.
Yamamoto, T., Mishima, H., Mizukami, H., Fukahori, Y., Umehara, T., Murase, T., & Ikematsu, K. (2015). Metabolic autopsy with next generation sequencing in sudden unexpected death in infancy: postmortem diagnosis of fatty acid oxidation disorders. Molecular Genetics and Metabolism Reports, 5, 26–32. https://dx.doi.org/10.1016/j.ymgmr.2015.09.005.
Yang, J. O., Oh, S., Ko, G., Park, S. J., Kim, W. Y., Lee, B., & Lee, S. (2011). VnD: a structure-centric database of disease-related SNPs and drugs. Nucleic Acids Research, 39(Database issue), D939–D944. https://dx.doi.org/10.1093/nar/gkq957.
Yang, X., Wang, X., Liu, D., Yu, L., Xue, B., & Shi, H. (2014). Epigenetic regulation of macrophage polarization by DNA methyltransferase 3b. Molecular Endocrinology, 28(4), 565–574. https://dx.doi.org/10.1210/me.2013-1293.
Yates, A., Akanni, W., Amode, M. R., Barrell, D., Billis, K., Carvalho-Silva, D., & Flicek, P. (2016). Ensembl 2016. Nucleic Acids Research, 44(D1), D710–D716. https://dx.doi.org/10.1093/nar/gkv1157.
Ye, H., Liu, Q., & Wei, J. (2014). Construction of drug network based on side effects and its application for drug repositioning. PLoS One, 9(2), e87864. https://dx.doi.org/10.1371/journal.pone.0087864.
Zhang, M., Luo, H., Xi, Z., & Rogaeva, E. (2015). Drug repositioning for diabetes based on 'omics' data mining. PLoS One, 10(5), e0126082. https://dx.doi.org/10.1371/journal.pone.0126082.
Zhao, J., Zhu, Y., Boerwinkle, E., & Xiong, M. (2015). Pathway analysis with next-generation sequencing data. European Journal of Human Genetics, 23(4), 507–515. https://dx.doi.org/10.1038/ejhg.2014.121.