Integrating in silico models for the prediction of mutagenicity (Ames test) of botanical ingredients of cosmetics




Journal Pre-proofs
Giuseppa Raitano, Alessandra Roncaglioni, Alberto Manganaro, Masamitsu Honma, Laurent Sousselier, Quoc Tuan Do, Eric Paya, Emilio Benfenati

PII: S2468-1113(18)30116-6
DOI: https://doi.org/10.1016/j.comtox.2019.100108
Reference: COMTOX 100108

To appear in: Computational Toxicology

Received Date: 7 October 2018
Revised Date: 5 May 2019
Accepted Date: 21 August 2019

Please cite this article as: G. Raitano, A. Roncaglioni, A. Manganaro, M. Honma, L. Sousselier, Q.T. Do, E. Paya, E. Benfenati, Integrating in silico models for the prediction of mutagenicity (Ames test) of botanical ingredients of cosmetics, Computational Toxicology (2019), doi: https://doi.org/10.1016/j.comtox.2019.100108

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 Published by Elsevier B.V.


Giuseppa Raitano1*, Alessandra Roncaglioni1, Alberto Manganaro1, Masamitsu Honma2, Laurent Sousselier3, Quoc Tuan Do4, Eric Paya4, Emilio Benfenati1.

1. Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Giuseppe La Masa 19, 20156 Milano, Italy
2. Division of Genetics and Mutagenesis, National Institute of Health Sciences, 3-25-26 Tonomachi, Kawasaki-ku, Kanagawa 210-9501, Japan
3. UNITIS, 24 rue Marbeuf, F-75008 Paris, France
4. Greenpharma S.A.S., 3 Allée du Titane, 45100 Orléans, France

*Corresponding author: Giuseppa Raitano, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Giuseppe La Masa 19, 20156 Milano, Italy. E-mail: [email protected]

Abstract
Plant extracts are widely used as cosmetic ingredients and must be investigated to guarantee consumer safety. However, these natural products are often complex mixtures of chemicals. Since animal tests cannot be performed under cosmetic regulations, non-testing methods (NTM) can be useful for the preliminary screening needed to address the safety of finished cosmetic products. We developed an integrated strategy (IS) to assess the genotoxic potential of ~18000 molecules present in natural cosmetic ingredients by combining several quantitative structure-activity relationship (QSAR) models. The IS consists of a sequence of steps that formalize expert reasoning. We also developed a new classification model, based on a large dataset of compounds, to clarify the outcomes that remain equivocal after the application of this strategy.

Highlights
- The aim was to assess the genotoxic potential of a large dataset of compounds by integrating several in silico models
- The integrated strategy was tuned and applied to ~18000 molecules found in plant extracts used in cosmetics
- A new SAR model was developed in Python, based on active and inactive rules
- The in silico methods were refined

Keywords
Cosmetics, plant extracts, integrated strategy, in silico methods, mutagenicity, quantitative structure–activity relationship (QSAR)

Abbreviations
AD Applicability Domain, ADI Applicability Domain Index, ES Evaluation Set, FN false negative, FP false positive, IRFMN Istituto di Ricerche Farmacologiche Mario Negri, IS Integrated Strategy, ISS Istituto Superiore di Sanità, LR Training Likelihood Ratio, MCC Matthews Correlation Coefficient, NTM non-testing methods, PPV Positive Predictive Value, (Q)SAR (Quantitative) Structure-Activity Relationship, SA Structural Alert, SVM Support Vector Machine, TN true negative, TP true positive, T.E.S.T. Toxicity Estimation Software Tool, VEGA virtual models for property evaluation of chemicals.


1. INTRODUCTION
Lipsticks, skin cleansers, body lotions, shampoos and haircare products are just a few examples of the cosmetics used in our daily life. In view of their growing diffusion, the European authorities have regulated the production of cosmetic products and their ingredients in order to ensure a high level of protection for human and environmental health while, at the same time, not interfering with their market [1]. Plant extracts are widely used as cosmetic ingredients because consumers frequently prefer them to synthetic constituents. However, these natural products are often complex mixtures of chemicals and can induce adverse reactions [2]. Since animal tests cannot be conducted under cosmetic regulations [1], non-testing methods (NTM) such as read-across, quantitative structure-activity relationship (QSAR) models and weight-of-evidence approaches are useful for the preliminary screening of the safety of ingredients or finished cosmetic products. NTM cut costs and time and can be part of more complex testing strategies, as recommended by legislation in many contexts [3, 4-10]. Indeed, international authorities recommend using more than one model, considering both expert-based and statistical approaches [11-12]. Mutagens, carcinogens and reproductive toxicants (CMR) are of particular concern because of their adverse impact on human health. Among other assays, the bacterial reverse mutation test (Ames test) [13] is widely used during the initial screening for genotoxicity to address mutagenicity. Thanks to the wide availability of consistent experimental data, this in vitro test is often used to develop (Q)SAR models [14]. We made an in silico assessment of ~18000 molecules found in plant extracts used in cosmetics, applying an integrated strategy (IS) of several (Q)SAR models. Integrated strategies for mutagenicity assessment (Ames test) are already available and perform well [15].
This IS consists of a sequence of steps that mimic expert reasoning in a formalized and reproducible way. Several elements were considered as input for the IS, for instance the concordance between the individual models' predictions and their reliability, the positive predictive value (PPV) of structural alerts (SAs) when available, and the presence of specific exception rules. We also developed a new structure-activity relationship (SAR) model, based on a large dataset of compounds with an unbalanced distribution of positive and negative compounds, to resolve any assessments that remain equivocal after the use of the other tools.

2. MATERIALS AND METHODS

2.1 Models used
We used ten freely available models belonging to different platforms. Table 1 summarizes them, focusing on their underlying approaches and current availability.

Table 1. Models used and some details about their approach and availability.

| Platform | Model/profiler used | Approach | Status |
|---|---|---|---|
| VEGA v.1.1.4 | Mutagenicity (Ames test) CONSENSUS model 1.0.2 | Weighted combination of the four individual VEGA models, depending on the individual models' ADI values | already available |
| | Mutagenicity (Ames test) model (CAESAR) 2.1.13 | Statistical SVM model + expert-based SAs | |
| | Mutagenicity (Ames test) model (SARpy/IRFMN) 1.0.7 | Statistical SAs | |
| | Mutagenicity (Ames test) model (ISS) 1.0.2 | Expert-based SAs | |
| | Mutagenicity (Ames test) model (KNN/Read-Across) 1.0.0 | Read-across approach | |
| T.E.S.T. v. 4.2.1 | CONSENSUS method | Average prediction of the individual models in the domain of applicability | already available |
| | Hierarchical clustering method | Statistical | |
| | FDA method | Statistical | |
| | Nearest neighbour method | Statistical (read-across approach) | |
| OECD QSAR Toolbox v. 4.2 | DNA alerts for AMES by OASIS v.1.4 | Expert-based SAs | already available |
| IRFMN group | New SARpy model (stepwise) | Statistical SAs | newly developed |

We used the following models from the VEGA platform (version 1.1.4) [16]:
- Mutagenicity AMES model (CAESAR) v. 2.1.13 [17], based on the Bursi mutagenicity dataset [18], with a training set of 4204 and a validation set of 837 compounds. This hybrid model integrates a trained Support Vector Machine (SVM) classifier and an additional model based on Structural Alert (SA) matching.
- Mutagenicity AMES model (SARpy/IRFMN) v. 1.0.7 [19], built as a set of rules extracted automatically by the SARpy (SAR in python) software from the same dataset as the CAESAR model [18]. The model includes specific rules for mutagenic and non-mutagenic activity.
- Mutagenicity AMES model (ISS) v.1.0.2 [21], which implements the mutagenicity SAs of ToxTree version 2.6 [20]. The ISS model predicts a compound as mutagenic if at least one SA is matched in its structure, and as non-mutagenic otherwise. The model's dataset contains 670 compounds.
- Mutagenicity AMES model (KNN/Read across) v.1.0.0 [22], which runs a read-across on a dataset of 5770 chemicals, including a benchmark dataset compiled by Hansen et al. [14] and a collection of data (positive results) made available by the Japanese Health Ministry within its Ames QSAR project [23].
- Consensus model v. 1.0.2, which makes an overall assessment based on the predictions of the four previous VEGA mutagenicity models (CAESAR, SARpy, ISS and KNN) and their applicability domain index (ADI) values. The consensus prediction is influenced by the concordance of the individual models' predictions, together with their reliability in terms of ADI values. The VEGA consensus outcome comes with a score that measures its confidence (consensus score). The score reaches its maximum value (1) only if one or more models find experimental values in their training sets and all available values are concordant; in all other cases, the score is lower.

Each VEGA individual model provides an ADI as a measure of the reliability of its prediction. For each model, the ADI is calculated by combining several other indices, each addressing one element of the AD. The ADI aggregates information about the similarity and common structural features with compounds in the training set, the concordance of the experimental values of similar compounds and the accuracy of their predictions. ADI values range from 0 (worst case) to 1 (best case). In general, mutagenicity predictions with an ADI <0.6 are outside the AD of the model.
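To make the idea of an ADI-dependent combination concrete, here is a minimal Python sketch of an ADI-weighted vote. It is an illustration under stated assumptions, not the VEGA consensus algorithm: the weighting (each model's vote counts as much as its ADI), the score definition and the names `Prediction`, `weighted_consensus` and `in_domain` are hypothetical; only the 0.6 ADI cut-off comes from the text above.

```python
from dataclasses import dataclass

AD_THRESHOLD = 0.6  # ADI below 0.6: prediction generally outside the model's AD


@dataclass
class Prediction:
    model: str
    label: str   # "mutagenic" or "non-mutagenic"
    adi: float   # applicability domain index, 0 (worst) to 1 (best)


def in_domain(pred: Prediction, threshold: float = AD_THRESHOLD) -> bool:
    """True if the prediction lies inside the applicability domain."""
    return pred.adi >= threshold


def weighted_consensus(preds: list) -> tuple:
    """Hypothetical ADI-weighted vote (NOT the official VEGA formula).

    Each model votes for its label with a weight equal to its ADI; the score is
    the winning weight divided by the number of models, so it reaches 1.0 only
    when every model agrees with ADI = 1.
    """
    if not preds:
        raise ValueError("at least one prediction is required")
    weights = {"mutagenic": 0.0, "non-mutagenic": 0.0}
    for p in preds:
        weights[p.label] += p.adi
    # max() keeps the first key on ties, so ties resolve to "mutagenic"
    label = max(weights, key=weights.get)
    return label, weights[label] / len(preds)
```

In this sketch the caller decides whether to drop out-of-domain predictions with `in_domain` before voting; the real consensus model may treat them differently.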

From the T.E.S.T. platform (version 4.2.1) [24] we included the following:
- The Hierarchical clustering method produces a series of clusters from the training set. Clusters are subsets of chemicals from the whole set that have similar characteristics. A genetic algorithm-based selection was used to generate a model for each cluster.
- The U.S. Food and Drug Administration (FDA) method makes predictions using a single cluster (constructed at runtime) that contains structurally similar chemicals selected from the overall training set to build a model. This contrasts with the Hierarchical method, where predictions are made using one or more clusters constructed a priori.
- In the Nearest neighbour method, the predicted toxicity is the average of the toxicities of the three most similar chemicals (structural analogues) in the training set, provided their similarity exceeds a given threshold.
- The Consensus method estimates mutagenic activity by averaging the predicted toxicities of the above QSAR methods (hierarchical clustering, FDA and nearest neighbour), provided their predictions are valid.
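The nearest-neighbour and consensus logic described for T.E.S.T. can be sketched as follows. This assumes, hypothetically, that chemicals are represented as fingerprints stored as sets of bit indices, that similarity is the Tanimoto coefficient and that the threshold is 0.5; the descriptors, similarity measure and threshold actually used by T.E.S.T. are not specified here.

```python
from statistics import mean
from typing import List, Optional, Set, Tuple

Fingerprint = Set[int]  # indices of "on" bits (hypothetical representation)


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity: shared bits divided by the bits set in either."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_neighbour(query: Fingerprint,
                      training: List[Tuple[Fingerprint, int]],
                      k: int = 3, threshold: float = 0.5) -> Optional[float]:
    """Average activity (1 = mutagen, 0 = non-mutagen) of the k most similar
    training chemicals; None if fewer than k exist or any of them falls
    below the similarity threshold (prediction not valid)."""
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]),
                    reverse=True)[:k]
    if len(ranked) < k or any(tanimoto(query, fp) < threshold for fp, _ in ranked):
        return None
    return mean(activity for _, activity in ranked)


def consensus(predictions: List[Optional[float]]) -> Optional[float]:
    """Average of the valid (non-None) predictions of the individual methods."""
    valid = [p for p in predictions if p is not None]
    return mean(valid) if valid else None
```

Returning `None` for invalid predictions lets the consensus step skip them, mirroring the "provided their predictions are valid" condition above.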


All the T.E.S.T. models are built on a benchmark dataset compiled by Hansen et al. [14] that consists of 5743 chemicals, and for all these models a prediction is made only when the compound falls within the respective AD. From the OECD QSAR Toolbox platform (version 4.2) [25] we used the profiler for DNA alerts for AMES by OASIS. It contains 85 SAs responsible for the interaction of chemicals with DNA, extracted from the Ames Mutagenicity model that is part of the OASIS TIMES system [26-27].

2.2 New SARpy model
In addition to the ten available models, we developed a new SARpy model based on a stepwise approach. This model was used only when the IS was not able to give a reliable outcome. In general, the SARpy software extracts every possible fragment from a set of molecular structures and correlates these substructures with the activity of the molecules that contain them. As the last step, it selects the fragments suitable to become SAs on the basis of their prediction performance on the training set. Each SA, or rule, is associated with a Training Likelihood Ratio (LR) as a measure of its statistical power. The SARpy model already available in the VEGA platform contains 112 rules specific for mutagenicity and 93 for non-mutagenicity. Here we collected a bigger training set (TS), mining several data sources:
- A benchmark database compiled by Hansen and colleagues from the scientific literature [14]
- A set of ~12000 compounds from the Ames/QSAR international project, National Institute of Health Sciences, Japan [23]
- A set of more than 700 molecules selected from the ECHA CHEM database [28] during the European project CALEIDOS [29]

The Ames test results for both the Japanese substances (except those classified as "strong positive") and the ECHA compounds are confidential and cannot be disclosed. Therefore, no information about their chemical identity is provided here.
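As an illustration of how a rule's statistical power can be scored, the sketch below computes a likelihood ratio assuming the conventional definition (fraction of target-class compounds containing the fragment divided by the fraction of other-class compounds containing it); SARpy's internal formula may differ in detail, and the fragment counts in the example are hypothetical. The class sizes are those of the TS given in the next paragraph.

```python
import math


def likelihood_ratio(hits_target: int, hits_other: int,
                     n_target: int, n_other: int) -> float:
    """Likelihood ratio of a fragment for its target class.

    hits_target: target-class compounds that contain the fragment
    hits_other:  other-class compounds that contain the fragment
    n_target, n_other: class sizes in the training set
    For an active rule the target class is "mutagen"; for an inactive
    rule it is "non-mutagen".
    """
    if hits_other == 0:
        return math.inf  # fragment never observed in the other class
    return (hits_target / n_target) / (hits_other / n_other)


# Hypothetical active fragment found in 100 mutagens and 20 non-mutagens
lr = likelihood_ratio(100, 20, n_target=5025, n_other=13313)
print(round(lr, 2))  # 13.25
```

Normalizing by class size matters here because the TS is unbalanced: a fragment seen five times more often in mutagens than in non-mutagens is in fact about 13 times more typical of mutagens.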
We pruned the data, removing duplicate structures and incongruent experimental results, neutralizing salts and removing counter ions (manually or with in-house software). After this, the TS was composed of 18338 compounds (5025 mutagens and 13313 non-mutagens). From the first application of the SARpy software to the whole dataset, we extracted more than 1000 rules specific for detecting either active or inactive compounds. Fragment length was set to range from 2 to 18 atoms. The new rules were compared with those of the previous SARpy model to exclude any rules in common. Afterwards, the new rules with the best LR were added to the old ruleset. We matched this new ruleset against the TS compounds using in-house software (Istmolbase), which looks for the presence of a list of fragments (as SMARTS strings) in a set of molecular structures. This way we checked the correctness of the formalism of the rules and verified their accuracy. Since several fragments (with different output labels) could be present in one molecule, the selected rules had to meet the following thresholds:
- Inactive rules: %TN ≥ 80% and %NewTN ≥ 70%
- Active rules: %TP ≥ 70% and %NewTP ≥ 60%

where %TN (True Negative) is the percentage of negative compounds among those where the inactive rule was detected (generic accuracy); %NewTN is the same percentage computed only on the compounds not already matched by rules with higher accuracy; %TP (True Positive) is the percentage of positive compounds among those where the active rule was detected; and %NewTP is the same percentage computed only on the compounds not already matched by rules with higher accuracy. %NewTN and %NewTP were considered during the selection phase because they show how much each rule contributes to the correctness of the predictive model, independently of the other rules. After the application of these quality criteria, the final list was reduced to 725 rules (158 active and 567 inactive).
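One possible reading of these selection criteria is a greedy procedure in which rules are visited from the most to the least accurate, and %NewTN/%NewTP are computed on compounds not yet matched by the rules already kept. The sketch below implements that interpretation only; the data layout (each rule as a kind plus a set of matched compound IDs) and the function names are hypothetical, and Istmolbase may proceed differently.

```python
from typing import Dict, List, Set, Tuple

# (minimum accuracy, minimum "new" accuracy) per rule kind, as stated in the text
THRESHOLDS = {"active": (0.70, 0.60), "inactive": (0.80, 0.70)}


def rule_accuracy(kind: str, matched: Set[str], labels: Dict[str, bool]) -> float:
    """Fraction of matched compounds whose label agrees with the rule kind
    (labels[c] is True for mutagens)."""
    if not matched:
        return 0.0
    want_mutagen = kind == "active"
    return sum(labels[c] == want_mutagen for c in matched) / len(matched)


def select_rules(rules: Dict[str, Tuple[str, Set[str]]],
                 labels: Dict[str, bool]) -> List[str]:
    """Keep a rule if both its overall accuracy and its accuracy on compounds
    not yet matched by previously kept (more accurate) rules pass the thresholds."""
    ordered = sorted(rules.items(),
                     key=lambda item: rule_accuracy(item[1][0], item[1][1], labels),
                     reverse=True)
    covered: Set[str] = set()
    selected: List[str] = []
    for rule_id, (kind, matched) in ordered:
        acc_min, new_min = THRESHOLDS[kind]
        if (rule_accuracy(kind, matched, labels) >= acc_min
                and rule_accuracy(kind, matched - covered, labels) >= new_min):
            selected.append(rule_id)
            covered |= matched
    return selected
```

A rule that only re-detects compounds already covered gets a "new" accuracy of zero and is discarded, which is the redundancy check that %NewTN and %NewTP are meant to provide.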


Istmolbase classified each molecule on the basis of the matched rule with the greatest accuracy. If several fragments with the same accuracy were detected in a molecule, the positive prediction was preferred. The 725 rules did not match about 6000 compounds (34%), and among the predicted compounds 748 were wrongly predicted as negative (False Negative, FN) and 335 wrongly predicted as positive (False Positive, FP). To improve the low coverage and reduce the number of FNs, two additional extractions were done with SARpy using the same settings and criteria as above. We extracted and selected 11 activity rules to detect, among the 9709 compounds predicted as inactive (TN and FN), those that were actually FN. Similarly, 201 active rules were extracted from the 6144 compounds originally not matched by the set of 725 rules. However, since these rules are based on fewer substances, we labelled their outcome as "possible" (Figure 1). In total, the new SARpy model contains 370 active and 567 inactive rules and makes predictions by applying them in three steps. The decision tree underlying the new model is summarized in Figure 1 and was applied to the TS. The accuracy, sensitivity and specificity [30] of the prediction results were computed as follows.

$$\text{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN} \qquad \text{(Eq. 1)}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad \text{(Eq. 2)}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \qquad \text{(Eq. 3)}$$

Since the training set is unbalanced toward negative compounds, we also computed the Matthews correlation coefficient (MCC) [31], as follows:

$$MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \qquad \text{(Eq. 4)}$$

The MCC ranges from −1 to +1: +1 indicates perfect prediction while −1 indicates total disagreement between predicted and observed values. A value of 0 indicates prediction no better than random.
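The four statistics (Eq. 1-4) are straightforward to reproduce. The sketch below applies them to the confusion-matrix counts reported in Table 3 and recovers the published values after rounding to two decimals; the function name `classification_stats` is illustrative.

```python
import math


def classification_stats(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, sensitivity, specificity and MCC (Eq. 1-4)."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)            # Eq. 1
    sensitivity = tp / (tp + fn)                          # Eq. 2
    specificity = tn / (tn + fp)                          # Eq. 3
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0   # Eq. 4
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}


new = classification_stats(tp=4017, tn=11319, fp=1994, fn=1008)  # new SARpy, TS
old = classification_stats(tp=2011, tn=1431, fp=425, fn=337)     # existing SARpy
print({k: round(v, 2) for k, v in new.items()})
# {'accuracy': 0.84, 'sensitivity': 0.8, 'specificity': 0.85, 'mcc': 0.62}
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, which is why it is the more informative figure for the unbalanced TS (27% mutagens).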

Figure 1. Decision tree of new SARpy model.
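Since the decision tree of Figure 1 is not reproduced in this text, the following sketch encodes one plausible reading of the three-step prediction described in section 2.2. Two points are assumptions: compounds flagged by the 11 second-round active rules are labelled "mutagenic", and compounds matched by no rule at all are labelled "possible non-mutagenic" (a label that appears in section 3.2). The function and parameter names are hypothetical.

```python
from typing import Dict, Set, Tuple

Rule = Tuple[str, float]  # (kind: "active" or "inactive", accuracy on the TS)


def predict(main_hits: Set[str], main_rules: Dict[str, Rule],
            extra_active_hit: bool, possible_active_hit: bool) -> str:
    """Three-step prediction for one molecule (hypothetical reading of Figure 1).

    main_hits: IDs of the 725 first-round rules matched in the molecule
    extra_active_hit: True if one of the 11 second-round active rules matches
    possible_active_hit: True if one of the 201 "possible" active rules matches
    """
    if main_hits:
        # Step 1: the most accurate matched rule decides; ties favour "active"
        best = max(main_rules[r][1] for r in main_hits)
        kinds = {main_rules[r][0] for r in main_hits if main_rules[r][1] == best}
        if "active" in kinds:
            return "mutagenic"
        # Step 2: extra active rules re-examine molecules predicted as inactive
        return "mutagenic" if extra_active_hit else "non-mutagenic"
    # Step 3: molecules matched by no first-round rule
    return "possible mutagenic" if possible_active_hit else "possible non-mutagenic"
```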

2.3 Positive predictive value for the OASIS and ISS SAs
Structural alerts of toxicity are a cause for concern in the assessment of chemicals for human health. Most of them identify well-known classes of substances, characterized by their mechanisms of action [32-33], and several predictive programs are based on this human expert knowledge. We used the OECD QSAR Toolbox profiler for DNA alerts for AMES by OASIS and the Mutagenicity AMES model (ISS). We collected the SAs and validated them through their mechanism of action; they indicate classes of chemicals that can potentially interact with DNA. Since this potential is modulated by the rest of the structure of each molecule, not all compounds that contain an SA are necessarily toxic. Furthermore, the list of alerts is not complete: not all toxic compounds are "explained" by an alert [34]. Therefore, to avoid false positive predictions with this approach and to compare the alerts objectively, we tested the reliability of the ISS and OASIS alerts by calculating their positive predictive value (PPV), as in Eq. 5, on a bigger set of compounds, namely the set used to build the new SARpy model (TS).

$$PPV = \frac{TP}{TP + FP} \qquad \text{(Eq. 5)}$$

2.4 Evaluation set (ES)

We applied the IS to a set of 17954 compounds found in plants (natural complex substances, i.e. plant extracts) used as ingredients in cosmetics. These compounds came from the NCStox project (http://www.unitis.org/en/ncs-tox-project-presentation,379.html), which aimed to develop a predictive database to determine the toxicological profile of natural complex substances by integrating data mining and in silico methods. The database was created by extracting at least ten representatives of each molecular group (Table 2) from the Greenpharma database GPDB (http://www.greenpharma.com/services/greepharma-core-database-gpdb). The molecules had to be of plant or mushroom origin. They were identified from the scientific literature and may have been determined by different methods. As for the TS, the ES was inspected and cleaned. Table 2 shows the distribution of the compounds in 92 molecular groups.

Table 2. Compounds in the evaluation set (ES) for each molecular group.

| Molecular group | No. of comp. | Molecular group | No. of comp. | Molecular group | No. of comp. |
|---|---|---|---|---|---|
| Abietane | 244 | Indole | 336 | Polycyclic_Diterpene | 260 |
| Acetylenic_Derivative | 617 | Indolizidine | 28 | Polyinsaturated_Fatty_Acid | 60 |
| Acridone | 146 | Iridoid | 558 | Proanthocyanidin | 195 |
| Amaryllidaceae | 58 | Isoflavane | 73 | Protoberberine | 143 |
| Anthocyanidin | 196 | Isoflavone | 437 | Pterocarpan | 167 |
| Anthraquinone | 530 | Isothiocyanate | 25 | Purine | 84 |
| Apocarotenoid | 28 | Kaurene | 517 | Pyranocoumarin | 150 |
| Aporphine | 318 | Labdane | 560 | Pyrethrinoid | 7 |
| Aurone | 43 | Lignane | 447 | Pyridine | 73 |
| Benzoquinone | 121 | Limonoid | 72 | Pyrrolizidine | 193 |
| Benzylisoquinoline | 67 | Macrocyclic_Diterpene | 40 | Quassinoid | 165 |
| Cannabinoid | 29 | Monoinsaturated_Fatty_Acid | 68 | Quinazoline | 17 |
| Carotenoid | 114 | Monosaccharide | 129 | Quinoline | 54 |
| Chalcone | 207 | Monoterpene | 172 | Quinolizidine | 74 |
| Coumarin | 571 | Monoterpene_Acid_And_Ester | 17 | Rotenoid | 92 |
| Coumestan | 46 | Monoterpene_Alcohol_And_Ether | 227 | Saturated_Fatty_Acid | 72 |
| Cyclitol | 23 | Monoterpene_Aldehyde | 20 | Sesquiterpene | 866 |
| Dianthrone | 29 | Monoterpene_Ketone | 50 | Sesquiterpene_Lactone | 490 |
| Dihydroflavonol | 235 | Morphinane | 72 | Steroidal_Alkaloid | 953 |
| Diterpene_Acid | 109 | Naphthoquinone | 202 | Sterol | 846 |
| Diterpene_Alcohol | 455 | Neoflavonoid | 60 | Stilbenoid | 106 |
| Diterpene_Lactone | 161 | Nucleoside | 9 | Styrylpyrone | 28 |
| Ellagitannin | 27 | Oligosaccharide | 99 | Sulfur_Derivative | 27 |
| Flavanone | 125 | Phenethylamine | 133 | Tetrahydroisoquinoline | 376 |
| Flavone | 915 | Phenol | 21 | Tocopherol | 16 |
| Flavonol | 910 | Phenolic_Acid | 95 | Triterpene | 225 |
| Furane | 53 | Phenolic_Acid_Ester | 83 | Tropane | 122 |
| Furanocoumarin | 53 | Phenols | 1 | Tropolone | 76 |
| Gallotannin | 115 | Phenyl_Ether | 112 | Tryptamine | 31 |
| Heteroside | 45 | Phloroglucinol | 66 | Xanthone | 453 |
| Imidazole | 63 | Piperidine | 151 | | |

2.5 Exceptions to rules
The PPV analysis showed that some SAs had low statistical power on the TS. Some fragments were present in both mutagenic and non-mutagenic compounds of the TS, showing little ability to distinguish the two. Furthermore, in some cases mutagenicity alerts were found in non-mutagenic compounds. We focused on the molecular groups most populated in the ES that contained molecules triggering an SA with a low PPV, and investigated the structural reasons for the incongruence. Once the conditions for the exceptions to the rule (of the SA) were recognized, we verified them statistically by calculating their accuracy on the TS (number of correctly predicted negative compounds/number of compounds detected). We also thoroughly investigated the mechanism of action (or biological reason) for that detoxifying behavior.

2.6 AMES mutagenicity workflow
Different factors were considered to decide the order and strategy for combining the outcomes of the in silico tools. In view of the large number of substances to be screened, we decided to include in the first step ready-to-use information that does not require manual inspection of the results (the consensus from T.E.S.T. and VEGA, and the SAs from OASIS). We included both statistical tools (VEGA and T.E.S.T.) and expert-based SAs (from OASIS) in the evaluation, reflecting the approach suggested in the ICH M7 guideline [11] of combining the two types of information to boost the reliability of the assessment. The overall statistical quality of the proposed IS and of some of its steps was also checked on the TS underlying the model described in section 2.2. The strategy integrates all the predictions made by the 11 models and provides an outcome with its own confidence level, starting from the agreement between the two consensus models and the assessment of the OASIS profiler. The confidence can range from "low" to "very good" depending on several conditions. Very good confidence was assigned to those outcomes that reported the experimental Ames test results of compounds found either in the training sets of the models or in the OECD QSAR Toolbox databases. If no experimental data were available, the predictions of the two consensus models and the presence of an OASIS DNA alert were considered in the first step of the IS. If the two consensus models agree, the outcome has "good" or "moderate" confidence, depending on the OASIS profiler result and on the number of concordant individual models' predictions.
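The first step of the workflow can be summarized as a small decision function. In this simplified sketch the requirement of at least five concordant individual models (out of seven) for "good" confidence is a hypothetical cut-off, since the actual combinations are listed in Table S1, and disagreement between the two consensus models is simply passed on to the later steps. Under these assumptions the function reproduces the two moderate-confidence examples given with Figure 3, leaving aside their ISS condition.

```python
from typing import List, Optional, Tuple

MUT, NON = "mutagenic", "non-mutagenic"
GOOD_MIN_AGREEING = 5  # hypothetical cut-off out of the seven individual models


def first_step(experimental: Optional[str], vega_consensus: str,
               test_consensus: str, oasis_alert: bool,
               individual: List[str]) -> Tuple[str, str]:
    """Return (outcome, confidence) for the first IS step (simplified sketch)."""
    if experimental is not None:
        return experimental, "very good"  # experimental Ames result available
    if vega_consensus != test_consensus:
        # Consensus models disagree: resolved by later steps (ISS, ADI, new SARpy)
        return "undetermined", "low"
    outcome = vega_consensus
    # OASIS is concordant if it fires for a mutagenic outcome, or stays silent
    # for a non-mutagenic one
    oasis_concordant = oasis_alert == (outcome == MUT)
    n_agree = sum(pred == outcome for pred in individual)
    if oasis_concordant and n_agree >= GOOD_MIN_AGREEING:
        return outcome, "good"
    return outcome, "moderate"
```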
If the two consensus models disagree, the confidence in the outcome decreases to "low", down to cases where the IS cannot provide any reliable outcome; only in this last case are the predictions of the new SARpy model used. The ADI values of the predictions, the presence of ISS alerts specific for mutagenicity (together with their PPV assessment) and of exceptions to rules may affect the confidence or, in some cases, change the outcome of the IS. Since OASIS SAs are already considered in the first step, ISS SAs were used in this second step of the workflow to avoid FNs, taking their statistical accuracy into account. We classified ISS alerts as "not reliable" if their PPV was ≤0.5, "weakly reliable" if it was between 0.5 and 0.7, and "reliable" if it was ≥0.7. Predictions with a low ADI value (<0.65) were considered untrustworthy during the IS assessment because they fall outside the AD of the VEGA models. Figure 2 shows the IS workflow.
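The PPV-based classification of ISS alerts (Eq. 5) and the ADI cut-off used in the second step translate directly into code. The thresholds below are those stated in the text; the label "not assessable" for alerts never detected in the TS (such as SA67) is a hypothetical addition, and the function names are illustrative. The example counts come from Table 4.

```python
def ppv(tp: int, fp: int) -> float:
    """Positive predictive value (Eq. 5); NaN when the alert never fires."""
    fired = tp + fp
    return tp / fired if fired else float("nan")


def iss_alert_reliability(pos: int, tot: int) -> str:
    """Reliability class of an ISS alert from its TS counts (POS and TOT)."""
    if tot == 0:
        return "not assessable"  # hypothetical label, e.g. SA67 never detected
    value = ppv(tp=pos, fp=tot - pos)
    if value <= 0.5:
        return "not reliable"
    if value < 0.7:
        return "weakly reliable"
    return "reliable"


def trustworthy(adi: float, cutoff: float = 0.65) -> bool:
    """VEGA predictions with ADI < 0.65 were treated as untrustworthy in the IS."""
    return adi >= cutoff


print(iss_alert_reliability(1040, 1419))  # SA27 Nitro aromatic -> reliable
print(iss_alert_reliability(138, 235))    # SA12 Quinones -> weakly reliable
print(iss_alert_reliability(167, 805))    # SA10 alfa, beta unsaturated carbonyls -> not reliable
```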


Figure 2. AMES workflow at the basis of the IS.

Figure 3 reports some examples of the outcomes and their confidence levels.

Figure 3. Possible outcomes of the IS and their confidence levels.

For example, the IS outcome will be "mutagenic" with moderate confidence if:
- the two consensus models agree about mutagenicity,
- four of the seven individual models agree too,
- the OECD QSAR Toolbox does not find any mutagenicity alert.

Similarly, the IS outcome will be "non-mutagenic" with moderate confidence if:
- the two consensus models agree about non-mutagenicity,
- four of the seven individual models agree too,
- the OECD QSAR Toolbox does not find any mutagenicity alert,
- ISS detects a statistically consistent mutagenicity alert, but its prediction is not reliable since it is outside the AD of the model (ADI=0).

Table S1 in the supplementary material shows the possible combinations used in the ES assessment.

3. RESULTS AND DISCUSSION

3.1 IS on the evaluation set
Most of the ES compounds (14,894) were predicted as non-mutagenic by the integrated strategy. About two thirds (9606/14,894; 65%) of the negative classifications have good confidence (Figure 4).

Figure 4. Results on 17954 compounds in the plant extracts.

The new SARpy model was used for the assessment of 2695 molecules for which the other IS steps did not provide any reliable outcome. Considering the distribution of molecular groups, the triterpenes, chalcones, proanthocyanidins and stilbenoids, each accounting for more than 100 compounds, were all classified as negative (non-mutagenic). On the other hand, among the most populated molecular groups, xanthones, aporphines, anthraquinones and acridones were largely classified as positive (more than 96%).

Considering the IS negative outcomes, monoterpene_aldehyde was the molecular group with the highest percentage of outcomes with very good confidence (40%); heteroside had the highest percentage of outcomes with good confidence (98%); anthocyanidin had the highest percentage of outcomes with moderate confidence (84%); and coumestan had the highest percentage of outcomes with low confidence (76%). Among the positive outcomes, apocarotenoid had the highest percentage with very good confidence (18%), acridone with good confidence (85%), isothiocyanate with moderate confidence (32%) and quinazoline with low confidence (35%). Very few compounds (those reported as negative or positive with very good confidence) had an experimental value in the literature sources (Figure 4). This reflects the fact that substances in natural extracts are poorly characterized experimentally in terms of Ames test data, and may also occupy a chemical space that does not overlap with the one underlying the models.

3.2 New SARpy model
We compared the results of the new SARpy model on its dataset (TS) with those of the existing model developed on the Bursi mutagenicity dataset. The statistics were obtained by merging "possible non-mutagenic" predictions with "non-mutagenic", and "possible mutagenic" predictions with "mutagenic" (Table 3).

Table 3. Comparison of the predictions of the existing model (Mutagenicity AMES model-SARpy/IRFMN, 1.0.7) and the new one (New SARpy) on their datasets.

| Model | TP | TN | FP | FN | TOT | Accuracy | Sensitivity | Specificity | MCC |
|---|---|---|---|---|---|---|---|---|---|
| New SARpy | 4017 | 11319 | 1994 | 1008 | 18338 | 0.84 | 0.80 | 0.85 | 0.62 |
| Existing SARpy | 2011 | 1431 | 425 | 337 | 4204 | 0.82 | 0.86 | 0.77 | 0.63 |

Overall, the "in fitting" statistics of the two models are similar: accuracy ranges from 0.82 for the available SARpy model to 0.84 for the new one, while the MCC is 0.63 and 0.62, respectively. The new SARpy model gives a considerably lower proportion of false positives, raising the specificity of the predictions to 0.85, while the available SARpy model has higher sensitivity. This can be explained by the composition of the models' training sets. The former Mutagenicity AMES model (SARpy/IRFMN) is based on a more balanced dataset (56% mutagens), while the training set of the new model has more negative compounds than positive ones (the prevalence of mutagens is only 27%); therefore the new model contains more rules for inactive compounds: 370 for actives and 567 for inactives.

3.3 Positive predictive value of SAs
During the PPV analysis, 45 ISS alerts were detected in the TS and investigated: 19 had a PPV > 0.70 (Table 4). No TS compound contained the SA67 triphenylimidazole fragment.

Table 4. ISS SAs ordered by PPV: TOT indicates the total number of compounds in the TS where the SA is detected, POS the number of positive compounds where the SA is detected.

| Structural Alert | TOT | POS | PPV |
|---|---|---|---|
| SA9 Alkyl nitrite | 7 | 7 | 1.00 |
| SA57 DNA intercalating agents with a basic side chain | 9 | 9 | 1.00 |
| SA58 Haloalkene cysteine S-conjugates | 7 | 7 | 1.00 |
| SA62 N-acyloxy-N-alkoxybenzamides | 37 | 37 | 1.00 |
| SA65 Halofuranones | 20 | 20 | 1.00 |
| SA64 Hydroxamic acid derivatives | 13 | 12 | 0.92 |
| SA68 9,10-dihydrophenanthrenes | 94 | 84 | 0.89 |
| SA21 Alkyl and aryl N-nitroso groups | 186 | 166 | 0.89 |
| SA25 Aromatic nitroso group | 46 | 41 | 0.89 |
| SA22 Azide and triazene groups | 91 | 81 | 0.89 |
| SA5 S or N mustard | 35 | 30 | 0.86 |
| SA61 Alkyl hydroperoxides | 21 | 18 | 0.86 |
| SA63 N-aryl-N-acetoxyacetamides | 6 | 5 | 0.83 |
| SA18 Polycyclic aromatic hydrocarbons | 674 | 521 | 0.77 |
| SA6 Propiolactones and propiosultones | 409 | 305 | 0.75 |
| SA7 Epoxides and aziridines | 459 | 338 | 0.74 |
| SA69 Fluorinated quinolines | 15 | 11 | 0.73 |
| SA27 Nitro aromatic | 1419 | 1040 | 0.73 |
| SA19 Heterocyclic polycyclic aromatic hydrocarbons | 491 | 349 | 0.71 |
| SA66 Anthrones | 190 | 113 | 0.59 |
| SA12 Quinones | 235 | 138 | 0.59 |
| SA23 Aliphatic N-nitro | 20 | 11 | 0.55 |
| SA24 alfa,beta unsaturated alkoxy | 20 | 11 | 0.55 |
| SA3 N-methylol derivatives | 11 | 6 | 0.55 |
| SA8 Aliphatic halogens | 932 | 499 | 0.54 |
| SA14 Aliphatic azo and azoxy | 45 | 24 | 0.53 |
| SA28 Primary aromatic amine, hydroxyl amine and its derived esters (with restrictions) | 1167 | 620 | 0.53 |
| SA59 Xanthones, thioxanthones, acridones | 50 | 26 | 0.52 |
| SA28ter Aromatic N-acyl amine | 273 | 122 | 0.45 |
| SA1 Acyl halides | 225 | 100 | 0.44 |
| SA28bis Aromatic mono- and dialkylamine | 263 | 113 | 0.43 |
| SA60 Flavonoids | 7 | 3 | 0.43 |
| SA13 Hydrazine | 293 | 125 | 0.43 |
| SA37 Pyrrolizidine alkaloids | 12 | 5 | 0.42 |
| SA30 Coumarins and furocoumarins | 51 | 21 | 0.41 |
| SA2 Alkyl (C<5) or benzyl ester of sulphonic or phosphonic acid | 103 | 42 | 0.41 |
| SA29 Aromatic diazo | 496 | 200 | 0.40 |
| SA4 Monohaloalkene | 43 | 17 | 0.40 |
| SA26 Aromatic ring N-oxide | 51 | 20 | 0.39 |
| SA11 Simple aldehyde | 323 | 71 | 0.22 |
| SA15 Isocyanate and isothiocyanate groups | 60 | 13 | 0.22 |
| SA10 alfa, beta unsaturated carbonyls | 805 | 167 | 0.21 |
| SA16 Alkyl carbamate and thiocarbamate | 349 | 49 | 0.14 |
| SA38 Alkenylbenzenes | 123 | 16 | 0.13 |
| SA39 Steroidal estrogens | 3 | 0 | 0 |
| SA67 Triphenylimidazole | 0 | 0 | N/A |

SA9, SA57, SA58, SA62 and SA65 had the best PPVs while SA26, SA11, SA15, SA10, SA16 and SA38 gave a low statistical reliability with PPV<0.40.


In the case of OASIS, 78 structural alerts were detected and investigated: 49 had a PPV > 0.70 and 13 had a PPV < 0.40 (Table 5). Where a comparison was possible, many OASIS SAs had higher PPVs than the corresponding ISS SAs.

Table 5. OASIS SAs ordered by PPV: TOT indicates the total number of compounds in the TS where the SA is detected, POS indicates the number of positive compounds where the SA is detected.

OASIS structural alert | TOT | POS | PPV
Acyclic triazenes | 6 | 6 | 1.00
Alkylnitrites | 7 | 7 | 1.00
Aminoacridine DNA intercalators | 9 | 9 | 1.00
Coumarins | 11 | 11 | 1.00
Flavonoids | 3 | 3 | 1.00
Haloalkene cysteine S-conjugates | 4 | 4 | 1.00
Haloepoxides and halooxetanes | 1 | 1 | 1.00
Halofuranones | 18 | 18 | 1.00
Haloisothiazolinones | 1 | 1 | 1.00
Organic diselenides and ditellurides | 1 | 1 | 1.00
Perfluoroalkyl hypohalites | 1 | 1 | 1.00
Peroxyacyl nitrates | 1 | 1 | 1.00
Propyne derivatives | 1 | 1 | 1.00
Quinoxaline-type 1,4-dioxides | 2 | 2 | 1.00
Sulfonyl azides | 3 | 3 | 1.00
N-acyloxy(alkoxy) arenamides | 33 | 32 | 0.97
Organic azides | 62 | 60 | 0.97
p-Aminobiphenyl analogs | 18 | 17 | 0.94
Vicinal dihaloalkanes | 14 | 13 | 0.93
Haloalkanes containing heteroatom | 45 | 41 | 0.91
N-nitroso compounds | 87 | 79 | 0.91
Organic peroxy compounds | 32 | 29 | 0.91
Conjugated nitroalkenes and five-membered aromatic nitroheterocyclics | 191 | 173 | 0.91
Fused-ring nitroaromatics | 226 | 203 | 0.90
Nitrogen and sulfur mustards | 9 | 8 | 0.89
Epoxides and aziridines | 233 | 206 | 0.88
C-nitroso compounds | 34 | 30 | 0.88
Polynitroarenes | 50 | 44 | 0.88
Polycyclic aromatic hydrocarbon and naphthalenediimide derivatives | 315 | 276 | 0.88
Nitrophenols, nitrophenyl ethers and nitrobenzoic acids | 45 | 39 | 0.87
Sulfonates and sulfates | 36 | 31 | 0.86
DNA intercalators with carboxamide and aminoalkylamine side chain | 19 | 16 | 0.84
Fused-ring primary aromatic amines | 95 | 80 | 0.84
Nitro azoarenes and p-substituted azobenzenes | 53 | 43 | 0.81
Alpha,beta-unsaturated aldehydes | 5 | 4 | 0.80
Anthrones | 5 | 4 | 0.80
Nitroalkanes | 5 | 4 | 0.80
Quinoline derivatives | 10 | 8 | 0.80
Thiols | 5 | 4 | 0.80
Dicarbonyl compounds | 14 | 11 | 0.79
Nitrobiphenyls and bridged nitrobiphenyls | 44 | 34 | 0.77
p-Substituted mononitrobenzenes | 92 | 71 | 0.77
Arenediazonium salts | 12 | 9 | 0.75
Haloalcohols | 36 | 27 | 0.75
Haloalkane derivatives with labile halogen | 104 | 78 | 0.75
N-hydroxylamines | 90 | 67 | 0.74
Polarized haloalkene derivatives | 51 | 37 | 0.73
Nitroarenes with other active groups | 60 | 43 | 0.72
N-aryl-N-acetoxy(benzoyloxy) acetamides | 7 | 5 | 0.71
Single-ring substituted primary aromatic amines | 65 | 44 | 0.68
N,N-dialkyldithiocarbamate derivatives | 3 | 2 | 0.67
Nitroaniline derivatives | 69 | 45 | 0.65
Quinones and trihydroxybenzenes | 101 | 63 | 0.62
Acridone, thioxanthone, xanthone and phenazine derivatives | 47 | 29 | 0.62
Monohaloalkanes | 15 | 9 | 0.60
Amino anthraquinones | 17 | 10 | 0.59
Hydrazine derivatives | 105 | 61 | 0.58
Acyl halides | 14 | 8 | 0.57
Four- and five-membered lactones | 9 | 5 | 0.56
Triarylimidazole and structurally related DNA intercalators | 15 | 8 | 0.53
Diazoalkanes | 17 | 9 | 0.53
Alpha-haloethers | 31 | 16 | 0.52
Haloalkenes with electron-withdrawing groups | 6 | 3 | 0.50
Sulfonyl halides | 33 | 16 | 0.48
Alkylphosphates, alkylthiophosphates and alkylphosphonates | 40 | 16 | 0.40
Specific imine and thione derivatives | 70 | 27 | 0.39
Sultones | 13 | 5 | 0.38
Specific acetate esters | 92 | 32 | 0.35
Diazenes and azoxyalkanes | 3 | 1 | 0.33
N-methylol derivatives | 3 | 1 | 0.33
Quinoneimines | 16 | 5 | 0.31
N-acetoxyamines | 7 | 2 | 0.29
Quinolone derivatives | 34 | 9 | 0.26
Quinone methides | 6 | 1 | 0.17
Geminal polyhaloalkane derivatives | 315 | 52 | 0.17
Alpha-beta conjugated alkene derivatives with geminal electron-withdrawing groups | 8 | 1 | 0.13
Pyrrolizidine derivatives | 1 | 0 | 0.00
Specific 5-substituted uracil derivatives | 2 | 0 | 0.00
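To make the ISS/OASIS comparison concrete, the sketch below pairs a few ISS alerts from Table 4 with OASIS alerts that cover a similar chemical class in Table 5 and recomputes both PPVs from the TOT and POS counts. The pairing by class name is our own assumption for illustration; the two rule bases do not define their alerts identically.

```python
import math

# (ISS TOT, ISS POS, OASIS TOT, OASIS POS) taken from Tables 4 and 5.
# Pairing an ISS alert with an OASIS alert by class name is an illustrative assumption.
paired_alerts = {
    "Alkyl nitrites (SA9)": (7, 7, 7, 7),
    "Epoxides and aziridines (SA7)": (459, 338, 233, 206),
    "Anthrones (SA66)": (190, 113, 5, 4),
    "Acyl halides (SA1)": (225, 100, 14, 8),
    "Coumarins (SA30)": (51, 21, 11, 11),
    "Flavonoids (SA60)": (7, 3, 3, 3),
    "N-methylol derivatives (SA3)": (11, 6, 3, 1),
}

oasis_higher = iss_higher = ties = 0
for name, (iss_tot, iss_pos, oasis_tot, oasis_pos) in paired_alerts.items():
    iss_ppv = iss_pos / iss_tot
    oasis_ppv = oasis_pos / oasis_tot
    if math.isclose(iss_ppv, oasis_ppv):
        ties += 1
    elif oasis_ppv > iss_ppv:
        oasis_higher += 1
    else:
        iss_higher += 1
    # Report n as well: a PPV based on 3 compounds is far less stable than one based on 459
    print(f"{name}: ISS {iss_ppv:.2f} (n={iss_tot}) vs OASIS {oasis_ppv:.2f} (n={oasis_tot})")

print(f"OASIS higher: {oasis_higher}, ISS higher: {iss_higher}, ties: {ties}")
```

In this small sample OASIS scores higher for five classes, ISS for one, and one is tied. The OASIS alerts often fire on far fewer compounds, so the counts should be read alongside the PPVs.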

The flavonoid family is well represented in the ES, with 2632 compounds comprising anthocyanidins, aurones, chalcones, dihydroflavonols, flavanones, flavones and flavonols. The ISS alert SA60 Flavonoids matched only the quercetin-type flavonoids (flavonols molecular group). Matching all flavonoids in the TS with SMARTS strings (Supplementary Table S2) showed that the TS contains 17 flavonoids, four of them mutagenic, so the positive rate for this class of compounds is low (4/17 = 0.24). Given how the flavonoid SA is coded in the ISS and OASIS lists, OASIS appeared to identify mutagenic flavonoids better (PPV 1.00), while the ISS alert included several false positives (FPs) and had a lower PPV (0.43).

3.4 Exception rules

Coumarins and flavonoids are widely used in cosmetics and personal care products, coumarins as fragrance ingredients and flavonoids for their antioxidant properties. Although several expert-based SAR predictive methods, such as the ISS SAs, consider these compounds potentially genotoxic in the Ames test, our PPV analysis gave low values for the corresponding alerts (SA30 and SA60). We therefore explored possible explanations and the biological conditions that may be responsible for the detoxification of coumarins and flavonoids.

3.4.1 Specific exception rule for coumarins

The ISS model detects SA30 (coumarins and furocoumarins) in 51 compounds of the TS and predicts them as mutagenic, but only 21 are experimentally mutagenic (Table 4). We analysed the structures of mutagenic and non-mutagenic coumarins and found that all 18 molecules with an O or N atom at position 7 that is not part of an aromatic ring were experimentally non-mutagenic (Figure 5).


Figure 5. Structure of the exception rule for coumarins.
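The exception rule can be read as a simple override on top of SA30. The sketch below assumes each compound has already been annotated with three facts: whether SA30 fires, which element is bonded at coumarin position 7, and whether that atom belongs to an aromatic ring. In practice this would come from a substructure search in a cheminformatics toolkit. The `CoumarinSite` record and the function names are hypothetical and are not part of ToxRead or the ISS rule base.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CoumarinSite:
    """Hypothetical pre-computed annotation of a compound (not a real toolkit API)."""
    sa30_fires: bool             # ISS SA30 (coumarins and furocoumarins) matched
    pos7_element: Optional[str]  # element bonded at coumarin position 7, or None
    pos7_in_aromatic_ring: bool  # True if that atom is itself part of an aromatic ring


def exception_applies(site: CoumarinSite) -> bool:
    """Exception rule: an O or N at position 7, outside any aromatic ring, overrides SA30."""
    return (
        site.sa30_fires
        and site.pos7_element in ("O", "N")
        and not site.pos7_in_aromatic_ring
    )


def sa30_call(site: CoumarinSite) -> str:
    """Combine the SA30 alert with the exception rule into a single call."""
    if not site.sa30_fires:
        return "SA30 not triggered"
    if exception_applies(site):
        return "non-mutagenic (exception rule)"
    return "potentially mutagenic (SA30)"


examples = {
    "7-methoxy substituent": CoumarinSite(True, "O", False),
    "7-amino substituent": CoumarinSite(True, "N", False),
    "O at position 7 within a fused aromatic ring": CoumarinSite(True, "O", True),
    "no heteroatom at position 7": CoumarinSite(True, None, False),
}
for label, site in examples.items():
    print(f"{label}: {sa30_call(site)}")
```

Keeping the exception as a separate predicate, rather than editing the alert itself, leaves the original SA30 match visible. This makes it easy to report how many alerts were overridden.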

A plausible biological explanation is that 3,4-epoxidation of coumarins can lead to DNA damage through covalent binding, whereas, when a coumarin can be 7-hydroxylated, its metabolites are excreted in the urine as glucuronide and sulphate conjugates. In the ES, the ISS model predicts 845 molecules as mutagenic because of SA30, but only 252 of these have a mutagenic outcome according to the IS. The remaining 593 compounds have a negative outcome, which for 473 of them is justified by the presence of the detoxifying fragment. Benfenati et al. (2015) had already encoded this rule and included it among those implemented in the ToxRead software [35]. The current analysis of coumarins confirms the rule and strengthens its statistical reliability, since it is now based on a dataset three times larger, and it clarifies the biological reasons for this behaviour.

3.4.2 Glucoside fragment

In our assessment of the ES, the glucoside fragment (GF) was present in many compounds (2816): flavonols, iridoids, flavones, steroidal alkaloids, anthocyanidins, isoflavones and monosaccharides were the molecular groups most involved, each with more than 100 compounds containing that fragment. We investigated the role of this fragment in the TS. Seventeen of the 94 compounds with at least one GF in their structure are experimentally mutagenic. Among molecules with more than one GF, the ratio of mutagenic to non-mutagenic compounds decreases: only three out of 33 compounds are mutagenic (a prevalence of 0.09, i.e. about 9%). In particular, among flavonoids, all five compounds with more than one GF (GF > 1) are experimentally non-mutagenic (Figure 6).

Figure 6. Glucoside fragment characterization: more than one glucoside ring, even if the rings are not directly linked. The R group can be an oxygen atom, an aliphatic carbon or an aromatic carbon.
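The prevalence figures for the glucoside fragment can be checked directly from the TS counts given above. Note that 3 of 33 corresponds to a prevalence of about 0.09, i.e. roughly 9%. The `glucoside_supports_negative` flag below is a hypothetical helper that expresses the GF > 1 condition. It is used as supporting evidence for a negative call, not as a stand-alone rule.

```python
def prevalence(mutagenic: int, total: int) -> float:
    """Fraction of experimentally mutagenic compounds in a group."""
    return mutagenic / total


# TS counts reported in the text
at_least_one_gf = prevalence(17, 94)   # compounds with GF >= 1
more_than_one_gf = prevalence(3, 33)   # compounds with GF > 1
print(f"GF >= 1: {at_least_one_gf:.2f} ({at_least_one_gf:.1%})")
print(f"GF > 1:  {more_than_one_gf:.2f} ({more_than_one_gf:.1%})")


def glucoside_supports_negative(n_glucoside_fragments: int) -> bool:
    """Hypothetical flag: more than one glucoside fragment supports a negative call."""
    return n_glucoside_fragments > 1
```

The two lines print 0.18 (18.1%) and 0.09 (9.1%). Halving the prevalence when a second glucoside unit is present is what motivates treating GF > 1 as detoxifying evidence.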

In the whole ES, more than one GF was found in 585 compounds: 29 of them were predicted mutagenic, mostly with low confidence, and 556 were predicted non-mutagenic, mostly with good confidence. The ISS model found SA60 in 518 compounds, while the IS gave a negative outcome for 461 of these 518 compounds, 118 of them with GF > 1.

4. CONCLUSIONS

We describe a new integrated strategy combining several in silico methods to assess the Ames mutagenicity potential of ~18000 molecules found in plant extracts and used as ingredients in cosmetics. Most of them (14,894) were classified as negative, two thirds of these (65%) with good confidence. Within the workflow of this strategy, we combined several tools and optimized their integration. We calculated the positive predictive value of expert-based structural alerts on a large database of substances; this parameter can serve to assess their reliability in this specific context. Toxicological assessment based on this kind of SA is biased toward the identification of toxic effects and tends to neglect non-toxic ones. We identified and statistically tested possible exception rules for two structural alerts (SA30 coumarins and furocoumarins, and SA60 flavonoids) that fired in a large number of compounds of the ES but gave a low PPV in the TS. This refinement improved the outcomes of the Ames integrated assessment strategy and could help avoid false positives when selecting substances for further scrutiny. We also developed a new classification model, based on a large dataset of compounds, to resolve assessments that remain equivocal after application of the IS. This information about the individual constituents of natural mixtures can be used to prioritize extracts whose chemical composition calls for a thorough investigation of their mutagenic potential because they contain constituents flagged as suspect by the in silico screening strategy. This approach is in line with proposals made in other contexts, such as the EFSA statement on genotoxicity assessment of chemical mixtures [36], which recommends a component-based approach for chemically fully defined mixtures. Read-across and (Q)SAR outcomes are mentioned there among the options for assessing mixture components individually, and they can help define further strategies to conclude the genotoxicity assessment.
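A component-based prioritization of extracts can be sketched as follows. Each constituent carries the outcome and confidence assigned by the integrated strategy, and an extract is ranked by its most concerning constituent. The three-level ranking ("high", "review", "low"), the `ConstituentCall` record and the constituent names are hypothetical choices made for this illustration. They are not the scheme used in the study.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ConstituentCall:
    """One constituent of an extract with its integrated-strategy call (hypothetical record)."""
    name: str
    outcome: str      # "Mutagenic" or "NONMutagenic", as in Table S1
    confidence: str   # "good", "moderate" or "low"


def suspect_constituents(calls: Iterable[ConstituentCall]) -> List[str]:
    """Names of constituents predicted mutagenic, whatever the confidence."""
    return [c.name for c in calls if c.outcome == "Mutagenic"]


def extract_priority(calls: Iterable[ConstituentCall]) -> str:
    """Illustrative ranking: 'high' if any constituent is a good/moderate-confidence mutagen,
    'review' if the only mutagenic calls have low confidence, 'low' otherwise."""
    calls = list(calls)
    mutagenic = [c for c in calls if c.outcome == "Mutagenic"]
    if any(c.confidence in ("good", "moderate") for c in mutagenic):
        return "high"
    if mutagenic:
        return "review"
    return "low"


extract_a = [
    ConstituentCall("constituent 1", "NONMutagenic", "good"),
    ConstituentCall("constituent 2", "Mutagenic", "moderate"),
]
extract_b = [
    ConstituentCall("constituent 3", "NONMutagenic", "good"),
    ConstituentCall("constituent 4", "Mutagenic", "low"),
]
print(extract_priority(extract_a), suspect_constituents(extract_a))  # high ['constituent 2']
print(extract_priority(extract_b))                                   # review
```

Ranking by the worst constituent mirrors the component-based logic: a single suspect constituent is enough to send the whole extract for further scrutiny.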


Acknowledgements

We would like to thank Pr. Sylvie Michel and Dr. Hanh Dufat from the Laboratory of Pharmacognosy-UMR of the University Paris Descartes for their valuable work defining molecular groups.

References

1. Regulation (EC) No 1223/2009 of the European Parliament and of the Council of 30 November 2009 on cosmetic products. http://data.europa.eu/eli/reg/2009/1223/oj
2. Beatriz P.P. Oliveira and Francisca Rodrigues (Special Issue Editors), Plant Extracts in Skin Care Products. https://doi.org/10.3390/books978-3-03897-161-0
3. Regulation (EC) No 1907/2006 of the European Parliament and of the Council of 18 December 2006 concerning the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH), establishing a European Chemicals Agency, amending Directive 1999/45/EC and repealing Council Regulation (EEC) No 793/93 and Commission Regulation (EC) No 1488/94 as well as Council Directive 76/769/EEC and Commission Directives 91/155/EEC, 93/67/EEC, 93/105/EC and 2000/21/EC. http://data.europa.eu/eli/reg/2006/1907/oj
4. EFSA report: Guidance on tiered risk assessment for plant protection products for aquatic organisms in edge-of-field surface waters. https://www.efsa.europa.eu/it/efsajournal/pub/3290
5. EFSA Supporting Publications: Evaluation of the applicability of existing (Q)SAR models for predicting the genotoxicity of pesticides and similarity analysis related with genotoxicity of pesticides for facilitating of grouping and read across. doi: 10.2903/sp.efsa.2019.EN-1598
6. E. Mombelli and S. Ringeissen, The computational prediction of toxicological effects in regulatory contexts, L’Actualité chimique 335 (2009), pp. 52–59.
7. N. Fjodorova, M. Novich, N. Vrachko, et al., Directions in QSAR modeling for regulatory uses in OECD member countries, EU and in Russia, J. Environ. Sci. Health C 26 (2008), pp. 201–236.
8. Guidance on information requirements and chemical safety assessment. Chapter R.7a: Endpoint specific guidance. https://echa.europa.eu/documents/10162/13632/information_requirements_r7a_en.pdf/e4a2a18fa2bd-4a04-ac6d-0ea425b2567f
9. Guidance on information requirements and chemical safety assessment. Chapter R.6: QSARs and grouping of chemicals. https://echa.europa.eu/documents/10162/13632/information_requirements_r6_en.pdf/77f49f81b76d-40ab-8513-4f3a533b6ac9
10. OECD Environment Health and Safety Publications, Series on Testing and Assessment No. 69, OECD Guidance Document on the Validation of (Quantitative) Structure-Activity Relationships [(Q)SAR] Models. http://www.oecd.org/officialdocuments/publicdisplaydocumentpdf/?doclanguage=en&cote=env/jm/mono(2007)2
11. ICH M7(R1), 2017. Assessment and Control of DNA Reactive (Mutagenic) Impurities in Pharmaceuticals to Limit Potential Carcinogenic Risk. http://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Multidisciplinary/M7/M7_R1_Addendum_Step_4_31Mar2017.pdf
12. Amberg, A., Beilke, L., Bercu, J., Bower, D., Brigo, A., Cross, K. P., Custer, L., Dobo, K., Dowdy, E., Ford, K. A., Glowienke, S., Van Gompel, J., Harvey, J., Hasselgren, C., Honma, M., Jolly, R., Kemper, R., Kenyon, M., Kruhlak, N., Leavitt, P., Miller, S., Muster, W., Nicolette, J., Plaper, A., Powley, M., Quigley, D. P., Reddy, M. V., Spirkl, H. P., Stavitskaya, L., Teasdale, A., Weiner, S., Welch, D. S., White, A., Wichard, J. and Myatt, G. J., Principles and procedures for implementation of ICH M7 recommended (Q)SAR analyses, Regul. Toxicol. Pharmacol. 77 (2016), pp. 13–24.
13. OECD (1997), Test No. 471: Bacterial Reverse Mutation Test, OECD Guidelines for the Testing of Chemicals, Section 4, OECD Publishing, Paris. https://doi.org/10.1787/9789264071247-en
14. K. Hansen, S. Mika, T. Schroeter, A. Sutter, A. ter Laak, T. Steger-Hartmann, N. Heinrich, and K.-R. Müller, Benchmark data set for in silico prediction of Ames mutagenicity, J. Chem. Inf. Model. 49 (2009), pp. 2077–2081. Benchmark data: http://doc.ml.tu-berlin.de/toxbenchmark/ (accessed 4/30/10)
15. A. Cassano, G. Raitano, E. Mombelli, A. Fernández, J. Cester, A. Roncaglioni, and E. Benfenati, Evaluation of QSAR models for the prediction of Ames genotoxicity: a retrospective exercise on the chemical substances registered under the EU REACH regulation, J. Environ. Sci. Health C Environ. Carcinog. Ecotoxicol. Rev. 32(3) (2014), pp. 273–298. doi: 10.1080/10590501.2014.938955
16. https://www.vegahub.eu/
17. T. Ferrari and G. Gini, An open source multistep model to predict mutagenicity from statistical analysis and relevant structural alerts, Chem. Cent. J. 4 Suppl. 1, S2 (2010), pp. 1–6.
18. J. Kazius, R. McGuire, and R. Bursi, Derivation and validation of toxicophores for mutagenicity prediction, J. Med. Chem. 48 (2005), pp. 312–320.
19. T. Ferrari, D. Cattaneo, G. Gini, N. Golbamaki Bakhtyari, A. Manganaro, and E. Benfenati, Automatic knowledge extraction from chemical structures: The case of mutagenicity prediction, SAR QSAR Environ. Res. 24 (2013), pp. 365–383.
20. http://toxtree.sourceforge.net
21. R. Benigni and C. Bossa, Structure alerts for carcinogenicity, and the Salmonella assay system: A novel insight through the chemical relational databases technology, Mutat. Res. 659 (2008), pp. 248–261.
22. A. Manganaro, F. Pizzo, A. Lombardo, A. Pogliaghi, and E. Benfenati, Predicting persistence in the sediment compartment with a new automatic software based on the k-Nearest Neighbor (k-NN) algorithm, Chemosphere 144 (2016), pp. 1624–1630.
23. Ames/QSAR international project, National Institute of Health Sciences in Japan. http://www.nihs.go.jp/dgm/amesqsar.html
24. T. Martin, User’s Guide for T.E.S.T. (Toxicity Estimation Software Tool), U.S. EPA/National Risk Management Research Laboratory/Sustainable Technology Division, Cincinnati, OH (2016). Available at https://www.epa.gov/sites/production/files/2016-05/documents/600r16058.pdf
25. https://www.qsartoolbox.org/it/
26. O. Mekenyan, S. Dimitrov, R. Serafimova, E. Thompson, S. Kotov, N. Dimitrova, and J. Walker, Identification of the structural requirements for mutagenicity by incorporating molecular flexibility and metabolic activation of chemicals I: TA100, Chem. Res. Toxicol. 17 (2004), pp. 753–766.
27. R. Serafimova, M. Todorov, T. Pavlov, S. Kotov, E. Jacob, A. Aptula, and O. Mekenyan, Identification of the structural requirements for mutagenicity, by incorporating molecular flexibility and metabolic activation of chemicals. II. General Ames mutagenicity model, Chem. Res. Toxicol. 20 (2007), pp. 662–676.
28. European Chemicals Agency, Registered substances (2014). http://echa.europa.eu/web/guest/information-on-chemicals/registered-substances
29. http://www.life-caleidos.eu/pages/project.php
30. J.A. Cooper, R. Saracci, and P. Cole, Describing the validity of carcinogen screening tests, Br. J. Cancer 39 (1979), pp. 87–89.
31. P. Dao, K. Wang, C. Collins, M. Ester, A. Lapuk, and S.C. Sahinalp, Optimally discriminative subnetwork markers predict response to chemotherapy, Bioinformatics 27 (2011), pp. 205–213.
32. R. Benigni and C. Bossa, Mechanisms of chemical carcinogenicity and mutagenicity: a review with implications for predictive toxicology, Chem. Rev. 111 (2011), pp. 2507–2536. doi: 10.1021/cr100222q
33. R. Benigni and C. Bossa, Structural alerts of mutagens and carcinogens, Curr. Comput. Aided Drug Des. 2(2) (2006), pp. 169–176.
34. M. Floris, G. Raitano, R. Medda, and E. Benfenati, Fragment prioritization on a large mutagenicity dataset, Mol. Inform. doi: 10.1002/minf.201600133
35. E. Benfenati, S. Manganelli, S. Giordano, G. Raitano, and A. Manganaro, Hierarchical rules for read-across and in silico models of mutagenicity, J. Environ. Sci. Health C 33(4) (2015), pp. 385–403. doi: 10.1080/10590501.2015.1096881
36. Genotoxicity assessment of chemical mixtures, EFSA Journal 17(1) (2019), 5519. doi: 10.2903/j.efsa.2019.5519. https://www.efsa.europa.eu/en/efsajournal/pub/5519


Supplementary Material

Table S1. All combinations of the integrated strategy used during the ES assessment. Where the IS outcome was not reliable, a prediction from the new SARpy model was provided ("SARpy" in the Outcome column).

Consensus of 2 models (TEST & VEGA) | Single models | OASIS alerts (OECD QSAR Toolbox) | Outcome | Confidence
agree about nonmutagenicity | all models agree about nonmutagenicity | no alert found | NONMutagenic | good
agree about nonmutagenicity | 6/7 models agree about nonmutagenicity | no alert found | NONMutagenic | good
agree about nonmutagenicity | 6/7 models agree about nonmutagenicity, exception rule | no alert found | NONMutagenic | good
agree about nonmutagenicity | 6/7 models agree about nonmutagenicity, ISS finds alert not reliable | no alert found | NONMutagenic | moderate
agree about nonmutagenicity | 6/7 models agree about nonmutagenicity, ISS finds alert reliable (acc>=0.7) | no alert found | NONMutagenic | moderate
agree about nonmutagenicity | 6/7 models agree about nonmutagenicity, no exception rule | no alert found | NONMutagenic | moderate
agree about nonmutagenicity | 5/7 models agree about nonmutagenicity, exception rule | no alert found | NONMutagenic | good
agree about nonmutagenicity | 5/7 models agree about nonmutagenicity, alert not reliable | no alert found | NONMutagenic | moderate
agree about nonmutagenicity | 5/7 models agree about nonmutagenicity, an alert not reliable, glucoside fragments | no alert found | NONMutagenic | moderate
agree about nonmutagenicity | 5/7 models agree about nonmutagenicity | no alert found | NONMutagenic | moderate
agree about nonmutagenicity | 5/6 models agree about nonmutagenicity | no alert found | NONMutagenic | moderate
agree about nonmutagenicity | 5/7 models agree about nonmutagenicity but ISS finds alert reliable (acc>=0.7) | no alert found | SARpy | low
agree about nonmutagenicity | 5/7 models agree about nonmutagenicity but ISS finds alert reliable (acc>=0.7) but ADI=0 or very low | no alert found | NONMutagenic | moderate or low
agree about nonmutagenicity | 4/7 models agree about nonmutagenicity, exception rule | no alert found | NONMutagenic | moderate
agree about nonmutagenicity | 4/7 models agree about nonmutagenicity but ISS finds alert reliable (acc>=0.7) | no alert found | SARpy | low

agree about nonmutagenicity | 4/7 models agree about nonmutagenicity but ISS finds alert reliable (acc>=0.7), ADI 0 or very low | no alert found | NONMutagenic | moderate or low
agree about nonmutagenicity | 4/7 models agree about nonmutagenicity | no alert found | NONMutagenic | moderate
agree about nonmutagenicity | 4/7 models agree about nonmutagenicity, ISS finds alert not reliable | no alert found | NONMutagenic | moderate
agree about nonmutagenicity | 4/6 models agree about nonmutagenicity | no alert found | NONMutagenic | moderate
agree about nonmutagenicity | 3/7 models agree about nonmutagenicity | no alert found | NONMutagenic | moderate
agree about nonmutagenicity | 3/7 models agree about nonmutagenicity but ISS finds alert reliable (acc>=0.7) | no alert found | SARpy | low
agree about nonmutagenicity | 3/7 models agree about nonmutagenicity but ISS finds alert reliable (acc>=0.7), ADI 0 or very low | no alert found | NONMutagenic | moderate or low
agree about nonmutagenicity | two models do not provide a prediction | no alert found | NONMutagenic | moderate

agree about nonmutagenicity | all models agree about nonmutagenicity | alert found | NONMutagenic | moderate
agree about nonmutagenicity | 6/7 models agree about nonmutagenicity | alert found | NONMutagenic | moderate
agree about nonmutagenicity | 6/7 models agree about nonmutagenicity, ISS finds an alert not reliable | alert found | NONMutagenic | moderate
agree about nonmutagenicity | 6/7 models agree about nonmutagenicity, ISS finds alert reliable but the ADI value is 0 | alert found | NONMutagenic | moderate
agree about nonmutagenicity | 6/7 models agree about nonmutagenicity, exception rule | alert found | NONMutagenic | moderate
agree about nonmutagenicity | 6/7 models agree about nonmutagenicity, no exception rule | alert found | NONMutagenic | low
agree about nonmutagenicity | 6/7 models agree about nonmutagenicity, glucoside fragments | alert found | NONMutagenic | moderate
agree about nonmutagenicity | 6/7 models agree about nonmutagenicity, ISS finds alert reliable (acc>=0.7) | alert found | SARpy | low
agree about nonmutagenicity | 5/7 models agree about nonmutagenicity | alert found | NONMutagenic | low

agree about nonmutagenicity | 5/7 models agree about nonmutagenicity, ISS finds an alert not reliable and ADI 0 or glucoside fragments or exception rule | alert found | NONMutagenic | moderate
agree about nonmutagenicity | 5/7 models agree about nonmutagenicity, ISS finds alert reliable (acc>=0.7) | alert found | SARpy | low
agree about nonmutagenicity | 4/7 models agree about nonmutagenicity | alert found | SARpy | low
agree about nonmutagenicity | 4/7 models agree about nonmutagenicity, ISS finds alert reliable (acc>=0.7) | alert found | Mutagenic | moderate or low
agree about nonmutagenicity | 3/7 models agree about nonmutagenicity | alert found | Mutagenic | low
agree about nonmutagenicity | 3/7 models agree about nonmutagenicity, ISS finds alert reliable (acc>=0.7) | alert found | Mutagenic | moderate or low
TEST provides no prediction | all models of VEGA agree about non-mutagenicity | alert found | SARpy | low

TEST provides no prediction | all models of VEGA agree about non-mutagenicity | no alert found | NONMutagenic | moderate
TEST provides no prediction, VEGA consensus predicts mutagenic | 3/4 models of VEGA predict mutagenic (ISS finds alert reliable) | no alert found | Mutagenic | moderate
TEST provides no prediction, VEGA consensus predicts mutagenic | 2/4 models of VEGA predict mutagenic (ISS finds alert reliable) | no alert found | SARpy | low
TEST provides no prediction, VEGA consensus predicts non-mutagenic | 2/4 models of VEGA predict mutagenic (ISS finds alert reliable) | no alert found | SARpy | low
TEST provides no prediction, VEGA consensus predicts non-mutagenic | 3/4 models of VEGA predict non-mutagenic, the other model provides no prediction | no alert found | NONMutagenic | low
TEST provides no prediction, VEGA consensus predicts non-mutagenic | 3/4 models of VEGA predict non-mutagenic (ISS finds alert reliable) | no alert found | SARpy | low
TEST provides no prediction, VEGA consensus predicts non-mutagenic | 3/4 models of VEGA predict mutagenic | no alert found | SARpy | low
TEST provides no prediction, VEGA consensus predicts non-mutagenic | 1/4 models of VEGA predict mutagenic | no alert found | NONMutagenic | low
agree about mutagenicity | all models agree about mutagenicity | alert found | Mutagenic | good

agree about mutagenicity | 6/7 models agree about mutagenicity | alert found | Mutagenic | good
agree about mutagenicity | 5/6 models agree about mutagenicity | alert found | Mutagenic | good
agree about mutagenicity | 5/7 models agree about mutagenicity | alert found | Mutagenic | good
agree about mutagenicity | 4/7 models agree about mutagenicity | alert found | Mutagenic | good
agree about mutagenicity | all models agree about mutagenicity | no alert found | Mutagenic | good
agree about mutagenicity | 5/6 models agree about mutagenicity (ISS finds alert reliable, acc>=0.7) | no alert found | Mutagenic | good
agree about mutagenicity | 5/7 models agree about mutagenicity (ISS finds alert reliable, acc>=0.7) | no alert found | Mutagenic | good

agree about mutagenicity | 6/7 models agree about mutagenicity (ISS finds alert reliable) | no alert found | Mutagenic | good
agree about mutagenicity | 6/7 models agree about mutagenicity | no alert found | Mutagenic | moderate
agree about mutagenicity | 6/7 models agree about mutagenicity, exception rule | no alert found | Mutagenic | moderate
agree about mutagenicity | 5/7 models agree about mutagenicity | no alert found | Mutagenic | moderate
agree about mutagenicity | 5/7 models agree about mutagenicity (ISS finds alert reliable) | no alert found | Mutagenic | good
agree about mutagenicity | 5/7 models agree about mutagenicity (ISS finds alert reliable but the ADI value is 0) | no alert found | Mutagenic | moderate
agree about mutagenicity | 5/7 models agree about mutagenicity | no alert found | Mutagenic | moderate
agree about mutagenicity | 5/7 models agree about mutagenicity, no exception rule | no alert found | Mutagenic | good
agree about mutagenicity | 4/7 models agree about mutagenicity, exception rule, ISS finds alert reliable but ADI=0 | no alert found | Mutagenic | moderate
agree about mutagenicity | 4/7 models agree about mutagenicity, no exception rule | no alert found | Mutagenic | good
agree about mutagenicity | 4/7 models agree about mutagenicity (ISS finds alert reliable, acc>=0.7) | no alert found | Mutagenic | good
agree about mutagenicity | 4/6 models agree about mutagenicity | no alert found | Mutagenic | moderate
agree about mutagenicity | 3/7 models agree about mutagenicity | no alert found | Mutagenic | moderate
agree about mutagenicity | 3/7 models agree about mutagenicity; exception rule | no alert found | Mutagenic | low

disagree

6/7 models agree about nonmutagenicity

no alert found

disagree

1/7 model predicts mutagenic

no alert found

disagree

1/6 model predicts mutagenic 4/6 models predict mutagenicity (ISS finds alert reliable) 4/6 models predict mutagenicity (ISS finds alert reliable) 3/7 models predict mutagenicity, exception rule

no alert found

NONMutagenic NONMutagenic SARpy

alert found

Mutagenic

moderate

no alert found

SARpy

low

no alert found

NONMutagenic

low

2/7 models predict mutagenic

alert found

SARpy

low

alert found

Mutagenic

moderate

no alert found

Mutagenic

moderate

disagree disagree disagree disagree disagree disagree

6/7 models predict mutagenic (ISS finds alert reliable) 5/7 models predict mutagenic (ISS finds alert reliable, acc>=0.7)

moderate low low

disagree | 5/7 models predict mutagenic (ISS finds alert reliable) but ADI=0 or exception rule | alert found | Mutagenic | low
disagree | 6/7 models predict mutagenic (ISS finds alert reliable) | no alert found | Mutagenic | moderate
disagree | 5/7 models predict mutagenic | no alert found | SARpy | low
disagree | 3/7 models predict mutagenic (ISS finds alert reliable) | alert found | Mutagenic | low
disagree | 3/7 models predict mutagenic (ISS finds an alert not reliable) | alert found | SARpy | low
disagree | 4/7 models predict mutagenic | alert found | Mutagenic | low
disagree | 4/7 models predict mutagenic; ISS finds an alert not reliable | alert found | Mutagenic | low
disagree | 4/7 models predict mutagenic (ISS finds alert reliable) | alert found | Mutagenic | moderate
disagree | 4/7 models predict mutagenic, no exception rule | alert found | Mutagenic | moderate

disagree | 4/7 models predict mutagenic | no alert found | SARpy | low
disagree | 4/7 models predict mutagenic (ISS finds alert reliable, acc>=0.7) | no alert found | Mutagenic | low
disagree | 4/7 models predict mutagenic, exception rule, no alert found by QSAR Toolbox | no alert found | SARpy | low
disagree | 3/6 models predict mutagenic | no alert found | SARpy | low
disagree | 3/7 models predict mutagenic, no exception rule | no alert found | SARpy | low
disagree | 3/7 models predict mutagenic | alert found | SARpy | low
disagree | 3/7 models predict mutagenic (ISS finds alert reliable) | alert found | Mutagenic | low
disagree | 3/7 models predict mutagenic (ISS finds alert reliable) | no alert found | SARpy | low
disagree | 3/7 models predict mutagenic | no alert found | SARpy | low
disagree | 3/7 models predict mutagenic, exception rule | no alert found | NONMutagenic | low
disagree | 2/7 models predict mutagenic | no alert found | SARpy | low
disagree | 2/7 models predict mutagenic, no exception rule | no alert found | SARpy | low
disagree | 2/7 models predict mutagenic, exception rule | no alert found | NONMutagenic | low
disagree | 2/6 models predict mutagenic | no alert found | SARpy | low
disagree | two models do not provide a prediction | no alert found | SARpy | low

SARpy

low

Only VEGA prediction is available, low score Only one model of TEST predicts. VEGA consensus and that single model agree about mutagenicity Only one model of TEST predicts. VEGA consensus and that single model agree about nonmutagenicity Only one model of TEST predicts. VEGA consensus and that single model disagree agree about mutagenicity

2/4 VEGA models predict mutagenic (ISS finds alert reliable)

no alert found

Mutagenic

low

All VEGA models predict nonmutagenic

no alert found

NONMutagenic

moderate

no alert found

SARpy

low

no alert found

SARpy

low

2/7 models agree about mutagenicity
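Table S1 works as a lookup table. A combination of the two-model consensus, the summary of the single models and the OASIS alert status maps to an outcome and a confidence. Where the outcome is "SARpy", the prediction of the new SARpy model is used instead. The sketch below encodes a handful of rows transcribed from the table. `sarpy_predict` is a placeholder callable standing in for the SARpy model, not a real API. Combinations not transcribed here raise `KeyError`.

```python
from typing import Callable, Dict, Tuple

Key = Tuple[str, str, str]  # (2-model consensus, single-model summary, OASIS alert status)

# A handful of rows transcribed from Table S1; the full table has many more combinations.
RULES: Dict[Key, Tuple[str, str]] = {
    ("agree about nonmutagenicity",
     "all models agree about nonmutagenicity",
     "no alert found"): ("NONMutagenic", "good"),
    ("agree about nonmutagenicity",
     "6/7 models agree about nonmutagenicity, ISS finds alert reliable (acc>=0.7)",
     "alert found"): ("SARpy", "low"),
    ("agree about mutagenicity",
     "all models agree about mutagenicity",
     "alert found"): ("Mutagenic", "good"),
    ("disagree",
     "two models do not provide a prediction",
     "no alert found"): ("SARpy", "low"),
}


def integrated_call(key: Key, sarpy_predict: Callable[[], str]) -> Tuple[str, str, str]:
    """Return (outcome, confidence, source) for one combination of model results.

    When the table says "SARpy", the IS outcome is not considered reliable and the
    placeholder `sarpy_predict` (standing in for the new SARpy model) is called instead.
    """
    outcome, confidence = RULES[key]  # KeyError for combinations not transcribed here
    if outcome == "SARpy":
        return sarpy_predict(), confidence, "new SARpy model"
    return outcome, confidence, "integrated strategy"


example_key = ("disagree", "two models do not provide a prediction", "no alert found")
print(integrated_call(example_key, lambda: "Mutagenic"))
# ('Mutagenic', 'low', 'new SARpy model')
```

Keeping the table as data, separate from the dispatch function, means that new combinations or revised confidence levels can be added without touching the fallback logic.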


Table S2. SMARTS strings used to detect all flavonoids in the TS.

Molecular groups | SMARTS
Flavones + flavonols | Oc1cc(O)c2C(=O)C=C(Oc2c1)c3ccc(O)[$([cH]),$(cO)]c3
Flavanones + dihydroflavonols + flavan-3-ols + flavan-3,4-diols | Oc1cc(O)c2[$([CH2]),$(C=O),$(CO)]CC(Oc2c1)c3ccc(O)[$([cH]),$(cO)]c3
Chalcones | Oc1ccc(C(=O)C=Cc2ccc(O)[$([cH]),$(cO)]c2)c(O)c1
Aurones | Oc1ccc(C=C2Oc3cc(O)ccc3C2=O)cc1
Anthocyanidins | Oc1cc(O)c2C=CC(=[O+]c2c1)c3ccc(O)c(Cl)c3


Declaration of interests

☐ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

☒The authors declare the following financial interests/personal relationships which may be considered as potential competing interests:

The employers of some authors (Mario Negri Institute, Greenpharma and UNITIS) received payment from the Botanical alliance consortium because they participated in the first phase of the NCStox project. This allowed the development of a predictive database determining the toxicological profile of more than 18000 natural compounds used in cosmetics.


Graphical abstract


Highlights 

The aim was to assess the genotoxic potential of a large dataset of compounds by integrating several in silico models



The integration strategy was tuned and applied to ~ 18000 molecules found in plant extracts used in cosmetics



A new SAR model was developed in Python, based on active and inactive rules



The in silico methods were refined
