3D QSAR modeling of 4-nerolidylcatechol derivatives and virtual screening for identification of potent plasmodium inhibitor

The present study aimed to develop a three-dimensional quantitative structure–activity relationship (3D QSAR) model based on the structure of 4-nerolidylcatechol (IC50 = 0.67 µM), a novel plant-derived Plasmodium inhibitor, and its derivatives, in order to identify efficient antimalarial leads. A statistically validated Partial Least-Squares (PLS) based Molecular Field Analysis (MFA) model was built using a training set of eight 4-nerolidylcatechol derivatives and their diverse conformers. A statistically reliable model with good predictive power (cross-validated correlation coefficient q² = 0.769) was obtained. The generated model was then used to screen a library of 30,000 compounds from the Chembridge database (http://www.chembridge.com). Drug-likeness prediction and an ADMET study suggested six compounds as potential antimalarial/antiplasmodial leads.


Introduction
Malaria is a deadly disease caused by infection with Plasmodium spp. and is a major public health problem worldwide. To date, malaria causes the death of 1-2 million people annually around the world, with an estimated 300-500 million cases (Bharati and Ganguly, 2013; Mohapatra et al., 1998; NVBDCP, 2010). The detection and reporting of drug-resistant strains of the malaria parasite is regarded as a major challenge to malaria control and is leading to a continuous increase in malaria incidence (Crompton et al., 2014; Ghosh et al., 2000). To date, no single drug has been discovered that completely eradicates all types of Plasmodium spp. Therefore, the development of new potential antimalarial lead candidates is an urgent need.
In the present investigation, a Partial Least-Squares (PLS) based 3D QSAR model of the 4-nerolidylcatechol derivatives was built and used to identify novel antimalarial compounds in a cheminformatics database (library). 4-Nerolidylcatechol (4-NC, Figure 1) is a metabolite reported from Piper peltatum L. and Piper umbellatum L. (syn. Pothomorphe peltata (L.) Miq. and Pothomorphe umbellata (L.) Miq.). According to previous studies, 4-nerolidylcatechol has strong antimalarial activity as well as antioxidant and anti-inflammatory activities. In recent investigations, new derivatives of 4-NC were synthesized and experimentally validated for the inhibition of Plasmodium (Bagatela et al., 2013; Pinto et al., 2009; Rocha et al., 2011; Silva Lima et al., 2012).

Materials and Methods
Dataset preparation: A set of eight 4-nerolidylcatechol derivatives with diverse, experimentally determined inhibitory activity (IC50) data was compiled from the literature. 4-Nerolidylcatechol is a catechol derivative known for its antimalarial activity (Bagatela et al., 2013; Pinto et al., 2009; Rocha et al., 2011; Silva Lima et al., 2012).
The 2D structures of the compounds were drawn using MarvinSketch v6.2, and Open Babel was used for molecular file format conversion. Energy minimization of all dataset compounds was performed using the CHARMM module of Discovery Studio v3.1 (Arooj et al., 2011). Physicochemical properties of the dataset compounds, such as the number of hydrogen bond donors, the number of hydrogen bond acceptors, AlogP and molecular weight (in Daltons), were also computed for the drug-likeness study (Gogoi et al., 2014).
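The drug-likeness filtering that these properties feed into (Lipinski's and Veber's rules, as named in the Results) can be sketched as follows. This is a minimal illustration that assumes the descriptors have already been computed elsewhere. The class and function names, the tolerance of one Lipinski violation, and the example values are assumptions for illustration, not settings reported in the study.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    """Precomputed descriptors for one compound (supplied by the caller)."""
    mol_weight: float          # Dalton
    alogp: float               # calculated octanol/water partition coefficient
    h_donors: int              # hydrogen bond donors
    h_acceptors: int           # hydrogen bond acceptors
    rotatable_bonds: int
    polar_surface_area: float  # square angstroms


def passes_lipinski(p: Properties, max_violations: int = 1) -> bool:
    """Lipinski's rule of five, tolerating at most `max_violations` violations.

    The tolerance of one violation is an assumption; pass 0 for a strict filter.
    """
    violations = sum([
        p.mol_weight > 500,
        p.alogp > 5,
        p.h_donors > 5,
        p.h_acceptors > 10,
    ])
    return violations <= max_violations


def passes_veber(p: Properties) -> bool:
    """Veber's criteria: at most 10 rotatable bonds and PSA of at most 140 A^2."""
    return p.rotatable_bonds <= 10 and p.polar_surface_area <= 140.0


def is_drug_like(p: Properties) -> bool:
    """A compound is kept only if it satisfies both rule sets."""
    return passes_lipinski(p) and passes_veber(p)


# Hypothetical property values, for illustration only (not taken from the study).
example = Properties(320.4, 3.1, 2, 4, 5, 75.0)
print(is_drug_like(example))
```

Keeping the two rule sets in separate functions makes it easy to report which criterion rejected a given compound.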

Conformer generation: Conformer generation is an important step in 3D QSAR modeling. Herein, we employed the Poling algorithm to generate a maximum of 255 diverse conformations for every training set compound, with an energy threshold of 20 kcal/mol above the calculated energy minimum. These conformations were generated using the diverse conformation generation protocol, with the conformation method set to FAST and CHARMM as the input force field. In FAST mode, the conformational space of small molecules is explored using an efficient systematic search. If the molecule is too large, only one conformation is generated for each possible combination of stereocenters. For molecules that are neither too small nor too large, as measured by their flexibility, conformations are generated with a random search method that uses poling. In the conformation generation step, Maximum Systematic Conformations was set to 1000, Conformation Boltzmann to 300 and the temperature cutoff to 0.2. The number of clusters was set to 20. All other parameters were kept at their default values. This procedure was carried out in the DS v3.1 workspace (Mitra et al., 2010).
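The acceptance criteria described above (an energy window of 20 kcal/mol above the minimum and a cap of 255 conformers) can be illustrated with a short sketch. It does not implement the Poling algorithm or the FAST search itself. It only shows how a list of already generated conformer energies might be filtered, and the function name and example energies are hypothetical.

```python
def select_conformers(energies, window=20.0, max_conformers=255):
    """Return indices of conformers within `window` kcal/mol of the minimum.

    Indices are ordered from lowest to highest energy and capped at
    `max_conformers` entries.
    """
    if not energies:
        return []
    e_min = min(energies)
    kept = [i for i, e in enumerate(energies) if e - e_min <= window]
    kept.sort(key=lambda i: energies[i])
    return kept[:max_conformers]


# Hypothetical conformer energies in kcal/mol.
energies = [12.5, 3.0, 40.2, 22.9, 23.1]
print(select_conformers(energies))  # prints [1, 0, 3]
```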
Training and test set preparation: Training and test sets were generated from the eight dataset compounds along with their conformers. In a quantitative structure–activity relationship study, the training set is the set of compounds used to discover potentially predictive relationships, and the test set is the set used to assess the strength and utility of a predictive relationship. Herein, the random splitting method of the DS software was used to generate the training and test sets, with the training set percentage set to 80.
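A random 80/20 split of this kind can be sketched in a few lines. The sketch below is not the DS implementation. The function name, the fixed seed, and the choice to split at the compound level (rather than at the conformer level) are assumptions made for illustration.

```python
import random


def random_split(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of `items` and split it into (training, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers standing in for the eight dataset compounds.
compounds = [f"cmpd_{i}" for i in range(1, 9)]
train, test = random_split(compounds)
print(len(train), len(test))  # prints 6 2
```

With eight compounds, 80% rounds to six training compounds and two test compounds.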
3D QSAR model generation and validation: QSAR modeling is used to generate predictive models that correlate biological activity with the structural descriptors of a molecule. In rational drug design, QSAR plays a pivotal role in predicting the potency of untested compounds as optimized lead candidates (Dearden, 2003; Scholz et al., 2013). In the 3D QSAR method, energy potentials calculated from the 3D structures of a set of ligands are used as descriptors to build a model that relates biological activity to 3D structure. In most cases, these ligands all bind to the same binding site of the same or similar receptors. In the current investigation, the 3D QSAR model was generated from the eight dataset compounds, including their conformers, using DS v3.1 (Dell server running Windows). The CHARMM force field was used, and the electrostatic potential and the van der Waals potential were treated as separate terms. A +1e point charge was used as the electrostatic potential probe, and a distance-dependent dielectric constant was used to mimic the solvation effect. For the van der Waals potential, a carbon atom with a 1.73 Å radius was used as the probe. The energy grid potentials were filtered to remove highly correlated descriptors, and a partial least squares (PLS) model was then built using the remaining descriptors. The model was subsequently used to predict the antimalarial activity of unknown compounds.
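To make the grid descriptors concrete, the sketch below evaluates an electrostatic and a van der Waals probe energy at points on a regular grid. It is a simplified illustration, not the DS implementation. It assumes the simplest distance-dependent dielectric, ε(r) = r, and a Lennard-Jones term in the CHARMM-style Rmin form. The probe well depth, the minimum-distance guard, the atom parameters and the grid box are hypothetical values chosen for illustration.

```python
import itertools
import math

COULOMB_CONSTANT = 332.0716  # kcal*angstrom/(mol*e^2)
PROBE_CHARGE = 1.0           # +1e point charge (from the text)
PROBE_RADIUS = 1.73          # angstrom, carbon probe radius (from the text)
PROBE_WELL_DEPTH = 0.1       # kcal/mol, hypothetical value


def probe_energies(atoms, point, min_distance=0.5):
    """Return (electrostatic, van der Waals) probe energies at `point`.

    `atoms` is a list of (x, y, z, charge, rmin_half, well_depth) tuples.
    Distances are clamped to `min_distance` to avoid division by zero.
    """
    elec = 0.0
    vdw = 0.0
    for x, y, z, charge, rmin_half, well_depth in atoms:
        r = max(math.dist((x, y, z), point), min_distance)
        # Coulomb term with dielectric eps(r) = r, hence the 1/r^2 dependence.
        elec += COULOMB_CONSTANT * PROBE_CHARGE * charge / (r * r)
        # Lennard-Jones term: eps_ij * [(Rmin/r)^12 - 2 * (Rmin/r)^6].
        rmin = rmin_half + PROBE_RADIUS
        eps_ij = math.sqrt(well_depth * PROBE_WELL_DEPTH)
        ratio6 = (rmin / r) ** 6
        vdw += eps_ij * (ratio6 * ratio6 - 2.0 * ratio6)
    return elec, vdw


def grid_points(lower, upper, spacing):
    """Return an iterator over regularly spaced points in an axis-aligned box."""
    axes = []
    for lo, hi in zip(lower, upper):
        steps = int(math.floor((hi - lo) / spacing + 1e-9)) + 1
        axes.append([lo + k * spacing for k in range(steps)])
    return itertools.product(*axes)


# Hypothetical two-atom "molecule": (x, y, z, charge, rmin_half, well_depth).
atoms = [(0.0, 0.0, 0.0, 0.5, 2.0, 0.1), (1.5, 0.0, 0.0, -0.5, 1.8, 0.12)]
points = list(grid_points((-2.0, -2.0, -2.0), (2.0, 2.0, 2.0), 2.0))
energies = [probe_energies(atoms, pt) for pt in points]
# One descriptor vector per molecule: all electrostatic values, then all vdW values.
descriptors = [e for e, _ in energies] + [v for _, v in energies]
print(len(points), len(descriptors))  # prints 27 54
```

Each molecule (or conformer) yields one such descriptor vector, and the vectors of all molecules form the rows of the descriptor matrix used for regression.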
The partial least squares (PLS) model: Partial least squares regression is an extension of the multiple linear regression model. In PLS, rather than using all the independent variables (as in multiple linear regression), a small number of principal components is used. PLS creates a multiple-term linear equation based on a principal component analysis transformation of the independent variables. However, unlike principal component analysis, the dependent variable is transformed as well. Axes are chosen that retain as much variance as possible while also correlating the dependent and independent variables. More specifically, the covariance of a transformed independent variable with a transformed dependent variable is maximized.
As in multiple linear regression, the main purpose of partial least squares regression is to build a linear model:
$$Y = XB + E$$
where Y is a response matrix (or vector) formed by the dependent variables, X is a matrix formed by the independent variables, B is a matrix of the regression coefficients, and E is an error term for the model.
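For a single dependent variable (such as one activity value per compound), the model above can be fitted with the NIPALS form of PLS, often called PLS1. The following is a minimal pure-Python sketch of that technique, not the DS implementation. The function names and the toy data are hypothetical. Each component is a weight vector chosen along the direction of maximum covariance between the (deflated) descriptors and the (deflated) response.

```python
def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS. X is a list of rows, y a list of responses."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xr = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered X
    yr = [v - y_mean for v in y]                                # centered y
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X-space with maximum covariance with y.
        w = [_dot([row[j] for row in Xr], yr) for j in range(p)]
        norm = _dot(w, w) ** 0.5
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in Xr]  # scores
        tt = _dot(t, t)
        load = [_dot([row[j] for row in Xr], t) / tt for j in range(p)]
        q = _dot(yr, t) / tt              # regression of y on the scores
        # Deflate X and y before extracting the next component.
        Xr = [[row[j] - t[i] * load[j] for j in range(p)]
              for i, row in enumerate(Xr)]
        yr = [yr[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict the response for one descriptor row `x`."""
    x_mean, y_mean, components = model
    xr = [a - m for a, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in components:
        t = _dot(xr, w)
        y_hat += q * t
        xr = [a - t * b for a, b in zip(xr, load)]  # same deflation as in fit
    return y_hat


# Toy data with an exact linear relation y = 1 + 2*x1 - x2 (illustration only).
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 4.0]]
y = [1.0, 4.0, 2.0, 6.0, 7.0]
model = pls1_fit(X, y, n_components=2)
print(round(pls1_predict(model, [6.0, 2.0]), 6))
```

With as many components as independent descriptors, PLS1 reproduces the ordinary least-squares fit. In practice far fewer components than descriptors are kept, which is what makes the method usable when grid descriptors greatly outnumber compounds.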
Virtual screening (database searching): The generated model was used to screen 30,000 compounds from the Chembridge chemical library (Combiset) using DS v3.1. The Chembridge library (http://www.chembridge.com) is a collection of small drug-like compounds that is useful for computer-assisted screening (Wadood et al., 2014).
ADMET prediction: In silico ADMET study is an important step in computer-aided drug design (CADD). Herein, the ADMET properties of the screened ligands were studied and compared using the ADMET module of Discovery Studio 3.1. Properties such as human intestinal absorption, aqueous solubility, blood-brain barrier penetration, plasma protein binding, CYP2D6 binding and hepatotoxicity of the screened compounds were predicted. The TOPKAT toxicity prediction module of DS was used to investigate the carcinogenicity and other toxicity endpoints of the ligands.

Results and Discussion
Three-dimensional quantitative structure–activity relationship modeling is an efficient and pivotal step in computer-aided drug design. Eight dataset compounds, including their conformers, were considered in this study as presented in the […]. The development of regression models built from whole-molecule steric and electrostatic fields can be useful for predicting activity and for visualizing favorable and unfavorable interactions. In this study, we used the structures of 4-nerolidylcatechol and its derivatives to build a Partial Least-Squares (PLS) model with energy grids as descriptors.
The energy grids were computed using two probe types designed to measure electrostatic and steric effects. The model was validated by leave-one-out (LOO) cross-validation, which gave an internal predictive power of 0.769 in terms of the cross-validated correlation coefficient (q²). The component-wise 5-fold cross-validation results are presented in Table II. The model was then used to predict the activity of 30,000 Chembridge compounds of unknown activity. Physicochemical property calculation, ADME study and toxicity profiling were performed for the molecules that fitted the model. The physicochemical properties of a compound are very useful in drug discovery and development and can be computed from its two-dimensional structure. We identified ~1500 compounds that follow Lipinski's and Veber's drug-likeness rules and fit the grid-based QSAR model. Important properties such as the number of rotatable bonds, ClogP and AlogP, with optimum scores, were computed.
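The leave-one-out q² reported above is conventionally computed as 1 − PRESS/SS, where PRESS is the sum of squared errors of predictions made for each compound by a model fitted without it, and SS is the sum of squared deviations of the observed activities from their mean. The sketch below illustrates that calculation. It assumes this common definition, and it uses a single-descriptor least-squares line as a stand-in for the PLS model; the function names and data are hypothetical and do not reproduce the reported value.

```python
def loo_q2(X, y, fit, predict):
    """Leave-one-out cross-validated q2 = 1 - PRESS / SS."""
    press = 0.0
    for i in range(len(y)):
        # Refit the model with compound i held out, then predict it.
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(model, X[i])) ** 2
    y_mean = sum(y) / len(y)
    ss = sum((v - y_mean) ** 2 for v in y)
    return 1.0 - press / ss


def fit_line(X, y):
    """Stand-in model: least-squares line on the first descriptor."""
    n = len(y)
    xs = [row[0] for row in X]
    mx, my = sum(xs) / n, sum(y) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (v - my) for x, v in zip(xs, y)) / sxx
    return slope, my - slope * mx


def predict_line(model, x):
    slope, intercept = model
    return slope * x[0] + intercept


# Hypothetical descriptor and activity values, for illustration only.
X = [[1.0], [2.0], [3.0], [4.0], [5.0]]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(loo_q2(X, y, fit_line, predict_line))
```

Because `fit` and `predict` are passed in as functions, the same routine can wrap any regression model, including a PLS model with a fixed number of components.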
ADMET investigation is a crucial step in the drug discovery process. Discovery Studio 3.1 was employed for ADME prediction. Prebuilt, validated ADMET models were used to compute human intestinal absorption, aqueous solubility (solubility of each compound in water at 25°C), blood-brain barrier penetration, plasma protein binding, CYP2D6 binding (cytochrome P450 2D6 enzyme inhibition) and hepatotoxicity (dose-dependent human hepatotoxicity) of the screened compounds; compounds with favourable ADME characteristics are presented in Table III. In the current study, we identified seven compounds with good solubility (level 2-3; -6.0 < log(Sw) < -2.0) and a moderate blood-brain barrier penetration level. The CYP2D6, hepatotoxicity and plasma protein binding values of the screened compounds were found to be satisfactory.
Toxicity profiling of a compound is a vital step in the computational drug discovery process. Herein, the toxic and environmental effects of the screened compounds were predicted using the Toxicity Prediction by Komputer Assisted Technology (TOPKAT) module of Discovery Studio v3.1. TOPKAT uses robust, cross-validated Quantitative Structure Toxicity Relationship (QSTR) models to compute the toxicity level of new compounds. We predicted six non-carcinogenic compounds using the different TOPKAT modules, as presented in Table IV. These six compounds, namely CD10097348, CD10374166, CD11106591, CD75952907, CD76675709 and CD96875226 (Figure 2), also showed better in silico LD50 (g/kg body weight) and LC50 (mg/m³/h) values. Hence these lead compounds may be […]
In summary, the active antimalarial plant metabolite 4-nerolidylcatechol and its derivatives were employed to generate the PLS-based 3D QSAR model. The model identified six non-toxic database compounds, namely CD10097348, CD10374166, CD11106591, CD75952907, CD76675709 and CD96875226, as potential antimalarial leads.