Evolutionary analysis of galectins and identification of potential galectin-1 inhibitors: A computational approach

were retrieved from NCBI protein database. These sequences were aligned using multiple sequence alignment method using the ClustalW online tool. From the alignment Abstract Galectins are a family of structurally related carbohydrate-binding proteins and some galectins play a major role in initiation, progression and dissemination of different types of tumors. Multiple sequence alignment was performed for 15 types of galectins and phylogenetic tree was constructed for studying evolutionary relationship. Among galectins, galectin-1 contributes to various events associated with cancer biology including tumor transformation, cell cycle regulation and apoptosis. Hence a rational computational approach was followed for the design of new class of glycol-mimetic inhibitors with high affinity and stability. Ten N-39-triazole analogs have been used for molecular docking with galectin-1. Based on docking studies, hexaconazole is identified as a potential inhibitor of galectin-1 for the inhibition of the tumor activity. The binding mechanism of hexaconazole to galectin-1 in the dynamics system was studied by 10 ns Molecular Dynamics simulation. Thus, our study favors more insight on hexaconazole as a promising inhibitor for galectin-1.


Introduction
Galectin, a type of lectin, adheres the -galactoside.The galectins comprise of a -galactoside adhering lectins bearing a homologous carbohydrate recognition domains (CRDs) (Ajit et al., 2009).The CRDs of galectins consists relatively 130 amino acids, despite only a few residues within the CRDs precisely influencing the glycan ligands.So far 15 galectins have now been identified in mammals and among them 12 galectins are found in humans.The galectins are divided into three major groups.The prototypical galectins comprise of galectins-1, 2, 7, 10, 13, 14, which have only one CRD (Barondes et al., 1994).The tandem repeat galectins comprising the galectin-4, 8, 9, 12, include two CRDs and a single polypeptide linked by tiny peptide domain.
Galectin-1, a homodimer composed of subunits and each subunit folds as one compact globular domain (Leffler, 2001).Galectin-1 can function in both carbohydrate-dependent and independent manners either positive or negative depending on responder cell types or sub cellular localization (Rabinovich et al., 2002).The endogenous protein may function as a growth-promoting factor; exogenously added galectin-1 specifically suppresses tumor cell proliferation (Mercier et al., 2008).Galectin-1 induces cell growth inhibition, inhibits T-cell activation and promotes apoptosis of activated T cells (Levroney et al., 2005).Naturally occurring carbohydrate ligands for galectins have low affinities, too polar and possess low physiological stabilities (Giguere et al., 2006).Some N-39-triazole analogs provided high affinity enhancement and can be considered as possible inhibitors for galectin-1 (Hope et al., 2008).

Evolutionary analysis of galectins
The protein sequences of all the 15 types of galectins available were retrieved from NCBI protein database.These sequences were aligned using multiple sequence alignment method using the ClustalW online tool.From the alignment scores obtained from ClustalW, a phylogenetic tee has been constructed using a software tool, TreeView.The evolutionary relationship that exists between the different types of galectins, their existence and their diverse functions has been analyzed from the constructed phylogenetic tree.

Galectin-1 protein preparation
The crystal structure of galectin-1 was retrieved from Protein Data Bank (PDB) with ID: 1GZW (Lopez-Lucendo et al., 2004).The binding sites available in PDB for galectin-1 were used for the molecular docking analysis.Triazole refers to either one of a pair of isomeric chemical compounds having a molecular formula C2H3N3, having a five membered ring of two carbon atoms and three nitrogen atoms.The class of triazole consists of ten analogs and their structures were obtained from PubChem database.The ligand preparation was performed using PRODRG web server (Schuttelkopf et al., 2004).

Molecular docking
AutoDock 4.2 is used for molecular docking.AutoDock used the binding free energy evaluation to find the best binding mode.The docking energy was obtained from the van der Walls energy and hydrogen bonding energy together, while the binding energy is built up from van der Walls energy and desolvation energy (van Grotthuss et al., 2003).The binding strength and the location of ligand in most of the cases can be decided by the electrostatic interaction between ligands and receptor.The hydrophobic interaction obtained from the docking can affect the agonistic activity to a larger extent (Morris et al., 1998).

Molecular dynamics simulations
MD simulations were carried out using GROMACS 4.5 (Berendsen et al., 1995).The minimized structure of galectin-1 as well as the docked complex was employed in the MD simulation process.GROMOS 43a1 forcefield was used for complex MD simulations (van Gunsteren et al., 1996).The force field parameters of ligand were obtained from PRODRG web server.The aim of MD simulation was to get more precise receptor-inhibitor models in a state close to the natural conditions and to explore the binding modes of the ligands further.Although molecular docking offers reasonable binding structures for investigated ligands, the MD simulation can account for even the smallest variances.Eight Cl - counter ions were added by replacing water molecules to ensure the overall charge neutrality of the simulated system.Energy minimization process, position restraint procedure was performed in association with constant Number of particles, Volume and Temperature ensemble (NVT) and constant Number of particles, Pressure and temperature ensemble (NPT).An NVT ensemble was employed at constant temperature of 300 K with time duration of 100 ps.After stabilization of temperature an NPT ensemble was performed.In this phase a constant pressure of 1 bar was employed with a coupling constant of 5 ps with time duration of 100 ps.The coupling scheme of Berendsen was employed in both of NVT and NPT ensembles.And a final production mdrun was carried for 10 ns for the trajectory analysis.

Principal component analysis (PCA)
In order to identify the dominant motions of galectin-1, we have used principal component analysis method which takes the trajectory of a MD simulation and extracts the dominant modes involved in the motion of the molecule (Amadei et al., 1993).A covariance matrix is built using a simple linear transformation in Cartesian coordinate space and the diagonalization of the covariance matrix is carried out which generates a set of eigenvectors which gives a vectorial description of each component of the motion by indicating the direction of motion.Each eigenvector has a corresponding eigenvalue which explains the energetic contribution of each component to the motion (Mesentean et al., 2006).The protein regions which are responsible for the most relevant collective motions can be identified using this analysis.PCA is performed on galectin-1 using the inbuilt gromacs utilities g_covar and g_anaeig.

Analysis
All the visualizations were carried out using Pymol (Delano, 2009), VMD tools (Humphrey et al., 1996) and graphs were plotted using XMgrace Program (Turner, 2005).The trajectories were analyzed using the inbuilt tool in the GROMACS distribution.

Results and Discussion
The 15 types of galectins were categorized based on the source of the organism, distribution of the protein in various organs, molecular mass and specific functions (Table I).The multiple sequence alignment has been performed using ClustalW for the 42 sequences of different types of galectins from various species.A phylogenetic tree has been constructed using the Tree-View based on the alignment obtained from ClustalW.
From the phylogenetic tree (Figure 1), galectin clades had been analyzed.The phylogenetic tree consists of three clades.The representing galectin-3 sequence from Homo sapiens is used as the reference sequence which falls under clade-II.The clade-II consists of Galectin-1 class of proteins from various species which is observed to be diverged slightly based on the mutation.The clade-II is further classified into two sub classes of galectin family which is further classified into various branches comprising of galectins-1, 3, 7 and 12.The reference sequence found to be less diverged from clade -I which consists of galectins-2, 10, 13 and 15 and it is more distantly diverged from clade-III which consists of galectins-4, 5, 6, 8, 9, 11 and 14.The overall tree showed the current trend of molecular evolution of galectin family with minimal change among galectins under clade II and more changes in clade-I and III of galectin Phylogenetic tree.All the galectins from Clade-II exhibits a functional similarity also, all of these galectins-1, 3, 7, 12 play a vital role in the apoptosis activity (Ajit et al., 2009).The important observation was galectins-1, 3, 7, 12 which were closely related share a functional similarity, with all of them being vital in the apoptosis activity.
The 3D structure of Homo sapiens galectin-1 was downloaded from PDB database with the code 1GZW.The active sites already available for drug target galectin-1 in PDB were used for docking studies.The binding sites were critical in the binding of ligands with the drug target galectin-1.A molecular docking study was performed for a dataset of 10 triazole analogues with the drug target galectin-1.Based on the lowest binding energy obtained for each ligand, the top inhibitors were ranked.The binding free energy of ten screened inhibitors scored by AutoDock ranges between -3 to -8 Kcal/ mol.The results obtained for all ten compounds are similar having a consistent hydrophobic cavity in all  Hexaconazole accounted for seven H-bonds all with side-chain atoms of galectin-1, with Asn46 contributing for three and Arg48, Ser29, His44, His52 forming one each (Figure 2 and 3).Fluconazole formed four H-bonds with all of them formed with the side chain atoms of galectin-1, with Asp54 accounting for two and Arg48, His52 forming one each respectively.Itracona-zole formed five H-bonds with Asn46 contributing to two and Arg48, Arg73 and Glu71 contributing for one each and all of them formed with the side chain atoms of the protein.Epoxiconazole contributed for four H-bonds one each with the side chain atoms of Arg48, Asn46, His44 and Glu71.Posaconazole formed three H-bonds with Arg48, Arg73 and Glu71 contributing for one each and all of them formed with the side chain atoms of the protein.

Structural and functional properties of various types of galectins
In case of voriconazole, two H-bonds were formed one each with the sidechain of Arg48 and Asp125.Propiconazole formed three H-bonds one each with Glu71, Arg48 and Asn46 having all of them formed with the side chain atoms of the protein.Terconazole is involved in the formation of two H-bonds with Arg48 and His44 contributing for one each respectively with their side chain atoms.Isavuconazole formed a lone H-bond with the side chain atom of His44.Albaconazole also formed a single H-bond with the side chain atom of Arg48.The Docking analysis of the ten complexes shows that majority of the ligands binds in a similar fashion with small variations.Nine out of the ten compounds showed interactions with the electrically charged sidechain atoms of Arg48 which is a key residue in the binding pocket.In cases of hexaconazole, itraconazole, epoxiconazole, propiconazole complexes it was observed that the residue Asn46 is interacting with the azole ring of the ligands (Table II).
The inhibitor complex of galectin-1 and hexaconazole obtained from docking analysis was used for molecular dynamics simulations for further analysis of binding mode and the effects of the binding of ligand on the conformation of protein.The trajectories of the complex were obtained for a period of 10 ns.The RMSD of the protein showed an initial rise for a period of 3 ns, commonly attributed for the breakdown from the crysta -llographic structure and after that it remained stable for the rest of the simulation period (Figure 4A) (Levitt et al., 1988).The total energy of the complex has been stable throughout the period of 10 ns (Figure 4B).
The H-bonds between the ligand and the protein were quite stable during the course of the simulation which represents a stable binding pocket (Figure 5).The seventh H-bond formed with His52 did not appear throughout the simulation which might have occurred due to the changes in the positioning of the residues in the binding pocket.
The dynamical mechanical properties of the simulated system can be analyzed using PCA.A covariance matrix for the complete trajectory of simulation during a period of 10 ns has been built.The analysis of the eigen vectors from the PCA showed that first seven eigen vectors were responsible for more than 85% of the total motion in the model.The dynamics of a protein can be best achieved by its behavior in phase space.The projection of the trajectories at 300 K onto the first two principal components (PC1 and PC2), illustrates the motion of the protein in phase space (Kundu et al., 2009).Analysis of these projections shows the clusters of stable states for the protein.The tenth largest eigen value is found to be less than one-tenth of the first largest eigen value.The first principal component dominates the motion of the protein (Figure 6A).A Gibbs free energy landscape is constructed over the first and second largest principal components which gives a clear description of the folding dynamics of the protein (Figure 6B) (Papaleo et al., 2009;Maisuradze et al., 2012).

Conclusion
The overall evolutionary analysis revealed the current trend of molecular evolution of galectin family shared functional similarity with minimal change among galectins.From the molecular docking and molecular dynamics simulations studies, the potential binding mode of hexaconazole to the galectin-1 with stability was revealed and can be used as a futuristic lead compound for inhibition of galectin-1.

Figure 1 :
Figure 1: Phylogentic tree of galectin family from TreeView

Figure 2 :Figure 5 :
Figure 2: H-bond interaction pattern of top inhibitor hexaconazole with galectin-1.Interactions were visualized in Rasmol with black dotted lines and bond length