Computational analysis of CDH1 missense mutations in the cause of hereditary diffuse gastric cancer

Abstract
In the present study, we computationally identified germline missense mutations in the E-cadherin (CDH1) gene that cause hereditary diffuse gastric cancer (HDGC). The analysis began with SIFT, followed by the PolyPhen and I-Mutant2.0 programs, using 68 CDH1 variants retrieved from dbSNP. The analysis indicates that 10 variants (P201R, A298T, E336D, C695R, N751K, Y755C, D768N, G879S, D882N and R169H) were commonly predicted to be less stable and damaging by SIFT, PolyPhen and I-Mutant2.0. Furthermore, SNPs&GO was used to predict disease-related mutations from the protein sequence. Finally, the affinity of cetuximab for the CDH1 variants was examined using a molecular docking algorithm. The result showed that P201R, A298T,


Introduction
E-cadherin (CDH1) is a cell-cell adhesion molecule localized in adherens junctions. Its functions involve polarity, cell differentiation, tissue integrity and the regulation of signal transduction pathways (Qian et al., 2004). The extracellular portion of the protein mediates homophilic cellular interactions, and the intracellular part provides a link to the actin cytoskeleton through catenins, multifunctional proteins associated with CDH1 (Keller et al., 1999). Loss of function or expression of E-cadherin increases the invasion and metastasis of tumors, and CDH1 has been called a "suppressor of invasion" gene. This loss involves dysfunction of cell-cell adhesion, loss of tissue integrity, morphological changes, loss of heterozygosity (LOH) and increased proliferation. CDH1 has been mapped to chromosome 16q22.1 (Pećina, 2003). The available literature suggests that missense mutations in the CDH1 gene may cause gastric, breast, colorectal, thyroid, endometrial and ovarian cancers (Berx et al., 1998). Among the cancer types caused by missense mutations in the CDH1 gene, gastric cancer is the most predominant (Corso et al., 2012). Gastric cancer is the fourth most common cancer and the second leading cause of cancer death worldwide (Zhang et al., 2006). Germline missense mutations of E-cadherin that result in E-cadherin inactivation have been identified as being of prime importance in hereditary diffuse gastric cancer (HDGC) (Kim et al., 2000). The majority of families with autosomal dominant gastric cancer susceptibility have HDGC (Brooks et al., 2004; Kaurah et al., 2007). E-cadherin deficiency provides an obvious explanation for the diffuse, scattered growth of HDGC tumours, as the protein is the central component of epithelial cell-to-cell adhesion junctions and as such is required for the integrity of epithelial layers (More et al., 2007). In gastric cancer, a series of trials have produced evidence that chemotherapy increases survival.
Many drugs are available for chemotherapy (Nishiyama and Wada, 2009; Graziano et al., 2004; Zhang et al., 2013). However, cetuximab has proved to be an effective drug for the treatment of HDGC (Cunningham et al., 2004). E-cadherin expression increases the sensitivity to cetuximab in gastric cancer cell lines. Cetuximab monotherapy has an improved treatment outcome compared with other chemotherapy drugs (Heindl et al., 2012). By stimulating an immune-system-mediated anti-tumour response, cetuximab inhibits cancer cell proliferation, angiogenic growth factor production, tumor-induced angiogenesis and cancer cell invasion. In gastric cancer, the target of cetuximab is overexpressed in the tumour cells (Lordick et al., 2010). The literature indicates that mutations in E-cadherin lead to improper binding of cetuximab and hence to cetuximab resistance. Therefore, monitoring cetuximab resistance is a key area in the treatment of HDGC, and would be helpful for the development of long-acting drug molecules. Hence, in the present study, we identified detrimental missense mutations in E-cadherin using different genomic algorithms. Subsequently, the sensitivity of the mutated proteins to cetuximab was examined by docking analysis.

Materials and Methods
The SNPs (Single Nucleotide Polymorphisms) and their associated information for the CDH1 gene were obtained from the dbSNP database (http://www.ncbi.nlm.nih.gov/SNP/) for our computational analysis. The protein sequence corresponding to E-cadherin was obtained from UniProt (http://www.uniprot.org/). The 3D structure of the CDH1 protein was obtained from the Swissmodel workspace (Arnold et al., 2006) (http://swissmodel.expasy.org/). The mutant structures were generated using the SwissPDB viewer package. The 3D structure of cetuximab was retrieved from the Protein Data Bank (Berman et al., 2000) (http://www.rcsb.org/) for the molecular docking analysis.

Investigations of structural and functional consequences of coding nsSNPs by computational analysis:
An SNP occurring in the protein coding region can have deleterious consequences for the 3D structure of the protein and may hence be associated with disease. In the present study, we used the genomic tools SIFT (Ng and Henikoff, 2003), PolyPhen-2 (Ramensky et al., 2002), I-Mutant2.0 (Capriotti et al., 2005) and SNPs&GO (Calabrese et al., 2009) to detect the deleterious coding nsSNPs, and FireDock (Mashiach et al., 2008) to calculate the binding free energy.

Tolerance analysis of missense mutations by SIFT:
SIFT (Sorting Intolerant From Tolerant) is a sequence homology based tool available at http://www.blocks.fhcrc.org/sift/SIFT.html. It presumes that important amino acids will be conserved in the protein family, so changes at well conserved positions tend to be predicted as deleterious. Queries can be submitted in the form of SNP IDs or as protein sequences. SIFT takes a query sequence and uses multiple alignment information to predict tolerated and deleterious substitutions for every position of the query sequence. It is a multistep procedure that, given a protein sequence, i) searches for similar sequences, ii) chooses closely related sequences that may share similar functions, iii) obtains the multiple alignment of the chosen sequences, and iv) calculates normalized probabilities for all possible substitutions at each position from the alignment. Substitutions with normalized probabilities less than a chosen cutoff are predicted to be deleterious, and those greater than or equal to the cutoff are predicted to be tolerated (Ng and Henikoff, 2003). The cutoff value in the SIFT program is a tolerance index of 0.05: substitutions with a tolerance index of ≥0.05 are predicted to be tolerated. The higher the tolerance index, the less functional impact a particular amino acid substitution is likely to have.
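The cutoff rule described above can be sketched in a few lines. This is only an illustration of the decision rule as stated in the text, not the SIFT program itself; the function name `sift_call` and the example scores are hypothetical and are not values from Table I.

```python
SIFT_CUTOFF = 0.05  # tolerance-index cutoff quoted in the text


def sift_call(tolerance_index, cutoff=SIFT_CUTOFF):
    """Label a substitution from its SIFT tolerance index.

    Scores below the cutoff are called deleterious; scores greater than
    or equal to the cutoff are called tolerated (rule as stated above).
    """
    if not 0.0 <= tolerance_index <= 1.0:
        raise ValueError("tolerance index must lie in [0, 1]")
    return "deleterious" if tolerance_index < cutoff else "tolerated"


# Hypothetical scores for illustration only (not taken from Table I).
example_scores = {"variant_A": 0.00, "variant_B": 0.04,
                  "variant_C": 0.05, "variant_D": 0.62}
calls = {name: sift_call(score) for name, score in example_scores.items()}
```

With the strict `<` comparison a score exactly at the cutoff is called tolerated; an analysis that counts a score of 0.05 as deleterious would use `<=` instead.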

Prediction by PolyPhen-2:
Structural-level analysis of coding nsSNPs is considered very important for understanding the functional activity of the protein. In the present study, structural-level analysis was performed with the aid of PolyPhen-2 (Ramensky et al., 2002), which is available at http://coot.embl.de/Polyphen/. Input options for the PolyPhen-2 program are a protein sequence or accession number together with the sequence position and the two amino acid variants. We submitted the query in the form of a protein sequence with the mutational position and two amino acid variants. Sequence-based characterization of the substitution site, profile analysis of homologous sequences, and mapping of the substitution site to a known protein three-dimensional structure are the parameters taken into account by the PolyPhen-2 program to calculate the score. It calculates PSIC scores for each of the two variants and then computes the difference between them. The higher the PSIC score difference, the higher the possible functional impact of a particular amino acid substitution.

Stability analysis with I-Mutant2.0:
I-Mutant2.0 is a support vector machine (SVM) based tool for the automatic prediction of protein stability changes caused by single point mutations. Predictions can be performed starting either from the protein structure or, more importantly, from the protein sequence (Capriotti et al., 2005). The output files show the predicted free energy change value (ΔΔG), which is calculated as the unfolding Gibbs free energy value of the mutated protein minus the unfolding Gibbs free energy value of the native protein (kcal/mol). Positive ΔΔG values mean that the mutated protein has higher stability, and negative values indicate lower stability.
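As a rough sketch of the scoring step described above, the PSIC score difference can be compared against a damaging threshold. The 1.5 threshold is the one this study applies in its Results; taking the absolute value of the difference is an assumption made here, and the function names and the "not damaging" label are hypothetical rather than PolyPhen output categories.

```python
PSIC_DAMAGING_THRESHOLD = 1.5  # threshold applied in this study's Results


def psic_difference(psic_wild_type, psic_variant):
    """Absolute difference between the PSIC scores of the two variants.

    Using the absolute value is an assumption of this sketch.
    """
    return abs(psic_wild_type - psic_variant)


def polyphen_call(psic_wild_type, psic_variant,
                  threshold=PSIC_DAMAGING_THRESHOLD):
    """Label a substitution as damaging when the difference reaches the threshold."""
    diff = psic_difference(psic_wild_type, psic_variant)
    return "damaging" if diff >= threshold else "not damaging"
```

For example, hypothetical PSIC scores of 2.0 and 0.5 give a difference of 1.5 and are labelled damaging, while scores of 1.0 and 0.25 give 0.75 and are not.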
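The sign convention for ΔΔG described above can be written out explicitly. This is a minimal sketch of the definition only; I-Mutant2.0 predicts ΔΔG with an SVM rather than from measured unfolding energies, and the function names here are hypothetical.

```python
def ddg(dg_unfold_mutant, dg_unfold_native):
    """ddG = unfolding free energy of the mutant minus that of the native (kcal/mol)."""
    return dg_unfold_mutant - dg_unfold_native


def stability_call(ddg_value):
    """Interpret the sign of ddG following the convention stated in the text."""
    if ddg_value > 0:
        return "more stable"
    if ddg_value < 0:
        return "less stable"
    return "unchanged"
```

For instance, a hypothetical mutant with an unfolding free energy of 3.0 kcal/mol against a native value of 5.0 kcal/mol gives a ΔΔG of -2.0 kcal/mol and is labelled less stable.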

Prediction of disease related mutations using SNPs & GO:
Furthermore, we used SNPs&GO, a support vector machine (SVM) based method that predicts disease-related mutations from protein sequences with an overall accuracy of 82% and a Matthews correlation coefficient of 0.63 (Calabrese et al., 2009). The input is the FASTA sequence of the whole protein, and the output is a prediction that discriminates between disease-related and neutral variations of the protein sequence. An RI (Reliability Index) higher than 5 indicates a disease-related effect of the mutation on the function of the parent protein.

Homology modelling and RMSD analysis:
The sequence of the human E-cadherin protein was retrieved from Swiss-Prot (http://www.expasy.ch/sprot/). A BLAST (http://www.ncbi.nlm.nih.gov/blast/) sequence analysis was then performed against the whole PDB to select a template for generating the model of E-cadherin. Subsequently, the three-dimensional model of E-cadherin was generated with the homology modelling software of the Swissmodel workspace (http://swissmodel.expasy.org/). The mutated model structures were generated by means of SwissPDB viewer. We used the conjugate gradient method to optimize the 3D structures. The deviation between two structures was evaluated by Root Mean Square Deviation (RMSD) analysis.
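The RMSD between two structures can be illustrated with a short function. This sketch assumes the two coordinate lists contain the same atoms in the same order and have already been superimposed; it does not perform the superposition itself, and it is not the implementation used by SwissPDB viewer or PyMol. The function name `rmsd` and the input layout are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally sized lists of (x, y, z).

    Assumes the structures are already superimposed and atoms are paired
    by position in the lists.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

Identical structures give an RMSD of 0; the larger the value, the further the paired atoms have moved apart in 3D space.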

Results and Discussion
The mutations were independently submitted to the SIFT program to check their tolerance index (Ng and Henikoff, 2003). Among the 68 variants, 24 variants were found to be deleterious, with a tolerance index score of ≤0.05. The result is shown in Table I. We observed that, of these 24 variants, eight had the most deleterious tolerance index score of 0, six had a score of 0.01, four had a score of 0.02, one had a score of 0.03, four had a score of 0.04 and one had a score of 0.05.
The protein sequence with the mutational position and amino acid variants for each of the 68 single point mutants used in this work was submitted as input to the PolyPhen program (Ramensky et al., 2002), and the results are shown in Table I. A PSIC score difference of 1.5 and above was considered to be damaging. Of the 68 variants, 23 were considered to be damaging by the PolyPhen program. Interestingly, 13 of the variants considered damaging by PolyPhen, namely R162W, R169H, N674Y, E336D, A298T, P201R, C695R, G879S, D882N, Y755C, D768N, I393N and N751K, were also deleterious according to the SIFT program.
To probe this behaviour further, we used the I-Mutant 2.0 program, which predicts the change in stability of the protein structure by means of the ΔΔG value. Of the 68 variants, 53 were found to be less stable by the I-Mutant 2.0 program (Capriotti et al., 2005), as shown in Table I. It is interesting to observe that 5 variants showed a ΔΔG value of -3.0 kcal/mol, another 7 variants showed a ΔΔG value of -2.0 kcal/mol, another 16 variants showed a ΔΔG value of -1.0 kcal/mol, and the remaining 25 variants showed a ΔΔG value of < -1.0 kcal/mol, as shown in Table I. Of the 53 variants that showed a negative ΔΔG, 4 variants, namely E864K, E702K, E880K and E410K, changed from a negatively charged to a positively charged amino acid; 10 variants, P597S, G879S, A709S, M177T, A80T, A298T, A617T, P30T, A592T and I393N, changed from a non-polar to a polar amino acid; 4 variants, N666H, C306R, N751K and C695R, changed from a polar to a positively charged amino acid; 6 variants, T340M, T340A, T506A, T211P, T395A and S838G, changed from a polar to a non-polar amino acid. S543F and N674Y changed from a polar to an aromatic amino acid. 2 variants, K184I and K69T, changed from a positively charged to a non-polar amino acid; R29Q, R224H, K182N, R224C and K381N changed from a positively charged to a polar amino acid; R162W and R28W changed from a positively charged to an aromatic amino acid. 6 variants, D805N, D882N, D443N, D768N, D676N and D72N, changed from a negatively charged to a polar amino acid; W483S and Y755C changed from an aromatic to a polar amino acid. Finally, the variants A401D, P201R and V574F changed from a non-polar to a negatively charged, positively charged and aromatic amino acid, respectively.
It should also be noted that the variants M203V, M282I, A408V, A634V, A788V, V242I, V153I, V392I, V132I, V473I, L711V, L630V, L478P, I535V, V138M and V832M retained a non-polar amino acid, N155S and N622S retained a polar amino acid, E336D and D498E retained a negatively charged amino acid, and R169H and R124H retained a positively charged amino acid, yet were found to be less stable by I-Mutant 2.0. Most importantly, 18 variants that were considered to be damaging by the PolyPhen program were also predicted to be deleterious by the I-Mutant 2.0 program. This shows that preserving the physico-chemical properties of an amino acid does not necessarily result in a harmless mutation. Indeed, considering amino acid substitutions on the basis of physico-chemical properties alone may fail to identify a detrimental effect, whereas considering sequence conservation along with these properties is more advantageous and more reliable for finding the detrimental effects of missense mutations (Teng et al., 2009).
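The grouping of variants by change in amino acid property can be reproduced with a small lookup. This is a hedged sketch: the five classes follow the categories used in the text, but the assignment of each residue to a class is an assumption of this example (in particular, histidine is treated here as positively charged, and the text is not fully consistent on this point). The names `RESIDUE_GROUPS` and `class_change` are hypothetical.

```python
import re

# Assumed residue-to-class assignment; histidine (H) is treated as
# positively charged in this sketch.
RESIDUE_GROUPS = {
    "non-polar": "AVLIMPG",
    "polar": "STNQC",
    "positively charged": "KRH",
    "negatively charged": "DE",
    "aromatic": "FWY",
}
RESIDUE_CLASS = {aa: group
                 for group, residues in RESIDUE_GROUPS.items()
                 for aa in residues}

# A variant name is: wild-type residue, sequence position, mutant residue.
_VARIANT = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def class_change(variant):
    """Return (wild-type class, mutant class) for a variant such as 'P201R'."""
    match = _VARIANT.match(variant)
    if match is None:
        raise ValueError(f"not a valid variant name: {variant!r}")
    wild_type, _position, mutant = match.groups()
    return RESIDUE_CLASS[wild_type], RESIDUE_CLASS[mutant]
```

Under these assumptions, `class_change("P201R")` returns `("non-polar", "positively charged")`, and `class_change("E336D")` returns the same class twice, which corresponds to a variant that retains its physico-chemical property.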
To predict human disease-related single point protein mutations, we used the SNPs&GO program (Calabrese et al., 2009), which predicts whether a particular variant is disease related or neutral. Among the 10 detrimental missense mutations, 4 variants, namely E336D, A298T, P201R and R169H, were predicted to be disease related and the remaining 6 variants were predicted to be neutral by the SNPs&GO program. The result is shown in Table II.
We observed that, of the 10 variants, 4 had an RI of >5, indicating a disease-related effect, and the remaining 6 had an RI of <5, indicating a relatively neutral effect.
The structure of CDH1 was generated by means of the Swissmodel program, and the four detrimental mutant structures (E336D, A298T, P201R and R169H) were generated with SwissPDB viewer. The PyMol view of the modelled structures of E-cadherin is shown in Figure 1.
To determine the deviation between the structures, we superimposed the energy-refined native structure on each of the energy-refined mutant structures to obtain the RMSD. The higher the RMSD value, the greater the deviation between the native and the mutant structure, which in turn changes the binding efficiency with inhibitors because of the deviation in 3D space of the binding residues of the CDH1 protein. Table III shows the RMSD between the native structure and each of the modelled mutant structures. The values are 0.015 Å, 0.105 Å, 3.617 Å and 0.028 Å for the E336D, A298T, P201R and R169H structures, respectively.
Finally, molecular docking studies were performed to confirm the functional impact of the amino acid mutations. The cetuximab structure (PDB ID: 1YY8) was retrieved from the PDB and docked with the native and mutant (E336D, A298T, P201R and R169H) structures of E-cadherin to assess the binding affinity. Docking was performed using the FireDock program (Mashiach et al., 2008). The result is shown in Figure 2. The analysis indicates that the affinity of cetuximab for native CDH1 was -52.31 kcal/mol, whereas with the mutants the ΔG was in the range of -29.96 to -45 kcal/mol. As can be seen from Figure 2, the mutants established a lower binding affinity with cetuximab than the native protein. These data indicate that mutation in the E-cadherin structure leads to resistance to cetuximab, and provide evidence of the deleterious effect of missense mutations such as E336D, A298T, P201R and R169H. Hence, we conclude that these variants should also be considered in the design of drugs for the treatment of HDGC.
The CDH1 mutations E336D and A298T were shown to have a more deleterious effect on the structural stability and function of E-cadherin. In this work, we also found several other drug-resistant mutations by the computational approach. We believe that our observations have critical implications for the understanding of CDH1-associated missense mutations and for the development of novel therapies for this disease.