Predicting Rheumatoid Arthritis Associated Significant Amino-Acid Residues Using Residue-Residue Interaction Analysis

Rheumatoid arthritis (RA) is a prevalent autoimmune and inflammatory disease. Although a large body of research is available, its etiology and drug targets remain unclear. A bottom-up approach can add to existing knowledge about RA. One way to better understand the disease mechanism and its drug targets is a detailed analysis of residue-residue interactions in the proteins involved in RA. In the current work, we studied the significant proteins reported in the Indian population that are involved in RA progression and represented each of them as a complex network of amino acid residues to understand the significance of individual residues in the network. We applied a graph theory approach to identify centrally important residues based on the topological properties of the network. This approach offers a more precise way to identify potential drug targets. Our results identified leucine, phenylalanine, tyrosine, and tryptophan as essential nodes in the networks; their activity was mainly connected with the immune system. Understanding the function of these amino acids in CTLA4, CD40, IRF5, IL2RB, and TRAF1 could lead to new treatment options in the fight against rheumatoid arthritis.


Introduction
Rheumatoid arthritis (RA) is an inflammatory condition in which the body's immune system attacks its own tissues and joints, causing inflammation and swelling. If left untreated for a long time, it can lead to severe illness, joint deterioration, and impairment, can affect internal organs, and can cause premature death. Understanding the etiology of RA is challenging because it involves a series of immune reactions in which T-cells, B-cells, macrophages, cytokines, interleukins, tumor necrosis factors, and various other mediators, some still unknown, migrate to synovial tissues and initiate an immune response that ultimately produces cartilage damage.
RA affects approximately 0.5% to 1% of the world's population and affects more women than men 1. In India, ~0.75% of the population is affected. Although RA has been extensively studied, effective drug targets are still unknown. The current objective of RA treatment is to suppress inflammation and pain and to prevent bone destruction. The symptoms of RA are similar to those of other conditions such as rheumatic fever, arthralgia, and gouty arthritis, all of which have a high rate of morbidity and disability, so it is important to distinguish between them 2,3,4. Both doctors and patients can be misled by the disparities in the diagnostic capabilities and treatment options available to them 5. It is critical to understand the disease at the molecular level, specifically at the amino acid level, in order to provide correct diagnosis and therapy options. In general, non-steroidal anti-inflammatory drugs (NSAIDs), disease-modifying antirheumatic drugs (DMARDs), corticosteroids, and biologic agents are the drugs most commonly used for RA. NSAIDs and corticosteroids do not prevent RA progression, while DMARDs may lower the rate of RA progression; however, their mechanism is still not fully understood and, as a consequence, DMARD dosage and efficacy are still under investigation 6. Given the rising number of RA cases and the side effects of available drugs, it is important to identify potential drug targets. Since RA involves many signaling pathways and molecules, a bottom-up approach has more potential to explain disease etiology and drug targets 7. RA originates in the malfunction of a molecule, which ultimately leads to a series of over-expressed or under-expressed cells. A thorough understanding of the proteins involved can therefore help describe the drug-target relationship for RA.
Literature studies have suggested several factors that cause RA, one of which is Porphyromonas gingivalis (P. gingivalis), a gram-negative, anaerobic bacterium. P. gingivalis is found in the oral cavity and is a major periodontitis-related bacterium. One of the known causes of RA is the citrullination of host proteins by protein arginine deiminases (PADs) 8, in which peptidyl-arginine is converted to peptidyl-citrulline. The citrullinated protein is recognized as foreign by the immune system, which attacks it. P. gingivalis produces proteinases that cleave proteins at arginine and lysine residues. These proteinases have many adverse effects on the host, such as reducing the bactericidal effect and weakening host immune responses, which suggests that the bacterium may be an RA-causing agent and that its proteinases may be good drug target candidates 9,10. TNFα (tumor necrosis factor alpha) is one of the links connecting RA and periodontitis. When P. gingivalis attacks the host, the host releases TNFα, which activates various pathways such as the cytokine cascade, B-cell and IL-1 activation, NF-κB (nuclear factor-κB), and JNK (c-Jun N-terminal kinases). TNFα over-expression destroys bone and cartilage, so controlling TNFα activity is a significant approach in managing RA pathogenesis. TRAF1 (TNF receptor-associated factor 1) negatively regulates TNFα activity 11,12. Citrullination affects TRAF1 activity and may contribute to RA development, so TRAF1 may serve as a possible therapeutic target 13. Because the entire immune response can change with a change in a single amino acid residue, a bottom-up approach is appropriate.
In most drug-target interactions, binding at the active site or binding site of the target protein leads to conformational and functional modifications. The residues in the protein often influence the properties of these sites 14. It is well known that a single residue alteration can lead to an entirely different protein structure and function. There is little information on how the amino acid sequence forms and retains a 3D structure, so this approach provides insight into residues that could be more effective drug targets. Amino acids seldom work in isolation; rather, they act as part of a complex network within a protein 15. The protein's three-dimensional structure can be represented and analyzed using graph theory as a network of amino acid interactions, where each amino acid is a node and the physicochemical bonds between amino acids are represented as edges.
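The node-and-edge representation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the residue labels (chain:position:residue) and the contact list are hypothetical and are not taken from any of the studied structures, and the function names are ours.

```python
from collections import defaultdict

# Hypothetical residue contacts (chain:position:residue); illustration only,
# not taken from the proteins analysed in this paper.
EDGES = [
    ("A:10:LEU", "A:14:PHE"),
    ("A:10:LEU", "A:33:TYR"),
    ("A:10:LEU", "A:52:VAL"),
    ("A:14:PHE", "A:33:TYR"),
    ("A:52:VAL", "A:60:SER"),
]


def build_residue_graph(edges):
    """Return an undirected adjacency map: residue -> set of neighbouring residues."""
    graph = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-contacts
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def node_degree(graph):
    """Number of distinct neighbours for every residue (node degree)."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


graph = build_residue_graph(EDGES)
degree = node_degree(graph)
```

Using a set per node means that several interaction types between the same pair of residues count as a single neighbour, which is one possible convention and may differ from how a given tool reports degree.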
In an amino acid network, essential residues can be identified by analyzing topological properties such as centrality, node degree distribution, shortest path, closeness centrality, clustering coefficient, and neighborhood connectivity. Residue networks are dense, and a small number of highly connected nodes act as communication centers with other subnetworks 13,16,17,18,19,20. Central residues may play an important role in protein function, stability, and transmission of information 21,22,23.
Strongly connected nodes remain close together, while lightly connected or unconnected nodes remain isolated; hence, the highest-degree node plays a central role 24,25. Node degree distribution can reveal locally important residues, but centrality parameters must also be explored to find globally important residues 26,27. The residue-residue networks were therefore analyzed with the Cytoscape NetworkAnalyzer plug-in. NetworkAnalyzer computes various network properties such as density, closeness, betweenness, clustering coefficient, neighborhood connectivity, and more. In our study, we focus on degree centrality (DC), closeness centrality (CC), and betweenness centrality (BC) 28. Degree centrality shows how many interactions a node has. More interactions mean the node is important because it is connected to many other nodes or plays an important role in holding the network together 29. Closeness centrality emphasizes the importance of a node in communication. A high closeness centrality value for a node n strongly suggests that n can easily transmit information and is an essential central residue. Betweenness centrality describes how important a node is for communication between other nodes; thus a node with high betweenness is a crucial residue for the survival of the whole network. Progression in RA involves many signaling pathways, and the residues on these signaling paths should have high centrality values, which reflects their key role in cell communication and signal transduction 30.
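To make the three centrality parameters concrete, the following is a minimal pure-Python sketch for an unweighted, undirected residue network. It is not the NetworkAnalyzer implementation. Closeness is taken here as the reciprocal of the mean shortest-path length to reachable nodes, and betweenness is computed with Brandes' algorithm and divided by (N-1)(N-2); both conventions are assumptions and may differ from the plug-in's output. The star-shaped demo network and its residue labels are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def degree_centrality(graph):
    """Number of neighbours of each node."""
    return {v: len(nbrs) for v, nbrs in graph.items()}


def closeness_centrality(graph):
    """Reciprocal of the mean shortest-path length to reachable nodes."""
    result = {}
    for v in graph:
        dist = bfs_distances(graph, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness_centrality(graph):
    """Brandes' algorithm; values normalised to the range 0..1."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack, dist = [], {s: 0}
        preds = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0.0)  # number of shortest paths from s
        sigma[s] = 1.0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(graph)
    norm = (n - 1) * (n - 2)  # each unordered pair is counted twice above
    return {v: (b / norm if norm > 0 else 0.0) for v, b in bc.items()}


# Hypothetical star-shaped network: one hub residue with three neighbours.
demo = {
    "LEU1": {"PHE2", "TYR3", "VAL4"},
    "PHE2": {"LEU1"},
    "TYR3": {"LEU1"},
    "VAL4": {"LEU1"},
}
dc = degree_centrality(demo)
cc = closeness_centrality(demo)
bc = betweenness_centrality(demo)
```

In this toy network the hub residue scores highest on all three measures, which is the pattern the text associates with a central residue; in real residue networks the three rankings need not agree.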
In this study, five RA-causing proteins that are significantly involved in immunosuppression, inflammation, and signaling processes in the Indian population were taken from the RAvariome database. A residue-residue interaction (RRI) network study was conducted, and significant amino acids were identified using the topological properties of graph theory. The RING web server (URL: http://protein.bio.unipd.it/ring) was used to convert each protein structure into an amino acid interaction network 31, and Cytoscape was used to visualize and analyze the network. Each protein was studied as an interactome of residues. The NetworkAnalyzer plug-in 32 was used to study the topological parameters of the residue interactome, explaining the importance of each residue within the interactome.

Material and method
Data were retrieved from the RAvariome database 33, which includes information on genetic variants for rheumatoid arthritis based on nationality. A complete list of genetic variants for RA reported in the Indian population was compiled. Then, through comprehensive literature mining and GeneCards 34 analysis, only those proteins that are encoded by unique genes and have Protein Data Bank (PDB) 35 structures were selected for further analysis. Hence, for the current study, cytotoxic T-lymphocyte-associated protein 4 (CTLA4), cluster of differentiation 40 (CD40), interleukin-2 receptor subunit beta (IL2RB), interferon regulatory factor 5 (IRF5), and tumor necrosis factor receptor (TNFR)-associated factor 1 (TRAF1) were considered; they are listed in Table 1. Each protein was represented as a residue interaction network in which each residue is a node (N) and each interaction between nodes is an edge (E). Since each protein is represented as a complex network, the importance of each node is represented by the node degree distribution, which counts the total number of connections that a node has to other nodes in the network.
The individual protein networks were visualized using Cytoscape version 3.0.1 37, and the centrality parameters node degree (ND), closeness centrality (CC), and betweenness centrality (BC) were computed using the NetworkAnalyzer 38 plug-in of Cytoscape. Based on the node degree distribution from RING and the centrality parameters from NetworkAnalyzer, topologically significant residues were chosen to identify important amino acid residues 17,18,39. The resulting residues were further validated through systematic literature mining. The steps involved in the methodology are represented schematically in figure 1.
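The ranking step, in which residues are ordered by node degree and the top entries are kept, might look like the sketch below. The tie-breaking by closeness and then betweenness is our assumption, since the text only states that residues are sorted by degree. The residue names and values are hypothetical and are not taken from Tables 2 to 7.

```python
from typing import Dict, List, NamedTuple


class Centrality(NamedTuple):
    degree: int
    closeness: float
    betweenness: float


def rank_residues(metrics: Dict[str, Centrality], top_n: int = 10) -> List[str]:
    """Sort residues by degree (descending); ties are broken by closeness,
    then betweenness, then residue name (tie-breaking is an assumption)."""
    ordered = sorted(
        metrics,
        key=lambda r: (
            -metrics[r].degree,
            -metrics[r].closeness,
            -metrics[r].betweenness,
            r,
        ),
    )
    return ordered[:top_n]


# Hypothetical values for illustration only.
example = {
    "LEU39": Centrality(17, 0.41, 0.08),
    "TYR25": Centrality(12, 0.39, 0.05),
    "CYS94": Centrality(12, 0.35, 0.06),
    "SER20": Centrality(6, 0.30, 0.01),
}
top = rank_residues(example, top_n=3)
```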

Results and discussion
Altered CTLA4, CD40, IL2RB, IRF5, and TRAF1 genes are associated with RA pathogenesis. A thorough study of the amino acid residues in each protein provided new insight into the detection of drug targets. To find statistically and functionally important amino acid residues, an interactome was created for each protein using the RING web server (URL: http://protein.bio.unipd.it/ring). Amino acid residues were then selected based on the RING node degree distribution and high centrality values from NetworkAnalyzer. The node degree cut-off value was set to nine, i.e. residues with a node degree greater than or equal to nine were considered in the RING network analysis. For TRAF1, all amino acids have a node degree of less than nine, so residues with a node degree of 8 were considered instead. Details of the residues with a node degree of nine or above are shown in Table 2.
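The cut-off rule described above (node degree of at least nine, relaxed to eight when no residue reaches nine, as for TRAF1) can be expressed as a small helper. This is a sketch only: the function name, its parameters, and the example degree values are hypothetical and do not reproduce Table 2.

```python
def select_by_degree(degrees, cutoff=9, fallback=8):
    """Keep residues whose node degree is >= cutoff.

    If no residue reaches the cut-off, the threshold is relaxed to
    `fallback`. Returns the threshold actually used and the selection.
    """
    selected = {res: d for res, d in degrees.items() if d >= cutoff}
    if selected:
        return cutoff, selected
    relaxed = {res: d for res, d in degrees.items() if d >= fallback}
    return fallback, relaxed


# Hypothetical node degrees for illustration only.
ctla4_like = {"LEU39": 17, "TYR25": 11, "SER20": 6}
traf1_like = {"ARG70": 8, "LEU12": 8, "SER33": 5}

used_a, picked_a = select_by_degree(ctla4_like)
used_b, picked_b = select_by_degree(traf1_like)
```

Returning the threshold alongside the selection keeps it explicit which proteins were filtered with the relaxed value.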

CTLA4 (CYTOTOXIC T-LYMPHOCYTE-ASSOCIATED PROTEIN 4)
CTLA4 belongs to the Ig family and plays a significant role in downregulating the activation of T-cells, macrophages, and osteoclasts. CTLA4 controls inflammation and structural damage 40. The underexpression of CTLA4 is therefore associated with RA. CTLA4 is a checkpoint protein that is expressed in very small amounts and is translocated to the T-cell surface when required to deactivate the T-cell. CTLA4 competes with CD28 for binding to CD80/86 to switch off the immune response. In the current work, the CTLA4 protein was represented as a network of 108 nodes with an average of 7.91 neighbors; the network is shown in figure 2. The top ten residues were selected based on centrality parameters; they are listed in Table 3 and sorted by degree, because a high node degree represents high connectivity. From Tables 2 and 3 it is evident that leucine has a node degree of seventeen together with a high closeness centrality and, as a node with both high degree and high closeness, can be considered a central residue 41. A comprehensive literature review performed to verify these interpretations indicated that leucine and valine play a significant immunomodulatory role 42. Leucine also plays a key role in enhancing the activity of CTLA4 43. Any alteration of a leucine residue can therefore affect CTLA4 production and may contribute to under-expression of CTLA4, which ultimately results in an overactive immune system. Tyrosine is the next residue with high topological parameters and plays an important role in the internalization of CTLA4 from the plasma membrane 44. It is followed by cysteine, which is involved in the activation of the T-cell.
The residues discussed above illustrate the importance of individual residues in CTLA4 and in the reduction of T-cell proliferation, which in turn regulates various immune response pathways; the other residues listed in Table 3 may therefore also be considered as key drug targets for the treatment of RA.

CD40 (CLUSTER OF DIFFERENTIATION 40)
[...] and also leads to the production of pro-inflammatory cytokines and controls the expression of co-stimulatory molecules 45, which is why overexpression of CD40 leads to RA. In this study, CD40 was represented as a residue-residue interaction network of 128 nodes with an average of 7.91 neighbors; the network is shown in figure 3. The top ten residues were selected based on the centrality parameters; they are listed in Table 4 and sorted by degree, because a higher node degree represents higher connectivity. According to Tables 2 and 4, phenylalanine has high values for node degree, closeness centrality, and betweenness centrality, which indicates that phenylalanine is a significant residue for CD40. Phenylalanine, along with uncharged residues, helps the transmembrane domain of CD40 to produce signals in hydrophobic and lipid-rich environments 46. Another residue with high centrality parameters is threonine, which plays an important role in binding CD40 to intracellular proteins, which in turn activates other signaling cascades 47. Threonine is also important for M12 signaling.

IL2RB (INTERLEUKIN-2 RECEPTOR SUBUNIT BETA)
[...] Table 5 were selected and sorted based on the node degree distribution. Tables 2 and 4 show that tyrosine has high values of node degree and closeness centrality, and tyrosine phosphorylation is very important for the activation of different pathways 49,50. The results from the RING web server shown in Table 2 indicate that leucine is an important node; it is essential for the association of all three IL-2R long subunits with tyrosine, threonine, and proline residues 51. The residues listed in Table 5 can therefore be reported as significant players in this target protein.

IRF5 (INTERFERON REGULATORY FACTOR 5)
IRF5 belongs to the family of interferon regulatory factors. It regulates pathogen-induced acquired and innate immunity and is involved in toll-like receptor signaling and B-cell receptor signaling. It also plays an essential role in the DNA repair process. Ubiquitination and phosphorylation activate IRF5, which then drives transcription of pro-inflammatory cytokines and type I IFN. IRF5 also determines the ultimate fate of the macrophage phenotype 52. Hence, overexpression of IRF5 is associated with RA. A 234-node network with an average of 7.70 neighbors was constructed; the network is shown in figure 5. The top nine residues by centrality were selected; they are specified in Table 6 and sorted by node degree. The results show that serine and threonine have high centrality values. Serine and threonine phosphorylation is essential for the dimerization and nuclear translocation of IRF5. Leucine, lysine, serine, and valine are necessary to establish a hydrophobic environment 53, as shown in Table 6.

TRAF1 (TNF RECEPTOR-ASSOCIATED FACTOR 1)
TRAF1 is a tumor necrosis factor receptor-associated factor that regulates cell survival, proliferation, and death. TRAF1 is an adaptor molecule rather than an enzyme. This regulatory molecule takes part in different signaling pathways, including mitogen-activated protein kinase (MAPK), c-Jun N-terminal kinases (JNK), and nuclear factor kappa B (NF-κB), as well as anti-apoptotic signaling 54. A 60-node network with an average of 6.33 neighbors was built; the network is shown in figure 6. The results show that serine has a high closeness centrality value, consistent with the important role of serine phosphorylation in activating the NF-κB and JNK pathways 55. Arginine and glutamic acid have high values for the centrality parameters and are involved in the formation of salt bridges. Leucine and valine also have high centrality values and are involved in the establishment of hydrophobic patches. Both salt bridges and hydrophobic patches are important for the stable structure of TRAF1 56. Bacteria generate a PAD enzyme, which can be a cause of aggressive RA by citrullinating arginine residues of TRAF1. This could be one reason why arginine clearly appears in Table 2 and Table 7 as a residue with high values for all centrality parameters, with [...]

Conclusion
This work highlights the importance of the amino acid residue interaction network for disease-associated target proteins. The systematic analysis supports the view that residues with high centrality parameters play a central role in protein function and structure; therefore, residues identified by the residue-residue interaction network may be proposed as potential drug targets for RA. Each protein was studied as an interactome, and the combined results show that leucine has a high node degree in all five proteins. It is therefore likely to be crucial for the overall function of all five proteins and may play a significant role in the binding affinities and folding of these proteins, which are proposed as potential drug targets. The node degree distribution and centrality values were also very promising for valine, tyrosine, phenylalanine, and tryptophan, which suggests that they are also significant amino acid residues. Branched-chain amino acids such as valine and leucine have already been reported to have a strong immunomodulatory effect, which supports the significance of this analysis. Tyrosine phosphorylation plays a crucial role in the activation and survival of various immune molecules and their associated pathways, while tryptophan is extremely important for antibody-mediated immunity. The current study can be extended by examining other already reported RA proteins to understand the importance of specific amino acids of each protein and their relationship to the respective pathways.

Acknowledgment
We would like to thank the Institute of Biosciences and Technology, Shri Ramswaroop Memorial University, Barabanki, for providing the facilities to perform the research. Special thanks to Dr. Sachidanand Singh for allowing me to carry out this research and to Kishore Ilangovan for his moral support.

Conflicts of interest
None of the authors have any conflict of interest. We also would like to declare that we do not have any competing interests.

Funding source
This project has not received any funding.