Life Sciences – Page 7 – UROP Spring Symposium 2021

Life Sciences

Predicting Pathogenicity of Clinical Mutations through Deep Learning

Recent developments in gene sequencing and personalized medicine provide a remarkable opportunity to revolutionize healthcare. Many non-synonymous single-nucleotide polymorphisms (nsSNPs) are associated with disease. Because these mutations are often eliminated from the gene pool through purifying selection, the opportunity to determine their relationship to human disease is rare. Previous studies have developed neural networks that can identify pathogenic mutations with adequate accuracy, but success with solely human variants is still lacking. Here we will train a deep neural network on a large data set of clinically annotated human variants from the dbSNP database. Each input consists of a sequence containing a clinically annotated variant, together with an evolutionary profile of closely homologous sequences generated from a multiple sequence alignment. This input is fed through multiple layers of feature extraction to produce a final classification of benign or pathogenic. The trained network will then be tested on a smaller, separate data set of disease variants to gauge its accuracy. Our aim is to develop a neural network that identifies pathogenic mutations in human disease patients with high accuracy, which we hope future studies can use in the diagnosis or treatment of patients with rare diseases.
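The input described above, a variant-containing sequence paired with an evolutionary profile, could be assembled roughly as in the following sketch. It is illustrative only: the abstract does not specify the encoding, so the one-hot scheme, the frequency-based profile, and the function names `one_hot`, `evolutionary_profile`, and `build_input` are all assumptions.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed (assumed) column order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(sequence):
    """Encode each residue as a 20-element 0/1 vector."""
    return [[1.0 if aa == res else 0.0 for aa in AMINO_ACIDS] for res in sequence]


def evolutionary_profile(aligned_homologs):
    """Per-column amino acid frequencies from equal-length aligned sequences.

    Gaps ('-') and non-standard symbols are ignored when counting.
    """
    length = len(aligned_homologs[0])
    profile = []
    for col in range(length):
        counts = Counter(
            seq[col] for seq in aligned_homologs if seq[col] in AMINO_ACIDS
        )
        total = sum(counts.values())
        profile.append(
            [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]
        )
    return profile


def build_input(variant_sequence, aligned_homologs):
    """Concatenate one-hot and profile features: one 40-element row per position."""
    encoded = one_hot(variant_sequence)
    profile = evolutionary_profile(aligned_homologs)
    if len(encoded) != len(profile):
        raise ValueError("sequence and alignment must have the same length")
    return [oh + pf for oh, pf in zip(encoded, profile)]


# Toy example: a 3-residue sequence and a 3-sequence alignment (made-up data).
features = build_input("ACD", ["ACD", "ACE", "A-D"])
```

A matrix of this shape (positions by features) is the kind of input a convolutional or recurrent network could take; the actual architecture used in the study is not stated in the abstract.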

Evaluating the Role of Claudin 4 protein in Inflammatory Retinal Lesion Pathogenesis Using an IR Mouse Model

Diabetic retinopathy is a debilitating complication of diabetes and a leading cause of blindness in the U.S. The blindness is caused by increased permeability of the blood-retinal barrier within the eye and by accumulation of fluid within the retina, called edema. Tight junction proteins in vascular endothelial cells confer the blood-retinal and blood-brain barrier properties, while diabetes alters these tight junctions, contributing to vascular permeability. One family of tight junction proteins, the claudins, has been shown to confer barrier properties to tight junctions. Additionally, a recent study revealed that claudin 4 expression was associated with repair of ischemic damage in the brain. However, claudin 4 was expressed in glial cells, creating a glia limitans, rather than in endothelial cells. Given this finding, the current study aims to identify whether claudin 4 increases in astrocytes within an ischemic eye model. For the ischemic eye model, the lab used ischemia-reperfusion to mimic the permeability induced in diabetic retinopathy. This study will also use immunofluorescence staining of whole-mount retinas with the astrocyte marker GFAP and claudin 4 antibodies, confocal microscopy, and Imaris imaging analysis software. With these data, more conclusions can be drawn regarding the relationship between astrocytes and claudin 4 levels in the mouse model that mimics diabetic retinopathy.

Evaluating computational methods for predicting protein stability changes upon mutations

Protein-protein interactions (PPIs) are central to biological processes. Most proteins are only marginally stable, which allows them to perform functions such as binding. Amino acid mutations may change a protein's stability and its binding affinity, which in turn affects its biological function. Amino acid mutations at protein-protein interfaces are frequently implicated in many diseases, including cancer. It is therefore of great significance to quantitatively predict the changes in protein stability and binding affinity upon mutation, denoted ddG_stability and ddG_binding. Many tools have been developed for ddG prediction, but a comprehensive comparison of their predictive power is lacking. This study aimed to evaluate the predictive accuracy of a variety of widely used tools for ddG_stability and ddG_binding estimation on a large-scale benchmark, providing guidance on choosing the most accurate tool for ddG prediction.
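A benchmark comparison of this kind typically scores each tool's predicted ddG values against experimental measurements. The sketch below assumes Pearson correlation and root-mean-square error as the metrics; the abstract does not name the metrics used, and the tool names and numbers are made-up placeholders, not results.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rmse(xs, ys):
    """Root-mean-square error between two equal-length lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def rank_tools(experimental, predictions_by_tool):
    """Return (tool, pearson, rmse) rows, best correlation first."""
    rows = [
        (name, pearson(experimental, preds), rmse(experimental, preds))
        for name, preds in predictions_by_tool.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Toy ddG values in kcal/mol (illustrative only, not benchmark data).
experimental = [1.0, 2.0, 3.0, 4.0]
predictions = {
    "tool_a": [1.1, 1.9, 3.2, 3.8],  # hypothetical tool name
    "tool_b": [4.0, 3.0, 2.0, 1.0],  # hypothetical tool name
}
ranking = rank_tools(experimental, predictions)
```

Reporting both metrics is useful because a tool can rank mutations well (high correlation) while still being off in absolute scale (high error).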

Computational design of Raf-based protein binders to inhibit mutant Ras-Raf interactions

Designing proteins to create treatments for illnesses is a cutting-edge area of research with broad potential in medicine. One area of illness that could benefit from protein design is cancer. Cancer occurs when there is a mutation in the genetic sequence that codes for a protein and, when produced by a cell, this mutated protein behaves abnormally in the form of loss of function, gain of a new function, or hyperactivity. One of the leading ideas for treating cancer is the specific targeting of misbehaving proteins in an attempt to eliminate or correct their behavior. To accomplish this, new programs need to be developed to predict and design the structure of a folded protein and isolate protein-protein interactions that can be used to target the protein in question. This research project uses newly developed programs, EvoDesign and EvoEF2, to redesign one of the proteins in a naturally occurring protein-protein interaction relevant to cancer. The programs aim to accomplish this using energy functions and evolutionary patterns. For this study, an oncogenic Ras mutant was analyzed, and novel interacting partner protein sequences based on the native Raf scaffold were designed using EvoEF2. The designed protein partners were then compared to the native one. The analysis showed that the designed partners had greater predicted binding affinities for the mutant Ras sequence than the native partner did, and all protein-protein interactions with the designed partners were more energetically favorable than the native protein-protein interaction. Wet lab testing would be required to test the feasibility and binding of some of the best (most favorable) designs.

Skeletal Growth Abnormalities in TSP1/2 Double-Knockout Mice

Thrombospondins (TSPs) are proteins crucial to the development of bone. In mice with TSP1 and TSP2 knockout genotypes, referred to as double knockouts (DKOs), exostoses (bone growth outside of normal bone, particularly in soft tissues) and other abnormal skeletal growth phenotypes have been observed in the femoral-tibial joint, particularly the patellar region, extending proximally into the quadriceps tendon in older cohorts. To further explore these abnormalities, longitudinal radiographs of DKO mice and wild-types (WT), consisting of C57BL/6 and functional WT, were obtained. These X-ray images were taken of mice at 6, 9, 12, and 20 weeks old, in addition to images collected while conducting fracture research, which span from 13 to 94 weeks old. Mice from fracture surgeries consist of TSP1-null, TSP2-null, and CD47-null genotypes in addition to DKO and WT. All of the radiographs were then collated to compare the development of these exostoses in relation to age and genotype, and semiquantitative analysis was performed to better understand the severity of the observed abnormalities and determine the age at which they begin to develop. Overall, the current data lead us to conclude that exostoses are detectable by radiography by 24 weeks of age, with the potential to develop even earlier.

Developing a Novel Drug Agent for EGFR Mutant Lung Cancer Patients

Non-small cell lung cancer (NSCLC) patients affected by mutant EGFR are often treated with tyrosine kinase inhibitors (TKIs). However, new mutations in EGFR during treatment often limit the overall efficacy of these TKIs. As a result, patients with EGFR mutations have been left with no long-term treatment options. To combat this clinical crisis, the Nyati Lab has developed a novel molecule, DPI-503, which acts as a prolonged and more effective treatment for the EGFR mutations commonly observed in NSCLCs. This molecule works to impair EGFR dimerization and degrade activated forms of EGFR, and therefore selectively targets NSCLC tumors, including those that have developed resistance to existing TKIs. The success of DPI-503 has been confirmed in over 6 different mouse tumor models. Further, both in vivo and in vitro tests have shown cytotoxicity specific to EGFR-driven cancer cells. To contribute to the progress of the lab, I have researched and presented information on EGFR biology, inhibition mechanisms, and signaling pathways, as well as facts and figures detailing the role of EGFR in the success of KRAS-G12C inhibition pathways in colorectal cancers. Additionally, I have taken part in weekly lab meetings to stay up to date on lab activities. Overall, the studies produced by the Nyati Lab highlight a novel method for selective targeting of mutant EGFR and indicate that DPI-503 has the potential to improve clinical outcomes for patients with pancreatic, colorectal, and lung cancers. In the future, the Nyati Lab plans to initiate IND-enabling studies with the aim of beginning Phase I clinical trials for the above-mentioned patient population.

Bioinformatics study on protein structures

The majority of proteins are composed of foldable, stable subunits called domains. A protein structure can be made up of a single domain or multiple domains. Determining the structures of multidomain proteins is a crucial step in elucidating their functions and designing new drugs to regulate these functions. However, this problem has been largely ignored by mainstream computational biology because of the difficulty of modeling inter-domain interactions, and almost all advanced protein structure prediction methods are optimized for modeling single-domain proteins. In this study, we present a method to construct a library of multidomain proteins with known full-length structures to assist multidomain protein structure prediction. We collected all multidomain proteins from the Protein Data Bank, with domains assigned by DomainParser, and also included multidomain proteins defined in the CATH and SCOPe databases. This resulted in a total of 15,293 multidomain proteins in the library. The completeness of the library was examined by structurally matching a set of non-redundant multidomain proteins against the library using TM-align. The results show that most of the cases can obtain at least 1 template with the correct global fold (TM-score >0.5) from the library, which indicates that the constructed multidomain protein library can likely be used to guide multidomain protein structure modeling.
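The completeness check described above reduces to asking, for each target, whether its best-scoring library hit exceeds the TM-score cutoff of 0.5. A minimal sketch follows, assuming the TM-scores have already been computed (for example by running TM-align separately); the function name `library_coverage` and the toy scores are illustrative assumptions, not data from the study.

```python
def library_coverage(tm_scores_by_target, threshold=0.5):
    """Fraction of targets with at least one template above the TM-score threshold.

    tm_scores_by_target maps a target name to the list of TM-scores obtained
    by aligning that target against templates in the library.
    """
    if not tm_scores_by_target:
        raise ValueError("no targets supplied")
    covered = sum(
        1
        for scores in tm_scores_by_target.values()
        if scores and max(scores) > threshold
    )
    return covered / len(tm_scores_by_target)


# Toy TM-scores (made-up values for illustration).
toy_scores = {
    "target_1": [0.30, 0.62],
    "target_2": [0.41, 0.48],
    "target_3": [0.77],
    "target_4": [0.55, 0.20],
}
coverage = library_coverage(toy_scores)
```

Here three of the four toy targets have a hit above 0.5, so the computed coverage is 0.75.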

Multidomain protein analogous templates detection based on TM-align

Protein structure prediction is a crucial step toward understanding and engineering biological and cellular functions. Most proteins in cells contain multiple domains that function cooperatively. However, because of technical difficulties in structural biology, most multidomain proteins have only single-domain structures solved. To guide multidomain protein modeling, we present a two-step procedure that uses structural alignment to detect analogous templates in a library of multidomain proteins with known full-length structures. In the first step, individual domains are used to evaluate each template with TM-align, regardless of overlap between the alignments of different domains, and the average TM-score of all domains is taken as the local score of a template. In the second step, the top 500 templates from the first step are evaluated with TM-align again, with no overlap allowed between the alignments of different domains, and the average TM-score is defined as the global score of a template. Finally, the template with the best global score is selected as the best template. We tested the method on 2,269 non-redundant proteins with 2 domains. With homologous templates (sequence identity >30% to the target) excluded, the results indicate that >80% of target proteins have at least 1 template with a TM-score >0.5 and alignment coverage >90%. The data demonstrate that most inter-domain orientations can be inferred from the template library, which can likely be used to assist multidomain protein structure assembly from independently determined or predicted domain models.
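The two-step procedure above can be sketched as follows. This is not the authors' implementation: the callables `local_tm` and `global_tm` are hypothetical stand-ins for TM-align runs (with and without overlap allowed between domain alignments), and the lookup tables hold made-up scores.

```python
def average(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def detect_best_template(domains, templates, local_tm, global_tm, top_n=500):
    """Two-step template selection.

    local_tm(domain, template) -> TM-score of one domain, overlaps allowed.
    global_tm(domains, template) -> list of per-domain TM-scores with no
    overlap allowed between the alignments of different domains.
    Both callables are assumed wrappers around a structural aligner.
    """
    # Step 1: local score = average per-domain TM-score, ignoring overlaps.
    local_scores = {
        t: average([local_tm(d, t) for d in domains]) for t in templates
    }
    shortlist = sorted(templates, key=lambda t: local_scores[t], reverse=True)
    shortlist = shortlist[:top_n]

    # Step 2: global score = average TM-score with non-overlapping alignments.
    global_scores = {t: average(global_tm(domains, t)) for t in shortlist}
    best = max(shortlist, key=lambda t: global_scores[t])
    return best, global_scores[best]


# Toy score tables standing in for aligner output (made-up values).
LOCAL = {
    ("d1", "t1"): 0.8, ("d2", "t1"): 0.8,
    ("d1", "t2"): 0.7, ("d2", "t2"): 0.6,
    ("d1", "t3"): 0.2, ("d2", "t3"): 0.3,
}
GLOBAL = {"t1": [0.8, 0.2], "t2": [0.7, 0.6], "t3": [0.2, 0.3]}

best, score = detect_best_template(
    ["d1", "d2"],
    ["t1", "t2", "t3"],
    local_tm=lambda d, t: LOCAL[(d, t)],
    global_tm=lambda ds, t: GLOBAL[t],
    top_n=2,
)
```

In the toy data, template `t1` has the best local score because both domains align to the same region, but once overlaps are disallowed its global score drops and `t2` is selected. This illustrates why the second, stricter pass is needed.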
