Development of protein property prediction methods from sequence based on deep learning – UROP Spring Symposium 2021

Harry Yang

Pronouns: he/him/his

Research Mentor(s): Yang Zhang, Professor
Research Mentor School/College/Department: Department of Computational Medicine & Bioinformatics, Michigan Medicine
Presentation Date: Thursday, April 22, 2021
Session: Session 5 (3pm-3:50pm)
Breakout Room: Room 11
Presenter: 2

Although proteins have become increasingly easy to sequence, experimentally determining a protein’s structure remains difficult and time-consuming. Predicting a protein’s structure and properties from its sequence is therefore a key challenge in making better use of the vast amount of available sequencing data. Our project seeks to develop a deep-learning-based method that uses a protein’s sequence to predict its properties, such as backbone phi/psi torsion angles and solvent accessibility. The initial goal of the project was to design and write the deep-learning program using the PyTorch library. After completing the program, we assembled training and testing sets from existing Protein Data Bank data and used the training set to train the model. We then ran the model on the testing set and analyzed its performance by comparing the predicted properties with the experimentally determined ones. Although we do not yet have final results, we hope to draw conclusions about how effective our model is relative to existing prediction methods. These results could help us determine which prediction techniques or algorithms are well suited to this task, and which ones introduce errors and should therefore be avoided in future research. They could also contribute to more accurate and efficient computational protein structure prediction, allowing scientists to make better use of available sequencing data without the difficulties of experimental structure determination.
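
The abstract does not say how predicted properties were scored against experimental values. As a minimal sketch under our own assumptions (the metric choices and the helper names `angular_mae` and `pearson_r` are hypothetical, not the project's code), phi/psi predictions can be compared with a mean absolute error that respects the 360° periodicity of torsion angles, and solvent accessibility predictions are commonly scored with a Pearson correlation coefficient:

```python
import math
from typing import Sequence

# Hypothetical evaluation helpers; the project's actual metrics are not
# stated in the abstract.


def angular_mae(pred_deg: Sequence[float], true_deg: Sequence[float]) -> float:
    """Mean absolute error (degrees) between predicted and experimental torsion angles.

    Angles are periodic, so each residue's error is the shorter way around
    the circle: 170 and -170 degrees differ by 20, not 340.
    """
    if len(pred_deg) != len(true_deg) or len(pred_deg) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, t in zip(pred_deg, true_deg):
        diff = abs(p - t) % 360.0          # raw difference folded into [0, 360)
        total += min(diff, 360.0 - diff)   # take the shorter arc
    return total / len(pred_deg)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between predicted and experimental solvent accessibility."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)
```

For example, `angular_mae([170.0, -60.0], [-170.0, -50.0])` returns 15.0: the first pair is only 20° apart once wraparound is taken into account, and the second pair is 10° apart. A plain absolute difference would instead report 340° for the first pair and badly overstate the error.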

Authors: Harry Yang, Yang Li, Yang Zhang
Research Method: Computer Programming

lsa logoum logo