Research Mentor(s): Matthew Patrick, Research Investigator
Research Mentor School/College/Department: Department of Dermatology, Michigan Medicine
Presentation Date: Thursday, April 22, 2021
Session: Session 5 (3pm-3:50pm)
Breakout Room: Room 17
Healthcare research involves processing and analyzing collected data, which can take considerable time. The purpose of this study is to develop a software library that researchers can use to process and analyze genetic data. The scripts were all written in Python 3 using the NumPy and Pandas libraries. The first script, getLoci.py, parses a .txt file in which each line describes one marker: its chromosome, position, significance (p-value), and other data. It uses this information to identify significant loci within the dataset and returns, for each locus, the positions and p-values of the markers that belong to it. Significance is defined by a p-value threshold, and distinct loci are separated according to the distance between markers. As an alternative approach, linkageDisequlibrium.py assigns markers to loci based on the linkage disequilibrium between them. The next script, compareLoci.py, compares two sets of loci and determines which loci are present in each set. The user can specify the maximum distance at which two loci are considered the same. Lastly, fishersExact.py performs Fisher's exact test for enrichment of genetic loci among different genomic features (represented by BED files). The scripts proved efficient: on a 20-million-line dataset, getLoci.py finished in 4 minutes using 5000 MB of memory.
Authors: Nidhi Jaison, Lam Tsoi, Matthew Patrick
Research Method: Computer Programming
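The locus-finding step in getLoci.py (a p-value threshold followed by grouping by distance) can be illustrated with a minimal sketch. This is not the authors' code. The names get_loci and Marker, the (chromosome, position, p-value) tuple format, and the defaults p_threshold=5e-8 and max_gap=500,000 bp are all hypothetical choices made for illustration. The sketch measures distance from the previous significant marker, so nearby hits chain into a single locus. File parsing is omitted.

```python
from typing import Iterable, List, Tuple

# Hypothetical marker record: (chromosome, position, p-value).
Marker = Tuple[str, int, float]


def get_loci(markers: Iterable[Marker], p_threshold: float = 5e-8,
             max_gap: int = 500_000) -> List[List[Marker]]:
    """Group significant markers into loci separated by more than max_gap bp."""
    # Keep markers below the p-value threshold, grouped by chromosome and
    # ordered by position (chromosome names are sorted as strings here).
    significant = sorted((m for m in markers if m[2] < p_threshold),
                         key=lambda m: (m[0], m[1]))
    loci: List[List[Marker]] = []
    for marker in significant:
        previous = loci[-1][-1] if loci else None
        # Open a new locus on a new chromosome, or when the gap to the
        # previous significant marker exceeds max_gap.
        if (previous is None or previous[0] != marker[0]
                or marker[1] - previous[1] > max_gap):
            loci.append([])
        loci[-1].append(marker)
    return loci
```

For example, with markers [("1", 1000, 1e-9), ("1", 2000, 3e-8), ("1", 900_000, 1e-10), ("2", 800, 4e-9)], the first two markers form one locus. The marker at 900,000 bp starts a second locus because it is more than 500 kb away, and the chromosome 2 marker forms a third.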
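The abstract does not say how linkageDisequlibrium.py uses linkage disequilibrium. One common approach, shown here as an assumption, is greedy LD clumping, similar in spirit to PLINK's --clump. Markers are visited from most to least significant. Each marker that has not yet been assigned becomes the lead of a new locus and claims every unassigned marker whose r² with it meets a threshold. The function name assign_by_ld, the genotype dosage matrix input, and the r2_threshold=0.2 default are all hypothetical.

```python
import numpy as np


def assign_by_ld(genotypes: np.ndarray, pvals: np.ndarray,
                 r2_threshold: float = 0.2) -> np.ndarray:
    """Map each marker to the index of the lead marker of its LD-defined locus.

    genotypes: hypothetical (n_samples, n_markers) dosage matrix (0/1/2).
    """
    # Squared Pearson correlation between marker columns; NaNs from
    # monomorphic markers are treated as "no LD".
    with np.errstate(invalid="ignore", divide="ignore"):
        r2 = np.corrcoef(genotypes, rowvar=False) ** 2
    r2 = np.nan_to_num(r2, nan=0.0)
    lead_of = np.full(pvals.shape[0], -1, dtype=int)
    # Visit markers from most to least significant (greedy clumping).
    for idx in np.argsort(pvals):
        if lead_of[idx] != -1:
            continue  # already claimed by a more significant lead
        lead_of[idx] = idx  # this marker starts a new locus
        in_ld = (r2[idx] >= r2_threshold) & (lead_of == -1)
        lead_of[in_ld] = idx
    return lead_of
```

This sketch builds the full marker-by-marker r² matrix, which is only practical for small regions. At the scale mentioned in the abstract, r² would need to be computed within a window around each lead marker.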
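The comparison done by compareLoci.py, where two loci count as the same if they lie within a user-chosen distance, can be sketched as follows. This version assumes each locus is reduced to a hypothetical (chromosome, representative position) pair. It sorts one set's positions per chromosome so that a binary search finds any locus within max_distance. The names compare_loci and Locus, and the 250,000 bp default, are illustrative only.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical locus record: (chromosome, representative position in bp).
Locus = Tuple[str, int]


def _index_by_chrom(loci: Sequence[Locus]) -> Dict[str, List[int]]:
    """Sorted positions per chromosome, for binary search."""
    index: Dict[str, List[int]] = defaultdict(list)
    for chrom, pos in loci:
        index[chrom].append(pos)
    for positions in index.values():
        positions.sort()
    return index


def _has_match(locus: Locus, index: Dict[str, List[int]],
               max_distance: int) -> bool:
    chrom, pos = locus
    positions = index.get(chrom, [])
    # First position >= pos - max_distance; a match exists if that
    # position is also <= pos + max_distance.
    i = bisect_left(positions, pos - max_distance)
    return i < len(positions) and positions[i] <= pos + max_distance


def compare_loci(set_a: Sequence[Locus], set_b: Sequence[Locus],
                 max_distance: int = 250_000):
    """Split loci into (A loci matched in B, only in A, only in B)."""
    index_a, index_b = _index_by_chrom(set_a), _index_by_chrom(set_b)
    in_both = [locus for locus in set_a
               if _has_match(locus, index_b, max_distance)]
    only_a = [locus for locus in set_a
              if not _has_match(locus, index_b, max_distance)]
    only_b = [locus for locus in set_b
              if not _has_match(locus, index_a, max_distance)]
    return in_both, only_a, only_b
```

Matching here is not one-to-one. A single locus in one set can match several loci in the other, which may or may not be what compareLoci.py does.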
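The enrichment test in fishersExact.py can be sketched as a 2×2 table. Rows are significant loci versus background loci, and columns are overlapping versus not overlapping a BED feature. A one-sided Fisher's exact p-value is then computed from the hypergeometric distribution. The choice of background set, the 1-based locus positions, and the names parse_bed, overlaps, fisher_greater, and enrichment are assumptions for illustration. The BED convention of a 0-based start and an exclusive end is standard. scipy.stats.fisher_exact([[a, b], [c, d]], alternative="greater") should return the same p-value.

```python
from math import comb
from typing import Iterable, List, Sequence, Tuple

# BED interval: chromosome, 0-based start, exclusive end.
Interval = Tuple[str, int, int]
Locus = Tuple[str, int]  # hypothetical: (chromosome, 1-based position)


def parse_bed(lines: Iterable[str]) -> List[Interval]:
    """Read the first three BED columns, skipping headers and blank lines."""
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def overlaps(locus: Locus, intervals: Sequence[Interval]) -> bool:
    chrom, pos = locus
    # BED [start, end) covers 1-based positions start+1 .. end.
    return any(c == chrom and start < pos <= end for c, start, end in intervals)


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p-value for an excess in cell a of [[a, b], [c, d]]."""
    n, row1, col1 = a + b + c + d, a + b, a + c
    # Count tables with the same margins and top-left cell >= a,
    # then divide by the total number of such tables.
    tail = sum(comb(col1, x) * comb(n - col1, row1 - x)
               for x in range(a, min(row1, col1) + 1))
    return tail / comb(n, row1)


def enrichment(loci: Sequence[Locus], background: Sequence[Locus],
               intervals: Sequence[Interval]):
    """Return the 2x2 table (a, b, c, d) and the one-sided p-value."""
    a = sum(overlaps(locus, intervals) for locus in loci)
    c = sum(overlaps(locus, intervals) for locus in background)
    table = (a, len(loci) - a, c, len(background) - c)
    return table, fisher_greater(*table)
```

The overlap check scans every interval for every locus. For genome-wide BED files, sorting the intervals per chromosome and using a binary search, as in the locus comparison, would scale better.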