A software library for combining, processing and analyzing multi-omic and electronic health record data – UROP Spring Symposium 2021

Jalen Ballard


Research Mentor(s): Matthew Patrick, Research Investigator
Research Mentor School/College/Department: Department of Dermatology, Michigan Medicine
Presentation Date: Thursday, April 22, 2021
Session: Session 5 (3pm-3:50pm)
Breakout Room: Room 17
Presenter: 5

Efficient searching of existing genomic markers is essential for speeding up the analysis of key biological information and for manipulating the data in bulk to discover patterns. In particular, the Cutaneous Bioinformatics project aims to process genetic mutations, namely single nucleotide polymorphisms (SNPs) and insertions-deletions (indels), to determine how particular mutations contribute to particular clinical conditions. To analyze these genomic markers efficiently, a C++ program was developed, tested, and published on GitHub. It reads the data from a standard tab-separated values (TSV) text file and loads it into a two-way hash map. The program was written, tested, and documented by a single developer under the supervision and direction of the leaders of the Cutaneous Bioinformatics project. The software takes command-line arguments and performs two-way lookups between a marker's rsID and its chromosome, position, and allele sequence. A hash table is well suited to this task because lookups in both directions take expected constant time, whereas a reverse lookup performed by scanning the original multi-gigabyte data file takes time linear in the size of the file. Construction of the hash table was implemented successfully, and, following discussions with the project supervisors, the reverse lookup takes the syntax of user-supplied allele sequences into account when deciding which markers at a given chromosome and position count as a match. Incorporating additional markers and data from other biological disciplines could further improve researchers' ability to analyze the data quickly and may lead to genetic discoveries.
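The two-way lookup described above can be sketched in C++ roughly as follows. This is a minimal illustration under stated assumptions, not the project's published code. It assumes each TSV row holds four columns in the order rsID, chromosome, position, allele. It treats an empty allele query as "any allele at this locus" and otherwise requires an exact string match, which merely stands in for the allele-syntax rules the abstract mentions. The names `Marker`, `MarkerIndex`, `load_tsv`, `find_rsid`, and `find_locus` are hypothetical. One `std::unordered_map` maps an rsID to its record, and a second maps a "chromosome:position" key to every record at that locus, so both directions run in expected constant time (plus a short scan of the records sharing a locus).

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// One marker record. The column layout (rsID, chromosome, position, allele)
// is an assumption for this sketch, not the project's documented format.
struct Marker {
    std::string rsid;
    std::string chrom;
    long position = 0;
    std::string allele;
};

// Hypothetical two-way index: rsID -> record and (chromosome, position) -> records.
class MarkerIndex {
public:
    // Insert into both maps; a repeated rsID overwrites the earlier forward entry.
    void add(const Marker& m) {
        by_rsid_[m.rsid] = m;
        by_locus_[locus_key(m.chrom, m.position)].push_back(m);
    }

    // Forward lookup: expected O(1); returns nullptr if the rsID is unknown.
    const Marker* find_rsid(const std::string& rsid) const {
        auto it = by_rsid_.find(rsid);
        return it == by_rsid_.end() ? nullptr : &it->second;
    }

    // Reverse lookup: expected O(1) to reach the locus, then a scan of the
    // markers stored there. An empty allele matches any marker at the locus;
    // otherwise the allele string must match exactly (assumed syntax).
    std::vector<Marker> find_locus(const std::string& chrom, long pos,
                                   const std::string& allele = "") const {
        std::vector<Marker> matches;
        auto it = by_locus_.find(locus_key(chrom, pos));
        if (it == by_locus_.end()) return matches;
        for (const Marker& m : it->second)
            if (allele.empty() || m.allele == allele) matches.push_back(m);
        return matches;
    }

    // Read tab-separated rows and index them; returns the number loaded.
    // Blank lines, '#' comments, rows with fewer than four columns, and rows
    // whose position is not numeric (such as a header row) are skipped.
    std::size_t load_tsv(std::istream& in) {
        std::size_t loaded = 0;
        std::string line;
        while (std::getline(in, line)) {
            if (line.empty() || line[0] == '#') continue;
            std::istringstream fields(line);
            Marker m;
            std::string pos;
            if (!(std::getline(fields, m.rsid, '\t') &&
                  std::getline(fields, m.chrom, '\t') &&
                  std::getline(fields, pos, '\t') &&
                  std::getline(fields, m.allele, '\t')))
                continue;
            try {
                m.position = std::stol(pos);
            } catch (const std::exception&) {
                continue;
            }
            add(m);
            ++loaded;
        }
        return loaded;
    }

private:
    // Combine chromosome and position into a single hash key, e.g. "1:100".
    static std::string locus_key(const std::string& chrom, long pos) {
        return chrom + ":" + std::to_string(pos);
    }

    std::unordered_map<std::string, Marker> by_rsid_;
    std::unordered_map<std::string, std::vector<Marker>> by_locus_;
};
```

Because `load_tsv` accepts any `std::istream`, the same code works with a `std::ifstream` opened on the real data file. A command-line front end could open the file named in `argv[1]`, call `load_tsv`, and dispatch on the remaining arguments; that wrapper is omitted here because the original program's argument syntax is not described. Storing each record in both maps roughly doubles memory use in exchange for constant-time lookups in both directions.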

Authors: Jalen Ballard, Lam Tsoi, Matthew Patrick
Research Method: Computer Programming