UROP Spring Symposium 2021

Engineering

Text Analysis of International Trade Agreements

Our project aims to analyze the sentiment reflected in the wording of trade agreements and its impact on trade. We use machine learning to identify topics in the text of trade agreements and then use Python to estimate the importance of these topics. Our contribution is twofold: identifying, through text analysis, topics that have the potential to affect trade flows, and estimating the sign and size of their impact.
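
The abstract does not name the specific models, so the following is only a minimal sketch of one plausible version of this pipeline, assuming LDA for topic identification and an ordinary linear regression for the sign and size of topic effects; all data and variable names are illustrative:

```python
# Minimal sketch: LDA topics from agreement text, then a linear model to
# estimate the sign and size of each topic's association with trade flows.
# The data and variable names (agreement_texts, trade_flows) are illustrative.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LinearRegression

agreement_texts = [
    "tariff reduction schedule for industrial goods",
    "sanitary and phytosanitary measures and dispute settlement",
    "rules of origin and customs cooperation",
]
trade_flows = np.array([1.8, 0.9, 1.2])  # e.g., log bilateral trade

# 1. Identify topics in the agreement texts.
counts = CountVectorizer(stop_words="english").fit_transform(agreement_texts)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_shares = lda.fit_transform(counts)  # each row sums to ~1

# 2. Regress trade flows on topic shares; coefficients give sign and size.
reg = LinearRegression().fit(topic_shares, trade_flows)
print("topic effects:", reg.coef_)
```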

Bioinformatics study on protein structures

The majority of proteins are composed of foldable, stable subunits called domains. A protein's structure can be made up of a single domain or multiple domains. Determining the structures of multidomain proteins is a crucial step in elucidating their functions and in designing new drugs to regulate those functions. However, multidomain structure prediction has been largely neglected by mainstream computational biology because inter-domain interactions are difficult to model, so almost all advanced protein structure prediction methods are optimized for single-domain proteins. In this study, we present a method to construct a library of multidomain proteins with known full-length structures to assist multidomain protein structure prediction. We collected all multidomain proteins from the Protein Data Bank based on DomainParser, and also included the multidomain proteins defined in the CATH and SCOPe databases, resulting in a library of 15,293 multidomain proteins. The completeness of the library was examined by structurally matching a set of non-redundant multidomain proteins against the library using TM-align. The results show that most cases obtain at least one template with the correct global fold (TM-score > 0.5) from the library, which indicates that the constructed multidomain protein library can likely be used to guide multidomain protein structure modeling.
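
As a rough illustration of the library scan described above, this Python sketch runs TM-align between a query structure and each library entry and keeps templates above the TM-score 0.5 fold threshold. It assumes a local TMalign binary and uses illustrative file paths; the actual pipeline may differ:

```python
# Sketch of the completeness check: scan the multidomain library with TM-align
# and keep templates with TM-score > 0.5 (correct global fold).
# Assumes the TMalign executable is on PATH; paths are illustrative.
import re
import subprocess
from pathlib import Path

def tm_score(query_pdb: str, template_pdb: str) -> float:
    """Run TM-align and return the TM-score normalized by the query (Chain_1)."""
    out = subprocess.run(["TMalign", query_pdb, template_pdb],
                         capture_output=True, text=True, check=True).stdout
    # TM-align prints two TM-score lines; take the one normalized by Chain_1.
    match = re.search(r"TM-score=\s*([\d.]+).*Chain_1", out)
    return float(match.group(1)) if match else 0.0

query = "query_multidomain.pdb"
library = Path("multidomain_library")  # stand-in for the 15,293-entry library
hits = [(p.name, tm_score(query, str(p))) for p in library.glob("*.pdb")]
templates = [(name, s) for name, s in hits if s > 0.5]
print(f"{len(templates)} templates with correct global fold")
```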

Ontology-based machine learning towards COVID-19 drug understanding

The pandemic caused by COVID-19 marked its one-year anniversary on March 12, 2021. Since last spring, millions have been infected with this terrible disease across the globe. In the United States alone, there have been almost 30 million cases, and over 500 thousand people have died. Vaccines have been manufactured and distributed around the globe; however, officials predict that COVID-19, like the flu, will never be fully eradicated. The objective of this COVID-19 bioinformatics research project is therefore to use COVID-19 virology data and machine learning to identify a drug, or a cocktail of drugs, that can potentially provide treatment. We began by feeding data into OPA2Vec, an algorithm that transforms ontology-based axioms into high-dimensional vector representations that can be compared with cosine similarity. These high-dimensional vectors are then compressed to two dimensions with t-distributed stochastic neighbor embedding (t-SNE) so they can be graphed. The vectors represent how effectively different drugs are expected to react with the different target proteins of COVID-19, and the resulting graph will help us identify clusters or patterns from which to develop a proof of concept and a potential hypothesis for future experimental verification. A linear neural network model is also being implemented. The results will demonstrate a potential drug design for the COVID-19 virus, which has completely transformed the world as we know it, and will provide a proof of concept to support the experimental verification of our theoretical findings.
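
A minimal sketch of the dimensionality-reduction step described above is shown below, using scikit-learn's t-SNE with a cosine metric. Random vectors stand in for the real OPA2Vec output, and all names are illustrative:

```python
# Sketch of the t-SNE step: compress high-dimensional OPA2Vec drug/protein
# embeddings to 2-D and plot them to look for clusters. Random vectors stand
# in for real OPA2Vec output; labels are illustrative.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 100))   # placeholder for OPA2Vec vectors
labels = rng.integers(0, 2, size=200)      # e.g., drug vs. target protein

tsne = TSNE(n_components=2, metric="cosine", perplexity=30, random_state=0)
points = tsne.fit_transform(embeddings)

plt.scatter(points[:, 0], points[:, 1], c=labels, s=10)
plt.title("t-SNE of drug/target embeddings")
plt.show()
```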

Khichri

Although there is abundant evidence that humans are altering the climate in drastic ways, this information is not always readily available to the general public, especially in developing countries. To combat this issue, this research project studies the general perception of climate change and its impact on food scarcity in Pakistan, and it uses foundational design elements to create an interactive web app that helps inform Pakistani youth about the harms of climate change and its impact on food security and costs.

Khichri

The scientific community knows the inevitable impacts of climate change: catastrophic effects on agriculture and food availability, an increase in extreme weather, and a wider spread of deadly diseases and viruses as climates become hotter and more humid. The problem is that this information is not being communicated to the general public, including educated, urban Pakistanis. There is a large gap between research and research communication; as a result, the general public, especially young students and adults, is unaware of the coming effects of this global change. Khichri aims to bridge that gap, to bring this information before the common citizen, and to create an urge among the younger generation to take control of their own future. This project focuses on Pakistan (specifically the city of Karachi), the country fifth-most vulnerable to the long-term anthropogenic effects of climate change and one unable to address its own climate concerns. We investigate students' current understanding of climate change by means of interviews and surveys.

Khichri

Climate change is wreaking destruction on our environment and, in turn, affecting our daily lives. This is most widely recognized in the form of natural disasters that destroy our homes, extreme summer heat waves, and the like; climate change is also causing shortages in our food and water supply. Although climate change is an issue that should be taken seriously, many people remain oblivious to the scale of its effects. This project takes a nontraditional look at the effects of climate change on underdeveloped countries, especially Pakistan. We experiment with different design factors to create a fun, creative, interactive website that informs users of the possible effects of climate change on our food systems. Users look into the future to see how the supply of ingredients will increase or decrease under climate change. In practice, this site can be adapted to many scenarios in other countries, which will benefit the planet as a whole as users communicate and spread their knowledge.

Developing fast and unbiased computer vision algorithms

Computer vision algorithms are a fast-developing technology, yet examples show that they are currently not as accurate and unbiased as we hope they can be. Our project aims to develop a more efficient system and algorithm to reduce bias in computer vision programs. One way bias is introduced into these algorithms is through problems with light levels in images and videos. Because many computer vision systems rely on ordinary video cameras rather than infrared or other sensing modalities, low light introduces uncertainty, which can produce bias: some categories of images are processed more accurately by the algorithm than others. This bias can manifest in different scenarios, such as at nighttime or when recording people with darker skin, and these are the biases we aim to correct. My part in the project involved labeling the videos to be used for analysis and helping to create a standardized labeling method, so that there is a consistent set of videos on which the algorithm can be trained. Our sample set was purposefully selected to include videos with a variety of light levels and skin tones. Our goal was to label as many videos as possible for later stages of the project, in which other groups are developing the algorithm and the other overarching parts of the project. The main project is not complete, and likely will not be for some years, but we achieved our general goal of labeling videos.
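
The abstract does not describe the labeling standard itself, so the following is only a hypothetical sketch of one way to record labels and check coverage across light levels and skin tones; the schema and category names are invented for illustration:

```python
# Hypothetical sketch of a standardized label record plus a coverage check
# across light levels and skin tones. The schema and category names are
# illustrative, not the project's actual labeling standard.
from collections import Counter
from dataclasses import dataclass

@dataclass
class VideoLabel:
    video_id: str
    light_level: str   # e.g., "daylight", "dusk", "night"
    skin_tone: str     # e.g., a Fitzpatrick-scale bucket
    annotator: str

labels = [
    VideoLabel("v001", "night", "V", "coder_a"),
    VideoLabel("v002", "daylight", "II", "coder_b"),
    VideoLabel("v003", "night", "II", "coder_a"),
]

# Count labeled videos in each (light level, skin tone) cell so the training
# set can be checked for balance before the algorithm is trained.
coverage = Counter((l.light_level, l.skin_tone) for l in labels)
for cell, n in sorted(coverage.items()):
    print(cell, n)
```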

Identifying Brain Edema in CT Scans Using Machine Learning

Brain edema is swelling of the brain resulting from traumatic brain injuries, strokes, tumors, and infections. It affects the patient's cognitive and motor function and can lead to lasting adverse health effects and death. Early and accurate identification of edema can prevent these hazards, but studies have found that brain edema is difficult for clinicians to identify accurately because it often blends in with other brain matter. Additionally, linking the volume of edema to its effect on the patient is considered valuable, but there is currently no standard software for this purpose; even when clinicians can identify edema, they cannot quantify the volume present. Convolutional neural networks were used to train a model to segment the edema region. Images from the PROTECT III collection at the University of Michigan hospital were used for this research; some had previously been annotated by clinicians, and these annotated images were used to train the machine learning model. The performance of the model was evaluated with quantitative metrics such as the Dice coefficient, sensitivity, specificity, accuracy, and AUC. The goal of this software is to decrease adverse effects and deaths related to brain edema by providing a system that quantitatively measures edema, so that clinicians can make informed treatment decisions based on the information collected.
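
Of the metrics named above, Dice, sensitivity, specificity, and accuracy can be computed directly from a predicted binary mask and a clinician-annotated ground truth, as in this minimal sketch (AUC would additionally require per-pixel probabilities, e.g., via sklearn's roc_auc_score). The masks here are synthetic:

```python
# Minimal sketch of the evaluation metrics named in the abstract, computed
# from a predicted binary edema mask and an annotated ground-truth mask.
import numpy as np

def segmentation_metrics(pred: np.ndarray, truth: np.ndarray) -> dict:
    """Dice, sensitivity, specificity, and accuracy for binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)    # true positives
    tn = np.sum(~pred & ~truth)  # true negatives
    fp = np.sum(pred & ~truth)   # false positives
    fn = np.sum(~pred & truth)   # false negatives
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / pred.size,
    }

# Synthetic masks standing in for model output and clinician annotation.
pred = np.zeros((64, 64), dtype=bool)
pred[20:40, 20:40] = True
truth = np.zeros((64, 64), dtype=bool)
truth[22:42, 22:42] = True
print(segmentation_metrics(pred, truth))
```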

Developing fast and unbiased computer vision algorithms

In an effort to improve driver safety and autonomous vehicle testing, video recordings of drivers are collected and analyzed as data. These videos are first examined by human coders, but a more efficient, automated algorithm would eventually remove the need for human coders entirely. To build that algorithm, however, human coders must first analyze videos of drivers and label various actions, such as whether the driver is turning or tilting their head, hand movements such as texting, and whether the driver's hands are obscured. Once these labels are produced, the coders' labels are tested against each other for accuracy, so that the final algorithm is unbiased enough to be incorporated into vehicle safety systems.
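
The abstract does not name the agreement measure used when coders' labels "are tested against each other"; Cohen's kappa is one standard choice for inter-coder agreement, sketched here with illustrative labels:

```python
# Sketch of checking two human coders against each other with Cohen's kappa,
# one common inter-coder agreement measure (the abstract does not name the
# exact method used). Action labels here are illustrative.
from sklearn.metrics import cohen_kappa_score

coder_a = ["head_turn", "texting", "neutral", "head_tilt", "texting"]
coder_b = ["head_turn", "texting", "neutral", "neutral",   "texting"]

kappa = cohen_kappa_score(coder_a, coder_b)
print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 = perfect agreement
```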

Head CT image analysis for detecting edema

Cerebral edema, swelling in the brain, is commonly found in patients suffering from head trauma or other injuries and diseases, and it can be fatal. The purpose of this research project was to develop a method for automatically detecting and segmenting edema in head CT scans in order to make it faster and easier for clinicians to diagnose and treat traumatic brain injury (TBI) patients. Edema is difficult to segment, however, because of its unclear boundaries and its similarity in pixel value to other brain tissue. In previous research, most methods for segmenting edema have either been semi-automated or designed for MRI scans. Although an MRI scan is more detailed and can make edema easier to segment, CT is the gold standard for evaluating brain injuries because it is faster and more widely available, so automatic segmentation of CT scans would be very beneficial. In this project, the active contours without edges method developed by Chan and Vese is used, with a manually segmented hematoma as the initial contour. The method was implemented in MATLAB. The segmented edema was then compared with the manually segmented images, and the Dice score was used to measure accuracy. Currently, this method successfully segments select CT scans, but it needs improvement to become more generalizable. In the future, I will look further into other techniques, such as deep learning, that could improve the accuracy and generalizability of automatic edema segmentation.
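
The project implemented the method in MATLAB; as a rough Python stand-in, scikit-image (0.19+) ships a Chan-Vese implementation that can be initialized from a hematoma mask and scored with Dice, as in this sketch with synthetic images:

```python
# Python sketch of the pipeline (the project itself used MATLAB): Chan-Vese
# active contours from scikit-image, initialized from a hematoma mask, then
# a Dice score against the manual segmentation. All images are synthetic.
import numpy as np
from skimage.segmentation import chan_vese

# Synthetic stand-ins for a CT slice, hematoma mask, and manual edema mask.
ct_slice = np.zeros((128, 128))
ct_slice[40:90, 40:90] = 0.6
hematoma_mask = np.zeros((128, 128), dtype=bool)
hematoma_mask[55:75, 55:75] = True
manual_edema = np.zeros((128, 128), dtype=bool)
manual_edema[42:88, 42:88] = True

# Use the hematoma mask as the initial level set (+1 inside, -1 outside).
init = np.where(hematoma_mask, 1.0, -1.0)
segmented = chan_vese(ct_slice, mu=0.25, init_level_set=init, max_num_iter=200)

# Dice overlap between the automatic and manual segmentations.
dice = 2 * np.sum(segmented & manual_edema) / (segmented.sum() + manual_edema.sum())
print(f"Dice score: {dice:.2f}")
```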
