Analyzing Readability of COVID-19 Biomedical Literature – UROP Summer Symposium 2021

Aaron Zheng

Pronouns: He/Him/His

UROP Fellowship: Engineering
Research Mentor(s): Yulia Sevryugina, PhD
Research Mentor Institution/Department: U-M Library

Presentation Date: Wednesday, August 4th
Session: Session 1 (3pm-3:50pm EDT)
Breakout Room: Room 1
Presenter: 5


Reading scientific literature such as preprints and papers is different from reading ordinary English novels. The average American reads at the 7th- to 8th-grade level, according to the Literacy Project [1]. The COVID-19 pandemic is unusual in that not only scientists but also the general public actively seek scientific information that would help them understand the new viral infection and fight its spread. COVID-19-related literature has been read by the public, government officials, and others with no prior biomedical education, but it is unclear how much of it was understood. The goal of this project is to determine how readable COVID-19 biomedical articles are and whether an average American could easily understand them. To test this, we analyzed pairs of abstracts from preprints and their corresponding research articles related to COVID-19. To evaluate reading difficulty, we extracted the text from the PDFs of the papers using pdfminer and scored it with a previously developed API from Dr. Collins-Thompson et al. [2] that implements a readability measure for generic English. Our first observation was that this previously developed algorithm can be applied to scientific literature. The general readability scores we obtained for abstracts of COVID-19 papers ranged from the 5th- to the 12th-grade level, with an average grade level of 8.5, which is roughly one grade level above the average American reading level. We also found that abstracts of peer-reviewed papers and their preprint counterparts have similar readability scores and differ only slightly in content. The next steps are to extract the text from individual sections of an article and compare it with the corresponding text in the preprint, and to add more data to make the analysis statistically more robust.
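The workflow described in the abstract (extract text from a PDF, score it, compare a preprint with its published counterpart) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the readability API by Collins-Thompson et al. [2] is not reproduced here, so the standard Flesch-Kincaid grade-level formula is swapped in as a stand-in, with a crude vowel-group syllable heuristic. The function names (`extract_pdf_text`, `flesch_kincaid_grade`, `grade_difference`) are hypothetical. The only third-party call is `extract_text` from `pdfminer.high_level`, which assumes the pdfminer.six package is installed.

```python
import re


def extract_pdf_text(path: str) -> str:
    """Return the text of a PDF using pdfminer.six (imported lazily)."""
    from pdfminer.high_level import extract_text  # requires pdfminer.six
    return extract_text(path)


def count_syllables(word: str) -> int:
    """Estimate syllables by counting vowel groups (a rough heuristic)."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent "e" as non-syllabic, except in "-le" endings.
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level: a stand-in for the measure in [2]."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text contains no sentences or words")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


def grade_difference(preprint_text: str, article_text: str) -> float:
    """Grade level of the published abstract minus that of the preprint."""
    return flesch_kincaid_grade(article_text) - flesch_kincaid_grade(preprint_text)
```

With two abstracts extracted through `extract_pdf_text` from placeholder file paths, `grade_difference` would give a single number per preprint/article pair. A value near zero would correspond to the "similar readability scores" reported above, though scores from this stand-in formula are not comparable to those produced by the measure in [2].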

Authors: Aaron Zheng, Dr. Kevyn Collins-Thompson, and Dr. Yulia Sevryugina
Research Method: Computer Programming