Text-Mining Tool for Extraction of miRNA Biomarkers from PubMed Abstracts | Computational Medicine Center at Thomas Jefferson University

An initiative on the development of a text-mining tool for the extraction of miRNA biomarkers from PubMed Abstracts.

In recent years, the role of microRNAs (miRNAs) as biomarkers in human diseases has been studied because miRNAs control and regulate gene expression. However, research into how specific miRNAs influence specific diseases is slowed by a significant obstacle in biomedical research: the constantly growing PubMed database, which contains thousands of unsorted, complex abstracts. With this in mind, during the summer of 2020 we, Alexander Liang and Matt Laws, began a six-week internship under Thomas Jefferson University professor Dr. Nestoras Karathanasis. Our internship focused on implementing a text-mining program in the R programming language. Our software automatically and efficiently extracts miRNA biomarker–disease relationships from publicly available PubMed abstracts.

Diagram of the tasks we performed in our internship

For each abstract, our program extracts six important pieces of information:

diseases: all diseases mentioned in the abstract
miRNAs: all miRNAs mentioned in the abstract
relationships: all sentences describing a relationship between a disease and a miRNA
PMID: the PubMed identifier of each abstract
organisms: the organisms the miRNAs belong to
countries: geographic information about where the research was conducted
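The per-abstract extraction can be illustrated with a minimal Python sketch. The original tool was written in R and its exact rules are not described here, so everything below is an assumption: the regular expression for miRNA names, the tiny hypothetical disease and country lists (a real tool would need curated vocabularies), the mapping of miRBase species prefixes such as hsa and mmu to organisms, and the rule that a "relationship" is any sentence that mentions both a miRNA and a disease. The function name summarize_abstract is likewise hypothetical.

```python
import re

# Assumed miRNA pattern: optional three-letter species prefix (e.g. "hsa-"),
# "miR" or "let", a number, an optional letter, and an optional -3p/-5p arm.
MIRNA_RE = re.compile(
    r"\b(?:([a-z]{3})-)?((?:miR|let)-\d+[a-z]?(?:-[35]p)?)", re.IGNORECASE
)

# Hypothetical lookup from miRBase species prefixes to organisms.
SPECIES = {"hsa": "Homo sapiens", "mmu": "Mus musculus", "rno": "Rattus norvegicus"}

# Hypothetical vocabularies; a real tool would use curated dictionaries.
DISEASES = ["hepatocellular carcinoma", "breast cancer", "glioma"]
COUNTRIES = ["China", "United States", "Greece", "Japan"]


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize_abstract(pmid, text):
    """Return one summary row holding the six extracted fields."""
    lowered = text.lower()
    mirnas, organisms = set(), set()
    for match in MIRNA_RE.finditer(text):
        prefix, name = match.group(1), match.group(2)
        mirnas.add(name)
        # Infer the organism from the species prefix when one is present.
        if prefix and prefix.lower() in SPECIES:
            organisms.add(SPECIES[prefix.lower()])
    diseases = {d for d in DISEASES if d in lowered}
    # Assumed rule: a relationship sentence names a miRNA and a known disease.
    relationships = [
        s for s in split_sentences(text)
        if MIRNA_RE.search(s) and any(d in s.lower() for d in diseases)
    ]
    countries = {c for c in COUNTRIES if c in text}
    return {
        "PMID": pmid,
        "diseases": sorted(diseases),
        "miRNAs": sorted(mirnas),
        "relationships": relationships,
        "organisms": sorted(organisms),
        "countries": sorted(countries),
    }


# Usage with a made-up abstract (not real PubMed data).
sample = ("Serum hsa-miR-122-5p was elevated in hepatocellular carcinoma "
          "patients from China. The control group was healthy.")
print(summarize_abstract("12345", sample))
```

Simple dictionary matching like this is fast but misses synonyms and abbreviations (for example "HCC"), which is one reason a recall below 1.0 is expected from this kind of approach.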

The end product is a data table summarizing thousands of abstracts, generated at a rate of roughly one abstract per second. The program compiles the results into a concise, easy-to-access spreadsheet in which each row contains all six pieces of information extracted from one abstract.
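Compiling the summaries into a spreadsheet could look like the following sketch. It assumes each summary is a Python dict keyed by the six field names and uses the standard-library csv module; the function names flatten and write_summary_table, and the choice to join list-valued fields with "; ", are hypothetical and not taken from the original R tool.

```python
import csv
import io

# Column order for the output spreadsheet (one column per extracted field).
FIELDS = ["PMID", "diseases", "miRNAs", "relationships", "organisms", "countries"]


def flatten(value):
    """Join list-valued fields into one cell; leave other values unchanged."""
    return "; ".join(value) if isinstance(value, (list, tuple)) else value


def write_summary_table(rows, handle):
    """Write a header plus one CSV row per abstract summary to `handle`."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        # Missing fields become empty cells instead of raising an error.
        writer.writerow({key: flatten(row.get(key, "")) for key in FIELDS})


# Usage with an in-memory buffer and a hypothetical summary row.
example = {
    "PMID": "12345",
    "diseases": ["hepatocellular carcinoma"],
    "miRNAs": ["miR-122-5p", "miR-21"],
    "relationships": [],
    "organisms": ["Homo sapiens"],
    "countries": ["China"],
}
buffer = io.StringIO()
write_summary_table([example], buffer)
print(buffer.getvalue())
```

To write a real file instead, pass a handle from open("summaries.csv", "w", newline="", encoding="utf-8"); the csv module documentation recommends newline="" so that line endings are not doubled on Windows. Sentences containing commas are quoted automatically by the writer.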

Following extraction, we evaluated the accuracy of our tool using the manually curated Human microRNA Disease Database (HMDD) [1] as our ground truth. We employed three statistics:

recall, the fraction of HMDD relationships that the program extracts, which measures its ability to find the relevant relationships
precision, the fraction of extracted relationships that HMDD confirms, which measures its ability to generate only true relationships
f-score, the harmonic mean of precision and recall, which balances the two
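The three statistics can be computed by treating both the extracted relationships and the HMDD entries as sets of (miRNA, disease) pairs. The sketch below assumes that representation, with names already normalized so that identical pairs compare equal, and takes the f-score to be the standard F1 score. The function name evaluate and the example pairs are hypothetical, not HMDD data.

```python
def evaluate(predicted, truth):
    """Return (recall, precision, f_score) for extracted vs. ground-truth pairs."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    # Recall: share of ground-truth pairs that were extracted.
    recall = true_positives / len(truth) if truth else 0.0
    # Precision: share of extracted pairs that appear in the ground truth.
    precision = true_positives / len(predicted) if predicted else 0.0
    if precision + recall == 0:
        return recall, precision, 0.0
    # F1: harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return recall, precision, f_score


# Hypothetical pairs for illustration only.
extracted = {("miR-122", "HCC"), ("miR-21", "HCC"), ("miR-1", "HCC")}
reference = {("miR-122", "HCC"), ("miR-21", "HCC"),
             ("miR-221", "HCC"), ("miR-222", "HCC")}
print(evaluate(extracted, reference))
# recall = 0.5, precision ≈ 0.667, f-score ≈ 0.571
```

The harmonic mean is used rather than the arithmetic mean because it stays low whenever either precision or recall is low, so a tool cannot score well by extracting everything or almost nothing.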

By the end of the internship, the program’s best performance at finding miRNA-disease relationships for a single disease was a recall of 0.731, a precision of 0.864, and an f-score of 0.792.
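Assuming the f-score is the standard F1 score (the harmonic mean of precision P and recall R), the reported values are internally consistent:

```latex
F = \frac{2 \cdot P \cdot R}{P + R}
  = \frac{2 \times 0.864 \times 0.731}{0.864 + 0.731}
  = \frac{1.263168}{1.595}
  \approx 0.792
```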

miRNA biomarkers for hepatocellular carcinoma

For more information, you can download our presentation slides.

References

[1] Huang Z, Shi J, Gao Y, Cui C, Zhang S, Li J, et al. (2019). HMDD v3.0: a database for experimentally supported human microRNA-disease associations. Nucleic Acids Res. 47, D1013–D1017. doi:10.1093/nar/gky1010. PMID: 30364956.

About Authors

Alexander Liang

Summer Intern

Matt Laws

Summer Intern

Supervisor

Nestoras Karathanasis

Teaching Assistant Professor

Dr. Nestoras Karathanasis is a teaching assistant professor at the College of Life Science at Thomas Jefferson University (TJU) in Philadelphia. He received his PhD from the University of Crete, Greece, in 2013 in the field of miRNA-mRNA interactions related to cancer. In 2015, he was appointed Lead Bioinformatician at Miroculus in San Francisco, before joining TJU.

He has an active interest in precision medicine, and his work focuses on the statistical integration of different omics data, the development of miRNA mapping tools, and the application of machine learning algorithms to biological questions in key areas such as cancer. As a teaching assistant professor, he has developed three graduate-level courses on computational skills: R programming, data visualization, and transcriptomics data analysis. In 2018, he created and led a machine learning team from the Computational Medicine Center of TJU that participated in the DREAM Single Cell Transcriptomics Challenge, where it ranked 6th among 49 participating teams. In 2019, he was a co-recipient of the TJU Data Science Award for his work as a member of the Computational Medicine Center team.
