Targeting SARS-CoV-2 by Human MiRNAs

#OpenScience #OpenCode #OpenData #MachineLearning

Blog Summary

The analysis below identifies several miRNAs and miRNA clusters (e.g. let-7a/let-7b/miR-4763) that are predicted to target different regions of SARS-CoV-2. Further experimental work is needed to determine whether activating these miRNAs can prevent or slow the replication of the SARS-CoV-2 virus and so help with the COVID-19 pandemic. Of note, let-7b was previously shown to be a regulator of hepatitis C virus replication.

Background

Our human bodies already contain molecules that are able to regulate other molecules. As an example of RNA molecules regulating other RNA molecules, short (~22 nucleotides) molecules known as microRNA (miRNA) are known to regulate the abundance of the longer messenger RNAs (mRNAs) and, by extension, the proteins that these mRNAs make. One of the topics on which I have been working for the past 9 years is the development of methods and tools that predict miRNA-mRNA interactions. Because SARS-CoV-2 (the strain of coronavirus that causes COVID-19) is a positive-sense RNA virus, it lends itself well to this type of analysis. In the analysis below, I set out to determine if there are human miRNAs with the potential to target SARS-CoV-2.

Data and Code Availability

All source code and data for this blog post are available on GitHub. All miRNA target predictions were made using the publicly available version 2 of rna22, which I helped implement over 7 years ago. The rna22 algorithm was originally described here. For this analysis, the following rna22 settings were used:

the size of the seed region was set to n=6
no mismatches/bulges were allowed in the seed region
all heteroduplexes were forced to have a minimum of n=14 paired-up bases, and
the folding energy of the heteroduplex could not exceed a maximum threshold of -14 kcal/mol

In this analysis, I focus only on human miRNAs that are predicted to target multiple regions of the virus.
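The settings and the multiple-target filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Prediction` record and its field names are hypothetical and are not rna22's actual output format, and the seed-mismatch count is assumed to have been computed upstream over a 6-nucleotide seed.

```python
from dataclasses import dataclass

# Thresholds taken from the settings listed above.
MAX_SEED_MISMATCHES = 0      # no mismatches/bulges in the 6-nt seed
MIN_PAIRED_BASES = 14        # minimum paired-up bases in the heteroduplex
MAX_FOLDING_ENERGY = -14.0   # kcal/mol; more negative means more stable


@dataclass
class Prediction:
    """Hypothetical record for one predicted heteroduplex (not rna22's format)."""
    mirna: str
    target_start: int        # position of the predicted site on the viral RNA
    seed_mismatches: int     # assumed to be counted over the 6-nt seed
    paired_bases: int
    folding_energy: float    # kcal/mol


def passes_settings(p: Prediction) -> bool:
    """Return True if a prediction satisfies all of the thresholds above."""
    return (
        p.seed_mismatches <= MAX_SEED_MISMATCHES
        and p.paired_bases >= MIN_PAIRED_BASES
        and p.folding_energy <= MAX_FOLDING_ENERGY
    )


def mirnas_with_multiple_targets(predictions, min_targets=2):
    """Keep miRNAs with at least `min_targets` distinct passing target sites."""
    sites = {}
    for p in predictions:
        if passes_settings(p):
            sites.setdefault(p.mirna, set()).add(p.target_start)
    return {m: sorted(s) for m, s in sites.items() if len(s) >= min_targets}
```

In practice rna22 applies these thresholds itself when it is configured with the settings above; the sketch only makes the direction of each comparison explicit (for example, that a folding energy of -16 kcal/mol passes while -12 kcal/mol does not).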

Analysis Overview

I provide two code notebooks, each of which analyzes different viral targets:

The first notebook identifies human miRNA whose predicted targets are located anywhere across the ~30kb RNA strand of the Wuhan-Hu-1 genome.
In the second notebook, I identify human miRNA whose predicted targets are located in the vicinity of a transcriptional regulatory sequence (TRS), SLIP, or KNOT or near the start of a gene (STARTGENE).

There were a few considerations and assumptions I made in the approaches above that are worth mentioning:

As a virus mutates, having weapons (in this case, miRNAs) that don’t require exact matches may help. MiRNAs form heteroduplexes with their targets that are essentially ‘forgiving’. This analysis takes advantage of this property, and exact matching is enforced only in the seed.
MiRNAs that are predicted to target SARS-CoV-2 multiple times and appear in clusters in the human genome are preferred. For the purposes of this analysis, I define a cluster as a set of human miRNAs that are not more than 10kb from one another. In both notebooks, reported miRNA clusters are sorted in decreasing order of the number of predicted targets on the viral genome.
Both notebooks account for predictions that target the virus’s positive-sense genome RNA and its negative-sense intermediate.
The features used (e.g. TRS, SLIP, …) in the second notebook may play interesting roles in the replication life-cycle of the virus. For a nice overview of the transcription and translation of coronavirus see here and here.
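The cluster definition and ordering described above can be sketched as follows. This is a minimal illustration, not the notebooks' code: it assumes that "not more than 10kb from one another" is applied to neighbouring miRNAs on the same chromosome (single-linkage grouping), it ignores strand, and the input tuples and function names are hypothetical.

```python
from collections import defaultdict

MAX_GAP = 10_000  # 10 kb, per the cluster definition above


def cluster_mirnas(loci, max_gap=MAX_GAP):
    """Group miRNAs whose neighbouring genomic positions are within max_gap.

    loci: iterable of (name, chromosome, position) tuples (hypothetical format).
    Returns a list of clusters, each a list of miRNA names.
    """
    by_chrom = defaultdict(list)
    for name, chrom, pos in loci:
        by_chrom[chrom].append((pos, name))

    clusters = []
    for chrom in sorted(by_chrom):
        members = sorted(by_chrom[chrom])
        current = [members[0][1]]
        for (prev_pos, _), (pos, name) in zip(members, members[1:]):
            if pos - prev_pos <= max_gap:
                current.append(name)      # close enough: same cluster
            else:
                clusters.append(current)  # gap too large: start a new cluster
                current = [name]
        clusters.append(current)
    return clusters


def rank_clusters(clusters, target_counts):
    """Sort clusters by total predicted viral targets, largest first."""
    def total(cluster):
        return sum(target_counts.get(name, 0) for name in cluster)
    return sorted(clusters, key=total, reverse=True)
```

Under this assumption a miRNA that sits alone on its chromosome, or more than 10 kb from its nearest neighbour, ends up as a single-member cluster. A stricter reading of the definition (every pair within 10 kb) would split some long chains that this sketch keeps together.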

As one example of the miRNAs that surface in this analysis, the second notebook shows that the cluster comprising miR-512-3p/miR-515-3p/miR-519e-3p/miR-520a-3p/miR-523-3p targets seven viral locations around the TRS of ORF10 (negative-sense), the frameshift area of ORF1ab (negative-sense), and the TRS of ORF6 (negative-sense). If you look at the first notebook for the same set of miRNAs, you will notice that they are part of a larger cluster of 75 miRNAs that are predicted to target the viral genome at 238 locations (evenly split between positive-sense and negative-sense) in 10 viral genes and the genome’s 5’ UTR.

As another example, the cluster let-7a-5p/miR-4763-5p/miR-4763-3p/let-7b-5p targets six unique viral locations around the start of ORF10 and ORF3b (negative-sense), the TRS of ORF6 (negative-sense), and the TRS of ORFS (positive-sense). Across the full SARS-CoV-2 genome, this cluster of miRNAs is predicted to target 113 locations (with a slight skew in favor of the positive-sense) in 11 different genes and the genome’s 5’ UTR. As noted above, let-7b was previously shown to be a regulator of hepatitis C virus replication.

In the first notebook, you will also observe a very large cluster (listed first, with 67 members) around the MEG8 and MEG9 lncRNAs. Of note, the cluster comprising let-7a-5p has fewer members than this one but a relatively high number of predicted targets per member. Along the same lines, some human miRNAs that are not part of a cluster are predicted to have a disproportionate number of targets (e.g. miR-6756-5p with 82 predicted targets, miR-6848-5p/3p with 56, and miR-6846-5p with 54). In general, it is interesting that several miRNAs, or clusters of miRNAs, are predicted to target the same virus but at different features. This suggests that these regions may have been under selection in the past. More work would be needed to calculate the statistical properties of these observations.
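To make the "targets per member" comparison concrete, here is a small calculation using only the cluster sizes and location counts quoted earlier in this post. The post does not give a target count for the 67-member MEG8/MEG9 cluster, so the sketch uses the 75-member cluster containing miR-512-3p as the point of comparison instead. The labels are my own shorthand, and the ratio is a rough average that assumes the quoted location counts are cluster-level totals.

```python
# (members, predicted viral target locations) as quoted in this post.
clusters = {
    "cluster containing miR-512-3p": (75, 238),
    "let-7a-5p/miR-4763-5p/miR-4763-3p/let-7b-5p": (4, 113),
}


def targets_per_member(members, targets):
    """Average number of predicted target locations per cluster member."""
    return targets / members


for label, (members, targets) in clusters.items():
    ratio = targets_per_member(members, targets)
    print(f"{label}: {ratio:.2f} targets per member")
```

By this rough measure the four-member let-7a-5p cluster averages about 28 predicted locations per member, against about 3 for the 75-member cluster.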

It is important to emphasize that the above represents in silico work. Follow-up experimental work will be needed to evaluate these findings. I hope that sharing this analysis helps bootstrap independent efforts towards this end and contributes to ongoing efforts to address the problems caused by SARS-CoV-2 infections.

About the Author

Phillipe Loher

Director, Machine Learning

Phillipe specializes in Big Data processing for biological discovery. Phillipe has worked for the Computational Medicine Center at Thomas Jefferson University for over 9 years where he has designed many algorithms and software systems needed to efficiently analyze thousands of large datasets. His involvement in advanced software engineering algorithms and programs spans more than 18 years. During that time, he has been involved in a large number of applied computer science and computer engineering activities including: machine learning, data analytics, high performance computing, digital signal processing, low level device drivers, mobile phone platform development, security and security encryption algorithms, and cloud-development. Before joining Thomas Jefferson University (TJU), Phillipe worked at IBM Lotus Software for 8.5 years in various Software Engineering roles within the feature development teams. For several years prior to leaving IBM, he served as manager of software engineers and teams located around the globe.

Feature image: Image at the NIAID Integrated Research Facility (IRF) in Fort Detrick, Maryland. Credit: NIAID (https://creativecommons.org/licenses/by/2.0/)