#OpenScience #OpenCode #OpenData #MachineLearning
Our bodies already contain molecules that are able to regulate other molecules. As an example of RNA molecules regulating other RNA molecules, short (~22 nucleotide) molecules known as microRNAs (miRNAs) regulate the abundance of longer messenger RNAs (mRNAs) and, by extension, of the proteins that these mRNAs encode. For the past 9 years, one of the topics I have been working on is the development of methods and tools that predict miRNA-mRNA interactions. Because SARS-CoV-2 (the coronavirus that causes COVID-19) is a positive-sense RNA virus, its genome can be scanned for miRNA binding sites much like an mRNA, so it lends itself well to this type of analysis. In the analysis below, I set out to determine whether there are human miRNAs with the potential to target SARS-CoV-2.
Data and Code Availability
All source code and data for this blog post are available on GitHub. All miRNA-mRNA target predictions were made using the publicly available version 2 of rna22, which I helped implement over 7 years ago. The rna22 algorithm was originally described here. For this analysis, the following rna22 settings were used:
- the size of the seed region was set to n=6
- no mismatches or bulges were allowed in the seed region
- all heteroduplexes were forced to have a minimum number of n=14 paired-up bases, and
- the folding energy of the heteroduplex could not exceed a maximum threshold of -14 kcal/mol
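The four settings above act as acceptance criteria on each predicted heteroduplex. The Python sketch below is a minimal illustration of those thresholds only, not rna22's implementation: the `Heteroduplex` record, its field names, and the example miRNA names are hypothetical, and rna22's real output format is not modeled here.

```python
from dataclasses import dataclass

# Hypothetical record for one predicted miRNA/target heteroduplex.
# Field names are illustrative; they do not reflect rna22's output format.
@dataclass
class Heteroduplex:
    mirna: str
    seed_size: int          # length of the seed region used for the match
    seed_mismatches: int    # mismatches plus bulges inside the seed
    paired_bases: int       # total paired-up bases in the heteroduplex
    folding_energy: float   # kcal/mol; more negative means more stable

# Thresholds mirroring the settings listed in the text.
SEED_SIZE = 6
MAX_SEED_MISMATCHES = 0
MIN_PAIRED_BASES = 14
MAX_FOLDING_ENERGY = -14.0  # kcal/mol

def passes_settings(h: Heteroduplex) -> bool:
    """Return True if a heteroduplex satisfies all four constraints."""
    return (
        h.seed_size == SEED_SIZE
        and h.seed_mismatches <= MAX_SEED_MISMATCHES
        and h.paired_bases >= MIN_PAIRED_BASES
        and h.folding_energy <= MAX_FOLDING_ENERGY
    )

# Made-up candidates, one per failure mode.
candidates = [
    Heteroduplex("miR-A", 6, 0, 16, -18.2),  # kept
    Heteroduplex("miR-B", 6, 1, 17, -20.0),  # rejected: mismatch in the seed
    Heteroduplex("miR-C", 6, 0, 12, -19.5),  # rejected: too few paired bases
    Heteroduplex("miR-D", 6, 0, 15, -11.0),  # rejected: energy above -14
]
kept = [h.mirna for h in candidates if passes_settings(h)]
print(kept)  # ['miR-A']
```

Note that the energy test uses `<=`: a heteroduplex passes only if it is at least as stable as the -14 kcal/mol cutoff.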
In this analysis, I focus only on human miRNAs that are predicted to target multiple regions of the virus.
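A minimal sketch of this "multiple regions" filter, assuming predictions are available as hypothetical `(miRNA, strand, start)` tuples covering both the positive-sense genome and the negative-sense intermediate. The cutoff of two distinct sites is an assumption for illustration; the exact threshold used in the notebooks may differ.

```python
from collections import defaultdict

# Each prediction is (miRNA name, strand, start position on the viral genome).
# These tuples are made-up illustrative values, not real rna22 output.
predictions = [
    ("miR-X", "+", 1200),
    ("miR-X", "-", 8450),
    ("miR-X", "+", 1200),   # duplicate of the first site; counted once
    ("miR-Y", "+", 21500),
    ("miR-Z", "-", 300),
    ("miR-Z", "-", 29100),
]

def multi_target_mirnas(preds, min_sites=2):
    """Count distinct (strand, start) sites per miRNA and keep those
    with at least min_sites predicted targets."""
    sites = defaultdict(set)
    for mirna, strand, start in preds:
        sites[mirna].add((strand, start))
    return {m: len(s) for m, s in sites.items() if len(s) >= min_sites}

print(multi_target_mirnas(predictions))  # {'miR-X': 2, 'miR-Z': 2}
```

Keying sites on `(strand, start)` keeps a positive-sense site and a negative-sense site at the same coordinate as two separate targets.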
I provide two code notebooks, each of which analyzes different viral targets:
- The first notebook identifies human miRNAs whose predicted targets are located anywhere across the ~30 kb RNA strand of the Wuhan-Hu-1 genome.
- The second notebook identifies human miRNAs whose predicted targets are located in the vicinity of a transcription-regulating sequence (TRS), the frameshift slippery sequence (SLIP), or the frameshift pseudoknot (KNOT), or near the start of a gene (STARTGENE).
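The "vicinity" criterion of the second notebook can be sketched as an interval-overlap test. This is a minimal illustration only: the post does not state how large the vicinity is, so the 100-nt `WINDOW`, the feature labels, and their coordinates are all hypothetical, and sites on the negative-sense intermediate are assumed to have already been mapped to genome coordinates.

```python
# Hypothetical feature annotations: (label, start, end) in genome coordinates.
# Labels and coordinates are placeholders, not real Wuhan-Hu-1 annotations.
features = [
    ("TRS_example", 27000, 27010),
    ("SLIP_example", 13460, 13470),
]

WINDOW = 100  # assumed size of the "vicinity" in nucleotides

def nearby_features(site_start, site_end, feats, window=WINDOW):
    """Return labels of features whose window-expanded interval
    overlaps the predicted target site [site_start, site_end]."""
    hits = []
    for label, f_start, f_end in feats:
        if site_start <= f_end + window and site_end >= f_start - window:
            hits.append(label)
    return hits

print(nearby_features(26950, 26972, features))  # ['TRS_example']
print(nearby_features(20000, 20022, features))  # []
```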
There were a few considerations and assumptions I made in the approaches above that are worth mentioning:
- As a virus mutates, having weapons (in this case, miRNAs) that don’t require exact matches may help. MiRNAs form heteroduplexes with their targets that are essentially ‘forgiving’: they tolerate mismatches and bulges. This analysis takes advantage of this property, and exact matching is enforced only in the seed.
- MiRNAs that are predicted to target SARS-CoV-2 multiple times and appear in clusters in the human genome are preferred. For the purposes of this analysis, I define a cluster as a set of human miRNAs located no more than 10 kb from one another. In both notebooks, reported miRNA clusters are sorted in decreasing order of the number of predicted targets on the viral genome.
- Both notebooks account for predictions that target the virus’s positive-sense genomic RNA and its negative-sense intermediate.
- The features used in the 2nd notebook (e.g. TRS, SLIP, …) may play interesting roles in the replication life-cycle of the virus. For a nice overview of the transcription and translation of coronaviruses, see here and here.
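The clustering and ranking steps described above can be sketched as follows. This sketch reads "no more than 10 kb from one another" as single-linkage along each chromosome (each miRNA within 10 kb of its nearest neighbor), treats unclustered miRNAs as clusters of size one, and ranks clusters by the total number of predicted viral targets across members. All three are interpretations, and the input records are hypothetical.

```python
from itertools import groupby

MAX_GAP = 10_000  # cluster definition from the text: 10 kb

# Hypothetical input: (miRNA, human chromosome, genomic position,
# number of predicted viral targets). All values are illustrative only.
mirnas = [
    ("miR-P", "chr19", 1_000_000, 5),
    ("miR-Q", "chr19", 1_006_000, 3),
    ("miR-R", "chr19", 1_014_000, 4),  # 8 kb from miR-Q -> same cluster
    ("miR-S", "chr19", 1_040_000, 9),  # 26 kb gap -> new cluster
    ("miR-T", "chr14", 500_000, 2),
]

def cluster_mirnas(records, max_gap=MAX_GAP):
    """Single-linkage clustering along each chromosome, then rank clusters
    by total predicted viral targets, highest first."""
    clusters = []
    ordered = sorted(records, key=lambda r: (r[1], r[2]))
    for _, group in groupby(ordered, key=lambda r: r[1]):
        current, last_pos = [], None
        for rec in group:
            # Start a new cluster when the gap to the previous miRNA is too large.
            if last_pos is not None and rec[2] - last_pos > max_gap:
                clusters.append(current)
                current = []
            current.append(rec)
            last_pos = rec[2]
        clusters.append(current)
    return sorted(clusters, key=lambda c: sum(r[3] for r in c), reverse=True)

for c in cluster_mirnas(mirnas):
    print([r[0] for r in c], sum(r[3] for r in c))
# ['miR-P', 'miR-Q', 'miR-R'] 12
# ['miR-S'] 9
# ['miR-T'] 2
```

A stricter reading (every pair of members within 10 kb) would be complete-linkage instead and could split long chains of miRNAs into several smaller clusters.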
To give some examples of miRNAs that surface in this analysis: in the 2nd notebook you will see that the cluster comprising miR-512-3p/miR-515-3p/miR-519e-3p/miR-520a-3p/miR-523-3p targets seven viral locations around the TRS of ORF10 (negative-sense), the frameshift area of ORF1ab (negative-sense), and the TRS of ORF6 (negative-sense). If you look at the first notebook for the same set of miRNAs, you will notice that they are part of a larger cluster of 75 miRNAs that are predicted to target the viral genome at 238 locations (evenly split between positive-sense and negative-sense) in 10 viral genes and the genome’s 5’ UTR.
As another example, the cluster let-7a-5p/miR-4763-5p/miR-4763-3p/let-7b-5p targets six unique viral locations around the start of ORF10 and ORF3b (negative-sense), the TRS of ORF6 (negative-sense), and the TRS of ORFS (positive-sense). Across the full SARS-CoV-2 genome, this cluster of miRNAs is predicted to target 113 locations (slightly skewed toward the positive-sense strand) in 11 different genes and the genome’s 5’ UTR. Notably, let-7b has previously been linked to the regulation of hepatitis C virus replication.
In the first notebook, you will also observe a very large cluster (listed first, with 67 members) around the MEG8 and MEG9 lncRNAs. Note that the let-7a-5p cluster has fewer members than this one, but a relatively high number of predicted targets per member. Similarly, some human miRNAs that are not part of any cluster are predicted to have a disproportionate number of targets (e.g. miR-6756-5p with 82 predicted targets, miR-6848-5p/3p with 56, and miR-6846-5p with 54). More generally, it is interesting that some miRNAs, or clusters of miRNAs, are predicted to target the same virus at several different features. This may suggest that these regions have been subject to selection in the past, although more work would be needed to calculate the statistical properties of these observations.
It is important to emphasize that the above represents in silico work. Follow-up experimental work will be needed to evaluate these findings, and I hope that sharing this analysis may help bootstrap independent efforts towards this end and contribute to ongoing efforts to address the problems caused by SARS-CoV-2 infections.
Feature image: Image at the NIAID Integrated Research Facility (IRF) in Fort Detrick, Maryland. Credit: NIAID (https://creativecommons.org/licenses/by/2.0/)