How genomics data science has helped to develop the COVID-19 mRNA vaccine

Commentary by Maria de los Angeles Becerra Rodriguez, a PhD student in the SFI Centre for Research Training in Genomics Data Science

The mRNA vaccines against the SARS-CoV-2 virus look like our best weapon so far for ending the COVID-19 pandemic. The first doses were administered in the UK this week.


But how have we been able to get a vaccine in less than a year? Considering that clinical trials only started this spring, how was the mRNA vaccine designed so quickly? A large part of the answer is Genomics Data Science.

The following story is familiar to most. A patient showing symptoms of respiratory inflammation was admitted to the Central Hospital of Wuhan on 26 December 2019, but routine viral infection tests all came back negative. Total RNA was then extracted from the patient’s bronchoalveolar lavage fluid, ribosomal RNA was depleted, and the remaining RNA was fragmented and sequenced by Next Generation Sequencing.

Here is where Genomics Data Science becomes important. The sequenced RNA fragments (reads) were compared with one another, and overlapping reads were merged to piece together the unknown virus genome like a puzzle, a process known as de novo assembly. The longest assembled sequence was 30,474 nucleotides long and very similar to a bat SARS-like coronavirus isolate. The genome organization of this unknown virus, later named SARS-CoV-2, was resolved by comparing its sequence to the bat coronavirus Bat SL-CoVZC45 and the related human coronavirus SARS-CoV Tor2.
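To give a feel for how overlapping fragments can be pieced together, here is a minimal Python sketch of a greedy overlap merge. It is only an illustration: the reads are made-up toy sequences, the function names `overlap` and `greedy_assemble` are hypothetical, and real assemblers use far more sophisticated methods to cope with sequencing errors, repeats and millions of reads.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences that share the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        # Join the pair, writing the shared overlap only once.
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


# Toy reads (not real data), given in scrambled order.
toy_reads = ["TGAAGTTCGA", "ATGGCGTA", "GTACCTGA"]
print(greedy_assemble(toy_reads))  # ['ATGGCGTACCTGAAGTTCGA']
```

The three toy reads overlap by three letters each, so the sketch recovers a single 20-letter sequence. The real puzzle had tens of thousands of letters and a vastly larger number of pieces.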

The Spike protein was identified and, interestingly, the motifs that enable it to infect humans were detected: amino acids at regions 433–437 and 460–472 in the Spike protein, which are not present in the bat coronavirus, interact with the human cell receptor ACE2.

Then, thanks to scientific efforts across the globe that relied heavily on Genomics Data Science methodologies, the mRNA sequence that codes for the SARS-CoV-2 Spike protein was chosen earlier this year as the molecule that would form the basis of the mRNA vaccines.
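What it means for an mRNA sequence to "code for" a protein can be sketched in a few lines of Python: the sequence is read three letters (one codon) at a time, and each codon is looked up in the standard genetic code. The snippet below is an illustration only. The function name `translate` is hypothetical and the input is a made-up toy sequence, not the vaccine or Spike sequence.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA string codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy sequence (not real data): AUG GAU GUU AAC UGA
print(translate("AUGGAUGUUAACUGA"))  # MDVN
```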

Since then, computational biology has continued to help trace the spread and evolution of the virus across the world and to identify how mutations in the SARS-CoV-2 RNA genome might affect the efficacy of the vaccines. Under natural selection, a worrying variant of the virus with a substitution at the 614th amino acid of the Spike protein has emerged. It may represent a more transmissible form of SARS-CoV-2.
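As a rough illustration of how such a change can be spotted, the Python sketch below compares two aligned protein sequences position by position and reports each difference in the usual shorthand (reference amino acid, position, new amino acid). The substitution at position 614 is commonly written D614G. The function name `call_substitutions` is hypothetical, and the two short windows are toy sequences assumed to start at position 610, not real data.

```python
def call_substitutions(reference: str, variant: str, offset: int = 1) -> list[str]:
    """List substitutions (e.g. 'D614G') between two aligned, equal-length protein sequences.

    `offset` is the protein position of the first letter in both sequences.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, variant), start=offset)
        if ref != alt
    ]


# Toy windows (illustrative only), assumed to cover positions 610 to 618.
reference_window = "VLYQDVNCT"
variant_window = "VLYQGVNCT"
print(call_substitutions(reference_window, variant_window, offset=610))  # ['D614G']
```

Tools that track the virus in real time apply the same idea to thousands of full genomes at once.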

How do you think this could affect the vaccine’s efficacy?

The hard work of healthcare and front-line workers as well as scientists is making it possible to save thousands of lives. Here we highlighted how Genomics Data Science can contribute to changing the world!


Wu, F., Zhao, S., Yu, B., Chen, Y. M., Wang, W., Song, Z. G., … & Yuan, M. L. (2020). A new coronavirus associated with human respiratory disease in China. Nature, 579(7798), 265-269.

Korber, B., Fischer, W., Gnanakaran, S. G., Yoon, H., Theiler, J., Abfalterer, W., … & Partridge, D. G. (2020). Spike mutation pipeline reveals the emergence of a more transmissible form of SARS-CoV-2. bioRxiv.

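Hadfield, J., Megill, C., Bell, S. M., Huddleston, J., Potter, B., Callender, C., … & Neher, R. A. (2018). Nextstrain: real-time tracking of pathogen evolution. Bioinformatics, 34(23), 4121-4123.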


