Phylogenetic network analysis of SARS-CoV-2 genomes-广东盛普生命科技有限公司

首页 ꄲ 盛普前沿 ꄲ COVID-19 ꄲ Phylogenetic network analysis of SARS-CoV-2 genomes

Phylogenetic network analysis of SARS-CoV-2 genomes

Peter Forster, Lucy Forster, Colin Renfrew, and Michael Forster

PNAS first published April 8, 2020 https://doi.org/10.1073/pnas.2004999117

Contributed by Colin Renfrew, March 30, 2020 (sent for review March 17, 2020; reviewed by Toomas Kivisild and Carol Stocking)

Significance

This is a phylogenetic network of SARS-CoV-2 genomes sampled from across the world. These genomes are closely related and under evolutionary selection in their human hosts, sometimes with parallel evolution events, that is, the same virus mutation emerges in two different human hosts. This makes character-based phylogenetic networks the method of choice for reconstructing their evolutionary paths and their ancestral genome in the human host. The network method has been used in around 10,000 phylogenetic studies of diverse organisms, and is mostly known for reconstructing the prehistoric population movements of humans and for ecological studies, but is less commonly employed in the field of virology.

Abstract

In a phylogenetic network analysis of 160 complete human severe acute respiratory syndrome coronavirus 2 (SARS-Cov-2) genomes, we find three central variants distinguished by amino acid changes, which we have named A, B, and C, with A being the ancestral type according to the bat outgroup coronavirus. The A and C types are found in significant proportions outside East Asia, that is, in Europeans and Americans. In contrast, the B type is the most common type in East Asia, and its ancestral genome appears not to have spread outside East Asia without first mutating into derived B types, pointing to founder effects or immunological or environmental resistance against this type outside Asia. The network faithfully traces routes of infections for documented coronavirus disease 2019 (COVID-19) cases, indicating that phylogenetic networks can likewise be successfully used to help trace undocumented COVID-19 infection sources, which can then be quarantined to prevent recurrent spread of the disease worldwide.

The search for human origins seemed to take a step forward with the publication of the global human mitochondrial DNA tree (1). It soon turned out, however, that the tree-building method did not facilitate an unambiguous interpretation of the data. This motivated the development, in the early 1990s, of phylogenetic network methods which are capable of enabling the visualization of a multitude of optimal trees (2, 3). This network approach, based on mitochondrial and Y chromosomal data, allowed us to reconstruct the prehistoric population movements which colonized the planet (4, 5). The phylogenetic network approach from 2003 onward then found application in the reconstruction of language prehistory (6). It is now timely to apply the phylogenetic network approach to virological data to explore how this method can contribute to an understanding of coronavirus evolution.

In early March 2020, the GISAID database (https://www.gisaid.org/) contained a compilation of 253 severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) complete and partial genomes contributed by clinicians and researchers from across the world since December 2019. To understand the evolution of this virus within humans, and to assist in tracing infection pathways and designing preventive strategies, we here present a phylogenetic network of 160 largely complete SARS-Cov-2 genomes (Fig. 1).