
NIH Releases COVID-19 Data to Public Cloud; Ryan Layer Quoted

The National Institutes of Health will release COVID-19 genomic information to the cloud, giving researchers public access to the data, FedScoop reported Tuesday.

“The NCBI Coronavirus Genome Sequence Dataset makes over a decade of viral genome data publicly accessible for researchers, empowering anyone in the research community to participate in the pandemic response,” said Ryan Layer, assistant professor at the University of Colorado Boulder’s BioFrontiers Institute.

The Coronavirus Genome Sequence Dataset was created by the National Center for Biotechnology Information and compiles researcher-submitted data, including files in normalized Sequence Read Archive (SRA) format. NIH’s dataset will feature more than 13,000 SRA runs.

With the recent public release of this data, researchers with active NIH awards will be able to access the dataset at no cost through the Registry of Open Data on Amazon Web Services (AWS). 

NIH’s data will help researchers further study COVID-19 and other pandemic diseases, including differences in genetic sequences among infected patients, virus evolution, the role of genetics and how patients react to infection.

The dataset consists of two buckets: one containing raw and normalized files categorized by SRA accession code and another containing accession metadata that will soon be queryable within the Amazon Athena interactive query service.
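As a rough illustration of the layout described above, the following Python sketch builds the storage location for a single run's files from its SRA accession code. The bucket name and the "raw" and "normalized" folder names are hypothetical placeholders, since the report does not give the actual bucket names or key structure; only the run accession format (SRR, ERR or DRR followed by digits) reflects standard SRA practice.

```python
import re

# Hypothetical placeholder: the article does not name the actual bucket.
DATA_BUCKET = "example-sra-covid-data"

# SRA run accessions start with SRR, ERR or DRR, followed by digits.
RUN_ACCESSION = re.compile(r"^[SED]RR\d+$")

# Assumed folder names for the two file categories the article mentions.
FILE_KINDS = ("raw", "normalized")


def accession_prefix(accession, kind="normalized", bucket=DATA_BUCKET):
    """Return an S3-style URI prefix for one run's files.

    The <bucket>/<kind>/<accession>/ layout is an assumption made for
    illustration, not the dataset's documented structure.
    """
    if not RUN_ACCESSION.match(accession):
        raise ValueError(f"not an SRA run accession: {accession!r}")
    if kind not in FILE_KINDS:
        raise ValueError(f"kind must be one of {FILE_KINDS}")
    return f"s3://{bucket}/{kind}/{accession}/"


if __name__ == "__main__":
    # SRR0000001 is a made-up accession used only to show the call.
    print(accession_prefix("SRR0000001", kind="raw"))
```

A researcher would replace the placeholder bucket and folder names with the values published in the dataset's Registry of Open Data entry before passing the prefix to an S3 client.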

“Containing COVID-19 outbreaks and preparing for future pandemics will require a deep understanding of the SARS-CoV-2 genome in the context of other COVID-19 patients and the broader Coronaviridae family,” Layer added. 

The project is part of the NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) initiative. STRIDES is a collaboration between NIH and AWS to use the cloud to assist researchers with active NIH awards.
