Fred Hutch virologist Dr. Trevor Bedford and Max Planck physicist and computational biologist Dr. Richard Neher designed a prototype called nextstrain to analyze and track genetic mutations during the Ebola and Zika outbreaks, but they envision it as adaptable for any virus.
After three rounds of competition — one of which involved a public vote — the software tool has won the first-ever international Open Science Prize.
Using the platform Bedford and Neher built, anyone can download the source code from the public code-sharing site GitHub, run genetic sequencing data for the outbreak they are following through the pipeline, and build a web page showing a phylogenetic tree, or genetic history of the outbreak, in a few minutes, Bedford said.
The prize competition was sponsored by the U.S. National Institutes of Health, the British-based charitable foundation Wellcome Trust and the U.S.-based Howard Hughes Medical Institute.
“Everyone is doing sequencing, but most people aren’t able to analyze their sequences as well or as quickly as they might want to,” Bedford said. “We’re trying to fill in this gap so that the World Health Organization or the U.S. Centers for Disease Control and Prevention — or whoever — can have better analysis tools to do what they do. We’re hoping that will get our software in the hands of a lot of people.”
For now, the tool is easy to use for Zika and Ebola. (The researchers also built a separate platform called nextflu for influenza.) But adapting the platform for other pathogens still involves a fair amount of work and technical skill, so Bedford is working with a web developer to “get that bar down so it will be easier to have this built out for other things.”
The $230,000 prize money will go towards furthering this effort to adapt the system for additional pathogens.
Nextstrain “is an exemplar of open science and will have a great impact on public health by tracking viral pathogens,” said Robert Kiley, who leads Wellcome’s work on open research, in a statement. All of the Open Science Prize entrants “demonstrated what’s possible when data and code are made open for all,” he said.
The seed for nextstrain sprouted while Bedford was doing postdoctoral research at the University of Michigan. He had published a paper on flu migration using data up to 2010. He found himself thinking what a pity it was that the analysis couldn’t be updated as new data came out. But the fact that a paper had already been published was a disincentive for anyone to write a new paper with just a small update to the data.
From that frustration, nextflu was born. And nextflu led to nextstrain.
The devastating 2013-2016 Ebola epidemic in West Africa lent the project new urgency. Relatively early in the outbreak, researchers sequenced Ebola genomes from patients and immediately uploaded them to the public database GenBank, leading to a surge of collaboration from experts in diverse fields. The collection of shared, publicly available data helped answer critically important questions as the epidemic was unfolding. It added to the confirmation that the outbreak was being sustained by human-to-human contact, not contact with bats or other animal carriers; suggested probable transmission routes; and revealed where and how fast mutations in the virus were occurring, all information crucial to both public health and medical interventions.
Even when data is shared, speed is everything in responding to outbreaks, so any tool that speeds data analysis contributes to the effort.
But despite the precedent set by the response to the Ebola epidemic, fewer researchers have shared Zika virus genome sequences from the more recent crisis in Brazil, Central America and the Caribbean, the researchers said.
“I’m not seeing the same thing with Zika,” said Dr. Gytis Dudas, a postdoctoral fellow in Bedford’s laboratory who worked on many of the Ebola analyses. In part, Dudas said, that is because the Zika virus is more difficult to sequence than Ebola, making researchers more likely to guard their rare sequences for publications.
And that, Bedford said, is “a tragedy,” even as he understands that academic careers depend on publishing.
“The idea is that this nextstrain platform would provide some neutral ground with which to share data,” said Bedford. “We’re not trying to make a flashy paper. We just want [the data] to be on the website so people can look at the latest thing and do analyses that aren’t stymied by publication practices. This kind of simple sequence sharing during outbreaks is something that if you could just push the [scientific] community a little bit, you could have some real-world impact in helping respond to epidemics.”