Anyone collecting data needs a place to put it. Harvard geneticist George Church felt that need acutely in the early days of his Personal Genome Project: It was the early 2000s, and he had the audacious goal of sequencing some 100,000 human genomes, each 25,000 times the size of a traditional electronic record. But though his vision was ripe, the infrastructure to store and manipulate these titanic data sets wasn’t.
Church commissioned Alexander Wait Zaranek, a computer science researcher in his lab, to scope out the tools available to work through such large data sets. When none were available, Zaranek and his Church lab colleagues Ward Vandewege and Tom Clegg began building their own. And so, Arvados was born.
Arvados is a content management system for large, bulky genomic data sets. Just as blogging platforms like WordPress let journalists and writers upload their data (text, videos, images) and work with it, so Arvados lets researchers and clinicians import genetic data files. Within the system, they can run a variety of analyses or share the data itself.
The first generation of Arvados was activated in 2007 to service the Personal Genome Project. By 2013, its founders had spun the project off into a company, Curoverse, which in December 2013 announced $1.7 million in seed funding to develop its software.
In the 10 years since the Personal Genome Project was conceived, the effort to use genetic data to inform medicine has exploded internationally. In the next year, researchers are expected to generate 85 petabytes of sequencing data from research subjects and patients. “That translates to about 21 million HD movies,” Curoverse chief executive Adam Berrey said.
With its Arvados software, Curoverse hopes to be the invisible infrastructure powering such analyses in research labs and clinics over the next decade.
So far, the system has been accessible by invitation only: Johns Hopkins University, Harvard Medical School, and the Wellcome Trust Sanger Institute (which is storing 20 petabytes of data) are among the early adopters. Starting Tuesday, any group can sign up to use the system, which can be accessed through a website. Curoverse also sells the system on hardware that can be stored on-site and installed for a fee. The company is preparing for a commercial release this summer.
Article source: http://www.betaboston.com/news/2015/04/14/a-wordpress-for-genetic-data-curoverse-opens-in-beta-to-researchers/