Bio-IT World Conference 2016
Boston, MA
Tuesday, April 5 to Thursday, April 7, 2016
Bio-IT showcases the many IT and informatics applications and enabling technologies that drive biomedical research, drug discovery & development, and clinical and healthcare initiatives. Compelling talks, including best-practice case studies and joint partner presentations, will feature more than 260 industry and academic colleagues discussing big data, smart data, cloud computing, trends in IT infrastructure, omics technologies, high-performance computing, data analytics, open source, and precision medicine, from the research realm to the clinical arena.


Genome Analysis Pipelines, Big Data Style

Allen Day

Wednesday, April 6 at 12:40pm

Powerful new tools exist for processing large volumes of data quickly across a cluster of networked computers.

Typical bioinformatics workflow requirements are well matched to these tools' capabilities. However, tools such as Spark are not commonly used in bioinformatics, because many legacy bioinformatics applications make assumptions about their computing environment. These assumptions are a barrier to integrating those applications into more modern computing environments.

Fortunately, these barriers are quickly coming down. In this presentation, we'll examine a few operations common to many bioinformatics pipelines, show how they were usually implemented in the past, and show how they're being re-implemented right now to save time and money and to make new types of analysis possible. Some code examples will also be provided.


Allen Day

Allen is the Principal Data Scientist at MapR Technologies, where he leads interdisciplinary teams to deliver results in fast-paced, high-pressure environments across several industry verticals. Previously, Allen founded TinyTube Networks, which provided the first mobile video discovery and transcoding proxy service, and Ion Flux, which provided a medical-grade, cloud-based human genome sequencing service.

Allen has contributed to a wide variety of open source projects: R (CRAN, Bioconductor), Perl (CPAN, BioPerl), FFmpeg, Cascading, Apache HBase, Apache Storm, and Apache Mahout. Overall, his unique background combines deep technical expertise in data science with a pragmatic understanding of real-world problems. He also pursues interests in linguistics and economics, and, if it hadn’t been obvious, he performs magic.