Allen Day
Wednesday, April 6 at 12:40pm
Powerful new tools exist for processing large volumes of data quickly across a cluster of networked computers.
Typical bioinformatics workflow requirements are well matched to these tools' capabilities. However, tools such as Spark are not commonly used in bioinformatics, because many legacy bioinformatics applications make assumptions about their computing environment. These assumptions present a barrier to integrating these applications into more modern computing environments.
Fortunately, these barriers are quickly coming down. In this presentation, we'll examine a few operations common to many bioinformatics pipelines, show how they were usually implemented in the past, and show how they're being re-implemented right now to save time and money and to make new types of analysis possible. Some code examples will also be provided.
Allen is the Principal Data Scientist at MapR Technologies, where he leads interdisciplinary teams to deliver results in fast-paced, high-pressure environments across several industry verticals. Previously, Allen founded TinyTube Networks, which provided the first mobile video discovery and transcoding proxy service, and Ion Flux, which provided a medical-grade, cloud-based human genome sequencing service.
Allen has contributed to a wide variety of open source projects: R (CRAN, Bioconductor), Perl (CPAN, BioPerl), FFmpeg, Cascading, Apache HBase, Apache Storm, and Apache Mahout. Overall, his unique background combines deep technical expertise in data science with a pragmatic understanding of real-world problems. He also pursues interests in linguistics and economics and, if it hadn’t been obvious, he performs magic.