Big Data Everywhere San Diego
San Diego, CA
Tuesday, April 12, 2016
Big Data Everywhere is a half-day conference focused on Hadoop, Spark, and other big data technologies that brings together users and developers to share their experiences via technical sessions and user success stories. Industry and technical experts will share their knowledge and best practices, and discuss use cases and business applications.

Talks

Streaming in the Extreme

Jim Scott

Have you ever heard of Kafka? Are you ready to start streaming all of the events in your business? What happens to your streaming solution when you outgrow your single data center? What happens when you are at a company that is already running multiple data centers and you need to implement streaming across them? What about when you need to scale to a trillion events per day? I will discuss technologies like Kafka that can be used to accomplish real-time, lossless messaging in both single and globally dispersed multi-data-center deployments. I will also describe how to handle the data coming in through these streams in both batch processes as well as real-time processes.
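A key property behind the lossless, ordered streaming the abstract describes is Kafka's keyed partitioning: all events with the same key land in the same partition, so their relative order is preserved. The following is a minimal, broker-free sketch of that idea; it uses SHA-1 from the standard library rather than Kafka's actual murmur2 partitioner, and the topic data is illustrative.

```python
# Sketch of Kafka-style keyed partitioning (no real broker involved).
# Kafka's default partitioner uses murmur2; sha1 is substituted here
# so the example needs only the standard library.
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Deterministically map an event key to a partition index."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


# Illustrative event stream: (key, value) pairs.
events = [("user-1", "click"), ("user-2", "view"), ("user-1", "purchase")]

partitions: dict[int, list[tuple[str, str]]] = {}
for key, value in events:
    partitions.setdefault(partition_for(key, 4), []).append((key, value))

# Because partitioning is deterministic per key, both "user-1" events
# end up in the same partition list, in their original order.
```

Per-key ordering is what lets downstream consumers, whether batch or real-time, process a key's events in sequence even as the overall stream is spread across many partitions and machines.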
Genome Analysis Pipelines, Big Data Style

Allen Day

Powerful new tools exist for processing large volumes of data quickly across a cluster of networked computers. Typical bioinformatics workflow requirements are well-matched to these tools' capabilities. However, tools such as Spark are not yet commonly used in the field, because many legacy bioinformatics applications make assumptions about their computing environment. These assumptions present a barrier to integrating such tools into more modern computing environments. Fortunately, these barriers are quickly coming down. In this presentation, we'll examine a few operations common to many bioinformatics pipelines, show how they were usually implemented in the past, and how they're being re-implemented right now to save time and money and to make new types of analysis possible. Some code examples will also be provided.
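As a flavor of the kind of per-read operation such pipelines perform, here is a toy sketch of GC-content filtering written with plain Python comprehensions. The same map-then-filter shape translates directly to Spark's RDD transformations (`map`/`filter`); the sequences and the 50% threshold are made up for illustration.

```python
# Toy sketch: a common per-read bioinformatics operation (GC-content
# filtering) expressed as map/filter steps, the same shape a Spark
# RDD pipeline would use. Sequences below are illustrative only.
def gc_content(seq: str) -> float:
    """Fraction of G/C bases in a DNA sequence."""
    return sum(base in "GC" for base in seq.upper()) / len(seq)


reads = ["ACGTGC", "ATATAT", "GGGCCC"]

# Keep reads whose GC content is at least 50%, a QC-style filter.
high_gc = [r for r in reads if gc_content(r) >= 0.5]
```

Because each read is processed independently, this kind of operation parallelizes trivially across a cluster, which is exactly the fit between bioinformatics workloads and tools like Spark that the abstract points to.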

Speakers

Jim Scott

Jim drives enterprise architecture and strategy at MapR. He is the cofounder of the Chicago Hadoop Users Group and has helped build the Hadoop community in Chicago for the past four years. He has implemented Hadoop at three different companies, supporting a variety of enterprise use cases, from managing Points of Interest for mapping applications, to Online Transactional Processing in advertising, to full data center monitoring and general data processing. Prior to MapR, Jim was SVP of Information Technology and Operations at SPINS, the leading provider of retail consumer insights, analytics reporting, and consulting services for the Natural, Organic and Specialty Products industry. Additionally, he served as Lead Engineer/Architect for dotomi, one of the world's largest and most diversified digital marketing companies. Prior to dotomi, Jim held several architect positions with companies such as aircell, NAVTEQ, Classified Ventures, Innobean, Imagitek, and Dow Chemical, where his work with high-throughput computing was a precursor to more standardized big data concepts like Hadoop.

Allen Day

Allen is the Principal Data Scientist at MapR Technologies, where he leads interdisciplinary teams to deliver results in fast-paced, high-pressure environments across several verticals in industry. Previously, Allen founded TinyTube Networks which provided the first mobile video discovery and transcoding proxy service, and Ion Flux which provided a medical-grade, cloud-based human genome sequencing service.

Allen has contributed to a wide variety of open source projects: R (CRAN, Bioconductor), Perl (CPAN, BioPerl), FFmpeg, Cascading, Apache HBase, Apache Storm, and Apache Mahout. Overall, his unique background combines deep technical expertise in data science with a pragmatic understanding of real-world problems. He also pursues interests in linguistics and economics, and — if it hadn’t been obvious — he performs magic.