James Casaletto
9:00am - 4:30pm
Why Predictive Analytics Needs Hadoop
Standard predictive analytics platforms need to catch up. As data grows bigger, faster, more varied, and more widely distributed, storing, transforming, and analyzing it with traditional tools no longer scales. Instead, today's best practice is to maintain, and even process, data in its distributed form rather than centralizing it. Apache Hadoop provides a powerful platform and a mature ecosystem for both managing and analyzing distributed data.
Predictive analytics projects can and must accommodate these challenges, i.e., the classic "3 V's" of big data (volume, variety, and velocity) as well as its distributed nature. In this hands-on workshop, leading Hadoop educator and technology leader James Casaletto will show you how to:
- Predict with Hadoop. Create predictive models over enterprise-scale big data using the modeling libraries built into the standard, open-source Hadoop ecosystem.
- Model both batch and streaming data. Implement predictive modeling using both batch and streaming data and gain insights in near real time.
- Model in a distributed fashion. Adapt predictive modeling projects to the distributed nature of data in order to benefit from parallel computation, while avoiding the often unnecessary and prohibitively inefficient step of merging and centralizing widely distributed data sources.
- Do it yourself. Gain the power to extract signals from big data on your own, without relying on data engineers and Hadoop specialists for each and every request.
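To make the distributed-modeling idea above concrete, here is a minimal sketch in plain Python (not Hadoop or Spark API code, which the workshop itself covers): each partition of the data is summarized independently, and the partial summaries combine exactly into a global model. This map-then-aggregate pattern is what Hadoop parallelizes across nodes.

```python
# Conceptual sketch (plain Python, no Hadoop APIs): fit a least-squares
# line y = a*x + b over distributed partitions by combining per-partition
# sufficient statistics -- the map-then-aggregate pattern that
# MapReduce/Spark parallelize across a cluster.

def partial_stats(partition):
    """Map step: each node summarizes only its local (x, y) records."""
    n = len(partition)
    sx = sum(x for x, _ in partition)
    sy = sum(y for _, y in partition)
    sxx = sum(x * x for x, _ in partition)
    sxy = sum(x * y for x, y in partition)
    return (n, sx, sy, sxx, sxy)

def merge(a, b):
    """Reduce step: sufficient statistics combine by simple addition."""
    return tuple(u + v for u, v in zip(a, b))

def solve(stats):
    """Recover slope and intercept from the merged statistics."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

# Two "partitions" of points lying on the line y = 2x + 1
part1 = [(0, 1), (1, 3), (2, 5)]
part2 = [(3, 7), (4, 9)]

stats = merge(partial_stats(part1), partial_stats(part2))
slope, intercept = solve(stats)
print(slope, intercept)  # 2.0 1.0
```

Because the statistics are associative, no node ever needs to see another node's raw data, and the combined fit is identical to what a centralized computation would produce.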
This training program answers these questions:
- What are the particular challenges of big data for predictive analytics?
- When does Hadoop provide the greatest value?
- How can streaming data be processed in Hadoop?
- How does one build predictive models with Apache Spark?
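On the streaming question above, the core idea is that a model's state can be updated record-by-record as data arrives, rather than refit over a full batch. The following is a hedged, plain-Python sketch of that incremental pattern (it does not use Spark Streaming APIs, which the workshop demonstrates on a real cluster):

```python
# Conceptual sketch (plain Python, no Spark Streaming APIs): a streaming
# model keeps running sufficient statistics and can serve predictions
# after every arriving record instead of waiting for a full batch.

class StreamingLineFit:
    """Incrementally fits y = a*x + b as (x, y) records arrive."""

    def __init__(self):
        self.n = self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, x, y):
        # O(1) work per record: fold the new point into the statistics.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def predict(self, x):
        denom = self.n * self.sxx - self.sx * self.sx
        if denom == 0:  # not enough distinct x values seen yet
            return self.sy / self.n if self.n else 0.0
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope * x + intercept

model = StreamingLineFit()
for record in [(0, 1), (1, 3), (2, 5), (3, 7)]:  # simulated stream
    model.update(*record)

print(model.predict(10))  # 21.0
```

The same aggregation used for batch fitting works one record at a time, which is why batch and streaming modeling can share a single code path in frameworks like Spark.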
Hands-on lab (afternoon session):
- Access to an enterprise-scale Hadoop cluster running in the cloud
- Access to real data sets, working code, and hands-on exercises
- Option to install a pre-configured, 1-node Hadoop cluster on your laptop
James is a Principal Solutions Architect for MapR, where he develops and deploys big data solutions with Apache Hadoop. He has strong technical experience as both a leader and an individual contributor in a wide variety of internal and customer-facing environments, including startups, academia, government, and large corporations, and his skill set spans solution requirements through product evangelism. James was previously Director of Field Engineering for Nebula, a company that enables businesses to deploy large private cloud computing infrastructures. He also served as a Technical Marketing Engineer for Cisco Systems and as a technical consultant for companies including Symantec, ExitCertified, and Hybridmedia. James holds a BS in Mathematics from the University of Illinois at Urbana-Champaign, an MS in Computer Science from California State University, and a Certificate in Bioinformatics from the University of California.