Strata+Hadoop World NYC 2015
New York, NY
Tuesday, September 29, 2015 to Thursday, October 1, 2015
MapR is proud to be an Elite Sponsor of Strata + Hadoop World, the defining event of the big data community, which brings together the best minds in strategy, science, and industry to explore innovations in data-driven businesses and Hadoop use cases.

Talks

Streaming in the Extreme

Jim Scott

September 30, 2015 2:55pm–3:35pm

Have you ever heard of Kafka? Are you ready to start streaming all the events in your business? What happens to your streaming solution when you outgrow your single data center? What happens when you are at a company that is already running multiple data centers and you need to implement streaming across data centers? What about when you need to scale to a trillion events per day?

In this session, Jim will discuss technologies like Kafka that can be used to accomplish real-time, lossless messaging that works in both single and multiple globally dispersed data centers. He will also describe how to handle the data coming in through these streams in batch processes as well as real-time processes.

Finally, he will discuss why a streaming-only implementation can deliver a better experience with equivalent results to the Lambda Architecture.

Fixing Chicago’s Crime Data

Mike Emerick and Jayson Margalus

September 30, 2015 4:35pm–5:15pm

Open government is an incredibly popular topic today. From the appointment to office of the nation’s first chief data scientist, to cities like New York, San Francisco, and Chicago signing executive orders to open up city data to the public, more government data is available to us than ever before. Because of that, one would be right to think we’re in an era of unprecedented transparency.

Yet in late February 2015, investigative journalists revealed that the Chicago Police Department had been operating “black sites” around the city: essentially, places where Americans were detained and then disappeared off the record. None of this data made it onto the city’s open data networks. This egregious, illegal behavior was uncovered not through open data, but through traditional journalistic methods.

This story, and others like it, fundamentally call into question the data integrity of open government initiatives. How can we still use this information to derive insights into our government? What can we do to identify omissions in the data? And how can we improve the integrity of open government data through traditional data analysis?

Real-World NoSQL Schema Design

Ted Dunning

September 30, 2015 5:25pm–6:05pm

There are lots of claims about the benefits of NoSQL databases, but few realistic demonstrations of the impact that such a database can have on anything more than toy-sized data. In this talk, Ted will deconstruct a real-world database schema into the corresponding NoSQL design.

The database that he will use is the MusicBrainz database, which exhibits many important idioms found in real databases, such as factoring relations into multiple tables to implement column families, linkage tables, and many-to-one relationships. The transformations that he will highlight show how almost all of the auxiliary tables in the original design are reduced to a format that is much simpler to understand: nested data structures. As a result, the number of tables drops by nearly 5x and the ease of understanding the design increases by a similar degree.
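As a rough illustration of the idea (not the schema or code from the talk), the Python sketch below folds a many-to-one table and a linkage table into one nested record per artist. Every table name, field name, and row here is hypothetical and far simpler than the real MusicBrainz schema.

```python
from collections import defaultdict

# Hypothetical, heavily simplified rows; NOT the actual MusicBrainz schema.
artists = [
    {"artist_id": 1, "name": "Artist A"},
    {"artist_id": 2, "name": "Artist B"},
]
aliases = [  # many-to-one: several alias rows point at one artist
    {"artist_id": 1, "alias": "A."},
    {"artist_id": 1, "alias": "The A"},
]
tags = [
    {"tag_id": 10, "label": "jazz"},
    {"tag_id": 11, "label": "live"},
]
artist_tags = [  # linkage table joining artists to tags
    {"artist_id": 1, "tag_id": 10},
    {"artist_id": 2, "tag_id": 10},
    {"artist_id": 2, "tag_id": 11},
]


def nest_artists(artists, aliases, tags, artist_tags):
    """Fold the auxiliary tables into one nested document per artist."""
    tag_label = {t["tag_id"]: t["label"] for t in tags}

    # Group alias rows under the artist they belong to.
    aliases_by_artist = defaultdict(list)
    for row in aliases:
        aliases_by_artist[row["artist_id"]].append(row["alias"])

    # Resolve the linkage table into a list of tag labels per artist.
    tags_by_artist = defaultdict(list)
    for row in artist_tags:
        tags_by_artist[row["artist_id"]].append(tag_label[row["tag_id"]])

    return [
        {
            "artist_id": a["artist_id"],
            "name": a["name"],
            "aliases": aliases_by_artist.get(a["artist_id"], []),
            "tags": tags_by_artist.get(a["artist_id"], []),
        }
        for a in artists
    ]


documents = nest_artists(artists, aliases, tags, artist_tags)
```

In this toy case four tables collapse into a single collection of nested documents, which is the kind of reduction the abstract describes at a much larger scale.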

In spite of such radical structural changes, the resulting denormalized and nested data can still be queried with SQL by using Apache Drill, and the queries are often noticeably simpler than the queries used against the original data structures. The methods presented in this talk are practical and easy to apply, and can sometimes even be largely automated.

Ted will also show how a percolator pattern can be used to allow the resulting NoSQL database to be automatically maintained in multiple NoSQL technologies simultaneously, so that full text search, recommendations, and the HBase API can all be used to access the same data.

The Big Data Dividend (Keynote)

Jack Norris

October 1, 2015 at 9:10am

Companies that are getting the biggest results from their data projects today are doing more than deploying data hubs and running data analysis; they are incorporating big data directly into their business operations. The big data dividend refers to the ongoing, significant profits that are derived from these data-driven applications. This session will include examples of applications by leading companies and provide insights into how developers and organizations can realize big data dividends from a new class of scalable applications with continuous analytics.

Speakers

Jim Scott

Jim drives enterprise architecture and strategy at MapR. He is the cofounder of the Chicago Hadoop Users Group and, in that role, has helped build the Hadoop community in Chicago over the past four years. He has implemented Hadoop at three different companies, supporting a variety of enterprise use cases, from managing Points of Interest for mapping applications, to Online Transactional Processing in advertising, to full data center monitoring and general data processing. Prior to MapR, Jim was SVP of Information Technology and Operations at SPINS, the leading provider of retail consumer insights, analytics reporting and consulting services for the Natural, Organic and Specialty Products industry. Additionally, he served as Lead Engineer/Architect for dotomi, one of the world’s largest and most diversified digital marketing companies. Prior to dotomi, Jim held several architect positions with companies such as aircell, NAVTEQ, Classified Ventures, Innobean, Imagitek, and Dow Chemical, where his work with high-throughput computing was a precursor to more standardized big data concepts like Hadoop.

Mike Emerick and Jayson Margalus

Mike Emerick is a former IBMer who co-founded the IBM Healthcare Transformation Lab, a center for healthcare integration and analytics. His work includes genomic research on HIV and gene ontology for food plants in agriculture. Mike founded a nonprofit to promote urban agriculture and the data science behind food production. He has worked with NGOs, heads of state, and presidential cabinet members, helping them learn about and understand next-generation agriculture and food supplies. He is now working with the Electronic Frontier Foundation to develop new licenses for providing data and keeping it open and free. He is active in the Chicago hacker/maker movement and logs many of his creative hours at Workshop 88 in the Chicago suburbs.

Jayson Margalus is a results-oriented software engineer skilled in big data visualizations, game design and web software. As a Demo Engineer for MapR, Jayson is responsible for building interactive data visualizations, exhibits, and data art that demonstrate the power of MapR. Prior to that role, Jayson was an Adjunct Faculty member at DePaul University, where he co-created and taught “Business for Indie Developers.” While at DePaul, Jayson also taught game design, game history, and game programming. Jayson was also the co-owner of Lunar Giant Studios, an independent game studio, and a founder of Polymath Workshop, a web software company. Earlier in his career, Jayson co-founded and chaired the non-profit IGDA Chicago (International Game Developers Association). Jayson holds a BA in Political Science, History and History of Ideas from North Central College in Chicago.

Ted Dunning

Ted Dunning is Chief Application Architect at MapR Technologies and a committer and PMC member of the Apache Mahout, Apache ZooKeeper, and Apache Drill projects. Ted has been very active in mentoring new Apache projects and is currently serving as vice president of incubation for the Apache Software Foundation. Ted was the chief architect behind the MusicMatch (now Yahoo Music) and Veoh recommendation systems. He built fraud detection systems for ID Analytics (later purchased by LifeLock) and he has 24 patents issued to date and a dozen pending. Ted has a PhD in computing science from the University of Sheffield. When he’s not doing data science, he plays guitar and mandolin. He also bought the beer at the first Hadoop user group meeting.

Jack Norris

Jack drives understanding and adoption of new applications enabled by data convergence. With over 20 years of enterprise software marketing experience, he has demonstrated success from defining new markets for small companies to increasing sales of new products for large public companies. Jack’s broad experience includes launching and establishing analytic, virtualization, and storage companies and leading marketing and business development for an early-stage cloud storage software provider.