Finding the Right People for Your Hadoop Initiative

This article was originally published by Igor Izotov on his LinkedIn profile.

There are resources and there are Resources. The old statement is still valid: your project's success is entirely dependent on the people you hire. Hadoop initiatives are no exception; if anything, they are particularly demanding.

Here are a few reasons why:

  1. Hadoop is an emerging technology. There is, and will continue to be, much hype around it, and the temptation to try it no matter what comes from specialists and from companies large and small. With Hadoop, emerging is synonymous with evolving; funnily enough, the book on Hadoop just released by O'Reilly and Pentaho (freely available to download) will be outdated in a few months' time. The team members you need must be the kind who are fond of learning, able to stay on top of Hadoop's evolution, choose which innovations to adopt and, at the end of the day, deliver well.
  2. Hadoop looks like low-hanging fruit, and managers are tempted to appoint internal resources to learn Hadoop 'in a couple of days'. Numerous tutorials and how-tos are available on the Internet, developed by each of the Hadoop vendors. The caveat is the monotony of these samples: counting words, scraping Twitter, calculating Pi, teragen and terasort, ingesting log files. These are all good basic technical examples, but they only scratch the surface of how Hadoop, as a cohesive framework, can bring business value. The team needs a blend of internal Hadoop-green resources and practitioners who have delivered commercial or near-commercial implementations.

A number of POCs have sprouted up over the last few years, fully satisfying technical resources' thirst for new technology; only a few made it beyond the technical POC stage and became properly plugged into the enterprise ecosystem, delivering business value.

Why is the team topic so important?

Hadoop is a framework, a suite of tools that demands a multitude of skills from your team. Let us define four Tiers, focussing on roles, not team members.

Hadoop Resources and Roles

Now, this is a lengthy list, isn't it? Depending on your needs, your implementation will most likely require additional roles, or not require some of those listed, say, Natural Language Processing specialists. You might be lucky enough to find technicians who can cover several roles; in fact, this would be ideal. Blending the management tier roles with the technical ones, however, is not recommended. Although tempting, doing so will increase the chances of your implementation turning into a technical geek-fest with very little focus on the business problem.

Why the Tiers? As with every implementation, success comes in steps and the Hadoop elephant is best consumed one slice at a time. The foundation platform can be delivered by Tier 1 resources, followed by more advanced capabilities leveraging Tier 2 and Tier 3 resources. Needless to say, even the tiniest of implementations needs to involve the business-focussed Leadership Tier resources.

By way of conclusion, it is vitally important to have the right blend of roles and the right business justification for the activity. This applies to virtually every IT initiative and Hadoop is no exception.

The key responsibilities of each role will be covered in the coming posts.
