Get Real with Hadoop: Enterprise-grade Security

To get real with Hadoop, you need a real enterprise-ready platform. Join us as we begin the countdown of the top 10 reasons our customers choose MapR.

Security has always been important to enterprise customers, but today it’s non-negotiable. A recent breach at a major national retailer affected about 56 million households. To put the magnitude of this breach into perspective, compare that number to the recent census figure of 115 million US households. If your information has not been compromised yet, it is probably only a matter of time.

Protecting information, monitoring for threats, alerting affected parties, implementing mitigations to contain the risk, and upgrading to more robust security systems and processes are now an ongoing imperative, and not simply a project with a start and end date. Customers demand this from all enterprises they do business with.

Hadoop is a hacker honeypot
Hadoop is part of these enterprise environments, and therefore it does not get a special pass. In fact, one could argue that Hadoop should be held to a higher standard than any other data store in the history of enterprise systems. Traditional enterprises had their data scattered across hundreds, if not thousands, of systems, so it was cumbersome for hackers to get access to every bit of information. Hadoop, however, as a data lake/reservoir/hub (pick your favorite), is now a honeypot holding data from multiple departments.

Contrast this with the organic nature of the open source process, where security is almost always added after a project is successful. Hadoop, Spark, and Storm are just a few examples of projects where security was added only once broad enterprise use was achieved.

In addition, even when security options are available, they are rarely implemented. The most commonly cited reason is that strong security interferes with, and slows down, the business because it is complex, cumbersome, and intrusive. Another common reason is that turning existing security features on can be a significant project in itself.

Get started today with the MapR security solution
The basics of security remain the same: authentication, authorization, encryption, and auditing.

MapR supports the same Kerberos-based security mechanisms available with Apache Hadoop. In addition, MapR differentiates its security offering by focusing on ease of use and enterprise integration. Key highlights include:

  • Security with or without the use of Kerberos
  • Standard Linux account integration
  • Performance
  • Ease of use
  • Broad ecosystem support

Only MapR provides both standard Hadoop Kerberos authentication, for enterprises that have already adopted and are comfortable with Kerberos, and a simpler, yet secure, built-in authentication scheme that does not require installing and maintaining a Kerberos system. The system can authenticate against any registry that supports Pluggable Authentication Modules (PAM), and it gets user information through standard UNIX APIs that are NSSwitch-controlled. This essentially means that if your security system works with Linux authentication, it will almost always work with MapR, so you can enable trusted authentication for Hadoop in a frictionless manner.
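To make the "UNIX APIs which are NSSwitch-controlled" point concrete, here is a minimal Python sketch of the kind of account lookup involved. It assumes a glibc-based Linux host, where the standard `pwd` and `grp` modules call the C library account functions whose data sources are chosen in /etc/nsswitch.conf. The `lookup_account` helper is a hypothetical name used for illustration only; it is not part of MapR.

```python
import grp
import pwd


def lookup_account(username):
    """Resolve a user through the standard UNIX account APIs.

    On glibc-based Linux, pwd.getpwnam() and grp.getgrall() go through
    the C library, so the answer can come from local files, LDAP, or any
    other source configured in /etc/nsswitch.conf.
    Returns None when the name is unknown to every configured source.
    """
    try:
        entry = pwd.getpwnam(username)
    except KeyError:
        return None
    # Supplementary groups that explicitly list the user as a member.
    groups = sorted(g.gr_name for g in grp.getgrall() if username in g.gr_mem)
    return {
        "name": entry.pw_name,
        "uid": entry.pw_uid,
        "gid": entry.pw_gid,
        "groups": groups,
    }
```

If `lookup_account("some_user")` returns a record on a node, the operating system can already resolve that user, which is the precondition the paragraph above describes.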

MapR supports Access Control Lists (ACLs) for regulating user privileges on the job queue and the cluster. MapR extends the ACL concept to cover volumes, a logical storage construct that makes managing billions of files easier. For controlling access to tables, MapR also provides fine-grained Access Control Expressions (ACEs). An ACE is a list of logical statements that intersect to define a set of users and the actions those users are authorized to perform. Standard UNIX file system permissions are also supported.
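The idea of an expression that combines user and group conditions can be illustrated with a toy evaluator. This is a sketch only, not MapR's implementation: the `u:` and `g:` prefixes and the `&`, `|`, `!` operators are modeled on ACE notation as an assumption, and `is_allowed` is a hypothetical helper. Check the MapR Security Guide for the real grammar.

```python
import re

# A token is an operator, a parenthesis, or a u:<user> / g:<group> term.
TOKEN = re.compile(r"\s*([()&|!]|[ug]:[\w.-]+)")


def tokenize(expr):
    tokens, pos = [], 0
    expr = expr.strip()
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"unexpected input at position {pos}: {expr[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_allowed(expr, user, groups):
    """Evaluate an expression for one user; precedence is ! then & then |."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def parse_or():
        value = parse_and()
        while peek() == "|":
            advance()
            rhs = parse_and()  # always parse the right side, then combine
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&":
            advance()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "!":
            advance()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = advance()
        if tok == "(":
            value = parse_or()
            if advance() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        if tok is None or tok in ("&", "|", ")"):
            raise ValueError(f"unexpected token: {tok!r}")
        kind, name = tok.split(":", 1)
        return name == user if kind == "u" else name in groups

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]!r}")
    return result
```

For example, `is_allowed("g:analysts & !u:mallory", "bob", {"analysts"})` grants access to any analyst except the one named user, which is the kind of intersection of statements the paragraph describes.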

MapR uses several technologies to protect network traffic:

  • The Secure Sockets Layer/Transport Layer Security (SSL/TLS) protocol secures several channels of HTTP traffic.
  • In compliance with the NIST standard, the Advanced Encryption Standard in Galois/Counter Mode (AES/GCM) secures several communication channels between cluster components.
  • Kerberos encryption secures several communication paths elsewhere in the cluster.
  • Nodes with CPUs that support AES encryption at the hardware level will provide superior performance on encryption tasks.
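One way an operator might check whether a node's CPU advertises hardware AES support is to look for the `aes` capability in /proc/cpuinfo. The sketch below assumes a Linux host, where x86 kernels list capabilities on a "flags" line and ARM kernels on a "Features" line. The function names are hypothetical, and the source does not say how MapR itself detects this.

```python
def cpu_has_aes(cpuinfo_text):
    """Return True if any flags/Features line lists the 'aes' capability."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("flags", "features"):
            # Match whole words only, so unrelated flags never count.
            if "aes" in value.split():
                return True
    return False


def node_has_aes(path="/proc/cpuinfo"):
    """Check the local node; report False if the file cannot be read."""
    try:
        with open(path, encoding="utf-8") as handle:
            return cpu_has_aes(handle.read())
    except OSError:
        return False
```

Running `node_has_aes()` on each node gives a quick inventory of which machines are likely to benefit from hardware-accelerated encryption.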

Hadoop supports several auditing capabilities that should be utilized. For instance, every job submission creates a record in the job tracker log. The MapR Control System logs all administrative actions, and maprcli logs all commands issued. In addition, logs can be shipped to an external system for longer-term retention.

Many, if not all, of the practices used in other systems can also be used for Hadoop. For instance, firewalls can be configured to monitor and limit access to the cluster. OS shell logging can be used to capture issued commands. Direct access to Hadoop clusters can be blocked, and enterprises can require that all users go through custom frameworks that enforce traceability.

Don’t go it alone
Several security outfits are now offering viable solutions for Hadoop, and you should consider the offerings of these security partners to complement MapR capabilities and establish a robust, secure Hadoop deployment.

The future is here already
With MapR, you can easily build a secure cluster and implement trusted authentication using built-in security features. In addition, you can encrypt wireline traffic from the client to the cluster, within the cluster, and across clusters. You can also enforce fine-grained access controls on MapR-DB with MapR Access Control Expressions (ACEs), which work for all types of execution engines, not just MapReduce.

Are you ready to implement security on your clusters? Please visit the MapR Security Guide to get started.

And get the complete top 10 list here.

