Cluster Auditing in a Good Way - Whiteboard Walkthrough

Editor's note: In this week's Whiteboard Walkthrough, Dale Kim, Sr. Director of Industry Solutions at MapR, discusses three examples of how the auditing capabilities in the MapR Converged Data Platform are beneficial for your big data environment.

Hello. My name is Dale Kim of MapR Technologies, and welcome to my Whiteboard Walkthrough. In this episode, I want to talk about the comprehensive auditing capabilities in the MapR Converged Data Platform. When I talk about auditing, what I'm referring to is logging and tracking all accesses in the platform for the purposes of analytics.

What MapR covers in terms of auditing starts with all data accesses, so all reads, writes, and updates will be tracked if you enable auditing. Also administrative operations, so any of the work that your administrators do can be tracked. Then there are authentication requests, so whether they succeed or fail, that information is available.

Some of the use cases I want to talk about that take advantage of auditing capabilities include the notion of multi-temperature data. Multi-temperature data necessarily requires you to understand what represents high-value data and what represents low-value data, and the auditing capabilities help you identify the more frequently accessed data, which represents the high-value data versus the less frequently accessed data.
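To make the hot-versus-cold idea concrete, here is a minimal sketch in Python that counts accesses per path and splits the paths by a threshold. The record fields (`user`, `operation`, `path`), the sample data, and the `classify_by_access` helper are all hypothetical simplifications for illustration; they are not the actual MapR audit log schema, which you would need to parse and map into this shape first.

```python
from collections import Counter

# Hypothetical, simplified audit records. Real audit logs have their own
# schema; these field names are assumptions made for this sketch.
SAMPLE_RECORDS = [
    {"user": "alice", "operation": "READ", "path": "/sales/2016/q1.csv"},
    {"user": "bob", "operation": "READ", "path": "/sales/2016/q1.csv"},
    {"user": "carol", "operation": "WRITE", "path": "/sales/2016/q1.csv"},
    {"user": "alice", "operation": "READ", "path": "/archive/2009/q3.csv"},
]


def classify_by_access(records, hot_threshold):
    """Split audited paths into hot and cold lists by access count.

    A path is "hot" when it appears in at least `hot_threshold` records.
    """
    counts = Counter(record["path"] for record in records)
    hot = sorted(p for p, n in counts.items() if n >= hot_threshold)
    cold = sorted(p for p, n in counts.items() if n < hot_threshold)
    return hot, cold


hot_paths, cold_paths = classify_by_access(SAMPLE_RECORDS, hot_threshold=3)
print("hot:", hot_paths)
print("cold:", cold_paths)
```

Note that data that is never accessed does not show up in an access log at all, so a real analysis would also compare the audited paths against a full listing of the datasets in the cluster.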

You can use the insights you gather from auditing in conjunction with a construct in MapR known as volumes. Volumes are unique to MapR and are essentially a logical partitioning of datasets within your cluster. Using volumes along with data placement control allows you to specify where datasets reside within your cluster.

If you have a heterogeneous cluster with servers of different sizes and different disk densities, you can put your high-value data on your bigger machines with potentially faster disks, presumably solid-state drives, and deploy your less frequently used or archivable data to your smaller servers with a higher disk density per server. Of course, the higher-performance servers will be more expensive. That's where you put your hot data, and you'll get a lot of savings by allocating your archivable data to your smaller servers.

The second use case I want to talk about is this notion of threats. You might have your MapR cluster and a lot of MapR users reading and writing and accessing data, and you can tell these are MapR users because of their smiley faces. As a result of the analytics, you might see some anomalous behavior. For example, you might have a user making an unusually high number of data accesses after hours, and so you might assume that there's some bad behavior going on. You can easily see that and then talk to that user to identify what the issue is, but more likely that user's account was hacked, and a bad actor is actually accessing your data. You want to be able to stop that right away.
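As a rough illustration of the after-hours example, the following Python sketch counts each user's accesses outside business hours and flags anyone above a limit. The record fields (`user`, `timestamp`), the 8:00 to 18:00 business-hours window, the limit, and the helper names are all assumptions chosen for this sketch, not part of the MapR audit log format or any MapR tooling.

```python
from collections import Counter
from datetime import datetime

# Hypothetical, simplified audit records with ISO 8601 local timestamps.
SAMPLE_RECORDS = [
    {"user": "alice", "timestamp": "2016-05-10T10:05:00"},
    {"user": "alice", "timestamp": "2016-05-10T22:40:00"},
    {"user": "bob", "timestamp": "2016-05-10T23:15:00"},
    {"user": "bob", "timestamp": "2016-05-11T01:30:00"},
    {"user": "bob", "timestamp": "2016-05-11T02:45:00"},
]


def is_after_hours(ts, start_hour=8, end_hour=18):
    """Return True when the timestamp falls outside [start_hour, end_hour)."""
    return not (start_hour <= ts.hour < end_hour)


def flag_after_hours_users(records, max_after_hours):
    """Return users with more than `max_after_hours` after-hours accesses."""
    counts = Counter()
    for record in records:
        ts = datetime.fromisoformat(record["timestamp"])
        if is_after_hours(ts):
            counts[record["user"]] += 1
    return sorted(user for user, n in counts.items() if n > max_after_hours)


print(flag_after_hours_users(SAMPLE_RECORDS, max_after_hours=2))
```

A fixed limit is the simplest possible rule. In practice you would more likely compare each user against their own historical baseline, so that people who routinely work late are not flagged.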

A third use case with MapR auditing is about compliance. If you're in an industry that's highly regulated, you are probably very familiar with auditing related to regulations or with standards-based frameworks. Things like HIPAA, PCI, or FedRAMP are things you have to deal with.

Of course, you probably know that these compliance issues are not about the product, so you can't just go out and buy products that are compliant. It's more about the people and processes in combination with the technology, so it's the overall environment that is actually audited and achieves compliance. However, the technology can help you do that. With MapR auditing, you can comply with components of these frameworks by proving that you know how people are accessing your data and what data is being accessed, which simplifies the overall process of achieving compliance with these frameworks.

That's my quick summary of what MapR auditing can do for you in a big data environment. If you have any questions, or any ideas about topics you'd like to see in a Whiteboard Walkthrough, please feel free to comment below. Thanks for watching.


