Securing Files in Hadoop at the Right Levels – Whiteboard Walkthrough

In this week's Whiteboard Walkthrough, Mitesh Shah, Product Management at MapR, describes how you can make sure you aren’t opening more access permissions to your sensitive data in Hadoop than you intended, using File Access Control Expressions in MapR.

Here is the unedited transcription:

Hello everybody. This is Mitesh from MapR. I'm the product manager for security and data governance. Today, I'd like to talk to you about a feature called File Access Control Expressions, or ACEs for short.

Security's in the news quite a bit lately. We all know that we're storing potentially sensitive data and we want to restrict access to that data as appropriate, and ACEs really help with that.

I'll explain a couple of scenarios where ACEs are very helpful, including some where it's not possible to do the same thing with traditional mechanisms like POSIX mode bits or POSIX ACLs. Then I will round out the discussion with a special feature called whole volume ACE that will be really beneficial, I think, in multi-tenant environments.

To begin with, let's start with the scenario here. We've got a file called thequarterlyreport.csv. That file is owned by Bruce and the owning group is finance. This should look quite familiar to most of you. These are basically POSIX mode bits, and that's how the file looks in a UNIX or Linux file system.

In this case, the owner Bruce has read and write access to the file and the owning group finance has read access to the file. Easy enough setup, right?

What if we have a user Sally and we want to grant Sally access to the file? What do we do in that case? Well, there are a couple of options here, so let's go through them.

Suppose Sally is a member of finance. Then we are done. Everything is good, right? Because finance has read access to the file. No problems there, right?

What if Sally is in the marketing group? What do we do in that case? Again, we've got a couple of options. We could move Sally administratively of course to the finance group. She's in marketing today. We move her to finance just so she can access this one file. Not such a great option. Very sad.

We could make the file world readable. We have the last 3 bits here. We can set the read bit there so that everybody can access the file, including Sally. But that's not such a great option either, because now the world has access to the file and really we just wanted Sally to have access to the file.
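The trade-off just described can be sketched in a few lines of Python. This is a simplified model of how POSIX mode bits are evaluated, not MapR code. The starting mode `0o640` (rw-r-----) is an assumption based on the description of the whiteboard, and the user `mallory` and group `sales` are hypothetical names added for illustration.

```python
import stat

def can_read(mode, user, user_groups, owner, owning_group):
    """Simplified POSIX check: exactly one class (owner, group, other) applies."""
    if user == owner:
        return bool(mode & stat.S_IRUSR)
    if owning_group in user_groups:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)

OWNER, GROUP = "bruce", "finance"
restricted = 0o640                           # rw-r----- (assumed starting mode)
world_readable = restricted | stat.S_IROTH   # rw-r--r-- after setting the "other" read bit

# Sally is in marketing, so she falls into the "other" class.
sally_before = can_read(restricted, "sally", {"marketing"}, OWNER, GROUP)
sally_after = can_read(world_readable, "sally", {"marketing"}, OWNER, GROUP)
# mallory is a hypothetical user who was never meant to see the file.
stranger_after = can_read(world_readable, "mallory", {"sales"}, OWNER, GROUP)

print(sally_before, sally_after, stranger_after)
```

Setting the read bit for "other" lets Sally in, but the same bit lets in every other user as well, which is the problem with this option.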

With ACEs, you have the ability to provide a Boolean expression that says, very compactly, exactly who can access the file. In this case, we can simply say: I want Bruce to access the file, or the group finance, or the user Sally. Simple, compact, beautiful, you're done.
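One way to picture such an expression is as a small predicate built from user and group terms joined with "or". The Python below is only an illustrative model of that idea. The helper names `user`, `group`, and `any_of` are hypothetical and are not MapR's API or its expression syntax, and `mallory` is a made-up user.

```python
def user(name):
    """Term that matches one specific user."""
    return lambda who, groups: who == name

def group(name):
    """Term that matches any member of the named group."""
    return lambda who, groups: name in groups

def any_of(*terms):
    """Boolean OR over terms."""
    return lambda who, groups: any(term(who, groups) for term in terms)

# "Bruce, or the group finance, or the user Sally"
read_ace = any_of(user("bruce"), group("finance"), user("sally"))

sally_ok = read_ace("sally", {"marketing"})      # named explicitly
bruce_ok = read_ace("bruce", {"finance"})        # named explicitly
mallory_ok = read_ace("mallory", {"marketing"})  # matches no term
print(sally_ok, bruce_ok, mallory_ok)
```

Sally gets access without changing her group membership, and nobody else in marketing gains anything.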

Let's walk through another scenario. Same setup here: we've got the file, theQreport.csv. The owner is Bruce and the owning group is finance. Same permissions here. Bruce and finance have read access to the file. What if in this case we want to grant access to the file if and only if the user is in both the group finance and the group US? We've got an intersection here. We only want to provide access to the folks that are in both finance and US. That's the area in purple here.

Well, guess what? With POSIX mode bits, that's really not possible. With POSIX ACLs, that's also not possible. But with our feature, File Access Control Expressions, that is certainly possible. All you have to do is say, “hey, I want Bruce to access the file, or the folks that are in both finance and US.” Again, in this case you can use ACEs to do something you cannot do with traditional POSIX mode bits and POSIX ACLs.
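The difference can be shown with plain sets. Listing both groups in a permission list grants access to anyone in either group (a union), while the expression described here asks for membership in both (an intersection). The membership data below is entirely hypothetical and only serves to make the two results visible.

```python
# Hypothetical group membership, for illustration only.
members = {
    "finance": {"bruce", "dana", "lee"},
    "US": {"dana", "sally"},
}

# Granting read to each group separately behaves like OR: the union.
either_group = members["finance"] | members["US"]

# "Bruce, or anyone in both finance and US": OR combined with AND.
bruce_or_both = {"bruce"} | (members["finance"] & members["US"])

print(sorted(either_group))
print(sorted(bruce_or_both))
```

In this sample, lee (finance only) and sally (US only) are in the union but are correctly left out of the intersection-based result.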

I'd like to round out the discussion here with a complementary feature called Whole Volume ACE. What this does is it really helps in multi-tenant environments where you've got potentially different groups in your organization trying to access data in the same cluster.

This feature guards against users opening permissions inappropriately, either purposefully or accidentally. In the example I have here, we've got a volume, let's call it finance users, and that volume's really meant only for finance users. No problem. We've set the whole volume ACE in this case such that any content within that volume is available only to the finance group.

Let's say Bruce is in that volume as well. Bruce has a directory in that volume and there's nothing really stopping him from basically opening up the permissions to his directory to everybody. Read, write, execute, to all. He can do that. Let's take a look at a couple scenarios now.

Remember, this volume is for finance only. If Sally from marketing tries to access the file and the whole volume ACE is not set, guess what? She's got access. No problem. But if Sally from marketing tries to access the volume or Bruce's directory and the ACE is set, Sally does not have access. Very powerful feature here that we think will help enormously with multi-tenant environments.
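The behavior in these two scenarios can be modeled as a filter that sits in front of whatever permissions the directory itself carries. The sketch below assumes the volume-level check is applied first and the directory's own permissions second. The function `can_access` and its parameters are hypothetical names, not part of any MapR interface.

```python
def can_access(user_groups, directory_open_to_all, volume_ace_group=None):
    """Simplified model: the volume-level ACE, if set, must pass first."""
    if volume_ace_group is not None and volume_ace_group not in user_groups:
        return False  # blocked at the volume, whatever the directory allows
    return directory_open_to_all

# Bruce has opened his directory to everybody.
sally_without_ace = can_access({"marketing"}, directory_open_to_all=True)
sally_with_ace = can_access({"marketing"}, directory_open_to_all=True,
                            volume_ace_group="finance")
bruce_with_ace = can_access({"finance"}, directory_open_to_all=True,
                            volume_ace_group="finance")

print(sally_without_ace, sally_with_ace, bruce_with_ace)
```

However wide Bruce opens his directory, the volume-level setting keeps the content inside the finance group.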

That concludes the presentation on file access control expressions. If you have any questions about this feature, please feel free to comment below. If you have any ideas for other Whiteboard Walkthroughs, obviously comment on those as well. Appreciate your time. Thanks so much!


