How to Configure the Network for the MapR Sandbox for Hadoop - #WhiteboardWalkthrough

Editor's Note: In this week's Whiteboard Walkthrough, James Casaletto walks you through how to configure the network for the MapR Hadoop Sandbox. Whether you use VirtualBox, VMware Fusion, VMware Player, or pretty much any hypervisor on your laptop to support your MapR Sandbox, you'll need to configure the network. There are essentially three different settings that you can use to configure the network for your Sandbox. One is NAT, one is host-only, and one is bridged.

Here is the transcript: 

Hello. My name is James Casaletto. I work in the Professional Services organization at MapR Technologies. Welcome to this episode of the Whiteboard Walkthrough. I'm going to show you how to configure the network for the MapR Hadoop Sandbox. Whether you use VirtualBox, VMware Fusion, VMware Player, or pretty much any hypervisor on your laptop to support your MapR Sandbox, you'll need to configure the network. There are essentially three different settings that you can use to configure the network for your Sandbox. One is NAT, one is host-only, and one is bridged.

Let's start with NAT. NAT stands for Network Address Translation. NAT effectively supports outbound traffic, so from within the Sandbox, I can connect out to the network, but inbound traffic is not allowed by default. There is something we can do to enable that: I can configure a port forwarding rule. This means that, on a port-by-port basis, I can configure a port on the host to be forwarded to the virtual machine.
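As a rough illustration of a port forwarding rule, here is a minimal Python sketch that builds the VirtualBox commands for such rules. It assumes VirtualBox and the `--natpf1` option of `VBoxManage modifyvm` as spelled in older VirtualBox manuals (check `VBoxManage modifyvm` help for your version). The VM name `MapR-Sandbox`, the rule names, and the host port 2222 are hypothetical; other hypervisors configure forwarding differently.

```python
from typing import Dict, List, Tuple


def natpf_commands(vm_name: str, forwards: Dict[str, Tuple[int, int]]) -> List[List[str]]:
    """Build one VBoxManage argument list per NAT port forwarding rule.

    forwards maps a rule name to a (host_port, guest_port) pair.
    """
    commands = []
    for rule_name, (host_port, guest_port) in forwards.items():
        # Rule format: name,protocol,host-ip,host-port,guest-ip,guest-port
        # The empty host-ip and guest-ip fields leave VirtualBox's defaults in place.
        rule = f"{rule_name},tcp,,{host_port},,{guest_port}"
        commands.append(["VBoxManage", "modifyvm", vm_name, "--natpf1", rule])
    return commands


if __name__ == "__main__":
    # Hypothetical rules: host port 2222 -> guest SSH (22), host 8443 -> guest 8443.
    rules = {"ssh": (2222, 22), "mcs": (8443, 8443)}
    for cmd in natpf_commands("MapR-Sandbox", rules):
        print(" ".join(cmd))
```

The sketch only prints the commands rather than running them, so you can review each rule before applying it. Every additional inbound port needs its own rule.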

If I only have a handful of ports (maybe port 22 for SSH, port 8443 for the MCS, port 5181 for ZooKeeper, port 7222 for the CLDB, and so on), that's fine, but if your list gets long, then NAT becomes a little difficult to manage. That's NAT.

Host-only is a network setting where you can connect to the Sandbox from this host. Suppose this is a terminal window: I could SSH, or ping, or Telnet, or FTP from the host, which is my laptop, to the Sandbox and vice versa.
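One quick way to confirm that the host can actually reach the Sandbox is to try opening a TCP connection to the ports you care about. The following is a minimal sketch using only Python's standard library; the function names are made up for this example, and the Sandbox address is something you would supply yourself.

```python
import socket
from typing import Dict


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def report(host: str, ports: Dict[str, int], timeout: float = 2.0) -> Dict[str, bool]:
    """Check each named port on host and return a name -> reachable mapping."""
    return {name: port_is_open(host, port, timeout) for name, port in ports.items()}
```

For example, calling `report(sandbox_ip, {"SSH": 22, "MCS": 8443})` from the laptop, where `sandbox_ip` is whatever address your Sandbox was assigned, shows which of those services accept connections from the host.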

So basically, the scope of connectivity for host-only is on the host only. This is fine if you only need to work with the Sandbox from within the laptop, but what if you need to, for example, install a software package? Perhaps you want to use apt-get, yum, wget, curl, or whatever you need to get out to the internet. Host-only will not serve you there, because there is no internet option with host-only.

The third option is called bridged. The idea behind bridged is that you take your Ethernet connection for the Sandbox, which is eth0, and you bridge it, or connect it, to one of the network ports on your laptop. When you say bridged, you also have to bridge it to something, so I can bridge it to my wireless connection, or I could bridge it to my wired Ethernet connection. That works fine. The only time that it doesn't work is if there's some sort of password or dialog that is required to connect to that network, because no dialog is presented inside the Sandbox. For example, if you're at a Starbucks and you have to bring up a web browser and click “yes” to continue, that wouldn't be supported in our Sandbox. If there is no password or dialog associated with that network connection, that's perfectly fine. Then you can get out from the Sandbox to the network and you can also get back into the Sandbox from outside, so it kind of functions like a regular node on the network.

The other part of the bridge, though, is that you have to be careful about the so-called bridge to nowhere. It may be that when you're at the office you can connect to the WiFi there (suppose it doesn't have a password), and then you go home and you want to connect through the WiFi there as well, but at home you have a password on your WiFi router. Then, basically, you've created a bridge to nowhere. Likewise, if you're at home and you plug in your Ethernet cable and bridge to that, and that works for you to get out to the internet, and then you go to another place and you don't plug the cable in, then again, you're kind of bridged to nowhere. Just be careful of that when you're using bridged networking.

With all of these choices, which one do I use? It depends on what your needs are. Bridged is perfectly fine, as long as there's no password or dialog required to make a connection to that network. Host-only is fine as long as you only need to communicate between your laptop and the Sandbox.  NAT is fine so long as the list of ports that you need to connect into (ingress traffic) is short. Otherwise, it's all up to you.
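The rules of thumb above can be written down as a small decision helper. This is only a sketch of the reasoning in this walkthrough, not an official tool: the function name, its parameters, and the threshold of five forwarded ports for what counts as a "short" list are all assumptions made for illustration.

```python
from typing import List, Sequence


def suitable_modes(
    needs_outside_network: bool,
    network_needs_login: bool,
    inbound_ports: Sequence[int],
    max_forwarded_ports: int = 5,  # assumed cutoff for a "short" port list
) -> List[str]:
    """Return the network modes that fit, following the walkthrough's rules."""
    modes = []
    # Bridged works unless the network demands a password or a login dialog.
    if not network_needs_login:
        modes.append("bridged")
    # Host-only works when traffic stays between the laptop and the Sandbox.
    if not needs_outside_network:
        modes.append("host-only")
    # NAT works when the list of inbound ports to forward stays short.
    if len(inbound_ports) <= max_forwarded_ports:
        modes.append("NAT")
    return modes


if __name__ == "__main__":
    # Coffee-shop WiFi with a login page, internet needed, two inbound ports.
    print(suitable_modes(True, True, [22, 8443]))
```

When more than one mode comes back, any of them should do, and the choice is up to you.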

The last thing is, with the MapR Sandbox, what happens when you change between these modes? What you would need to do is reboot your Sandbox. There's a run control script that will take care of the IP address that gets assigned to your Sandbox on startup.
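If you happen to use VirtualBox, switching modes can also be done from the command line. The sketch below builds the command for each of the three modes, assuming the `--nic1`, `--hostonlyadapter1`, and `--bridgeadapter1` options of `VBoxManage modifyvm` as spelled in older VirtualBox manuals (newer releases may document different spellings). The VM name and the adapter names `en0` and `vboxnet0` are hypothetical, and `modifyvm` generally expects the VM to be powered off.

```python
from typing import List, Optional


def nic_mode_command(vm_name: str, mode: str, host_adapter: Optional[str] = None) -> List[str]:
    """Build the VBoxManage argument list that sets the first NIC to the given mode."""
    cmd = ["VBoxManage", "modifyvm", vm_name, "--nic1", mode]
    if mode == "nat":
        return cmd
    if mode not in ("hostonly", "bridged"):
        raise ValueError(f"unsupported mode: {mode}")
    if host_adapter is None:
        # Host-only and bridged modes must be attached to an adapter on the host.
        raise ValueError(f"{mode} mode needs a host adapter name")
    option = "--hostonlyadapter1" if mode == "hostonly" else "--bridgeadapter1"
    return cmd + [option, host_adapter]


if __name__ == "__main__":
    # Hypothetical VM and adapter names; substitute your own.
    print(" ".join(nic_mode_command("MapR-Sandbox", "nat")))
    print(" ".join(nic_mode_command("MapR-Sandbox", "hostonly", "vboxnet0")))
    print(" ".join(nic_mode_command("MapR-Sandbox", "bridged", "en0")))
```

After changing the mode, start the Sandbox again so that it picks up its new IP address on startup.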

That's it, thank you very much for viewing this session. If you have any comments, please leave them below. Also, you can follow us on Twitter @MapR #WhiteboardWalkthrough. Thank you very much.
