Data Safety and Data Recoverability: A Snapshot How-to in a Snap

It doesn’t matter if it’s big data or small data: it’s always BIG for the user. Big data investments mean big expected ROI and big business value. That’s why, when customers are evaluating a big data platform, it’s important for them to ask the right questions. Our customers typically plan ahead for enterprise deployments, so they ask good questions such as:

  • "How can I protect my data?"
  • "How do I restore my data if there’s a problem, and how easy is it to do?"
  • "Can you help me deliver on my service-level agreements?"

The questions above, and others like them, come in different flavors, but they all boil down to one concern: data safety and data recoverability.

MapR, by design, has built-in native data protection in the form of container replication. To add more punch to that, we've included snapshot and mirroring capabilities. Our snapshot function is what I’ll focus on today.

Snapshots are point-in-time, static views of data. In other words, they capture the state of the storage system at the time the snapshot command is issued. When implemented with consistency in mind, as they are in MapR, they are guaranteed to reflect the data exactly as it was when the snapshot was taken. Snapshots are useful in a variety of scenarios, one of which is recovering data that was corrupted by user or application errors. They are also good for establishing a baseline view of data upon which point-in-time querying, audit processes, or machine learning techniques can be applied.

You’ll notice that taking a snapshot in MapR is quick and very space efficient. And because snapshots are accessible directly from the file system, organizations do not need to go through significant effort to retrieve snapshot data.

There are two things worth noting. First, taking a snapshot is a volume-level operation, so you cannot (and it doesn’t make sense to) take a snapshot of a single file or a single subdirectory. Second, though the method for creating a snapshot is the same, the path to recovery varies slightly depending on the type of data to be recovered: regular files versus MapR-DB tables.

The inner workings of snapshots and their finer details are out of scope for this post; the intent here is just to give you a quick “how-to” on the most common techniques for restoring from a snapshot.

Snapshot and Restore of Simple Files

For this example, I have created two volumes: snapshot_src and snapshot_dst (source and destination, respectively). Note that snapshots do not need a specific “destination” volume; I’m creating one here simply as a place to put the recovered snapshot data. In practice, you would most likely overwrite your corrupted data file with the valid version from the snapshot.

[root@mu-node-64 ~]# maprcli volume create -name snapshot_src -path /snapshot_src -topology /data
[root@mu-node-64 ~]# maprcli volume create -name snapshot_dst -path /snapshot_dst -topology /data
[root@mu-node-64 ~]#
[root@mu-node-64 ~]# ls -ld /mapr/my.cluster.com/snapshot_*
drwxr-xr-x. 2 root root 0 May 21 14:02 /mapr/my.cluster.com/snapshot_dst
drwxr-xr-x. 2 root root 0 May 21 14:01 /mapr/my.cluster.com/snapshot_src
Fig.1

Now I’ll create some example text files in the source volume:

[root@mu-node-64 ~]# cd /mapr/my.cluster.com/snapshot_src/
[root@mu-node-64 snapshot_src]# echo a > file1.txt
[root@mu-node-64 snapshot_src]# echo b > file2.txt
[root@mu-node-64 snapshot_src]# ls -ltr
total 1
-rw-r--r--. 1 root root 2 May 21 14:04 file1.txt
-rw-r--r--. 1 root root 2 May 21 14:04 file2.txt
Fig.2

Notice that in Figure 2, I used the Linux cd command to go to the Hadoop directory and created files as if they resided on a regular Linux file system. I can do this because I am using the MapR NFS interface, which lets me access my Hadoop data from the Linux command line.
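
If you want to confirm the NFS mount on a node, a standard Linux mount listing will show it. This is just a quick sanity check, assuming the cluster is mounted under /mapr via the MapR NFS gateway as in this example:

[root@mu-node-64 ~]# mount | grep mapr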

Now, I can take the snapshot as follows:

[root@mu-node-64 snapshot_src]# maprcli volume snapshot create -snapshotname snapshot.snapshot_src -volume snapshot_src
[root@mu-node-64 snapshot_src]# maprcli volume snapshot list
cumulativeReclaimSizeMB  creationtime                  ownername  snapshotid  snapshotname       	volumeid   volumename    ownertype  volumepath 	
0                    	Wed May 21 14:07:09 IST 2014  root   	256000055   snapshot.snapshot_src  136224524  snapshot_src  1      	/snapshot_src 
[root@mu-node-64 snapshot_src]#
[root@mu-node-64 snapshot_src]# hadoop fs -ls /snapshot_src/.snapshot
Found 1 items
drwxr-xr-x   - root root      	2 2014-05-21 14:04 /snapshot_src/.snapshot/snapshot.snapshot_src
Fig.3

Now I want to restore the data by copying the snapshot.snapshot_src directory, which contains the snapshot data, to the destination volume. Since the snapshot looks exactly like a file system directory, the restore simply uses the hadoop fs -cp command, just as you would on any Hadoop distribution. It is pretty straightforward, as shown below:

[root@mu-node-64 snapshot_src]# hadoop fs -cp /snapshot_src/.snapshot/snapshot.snapshot_src/ /snapshot_dst
[root@mu-node-64 snapshot_src]# cd /mapr/my.cluster.com/snapshot_dst/snapshot.snapshot_src
[root@mu-node-64 snapshot.snapshot_src]#
[root@mu-node-64 snapshot.snapshot_src]# ls -ltr
total 1
-rwxr-xr-x. 1 root root 2 May 21 14:09 file1.txt
-rwxr-xr-x. 1 root root 2 May 21 14:09 file2.txt
Fig.4

As you can probably tell, I also could have used the standard Linux cp command instead of hadoop fs -cp to restore the snapshot data.
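
For example, a plain cp over the NFS mount would have worked just as well; here’s a sketch using the same paths as above (adjust for your cluster’s mount point):

[root@mu-node-64 ~]# cp -r /mapr/my.cluster.com/snapshot_src/.snapshot/snapshot.snapshot_src /mapr/my.cluster.com/snapshot_dst/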

Snapshot and Restore of MapR-DB Tables

As mentioned earlier, restoring MapR-DB tables is a little different. I set up the source table by first creating a tables subdirectory in the snapshot_src volume and then creating the table, as follows:

[root@mu-node-64 snapshot.snapshot_src]# hadoop fs -mkdir /snapshot_src/tables
[root@mu-node-64 snapshot.snapshot_src]# maprcli table create -path /snapshot_src/tables/table01
[root@mu-node-64 snapshot.snapshot_src]# maprcli table cf create -path /snapshot_src/tables/table01 -cfname table01_cf01
[root@mu-node-64 snapshot.snapshot_src]#
[root@mu-node-64 snapshot.snapshot_src]# maprcli table cf list -path /snapshot_src/tables/table01
readperm  appendperm  inmemory  versionperm  cfname    	writeperm  compressionperm  memoryperm  compression  ttl     	maxversions  minversions 
u:root	u:root  	false     u:root   	table01_cf01  u:root     u:root       	u:root  	lz4      	2147483647  3            0        	
[root@mu-node-64 snapshot.snapshot_src]#
[root@mu-node-64 snapshot.snapshot_src]# hbase shell
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.94.17-mapr-1403-SNAPSHOT, rbb690294807b1bf405176c2dfbcff0e815849f4e, Tue Apr  1 14:07:41 PDT 2014
 
Not all HBase shell commands are applicable to MapR tables.
Consult MapR documentation for the list of supported commands.
 
hbase(main):001:0> put '/snapshot_src/tables/table01', 'row1', 'table01_cf01', 'table01_value01'
0 row(s) in 0.3270 seconds
 
hbase(main):002:0> scan '/snapshot_src/tables/table01'
ROW                                              COLUMN+CELL                                                                                                                                   
 row1                                            column=table01_cf01:, timestamp=1400663440775, value=table01_value01                                                                         
1 row(s) in 0.0400 seconds
 
hbase(main):003:0>
Fig.5

Then I take a second snapshot of the source volume, as shown below:

[root@mu-node-64 ~]# maprcli volume snapshot create -snapshotname snapshot1.snapshot_src -volume snapshot_src
[root@mu-node-64 ~]# maprcli volume snapshot list
cumulativeReclaimSizeMB  creationtime                  ownername  snapshotid  snapshotname        	volumeid   volumename    ownertype  volumepath 	
0                    	Wed May 21 14:07:09 IST 2014  root   	256000055   snapshot.snapshot_src   136224524  snapshot_src  1      	/snapshot_src 
0                    	Wed May 21 14:44:32 IST 2014  root   	256000056   snapshot1.snapshot_src  136224524  snapshot_src  1      	/snapshot_src 
[root@mu-node-64 ~]#
Fig.6
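
Before restoring, it’s worth a quick sanity check that the new snapshot actually contains the table data. Listing the snapshot directory with the paths from this example does the trick:

[root@mu-node-64 ~]# hadoop fs -ls /snapshot_src/.snapshot/snapshot1.snapshot_src/tables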

The latest snapshot is the one with snapshotid 256000056. To restore the table, I’ll first set up the destination table, as shown below:

[root@mu-node-64 ~]# hadoop fs -mkdir /snapshot_dst/tables
[root@mu-node-64 ~]# maprcli table create -path /snapshot_dst/tables/table01
[root@mu-node-64 ~]#
Fig.7

And the column family as shown below:

[root@mu-node-64 ~]# maprcli table cf create -path /snapshot_dst/tables/table01 -cfname table01_cf01
[root@mu-node-64 ~]#
Fig.8

The table copy itself is carried out with the HBase CopyTable utility, as shown below:

[root@mu-node-64 ~]# hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=/snapshot_dst/tables/table01 /snapshot_src/.snapshot/snapshot1.snapshot_src/tables/table01
14/05/21 14:47:32 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/05/21 14:47:32 INFO security.JniBasedUnixGroupsMapping: Using JniBasedUnixGroupsMapping for Group resolution
Fig. 9

Upon successful execution of the job, you should see output similar to what is shown in Fig. 10 and Fig. 11.

14/05/21 15:01:09 INFO mapred.JobClient: Running job: job_201405201959_0014
14/05/21 15:01:10 INFO mapred.JobClient:  map 0% reduce 0%
14/05/21 15:01:23 INFO mapred.JobClient:  map 100% reduce 0%
14/05/21 15:01:24 INFO mapred.JobClient: Job job_201405201959_0014 completed successfully
14/05/21 15:01:24 INFO mapred.JobClient: Counters: 17
14/05/21 15:01:24 INFO mapred.JobClient:   Job Counters
14/05/21 15:01:24 INFO mapred.JobClient: 	Aggregate execution time of mappers(ms)=5035
14/05/21 15:01:24 INFO mapred.JobClient: 	Total time spent by all reduces waiting after reserving slots (ms)=0
14/05/21 15:01:24 INFO mapred.JobClient: 	Total time spent by all maps waiting after reserving slots (ms)=0
14/05/21 15:01:24 INFO mapred.JobClient: 	Rack-local map tasks=1
14/05/21 15:01:24 INFO mapred.JobClient: 	Launched map tasks=1
14/05/21 15:01:24 INFO mapred.JobClient: 	Aggregate execution time of reducers(ms)=0
14/05/21 15:01:24 INFO mapred.JobClient:   FileSystemCounters
14/05/21 15:01:24 INFO mapred.JobClient: 	MAPRFS_BYTES_READ=122
14/05/21 15:01:24 INFO mapred.JobClient: 	MAPRFS_BYTES_WRITTEN=1701
14/05/21 15:01:24 INFO mapred.JobClient: 	FILE_BYTES_WRITTEN=76870
14/05/21 15:01:24 INFO mapred.JobClient:   Map-Reduce Framework
14/05/21 15:01:24 INFO mapred.JobClient: 	Map input records=1
14/05/21 15:01:24 INFO mapred.JobClient: 	PHYSICAL_MEMORY_BYTES=164352000
14/05/21 15:01:24 INFO mapred.JobClient: 	Spilled Records=0
14/05/21 15:01:24 INFO mapred.JobClient: 	CPU_MILLISECONDS=460
14/05/21 15:01:24 INFO mapred.JobClient: 	VIRTUAL_MEMORY_BYTES=2499231744
14/05/21 15:01:24 INFO mapred.JobClient: 	Map output records=1
14/05/21 15:01:24 INFO mapred.JobClient: 	SPLIT_RAW_BYTES=122
14/05/21 15:01:24 INFO mapred.JobClient: 	GC time elapsed (ms)=16
Fig.10
[root@mu-node-64 ~]# hbase shell
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.94.17-mapr-1403-SNAPSHOT, rbb690294807b1bf405176c2dfbcff0e815849f4e, Tue Apr  1 14:07:41 PDT 2014
 
Not all HBase shell commands are applicable to MapR tables.
Consult MapR documentation for the list of supported commands.
 
hbase(main):001:0> scan '/snapshot_src/tables/table01'
ROW                                              COLUMN+CELL                                                                                                                                   
 row1                                            column=table01_cf01:, timestamp=1400663440775, value=table01_value01                            	                                          
1 row(s) in 0.3190 seconds
 
hbase(main):002:0> scan '/snapshot_dst/tables/table01'
ROW                                              COLUMN+CELL                                                                                                                                   
 row1                                            column=table01_cf01:, timestamp=1400663440775, value=table01_value01                                                                         
1 row(s) in 0.0090 seconds
 
hbase(main):003:0>
Fig.11

Mistakes to Avoid

The following are common mistakes you can run into when running the commands above; I’ve laid them out here as a quick reference on how to resolve them.

Remember that MapR-DB tables have to be restored with the HBase CopyTable utility. If I were to attempt a restore with a simple hadoop fs -cp command, it would fail:

[root@mu-node-64 ~]# hadoop fs -cp /snapshot_src/.snapshot/snapshot1.snapshot_src/tables/table01 /snapshot_dst
cp: Cannot copy MDP Tables
[root@mu-node-64 ~]#
Fig.12

Remember to create the destination table for your restored data before running CopyTable, or else you will hit errors like the ones below. Be sure to follow the steps shown earlier in Fig. 7.

14/05/21 14:47:32 INFO fs.JobTrackerWatcher: Current running JobTracker is: mu-node-64/10.250.50.64:9001
2014-05-21 14:47:33,3158 ERROR Client fs/client/fileclient/cc/dbclient.cc:186 Thread: 140289152771840 OpenTable failed for path /snapshot_dst/tables/table01, LookupFid error No such file or directory(2)
14/05/21 14:47:33 ERROR mapreduce.TableOutputFormat: java.io.IOException: Open failed for table: /snapshot_dst/tables/table01, error: No such file or directory (2)
14/05/21 14:47:33 INFO mapred.JobClient: Cleaning up the staging area maprfs:/var/mapr/cluster/mapred/jobTracker/staging/root/.staging/job_201405201959_0012
Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Open failed for table: /snapshot_dst/tables/table01, error: No such file or directory (2)
     	at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputFormat.java:206)
     	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
     	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
Fig. 13

Also remember to set up the column family on the destination table, or else you’ll see the errors below. Following the steps in Fig. 8 avoids this problem.

14/05/21 14:51:54 INFO mapred.JobClient: Running job: job_201405201959_0013
14/05/21 14:51:55 INFO mapred.JobClient:  map 0% reduce 0%
14/05/21 14:52:13 INFO mapred.JobClient: Task Id : attempt_201405201959_0013_m_000000_0, Status : FAILED on node mu-node-65
java.io.IOException: Invalid column family table01_cf01
     	at com.mapr.fs.PutConverter.createMapRPut(PutConverter.java:76)
 
attempt_201405201959_0013_m_000000_0: log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.NativeCodeLoader).
attempt_201405201959_0013_m_000000_0: log4j:WARN Please initialize the log4j system properly.
14/05/21 14:52:18 INFO mapred.JobClient: Task Id : attempt_201405201959_0013_m_000000_1, Status : FAILED on node mu-node-66
java.io.IOException: Invalid column family table01_cf01
     	at com.mapr.fs.PutConverter.createMapRPut(PutConverter.java:76)
 
attempt_201405201959_0013_m_000000_2: log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.NativeCodeLoader).
attempt_201405201959_0013_m_000000_2: log4j:WARN Please initialize the log4j system properly.
14/05/21 14:52:32 INFO mapred.JobClient: Job job_201405201959_0013 failed with state FAILED due to: NA
14/05/21 14:52:32 INFO mapred.JobClient: Counters: 7
14/05/21 14:52:32 INFO mapred.JobClient:   Job Counters
14/05/21 14:52:32 INFO mapred.JobClient: 	Aggregate execution time of mappers(ms)=24140
14/05/21 14:52:32 INFO mapred.JobClient: 	Total time spent by all reduces waiting after reserving slots (ms)=0
14/05/21 14:52:32 INFO mapred.JobClient: 	Total time spent by all maps waiting after reserving slots (ms)=0
14/05/21 14:52:32 INFO mapred.JobClient: 	Rack-local map tasks=4
14/05/21 14:52:32 INFO mapred.JobClient: 	Launched map tasks=4
14/05/21 14:52:32 INFO mapred.JobClient: 	Aggregate execution time of reducers(ms)=0
14/05/21 14:52:32 INFO mapred.JobClient: 	Failed map tasks=1
[root@mu-node-64 ~]#
Fig. 14

As a side note, if you get “Cannot resolve the host name” errors like the message below, you likely do not have functional reverse DNS (rDNS) configured. This is harmless, and the copy should still complete.

14/05/21 14:51:47 INFO mapreduce.TableOutputFormat: Created table instance for /snapshot_dst/tables/table01
14/05/21 14:51:54 ERROR mapreduce.TableInputFormatBase: Cannot resolve the host name for /10.250.50.64 because of javax.naming.NameNotFoundException: DNS name not found [response code 3]; remaining name '64.50.250.10.in-addr.arpa'
Fig. 15
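
If you want to confirm that reverse DNS is indeed the issue, a quick PTR lookup from the node will tell you. This is a minimal check, assuming the host utility (from bind-utils) is installed; the IP address is the one from the message above:

[root@mu-node-64 ~]# host 10.250.50.64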

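One last housekeeping note: once you no longer need a snapshot, you can remove it with maprcli. Here’s a sketch using the snapshot and volume names from this example:

[root@mu-node-64 ~]# maprcli volume snapshot remove -snapshotname snapshot.snapshot_src -volume snapshot_src
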
Happy restores!
