Achieving replication in Hive if the metastore is in MySQL

This topic explains how to achieve replication in Hive if the metastore is in MySQL. MySQL has built-in replication, which can be used in conjunction with remote mirroring to replicate Hive tables. While Hive does not have this replication capability, this can be achieved using mirror volumes in MapR.

More information about mirror volumes can be found at  http://doc.mapr.com/display/MapR/Working+with+Mirror+Volumes.

Source cluster

Hive metastore is under "metastore" database and there exists a user "hiveuser" with following permissions:

GRANT  SELECT, INSERT, UPDATE, DELETE ON `metastore`.* TO 'hiveuser'@'%' 

1. Ensure that mysql is not bound to a particular interface. On Centos locate /etc/my.cnf and comment out these 2 lines:

#skip-networking
#bind-address = 127.0.0.1 

2. Edit /etc/my.conf with replication specific params as follows under [mysqld] section

server-id  = 1 # Each server needs a  unique ID for replication
binlog-do-db=metastore # database which is to be replicated
log-bin  = mysql-bin # Binary log  used for replication 
binlog_format  = row # Use row based  replication. This param is necessary for replication to work properly
symbolic-links=0

3. Restart mysql service

service  mysqld restart

4. Grant replication right to user which will be used for replication. Here the user name is hiveuser

 
mysql -u root -p
mysql> GRANT REPLICATION SLAVE ON metastore.* to ‘hiveuser’@'%’ identified by ‘mapr’;
mysql> FLUSH PRIVILEGES;
mysql> show grants for hiveuser;
 
+———————————————————————————————————————+
| Grants for hiveuser@% |
+———————————————————————————————————————+
| GRANT REPLICATION SLAVE ON *.* TO ‘hiveuser’@'%’ IDENTIFIED BY PASSWORD ‘*F7B38192A48223DE3454B27D3DC12EC66170CAFE’ |
| GRANT SELECT, INSERT, UPDATE, DELETE ON `metastore`.* TO ‘hiveuser’@'%’ |
+———————————————————————————————————————+
mysql> FLUSH TABLES WITH READ LOCK;
mysql> show master status;
+——————+———-+————–+——————+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+——————+———-+————–+——————+
| mysql-bin.000006 | 107 | metastore | |
+——————+———-+————–+——————+
1 row in set (0.00 sec)
mysql> show variables like ‘binlog%’;
+—————————————–+——-+
| Variable_name | Value |
+—————————————–+——-+
| binlog_cache_size | 32768 |
| binlog_direct_non_transactional_updates | OFF |
| binlog_format | ROW |
| binlog_stmt_cache_size | 32768 |
+—————————————–+——-+
4 rows in set (0.00 sec)
mysql> QUIT; 

5. Obtain a dump of metastore database to be uploaded to slave.

mysqldump  -u root -p metastore > hivems.sql

6. Release locks on mysql tables

mysql> unlock tables;
Query OK, 0 rows affected (0.00 sec)
mysql> quit;  

7. Configure hive to use a volume instead of default location

maprcli  volume create -name hivevol -path /hivevol 

8. Copy all contents of /user/hive to /hivevol

hadoop  fs -cp /user/hive /hivevol

9. Rename /user/hive to something else and change mount path of hivevol to /user/hive

maprcli  volume unmount -name hivevol
maprcli volume mount -name hivevol -path /user/hive 

On slave mysql server

1. Configure replication parameters and server name in my.cnf

server-id = 2
replicate-do-db=metastore
log-error = /var/lib/mysql/mysql.err   

2. Restart mysql server

service mysqld restart

3. Restore database on slave

mysql -u root -p
mysql> create database metastore;
mysql> quit;
mysql -u root -p metastore < /root/hivems.sql

4. Configure master server params on slave

mysql -u root -p
mysql> slave stop;
mysql> change master to master_host=’10.10.80.232′,master_port=3306,master_user=’hiveuser’,master_password=’mapr’;
mysql> slave start;
mysql> show slave status \G;

5. create a remote mirror of hivevol

maprcli volume create -name hivevol -path /hivevol -type 1 -source
hivevol@amexupgrade -nodelay 1

6. Modify hive.metastore.warehouse.dir to /hivevol in /opt/mapr/hive/hive-0.9.0/conf/hive-site.xml

On master server

1. Create a test table in hive and load some data

CREATE TABLE logs3(t1 string, t2 string, t3 string, t4 string, t5 string, t6
string, t7 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘ ‘;
LOAD DATA LOCAL INPATH ‘/root/sample.log’ OVERWRITE INTO TABLE logs3;
select count(*) from logs3;

On slave

1. Start mirroring the volume.

maprcli volume mirror start -name hivevol

2. Once mirroring is complete, check data listing in hive

select count(*) from logs3;

Tags
Best Practices
Hive