Hue Tutorial Part 2: Pig, Job Designer and Oozie

Using Pig

Pig is a platform for parallelized analysis of large data sets. Pig programs use a language called Pig Latin.

In this tutorial, create a directory for the US Constitution text file, and then create a Pig script that runs a word count MapReduce job on the text in the file. After you run the MapReduce job, view the wordcount file generated by the job.

Create a new directory for the constitution.txt file:

  1. Click the File Browser icon. The File Browser page opens.
  2. Select oozie to open the directory.
  3. Click the New button, and select Directory.
  4. Enter wordcount as the directory name, and click Submit. The wordcount directory appears in the list.
  5. Open the wordcount directory.
  6. Click the Upload button, and select Files.
  7. Click the Select Files button.
  8. Navigate to the constitution.txt file and upload the file. The file appears in the wordcount directory.
    Example: /oozie/wordcount

Note the directory path. You need it in the next section when you create a Pig script to run the MapReduce job.

Create a Pig script and run a word count MapReduce job:

  1. Click the Pig icon. The Pig script page opens.
  2. In the script window, enter the following Pig Latin commands:
    A = LOAD '/oozie/wordcount' USING TextLoader() AS (words:chararray);
    B = FOREACH A GENERATE FLATTEN(TOKENIZE(*));
    C = GROUP B BY $0;
    D = FOREACH C GENERATE group, COUNT(B);
    STORE D INTO '/oozie/wcresults';
    
    Note: You may need to edit the directory paths in the LOAD statement (line A) and the STORE statement (the last line). Verify that the path in the LOAD statement points to the directory where you uploaded the constitution.txt file. Verify that the path in the STORE statement points to the directory where you want the word count results written.
  3. In the Editor, click Run.
    The script picks up the constitution.txt file from the wordcount directory. The system runs the MapReduce job and stores the output in a wcresults directory. You can view the logs to verify that the job completed.
  4. In the Editor, click Save to save the script.
  5. Enter ConstitutionWordcount as the script title.
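
To see what each relation in the script holds, the same steps can be sketched locally in Python. This is only an illustration, not how Pig executes the job: the delimiter set approximates the default TOKENIZE delimiters, and the two sample lines stand in for the uploaded file.

```python
import re
from collections import Counter

# Assumption: approximates Pig's default TOKENIZE delimiters
# (whitespace, double quote, comma, parentheses, asterisk).
DELIMITERS = re.compile(r'[\s",()*]+')


def word_count(lines):
    """Mirror relations A through D of the Pig script on an iterable of lines."""
    # A: one record per line of text.
    # B: FLATTEN(TOKENIZE(...)) yields one record per word.
    words = [w for line in lines for w in DELIMITERS.split(line) if w]
    # C and D: group identical words, then count each group.
    return Counter(words)


# Stand-in input for illustration only.
sample = [
    "We the People of the United States,",
    "in Order to form a more perfect Union,",
]
counts = word_count(sample)
print(counts["the"])
```

Like the Pig script, the sketch is case sensitive, so "The" and "the" are counted separately.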

View the wordcount file that contains the MapReduce job results:

  1. Click the File Browser icon. The File Browser page opens.
  2. Navigate to the wcresults output directory.
    Example: /oozie/wcresults
  3. Open the part-r-00000 file, and review the output to see which word was used the most in the US Constitution.
  4. Optionally, edit or download the file.
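
If you download the file, a short script can find the most frequent word for you. The sketch below assumes the output is tab-delimited, one word and its count per line, which is the default when STORE is used without a USING clause. The sample lines are made up for illustration and are not actual job results.

```python
def most_frequent(lines):
    """Return the (word, count) pair with the highest count."""
    best_word, best_count = None, -1
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the last tab so the count is always the final field.
        word, _, count = line.rpartition("\t")
        if int(count) > best_count:
            best_word, best_count = word, int(count)
    return best_word, best_count


# Hypothetical sample lines, not real results from the job.
sample_output = ["People\t1\n", "the\t2\n", "We\t1\n"]
print(most_frequent(sample_output))
```

To run it against the downloaded file, pass an open file object instead of the sample list, for example `most_frequent(open("part-r-00000"))`.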

Next: Use Job Designer to create and submit a MapReduce job design.

Using Job Designer

Job Designer is an application that you can use to submit MapReduce, Hadoop streaming, or JAR jobs. A MapReduce job contains Java map and reduce functions; you can use existing mapper and reducer classes in a MapReduce job design without writing a main Java class. A Hadoop streaming job is a job where map and reduce functions, written in a non-Java language, read from standard input and write to standard output. A JAR job runs a main class, written in Java, that is packaged in a JAR file.

When you create a MapReduce job in Job Designer, you can configure variables in the form of $variable_name for all job design settings except Name and Description. If you include variables, you can specify values for the variables in a dialog box that appears when you submit the job.
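
The effect of variable substitution can be sketched with Python's string.Template, which uses the same $name placeholder style. This is not Job Designer's implementation, and the variable names input_dir and output_dir are hypothetical; the sketch only shows how values supplied at submit time fill in the settings.

```python
from string import Template

# Job design settings that contain variables (hypothetical variable names).
settings = {
    "mapred.input.dir": "$input_dir",
    "mapred.output.dir": "$output_dir",
}


def resolve(settings, values):
    """Replace each $variable in the settings with the submitted value."""
    # substitute() raises KeyError if a variable has no value supplied.
    return {key: Template(text).substitute(values) for key, text in settings.items()}


# Values a user might type into the dialog box at submit time.
resolved = resolve(
    settings,
    {
        "input_dir": "/oozie/examples/input-data/text",
        "output_dir": "/oozie/MapReduceJob/output",
    },
)
print(resolved["mapred.output.dir"])
```

Keeping paths as variables lets you reuse one job design with different input and output directories.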

In this tutorial, use File Browser to create a directory and upload the sample JAR file to it. Use Job Designer to create a MapReduce job that uses the sample JAR file. Submit the job, and view the output file.

Create a new directory:

  1. Click the File Browser icon. The File Browser page opens.
  2. Select oozie to open the directory.
  3. Click the New button, and select Directory.
  4. Enter MapReduceJob as the directory name.
  5. Click Submit. The MapReduceJob directory appears in the list.

Upload the JAR file:

  1. Select MapReduceJob to open the directory.
  2. Click the Upload button, and select Files.
  3. Click Select Files in the pop-up dialog.
  4. Upload the oozie-examples-3.3.2-mapr.jar. The file appears in the MapReduceJob directory.
    Example: /oozie/MapReduceJob/oozie-examples-3.3.2-mapr.jar

Create a MapReduce Job Design:

  1. Click the Job Designer icon. The Job Designer page opens.
  2. Click the New Action button, and select MapReduce.
  3. Configure the job settings with the information below:

    Setting         Description
    Name            Enter MapReduce_Job_Design as the job name.
    Description     Enter Job Design Tutorial as the description.
    JAR Path        Enter the fully qualified path to the JAR file with the classes that implement the mapper and reducer functions.
                    Example: /oozie/MapReduceJob/oozie-examples-3.3.2-mapr.jar
    Job Properties  Click the Add property button four times. Enter the following property names and their associated values:

                    Property Name         Value
                    mapred.mapper.class   org.apache.oozie.example.SampleMapper
                    mapred.reducer.class  org.apache.oozie.example.SampleReducer
                    mapred.output.dir     /oozie/MapReduceJob/output
                    mapred.input.dir      /oozie/examples/input-data/text

  4. Click Save. The Job Designs page appears with MapReduce_Job_Design in the list.

Submit the job design:

  1. Select the checkbox next to MapReduce_Job_Design.
  2. Click the Submit button. A Submit this job dialog appears.
  3. Click Submit.
  4. Click the Log view to follow the log file while the job runs.

View the output file:

  1. Click the File Browser icon. The File Browser page opens.
  2. Navigate to the MapReduceJob output directory.
    Example: /oozie/MapReduceJob/output
  3. Click the part-00000 file to view the job output. You can see that the MapReduce job performed a character count on the text.

Next: Use Oozie to create and submit a workflow.

Using Oozie

Oozie is a workflow system for Hadoop. Use Oozie to set up workflows that execute MapReduce jobs and to set up a coordinator that manages workflows.

In this tutorial, create a workflow to run the same MapReduce job that you ran in the previous tutorial. Submit the workflow to run the job, and then view the output file.

Create a workflow:

  1. Click the Oozie icon. The Oozie page opens.
  2. Select Workflows.
  3. Click the Create button. The Create Workflow page appears.
  4. Enter Oozie_Workflow as the name and description.
  5. Select the Is shared checkbox.
  6. Click Save. The Editor page appears.
  7. Drag and drop the MapReduce action into the workflow between the start and end actions. The bar between the start and end actions turns blue when the MapReduce action is in the correct spot to drop it. As soon as you drop the MapReduce action, the Edit Node page appears. A node in this scenario is the action in the workflow.
  8. Enter MRaction as the name and the description.
  9. Navigate to the oozie-examples-3.3.2-mapr.jar file located in the MapReduceJob directory, and upload the file.
    Example: /oozie/MapReduceJob
    Note: If you do not see the MapReduceJob directory in the list of options, click the first / in the directory path to go to the root directory. You can navigate to the JAR file from the root directory.
  10. Click the Add Property button four times, and enter the following property names and values:

    Property Name         Value
    mapred.mapper.class   org.apache.oozie.example.SampleMapper
    mapred.reducer.class  org.apache.oozie.example.SampleReducer
    mapred.output.dir     /oozie/MapReduceJob/ooziewfoutput
    mapred.input.dir      /oozie/examples/input-data/text

  11. Click Done. The MapReduce action appears in the workflow.
  12. Click Save.
  13. Under Actions in the navigation panel, click Submit. A Submit this job dialog appears.
  14. Click Submit.
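
A workflow like the one built above corresponds to an Oozie workflow definition with a single map-reduce action. The Python sketch below assembles a comparable definition to show how the action name and the four properties fit together. The schema version, the ${jobTracker} and ${nameNode} parameters, and the kill node are assumptions; the XML that the editor actually generates may differ, and the sketch leaves out the reference to the JAR file.

```python
import xml.etree.ElementTree as ET

# The four properties entered in the Edit Node page.
PROPERTIES = {
    "mapred.mapper.class": "org.apache.oozie.example.SampleMapper",
    "mapred.reducer.class": "org.apache.oozie.example.SampleReducer",
    "mapred.output.dir": "/oozie/MapReduceJob/ooziewfoutput",
    "mapred.input.dir": "/oozie/examples/input-data/text",
}


def build_workflow(name, action_name, properties):
    """Build a start -> map-reduce action -> end workflow as an XML tree."""
    # Assumption: schema version 0.4; your Oozie release may use another.
    app = ET.Element("workflow-app", {"name": name, "xmlns": "uri:oozie:workflow:0.4"})
    ET.SubElement(app, "start", {"to": action_name})

    action = ET.SubElement(app, "action", {"name": action_name})
    mr = ET.SubElement(action, "map-reduce")
    # Assumption: cluster addresses are supplied as workflow parameters.
    ET.SubElement(mr, "job-tracker").text = "${jobTracker}"
    ET.SubElement(mr, "name-node").text = "${nameNode}"
    conf = ET.SubElement(mr, "configuration")
    for key, value in properties.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = key
        ET.SubElement(prop, "value").text = value
    # Transitions taken when the action succeeds or fails.
    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "kill"})

    kill = ET.SubElement(app, "kill", {"name": "kill"})
    ET.SubElement(kill, "message").text = "Action failed"
    ET.SubElement(app, "end", {"name": "end"})
    return app


workflow = build_workflow("Oozie_Workflow", "MRaction", PROPERTIES)
xml_text = ET.tostring(workflow, encoding="unicode")
print(xml_text)
```

The start node points at MRaction, and the ok and error transitions decide whether the workflow finishes at the end node or the kill node, which matches placing the action between start and end in the editor.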

View the output file:

  1. Click the File Browser icon. The File Browser page opens.
  2. Navigate to the ooziewfoutput directory.
    Example: /oozie/MapReduceJob/ooziewfoutput
  3. Open the part-00000 file to view the job output.
